# ADVANCES IN GENOMICS AND EPIGENOMICS OF SOCIAL INSECTS

EDITED BY: Greg J. Hunt and Juergen R. Gadau PUBLISHED IN: Frontiers in Genetics

#### *Frontiers Copyright Statement*

*© Copyright 2007-2017 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88945-080-0 DOI 10.3389/978-2-88945-080-0


## **ADVANCES IN GENOMICS AND EPIGENOMICS OF SOCIAL INSECTS**

Topic Editors: **Greg J. Hunt,** Purdue University, USA **Juergen R. Gadau,** Institute for Evolution and Biodiversity, Germany

Introductory paragraph figure: Illustration by Elizabeth Cash Cover: Photo by Greg Hunt

Social insects are among the most successful and ecologically important animals on earth. The lifestyle of these insects has fascinated humans since prehistoric times. These species evolved a caste of workers that in most cases have no progeny. Some social insects have worker sub-castes that are morphologically specialized for discrete tasks. The organization of the social insect colony has been compared to the metazoan body. Males in the order Hymenoptera (bees, ants and wasps) are haploid, a situation which results in higher relatedness between female siblings. Sociality evolved many times within the Hymenoptera, perhaps spurred in part by increased relatedness that increases inclusive fitness benefits to workers cooperating to raise their sisters and brothers rather than reproducing themselves. But epigenetic processes may also have contributed to the evolution of sociality. The Hymenoptera provide opportunities for comparative study of species ranging from solitary to highly social. A more ancient clade of social insects, the termites (infraorder Isoptera), provides an opportunity to study alternative mechanisms of caste determination and lifestyles that are aided by an array of endosymbionts. This research topic explores the use of genome sequence data and genomic techniques to help us understand how sociality evolved in insects, how epigenetic processes enable phenotypic plasticity, and the mechanisms behind whether a female will become a queen or a worker.

**Citation:** Hunt, G. J., Gadau, J. R., eds. (2017). Advances in Genomics and Epigenomics of Social Insects. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-080-0

# Table of Contents

*Editorial: Advances in Genomics and Epigenomics of Social Insects* Greg J. Hunt and Juergen R. Gadau

#### **The Termites**


#### **Epigenetics in Social Evolution**


#### **Comparing Social and Solitary Species and Castes**


#### **Mechanistic Studies of Behavioral and Developmental Plasticity**


Matthias Biewer, Francisca Schlesinger and Martin Hasselmann

#### **A Genomic-Geographic Survey of Honey Bee Disease and Microbiome**

*Metatranscriptomic analyses of honey bee colonies* Cansu Ö. Tozkar, Meral Kence, Aykut Kence, Qiang Huang and Jay D. Evans

## Editorial: Advances in Genomics and Epigenomics of Social Insects

Greg J. Hunt <sup>1</sup> \* and Juergen R. Gadau<sup>2</sup>

*<sup>1</sup> Department of Entomology, Purdue University, West Lafayette, IN, USA, <sup>2</sup> School of Life Sciences, Arizona State University, Tempe, AZ, USA*

Keywords: social evolution, phenotypic plasticity, caste determination, epigenetics, eusociality, social insect, reproductive caste, sterile caste

**Editorial on the Research Topic**

#### **Advances in Genomics and Epigenomics of Social Insects**

The adaptive advantage of the eusocial lifestyle is evident from the fact that social insects represent more than half of the world's arthropod biomass. This topic explores how the recent advances in genomics and epigenomics are helping researchers to ask and answer questions concerning the evolution of social behavior and the genetic and epigenetic mechanisms behind phenotypic plasticity, i.e., how environmental signals can shape the same genome into a reproductive or nonreproductive individual, resulting in dramatically different phenotypes. The articles in this research topic deal broadly with the evolution of reproductive and sterile castes (workers), mechanisms of caste determination, and the role of epigenetic processes in division of labor. The termites were the first group of insects to evolve eusociality, and a thorough review describes, from a mechanistic perspective, what is known about the development of subcastes (nymphs, workers, soldiers), the genomic contributions of gut symbionts and their hosts to the digestion of wood, and the role of symbionts in host fitness (Scharf). Korb et al. compare the genomes of two termites with contrasting social complexities and symbioses. One of the interesting findings was that gene families involved in chemical communication in other social insects are not expanded in termites with more complex social organization. Transposable elements are, however, suggesting a role for transposition in social evolution but perhaps also pointing toward other mechanisms.

Darwin had a "special difficulty" understanding how sterile worker castes arose in the social insects and how morphological specializations could exist in individuals that did not have progeny. Epigenetic processes could provide mechanisms to encode these specializations within a worker caste, just as they do in clonal cells of developing tissues. For example, experimental manipulations that cause honeybee workers to switch task specializations are marked by specific methylation events (Herb). However, the function of gene body methylation with regard to behavioral plasticity of workers, although associated with alternative splicing, remains uncertain. The less-studied and less abundant 5-hydroxymethylcytosine (5hmC) modifications are intriguingly enriched in germ cells and brain of honeybees, just as they are in mammals (Rasmussen and Amdam). Ruden et al. continue Herb's answer to Darwin's dilemma by suggesting that solitary ancestors of social bees may have experienced nutrient limitations, leading to a de facto sterile caste in communal nesting situations. Stresses such as this could also activate heat shock proteins such as those that are involved in multi-generational inheritance of bizarre phenotypes in Drosophila without a change in DNA sequence. For example, Hsp90 inactivation has been linked to Ubx expression and the formation of pollen baskets on the legs of bees. On the other hand, Cini et al. ask how it is that some eusocial species went the other way and lost the sterile caste. Some of these species that showed social reversals evolved into social parasites that still depend on workers, but they exploit workers of closely related eusocial species. It seems more data are needed to determine whether comparing expression levels of conserved genes such as Ubx in different castes and species will provide insight into this process.

#### Edited and reviewed by:

*Samuel A. Cushman, United States Forest Service Rocky Mountain Research Station, USA*

> \*Correspondence: *Greg J. Hunt ghunt@purdue.edu*

#### Specialty section:

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> Received: *08 September 2016* Accepted: *31 October 2016* Published: *16 November 2016*

#### Citation:

*Hunt GJ and Gadau JR (2016) Editorial: Advances in Genomics and Epigenomics of Social Insects. Front. Genet. 7:199. doi: 10.3389/fgene.2016.00199*

Comparative studies of social insects and their solitary relatives can be used to look for signatures of social evolution. Sovik et al. analyze the question of whether specific miRNAs may have predisposed bee species to evolve eusociality. One pattern that emerges is that taxonomically restricted genes apparently have the highest rates of adaptive evolution in the honeybee. Similarly, recent expansions of regulatory sequences are restricted to specific ant lineages. A population genomic study combined with a meta-analysis of microarray data in the honeybee suggests that both protein coding and regulatory sequences that are rapidly evolving tend to lie at the periphery of gene networks (Moldostova et al.). One question asked by Helanterä and Uller is whether genes that show biases in expression between morphological castes of ants and bees are under strong purifying selection or whether neutral processes allow genes to be co-opted for specific roles in castes. Similar differences in gene expression have been observed between morphs of plants and animals. More data comparing expression between and within castes are needed to answer these questions.

The final three chapters we will mention take a more mechanistic approach to understanding development and behavior of bees. It has been repeatedly shown that fundamental changes in gene expression during development of either the worker or queen phenotype are mediated by ecdysteroid hormones. An impressive series of experiments by Mello et al. characterizes the interactions of ecdysone, juvenile hormone and ecdysone receptor expression, along with downstream gene regulation in the fat body of honeybees. Analysis of interacting miRNAs on differentially transcribed genes during development may provide even more insight into the making of a queen.

Reciprocal hybrids derived from European and Africanized honeybees exhibit both gene expression differences and aggressive behaviors that depend on the direction of the cross. In hybrids with European maternity (but not in the reciprocal family), about 8% of genes tested were strongly biased toward expression of the maternal allele (Gibson et al.). The biased genes are enriched for mitochondrial proteins and genes of metabolic function. Most biased genes are dispersed in the genome but large tracts of them are localized to two quantitative trait loci reported to influence aggressive behavior and alarm pheromone production. The authors speculate that this phenomenon involves partial cytoplasmic incompatibility, nuclear/mitochondrial signaling, heat-shock proteins and short interfering RNA.

The vast majority of social insects are in the order Hymenoptera (the bees, ants, and wasps), which exhibit male haploidy. In most of these species female development is determined by heterozygosity at a single locus, but some wasp species rely on a process that signals fertilization of the egg. A common theme, however, is the involvement of the gene transformer (tra). In honeybees, it appears that duplication of a putative ortholog of tra, called fem, followed by positive selection resulted in the single-locus, multi-allele complementary sex determiner (csd) gene. Biewer et al. present evidence that ancestral duplications of fem are restricted to specific bee lineages. They go on to discuss how the gene that sends the initial signal in sex determination could be re-purposed after duplication.

It has been 10 years since the honey bee genome was published. Currently (2016), we have about 50 social insect genomes published, with a rapid increase in the rate of genome sequencing expected on the horizon. For example, a proposal to sequence all ant genera has just been put forward by a group of researchers (GAGA, Global Ant Genomics Alliance). Hence, in the near future comparative genomics will greatly increase our knowledge about the processes that shaped the genomes of social insects. For example, comparative studies of bees will be useful for understanding changes associated with the evolution of sociality because there were multiple gains and losses of the eusocial lifestyle in this clade (Kocher and Paxton, 2014). Sequencing of individuals from population studies, coupled with phenotypic data, will help identify genes under selection during social evolution, including the genetic architecture of traits of primitively social species. Functional genomics of social insects will be greatly aided by gene editing using CRISPR/Cas methodologies and by RNAi. Physiological and behavioral assays that are informed by what is learned from metabolomics and transcriptomics will enable social insects to be models for understanding behavioral genetics in general and social evolution in particular.

#### AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contribution to the work, and approved it for publication.

## REFERENCES

Kocher, S. D., and Paxton, R. J. (2014). Comparative methods offer powerful insights into social evolution in bees. Apidologie (Celle). 45, 289–305. doi:10.1007/s13592-014-0268-3

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Hunt and Gadau. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## A genomic comparison of two termites with different social complexity

Judith Korb<sup>1</sup>\*, Michael Poulsen<sup>2</sup>, Haofu Hu<sup>3</sup>, Cai Li<sup>3,4</sup>, Jacobus J. Boomsma<sup>2</sup>, Guojie Zhang<sup>2,3</sup> and Jürgen Liebig<sup>5</sup>

*<sup>1</sup> Department of Evolutionary Biology and Ecology, Institute of Biology I, University of Freiburg, Freiburg, Germany*

*<sup>2</sup> Section for Ecology and Evolution, Department of Biology, Centre for Social Evolution, University of Copenhagen, Copenhagen, Denmark*

*<sup>3</sup> China National Genebank, BGI-Shenzhen, Shenzhen, China*

*<sup>4</sup> Centre for GeoGenetics, Natural History Museum of Denmark, University of Copenhagen, Copenhagen, Denmark*

*<sup>5</sup> School of Life Sciences, Arizona State University, Tempe, AZ, USA*

#### *Edited by:*

*Juergen Rudolf Gadau, Arizona State University, USA*

#### *Reviewed by:*

*Seirian Sumner, University of Bristol, UK Bart Pannebakker, Wageningen University, Netherlands Michael E. Scharf, Purdue University, USA*

#### *\*Correspondence:*

*Judith Korb, Department of Evolutionary Biology and Ecology, Institute of Biology I, University of Freiburg, Hauptstrasse 1, D-79104 Freiburg, Germany e-mail: judith.korb@biologie.uni-freiburg.de*

The termites evolved eusociality and complex societies before the ants, but have been studied much less. The recent publication of the first two termite genomes provides a unique comparative opportunity, particularly because the sequenced termites represent opposite ends of the social complexity spectrum. *Zootermopsis nevadensis* has simple colonies with totipotent workers that can develop into all castes (dispersing reproductives, nest-inheriting replacement reproductives, and soldiers). In contrast, the fungus-growing termite *Macrotermes natalensis* belongs to the higher termites and has very large and complex societies with morphologically distinct castes that are life-time sterile. Here we compare key characteristics of genomic architecture, focusing on genes involved in communication, immune defenses, mating biology and symbiosis that were likely important in termite social evolution. We discuss these in relation to what is known about these genes in the ants and outline hypotheses for further testing.

**Keywords: chemical communication, genomes, immunity, social organization, social insects, symbiosis, termites, transposable elements**

#### **INTRODUCTION**

The termites are "social cockroaches," a monophyletic clade (Infraorder "Isoptera") nested within the Blattodea (Inward et al., 2007a; Engel et al., 2009; Krishna et al., 2013). They superficially resemble the ants in having wingless worker foragers, but are fundamentally different in a series of ancestral traits that affect the organization of their eusocial colonies (Korb, 2008; Howard and Thorne, 2011). The (eu)social Hymenoptera are haplodiploid holometabolous insects whose males develop from haploid eggs and have transient roles in social life, because they survive only as sperm stored in the spermatheca of queens. Hymenopteran colonies thus consist of female adults that develop from fertilized eggs to differentiate into workers, virgin queens and occasionally soldiers, of which only the workers care for the helpless grub-like larvae. By contrast, termites are diploid hemimetabolous insects whose colonies usually have workers, soldiers, and reproductives of both sexes. Both groups have life-time monogamy upon colony founding as the ancestral state (Hughes et al., 2008; Boomsma, 2013), but in contrast to the eusocial Hymenoptera, termite royal pairs regularly remate to produce immatures that increasingly come to resemble the workers, soldiers, and reproductives into which they differentiate. Hence, termite caste differentiation is based on phenotypic plasticity among immatures (Korb and Hartfelder, 2008; Miura and Scharf, 2011), while the eusocial Hymenoptera have castes of adults (Wilson, 1971).

Termites and ants also share many traits that convergently evolved in response to similar selective pressures (Thorne and Traniello, 2003; Korb, 2008; Howard and Thorne, 2011). Both are mostly soil-dwelling and thus continuously exposed to high pathogen loads, and their long-lived, populous and genetically homogeneous colonies appear to be ideal targets for infections (Schmid-Hempel, 1998). However, both the ants and the termites also evolved impressive disease defense strategies, which have meant that very few pathogens have been able to specialize on infecting perennial ant and termite colonies over evolutionary time (Boomsma et al., 2005). In large part this appears to be due to immune defenses operating both at the individual and the collective (social immunity) level (Cremer et al., 2007; Rosengaus et al., 2011). Another common characteristic of the ants and termites is that both evolved complex communication systems that largely rely on chemical cues, such as cuticular hydrocarbons (CHCs), for nestmate recognition and within-colony communication (e.g., Liebig, 2010; Van Zweden and D'Ettorre, 2010). Strikingly, long-chained CHCs of queens often appear to function as fertility signals for workers of both lineages (Liebig et al., 2009; Weil et al., 2009; Liebig, 2010; van Oystaeyen et al., 2014). Here, we offer the first comparative exploration of the extent to which lineage ancestry has determined these convergent phenotypic similarities, based on the first two termite genomes that recently became available (Poulsen et al., 2014; Terrapon et al., 2014).

The two termite genomes represent opposite ends of the social complexity spectrum within the Isoptera (Roisin, 2000) (**Table 1**) as they exemplify the two fundamental termite life types: the wood-dwelling one-piece nesters and the central place foraging lineages that generally differ in social complexity, feeding ecology, gut symbionts, and developmental plasticity (Abe, 1987; Korb, 2007; Korb and Hartfelder, 2008) (**Figure 1**). *Zootermopsis nevadensis* belongs to the former type and *Macrotermes natalensis* to the latter. Wood-dwelling species (Abe, 1987; Shellman-Reeve, 1997) nest within a single piece of dead wood that serves both as food and nesting habitat so the termites never leave their nest to forage. This social syndrome is widely considered to be ancestral (e.g., Noirot and Pasteels, 1987, 1988; Inward et al., 2007b) and associated with high degrees of developmental plasticity for the individual termites (**Figure 2A**). Workers remain totipotent immatures throughout several instars that commonly develop further into sterile soldiers, winged sexuals (alates) that found new nests as primary reproductives, or neotenic reproductives that reproduce within the natal nest (**Figure 2A**).

The foraging termite species (also called "multiple piece nesters"; Abe, 1987; Shellman-Reeve, 1997) forage for food outside the nest at some point after colony foundation and bring it back to the colony to feed nestmates. They represent more than 85% of the extant termite species (Kambhampati and Eggleton, 2000). They have true workers and an early separation into distinct developmental pathways (Roisin, 2000; Korb and Hartfelder, 2008) (**Figure 2B**). In the apterous line, individuals are unable to develop wings and can thus never disperse as reproductives. They become workers and soldiers, but can in some species also advance to become neotenic reproductives in their own nest. In the nymphal line, however, individuals develop wings and dispersing phenotypes that found new colonies elsewhere (**Figure 2B**). The Macrotermitinae, to which *Macrotermes natalensis* belongs, are special examples of foraging termites because their colonies are dependent on nutrition provided by a *Termitomyces* symbiont (Basidiomycota: Agaricales) (Wood and Thomas, 1989; Nobre et al., 2011). This fungal symbiosis is evolutionarily derived and comes in addition to more fundamental protist (lower termites) and bacterial gut symbionts (all termites), which have played major roles throughout termite evolution. *Macrotermes* species have two (major/minor) worker castes and two (major/minor) soldier castes (Ruelle, 1970) that may be determined as early as the egg stage (suggested for *Macrotermes michaelseni* by Okot-Kotber, 1985). *Macrotermes* colonies often build conspicuous mounds that may harbor several million individuals (Noirot and Darlington, 2000; Korb, 2011).

We compare the genomes of these divergent species (**Table 1**) with those of other insects and outline first hypotheses on how sociality and ecological factors left their footprints in the genomes.

#### **MATERIALS AND METHODS**

#### **CONSTRUCTION OF GENE FAMILIES**

To gain insight into the evolution of gene families in termites, we clustered genes from 12 insect genomes (pea aphid: *Acyrthosiphon pisum*: The International Pea Aphid Genomics Consortium, 2010; body louse: *Pediculus humanus*: Kirkness et al., 2010; flour beetle: *Tribolium castaneum*: Richards et al., 2008; fruitfly: *Drosophila melanogaster*: Adams et al., 2000; jewel wasp: *Nasonia vitripennis*: Werren et al., 2010; honeybee: *Apis mellifera*: The Honeybee Genome Sequencing Consortium, 2006; ants: *Acromyrmex echinatior*: Nygaard et al., 2011, *Atta cephalotes*: Suen et al., 2011, *Camponotus floridanus, Harpegnathos saltator*: Bonasio et al., 2010; termites: *Z. nevadensis, M. natalensis*), the water flea *Daphnia pulex* (Colbourne et al., 2011), and the round worm *Caenorhabditis elegans* (Coulson and C. elegans Genome Consortium, 1996). The gene sets of the species that we chose were downloaded from the Ensembl database (Flicek et al., 2014), except for ants and termites which were downloaded from their own reference databases. Then we used Treefam (Li et al., 2006) to construct gene families. For more information see also Terrapon et al. (2014) and Poulsen et al. (2014) (Table S1).

#### **FUNCTIONAL ANNOTATION OF TERMITE GENES**

InterproScan v4.8 (Zdobnov and Apweiler, 2001) was used to annotate motifs and domains of translated proteins in two termites. Protein sequences were searched against SUPERFAMILY, Pfam, PRINTS, PROSITE, ProDom, Gene3D, PANTHER, and SMART databases in Interpro with default parameter settings. GO (gene ontology) terms for each gene were obtained from the Interpro database according to the relationship of GO and Interpro terms. The KEGG annotation (Kanehisa and Goto, 2000) was done via the KAAS online server (Moriya et al., 2007) using the SBH method against the eukaryotic species set.

**Table 1 | Summary of traits that differ between the two study species.**

*Traits 1–3 co-vary in termites in that wood-dwelling termites with totipotent workers are always less socially complex, while foraging termites are more socially complex with workers having restricted developmental options. However, huge trait variability exists within foraging species, see also Figure 1.*

**Figure 1 |** [...] some genomic/molecular genetic research has been done. Added to the right are characteristic social and ecological traits. Social: increasing social complexity from + to +++ (e.g., increasing colony size, division of labor, morphological differentiation between castes); Type: life type, foraging vs. wood dwelling; Region: temperate vs. tropical; Pathogens: [...]

*Macrotermes natalensis* (Judith Korb), *Reticulitermes speratus* (Kenji Matsuura), *Reticulitermes flavipes* (not shown), *Coptotermes formosanus* (not shown), *Prorhinotermes simplex* (Judith Korb), *Cryptotermes secundus* (Judith Korb), *Zootermopsis nevadensis* (Judith Korb), *Hodotermes sjostedti* (Toru Miura).

#### **TERMITE-SPECIFIC GENES**

Some gene families were termite-specific and absent from the other investigated genomes. For these genes we performed functional enrichment analyses of GO and IPR (Interpro domain) annotation. *P*-values for significant difference were obtained by χ2-tests adjusted by FDR (false discovery rate). Similarly, we analyzed differences between the gene sets of *Z. nevadensis* and *M. natalensis* by comparing IPR annotation, KEGG pathways, and gene families. We constructed gene families for both genomes using Treefam (Li et al., 2006) and tested for differences in gene numbers using χ2-tests (or Fisher's exact test for small sample sizes). For gene families that were specific to *M. natalensis* and/or *Z. nevadensis*, we performed IPR enrichment analyses to obtain information on the putative functions of these genes.
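The kind of test described above can be illustrated with a short, self-contained sketch. It is not the authors' pipeline: the text does not state which FDR procedure was applied, so Benjamini-Hochberg is assumed here, and the function names and all counts are hypothetical. The sketch shows a two-sided Fisher's exact test on a 2x2 table (genes with and without a given domain, in a termite-specific set versus a background set), followed by FDR adjustment across several tested domains.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts: (with domain in set, without in set,
#                       with domain in background, without in background)
tables = {"IPR_A": (12, 88, 30, 970), "IPR_B": (3, 97, 28, 972)}
raw = {name: fisher_exact_two_sided(*counts) for name, counts in tables.items()}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
```

With large counts a χ2-test would normally replace the exact test, as in the text; the exact test is used here only because it needs nothing beyond the standard library.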

**Figure 2 |** [...] *M. natalensis*. Wood-dwelling termites have totipotent immature stages that can explore all caste options, whereas higher termites have a bifurcating caste development pathway splitting into a nymphal line leading to winged dispersing alates and an apterous line leading to workers and soldiers. In *M. natalensis* this bifurcation is already established in the egg stage. (i) progressive development via nymphal instar(s) into winged sexuals (alates) that disperse and found a new nest elsewhere; (ii) stationary molt remaining in the same instar; (iii) regressive development into an "earlier" instar (gray semi-circle); (iv) development into a soldier, and (v) development into a neotenic replacement reproductive that reproduces within the natal nest. Part (a) is adapted from Korb et al. (2012b). (Photo credits: Judith Korb).

#### **REPEAT ANALYSES**

We used the *M. natalensis* and *Z. nevadensis* genome assemblies to perform repetitive sequence annotation. First, we did homologous repeat family annotation to identify transposable elements (TEs) using the TE database Repbase v17.06 (Jurka and Kapitonov, 2005) and the programs RepeatMasker (parameter -norna) and RepeatProteinMask v4.0.1 (http://www.RepeatMasker.org) (parameter -p 0.0001) (Smit et al., 1996-2010). *De novo* repeat family annotation was done with PILER v1.0 (Edgar and Myers, 2005), LTRfinder v1.05 (Zhao and Wang, 2007) and RepeatModeler v1.05 (http://www.RepeatMasker.org) (Smit et al., 1996-2010) using default parameters. TEs identified by PILER were converted into TE families and aligned with Muscle v3.28 (Edgar, 2004) to obtain consensus sequences from the alignments. In order to reduce redundancy in the results of LTRfinder and PILER, an "all against all" BLASTn (*e*-value 1e-5) was performed. If two sequences overlapped by more than 80%, we kept the longer TE.
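The redundancy filter at the end of this step can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the text does not say how the 80% overlap was measured, so the sketch assumes it is the aligned length divided by the length of the shorter sequence, and it applies the rule greedily. The function name, the input layout, and the example identifiers are hypothetical.

```python
def remove_redundant(lengths, hits, min_overlap=0.8):
    """Drop the shorter member of each highly overlapping sequence pair.

    lengths: dict mapping sequence id -> sequence length
    hits:    iterable of (id_a, id_b, aligned_length) from an
             all-against-all search (self-hits are ignored)
    """
    removed = set()
    # Process hits involving the longest sequences first, so that a long
    # sequence is retained before its shorter near-duplicates are judged.
    ordered = sorted(hits, key=lambda h: -max(lengths[h[0]], lengths[h[1]]))
    for a, b, aligned in ordered:
        if a == b or a in removed or b in removed:
            continue
        # Ties in length are broken by id so the result is deterministic.
        shorter, _longer = sorted((a, b), key=lambda s: (lengths[s], s))
        # Assumption: overlap is measured relative to the shorter sequence.
        if aligned / lengths[shorter] > min_overlap:
            removed.add(shorter)
    return sorted(set(lengths) - removed)


# Hypothetical consensus sequences and pairwise hits.
te_lengths = {"TE1": 1000, "TE2": 900, "TE3": 400}
te_hits = [("TE1", "TE2", 850), ("TE2", "TE3", 200), ("TE1", "TE1", 1000)]
kept = remove_redundant(te_lengths, te_hits)
```

Here TE2 is dropped because 850 of its 900 bases align to the longer TE1, while TE3 is kept.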

We combined the TE families with the consensus sequences of LTRfinder and PILER together with those identified using RepeatModeler to obtain the final TE sequence library for the two termites. All TE sequences were classified with RepeatClassifier in the RepeatModeler package against Repbase v17.06 (Jurka and Kapitonov, 2005) (Dataset S1). Finally, we used the *de novo* TE library to annotate all TEs in the two genomes and combined the results of homologous TE annotation and the *de novo* annotation. If there were overlapping annotations, we kept the longer TE. In addition, we predicted tandem repeats using TRF finder (parameter settings: match = 2, mismatch = 7, delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 12) (Benson, 1999). In total, the non-redundant repetitive sequences accounted for 27.8 and 45.9% of the *Z. nevadensis* and *M. natalensis* genome, respectively (**Table 2**, Dataset S1).
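A non-redundant repeat percentage of this kind is, in effect, the share of assembled bases covered by at least one repeat annotation. The sketch below shows that bookkeeping only. It merges overlapping intervals rather than applying the keep-the-longer rule described above, and it assumes 0-based, half-open coordinates; the function names and example coordinates are hypothetical.

```python
def nonredundant_bases(annotations):
    """Count bases covered by at least one annotation.

    annotations: iterable of (scaffold, start, end), 0-based half-open.
    """
    by_scaffold = {}
    for scaffold, start, end in annotations:
        by_scaffold.setdefault(scaffold, []).append((start, end))

    total = 0
    for intervals in by_scaffold.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start
                cur_start, cur_end = start, end
        total += cur_end - cur_start
    return total


def repeat_fraction(annotations, genome_size):
    """Fraction of the genome covered by repeat annotations."""
    return nonredundant_bases(annotations) / genome_size


# Hypothetical annotations on two scaffolds of a 1,000-base toy genome.
toy = [("s1", 0, 100), ("s1", 50, 150), ("s1", 200, 250), ("s2", 10, 20)]
fraction = repeat_fraction(toy, 1000)
```

Counting each base once is what keeps the total from exceeding 100% when homology-based and *de novo* annotations overlap.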

We also checked for Talua elements in both termite species, SINE elements that were first identified in termites (Luchetti, 2005; Luchetti and Mantovani, 2009). Talua reference sequences (Dataset S1) were mapped to the TE annotations using BLASTn (*e*-value 1e-5). If the alignment contained more than 50% of the Talua domain, the TE was considered to be a Talua-containing TE. In total, we found 1575 and 4385 Talua-containing TEs in the *Z. nevadensis* and *M. natalensis* genomes, respectively.
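The 50% coverage criterion for Talua-containing TEs can be expressed as a small Python sketch. The hit layout and the function name `talua_containing_tes` are hypothetical; coordinates are assumed to be 1-based and inclusive on the Talua reference, as in BLAST tabular output, and hits are assumed to have already passed the *e*-value cutoff.

```python
def talua_containing_tes(hits, talua_lengths, min_fraction=0.5):
    """Return the set of TE IDs with at least one hit covering more than
    min_fraction of a Talua reference sequence.

    hits: iterable of (te_id, talua_id, ref_start, ref_end), where the
    coordinates are 1-based inclusive positions on the Talua reference.
    talua_lengths: maps Talua reference ID -> length in bp.
    """
    selected = set()
    for te_id, talua_id, ref_start, ref_end in hits:
        # Reverse-strand hits are reported with start > end, so order them.
        lo, hi = sorted((ref_start, ref_end))
        covered = hi - lo + 1
        if covered / talua_lengths[talua_id] > min_fraction:
            selected.add(te_id)
    return selected
```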

#### **RESULTS AND DISCUSSION**

#### **GENOME ARCHITECTURE AND REPETITIVE SEQUENCES**

A striking difference between ants and termites is that termite genomes are about three times larger (Table S1), which appears to be an ancestral cockroach characteristic (always several Gbs; Koshikawa et al., 2008). Termites actually have smaller genomes than cockroaches and it has been hypothesized that sociality was in fact associated with a reduction in genome size (Koshikawa et al., 2008). Yet the socially more complex *M. natalensis* has a genome size that is more than twice the genome size of *Z. nevadensis* (1.31 Gb vs. 562 Mb), which has the smallest genome known for any termite so far (Koshikawa et al., 2008). On the other hand, ant genome size appears to vary relatively little around an average of 300 Mb, with the largest ant genome published so far being 352 Mb (the red fire ant *Solenopsis invicta*) and smallest genome being 219 Mb (the Argentine ant *Linepithema humile*) (Table S2).

The two termite assemblies covered over 85% of the genomes, so any differences observed are unlikely to be related to the slightly fewer protein coding genes in *Z. nevadensis* (15,876 vs. 16,310 in *M. natalensis*). However, the *M. natalensis* genome contained a much higher proportion of repeat sequences (67.1 vs. 26.0% in *Z. nevadensis*) (**Table 2**). Subtracting these repeat sequences leads to comparable respective genome sizes of 367 and 365 Mb. Further genomic data will be needed to find out whether these ca. 365 Mb represent a kind of "core genome" for termites and whether additional variation in genome size would then only be due to variation in repeat sequences. It will also be interesting to evaluate the first cockroach genomes to see whether their huge genomes (multiple Gbs) are associated with a higher number of coding or repeat sequences. In ants, genome-wide repeat content so far varies between 11.5 and 28.0% (Gadau et al., 2012) and no overall correlation with genome size appears to exist.

**Table 2 | The number and length of each type of repetitive sequence.**

*Simple repeats are 2–5 bp repetitive units while longer satellite and tandem repeats have 6–40 bp. "Other" includes repeats that do not belong to any of the listed types, such as DNA-viruses or centromeric regions (listed in Table S1).*

The *M. natalensis* genome had almost twice as many TEs (transposable elements) as the *Z. nevadensis* genome (45.9 vs. 27.8%; **Table 2**) and most of these were LINEs (long interspersed nuclear elements), which accounted for 20% of the *M. natalensis* genome (**Table 2**). According to the Repbase classification, most LINEs in *M. natalensis* resemble BovB retrotransposons, accounting for 16% of the genome, while LINEs contribute only ca. 3% in *Z. nevadensis* (**Table 2**). BovBs are relatively well known from vertebrates where they have a patchy distribution in squamates, monotremes, marsupials, ruminants, and several African mammals (Afrotheria), possibly as a consequence of horizontal gene transfer via reptile ticks (Walsh et al., 2013). In ruminants, part of one BovB LINE seems to have been recruited into a functional gene after duplication (Iwashita et al., 2006), but whether similar cooption processes may have occurred in termites remains to be explored.

The *M. natalensis* genome appears to have fewer SINEs (short interspersed nuclear elements) than the *Z. nevadensis* genome (3.6 vs. 0.2%). A new SINE retrotransposon, *Talua*, has recently been described for termites (Luchetti, 2005; Luchetti and Mantovani, 2009). It belongs to a new family of tRNA-derived elements that are very G+C-rich (55–60%) but makes up only a small proportion of the termite genomes (0.25 and 0.19% in *Z. nevadensis* and *M. natalensis*, respectively; Table S3). There are multi-copy TEs that are present in both termite genomes that do not resemble any known TEs. They may thus be novel termite-specific TEs, but additional termite and non-termite genomes will be needed to test this against a null hypothesis of being more general TEs that also occur in other hemimetabolous insects.

TE sequence divergence (i.e., percentage of different base pairs) relative to TE consensus sequences showed a peak at about 25% for both *M. natalensis* and *Z. nevadensis* (**Figure 3**), but *M. natalensis* had an additional divergence rate peak at ca. 7–8% (**Figure 3**). This might indicate that the lineage leading to *M. natalensis* has undergone a genome expansion that multiplied TE copies and BovB retrotransposons, which could then explain why the *M. natalensis* genome is so much larger than the *Z. nevadensis* genome.
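The divergence measure used here, the percentage of differing base pairs between a TE copy and its consensus, can be sketched in Python as follows. This is an illustrative sketch only: it assumes the two sequences are already aligned to equal length with `-` marking gaps, it skips gapped positions (an assumption, since the text does not say how gaps were treated), and the function names are hypothetical. Binning the values then gives a distribution in which peaks such as those at ca. 7–8% and 25% would show up.

```python
from collections import Counter


def percent_divergence(copy_seq, consensus_seq):
    """Percentage of compared positions at which a TE copy differs from its consensus.

    Both strings must come from the same alignment (equal length, '-' for gaps).
    Positions with a gap in either sequence are skipped (assumption of this sketch).
    """
    if len(copy_seq) != len(consensus_seq):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(copy_seq.upper(), consensus_seq.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return 100.0 * mismatches / compared


def divergence_histogram(values, bin_width=1.0):
    """Count divergence values per bin; keys are integer bin indices (value // bin_width)."""
    return Counter(int(v // bin_width) for v in values)
```

A younger burst of transposition would appear as an excess of copies in low-divergence bins, because recently inserted copies have had less time to accumulate substitutions relative to the consensus.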

Consistent with the high prevalence of repeat sequences, IPR annotation results showed a functional enrichment of DNA/RNA cutting genes in termites (Ribonuclease H domain: 22 genes, Ribonuclease H-like domain: 26 genes, endonuclease/exonuclease/phosphatase: 26 genes) compared to other insects (Table S4). Strikingly, *M. natalensis* had at least twice as many such transposon-related genes as *Z. nevadensis*, supporting the idea that selfish replicating elements played a major role in the evolution of termite genome architecture and size (Tables S5, S6).

Cluster analyses of caste-specific transcriptomes in *Z. nevadensis* revealed that several of these DNA/RNA-cutting genes are overexpressed in the nymphal stages (i.e., instars with wing buds) compared to all other stages and castes (Terrapon et al., 2014). Nymphs are individuals destined to develop into winged dispersing reproductives, suggesting that TE activity might be linked to maturation processes such as gonad development. Such functions remain speculative at this point, but would be consistent with TEs having been coopted to fulfill host functions and to play fundamental roles in epigenetic regulation in organisms as different as *Arabidopsis thaliana* plants, *Caenorhabditis elegans* worms, *Drosophila melanogaster* flies and *Mus musculus* house mice (e.g., Lippman et al., 2004; Slotkin and Martienssen, 2007; Fedoroff, 2012). Silenced TEs are often activated through stressful environmental conditions (Slotkin and Martienssen, 2007; Fedoroff, 2012). In wood-dwelling termites, such conditions may arise by reduced food availability or possibly parasite pressure inducing higher rates of nymphal (sexual dispersing) development (Lenz, 1994; Korb and Schmidinger, 2004; Korb and Fuchs, 2006). Hence, it may be interesting to test whether a similar link between TE activity and stressful conditions exists during nymphal development.

Whether TEs can also be linked with epigenetic regulation of gene expression through DNA methylation (Lippman et al., 2004; Slotkin and Martienssen, 2007; Fedoroff, 2012) remains to be seen. DNA methylation has been proposed to regulate caste differentiation (Kucharski et al., 2008; Elango et al., 2009; Gadau et al., 2012; Terrapon et al., 2014) and the complete epigenetic toolbox was indeed identified in *Z. nevadensis* with orthologs of *DNMT1* and *DNMT3* (Terrapon et al., 2014). However, in *M. natalensis* only *DNMT1* (and possibly *DNMT2*) could be confirmed, but not *DNMT3*.

#### **COMMUNICATION**

Termite-specific expansions for gene families were also found among chemoperception genes that are important for communication (Table S4). Given the disparate social systems of *Z. nevadensis* and *M. natalensis*, differences in expansions of such genes may be related to divergent communication systems. Chemoperception genes mainly comprise four families: Odorant receptors (ORs), gustatory receptors (GRs), ionotropic receptors (IRs), and odorant binding proteins. ORs mostly control for the specificity and sensitivity of insect olfaction. GRs are primarily involved in contact chemoperception and IRs belong to a recently discovered gene family for olfaction and gustation in *Drosophila* (Benton et al., 2009; Grosjean et al., 2011; Rytz et al., 2013). Odorant binding proteins primarily shuttle such compounds through the hydrophilic environment of the sensory lymph to the receptors.

The IR family is most consistently expanded in *Z. nevadensis*, representing the highest known value in insects (Terrapon et al., 2014). This IR number was between 4 and 10-fold higher in *Z. nevadensis* than in eusocial Hymenopterans, but the 80 intact GR genes remained within the overall range of 10–97 known from ants and honeybees (Zhou et al., 2012). The number of OR genes in *Z. nevadensis* was between one third and one half of the numbers normally found in the ants (Zhou et al., 2012), consistent with the lifestyle of wood-dwelling termites likely requiring lower levels of olfactory communication.

Overall, we found termite-specific enrichment in all four major gene families relating to olfaction (Table S4). Most IPR enrichment occurred in the ionotropic glutamate receptors that include IR genes (21). Significant enrichment was also found in ORs (7), GRs (7 TM chemoreceptor: 7), and various odorant-binding proteins (9, 7, 5). Direct comparison between *Z. nevadensis* and *M. natalensis* (Table S6) showed that *Z. nevadensis* had significantly more genes related to chemical communication than *M. natalensis* (Table S7). However, chemoperception genes are notoriously difficult to assemble and annotate (Terrapon et al., 2014), so this difference should be considered with caution, also because these genes were manually annotated in *Z. nevadensis* (with support from antennal RNAseq data), but automatically in *M. natalensis*. More work will therefore be needed before solid conclusions on the relative role of ORs in different termite species can be drawn.

#### **IMMUNE DEFENSES**

Both termite species live in potentially pathogen-rich habitats. *Z. nevadensis* nests in decaying wood with abundant fungal growth that has probably selected for intensive allogrooming behaviors (Korb et al., 2012a). Also *M. natalensis* is potentially exposed to many pathogens both from its soil-nesting habitat and across its foraging range. *Macrotermes* species are known to protect their *Termitomyces* fungal symbiont from being overgrown by other fungi (Nobre et al., 2011) and termite-specific antimicrobial peptides (AMPs) have been described in another genus of fungus-growing termites (Lamberty et al., 2001).

Relative to ants and other insects, we did not find enrichments for immune defense genes in the two termite genomes and neither were there substantial differences between the two termite genomes (Tables S4, S6). All of the immune-related pathways, including pattern recognition, signaling, and gene regulation (as described for *Drosophila melanogaster* and other insects; Hoffmann, 2003; Hultmark, 2003; Schmid-Hempel, 2005) are present in both termite genomes (Table S8). Only two differences are noteworthy (**Table 3**). First, *Z. nevadensis* has 6 gram-negative binding proteins (GNBPs), whereas only four of these were recovered in *M. natalensis*. These four GNBPs are all termite-specific (**Figure 4**) and some of them were previously shown to be under positive selection in several *Nasutitermes* species, especially in species with arboreal nests (Bulmer and Crozier, 2006). The *Macrotermes* genome seems to lack the insect-typical GNBP duplicate and one GNBP gene that has so far only been found in *Z. nevadensis* (**Figure 4**). Second, while AMPs were not enriched in either termite genome (Table S4), their identities were completely different with *Z. nevadensis* having 2 AMPs and *M. natalensis* having 3 other AMPs (**Table 3**, Table S8). *M. natalensis* has a termite-specific defensin-like gene termicin, a category of genes that seem to have duplicated repeatedly during the radiation of *Nasutitermes* termites (Bulmer and Crozier, 2004). After duplication, one copy seems to often be under strong selection, while the other evolves toward neutrality (Bulmer and Crozier, 2004; Bulmer et al., 2010). Also in the soil-foraging *Reticulitermes* species these genes seem to be under positive selection (Bulmer et al., 2010).

In contrast to other insects where AMP production is normally induced, these genes seem to be constitutively expressed in fungus-growing termites, as has been shown for *Pseudacanthotermes spiniger* (Lamberty et al., 2001), which might be an adaptation to protect the symbiont against competing fungi. Termicin and other defensins (Table S8) were absent in *Z. nevadensis* but this species has GNBPs that are differentially expressed between castes (Terrapon et al., 2014) and may thus serve a similar function in protecting the nest from fungal infections. For the arboreal nesting termite *Nasutitermes corniger* it has been shown that GNBP2 has (1,3)-glucanase effector activity and functions as an antifungal agent (Bulmer et al., 2009). It is incorporated in the nest building material, where it cleaves and releases pathogenic components while priming termites for improved antimicrobial defense (Bulmer et al., 2009). Such a defensive strategy is likely to be most effective for termites with closed nests, consistent with positive selection on GNBP being most pronounced in *Nasutitermes* that live in arboreal nests (Bulmer and Crozier, 2006). Hence, antifungal strategies might differ in termites with different habitats, with GNBPs and termicin possibly playing complementary roles. This is supported by the fact that GNBPs in subterranean, foraging *Reticulitermes* species evolve neutrally while termicin was shown to have been under strong positive selection in these species (**Table 3**).

We can reject the possible alternative hypothesis that different defense strategies are linked to the gut symbionts that need different defense strategies to protect the symbiotic partner. As lower termites harbor protists as well as bacteria, while higher termites only have bacteria, we would then have expected higher termites having more AMPs and lower termites more GNBPs, but this is not the case because lower *Reticulitermes* termites have positively selected termicins. If there is an association between nesting habit and defense strategy, we expect that GNBPs are under positive selection in other wood-dwelling termites, and termicins are selected in soil-foraging termites. Additional genomic data, particularly for wood-dwelling termites, would be needed to validate this hypothesis.

Reduced numbers of immune defense genes were found in ants and the honeybee (Evans et al., 2006; Gadau et al., 2012) but also here there seems to be selection on some of the AMP genes. Similar to termicin, positive selection was detected on defensin in ants (Viljakainen and Pamilo, 2008), but this gene was not overexpressed after experimental fungal infections of leafcutting ant colonies, whereas two other AMPs were (Yek et al., 2013). This contrasts with dipterans (*Drosophila* and *Anopheles*) for which no evidence was found for positive selection on any AMPs (Sackton et al., 2007; Simard et al., 2007), but instead for immune recognition and signaling proteins (Schlenke and Begun, 2003; Jiggins and Kim, 2005; Sackton et al., 2007). This provides further support for the hypothesis that social insects have responded differently to selection pressure caused by microbial pathogens than solitary insects (Viljakainen and Pamilo, 2008).

#### **MATING BIOLOGY**

Compared to *M. natalensis*, the *Z. nevadensis* genome is enriched in genes that are related to male fertility/spermatogenesis (e.g., *KLHL10*) (**Table 4**, Table S7). This suggests that the co-expansion (and co-expression) of these genes in *Z. nevadensis* is not typical for termite sociality but rather taxon-specific. It might be linked to the seasonal reproduction of this temperate zone species where spermatogenesis is cyclically switched on and off, which contrasts with tropical *Macrotermes* males that produce offspring all year round. However, some members of two spermatogenesis-related gene families, seven-in-absentia (SINA) proteins and α-tubulins, do not show *Z. nevadensis*-specific expansions.

An alternative evolutionary explanation could be that males of wood-dwelling termites have low but consistent probabilities to face sperm competition when neighboring colonies merge after colony foundation. Such mergers are impossible in foraging termites where unrelated males never compete for inseminating the same queen (Boomsma, 2013). This hypothesis would predict no difference between temperate and tropical wood-dwelling termites, but a series of termite genomes will be needed to test these contentions.

*GNBPs and termicins might serve complementary roles in fungal defense in termites. GNBPs might be more important in species with closed nests, whereas termicins seem to be under strong positive selection in foraging termites with subterranean nests. ?, unknown.*

**replicates) after alignment of the peptide sequences in ClustalW2.**

#### **SYMBIOSIS**

The ancestral termite gut microbiota was derived from a cockroach ancestor, but major subsequent changes occurred, most notably when the higher termites evolved (Dietrich et al., 2014).

The guts of the wood-dwelling termites are dominated by protists that appear to be primarily adapted to break down wood (Cleveland, 1923; Brugerolle and Radek, 2006), with complementary roles of bacteria that are often symbiotic with the gut-flagellates (Dietrich et al., 2014). The common ancestor of the evolutionarily derived Termitidae lost these flagellate symbionts so their gut microbiotas became dominated by bacteria, which may have facilitated their dietary diversification (Brune and Ohkuma, 2011; Dietrich et al., 2014). The single origin of fungiculture by the Macrotermitinae led to *Termitomyces* taking over primary plant decomposition and the gut microbiota shifting phylogenetically and functionally to perform complementary roles (Liu et al., 2013; Dietrich et al., 2014; Otani et al., 2014; Poulsen et al., 2014).

Changes in symbiont associations are tightly associated with termite life styles (for a recent review on termite gut symbionts, see Brune, 2014), but this may hardly induce structural genomic changes in the termite hosts, consistent with the similar gene repertoires for plant biomass decomposition found in the two termite genomes (Poulsen et al., 2014). A comparison of carbohydrate-active enzyme (CAZy) profiles of the two termite species showed a reduction in the absolute number of glycoside hydrolase enzymes (85) in *M. natalensis* compared to *Z. nevadensis* (97) (**Table 5**), but very similar relative abundances of specific enzyme families (Poulsen et al., 2014). Profile similarities suggest that plant-biomass decomposition genes may be ancestrally conserved across the termites, but additional termite genomes are needed to shed light on this. Such additional genomic work will need to be accompanied by enzyme function validations to test whether differences in absolute numbers reflect changes in the relative importance of termite-derived enzymes.

#### **CONCLUSION**

Despite the striking differences in social complexity between *Z. nevadensis* and *M. natalensis* we did not find major differences in gene composition. The gene families underlying chemical communication seem not to be expanded in the more complex fungus-growing termite compared to *Z. nevadensis*. The major differences between the two termite genomes are related to genome architecture and the presence of transposons that can explain the much larger genome size of *M. natalensis*. Whether these ancestrally selfish elements have been domesticated for functions related to the increased social complexity of *M. natalensis* needs further work.



Our comparison allowed us to generate hypotheses that can be tested with functional genomic studies and with more advanced comparative analyses as more termite genomes become available.

We have highlighted the contours of further testable predictions concerning TE number and genome size, male fertility, and habitat-specific disease pressure. For any next termite genome to be sequenced (**Figure 1**), authors should ask questions like: (1) Is the habitat of this (e.g., drywood) termite more disease-ridden than the habitat of a comparable dampwood termite such as *Z. nevadensis*? (2) Would this new tropical wood-dwelling termite have similar gene family expansions for male fertility as *Z. nevadensis*? (3) Has this arboreal higher (e.g., *Nasutitermes*) termite lost specific immune defenses that match the disease pressure of its habitat and is it equally burdened by TEs as *Macrotermes natalensis*?

While two genomes are a major achievement in some sense, these genomes also leave us with insufficient resolution to move much beyond the crude comparisons that we offer in this paper, because *Z. nevadensis* and *M. natalensis* differ in too many evolutionary and ecological factors (**Table 1**). It has also become clear from comparative ant genomics that gene expression mechanisms may be more informative than structural gene differences (Simola et al., 2013). Finally, apart from obtaining more termite genomes and population genomic studies on gene expression and signatures of selection, it will also be crucially important to obtain a *Cryptocercus* cockroach sister lineage genome and more distant outgroup genomes for non-social hemimetabolous insects. Many surprises will likely be waiting in the wings, as both the pea aphid and the body louse genomes turned out to be unusual because of the specialized feeding habits of these insects with or without symbionts.

#### **ACKNOWLEDGMENTS**

Judith Korb was supported by a research grant from the German Science Foundation (DFG; KO1895/6), Michael Poulsen by a STENO grant from The Danish Council for Independent Research Natural Sciences, Jacobus J. Boomsma by a Danish National Research Foundation grant (DNRF57), Guojie Zhang by a Marie Curie International Incoming Fellowship (300837), and Jürgen Liebig by the Agriculture and Food Research Initiative (2007-35302-18172 to Jürgen Liebig and Colin S. Brent). We thank the three referees for helpful comments and the editors Jürgen Gadau and Greg Hunt for inviting us to contribute to this special journal issue.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00009/abstract

#### **REFERENCES**


a pest control strategy. *Proc. Natl. Acad. Sci. U.S.A.* 106, 12652–12657. doi: 10.1073/pnas.0904063106


retrotransposon created the paralogous bucentaur gene (bcnt) in the ancestral ruminant *Mol. Biol. Evol.* 23, 798–806. doi: 10.1093/molbev/msj088


Walsh, A. M., Kortschak, R. D., Gardner, M. G., Bertozzi, T., and Adelson, D. L. (2013). Widespread horizontal transfer of retrotransposons. *Proc. Natl. Acad. Sci. U.S.A.* 110, 1012–1016. doi: 10.1073/pnas.1205856110

Weil, T., Hoffmann, K., Kroiss, J., Strohm, E., and Korb, J. (2009). Scent of a queen – cuticular hydrocarbons specific for female reproductives in lower termites. *Naturwissenschaften* 96, 315–319. doi: 10.1007/s00114-008-0475-8

Werren, J. H., Richards, S., Desjardins, C. A., Niehuis, O., Gadau, J., Colbourne, J. K., et al. (2010). Functional and evolutionary insights from the genomes of three parasitoid *Nasonia* species. *Science* 327, 343–348. doi: 10.1126/science.1178028


**Conflict of Interest Statement:** The Associate Editor Jürgen Rudolf Gadau declares that, despite being affiliated with the same institute and having collaborated with the author Jürgen Liebig, the review process was handled objectively. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 September 2014; paper pending published: 25 November 2014; accepted: 09 January 2015; published online: 04 March 2015.*

*Citation: Korb J, Poulsen M, Hu H, Li C, Boomsma JJ, Zhang G and Liebig J (2015) A genomic comparison of two termites with different social complexity. Front. Genet. 6:9. doi: 10.3389/fgene.2015.00009*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Korb, Poulsen, Hu, Li, Boomsma, Zhang and Liebig. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Omic research in termites: an overview and a roadmap

#### *Michael E. Scharf\**

*Department of Entomology, Purdue University, West Lafayette, IN, USA*

Many recent breakthroughs in our understanding of termite biology have been facilitated by "omics" research. Omic science seeks to collectively catalog, quantify, and characterize pools of biological molecules that translate into structure, function, and life processes of an organism. Biological molecules in this context include genomic DNA, messenger RNA, proteins, and other biochemicals. Other permutations of omics that apply to termites include sociogenomics, which seeks to define social life in molecular terms (e.g., behavior, sociality, physiology, symbiosis, etc.) and digestomics, which seeks to define the collective pool of host and symbiont genes that collaborate to achieve high-efficiency lignocellulose digestion in the termite gut. This review covers a wide spectrum of termite omic studies from the past 15 years. Topics covered include a summary of terminology, the various kinds of omic efforts that have been undertaken, what has been revealed, and to a degree, what the results mean. Although recent omic efforts have contributed to a better understanding of many facets of termite and symbiont biology, and have created important new resources for many species, significant knowledge gaps still remain. Crossing these gaps can best be done by applying new omic resources within multi-dimensional (i.e., functional, translational, and applied) research programs.

Keywords: holobiome, digestome, sociogenomics, symbiosis, metabolomics, DNA methylation, sociobiology, socioevolution

*Edited by: Juergen Rudolf Gadau, Arizona State University, USA*

*Reviewed by: Judith Korb, University of Freiburg, Germany; Edward L. Vargo, North Carolina State University, USA*

*\*Correspondence: Michael E. Scharf, Department of Entomology, Purdue University, 901 West State Street, West Lafayette, IN 47907-2089, USA mscharf@purdue.edu*

*Specialty section: This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

*Received: 02 September 2014 Accepted: 13 February 2015 Published: 13 March 2015*

*Citation: Scharf ME (2015) Omic research in termites: an overview and a roadmap. Front. Genet. 6:76. doi: 10.3389/fgene.2015.00076*

#### Introduction

#### Overview and Terminology

In a broad sense, the underlying goals of omic<sup>1</sup> science are to catalog, quantify, and characterize pools of biological molecules that translate into structure, function, and life processes of an organism or environment. The types of biological molecules receiving focus in omics<sup>2</sup> include genomic DNA, messenger RNA (mRNA), protein, and metabolites (**Figure 1**). DNA is the focus of genomics and methylomics, mRNA of transcriptomics, and protein of proteomics. Genomics, methylomics, and transcriptomics rely on nucleic acid sequencing, whereas proteomics utilizes peptide sequencing procedures. By contrast, metabolomics is rooted more in analytical chemistry and focuses on biochemicals, metabolites, or pathways. Another relevant omic approach is the cataloging of bacterial and protist symbionts using high-throughput 16S and 18S rRNA sequencing.

<sup>1</sup>The singular term "omic" is used as an adjective in this review.

<sup>2</sup>The plural term "omics" is used as a noun.

Termite omic research has focused on the host termite, individual gut microbial symbionts or entire populations of gut microbes. In the latter case, these "meta" analyses focusing broadly on collective microbiota occurring in the gut microenvironment have been popular, particularly with microbiologists specializing in termite intestinal microbiology. Although it presents significant bioinformatic challenges, a more inclusive approach that considers host and symbionts together as a single functional unit is the best approach for appreciating the full functional capacity of termites. A fundamental advantage of omic research over more traditional organismal research is that it enables direct mechanistic insights into termite and symbiont physiology and biochemistry. The use of omic technologies has led to new insights into behavior, social structure, digestion, host–symbiont and symbiont–symbiont interactions, and many other aspects of termite biology. However, as addressed throughout this review, omic science is limited in its ability to define biological function.

#### Termite Symbiosis and the Holobiont Concept

Termites are perhaps best known for their symbiotic associations with gut microbes (König et al., 2013; Brune, 2014) that are often linked to digestive processes, although lignocellulose digestion is not mediated entirely by gut microbes (Watanabe and Tokuda, 2010; **Figure 2A**). The more ancestral lower termites have tri-partite symbioses that include host, bacteria and protozoa; whereas in higher termites, symbiosis has been reduced to a two-way association between host and bacteria (but some higher termites also maintain ecto-symbiotic associations with fungi; Brune, 2014). The host component of termite symbiotic systems adds substantially to the digestive process both in terms of contributing enzymes and maintaining a favorable gut microenvironment for symbiosis and digestion to occur (Watanabe et al., 1998; Tartar et al., 2009; Scharf et al., 2011; Sethi et al., 2013a; Tokuda et al., 2014). Because of the high degree of interplay that occurs between the termite host and gut symbionts, a key idea moving forward will be to consider termites from the perspective of the "holobiont" (a single functional unit in which host and symbionts are physiologically tightly connected). Omic research has enabled a multifaceted systemic understanding of gut digestomes that is central to understanding the termite holobiome from an applied perspective (Scharf, 2015).

#### Sociogenomics and Digestomics

The term *sociogenomics* was coined to describe the use of omic approaches for defining social life in molecular terms, which began with studies on the honey bee, *Apis mellifera* (Robinson et al., 2005). A parallel idea cited as rationale for many omic studies in social insects, including termites, is that solitary genes and traits were likely co-opted for new functions as solitary ancestors transitioned to social lifestyles (West-Eberhard, 2003; Nelson et al., 2007). Understanding such traits is essential for understanding termite social evolution (Miura and Scharf, 2011; **Figure 2B**). Another term used specifically in relation to digestive research is *digestomics*, which was coined to describe the collective pool of host and symbiont genes that collaborate to achieve high-efficiency lignocellulose digestion in the termite gut (Scharf and Tartar, 2008; Tartar et al., 2009; **Figure 2A**). Such terminology is useful because of the large number of symbionts that occupy termite guts and collaborate with the host in lignocellulose digestion. A related term is *termitosphere*, which is the full complement of gut and ectosymbiotic (nest) microbes present in termites, termite colonies, and their surrounding nest structures (Roose-Amsaleg et al., 2004; Bastien et al., 2013). Whether in relation to social, solitary or symbiont genes, proteins or other biomolecules, sociogenomic and digestomic research in termites has created an explosion of new sequence data.

#### Omic Studies in Termites: *What has been Done?*

Based on a recent literature survey (**Table 1**), at the time of writing this article around 70 papers had been published describing omic efforts in termite systems. These studies include all the themes introduced above, as well as microbial 16S and 18S surveys.

#### Taxonomic Distribution

In total, 82 termite species have been investigated using various omic approaches, with greater representation by lower than higher termites (72 vs. 28%). Among lower termites the top genera studied are important pest groups (*Reticulitermes* and *Coptotermes*), followed by non-pests from *Hodotermopsis*, *Mastotermes,* and *Cryptotermes*. Among higher termite genera, *Nasutitermes* dominates, followed by *Odontotermes*, *Trinervitermes,* and several other minor groups. Two termite genome sequences have now been published, from the lower termite *Zootermopsis nevadensis* and the higher termite *Macrotermes natalensis* (see below).

#### Host vs. Symbiont Investigation

Of the various omic studies to date considering symbiosis and symbiotic partnerships in termite systems, the majority have taken an exclusively symbiont-oriented approach (>60%), whereas a minority have considered the host termite separately (<20%). The remainder have considered host and symbiont together (∼20%). In the latter category of host and symbiont combined, some studies have been a case of "accidental metatranscriptomics" (because protist symbionts have polyadenylated transcripts that are represented in cDNA libraries along with host transcripts; e.g., Scharf et al., 2003, 2005; Steller et al., 2010), but others have been deliberate metatranscriptomic studies (e.g., Tartar et al., 2009; Raychoudhury et al., 2013; Sen et al., 2013). The greater emphasis on gut symbiota compared to the host termite is likely because of the stereotypically well-recognized presence of gut microbes in termites.

#### Experimental Approaches and Types of Sequencing

In terms of experimental approaches taken, there has been an approximately equal split between descriptive and hypothesis-driven studies. Regarding the types of sequencing performed, transcriptomics and metatranscriptomics have been the dominant approaches (25 and 21% of studies), followed by microbial surveys for cataloging purposes (23%). The transcriptomic approaches used can be further divided into different methodologies such as cDNA library sequencing (Sanger, pyrosequencing or Illumina RNA-seq) and microarrays. Other efforts have targeted symbiont metagenomes (15%), symbiont or termite genomes (9%), proteomes (3%), and DNA methylomes (3%).

### Omic Studies in Termites: *What has been Revealed?*

#### Genomics

#### Host Termite Genomes

At present only two termite genome sequences are available (**Table 1**); one from the lower termite *Zootermopsis nevadensis* (Terrapon et al., 2014) and one from the higher termite *M. natalensis* (Poulsen et al., 2014). *Z. nevadensis* was selected for sequencing based on its small genome size of 562 Mb relative to other termites, most of which are over 1000 Mb (Koshikawa et al., 2008). The *Z. nevadensis* sequencing approach involved shotgun genome sequencing of genomic DNA from symbiont-free soldier heads (*n* = 50 and 150 heads for 2 and 20 kb libraries, respectively). The transcriptomes of castes and various phenotypes were also sequenced for both gene prediction and comparative transcriptomic purposes. Transcriptome data were also used to search for DNA methylation machinery and methylation/epigenetic differences among castes and developmental stages.

The *Z. nevadensis* genome provided the first insights into how termites differ at the genome level from their eusocial counterparts in the order Hymenoptera, which evolved sociality independently. For making socio-evolutionary comparisons, emphasis was placed on gene family expansions, male fertility, chemoreception, immunity, polyphenism/division of labor, and potential epigenetic caste regulation. An expansion of genes related to male fertility and upregulated gene expression in male reproductives are consistent with differences in mating biology between termites and Hymenoptera. Regarding chemoreception, divergent numbers of genes and gene families relative to Hymenoptera were identified, as were variations in chemoreception gene expression among castes. Regarding caste polyphenism and division of labor, caste-associated gene expression profiles were readily identifiable. Key caste-regulatory and reproduction-associated genes identified through preceding work (e.g., hexamerins, vitellogenins, and CYP genes) were further defined and verified as gene families at the genomic level. Interestingly, there are 76 cytochrome P450 genes in the *Z. nevadensis* genome, which is nearly twice as many as encoded by the honey bee genome (Honey Bee Genome Sequencing Consortium, 2006). Lastly, DNA methylation signatures and patterns of alternative splicing provided some evidence to suggest epigenetic caste regulation (see later).

TABLE 1 | A comprehensive literature summary of termite omic research, organized by approaches taken.

The *M. natalensis* sequencing considered not only the host genome, but also the entire tri-partite system of this higher fungus-growing termite. This included the 1.3 Gb host genome, the 84 Mb genome of the *Termitomyces* sp. fungal symbiont and 816 Mb of prokaryotic gut metagenome from major workers, minor soldiers, and queens. Emphasis was placed mostly on cellulose digestion, which revealed a rich complement of glycosyl hydrolases from host, fungi, and gut microbes that likely collaborate in lignocellulose digestion. Another major finding was that gut microbiota composition is reduced by over 50% in queens relative to workers and soldiers, suggesting that queen gut microbiota undergo substantial compositional changes during colony founding, which points toward the local environment or other external factors as sources of microbiota as incipient colonies grow and age. Moving forward, the *Z. nevadensis* and *M. natalensis* genomes will be important resources for termitologists, and will also provide important scaffolds for assembly of additional termite genomes that will facilitate study of genes related to many evolutionary and biological processes.

#### Individual Symbiont Genomes

Five individual symbiont genomes have been sequenced (**Table 1**), with several others published or in progress since the writing of this article. No protist genomes have yet been sequenced. Two bacterial endosymbionts of hindgut protists from *Coptotermes formosanus* and *Reticulitermes speratus* (phylum Elusimicrobia or "TG1") were the first symbiont genomes sequenced; they were obtained from isolated individual cells after whole-genome amplification (Hongoh et al., 2008a,b). No lignocellulase genes were identified; however, both genomes encoded capabilities to fix nitrogen, recycle host nitrogen wastes for amino acid and cofactor biosynthesis, and import glucose and xylose as energy and carbon sources. The next symbiont genomes were from gut bacteria in the phyla Verrucomicrobia and Fusobacteria, from the termites *Reticulitermes flavipes* and *R. lucifugus* (Harmon-Smith et al., 2010; Isanapong et al., 2012). These genomes were from culturable isolates and were found to encode genes related to cellulose degradation and nitrogen fixation. Another example is the genome of an obligate fat body endosymbiont *Blattabacterium* from the basal termite *Mastotermes darwiniensis* (Sabree et al., 2012). This bacterium displays a reduction in genome size and loss of genes required for amino acid production relative to free-living gut bacteria, which is consistent with its ability to recycle nitrogenous wastes and its role as a co-evolved endosymbiotic partner of the host termite.

#### Symbiont Metagenomes

At the time of writing this article, at least 12 prokaryotic metagenomes had been partially sequenced (**Table 1**). Most metagenome publications have reported on lignocellulase identification from genome sequences of gut bacteria that selectively grew on lignocellulose media (Liu et al., 2011; Mattéotti et al., 2011a,b, 2012; Nimchua et al., 2012; Rashamuse et al., 2012, 2014; Wang et al., 2012). Another study used targeted xylanase screening from gut and ectosymbiotic fungi-associated bacteria of the higher termite *Pseudacanthotermes militaris* (Bastien et al., 2013). Other studies took broader approaches to sequence from gut bacterial communities of higher termites. By combining metagenome sequencing with 16S surveys and metatranscriptomics, these studies revealed new information on bacterial cellulase diversity from termites with different symbiosis strategies (i.e., with and without fungal ectosymbionts; Warnecke et al., 2007; Liu et al., 2013) and from different feeding guilds (dung vs. wood; He et al., 2013). While these studies provided a wealth of new high-impact information on bacterial symbionts, they did not consider how symbionts from the gut and/or nest termitosphere collaborate with or complement the host termite.

#### Transcriptomics

#### Host Transcriptome

Around 15 transcriptomic studies to date have focused on physiological processes or tissues in the host termite (**Table 1**). Early studies looked for caste-biased gene expression, but the approaches employed had low resolving power and typically revealed only small numbers of differentially expressed genes. These studies mainly used subtractive hybridizations or cDNA "macro" arrays (reviewed by Miura and Scharf, 2011). Also, these early studies in lower termites often fell into the category of "accidental metatranscriptomics" as described earlier. The majority of focus in termite transcriptomic work has been on differences among castes or during caste differentiation (reviewed by Miura and Scharf, 2011). Mainly, newer studies are considered here.

Because of the importance of juvenile hormone (JH) to soldier caste differentiation and the reliability of JH treatment for inducing soldier caste differentiation, continuing focus has been placed on this transition in hypothesis-driven studies that combine JH assays with transcriptomics (e.g., Cornette et al., 2013; Sen et al., 2013). Caste-regulatory primer pheromones and the social environment have also been studied in the same context (Tarver et al., 2010; Sen et al., 2013). Other studies have included tissue-directed subtractive hybridizations, random/*de novo* cDNA library sequencing and/or cDNA oligonucleotide microarrays to reveal caste-biased gene expression (Weil et al., 2009; Ishikawa et al., 2010; Leonardo et al., 2011; Hojo et al., 2012; Huang et al., 2012; Husseneder et al., 2012; Terrapon et al., 2014). The over-arching themes emerging from this work include caste and morphogenesis-associated gene expression, endocrine signaling, vitellogenesis, reproduction-related processes, and regulatory mechanisms that maintain juvenile worker states in lower termites.

The immune response is another aspect of host termite physiology investigated through transcriptomics. Four studies have revealed responses to immune challenges by both stereotypical and unprecedented immune-responsive genes (Thompson et al., 2003; Yuki et al., 2008; Gao et al., 2012; Hussain et al., 2013). Finally, an emerging theme has been to investigate pathogen-xenobiotic interactions at the transcriptome level (Husseneder and Simms, 2014; Sen et al., 2015).

#### Symbiont-Host Metatranscriptomes

In addition to host-targeted studies noted above, other studies have considered symbiont or host-symbiont metatranscriptome composition (**Table 1**). Early examples in this category showed worker-biased expression of protist cellulases (Scharf et al., 2003) and differential expression of symbiont cellulases between dispersing and non-dispersing adult reproductives (Scharf et al., 2005). Subsequent studies focused on metatranscriptome composition of bacteria, protist and/or fungal symbionts, mostly for the purpose of identifying digestive cellulases (reviewed by Scharf and Tartar, 2008). Recent work has probed deeper into gut metatranscriptomes by taking advantage of both traditional and next-generation sequencing technology (Todaka et al., 2010; Rosenthal et al., 2011; Xie et al., 2012; Zhang et al., 2012; He et al., 2013). Other work has sought to partition host and symbiont digestive contributions and identify candidate enzymes expressed specifically in response to wood (i.e., complex lignocellulose), cellulose and lignin feeding (Tartar et al., 2009; Raychoudhury et al., 2013; Sethi et al., 2013a).

One microarray study investigated gut metatranscriptome changes in response to JH, primer pheromones and socio-environmental conditions, suggesting interesting linkages between gut symbiota and caste differentiation (Sen et al., 2013). Another microarray study investigated host and symbiont gene expression in response to pathogen and nicotinoid-insecticide challenges, providing new insights into immunological roles played by bacterial and protist gut symbionts in defending against invading fungal and bacterial pathogens (Sen et al., 2015), building on the ideas of extended disease resistance as conferred by fecal nest bacteria (Chouvenc et al., 2013) and gut microbiota (Rosengaus et al., 2014).

#### Proteomics

Proteomics (**Table 1**) is important for validating transcriptome studies, particularly for determining whether a gene's presence, its transcription, and its translation are proportional. For example, proteomic studies in a higher termite were unable to identify most of the bacterial cellulase proteins predicted by metagenome sequencing (Warnecke et al., 2007; Burnum et al., 2011). In contrast, proteomic studies in lower termites were able to identify both protist cellulases and other host lignocellulases initially identified via metatranscriptome sequencing (Todaka et al., 2007; Sethi et al., 2013a). Another study investigated proteins present in labial gland secretions of 12 lower and higher termite species, identifying endogenous GHF9 cellulases as dominant components of worker labial gland secretions in most species investigated (Sillam-Dussès et al., 2012). Another study used proteomics to catalog gut microbial communities, but with limited resolution (Bauwens et al., 2013). Clearly, more proteomic efforts are needed to (1) resolve issues of congruency between nucleic acid and protein sequencing approaches, and (2) verify open reading frames predicted by metagenome and transcriptome sequencing.

#### DNA Methylomes

Four studies to date have looked at methylation signatures across termite castes with somewhat differing results. A seminal study used a methylation-targeted amplification fragment length polymorphism (AFLP) approach in *Coptotermes lacteus* to look for methylation signature differences among castes (Lo et al., 2012). Evidence of methylation was found, but no significant caste-associated methylation patterns were identified.

A subsequent study was done *in silico* using database sequences from *R. flavipes* and *C. formosanus* (Glastad et al., 2013). In this study and the two described below, transcriptome data were mined to determine the specific distribution of CpG dinucleotides (i.e., 5′–3′ cytosine followed by guanine), in order to predict DNA methylation levels *in silico*. Evidence of DNA methylation machinery and methylation signatures was found at high levels among expressed genes. Results also suggested that DNA methylation in *R. flavipes* is targeted to genes with ubiquitous (rather than differential) expression among castes and morphs. A third study examined host transcriptomes of three termite species that included two lower (*Hodotermopsis sjostedti*, *R. speratus*) and one higher termite (*Nasutitermes takasagoensis*; Hayashi et al., 2013). Pyrosequencing was done in combination with 69 caste and phenotypic libraries from the three termite species. Sequence analysis revealed that DNA methyltransferases potentially responsible for DNA methylation were present in each species, and verified the presence of methylation signatures. However, only limited evidence of caste-associated methylation profiles was detectable across the three species.

Finally, DNA methylation was assessed in *Z. nevadensis* as part of genome and transcriptome sequencing efforts (Terrapon et al., 2014). Transcriptome data were used to determine the specific distribution of CpG dinucleotides, in order to make *in silico* predictions of DNA methylation levels and explore for epigenetic differences among castes. In addition to verifying the presence of genes that encode for DNA methylation machinery (i.e., DNA methyltransferases 1 and 3), results showed greater methylation of genes rather than intergenic DNA, and a greater presence in introns than exons. This evidence, along with findings that alternatively spliced genes have greater degrees of methylation, suggests intronic methylation may impact alternative splicing.
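The *in silico* predictions described above rest on a simple sequence statistic. Because methylated cytosines tend to mutate to thymine over evolutionary time, heavily methylated genes gradually lose CpG dinucleotides, so a low ratio of observed to expected CpG frequency (CpG o/e) is commonly read as a signature of historical methylation. The following minimal Python sketch computes that ratio for a single sequence. The function name is hypothetical, and the exact normalization and any thresholds used in the cited studies are not given here and may differ.

```python
import math


def cpg_obs_exp(seq: str) -> float:
    """Return the CpG observed/expected ratio for one nucleotide sequence.

    Assumed definition (illustrative, not taken from the cited studies):
        o/e = f(CpG) / (f(C) * f(G))
    where each f is a count divided by the sequence length.
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    if n == 0 or c == 0 or g == 0:
        # The ratio is undefined without both C and G present.
        return math.nan
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    return (cpg / n) / ((c / n) * (g / n))


# Values near 1 indicate no CpG depletion; values well below 1 are
# consistent with (but do not prove) a history of methylation.
print(cpg_obs_exp("CCGG"))  # 1.0
print(cpg_obs_exp("CAGT"))  # 0.0
```

A ratio computed this way is only an indirect, evolutionary proxy: it cannot show whether a given gene is methylated in a given caste, which is why direct methylation assays remain necessary.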

While it is clear that DNA methylation exists in termites, results so far have been inconclusive regarding epigenetic caste regulation. As concluded previously in relation to genetic caste determination (Vargo and Husseneder, 2009), the field of epigenetic caste regulation is in its infancy and epigenetic phenomena may or may not be relevant in natural colonies. More importantly, *in silico* methylation studies can only suggest that methylation may exist and which genes might be differentially methylated. Functional/translational research will be required to verify whether or not such genes truly are methylated, as well as the functions of those genes.

#### Metabolomics

Metabolomic studies are useful for assessing *in situ* processes, both as an exploratory approach and for functional/translational studies to verify nucleotide sequences. Soldier defensive secretions previously received much attention in this respect (Prestwich, 1984; Nelson et al., 2001). A more recent study investigated chemical components of labial gland secretions in soldier and worker termites from 7 lower and 1 higher termite (Sillam-Dussès et al., 2012). This study confirmed hydroquinone and other glucose and benzene-linked compounds as common labial gland secretions among most species.

Other metabolomic studies have focused on lignocellulose digestion. One main question addressed has been: *does lignin digestion or modification occur during passage through the termite gut*? Several studies over the past 25 years have addressed this question (reviewed by Ni and Tokuda, 2013) but recent metabolomic studies have been particularly informative (Geib et al., 2008; Ke et al., 2011, 2013). In general, findings are consistent regarding modification of lignin during passage through the gut, but evidence of actual lignin depolymerization has been more elusive. One possible reason for this could relate to insufficient detection procedures. Another possibility is that lignin-ether bonds, broken during depolymerization, only remain in this state for a short time and thus appear as intact lignin in frass. The induction of numerous antioxidant and detoxification enzymes by lignin feeding, as well as increased saccharification in the presence of lignin-associated phenoloxidases, supports the latter possibility (Sethi et al., 2013a). Despite convincing evidence of lignin modification during passage through the termite gut, and related omic studies revealing lignin-associated changes in host oxidative enzymatic machinery, the topic of lignin digestion/modification in termite guts remains contentious (Brune, 2014).

Another aspect of termite metabolomic research considers cellulose digestion and relative contributions of host and symbiont to this process. A recent metabolomic study investigated *in situ* digestion of 13C-labeled crystalline cellulose by *H. sjostedti* (Tokuda et al., 2014). Novel insights obtained related to both cellulose digestion and nitrogen metabolism. The results not only confirmed preceding work showing that endogenous cellulose digestion by the host is substantial, but also suggested other novel possibilities; for example (i) a significant digestive contribution by hindgut bacteria is phosphorolysis of cello-oligosaccharides to glucose-1-phosphate, and (ii) essential amino acid acquisition occurs via lysis of hindgut microbes obtained through proctodeal trophallaxis. The rapid buildup of glucose observed in the foregut agrees well with prior studies showing that host foregut cellulases can produce high levels of glucose directly from wood lignocellulose (Scharf et al., 2011; Sethi et al., 2013a,b). Additionally, higher glucose levels observed in the hindgut than other regions agrees with estimates that glucose release from lignocellulose is about 1/3 host and 2/3 symbiont (Scharf et al., 2011). However, since this study only focused on metabolite identification in gut tissue, it could not account for nutrients/metabolites transported out of the foregut and catabolized in other areas of the body.

#### Symbiont 16S and 18S Surveys

Bacterial 16S rRNA sequence surveys have been used extensively for cataloging bacteria and archaea (Wang and Qian, 2009), whereas 18S small subunit (SSU) rRNA surveys are just beginning to gain attention for cataloging protist symbionts (Tai and Keeling, 2013). Over 20 bacterial 16S surveys have been published to date using both cloning-dependent and -independent, high- and low-throughput approaches (**Table 1**). Highly variable species-level compositions have been obtained across the different termite species investigated, but, in general, six major bacterial phyla are represented across higher and lower termites: Bacteroidetes, Firmicutes, Spirochaetes, Proteobacteria, Fibrobacteres, and Elusimicrobia (Brune, 2014). Surveys conducted in parallel with higher-termite metagenome studies have been very informative for matching functional and taxonomic diversity (Warnecke et al., 2007; He et al., 2013); however, a study comparing multiple colonies through pyrosequencing of 16S amplicons found that bacterial compositions were different among colonies and likely influenced by local environment (Boucias et al., 2013). Additionally, 16S surveys revealed that lignocellulosic diet shifts have no short-term impacts on termite and cockroach microbiota composition (Sanyika et al., 2012; Boucias et al., 2013; Schauer et al., 2014). Another 16S survey of fungus-growing termites suggested a core microbiota of 42 genera that was shared among all nine termite species tested (Otani et al., 2014). This core microbiota was very different from other higher and lower termites, leading the authors to conclude the 42 common genera represent a core microbiota of fungus-growing termites. Conversely, since the termites were sampled from a limited geographic area it is possible that the core genera represent common microbes acquired from the local environment.
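The notion of a core microbiota used above amounts to a set intersection: the taxa detected in every host sampled. A minimal Python sketch follows, with hypothetical host and genus labels standing in for real survey data. The function name is also an assumption, and real analyses typically add abundance or prevalence thresholds that are omitted here.

```python
def core_taxa(profiles: dict[str, set[str]]) -> set[str]:
    """Return the taxa present in every host's profile (the shared 'core')."""
    if not profiles:
        return set()
    # Intersect the genus sets of all hosts.
    return set.intersection(*profiles.values())


# Hypothetical placeholder data: genera detected per host species.
survey = {
    "host_1": {"genus_A", "genus_B", "genus_C"},
    "host_2": {"genus_A", "genus_C", "genus_D"},
    "host_3": {"genus_A", "genus_C"},
}

print(sorted(core_taxa(survey)))  # ['genus_A', 'genus_C']
```

An intersection of this kind cannot by itself separate co-evolved symbionts from microbes that all sampled hosts acquired from a shared local environment, which is the caveat raised in the text.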

Compared with prokaryotic 16S surveys, few protist 18S SSU surveys have been conducted (**Table 1**). These studies, conducted using a combination of cloning-dependent and -independent approaches, have been transformative. Two studies provided new evidence to suggest greater protist symbiont diversity than originally indicated by traditional morphological identification (James et al., 2013; Tai et al., 2013). Two other studies used high-throughput 16S and 18S SSU sequencing to compare 24 lower termites with three wood-feeding cockroaches (Tai and Keeling, 2013; Tai et al., 2015). Like their predecessors, these studies found protist diversity to be higher than when estimated by morphology, and also that protist symbiont taxa tend to be highly endemic to a host genus, which is different from relationships between termite hosts and bacterial symbiota. These findings illustrate the significant opportunities that exist for development of high-throughput techniques for assessing protist symbiont communities and studying protist-bacterial symbiont relationships.

#### Needs and Opportunities

Termite omic research in the last 10–15 years has led to a new era of understanding for termite and symbiont biology. Omics has also enabled the development of new unparalleled resources (i.e., transcriptome, genome, proteome, metabolome, symbiont meta-omic, and symbiont rDNA) useful for moving ahead with targeted functional work. The stage is now set for making significant headway in many aspects of termite research, including, but not limited to digestion, symbiosis, caste differentiation, and social evolution. However, key needs and opportunities remain in specific areas that seem particularly relevant for filling in knowledge gaps and potentially leading to transformative, paradigm-shifting outcomes.

Having the *Z. nevadensis* and *M. natalensis* genomes available not only facilitates further study of genes related to a range of evolutionary and biological processes, but these resources also provide important scaffolds for assembly of additional lower and higher termite genomes. Once multiple termite genomes are available, this would certainly better inform our view of termite social evolution. On the topic of host-symbiont "hologenomes," sequencing more host genomes and symbiont metagenomes from the same termites concurrently (as recently done for *M. natalensis*), would provide unprecedented insights into the scope of interactions and synergies occurring in termite holobiomes. Such efforts could further reveal important differences between clades of higher and lower termites, leading to new evolutionary insights. Such datasets would also provide unmatched resources for advancing integrative sociogenomic, digestomic, termitosphere, and other research topics.

On the topic of proteomics, more studies are needed in species that have had genomes, transcriptomes, metagenomes, or metatranscriptomes sequenced. Combining proteomics with nucleic acid sequencing will better resolve gene prediction models and better test for congruency between transcription and translation profiles. On the topic of metabolomics, termite digestion remains an area much in need of metabolomic research focusing on how complex lignocellulose is broken down in termite guts and converted to energy. Also, tracking metabolites as they leave the gut and are utilized in the termite body would be very informative for testing hypotheses on the relative importance of nutrient flow into symbiont metabolic pathways.

On the topic of DNA methylomics, while it is now clear that DNA methylation happens in termites, results so far have been inconclusive regarding the role of DNA methylation in caste regulation. *In silico* methylation studies as performed can only suggest that methylation may exist and which genes are potentially differentially methylated. Functional and translational research is needed to understand the roles of such genes.

Substantial opportunities and needs still remain for 16S and 18S rRNA-based symbiont cataloging. Protist 18S SSU cataloging capabilities in particular have recently been developed, and can continue to improve provided that several conditions are met, such as: (1) appropriate primers can be developed, (2) statistically sound sampling regimes can be developed at biologically relevant scales, (3) single-cell microbiology and other data sources can be integrated, and (4) appropriate analytical tools can be developed (Tai and Keeling, 2013). This line of research has already begun to transform the view of protist diversity and co-evolution with host termites, but more studies are needed in different termite species with established omic resources.

Finally, regarding prokaryotic 16S surveys, much has already been done, but an important gap in knowledge is the extent to which environment influences bacterial microbiota composition. This is important information for understanding differences in behavior and physiology across the geographic range for a termite species, as well as potentially for limiting the extent to which generalizations can be made about the relative importance of individual microbes or core microbiota in gut communities.

#### Conclusion

This review has covered many aspects related to outcomes, findings and trends resulting from termite omic research. To date, omic research in diverse termite species has provided key insights into caste differentiation, digestion, pathogen defense and microbiomes, and most recently has provided two termite genome sequences. Termite omics has also created important tools and resources for conducting targeted, functional, translational, and applied research. However, these resources have only received limited attention to date for asking hypothesis-driven questions to elucidate the functional and evolutionary significance for pools of identified genes, proteins, and microbes. In recent years sequencing has rapidly moved into the realm of super high-throughput, with accompanying assembly and analyses requiring proportional super-computing power and bioinformatics expertise, but only limited resolution of biology or function. Transitioning from research that produces lists of genes, proteins and microbes, to research that determines their functional significance, is where the most important challenges lie for the next phases of termite science.

### Funding

Work conducted in the author's laboratory was supported by the following funding sources: USDA-CSREES-NRI grant no. 2007-35607-17777, USDA-NIFA-AFRI grant nos. 2009-05245 and 2010-65106-30727, Consortium for Plant Biotechnology Research-DOE grant no. DE-FG36-02GO12026, DOE-SBIR grant nos. DE-FG02-08ER85063 and DE-85538 S08-II, NSF grant no. 1233484CBET, and the O.W. Rollins/Orkin Endowment at Purdue University. M.E.S. is an inventor on the following patents: US Patent No. 7,968,525, US Patent No. 8,445,240, US Provisional Patent No. 61/602,149, and US Provisional Patent No. 61/902,472.

#### Acknowledgments

Apologies are extended to investigators whose research could not be cited because of space limitations. The author thanks Priya Rajarapu, Brittany Peterson, and Andres Sandoval for manuscript review, Vera Tai for sharing prepublication data, as well as his collaborators and all members of his laboratory, past and present, for their contributions and input.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Scharf. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Epigenetics as an answer to Darwin's "special difficulty"

#### *Brian R. Herb\**

Center for Epigenetics and Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA

#### *Edited by:*

Greg J. Hunt, Purdue University, USA

*Reviewed by:*

Xianyun Mao, Guardian Analytics, USA

Keijo Viiri, University of Tampere, Finland

*\*Correspondence:* Brian R. Herb, Center for Epigenetics and Department of Medicine, Johns Hopkins University School of Medicine, 855 North Wolfe Street, Baltimore, MD, USA e-mail: brianherb@jhmi.edu

Epigenetic modifications produce distinct phenotypes from the same genome through genome-wide transcriptional control. Recently, DNA methylation in honeybees and histone modifications in ants were found to assist the formation of caste phenotypes during development and adulthood. This insight allows us to revisit one of Darwin's greatest challenges to his natural selection theory: the derivation of multiple forms of sterile workers within eusocial species. Differential feeding of larvae creates two distinct developmental paths between queens and workers, with workers further refined by pheromone cues. Flexible epigenetic control provides a mechanism to interpret the milieu of social cues that create distinct worker sub-caste phenotypes. Recent findings suggest a distinct use for DNA methylation before and after adult emergence. Further, a comparison of genes that are differentially methylated and transcriptionally altered upon pheromone signaling suggests that epigenetics can play a key role in mediating pheromone signals to derive sub-caste phenotypes. Epigenetic modifications may provide a molecular mechanism to Darwin's "special difficulty" and explain the emergence of multiple sub-phenotypes among sterile individuals.

**Keywords: epigenetics, evolution, genomics, developmental plasticity, eusociality, behavior, pheromones**

A major defining feature of eusocial species is the division of labor among phenotypically distinct castes (Winston, 1987). The evolution of such a system required the eventual partitioning of all reproductive tasks to a single individual, leaving the remaining tasks to sterile relatives. This arrangement, however, posed a great challenge to Darwin's theory of natural selection (Darwin, 1859). The bedrock of his theory was that successful individuals passed on traits to the next generation. How then, could a sterile individual possess traits distinct from reproductive individuals and not have the means to pass them on? Darwin wrote:

... *one special difficulty, which at first appeared to me insuperable, and actually fatal to the whole theory. I allude to the neuters or sterile females in insect-communities; for these neuters often differ widely in instinct and in structure from both the males and fertile females, and yet, from being sterile, they cannot propagate their kind.*

Lacking a thorough understanding of the underlying genetics, Darwin nonetheless had the great insight that these sterile workers, being related to the reproductive member of the colony, can ensure the survival of their species by helping the colony as a whole. This insight did not fully answer the challenge, because he goes on to marvel, not at the existence of sterile workers *per se,* which he equates to the trait divergence between males and females, but rather at how multiple sub-phenotypes of sterile workers can arise:

*The great difficulty lies in the working ants differing widely from both the males and the fertile females in structure, as in the shape of the thorax, and in being destitute of wings and sometimes of eyes, and in instinct. As far as instinct alone is concerned, the wonderful difference in this respect between the workers and the perfect females would have been better exemplified by the hive-bee.*

Darwin's curiosity might have been further heightened if he knew that for most social insects the reproductive and sterile females are genetically identical. Phenotypic difference in the absence of genetic difference falls in the realm of epigenetics, which is the study of heritable information other than the DNA sequence itself. Epigenetic information can be stored in the molecular form as methylation on the cytosine base of DNA or a variety of modifications to histone tails (Kouzarides, 2007; Jones, 2012). These epigenetic modifications play a key role in tissue development where drastically different organs are derived from the same genome (Irizarry et al., 2009). Here we explore the role of epigenetic modifications in caste determination and propose that epigenetic machinery is important to derive the multiple forms of sterile workers that vexed Darwin so long ago.

Honeybees (*Apis mellifera*) have unique control over the developmental fate of the females in the colony through differential feeding of the larvae and pupae. A nutrient rich diet of royal jelly produces a reproductive queen, and the absence of such diet produces facultatively sterile workers. This royal jelly contains royalactin, a potent activator of p70 s6 kinase that increases ovary development and shortens development time (Kamakura, 2011). Queen development is marked by an increase in Tor activity during the third to fifth instars, stimulating growth and increased metabolism. Increased Tor activity occurs at the developmental time point when queens and workers diverge into two irreversible paths, permanently locking in caste differences. RNAi knockdown of Tor causes larvae to prolong development, reduce growth and ultimately emerge as workers, even on a diet of royal jelly (Patel et al., 2007; Mutti et al., 2011).

While honeybee hives have a single queen that lays millions of eggs over her 2–3 year lifespan, thousands of workers perform the remaining tasks. The typical adult worker will first act as a nurse to raise the young, attend to the queen, and clean combs for the next generation. About 8 days post-emergence, the worker will transition into foraging tasks, which are metabolically taxing and accelerate physiological decline (Winston, 1987). However, the division of labor is dependent on the needs of the colony, and individuals within the hive are able to communicate these needs either through direct contact or by pheromones. Queen mandibular pheromone (QMP) is emitted by the queen in order to recruit nurse bees to care for her, suppress ovary growth in workers and discourage workers from raising a new queen. The brood communicates its own needs by emitting brood pheromone (BP) to stimulate nurse bees to feed and care for the brood. BP also influences nurse bees to delay the transition into foraging, and existing foragers to skew their collecting toward the protein source pollen (Slessor et al., 2005). Workers further refine their tasks by physically interacting with fellow workers and recruiting them to specific tasks based on the needs of the hive (Seeley et al., 1998). Foragers themselves can also suppress nurse bees from foraging by emitting ethyl oleate (Leoncini et al., 2004). So it is in this environment of constant signals that the worker bee refines her role throughout life. These signals form a basis for unlocking multiple phenotypes, the phenomenon that so puzzled Darwin 150 years ago.

Honeybees use diet and social cues to separate genetically similar females into distinct roles, but what is the underlying molecular mechanism that integrates environmental stimuli and solidifies phenotype? Lacking a strong genetic candidate, epigenetic modifications can drive differentiation of multiple phenotypes as seen with cellular lineages in blood (Ji et al., 2010). The beauty of epigenetic mechanisms is that they can assist in maintaining a particular transcriptional state by storing information in the form of temporary chemical tags at the level of DNA itself. DNA methylation and histone modifications have been thoroughly studied in mammals and are known to play a major role in development and disease (Ho and Crabtree, 2010; Hansen et al., 2011). Genome-wide epigenetic modifications, like DNA methylation, can be context specific depending on their placement relative to genes and enhancers. A unique combination of DNA methylation and histone modifications in the promoter of a given gene can have a persistent repressive effect when these marks are bound by proteins that in turn establish larger protein complexes that as a whole suppress transcription. A good example of this process occurs during mammalian differentiation where pluripotency genes such as *OCT4* and *NANOG* are silenced by methylation of H3K9 by G9a, which in turn leads to condensing of chromatin by HP1 binding and eventual DNA methylation (Feldman et al., 2006; Smith and Meissner, 2013). This step-wise change in epigenetic modifications indicates different degrees of repression that become increasingly resistant to activation. These epigenetic modifications can be reversed, but require persistent signals, such as the expression of the reprogramming factors *OCT4*, *SOX2*, *MYC,* and *KLF4* to derive induced pluripotent stem cells (iPS; Doi et al., 2009). Another example of dynamic epigenetic change is during the activation of the *pS2* gene upon estrogen signaling. Time-course experiments showed active demethylation of DNA and recruitment of chromatin remodeling proteins to the site of the *pS2* gene after estrogen signaling (Metivier et al., 2008). In addition to chemical modifications to DNA and histone tails, RNA itself can provide temporal control of gene expression through the binding of non-coding RNAs to DNA, proteins or other RNAs. Non-coding RNAs can organize chromatin structure on a large scale as evidenced by X chromosome inactivation by the ncRNA *Xist*, or control local expression in the case of the ncRNA *Air* interacting with the histone methyltransferase G9a to silence the *Slc22a3* gene during development (Nagano et al., 2008; Mercer and Mattick, 2013). Studies investigating DNA methylation differences between worker subcastes in honeybees (Herb et al., 2012) and between queens and workers in ants (Bonasio et al., 2012) have found that many differentially methylated genes are involved in non-coding RNA processing, suggesting a role for non-coding RNA in caste determination. While the fundamental role and scope of non-coding RNA in mammalian development is established (Mattick, 2011), the impact of non-coding RNA in social insects is just starting to be understood (Bonasio, 2012; Humann et al., 2013). This perspective therefore focuses only on epigenetic modifications that have been mapped genome-wide, namely DNA methylation and histone modifications. Overall, the temporal control of epigenetic modifications enforces a specific transcriptional state by storing information at the level of the DNA itself, which remembers that state until a new stimulus is encountered.

Only recently has the importance of epigenetics in social insects been appreciated through the discovery of DNA methylation in many species (Bonasio et al., 2010; Beeler et al., 2014). Social insects are an ideal test ground for studying the role of epigenetic mechanisms because they can derive multiple behavioral phenotypes from the same genome. The first major clue that epigenetics played a role in queen/worker differentiation came soon after the complete sequencing of the honeybee genome in 2006 (Consortium, 2006) when the presence of DNA methyltransferase enzymes confirmed a functional DNA methylation system (Wang et al., 2006). Kucharski et al. (2008) knocked down Dnmt3 in larvae and found that, regardless of diet, most knockdowns developed queen features. This initial result inspired a genome-wide search for functional DNA methylation differences between queens and workers that resulted in three major studies that interrogated three developmental time points: larvae (Foret et al., 2012), adult emergence (Herb et al., 2012), and advanced age adults (Lyko et al., 2010). While differences between queens and workers were found across the genome in larvae and advanced age adults, there were no statistically significant differences at the time of adult emergence (**Figure 1A**). While these studies take different approaches to find regional changes in DNA methylation, the large number of differences found in larvae, compared to the complete absence of differences between queens and workers at adult emergence, strongly suggests that DNA methylation is required to maintain queen/worker differences during the larval stage, but is not required to separate newly emerged queens and workers when morphological differences are irreversible. DNA methylation appears to target many genes of the Tor pathway (Mutti et al., 2011; Foret et al., 2012), which has been implicated in queen/worker developmental differentiation (Patel et al., 2007). It is possible that DNA methylation assists in maintaining the activation of Tor pathway genes caused by royal jelly. Returning to Darwin's difficulty, we see that DNA methylation can assist in maintaining separate transcriptional programs for queens and workers in honeybees, providing a mechanism for maintaining caste differences. Further evidence that epigenetic mechanisms help produce alternative phenotypes is the finding that histone modifications, in particular H3K27ac, differentiate major from minor workers in the carpenter ant *Camponotus floridanus* (Simola et al., 2013). The remarkable size difference between ant worker sub-castes was particularly striking to Darwin and this result illustrates that epigenetic modifications, including histone modifications, can help produce multiple sterile worker phenotypes (Darwin, 1859). This example from ants bolsters the idea of epigenetic modifications solidifying differences initiated by diet during development, but how do epigenetic modifications help individuals navigate transitions throughout adult life as seen above with nurses and foragers?


While worker bees generally transition from nursing to foraging tasks over their lifetime, the timing of this transition and the exact task they perform at any given point along this continuum is refined by social cues within the hive (Slessor et al., 2005). Specific pheromones can elicit a change in expression of hundreds of genes and recruit workers to a task or delay their transition into a new task (Grozinger et al., 2003; Alaux et al., 2009). Although powerful, these signals must be regarded in the context of the hive, where the organization of the brood in the center and the storage of pollen and honey on the periphery create "task zones." Workers are born into a region where the queen is actively laying eggs and where pheromones from the queen and brood are strongest, influencing the newborn worker to assume nursing tasks. As workers age, they encounter returning foragers that present recruitment signals to elicit the nurses to transition into new roles (Winston, 1987; Whitfield, 2003). However, if upon the first interaction with a returning forager, a nurse flew out of the nest and began collecting nectar, or inversely upon a whiff of queen pheromone foragers began caring for the brood, the hive would be in chaos. Instead, tasks are performed for continuous periods and a transition requires repeated cues to initiate. The flexible control that epigenetics offers is an ideal mechanism for interpreting social cues within the hive and providing temporal control over gene expression.
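The requirement for repeated cues can be illustrated with a toy model. The Python sketch below is not taken from any cited study: the cue units, the `decay` rate, and the `threshold` are all assumed values, and only the nurse-to-forager direction is modeled. It shows how a decaying memory of cues lets a worker ignore a single encounter but respond to a persistent signal.

```python
def simulate_task_switching(cues, decay=0.8, threshold=3.0):
    """Return the worker's task ('nurse' or 'forager') after each time step.

    cues: one number per time step; positive values stand for forager
          recruitment signals, zero for no signal (hypothetical units).
    decay: fraction of the accumulated signal retained per step (assumed).
    threshold: accumulated signal needed to switch task (assumed).
    """
    accumulated = 0.0
    task = "nurse"
    history = []
    for cue in cues:
        # Old signal fades, new signal is added on top.
        accumulated = accumulated * decay + cue
        if task == "nurse" and accumulated >= threshold:
            task = "forager"
        history.append(task)
    return history


# A single encounter with a forager versus a sustained recruitment signal.
single_encounter = [1, 0, 0, 0, 0, 0]
persistent_cues = [1, 1, 1, 1, 1, 1]

print(simulate_task_switching(single_encounter))
print(simulate_task_switching(persistent_cues))
```

With these assumed parameters the isolated cue fades before reaching the threshold, whereas the sustained signal eventually triggers the transition.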

Workers generally start their adult lives performing nursing tasks and transition into foraging tasks, but it is possible to revert foragers back to nursing tasks if the need arises (Amdam et al., 2005). This reversion schema includes two types of nurses, one set that has always performed nursing tasks, and one set that has had foraging experience and then reverts back to nursing. When age-matched nurses, foragers and reverted nurses were compared, hundreds of differentially methylated regions (DMRs) distinguished these phenotypes. Remarkably, 57 DMRs followed the behavioral reversion: DNA methylation levels changed during the nurse to forager transition, and changed back to original nurse levels during the reversion (**Figure 1B**). Genes associated with these 57 reversion DMRs had far reaching developmental and gene regulatory functions, including multiple genes containing DEAD-box helicase domains that act through chromatin remodeling to affect global gene expression. Importantly, distinct nurse and forager specific epigenetic signatures were identified, demonstrating that specific levels of DNA methylation in the brain are required to form sub-caste phenotypes. This study was also the first evidence of reversible methylation underlying a behavioral trait and demonstrates the flexible control epigenetic modifications can bestow on phenotype (Herb et al., 2012).
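The definition of a reversion DMR can be made concrete with a small sketch. The Python below is illustrative only: the region names, the methylation values, and the two cutoffs (`min_change`, `tolerance`) are invented for the example. The simple cutoff test stands in for the statistical DMR calling used in the study (Herb et al., 2012), which is not reproduced here.

```python
def is_reversion_dmr(nurse, forager, reverted, min_change=0.2, tolerance=0.1):
    """Classify one region from its methylation levels (fractions in 0..1).

    A region counts as a reversion DMR when methylation shifts between
    nurses and foragers and returns close to the nurse level in reverted
    nurses. Both cutoffs are assumptions chosen for illustration.
    """
    changed = abs(forager - nurse) >= min_change
    returned = abs(reverted - nurse) <= tolerance
    return changed and returned


# Hypothetical regions: (nurse, forager, reverted nurse) methylation levels.
regions = {
    "region_A": (0.80, 0.40, 0.75),  # drops in foragers, recovers on reversion
    "region_B": (0.80, 0.40, 0.45),  # drops in foragers and stays low
    "region_C": (0.50, 0.55, 0.50),  # no meaningful change
}

reversion = [name for name, levels in regions.items() if is_reversion_dmr(*levels)]
print(reversion)
```

Only the first hypothetical region meets both conditions; the second changes without reverting and the third never changes appreciably.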

If pheromones and other social interactions impact the epigenome, then the socially defined task is remembered at the level of the genome itself. This also ensures an intrinsic "buffering" against instantly changing task upon a random encounter with workers performing different tasks. This buffering would ensure that only persistent social cues, reflective of the true needs of the hive, would cause the worker to switch tasks. This is not meant to detract from or oversimplify the complex network of social cues in the hive that organize labor in such an efficient way. Rather, coupling the flexible control of epigenetics with the spatial and social cues in the hive establishes a framework for understanding how workers can achieve stabilized sub-caste phenotypes. If true, then genes under the influence of pheromones should also be regulated by epigenetic modifications. The influences of two major pheromones, QMP and BP, have been studied at the genome level using gene expression microarrays. Key genes *POE*, active in neuro-development, and the chromatin remodeling genes *HCF* and *ISWI* are all chronically regulated by QMP (Grozinger et al., 2003) and are also differentially methylated between nurses and foragers (Herb et al., 2012). In addition, genes regulated by BP (Alaux et al., 2009) and those differentially methylated between nurses and foragers share functional enrichment of helicase and nucleoside binding genes. BP influences the expression of the developmental genes *WIT*, *UNR*, and *BICD* (Alaux et al., 2009) and these genes are associated with reversible methylation between nurses and foragers (Herb et al., 2012). Epigenetic control of a core group of master regulatory genes may be sufficient to help maintain pheromone-induced phenotypes and help stabilize the division of labor in the hive. For example, the chromatin-remodeling gene *ISWI*, which is regulated by QMP and differentially methylated between nurses and foragers, remodels nucleosomes around gene promoters (Sala et al., 2011). The action of Iswi may facilitate the large-scale gene expression differences observed during the nurse to forager transition (Whitfield, 2003).

Social insects are masters at controlling the division of labor within their colonies to maximize efficiency and react to changing environmental conditions. However, the evolution of the sterile worker proved troublesome to Darwin, who struggled to incorporate the existence of multiple phenotypically distinct sterile workers into his theory of natural selection, which emphasized passing traits directly to the next generation. Through differential feeding and social cues, the colony as a whole has evolved numerous mechanisms to fine-tune the phenotypes of sterile workers to obtain an efficient division of labor. We can now integrate the action of epigenetic machinery into our account of the evolution of social insects. Epigenetic information stabilizes phenotype and provides a mechanism for deriving multiple castes from the same genome. This extra layer of information works with established signaling pathways and regulatory programs to lock in gene expression patterns and interpret external stimuli. DNA methylation appears to play two major roles, distinguishing queens and workers during development and defining sub-castes within the lifetime of worker bees. Further, based on the limited presence of methylation across the honeybee genome, it seems that methylation has been reserved to act on select genes that have far reaching effects. These key genes are regulated by epigenetics but are initiated by social cues within the hive, directing the division of labor. As seen with histone modifications in ants and DNA methylation in honeybees, epigenetics plays an important role in social insects. Perhaps it will bear out that utilizing epigenetic machinery to derive additional worker phenotypes was critical to the evolution of eusociality in insects.

#### **REFERENCES**


honey bee workers. *Exp. Gerontol.* 40, 939–947. doi: 10.1016/j.exger.2005.08.004


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 15 July 2014; paper pending published: 08 August 2014; accepted: 27 August 2014; published online: 12 September 2014.*

*Citation: Herb BR (2014) Epigenetics as an answer to Darwin's "special difficulty." Front. Genet. 5:321. doi: 10.3389/fgene.2014.00321*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Herb. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Epigenetics as an answer to Darwin's "special difficulty," Part 2: natural selection of metastable epialleles in honeybee castes

#### *Douglas M. Ruden<sup>1</sup>\*, Pablo E. Cingolani<sup>2</sup>, Arko Sen<sup>3</sup>, Wen Qu<sup>3</sup>, Luan Wang<sup>4</sup>, Marie-Claude Senut<sup>4</sup>, Mark D. Garfinkel<sup>5</sup>, Vincent E. Sollars<sup>6</sup> and Xiangyi Lu<sup>4</sup>*

<sup>1</sup> Department of Obstetrics and Gynecology, C. S. Mott Center for Human Growth and Development and Center for Urban Responses to Environmental Stressors, Institute of Environmental Health Sciences, Wayne State University, Detroit, MI, USA

<sup>2</sup> School of Computer Science and Genome Quebec Innovation Centre, McGill University, Montreal, QC, Canada

<sup>3</sup> Department of Pharmacology, Wayne State University, Detroit, MI, USA

<sup>4</sup> Institute of Environmental Health Sciences, Wayne State University, Detroit, MI, USA

<sup>5</sup> Department of Biological Sciences, University of Alabama in Huntsville, Huntsville, AL, USA

<sup>6</sup> Department of Biochemistry and Microbiology, Joan C. Edwards School of Medicine, Marshall University, Huntington, WV, USA

#### *Edited by:*

Greg J. Hunt, Purdue University, USA

#### *Reviewed by:*

Aaron Arthur Comeault, University of Sheffield, UK; Feng-Chi Chen, National Health Research Institutes, Taiwan

#### *\*Correspondence:*

Douglas M. Ruden, Department of Obstetrics and Gynecology, C. S. Mott Center for Human Growth and Development and Center for Urban Responses to Environmental Stressors, Institute of Environmental Health Sciences, Wayne State University, 275 East Hancock, Room 002, Detroit, MI 48201, USA e-mail: douglasr@wayne.edu

In a recent perspective in this journal, Herb (2014) discussed how epigenetics is a possible mechanism to circumvent Charles Darwin's "special difficulty" in using natural selection to explain the existence of the sterile-fertile dimorphism in eusocial insects. Darwin's classic book "On the Origin of Species by Means of Natural Selection" explains how natural selection of the fittest individuals in a population can allow a species to adapt to a novel or changing environment. However, in bees and other eusocial insects, such as ants and termites, there exist two or more castes of genetically similar females, from fertile queens to multiple sub-castes of sterile workers, with vastly different phenotypes, lifespans, and behaviors. This necessitates the selection of groups (or kin) rather than individuals in the evolution of honeybee hives, but group and kin selection theories of evolution are controversial and mechanistically uncertain. Also, group selection would seem to be prohibitively inefficient because the effective population size of a colony is reduced from thousands to a single breeding queen. In this follow-up perspective, we elaborate on possible mechanisms for how a combination of both epigenetics, specifically, the selection of metastable epialleles, and genetics, the selection of mutations generated by the selected metastable epialleles, allows for a combined means for selection amongst the fertile members of a species to increase colony fitness. This "intra-caste evolution" hypothesis is a variation of the epigenetic directed genetic error hypothesis, which proposes that selected metastable epialleles increase genetic variability by directing mutations specifically to the epialleles. Natural selection of random metastable epialleles followed by a second round of natural selection of random mutations generated by the metastable epialleles would allow a way around the small effective population size of eusocial insects.

**Keywords: epigenetics, evolution, genomics, developmental plasticity, eusociality, group selection**

#### **DARWIN'S "SPECIAL PROBLEM"**

Eusocial (Greek *eu*: "good/real" + "social") insects include the Hymenoptera (ants, bees, and wasps) and the Isoptera (termites). In honeybees, which are the focus of this perspective, a hive has caste differences; the diploid queen and the haploid drones are the sole reproducers, while the nurses, soldiers, guards, and foragers are "sub-castes" or "task groups" of the worker caste of sterile females that work together to benefit the group as a whole (Free, 1987). However, as pointed out by Herb (2014) in a previous perspective in this journal, having sterile females in a colony is a potentially fatal flaw in Darwin's theory of natural selection, which states that the fittest individuals pass their traits (i.e., genes) to the next generation. Darwin (1859) referred to sterile workers in insect communities as, "... one special difficulty, which at first appeared insuperable, and actually fatal to the whole theory." Darwin (1871) later proposed a way around this "special difficulty" by proposing a "group selection" model for evolution of altruistic behaviors in eusocial insects. Darwin argued that "group selection" can occur when the benefits of altruism between castes are greater than the individual benefits of selfishness (egotism) within a subpopulation. The great population geneticist Hamilton (1964) formalized the idea of group selection in a mathematical model, rb > c, where b represents the benefit to the recipient of altruism, c the cost to the altruist, and r their degree of relatedness. Kin selection, which takes into account the genetic relatedness of individuals in a group, was a further refinement of the group selection theory. In kin selection, rb<sub>k</sub> + b<sub>e</sub> > c, in which b<sub>k</sub> is the altruistic benefit to kin and b<sub>e</sub> is the altruistic benefit accruing to the group as a whole (Wilson and Wilson, 2007). However, group- and kin-selection models are mathematically complex and remain controversial amongst many evolutionary theorists, such as Dawkins (1976) and Nowak et al. (2010), who argue that group selection is unlikely because, among many reasons, selfishness (i.e., "selfish genes," a phrase Dawkins coined) would always predominate over altruism.
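A short numerical sketch may help in reading the two inequalities above. The Python below simply evaluates them; the function names and all benefit and cost values are hypothetical, chosen only to show how the extra group-benefit term can tip the balance when relatedness alone does not. The relatedness of 0.75 is the textbook value for full sisters under haplodiploidy.

```python
def hamilton_rule(r, b, c):
    """Return True when altruism is favored under r*b > c."""
    return r * b > c


def kin_group_rule(r, b_k, b_e, c):
    """Return True when altruism is favored under r*b_k + b_e > c.

    b_k: benefit to kin; b_e: benefit accruing to the group as a whole.
    """
    return r * b_k + b_e > c


# Hypothetical benefit and cost values in arbitrary fitness units.
print(hamilton_rule(r=0.75, b=2.0, c=1.0))            # high relatedness
print(hamilton_rule(r=0.25, b=2.0, c=1.0))            # low relatedness
print(kin_group_rule(r=0.25, b_k=2.0, b_e=0.6, c=1.0))  # low relatedness plus group benefit
```

In this example the low-relatedness case fails the first inequality but satisfies the second once the group-level benefit is included.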

Here, we further elaborate on Herb's (2014) thesis that epigenetics might be a way around Darwin's "special difficulty." We argue that epigenetic inheritance systems (EISs) can allow rapid evolution of traits specific for sterile workers and fertile queens. Epigenetics does not involve changes in the DNA sequence, but rather covalent, yet reversible, changes to the DNA in the form of 5-methylcytosine (5mC). EISs should work fine for short-term evolutionary changes. However, natural selection of DNA sequence variants would still be needed for long-term evolutionary changes. Histone modifications, long non-coding RNAs, prions, and other types of EISs will not be discussed in detail, but rather we will focus on 5mC, since 5mC represents a reversible change to the genome that can be modified by the environment (Chia et al., 2011). Heritable changes in 5mC, such as those that occur in imprinted genes in mammals, are also called metastable epialleles (Rakyan et al., 2002; Dolinoy et al., 2007). The most important aspect of DNA methylation in the hypothesis presented in this paper is that, unlike histone modifications, 5mC is mutagenic and can lead to permanent changes to the DNA. Specifically, 5mC can undergo spontaneous deamination, which converts 5mC to T (Coulondre et al., 1978; Duncan and Miller, 1980). A hypothesis for how natural selection of metastable epialleles can lead to DNA mutations that permanently stabilize the epialleles into real alleles, the epigenetic directed genetic error (EDGE) hypothesis, is presented in the last section of this perspective.
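The mutagenic step described above, deamination of 5mC to T, can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions: the sequence and the methylated position are invented, and every listed position is deaminated deterministically, whereas real deamination is a rare, stochastic event.

```python
def deaminate(sequence, methylated_positions):
    """Return the sequence after deamination of each methylated cytosine.

    sequence: DNA string (one strand).
    methylated_positions: 0-based indices of 5-methylcytosines; each must
        point at a 'C'. Deamination converts 5mC to T, so the epigenetic
        mark becomes a permanent C-to-T change in the sequence.
    """
    bases = list(sequence.upper())
    for pos in methylated_positions:
        if bases[pos] != "C":
            raise ValueError(f"position {pos} is {bases[pos]}, not a cytosine")
        bases[pos] = "T"
    return "".join(bases)


# Hypothetical sequence with a methylated cytosine at index 1 (a CpG site).
original = "ACGTCGA"
mutated = deaminate(original, [1])
print(original, "->", mutated)
```

The unmethylated cytosine at index 4 is left unchanged, which is the sense in which methylation directs where this class of mutation can occur.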

The inspiration for many of the ideas in this perspective is a chapter in Jablonka and Lamb's (2005) excellent book "Evolution in Four Dimensions" on EISs. They created an imaginary planet named Jaynus where the variety of organisms all had exactly the same genome sequences, yet had many different phenotypes. They wrote:

*Jaynus organisms have a genetic system that is based on DNA, and replication, transcription, and translation are much the same as on Earth. However, there is one very extraordinary thing about the DNA of Jaynus creatures – every organism has exactly the same DNA sequences. From the simplest organism, a tiny unicellular creature, to the enormous fanlike colonial worms, the DNA is identical. Their genomes are large and complex, but no organism deviates from the universal standard sequences because there are cellular systems that check DNA and destroy any cell suspected of carrying a mutation.*

In this perspective, we describe how intra-caste evolution is, in many respects, similar to how evolution proceeds on the mythical planet of Jaynus.

#### **MECHANISMS OF CASTE DETERMINATION AND EPIGENETIC MODIFICATION IN HONEYBEES**

*At the extreme superorganismic phase, the level of selection becomes the genome of the queen and the sperm she stores, and the workers can be viewed as robotic extensions of her phenotype* (Wilson and Nowak, 2014).

As beautifully described in the above quote, eusocial insects are even more extreme in some respects than the mythical organisms on Jaynus because they have evolved to a "superorganismic" stage in which the queen is the reproductive organ(ism) and the workers are the "robotic extensions" or the somatic cells of the superorganism. Kennedy et al. (2014) called the formation of eusocial insect colonies and super colonies "a new major transition in evolution." We propose that the intra-caste evolutionary process in honeybees might share epigenetic mechanisms with those proposed on Jaynus. Darwinian evolution (Darwin, 1859) utilizes the concepts of "survival of the fittest" in a population and "natural selection" of genetic variation to eventually form new species. Darwin, of course, did not know about either genetic or epigenetic variation, for his work pre-dated Gregor Mendel's discoveries (or, more accurately, their rediscovery in the 20th century; Meneses Hoyos, 1960; Fairbanks and Rytting, 2001), but the modern interpretation of "natural selection" is selection of genetic variation. We propose that "intra-caste evolution" is a type of micro-evolution, which is small-scale evolution within a population, and refers to survival of the fittest members of a caste. We propose here that "intra-caste evolution" is initially based on natural selection of metastable epialleles of the most-fit caste (i.e., queen and worker) and sub-caste members (i.e., nurse, soldier, guard, and forager). As in the mythical Jaynus example, genetic selection probably cannot be the primary mechanism for selecting the fittest worker bee, since most worker bees cannot breed (however, see below). For example, the most efficient forager sub-caste of workers cannot be selected for by direct genetic selection, since workers, in most situations, are sterile females. However, a hive with more efficient foragers can be produced by group selection of metastable epialleles that produce an increased foraging efficiency. One phenotype that worker bees have evolved to increase foraging efficiency is the pollen basket on each hind leg, which is present on workers but not on queens. A possible mechanism for the evolution of pollen baskets is presented later in this perspective.

If the "intra-caste evolution" hypothesis of honeybee castes is not mediated primarily by genetic means, then how are the desirable phenotypes, such as efficiency in foragers, transmitted to the next generation? We propose that, first, queens undergo a great deal of stress (i.e., malnutrition) when there is not an adequate amount of foraging being performed by the workers. The stress, in a mechanism that we present in a later section, leads to an activation of random stress-induced metastable epialleles, some of which increase the food-carrying capacity of pollen baskets. Second, the metastable epialleles which improve the fitness of the colony, such as those that serendipitously alter pollen baskets in workers in a manner that increases storage capacity, are selected over several generations by group selection. Third, random mutations can potentially be directed to the selected metastable epialleles by the EDGE mechanism, described in the final section of this perspective. We believe the EDGE hypothesis is needed mainly because the "normal" background mutation and group selection processes are not adequate when the effective population size of a species is too low, as it arguably is in eusocial insects (i.e., only the queen breeds). The EDGE hypothesis provides an additional mechanism to increase the mutation rate of specific genes required to ensure the survival of the colony.

#### **CHEMICAL MEANS OF EPIGENETIC MODIFICATIONS IN HONEYBEE CASTES**

In addition to the selection of the most-fit caste and sub-caste members in each generation by group selection, honeybees have evolved to produce royal jelly to alter the epigenetic and developmental machinery of their offspring. The active ingredients of royal jelly include a fatty acid, (E)-10-hydroxy-2-decenoic acid (10HDA), which accounts for up to 5% of royal jelly. The fatty acid 10HDA, interestingly, is an epigenetic modifier molecule with histone deacetylase inhibitor (HDACi) activity (Spannhoff et al., 2011). HDACs remove acetyl groups from histones; these acetyl groups are present in actively transcribed genes and open up the chromatin, presumably by repellent ionic charges pushing the nucleosomes apart (Jenuwein and Allis, 2001). HDAC inhibitors block the deacetylation of histones, so the acetyl groups would remain on the histones, and transcriptional activity would therefore be high in the "queen-specific genes" of larvae fed royal jelly. Another component of royal jelly is the protein royalactin, which increases body size and ovary development in queens (Kamakura, 2011). The mechanism of action of royalactin is thought to be multifold: activation of mitogen-activated protein kinase (MAPK), which decreases developmental time; activation of p70 S6 kinase, which increases body size; and an increase in the production of juvenile hormone, which is essential for ovary development (Kamakura, 2011). Interestingly, the same paper also showed that royalactin dramatically increases body size and ovary development when fed to the fruit fly, *Drosophila melanogaster* (Kamakura, 2011).

Based on the fact that royalactin has similar effects on the solitary fruit fly as on eusocial bees, we propose a theoretical epigenetic mechanism for how the queen's dependence on royal jelly for ovary development evolved. In our model, bees originally were solitary, like fruit flies, and every female fended for herself in terms of feeding and reproduction. However, when the food is in short supply, the absence of nutrients would lead to a reduction of reproductive fitness and a diminution in ovary development. Consequently, the population reaches a bottleneck when the food runs low, and only those few individuals that have sufficient nutrition survive. If the few survivors evolved the capacity to feed some of their offspring, which would be one of the first steps in eusocial evolution, then when the food runs too low, they can feed adequately only some of their offspring and leave the other offspring malnourished. The female offspring that are fed would develop ovaries, whereas the female offspring that were not sufficiently fed would develop atrophied ovaries and would be sterile. A decrease in reproductive fitness is a universal character of most animals during starvation (Carey et al., 2008). However, and this is key, both fertile and sterile offspring are produced by the same mother, in a manner that is dependent on how much food or the quality of food they were fed. If the sterile offspring provided a selective-advantage to the group as a whole, then those mothers that produced both fertile and sterile offspring would have a selective advantage over those mothers that produced only fertile offspring. After millions of years of fine-tuning this process, the honeybee sterile-fertile dimorphism could have theoretically evolved by group selection.

As pointed out by an anonymous reviewer, there are at least three potential problems with our hypothesis on how honeybees evolved to produce royal jelly. First, the feeding behavior would need to be developed when food was scarce. Second, the sterile-fertile dimorphism would have to be maintained even though food became abundant again. Third, altruism would have to be developed when the sterile-fertile dimorphism emerged. It is hard to argue around these criticisms for a solitary insect such as *Drosophila*, and that might be why *Drosophila* and other solitary insects never evolved a sterile-fertile dimorphism. However, as suggested by Wilson and Nowak (2014), perhaps a way to circumvent all of these problems is the fact that the first step in eusocial evolution is probably the ability to form nests or colonies. This would allow the development of the dichotomy in bees, wasps, and ants of being a forager or staying in the nest to lay eggs. Since foraging is dangerous and taxing, if the workers are bringing the proto-queen pollen and nectar, then she is less inclined to forage for it. Once the proto-queen evolved the ability to produce royal jelly, then she would become the only fertile member of the colony – all of the workers could be chemically sterilized by withholding royal jelly. The development of altruism, in this case, could be an emergent property of the sterile-fertile dimorphism. As discussed further in a later section, there are many examples of emergent behavior in eusocial insects (Johnson, 2001), and we argue that altruism of sterile workers could be one of them.

In addition to royal jelly, honeybees have evolved an arsenal of other chemical weapons that subvert developmental and behavioral processes in the workers. For example, about 8 days after emergence, the nurse bees who take care of the eggs will transition into foragers, and foraging is more metabolically taxing because it requires the filling of baskets on the hind legs to transport the pollen (Free, 1987). However, the behavioral transition from nurse to forager depends on the needs of the hive, and individuals in the hive transmit these needs by both direct contact and pheromones. The transition among sub-caste members in honeybees, but not in some ant species that have physically different worker castes (Wilson and Nowak, 2014), is purely behavioral because all honeybee workers have pollen baskets despite the fact that only foragers use them – pollen baskets do not develop *de novo* in the nurse when she transforms into a forager.

Queen mandibular pheromone (QMP) is emitted by the queen to recruit nurses to her and to suppress ovary growth in workers (Free, 1987). The larvae emit brood pheromone (BP) to stimulate nurse bees to feed and care for them (i.e., the brood). BP affects the nurses and foragers in different manners: it stimulates nurses to care for the brood and to delay their transition into foragers, while it stimulates foragers to collect nutrient-rich pollen to feed the brood (Slessor et al., 2005). Foragers, in turn, emit ethyl oleate (EO) to suppress nurse honeybees from foraging (Leoncini et al., 2004). Isoamyl acetate, which has an odor similar to that of banana and pear, was found by Boch et al. (1962) to be an active component in the sting pheromone of the honeybee, which is presumably released by honeybee guards when a hive is disturbed. It is through such chemical (i.e., environmental) signals that the honeybees are able to epigenetically maintain the caste structure in a manner that circumvents, in most aspects, the need for the selection of genetic variation.

#### **THE POTENTIAL ROLE OF NATURAL SELECTION OF GENETIC AND EPIGENETIC VARIATION IN DRIVING THE EVOLUTION OF CASTE SYSTEMS**

As mentioned earlier, an important consideration regarding genetic stabilization of the sterile-fertile dimorphism is that the sterile workers can, in rare cases, develop ovaries. Removal of a queen can cause some workers to develop ovaries, in part because they are no longer exposed regularly to QMP (Herb et al., 2012, 2013). Also, nurses and foragers can revert back and forth rapidly in either direction in a manner that is dependent on the needs of the hive (Amdam et al., 2005). We believe that the occasional reversion of a sterile worker to a reproductive female is a critical mechanism for transmitting both the metastable epialleles and the genetic variation that are required for the worker caste. According to the EDGE hypothesis, genetic variation that is induced by the metastable epialleles can be selected to increase worker specialization in the next generation. The purpose of the metastable epialleles, in the EDGE hypothesis, is not to circumvent the need for genetic variation, but rather to increase genetic variation in precisely the genes that need to be adapted for the organism, or superorganism in the case of eusocial insects, to survive the novel environment.

The selection of metastable epialleles in *Drosophila* is well-established in our laboratory (Ruden et al., 2003, 2008, 2009; Sollars et al., 2003; Ruden and Lu, 2008) and in other laboratories (Carrera et al., 1998; Ruden and Lu, 2008; Tariq et al., 2009; Gangaraju et al., 2010; Valtonen et al., 2012; Branco and Lemos, 2014; Le Thomas et al., 2014; Nystrand and Dowling, 2014; Somer and Thummel, 2014; Stern et al., 2014; Wei et al., 2015). We showed, for instance, that stress, or the inactivation of the chaperone protein Hsp90, can activate a metastable epiallele of the *Krueppel*<sup>Incomplete facets−1</sup> (*Kr*<sup>If−1</sup>) allele, which causes ectopic large bristle outgrowths (ELBOs) to protrude from the eyes (Sollars et al., 2003). We indicate the metastable epiallele with the nomenclature [*Kr*<sup>If−1</sup>]<sup>∗</sup> and showed that the metastable epiallele can be transmitted through both the male and female germlines for tens or even hundreds of generations (Ruden et al., 2003, 2008). What makes a metastable epiallele an example of an epigenetic variant rather than a genetic variant is the fact that a metastable epiallele, such as [*Kr*<sup>If−1</sup>]<sup>∗</sup>, can be reverted back to the original allele, in this case *Kr*<sup>If−1</sup>, in just one or two generations by negative selection (Sollars et al., 2003). Since *Drosophila* has very little DNA methylation, the metastable epialleles in *Drosophila* are probably not the result of differential DNA methylation. However, Gangaraju et al. (2010) presented evidence that the [*Kr*<sup>If−1</sup>]<sup>∗</sup> metastable epiallele requires Piwi and piRNAs (Piwi-interacting RNAs), which are small non-coding RNAs in the germline that function similarly to siRNAs and miRNAs (Ruden, 2011; Grentzinger et al., 2012). We are still actively trying to determine the exact nature of the [*Kr*<sup>If−1</sup>]<sup>∗</sup> metastable epiallele and how it is transmitted through both the male and female germlines. As discussed later, we believe that *Drosophila*, and more generally most or all Dipterans (flies) and Coleopterans (beetles), lost DNA methylation because the presence of 5mC would slow down the syncytial blastoderm mitotic cycles, which at ∼8 min are the fastest in the animal kingdom (Ruden and Jackle, 1995).

There is no direct laboratory evidence that selection of metastable epialleles occurs in eusocial insects, such as honeybees. However, there are at least three indirect indications that metastable epialleles that utilize differential DNA methylation occur in eusocial insects. First, Herb et al. (2012, 2013) showed that reverting foragers back to nurses reestablished the nurse pattern of DNA methylation. This was the first evidence of reversible epigenetic changes associated with behavior. Second, Hunt et al. (2010) found that worker-biased proteins exhibited slower evolutionary rates than queen-biased proteins or non-biased proteins. This is consistent with the idea that metastable epialleles must be transmitted through the germline, and the queen and fertile workers are the only females that produce eggs. Finally, as described in the next section, the bimodal distributions of CG content and/or DNA methylation in most insect genes suggest a role for differential DNA methylation and the existence of metastable epialleles in most insects.

#### **MECHANISMS OF EPIGENETIC MODIFICATION IN HONEYBEES**

How might an EIS in honeybees and other organisms evolve? In order to understand this, it is necessary to describe the patterns of DNA methylation in mammals and honeybees (**Figures 1A,B**). In mammals, ∼60% of genes have so-called CpG islands in the promoter regions and 5′ regions, which are defined as regions of higher than average CG content. DNA methylation of CpG islands in mammals occurs primarily at CpG sites in somatic cells but often at CHH (where H = C, A or T) sites in stem cells (reviewed in Patil et al., 2014). The degree of CpG island methylation is inversely proportional to gene expression for most genes; i.e., highly expressed genes have little CpG island DNA methylation, whereas low-expressed genes have large amounts of CpG island DNA methylation (**Figure 1A**). CpG island DNA methylation in mammals is thought to reduce gene expression by two mechanisms: by inhibiting the binding of some transcriptional activation factors, such as AP1, which binds to GC-rich consensus sequences, and by increasing the binding of transcriptional inhibitory factors, such as MeCP2, which recruits HDACs to inhibit transcription (reviewed in Jones, 2012).

The CpG island DNA methylation story is the best-known aspect of epigenetic regulation of transcription in mammals. However, several studies have shown that gene-body DNA methylation also occurs in a manner that is mostly proportional to gene expression in both mammals and insects (Konu and Li, 2002; reviewed in Jones, 2012). In other words, highly expressed genes have the most gene-body DNA methylation, and this DNA methylation is mostly restricted to exon sequences (**Figure 1B**), but this is partly because exons, since they encode proteins, are CG rich compared to intronic and intergenic regions, which do not encode proteins. DNA methylation in mammals also occurs at repeat sequences, such as Alus, SINEs, LINEs, and retroviruses, and this has been shown to prevent expression and, thereby, retrotransposition of the retroviruses to new genomic regions (Jones, 2012).

Interestingly, DNA methylation in honeybees occurs primarily in gene bodies, particularly in exons (**Figure 1B**). However, in contrast to mammals, CpG islands are not apparent in the promoters of honeybee genes (i.e., there are very few genes with enriched CG-content in the promoter regions). Additionally, in the honeybee, little or no DNA methylation occurs in repeat or intergenic sequences (Lyko et al., 2010; Zemach et al., 2010; Chen et al., 2011). Therefore, in honeybees, DNA methylation is not thought to epigenetically regulate expression of genes by controlling transcription factor binding to promoter regions, but rather is a consequence of gene expression. Gene body methylation in honeybees likely improves the fidelity of gene expression by allowing transcription to initiate only at the promoter and not at intergenic regions. Gene body DNA methylation in plants, for instance, has been shown to suppress intragenic transcriptional start sites and anti-sense transcription, presumably by preventing transcriptional activation proteins from binding to the gene body and inappropriately activating transcription from cryptic promoters (Zhang et al., 2006).

Originally, it was reported that DNA methylation occurs primarily in CpG sequences in honeybees (Lyko et al., 2010; Zemach et al., 2010; Chen et al., 2011). However, we have shown, by analyzing our own data and by reanalyzing the data from Lyko et al. (2010), that there is actually more CHH DNA methylation in honeybees than CpG DNA methylation (Cingolani et al., 2013). The other laboratories that analyzed DNA methylation in the honeybee used software that removed most of the CHH DNA methylation, presumably because this type of DNA methylation occurs in less complex regions of the genome (i.e., CG poor) that are therefore harder to align to the reference genome. Also, multiple CHH methylation events in a single next-generation DNA sequencing (NGS) read are often, sometimes improperly, interpreted as incomplete bisulfite conversion, and the read is discarded. However, we validated that most of the CHH methylation events are real by alternative methods, such as sequencing honeybee genomic DNA after immunoprecipitation with anti-5mC antibodies, and enzymatic digestion of DNA at 5-hydroxymethylcytosine (5hmC) sites (Cingolani et al., 2013). We did confirm, however, like the other groups, that CpG DNA methylation is primarily in exons. Interestingly, we also found that CHH DNA methylation is primarily in introns, partly because introns are larger and have a lower CG content (Cingolani et al., 2013).
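The distinction between CpG and CHH methylation rests on the sequence context of each cytosine. As a minimal sketch of how such contexts can be assigned, the following Python function classifies forward-strand cytosines in a sequence. The function name and the example sequence are hypothetical, and the CHG class is included only because it is the conventional third category; it is not discussed in this perspective and is not taken from the pipeline of Cingolani et al. (2013).

```python
def cytosine_contexts(seq):
    """Count forward-strand cytosines by context: CpG, CHG, or CHH (H = A, C, or T).

    Illustrative helper only; not the software used in the studies cited above.
    """
    seq = seq.upper()
    counts = {"CpG": 0, "CHG": 0, "CHH": 0}
    h_bases = set("ACT")  # H stands for any base except G
    for i, base in enumerate(seq):
        if base != "C":
            continue
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        nxt2 = seq[i + 2] if i + 2 < len(seq) else ""
        if nxt == "G":
            counts["CpG"] += 1
        elif nxt in h_bases and nxt2 == "G":
            counts["CHG"] += 1
        elif nxt in h_bases and nxt2 in h_bases:
            counts["CHH"] += 1
        # Cytosines too close to the sequence end (or next to an ambiguous
        # base such as N) cannot be classified and are not counted.
    return counts


# Hypothetical 12-base example sequence.
example_counts = cytosine_contexts("ACGTCAGCTTCC")
```

A full analysis would also have to classify cytosines on the reverse strand (i.e., guanines on the forward strand), which this sketch omits for brevity.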

We were also the first group to find significant amounts of 5hmC in bees (Cingolani et al., 2013). 5hmC is an oxidized derivative of 5mC, and is presumably produced by the honeybee ortholog of the ten-eleven-translocation (TET) protein, a dioxygenase that converts 5mC to 5hmC and is involved in epigenetic reprogramming in mammals (reviewed in Chia et al., 2011). Wojciechowski et al. (2014) recently confirmed the presence of 5hmC in honeybees and characterized the enzymatic function of the TET enzyme. Because of the uncertainty of whether 5hmC is a stable epigenetic mark, as some investigators believe (including us), or a transient DNA modification in the de-methylation pathway, as most investigators believe, we will not discuss 5hmC further in this review but will await future clarification on this topic.

Genome sequencing of the honeybee showed that it has an unusual genome structure that we believe facilitates the generation of metastable epialleles (Elango et al., 2009). In honeybees, there are two types of genes based on CG content in exons (**Figure 1C**). Highly expressed, so-called housekeeping genes, which are expressed in all cells, have a lower CG content than low-expressed genes. This bimodal distribution of CG content in genes, which are called isobars, was first observed by a bioinformatics analysis of the newly sequenced honeybee genome (Jorgensen et al., 2007). The discovery of isobars in the honeybee genome was made prior to the mapping of the 5mC sites by whole-genome shotgun bisulfite sequencing by our laboratory and several other laboratories (Lyko et al., 2010; Zemach et al., 2010; Chen et al., 2011; Cingolani et al., 2013). Sodium bisulfite converts C to uracil (U) unless it is methylated (5mC), and whole-genome shotgun bisulfite sequencing is used to map all of the 5mC sites in the genome (Xi and Li, 2009). Interestingly, all of the groups that performed whole-genome shotgun bisulfite sequencing to map the 5mC distribution in honeybees found, at first impression paradoxically, that the low-CG content genes have much more DNA methylation than the high-CG content genes (Lyko et al., 2010; Zemach et al., 2010; Chen et al., 2011; **Figure 1C**). We additionally found that CHH DNA methylation is also greater in the low-CG content genes than in the high-CG content genes. We proposed, since there is not a bimodal distribution of CHH sequences, that the same DNA methyltransferases (i.e., DNMT1 and DNMT3) methylate both CG and CHH sequences in a manner that is directly proportional to the level of gene expression (Cingolani et al., 2013). **Figure 2** shows an example of a high-CG content, low-5mC gene (Ubx, **Figure 2A**) and a low-CG content, high-5mC gene (Actin, **Figure 2B**). Both Ubx and Actin will be discussed as examples throughout this perspective.
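The logic of bisulfite mapping described above can be illustrated with a minimal Python sketch. It assumes complete conversion, a single strand, and a read that aligns to the reference without gaps; the function names, the example sequence, and the methylated position are hypothetical and are not taken from any of the cited pipelines.

```python
def bisulfite_convert(seq, methylated_positions):
    """Simulate complete bisulfite conversion of one DNA strand.

    Unmethylated C is converted to U, which is read as T after amplification;
    cytosines at the given 0-based positions are treated as 5mC and stay C.
    """
    methylated = set(methylated_positions)
    converted = []
    for i, base in enumerate(seq.upper()):
        if base == "C" and i not in methylated:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def call_methylated_sites(reference, converted_read):
    """Return 0-based positions where a reference C is still read as C."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference.upper(), converted_read))
        if ref_base == "C" and read_base == "C"
    ]


# Hypothetical example: only the cytosine at position 1 is methylated.
reference = "ACGTCCGA"
read = bisulfite_convert(reference, methylated_positions=[1])
sites = call_methylated_sites(reference, read)
```

In real data, incomplete conversion and sequencing errors also leave cytosines in reads, which is one reason why reads with many apparent CHH methylation events are sometimes discarded by default filters.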

To reiterate, low-CG content genes have more 5mC than high-CG content genes. This is counter-intuitive because it indicates that the greater the CG content, the less the DNA methylation, despite there being more cytosines (specifically, CpG sites) to methylate. However, high-CG content genes having low DNA methylation makes biological sense for the same reason that CpG islands (by definition, with high CG content) have low DNA methylation. The biological sense is based on the fact that 5mC has a much higher (up to 10-fold) mutation rate to thymidine (T) than non-methylated cytosine (Rakyan et al., 2001). Therefore, the more highly expressed genes would have more 5mC (**Figure 1B**), and, consequently, more of the cytosines would become thymidine. In highly expressed genes, the CG-content would thus be expected to become lower and lower as more and more CGs are converted to TGs. The reason for the higher mutation rate of 5mC-to-T compared with C-to-T is that 5mC spontaneously deaminates at the 4-position to form T, which is a natural DNA base. However, unmethylated C deaminates to U, which is normally not present in DNA, and there are enzymes [specifically uracil *N*-glycosylase (UNG)] to remove the U bases in DNA (Rakyan et al., 2001). The diagrams in **Figures 1A,B** are a simplification for clarity purposes because the most highly expressed genes, which we will call "ultra-high," usually have less DNA methylation than the medium and highly expressed genes in the gene bodies in both insects and mammals. This might be because the ultra-high expressed genes may have lost so many of their CpGs that there are not enough remaining to allow them to enter the most highly methylated class – in other words, the amount of DNA methylation that can occur in genes is saturated and peaks before it reaches equilibrium. The genetic code for certain amino acids and intra-exon RNA-splicing enhancers requiring CGs in their consensus sequences are likely two additional reasons for retaining a few CGs in housekeeping genes.
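The expected depletion of CGs in methylated genes can be made concrete with a toy calculation. The sketch below assumes that each CpG is lost independently at a constant per-generation rate, that the rate for 5mC is a fixed multiple (here 10-fold, the upper bound quoted above) of the rate for unmethylated C, and that no new CpGs arise and no selection acts. The function name and the baseline rate of 1e-8 per site per generation are hypothetical placeholders, not measured honeybee values.

```python
def cpg_fraction_remaining(generations, methylated_fraction,
                           base_rate=1e-8, fold=10.0):
    """Expected fraction of CpG sites surviving after a number of generations.

    base_rate: assumed per-generation loss rate of an unmethylated CpG
               (hypothetical placeholder value).
    fold:      assumed rate increase for a methylated CpG (5mC to T).
    methylated_fraction: fraction of CpGs methylated in the germline (0 to 1).
    """
    per_site_rate = (methylated_fraction * fold * base_rate
                     + (1.0 - methylated_fraction) * base_rate)
    return (1.0 - per_site_rate) ** generations


# Compare a fully methylated gene with an unmethylated gene
# over ten million generations.
methylated_gene = cpg_fraction_remaining(10**7, methylated_fraction=1.0)
unmethylated_gene = cpg_fraction_remaining(10**7, methylated_fraction=0.0)
```

Under these assumptions the fully methylated gene retains roughly a third of its CpGs while the unmethylated gene retains about nine tenths, which is the qualitative pattern described in the text; the absolute numbers depend entirely on the assumed rates.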


**FIGURE 2 | Examples of high CG and low CG genes. (A)** *Apis mellifera* Ubx has 97 CGs in the coding region of a 993 base pair cDNA. The 5mC level of high-CG content genes, such as Ubx, is low. **(B)** *A. mellifera* actin has 40 CGs in the coding region of a 1131 base pair cDNA. The 5mC level of low-CG content genes, such as actin, is high.
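The difference in CG content between the two genes in Figure 2 can be expressed as a density using only the counts and lengths given in the caption. The following Python sketch does this arithmetic; the function names are hypothetical, and `count_cpg` is included only to show how such counts could be obtained from a sequence string (the cDNA sequences themselves are not reproduced here).

```python
def count_cpg(seq):
    """Count CG dinucleotides on the given strand of a sequence."""
    return seq.upper().count("CG")


def cpg_per_100bp(cpg_count, length_bp):
    """CG dinucleotides per 100 base pairs."""
    return 100.0 * cpg_count / length_bp


# Counts and lengths as stated in the Figure 2 caption.
ubx_density = cpg_per_100bp(97, 993)     # high-CG content gene
actin_density = cpg_per_100bp(40, 1131)  # low-CG content gene
density_ratio = ubx_density / actin_density
```

With these numbers, Ubx has about 9.8 CGs per 100 bp and actin about 3.5, so the Ubx coding region is nearly three times as CG dense as that of actin.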

As discussed in the next section, differential gene-body methylation might be a contributing factor to the emergence of eusociality. However, the bimodal distribution of CG content seems to be less of a contributor to eusociality than the bimodal distribution of DNA methylation. Bees, wasps, and ants all have bimodal distributions in DNA methylation in genes, but only bees and wasps have a bimodal distribution in CG content. In all three eusocial insects – bees, wasps and ants – the highly expressed genes are generally more methylated than the low-expressed genes (Sarda et al., 2012). Sarda et al. (2012) studied the evolution of gene-body DNA methylation in invertebrates and showed that the silkworm (*Bombyx mori*), which has DNA methylation at appreciable levels in the genome, nevertheless does not have a bimodal peak of CG content in genes. This is similar to our finding of a unimodal peak of CHH sites but a bimodal peak of DNA methylation based on CG content, discussed earlier (Cingolani et al., 2013).

Interestingly, the silkworm has a bimodal peak in DNA methylation levels similar to the honeybee, in which highly expressed genes have higher levels of DNA methylation in the gene body. The unimodal peak in CG content but bimodal peak in DNA methylation levels seen in the silkworm genome also occurs in all ant species studied so far (Glastad et al., 2011; Bonasio et al., 2012; Bonasio, 2015). The bimodal peak in CG content in genes is not unique to the honeybee, however, because it is also seen in other invertebrates such as the sea anemone (*Nematostella vectensis*) and the sea squirt (*Ciona intestinalis*; Sarda et al., 2012). We conclude that while bimodal peaks in CG content and DNA methylation might facilitate the formation of metastable epialleles, they are not essential for the generation of metastable epialleles. In the next section, we explore the possibility that metastable epiallele hyper-mutability, a key component of the EDGE hypothesis, is an emergent property of bimodal levels of DNA methylation in eusocial insects.

#### **METASTABLE EPIALLELE HYPER-MUTABILITY MIGHT BE AN EMERGENT PROPERTY OF BIMODAL LEVELS OF DNA METHYLATION**

*The movement from low-level rules to higher level sophistication is what we call emergence (Johnson, 2001).*

The above quote is from Johnson's (2001) best-selling book, "Emergence: the connected lives of ants, brains, cities, and software." In the book, Johnson (2001) describes how a simple behavior, such as an increasing number of ants following a weak-and-winding scent trail laid down by one ant to a food supply, can lead to a complex behavior, such as all of the ants following a direct path to the food. Eusocial insects show many other examples of bottom-up behavior where workers follow simple rules that emerge into complex hive behaviors (Johnson, 2001). However, in contrast to human societies, there are few if any top-down behaviors in eusocial insects. For example, as mentioned above, the queen is best characterized as the "reproductive organ" in the hive and does little to influence the behaviors of the worker sub-castes (Johnson, 2001), who themselves follow simple rules that are programmed into their genomes and epigenomes. We believe that the differential methylation of genes based on the level of gene expression is just such a simple rule that can lead to complex emergent phenomena, such as metastable epiallele hypermutability and, ultimately, eusociality.

We hypothesize that an emergent property of low-expressed genes having low levels of DNA methylation is that they become more susceptible to epigenetic control, for the simple reason that they have more unmethylated cytosines. Highly expressed genes with high levels of DNA methylation can also potentially become metastable epialleles, but this would require differential demethylation, such as by TET enzymes, in the germline cells after a stress response. In another review we presented a model for how oxidative stress can alter the function of the TET enzyme (Chia et al., 2011). However, what is the normal function(s) of gene body DNA methylation? In addition to preventing intragenic and antisense transcription within genes, mentioned above, one process that we and others have shown evidence to be regulated by gene body DNA methylation is alternative mRNA processing. For example, DNA methylation of cassette exons, at both CpG and CHH sites, correlates with their preferential exclusion in the mature mRNA (Lyko et al., 2010; Cingolani et al., 2013; **Figure 1D**). Furthermore, Li-Byarlay et al. (2013) have shown that RNA interference (RNAi) knockdown of DNMT3a, the *de novo* DNA methyltransferase, alters RNA splicing and causes intron retention in hundreds of genes in the honeybee fat bodies. How DNA methylation affects alternative mRNA splicing is not known in bees, but in mammals, DNA methylation inhibits the binding of the transcription factor CCCTC binding factor (CTCF), which affects alternative splicing (Shukla et al., 2011). We speculate that there might be some biophysical processes involved too, since methylated DNA has a higher melting temperature (Tm) than unmethylated DNA (Severin et al., 2011). Therefore, the increased Tm of methylated DNA might alter RNA polymerase translocation rates, cause pausing, and thereby affect the alternative mRNA splicing pattern.

One interesting observation is that most insects, such as honeybees, have relatively large amounts of DNA methylation (but much less than mammals), but *Drosophila* has very little DNA methylation (Lyko et al., 2000; Lyko, 2001). The reason for the scarcity of DNA methylation in *Drosophila* is that *Drosophila* appears to have lost Dnmt1, the maintenance DNA methyltransferase, which methylates hemimethylated DNA after replication, and Dnmt3, the *de novo* DNA methyltransferase, which methylates unmethylated DNA. The existence of DNA methylation in *Drosophila* is controversial because the only cytosine methyltransferase ortholog in *Drosophila* is a homolog of DNA methyltransferase 2 (MT2), but this enzyme was shown to methylate transfer-RNA-Asp (tRNA<sup>Asp</sup>) and presumably not DNA (Goll et al., 2006). However, the controversy appears to be resolved (at least to some in the field) by a recent paper that shows CHH methylation, albeit at very low levels, in *Drosophila* in a manner that is independent of MT2 (Capuano et al., 2014). The authors were able to detect low levels of 5mC in *Drosophila* embryos in a two-step protocol of first immunoprecipitation of DNA with anti-5mC antibodies, followed by bisulfite sequencing of the immunoprecipitated DNA fragments (Capuano et al., 2014). Our laboratory has similar evidence for low levels of 5mC in *Drosophila* and we speculate that it is generated non-enzymatically by spontaneous methylation of cytosines by intrinsic alkylation of DNA.

We speculate that Dipterans (flies) and Coleopterans (beetles) lost DNA methyltransferases 1 and 3 because DNA methylation is redundant with histone modifications, such as H3K9me3 and H3K27me3, in repressing gene expression. Furthermore, we speculate that methylated DNA slows down DNA replication because of the higher melting temperature (Tm) of methylated DNA compared with unmethylated DNA (Severin et al., 2011), which we mentioned earlier in the discussion of mRNA splicing. The predicted slowing down of DNA replication by DNA methylation is important in *Drosophila* because the first 10 syncytial nuclear divisions in the blastoderm embryo are in "hyper-drive" and are less than 8–10 min in duration (a world record, to our knowledge). Therefore, any process that slows down these rapid divisions would presumably be selected against because the faster-developing siblings would breed sooner (Ruden and Jackle, 1995).

#### **EPIGENETIC DIRECTED GENETIC ERRORS AND THE EVOLUTION OF CASTES IN HONEYBEES**

Macroevolution requires selection of existing genetic variation to generate new species with greater fitness, but how does the subcaste worker specialization increase when the effective population size of eusocial insects is so low (i.e., only one reproductive female per hive)? We mentioned group selection and kin-selection models at the beginning of this review, but they remain controversial in light of Dawkins's (1976) "selfish gene" hypothesis. Dawkins (1976) argued that "selfish genes" that benefit the immediate survival and propagation of the "vessel" (the organism) would have much greater (and more immediate) selective advantage than altruistic genes that benefited the group. We speculate again, as we did in several other reviews, that one possible mechanism to facilitate genetic variation in the evolution of species is what we call the EDGE hypothesis (Ruden, 2005; Ruden et al., 2005a, 2008; Ruden and Lu, 2008).

In the simplest version of the EDGE hypothesis, the first step is the intra-caste selection of metastable epialleles that increase the specialization of a worker. The metastable epialleles could initially be generated by a stressful (i.e., non-optimal) environment, which would lead to a functional inactivation of Hsp90 (Rutherford et al., 2007a), which is a chaperone for many chromatin remodeling proteins (Ruden and Lu, 2008), including the Trithorax (Trx) protein (Tariq et al., 2009; **Figures 3A,B**). Hsp90 has been called a "capacitor for morphological evolution" because many previously cryptic phenotypes are revealed when stress inactivates Hsp90 protein and this alters multiple signaling pathways (Rutherford and Lindquist, 1998; McLaren, 1999; Rutherford and Henikoff, 2003; Rutherford et al., 2007a,b). The Trx protein, since it is a client for Hsp90, is an environmentally sensitive component of the Trx Group (TrxG) complex of proteins that is involved in maintaining transcriptional memory (i.e., activation) of the Hox genes, such as the Ultrabithorax (Ubx) gene during early embryogenesis in insects (Orlando et al., 1998). One of the enzymatic functions of the TrxG complex is trimethylation of histone 3 at lysine 4 (H3K4me3), which is an activating mark for transcription (Jenuwein and Allis, 2001).

When Hsp90 is inactivated in a stressful environment, Hox genes such as Ubx would have lower expression, presumably because there would be fewer H3K4me3 histone marks at the promoters. Since stress inactivates the Trx protein (**Figure 3B**), stress would be expected to cause an increase in the DNA methylation status of the Ubx gene. The reason for this is that, in the absence of the Trx protein, the gene would no longer be in an activated state but would switch to a state repressed by the Polycomb Group (PcG) repressor proteins (Paro et al., 1998). It is not known whether this occurs in bees, but in mammals, genes that are initially repressed by PcG proteins are often further repressed by intragenic DNA methylation during cellular differentiation (Deaton et al., 2011). We speculate that Ubx would have originally become a metastable epiallele in the proto-queen, who still has pollen baskets, because full pollen baskets could immobilize her, and hence stress her, in the confines of the hive. In the EDGE hypothesis, the repair of base substitutions caused by methylated cytosines increases the mutation frequency not only of the methylated cytosine, as mentioned above (Rakyan et al., 2001), but also of neighboring bases because of error-prone DNA repair mechanisms (Ruden, 2005; Ruden et al., 2005a,b). This error-prone DNA repair could lead to an increase in the mutation frequency of genes with metastable epialleles, such as in the Ubx gene (**Figure 4D**). Through this "mutation-spreading" effect, the metastable epialleles could cause an increase in the mutation frequency not only of the exons, but also of regulatory sequences in the adjacent promoters and introns. In other words, simply by becoming a metastable epiallele, the EDGE hypothesis predicts that the mutation frequency of a gene would increase. Fortuitously, genes with increased mutation frequencies are precisely those that need to be mutated to stabilize the metastable epialleles in a genetic manner.

How EDGE mutations generated in sterile workers are transmitted to the next generation in honeybees is a major issue that warrants discussion. One possible mechanism for transmitting the EDGE mutations to the next generation could be through honeybee workers who develop ovaries and become fertile after queen removal, as mentioned above (Feldmeyer et al., 2014). They could then directly transfer the mutations (as well as the metastable epialleles) to their offspring. Those mutations that are beneficial to the hive by stabilizing the metastable epialleles would have a selective advantage for the whole hive and would thereby be selected by group selection. An important consideration is that fertile workers only have drone progeny (i.e., haploid males), whereas queens have both drone and worker progeny. This would necessitate that the metastable epialleles be transmitted through the male germline in the offspring of fertile workers. However, it is not clear whether worker-to-fertile-female conversions are frequent enough to explain the evolution of sterile-worker specializations.

Another possible mechanism for the transfer of EDGE mutations to the next generation is that the queen can transmit EDGE mutations to her offspring directly, without having to go through a worker-to-fertile-female conversion process. Metastable epialleles have to be in either the queen or the worker (by definition), but affect them in different manners. Therefore, the genes that become metastable epialleles would be predicted to have a higher mutation rate by the EDGE process. Some of these mutations would not affect the queen (and therefore reproduction), but might stabilize the metastable epialleles that affect the workers. Support for the EDGE hypothesis is the fact that queen-specific genes mutate faster than worker-specific genes (Hunt et al., 2010; Helantera and Uller, 2014). This makes sense since queens breed much more frequently than fertile workers, which, as mentioned above, only occur when the queen is removed from the colony (Feldmeyer et al., 2014). Queens would therefore have a greater opportunity than fertile workers to transmit both metastable epialleles, and mutations in the metastable epialleles, to the offspring.

#### **FIGURE 3 | Epigenetic control of development of the pollen basket in worker bees. (A)** In unstressed conditions, Hsp90 is functional and activates Trithorax (Trx) through its chaperone activity. The Trx group (TrxG) proteins tri-methylate histone 3 lysine 4 (H3K4me3) on the promoter nucleosomes (N) (red dots), and increase expression of Hox genes such as Ubx. Transcriptional activation of Ubx in bees increases the DNA methylation of the gene body (red dots), as shown in **Figure 1B**. In the epigenetic directed genetic error (EDGE) hypothesis, the repair of base substitutions caused by methylated cytosines increases the mutation frequency not only of the methylated cytosine but also of neighboring bases. This could lead to an increase in the mutation frequency of genes with metastable epialleles, such as the Ubx gene. This figure is modified from a previous review from our laboratory and we retain the copyright (Ruden, 2011). **(B)** In stressed conditions, Hsp90 is inactive and cannot activate Trx, and transcription of Ubx is low. **(C)** Diagram of an empty and full pollen basket in forager bees. This diagram is used with permission from the Encyclopedia of Science, Copyright © The Worlds of David Darling (http://www.daviddarling.info/). **(D)** In queens, the pollen basket does not form because Ubx is low in T3. This causes an anterior transformation of the third thoracic (T3) leg to look like the T2 leg. **(E)** In workers, the pollen basket forms because Ubx expression is high in T3. These panels are a simplified representation of the homeotic transformation that occurs when Ubx levels are reduced and are not meant to be accurate illustrations. This photograph is used with permission from Spike Walker, Wellcome Images, London (http://wellcomelibrary.org/).

… by Trx-switching-to-PcG, described in **Figure 2A**. **(B)** Left, few 5mCs in the …

#### **EPIGENETIC DIRECTED GENETIC ERRORS AND THE EVOLUTION OF POLLEN BASKETS IN HONEYBEE WORKERS**

A recent paper that we believe supports the EDGE hypothesis for intra-caste evolution in honeybees discusses the dimorphism in pollen basket formation in genetically similar queens and workers. This fascinating paper shows that the Hox gene Ubx, mentioned throughout this perspective, promotes pollen basket formation on the tibia of the hind legs of bees, which belong to the third thoracic segment (T3) (Medved et al., 2014). The pollen basket is a hollow indentation on the large and mostly bristle-free tibia segment that the forager bees use to store and transport impressive amounts of pollen (**Figure 3C**). In the queen, who does not collect pollen, the tibia is covered with hairs that would otherwise inhibit pollen collection. The investigators showed that reduction of Ubx levels in the workers by injecting inhibitory RNA (RNAi) into the worker embryos caused the hind legs to resemble those of the queens and become reduced in bristles (Medved et al., 2014). In *Drosophila*, mutations in Ubx, combined with other mutations in the bithorax complex (BXC), produced the famous four-winged fly that won Edward Lewis the 1995 Nobel Prize in Physiology or Medicine (Crow and Bender, 2004). Normally, *Drosophila* has one pair of wings on the second thoracic segment (T2) and one pair of halteres (balancer organs that counteract the wing movement) on the third thoracic segment (T3). Lewis (1978) explained the Ubx phenotype as an anterior homeotic transformation of T3 to T2; hence, the famous four-winged fly. Ubx has a conserved 60 amino acid homeobox (Hox) domain, which is nearly identical from *Drosophila* to humans, and a highly variable transcriptional regulatory domain. Hox genes, such as Ubx, not only regulate segmentation during embryogenesis, but also affect subtle changes in limb, brain, and other organ development.

Medved et al. (2014) found that mutations in the Hox gene Ubx cause complex fate decisions in each segment of the honeybee T3 legs. A simplification of the results of the Ubx-RNAi experiments in honeybees is that the third leg undergoes a partial homeotic transformation to the second leg, similar to the T3-to-T2 homeotic transformation seen in *Ubx*-mutant flies (**Figures 3D,E**). As mentioned earlier, the way the nurse honeybee controls pollen basket development in workers is by withholding royal jelly. In honeybees, the targets of Ubx are not known, but the authors speculated on what might be occurring in honeybees, based on what is known in the much better characterized *D. melanogaster* genetic system. In honeybee queens, which are fed royal jelly as larvae, the HDACi activity in the royal jelly could possibly help to activate expression of the likely Ubx-target genes, such as grunge (gug) and Ataxin-2 (Atx2), which play a role in the formation of bristles in *Drosophila* (Erkner et al., 2002; Al-Ramahi et al., 2007). Consequently, the authors speculate, this might be one reason why the T3 tibia segments in queens have bristles in the area of the pollen basket, while those of workers do not (Medved et al., 2014).

#### **EPIGENETIC DIRECTED GENETIC ERRORS IN NON-CG DINUCLEOTIDES IN METASTABLE EPIALLELES**

In the EDGE hypothesis, we propose that methylated cytosines are mutagenic not only at the 5mC sites but also in the surrounding bases. We propose this broader range of mutagenicity because error-prone DNA repair mechanisms can increase the mutation frequency of surrounding bases while repairing 5mC>T base substitution mutations. Metastable epialleles, which have variable levels of 5mC, can occur in both somatic cells and germline cells, but they are generally referred to as simply "differentiated cells" when they occur in somatic cells. When metastable epialleles occur in somatic cells, they cannot be transmitted to the progeny. However, when a metastable epiallele occurs in a germline cell, it can be transmitted to the progeny, as we and others have demonstrated in *Drosophila* (Sollars et al., 2003; Tariq et al., 2009).

It is not yet known whether there is a bimodal distribution of 5mC in bee germline cells, but for the sake of argument, let us assume for this perspective that it is similar to what occurs in somatic cells, i.e., housekeeping genes have low CG-content and high levels of 5mC and low-expressed genes have high CG-content and low levels of 5mC (**Figure 4A**). Therefore, housekeeping genes, such as Actin (**Figure 4A**, left), would never be metastable epialleles because their few CGs are always heavily methylated, i.e., there can be no differential 5mC if it is always high. In contrast, low-expressed genes, such as Ubx, which is presumably not expressed at all in germline cells, would have high CG content but very little 5mC (**Figure 4A**, middle). We hypothesize that maternal stress can increase the DNA methylation in low-expressed genes, such as Ubx, and turn them into metastable epialleles (**Figure 4A**, right). This has not yet been demonstrated in any organism, but it should be possible to test this hypothesis in the laboratory once single-cell epigenomics techniques are further optimized (Teles et al., 2014).
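The assumed relationship between CG content and methylation can be made concrete with the CpG observed/expected ratio, a sequence-based measure that is commonly used as a proxy for historical germline methylation. It is not part of the analysis presented in this perspective. The sketch below assumes the usual normalization, (number of CG dinucleotides × sequence length) / (number of C × number of G); the function name `cpg_obs_exp` and the toy sequences are hypothetical and serve only as illustration.

```python
def cpg_obs_exp(seq: str) -> float:
    """Return the CpG observed/expected ratio of a DNA sequence.

    Assumed normalization: (CG count * length) / (C count * G count).
    Returns 0.0 when the sequence contains no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cg_count * len(seq) / (c_count * g_count)


# Toy sequences (made up): one rich in CG, one in which every CG has become TG.
cg_rich = "ACGTACGT"
cg_depleted = "ATGTATGT"

print(cpg_obs_exp(cg_rich))      # CG dinucleotides retained
print(cpg_obs_exp(cg_depleted))  # CG dinucleotides lost
```

Under the assumption stated in the text, heavily methylated housekeeping genes would tend toward low ratios, and sparsely methylated low-expressed genes such as Ubx would tend toward high ratios.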

In housekeeping genes, such as Actin, an increase in mutations at and around the 5mC sites would still be expected. However, since housekeeping genes are under a great deal of purifying selection, any deleterious mutations in such important structural genes would be selected against (**Figures 4B,C**, left). Also, the 5mC rate in housekeeping genes is so high that CG-to-TG changes have probably already reached a maximum, so that no further such mutations can occur without causing deleterious structural or regulatory changes to the gene. In contrast, in low-expressed genes, such as Ubx, there would be mutations in CG-sites that do not undergo as much purifying selection (**Figures 4B,C**, right). As mentioned above, while the Ubx Hox domain is a 60 amino acid sequence that is almost absolutely conserved from *Drosophila* to humans (Scott, 1986), the remaining amino acids, such as those in the transcriptional regulatory domains, are amongst the most variable sequences in proteins (Ruden et al., 1991; Ruden, 1992).

Several questions about the Ubx-mutagenesis hypothesis for pollen basket formation in honeybees arose during review of this manuscript. First, "How did Ubx change its biological role without affecting fitness?" Second, "Is it possible that Ubx regulates both body plan and caste differentiation in honeybee but not in solitary insects?" Third, "Could some other gene(s) compensate for the supposed 'functional loss' of Ubx in honeybee?" To answer these questions, we do not believe that the mutations in Ubx would necessarily affect fitness by causing a "functional loss." Rather, we believe that the mutations in Ubx were most likely regulatory mutations in the promoter and introns and that they represent a functional gain rather than a functional loss. Developmental genes have large and complex regulatory regions, such as individual enhancers for each of the eight stripes in segmentation genes such as fushi tarazu (Ohtsuki et al., 1998). The Hox genes in *Drosophila*, such as Ubx and Antennapedia (Antp), have enhancer regions tens or even hundreds of kilobases from the promoter regions (Calhoun et al., 2002; Calhoun and Levine, 2003). The Ubx gene in *Drosophila* has a complex array of alternatively spliced products and an unusual mechanism for splicing the 74 kb intron that involves multiple steps of re-splicing the intron (Hatton et al., 1998). This re-splicing mechanism avoids competition between distant splice sites and allows removal of the 74 kb intron as a series of smaller RNA fragments (Hatton et al., 1998). The diverse array of transcriptional and RNA splicing regulatory sequences should allow Ubx to evolve multiple additional roles in caste formation without the need for other genes to compensate for its proposed "functional loss."

#### **CONCLUSION**

We propose an intra-caste model of evolution that is based on selection of metastable epialleles in worker bees and that runs parallel to the macro-evolution and group selection of DNA mutations. As in the mythical world of Jaynus, the evolution of the most-fit sub-caste members occurs through the selection of metastable epialleles by group selection. However, our EDGE hypothesis expands upon the limited world of Jaynus, in which all of the organisms have exactly the same sequence, by proposing a mechanism to direct mutations to the metastable epialleles that were selected. These directed mutations can, in turn, stabilize and increase the penetrance of the metastable epialleles in future generations of superorganism colonies. Waddington (1942), who is often considered the father of epigenetics, proposed a mechanism similar to the EDGE hypothesis for the inheritance of acquired characteristics that were induced by stress. In follow-up experiments, in response to Waddington, we provided a possible epigenetic mechanism for how stress can reveal previously cryptic phenotypic information by the inactivation of Hsp90 (Ruden et al., 2003; Sollars et al., 2003). Finally, the EDGE hypothesis presented here provides a possible mechanism for the stabilization of metastable epialleles, thereby allowing the evolution of castes and sub-castes in eusocial insects.

#### **ACKNOWLEDGMENTS**

This research was supported by R01 ES012933 and R21 ES021893 and the WSU-NIEHS Center (P30 ES020957). We thank Greg Hunt for excellent editorial assistance and much needed expertise on eusocial insects. DMR, who worked as an undergraduate in Dr. Edward B. Lewis' laboratory in 1980–1981, dedicates this review to his memory.

#### **REFERENCES**

Al-Ramahi, I., Perez, A. M., Lim, J., Zhang, M. H., Sorensen, R., de Haro, M., et al. (2007). DAtaxin-2 mediates expanded Ataxin-1-induced neurodegeneration in a *Drosophila* model of SCA1. *PLoS Genet.* 3:e234. doi: 10.1371/journal.pgen.0030234


Fairbanks, D. J., and Rytting, B. (2001). Mendelian controversies: a botanical and historical review. *Am. J. Bot.* 88, 737–752. doi: 10.2307/2657027


genetic errors in repeat-containing proteins (RCPs) involved in evolution, neuroendocrine signaling, and cancer. *Front. Neuroendocrinol.* 29:428–444. doi: 10.1016/j.yfrne.2007.12.004


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 November 2014; accepted: 08 February 2015; published online: 24 February 2015.*

*Citation: Ruden DM, Cingolani PE, Sen A, Qu W, Wang L, Senut M-C, Garfinkel MD, Sollars VE and Lu X (2015) Epigenetics as an answer to Darwin's "special difficulty," Part 2: natural selection of metastable epialleles in honeybee castes. Front. Genet. 6:60. doi: 10.3389/fgene.2015.00060*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Ruden, Cingolani, Sen, Qu, Wang, Senut, Garfinkel, Sollars and Lu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

#### *Erik M. K. Rasmussen1\* and Gro V. Amdam1,2*

<sup>1</sup> Department of Chemistry, Biotechnology and Food Science, Faculty of Veterinary Medicine and Biosciences, Norwegian University of Life Sciences, Aas, Norway

<sup>2</sup> School of Life Sciences, Arizona State University, Tempe, AZ, USA

#### *Edited by:*

Greg J. Hunt, Purdue University, USA

#### *Reviewed by:*

Bernardo Lemos, Harvard University, USA Amy Lynn Toth, Iowa State University, USA

#### *\*Correspondence:*

Erik M. K. Rasmussen, Department of Chemistry, Biotechnology and Food Science, Faculty of Veterinary Medicine and Biosciences, Norwegian University of Life Sciences, Christian Magnus Falsens vei 1, N-1432 Aas, Norway e-mail: erik.rasmussen@nmbu.no

Epigenetic changes enable genomes to respond to changes in the environment, such as altered nutrition, activity, or social setting. Epigenetic modifications, thereby, provide a source of phenotypic plasticity in many species. The honey bee (*Apis mellifera*) uses nutritionally sensitive epigenetic control mechanisms in the development of the royal caste (queens) and the workers. The workers are functionally sterile females that can take on a range of distinct physiological and/or behavioral phenotypes in response to environmental changes. Honey bees have a wide repertoire of epigenetic mechanisms which, as in mammals, include cytosine methylation and hydroxymethylated cytosines, together with the enzymatic machinery responsible for these cytosine modifications. Current data suggest that honey bees provide an excellent system for studying the "social repertoire" of the epigenome. In this review, we elucidate what is known so far about the honey bee epigenome and its mechanisms. Our discussion includes what may distinguish honey bees from other model animals, how the epigenome can influence worker behavioral task separation, and how future studies can answer central questions about the role of the epigenome in social behavior.

**Keywords: honey bee, methylation, demethylation, 5-hydroxymethylcytosine, social behavior**

#### **INTRODUCTION**

Since the first honey bee methylome was sequenced in 2010, our understanding of the functional implications of DNA methylation in the honey bee has begun to unfold (Lyko et al., 2010). 5-methylcytosine (5mC) is believed to be involved in alternative splicing, caste differentiation and worker behavioral task separation (Lyko et al., 2010; Flores et al., 2012; Herb et al., 2012). Recently, several other cytosine modifications were discovered in mammalian genomes (Kriaucionis and Heintz, 2009; He et al., 2011; Ito et al., 2011). These modifications are believed to have separate functions from 5mC as they are distributed differently in the genome, and specific reader proteins for one of these modifications exist (Spruijt et al., 2013). Although studies to investigate cytosine modifications other than 5mC in bees have been performed, little is known about their functions and distributions (Cingolani et al., 2013; Wojciechowski et al., 2014). Here we review cytosine modifications and the enzymatic machinery responsible for their generation in different model organisms.

#### **HONEY BEES**

Nutritional cues lead female honey bee larvae into one of two developmental trajectories. The larvae either develop into a queen or into a worker (Winston, 1991). Queens are larger, highly fecund and long-lived (years), while the smaller workers are functionally sterile and shorter lived (weeks, months). Workers show a flexible physiological and behavioral progression that typically starts with care behavior toward siblings (nursing) and culminates in food collection (foraging) weeks later. Nursing is associated with enhanced somatic maintenance and slower aging than foraging (Münch and Amdam, 2010). Yet, foragers can return to nursing tasks, and this behavioral reversion can put age-associated cognitive decline in reverse as well (Baker et al., 2012).

Honey bees, in other words, display a wide range of phenotypes that include complex social caste development and behavior, behavioral shifts, and plasticity of aging. Epigenetic mechanisms have already been found to likely play major roles in queen-worker development as well as in worker behavioral progression and reversion (Kucharski et al., 2008; Spannhoff et al., 2011; Herb et al., 2012). These findings put the honey bee forward as a very interesting study organism for investigating the interplay between the social milieu and the epigenome. The value of the honey bee for complex epigenetic research is, furthermore, not diminished by the mainstream models fruit fly (*Drosophila melanogaster*) and nematode (*Caenorhabditis elegans*), since they do not have the full complement of the mammalian epigenetic machinery (**Table 1**).

#### **EPIGENETIC MACHINERY**

DNA methyltransferases (DNMTs) are enzymes that add a methyl group to the 5-carbon of the DNA base cytosine from the donor *S*-adenosyl methionine (Law and Jacobsen, 2010). DNMT1 is the "maintenance" DNMT that copies the methylation pattern to the newly synthesized strand during DNA replication. DNMT3 is the *de novo* methyltransferase that can methylate specific loci independently of replication. DNMT2 is primarily an RNA methyltransferase that methylates tRNA<sup>Asp</sup> (Goll et al., 2006); however, DNA methylation activity has been shown *in vivo* in the fruit fly (Phalke et al., 2009). The *de novo* and the maintenance DNMTs are found in a range of species including honey bees, mammals, aphids, and jewel wasps (**Table 1**). They are catalytically active in the honey bee (Wang et al., 2006), while fruit fly and nematode only contain a single copy of DNMT2. Nevertheless, 5mC originating from DNA has been reported in the fruit fly in both embryos and adult flies (Lyko et al., 2000), suggesting that DNMT2 has some DNA methylation activity *in vivo*. The impact of 5mC in the fruit fly genome is still debated, however (Phalke et al., 2010; Schaefer and Lyko, 2010).

**Table 1 | Genomic copies of enzymes implicated in DNA methylation and demethylation and presence of epigenetically modified cytosines in select metazoan groups.**

Sources: (Kriaucionis and Heintz, 2009; Law and Jacobsen, 2010; Lyko et al., 2010; Walsh et al., 2010; Ito et al., 2011; Cingolani et al., 2013; Beeler et al., 2014) and assembled genomes available at http://blast.ncbi.nlm.nih.gov.

In mammals, the ten-eleven translocation (TET) enzyme is responsible for oxidizing 5mC to 5-hydroxymethylcytosine (5hmC), which in turn can be oxidized to 5-formylcytosine (5fC), and ultimately 5-carboxylcytosine (5caC) (Tahiliani et al., 2009; He et al., 2011; Ito et al., 2011). 5fC and 5caC are recognized by the thymine DNA glycosylase (TDG), which is a part of the base excision repair pathway of the mammalian cell (Maiti and Drohat, 2011). The TET enzyme together with TDG is probably central to the mammalian active demethylation pathway (Pastor et al., 2013). Mammalian genomes harbor multiple TET enzyme genes, while bees, fruit flies, aphids, and jewel wasps only have one (**Table 1**). The RNA expression levels of the different mammalian TET enzymes vary greatly between developmental stages and cell types. The honey bee TET catalytic domain is catalytically active *in vitro*, and transcription of the honey bee TET gene has been shown to vary between stages of development as well as between adult tissues (Wojciechowski et al., 2014). Interestingly, some species (including fruit fly) that contain only DNMT2 have well conserved TET orthologs, but their activity and function have not been deciphered (Dunwell et al., 2013).
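The stepwise oxidation described above can be summarized as a short lookup structure. The following is only a schematic sketch of the order of the mammalian pathway as stated in the text; the names `TET_OXIDATION`, `TDG_SUBSTRATES`, `oxidation_path`, and `tdg_recognizes` are hypothetical, and the code makes no claim about enzyme kinetics or about whether the same steps occur in honey bees.

```python
# Successive TET-mediated oxidation steps, as described for mammals.
TET_OXIDATION = {
    "5mC": "5hmC",
    "5hmC": "5fC",
    "5fC": "5caC",
}

# Modified cytosines recognized by thymine DNA glycosylase (TDG).
TDG_SUBSTRATES = {"5fC", "5caC"}


def oxidation_path(start: str = "5mC") -> list:
    """Follow successive oxidation steps from `start` until no step remains."""
    path = [start]
    while path[-1] in TET_OXIDATION:
        path.append(TET_OXIDATION[path[-1]])
    return path


def tdg_recognizes(base: str) -> bool:
    """Return True if the given cytosine form is a TDG substrate."""
    return base in TDG_SUBSTRATES


print(" -> ".join(oxidation_path()))
print([base for base in oxidation_path() if tdg_recognizes(base)])
```

Representing the pathway as a mapping rather than a fixed list makes it easy to start from an intermediate, for example `oxidation_path("5hmC")`.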

The honey bee genome encodes several core histone modifying enzymes, which are also part of the epigenetic machinery of the honey bees (The Honeybee Genome Sequencing, 2006). However, the impact of and the mechanisms behind histone modifications are beyond the scope of this review.

#### **5-METHYLCYTOSINE**

The distribution and relative abundance of 5mC vary significantly between mammals, honey bee and fruit fly (**Figure 1**). 5mC is primarily located in a CpG dinucleotide context within repeat sequences and in proximity of promoter areas in mammals (Law and Jacobsen, 2010), whereas in bees methylated CpGs are primarily located within genes (Lyko et al., 2010). However, 5mC can exist in a non-CpG dinucleotide context in both mammals and honey bees (Lister et al., 2009; Cingolani et al., 2013). In addition, the honey bee genome is much more sparsely methylated than mammalian genomes, thus reducing overall complexity and simplifying data analyses for studies conducted in bees. In the fruit fly genome, 5mC is located within a non-CpG dinucleotide context and seems to be distributed randomly within the genome at an abundance 3- to 100-fold less when compared to honey bees and mammals (Mandrioli and Borsatti, 2006; Phalke et al., 2009). *C. elegans*, on the other hand, does not contain 5mC in its genome (Simpson et al., 1986).

The effect of 5mC on transcription varies between metazoans and genomic context. In mammalian promoters, 5mC is principally a repressive mark, silencing transcription (Bird, 2002). On the other hand, 5mC within gene bodies in mammals, honey bees, and the fruit fly, does not influence transcription levels to the same extent (Mandrioli and Borsatti, 2006; Flores et al., 2012). In honey bees, 5mC within gene bodies rather plays a role in the generation of alternative splice variants on the genome-wide level (Flores et al., 2012; Foret et al., 2012; Li-Byarlay et al., 2013). This role is not clearly defined in mammalian cells, as the role of 5mC in gene bodies differs between cell types and depends on whether 5mC is in a CpG context or not (Lister et al., 2009). These findings make honey bees an attractive system for studies on how 5mC influences the generation of alternative transcripts.

5-methylcytosine is found in multiple cell types, tissues, and life stages in both honey bees and mammals (Ikeda et al., 2011; Ziller et al., 2013). In *D. melanogaster*, 5mC is mostly found during early embryonic stages (Lyko et al., 2000). Although adult 5mC has been reported in fruit fly, the content is too low to be robustly detected by bisulfite sequencing, the gold standard in base resolution 5mC interrogation techniques, making further studies difficult with many established methods depending on bisulfite conversion (Capuano et al., 2014).

#### **5-HYDROXYMETHYLCYTOSINE**

The TET oxidative products of 5mC recently became a center of attention in mammalian epigenetic research. Many questions about TET and 5hmC dynamics have been answered in embryonic stem cells (Pastor et al., 2013), although 5hmC has been detected in different tissues at different life stages (Kriaucionis and Heintz, 2009; Ivanov et al., 2013). The abundance of 5hmC is much lower than that of 5mC, ranging from 2- to 100-fold less depending on tissue (Kriaucionis and Heintz, 2009; Song et al., 2012). The distribution of 5hmC does not seem to be directly linked to 5mC, as 5hmC is found more often in promoter areas and enhancers, and much less in repetitive elements (Pastor et al., 2011; Stroud et al., 2011; Yu et al., 2012). In addition, proteins capable of specifically binding 5hmC have been discovered, fueling the theory that 5hmC exists as a separate epigenetic mark and not simply as an intermediate in an active demethylation pathway (Frauer et al., 2011; Mellén et al., 2012; Spruijt et al., 2013). In honey bees, 5hmC has been characterized in multiple tissues, and its abundance seems to be highest in germ cells and the brain (7–10% of 5mC and about 4% of 5mC, respectively), following the trend in mammalian cell types (Wojciechowski et al., 2014). Only one study has attempted to map 5hmC in honey bees at a single nucleotide resolution (Cingolani et al., 2013). This same study, surprisingly, mapped the majority of 5hmC in head tissue to non-CpG intronic sequences. Further studies seem warranted to precisely quantify and map 5hmC in bees, especially in non-brain tissue, which has received less interest so far. To date, 5hmC, 5fC, and 5caC have not been identified in the fruit fly, aphid, jewel wasp, and *C. elegans* genomes. Since *C. elegans* has no 5mC precursor or TET homolog, the existence of 5hmC, 5fC, and 5caC in this species seems highly unlikely.

#### **5-FORMYLCYTOSINE AND 5-CARBOXYLCYTOSINE**

Much less is known, so far, about the recently identified nucleotides 5fC and 5caC than about their precursors 5mC and 5hmC. This situation is in part due to extremely low abundance, especially for 5caC, making robust detection difficult (in mammals 5caC is 10- to 1000-fold less abundant than 5hmC). Moreover, the molecular toolbox for investigating 5fC and 5caC is not as developed as it is for 5hmC (Song and He, 2013). Bisulfite sequencing, for example, only discriminates between "methylated" and "unmethylated" cytosines, so that 5mC and 5hmC are identified as "methylated" and 5fC and 5caC as "unmethylated" (Pastor et al., 2013). Such data are therefore difficult to use as guidelines in narrowing down possible locations of 5fC and 5caC.
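The ambiguity of bisulfite data can be illustrated with a small lookup table. This is a minimal sketch of the read-out described above, not a model of the underlying chemistry; the names `BISULFITE_CALL`, `bisulfite_calls`, and `indistinguishable_groups` are hypothetical.

```python
# How standard bisulfite sequencing reports each cytosine form:
# 5mC and 5hmC read as "methylated"; unmodified C, 5fC and 5caC read as
# "unmethylated".
BISULFITE_CALL = {
    "C": "unmethylated",
    "5mC": "methylated",
    "5hmC": "methylated",
    "5fC": "unmethylated",
    "5caC": "unmethylated",
}


def bisulfite_calls(states):
    """Translate a list of true cytosine states into bisulfite calls."""
    return [BISULFITE_CALL[state] for state in states]


def indistinguishable_groups():
    """Group the cytosine forms that bisulfite sequencing cannot tell apart."""
    groups = {}
    for state, call in BISULFITE_CALL.items():
        groups.setdefault(call, []).append(state)
    return groups


true_states = ["5mC", "5hmC", "5fC", "5caC", "C"]
print(bisulfite_calls(true_states))
print(indistinguishable_groups())
```

Two distinct sets of true states can thus produce identical calls, which is why bisulfite data alone cannot narrow down the positions of 5fC and 5caC.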

The extremely low abundance of 5caC suggests that this nucleotide is merely an intermediate step in complete demethylation (Song and He, 2013). Although 5fC is a more prominent epigenetic mark than 5caC, its function is still not fully understood. It is possible that 5fC might regulate transcription through stalling of RNA pol II (Kellinger et al., 2012), but further research is needed to elucidate the role of 5fC and 5caC in both vertebrates and invertebrates. In honey bees, 5fC and 5caC have not been investigated yet, though their precursors and catalytic enzyme have been reported (Lyko et al., 2010; Wojciechowski et al., 2014).

#### **FUTURE WORK: EPIGENETICS AND WORKER BEHAVIOR**

Epigenetic mechanisms have been linked to the queen-worker differentiation of honey bees (Kucharski et al., 2008), as well as to worker behavioral progression and reversion (Herb et al., 2012). Herb et al. (2012) bisulfite sequenced brains of age-matched nurses, foragers, and reverted workers (previous foragers now involved in care behavior). Their data revealed differentially methylated regions (DMRs) between the behavioral groups, indicating that DNA methylation can play a role in regulation of social behavior. These DMRs are associated with genes involved in development, nuclear pore formation, and ATP binding. RNA sequencing revealed that these same DMRs were connected to alternative splicing events. It is also very likely that the "behaviorally sensitive" DMRs of honey bees are hydroxymethylated at some point during either transition from nurse to forager, or reversion from forager to nurse. Since the study was conducted in adult brain tissue, which has no neurogenesis (Fahrbach et al., 1995), dilution by replication would be unlikely or would only display minor effects. This situation makes these DMRs excellent candidates for investigating if 5hmC is associated with worker behavioral transitions, and if these hydroxymethylated regions are differentially hydroxymethylated between nurses, foragers, and reverted worker bees. Such a study could be the first to establish a putative link between hydroxymethylation and behavior.

Future studies should also dissect the role of TET in worker transitions from nurse to forager, and back. Candidate tissues other than brain should include the fat body. This tissue is functionally homologous to liver and white adipose tissue and undergoes major remodeling during honey bee behavioral change (Chan et al., 2011). An RNA interference-mediated TET knockdown should provide insight into TET function. Studies can be conducted in honey bee larvae to investigate whether larvae with a TET knockdown are capable of both queen and worker development. Similarly, consequences for behavioral plasticity can be studied in adult honey bee workers, which could perhaps link TET and its products with behavior for the first time.

Finally, a possible link between 5hmC and alternative splicing can be investigated by combining 5hmC sequencing at single nucleotide resolution with RNA sequencing of honey bee tissue samples. 5mC is reportedly implicated in the generation of alternative transcripts in the bee, but this was shown using methods that cannot distinguish 5mC from 5hmC (Flores et al., 2012; Herb et al., 2012). Therefore, further studies that can map 5hmC alongside RNA sequencing data seem warranted, and could potentially give 5hmC a novel function in gene regulation.
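A first step in such an integration could be to ask which mapped 5hmC positions fall inside alternatively spliced exons. The sketch below is illustrative only: the function name `sites_in_exons`, the half-open coordinate convention, and all coordinates are hypothetical and are not taken from any dataset.

```python
def sites_in_exons(sites, exons):
    """Return the positions in `sites` that lie inside any exon.

    sites: list of integer positions on one chromosome.
    exons: list of (start, end) pairs, treated as half-open [start, end).
    """
    return [
        pos for pos in sites
        if any(start <= pos < end for start, end in exons)
    ]


# Made-up example coordinates, for illustration only.
hmc_sites = [105, 250, 999, 1500]
alt_spliced_exons = [(100, 200), (900, 1000)]

print(sites_in_exons(hmc_sites, alt_spliced_exons))
```

For genome-scale data an interval index would be preferable to this quadratic scan, but the logic of the comparison would be the same.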

#### **CONCLUSION**

The honey bee offers a system where the interplay between DNA methylation and social behavior can be studied in great detail. Published studies of the honey bee epigenome are dominated by questions surrounding queen and worker development, while the epigenetic dynamics of worker behavioral castes have only more recently gained attention. The readily identifiable social behaviors of worker honey bees make setting up precise, large scale experiments feasible (Münch et al., 2013). Better knowledge about honey bee epigenetics also serves a dual purpose: increasing the understanding of epigenetic machineries in general, and gaining specific information about gene regulatory mechanisms in an economically important beneficial insect.

#### **ACKNOWLEDGMENTS**

We thank the two reviewers for their helpful comments. This study was supported by funds from the Research Council of Norway, grant no. 191699.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 September 2014; accepted: 09 January 2015; published online: 06 February 2015.*

*Citation: Rasmussen EMK and Amdam GV (2015) Cytosine modifications in the honey bee (Apis mellifera) worker genome. Front. Genet. 6:8. doi: 10.3389/fgene.2015.00008*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Rasmussen and Amdam. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Social parasitism and the molecular basis of phenotypic evolution

#### *Alessandro Cini 1† ‡, Solenn Patalano2,3 ‡, Anne Segonds-Pichon3, George B. J. Busby2,4, Rita Cervo1 and Seirian Sumner 2,5\**

*<sup>1</sup> Dipartimento di Biologia, Università di Firenze, Firenze, Italy*

*<sup>2</sup> Institute of Zoology, Zoological Society of London, London, UK*

*<sup>3</sup> The Babraham Institute, Babraham Research Campus – Cambridge, Cambridge, UK*

*<sup>4</sup> Wellcome Trust Centre for Human Genetics, Oxford, UK*

*<sup>5</sup> School of Biological Sciences, University of Bristol, Bristol, UK*

#### *Edited by:*

*Juergen Rudolf Gadau, Arizona State University, USA*

#### *Reviewed by:*

*Zachary Cheviron, University of Illinois, Urbana-Champaign, USA; René Massimiliano Marsano, University of Bari, Italy*

#### *\*Correspondence:*

*Seirian Sumner, School of Biological Sciences, Life Sciences Building, 24 Tyndall Avenue, Bristol BS8 1TQ, UK e-mail: seirian.sumner@bristol.ac.uk*

#### *†Present address:*

*Alessandro Cini, CRA – ABP Consiglio per la Ricerca in Agricoltura e l'Analisi Dell'Economia Agraria, Centro di Ricerca per l'Agrobiologia e la Pedologia, Cascine del Riccio, Firenze, Italy and Corpo Forestale dello Stato, Centro Nazionale Biodiversità Forestale "Bosco Fontana", Verona, Italy*

*‡These authors have contributed equally to this work.*

Contrasting phenotypes arise from similar genomes through a combination of losses, gains, co-option and modifications of inherited genomic material. Understanding the molecular basis of this phenotypic diversity is a fundamental challenge in modern evolutionary biology. Comparisons of the genes and their expression patterns underlying traits in closely related species offer an unrivaled opportunity to evaluate the extent to which genomic material is reorganized to produce novel traits. Advances in molecular methods now allow us to dissect the molecular machinery underlying phenotypic diversity in almost any organism, from single-celled entities to the most complex vertebrates. Here we discuss how comparisons of social parasites and their free-living hosts may provide unique insights into the molecular basis of phenotypic evolution. Social parasites evolve from a eusocial ancestor and are specialized to exploit the socially acquired resources of their closely-related eusocial host. Molecular comparisons of such species pairs can reveal how genomic material is re-organized in the loss of ancestral traits (i.e., of free-living traits in the parasites) and the gain of new ones (i.e., specialist traits required for a parasitic lifestyle). We define hypotheses on the molecular basis of phenotypes in the evolution of social parasitism and discuss their wider application in our understanding of the molecular basis of phenotypic diversity within the theoretical framework of phenotypic plasticity and shifting reaction norms. Currently there are no data available to test these hypotheses, and so we also provide some proof of concept data using the paper wasp social parasite/host system *(Polistes sulcifer—Polistes dominula)*. This conceptual framework and first empirical data provide a spring-board for directing future genomic analyses on exploiting social parasites as a route to understanding the evolution of phenotypic specialization.

**Keywords: phenotypic plasticity, social insects,** *Polistes***, social parasites, genomics, gene expression**

#### **INTRODUCTION**

#### **THE MOLECULAR BASIS OF PHENOTYPIC DIVERSITY**

Evolution plays with inherited traits to produce altered phenotypes which may be better adapted to fill a niche different to that of their ancestors. Ultimately, phenotypic traits arise at the level of the genes. A major outstanding question in evolutionary biology is what roles losses, gains, co-options and modifications of genomic material play in the evolution of phenotypic diversity within and between species (West-Eberhard, 2003; Kaessmann, 2010; Van Dyken and Wade, 2010; Wissler et al., 2013). Many species show phenotypic plasticity in the expression of alternative phenotypes from the same genotype, through variance in reaction norm responses to changes in the environment (Aubin-Horth and Renn, 2009). Such plasticity affects both short-term (ecological) and long-term (evolutionary) adaptation, and thus influences survival and fitness (Pfennig et al., 2010; Beldade et al., 2011; Hughes, 2012). Conditional expression of the genes underlying polyphenisms facilitates gene, and consequently phenotypic, evolution (Van Dyken and Wade, 2010). Canalized developmental pathways shaped by evolution can result in heritable shifts in phenotype (Waddington, 1942). Genomic methods in modern evolutionary biology now allow us to dissect the molecular basis of such phenotypic diversity across a range of organisms, from genes to phenotypes (Tautz et al., 2010). But selection acts directly on phenotypes and only indirectly on the molecular machinery, and so an integrated study of key phenotypic traits in ecologically relevant settings and the genes associated with them is essential (West-Eberhard, 2005; Schwander and Leimar, 2011; Valcu and Kempenaers, 2014). Insects provide excellent models for studying these facets of phenotypic evolution within and across species (Nijhout, 2003; Moczek, 2010; Simpson et al., 2011), e.g., eusocial insect castes (Evans and Wheeler, 2001; Smith et al., 2008), male morphologies in beetles (Moczek, 2009), and asexual and sexual reproductive phases in aphids (Brisson and Stern, 2006).

There are several features unique to hymenopteran obligate social parasite/host systems that make them ideal models for studying the molecular basis of phenotypic diversity.



#### **Box 1 | Continued**

cryptic manipulation). Whilst these are well studied at the phenotypic level, we know nothing about how such losses and gains occur at the molecular level. Comparisons of the molecular bases of closely related host and social parasite traits will provide new insights into phenotypic evolution.

Photo credits: Alessandro Cini, Rita Cervo, Stefano Turillazzi and David Nash.

An ideal model system for determining the molecular basis of phenotypic evolution allows comparisons of related species which have evolved mutually exclusive traits and/or life histories (e.g., Arendt and Reznick, 2008; Schlichting and Wund, 2014). Parasites are good examples of species that have lost ancestral, free-living traits and gained new ones to evolve a specialized life-history that depends on exploiting the resources of other species. For example, endoparasitic worms have lost ancestral gut, head and light sensing organs, but have gained traits such as a specialized tegument, which protects them from host-stomach acids (Burton et al., 2012). Hosts co-evolve to combat parasitism, through enhanced immune responses and mechanisms for detecting infection; parasites manipulate their host to benefit the parasite's life cycle, often through an extended phenotype (Dawkins, 1982). Comparisons between parasites and their free-living relatives therefore present intriguing models for studying the molecular basis of phenotypic evolution (Dybdahl et al., 2014). However, these comparisons are complicated by co-evolution, in which frequency distributions of host and parasite genotypes (and traits) shift reciprocally and responsively over time; moreover, hosts and their parasites are rarely closely related species (Hamilton, 1980).

Insect social parasites and avian brood parasites differ from other parasites in that they exploit the parental behavior of the hosts rather than the physical resources of individuals. Such parasites have evolved several times in the animal kingdom. For example, cuckoldry occurs in more than 100 bird species, where the host pays the cost of raising unrelated chicks (Davies, 2000). Social parasites of eusocial insects (e.g., the Hymenoptera: bees, wasps and ants) are especially interesting as they are usually close relatives of their hosts, and have often entirely lost their worker caste (Savolainen and Vepsäläinen, 2003). The potential for using social parasites, especially of eusocial insects, as models for understanding the molecular basis of phenotypic plasticity has been recognized (West-Eberhard, 1989, 2003). However, we lack a defined theoretical framework and clear hypotheses to properly exploit this untapped niche using molecular studies. Advances in molecular technologies now make gene-level studies accessible in any organism. It is therefore timely to lay out a framework for exploiting social parasites and their hosts as models for understanding the genomic basis of phenotypic losses and gains in evolution. Here we identify the key traits of hymenopteran social parasites of eusocial insects that make them useful models for understanding phenotypic evolution at the molecular level. We define specific, testable hypotheses on the molecular basis of shared and contrasting traits in the evolution of social parasitism within the conceptual framework of shifting reaction norms and phenotypic plasticity (e.g., Aubin-Horth and Renn, 2009; Fusco and Minelli, 2010). We also provide a first test of some of these hypotheses, as proof of concept for our conceptual model and a spring-board for future genomic analyses on the evolution of phenotypic adaptation (see Supplementary Materials).

**Box 2 | An example test system: the paper wasp social parasite** *Polistes sulcifer* **and its free-living host, the eusocial** *Polistes dominula.*

Both species have an annual lifecycle (Pardi, 1996; Cervo, 2006). Host colonies (blue line) are founded in spring (March–April) by one or more foundresses, among which a reproductive hierarchy is soon established by means of dominance interactions (Pardi, 1946). The first brood emerges around the end of May or early June and develops into workers. At the end of the summer, reproductives (males and females) emerge on the nest, leave the colony and mate. Males die soon after mating. Mated females cluster together in sheltered places to overwinter. Those that survive the winter found new colonies the following spring (Pardi, 1996). Parasite females (orange line) emerge from overwintering later than their hosts (late May) (Cervo and Turillazzi, 1996) and migrate from their overwintering sites to pre-emergence host nests (Cervo and Dani, 1996; Cervo, 2006). Parasites find host colonies using visual and chemical stimuli (Cervo et al., 1996; Cini et al., 2011a). Nest usurpation takes place during a small window of time (late May-early June) (Cervo and Turillazzi, 1996; Ortolani et al., 2008) and involves violent fights between hosts and parasites (Turillazzi et al., 1990; Cini et al., 2011b). Parasites display a novel behavior during this time (restlessness) (Ortolani et al., 2008). If the parasite is successful she becomes the sole egg-layer of the nest, adopting both the behaviors and chemical signatures of the host queen (Turillazzi et al., 2000; Sledge et al., 2001; Dapporto et al., 2004). After colony usurpation, the social parasite and un-parasitised host queens share the same environmental and social conditions (temperature, microclimate, diet etc.). Photo credits: Alessandro Cini, Rita Cervo and Stefano Turillazzi.

#### **SOCIAL PARASITISM IN EUSOCIAL INSECTS**

There are over 14,000 eusocial species in the Hymenoptera (bees, wasps and ants) representing over 11 independent origins of eusociality. Their societies are defined by a division of reproductive labor in the form of queen and worker castes, overlapping of generations, and cooperative brood care. Social parasitism has evolved multiple times independently in the eusocial insects: three times in wasps (once in Polistinae, genus *Polistes*, Choudhary et al., 1994; Cervo, 2006; twice in Vespinae, genera *Vespula* and *Dolichovespula*, Carpenter and Perera, 2006); at least 12 times in bees [three times in bumblebees, *Bombus* (subgenera *Psithyrus*, *Thoracobombus* and *Alpinobombus*; Alford, 1975; Cameron et al., 2007; Hines and Cameron, 2010); seven times in Allodapinae (Tierney et al., 2008; Smith et al., 2013); twice in Halictidae (*Dialictus* genus, Gibbs et al., 2012)]; and multiple times in the ants (Huang and Dornhaus, 2008; Buschinger, 2009).

There are several features of social parasite/host systems that make them ideal models for studying the molecular basis of phenotypic diversity. Their easily observable behaviors (e.g., paper wasps *Polistes,* Cervo, 2006, and leafcutting ants *Acromyrmex;* Sumner et al., 2004a) facilitate an integrated study of the behavioral phenotype with the molecular one. Social parasites are usually closely related to their hosts and thus share recent genomic (and phenotypic) ancestry (**Box 1**) (Choudhary et al., 1994; Lowe et al., 2002; Savolainen and Vepsäläinen, 2003; Sumner et al., 2004a; Huang and Dornhaus, 2008; Smith et al., 2013). Obligate social parasites depend on their host for their entire life cycle, and so have lost many of the essential free-living traits such as the ability to found a nest, produce an effective worker caste and raise offspring (Sumner et al., 2004b; Cervo, 2006; Buschinger, 2009). They have also evolved new traits, e.g., the ability to manipulate the host worker force so that parasitic offspring are raised as if they were host offspring. Full release from free-living traits means there are few restrictions on phenotypic evolution. This may facilitate phenotypic diversity at the molecular level. Obligate social parasites of eusocial insects therefore allow a direct comparison of the molecular basis of traits with recent, shared evolutionary history and contrasting traits that have evolved (and persist) within the same environmental context (see **Box 1**).

#### **A MODEL**

Eusocial species evolve from solitary ancestors. Solitary phenotypes occupy a normal distribution of variation, determined by their individual threshold level of response to environmental cues (**Figure 1A**). Queen and worker castes are alternative phenotypes that arise from the same genome, via bi-modal developmental pathways of individuals with evolved differences in their response thresholds to an environmental cue (Wheeler, 1986; Nijhout, 2003; Page and Amdam, 2007; **Figure 1B**). These alternative phenotypes arise through differential expression of shared genes, possibly via epigenetic regulation (Sumner, 2006; Smith et al., 2008; Patalano et al., 2012; Yan et al., 2014). This bi-modal landscape of phenotypic fitness is the ancestral basis from which social parasites must evolve. There are two likely routes by which specialized social parasites evolve from their eusocial ancestor. They may lose the worker phenotype and thus share a phenotypic fitness landscape with just the queens of their social ancestor (De Visser and Krug, 2014). Their phenotype response therefore becomes genetically fixed (canalized) by genetic assimilation, with selection favoring the loss of plasticity such that the genotype no longer responds to the caste-relevant environmental cue ("Phenotype Deletion Model" **Figure 1C**). Alternatively, they may evolve an entirely new phenotype with a novel/contrasting phenotype-response curve ("Phenotype Shift Model" **Figure 1D**), by genetic accommodation whereby there is selection for altered patterns of gene expression and associated phenotypic effects (West-Eberhard, 2003; Schlichting and Wund, 2014). Under either scenario, the pre-existing polyphenism of the eusocial ancestor facilitates the evolution of the specialist social parasite. Determining which route evolution takes is important: in the Deletion Model (**Figure 1C**) co-option of conserved genomic processes would be paramount, but with silencing of the worker response threshold (e.g., using the existing machinery used by queens to silence worker expression); in the Phenotype Shift Model (**Figure 1D**) novel genomic processes (e.g., brought about via mutation) would be important in generating a new range of response thresholds to the environment. The time since speciation between the eusocial ancestor and social parasite is also important to consider as this may mean the two models are not mutually exclusive: the longer the time since the two lineages split, the more differences each lineage may accumulate. There may be a transition from the phenotype shift model to the phenotype deletion model for traits, gene expression or genes, depending on the time since the two lineages split.

**FIGURE 1 | A model for the evolution of a social parasite phenotype from a eusocial ancestor.** A model of shared and contrasting reaction norms is a useful way of exploring the possible ways by which social parasite phenotypes may evolve. A bell curve describes the expression of a single phenotype in a solitary species **(A)**. Eusocial insects evolved from a solitary ancestor **(A)**, and produce two phenotypes, reproductive queens and non-reproductive workers **(B)**. Queens and workers occupy a bimodal expression of phenotypic space, expressing distinct mutually exclusive molecular phenotypes (e.g., gene expression profiles). The genome remains plastic and able to produce alternative phenotypes in response to the environment. Social parasites may evolve via canalization, whereby the phenotype is fixed (as a reproductive) irrespective of the environment, and so unlike its eusocial ancestor, phenotypic expression is robust to the environment: social parasites always produce a reproductive and never a worker phenotype. We propose two ways by which this could arise. Since the social parasite resembles so closely the phenotype of their ancestral eusocial queen, one model is that the worker phenotype is functionally "deleted." This would suggest that the phenotypic reaction norm landscape of the worker caste is not expressed (**C**, Phenotype Deletion Model). An alternative is that the social parasite is a new, or modified, phenotype, with a reaction norm that is different to both the bimodal (caste) peaks of the eusocial ancestor (**D**, Phenotype Shift Model). For simplicity, we place this shifted phenotype in a different phenotypic space to the ancestral queen and worker phenotypes, but this curve could lie at any point. Determining this point may shed light on the mechanisms of social parasite phenotype evolution. Dashed curves depict the ancestral eusocial phenotypes that are no longer expressed by the social parasite. The two models are not necessarily mutually exclusive: depending on the time since divergence between the lineages, the two models may represent different ends of a spectrum of phenotypic evolution.
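The phenotype distributions sketched in Figure 1 can be mimicked with a small simulation. The sketch below is only a toy illustration and is not part of the authors' analysis: the peak positions, the spread, the one-dimensional "phenotypic space" axis and the function name `sample_population` are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical peak positions on an arbitrary one-dimensional phenotype axis.
WORKER_PEAK, QUEEN_PEAK, SHIFTED_PEAK = 2.0, 6.0, 9.0
SPREAD = 0.5  # hypothetical standard deviation around each peak


def sample_population(model, n, rng):
    """Draw n phenotype values under one of the scenarios in Figure 1."""
    values = []
    for _ in range(n):
        if model == "eusocial":      # (B) bimodal: worker peak or queen peak
            peak = rng.choice([WORKER_PEAK, QUEEN_PEAK])
        elif model == "deletion":    # (C) worker peak is no longer expressed
            peak = QUEEN_PEAK
        elif model == "shift":       # (D) a new peak, distinct from both castes
            peak = SHIFTED_PEAK
        else:
            raise ValueError(f"unknown model: {model}")
        values.append(rng.gauss(peak, SPREAD))
    return values


rng = random.Random(42)
for model in ("eusocial", "deletion", "shift"):
    population = sample_population(model, 1000, rng)
    print(model,
          round(statistics.mean(population), 2),
          round(statistics.stdev(population), 2))
```

In this toy setting the Deletion scenario yields values centered on the ancestral queen peak, whereas the Shift scenario yields values centered elsewhere; where that new peak lies relative to the ancestral castes is exactly the open question raised in the figure caption.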

#### **HYPOTHESES AND PREDICTIONS**

Here we present some testable hypotheses for these models. These hypotheses and predictions are specific to obligate social parasites and their eusocial insect hosts, but they may also be of general relevance to furthering our understanding of the molecular basis of phenotypic diversity. The empirical approach we suggest requires a combined analysis of individual-level behavioral monitoring with subsequent quantitative analyses of the many components of the molecular phenotype (Pavey et al., 2010), e.g., transcription (RNAseq/transcriptomics; Ferreira et al., 2013), protein synthesis (proteomics; Begna et al., 2012), regulatory elements (e.g., microRNAs; Greenberg et al., 2012) and epigenetic modifications (Kucharski et al., 2008; Lyko et al., 2010; Bonasio et al., 2012; Simola et al., 2013). In **Figure 2**, we illustrate schematic regions of shared and contrasting trait-associated molecular phenotypes, which we refer to in our hypotheses, and suggest this as a useful way of making sense of complex genomics datasets.

#### **HYPOTHESIS 1: CONSERVED MOLECULAR PROCESSES UNDERLIE CONVERGENT PHENOTYPES**

Conserved genes, like the *Hox* gene family (Lee et al., 2003; Fernald, 2004), underlie convergent phenotypes, suggesting that phenotypic variation can evolve using shared genes and regulatory mechanisms differently (Shubin et al., 2009; Stern, 2013). By this mechanism, evolution re-uses the same ingredients (or "toolkit") in different organisms, but tinkers with the recipe to produce different outcomes. By expressing genes at different times in development and/or in different parts of the body, the same genes can be used in different combinations, generating phenotypic diversity and innovation. Animals look different not because the molecular machinery is different, but because different parts of the machinery are activated to differing degrees, at different times, in different places and in different combinations. The number of combinations is huge, and so this is a compelling and simple explanation for the development of complex and diverse phenotypes from even a small number of genes. For example, the human genome has a mere 19,000 protein-coding genes (Ezkurdia et al., 2014), and yet humans are arguably one of the most complex products of evolution, and differ in significant ways from close relatives with similar gene sets. "Toolkit" genes are old, present in all animals and often share functions across species. Conserved toolkit genes associated with convergent social behaviors have been detected in a range of eusocial insects (Toth and Robinson, 2007; Fischman et al., 2011; Woodard et al., 2011; Toth et al., 2014), but recent work has revealed that eusocial lineages also harbor novel (taxonomically restricted) genes that are associated with eusocial behaviors (Ferreira et al., 2013; Simola et al., 2013; Feldmeyer et al., 2014; Sumner, 2014).

**FIGURE 2 | Conceptual framework for predictions on shared and contrasting genomic/phenotypic diversity in social parasite/host relationships.** Venn diagram depicting predicted shared and contrasting molecular phenotypes of non-hosts, hosts and social parasites. We define the molecular phenotype to include contrasting patterns of gene expression (significant up or down regulation), gene regulatory elements (e.g., non-coding RNAs, microRNAs, DNA methylation, histone modifications), gene interaction networks (e.g., correlated co-expression) and protein synthesis. Each area represents the molecular phenotype of the specific suite of traits. Overlapping areas indicate putatively shared molecular phenotypes. The yellow shaded area indicates the shared environment of the three species, which we predict will cause similar responses in molecular phenotypes of all three species. Conserved generic traits (area d): Molecular processes underlying traits shared by all species, and thus putatively inherited from their common ancestor. These will include fundamental machinery for growth, cell repair, metabolism, as well as more specific traits of interest that are shared among queens and social parasites such as aggression and reproduction. Identifying the molecular phenotype of this area allows tests of the genetic toolkit hypothesis. Parasite-specific (area a): Molecular processes underlying traits that have evolved in the parasite to facilitate its specialized parasitic life style, for example enhanced fighting ability, usurpation behaviors, cryptic mimicry. Identifying the molecular phenotype of this area addresses the question of whether newly evolved phenotypic traits require new genes/pathways or simply re-use existing ancestral genes/pathways. Free-living specific (area e): Molecular processes underlying free-living traits that no longer provide a fitness advantage to social parasites, e.g., parental care traits and nest founding. Identifying the molecular phenotype of this area allows us to determine what happens at the molecular level when phenotypic traits are lost, e.g., are there changes in regulation/expression, loss of processes/genes? Restricting this to traits/genes shared by free-living host and non-host species is likely to represent the traits present in the eusocial ancestor of the social parasite, and exclude processes that may have evolved subsequently. These latter processes may be associated with social parasite resistance (areas c and g) in sympatric non-hosts, host response to parasitism (area b) and co-evolved traits (area f) in host and parasite that are absent from the non-host.

Closely related social parasites and their hosts are especially powerful models for asking to what extent conserved molecular processes underlie similar phenotypes in species with shared, recent genomic inheritance. The toolkit hypothesis predicts that host queens and social parasites will share the same molecular phenotype (i.e., express the same genes and proteins), because they are both reproductive specialists. Support for this hypothesis would suggest that social parasites are simply a reduced form of the social phenotype, expressing the reproductive component, but suppressing the worker component of their ancestors (i.e., the Phenotype Deletion Model; **Figure 1C**). Alternatively, if gene conservation is not supported, this may suggest that social parasitism evolves via Phenotype Shift (**Figure 1D**), or a combination of the two processes. This can be tested by looking at shared transcriptional patterns between social parasites and their host queens (See **Figure 2**; molecular processes underlying traits in areas d & f).
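The regions of the Venn diagram in Figure 2 correspond to ordinary set operations on lists of trait-associated genes. The following is a minimal sketch, assuming each species' molecular phenotype has already been reduced to a set of gene identifiers; the function name `venn_regions`, the region keys and the gene labels are hypothetical. Only areas a, d, e and f are labeled, following their descriptions in the figure caption; the remaining letters are left unassigned because the caption does not tie each of them to a single region.

```python
def venn_regions(parasite, host, non_host):
    """Partition genes into the seven regions of a three-set Venn diagram."""
    return {
        "parasite_only": parasite - host - non_host,               # area a
        "host_only": host - parasite - non_host,
        "non_host_only": non_host - parasite - host,
        "parasite_and_host_only": (parasite & host) - non_host,    # area f
        "host_and_non_host_only": (host & non_host) - parasite,    # area e
        "parasite_and_non_host_only": (parasite & non_host) - host,
        "all_three": parasite & host & non_host,                   # area d
    }


# Hypothetical gene labels, for illustration only.
parasite = {"g1", "g2", "g3", "g4"}
host = {"g1", "g2", "g5", "g6"}
non_host = {"g1", "g4", "g5", "g7"}

regions = venn_regions(parasite, host, non_host)

# Genes shared by social parasite and host (areas d and f combined).
shared_parasite_host = regions["all_three"] | regions["parasite_and_host_only"]
print(sorted(shared_parasite_host))  # ['g1', 'g2']
```

Under the toolkit hypothesis the combined d and f regions would be expected to be large relative to the parasite-only region; a small overlap would point toward the Phenotype Shift Model instead.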

Preliminary data suggest that expression of toolkit genes is not conserved in the evolution of a social parasite, supporting the Phenotype Shift Model (**Figure 1D**). Analyses of gene expression profiles for putative toolkit genes thought to be important in castes of *Polistes* paper wasps reveal that social parasites and their host queens have distinct expression patterns (**Figure 3A**, see Supplementary Materials). This is unlikely to be a species-level effect since host workers are equally distinct from their conspecific queens (**Figures 3A,B**). Importantly, gene expression differences between social parasites and queens were greater than among social parasites, suggesting that social parasite gene expression does not strongly overlap with that of queens among these putative toolkit genes (**Figure 3B**). Quantitative transcriptome sequencing (e.g., RNAseq) would allow a comprehensive test of this. Nevertheless, these preliminary data suggest that social parasites evolve via a Phenotype Shift Model (**Figure 1D**), and that they may be a more complex phenotype than simply a partial genomic expression of the ancestral social state (as suggested by the Phenotype Deletion Model, **Figure 1C**). We predict that the shared molecular components between host and parasite will be few and limited to fundamental processes, e.g., egg production and protein storage, as characteristics of any reproductively active insect.
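The comparison of within- and between-phenotype expression differences can be illustrated with pairwise Euclidean distances between individual expression profiles. The sketch below uses synthetic values only; the group sizes, group centers and five-gene vectors are assumptions and do not reproduce the study's data. With the hypothetical group sizes chosen here (6, 12 and 10 individuals) the pair counts come to 126 within and 252 between phenotypes, the same totals reported in the Figure 3 caption, but the actual sample sizes are not stated in this section, so this is a consistency check rather than a reconstruction.

```python
import itertools
import math
import random
import statistics

rng = random.Random(0)

# Hypothetical group sizes and centers: P = social parasites,
# Q = host queens, W = host workers. Values are synthetic.
GROUP_SIZES = {"P": 6, "Q": 12, "W": 10}
GROUP_CENTERS = {"P": 0.0, "Q": 1.0, "W": 2.0}
N_GENES = 5

# One synthetic expression vector (N_GENES values) per individual.
individuals = [
    (label, [rng.gauss(GROUP_CENTERS[label], 1.0) for _ in range(N_GENES)])
    for label, size in GROUP_SIZES.items()
    for _ in range(size)
]

intra, inter = [], []
for (label_a, expr_a), (label_b, expr_b) in itertools.combinations(individuals, 2):
    distance = math.dist(expr_a, expr_b)  # Euclidean distance between profiles
    if label_a == label_b:
        intra.append(distance)
    else:
        inter.append(distance)

print(len(intra), len(inter))  # 126 252
print(statistics.mean(intra), statistics.mean(inter))
```

The two lists of distances can then be compared with whichever two-sample test is appropriate for the data.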

#### **HYPOTHESIS 2: CONSERVED MOLECULAR PROCESSES UNDERLIE RESPONSE TO A SHARED ENVIRONMENT**

Molecular phenotypes (e.g., gene expression, regulation and protein synthesis) are highly labile and can change responsively to environmental variation. A key question is whether different organisms use the same genes to respond to the same environmental cues. There will be strong selection for the social parasites to be able to accurately detect and respond to their host's environmental cues since they share the same intimate environment on the nest. Moreover, the social parasite must synchronize its life cycle and behavior perfectly with the host's life cycle (Cervo, 2006; Ortolani et al., 2008). The molecular processes underlying responsiveness to their shared environment may therefore be conserved. The Phenotype Deletion Model (**Figure 1C**) makes the implicit assumption that the phenotypes of host and parasite arise via different responses to the *same* environmental cue. Conversely, the Phenotype Shift Model (**Figure 1D**) is compatible with either a response to the *same* cue (but with a novel threshold), or a response to a *new* cue (i.e., one that evokes no caste-related response in the eusocial host).

One important phenotype-environment response trait in both hosts and social parasites is the ability to respond to the switch from a solitary to social environment. Many eusocial insects have a solitary phase, when a single queen founds a new colony and raises her first brood alone, and then switches to a eusocial phase when her workers emerge (see **Box 2**). Likewise, social parasites have a solitary phase, during which they need to locate and successfully infiltrate a host colony, followed by a social phase where the parasite takes over the role of the queen in a society of host workers (see **Box 2**). The Phenotype Deletion Model predicts that the social parasite co-opts the molecular plasticity of its eusocial ancestor. Thus, we would expect the same genes to change in the social parasite, its eusocial host and any co-occurring related eusocial non-hosts (see **Box 1**) when each shifts from a solitary to a eusocial phase. In **Figure 2** the social environment is depicted by the yellow shaded area surrounding the three species spheres. Since all three species (social parasite, host and non-host) occupy similar societies, we predict that each will respond to a shift between solitary (nest founding/nest searching) and eusocial (established queens on host/non-hosts, and established parasite queens on host colonies) environments using similar changes in their molecular phenotypes. Conversely, if the social parasites evolve via Phenotype Shifting, we would not necessarily expect host and social parasite to respond to the same cue, using the same molecular processes. A test of this requires comparisons of transcription, protein synthesis and regulatory elements in the solitary and eusocial forms of the reproductive phenotypes in each species (**Figure 2**, area d).

Among the toolkit genes we analyzed, insulin growth factor (*IGF*) is a putative candidate gene for response to changes in the social environment. We observed up-regulation of *IGF* in social parasite brains when they shift from solitary to social living, whilst *IGF* shows no change in expression in the constant eusocial environments of the host (**Figure 3C**). In our *Polistes* test system (see **Box 2**), both host and parasite over-winter as newly mated queens, but the parasite overwinters alone whilst the host overwinters in socially active aggregations (Dapporto and Palagi, 2006; Cini and Dapporto, 2009). If social context influences gene expression, hosts should show no significant change in the expression of genes responsive to social environment since they remain in a social phase during the winter and summer. Conversely, social parasites shift between solitary (overwintering) and social phases, and expression of genes responsive to social environment should reflect this dynamic, as seen with *IGF* in our system (**Figure 3C**). Recent work in a free-living species of *Polistes* has highlighted the importance of social environment in gene expression (Toth et al., 2014). Further analyses will reveal whether host/non-host species in the solitary founding phase also show similar patterns of response to environment as found in the social parasite (**Figure 2**, area d). Other likely candidate genes for this response include juvenile hormone-binding proteins and hexamerins, which are up-regulated in gregarious/social forms relative to solitary phases in the migratory locust (Kang et al., 2004).
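Seasonal contrasts of this kind are reported in Figure 3 using the Mann Whitney test, a rank-based comparison of two independent samples. The sketch below shows how the U statistic itself is computed; the function name `mann_whitney_u` and the expression values are hypothetical and are not the study's data. Assessing significance additionally requires the null distribution of U, which is not implemented here.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which a value from sample_a exceeds a value
    from sample_b, with ties counted as 0.5; U_a + U_b equals
    len(sample_a) * len(sample_b).
    """
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical relative expression values for one gene in two phases.
solitary_phase = [0.8, 1.0, 1.1, 0.9, 1.2]
social_phase = [1.5, 1.7, 1.3, 2.0, 1.05, 1.9, 1.6, 1.8]

u_solitary, u_social = mann_whitney_u(solitary_phase, social_phase)
print(min(u_solitary, u_social))  # 2.0
```

The smaller of the two U values is the one often reported; a value close to zero indicates that nearly all observations in one group rank below those in the other.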

**FIGURE 3 | Brain gene expression data from the social parasite** *Polistes sulcifer* **and its social host** *Polistes dominula.* Comparison of expression levels for five "toolkit" genes that are differentially expressed among queens and workers in *Polistes* (chosen from: Sumner et al., 2006; Toth et al., 2007; Ferreira et al., 2013). *Arrestin* (*Art*) is expressed in response to light; *Apolipophorin* (*Apo*) is involved in general metabolic processes and lipid transport; *Heat Shock Protein 70kDa* (*HSP*) is involved in response to heat stimulus; insulin growth factor (IGF) responds to nutrition; *Major Royal Jelly Protein* (*MRJP*) is a yellow protein associated with reproductive behaviors. We compared individual-level gene expression across three phenotypes: social parasites (P), host queens (Q) and host workers (W). **(A)** Discriminant analyses revealed three distinct clusters, corresponding to the 3 phenotypes. Function 1 closely correlates with gene expression of *MRJP* and *IGF,* and discriminates between social parasites and workers while function 2 closely correlates with *Apo* and *HSP* and discriminates social parasites from queens. 79.3% of individuals grouped into non-overlapping clusters. Cross validation analyses correctly classified 69% of samples. **(B)** Euclidean distances in gene expression among phenotypes showing greater inter-phenotype differences than intra-phenotypes (*t*-test, *t* = −2.114, *df* = 376, *p* = 0.035, *n* = 126 vs. 252). Gene expression differences between social parasites and queens were greater than among social parasites (Mann Whitney test, *U* = 233, *p* = 0.0005, *n* = 72 vs. 15). **(C,D)** Gene expression dynamics across the seasons (OW, overwinter; US, usurpation; SU, summer). **(C)** Changes in social environment experienced by the social parasites are accompanied by changes in *IGF* gene expression (within social parasites: Mann Whitney test, *U* = 4.0, *p* = 0.0183, *n* = 8 vs. 5; between species: Mann Whitney test, *U* = 8.0, *p* = 0.1498, *n* = 7 vs. 5). **(D)** *Apo* and *Art* are upregulated during usurpation compared to the pre and post usurpation periods (Kruskal Wallis test, *Apo*: *H* = 8.525, *p* = 0.0141; *Art*: *H* = 8.842, *p* = 0.0120). Expression levels of *Apo* and *Art* are significantly higher in usurping social parasites than in overwintering social parasites but no differences occur between overwintering and summer period [*Apo*: Mann Whitney post hoc pair wise comparisons US vs. OW *p* = 0.0112, US vs. SU, *p* = 0.0230; OW vs. SU *p* = 0.341, *n* = 9 (OW) vs. 5 (US) vs. 7 (SU), *Art*: Mann Whitney post hoc pair wise comparisons US vs. OW, *p* = 0.00848, US vs. SU, *p* = 0.01421; OW vs. SU *p* = 0.9485, *n* = 8 (OW) vs. 4 (US) vs. 6 (SU)]. No changes were observed in the expression levels for *Art* and *Apo* in the host species (Mann Whitney test, *Apo*: OW vs. SU Hosts *U* = 12.0, *p* = 0.2343, *n* = 7 vs. 6; *Art*: *U* = 14.0, *p* = 0.366, *n* = 7 vs. 6). No significant changes in *MRJP* and *HSP* gene expression dynamic across season were observed in parasites (Mann Whitney test, *MRJP*: *U* = 4, *p* = 0.176; *HSP*: *U* = 13.0, *p* = 0.236), or in the hosts who remain in a social environment throughout (Mann Whitney test, *MRJP*: *U* = 8.0, *p* = 0.246; *HSP*: *U* = 6.0, *p* = 0.226) (data not shown).

#### **HYPOTHESIS 3: TRAIT LOSSES AND GAINS WILL BE REFLECTED AT THE MOLECULAR LEVEL**

Phenotypically, social parasites exhibit a functional deletion of parental care traits (West-Eberhard, 2003). It is this observation that forms the basis of the Phenotype Deletion Model (**Figure 1C**). At the molecular level, selection for the genes/gene functions associated with parental care will be relaxed as their expression no longer has any fitness consequence. Such genes may be subject to rapid evolution, loss or other modifications (Hunt and Carrano, 2010; Hunt et al., 2011). This means genomic changes can be fixed rather than conditionally expressed (Van Dyken and Wade, 2010). Genes identified as important in parental care in host species therefore, are predicted to be lost (or not expressed) in social parasites. These traits can be easily recognized in the host (**Figure 2**, area b), thus providing a base-line of "absent" traits to compare with in the parasite (**Figure 2**, area a). Comparisons of molecular phenotypes of social parasites and their host (and non-host) workers are promising routes to defining the genes, regulatory processes and pathways involved in parental care in free-living species. Such analyses would provide a test of the Phenotype Deletion Model, and it also raises intriguing questions regarding the fate of the molecular processes involved in ancestral maternal care: does the parasite lose these genes/functions? In what sense are they "lost"; via their coding potential? What are the molecular processes that prevent these ancestral molecular processes from being expressed?

The evolution of social parasitism is accompanied by release from the evolutionary constraints experienced by a free-living species (Sumner et al., 2004b). This may allow the evolution of new/modified traits, not found in their free-living ancestor (West-Eberhard, 2003). Examples include exaggerated morphological and behavioral traits that enhance a social parasite's fitness, e.g., enlarged Dufour's glands in Vespine social parasites (Jeanne, 1977); enlarged mandibles (Cervo, 1994; Cervo and Dani, 1996); specific usurpation behaviors in Polistine social parasites (Ortolani et al., 2008); reduced scopae and mouthparts in Allodapinae social parasites (Michener, 1970; Smith et al., 2013); specialized piercing mandibles in slave-making ants (Buschinger, 2009). Other traits include mechanisms of effective manipulation and deception of the host, such as chemical insignificance to elude host recognition and chemical mimicry to integrate into the host colony (Lenoir et al., 2001; Bagnères and Lorenzi, 2010; Bruschini et al., 2010) or suppression of host queen/worker reproduction (e.g., Cervo and Lorenzi, 1996; Vergara et al., 2003). A key question is whether these novel traits arise through co-opted conserved molecular processes, or via *de novo* birth of novel genes and/or re-organization of existing genomic material.

Novel traits that have evolved in a range of different taxa have recently been associated with taxonomically restricted genes (Khalturin et al., 2008; Johnson and Tsutsui, 2011; Ferreira et al., 2013; Looso et al., 2013; Harpur et al., 2014), and this includes the eusocial Hymenoptera (Simola et al., 2013; Wissler et al., 2013; Sumner, 2014). We predict that social parasites will harbor a higher proportion of new genes, gene functions, or novel gene networks relative to their free-living eusocial hosts. Additionally, ancestral genes may be modified substantially in function through modulation of their expression patterns, regulatory roles or protein production (**Figure 2**, area a).

Analyses of gene expression dynamics in *Polistes* social parasite brains at the pre-usurpation (OW), usurping (US) and post-usurpation (SU) phases of their life cycle (see **Box 2**) revealed significant changes in the expression of *Arrestin* (*Art*) and *Apolipophorin* (*Apo*) (**Figure 3D**). These genes are significantly up-regulated during usurpation, a critical period in a social parasite's life which, if not executed correctly during a narrow temporal window, could result in zero fitness (Turillazzi et al., 1990; Cervo and Turillazzi, 1996). During this phase, the parasite exhibits a novel behavior, restlessness (Ortolani et al., 2008), which is not found in the host (or non-host). No such variation in *Art* and *Apo* expression was detected in the host queens, suggesting that these expression patterns are specific to the parasite's novel behavior, potentially due to the acquisition of regulatory mechanisms that enhance gene expression variability. Unbiased genome-wide RNAseq analyses are required to determine whether putative novel genes are also involved in usurpation behaviors. New genes may be important drivers of phenotypic evolution (Chen et al., 2013). Studies on social parasites and their hosts will therefore help identify some such novel genes, and facilitate further exploration of the role of novel genes in phenotypic evolution. Such phenotype-led gene discovery is likely to be a rich, untapped resource.
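The three-phase comparison reported for *Art* and *Apo* (**Figure 3D**) relies on a Kruskal Wallis test. The following is a minimal sketch of how such an *H* statistic can be computed, not the authors' analysis code. The function names and the expression values are hypothetical, no tie correction is applied, and the *p*-value uses the chi-square approximation with 2 degrees of freedom, which is only approximate for samples this small.

```python
import math
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (without tie correction) for a list of samples."""
    pooled = list(chain.from_iterable(groups))
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


def chi2_sf_df2(h):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-h / 2)


# Hypothetical relative expression values for one gene in each phase
# (OW = overwinter, US = usurpation, SU = summer); invented for illustration.
expression = {
    "OW": [0.8, 1.1, 0.9, 1.0],
    "US": [2.4, 2.9, 3.1],
    "SU": [1.2, 0.7, 1.3, 1.0],
}
h_stat = kruskal_wallis_h(list(expression.values()))
print(f"H = {h_stat:.3f}, approximate p = {chi2_sf_df2(h_stat):.4f}")
```

Under the same approximation, `chi2_sf_df2(8.525)` and `chi2_sf_df2(8.842)` give roughly 0.0141 and 0.0120, which agrees with the *p*-values reported alongside those *H* statistics in **Figure 3D**.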

#### **HYPOTHESIS 4: RESISTANCE TO SOCIAL PARASITISM IN NON-HOSTS WILL BE REFLECTED AT THE MOLECULAR LEVEL**

Comparison of social parasites, hosts and non-hosts has the potential to reveal the molecular processes associated with host response to parasitism (**Figure 2**, area b), for example in host worker rebellions to the presence of social parasites in *Protomognathus americanus* ants (Achenbach and Foitzik, 2009), and resistance to social parasitism as found in sympatric non-host sister species (**Figure 2**, area c). In *Polistes dominula*, workers respond to parasite queens as if they were the host (mother) queen (Cervo et al., 1990; Cervo, 2006), suggesting that the parasite manipulates host workers successfully. However, recent work suggests that after several weeks of parasitism, workers are able to detect and respond to the parasite as they show some level of ovarian development, perhaps priming themselves for direct reproduction (Cini et al., 2014). Examining the molecular changes that take place in workers over the social parasite's life cycle may reveal important insights into the dynamic interactions of host and social parasite genomes, in a similar way to pathogens and their hosts (Riddell et al., 2011; Dybdahl et al., 2014).

Non-host sister species that occur sympatrically with the host in parasitized populations are powerful models for studying the molecular basis of social parasite resistance. For example, the free-living leafcutter ant *Acromyrmex octospinosus* co-occurs with its sister species *Acromyrmex echinatior,* and yet is resistant to parasitism by *Acromyrmex insinuator* (Sumner et al., 2004a; **Box 1A**); *Polistes nimphus* occurs alongside *P. dominula* and is resistant to invasion by *P. sulcifer* (Cervo, 2006; **Box 1B**). Phenotypically, there is no explanation for why co-occurring close relatives of hosts and social parasites are not also vulnerable to social parasitism. We hypothesize that there will be key differences in the transcriptional and/or regulatory processes of hosts and non-hosts, which may confer resistance to non-hosts (**Figure 2**, area c). These may include novel processes (or novel usage of conserved genes) that have evolved in the non-host since speciation. Functional genomic approaches (e.g., RNAi, cross-species expression experiments) provide powerful tools to test candidate genes or regulatory elements involved in resistance.

#### **CONCLUSIONS AND FUTURE PERSPECTIVES**

Comparative genomic analyses of obligate social parasites with their eusocial hosts and non-hosts are powerful approaches to studying losses and gains in phenotypic evolution. These analyses promise important insights into how genomes give rise to phenotypic diversity. We outline two scenarios for the evolution of social parasites from their eusocial ancestors. The scant data available to date suggest that the social parasite phenotype is distinct from its eusocial ancestral counterpart (i.e., eusocial queens). Social parasites therefore may not evolve through simple "deletion" (silencing) of the worker phenotype and its associated molecular functions (West-Eberhard, 2005). Based on recent empirical findings on the molecular basis of phenotypic evolution in other organisms, we predict that the evolution of new genes as well as the re-use of old ones will be important in the generation of the novel traits that characterize this new phenotype. We also predict that the full social parasite phenotype (defined as a combined consideration of the behavioral and molecular phenotype, Nachtomy et al., 2007) will be more complex than perceived from classical behavioral studies. Crucially, social parasites may retain the machinery for detecting and responding to the environment, just like their social ancestor and their free-living social hosts. The molecular processes associated with response to the environment, rather than behavior, are likely to be conserved (e.g., toolkit genes).

Our model and predictions are preliminary, but are relevant more widely to non-hymenopteran social parasites, as social parasitism of parental care has evolved multiple times in different taxa of the animal kingdom, e.g., birds (Davies, 2000); lycaenid butterflies (Fiedler, 2006); freshwater fishes (Baba et al., 1990). In each case, the social parasite is a highly specialized species that has lost the traits associated with caring for its own young, and evolved new traits that enable it to successfully insinuate its young into the home of its chosen host. More generally, our framework may also be relevant to phenotypic evolution in non-social parasites that are closely related to their hosts, such as in fungi, red algae and mistletoe, cynipid wasps, gall-inducing aphids (West-Eberhard, 2003) and parasitoids (e.g., *Nasonia,* Werren et al., 2010).

#### **ACKNOWLEDGMENTS**

This work was funded by Fondation Fyssen and Accademia dei Lincei-Royal Society grants (AC), Università di Firenze (RC), Wellcome Trust (SP and ASP), RCUK & NERC (NE/G000638/1) (SS, SP), L'Oreal for Women in Science Fellowship (SS, GB).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00032/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 September 2014; accepted: 23 January 2015; published online: 18 February 2015.*

*Citation: Cini A, Patalano S, Segonds-Pichon A, Busby GBJ, Cervo R and Sumner S (2015) Social parasitism and the molecular basis of phenotypic evolution. Front. Genet. 6:32. doi: 10.3389/fgene.2015.00032*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2015 Cini, Patalano, Segonds-Pichon, Busby, Cervo and Sumner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Function and evolution of microRNAs in eusocial Hymenoptera

#### *Eirik Søvik<sup>1</sup>, Guy Bloch<sup>2</sup> and Yehuda Ben-Shahar<sup>1</sup>\**

*<sup>1</sup> Department of Biology, Washington University in St. Louis, St. Louis, MO, USA, <sup>2</sup> Department of Ecology, Evolution, and Behavior, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel*

The emergence of eusociality ("true sociality") in several insect lineages represents one of the most successful evolutionary adaptations in the animal kingdom in terms of species richness and global biomass. In contrast to solitary insects, eusocial insects evolved a set of unique behavioral and physiological traits such as reproductive division of labor and cooperative brood care, which likely played a major role in their ecological success. The molecular mechanisms that support the social regulation of behavior in eusocial insects, and their evolution, are mostly unknown. The recent whole-genome sequencing of several eusocial insect species set the stage for deciphering the molecular and genetic bases of eusociality, and the possible evolutionary modifications that led to it. Studies of mRNA expression patterns in the brains of diverse eusocial insect species have indicated that specific social behavioral states of individual workers and queens are often associated with particular tissue-specific transcriptional profiles. Here, we discuss recent findings that highlight the role of non-coding microRNAs (miRNAs) in modulating traits associated with reproductive and behavioral divisions of labor in eusocial insects. We provide bioinformatic and phylogenetic data, which suggest that some Hymenoptera-specific miRNAs may have contributed to the evolution of traits important for the evolution of eusociality in this group.

#### *Edited by:*

*Greg J. Hunt, Purdue University, USA*

#### *Reviewed by:*

*Darren E. Hagen, University of Missouri, USA Roberto Bonasio, University of Pennsylvania, USA*

#### *\*Correspondence:*

*Yehuda Ben-Shahar, Department of Biology, Washington University in St. Louis, Campus Box 1137, One Brookings Drive, St. Louis, MO 63130, USA benshahary@wustl.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 23 March 2015 Accepted: 14 May 2015 Published: 27 May 2015*

#### *Citation:*

*Søvik E, Bloch G and Ben-Shahar Y (2015) Function and evolution of microRNAs in eusocial Hymenoptera. Front. Genet. 6:193. doi: 10.3389/fgene.2015.00193*

Frontiers in Genetics | www.frontiersin.org May 2015 | Volume 6 | Article 193 |

Keywords: miRNA, Aculeata, Hymenoptera, eusociality, non-coding RNAs

### Introduction

Most insect species are solitary, and behavioral interactions with conspecifics are primarily restricted to reproductive behaviors such as male–female courtship and male–male competition. This is in sharp contrast to social insects, where groups of genetically related individuals often live together in a colonial lifestyle. The size and stability of these colonies vary from a few individuals sharing a nest for a short period of time, to large perennial colonies composed of thousands of individuals (Hölldobler and Wilson, 2009). The most advanced form of animal social organization is termed "eusociality" (Crespi and Yanega, 1995), marked by the presence of sterile workers that often forgo their own reproduction in order to support the reproduction of other colony members. Although eusociality is relatively rare in most taxonomic animal lineages, eusocial species have been immensely successful. Current projections estimate eusocial insects to represent the largest proportion of the global animal biomass (Hölldobler and Wilson, 2009). Although the reasons for this remarkable success are not well-understood, it is commonly assumed that the social lifestyle of these animals must have played a major role in their current ecological dominance (Wilson, 1990). For example, it is thought that specialization in task performance (division of labor) amongst eusocial workers enables colonies to maximize the exploitation of their environment. In contrast, solitary insects have to multitask independent activities, including foraging and brood rearing (Wilson, 1985).

The recent sequencing of genomes from diverse social and solitary Hymenoptera clades offers a unique opportunity for identifying genome-level molecular events that may have supported the emergence of specific traits associated with the evolution of eusociality ("eusocial traits"). The ability to compare whole-genome sequences, gene expression patterns, and other molecular properties of species with diverse forms of social lifestyles, has generated novel mechanistic and evolutionary insights into these complex behaviors. This approach has been used most successfully in studies of the division of labors in worker tasks (Smith et al., 2008) and reproduction (Schwander et al., 2010), both of which are hallmarks of eusociality (Wilson, 1985). To date, the efforts to decipher the evolution of eusocial traits, and the mechanisms that support them, have focused on protein-coding genes (Keller and Ross, 1998; Page and Amdam, 2007; Fischman et al., 2011; Woodard et al., 2011). In contrast, how non-coding regulatory RNAs may have played a role in the evolution of eusociality is understudied. Here, we examine the emerging role of an important class of small, non-coding RNAs, which are collectively referred to as "microRNAs" (miRNAs), in regulating social behaviors. We discuss their possible role in regulating eusocial traits in social Hymenoptera at the developmental, physiological, and evolutionary time scales.

#### miRNAs: History and Background

During the early days of the molecular biology revolution, the majority of research on gene regulation was limited to transcriptional mechanisms of protein coding genes as originally defined by the "Central dogma of molecular biology" (Crick, 1970). However, the discovery of the regulatory function of non-coding RNAs indicated that the early views on gene regulation and their associated phenotypic outcomes, were oversimplified and required major revisions to the dogma. We now know that in addition to transcriptional regulation (Lee and Young, 2000; Yan et al., 2015), gene functions are also regulated by factors such as post-transcriptional RNA editing (Gott and Emeson, 2000), mRNA splicing (Breitbart et al., 1987), RNA degradation (Bushati and Cohen, 2007), and diverse post-translational protein modifications (Braakman and Bulleid, 2011). More recently, regulatory non-coding RNAs have also emerged as important factors that regulate phenotypic variation via diverse molecular mechanisms (Qureshi and Mehler, 2012; Bonasio and Shiekhattar, 2014).

**FIGURE 1 |** (1) miRNA are transcribed as 80–100 nucleotide (nt) hairpin loops. (2) The initial transcript, referred to as the primary-miRNA (pri-miRNA), (3) is cleaved into precursor miRNA (pre-miRNA) and exported to the cytoplasm. Subsequently, (4) the pre-miRNA is cleaved into a single mature miRNA strand, (5) which binds to the RNA-Induced Silencing Complex (miRISC, shown in turquoise). (6) The miRISC binds to the 3′ UTR of mRNAs, which leads to the inhibition of protein translation. Eventually the mature mRNA becomes (7) deadenylated and (8) decapped, which leads to transcript degradation by RNases.

miRNAs are short (18–24 nucleotides) non-coding RNAs, which in animals seem to act primarily by repressing protein translation via interaction with the 3′ UTR of mRNAs (**Figure 1**). miRNAs were first discovered in the nematode *Caenorhabditis elegans*, where the miRNA *cel-lin-4* was shown to be necessary for the temporal timing of key developmental events (Lee et al., 1993). Because of their short length and the nature of their molecular interaction with mRNA targets, it has been hypothesized that a single miRNA can potentially regulate the function of multiple protein-coding genes (Bartel, 2009), and thus act as a pleiotropic genetic factor (Bartel, 2004). It is estimated that between one and two thirds of mRNAs encoded by animal genomes are regulated by miRNAs (Berezikov, 2011). As a result, it is likely that miRNAs play some roles in the regulation of most biological processes in animal cells (Bushati and Cohen, 2007).
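The idea that one short miRNA can regulate many mRNAs can be illustrated with a toy search for candidate binding sites. The sketch below assumes the canonical seed-pairing model, in which nucleotides 2–8 of the mature miRNA pair perfectly with a complementary site in the 3′ UTR; the text above does not describe this model, so it is an assumption here. The function names and both sequences are hypothetical, and real target prediction also weighs factors such as site context and conservation.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def seed_sites(mirna, utr):
    """Return 0-based start positions in `utr` that pair with miRNA nucleotides 2-8."""
    seed = mirna[1:8]                # nucleotides 2-8 of the mature miRNA (7 nt)
    site = reverse_complement(seed)  # sequence the seed would pair with in the mRNA
    positions = []
    pos = utr.find(site)
    while pos != -1:
        positions.append(pos)
        pos = utr.find(site, pos + 1)
    return positions


# Hypothetical sequences, invented for illustration only.
toy_mirna = "UACGGAUCCAGUUGACCAUGCA"
toy_utr = "AAGCGAUCCGUUUACCGGAUCCGUAA"

print(seed_sites(toy_mirna, toy_utr))  # prints [4, 17]
```

Because a 7-nt site is expected to occur by chance in many transcripts, even this simplified model shows why a single miRNA could plausibly act on multiple protein-coding genes.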

#### miRNAs in Development and Function of Nervous Systems

Various miRNAs have been implicated in neuronal development (Alvarez-Garcia and Miska, 2005; Wienholds and Plasterk, 2005). There is evidence that miRNAs play important roles in fine tuning the temporal and spatial regulation of protein translation during development (Aboobaker et al., 2005; Wienholds et al., 2005). For example, miRNAs have been shown to affect canonical signaling pathways that are important for nervous system development, such as the MAPK and Notch signaling pathways (Lai et al., 2005; Chiba, 2006; Louvi and Artavanis-Tsakonas, 2006; Paroo et al., 2009; Zhu et al., 2010). It has been hypothesized that in these essential developmental pathways, miRNAs reduce the impact of stochastic variability in mRNA transcript levels on actual protein levels, which subsequently buffers the effects of environmental perturbations on cellular functions (Wu et al., 2009). Thus, some miRNAs evolved to maintain the robust association between gene expression patterns and fixed developmental traits (Peterson et al., 2009).

In contrast to their role in constraining plasticity during development, miRNAs seem to play a role in enhancing plasticity in the context of behavior and neuronal functions. This has been demonstrated in several recent studies, which implicated multiple miRNA genes in the regulation of neuronal plasticity (Fiore et al., 2011; Siegel et al., 2011; McNeill and Van Vactor, 2012). For example, *miR-132* and *miR-134* have been implicated in the growth and pruning of mammalian dendritic spines (Schratt et al., 2006; Impey et al., 2010), *miR-133b* in neurotransmitter vesicle size (Kim et al., 2007), and others in different aspects of neuronal plasticity (Schratt, 2009).

Given the emerging importance of miRNAs for neuronal plasticity, it is perhaps not surprising that distinct miRNA genes have been implicated in the regulation of behavioral plasticity as well, including entrainment of the circadian clock in mammals (Na et al., 2009; Bartok et al., 2013), positive and negative responses to specific odors in *Drosophila* (Li et al., 2013), and the social response to unfamiliar conspecifics in mice (Gascon et al., 2014). Specific miRNA genes have also been implicated in processes associated with learning and memory, including in social insects. For example, expression levels of several miRNAs are associated with spatial learning (Qin et al., 2013), and long-term olfactory memory (Cristino et al., 2014) in the honey bee. In addition, studies in *Drosophila melanogaster* showed that blocking the action of *dme-miR-276a* in the mushroom bodies, a key neuroanatomical structure necessary for many cognitive functions (Heisenberg, 2003), leads to inhibition of long-term olfactory memory formation via direct interaction with the dopamine receptor *DopR* (Li et al., 2013).

In addition to the neuronal functions of miRNAs, some miRNAs can also affect behavior via their actions in non-neuronal tissues. For example, manipulations of *miR-184* expression have implicated it in the synthesis and release of insulin (Morita et al., 2013), a conserved and ubiquitously important neuroendocrine factor that is secreted from non-neuronal cells in all animal lineages (Ament et al., 2008; Wolschin et al., 2011).

#### The Possible Role of miRNAs in the Regulation of Traits Associated with Eusociality

#### Developmental Plasticity: Caste Differentiation

The completion of the honey bee genome revealed many conserved candidate miRNAs (Weinstock et al., 2006). Because of the known functions of miRNAs in the regulation of various developmental processes, it has been suggested that miRNAs are likely to contribute to the developmental processes of reproductive caste (queen-worker) differentiation (Weaver et al., 2007; Bonasio et al., 2010). In this context, it was recently reported that the expression level of the miRNA *ame-miR-71* is higher in workers relative to queens during the pupal stage (Weaver et al., 2007). A subsequent study revealed that many additional miRNAs are differentially expressed between larvae that are destined to develop as either queens or workers (Shi et al., 2014). These differences in miRNA expression levels are consistent with the hypothesis that miRNAs are involved in the regulation of caste determination and differentiation. However, functional analyses of these miRNAs are needed to establish genetic causation between changes in the expression of specific miRNAs and the development of reproductive traits.

In contrast to species such as the honey bee, in which caste differentiation occurs early during larval development, in some eusocial species such as the ant *Harpegnathos saltator*, females retain the potential to become reproductive individuals (gamergates) throughout life. Although gamergates are morphologically worker-like, they reproduce and behave like a queen following the loss of the primary queen (Peeters et al., 2000). In this species, the transition of workers into gamergates is associated with a significant reduction in the global expression levels of several miRNA genes (Bonasio et al., 2010). How global miRNA down-regulation occurs, and why it might be important for the regulation of reproductive division of labor in this species, are not yet known.

Surprisingly, recent reports suggest that exogenous miRNAs can also affect reproductive caste-determination in honey bees. Guo et al. (2013) reported that miRNAs are present in the honey bee larval food. A comparison of short RNAs found in worker food versus "royal jelly" (food that induces queen development) indicated that the overall amount of miRNAs that are fed to worker-destined larvae is significantly higher than in food given to queen-destined larvae. Furthermore, queen-destined larvae that were fed with royal jelly supplemented with the worker-enriched miRNA *ame-miR-184* developed some worker-like morphologies (e.g., smaller body and shorter wings). This remarkable finding suggests that in honey bees, the consumption of exogenous miRNAs could play an important role in the differentiation of totipotent larvae into either sterile workers or reproductive queens. In this context, the conserved role of *miR-184* in the regulation of neuroendocrine functions across different animal taxa (Morita et al., 2013) is particularly alluring. In agreement with this hypothesis, genetic pathways that are targeted by *miR-184* in mammals are also important for queen versus worker differentiation in bees (Wolschin et al., 2011; Foret et al., 2012), suggesting that the observed effects of *miR-184* may act through the same conserved pathways in mammals and insects.

#### Behavioral Plasticity: Division of Labor

One of the best-studied aspects of eusociality is the division of labor between workers. In some eusocial insects, such as the honey bee, division of labor relates to age (Robinson, 1992; Naug and Gadagkar, 1998; Kim et al., 2012). Young worker bees (typically <14 days of age) perform in-hive tasks, such as brood care ("nursing") or food handling, and later in life (at around 3 weeks of age) they transition to foraging outside the hive. This well-characterized form of behavioral development has emerged as an excellent model for the molecular mechanisms involved in social behavioral plasticity (Robinson et al., 1997, 2005; Denison and Raymond-Delpech, 2008; Bloch and Grozinger, 2011). Gene expression studies, mostly using brain tissue, have demonstrated that division of labor in honey bees, and several other eusocial species, is associated with task-specific mRNA transcriptional profiles (Whitfield et al., 2003; Adams et al., 2008; Daugherty et al., 2011; Liu et al., 2011; Oxley et al., 2014).

Three recent studies also examined the possible association between changes in brain miRNA transcript levels and division of labor in honey bees (Behura and Whitfield, 2010; Greenberg et al., 2012; Liu et al., 2012). All three studies found that the expression levels of several miRNAs are upregulated in the brains of foragers relative to bees that perform in-hive duties (**Table 1**).

The association of miRNA transcript levels with specific behavioral states in colonies of eusocial insects is not limited to reproductive and worker divisions of labor. For example, reproductive queens in diverse eusocial species mate only once in their lifetime (Woyke, 1955). In honey bees, newly eclosed virgin queens (gynes) leave the hive for their sole "nuptial flight" during which they copulate with 10–20 males. After mating, they spend the rest of their lives laying eggs inside the hive. Thus, virgin and mated queens represent two distinct behavioral and physiological states (Winston, 1987). A recent study of the miRNA transcriptome in virgin and mated honey bee queens identified two different genes (*ame-miR-124* and *ame-miR-275*), which are differentially expressed in virgin and mated queens (Wu et al., 2014). While the precise function of these miRNAs



*miRNAs that were differentially expressed in at least two of the studies are highlighted in red. Denoted worker group (foragers/nurses) expressed significantly higher levels relative to the other group. Behura and Whitfield (2010) measured expression of pri-miRNA using qRT-PCR, Liu et al. (2012) relied on RNA sequencing of mature miRNA, while Greenberg et al. (2012) measured mature miRNA using northern blots.* ∗*qRT-PCR analysis showed a trend that was opposite to the RNA-seq data.*

in honey bees is not known, previous reports indicate that *miR-124* is an evolutionarily conserved, brain-enriched miRNA that plays a role in neural development and plasticity in invertebrates, birds, and mammals (Cao et al., 2007; Makeyev et al., 2007; Rajasethupathy et al., 2009), and more specifically in the development and function of the peripheral sensory system in *C. elegans* (Clark et al., 2010). *miR-275* is also conserved across insects, and has been implicated in the regulation of egg-laying behavior in *Aedes aegypti* (Bryant et al., 2010). Wu et al. (2014) speculated that the upregulation of *ame-miR-124* in virgin queens might be related to the modulation of sensory and/or other neuronal functions associated with mating behaviors, while the increased expression of *ame-miR-275* in mated queens might be important for newly mated queens to initiate egg-laying behavior.

#### A Case for the Possible Role of miRNAs in the Evolution of Eusociality

Why eusociality evolved multiple times within Hymenoptera but is rare in other insect orders is still a mystery. Several evolutionary models have attempted to explain this phenomenon by proposing various ultimate selective forces that may have driven the repeated rise of eusocial traits in this insect order (Hamilton, 1964; Andersson, 1984; Nowak et al., 2010). Although the regulation of phenotypes associated with eusociality has been independently linked to key regulatory pathways such as insulin and juvenile hormone signaling (Page and Amdam,

FIGURE 2 | Eusociality evolved multiple times in Hymenoptera. Phylogeny of the Aculeata. Clades containing eusocial species are highlighted in red. Phylogeny is based on Danforth et al. (2013) and Johnson et al. (2013). Each branch represents the lowest taxonomic classification level that is solely comprised of eusocial species.

#### TABLE 2 | Genomes analyzed.


*The following genomes were analyzed for the presence or absence of miRNAs. We performed an initial BLAST search of annotated miRNAs from D. melanogaster, A. mellifera, and N. vitripennis in the species denoted by* ∗*. Candidate miRNAs identified as either present only in the genomes of eusocial species (red) or only in Aculeate species (bold) were subsequently analyzed in all genomes listed.*

2007; Toth and Robinson, 2007; Bloch and Grozinger, 2011), the actual molecular events that supported traits contributing to eusociality remain elusive. Here, we propose that the molecular evolution of specific miRNAs could have contributed to the phenotypic evolution of eusociality. We propose that these miRNAs may have contributed to the emergence of eusociality by either introducing new regulatory nodes to ancestral behavioral genetic networks, and/or by supporting novel behavioral genetic networks.

The primary sequence of mature miRNAs is often completely conserved across long phylogenetic distances. Consequently, conserved miRNAs are likely to regulate similar target protein-coding genes in distant taxa, and thus support analogous phenotypes across phylogeny (Lee et al., 2007). Given their broad pleiotropic function, novel miRNAs can modify complex developmental or physiological genetic programs. Because of this, several investigators have suggested that, similarly to the evolution of protein regulatory networks (e.g., evolution of novel transcription factors), novel miRNAs could lead to evolutionary innovations (Sempere et al., 2006; Lee et al., 2007; Niwa and Slack, 2007; Tarver et al., 2012) such as the establishment of new body plans or novel behavioral traits (Peterson et al., 2009).

Consistent with this premise, the evolution of bilateral animals from eumetazoans was associated with a great expansion in the number of miRNA genes (Niwa and Slack, 2007). Other examples include the many novel miRNA genes found within placental mammals, and their clade-specific expansion in primates (Sempere et al., 2006). Although the evolution of eusociality is considered a major evolutionary transition event (Maynard Smith and Szathmary, 1995), the hypothesis that it was also associated with the evolution of novel miRNAs has not been previously suggested. We reasoned that the monophyletic Aculeata clade is ideal for testing this hypothesis since, based on current phylogenetic models (Danforth et al., 2013; Johnson et al., 2013), eusociality has independently emerged in this group multiple times (**Figure 2**). Below we discuss two independent, non-mutually exclusive hypotheses for the possible involvement of miRNAs in the evolution of eusociality.

#### Hypothesis 1: Specific miRNAs have been Repeatedly Associated with Eusocial Evolution in Hymenoptera

Here, we hypothesize that, similarly to the evolution of novel transcription factors, the repeated evolution of specific new miRNAs, either *de novo* or via duplication events, facilitated the evolution of some eusocial traits in multiple independent clades that currently display eusociality. Under this hypothesis, novel miRNAs in current eusocial species act as essential nodes in genetic networks that support eusocial traits. If true, we expect that specific miRNAs would be more likely to be present in the genomes of eusocial species in comparison to related solitary species.

As an initial test of this hypothesis, we searched for miRNA genes in the sequenced genomes of species in the Aculeata clade, which includes all living eusocial species in Hymenoptera. We first generated a list of all known annotated miRNA genes available in miRBase for the eusocial honey bee *Apis mellifera*, the solitary wasp *Nasonia vitripennis*, and the fruit fly *D. melanogaster* (Griffiths-Jones et al., 2006). Next, we searched for the presence or absence of each annotated miRNA in several representative hymenopteran genomes (**Table 2**) using BLASTN (Altschul et al., 1997). We only scored a miRNA as "present" if an exact match to the mature miRNA sequence was found in the genome (**Figure 3**). Consistent with data from other animal clades (Tarver et al., 2012), we found that most annotated miRNA genes aligned with phylogeny rather than with the presence or absence of eusociality. Nevertheless, five miRNA genes (*ame-miR-281*, *ame-miR-306*, *ame-miR-279c*, *ame-miR-279d*, and *ame-miR-6065*) seem to be associated with the expression of eusocial traits independent of phylogeny (**Figure 3**).
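The exact-match scoring rule described above can be illustrated with a minimal sketch. It is not the BLASTN pipeline used for the analysis: a plain substring search on both strands stands in for the alignment step, and the function names, species labels, and sequences are hypothetical toy values chosen only for illustration.

```python
# Hypothetical sketch: score a miRNA as "present" only when its mature
# sequence has an exact match on either strand of a genome sequence.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def has_exact_match(mature_mirna, genome_seq):
    """True if the mature miRNA (RNA or DNA alphabet) occurs exactly in the genome."""
    query = mature_mirna.upper().replace("U", "T")  # compare in the DNA alphabet
    target = genome_seq.upper()
    return query in target or reverse_complement(query) in target


def presence_matrix(mirnas, genomes):
    """Map miRNA name -> {species: present?} for every miRNA/genome pair."""
    return {
        name: {species: has_exact_match(seq, genome)
               for species, genome in genomes.items()}
        for name, seq in mirnas.items()
    }


# Toy inputs (made up, not real miRNA or genome sequences).
toy_mirnas = {"toy-miR-a": "ACGUUGCAAC"}
toy_genomes = {
    "species_A": "TTTACGTTGCAACTTT",  # forward-strand match
    "species_B": "GGGTTGCAACGTCC",    # reverse-strand match
    "species_C": "AAAAAAAAAAAA",      # no match
}

matrix = presence_matrix(toy_mirnas, toy_genomes)
```

In this toy case the miRNA is scored as present in `species_A` and `species_B` and absent in `species_C`. A real analysis would need to search whole genome assemblies, where an alignment tool is the practical choice.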

To further refine our results, we subsequently extended the bioinformatic analyses to all available sequenced hymenopteran genomes, as well as several non-hymenopteran insect species, which served as outgroups (**Figure 4A**). Although the low sequence coverage for some of the analyzed ant genomes could lead to a higher false-negative rate, we reasoned that the likelihood that certain miRNAs will be falsely missing from all analyzed genomes is very low. Future miRNA sequencing data from many of the species studied here should further help reduce the possibility of false negatives.

This analysis revealed that three out of the five putative eusociality-associated miRNAs were unique to Hymenoptera (*ame-miR-281*, *ame-miR-306*, and *ame-miR-279c*), and one was possibly unique to Aculeata (*ame-miR-6065*). The phylogenetic distribution of these five miRNAs indicated that multiple eusociality-associated miRNAs might have been gained and lost during the Hymenoptera radiation. In addition, we found that two eusociality-associated miRNA genes (*miR-306* and *miR-6065*) were lost in the eusocial wasp *Polistes dominula*. Notably, two of the eusociality-related miRNAs (*miR-281* and *miR-6065*) were also present in the genome of the facultatively eusocial bee *Lasioglossum albipes*. One possible explanation for this finding is that these specific miRNAs are important for traits associated with basal levels of sociality such as communal living, overlapping generations, and reproductive division of labor (Kocher et al., 2013).

Our analysis also revealed that two of the candidate eusociality-related miRNAs (*mir-279c* and *mir-279d*) belong to a single conserved miRNA-family (Cayirlioglu et al., 2008; Hartl et al., 2011; Luo and Sehgal, 2012; Mohammed et al., 2014). The most parsimonious interpretation of these observed phylogenetic patterns is that *miR-279d* is conserved across Arthropoda, but was lost in Diptera and Hymenoptera, and then reappeared via duplications in eusocial Aculeates. In contrast, *miR-279c* seems to have specifically evolved in Hymenoptera prior to the divergence of Aculeata, and was subsequently lost from nonsocial Aculeate species. The identification of members of the

#### FIGURE 3 | Phylogenetic distributions of miRNAs in Hymenoptera genomes. miRNAs present in each genome are shown in yellow, while those absent are shown in blue. miRNAs present in all or only one species are not shown. Data are clustered based on the phylogenetic relationships between the species analyzed, with eusocial species shown in red. Genes framed in black are present only in eusocial species. Genes framed in red are present only in Aculeata. The fruit fly *Drosophila melanogaster* served as the outgroup. Phylogeny based on Danforth et al. (2013) and Johnson et al. (2013).

*mir-279* family as possible candidate genes for the evolution of eusociality is in agreement with findings about their differential regulation between nurses and foragers (**Table 1**), and possible functions in *Drosophila*. For example, members of the *mir-279* family have been implicated in regulating neuronal development (Hartl et al., 2011), olfactory receptivity (Cayirlioglu et al., 2008; Hartl et al., 2011), and circadian rhythms (Luo and Sehgal, 2012). It is interesting to note that plasticity in both circadian rhythms (Bloch, 2010) and olfactory neurons has been shown to be associated with worker and reproductive divisions of labor in eusocial Hymenoptera (López-Riquelme et al., 2006; Zube and Rössler, 2008; Mysore et al., 2009). Although preliminary, these findings suggest that members of the *mir-279* gene family are prime candidates for studies on the possible roles of specific miRNAs in the evolution of eusociality-related traits.

To further test this hypothesis, it will be necessary to increase the phylogenetic resolution of our analyses by studying the miRNA repertoire encoded by the genomes of additional social and solitary insects. It will also require the development of tools that will allow the manipulation of focal miRNA expression to causally determine their effect on behavioral and physiological traits related to eusociality. The recent progress in genome-editing techniques for honey bees and other social insects (Wang et al., 2013; Schulte et al., 2014) suggests that this will be feasible in the near future. Another complementary approach will be to study the protein-coding genetic networks that eusociality-associated miRNAs interact with. By identifying the genes involved, their spatial and temporal expression patterns, and the possible physiological and behavioral processes they modulate, a higher resolution picture of the genetics that support eusociality could emerge.

#### Hypothesis 2: Aculeate-Specific miRNAs were Required for Eusocial Evolution

The second hypothesis we consider is that the presence of specific miRNAs in the pre-eusocial Aculeate genome might have "primed" certain species to evolve eusociality. In other words, specific miRNAs, already present in the genome of the solitary Aculeate ancestor were required, but not sufficient, for the emergence of eusocial traits. Under this hypothesis, specific miRNAs already present in the ancestral solitary aculeate increased the probability of emergence of specific behavioral and physiological traits in response to selective pressures that favored eusociality.

If true, we expect that specific miRNAs should be present in all Aculeate genomes, but absent from all other hymenopteran genomes, as eusociality has never been observed in hymenopteran species outside of the Aculeata. Our initial analysis revealed six Hymenoptera-specific miRNA genes (*ame-miR-927b*, *ame-miR-980*, *ame-miR-2765*, *ame-miR-3786*, *ame-miR-6001*, and *ame-miR-6048*; **Figure 3**). However, two of these genes were also present in the sawfly *Athalia rosae* (*ame-miR-927b* and *ame-miR-3786*), and therefore are not specific to Aculeata. Three additional genes (*ame-miR-980*, *ame-miR-2765*, and *ame-miR-6048*) appear to have originated after the divergence of Vespidae and therefore did not fulfill the above criteria (**Figure 4B**). Thus, our analysis revealed *ame-miR-6001* as the single Aculeate-specific miRNA candidate gene that should be tested in the context of the above hypothesis. Similarly to Hypothesis 1 (see Hypothesis 1: Specific miRNAs have been Repeatedly Associated with Eusocial Evolution in Hymenoptera), the possible role of *miR-6001* in the repeated evolution of eusocial traits in Aculeata is hypothetical. Directly testing the hypothesis we put forward here will require extensive molecular, biochemical, and phenotypic studies of its possible physiological and behavioral roles in eusocial traits.
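The prediction of Hypothesis 2 (present in every Aculeate genome, absent from every other hymenopteran genome) amounts to a simple filter over a presence/absence table. The sketch below illustrates that logic under stated assumptions: the function name, the miRNA names, and the species labels are hypothetical placeholders, not the data behind Figure 3 or Figure 4B.

```python
# Hypothetical sketch: keep miRNAs present in all ingroup genomes and
# absent from all outgroup genomes.
def clade_specific(presence, ingroup, outgroup):
    """Return sorted miRNA names that are present in every ingroup species
    and absent from every outgroup species."""
    return sorted(
        name
        for name, row in presence.items()
        if all(row[sp] for sp in ingroup) and not any(row[sp] for sp in outgroup)
    )


# Toy presence/absence table (made-up names and values).
toy_presence = {
    # fits the prediction
    "toy-miR-x": {"aculeate_1": True, "aculeate_2": True, "sawfly": False, "parasitoid": False},
    # also found outside the ingroup, so not clade-specific
    "toy-miR-y": {"aculeate_1": True, "aculeate_2": True, "sawfly": True, "parasitoid": False},
    # missing from one ingroup lineage, so it does not span the whole clade
    "toy-miR-z": {"aculeate_1": True, "aculeate_2": False, "sawfly": False, "parasitoid": False},
}

candidates = clade_specific(
    toy_presence,
    ingroup=["aculeate_1", "aculeate_2"],
    outgroup=["sawfly", "parasitoid"],
)
```

Only `toy-miR-x` passes both conditions here. Because absence calls depend on assembly quality, a filter of this kind is only as reliable as the underlying presence/absence scores.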


#### A Look to the Future

To date, the majority of data about the function of miRNAs in social insects come from studies of the European honey bee, *A. mellifera*. Thus, additional molecular and evolutionary analyses of non-hymenopteran eusocial insects as well as other eusocial and solitary clades in the Hymenoptera are required in order to better understand miRNA functions in the context of eusociality. Furthermore, to establish causation between the action of specific miRNAs and eusocial traits, new *in vivo* genetic and molecular techniques to manipulate social insects are required. Recent advances in molecular genetics of social insects (Yan et al., 2014), and the successful generation of transgenic honey bees (Ben-Shahar, 2014; Schulte et al., 2014), suggest that such studies might be possible in the near future. In addition, the development of pharmacological reagents that can block or mimic the action of specific miRNAs (e.g., antagomirs) would represent another important step in that direction (Cristino et al., 2014).

#### Author Contributions

ES collected and analyzed genomic data. ES, GB, and YB-S gathered and synthesized relevant literature. ES, GB, and YB-S wrote the manuscript.

#### Acknowledgments

We would like to thank Amy Toth for granting us access to unpublished genomic data of *P. dominula*. This work was supported by grants from the US National Science Foundation (1322783) to YB-S and the US-Israel Binational Science Foundation (BSF2012807) to GB.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Søvik, Bloch and Ben-Shahar. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Pleiotropy constrains the evolution of protein but not regulatory sequences in a transcription regulatory network influencing complex social behaviors

#### *Daria Molodtsova<sup>1</sup>, Brock A. Harpur<sup>1</sup>, Clement F. Kent<sup>1†</sup>, Kajendra Seevananthan<sup>2</sup> and Amro Zayed<sup>1</sup>\**

*<sup>1</sup> Department of Biology, York University, Toronto, ON, Canada*

*<sup>2</sup> Department of Computer Science and Engineering, York University, Toronto, ON, Canada*

#### *Edited by:*

*Greg J. Hunt, Purdue University, USA*

#### *Reviewed by:*

*Brian R. Johnson, University of California, Davis, USA Sandra M. Rehan, University of New Hampshire, USA*

#### *\*Correspondence:*

*Amro Zayed, Department of Biology, York University, 4700 Keele Street, Toronto, ON M3J 1P3, Canada e-mail: zayed@yorku.ca*

#### *†Present address:*

*Clement F. Kent, HHMI Janelia Farm Research Campus, Ashburn, VA, USA*

It is increasingly apparent that genes and networks that influence complex behavior are evolutionarily conserved, which is paradoxical considering that behavior is labile over evolutionary timescales. How does adaptive change in behavior arise if behavior is controlled by conserved, pleiotropic, and likely evolutionarily constrained genes? Pleiotropy and connectedness are known to constrain the general rate of protein evolution, prompting some to suggest that the evolution of complex traits, including behavior, is fuelled by regulatory sequence evolution. However, we seldom have data on the strength of selection on mutations in coding and regulatory sequences, and this hinders our ability to study how pleiotropy influences coding and regulatory sequence evolution. Here we use population genomics to estimate the strength of selection on coding and regulatory mutations for a transcriptional regulatory network that influences complex behavior of honey bees. We found that replacement mutations in highly connected transcription factors and target genes experience significantly stronger negative selection relative to weakly connected transcription factors and targets. Adaptively evolving proteins were significantly more likely to reside at the periphery of the regulatory network, while proteins with signs of negative selection were near the core of the network. Interestingly, connectedness and network structure had minimal influence on the strength of selection on putative regulatory sequences for both transcription factors and their targets. Our study indicates that adaptive evolution of complex behavior can arise because of positive selection on protein-coding mutations in peripheral genes, and on regulatory sequence mutations in both transcription factors and their targets throughout the network.

**Keywords:** *Apis mellifera***, network hubs, natural selection, evo devo, social evolution**

#### **INTRODUCTION**

Understanding the genetics and evolution of complex traits is a central goal in biology. Behavior is a complex phenotype that exhibits a high degree of variation within an individual's lifetime, within and between populations of the same species, and between species. Behavioral genetics research conducted over the past decade has emphasized the role of conserved genes in behavioral evolution. There is good evidence that behavior, like most complex phenotypes, is controlled by gene regulatory networks that exhibit modularity and pleiotropy, and that genes and gene networks that influence behavior in one organism also influence similar behaviors in evolutionarily distant species (Anholt and Mackay, 2004; Reaume and Sokolowski, 2011; Zayed and Robinson, 2012). This conservation of gene action on behavior has allowed researchers to study behavioral evolution within the framework of Evolutionary Developmental Biology (i.e., evo devo) (Carroll, 2008). The synthesis of behavioral genetics and evo devo has led to many insights (Linksvayer and Wade, 2005; Toth and Robinson, 2007, 2009), including the existence of a genetic tool kit for behavior (i.e., conserved gene modules that influence basic forms of behavior across species), and that complex behaviors can evolve through the co-option of genetic modules that control simple forms of behavior. In contrast to the evo devo paradigm, there is a burgeoning body of literature suggesting that novel taxonomically-restricted genes are important, and perhaps most prominent, in behavioral evolution (Johnson and Tsutsui, 2011; Chen et al., 2013; Ferreira et al., 2013; Simola et al., 2013; Harpur et al., 2014; Jasper et al., 2014; Sumner, 2014). Fortunately, genomics-enabled research on a variety of model and non-model organisms is providing a wealth of information on the contribution of novel and conserved genes to the genetic architecture of complex traits. Along with population genomic data on levels of selection acting on genes and regulatory sequences, evolutionary biologists are on the verge of ultimately testing the different theories of phenotypic evolution.

The different paradigms of phenotypic evolution make distinct predictions about the relative contribution of regulatory and protein-coding sequence changes. On one end of the spectrum, the evo devo paradigm emphasizes the role of adaptive regulatory sequence evolution (Wray, 2007; Carroll, 2008) because of the assumption that genes with multiple functions, or genes that interact with other genes, are expected to experience a great deal of constraint at their amino acid sequence (Fisher, 1930). Others have challenged this central assumption of the evo devo paradigm by arguing that seemingly "conserved" proteins, including transcription factors, have several features that allow them to "escape" the constraining influence of pleiotropy thereby allowing adaptive evolution via amino-acid changing mutations (Lynch and Wagner, 2008; Wagner and Lynch, 2008); such features include alternative splicing, modularity at the level of protein domain and structure, and the presence of mutable short or simple sequence motifs. At the other end of the spectrum, there is a growing interest in novel taxonomically restricted genes that are free to evolve new functions without suffering from the constraining effect of pleiotropy (Chen et al., 2013). Empirical evidence does not fully support any one of these three paradigms over the others; there is population genetic evidence for both adaptive protein sequence evolution and adaptive regulatory sequence evolution in many organisms (Andolfatto, 2005; Hoekstra and Coyne, 2007; Halligan et al., 2010, 2013; Harpur et al., 2014; Wallberg et al., 2014). However, most previous tests of these paradigms involved correlating general rates of protein evolution with molecular features of genes and their position in regulatory networks (e.g., Hahn and Kern, 2005; Kim et al., 2007; Davila-Velderrain et al., 2014); data on the actual levels of positive or negative selection on coding sequences (Assis and Kondrashov, 2014) are seldom used. Moreover, we know virtually nothing about how pleiotropy and the structure of gene regulatory networks affect patterns of regulatory sequence evolution.

The honey bee *Apis mellifera* has emerged as a model organism for studying the genetics and evolution of complex behaviors (Hunt et al., 2007; Page et al., 2012; Zayed and Robinson, 2012). Here we use several powerful genomic resources developed for the honey bee to examine if regulatory networks that influence behavior follow the predictions of the evo devo paradigm for phenotypic evolution. Chandrasekaran et al. (2011) recently constructed a brain transcriptional regulatory network (TRN) influencing several aspects of worker behavior, including behavioral maturation, foraging, and colony defense. The honey bee brain TRN is highly amenable to studies of how connectedness and network topology constrain behavioral and molecular evolution, especially given the recent availability of a large population genomic dataset for the honey bee (Harpur et al., 2014), which consists of genome-wide polymorphism data for 39 *A. mellifera* diploid genomes and genome-wide divergence data between *A. mellifera* and its sister species *A. cerana*.

We used the honey bee population genomic dataset to study the strength of selection on protein and putative *cis-*regulatory sequences of genes in the bee brain TRN. We tested the following hypotheses from the evo devo paradigm: (1) Highly connected TFs and target genes are predicted to experience stronger negative selection on nonsynonymous mutations relative to weakly connected TFs and target genes and (2) Genes with signs of adaptive amino acid sequence evolution are expected to be less central within the regulatory network. The evo devo paradigm does not explicitly make predictions about the relationship between pleiotropy and regulatory sequence evolution, but rather predicts that the evolution of regulatory sequences should be less constrained relative to protein sequence evolution, and that regulatory mutations are more likely to fuel adaptive evolution. We compared the average selection coefficient on mutations in putative *cis-*regulatory regions of strongly and weakly connected genes within the TRN to explore how network properties influence regulatory sequence evolution. Our study provides an important glimpse into the evolution of regulatory networks that influence complex behaviors.

#### **MATERIALS AND METHODS**

#### **SEQUENCING, ALIGNMENT, SNP CALLING AND MODIFIED McDONALD-KREITMAN (MK) TESTS**

We recently sequenced 40 honey bee genomes, each at approximately 40X coverage, using Illumina Hi-Seq technology (Harpur et al., 2014). Alignment and polymorphism identification were described in detail by Harpur et al. (2014). We used a Bayesian implementation of the McDonald-Kreitman (MK) test, using SnIPRE (Eilertson et al., 2012), to determine the population size scaled selection coefficient γ for 12,303 genes in the honey bee genome. Here, we used the population genomics dataset to study selection acting on putative *cis-*regulatory regions of the honey bee genome. We first estimated the number of polymorphic mutations in *A. mellifera,* and the number of fixed mutations between *A. mellifera* and its sister species *A. cerana,* in putative *cis*-regulatory regions of honey bee genes. Because the regulatory sequences of the honey bee genome have not been characterized, we considered the 1000 bp sequence upstream of each gene's start codon as a putative *cis-*regulatory region (Davidson, 2006; Li et al., 2006; Myers, 2014). We excluded upstream sequences that overlapped genes encoded by the complementary DNA strand, resulting in putative *cis*-regulatory regions with an average size of 905 bp. These regions are expected to contain most of the sequences important for transcriptional and translational control, including the 5'UTR and important transcription factor binding sites (Davidson, 2006; Li et al., 2006; Myers, 2014). Our cut-off would have certainly excluded some regulatory sequences that reside far upstream of genes (Negre et al., 2011), sequences that are currently very difficult to annotate in the honey bee. Despite this important caveat, our population genomic analyses (see Results) show an overall signature of negative purifying selection within 1 kb upstream of genes, which is consistent with such regions having a functional role related to gene regulation (Dunham et al., 2012; Wittkopp and Kalay, 2012). Following Torgerson et al. (2009), we studied the evolution of *cis-*regulatory regions using a modified MK test by comparing the ratio of fixed:polymorphic mutations in a *cis-*regulatory sequence of a gene to the same ratio for silent sites in the same gene. The modified MK test was implemented using SnIPRE (Eilertson et al., 2012), which allowed us to estimate the average population size scaled selection coefficients on regulatory sequence mutations. Similar to Harpur et al. (2014), we only used polymorphism data from African honey bee genomes, which represent a large population that is minimally impacted by human management (Harpur et al., 2012; Kent et al., 2012).
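The comparison at the heart of the modified MK test can be illustrated with a minimal sketch. This is the classic per-gene contingency-table form, not the SnIPRE model used in the study, and it does not produce γ. The function names and the counts in the example are hypothetical.

```python
from math import comb


def neutrality_index(fixed_reg, poly_reg, fixed_syn, poly_syn):
    """(polymorphic/fixed) at regulatory sites divided by the same ratio at
    synonymous sites. Values above 1 indicate an excess of regulatory
    polymorphism (consistent with negative selection); values below 1
    indicate an excess of fixed regulatory differences."""
    if 0 in (fixed_reg, fixed_syn, poly_syn):
        raise ValueError("ratio undefined when a required count is zero")
    return (poly_reg / fixed_reg) / (poly_syn / fixed_syn)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all margins held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# Hypothetical counts for one gene: rows are site classes (regulatory,
# synonymous), columns are (fixed, polymorphic).
ni = neutrality_index(fixed_reg=10, poly_reg=40, fixed_syn=20, poly_syn=40)
p_value = fisher_exact_two_sided(10, 40, 20, 40)
```

With these made-up counts the neutrality index is 2.0. Per-gene tables are often sparse, which is one reason to prefer an approach that pools information across genes.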

#### **TRN CONSTRUCTION AND ANALYSIS**

The honey bee brain TRN (Chandrasekaran et al., 2011) is freely available online (Web: http://price.systemsbiology.net/honeybee-transcriptional-regulatory-network). The dataset consisted of microarray probes for TFs and their targets in the bee brain TRN. We remapped the array probes to the honey bee's official gene set OGS v3.2 (Elsik et al., 2014) using Blastn v. 2.2.28+. We only retained probes that had perfect matches to OGS v3.2 gene predictions. We were able to blast match microarray probes to 191 transcription factors and 1597 target genes. We restricted our analyses to 184 TFs and 1521 target genes that had γ estimates for coding and putative regulatory sequences. We estimated the number of target genes for every transcription factor (k ranged from 1 to 161), and the number of transcription factors regulating every target (k ranged from 1 to 15). We plotted the regulatory network using Gephi (Bastian et al., 2009) and produced a directed graph with 1504 nodes and 5149 edges representing transcription factor–target interactions. Gephi was used to estimate *betweenness* centrality of the genes in the network. We used the R package poweRlaw (Gillespie, 2014) to fit a power law distribution to TRN connectedness using established methods (Clauset et al., 2009). Statistical tests were carried out using R. We used a one-tailed test to compare the γ of hub and non-hub TFs and targets, given *a priori* theoretical expectations and empirical findings regarding the relationship between pleiotropy/connectedness and molecular evolution. All other *p*-values are two-tailed. It is important to note that the honey bee brain TRN was developed by first selecting honey bee TFs that had robust orthologs to *Drosophila* TFs (Chandrasekaran et al., 2011); the bee brain TRN is thereby enriched for old taxonomically-conserved TFs and target genes. Our study of the bee brain TRN can therefore illuminate how ancestral gene networks influencing behaviors evolve, but tell us little about the role of taxonomically-restricted genes in behavioral evolution, a topic that we recently discussed elsewhere (Harpur et al., 2014).
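The two connectedness measures described above (targets per TF and TFs per target) can be computed directly from a list of TF–target edges. The sketch below is illustrative only: the function name and the toy edge list are hypothetical, and the study itself used Gephi and R rather than this code.

```python
from collections import Counter


def connectedness(edges):
    """Given (tf, target) pairs, return two Counters:
    number of targets per TF, and number of TFs per target.
    Duplicate edges are counted once."""
    unique_edges = set(edges)
    targets_per_tf = Counter(tf for tf, _ in unique_edges)
    tfs_per_target = Counter(target for _, target in unique_edges)
    return targets_per_tf, tfs_per_target


# Toy edge list with made-up gene names; one edge is deliberately duplicated.
toy_edges = [
    ("tfA", "gene1"),
    ("tfA", "gene2"),
    ("tfB", "gene2"),
    ("tfA", "gene2"),
]

targets_per_tf, tfs_per_target = connectedness(toy_edges)
```

Here `tfA` regulates two targets and `gene2` is regulated by two TFs. Collapsing duplicate edges is an assumption of this sketch; whether repeated probe-level interactions should be merged depends on how the network file is structured.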

#### **RESULTS**

#### **SELECTION ON REGULATORY AND CODING SEQUENCES IN THE HONEY BEE GENOME**

We had previously estimated the average population size scaled selection coefficient γ on nonsynonymous mutations in 12,303 genes in the honey bee genome since divergence between *A. mellifera* and *A. cerana* (ca. 5 MYA) (Harpur et al., 2014). Here we used a variant of the MK test (Torgerson et al., 2009; implemented using Eilertson et al., 2012) to estimate the average γ on mutations in putative *cis*-regulatory sequences by comparing the ratio of polymorphic:fixed mutations within 1 kb upstream of a gene's start codon to the ratio of polymorphic:fixed synonymous mutations at the same gene. We were able to estimate γ on the putative *cis*-regulatory sequences of 10,807 genes in the honey bee genome (**Figure 1**). We found most (93%) *cis*-regulatory sequences to have estimates of γ consistent with neutral or nearly neutral evolution (−1 < γ < 1). About 6% of *cis*-regulatory sequences have γ < −1, indicative of negative purifying selection, while 1% of sequences have signs of positive selection (γ > 1). In contrast to evolution of protein coding sequences (average γ ∼ 0), the average mutation in *cis*-regulatory regions appears to be weakly deleterious (average γ = −0.4). This pattern was previously observed in humans (Torgerson et al., 2009) and most likely results from an observational bias: sequences from rapidly evolving regulatory regions will have many mismatches between *A. mellifera* and *A. cerana*, which results in lower alignment scores and coverage, and would have been removed from the dataset based on our quality control filters. As such, direct comparisons of the selection coefficient on coding and regulatory mutations are not appropriate. Instead, we examined the influence of a gene's connectedness and position within the TRN on regulatory and protein sequence evolution in separate analyses.
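The three γ categories used in this section can be written as a small classification rule. The sketch below is illustrative: the function names and the example values are hypothetical, and assigning the boundary values γ = ±1 to the nearly neutral class is an assumption, since the thresholds are given as strict inequalities on both sides.

```python
from collections import Counter


def classify_gamma(gamma):
    """Label a selection coefficient using the thresholds gamma < -1 and gamma > 1.
    Boundary values (exactly -1 or 1) are treated as nearly neutral (assumption)."""
    if gamma < -1:
        return "negative"
    if gamma > 1:
        return "positive"
    return "nearly neutral"


def category_fractions(gammas):
    """Fraction of estimates falling in each category."""
    counts = Counter(classify_gamma(g) for g in gammas)
    return {label: counts[label] / len(gammas)
            for label in ("negative", "nearly neutral", "positive")}


# Made-up gamma values, not estimates from the study.
toy_gammas = [-2.0, -0.4, 0.0, 0.5, 3.0]
fractions = category_fractions(toy_gammas)
```

For the toy values above, one of five estimates is classed as negative, three as nearly neutral, and one as positive.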

#### **NETWORK TOPOLOGY AND EVOLUTION OF TFs AND THEIR TARGET GENES**

We studied patterns of selection on coding and regulatory mutations in 170 transcription factors (TFs) and 1334 of their target genes in the honey bee brain TRN. Similar to other regulatory networks (Babu et al., 2004; Nicolau and Schoenauer, 2009), the honey bee brain TRN is approximately scale-free, whereby the distribution of connectedness (*k*) between the network nodes (i.e., genes) has a very long tail (Supplementary Information **Figure S1**). The bee brain TRN contained a large number of genes with a small number of connections, and a small number of genes with a large number of connections, often called "hub" genes. The number of connections, *k*, between nodes in a scale-free network follows a power law, at least above a certain value of *k* (Nicolau and Schoenauer, 2009). Connectedness varied between 1 and 161 in the honey bee brain TRN, and we found the tail of the connectedness distribution to follow a power law (*x*<sub>min</sub> = 42, α = 3.00; *H*<sub>0</sub> = power law: goodness of fit = 0.088, *p* = 0.32). We elected to analyse the dataset by categorizing genes as hub or non-hub, following Wang et al. (2010a), because analyses based on linear models or correlations do not adequately deal with the properties of regulatory networks (i.e., the distribution of connections within the TRN is not normal). Following Wang et al. (2010a), we considered the top 20% of most connected TFs as hubs (*k* > 44 connections). Hub TFs were more central in the network as evidenced by a significantly higher estimate of eigenvector centrality relative to non-hub TFs (Wilcoxon test, *p* < 2.2e-16). We found that hub TFs had a significantly lower mean coding γ than non-hub transcription factors (**Figure 2A**, Wilcoxon 1-tailed *p* = 0.0025), and that hub TFs were significantly enriched for genes with negative coding γ (Chi square enrichment *p* = 0.015) relative to non-hub TFs. In contrast to coding γ, hub TFs and non-hub TFs did not significantly differ with respect to *cis*-regulatory γ (**Figure 2C**, Wilcoxon 1-tailed *p* = 0.27). Hub and non-hub TFs did not significantly differ in terms of sequence coverage and length at regulatory and coding sites (Supplementary Information Table 1).
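The hub/non-hub split described above amounts to ranking genes by connectedness and taking the top fifth. A minimal sketch follows, assuming connectedness is available as a mapping from gene name to *k*. The gene names, the *k* values, and the choice to count ties at the cutoff as hubs are illustrative assumptions, not details taken from the study.

```python
import math


def classify_hubs(connectedness, top_fraction=0.20):
    """Split genes into hubs and non-hubs by connectedness k.

    connectedness: dict mapping gene name -> number of connections (k).
    Returns (hubs, non_hubs, cutoff), where hubs are genes with k >= cutoff.
    Ties at the cutoff are counted as hubs (an assumption of this sketch).
    """
    if not connectedness:
        raise ValueError("no genes supplied")
    ranked = sorted(connectedness.values(), reverse=True)
    n_hubs = max(1, math.floor(top_fraction * len(ranked)))
    cutoff = ranked[n_hubs - 1]
    hubs = {gene for gene, k in connectedness.items() if k >= cutoff}
    non_hubs = set(connectedness) - hubs
    return hubs, non_hubs, cutoff


# Hypothetical connectedness values for ten transcription factors.
example_k = {
    "tf_a": 161, "tf_b": 50, "tf_c": 44, "tf_d": 30, "tf_e": 12,
    "tf_f": 8, "tf_g": 5, "tf_h": 3, "tf_i": 2, "tf_j": 1,
}
hubs, non_hubs, cutoff = classify_hubs(example_k)
print(sorted(hubs), cutoff)
```

The per-gene γ estimates of the two groups could then be compared with a rank-based test such as the Wilcoxon test reported above.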

Similar to TFs, we used connectedness to classify target genes in the TRN into hubs (top 20%) and non-hubs based on *k*. Hub target genes within the TRN were regulated by four or more TFs, and were significantly more central within the network relative to non-hub target genes (Wilcoxon *p* = 2.2e-16). Similar to the differences between hub TFs and non-hub TFs, hub target genes had significantly lower coding γ (**Figure 2B**, Wilcoxon 1-tailed *p* = 0.0425), but not *cis*-regulatory γ (**Figure 2D**, Wilcoxon 1-tailed *p* = 0.12) relative to non-hub target genes. Hub and non-hub target genes did not significantly differ in terms of sequence coverage and length at regulatory and coding sites (Supplementary Information Table 1).

#### **WHERE IS POSITIVE SELECTION ACTING WITHIN THE TRN?**

We mapped all genes with signatures of positive selection on coding and *cis*-regulatory sequences in the TRN (**Figure 3**). We also estimated *betweenness* for each gene in the TRN; *betweenness* is a global measure of centrality (Borgatti and Everett, 2006) which ranges from 1, indicating most central or at the core of the network, to 0, indicating the outside perimeter or the periphery of the network. We compared the average *betweenness* of genes with substantial signs of positive (γ > 1) and negative (γ < −1) selection. We found that proteins with signatures of positive selection on their coding sequences had significantly lower *betweenness* relative to proteins with signatures of negative selection, indicating that adaptively evolving proteins are often more distant from the network core relative to proteins with signs of negative selection (**Figure 4A**, Wilcoxon two-tailed *p* = 0.04). In contrast, we did not find a significant difference in the *betweenness* of genes with positive selection on their *cis*-regulatory sequences relative to those with negative selection on their *cis*-regulatory sequences (**Figure 4B**, Wilcoxon two-tailed *p* = 0.4). This indicates that genes with regulatory sequences experiencing positive selection reside in approximately the same locations within the TRN as genes with regulatory sequences experiencing negative selection.
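To make the *betweenness* measure concrete, the sketch below computes shortest-path betweenness with Brandes' algorithm for a small undirected, unweighted network. It rescales scores so that the centre of a star-shaped network scores 1 and peripheral nodes score 0. The software and normalisation actually used in the study are not stated in this section, so the rescaling, the function name, and the toy network are assumptions.

```python
from collections import deque


def betweenness(adj):
    """Normalised shortest-path betweenness (Brandes' algorithm).

    adj: dict mapping node -> set of neighbouring nodes
         (undirected, unweighted; every node must appear as a key).
    Returns a dict node -> score in [0, 1].
    """
    raw = {v: 0.0 for v in adj}
    for source in adj:
        # Breadth-first search counting shortest paths from the source.
        order = []
        preds = {v: [] for v in adj}
        n_paths = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        n_paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        dependency = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                dependency[v] += n_paths[v] / n_paths[w] * (1 + dependency[w])
            if w != source:
                raw[w] += dependency[w]
    n = len(adj)
    # Each unordered pair is visited from both ends, so dividing by
    # (n - 1)(n - 2) gives the usual normalisation for undirected graphs.
    scale = (n - 1) * (n - 2) if n > 2 else 1
    return {v: raw[v] / scale for v in adj}


# Hypothetical toy network: one transcription factor regulating four targets.
toy_network = {
    "TF": {"t1", "t2", "t3", "t4"},
    "t1": {"TF"}, "t2": {"TF"}, "t3": {"TF"}, "t4": {"TF"},
}
scores = betweenness(toy_network)
print(scores)
```

Genes could then be grouped by their γ estimates and the betweenness of the two groups compared with a rank-based test, as reported above.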

#### **DISCUSSION**

We examined how gene position within a network influenced the average selection coefficient γ on putative *cis*-regulatory and replacement mutations in 1504 genes in the honey bee brain TRN. Our results support a "mosaic" view of phenotypic evolution by illuminating how the scale-free properties of regulatory networks (Wang et al., 2010b; Le Nagard et al., 2011; Wagner and Zhang, 2011) facilitate adaptive evolution involving both coding and regulatory mutations.

Several lines of evidence suggest that the most connected, and likely most pleiotropic, proteins within the bee brain TRN experience the greatest levels of purifying selection, as predicted by Fisher (1930) and the Evo Devo paradigm (Carroll, 2008). Despite the large number of factors that influence the rate of molecular evolution of genes (Xia et al., 2009) we consistently found that the most connected genes in the TRN had the strongest signatures of negative selection on their coding sequence. In brief, transcription factors that regulate hundreds of target genes experience, on average, stronger negative selection on their coding sequence relative to transcription factors that regulate a few target genes (**Figure 2A**). Hub transcription factors likely have to interact with many other co-factors, in addition to binding target promoter sites, which may be responsible for the stronger levels of purifying selection on their amino acid sequence. Similar to hub transcription factors, hub target genes that are regulated by many transcription factors experience stronger negative selection on their coding sequence relative to target genes that are regulated by a few transcription factors. Target genes that are regulated by multiple TFs may be expressed in multiple tissues or in multiple contexts relative to target genes regulated by a few TFs, resulting in greater pleiotropy and stronger purifying selection, as evident from our analysis (**Figure 2B**). It is important to note that several genes within the TRN had signs of adaptive protein evolution; most of these genes were transcription factor targets, and most resided near the periphery of the TRN. Lynch and Wagner (2008) and Wagner and Lynch (2008) previously argued that proteins, including conserved TFs, have features that allow them to escape from the negative effects of pleiotropy. Our population genomic data are not fully consistent with the Lynch and Wagner hypotheses because the most central and most connected TFs or targets do experience stronger levels of negative selection vs. peripheral and weakly connected TFs or targets; a relationship that is more in line with the classic evo devo paradigm. We strongly believe that the structure of TRNs holds the key for reconciling the predictions of the evo devo paradigm with the empirical data showing that amino-acid changes do contribute to adaptive evolution. The classic evo devo paradigm assumes that *most* genes are constrained by pleiotropy, while studies of TRN structure clearly show that only a *few* genes are highly connected and central, while *most* genes are weakly connected and peripheral. Although pleiotropy does appear to curtail adaptive protein sequence evolution of the *few* most connected and most central genes within a TRN, adaptive protein evolution is still a powerful evolutionary force for *most* TRN genes that reside *at* the network periphery.

In stark contrast to the influence of TRN topology on protein coding evolution, we found that connectedness matters little with respect to levels of selection on putative *cis*-regulatory regions. The average selection coefficient on regulatory sequence mutations of hub transcription factors was similar to that of non-hub transcription factors (**Figure 2C**). Similarly, the selection coefficient on regulatory sequences of hub target genes was similar to that of non-hub target genes. Genes with signs of adaptive regulatory sequence evolution were found in similar locations within the TRN as genes with negative selection on their regulatory sequences. Our analysis indicates that network properties do not significantly shape the selection pressures acting on regulatory sequences within the TRN. It is not clear how this evidence supports the evo devo paradigm because the evo devo paradigm does not make explicit predictions about the relationship between pleiotropy, connectedness and regulatory sequence evolution. On one hand, our finding that putative *cis*-regulatory sequences evolve independently of TRN connectedness and topology appears to support an important assumption of the evo devo paradigm: pleiotropy or connectedness of a protein only influences the protein's amino acid sequence, not its *cis*-regulatory sequence. On the other hand, another interpretation of the evo devo paradigm suggests that the most connected and pleiotropic genes should have the greatest levels of adaptive regulatory evolution, while the least connected genes should have the lowest levels of adaptive regulatory evolution (i.e., regulatory sequence evolution compensates for constrained amino acid sequences); our findings do not support this idea. It would appear that adaptive regulatory sequence evolution can occur throughout any compartment of the regulatory network.

Our analyses shed light on the evolution of regulatory networks influencing complex behavior. Highly connected genes within the honey bee brain TRN exhibit stronger patterns of purifying selection on amino acid replacement mutations, similar to highly connected genes in other types of networks studied so far. Also, genes with signs of adaptive protein evolution tend to be concentrated at the network periphery, as previously documented for proteins in the Human Interactome (Kim et al., 2007). We found that connectedness does not influence the strength of selection on regulatory sequences of genes in the bee brain TRN. Our study suggests that the properties of regulatory networks, with a few large modules and many small modules, allow both coding and regulatory sequence mutations to contribute to adaptive evolution. Based on our findings, we expect adaptive evolution of regulatory networks influencing complex traits to proceed through positive selection on coding mutations in peripheral genes and on regulatory mutations in TFs and their targets across the regulatory network. We had previously presented strong evidence that novel taxonomically-restricted genes have the highest rates of adaptive protein evolution in the honey bee genome (Harpur et al., 2014). A recent analysis also pointed to an increased expansion of regulatory sequences in social genomes (Simola et al., 2013). Going forward, it will be important to study how novel taxonomically-restricted genes interact with conserved TRN modules with expanded regulatory features to influence the evolution of complex behaviors in social insects.

#### **ACKNOWLEDGMENTS**

We thank K. Eilertson for help with SnIPRE, the N. Price and G. Robinson groups for constructing the honey bee brain TRN and for making this valuable dataset publicly available, and two anonymous referees for providing insightful comments on our manuscript. This study was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada and an Early Researcher Award from the Ontario Ministry of Research and Innovation (to Amro Zayed). Brock A. Harpur was supported by an Elia Doctoral Scholarship from York University.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00431/abstract

#### **Figure S1 | Degree distribution of connectedness in the TRN.**

#### **REFERENCES**


Fisher, R. A. (1930). *The Genetic Theory of Natural Selection*. New York, NY: Dover.

Gillespie, C. S. (2014). Fitting heavy tailed distributions: the powerlaw package. *R package version* 0.20.25 (Newcastle, UK).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 October 2014; accepted: 21 November 2014; published online: 23 December 2014.*

*Citation: Molodtsova D, Harpur BA, Kent CF, Seevananthan K and Zayed A (2014) Pleiotropy constrains the evolution of protein but not regulatory sequences in a transcription regulatory network influencing complex social behaviors. Front. Genet. 5:431. doi: 10.3389/fgene.2014.00431*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Molodtsova, Harpur, Kent, Seevananthan and Zayed. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Neutral and adaptive explanations for an association between caste-biased gene expression and rate of sequence evolution

#### *Heikki Helanterä<sup>1</sup>\* and Tobias Uller<sup>2,3</sup>*

*<sup>1</sup> Department of Biosciences, Centre of Excellence in Biological Interactions, University of Helsinki, Helsinki, Finland*

*<sup>2</sup> Department of Zoology, Edward Grey Institute, University of Oxford, Oxford, UK*

*<sup>3</sup> Department of Biology, University of Lund, Sölvegatan, Lund, Sweden*

#### *Edited by:*

*Greg J. Hunt, Purdue University, USA*

#### *Reviewed by:*

*Jacob A. Tennessen, Oregon State University, USA*

*Alex Wong, Carleton University, Canada*

#### *\*Correspondence:*

*Heikki Helanterä, Department of Biosciences, Centre of Excellence in Biological Interactions, University of Helsinki, PO Box 65, FI-00014 Helsinki, Finland e-mail: heikki.helantera@helsinki.fi*

The castes of social insects provide outstanding opportunities to address the causes and consequences of evolution of discrete phenotypes, i.e., polymorphisms. Here we focus on recently described patterns of a positive association between the degree of caste-specific gene expression and the rate of sequence evolution. We outline how neutral and adaptive evolution can cause genes that are morph-biased in their expression profiles to exhibit historical signatures of faster or slower sequence evolution compared to unbiased genes. We conclude that evaluation of different hypotheses will benefit from (i) reconstruction of the phylogenetic origin of biased expression and changes in rates of sequence evolution, and (ii) replicated data on gene expression variation within versus between morphs. Although the data are limited at present, we suggest that the observed phylogenetic and intra-population variation in gene expression lends support to the hypothesis that the association between caste-biased expression and rate of sequence evolution largely is a result of neutral processes.

**Keywords: polymorphism, social insects, phenotypic plasticity, neutral evolution, antagonistic pleiotropy**

#### **INTRODUCTION**

Polymorphic populations are comprised of distinct interbreeding phenotypes. These include males and females, alternative male mating morphs and different forms of resource, dispersal, or defense polymorphisms (West-Eberhard, 2003). Notably, polymorphisms also include the social insect castes, such as queens and workers. Different forms of polymorphisms vary in several respects (**Box 1**), but they have in common that determination and maintenance of morph-specific phenotypes involve differential expression of genes. Polymorphisms are outstanding model systems for studying the relative importance of changes in regulatory versus coding sequences (Carroll, 2005; Hoekstra and Coyne, 2007), and the interchangeability of genes and environments in phenotypic evolution (West-Eberhard, 2003; Schwander and Leimar, 2011; Uller and Helanterä, 2011).

The recent increase in availability of large scale gene expression data through microarray and whole transcriptome sequencing has facilitated quantitative and qualitative description of the developmental genetic basis of polymorphism. In social insects, caste biased expression patterns have been investigated transcriptome wide in, for example, honeybees *Apis mellifera* (Grozinger et al., 2007), bumblebees *Bombus terrestris* (Colgan et al., 2011), *Polistes* wasps (Sumner et al., 2006; Ferreira et al., 2013) and ants such as *Solenopsis invicta* (Hunt et al., 2011, 2013) and *Temnothorax longispinosus* (Feldmeyer et al., 2014). Comparisons can be made with data from other polymorphic systems, including males versus females of laboratory model species with genotypic sex determination (e.g., mice and *Drosophila* spp.; Ranz et al., 2003; Zhang et al., 2007; Mank et al., 2008; Meisel, 2011), and a variety of environmentally induced polymorphisms, such as horn polymorphism in beetles (Snell-Rood et al., 2011), feeding type polymorphism in toad tadpoles (Leichty et al., 2012), and dispersal polymorphisms in pea aphids (Purandare et al., 2014). Sequencing methods, sampling design, pooling of samples, statistical power and definitions of what qualifies as a morph biased expression pattern vary extensively among studies, and the proportions of genes or transcripts that are classified as morph-biased range from a few percent to several tens of percent. For example, between 7.5% (Hunt et al., 2011) and 40% (Grozinger et al., 2007) of studied genes have been classified as caste biased in social insects. In males and females, Naurin et al. (2011) describe only 1.6–2.4% of genes as sex biased in two bird species, whereas as many as 90% of genes were reported to exhibit sex biased expression in a study of *Drosophila* (Innocenti and Morrow, 2010). Approximately half of the genes in the pea aphid show a biased expression pattern according to either morph or sex (Purandare et al., 2014). However, across all these examples, very few genes, if any, are exclusively expressed in one morph. That is, gene expression patterns vary between morphs in degree, and not in an on-or-off manner.

#### **Box 1 | Animal polymorphism.**

Different forms of polymorphisms share many features, but there are also important differences that may affect how the developmental genetic regulation of their determination and function will evolve. This is an area that would benefit strongly from theory and comparative analyses, and here we can only provide a brief summary of some morph features that may be important for the relationship between patterns of gene expression and sequence evolution.

All morphs are by definition discrete phenotypes, but the degree to which morphs differ from each other varies dramatically between systems. Queens and workers are among the more extreme polymorphisms, but even within social insects the extent to which they differ morphologically (e.g., size and shape) and physiologically (e.g., reproductive activity, lifespan) is quite variable (Bourke, 1999). In some systems where morphs represent, for example, alternative reproductive strategies, it is not necessarily the case that the average difference in gene expression between morphs is greater than differences within morphs (e.g., throughout the season). Other morphs are not even functionally different and hence might only differ consistently in gene expression during morph determination (e.g., some color or pattern morphs).

Morphs also sometimes differ genetically. The most familiar example is the sexes in mammals. It is unclear if and how the extent to which morph determination relies on the presence of specific genes versus specific environments should affect the evolution of morph-biased gene expression. Comparisons of closely related species with environment- versus genotype-dependent sex or caste determination would be informative. Related to this is the extent to which genome evolution in species with genotypic morph determination parallels that of sex chromosome evolution, which not only affects morph-specific gene content but also the extent of antagonistic selection across the genome (Connallon and Clark, 2010).

Polymorphisms can be maintained by different forms of selection. In some cases the average fitness of morphs may be equal when averaged across contexts, for example due to frequency-dependent selection. In other cases one morph is adopted under poor conditions and is therefore maintained even when it exhibits consistently lower fitness on average. The selective dynamics affect the frequency of morphs within populations and hence the strength of selection on genes with morph-specific expression or function (Van Dyken and Wade, 2010). But it is also possible that different forms of selection create particular signatures in terms of sequence evolution, for example their tendency to maintain nucleotide polymorphism within populations (Nielsen, 2005). The outcome of these processes will also be affected by the age of polymorphisms. For example, we suggest that antagonistic selection may be more common in the early stages of morph evolution when resolution of antagonism through expression patterns has not had time to evolve yet, and neutral evolution of both sequence and expression pattern more common in highly canalized morphs.

Morphs are defined at the level of the individual organism. However, in social insects where the colony, or even the supercolony, can function as an individual organism (Queller and Strassmann, 2009), it is possible that morph-biased gene expression is not analogous to, for example, sex-specific gene expression, but more similar to the differential expression of genes among tissues. Some predictions for the rate of sequence evolution of tissue-specific and morph-specific genes compared to constitutively expressed genes are shared, but others differ (in particular when not all morphs are capable of reproduction). Analogously to genes that are over-expressed in one tissue, to understand the actual strength of selection on worker-biased genes it may be necessary to understand how this expression pattern contributes to the performance of the colony (i.e., the reproductive unit).

Morph-biased gene expression has a wide range of causes and consequences that are of interest to developmental and evolutionary biologists. In the rest of this commentary we focus on the intriguing observation that worker or queen biased genes in ants and social bees appear to evolve faster at the sequence level than do genes with no expression bias (Hunt et al., 2010, 2011; Feldmeyer et al., 2014). This is not just a social insect phenomenon. For example, it has repeatedly been shown in fruit flies and mice that both male and female biased genes evolve faster than unbiased genes (for recent studies see Meisel, 2011; Assis et al., 2012; Grath and Parsch, 2012; reviewed in Parsch and Ellegren, 2013). The same pattern has also been found with respect to sex-specific reproductive functions in *Arabidopsis* (Gossmann et al., 2014), tadpole feeding morphs in spadefoot toads (Leichty et al., 2012), horn polyphenisms in beetles (Snell-Rood et al., 2011), and dispersal morphs in pea aphids (Purandare et al., 2014).

There are a number of potential explanations for these patterns. To the extent that the morphs reflect different reproductive roles, as is the case for males and females, queens and workers, and dispersing sexuals and sedentary asexuals, faster sequence evolution of biased genes can partly be explained by faster evolution of reproductive genes (Meisel, 2011; Wright and Mank, 2013). Fast evolution of reproductive genes has been attributed to sexual selection, including sexual conflict and sperm competition, which is expected to increase the rate of sequence evolution (Swanson and Vacquier, 2002). Similar arguments should apply to the various reproductive conflicts in insect societies (Rice and Holland, 1997). Rapid evolution of reproduction-related genes may also be partly due to faster evolution of tissue specific genes when the observed sex biases arise from genes that are expressed in sex-specific reproductive tissues (Meisel, 2011). Furthermore, when interpreting results of transcriptomes of whole individuals, it needs to be kept in mind that they may reflect differences in the size or composition of tissues between morphs, rather than differences in gene expression at a cell level. More generally, studies in model organisms have established that a large number of additional factors could underlie a correlation between gene expression patterns and evolutionary rates, including expression breadth, overall expression levels, DNA methylation patterns, architecture of regulatory sequences, and potential for pleiotropy (Lemos et al., 2005a; Larracuente et al., 2008; Meisel, 2011; Park et al., 2012; Warnefors and Kaessmann, 2013). Teasing these explanations apart is a major challenge in emerging model organisms such as social insects. However, the correlation between morph biased expression pattern and fast sequence evolution rate remains significant even if many of these factors are controlled for statistically (e.g., Snell-Rood et al., 2011; Grath and Parsch, 2012; Warnefors and Kaessmann, 2013), suggesting that this relationship may be a fundamental feature of the evolution of genomes.

A recent study in fire ants suggested that caste-biased genes evolved faster at the sequence level even before they became morph-biased or, indeed, before the evolution of the castes (as shown by comparisons with the solitary, monomorphic wasp *Nasonia vitripennis*; Hunt et al., 2011). Interestingly, a similar pattern was found in toads where repeated evolution of polymorphism has taken place within a single genus (Leichty et al., 2012). A second interesting finding is that genes that vary in expression levels among morphs also seem to vary extensively in their expression levels within morphs. This is particularly well documented in fire ants (Hunt et al., 2013), and appears to occur also in polymorphic toads (Leichty et al., 2012), and sex biased genes in birds and fruit flies (Mank et al., 2007; Mank and Ellegren, 2009). As described below, both of these observations provide important pieces of evidence for addressing the role of neutral and selective explanations for associations between biased gene expression and rate of sequence evolution across the genome.

In this paper, we provide an overview of five different scenarios that predict a relationship between caste-biased gene expression and accelerated sequence evolution. We draw from both studies of sex biased gene expression in model organisms and from the diverse but less studied polymorphic organisms. We conclude that the association between caste-biased gene expression and rate of sequence evolution will be better understood if we address the contribution of selective and neutral processes for both inter-individual and phylogenetic divergence in gene expression. This requires both more detailed analyses of individual and context-dependent variation in gene expression, and establishing whether the strength of selection on gene sequences is a cause or a consequence of changing patterns of gene expression.

#### **ROUTES TO COUPLING OF MORPH-BIASED EXPRESSION AND RATE OF SEQUENCE EVOLUTION**

Contemporary patterns of variation in DNA sequence and gene expression partly reflect a mix of selective and stochastic events accumulating over evolutionary time (as well as current conditions experienced by the focal individuals). The relative importance of neutral and adaptive evolution for genome evolution is a contentious issue. DNA sequences diverge as a result of accumulation of changes that are neutral with respect to fitness or too weakly selected to be purged, but they also diverge because of repeated fixation of mutations due to selection. Similarly, divergence in gene regulation can represent both selection and drift. Here we ask how these processes can cause genes with caste-biased expression to exhibit evidence of accelerated sequence evolution.

#### **NEUTRALITY**

Assuming morphs are adaptive, at least some morph-bias in gene expression is a result of selection. However, in many species the number of genes that contribute to morph determination or maintenance of morph-specific phenotypes may be quite small. Thus, it is possible that a large proportion of variation in gene expression between morphs is a result of neutral evolution. Genes whose expression level is under weak selection are expected to be less precisely regulated due to accumulation of near-neutral regulatory mutations (Khaitovich et al., 2005, 2006). By chance, some of the accumulating regulatory mutations may result in morph-biased expression. Thus, morph specific expression patterns can arise through drift. This suggests that, in any given data set, morph-biased expression partly reflects a history of weak purifying selection on gene regulation. This can create a link between rate of sequence evolution and biased gene expression if genes that are under weak selection with respect to sequence are also under weak selection in terms of expression, and consequently more likely than constrained genes to have drifted toward a biased expression pattern. This is likely to often be the case given the observed correlations between gene essentiality and expression noise (Fraser et al., 2004), and expression divergence and sequence divergence among species (Lemos et al., 2005b; Zhang et al., 2007; Mcmanus et al., 2010).

This process should result in the pattern observed in *S. invicta* (Hunt et al., 2011), where genes that presently exhibit morph-biased expression evolved faster even before the evolution of morphs or before the morph biased expression pattern arose (**Figure 1**). Also, since it implies that regulation of expression is not under strong selection or constraint, we expect such genes to show substantial variation in their expression both between and within morphs, a pattern also shown in *S. invicta*. The temporal and phylogenetic patterns of sequence and expression evolution that would correspond to this scenario are shown in **Figure 1A**. Assessing neutrality of gene expression is compromised by the lack of a widely accepted neutral baseline (comparable to the comparison of synonymous and non-synonymous changes in sequence data), but theory is advancing fast in this area and several alternatives have recently been proposed (Gout et al., 2010; Warnefors and Eyre-Walker, 2012; Rohlfs et al., 2014).

#### *Relaxed selection due to expression bias*

Interpreting a correlation between expression bias and evolutionary rate in social insects is complicated by the fact that genes expressed in workers have only indirect effects mediated through the queen genotype (Linksvayer and Wade, 2009; Hall and Goodisman, 2012). The strength of selection on worker biased genes thus depends on the genetic similarity or kinship between the worker expressing a gene, and the queen that is reproducing in the nest. Relaxed selection may also occur in polymorphic species where all morphs reproduce. A very general model of polymorphic expression suggests that, all else being equal, expression bias itself may directly contribute to evolutionary rate (Snell-Rood et al., 2010; Van Dyken and Wade, 2010). This is because once expression of a gene falls below functional levels in some individuals (e.g., one of several morphs), that gene is under direct selection only in a subset of the population and hence under weaker selection than constitutively expressed genes. Thus, both directional and purifying selection become relaxed following morph-biased gene expression, which allows mildly deleterious alleles to accumulate at a higher rate than is the case for constitutively expressed genes. In one such scenario, a gene under weak selection may drift to non-detectable expression levels in one morph (assuming the strength of selection on sequence and expression are correlated as in **Figure 1A**), leading to further relaxation of selection and hence neutral sequence evolution (**Figure 1B**).
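The logic of relaxed selection under morph-limited expression can be illustrated numerically. The sketch below makes a deliberately simplified assumption that is not quoted from the cited models: selection on a gene is discounted by the fraction of individuals that express it, and the equilibrium frequency of a deleterious allele is approximated by the mutation rate divided by this discounted selection coefficient. Function names and parameter values are hypothetical, and relatedness effects for worker-expressed genes are ignored.

```python
def effective_selection(s, expressing_fraction):
    """Selection coefficient discounted by the fraction of individuals
    in which the gene is expressed (simplifying assumption of this sketch)."""
    if not 0 < expressing_fraction <= 1:
        raise ValueError("expressing_fraction must be in (0, 1]")
    return s * expressing_fraction


def equilibrium_frequency(mu, s, expressing_fraction=1.0):
    """Approximate mutation-selection balance for a deleterious allele,
    q = mu / (expressing_fraction * s), ignoring dominance and drift."""
    return mu / effective_selection(s, expressing_fraction)


# Hypothetical parameters: mutation rate and selection coefficient.
mu, s = 1e-6, 0.01
q_constitutive = equilibrium_frequency(mu, s)               # expressed in all individuals
q_morph_limited = equilibrium_frequency(mu, s, 0.1)         # expressed in 10% of individuals
print(q_constitutive, q_morph_limited, q_morph_limited / q_constitutive)
```

Under these assumptions a gene expressed in one tenth of individuals carries roughly ten times the equilibrium frequency of mildly deleterious alleles, which is the direction of effect described in the text.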

Although this is an attractive hypothesis, the extremely low number of genes that are morph-specific, rather than simply morph-biased, may suggest that few genes in fact are under relaxed selection. However, if we assume that there is a general correlation between expression level and strength of selection (Pál et al., 2001; Lemos et al., 2005a; Meisel, 2011), or that genes with a low expression level may be below a threshold value for being functional, the logic holds for genes with non-zero expression. Consistent with these predictions, genes expressed in rarer morphs of pea aphids appear to evolve faster due to relaxed purifying selection than genes biased toward the more common morphs (Purandare et al., 2014). Furthermore, the finding that both queen biased and worker biased genes evolve faster than unbiased genes (Hunt et al., 2010, 2011) suggests that fast evolution is not only due to reproductive genes (expressed in queens) evolving fast, which is broadly consistent with the general theory of relaxed selection. Nevertheless, widespread positive selection on worker biased genes in honeybees (Harpur et al., 2014) suggests that relaxed selection is not necessarily a major force limiting adaptive evolution of genes with worker biased expression.

**FIGURE 1 | [...] expression and sequence evolution for the five scenarios outlined in the text.** The left hand column shows the expected relative variation in expression within morphs and hypothetical changes in morph biased gene expression over time. White bars for morph 1 and gray bars for morph 2. The second column shows how average evolutionary rates are predicted to change over time (p, purifying; n, neutral; d, directional), and the third column shows how the patterns would be seen in a phylogeny in a group where some species show biased expression (B) for the gene in question, whereas others do not (U). Solid line, neutral rate; hatched line, purifying selection; dotted line, directional selection.

This relaxation of selection due to gene copies in non-expressing or non-reproducing individuals being invisible to selection should apply to all genes with extreme expression bias and not only those that drift to this situation. For genes that are under positive selection before morph bias evolves, evolution could slow down and approach neutrality due to morph bias, whereas genes historically under purifying selection could start accumulating mutations and shift toward neutrality due to weakened purifying selection with morph bias. As a result, many different patterns of historical signatures of sequence evolution are possible.

#### **SELECTION**

Consistent stabilizing (purifying) or directional (positive) selection can generate both slower and faster rates of sequence evolution compared to the neutral expectation. Harmful mutations in genes whose sequence is essential for organismal function are rapidly purged, resulting in slow rates of evolutionary change. On the other hand, mutations in genes that cause functional changes to phenotypes can be consistently and repeatedly selected if conditions change, such as the case in evolutionary "arms races." Thus, if these patterns of selection covary with expression patterns, it could contribute to the observed relationship between caste-biased gene expression and the rate of gene sequence divergence.

#### *Co-option of neutral genes to morph specific function*

Genes with high rate of sequence evolution due to weak purifying selection may not only drift toward morph-biased expression, as described above, but also be more likely to become co-opted for morph specific functions. This is because weak selection enables the accumulation of genetic variation that can become functional in novel contexts (True and Carroll, 2002). Co-option can potentially occur during morph evolution. Alternatively, genes may become morph biased after the evolution of morphs even if they did not play a role in their original divergence. This has potential implications for the evolution of both the sequence and regulation of those genes. Both positive and purifying selection following co-option are possible and can make the rate of sequence evolution change from near-neutral toward faster or slower or, comparing site by site, increase both the proportions of sites under positive and purifying selection, respectively (**Figure 1C**).

In this scenario, following evolutionary rates of gene sequences over time should reveal a change from expectations of neutrality toward signatures of selection. Consequently, co-opted genes should contribute to the observed correlation of expression bias and fast sequence evolution only through those genes that became positively selected following co-option, as only these genes continue to evolve fast. At the level of expression, co-option of historically "near-neutral" gene sequences should result in further selection for precise gene regulation and hence a reduction in expression noise over evolutionary time in lineages with morphs (**Figure 1C**). The role of co-option in the evolution of morph biased gene expression has not been directly studied in social insects, and doing so requires more information on the extent to which morph-biased genes also have morph-biased fitness effects. For example, studies showing that weak selection on sequence precedes morph biased expression (Hunt et al., 2011; Leichty et al., 2012) have not demonstrated that the subsequent expression bias reflects morph specific function rather than continued weak selection. In contrast, positive selection of worker biased genes in honeybees is also consistent with a co-option scenario, but historical data on evolutionary rates before caste biased expression evolved are lacking (Harpur et al., 2014). Outside social insects it has been observed that up-regulated expression in one morph is linked to higher fitness effects in that morph (Connallon and Clark, 2011; Hall and Goodisman, 2012), but it is unknown whether the fitness effects caused selection for biased expression, or if biased expression arose first followed by compensatory changes to maintain morph fitness.

#### *Evolution under weak pleiotropic constraint*

Genes typically have multiple functional targets, which may constrain their evolution. Functional constraint contributes to the overall strength of selection on sequence and expression and is therefore implicit in much of what has already been discussed. However, the literature also emphasizes a more constructive role of weak pleiotropy where it directly causes particular fast evolving genes to become morph biased. Genes that are expressed in a context or tissue specific manner (Duret and Mouchiroud, 2000; Zhang and Li, 2004) are likely to have a low number of interactions with other gene products (e.g., Assis et al., 2012) and therefore be free to evolve under directional selection. Furthermore, it has been suggested that genes that have a regulatory architecture that allows precise regulation, which also decreases pleiotropic constraint, are more likely to exhibit context sensitive expression patterns (Grishkevich and Yanai, 2013), such as morph biased expression. While the correlation of pleiotropic constraint and expression pattern has not yet been tested in social insects, it is supported by sex-specific gene expression patterns in mice, chicken and fruit flies, where sex biased genes appear to exhibit weak pleiotropic constraints (Mank et al., 2007; Meisel, 2011; see also below). If precisely regulated genes, with potentially low pleiotropic constraint, are more likely to evolve morph biased expression patterns, this should create a consistent positive correlation between morph biased expression and high rate of sequence evolution, caused by positive selection, over evolutionary time (**Figure 1D**).

Fast sequence evolution due to positive selection has been shown to occur in worker biased genes in the honeybee (Harpur et al., 2014), in caste biased genes in seven ant genomes (Roux et al., 2014) and male biased genes in *Drosophila* (reviewed in Wright and Mank, 2013). However, additional data are necessary before this can be taken as support for rapid evolution due to relaxation from pleiotropic constraints. If fast evolving genes with expression bias are indeed only weakly constrained by pleiotropy, we expect their evolutionary rate to be high already before the biased expression pattern evolved. Furthermore, under highly precise regulation we expect relatively low expression variation among individuals within morphs. Consistent directional selection is thought to be rare, but may be particularly likely for genes involved in reproduction, immunity and social and reproductive conflicts (Swanson and Vacquier, 2002; Summers and Crespi, 2005; Obbard et al., 2009). Because many morphs, notably sex and caste, have different reproductive functions, the morph biased genes that evolve fast under directional selection may largely be composed of reproductive and conflict related genes. This could also explain why such genes evolve fast before they become morph biased (or before morphs evolve). In social insects such genes may be found among genes involved in recognition and responses to hormones, as suggested by Roux et al. (2014). Outside social insects, genes with a conserved male biased expression, likely to be involved in reproductive function and often expressed in sex specific tissues, have been shown to evolve faster than other sex biased genes in *Drosophila* (Grath and Parsch, 2012). The often narrow tissue wide expression profiles of sex biased genes may also support that genes without pleiotropic effects are more likely to become morph biased, but without temporal data it is difficult to tease apart what is cause and consequence for this association.

#### *Morph antagonistic selection*

One source of antagonistic selection is when an allele has beneficial effects on one morph but negative effects in another morph. This form of antagonistic selection has been discussed frequently with respect to sex biased gene expression (Rice and Chippindale, 2001; Morrow et al., 2008; Innocenti and Morrow, 2010), but here we emphasize that the same logic applies to any polymorphism, including social insect castes (see Hall et al., 2013; Holman, 2014, for specific models on caste antagonistic selection). If an allele has opposite fitness effects in two or more morphs, selection should favor suppression of expression in the morph(s) where it has negative consequences (Rice and Chippindale, 2001). Following the evolution of suppression of gene expression, antagonistic pleiotropy is relaxed, which enables genes to respond to directional selection and exhibit fast sequence evolution (Gadagkar, 1997). This follows the general logic described above, but under this scenario the changes in expression pattern and evolutionary rates are predicted to occur concurrently, i.e., sequence evolution accelerates when biased expression evolves (**Figure 1E**). Also, genes under positive selection should be highly regulated (Fraser et al., 2004; Wang and Zhang, 2011), and thus vary relatively little in their expression pattern within each morph. Many sex biased genes do evolve under positive selection (reviewed in Wright and Mank, 2013), and positive selection in worker biased genes has recently been demonstrated in the honeybee (Harpur et al., 2014). The expression history of these genes, and the timing of possible changes in evolutionary rates, is largely uncharted. Support for the theory would require data showing that these genes began to evolve under positive selection following the evolution of morph-biased expression.

Actual mapping of loci with antagonistic fitness effects is currently out of reach for any social insect system, but studies have recently been conducted in *Drosophila* (Innocenti and Morrow, 2010; Parsch and Ellegren, 2013). Only a minor proportion of genes with sex biased expression showed sexually antagonistic fitness effects in a hemiclonal analysis (Innocenti and Morrow, 2010), which suggests that ongoing selection for suppression is not a major explanation for biased gene expression. However, there are several reasons why some of the antagonistic fitness effects may go undetected in such coarse scale analyses (outlined in Parsch and Ellegren, 2013), and the contemporary lack of antagonistic fitness variation in sex biased genes could be a signal of a resolved ancestral conflict (Innocenti and Morrow, 2010). Because a considerable proportion of unbiased genes appear to have antagonistic fitness effects (Innocenti and Morrow, 2010), it is also possible that constraints such as intersexual genetic correlation may limit an evolutionary response to sexual antagonism in terms of biased expression. Alternatively, alleles at unbiased loci that show antagonistic fitness effects may have arisen so recently that the resulting conflict has not yet been resolved through morph-biased suppression of expression.

#### *Summary of scenarios*

It is important to keep in mind that even when the scenarios make mutually exclusive predictions (summarized in **Figure 1**), they still represent processes that can co-occur and overlap. For example, if antagonistic fitness drives the system to morph-specific expression, this leads to relaxation of selection as well. Similarly, co-option and relaxed selection can be seen as alternative interpretations of a similar process that on the one hand allows exploration of the phenotypic space and on the other hand may lead to accumulation of slightly harmful mutations and "polymorphism load." Finally, although weak pleiotropic constraint can be viewed as the principal reason for why some genes become morph biased in their expression, it is also an important determinant of evolutionary rates in other scenarios simply by dictating the overall selection on both gene sequence and expression.

#### **DISCUSSION**

Recent research provides partial support for several of the scenarios described above, but the data on social insects are still very limited and it is unknown how much can be generalized from other polymorphic systems (**Box 1**). It is likely that several processes contribute to some extent. Thus, the empirical task is assessing the relative contribution of the different processes rather than forcing a single explanation to any given pattern. We suggest that doing so relies on two critical types of data—gene expression variation between and within species—both of which are limited in published studies to date.

First, phylogenetic mapping of the rates of sequence evolution and patterns of gene expression (**Figure 1**) is necessary for revealing the temporal order of changes in gene expression and sequence divergence. To date, comparisons are typically weak in terms of phylogenetic rigor (e.g., making use of two-species comparisons), and availability of a number of relevant contrasts is clearly a major challenge for future work. With more species, mapping rates of sequence divergence on a phylogenetic tree can determine whether fast-evolving genes are more likely to show morph-biased expression than slow-evolving genes. In cases where ancestral monomorphic populations are extant, such comparisons can be carried out by comparing evolutionary rates in lineages with and without morphs (e.g., Leichty et al., 2012), although in many cases replication is limited by the number of independent origins of morphs. For the advanced eusocial Hymenoptera, the number of independent origins of morphs is a serious limiting factor, but social taxa that comprise multiple origins of sociality, such as halictid bees or snapping shrimps, could be fruitful model systems for replicated studies. For cases where the number of independent replicates is limiting, or where monomorphic outgroups are unavailable, the phylogenetic reconstruction of expression patterns has to be carried out gene-by-gene (see e.g., Grath and Parsch, 2012). Making the comparisons at a relevant phylogenetic scale can reveal both the effects of idiosyncratic features of specific polymorphic taxa, and possible convergent features shared across independent evolutionary origins of polymorphisms.

Furthermore, separating weak purifying selection from positive selection as causes of fast sequence change places demands on sequence data. Simple summary statistics such as the average *dn/ds* ratios per gene can reveal interesting patterns of average evolutionary rates but are unlikely to capture the complexity of the process. McDonald-Kreitman type tests are a more powerful and suitable method for detecting genes under positive selection (see e.g., Harpur et al., 2014 for a recent example) when a small number of taxa are analyzed. Furthermore, given that some of the scenarios predict concurrent changes in the strength of both positive and purifying selection, investigating site specific signatures of selection, e.g., using maximum likelihood methods (Yang, 2007) in larger phylogenetic data sets, might be necessary for thoroughly teasing apart the contributions of all the different processes (Nielsen, 2005).
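As a rough illustration of what a McDonald-Kreitman type test computes, the sketch below compares non-synonymous and synonymous counts among fixed differences (`dn`, `ds`) and polymorphisms (`pn`, `ps`) for a single gene. It is a minimal sketch, not the pipeline of any cited study: the function names are hypothetical, the counts in the usage example are invented for illustration, and the two-sided Fisher's exact test is implemented directly with the standard library.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(dn, ds, pn, ps):
    """Return (neutrality index, alpha, p-value) for one gene.

    dn, ds : non-synonymous / synonymous fixed differences between species
    pn, ps : non-synonymous / synonymous polymorphisms within a species
    """
    if min(dn, ds, pn, ps) <= 0:
        raise ValueError("all four counts must be positive for this sketch")
    ni = (pn / ps) / (dn / ds)  # neutrality index
    alpha = 1.0 - ni            # rough estimate of the adaptive fraction
    return ni, alpha, fisher_exact_two_sided(dn, ds, pn, ps)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    ni, alpha, p = mk_test(dn=7, ds=17, pn=2, ps=42)
    print(f"NI={ni:.3f} alpha={alpha:.3f} p={p:.4f}")
```

A neutrality index below 1 indicates an excess of non-synonymous divergence, as expected under positive selection, whereas a value above 1 indicates an excess of non-synonymous polymorphism, as expected when slightly deleterious variants segregate under weak purifying selection.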

Second, we suggest that it will be necessary to establish the patterns of variation among individuals within morphs for teasing apart adaptive and non-adaptive scenarios. This is because the processes that drive the rate of sequence divergence phylogenetically are also expected to generate different patterns of variation in gene expression among individuals within populations and species. Large expression variation among individuals in morph-biased genes would support the idea that genes become morph-biased because they are under relatively weak selection. In contrast, if genes evolve fast as a result of directional selection, this should be associated with precise gene regulation and hence biased genes should exhibit low expression variation within and between individuals of a given morph. Individual-level data on gene expression therefore provide one potential source of information that can help to evaluate the reasons for biased expression, which also sets demands for replication and careful study design for future studies. Given the large size of many social insects, replication at an individual (see e.g., Morandin et al., 2014) and tissue level (Johnson et al., 2013) should be feasible.
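One simple way to summarize expression variation among individuals within morphs is the coefficient of variation (standard deviation divided by the mean) computed separately for each morph. The sketch below is an illustration only: the function name, the morph labels, and the expression values are hypothetical, and the coefficient of variation is one of several possible noise measures rather than the statistic used in the cited studies.

```python
from statistics import mean, stdev


def within_morph_cv(expression_by_morph):
    """Return {morph: coefficient of variation} for a single gene.

    expression_by_morph maps a morph label to the expression values
    (e.g., normalized read counts) of replicate individuals.
    """
    cvs = {}
    for morph, values in expression_by_morph.items():
        if len(values) < 2:
            raise ValueError(f"need at least two individuals for {morph}")
        m = mean(values)
        if m <= 0:
            raise ValueError(f"mean expression must be positive for {morph}")
        cvs[morph] = stdev(values) / m  # sample standard deviation / mean
    return cvs


# Hypothetical genes: both are queen-biased, but they differ in noise.
tightly_regulated = {"queen": [98.0, 102.0, 100.0, 101.0],
                     "worker": [19.0, 21.0, 20.0, 20.5]}
noisy = {"queen": [40.0, 160.0, 95.0, 110.0],
         "worker": [5.0, 45.0, 12.0, 18.0]}

if __name__ == "__main__":
    print("tightly regulated:", within_morph_cv(tightly_regulated))
    print("noisy:", within_morph_cv(noisy))
```

Both hypothetical genes show a similar mean bias toward queens, so only the within-morph variation separates them. A high coefficient of variation is only suggestive of weak regulation if sampling context (age, tissue, season, social environment) was controlled.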

Unfortunately, interpretation of gene expression variation is difficult. On the one hand, variation may represent lack of precise regulation, which causes noisy expression (Fraser et al., 2004). On the other hand, gene expression data may show substantial variation simply because of variable external or internal states not controlled during data collection (**Figure 2**). While it has been shown that expression is inherently noisier in non-essential genes in model organisms such as yeast, interpreting patterns of expression variability that underlie complex phenotypes is far from straightforward given the large numbers of genes that exhibit context-dependence that is unrelated to morph-specific function. For example, the proportions of genes that are differently expressed across life stages (Ometto et al., 2011; Perry et al., 2014), social environments (Manfredini et al., 2013), and genotypes (Nipitwattanaphon et al., 2013) are sometimes comparable in magnitude to morph-biased proportions. It has also been shown in studies focusing on single genes, such as vitellogenin, that caste bias is sensitive to seasonal and contextual variation (Azevedo et al., 2011; Libbrecht et al., 2013; Morandin et al., 2014). Furthermore, factors such as individual condition in *Drosophila* (Wyman et al., 2010), behavior in zebrafish (Rey et al., 2013), presence of social and sexual stimuli in swordtails (Cummings et al., 2008), and abiotic environmental conditions (Yampolsky et al., 2012) have been demonstrated to co-vary with expression patterns. These results suggest that without proper replication it cannot be assumed that all observed variation within morphs is stochastic and a sign of weak regulation. 
Also, the observation that morph bias varies extensively between life stages (Ometto et al., 2011) and tissues (Mank et al., 2008) suggests that the more we understand the causes of variation in expression patterns, the fewer genes will be consistently classified as morph biased (Meisel, 2011).

Importantly, assessing any adaptive scenario for gene expression variation is only possible when compared against a suitable neutral expectation. While the neutral evolution of morph biased expression patterns has been directly assessed in only a few cases, studies of selection acting on gene expression patterns in general may shed some light on this issue. There are several recently suggested neutral scenarios in the literature (Gout et al., 2010; Warnefors and Eyre-Walker, 2012; Smith et al., 2013; Rohlfs et al., 2014) but empirical studies that address neutral expectations have focused on species divergence in gene expression and not morph-biased expression. It has been suggested that the factors that cause gene expression to diverge among species (e.g., non-essentiality) also expose genes to evolve context specific expression patterns (Grishkevich and Yanai, 2013). Whether general conclusions about selection on gene expression also apply to caste specific patterns remains an open question, but we suggest that they may very well do. This is supported by the finding of enriched signatures of adaptive regulatory evolution in genes underlying worker behavioral plasticity in honeybees (Harpur et al., 2014), and the extensive diversification of regulatory elements in social insects in general (Simola et al., 2013). Overall, the evidence for selection on gene expression is mixed, but a prevalence of stabilizing selection has been suggested (Gilad et al., 2006; Khaitovich et al., 2006; Warnefors and Eyre-Walker, 2012). 
In contrast, the relatively large turnover in the set of morph biased genes [caste biased genes between two species of *Polistes* paper wasps (Ferreira et al., 2013), between two species of *Cryptotermes* termites (Weil et al., 2009), sex biased genes among species of *Drosophila* (Metta et al., 2006; Zhang et al., 2007; Jiang and Machado, 2009; Assis et al., 2012) and between zebra finch *Taeniopygia guttata* and common whitethroat *Sylvia communis* (Naurin et al., 2011)] supports the view that neutral processes play a large role, implying that genes acquire or lose morph biased expression largely due to drift. This is consistent with studies comparing a small number of genes in closely related species that have shown that caste biases may be evolutionarily labile (Weil et al., 2009; Morandin et al., 2014).

**FIGURE 2 |** [...] past or current social experience) (white boxes) will show a morph [...] un-biased but highly variable (gray boxes).

#### **CONCLUSIONS**

Recent data suggest a relationship between the rate of sequence evolution and morph-biased gene expression in social insects and other polymorphic taxa, but its causes remain poorly understood. Morph-biased genes can evolve faster for several reasons. We suggest that the majority of morph-biased genes are under relatively weak selection, which can also explain why those genes evolve faster before the evolution of morphs. This suggests that adaptive scenarios should be treated with caution unless further supporting evidence can be provided. However, we also suggest that genes that ancestrally have been under weak selection, and therefore show high accumulation of mutations, may be co-opted in morph evolution and hence continue to evolve fast because of directional selection. Alternatively, co-option can lead to a reduction in the rate of evolution because of purifying selection following the onset of morph-biased expression. There are therefore several different possible genomic signatures of the evolution of morphs. Distinguishing between adaptive and (near-)neutral scenarios for the coupling of the rate of sequence evolution and morph-biased expression will require data to be replicated in several dimensions (individuals, contexts, morphs, species) at a level that is only now beginning to be possible in any taxon, including social insects. Many of the reported correlations to date are weak, and the patterns are likely to be refined by carefully assessing different functional classes of genes, more detailed studies of tissue specific expression, and studies that directly assess the evolution of gene regulation.

#### **ACKNOWLEDGMENTS**

The authors thank Tim Linksvayer, Jonna Kulmuni, and the two reviewers for comments. Heikki Helanterä is funded by the Academy of Finland (135990) and the Centre of Excellence in Biological Interactions. Tobias Uller is funded by the Royal Society of London and the Knut and Alice Wallenberg Foundations.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 May 2014; accepted: 08 August 2014; published online: 29 August 2014. Citation: Helanterä H and Uller T (2014) Neutral and adaptive explanations for an association between caste-biased gene expression and rate of sequence evolution. Front. Genet. 5:297. doi: 10.3389/fgene.2014.00297*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Helanterä and Uller. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Developmental regulation of ecdysone receptor (*EcR*) and EcR-controlled gene expression during pharate-adult development of honeybees (*Apis mellifera*)

#### *Tathyana R. P. Mello<sup>1</sup>, Aline C. Aleixo<sup>1</sup>, Daniel G. Pinheiro<sup>2</sup>, Francis M. F. Nunes<sup>3</sup>, Márcia M. G. Bitondi<sup>4</sup>, Klaus Hartfelder<sup>5</sup>, Angel R. Barchuk<sup>6</sup>\* and Zilá L. P. Simões<sup>4</sup>*

*<sup>1</sup> Departamento de Genética, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil*

*<sup>2</sup> Faculdade de Ciências Agrárias e Veterinárias, Universidade Estadual Paulista, São Paulo, Brazil*

*<sup>3</sup> Departamento de Genética e Evolução, Centro de Ciências Biológicas e da Saúde, Universidade Federal de São Carlos, São Carlos, Brazil*

*<sup>4</sup> Departamento de Biologia, Faculdade de Filosofia, Ciências e Letras de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil*

*<sup>5</sup> Departamento de Biologia Celular, Molecular e de Bioagentes Patogênicos, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, São Paulo, Brazil*

*<sup>6</sup> Laboratório de Biologia Animal Integrativa, Departamento de Biologia Celular, Tecidual e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Alfenas, Brazil*

#### *Edited by:*

*Greg J. Hunt, Purdue University, USA*

#### *Reviewed by:*

*Xavier Belles, Institute of Evolutionary Biology (CSIC-UPF), Spain; Joanna Kelley, Stanford University, USA*

#### *\*Correspondence:*

*Angel R. Barchuk, Departamento de Biologia Celular, Tecidual e do Desenvolvimento, Instituto de Ciências Biomédicas, Universidade Federal de Alfenas, Rua Gabriel Monteiro Silva, 700, CEP 37130-000 Campus Sede, Alfenas, Brazil e-mail: barchuk@unifal-mg.edu.br; arbarchuk@yahoo.com*

Major developmental transitions in multicellular organisms are driven by steroid hormones. In insects, these, together with juvenile hormone (JH), control development, metamorphosis, reproduction and aging, and are also suggested to play an important role in caste differentiation of social insects. Here, we aimed to determine how *EcR* transcription and ecdysteroid titers are related during honeybee postembryonic development and what may actually be the role of EcR in caste development of this social insect. In addition, we expected that knocking down *EcR* gene expression would give us information on the participation of the respective protein in regulating downstream targets of EcR. We found that in *Apis mellifera* females, EcR-A is the predominantly expressed variant in postembryonic development, while EcR-B transcript levels are higher in embryos, indicating an early developmental switch in EcR function. During larval and pupal stages, EcR-B expression levels are very low, while EcR-A transcripts are more variable and abundant in workers compared to queens. Strikingly, these transcript levels are opposite to the ecdysteroid titer profile. 20-hydroxyecdysone (20E) application experiments revealed that low 20E levels induce *EcR* expression during development, whereas high ecdysteroid titers seem to be repressive. By means of RNAi-mediated knockdown (KD) of both EcR transcript variants we detected the differential expression of 234 poly-A+ transcripts encoding genes such as CYPs, MRJPs and certain hormone response genes (*Kr-h1* and *ftz-f1*). EcR-KD also promoted the differential expression of 70 miRNAs, including highly conserved ones (e.g., miR-133 and miR-375), as well as honeybee-specific ones (e.g., miR-3745 and miR-3761). Our results provide evidence for a broad spectrum of EcR-controlled gene expression during postembryonic development of honeybees, revealing new facets of EcR biology in this social insect.

**Keywords: honey bee, adult development, 20E, ecdysteroid, juvenile hormone, JH, RNAi, miRNA**

#### **INTRODUCTION**

Most multicellular organisms go through developmental transitions that enable them to cope with environmental changes and/or broaden their niche possibilities. Such transitions are generally timed and synchronized by morphogenetic hormones in a broad range of species, including insects, amphibians, metamorphic fish, tunicates, echinoderms, and plants. In insects, developmental transitions, such as larval and metamorphic molts, are driven by steroid hormones (ecdysteroids) acting in conjunction with juvenile hormone (JH). These hormones also control reproduction and aging (Flatt et al., 2005; Gáliková et al., 2011), and, in social insects, play important roles in caste polyphenism (Hartfelder and Emlen, 2012).

The steroid hormone ecdysone is produced by the prothoracic glands. After secretion, it is transported via the hemolymph to its target organs. Due to its lipophilic nature it passes directly into the cytoplasm of target and/or modification center cells (Iga and Kataoka, 2012; Ono, 2014), where it can be modified to 20-hydroxyecdysone (20E) by a 20-monooxygenase encoded by the *shade* gene, a member of the cytochrome P450 family (CYP314a1) known as Halloween (Petryk et al., 2003). The mode of action of JH, which is a sesquiterpenoid morphogenetic molecule, has only recently become clear (for review see Bellés and Santos, 2014), both in terms of its receptor and downstream cascade, as well as its molecular interaction with ecdysteroids. Produced by the *corpora allata* in the retrocerebral complex, JH relies on binding proteins for its transport in the hemolymph to target cells. There, it first binds to its intracellular receptor, the Methoprene-tolerant (Met) protein, which then forms a complex with Taiman (Charles et al., 2011). This dimeric hormone-receptor complex then regulates the expression of target genes.

Knowledge of the mechanism of action of insect ecdysteroids began with the early work of Clever and Karlson (for a historical review see Bellés and Santos, 2014) and eventually led to the so-called Ashburner model (Ashburner et al., 1974). This model describes the action of ecdysone in general terms, based on its participation in the regulation of gene expression (puffing) in the polytene salivary gland chromosomes during *Drosophila melanogaster* molting and metamorphosis. Briefly, the model states that ecdysone associates with an intracellular receptor protein to activate early genes encoding transcription factors, which, on the one hand, activate late genes and, on the other, inhibit the transcription of previously activated early genes. The receptor protein and certain other members of this cascade belong to a large family of proteins, the nuclear hormone receptors (NR, see Fahrbach et al., 2012). NR proteins generally comprise four independent but functionally interacting domains. A/B is a highly variable domain that may contain a motif (AF-1) driving ligand-independent transcription. The second, the C domain, is a DNA-binding domain (DBD), the most conserved region of NRs. The D or hinge domain provides a link between the DBD and the next domain, the ligand-binding domain (LBD), a multifunctional domain that mediates ligand binding, dimerization, interaction with heat shock proteins, nuclear localization, and transactivation. Functional NRs form homodimers and/or heterodimers that recognize specific DNA sequences. In the absence of a ligand molecule they act as repressors, keeping target genes inhibited through co-repressor complexes. In the presence of hormone they activate target genes by recruiting co-activator proteins and displacing co-repressors (Hill et al., 2013; Yamanaka et al., 2013; Evans and Mangelsdorf, 2014).

The functional ecdysone receptor is a heterodimeric NR formed by the *Ecdysone Receptor* (*EcR*) and the *ultraspiracle* (*usp*) gene products (for a comprehensive review, see Hill et al., 2013). USP is an ortholog of the vertebrate retinoid-X receptor (RXR) (Yao et al., 1992) and is most commonly considered a kind of orphan NR. Though its ligand is not known, its participation as a mediator of JH action has been postulated, probably through a direct binding of JH (Barchuk et al., 2004). Furthermore, the EcR/USP complex can also bind to the *let-7-C* gene after a 20E pulse has triggered the larval-to-pupal metamorphic molt, thus inducing the transcription of a cluster of three microRNAs (miR-100, let-7 and miR-125). These then post-transcriptionally regulate the expression of genes involved in neuromuscular morphogenesis, leading to adult body characteristics (Chawla and Sokol, 2012; see also Rubio and Bellés, 2013). The EcR protein can also *per se* regulate the expression of target genes (Davis and Li, 2013), thus adding extra levels of complexity to the mechanisms and gene regulatory networks involving hormone/transcription factor activities.

In insects showing caste polyphenism, there is evidence that ecdysteroids are important players in caste differentiation, not only during postembryonic, but possibly even during embryonic development (Schwander et al., 2008). The role of ecdysteroids in caste development and regulation of adult reproduction is currently best understood in bees, especially in the bumblebee *Bombus terrestris* (Geva et al., 2005) and in the honeybee *Apis mellifera* (Hartfelder and Engels, 1998), where they participate in the regulation of the differential morphogenesis programs by interacting with JH and possibly other mediating environmental modulators.

Receptor proteins mediating ecdysteroid action in social insects have been studied mainly in the honeybee (The Honey Bee Genome Sequencing Consortium, 2006; Velarde et al., 2006), where USP and EcR cDNAs have been cloned (Barchuk et al., 2004; Takeuchi et al., 2007), and the expression profiles of the respective genes were determined in several organs, tissues, and conditions (Barchuk et al., 2004, 2008; Takeuchi et al., 2007; Velarde et al., 2009). However, despite all these studies, several responses to differential hormone signaling in honeybee caste development are still poorly understood (Barchuk et al., 2007). For instance, ecdysteroid titers in developing females are higher in queens during the second half of the last larval instar (Rachinsky et al., 1990) and differ in their profiles during pupal and pharate-adult development of queens and workers (Pinto et al., 2002). These hormone titer differences are associated with the differential development of specific structures (e.g., brain and ovary, Barchuk et al., 2007) and also the onset of vitellogenin synthesis (Barchuk et al., 2002), but this is essentially correlative information lacking functional support. Herein we aimed at determining the extent to which *EcR* transcription follows ecdysteroid titers during honeybee postembryonic development and can actually mediate the action of molecular determinants of caste development in honeybees. Moreover, we expected that knocking down *EcR* gene expression during pharate-adult development would bring to light new downstream targets of EcR.

#### **MATERIALS AND METHODS**

#### **BEES**

Embryos and the successive developmental phases in the larval and pupal stages, as well as newly emerged adults, were obtained from *A. mellifera* colonies (Africanized hybrids) maintained at the Experimental Apiary of the University of São Paulo at Ribeirão Preto, Brazil. The developmental phases of workers and queens (**Table 1**) were identified according to Rembold (1987) and Michelette and Soares (1993). Immediately after sampling, the bees were immersed in TRIzol reagent (Life Technologies) and frozen at −80 °C until RNA extraction.

#### **NORTHERN BLOT ANALYSIS**

Approximately 15 μg of total RNA extracted from queens and workers at the PP1 and Pb developmental stages were subjected to electrophoresis in a denaturing 1.5% agarose/formaldehyde gel, and the RNA was then transferred to a PVDF (polyvinylidene fluoride, GE) membrane using a VacuGene XL Vacuum Blotting system (GE Healthcare). An EcR cDNA fragment of 160 bp encoding the 3′ part of the DNA-binding domain was used for probe synthesis by means of the Random Primers DNA Labeling System (Life Technologies) and Redivue 32P-nucleotides (Amersham). After 3 h of hybridization at 42 °C, the membranes were washed for 20 min with 0.1 × SSC solution containing 0.1% SDS and then exposed to a Super Sensitive ST film, and the bands were revealed with a Cyclone™ Storage Phosphor System (PerkinElmer).

#### **HORMONE TREATMENTS**

For the analysis of the *EcR* expression response to artificially augmented levels of hormones, workers at the brown-eyed pupal phase (Pb) were removed from the brood frames and maintained in an incubator at 34 °C and 80% relative humidity. For the ecdysone response, three groups of 3–7 workers were injected with 5 μg of 20-hydroxyecdysone (20E; Sigma) dissolved in 2 μL Ringer saline containing 12.5% ethanol. For the JH response, a similar number of Pb-phase workers received a topical application of 10 μg JH-III (Fluka) dissolved in 2 μL acetone. Controls received 2 μL of the respective solvents. The amounts of applied hormone were based on previous experiments in which we had examined their effects on inducing gene expression during the pupal stage (Barchuk et al., 2002, 2004). RNA was isolated from fat bodies after 1, 12, and 24 h (independent experiments). Fat bodies were obtained via a longitudinal incision in isolated abdomens, which were then kept under gentle agitation in Petri dishes containing 0.9% NaCl. The resultant suspension of dispersed fat body cells was centrifuged for 1 min at 2500 × g and the pellet was transferred into TRIzol reagent and frozen at −80 °C until RNA extraction. We used fat bodies because this allowed us to specifically assay this metabolically important organ, especially with regard to vitellogenin (*vg*) gene expression in honeybees.

#### **RNA EXTRACTION, REVERSE TRANSCRIPTION AND QUANTITATIVE PCR ASSAYS**

Total RNA was isolated using TRIzol (Life Technologies), following the manufacturer's protocol, and purified by column purification (RNeasy Mini Kit, QIAGEN), as described previously (Barchuk et al., 2004, 2007). For the quantification of mRNA levels (except those validating the RNA-Seq data), first strand cDNA was synthesized by reverse transcription from 2μg of RNA with SuperScript II Reverse Transcriptase (Life Technologies) and an oligo(dT)12–18 primer (Life Technologies). For the validation of the RNA-Seq libraries, cDNA was synthesized using NCode™ miRNA First-Strand cDNA Synthesis and qRT-PCR (Invitrogen) kits and their instructions, adding a DNase (Promega) treatment step.

Comparative analyses of transcript levels were performed by real-time quantitative PCR (qPCR) using a 7500 Real-Time PCR System (Applied Biosystems) or a StepOne Plus system (Applied Biosystems). Amplifications were carried out in 20 μL reaction mixtures, each containing 10 μL of SYBR® Green Master Mix 2× (Applied Biosystems), 0.8 μL of a 10 mM stock solution of each of the gene-specific forward and reverse primers (Table S1), and 1 μL of first-strand cDNA diluted 1:4 (or 1:10, for cDNA samples used to validate RNA-Seq data) in ultrapure water. For miRNA amplification, the sequences of the forward primers were identical to the mature miRNA sequences available at miRBase, but with U replaced by T, while the reverse Universal qPCR primer is supplied with the NCode kit. Reaction conditions were 50 °C for 2 min, 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 s and 60 °C for 1 min (or 33 s for miRNA amplification). Three biological replicates were run in three technical replicates each. *Actin related protein* 1 (*Arp1*, GenBank accession number NM\_001185145.1), *rpl32* (accession number NM\_001011587.1), or a *U5 snRNA* gene were used as reference genes (for confirmation, cDNAs for all three reference genes were partially sub-cloned and sequenced in our laboratory). Relative quantities of transcripts were calculated using the comparative Ct method (Applied Biosystems, User Bulletin #2). Statistical analyses were carried out with Statistica version 7.0 (http://statistica.software.informer.com/).
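The comparative Ct calculation mentioned above can be sketched as follows. This is a minimal illustration that assumes an amplification efficiency close to 100% for both the target and the reference gene; the function name and the Ct values are hypothetical and are not data from this study.

```python
def relative_quantity(ct_target_sample, ct_ref_sample,
                      ct_target_calibrator, ct_ref_calibrator):
    """Relative transcript quantity by the comparative Ct (2^-ddCt) method.

    Assumes ~100% PCR efficiency for target and reference genes.
    """
    # Normalize the target gene to the reference gene in each condition.
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    # Compare the sample with the calibrator (e.g., the control group).
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical mean Ct values, for illustration only.
rq = relative_quantity(ct_target_sample=26.0, ct_ref_sample=18.0,
                       ct_target_calibrator=24.0, ct_ref_calibrator=18.0)
print(rq)  # 0.25, i.e., a 75% reduction relative to the calibrator
```

With a control group as calibrator, a knockdown efficiency can be expressed as (1 − relative quantity) × 100.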

#### **ASSESSING GENE TRANSCRIPTION PATTERNS ASSOCIATED TO EcR FUNCTION DURING HONEYBEE DEVELOPMENT USING RNAi**

#### *dsRNA synthesis and treatment*

We employed a general protocol for dsRNA synthesis and injection in honeybees (Pbl phase) (Amdam et al., 2003). For *EcR* dsRNA synthesis, a 391 bp clone of *EcR* cDNA was amplified to serve as template, comprising a fragment shared by the two transcript variants (A and B). The primers with the respective recognition site for T7 RNA polymerase (the leading TAATACGACTCACTATAGGG sequence) were: *EcR*-forward 5′-TAATACGACTCACTATAGGGCGAGAATGGCGAGGAAGTACGAC-3′ and *EcR*-reverse 5′-TAATACGACTCACTATAGGGCGATTCTTGAACTTGAGGCTGAAG-3′. A green fluorescent protein (*GFP*) gene clone was used as template to synthesize the respective dsRNA used as a non-target control (*GFP*-forward 5′-TAATACGACTCACTATAGGGCGAAGTGGAGAGGGTGAAGGTGA-3′ and *GFP*-reverse 5′-TAATACGACTCACTATAGGGCGAGGTAAAAGGACAGGGCCATC-3′; see Nunes et al., 2013a). The amplification products were visualized and retrieved after agarose gel electrophoresis and purified using QIAquick™ (QIAGEN) columns. *In vitro* transcription reactions were performed using the RiboMax™ T7 system (Promega) and the obtained dsRNA was isolated using TRIzol LS reagent (Invitrogen), subjected to a denaturation step at 98 °C for 5 min, followed by 30 min at room temperature, and diluted with nuclease-free water to a final concentration of 2.5 μg/μL. The dsRNA quality was assessed by agarose gel electrophoresis.
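The structure of these primers can be checked with a short script. The sketch below only verifies that each primer listed above begins with the core T7 promoter and reports the remaining 3′ portion; the helper name `split_t7_primer` is hypothetical, and no claim is made about where transcription starts within the primer.

```python
T7_PROMOTER = "TAATACGACTCACTATAGGG"  # core T7 RNA polymerase promoter

# Primer sequences as listed in the text (5' to 3').
PRIMERS = {
    "EcR-forward": "TAATACGACTCACTATAGGGCGAGAATGGCGAGGAAGTACGAC",
    "EcR-reverse": "TAATACGACTCACTATAGGGCGATTCTTGAACTTGAGGCTGAAG",
    "GFP-forward": "TAATACGACTCACTATAGGGCGAAGTGGAGAGGGTGAAGGTGA",
    "GFP-reverse": "TAATACGACTCACTATAGGGCGAGGTAAAAGGACAGGGCCATC",
}


def split_t7_primer(seq):
    """Return (promoter, remainder); raise if the 5' T7 site is missing."""
    seq = seq.replace(" ", "").upper()
    if not seq.startswith(T7_PROMOTER):
        raise ValueError("primer lacks a 5' T7 promoter")
    return T7_PROMOTER, seq[len(T7_PROMOTER):]


# Portion of each primer downstream of the T7 promoter.
remainders = {name: split_t7_primer(seq)[1] for name, seq in PRIMERS.items()}
for name, rest in remainders.items():
    print(name, rest, len(rest))
```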

Pbl-phase workers (*n* = 30 for each experimental group) received an intra-abdominal injection of 2 μL of *EcR* dsRNA solution (2.5 μg/μL). Controls of the same developmental phase received the same volume of *GFP* dsRNA solution. dsRNA-injected bees were kept in an incubator at 34 °C and 80% relative humidity until adult eclosion (∼2 days), when they were transferred to TRIzol reagent (Invitrogen) and frozen at −80 °C until RNA extraction. Total RNA extraction and first strand cDNA synthesis were carried out as described above. *EcR* knockdown efficiencies were assessed by RT-qPCR using variant-specific primers (EcRA-F, EcRB-F, and EcRA/B-R; see Table S1). Bees not used for gene expression analysis were used for evaluation of the adult phenotype.

#### *Analysis of gene expression patterns by RNA-Seq*

RNA pools of equal concentration from each group of EcR- and GFP-dsRNA treated bees were used for RNA-sequencing. Libraries were prepared using the TruSeq RNA™ Sample Preparation kit (Illumina) for poly-A+ RNA, and the TruSeq™ Small RNA Sample preparation kit (Illumina) for small RNAs (shorter than 200 nt). These libraries were shipped to the University of North Carolina (Chapel Hill, USA) facility where they were sequenced on an Illumina platform (Genome Analyzer II, Life Sciences).

RNA-Seq reads for the poly-A+ RNA library were first submitted to adapter clipping using Scythe (Buffalo, 2011) (v.0.981, default parameters) for the 3′-end adapter and CutAdapt (Martin, 2011) (v.1.1, minimum overlap of 5 bp) for the 5′-end adapter. The next step was read trimming based on quality scores (mean *Q* ≥ 25), Ns (number of *N* bases lower than 10%) and poly-A tail prediction (minimum of 5 bp of A/T at both ends). This step was performed using PRINSEQ (v.0.19.5) (Schmieder and Edwards, 2011), which also filtered very small reads (length < 15 bp). An alignment against the *A. mellifera* genome (assembly version 4.5) was run using TopHat (Trapnell et al., 2009) (v.2.0.7), guided by the respective RefSeq (Release 55) transcript coordinates. The genomic alignments were then submitted to Cufflinks (Trapnell et al., 2010) (v.2.0.2) for transcript assembly, estimation of their abundances and testing for differential expression between EcR-KD and control samples. The Cufflinks procedures were also guided by the RefSeq transcript coordinates. The expression estimates were normalized considering ambiguous alignments, and corrected for fragment bias (Roberts et al., 2011). The Poisson fragment dispersion model was used in the comparison analysis. Cufflinks calculates the FPKM (Fragments Per Kilobase of exon per Million fragments mapped), log2-fold-change and *q*-value (*p*-value adjusted by False Discovery Rate, FDR). However, the log2-fold-change was recalculated after adding an offset of 1 to FPKM values in order to enable comparisons involving samples without expression (zero) and to reduce the variability of the log ratios for low expression values (less than one). The functional annotation was done using Blast2GO (Conesa et al., 2005) (v.2.5), InterProScan (Mulder and Apweiler, 2007) (v.5-RC6), RefSeq transcript annotation and finding the Reciprocal-Best-Hit of *A. mellifera* RefSeq proteins against the *D. melanogaster* proteins database (FlyBase r5.49) using blastp. The Blast2GO annotation pipeline was run based on blastp results of RefSeq proteins against the nr database.
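The recalculation of the log2-fold-change with an offset of 1 can be illustrated with a minimal sketch. The function name and the FPKM values are hypothetical; which library serves as numerator is an assumption made here for illustration only.

```python
import math


def log2_fold_change(fpkm_a, fpkm_b, offset=1.0):
    """log2 of (fpkm_b + offset) / (fpkm_a + offset).

    The offset keeps the ratio defined when one FPKM is zero and damps
    the log ratio for weakly expressed loci (FPKM below one).
    """
    return math.log2((fpkm_b + offset) / (fpkm_a + offset))


# Hypothetical FPKM values, for illustration only.
print(log2_fold_change(0.0, 7.0))  # 3.0; undefined without the offset
print(log2_fold_change(0.2, 0.4))  # small, although the raw ratio is 2
```

Without the offset, the second pair would give a log2 ratio of 1; with it, the value drops to roughly 0.2, which is the damping effect described in the text.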

Computational processing of the Small RNA-Seq reads comprised the following steps: (i) initial sequence quality filtering based on unidentified bases; (ii) rRNA read filtering based on matches against the SILVA database (Release 115); (iii) sequence adapter clipping using CutAdapt and Scythe; (iv) read trimming based on quality scores, Ns and poly-A+ tail prediction. All of these procedures were performed using PRINSEQ in the same way as described above. After each one of these preprocessing steps, an alignment against the *A. mellifera* genome (assembly version 4.5) was performed using the reads that had not already been aligned at a previous alignment step. Finally, all the alignment results were concatenated and transformed into a proper format to identify miRNAs. For this purpose, any split alignments were excluded.

Genomic alignments were performed using TopHat and the other alignments were performed using Bowtie2 (Langmead and Salzberg, 2012) (v.2.0.6). miRNA digital expression (MDE) levels were obtained by analysis with miRDeep2 (Friedländer et al., 2012) (v.2.0.0.5∗), which provides the counts of reads mapped to the *A. mellifera* miRNA dataset in miRBase (Release 19). The original miRDeep2 code was modified to provide read counts for mature miRNAs instead of each precursor, and then the log2-fold-change was calculated and statistical significance was assessed using the method proposed by Audic and Claverie (1997) with adjustment by FDR.
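The test of Audic and Claverie (1997) is commonly written as a conditional probability of observing *y* reads in one library given *x* reads in the other, with library sizes *N*1 and *N*2. The sketch below assumes that standard form and a simple tail sum for the *p*-value; the function names are hypothetical, the FDR adjustment is not included, and this is not the modified miRDeep2 code used in the study.

```python
import math


def audic_claverie_prob(y, x, n1, n2):
    """P(y | x) = (N2/N1)^y (x+y)! / (x! y! (1 + N2/N1)^(x+y+1))."""
    r = n2 / n1
    log_p = (y * math.log(r)
             + math.lgamma(x + y + 1)
             - math.lgamma(x + 1)
             - math.lgamma(y + 1)
             - (x + y + 1) * math.log(1.0 + r))
    return math.exp(log_p)


def audic_claverie_pvalue(y, x, n1, n2):
    """Smaller of the two tails, P(Y <= y) and P(Y >= y), given x."""
    lower = sum(audic_claverie_prob(k, x, n1, n2) for k in range(y + 1))
    upper = 1.0 - lower + audic_claverie_prob(y, x, n1, n2)
    # Clamp to guard against floating-point cancellation in the far tail.
    return max(0.0, min(lower, upper))


# Hypothetical read counts and library sizes, for illustration only.
print(audic_claverie_pvalue(y=10, x=10, n1=1_000_000, n2=1_000_000))
print(audic_claverie_pvalue(y=100, x=10, n1=1_000_000, n2=1_000_000))
```

Working with log-gamma values avoids overflow of the factorials for the large read counts typical of miRNA libraries.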

#### **RESULTS**

#### **EcR-A AND EcR-B TRANSCRIPT VARIANT IDENTIFICATION IN HONEYBEES**

Two transcript variants, EcR-A (accession number NM\_001098215.2) and EcR-B (NM\_001159355.1), of 2635 and 2782 nucleotides, respectively, have been identified for the *A. mellifera EcR* gene (Takeuchi et al., 2007; Watanabe et al., 2010). The difference in nucleotide length was shown to reside within the 5′ end, resulting in amino acid sequence variation in the N-terminal modulator A/B domain. Conceptual translation of the nucleotide sequences resulted in a putative EcR-A protein consisting of 629 amino acid residues and an EcR-B protein of 557 amino acids, both sharing a 452 amino acid sequence in the carboxy terminal (**Figure 1A**). Northern blot analysis using a C-terminal *EcR* probe showed hybridization bands of approximately 2.7 kb and 2.6 kb (**Figure 1B**), mainly in queen samples, but since we did not aim at quantification, the respective band density does not necessarily represent a difference in transcript levels between the two castes. Nonetheless, this result reveals that the two transcripts indeed have small differences in length, supporting the *in silico* evidence.

*[Figure 1 caption fragment: "… domains with their respective amino acid numbers. The length of the coding sequence of each exon within the respective EcR domain is marked by …"]*

#### **DEVELOPMENTAL PROFILES OF THE EcR TRANSCRIPT VARIANTS A AND B**

Using variant-specific primers we quantified the transcript levels of *EcR-A* and *EcR-B* covering the entire postembryonic development for honeybee queens and workers (**Figure 2**). Three major findings are worthy of note: (i) transcripts representing the *EcR-B* variant are predominant in embryos (Mann–Whitney test, *P* ≤ 0.05), but these transcript levels decline at the transition to the first larval instar, and it is the *EcR-A* variant which is then predominantly expressed during the end of the larval stage (fifth instar) and pupal stage; (ii) at several time-points, *EcR* expression is higher in workers than in queens (Mann–Whitney test, *P* ≤ 0.05); and (iii) there is a clear discrepancy between circulating ecdysteroid levels and the developmental expression of EcR-A.

Major caste differences in EcR-A expression were seen to accompany the larval/pupal metamorphic molt. As soon as the larvae were no longer fed by nurse bees and the brood cells were closed, the EcR-A levels increased by two orders of magnitude in cocoon-spinning worker larvae (S1–S3 phases). A rise was also seen in EcR-A levels in cocoon-spinning queen larvae, but this was significantly lower than in workers (Mann–Whitney test, *P* ≤ 0.05). A similar pattern was also seen for the EcR-B variant, but at much lower modulation. Interestingly, the EcR expression levels then decreased for both variants and in both castes at the onset of prepupal development (PP1), marked by the appearance of apolysis fluid separating the fifth instar larval cuticle from the newly synthesized pupal cuticle in the head region. A new rise in the transcript levels of both variants was then seen at the end of prepupal development (PP3), but this was primarily evident in workers (Mann–Whitney test, *P* ≤ 0.05). EcR-A and EcR-B transcript variants remained at low levels during the pupal and early pharate-adult stages (Pw to Pbl phases) before they showed another steady increase, but again mainly in workers (Mann–Whitney test, *P* ≤ 0.05).

*[Figure 1 caption fragment: "… 160 bp fragment that included part of the DBD coding region. PP1, early prepupa; Pb, brown-eyed pharate-adult with unpigmented cuticle."]*

#### **TRANSCRIPTIONAL RESPONSE OF EcR TO ARTIFICIALLY AUGMENTED ECDYSTEROID AND JH TITERS**

So as to better understand the relationship between hemolymph hormone titers and hormone receptor expression, especially the remarkable divergence in the pupal stage, we treated Pb-phase workers and queens, as these are at the transition from pupal development *per se* to the pharate adult stage, with JH and 20E. At the Pb-phase the ecdysteroid titer is rapidly declining in both castes after having gone through the maximum peak at the preceding Pp phase (Pinto et al., 2002), while JH levels are still basal (Rembold, 1987). The transcriptional responses for the two EcR variants assayed by RT-qPCR revealed a general repressive effect of both hormones at 24 h after application (**Figure 3**). In queens, 20E injection elicited a repressive effect on both EcR variants.

Mean transcript levels were diminished at 12 h after 20E injection and were significantly lower at 24 h (Mann–Whitney test, *P* ≤ 0.05). In workers this was the case only for the *EcR-B* transcript and only at 24 h (**Figure 3A**).

The effect of exogenous JH on *EcR* expression was not as clear-cut as that elicited by 20E treatment. While there was no apparent effect on *EcR-A* transcripts in workers, the *EcR-B* levels showed slightly elevated means at all time points (**Figure 3B**), and these were significantly higher at 12 h following hormone treatment (Mann–Whitney test, *P* ≤ 0.05). Interestingly, in the queen caste the response to JH treatment appeared to be opposite to that seen in workers, with mean EcR-A and EcR-B transcript levels diminished already at 1 h after treatment and significant differences apparent at 1 h in the case of *EcR-B* and at 24 h for *EcR-A* (Mann–Whitney test, *P* ≤ 0.05). These results indicate a repressor effect of high circulating ecdysteroid levels on *EcR* expression in both castes and a differential response to JH, with workers responding positively and queens negatively to elevated JH levels.

#### **EcR KNOCKDOWN IN PHARATE-ADULT HONEYBEE WORKERS SIGNIFICANTLY DOWNREGULATES THE EXPRESSION OF CANDIDATE TARGET GENES**

So as to understand the role of the *EcR* gene in honeybee development, beyond the correlation analysis between transcript levels of the two EcR variants and hormone levels, we experimentally decreased the *EcR* gene functionality by an RNA interference approach. We herein focused on the *EcR* response in workers during the pharate-adult to adult transition because only one of the two transcript variants, *viz*. *EcR-A*, undergoes a gradual increase at this developmental interval, and only so in the worker caste (**Figure 2**). We expected this to give not only more clear-cut results and insights into the role of the predominant EcR variant, but also into still very little understood aspects of morphogenetic processes taking place in developing adult honeybees.

The dsRNA fragment used in this experiment represented an EcR region shared by the two transcript variants and its injection resulted in a reduction of 79.8 and 74.9% for *EcR-A* and *EcR-B* mRNA levels, respectively (*P* < 0.001, Student's *t*-test; see **Figure 4**). A mortality of 10% was observed in both EcR- (KD) and GFP-dsRNA treated (control) bees. A proportion of dsRNA-injected bees showed alterations in cuticle pigmentation and wing development, similar to previously reported observations by Barchuk et al. (2008) when studying *ultraspiracle* function. Based on the strong knockdown response we next assayed the transcriptional response of four candidate target genes, these being a homolog of the *D. melanogaster ftz-f1* gene, the *vg* gene, and two genes involved in adult cuticle formation (*AmelCPR14* and *BursA*). The *ftz-f1* gene was included in this analysis because in *D. melanogaster* it acts as a competence factor for the response to 20E; furthermore, EcR also inhibits *ftz-f1* expression in *D. melanogaster* mid-prepupa, thus temporarily impairing the larval-to-pupal transition in response to the second 20E peak (King-Jones and Thummel, 2005). In pharate-adult honeybees, the levels of *ftz-f1* transcripts were seen to increase (data not shown) concomitantly with the levels of *EcR-A*, suggesting a synergistic action of the two genes. In addition, the increase in the levels of the two genes coincides with the increase in the expression of genes encoding enzymes and proteins needed for the complete differentiation of the adult cuticle (Soares et al., 2007, 2011, 2013; Elias-Neto et al., 2010). Similarly, in *D. melanogaster* the expression of *ftz-f1* has recently been related to adult cuticle formation and eclosion (Sultan et al., 2014).
The analysis of *ftz-f1* transcript levels in newly emerged workers (*N* = 12), i.e., approximately 2 days after injecting dsRNAs, showed that *ftz-f1* expression was significantly decreased in EcR-KD bees (*P* ≤ 0.05, Student's *t*-test) (**Figure 4**). A significant effect of the EcR knockdown was also seen for the cuticular protein gene *AmelCPR14*, but not for the *BursA* gene that encodes a subunit of the neurohormone Bursicon. The significant reduction in the expression of a cuticular protein gene following EcR-RNAi is consistent with the ecdysteroid-related expression of these genes in developing honeybees (Soares et al., 2007, 2011, 2013). An interesting though not easily explained finding was that *vg* gene expression was not significantly affected by reducing EcR function, although the mean *vg* transcript levels were slightly reduced compared to the non-target dsRNA control. This was surprising as *vg* gene expression has been shown to gradually increase in pharate-adult honeybee females, and this increase was thought to be related to ecdysone levels (Barchuk et al., 2002; Piulachs et al., 2003).

#### **EcR KNOCKDOWN AFFECTS THE POLY-A+ PROFILE OF NEWLY EMERGED WORKERS**

So as to understand EcR functions during the pharate-adult to adult transition of honeybee workers on a more global scale we compared the poly-A+ transcriptomes of EcR-KD and GFP-injected (control) bees. After filtering of the raw data we obtained 112,659,148 reads for the KD and 71,050,536 reads for the control samples. Most of these reads were 50 nt long. These data have been submitted to the Sequence Read Archive (SRA, NCBI, http://www.ncbi.nlm.nih.gov/sra) under the Accession Number SRX700299. As we had only one RNA sample set per group (two libraries, no replicates), the estimate obtained by Cuffdiff analysis was that 234 loci were differentially expressed [absolute log2 (fold change) > 1; *q*-value = 0.05; FPKM > 5 in at least one library] (Table S2). Among these, 121 code for known protein products, and 100 of these were upregulated in KD pharate-adults and 21 were downregulated (overexpressed in control bees; **Table 2**). The roughly fivefold higher number of genes upregulated in the EcR-KD group indicates that during the pharate-adult to adult transition more genes may be repressed by ecdysone than are induced.
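The selection criteria given in brackets above can be expressed as a simple filter. The sketch below is illustrative only: the `Locus` record, its field names and all values are hypothetical, and the *q*-value criterion is read here as "at most 0.05", which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Locus:
    name: str
    fpkm_kd: float
    fpkm_control: float
    log2_fc: float
    q_value: float


def is_differentially_expressed(locus, min_abs_log2_fc=1.0,
                                max_q=0.05, min_fpkm=5.0):
    """Apply the three thresholds: fold change, q-value and FPKM."""
    return (abs(locus.log2_fc) > min_abs_log2_fc
            and locus.q_value <= max_q
            and max(locus.fpkm_kd, locus.fpkm_control) > min_fpkm)


# Hypothetical loci, not taken from Table S2.
loci = [
    Locus("locus_1", 39.0, 9.0, -2.0, 0.001),   # passes all thresholds
    Locus("locus_2", 3.0, 1.0, -1.0, 0.001),    # FPKM and fold change too low
    Locus("locus_3", 20.0, 27.0, 0.415, 0.01),  # fold change too small
    Locus("locus_4", 7.0, 31.0, 2.0, 0.2),      # q-value too high
    Locus("locus_5", 7.0, 31.0, 2.0, 0.01),     # passes all thresholds
]

selected = [locus.name for locus in loci if is_differentially_expressed(locus)]
print(selected)  # ['locus_1', 'locus_5']
```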

In terms of functional assignments the following conclusions can be drawn. Seven genes among the ones upregulated in the KD group code for cytochrome P450 proteins (**Table 2** and Figure S1A), six of these belonging to CYP clade 3 (CYP6AS2, CYP6AS3, CYP6AS4, CYP6AS5, CYP6AS12, and CYP6BD1) and one (CYP305D1) to CYP clade 2 (for clade assignments of honeybee cytochrome P450 genes see Claudianos et al., 2006). A second protein family that was well-represented among the upregulated genes in the KD group is that encoding Major Royal Jelly Proteins (MRJP1 and MRJP9) and an MRJP-associated protein, apisimin. A third class is represented by hormone response-related genes: a gene encoding a JH-inducible protein, a gene encoding a honeybee eclosion hormone (EH) homolog, and *krüppel-homolog* 1, an immediate response gene regulated by the JH receptor (Bellés and Santos, 2014). Nonetheless, the genes with the highest differential expression index are three genes encoding transcripts of unknown function and without conserved domain evidence (LOC100576540, LOC727013, and LOC727546). The fourth highest upregulated gene in the KD group encodes an α-glucosidase, an enzyme that converts the disaccharide sucrose into glucose and fructose and is, thus, critically involved in carbohydrate metabolism. Another three genes in the top gene list are also related to metabolic functions, these being transcripts for a glycine N-methyltransferase-like protein, a glucose-methanol-choline (GMC) oxidoreductase 3 and a lipase 3-like protein. Furthermore, three genes upregulated in KD bees, the GMC oxidoreductase 3, a UDP-glycosyltransferase (LOC413043) and a glucuronosyltransferase (LOC725997), could be related to ecdysteroid metabolism and function.

The genes downregulated in EcR-KD bees are listed at the bottom of **Table 2**. They are represented with positive fold change values, as these were calculated as relative to the control group. In contrast to the upregulated genes, those that were downregulated are not as clearly associated to putative functions during the pharate-adult to adult transition, except for the LOC724735 and *Grp* genes that encode structural cuticle proteins needed for the construction of the adult cuticle at this stage. The gene with the highest overexpression index in the control group codes for a Niemann-Pick type protein (NPC2), that is, genes involved in cholesterol metabolism-related syndromes and diseases (another *npc2*-type gene was found slightly overexpressed in the KD bees). Next are three transcripts possibly related to venom gland function, encoding a phospholipase, secapin and a putative mast cell degranulating peptide (**Table 2** and Figure S1A). Also downregulated was the *brown* gene, which encodes an ABC-2 type transporter protein, and a gene coding for a Major Royal Jelly Protein (MRJP3).

A more global analysis on the entire set of differentially expressed genes was done based on Gene Ontology (Blast2GO and InterProScan) using Fisher and Kolmogorov–Smirnov statistics. This confirmed that the poly-A+ RNAs representing genes upregulated in the KD group are enriched in proteins participating in metabolic pathways, particularly ones with catalytic and oxidoreductase activities (Table S3).

So as to validate the poly-A+ RNA-sequencing results we then chose two genes revealed as upregulated in the KD group (*Cyp6as5,* a P450 protein coding gene, and *kr-h1*, a gene encoding the JH response factor Krüppel homolog-1) and two downregulated genes (LOC406145, *secp* and LOC724386, *npc2*). For these we designed or selected from the literature gene-specific primers and ran RT-qPCR assays. The expression pattern was confirmed for all four genes (Figure S1A), thus providing further evidence that the 234 poly-A+ RNA coding genes found as differentially expressed between treated and control bees are under EcR control.

#### **EcR KNOCKDOWN AFFECTS THE miRNA PROFILE OF NEWLY EMERGED WORKERS**

We obtained a total of 31,171,886 and 33,683,147 reads of small RNAs from the KD and control sequence libraries, respectively. These data have been submitted to the Sequence Read Archive (SRA, NCBI, http://www.ncbi.nlm.nih.gov/sra) under the Accession Number SRX700299. After filtering the raw data, we focused on the discovery of miRNAs linked to the EcR network. A total of 4,436,511 reads of the KD samples (∼13.2%) and 10,557,117 reads of the control sample (∼33.9%) mapped to known honeybee mature miRNAs (available in miRBase version 19), suggesting that EcR disruption causes a general downregulation of miRNA families. We considered as "expressed" those miRNAs with more than 10 reads represented in at least one library. By doing so we retrieved a total of 132 known miRNAs expressed in newly emerged workers, most of them (124) in both conditions (Table S4). In order to find a set of miRNAs whose transcription is significantly affected by the EcR pathway, we filtered the Cuffdiff results by selecting miRNAs with expression differences higher than 1.2 fold and a *q*-value < 0.05 between KD and control bees. We found 60 downregulated and 10 upregulated miRNAs in KD samples compared to controls (**Table 3**). These data were then further validated by RT-qPCR assays for the following miRNAs: miR-14, miR-100, miR-125, miR-133, miR-375, miR-3728, and miR-3771 (Figure S1B).



#### **Table 2 | Continued**



*Negative values of log2 FC indicate genes that are upregulated in EcR-KD bees (highlighted in red); positive values indicate downregulation in EcR-KD bees (highlighted in green). Genes shown in bold had their transcription patterns validated by qPCR. Expression values are given as FPKM (Fragments Per Kilobase of exon per Million fragments mapped). This list contains only the genes with FPKM values above 5 in at least one of the two samples.*

#### **DISCUSSION**

#### **THE HONEYBEE EcR TRANSCRIPT VARIANTS AND THEIR DEVELOPMENTAL REGULATION**

The existence of more than one EcR isoform is commonplace in insects, including the honeybee, for which two transcript variants, *EcR-A* and *EcR-B*, had been found (Takeuchi et al., 2007). EcR isoforms were first shown for *D. melanogaster* (Talbot et al., 1993) and then for the mealworm beetle *Tenebrio molitor* (Mouillet et al., 1997). The extensive review of insect EcR isoforms by Watanabe et al. (2010) showed a high similarity in their nucleotide and amino acid sequences in most of their functional domains, except for the N-terminal region including the variable A/B modulator domain, which might allow for the recruitment of different co-activators/co-repressors (Tora et al., 1988; Kato et al., 1995; Watanabe et al., 2010).

First, we confirmed by northern blotting the expression of the two *EcR* variants in honeybee queens and workers. Then, we compared their temporal expression profiles to the hemolymph ecdysteroid titers of fifth instar queen and worker larvae (F1-PP3 phases) (Rachinsky et al., 1990). The results for the developmental expression profiles of the two ecdysteroid receptor variants are surprising in two aspects. First, contrasting with the hormone titers, which are higher in queens than in workers, the *EcR* transcript levels were found to be higher in workers, especially so for *EcR-A*. Second, there was a marked drop in *EcR* expression at the beginning of the prepupal phase (PP1), i.e., exactly when the hemolymph ecdysteroid levels increase to reach a developmental peak at the subsequent PP2 phase. Strikingly as well, the transcript levels for both EcR variants remained at low or basal levels during the pupal and early pharate-adult stages (Pw to Pbl phases), even though the ecdysteroid hemolymph titers are at a maximum during this period (Feldlaufer et al., 1985; Pinto et al., 2002).

The switch from EcR-B expression in the embryonic stage to EcR-A as the predominant isoform in the fifth larval instar and pupal stage reflects a change in the processing of a putative long pre-mRNA, or a shift in transcription start site utilization (our RNA-Seq data support the latter possibility and even suggest the existence of a third EcR transcript variant). Since the *EcR* gene is known to be induced after an ecdysteroid pulse (Karim and Thummel, 1992; Davis and Li, 2013), the production of *EcR-B* mRNA in honeybee embryos would require the presence of steroid hormones, which is indeed the case. Makisterone A, the predominant ecdysteroid in *A. mellifera* (Feldlaufer et al., 1985), has been shown to be present in ovaries in quite large amounts (Feldlaufer et al., 1986a), and unpublished data from our laboratory also confirm the presence of ecdysteroids in developing embryos. High levels of ecdysteroids in ovaries have also been shown for bumblebee queens (Geva et al., 2005) and queens of a swarm-founding neotropical wasp, *Polybia micans* (Kelstrup et al., 2014). Embryonic ecdysteroids can be synthesized by enzymatic conversion from inactive conjugates stored during oogenesis (Dorn, 2000) or, as seen in mosquitoes, transferred by males during copulation (Baldini et al., 2013).

Since makisterone A is the predominant ecdysteroid compound in queen ovaries (Feldlaufer et al., 1986a) and also in pupal-stage hemolymph (Feldlaufer et al., 1986b) and 20E is not negligible in prepupal hemolymph (Rachinsky et al., 1990), the observed embryonic-to-larval EcR isoform switch may be linked to variation in the composition of the ecdysteroids circulating in the hemolymph throughout a bee's life cycle. This could be responsible not only for the observed differential *EcR* transcription, but also for the formation of different hormone/receptor complexes with potentially different target genes, thus governing separate physiological processes. 20E, for example, might have retained a role in reproductive physiology, as suggested by Takeuchi et al. (2007), whereas makisterone A may have been co-opted for governing postembryonic development, as suggested for *Dysdercus fasciatus* (Feldlaufer et al., 1991). Nonetheless, for honeybees such "division of labor" in ecdysteroid compounds is still highly speculative, especially since the ecdysteroid hemolymph levels in adult honeybee queens and workers are continuously low, making it rather unlikely that these steroid hormones play a major role in female reproductive physiology (Hartfelder et al., 2002). Instead, they seem to be preferentially stored in the developing follicles.

The second and third major findings mentioned above are that the *EcR-A* transcript levels are higher in worker than in queen development, and that there is no positive, but rather an apparently negative correlation between hormone levels and hormone receptor transcript levels. This stands in stark contrast to the developmental pattern of the hemolymph ecdysteroid titers in the two castes, which are higher in queens than in workers, particularly so during larval-pupal metamorphosis (Rachinsky et al., 1990). The ecdysteroid titer in last instar queen larvae rises as soon as the brood cells are closed and the larvae start to spin their cocoons (S1-stage), while in workers this was only seen in the late spinning (S3) to early prepupal (PP1) phases. Furthermore, the peak in ecdysteroid levels reached during the prepupal phase (PP2) is twice as high in queens compared to workers (Rachinsky et al., 1990). The negative correlation between ecdysteroid levels and *EcR* expression is particularly evident at two time points: in prepupae, when ecdysteroid levels are high in the PP1–PP2 phases, just as the EcR-A expression pattern reaches a trough, and in the pupal stage, when the ecdysteroid levels are high in both castes at the Pp phase, before dropping in the Pb-Pbl phases (Pinto et al., 2002). It is only after this drop in circulating hormone levels that *EcR-A* transcription is resumed, particularly so in the worker caste, and strikingly, it is during the subsequent Pbm and Pbd phases that the ecdysteroid levels are again lower in workers than in queens (Pinto et al., 2002).

#### **Table 3 | miRNAs that were differentially expressed in EcR-knockdown bees.**

*Only miRNAs with Fold-change >1.2 are listed.*

This apparent negative correlation between hormone and hormone receptor levels was suggestive of a repressive action of high concentrations of circulating ecdysteroids on the expression of their receptor gene. To test this, we manipulated the endogenous hormone levels by treating Pb pharate-adults with either 20E or JH. The results of the 20E injection experiments showed that a prolonged and excessive presence of ecdysteroids had a repressive effect on EcR-A and EcR-B expression, especially so in queens (**Figure 3A**). Interestingly, workers seem to be more resilient to this repressor effect, as there was no significant reduction in EcR-A transcript levels, comparable to that seen in queens, or for EcR-B in both castes. Such resilience was also noted in the JH application experiment, where EcR-A transcript levels in workers remained little affected compared to those in queens and for EcR-B in both castes 24 h after the 20E injection. Strikingly, JH appeared to have opposite effects on EcR-B expression in the two female castes, showing a positive effect in workers and a negative one in queens. These caste-related differences in hormone effects on *EcR* expression certainly deserve a closer look in future experiments.

Repressive effects of high concentrations of ecdysteroids on EcR expression are, however, not new and are likely to be a general feature of hormone systems that underlie cyclical events in morphogenesis and physiology. For instance, similar results were described for *Manduca sexta*, where low concentrations of 20E induced *EcR* expression while high concentrations repressed the expression of this gene (Jindra et al., 1996). Like ours, these results suggest that the *EcR* gene responds positively to a slight increase in ecdysteroids, whereas high hormone levels are repressive. In fact, as we could see, *EcR* expression actually appears to precede the rise in hormone levels, for instance in the S1–S3 phases, when the circulating ecdysteroid levels start increasing (Rachinsky et al., 1990), but EcR-A and also EcR-B transcript levels have already undergone a steep rise. A repetition of this pattern can be inferred for the pupal ecdysis event, occurring between PP3 and Pw, when the ecdysteroid titers undergo a sharp drop, but EcR-A and EcR-B are on the rise (mainly in workers), and drop once the ecdysteroid titers build up to maximal values in the Pp phase. It is such cyclical events, the molts, that are synchronized by the ecdysone/ecdysone receptor complex action, and this is primarily seen in the epidermis, the main organ of cuticle synthesis. In the honeybee, several cuticle protein genes were shown to be regulated by ecdysteroids (Soares et al., 2007, 2011, 2013; Elias-Neto et al., 2010).

#### **RNAi-MEDIATED KNOCKDOWN REVEALS EcR REGULATED GENES IN DEVELOPING ADULTS**

Upon comparing the sequencing results of the poly-A+ libraries for EcR knockdown (EcR-KD) and control groups, the Cuffdiff analysis classified 234 loci as differentially expressed. Among these, 121 were annotated as coding for known protein products; from another point of view, 113, i.e., roughly one half, represent loci for unknown, unannotated products, which could be either proteins or long non-coding RNAs. The latter especially are still "dark matter" in the honeybee genome, represented by many ESTs in the databases, but only four long non-coding RNAs have so far been characterized in some detail (Sawata et al., 2002; Humann et al., 2013).

Among the genes with known orthologs or sequence similarity in functional domains, 100 were overexpressed (fold change *>* 1) in the EcR-KD group and 21 in the control group, indicating that apparently more genes are repressed by the ecdysone/EcR receptor complex than are activated. Furthermore, a Gene Ontology and KEGG pathway analysis showed that there is little overlap in gene functions between the two sets of differentially expressed genes (DEGs). As mentioned above, cytochrome P450 genes are strongly represented among the DEGs. While cytochrome P450 genes are a large gene family, strongly related to detoxification processes, this family has undergone considerable reduction in honeybee genome evolution (Claudianos et al., 2006). This reduction is, however, seen only in certain clades of the P450 enzymes, but not in the clades comprising the genes found in our EcR-RNAi experiment. Unfortunately, there is no further functional or tissue/cell type information available for the five cytochrome P450 genes, especially whether or not they may be related to steroid synthesis or metabolism. Nonetheless, findings similar to the ones we report here were also noted by Davis and Li (2013) in their genomic screen for ecdysone and EcR-dependent gene expression in *D. melanogaster*.
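The locus counts reported in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are hypothetical.

```python
# Counts taken from the text of this section.
total_deg_loci = 234          # loci classified as differentially expressed
annotated = 121               # loci coding for known protein products
overexpressed_in_kd = 100     # annotated genes higher in EcR-KD bees
overexpressed_in_control = 21 # annotated genes higher in control bees

# Loci without annotation (unknown proteins or long non-coding RNAs).
unannotated = total_deg_loci - annotated

# The two directional sets should add up to the annotated total.
assert overexpressed_in_kd + overexpressed_in_control == annotated

share_unannotated = unannotated / total_deg_loci
share_overexpressed_in_kd = overexpressed_in_kd / annotated

print(f"unannotated loci: {unannotated} ({share_unannotated:.1%})")
print(f"annotated genes higher in EcR-KD: {share_overexpressed_in_kd:.1%}")
```

The unannotated loci thus make up a little under half of the differentially expressed set, and roughly five out of six annotated genes are higher in EcR-KD bees, which is the basis for the statement that more genes appear to be repressed than activated by the ecdysone/EcR complex.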

A second group of overrepresented genes that drew attention was the hormone response-related genes, as these may provide a link between JH and ecdysteroid action during the pharate-adult to adult transition in honeybees. For this group we found three genes as overexpressed in the EcR-KD group, *viz.* a JH-induced protein, *kr-h1*, and an Eclosion hormone-like (*EH-like*) gene. *kr-h1* is certainly the most interesting gene in this set, as it represents a direct readout of the activity of the JH response in target tissues (Lozano and Belles, 2011; Bellés and Santos, 2014). As *kr-h1* has previously been identified in a screen for ecdysone-response genes in *D. melanogaster* (Beckstead et al., 2005), the current identification of this gene in the EcR-KD group provides experimental evidence toward a mechanistic explanation for the modulation of vitellogenin induction in honeybee pharate-adults, where *vg* expression is caste-specifically induced by JH and counteracted by ecdysteroids (Barchuk et al., 2002). Overexpression of an *EH-like* gene in the EcR-KD group was not unexpected, as EH is synthesized in response to declining ecdysteroid titers and is part of the ecdysis triggering signaling cascade (Zitnan and Adams, 2012). Interestingly, three other upregulated genes in EcR-KD bees may have roles in ecdysteroid metabolism and function. Several GMC oxidoreductase genes in diverse insects, including *A. mellifera*, are clustered in an evolutionarily conserved tandem array with potential to be co-regulated for a common function related to ecdysteroid metabolism (Iida et al., 2007). The products of LOC413043 and LOC725997 may regulate ecdysteroid titer and function since the enzymes encoded by these genes catalyze the transfer of glucose from UDP-glucose to ecdysteroids, and thus are possibly related to ecdysteroid inactivation (O'Reilly and Miller, 1989).

Overexpression of members of the Major Royal Jelly Protein (MRJP) family can be interpreted in the context of a repressive action of the ecdysone/EcR receptor complex on genes of the adult honeybee life cycle (the only exception being the gene coding for the MRJP3, which was overexpressed in control bees). The *mrjp* gene family with its nine members is a lineage-specific extension in the genus *Apis*, from a single *mrjp-like* gene within the *yellow* genes complex (Drapeau et al., 2006). Even though these proteins are highly expressed in the hypopharyngeal glands of nurse worker bees, constituting the major protein fraction of the glandular secretions fed to larvae (royal jelly and worker jelly), expression of the *mrjp* genes is neither exclusive to this tissue nor is it restricted to the worker caste. In particular, *mrjp9* has been shown to be broadly expressed, in different tissues of adult workers and also in queens and even drones (Buttstedt et al., 2013). In contrast to *mrjp9*, *mrjp1* expression is more tissue-specific, being highest in heads (*viz*. hypopharyngeal glands) of nurse bees, with expression levels being considerably lower in other body parts, castes and sexes (Buttstedt et al., 2013). MRJP1 is the predominant MRJP moiety in royal jelly, present as oligomers of MRJP1 subunits, which are held together by apisimin, a small 5 kDa protein (Tamura et al., 2009). ESTs corresponding to *apisimin* were found as overrepresented in the EcR-KD group, indicating that its expression is co-regulated with that of *mrjp1*. But this coregulation is not due to genomic proximity, as the *mrjp/yellow* gene cluster maps to chromosome 11, whereas *apisimin* is located on chromosome 6. Interestingly, an MRJP1 monomer, royalactin, was found to be an important factor in caste development, acting through the Egfr signaling pathway (Kamakura, 2011).

Among the EcR-KD group genes we also identified *obp15*, which encodes a putative odorant binding protein. Some *obp* genes were also found to be under negative EcR control in *D. melanogaster,* including the *obp15* and *obp6* genes (Davis and Li, 2013). Forêt and Maleszka (2006) had previously shown that *obp15* is expressed in the antennae of adult bees and also in young larvae but not in pupae. The high ecdysteroid levels in honeybee hemolymph during the pupal to pharate-adult transition, thus, appear to repress *obp15* expression, and possibly also other members of this complex gene family.

Among the genes overrepresented in the transcriptome of the control group (downregulated in EcR-KD bees), the first in the top ten list is annotated as *npc2*. Genes of this family are associated with Niemann-Pick syndromes and diseases affecting cholesterol metabolism (Carstea et al., 1997). In *D. melanogaster*, *NPC* mutations cause intracellular enrichment of cholesterol, reduced ecdysteroidogenesis and death in the first larval instar. The fact that this condition could be fully rescued when an excess of dietary cholesterol was given to these mutants indicated that the ecdysone biosynthesis pathway is intact, but precursor processing is not (Huang et al., 2007). Interestingly, in honeybees, as in other insects, the major ecdysteroid is not ecdysone or its derivative 20E, but makisterone A, an ecdysteroid methylated at C24 (Feldlaufer et al., 1985, 1986a,b; Rachinsky et al., 1990), possibly due to a lack or restriction in C24-demethylation of a phytosterol precursor. The expression of two other genes overrepresented in the transcriptome of the control group has been shown to be dependent on the ecdysteroid titer. The protein encoded by LOC724735, an endocuticle structural protein (Márcia M. G. Bitondi, unpublished results) and also the *Grp* gene, renamed as *tweedle1* (*AmelTwdl1*) (Soares et al., 2011), were induced in the integument by the ecdysteroid pulse that promotes the pupal to pharate-adult transition. Thus, the functionality of the ecdysone/EcR complex is necessary for the activation of these genes. Interestingly, *mrjp3*, the third among the genes overrepresented in control bees (and thus induced by the EcR pathway), encodes one of the main MRJPs produced by nurse bees (Buttstedt et al., 2013). The *mrjp3* gene thus seems to be highly expressed by the time of adult emergence and the first days of adult life. 
Unlike the *mrjp1* and *mrjp2* genes, *mrjp3* reaches negligible expression levels in foragers, which, together with its distinctive amino acid sequence (Drapeau et al., 2006), supports the notion of its main function as food protein supplier to larvae by nurse bees. However, the fact that *mrjp1*, another MRJP gene highly expressed in nurse bees, was found to be repressed by the ecdysteroid pathway (see above), suggests the *mrjp3* gene is regulated in a distinct mode from the other *mrjp* genes.

#### **miRNAs AS ACTORS IN THE EcR REGULATORY NETWORK**

Here we demonstrate that the RNAi-mediated knockdown of *EcR* function perturbs the expression of 70 miRNAs (∼1/3 of the honeybee miRNAs known to date). Most of these (60) were downregulated and 10 were upregulated and we assume that these down and upregulated miRNAs are "induced" or "repressed," respectively, by the EcR pathway as bees undergo the pharate-adult to adult transition.

Among the miRNAs that showed significant changes in abundance following *EcR* knockdown, most had already been identified in a large-scale sequencing project (Chen et al., 2010), but had no associated function(s). Our data now suggest that these miRNAs are, at least, closely associated with EcR action and, consequently, connected to pupal-adult metamorphosis. In addition to these miRNAs of yet unclear functions, we also found conserved and functionally well-defined miRNAs, such as let-7, miR-1, miR-133, miR-375, miR-184, and miR-34. For example, miR-133 and miR-1 are both clustered in the mouse and fly genomes, and they play important roles in muscle development and differentiation in vertebrates and invertebrates (Sokol and Ambros, 2005; Chen et al., 2006; Boutz et al., 2007). In the honeybee, however, we found these two miRNAs to be located far apart from one another on chromosome 16. Nonetheless, they still seem to be linked in their cooperative functions, such as formation and physiology of flight muscle tissue. miR-133 has also been implicated in dopamine production (Yang et al., 2014), and high levels of dopamine were shown to coincide with rapid growth and compartmentalization of the antennal lobe neuropil, suggesting a role in the developing brain of the honeybee (Kirchhof et al., 1999). Furthermore, dopamine-derivatives are substrates for oxidation by laccases (Andersen, 2010) that are involved in tanning of the developing adult cuticle (Elias-Neto et al., 2010). Members of the *D. melanogaster let-7-C* locus (a cluster containing the *let-7, miR-100,* and *miR-125* genes) are also found in the honeybee genome. In *D. melanogaster* they are expressed in neuromusculature development of pupae and adults, and knockout flies showed disturbances in flight, reproduction and locomotion (Sokol et al., 2008). Moreover, ecdysteroid signaling was shown to be linked to the expression levels of the let-7-C cluster genes, as well as of miR-14 and miR-34 during insect development (for review see Kucherenko and Shcherbata, 2013).

Many of the miRNAs affected by *EcR* knockdown in honeybees (let-7, miR-1, miR-9a, miR-12, miR-14, miR-34, miR-79, miR-92b, miR-124, miR-184, miR-210, miR-219, miR-263a, miR-276, miR-279, miR-283, miR-305, miR-306, miR-316, miR-317) have previously been reported as putatively involved in the regulation of *D. melanogaster* immune genes, particularly those belonging to the JNK, Imd and Toll signaling pathways (Fullaondo and Lee, 2012). Accordingly, ecdysone and the ecdysone receptor complex (EcR/USP) are considered critical for innate cellular immunity (Flatt et al., 2008; Regan et al., 2013). Among these miRNAs, miR-184 is highly and/or broadly expressed in a number of tissues and developmental stages of vertebrates (Wienholds and Plasterk, 2005) and invertebrates (Jagadeeswaran et al., 2010), including *A. mellifera* (Chen et al., 2010; Nunes et al., 2013b). Moreover, several studies reported a wide spectrum of roles for miR-184, such as germline differentiation, axis formation of the egg chamber, anteroposterior patterning and cellularization of the embryo, gastrulation and neuroectoderm formation, apoptosis, and processes involved in the development and differentiation of imaginal discs (head, wing, and eyes) (see Iovino et al., 2009; Li et al., 2011, and references therein). The ecdysone response of miR-184 seen here in pharate-adult honeybees is associated with a period of extensive tissue remodeling, suggesting that miR-184 may play a role in the differentiation of honeybee imaginal disc-derived structures and maintenance of their tissue identities. Interestingly, the EcR mRNA has predicted binding sites for miR-14 (data not shown), and our global gene expression assays revealed a downregulation of this miRNA in bees silenced for *EcR* gene function. These results suggest that in *A. mellifera, EcR* gene expression is regulated in a loop-type mechanism involving miR-14, as already demonstrated for *D. melanogaster* (Varghese and Cohen, 2007; for a comprehensive review see Yamanaka et al., 2013).

#### **CONCLUDING REMARKS**

Our results suggest a differential use of EcR isoforms during the honeybee life-cycle stages. We could show that there is a generally positive *EcR* gene response to slight increases in ecdysteroids, whereas high levels of these hormones are repressive. The EcR knockdown experiments revealed that the expression of several hormone response-related genes (e.g., *kr-h1*) is contingent on a functional ecdysone/EcR receptor complex, thus providing a possible link between JH and ecdysteroid action during preimaginal honeybee development. These knockdown experiments also highlighted the relevance of a set of miRNAs involved in the regulation of immune response genes and in the general morphogenesis processes during pharate-adult development (e.g., *miR-184* and *let-7* locus genes). Within this framework and against the background of current knowledge on honeybee biology, our results highlight the relevance of the drop in the ecdysteroid pathway function for the appropriate timing in the expression of adult-specific genes, such as the Major Royal Jelly Protein (MRJP) family members.

#### **AUTHOR CONTRIBUTIONS**

Tathyana R. P. Mello, Aline C. Aleixo, Angel R. Barchuk, and Zilá L. P. Simões conceived the project; Tathyana R. P. Mello, Aline C. Aleixo, Daniel G. Pinheiro, Francis M. F. Nunes, Klaus Hartfelder, and Angel R. Barchuk performed the experiments; Tathyana R. P. Mello, Aline C. Aleixo, Daniel G. Pinheiro, Francis M. F. Nunes, Márcia M. G. Bitondi, Klaus Hartfelder, Angel R. Barchuk, and Zilá L. P. Simões analyzed and interpreted the data and drafted the MS. All authors approved the final version of the MS.

#### **FUNDING**

This study was funded by the Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP), grants 2011/03171-5; 2008/1446-4; 2008/10757-3.

#### **ACKNOWLEDGMENTS**

We thank Dr. Nilce M. M. Rossi, Roseli de Aquino P. Ferreira and Diana Gras for lab facility and support in Northern blotting assays, and Luiz Aguiar for technical assistance in the apiary. We also thank Felipe Martelli, Denyse C. Lago, Fabiano Abreu, and Camilla Valente Pires for help with the qPCR assays.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/journal/10.3389/fgene.2014.00445/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 September 2014; accepted: 04 December 2014; published online: 22 December 2014.*

*Citation: Mello TRP, Aleixo AC, Pinheiro DG, Nunes FMF, Bitondi MMG, Hartfelder K, Barchuk AR and Simões ZLP (2014) Developmental regulation of ecdysone receptor (EcR) and EcR-controlled gene expression during pharate-adult development of honeybees (Apis mellifera). Front. Genet. 5:445. doi: 10.3389/fgene.2014.00445*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Mello, Aleixo, Pinheiro, Nunes, Bitondi, Hartfelder, Barchuk and Simões. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Biased Allele Expression and Aggression in Hybrid Honeybees may be Influenced by Inappropriate Nuclear-Cytoplasmic Signaling

*Joshua D. Gibson<sup>1</sup>\*, Miguel E. Arechavaleta-Velasco<sup>2</sup>, Jennifer M. Tsuruda<sup>3</sup> and Greg J. Hunt<sup>1</sup>*

*<sup>1</sup> Department of Entomology, Purdue University, West Lafayette, IN, USA, <sup>2</sup> CENID-Fisiología y Mejoramiento Animal, Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias, México, Mexico, <sup>3</sup> Public Service and Agriculture, Clemson University, Clemson, SC, USA*

#### *Edited by:*

*Melanie April Murphy, University of Wyoming, USA*

#### *Reviewed by:*

*Guy Bloch, The Hebrew University of Jerusalem, Israel; Bart Pannebakker, Wageningen University, Netherlands; Douglas Mark Ruden, Wayne State University, USA; Eamonn Mallon, University of Leicester, UK*

> *\*Correspondence: Joshua D. Gibson gibson85@purdue.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> *Received: 18 August 2015 Accepted: 20 November 2015 Published: 01 December 2015*

#### *Citation:*

*Gibson JD, Arechavaleta-Velasco ME, Tsuruda JM and Hunt GJ (2015) Biased Allele Expression and Aggression in Hybrid Honeybees may be Influenced by Inappropriate Nuclear-Cytoplasmic Signaling. Front. Genet. 6:343. doi: 10.3389/fgene.2015.00343*

Hybrid effects are often exhibited asymmetrically between reciprocal families. One way this could happen is if silencing of one parent's allele occurs in one lineage but not the other, which could affect the phenotypes of the hybrids asymmetrically by silencing that allele in only one of the hybrid families. We have previously tested for allele-specific expression biases in hybrids of European and Africanized honeybees and we found that there was an asymmetric overabundance of genes showing a maternal bias in the family with a European mother. Here, we further analyze allelic bias in these hybrids to ascertain whether it may underlie previously described asymmetries in metabolism and aggression in similar hybrid families and we speculate on what mechanisms may produce this biased allele usage. We find that there are over 500 genes that have some form of biased allele usage and over 200 of these are biased toward the maternal allele but only in the family with European maternity, mirroring the pattern observed for aggression and metabolic rate. This asymmetrically biased set is enriched for genes in loci associated with aggressive behavior and also for mitochondrial-localizing proteins. It contains many genes that play important roles in metabolic regulation. Moreover, we find genes relating to the piwi-interacting RNA (piRNA) pathway, which is involved in chromatin modifications and epigenetic regulation and may help explain the mechanism underlying this asymmetric allele use. Based on these findings and previous work investigating aggression and metabolism in bees, we propose a novel hypothesis: that the asymmetric pattern of biased allele usage in these hybrids is a result of inappropriate use of piRNA-mediated nuclear-cytoplasmic signaling that is normally used to modulate aggression in honeybees. This is the first report of widespread asymmetric effects on allelic expression in hybrids and may represent a novel mechanism for gene regulation.

Keywords: parental effects, cytoplasmic incompatibility, hybrid incompatibility, aggression, Africanized, *Apis mellifera*, PIWI, PIWI-interacting small RNAs

### INTRODUCTION

The honeybee (*Apis mellifera*) is becoming a promising model for understanding epigenetic processes, which affect gene expression without modifying the DNA sequence. Honeybees possess all of the key genes in the methylation machinery (Wang et al., 2006) and DNA methylation plays a role in queen caste determination (Kucharski et al., 2008). Additionally, honeybee post-translational histone protein modifications have been characterized and may play a role in caste determination (Spannhoff et al., 2011; Dickman et al., 2013). However, we are just beginning to learn how these epigenetic processes regulate gene expression in honeybees. For example, methylation events in the bee genome have been shown to be plastic and related to behavioral castes (Herb et al., 2012), though they appear to be associated with alternative splicing rather than large alterations in transcriptional abundance (Flores et al., 2012; Foret et al., 2012).

One major hypothesis for epigenetic control of gene expression in honeybees, the kinship theory of genomic imprinting, predicts a bias in the expression of alleles in a parent-specific manner due to differences in relatedness between nestmates in order to enhance inclusive fitness (Haig, 2002; Queller, 2003). Another hypothesis predicts a bias in expression toward the maternal allele in order to maintain a match between co-adapted nuclear alleles and the maternally inherited mitochondria (Wolf, 2009). There is considerable empirical support for the former theory while there is little evidence in support of the latter (Haig, 2004). Only the kinship theory predicts a paternal expression bias, but both theories predict a maternal bias for some genes and in these cases support for one theory over the other can only be distinguished by the functions of the biased genes. In either case, this differential allelic expression must include an epigenetic mechanism because the expression bias of an allele is affected by the parent from which it is inherited and not solely by its genotype (i.e., European or Africanized alleles). Until recently, it was not known if there were gene expression effects in honeybees that were consistent with epigenetic regulation. However, there are phenotypic parental effects on aggression and metabolic rate that have been identified in studies utilizing hybrids from crosses between European and African subspecies that occur in patterns that suggest that epigenetic processes may be involved (Harrison and Hall, 1993; Guzman-Novoa et al., 2005; Oldroyd et al., 2014).

We previously tested for epigenetic effects by identifying parent-specific gene expression (PSGE) in honeybees by utilizing reciprocal F1 worker families derived from crosses of European (*A.m. carnica*) and Africanized bees (invasive hybrids between African *A.m. scutellata* and European honeybees; Kocher et al., 2015). Transcriptomes of workers in the two reciprocal families (differing in the lineage from which each parent is derived) were sequenced and read counts of heterozygous SNPs were used to test for PSGE. This experimental design allows us to assess epigenetic effects on transcription because the genotype (European or Africanized) and parent-of-origin of an allele will differ between the two families. Each allele will be inherited maternally in one family and paternally in the other. We found that PSGE is present in the honeybee (1–2% of tested loci) and that a bias toward maternal expression was common. A set of 46 genes showed consistent, symmetric parental biases in both families (maternal or paternal in both families) and several of these genes have functions that are predicted by the kinship theory of genomic imprinting. Surprisingly, a strong maternal bias occurred primarily in the family with European maternity (EA hybrids hereafter) and 215 genes were maternally biased exclusively in this family compared to only 24 genes that were maternally biased exclusively in the Africanized maternity family (AE hybrids hereafter). This was the first evidence of PSGE in honeybees and, while PSGE has been studied in many organisms, to the best of our knowledge the observed asymmetric pattern of PSGE (PSGE in only one family) has not been documented in other species.
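The logic of the reciprocal design described above can be illustrated with a short sketch. Because each allele is maternal in one family and paternal in the other, a parent-of-origin effect and a lineage (genotype) effect leave different signatures in the read counts at a heterozygous SNP. The code below is only an illustration under stated assumptions: it uses a simple exact binomial test against equal allele usage with no multiple-testing correction, which is not necessarily the statistical procedure of Kocher et al. (2015), and all function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value (sum of outcomes no more likely than k)."""
    def pmf(i: int) -> float:
        # Computed in log space so that large read counts do not underflow.
        return exp(lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                   + i * log(p) + (n - i) * log(1 - p))
    observed = pmf(k)
    total = sum(q for q in (pmf(i) for i in range(n + 1))
                if q <= observed * (1 + 1e-9))
    return min(1.0, total)


def parental_bias(maternal_reads: int, paternal_reads: int,
                  alpha: float = 0.05) -> str:
    """Return 'maternal', 'paternal', or 'none' for one SNP in one family."""
    n = maternal_reads + paternal_reads
    if n == 0 or binom_two_sided_p(maternal_reads, n) >= alpha:
        return "none"
    return "maternal" if maternal_reads > paternal_reads else "paternal"


def classify_snp(ea_counts, ae_counts, alpha: float = 0.05) -> str:
    """Classify one SNP from (European reads, Africanized reads) per family.

    EA family: European mother, so the European allele is maternal.
    AE family: Africanized mother, so the Africanized allele is maternal.
    """
    ea_eur, ea_afr = ea_counts
    ae_eur, ae_afr = ae_counts
    ea = parental_bias(ea_eur, ea_afr, alpha)
    ae = parental_bias(ae_afr, ae_eur, alpha)
    if ea == "none" and ae == "none":
        return "no bias"
    if ea == ae:
        return f"symmetric {ea} bias"
    if ae == "none":
        return f"asymmetric {ea} bias (EA only)"
    if ea == "none":
        return f"asymmetric {ae} bias (AE only)"
    # Opposite parental labels mean the same lineage's allele wins in both families.
    return "lineage (genotype) bias"
```

With hypothetical counts, `classify_snp((80, 20), (20, 80))` would be reported as a symmetric maternal bias, `classify_snp((80, 20), (50, 50))` as a maternal bias confined to the EA family, and `classify_snp((80, 20), (80, 20))` as a lineage effect, since the European allele is favored regardless of which parent transmitted it.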

Asymmetries in phenotypic effects between reciprocal hybrid families are commonly observed, including in honeybee hybrids (Harrison and Hall, 1993; Guzman-Novoa et al., 2005; Turelli and Moyle, 2007). These asymmetric phenotypic effects require that there be some asymmetry in the expression of genes inherited from the parents, which could include sex chromosomes, cytoplasmic factors, differentially imprinted genes, or maternal effects (Wolf et al., 2014). We propose that these asymmetric phenotypic effects in honeybees are due to asymmetric PSGE that is the result of inappropriate signaling in the hybrids. Specifically, we propose that in the hybrids these wide crosses disrupt nuclear-cytoplasmic signaling pathways and epigenetic processes that are utilized differentially in the parental lineages. This disruption leads to inappropriate signaling in these pathways that influences epigenetic chromosomal modifications (in an allele-specific manner), ultimately resulting in asymmetric phenotypic effects. In support of this, some of the maternally biased genes in our previous study were located within quantitative trait loci (QTL) that influence honeybee stinging behavior, a trait which is asymmetrically expressed in these hybrids (Hunt et al., 1998, 2007; Guzman-Novoa et al., 2005). In addition, the maternally biased genes for one of the three tissue samples analyzed (first instar larvae) were enriched for nuclear-encoded proteins known to localize to the mitochondria, supporting a connection between the cytoplasm, asymmetric PSGE, and asymmetric hybrid effects. Here we undertake a more comprehensive analysis of the transcriptome data to identify additional genes that show bias in these hybrids, to characterize their function and chromosomal localization with respect to QTL, and to test for differential gene expression between the two families.

#### MATERIALS AND METHODS

#### Previous and New Analyses

The work presented here utilizes a dataset originally published in Kocher et al. (2015). The sequencing data are available in the NCBI Short Read Archive, project number PRJNA277772. This dataset consisted of the cross utilizing Africanized honeybees (AHB) and the European honeybee (EHB) subspecies *A.m. carnica* (described in "crosses" below) and the analyses leading to the expression levels of the alleles in the reciprocal hybrid families (described in "PSGE of alleles" below). We produced a new gene set by altering the criteria for a gene to be considered to show PSGE and differences in these criteria are described below. New analyses were performed using this gene set, constituting the results presented here. The new work also includes new crosses utilizing the European subspecies *A.m. ligustica* and *A. m. carnica* that provided workers used in the stinging behavior assays.

### Crosses

Crosses for evaluating PSGE were previously described in Kocher et al. (2015) and biased expression patterns were further analyzed from this dataset. Briefly, EHB and AHB colonies were maintained at INIFAP facilities near Villa Guerrero, Estado de México, Mexico. There are several closely related subspecies of EHB that are often used by beekeepers and two of these were among the colonies tested for use in our crosses, Carniolan bees, *A.m. carnica*, and Italian bees, *A.m. ligustica*. The two most aggressive AHB colonies and the most docile EHB colonies (one each of *A.m. carnica* and *A.m. ligustica*) were chosen based on differences in stinging behavior and response to queen mandibular pheromone. Daughter queens and drones were raised from the parental colonies. Pairs of reciprocal crosses were performed using single-drone queen instrumental insemination between one AHB and the *A.m. ligustica* parental colony and between the AHB and the *A.m. carnica* parental colony. For crosses with *A. m. carnica*, two EA families and four AE families were tested for individual stinging behavior (see below). For crosses with *A. m. ligustica*, three EA and three AE families were tested.

### PSGE of Alleles

Expression levels of the alleles in the two reciprocal crosses are from Kocher et al. (2015). Briefly, all transcriptome data are derived from the two reciprocal crosses utilizing the *A.m. carnica* queen (designated EA and AE for those with EHB and AHB maternity, respectively). Transcriptomes were sequenced from cDNA libraries of pooled first instar larvae (two libraries per family), pooled adults (guard bees, two libraries per family), and individual adult brains (three libraries per family). Single nucleotide polymorphisms (SNPs) differentiating European and Africanized alleles were identified by sequencing genomic DNA of the queen and drone parents of these two crosses to ensure the European and Africanized alleles were homozygous and different, resulting in F1 offspring that are heterozygous at all tested SNPs. All reads were mapped to the honeybee reference genome (Amel4.0; The Honeybee Genome Sequencing Consortium, 2006). Using counts of reads at each SNP, a general linear interactive mixed model (SAS, Cary, NC, USA) was utilized to assess expression of each allele for all transcripts containing diagnostic SNPs. The analysis in Kocher et al. (2015) required the bias to be in the same direction (maternal or paternal) in both directions of the cross (EA and AE hybrids) based on significant parent FDR *<* 0.05, and a bias of at least 0.6 (maternal or paternal reads/total reads; Wang and Clark, 2014) in order to search for consistent parent-of-origin effects. For the current analysis, we relaxed these criteria and only required that the bias be present in one direction of the cross. These genes were then placed into bias categories based on the expression levels of their alleles in each family relative to the parent-of-origin of that allele (e.g., Maternal bias, EA maternal/AE maternal; European bias, EA maternal/AE paternal, etc.; see **Figure 1**).
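The category assignment described above can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used in the study: the function names, the `significant` flag (a stand-in for the mixed-model FDR test), and every category label other than the maternal and European examples quoted in the text are assumptions made for this example. Only the 0.6 cutoff is taken from the text.

```python
def call_bias(maternal_reads, paternal_reads, significant=True, cutoff=0.6):
    """Call the direction of allelic bias for one gene in one family.

    `significant` stands in for the FDR < 0.05 test from the mixed model;
    it is supplied by the caller and is not computed here.
    """
    total = maternal_reads + paternal_reads
    if total == 0 or not significant:
        return "none"
    if maternal_reads / total >= cutoff:
        return "maternal"
    if paternal_reads / total >= cutoff:
        return "paternal"
    return "none"


# Hypothetical labels keyed by (call in EA family, call in AE family).
# In EA hybrids the maternal allele is European; in AE hybrids it is Africanized.
CATEGORIES = {
    ("maternal", "maternal"): "maternal bias",
    ("paternal", "paternal"): "paternal bias",
    ("maternal", "paternal"): "European allele bias",
    ("paternal", "maternal"): "Africanized allele bias",
    ("maternal", "none"): "EA maternal only",
    ("none", "maternal"): "AE maternal only",
    ("paternal", "none"): "EA paternal only",
    ("none", "paternal"): "AE paternal only",
    ("none", "none"): "no bias",
}


def classify_gene(ea_counts, ae_counts):
    """Classify a gene from (maternal, paternal) read counts in each family."""
    return CATEGORIES[(call_bias(*ea_counts), call_bias(*ae_counts))]


# Made-up read counts for a gene that is strongly maternal in EA only.
print(classify_gene((92, 8), (51, 49)))
```

Because each allele is maternal in one family and paternal in the other, a pair of calls is enough to separate parent-of-origin effects (same direction in both families) from allele-specific effects (opposite directions).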

#### Individual Stinging Behavior

We tested the stinging behavior of individual bees from our crosses by measuring the time that each bee took to sting a black suede patch after being stimulated with electrical current (assay described in Shorter et al., 2012). In total 573 bees were tested from the four F1 reciprocal colonies. These colonies are designated AL (AHB queen × *A.m. ligustica* drone), LA (*A.m. ligustica* queen × AHB drone), AC (AHB queen × *A.m. carnica* drone), and CA (*A.m. carnica* queen × AHB drone) in **Figure 2**. Data were transformed using the natural log function to fit a normal distribution and were analyzed with a one-way analysis of variance to test for differences in the stinging behavior of the four F1 reciprocal crosses. Least squares means *t*-tests were performed to compare the means of the four groups.
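As a rough illustration of the analysis just described, the following Python sketch log-transforms sting latencies and computes a one-way ANOVA *F* statistic from first principles. It is not the software used in the study; the function names and the latency values in the usage example are hypothetical, and the follow-up least squares means *t*-tests are not shown.

```python
import math


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def anova_on_log_latencies(latency_groups):
    """Natural-log transform each latency, then run the one-way ANOVA."""
    logged = [[math.log(t) for t in g] for g in latency_groups]
    return one_way_anova(logged)


# Hypothetical sting latencies in seconds for four crosses (not real data).
example = {
    "CA": [2.1, 3.0, 2.6, 1.9],
    "LA": [4.2, 3.8, 5.1, 4.6],
    "AC": [8.9, 7.5, 9.8, 10.2],
    "AL": [9.1, 8.4, 10.5, 7.9],
}
f_stat, df_b, df_w = anova_on_log_latencies(list(example.values()))
print(f"F = {f_stat:.2f}, df = {df_b},{df_w}")
```

The log transform matters because latency data are typically right-skewed; the ANOVA is then a comparison of mean log latency among the crosses.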

### Differential Expression Analysis

We assessed differential gene expression (DGE) between comparable stages/tissues in the EA and AE families using CLC Genomics Workbench version 7.5 (CLC Bio, Boston MA, USA) employing the Empirical Analysis of DGE option, which implements the "Exact Test" of Robinson et al. (2010). Genes were considered significantly differentially expressed if the False Discovery Rate corrected *p*-value was less than 0.05.
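The text does not state which False Discovery Rate procedure was applied. Assuming the widely used Benjamini-Hochberg step-up adjustment, the correction can be sketched as follows; the function name and the *p*-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the FDR correction is the standard BH step-up procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # no larger than the adjusted value of the next-larger p-value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
significant = [q < 0.05 for q in benjamini_hochberg(raw)]
print(significant)
```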

### Overlap of Biased Genes with Known QTL

We assessed whether genes showing asymmetric maternal PSGE lie within QTL influencing traits related to colony defense (Hunt et al., 1998, 1999, 2007; Arechavaleta-Velasco et al., 2003; Shorter et al., 2012), reproduction (Oxley et al., 2008; Linksvayer et al., 2009; Rueppell et al., 2011), and pollen foraging behavior (Hunt et al., 1995, 2007; Page et al., 2000). We used the diagnostic SNPs within biased genes to determine their location in the Amel4.5 assembly. In cases where physical locations of markers were given in the QTL studies, we used these to identify the bounds of the QTL (typically the 1.5 LOD support interval). When information on physical locations of markers wasn't available, we used the sets of candidate genes from these projects to identify the range of the QTL. We then compared the positions of the biased genes with the ranges of these QTL to determine overlap.
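The position comparison described above amounts to an interval-overlap test. A minimal sketch, assuming genes and QTL are both represented as closed intervals on a named chromosome, is shown here; the class, the function names, and all coordinates are hypothetical and do not correspond to real Amel4.5 positions.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive, base pairs
    end: int    # inclusive, base pairs


def overlaps(a, b):
    """True if two intervals share at least one base on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_in_qtl(genes, qtls):
    """Map each gene name to the names of the QTL it overlaps."""
    hits = {}
    for gene_name, gene_iv in genes.items():
        matched = [q for q, q_iv in qtls.items() if overlaps(gene_iv, q_iv)]
        if matched:
            hits[gene_name] = matched
    return hits


# Hypothetical coordinates for illustration only.
qtls = {"defense_qtl": Interval("chr3", 1_000_000, 1_800_000)}
genes = {
    "geneA": Interval("chr3", 1_200_000, 1_205_000),   # inside the QTL
    "geneB": Interval("chr3", 2_500_000, 2_510_000),   # same chromosome, outside
    "geneC": Interval("chr12", 1_200_000, 1_205_000),  # different chromosome
}
print(genes_in_qtl(genes, qtls))
```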

### Genomic Clustering of Biased Genes

In addition to identifying biased genes that are within previously identified QTL, we also assessed whether any biased genes were in physical clusters within the genome by visualizing their positions on SNP-based linkage maps (Arechavaleta-Velasco et al., 2012; Tsuruda et al., 2012). Once putative clusters of significantly biased genes were identified, we also looked at the allelic expression patterns of all tested genes (regardless of significance) within the putative cluster.


FIGURE 1 | Gene counts in bias categories. Average maternal/paternal bias of all genes in each bias category in each hybrid family (EA, European maternity; AE, Africanized maternity). Gray columns = European allele, Black = Africanized allele. The total number of genes in each category and the number falling in each QTL type are given in the columns on the right. Only genes falling into a single bias category across samples are included in counts for QTL types. ∗Significantly more genes in this category are present within these QTL than expected by chance (Bonferroni corrected *p*-value = 0.005).

#### Statistical Tests of Enrichment/Overlap

We utilized goodness of fit tests to determine whether the genes in our bias categories (see **Figure 1**) were enriched for mitochondrial-localizing genes, significantly overlapped other gene sets, or were overrepresented in QTL. In all cases we only report results if they were significant after Bonferroni correction. We tested for enrichment of mitochondrial-localizing proteins by performing reciprocal BLASTs of the AmelOGS3.2 peptide sequences against a set of *Drosophila melanogaster* genes with proteins that are known to localize to mitochondria (Pagliarini et al., 2008; Smith et al., 2012). Expected values for our biased gene categories were calculated using the proportion of genes in the total honeybee gene set that match the *Drosophila* mitochondrial-localizing set. We also tested for significant overlap of our genes in each bias category with our own differentially expressed gene set, genes differentially expressed between aggressive and non-aggressive bees (Alaux et al., 2009), and for overrepresentation within QTL. We used the proportion of the total official gene set represented in each of these groups to calculate the expected number of genes in each of our bias categories. We also tested for Gene Ontology (GO) term enrichment using the best reciprocal matching *D. melanogaster* genes by utilizing the Gene Ontology Consortium's enrichment analysis pipeline (geneontology.org).
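The enrichment logic above can be illustrated with a small worked example. The sketch below assumes a chi-square goodness-of-fit test with one degree of freedom (the text says only "goodness of fit tests") and a simple Bonferroni multiplication; the function names and all counts are hypothetical, so the output is not expected to reproduce the reported *p*-values.

```python
import math


def goodness_of_fit(observed_in, n_category, background_fraction):
    """Chi-square goodness-of-fit test (1 df) for membership in a gene group.

    observed_in: genes in the bias category that belong to the group
    n_category: total genes in the bias category
    background_fraction: proportion of the whole gene set in the group
    Returns (expected_in, chi_square, p_value).
    """
    if not 0 < background_fraction < 1:
        raise ValueError("background_fraction must be strictly between 0 and 1")
    expected_in = n_category * background_fraction
    expected_out = n_category - expected_in
    observed_out = n_category - observed_in
    chi2 = ((observed_in - expected_in) ** 2 / expected_in
            + (observed_out - expected_out) ** 2 / expected_out)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return expected_in, chi2, p_value


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical: 20 of 100 category genes fall in a group that holds
# 10% of the background gene set, and five categories are tested.
expected, chi2, p = goodness_of_fit(20, 100, 0.10)
print(expected, round(chi2, 2), bonferroni(p, 5) < 0.05)
```

The expected count is simply the category size multiplied by the background proportion, which is how the expected values in the text are described.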

#### Use of Animals in Research

This research did not require IRB approval because we only used invertebrates in this study, which are exempt from IRB approval. Despite not requiring approval, we made every effort to minimize any potential suffering of the bees used in this research.

Data presented are untransformed; letters designate significant differences.

#### RESULTS

#### Genes Showing PSGE Bias

We found that out of the 2663 unique transcripts that we could test, 509 exhibited biased expression with one of the parental alleles used more than the other (≥0.6 bias) in at least one reciprocal hybrid family. In addition to the previously reported genes that show a parent-of-origin effect (either maternal or paternal in both families), we found evidence for biased PSGE in all other potential categories of bias (maternal or paternal only, allele-specific, or no bias in either family; **Figure 1**). Over 40% (223 genes) of the biased genes had a maternal bias only in the hybrids with European maternity (EA hybrids; **Figure 1**). Out of these 509 transcripts, 33 fell into more than one category of bias due to differences between samples (Supplementary Tables S1 and S2). In the majority of these cases, the category shift was due to a small level of bias in the AE family (near the 60% cutoff value) relative to the greater bias in the EA family. This is evident because the number of transcripts falling into more than one category decreases by 85% (to 5 total) simply by increasing the cutoff value to 70% bias. The EA maternal bias is much more robust, as indicated by the decrease of only *<*2% in the EA maternal-only category with the same change in criteria (Supplementary Table S2 and Figure S1). Nevertheless, to ensure unambiguous results we removed these genes for tests of enrichment of mitochondrial-localizing genes and presence in QTLs.

#### Differential Expression Analysis

A total of 160 unique genes were differentially expressed between the EA and AE families in at least one stage and six of these genes were differentially expressed in both guards and another stage while only one gene was shared between larvae and brains (Supplementary Table S3). One-hundred and one of these genes have a more than twofold change in expression.

#### Parental Effect on Stinging

There were significant differences in the honeybee individual stinging behavior between the four F1 reciprocal crosses (*F* = 53.64; df = 3,567; *p <* 0.01). The bees of the colonies with European maternity (CA and LA) stung significantly faster than the bees of the colonies with Africanized mothers (AC and AL; *p <* 0.05). There were also differences between the two crosses with European maternity: the bees with *A.m. carnica* maternity (CA) stung faster than the bees of the cross with *A.m. ligustica* maternity (LA; *p <* 0.05). There were no differences in the time to sting for the two crosses with Africanized maternity (AC and AL; *p >* 0.05; **Figure 2**).

#### Genomic Clustering of Biased Genes

We identified two regions containing clusters of genes that all appear to be biased, both of which overlap with defense-related QTL. One on chromosome 3 lies within the Sting-2 QTL, a region associated with increased colony-level stinging behavior (Hunt et al., 1998, 2007). There are 12 genes within a region of ∼410 kb that show a significant maternal bias of greater than 90% in the European maternity family (**Figure 3A**). There is only one additional gene within this region that could be tested and this gene also shows *>*90% maternal bias in this family. The second cluster lies on chromosome 12 within the bounds of a QTL associated with production of the active component of alarm pheromone, isopentyl acetate (Hunt et al., 1999). This region is ∼600 kb in length and there are 29 genes that could be tested within this region. Similar to the cluster within the Sting-2 QTL, 27 of these genes show a significant maternal bias (*>*90% maternal in 23 of these genes) in the European maternity family and the remaining two genes show the same pattern of extreme maternal bias (**Figure 3B**).

#### Overlap of Biased Genes with Other Data Sets

We found 164 of our biased genes overlapping with known QTL associated with traits for defense, reproduction and foraging behavior based on the position of these genes in OGS3.2 (Elsik et al., 2014). Within these QTL, just two of the gene bias categories were over-represented relative to the expected number based on OGS3.2 (**Figure 1**). The European maternal biased (only maternal bias in EA) gene set is overrepresented in defensive QTL with 55 genes (expect 36.3, *p <* 0.002), and the EA maternal AE paternal (European biased) gene set is overrepresented in pollen hoarding QTL with eight genes (expect 2.9, *p <* 0.0031).

We tested whether any of our biased categories are enriched for genes whose proteins are known to localize to mitochondria and found that the genes that are maternally biased only in the EA family are significantly enriched in each of the three samples (larvae [10/78], *p* = 0.0005; brains [16/140], *p* = 0.0001; adults [7/49], *p* = 0.0016). Moreover, 15 of the 17 genes that have the same bias in all three sample types are maternally biased only in the EA family and are highly enriched for mitochondrial-localizing genes (6/15, *p* = 9.4 × 10<sup>−11</sup>). Despite genomic clustering of some of our biased genes, there is no clustering of mitochondrial-localizing genes. We also determined the extent to which our gene list overlapped with genes that were differentially regulated in aggressive vs. non-aggressive bees (Alaux et al., 2009). Significantly more genes overlapped between that study's gene set and our biased gene list than expected by chance (115 genes, *p* = 0.017), though this is not significant if we correct for testing each individual bias category. None of the individual categories are significantly enriched for overlapping genes, even without multiple test correction (Supplementary Table S1). Similarly, 43 of our differentially expressed genes overlap with those of Alaux et al. (2009), though there is no pattern to the overlap with regard to the up or down regulation of the genes in each study (Supplementary Table S3).

allele usage of each reciprocal family within each sample, and OGS 3.2 gene ID is given for every tested gene within these clusters. Relative allele usage calculated as maternal read count/total read count. ∗Statistically significant allelic bias. NT, not tested. N/A in gene ID column are transcripts that had no clear match to a protein coding gene.

#### DISCUSSION

We previously found evidence of PSGE in honeybees (Kocher et al., 2015). However, we also found over 200 genes that showed highly biased expression toward the maternal allele, but only in the family with European maternity (EA hybrids). This asymmetric bias in expression is not predicted by theories of genomic imprinting. Similarly, if the pattern we observed were due to allelic effects (Africanized or European alleles preferentially expressed), we would expect to see a maternal bias in one family and a paternal bias in the other, but we did not. There have been several cases reported in which asymmetric hybrid phenotypic effects have resulted from disrupted genomic imprinting in hybrids. One example is found in two species of deer mice, *Peromyscus maniculatus* and *P. polionotus*, in which the hybrid family with a *P. polionotus* mother exhibits offspring overgrowth while the reciprocal family exhibits undersized offspring (Loschiavo et al., 2007; Duselis and Vrana, 2010; reviewed in Wolf et al., 2014). These asymmetric hybrid effects are expected because of the imprinting that is predicted to occur in *P. maniculatus*, but not in *P. polionotus*, based on the conflict hypothesis of imprinting (Moore and Haig, 1991). *P. maniculatus* is polygamous while *P. polionotus* is monogamous, and therefore there is selection for *P. maniculatus* fathers to increase their own offspring's growth at the expense of other males' offspring from the same mother. This inclusive fitness incentive doesn't exist in a monogamous system. In *P. maniculatus* there is also a selective advantage for the mother to counteract this with mechanisms that allow all of her offspring to receive equal nutrition. The hybrid families end up with asymmetric offspring growth because the genomic conflict (and hence the balanced offspring growth) is disrupted in these crosses. *P. polionotus* parents don't have this conflict and so don't counteract the growth effects of their *P. maniculatus* partners, resulting in small offspring with *P. maniculatus* mothers and large offspring with *P. maniculatus* fathers. We extend this idea to hybrids between races of honeybees by proposing that the disruption of these parental effects (whether imprinting or other heritable factors) results in inappropriate signaling within established nuclear-cytoplasmic signaling pathways that leads to allele-specific changes in expression in only one of the hybrid families. To the best of our knowledge, there are no other examples of PSGE biases that show a widespread asymmetric pattern such as we find in these reciprocal hybrids.

Our biased gene set is dominated by a single category of bias, genes that are maternally biased only in the European maternity family (EA hybrids; **Figure 1**). The second most abundant category is bias toward European alleles, which may be influenced by similar processes. Previous studies investigating several trait differences between AHB and EHB found phenotypic patterns in hybrid crosses that are similar to the asymmetry we see in allelic bias. EA bees repeatedly exhibit high Africanized-like aggression while AE hybrids exhibit levels of aggression intermediate to the parents (Guzman-Novoa et al., 2005). We tested whether our biased genes may play a role in this asymmetric aggression by assessing the positions of the biased genes in the Amel4.5 assembly to see how they may fit with previously identified QTL associated with aggressive behavior, as well as QTL associated with reproduction and foraging (Elsik et al., 2014). We found that 89 out of the total of 509 biased genes lie within QTL for defensive traits, 58 genes are within QTL for reproductive traits and 17 are within QTL for pollen foraging behavior (**Figure 1**). There was significantly more overlap than expected by chance between the EA maternal-only bias category and defensive QTL (*p* = 0.0029) and between the overall European bias category (EA maternal and AE paternal) and pollen foraging QTL (*p* = 3.39 × 10<sup>−5</sup>). The connection between the European bias group and pollen foraging is interesting given that the propensity for pollen collection has been shown to vary between European and Africanized bees (Pesante et al., 1987; Page et al., 2000).
However, unlike the genes with an EA maternal-only bias, when we increase our bias cutoff criteria from 60 to 70%, the number of genes in the pollen foraging QTL category is reduced by more than 60% and the enrichment within these QTL disappears, indicating that these genes are not highly biased in either family (Supplementary Table S2). We also tested whether our biased gene set is enriched for genes that are differentially expressed between aggressive and non-aggressive bees (Alaux et al., 2009). While the entire set of biased genes shows a slight enrichment for these genes (115 biased genes overlapping with 2254 differentially expressed genes; *p* = 0.017), no individual category of bias is significantly enriched.

Another asymmetric phenotype is that EA hybrids have asymmetrically low flight metabolic capacity (based on whole body CO<sub>2</sub> measurements) relative to both the parents and AE hybrids, which could indicate that there is an incompatibility between maternally derived European mitochondria and paternally derived nuclear genes in EA hybrids (Harrison and Hall, 1993). A separate study of aggression in honeybees found that aggression and brain metabolic rates are related, as the brains of highly aggressive bees showed significantly reduced oxidative metabolism relative to non-aggressive bees (Alaux et al., 2009). Reducing the rate of oxidative phosphorylation both in bees and in *Drosophila*, even in the whole body, increased aggression and therefore it appears that brain metabolic rate plays a causal role in aggressive behavior in insects (Li-Byarlay et al., 2014). Given this connection, differential gene expression associated with aggression in the parents (Alaux et al., 2009) would lead us to expect to find that metabolic genes show differential expression between our reciprocal families. Despite this expectation, genes that were differentially expressed between these families are not enriched for any functional GO category or for mitochondrial-localizing proteins (Supplementary Table S3; see Supplementary File 1 for discussion of differentially expressed genes). The lack of enrichment for genes showing expression differences between aggressive and non-aggressive bees or for any functional GO categories makes interpretation of our differentially expressed gene set difficult. Interestingly, only 13 of the 509 biased genes are also differentially expressed between the two families. This is significantly more overlap than expected given the small number of differentially expressed genes (*p* = 0.0008, Supplementary Table S1); however, there was no pattern to the overlap between up/down regulation and bias category.
The fact that the vast majority of the biased genes are not differentially expressed means that in general there is a combination of allele-specific silencing and dosage compensation and that this process is occurring for many genes in only one of the reciprocal families. These results are reminiscent of the increased expression (e.g., *Drosophila* males) or silencing (e.g., mammalian females) that occurs on sex chromosomes to maintain comparable expression in both sexes (Disteche, 2012).

If nuclear-mitochondrial interactions are involved in the asymmetric phenotype of EA aggressive behavior through changes in metabolism, then we expect that this phenotype would have an inherent physiological basis and not necessarily be influenced by social interactions. Therefore we applied an aversive stimulus to individual bees in a lab assay to test for an asymmetric phenotype outside the colony environment. We used hybrids with two European mitochondrial backgrounds (*A.m. ligustica* and *A.m. carnica*) for these tests. As seen at the colony level, bees with European mothers reacted more aggressively (faster to sting) than those with Africanized mothers. In addition, bees with *A.m. carnica* mothers were significantly more aggressive than those with *A.m. ligustica* mothers (**Figure 2**), even though the *A.m. carnica* parental source was less aggressive than the *A.m. ligustica* parent (data not shown). The stinging behavior QTL discussed above were also identified in a cross with *A.m. carnica* mothers (Hunt et al., 1998).

In addition to the significant overlap with QTL associated with aggressive behavior, another clue that the asymmetric PSGE in the EA family may be tied to both the asymmetric hybrid aggression and metabolic deficit is the fact that we found highly significant enrichment for mitochondrial-localizing proteins in every sample (Supplementary Table S1). This contrasts with results of our previous analyses that focused on 46 genes that showed consistent parental effects in both families. That study only found significant enrichment of mitochondrial proteins for biased genes in larvae of the EA family (Kocher et al., 2015). It is important to note that if the asymmetric bias in EA hybrids were due solely to incompatible interactions between nuclear genes and their proteins that directly interact with mitochondria, we would expect this enrichment to be very high (approaching 100%). However, enrichment only reaches ∼8% in this biased gene set, which implies that this asymmetric bias may be due to dysfunctional signaling involving the mitochondrial and nuclear genomes rather than a direct result of nuclear-mitochondrial dysfunction. The reduced oxidative brain metabolism associated with aggression in bees isn't necessarily an overall reduction in energy metabolism, as studies show this reduction in oxidative metabolism is accompanied by a shift toward aerobic glycolysis (AG; glycolysis in the presence of oxygen; Chandrasekaran et al., 2015). The shift toward AG is mediated by mitochondrial retrograde signaling (signals from mitochondria that regulate nuclear transcription), a process that is normally used to maintain energy homeostasis (Liu and Butow, 2006). These connections may implicate retrograde signaling in the modulation of aggression in bees (Li-Byarlay et al., 2014).

In addition to this newfound connection with aggression in bees, the shift away from oxidative metabolism and toward AG is a well-established phenomenon in cancer cells, known as the Warburg effect (Warburg, 1956). This metabolic shift is thought to play an important role in cell proliferation in both cancer cells and normal, non-cancer cells (Lunt and Vander Heiden, 2011). AG seems to be especially important in brain tissue, as developing brain tissue shows this same metabolic transition and brain areas with increased synaptic activity exhibit major changes in lactate concentrations, a byproduct of AG (Barros, 2013; Gershon et al., 2013; Goyal et al., 2014). GO analysis of our EA maternal-only biased gene set revealed an enrichment for 80 GO terms but these fall into a few broad categories that include cellular morphogenesis (particularly neurogenesis), behavior, and regulation and cell signaling (Supplementary Table S4). These categories are consistent with both the behavioral changes in the EA family and the cellular processes associated with a metabolic switch toward AG. Although genes that were maternally biased in EA were enriched for mitochondrial-localizing proteins, this set is not enriched for any GO categories directly involved in energy metabolism. However, the biased gene set does contain many genes that likely play a role in the retrograde response that elicits the switch toward AG. Moreover, this gene set has many genes that play a role in transcriptional regulation, including the piwi-interacting RNA (piRNA) pathway, which is a small RNA pathway involved in chromatin modifications (Huang et al., 2013).
While we couldn't test for these small RNAs due to the size selection involved in our library preparation, these genes may provide a link between the metabolic shift to AG involving mitochondrial signaling and a potential mechanism for the asymmetric maternal-only bias that we see in gene expression (see Supplementary File 1 for a discussion of our interpretation of the connection these genes provide).

The piRNA pathway acts by modifying chromatin to inhibit transcription, as compared to posttranscriptional silencing initiated by other small RNA pathways (Huang et al., 2013). The piRNA pathway is primarily involved in the suppression of transposable elements, but recent studies have shown that piRNAs also play important roles in epigenetic modulation and genomic imprinting (Brennecke et al., 2008; Chang et al., 2009; Huang et al., 2013; Le Thomas et al., 2013). Chromatin modifications may help to explain our lack of differential expression of biased genes between the hybrid families: if the paternal allele is unable to be expressed due to these modifications, then any signal in the cell to increase expression (e.g., transcription factor binding) will only be able to act on the maternal allele, resulting in both allelic bias and a lack of differential expression. Chromatin modifications may also explain why two of the defensive QTL identified above contained large clusters of significantly biased genes in which every gene that could be tested showed the same pattern of *>*90% maternal bias only in the EA family (**Figure 3**). In both of these QTL we were only able to test a subset of all the genes in the region due to non-informative SNPs and/or insufficient read counts (Sting-2, 14/57; alarm pheromone, 27/59); however, the consistent level of bias in tested genes and their broad distribution across these clusters indicate that this bias likely occurs across all genes within these clusters. Within these clusters the same genes are biased across all sample types (though not always significant due to read counts), which indicates that this pattern is likely present throughout the lifespan of the individuals and across all tissues. It is possible that there is an inversion in these regions in the AHB lineage resulting in this pattern of expression bias, but we are unable to test for this due to low coverage of our genomic DNA from this lineage.
This possibility seems unlikely, however, as we would expect to see a comparable reduction in Africanized (i.e., maternal) expression in the AE family in these regions and previous independent studies of recombination in EA hybrids haven't indicated the expected loss of recombination within these regions (Hunt et al., 1998; Shorter et al., 2012; Ross et al., 2015). Given the size of these clusters (∼500 kb) and the near-complete silencing of the paternal alleles across tissues and life stages, it seems likely that differential chromatin modifications in the homologous chromosomes contribute to this asymmetric pattern. While it is possible that these chromatin modifications occur independent of the piRNA pathway (e.g., chromosomal conformational changes that can be assessed using the Hi-C technique; Belton et al., 2012), these also seem less likely to be responsible for the overall asymmetric bias than the piRNA pathway due to the genes involved in this bias (see Supplementary File 1) and the fact that *>*80% of these asymmetrically biased genes lie outside of these clusters (223 genes total with 39 in clusters). Given that the piRNA pathway acts in a sequence-specific manner and may therefore be able to act on individual genes (Huang et al., 2013), we consider these other possibilities as alternative hypotheses to the model we propose below.

Piwi-interacting RNAs are loaded into oocytes, where they serve as a sequence-specific transgenerational epigenetic memory of both gene silencing and activation (maternal licensing), and in self/non-self recognition. piRNAs have also recently been found in mature sperm (Johnson and Spence, 2011; Shirayama et al., 2012; Pantano et al., 2015). The piRNA pathway has been shown to be involved in epigenetic regulation of phenotypic traits in both mice (white tail tip, WTT; Yuan et al., 2015) and fruit flies (ectopic long bristle outgrowths on the eyes, ELBOs; Sollars et al., 2003; Gangaraju et al., 2011). Both of these phenotypes occur at a low frequency in populations (naturally for WTT mice and artificially induced for ELBOs in *Drosophila*) and represent an epigenetic capacity within these populations that is normally suppressed through the piwi/piRNA pathway. The epigenetic capacity for these phenotypes can be released through selection for these traits (Sollars et al., 2003; Ruden et al., 2015; Yuan et al., 2015). These phenotypes are initially expressed in individuals with certain mutant alleles but the phenotype can occur in offspring that lack the causal mutant (Ruden et al., 2015; Yuan et al., 2015). Perhaps most intriguingly, the ELBOs in *Drosophila* can be maintained in the population (over 100 generations) as long as selection for the trait is maintained (Ruden et al., 2008). This epigenetic selection may help to explain our asymmetric hybrid effects.

We speculate that the metabolic switch toward AG in honeybee brains and the associated aggression is a phenotypic trait that has a partially epigenetic basis, mediated through the piwi/piRNA pathway. An epigenetic switch to aggression is at least implied by theory regarding genomic imprinting in honeybees. Honeybees exhibit extreme polyandry, and drones should have an evolutionary drive to produce daughters that are more selfish with regard to producing their own offspring, including more aggressive daughter queens that may be successful in queen duels and thereby inherit the nest, including the worker bees and other resources. In a population there should also be simultaneous selective pressure on the queens to suppress this selfish and aggressive behavior, resulting in intragenomic conflict similar to the *Peromyscus* mouse example given earlier (Queller, 2003). This level of intracolonial aggression also needs to be balanced with the need for appropriate extra-colonial aggression (i.e., colony defense). The genomic conflict could occur through paternal piRNA silencing to increase aggression and maternal piRNA licensing to mitigate the paternal silencing and reduce aggression. Similar to the WTT and ELBOs discussed above, this is a phenotype that would normally be suppressed (or canalized) but which would vary in extent between populations due to differing selective pressures (e.g., within AHB and EHB lineages; **Figure 4**).

The implication of our admittedly speculative model is that wider crosses in honeybees can result in increased aggression in one of the hybrid families because the cross disrupts the balance of genomic conflict for/against aggression. This could occur in any case where the extent of genomic conflict differs between lineages and might explain why beekeepers who cross different races of bees sometimes report higher aggression in one of the reciprocal families, though individual crosses would need to be investigated to gain a full understanding of this phenomenon (Adam, 1983). The selection for highly aggressive AHB colonies to use for our crosses might have resulted in the epigenetic release of this phenotype in that parent colony (i.e., simultaneously selected for increased paternal silencing of alleles that leads to aggression and less maternal opposition to this silencing), which explains the asymmetrically high aggression in the EA family (**Figure 4**). The importance of selecting for aggressive traits in the AHB parent may explain why another study that used EHB × AHB crosses similar to ours, but that did not select for differential aggression, did not have these same asymmetries in allele-specific expression (Galbraith and Grozinger, personal communication).

This piRNA-mediated aggression model can be tested in multiple ways. Isolating and sequencing small RNAs from both sperm and eggs of the parent colonies would allow us to determine whether piRNAs are present and whether they target the genes that show biased allele usage. If they are present, the total RNA from sperm derived from drones of AHB colonies selected for high and low aggression could be injected into eggs from an EHB queen crossed with an AHB drone from a colony selected to be docile (the eggs must still be F1 hybrids to ensure both alleles are present for sequence specificity). Allele use can then be assessed in the resulting offspring, with the prediction that eggs injected with RNA derived from the "aggressive drones" will result in biased allele use in these offspring while those injected with RNA from the "docile drones" will not. Further experiments could then be performed to analyze brain metabolism and aggression in both sets of offspring. This model can also be tested through RNAi-mediated knockdown of the piRNA machinery in the parent EHB and AHB queens and drones (from colonies selected for high and low aggression) and subsequent testing of hybrid allele usage, metabolism, and aggression. Biased allele use and aggression would be expected to be lower when the machinery is knocked down paternally. Knockdowns could also be used in crosses between the highly aggressive AHB colonies. The cross of AHB<sup>piRNA−</sup> queens × AHB<sup>piRNA+</sup> drones would be expected to have high aggression (perhaps even higher than the parents) due to a loss of maternal licensing, and the cross of AHB<sup>piRNA+</sup> queens × AHB<sup>piRNA−</sup> drones would be expected to have lower aggression (**Figure 4**).

The patterns that we observed in this family likely represent an inappropriate utilization of the retrograde signaling pathway because the maternal bias in gene expression in the EA family seems to occur in all tissues across life stages and not just in the brains as described for these metabolic changes in aggressive bees (Alaux et al., 2009; Li-Byarlay et al., 2014). Similarly, the metabolic deficiency in EA hybrids occurs at the whole body level even though aggressive bees are known to have increased oxidative metabolic rates at the whole body level in response to alarm pheromone (Moritz et al., 1985). Taken together, these results indicate that the whole organism-level asymmetric hybrid effects on allelic gene expression, metabolism, and aggression may be due to perturbations of established nuclear-mitochondrial signaling pathways that normally modulate brain metabolism and aggression in honeybees. While these results mark an important step in our understanding of aggression and describe a new pattern of hybrid gene regulation, additional work is necessary to better understand how differential allele expression acts on these traits, how this allelic expression is controlled and how these signaling pathways modulate aggression in the context of honeybee natural history.

#### AUTHOR CONTRIBUTIONS

GH and MA planned hybrid crosses, collected samples and performed stinging behavior assays. MA performed the crosses. JG, JT, and GH performed bioinformatics analyses. JG and GH wrote the manuscript with input from all authors.

#### ACKNOWLEDGMENTS

We would like to thank Sarah Kocher for helpful comments on the manuscript. This project was funded by NSF IOS – 1145509.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fgene.2015.00343

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Gibson, Arechavaleta-Velasco, Tsuruda and Hunt. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## The evolutionary dynamics of major regulators for sexual development among Hymenoptera species

Matthias Biewer<sup>1,2</sup>, Francisca Schlesinger<sup>1,3</sup> and Martin Hasselmann<sup>1,2</sup>\*

*<sup>1</sup> Population Genetics of Social Insects, Institute of Genetics, University of Cologne, Cologne, Germany, <sup>2</sup> Livestock Population Genomics Group, Institute of Animal Science, University of Hohenheim, Stuttgart, Germany, <sup>3</sup> Institute of Bee Research, Hohen Neuendorf, Germany*

#### Edited by:

*Juergen Rudolf Gadau, Arizona State University, USA*

#### Reviewed by:

*Frederic Guy Brunet, The Pennsylvania State University, USA Ehab Abouheif, McGill University, Canada*

#### \*Correspondence:

*Martin Hasselmann, Livestock Population Genomics Group, Institute of Animal Science, University of Hohenheim, Garbenstrasse 17, 70599 Stuttgart, Germany martin.hasselmann@uni-hohenheim.de*

#### Specialty section:

*This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics*

> Received: *29 September 2014* Accepted: *16 March 2015* Published: *10 April 2015*

#### Citation:

*Biewer M, Schlesinger F and Hasselmann M (2015) The evolutionary dynamics of major regulators for sexual development among Hymenoptera species. Front. Genet. 6:124. doi: 10.3389/fgene.2015.00124*

All hymenopteran species, such as bees, wasps and ants, are characterized by the common principle of haplodiploid sex determination, in which haploid males arise from unfertilized eggs and females from fertilized eggs. The underlying molecular mechanism has been studied in detail in the western honey bee *Apis mellifera*, in which the gene *complementary sex determiner* (*csd*) acts as the primary signal of the sex determining pathway, initiating female development in *csd* heterozygotes. *Csd* arose from gene duplication of the *feminizer* (*fem*) gene, a *transformer* (*tra*) ortholog, and mediates, in conjunction with *transformer2* (*tra2*), sex-specific splicing of *fem*. Comparative molecular analyses identified *fem*/*tra* and its downstream target *doublesex* (*dsx*) as a conserved unit within the sex determining pathway of holometabolous insects. In this study, we aim to examine evolutionary differences among these key regulators. Our main hypothesis is that sex determining key regulators in Hymenoptera species show signs of coevolution within single phylogenetic lineages. We take advantage of several newly sequenced genomes of bee species to test this hypothesis using bioinformatic approaches. We found evidence that duplications of *fem* are restricted to certain bee lineages, and notable amino acid differences of *tra2* between *Apis* and non-*Apis* species suggest structural changes in the Tra2 protein that affect its co-regulatory function on target genes. These findings may help to gain deeper insights into the ancestral mode of hymenopteran sex determination and support the common view of the remarkable evolutionary flexibility in this regulatory pathway.

Keywords: gene duplications, sex determination, adaptive evolution, regulatory changes, pathway evolution

#### Introduction

Understanding the evolution of biological pathways and the processes shaping them remains one of the central questions in biology. Studying genetic networks and their underlying selective and developmental processes can provide important insights into the diverse evolutionary trajectories of molecules between species (Pires-da Silva and Sommer, 2003; Wilkins, 2007; Fani and Fondi, 2009; Davidson, 2010; Peter and Davidson, 2011). As one common process, gene duplication has been identified to play a key role in providing novel or modified gene functions resulting from various forms of selection acting on the paralogous copies (Innan and Kondrashov, 2010). Following the model of neofunctionalization, a paralogous copy may acquire a novel function not present in the gene from which it arose. Besides positive selection promoting the fixation of advantageous mutations in this copy, exon or domain shuffling may further contribute to the evolution of a neofunctionalized gene. Among others, the well-established duplication-degeneration-complementation (DDC) model provides a basis for the evolution of a modified (subfunctionalized) function in the paralogous gene (Force et al., 2005).

The sex determination pathway in honey bees constitutes a well-studied example, in which gene duplication has been identified to play a major role in its evolutionary history (Hasselmann et al., 2008). Common for all hymenopteran species (ants, wasps, and bees) is the principle of haplodiploidy in which males are haploid and develop from unfertilized eggs, whereas females are diploid and arise from fertilized eggs (Bull, 1983). The underlying molecular signals and regulatory key genes involved in the sex determination pathway have been studied in greater detail for only two hymenopteran species, the parasitic wasp Nasonia vitripennis (Beukeboom et al., 2007; Verhulst et al., 2013; van de Zande and Verhulst, 2014) and the western European honey bee Apis mellifera (Beye et al., 2003; Hasselmann and Beye, 2004; Hasselmann et al., 2008; Gempe et al., 2009) with an estimated divergence time of about 170 million years ago (Werren et al., 2010).

With the newly available genomes of bee species covering a divergence time of about 100 million years, we are closing the existing gap between Apis and Nasonia. Consequently, we can now study the evolution of the sex determination pathway and the driving forces shaping key components on a refined scale. Thus, one of the obvious questions is whether lineage specific events such as gene duplications can be observed to affect key regulator coevolution. Within the sex determination pathway of insects, a conserved unit of genes has been identified, giving rise to a transductional core downstream of the primary signal (Bopp et al., 2014) that transmits the information of the primary signal and releases male/female specific developmental regulatory signals to a variety of target genes. We hypothesize that the core unit of sex determining genes is relatively conserved in all hymenopterans; however, evolutionary processes may have shaped these genes and the additional cofactors lineage specifically.

In the honeybee, the primary signal is the gene complementary sex determiner (csd), which arose from gene duplication of its copy feminizer (fem) (Beye et al., 2003; Hasselmann et al., 2008). The molecular decision of male or female development is mediated by a multiallelic system of protein-protein interaction, in which a heterozygous conformation leads to female development, while homo- and hemizygotes develop into males. The evolutionary history of the paralogous genes has been shaped by contrasting forms of selection in Apis: after the duplication, csd experienced strong positive selection, following the model of neofunctionalization, whereas fem evolved under strong purifying selection (Hasselmann et al., 2010). The formation of specific protein regions such as a hypervariable region (HVR) and a protein-interacting coiled-coil motif is important for the rise and function of csd alleles. Molecular functional analysis provided evidence for sex-specific splicing of fem, initiated by the allelic state of csd, in which heterozygous csd leads to female-specific fem-mRNA splicing. Acting as a binary switch gene, fem transcripts are maintained and enhanced by an autoregulatory feedback loop of the Fem protein (Hasselmann et al., 2008; Gempe et al., 2009). This serine-arginine (SR) rich protein and its ortholog transformer (tra) are differentially spliced, either to a female functional or to a male nonfunctional isoform, as found for other insect species (Butler et al., 1986; Pane et al., 2002; Sarno et al., 2010; Verhulst et al., 2010). The processing of sex-specific information by the fem/tra gene is conserved in these insects and the sex determining pathway converged at this level (Gempe et al., 2009).
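The complementary rule described above (heterozygotes develop as females, homozygotes and hemizygotes as males) can be written as a small decision function. This is only an illustrative sketch: the function name `csd_sex` and the allele labels are hypothetical, and the code captures the genotype rule alone, not the downstream splicing of fem.

```python
from typing import Optional


def csd_sex(allele_a: str, allele_b: Optional[str] = None) -> str:
    """Predict sex from the csd genotype under the complementary model.

    allele_b=None stands for a haploid (hemizygous) individual that
    developed from an unfertilized egg. Allele labels are hypothetical.
    """
    if allele_b is None:
        return "male"    # hemizygous: only one csd allele present
    if allele_a == allele_b:
        return "male"    # homozygous diploid: two identical alleles
    return "female"      # heterozygous: two different alleles


# Hypothetical allele labels, used for illustration only.
for genotype in [("csd_1", "csd_2"), ("csd_1", "csd_1"), ("csd_1", None)]:
    print(genotype, "->", csd_sex(*genotype))
```

Modelling the hemizygous case with `None` keeps the haploid and the homozygous diploid routes to male development distinguishable, even though both give the same outcome.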

The absence of an RNA recognition motif (RRM) domain in Apis fem requires a cofactor protein for RNA binding to mediate the sex-specific splicing process. It has been shown by Nissen et al. (2012) that the Transformer2 (Tra2) protein, in conjunction with the Csd protein, mediates the sex-specific splicing of fem-mRNA. Tra2 is evolutionarily conserved among insects and characterized by a single, 80–90 amino acid long RRM domain, flanked by two SR domains. Two sequence elements (RNP1 and RNP2) have been shown to be directly involved in RNA recognition. With this ability to recognize RNA motifs, Tra2 facilitates the fem/tra autoregulatory splicing loop, which can be found in other insect species, except Drosophila (Gempe et al., 2009; Salvemini et al., 2009; Hediger et al., 2010; Sarno et al., 2010).

The female-specific active Fem (Tra)/Tra2 complex regulates the differential splicing of doublesex (dsx), the downstream target for sex-specific development. As the most downstream component of the pathway regulating sex-specific phenotypes, dsx represents the key gene in sex determination of insects (Burtis and Baker, 1989; Cline and Meyer, 1996). Acting as a transcription factor, dsx encodes a protein with a zinc-finger DNA-binding domain (DM domain). In all insect species studied so far, gene structure and the pattern of sex-specific splicing are generally conserved (Cho et al., 2007). Female and male transcripts encode two oligomerization domains (OD1 and OD2) harboring DNA and protein interaction functions. The use of different splice sites at the C-terminal region results in OD2 sequence variation that alters the female- and male-specific regulation of target genes. The sex-specific splicing of dsx pre-mRNA into male or female isoforms thus forms an essential part of the transductional core of the pathway.

Among hymenopteran non-Apis species, the molecular basis of sex determination is best understood for the phylogenetically most basal parasitic wasp Nasonia, in which an alternative mode of haplodiploid sex determination evolved (Verhulst et al., 2013). Similar to what is known for many dipteran insects, transformer mRNA of Nasonia vitripennis (Nvtra mRNA) is maternally provided to all eggs; however, only in fertilized eggs can Nvtra transcription be initiated and female Nvtra mRNA be maintained by an autoregulatory feedback loop. In unfertilized eggs, the maternally provided genome induces a low level of Nvtra expression, leading to the hypothesis of genomic imprinting as the sex determination mechanism (Verhulst et al., 2010). Recent findings indicate that alleles of a trans-acting factor (womanizer), likely to be maternally provided, may have been recruited as a novel component in the sex determining pathway (Verhulst et al., 2013).

There is increasing evidence that the initial signals of sex determining pathways may evolve rapidly, contributing to the astonishing diversity of species. The underlying processes driving this rapid evolution may be gene duplications, accompanied by the gain of novel or modified function and changes in the selective regime under which the key genes evolve. In our study we provide evidence for the importance of instantaneously occurring events such as gene duplications and lineage specific mutations that affect key regulator coevolution within the sex determination pathway of hymenopteran species.

#### Materials and Methods

#### Sequence Data

Genome assemblies and annotations of recently sequenced bee species (Kapheim et al., in revision) were used to identify gene copies of interest (feminizer, fem; transformer2, tra2; doublesex, dsx), taking Amel v. 4.5, OGS 3.2 as reference and using various BLAST parameters to avoid non-detection errors. Hidden Markov profile searches (Eddy, 1998) were performed to search specifically for fem paralogs in bee genomes using HMMER3 at the protein (hmmsearch) and nucleotide (nhmmer) levels (Eddy, 2009). Multiple sequence alignments were generated using MUSCLE (Edgar, 2004) and optimized manually. To reduce the loss of informative sites due to incomplete or misleading annotations, experimentally proven and publicly available data were used for some species; sequences of fem and its paralogous copies, tra2 and dsx were retrieved from GenBank and OrthoDB. Accession numbers are given in Supplementary Tables 1–3.

#### Evolutionary Analyses

Genealogies were reconstructed after applying Model Test (Posada and Crandall, 1998) on the dataset to determine the evolutionary substitution model that fitted the data best. The model with the lowest BIC (Bayesian Information Criterion) score was considered best for describing the substitution pattern. Non-uniformity of evolutionary rates among sites was modeled by using a discrete Gamma distribution (+G) with 5 rate categories. Evolutionary trees were constructed using the maximum likelihood method (JTT model) implemented in MEGA6 (Tamura et al., 2013). Examination of exonic splicing regulatory elements (ESR) was performed on the ESR search website (http://esrsearch.tau.ac.il/) using the highest number of available parameters (Fairbrother et al., 2002). Further analyses of conserved protein domains and protein function were performed with the conserved domain search module (Marchler-Bauer et al., 2011) implemented on www.ncbi.nlm.nih.gov and with InterPro (Hunter et al., 2012; http://www.ebi.ac.uk/interpro/). The program COIL (implemented online under: http://embnet.vital-it.ch/software/COILS\_form.html) was used to search specifically for predicted coiled-coil regions. The COIL program compares the query sequence to a database of known coiled coils and derives a similarity score. The probability that the sequence will form a coiled-coil motif is obtained within the program by comparing the similarity score against the distribution of scores in globular and coiled-coil proteins. Sequence-based motifs were identified and analyzed using the MEME suite package (http://meme.nbcr.net/meme/, Bailey et al., 2009). MEME usually finds the most statistically significant (low E-value) motifs first. Motifs are shown as sequence logos, represented by position-specific probability matrices that specify the probability of each possible letter appearing at each possible position in an occurrence of the motif. Logos display a stack of letters at each position in the motif; the total height of the stack is the "information content" of that position in bits.
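To make the notion of "information content" concrete, the following minimal sketch computes the stack height of one logo position from a column of a position-specific probability matrix. It assumes a uniform background over the 20 amino acids and applies no small-sample correction, so the values need not match those reported by MEME exactly; the function names and the example columns are hypothetical.

```python
import math


def information_content(column, alphabet_size=20):
    """Information content (bits) of one motif position.

    `column` maps each letter to its probability at that position.
    Assumes a uniform background and no small-sample correction.
    """
    entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=20):
    """Height of each letter in the stack: probability times total height."""
    total = information_content(column, alphabet_size)
    return {aa: p * total for aa, p in column.items() if p > 0}


# Hypothetical columns: one fully conserved, one split between R and K.
conserved = {"R": 1.0}
split = {"R": 0.5, "K": 0.5}

print(round(information_content(conserved), 3))  # log2(20), about 4.322 bits
print(round(information_content(split), 3))      # one bit less, about 3.322
print(letter_heights(split))
```

A fully conserved position reaches the maximum of log2(20) bits, while a position where all 20 residues are equally likely has a height of zero.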

#### Results

#### Diversification of Feminizer Gene Duplicates

In a previous study (Kapheim et al., in revision), we identified fem paralogs and orthologs in recently obtained genomic resources of bees representing different levels of social organization (**Figure 1**). Representative species were analyzed from Apini (the western European honey bee Apis mellifera and the dwarf honey bee Apis florea), Bombini (the buff-tailed bumble bees Bombus terrestris and Bombus impatiens), Euglossini (the orchid bee Eufriesea mexicana), Meliponini (Melipona quadrifasciata), Megachilini (the leafcutter bee Megachile rotundata) and Halictini (Lasioglossum albipes and Habropoda laboriosa). We noticed that the occurrence of fem duplications varies among different lineages in conjunction with varying signs of diversifying and negative selection. When including transformer (tra) orthologous sequences of seven ant species and one parasitic wasp, the sequences fall into two major clades, separating ant tra from the remaining sequences (**Figure 2**). All genes share an arginine-serine rich and a proline rich domain, establishing these copies as strong candidates to be involved in protein interaction and splicing processes. For the paralogous genes fem and csd within the Apis lineage, evidence for both processes has been given by numerous functional studies (Hasselmann et al., 2008; Gempe et al., 2009; Nissen et al., 2012). In the bumble bee Bombus terrestris, several splice forms were identified for fem and its paralogous copy fem1 (Biewer et al., in revision); in the stingless bee Melipona interrupta the single copy fem gene is characterized by two splice forms (Brito et al., unpublished).

Here, we focus on the evolutionary dynamics of Fem proteins among bees using amino acid sequence motifs. We follow the hypothesis that characteristic motifs should be found in all bees harboring changes in species-specific paralogs of fem. These could hint at lineage-specific modifications of protein interaction in the sex determination pathways. Our hypothesis is supported by the previous study of Koch et al. (2014) showing the independent origin of fem paralogs in Apis and Bombus (and ants) and thus different evolutionary fates, for which the multiallelic evolution of csd stands as one remarkable example (Hasselmann et al., 2008; Lechner et al., 2014).

In order to test our hypothesis, we first evaluated the amino acid motifs in Fem protein sequences of bee species and the wasp Nasonia vitripennis using the MEME program package (see Materials and Methods). Six motifs with the best scoring E-values (ranging from 1.0e−488 to 8.0e−143) were detected and are represented by sequence logos (see **Figure 3** and Supplementary Figure 1). The motifs are located in the N-terminal and C-terminal regions of the protein as well as in the regions in between (Supplementary Figure 2). Sequence logos illustrate the evolutionary conservation of several amino acids, most prominently glutamic acid (E), arginine (R), lysine (K), glycine (G) and proline (P), as well as variable positions that give rise to species-specific divergence.

Next, we evaluated the phylogenetic signal for each motif by constructing genealogies based on the maximum likelihood algorithm. Amino acid divergence of Apis compared to non-Apis bees is most pronounced for motifs 1, 5, and 6, resulting in three separated and clearly supported (78–98% bootstrap support) clusters. The sequence clustering is less obvious for motif 3, and an unresolved branching pattern results from motifs 2 and 4. Interestingly, motif 5 is located in the direct vicinity of the predicted coiled-coil (cc) motif, identified to have specifically evolved in csd of A. mellifera, A. cerana and A. dorsata by positive selection of six non-synonymous changes (Hasselmann et al., 2008). No such cc-motif can be detected at the homologous positions in A. florea csd (Biewer et al., in revision) or in those of fem or its paralogs in other hymenopteran species (Supplementary Figure 3). However, we identified the presence of a cc-motif in the region of motif 3 for A. florea csd (Biewer et al., in revision) that coincides with an α and β sheet PLP-dependent transferase-like structure predicted by the SMART tool (http://smart.embl-heidelberg.de/) in non-Apis bees. We conclude that at least the regions of motifs 3 and 5 are candidates for having resulted from lineage-specific evolution in protein interaction processes associated with the sex determination pathway in bees.

#### Lineage Specific Coevolution of Transformer2

Subsequently, we followed the hypothesis that signs of coevolution should be detectable within the tra2 gene, a major coregulator in the sex determining pathway, if the key regulator fem and (if present) its paralogous copies evolved with a modified function. We therefore first aligned Tra2 protein sequences from orthologs of 10 bee species and three other insect species (N. vitripennis, B. mori, D. melanogaster) and focused on the RNA recognition motif (RRM), which is flanked by two SR-rich regions. The RRM contains about 80 aa and forms a βαββαβ barrel motif; the two conserved elements RNP1 and RNP2, known to be directly involved in the RNA recognition of dsx in D. melanogaster, are located on the third β sheet (Chandler et al., 1997; **Figure 4**). No amino acid changes between bee species exist in RNP2, whereas the remaining part of the RRM shows pronounced differences among the species. Two observations are of particular interest. First, all non-bee species show numerous amino acid changes compared to the bee species, ranging from 9 aa (Nvit) to 28 aa (Dmel), which reflects their phylogenetic distance. Second, within the bee species, the Apis species A. mellifera and A. florea consistently differ at 9 amino acid positions that are otherwise conserved in bees, two of which are located in the RNP1 region. In addition, Bombus and Melipona species have one amino acid replacement in common, as compared to the other species. When compared over full length, Apis Tra2 shows 21 otherwise fixed amino acid differences compared to non-Apis species. In previous analyses (Kapheim et al., in revision), we noticed that the RRM domain is on average more divergent between Apis and non-Apis species than the regions outside of the domain (P < 0.1), predominantly for the downstream region (P < 0.01). These unexpected findings could hint at an Apis-specific functional association of tra2 with fem, depending on the lineage-specific fem copies and their function.

FIGURE 2 | Overview of the evolutionary relationship of the fem gene and copies (fem1, csd, tra) in social insect species. The tree with the highest log likelihood was inferred by using the Maximum Likelihood method. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Pairwise amino acid distances were estimated using a JTT model, and a discrete Gamma distribution was applied to model evolutionary rate differences among sites. All positions with less than 95% site coverage were eliminated. Abbreviation of species: Aech, *Acromyrmex echinatior;* Acep, *Atta cephalotes;* Acer, *Apis cerana*; Adors, *Apis dorsata*; Aflor, *Apis florea*; Amel, *Apis mellifera*; Bimp, *Bombus impatiens*; Bter, *Bombus terrestris*; Cflor, *Camponotus floridanus*; Dnov, *Dufourea novaeangliae*; Emex, *Eufriesea mexicana*; Hlab, *Habropoda laboriosa*; Hsal, *Harpegnathos saltator*; Lalb, *Lasioglossum albipes*; Lhum, *Linepithema humile*; Mquad, *Melipona quadrifasciata*; Mrot, *Megachile rotundata*; Nvitr, *Nasonia vitripennis;* Pbar, *Pogonomyrmex barbatus*; Sinv, *Solenopsis invicta*.
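The counts of lineage-specific replacements reported for Tra2 correspond to alignment columns at which all Apis sequences share one residue and all non-Apis sequences share a different one. The sketch below shows one way such columns could be counted; it is not the authors' pipeline, the function name `fixed_differences` is hypothetical, and the short sequences are invented toy data rather than real Tra2 sequences.

```python
def fixed_differences(group_a, group_b):
    """Return 0-based alignment columns where each group is internally
    invariant but the two groups carry different residues.

    Columns containing a gap character ('-') in either group are skipped.
    """
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = []
    for i in range(length):
        residues_a = {seq[i] for seq in group_a}
        residues_b = {seq[i] for seq in group_b}
        if "-" in residues_a | residues_b:
            continue
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            columns.append(i)
    return columns


# Toy aligned fragments (invented, not real Tra2 sequences).
apis = ["MKRSL", "MKRSL"]
non_apis = ["MKHSV", "MKHSI"]

print(fixed_differences(apis, non_apis))  # [2]: R in group A, H in group B
```

Column 4 is not reported because the second group is itself variable there (V and I), which mirrors the requirement that a position be "otherwise conserved" outside the focal lineage.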

We compared the relative evolutionary rate of tra2 bee sequences to further evaluate the differences between Apis and non-Apis species. Using Tajima's relative rate test, we tested the null hypothesis of equal molecular clock rate between Mint/Amel, Mint/Bter, and Mint/Bter using four non-Apis species (Mrot, Emex, Hlab, Nvit) as outgroup. Tests on molecular evolutionary rates of fem reveal a higher rate in Apis compared to non-Apis species (P < 0.01 for all pairwise comparisons, using different outgroups). No difference in evolutionary rate was detected in the non-Apis (Mint/Bter) tra2 comparison (P > 0.5). To test whether these evolutionary rate differences are specific to tra2 or a general phenomenon among Apis and non-Apis species, two reference genes were analyzed (elongation factor 1 and GB11211, a gene known to be located in close vicinity of the fem gene within the sex determination locus; Hasselmann et al., 2010). No rate differences were detected between Apis and non-Apis for either gene (P > 0.05).
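For readers unfamiliar with Tajima's relative rate test, the following Python sketch shows the core calculation under simplifying assumptions. It counts sites at which only one of the two ingroup sequences differs from the outgroup and compares the two counts with a chi-square statistic (1 degree of freedom). The function name, the gap handling, and the toy sequences are illustrative assumptions and do not reproduce the software or data used in this study.

```python
import math


def tajima_relative_rate(seq_a, seq_b, outgroup, skip_chars="-?X"):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m_a, m_b, chi2, p), where m_a counts sites unique to seq_a
    (seq_b matches the outgroup) and m_b counts sites unique to seq_b.
    """
    if not (len(seq_a) == len(seq_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    m_a = m_b = 0
    for a, b, o in zip(seq_a, seq_b, outgroup):
        # Ignore columns with gaps or ambiguous residues (an assumption).
        if a in skip_chars or b in skip_chars or o in skip_chars:
            continue
        if a != b and b == o:
            m_a += 1
        elif a != b and a == o:
            m_b += 1

    total = m_a + m_b
    if total == 0:
        return m_a, m_b, 0.0, 1.0

    chi2 = (m_a - m_b) ** 2 / total
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return m_a, m_b, chi2, p_value


# Toy sequences, not real tra2 data.
out = "MKTLLVAGSR"
fast = "MRSLIVCGTR"   # five lineage-specific changes
slow = "MKTLLVAGSK"   # one lineage-specific change
print(tajima_relative_rate(fast, slow, out))
```

Under the null hypothesis of equal rates, the two counts are expected to be equal, so a large imbalance yields a small P-value. With few informative sites, as in the toy example, the test has little power.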

#### The Evolutionary Conserved Key Regulator Doublesex

Sequence analyses of different bee and non-bee species indicate fundamental changes in the initial regulatory elements of the sex determining pathways. Although the gene doublesex (dsx), which is located toward the bottom of the pathway, shows large amino acid sequence divergence between species, two major domains remain highly conserved (**Figure 5**). OD1 harbors a DNA-binding domain containing a zinc-finger, while OD2 includes a dimerization domain, which was found in all analyzed species except Eufriesea mexicana; its absence there could be due to poor sequence quality. The evolutionary tree of dsx shows a distinct segregation between bee and non-bee species (**Figure 5A**). This might be due not only to evolutionary distance accumulated through nucleotide changes, but also to structural changes. All non-bee species (except the wasp Nvit) showed a female-specific exon that was not present in the bee species. In D. melanogaster this exon contains six 13-nucleotide repeats, which are exonic splicing regulatory elements (ESR) and are essential for Tra2 binding to dsx (Baker, 1989). These repeats were not specifically found in the other non-bee species (e.g., Bmor, Ccap), whereas the presence of the female-specific exon might suggest a mechanism of protein binding to dsx similar to that found in Drosophila and other dipteran species (data not shown; Ruiz et al., 2007). Since this female-specific exon seems to be absent in the bee species, one hypothesis could be that these ESR are located in other positions of the gene. We tested for this and did not find any evidence of similar ESR in other positions of the gene (data not shown). Alternatively, bees might have evolved other regulatory elements transmitting the Tra2 binding to dsx. This hypothesis should be tested in future experimental studies.

#### Discussion

Studying the evolution of genetic components within regulatory pathways may shed light on the flexibility with which similar requirements are satisfied by different approaches in nature. This ubiquitous phenomenon, known as developmental system drift (DSD), has been identified to establish homologous conserved traits by developmental mechanisms that have diverged among species (True and Haag, 2001; Abouheif and Wray, 2002; Nahmad et al., 2008). Here, we focused on major regulators of the sex determination pathway in social insect species, elucidating their

FIGURE 4 | Amino acid alignment of the tra2 RNA recognition motif and phylogenetic tree. The RNA recognition motif (RRM) with two elements (RNP1: red, RNP2: green) known to be directly involved in RNA recognition are highlighted. The maximum likelihood tree branch length represents amino acid changes per site for *tra2*. Abbreviations are the same as for Figure 2 adding Bmor, *Bombyx mori* (Lepidoptera); Dmel, *Drosophila melanogaster* (Diptera); Mint, *Melipona interupta*.

evolutionary dynamic. The transductional core of the sex determining pathway [fem(tra)/dsx complex] is evolutionarily conserved in insects over more than 280 million years of divergence (Diptera/Hymenoptera). Upstream initial signals regulating the sex-specific splicing of fem/tra may evolve within much shorter time, being consistent with the bottom-up theory (Wilkins, 1995) and the hour-glass model recently developed by Bopp et al. (2014). The different copy numbers of fem duplications found in bee genomes (this study and Kapheim et al., in revision) would allow either lineage specific gene loss (in Mqua, Mrot, Dnov, and Hlab) from a single ancestral duplication event or independent gene duplications (in Apis, Bter, Bimp, Lalb).

Our data from a variety of bee species now provide evidence for different evolutionary fates of the key regulator fem in bees. Gene duplications of fem in only some of the bee lineages, in conjunction with diversifying selection, seem to be the major force driving the evolution of fem and its paralogous copies. We identified amino acid motifs in fem and its copies that coincide with the prediction of protein structures (e.g., coiled-coil) known to be involved in protein interaction processes. The amino acid divergence between Apis and non-Apis species on these motifs favors the hypothesis that functional constraints may have shaped these parts of the protein differently. Among them, motif 5 reveals the highest divergence between Apis and non-Apis species (73% total aa divergence, Supplementary Table 4), which, in contrast to the low overall divergence (0.4% aa) between both groups, hints at a lineage-specific accumulation of amino acid changes. Recent analyses of Koch et al. (2014) provide evidence for independent gene duplication of fem in Apis and Bombus and reject the hypothesis of concerted evolution between fem/csd and fem/fem1 as proposed by Privman et al. (2013) and Schmieder et al. (2012). By these processes, primary sex determining signals may evolve rapidly, including modified function of known key regulators. This hypothesis could be supported by the greater divergence of the Tra2 RRM domain, particularly between Apis and non-Apis bees, indicating a lineage-specific functional interaction of tra2/csd, tra2/fem or tra2/fem1.

#### Evolutionary Changes in tra2 but Not in Dsx between Apis and Non-Apis Bees

There are several indications that tra2 in Apis has evolved differently compared to other bee species. The tra2 genealogy (**Figure 4**) does not match the species phylogeny (**Figure 1**) derived from seven genes. The tra2 sequences of Apis cluster in a separate branch from phylogenetically closely related groups and evolve at a higher evolutionary rate. Based on the high number of Apis-specific amino acid changes, we suggest a modified function of tra2 compared to non-Apis. Changes in the amino acid composition at 21 sites, 9 of them inside the RRM domain, led us to conclude that target molecule specificities in binding sites may have been modified. These target molecules could be fem and/or dsx. Our evolutionary analysis of Dsx protein indicates a rather high degree of structural conservation (**Figure 5B**). Consequently, and in agreement with the widely accepted hypothesis of bottom-up evolution in sex determining pathways (Wilkins, 1995), we have reasons to assume that dsx has retained its conserved function and that the structural changes in Tra2 were driven by fem evolution.

#### Coevolutionary Model of tra2 and Fem/Paralog Complex in Apis and Non-Apis

The evolution of novel or modified gene function may affect the function of associated genes (Innan and Kondrashov, 2010), a characteristic that we have noticed already in the evolution of fem in Apis in a previous study (Hasselmann et al., 2010). In that study we found that fem in Apis evolves under stronger functional constraints than in non-Apis, likely due to the origin of the novel function raised by csd. Often known as coevolution, molecular changes among closely interacting genes may lead to lineage-specific modification of protein function. The concept of gene-for-gene evolution has been introduced and widely described in plant-pathogen interactions, with natural selection and genetic drift as the major evolutionary processes driving this form of coevolution (Thompson and Burdon, 1992; Dodds et al., 2006). Our present results led us to propose a model of coevolutionary changes in sex determining key regulator tra2 and fem with its paralogs, depending on their presence or absence.

We propose three scenarios that may impact the evolution of the tra2/fem/paralog gene complex. Scenario one resembles the best studied case so far (**Figure 6**), found to be established in the Apis lineage. In this scenario, the evolution of the multiallelic csd operating as primary signal of sex determination following the model of neofunctionalization was accompanied by lineage-specific changes of the Tra2 protein. Tra2 has been proven to mediate fem mRNA sex-specific splicing, transmitting the information of the allelic composition at csd to its downstream target (Nissen et al., 2012). Consequently, our data of numerous Apis-specific amino acid substitutions (**Figure 4**) within and outside of the RRM domain indicate a coevolutionary, fast-evolving process forced by the strong directional evolution that has acted on csd (Hasselmann et al., 2008). In addition, Tra2 has been proposed to interact with the genes fem and dsx to regulate sex-specific splicing of dsx (Nissen et al., 2012). To disentangle which of the amino acid changes are directly associated with this twofold function of tra2, future in vitro studies are needed.

The second scenario illustrates duplication events of fem, as found in e.g., Bombus, giving rise to the paralogous copy fem1 (Sadd et al., in press). The proposed model of subfunctionalization (**Figure 6**) is supported by the absence of allelic variation in fem1 which is in contrast to the fem paralog in Apis (csd) (Biewer et al., in revision). Another difference between csd and fem1 is the occurrence of various splice transcripts in the latter and their absence in csd (Gempe et al., 2009). We hypothesize that the numerous amino acid differences in Tra2 are associated with its modified binding specificity in Bombus (dotted arrow), driven by a different evolutionary fate of the fem paralog. Still, it remains up to further investigation to identify the primary signal of sex determination in Bombus and the position of fem1 within the sex determination pathway.

Our last scenario, scenario three (**Figure 6**), is stimulated by the observation that in some bee species (e.g., Melipona) no fem duplication appears to exist. This result is supported not only by bioinformatic approaches on newly sequenced genome data (this study and Kapheim et al., in revision) but also by various experimental setups (Brito et al., unpublished). In this scenario tra2 function is likely to be related to the sex determination pathway, based on its evolutionary conservation (this study) and on its constant expression over early (egg) and late (larvae, pupae, adults) developmental stages in Melipona interupta (Schlesinger and Hasselmann, unpublished data). Gene expression studies can add another useful dimension to examine coevolution among genes, as interacting proteins are often precisely coexpressed (Fraser et al., 2004), ultimately leading to a better understanding of protein interaction processes within regulatory pathways. Further analyses will likely elucidate the primary signal of sex determination in Melipona, a system on which various alternative models to explain the determination of different sexes have been developed in the past, including empirical evidence for a complementary mode of sex determination resulting from controlled crossing experiments (Kerr, 1987; Carvalho, 2001).

Our comparative analyses of major regulators of sex determination in hymenopteran species provide further support to the wide range of evolutionary possibilities for shaping the sex determination regulatory pathway, consistent with the concept of DSD. Driving forces affecting the evolutionary dynamic of sex determining key regulators are gene duplication, selection and coevolution. More instantaneously occurring events such as transposon-mediated translocation of genes or fragments and recombination events may lead to gene copy number variations, including pseudogenization (Lonnig and Saedler, 2002). These processes are likely to be common in hymenopteran species, as high recombination frequencies in bees and ants (Beye et al., 2006; Sirvio et al., 2006; Meznar et al., 2010) and transposable elements near sex determining genes (Beye et al., 2003; Koch et al., 2014) have been observed. For the hymenopteran wasp species Nasonia vitripennis a non-complementary sex determining system has been recently proposed, based on maternal effect genomic imprinting (van de Zande and Verhulst, 2014). To ensure male development in unfertilized eggs, a womanizer factor, which is maternally silenced during oogenesis and affects tra expression, has been described, opening the road to study the probably highly divergent alternative mechanism that has evolved in the course of wasp and bee divergence. The challenge for future studies on species with newly sequenced genomes will be to test evolutionary predictions raised by bioinformatic analyses using functional experiments.

### Acknowledgments

This work was supported by a grant from the Deutsche Forschungsgemeinschaft (DFG) to MH. We thank two reviewers for their helpful comments and Paul D'Alvise for carefully reading a previous version of the manuscript.

#### Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00124/abstract


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Biewer, Schlesinger and Hasselmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

## Metatranscriptomic analyses of honey bee colonies

#### Cansu Ö. Tozkar<sup>1</sup>\*, Meral Kence<sup>1</sup>, Aykut Kence<sup>1†</sup>, Qiang Huang<sup>2</sup> and Jay D. Evans<sup>2</sup>\*

<sup>1</sup> Ecological Genetics Laboratory, Department of Biological Sciences, Middle East Technical University, Ankara, Turkey, <sup>2</sup> Bee Research Laboratory, United States Department of Agriculture-Agricultural Research Service, Beltsville, MD, USA

#### *Edited by:*

Juergen Rudolf Gadau, Arizona State University, USA

#### *Reviewed by:*

Marcial Escudero, Doñana Biological Station–Consejo Superior de Investigaciones Científicas, Spain; Robert Brucker, Harvard University, USA

#### *\*Correspondence:*

Cansu Ö. Tozkar, Ecological Genetics Laboratory, Department of Biological Sciences, Middle East Technical University, Dumlupınar Bulvarı No: 1, Cankaya, Ankara 06800, Turkey, tozkar@metu.edu.tr; Jay D. Evans, Bee Research Laboratory, USDA-ARS, 10300 Baltimore Avenue Bldg. 306 BARC-E, Beltsville, MD 20705-0000, USA, jay.evans@ars.usda.gov

†Deceased.

#### *Specialty section:*

This article was submitted to Evolutionary and Population Genetics, a section of the journal Frontiers in Genetics

> *Received:* 16 November 2014 *Accepted:* 25 February 2015 *Published:* 19 March 2015

#### *Citation:*

Tozkar CÖ, Kence M, Kence A, Huang Q and Evans JD (2015) Metatranscriptomic analyses of honey bee colonies. Front. Genet. 6:100. doi: 10.3389/fgene.2015.00100

Honey bees face numerous biotic threats, from viruses to bacteria, fungi, protists, and mites. Here we describe a thorough analysis of microbes harbored by worker honey bees collected from field colonies in geographically distinct regions of Turkey. Turkey is one of the world's most important centers of apiculture, harboring five subspecies of Apis mellifera L., approximately 20% of the honey bee subspecies in the world. We use deep ILLUMINA-based RNA sequencing to capture RNA species for the honey bee and a sampling of all non-endogenous species carried by bees. After trimming and mapping these reads to the honey bee genome, approximately 10% of the sequences (9–10 million reads per library) remained. These were then mapped to a curated set of public sequences containing ca. 60 megabase-pairs of sequence representing known microbial species associated with honey bees. Levels of key honey bee pathogens were confirmed using quantitative PCR screens. We contrast microbial matches across different sites in Turkey, showing new country recordings of Lake Sinai virus, two Spiroplasma bacterium species, symbionts (Candidatus Schmidhempelia bombi, Frischella perrara, Snodgrassella alvi, Gilliamella apicola, Lactobacillus spp.), neogregarines, and a trypanosome species. By using metagenomic analysis, this study also reveals deep molecular evidence for the presence of bacterial pathogens (Melissococcus plutonius, Paenibacillus larvae), Varroa destructor-1 virus, Sacbrood virus, and fungi. Despite this effort we did not detect KBV, SBPV, Tobacco ringspot virus, VdMLV (Varroa Macula like virus), Acarapis spp., Tropilaelaps spp. and Apocephalus (phorid fly). We discuss possible impacts of management practices and honey bee subspecies on microbial retinues. The described workflow and curated microbial database will be generally useful for microbial surveys of healthy and declining honey bees.

Keywords: *Apis mellifera*, pollination, colony collapse disorder, RNA sequencing, bioinformatics, honey bee viruses, trypanosomes

#### Introduction

The honey bee (Apis mellifera L.) has ecological importance as a natural pollinator of wild flora and crops. Moreover, managed honey bees have economic importance through hive products including honey, pollen, wax, propolis, and royal jelly (Maheshwari, 2003). Recently, declines of managed colonies have been noted on many continents. Several causes of these large-scale losses have been reported, including honey bee parasites (Varroa destructor, Acarapis woodi), pathogens (Nosema spp. and bee viruses), pesticides, contaminated water, use of antibiotics, poor nutrition, and migratory beekeeping practices (Kevan et al., 2007; Higes et al., 2008; Naug, 2009; vanEngelsdorp et al., 2009; Bacandritsos et al., 2010; vanEngelsdorp and Meixner, 2010). The high density of individuals and the exchange of food among A. mellifera colony members create a favorable environment for bacterial, viral, fungal, and protist pathogens, and several studies have noted an increase in diversity and infection rates of pathogens in failing bee colonies. Here we focus on honey bee pathogens and parasites and the use of modern sequencing techniques to identify these agents in healthy and declining colonies.

Among the honey bee pathogens, viruses are of special concern. Viruses are widespread in honey bees, although most often without noticeable symptoms (Ball and Bailey, 1997). Multiple viral infections have been diagnosed in many bee colonies (Chen et al., 2004). At least 18 different viruses exist in honey bees (Bailey and Ball, 1991), with six of them, Sacbrood virus (SBV), Deformed wing virus (DWV), Acute bee paralysis virus (ABPV), Black queen cell virus (BQCV), Chronic bee paralysis virus (CBPV), and Kashmir bee virus (KBV), most commonly linked to bee disease. These viruses have strong impacts on managed bee populations, pollination services, and honey production on several continents (Allen and Ball, 1996; Nordstrom et al., 1999; Ellis and Munn, 2005; De Miranda et al., 2010). DWV and ABPV have been linked to parasitic mite loads, while CBPV and the related Lake Sinai viruses are also widespread and tied to significant losses in honey bee colonies (Runckel et al., 2011; Ravoet et al., 2013). Sacbrood virus is the only common virus of developing bees, and this virus is not generally implicated in adult bee mortality or morbidity (Anderson and Gibbs, 1989).

Two species of Microsporidia, Nosema apis and Nosema ceranae, are widespread parasites of adult honey bees. N. apis is a long-standing infection agent of the European honey bee, A. mellifera (Zander, 1909), that causes nosemosis, a disease with mild virulence. N. ceranae was first found as a parasite of the Asian honey bee, Apis cerana (Fries et al., 1996), although this species is now widespread throughout the range of A. mellifera (Fries et al., 1996, 2006; Paxton et al., 2007), arguably thanks to worldwide trade in bees (Klee et al., 2007) and perhaps pollen supplements. Honey bees can be co-infected with both Nosema species (Fries, 2010). Additionally, N. ceranae seems to be replacing N. apis worldwide (Klee et al., 2007). In association with Nosema infections, several viruses can significantly affect the apparent virulence of Nosema (Bailey and Ball, 1991).

Trypanosomes are a varied group of parasites that infect insects (Merzlyak et al., 2001). Crithidia bombi, a trypanosome that infects bumble bees, has effects on behavior (Gegear et al., 2005) and longevity during stressful conditions (Brown et al., 2003). Trypanosomatid parasites parasitize the mid- and hind-guts of their hosts (Lange and Lord, 2012) and can be widespread (Langridge and McGhee, 1967; Schmid-Hempel and Tognazzo, 2010). The current role of trypanosomes in honey bee health is not clear (Schwarz and Evans, 2013), although they have been recognized as possible correlates with bee declines in two field surveys (Runckel et al., 2011; Ravoet et al., 2013). The most common honey bee trypanosomatid currently is distinct from the species Crithidia mellificae described by Langridge and McGhee (1967), and has recently been named as Lotmaria passim (Schwarz et al., 2015).

Spiroplasmas are particularly virulent pathogens that are found in various environments and implicated as pathogens of plants, vertebrates and insects. They have a seasonal occurrence associated with the nectaries and surfaces of flowers (Markham and Townsend, 1981; Williamson et al., 1989). Adult honey bees are parasitized by two species of bacteria, Spiroplasma apis (Mouches et al., 1983) and Spiroplasma melliferum (Clark et al., 1985). Upon invading the hemolymph, these bacteria can cause a fatal disease called spiroplasmosis or May disease.

Gut microbiota of animals living in social communities may influence their health through functions related to nutrition, immune responses and resistance against pathogens (Dillon and Dillon, 2004; Round and Mazmanian, 2009). Surveys of 16S rDNA sequences from the honey bee indicate the presence of eight predominant species (defined as strains sharing >97% 16S rRNA identity), which account for 95% of the resident bacteria (Moran et al., 2012). These species include the beta-proteobacterium Snodgrassella alvi (family Neisseriaceae) and the gamma-proteobacterium Gilliamella apicola (family Orbaceae), the dominant Gram-negative members of the gut community, with each comprising up to 30–39% of the microbiota (Moran et al., 2012).

The surveillance and discovery of novel pathogens via high-throughput sequencing can provide a relatively unbiased view of pathogens and microbes associated with insects and other arthropods (Bishop-Lilly et al., 2010; Ma et al., 2011; Vayssier-Taussat et al., 2013). Recent efforts based on these technologies have uncovered novel and unexpected taxa associated with honey bees (e.g., Cox-Foster et al., 2007; Runckel et al., 2011; Cornman et al., 2013) and have provided estimates of normal microbial levels vs. those of diseased colonies. Turkey is one of the world's most important centers of apiculture, with managed and wild populations of five subspecies of Apis mellifera L. including A. m. caucasica, A. m. syriaca, A. m. anatoliaca, A. m. meda and an ecotype in Thrace belonging to the carnica subspecies group, which is distinctly different from the subspecies found in Anatolia (**Figure 1**) (Kandemir et al., 2000; Bodur et al., 2007). These subspecies cover approximately 20% of honey bee subspecies in the world. Our purpose here was to determine the regional prevalence of bacterial pathogens, viruses, fungi, parasites, protists, and symbionts and to compare their loads in areas where migratory and stationary beekeeping is practiced in Turkey. Monitoring viruses and other disease agents can help solve problems related to the health of stationary and migratory honey bee colonies and limit or avoid their spread. Although commercial migratory beekeeping practices are necessary for pollination and crop production, their effects on honey bee colony health and pathogen transmission should be addressed. We used transcriptome analyses to survey a wide range of pathogens and to detect unexpected or rare taxa. RNA-seq technology allows the precise detection of rare transcripts by mapping reads against inclusive sequence databases, reducing the repetitive effort and possible biases of conventional molecular diagnostics (Minoche et al., 2011). Select virus and pathogen loads were confirmed by quantitative RT-PCR. Our survey was aligned with the European Honey bee colony loss network (www.COLOSS.org) survey, and was conducted to determine the colony losses in many regions representing honey bee diversity in Turkey. By using metagenomic analysis, our results provide new incidence records of the virome and microbiome in search of etiologically unexpected or previously unknown agents among 10 distinct provinces in Turkey and suggest a higher viral prevalence, and increased losses, in migratory beekeeping operations. Apis mellifera is an economically and ecologically important model organism, and identification of pathogens and other microbes can have extensive implications for current practices in apiculture and agriculture.

### Materials and Methods

#### Sampling

Colony loss surveys were carried out in eight regions and 158 beekeeping operations in 2010. Among those, 98 were migratory beekeepers and 60 were stationary. In 2011, surveys of 221 beekeepers from seven different regions were evaluated for this study. Samples analyzed for microbial loads comprised adult honey bees collected from field colonies in Turkey during summer and fall of 2010 and 2011. Efforts were made to collect from different regions of Turkey, and from a diverse set of beekeepers (**Figure 1**). Adult bees were collected from 134 colonies from different regions of Turkey from 38, 48, and 51 beekeepers in 2010-Fall, 2010-Spring, and 2011, respectively, and the samples were kept frozen until molecular diagnosis.

#### RNA Isolation

RNA was extracted from a pooled sample of 50 bees from each sampled colony, using an acid-phenol RNA extraction method (Evans et al., 2013).

#### cDNA Synthesis and Real Time qPCR

RNA extracts were used to generate first- and second-strand cDNAs using random hexamer primers and the reverse transcriptase Superscript II® (Invitrogen™), as described in vanEngelsdorp et al. (2009). Pathogen loads were estimated using real-time quantitative-PCR (qPCR) and a Bio-Rad CFX-96™ thermocycler. Complementary DNA (cDNA) was generated from 1 µg RNA template and was amplified in a separate 20 µl final reaction volume of Sso-Fast™ SYBR® Green reaction mix (Bio-Rad™) for each diagnostic primer pair. We used published primers to survey for SBV, KBV, IAPV, DWV, ABPV, BQCV, trypanosomes (vanEngelsdorp et al., 2009), AFB (Evans et al., 2006), Nosema ceranae (Fries et al., 2013) and Nosema apis (Schwarz and Evans, 2013), Spiroplasma apis and Spiroplasma melliferum (Schwarz et al., 2014). Honey bee ribosomal protein S5 (RPS5) was used to normalize for cDNA content and to filter samples for degradation or experimental losses.

We used a thermal profile of 95°C for 30 s followed by 95°C for 5 s and 60°C for 30 s. Steps two and three were repeated for a total of 50 cycles and included plate reads for fluorescence during each 60°C step. Following the cycle program, products were denatured for 10 s at 95°C and reannealed, and then a dissociation profile was measured between 69 and 95°C at an increment of 0.5°C to provide evidence for reaction fidelity (Evans et al., 2006). An annealing temperature of 55°C was used for the CytbSF (trypanosomatid L. passim) primers. Positive and negative control reactions were run on each 96-well plate. Pathogen loads (ΔΔCT) were determined as the difference between the CT of RPS5 and the CT of each target (ΔCT), scaled up from the minimal ΔCT across all samples.
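The load calculation described above (reference CT minus target CT, rescaled by the minimum difference across samples) can be sketched in Python as follows. This is an illustrative reading of the text rather than the authors' script; the function name `scaled_pathogen_loads`, the input layout, and the CT values are hypothetical, and samples without amplification are not handled.

```python
def scaled_pathogen_loads(ct_values):
    """Compute rescaled loads from qPCR cycle thresholds.

    ct_values maps a sample name to (ct_reference, ct_target), where the
    reference is the normalizer gene (RPS5 in the text). The difference
    reference minus target is shifted so the lowest sample equals zero.
    """
    if not ct_values:
        raise ValueError("no samples supplied")

    # Step 1: difference between reference CT and target CT per sample.
    delta = {name: ref - target for name, (ref, target) in ct_values.items()}

    # Step 2: scale up from the minimal difference across all samples.
    baseline = min(delta.values())
    return {name: value - baseline for name, value in delta.items()}


# Hypothetical CT values (reference, target) for three colonies.
example = {
    "colony_A": (20.0, 25.0),
    "colony_B": (21.0, 30.0),
    "colony_C": (19.5, 22.5),
}
print(scaled_pathogen_loads(example))
```

Because CT values are on a log2 scale, a difference of one unit in the rescaled load corresponds to roughly a twofold difference in template abundance, assuming near-perfect amplification efficiency.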

#### PCR Purification and Sequencing

RT-PCR products were selected for sequencing to confirm the identities of products indicating the trypanosome, L. passim, and the bacteria S. apis and S. melliferum. These PCR products were purified using the QIAquick PCR purification kit according to the protocol recommended by the manufacturer, and were then sequenced commercially by Macrogen (Rockville, MD, USA). DNA sequence similarity with trypanosomes, S. apis and S. melliferum was confirmed using the BLAST search tool (Altschul et al., 1990) from the U.S. National Institutes of Health and searches against the National Center for Biotechnology Information (NCBI) nr database.

#### High Throughput Sequencing and Data Analysis

High throughput sequencing was performed with the pooled samples from stationary colonies of each region and all RNA samples were pooled before sequencing (n = 6 ILLUMINA RNA libraries). Libraries were run on paired-end 100-cycle reactions and flow cells using Illumina Hi-Seq 2000 machines at the University of Maryland Institute of Genome Sciences. Sequences for all six libraries have been deposited at the US National Institutes of Health NCBI "Honey Bee Disease Database" Bioproject (PRJNA52851).

After trimming and quality control of the generated sequences, transcriptomic analysis was done with the support of CLC Genomics Workbench 7.0.3 (CLC Bio, Aarhus, Denmark) by mapping sequencing reads and counting and distributing the reads across genes and transcripts based on annotated reference genes (**Figure 2**). During the alignment, two mismatches were allowed, with a deletion cost of three, an insertion cost of three, a length fraction of 0.8, a similarity fraction of 0.8 and a maximum of 10 hits per read. For statistical analyses, proportions-based tests were used to compare the counts by considering the proportions that they comprise of the total sum of counts in each sample. Multi-group comparison was done using Baggerly's test, a weighted t-type test statistic (Baggerly et al., 2003). FDR (False Discovery Rate) corrected p-values were calculated according to the methods of Benjamini and Hochberg (1995) to determine the statistical significance of the pathogen load.
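The Benjamini and Hochberg (1995) correction mentioned above can be sketched in a few lines of Python. This is a generic illustration of the step-up procedure, not the CLC Genomics Workbench implementation; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: p_values[i])

    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        candidate = p_values[index] * n / rank
        running_min = min(running_min, candidate)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for four taxa.
raw = [0.01, 0.04, 0.03, 0.20]
print(benjamini_hochberg(raw))
```

A taxon would then be called significant at a chosen false discovery rate (for example 0.05) if its adjusted value falls below that threshold.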

Real-time qPCR data statistics were performed using JMP™ (SAS Institute, Cary, NC, USA, v.9). A one-way analysis of variance was used to test differences between group means. The total variability in the response was divided into two parts: within-group variability and between-group variability. The differences between the group means were considered significant if the between-group variability was large relative to the within-group variability. Multiple comparisons of group means were done using pooled variance estimates for these means. Student's t-tests were computed for each pair of group levels and individual pairwise comparisons. A matrix of correlation coefficients summarizing the strength of the linear relationships between each pair of response variables was calculated, and Pearson product-moment correlations for each pair of variables were listed. Correlations and the significance probabilities were calculated by the pairwise deletion method, so the count values differed if any pair had a missing value for either variable.
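The two calculations described in this paragraph, the partition of variability in a one-way analysis of variance and Pearson correlation with pairwise deletion, can be illustrated with a short Python sketch. It is not the JMP procedure; the function names and the toy numbers are hypothetical, and only the F statistic is computed (no P-value).

```python
import math


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and replication")

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group variability: group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group variability: observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variability; F is undefined")

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def pearson_pairwise(x, y):
    """Pearson r using only pairs where both values are present (not None).

    Returns (r, n_pairs); n_pairs can differ between variable pairs,
    which is the consequence of pairwise deletion noted in the text.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable is constant; r is undefined")
    return sxy / math.sqrt(sxx * syy), n


# Toy values, not data from this study.
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
print(pearson_pairwise([1, 2, None, 4, 5], [2, 4, 6, None, 10]))
```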

#### Results

#### RT-qPCR Results

Kashmir bee virus (KBV), Israeli acute paralysis virus (IAPV), American foulbrood (AFB), and Sacbrood virus (SBV) were not detected by PCR in any of the samples, and the microsporidian pathogen N. apis was not detected by RT-PCR except in three colonies in 2011. Present in these samples were Acute bee paralysis virus (ABPV), Deformed wing virus (DWV), Black queen cell virus (BQCV), and N. ceranae. Mixed infections of ABPV, DWV, BQCV, and N. ceranae were detected (**Tables 1**, **2**).

The distribution of bee pathogens was significantly different among provinces and with beekeeping practices in 2010 (Table 3 in Supplementary Material). Generally, DWV loads were higher in Bitlis, Hatay, Muğla, and Ardahan than in other regions. ABPV was the most common virus in Bitlis and was especially high in the samples of migratory beekeepers. Among provinces that were sampled, BQCV loads were lowest in Edirne and highest in colonies of migratory beekeepers sampled in Ardahan, Hatay, and Muğla (**Table 2**). In 2011, DWV, BQCV, and N. ceranae loads were significantly different among the regions (Table 3 in Supplementary Material). DWV loads were higher in Muğla, Hatay, and Yığılca. BQCV levels were high in Ardahan and Artvin, and BQCV was not detected at all in Yığılca. N. ceranae was more frequent in Artvin, Ardahan, and Kırklareli but was not detected among stationary colonies of Hatay (**Table 2**). ABPV occurrence was highest in samples from stationary apiaries of Yığılca and Ardahan and in samples from migratory apiaries of Hatay and Muğla. The lowest level was in the stationary colonies of Hatay. ABPV loads differed between Yığılca-Kırklareli and Yığılca-Muğla (p = 0.0398 and p = 0.0478).

ABPV loads in Muğla and Ankara, BQCV loads in Ardahan and Muğla, and N. ceranae loads in Hatay were significantly higher in the samples of migratory beekeepers than in the samples of stationary beekeepers in 2010. Among 2011 samples, the pathogen loads of migratory colonies were also higher in Muğla and Hatay, but the results were not significant (Table 3 in Supplementary Material).

TABLE 1 | The results of pathogen analysis of honey bee samples collected from different regions of Turkey.


\*Detected among samples of 2011 only.

In 2010, N. ceranae was widespread in migratory colonies of Muğla, Hatay, and Bitlis and less frequent in samples from Ankara and Kırklareli. N. ceranae was not found in samples of beekeepers from Edirne, Artvin, Ardahan, and Elazığ in 2010 (**Table 2**). In contrast, N. ceranae showed high incidence in all sites except for the stationary colonies of Hatay in 2011 (**Table 2**). N. ceranae loads were higher among migratory colonies than stationary ones in both years. N. apis was not observed among the samples of 2010 and most of the samples of 2011, although the analysis was repeated twice. In 2011, N. apis was detected in three stationary colonies in total, from three beekeepers in Muğla-Bodrum, Düzce-Nas, and Ardahan-Posof.

Trypanosome loads were significantly different among regions in both years. Trypanosome abundance was higher within the samples of migratory beekeepers than those of stationary ones (Table 3 in Supplementary Material). Ten of ten positive samples were confirmed by DNA sequencing as reflecting the 28S rRNA locus of trypanosomes. In general, trypanosome levels were higher among samples from Muğla, Ankara, Artvin, and Ardahan compared to the samples from Kırklareli, Edirne, and Elazığ for 2010. In 2011, levels were especially high in samples from Artvin, Yığılca, and Muğla, and low in samples of stationary colonies in Hatay and Kırklareli (**Table 2**). Seasonal variation was observed among samples collected in 2010 (ANOVA p = 0.0023). Specifically, trypanosome loads were highest in spring as opposed to fall. In Ardahan and Hatay, trypanosomes were not detected among any of the fall samples. Among the samples with trypanosome infection, thirteen 2010 samples for which products were assayed by DNA sequencing were all confirmed as being part of the "SF" strain of L. passim (identified as C. mellificae) (Runckel et al., 2011), and 10 were positive among 51 samples in 2011, based on qPCR and melt-curve analyses (seven products were confirmed by DNA sequencing). Four of the "SF" positive samples were from Muğla, six from Hatay, eight from Artvin-Ardahan, four from Yığılca, and one from Elazığ. GAPDH sequence analyses from Artvin and Yığılca all matched L. passim, haplotype A (Morimoto et al., 2013), indicating that this haplotype was more common among the samples when compared with haplotype B.

S. melliferum was detected in two samples from Artvin province in 2010, while 13 samples from Artvin, Ardahan, Muğla, Yığılca, and Kırklareli were positive during 2011. Six of the positive samples were confirmed by DNA sequencing. S. apis was detected in only two samples from Muğla and Elazığ in 2010 and in one sample from Yığılca in 2011.

Dual infections of DWV and BQCV were detected in both migratory and stationary beekeepers in 2010 (50–100% of honey bees in six provinces sampled). Dual infections with these two viruses were less commonly observed in 2011 (18–40% of honey bees in four of the provinces sampled). Triple infections of ABPV, BQCV, and DWV were seen in three regions (50–100% of honey bee samples) in 2010, whereas they occurred less often (18–40% of honey bee samples) in four regions in 2011. Among 2010 samples, N. ceranae loads were correlated with DWV loads (r = 0.25, p = 0.0186). ABPV loads were correlated with both DWV (r = 0.40, p = 0.0001) and N. ceranae (r = 0.35, p = 0.0009). BQCV correlated with DWV within samples from Ankara, Bitlis, Edirne, and Muğla. In Bitlis, ABPV levels were positively correlated with DWV (r = 0.89, p = 0.0061). In Hatay, BQCV correlated with N. ceranae (r = 0.47, p = 0.0443). Among 2011 samples, ABPV showed correlations with DWV (r = 0.29, p = 0.0381) and in some regions with N. ceranae. A positive correlation was observed between trypanosome load and N. ceranae (r = 0.28, p = 0.0083) and between trypanosomes and BQCV (r = 0.59, p < 0.0001).

According to the survey results, there was an increase in colony losses in 2011 when compared with 2010 losses (**Figure 3**). Kırklareli, Muğla, Hatay, and Ankara had relatively higher colony losses in both years, while Ardahan and Artvin tended to have lower losses than the other provinces. High colony losses were also observed in Edirne and Bitlis in 2010 and especially in Yığılca in 2011. Average colony losses of migratory beekeepers were significantly higher than those of the stationary beekeepers.

#### Read Mapping Results

DWV reads were detected in sequenced RNA libraries from all of the regions (**Figure 4** and Table 4 in Supplementary Material), although they were most highly represented in Hatay, Yığılca, and Muğla, matching the results from real-time qPCR. We found strong evidence for Varroa destructor virus-1 as well, a first for these regions. Overall, the DWV/VDV group of iflaviruses was the predominant viral taxon (**Figure 4**). RNA sequences of BQCV were highest in the Ardahan region, although BQCV reads were also abundant in other provinces. Consistent with the fact that BQCV was not detected in Yığılca by RT-qPCR, very few reads were found in this province during RNA sequencing. According to the sequence analysis, ABPV contigs were common in most regions. Consistent with RT-PCR results, they were most frequently observed in Yığılca and Ardahan. Very few CBPV reads were found in Kırklareli province, and this virus was undetected or very rare in other regions. In contrast, the related Lake Sinai viruses were highly represented both in diversity across this large clade and in abundance. IAPV and KBV were notably rare or absent across all regions. Sacbrood virus was similarly rare, although 1885 SBV reads were detected in the Ardahan region. Slow bee paralysis virus (SBPV) reads were not observed in any of the regions (Table 4 in Supplementary Material).

TABLE 2 | Normalized pathogen loads across sites and between stationary and migratory beekeepers for 2010 and 2011.

Abundances are on a log2 scale, so every increase of one unit reflects a doubling of pathogen RNA. ND, Not detected; /, Sampling was not done.
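Because the loads reported in Table 2 are on a log2 scale, a difference between two table values converts to a fold difference in pathogen RNA by exponentiation. A minimal sketch follows; the function name `fold_change` and the example values are hypothetical and are not taken from the table.

```python
def fold_change(log2_a, log2_b):
    """Fold difference in abundance of sample A relative to sample B,
    given two abundances expressed on a log2 scale."""
    return 2.0 ** (log2_a - log2_b)


# Hypothetical log2 loads (illustrative only, not values from Table 2).
fc_one_unit = fold_change(6.0, 5.0)     # one unit apart
fc_three_units = fold_change(8.0, 5.0)  # three units apart
```

A one-unit difference therefore corresponds to a twofold difference in pathogen RNA, and a three-unit difference to an eightfold difference.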

Nosema ceranae was highly prevalent in our RNA-seq analysis. N. ceranae was especially common in Artvin province, again matching our RT-qPCR analyses. N. ceranae was not detected by RT-qPCR in stationary colonies of Hatay province, but it was evident there by RNA-seq analysis. We found minimal, if any, reads matching N. apis in the RNA-seq data; those present were largely the result of regions with high sequence similarity to N. ceranae. Reads for the betaproteobacterium Snodgrassella alvi were common among all of the provinces, with highest frequency in Kırklareli, Muğla, Yığılca, and Ardahan. The gamma-proteobacterium Gilliamella apicola was prevalent in all regions, while the related Frischella perrara was rare (Table 4 in Supplementary Material). Consistent with both RT-qPCR and DNA sequencing, the presence of Spiroplasma melliferum was confirmed in most of the regions. The abundance of the congener S. apis was confirmed with the RNA-seq analysis despite being missed by PCR, yet S. apis remained the minor of the two species.

#### Discussion

#### Evaluation of Real Time q-PCR Results

The surveyed regions comprise important centers of beekeeping in Turkey. Kashmir bee virus (KBV) and Israeli acute paralysis virus (IAPV) were not detected in sampled colonies by RT-qPCR, while the related Acute bee paralysis virus (ABPV) was abundant in most of the samples and was present in the regions where colony declines were observed. Historically, KBV has been found most frequently in the United States and Australia (Allen and Ball, 1996), and this virus is thought to be an exotic bee virus in central Europe (Berenyi et al., 2006). Our discovery of just one of these three species could reflect climatic and environmental conditions (Anderson, 1991) or differential abilities of these viruses to infect different honey bee subspecies.

DWV was detected in both migratory and stationary operations in all of the regions, consistent with the high prevalence of this virus worldwide (Tentcheva et al., 2004; Berenyi et al., 2006; Kukielka et al., 2008; Nielsen et al., 2008; Welch et al., 2009). The presence of DWV has been reported in colonies of Apis mellifera L. in Ordu province of Turkey (Gülmez et al., 2009). BQCV was the second most widespread bee virus in our study. BQCV is also the second most prevalent virus in honey bee colonies in Asia and Europe (Tentcheva et al., 2004), and its presence was confirmed in many studies (Berenyi et al., 2006; Kukielka et al., 2008; Welch et al., 2009; Choe et al., 2012). BQCV was previously reported in 21.42% of 28 bee samples from six provinces of the Black Sea Region in Turkey (Gümüşova et al., 2010). BQCV was more prevalent in migratory bee colonies, consistent with the results found for migratory bees sampled in the U.S. (Welch et al., 2009).

In most of the regions we studied, BQCV and DWV were found together more often than expected by chance. This result is consistent with the findings in a previous study of virus infections (Chen et al., 2004) for which these distantly related viruses were found coinfecting honey bees at a high frequency. Triple infections of ABPV, BQCV, and DWV were detected in some colonies. Nielsen et al. (2008) also found a high incidence of dual and triple infections. According to Chen et al. (2004), 50% of colonies in the USA had dual infections while 7% had triple infections. Simultaneous multiple infections were also common in Austria (Berenyi et al., 2006), Southwest England (Baker and Schroeder, 2008), Brazil (Teixeira et al., 2008), Jordan (Haddad et al., 2008), France (Tentcheva et al., 2004), and Hungary (Forgach et al., 2008).
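The notion of co-infection "more often than expected by chance" can be made concrete by comparing the observed number of co-infected samples with the number expected if the two viruses occurred independently. The sketch below is illustrative only: the source does not state which test was used, and the function name and the counts are hypothetical.

```python
def expected_coinfections(n_samples, n_with_a, n_with_b):
    """Expected number of samples carrying both pathogens if the two
    infections were statistically independent of each other."""
    return n_with_a * n_with_b / n_samples


# Hypothetical survey (illustrative only): 100 samples, 60 positive for
# virus A, 50 positive for virus B, and 45 positive for both.
expected = expected_coinfections(100, 60, 50)
observed = 45
enrichment = observed / expected
```

An `enrichment` value above one indicates more co-infections than independence predicts; a formal test of association (for example, a chi-square or Fisher's exact test on the 2 × 2 presence/absence table) would be needed to judge significance.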

In Turkey, the presence of N. apis was confirmed earlier (Aydin et al., 2005; Muz et al., 2010; Whitaker et al., 2010). Using qPCR, we detected N. apis in only three colonies, indicating the replacement of N. apis by N. ceranae in many regions included in this study. Worldwide, N. ceranae is now far more common than N. apis (Chen et al., 2009; Valera et al., 2011; Yoshiyama and Kimura, 2011).

In 2006, N. ceranae was found in the provinces of Artvin, Hatay, and Muğla (Whitaker et al., 2010). Collapsed colonies from the Hatay overwintering region and Southeastern Marmara were found to show infections of N. ceranae (Muz et al., 2010). In our 2010 samples, N. ceranae was observed in Bitlis, Hatay, and Muğla and not in Edirne, Artvin, Ardahan, and Elazığ. Interestingly, N. ceranae was widespread in all of the regions involved in the study in 2011. Detection of N. ceranae among 2011 samples of Ardahan-Artvin might be the reason for the increase in colony losses in these provinces in 2011. The incidence of N. ceranae in Kırklareli province in 2011 was much higher than in the previous year. N. ceranae loads of samples from migratory beekeepers were significantly higher than those from stationary ones, and N. ceranae loads were correlated with the viruses ABPV, BQCV, and DWV. We propose that the infectivity of N. ceranae expands with migratory beekeeping activities and in association with different viruses and trypanosomatids. In July, the Thrace region is heavily frequented by migratory beekeepers seeking to harvest sunflower honey. It is thought that beekeepers in this region had colony losses because of the application of pesticides rather than honey bee disease factors. Hatay and Ankara are also important locations for migratory beekeepers. Caucasian bees, which are used by most of the migratory beekeepers, are raised in Ardahan and Artvin. Bitlis also suffered high colony losses in 2010, combined with high disease loads among samples of migratory beekeepers. In contrast, while honey bee samples in Yığılca were derived from stationary colonies, high colony losses and high pathogen loads were observed. Some other factors were probably contributing to colony losses in this province. All five viruses and pathogens were also detected in Muğla, the center of migratory beekeeping. These results could reflect the differences in pathogen exposure of local and migratory colonies, varying resistance levels, or perhaps a differential ability to handle stress. Genetic impacts of migratory beekeeping have become an important concern in Turkey and elsewhere. Population structure can be disturbed by hybridization, leading to a loss of regionally adaptive traits and perhaps decreasing colony fitness.

Trypanosomes were highly prevalent in our samples. Trypanosomes have also been reported in Australia (Langridge and McGhee, 1967), China (Yang et al., 2013), France (Dainat et al., 2012), Japan (Morimoto et al., 2013), Switzerland (Schmid-Hempel and Tognazzo, 2010), USA (vanEngelsdorp et al., 2009; Runckel et al., 2011), and Spain (Orantes-Bermejo, 1999). In our study, trypanosome levels were higher in summer samples than in fall samples. In southern Spain, this parasite appears in July and August (Orantes-Bermejo, 1999), and L. passim (recently named as the primary trypanosomatid in lieu of C. mellificae, Schwarz et al., 2015) was common in summer colonies in Belgium (Ravoet et al., 2013). These findings are in contrast with the results of Runckel et al. (2011), which show a peak in L. passim levels in January in U.S. colonies.

Synergistic effects can make colonies more vulnerable to other pathogens (Cornman et al., 2012), and we noted a positive correlation between L. passim and N. ceranae prevalence. Similarly, this relationship was documented in field surveys from the U.S. (Runckel et al., 2011), and the co-occurrence of L. passim and N. ceranae in summer was tied to higher colony mortality in Belgium (Ravoet et al., 2013). Complex dynamic immune responses of honey bees to both Nosema and trypanosomatids were recently reported (Schwarz and Evans, 2013), and it will be interesting to test further for mechanistic explanations for any synergisms between these parasites.

#### Metagenomic Sequencing

As in prior studies, iflaviruses related to Deformed wing virus (DWV) were especially prevalent. While DWV amounts were higher, reads for the closely related Varroa destructor virus-1 (VDV-1) were fairly abundant in Hatay, Yığılca, and Muğla. CBPV reads were relatively rare in most of the regions and were not detected in Yığılca and Ardahan. CBPV was reported in seven of 28 (25%) samples from six provinces of the Black Sea region in Turkey (Gümüşova et al., 2010). CBPV was detected in four of 96 apiaries in a survey study in Denmark, in 73% of the samples from Greece, and in 9% of apiaries in China (Nielsen et al., 2008; Bacandritsos et al., 2010; Ai et al., 2012). A recently described relative of CBPV, Lake Sinai virus, was prevalent in all regions, second only to the DWV and VDV group. In fact, this group was the most prevalent virus in Kırklareli and Ardahan provinces. The pathogenic or epidemiological significance of Lake Sinai viruses is not well known. The LSV species complex is diverse, with members sharing between 70 and 99% sequence identity; hence this group is often missed in screenings based on PCR. LSV4 appears to be especially abundant, along with LSV1 and LSV2. There was significant variation across the regions in the specific lineages seen for this group. LSV2 was the most abundant single component of the honey bee microbiome in the study of Runckel et al. (2011). The presence of LSV was also confirmed in honey bees from Spain by high-throughput sequencing (Granberg et al., 2013). A new fourth strain of Lake Sinai virus (LSV) was identified in the study of Ravoet et al. (2013).
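To illustrate why a diverse group such as the LSV complex can escape PCR-based screening, the sketch below computes percent identity between a primer-length stretch of one lineage and the corresponding aligned stretch of another. The sequences are invented for illustration and are not real LSV sequences or primers; the function name `percent_identity` is likewise hypothetical, and the sequences are assumed to be already aligned and of equal length.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)


# Invented 10-nt primer site and a divergent variant (illustrative only).
primer_site = "ACGTACGTAC"
variant_site = "ACGTTCGTAA"
identity = percent_identity(primer_site, variant_site)
```

In this invented example the two sites differ at two of ten positions. Mismatches of this kind within a primer binding site, particularly near the 3' end, can prevent amplification, which is one plausible reason a lineage could be abundant in RNA-seq reads yet missed by a PCR assay designed against a different lineage.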

In this study, despite tens of millions of microbial gene reads, few reads matching IAPV, KBV, and SBV were seen. IAPV, KBV, and SBV were not detected using RT-qPCR, indicating that RNA-seq sensitivity was higher than that of qPCR, or that current qPCR primers for this group must be redesigned to capture all strains. Given the RNA-seq data, we can consider that these pathogens are quite rare in our samples. KBV has been absent in some other European surveys (e.g., Berenyi et al., 2006; Forgach et al., 2008) and rare in surveys in France, Denmark, and the United Kingdom (Tentcheva et al., 2004; Ward et al., 2007; Nielsen et al., 2008). IAPV is prevalent in the Middle East and Australia (Maori et al., 2007; Palacios et al., 2008), and this species was reported in 71 samples, each containing 10 bees, from 20 provinces in Turkey (Ozkirim and Schiesser, 2013). In this study, IAPV was not confirmed by RT-qPCR in our samples, and RNA sequence analysis revealed that this virus was not present in Muğla and Hatay and was extremely rare among samples from Artvin, Yığılca, and Ardahan. SBV was similarly scarce in this study. This conflicts with results of Tentcheva et al. (2004) but agrees with the results of Baker and Schroeder (2008) and Forgach et al. (2008). As with most surveys, we examined only adult honey bees. SBV causes a fatal disease in honey bee larvae (Bailey, 1975); thus brood samples could provide more evidence for its prevalence.

Apis mellifera filamentous virus (AmFV) was not detected within our samples, nor was VdMLV (Varroa Macula-like virus). A plant-pathogenic RNA virus, tobacco ringspot virus (TRSV) was not detected in any of the RNA samples in our study.

Recent culture-independent studies reported eight bacterial phylotypes inhabiting the gut of the honey bee, Apis mellifera, from several continents (Jeyaprakash et al., 2003; Mohr and Tebbe, 2006; Babendreier et al., 2007; Cox-Foster et al., 2007; Olofsson and Vasquez, 2008). Colonies founded by swarms, interactions within the colony, intercolony interactions such as robbing food from neighboring hives, and mixing of colonies by beekeepers all might affect the gut microbiota (Engel et al., 2012). In our study, unique gene reads of Snodgrassella alvi and Gilliamella apicola were highly represented and showed differences among provinces, which may reflect geographic, environmental, and subspecies differences of hosts. The differences in social behaviors of the subspecies, the dietary sources, and exposure to varying pathogens and pesticides might influence the abundance of these bacteria among regions. One gamma-proteobacterial member of the gut microbiota, Candidatus Schmidhempelia bombi, was present in 90% of bumble bee individuals in the study of Martinson et al. (2014). This symbiont was prevalent in all of our surveyed locations and widely represented in Hatay, Yığılca, and Muğla. The recently described gamma-proteobacterium Frischella perrara (Engel et al., 2013) was present in our study, albeit at low levels.

Lactobacillales symbionts have been proposed as actors in both nutrition and parasite defenses of honey bees. Lactobacillales stimulate the innate immune system, arguably increasing honey bee defenses against disease agents (Evans and Lopez, 2004). Along with their impacts on immunity, the microbial symbionts have been proposed to nutritionally compete with pathogens by occupying the available niches (Crotti et al., 2012). In this study, very low numbers of 16S Lactobacillus reads were observed among regions. Overall bacterial loads were especially high in Hatay, Yığılca, and Artvin. The prevalence of the bacterial pathogens S. apis and S. melliferum is low among Belgian honey bee colonies (Ravoet et al., 2013), and these bacteria are present only seasonally in North and South America (Runckel et al., 2011; Schwarz et al., 2014). We found low Spiroplasma levels, with highest incidence in Artvin province. S. melliferum was the more common species, matching results from the Americas (Schwarz et al., 2014). The bacterial brood diseases European Foulbrood (EFB), caused by the bacterium Melissococcus plutonius (Bailey et al., 1983), and American Foulbrood (AFB), caused by the bacterium Paenibacillus larvae (Genersch et al., 2006), are globally important diseases of honey bees. P. larvae reads were not common among our samples. Similarly, M. plutonius was not prevalent among the regions but was more frequent in Hatay, Muğla, and Yığılca. Environmental conditions at these sites can be conducive to the expression of the disease. Like AFB, EFB transmission is also linked to larval immune responses (Evans, 2004), hygienic behavior (Spivak and Reuter, 2001), interaction between M. plutonius and the intestinal microbiota of the honey bee larvae (Gilliam, 1997; Olofsson and Vasquez, 2008), nutritional and stress conditions, weather, and geography (Bailey, 1961).

Sequences for Ascosphaera apis, the causative agent of Chalkbrood disease, were generally rare, with the highest incidence in samples from Hatay province. Similarly, neogregarines (nominally Apicystis bombi) persisted at extremely low levels among regions in this study. Among arthropod parasites of honey bees, we found no genetic evidence for the presence of the tarsonemid tracheal mite, Acarapis woodi, within our samples. The Asian parasitic mite Tropilaelaps is considered more dangerous to A. mellifera than the parasitic mite Varroa destructor (Rath et al., 1995), and this mite is worthy of screening. Our deep sequencing analysis showed no sign of Tropilaelaps. The presence of phorid flies (Apocephalus borealis) in the study of Ravoet et al. (2013) demonstrates their presence in Europe, but our deep sequencing did not reveal signs of this parasite.

In conclusion, we screened bee-derived RNA against the most complete sequence set for honey bee associates used to date. We assessed levels of known and novel parasites, pathogens, and symbionts. We present quantitative data for bacterial pathogens (Melissococcus plutonius, Paenibacillus larvae, S. apis, S. melliferum), protists (Apicystis, trypanosomatids), viruses (Lake Sinai virus, Chronic bee paralysis virus, Deformed wing virus, Varroa destructor virus, Sacbrood, and Dicistroviruses), symbionts (Candidatus Schmidhempelia bombi, Frischella perrara, Snodgrassella alvi, Gilliamella apicola, Lactobacillus spp., Acetobacteraceae), microsporidia, and fungi in A. mellifera colonies in distinct regions of Turkey. The presence of KBV, SBPV, Tobacco ringspot virus, VdMLV (Varroa Macula-like virus), Acarapis spp., Tropilaelaps clareae, and Apocephalus (phorid fly) was also examined. As in other countries, bee viruses were correlated with colony losses in Turkey. In comparison with 2010, the increase in pathogen loads in 2011 might be a factor in the increased colony losses observed in this study. It seems likely that migratory beekeeping practices enable the spread of disease factors among honey bees in the places they visit, posing an important threat to honey bee colonies. In addition, the impacts of parasites and pathogens vary between regions, perhaps reflecting different honey bee genetic traits. Migratory beekeeping was correlated with both higher disease loads and a potential risk of dispersing regional parasites and pathogens across the country. This practice also allows for greater gene flow between migratory honey bee populations and local populations. Current diversity and local genetic structure can be preserved with selection strategies and by establishing broad areas of isolation to reduce the risks of migratory beekeeping practices. While experimental work and longitudinal analyses will be needed to confirm causes of bee declines, our analyses, reference sequences, and strategy will help reduce the set of likely causes.

### Acknowledgments

It is with sadness that we note the passing of COLOSS Management Committee member Dr. AK on February 1, 2014. We dedicate this paper to the memory of Professor AK, for initiating the study and for his teaching, mentoring, collegiality and friendship. The work was part of the project "The effects of the genetic differences of Turkish honey bees on colony vitality" and was funded by "The Science and Technological Research Council of Turkey" (TUBITAK Project Number: 109T547). The work was also supported by Middle East Technical University (BAP Project number: BAP-01-08-2012-004). The molecular diagnostics part was technically and financially supported by the USDA-ARS Bee Research Laboratory. We would like to express our gratitude to Dr. Ryan Schwarz for technical advice, Dawn Lopez and Margaret Smith for their laboratory guidance, and the scientists of the BRL for their feedback. We appreciate the advice and help of Prof. Dr. Muhsin Dogaroglu, Prof. Dr. Banu Yucel, and Dr. Hasan Huseyin Inal in the initiation of the project. We are also grateful to Assist. Prof. Dr. Mustafa Muz, Okan Can Arslan, Mehmet Kayim, Mert Kukrer, and Mustafa Nail Cirik for their contributions to the field work.

#### Supplementary Material

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2015.00100/abstract

#### References


Maheshwari, J. K. (2003). Endangered pollinators. Environ. News Arch. 9, 32–45.

Maori, E., Lavi, S., Mozes-Koch, R., Gantman, Y., Peretz, Y., Edelbaum, O., et al. (2007). Isolation and characterization of Israeli acute paralysis virus, a dicistrovirus affecting honeybees in Israel: evidence for diversity due to intra- and interspecies recombination. J. Gen. Virol. 88, 3428–3438. doi: 10.1099/vir.0.83284-0

Markham, P. G., and Townsend, R. (1981). Spiroplasmas. Sci. Prog. Oxf. 67, 43–68.


are circulating in the United States. J. Virol. 82, 6209–6217. doi: 10.1128/JVI.00251-08


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Tozkar, Kence, Kence, Huang and Evans. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.