# EPIGENETICS AS A DEEP INTIMATE DIALOGUE BETWEEN HOST AND SYMBIONTS

EDITED BY: Ilaria Negri and Eva Jablonka PUBLISHED IN: Frontiers in Genetics

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

> *All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-875-7 DOI 10.3389/978-2-88919-875-7

#### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

#### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

#### Dedication to Quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

#### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

### **EPIGENETICS AS A DEEP INTIMATE DIALOGUE BETWEEN HOST AND SYMBIONTS**

Topic Editors: **Ilaria Negri,** Koiné - Environmental Consulting S.n.c., Italy **Eva Jablonka,** Tel-Aviv University, Israel

Fluorescent in situ hybridization of the endosymbiont *Wolbachia* (yellow dots) in the ovary of the insect *Zyginidia pullula*. Copyright © American Society for Microbiology, [Applied and Environmental Microbiology, 77, 2011, 1423-1435, 10.1128/AEM.02121-10]

Cover image by Fabio Sabatini

Symbiosis is an intimate relationship between different living entities and is widespread in virtually all organisms. It was critical for the origin and diversification of Eukaryotes and represents a major driving force in evolution. Indeed, symbiosis may support a wide range of biological processes, including those underlying the physiology, development, reproduction, health, behavior, ecology and evolution of the organisms involved in the relationship.

Although often confused with mutualism, when both organisms benefit from the association, symbiosis actually encompasses several and variable relationships. Among them is parasitism, when one organism benefits but the other is harmed, and commensalism, when one organism benefits and the other remains unaffected.

Even if many symbiotic lifestyles do exist in nature, in many cases the intimacy between the partners is so deep that the "symbiont" (sensu strictu) resides into the tissues and/or cells of the other partner. Since the partners frequently belong to different kingdoms, e.g. bacteria, fungi, protists and viruses living in association with animal and plant hosts, their shared "language" should be a basic and ancient form of communication able to effectively blur the boundaries between extremely different living entities.

In recent years studies on the role of epigenetics in shaping host-symbiont interactions have been flourishing. Epigenetic changes include, but are not limited to, DNA methylation, remodelling of chromatin structure through histone chemical modifications and RNA interference. In this E-book we present a series of papers exploring the fascinating developmental and evolutionary relationship between symbionts and hosts, by focusing on the mediating epigenetic processes that enable the communication to be effective and robust at both the individual, the ecological and the evolutionary time scales. In particular, the papers consider the role of epigenetic factors and mechanisms in the interactions among different species, comprising the holobiont and host-parasite relationships. On the whole, since epigenetics is fast-acting and reversible, enabling dynamic developmental communication between hosts and symbionts at several different time scale, we argue that it could account for the enormous plasticity that characterizes the interactions between all the organisms living symbiotically on our planet.

**Citation:** Negri, I., Jablonka, E., eds. (2016). Epigenetics as a Deep Intimate Dialogue between Host and Symbionts. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-875-7

## Table of Contents


Elena Gómez-Díaz, Ana Rivero, Fabrice Chandre and Victor G. Corces

## Editorial: Epigenetics as a Deep Intimate Dialogue between Host and Symbionts

Ilaria Negri <sup>1</sup> \* and Eva Jablonka<sup>2</sup>

<sup>1</sup> Koiné - Environmental Consulting S.n.c., Parma, Italy, <sup>2</sup> The Cohn Institute for the History and Philosophy of Science and Ideas, Tel-Aviv University, Tel-Aviv, Israel

Keywords: epigenetics, holobiont, host-symbiont crosstalk, pathogen, DNA methylation, histone modifications, chromatin re-modeling, genome immunity

**The Editorial on the Research Topic**

**Epigenetics as a Deep Intimate Dialogue between Host and Symbionts**

### HOST-SYMBIONT EPIGENETIC CROSSTALK: A "KOINÉ LANGUAGE" THAT ENABLES COMMUNICATION BETWEEN DIFFERENT SPECIES

In ancient Greek, the term koiné refers to a common language, which spread in the Eastern Mediterranean region and the Near East following the conquests of Alexander the Great. In the 4th century BC, koiné quickly became a shared language that enabled people speaking different dialects to communicate with each other. Indeed, effective communication is always necessary for overcoming "barriers" between individuals and groups, tightening relationships, and enabling new ways of coping with a changing world. This is the case not only for interactions involving cultural relations. Different living entities communicate in many ways, with the most intimate and often the most long-lasting communication being the formation of mutually beneficial, obligatory symbiotic associations. As Lynn Margulis forcefully argued, symbiotic relations are found among virtually all living organisms, and were critical for the origin and diversification of the eukaryotes (Sagan, 1967; Margulis, 1971; Guerrero et al., 2013). Since the partners in symbiotic relationships frequently belong to different kingdoms, and the intimacy may be so deep that the Symbionts reside in the tissues and/or cells of the host, their shared "language" should be a basic—and ancient—form of communication. Such effective communication blurs the boundaries between different living entities, giving rise to a single biomolecular network, a "holobiont" with a "hologenome" (Zilber-Rosenberg and Rosenberg, 2008; Bordenstein and Theis, 2015), thus problematizing the conventional notion of individuality. The series of papers presented in this topic-issue explore the fascinating developmental and evolutionary relationship between Symbionts and hosts, by focusing on the mediating epigenetic processes that enable the communication to be effective and robust at both the individual, the ecological, and the evolutionary time scales.

One of the currently most researched cases illustrating the productive cross-talk between Symbionts and hosts is the symbiosis between mammals and their gut microbiome. As Gilbert stresses, the birth of a mammal is not merely the origin of a new distinct (traditional) individual, but the onset of a new community. Already in utero, the fetus interacts with a network of symbiotic

Edited and reviewed by: Michael E. Symonds, The University of Nottingham, UK

\*Correspondence: Ilaria Negri ilarianegri@koineambiente.com

#### Specialty section:

This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics

Received: 17 December 2015 Accepted: 20 January 2016 Published: 09 February 2016

#### Citation:

Negri I and Jablonka E (2016) Editorial: Epigenetics as a Deep Intimate Dialogue between Host and Symbionts. Front. Genet. 7:7. doi: 10.3389/fgene.2016.00007

factors that is provided by the mother and, at birth, as it leaves the maternal symbiotic association system, it forms its own community which, although largely based on the legacy it gets from its mother, becomes rapidly adapted to its specific idiosyncratic conditions of life. Since the normal development and the thriving of the newborn depend on the symbiotic legacy it receives, it is the holobiont the community of interacting species—that is the target of both developmental and natural selection. However, as Fridmann-Sirkis et al. illustrate, in extreme circumstances, when interactions fail, for example, because of acute stress to the microbiome, the host adapts by making adaptive changes in its epigenome, which can be inherited by subsequent generations. In more normal conditions, the maternal legacy provides crucial developmental resources, and is the basis on which flexible, context-sensitive, modifications of these resources, that allow rapid adaptation to the normally dynamic and fluctuating conditions in which the new individual lives, are constructed.

Since the microbiome reacts and evolutionarily adapts much faster than the host, most host's physiological adaptations are likely to be initiated by evolutionary changes to the constitution of the microbiome. Soen analyzes the manner in which the changes in the constitution of the microbiome lead to the modification of the communication process among the species comprising the microbiome, as well as between the microbiome and the host; and shows how these changes enable the developmental construction of a new dynamic, adaptive equilibrium, involving mutually beneficial epigenetic modifications in the host that can be epigenetically inherited between host generations. Hence, adaptive evolutionary modifications in the rapidly evolving microbiome, that occur at the time scale of host's individual life, are followed by the epigenetic inheritance of the host's physiological adaptations, thus enabling adaptive adjustments of the host at the ecological time scale (Soen; see also Moran and Sloan, 2015).

Although many symbiotic associations play essential roles in host development, physiology, and health, some associations may be harmful for the host. In these cases, the "selfish" Symbiont (often referred to as "parasite") must evade the host's immune responses, and this may occur through "misleading" the host, for example, by mimicking the host's signaling and control system for its own benefits. Some intracellular pathogenic bacteria, such as Anaplasma phagocytophilum a pathogen studied by Sinclair et al. encode eukaryotic-like proteins (e.g., the nucleomodulin AnkA) for subverting the host cells metabolism by recruiting chromatinmodifying enzymes or by altering the folding patterns of chromatin that bring distant regulatory regions together to coordinate control of transcriptional reprogramming. Other epigenetic factors are SET-domain containing proteins, which are known to modify chromatin structure in eukaryotic cells. Alvarez-Venegas shows that SET domain genes, which meditate host-symbiont epigenetic interactions, and that have been identified in several bacterial genomes on the basis of their similarity to the SET domains of eukaryotic histone methyltransferases, allow pathogens to inhibit transcriptional activation of host defense genes. Indeed, there is plenty of evidence supporting the role of epigenetic mechanisms (e.g., DNA methylation, mechanisms that re-model chromatin structure through histone modifications, and mechanisms underlying RNA interference) both in pathogen plasticity and pathogen-induced alterations of the host (Gómez-Díaz et al., 2012). One such example is the molecular pathogenesis of Epstein–Barr virus (which induces diverse lymphoid and epithelial malignancies) discussed by Niller et al. that is accompanied by epigenetic alterations of both the viral and the host genomes.

Some "parasites" promote genetic exchange with the host genome. According to many studies, the exchange of human and viral genes must have happened repeatedly during evolution, and this can account for the high degree of homology between many viral and human genes. Niller et al. point to the example of the thymidylate synthase gene in the genome of Herpesvirus saimiri, which has 70% amino acids homology to that of the human gene, and Shapiro reviews and discusses the already massive and still growing evidence showing that such invasions into a host genome through virally-transmitted genetic elements (transposons, retrotransposons, and genomic proviruses) rewire the hosts' genomic networks, rearrange the host's genome and exert new types of selection pressures on it. The evolution of epigenetic silencing strategies, that prevent mobile DNA from destroying the integrity of the host genome and that have epigenetic heritable effects, is one such genomic adaptation. Indeed, it has been argued that one of the main functions of epigenetic mechanisms is a form of "genome immunity," promoting protection against foreign genetic elements. Sagy et al. suggest that this cellular immune function may explain the systematic failure of cloning some eukaryotic genomes in bacteria. For example, about 20% of the nematode Caenorhabditis elegans genome is not clonable in Escherichia coli, but can be cloned in yeast, a eukaryote. Interestingly, the bacterially unclonable sequences are enriched with repetitive DNA transposons and PIWIinteracting small RNAs (piRNAs). In worms and many other organisms (other animals and protists), piRNA-mediated RNA interference has a role in genome surveillance against foreign sequences. Sagy et al. therefore propose that piRNAs may act in trans to eliminate bacteria (E. coli), thus preventing its ability to be cloned within the bacterium. As they note, this role of piRNAs may have therapeutic applications, and could be used against pathogenic bacteria. Another potential epigeneticmedical intervention can be based on a genome-wide profiling of histone modifications combined with gene expression in the human malaria vector Anopheles gambiae. Targeting changes in histone modification and transcription of genes that regulate malaria transmission may provide, as Gómez-Díaz et al. point out, new type of malaria control. Deciphering the mechanisms underlying the molecular "dialogue" between hosts and symbionts therefore seems of major importance for many research areas including medicine, genetics, cell biology, zoology, microbiology, evolutionary biology, and ecology.

#### CONCLUDING REMARKS AND PERSPECTIVES

The papers published in this topic-issue, which focus on the role of epigenetic factors and mechanisms in the interactions among the different species comprising the holobiont, point to important research directions. Epigenetic mechanisms enable dynamic developmental communication, an epigenetic koiné, between hosts and symbionts at several different time scales. We are confident that the molecular-epigenetic

#### REFERENCES


technologies already available—and those that are rapidly being developed—will provide important insights into the evolution and development of the organisms on our planet, whose history and future are based on ongoing communication and interactions.

#### AUTHOR CONTRIBUTIONS

All authors listed have made substantial, direct and intellectual contribution to the work, and approved it for publication.

Zilber-Rosenberg, I., and Rosenberg, E. (2008). Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution. FEMS Microbiol. Rev. 32, 723–735. doi: 10.1111/j.1574-6976.2008. 00123.x

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Negri and Jablonka. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

### A holobiont birth narrative: the epigenetic transmission of the human microbiome

#### *Scott F. Gilbert 1,2\**

*<sup>1</sup> Department of Biology, Swarthmore College, Swarthmore, PA, USA <sup>2</sup> Biotechnology Institute, University of Helsinki, Helsinki, Finland*

#### *Edited by:*

*Eva Jablonka, Tel-Aviv University, Israel*

#### *Reviewed by:*

*Ian C. G. Weaver, Dalhousie University, Canada Margaret Jean McFall-Ngai, University of Wisconsin-Madison, USA*

#### *\*Correspondence:*

*Scott F. Gilbert, Biotechnology Institute, University of Helsinki, 500 College Avenue, Swarthmore, Helsinki, PA 19081, Finland e-mail: sgilber1@swarthmore.edu*

This essay plans to explore, expand, and re-tell the human birth narrative. Usually, human birth narratives focus on the origins of a new individual, focusing on the mother and fetus. This essay discusses birth as the origin of a new community. For not only is the eukaryotic body being reproduced, but so also are the bodies of its symbiotic microbes and so is the set of relationships between these organic components. Several parts of the new narrative are surprising: (1) bacterial symbionts might cause some of the characteristics of pregnancy and prepare a symbiotic community for transfer; (2) the first bacterial colonizers of the mammalian organism my enter the fetus prior to the lysing of the amniotic membrane and birth; (3) the same signals that often cause immunological attack against a microbe may serve under these conditions to signal homeostatic stability between symbiont and host; and (4) the mother may actively provide substances that promote the growth and settlement of helpful bacteria. The birth of the holobiont exemplifies principles of co-evolution, co-development, niche construction, and scaffolding. Birth is nothing less than the passage from one set of symbiotic relationships to another.

**Keywords: holobiont, birth, symbiosis, colonization, individuality**

#### **RETHINKING THE BIRTH NARRATIVE**

This essay plans to explore, expand, and re-tell the human birth narrative. Usually, human birth narratives focus on the origins of a new *individual*, focusing on the heroic trevails of the mother or the amazing journey of fetus. I wish to discuss birth as the origin of a new *community*<sup>1</sup> . For not only is the eukaryotic body being reproduced, but so also are the bodies of its symbiotic microbes and so is the set of relationships between these organic components. Not only are the nuclear and mitochondrial genomes being transmitted, so are the genomes of the symbiotic community, whose microbial genes outnumber those of the eukaryotic component by over 100-fold (Human Microbiome Project Consortium, 2012; McFall-Ngai et al., 2013). Birth is nothing less than the passage from one set of symbiotic relationships to another. The holobionts (mother, fetus, infant) are preserved, but the components of these consortia have changed.

Mammalian birth is one of the great biological stories—the fundamental mammalian symbiosis of mother-and-fetus becoming another fundamental mammalian symbiosis, that of motherand-child or mother-and-pup. In the classical story, the two partners are obviously the mother and the conceptus (zygote, embryo, fetus, child). Their interactions constitute the grand idea behind the obstetric subspecialty of Maternal-Fetal Medicine, which justifies its existence through the claim that the mother and fetus constitute a co-organized interacting whole, the "maternalfetal unit." Indeed, they are a symbiotic unit, a system, where the treatment of one affects the physiology of the other. The mother influences the fetus, providing it with nutrition, oxygen, antibodies, and hormones for its growth. The fetus reciprocally influences the mother, changing her blood circulation, immune responsiveness, and metabolism, while providing her with hormones to retain pregnancy. The physiology of the mother changes as the pregnancy continues, with both the mother and the fetus producing hormones and other growth factors to influence the other's survival and development.

But there is a third major player in this symbiotic mix: the mother's microbes. The pregnant mammal is herself a symbiotic community, a holobiont, composed of numerous species, most of them bacterial (Rosenberg et al., 2007; Gilbert et al., 2012). Nine out of every 10 of the mother's cells are microbial (Bäckhed et al., 2005; Ley et al., 2006) and metagenomic sequencing has shown that each human has entered into partnerships with over 150 species of bacteria (Qin et al., 2010). These bacteria are actively metabolizing nutrients, and the blood being received by the fetus has been substantially altered by the mother's microbes. About 30–35% of the metabolites in mammalian blood has a bacterial origin (Wikoff et al., 2009; McFall-Ngai et al., 2013). Different microbes metabolize dietary products differently, and different diets promote the population of different microbial communities within the mother (Turnbaugh et al., 2006, 2008; Frankenfeld et al., 2014). In mice, for instance, nearly all the blood-born serotonin is made by symbiotic bacteria (Wikoff et al., 2009). So the fetus is not free of a mother's symbiotic associations, even if it is thought that the fetus is sterile or persists in a sterile environment. Rather, the microbes of the mother are interacting with it.

<sup>1</sup>Always be wary of male scientists telling birth narratives.

So the fetus survives and develops in a network of symbiotic relationships provided by the pregnant mammal. When the infant/pup is born, these young animals will be going from one set of symbiotic associations to another. They will be leaving the symbiotic consortium of the mother and forming their own symbiotic consortium. They will not be "independent organisms," nor will they ever be. One is always a holobiont. But at birth, one has to pass through different symbiotic associations. Birth is the process of leaving of one symbiotic association system and forming another.

And remarkably, this transition and transmission is mediated by the maternal holobiont. This transition is not as stark as might be imagined. Rather, the mother is both actively and passively engaged in providing the symbiotic community that will persist over the life of the newborn. The standard scientific birth narrative has been that the fetus is growing in sterile conditions within the amnion, and that bacterial colonization of the fetus is made possible as soon as the amnion breaks and the fetus travels through the birth canal (see Funkhouser and Bordenstein, 2013 for a history and critique of this view). Then, according to this hypothesis, as the fetus traverses the cervix and vagina, the bacteria residing there can enter this new body, whose immune system is not sufficiently mature to attack them. Somehow, though, only certain bacteria can be let into this body, while other bacteria must be kept out. Moreover, only certain areas (gut, pores, mouth, etc.) will be allowed to retain persistent colonies, and these places will select for different types of bacteria.

I wish to critique this standard model more thoroughly, for recent studies find several more interactive and important roles for the mother. The symbiotic microbiome must be understood as constituting a third set of inherited genes. In addition to the nucleus and mitochondria, the symbiotic microbiome is passed from one generation to the next (see Moran, 2007; Douglas, 2010; Gilbert, 2011). This inheritance can be vertically in the germ line (as is often the case in invertebrates) or horizontally by infection (as is often seen in vertebrates). In some cases, both types of transmission are used. In mammals, where the germline is not seen to contain symbiotic bacteria, the newborn acquires its symbionts from its immediate environment. However, the mammalian fetus does not just leave the uterus and passively acquire a new set of symbionts. Rather, the mother actively passes the symbiotic baton to the developing fetus, and she doesn't relinquish control as rapidly and immediately as one might expect from the standard story. Indeed, the colonization of the body, along with the first breath that changes the circulatory system of the newborn, is possibly the most important biological aspect of birth, and the mother will be playing an active role in this process.

#### **PREPARATIONS FOR DELIVERY**

The pregnant mother (i.e., the maternal holobiont) changes dramatically during pregnancy, as the body undergoes hormonal, immunological, and metabolic changes. These include fat gain starting early in pregnancy, and insulin resistance later in pregnancy. These two metabolic conditions, which are often detrimental to men and to non-pregnant women, are thought to be beneficial during pregnancy. At this time, increased adiposity and increased insulin resistance are thought to support fetal growth and to prepare the mother for lactation (Di Cianni et al., 2003; Lain and Catalano, 2007; Nelson et al., 2010). As we will see, this preparation for lactation is critical for the handover of symbiotic community from the mother to the infant.

Employing 91 women and assuming that stool samples accurately reflect the intestinal microbiota, Koren et al. (2012) used polymerase chain reactions to show that the gut microbiome of pregnant women changed dramatically during pregnancy. These included women who took probiotics during pregnancy and who had used antibiotics during either the first or second trimester. In most of the women, the first-trimester gut bacteria were similar to that of the general non-pregnant population, but the thirdtrimester samples differed significantly. In a majority of women, the relative abundances of Proteobacteria and Actinobacteria increased substantially (*P* = 0*.*0004 and 0.003, respectively) during this time between 13 and 33 weeks of pregnancy. Moreover, the microbial community became more streamlined, with the diversity much reduced by the third trimester.

The importance of these bacteria to normal pregnancy was demonstrated by transferring the bacteria from the healthy first-trimester and third-trimester women into healthy female germ-free mice. The mice receiving the bacteria from the stools of first-trimester pregnant women remained normal. However, within 2 weeks, the healthy formerly germ-free mice that received the third-trimester bacteria had a pregnancy-like metabolic syndrome, complete with insulin insensitivity, excessive weight gain, and increased markers of inflammation.

These bacteria were derived from the gut. The vaginal bacterial community has also been analyzed (Aagaard et al., 2012; Romero et al., 2014) and was found to have a dynamic pattern during gestation, returning to an essentially non-pregnant state toward the end of pregnancy. *Lactobacillus* species, however, appear to be enriched during pregnancy. Several enriched Lactobacci species digest glycogen, and they produce an acidic environment that prevents pathogenic infection (O'Hanlon et al., 2011). One particular Lactobacillus, *L. johnsonii* is found in the gut and vagina. In the gut it is an important component of the upper digestive tract and is critical for the processing of bile salts (Pridmore et al., 2004). However, *L. johnsonii* also produces a bacteriotoxic compound, Lactacin F, which prevents the growth of particular bacterial pathogens (Abee et al., 1994). So the vaginal microbiota appear to be helping the mother ward off infections of the reproductive tract during pregnancy.

Thus, there is a dramatic remodeling of the gut and vaginal microbiological communities over the course of pregnancy. Although the mechanisms have not been delineated, it appears that the hormonal changes of the host are changing the population of microbes in the gut and vagina. These are the microbial populations that will be experienced by the late-stage fetus as it leaves the birth canal.

#### **INITIAL COLONIZATION: THE SOONERS AND THE INITIATION OF LABOR**

It has long been assumed (Tissier, 1900) that the fetus develops within a sterile environment, and that when the amnion bursts during labor, the colonization could begin. The first microbes would reach the fetus as it was being born. These would be the resident microbes of the birth canal. Later, microbes from the mother's breast and skin would be in line for colonizing the newborn baby.

This would be the obvious way. However, there appear to be "sooners."2 New evidence affirms that the first settlers of this newfound fetal territory are colonists from the mother's microbiome that gain entry into the developing fetus, bypassing the placental and amniotic barriers. Strangely, bacteria have been found in normal amniotic fluid, in the umbilical cord, and in infant's first bowel movements, the meconium (see Funkhouser and Bordenstein, 2013). This would indicate that the bacteria were already in the fetus before birth. To experimentally observe whether maternal bacteria could be transferred into the fetus, Jimenez et al. (2008) fed pregnant mice milk that had been inoculated with genetically labeled *Enterococcus faecum* bacteria. A day before the mice were to be naturally born, the researchers performed a Caesarean section on the mice, delivering them aseptically. They found that their first bowel movement not only contained bacteria, but that some of the bacteria had the transgenic label that could only have been received through the oral cavity or gut of their mothers.

However, while the vaginal microbiota might prevent pathogenic infection, and while the gut bacteria may induce a metabolic syndrome in the mother, these sources do not appear to be where the initial colonizers are coming from. Surprisingly, data suggest that the first colonizing bacteria arise from the mouth and then work their way into the fetus while it is still within the amnion. Molecular studies also indicated that the early colonization of human neonates appears to be accomplished by bacteria originating from the oral cavity (Palmer et al., 2007; Human Microbiome Project Consortium, 2012; Jost et al., 2012; Milisavljevic et al., 2013). According to sequencing data (Stout et al., 2013; Aagaard et al., 2014; Prince et al., 2014), the neonatal gut microbiota do not resemble the maternal vaginal or gut microbiota, but contain populations of bacteria derived from a placental source that stems from the oral cavity. Moreover, specific bacteria that are found normally or pathologically in the oral cavity (and not in the lower gut or vagina) have been isolated from human amniotic fluid (Ernest and Wasilauskas, 1985; Douvier et al., 1999; Bearfield, 2002; and Han et al., 2004). Bacteria were formerly thought to be found in placentae only in those mothers at risk for preterm labor. However, Stout et al. (2013) have questioned this idea by identifying intracellular bacteria in normal term and preterm placentae3 .

The mechanism by which oral bacteria can get to the placenta is not yet known. However, one possibility is that *dendritic* *cells* of the oral cavity transport bacteria to lymphatic tissue in the placenta (see Donnet-Hughes et al., 2010; Funkhouser and Bordenstein, 2013). The oral mucosa contains numerous populations of dendritic cells, and these cells migrate through the blood and lymphatic vesicles to lymphoid tissues to mediate tolerance or immunogenicity (Hovav, 2014). When reaching the lymphoid tissues, the dendritic cells can diapedese across the endothelial cells into the lymphoid tissues (see de la Rosa et al., 2003; Johnson and Jackson, 2014). In many cases, they transport bacteria or other potential pathogens with them, and they present these microbial cells to the lymphocytes. The uterine decidua has a population of resident lymphocytes, and these cells are essential for normal implantation and the lack of rejection of the fetus (Blois et al., 2004; Juretic et al., 2004; Laskarin et al., 2007; Zarnani et al., 2008). So the oral cavity has a mucosa with associated dendritic cells, and the placenta has a lymphoid region capable of receiving dendritic cells. Recently, it was shown that dendritic cells carrying pathogens in them can migrate to the placenta and enable their parasitic passengers to infect the fetus. The transplacental passage of the intracellular *Toxoplasmosis*-like parasite *Neospora caninum* in mice appears to be facilitated by such dendritic cells. Inoculation of pregnant mice with dendritic cells infected with *Neospora* resulted in the migration of these dendritic cells to the placenta, the transmission of the parasite to the offspring, and often the resulting neonatal death (Collantes-Fernandez et al., 2012). There is therefore a pathway by which bacteria in the oral cavity can be transported to the placenta.

But the innate immune system of the fetus should prevent these bacteria from entering the fetal gut. The proteins that regulate immediate ("innate") immune responses against bacteria in the adult are the Toll-like receptors (TLRs). Interestingly, the activity of these receptors appears to be down-regulated in the fetus. The amniotic fluid, in addition to providing suspension and anti-dessication protection to the embryo, also contains large concentrations of Epidermal Growth Factor. This protein prevents the function of the TLRs. So while the early digestive tract is being bathed by amniotic fluid (the mouth and anus are open and exposed to amniotic fluid), the bacteria might be accepted as colonizers (Good et al., 2012). So there is a pathway through which bacteria in the oral cavity can become the first colonizers of the fetus, even before the amniotic membrane has lysed.

#### **THE SECOND WAVE: THE COLONIZATION OF THE COLON**

When John Donne famously wrote that "No man is an island," he was fully correct from the sociological perspective. But from the bacterial perspective, a man is a remarkable island, and the rules of island colonization hold for bacteria (Costello et al., 2012). Those bacteria who arrrive first get a wide choice of options, and they restrict the conditions for the next wave of settlers. Once the amnion has broken, the fetus is exposed to a wide variety of microbes, mostly from the gut and birth canal. Vaginal delivery exposes the fetus and newborn to the microbes of the mother's vagina and gut, a microbiome that has changed over the course of pregnancy (Tannock et al., 1990; Makino et al., 2011). These bacteria appear to be very important, as babies born through Caesarean section (i.e., not passing through the birth canal) have an altered bacterial colonization pattern early in life compared

<sup>2&</sup>quot;Sooner" comes from an American slang for those European settlers who entered the Oklahoma Territories of the United States prior to the legal opening of the land.

<sup>3</sup>This has led Prince et al. (2014) to speculate that these bacteria may be involved in the aberrant timing of parturition. It had been thought that preterm births were caused by infections ascending from the vagina. However, recent studies (Madianos et al., 2013) shows that these bacteria are not representative of pathogenic vaginal microbes but are most likely a set of oral microbes. Moreover, periodontal pathogens might reach the fetus and cause preterm births or fetal illness. These, according to Prince and colleagues, would be those that had colonized the placenta.

with vaginally delivered babies (Ley et al., 2006; Makino et al., 2013). These gut and vaginal bacteria initiate new host-symbiont relationships, and they will have an important role in the health of the holobiont (Conroy et al., 2009; Le Huërou-Luron et al., 2010). Babies born vaginally have gut bacterial communities that resemble those of the maternal vagina (Matsumiya et al., 2002; Dominguez-Bello et al., 2010). Those babies born by Caesarean section receive many of their colonizers from the hospital environment and from the mother's skin (Martirosian et al., 1995; Dominguez-Bello et al., 2010). Mother's milk also supplies bacteria that will reside in the newborn's gut (Martín et al., 2003; Collado et al., 2009; Solís et al., 2010; Garrido et al., 2012).

It takes over a year for the babies born by C-section to have a similar bacterial profile, and during this time, they have a lower microbial diversity, delayed colonization of important microbes (such as *Bacteroides*) and reduced lymphocyte responses (Guarner and Malagelada, 2003; Jakobsson et al., 2014). The ability of the fetus to decide which bacteria stay and which ones must be excluded is still very much a mystery. The process of colonization appears to involve many of the same molecules that are usually used to attack bacteria. It seems that at the core of either acceptance or rejection is recognition, and the reaction (rejection or tolerance) depends on the context in which the fetus receives these microbes.

Immunology is the science of recognition in contexts. The same signal may call out for destruction or synthesis depending on how it is presented. Here, the host recognizes symbiotic bacteria apparently by the same sets of molecules usually used to attack bacteria; but the turns this recognition into acceptance rather than attack, and in this particular context, these bacteria are actually encouraged to settle into our guts (Chu and Mazmanian, 2013; Lee et al., 2013). This, along with the bacterial regulation of pregnancy weight gain and the entry of bacteria into the fetus prior to tocolysis, has been another surprise. The innate immune system had been thought to recognize bacteria by their pathogen-associated molecular patterns (PAMPs) by their pattern recognition receptors (PRRs). It now turns out that PAMPs are found in all microbes, including symbionts, and the agents recognized by the host's PRRs are now called "microbe-associated molecular patterns" (MAMPs). In mice (as well as in numerous other organisms, including hydra and *Drosophila*), the activation of PRRs in the newborn induces a homeostatic integration of host and symbiont. It appears that the composition of the early microbiome dictates whether the response of the cells is inflammatory or tolerant (See Chu and Mazmanian, 2013).

#### **THE POWER OF POSITIVE SELECTION**

Now that the newborn's body is being colonized by microbes originating from the gut, vaginal, and oral cavities of the mother, the problem becomes one of specificity: Which bacteria are going to remain and which are to be eliminated? This is a critical question, and one in which research is just beginning. However, some of the research is pointing out how important the mother is in determining which populations persist. One of the most important bacteria present in the mother's intestines is *Bifidobacteria*. These bacteria provide several services to the neonate (Garrido et al., 2012). First, they actively prevent the colonization of the gut by pathogenic bacteria and help induce and sustain the immune system. Second, they provide essential vitamins to the infant (Lievin et al., 2000; Schell et al., 2002; Fukuda et al., 2011). They also increase the tight junctions that are necessary for tightly linking the intestinal epithelial cells together (Chichlowski et al., 2012). Surprisingly, this tightening of intestinal epithelial binding may be essential for the cognitive health of the infant (Hsiao et al., 2013). Bifidobacteria is a good bacterium to have as one of the colonizers of the gut, and genetic evidence supports the idea that mammals and Bifidobacteria have a co-evolutionary history of helping each other for at least 200 million years.

These Bifidobacteria are encountered as the fetus squeezes through the birth canal. Makino et al. (2013) found that specific strains of *Bifidobacteria* become translocated from the mother's intestine to the newborn's gut. When delivered vaginally, monophyletic representatives of the woman's intestinal *Bifidobacteria* formed colonies in the newborn within 3 days of birth. This did not occur in those babies born by Caesarean section, where Bifidobacterial counts were lower after the first week (Makino et al., 2013).

And the mother supports the growth of the newborn's Bifidobacteria colonies. One of the most interesting components of a newborn's diet is mother's milk. And here came another surprise: some of the complex sugars found in human mothers' milk are not digestible by the infant. Rather, they serve as food for certain bacterial symbionts such as *Bifidobacteria* that help the infants' bodies develop (Sela et al., 2011; Zivkovic et al., 2011; Yoshida et al., 2012; Underwood et al., 2013).

The genome of one of the most common Bifidobacteria (*B. longum subspecies infantis*) has been sequenced and shown to contain a remarkable region of DNA—a series of genes linked together and dedicated to the intake and digestion of complex sugars found specifically in the earliest secretions of mother's milk. This small (43 kb) unit is not found in related bacteria that are not part of the gut microbiome, and these data indicate a remarkable co-evolution between this symbiotic bacterium and its host, suggesting that the host product (human milk) and the microbial genome enabling the bacteria to use this product have reciprocally formed each other (Sela et al., 2008). Thus, there would be a co-evolution of human and microbe for the purpose of gut colonization by this microbe, a colonization that would benefit both.

Not only do the sugars in mother's milk feed the "good guys," these oligosaccharides "prevent the pathogens from latching onto healthy cells, routing trouble-makers into a dirty diaper instead" (Bode, 2012; Manthey et al., 2014; Shugart, 2014). Mother's milk may also help symbiosis by instructing changes in the immune system through microRNAs in its lipid fraction (Munch et al., 2013).

Although each baby starts with a unique bacterial profile, within a year the types and proportions of bacteria have converged to the adult human profile that characterizes the human digestive tract (Palmer et al., 2007). Upon weaning and ingesting solid food, the bacterial population changes again, to the more adult form (Pantoja-Feliciano et al., 2013). Interestingly, the types of bacteria allowed to colonize depend on (1) the prevalence of a particular species of microbes in the environment; (2) which microbes have already entered the gut; (3) the genetics of the digestive tract; and (4) the diet one receives (Nicholson et al., 2012; Pacheco et al., 2012; Kashyap et al., 2013).

#### **HOLOBIONT PROCREATION**

So we have re-told the birth narrative from a holobiont perspective. In doing so, several surprising hypotheses have emerged, ideas that had not been part of the traditional account of pregnancy:


In this new narrative, we see "birth" as involving the reproduction of the holobiont. In other words, both the mammal and her persistent microbial populations have to be reproduced. Both niche construction and scaffolding are critical. *Niche construction* is defined as "the process whereby organisms, through their metabolism, their activities and their choices, modify their own and/or each other's niches" (Odling-Smee et al., 2003, p. 419). Our symbiotic bacteria employ such niche construction in specifying their environment by changing their host's development. The niches in which bacteria reside are to a large part generated by the bacteria, themselves. Some of the symbiotic microbes in the mouse intestine, for instance, induce gene expression in the gut epithelia not only to help the host, but to help themselves. The normal gut microbes, such as *Bacteroides*, induce gene expression in the Paneth cells of the intestine, instructing these cells to produce two compounds—angiogenin-4 and RegIII—that prevent the colonization of the intestine by *other* species of microbes. *Bacteroides, Escherichia coli*, and other symbiotic species are impervious to this compound, while several pathogenic Grampositive bacteria (*Enterococcus faecalis* and *Listeria monocytogenes*) are wiped out by it (Hooper et al., 2003; Cash et al., 2006). These bacteria are enemies of Bacteroides and of the mammalian host. Thus, the microbial species is modifying its niche, causing its environment (i.e., the mammal) to change in such a way that they can better survive.

*Scaffolds* are "material environmental inputs with organizations that are sensitive and responsive to the developmental state of the developmental system being scaffolded" (Griesemer, 2014). A scaffold facilitates developmental processes that would be difficult or costly without it, and the scaffold is often temporary. Such scaffolds are critical and reciprocal parts of the holobiont's birth. First, we've seen that the bacteria are part of the scaffolding that allows human reproduction. If some of the "symptoms" of late pregnancy that support the fetus and its delivery are caused by bacteria, then the bacteria is part of the scaffolding of human reproduction. And if the milk sugars of the mother and the modified immune system of the newborn enable the successful reproduction of a particular set of bacteria (that will enable the completion of the developmental capacities of the newborn), then humans are a critical scaffolding for the reproduction of the microbes.

The mother, through her hormones, anatomy, and milk production, is in large part responsible for the successful handing over of the fetus to a new set of symbionts. Going from the maternal environment to the outside world is not merely leaving a symbiotic support system and gaining "independence." There is no such thing as "independence." It's mutual dependency all the way down, and birth is the exchanging of one symbiotic system for another.

#### **ACKNOWLEDGMENTS**

The idea for this paper came from conversations in 2012 at the University of California, Santa Cruz, with Martin Jacobs, who suggested birth as mediating between two symbiotic systems.

#### **REFERENCES**


adaptations for milk utilization within the infant microbiome. *Proc. Natl. Acad. Sci. U.S.A*. 105, 18964–18969. doi: 10.1073/pnas.0809584105


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 July 2014; accepted: 31 July 2014; published online: 19 August 2014. Citation: Gilbert SF (2014) A holobiont birth narrative: the epigenetic transmission of the human microbiome. Front. Genet. 5:282. doi: 10.3389/fgene.2014.00282 This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Gilbert. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Delayed development induced by toxicity to the host can be inherited by a bacterial-dependent, transgenerational effect

#### *Yael Fridmann-Sirkis 1†, Shay Stern1†, Michael Elgart 1†, Matana Galili 1, Amit Zeisel 2, Noam Shental <sup>3</sup> and Yoav Soen1 \**

*<sup>1</sup> Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel*

*<sup>2</sup> Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel*

*<sup>3</sup> Department of Computer Science, The Open University, Raanana, Israel*

#### *Edited by:*

*Ilaria Negri, University of Turin, Italy*

#### *Reviewed by:*

*Abhijit Shukla, Harvard Medical School, USA Ilaria Negri, University of Turin, Italy*

#### *\*Correspondence:*

*Yoav Soen, Department of Biological Chemistry, Weizmann Institute of Science, Ullmann Building, Rehovot 76100, Israel e-mail: yoavs@weizmann.ac.il*

*†These authors have contributed equally to this work.*

Commensal gut bacteria in many species including flies are integral part of their host, and are known to influence its development and homeostasis within generation. Here we report an unexpected impact of host–microbe interactions, which mediates multi-generational, non-Mendelian inheritance of a stress-induced phenotype. We have previously shown that exposure of fly larvae to G418 antibiotic induces transgenerationally heritable phenotypes, including a delay in larval development, gene induction in the gut and morphological changes. We now show that G418 selectively depletes commensal *Acetobacter* species and that this depletion explains the heritable delay, but not the inheritance of the other phenotypes. Notably, the inheritance of the delay was mediated by a surprising trans-generational effect. Specifically, bacterial removal from F1 embryos did not induce significant delay in F1 larvae, but nonetheless led to a considerable delay in F2. This effect maintains a delay induced by bacterial-independent G418 toxicity to the host. In line with these findings, reintroduction of isolated *Acetobacter* species prevented the inheritance of the delay. We further show that this prevention is partly mediated by vitamin B2 (Riboflavin) produced by these bacteria; exogenous Riboflavin led to partial prevention and inhibition of Riboflavin synthesis compromised the ability of the bacteria to prevent the inheritance. These results identify host–microbe interactions as a hitherto unrecognized factor capable of mediating non-Mendelian inheritance of a stress-induced phenotype.

**Keywords: host–microbe interactions, non-Mendelian inheritance,** *D. melanogaster***, epigenetics, stress responses**

#### **INTRODUCTION**

Transgenerational epigenetic phenomena have been reported in a number of different animal and plant species (Cavalli and Paro, 1998; Morgan et al., 1999; Kaati et al., 2002; Sollars et al., 2003; Anway et al., 2005; Cropley et al., 2006; Molinier et al., 2006; Lumey et al., 2007; Xing et al., 2007; Greer et al., 2011; Rechavi et al., 2011; Seong et al., 2011; Crews et al., 2012; Stern et al., 2012). These events may be induced by the environment, and have been shown to involve a variety of host-intrinsic factors and molecular pathways (Jablonka and Raz, 2009), including chromatin modifiers (Cavalli and Paro, 1998; Greer et al., 2011; Seong et al., 2011), DNA methylation (Xing et al., 2007; Heijmans et al., 2008; Carone et al., 2010; Guerrero-Bosagna et al., 2010; Schmitz et al., 2011), RNAi machinery (Rechavi et al., 2011; Ashe et al., 2012; Buckley et al., 2012), and secreted signals (McCaffery et al., 1998). Involvement of other host intrinsic or extrinsic factors is also possible but has been largely overlooked. Of these, the commensal gut microbiome is a particularly attractive candidate. The gut microbiome is an integral part of the development and homeostasis of its host (Rosenberg and Zilber-Rosenberg, 2011; Charroux and Royet, 2012; Buchon et al., 2013), but nonetheless, can be extensively modified by the environment. As with other environmental perturbations, disruption of the gut microbiome may lead to changes in the host which extend beyond one generation. However, potential multigenerational impacts of commensal gut bacteria on host development and physiology have not yet been investigated because typical studies assume that these bacteria affect each generation in a similar way.

Here, we examine multigenerational impacts of microbial changes using a model based on exposure of fly larvae to G418 toxicity in specific tissues. We have recently shown that this exposure can induce developmental phenotypes which persist for multiple generations in non-exposed offspring (Stern et al., 2012). In this system, we supplemented the larval food with G418 antibiotic and placed a resistance transgene fused to *GFP* (*neoGFP*) under the regulation of arbitrary, spatio-temporally restricted developmental promoters. This leads to toxic stress in tissues that are exposed to G418 but do not express sufficient levels of the "rescue," *neoGFP* gene. Exposure to G418 led to multiple phenotypes, including a delay in larval development, promoterdependent induction of *neoGFP* expression and morphological changes in two promoter cases. Moreover, some of the induced phenotypes persisted in a number of subsequent generations of non-exposed offspring (Stern et al., 2012). In particular, the delay in development and the induction of *neoGFP* expression were inherited at high penetrance and typically persisted for 3–10 generations without G418. One of the morphological phenotypes, wing abnormalities in the *Hsp70::neoGFP* case, was also heritable albeit at a much lower penetrance.

As has been previously shown, exposure of flies to an antibiotic (Chlortetracycline) can have a direct influence on the host tissue as well as an indirect effect mediated by an impact of the antibiotic on the commensal microbiome (Ridley et al., 2013). This rationale also applies to G418 which is an aminoglycoside which blocks polypeptide synthesis in both eukaryotic and prokaryotic cells. Thus, the above paradigm of G418-induced inheritance may provide a model for investigating potential contributions of host– microbe interactions to the inheritance of induced phenotypes in the host. We therefore, investigated the impact of G418 on the microbial composition and the resulting implications for induction and inheritance of responses in the host. We show that G418 leads to a selective depletion of commensal *Acetobacter* species. Removal of extracellular bacteria without exposure to G418 had an almost negligible effect on the first generation of bacterialdepleted larvae (F1), but nonetheless caused a considerable delay in larval development in the following generation (F2). The delay in offspring development following parental removal of gut bacteria was completely eliminated by re-introduction of a commensal *Acetobacter* species. Reintroduction of a commensal *Acetobacter* species also prevented the inheritance of the delay in development in offspring of G418-exposed flies. We further show that this prevention of the heritable delay is mediated in part by Riboflavin produced by the *Acetobacter* species.

These results show that environmental disruption of the gut microbiome can induce different effects in parents and offspring. They also uncover an unexpected scenario by which host–microbe interactions mediate the inheritance of delayed development in response to G418, namely: the delay in the parental generation is induced by a direct effect of G418 on the host tissue, but is maintained in non-exposed offspring by the transgenerational effect of *Acetobacter* depletion. Specifically, the *Acetobacter* depletion causes a modification in the parents which becomes phenotypic (delayed development) only in the offspring. We show that this transgenerational effect is responsible for the inheritance of the delay in development, but not for the inheritance of induced *neoGFP* expression and the inheritance of morphological changes.

The inter-generational difference between the rate of development in bacterial-depleted parents and offspring suggests that changes in the gut microbiome may influence the germline in the parents.

#### **RESULTS**

#### **G418 SELECTIVELY DEPLETES** *ACETOBACTER* **SPECIES FROM THE LARVAL GUT**

As a starting point for investigating the potential involvement of the gut microbiome in the response to G418, we tested if exposure to G418 modifies the composition of bacteria in the larva. We first analyzed the larval gut microbiome using an improved method of deep-sequencing of DNA coding for 16S ribosomal RNA (Amir et al., 2014). In line with recent findings (Wong et al., 2011), we identified in the gut of *hairy::neoGFP* larvae various *Acetobacter* and *Lactobacillus* spp. (**Supplementary Data Sheet 1**—Figure S1A, **Supplementary Data Sheet 2**). In addition to these extracellular species, we detected high abundance of the endosymbiont *Wolbachia*, which is known to exist in many lines of *D. melanogaster* (Bourtzis et al., 1996; Dobson et al., 1999; Veneti et al., 2003; McGraw and O'Neill, 2004) and is capable of manipulating various reproductive features of its host (Werren, 1997; Starr and Cline, 2002; Ikeya et al., 2009). The deep-sequencing analysis revealed strong reduction in the relative amount of *Acetobacter* spp. in G418-exposed larvae (**Supplementary Data Sheet 1**—Figure S1A). To validate this depletion, we developed a quantitative PCR-based assay capable of selectively measuring the total contents of *Acetobacter*, *Lactobacilus,* and *Wolbachia* spp. (**Supplementary Data Sheet 1**—Figures S1B–D). Measurement of the amounts of these three types of bacteria confirmed that G418 selectively depletes *Acetobacter* species (**Figure 1A**). Notably, the depletion of *Acetobacter* species persisted in non-exposed F2 offspring of G418 exposed flies (**Figure 1A**), indicating that the change in bacterial composition is itself heritable. Thus, G418 toxicity modifies the fly's microbiome by selectively depleting commensal *Acetobacter* species.

#### **THE DELAY IN DEVELOPMENT IS INHERITED BY A BACTERIAL-DEPENDENT, TRANSGENERATIONAL EFFECT**

We investigated whether the depletion of commensal bacteria might be responsible for the G418-induced phenotypes. We began by testing if removal of bacteria by egg dechorionation and sterilization (Brummel et al., 2004) can reproduce the delay observed in G418-treated larvae (Stern et al., 2012). We used the qPCR and bacterial growth assays to verify complete removal of extracellular bacteria (**Supplementary Data Sheet 1**–Figures S1E,F) and analyzed the influence of bacterial removal on pupation time. As previously demonstrated (Shin et al., 2011; Storelli et al., 2011), the effect of bacterial removal on pupation time depended on the food quality (**Supplementary Data Sheet 1**— Figure S2). Under the food conditions we have previously used (Stern et al., 2012), removal of bacteria led to a mild delay, which was considerably smaller than the delay caused by exposure to G418 (∼0.5 vs. 2–3 days, respectively; **Figure 1B**, **Supplementary Data Sheet 1**—Figure S3A). The lack of a considerable delay was consistent with previous work in which germfree larvae developed in high quality diet (Storelli et al., 2011). Thus, depletion of extracellular bacteria did not reproduce the G418-induced delay in the first generation. Nevertheless, analysis of F2 *hairy::neoGFP* larval offspring of bacterial-depleted F1 parents revealed a very substantial delay, comparable to the delay observed in offspring of G418 exposed parents (**Figure 1C**). This effect of bacterial removal was also observed in three non-transgenic lines (**Figure 1D**). The difference between the rate of development in bacterial depleted parents and offspring depended on the diet, and was mostly evident within a particular range of diet quality (**Supplementary Data Sheet 1**—Figure S2). Within this range, removal of gut bacteria had a transgenerational

SE of type-specific 16S DNA sequences extracted from guts of third instar larvae (*n >* 7). **(B)** Effect of egg dechorionation and sterilization ("Bacterial depleted") on the kinetics of pupation of *hairy::neoGFP* larvae in the same generation. Note the small effect compared to the large delay following exposure to G418. Mean fraction of pupae ± SE in 7 vials. Inset: Statistical Mean fraction of pupae ± SE in at least three biological replicates. **(D)** Effect of egg dechorionation and sterilization in F1 on the kinetics of pupation in F1 and F2 in three non-transgenic fly lines (yw, Canton-S, and Oregon-R). Inset: Statistical analysis of differences between fractions of pupae (FOP) in day 9. ∗*p <* 0*.*05, ∗∗∗*p <* 0*.*001 (Student's *t*-test).

effect that was not observed in the first generation of larvae lacking the bacteria. This transgenerational effect of bacterial depletion can account for the delay in development in non-exposed offspring of G418-exposed (and hence, *Acetobacter*-depleted) parents. As has been noted (Ridley et al., 2013), direct evidence can be obtained by antibiotic exposure of flies that had been depleted of their bacteria prior to the exposure. Applying this strategy in our system showed that removal of extracellular bacteria prior to G418 exposure led to significant reduction in the heritable delay (**Supplementary Data Sheet 1**—Figure S4A). This verifies the key contribution of bacterial depletion to the inheritance of delayed development in response to parental exposure to G418.

Other phenotypes reported by Stern et al., were largely unaffected by the mere depletion of bacteria. Removal of extracellular bacteria by egg dechorionation without exposure to G418 had no effect on the levels of *neoGFP* in the gut of the larvae (**Supplementary Data Sheet 1**—Figure S3B). Similarly, we did not observe any of the morphological influences of G418 exposure in bacterial depleted flies (**Supplementary Data Sheet 1**—Figures S3C,D). Genome-wide analysis of mRNA levels in the proventriculus of bacterialdepleted *hairy::neoGFP* larvae further revealed very small overlap with the response to G418 (only 8 and 6% of genes that were up- and down-regulated in G418 were similarly affected by bacterial removal; **Supplementary Data Sheet 1**— Figure S3E, **Supplementary Data Sheet 3**). As expected, the set of transcripts that were down regulated following dechorionation were enriched for genes involved in response to bacterium (*p <* 10−4; **Supplementary Data Sheet 1**—Figure S2F, **Supplementary Data Sheet 4**), e.g., *Attacin*, and *Defensin* (**Supplementary Data Sheet 1**—Figure S3J). The set of up-regulated transcripts, on the other hand, was strongly enriched with genes associated with phospholipase activity and polysaccharide metabolism (*p <* 10−<sup>8</sup> and *p <* 10−4, respectively; **Supplementary Data Sheet 1**—Figure S3F, **Supplementary Data Sheet 4**). These categories were not enriched in the set of proventriculus genes which responded to G418 (**Supplementary Data Sheet 1**—Figure S3F, **Supplementary Data Sheet 4**). Moreover, unlike in G418-exposed larvae (Stern et al., 2012), the depletion of bacteria did not down-regulate Polycomb group (PcG) genes (**Supplementary Data Sheet 1**—Figures S3G,H) and did not induce the anti-detoxification, *GstD* genes (**Supplementary Data Sheet 1**—Figure S3I). Further analysis of offspring of bacterial-depleted flies revealed that the difference between parents and their offspring with respect to the delay in development is not observed with respect to the other heritable phenotypes. Specifically, depletion of bacteria in F1 did not lead to induction of *neoGFP* expression in the larval gut of the offspring (**Supplementary Data Sheet 1**—Figures S4B,C), or to wing deformations in *Hsp70::neoGFP* offspring of bacterial depleted parents (**Supplementary Data Sheet 1**—Figure S4D).

Altogether, this shows that the transgenerational effect of bacterial depletion can account for the inheritance of the delay in development, but not for the other transgenerationally heritable phenotypes of G418 exposure.

#### **THE TRANSGENERATIONAL INHERITANCE OF THE DELAY IN DEVELOPMENT IS PREVENTED BY REINTRODUCTION OF THE DEPLETED** *ACETOBACTER* **SPECIES**

To test if reintroduction of depleted bacteria can prevent the inheritance of the delay in development following parental exposure to G418, we isolated bacterial colonies from nonexposed flies. We then tested the ability of bacteria from these colonies to prevent the inheritance in non-exposed offspring of G418 exposed (F1) *hairy::neoGFP* parents. Supplementing the fly food with bacteria from a single colony (*Colony 1*, **Supplementary Data Sheet 1**—Figure S5D) sufficed to prevent the inheritance of the delay in development (**Figure 2A**). Analysis of the 16S sequence of the ribosomal RNA gene of *Colony 1* revealed similarity to 4 *Acetobacter* spp. (**Figure 2B**): *A. aceti* (99.5% similarity to *Colony 1*), *A. cibinongensis* (97.6% similarity), *A. pomorum* (96.6% similarity), and *A. tropicalis* (97.2% similarity). However, the non-commensal *Acetobacter* spp., *A. aceti*, *A. pomorum*, *A. tropicalis* and *A. cibinongensis* (all obtained from the DSMZ stock collection) could not prevent the inheritance of the delay in development (**Figures 2C,D**). Thus, the prevention of the inheritance of the delay following parental exposure to G418 required commensal *Acetobacter* species.

Notably, prevention of the inheritance of the delay was also observed in various natural, mating-dependent, or independent settings, in which exposed flies were infected with bacteria carried by non-exposed flies. Specifically, when G418 exposed males or females were crossed to naïve partners (i.e., flies lacking any history of exposure), the offspring larvae were not delayed in their development (**Figure 2E**). Additionally, when G418-exposed parents were inserted into a vial which temporarily contained naïve flies (either males or virgin females) for 1 day prior to the insertion of the exposed parents, the offspring were no longer delayed (**Figure 2F**). Similar results were obtained when the vials were supplemented with PBS that was brought in contact with naïve flies, but not when this fluid was filtered to remove all the bacteria (**Supplementary Data Sheet 1**—Figure S5A). These findings show that the inheritance of the delay in development can be prevented by either natural or experimental exposures to bacteria from male or female flies with no history of exposure to G418.

While *Colony 1* supplementation always prevented the inheritance of the delay in offspring, it did not necessarily prevent the inheritance of induced *neoGFP* expression. Altogether, the average expression of *neoGFP* in the foregut of the offspring larvae across experiments was slightly reduced but this effect was not statistically significant (**Supplementary Data Sheet 1**—Figure S5B). Similar results were obtained in additional settings in which the food was exposed to micro-organisms carried by naïve flies (**Supplementary Data Sheet 1**—Figure S5C). Thus, gut bacteria had a significant preventive effect only on the inheritance of the delay in development.

To determine if reintroduction of commensal *Acetobacter* species can prevent the delay in development also in offspring of bacterial depleted flies that have not been exposed to G418, we added specific bacterial species to the food of the F1 larvae that hatched from dechorionated (and sterilized) eggs. Analysis of the rate of larval development in F2 showed that *Colony 1* completely rescued the delay in these offspring of bacterial-depleted flies (**Figure 2G**). As in the case of parental G418 exposure, noncommensal *A. aceti* had no effect on the delay in F2 (**Figure 2H**). Additionally, isolated bacteria (*Colony 7*) with 98.9% 16S rRNA similarity to *L. plantarum* (**Figure 2B**) and *L. plantarum* from another stock (Sharon et al., 2010), led to only partial rescue (**Figures 2G,H**). Thus, commensal *Acetobacter* species in the F1 parents are necessary for complete prevention of the delay in the F2 offspring. This indicates that lack of these *Acetobacter* species following exposure to G418 is sufficient to account for the heritable delay.

Altogether, the results show that G418 induces a delay in larval development by a direct stress to the host (**Figure 1B**, **Supplementary Data Sheet 1**—Figure S3A), but the delay is inherited in the non-exposed offspring because of a transgenerational effect of *Acetobacter* depletion in their parents (**Figures 1C,D**, **2A,B**). This indicates that the induction of the developmental delay and its transgenerational inheritance occur by different mechanisms.

To evaluate the generality of involvement of *Acetobacter* species in the inheritance of delayed development, we exposed nontransgenic, *yw* flies to other types of antibiotics, supplemented at sub lethal dosages. We established antibiotic-specific dosages that induce a 2–3-day delay in larval development (G418— 100 ug/ml, Puromycin—200 ug/ml, Ampicillin—100 ug/ml, and Ciprofloxacin—300 ng/ml). All these exposures to antibiotics induced developmental delay (**Figure 3A**), which except for the Puromycin case, persisted in subsequent generations without further exposure to the antibiotic (**Figure 3B**). Since Puromycin stood out as the only antibiotic which did not induce heritable delay, we suspected that it does not sufficiently deplete bacteria

**FIGURE 2 | Reintroduction of a commensal** *Acetobacter* **species prevents the inheritance of the delay in development. (A)** Supplementing the food with bacteria from a single *Acetobacter* colony (*Colony 1*) isolated from non-exposed flies, prevented the inheritance of the delay in

non-exposed F2 offspring of G418-exposed parents. Mean fraction of pupae ± SE in 10 vials. All insets: Statistical analysis of differences between fractions of pupae (FOP) in the indicated day (7 or 8). **(B)** Analysis of 16S *(Continued)*

#### **FIGURE 2 | Continued**

rRNA gene sequence of bacteria from two isolated colonies, *Colony 1* and *Colony 7*. Shown are computed similarities to various species of *Acetobacter, Lactobacillus*, and *Wolbachia*. **(C,D)** Supplementing fly vials with the non-commensal species, *A. aceti* or *A. cibinogenesis* **(C)**, and with *A. pomorum* or *A. tropicalis* **(D)**, did not prevent the inheritance of the delay. Mean fraction of pupae ± SE in 10 vials. **(E)** Kinetic curves of pupation of F2 larval offspring generated from crosses between G418-exposed and non-exposed *hairy::neoGFP* F1 males or virgin females (all four combinations are shown). In all cases, the F2 larvae were not exposed to G418. Delay in F2 was observed only when both parents were exposed to G418. Mean fraction of pupae ± SE in 6 vials. "M(E)"—exposed males "M(N)"—non-exposed

which are capable of preventing the inheritance. To test this, we examined the response of *Acetobacter* spp. to exposure with the same concentration of antibiotic. We found that Ampicillin and Ciprofloxacin removed nearly all of the *Acetobacter* spp. from the larval gut and abolished the growth of *Colony 1* in suspension (**Figures 3C,D**). Exposure to Puromycin, on the other hand, spared a substantial fraction of these bacteria (∼1/3 of the *Acetobacter* spp. content of untreated flies; **Figure 3C**). Additionally, Puromycin had almost no effect on *Colony 1* in suspension (**Figure 3D**). Thus, substantial *Acetobacter* content could not prevent the delay in Puromycin-exposed flies, but nonetheless prevented the inheritance of the delay in their non-exposed offspring. These results show that the transgenerational effect of *Acetobacter* depletion mediates the inheritance of delayed development in response to additional antibiotics, and that these findings are not limited to transgenic flies.

#### *ACETOBACTER***-DEPENDENT PREVENTION OF THE HERITABLE DELAY IS MEDIATED BY BACTERIAL RIBOFLAVIN (VITAMIN B2)**

Since the bacteria may provide vitamins and nutrients to the larvae, we tested if vitamins could influence the heritability of the delay in development. In particular, we tested the effect of six vitamins that have been shown to be essential additives for germfree flies reared on axenic food (Sang, 1956). Supplementing all six vitamins to the food prevented the inheritance of the delay in development (**Figure 4A**), but not the inheritance of the induced expression (**Supplementary Data Sheet 1**—Figure S6A). To test which of the six vitamins is essential for preventing the inheritance of the delay, we repeated the experiment with one vitamin excluded from each pool. Of the six vitamins tested, Riboflavin (vitamin B2) had the strongest effect on the inheritance. Pools lacking Riboflavin could not prevent the delay in the non-exposed offspring of G418-exposed parents (**Figure 4A**), indicating that Riboflavin is necessary for the activity of the full vitamin pool. Removal of Pyridoxine (vitamin B6) also compromised the activity of the pool (**Supplementary Data Sheet 1**—Figure S6B), while the remaining vitamins, Pantothenic acid, Thiamine, Nicotinic acid and Biotin, had no significant effect on the delay in offspring development (**Supplementary Data Sheet 1**—Figures S6C–F).

Supplementing the food with Riboflavin alone reduced, but did not completely eliminate the delay in development of the non-exposed offspring (**Figure 4B**). Co-supplementing Riboflavin together with Pyridoxine and Pantothenic acid increased the effect of Riboflavin alone, but only by a small males, "F(E)"—exposed females "F(N)"—non-exposed females. **(F)** Prevention of the inheritance of delayed development in vials that were temporarily exposed for 1 day (conditioning) to flies with no history of ancestral exposure to G418. Mean fraction of pupae ± SE in 5 vials. **(G)** Placing dechorionated and sterilized F1 eggs on food with bacteria from *Colony 1* prevented a delay in development of the F2 larvae. Bacteria from *Colony 7* led to only partial reduction in the delay. Mean fraction of pupae ± SE in at least 15 vials pooled from three biological replicates. **(H)** Same as **(G)** for food with *L. plantarum* from another stock (partial reduction) and non-commensal *A. aceti* (no effect). Mean fraction of pupae ± SE in at least 15 vials pooled from three biological replicates. ∗*p <* 0*.*05, ∗∗∗*p <* 0*.*001 (Student's *t*-test).

amount (**Figure 4B**). To test if G418-depleted bacteria produce Riboflavin, we analyzed bacterial extracts of *Colony 1* by Mass Spectrometry. We identified a clear signature of Riboflavin in the extract of *Colony 1* (**Figure 4C**). Quantitative Liquid Chromatography Mass Spectrometry (q-LCMS) analysis further showed that the amount of Riboflavin produced by *Colony 1* is substantially larger than the amount produced by commensal Lactobacilli from *Colony 7* (**Figure 4D**). These results suggested that the ability of *Colony 1* to prevent the inheritance of the delay is mediated, at least in part, by production and secretion of Riboflavin. We tested this by supplementing the food with Roseoflavin (RoF), an inhibitor of bacterial Riboflavin synthesis (Lee et al., 2009). Riboflavin is the product of 5 genes encoded by a single polycistronic transcript that is subjected to negative feedback via a riboswitch mechanism; Riboflavin-derived Flavin mononucleotide, FMN, binds to riboswitch aptemers in the transcript, changing the conformation of the transcript into a translationally inactive form, thereby repressing the production of Riboflavin biosynthesis genes (**Figure 4E**). The FMN analog, Roseoflavin, inhibits Riboflavin production by binding to this riboswitch. We identified a range of Roseoflavin concentrations which did not significantly compromise the growth of *Colony 1* (**Supplementary Data Sheet 1**—Figure S6G). Nonetheless, supplementing the food with Roseoflavin within this range repressed the production of Riboflavin by 10-fold, as determined by quantitative LCMS (**Figure 4F**). Consistent with the hypothesized influence of Riboflavin, the inhibition of its synthesis by Roseoflavin eliminated the ability of *Colony 1* to prevent the delay in the offspring (**Figure 4G**). This outcome was completely reversed by cosupplementing the food with exogenous Riboflavin (**Figure 4G**), indicating that the effect of Roseoflavin was indeed mediated by its inhibitory influence on the synthesis of bacterial Riboflavin. Altogether, these results show that Riboflavin produced by a commensal *Acetobacter* species interferes with the inheritance of delayed development and inhibition of Riboflavin synthesis in the bacteria compromises the ability of the bacteria to prevent this inheritance.

#### **DISCUSSION**

In this work we investigated the potential involvement of the gut microbiome in the inheritance of environmentally-induced phenotypes. In particular, we tested if G418-mediated disruption of the intact microbiome could account for the inheritance of induced phenotypes in the host. We found that G418 leads

**FIGURE 3 | Generality of involvement of** *Acetobacter* **species in the inheritance of delayed development. (A)** Kinetic curves of pupation of wild-type F1 *yw* larvae exposed to sub-lethal dosages of four different antibiotics: G418 (100 ug/ml), Ampicillin (100 ug/ml), Puromycin (200 ug/ml), and Ciprofloxacin (300 ug/ml). Mean fraction of pupae ± SE in at least three biological replicates. Inset: Statistics of differences between fractions of pupae (FOP) in day 8. **(B)** Kinetic curves of pupation of non-exposed F2 offspring of F1 flies that were exposed to G418, Ampicillin, Puromycin and Ciprofloxacin. Mean fraction of pupae ± SE in at least 5 vials. Inset: Statistics of differences between fractions of pupae (FOP) in day 8. **(C)** Measured abundance of *Acetobacter* spp. in the gut of third instar larvae exposed to various antibiotics. Mean abundance normalized to non-exposed larvae ± SE in three biological replicates. **(D)** Same as **(C)** for population density of *Colony 1* in YPD media supplemented with these antibiotics. Mean OD600 ratio measured after 24 h and normalized to the ratio in YPD alone ± SE in four biological replicates. <sup>∗</sup>*p <* 0*.*05, ∗∗*p <* 0*.*005 (Student's *t*-test).

to selective changes in the microbial gut composition and that these changes persist in the next generation. This inheritance of microbial disruption raises two potential scenarios which may support inheritance of induced phenotypes. Both scenarios are based on host response to the modifications in the gut bacteria. In the simpler, and more expected scenario, disruption of the gut microbiome modifies host phenotypes within one generation and this response is roughly the same in the following generations (which inherit the changes in the microbiome). This scenario might be expected if the germline is not modified by the disruption of the microbiome. In this case, persistence of the microbial changes creates the phenotypes anew in each generation. In the second potential scenario, the disruption of the microbiome has a different influence on the parents and their offspring. This difference between parents and offspring could suggest an effect of microbial disruption on the parental germline.

Our bacterial removal experiments revealed a clear signature of the second scenario. This was demonstrated by the dramatic difference in pupation time between extracellular bacteria-free parents and their offspring. Our rescue experiments with specific bacterial species further showed that as long as the parents are depleted of their commensal *Acetobacter* spp., the development of their offspring is delayed compared to the development of the parents. Together with the selective depletion of *Acetobacter* spp. by G418, this inter-generational difference accounts for the inheritance of the delay in development in offspring of G418-exposed parents. This scenario of inheritance was unexpected because the induction and inheritance of the delay in development are mediated by different mechanisms. Specifically, we show that the delay in pupation time is induced in the parental generation by the direct influence of G418 on the host, whereas the delay in the offspring generation is caused by the transgenerational influence of *Acetobacter* depletion. Reaching this conclusion requires bacterial depletion and rescue without G418, thus enabling de-coupling of the influence of bacteria from the toxic influence of G418 (Ridley et al., 2013).

The difference between generations with respect to pupation time, clearly indicate that the influence of bacteria in one generation can be different than the influence in the following generation. We hypothesize that this reflects an influence of microbial disruption on the germline. While bacterial-mediated manipulation of the germline has been previously noted for germline-residing endosymbionts (Bourtzis et al., 1996; Werren, 1997; Starr and Cline, 2002; Fast et al., 2011), it has not yet been shown for extracellular gut bacteria. This does not at all negate the diverse influence of gut bacteria on the development and physiology within generation. Indeed, we found that larvae which hatched from dechorionated and sterilized eggs are very sensitive to food quality and, upon reduction in quality, exhibited a considerable delay in development already in the first generation without bacteria (**Supplementary Data Sheet 1**—Figure S2). Previous studies of gut bacteria in flies further uncovered substantial involvement of the micrbiome in the fly development and homeostasis, including the regulation of larval growth (Shin et al., 2011; Storelli et al., 2011), immune activation (Charroux and Royet, 2012), lifespan determination (Brummel et al., 2004; Ben-Yosef et al., 2008), fecundity (Ben-Yosef et al., 2010), and even mating choice (Sharon et al., 2010). Many past studies, however, assumed that the effect of gut bacteria is the same in each generation. Accordingly, they examined the effect of bacterial removal in flies that had been kept without bacteria for

to lay eggs on G418-free food, supplemented with a full set of six vitamins (Riboflavin, Pyridoxine, Pantothenic acid, Thiamine, Nicotinic acid, Biotin) or with a set from which Riboflavin was excluded. Mean fraction of pupae ± SE in 10 vials. Inset: Statistical analysis of differences between fractions of pupae (FOP) in day 8. **(B)** Same as **(A)** with Riboflavin (Rb) alone (instead of the vitamin pool) or with Riboflavin together with Pyridoxine (Pyr) and Pantothenic acid (Pan). Mean fraction of pupae ± SE in 5 vials. Inset: Statistics of differences between

**(E)** Schematic diagram of the current model of bacterial Riboflavin riboswitch and its inhibition by Roseoflavin. **(F)** Quantitative LCMS measurements of Riboflavin abundance in *Colony 1* with or without Roseoflavin (RoF). Mean abundance ± SE in three biological replicates. **(G)** Same as **(A)** with Roseoflavin (red) and Roseoflavin together with Riboflavin (blue). Mean fraction of pupae ± SE in five vials. <sup>∗</sup>*p <* 0*.*05, ∗∗*p <* 0*.*005, ∗∗∗*p <* 0*.*001 (Student's *t*-test).

multiple generations prior to the experimental analysis. Under these settings, the transgenerational effect of bacterial removal is influencing both parents and offspring thus, eliminating the ability to determine which part of the response reflects an influence within generation and what part is due to the transgenerational effect. The ability to resolve the different scenarios is further reduced by treatments with antibiotics, which are often used to maintain bacteria-free conditions after dechorionation.

We have previously shown that G418 toxicity induces additional phenotypes beyond the delay in development. These include induced expression of *neoGFP* in gut regions of *hairy::neoGFP* and *drm::neoGFP* larvae, morphological abnormalities in the wings of *Hsp70::neoGFP* flies, reduction in fly size in G418-exposed *drm::neoGFP* larvae, down-regulation of a number of *Polycomb* group genes in the proventriculus of *hairy::neoGFP* larvae, and up-regulation of detoxification (*GstD*) genes in the same region (Stern et al., 2012). Here we show that removal of gut bacteria without exposure to G418 does not induce these other G418 reported phenotypes. This indicates that these phenotypes are caused by a direct effect of the toxic stress of G418 on the fly's tissues, and not by its effect on the fly's microbiome. Additionally, unlike the delay in pupation time, the other heritable phenotypes of G418 exposure (induced expression of *neoGFP* and wing abnormalities) were not observed in offspring of bacterial depleted parents. Thus, the inter-generational difference between parents and offspring did not extend to other heritable phenotypes reported in Stern et al. (2012). Altogether, this indicates that the inheritance of these phenotypes following parental exposure to G418 reflects additional mechanisms beyond those of host–microbe interactions. Recent work in mice also provided evidence for multiple mechanisms supporting the inheritance of different phenotypes in a given experimental setting (Padmanabhan et al., 2013). Having more than one mechanism of transgenerational inheritance in a single experimental setting (e.g., exposure to G418), and in different animals, suggests that non-Mendelian inheritance might be more prevalent than often assumed. Additionally, by uncovering the contribution of microbial disruption to the inheritance of a stress-induced phenotype, our findings expand the set of factors which could support transgenerational inheritance of induced responses.

The transgenerational effect of bacterial removal on pupation time was completely rescued by supplementing the parental food with bacteria from a commensal *Acetobacter* species (*Colony 1*). On the other hand, commensal *Lactobacillus* spp. (*Colony 7*, and *L. plantarum*) led to only partial rescue. The complete rescue by *Acetobacter* species indicates that their absence is sufficient for causing a delay in offspring generations. Additionally, this rescue of the transgenerational effect is consistent with the *Acetobacter*mediated prevention of the developmental delay in offspring of G418-exposed parents. This prevention was observed in every experimental setting in which G418-free food was exposed to *Acetobacter* spp*.*, including natural forms of contact with bacteria from non-exposed (male or female) parents. Altogether, these results indicate that depletion of *Acetobacter* spp. in both parents is necessary and sufficient for the inheritance of the delay in development. Since the bacterial composition, including *Acetobacter* content, may be influenced by a diverse set of environmental agents (and not just by antibiotics), the microbiome likely mediates multi-generational influences on growth kinetics in response to a variety of natural environmental settings.

The mechanistic basis of bacterial–host interactions in flies has been previously focused on the involvement of Insulin signaling (case of *A. pomorum* Shin et al., 2011) and the TOR pathway (case of *L. plantarum* Storelli et al., 2011). Here we identified Riboflavin as one of the mediators of *Acetobacter*-dependent prevention of the delay in larval development. We found that a commensal *Acetobacter* species produces relatively large amount of Riboflavin. Supplementing the food with exogenous Riboflavin reduced the delay in non-exposed larval offspring of G418 exposed parents. Conversely, co-supplementing the food with *Acetobacter* species and the Riboflavin inhibitor, Roseoflavin, compromised the production of Riboflavin by the *Acetobacter* species and inhibited its ability to prevent the delay in development. This inhibition was relieved by addition of exogenous Riboflavin. Collectively, these results indicate that Riboflavin is necessary for the *Acetobacter*-dependent prevention of the inheritance of the delay in development.

The contribution of Riboflavin to the prevention of the delay in offspring of bacterial depleted parents was also consistent with the smaller rescue ability of commensal *Lactobacillus* spp., *Colony 7*, and *L. plantarum*. Indeed, quantitative LCMS-based analysis of *Colony 7* revealed substantially lower amounts of Riboflavin compared to the commensal *Acetobacter* species (*Colony 1*).

The influence of Riboflavin is also consistent with previous studies which showed that vitamins affect the rate of development of germ-free larvae when the food conditions are also axenic (Tatum, 1939; Sang, 1956). In our settings of non-axenic rich diet, Riboflavin did not influence the growth of non-manipulated flies. Nonetheless, it expedited the development of non-exposed larval offspring of G418-exposed parents. This suggests that the history of exposure to G418 creates an additional requirement beyond that provided normally in the fly's diet.

Deciphering downstream mechanisms of Riboflavin action in host tissue is highly confounded by the broad influence of Riboflavin on many enzymes. Riboflavin is a source of FAD and FMN derivatives serving as cofactors in the oxidative metabolism of carbohydrate, amino acids, and fatty acids. These cofactors are involved in many functions and can have pleiotropic effects on development and physiology (reviewed in Powers, 2003). Similarly to other vitamin-derived cofactors, deficiency in Riboflavin supply (e.g., in the case of bacterial depletion) likely impacts multiple molecular pathways in host tissues. Future work will determine if the effect of Riboflavin deficiency following exposure to G418 is primarily mediated by a single host pathway or, alternatively, by a coordinated action of multiple mechanisms.

Non-Mendelian Transgenerational phenomena in multiple organisms can involve a variety of epigenetic mechanisms (reviewed in Jablonka and Raz, 2009; Daxinger and Whitelaw, 2012; Jablonka, 2012; Lim and Brunet, 2013). In this work we describe a previously unrecognized scenario of transgenerational inheritance mediated by microbial-based modifications in parents which, in turn, affect the development of their progeny. While we demonstrate this in flies, it might be more broadly applicable to many organisms which maintain complex interactions with their commensal bacteria. Indeed, commensal bacteria are an integral part of many, if not all other known animals. These bacteria can be viewed as a "distributed organ" which can be readily affected by the environment on rapid time scales (within and across several generations). The changes in the microbiome might be in species composition as well as in bacterial gene sequence (i.e., following bacterial adaptation to the stressful environment). These changes feedback on host development and physiology, and may be responsible for diverse effects which could have been previously attributed to host-intrinsic factors. The heritability of the changes in the microbiome further provides potential infrastructure for influencing many generations (Bakula, 1969; Rosenberg and Zilber-Rosenberg, 2011). As such, the involvement of commensal bacteria in non-Mendelian, multi-generational responses to the environment might be considerably widespread. Thus, commensal bacteria are major mediators of gene-environment interactions with potential transgenerational and evolutionary implications that await further exploration. The diverse genetic tools available in *D. melanogaster*, together with the relatively low complexity of its microbiome and the ease with which this microbiome can be manipulated, make the fly a powerful model system for this exploration.

#### **MATERIALS AND METHODS**

#### *DROSOPHILA* **STOCKS**

*drm*-*GAL4*, *hairy*-*GAL4, and Hsp70-GAL4* lines were obtained from the Bloomington Stock Center. *yw* line was obtained from the lab of Dr. Eli Arama (Weizmann Institute of Science), *Canton-S* and *Oregon-R* lines were obtained from the lab of Prof. Adi Salzberg (The technion). The UAS-*neoGFP* line was generated as described in Stern et al. (2012).

#### **FOOD PREPARATION**

Standard cornmeal food (Bloomington Stock Center recipe, http://flystocks.bio.indiana.edu/Fly\_Work/media-recipes/ molassesfood.htm) was heated to ∼60◦C and mixed (1:100 in volume) with antibiotics to reach the desired final concentration. For the experiments involving the transgeneic, *hairy::neoGFP*, *drm::neoGFP* and *Hsp70::neoGFP* lines we used G418 at final concentration of 400µg/ml. For wild-type strains (*yw, Canton-S, Oregon-R*), final concentrations of antibiotics were as follows: G418 (100µg/ml), Ampicillin (100µg/ml), Puromycin (200µg/ml), and Ciprofloxacin (300 µg/ml). The food was divided into standard 25 × 95 mm Drosophila vials (cat# 51–0500, Biologix, USA). Vials with 10 ml of food were left overnight at room temperature and stored at 4◦C for up to 2 weeks prior to usage. For non-exposed conditions, the food was prepared in the same way but without the antibiotics.

#### **MEASURING DURATION OF LARVAL DEVELOPMENT**

Duration of larval development was measured by counting the number of pupae in each vial daily. The integrated number of pupae formed prior to each inspection time was normalized to the total number of pupae formed in the vial at the end of the experiment.

#### **ISOLATION AND GROWTH OF COMMENSAL BACTERIA**

Ten male flies were shaken in 1 ml of PBS buffer at room temperature for 30 min. 100 ml of this fluid were serially diluted, spread on YPD agar plates and colonies were grown at room temperature for 3 days. Several colonies were picked, underwent three additional rounds of isolation, grown overnight at 30◦C in liquid YPD and stocked in 35% glycerol at −80◦C.

#### **TRANSGENERATIONAL EXPERIMENTS**

Three males and two females were crossed and allowed to lay eggs for 3 days in vials with or without G418 (or other antibiotics). Unless specifically indicated, all F1 experiments with transgenic flies were done using lines heterozygous for the *GAL4* driver and the UAS-*neoGFP* transgenes, as described in Stern et al. (2012). F1 flies developed from these eggs were collected after 19–20 days from the start of the experiment (4–7 day old adults) and the same number of males and females were crossed again in vials without antibiotics. For subsequent generations in experiments involving the transgenic lines, only fluorescent adults were crossed (i.e., flies carrying at least one *GAL4* and one UAS-*neoGFP* transgenes).

#### **TRANSGENERATIONAL EXPERIMENTS WITH PRIOR REMOVAL OF BACTERIA FROM FLIES**

Dechorionated and sterilized embryos were transferred to vials with or without G418. For analyses of F2 and F3 bacterial depleted flies, F1 flies developed from these eggs were collected after 19–20 days from the start of the experiment (4–7 day old adults) and the same number of males and females were crossed again in vials without G418.

#### **STATISTICAL ANALYSES**

All statistical tests were performed using the MATLAB software (MathWorks). One-sided student's *t*-test was used for evaluating the statistical significance of mean values. Analyses of functional enrichments in groups of up- and down-regulated genes in the microarray experiments were done using the DAVID web tool (http://david*.*abcc*.*ncifcrf*.*gov/, Huang da et al., 2009a,b).

#### **DETECTION AND QUANTIFICATION OF** *neoGFP* **EXPRESSION**

Quantification of *neoGFP* intensity in the proventriculus: For each proventriculus of an individual third instar larva, an image was taken using a fluorescent stereoscope (Leica MZ16F) with constant imaging parameters. Image analysis for measuring the induction of *neoGFP* in the proventriculi of *hairy::neoGFP* larvae was performed as follows: GFP intensity was measured in the anterior half of the proventriculus by computing the average pixel intensity of GFP. Then, for each image the background autofluorescence of the tissue (evaluated in the dark area in the middle of the proventriculus) was subtracted. Image analysis and computation was performed using a custom MATLAB (MathWorks) script.

Quantification of *neoGFP* intensity of expression in the midgut: images were taken from each larva as described above for the proventriculus tissue. Quantification of *neoGFP* levels in the midgut of *drm::neoGFP* larvae was performed by calculating the average pixels intensity in the region of the midgut with *GFP* intensity above background. Measurements from nonexposed larvae were performed in a midgut tissue of about the same size and location. Average intensity was corrected for autofluorescence in the midgut (primarily due to food inside the gut) by subtracting the average intensity measured in an adjacent (non-induced) midgut area.

#### **DETECTION AND QUANTIFICATIONS OF PUPA AND ADULT PHENOTYPES**

Images of adult males or females were taken using a fluorescent stereoscope (Leica MZ16F). Length of adult *drm::neoGFP* flies was calculated by measuring (using the NIS-Elements software, Nikon) the number of pixels along a line drawn from head to genital of each fly. The number of pixels was then converted to millimeters.

Adult *Hsp70::neoGFP* flies with wing abnormalities were scored by eye.

#### **CONDITIONING EXPERIMENTS**

Conditioning experiments with flies were performed by insertion of 5 adult male flies into a regular fly vial. After 24 h, these flies were discarded and replaced by flies that were used for mating. Conditioning with wash fluid was performed by washing 10 adult flies in 1 ml of PBS for 30 min. 100 ml of the wash fluid was applied to the top of the fly food. After 24 h these vials were used for the mating experiments.

#### **ANALYSIS OF 16S rRNA SEQUENCE OF** *COLONY 1* **AND** *COLONY 7*

The 16S gene was PCR-amplified from bacterial DNA using universal primers 8F (AGAGTTTGATCCTGGCTCAG) and 1492R (GGTTACCTTGTTACGACTT) (Weisburg et al., 1991). The PCR product was cloned into PGMT plasmid, sequenced and compared to the 16S database (Green Genes database http://greengenes*.*lbl*.*gov/) using the BLAST algorithm. The resulted closest matches—*A. aceti* (AJ419840.1), *A. pomorum* (AJ001632.1), *A. tropicalis* (AJ419842.1), and *A. cibinongensis* (AB052711.1), along with the previously reported commensal bacteria *L. plantarum* (JQ411248.1), *L. brevis* (JQ236623.1), *L. fructivorans* (AB680532.1), and Wolbachia (DQ412083.1) were used for multiple alignment using the T-Coffee web resource.

#### **BACTERIAL STRAINS FROM COMMERCIAL STOCKS**

*A. aceti*, *A. tropicalis, A.pomorum, and A.cibinongensis* were purchased from the German Collection of Microorganisms and Cell Cultures (DSMZ, http://old.dsmz.de/identification/main. php?contentleft\_id=2) and were grown in YPD or LB media.

#### **SAMPLE PREPARATION FOR SCANNING EM**

Bacteria were grown to mid-log in YPD medium at 37◦C and were fixed with 2.5% gluteraldhyde and 2.5% PFA in 0.2 M cacodylate buffer. The cells were attached to PLL (poly-L-lysine) coated silica chip, dehydrated in an ethanol series (50, 70, 96, and 100%) and critical-point dried using BALTEC dryer. The dry samples were coated with gold-palladium for 4 min and visualized in a highresolution Ultra 55 SEM (Zeiss).

#### **SENSITIVITY OF** *COLONY 1* **TO DIFFERENT ANTIBIOTICS**

A single colony from the *Colony 1* stock was isolated from YPD plates and grown to OD600 = 1 in YPD at 30◦C with continuous shaking. 10µl of this were inoculated into 2 ml of YPD, and different concentrations of antibiotics were added in triplicates. OD600 was measured after 24 h of incubation at 30◦C. OD measurements were normalized to antibiotic-free sample.

#### **BACTERIAL REMOVAL BY EGG DECHORIONATION**

Flies were crossed and allowed to lay eggs for 2–4 h. Eggs were collected and dechorionated for 2 min in 2.7% sodium hypochlorite (2-fold diluted bleach), washed twice in 70% ethanol and then twice with sterile distilled water as previously described (Brummel et al., 2004).

#### **BACTERIAL REINTRODUCTION EXPERIMENTS**

For the bacterial reintroduction experiments we used bacteria from single colonies (either *Colony 1* or *Colony 7*) isolated from naive flies, a strain of commensal *L. plantarum* obtained from the Rosenberg lab (Tel-Aviv University), and bacteria from DSMZ collection (*A. aceti, A. pomorum*, *A. tropicalis*, and *A. cibinongensis*). The bacteria were grown (with shaking) overnight to an OD600 of 1–2 at 30◦C in 3 ml YPD. Bacteria were diluted to OD 0.1 in YPD, and 100µl were added to each vial before the transfer of untreated or dechorionated eggs to these vials.

For analysis of the F2 generation, 5–10 days old adult flies were collected from these vials, and 3 males and 2 females were crossed in a new (untreated) vial for 2 days.

#### **SUPPLEMENTING FLY FOOD WITH VITAMINS AND ROSEOFLAVIN**

Standard cornmeal food (Bloomington Stock Center recipe, http://flystocks.bio.indiana.edu/Fly\_Work/media-recipes/molass esfood.htm) was heated to ∼60◦C and mixed with Thiamine (2·10−<sup>3</sup> mg/ml final), Riboflavin (1·10−<sup>2</sup> mg/ml), Nicotinic Acid (1.2·10−<sup>2</sup> mg/ml), Calcium Pantothenate (1.6·10−<sup>2</sup> mg/ml), Pyridoxine (2.5·10−<sup>3</sup> mg/ml), Biotin (1.6·10−<sup>4</sup> mg/ml), Roseoflavin (100µM), or combinations thereof dissolved in water. The mix was split to standard 25 × 95 mm *Drosophila* vials (cat# 51–0500, Biologix, USA). Vials with 10 ml of mixed food were left overnight at room temperature before storage at 4◦C for up to 2 weeks prior to usage.

#### **mRNA EXTRACTION AND ANALYSIS**

Proventriculi were dissected from ∼50 *hairy::neoGFP* third instar larvae for each condition, and were kept during dissection in ice cold PBS. Total RNA was extracted and purified using the RNeasy MinElute Cleanup Kit (QIAGEN). Genome-wide mRNA expression levels in each sample were measured using Affymetrix drosophila 2.0 array using standard Affymetrix protocols. Arrays were normalized with the RMA algorithm using the Expression Console software (Affymetrix).

Results for specific genes were verified using real-time quantitative PCR as follows: mRNA was converted to cDNA using high-capacity Reverse Transcription kit (Ambion). Transcript levels were measured using real-time qPCR on a 7900HT Fast Real-Time PCR Machine using Power SYBR green PCR master mix (Applied Biosystems). Specific primers used in this study are listed below:


#### **MASS SPECTROMETRY ANALYSIS**

#### *Mass spectra analysis of colony 1*

*Colony 1* was cultured overnight (with shaking) at 30◦C in 100 ml YPD to OD600 between1.5 and 2. Bacteria were centrifuged at 4◦C for 15 min at 4000 g and washed twice with 20 ml of water at the same conditions. The bacteria were re-suspended in 10 ml of water and sonicated on ice using microtip at 40% power (30 s ON followed by 50 s OFF for a total of 15 min) on a Sonics Vibra-Cell machine (Sonics and Materials, Inc.). The solution was then centrifuged (20,000 g) at 4◦C for 20 min and the supernatant was passed through 0.22 micron filter. Pure vitamins [Riboflavin—Sigma-Aldrich R4500–25; Calcium D−(+)-pantothenate—Santa-Cruz SC-202515; Pyridoxine hydrochloride—Sigma-Aldrich P9755– 25G] were reconstituted from powder to serve as references for the bacterial samples. All samples were analyzed on an Electron Spray machine in the Mass Spectrometry core facility unit at the Weizmann Institute of Science, Rehovoet, Israel.

#### *Quantitative measurements of riboflavin content*

Bacteria were cultured (with shaking) overnight to OD600 between 1 and 2 at 30◦C in 20 ml YPD. For experiments with Riboflavin inhibition, Roseoflavin (0215425225, ENCO) was added to the YPD to a final concentration of 100µM and the bacteria were allowed to grow. It was then diluted to OD600 1, and 10 ml were centrifuged at 4◦C for 15 min at 4000 g and washed twice with 10 ml of ice-cold DDW. The bacteria were re-suspended in 1 ml of methanol and sonicated on ice using "Bioruptor® Standard sonication device" on high setting (30 s ON followed by 30 s OFF for a total of 30 min). The solution was then centrifuged (20,000 g) at 4◦C for 20 min and the supernatant was passed through 0.22 micron filter. It was completely evaporated with Nitrogen gas and re-suspended in 100µl methanol. Pure vitamins (Riboflavin— R4500–25 and Thiamine hydrochloride—T4625–25G, Sigma-Aldrich) were reconstituted from powder to concentrations of 50µg/ml to serve as references for the bacterial samples. Three biological repeats of each sample were subjected to LC-MS analysis.

Analysis was performed using an Ultra Performance Liquid Chromatography—Mass Spectrometry (UPLC)-MS/MS triple quadrupole instrument (Xevo TQ MS, Waters) equipped with an electrospray ion source and operated in positive ion mode. MassLynx and TargetLynx software (v.4.1, Waters) were applied for the acquisition and analysis of data. Chromatographic separation was done on a 100× 2.1-mm inside diameter, 1.7-µm UPLC BEH C18 column (Acquity, Waters) with mobile phases A (10 mM ammonium formate buffer, pH = 6.26) and B (acetonitrile) at a flow rate of 0.3 ml/min and column temperature 40◦C. A gradient was as follows: 0–4 min linear gradient from 0 till 100% B, 4.0–4.5 min hold at 100% B, 4.5–5.0 min return to 0% B and equilibration at 0% B for 2 min. Injection volume of 5µL was used.

For MS, argon was used as the collision gas with flow 0.22 ml/min. Cone voltage was 40 V, the capillary was set to 1.15 kV, source temperature—150◦C, desolvation temperature— 550◦C, desolvation gas flow—800 L/min, cone gas flow—50 L/min. Riboflavin was detected using multiple reaction monitoring (MRM) applying the following parameters: transition 377.2 *>* 243.1 (collision energy—22 eV, was used for quantification), transition 377.2 *>* 198.1 (collision energy—36 eV).

#### **BACTERIAL DNA-SEQ EXPERIMENTS**

Ten guts of *hairy*::*neoGFP* third instar larvae that were developed in vials with or without G418, were dissected from each sample and pooled together. Bacterial DNA was extracted using the "chemagic DNA bacteria Kit" (Chemagen). The sequencing of the 16S rDNA was performed as described by Amir et al. (2014). Briefly, DNA was PCR amplified using six pairs of nonoveralpping 16S primers. Each pair amplified relatively short (∼200 bp) region that is conserved among a subset of known bacteria. Altogether, the six regions amplify over 90% of the Greengenes database (version dated 2010), having an amplicon of approximately 1200 base pairs in total. Additionally, the primers included Illumina sequencing barcodes and therefore the library preparation stage was not necessary. PCR products were cleaned (Promega, Fitchburg, WI) and DNA concentration was measured using Qubit (Life Technologies). All samples were pooled in equimolar ratios and sequenced on a single lane of an Illumina HiSeq2000 sequencer using 100 nt paired-end reads. The number of reads per sample following quality filtering was about 20·106.

Microbial profiling based on the 16S sequences was performed using a novel framework which allows massively parallel sequencing (MPS) of a large genomic region, thus increasing the resulting phylogenetic resolution while reducing the number of false positively detected bacteria. Detailed description of the microbial reconstruction methodology is provided by Amir et al. (2014). In brief, we formulated a statistical model for generating the observed reads given a pre-defined database of about 450,000 known 16S sequences (Greengenes, version dated 2010), and solved an optimization problem whose output is the identity and frequency of each sequence. Each of the millions of reads measured by MPS, together with all "absent" reads that were not found, set constraints which enable "zooming in" on the correct species present in the mixture. Since reads are much shorter than the amplified region and contain errors, they are often shared by many bacterial sequences in the database. Nonetheless, each read provides evidence in support of the existence of the "correct" bacteria in a probabilistic way. The method integrates the statistical evidence from all reads to infer the frequency of each sequence in the database.

#### **MEASURING BACTERIAL CONTENT BY PCR**

For primer specificity tests, we used defined bacteria obtained from the DSMZ stock collection. Colonies were isolated and grown over-night in YPD medium. Wolbachia was omitted from these tests due to lack of culturing conditions. To measure the levels of bacteria in the gut, 7–10 third instar larvae were collected. The gut of the larvae was dissected and pooled. Bacterial genomic DNA was extracted using "chemagic DNA Bacteria" kit (Chemagen) and 5 ng of purified DNA was used per qPCR reaction. All reactions were performed in triplicates on a qPCR machine (Applied Biosystems 7900HT Fast Real-Time PCR System, Life Technologies Corporation) using SYBRGreen in 384 well-plates. The species-specific primers used are: aceto\_rt\_1\_f (TAG TGG CGG ACG GGT GAG TA), aceto\_rt\_1\_r (AAT CAA ACG CAG GCT CCT CC), lacto\_rt\_2\_f (AGG TAA CGG CTC ACC ATG GC), lacto\_rt\_2\_r (ATT CCC TAC TGC TGC CTC CC), wolb\_rt\_2\_f (CAA TGG TGG CTA CAA TGG GC), wolb\_rt\_2\_r (GTA TTC ACC GTG GCG TGC TG). DNA content of the Drosophila *Actin* gene was used to normalize the bacterial content. *Actin* primers used: dros\_rt\_1\_f (GGA AAC CAC GCA AAT TCT CAG T), dros\_rt\_1\_r (CGA CAA CCA GAG CAG CAA CTT).

#### **ACKNOWLEDGMENTS**

We thank Prof. Erez Braun for useful discussions. We thank Dr. Ilit Cohen-Ofri and Naama Halevi (The Weizmann Mass Spectrometry and Chemical Analysis Laboratory) for help with the Mass Spectrometry analysis. We thank Dr. Elena Kartvelishvily (The Irving and Cherna Moskowitz Center for Nano and Bio-Nano Imaging at the Weizmann Institute of Science) for help with the Electron Microscopy analysis. We thank Dr. Alexander Brandis (of the Weizmann Biological services, the Small Molecule unit) and Dr. Ilana Rogachev (of the Weizmann Biological services, the Small Molecule unit and the Plant Sciences department) for their extensive help with the Quantitative Mass-Spectrometry analysis. This work was supported by a grant from the Templeton Foundation (ID: #40663), THE ISRAEL SCIENCE FOUNDATION (grant No. 1860/13), THE F.I.R.S.T program of THE ISRAEL SCIENCE FOUNDATION **(**grant No. 1419/09), Minerva-Weizmann Program, IDD Open University grant, and the Kahn Center for Systems Biology. Yoav Soen is Incumbent of the Daniel E. Koshland Sr. Career Development Chair at the Weizmann Institute. Shay Stern was supported by the Clore fellowship and the "Kahn Family Research Center on Systems Biology of the Human Cell" award.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00027/ abstract

#### **Supplementary Data Sheet 1 | Supplementary Figures S1–S6 (including captions).**

**Supplementary Data Sheet 2 | Summary of 16S deep-sequencing results of gut samples of G418 exposed and non-exposed larvae.** Samples were extracted from third instar *hairy::neoGFP* larvae. Data is represented as the normalized percent of reads mapped to a Greengenes Database annotation (for bacterial abundance *>*1%). The 16S ribosomal RNA sequence of each Greengenes annotation was associated with known bacteria by calculating the respective sequence homology using RDP classification (http://rdp.cme.msu.edu/tutorials/classifier/RDPtutorial\_ RDP-CLASSIFIER.html).

#### **Supplementary Data Sheet 3 | Summary of microarray analyses of mRNA profiles in the proventriculus of bacterial depleted and untreated flies.**

mRNA was extracted from the proventriculi of *hairy::neoGFP* third instar larvae that hatched from dechorionated and sterilized eggs or from untreated eggs. Genome-wide profiles were analyzed using affymetrix microarrays. Data represented as log2 fold-change difference between bacterial depleted and untreated larvae in two biological replicates.

#### **Supplementary Data Sheet 4 | Functional enrichments in sets of proventriculus genes which respond to removal of extracellular bacteria.**

Analyses of functional enrichments in groups of up- and down-regulated genes (*>*1.5-fold) performed using the DAVID web tool (http://david. abcc.ncifcrf.gov/).

#### **REFERENCES**


to famine in humans. *Proc. Natl. Acad. Sci. U.S.A.* 105, 17046–17049. doi: 10.1073/pnas.0806560105


of novel methylation variants. *Science (New York, NY)* 334, 369–373. doi: 10.1126/science.1212959


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 23 December 2013; accepted: 25 January 2014; published online: 25 February 2014.*

*Citation: Fridmann-Sirkis Y, Stern S, Elgart M, Galili M, Zeisel A, Shental N and Soen Y (2014) Delayed development induced by toxicity to the host can be inherited by a bacterial-dependent, transgenerational effect. Front. Genet. 5:27. doi: 10.3389/fgene. 2014.00027*

*This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Fridmann-Sirkis, Stern, Elgart, Galili, Zeisel, Shental and Soen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Environmental disruption of host–microbe co-adaptation as a potential driving force in evolution

#### *Yoav Soen\**

Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel

#### *Edited by:*

Ilaria Negri, University of Turin, Italy

#### *Reviewed by:*

Abhijit Shukla, Harvard Medical School, USA Sateesh Kagale, National Research Council Canada, Canada

#### *\*Correspondence:*

Yoav Soen, Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel e-mail: yoavs@weizmann.ac.il

The microbiome is known to have a profound effect on the development, physiology and health of its host. Whether and how it also contributes to evolutionary diversification of the host is, however, unclear. Here we hypothesize that disruption of the microbiome by new stressful environments interferes with host–microbe co-adaptation, contributes to host destabilization, and can drive irreversible changes in the host prior to its genetic adaptation. This hypothesis is based on three presumptions: (1) the microbiome consists of heritable partners which contribute to the stability (canalization) of host development and physiology in frequently encountered environments, (2) upon encountering a stressful new environment, the microbiome adapts much faster than the host, and (3) this differential response disrupts cooperation, contributes to host destabilization and promotes reciprocal changes in the host and its microbiome. This dynamic imbalance relaxes as the host and its microbiome establish a new equilibrium state in which they are adapted to one another and to the altered environment. Over long time in this new environment, the changes in the microbiome contribute to the canalization of the altered state. This scenario supports stability of the adapted patterns, while promoting variability which may be beneficial in new stressful conditions, thus allowing the organism to balance stability and flexibility based on contextual demand. Additionally, interaction between heritable microbial and epigenetic/physiological changes can promote new outcomes which persist over a wide range of timescales. A sufficiently persistent stress can further induce irreversible changes in the microbiome which may permanently alter the organism prior to genetic changes in the host. Epigenetic and microbial changes therefore provide a potential infrastructure for causal links between immediate responses to new environments and longer-term establishment of evolutionary adaptations.

**Keywords: host–microbe symbiosis, genotype-to-phenotype transformations, canalization, developmental plasticity, epigenetics, dynamical systems, non-Mendelian inheritance, adaptation**

#### **HYPOTHESIS**

#### **THE MICROBIOME AS A 'DOUBLE AGENT' OF CANALIZATION AND DE-CANALIZATION**

In his influential perspective on "Canalization and the inheritance of acquired characters" (Waddington, 1942), Conrad Waddington argued that evolution of developing organisms under natural selection tends toward canalization of the developmental process. That is to say, the sensitivity of developmental patterns to environmental and genetic perturbations is reduced over time in the environment to which the organism becomes adapted. A similar idea was developed independently by (Schmalhausen, 1949). Owing to this hypothesized tendency toward canalization, the patterns of development in the wild type organism are presumed to be more stable than the patterns in mutants (Waddington, 1942). This allows cryptic variation to be accumulated without phenotypic consequences. This variation can be unmasked by a sufficiently disruptive environmental or genetic perturbation (de-canalizing event) which breaks robustness and leads to phenotypic consequences reflecting the previously hidden variability (Flatt, 2005). Although canalization is most likely a systemic property, few specific molecular mechanisms have

been proposed to support canalization [reviewed in (Salathia and Queitsch, 2007)]. These include buffering of genetic variability by the Hsp90 chaperone (Rutherford and Lindquist, 1998; Queitsch et al., 2002; Milton et al., 2006; Samakovli et al., 2007; Sgro et al., 2010), tissue-specific maintenance of active and repressed state of developmental genes by the Polycomb (and by analogy also the trithorax) system (Lee et al., 2005; Sawarkar and Paro, 2010; Stern et al., 2012), stabilizing negative feedback by microRNAs (Ronshaugen et al., 2005; Hornstein and Shomron, 2006; Li et al., 2009; Wu et al., 2009), and piwimediated silencing of transposon activity (Specchia et al., 2010; Gangaraju et al., 2011) or of existing variation (Gangaraju et al., 2011).

It is quite possible, though, that at least part of the noticeable contribution of these molecular pathways to canalization is due to their particularly wide range of interactions with other developmental genes and processes. In other words, these pathways are so deeply embedded into many regulatory feedbacks, that their disruption (by the environment or by a genetic change) can lead to significant impacts on developmental and physiological phenotypes. If correct, this rationale should hold more broadly for other intrinsic and extrinsic factors, which were co-opted to a particularly wide range of processes. One candidate factor is the microbiome which has co-evolved with its host for many generations. Similarly to host-intrinsic molecules and pathways that become involved in multiple functions, co-evolution of the microbiome with its host likely leads to integration of the microbiome into diverse host functions (Backhed et al., 2005; Rosenberg et al., 2009). This integration may be further widened and deepened by the ability of bacteria to produce many factors which could interact and cooperate with host processes. As part of this cooperation, the microbiome can provide the host with metabolites and signaling molecules (Backhed et al., 2004; Arora and Sharma, 2011; Shin et al., 2011; Flint et al., 2012; LeBlanc et al., 2013), assist in harvesting these molecules (Yamanaka et al., 1977; Brune and Friedrich, 2000; Backhed et al., 2005; Wang et al., 2011), synthesize organic compounds that are passed on to the host (Cavanaugh et al., 2006; Dubilier et al., 2008), regulate the immune system [reviewed in (Round and Mazmanian, 2009; Kranich et al., 2011)], reduce the propensity for disease (Markle et al., 2013), protect against invaders [Fukuda et al., 2011; Wang et al., 2013; which may be outcompeted and displaced by the microbiome (Kamada et al., 2012)], affect lifespan (Brummel et al., 2004; Behar et al., 2008) and fecundity (Ben-Yosef et al., 2010), influence mating preference (Sharon et al., 2010a), and likely perform many other developmental stage-dependent or independent functions yet to be discovered. Such evolved cooperation can result in wide, reciprocal interdependencies between the host and its microbiome. Owing to this broad dependency of the host on its microbial partners, disruption of the microbiome can compromise normal host development and homeostasis (Littman and Pamer, 2011; Martin et al., 2013). Conversely, the intact microbiome could contribute to the stability of host processes. This presumption is supported by recent work in a variety of organisms which have begun to uncover increasing numbers of host pathways and processes that could be affected by disruption of the microbiome (Brummel et al., 2004; Sharon et al., 2010b; Shin et al., 2011; Storelli et al., 2011), including modulation of epigenetic gene control [Takahashi, 2014; e.g., methylation of Toll-like receptors in intestine cells (Takahashi et al., 2011)]. As in the case of mutations (Waddington, 1942), a disrupted microbiome does not necessarily de-stabilizes every phenotype in every stressful scenario. However, it is expected to promote significantly more cases of compromised phenotypic robustness compared to cases of increased robustness.

When the host and its microbiome co-evolve in a stable, or a frequently occurring environment, they become adapted not only to this environment, but also to one another. We refer to this state as coordinated adaptation. Deviations from this co-adapted state (in response to environmental, genetic, or epigenetic modifications) could create stressful conditions promoting reciprocal changes in the host and its microbiome. Stated differently, modifications in the previously adapted microbiome can change the internal environment of the host. This change in the internal environment could promote response in the host which, in turn, may affect the conditions for survival and growth of the bacteria, and so on. Coordinated adaptation is re-established if and when the overall responses reduce the stress to the host and its microbiome to a level no longer sufficient for driving further changes. This is analogous

to the Le Chatelier's principle in chemistry (Atkins and de Paula, 2002), namely: if a system in chemical equilibrium is subjected to a disturbance, it tends to change in a way that opposes this disturbance. Thus, the establishment of a coordinated adaptation of the host and its microbiome can be viewed as an extension of the Le Chatelier's principle, from chemical systems to self-organization of living cells and organisms.

Establishing a new co-adapted state involves interactions between the microbiome and the physiological and epigenetic state of the host. Important aspects of this interaction can be heuristically illustrated using the epigenetic landscape metaphor (Waddington, 1957). This metaphor views the development of an organism as movement down a canal (**Figure 1**, top left), starting with the fertilized egg and moving toward the adult form. Developed adults produce a fertilized egg positioned at the top of the same canal, thus starting the process over again. Here, the walls of the canal represent stabilizing (canalizing) processes, acting at every stage to prevent deviations from the normal patterns of development (termed by Waddington as homeorhesis to denote the constancy of flow along a determined trajectory). Still, the canal and the entire landscape depend on the host genome and its environment (Waddington, 1957), including the microbiome which can be viewed as an "internal" environmental factor. It is important to note, however, that the use of a 2-dimensional surface in this metaphor is an over simplification which may not properly represent attractors and flows in a high dimensional space corresponding to a developing organism. Moreover, because of potential influences of (past and present) environments and the dependence of these influences on the actual trajectory occupied by the organism, the details of the landscape are more dynamic and less pre-defined than often perceived.

The robust patterns of development in frequently encountered environments (to which the organism is adapted) correspond to movement in a particularly deep canal (although the depth of the canal is likely inversely correlated with the plasticity of the developmental process and may therefore be shallower during early specification of cells and body segments and possibly also during metamorphosis in which a given tissue type is transformed into another). At any given time in a frequently encountered environment, the microbiome "assists" the establishment of the adapted patterns and is therefore, an agent of canalization. However, since the epigenetic landscape can be influenced by the environment, a sufficiently strong environmental stress can modify the landscape to an extent sufficient to de-canalize the process of development OFF the "regular canal" and ON to a "side canal" (**Figure 1**, top right). The change in landscape can reflect a direct effect of the external environment on the host as well as an indirect effect through environmental disruption of the microbiome. Indeed, severe disruption of the microbiome could compromise the normal function of many host processes, including those which contribute to robustness. This would tend to destabilize (de-canalize) the adapted patterns, leading to dysbiosis and increased phenotypic variability (wider sampling of the available epigenetic landscape). While such microbial disruption is not expected in frequently occurring environments to which the organism is well adapted, it can be readily occur under severe stressful environments in which the organism is considerably

**FIGURE 1 | Canalization and its disruption by stressful environments. (A)** Schematic depiction of the epigenetic landscape metaphor, viewing the process of development of an organism (e.g., fly) as a movement down a canal in epigenetic landscape. The landscape depends on the host genome and its environment, including its microbiome (which can be viewed as an internal environment). The canal represents the stage-dependent processes preventing deviations from the regular patterns of development in an environment to which the organism is adapted. The deeper the canal, the more robust are the patterns. The movement is perpetuated over generations by generating a fertilized egg positioned at the top of the same canal. **(B)** De-canalization by exposure to a stressful new environment. In this example, the change in the environment modifies the epigenetic landscape and increases the propensity of moving within a different canal. This can

maladapted. Thus, the microbiome can contribute to stability in frequently encountered environments, while promoting variability in sufficiently stressful new environments (**Figure 1**, bottom panels).

#### **SPECIES-SPECIFIC RATES OF ADAPTATION CREATE HOST–MICROBE INCOMPATIBILITIES WHICH DISRUPT CO-ADAPTATION AND PROMOTE RECIPROCAL CHANGES**

Although the host and its microbiome form a partnership of organisms which propagate as a community, members of this partnership exhibit species-specific differences in the rate, type, and magnitude of responses to the environment (Zilber-Rosenberg and Rosenberg, 2008; Rosenberg and Zilber-Rosenberg, 2011). Owing to differences in generation time, population size, and developmental constraints, the microbiome can adapt to new environments on time scales that are much faster than those of genetic adaptation in the host (Zilber-Rosenberg and Rosenberg, 2008). For example, the doubling time of bacteria is on the order of 1 h whereas the generation time of a human host is about 20 years. Thus, within a single human generation, bacterial population can be propagated over 105 generations. Combining this large number of generations with very large population sizes [>10<sup>13</sup> bacterial cells in human (Backhed et al., 2005)] provides a very substantial potential for bacterial adaptation and diversification within result in altered somatic tissues, altered germline and altered microbiome. Inset provides color-coded representation of various potential changes in the embryos and larval stages of a developing fly that has been exposed to a new stressful external environment. The gut microbiome is represented by the outer layer of the egg and the tube in the larva. The early embryo corresponds to the inner part of the egg and the germline is represented by the circle in larva. A change in color corresponds to alteration in the respective organ following the change in the external environment. **(Bottom)** Proposed, context-dependent influence of the microbiome on stability/flexibility. The microbiome contributes to stabilization in the regular environment to which the organism is well adapted (left). Disruption of the microbiome in a sufficiently stressful environment (right) destabilizes the host, thereby increasing phenotypic variability (increased flexibility).

a single generation of the host. Notably, modifications in the microbiome include changes in gene sequence (and/or function) within existing bacteria, and changes in species composition of the microbial community. This community often comprises many different species, whose relative abundance depends on the environment. As such, a stressful environment for the bacteria leads to species-specific selection which alters the composition and genetic structure of the community. This selection can be independent of whether or not the stress is also compromising the survival of the host.

While the bacteria can undergo substantial adaptation to new environments already within a single or few generations of the host, the latter remains largely genetically adapted to the previous environment (**Figure 2**). This can compromise the adaptation of the host to the new environment as well as its compatibility with the microbiome, leading to several distinct effects: first, disruption of the microbiome can destabilize the host, thereby enhancing its phenotypic responsiveness to the new environment (potentially resulting in increased occurrence, magnitude, and diversity of the phenotypic response). Additionally, the change in the microbiome is itself an "internal" environmental input which feeds back on the physiological and epigenetic response of the host (**Figure 3**). For example, the microbiome can respond to a new stress by producing (or elevating) a substance that is toxic to the host. This substance

adds to the external stress and can modify the type and magnitude of the response in the host even if the external environment is back to normal (**Figure 3B**). This is because the microbiome may still remain altered and disruptive to the host. Thus, the response of the host to microbial-mediated inputs can persist as long as the host–microbe incompatibility is large enough to modify the host.

#### **TRANSGENERATIONAL IMPLICATIONS OF THE DISRUPTION OF COORDINATED ADAPTATION**

The presumed contribution of the microbiome to the stabilization of host development and physiology is analogous to that of other factors (e.g., Hsp90) which can confer robustness to the organism in frequently occurring environments. However, unlike many other factors, the change in composition and/or genetic sequence of resident bacteria might itself be heritable, thereby providing potential infrastructure for multi-generational influences on the host (Zilber-Rosenberg and Rosenberg, 2008). Indeed, the microbiome in many organisms has been shown to be transmitted across generations either vertically or horizontally (Bright and Bulgheresi, 2010). Vertical transmission refers to transfer within the zygote, while horizontal transmission is based on variousforms of pickup of micro-organisms from the surroundings, followed by their re-colonization within the host. These mechanisms of transmission normally facilitate inheritance of the intact (co-adapted) microbiome. However, the same mechanisms could support inheritance of environmentally modified microbiome. In line with this possibility, we have recently found that environmentally induced changes in the composition of gut bacteria in flies can be inherited (Fridmann-Sirkis et al., 2014). Similar findings are expected whenever the microbiome is altered to a degree which no longer supports full recovery within one generation. In this case, the offspring inherit altered microbiome which may or may not have been partially restored. The extent of inherited changes depends on the details of the external environment (type and strength of stress, its onset and duration, proximity to unaffected individuals, etc.), the internal conditions inside the host and the dynamics within the community of microbial species.

Modifications in the inherited microbiome, can change the initial conditions of offspring development. Depending on the structure and dynamics of the epigenetic landscape, this could correspond to a change in trajectory leading to a different outcome in the adult (**Figure 4**, top right). The altered adults could again produce modified embryos, thus enabling multigenerational changes in movements in epigenetic space. These multi-generational changes may involve different scenarios which could support non-Mendelian inheritance. Heuristic representation of some of these scenarios is depicted in the bottom panels of **Figure 4** (scenarios 2–5). For example, a change in the microbiome in one generation can potentially modify the host germline, leading to altered embryonic development of the offspring (viewed either as an altered initial point in the landscape or as movement in a modified landscape). Alternatively, persistent change in the microbiome can alter later stages of development, leading again to a modified offspring adult with a potentially modified microbiome. Evidence supporting potential influence of resident bacteria on the germline in insects has been reported for the endosymbiont *Wolbachia* (Werren, 1997; Stouthamer et al., 1999; Starr and Cline, 2002; Negri et al., 2006, 2009). More recent evidence in flies (*Drosophila melanogaster*) provided indirect evidence for influence of gut microbiome on the germ line (Fridmann-Sirkis et al., 2014). Specifically, removal of gut bacteria in one generation led to a substantial delay in larval development starting in the following generation. The phenotypic difference between parents and offspring following the depletion of gut bacteria (in parents) suggests an influence of these bacteria on the parental germline. Such a transgenerational effect of bacterial depletion was further shown to be responsible for the inheritance of a delay in larval development following parental exposure to G418 antibiotic (Fridmann-Sirkis et al., 2014). In this case, the delay in the first generation of larvae is caused by a direct toxicity of G418 in the host tissue, but is maintained in the non-exposed offspring by the transgenerational impact of depleted gut *Acetobacter* species in the G418-exposed parents. This unexpected situation is just one example of potentially diverse circumstances in which disruption of the microbiome could contribute to non-Mendelian transfer of influences. Thus, regardless of whether the effects of microbial changes are restricted to the soma or not, inheritance of a modified microbiome provides substantial infrastructure for multi-generational persistence of non-genetic changes in the host.

and without contribution of microbial disruption by the external environment (green and dotted red, respectively). Disruption of the microbiome can increase the response of the host, leading to a potentially faster and more pronounced response which is also more diverse and affects more individuals. **(B)** Same as **(A)** when the environment reverts back (E˜→E) within generation.

**FIGURE 4 | Potential transgenerational impacts of environmental exposures. Top:** Heuristic depiction of fly development in an unperturbed landscape (left) and in a landscape modified by exposure to a strong environmental stress (right). The flow in a modified canal leads to characteristic modifications in somatic tissues of the fly and/or its germline and/or its microbiome. **Bottom:** Illustration of a few possible scenarios of development in subsequent offspring of exposed ancestors. Scenario (1): if the parental change in the environment does not modify the genome or the epigenetic state of the fertilized egg and, in addition, the microbiome is either unperturbed or fully restored ("no memory," most left panel), the offspring may follow the regular trajectory in an unperturbed landscape. Scenarios (2 and 3): if the initial state of the fertilized egg and/or the microbiome associated with the

This persistence depends on the strength and duration of the environmental stress, the degree of incompatibility between the host and the microbiome, and their inherent plasticity, or adaptability. The manner in which these factors are combined has a significant effect on the timescale of establishing a new equilibrium state.

#### **TIMESCALES OF INTERACTIONS BETWEEN STRESS-INDUCED EPIGENETIC AND MICROBIAL CHANGES**

Current understanding of dynamical systems suggests that interaction between stress-induced disruption of the microbiome and

egg are modified following the parental exposure ("memory"), the development of the offspring may differ from their parents even if the external environmental reverts to normal already during the parental generation. In these cases, alteration in offspring development might reflect localization to a different canal in the original or in a modified landscape (scenarios 2 and 3, left trajectory), or localization to a canal that has been modified by the impact of having modifications in the fertilized egg and/or the microbiome (scenario 3, right trajectory). Scenarios (4 and 5): if the external environmental stress persists beyond the parental generation it may again lead to changes in the entry point and shape of the canal ("no accumulation," scenario 4) as well as further modify the entry point and shape ("accumulation," scenario 5).

modifications in the epigenetic state provide potential for a rich variety of dynamical outcomes. Indeed, coupling between relatively simple non-linear systems can generate diverse modes of collective or non-collective dynamics, depending on the intrinsic parameters of the uncoupled systems, the type and strength of the coupling, the type and magnitude of the non-linearity, and the initial conditions (Kuramoto, 1984; Glass and MacKey, 1988; Badola et al., 1991; Otsuka, 1991; González-Miranda, 2004). For example, specific choices of couplings between two non-linear oscillators with sufficiently similar intrinsic frequencies can lead to synchronization of these oscillators, doublings of their intrinsic

or collective periods, chaotic dynamics (of either one or both) and destruction of the oscillatory behavior (Sarkar et al., 2013). It is plausible that analogous wealth of dynamics could be generated by interaction between the microbiome and the epigenetic state of the host (regarded here in an inclusive sense which covers various types of non-genetic host-intrinsic changes). This interaction can be viewed as coupling between two non-linear systems, each modulating the activity of the other. Unlike the slowly varying genome sequence of the host, epigenetic changes in the host can occur on a range of timescales which overlaps with timescales associated with microbial changes. This provides potential for inter-modulations which prolong the duration of changes expected in a microbiomefree host. Encountering such scenarios under extreme forms of stress is quite plausible and may have occurred in a recent example of transgenerational inheritance following parental exposure to toxic stress (Stern et al., 2012). This inheritance was shown to involve host-intrinsic (Stern et al., 2014) and microbial-mediated mechanisms (Fridmann-Sirkis et al., 2014), and typically persisted for 3–10 generations. It is possible that interactions between the host-intrinsic and microbial-mediated modifications extended the duration of inheritance beyond the duration that would have been observed without an interaction.

Depending on the duration of change in the external environment, one can further distinguish three different regimes:

#### *Stressful external conditions which persist for only one, or a small number of generations*

This regime of stress may initiate reciprocal modifications in the microbiome and the host. For most stressful environments, the modifications in the host would be primarily non-genetic (e.g., physiologic, metabolic, epigenetic etc.'). The microbiome, on the other hand, may change in gene sequence and species composition (in addition to epigenetic changes). Owing to the heritability of some of these changes [and potentially also heritability of epigenetic changes in the host (Jablonka and Raz, 2009; Jablonka, 2013; Lim and Brunet, 2013)], return to the normal environment will not necessarily lead to immediate reversion of the host back to its previous state (**Figure 4**, panels 2 and 3). This is because reversal of the environment does not necessarily eliminate host–microbe incompatibilities (and/or persistent host-intrinsic epigenetic changes) that have been induced during the external stress period. The induced host–microbe incompatibility promotes cross talk between the microbiome and the physiologic/epigenetic state of the host, eventually leading to establishment of a new co-adapted state in which the host and its microbiome become re-adapted to one another. The process of relaxation toward an altered state could span multiple generations of the host, but might be counteracted by re-exposure to nonmodified microbiome (e.g., by infection-mediated transfer from individuals which were not exposed to the new environmental conditions).

#### *Persistence of the stressful new environment over timescales much longer than required to reach a new host–microbe equilibrium state*

This scenario likely promotes more extensive and longer lasting changes in the host and the microbiome. After a few generations in this stable environment, the microbiome will likely become irreversibly modified. The modified microbiome will continue to change with its host until they become sufficiently adapted to the altered environment and to one another. If, in addition, the stressful environment influences a large geographic region, it will substantially reduce the chances of encountering non-exposed individuals, thus reducing the chances of infectionmediated reversion to the original microbiome. It is also possible that the microbiome niche will be taken over by modified bacteria that are abundant enough to prevent re-colonization of the original bacteria. Additionally, the host may become sufficiently modified so as to favor the altered microbiome over the original microbiome.

#### *Persistence of external stress over a time period comparable to that required for co-adaptation*

In this intermediate case, equilibrium of the host and the microbiome can be re-challenged by the reversion of the environment, thus prolonging the establishment of a new stable state. Since the timescales of environmental change might indeed be comparable to those of epigenetic and microbial changes, this scenario may not be extremely rare. Additionally, the likelihood of encountering this scenario may increase when the organism is capable of modifying its external environment (niche construction).

Taken together, the above regimes support phenotypic diversification and/or adaptation over timescales ranging from few to many generations. This diversification may become irreversible prior to genetic changes and adaptation in the host. As such, it may provide important bridge between the physiological timescales of one generation and the much longer timescales of genetic adaptation and phylogenetic diversification.

According to the original canalization hypothesis (Waddington, 1942; Schmalhausen, 1949), a new co-adapted state of the host and its microbiome is expected to become progressively more canalized (stabilized) over time (**Figure 5**). The principles and mechanisms leading to progressive stabilization are not entirely clear. It was suggested that the trend toward stabilization is implemented by increasing the connectivity of the underlying regulatory network, which is in turn supported by a tendency of natural selection to favor stable development (Siegal and Bergman, 2002). An increase in regulatory complexity can be achieved by different means, including by wider integration (co-option) of the microbiome into host processes. Increasing integration may reflect an entropic effect which is expected from having many more potential scenarios of integration compared with no integration. Among these scenarios would be host–microbiome integrations which either increase or reduce the stability of development. However, if changes which increase the stability tend to be more persistent, the entropic force of increasing co-option is expected to promote progressively more extensive host–microbiome interactions that are biased toward increased stability.

#### **DIFFERENT UNITS OF SELECTION AND THEIR IMPLICATIONS FOR ADAPTATION AND EVOLUTION**

Much of the above considerations stems from having more than one unit of selection. First, the host and its microbes are subjected to selection as a cooperating community. This is the most relevant

unit of selection for the host. On the other hand, each of the resident bacterial species is subjected to its own selection conditions within the host. The outcomes of these individual selections feed into the higher level of host (or community) selection.

This interaction between different layers of selection has far reaching implications which are not accounted for in current models of population genetics (Jaenike, 2012). The latter often consider changes in allele frequencies in the host, ignoring the potential contribution of changes in bacterial allele frequencies to the selection of the host. These changes in the microbiome could affect the propagation of the host, either by modifying it or by changing its internal environment. Owing to these changes, the host phenotypes and epigenetic state may be stably altered prior to the emergence or fixation of adaptive genetic changes in its own genome.

Selection of bacterial species within the host may also promote maladaptive changes in the host, thus providing potential for conflicts between the microbiome and the host. On longer timescales, however, these conflicts are subjected to selection at the higher, community level, which could then promote successful communities, and hence, successful hosts (Zilber-Rosenberg and Rosenberg, 2008). A somewhat similar, host-intrinsic scenario involving multiple layers of selection is provided by somatic mutations which lead to generation of tumor cells. These cells undergo changes promoting their own propagation on the expense of the host. However, unlike the microbiome, these tumors are not inherited, and do not evolve to cooperate with their host.

What is then the added evolutionary value of having these two layers of selection? Here, one can immediately recognize a few important considerations: first, the microbiome offers a substantial gene and cellular pool which can evolve to support reproduction of the community under new environmental conditions (Zilber-Rosenberg and Rosenberg, 2008). For example, functions and products of host cells may be provided as a service by resident bacteria (Backhed et al., 2005). This includes extreme cases in which endosymbiotic bacteria synthesize organic compounds (e.g., by chemosynthesis) and provide nutrition to certain gutless flatworms (Ott, 1982). The feasibility of such remarkable

collaborations is indicative of the vast potential of the microbiome to generate "creative" solutions. Important components of this potential are the relative speed in which these solutions can emerge and the separation from the host. Having these rapid changes in entities distinct from the host allows the community to dramatically expand the range of genetic and cellular variation while paying significantly reduced costs of maladaptive changes. Indeed, a comparable extent of genetic and cellular changes in the host itself, will likely lead to failed development, or alternatively, tumorigenesis and death. In other words, variation of the microbiome is likely a safer option in much the same way as would therapy with bacteria as opposed to hostintrinsic gene therapy. Thus, the microbiome offers extended potential for rapid, flexible, and safer exploration of new solutions and functions which benefit the host. This potential is particularly important when the host encounters new problems which can be not eliminated fast enough by a change in the host, but may be alleviated by rapid adaptation of the microbiome. Thus, the microbiome may provide a developing organism with a substantial capability to adapt to novel stressful conditions. It is often assumed that the demand for such capability is not very strong because of the extremely low likelihood of encountering a truly novel environment. However, this rationale may not necessarily invalidate the demand for adaptive capacity because the organism may also face new stressful conditions in regular environments. Indeed, many changes in the genome, epigenome, or the microbiome can create stressful conditions. Owing to the immensely large space of possible changes at these levels, it is not be realistic to assume that a large fraction of all possible scenarios have been frequently encountered during the evolutionary history of the organism. In this case, the organism is not expected to be equipped with an efficient genetic program for coping with all these stressful scenarios. Thus, the demand for at least some ability to adapt to new conditions within one or few generations can be quite high. Future work will determine whether and how epigenetics and symbiosis provide the organism with such adaptive plasticity.

#### **DIRECTIONS FOR FUTURE STUDIES**

The hypothesized involvement of the microbiome in regulating stability and flexibility based on contextual demand generates clear predictions which could be used to assess the validity of this hypothesis. In particular, the hypothesis asserts that: (1) intact microbiome tends to suppress environmentally- and genetically-induced phenotypic variation (compared to modified microbiome), and (2) severe stressful conditions tend to compromise this buffering by disrupting the composition and/or genetic and epigenetic makeup of the intact microbiome. These predictions can be tested by combining experimental manipulation of the microbiome with environmental and genetic perturbations. For example, the buffering capacity of the intact bicrobiome can be tested by comparing phenotypic variation in germ-free animals with variation in animals containing intact bacteria. Since the hypothesized buffering refers only to a statistical tendency to stabilize the phenotypes, the evaluation must be based on a sufficiently large set of experimental setups, involving different phenotypes in a variety of organisms subjected to different conditions (e.g., malnutrition, exposure to environmental stressors, use of genetically modified animals etc.). If the hypothesis is correct, we expect a positive correlation between lack of bacteria and the extent of deviation from regular developmental phenotypes. Positive correlation is also expected for milder manipulations of the microbiome, but the statistical validation in this case likely requires averaging across larger sets of conditions and phenotypes. It is also important to note that compromised (or complete lack of) microbiome in otherwise good conditions might not be disruptive enough to modify particular phenotypes in the host. Still, if a compromised microbiome reduces buffering capacity in the host, it should increase the likelihood that additional environmental or genetic insults will induce phenotypic variation. We therefore expect that the positive effect of bacterial disruption on the induction of phenotypic variability would tend to increase as a function of the overall amount of stress. The second part of the hypothesis (environmental suppression of phenotypic buffering by bacteria), can be tested by exposing developing organisms to different types of severe environmental stress and analyzing the ability of the stress to modify the overall composition of the microbiome as well as the properties of its individual bacterial species. Here again, the evaluation should include a variety of stress paradigms which do not involve stressors that have been specifically selected for targeting bacteria.

To further evaluate the hypothesized contribution of the microbiome to stable diversification and adaptation prior to genetic changes in their host, it would be instrumental to establish experimental setups in which the host is lacking defined functionalities. Combining these with multi-generational analysis of the type and stability of changes in the host and its microbiome should provide direct assessment of the validity of this hypothesis.

While the above evaluations should ultimately include a variety of different organisms, it may be useful to focus on organisms enabling rapid and rigorous assessment. One example is the fruit fly, *D. melanogaster*, which contains a relatively small number of (culturable) bacterial species (Wong et al., 2011). Additionally, the gut microbiome of the fly can be easily manipulated without relying on anti-microbial agents that could also have a direct impact on host tissues (Ridley et al., 2013). Avoiding these direct impacts is crucial for creating defined setups enabling rigorous evaluation of the influence of microbial changes on the host (Ridley et al., 2013; Fridmann-Sirkis et al., 2014). It is also important to devote attention to the potential impact of bacterial manipulations on the germline. Such impact can generate phenotypic differences between parents and offspring (Fridmann-Sirkis et al., 2014) thereby invalidating certain comparisons between non-matched generations. Avoiding these potentially confounding effects is achieved in flies by dissolving and sterilizing the chorionic shell of fertilized eggs (without antibiotics or propagation under germ-free conditions for an uncharacterized number of generations). Larvae that hatch from these dechorionated eggs are devoid of gut bacteria and can be re-exposed to defined compositions of bacterial species (e.g., bacteria that were isolated from intact flies with or without additional manipulations). These useful features synergize with the obvious benefits of having a large knowledgebase, powerful genetic tools and generation time which is compatible with practical analysis over about 10 generations.

The outcome of these (and similar) studies may shed important light on two critical questions in gene regulation and evolution, namely: (1) How the balance between stability and flexibility of developing organisms can be regulated by the environment based on contextual demand, and (2) Whether and how epigenetic- and symbiotic-based responses to new stressful conditions are causally connected to longer-term establishment of genetic adaptations in the host. This includes evaluation of the scope and mechanisms by which the adaptive potential of the microbiome might contribute to rapid buildup of stable adaptive modifications in the host. Following these lines of investigation may uncover hitherto unrecognized mechanisms bridging ecological and evolutionary processes.

#### **ACKNOWLEDGMENTS**

I wish to thank the members of my lab for their insightful research contributions which helped in molding the ideas described in this manuscript. I thank Professor Erez Braun (Technion, Israel) and Professor Mike Walker (Weizmann Institute of Science, Israel) for helpful discussions and suggestions. I thank Genia Brodsky (Weizmann Institute, Graphics department) for valuable contribution to the design and preparation of the graphic presentations. This work was supported by the Sir John Templeton Foundation (grant ID: #40663), and THE ISRAEL SCIENCE FOUNDATION (grant No. 1860/13). Yoav Soen is Incumbent of the Daniel E. Koshland Sr. Career Development Chair at the Weizmann Institute.

#### **REFERENCES**


of *Drosophila melanogaster*. *Proc. Natl. Acad. Sci. U.S.A.* 107, 20051–20056. doi: 10.1073/pnas.1009906107


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 April 2014; paper pending published: 05 May 2014; accepted: 20 May 2014; published online: 20 June 2014.*

*Citation: Soen Y (2014) Environmental disruption of host–microbe co-adaptation as a potential driving force in evolution. Front. Genet. 5:168. doi: 10.3389/fgene.2014.00168 This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Soen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 14 August 2014 doi: 10.3389/fgene.2014.00274

### Effector bottleneck: microbial reprogramming of parasitized host cell transcription by epigenetic remodeling of chromatin structure

#### *Sara H. Sinclair 1,2,3,4 , Kristen E. Rennoll-Bankert 2,3 and J. S. Dumler 1,2,3,4 \**

<sup>1</sup> Graduate Program in Cellular and Molecular Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, USA

<sup>2</sup> Department of Microbiology and Immunology, School of Medicine, University of Maryland Baltimore, Baltimore, MD, USA

<sup>3</sup> Department of Pathology, The Johns Hopkins University School of Medicine, Baltimore, MD, USA

<sup>4</sup> Department of Pathology, School of Medicine, University of Maryland Baltimore, Baltimore, MD, USA

#### *Edited by:*

Ilaria Negri, University of Turin, Italy

*Reviewed by:* Ian C. G. Weaver, Dalhousie University, Canada Ravi Goyal, Loma Linda University, USA

#### *\*Correspondence:*

J. S. Dumler, Departments of Pathology and Department of Microbiology and Immunology, School of Medicine, University of Maryland Baltimore, 685 West Baltimore Street, Health Sciences Facility-1 Room 322D, Baltimore, MD 21201, USA e-mail: sdumler@som.umaryland.edu

Obligate intracellular pathogenic bacteria evolved to manipulate their host cells with a limited range of proteins constrained by their compact genomes.The harsh environment of a phagocytic defense cell is one that challenges the majority of commensal and pathogenic bacteria; yet, these are the obligatory vertebrate homes for important pathogenic species in the Anaplasmataceae family. Survival requires that the parasite fundamentally alter the native functions of the cell to allow its entry, intracellular replication, and transmission to a hematophagous arthropod. The small genomic repertoires encode several eukaryoticlike proteins, including ankyrin A (AnkA) of Anaplasma phagocytophilum and Ank200 and tandem-repeat containing proteins of Ehrlichia chaffeensis that localize to the host cell nucleus and directly bind DNA. As a model, A. phagocytophilum AnkA appears to directly alter host cell gene expression by recruiting chromatin modifying enzymes such as histone deacetylases and methyltransferases or by acting directly on transcription in cis. While cis binding could feasibly alter limited ranges of genes and cellular functions, the complex and dramatic alterations in transcription observed with infection are difficult to explain on the basis of individually targeted genes. We hypothesize that nucleomodulins can act broadly, even genome-wide, to affect entire chromosomal neighborhoods and topologically associating chromatin domains by recruiting chromatin remodeling complexes or by altering the folding patterns of chromatin that bring distant regulatory regions together to coordinate control of transcriptional reprogramming. This review focuses on the A. phagocytophilum nucleomodulin AnkA, how it impacts host cell transcriptional responses, and current investigations that seek to determine how these multifunctional eukaryotic-like proteins facilitate epigenetic alterations and cellular reprogramming at the chromosomal level.

**Keywords: epigenetics, nucleomodulin, DNA methylation, histone deacetylase, chromatin,** *Anaplasma phagocytophilum*

#### **INTRODUCTION**

In order to infect mammalian hosts, bacterial pathogens evolved an array of mechanisms that serve to create an environment conducive for survival, replication, and spread. While many bacterial species survive in an extracellular environment, intracellular pathogens must be capable of both entering their host cells undetected and altering the cellular milieu in order to replicate. Traditionally, the ability of bacterial-derived proteins to induce disruptions of cell signaling or major cellular processes such as NFκB, MAPK, and JAK/STAT pathways, are the predominant focus of host–pathogen interaction studies (Brodsky and Medzhitov, 2009). Recently, there has been an increasing interest in the ability of these intracellular pathogens to direct alterations in host cell gene expression that promote survival and replication (Paschos and Allday, 2010; Bierne et al., 2012; Silmon de Monerri and Kim, 2014). It is now well recognized that bacterial pathogens can reprogram host gene expression either directly or indirectly, by altering the accessibility of gene promoters via epigenetic modifications.

Eukaryotic DNA is highly organized and gene expression tightly regulated by an orchestrated network of proteins, RNAs, and other modulators. Histone octamers, or nucleosomes, organize DNA by acting as a bobbin on which the DNA winds. The charge of the histone proteins can be covalently modified in order to more tightly or loosely associate with DNA. Histone acetylation, which imparts a negative charge, is predominantly associated with an open configuration where promoters are easily accessed by RNA polymerases. Histone methylation or phosphorylation which impart a positive charge, cause DNA to more tightly associate with histone proteins, reducing promoter accessibility to transcription activating machinery. The process of modulating open and closed chromatin is further complicated by (i) the residue(s) of which histone protein(s) is/are modified, (ii) methylation of cytosine residues in DNA, and (iii) non-coding RNAs, and by the manner in which these mechanisms are intertwined. It therefore is no surprise that bacterial-derived proteins have evolved to interfere with host gene expression that improves bacterial fitness.

Over the last decade, examples of secreted bacterial effector proteins, ranging from those produced by *Listeria monocytogenes*, *Chlamydia trachomatis*, *Shigella flexneri*, and others, were found to target the host cell nucleus (Paschos and Allday, 2010; Bierne et al., 2012; Silmon de Monerri and Kim, 2014). These discoveries demonstrate the relationship of altered transcription and function of infected host cells and microbial survival, and in some cases, pathogenicity. *S. flexneri*, a bacterial pathogen that can cause dysentery, prevents NF-κB from binding its target gene promoters by altering the phosphorylation state of histone H3 at serine 10. The bacterium does so by secreting outer membrane protein F (OspF) that de-phosphorylates MAPKs in the nucleus resulting in a lack of histone H3 phosphorylation at serine 10 at a number of NF-κB-dependent genes (Arbibe et al., 2007). By altering NF-κB target gene transcription, *S. flexneri*suppresses the host cell inflammatory response promoting microbial survival and transmission (Arbibe et al., 2007). *L. monocytogenes* nuclear targeted protein A (LntA), blocks binding of heterochromatin inducing protein BAHD1 at interferon-stimulated genes resulting in upregulated expression (Rohde, 2011). Nuclear effector E (NUE) of *C. trachomatis* and RomA of *L. pneumophila* have methyltransferase activity and induce methylation of eukaryotic histones and altered host cell gene expression (Pennini et al.,2010; Rolando et al.,2013). Plant pathogens in the genus *Xanthomonas*, as well as *Ralstonia solanacearum*, *Burkholderia rhizoxinica*, and an endosymbiont of the plant pathogenic fungus *Rhizopus microsporus* all possess genes encoding transcription activator-like (TAL) protein effectors that when expressed, bind plant DNA and alter transcription and disease susceptibility (Sugio et al., 2007; Bogdanove, 2014). These examples are part of an expanding array of bacterial-derived proteins termed nucleomodulins that target host cell chromatin or chromatin-linked pathways to alter transcription, typically at one or a few host genes. To date, the only prokaryotic nucleomodulins shown to directly bind mammalian DNA and influence surrounding chromatin are from the *Anaplasmataceae* family. Ankyrin A (AnkA) of *Anaplasma phagocytophilum*, as well as Ank200 and several tandem-repeat containing proteins (TRPs) from *Ehrlichia chaffeensis*, have been shown to enter the nucleus and bind DNA, and interact with host epigenetic machinery or alter nearby histone octamers (Garcia-Garcia et al., 2009a; Luo et al., 2011; Zhu et al., 2011; Luo and McBride, 2012; Dunphy et al., 2013).

Targeting of individual genes or binding of an effector at a single chromatin region or small numbers of promoter loci could, in theory, lead to *cis* regulation of those loci and in part explain transcriptional alterations induced by infection. However, the degree of transcriptional alterations and the coordination of these events that dramatically affect cellular function programs are unlikely to be explained by individual targets given the complexity and repertoire of the human genome, at approximately 3251 Mb vs. genome sizes of *S. flexneri* (≤4.83 Mb; Jin et al., 2002), *C. trachomatis* (≤1.04 Mb; Stephens et al., 1998), *L. monocytogenes* (≤3.00 Mb; Evans et al., 2004), *L. pneumophila* (≤3.40 Mb; Chien et al., 2004), and *A. phagocytophilum* (≤1.47 Mb; Lin et al., 2011). We hypothesize that nucleomodulins can act broadly, even genome-wide, to affect entire chromosomal neighborhoods and topologically associating chromatin domains by recruiting chromatin remodeling complexes or by altering the folding patterns of chromatin that bring distant regulatory regions together to coordinate control of transcriptional reprogramming. This review will discuss the current knowledge of *A. phagocytophilum* subversion of host cells by nucleomodulins and how AnkA could play an even larger role than its documented effects on transcription.

#### *Anaplasma phagocytophilum*

*Anaplasma phagocytophilum,* transmitted by *Ixodes* spp. ticks*,* was discovered as the causative agent of human granulocytic anaplasmosis (HGA) in 1990 (Chen et al., 1994). While the infection is usually subclinical, manifestations in humans range from mild fever to severe infection requiring intensive care or even death (Chen et al., 1994). *A. phagocytophilum* is a Gram negative, obligate intracellular bacterium that is a parasite of neutrophils in mammalian hosts. Owing to their roles in protective inflammatory and immune responses toward microbial infections and their innate ability to recognize and kill pathogens, neutrophils are unlikely host cells for any microorganism. Yet, *A. phagocytophilum* colonizes these cells and thwarts their normal functions to create a hospitable environment for intracellular replication and subsequent transmission via tick bite. With a limited genome of approximately 1.5 Mb (Lin et al., 2011) the potential genetic reservoir for controlling an infected host cell, whether in the tick or mammal, is only a fraction of the tick or human genome. This simple observation suggests that to circumvent such extremes, its control mechanisms must be highly efficient, multifunctional, or target master regulators or similar checkpoints in eukaryotic cells.

One remarkable feature of *A. phagocytophilum*-infected cells is the marked change in transcriptional profiles that belies aberrant regulation of many key host pathways (Borjesson et al., 2005; de la Fuente et al., 2005; Lee et al., 2008). *A. phagocytophilum*-infected neutrophils are characterized by an "activated-deactivated" phenotype with major functional aberrations including decreased respiratory burst, delayed apoptosis, reduced transmigration across endothelial cell barriers, and decreased antimicrobial activities such as phagocytosis and microbial killing, while simultaneous increases in degranulation of vesicle contents including proteases and increased production of chemokines, are observed (Carlyon et al., 2002; Carlyon and Fikrig, 2003; Choi et al., 2005; Dumler et al., 2005; Garyu et al., 2005; Ge et al., 2005; Carlyon and Fikrig, 2006; Garcia-Garcia et al., 2009b). The net result of these changes is an increase in (i) inflammatory responses that recruit new host cells, (ii) prolonged survival of infected cells, (iii) an inability to kill internalized microbes, and (iv) net sequestration of infected cells within the intravascular compartment that is more readily accessed by the next tick bite (Rennoll-Bankert and Dumler, 2012). In fact, interference with any of these processes using *in vitro* and *in vivo* models leads to reduced fitness of the microbe underscoring how important these functional changes are in mammalian hosts.

The biological basis for such dramatic changes in neutrophil function is increasingly studied by methods that range from examination of individual pathways to genome-wide systems biology approaches. Transcriptional profiling of infected neutrophils and HL-60 cells, the latter a commonly used cell model for *A. phagocytophilum* infection, reveals altered transcription genome-wide, confirming that changes in transcription are not restricted to a few genes and limited cellular functions, but likely play a role in most of the known functional changes induced by infection (Borjesson et al., 2005; de la Fuente et al., 2005; Lee et al., 2008). Transcriptional downregulation of two components of the NADPH oxidase, *CYBB* encoding NOX2 or gp91phox and the GTPase *RAC2*, that assemble in the membranes of activated cells to generate superoxide production and promote microbial killing plays a role in prolonged inhibition of respiratory burst (Banerjee et al., 2000; Carlyon et al., 2002). In addition, delayed apoptosis is achieved by maintained transcription of *BCL2* family members (Choi et al., 2005; Ge et al., 2005) and increased proinflammatory responses are due to upregulated transcription of cytokine and chemokine genes such as*IL8* (Klein et al.,2000;Akkoyunlu et al.,2001; Scorpio et al., 2004).

While many studies document these changes at functional and transcriptional levels, the precise mechanisms that organize and coordinate alterations to support improved *A. phagocytophilum* fitness are not well addressed. The biggest advance in understanding these processes occurred with the discovery of secreted prokaryotic effector proteins and their secretion systems. The genome of *A. phagocytophilum* encodes a type IV secretion system (T4SS) that allows protein effectors to translocate into infected host cells where they likely act by mechanisms similar to those of other bacterial secreted effectors that target cytosolic pathways such as signal transduction, cytoskeletal rearrangements, intracellular trafficking, etc. (Ohashi et al., 2002; IJdo et al., 2007; Gillespie et al., 2009). These observations also hold true for *A. phagocytophilum* effectors. However, not all effector proteins remain localized within the host cytosol, and those that enter the nucleus have direct access to genes as well as to a distinct and diverse array of proteins that could impact cellular function on a global scale.

#### **AnkA OF** *Anaplasma phagocytophilum*

*Anaplasma phagocytophilum* expresses a number of major immunoreactive proteins, including major surface protein 2/p44 (Msp2/p44) and AnkA, the latter of which was found by Caturegli et al. (2000) using immunoelectron microscopy to be transported into the host cell nucleus and bound within heterochromatin (Park et al., 2004). AnkA contains many EPIYA motifs that become tyrosine phosphorylated and recruit SHP-1 and ABL1 when introduced into mammalian cells which in turn regulates endosomal entry and intracellular infection (IJdo et al., 2007; Lin et al., 2011). Aside from this, AnkA contains multiple eukaryotic motifs including 8–15 or more ankyrin repeats, a putative bipartite nuclear localization signal (NLS), and a putative high mobility group Nchromatin unfolding domain (HMGN-CHUD; Caturegli et al., 2000). The ankyrin repeat domains are organized tandemly, creating highly stable spring-like structures that allow protein– protein or protein–DNA interactions and are commonly found in transcription factors and their regulatory proteins (Bork, 1993; Jernigan and Bordenstein, 2014). HMGN-CHUD domains facilitate binding to nucleosomes to alter chromatin structure and transcription of surrounding genes (Bustin, 1999). The presence of these motifs lends credence to the idea that AnkA plays a role in altering neutrophil transcription.

Park et al. (2004) confirmed thatAnkA directly binds both DNA and nuclear proteins, and provided limited evidence of its capacity to bind broadly throughout the human genome. AnkA binds to regions at the promoter of *CYBB*, encoding the gp91phox component of phagocyte oxidase (Garcia-Garcia et al., 2009b). *CYBB* is known to be transcriptionally repressed with *A. phagocytophilum* infection, further suggesting that AnkA acts in *cis* to alter gene transcription (Thomas et al., 2005; Garcia-Garcia et al., 2009b). Moreover, transcription of *CYBB* is decreased in a dose dependent manner as nuclear AnkA concentrations increase (Garcia-Garcia et al., 2009b). When AnkA-expressing plasmids are transfected into HL-60 cells, *CYBB* expression is dampened, strengthening the link to transcriptional regulation (Garcia-Garcia et al., 2009b). In electrophoretic mobility shift assays, AnkA binds to the *CYBB* promoter at the same locations as other known transcriptional regulators such as CAATT displacement protein (CDP) and special-AT rich binding protein-1 (SATB1; Thomas et al., 2005; Garcia-Garcia et al., 2009b; Rennoll-Bankert and Dumler, 2012). Surprisingly, DNA binding by AnkA does not target a conserved DNA sequence motif or signature; rather it binds to regions rich in AT nucleotides that have specific structural qualities, including the ability to uncoil under superhelical stress (Park et al., 2004; Garcia-Garcia et al., 2009b). The latter feature is often observed in eukaryotic proteins that tether DNA strands together into the nuclear matrix at matrix attachment regions (MARs) to regulate transcription from distantly located but functionally related chromosomal regions; proteins that bind these regions are called MAR-binding proteins (MARBPs).

Interestingly, Garcia-Garcia et al. (2009a) showed that multiple downregulated defense genes in infected granulocytes were clustered on chromosomes. The spatial clustering of similarly regulated genes suggests that, in addition to *cis*-regulation as with *CYBB*, long ranges of chromosomes are affected – a process often attributed to epigenetic mechanisms such as DNA methylation and histone chromatin modifications. Using chromatin immunoprecipitation (ChIP) to investigate histone marks at defense gene promoters, acetylation of histone H3 was dramatically reduced, a finding often associated with silenced gene transcription (Garcia-Garcia et al., 2009a). To explain this, binding of histone deacetylase-1 (HDAC1) was found increased across multiple defense gene promoters (Garcia-Garcia et al., 2009a). Moreover, *A. phagocytophilum*-infected granulocytes have increased HDAC activity most likely explained by the increased quantity of both HDAC1 and HDAC2. Inhibition of *HDAC1*, but not *HDAC2* expression by siRNA or pharmacologic inhibition of HDAC activity impairs *A. phagocytophilum* propagation, whereas overexpression leads to increased intracellular propagation (Garcia-Garcia et al., 2009a). We modeled this process by using the wild type *CYBB* promoter, to which AnkA binds, and mutated forms unable to bind AnkA in order to demonstrate that AnkA binding leads to HDAC1 recruitment and silencing of expression at *CYBB* (Garcia-Garcia et al., 2009b). These results clearly link AnkA with HDAC1 toward facilitating widespread downregulation of antimicrobial responses.

Currently, most data examining the ability of HDAC1 and AnkA to modulate transcription focuses on a small number of loci, including *CYBB* and up to 17 other gene promoters. While AnkA and HDAC1 are important contributors to transcriptional downregulation of some genes in *A. phagocytophilum-*infected cells, neither has been determined to provide a broad mechanism that coordinates the transcriptional or functional alterations observed. Importantly, transcriptional profiling of *A. phagocytophilum*infected granulocytes, including both primary neutrophils and cell lines, demonstrates that the majority of differentially expressed genes are upregulated (Borjesson et al., 2005). Thus, while HDAC1 recruitment by AnkA is important for down-regulating some genes, the majority of DEGs are likely regulated by alternative or additional mechanisms.

Of considerable interest is the interplay between chromatin structure and DNA methylation. Histone modifications, DNA methylation at CpG islands, and binding of methyl CpG DNAbinding proteins (MeCPs) via their methyl-DNA binding domains (MBDs) occur in concert during biological responses including neoplasia and cellular differentiation (Baylin and Herman, 2000; Baylin et al., 2001; Stirzaker et al., 2004; Srivastava et al., 2010; Reddington et al., 2013). Cytosine residues of CpG dinucleotides can by methylated by DNA methyltransferases (DNMTs) leading to sustained DNA methylation over multiple generations of cell divisions. It is believed in part that the methylation protects genes and transcriptional programs from inappropriate ectopic expression as cells enter various stages of tissue differentiation (Srivastava et al., 2010). DNMT1 is highly conserved and is thought to be primarily responsible for maintaining existing methyl CpGs (Baylin et al., 2001). In contrast, DNMT3a and 3b are believed to undergo transient *de novo* expression that induces CpG methylation in response to cellular stimuli (Okano et al., 1999). MBDs can recognize these newly methylated regions of DNA and are often associated with HDACs in MeCP1 and MeCP2 complexes, illustrating the potential for a direct link between DNA methylation and alterations in chromatin structure (Jones et al., 1998; Nan et al., 1998; Ng et al., 1999). Generally, hypermethylation of gene promoters occurs synchronously with histone deacetylation and decreased gene expression of many tumor suppressor genes (Baylin and Herman, 2000; Baylin et al., 2001; Stirzaker et al., 2004). This complex interplay has been best studied in cancer genomes, where it is currently unclear whether DNA methylation induces alterations in chromatin structure or if chromatin structure induces changes in DNA methylation. However, given the interplay between HDAC activity and DNA methylation, it is not unreasonable to hypothesize that *A. phagocytophilum* infection induces hypermethylation of the host genome during the process of global transcriptome reprogramming (**Figure 1**).

In addition to chromatin alterations induced by HDACs, MARs are responsible for dictating the three-dimensional architecture of chromatin loops and serve as tethering points for DNA to the nuclear matrix (Spector, 2003; Kumar et al., 2007; Kohwi-Shigematsu et al., 2012). Arranging chromatin into loops allows transcription factors attached to the matrix access to promoters and brings distal genomic loci and regulatory regions into a position of proximity for coordinated regulation (Arope et al., 2013). Interestingly, the AT-rich DNA docking sites of AnkA are similar to those of MARs. In fact, several transcription factors

During Anaplasma phagocytophilum infection, AnkA accumulates in the host cell nucleus where it can directly bind DNA and influence the transcription of CYBB, which encodes the gp91phox component of the NADPH oxidase. **(A)** In the absence of infection, CYBB is activated by inflammatory signals and is easily transcribed. **(B)** During infection, AnkA binds the proximal promoter and recruits HDAC which deacetylates nearby histones in order to inhibit transcription. **(C)** We propose that additional host or bacterial-derived chromatin modifying enzymes, such as DNMT methylation of host CpGs, may also be involved in altering the host epigenome to the bacterium's advantage.

including SATB1, bind MARs to coordinately modulate transcription from large genomic regions (Kumar et al., 2007). Like AnkA, SATB1 occupies the *CYBB* promoter during myeloid differentiation where it represses the transcription of *CYBB*, is involved in maintaining expression of *BCL2* (Gong et al., 2010) which is transcriptionally sustained during *A. phagocytophilum* infection (Ge et al., 2005) and is implicated in activation of cytokine expression in T cells (Hawkins et al., 2001; Beyer et al., 2011). SATB1 interactions with chromatin remodeling complexes include HDACs which are likely involved in some of these processes (Yasui et al., 2002). The complex looping of chromatin facilitated by MARs and MAR binding proteins allows for entire chromatin domains and territories to be remodeled despite relatively few binding sites (Cai et al., 2006). For example, SATB1 binds to only nine regions across the 200 kb TH2 locus, yet it is a major regulator of TH2 lymphocyte differentiation and function (Cai et al., 2006). Complex protein interactions involving chromatin remodeling and histone modification mediated by anchoring proteins like SATB1 make it plausible that DNAbinding bacterial nucleomodulins such as AnkA could target broad transcriptional programs that belie functions of host cells. While only limited data currently exist, AnkA binds to at least 23 distinct sites on 12 separate chromosomes with *A. phagocytophilum* infection of the human HL-60 promyelocytic cell line (Park et al., 2004) detailed mapping of AnkA genomic binding sites is now

in progress. We therefore think it is plausible that AnkA binds throughout the genome and exerts its effects, like SATB1, to both repress and activate transcription by tethering chromatin to the nuclear matrix and exposing promoters to chromatin remodeling complexes.

#### *Anaplasma phagocytophilum* **INFECTION IS ASSOCIATED WITH TRANSCRIPTIONAL CHANGES OVER MEGABASES IN THE HUMAN GENOME**

As a relatively new field, studies that examine the epigenome of cells infected by or as a consequence of bacterial infection, whether parasitic or symbiotic, are few and far between. Given the proven interaction of *A. phagocytophilum* AnkA with gene promoters, epigenetic alterations of nearby chromatin, and its MAR-binding attributes, we reanalyzed publicly available transcription microarray data generated from *A. phagocytophilum*-infected human peripheral blood neutrophils (Borjesson et al., 2005). We used updated bioinformatics tools, including an analysis of the SD estimates for differential transcription at each locus, and wherever possible, improved gene and gene feature localization using the NCBI GRCh38 human genome assembly (release date December 24, 2013). The differential expression of all 18,400 interrogated transcripts and variants, including 14,500 well-characterized human genes, was mapped by chromosomal position so that the relationship between the linear chromosome landscape and transcriptional activity could be visualized and investigated. To display physical locations on chromosomes with long regions of altered

transcription, the fold change of expression for each locus was averaged with the nearest neighboring genes to create sliding windows for each chromosome. Overall, the median sliding window included 10.0 genes/gene features (IQR 2.0) spanning a median interval of 2.04 Mb (IQR 2.16) corresponding approximately to the lower end dimensions of chromosomal or gene expression dysregulation domains (Letourneau et al., 2014). The sliding window average along with individual gene transcriptional fold changes were plotted for each chromosome. To assess significance of fold changes in the sliding window, a similar sliding window was created to simultaneously show the average estimated SDs over each window.

All chromosomes showed linear regions of marked differential transcription covering megabases in scale on both p and q arms (**Figure 2**). In keeping with the original observations of Borjesson et al., most clusters were upregulated; however, the analysis also demonstrated a similar pattern of downregulated genes over long linear regions. Of interest was chromosome 17 with a 9.4 Mb cluster containing myeloid peroxidase (*MPO*), eosinophil peroxidase (*EPX*), and lactoperoxidase (*LPO*) which showed marked downregulation, and a 3.2 Mb cluster containing multiple upregulated chemokine genes (**Figure 3**). In addition to reduced transcript levels, *MPO* and *EPX* promoters were previously shown to be deacetylated upon *A. phagocytophilum* infection (Garcia-Garcia et al., 2009a). Furthermore, a 3.6 Mb cluster on chromosome 6 containing *FLOT1*, *TNF*, and the major histocompatibility complex (*MHC*) was upregulated (**Figure 4**). The

**linear genomic regions in** *A. phagocytophilum***-infected human peripheral blood neutrophils.** The top panel shows human chromosome 17; the bottom panel shows the MPO/EPX /LPO locus on human chromosome 17 and the large genomic region that is upregulated with infection. The chromosome ideogram is shown at the top. Red bars (left axes) represent differential transcription of individual genes, including some replicates; the dark blue zones (left axes) show the sliding window average log2-fold differential transcription over the contiguous 9–11 genes (or 0.38–1.25 Mb). The light blue zone at the bottom (right axis) shows the sliding window average over the same region for estimated SDs of log2-fold differential transcription at each gene or gene feature. Data re-analyzed from Borjesson et al. (2005) http://www.ncbi.nlm.nih.gov/geo/, accession no. GSE2405.

**FIGURE 4 | Differential gene expression patterns comparing** *A. phagocytophilum***-infected vs. non-infected human peripheral blood neutrophils are clustered over large linear genomic regions.** The top panel shows human chromosome 6; the bottom panel shows the MHC locus on human chromosome 6 including a large genomic region spanning the TNF and HLAD loci that are upregulated with infection. The chromosome ideogram is shown at the top. Red bars (left axes) represent differential

transcription of individual genes, including some replicates, the dark blue zones (left axes) show the sliding window average log2-fold differential transcription over the contiguous 9–12 genes (or 0.42–3.24 Mb). The light blue zone at the bottom (right axis) shows the sliding window average over the same region for estimated SDs of log2-fold differential transcription at each gene or gene feature. Data re-analyzed from Borjesson et al. (2005) http://www.ncbi.nlm.nih.gov/geo/, accession no. GSE2405.

chromosomal landscape of the *MHC* locus is well documented with respect to chromatin looping via MARS and histone marks (Ottaviani et al., 2008). It is difficult to determine with certainty whether these changes are the result of manipulation by microbial effectors like AnkA, or whether these represent host cellular responses to infection, or an amalgam of both. The limited genome-wide transcriptional alterations originally documented with exposure to heat-killed *A. phagocytophilum* argue for the former (Borjesson et al., 2005). Regardless, these observations clearly suggest that *A. phagocytophilum* infection targets large chromosomal territories to induce transcriptional alterations, in addition to specific genes. Given that HL-60 cells transfected to express AnkA have nearly identical differential gene expression patterns (Garcia-Garcia et al., 2009b) and reductions in induced respiratory burst compared to *A. phagocytophilum*-infected cells, a hypothesis that prokaryotic nucleomodulins evolved to modulate cell function by epigenetic alterations is compelling.

#### **CONCLUSION AND FUTURE DIRECTIONS**

Given the limited nature of bacterial genome repertoire, and the expansive genome repertoire, organization and complexity of transcriptional regulation in eukaryotes, the ability of prokaryotes to alter eukaryotic host cell functions at the epigenetic level is truly a remarkable phenomenon. The processes by which bacteriaderived proteins alter host cell signaling, endocytic or vesicular trafficking, and *cis* transcription of genes are important. However, these pathways become challenging as paradigms that can account for the marked and diverse changes in host cell transcription and functions during the course of infection. We believe that successful intracellular prokaryotes, whether parasitic and pathogenic, mutualistic or symbiotic, evolved multifunctional and complex proteins with broad effects that target key master regulators or checkpoints that regulate major cellular reprogramming events. The *A. phagocytophilum* genome is a fraction of the size of its human host, yet it manages to disrupt core functions of the cell by transforming the transcriptome and reprogramming the cell for improved microbial fitness, survival, and transmission. It is increasingly apparent that *A. phagocytophilum* and perhaps other intracellular prokaryotes manipulate their hosts with a high degree of efficacy, but by altering the "on" or "off " state of genes in a *cis* only fashion or targeting individual or small numbers of signaling pathways is unlikely to account for this alone, in essence, an "effector bottleneck". An analysis of available infection transcriptome data demonstrates that large chromosomal territories appear to be coordinately regulated – up or down. This observation, along with HDAC recruitment by AnkA, lends strong evidence to the idea that *A. phagocytophilum* induces global changes in host gene expression via a broad mechanism that involves epigenetic regulation of transcriptional programs which belie cellular functions.

It is currently unclear as to whether HDAC recruitment is the predominant mechanism by which AnkA exerts its chromatin modulating effects, including whether there are other host factors [e.g., polycomb repressive or hematopoietic associated factor-1 (HAF1) complexes] or additional bacterial-derived nucleomodulins that further contribute to the dramatic changes observed. To examine this, we designed a genome-level bioinformatics approach to predict nucleomodulins based on the combination of secretory and NLSs and studied the genomes of 12 phylogenetically diverse intracellular prokaryotic pathogens, including *Chlamydia pneumoniae*, *Mycobacterium tuberculosis*, and *Yersinia pestis* among others, and identified between 7 and 35 candidates for each (Borroto et al., 2009). Among these and by using iTRAQ proteomics approaches, we identified other proteins encoded in the *A. phagocytophilum* genome that localize to the host cell nucleus and are examining whether these too contribute to chromosomal structure alterations or work in synergy to enhance AnkA function (Gilmore et al., 2012). Elucidating the exact locations where AnkA or other nucleomodulins bind throughout the genome, and determining the architecture of the surrounding chromatin will be crucial to understanding this process. The interplay between HDACs, DNMTs, and methyl binding proteins suggests that *A. phagocytophilum* infection could also induce widespread DNA methylation perhaps as a mechanism for obtaining broad epigenetic changes and functional reprogramming.

Understanding the role of prokaryotic control over complex eukaryotic transcription machinery in lieu of the bottleneck that would occur with single effector targets at signaling pathways will allow for better understanding of the essential components of whole cell transcriptional re-programming. However, this has implications not only for host–pathogen or host–symbiont interactions, but across biology. The ability of proteins like AnkA of *A. phagocytophilum* and OspF of *S. flexneri* to alter inflammatory responses could provide opportunities for engineering of therapeutic agents which interfere with cellular responses in inflammatory diseases or other conditions where epigenetic factors control or contribute to disease.

#### **ACKNOWLEDGMENTS**

Supported by grant R01AI044102 from the US National Institutes of Allergy and Infectious Diseases, National Institutes of Health and by Mid-Atlantic Regional Center for Excellence in Biodefense and Emerging Infectious Diseases (MARCE) Career Development award 0002374. The authors thank J. C. Garcia-Garcia for creating the algorithm to identify potential nucleofactors in prokaryotic genomes and for work on iTRAQ studies.

#### **REFERENCES**


lineages. *Appl. Environ. Microbiol.* 70, 2383–2390. doi: 10.1128/AEM.70.4.2383- 2390.2004


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 June 2014; paper pending published: 23 July 2014; accepted: 26 July 2014; published online: 14 August 2014.*

*Citation: Sinclair SH, Rennoll-Bankert KE and Dumler JS (2014) Effector bottleneck: microbial reprogramming of parasitized host cell transcription by epigenetic remodeling of chromatin structure. Front. Genet. 5:274. doi: 10.3389/fgene.2014. 00274*

*This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Sinclair, Rennoll-Bankert and Dumler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited andthatthe original publication inthis journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Bacterial SET domain proteins and their role in eukaryotic chromatin modification

#### *Raúl Alvarez-Venegas\**

*Laboratory of Chromatin and Epigenetics, Department of Genetic Engineering, CINVESTAV Unidad-Irapuato, Irapuato, México*

#### *Edited by:*

*Ilaria Negri, University of Turin, Italy*

*Reviewed by: Abhijit Shukla, Harvard Medical School, USA Sateesh Kagale, National Research Council Canada, Canada*

#### *\*Correspondence:*

*Raúl Alvarez-Venegas, Laboratory of Chromatin and Epigenetics, Department of Genetic Engineering, CINVESTAV Unidad-Irapuato, Km. 9.6 Libramiento Norte, Carr. Irapuato-León C.P. 36821, Irapuato, México e-mail: ralvarez@ira.cinvestav.mx*

It has been shown by many researchers that SET-domain containing proteins modify chromatin structure and, as expected, genes coding for SET-domain containing proteins have been found in all eukaryotic genomes sequenced to date. However, during the last years, a great number of bacterial genomes have been sequenced and an important number of putative genes involved in histone post-translational modifications (histone PTMs) have been identified in many bacterial genomes. Here, I aim at presenting an overview of SET domain genes that have been identified in numbers of bacterial genomes based on similarity to SET domains of eukaryotic histone methyltransferases. I will argue in favor of the hypothesis that SET domain genes found in extant bacteria are of bacterial origin. Then, I will focus on the available information on pathogen and symbiont SET-domain containing proteins and their targets in eukaryotic organisms, and how such histone methyltransferases allow a pathogen to inhibit transcriptional activation of host defense genes.

**Keywords: SET-domain, bacteria, pathogen, symbiont, histone methyltransferases, epigenetics**

#### **INTRODUCTION**

In eukaryotic organisms, genomic DNA is packaged in the form of a complex structure known as chromatin. The basic unit of chromatin is the nucleosome, consisting of approximately two superhelical turns of DNA wrapped on a histone octamer composed of four histone species: a histone H3/H4 tetramer and two histone H2A/H2B dimmers (Kornberg, 1977). This nucleoprotein complex occurs basically every 200 ± 40 bp in all eukaryotic genomes. The repeating nucleosome units further assemble into higher-order structures stabilized by the linker histone H1.

Core histones are predominantly globular except for their N-terminal "tails," which are unstructured. A remarkable attribute of histones, and mainly of their tails, is the great number and type of modified residues they possess (Kouzarides, 2007). Thus, there are several histone post-translational modifications (histone PTMs) that correlate with either positive or negative transcriptional states. These modifications include acetylation, phosphorylation, methylation, ubiquitylation, and SUMOylation. Most modifications localize to the amino- and carboxy-terminal histone tails, and a few localize to the histone globular domains (Berger, 2007). It has been suggested that the combination of histone amino-terminal modifications on one or more histones represent a "histone code" that modulates gene expression, regulates chromatin structure, and determines cellular and epigenetic identities during development, therefore extending the information potential of the genetic code encoded in the DNA (Jenuwein and Allis, 2001; Casadio et al., 2013). Recently, with the discovery of several novel histone-binding modules the histone-code hypothesis predicts the existence of "reader proteins" that recognize chromatin covalent-modification marks to influence downstream events (Ruthenburg et al., 2007). On the other hand, all histone PTMs are removable. For example, histone deacetylases (HDACs) remove acetyl groups; Ser/Thr phosphatases remove phosphate groups; ubiquitin proteases remove mono-ubiquitin from H2B; arginine methylation is altered by deiminases; and two classes of lysine demethylases remove methyl groups from lysines: the LSD1/BHC110 class and the jumonji class (Berger, 2007).

One of the most extensively studied histone PTMs has been the methylation of lysine residues in histones. Accordingly, the first histone lysine methyltransferase (HKMT) to be identified was the human and mouse SUV39H1 that targets lysine 9 of histone H3 (H3K9) (Rea et al., 2000). Since then, numerous HKMTs have been identified, most of which methylate lysines within the histone N-terminal tails. Noticeably, all of the HKMTs that methylate N-terminal lysines contain the SET domain, a ∼130 amino acid catalytic domain initially found to be conserved in the PEV modifier SU(VAR)3-9, the Polycomb-group protein E(Z), and the trithorax-group protein TRX (Jenuwein et al., 1998). Crystal structures of SET-domain proteins have revealed that the SET domain is folded into several small β sheets (packed together with pre-SET and post-SET domains or regions) surrounding a knot-like structure. This "pseudo-knot" fold brings together the two most-conserved sequence motifs of the SET domain (RFINHXCXPN and ELXFDY) to form an active site in a location immediately next to the pocket where the methyl donor binds and to the peptide-binding cleft (Dillon et al., 2005). Moreover, it has also been shown that both N- and C-terminal flanking regions to the SET-domain are as well required for HKMT activity (Rea et al., 2000).

Histone lysine residues methylated *in vivo*, in animals, include H3K4, H3K9, H3K27, H3K36, H3K79, H4K20, H2BK5, and H1K26 (Barski et al., 2007). The first H3K4 (histone H3 lysine 4) methylase, Set1/COMPASS, was isolated from *Saccharomyces* *cerevisiae* and was demonstrated to be capable of mono-, di-, and tri-methylate H3K4 (Miller et al., 2001; Roguev et al., 2001). Thus, methylation can occur several times on one lysine side chain and each level of modification may have different biological outcomes. Furthermore, of the large family of SET domains, a subset is encoded by bacterial pathogens and symbionts, which lack chromatin, suggesting a role in altering the host chromatin upon infection. For instance, pathogenic bacteria make use of a wide range of strategies to avoid elimination by their host. Then, aiming at histone modifications could allow a pathogen to inhibit transcriptional activation of host defense genes. Hence, pathogenic bacteria can be considered as "epimutagens" able to remodel the epigenome. Their effects might generate specific, long-lasting imprints on host cells, leading to a memory of infection that influences immunity and that might be the foundation of unexplained diseases.

Here, I will examine the available information on SET domain genes identified in bacterial genomes. I will analyze the hypothesis that SET domain genes found in bacteria are of bacterial origin. Then, I will concentrate on the information related to pathogen and symbiont SET-domain containing proteins, their targets in eukaryotic organisms, and on how such histone methyltransferases could allow a pathogen to inhibit transcriptional activation of host defense genes.

#### **BACTERIAL SET DOMAIN**

Considering that SET-domain containing proteins modify chromatin structure, as expected, genes coding for SET-domain containing proteins have been found in all eukaryotic genomes sequenced to date. However, during the last years, thanks to the recent advances in biological research such as nextgeneration sequencing, a great number of bacterial genomes have been sequenced and, what's more, an important number of putative genes involved in histone PTMs have been identified in many bacterial genomes. For example, from 390 completely and partially sequenced bacterial genomes available in 2007 at the National Center for Biotechnology Information database (NCBI), 83 bacterial species encoding putative SET domain proteins were retrieved (Alvarez-Venegas et al., 2007). However, a basic BLASTP search performed today retrieves more than 500 bacterial genomes (and counting; Supplementary Table ST1), including very closely related genomes, all of them containing SET-domain proteins. The number of hits range from one to four SET coding genes per genome. Interestingly, species like *Gemmata obscuriglobus* (a nonpathogenic spherical budding bacteria; NCBI locus WP\_010044726), *Bacillus coahuilensis* (a Gram-positive, sporeforming bacterium from a highly saline desert lagoon; NCBI locus WP\_010172611), *Opitutus terrae* PB90-1 (an obligatory anaerobic bacteria isolated from rice paddy soil; NCBI locus YP\_001821399), *Burkholderia rhizoxinica* HKI 454 (an intracellular symbiont of a phytopathogenic fungus; NCBI locus CBW73645), *Methanoregula boonei* 6A8 (a novel acidiphilic, hydrogenotrophic methanogen; NCBI locus WP\_012107610), and many more interesting bacterial species have recently been shown to code for SET-domain containing proteins (Supplementary Table ST1).

At first, proteins with SET domain present in bacterial genomes were considered to be the result of horizontal transfer from a eukaryotic host (Stephens et al., 1998; Aravind and Iyer, 2003). However, nowadays, the expanded collection of sequenced bacterial genomes, to include not only pathogenic but also free-living and environmental species, as well as methanogenic archaea, indicate that SET domain genes have existed in the bacterial domain of life (Alvarez-Venegas et al., 2007). Furthermore, Aravind et al. (2011) have proposed that SET domain methylases, which display the β-clip fold, first emerged in prokaryotes from the SAF superfamily of carbohydrate-binding domains (based on its representative members, SAS, type III AFP and FlgA). Thus, taking into account recent evidence that supports a chromatinrelated role for at least a portion of the bacterial SET domain versions, it is likely that the SET domain had matured into a primitive chromatin-remodeling enzyme in prokaryotes, prior to its transfer to eukaryotes (Aravind et al., 2011).

On the other hand, phylogenetic analysis have shown that bacterial SET domain genes have undergone an evolution of their own, unrelated to the evolution of the eukaryotic SET domain genes (Alvarez-Venegas et al., 2007; Murata et al., 2007). This can be substantiated, for example, by performing a BLASTP search at NCBI, with any SET-domain. In all "distance trees" produced using BLAST pair-wise alignments, the Newick dendrograms (as well as any phylogenetic tree) produced at NCBI show all eukaryotic entries clustered as a monophyletic group. Important to our discussion is that eukaryotic SET proteins do not mix together with SET proteins of bacterial origin and that a similar distribution pattern is continuously reproduced with different combinations of prokaryotic or eukaryotic entries (data not shown). In addition, the branching arrangement of the respective monophylies of Archaea, Eukarya, and Bacteria rejects any sort of horizontal gene transfer (HGT) involving SET genes from bacteria and vertebrates, although some ambiguous cases may arise in such analyses and could be, for example, a consequence of the poor representation of certain genomes in present-day databases (Stanhope et al., 2001). In contrast, phylogenetic and chromosome analyses of Chlorobium, Bacillus, and Methanosarcinal SET domain-containing species support an ancient HGT between bacteria and Archeae (Alvarez-Venegas et al., 2007).

#### **PATHOGEN AND SYMBIONT SET-DOMAIN CONTAINING PROTEINS**

#### **CHLAMYDIAE SET DOMAIN PROTEINS**

One of the earliest reports on the identification of SET domain proteins in bacteria was the result of the genome sequencing of an obligate intracellular pathogen of humans that targets epithelial cells, *Chlamydia trachomatis* (Stephens et al., 1998). At that time, it was suggested that the SET domain protein CT737 in Chlamydia was the result of HGT based upon the assumption that the SET domain was found only in eukaryotic chromatinassociated proteins (Stephens et al., 1998).

Unlike eukaryotes, prokaryotes do not have histones nor highly ordered chromatin. However, pathogenic bacteria must employ a wide range of tactics to avoid eradication by their host. Then, targeting histone modifications could be one of those strategies that allow a pathogen to inhibit transcriptional activation of host defense genes. With this in mind, Pennini et al. (2010) set out to characterize the chlamydial SET domain protein CT737 (named NUE, as the first "nuclear effector identified in chlamydiae"). They found a type three secretion (TTS) system signal in the N-terminal part of NUE. Also, Pennini and colleagues found that NUE acts as a TTS system effector protein, that it is secreted from bacteria, translocated to the host cell nucleus during *C. trachomatis* infection and associated with chromatin, particularly at late time points of infection. NUE has a histone methyltransferase activity that modified mammalian histones H2B, H3, and H4 but with a stronger activity toward H4 (**Table 1**). NUE itself automethylates, suggesting that automethylation enhances NUE enzymatic activity toward its substrate (Pennini et al., 2010). Consequently, the chlamydial SET domain protein seems to have evolved into a secreted protein capable to modify eukaryotic histones (**Figure 1**). Thus, by using its SET domain protein as an epigenetic control of host cells represents an advantage for Chlamydia in the persistence of chlamydial infection to maintain chronic disease progression.

On the other hand, *Chlamydophila pneumonia*, an obligatory intracellular eubacterium that causes acute respiratory diseases (e.g., pneumonia) and chronic inflammatory processes (e.g., asthma, bronchitis, and chronic obstructive pulmonary disease; Roulis et al., 2013), has also a SET-domain coding gene, recently characterized (NCBI locus BAA99086). The *C. pneumoniae* SET domain protein (named cpnSET by Murata et al., 2007), with a murine histone H3 methyltransferase activity, has a similar expression pattern of two chlamydial histone H1-like proteins, Hc1, and Hc2. In addition, cpnSET has a physical interaction with the Hc1 and Hc2 proteins, as determined by the yeast twohybrid system. Furthermore, Hc1 is also methylated by cpnSET, indicative that cpnSET may play an important role in chlamydial cell maturation due to modification of chlamydial histone H1-like proteins (Hc1 proteins) during the alternate morphologies between elementary bodies (EBs) and reticulate bodies (RBs) of *C. pneumonia* (Murata et al., 2007). However, localization of cpnSET was shown mainly in chlamydial cells (Murata et al., 2007), raising the possibility that the *C. pneumoniae* cpnSET protein is exported into host cells, and then cpnSET and host histones may functionally interact with each other. This is something extremely possible if we consider that the cpnSET protein has the two ("bipartite") potential nuclear localization sequences (RHR at position 123 and KHRKKR at position 208) as those analyzed in the *C. trachomatis* translocated SET domain protein NUE (RRR at position 121 and KHRKKR at position 206; Pennini et al., 2010). Moreover, the chimeric N-terminal sequence of *C. pneumoniae* was secreted when expressed in *Shigella flexneri* ipaB (constitutive type three secretion—TTS-system), something highly indicative that chlamydiae SET domains proteins are TTS effectors (Pennini et al., 2010). Conversely, NUE did not methylate Hc1, as cpnSET did (**Figure 1**) (Murata et al., 2007; Pennini et al., 2010). This is something that deserves further investigation.

#### **PROTEOBACTERIA:** *LEGIONELLA PNEUMOPHILA*

The causative agent of a severe form of pneumonia called Legionnaires' disease, *Legionella pneumophila*, is a Gram-negative intracellular pathogen, ubiquitous in aquatic habitats where it survives and replicates within a wide range of protists, such as amoeba and ciliated protozoa. Upon transmission to humans, *L. pneumophila* invades and replicates in alveolar macrophages, evades the default endosome–lysosome pathway, remodels the phagosome, and escapes into the host cell cytosol. This ends in the expression of various bacterial virulence traits and bacterial escape to the extracellular environment, leading to the disease (Al-Quadan et al., 2012). *L. pneumophila* is a master manipulator of a variety of eukaryotic hosts ranging from unicellular amoebae to mammals. *L. pneumophila* facilitates this by taking control of numerous eukaryotic cellular functions through translocation of around 300 effectors into the host cell by the Dot/Icm type IVB secretion system. More than 70 of the injected effector proteins contain eukaryotic-like domains, including the ankyrin repeat, Fbox, U-box, leucine-rich repeats and SET domain (Price and Abu Kwaik, 2013).

Recently, Rolando et al. (2013) identified one protein encoded by the gene *lpp1683* of the *L. pneumophila* strain Paris that contains a SET domain. It was shown that the Lpp1683 protein (named RomA, for regulator of methylation A) is a histone methyl transferase, with an apparent preference for histone H3. Specifically, RomA tri-methylates lysine 14 of histone H3 (H3K14), on free histones as well as in reconstituted oligonucleosomes. Then, Rolando and colleagues determined that RomA also conserved its activity *in vivo* during infection of human macrophages and amoeba, and that RomA localizes to the host nucleus in which it catalyzes a new histone methylation mark, H3K14. Deletion of a nuclear localization signal (NLS) in its N-terminal part altered the cellular distribution of RomA, leading to a predominant cytosolic localization. Fluorescent-based translocation assays showed that this protein is translocated in a Dot/Icm-dependent manner into the host cell. Next, it was shown that the H3K14 methylation mark, which replaces H3K14 acetylation (an active mark of ongoing transcription at the transcription start sites, TSSs), functions as a transcriptional repressive mark and results in global gene transcriptional repression, particularly in genes that are involved in innate immunity. Furthermore, deletion of RomA leaves the Paris strain defective in intracellular growth within both macrophages and amoebae, indicative that repression of host global transcription is important for *L. pneumophila* pathogenesis.

In contrast, Li et al. (2013) used the *L. pneumophila* Philadelphia-derivative Lp02 strain to characterize the protein LegAS4, a *L. pneumophila* type IV secretion system (TFSS) effector that contains a SET domain and tandem nuclear localization signals (NLS). LegAS4 is efficiently translocated from *L. pneumophila* into host macrophages and when ectopically expressed, LegAS4 was localized exclusively in the nuclei, specifically was localized in the host nucleolus, and associated with rDNA chromatin in which it catalyzes H3K4 dimethylation at the rDNA promoter and promoted rDNA transcription. Furthermore, LegAS4 interacted with host heterochromatin binding proteins HP1α and HP1γ, but extremely weakly with HP1β. The high-affinity HP1 binding is responsible for specific recruitment of LegAS4 to the transcriptionally silent rDNA chromatin region. Thus, *L. pneumophila* might exploit host ribosome activity for its own survival



*Abbreviations: n.d., not determined; H3K14me3, tri-methylated lysine 14 of histone H3; H3K4me2, di-methylated lysine 4 of histone H3.*

representation of Burkholderia, Legionella, Chlamydia, and Bacillus secreted factors involved in the control of gene expression of host 4 of histone H3; H3K9me3: tri-methylated lysine 9 of histone H3; H1K: histone H1 lysine.

advantages by stimulation of rDNA transcription (Li et al., 2013).

Why those two highly homologous effector proteins have such inconsistent phenotypes in two different strains of *L. pneumophila*? This could be the result of the differences in the amino acid sequences of both proteins, as suggested by Price and Abu Kwaik (2013). For example, RomA is missing an aminoterminal fragment present in LegAS4. In addition, there is a weak sequence homology at an eight amino acid stretch almost in the middle of the proteins. These sequence differences could alter the structure of RomA from LegAS4 resulting in a change in histone methyltransferase substrate specificity. Also, LegAS4 has a robust localization to the nucleolus, which is not seen for RomA. The ability of LegAS4 to localize to the nucleolus is dependent on its tandem NLS, whereas RomA has three individual amino acid differences from LegAS4 in its corresponding NLS (Price and Abu Kwaik, 2013). Therefore, the structural differences between LegAS4 and RomA most likely explain the differential nucleolar protein targeting and distinct functional phenotypes of homologous effectors. It will be interesting to determine whether LegAS4 can also catalyze H4K14me3 in the nucleolus, as well as to determine the potential role of HP1 in RomA-mediated genome-wide repression.

Other bacterial pathogens, including *Bordetella bronchiseptica* and *Burkholderia thailandensis*, also possess LegAS4-like HKMTase effectors targeted to the host nucleolus. Specifically, the *B. thailandensis* type III effector BtSET upholds H3K4 methylation of rDNA chromatin and contributed to infection-induced rDNA transcription (Li et al., 2013). Thus, activation of rDNA transcription could be a common virulence strategy employed by bacterial pathogens for intracellular survival.

#### **BACILLUS SET DOMAIN PROTEINS**

*Bacillus anthracis*, the etiologic agent of anthrax disease, is a Gram-positive spore-forming organism found in soil environments. *B. anthracis* spores are taken up by macrophages and/or dendritic cells, and subsequently migrate in the draining lymph nodes where they germinate, leading to bacterial multiplication, and dissemination through the whole organism (Raymond et al., 2009). Respiratory, gastrointestinal, or cutaneous entry of *B. anthracis* spores into mammals can result in a rapid systemic infection and death (mainly, in the pulmonary form and in the absence of treatment). Accordingly, the ability to drive bacterial molecules directly into host cells is a major strategy used by diverse bacterial pathogens to destabilize the host transcriptional machinery and to overcome host defenses. Strategically, most effectors aim to stabilize the NF-κB/IκB (nuclear factorκB/inhibitor of κB) complex for precluding nuclear localization and transcriptional activation of the nuclear factor NF-κB (Mujtaba et al., 2013).

Recently, Mujtaba et al. (2013) characterized a SET-domain protein in *B. anthracis* (named *Ba*SET) in order to determine its role in *B. anthracis* survival in infected hosts. It was shown that *Ba*SET is a specific histone H1 trimethylase that functions as a transcriptional repressor by reducing the activation of NF-κB response elements (NF-κB \_RE) in a dose-dependent expression, as well as by repressing diverse NF-κB target gene promoters. Particularly, *Ba*SET methylates eight lysine residues of the histone H1, which could be the strategy of the bacillus to hypermethylate host chromatin to silence the host inflammatory response. Furthermore, *Ba*SET is secreted by *B. anthracis* and localizes to the nucleus of infected macrophages where it methylates Histone H1, although the mechanism by which *Ba*SET translocates outside the bacillus is unclear, as the *Ba*SET sequence does not display any secretion signal (Mujtaba et al., 2013). Additionally, an engineered *Ba*SET deletion mutant (*Ba*-SET) indicated that *Ba*SET is not involved in the formation and germination of *B. anthracis* heat-resistant endospores, but that it plays a major role in the virulence of *B. anthracis*, as deletion of the gene eliminates the capacity for the organism to cause disease and death as well as survival in the infected host.

It will be interesting to determine the function of SET domain proteins present in other Bacillus, for example: *Bacillus cereus* (a ubiquitous soil organism and an opportunistic human pathogen most commonly associated with food poisoning), and *Bacillus thuringiensis* (an insect pathogen that is widely used as a bio-pesticide) (Han et al., 2006). But even more interesting will be the study of SET domain proteins in bacteria like *Bacillus megaterium* (a commercially available, nonpathogenic host for the biotechnological industry), and *Bacillus coahuilensis*, an ancient and a moderately halophilic, Gram-positive and rod-shaped bacterium, isolated from a Chihuahuan desert lagoon in Cuatro Ciénegas, Coahuila, México (Alcaraz et al., 2008).

#### **ARCHAEAL SET DOMAIN**

On the basis of their molecular properties, Archaea has been defined as a separate domain of life. Archaea lack nuclear membranes and are therefore prokaryotes, however, they are genetically and biochemically as divergent from bacteria as are eukarya. Archaea contain a set of sequence-independent DNAbinding proteins some of which undergo post-translational modifications, similar to the histone modifications in eukaryotic chromatin (Reeve, 2003). Among these archaeal DNA-binding proteins are the so-called histone-like proteins. On the other hand, SET domain encoding genes have also been identified in Archaea (Aravind and Iyer, 2003). Consequently, in order to determine the functional significance of SET domain proteins in Archaea, Manzur and Zhou (2005) characterized the first archaeal SET protein from the acetate-utilizing archaeal methanogen, *Methanosarcina mazei* strain Gö1 (referred as Gö1-SET).

The Gö1-SET protein was shown to selectively methylate *in vitro* bovine histone H4 at Lys5. However, histone H4 is not present in *M. mazei*. Alternatively, *M. mazei* has three DNA interacting proteins: a histone-like protein and two homologous MC1 proteins (methanogen chromosomal 1, MC1-α and MC1-β proteins). In view of that, *in vitro* MTase assays using the three DNA interacting proteins showed that Gö1-SET selectively methylates MC1-α but not the MC1-β and the histone-like protein, and that likely Lys37 of MC1-α is the specific target of Gö1-SET (Manzur and Zhou, 2005). Thus, archaeal SET domain proteins, like Gö1- SET, may regulate structures of archaeal chromatin composed of MC1–DNA complexes and indicate that chromatin modification by methylation took place before the separation of the archaeal and eukaryotic lineages (Manzur and Zhou, 2005).

Recently, a crenarchaeal protein lysine methyltransferase (named aKMT4, or also aKMT), which shows structural and enzymatic similarity to the eukaryotic KMT4/Dot1 family, has been characterized from *Sulfolobus islandicus* (Chu et al., 2012; Niu et al., 2013). However, such protein does not have a SETdomain and belongs to the Dot 1 family of histone lysine methyltransferases (KMTs). Nonetheless, the detection and further characterization of aKMT4/aKMT and protein homologs in other crenarchaeal species will allow a better understanding of the mechanisms involved in lysine methylation in crenarchaea and will clarify the evolutionary relationships among methyltransferases from the three domains of life.

#### **SYMBIONTS**

Most of the research related to complex interactions between eukaryotes and bacteria has been related to associations with microbes that are pathogenic; usually, microbial infection has been considered as deleterious, or at best irrelevant, to vigor and reproduction. However, symbionts and their metabolic potential play essential roles for many eukaryotic organisms that may benefit from enhanced fitness, survival, and even acquired virulence. Symbiosis is ubiquitous in terrestrial, freshwater, and marine ecosystems and it has played a crucial role in the appearance of major life forms on Earth and in the generation of biological diversity (Moran, 2006). Frequently, the associations are persistent for the hosts and are being transmitted vertically across generations. At times, the organisms involved in a symbiosis may be fully fused that they cannot live separately or be recognized as distinct entities without close scrutiny.

Recently, Partida-Martinez et al. (2007a,b) reported a unique symbiosis between bacteria belonging to the genus Burkholderia and the saprotrophic fungus *Rhizopus microsporus*. They have found that *Burkholderia rhizoxinica* is an intracellular symbiont of the phytopathogenic fungus *Rhizopus microspores* and that *B. rhizoxinica* is the producer of rhizoxin, the causative agent of rice seedling blight. This symbiosis represents an extraordinary example in which a fungus hosts a bacterial population for the production of a virulence factor. On the other hand, studies on the evolution of host resistance indicate that the fungus lost its ability to sporulate independently and became totally dependent on endobacteria for reproduction through spores (fungus formation of sporangia and spores is restored only upon reintroduction of endobacteria), thus warranting the persistence of the symbiosis and its efficient distribution through vegetative spores (Partida-Martinez et al., 2007a). Therefore, reproduction of the host is totally dependent on endofungal bacteria, which in return provides a highly potent toxin for defending the habitat and accessing nutrients from decaying plants.

Interestingly, *B. rhizoxinica* has a gene coding for a SETdomain protein (locus YP\_004027789). However, the function of this SET protein is still unknown. On the other hand, *B. rhizoxinica* has a type II secretion pathway, an encoded type III secretion system shown to play a crucial role for the establishment of the symbiosis, and a putative type IV secretion system (Lackner et al., 2011). It is tempting to speculate that the SETdomain protein functions as a type III or type IV secretion system effector (just like SET proteins present in other proteobacteria), and that it might be translocated from *B. rhizoxinica* into the fungus *R. microspores*. The question is, for what purpose? If we take into consideration that a microbe that forms chronic infections in a host or in a host lineage may evolve to conserve or even to benefit its host, as this will help to maintain its immediate ecological resource (Moran, 2006), then we can only hypothesize that the bacterial SET protein might complement the array of posttranslational modifications of histones in the fungus in order to improve bacterial intracellular replication through regulation of its host gene expression, and most important for the fungus, to formation of sporangia and spores. This is something that has to be scrutinized in the near future.

#### **CONCLUDING REMARKS**

Pathogenic or symbiotic bacteria make use of eukaryotic cell functions via specific interactions between microbial surface factors or secreted molecules and eukaryotic targets. These interactions, in turn, affect multiple signaling pathways and consequently promote a wide range of effects in host cells, including altered production of components involved in immune responses. Thus, as expected, bacteria employ a variety of strategies to affect the host cell cycle and gene expression program for their own benefit.

Recent studies have found that microbes affect a diverse set of epigenetic factors such as DNA methylation, histone modifications, chromatin-associated complexes, and non-coding RNAs (ncRNAs) to alter chromatin structure and gene expression. Intriguingly, DNA methylation plays a critical role in epigenetic gene regulation in eukaryotes as well as in prokaryotes (Kumar and Rao, 2013); Argonaute proteins, key players in RNA interference (RNAi) and related gene silencing phenomena in diverse eukaryotic species, are also present in many bacterial and archaeal species (Makarova et al., 2009); and post-translational modifications (PTMs) of proteins (e.g., histone and histone-like protein modifications) are used by both prokaryotic and eukaryotic cells to regulate the activity of key proteins. Thus, epigenetic mechanisms are clearly implicated in modulating biological interactions between hosts and bacterial pathogens and symbionts.

Chromatin modifications during animal development and in response to diverse environmental factors contribute to adult phenotypic variability and susceptibility to a number of diseases, including cancer, neurodegenerative, and neurological diseases and autoimmune disorders (Portela and Esteller, 2010). Because epigenetic modifications of chromatin may be transmitted to daughter cells during cell division, leading to heritable changes in gene expression, it is likely that a bacterial infection could generate heritable marks (Bierne et al., 2012). Therefore, understanding whether histone modification and/or DNA methylation marks imposed by bacterial proteins are maintained over time is a whole new area of research.

In plants, heritable histone modifications (e.g., histone lysine methylation) reported to regulate plant immunity against bacterial pathogens, by plant endogenous SET-domain proteins, have been associated for instance to the phenomenon of "priming" (a potentiated induction of defense genes and antimicrobial compounds for protection against biotic and/or abiotic stress). For example, Jaskiewicz et al. (2011) have demonstrated that during the interaction *Arabidopsis thaliana*-*Pseudomonas syringae* pv. *maculicola*, histone modifications on WRKY gene promoters have been detected in leaves distal to localized foliar infection for an augmented response to secondary stress. Thus, pathogen exposure induces one or more systemic signals that are stored on gene promoters in remote leaves in the form of histone modifications, mainly in the form of trimethylation of Lys 4 on histone H3 (H3K4me3) (Jaskiewicz et al., 2011). In another Arabidopsis study, *Pseudomonas syringae* pv *tomato* DC3000 (*Ps*tDC3000) inoculation increased resistance in subsequent generations. Progeny from *Ps*tDC3000-inoculated Arabidopsis were primed to activate salicylic acid-inducible defense genes and were more resistant to PstDC3000 (Luna et al., 2012). This transgenerational systemic acquired resistance indicates an epigenetic basis of the phenomenon.

However, neither *P. s.* pv. *maculicola* nor *Ps*tDC3000 have SET-domain coding genes. Thus, epigenetic changes can also contribute to and/or result from bacterial infectious diseases. Consequently, many open questions remain. For example, we should ask if events like priming take place only in plants that are being attacked by pathogens that do not have SET domain proteins, or if all bacterial pathogens induce priming. How immune priming induced protection does takes place in animals infected by pathogenic bacteria containing SET-domain proteins? What's more, it is essential to know how histone modifications might contribute to host response to infection and/or if bacteria take control of histone modifications to drive a transcriptional program beneficial for infection. Also, it is imperative to determine if SET domain proteins present in all pathogenic bacteria with a secretion system are secreted from bacteria, translocated to the host cell nucleus during infection and if they associate with chromatin. With respect to symbionts, do bacterial SET proteins complement the collection of post-translational modifications of their hosts to facilitate bacterial intracellular replication through regulation of its host gene expression? How do hosts benefit from bacteria containing SET-domain proteins in a symbiotic relationship? What group of bacterial SET-domain proteins has evolved into a chromatin-related role similar to their eukaryotic counterparts? How many new histone modifications are performed by bacterial SET-domain proteins, like RomA?

All these questions open new opportunities for future research in the subject of bacterial pathogenesis and chromatin-based regulation of host genes and may help to better understand the pathophysiology of bacterial infections and to develop efficient therapeutic approaches to treat important diseases, as well as to increase crop productivity.

#### **ACKNOWLEDGMENTS**

This work was funded by CONACYT, grant CB2011-167693 to Raúl Alvarez-Venegas.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene. 2014.00065/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 February 2014; paper pending published: 06 March 2014; accepted: 14 March 2014; published online: 02 April 2014.*

*Citation: Alvarez-Venegas R (2014) Bacterial SET domain proteins and their role in eukaryotic chromatin modification. Front. Genet. 5:65. doi: 10.3389/fgene.2014.00065 This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Alvarez-Venegas. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Epstein–Barr virus–host cell interactions: an epigenetic dialog?

#### **Hans H. Niller <sup>1</sup>\*, Kalman Szenthe<sup>2</sup> and Janos Minarovits <sup>3</sup>**

1 Institute of Medical Microbiology and Hygiene, University of Regensburg, Regensburg, Germany

<sup>2</sup> RT-Europe Nonprofit Research Ltd, Mosonmagyaróvár, Hungary

<sup>3</sup> Department of Oral Biology and Experimental Dental Research, Faculty of Dentistry, University of Szeged, Szeged, Hungary

#### **Edited by:**

Ilaria Negri, University of Turin, Italy

#### **Reviewed by:**

Ian C. G. Weaver, Dalhousie University, Canada Nejat Dalay, Istanbul University Oncology Institute, Turkey

#### **\*Correspondence:**

Hans H. Niller, Institute of Medical Microbiology and Hygiene, University of Regensburg, Franz-Josef-Strauss Allee 11, D-93053 Regensburg, Germany e-mail: hans-helmut.niller@klinik. uni-regensburg.de

Here, we wish to highlight the genetic exchange and epigenetic interactions between Epstein–Barr virus (EBV) and its host. EBV is associated with diverse lymphoid and epithelial malignancies. Their molecular pathogenesis is accompanied by epigenetic alterations which are distinct for each of them. While lymphoblastoid cell lines derived from B cells transformed by EBV in vitro are characterized by a massive demethylation and euchromatinization of the viral and cellular genomes, the primarily malignant lymphoid tumor Burkitt's lymphoma and the epithelial tumors nasopharyngeal carcinoma and EBVassociated gastric carcinoma are characterized by hypermethylation of a multitude of cellular tumor suppressor gene loci and of the viral genomes. In some cases, the viral latency and oncoproteins including the latent membrane proteins LMP1 and LMP2A and several nuclear antigens affect the level of cellular DNA methyltransferases or interact with the histone modifying machinery. Specific molecular mechanisms of the epigenetic dialog between virus and host cell remain to be elucidated.

**Keywords: chromatin, CpG island, epigenome, viral latency, lytic cycle, pioneer transcription factor, tumor suppressor gene, tumor virus**

#### **EPSTEIN–BARR VIRUS—THE FIRST HUMAN TUMOR VIRUS**

Epstein–Barr virus (EBV), the proto-typical gamma-herpesvirus infecting humans, has been discovered 50 years ago in cultured Burkitt's lymphoma (BL) cells (Epstein et al., 1964). EBV physiologically homes to memory B cells, a property which is reflected in the genus name *Lymphocryptovirus* for gamma-herpesviruses which "hide in lymphoid cells." EBV is the causative agent of infectious mononucleosis, and was the first known human tumor virus. In addition to BL, an endemic childhood tumor mainly of equatorial Africa, EBV plays a role in the origin or progression of other primarily malignant B cell tumors, such as Hodgkin lymphoma, and diverse AIDS-associated lymphomas including lymphomas of the central nervous system. A profound failure of T cell surveillance may allow early-onset post-transplant lymphoproliferative disorder (PTLD) which may be overcome upon a timely onset of T cell control, but may turn malignant if growing uncontrolled for too long a time span.

In the first decade of EBV research, EBV was considered a purely lymphotropic virus. Finding viral DNA in cellular DNA from biopsies of anaplastic carcinomas of the nasopharynx (NPC, nasopharyngeal carcinoma) by DNA hybridization did not change that general view, because NPC as a lymphoepithelial tumor contains a great many of infiltrating lymphocytes (zur Hausen et al., 1970). Localizing the virus specifically to the malignant epithelial cells first established EBV infection of non-lymphatic cells and paved the way for the novel concept of EBV as an epithelial tumor virus (Wolf et al., 1973). NPC is an endemic tumor with a strong preference for South East Asia, especially Guangdong and Hong Kong, with an incidence rate of 20– 30 cases per 100,000 persons per year, and virtually 100% of non-keratinizing and undifferentiated NPCs are EBV-associated. Almost two decades later on, the association of EBV with gastric carcinomas (EBVaGC) was established, too (Burke et al., 1990; Shibata et al., 1991). Contrary to the endemic tumor NPC, about 10% of the worldwide sporadic tumor gastric carcinoma (GC) are associated with EBV infection. Among gastric remnant carcinomas approximately 30% and among lymphoepithelioma-like GCs, approximately 80% are EBV-associated. Altogether, EBVaGC with an estimated more than 80,000 cases per year is probably the most frequent EBV-associated malignancy worldwide (reviewed in Niller et al., 2014a,b). Beyond the B cell lineage, EBV infection is currently associated with T cell lymphomas, epithelial tumors, and rarely with leiomyosarcoma, a neoplasm of mesodermal origin (McClain et al., 1995). Furthermore, the risk of autoimmune disease, including multiple sclerosis, is significantly increased after primary EBV-infection, and even more so after symptomatic primary infection, i.e., mononucleosis (Niller et al., 2008).

#### **PATHOGENESIS OF EBV-ASSOCIATED TUMORS**

The spectacular ability of EBV to transform and immortalize B cells dominated the first four decades of EBV research and tumor virology. General view was that the EBV-transformed cell was the origin of the endemic BL cell, too, although a fundamental difference of the epidemiology and pathogenesis between lymphoblastoid cell line (LCL)-like tumors on one side (early onset PTLD), and of primarily malignant EBV-associated lymphomas

**Table 1 | Host cell-dependent expression of latent Epstein–Barr virus proteins.**


Key viral latency proteins or so-called oncoproteins are differentially expressed in EBV-associated malignancies and transformed cell lines. Depending on the specific viral gene expression patterns, specific viral latency classes are defined. BL, Burkitt's lymphoma; NPC, nasopharyngeal carcinoma; CLL, chronic lymphocytic leukemia; LCL, lymphoblastoid cell line; EBNA, Epstein–Barr virus nuclear antigen; LMP, latent membrane protein; BARF1, BamHI A rightward frame 1.

on the other side (endemic BL) became evident. Early onset PTLD originate under conditions of severe immune suppression and depend on viral transforming functions, including EBV nuclear antigens (EBNAs) and latent membrane proteins (LMPs) that are expressed both in PTLDs *in vivo* and in LCLs immortalized *in vitro*. On the other hand, BL and Hodgkin lymphoma originate under conditions of hyperstimulation of the lymphoid germinal center reaction, and they do not depend on EBNA2 which the vast majority of them do not express (Klein, 1987; Lenoir and Bornkamm, 1987; **Table 1**). In this context, it is important to distinguish between morphological and oncogenic transformation (Niller et al., 2011). Our discovery of a binding site for the oncoprotein c-Myc in the central locus control region of the EBV genome suggested that the molecular pathogenesis of endemic BL does not depend on a previous EBNA2 transformed state of the B cell, but mostly on a dysbalance of pro- and anti-apoptotic functions in consequence of Myctranslocation, a molecular accident in a virus-infected B cell undergoing the germinal center reaction (Niller et al., 2003). The need to counter-balance the pro-apoptotic force of translocated *c-Myc* through anti-apoptotic functions, either encoded by the viral genome or induced by virus infection, in order for a BL to emerge has recently been re-emphasized (Mbulaiteye, 2013; Westhoff Smith and Sugden, 2013; Rickinson, 2014). Our differential pathogenesis model for EBV-associated lymphomas was controversial at first (Rossi and Bonetti, 2004; Thorley-Lawson, 2004; reviewed in Niller et al., 2012), but has now gained strong support by recent large-scale epigenomic analyses of LCLs and tumor cells (see below). Thus, finding the binding site for the oncoprotein c-Myc in the locus control region of EBV caused a conceptual shift away from the morphologically transformed cell and has turned out as a heuristic discovery (Niller et al., 2003).

#### **GENETIC EXCHANGE BETWEEN HERPESVIRUSES AND HOST CELLS**

Homology between herpesviral and human genes is now a common theme which was first highlighted by the finding of a gene for a functional thymidylate synthase (TS) in the genome of herpesvirus saimiri with an extremely high homology of 70% identical amino acids with the human *TS* gene. Various parameters suggested that the *TS* gene had been acquired in virus evolution by an ancestral herpesvirus from the cellular genome (Bodemer et al., 1986; Honess et al., 1986). The exchange of human and viral genes, and in the case of human herpesvirus 6 (HHV-6) the invasion of an entire herpesviral genome into the human germ line (Daibata et al., 1999) in about 0.8% of humans, must have happened on numerous occasions in evolutionary time. In the case of EBV, the intimate evolutionary relationship of virus and host cell is emphasized by the presence of several viral genes with sequence homology to cellular genes, i.e., *BHRF1* and *BALF1*, two anti-apoptotic *BCL2* homologs, *BILF1*, coding for a constitutively active G protein-coupled receptor (GPCR) homolog, and *BCRF1*, an *IL-10* (*interleukin 10*) gene homolog. The BCRF1 protein appears to be a functional homolog of IL-10, an immune suppressive cytokine secreted by Th2 cells with a sequence identity of about 70% (Hsu et al., 1990). The sequence homologies of the other three viral peptides are of lower degree than that. BHRF1 carries a 25% identity of a 150 amino acid Cterminal portion with the anti-apoptotic cellular protein BCL2 (Cleary et al., 1986). BALF1 shows homology at a similar degree in functionally important domains to its cellular counterparts BCL2 and BCLX (Marshall et al., 1999). BILF1 carries a homology of around 20% with the human chemokine receptor CXCR3A (Davis-Poynter and Farrell, 1996; Beisser et al., 2005; Paulsen et al., 2005). EBV-associated carcinoma cells express BARF1, identified initially as a lytic cycle protein that binds to hCSF-1 (human colony stimulating factor 1) as a viral decoy receptor, although its crystal structure is most closely related to CD80, a co-stimulatory molecule of antigen presenting cells (Seto et al., 2005; Tarbouriech et al., 2006; Elegheert et al., 2012).

Herpesviral DNA-binding proteins ICP8 of HSV1 and BALF2 of EBV, and additional homologous proteins of human herpesviruses which are required for viral replication belong to a class of "DDE/RNase H-like fold-family" nucleases, together with the recombination activating gene (RAG) 1 protein, essential for V(D)J recombination, and the Argonaute protein of the RNAinduced silencing complex (RISC). For ICP8 of HSV, divalent cation binding of the DDE-site was actually shown to be functional and required for viral replication (Bryant et al., 2012). Furthermore, inhibitors of HIV integrase, another RNase H-fold protein, inhibited the replication of viruses from all herpesvirus genera, too (Yan et al., 2014). Based on a co-regulatory transcriptional network for both *RAG-1* and *RAG-2*, and the genes for herpesviral DNA-binding proteins, and based on signature sequence homologies between *V(D)J* recombination sites and the viral terminal repeats, an evolutionary relationship between the RAG recombinase and herpesviral DNA-binding proteins was proposed. The *RAG* locus may have originally been introduced to host cells by a primordial herpesvirus (Dreyfus, 2009). We found a striking co-linearity of structural and functional elements between the cellular immunoglobulin gene loci and the left part of the EBV genome. Therefore, although speculative, we agree with the view that, in the case of the *RAG* genes, the appearance of the adaptive immune system may have been dependent on a primordial herpesvirus genome and, in the case of the B cell, may have developed further as consequence of an evolutionary pingpong game between EBV and the host cell (Niller et al., 2004).

#### **EPIGENETIC INTERACTIONS BETWEEN EBV AND ITS HOST CELLS**

Epstein–Barr virus infects both B lymphocytes and epithelial cells *in vivo*. It enters B cells after binding to CD21, a cell surface molecule absent from epithelial cells. However, EBV-infected B cells are capable to transfer the virus to epithelial cells lacking the EBV receptor. Both B cells and epithelial cells can support productive (lytic) EBV replication when all of the proteins and non-translated RNAs encoded by the viral genome are expressed in a sequential order. EBV also causes latent infections, typically in resting memory B cells and in various neoplastic cells that usually carry circular, double stranded viral genomes. During latency, only a restricted set of EBV promoters is active. The activity of latent EBV promoters depends on the phenotype of host cells, and it is controlled by the epigenetic regulatory machinery. Based on the epigenetic marks deposited on the viral chromatin by the cellular epigenetic machinery one can distinguish between viral epigenotypes that are associated with unique patterns of viral gene expression (reviewed in Minarovits, 2006). In parallel, certain latent EBV proteins characteristic for the major cell types carrying latent EBV genomes act as epigenetic regulators themselves: they alter the cellular epigenotype and gene expression pattern and may contribute to the development of malignant tumors. Thus, the situation is similar to an "epigenetic dialog," indeed. Typical examples of EBV latency types are summarized in **Table 1**, based on the nomenclature suggested by Klein et al. (2013; see also Laytragoon-Lewin et al., 1995; Niller et al., 2012).

#### **EPIGENETIC CONTROL OF EBV LATENCY PROMOTERS**

The epigenetic regulatory mechanisms of host cells not only control the preservation of cell type-specific gene expression patterns from cell generation to cell generation, but ensure the maintenance of host cell-dependent usage of latent EBV promoters as well. Epigenetic regulation is based on writing, reading, and erasing epigenetic marks on chromatin as well as on protein–DNA interactions that are stable even in mitotic, highly condensed chromatin (reviewed in Gopalakrishnan et al., 2008; Zaret et al., 2008; Sharma et al., 2010; Blomen and Boonstra, 2011). Euchromatic marks favor transcription whereas heterochromatic marks are associated with a more condensed chromatin structure that usually represses promoter activity.

DNA methylation is involved in silencing of most latent EBV promoters. It is well documented that the alternative promoters Wp and Cp, where transcripts coding for six EBNAs are initiated can be switched off by CpG methylation. In addition, LMP1p and LMP2Ap, the promoters for LMP1 and LMP2A transcripts are silenced by the activity of cellular DNA methyltransferases, too. DNA methylation does not play a role, however, in switching off Qp, an alternative promoter for EBNA1 transcripts (reviewed in Li and Minarovits, 2003; Niller et al., 2012). One may speculate, that EBNA2, the major transactivator protein encoded by the EBV genome that interacts with both histone acetyltransferases and histone deacetylases (reviewed in Niller et al., 2009) may activate key cellular genes mediating genome-wide demethylation in LCLs. Acetylation of histone H3 and histone H4 molecules and di- or trimethylation of lysine 4 of histone H3 (H3K4me2 or H3K4me3) are euchromatic marks frequently associated with active Cp, Qp, LMP1p, and LMP2Ap (reviewed in Niller et al., 2012; Arvey et al., 2013). In principle, complexes formed by Polycomb and Trithorax group proteins that modify histone tails could also play a role in the regulation of latent EBV promoters but there are no data supporting such a mechanism. In contrast, Polycomb group protein EZH2 was observed to leave a heterochromatic histone mark (H3K27me3) at early lytic promoters of EBV that are silent during latency, and chromatin immunoprecipitation proved the association of the EZH2 methyltransferase with this class of viral promoters in the BL cell line Raji (Woellmer et al., 2012). The immediateearly promoter Zp, where transcripts for the lytic cycle initiating BZLF1 protein are initiated was also found to be repressed by H3K27me3 but also by H4K20me3 in Raji cells (Murata et al., 2012).

Pioneer transcription factors and variant histone molecules that bind to repressive chromatin areas and mark the genes to be activated have not been implicated in the control of EBV latency. Chromatin loops formed by binding of insulator proteins to EBV episomes may play a role, however, in the epigenetic regulation of the latency promoters Cp and Qp (Tempera et al., 2011).

#### **EBV-ENCODED ONCOPROTEINS AS EPIGENETIC REGULATORS**

The methylation patterns of NPC cells and EBVaGC cells differ from their normal counterparts: these neoplastic cells regularly carry hypermethylated genomic regions with silenced cellular promoters (Iizasa et al., 2012; Lo et al., 2012). Focal hypermethylations frequently inactivate tumor suppressor genes and may contribute to carcinogenesis and tumor progression. Because the EBV-encoded transmembrane proteins LMP1 and LMP2A are capable of upregulating the cellular DNA methyltransferases DNMT1, DNMT3A, and DNMT3B, it was suggested that hypermethylation of CpG rich sequences, the so called CpG islands, is mediated by LMP1 or LMP2A in EBV-associated carcinomas (reviewed in Niller et al., 2012). LMP1 and LMP2A are expressed in EBV positive Hodgkin lymphomas as well. Thus, they may contribute to gene silencing in these neoplasms, too.

It is worthy to note that, similarly to the EBV-associated carcinomas, LCLs established by *in vitro* EBV-infection of B cells also express LMP1 and LMP2A that could potentially upregulate DNA methyltransferases. It was observed, however, that the typical epigenetic change in LCLs is a widespread demethylation of the B cell methylome affecting one third of all cellular genes and 2.18 GB of the genome (Hansen et al., 2014). The mechanism of demethylation remains to be established. Furthermore, the EBV episomes carried by LCLs are also hypomethylated, in contrast to the overall hypermethylation of EBV genomes in latency type I BL lines, BL biopsies, and EBV-associated carcinomas (Minarovits et al., 1991; Fernandez et al., 2009). The viral oncoproteins EBNA3A and EBNA3C expressed during *in vitro* immortalization of B cells silence distinct cellular tumor suppressor genes by depositing a heterochromatic histone mark via the Polycomb repressor complex PCR2 (reviewed in Allday, 2013). The dominant change in EBV immortalized B cells seems to be, however, a genome-wide decrease and redistribution of heterochromatic marks (Hernando et al., 2014). The EBV latency products eliciting the reprograming of the host cell epigenome in LCLs remain to be elucidated. Contrary to LCLs, primarily malignant lymphomas, i.e., BL and other lymphomas, are characterized not by a massive and wide-spread hypomethylation, but by a local hypermethylation of selected genomic loci (Martin-Subero et al., 2009a,b; Kreck et al., 2013; reviewed in Niller et al., 2014b).

In contrast to BL tumors, in which hypermethylated loci are strongly enriched for polycomb repressive complex (PRC) target genes of embryonic stem cells, hypermethylated genes in EBVaGC are not enriched for PRC targets (Martin-Subero et al., 2009a; Matsusaka et al., 2011). Contrary to *Helicobacter pylori*-associated GC, EBVaGC does not emerge from an "epigenetic field" in the gastric mucosa (Matsusaka et al., 2011; Niller et al., 2014a). Thus, EBV-associated epigenetic changes may quite quickly set the stage for malignancy (Au et al., 2005; Niller et al., 2014a). Notably, even transient EBV infection of epithelial cells leaves permanent epigenetic scars indicating past infection (Queen et al., 2013; Birdwell et al., 2014).

Although there are no data as to the interaction of host cell-encoded pioneer transcription factors and the EBV genome, certain viral proteins that bind to both viral and cellular DNA and remain associated with mitotic chromosomes may act as pioneer transcription factors. EBNA1, a nuclear protein expressed in all EBV latency types is a putative pioneer factor functionally resembling transcription factors of the FoxA family that act as epigenetic regulators controlling important developmental processes (Niller et al., 2012). EBNA1 was shown to bind to different sets of cellular promoters in cell lines of epithelial and lymphoid origin, and upregulation as well as downregulation of distinct gene batteries was observed (Canaan et al., 2009). In the BL cell line Raji, EBNA1 was shown to interact with a large number of cellular genes as well as LINE elements, and high affinity EBNA-binding sites were observed in a repetitive sequence in chromosome 11 (Lu et al., 2010).

BZLF1, an immediate-early EBV protein initiating productive viral replication that preferentially binds methylated DNA sequences was also suggested to act as a pioneer transcription factor (Woellmer et al., 2012). BZLF1 is expressed not only during the lytic cycle but transiently also during the establishment of latent infection of B cells *in vitro* (Kalla and Hammerschmidt, 2012). Because BZLF1 can bind to cellular promoters (Lan et al., 2013) and elicit epigenetic alterations (Woellmer et al., 2012), an "epigenetic dialog" between the latent EBV genomes and the host cell genome may occur, indeed: transient BZLF1 expression may change the cellular epigenotype followed by silencing of the BZLF1 promoter through the cellular epigenetic machinery. In parallel, EBV-encoded epigenetic regulators may leave their marks on the cellular epigenotype in a next phase of EBV-mediated B cell immortalization.

In conclusion, on an evolutionary time scale a genetic exchange between herpesviruses and their hosts is evident. Beginning with the early steps of viral infection, epigenetic interactions between virus and host cell are taking place. The multi-tiered epigenetic dialog between EBV and its host needs to be elucidated in greater molecular detail in order to understand the diverse outcomes of infection.

#### **REFERENCES**


Epstein–Barr virus-induced B-cell immortalization. *Genome Res.* 24, 177–184. doi: 10.1101/gr.157743.113


with intense lymphoid infiltration. Lymphoepithelioma-like carcinoma. *Am. J. Pathol.* 139, 469–474.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 August 2014; accepted: 02 October 2014; published online: October 2014. 21*

*Citation: Niller HH, Szenthe K and Minarovits J (2014) Epstein–Barr virus–host cell interactions: an epigenetic dialog? Front. Genet. 5:367. doi: 10.3389/fgene.2014.00367 This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Niller, Szenthe and Minarovits. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Epigenetic control of mobile DNA as an interface between experience and genome change

#### *James A. Shapiro\**

*Department of Biochemistry and Molecular Biology, University of Chicago, Chicago, IL, USA*

#### *Edited by:*

*Ilaria Negri, University of Turin, Italy*

#### *Reviewed by:*

*Jennifer Cropley, Victor Chang Cardiac Research Insitute, Australia Mauro Mandrioli, University of Modena and Reggio Emilia, Italy*

#### *\*Correspondence:*

*James A. Shapiro, Department of Biochemistry and Molecular Biology, University of Chicago, GCIS W123B, 979 E. 57th Street, Chicago, IL 60637, USA e-mail: jsha@uchicago.edu*

**INTRODUCTION**

Understanding the functional organization of the genome and its evolutionary history are key goals of modern molecular biology. The task has become more interesting and complex as we learn more the details of cell processes involving the genome. Recent applications of high resolution technologies to genome expression in animals reveal a dynamic four-dimensional interactive control architecture incompatible with prior notions that genomes contain discrete functional segments of DNA ("genes") (Mercer and Mattick, 2013). This review will focus on the role of epigenetic regulation of viruses and mobile genetic elements as a key interface between the activities of these agents of evolutionary change and inputs from cell and organism life histories. The hypothesis developed as a result of the review is that disruption of epigenetic silencing constitutes a major target for life history activation of cellular functions for genome change. This likely occurs after genome replication, possibly by changes in small non-coding (snc) RNAs, typically on the order of 20–50 nucleotides long.

#### **MOBILE DNA IS A MAJOR AND FUNCTIONALLY SIGNIFICANT COMPONENT OF GENOMES**

One of the major surprises to come from the initial sequencing of the human genome was the high abundance of dispersed mobile repeat elements (Consortium, 2001). Today, we estimate that at least two-thirds of our genomes is composed of mobile DNA (De Koning et al., 2011). The human genome is not exceptional in its high content of mobile DNA (http://shapiro*.*bsd*.*uchicago*.*edu/ TableII*.*1*.*shtml).

We increasingly recognize that viruses contribute to cell genomes (Kokosar and Kordis, 2013). They provide sequences for non-coding ncRNAs (Frias-Lasserre, 2012), sites for transcriptional control (Peaston et al., 2004; Dunn et al., 2005; Maksakova et al., 2006; Conley et al., 2008), and elements important in epigenetic regulation (Brunmeir et al., 2010; Conley and Jordan, 2012).

Mobile DNA in the genome is subject to RNA-targeted epigenetic control. This control regulates the activity of transposons, retrotransposons and genomic proviruses. Many different life history experiences alter the activities of mobile DNA and the expression of genetic loci regulated by nearby insertions. The same experiences induce alterations in epigenetic formatting and lead to trans-generational modifications of genome expression and stability. These observations lead to the hypothesis that epigenetic formatting directed by non-coding RNA provides a molecular interface between life history events and genome alteration.

**Keywords: mutation, evolution, natural genetic engineering, mobile DNA, viruses, mobile genetic elements, non-coding RNA**

> Similar transcriptional and epigenetic regulatory contributions are made by mobile genetic elements (http://shapiro.bsd. uchicago.edu/Table5C-1.MobileElementsFoundtobeExaptedascis-RegulatoryControlSitesinAnimals.html) (Youngson et al., 2005; Kinoshita et al., 2007; Suzuki et al., 2007; Fujimoto et al., 2008; Gehring et al., 2009; Pask et al., 2009; Nakayashiki, 2011).

> Mobile DNA is a major source of novel coding information. One mechanism is the process known as "exonization," when splice signals are utilized in newly inserted DNA segments (http://shapiro.bsd.uchicago.edu/Origin\_of\_New\_Protein\_Domains.html). New coding sequences also form by reverse transcription of processed RNAs and genome insertion of the cDNAs, sometimes producing chimeric fusions with existing exons (http://shapiro.bsd.uchicago.edu/Table 5B. Reports of retrogenes in plant and animal genomes.html) (Long, 2001; Betrán et al., 2002; Fu et al., 2010).

> It is now clear that mobile genetic elements play a key role in establishing and rewiring genomic networks (http:// shapiro.bsd.uchicago.edu/Table5C-1.MobileElementsFoundtobe Exaptedascis-RegulatoryControlSitesinAnimals.html) (Feschotte, 2008; Lindblad-Toh et al., 2011; Lowe et al., 2011; Testori et al., 2012; Kokosar and Kordis, 2013). Moreover, mobile element proliferation is a key factor in the formation of very large genomes (http://shapiro.bsd.uchicago.edu/Genome\_Size.html).

> The potential functional importance of distributed mobile DNA in genomes grows rapidly as evidence accumulates for pervasive genome transcription (http://shapiro*.*bsd*.*uchicago*.* edu/PervasiveGenomeTranscription*.*html) and for the regulatory role of non-coding RNAs (ncRNAs) in genome expression, including the functional juxtaposition of distant genome regions to activate transcription (http://shapiro*.*bsd*.*uchicago*.* edu/NonCodingRNAinGenomeExpression*.*html). Mobile elements participate in this long-range genomic communication and provide the sequences of many ncRNAs (Kapusta et al., 2013).

#### **CELLS USE RNA-TARGETED EPIGENETIC CONTROL TO INHIBIT THE ACTIVITY OF MOBILE DNA**

Given the high content of mobile DNA in many genomes, an important question is: what prevents all the mobility systems from destroying genome integrity? In eukaryotic cells, a major control mechanism is sncRNA-directed epigenetic formatting into silent chromatin (Law and Jacobsen, 2010; Castel and Martienssen, 2013).

Both prokaryotes and eukaryotes have systems for capturing fragments from invading DNA molecules and placing the fragments into special loci encoding sncRNAs (Dumesic and Madhani, 2014). In prokaryotes, these loci are called CRISPRs (clustered regular interspersed palindromic repeats) (http:// shapiro*.*bsd*.*uchicago*.*edu/CRISPRs*.*html) (Marraffini and Sontheimer, 2010; Garrett et al., 2011; Bikard and Marraffini, 2013; Watanabe et al., 2013). The RNA transcripts from CRISPRs are processed into sncRNAs that target cleavage of homologous invading DNA and also inactivation of complementary mRNA (Djordjevic et al., 2012). The details of the RNA processing and interference activities are well-characterized, but the acquisition of DNA fragments is poorly understood. The process must be very rapid, because viral infection yields cells that survive the initial infection with appropriate fragments added to their CRISPR repertoire (Barrangou et al., 2007).

Virtually all eukaryotes investigated, with the notable exception of budding yeast, have mechanisms for sncRNA-directed chromatin silencing. They are based on members of the Argonaute family of sncRNA-processing proteins (http://shapiro*.* bsd*.*uchicago*.*edu/microRNA-directedchromatinsilencing*.*html). Plants and animals have independently evolved distinct mechanisms of processing the sncRNAs for the Argonaute family systems, but both groups use targeted epigenetic regulatory processes to defend against virus infection (Ding and Voinnet, 2007; Csorba et al., 2009) and prevent genome instability (**Table 1**). Like prokaryotes, *Drosophila* has specific genomic loci where it acquires fragments of invading DNA to encode the targeting sncRNAs (Brennecke et al., 2007, 2008; Handler et al., 2013).

#### **LIFE HISTORY EVENTS DESTABILIZE GENOMES AND ACTIVATE MOBILE DNA**

Anyone who has studied real-time genome changes quantitatively knows that mutation frequencies depend upon the treatment of the experimental organism prior to measurement. A wide variety of life history events influence the natural genetic engineering (NGE) functions that generate mutations, especially mobile elements (**Table 2**; Shapiro, 2011). In some cases, the genome instabilities are large scale and last multiple cell or organismal generations.

Many observations demonstrate responses of the circuits controlling NGE functions to biological and abiotic inputs. It is particularly significant that many such responses occur following exceptional cell interactions with viruses or with other cells, either by infection or by hybridization (**Table 2**). As we might expect, the introduction of alien DNA or chromatin into a cell often has disruptive effects on genome homeostasis (Shapiro, 2014).

**Table 1 | Genome immunity by sncRNA targeting of mobile DNA (see also http://shapiro.bsd.uchicago.edu/TableII.9.shtml for earlier references).**


#### **EPIGENETIC CHANGES IN RESPONSE TO LIFE HISTORY EVENTS**

One of the most active research areas in the second decade of the 21st century is analyzing the impact of life history events on the epigenetic layers of cell regulatory architecture (**Table 3**) (Chinnusamy and Zhu, 2009a; Vandegehuchte and Janssen,



2013). The observed epigenetic responses include alterations to cytosine methylation in DNA (Chinnusamy and Zhu, 2009b), histone modifications in nucleosomes, and sncRNAs (Ruiz-Ferrer and Voinnet, 2009; Ng et al., 2012) as well as transgenerational inheritance of complex novel phenotypes (Zucchi et al., 2012), frequently induced by stress (Boyko and Kovalchuk, 2010). The phenomenon of hybrid vigor, or heterosis, increasingly is viewed as an alteration in sncRNA-targeted epigenetic formatting stimulated by the encounter of two distinct genome control regimes (Groszmann et al., 2011; Miller et al., 2012; Shivaprasad et al., 2012).

Many of the studies demonstrating induced epigenetic modifications also document accompanying genome instabilities and emphasize their evolutionary potential (Madlung and Wendel, 2013). It is noteworthy that many of the same stimuli are involved in both genomic and epigenomic responses in plants (Hegarty et al., 2013) and animals (Arkhipova and Rodriguez, 2013). The common stimuli include infection and symbiosis (Hamon and Cossart, 2008; Bierne et al., 2012; Takahashi, 2014), hybridization and changes in ploidy.

#### **DIRECT INTERACTIONS BETWEEN NGE ACTIVITIES AND EPIGENETIC REGULATORY FUNCTIONS**

In addition to disruption of sncRNA-targeted inhibition, there is limited but growing evidence that NGE functions acting on DNA molecules interact directly with epigenetic control factors. There is convincing evidence of the connection between NGE and the epigenome in DNA damage repair and retroviral or retrotransposon insertions into chromosomes.

#### **EPIGENETIC INVOLVEMENT IN DNA PROOFREADING AND REPAIR**

There are recent reports that a specific histone modification (H3K36me3) primes DNA mismatch repair (Schmidt and Jackson, 2013), that H3K56 acetylation affects mismatch repair (Kadyrova et al., 2013), that hypoacetylation of H3K56 by HDACs 1 and 2 facilitates recruitment of non-homologous end-joining


#### **Table 3 | Life history events that induce epigenetic changes (see also http://shapiro.bsd.uchicago.edu/TableII.10.shtml for earlier references).**


(NHEJ) proteins (Miller et al., 2010; Munoz-Galvan et al., 2013), and that nucleosome remodeling is integral to DS break repair (Seeber et al., 2013). Longstanding observations document the involvement of a specific histone, gamma-H2AX, in DS break repair and NHEJ (Kinner et al., 2008; Altaf et al., 2009; Dickey et al., 2009b; Redon et al., 2009; Firsanov et al., 2011; Chen et al., 2013). A direct role in chromatin remodeling for DNA repair has been claimed for another H2 analog, H2A.Z (Xu et al., 2012a).

Published reports indicate that H2AX incorporation into chromatin suppresses conversion of single-strand nicks to DS breaks (Franco et al., 2006) and affects the processing of the ends of broken DNA molecules (Helmink et al., 2011). H2AX operates in phosphorylated form (Rogakou et al., 1998; Kinner et al., 2008).

Beyond the role of H2AX, chromatin dynamics play an essential role in DNA repair and genome homeostasis (Lahue and Frizzell, 2012; Shi and Oberdoerffer, 2012). Many reports claim repair roles for chromatin regulators, remodeling complexes and nucleosome exchange factors (Ryan and Owen-Hughes, 2011):


Nucleosome disassembly is probably necessary for certain repair processes (Linger and Tyler, 2007; Amouroux et al., 2010; Gospodinov and Herceg, 2013), and histone modifications affect damage-induced checkpoint signaling (Chen and Tyler, 2008). Once repair is complete, nucleosome modifications are reversed, and H2AX∼P is eliminated from chromatin (Svetlova et al., 2010). So-called "bystander" cells, which are not subjected to DNA damage but are in the same culture as irradiated cells, also display H2AX phosphorylation (Sokolov et al., 2007; Dickey et al., 2009a, 2011).

A key feature of genome repair is that H2AX-marked damaged DNA mobilizes to subnuclear "repair centers" where homologous recombination and NHEJ proteins also localize (Lisby and Rothstein, 2005; Plate et al., 2008; Bekker-Jensen and Mailand, 2010). A role for chromatin in mobilization of damaged DNA has been proposed (Seeber et al., 2013), but multiple sources of evidence are lacking.

#### **RETROVIRAL AND RETROTRANSPOSON INTEGRASES**

A more extensive case for NGE-chromatin interactions comes from analysis of retroviral and retrotransposon insertion specificities (Zhang and Mager, 2012). Each type of retrovirus displays a characteristic insertion specificity for its provirus (Lewinski et al., 2006). A number of targeting mechanisms involve epigenetic formatting molecules.

In budding yeast, Ty1 retrotransposon integrase contacts an H2A/H2B interface upstream of RNA polymerase III initiation sites (Baller et al., 2012; Bridier-Nahmias and Lesage, 2012; Mularoni et al., 2012). Histone deacetylase Hos2 and Trithorax group protein Set3 stimulate this nucleosome-targeted integration (Mou et al., 2006), and chromatin remodeling factor Isw2p is also implicated (Bachman et al., 2005). In contrast, the Ty5 retrotransposon inserts in silent chromatin, targeted by binding of its integrase to the Sir4 heterochromatin nucleating factor (Xie et al., 2001; Dai et al., 2007; Brady et al., 2008; Baller et al., 2011).

HIV and other lentiviral targeted integration into actively transcribed regions of the genome is associated with transcriptionassociated histone modifications, including H3 acetylation, H4 acetylation, and H3 K4 methylation, but is disfavored in regions rich in transcription-inhibiting modifications, which include H3K27me3 and DNA CpG methylation (Wang et al., 2007). The specificity results from integrase tethering by the LEDGF/p75 chromatin-binding growth factor (Vanegas et al., 2005; Llano et al., 2006; Ciuffi, 2008; Meehan and Poeschla, 2010; Zheng et al., 2010; Christ and Debyser, 2013). Replacing the LEDGF/p75 domain that interacts with expressed chromatin by the CBX1 domain, which binds histones H3K9me2 or H3K9me3 found in pericentric heterochromatin, targets HIV insertions to silent chromatin regions (Gijsbers et al., 2010).

Murine leukemia virus (MuLV) insertion targeting to initiation sites upstream of actively transcribed regions involves integrase interactions with bromodomain proteins BRD2, BRD3, and BRD4 (De Rijck et al., 2013; Gupta et al., 2013; Sharma et al., 2013a). Interestingly, chromatin recognition bromodomain protein BRD4 antagonizes HIV provirus reactivation (Zhu et al., 2012).

Certain retrotransposons are specifically targeted to centromeres (Wolfgruber et al., 2009; Birchler and Presting, 2012; Tsukahara et al., 2012; Sharma et al., 2013b), which have a special chromatin configuration characterized by centromeric versions of H3 (Henikoff and Dalal, 2005; Vos et al., 2006; Partridge, 2008; Zhang et al., 2008a). Centromeric retrotransposons in rice are highly associated with H3K9me2, a hallmark for heterochromatin (Neumann et al., 2007). Some centromeric retrotransposons encode integrase proteins with histone-binding chromodomains at their carboxy-termini (Neumann et al., 2011). Chromodomains recognize lysine methylation (Blus et al., 2011; Yap and Zhou, 2011; Eissenberg, 2012).

It is probably not coincidental that the most widely distributed group of retrotransposons among all eukaryotic clades are the "chromoviruses," which are so named because they have chromodomains in their integrase proteins (Gorinsek et al., 2004; Kordis, 2005; Novikov et al., 2012; Weber et al., 2013). A chromodomain has been reported to target fungal chromovirus MAGGY insertions to heterochromatin marked by H3K9me2/me3 (Gao et al., 2008). An integrase chromodomain also participates in activator protein-targeted insertion of fission yeast retrotransposon Tf1 upstream of RNA polymerase II transcription start sites (Hizi and Levin, 2005; Chatterjee et al., 2009).

#### **DNA TRANSPOSONS**

In contrast with many retrotransposons that interact with nucleosomes, the DNA transposon Hermes inserts preferentially in budding yeast into nucleosome-free regions of the genome (Gangadharan et al., 2010). The widely used P element DNA transposons in *Drosophila* show targeting (called "P element homing") by incorporating binding sites for various regulatory factors, including chromatin insulators (Bender and Hudson, 2000; Fujioka et al., 2009) and Polycomb group response elements (Kassis, 2002; Cheng et al., 2012).

#### **EPIGENETIC REFORMATTING AFTER DNA REPLICATION AND ncRNAs AS POTENTIAL AGENTS FOR TRANSMITTING EXPERIENCE TO THE GENOME**

While the evidence is increasingly abundant for effects of different life history events on epigenetic regulation in general, and on genome homeostasis in particular, it is far from clear how those effects occur (Lim and Brunet, 2013). We know very little about the connections between cell sensors and epigenetic (re)formatting complexes (Erdel et al., 2011; Narlikar et al., 2013). Deciphering those connections is currently an important research goal.

DNA replication provides a key decision point for maintaining or changing chromatin configurations (Poot et al., 2005; Liu and Gong, 2011; Mermoud et al., 2011). The replication apparatus must disassemble chromatin for polymerization and then reassemble chromatin once replication is complete. Replication takes place only in dividing cells, and transgenerational inheritance of epigenetic states must involve the proliferating cells that give rise to gametes. Transfer of outside information from somatic tissues to the germline has been reported in mammals (Sharma, 2013; Skinner et al., 2013). And epigenetic windows of susceptibility to environmental insults have been suggested during sperm development (Soubry et al., 2014). Since there is no segregated germ line in plants and eukaryotic microbes, the same cells that experience environmental inputs can also be the progenitors of gametes.

A number of different factors have been found or hypothesized to participate in post-replication chromatin restoration: histone chaperones (Budhavarapu et al., 2013), RNA editing and sncRNAs (Savva et al., 2013), chromatin remodeler SMARCAD1 (Mermoud et al., 2011), chromatin assembly factor 1 (Huang and Jiao, 2012), histone chaperon FACT (Winkler and Luger, 2011) and Swi/Snf complexes (Neves-Costa and Varga-Weisz, 2006; Ryan and Owen-Hughes, 2011; Zhu et al., 2013), and ISW1 complexes (Erdel and Rippe, 2011).

One frequently overlooked feature of post-replication reestablishment of epigenetic formatting is where in the nucleus it might occur. Replication takes place in specialized "replication factories" (Vago et al., 2009; Guillou et al., 2010). Does chromatin reestablishment occur in the same location or does it involve migration of newly replicated DNA segments to distinct subnuclear "chromatin factories," like the ones that exist in the nucleolus for heterochromatin formation on rRNA-encoding DNA (Guetg and Santoro, 2012)? If so, such post-replication relocalization would be guided by the nucleoskeleton and lncRNAs (Mercer and Mattick, 2013; Mercer et al., 2013) and might present an attractive target for stress response and sensory input signaling (Weiner et al., 2012).

It is notable that changes to ncRNAs are frequently cited with regard to the impact of life history events on the genome (Sunkar et al., 2007; Khraiwesh et al., 2012; Lelandais-Briere et al., 2012; Nakaminami et al., 2012; Amaral et al., 2013). In the plant literature, there is documentation of numerous ncRNA changes in response to particular biotic and abiotic stress regimes (**Table 4**).

A number of observations about resistance to biotic and abiotic stresses are consistent with a key role for ncRNA changes in life history responses. Several viruses encode siRNA suppressors to overcome host defenses (Jiang et al., 2012; Omarov and Scholthof, 2012; Guo and Lu, 2013). Transgenic constructs encoding constitutive miRNA expression can lead to salt and drought tolerance in creeping bentgrass (Zhou et al., 2013), to immunity against blast fungus in rice (Li et al., 2014), and in *Arabidopsis* to greater salt and alkalinity sensitivity (Gao et al., 2011). Acquired aphid resistance in *Arabidopsis* involves sncRNA changes (Kettles et al., 2013), and most acquired stress resistances in plants display transgenerational epigenetic inheritance (Holeski et al., 2012; Luna and Ton, 2012; Slaughter et al., 2012).

#### **SPECULATIVE CONCLUSIONS ABOUT AN EPIGENETIC INTERFACE BETWEEN EXPERIENCE AND GENOME CHANGE**

Mobile DNA and other NGE functions are the key agents for adaptively significant changes in genome organization and DNA sequences. The data reviewed and tabulated above establish the importance of RNA-directed chromatin formatting in the regulation and operation of mobile elements, viruses and DNA repair

#### **Table 4 | Changes in non-coding RNAs in response to life history events.**


functions. In addition, there is a remarkable correlation between the life history events that activate NGE functions to destabilize genomes and those that lead to alteration of chromatin states and DNA methylation patterns.

The preceding observations lead to the plausible hypothesis that epigenetic regulation serves as a key interface between organismal life history and the agents that restructure genomic DNA. This hypothesis is supported by the limited number of cases where empirical observations have established direct molecular connections between NGE functions and components of the epigenetic control system: histones, nucleosomes, and chromatin reformatting complexes.

If, as I expect, further research bolsters the epigenome-NGE correlations and connections documented above, then we need to ask: what components(s) of the epigenetic control apparatus communicate information about experience to NGE operators? We do not know the answer to this fundamental question. However, the data reported in **Table 4** indicate that ncRNAs are good candidates for key intermediates in the experience-genome signal transduction process. If this is so, then ncRNAs are logical molecular targets for modulating genome change toward potentially adaptive outcomes. Let us hope that research aimed at examining this proposal deepens our understanding of how life history impacts both epigenetic and genome change operations (**Tables 2**–**4**), whether or not my speculation ultimately proves to be correct.

#### **ACKNOWLEDGMENTS**

The author is grateful to the editors for the invitation to contribute to this special issue and for the opportunity to comment on the relationship between life history and genome change.

#### **REFERENCES**


in primary hippocampal neuronal cultures. *PLoS ONE* 8:e77859. doi: 10.1371/ journal.pone.0077859


*Populus euphratica*. *Plant Cell Rep.* 30, 1893–1907. doi: 10.1007/s00299-011- 1096-9


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 27 February 2014; accepted: 01 April 2014; published online: 25 April 2014. Citation: Shapiro JA (2014) Epigenetic control of mobile DNA as an interface between experience and genome change. Front. Genet. 5:87. doi: 10.3389/fgene.2014.00087 This article was submitted to Epigenomics and Epigenetics, a section of the journal*

*Frontiers in Genetics. Copyright © 2014 Shapiro. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Examination of exhaustive cloning attempts reveals that C. elegans piRNAs, transposons, and repeat sequences are efficiently cloned in yeast, but not in bacteria

#### *Or Sagy1, Ron Shamir <sup>2</sup> and Oded Rechavi 1\**

<sup>1</sup> Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel <sup>2</sup> Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

#### *Edited by:*

Eva Jablonka, Tel Aviv University, Israel

#### *Reviewed by:*

Jafar Sharif, RIKEN, Japan Eugene V. Koonin, National Center for Biotechnology Information – National Library of Medicine – National Institutes of Health, USA

#### *\*Correspondence:*

Oded Rechavi, Department of Neurobiology, Wise Faculty of Life Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv 69978, Israel e-mail: odedrechavi@gmail.com

Genome sequencing requires insertion of random fragments of the sequenced organism's DNA into a unicellular host, most often Escherichia coli bacteria. This manipulation was found in the past to be analogous to naturally occurring horizontal gene transfer, and moreover has proved valuable to understanding toxicity of foreign genetic elements to E. coli. Sequencing of the Caenorhabditis elegans genome was similarly achieved via DNA transformation into E. coli. However, numerous attempts have proven a significant percentage of the genome unclonable using bacteria, although clonable via yeast. We examined the genomic segments that were not clonable in bacteria but were clonable in yeast, and observed that, in line with previous hypotheses, such sequences are more repetitive on average compared with the entire C. elegans genome. In addition, we found that these gap-sequences encode significantly more for DNA transposons. Surprisingly, we discovered that although the vast majority of the C. elegans genome is clonable in bacteria (77.5%), almost all the thousands of sequences that encode for PIWI-interacting small RNAs, or 21U-RNAs (91.6%) were only clonable in yeast. These results might help understanding why most piRNAs in C. elegans are physically clustered on particular loci on chromosome IV. In worms and in a large number of other organisms, piRNAs serve to distinguish "Self" from "Non-Self" sequences, and thus to protect the integrity of the genome against foreign genetic elements, such as transposons. We discuss the possible implications of these discoveries.

**Keywords:** *C. elegans***, bacteria, lateral gene transfer, piRNAs, PIWI-interacting small RNAs, repetitive elements**

#### **INTRODUCTION**

During the 1990s and early 2000s, there was a race to sequence the human genome (Collins et al., 2003). The required techniques were developed in a rapid pace, and in parallel utilized for sequencing of other multicellular organism's genomes. The genome of the nematode *Caenorhabditis elegans* (*C. elegans*) was the first one completed (The *C. elegans* Sequencing Consortium, 1998). An unexpected result of the development of the different DNA cloning techniques was the accumulation of negative results, failed cloning attempts that allow gaining insight regarding barriers for genomic information transfer between organisms.

As part of sequencing the *C. elegans* genome, the whole genome of the worm was randomly broken into overlapping fragments, which were transformed into *Escherichia coli* bacteria through the use of very large cloning vectors termed cosmids and fosmids (Coulson et al., 1986; Kim et al., 1992; Perkins et al., 2005). With sufficiently high coverage, sequencing of multiple overlapping fragments should allow in theory the assembly of the corresponding full genome. During the project, researchers soon realized that certain parts of the worm's genome could not be cloned in the bacteria, leaving gaps in the resultant genome (Coulson et al., 1991). Even though there have been numerous attempts over the past 20 years at filling the gaps using cosmids and fosmids [including a

consortium dedicated to creating a library of fosmids that would cover the entire genome (Perkins et al.,2005)], none have been successful in covering these gaps. In total, about 20% of the genome could not be cloned in bacteria in spite of these repeated efforts. Throughout this manuscript we will refer to such sequences as "gap sequences."

In the 1990s, in order to close the gaps and obtain a complete sequence of the *C. elegans* genome, researchers turned to cloning in yeast, by using YACs [Yeast Artificial Chromosomes (Coulson et al., 1991)]. The gap sequences quickly proved clonable in yeast (*Saccharomyces cerevisiae*), and thusYACs that contain sequences covering the gaps were rapidly obtained and facilitated the completion of the worm genome (Wilson, 1999). Even so, the reason for the "non-clonability" of these gaps in bacteria largely remained a mystery.

We found this incompatibility of worm genome fragments with bacterial cloning a valuable resource. *C. elegans* nematodes feed on bacteria both in real-life and in lab conditions, and this close contact facilitates exchange of genetic information between the nematodes and their resident microbes. For example, bacterially-expressed double-stranded RNA transfers from the bacteria to the worm and silences endogenous *C. elegans* genes (Timmons et al., 2001; Mello and Conte, 2004; Liu

et al., 2012; Sarkies and Miska, 2013). Although lateral gene transfer has been most extensively described in microorganisms, recent studies suggest that the process occurs more frequently than previously appreciated in eukaryotes as well (Bruto et al., 2014). DNA sequences were also shown to be laterally exchanged between nematodes and symbionts (Haegeman et al., 2011). This is consistent with the various findings of gene exchange between organisms at all evolutionary distances (Boto, 2010; Koonin, 2010). Thus, a side benefit of the artificial cloning procedure used for sequencing the *C. elegans* genome is that it provides a keyhole into limitations of natural genome transfer between worm and bacteria.

Given that the aforementioned gaps have not been clonable in *E. coli*, a prokaryote, but have been clonable in the eukaryote *S. cerevisiae*, we hypothesized that there would be a genetic factor in these regions causing the adopting prokaryotic host to die. This notion was inspired by Sorek et al.'s (2007) experimental research on inhibition of gene transfer between different bacterial genomes to *E. coli*. Such analysis of clone insertions and cloning gaps has proved valuable to understanding toxicity of bacterially derived genetic elements to *E. coli* in the past (Sberro et al., 2013).

It has been believed that the cause of the lack of clonability of various regions of the *C. elegans* genome was their high degree of repetitiveness (Wilson, 1999). We set out to explore whether there was another underlying property of these regions that, possibly in conjunction with repetitiveness, was actively inhibiting the prokaryotic hosts of these genome sequences from thriving.

#### **MATERIALS AND METHODS**

#### **PRECISE ALIGNMENT OF THE CLONES TO THE GENOME**

The sequences of *C. elegans* clones were downloaded from NCBI BioProject PRJNA13758, and the precise location of each of the clones was determined by BLASTing the sequence to the *C. elegans* genome (version WBcel235; this step was required as the locations in the NCBI record did not fit the genome version that we utilized). We used clone sequences from WormBase, retrieved from NCBI on April 4th, 2013. The sequences were categorized as "Cosmid," "Fosmid," or "YAC" clones.

The YAC subsequences that we analyzed tended to bridge across gap sequences, with both YAC ends occasionally overlapping with finished cosmid/fosmid sequences. The overlap varied between 0% and 100% (3 YAC sequences were found to be covered fully by cosmids/fosmids – Y119C1C, Y110A2B, Y70C5B). On average ∼7% of the YAC sequence's ends overlapped with cosmids/fosmids. To prevent this overlap from influencing our results, we removed the overlapping segments from the YAC sequences before running any analyses. The resulting"YAC-exclusive" sequences were used in our study (hereby referred to as "YACs").

#### **REPEATMASKER ANALYSIS**

As a first step, and in accordance with previous notions of repetitiveness as the factor limiting cloning of the YAC sequences, we ran RepeatMasker separately on YACs, cosmids, and fosmids. This was conducted for each of the individual clones. We compared the average, median, minimal, maximal, and non-zero results of each RepeatMasker parameter for every clone type (RepeatMasker was run with the species set as "*C. elegans"*)*.* RepeatMasker lists retroelements (including sub-classifications), DNA transposons (including sub-classifications), rolling-circles, other unclassified interspersed repeats, small RNA, satellites, simple repeats, and low complexity regions (Smit et al., 1996–2010). As a control test for each clone type we also generated a random library with the same number of clones taken from random segments of the *C. elegans* genome, (hereby "random clones"), of similar sizes as those comprising that clone type (based on the average length and length standard deviation). These libraries were examined using the same RepeatMasker analysis. This analysis was used to assess the significance of our results, in comparison to enrichments detected by chance.

#### **COMPARISON BY GENOMIC LOCATION**

It has been observed that the distribution of repeat elements along chromosomes is highly uneven (The *C. elegans* Sequencing Consortium, 1998). We therefore evaluated the difference in repeat element distributions between YACs and cosmids, by taking into account the location of the clones. The purpose of this analysis was to reduce bias caused by different location distributions of the types of clones. It may be hypothesized that these differences were causing the disparity that we found. In this, and the next couple of analyses, we compared YACs to cosmids and ignored fosmids, as there were much fewer fosmids compared to cosmids (107 vs. 2,520). To measure the effect of the clone's location along the chromosome and to obtain robust statistical analysis, we partitioned each chromosome into four equal-size quarters and summarized the statistics of the two extreme quarters together and of the two middle ones together.

#### **COMPARISON BY CLONE LENGTH**

Another factor that we took into consideration was the clone's length. Although YACs are generally much longer than cosmids, the YAC segments that were evaluated here were shorter, as they were trimmed (as mentioned above) to contain only the gaps used to bridge between cosmids. The YAC-exclusive segment median length was roughly 23.7 kb, while for the cosmids the median length was approximately 32.5 kb. In this analysis we divided each clone type into two groups – one of clones below the median length of the clone type and the other of clones over the median length. Statistics were collected for each group.

#### **COMPARISON OF GENES ENCODED IN THE CLONES**

We categorized the genes in the clones based on the type of product that they encode for (or do not encode for): coding sequences (CDS), non coding RNAs (ncRNA), tRNAs, and rRNAs. This categorization was determined based on the NCBI classification of each gene.

#### **RESULTS**

By comparing the sequences that were cloned in YACs, cosmids and fosmids, we observed, consistently with previous reports, that YAC sequences were overall more than twice as repetitive as the sequences cloned in cosmids. On average, 21.42% of the YAC nucleotides vs. 10.15% of cosmid nucleotides were masked by RepeatMasker. The most abundant repeat category, and an interesting result by itself, was enrichment in DNA transposons

of various types in the YAC sequences (See **Table 1**). On average, 11.44% of YAC sequences were found to be DNA transposons vs. 5.69% in cosmids. Of the transposons, the most major difference was in PiggyBac transposons – 1.53% in YACs on average vs. 0.48% in cosmids. Other than the transposons, significant differences were found in unclassified repeats (4.65% vs. 1.58%) and satellites (2.52% vs. 0.51%). The content of Simple repeats also varied – 1.65% vs. 1.06%. The total interspersed repeats (all the repeats that were identified by RepeatMasker, excluding small RNA, satellites, simple repeats and low complexity regions) composed, on average, 17.04% of a YAC sequence vs. 8.27% of a cosmid sequence. Contrary to what might be expected, low complexity repeats did not show a significant difference (0.31% vs. 0.3%).

Comparison with the results that were obtained with the randomly generated clones showed that the differences between the clone types did not originate from their size. For all repeat types the percentage of sequence they covered in the "random YACs"was very similar to the percentages for"random cosmids"and"random fosmids," unlike the results that were obtained with the real clones (Supplementary **Table S7**).

#### **THE RELATIVELY HIGH REPETITIVENESS OF THE SEQUENCES IN YACs PERSISTS EVEN WHEN CONTROLLING FOR THE CHROMOSOMAL LOCATION**

As previously reported, we observed a significant difference in repetitiveness and transposon-frequency between the different chromosome regions (both *p* < 0.0001, *t*-test). The extreme quarters of chromosomes were more repetitive than the middle quarters of the chromosomes (bases masked – 18.69% vs. 6.99%), and were made up more of transposons (10.12% vs. 3.96%). We alsofound a noteworthy difference between chromosomal location distributions of the clone types. The vast majority of the YACs, a little over three quarters (396) were in the extreme quarters and the minority in the middle quarters (119), while the cosmids were mostly in the middle quarters (979 vs. 1,540 – a ratio of ∼2:3). We then ran RepeatMasker while taking into account this difference – comparing the differentiated groups between YACs and cosmids and within each clone type. Histograms of the locations of YACs and cosmids and the locations of repeats of any type are shown in **Figures 1–2**. A similar histogram for transposons, portraying the deviation more finely can be found in **Figure 3**. To test the effect of the chromosomal location, we reran Repeat-Masker separately on the extreme and on the middle quarters. In both cases the conclusions that were drawn from the previous analysis persisted. YAC-contained sequences were more repetitive, and were composed more of transposons than cosmids, although clones in the first and last quarters were overall more repetitive than their second and third quarter counterparts (Bases masked: YACs – 22.83 and 16.71%, cosmids – 15.69 and 6.62%, composed of transposons: YACs – 12.78 and 7.01%, cosmids – 8.49 and 3.91%).

#### **DIFFERENCES IN THE LENGTH OF YACs AND COSMIDS DO NOT ACCOUNT FOR THEIR REPETITIVENESS DIFFERENCE**

There were no outstanding differences in repeat frequencies between the clones from the same type above and below the median length. Hence, the differences in repeat frequency between the clone types could not be explained by length variation.

#### **18 YACs HAD NO REPETITIVE ELEMENTS**

We examined the 18 YACs for which RepeatMasker did not find any repetitive elements. Out of these YACs, after removing overlaps with cosmids and fosmids, 12 were found to be smaller than 1 kb in length, 3 between 1 and 1.5 kb, and only 3 over 1.5 kb in length – Y68G5A (3.4 kb, after removing a 0.2 kb overlap; no genes in the non-overlapping portion), Y2C2A (8.5 kb, after removing an 8.5 kb overlap; part of the gene tag-80 in the nonoverlapping portion) and Y53C12C (23.4 kb, after removing a 0.2 kb overlap; the gene *eyg-1* is enclosed in the non-overlapping portion).

#### **YACs ARE SIGNIFICANTLY ENRICHED FOR 21U piRNAs**

An outstanding difference between the densities of proteincoding genes in YACs and cosmids was not found (a slightly higher density was found in cosmids: 182 genes per MB in YACs vs. 238 genes per MB in cosmids). tRNAs and rRNA are overrepresented in YACs – 94 YAC tRNA genes vs. 114 cosmid tRNA genes (4.2 vs. 1.5 genes per MB), and 10 YAC rRNA genes vs. 12 cosmid rRNA genes (0.45 vs. 0.15 genes per MB). Results for the CDS and ncRNA genes can be found in **Table 2**.

The most striking result of our analysis was that ncRNAs of the type 21U piRNAs were dramatically overrepresented in YACs. A roughly similar absolute amount of ncRNA was found in YACs and cosmids despite the fact that YACs take up about a quarter of the length that cosmids take in the genome (10,799 vs. 11,108 ncRNA genes, respectively – corresponding with densities of 484 vs. 143 ncRNAs/MB). Further investigation revealed that the vast majority of the YAC ncRNAs are PIWI-interacting small RNAs, or 21U-RNAs (about 91%), while they are less than half of the cosmid ncRNAs (about 42.4%). Moreover, in terms of absolute quantities, YACs include more than twice as many 21U-RNAs as cosmids – 4,713 vs. 9,879 flip order to maintain consistency.

The majority of the DNA sequences that give rise to 21U piR-NAs reside in two regions in chromosome IV (and also in another small region on chromosome IV) that are called "piRNA clusters" (Ruby et al., 2006). We analyzed these regions for repetitiveness and they turned out to be overall less repetitive compared to other regions, both on chromosome IV and on other chromosomes (the two major piRNA clusters had 8.15 and 11.51% of the bases masked as repetitive and the third cluster had 8.11% of the bases masked, compared to 11.89% for chromosome IV as a whole and 14.9% for chromosome I for example). Since the regions that encode for abundant YAC-specific piRNAs are relatively not repetitive, it raises the possibility that they are unclonable in *E. coli* for other reasons, perhaps due to toxicity (see Discussion). The longest four YACs, and seven of the longest ten YACs, were on chromosome IV (see **Table 3**). The longest four YACs alone hold 3,754 ncRNAs (roughly 35% of all ncRNA genes found in YACs), with the 6th longest holding 1,083 more. The longest 4 YACs also mapped specifically to one of the two piRNA clusters on chromosome IV.


**Table1|RepeatMaskerresultsforYACs,cosmids,andfosmids(averagepercloneforeachclone**

**Frontiers in Genetics**

**FIGURE 3 | DNA transposon locations. (A)** The fraction of DNA transposons starting at each percent point of their chromosomes on average, as identified by RepeatMasker. **(B)** The cumulative probability distributions of DNA transposons on average. For each percentage point the fraction of clones starting at that point or further is given.



**Table 3 | Longest 15YACs, their locations, and lengths.**


While chromosome IV contains the vast majority of the YAC ncRNAs (93.3% vs. 44.1% of cosmid ncRNAs), it is not overrepresented for YACs overall (21.8% of the YAC exclusive bases are on chromosome IV, which constitutes 17.4% of the nematode genome). When removing chromosome IV from the analysis, the discrepancy between YACs and cosmids on ncRNA content was flipped: instead of the ncRNAs being divided roughly equally between YACs and cosmids in terms of absolute numbers, when excluding chromosome IV, there were about 8.8 as many ncRNAs in cosmids as in YACs (6,182 vs. 702).

#### **DISCUSSION**

We used a bioinformatic approach to probe DNA sequences that could be cloned in a eukaryote (yeast), but not in a prokaryote (bacteria). We found several major characteristics in such sequences, including high repetitiveness and enrichment for DNA

transposons and 21U piRNAs. If these characteristics are typical to sequences of other organisms, it could in theory shed light on the barriers to genetic information transfer that exist between prokaryotes and eukaryotes, and should be interesting to validate experimentally.

The inability to efficiently clone repetitive sequences in bacteria is perhaps not biologically relevant, as it could simply stem from technical considerations. On the other hand, and while our results do not prove it in any way, it is possible that the inability to clone DNA transposons or piRNAs in bacteria has physiological importance.

Acquisition of vectors (cosmids or fosmids in this case) that carry mobile elements could be detrimental, if such genomic parasites are allowed to colonize or disrupt the genome of the host. It would be interesting to examine whether the type of transposons that we found to be enriched in YACs "jump" in *E. coli*, but not in *S. cerevisiae*, and if such transposition, should it be shown to occur, compromises the bacteria's viability*.* Transposons that are not dependent on host factors could in theory transpose and confer damage, even upon lateral gene transfer.

piRNA-mediated RNA interference has a role in genome surveillance against foreign sequences in multiple organisms across the tree of life (animals and protists). One fascinating scenario that could be envisioned based on our data is that piR-NAs might act *in trans*, to eliminate bacteria, *E. coli* in this case, with which *C. elegans* intimately interacts. Although *C. elegans* worms are not natural hosts of *E. coli*, the two organisms have been grown together in the lab for half a century. *C. elegans* and *E. coli* could in theory adapt to interact by these mechanisms, as piRNAs and piRNA pathway genes evolve rapidly, due to an "arms race" with transposons (Yi et al., 2014), and since piRNAs target rapidly evolving genes (Weick et al., 2014). The promoters of most piRNAs, which are autonomous transcriptional units in *C. elegans*, contain a consensus motif for binding of transcription factors (Weick et al., 2014). Thus in theory, possession of just one appropriate transcription factor could allow a host (such as *E. coli*) to transcribe most piRNAs. In flies, piRNA clusters act as "traps" in which transposons land (Malone and Hannon, 2009). It is not understood why most piRNAs in *C. elegans* are clustered together on chromosome IV. Perhaps the inability to transfer a large cluster of piRNAs from *C. elegans* to *E. coli* is one of the reasons.

Characterization of DNA sequences that are lethal to bacteria but not to yeast could be applicative (Sorek et al., 2007; Kimelman et al., 2012). Given further research and validation, it could be possible in the future to combine in gene therapy, for example, piRNA-encoding DNA sequences, and thus to affect bacteria, while sparing the eukaryote hosts with which they interact.

#### **ACKNOWLEDGMENTS**

We thank all members of the Rechavi lab for their helpful comments, and Gregory Minevich for discussions and help with the development of the idea. Oded Rechavi and Ron Shamir were supported in part by the Israel Science Foundation (OR grant 1101/13, RS grant 317/13).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fgene.2014.00275/ abstract

**Table S1 | RepeatMasker results for the various groups of clones, averaged per clone for each clone type.**

**Table S2 | RepeatMasker results for the various groups of clones, median per clone for each clone type.**

**Table S3 | RepeatMasker results for the various groups of clones, summed per clone for each clone type.**

**Table S4 | RepeatMasker results for the various groups of clones, number of clones for each clone type that are non-zero.**

**Table S5 | RepeatMasker results for the various groups of clones, the minimal result per clone for each clone type and the clones with that value.**

**Table S6 | RepeatMasker results for the various groups of clones, the maximal result per clone for each clone type and the clones with that value.**

**Table S7 | RepeatMasker results for the randomly-selected genomic segments with the same size distributions asYACs, cosmids, and fosmids (averaged per segment for each clone type).**

#### **REFERENCES**


**Conflict of Interest Statement:** The Guest Associate Editor Eva Jablonka declares that, despite being affiliated to the same institution as the authors, the review process was handled objectively and no conflict of interest exists. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 May 2014; accepted: 26 July 2014; published online: 28 August 2014.*

*Citation: Sagy O, Shamir R and Rechavi O (2014) Examination of exhaustive cloning attempts reveals that C. elegans piRNAs, transposons, and repeat sequences are efficiently cloned in yeast, but not in bacteria. Front. Genet. 5:275. doi: 10.3389/fgene.2014.00275*

*This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Sagy, Shamir and Rechavi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Insights into the epigenomic landscape of the human malaria vector *Anopheles gambiae*

#### *Elena Gómez-Díaz <sup>1</sup> \*, Ana Rivero 2, Fabrice Chandre <sup>2</sup> and Victor G. Corces <sup>1</sup>*

*<sup>1</sup> Department of Biology, Emory University, Atlanta, GA, USA*

*<sup>2</sup> Maladies Infectieuses et Vecteurs: Écologie, Génétique, Évolution et Contrôle (UM1-UM2-CNRS 5290-IRD 224), Centre IRD, Montpellier, France*

#### *Edited by:*

*Ilaria Negri, University of Turin, Italy*

#### *Reviewed by:*

*Beisi Xu, St. Jude Children's Research Hospital, USA Irene Ricci, University of Camerino, Italy*

#### *\*Correspondence:*

*Elena Gómez-Díaz, Department of Biology, O. Wayne Rollins Research Center, Emory University, Room 1071B, 1510 Clifton Road NE, Atlanta, GA 30322, USA e-mail: elegomezdiaz@gmail.com*

The epigenome of the human malaria vector *Anopheles gambiae* was characterized in midgut cells by mapping the distribution and levels of two post-translational histone modifications, H3K27ac and H3K27me3. These histone profiles were then correlated with levels of gene expression obtained by RNA-seq. Analysis of the transcriptome of *A. gambiae* midguts and salivary glands led to the discovery of 13,898 new transcripts not present in the most recent genome assembly. A subset of these transcripts is differentially expressed between midgut and salivary glands. The enrichment profiles of H3K27ac and H3K27me3 are mutually exclusive and associate with high and low levels of transcription, respectively. This distribution agrees with previous findings in *Drosophila* showing association of these two histone modifications with either active or inactive transcriptional states, including Polycomb-associated domains in silenced genes. This study provides a mosquito epigenomics platform for future comparative studies in other mosquito species, opening future investigations into the role of epigenetic processes in vector-borne systems of medical and economic importance.

**Keywords: H3K27ac, H3K27me3, histone post-translational modifications, chromatin immunoprecipitation, gene expression regulation, mosquito-borne diseases, ChIP-seq, RNA-Seq**

#### **INTRODUCTION**

Host-parasite interactions are among the most plastic and dynamic systems in nature. In these systems, epigenetic modifications can provide an accessory source of fast-acting, reversible and readily available phenotypic variation that can be directly shaped by both host and parasite selection pressures (Bonduriansky and Day, 2009; Gómez-Díaz et al., 2012). Mosquitoes are important disease vectors worldwide. *Anopheles gambiae* is major vector of malaria in Africa, a disease that affects more than 300 million people and causes around 650,000 deaths each year (Who, 2013). The genome of *A. gambiae* was sequenced more than 10 years ago (Holt et al., 2002). Since then, the roadmap for malaria control strategies has been mostly centered on the study of vector genetic variation (Severson and Behura, 2012). Yet, genetic mechanisms alone are not sufficient to explain natural phenotypic variation in terms of vector competence (Lambrechts, 2010). Thus, it is critical to expand our current understanding of mosquito-parasite interactions into an integrated view that includes both genetic and epigenetic dimensions. Although a great deal of progress has been made in deciphering the epigenetic code of *Plasmodium* parasites (see Hoeijmakers et al., 2012 for a review), knowledge of epigenetic processes in mosquitoes is limited.

Post-translational modification (PTM) of histones (acetylation, methylation, phosphorylation, sumoylation, and ubiquitinylation) is an important regulatory mechanism of transcriptional control that acts by altering chromatin structure (Kouzarides, 2007; Bonasio et al., 2010). In *Drosophila*, epigenomic profiling of enrichment or depletion patterns for specific histone modifications has been established genome-wide and this information has been correlated with the location of regulatory elements and functional domains (Kharchenko et al., 2011; Negre et al., 2011). For example H3K4me3 and H3K27ac are associated with active promoters, H3K4me1 and H3K27ac are present in active enhancers, and H3K27me3 associates with Polycomb-mediated silencing. Surprisingly, no information is available in mosquitoes on histone PTMs or regulatory elements involved in chromatin-based epigenetic processes, such as the Polycomb and trithorax complexes, insulators, and enhancers.

Available technologies now allow the analysis of whole epigenomes in a straightforward and cost-effective manner. Chromatin immunoprecipitation followed by sequencing (ChIPseq) allows the mapping of protein-DNA interactions genomewide. In this method DNA-protein complexes containing a specific protein of interest are immunoprecipitated from crosslinked, sonicated chromatin. DNA is purified from the enriched pool and used to generate a library for subsequent whole genome sequencing. The sequence reads for the enriched DNA are computationally aligned to the reference genome to define sharp peaks or broad blocks of modified histones. Since its development in 2007 (Robertson et al., 2007), ChIP-seq has been extensively used to survey the genomic profiles of histones and their modifications, transcription factors (TFs), DNA and histone modifying enzymes, the transcriptional machinery, and other chromatin associated proteins in various model organisms, including the fruit fly *Drosophila melanogaster* (Roy et al., 2010). One downside of this technique is, however, that its application in non-model species is still limited by the availability of reference genomes. On the other hand, in cases where the annotation of the genome is still incomplete, such as *A. gambiae*, the combination of histone modification maps and gene expression information constitutes a powerful approach that allows the functional interpretation of chromatin marks as well as *de novo* prediction of regulatory elements, genes and splice variants (Ernst and Kellis, 2010; Cheng et al., 2011). Previous studies have analyzed the transcriptome of *A. gambiae* using microarrays (Dana et al., 2005; Marinotti et al., 2005; Koutsos et al., 2007; Baker et al., 2011; Maccallum et al., 2011), but this approach gives only semi-quantitative results due to the small dynamic range of the microarray signal. A few studies have applied RNA-seq in this species on whole bodies or chemosensory appendages (Pitts et al., 2011; Vannini et al., 2014). To our knowledge no study has specifically targeted RNAseq analyses to tissues such as the midgut or salivary glands that play a key role in the development of *Plasmodium* parasites within the mosquito, and are thus directly implicated in malaria transmission.

Using the technical and theoretical knowledge accumulated from chromatin studies in *Drosophila*, here we carry out an analysis of the transcriptome and histone PTMs in the human malaria vector *A. gambiae*. We characterize the distribution and levels of H3K27ac and H3K27me3, two key histone modifications that have been shown to play an important role in gene regulation, by performing ChIP-seq analyses in the midguts of adult mosquito females. The midgut is key tissue for *Plasmodium* development because the obligate passage of the parasite through this tissue results in large losses in parasite numbers, which may explain the frequent failure of the parasite to complete its life cycle in the mosquito (Blandin et al., 2008). We then correlate the histone profiles with levels of genome-wide gene expression obtained by RNA-seq, in order to infer functional states and predict putative regulatory elements in the mosquito. This integrative analysis allowed us to link enrichment or depletion of active and repressive histone modifications to their target genes. The result is the first platform for mosquito comparative epigenomics that can serve as a basis for future studies on the biology of mosquitoes and mosquito-borne diseases.

#### **EXPERIMENTAL PROCEDURES MOSQUITO REARING AND DISSECTION**

Experiments were performed using an isogenic strain of *A. gambiae* (strain Kisumu) maintained at the MIVEGEC insectarium under standard rearing conditions (27 ± 1◦C, 70 ± 10% RH and 16L: 8D photoperiod). Blood- fed adult females were allowed to lay eggs. On the day of hatching, larvae were seeded into plastic trays (25 × 35 × 7 cm) containing one liter of mineral water at a density of 300 individuals per tray. Larvae were provided with 200 mg of TetraMin® fish flakes the day after hatching, and, from then on, 400 mg TetraMin every 2 days until pupation. On pupation, trays were placed inside an emergence cage (27 × 40 × 35 cm) and provided with an *ad libitum* source of 10% sugar solution for the emerged adults. Dissection of the midguts and the salivary glands was performed on adult females aged 6–8 days. Tissues were maintained in ice-cold Schneider's insect culture medium (Sigma-Aldrich) to avoid degradation, and fresh tissues were immediately processed for chromatin and RNA analyses.

#### **RNA ISOLATION AND RNA-SEQ LIBRARY PREPARATION**

Total RNA was extracted from fresh mosquito tissues (∼25 midguts and ∼ 50 salivary glands) using the *mir*Vana™ RNA Isolation Kit (Ambion®) according to the manufacturer's protocol. RNA concentration was quantified using a Qubit® 2.0 Fluorometer, and RNA integrity was determined with an Agilent 2100 Bioanalyzer. Illumina libraries were prepared and sequenced at the HudsonAlpha Institute for Biotechnology, using an Illumina HiSeq2000 sequencer. Standard directional RNA-seq library construction, 50 bp paired end reads with ribosomal reduction (RiboMinus™ Eukaryote Kit, Ambion®).

#### **CHROMATIN ISOLATION, IMMUNOPRECIPITATION, AND SEQUENCING**

Antibodies to histone modifications H3K27ac (Abcam ab4729) and H3K27me3 (Millipore 07-449) used in this study have been previously tested and shown to specifically recognize the appropriate epitopes (Kharchenko et al., 2011; Landt et al., 2012). The epitope sequences are conserved between *Drosophila* and *A. gambiae*.

For immunoprecipitation experiments, mosquito tissues (15–20 midguts) were suspended in 100µl of Schneider's medium, fixed by adding 37% formaldehyde to a final concentration of 1%, and incubated at room temperature for 10 min. The reaction was stopped by adding 1/10 volumes of 1.25 M glycine solution, incubated for 5 min before centrifugation at 4000 rpm for 3 min, and washed twice using cold 1X PBS. Tissues were lysed in 200µl of cell lysis buffer (5 mM PIPES pH 8.0, 85 mM KCl, 0.5% NP40), and protease inhibitor cocktail (complete protease inhibitor cocktail tablets, Roche), then homogenized with a pestle mixer and incubated on ice for 15 min. After centrifugation at 4000 rpm for 8 min, the nuclei were resuspended in 200µl of nuclear lysis buffer (50 mM Tris-HCl pH 8.0, 10 mM EDTA.Na2, 1% SDS, and protease inhibitors), and incubated on ice for 20 min. Then, 100µl of IP dilution buffer (0.01% SDS, 1.1% Trition X-100, 1.2 mM EDTA.Na2, 16.7 mM Tris-HCl pH 8.0, 167 mM NaCl) were added. Sonication was performed using a Diagenode BioRuptor using 40 cycles of 30 s ON/ 30 s OFF, resulting in fragments around 300–500 bp, followed by centrifugation at 4◦C for 10 min. Quality control of fixed chromatin was performed by visualization of seared fragments by agarose electrophoresis prior to proceeding with the chromatin IP. For the preclear step, 50µl of protein A magnetic beads (Dynabeads® Protein A, Novex®) were pre-coated using anti-rabbit IgG in 5% BSA. Beads were then added to each sample diluted 5X in IP dilution buffer, and incubated overnight (OVN) at 4◦C in a rocker. Seventy five µl (5%) were taken as a control input from this solution and subjected to the same experimental procedure than the test sample, except for the antibody-binding step. For the test sample, 3µl of Ab (1µg/µl) pre-bound to 50µl of protein A beads were added to the pre-cleared chromatin and incubated OVN at 4◦C. After discarding the supernatant by magnetic separation, antibody-bound chromatin was subjected to 3 washes in low salt buffer (0.1% SDS, 1% Trition X-100, 2 mM EDTA.Na2 pH 8.0, 20 mM Tris-HCl pH 8.0, 150 mM NaCl), 2 washes in high salt buffer (0.1% SDS, 1% Trition X-100, 2 mM EDTA.Na2 pH 8.0, 20 mM Tris-HCl pH 8.0, 500 mM NaCl), 2 washes in LiCl buffer (1 mM EDTA.Na2 pH 8.0, 10 mM Tris-HCl pH 8.0, 0.25 M LiCl, 1% NP40, 1% SDS), and one final wash in TE. Elutions were then performed by adding 2 × 200µl of IP Elution Buffer (0.1 M NaHCO3, 1% SDS). Crosslinks were reversed by incubating the ligated chromatin at 65◦C OVN with the addition of 0.25 M NaCl, 10 mM EDTA, and 40 mM Tris. Extraction was performed by adding 8µl of proteinase K (10 mg/ml) and incubation at 50◦C for 2 h. The DNA was extracted following a standard phenol-chloroform procedure.

Sequencing libraries were prepared following the protocol described by Bowman et al. (2013) optimized for low quantity DNA. Before library preparation, ChIP DNA was pre-cleaned using a 1.8 × ratio of SPRI beads (Agencourt AMPure XP, Beckman Coulter). End preparation was performed by end repair (End-It DNA End Repair Kit, Epicenter Cat# ER0720) and addition of "A" base to 3 ends (Klenow 3 -5 exo, NEB Cat# M0212S). Adaptor ligation was performed using a universal adaptor sequence and T4 ligase (NEB Cat# M0202S), and the ligated products were SPRI cleaned (1.6 × beads ratio). Amplification of ChIP libraries was performed by qPCR using an Applied Biosystems 7500 Fast Real-Time PCR System. Reactions (50µl) consisted of 0.2µM universal primer, 0.2µM barcoded primer and 25µl of KAPA SYBR FAST qPCR Master Mix (KapaBiosystems). Real time kinetics was monitored and library amplifications were stopped at the end of the extension after SYBR green reported reaction kinetics in the log phase (usually 15 cycles). A final SPRI purification with 1.0 × beads ratio was performed on the ChIP DNA libraries to remove adapter/primer dimers (120 bp) from the rest of the library (250 bp and above). ChIP libraries were sequenced at the HudsonAlpha Institute for Biotechnology using an Illumina HiSeq2000 sequencer.

#### **DATA ANALYSES**

#### *RNA-seq data analysis*

RNA raw sequencing data were quality checked using FastQC, and trimmed on both ends, based on the quality estimates for each sequence, by using the FASTX-Toolkit (http://hannonlab.cshl.edu/fastx\_toolkit). For the analysis of RNA paired directional reads we used the Tuxedo suite of programs as described by Trapnell et al. (2012). First, TopHat v2.0.10 (Trapnell et al., 2009) was used to map reads to the GTF annotation file of the most recent version of the *A. gambiae* genome, assembly AgamP4 PEST strain, available at VectorBase (https://www*.*vectorbase*.*org/). Reads were aligned using default parameters, except the option of library type set as first-strand for directional RNA-seq, and the inner mate distance parameter r, which was estimated using the CollectInsertSizeMetrics tool implemented in the Picard's tools suite (http://picard*.* sourceforge*.*net/index*.*shtml). The Cufflinks v2.1.1 package (Roberts et al., 2011), which includes Cuffcompare, Cuffmerge and Cuffdiff; was used to quantify transcript abundance in terms of Fragment Per Kilobase Million (FPKM), allowing the discovery of new transcripts. Parameters were set as default using the GTF option, bias correction, and applying the total hits norm. We then compared the new transcriptomes to existing annotated transcripts using Cuffcompare. We ran Cuffdiff first on the conserved, known transcripts, and then on the combined transcriptome, which includes both previously annotated and new transcripts, to examine tissue specific differences in gene expression between midguts and salivary glands. The clusters of differently expressed genes were visualized using the R/Bioconductor cummeRbund package (http://compbio*.*mit*.* edu/cummeRbund/). Significant expression differences between tissues was defined as having both a *P*-value *<* 0.01 and a false discovery rate (FDR) of *<*0.05.

#### *Chip-seq data analyses*

Quality metrics statistics of Illumina reads were obtained using FastQC (http://www*.*bioinformatics*.*bbsrc*.*ac*.*uk/projects/fastqc). We used Bowtie v.1.0.0 (Langmead and Salzberg, 2012) to map reads from control and ChIP libraries to a custom genome index for *A. gambiae*; built using the most recent AgamP4 reference assembly using the bowtie-*build* tool. We ran Bowtie using the m option to filter repetitive sequences. To identify significantly enriched regions for H3K27ac we performed peak calling analysis using MACS v1.4.2 (Zhang et al., 2008; Feng et al., 2012). Parameters were set to consider equal numbers of unique reads for input and ChIP samples, remove redundant reads (PCR duplicates), and a *p*-value cutoff of 1 × 10e-5. To analyze the distribution of H3K27me3 we used SICER, which is optimized for the analysis of broad ChIP-enrichment regions (Zang et al., 2009; Xu et al., 2014) characteristic for this histone modification mark. Parameters were set to a window size of 200, gap size 600 and FDR cutoff of 1 × 10e-5, and the default value for the redundant rate cutoff.

We used BEDTools2.19.0 (Quinlan and Hall, 2010) to obtain genome coverage for each histone modification mark, which is the percentage of the genome in bp that shows significant enrichment of H3K27ac and H3K27me3, as well as the intersects of H3K27ac and H3K27me3 enrichment peaks with various genomic features (promoters, TSSs and genes). For these analyses we define promoters as regions located within a distance of 200 bp upstream and downstream of the transcription start site (TSS).

To examine the association between H3K27ac and H3K27me3 profiles genome-wide, we first normalized control and ChIP samples based on read counts and calculated the difference using DeepTools (Ramírez et al., in press). The resulting enrichment files were binned into 10 bp windows covering the entire *A. gambiae* genome reference sequence. We then referenced the 10 bin normalized and subtracted ChIP enrichments to gene coordinates of the AgamP4 gene set and calculated average enrichments per gene across a region that includes the gene body scaled to 500 bp, flanked by 200 bp before the TSS and after the Transcription Termination Site (TTS), respectively. The same type of analysis was performed by referencing ChIP data to gene coordinates from the "custom" gene set generated by CuffCompare, which merges novel and conserved transcripts (see above). We then applied K-means clustering to the matrix of enrichment values for H3K27me3 to organize genes based on their similarities into 3 clusters using Cluster v3.0. Genes within each cluster were ordered by decreasing mean enrichment values. Groups of H3K27me3 enrichment were then used as anchors to order the matrix of mean values of H3K27ac and gene expression. Clustered data was then visualized as a heatmap using deepTools.

Combined analyses of ChIP and RNA-seq data were performed in R 3.1.0 (http://www*.*r-project*.*org/) using Bioconductor (http://www*.*bioconductor*.*org). We used BedTools v2.19.0 and SamTools v1.4 (http://samtools*.*sourceforge*.*net) for file manipulation and conversion. Coverage analyses on bed and bam files were performed using BedTools and DeepTools suites.

#### *Functional analysis of genes*

Functional analyses were performed on genes obtained from the clustering approach corresponding to the various enrichment profiles (see results), and that intersect with H3K27ac and H3K27me3 peaks identified by MACS or SICER. Sequences for the various gene sets were retrieved by mapping VectorBase gene IDs against the UniProtKB database (http://beta*.*uniprot*.* org/). We employed Blast2Go (http://www*.*blast2go*.*de/) to conduct Gene Ontology (GO) and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways annotations. Fisher's exact test was used to retrieve significantly enriched GO terms for genes marked with H3K27ac and H3K27me3 using Blast2Go. GO functional categories are defined as those containing at least five genes and a minimum enrichment score of 1.3 (*p*-value *<* 0.05).

#### **REAL-TIME PCR VALIDATION**

Validation by ChIP-qPCR was performed by targeting two housekeeping genes, actin5C and TER94, which typically show significant enrichment of H3K27ac in *Drosophila* (http://www*.* modencode*.*org/). For this assay we also targeted an intergenic region located 2 Kb upstream of these two genes as a negative control. Reactions were performed using SYBR Green I Master kit (Applied Biosystems) in an Applied Biosystems Lightcycler. The PCR parameters are: 1 cycle of 10 min at 95◦C, 40 cycles of 10 s at 95◦C, 10 s at 60◦C, and 20 s at 72◦C. PCR primer sequences are listed in Table S1.

#### **RESULTS**

#### **GENE EXPRESSION AND TRANSCRIPT DISCOVERY**

RNA library processing and quality metrics are shown in **Table 1**. The expression analysis on the assembly of reads obtained from *A. gambiae* female mosquito tissues, midguts and salivary glands, resulted in a total of 13,969 novel transcripts that correspond to newly discovered non-annotated loci (class code "u": Unknown, intergenic transcript); combined GTF and BED files available at GEO GSE59773) as well as transcripts categorized as class code "="correspond in most cases to extensions of previously annotated genes based on intron matches. Most novel transcripts (6907 in midguts and 6943 in salivary glands) show relative abundances, expressed as Fragments Per Kilobase of exon model per Million mapped fragments (FPKM), above the 95 percentile of expression for the distribution of transcript abundances, which we estimated correspond to a value of 0.26 for midguts and 0.56 for salivary glands (lower bound of the 95% confidence interval (CI) for the distribution of transcript abundance). Thus transcripts that show FPKM values above this threshold can be considered as reliable in terms of novel transcript discovery and gene expression differences. Blasting a subset of these transcripts against non-redundant protein databases confirms that they are novel (data not shown). This supports the notion that there is a large number of genes in the reference genome of *A. gambiae* that remain to be discovered and/or properly annotated.

The comparison between transcripts obtained from midguts and salivary glands identified a total of 23 differentially expressed genes, 23 TSS and 21 isoforms switching events (*p*-value *<* 0.01, *q*-value *<* 0.05) (**Figure 1**). We did not detect significant differences in expression when considering the combined transcriptome, which includes both previously annotated and novel genes (**Figure 1**), probably due to low abundances of novel transcripts, which represent half of the combined data set, and which may have bias quantification results. Note that these are conservative estimates based on the threshold used by Cufflinks.

#### **VALIDATION OF THE ChIP-SEQ APPROACH AND MAPPING OF H3K27ac AND H3K27me3 IN** *A. GAMBIAE*

Statistics of data processing are presented in **Table 1**. ChIP and control libraries passed all quality tests (**Figure S1**). The percentage of reads mapped to the mosquito genome was 53.70% for H3K27ac (13,300,823 reads) and 58.19% for H3K27me3 (12,832,729 reads). Of these, 13.73% for H3K27ac and 17.51% for H3K27me3 reads were identified by Bowtie as corresponding to repetitive sequences such as those typically found in transposons and centromeric or telomeric regions. The percentage of mapping is similar between the two target histones and the control. Results of the ChIP-qPCR validation are also consistent with data obtained by the ChIP-seq analysis (data not shown). Taken together, these results indicate that the ChIP-seq data provide an accurate representation of the genome-wide distribution of histone modifications analyzed in *A. gambiae* and offer proof-of-concept validation of the approach.

#### **Table 1 | Statistics of ChIP-seq and RNA-seq data processing and quality checks.**


Each symbol represents one gene that has detectable expression in either tissue. Relative differences in signal intensity along the X-axis reflect up-regulation in salivary glands when positive and up-regulation in showing the expression level distribution for all genes, isoforms, and TSS, in the two tissues. FPKM, fragments per kb of transcript per million fragments mapped.


*The depth of coverage in bp, the fraction of the chromosome covered, and the total size in bp for each chromosome and the complete genome is indicated.*

Coverage analyses of ChIP peaks regions reveal that 6.4% of the genome is occupied by H3K27ac, whereas 22% is occupied by H3K27me3. The fraction of the genome covered by each histone modification across different chromosomes is shown in **Table 2**. By applying a peak calling approach using either MACS or SICER we identified 6639 peaks for H3K27ac and 12,939 peaks for H3K27me3 after input control normalization. Of the peaks identified for H3K27ac, 70.2% (4657) intersect with annotated genes or their promoters (here defined as regions located 200 bp upstream and downstream of TSSs), whereas a small fraction of the remaining peaks (29.8% of total) correspond to intergenic regions. In the case of H3K27me3, only 41% of the peaks (5414) intersect genes or their promoters and, therefore, most peaks appear to be intergenic. When considering the custom gene set that includes the newly discovered transcripts, the percentage of peaks that overlap to genes is 6156 for H3K27ac and 7813 for H3K27me3, thus indicating that a considerable proportion of ChIP signal maps to non-annotated regions.

#### **GENOME-WIDE PROFILING OF HISTONE AND GENE TRANSCRIPTIONAL STATES**

To examine the patterns of histone acetylation and methylation genome-wide, we considered *A. gambiae* genes plus flanking regions of 200 bp upstream and downstream of the TSS and TTS, respectively. The distribution maps of the two histone modifications at genes revealed a mutually exclusive pattern of H3K27ac and H3K27me3 enrichment genome-wide (**Figure 2**). The same pattern was obtained when using the standard annotated AgamP4 or our custom gene-set as anchors. When examining the average histone modification profiles across the gene regions, results show peak maxima of H3K27ac downstream of the TSSs of genes (**Figure 2B**), whereas H3K27me3 tends to occupy broader regions that cover the entire body of genes (**Figure 2A**). K-means clustering of ChIP-seq data resulted in the classification of genomic regions into three clusters based on their H3K27me3 mean enrichment profiles. Cluster 1 genes, (C1: 12,913 genes) are depleted of H3K27me3 and contain high levels of H3K27ac. This cluster may correspond to genes that are highly transcribed. Cluster 3 genes (C3: 2021 genes) contain high levels of H3K27me3 and they are depleted of H3K27ac. Genes in this cluster probably correspond to Polycomb-silenced genes in *Drosophila*. Interestingly, a third gene cluster identified based on H3K27me3 levels, cluster 2 (C2: 11,976 genes) contains intermediate levels of H3K27me3 and are also generally depleted of H3K27ac but not as dramatically as genes in C3 (**Figure 2D**). Pairwise comparisons between clusters revealed significant differences in mean enrichment values for genes marked with H3K27ac (C1 = 0*.*320 ± 0*.*013, C2 = −0*.*100 ± 0*.*004, C3 = −0*.*198 ± 0*.*007; Kruskal-Wallis, all *p*-values *<* 0.001), or H3K27me3 (C1 = −1*.*281 ± 0*.*005, C2 = −0*.*017 ± 0*.*004, C3 = 2*.*525 ± 0*.*033; Kruskal-Wallis, all *p*-values *<* 0.001; Mean values of ChIP enrichment by gene, normalized and input subtracted). It is also noticeable from the heatmaps that at regions where H3K27me3 localize at high levels, the signal often expands toward the 3 and 5 end of the genes covering most of the flanking region. Likewise, a closer look at genes marked with H3K27ac reveals two sets of genes based on the distribution across the region: genes that have this histone modification at the promoter and coding region and a second group of intermediate enrichment where this mark is centered on promoters (**Figure 2B**).

We next asked how these various histone modification profiles relate to the transcriptional status of genes. Considering genes, annotated and novel, that intersect with H3K27ac and H3K27me peaks, we found significant differences in gene expression between genes containing each of these two histone modification (Kruskal-Wallis chi-squared = 7138.69, *df* = 1, *p*-value = 0). Comparison of histone modification maps and gene expression genome-wide reveals a preferential localization of H3K27ac in active genes and promoters whereas genes that are marked with H3K27me3 are silent or are transcribed at low levels (**Figure 2C**). Indeed, this gene expression pattern is also true in quantitative terms when considering the gene clusters defined based on the levels of enrichment or depletion of histone PTMs (C1: 1.60 ± 0.029, C2: −1*.*22 ± 0*.*03, C3: −2*.*18 ± 0*.*05; Kruskal-Wallis chi-squared C1–C2 = 4646.224, *df* = 1; C2–C3 = 238.38, *df* = 1; C1–C3 = 2097.31, *df* = 1, all *p*-values *<* 0.001; Mean gene expression by gene are calculated based on the log2 of FPKM values) (**Figure 2E**).

#### **FUNCTIONAL ANNOTATION OF HISTONE MODIFICATION PROFILES**

Genes clustered based on their histone modification profile, either marked with H3K27ac or H3K27me3, were assigned to GO accession numbers and classified into functional categories under three major classes (biological process, cellular component, and molecular function). We found multiple GO categories associated with each cluster (Table S2). Genes enriched in either H3K27ac or H3K27me3, were assigned to multiple different KEGG pathways. Of these, pathways involved in metabolism of amino acids and signaling pathways were top ranked hits for genes marked with H3K27ac or H3K27me3 (**Figure S2**). The genes associated to each histone PTM (see **Figure 2** for cluster names) also differ in the relative enrichment of GO terms (**Figure 3**, *p <* 0*.*05). Particularly, genes in cluster 1, highly enriched in H3K27ac, and cluster 2, containing intermediate levels of H3K27ac and H3K27me3, display a significant differential enrichment in GO terms associated with metabolic processes, immune, and signaling pathways. This is in marked contrast with genes in cluster 3, which display differential enrichment in GO terms related to membrane transporters and developmental processes (**Figure 3**, Table S2). One classic example of developmental genes that fall into this category and are associated with Polycomb and H3K27me3 domains in *Drosophila* are the homeobox genes (Schwartz et al., 2006) (**Figure 4A**, GO:0005634 Table S2). We then asked whether there is an association between the GO terms retrieved for genes differentially marked with H3K27ac and H3K27me3, and GO terms that have been linked to malaria infection and mosquito resistance to insecticides (Table S3). Interestingly, genes marked with these two histone modification marks include immunity related genes such as Toll, Immune Deficiency (IMD) and Janus kinase/signal transducers and activators of transcription (JAK/STAT) pathways; antimicrobial effectors such as defensins and cepropins; and metabolic genes such as the Vitellogenin gene. Examples of histone modification profiles of two of these candidate genes, which are significantly associated with enriched GO terms, are shown in **Figure 4B**, and the histone modification profiles for each candidate gene examined are shown in Table S3.

#### **DISCUSSION**

Changes in chromatin state play an important role in a number of biological processes, including responses to stress and environmental conditions. In organisms like *Drosophila* that lack DNA methylation, post-translational modifications of histones are an important layer of gene regulation (Bonasio et al., 2010; Bannister and Kouzarides, 2011). However, chromatin maps and the dynamics of epigenetic control in mosquitoes have remained to date unexplored. In order to fill this gap we describe genomewide maps of two histone modifications with key roles in gene regulation, H3K27ac and H3K27me3. To establish a link with gene expression, we then correlate the presence of these modifications in specific genes with their transcriptional state in midguts and salivary glands of *A. gambiae* females.

In *Drosophila*, various combinations of active and repressive histone marks have been found to be associated with distinct chromatin functional states (Filion et al., 2010; Roy et al., 2010; Kharchenko et al., 2011; Negre et al., 2011; Yin et al., 2011). Two

#### **FIGURE 2 | Genome-wide distribution of histone modifications.**

Distribution of **(A)** H3K27me3 and **(B)** H3K27ac with respect to gene features in *A. gambiae* midguts. The enrichment or depletion is shown relative to chromatin input. The regions in the map comprise gene bodies flanked by a segment of 200 bp at the 5 end of TSSs and TTSs. Average profiles across gene regions ±200 bp for each histone modification are shown on top. **(C)** Heatmap of RNA-seq data showing the level gene expression, as read count, profiled along the region. In

all heatmaps **(A–C)** genes were organized into 3 clusters based on their level of H3K27me3. For H3K27ac, cluster genes are ordered by descent but independent to H3K27me3. The color bars indicate the range of intensities based on ChIP enrichment, from red to blues for higher to low enrichment values. Boxplots showing the mean enrichment of H3K27ac and H3K27me3 by cluster **(D)**, and mean level of gene expression by gene cluster **(E)**. Significant pairwise comparisons are indicated by asterisks (see text).

of these histone modifications that display contrasting distribution patterns and regulatory functions include acetylation and methylation of lysine 27 of histone 3. Previous studies have shown that H3K27ac is typically enriched at enhancers, TSSs and coding regions of active genes, whereas H3K27me3 localizes to repressed loci (Tolhuis et al., 2006; Kharchenko et al., 2011; Kellner et al., 2012; Van Bortle et al., 2012; Brown and Bachtrog, in press). Results described here for the mosquito *A. gambiae* agree with

of *Drosophila* developmental genes where H3K27me3 is highly enriched and distributed in large domains that encompass both genic and intergenic regions in cells where the *Antennapedia* gene is silenced. Examples of

GO:0005515) (Tables S2, S3). The scale of the tracks is proportional to the number of sequencing reads for each histone modification. Gene expression in terms number of RNA reads mapped to each gene is also displayed.

these observations and suggest a close association between the presence of H3K27ac and H3K27me3, and the transcriptional state of genes.

Genome-wide histone maps in *A. gambiae* reveal that H3K27ac and H3K27me3 cover approximately 6 and 14% of the genome, respectively. These two genome fractions do not overlap and there is a mutually exclusive pattern of enrichment at most genes: H3K27me3-containing promoters and genes show no enrichment for H3K27ac, and, reciprocally, those containing H3K27ac are not enriched for H3K27me3 (**Figure 2**). This is in agreement with previous studies in *Drosophila*, and supports the notion that these two histone modifications also have opposite biological functions in mosquitoes. Despite a general mutual exclusive pattern in the distribution of methylated and acetylated H3K27 sites, we found some regions of overlap, including both genic and intergenic regions. A possible explanation for this pattern would be promoter bivalency, which occurs when gene promoters contain both active and repressive modifications. These promoters are considered poised for activation (or repression) at a later stage. But this pattern has been described in the context of differentiating cells or during development (Creyghton et al., 2010), and is restricted in principle to promoters containing activating histone H3K4me3 and repressive H3K27me3 marks, not H3K27ac (Voigt et al., 2013). Alternatively, the presence of the two histone modifications can be the result of heterogeneous enrichments across cells as it has been recently been shown using single-cell ChIP-seq technologies (Wang and Bodovitz, 2010). In the case of intergenic regions, co-localization of H3K27ac and H3K27me3 has been associated with the presence of enhancer elements (Creyghton et al., 2010). To validate this hypothesis in mosquitoes it would be necessary to map other histone modifications such as H3K4me1 that are diagnostic for this type of regulatory elements (Visel et al., 2009; Zhu et al., 2013).

The combined analysis of ChIP-seq and RNA-seq data in midgut cells of *A. gambiae* supports a relationship between transcription levels and profiles of histone PTMs. We find that the presence of H3K27ac coincides with actively transcribed genes, whereas H3K27me3 is mostly associated with clusters of repressed genes that show low levels of transcription. Our analyses also suggest that enrichment or depletion of H3K27ac and H3K27me3 correlates not only with the transcriptional state of genes but also with the magnitude of their expression or silencing, which is in agreement with earlier studies in *Drosophila* (Negre et al., 2011). For example, in flies and mammals it has been shown that H3K27me3 can form domains that can expand 100 kb or more and covers blocks of silent genes and intergenic regions, and expressed genes that are preferentially located in regions immediately flanking H3K27me3 BLOCs (Pauler et al., 2009; Young et al., 2011; Hou et al., 2012). Many of these domains can also contain Polycomb (Schwartz et al., 2006; Tolhuis et al., 2006; Schuettengruber et al., 2009). In this study, Polycomb-containing genes would correspond to those of cluster 3 in **Figure 2**. One example of this type is the *Hox* genes of *Drosophila*, which are active in embryonic and larval stages of the life of this organism. These *Hox* genes have their homologous in mosquitoes and they display a pattern of H3K27me3 enrichment as broad peak blocks in differentiated cells of the midgut (**Figure 4**). It has been shown in *Drosophila* that H3K27me3 can be also be present in genes in euchromatic regions as focal peaks, rather than in domains, where there is a dynamic turnover between active and repressive histone marks and where these marks participate in the fine tuning of transcription (Riddle et al., 2011; Young et al., 2011). In the case of *A. gambiae* such a distribution pattern would correspond to genes of cluster 2, which show intermediate levels of H3K27ac and H3K27me3 and medium to low transcription (**Figure 2**). Finally, genes containing H3K27ac show different enrichment levels in this histone PTM. Particularly interesting is the finding of a subset of genes that have this modification only at promoters and show intermediate levels of transcription, suggesting that RNA Polymerase II may be paused at these regions (**Figure 2**). This has been previously reported by ChIP-seq studies in flies and mammals that mapped RNApolymerase binding patterns in association with various histone PTMs and regulatory elements genome-wide (Negre et al., 2011; Adelman and Lis, 2012). These findings together support the conclusion that mechanisms of regulation of chromatin structure and function have been conserved between *Anopheles* and *Drosophila*.

The functional analysis of genes differentially enriched in histone PTMs further supports the idea that histone modification profiles correlate with gene expression in *A. gambiae.* This finding, and the similarities found with the mechanisms of epigenetic regulation described in *Drosophila*, has potentially important implications in the context of infection and the response of mosquitoes to environmental stresses. We further explored this issue by examining the enrichment in H3K27 PTMs and the transcriptional status of a set of target genes that have been previously shown to be up-regulated in response to malaria infection as well as genes involved in insecticide resistance (referenced in Table S3). Interestingly, the majority of the candidate genes examined appear to be enriched in one of the two histone modifications, and most fall within clusters 1 and 2, which suggests that they are in a dynamic on/off switch for transcriptional regulation (Table S3).

One limitation of epigenomic studies in non-model organisms is the availability of fully annotated genomes that allow high resolution mapping of epigenetic modifications and regulatory elements to their target genes. This is clearly illustrated in this study, and may be one of the reasons why analysis of epigenetic modifications has remained unexplored in mosquitoes. The analysis of the *A. gambiae* transcriptome in midguts and salivary glands using RNA-seq resulted in the identification of several thousand novel transcripts. The high discovery rate is not surprising considering that the majority of transcriptome studies carried out to date in *A. gambiae* have employed microarrays (Baker et al., 2011; Maccallum et al., 2011). It is important to note that a large fraction of regions enriched for H3K27ac and H3K27me3 map to non-annotated genes. The differential enrichment in histone modification marks and tissue specific signatures of transcription at these genes can be important in predicting potential targets in the response of *A. gambiae* to pathogens and environmental stressors.

#### **CONCLUSIONS AND FUTURE DIRECTIONS**

Here we present an integrative approach that combines genomewide profiling of histone modifications and gene expression that allows the functional interpretation of chromatin modifications as well as *de novo* prediction of regulatory elements and genes. Results support a link between changes in histone modification profiles and transcription levels in *A. gambiae*, and this includes genes that have been previously identified as having important roles in the epidemiology and control of malaria transmission. The patterns obtained are similar to those previously described in *Drosophila*, suggesting the existence of common mechanisms of chromatin regulation in *A. gambiae*. These findings open new opportunities to study the dynamics of chromatin states including other histone PTMs in various developmental stages and environmental conditions (i.e., infection or exposure to insecticides) as well as the mechanisms by which these states are controlled by regulatory elements such as enhancers and insulators.

#### **DATA ACCESS**

ChIP-seq and RNA-seq raw data as well as processed data (wig, bed, FPKM values and GTF files) are deposited in GEO database under accession number GSE59773.

#### **ACKNOWLEDGMENTS**

We thank The Genomic Services Lab at the HudsonAlpha Institute for Biotechnology for their help in preparing RNA-seq libraries and performing Illumina sequencing of RNA-seq and ChIP-Seq samples. We thank A. Nicot, M. N. Lacroix, for assistance with mosquito breeding, A. Kotlar and K. Van Bortle for advice in the data analysis. This work was supported by U.S. Public Health Service Award R02GM035463 from the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www*.*frontiersin*.*org/journal/10*.*3389/fgene*.*2014*.* 00277/abstract

#### **Figure S1 | Fingerprint plot that shows the distribution of reads, as the cumulative sum of read counts in 10 bp window bins, for H3K27ac,**

**H3K27me3, and input samples.** A tight diagonal is expected when reads are equally distributed across the genome, as is the case for the input, whereas the curve becomes more pronounced as the degree of enrichment increases and is more localized in the ChIPs relative to the control sample.

**Figure S2 | Piecharts that show the number of sequences assigned to KEGG pathways for each histone modification gene cluster (see text, Figure 2).**

#### **REFERENCES**


and specifies a histone banding pattern on a mouse autosomal chromosome. *Genome Res.* 19, 221–233. doi: 10.1101/gr.080861.108


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 June 2014; accepted: 30 July 2014; published online: 15 August 2014. Citation: Gómez-Díaz E, Rivero A, Chandre F and Corces VG (2014) Insights into the epigenomic landscape of the human malaria vector Anopheles gambiae. Front. Genet. 5:277. doi: 10.3389/fgene.2014.00277*

*This article was submitted to Epigenomics and Epigenetics, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Gómez-Díaz, Rivero, Chandre and Corces. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*