# **MOLECULAR BIOLOGY OF THE TRANSFER RNA REVISITED**

# **Topic Editor Akio Kanai**

### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2014 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-366-0 **DOI** 10.3389/978-2-88919-366-0

# **MOLECULAR BIOLOGY OF THE TRANSFER RNA REVISITED**

Topic Editor: **Akio Kanai,** Keio University, Japan

tRNA in an eye. Image created by Prof. Akio Kanai.

Transfer RNAs (tRNAs) are one of the classical non-coding RNAs, with lengths of approximately 70–100 bases. The secondary structure of tRNAs can be represented as a cloverleaf with four stems, and the three-dimensional structure as an "L" shape. Historically, the basic function of tRNA as an essential component of translation was established in the 1960s: each tRNA is charged with a target amino acid and delivers it to the ribosome during protein synthesis. However, recent data suggest that the role of tRNA in cellular regulation goes beyond this paradigm.

In most Archaea and Eukarya, precursor tRNAs are often interrupted by a short intron inserted strictly between the first and second nucleotides downstream of the anticodon, known as the canonical nucleotide position (37/38). Recently, a number of reports have described novel aspects of tRNAs in terms of gene diversity. For example, several types of disrupted tRNA genes have been reported in the Archaea and primitive Eukarya, including multiple-intron-containing tRNA genes, split tRNA genes, and permuted tRNA genes.

Our understanding of the enzymes involved in tRNA functions (e.g., aminoacyl-tRNA synthetase, tRNA splicing endonuclease, tRNA ligase) has deepened. Moreover, it is well known that tRNA possesses many types of base modifications whose enzymatic regulation remains to be fully elucidated. It has also been reported that impaired nuclear-cytoplasmic export of tRNA links DNA damage and the cell-cycle checkpoint.

Furthermore, a variety of additional functions of tRNA, beyond its translation of the genetic code, have emerged rapidly. For instance, tRNA cleavage is a conserved part of the responses to a variety of stresses in eukaryotic cells. Age-associated or tissue-specific tRNA fragmentation has also been observed. Several papers have suggested that some of these tRNA fragments might be involved in the cellular RNA interference (RNAi) system.

These exciting data have led to this call for a Research Topic that revisits and summarizes the molecular biology of tRNA. Beyond the topics outlined above, we have highlighted recent developments in bioinformatics tools and databases for tRNA analyses.

## Welcome to the new tRNA world!

### *Akio Kanai\**

*Functional RNA Group, Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan \*Correspondence: akio@sfc.keio.ac.jp*

### *Edited by:*

*William Cho, Queen Elizabeth Hospital, Hong Kong*

### *Reviewed by:*

*Naoki Shigi, National Institute of Advanced Industrial Science and Technology (AIST), Japan*

### **Keywords: transfer RNA, molecular biology, gene diversity, pre-tRNA processing, base modification, new biological functions, molecular evolution, human diseases**

Transfer RNAs (tRNAs) are one of the classical non-coding RNAs, with lengths of approximately 70–100 bases. The secondary structure of tRNAs can be represented as a cloverleaf with four stems, and the three-dimensional structure as an "L" shape. Historically, the basic function of the tRNAs as essential components of translation was established in the 1960s, when it was found that each tRNA is charged with a target amino acid by a specific aminoacyl-tRNA synthetase, and delivers it to the ribosome during protein synthesis (Crick, 1966; Normanly and Abelson, 1989; Frank, 2000). However, recent studies suggest that the roles of tRNA in cellular regulation go beyond this paradigm. Now, tRNA is recognized as a regulator of many biological processes, and several unique tRNA genes have been discovered. Our understanding of the enzymes involved in tRNA functions has also increased and many tRNA-related diseases have been reported. In response to these exciting data, I have edited this special issue of *tRNA*, which revisits and summarizes the molecular biology of tRNA. The topics contributed by specialists in the field cover a wide range of tRNA research.

In the last decade, a number of reports have described novel aspects of tRNAs in terms of the diversity of their genes. For example, several types of disrupted tRNA genes have been reported in the Archaea and primitive Eukarya. These include multiple-intron-containing tRNA genes, split tRNA genes, and permuted tRNA genes (Fujishima and Kanai, 2014; Soma, 2014). Because these tRNAs are encoded as precursor forms (pre-tRNAs) in the genome, they must be processed to yield mature functional tRNAs. Studies of tRNA introns and their processing enzymes suggest that rather complex pathways are required to generate mature tRNAs (Yoshihisa, 2014). Metazoan mitochondrial tRNA is another example of a unique tRNA, lacking either one or two arms of the typical tRNA cloverleaf structure (Watanabe et al., 2014), and transfer messenger RNA (tmRNA) is involved in *trans*-translation, the major ribosome rescue system in bacterial cells (Himeno et al., 2014). Most of these tRNA genes and a huge number of tRNA sequences from metagenomic analyses are registered in the tRNA gene databases (Abe et al., 2014).

The universal 3′-terminal CCA sequence of the tRNAs, which is required for amino acid attachment to the molecule, is synthesized by tRNA nucleotidyltransferase or the "CCA-adding enzyme." The molecular mechanism of the template-independent RNA polymerization catalyzed by the CCA-adding enzyme is discussed, based on its structural features (Tomita and Yamashita, 2014). It is well known that tRNAs contain many types of base modifications. Recent progress in our understanding of two major modifications in tRNAs, methylated nucleosides (Hori, 2014) and thionucleosides (Shigi, 2014), is reviewed and summarized.

As well as the canonical role of tRNA during protein biosynthesis, recent studies have shown that tRNA performs additional functions in regulating biochemical processes (Raina and Ibba, 2014). For example, aminoacyl-tRNA is involved in cell wall formation, protein labeling for degradation, and antibiotic biosynthesis. Moreover, tRNA cleavage is a conserved part of the responses of eukaryotic cells to various stresses. Age-associated and tissue-specific tRNA fragmentation have also been observed and several studies have suggested that some of these tRNA fragments are involved in the cellular RNA interference (RNAi) system.

Pathological mutations in tRNA genes and tRNA-related enzymes have been linked to human diseases (Abbott et al., 2014). Mutations in the mitochondrial tRNA genes, in particular, are responsible for many diseases, and aminoacyl-tRNA synthetase mutations are associated with neurological diseases. Finally, the evolution of the tRNA molecule is discussed based on the analysis of the tRNA structure (Caetano-Anolles and Sun, 2014) and ancestral ribozymes (Fujishima and Kanai, 2014).

Please enjoy reading all these articles, which will open the door to a new tRNA world!

### **ACKNOWLEDGMENTS**

I would like to acknowledge all the scientists and individuals who have supported this special issue: authors, reviewers, editors, and publishers. This work was supported in part by a Grant-in-Aid for Scientific Research (A) #26242075 from the Ministry of Education, Culture, Sports, Science and Technology of Japan (to Akio Kanai).

**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 14 August 2014; accepted: 05 September 2014; published online: 23 September 2014.*

*Citation: Kanai A (2014) Welcome to the new tRNA world! Front. Genet. 5:336. doi: 10.3389/fgene.2014.00336*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Kanai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 26 May 2014 doi: 10.3389/fgene.2014.00142

#### *Kosuke Fujishima<sup>1,2</sup>\* and Akio Kanai<sup>2</sup>\**

*<sup>1</sup> NASA Ames Research Center, Moffett Field, CA, USA*

*<sup>2</sup> Institute for Advanced Biosciences, Keio University, Tsuruoka, Japan*

### *Edited by:*

*Sanjeev Kumar Srivastava, Mitchell Cancer Institute, USA*

### *Reviewed by:*

*Sandeep Kumar, SUNY Downstate Medical Center, USA Nikhil Tyagi, Mitchell Cancer Institute, USA Kaushlendra Tripathi, Mitchell Cancer Institute, USA*

### *\*Correspondence:*

*Kosuke Fujishima, NASA Ames Research Center, Building N239, Moffett Field, CA 94035, USA e-mail: kosuke.fujishima@nasa.gov; Akio Kanai, Institute for Advanced Biosciences, Keio University, 403-1 Nipponkoku, Daihoji, Tsuruoka 997-0017, Japan e-mail: akio@sfc.keio.ac.jp*

Transfer RNA (tRNA) is widely known for its key role in decoding mRNA into protein. Despite their necessity and relatively short nucleotide sequences, a large diversity of gene structures and RNA secondary structures of pre-tRNAs and mature tRNAs has recently been discovered in the three domains of life. Growing evidence of disrupted tRNA genes in the genomes of Archaea reveals unique gene structures such as intron-containing tRNA, split tRNA, and permuted tRNA. The coding sequences for these tRNAs are either separated by introns, fragmented, or permuted at the genome level. Although the evolutionary scenario behind tRNA gene disruption is still unclear, the diversity of tRNA structure seems to have co-evolved with the tRNA processing enzyme, the so-called RNA splicing endonuclease. Metazoan mitochondrial tRNAs (mtRNAs) are known for their unique lack of either one or two arms of the typical tRNA cloverleaf structure, while still maintaining functionality. Recently identified nematode-specific V-arm-containing tRNAs (nev-tRNAs) possess long variable arms that are specific to eukaryotic class II tRNA<sup>Ser</sup> and tRNA<sup>Leu</sup> but also decode class I tRNA codons. Moreover, many tRNA-like sequences have been found in the genomes of different organisms and viruses. This review therefore aims to cover the latest knowledge on tRNA gene diversity and to recapitulate the evolutionary and biological aspects that gave rise to such uniqueness.

**Keywords: intron-containing tRNA, split tRNA, permuted tRNA, nev-tRNA, armless tRNA, RNA splicing endonuclease, co-evolution**

### **INTRODUCTION**

Transfer RNA (tRNA) is a short non-coding RNA of approximately 70–100 bases. The principal function of tRNA is its involvement in the translation machinery: each tRNA is charged with its corresponding amino acid and delivers it to the ribosome during protein biosynthesis. All known living organisms possess tRNA molecules, so tRNA is regarded as one of the most classical RNA molecules found in nature and is essential to the core biological system. Recent developments in genomics and transcriptomics have revealed many non-canonical tRNA genes and structures in the three domains of life. These include tRNA gene disruption, fragmentation, rearrangement, minimization, and re-coding (Kanai, 2013). It is likely that such diversification is a consequence of the co-evolution of tRNA and its processing enzyme, the splicing endonuclease (Tocchini-Valentini et al., 2005a). The RNA splicing endonuclease in Archaea recognizes and cleaves a structural motif consisting of two three-nucleotide bulge loops separated by four base pairs, known as the bulge-helix-bulge (BHB) motif (Thompson and Daniels, 1990). The structural motifs and locations of tRNA introns in Archaea have been affected by changes in the recognition and activity of the different unit compositions of the tRNA splicing endonucleases (Fujishima et al., 2011). Some types of disrupted tRNA genes, such as split tRNA, are considered potential analogs of early tRNA, which has sparked intense debate in the field of tRNA evolution (Randau and Söll, 2008; Di Giulio, 2012). While it is still unclear how tRNA molecules originated, evolutionary biologists continue to ask how these chains of ribonucleotides became involved in protein synthesis and how they influenced the evolution of these biological systems. When in the course of molecular evolution the tRNA molecule and its characteristic cloverleaf structure emerged is still debated; however, several evolutionary models representing the origin and convergence of proto-tRNA have been proposed (Weiner and Maizels, 1999; Widmann et al., 2005; Sun and Caetano-Anollés, 2007). In this review, we recapitulate the characteristics of modern tRNA gene diversity, summarize the co-evolutionary scenario of tRNAs and their processing enzymes, and present different models for the origin and evolution of early tRNA.
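To make the BHB definition above concrete, the following Python sketch scans a secondary structure written in dot-bracket notation for a strict BHB motif: two three-nucleotide bulges on opposite strands separated by exactly four stacked base pairs. The function names, the dot-bracket input format, and the toy structure are illustrative assumptions and are not taken from SPLITS, tRNAscan-SE, or any other published tool. The sketch accepts either staggering of the two bulges and does not handle relaxed motifs.

```python
def pair_table(db: str) -> dict[int, int]:
    """Map each paired index to its partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at index %d" % i)
            j = stack.pop()
            pairs[i], pairs[j] = j, i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '('")
    return pairs


def find_strict_bhb(db: str) -> list[tuple[int, int]]:
    """Return 0-based (bulge5_start, bulge3_start) for each strict BHB found."""
    pairs = pair_table(db)
    hits = []
    for i, j in pairs.items():
        # (i, j) is the last base pair of the outer helix; the motif needs
        # room for two 3-nt bulges, 4 bp, and one inner base pair.
        if i > j or j - i < 17:
            continue
        # Two possible staggerings, each given as
        # (5' bulge start, helix 5' start, helix 3' end, 3' bulge start).
        layouts = (
            (i + 1, i + 4, j - 1, j - 7),
            (i + 5, i + 1, j - 4, j - 3),
        )
        for b5, h5, h3, b3 in layouts:
            bulges_ok = db[b5:b5 + 3] == "..." and db[b3:b3 + 3] == "..."
            helix_ok = all(pairs.get(h5 + k) == h3 - k for k in range(4))
            inner_ok = pairs.get(i + 8) == j - 8
            if bulges_ok and helix_ok and inner_ok:
                hits.append((b5, b3))
    return sorted(hits)


# Hypothetical toy structure: outer helix, 4-bp helix, 5' bulge, inner stem-loop, 3' bulge.
toy_structure = "((((((...((....))))))...))"
bhb_hits = find_strict_bhb(toy_structure)
```

Working on dot-bracket strings keeps the example independent of any folding program; in practice the structure would have to come from a prediction or alignment step that is outside the scope of this sketch.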

### **tRNA GENE DIVERSITY IN THE THREE DOMAINS OF LIFE**

In the current era of large-scale genomics, a large number of complete genome sequences are available: 2615 Bacteria, 166 Archaea, 171 Eukaryotes, and 3490 viruses (March 2014; https://www.ebi.ac.uk/genomes/). tRNA genes have long been predicted computationally using the classical software tRNAscan-SE, which identifies 99–100% of canonical cloverleaf structures and introns at the anticodon loop with very few false positives (Lowe and Eddy, 1997). However, in 2005, the first example of a *trans*-spliced tRNA encoded on two separate genes, the so-called split tRNA, was identified (Randau et al., 2005a). The two tRNA halves are bound by complementary leader sequences that form the characteristic BHB motif at the tRNA exon-intron boundary. In an attempt to find structurally disrupted tRNA genes, our team developed the tRNA prediction software SPLITS and SPLITSX (Sugahara et al., 2006, 2007), which focus on finding multiple-intron-containing tRNAs and split tRNAs through detection and removal of BHB motifs at the genome level. For similar reasons, an improved version of tRNAscan-SE was recently released (Chan et al., 2011). Based on these powerful computational approaches, a variety of non-standard tRNA genes have been found in all three domains of life and in organelles (**Table 1**). We also summarize the tRNA gene diversity on the tree of life (**Figure 1**). The domain Archaea provides many of the examples of non-standard tRNAs, such as multiple-intron-containing tRNAs (Sugahara et al., 2008), split tRNAs (Randau et al., 2005a; Chan et al., 2011), tri-split tRNAs (Fujishima et al., 2009), and permuted tRNAs (Chan et al., 2011). Permuted tRNAs are also found in the nuclear and nucleomorph genomes of early-diverged eukaryotic algae (Soma et al., 2007; Maruyama et al., 2010). A separate case is the nematode-specific V-arm-containing tRNAs (nev-tRNAs), which decode an alternative genetic code and are found in the phylum Nematoda (Hamashima et al., 2012). Many metazoan mitochondrial tRNAs are known for their armless structures, lacking either one or, in extreme cases, both of their arms (Ohtsuki et al., 2002; Masta and Boore, 2008). In contrast, bacterial tRNA genes are surprisingly uniform and lack structural diversity. The only significant example found so far is the insertion of group I and II introns, which are predominantly positioned adjacent to the anticodon, between nucleotides 37 and 38 of the precursor tRNA (37/38) (Paquin et al., 1997; Vogel and Hess, 2001). This location is also known as the "canonical" intron insertion site for enzymatically spliced introns in the tRNA precursors of Archaea and Eukaryotes (Abelson et al., 1998).

### **INTRON-CONTAINING tRNA**

tRNA splicing events occur in all three domains: Bacteria, Archaea, and Eukaryotes. In bacteria, tRNA introns are self-splicing group I introns found mostly in cyanobacteria and in a few alpha- and betaproteobacteria (Reinhold-Hurek and Shub, 1992; Tanner and Cech, 1996). There are currently two known group I intron families in cyanobacteria. One is found in tRNA<sup>fMet</sup> genes and was recently gained by lateral gene transfer. The other has a more ancestral origin and is found at exactly the same position in tRNA<sup>Leu</sup>(UAA) genes in both cyanobacteria and plastids, the organelles that originated from endosymbiotic cyanobacteria (Paquin et al., 1997). The typical secondary structure of a group I intron consists of approximately 10 helical elements, with roughly 100 nucleotides forming the central catalytic core of the intron RNA that facilitates the splicing reaction (Haugen et al., 2005). The majority of the bacterial and plastid tRNA group I introns are located at position 37/38, immediately downstream of the anticodon. Interestingly, the same feature is observed for eukaryotic and archaeal tRNA introns, which are relatively short (less than 100 nt) and are spliced by protein enzymes called splicing endonucleases (Abelson et al., 1998). Eukaryotic and archaeal tRNA splicing share a similar RNA motif recognized by orthologous splicing endonucleases, and thus the origin of tRNA introns at the canonical position is assumed to be very ancient (Kanai, 2013). In contrast, recent bioinformatics studies have revealed a number of archaeal introns located at non-canonical positions, and even some tRNA genes that harbor multiple introns (Sugahara et al., 2006, 2009). There is a clear trend showing that multiple-intron-containing tRNAs are prominent in the archaeal order Thermoproteales. In the most extreme cases, more than half of the tRNA genes are interrupted by multiple introns (Sugahara et al., 2008). Comprehensive sequence comparison of introns among seven Thermoproteales species clearly shows that similar intron sequences are found among diverse tRNA species, indicating that large-scale intron transposition occurred within this archaeal order and could have contributed to the rapid gain of introns (Fujishima et al., 2010).
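The canonical 37/38 insertion site described above can be illustrated with a short Python sketch that removes an intron sitting between the first and second nucleotides downstream of the anticodon. The function name, its parameters, and the toy sequence are hypothetical; the sketch assumes the anticodon index and intron length are already known and works on linear string indices rather than on a full canonical tRNA numbering.

```python
def splice_canonical_intron(pre_trna: str, anticodon_start: int, intron_len: int) -> tuple[str, str]:
    """Remove an intron inserted between canonical positions 37 and 38.

    anticodon_start is the 0-based index of the first anticodon base
    (canonical position 34), so position 37 sits at anticodon_start + 3
    and the intron begins at anticodon_start + 4.
    """
    cut = anticodon_start + 4
    if intron_len < 0 or cut + intron_len > len(pre_trna):
        raise ValueError("intron does not fit inside the precursor")
    intron = pre_trna[cut:cut + intron_len]
    mature = pre_trna[:cut] + pre_trna[cut + intron_len:]
    return mature, intron


# Made-up toy precursor: 5' exon (ending with anticodon GAA plus position 37),
# a 5-nt intron, then the 3' exon.
toy_precursor = "GGGCCC" + "GAA" + "A" + "UUUUU" + "GGUUCC"
mature_trna, removed_intron = splice_canonical_intron(toy_precursor, anticodon_start=6, intron_len=5)
```

Introns at non-canonical positions, such as those found in Thermoproteales, would need an explicit cut coordinate instead of one derived from the anticodon.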

### **SPLIT tRNA AND TRI-SPLIT tRNA**

The first example of a *trans*-spliced tRNA, termed "split tRNA," was reported in 2005 by Dieter Söll's group from the highly reduced genome of the ultrasmall nanoarchaeal parasite *Nanoarchaeum equitans* (Randau et al., 2005a). In total, 11 fragmented tRNA genes encoding 5′- or 3′-tRNA halves were identified through expression and sequencing analysis. This included a unique case in which one 3′-tRNA<sup>Glu</sup> half was shared by two 5′-tRNA<sup>Glu</sup> halves (Randau et al., 2005b). The tRNA halves are joined in *trans* through annealing of complementary leader sequences, which form a relaxed BHB motif at the leader-exon boundary that is processed by the tRNA splicing endonuclease. Because of its unique features, split tRNA was first considered a by-product of genome size reduction. However, the recently sequenced Nanoarchaeota Nst1 genome contains no split tRNA genes, indicating that the splitting event is a specific feature of *N. equitans* and seems to reflect ongoing genome rearrangement (Podar et al., 2013). In 2009, our group found a new set of split tRNA genes in the genome of a free-living hyperthermophilic crenarchaeon, *Caldivirga maquilingensis*, without any obvious trace of genome rearrangement or reduction. We also found a new type of split tRNA that consists of up to three different RNA pieces, which we named "tri-split tRNA" (Fujishima et al., 2009). Interestingly, some of the genes that carry the anticodon sequence can be swapped with other genes to create synonymous tRNAs, just like a jigsaw puzzle. **Figure 2** summarizes all the combinations of primary split/tri-split tRNA transcripts that form mature tRNAs in *C. maquilingensis*.

Split tRNA genes have also been discovered in four archaeal species belonging to the Desulfurococcales branch using an improved version of tRNAscan-SE (Chan et al., 2011). To date, a total of 29 *trans*-spliced tRNA genes corresponding to 12 anticodons have been identified in seven hyperthermophilic archaeal genomes (**Table 1**). The ligation sites of split tRNA genes vary; however, they generally overlap with the frequent intron insertion sites of Archaea (Fujishima et al., 2010). In Section Split tRNA and Intron-Containing tRNA Early or Late?, we discuss some of the hypotheses that may explain the origin and evolution of these non-contiguous tRNAs.

### **PERMUTED tRNA**

Permuted tRNA is another form of disrupted tRNA gene in which the 3′ half of the tRNA lies upstream of the 5′ half. It was first discovered in the nuclear genome of the unicellular red alga *Cyanidioschyzon merolae* (Soma et al., 2007). Expression analysis revealed that the BHB motif is formed at the termini of the permuted tRNA precursor, which is then spliced and ligated into a characteristic circular tRNA intermediate. This circular intermediate is further processed at the acceptor stem, possibly by RNase P and tRNase Z, for maturation (Soma et al., 2007). SPLITS has contributed to finding permuted tRNAs not only in the genome of *C. merolae* but also in other diverse photosynthetic eukaryotes, including Chlorophytes, a clade of unicellular green algae, and the nucleomorph genome of the green alga *Bigelowiella natans* (Maruyama et al., 2010). A nucleomorph is a vestige of a primitive algal nucleus retained through a process known as secondary endosymbiosis. The lack of permuted tRNAs in other known nucleomorph genomes and the patchy distribution of permuted tRNA species among unicellular eukaryotes support the gain and loss of permuted tRNAs during the evolution of red and green algae (Maruyama et al., 2010). In 2011, the first examples of archaeal permuted tRNA genes were found in the genome of the crenarchaeon *Thermofilum pendens*, expanding the realm of permuted tRNAs from eukaryotes to Archaea (Chan et al., 2011). Given the accumulating phylogenomic evidence supporting the eocyte hypothesis, in which eukaryotes originate within the archaeal tree as a sister group of Crenarchaeota (Cox et al., 2008; Williams et al., 2013), we expect that precise phylogenetic studies of permuted tRNAs and their splicing enzymes may further strengthen this hypothesis.

**Table 1 | List of various types of tRNA genes found in the three domains of life.**
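The permuted gene arrangement described in this section can be sketched as a circular permutation of a string. The Python example below is a simplified model under stated assumptions: the precursor has already lost its BHB-forming termini, its layout is 3′ half, intervening sequence, 5′ half, and the lengths of the pieces are known. The function name and the toy sequences are hypothetical and do not represent any real tRNA gene.

```python
def mature_from_permuted(precursor: str, three_half_len: int, linker_len: int) -> str:
    """Rebuild the mature order from a permuted precursor.

    Assumed layout (5'->3'): [3' half][intervening sequence][5' half].
    """
    if three_half_len < 0 or linker_len < 0 or three_half_len + linker_len > len(precursor):
        raise ValueError("segment lengths do not fit the precursor")
    # Step 1: ligating the termini gives a circle; the string stands in for it.
    circle = precursor
    # Step 2: open the circle at the 5' end of the 5' half (RNase P-like cut).
    start = three_half_len + linker_len
    linear = circle[start:] + circle[:start]
    # Step 3: trim the intervening sequence from the 3' end (tRNase Z-like cut).
    return linear[:len(linear) - linker_len]


# Made-up toy halves and intervening sequence.
five_half = "GCGGAUUUAGCUC"
three_half = "GAGCGCCACCA"
intervening = "uuaacc"
toy_permuted = three_half + intervening + five_half
mature_sequence = mature_from_permuted(toy_permuted, len(three_half), len(intervening))
```

Modeling the intermediate as a rotated string mirrors the order of events given in the text (ligation into a circle, then processing at the acceptor stem), although the assignment of each cut to a specific enzyme is only described as possible in the source.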

### **CO-EVOLUTION OF SPLICING ENDONUCLEASE AND tRNA**

So far, the RNA splicing endonuclease *endA* family is the only enzyme family known to process the characteristic BHB motif found in precursor sequences of tRNA (Marck and Grosjean, 2003; Sugahara et al., 2007), rRNA (Tang et al., 2002), and mRNA (Yokobori et al., 2009) in Archaea, as well as to remove tRNA introns in eukaryotes. **Figure 3** represents the diversity and evolutionary events that possibly occurred during the course of *endA* gene evolution. Currently, four different types of archaeal splicing endonucleases have been identified, along with their crystal structures. Both the homotetrameric α<sub>4</sub> type and the homodimeric α<sub>2</sub> type in Euryarchaeota are capable of cleaving only canonical BHB motifs (Li et al., 1998; Li and Abelson, 2000). The heterotetrameric (αβ)<sub>2</sub> type is mostly found in Crenarchaeota (Tocchini-Valentini et al., 2005b), with a few exceptions found in the nanoarchaeon *N. equitans* and the euryarchaeon *Methanopyrus kandleri* (Randau et al., 2005c). Recently, a fourth type of endonuclease, a unique three-unit homodimeric ε<sub>2</sub> type, was found in the ultrasmall acidophilic archaea ARMAN-1 and ARMAN-2 (Fujishima et al., 2011; Hirata et al., 2012), and it has been shown to cleave non-canonical introns (BHL and hBH) inserted at various positions of tRNA (Tocchini-Valentini et al., 2005a; Fujishima et al., 2011). Indeed, in the crenarchaeal order Thermoproteales, large numbers of tRNA introns seem to have accumulated rapidly, generating multiple-intron-containing tRNAs with a maximum of three introns. This phenomenon strongly suggests that a change of splicing endonuclease type directly influences tRNA gene structure (Sugahara et al., 2008). In the (αβ)<sub>2</sub> and ε<sub>2</sub> type endonucleases, insertions of specific loops were observed, and structural analysis indicates that both loops are essential for recognition of the relaxed BHB splicing motif found at the boundaries of tRNA exons and non-canonical introns (Hirata et al., 2012). These results clearly show an evolutionary path of the splicing endonuclease toward acquiring broad substrate specificity, which drove co-evolution with tRNA genes to accept introns at various positions. Based on sequence evidence and the phylogeny of subunits, we recently proposed a novel homodimeric α<sub>2</sub> type of archaeal endA found specifically in Korarchaeota, which shares the same "specific loop" with the (αβ)<sub>2</sub> type, known to be essential for the cleavage of non-canonical tRNA introns (Fujishima et al., 2011). Indeed, *Korarchaeum cryptofilum* is the only archaeon with an α<sub>2</sub> endonuclease that harbors tRNA genes with non-canonical introns, representing another example of tRNA-*endA* co-evolution. In eukaryotes, the tRNA splicing endonuclease consists of four subunits, Sen2p, Sen15p, Sen34p, and Sen54p, of which Sen2p and Sen34p share clear sequence homology with archaeal endA (Trotta et al., 1997). The function of eukaryotic endA has been well studied in yeast, where the splicing endonuclease strictly cleaves only tRNA introns located at position 37/38, just after the anticodon (Reyes and Abelson, 1988). A highly conserved 50-aa core sequence shared between the two domains of life suggests a monophyletic origin of the RNA splicing endonuclease as a single-subunit protein forming a homotetramer, the α<sub>4</sub> type, which emerged before the divergence of Archaea and eukaryotes (Abelson et al., 1998). While protein crystal structures of the four types of archaeal endonucleases have been solved, so far only one of the four eukaryotic endA subunits, Sen15p, has been crystallized, from the human homolog (Song and Markley, 2007), leaving the overall arrangement of the eukaryotic tRNA splicing endonuclease open to speculation. A two-hybrid experiment has previously shown that, *in vivo*, interactions occur specifically between Sen2p and Sen54p and between Sen34p and Sen15p, leading to a heterotetrameric enzyme model with Sen54p functioning as a ruler to measure the distance from the tRNA mature domain (Trotta et al., 1997). However, the genomes of the primitive red alga *C. merolae* and of green algae harbor highly disrupted tRNA genes, including permuted tRNAs and multiple-intron-containing tRNAs. Only three homologs, cmSen2p, cmSen34p, and cmSen54p, have been identified in the *C. merolae* genome (Tocchini-Valentini and Tocchini-Valentini, 2012). More recently, a yeast two-hybrid experiment revealed a reciprocal interaction between cmSen2p and cmSen54p; however, cmSen34p did not interact with either of the two subunits, suggesting a complex formation and splicing machinery distinct from the canonical eukaryotic heterotetramer (Soma et al., 2013).

**FIGURE 3 | Diversification of splicing endonuclease family and their RNA substrate specificity. (A)** Box representation of endA protein unit structure. **(B)** Comparison of strict (left) and relaxed (right) forms of BHB motifs. **(C)** Possible evolutionary path of the *endA* protein family based on sequence, phylogenetic (Randau et al., 2005c; Tocchini-Valentini et al., 2005b; Fujishima et al., 2011), and molecular studies (Calvin and Li, 2008; Yoshinari et al., 2009; Hirata et al., 2012; Soma et al., 2013). Unit/subunit architectures of five known types [archaeal α<sub>4</sub>, α<sub>2</sub>, (αβ)<sub>2</sub>, ε<sub>2</sub>, and eukaryotic αβγδ] and two hypothetical types (korarchaeal α<sub>2</sub> with specific loop, and unknown *C. merolae* endonuclease) are shown, with colors indicating the phylogenetic relationship.

### **NEV-tRNA (NEMATODE-SPECIFIC V-ARM-CONTAINING tRNAs)**

Mature tRNAs can be grouped into two distinct classes (I and II) based on the presence of the long variable arm (V-arm) located between the anticodon arm and the T-arm (Brennan and Sundaralingam, 1976). This V-arm is known to be specific to tRNASer, tRNALeu, and bacterial tRNATyr, and it plays an important role in recognition by the cognate aminoacyl tRNA synthetase (aaRS) (Tocchini-Valentini et al., 2000). However, a comprehensive analysis of 46 diverse eukaryotic genomes revealed over 100 novel class II tRNAs with non-canonical anticodons in six nematode genomes, termed nematode-specific V-arm-containing tRNAs (nev-tRNAs) (Hamashima et al., 2012). Comparative sequence analysis of nev-tRNAs has shown that the aaRS recognition elements in the V-arm are similar to those of class II tRNAs. Indeed, an aminoacylation assay and an *in vivo* translation experiment have shown that nev-tRNAGly can decode the Gly (GGG) codon as leucine. The expression levels of two nev-tRNAs, nev-tRNAGly(CCC) and nev-tRNAIle(UAU), turned out to be very low compared with those of canonical tRNAs in *C. elegans*. Moreover, the codons to which nev-tRNAs correspond are mostly rare codons that comprise less than 20% of the synonymous codons, suggesting that nev-tRNAs do not drastically influence the proteome (Hamashima et al., 2012). Recently, we have identified additional nev-tRNAs in two plant-parasitic nematodes belonging to the genus *Meloidogyne*. Similar anticodon alterations have been reported in other organisms, including primates, *Drosophila*, yeast, and Enterobacteria, in which half of the cases involved switching the codon identity (Rogers and Griffiths-Jones, 2014). In extreme cases, an anticodon shift results in an alternative genetic code in certain species, such as the stop codons UGA or UAA being reassigned to various sense codons in *Mycoplasma*, certain ciliated protozoans, and peritrich species (Hamashima and Kanai, 2013).
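The codon assignments mentioned above follow from antiparallel codon-anticodon pairing. As a minimal sketch, assuming strict Watson-Crick pairing only (no wobble pairing or modified bases), the helper below derives the codon read by each of the two nev-tRNA anticodons named in the text and looks it up in a two-entry excerpt of the standard genetic code. The function name `cognate_codon` is introduced here purely for illustration.

```python
# Watson-Crick complements for RNA bases (assumption: no wobble, no modified bases).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Excerpt of the standard genetic code, limited to the two codons discussed here.
STANDARD_CODE = {"GGG": "Gly", "AUA": "Ile"}


def cognate_codon(anticodon: str) -> str:
    """Return the codon (5'->3') that pairs with an anticodon written 5'->3'."""
    # Pairing is antiparallel, so the codon is the reverse complement of the anticodon.
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


if __name__ == "__main__":
    for anticodon in ("CCC", "UAU"):
        codon = cognate_codon(anticodon)
        meaning = STANDARD_CODE[codon]
        print(f"anticodon {anticodon} pairs with codon {codon} ({meaning} in the standard code)")
```

The nev-tRNA names thus reflect the standard meaning of the codon each anticodon reads; according to the text, nev-tRNAGly(CCC) nonetheless decodes GGG as leucine.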

### **MITOCHONDRIAL ARMLESS tRNA**

Mitochondria are essential organelles in eukaryotes, generating most of the cell's energy supply as ATP. It is generally accepted that mitochondria originated from within the bacterial phylum α-Proteobacteria through an endosymbiotic event (Gray, 2012). With few exceptions, animal mitochondrial genomes carry over 20 tRNA genes that are distinct from their host cell tRNAs (Boore, 1999). However, some nucleus-encoded tRNAs have to be imported from the cytosol to complete mitochondrial translation (Salinas et al., 2012). Mitochondria have also developed their own variant genetic codes from the universal genetic code, repeatedly and independently in various eukaryotic taxa (Knight et al., 2001). Currently, a compilation of over 30,525 mitochondrial tRNA (mt tRNA) sequences from 1418 fully sequenced metazoan mitochondrial RefSeq genomes is registered in the mitotRNAdb (http://mttrna.bioinf.uni-leipzig.de) (Jühling et al., 2009). Most mt tRNAs possess a canonical cloverleaf structure; however, extreme examples of truncated mt tRNAs have been identified in some metazoan mitochondria (Ohtsuki et al., 2002; Masta and Boore, 2008). D-armless and T-armless mt tRNAs were first identified in two nematode worms, *C. elegans* and *Ascaris suum* (Okimoto and Wolstenholme, 1990). Later, a significant number of T-armless mt tRNAs were found in six different eumetazoan phyla, including Nematodes and Arthropods (Ohtsuki and Watanabe, 2007). In contrast, mammalian mitochondria possess only cloverleaf and D-armless tRNAs, and so far truncated mt tRNAs have not been observed in either plants or fungi, suggesting that D-armless tRNA first emerged after the branching of metazoa and that T-armless mt tRNA arose more recently in independent branches of eumetazoa (Ohtsuki and Watanabe, 2007). It has been speculated that the main cause of such diversification is the subfunctionalization of elongation factor EF-Tu. In *C. elegans*, mitochondrial DNA encodes two EF-Tu homologs, EF-Tu1 and EF-Tu2, which exclusively recognize aminoacylated T-armless and D-armless mt tRNAs, respectively (Arita et al., 2006). Given that in mammalian mitochondria cloverleaf and D-armless tRNAs are recognized by a single mt EF-Tu that resembles the bacterial type (Andersen et al., 2000), gene duplication of EF-Tu leading to subfunctionalization of EF-Tu1 to recognize T-armless mt tRNA is likely to have contributed to the co-evolution of mt tRNA species, allowing their extreme truncation (Arita et al., 2006).

### **ORIGIN AND EVOLUTION OF tRNA—MOLECULAR EVIDENCE AND EVOLUTIONARY SCENARIO**

tRNA is unique in its capability to bridge information from a nucleotide polymer (RNA) to an amino acid polymer (protein). Decoding of mRNA information by tRNA is governed by triplet codon-anticodon base pairing, and each sense codon corresponds to one of the 20 standard proteinogenic amino acids that all living organisms share. These amino acids are correctly charged onto the 3′-terminal adenosine of the tRNA molecule by aaRSs. Currently, two evolutionarily unrelated classes of tRNA synthetases (Class I and Class II) are known (Wolf et al., 1999), and, surprisingly, for many of these aaRSs the anticodon arm is not necessary for aminoacylation: they can charge short RNA minihelices and duplexes that include the minimal 12 bp acceptor arm-TΨC stem-loop, known as the top half of tRNA (Francklyn and Schimmel, 1989, 1990). This finding led to the theory of an "operational code," in which, at the early stage of tRNA evolution, identity was embedded solely in a simple minihelix RNA that later became the acceptor stem of tRNA (Schimmel et al., 1993). Hence, in this chapter, we will feature major findings and theories that lead to a plausible scenario for the origin and evolution of the tRNA molecule.

### **MINIMAL AMINOACYL RIBOZYME**

The essentiality of tRNA lies in its core function of 2′/3′ aminoacylation. Modern aaRSs catalyze this reaction in two steps: (1) the amino acid is activated by ATP, forming an aminoacyl adenylate (aa-AMP) intermediate; and (2) the aminoacyl group is transferred from the adenylate to the 3′-terminal adenosine of the tRNA. However, modern aaRSs are themselves products of aminoacylation-based translation, and thus researchers have been seeking a more primitive chemical path that eventually led to this sophisticated machinery. For example, it has been shown that high-energy aminoacyl adenylates can be formed from ATP and amino acids under prebiotic conditions (Paecht-Horowitz and Katchalsky, 1973). Furthermore, an *in vitro* evolution experiment succeeded in selecting a ribozyme capable of self-aminoacylating its own 5′-hydroxyl group and transferring the aminoacyl group to the 3′-end of another RNA, supporting the idea that aminoacyl-tRNA synthetase ribozymes played an important role in the RNA world (Lee et al., 2000). The most extreme example of an aminoacylating ribozyme to date is a 5-nt ribozyme discovered through radical minimization of the self-aminoacylating C3 ribozyme (Turk et al., 2010); this ultrasmall ribozyme initially trans-phenylalanylates a complementary 4-nt RNA selectively (**Figure 4A**). The transfer of the phenylalanyl group occurs regiospecifically to a ribose 2′-hydroxyl, forming an ester bond between the amino acid and the RNA that is identical to that of modern aminoacylated tRNA. This discovery implies that short ribonucleotides can aminoacylate their counterparts to produce peptidyl-RNA, which can be interpreted as the minimal form of tRNA that could have originated in the very early stage of life.
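The two-step reaction catalyzed by modern aaRSs can be written schematically as follows. The release of pyrophosphate (PP<sub>i</sub>) in the activation step and of AMP in the transfer step is standard biochemistry rather than something stated explicitly above, and "aa" simply denotes a generic amino acid.

```latex
\begin{align}
\text{aa} + \text{ATP} &\longrightarrow \text{aa-AMP} + \text{PP}_i
  && \text{(1) activation} \\
\text{aa-AMP} + \text{tRNA} &\longrightarrow \text{aa-tRNA} + \text{AMP}
  && \text{(2) transfer} \\
\text{aa} + \text{ATP} + \text{tRNA} &\longrightarrow \text{aa-tRNA} + \text{AMP} + \text{PP}_i
  && \text{(net)}
\end{align}
```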

### **MINIHELIX MODEL**

One plausible model for the early stage of tRNA evolution is the minihelix, or minigene, which forms a hairpin structure (**Figure 4B**). This concept comes from the fact that the L-shaped tertiary structure can be divided into two halves, of which only the top half (acceptor stem + TΨC arm) is recognized by many modern tRNA synthetases (Schimmel and Ribas de Pouplana, 1995). The identity of tRNA is still embedded within the acceptor stem, generally depending on the discriminator base N73 and a few base pairs downstream, termed the operational RNA code (Schimmel et al., 1993). Together with the discovery of the minimal self-aminoacylating ribozymes presented in the previous section, this suggests that aminoacylated minihelices originated in an environment where short RNA oligomers and amino acids co-existed, prior to the emergence of the ribosome. Tamura and Schimmel demonstrated that an RNA minihelix with D-ribose chirality exhibited a clear preference for charging L-amino acids over D-amino acids (Tamura and Schimmel, 2004), indicating that aminoacylation of RNA could have been the key to modern protein homochirality. Consequently, ribozyme-based aminoacylation was eventually replaced by more efficient and less promiscuous homochiral protein-based aminoacylation.

### **GENOMIC-TAG HYPOTHESIS AND tRNA-LIKE STRUCTURE**

The top half of modern tRNAs is recognized by RNase P, the CCA-adding enzyme, and tRNA synthetases, which are all related to the maturation of tRNA molecules that function in protein synthesis. Various positive-strand RNA viruses of plants possess tRNA-like structures (TLSs) at the 3′-end of their genome sequences that are functional mimics of tRNA (Dreher, 2009) (**Figure 4C**). They fall into three types, mimicking the tRNA function of specific aminoacylation by valine, histidine, or tyrosine at the 3′-end of a pseudoknotted aminoacyl acceptor stem (Pleij et al., 1985). Surprisingly, their roles are diverse: tRNA mimicry provides translational enhancement, presentation of minus-strand promoter elements for RNA replicase recognition, recruitment of the host CCA nucleotidyltransferase as a 3′-telomere, and *in vitro* packaging of the viral genome (Dreher, 2010). A similar use of TLSs as replicator elements can be seen in most retroviral RNA genomes and long terminal repeat (LTR) retrotransposons, in which annealing of the top half of a tRNA to the primer binding site initiates reverse transcription (Le Grice, 2003). The growing evidence of tRNA elements involved in RNA and DNA replication has therefore led to the "Genomic Tag Hypothesis," which proposes that tRNA-like structural motifs initially evolved as a 3′-terminal motif that tagged RNA genomes for replication in the RNA world before the advent of protein synthesis (Weiner and Maizels, 1999).


### **DOUBLE HELIX MODEL**

From a thermodynamic perspective, the modern cloverleaf structure of tRNA is not the minimum free energy state but rather a local minimum that is reinforced by multiple post-transcriptional modifications (Wuchty et al., 1999). Similarities between nucleotides at comparable positions within the 5′ and 3′ halves of the tRNA molecule have often been taken as evidence that the modern cloverleaf structure arose through direct duplication of a hairpin (Tanaka and Kikuchi, 2001; Widmann et al., 2005) (**Figure 4D**). Indeed, one of the suboptimal structures of tRNA is a double helix that closely resembles a duplicated minihelix, and this structural feature was confirmed by *in vitro* cleavage of tRNA using RNase P (Hori et al., 2000). One possible scenario is that, after the duplication event, the top half with the 3′-CCA terminus continued its role as a recognition element, while the bottom half underwent subfunctionalization to provide anticodon and sequence diversity and coevolve with variants of tRNA synthetases.


### **SPLIT tRNA AND INTRON-CONTAINING tRNA: EARLY OR LATE?**


The symmetric form of tRNA can also be considered a consequence of two separate hairpin RNAs fusing to form the modern tRNA structure (**Figure 4E**). The observation that introns are universally found at position 37/38, just after the anticodon, suggests a scenario in which introns are a trace of the two assembled hairpin RNA genes (Di Giulio, 2006). The difference from the duplicated double helix model is that the later-added bottom half of the tRNA (D-arm + anticodon) could be evolutionarily distinct from the top half. We have previously suggested the possibility of separate origins of the 5′ and 3′ tRNA halves based on sequence similarity and diversity in archaeal tRNAs (Fujishima et al., 2008). Furthermore, phylogenetic studies of both structural and statistical characteristics of over 500 tRNA molecules suggest that the bottom half of the tRNA was added later (between 3 and 4 billion years ago) to the ancient top half, which grew by slow substructural accretion (Sun and Caetano-Anollés, 2007). Currently, split tRNA genes that separately encode the 5′ and 3′ halves of tRNAs are found in several archaeal genomes (see Section Split tRNA and Tri-Split tRNA for detail). Their split positions and flanking sequences resemble the introns in related species, indicating a strong evolutionary link between split tRNAs and intron-containing tRNAs (Fujishima et al., 2008, 2009). It is likely that the split tRNAs currently seen in several archaeal genomes are a recently acquired trait. For example, the recently sequenced genome of Nst1, the first terrestrial hyperthermophilic member of the Nanoarchaeota, possesses not even a single split tRNA gene (Podar et al., 2013). Interestingly, while *N. equitans* possesses the (αβ)<sub>2</sub>-type endonuclease capable of processing non-canonical introns and split tRNAs, the Nst1 genome encodes an α<sub>4</sub> type that can cleave only canonical introns. 
Accordingly, acquisition of the (αβ)<sub>2</sub> type may have allowed tRNA gene fragmentation to occur and persist in the *N. equitans* genome (Podar et al., 2013). Indeed, we have recently found evidence of ongoing tRNA gene disruption in a community genomic library prepared from a *Caldiarchaeum subterraneum*-dominated microbial mat (Sugahara et al., 2012). A heterogenic clone library revealed a 1.8 kb DNA insertion, encoding a putative DNA recombinase, that separates a tRNA fragment from the tRNA gene, a feature that is frequently found at the integration sites of mobile elements such as conjugative plasmids and viruses. Randau and Söll earlier suggested that split tRNA genes present a strategy for impeding viral integration or the insertion of other mobile elements into canonical tRNA genes (Randau and Söll, 2008). A similar conclusion was reached by Chan et al., based on the fact that the 5′ and 3′ halves of pre-tRNAAsp(GUC) are located adjacent to each other in two related archaea, *A. pernix* and *T. aggregans*, indicating two different local genome rearrangement events that occurred at the same position in the same tRNA (Chan et al., 2011). On the other hand, Di Giulio interpreted this evidence from a different perspective: the split genes currently seen in Archaea may represent a transition stage through which the evolution of the tRNA molecule has passed (Di Giulio, 2009). He also explains that interruptions at non-canonical positions in some split tRNAs are expected and are compatible with the split-early model, in which assembly of the two hairpin-like structures may also have produced structures to tinker with the cloverleaf structure (Di Giulio, 2012).

### **CONCLUSIONS**

In this review, we have recapitulated the diversity of gene orientation and RNA structures of modern tRNAs in the three domains of life, organelles, and viruses. We also focused on the co-evolution of tRNAs and their splicing endonucleases, and discussed how subfunctionalization of the enzyme could shape tRNA gene arrangement by allowing tRNA genes to accept introns at various positions as well as allowing gene fragmentation and permutation. The origin of the tRNA intron is still under debate: the intron located at the canonical position 37/38 is conserved between Archaea and Eukaryotes and thus likely represents an ancestral trait, while non-canonical introns were likely added recently owing to changes in the functionality of the RNA splicing endonuclease. The canonical positioning is consistent with the hypothesis that modern tRNA arose from a simple minihelix RNA through gene duplication/fusion. There is also growing evidence that viruses or transposable elements were involved in the recent addition of tRNA introns. It will be important to collect snapshots of tRNA intron insertion events through sequence analysis to reveal the precise molecular mechanism. Lastly, the charm of tRNA research lies in its potential to shape the genetic code. There are examples of re-coding in mitochondria and higher eukaryotes, possibly driven by tRNA gene multiplication, which tends to target rare codons and so avoids an impact on the proteome. Further accumulation of genomic and transcriptomic sequence data will likely provide further examples of modern tRNA diversity as well as insight into the origin and evolution of this essential molecule, which is deeply involved in the rise of the modern genetic system.

### **ACKNOWLEDGMENTS**

We thank Dr. Akira Hirata (Ehime University, Japan) for providing cartoon structures for splicing endonuclease. We thank Ryan Kent (NASA Ames Research Center, USA) for proofreading this manuscript. Finally we thank all the members of the RNA group at the Institute for Advanced Biosciences, Keio University, Japan, for their insightful discussions.

### **REFERENCES**


into the evolution of fragmented tRNAs in archaea. *Proc. Natl. Acad. Sci. U.S.A.* 106, 2683–2687. doi: 10.1073/pnas.0808246106


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 April 2014; paper pending published: 14 April 2014; accepted: 28 April 2014; published online: 26 May 2014.*

*Citation: Fujishima K and Kanai A (2014) tRNA gene diversity in the three domains of life. Front. Genet. 5:142. doi: 10.3389/fgene.2014.00142*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Fujishima and Kanai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 01 April 2014 doi: 10.3389/fgene.2014.00063

### *Akiko Soma\**

*Graduate School of Horticulture, Chiba University, Matsudo, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Akio Kanai, Keio University, Japan; Patricia Chan, University of California Santa Cruz, USA*

### *\*Correspondence:*

*Akiko Soma, Graduate School of Horticulture, Chiba University, 648 Matsudo, Chiba 271-8510, Japan e-mail: soma@chiba-u.jp*

A number of genome analyses and searches using programs that focus on the RNA-specific bulge-helix-bulge (BHB) motif have uncovered a wide variety of disrupted tRNA genes. The results of these analyses have shown that genetic information encoding functional RNAs is described in the genome cryptically and is retrieved using various strategies. One such strategy is represented by circularly permuted tRNA genes, in which the sequences encoding the 5′-half and 3′-half of the specific tRNA are separated and inverted on the genome. Biochemical analyses have defined a processing pathway in which the termini of tRNA precursors (pre-tRNAs) are ligated to form a characteristic circular RNA intermediate, which is then cleaved at the acceptor-stem to generate the typical cloverleaf structure with functional termini. The sequences adjacent to the processing site located between the 3′-half and the 5′-half of pre-tRNAs potentially form a BHB motif, which is the dominant recognition site for the tRNA-intron splicing endonuclease, suggesting that circularization of pre-tRNAs depends on the splicing machinery. Some permuted tRNAs contain a BHB-mediated intron in their 5′- or 3′-half, meaning that removal of an intron, as well as swapping of the 5′- and 3′-halves, are required during maturation of their pre-tRNAs. To date, 34 permuted tRNA genes have been identified from six species of unicellular algae and one archaeon. Although their physiological significance and mechanism of development remain unclear, the splicing system of BHB motifs seems to have played a key role in the formation of permuted tRNA genes. In this review, current knowledge of circularly permuted tRNA genes is presented and some unanswered questions regarding these species are discussed.

**Keywords: tRNA gene, circular gene permutation, BHB motif, tRNA-splicing endonuclease, intron**

### **INTRODUCTION**

The cloverleaf structure of a single polynucleotide tRNA molecule is universally conserved among organisms. However, tRNA genes are often divided into parts on the chromosome; in bacteria, archaea, eukarya, and organelles, several tRNA genes are interrupted by various types of introns, which are removed by RNA splicing after transcription (Thompson and Daniels, 1988; Kjems et al., 1989; Westaway and Abelson, 1995; Belfort and Weiner, 1997; Marck and Grosjean, 2002, 2003; Jühling et al., 2009; Abe et al., 2011). Introns in nuclear and archaeal tRNAs are generally cleaved by tRNA-splicing endonuclease (Reyes and Abelson, 1988; Abelson et al., 1998; Calvin and Li, 2008), while those in eubacteria and organelle tRNAs are encoded as self-splicing group I or II introns (Kuhsel et al., 1990; Xu et al., 1990a; Reinhold-Hurek and Shub, 1992; Biniszkiewicz et al., 1994; Jacquier, 1996; Bonen and Vogel, 2001). In addition to these well-known *cis*-spliced tRNA genes (intron-containing tRNAs), recently developed software has enabled the identification of additional distinct types of disrupted tRNA genes. Use of the Split-tRNA-Search (Randau et al., 2005a), SPLITS and SPLITSX (Sugahara et al., 2006, 2007) packages, in combination with the widely used tRNAscan-SE program (Lowe and Eddy, 1997), has led to the discovery of a variety of disrupted tRNA genes from the archaeal lineage, such as *trans-*spliced tRNAs (split tRNAs) that are joined at several positions in the cloverleaf structure (Randau et al., 2005a,b; Fujishima et al., 2009; Chan et al., 2011) and *cis-*spliced tRNAs containing one or multiple introns at non-canonical positions (Sugahara et al., 2008, 2009; Chan and Lowe, 2009). 
These newly identified tRNAs commonly harbor a characteristic bulge-helix-bulge (BHB) motif, which comprises two 3-nucleotide bulges separated by a single 4-base pair stem and was originally identified around the intron-exon junction of eukaryal and archaeal tRNAs (Thompson and Daniels, 1988; Kjems et al., 1989; Belfort and Weiner, 1997; Fabbri et al., 1998; Marck and Grosjean, 2003).
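The strict BHB geometry described above (two 3-nucleotide bulges separated by a 4-base pair helix) can be expressed as a simple check. The following is a toy sketch only and is not the algorithm used by SPLITS, SPLITSX, or tRNAscan-SE: the function name, the four-argument representation, the example sequences, and the decision to accept G-U pairs in the helix are all assumptions made for illustration.

```python
# Base pairs accepted in the central helix
# (assumption: Watson-Crick pairs plus G-U wobble pairs).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_strict_bhb(bulge_a: str, helix_a: str, helix_b: str, bulge_b: str) -> bool:
    """Check the strict bulge-helix-bulge geometry: 3-nt bulge, 4-bp helix, 3-nt bulge.

    helix_a and helix_b are the two strands of the central helix, both written 5'->3'.
    bulge_a and bulge_b are the unpaired nucleotides flanking the helix on opposite strands.
    """
    if len(bulge_a) != 3 or len(bulge_b) != 3:
        return False
    if len(helix_a) != 4 or len(helix_b) != 4:
        return False
    # Strands pair antiparallel: the first base of helix_a faces the last base of helix_b.
    return all((x, y) in PAIRS for x, y in zip(helix_a, reversed(helix_b)))


if __name__ == "__main__":
    # Made-up sequences, not taken from any real pre-tRNA.
    print(is_strict_bhb("AAU", "GCAU", "AUGC", "UGA"))  # strict geometry satisfied
    print(is_strict_bhb("AU", "GCAU", "AUGC", "UGA"))   # 2-nt bulge, so not strict
```

A relaxed (hBH or BHB-like) motif would fail this strict test, which is one reason dedicated search tools are needed for non-canonical cases.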

Nuclear tRNA introns are generally short, comprise a relaxed form of the BHB motif denoted as a hBH or BHB-like (BHL) motif, and are located exclusively between positions 37 and 38 (37/38), which is 3′ adjacent to the anticodon (the canonical position) (Marck and Grosjean, 2002, 2003; Jühling et al., 2009). This limited location of the BHB motif in the cloverleaf structure is crucial for the precise recognition of precursor tRNAs (pre-tRNAs) by eukaryal tRNA-splicing endonucleases (Greer et al., 1987; Reyes and Abelson, 1988; Westaway and Abelson, 1995; Di Nicola Negri et al., 1997; Trotta et al., 1997, 2006; Xue et al., 2006; Calvin and Li, 2008). However, recent analyses of the nuclear and nucleomorph genomes of unicellular algae identified a number of non-canonical BHB-mediated disrupted tRNA genes, including circularly permuted and atypical intron-containing genes (Kawach et al., 2005; Soma et al., 2007, 2013; Landweber, 2007; Maruyama et al., 2010; Chan et al., 2011). Analysis of the processing intermediates of permuted tRNAs revealed a new strategy for post-transcriptional processing of genetic information by inversion of RNA fragments and relocation of the termini via circularization of pre-RNA molecules (Soma et al., 2007, 2013; Maruyama et al., 2010). A further analysis also identified permuted tRNA genes in an archaeal lineage (Chan et al., 2011), highlighting the considerable diversity and wide distribution of tRNA gene disruption among organisms. While BHB motifs and the tRNA-intron splicing system must have been a prerequisite for the development of permuted tRNA genes (Soma et al., 2007; Sugahara et al., 2009; Maruyama et al., 2010; Tocchini-Valentini and Tocchini-Valentini, 2012; Kanai, 2013), their detailed mechanisms and physiological relevance remain unclear.

Here, the structure, expression, and phylogeny of circularly permuted tRNA genes are summarized. Discussions of their possible physiological relevance and method of development in correlation with the tRNA expression system and other disrupted non-coding RNA genes are also provided.

### **IDENTIFICATION AND DISTRIBUTION OF CIRCULARLY PERMUTED tRNA GENES**

Circularly permuted tRNA genes were initially identified in the nuclear genome of *Cyanidioschyzon merolae* 10D (Soma et al., 2007), an ultra-small unicellular red alga that inhabits an extreme environment (pH 1–3, 40–50°C) and represents one of the most ancestral forms of eukaryote (Kuroiwa, 1998; Nozaki et al., 2003, 2007; Matsuzaki et al., 2004). A primary search of the complete 16.5 Mbp *C. merolae* nuclear genome sequence was performed using tRNAscan-SE, the most well-known and widely used software, which identifies tRNA genes without introns or with introns canonically located at position 37/38 in the anticodon-loop (Lowe and Eddy, 1997). This analysis identified a total of 30 predicted tRNA genes, which is insufficient to decode the 61 sense codons utilized in the nuclear genome of *C. merolae* (Matsuzaki et al., 2004). Therefore, to discover unidentified *C. merolae* nuclear tRNA genes, a genome-wide analysis was performed using the SPLITS and SPLITSX programs (Sugahara et al., 2006, 2007), which were developed to detect tRNA genes harboring BHB motifs, including *cis*-spliced tRNAs with introns inserted at various positions and split tRNAs that are joined at several positions in the cloverleaf structure. In addition, a BLAST search of conserved sequences in the TΨC-arm or anticodon-arm was also performed. This approach identified a total of 43 tRNA genes for 42 anticodons, which is sufficient to decode the 61 codons (Soma et al., 2007, 2013).

Notably, for 11 of the 43 tRNA genes identified in *C. merolae*, the sequence encoding the 3′-half of the tRNA is positioned upstream of the sequence encoding the 5′-half in the genome (**Figure 1A**), and the two halves are interrupted by an intervening sequence that corresponds to the boundary connecting the 5′- and 3′-ends of the acceptor-stem of a mature tRNA. This arrangement is termed the circular gene permutation model (Heinonen et al., 1987; Pan et al., 1991; Keiler et al., 2000); hence, these genes were named "circularly permuted tRNA genes" (Soma et al., 2007). The study by Soma et al. (2007) was the first report of the existence of permuted genes encoding tRNAs or eukaryal nuclear-encoded non-coding RNAs. A TATA-like sequence was identified within the region 50 bp upstream of the 3′-half of most of the permuted tRNA genes as well as the non-permuted tRNA genes in *C. merolae* (Matsuzaki et al., 2004; Soma et al., 2013), indicating its importance for the transcription of tRNA genes. A T-stretch corresponding to a termination signal for RNA polymerase III (RNAPIII) was identified downstream of the 5′-half of these genes (Sprague, 1995; Hamada et al., 2000; Nielsen et al., 2013), but no promoter or termination signals were identified in the intervening region between the 3′- and 5′-halves, which varies in length from 7 to 74 nucleotides (**Table 1**). These observations suggest that the 3′- and 5′-halves of the putative tRNA genes are transcribed as a linear RNA. The exon sequences of both the permuted and non-permuted *C. merolae* tRNA genes show ordinary characteristics of eukaryal tRNAs and contain consensus elements found in eukaryotic tRNAs (Marck and Grosjean, 2002; Jühling et al., 2009), including U8, the R15:Y48 tertiary base pairing, G18G19, and U33; in addition, the U54U55C56 for elongator tRNAs and the A54U55C56 for initiator tRNAMet are also conserved. The 3′-terminal CCA sequence, to which an amino acid is conjugated, is not encoded in the *C. merolae* genome, as found in other eukaryotes.

As shown in **Figure 1B**, *C. merolae* permuted tRNA genes can be classified into four types (I–IV) based on the location of the junction between the 3′-end of the 5′-half and the 5′-end of the 3′-half in the inferred secondary structures of the pre-tRNAs. The junctions are located at position 20/21 in the D-loop (type I), position 37/38 in the anticodon-loop (type II), position 50/51 in the TΨC-stem (type III), or position 59/60 in the TΨC-loop (type IV). In *C. merolae*, one type I, six type II, one type III, and three type IV candidate tRNAs have been identified (**Figure 1C**, **Table 1**). The sequences adjacent to the junctions in the pre-tRNAs are predicted to form a BHB motif that is generally found around the intron-exon junctions of nuclear and archaeal pre-tRNAs (**Figure 1C**).
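The permuted gene arrangement and the circularize-then-cleave maturation route described above amount to a rotation of the sequence. The sketch below illustrates this with a randomly generated toy sequence. It rests on several simplifying assumptions: the function names are hypothetical, positions are treated as plain 1-based indices rather than the standard tRNA numbering of Marck and Grosjean (2002), leader and trailer sequences and introns are ignored, and the 7-nt intervening sequence is an arbitrary placeholder.

```python
import random

# Junction positions (last nucleotide of the 5'-half) for the four C. merolae types.
JUNCTION_BY_TYPE = {"I": 20, "II": 37, "III": 50, "IV": 59}


def permuted_layout(mature: str, junction: int, intervening: str) -> str:
    """Lay out a permuted gene: 3'-half, intervening sequence, then 5'-half."""
    five_half, three_half = mature[:junction], mature[junction:]
    return three_half + intervening + five_half


def mature_from_permuted(pre: str, three_len: int, intervening_len: int) -> str:
    """Recover the mature order by circularizing, then excising the intervening part."""
    # Joining the termini makes the molecule circular; reading the circle from the
    # start of the 5'-half gives 5'-half + 3'-half + intervening sequence.
    start = three_len + intervening_len
    circle_read = pre[start:] + pre[:start]
    # Cleavage at the acceptor stem removes the intervening sequence.
    return circle_read[: len(pre) - intervening_len]


if __name__ == "__main__":
    rng = random.Random(0)
    toy = "".join(rng.choice("ACGU") for _ in range(72))  # toy sequence, not a real tRNA
    for name, junction in JUNCTION_BY_TYPE.items():
        pre = permuted_layout(toy, junction, "nnnnnnn")
        restored = mature_from_permuted(pre, len(toy) - junction, 7)
        print(f"type {name}: mature order restored = {restored == toy}")
```

Whatever the junction position, the same two operations restore the mature order, which is consistent with a single BHB-dependent pathway handling all four types.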

To date, 34 permuted tRNA genes have been identified in unicellular algae and archaea (**Table 1**), including 11 genes from the nuclear genome of the red alga *C. merolae* (Soma et al., 2007); 19 genes from the nuclear genomes of four green algae (*Ostreococcus lucimarinus*, *Ostreococcus tauri*, *Micromonas pusilla*, and *Micromonas* sp. RCC299) (Maruyama et al., 2010); two genes from the nucleomorph genome of the chlorarachniophyte alga *Bigelowiella natans* (Maruyama et al., 2010), which is a remnant of the nucleus of a green alga that became a secondary endosymbiont (Douglas et al., 2001; Archibald, 2007; Archibald and Lane, 2009); and two genes from the genome of the crenarchaeon *Thermofilum pendens* (Chan et al., 2011). In the nucleomorph and the nucleus of green algae, the junctions of the permuted pre-tRNAs are located most commonly at position 37/38 in the anticodon-loop (type II), while they are located at position 59/60 in the TΨC-loop (type IV) in archaea (**Figure 2A**, **Table 1**). This tendency contrasts with that in the red alga *C. merolae*, in which the junctions are found at various positions in the cloverleaf structure. The intervening sequence varies from 1 to 74 bp among organisms (**Table 1**); tRNATyr(GTA) from *T. pendens* contains the shortest intervening sequence, while tRNAiMet(CAT) from *C. merolae* contains the longest intervening sequence. The amino acid and anticodon species of the tRNAs encoded by permuted genes are not conserved among organisms. Interestingly, permuted tRNAiMet(CAT) exists in each lineage of red algae, green algae, and crenarchaea. In addition, tRNASer, tRNALeu, and tRNATyr, which are classified as class II tRNAs and have long variable-arms (Rich and Rajbhandary, 1976; Dirheimer et al., 1995), tend to be encoded as permuted genes. This observation may imply that the evolution of the long variable-arm, which is the dominant element required for recognition by corresponding aminoacyl-tRNA synthetases (Asahara et al., 1993; Himeno et al., 1997a; Soma et al., 1999), is correlated with that of the tRNA gene structure. Indeed, the long variable-arm is suggested to have arisen from an intron (Kjems et al., 1989). Further analyses of the sequences of disrupted tRNAs will aid identification of the types of tRNA genes that tend to be permuted.

5′-half (blue) and the 5′-end of the 3′-half (red) in the secondary structure. **(C)** Inferred secondary structures of pre-tRNAs representing the four types of permuted tRNA genes in *C. merolae*. The arrowheads indicate the positions to be processed. The intron sequence is shown in lower case. The tRNA positions are numbered according to Marck and Grosjean (2002). The figures are partially identical to Figure 1 of Soma et al. (2007).

**Table 1 | Characteristics of permuted tRNA genes in unicellular algae and archaea.**

*Each gene is identified by the amino acid with which the tRNA is charged. Classification of permuted tRNAs (types I–V) is based on the location of the junction between the 5′-half and the 3′-half in the inferred secondary structure for pre-tRNAs. The length (bp, base pair) of the sequence encoding the intervening sequence and the presence of an intron are indicated. In C. merolae, two distinct genes that produce identical mature tRNA sequences encode tRNAGly(CCC). iMet and eMet mean initiator and elongator tRNAMet(CAT), respectively. The asterisk indicates that the sequence of this tRNA is atypical, and experimental analysis would be preferable.*

and 21 means that the HBh' is from the junction of the *C. merolae* permuted tRNALeu(UAA), which contains an intron at 37/38. **(B)** The BHB

helices is denoted as "hBH" or "HBh'." The term "no H" represents motifs that do not contain a central 4-bp helix.

### **TRANSCRIPTION OF PERMUTED tRNA GENES**

Northern blotting and aminoacylation analyses of *C. merolae* total RNA verified that tRNA molecules expressed from permuted genes are aminoacylated and are thus likely to participate in protein synthesis (Soma et al., 2007). Expression of some permuted tRNA genes from the nucleomorph and the nucleus of green algae has also been confirmed by northern blotting or reverse transcription polymerase chain reaction analyses (Maruyama et al., 2010). However, the function of mature tRNAs in the nucleomorph is unclear because protein synthesis in these structures has not yet been observed experimentally (Archibald and Lane, 2009; Curtis et al., 2012). The two permuted tRNAs in the archaeon *T. pendens* are both encoded by single-copy genes for a unique anticodon that cannot be supplemented by other isoacceptors (Chan et al., 2011); therefore, they must be expressed and produce functional tRNA molecules.

The identification of unusual (permuted and atypical intron-containing) tRNA genes in eukaryotes raised an intriguing question about the mechanism of transcription. In eukaryotes, transcription of tRNA genes is generally performed by RNAPIII and relies on an intragenic bipartite promoter consisting of an A box and a B box, which correspond to the highly conserved sequences in the D-arm (positions 8–19) and TΨC-arm (positions 52–62), respectively (Galli et al., 1981; Ciliberto et al., 1983; Sprague, 1995; Guffanti et al., 2006; Marck et al., 2006). The protein factors that bind to these motifs have been well characterized in yeast (Willis, 1993; Paule and White, 2000; Geiduschek and Kassavetis, 2001; Huang and Maraia, 2001; Schramm and Hernandez, 2002). Transcription factor IIIC (TFIIIC), a multi-subunit complex that is essential for transcription by RNAPIII, binds to the A and B boxes simultaneously and promotes binding of the TFIIIB complex, which includes the TATA-box binding protein, to the region upstream of the tRNA sequence, followed by recruitment of RNAPIII. The dependency of transcription of tRNA genes on the A and B boxes is predominantly conserved; however, the additional requirements for transcription are diverse among organisms and the upstream region sometimes contributes to the efficiency of the initiation step (Choisne et al., 1998; Yukawa et al., 2000; Hamada et al., 2001; Giuliodori et al., 2003; Dieci et al., 2006).

In permuted tRNA genes, the A box and B box are located in inverted order and are interrupted by an intervening sequence of variable length. This positional relationship is unsuitable for TFIIIC binding; therefore, the A and B boxes may not be uniformly bound by TFIIIC and the intragenic promoter may be dispensable for transcription of these genes. Instead, an upstream TATA-like sequence and a downstream T-stretch, which are probably the promoter and termination signal, respectively (Sprague, 1995; Hamada et al., 2000, 2001; Nielsen et al., 2013), are located close to most permuted tRNA genes in *C. merolae* (Soma et al., 2007). This genomic arrangement also occurs for non-permuted tRNA genes in *C. merolae*, and the A and B boxes in the promoters of these genes may not be recognized by TFIIIC because they are often interrupted by a single or multiple (up to three) introns of various lengths (11–69 bp) (Matsuzaki et al., 2004; Soma et al., 2013). Homologs of TFC1 and TFC3, the TFIIIC components that are responsible for binding to the A and B boxes, have not been identified in *C. merolae* (Matsuzaki et al., 2004; Nozaki et al., 2007). Taken together, these findings suggest that *C. merolae* employs a non-canonical transcription system that is independent of TFIIIC and directs recruitment of TFIIIB to the upstream TATA-box, thereby enabling the transcription of various types of tRNA genes. An ambiguous AT-rich region is also located upstream of some permuted tRNA-encoding sequences in the *B. natans* nucleomorph and the nucleus of green algae (Maruyama et al., 2010). Therefore, TATA-like sequence-dependent transcription of tRNA genes may predominate in algae. This possibility is supported by the fact that an upstream TATA box is well conserved and functionally important for transcription of tRNA genes in some plants and fungi (Choisne et al., 1997, 1998; Yukawa et al., 2000; Hamada et al., 2001; Dieci et al., 2006). In addition, transcription of *Saccharomyces cerevisiae* tRNA genes harboring an upstream TATA box proceeds without TFIIIC *in vitro* (Dieci et al., 2000).
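
The inverted arrangement of the A and B boxes can be made concrete with a small coordinate sketch. The Python below is illustrative only: it assumes the standard 76-position tRNA numbering, a junction at 37/38, and a 10-bp intervening sequence, and the helper names `gene_offset` and `box_span` are hypothetical. It maps the A-box (positions 8–19) and B-box (positions 52–62) boundaries onto a permuted gene, in which the 3′-half is encoded first.

```python
# Illustrative sketch: where the A and B boxes fall in a permuted tRNA gene.
# Assumptions (not taken from the source data): 76-position standard
# numbering, junction at 37/38, 10-bp intervening sequence.

A_BOX = (8, 19)    # D-arm positions cited in the text
B_BOX = (52, 62)   # TPsiC-arm positions cited in the text


def gene_offset(pos, junction, intervening_len, trna_len=76):
    """Return the 0-based offset of tRNA position `pos` in a permuted gene.

    Gene order is: 3'-half (junction+1..trna_len), intervening sequence,
    5'-half (1..junction).
    """
    if not 1 <= pos <= trna_len:
        raise ValueError("position outside the tRNA body")
    if pos > junction:                      # position lies in the 3'-half
        return pos - junction - 1
    three_half_len = trna_len - junction    # position lies in the 5'-half
    return three_half_len + intervening_len + pos - 1


def box_span(box, junction, intervening_len):
    """Map a (start, end) box in tRNA numbering to gene offsets."""
    return tuple(gene_offset(p, junction, intervening_len) for p in box)


a_span = box_span(A_BOX, junction=37, intervening_len=10)
b_span = box_span(B_BOX, junction=37, intervening_len=10)
print("A box offsets:", a_span)
print("B box offsets:", b_span)
print("B box precedes A box:", b_span[1] < a_span[0])
```

With these illustrative values the B box lies upstream of the A box, the reverse of the order in a canonical gene. A junction inside either box (for example 59/60) would additionally split that box, which this sketch does not model.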

In archaea, transcription of a stable RNA depends on the upstream promoter including BRE (TFB response element) and TATA box (Wich et al., 1986; Thomm and Wich, 1988; Palmer and Daniels, 1995; Reeve, 2003), and on a downstream poly T sequence, which contributes to transcription termination (Santangelo et al., 2009). In *T. pendens*, which harbors two permuted tRNA genes, a predicted AT-rich promoter is located upstream of most of its tRNA genes (Chan et al., 2011), suggesting that various types of tRNA genes are potentially expressed. Consistent with this notion, *T. pendens* contains a large number of tRNA genes that are disrupted by various introns (Sugahara et al., 2009; Chan et al., 2011; Fujishima et al., 2011).

### **MATURATION OF PERMUTED PRE-tRNAs VIA A CIRCULAR RNA INTERMEDIATE**

Processing of a pre-tRNA typically involves intron splicing, maturation of the 5′- and 3′-ends at the acceptor stem, and nucleotide modification (**Figure 3A**) (Deutscher, 1995; Hopper and Phizicky, 2003). Biochemical analyses have shown that permuted pre-tRNAs in unicellular algae are matured by a processing pathway that utilizes a circular RNA intermediate to exchange the location of the 5′- and 3′-halves of the tRNA (Soma et al., 2007, 2013; Maruyama et al., 2010). Reverse transcription polymerase chain reaction and sequencing analyses identified the following processing intermediates derived from algal permuted tRNAs: a circularly permuted pre-tRNA, the sequence of which aligns in the order of the leader sequence, the 3′-half of tRNA, the intervening sequence, the 5′-half of tRNA, and then the trailer sequence; and a circular RNA intermediate, in which the leader and trailer sequences are removed and the resulting ends are ligated, while the intervening sequence is retained. Furthermore, a consistent PCR product was also observed in these analyses, suggesting that two rounds of reverse transcription occur around a circular intermediate, thereby confirming the presence of the circular RNA molecule. Terminal sequences were also verified for a mature tRNA, in which the extra sequences are removed and the CCA sequence is added post-transcriptionally to the 3′-terminus of the acceptor-stem, as occurs in other eukaryotes.

As summarized in the model presented in **Figure 3B**, maturation of permuted pre-tRNAs in algal cells probably starts with processing of the junction of the termini to form a circular RNA intermediate in which the termini are joined by the intervening sequence. The intervening sequence in the acceptor-loop of the circular RNA intermediate is then removed, possibly by RNase P (Altman et al., 1995; Jarrous and Gopalan, 2010; Altman, 2011) and tRNase Z (Deutscher, 1995; Schürer et al., 2001; Schiffer et al., 2002; Späth et al., 2007), which are universal endoribonucleases. Finally, the 3′-terminal CCA sequence is added (Weiner, 2004) to generate the functional acceptor-stem of the tRNA. Because the circular RNA intermediate has been detected in red and green algae, this model is likely to be common to permuted tRNAs of both types of algae.
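
The three steps of this model can be traced on a toy sequence. The Python below is a minimal sketch under stated assumptions, not a description of the enzymology: the sequences are invented, the class and function names (`PermutedPreTRNA`, `circularize`, `open_at_acceptor_loop`, `add_cca`) are hypothetical, and the circular intermediate is represented as a string whose last base is understood to be joined to its first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PermutedPreTRNA:
    """Toy permuted pre-tRNA; all sequences are invented placeholders."""
    leader: str
    three_half: str     # 3'-half of the tRNA, encoded first in the gene
    intervening: str    # links the end of the 3'-half to the start of the 5'-half
    five_half: str      # 5'-half of the tRNA, encoded second in the gene
    trailer: str

    def transcript(self):
        """Primary transcript in gene order."""
        return (self.leader + self.three_half + self.intervening
                + self.five_half + self.trailer)


def circularize(pre):
    """Step 1: remove leader and trailer and ligate the resulting ends.

    The circle is written starting at the intervening sequence, so the
    string reads: intervening, 5'-half, 3'-half (then back to the start).
    """
    return pre.intervening + pre.five_half + pre.three_half


def open_at_acceptor_loop(circle, intervening_len):
    """Step 2: excise the intervening sequence, which linearizes the circle."""
    return circle[intervening_len:]


def add_cca(trna):
    """Step 3: post-transcriptional CCA addition (algal model in the text)."""
    return trna + "CCA"


pre = PermutedPreTRNA(
    leader="AU",
    three_half="UUCGAAUCC",
    intervening="acgu",
    five_half="GGAUUCGAA",
    trailer="UUUU",
)
circle = circularize(pre)
mature_trna = add_cca(open_at_acceptor_loop(circle, len(pre.intervening)))
print(pre.transcript())
print(circle)
print(mature_trna)
```

The point of the sketch is the order swap: the transcript carries the 3′-half before the 5′-half, whereas the product after circularization and reopening carries the 5′-half first.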

Cleavage of the leader and trailer sequences at the junction of permuted pre-tRNAs is most likely performed by the tRNA-intron splicing machinery, because the sequences adjoining the processing sites potentially form a BHB motif, which is the dominant recognition element for nuclear and archaeal tRNA-splicing endonucleases (**Figures 1C**, **2B**). After excision of the BHB motifs at the junction, subsequent ligation of the 5′- and 3′-termini of the exons is required and is probably carried out by tRNA-splicing ligase (Xu et al., 1990b; Westaway and Abelson, 1995; Englert et al., 2011, 2012; Popow et al., 2011). It is intriguing that various positions in the cloverleaf structure of tRNAs, even the core region of the L-shaped tertiary structure, can serve as termini for permuted pre-tRNA molecules that are recognized by the splicing machinery.

RNase P (McClain et al., 1987; Christian et al., 2002; Zahler et al., 2003; Kirsebom, 2007; Reiter et al., 2010; Altman, 2011) and tRNase Z (Nashimoto et al., 1999; Li de la Sierra-Gallay et al., 2006; Späth et al., 2007; Minagawa et al., 2008) generally recognize the top half of the L-shaped tertiary structure of a tRNA corresponding to the acceptor-stem and the TΨC-arm, and do not require the mature body of the tRNA. Therefore, these enzymes may also perform endonucleolytic cleavage of the acceptor-loop of a circular RNA intermediate. Although some endoribonucleases require the linear ends of substrates to function (Mackie, 1998; Suzuki et al., 2006), it is not known whether this condition holds for RNase P and tRNase Z. The intron in the D- and TΨC-arm, which inhibits folding of the tertiary structure of a tRNA, should be removed before processing of the acceptor-loop by RNase P and tRNase Z. Consistent with this requirement, the intron in the TΨC-loop of a circular intermediate of *C. merolae* tRNAGly, which harbors both intronic and permuted structures, is removed before the intervening sequence at the acceptor-loop is processed (Soma et al., 2013). This finding can be explained by the fact that the top half of substrates for *C. merolae* tRNase Z must form a canonical tertiary structure, and circular pre-tRNAs without an intron would be able to fold into the canonical tertiary structure, which agrees with the previous discovery that artificially permuted tRNA molecules can fold into correct tertiary structures (Pan et al., 1991).

In eukaryotes, each tRNA processing step occurs at a different location in the cell, and the cellular distribution of processing enzymes is not conserved among organisms. In animal cells, the tRNA-splicing endonuclease and ligase are localized to the nucleus (Westaway and Abelson, 1995; Paushkin et al., 2004). By contrast, in budding yeast, the endonuclease is present on the surface of mitochondria (Huh et al., 2003; Yoshihisa et al., 2003) and the ligase is present in the cytosol (Huh et al., 2003). RNase P and tRNase Z are found in the nucleus and/or cytoplasm in eukaryotic cells (Späth et al., 2007; Canino et al., 2009; Gobert et al., 2010; Pinker et al., 2013). Accordingly, the order of the processing steps of a permuted pre-tRNA in algal cells will likely be governed by the location of the enzymes required.

It is unclear whether maturation of archaeal permuted pre-tRNAs involves the formation of a circular RNA intermediate. In a recent study, an *in vitro* transcript simulating a permuted pre-tRNA, which was composed of a tandem repeat of intron-containing tRNA, was cleaved at the BHB motif by a recombinant splicing endonuclease from the euryarchaeon *Methanococcus jannaschii* (Tocchini-Valentini and Tocchini-Valentini, 2012), suggesting that archaeal permuted pre-tRNAs can be processed in a similar pathway to that found in algae. Analysis of permuted pre-tRNA processing in *T. pendens* may also help to clarify whether the physiological role of permuted tRNA genes is ascribed to the formation of the circular RNA intermediate. With the exception of *Nanoarchaeum equitans* (Randau et al., 2008; Heinemann et al., 2010), RNase P and tRNase Z generally contribute to the end maturation of tRNAs in archaea (Späth et al., 2007; Jarrous and Gopalan, 2010). The 3′-terminal CCA sequence of the two permuted tRNAs is encoded in the genome sequence of *T. pendens* (Chan et al., 2011) and one of these genes contains a short intervening sequence of only one nucleotide. It will be intriguing to clarify how such a short intervening sequence in the acceptor-loop is removed.

### **PROCESSING OF AN INTRON IN PERMUTED PRE-tRNAs**

Four tRNA genes from the red alga *C. merolae* (Soma et al., 2007), one tRNA gene from the green alga *O. lucimarinus* (Maruyama et al., 2010), and two tRNA genes from the crenarchaeon *T. pendens* (Chan et al., 2011) contain an intron in the 5′- or 3′-half of the gene (**Figure 1A**, **Table 1**), meaning that their pre-tRNAs require splicing of an intron in addition to swapping of the 5′- and 3′-halves. The position of the intron is not conserved among these organisms; in the four *C. merolae* tRNA genes, the introns are inserted at various positions (the D-loop, the anticodon-loop, and the TΨC-loop), while those in the *O. lucimarinus* and *T. pendens* tRNA genes are inserted at specific positions: 27/28 in the anticodon-stem and 37/38 in the anticodon-loop, respectively. In *C. merolae* and *T. pendens*, the intron-exon junction and the termini of permuted pre-tRNAs harboring an intron can each form an independent BHB motif. The two BHB motifs are not nested; therefore, either motif could in principle be processed before the other. Using *C. merolae* tRNAGly(CCC), which possesses both permuted (with the junction at position 37/38 in the anticodon-loop) and intronic (inserted at position 55/56 in the TΨC-loop) structures, it was determined that the BHB motif in the intron is processed before the BHB motif in the termini of permuted pre-tRNAGly(CCC) (Soma et al., 2013). The theoretical ΔG of the BHB motif in the intron was calculated to be slightly lower than that of the BHB motif in the termini. The same phenomenon was also observed for precursors transcribed from multiple intron-containing (but not permuted) tRNA genes in *C. merolae*, in which the BHB motifs in the pre-tRNAs were removed in the order dictated by the theoretical free energy of each motif (Soma et al., 2013). These findings indicate that multiple BHB motifs in permuted and/or intronic pre-tRNAs in *C. merolae* are processed sequentially, even when each BHB motif can fold independently. This feature may be attributable to the stability of each BHB motif and their accessibility to the splicing endonuclease. Alternatively, it may depend on the position of the BHB motifs, because the BHB motif at the canonical position 37/38 is always the final substrate and has a relatively high ΔG. BHB motifs at 37/38, even those that form the junction of the permuted pre-tRNA or the intron, may be recognized by *C. merolae* endonuclease only after BHB motifs at the other positions have been processed. This procedure contrasts with the processing of multimeric introns in some archaeal pre-tRNAs, in which the introns are nested and the last intron can form a BHB motif only after the other introns are processed (Sugahara et al., 2007; Tocchini-Valentini et al., 2009).
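
The free-energy ordering described above amounts to a simple sort. The Python below is a toy illustration, not the authors' analysis: the free-energy numbers are invented placeholders (the source reports only that the intron motif has a slightly lower theoretical free energy than the junction motif), and the `BHBMotif` class and `processing_order` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BHBMotif:
    label: str        # where the motif sits in the pre-tRNA
    delta_g: float    # theoretical free energy (placeholder value, not measured)


def processing_order(motifs):
    """Order motifs from lowest to highest theoretical free energy."""
    return sorted(motifs, key=lambda m: m.delta_g)


# Placeholder values chosen only so that the intron motif is slightly lower.
motifs = [
    BHBMotif("junction at 37/38", delta_g=-5.0),
    BHBMotif("intron at 55/56", delta_g=-5.5),
]

for step, motif in enumerate(processing_order(motifs), start=1):
    print(step, motif.label, motif.delta_g)
```

Under this rule the intron motif is handled first and the junction at 37/38 last, which matches the order reported for pre-tRNAGly(CCC). The sketch cannot distinguish the stability explanation from the positional one, because both predict the same order for this substrate.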

### **CORRELATION BETWEEN THE BHB MOTIF AT THE JUNCTION OF PERMUTED PRE-tRNAs AND THE SUBSTRATE SPECIFICITY OF SPLICING ENDONUCLEASES**

The BHB motif is the dominant recognition element for all known nuclear and archaeal tRNA-splicing endonucleases (Fruscoloni et al., 2001; Marck and Grosjean, 2003; Tocchini-Valentini et al., 2005a; Xue et al., 2006; Calvin and Li, 2008) and processing by these enzymes should have been pivotal for the development and maintenance of permuted tRNA genes in the genome. Archaeal endonucleases exhibit symmetrical architectures, and recognition of the splice sites of intronic pre-tRNAs by these enzymes is largely dependent on the BHB motif (**Figure 4A**) (Thompson and Daniels, 1988; Diener and Moore, 1998; Tocchini-Valentini et al., 2005a,b; Calvin and Li, 2008). In most archaeal tRNAs, the BHB motifs develop a relaxed form (hBH, as shown in **Figure 2B**) and are located at position 37/38 in the anticodon-loop, while several species from Crenarchaeota contain introns at non-canonical positions, such as the anticodon-arm, D-arm, TΨC-arm, variable-arm, or acceptor-stem, with strict (hBHBh′) or relaxed (hBH, BHh′, or no H) forms of the BHB motif (Marck and Grosjean, 2003; Tocchini-Valentini et al., 2005a; Sugahara et al., 2007, 2008, 2009). In addition to tRNAs, BHB-mediated introns are also found in rRNAs and mRNAs in some archaea (Kjems and Garrett, 1988; Tang et al., 2002; Watanabe et al., 2002; Yoshinari et al., 2006). Furthermore, the combination of RNA fragments during maturation of split tRNAs depends on the processing of the BHB motifs by the tRNA-splicing machinery in *N. equitans* (Randau et al., 2005c), indicating that BHB-mediated disruption of genetic information and its processing by splicing endonucleases is widespread in archaea. Four different types of endonuclease have been identified in archaea (Tocchini-Valentini et al., 2005a; Calvin and Li, 2008; Fujishima et al., 2011; Hirata et al., 2011); the subunit architecture of these endonucleases seems to have coevolved, by "subfunctionalization," with their substrate specificity (Tocchini-Valentini et al., 2005b, 2007). *T. pendens* contains a heterotetrameric endonuclease (α2β2) that can recognize both strict (hBHBh′) and relaxed (BHL) motifs, and the junction of its two permuted tRNAs comprises no H or hBHBh motif, and is located at position 59/60 in the TΨC-loop (**Figure 2**, **Table 1**). The broad substrate specificity of the *T. pendens* endonuclease would have allowed the development and maintenance of permuted tRNAs during evolution.

possible heterodimeric (cmSen2 and cmSen34), heterotrimeric (cmSen2, cmSen34, and cmSen54), and heterotetrameric (cmSen2, cmSen34, cmSen54, and unidentified cmSen15) forms of the *C. merolae* endonuclease.

The *S. cerevisiae* splicing endonuclease forms a heterotetrameric structure (αβδε) comprising two catalytic subunits (Sen2 and Sen34) and two accessory subunits (Sen15 and Sen54) (Rauhut et al., 1990; Westaway and Abelson, 1995; Trotta et al., 1997; Calvin and Li, 2008). Interactions between Sen2 and Sen54, and between Sen15 and Sen34, were identified by a yeast two-hybrid experiment (Trotta et al., 1997). These four subunits function cooperatively to recognize cleavage sites via "a ruler mechanism," in which the endonuclease measures a specified distance to the site at which the cuts should be made in a pre-tRNA (**Figure 4B**) (Greer et al., 1987; Reyes and Abelson, 1988; Westaway and Abelson, 1995; Fabbri et al., 1998; Calvin and Li, 2008). In addition to the typical hBH motif at the canonical 37/38 position, yeast endonuclease recognizes the mature domain of pre-tRNA and the base pairs between the anticodon and the intron (A·I base pairs) (Mattoccia et al., 1988; Baldi et al., 1983, 1992; Di Nicola Negri et al., 1997; Trotta et al., 2006; Xue et al., 2006). Similarly, wheat germ endonuclease recognizes some specific nucleotides in the D-stem, and the mature tRNA domain is required for adequate binding to the endonuclease (Stange et al., 1992). Coordination between all four subunits of eukaryal endonucleases would stabilize the enzyme to place its active site at a specific position in the cloverleaf structure of pre-tRNA, namely position 37/38. Thus, it is likely that the recognition system and asymmetric subunit architecture of eukaryal endonucleases have co-evolved strictly with the BHB motifs at position 37/38.

The junctions of permuted pre-tRNAs in the *B. natans* nucleomorph and the nucleus of green algae comprise an hBH motif and are located at position 37/38 in the anticodon-loop (type II), which are the features recognized by the eukaryal splicing endonuclease (**Table 1**, **Figure 2**). In addition, *B. natans* and green algae contain almost no tRNA genes harboring atypical introns (Palenik et al., 2007; Maruyama et al., 2010). By contrast, in the red alga *C. merolae*, the junctions of permuted pre-tRNAs and introns comprise various types of BHB motifs and are scattered along the cloverleaf structure (**Figure 2A**). This arrangement suggests that the *C. merolae* splicing endonuclease recognizes a wide variety of BHB motifs and employs a recognition strategy that is different from that of the known eukaryotic endonucleases.

A search of the *C. merolae* genome identified homologs of three of the yeast endonuclease subunits (cmSen2, cmSen34, and cmSen54) (Soma et al., 2013); however, no apparent homolog of the Sen15 accessory subunit was identified by homology searching or yeast two-hybrid analyses, which conflicts with the notion that all four subunits are essential for functional multimerization of the endonuclease. In yeast, Sen15 interacts with Sen34 to aid the proper positioning of the 3′-splice site (Westaway and Abelson, 1995; Di Nicola Negri et al., 1997; Trotta et al., 1997; Fabbri et al., 1998; Xue et al., 2006). The *C. merolae* endonuclease may contain an unidentified subunit or may comprise a novel heterotrimeric complex (**Figure 4C**). However, the *C. merolae* endonuclease containing accessory subunits is not likely to interact with pre-tRNAs that are disrupted at positions other than 37/38, because yeast Sen54 probably interacts with the D-arm and the acceptor-stem that are located in the core region of the L-shaped tertiary structure of a pre-tRNA (Di Nicola Negri et al., 1997; Xue et al., 2006). Thus, the *C. merolae* endonuclease may act on these pre-tRNAs as a dimer composed of catalytic subunits only (cmSen2 and cmSen34), via a tRNA mature domain-independent recognition mechanism. It is also tempting to speculate that the subunit composition of the *C. merolae* endonuclease depends on the positions or types of BHB motifs in the substrates. A feasible model may be that BHB motifs at positions other than 37/38 are removed by cmSen2-cmSen34, making the BHB motif at position 37/38 accessible to cmSen2-cmSen54-cmSen34 or cmSen2-cmSen54-cmSen34-cmSen15 (unidentified), which interacts with the mature domain of the pre-tRNA, as occurs in yeast (**Figure 4C**). A previous study showing that the BHB motif at the canonical 37/38 position is always the final substrate during tRNA processing in *C. merolae* cells (Soma et al., 2013) may support this hypothesis. These observations imply that processing of the BHB motif in eukaryal tRNAs is more divergent among species than previously thought. Various types of BHB-mediated disrupted tRNA genes and splicing endonucleases may be present in other eukaryotes. In fact, ectopic intron-containing tRNA genes have been discovered in the nucleomorph of the cryptomonad *Guillardia theta* (Kawach et al., 2005), although many of these introns do not form a defined BHB motif. Furthermore, the absence of an accessory subunit (Sen15) homolog in *Arabidopsis thaliana* (Akama et al., 2000) implies that plant endonucleases have evolved various patterns of subunit architectures. On the other hand, *A. thaliana* contains only a few species of canonical intron-containing tRNA genes and does not contain any other disrupted tRNA genes; therefore, its endonuclease has not been adapted to process non-canonically disrupted pre-tRNAs.

### **IMPLICATIONS FOR THE PHYSIOLOGICAL RELEVANCE OF PERMUTED tRNA GENES**

To date, circular gene permutation of non-coding RNAs other than tRNA has been reported for the LSU rRNA from *Tetrahymena* mitochondria (Heinonen et al., 1987) as well as bacterial and organellar tmRNAs (Keiler et al., 2000; Mao et al., 2009), the latter of which are involved in the *trans*-translation system that rescues stalled ribosomes and maintains quality control of proteins in the cell (Keiler et al., 1996; Himeno et al., 1997b; Muto et al., 1998). However, permuted tRNAs differ substantially from permuted rRNAs and tmRNAs. A pre-tRNA of a permuted tRNA gene is processed and re-ligated at the junction of the 5′- and 3′-halves. The resultant tRNA molecule is composed of a continuous single-stranded RNA that can form a canonical cloverleaf structure, which is equipped with a functional acceptor-stem and an anticodon in the proper position. By contrast, the corresponding breaks between the 5′- and 3′-halves of rRNAs and tmRNAs encoded by permuted genes are not ligated and they function in a two-piece form. In the case of tmRNA, this form has been suggested to have a beneficial function, perhaps by solving topological problems on the ribosome (Williams, 2002; Sharkady and Williams, 2004). This idea is supported by the independent evolution of a similar two-piece form of tmRNA, encoded as a permuted gene in different lineages of bacteria (Sharkady and Williams, 2004; Williams, 2004). Additionally, the location of the junction of the 5′- and 3′-halves differs between permuted tmRNAs and permuted tRNAs. The two-piece form of tmRNA is adapted to its functional advantage, and the corresponding breakage between the 5′-half and the 3′-half is located at a unique position downstream of the tag peptide coding region. By contrast, the junctions of permuted tRNAs are located at various positions in the cloverleaf structure because breakage at any position is ultimately ligated to produce a typical tRNA molecule. Consequently, permutation of genes encoding tRNAs does not seem to affect the authentic function of the mature tRNA or confer any physiological benefit or restriction.

In *C. merolae*, disrupted tRNA genes that exhibit permuted (7/43), intron-containing (23/43), or both types of structures (4/43) account for 79.1% (34/43) of all nuclear tRNA genes (Soma et al., 2013), whereas only a few protein-encoding genes have spliceosomal introns (Matsuzaki et al., 2004). The conservation of a large number of permuted tRNAs, in addition to intronic tRNAs, which require more extensive processing, in the streamlined genome of *C. merolae*, implies that BHB-mediated disruption of tRNA genes has some physiological meaning. It is well known that while some tRNA introns are dispensable (Mori et al., 2011) others are involved in post-transcriptional modification (Johnson and Abelson, 1983; Szweykowska-Kulinska and Beier, 1992; Björk, 1995), quality control to ensure the supply of precisely processed tRNA molecules to the cytosol (Arts et al., 1998; Lund and Dahlberg, 1998; Takano et al., 2005; Hopper, 2013), and regulation of the cell cycle in response to DNA damage (Ghavidel et al., 2007; Weinert and Hopper, 2007). Therefore, permuted tRNA genes may contribute to essential cell functions. Alternatively, the circular RNA intermediate may be preferable because of its resistance to degradation in the cell.
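
The proportion quoted above can be checked directly from the counts. The short Python sketch below simply re-adds the three categories as given in the text; it assumes the categories are mutually exclusive, which is what the stated total of 34 implies.

```python
# Counts of disrupted nuclear tRNA genes in C. merolae, as quoted in the text.
permuted_only = 7
intron_only = 23
both = 4
total_genes = 43

# The three categories are treated as non-overlapping.
disrupted = permuted_only + intron_only + both
fraction = disrupted / total_genes
print(f"{disrupted}/{total_genes} = {fraction:.1%}")
```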

From a physiological point of view, a possible explanation for the maintenance of disrupted tRNA genes is protection against mobile elements (Randau and Söll, 2008). Fragmentation of tRNA genes is thought to prevent the integration of mobile elements because tRNA gene sequences are sometimes used as conventional target sites in the genome (Devine and Boeke, 1996; Hani and Feldmann, 1998; Mou et al., 2006). This direct and valuable strategy would have functioned as a selective pressure at some point during evolution to increase the number of permuted tRNA genes. This possibility may be supported by the fact that almost no recognizable transposons or viruses are found in the contemporary genomes of *C. merolae* and *M. pusilla*, which harbor permuted tRNA genes (Matsuzaki et al., 2004; Worden et al., 2009). By contrast, *Ostreococcus* species, which contain some permuted tRNA genes and *cis*-spliced tRNA genes, have many transposons (Worden et al., 2009; Maruyama et al., 2010). Genome-wide analyses and studies focusing on the relationship between mobile elements and disrupted tRNA genes should further our understanding of this concept.

The eukaryal tRNA processing system has proofreading functions to ensure that only mature tRNAs are supplied for translation, and yeast cells possess multiple pathways to degrade inappropriately processed and folded tRNAs (Arts et al., 1998; Lund and Dahlberg, 1998; Kadaba et al., 2004; Takano et al., 2005; Whipple et al., 2011; Hopper, 2013; Kramer and Hopper, 2013). In *Xenopus laevis* oocytes, intron-containing pre-tRNAs are exported from the nucleus less efficiently than intron-spliced tRNAs, and nucleotide modifications and removal of the 5′- and 3′-flanking sequences at the acceptor-stem are monitored before transport of tRNAs into the cytosol (Arts et al., 1998). Therefore, the BHB motifs at various positions of permuted pre-tRNAs and the acceptor-loop of the circular RNA intermediate may inhibit their exportin-dependent transport from the nucleus, and the sequential processing of permuted pre-tRNAs would contribute to the discrimination of immature tRNAs, providing a selective pressure to retain them in the genome. *C. merolae* cells use a small repertoire of tRNAs; hence, the quality of tRNA molecules must be checked to guarantee translational fidelity. Furthermore, elimination of incorrectly processed tRNA molecules might be more important for organisms that harbor a splicing endonuclease with relaxed substrate specificity.

A different perspective is that permuted tRNA genes might have been formed as a remnant of genome dynamics under relatively neutral selective pressure. Even if such tRNA genes were acquired, most of them could not be retained because of the failure of transcription or subsequent RNA processing. In some organisms, including early-rooted algae and archaea, permuted tRNA genes could have persisted in the genome because of the upstream promoter-dependent transcription system and the capacity of the splicing machinery to process disrupted pre-tRNAs into the canonical cloverleaf structure. An expression system adapted to the wide variety of tRNA genes might have been preferable for organisms attempting to reduce redundantly duplicated tRNA genes, thereby enabling disruption of tRNA genes in various ways while maintaining the repertoire of those essential for protein synthesis. It has been suggested that permuted tRNA genes might have contributed to the maintenance of genome integrity during the reduction of the *B. natans* nucleomorph genome, which is the smallest eukaryotic genome (Gilson et al., 2006; Maruyama et al., 2010). Thus, plasticity of tRNA gene structure and expression systems may be more important than permuted tRNA genes.

### **SCENARIOS FOR THE DEVELOPMENT OF PERMUTED tRNA GENES**

There are two hypotheses for the development of permuted tRNA genes: the "ancient origin" hypothesis, which is related to the origin of the cloverleaf structure of tRNA (Di Giulio, 2008; Fujishima et al., 2008); and the "recent origin" hypothesis, which assumes that permuted tRNA genes arose from existing tRNAs in a relatively late stage of evolution (Randau and Söll, 2008; Sugahara et al., 2008; Maruyama et al., 2010; Chan et al., 2011).

The cloverleaf structure of tRNA is thought to have originated from mini-hairpins (Weiner and Maizels, 1987; Di Giulio, 1992, 2006; Schimmel and Ribas De Pouplana, 1995; Widmann et al., 1995), and tRNA sequences sometimes form a double hairpin structure flanked by the anticodon sequence (Tanaka and Kikuchi, 2001). The dominant localization of introns at position 37/38, which divide tRNAs into two hairpins, may be a remnant of the boundary connecting the hairpins, and disrupted tRNAs may represent plesiomorphic forms produced during the development of the modern cloverleaf structure (Di Giulio, 2008; Fujishima et al., 2008). The results of archaeal genome analyses have consistently suggested that modern tRNAs evolved through the combination of 5′-half and 3′-half fragments (Fujishima et al., 2008). Based on this concept, it was proposed that permuted tRNA genes arose from an event in which the two hairpin-like structures encoding the 5′- and 3′-halves of a tRNA were brought together in an inverted configuration on the genome (Di Giulio, 2008). However, there is some debate surrounding this idea. First, some permuted tRNAs are interrupted at positions other than 37/38, which conflicts with the assumptions of the hairpin model (Di Giulio, 2008). Second, it is questionable whether the ancient forms of tRNA genes are preserved in the modern genome (Randau and Söll, 2008).

Based on comparative genome analyses, another hypothesis suggests that BHB-mediated disrupted tRNA genes were gained by gene transfer as apomorphies or were developed from extant tRNA genes (Di Giulio, 2008; Randau and Söll, 2008; Sugahara et al., 2009, 2012; Fujishima et al., 2010; Maruyama et al., 2010; Chan et al., 2011). Given that permuted tRNAs are present in early-rooted algae (Nozaki et al., 2003, 2007; Matsuzaki et al., 2004) and deep-branching Crenarchaeota from which eukarya might have derived (Lake et al., 1984; Cox et al., 2008), the algal genome may retain permuted tRNAs as a vestigial trait inherited from archaea. In fact, *C. merolae* tRNAs exhibit some characteristics that are found in archaea but not eukaryotes. For example, a number of *C. merolae* tRNAs contain ectopic and multiple introns, and *C. merolae* tRNAIle has the anticodon GAU (Matsuzaki et al., 2004), which has been identified in prokaryotes but not eukaryotes. However, sequence and structural similarities of the disrupted tRNAs in *C. merolae* and archaea have not been identified. Moreover, archaeal permuted tRNA genes encode the terminal CCA sequence, which is not encoded in the eukaryal genome, indicating that they have not simply been exchanged between archaea and algae (Chan et al., 2011). Therefore, permuted tRNA genes might have arisen independently in each lineage. This possibility is supported by the fact that BHB-mediated disrupted tRNA genes exhibit a discontinuous and patchy distribution in eukaryotes and archaea (Maruyama et al., 2010; Chan et al., 2011; Soma et al., 2013). An evolutionary relationship between *cis*-spliced tRNAs and split tRNAs has been suggested, because the leader sequences of some split tRNAs show a high degree of homology to the intronic sequence of tRNAs in correlated archaea (Fujishima et al., 2010). 
In addition, continuous transcripts corresponding to read-through of adjacently encoded 5′- and 3′-halves of split tRNAs are produced, albeit at very low levels, suggesting that they represent a transition state between a split tRNA and a *cis*-spliced tRNA in the genome (Chan et al., 2011). Thus, it is possible that permuted tRNAs emerged from extant tRNA genes.

A plausible description of the emergence of permuted tRNA genes via convergent evolution can be inferred from the model proposed for permuted rRNAs and tmRNAs, which function as a two-piece form as described earlier. These species are hypothesized to have been established by a gene duplication event that formed a tandem repeat of the RNA genes, followed by the loss of the outer segment of each copy (Heinonen et al., 1987; Williams, 2002). Similarly, permuted tRNA genes might have originated from duplication of an intronic tRNA gene, followed by the loss of the outer exons to leave the 3′-half of the upstream tRNA gene and the 5′-half of the downstream tRNA gene (**Figure 5A**) (Soma et al., 2007; Di Giulio, 2008; Maruyama et al., 2010). In algae and archaea, these rearranged tRNA genes could have persisted in the genome because of the use of the upstream promoter-dependent transcription system and the tRNA maturation system that allows processing of permuted pre-tRNAs. In this context, the high frequency of permutation with the junction at position 37/38 (type II) can be ascribed to the overall dominance of introns located at the corresponding position in both eukaryotes and archaea. It is noteworthy that some tandem repeats of tRNA genes composed of single tRNA species containing an intron have been found in the nuclear genomes of green algae, namely the prasinophyte *O. lucimarinus* and the chlorophycean *Chlamydomonas reinhardtii*, which contain some and no permuted tRNAs, respectively (**Table 1**) (Maruyama et al., 2010). Furthermore, an additional 5′-half is located downstream of the 5′-half of the permuted tRNACys(GCA) gene in the nuclear genome of *O. lucimarinus*. These duplicated tRNA genes may be structurally identical to the plausible intermediate stage of permuted tRNA evolution shown in the proposed model.
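The duplication-and-loss model (**Figure 5A**) can be illustrated with a toy rearrangement of labeled gene segments. The following is only a schematic sketch, not a reconstruction of any real locus: the segment labels, the intergenic `SPACER`, and the function names are hypothetical placeholders introduced for illustration.

```python
# Toy model of the duplication-and-loss scenario.
# All segment labels are placeholders, not real sequences.
FIVE_HALF = "5half"    # exon 1 of an intron-containing tRNA gene
INTRON = "intron"
THREE_HALF = "3half"   # exon 2
SPACER = "spacer"      # assumed intergenic region between the two copies


def tandem_duplicate(gene, spacer=SPACER):
    """Return two head-to-tail copies of the gene separated by a spacer."""
    return gene + [spacer] + gene


def lose_outer_exons(locus):
    """Drop the first and last segments, i.e. the outer exon of each copy."""
    return locus[1:-1]


gene = [FIVE_HALF, INTRON, THREE_HALF]
permuted_locus = lose_outer_exons(tandem_duplicate(gene))

# The 3' half of the upstream copy now precedes the 5' half of the
# downstream copy, which is the permuted gene order.
print(permuted_locus)
```

In this toy output the 3′-half lies upstream of the 5′-half, and the two intron remnants sit at the outer edges of the rearranged locus.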

An alternative scenario is that the formation of permuted or circularized tRNA molecules preceded that of the corresponding permuted genes. Canonical pre-tRNAs co-transcribed from two tandemly-repeated intronic tRNA genes might be able to form a permuted pre-tRNA via the combination of the 3′-half of the initial tRNA and the 5′-half of the duplicated tRNA (**Figure 5B**). In support of this concept, a recent study showed that an artificial transcript simulating a tandemly-repeated intron-containing pre-tRNA could form a permuted tRNA structure *in vitro* (Tocchini-Valentini and Tocchini-Valentini, 2012). Furthermore, a circular pre-tRNA may be produced by ligation of an intron-containing (non-permuted) pre-tRNA at the acceptor-stem (**Figure 5C**). Indeed, many kinds of circularized non-coding RNAs have been identified in archaeal cells, indicating that the circularization of RNA is fairly prevalent, although the significance of this feature is still unknown (Danan et al., 2012). The resulting permuted or circular pre-tRNA molecules might have been reverse transcribed and integrated back into the genome to generate permuted tRNA genes (**Figures 5B,C**). Therefore, it is plausible that circular permutation has contributed to the evolution of the tRNA-like structures that are prevalent in nature (Pan et al., 1991; Pan and Uhlenbeck, 1993; Florentz and Giegé, 1995), and the cloverleaf structure of tRNA might have developed as a circularly permuted RNA isomer.

Regardless of the mechanism(s) by which permuted tRNA genes originated, the BHB motifs must have played a pivotal role during their development. The existence of a number of BHB-mediated *cis-*spliced tRNAs in algae and archaea may reflect a background that has accelerated the production of permuted tRNA genes. If so, permuted tRNA genes could have occurred frequently in archaea, especially in Crenarchaeota, whose splicing machinery can process various types of BHB motifs. However, only two permuted tRNA genes have been identified from one crenarchaeon (*T. pendens*) (**Table 1**), which harbors plenty of intron-containing tRNA genes (Fujishima et al., 2011). In eukaryotes, tRNA genes contain an intron at the canonical 37/38 position and tRNA genes of plants and yeast can be transcribed depending on the upstream promoter; therefore, it is plausible that eukaryotes could possess permuted tRNA genes with the junction at the canonical 37/38 position. However, most eukaryotes do not contain permuted tRNAs. These observations may indicate that the background for the development of permuted tRNA genes is intrinsically different among organisms. Moreover, even if permuted tRNA genes did once emerge in archaeal and eukaryotic species, they may not have been maintained in the genome due to their instability or harmful influence. For example, an inverted tRNA gene structure might have been lost easily, or the BHB motifs may have been associated with a specific adverse effect on the genome or organism.

The phylogenetic distributions of BHB-mediated disruptions of tRNA genes are biased and an organism harboring all three types of disrupted tRNAs has not yet been identified. Some archaea, including *N. equitans* and *Caldivirga maquilingensis*, harbor split tRNAs but only a few *cis*-spliced tRNAs and no permuted tRNAs (Chan et al., 2011; Fujishima et al., 2011). Other archaea, including the *Pyrobaculum* and *Thermofilum* genera, harbor a number of intronic tRNAs that are disrupted at various positions, although *Pyrobaculum* have no split or permuted tRNAs (Fujishima et al., 2011). Similarly, green algae contain some permuted tRNAs but almost no ectopic intron-containing tRNAs (Kawach et al., 2005; Maruyama et al., 2010). Hence, *C. merolae* is unique because it possesses a number of permuted tRNAs and various intron-containing tRNAs. *C. merolae* might be permissive for the absorption and retention of various tRNA genes, or some characteristics of *C. merolae* may have accelerated the development and preservation of permuted tRNA genes during evolution. Considering that *C. merolae* has a compact genome, it is possible that the successive genome size reduction put pressure on redundantly duplicated tRNA genes to be arranged into a single permuted tRNA gene. Split tRNAs have not been identified in *C. merolae*, despite its potential ability to express them. Formation of a split tRNA may be a less efficient strategy to reduce the genome size because it requires two sets of promoter and terminator sequences to produce one species of tRNA. To date, there has been no report of an organism in which split tRNAs coexist with permuted tRNAs; therefore, the individual mechanisms and requisite elements required for the acquisition or maintenance of each disrupted tRNA gene could be substantially different, as suggested previously (Chan et al., 2011).
This hypothesis supports a non-monophyletic origin of BHB-mediated disrupted tRNA genes, which may have arisen and disappeared multiple times independently in various organisms. The next challenge will be to identify the specific characteristics and fundamental background that led to the BHB-mediated disruption of tRNA genes, and to clarify the method of formation of each type of tRNA gene.

### **CONCLUSIONS**

The identification of circularly permuted tRNA genes has revealed a unique style of gene structure and RNA processing. Comparative genome analyses should be performed to identify more examples of permuted genes and to investigate the origin of the permuted tRNAs in correlation with other BHB-mediated disrupted tRNAs and mobile elements that target tRNA genes. Studies of the transcription and maturation systems for tRNAs that must have co-evolved with disrupted tRNA genes would help to clarify the physiological meaning and the mechanisms that govern the development and maintenance of permuted tRNA genes.

### **AUTHOR CONTRIBUTIONS**

Akiko Soma wrote the manuscript, and prepared **Figures 1**–**5** and **Table 1**.

### **ACKNOWLEDGMENTS**

The author thanks A. Muto, S. Maruyama, S. Goto, Y. Sekine, A. Onodera, J. Sugahara, N. Yachie, and A. Kanai for valuable discussions; T. Kuroiwa, K. Tanaka, O. Misumi, and M. Matsuzaki for technical advice; and M. Ohara and K. Nakayama for help in preparing the figures. This work was supported by a Grant-in-Aid for Scientific Research C (25440003 to Akiko Soma), the Career-Support Program for Woman Scientists at Chiba University (to Akiko Soma), and the Nakajima Foundation (to Akiko Soma).

### **REFERENCES**


features in the hot-spring red alga. *Cyanidioschyzon merolae*. *BMC Biol.* 5:28. doi: 10.1186/1741-7007-5-28


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2014; accepted: 12 March 2014; published online: 01 April 2014. Citation: Soma A (2014) Circularly permuted tRNA genes: their expression and implications for their physiological relevance and development. Front. Genet. 5:63. doi: 10.3389/fgene.2014.00063*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Soma. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Handling tRNA introns, archaeal way and eukaryotic way

### *Tohru Yoshihisa\**

*Graduate School of Life Science, University of Hyogo, Ako-gun, Hyogo, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Akiko Soma, Chiba University, Japan*

*Javier Augusto Martinez, Institute of Molecular Biotechnology, Austria*

*\*Correspondence:*

*Tohru Yoshihisa, Graduate School of Life Science, University of Hyogo, 3-2-1, Kouto, Kamigori-cho, Ako-gun, Hyogo 678-1297, Japan e-mail: tyoshihi@sci.u-hyogo.ac.jp*

Introns are found in various tRNA genes in all the three kingdoms of life. In particular, archaeal and eukaryotic genomes are good sources of tRNA introns that are removed by proteinaceous splicing machinery. Most intron-containing tRNA genes in both archaea and eukaryotes possess an intron at the so-called canonical position, one nucleotide 3′ to the anticodon, while recent bioinformatic analyses have revealed unusual types of tRNA introns and their derivatives, especially in archaeal genomes. Comparative genomic analyses make it evident that tRNA introns were gained and lost at various stages of evolution in both archaea and eukaryotes. The splicing of tRNA molecules has been studied extensively from biochemical and cell biological points of view, and such analyses of eukaryotic systems have provided interesting findings in recent years. Here, I summarize recent progress in the analyses of tRNA introns and the splicing process, and try to clarify new and old questions to be solved in the next stages.

### **Keywords: tRNA, intron, splicing, genome, archaea, eukaryote**

Progress in bioinformatics has widened our understanding of the structural characteristics of tRNA genes (Lowe and Eddy, 1997; Sugahara et al., 2006, 2008; Heinemann et al., 2010; Cognat et al., 2013). In particular, powerful sequence analyses with next-generation sequencers have accumulated an enormous amount of sequence information on tRNA genes through whole-genome sequencing of non-model organisms from various evolutionary clades and through metagenome analyses, mostly of prokaryotic species. In these analyses, introns were found in many tRNA genes in genomes from all three kingdoms of life (Heinemann et al., 2010). In eubacterial genomes and their relatives, eukaryotic organellar genomes, small numbers of tRNA genes harbor a group I intron within the anticodon region (Reinhold-Hurek and Shub, 1992; Haugen et al., 2005). These introns are spliced by a series of phosphoester transfer reactions catalyzed by the intronic sequences themselves, a mechanism that is related to some extent to the splicing of mRNA. On the other hand, archaeal and eukaryotic nuclear genomes have tRNA introns whose splicing is completely dependent on proteinaceous enzymes (Phizicky and Hopper, 2010; Popow et al., 2012; Hopper, 2013). In addition to normal introns, their variations have been found in both archaeal and eukaryotic genomes. Furthermore, various interesting observations on biochemical and cell biological aspects of the pre-tRNA splicing machinery have been reported in recent years. In this review, I mainly handle issues related to these "protein-spliced" tRNA introns and their splicing machinery, emphasizing comparison between archaeal and eukaryotic systems.

### **STRUCTURAL CHARACTERISTICS OF tRNA GENES HARBORING INTRONS**

Introns found in archaeal and eukaryotic tRNA genes are mostly inserted one nucleotide 3′ to the anticodon, namely at position 37/38 in the standard nomenclature (**Figure 1**), while in a minority of cases introns are inserted into other parts of tRNA genes (see below for details). Because intron insertion at this canonical position disrupts the anticodon stem-loop structure, its splicing is indispensable for tRNA maturation. On the other hand, the canonical intron does not seem to interrupt the overall tRNA structure (**Figure 1**). Indeed, chemical and enzymatic probing showed that pre-tRNAs with the canonical intron maintain the structures of the D and TΨC arms and the acceptor stem (Swerdlow and Guthrie, 1984; Lee and Knapp, 1985). Structural characteristics of archaeal and eukaryotic tRNA genes containing the canonical intron have some similarity: mostly, the 5′- and 3′-splice sites are set in short single-stranded segments flanked by double-stranded stretches. However, close inspection of these structures reveals some differences between the two groups, which derive from differences in the strategy of splice site recognition by the splicing enzymes. For splicing of pre-tRNAs, both archaebacteria and eukaryotes utilize similar sets of enzymes, namely tRNA splicing endonuclease (Sen) (Thompson and Daniel, 1990; Trotta et al., 1997; Li et al., 1998; Akama et al., 2000; Paushkin et al., 2004) and tRNA ligase (Phizicky et al., 1986; Englert and Beier, 2005; Englert et al., 2010, 2011; Popow et al., 2011, 2014). Some organisms require additional factors for the ligation step (Culver et al., 1997; Harding et al., 2008; Popow et al., 2011, 2014). Among these, splicing endonuclease is responsible for recognition of splice sites, and acts as a decoding engine of tRNA-type splice sites on various transcripts (see below for details).

### **INTRON-CONTAINING tRNA GENES IN ARCHAEBACTERIA**

Introns are found in genes for every isodecoder tRNA (tRNAs with the same anticodon) in sequenced archaeal genomes. On average, ∼15% of tRNA genes have introns in the archaeal genomes, while the ratio of intron-containing genes varies from ∼8% in *Euryarchaeota* to ∼48% in *Crenarchaeota* (Marck and Grosjean, 2003; Sugahara et al., 2008; Chan et al., 2011). The length of introns ranges from 11 to 129 nt, and its median for each isodecoder tRNA falls mostly within 12–25 nt, except in the case of tRNA-TrpCCA introns (65 nt), which nest a box C/D small RNA (Omer et al., 2000; Clouet d'Orval et al., 2001; Singh et al., 2004). The hallmark of splice sites in archaeal pre-tRNAs is the bulge-helix-bulge (BHB) motif (Kjems and Garrett, 1988; Tang et al., 2002; Marck and Grosjean, 2003), which consists of a 4 bp double-stranded helix flanked by two 3 nt bulges (**Figure 1**, shadowed with orange) and is a critical determinant for recognition by archaeal splicing endonuclease (Xue et al., 2006). In a pre-tRNA, the first two nucleotides of the anticodon base-pair with the intron to form a part of the central 4 bp helix, and the two adjacent bulges provide the 5′- and 3′-splice sites. In certain archaeal species, the 5′- and 3′-splice sites may exist as independent entities, such as hBH and HBh′ motifs, in which splice sites are flanked by two short helices (see non-canonical splice sites in **Figure 3**) (Marck and Grosjean, 2003). This secondary structure requirement in the BHB motif restricts a part of the primary sequence of the intron, while quite large variations in sequence are accepted in the other parts of the intron.
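The geometry of the BHB motif (a 4 bp helix flanked by two 3 nt bulges) can be expressed as a simple structural test. The following is a minimal sketch under stated assumptions: the secondary structure is supplied in dot-bracket notation, the two bulges are taken to lie on opposite strands (the 3′-strand bulge on the outer side of the central helix and the 5′-strand bulge on the inner side), and each bulge must be closed by at least one further base pair. The function names and the toy structure are illustrative only, and relaxed variants such as hBH or HBh′ are not handled.

```python
def pair_table(dot_bracket):
    """Map each position to its pairing partner (-1 if unpaired)."""
    partner = [-1] * len(dot_bracket)
    stack = []
    for pos, symbol in enumerate(dot_bracket):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            mate = stack.pop()
            partner[pos], partner[mate] = mate, pos
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def find_bhb(dot_bracket):
    """Return (i, j), the outermost pair of the central helix, for each strict 3-4-3 core."""
    p = pair_table(dot_bracket)
    n = len(p)
    hits = []
    for i in range(1, n):
        j = p[i]
        # Need room for the outer pair (i-1, j+4) and the inner pair (i+7, j-4).
        if j < i + 12 or j + 4 >= n:
            continue
        helix = all(p[i + k] == j - k for k in range(4))     # central 4 bp helix
        bulge_3p = all(p[j + k] == -1 for k in (1, 2, 3))    # 3 nt bulge, 3' strand
        bulge_5p = all(p[i + k] == -1 for k in (4, 5, 6))    # 3 nt bulge, 5' strand
        closed = p[i - 1] == j + 4 and p[i + 7] == j - 4     # flanking helices
        if helix and bulge_3p and bulge_5p and closed:
            hits.append((i, j))
    return hits


# Toy structure: outer helix, central helix, 5' bulge, inner hairpin, 3' bulge.
TOY = "(((((" + "((((" + "..." + "(((" + "...." + ")))" + "))))" + "..." + ")))))"
print(find_bhb(TOY))
```

A plain hairpin without bulges yields no hit, so the test distinguishes the strict motif from an ordinary stem-loop in this simplified setting.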

*[Figure legend, partially recovered] … exon of pre-tRNA, respectively. The anticodon is represented with black circles, and CCA tri-nucleotides added at the 3′-[…] amino acid-accepting units of mature tRNA are marked with light blue and light green shadow, respectively.*

Because of this transplantable nature of the splice sites, introns are also found at non-canonical positions of tRNAs, such as 20/21, 22/23, 25/26, 29/30, 30/31, 45/46, 53/54, 56/57, 59/60, etc., in some archaeal genomes (**Figures 2A**, **3**). In particular, *Crenarchaeota* and *Nanoarchaeota* genomes are rich sources of non-canonical introns. In an extreme case, an intron is inserted at position 3/4 in certain *Thermoproteales* tRNAs (Sugahara et al., 2008). A non-canonical intron is sometimes accompanied by the canonical intron and other non-canonical introns in the same tRNA gene. For example, a tRNA-ProUGG gene in *Pyrobaculum islandicum* has two non-canonical introns inserted at positions 25/26 and 56/57 in addition to a canonical intron (**Figure 3B**) (Sugahara et al., 2007). Splice sites of these non-canonical introns are found not only in the BHB motif but also in the HBh′ and/or hBH motifs (Marck and Grosjean, 2003).

Furthermore, tRNA genes consisting of separated transcriptional units have been identified in Crenarchaeal genomes (Randau et al., 2005b,c). "Split tRNA genes" were first reported in *Nanoarchaeum equitans*: in this organism, tRNA-HisGUG is encoded by two gene fragments corresponding to the 5′- and 3′-halves separated at position 37/38 in the anticodon loop (**Figure 3C**). The 5′- and 3′-halves are transcribed from their own promoters with trailer and leader sequences, portions of which are complementary to each other. The predicted secondary structure of the hybridized fragments is highly similar to that of a pre-tRNA harboring a canonical intron with a relaxed BHB motif, as if a pre-tRNA had been cleaved at a loop in the intron.

Indeed, RT-PCR analysis revealed that the split tRNA fragments are transcribed, and the transcripts are joined to form mature and functional tRNAs *in vivo*, indicating that *trans*-splicing is operating in *N. equitans* (Randau et al., 2005b,c). Further *in vitro* analyses revealed that splicing endonuclease from *N. equitans* can recognize this "pre-tRNA" complex and cleave off the trailer and leader sequences at the precise positions, and the resulting RNA is suitable for ligation by tRNA ligase (Randau et al., 2005a; Tocchini-Valentini et al., 2005). An interesting case in this organism is that tRNA-GluCUC and tRNA-GluUUC, isoacceptor tRNAs for the same amino acid, are produced from two different 5′-halves and one common 3′-half by *trans*-splicing (Randau et al., 2005b). *Trans*-splicing probably contributes to saving the genomic space assigned to tRNAs and to increasing the chance of having more isoacceptors, especially in the case of *N. equitans*, a parasitic archaeon with massive genome reduction (Makarova and Koonin, 2005). Alternatively, *trans*-splicing may be an evolutionary remnant of the ancient form of tRNA gene organization (see below).

There are more complicated cases. In *Caldivirga maquilingensis*, tRNA-Gly isoacceptors with the anticodons CCC, UCC, and GCC are formed by combinations of up to three out of five independent transcripts through *trans*-splicing (**Figure 3D**) (Fujishima et al., 2009; Sugahara et al., 2009). While tRNA-GlyCCC is made from the 5′- and 3′-halves covering the 1–37 and 38–73 regions, respectively, like the case of *N. equitans* tRNA-GluCUC, the other two are joined from the 1–25 fragment common to the two, one of the two specific fragments covering 26–37 with the anticodon, and the 3′-half used for all three tRNA-Gly species. Most splits in the separated tRNA genes are located at the canonical position (37/38), while splits are found at position 29/30 of tRNA-AlaCGC and tRNA-AlaUGC, and at 25/26 of tRNA-GluUUC, in the *C. maquilingensis* genome. *N. equitans* and some *Staphylothermus* genomes harbor tRNA-LysCUU split at position 30/31 (Fujishima et al., 2009; Chan et al., 2011). For the tri-split tRNA genes, the splits are usually a combination of one canonical position and one or more non-canonical positions (Fujishima et al., 2009).
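The combinatorial use of transcripts for the three *C. maquilingensis* tRNA-Gly species can be checked with a small tiling test. This is a minimal sketch based only on the position ranges quoted above; the transcript labels and the function name are hypothetical, and leader or trailer sequences are ignored.

```python
# Position ranges (inclusive, standard tRNA numbering) of the five transcripts.
# The labels are made up for this illustration.
TRANSCRIPTS = {
    "half5_CCC": (1, 37),        # 5' half carrying anticodon CCC
    "frag_1_25": (1, 25),        # common 5' fragment
    "frag_26_37_UCC": (26, 37),  # anticodon-containing fragment, UCC
    "frag_26_37_GCC": (26, 37),  # anticodon-containing fragment, GCC
    "half3_common": (38, 73),    # 3' half shared by all three tRNAs
}

ASSEMBLIES = {
    "tRNA-Gly(CCC)": ["half5_CCC", "half3_common"],
    "tRNA-Gly(UCC)": ["frag_1_25", "frag_26_37_UCC", "half3_common"],
    "tRNA-Gly(GCC)": ["frag_1_25", "frag_26_37_GCC", "half3_common"],
}


def tiles_full_length(pieces, start=1, end=73):
    """True if the pieces, joined in order, cover start..end with no gap or overlap."""
    position = start
    for name in pieces:
        first, last = TRANSCRIPTS[name]
        if first != position:
            return False
        position = last + 1
    return position == end + 1


for trna, pieces in ASSEMBLIES.items():
    print(trna, len(pieces), "pieces, complete:", tiles_full_length(pieces))
```

Each assembly uses at most three of the five transcripts, and the shared 3′-half appears in all of them, which is the genome-saving arrangement described in the text.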

### **INTRON-CONTAINING tRNA GENES IN EUKARYOTES**

In eukaryotic genomes, most intron-containing tRNA genes have their introns at the canonical position, with lengths of 6–133 nt (Lowe and Chan, 2011). Because the nuclear genome of a eukaryote contains multiple tRNA genes encoding an isodecoder tRNA to allow sufficient supply of the tRNA, the number of tRNA genes in a eukaryotic genome does not correspond to that of sequence variations of tRNAs. Ratios of intron-containing tRNA genes vary significantly among eukaryotes: only 39 out of 1068 tRNA genes (3.7%) have a canonical intron in *Strongylocentrotus purpuratus* (sea urchin), while in the infectious yeast, *Cryptococcus neoformans*, 132 out of 143 tRNA genes (92%) have an intron (Lowe and Chan, 2011). Whether all the predicted introns in eukaryotic genomes are spliced properly has not been fully confirmed. However, the yeast, *Saccharomyces cerevisiae*, which has 61 intron-containing genes encoding 10 different isodecoder tRNAs, can splice all the pre-tRNAs with introns by the single splicing endonuclease, the Sen complex, and this is also true in human (Trotta et al., 1997; Paushkin et al., 2004).
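The two ratios quoted above follow directly from the gene counts. The short sketch below simply reproduces that arithmetic; the function name is an illustrative choice.

```python
def intron_percentage(with_intron, total):
    """Percentage of tRNA genes that contain an intron."""
    return 100.0 * with_intron / total


sea_urchin = intron_percentage(39, 1068)      # Strongylocentrotus purpuratus
cryptococcus = intron_percentage(132, 143)    # Cryptococcus neoformans

print(f"sea urchin: {sea_urchin:.1f}%  C. neoformans: {cryptococcus:.0f}%")
```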

A clear difference between eukaryotic and archaeal introns in tRNA genes is that eukaryotic introns do not have clear local motifs specifying splice sites, unlike those described for archaeal introns (**Figure 1**). Rather, eukaryotic splicing endonuclease is considered to recognize the splice sites of pre-tRNAs by a ruler mechanism, in which the Sen complex measures the distance between the body of the tRNA and the splice sites, and this allows more flexible sequence selection around the splice sites in eukaryotic pre-tRNAs (Greer et al., 1987; Reyes and Abelson, 1988). Analyses of the secondary and tertiary structures of eukaryotic pre-tRNAs indicate that many pre-tRNAs have splice sites in single-stranded regions adjacent to double-stranded stretches, suggesting a eukaryotic site-recognition strategy similar to some extent to that of the archaeal enzyme (Greer et al., 1987; Reyes and Abelson, 1988). Indeed, eukaryotic splicing endonuclease can recognize and splice archaeal-type pre-tRNAs with a BHB motif, meaning that eukaryotic splicing endonuclease has the ability to read local structural features of substrate RNAs (Fabbri et al., 1998; Di Segni et al., 2005). However, close inspection of substrate specificity in tRNA splicing by *S. cerevisiae* and *Xenopus* extracts with artificial substrates demonstrated that double-stranded regions formed between the anticodon loop and the intronic sequence are not a prerequisite for splicing (Reyes and Abelson, 1988). Thus, a complete understanding of the splice site properties essential for Sen recognition must wait until structures of complexes between eukaryotic splicing endonuclease and various pre-tRNAs are solved.

Although short insertions can be found in the D or TΨC arms of various eukaryotic tRNA genes, such insertions are mostly found in only one of many synonymous tRNA genes on a genome. Thus, such genes are supposed to be pseudogenes, and non-canonical introns in eukaryotic genomes are supposed to be rare. However, it was reported that nucleomorphs, the nuclear remnants of enslaved algae in cryptophytes and chlorarachniophytes, harbor tRNA genes with non-canonical introns in the D and TΨC arms in addition to non-canonical positions in the anticodon stem-loop (Kawach et al., 2005; Maruyama et al., 2010). For example, the nucleomorph genome of *Guillardia theta* contains 9 tRNA genes (including one pseudogene) with insertions of 3 to 24 nt at non-canonical positions (**Figure 2B**). Interestingly, their excision, including that of the 3 nt intron in the TΨC arm of the tRNA-CysGCA, tRNA-ValAAC, and tRNA-LeuUAA(pseudo) genes, was confirmed by sequencing of RT-PCR products of the algal RNAs (Kawach et al., 2005), although we still do not know how these non-canonical introns, especially the extremely short ones, are spliced. Moreover, 4 tRNA genes of this organism, such as the tRNA-PheGAA genes, harbor two introns, and 2 tRNA genes, elongator tRNA-Met (tRNA-eMet) and tRNA-CysGCA, harbor three introns, like the case of Crenarchaeal tRNA genes. Similar non-canonical introns are also found in the nuclear genome of a red alga, *Cyanidioschyzon merolae* (Soma et al., 2007). Therefore, certain eukaryotic splicing endonucleases have the ability to recognize splice sites inserted at non-canonical positions. From this point of view, the splicing machinery of *C. merolae* seems to have more archaea-like characteristics, and BHB motifs seem to be critical determinants of substrate selection for this splicing endonuclease (Soma et al., 2013).

Another unusual class of tRNAs first reported in the *C. merolae* genome is the permutated tRNAs, in which the normal 5′- and 3′-termini are bridged with a short loop, and the 3′-half is positioned upstream of the 5′-half (Soma et al., 2007). Six tRNAs in *C. merolae*, such as tRNA-GlnCUG, have their 5′-leader sequence before position 38 and their 3′-trailer after position 37. Atypical 5′- and 3′-termini are also introduced in the D arm of tRNA-LeuUAA, and in the TΨC arm of 4 tRNAs including tRNA-AlaUGC (**Figures 3E,F**). RT-PCR analyses revealed that these permutated tRNA transcripts are indeed converted into mature and functional tRNAs via circular intermediates (Soma et al., 2007, 2013). The new 3′-trailer and 5′-leader sequences have the ability to base-pair and produce a splice site-like structure with a BHB motif similar to that of archaeal split tRNA transcripts (Randau et al., 2005b). Thus, the splicing machinery is thought to cleave the extensions from the permutated pre-tRNAs and join the separated ends of the reverse-oriented exons, while the true 5′- and 3′-termini of the mature tRNAs are thought to be generated by the action of terminal processing enzymes, such as RNase P and tRNase Z. Similar permutated tRNA genes have been found in other single-cell algae, such as prasinophytes, and in nucleomorphs of cryptophytes (Maruyama et al., 2010). Permutated tRNA genes are not limited to eukaryotes. An archaeon, *Thermofilum pendens*, also has permutated tRNA-TyrGUA and initiator tRNA-Met (tRNA-iMet) genes with new 5′- and 3′-termini formed in the TΨC arm (Chan et al., 2011). However, the distribution of permutated tRNA genes is so far limited to certain clades of eukaryotes and archaea.
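The maturation route via a circular intermediate can be followed with a toy string model. The following is a schematic sketch under simplifying assumptions: sequences are made-up placeholders (uppercase for exon halves, lowercase for leader, trailer, and the short loop bridging the normal termini), the loop is assumed to occur only once in the molecule, and both function names are hypothetical.

```python
def splice_to_circle(leader, three_half, loop, five_half, trailer):
    """Remove leader and trailer (BHB cleavage) and ligate the new ends.

    The circle is represented as a string whose last base joins its first.
    """
    return three_half + loop + five_half


def open_circle(circle, loop):
    """Excise the bridging loop (terminal processing) and return the linear RNA."""
    doubled = circle + circle                 # lets us read across the ligation junction
    start = doubled.index(loop) + len(loop)   # first base after the loop
    return doubled[start:start + len(circle) - len(loop)]


# Placeholder sequences, not real tRNA halves.
FIVE_HALF, THREE_HALF = "GGGCCAUAG", "CUAUGGCCC"
LEADER, LOOP, TRAILER = "acgu", "uuaa", "ugca"

pre_trna = LEADER + THREE_HALF + LOOP + FIVE_HALF + TRAILER   # permutated order
circle = splice_to_circle(LEADER, THREE_HALF, LOOP, FIVE_HALF, TRAILER)
mature = open_circle(circle, LOOP)
print(pre_trna, "->", mature)
```

In this model the mature product has the canonical order (5′-half followed by 3′-half) even though the precursor encodes the halves in the reverse order.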

### **SPLICING MACHINERY**

### **SPLICING ENDONUCLEASE**

As mentioned previously, splicing endonuclease is a search engine for tRNA-type splice sites among the various transcripts produced by an organism. One splicing endonuclease, or an endonuclease complex, in an organism cleaves both the 5′- and 3′-splice sites of an intron to produce two 5′-OH termini on the intron and 3′-exon, and two 2′,3′-cyclic phosphate termini on the 5′-exon and intron, meaning that the reaction is phosphoester transfer but not hydrolysis. The archaeal endonucleases are classified into three types according to subunit organization: α4, α2, and α2β2 (Tocchini-Valentini et al., 2005; Calvin and Li, 2008). The archetypal configuration of the enzymes is supposed to be an α4 homotetramer. The crystallographic analysis of EndA from *Methanococcus jannaschii* revealed that this tetrameric enzyme has a 2-fold but not 4-fold symmetric structure, where two subunits are mainly used to build two reaction centers for splice site cleavage, and the other two are used as structural components to position the catalytic subunits in an appropriate spatial arrangement for accepting the BHB motif (Lykke-Andersen and Garrett, 1997; Li et al., 1998). The α2 type is a kind of "dimer of dimers," in which two tandemly duplicated endonuclease units in a polypeptide act as a catalytic unit and a structural unit, respectively (Kleman-Leyer et al., 1997; Li and Abelson, 2000). The resulting dimer shows a configuration similar to that of α4 enzymes. The α2β2 type is also a derivative of α4 (Mitchell et al., 2009; Yoshinari et al., 2009). Gene duplication and different requirements for catalytic and structural subunits have led to sequence divergence of the two subunits. The appearance of certain subunit types is well correlated with the phylogenetic relationships of archaebacteria and, more importantly, with the diversity of tRNA introns and splice sites (Tocchini-Valentini et al., 2005; Calvin and Li, 2008).
Species in *Euryarchaeota* and *Korarchaeota* essentially have α4- or α2-type splicing endonucleases, while those in *Nanoarchaeota*, *Thaumarchaeota*, and *Crenarchaeota* have α2β2-type enzymes. Interestingly, the latter group of archaea is rich in tRNA genes with introns: most unusual tRNA genes, such as those with non-canonical introns, with multiple introns or with permutated configuration, are found in these organisms (Marck and Grosjean, 2003; Sugahara et al., 2008; Chan et al., 2011). Some of the splice sites in these tRNA genes comprise expanded formats of the BHB, such as hBH or HBh′ (Tocchini-Valentini et al., 2005). On the other hand, *Euryarchaeota*, mostly possessing α4- or α2-type enzymes, have low ratios of tRNA genes with introns (less than 10%), and these tRNA genes have only one intron with the BHB motif at the canonical position. Therefore, the subunit organization of splicing endonuclease seems to be a determinant of the substrate spectrum of the enzyme: strict to the BHB motif at the canonical position in α4 and α2 enzymes, and lenient in α2β2 enzymes.
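The end chemistry of the endonuclease reaction described above can be summarized in a small bookkeeping model. This is a minimal sketch, not a description of any real enzyme interface: the `Fragment` class, the `cleave` function, the index convention, and the toy sequence are assumptions made for illustration, and the original ends of the transcript are simply labeled "unchanged".

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str
    seq: str
    five_end: str
    three_end: str


def cleave(pre_trna, five_ss, three_ss):
    """Cut at both splice sites.

    five_ss: index of the first intron nucleotide (0-based).
    three_ss: index of the first 3'-exon nucleotide (0-based).
    """
    if not 0 < five_ss < three_ss < len(pre_trna):
        raise ValueError("splice sites must lie inside the transcript, in order")
    cyclic_p = "2',3'-cyclic phosphate"
    return [
        Fragment("5'-exon", pre_trna[:five_ss], "unchanged", cyclic_p),
        Fragment("intron", pre_trna[five_ss:three_ss], "5'-OH", cyclic_p),
        Fragment("3'-exon", pre_trna[three_ss:], "5'-OH", "unchanged"),
    ]


# Toy transcript: uppercase exons, lowercase intron.
products = cleave("GGGAAACCCuuuuGGGUUUCCC", 9, 13)
for fragment in products:
    print(fragment.name, fragment.seq, fragment.five_end, "/", fragment.three_end)
```

The model yields two 5′-OH ends and two 2′,3′-cyclic phosphate ends, matching the product termini stated in the text; the subsequent ligation step is not modeled.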

Eukaryotic splicing endonuclease consisting of four subunits, namely Sen2, Sen34, Sen54, and Sen15, was first identified in the yeast *S. cerevisiae* (Rauhut et al., 1990; Trotta et al., 1997). Among the four subunits, Sen2 and Sen34 have catalytic centers that are responsible for cleavage of the 5′- and 3′-splice sites, respectively. On the other hand, Sen54 was reported to interact with the D-arm of substrate pre-tRNAs and to position the catalytic subunits at appropriate distances from the body of the tRNA (Trotta et al., 2006; Xue et al., 2006). Human endonuclease contains homologs of all four Sen proteins found in *S. cerevisiae* (Paushkin et al., 2004), and *Arabidopsis thaliana* also possesses Sen2 and Sen34 homologs (Akama et al., 2000). There is appreciable conservation among these eukaryotic Sen2 and Sen34 subunits and archaeal endonucleases, indicating a common evolutionary origin of the splicing machinery. The human Sen complex also contains Clp1 (hClp1) (Paushkin et al., 2004; Weitzer and Martinez, 2007). Clp1 was first identified as a component of cleavage factor II for polyadenylation of pre-mRNAs (de Vries et al., 2000), and then re-identified as an *in vitro* kinase for tRNA exons when searching for an enzyme phosphorylating siRNAs displaying a 5′-OH group (**Figure 4**) (Weitzer and Martinez, 2007). Divergence of subunit organization is greater in eukaryotic endonucleases than in archaeal enzymes, and eukaryotic splicing endonucleases have the ability to cleave both eukaryotic and archaeal types of splice sites as described above. Thus, it can be said that subunit organization diversity is, again, correlated with local splice site variability even between different kingdoms of life. Although the direct relationship between repertoires of intron-containing tRNA genes and subunit configuration of splicing endonucleases among eukaryotes has not been analyzed extensively, one interesting example is *C. merolae*. 
Bioinformatic analysis successfully identified genes for Sen2, Sen34, and Sen54 but failed to identify any gene or gene fragment that may encode the fourth essential subunit, Sen15 (Soma et al., 2013). In addition, yeast two-hybrid analyses could not identify other proteins that interact with any of the three *C. merolae* Sen proteins, suggesting that the Sen complex of this organism may consist of three rather than four subunits. Although further study is needed, such unique subunit organization in *C. merolae* may be related to the existence of tRNA genes with non-canonical introns and permutated tRNA genes in this organism.

### **ENZYMES CATALYZING LIGATION STEPS**

**FIGURE 4 | tRNA splice site cleavage and two pathways of tRNA ligation.** Series of chemical reactions in splice site cleavage of pre-tRNA (left), the 5′-phosphate ligation pathway (upper), and the 3′-phosphate ligation pathway (lower) are shown schematically. Proteins involved in individual reactions are also shown, and unidentified factors are denoted by question marks. Proteins of yeast, mammals, and prokaryotes are color-coded in black, blue, and orange, respectively. In the 5′-phosphate ligation pathway, yeast Trl1 uses GTP as a phosphate donor while hClp1 in the mammalian system uses ATP instead of GTP for 5′-phosphorylation of the 3′-exon. Although a mammalian enzyme(s) catalyzing the last two steps of this pathway is still missing, such an enzyme was found in lancelet. In the 3′-phosphate ligation pathway, the mammalian protein responsible for the 2′,3′-cyclic phosphodiesterase activity in the HSPC117 complex has not been fully identified, but eubacterial RtcB, an HSPC117 homolog, was demonstrated to possess this activity (Tanaka and Shuman, 2011). The starting material, pre-tRNA, and the end product, mature tRNA, are shadowed. Appr>Pi represents ADP-ribose 1″-2″ cyclic phosphate.

Excised tRNA exons are joined by tRNA ligase. There are two completely different chemical pathways of ligation, classified by the origin of the phosphate bridging the 5′- and 3′-exons: one is the 5′-phosphate ligation pathway and the other is the 3′-phosphate ligation pathway (**Figure 4**) (Popow et al., 2012). At the first step, ligation systems need to open the 2′,3′-cyclic phosphodiester on the 5′-exon. The 5′-phosphate ligation pathway produces a 3′ terminus with 2′-phosphate and 3′-OH (Phizicky et al., 1986; Sawaya et al., 2003; Wang and Shuman, 2005) while the 3′-phosphate ligation pathway produces one with 2′-OH and 3′-phosphate (Popow et al., 2011; Chakravarty et al., 2012). The former does not use the 2′-phosphate but a new phosphate derived from a nucleoside triphosphate to form a bridge between the two exons (Westaway et al., 1993), while the latter utilizes the 3′-phosphate left on the 5′-exon (Popow et al., 2011; Chakravarty et al., 2012). In both cases, the phosphorylated exon is further activated by nucleotidylation, and then ligated to its counterpart using the nucleoside monophosphate as a leaving group (Phizicky et al., 1986; Westaway et al., 1993; Popow et al., 2011; Chakravarty et al., 2012). In the case of the 5′-phosphate ligation pathway, the 2′-phosphate left at the splice junction must be removed after the ligation (Culver et al., 1997; Harding et al., 2008).
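
The order of chemical steps in the two pathways can be laid out side by side. The sketch below is only an illustrative restatement of the paragraph above: the pathway labels, function names, and step wording are assumptions introduced here, and enzyme names are deliberately left out.

```python
# Illustrative restatement of the two ligation pathways described in the text.
# Pathway labels, function names and step wording are assumptions of this sketch.
PATHWAYS = ("5-phosphate", "3-phosphate")


def ligation_steps(pathway):
    """Return the ordered chemical steps for the given ligation pathway."""
    if pathway == "5-phosphate":
        return [
            "open 2',3'-cyclic phosphate on 5'-exon -> 2'-phosphate + 3'-OH",
            "phosphorylate 5'-OH of 3'-exon from a nucleoside triphosphate",
            "activate the 5'-phosphate by nucleotidylation",
            "ligate exons, releasing the nucleoside monophosphate",
            "remove the 2'-phosphate left at the splice junction",
        ]
    if pathway == "3-phosphate":
        return [
            "open 2',3'-cyclic phosphate on 5'-exon -> 2'-OH + 3'-phosphate",
            "activate the 3'-phosphate by nucleotidylation",
            "ligate to the 5'-OH of the 3'-exon, releasing the nucleoside monophosphate",
        ]
    raise ValueError(f"unknown pathway: {pathway!r}")


def bridging_phosphate_origin(pathway):
    """Return where the phosphate joining the two exons comes from."""
    origins = {
        "5-phosphate": "nucleoside triphosphate",
        "3-phosphate": "cyclic phosphate left by splice site cleavage",
    }
    return origins[pathway]


for name in PATHWAYS:
    print(name, "->", bridging_phosphate_origin(name))
    for number, step in enumerate(ligation_steps(name), start=1):
        print(f"  {number}. {step}")
```

The extra final step in the 5′-phosphate pathway reflects the point made in the text that only this pathway leaves a 2′-phosphate at the junction.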

#### *5′-phosphate ligase*

The 5′-phosphate ligation pathway has been found widely in fungi and plants. It also operates in lancelet but is most probably absent in mammalian cells (Lappe-Siefke et al., 2003; Harding et al., 2008; Englert et al., 2010; Popow et al., 2011). tRNA ligase for 5′-phosphate ligation, Trl1/Rlg1, was first identified in *S. cerevisiae* and then in *A. thaliana* through biochemical analysis (Phizicky et al., 1986; Englert and Beier, 2005). Trl1 homologs are widely found in genomes from fungi through diatoms to angiosperms but are absent in animals (Englert and Beier, 2005; Wang et al., 2006). Trl1 homologs catalyze all the chemical reactions of exon ligation except for removal of the 2′-phosphate at the splice junction, and consist of three domains covering three enzymatic activities: adenylate synthetase/RNA ligase (ASTase), polynucleotide kinase (PNKase), and 2′,3′-cyclic phosphodiesterase (CPDase), in this order (Xu et al., 1990; Sawaya et al., 2003). CPDase opens the 2′,3′-cyclic phosphate on the 3′-terminus of the 5′-exon, and PNKase phosphorylates the 5′-terminus of the 3′-exon with GTP. By transferring an AMP moiety to this 5′-terminus from ATP, ASTase activates this end and allows ligation between the two exons (**Figure 4**, upper). During this terminal activation, the AMP moiety is first covalently attached to the ASTase domain and then transferred to the substrate. Although splicing endonuclease is responsible for primary recognition of splice sites, yeast Trl1 has the ability to interact with introns in pre-tRNAs (Tanner et al., 1988) and with an intron-containing precursor of its non-tRNA substrate, *HAC1* mRNA, which acts as a key component of the unfolded protein response (Sidrauski et al., 1996; Mori et al., 2010). The 2′-phosphate left at the splice junction is finally removed by 2′-phosphotransferase, or Tpt1 in *S. cerevisiae* (Culver et al., 1997). 
This reaction is unique in that the 2′-phosphate is transferred to NAD<sup>+</sup> to yield ADP-ribose 1″-2″ cyclic phosphate and nicotinamide (**Figure 4**, rightmost part). Indeed, this enzyme is essential for yeast growth, indicating that the 2′-phosphate is toxic for protein translation.

Although the 5′-phosphate ligation activity was once demonstrated in the nuclear extract of HeLa cells (Zillmann et al., 1991), a vertebrate 5′-phosphate ligase has not yet been identified. The lancelet *Branchiostoma floridae* has a complete set of enzymes for this ligation pathway; adenylate synthetase/RNA ligase activities and the other two activities for the tRNA ligation are encoded by two independent genes (Englert et al., 2010). In vertebrates, several polypeptides in different complexes can catalyze some of the individual reaction steps responsible for a putative 5′-phosphate ligation. First, *in vitro* phosphorylation of the 5′-terminus of the 3′-exon can be done by hClp1 kinase associated with the human Sen complex (Weitzer and Martinez, 2007). Second, mammalian genomes harbor genes for 2H CPDase, which can open the 2′,3′-cyclic phosphodiester bond at the 3′-terminus of the 5′-exon to produce 3′-OH and 2′-phosphate groups, and the CPDase domain of fungal and plant Trl1 is classified in this family (Popow et al., 2012). Third, mammals possess 2′-phosphotransferase activities for RNA substrates. However, both mouse 2H CPDase and Tpt1 (Cnp1 and Trpt1, respectively) are non-essential for viability (Lappe-Siefke et al., 2003; Harding et al., 2008). In particular, Trpt1-knockout mice, which completely lack 2′-phosphotransferase activity, were demonstrated not only to translate Tyr-rich proteins efficiently, which requires appropriate splicing of the tRNA-TyrGUA intron, but also to show no defects in splicing of *XBP1* mRNA (Harding et al., 2008), which undergoes tRNA-type unconventional splicing in the cytoplasm upon the unfolded protein response, like yeast *HAC1* mRNA (Yoshida et al., 2001; Calfon et al., 2002). Therefore, even if vertebrates have the complete set of enzymes for the 5′-phosphate ligation pathway, this pathway seems to contribute to only a small part, if any, of RNA splicing.

#### *3′-phosphate ligase*

In vertebrates, a tRNA ligase activity utilizing the 2′,3′-cyclic phosphate to bridge the tRNA exons has been known for decades, but the enzyme responsible for this reaction was found only a few years ago (Filipowicz and Shatkin, 1983; Laski et al., 1983; Popow et al., 2011). Indeed, the 3′-phosphate ligase HSPC117 is a main player in tRNA splicing in mammals, and HSPC117 homologs are found in vertebrates, lancelets, insects, protozoa, algae, etc., but not in fungi or angiosperms, which possess Trl1 homologs (Popow et al., 2011, 2012 for a review). Although HSPC117 forms a large complex with DDX1 (RNA helicase), CGI-99, FAM98B, and ASW, whether all of these subunits are involved in tRNA ligation is not fully known (see below). Homologs of HSPC117, whose name in prokaryotes is RtcB, exist widely in archaeal genomes where no Trl1 homologs exist (Englert et al., 2011). Interestingly, RtcB genes are also found in eubacterial genomes despite the fact that eubacteria do not have tRNA genes with introns removed by protein-assisted splicing. Moreover, eubacterial RtcB genes are located in operons with genes encoding RNA 3′-phosphate cyclase, which transforms a 3′-phosphate of an RNA molecule into a 2′,3′-cyclic phosphate (Tanaka and Shuman, 2011). It is suggested that this set of enzymes is used to repair damaged tRNAs whose anticodon loop is endonucleolytically cleaved. These facts may indicate that the 3′-phosphate ligation pathway is a predominant and probably primordial pathway for tRNA ligation.

As mentioned above, HSPC117/RtcB also has adopted nucleotidylation to activate an exon terminus, but the ligase transfers a GMP moiety, instead of AMP, to the 3′-terminus of the 5′-exon (**Figure 4**, lower) (Chakravarty et al., 2012; Popow et al., 2014). During the reaction, the GMP moiety is covalently attached to the ligase, and this activation step seems to be rate limiting for the overall ligase cycle in eubacterial RtcB (Chakravarty et al., 2012). This is also true for the human ligase HSPC117, and these facts suggest that some factor(s) is required for efficient turnover of 3′-phosphate ligase. A bioinformatic search for clusters of eukaryotic orthologous groups conserved in the same model organisms as RtcB, followed by biochemical confirmation, revealed that a small protein named "archease" enables HSPC117 to catalyze multiple rounds of the ligation reaction (Popow et al., 2014). Detailed biochemical analyses revealed that archease specifically enhances GMP transfer from GTP to the HSPC117 protein but not that from the HSPC117-GMP adduct to the 3′-terminus of RNA substrates. Furthermore, this step also requires another activity catalyzed by DDX1, an RNA helicase within the HSPC117 complex. The involvement of the DDX1 ATPase provides a molecular explanation for ATP hydrolysis in the mammalian ligation reaction (Popow et al., 2011, 2014). Interestingly, involvement of archease in tRNA processing was first reported for m5C formation of tRNA in the archaeon *Pyrococcus abyssi*, where archease enhances substrate specificity of m5C methyltransferase and prevents it from aggregating, probably by acting as a chaperone (Auxilien et al., 2007). Recently, archease was also demonstrated to catalyze efficient activation of archaeal RtcB, where it enhances all nucleotidyl transfer steps in the 3′-phosphate ligation, including GMP transfer to RNA molecules (Desai et al., 2014).

### **DIVERSITY OF SPLICING MACHINERY IN EUKARYOTES; INTRACELLULAR LOCALIZATION AND FUNCTION**

As mentioned above, eukaryotic splicing factors are more complex than those of archaea. In line with this complexity, the splicing factors in various eukaryotes have acquired diversity in several aspects. One interesting aspect is the place of splicing in eukaryotic cells. Despite the functional and structural conservation of eukaryotic splicing endonucleases, their intracellular localization is divergent among organisms. Splicing endonucleases in vertebrates show the expected localization: they are mainly found in the nucleus (De Robertis et al., 1981; Paushkin et al., 2004). Thus, it has been suggested that pre-tRNAs are spliced in the nucleus and only mature tRNAs are supplied to the cytoplasm. On the other hand, in *S. cerevisiae*, splicing of pre-tRNAs proceeds in the cytoplasm: tRNA splicing endonuclease was found to associate with the mitochondrial surface, and this association is required for proper function of the enzyme (Yoshihisa et al., 2003, 2007). Later, the other enzymes required for completing splicing were also demonstrated to localize in the cytoplasm (Mori et al., 2010). It should be noted that cytoplasmic localization of the Sen complex itself is not essential for pre-tRNA splicing in yeast. When all the Sen subunits are expressed in the nucleus, pre-tRNAs are spliced normally (Dhungel and Hopper, 2012). These findings suggest that eukaryotic cells can tolerate drastic alteration of the place of tRNA splicing.

Although we have not obtained a complete view of tRNA splicing machinery in plant cells, some subunits of Sen proteins and tRNA ligase from *A. thaliana* were revealed to have multiple destination signals (Englert et al., 2007). When expressed as GFP fusions, AtSen2 and AtSen1 localized to the nucleus and mitochondria while AtTrl1 localized to the nucleus and chloroplasts. In the case of AtTrl1 in particular, alternative translational initiation may produce two isoforms: one from the most upstream AUG is targeted to chloroplasts and the other from a downstream AUG is targeted to the nucleus (Englert et al., 2007). Thus, the nucleus may be the primary site of pre-tRNA splicing of nuclear-encoded tRNAs in plant cells. On the other hand, one report provided a piece of evidence against this notion: unspliced pre-tRNAs accumulated when tRNA export from the nucleus was suppressed by RNAi-mediated knock-down of the tRNA export carrier, Exportin-t/Paused (Park et al., 2005). This situation is quite similar to that of *S. cerevisiae*, where disruption of *LOS1*, which encodes the yeast homolog of Exportin-t, leads to accumulation of pre-tRNAs because of their sequestration from cytoplasmic splicing enzymes (Sharma et al., 1996; Yoshihisa et al., 2003). What is the role of mitochondrial or chloroplastic splicing enzymes in plants? Since there are no tRNA genes harboring eukaryote-archaea type introns in mitochondria and chloroplasts, plant splicing enzymes in these organelles should process a different type(s) of RNAs.

Evolution of eukaryotes has also added extra functions to the splicing machinery. As mentioned above, the yeast Sen complex can be transplanted from mitochondria to the nucleus without major defects in tRNA splicing. However, partial deletion of the mitochondrial targeting signal in Sen54 leads to temperature sensitivity in yeast growth, and transplantation of the whole Sen complex to the nucleus compromises growth and rRNA maturation (Yoshihisa et al., 2003; Dhungel and Hopper, 2012). Thus, the yeast splicing endonuclease should have another essential function in the cytoplasm. Human splicing endonuclease activity also may have an extra function(s) other than pre-tRNA splicing. Mutations in human *SEN2*, *SEN34*, and *SEN54* (*TSEN2*, *TSEN34*, and *TSEN54*, respectively) lead to pontocerebellar hypoplasia, which causes specific neurodegenerative disorders including hypoplasia of the cerebellum and the ventral pons, despite the fact that tRNA splicing is essential for every cell in the human body (Budde et al., 2008; Namavar et al., 2011). Similar phenotypes were observed when a *SEN54* homolog was knocked down in zebrafish (Kasher et al., 2011). Interestingly, this disease is also caused by mutations of mitochondrial arginyl-tRNA synthetase (Edvardson et al., 2007; Namavar et al., 2011). Although targets of human Tsen in the pontocerebellar cells related to this disease have not been identified so far, these facts suggest that Tsen has some neuronal cell-specific function, other than usual tRNA splicing, through collaboration with mitochondrial arginyl-tRNA synthetase, another tRNA-related factor. Probably because of their relaxed substrate recognition and the expansion of repertoires of transcripts in eukaryotic cells, the Sen complexes from various eukaryotes may have adopted abilities to process and/or degrade various substrates in addition to intron-containing pre-tRNAs. 
Mutations in vertebrate Clp1 were also found to cause neurodegenerative phenotypes, such as motor-sensory defects, cortical dysgenesis, and microcephaly, as in the case of Tsen mutants (Hanada et al., 2013; Karaca et al., 2014; Schaffer et al., 2014). Mouse Clp1-K127A (kinase-dead) and hClp1-R140H (disease-related) compromise both pre-tRNA cleavage and 5′-RNA kinase activities *in vitro* while they specifically affect neuronal cells *in vivo* in mouse and human, respectively (Hanada et al., 2013; Karaca et al., 2014). Similar phenotypes were seen in *CLP1* mutants of zebrafish (Schaffer et al., 2014). The above facts suggest that, *in vivo*, residual activities of these Clp1 mutant proteins cannot complete tasks required in neuronal cells while they can provide enough tRNA splicing ability in other cell types. Although the fundamental functions of mammalian Clp1, as a subunit of the Sen complex or as a component of Cleavage Factor II generating mRNA 3′-ends, are carried out in the nucleus, some reports predict an extra function in the cytoplasm. As mentioned before, hClp1 also acts as a 5′-kinase for siRNAs (Weitzer and Martinez, 2007). Phosphorylation of artificial siRNAs supplied from outside the cells may occur in the cytoplasm, because only phosphorylated double-stranded siRNAs are loaded onto Ago proteins to yield RNA-induced silencing complexes in the cytoplasm, while import of double-stranded RNA across the nuclear envelope has not been reported (Nykänen et al., 2001; Ameres and Zamore, 2013). Indeed, Tsen-hClp1 can be purified from the cytosolic fraction (Weitzer and Martinez, 2007; Karaca et al., 2014).

Higher-order functions are also postulated for mammalian 3′-phosphate ligase. The HSPC117 complex was independently isolated as one of the components that allow axonal transport of mRNAs in neurons, suggesting that HSPC117 has another function in the cytoplasm (Kanai et al., 2004). Indeed, the HSPC117 complex was recently found to shuttle between the nucleus and the cytoplasm (Pérez-González et al., 2014). A versatile RNA helicase, DDX1, in this complex may account for such functions related to mRNA dynamics. As discussed above, both 5′-phosphate and 3′-phosphate tRNA ligases may also act as healing-sealing enzymes for damaged tRNAs.

### **PHYSIOLOGICAL MEANINGS TO HAVE AN INTRON IN A tRNA GENE**

### **PHYSIOLOGICAL FUNCTION OF INTRON IN PRE-tRNA DURING ITS MATURATION**

The tRNA intron is indeed an obstacle to tRNA function, and seems to exist just to be removed after transcription. Why do organisms need to keep the intron in their tRNA genes? What are the roles of the introns? So far, there are no clear answers to these questions. However, there are several examples that indicate some specific effects of tRNA introns on a wide variety of aspects in the life of tRNAs and in the life of archaebacteria and eukaryotes.

First, involvement of tRNA introns as recognition motifs for tRNA modification enzymes has been known in eukaryotes (Grosjean et al., 1997). Mainly from *in vitro* analyses, it was found that certain modifications in the anticodon loop are strictly applied to intron-containing pre-tRNAs. One famous example is pseudouridylation of the 1st and 3rd U of the anticodon (U34 and U36) of tRNA-IleUAU in *S. cerevisiae* (Szweykowska-Kulinska et al., 1994; Simos et al., 1996; Motorin et al., 1998). tRNA-IleUAU needs to distinguish the AUA codon for Ile from AUG for Met. Conversion of UAU into ΨAΨ is thought to enhance selectivity for A against G at the wobble position. This pseudouridylation is applied only to intron-containing pre-tRNA-IleUAU by pseudouridine synthase Pus1 *in vitro* (**Figure 5A**, upper) (Simos et al., 1996; Motorin et al., 1998). In *S. cerevisiae*, Pus1 localizes in the nucleus, and the splicing machinery localizes in the cytoplasm. Thus, the order of these processing events is guaranteed by the difference in intracellular localization (**Figure 5A**, lower). Interestingly, Pus1 also pseudouridylates positions 25 and 67 of both spliced and unspliced forms of tRNA-IleUAU. A similar example of intron-dependent modification is 2′-*O*-methylation of C34 in tRNA-LeuCAA by Trm4 (Grosjean et al., 1997). Both yeast and human Trm4 methyltransferases require the intron in the substrate, tRNA-LeuCAA, *in vitro* (Strobel and Abelson, 1986; Brzezicha et al., 2006). The intron is also used to avoid premature modification during the series of modification reactions. tRNA-PheGAA, derived from intron-containing genes, has an unusual nucleotide, wybutosine (yW), at position 37 (Blobstein et al., 1975). The formation of yW starts with methylation of G37 by Trm5 in *S. cerevisiae*. Trm5 only recognizes the spliced but not the intron-containing form of tRNA-PheGAA (**Figure 5B**, upper) (Noma et al., 2006). 
Again, the spliced form of tRNA-PheGAA first appears in the yeast cytoplasm because of the cytoplasmic localization of the splicing machinery. However, the spliced intermediate must be re-imported into the nucleus for this methylation since Trm5 localizes in the nucleus (Ohira and Suzuki, 2011). It has been shown that various tRNA species constantly shuttle between the cytoplasm and the nucleus, and the import system can deliver the spliced intermediate to the nucleus (Shaheen and Hopper, 2005; Takano et al., 2005; Shaheen et al., 2007; see also review, Hopper, 2013). After methylation by Trm5, tRNA-PheGAA with m1G37 is transported back to the cytoplasm to receive a series of chemical reactions to build yW at position 37 (**Figure 5B**, lower) (Ohira and Suzuki, 2011). In this case, the tRNA intron acts as a determinant of timing of modification.
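
The dependence of yW formation on compartment and splicing state can be expressed as a small state model. This is a minimal sketch under stated assumptions: the class `PreTRNAPhe`, its fields, and the function names are hypothetical and exist only to mirror the order of events described above for yeast tRNA-PheGAA; each step raises an error when its precondition from the text is not met.

```python
from dataclasses import dataclass


@dataclass
class PreTRNAPhe:
    """Hypothetical state record for yeast tRNA-PheGAA (illustration only)."""
    location: str = "nucleus"  # compartment where the molecule currently resides
    has_intron: bool = True    # the primary transcript carries the intron
    pos37: str = "G"           # G -> m1G -> yW


def move(trna, destination):
    """Model nuclear export or re-import by changing the compartment."""
    trna.location = destination


def splice(trna):
    # The splicing machinery is cytoplasmic in S. cerevisiae.
    if trna.location != "cytoplasm" or not trna.has_intron:
        raise ValueError("splicing needs an intron-containing tRNA in the cytoplasm")
    trna.has_intron = False


def trm5_methylate(trna):
    # Trm5 is nuclear and recognizes only the spliced form.
    if trna.location != "nucleus" or trna.has_intron or trna.pos37 != "G":
        raise ValueError("Trm5 needs a spliced, unmodified tRNA in the nucleus")
    trna.pos37 = "m1G"


def build_yw(trna):
    # The remaining yW-forming reactions take place in the cytoplasm.
    if trna.location != "cytoplasm" or trna.pos37 != "m1G":
        raise ValueError("yW formation needs m1G37 tRNA in the cytoplasm")
    trna.pos37 = "yW"


def mature(trna):
    """Apply the events in the order described in the text."""
    move(trna, "cytoplasm")
    splice(trna)
    move(trna, "nucleus")
    trm5_methylate(trna)
    move(trna, "cytoplasm")
    build_yw(trna)
    return trna


print(mature(PreTRNAPhe()))
```

Calling `trm5_methylate` on a freshly made `PreTRNAPhe()` raises an error, which is the sense in which the intron acts as a timing determinant in this model.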

In archaea, there is a unique example in which the tRNA intron itself encodes a functional unit for modification of the body of the tRNA. In the archaeal tRNA-TrpCCA precursor, a box C/D small RNA is nested in the intron, and this intronic part of pre-tRNA-TrpCCA or the excised intron is used to select nucleotides C34 and U39 for 2′-*O*-methylation *in trans* (**Figure 5C**) (Omer et al., 2000; Clouet d'Orval et al., 2001; Singh et al., 2004). Although these modifications in eukaryotes and archaea seem to be the driving force to maintain the introns in tRNA genes, the fact that Pus1 and Trm4 are dispensable for viability of *S. cerevisiae* argues against this assumption (Simos et al., 1996; Motorin and Grosjean, 1999). It was also demonstrated that the intron of tRNA-TrpCCA can be removed from the genome of *Haloferax volcanii*, suggesting that modification assisted by the intronic sequence itself does not cause strong selective pressure for intron maintenance (Joardar et al., 2008). Thus, no essential modifications applied only to intron-containing pre-tRNAs have been identified in eukaryotes and archaea so far.

**FIGURE 5 | tRNA introns affect various biological activities as RNA and DNA units.** Several examples where tRNA introns affect various biological activities as RNA units (upper) and as DNA units (lower) are illustrated. **(A)** The intron of tRNA-IleUAU in *S. cerevisiae* is an essential recognition motif for Pus1. Thus, pseudouridylation of U34 and U36 in the anticodon by Pus1 is only applied to the intron-containing pre-tRNA. Pre-tRNA-IleUAU pseudouridylated by Pus1 in the nucleus is subjected to cytoplasmic splicing after export from the nucleus in the yeast. **(B)** Yeast Trm5, a nuclear methyltransferase catalyzing the initial step of yW formation, only recognizes a spliced form of tRNA-PheGAA, which is produced in the cytoplasm and re-imported into the nucleus. Thus, yW formation starts at a late stage of […] as a guide RNA to select two nucleotides, C34 and U39, of the same tRNA for 2′-*O*-methylation. **(D)** *Sulfolobus* SSVs use a part of tRNAs as an *attB* site for their integration into the host genome. Some *Sulfolobus* species harbor an intron in such tRNA genes to disrupt *attB* sequences. This may be an archaeal strategy to avoid viral infection. **(E)** tRNA-ThrAGU near the *HMR*-E locus, which is in a heterochromatic region, acts as an insulator and prevents expansion of the heterochromatic region over the tRNA gene by recruiting TFIIIC properly in *S. cerevisiae*. The intron-containing tRNA-LeuCAA gene cannot replace tRNA-ThrAGU while the intronless version of the same tRNA can. See the text for details.

### **POSSIBLE FUNCTIONS OF INTRON IN tRNA GENES ON THE CHROMOSOME**

The intron may affect functions or abilities of tRNA genes on chromosomes. Although the primary function of tRNA genes is to serve as the source of genetic information to produce tRNAs, their well-structured and conserved sequences are utilized as targets for integration of viral genomes and other mobile genetic elements. Randau and Söll proposed that intron insertion prevents tRNA genes from viral genome integration (Randau and Söll, 2008). Indeed, many archaeal viruses have been isolated from extreme environments, which tRNA-intron-rich archaea also prefer (Rice et al., 2001; Prangishvili, 2013). In well-studied cases in archaea, such viruses choose tRNA genes as integration sites. For example, *Sulfolobus* spindle-shaped viruses (SSVs) belonging to the *Fuselloviridae* family use either tRNA-LeuGAG, tRNA-AspGUC, tRNA-GluUUC, or tRNA-GluCUC as their *att*B sites, and they integrate their genome into the anticodon loop and TΨC arm of these tRNA genes (Wiedenheft et al., 2004). By possessing an intron at or near the canonical position, or dividing tRNA genes into halves, such tRNA genes may escape from integration of these viruses (**Figure 5D**). However, there are other tRNA genes that can be used as *att*B sites for these viruses, so avoidance of viral infection may not be the sole reason to gain and/or maintain tRNA introns on archaeal genomes.

In eukaryotes, promoter elements, namely the A box and B box, are embedded in the coding region of tRNA genes (**Figure 5E**) (Galli et al., 1981; Geiduschek and Kassavetis, 2001). These elements must be recognized by general transcription factor complexes, especially TFIIIC, in order for RNA polymerase III to initiate efficient transcription. Since the distance between the A and B boxes is altered by insertion of the intron, introns may affect recognition of the promoter elements by TFIIIC. Indeed, some early reports demonstrated that intron removal from reporter tRNA genes affects expression levels of some tRNAs in *Xenopus* oocytes *in vivo* and in yeast extracts *in vitro*, supporting the above assertion (Ciliberto et al., 1982; Fabrizio et al., 1987). On the other hand, other reports argue that yeast TFIIIC can bind to tRNA promoter elements separated by various distances through the flexible linker between its two recognition domains (Schultz et al., 1989; Camier et al., 1990). A recent genome-wide study of TFIII occupancy using ChIP/Seq analyses also revealed rather even occupancy of Tfc1 (TFIIIC subunit) and Brf1 (TFIIIB subunit) on various tRNA genes (Nagarajavel et al., 2013). Therefore, alteration of the distance between the A and B boxes by intron insertion *per se* may cause only minor effects on transcription of tRNA genes in general.

On the other hand, there is one example to indicate the effect of intron insertion on TFIIIC recognition of the tRNA promoter elements. It was reported that tRNA genes act as insulators or nucleosome phasing barriers (Donze et al., 1999). For this unique activity, the promoter elements of the tRNA genes must be properly occupied by TFIIIC (Simms et al., 2008). A tRNA-ThrAGU gene, which does not have an intron, next to the yeast *HMR*-E locus acts as an insulator to block propagation of heterochromatin over this gene. On the other hand, a tRNA-LeuCAA gene, which has a 19 nt-long intron, cannot act as an insulator, and removal of the intron brings insulator activity to the tRNA gene (**Figure 5E**) (Donze and Kamakaka, 2001). Therefore, possession of the intron can affect the additional function of tRNA genes related to genome organization.

### **DIRECT EXAMINATION OF INTRON REQUIREMENT**

Essentiality of keeping introns in tRNA genes for organisms can be directly assessed by constructing strains without tRNA introns. As mentioned above, in archaea, one report demonstrated that the intron in the tRNA-TrpCCA gene is dispensable for viability of *H. volcanii* in spite of its critical function in 2′-*O*-methylation of the tRNA (Joardar et al., 2008). By expanding similar analyses in certain archaea, it may be tested whether every tRNA gene harboring an intron(s) on an archaeal genome needs to possess the intron(s). On the other hand, the situation is not so simple in eukaryotes. As mentioned previously, most isodecoder tRNAs are encoded by degenerated genes with the same or very similar sequences in eukaryotes. For example, even simple eukaryotes, such as yeasts, have more than 5 degenerated genes for each isodecoder tRNA on average. In *S. cerevisiae*, the number of degenerated genes encoding intron-containing isodecoders varies from one for tRNA-SerCGA to 10 for tRNA-LeuCAA, tRNA-PheGAA, and tRNA-ProUGG (Lowe and Chan, 2011). Thus, only one isodecoder, tRNA-SerCGA, was examined for dispensability of its intron in the 1980s despite the versatility of the yeast in chromosomal modification through homologous recombination (Ho and Abelson, 1988). Indeed, it was found that the intron in the tRNA-SerCGA gene can be removed from the yeast chromosome without any apparent growth defects. However, the sequence of tRNA-SerCGA is quite similar to that of tRNA-SerUGA; only three nucleotides are different between the two. Overproduction of tRNA-SerUGA can suppress lethality of deletion of the tRNA-SerCGA gene, and this suppression requires ncm5U modification at U34 of tRNA-SerUGA to expand its recognition repertoire at the wobble position (Johansson and Byström, 2004; Johansson et al., 2008). 
These facts suggest functional redundancy between tRNA-SerCGA and ncm5U-modified tRNA-SerUGA and that effects of intron removal from the tRNA-SerCGA gene may be masked by this redundancy. A stricter test was done recently with tRNA-TrpCCA, which is the only isodecoder to decode a single UGG codon for Trp and encoded by six genes with the same sequences by our group (Mori et al., 2011). Complete intron removal from all the tRNA-TrpCCA genes conducted by repetitive replacement of the chromosomal tRNA genes with an intronless allele caused minimal effects on yeast growth under various conditions. In co-culture experiments with the wild-type strain up to 50 generations, the intronless strain even showed slight advantage over the wild type in growth. Furthermore, intron removal neither affects the amount and aminoacylation level of tRNA-TrpCCA, nor the protein synthesis rate while minor changes in protein composition were detected in 2D-PAGE analysis (Mori et al., 2011). Because there exists no other tRNA to decode Trp codons on the yeast genome, the above results indicate that the intron of tRNA-TrpCCA is dispensable for viability of *S. cerevisiae*. Therefore, not strict but rather subtle effects to have introns in genes of certain isodecoder tRNAs may be advantageous for organisms during evolutional time scale under the natural conditions. Or, intron insertion to and intron loss from tRNA genes are rather neutral for organisms in evolution.

### **APPEARANCE OF INTRON-CONTAINING tRNA GENES AND ORIGIN OF INTRONS**

### **ARCHAEBACTERIAL INTRON: ESTABLISHMENT OF ARCHETYPAL INTRON-CONTAINING tRNA GENES AND MODERN INTRON-CONTAINING GENES**

There is considerable debate on the origin of tRNA introns. One group argues for the "intron-first" scenario: all the primordial tRNA genes had an intervening sequence to be removed in the anticodon loop or consisted of a set of two halves, and then a large part of such introns has been lost from the tRNA genes or the two halves have been joined on the genome during evolution (Di Giulio, 1992, 2006). The other argues for the "intron-late" scenario, in which introns were inserted after primordial tRNA genes had been established (Cavalier-Smith, 1991). From the "simple-to-complex" view, the hypothesis that tRNA is derived from a tandem duplication of a short stem-loop or ligation of two fragments may be reasonable. Indeed, the existence of split tRNAs, especially those that create multiple tRNAs by combination of 5′- and 3′-halves, fits this idea (Di Giulio, 2006). Phylogenetic analysis of archaeal tRNA genes also suggests that the 5′- and 3′-halves of tRNA genes, even those without introns, have evolved through different evolutionary paths (Fujishima et al., 2008). One critical problem with this hypothesis is that the functional units of a tRNA molecule, a decoding unit and an amino acid-accepting unit, do not correspond to these possible ancestral stem-loops (**Figure 1**).

On the other hand, comparison of closely related species supports the idea that some "modern" introns have been inserted into tRNA genes afterwards. Many tRNA genes encoding different isodecoders in *Thermoproteales* genomes possess introns with very similar sequences at canonical and non-canonical positions despite the fact that the tRNA bodies are mapped at different phylogenetic positions (Fujishima et al., 2010). This finding suggests that large-scale intron transposition occurred in this order of archaea after the species had been established. Although the mechanism of intron transposition is still unknown, such transposition is plausible because the BHB motif required at exon-intron junctions may be able to specify possible transposition targets on the chromosome. Thus, this type of intron gain by archaeal tRNA genes is supposed to proceed at the DNA level.

### **EUKARYOTIC INTRON: POSSIBILITY OF "RECENT" GAIN OF tRNA INTRON**

Eukaryotes are thought to have evolved from an ancestral prokaryote more closely related to archaebacteria than to eubacteria. Most modern eukaryotic species might therefore have lost the archetypal intron-containing tRNA genes that may still be found in some archaeal genomes. The spectrum of intron-harboring tRNA genes in eukaryotic genomes is not correlated with evolutionary lineage. Indeed, certain eukaryotic species possess intron-containing genes for a certain isodecoder tRNA while other species in the same clade do not. For example, the yeast *Lachancea thermotolerans* in *Saccharomycetaceae* has three intron-containing tRNA-AlaUGC genes and three intron-containing tRNA-LeuCAG genes, while other *Saccharomycetaceae* yeasts have only intron-less genes for these isodecoders (**Figure 6A**). From sequence analysis, the intron-containing tRNA-LeuCAG genes seem to have emerged through codon switching by a U34C mutation from tRNA-LeuUAG, which is mostly encoded by intron-containing genes in the *Saccharomycetaceae* yeasts. Thus, the unique appearance of an intron in genes for a certain isodecoder tRNA does not necessarily mean intron gain by the tRNA genes during evolution. On the other hand, for tRNA-AlaUGC, there are no tRNA paralogs encoded by intron-containing genes in the *Saccharomycetaceae* yeasts. Mature parts of this isodecoder from *L. thermotolerans* are highly homologous to those of the same isodecoder from other related yeasts. Thus, tRNA-AlaUGC genes in *L. thermotolerans* probably acquired their intron during evolution. Another unique piece of evidence for gain of a modern intron by tRNA genes is tRNA-GlyUCC in *Kluyveromyces lactis*. In *Saccharomycetaceae*, only *K. lactis* has two tRNA-GlyUCC genes with a eukaryotic intron. Surprisingly, the sequence of mature tRNA-GlyUCC from *K. lactis* is less homologous to those of the same isodecoder from other yeasts, including those from the most closely related genera such as *Lachancea* and *Eremothecium* (**Figure 6B**), but more homologous to tRNA-GluUUC from a eubacterium, *Lactobacillus plantarum*, where no protein-spliced intron exists in tRNA genes (**Figure 6C**). These observations strongly suggest that a bacterial tRNA-GluUUC gene was horizontally transferred to the genome of *K. lactis* or its close ancestor and was converted into tRNA-GlyUCC by a U35C mutation. Then, the tRNA-GlyUCC gene seems to have acquired an intron. Thus, some, if not all, of the introns in eukaryotic tRNA genes are suggested to have modern origins. There are also many opposite cases, in which only one species lacks the intron of a certain isodecoder tRNA while the same isodecoder of other related species is encoded by intron-containing genes (see tRNA-PheGAA in **Figure 6A**), suggesting intron loss from tRNA genes during evolution.
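The two anticodon changes invoked above (U34C turning tRNA-LeuUAG into tRNA-LeuCAG, and U35C turning tRNA-GluUUC into tRNA-GlyUCC) can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, anticodons are written 5′ to 3′ for positions 34 to 36, the cognate codon is derived by strict Watson-Crick pairing without any wobble rules, and the small amino acid table assumes the standard genetic code.

```python
# Watson-Crick partners for unmodified RNA bases (wobble pairing is ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Only the four codons needed here, taken from the standard genetic code.
CODON_TO_AMINO_ACID = {"GAA": "Glu", "GGA": "Gly", "CUA": "Leu", "CUG": "Leu"}


def mutate_anticodon(anticodon: str, position: int, new_base: str) -> str:
    """Replace one base of an anticodon written 5'->3' (tRNA positions 34-36)."""
    if len(anticodon) != 3 or not 34 <= position <= 36:
        raise ValueError("expected a 3-nt anticodon and a position from 34 to 36")
    index = position - 34
    return anticodon[:index] + new_base + anticodon[index + 1:]


def cognate_codon(anticodon: str) -> str:
    """Codon (5'->3') that pairs antiparallel with the anticodon, base by base."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


if __name__ == "__main__":
    for name, anticodon, position in (("tRNA-Glu", "UUC", 35), ("tRNA-Leu", "UAG", 34)):
        mutant = mutate_anticodon(anticodon, position, "C")
        before = cognate_codon(anticodon)
        after = cognate_codon(mutant)
        print(name, anticodon, "->", mutant, "| codon", before,
              CODON_TO_AMINO_ACID[before], "->", after, CODON_TO_AMINO_ACID[after])
```

Under these assumptions, U35C changes the decoded amino acid from Glu to Gly, whereas U34C leaves the tRNA within the Leu codon family, consistent with the description of the latter as codon switching.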

Although it is easy to postulate ways of intron loss from tRNA genes during evolution, how can a eukaryotic tRNA gene gain such a modern intron at the canonical position? Unlike the case of archaea, sequences of exon-intron junctions in eukaryotic pre-tRNAs do not show clear structural characteristics, as mentioned above. Therefore, it is difficult to imagine mechanisms by which a DNA fragment corresponding to a tRNA intron is transposed to another position on a chromosome. Another possibility is that an intron is inserted into a tRNA gene via a reverse-transcribed intermediate. It is to be noted that the anticodon loop, into which eukaryotic tRNA introns are mostly inserted, is often the target of tRNA cleavage enzymes. To recognize a codon on mRNAs, the anticodon loop must be accessible to macromolecules, while the D- and TΨC-loops interact with each other via base-pairing to form a tight structure. Thus, the anticodon loop is the Achilles' heel of the tRNA, exposed to undesired cleavage. Well-known examples are so-called ribotoxins, such as colicin E5 and PrrC endonucleases produced by *Escherichia coli*, which cleave the anticodon loop of certain tRNAs to kill surrounding competitor bacteria and to kill the producing cell itself when it is infected by a deadly pathogen, a bacteriophage, respectively (Kaufmann, 2000). Similar anticodon nucleases are also found in eukaryotes. The γ-subunit of zymocin, produced by *K. lactis*, enters *S. cerevisiae* cells and cleaves tRNA-GluUUC if U34 of the anticodon is modified to mcm5s2U (Lu et al., 2005). Eukaryotic cells also have tRNA cleavage enzymes whose targets are tRNAs in their own cells, such as Rny1 in yeast and angiogenin in mammals (Thompson and Parker, 2009; Yamasaki et al., 2009). tRNA cleavage activities of these endonucleases are regulated by cellular stresses, and the resulting tRNA fragments are used to suppress translation (Ivanov et al., 2011; Luhtala and Parker, 2012). Interestingly, the cleavage catalyzed by zymocin cannot be healed by the tRNA ligase of *S. cerevisiae*, while heterologous expression of *A. thaliana* tRNA ligase can rescue *S. cerevisiae* from the deleterious effect of zymocin (Nandakumar et al., 2008). Probably, these endonucleases cleave the anticodon loop in a way that lets the resulting RNA termini escape repair by the endogenous tRNA ligase. Thus, the termini may provide initiation sites from which an RNA-dependent RNA polymerase extends the 5′-tRNA halves with unidentified templates, and then eventually the extended 3′-terminus may be ligated with the 5′-terminus of the 3′-half by tRNA ligase if these ends are recognized by the healing-sealing activity of tRNA ligase (Schwer et al., 2004; Nandakumar et al., 2008). Even though such cleavage, elongation and healing might occur in the cytoplasm, the resulting tRNA derivatives might be re-imported into the nucleus using the tRNA import machinery (Shaheen and Hopper, 2005; Takano et al., 2005). Thus, the tRNA derivatives can be subjected to reverse transcription and integration into the genome in the nucleus. Alternatively, if the cleavage occurs in the nucleus, reverse transcriptase may directly elongate the 5′-fragment.

**FIGURE 6 | Distribution of intron-containing tRNA genes in yeast genomes.** Distribution of intron-containing genes in 6 representative yeast genomes (*Saccharomyces cerevisiae*, *Candida glabrata*, *Lachancea thermotolerans*, *Kluyveromyces lactis*, *Eremothecium gossypii*, and *Debaryomyces hansenii*) is summarized in the anticodon table **(A)**. As illustrated on the right, the existence of intron-containing (orange) or intronless (blue) tRNA genes for each anticodon position is color-coded. Anticodon positions to which no tRNA genes are assigned on the genome and those corresponding to termination codons are shown in white and black, respectively. The phylogenetic tree of these yeasts shown in **(B)** was drawn according to Génolevures Consortium (2009). Among these yeasts, *D. hansenii* does not belong to *Saccharomycetaceae* and is used as an outgroup. **(C)** Sequence comparison of tRNA-GlyUCC from *D. hansenii* (*Dha*), *S. cerevisiae* (*Sce*), *C. glabrata* (*Cgl*), *L. thermotolerans* (*Lth*), *E. gossypii* (*Ego*), and *K. lactis* (*Kla*), and tRNA-GluUUC from *Lactobacillus plantarum* (*Lpl*) and *Clostridium tetani* (*Cte*). Nucleotides identical to those of *K. lactis* tRNA-GlyUCC (marked with a frame) are shown in blue, and anticodons are highlighted with orange shadow.

Both loss and gain of tRNA introns seem to have occurred during various stages of eukaryotic evolution. In addition to such events, eukaryotic genomes require another layer of gene arrangements to settle their tRNA gene repertoires. On a eukaryotic genome, the genes encoding a given isodecoder tRNA tend either all to have an intron or all to lack one. Even in cases where an isodecoder tRNA is encoded by a mixture of intron-containing and intron-less genes, the intron-containing and intron-less genes are supposed to have different origins. For example, human tRNA-LeuCAA is encoded by a total of seven genes: one intron-containing gene on Chr I, four intron-containing genes on Chr VI, one intron-less gene on Chr I, and one intron-less gene on Chr XI. Sequence comparison revealed that the intron-less gene on Chr I is more related to tRNA-Met in dog and mouse than to the human intron-containing tRNA-LeuCAA genes, while that on Chr XI is more related to tRNA-LeuUAG in human and zebrafish. Therefore, sequences of tRNA genes with the same origin are supposed to be equalized, including their intronic part. This is also confirmed among species. If sequences of tRNA-TrpCCA genes are compared among fungal genomes, those from the same organism are clustered (Mori et al., 2011). Intergenic conversion between highly homologous tRNA genes for an isodecoder may contribute to this sequence equalization. Indeed, tRNA genes of the same isodecoder have been known for years as hot spots of intergenic gene conversion, through which mutations spread among them (Amstutz et al., 1985; Heyer et al., 1986). Although tRNA genes are scattered across the yeast chromosomes, they are located close together around the nucleolus in *S. cerevisiae* and near centromeres in *Schizosaccharomyces pombe*, which has only three chromosomes, and this intra-nuclear localization is driven by interaction between TFIIIC and condensin (Thompson et al., 2003; Haeusler et al., 2008; Iwasaki et al., 2010). Such spatial positioning of tRNA genes in the nucleus is supposed to help efficient equalization of tRNA sequences through intergenic gene conversion. We still do not know what determines the balance between domination of intron-containing genes and that of intron-less genes. As described before, there seems to be no essential difference in function between tRNA molecules derived from intron-containing and intron-less genes. It might be possible that domination of intron-containing or intron-less tRNA genes for an isodecoder tRNA simply results from stochastic change between the two states during evolution: if one of the two types occupies all the tRNA loci, no change is possible until a new and very rare event of intron loss or intron gain occurs.
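The stochastic argument above can be illustrated with a toy simulation. The sketch below is not taken from the cited studies; it assumes that gene conversion is unbiased (any locus is equally likely to act as donor or recipient, and an intron is copied as readily as its absence), and the function names, locus number and trial count are arbitrary choices made for illustration only.

```python
import random


def convert_until_fixed(n_loci, n_with_intron, rng):
    """Apply random, unbiased gene conversion until all loci share one state.

    Returns the final number of intron-containing loci (0 or n_loci).
    """
    count = n_with_intron
    while 0 < count < n_loci:
        # Pick a donor locus uniformly at random.
        donor_has_intron = rng.random() < count / n_loci
        # Pick a different recipient locus uniformly among the remaining ones.
        remaining_with_intron = count - 1 if donor_has_intron else count
        recipient_has_intron = rng.random() < remaining_with_intron / (n_loci - 1)
        # The recipient copies the donor; the count changes only if they differed.
        if donor_has_intron and not recipient_has_intron:
            count += 1
        elif recipient_has_intron and not donor_has_intron:
            count -= 1
    return count


def intron_fixation_fraction(n_loci, n_with_intron, trials=2000, seed=0):
    """Fraction of simulated lineages in which every locus ends up with an intron."""
    rng = random.Random(seed)
    fixed = sum(
        convert_until_fixed(n_loci, n_with_intron, rng) == n_loci
        for _ in range(trials)
    )
    return fixed / trials


if __name__ == "__main__":
    # Six loci, three of which start with an intron (illustrative numbers only).
    print(intron_fixation_fraction(6, 3))
```

In this sketch, the all-intron and all-intronless states are absorbing: once either is reached, conversion alone cannot change it, which mirrors the statement that a rare intron-gain or intron-loss event is needed to leave such a state. With unbiased conversion, the expected fraction of lineages fixing the intron-containing state equals the starting proportion of intron-containing loci.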

Here, I have gone through recent progress related to tRNA introns found in both archaea and eukaryotes. The life of a tRNA molecule transcribed from intron-containing genes has been studied for decades from various points of view, and has provided various interesting findings in bioinformatics, in molecular biology, in cell biology, and even in human pathology. However, accumulating information from experimental and computational analyses has brought us more questions to be solved, especially those related to eukaryotic tRNA introns and their splicing. We still do not understand completely how the eukaryotic splicing endonuclease decodes hidden motifs of splice sites in eukaryotic pre-tRNAs. What kinds of substrates other than pre-tRNAs does the splicing machinery process in archaea and eukaryotes? Why can tRNA introns be spliced in different intracellular compartments among eukaryotes? Are there any other physiological roles of tRNA introns in the life of tRNA, or as genetic elements on the chromosomes? What are the real selective pressures, if any, to keep introns in tRNA genes? Most importantly, what are the origins of modern tRNA introns, and how have they appeared and disappeared in archaeal and eukaryotic genomes during evolution? Some of these questions may not be answered easily, but our quest for the answers through the life of tRNAs goes on.

### **ACKNOWLEDGMENTS**

This work was supported by Grant-in-aid for Scientific Research from the Ministry of Education, Culture, Sports, Science, and Technology of Japan, and by a special grant from University of Hyogo.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 March 2014; accepted: 20 June 2014; published online: 10 July 2014. Citation: Yoshihisa T (2014) Handling tRNA introns, archaeal way and eukaryotic way. Front. Genet. 5:213. doi: 10.3389/fgene.2014.00213*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Yoshihisa. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 01 May 2014 doi: 10.3389/fgene.2014.00109

## Losing the stem-loop structure from metazoan mitochondrial tRNAs and co-evolution of interacting factors

### *Yoh-ichi Watanabe<sup>1</sup>\*, Takuma Suematsu<sup>1†</sup> and Takashi Ohtsuki<sup>2</sup>\**

*<sup>1</sup> Department of Biomedical Chemistry, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan <sup>2</sup> Department of Biotechnology, Okayama University, Okayama, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Hyouta Himeno, Hirosaki University, Japan*

*Kozo Tomita, Biomedical Research Institute – National Institute of Advanced Industrial Science and Technology, Japan*

### *\*Correspondence:*

*Yoh-ichi Watanabe, Department of Biomedical Chemistry, Graduate School of Medicine, The University of Tokyo, Hongo, Bunkyo-ku, Tokyo 113-0033, Japan*

*e-mail: ywatanab@m.u-tokyo.ac.jp; Takashi Ohtsuki, Department of Biotechnology, Okayama University, 3-1-1 Tsushimanaka, Okayama 700-8530, Japan*

*e-mail: ohtsuk@okayama-u.ac.jp*

### *†Present address:*

*Takuma Suematsu, Department of Molecular and Cell Biology, Boston University, 72 East Concord Street, Boston, MA 02118, USA*

Conventional tRNAs have highly conserved sequences, four-armed cloverleaf secondary structures, and L-shaped tertiary structures. However, metazoan mitochondrial tRNAs contain several exceptional structures. Almost all tRNAsSer for AGY/N codons lack the D-arm. Furthermore, in some nematodes, no four-armed cloverleaf-type tRNAs are present: two tRNAsSer without the D-arm and 20 tRNAs without the T-arm are found. Previously, we showed that in nematode mitochondria, an extra elongation factor Tu (EF-Tu) has evolved to support interaction with tRNAs lacking the T-arm, the arm that interacts with C-terminal domain 3 in conventional EF-Tu. Recent mitochondrial genome analyses have suggested that tRNAs without the T-arm are also present in metazoan lineages other than nematodes. Furthermore, even more simplified tRNAs are predicted in some lineages. In this review, we discuss mitochondrial tRNAs with divergent structures, as well as protein factors, including EF-Tu, that support the function of truncated metazoan mitochondrial tRNAs.

**Keywords: mitochondrial tRNA, D-arm, T-arm, tRNA nucleotidyltransferase, aminoacyl-tRNA synthetase, elongation factor Tu, ribosome**

### **INTRODUCTION**

As discussed in other articles in this special issue, conventional tRNAs are highly conserved: they have a four-armed cloverleaf secondary structure and L-shaped tertiary structure (**Figures 1A,B**; Jühling et al., 2009). However, some tRNAs encoded in mitochondrial DNA, particularly in metazoan (multi-cellular animal) mitochondria, have diverged from standard form tRNAs in a variety of ways. In this review, we focus on mitochondrial tRNAs (mt tRNAs) lacking either the dihydrouridine arm (D-arm) or the ribothymidine arm (T-arm; **Figures 1C,D**). The function of tRNA is to help decode mRNA into protein. tRNA collaborates with a variety of proteins from post-transcription to decoding in ribosomes. The unique characteristics of factors interacting with such shrunken tRNAs have been uncovered over the past several decades. In this review, these factors and their evolution will also be discussed.

### **SHRUNKEN mt tRNAs**

### **tRNAs LACKING THE D-arm**

In the 1970s, D-arm-lacking tRNAs were at first identified as non-tRNA molecules (a putative equivalent to cytoplasmic 5S ribosomal RNA) because of their short length (Dubin and Friend, 1972; Dubin et al., 1974; Baer and Dubin, 1980). Since the identification of genes in mitochondrial DNA, these truncated tRNAs have been proposed to be functional (Arcari and Brownlee, 1980; de Bruijn et al., 1980). Virtually all metazoan mitochondria have at least one D-arm-lacking tRNA, namely tRNASer(GCU/UCU) for AGY or AGN codons (**Figure 1C**; Jühling et al., 2009). In addition, some animal mitochondria have additional D-arm-lacking tRNAs, such as tRNASer(UGA) in chromadorean nematodes (Okimoto et al., 1992), and tRNACys in some vertebrates (Seutin et al., 1994).

The secondary structures of D-arm-lacking tRNAs have been classified into several groups based on the base pairs in T and anticodon stems (Steinberg et al., 1994). Experimental verifications of the secondary and tertiary structures have been performed using chemical modification, limited enzymatic digestion, nuclear magnetic resonance (NMR) spectroscopy, and native gel electrophoresis (de Bruijn and Klug, 1983; Ueda et al., 1985; Hayashi et al., 1997; Frazer-Abel and Hagerman, 1999, 2004; Ohtsuki et al., 2002a). Although these results support the coaxial stacking of T and acceptor stems (Frazer-Abel and Hagerman, 2008), this idea is somewhat controversial, possibly because of the structural flexibility of the D-arm-lacking tRNAs themselves (Frazer-Abel and Hagerman, 2008). The shortest possible D-arm-lacking tRNA was suggested to be 54 nt long (Steinberg et al., 1994).

Aminoacylation and EF-Tu binding of D-arm-lacking tRNAs have been demonstrated (Ueda et al., 1985, 1992; Yokogawa et al., 1989, 2000; Kumazawa et al., 1991; Watanabe et al., 1994; Hanada et al., 2001; Shimada et al., 2001; Ohtsuki et al., 2002a; Chimnaronk et al., 2005; Suematsu et al., 2005). Translation with an unmodified D-arm-lacking mammalian mt tRNASer(GCU) derivative with a GAA anticodon has been investigated using a cell-free system (Hanada et al., 2001). The ability of the tRNASer(GCU) derivative to form a ternary complex with EF-Tu/GTP is similar to that of the tRNASer(UGA) derivative, which has a four-armed cloverleaf secondary structure (Hanada et al., 2001). However, the amount of peptide produced using the tRNASer(GCU) derivative is lower than that produced using the tRNASer(UGA) derivative (Hanada et al., 2001).

### **tRNAs LACKING THE T-arm**

T-arm-lacking tRNA genes were identified in nematode mitochondria by Wolstenholme et al. (1987). Since then, T-arm-lacking tRNA genes have also been found in mitochondrial DNA in other lineages of animals, such as Arthropoda (Masta, 2000). They have a TV replacement loop instead of a variable loop and T-arm (**Figure 1D**). Isolation of nematode mt T-arm-lacking tRNAs has been performed with preparative gel electrophoresis and/or solid-phase DNA affinity purification (Watanabe et al., 1994, 1997; Ohtsuki et al., 1998; Sakurai et al., 2005a,b).

Basically, intramolecular interactions in T-arm-lacking tRNAs are thought to be identical to those in conventional tRNAs, except for interactions between the T- and D-arms, because of conservation around the D-arm and the similarity of the 5′ region in the TV replacement loop to the variable loop region (Watanabe et al., 1994; Wolstenholme et al., 1994). By analogy with cloverleaf-type tRNA, an L-shape-like structure of T-arm-lacking tRNAs has been proposed (Watanabe et al., 1994; Wolstenholme et al., 1994). The hypothesized interactions have been supported by chemical and enzymatic probing and NMR spectroscopy (**Figure 1E**; Watanabe et al., 1994; Ohtsuki et al., 1998).

The aminoacylation capacity of T-arm-lacking nematode mt tRNAs has been demonstrated with mt extract or recombinant enzymes (Watanabe et al., 1994; Ohtsuki et al., 1996; Chihade et al., 1998; Lovato et al., 2001; Sakurai et al., 2005b; Arita et al., 2006). Furthermore, the formation of a ternary complex of a T-arm-lacking aminoacyl tRNA/EF-Tu/GTP has been detected (Ohtsuki et al., 2001; Arita et al., 2006).

In nematode mt T-arm-lacking tRNAs sequenced at the RNA level, the 1-methyladenosine at position 9 is strictly conserved (Watanabe et al., 1994, 1997; Sakurai et al., 2005a,b; see also Ohtsuki and Watanabe, 2007). This modification helps maintain the tertiary structure of the tRNA, and also aids in efficient aminoacylation and formation of the ternary complex with EF-Tu/GTP (Sakurai et al., 2005b).

### **tRNAs POTENTIALLY LACKING BOTH D- AND T-arms**

As mentioned above, the shortest biochemically characterized tRNA is a 54-nt long mt tRNASer(UCU) from the nematode *Ascaris suum* (Watanabe et al., 1994). A computational survey of mitochondrial tRNA genes predicted the presence of tRNA genes lacking both D- and T-arms (Jühling et al., 2012a,b). More recently, after the submission of the abstract of this review, RT-PCR analyses using 5′- and 3′-RACE showed that such putative tRNA genes are indeed transcribed, and the transcripts even have a 3′ CCA sequence in the nematode *Romanomermis culicivorax* (Wende et al., 2014). Note that some tRNAs are imported from the cytoplasm into the mitochondria in some animals (Rubio et al., 2008), suggesting that imported tRNAs may function in place of mitochondrial-encoded putative tRNAs lacking both D- and T-arms. Thus, functional analysis of such extremely truncated putative tRNAs is critical.

### **FACTORS INTERACTING WITH tRNAs LACKING D- OR T-arms**

### **AMINOACYL tRNA SYNTHETASES**

Aminoacyl tRNA synthetases recognize a cognate tRNA and add an aminoacyl moiety to its 3′ end. The major recognition sites of the enzymes in tRNAs are the anticodon, a discriminator base at position 73, and the acceptor stem (Giegé et al., 1998). In fact, in the case of alanyl-tRNA synthetases, even the mitochondrial enzyme uses the acceptor stem as a major recognition site (Chihade et al., 1998). However, some of the enzymes also use the D-arm in tRNA [e.g., *Escherichia coli* isoleucyl-tRNA synthetase (Nureki et al., 1994)]. Whether the mitochondrial counterparts of such enzymes, encoded in the nuclear genome, still use D-arms as recognition sites is an interesting issue yet to be investigated. If so, even if a tRNA lost its T-arm, the enzyme could still add the aminoacyl moiety to the shrunken tRNA.

On the other hand, seryl-tRNA synthetase (SerRS) uses recognition sites other than the anticodon and acceptor stem. Bacterial SerRS recognizes the T-arm and the characteristic long variable arm in bacterial tRNASer (Asahara et al., 1994) using the N-terminal coiled–coil region (Biou et al., 1994). However, metazoan mt tRNASer has lost its long variable arm, and even the D-arm is absent in tRNASer(GCU/UCU). Thus, how mt SerRS recognizes mt tRNASer without its long variable arm is an interesting question. Earlier studies suggested that mammalian mt SerRS can recognize not only mt tRNASer but also bacterial tRNASer; however, bacterial SerRS could not recognize mt tRNASer (Kumazawa et al., 1991). Also, mammalian mt SerRS recognizes the T-loop of both D-arm-lacking tRNASer and cloverleaf-type tRNASer(UGA) without the long variable arm, and further requires a T-loop/D-loop interaction with tRNASer(UGA) (dual-mode recognition; Ueda et al., 1992; Shimada et al., 2001). The crystal structure of a mammalian mt SerRS, a model of it complexed with tRNA, and mutational analyses suggest that the N-terminal coiled–coil region, the distal helix, and the C-tail interact with the T-arm of mt tRNASer (Chimnaronk et al., 2005, **Figure 2**). Mutational analysis of mammalian mt SerRS showed that substitution of some of the residues in the N-terminal coiled–coil region (shown in stick model in **Figure 2B**) reduced the aminoacylation activities of either one of the two mt tRNAsSer or both, suggesting that these residues interact with the tRNA in the enzyme-tRNA complex (Chimnaronk et al., 2005, **Figure 2B**). To maintain these interactions, movement of the N-terminal coiled–coil region (shown as a red arrow) is expected (Chimnaronk et al., 2005, **Figure 2B**). Furthermore, mutational analysis of the enzyme suggested that, for the recognition of tRNASer(UGA), which has the T-loop/D-loop interaction, Arg24, Tyr28 and Arg32 in the distal helix and Lys93 and Arg122 on the N-terminal coiled–coil region are important (**Figure 2B**). On the other hand, for the D-arm-lacking tRNASer(GCU), Arg24 and Arg32 flanking the distal helix, and Arg129 on the N-terminal helical region are crucial (**Figure 2B**). Thus, with the dual-mode recognition, mammalian mt SerRS recognizes two tRNASer isoacceptors that have different secondary structures using distinct sets of residues.

Interestingly, in chromadorean nematode mitochondria, two tRNAsSer have lost their D-arms, but the remaining 20 tRNAs have lost their T-arms. It is of interest whether nematode mt SerRS also recognizes the T-arm. If so, that may explain why only tRNAsSer have kept their T-arms in nematode mitochondria. However, it seems reasonable that the secondary structure of tRNA is governed by EF-Tu and ribosomes rather than aminoacyl-tRNA synthetase (ARS) during evolution, because the recognition mode of each ARS is constrained by only one or a few tRNAs, while that of EF-Tu or the ribosome is constrained by over 20 tRNAs.

### **EF-Tu**

The EF-Tu/GTP complex delivers aminoacyl-tRNAs to the A-site in ribosomes. Bacterial EF-Tu binds to the aminoacyl moiety, a part of the acceptor stem, and the T-arm (Nissen et al., 1995; **Figure 3A**), and it cannot bind to a tRNA analog missing the T-arm (Rudinger et al., 1994). In mitochondria, nuclear-encoded EF-Tu exists. Due to the presence of aminoacyl-tRNAs missing the T-arm in nematode mitochondria (Watanabe et al., 1994; Ohtsuki et al., 1996), the EF-Tu counterpart in nematode mitochondria should use an alternative binding mode for T-arm-lacking tRNAs. In fact, nematode mitochondria have two EF-Tu homologs, and only one of them (EF-Tu1) binds to T-arm-lacking tRNAs (Ohtsuki et al., 2001; Arita et al., 2006).

(Chimnaronk et al., 2005). Two subunits of the enzyme are shown in green (monomer 1), and gray (monomer 2), respectively. Distal helix of monomer 1 and C-tail of monomer 2 which are mitochondrial-specific extensions and possibly interact with the tRNA, are shown in yellow and pink, respectively. In the tRNA, D-arm, variable loop, and T-arm are shown in purple, sky blue,

*Caenorhabditis elegans* mt EF-Tu1 has an approximately 60-amino-acid extension at the C-terminus (domain 3′) that is essential for binding to T-arm-lacking tRNAs (Ohtsuki et al., 2001; Sakurai et al., 2006). The extension likely interacts with the D-arm region of T-arm-lacking tRNAs through positive charges in Lys residues (Sakurai et al., 2006; **Figures 3B,C**). Interestingly, *C. elegans* mt EF-Tu1 lacks binding ability to cloverleaf-type tRNAs (Ohtsuki et al., 2001), which are missing in *C. elegans* mitochondria (Okimoto et al., 1992). In another lineage of nematode, *Trichinella*, mitochondria have T-arm-lacking tRNAs, cloverleaf-type tRNAs, and D-arm-lacking tRNAsSer (Lavrov and Brown, 2001). *Trichinella* also has two EF-Tu homologs, and one of them (EF-Tu1) binds to both T-arm-lacking tRNAs and cloverleaf-type tRNAs (Arita et al., 2006). Interestingly, *Trichinella* mt EF-Tu1 has a 41-residue C-terminal extension, shorter than that in *C. elegans* mitochondria (Arita et al., 2006). A mutant of *C. elegans* mt EF-Tu1 with a 13-residue deletion at the C-terminus (43-residue extension left) cannot bind to T-arm-lacking tRNAs (Sakurai et al., 2006). Although the detailed tRNA binding mode of *Trichinella* mt EF-Tu1 has not been elucidated, it could be similar but not identical to that of *C. elegans* mt EF-Tu1. Note that *C. elegans* EF-Tu1 binds only to T-arm-lacking tRNAs, while *Trichinella* EF-Tu1 binds to T-arm-lacking tRNA, D-arm-lacking tRNA, and cloverleaf tRNA (Ohtsuki et al., 2001; Arita et al., 2006).

coiled–coil region and the C-tail distal helix with the T-arm of the tRNA. The interactions are inferred by mutational analysis of the enzyme (Chimnaronk et al., 2005). Residues in N-terminal coiled–coil region and distal helix involved in the interaction with tRNA are shown in stick model (Chimnaronk et al., 2005).

Nematode mitochondria have another EF-Tu homolog, EF-Tu2. Nematode mt EF-Tu2 has a short (about 15-residue) C-terminal extension, and it binds to D-arm-lacking tRNASer, but not to T-arm-lacking or cloverleaf-type tRNAs (Ohtsuki et al., 2001; Suematsu et al., 2005; Arita et al., 2006). *C. elegans* mt EF-Tu2 binds to a region of the T-arm exposed due to the missing interaction between the T-arm and D-arm (**Figures 3D,E**; Suematsu et al., 2005). Interestingly, D-arm-lacking tRNAs in this species have anticodons for serine, and EF-Tu2 binds only to Ser-tRNA and accepts neither Ala-tRNA nor Val-tRNA with the same backbone (Ohtsuki et al., 2002b; Arita et al., 2006). This is likely due to the evolution of the aminoacyl-moiety binding pocket in EF-Tu2 to specialize in binding the seryl moiety because of a unique adaptation in Ser-tRNA (Sato et al., 2006).

More recently, mitochondrial T-arm-lacking tRNA genes have been discovered in some taxa other than the nematodes, such as Arthropoda (Masta, 2000). Interestingly, there are two mt EF-Tu genes in arthropods (Ohtsuki and Watanabe, 2007). The functional differences between the two EF-Tu homologs in these species should be elucidated, and this project is in progress in our laboratory.

### **FUTURE PERSPECTIVES**

Besides ARS and EF-Tu, other factors such as tRNA terminal nucleotidyltransferases (CCA enzymes) and ribosomes could be interesting in terms of their interactions with shrunken mt tRNAs.

After the trimming of 3' extra sequences, CCA sequences are added to the 3' ends of pre-tRNAs by CCA enzymes (Deutscher, 1983). The bacterial CCA enzyme binds to the acceptor-T helix of pre-tRNA (Tomita et al., 2004), and thus a T-arm-lacking tRNA precursor is not a good substrate for the bacterial CCA enzyme (Tomari et al., 2002). On the other hand, the chromadorean nematode *C. elegans* has two genes for CCA enzymes, one of which encodes a putative mt CCA enzyme (Tomari et al., 2002). The recombinant (putative) mt CCA enzyme of *C. elegans* can recognize and efficiently add a CCA sequence, not only to conventional cloverleaf tRNAs, but also to T- or D-arm-lacking tRNAs (Tomari et al., 2002). It would be interesting to know how the nematode mt enzyme recognizes T-arm-lacking mt tRNAs efficiently.

During translation, conventional bacterial tRNA interacts with several sites in the ribosome (reviewed by Khade and Joseph, 2010). In bacterial ribosomes, the T-arm of P-site tRNA interacts with ribosomal protein L5 (Korostelev et al., 2006; Selmer et al., 2006). At the A-site, the T-arm of tRNA interacts with ribosomal protein L16 (Selmer et al., 2006; Voorhees et al., 2009).

The residues in the A-site finger (helix 69) of 23S rRNA interact with the D-arm of tRNA at the A- and P-sites (Korostelev et al., 2006; Selmer et al., 2006; Voorhees et al., 2009). In a structural model of *C. elegans* mt rRNAs, the corresponding rRNA positions exist (Mears et al., 2002). At the E-site, residues in the T- and D-loops interact with ribosomal protein L1 and helices 76, 77, and 78 in 23S rRNA (e.g., the L1 stalk; Korostelev et al., 2006; Selmer et al., 2006). Interestingly, the corresponding regions in nematode mt rRNA are missing (Mears et al., 2002). In general, mitochondrial ribosomal proteins are enlarged compared to their counterparts in bacteria (Koc et al., 2000; Suzuki et al., 2001), suggesting that mitochondrial ribosomal proteins may have alternate binding modes for truncated tRNAs. Further structural analysis of metazoan mt ribosomes (Sharma et al., 2003; Greber et al., 2014) would be helpful to reveal the detailed interaction mode between mt ribosomes and shrunken mt tRNAs.

Structural alterations of metazoan mt tRNAs have been compensated for by several interacting factors. The mode of compensation by these factors may explain why metazoan tRNAs have undergone truncation during evolution. Further investigation into the detailed binding modes between shrunken tRNAs and the interacting factors that co-evolved with them will shed light on how truncated tRNAs evolved.

### **ACKNOWLEDGMENTS**

Our studies described in this review were supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), the Japan Society for the Promotion of Science (JSPS), the Kurata Memorial Hitachi Science and Technology Foundation and a grant from the Program for Promotion of Basic and Applied Researches for Innovations in Bio-oriented Industry (BRAIN), the Science and Technology Research Promotion Program for Agriculture, Forestry, Fisheries and Food Industry. We thank Prof. K. Watanabe and Prof. K. Kita for their continued support, and Dr. Sarin Chimnaronk for the docking model coordinates of mammalian mt SerRS/tRNA.

### **REFERENCES**


T-armless tRNAs in nematode mitochondria. *Biochem. J.* 399, 249–256. doi: 10.1042/BJ20060781


by using a hybridization assay method. *Nucleic Acids Res.* 17, 2623–2638. doi: 10.1093/nar/17.7.2623

Yokogawa, T., Shimada, N., Takeuchi, N., Benkowski, L., Suzuki, T., Omori, A., et al. (2000). Characterization and tRNA recognition of mammalian mitochondrial seryl-tRNA synthetase. *J. Biol. Chem.* 275, 19913–19920. doi: 10.1074/jbc.M908473199

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2014; accepted: 12 April 2014; published online: 01 May 2014.*

*Citation: Watanabe Y-i, Suematsu T and Ohtsuki T (2014) Losing the stem-loop structure from metazoan mitochondrial tRNAs and co-evolution of interacting factors. Front. Genet. 5:109. doi: 10.3389/fgene.2014.00109*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Watanabe, Suematsu and Ohtsuki. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# tmRNA-mediated trans-translation as the major ribosome rescue system in a bacterial cell

### *Hyouta Himeno\*, Daisuke Kurita and Akira Muto*

*Department of Biochemistry and Molecular Biology, Faculty of Agriculture and Life Science, Hirosaki University, Hirosaki, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Toshifumi Inada, Tohoku University, Japan*

*Yoshitaka Bessho, RIKEN SPring-8 Center, Japan*

### *\*Correspondence:*

*Hyouta Himeno, Department of Biochemistry and Molecular Biology, Faculty of Agriculture and Life Science, Hirosaki University, Hirosaki 036-8561, Japan email: himeno@cc.hirosaki-u.ac.jp*

Transfer messenger RNA (tmRNA; also known as 10Sa RNA or SsrA RNA) is a small RNA molecule that is conserved among bacteria. It has structural and functional similarities to tRNA: it has an upper half of the tRNA-like structure, its 5' end is processed by RNase P, it has typical tRNA-specific base modifications, it is aminoacylated with alanine, it binds to EF-Tu after aminoacylation and it enters the ribosome with EF-Tu and GTP. However, tmRNA lacks an anticodon, and instead it has a coding sequence for a short peptide called tag-peptide. An elaborate interplay of actions of tmRNA as both tRNA and mRNA with the help of a tmRNA-binding protein, SmpB, facilitates *trans*-translation, which produces a single polypeptide from two mRNA molecules. Initially alanyl-tmRNA in complex with EF-Tu and SmpB enters the vacant A-site of the stalled ribosome like aminoacyl-tRNA but without a codon–anticodon interaction, and subsequently truncated mRNA is replaced with the tag-encoding region of tmRNA. During these processes, not only tmRNA but also SmpB structurally and functionally mimics both tRNA and mRNA. Thus *trans*-translation rescues the stalled ribosome, thereby allowing recycling of the ribosome. Since the tag-peptide serves as a target of AAA+ proteases, the *trans*-translation products are preferentially degraded so that they do not accumulate in the cell. Although alternative rescue systems have recently been revealed, *trans*-translation is the only system that universally exists in bacteria. Furthermore, it is unique in that it employs a small RNA and that it prevents accumulation of non-functional proteins from truncated mRNA in the cell. It might play the major role in rescuing the stalled translation in the bacterial cell.

**Keywords: tmRNA, SmpB, ribosome,** *trans***-translation, molecular mimicry**

### **INTRODUCTION**

Translation often stalls in various situations in a cell, sometimes in a programmed fashion and other times unexpectedly. For example, translation of mRNA lacking a stop codon (non-stop mRNA) does not terminate efficiently because peptide release factor does not function. Thus, the cell should have a system to cope with such emergencies. However, little attention was given to this issue until the mid-1990s, and therefore the discovery of tmRNA was a big surprise. Initially the tRNA-like structure and function of tmRNA were elucidated. Both terminal regions of tmRNA can form a secondary structure resembling the upper half of the cloverleaf-like structure of tRNA (Komine et al., 1994; Ushida et al., 1994), which includes several tRNA-specific consensus sequences and base modifications (**Figure 1A**; Felden et al., 1997, 1998). Like that of tRNA, the 3' end of tmRNA can be aminoacylated with an amino acid (alanine) by an aminoacyl-tRNA synthetase (alanyl-tRNA synthetase; Komine et al., 1994; Ushida et al., 1994). Other tRNA-like functions, such as 5' processing by RNase P (Komine et al., 1994), binding to EF-Tu (Rudinger-Thirion et al., 1999; Barends et al., 2000, 2001; Hanawa-Suetsugu et al., 2001) and interaction with the 70S ribosome (Ushida et al., 1994; Komine et al., 1996; Tadaki et al., 1996), have also been revealed. Although it is about fivefold larger than tRNA, tmRNA has no apparent anticodon, making it difficult to clarify whether and how tmRNA is involved in translation. A few years later, it was found that tmRNA functions not only as tRNA but also as mRNA: a short peptide is encoded by the middle part of tmRNA (Tu et al., 1995), which is surrounded by four pseudoknot structures (**Figure 1A**; Nameki et al., 1999b,c). Intriguingly, these two functions cooperate, rather than being independent, to produce a chimeric polypeptide from two mRNAs: a C-terminally truncated polypeptide encoded by mRNA is fused to the tmRNA-encoded short peptide, with an alanine residue of unknown origin between them (Keiler et al., 1996; Muto et al., 1996; Himeno et al., 1997). This acrobatic translation involving co-translational mRNA swapping produces a single polypeptide from two mRNAs, and thus it has been called *trans*-translation (**Figure 2**; Muto et al., 1998). This system provides a stop codon to allow completion of translation of a non-stop mRNA and consequently recycling of the ribosome. In addition, *trans*-translation has been regarded as a quality control system that prevents non-functional polypeptides derived from truncated mRNAs from accumulating in the cell, as the tag-peptide consisting of the first alanine residue and the tmRNA-encoded short peptide, especially the sequence of the last four hydrophobic amino acids (ALAA), serves as the target for cellular ATP-dependent proteases (Gottesman et al., 1998; Herman et al., 1998; Flynn et al., 2001; Choy et al., 2007).

*Trans*-translation requires a tmRNA binding protein called SmpB in addition to canonical elongation factors (Karzai et al., 1999). SmpB consists of a globular domain and an unstructured C-terminal tail (Dong et al., 2002; Someya et al., 2003). It binds to the tRNA-like domain (TLD) of tmRNA to prevent tmRNA from degradation and enhance aminoacylation of tmRNA (Barends et al., 2001; Hanawa-Suetsugu et al., 2002; Shimizu and Ueda, 2002; Nameki et al., 2005). SmpB also plays a crucial role in the ribosomal process of *trans*-translation.

The *trans*-translation system is universally present in bacterial cells and is present in organelles of some eukaryotes but not in the cytoplasm of eukaryotes or archaebacteria. There is accumulating evidence indicating that the bacterial cell is equipped with additional systems to cope with stalled translation. Here, we review the current understanding of the molecular mechanism and the cellular functions of tmRNA-mediated *trans*-translation as well as other ribosome rescue systems.

### **MOLECULAR MECHANISM OF** *TRANS***-TRANSLATION**

An outline of the process of *trans*-translation is as follows (**Figure 3**): initially, tmRNA in complex with SmpB is aminoacylated with alanine by alanyl-tRNA synthetase. Ala-tmRNA enters the A-site of the stalled ribosome on a truncated mRNA to receive the nascent polypeptide from peptidyl-tRNA in the P-site. Then peptidyl-Ala-tmRNA translocates to the P-site, which exchanges the template from truncated mRNA to the tag-encoding region on tmRNA. This model reasonably explains the missing origin of the alanine residue connecting the truncated polypeptide encoded by mRNA with the tmRNA-encoded tag-peptide: it is derived from the alanine moiety aminoacylated to tmRNA. This model was proposed on the basis of the results of an *in vivo* study showing that truncated polypeptides fused at their C-termini to the tmRNA-encoded tag-peptide accumulate in the cell when a truncated mRNA is expressed (Keiler et al., 1996), and the model was supported by the results of an *in vitro* study showing that the tag-peptide is synthesized using *Escherichia coli* cell extract depending on the addition of poly(U) and on the aminoacylation capacity of tmRNA (Muto et al., 1996; Himeno et al., 1997). However, several questions have been raised. How does tmRNA find the stalled ribosome? How does tmRNA enter the A-site of the ribosome without an anticodon? How does the tag-encoding region of tmRNA substitute for truncated mRNA? How is the resuming point on tmRNA determined? SmpB has emerged as the key molecule to solve these questions.
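As a rough illustration of the outline above, the following Python sketch models *trans*-translation at the sequence level only. It is a toy model under stated assumptions, not a description of the experiments cited: the codon table is deliberately partial, the example mRNAs are hypothetical, the function names `translate` and `trans_translate` are invented for illustration, and the tag-encoding codons are assumed to be those commonly reported for *E. coli* tmRNA (resume codon GCA, tag ending in ALAA).

```python
# Toy sequence-level model of trans-translation (illustrative assumptions only).

# Partial codon table: just the codons used in this example.
CODON_TABLE = {
    "AUG": "M", "AAA": "K", "GAA": "E", "UUU": "F",
    "GCA": "A", "GCU": "A", "AAC": "N", "GAC": "D",
    "UAC": "Y", "UUA": "L",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Assumed tag-encoding region of E. coli tmRNA, from the resume codon (GCA)
# to its own stop codon. Treat this as an assumption to be checked.
TAG_CODONS = ["GCA", "AAC", "GAC", "GAA", "AAC", "UAC",
              "GCU", "UUA", "GCA", "GCU", "UAA"]
TAG_ORF = "".join(TAG_CODONS)


def translate(rna):
    """Translate in frame 0 and return (peptide, reached_stop_codon).

    Raises KeyError for codons missing from the partial table.
    """
    peptide = []
    usable_length = len(rna) - len(rna) % 3  # ignore a trailing partial codon
    for i in range(0, usable_length, 3):
        codon = rna[i:i + 3]
        if codon in STOP_CODONS:
            return "".join(peptide), True
        peptide.append(CODON_TABLE[codon])
    return "".join(peptide), False


def trans_translate(mrna):
    """Return the polypeptide made from an mRNA, rescuing non-stop mRNA.

    If the mRNA has an in-frame stop codon, translation ends normally.
    Otherwise the ribosome is treated as stalled at the 3' end: the nascent
    chain is transferred to the alanine carried by Ala-tmRNA, and translation
    resumes on the tag-encoding region until its stop codon.
    """
    nascent, terminated = translate(mrna)
    if terminated:
        return nascent
    tag, tag_terminated = translate(TAG_ORF)
    if not tag_terminated:
        raise ValueError("tag-encoding region must end with a stop codon")
    return nascent + "A" + tag  # "A" is the alanine aminoacylated to tmRNA


if __name__ == "__main__":
    print(trans_translate("AUGAAAGAAUUUUAA"))  # mRNA with a stop codon
    print(trans_translate("AUGAAAGAAUUU"))     # hypothetical non-stop mRNA
```

In this sketch the connecting alanine is added separately from the tag-encoded residues, mirroring the point that it derives from the aminoacylated tmRNA rather than from either template.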

Besides canonical translation factors and tmRNA, SmpB is the minimal requirement for *in vitro trans-*translation (Shimizu and Ueda, 2002; Takada et al., 2007; Kurita et al., 2012). SmpB has been thought to continue binding to tmRNA throughout the process of *trans*-translation (Ivanova et al., 2007). In a crystal structure of a complex of SmpB and a model RNA fragment corresponding to TLD of tmRNA, the globular domain of SmpB binds to TLD so that it compensates for the lack of the lower half of the L-form structure in tmRNA (**Figure 1B**; Gutmann et al., 2003; Bessho et al., 2007). Thus, TLD in complex with the globular domain of SmpB would structurally mimic a whole tRNA molecule. A directed hydroxyl radical probing study has revealed two SmpB binding sites in an *E. coli* ribosome, one in the A-site and the other in the P-site, both of which can be superimposed on the lower half of the tRNA molecules in the translating ribosome (Kurita et al., 2007). An additional mimicry of the upper half of tRNA by TLD could complete two whole translating tRNA mimics at the A-site and P-site so that their aminoacyl ends are oriented to the peptidyl-transferase center. The pattern of cleavage of 16S rRNA by hydroxyl radicals from the C-terminal tail residues has suggested two binding sites of the C-terminal tail of SmpB with two different modes of conformation in the ribosome in addition to the unstructured conformation in solution: an extended conformation from the A-site to the downstream tunnel along the mRNA path as an α-helical structure and a folded conformation around the mRNA path in the P-site (Kurita et al., 2007, 2010).

On the basis of these SmpB properties, the *trans*-translation process can be described in more detail (**Figure 3**). Ala-tmRNA·SmpB·EF-Tu·GTP enters the vacant A-site of the stalled ribosome. GTP hydrolysis induces a conformational change of the stalled complex to release EF-Tu·GDP, allowing accommodation of Ala-TLD·SmpB in the A-site. During this process, the C-terminal tail of SmpB interacts with the mRNA path extending towards the mRNA entry channel. Subsequently, Ala-TLD in the A-site receives the nascent polypeptide chain from peptidyl-tRNA in the P-site, and the resulting peptidyl-Ala-TLD·SmpB translocates from the A-site to the P-site. During this process, the C-terminal tail of SmpB dissociates from the mRNA entry channel and binds around the site of codon–anticodon interaction in the P-site with a change in its conformation from the extended structure to the folded structure, which in turn promotes release of mRNA from the ribosome. The conformational change of the C-terminal tail concomitant with translocation makes the A-site free, thereby allowing introduction of the resume codon of tmRNA into the decoding region. This model has been supported by results of structural studies of several *trans*-translation intermediates. Cryo-EM studies have revealed three kinds of intermediates in the pre-accommodated, accommodated, and translocated states (Kaur et al., 2006; Cheng et al., 2010; Fu et al., 2010; Weis et al., 2010a,b). In all of them, SmpB and TLD occupy the lower and upper halves, respectively, of the tRNA-binding sites. Although the C-terminal tail of SmpB has not been identified in these maps due to low resolution, its interaction with the mRNA path has clearly been shown in a crystal structure of a pre-accommodation state complex of *trans*-translation containing kirromycin (Neubauer et al., 2012). This model can explain why *trans*-translation preferentially occurs at the ribosome stalled on mRNA with a shorter 3' extension, which has been exemplified *in vitro* (Ivanova et al., 2004; Asano et al., 2005): the C-terminal tail of SmpB competes with the 3'-extension of mRNA for the mRNA entry channel. A chemical footprinting study has suggested that SmpB interacts with A1492, A1493, and G530 in 16S rRNA, which form the decoding region (Nonin-Lecomte et al., 2009). However, these nucleotides can be mutated without loss of either peptidyl-transferase or GTP hydrolytic activity in *trans*-translation, indicating much lower significance of these decoding nucleotides for *trans*-translation (Miller et al., 2011). In a crystal structure of a *Thermus thermophilus* pre-accommodation state complex of *trans*-translation, G530 stacks with a residue (Y127) around the start of the C-terminal tail of SmpB (Neubauer et al., 2012). Recently, Miller and Buskirk (2014) have found that the corresponding residue in *E. coli* SmpB (H136) has a crucial role in GTP hydrolysis, leading to the proposal that stacking of this residue with G530 triggers GTP hydrolysis.

[Figure caption fragment] … *trans*-translation. GTP hydrolysis by EF-Tu may induce a conformation change of the stalled complex, allowing the C-terminal tail of SmpB to interact with the mRNA path with an extended structure. After peptidyl-transfer, … process, the extended C-terminal tail somehow folds. Then the resume codon of tmRNA is set on the decoding region. SmpB and the tag-encoding region are shown in red and blue, respectively.

EF-G promotes release of truncated mRNA from the stalled ribosome after peptidyl transfer to Ala-tmRNA, suggesting the presence of a canonical translocation-like step in the *trans*-translation process (Ivanova et al., 2005). During this event, SmpB must pass through the barrier between the A-site and P-site, and tmRNA must enter the inside of the mRNA entry channel to set the resume codon in the decoding region. Consistently, in a cryo-EM map of a translocational intermediate complex containing EF-G and fusidic acid, both bridge B1a, which serves as a barrier between the A-site and P-site, and the latch, which is usually closed by the interaction between the head (helix 34) and body (G530 region) to form the mRNA tunnel, are open (Ramrath et al., 2012). Precise positioning of the resume codon at the decoding region requires the sequence just upstream of the resume codon at positions –6 to +1 (Williams et al., 1999; Lee et al., 2001), and this sequence is recognized by the globular domain of SmpB (Konno et al., 2007), suggesting that SmpB bridges two separate domains of tmRNA in the P-site to determine the resume codon for tag-translation, presumably just after translocation. This is in agreement with cryo-EM maps of translocated state complexes of *trans*-translation (**Figure 1B**; Fu et al., 2010; Weis et al., 2010b). Another study has suggested the importance of the C-terminal tail of SmpB and its interaction with the start GCA codon on tmRNA for determination of the start point of tag-peptide translation (Camenares et al., 2013). Taken together, the results suggest that the interaction between tmRNA and SmpB is more important for resume point determination than the interaction between tmRNA and the ribosome. It should be noted that some kinds of aminoglycosides that bind the decoding region shift the resume point of tag-translation (Takahashi et al., 2003; Konno et al., 2004).

Although several examples of the potential molecular mimicry of tRNA by a translation factor have been reported, SmpB is the sole molecule that has been assumed to mimic the dynamic behavior of tRNA throughout all of the classical and hybrid states, A/T, A/A, A/P, P/P, and P/E, in the translating ribosome. The ribosomal protein S1, which has been identified as a tmRNA-binding protein (Wower et al., 2000), is not thought to participate in the early stage of *trans*-translation, at least until the first translocation (Qi et al., 2007; Takada et al., 2007).

### **REQUIREMENT OF MRNA CLEAVAGE FOR** *TRANS***-TRANSLATION**

Because the tag-sequence serves as a degradation signal, *trans*-translation products are hardly detected in the cell or its extract, although they accumulate and thus become detectable when the tag-encoding sequence of tmRNA is engineered. It has long been believed that *trans*-translation occurs around the 3'-end of truncated mRNA lacking a stop codon (non-stop mRNA) in the stalled ribosome since publication of the results of an earlier *in vivo* study using an artificial mRNA (Keiler et al., 1996). Non-stop mRNA can be produced either unexpectedly or in a programmed fashion, and a similar situation can also arise when the normal termination codon is read through in the presence of a non-sense suppressor tRNA (Ueda et al., 2002) or a miscoding drug (Abo et al., 2002). Proteomic analyses have identified endogenous *trans*-translation products from various bacterial sources, indicating that *trans*-translation preferentially occurs at specific sites of specific mRNAs (Roche and Sauer, 2001; Collier et al., 2002; Fujihara et al., 2002; Hong et al., 2007; Barends et al., 2010). Consequently, several situations that promote *trans*-translation in the middle of mRNA have been focused on: translational pausing due to a rare codon (Roche and Sauer, 1999), an inefficient termination codon (Roche and Sauer, 2001; Hayes et al., 2002a; Sunohara et al., 2002) and a programmed stalling sequence (Collier et al., 2004) induce *trans*-translation. However, whether *trans*-translation actually occurs in the middle of mRNA without cleavage has been controversial. It has been found that a bacterial toxin, RelE, cleaves an mRNA specifically at the A-site in the stalled ribosome (Pedersen et al., 2003). RelE is usually inactivated by an antitoxin, RelB, and it is activated by degradation of RelB by Lon protease upon amino acid starvation. Yet, the finding of an A-site-specific endoribonuclease has supported the idea that mRNA cleavage is the prerequisite for *trans*-translation. Ribosome stalling induces cleavage of mRNA at the A-site even in a cell lacking RelE or several other endoribonucleases (Hayes and Sauer, 2003; Sunohara et al., 2004a,b; Li et al., 2008), indicating the involvement of an as-yet-unidentified ribonuclease or the ribosome itself in mRNA cleavage. It has also been reported that the 3'–5' exoribonuclease activity of RNase II is an important prerequisite for A-site-specific mRNA cleavage (Garza-Sánchez et al., 2009). Besides RelE, several kinds of ribosome-dependent endoribonucleases, each having a specific antitoxin, have been identified in *E. coli* (Feng et al., 2013).

*In vitro* studies have clearly shown that *trans*-translation can occur in the middle of mRNA, although the efficiency of *trans*-translation is drastically decreased with increasing length of the 3'-extension from the stalled position (Ivanova et al., 2004; Asano et al., 2005). This is in agreement with results of structural studies showing that the C-terminal tail of SmpB occupies the mRNA path in the early stage of *trans*-translation so that it competes with the 3'-extension of mRNA (Kurita et al., 2010; Neubauer et al., 2012).

Proteomic studies have shown that *trans*-translation preferentially occurs at the proline codon just preceding the stop codon (Hayes et al., 2002a,b). Asp–Pro, Glu–Pro, Pro–Pro, Ile–Pro, and Val–Pro are favorable C-terminal dipeptides for *trans*-translation, suggesting an additional importance of the penultimate residue. In fact, Asp–Pro and Pro–Pro are unusually underrepresented at the C-terminus in most bacterial proteins. Due to the structural irregularity of proline having a secondary amine instead of the primary amine, peptidyl-Pro-tRNAPro in the A-site would interfere with the access of peptide release factor (Janssen and Hayes, 2009), rather than that of the Ala-tmRNA·SmpB·EF-Tu·GTP complex. Consistently, limited amounts of aminoacyl-tRNA or release factor induce *trans*-translation, indicating competition of *trans*-translation with aminoacyl-tRNA and release factor for sense and non-sense codons, respectively, in the stalled ribosome (Ivanova et al., 2004; Asano et al., 2005; Li et al., 2007). Consecutive proline residues also affect peptidyl-transfer during the elongation process to cause translational arrest, which can be rescued by EF-P (Doerfel et al., 2013; Ude et al., 2013).
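The dipeptide preference described above lends itself to a simple sequence check. The minimal sketch below flags protein sequences (one-letter code) whose C-terminal dipeptide is one of those listed as favorable for *trans*-translation. The function name and the example sequences are hypothetical, and a match should be read only as a hint, since tagging also depends on the other factors discussed in this section.

```python
# C-terminal dipeptides listed in the text as favorable for trans-translation:
# Asp-Pro, Glu-Pro, Pro-Pro, Ile-Pro and Val-Pro (one-letter code).
FAVORABLE_C_TERMINAL_DIPEPTIDES = {"DP", "EP", "PP", "IP", "VP"}


def has_favorable_c_terminal_dipeptide(protein):
    """Return True if the last two residues match a favorable dipeptide.

    `protein` is a one-letter amino acid sequence without a trailing stop
    symbol. Sequences shorter than two residues never match.
    """
    if len(protein) < 2:
        return False
    return protein[-2:].upper() in FAVORABLE_C_TERMINAL_DIPEPTIDES


if __name__ == "__main__":
    # Hypothetical example sequences, not taken from any database.
    for seq in ["MKTAYDP", "MKTAYPP", "MKTAYAL"]:
        print(seq, has_favorable_c_terminal_dipeptide(seq))
```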

These *in vivo* and *in vitro* studies together have settled the controversy shown above: translation can stall even in the middle of mRNA in some situations, and this kind of stalled ribosome can be a potential target for *trans*-translation, although it would substantially occur only after cleavage of mRNA around the A-site by RelE or another as-yet-unidentified ribonuclease with the help of a 3'–5' exoribonuclease, RNase II.

### *TRANS***-TRANSLATION AS QUALITY CONTROL SYSTEMS OF PROTEIN AND MRNA**

As described above, the most significant role of *trans*-translation is to promote recycling of stalled ribosomes in the cell. *Trans*-translation is thought to have additional roles as quality control systems of protein and mRNA.

Most *trans*-translation products would be non-functional, and thus their accumulation might be deleterious for the cell. To avoid this situation, the tag-peptide and in turn the *trans*-translation products are promptly degraded in the cell by cytoplasmic ATP-dependent proteases (AAA+ proteases), including ClpXP, ClpAP, Lon and FtsH, and the periplasmic protease Tsp (Prc; **Figure 4**). ClpX or ClpA recognizes the C-terminal ALAA sequence of the tag-peptide to unfold the *trans*-translation products in an ATP-dependent fashion for degradation by its partner ClpP peptidase (Gottesman et al., 1998). The tag-peptide specifically binds to a protein, SspB, to increase its affinity to ClpX, and consequently ClpXP is thought to play the dominant role in degradation of *trans*-translation products at least in β- and γ-proteobacteria (Flynn et al., 2001) and perhaps in α-proteobacteria (Lessner et al., 2007). Lon participates in degradation of *trans*-translation products under stressful conditions (Choy et al., 2007). FtsH is anchored to the cytoplasmic side of the inner membrane to degrade the membrane-associated *trans*-translation products (Herman et al., 1998). The C-terminal ALAA sequence of the tag-peptide required for ClpXP and ClpAP is highly conserved among bacteria except *Mycoplasma*, in which the tag-peptide terminates with AFA instead of ALAA. This can be explained by the absence of ClpXP, ClpAP, and Tsp in *Mycoplasma* (Gur and Sauer, 2008; Ge and Karzai, 2009).

While *trans*-translation is induced by cleavage of mRNA in the stalled ribosome as described above, *trans*-translation also promotes further degradation of non-stop mRNA (Yamamoto et al., 2003). *Trans*-translation would expose the 3' end of non-stop mRNA sequestered by the stalled ribosome, facilitating the access of 3'–5' exoribonuclease for degradation of non-stop mRNA. It should be a great advantage for the cell, considering that ribosome stalling would be repeated until non-stop mRNA is degraded, even if the stalled ribosome at the 3' end of the polysome is rescued by *trans*-translation. RNase R is a likely candidate for such an exoribonuclease (Oussenko et al., 2005; Mehta et al., 2006; Richards et al., 2006; Ge et al., 2011). *E. coli* RNase R makes a complex with tmRNA and SmpB (Karzai and Sauer, 2001) via direct interaction with SmpB (Liang and Deutscher, 2010). It is induced under stress conditions in *E. coli* (Chen and Deutscher, 2005) and is involved in cell cycle-regulated degradation of tmRNA in *Caulobacter crescentus* (Hong et al., 2005). *E. coli* RNase R is acetylated in exponential phase, resulting in its exponential phase-specific degradation via tighter binding to tmRNA·SmpB (Liang et al., 2011).

### **PHYSIOLOGICAL SIGNIFICANCE OF** *TRANS***-TRANSLATION**

The apparent universality of the *trans*-translation system among bacteria suggests some biological significance of this system. Indeed, it is essential for some bacteria including *Neisseria gonorrhoeae* (Huang et al., 2000), *Mycoplasma genitalium* (Hutchison et al., 1999), *Haemophilus influenzae* (Akerley et al., 2002), *Helicobacter pylori* (Thibonnier et al., 2008), and *Shigella flexneri* (Ramadoss et al., 2013a), and its depletion causes a wide variety of disorders. Since its lack causes avirulence of some infectious bacteria, the *trans*-translation system has been focused on as an effective target for antibiotics (Shi et al., 2011; Ramadoss et al., 2013b).

Many of these defective phenotypes are caused by a defect in the *trans*-translation reaction rather than degradation of the *trans*-translation products (Keiler, 2008). This suggests that ribosome recycling is more important for the cell than preventing accumulation of non-functional proteins. Upon starvation of amino acids, supply of amino acids from *trans*-translation products should become important for new protein synthesis (Pedersen et al., 2003; Li et al., 2008).

*Trans*-translation is often employed for regulation of gene expression. In *E. coli*, tmRNA-mediated *trans*-translation targets mRNA for LacI, a repressor of the lactose operon, to accelerate its degradation upon glucose depletion, leading to derepression of the *lac* operon (Abo et al., 2000). In *Bacillus subtilis*, *trans*-translation occurs around the catabolite responsive element (*cre*) sequence, a binding site of the repressor protein catabolite control protein A (CcpA), within the coding region of several mRNAs including TreP mRNA for trehalose phosphorylase (Fujihara et al., 2002). Binding of CcpA to the *cre* sequence would induce a transcriptional roadblock to produce truncated mRNA (Ujiie et al., 2009). In *C. crescentus*, the cell cycle (Keiler and Shapiro, 2003a) and the initiation of DNA replication (Keiler and Shapiro, 2003b; Hong et al., 2007) are controlled by *trans*-translation.

There is accumulating evidence for increased importance of tmRNA under stressful conditions, such as high or low temperature (Oh and Apirion, 1991; Muto et al., 2000; Shin and Price, 2007), nutrient starvation (Oh and Apirion, 1991; Okan et al., 2006; Abe et al., 2008), ethanol treatment (Muto et al., 2000), cadmium treatment (Muto et al., 2000), and acid exposure (Thibonnier et al., 2008). Stresses might increase the frequency of aberrant translation in cells, which can be rescued by *trans*-translation. Consistently, the total amount of *trans*-translation products increases under stressful conditions (Fujihara et al., 2002). Perhaps in response to the increased requirement of the *trans*-translation system, the intracellular level of tmRNA or SmpB increases with an increase in stress in some bacteria (Muto et al., 2000; Palecková et al., 2007; Rezzonico et al., 2007).

The *trans*-translation system sometimes regulates other stress response systems, possibly via expression of a global regulator. For example, depletion of tmRNA induces heat shock response in *E. coli* (Munavar et al., 2005). The expression level of the sigma factor RpoS (sigma S), which controls the expression of a series of genes involved in general stress response, is positively controlled by *trans*-translation in *E. coli* (Ranquet and Gottesman, 2007). The involvement of *trans*-translation in the extracellular stress-response pathway via another sigma factor, RpoE (sigma E), has also been suggested (Ono et al., 2009). Other stress-related proteins including the molecular chaperone DnaK are regulated by *trans*-translation in streptomycetes (Barends et al., 2010). Interestingly, the expression of ArfA, an alternative ribosome rescue system (described in a later subsection), is regulated by *trans*-translation (Chadani et al., 2011a; Garza-Sánchez et al., 2011).

### **EVOLUTIONARY ASPECTS OF** *TRANS***-TRANSLATION**

Although it is not essential for viability in most bacteria, tmRNA is present universally in the bacterial kingdom and in some plastids or mitochondria of some protists. The tRNA-like secondary structure of TLD as well as several tRNA-specific consensus sequences is highly conserved except for the deformed D-arm structure, which has an extensive interaction with SmpB. In addition, the third base-pair position of the amino acid acceptor stem is completely conserved as G–U, which serves as a potent identity determinant for recognition by alanyl-tRNA synthetase (AlaRS). Alanine might not be an absolute prerequisite for *trans*-translation as the amino acid aminoacylated to tmRNA as exemplified by Nameki et al. (1999a). However, AlaRS might be the most preferable aminoacyl-tRNA synthetase for tmRNA, considering the unique recognition mode of AlaRS depending largely on the acceptor stem instead of the anticodon. The tRNA-like structure would also be significant for recognition by EF-Tu and RNase P as well as for ribosome binding. In contrast to the high degree of conservation of TLD, there is variation in the pseudoknot-rich region (Nameki et al., 1999b; Williams, 2002). Plastid tmRNA has fewer or no pseudoknot structures (Gueneau de Novoa and Williams, 2004). Indeed, at least the last three of four pseudoknots of *E. coli* tmRNA are dispensable for *trans*-translation *in vitro* (Nameki et al., 2000), although they participate in proper folding and processing of tmRNA (Wower et al., 2004). TLD and the pseudoknot-rich region are linked by a long degenerated helix, which protrudes from TLD in a direction corresponding to that of the long variable arm of class II tRNA (**Figure 1B**; Bessho et al., 2007). This direction should be conserved within the constraints of the tRNA-like dynamic behavior of tmRNA in the limited space of the ribosome, although the sequence of the connector helix is less conserved.

In some lineages of α-proteobacteria, β-proteobacteria, cyanobacteria, and mitochondria of lower eukaryotes, tmRNA is separated into two pieces, a 5′-coding piece typically including only two pseudoknots (PK1 and PK2) with the tag-encoding region in between and a 3′-amino acid acceptor piece, and the two pieces join together by base-pairing to form a tRNA-like structure similar to that in one-piece tmRNA (**Figure 1A**; Keiler et al., 2000; Williams, 2002; Sharkady and Williams, 2004). The gene for two-piece tmRNA is circularly permuted (Keiler et al., 2000; Williams, 2002; Mao et al., 2009), and the permuted precursor might be processed into a mature two-piece tmRNA probably with the help of RNase P and tRNase Z. A similar processing strategy has been found in the circularly permuted tRNA gene in some primitive eukaryotes or archaebacteria (Soma et al., 2007). The two-piece tmRNA in *C. crescentus* belonging to α-proteobacteria has been shown to actually function in *trans*-translation (Keiler et al., 2000). Either one-piece or two-piece tmRNA is present in the mitochondrial genome of some groups of protists (jakobids; Jacob et al., 2004). They lack a tag-encoding sequence as well as pseudoknots, arguing against their capacity for bacterial-type *trans*-translation.

SmpB together with tmRNA is universally present in bacteria. Plastid tmRNA is encoded by the plastid genome, while plastid SmpB is encoded by the nuclear genome and it is imported from the cytoplasm (Jacob et al., 2005). Up to now, there has been no report about mitochondrial SmpB. Both tmRNA and SmpB should have been required at the birth of *trans*-translation. The gene for tmRNA might have been formed by insertion of something into a tRNA<sup>Ala</sup> gene. In contrast, no one can envisage the origin of SmpB because of the absence of its homologue.

### **DIVERSITY OF RESCUE SYSTEMS OF STALLED TRANSLATION**

As described above, *trans*-translation targets various kinds of translational pausing due to a rare codon, an inefficient termination codon or a programmed stalling sequence, but after cleavage of mRNA. The bacterial cell has alternative mechanisms to rescue the stalled ribosome (**Figure 4**; Himeno, 2010).

Peptidyl-tRNA hydrolase (Pth) has an activity to hydrolyze the linkage between tRNA and the nascent polypeptide of peptidyl-tRNA after it drops off from the ribosome. Drop-off is enhanced by RRF alone, RRF together with RF3 (Heurgué-Hamard et al., 1998; Gong et al., 2007) or RRF, IF3, and EF-G (Singh et al., 2008). Drop-off was initially assumed to occur in the earlier cycles of elongation. This seems reasonable considering that a longer nascent polypeptide chain would prevent release of peptidyl-tRNA from the peptide channel of the translating ribosome. However, drop-off has been shown to efficiently occur at the 3′ end of non-stop mRNA in the absence of tmRNA (Kuroha et al., 2009). Overexpression of tmRNA suppresses the temperature-sensitive phenotype of Pth (Singh and Varshney, 2004), suggesting either that Pth contributes not only to hydrolyzing the dropped-off peptidyl-tRNA but also to rescuing the stalled ribosome, or that *trans*-translation can substitute for spontaneous or factor-promoted drop-off and the subsequent peptidyl-tRNA hydrolysis by Pth.

It has recently been found that two proteins, ArfA (YhdL) and YaeJ (ArfB), facilitate rescue of the stalled ribosome. A single knockout of either *E. coli* ArfA or tmRNA is viable, whereas a double knockout is lethal, explaining why tmRNA is not essential in many bacteria (Chadani et al., 2010). The ribosome rescue activity of ArfA was initially shown using *E. coli* crude extract (Chadani et al., 2010). However, ArfA alone does not have an activity to hydrolyze peptidyl-tRNA in the P-site possibly due to the absence of a typical GGQ catalytic motif, and it requires the help of RF-2 (Chadani et al., 2012; Shimizu, 2012). RF-2 usually acts as a UAA or UGA codon-dependent release factor, while it serves as a stop codon-independent release factor in the presence of ArfA. Intriguingly, translation of ArfA protein is stalled near the 3′-terminus of its mRNA due to cleavage by RNase III, and consequently ArfA is usually degraded via the *trans*-translation system. Only when the cellular *trans*-translation activity becomes diminished is C-terminally truncated but active ArfA synthesized via spontaneous drop-off or ArfA-mediated release of the nascent polypeptide (Garza-Sánchez et al., 2011; Chadani et al., 2011a). Thus, the ArfA-mediated ribosome rescue system is considered to be a backup system for *trans*-translation. YaeJ has also been shown to rescue the ribosomes stalled at the 3′ end of a non-stop mRNA *in vitro* (Handa et al., 2011) and *in vivo* (Chadani et al., 2011b). A double knockout of *E. coli* ArfA and tmRNA is lethal as described above, whereas overexpression of YaeJ makes it viable (Chadani et al., 2011a). Unlike ArfA, YaeJ alone has an activity to hydrolyze peptidyl-tRNA in the P-site of the stalled ribosome, as expected from the similar sequence and structure to those of the catalytic domain of bacterial class I release factor having a GGQ motif.
YaeJ is likely to act as a stop codon-independent peptide chain release factor since it lacks a stop codon-recognition domain, which is replaced by a C-terminal basic-residue-rich extension that might be unstructured in solution (Handa et al., 2010). In a crystal structure of *E. coli* YaeJ in complex with the stalled ribosome from *T. thermophilus*, the C-terminal extension of YaeJ, like that of SmpB, binds along the mRNA path of the stalled ribosome extending to the downstream mRNA tunnel with an α-helical structure (Gagnon et al., 2012). ArfA as well as Ala-tmRNA·SmpB·EF-Tu(GTP) does not favor the long 3′ extension of mRNA from the decoding region upon entrance to the stalled ribosome, while YaeJ is less sensitive (Shimizu, 2012). Thus bacterial cells are equipped with multiple systems to cope with stalled translation, and they are therefore often still viable even when they lose the *trans*-translation system. Judging from phenotypes of factor-depleted cells, the *trans*-translation system must be the primary ribosome rescue system.

There are some reports about stress-specific ribosome rescue systems. The heat shock protein HSP15 has been shown to bind to the dissociated 50S subunit with a nascent protein (Korber et al., 2000). Upon exposure to a high temperature, a fraction of translating ribosomes might prematurely be dissociated into subunits, although peptidyl-tRNA remains bound to the dissociated 50S subunit unless the nascent peptide is short. In this 50S subunit, HSP15 fixes peptidyl-tRNA at the P-site to make the A-site free presumably for entrance of a peptide release factor (Jiang et al., 2009). High intracellular magnesium ion concentration or low temperature causes translational arrest after defective translocation. It also promotes release of a GTPase, EF4 (LepA), which is usually stored in the cell membrane, into the cytoplasm to rescue the translational arrest by back translocation (Pech et al., 2011).

Translation often stalls at a specific site on an mRNA in a programmed fashion. As in the case of ArfA expression described above (Chadani et al., 2011a; Garza-Sánchez et al., 2011), translational arrest is sometimes used for repression of gene expression via *trans*-translation. On the other hand, a stalled ribosome often prevents access of rescue machineries to keep translational arrest for regulation of gene expression. The *E. coli* tryptophanase (*tna*) operon is induced by tryptophan via the translational arrest of the leader peptide (TnaC) by inhibiting hydrolysis of peptidyl-tRNA<sup>Pro</sup> by RF2 (Yanofsky, 2007). This stalled ribosome is not rescued by *trans*-translation in the presence of tryptophan, although it is rescued slowly by RRF and RF3, leading to drop-off (Gong et al., 2007). The ribosome is also stalled at an internal proline codon of *E. coli secM* mRNA, which up-regulates the translation of the downstream SecA-encoding sequence presumably by disrupting the secondary structure that sequesters the ribosome binding site (Muto et al., 2006). This translational arrest is caused by inefficient peptidyl-transfer of Pro-tRNA<sup>Pro</sup> in the A-site to the nascent peptidyl-tRNA in the P-site, which inhibits entrance of Ala-tmRNA to the A-site and the A-site specific cleavage of mRNA (Garza-Sánchez et al., 2006). The translation of *B. subtilis yidC* mRNA is regulated by translational arrest at multiple sites on the upstream *mifM* mRNA (Chiba et al., 2009; Chiba and Ito, 2012). Puromycin is less reactive to this translational arrest, suggesting that ribosome rescue machineries including Ala-tmRNA·SmpB·EF-Tu(GTP) are also less accessible to the A-site of this stalled ribosome. Consecutive proline codons cause a translational arrest due to inefficient peptidyl-transfer between peptidyl-(Pro)<sub>n</sub>-tRNA in the P-site and Pro-tRNA in the A-site (Doerfel et al., 2013; Ude et al., 2013).
In this case, the A-site is occupied by Pro-tRNA<sup>Pro</sup>, and in turn Ala-tmRNA·SmpB·EF-Tu(GTP), ArfA or YaeJ would fail to access this stalled ribosome. Instead, the peptidyltransferase center is modulated by EF-P binding to the region between the P-site and E-site to resume peptidyl-transfer (Blaha et al., 2009). Pro–Pro–Pro and Gly–Pro–Pro arrest sequences, which can be rescued by EF-P, are often found in bacterial genes, and they might be programmed for regulation of gene expression.

Pth is essential for bacteria and is widely distributed among the other domains. While the *trans*-translation system universally exists in bacteria, YaeJ is distributed among Gram-negative bacteria and ArfA shows more limited distribution within enterobacteria. EF-P is conserved among bacteria and its homologue (a/eIF5A) is universally present in archaea and eukaryotes. EF4 is universally conserved among bacteria. Neither tmRNA nor SmpB is present in the cytoplasm of eukaryotes, where a complex of Dom34p (Pelota) and the GTP-binding protein Hbs1 promotes subunit dissociation of the stalled ribosome and drop-off of peptidyl-tRNA (Shoemaker et al., 2010) in concert with an ATPase, ABCE1 (Pisareva et al., 2011). The Dom34p·Hbs1 complex is structurally similar to the eRF1·eRF3 complex or the aminoacyl-tRNA·EF-Tu complex (Chen et al., 2010), although the GGQ motif is absent in Dom34, and peptidyl-tRNA hydrolysis is assumed to be catalyzed by Pth after drop-off. In mitochondria, two protein factors partially similar to the bacterial class I release factor, ICT1 (a homologue of YaeJ) and C12orf65, both lacking a stop codon-recognition domain while retaining the catalytic GGQ motif, participate in ribosome rescue (Handa et al., 2010; Richter et al., 2010; Kogure et al., 2012, 2014). ICT1 (YaeJ), but not C12orf65, has an insertion of an α-helix in the GGQ domain, and thus ICT1 is less similar to class I release factor. C12orf65 has been found in very limited bacteria. Collectively, various kinds of release factor homologues, YaeJ, ICT1, C12orf65, ArfA/RF2, Dom34p (Pelota), and Hbs1, have been found to participate in ribosome rescue. The *trans*-translation system is unique in that it employs a small RNA and that it prevents accumulation of non-functional proteins from truncated mRNA in the cell.

### **ACKNOWLEDGMENTS**

The authors are grateful to all those who have been involved in this work. This work was supported by a grant-in-aid for Scientific Research from the Ministry of Education, Science, Sports and Culture, Japan to Hyouta Himeno, grants-in-aid for Scientific Research (B) and (C) from the Japan Society for the Promotion of Science to Akira Muto and grants-in-aid for Young Scientists from the Japan Society for the Promotion of Science to Daisuke Kurita.

### **REFERENCES**


mitochondrial disease-related protein C12orf65. *Proteins* 80, 2629–2642. doi: 10.1002/prot.24152


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 January 2014; accepted: 15 March 2014; published online: 07 April 2014. Citation: Himeno H, Kurita D and Muto A (2014) tmRNA-mediated trans-translation as the major ribosome rescue system in a bacterial cell. Front. Genet. 5:66. doi: 10.3389/fgene.2014.00066*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Himeno, Kurita and Muto. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# tRNADB-CE: tRNA gene database well-timed in the era of big sequence data

### *Takashi Abe<sup>1</sup>\*, Hachiro Inokuchi<sup>2</sup>, Yuko Yamada<sup>2</sup>, Akira Muto<sup>3</sup>, Yuki Iwasaki<sup>2</sup> and Toshimichi Ikemura<sup>2</sup>*

*<sup>1</sup> Graduate School of Science and Technology, Niigata University, Niigata, Japan*

*<sup>2</sup> Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga, Japan*

*<sup>3</sup> Faculty of Agriculture and Life Science, Hirosaki University, Hirosaki, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Akio Kanai, Keio University, Japan; Rintaro Saito, University of California, USA*

### *\*Correspondence:*

*Takashi Abe, Graduate School of Science and Technology, Niigata University, 8050, Ikarashi 2-no-cho, Nishi-ku, Niigata 950-2181, Japan e-mail: takaabe@ie.niigata-u.ac.jp*

The tRNA gene database curated by experts "tRNADB-CE" (http://trna.ie.niigata-u.ac.jp) was constructed by analyzing 1,966 complete and 5,272 draft genomes of prokaryotes, 171 viruses', 121 chloroplasts', and 12 eukaryotes' genomes plus fragment sequences obtained by metagenome studies of environmental samples. A total of 595,115 tRNA genes, twice the number compiled previously, have been registered, for which sequence, cloverleaf structure, and results of sequence-similarity and oligonucleotide-pattern searches can be browsed. To provide collective knowledge with help from experts in tRNA research, we added a column for registering comments on each tRNA. By grouping bacterial tRNAs with an identical sequence, we have found high phylogenetic preservation of tRNA sequences, especially at the phylum level. Since many species-unknown tRNAs from metagenomic sequences have sequences identical to those found in species-known prokaryotes, the identical sequence group (ISG) can provide phylogenetic markers to investigate the microbial community in an environmental ecosystem. This strategy can be applied to a huge amount of short sequences obtained from next-generation sequencers, showing that tRNADB-CE is a well-timed database in the era of big sequence data. We also discuss how a batch-learning self-organizing map with oligonucleotide composition is useful for efficient knowledge discovery from big sequence data.

**Keywords: tRNA, database, metagenome, phylogenetic marker, BLSOM, big data**

### **INTRODUCTION**

After completion of genome sequencing, tRNA genes on each genome have been predicted using computer programs such as tRNAscan-SE (Lowe and Eddy, 1997), ARAGORN (Laslett and Canback, 2004), and tRNAfinder (Kinouchi and Kurokawa, 2006) and included in the flat file of the sequence data for registration to the International DNA Data Banks (DDBJ/ENA/GenBank). However, in approximately 5% of the completely sequenced genomes, the annotation data of tRNA genes have not been included. There are also many cases in which the tRNA itself is noted but the important information on its anticodon (and thus its amino acid) has not been added. A compilation of tRNA sequences and tRNA genes was first published by Sprinzl et al. (1978) and has since been updated and reconstructed as a database that includes modified nucleosides (tRNAdb<sup>1</sup>; Jühling et al., 2009). Using tRNAscan-SE, the Genomic tRNA Database (GtRNAdb<sup>2</sup>) has been constructed for complete and near complete genomes (Chan and Lowe, 2009). In addition, a database for comprehensive listing of modified nucleotides (The RNA Modification Database<sup>3</sup>; Limbach et al., 1994) and one for modification pathways (MODOMICS<sup>4</sup>; Machnicka et al., 2012) have been reported. Mamit-tRNA<sup>5</sup> is a compilation of mammalian mitochondrial tRNA genes (Pütz et al., 2007). Another characteristic tRNA database is SPLITSdb<sup>6</sup> constructed by Keio Group (Sugahara et al., 2006, 2008), which has compiled tRNA sequences in archaeal and primitive eukaryotic species to promote further studies of tRNA evolution and processing; 671 intron-containing and 12 split tRNAs have been registered. In the present Review Article, we introduce the current status of our "tRNA gene database curated by experts (tRNADB-CE)" and compare it with other tRNA databases to explain its characteristics.

In accord with the remarkable progress of DNA sequencing technology, a vast quantity of partially sequenced draft genome sequences and metagenomic sequences obtained from a wide variety of environmental and clinical samples have been compiled in DDBJ/ENA/GenBank. However, no information on tRNA genes has been added to draft and metagenomic sequences in DDBJ/ENA/GenBank, GtRNAdb, or tRNAdb. Metagenomic sequences have attracted broad industrial interest; even short sequences obtained with new-generation sequencers contain a large number of full-length tRNAs because tRNAs are short. A search for tRNA genes in metagenomic sequences may provide a new strategy for clarifying the microbial community in an environmental sample. This prediction is supported by two findings: a group of tRNAs with an identical sequence obtained from species-known prokaryotes belongs primarily to a specific phylogenetic group, and such phylotype-specific tRNA sequences have also been found in species-unknown metagenomic sequences. These tRNA genes should therefore be good phylogenetic markers for studying phylotype composition in an environmental ecosystem. It should be noted here that, when analyzing a dataset composed only of short fragmental sequences (e.g., 100 nt), genomic sequences other than tRNA genes appear difficult to use properly for phylogenetic assignments, except for dominant genomes for which contiguous sequences with a sufficient length for constructing reliable phylogenetic trees can be obtained.

<sup>1</sup>http://trnadb.bioinf.uni-leipzig.de/

<sup>2</sup>http://lowelab.ucsc.edu/GtRNAdb/

<sup>3</sup>http://mods.rna.albany.edu/home

<sup>4</sup>http://modomics.genesilico.pl/

<sup>5</sup>http://mamit-trna.u-strasbg.fr/

<sup>6</sup>http://splits.iab.keio.ac.jp/splitsdb/

### **GENOMIC SEQUENCES ANALYZED AND METHODS OF SEARCH FOR tRNA GENES**

Because one important role of tRNADB-CE is the use of tRNA genes as phylogenetic markers in metagenomic studies, we have mainly focused on microbial genomes. The following sources of DNA sequences have been used for constructing the present version of tRNADB-CE: 1966 complete prokaryote genomes and 171 complete virus genomes released by DDBJ/EMBL/GenBank up to September 2012, 121 complete chloroplast genomes released by the organelle genome database (GOBASE<sup>7</sup>) up to March 2009, 5272 prokaryote draft genomes released by the WGS division in DDBJ/ENA/GenBank up to September 2012, 12 eukaryote complete genomes, 17 million metagenomic sequences released by DDBJ/ENA/GenBank up to March 2012, and 217 million metagenomic sequences obtained using next-generation sequencers and released by the Sequence Read Archive (SRA<sup>8</sup>) in NCBI up to March 2012.

To enhance the completeness and accuracy of searching for tRNA genes, three computer programs, tRNAscan-SE, ARAGORN, and tRNAfinder have been used in combination, since their algorithms are partially different and have rendered somewhat different results. The tRNA genes found concordantly by all three programs were stored in tRNADB-CE after brief manual checking for cases of non-standard anticodons, such as bacterial A-starting anticodons (except for Arg) and those responding to termination codons, as explained later. Discordant cases among the three programs (approximately 5% of the total of tRNA gene candidates predicted by at least one program) were manually checked by three experts (YY, AM, and HI) in tRNA experimental fields and were classified into three categories: (A) reliable tRNA genes, (B) not tRNA genes, and (C) ambiguous cases.
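The triage between concordant and discordant predictions can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the output of each program has already been parsed into a set of hypothetical `Call` records, and it treats a prediction as concordant only when all three programs report exactly the same record.

```python
from typing import NamedTuple


class Call(NamedTuple):
    """One predicted tRNA gene (hypothetical record layout)."""
    seq_id: str
    start: int
    end: int
    anticodon: str


def triage(trnascan: set, aragorn: set, trnafinder: set):
    """Return (concordant, discordant) sets of predicted tRNA genes.

    concordant: reported identically by all three programs.
    discordant: reported by at least one program but not by all three;
                these are the candidates that would go to manual review.
    """
    concordant = trnascan & aragorn & trnafinder
    discordant = (trnascan | aragorn | trnafinder) - concordant
    return concordant, discordant


# Toy input: the second candidate is missed by one program.
a = Call("genome1", 100, 175, "GCC")
b = Call("genome1", 900, 972, "TAT")
concordant, discordant = triage({a, b}, {a}, {a, b})
print(sorted(concordant))
print(sorted(discordant))
```

In practice, the three programs may report slightly different gene boundaries, so a real comparison would probably need a tolerance on `start` and `end`; exact matching is used here only to keep the sketch short.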

The tRNA genes of Archaea obtained from SPLITSdb<sup>9</sup> constructed by Keio Group (Sugahara et al., 2006, 2008) are included in the current version of tRNADB-CE. Basic functions of the database have been described previously (Abe et al., 2009, 2011). For fragment sequences obtained by metagenome studies, only tRNA genes found concordantly by the three programs and those with sequences identical to the tRNAs already included in the database were stored. Many tRNA genes were detected in various environmental samples, and the number was registered for each environmental sample. This result has enabled us to predict the microbial population in an environmental ecosystem by using tRNAs as phylogenetic markers. Since a significant portion of environmental sequences is thought to be derived from unculturable microbes, tRNA genes of novel microbes should be included.

### **tRNA GENES COMPILED IN THE CURRENT VERSION AND A NEW FUNCTION FOR ORGANIZING COLLECTIVE KNOWLEDGE**

In the present database, 595,115 tRNA genes in total (112266, 317508, 961, 3534, 4137, and 156709 genes from 1966 complete and 5272 draft prokaryote genomes, 171 viruses, 121 chloroplasts, 12 eukaryotes, and 221 metagenomic samples, respectively) have been registered. This is twice the number of genes previously registered (Abe et al., 2011). Functions of the database are listed in **Figure 1**, and a comparison of registered data and functions among three tRNA databases is presented in **Table 1**. tRNADB-CE has compiled more than ten times as many tRNAs as tRNAdb and more than five times as many as GtRNAdb. This is mainly because tRNADB-CE has included tRNAs obtained from draft genome and metagenome sequences, as well as virus and plastid sequences. Another important difference of tRNADB-CE from others is the use of three computer programs for tRNA gene search.
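As a quick arithmetic check, the per-source counts quoted above can be summed. The dictionary keys are labels chosen here for illustration; the numbers are those given in the text.

```python
# Per-source tRNA gene counts as quoted in the text.
counts = {
    "complete prokaryote genomes": 112266,
    "draft prokaryote genomes": 317508,
    "viruses": 961,
    "chloroplasts": 3534,
    "eukaryotes": 4137,
    "metagenomic samples": 156709,
}

total = sum(counts.values())
prokaryote_total = (counts["complete prokaryote genomes"]
                    + counts["draft prokaryote genomes"])

print(total)             # 595115
print(prokaryote_total)  # 429774
```

The prokaryote subtotal equals the 429,774 sequences that the article reports clustering into identical sequence groups.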

The nucleotide frequency in each position on the tRNA cloverleaf structure is presented on a statistics page, providing information on consensus nucleotides for each anticodon of individual data types (**Figure 1A**). Since a vast number of tRNAs have been registered, this is useful for systematic search for identifier nucleotides, which interact with aminoacyl-tRNA synthetases. With the aim of creating a high-quality database by collecting knowledge in various tRNA research fields, we developed a new function for including comments on each gene in "The detailed information of tRNA gene sequence page" (**Figure 1D**). Users can add comments after typing in their e-mail address and password, while we reserve the right to remove irrelevant comments. We hope that the accumulation of users' comments will provide high-quality annotations and that this database will become an information sharing system in the tRNA research community.
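A per-position nucleotide frequency table of the kind shown on the statistics page could be computed along the following lines. This is a sketch under assumptions: the sequences are taken to be already aligned to common cloverleaf positions (equal-length strings), and the `threshold` used to call a consensus base is a hypothetical parameter, not a value taken from tRNADB-CE.

```python
from collections import Counter


def position_frequencies(aligned):
    """Return, for each aligned position, a dict of base -> fraction."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for column in zip(*aligned):  # iterate over positions
        counts = Counter(column)
        freqs.append({base: n / len(aligned) for base, n in counts.items()})
    return freqs


def consensus(aligned, threshold=0.9):
    """Most frequent base per position, or 'N' if below the threshold."""
    out = []
    for freq in position_frequencies(aligned):
        base, fraction = max(freq.items(), key=lambda kv: kv[1])
        out.append(base if fraction >= threshold else "N")
    return "".join(out)


# Toy alignment of four very short sequences.
toy = ["GGGCT", "GGGCA", "GGGCT", "GGACT"]
print(consensus(toy, threshold=0.75))  # GGGCT
print(consensus(toy, threshold=0.9))   # GGNCN
```

Positions that stay conserved within one anticodon class but differ between classes are the natural candidates to inspect as possible identifier nucleotides.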

### **IDENTICAL SEQUENCE GROUPS AND THEIR USE AS PHYLOGENETIC MARKERS**

We investigated a group of tRNAs with an identical sequence and found that one group was composed primarily of tRNAs derived from species belonging to a certain phylogenetic group, i.e., phylotype-specific tRNAs. To verify this phylogenetic preservation of tRNA sequences in more detail, we clustered 429,774 tRNA sequences derived from 7,237 prokaryote genomes with CD-HIT (Li and Godzik, 2006). We then designated each group with an identical sequence as an "Identical Sequence Group" (ISG). When focusing on ISGs composed of more than five genes, 95% of ISGs have been conserved at the phylum level, showing that most tRNA sequences are good phylogenetic markers at least at the phylum level. In addition, approximately 65% of ISGs have been conserved at the genus level, indicating that more than half of ISGs may be usable as genus-specific markers. This is because tRNA sequences have been stably conserved during evolution. Increases in the number of genes in each ISG in the near future will progressively clarify the phylogenetic range that can be covered by one ISG. By combining the data provided by this database with other knowledge obtained from experiments or the literature, users can choose useful phylogenetic markers (e.g., genus-specific markers) by themselves. Our group has started to search for phylogenetic markers for the genomes that are rare in regular environments and will publish these markers in the aforementioned column for comments on each tRNA gene. It should be noted here that horizontal gene transfer between different species is a general characteristic of microbial genomes. Therefore, we may not find phylogenetic markers with 100% accuracy. Any informatics method, including sequence homology searches, most likely assigns the horizontally transferred elements to the donor but not the recipient genome. When certain tRNAs that have primarily been found in a certain phylotype are additionally found in phylogenetically distant but restricted species, these tRNAs may represent genes that have been transferred horizontally to the restricted species or products of convergent evolution. Phylogenetic marker tRNAs have to be used in consideration of these points.

<sup>7</sup>http://gobase.bcm.umontreal.ca/

<sup>8</sup>http://www.ncbi.nlm.nih.gov/Traces/sra/

<sup>9</sup>http://splits.iab.keio.ac.jp/splitsdb/
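The ISG analysis described in this section can be sketched as follows. The article used CD-HIT for clustering; this sketch swaps in plain exact-string grouping with a dictionary, which is equivalent only for strictly identical sequences. The record layout `(sequence, phylum, genus)` and the toy taxa are assumptions made for illustration.

```python
from collections import defaultdict


def build_isgs(records):
    """Group tRNA genes by identical sequence.

    records: iterable of (sequence, phylum, genus) tuples (hypothetical layout).
    Returns a dict mapping sequence -> list of (phylum, genus) members.
    """
    groups = defaultdict(list)
    for seq, phylum, genus in records:
        groups[seq.upper()].append((phylum, genus))
    return groups


def fraction_conserved(groups, rank_index, min_size=6):
    """Fraction of ISGs with at least min_size genes confined to one taxon.

    rank_index: 0 for phylum, 1 for genus.
    min_size=6 mirrors "more than five genes" in the text.
    """
    eligible = [m for m in groups.values() if len(m) >= min_size]
    if not eligible:
        return 0.0
    conserved = sum(
        1 for members in eligible
        if len({member[rank_index] for member in members}) == 1
    )
    return conserved / len(eligible)


# Toy data; min_size is lowered to 2 so the small example is informative.
records = [
    ("GCGGA", "Firmicutes", "Bacillus"),
    ("GCGGA", "Firmicutes", "Listeria"),
    ("GGTCA", "Proteobacteria", "Escherichia"),
    ("GGTCA", "Proteobacteria", "Escherichia"),
    ("TTTAC", "Firmicutes", "Bacillus"),
    ("TTTAC", "Proteobacteria", "Vibrio"),
]
groups = build_isgs(records)
print(fraction_conserved(groups, rank_index=0, min_size=2))  # phylum level
print(fraction_conserved(groups, rank_index=1, min_size=2))  # genus level
```

An ISG that spans more than one phylum, like the third toy group, is the kind of case that would be examined for horizontal transfer or convergent evolution before being used as a marker.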

**Table 1 | Comparison of registered data and functions between databases (as of March 2014).**

Interestingly, approximately 25% of tRNA genes obtained from metagenomic sequences were found identical in sequence to genes from species-known prokaryotes and assignable to ISGs. By using these assigned tRNAs as phylogenetic markers, we have predicted the microbial community in an environmental ecosystem at a phylum level (**Figure 2A**). The number of tRNA genes found in known species and a list of the species can be browsed for each environmental sample (**Figures 2B,C**). When the numbers are clicked, the list of metagenomic tRNAs assignable to phylum can be accessed (**Figure 2D**). This strategy can be applied even to the data of short sequences obtained with new-generation sequencers, such as metagenomic sequences in the SRA<sup>10</sup> at NCBI. It is noteworthy that phylogenetic clustering of short sequences (e.g., 100 nt) with conventional sequence homology searches is inadequate for studying a microbial community in an ecosystem. However, evolutionarily stable tRNA genes can be used as effective phylogenetic markers for predicting the microbial community, since full-length tRNAs can be found even from short genomic fragments obtained with new-generation sequencers.

<sup>10</sup>http://www.ncbi.nlm.nih.gov/sra
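The phylum-level community prediction can be pictured as an exact-match lookup of metagenomic tRNA sequences against marker ISGs. The sketch below assumes a hypothetical mapping `isg_to_phylum` from marker sequence to phylum (containing only ISGs confined to a single phylum); it is an illustration of the idea, not the database's implementation.

```python
from collections import Counter


def phylum_profile(metagenomic_seqs, isg_to_phylum):
    """Count metagenomic tRNAs per phylum by exact sequence match.

    Returns (Counter of phylum -> count, number of unassigned tRNAs).
    """
    assigned = Counter()
    unassigned = 0
    for seq in metagenomic_seqs:
        phylum = isg_to_phylum.get(seq.upper())
        if phylum is None:
            unassigned += 1
        else:
            assigned[phylum] += 1
    return assigned, unassigned


# Toy marker table and toy sample.
markers = {"GCGGA": "Firmicutes", "GGTCA": "Proteobacteria"}
sample = ["gcgga", "GGTCA", "GGTCA", "AAAAA"]
profile, unassigned = phylum_profile(sample, markers)
print(dict(profile))
print(unassigned)
```

The unassigned fraction is informative in itself, since the article notes that only about a quarter of metagenomic tRNA genes matched sequences from species-known prokaryotes.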

The database is also capable of searching for sequences with 1–3 nt differences (**Figure 2A**). By using tools in the database as well as phylogenetic markers found by users (e.g., genus-specific markers), users can search for very specific and rare genomes of interest from a wide range of environmental samples.
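A search for near-identical sequences might look like the following sketch. It assumes that "differences" means substitutions between sequences of equal length (Hamming distance); whether the database also counts insertions and deletions is not stated in the text, so that case is simply skipped here. The function names are illustrative.

```python
def hamming(a, b):
    """Number of mismatched positions, or None if lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def near_matches(query, database, max_diff=3):
    """Return (name, distance) for entries differing by 1..max_diff nt.

    database: dict mapping entry name -> sequence.
    Identical sequences (distance 0) are excluded, since they would
    already belong to the same identical sequence group.
    """
    hits = []
    for name, seq in database.items():
        distance = hamming(query, seq)
        if distance is not None and 1 <= distance <= max_diff:
            hits.append((name, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


toy_db = {
    "t1": "ACGTACGT",  # identical to the query
    "t2": "ACGTACGA",  # 1 difference
    "t3": "TCGTACGA",  # 2 differences
    "t4": "TTTTACGA",  # 4 differences
    "t5": "ACGT",      # different length, skipped
}
print(near_matches("ACGTACGT", toy_db))  # [('t2', 1), ('t3', 2)]
```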

### **MINIMUM ANTICODON SET AND NON-STANDARD-TYPE tRNAs**

An important curation process by experts was to investigate the minimum anticodon set most likely to be essential for the translation system of each bacterial species (Osawa, 1995), because the three computer programs did not necessarily assign this set concordantly. When we could not find the minimum set in our standard search, we reexamined the candidates predicted by one or two programs and searched for the most probable candidate that satisfies the minimum set, according to various criteria such as identifiers for the respective anticodons and with reference to the experimental literature. We also checked whether an identical sequence and sequences with 1–2 nt differences were present in closely related genomes, based on the view that a functionally active gene should be stably maintained during evolution. This final check has become increasingly useful because many closely related species have now been sequenced. The search for the minimum anticodon set in each genome can assign tRNAs whose sequences appear to be rather non-standard. If non-standard-type tRNAs have been found repeatedly in species belonging to a particular phylogenetic group, such tRNAs will become good phylogenetic markers with high specificity. The aforementioned newly added column for comments on each tRNA can be used for mentioning this phylogenetic information.

We next explain one example of non-standard-type tRNAs in more detail. The Anticodon Table of bacterial tRNAs (**Figure 3A**) shows that A-starting anticodons are very rare except for Arg, Leu, and Thr: 0 for Phe, Tyr, Cys, Val, Asp, and Gly; 1 for His, Ile, and Asn; 3 for Ser and 6 for Pro. We have manually checked tRNA candidates with these very rare A-starting anticodons, on the basis of the aforementioned criteria for the minimum anticodon set and/or the repeated occurrences in closely related species. Information concerning such non-standard-type tRNAs can promote experimental studies to prove predicted tRNA candidates at the RNA level. For example, we have found novel bacterial tRNA genes with TAT anticodon in 19 species, including *Mycoplasma mobile*. In many bacteria, ATA codons are deciphered by tRNA<sup>Ile2</sup> bearing lysidine (L) at the wobble position (Soma et al., 2003; Suzuki and Miyauchi, 2010); L is a modified cytidine introduced post-transcriptionally by tRNA<sup>Ile</sup>-lysidine synthetase (TilS). Some bacteria, including *M. mobile*, do not carry the *tilS* gene, indicating that they have a specific system to decode ATA codons. Taniguchi et al. (2013) have experimentally shown at the cellular RNA level that the *M. mobile* tRNA<sup>Ile2</sup> registered in our database (**Figure 3B**) contains an unmodified UAU anticodon.
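The selection of candidates for this manual check can be expressed as a simple filter. The sketch assumes anticodons are written 5′ to 3′ in the DNA alphabet, so that the first letter is the wobble position, and it takes the set of amino acids for which A-starting anticodons are not rare (Arg, Leu, Thr) from the text above. The function name and the toy candidate list are illustrative.

```python
# Amino acids for which A-starting anticodons are not rare, per the text.
COMMON_A_START = frozenset({"Arg", "Leu", "Thr"})


def needs_manual_check(amino_acid, anticodon, common=COMMON_A_START):
    """True if the candidate has an A-starting anticodon that is rare."""
    return anticodon.upper().startswith("A") and amino_acid not in common


# Toy candidates: (amino acid, anticodon written 5'->3').
candidates = [("Arg", "ACG"), ("Pro", "AGG"), ("Ile", "TAT"), ("Ser", "aga")]
flagged = [c for c in candidates if needs_manual_check(*c)]
print(flagged)  # [('Pro', 'AGG'), ('Ser', 'aga')]
```

A filter like this only selects what to look at; the decision itself, as described above, rests on the minimum anticodon set and on repeated occurrence in closely related species.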

### **tRNA DATABASE WELL-TIMED IN THE ERA OF BIG SEQUENCE DATA**

One important characteristic of tRNADB-CE is the inclusion of tRNAs found in a large number of partially sequenced genomes. The aforementioned search for the minimum anticodon set can be applied only to a complete genome. Another check, applicable even to partially sequenced genomes, is to examine whether non-standard-type candidates have been found repeatedly in closely related species. This process has become increasingly useful because the genomes of many closely related species, and even of different strains belonging to one species, have been sequenced. When the same or almost the same sequence has been found repeatedly, we have included the sequence in the Reliable category, based on the knowledge that functionally important genes have been stably maintained throughout evolution. Importantly, this check can be applied even to metagenomic sequences; if non-standard-type candidates have been found repeatedly in metagenomic samples, especially in different environmental samples, they can be included in the Reliable category. The accumulation of a massive number of sequences in the near future should further increase the usefulness and reliability of this strategy, indicating that tRNADB-CE is a well-timed database in the era of big sequence data.
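The recurrence check described above can be sketched as follows. This is a minimal illustration, assuming that "almost the same sequence" means an equal-length sequence with at most two mismatches (following the 1–2 nt differences mentioned earlier) and that a candidate should be seen in at least two related genomes; the `min_genomes` threshold, the function names, and the toy sequences are assumptions, not the database's actual procedure.

```python
def mismatches(a: str, b: str):
    """Number of mismatched positions, or None if the lengths differ."""
    if len(a) != len(b):
        return None
    return sum(x != y for x, y in zip(a, b))


def is_near_match(candidate: str, other: str, max_diff: int = 2) -> bool:
    """True if `other` is identical to `candidate` or differs by <= max_diff nt."""
    diff = mismatches(candidate, other)
    return diff is not None and diff <= max_diff


def recurs_in_relatives(candidate, related_genomes, max_diff=2, min_genomes=2):
    """Check whether a candidate recurs in enough related genomes.

    `related_genomes` maps a genome name to a list of its tRNA gene sequences.
    `min_genomes` is a hypothetical threshold chosen for this sketch.
    """
    hits = sum(
        any(is_near_match(candidate, seq, max_diff) for seq in trnas)
        for trnas in related_genomes.values()
    )
    return hits >= min_genomes


# Hypothetical, shortened sequences for illustration only.
candidate = "GGGCTATAGCTCAG"
related = {
    "strain_A": ["GGGCTATAGCTCAG"],   # identical
    "strain_B": ["GGGCTATAGCTCAA"],   # one mismatch
    "strain_C": ["TTTTTTTTTTTTTT"],   # unrelated
}
print(recurs_in_relatives(candidate, related))
```

A gapped alignment would be needed to handle insertions and deletions; the mismatch count here is only the simplest possible stand-in.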

### **ENHANCING QUALITY OF A LARGE-SCALE DATABASE AND EFFICIENT KNOWLEDGE DISCOVERY**

The number of tRNA genes compiled has already become huge and will undoubtedly increase more rapidly in the near future. For efficient knowledge discovery from such big data, the analysis tools presently available are inadequate, and new tools suitable for big data analyses are urgently required. Our group has developed a bioinformatics tool, "BLSOM (Batch-Learning Self-Organizing-Map)", based on oligonucleotide compositions (Abe et al., 2003), which can analyze more than one million genomic sequences simultaneously and allows us to efficiently acquire a wide range of knowledge from big sequence data. Oligonucleotides, such as penta- and heptanucleotides, often represent motif sequences responsible for sequence-specific protein binding (e.g., transcription factor binding). Occurrences of such motif oligonucleotides should differ from the occurrences expected from the mononucleotide composition of each genome and may differ among genomic portions within a single genome. Indeed, we have recently found that a pentanucleotide BLSOM for the human genome can detect characteristic enrichment of many transcription-factor-binding motifs in pericentric heterochromatin regions (Iwasaki et al., 2013a), showing that BLSOM can effectively detect the characteristic and combinatorial occurrences of functional motif oligonucleotides in genomic sequences. Each tRNA has characteristic and combinatorial occurrences of motif oligonucleotides, which are required for its function (e.g., binding to the proper enzymes) and for structure formation (e.g., the cloverleaf). To examine the usefulness of BLSOM for knowledge discovery from a huge number of tRNAs, we have recently constructed a BLSOM with the pentanucleotide composition of all bacterial tRNAs in tRNADB-CE. Interestingly, tRNAs are separated primarily according to amino acid (to be published elsewhere), showing that the BLSOM can detect the characteristic combinations of motif oligonucleotides required for proper recognition by various enzymes, including aminoacyl-tRNA synthetases. Bacterial tRNAs with the same anticodon form their own territories, indicating that this BLSOM can be used as an informatics method for assigning reliable tRNAs. To create a high-quality database, it is important to find errors that have slipped into the database, including those attributable to sequencing errors. An orphan tRNA located apart from the proper anticodon territory on the BLSOM may be a candidate for an erroneous case, for which manual checks by experts have to be conducted. This type of new informatics strategy is required to find errors present in the huge number of predicted tRNAs, especially in those predicted concordantly by all three programs and thus included in the database with no manual check. More importantly, BLSOM is an unsupervised clustering method with strong visualization power (Abe et al., 2003), and this unsupervised data-mining method can be used for efficient knowledge discovery from big sequence data (Iwasaki et al., 2013b). Informatics tools suitable for big data analyses have to be introduced to extract a wide range of knowledge from large-scale databases in the era of big data.
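The input to such a map is an oligonucleotide composition vector for each sequence. The sketch below shows one simple way to compute a pentanucleotide frequency vector in Python; it illustrates only the feature-extraction step, not the BLSOM algorithm itself, and the function name, the normalization, and the handling of ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 5):
    """Return relative frequencies of all 4**k oligonucleotides in `seq`.

    The vector follows the lexicographic order of the A, C, G, T alphabet.
    Windows containing other symbols (e.g. N) are skipped.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


# A short made-up sequence; real tRNA genes are roughly 70-90 nt long.
vector = kmer_composition("GGGCTATAGCTCAGCTGGGAGAGCGC", k=5)
print(len(vector))
```

Each tRNA would contribute one such 1024-dimensional vector, and a clustering method would then operate on the collection of vectors.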

### **ACKNOWLEDGMENTS**

This work was supported by a Grant-in-Aid for Publication of Scientific Research Results, for Scientific Research (C), and for Young Scientists (B) from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 January 2014; accepted: 16 April 2014; published online: 01 May 2014. Citation: Abe T, Inokuchi H, Yamada Y, Muto A, Iwasaki Y and Ikemura T (2014) tRNADB-CE: tRNA gene database well-timed in the era of big sequence data. Front. Genet. 5:114. doi: 10.3389/fgene.2014.00114*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Abe, Inokuchi, Yamada, Muto, Iwasaki and Ikemura. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Molecular mechanisms of template-independent RNA polymerization by tRNA nucleotidyltransferases

### *Kozo Tomita\* and Seisuke Yamashita*

*RNA Processing Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tsukuba, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Gota Kawai, Chiba Institute of Technology, Japan Michael Ibba, Ohio State University, USA*

### *\*Correspondence:*

*Kozo Tomita, RNA Processing Research Group, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, 1-1-1, Higashi, Tsukuba, Ibaraki 305-8566, Japan e-mail: kozo-tomita@aist.go.jp*

The universal 3′-terminal CCA sequence of tRNA is built and/or synthesized by the CCA-adding enzyme, CTP:(ATP) tRNA nucleotidyltransferase. This RNA polymerase has no nucleic acid template, but faithfully synthesizes the defined CCA sequence on the 3′-terminus of tRNA at one time, using CTP and ATP as substrates. The mystery of CCA-addition without a nucleic acid template by unique RNA polymerases has long fascinated researchers in the field of RNA enzymology. In this review, the mechanisms of RNA polymerization by the remarkable CCA-adding enzyme and its related enzymes are presented, based on their structural features.

**Keywords: tRNA, CCA, template-independent, nucleotidyltransferase, class-I and II**

### **INTRODUCTION**

Every tRNA has the CCA sequence at its 3′-terminus (CCA-3′ at positions 74–76; C74C75A76-3′). The CCA-3′ moiety is required for amino acid attachment (aminoacylation) onto the 3′-end of the tRNA by aminoacyl-tRNA synthetases (Sprinzl and Cramer, 1979), and for peptide-bond formation on the ribosome. The CCA-3′ physically interacts with the ribosomal RNA during translation on the ribosome (Green and Noller, 1997; Kim and Green, 1999; Nissen et al., 2000). The CCA-3′ is also required for tRNA quality control. The tandem C74C75A76C77C78A79-3′ sequence, added onto the 3′-end of tRNA, acts as a degradation signal for dysfunctional tRNA molecules (Wilusz et al., 2011).

The CCA-3′ is synthesized and/or repaired by the CCA-adding enzyme, CTP:(ATP) tRNA nucleotidyltransferase (NT), using CTP and ATP as substrates (Deutscher, 1990; Weiner, 2004). The CCA-adding activity is conserved in all three primary kingdoms – archaea, eubacteria, and eukarya (Yue et al., 1996), and is essential in organisms in which some tRNA genes do not encode CCA-3′ (Aebi et al., 1990). The CCA-adding enzymes belong to the NT family, encompassing enzymes as diverse as polyA polymerase (PAP), terminal deoxynucleotidyltransferase (TdT), DNA polymerase β (pol β), and kanamycin nucleotidyltransferase (KanNT; Holm and Sander, 1995; Martin and Keller, 1996, 2007; Yue et al., 1996). Among the NT family members, the CCA-adding enzyme is a remarkable, template-independent RNA polymerase. The enzyme can synthesize an ordered nucleotide sequence, CCA, onto the 3′-end of a specific primer, tRNA, without a nucleic acid template. Moreover, the enzyme is sensitive to register. It recognizes three kinds of tRNAs – tRNA lacking C74C75A76, C75A76, or A76 – and reconstructs the CCA-3′ sequence as needed.

The CCA-adding enzymes are classified into two classes (class-I and class-II), based on sequence alignments (Yue et al., 1996). The archaeal CCA-adding enzymes belong to class-I, and share sequence similarity with the eukaryotic PAPs. On the other hand, the eubacterial/eukaryotic CCA-adding enzymes belong to class-II, and share sequence similarity with the eubacterial PAPs (**Figure 1A**). Although both enzyme classes catalyze the same reaction in defined fashions, significant amino acid similarities are not readily apparent between the two classes of CCA-adding enzymes. Only local similarities around the active site signatures have been identified.

In most organisms, the CCA-3′ is synthesized by a single enzyme that can add the CCA-3′ at one time. However, in some eubacteria, the CCA-3′ is synthesized by two distinct, but closely related, class-II enzymes. One adds C74C75 and the other adds A76, and the CCA-3′ is synthesized by these two enzymes in a collaborative manner (Tomita and Weiner, 2001, 2002; Bralley et al., 2005; Neuenfeldt et al., 2008).
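The register sensitivity and the division of labor between the CC-adding and A-adding enzymes can be summarized as simple bookkeeping. The following Python sketch is only a toy model: it assumes the tRNA 3′ tail after the discriminator at position 73 is given as a string (an empty string, "C", "CC", or "CCA"), and the function names are hypothetical.

```python
TARGET = "CCA"  # positions 74, 75 and 76


def missing_cca(tail: str) -> str:
    """Return the nucleotides still needed to complete CCA-3'.

    `tail` is the sequence already present after position 73.
    """
    if not TARGET.startswith(tail):
        raise ValueError(f"tail {tail!r} is not a prefix of {TARGET}")
    return TARGET[len(tail):]


def split_between_enzymes(tail: str) -> dict:
    """Assign the missing nucleotides to the CC-adding and A-adding enzymes."""
    missing = missing_cca(tail)
    return {
        "CC-adding": "C" * missing.count("C"),
        "A-adding": "A" if missing.endswith("A") else "",
    }


for tail in ("", "C", "CC", "CCA"):
    print(repr(tail), "->", missing_cca(tail), split_between_enzymes(tail))
```

A single CCA-adding enzyme corresponds to one enzyme handling the whole string returned by `missing_cca`, whereas the two-enzyme system splits it as in `split_between_enzymes`.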

For more than forty years, since the identification of their activities, the molecular mechanisms of CCA-addition by the CCA-adding enzyme, a remarkable, template-independent RNA polymerase, and its related enzymes have been a mystery that has fascinated researchers in the field of RNA enzymology. The following two main models had been proposed to explain the unique enzymatic activity, using class-I and class-II CCA-adding enzymes (Sprinzl and Cramer, 1979; Deutscher, 1990; Shi et al., 1998a; Hou, 2000; Li et al., 2000; Tomari et al., 2000; Weiner, 2004). In the first model (**Figure 1B**), there are multiple nucleotide binding sites (two or three) for CTP and ATP, and the active site moves relative to these sites and tRNA during polymerization (Deutscher, 1982; Hou, 2000; Li et al., 2000). In the second model (**Figure 1C**), there is a single nucleotide binding site, and the growing 3′-terminus of the tRNA refolds in the active site to specify nucleotide addition. Thus, the nucleotide specificity is collaboratively dictated by the RNA–protein complex (Yue et al., 1996; Shi et al., 1998a). The second model, termed the "collaborative templating" model, was proposed to explain the biochemical studies showing that there is a single active site in the enzyme for both CTP and ATP incorporation, and that the tRNA is fixed on the surface of the enzyme and neither translocates nor rotates relative to the enzyme during CCA-addition (Shi et al., 1998a; Yue et al., 1998).

**Figure 1.** … **models. (A)** Schematic representation of two classes of CCA-adding enzymes and their related enzymes. Archaeal CCA-adding enzymes and eukaryotic polyA polymerases (PAP) belong to class-I. Eubacterial/eukaryotic CCA-adding enzymes and eubacterial PAPs belong to class-II. The N-terminal 25 kDa of the class-II CCA-adding enzyme and PAP are homologous. In some eubacteria, CCA-addition is accomplished by two class-II A-adding and CC-adding enzymes. Domains of CCA (or A, or CC)-adding enzymes and polyA polymerases are colored based on the structures as in **Figure 2A** and **Figure 5A**. DXD is the active site signature containing two carboxylates. The third catalytic carboxylate is also depicted. **(B)** Multiple nucleotide binding sites model. CCA is added by using three nucleotide binding sites (two for CTPs and one for ATP) and a mobile single catalytic site. **(C)** Collaborative templating model. The nucleotide binding site is composed of the refolded 3′-terminus of the RNA and the protein. The nucleotide specificity is collaboratively dictated by the RNA–protein complex.

Over the past fifteen years, the crystal structures of class-I and class-II enzymes and their complexes with nucleotide(s) and/or various tRNAs or mini-helices have been reported (Li et al., 2002; Augustin et al., 2003; Okabe et al., 2003; Xiong et al., 2003; Tomita et al., 2004, 2006; Xiong and Steitz, 2004; Toh et al., 2008, 2009, 2011; Pan et al., 2010; Yamashita et al., 2014). The structural information has solved most, but not yet all, of the long-standing mysteries about the mechanism of the remarkable, template-independent CCA-adding enzyme and its relatives (Schimmel and Yang, 2004; Weiner, 2004; Xiong and Steitz, 2006). Here, we review the current understanding of the mechanism of CCA-addition by the CCA-adding enzyme, and its related enzymes, based on their crystal structures.

### **Class-I ARCHAEAL CCA-ADDING ENZYME**

In this section, we review the mechanism of CCA-addition by the class-I archaeal CCA-adding enzyme, based on the crystal structures of the enzyme and its complexes with RNAs and nucleotides or nucleotide analogs (Okabe et al., 2003; Xiong et al., 2003; Xiong and Steitz, 2004; Tomita et al., 2006; Toh et al., 2008; Pan et al., 2010).

### **THE TEMPLATE DOES NOT RESIDE WITHIN THE class-I CCA-ADDING ENZYME**

The crystal structures of class-I CCA-adding enzyme from *Archaeoglobus fulgidus* (AFCCA) and its complexes with CTP or ATP were reported (Okabe et al., 2003; Xiong et al., 2003). The structures of AFCCA consist of four domains: N-terminal, central, C-terminal, and tail domains, and the overall structure is U-shaped (**Figure 2A**). This architecture of AFCCA is different from that of the class-II CCA-adding enzyme as described below, and is rather similar to that of the eukaryotic PAP (Bard et al., 2000; Martin et al., 2000). This was anticipated from the comparisons of their primary amino acid sequences (Yue et al., 1996; **Figure 1A**).

The N-terminal domain of AFCCA consists of five β-strands and two α-helices, and three catalytic carboxylates (Glu59, Asp61, and Asp110) reside on the β-sheet. The three catalytic carboxylates are located in close proximity to each other and coordinate the catalytic Mg<sup>2+</sup> ion. The N-terminal catalytic domain structure of AFCCA is homologous to those of other NT family members, including the class-II CCA-adding enzyme, pol β and other polynucleotide polymerases (Sakon et al., 1993; Pelletier et al., 1994; Bard et al., 2000; Martin et al., 2000). This suggested that the catalytic cores of the class-I and class-II CCA-adding enzymes share a common ancestor, together with those of other NT family members, and that the class-I and class-II enzymes both catalyze nucleotidyltransfer by the same metal–ion catalytic mechanism (Brautigam and Steitz, 1998). The central domain consists of a four-stranded β-sheet, and is topologically homologous to the RNA-recognition motif (RRM) of several RNA-binding proteins, such as ribosomal protein S6 (Okabe et al., 2003; Xiong et al., 2003).

The structures of AFCCA complexed with various nucleotides (Xiong et al., 2003) revealed that the nucleotide sits in the inter-domain region, between the N-terminal and central domains of AFCCA (**Figure 2B**). In the complex structures, all nucleotides bound to the catalytic pocket of the enzyme non-specifically, and the base moieties of the nucleotides were disordered. This observation implied that the nucleotide specificity is not dictated by the enzyme alone in the class-I CCA-adding enzyme. Instead, it suggested that the specificity is dictated by the RNA–enzyme complex, as conceptually suggested in the "collaborative templating" model (Shi et al., 1998a; **Figure 1C**). The detailed mechanism of nucleotide selection by the class-I CCA-adding enzyme was later clarified by the determination of several crystal structures of AFCCA complexed with an RNA primer in the presence of an incoming nucleotide, as described below in detail.

### **RNA-PROTEIN TEMPLATE FOR CTP AND ATP SELECTION BY class-I CCA-ADDING ENZYME**

The complex structures of AFCCA with a tRNA bearing a CCA-3′ terminus, and with various RNA primers mimicking the top-half of a tRNA molecule (tRNA mini-helix or double-stranded RNA) in the presence or absence of nucleotide, were reported (Xiong and Steitz, 2004; Tomita et al., 2006; Toh et al., 2008; Pan et al., 2010). These complex structures of AFCCA with various RNA primers, representing the sequential CCA-adding reactions, revealed the detailed mechanism of nucleotide specificity and the dynamic CCA-adding reaction by the class-I CCA-adding enzyme.

AFCCA recognizes the acceptor-TΨC helix, the top-half of the tRNA, and does not interact with the tRNA anticodon region at all (Xiong and Steitz, 2004; **Figure 2C**). This is consistent with a previous biochemical study showing that a mini-helix RNA (and even a double-stranded RNA) corresponding to the top-half of the tRNA can be a primer for CCA-addition by the class-I CCA-adding enzyme (Shi et al., 1998b). The tail domain of AFCCA interacts with the elbow region in the TΨC loop of the tRNA. The tail domain functions as an anchor for the tRNA, and prevents the tRNA from dislodging from the enzyme surface during CCA-addition. The 3′-terminus of the tRNA enters the active pocket between the N-terminal and central domains of AFCCA (**Figure 2C**).

The ternary structures of AFCCA with a tRNA mini-helix (or double helix RNA) and an incoming nucleotide CTP (or ATP), representing the C75-adding (or A76-adding) reaction, were reported (Xiong and Steitz, 2004; Tomita et al., 2006). In the ternary complex structures, the geometries of the incoming CTP (or ATP) and the 3′-OH group of the ribose in the 3′-terminal nucleoside of the RNA, relative to the catalytic carboxylates (Glu59, Asp61, and Asp110) and a Mg<sup>2+</sup> ion, suggested that the structures represent the nucleotide insertion stages of RNA polymerization.

In the ternary complex structure of AFCCA with a tRNA mini-helix ending with C74 (or double helix RNA) in the presence of CTP, representing C75-addition, the cytosine base of the CTP stacks with the cytosine base of C74 at the 3′-terminus of the RNA. The 4-NH2 group and the N3 atom of the CTP hydrogen-bond with the phosphate groups of C74 and A73 (discriminator nucleoside) of the RNA primer, and with Arg224 in the central domain, respectively (**Figure 2D**). In the ternary structure of AFCCA with the tRNA mini-helix (or double helix RNA) ending with C74C75 in the presence of ATP, representing the A76-adding reaction, the adenine base of the ATP stacks with the cytosine base of C75 at the 3′-terminus of the RNA. The 6-NH2 group and the N1 atom of the incoming ATP form hydrogen-bonds with the phosphate groups of C74 and A73 in the RNA primer and Arg224 in the central domain, respectively (**Figure 2D**). Thus, the templates for CTP and ATP selection by AFCCA were found to be the phosphate backbone of the RNA primer and the protein, rather than solely the protein itself, as implicated (Shi et al., 1998a).

After C75-addition, the acceptor stem of the tRNA mini-helix remains fixed on the enzyme surface, and the tRNA mini-helix neither translocates nor rotates relative to the enzyme surface. Then, the 3′-terminus of the tRNA mini-helix refolds and retracts into the enclosed active pocket (Xiong and Steitz, 2004; Tomita et al., 2006; Toh et al., 2008; Pan et al., 2010). The refolding of the 3′-terminus of the tRNA in the pocket places the ribose 3′-OH group of C75 in the RNA proximal to the active site of the enzyme. The structural changes in the active pocket of the enzyme, as well as the refolding of the 3′-end of the RNA, after C75-addition, ensure that the active pocket is free of any nucleotide, for successive ATP accommodation in the pocket. Thus, a single active pocket can be utilized for both C-addition and A-addition by the class-I CCA-adding enzyme.

These sequential structural analyses of AFCCA also revealed that the size and the shape of the nucleotide pocket, composed of the growing 3′-terminus of the RNA and the enzyme, successively change during the CCA-adding reaction (**Figures 2E,F**). At the C75-adding stage, the size and the shape of the nucleotide binding pocket are suitable for CTP accommodation, and the larger ATP cannot snugly fit in the pocket. After C75-addition, the 3′-terminus of the RNA refolds in the enclosed active site, and the size and the shape of the nucleotide binding pocket become suitable for ATP. Although the smaller CTP could bind in the pocket, it does not snugly fit (Tomita et al., 2006; Pan et al., 2010). Thus, the nucleotide specificity of the class-I CCA-adding enzyme shifts from CTP-specific to ATP-specific during the successive CCA-adding reactions.

Together, these extensive crystallographic analyses suggest that the template of the class-I CCA-adding enzyme is neither the protein nor the RNA, but the RNA–protein complex. The RNA primers neither translocate nor rotate relative to the enzyme surface, and a single active pocket is utilized for both C- and A-addition, by successive refolding of the 3′-terminal nucleoside in the enclosed active pocket. The successive refolding of the 3′-terminus of the tRNA during polymerization changes the nucleotide specificity of the class-I CCA-adding enzyme from CTP to ATP. These structural features explained the previous biochemical studies well (Shi et al., 1998a; Yue et al., 1998).

### **DYNAMICS OF CCA-ADDITION BY THE class-I CCA-ADDING ENZYME**

The binary complex structures of AFCCA with a tRNA mini-helix in the absence of incoming nucleotides were also reported (Tomita et al., 2004; Toh et al., 2008). The comparison of the binary and ternary structures highlights the dynamic change in the orientation of the N-terminal domain of the enzyme and the 3′-terminus of the RNA during CCA-addition (**Figures 3A,B**).

In the binary complex of AFCCA with a tRNA mini-helix ending in either D73 (D is the discriminator nucleoside) or C74, the nucleobase of the 3′-terminal nucleoside stacks onto the preceding nucleobase, and the ribose 3′-OH group of the 3′-terminal nucleoside is far from the active site (three catalytic carboxylates). This structure represents an inactive form. At the C75-adding reaction stage, upon binding the incoming CTP in the active site, the N-terminal domain of AFCCA relocates toward the central domain, leading the enzyme to transit from an open conformation to a closed conformation (**Figure 3A**). This allows the 3′-nucleoside C74 of the RNA to flip, and positions the ribose 3′-OH group of the 3′-terminal nucleoside proximal to the catalytic residues and the triphosphate group of the incoming CTP. This structure represents a catalytically active form (**Figure 3C**). Although the structure representing C74-addition was not determined, the binary complex of AFCCA with a tRNA mini-helix ending in D73 showed that the 3′-terminal D73 of the RNA and the three catalytic carboxylates superimposed well onto the 3′-terminal C74 of the RNA and the catalytic carboxylates, respectively, in the binary complex of AFCCA with a tRNA mini-helix ending in C74, as described below (**Figure 3D**). Thus, the C74-adding reaction would proceed by the same mechanism as the C75-adding reaction.

At the C74-adding reaction stage, the mini-helix RNA ending with D73 adopts an extended form (Tomita et al., 2006; **Figure 3D**). After the C74-addition reaction is completed, the N-terminal domain of the enzyme relocates outward, and the enzyme transits to the open, inactive form. Then, the 3′-terminal nucleoside (C74) of the tRNA mini-helix flips back into the active pocket, and the acceptor helix of the tRNA mini-helix shrinks back from the extended form. The change in the tRNA mini-helix from the extended form to the shrunken form allows the active pocket to become nucleotide free, for successive CTP accommodation and C75-addition. CTP binding in the active pocket for C75-addition induces the relocation of the N-terminal domain of the enzyme toward the central domain again (the enzyme transits to an active closed form again; **Figure 3A**). The 3′-nucleoside of the tRNA flips again, and the C75-adding reaction proceeds (**Figure 3C**). Thus, in both the C74-adding and C75-adding reactions, CTP binding in the active site dynamically induces the conformational change of the enzyme from an inactive open form to an active closed form, and only the correct nucleotide, CTP, can allow the transition of the enzyme.

After the C75-adding reaction, the enzyme is fixed in an active closed form (**Figure 3B**). The 3′-terminus of the tRNA mini-helix refolds in the enclosed active site, and a newly shaped nucleotide binding pocket is created by the enzyme and the 3′-terminus of the RNA (**Figure 3E**). ATP can bind in the nucleotide pocket, and A76-addition proceeds without the open to closed conformational change of the enzyme. The fixation of the enzyme in a closed conformation after C75-addition is facilitated by the interaction between the β-turn in the N-terminal domain and the 3′-terminus of the tRNA mini-helix. The mutation of a key amino acid residue, Arg224, reduced C74C75-addition, but not A76-addition, *in vitro*, indicating that Arg224 in the pocket does not discriminate ATP from CTP in the A76-adding reaction (Tomita et al., 2006; Pan et al., 2010). Thus, the A76-adding reaction is static, and is distinct from the C74C75-adding reactions, which are accompanied by the dynamic open to closed conformational transition of the enzyme and require the proper conformation of Arg224 in the pocket. Consecutive conformation changes of the β-turn in the N-terminal domain accompany the refolding of the tRNA 3′-terminus during the reaction (**Figures 3C,E**). The β-turn in the N-terminal domain monitors the 3′-terminal sequence of the tRNA for correct CCA-addition (Toh et al., 2008).

**Figure 3.** … with a tRNA mini-helix at the C75-adding stage, in either the absence (colored magenta) or the presence (colored cyan) of CTP. **(B)** Superimposition of the complex structures of AFCCA with a tRNA mini-helix at the A76-adding stage, in either the absence (colored magenta) or the presence (colored cyan) of ATP. **(C)** Comparison of the catalytic core structures of AFCCA at the C75-adding stage in either the absence (left) or the presence (right) of CTP. **(D)** Superimposition of RNAs at the C74-adding stage (colored magenta) and C75-adding stage (colored cyan). RNA at the C74-adding stage adopts an extended form. **(E)** Comparison of the catalytic core structures of AFCCA at the A76-adding stage in either the absence (left) or the presence (right) of ATP. For clarity, only the key amino acid residues of AFCCA are shown. RNA and nucleotides are depicted by stick models.

After A76-addition, the N-terminal domain relocates outward, triggered by pyrophosphate release, and the enzyme adopts the open conformation. At this stage, there is no room to accommodate another nucleotide in the active pocket. Finally, a tRNA with a CCA-3′ terminus dissociates from the enzyme, and the CCA-adding reaction is completed (Xiong and Steitz, 2004; Tomita et al., 2006).

The dynamic sequence of the CCA-adding reaction by the class-I CCA-adding enzyme, revealed by the crystallographic analyses of complexes of class-I CCA-adding enzyme with various RNA primers with or without nucleotides, is presented in **Figure 4**.
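The reaction sequence described in this section can be restated as a small table of steps. The Python sketch below is only a bookkeeping aid for the conformational states named in the text (open or closed before nucleotide binding, at the insertion stage, and after each step); the class and function names are hypothetical, and the model carries no structural or kinetic information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionStep:
    position: int    # tRNA position being added (74, 75 or 76)
    substrate: str   # incoming nucleotide
    before: str      # enzyme conformation before nucleotide binding
    during: str      # conformation at the nucleotide insertion stage
    after: str       # conformation once the step is finished


# States as described in the text for the class-I enzyme.
CLASS_I_STEPS = (
    AdditionStep(74, "CTP", "open", "closed", "open"),
    AdditionStep(75, "CTP", "open", "closed", "closed"),
    AdditionStep(76, "ATP", "closed", "closed", "open"),
)


def simulate(last_position: int = 73):
    """Return the nucleotides added and the steps taken.

    `last_position` is the position of the current 3'-terminal nucleoside.
    """
    added = ""
    trace = []
    for step in CLASS_I_STEPS:
        if step.position <= last_position:
            continue
        added += step.substrate[0]
        trace.append(step)
    return added, trace


added, trace = simulate(73)
for step in trace:
    switch = "open-to-closed" if step.before != step.during else "no conformational switch"
    print(f"{step.position}: {step.substrate} ({switch}), enzyme ends {step.after}")
```

The table makes the contrast explicit: the two C-adding steps involve an open-to-closed transition, whereas the A76-adding step starts from the already closed enzyme.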

### **THE class-II CCA-ADDING ENZYME AND ITS RELATED ENZYMES**

In this section, we review the mechanism of CCA-addition by the class-II eubacterial/eukaryotic CCA-adding enzyme, and its related class-II eubacterial CC-adding and A-adding enzymes, based on the crystal structures of the enzymes and their complexes with RNAs (Li et al., 2002; Augustin et al., 2003; Tomita et al., 2004; Toh et al., 2009; Yamashita et al., 2014).

### **PROTEIN-BASED TEMPLATE FOR CTP AND ATP SELECTIONS BY class-II CCA-ADDING ENZYME**

The crystal structures of the class-II CCA-adding enzymes from *Bacillus stearothermophilus* (BstCCA), *Thermotoga maritima* (TmCCA) and human mitochondria (HmtCCA) were reported (Li et al., 2002; Augustin et al., 2003; Toh et al., 2009).

The class-II CCA-adding enzymes adopt a sea-horse-shaped structure, and consist of four domains – the head, neck, body and tail domains (**Figure 5A**). The overall architecture of the class-II CCA adding enzyme is different from that of the class-I CCA-adding enzyme (Okabe et al., 2003; Xiong et al., 2003; **Figure 2A**), and it is rather similar to that of the eubacterial PAP from *Escherichia coli* (EcPAP; Toh et al., 2011; **Figure 5A**). This was anticipated by the comparisons of their amino acid sequences (Yue et al., 1996; **Figure 1**), although EcPAP adopts a sea-otter-shaped structure.

The head domain of the class-II CCA-adding enzyme comprises a five-stranded β-sheet connected by two α-helices, with three conserved catalytic carboxylate residues (Asp55, Asp57, Asp99 of TmCCA) on the β-sheet, and the Mg<sup>2+</sup> ions are coordinated by the carboxylates. As described, the head domain structure of the class-II CCA-adding enzyme is homologous to that of the class-I CCA-adding enzyme (**Figure 2A**), together with those of the catalytic domains of other polynucleotide polymerases. The neck domain is composed of helices, and includes the key nucleobase-interacting residues (Asp and Arg) that are putatively conserved among the class-II enzymes. The body and tail domains are composed of a bundle of α-helices, and recognize the acceptor and TΨC helices of the tRNA, as described below.

In the complex structures of the class-II CCA-adding enzymes (BstCCA and TmCCA) with CTP or ATP, the nucleotide sits in the inter-domain region between the head and neck domains (**Figure 5B**). Both nucleotides are recognized in the same active pocket, through Watson-Crick-like base pairings between the nucleobases and the conserved Asp and Arg residues in the neck domain (Li et al., 2002; Toh et al., 2009). The 4-NH2 of CTP and the 6-NH2 of ATP form hydrogen-bonds with Asp174, whereas the N3 atom of CTP and the N1 atom of ATP hydrogen-bond with Arg177. The O2 atom of CTP also hydrogen-bonds with Arg177 (**Figure 5B**; the amino acid numbering is according to TmCCA).
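The Watson-Crick-like recognition described above can be viewed as a donor/acceptor pattern match. The Python sketch below is a simplified illustration: it encodes the hydrogen-bonding character of the two Watson-Crick-edge groups for each nucleobase, using standard assignments for the common tautomers (the UTP and GTP rows are added here for contrast and are not discussed in the text), and checks them against an Asp/Arg template. Geometry, the O2 contact, and all energetic considerations are ignored, and the names are hypothetical.

```python
# Hydrogen-bonding character of the two Watson-Crick-edge groups:
# (exocyclic group at position 4 or 6, ring nitrogen N3 or N1).
BASE_EDGE = {
    "CTP": ("donor", "acceptor"),   # 4-NH2, N3
    "ATP": ("donor", "acceptor"),   # 6-NH2, N1
    "UTP": ("acceptor", "donor"),   # O4, N3-H
    "GTP": ("acceptor", "donor"),   # O6, N1-H
}

# Protein template: Asp carboxylate accepts, Arg guanidinium donates.
PROTEIN_TEMPLATE = ("acceptor", "donor")


def complementary(group_a: str, group_b: str) -> bool:
    """A hydrogen bond needs one donor and one acceptor."""
    return {group_a, group_b} == {"donor", "acceptor"}


def matches_template(ntp: str) -> bool:
    """True if both edge groups of the nucleobase can pair with Asp and Arg."""
    return all(
        complementary(base_group, protein_group)
        for base_group, protein_group in zip(BASE_EDGE[ntp], PROTEIN_TEMPLATE)
    )


for ntp in BASE_EDGE:
    print(ntp, matches_template(ntp))
```

Under these assumptions, CTP and ATP present the same donor/acceptor pattern to the Asp and Arg side chains, which is consistent with both being recognized in a single pocket.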

The mechanism of nucleobase recognition by the class-II CCA-adding enzyme is distinct from that observed in the class-I CCA-adding enzyme (**Figures 2D,E**). The template for CCA-addition by the class-II CCA-adding enzyme is composed of the protein itself, rather than the RNA–protein complex as in the class-I CCA-adding enzyme. The protein-template of the class-II CCA-adding enzymes was confirmed by the mutations of Asp and Arg in the neck domain. The rational mutagenesis of the two key residues in the neck domain allowed the enzyme to add other nucleotides *in vitro* (Cho et al., 2007).

### **PROTEIN-BASED TEMPLATE FOR ATP-SELECTION BY THE CLASS-II A-ADDING ENZYME**

Although the structures of the class-II CCA-adding enzyme and its complexes with CTP and ATP are available (Li et al., 2002; Toh et al., 2009), the structures of the class-II CCA-adding enzyme complexed with tRNA (or RNA) have not been reported yet. Thus, the detailed RNA polymerization mechanism of CCA-addition by the class-II CCA-adding enzyme remained enigmatic.

Compounding the unsettled questions on the polymerization mechanism by the class-II CCA-adding enzyme, in some eubacteria such as *Aquifex aeolicus*, the CCA-adding activity is split between two distinct, but closely related, enzymes: one adds C74C75 and the other adds A76 (Tomita and Weiner, 2001, 2002; Bralley et al., 2005; Neuenfeldt et al., 2008; **Figure 1**). In *A. aeolicus*, which is placed at the deepest root of the 16S rRNA-based phylogenetic tree (Pace, 1997), the CC-adding and A-adding enzymes collaboratively synthesize the CCA-3′ end (Tomita and Weiner, 2001). On the other hand, in *T. maritima*, which is also located at the deepest root and is evolutionarily close to *A. aeolicus*, a single enzyme homologous to the *A. aeolicus* A-adding enzyme (AaL), rather than the CC-adding enzyme (AaS), adds the CCA-3′ end (Tomita and Weiner, 2001, 2002).

The complex structure of AaL with tRNA lacking the terminal A76 and an ATP analog was reported (Tomita et al., 2004). AaL also adopts a sea-horse-shaped structure, as found with the other class-II CCA-adding enzymes (**Figure 5C**). As anticipated from the sequence similarity and closely related phylogeny (Tomita and Weiner, 2002), the overall structure of AaL superimposed well onto that of TmCCA (Toh et al., 2009).

In the complex structure of AaL with tRNA lacking the terminal A76 and an ATP analog (**Figure 5C**), the acceptor and TΨC helices of the tRNA are recognized by the enzyme, and the anticodon region does not interact with the enzyme. As in the class-I CCA-adding enzyme (**Figure 2C**), the tail domain of AaL interacts with the elbow region of the tRNA, and functions as an anchor.

The 3′-terminus of the tRNA enters the active pocket, which resides between the head and neck domains of AaL. The geometry of the incoming ATP analog and the 3′-OH group of C75 of the tRNA, relative to the catalytic carboxylates (Asp31, Asp33, and Glu74), suggested that the structure represents the insertion stage of A76-addition (**Figure 5D**). The adenine base of the ATP analog is sandwiched by the cytosine base of C75 of tRNA and by hydrogen-bonds between Asp105 and Arg155. Thus, the binding pocket of ATP is composed of the 3′ end of the tRNA and the protein. In the ternary complex structure, as observed in the complex structure of TmCCA (or BstCCA) with ATP (**Figure 5B**), the adenine base is recognized by the side chains of the conserved amino acids, Asp149 and Arg152, through Watson-Crick-like hydrogen-bonds (**Figure 5D**). The 6-NH2 and the N1 atom of ATP form hydrogen-bonds with Asp149 and Arg152, respectively. Thus, the specificity for ATP by AaL is determined by the side chains of the protein itself, as in the class-II CCA-adding enzymes.

Biochemical and genetic studies revealed that a flexible loop in the head domains of the CCA-adding and A-adding enzymes is involved in the A76-adding reaction, but not the C74C75-adding reaction (Tomita et al., 2004; Neuenfeldt et al., 2008; Toh et al., 2009). In most of the reported crystal structures of the class-II CCA-adding and A-adding enzymes, the loop structure is disordered (Li et al., 2002; Augustin et al., 2003; Tomita et al., 2004). The loop region was only clearly visible in the crystal structure of *apo* TmCCA (Toh et al., 2009; **Figure 5E**). The loop extends from the head domain to the neck domain, bridging the two domains.

The superimposition of the structures of *apo* TmCCA and AaL complexed with tRNA revealed that the loop would interact with the 3′-part of the tRNA (Toh et al., 2009; **Figure 5E**). It is likely that the loop recognizes the growing 3′-CC sequence of the tRNA, and fixes the conformations of two key residues, Asp174 and Arg177, for the specific recognition of ATP. The corresponding loops in the CC-adding enzymes are shorter than those in the CCA-adding and A-adding enzymes, and the shorter loop was suggested to be one of the hallmarks of the CC-adding enzyme (Neuenfeldt et al., 2008).

### **TRANSLOCATION AND ROTATION OF tRNA DURING CC-ADDITION BY THE CC-ADDING ENZYME**

More recently, the structures of the *A. aeolicus* CC-adding enzyme, AaS (Tomita and Weiner, 2001), in its *apo* form and in complexes with various tRNAs were reported (Yamashita et al., 2014). AaS also adopts a sea-horse-shaped structure similar to the other class-II CCA-adding and A-adding enzymes (**Figure 6A**). Although the structures of the head and neck domains of AaS are homologous to those of the class-II CCA-adding and A-adding enzymes, the structure of the body domain of AaS slightly differs from those of the CCA-adding and A-adding enzymes (**Figures 5A** and **6A**). The overall structure of AaS adopts a relatively closed form, by the insertion of an additional α-helix between the head and neck domains, and the body domain of AaS contains an additional α-helix and forms a bulging structure.

**FIGURE 6 | Structure of the class-II CC-adding enzyme. (A)** Overall structure of *Aquifex aeolicus* CC-adding enzyme (AaS). The head, neck, body, and tail domains are colored magenta, green, cyan, and orange, respectively. Unique insertion helices are colored red and yellow. **(B)** Complex structures of AaS with tRNA lacking CCA. The tRNA is depicted by a stick model. Detailed view of the interaction between the acceptor helix of tRNA and AaS (inset). **(C)** Structures of the catalytic pocket of AaS at the C74-adding (left) and C75-adding (middle) stages. CTPs are depicted by stick models. The superimposition of the structures at C74-adding (magenta) and C75-adding (cyan) stages (right). **(D)** Comparison of the overall complex structures of AaS at the C74-adding (cyan) and C75-adding (pink) stages. **(E)** Detailed view of the superimposed structures of the tRNA acceptor helices at the C74-adding (cyan) and C75-adding (pink) stages in **(D)**.

The complex structures of AaS with tRNA and an incoming CTP, representing the C74-adding and C75-adding stages, were also reported (Yamashita et al., 2014; **Figures 6B,C**). As observed in the complex structure of AaL with tRNA (Tomita et al., 2004; **Figure 5C**), AaS also recognizes the top-half region of the tRNA, and does not interact with the anticodon region (**Figure 6B**). The TΨC loop and D-loop of the tRNA interact with the tail domain of AaS.

In the C74-adding structure, the base-pair at the top of the tRNA acceptor helix stacks with Phe83 and Phe85 on the β-sheet in the head domain, and the 3′-terminal discriminator nucleoside A73 enters the active pocket. The cytosine base of the incoming CTP and the adenine base of A73 are stacked, and the triphosphate of the CTP and the ribose 3′-OH group of A73 are proximal to the catalytic carboxylates (Asp58, Asp60 and Asp112) and the Mg<sup>2+</sup> ion. In the ternary structure, the 4-NH2 and the N3 atom of CTP form hydrogen-bonds with Asp182 and Arg185, respectively. The O2 atom of CTP also forms a hydrogen-bond with Arg185 (**Figure 6C**). The mechanism of CTP recognition by AaS is the same as that observed in the complex structures of the class-II CCA-adding enzyme with CTP in the absence of tRNA (Li et al., 2002; Toh et al., 2009). In the complex structure of AaS representing the C75-adding stage, the mechanism of CTP recognition is the same as that at the C74-adding stage (**Figure 6C**). Thus, the C74- and C75-adding reactions both proceed by the same mechanism, using the same active pocket. The size and the shape of the nucleotide binding pocket at the C74-adding and C75-adding stages are suitable to accommodate CTP, but not the other three nucleotides. CTP is selectively accommodated as a consequence of competition between nucleotides, using the conserved Asp182 and Arg185 in the pocket.

A comparison of the complex structures of the C74-adding and C75-adding stages revealed the translocation and rotation of the tRNA relative to the enzyme (**Figure 6D**). As a consequence of its backward translocation, the tRNA rotates and the relative orientation of its anticodon region on the enzyme changes by approximately 25°, as compared to that of the tRNA in the C74-adding stage. In the complex structure of the C75-adding stage, the positions that were occupied by the terminal base-pair (G1–C72) of the tRNA acceptor stem at the C74-adding stage are now empty. A73 translocates out of the catalytic pocket, and the position that was occupied by A73 at the C74-adding stage accommodates the newly incorporated C74 (**Figure 6E**). The release of pyrophosphate, a byproduct of RNA polymerization, from the active site of AaS triggers the backward translocation of the tRNA, as observed in T7 RNA polymerase (Yin and Steitz, 2004).

Upon pyrophosphate release from the active site after C75-addition, the tRNA translocates further toward the tail domain and rotates relative to the enzyme. A tRNA ending with C74C75 could no longer be retained on the enzyme, due to the further translocation and rotation of the tRNA relative to the enzyme, and would dissociate after C74C75 synthesis. Thus, the enzyme would terminate RNA synthesis. The tRNA ending with C74C75 then binds to AaL, the terminal A76 is added, and the synthesis of the CCA-3′ end of the tRNA is completed.
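The order of events described above can be summarized as a small state machine. The following is a minimal sketch under stated assumptions, not a kinetic or structural model: the tRNA 3′ tail beyond the discriminator position 73 is represented as a plain string, each enzyme is reduced to a rule on that string, and the function names (`aas_add_cc`, `aal_add_a`, `mature`) are hypothetical.

```python
from typing import List, Tuple

# Toy state machine for split CCA synthesis in A. aeolicus.
# Assumption: the tail is the sequence already added after position 73
# ("" for a tRNA ending at the discriminator, "C", "CC", or "CCA").


def aas_add_cc(tail: str) -> Tuple[str, List[str]]:
    """CC-adding enzyme (AaS): adds C74 and C75, then lets the tRNA go."""
    log: List[str] = []
    while tail in ("", "C"):
        tail += "C"
        log.append(f"AaS adds C{73 + len(tail)}")
        log.append("pyrophosphate released; tRNA translocates and rotates")
    return tail, log


def aal_add_a(tail: str) -> Tuple[str, List[str]]:
    """A-adding enzyme (AaL): adds A76 only to a tRNA ending with C74C75."""
    if tail == "CC":
        return "CCA", ["AaL adds A76"]
    return tail, []


def mature(tail: str = "") -> Tuple[str, List[str]]:
    """Run the tRNA through AaS and then AaL, collecting the event log."""
    tail, cc_log = aas_add_cc(tail)
    tail, a_log = aal_add_a(tail)
    return tail, cc_log + a_log


if __name__ == "__main__":
    final_tail, events = mature("")
    for event in events:
        print(event)
    print("final 3' tail:", final_tail)
```

Starting from an empty tail, the log contains two C-addition steps, each followed by a translocation step, and a final A76 addition by the second enzyme; a tail that is already `CCA` passes through unchanged.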

Previous biochemical studies using class-II CCA-adding enzymes and a tRNA mini-helix, corresponding to the top-half of tRNA, suggested that C74 addition, like C75 and A76 addition, involves neither tRNA translocation nor rotation (Cho et al., 2006). The mechanisms of CC-addition by the class-II CC-adding and CCA-adding enzymes might be different. Alternatively, the previous biochemical study using a tRNA mini-helix might not represent the actual nature of C74-addition onto tRNA, since the tRNA mini-helix lacks the interactions between the TΨC loop and the D-loop in tRNA.

The dynamic sequence of the CC-adding reaction by the class-II CC-adding enzyme, revealed by the crystallographic analyses of the complexes of the CC-adding enzyme with various tRNAs with or without CTP, is presented in **Figure 7**.

### **CONCLUSIONS AND PERSPECTIVES**

The detailed and extensive crystallographic analyses of the class-I CCA-adding enzyme complexed with various RNA primers explained the previous biochemical results well (Shi et al., 1998a; Yue et al., 1998; Okabe et al., 2003; Xiong et al., 2003; Xiong and Steitz, 2004; Cho et al., 2006; Tomita et al., 2006; Toh et al., 2008; Pan et al., 2010). As previously suggested by biochemical studies using the class-I CCA-adding enzyme (Shi et al., 1998b; Yue et al., 1998; Cho et al., 2006), structural studies of the class-I CCA-adding enzyme revealed that the tRNA neither translocates nor rotates relative to the enzyme during the CCA-adding reaction. The reaction proceeds in a single active pocket, and the template for CTP and ATP is the RNA-protein complex, rather than the protein itself. The crystallographic analyses also showed that the size and the shape of the nucleotide binding pocket, formed by the growing 3′-terminus of the RNA and the enzyme, successively change during the CCA-adding reaction, thus switching the nucleotide specificity, and that the CCA-adding reaction proceeds via two modes: dynamic CC-addition and static A-addition.

On the other hand, regarding the class-II CCA-adding enzyme, only the *apo* structures of the enzyme and its complexes with CTP and ATP are available (Li et al., 2002; Toh et al., 2009). Until now, the structure of a class-II CCA-adding enzyme complexed with tRNA (or RNA) has not been reported, and the detailed molecular mechanism of nucleotide specificity switching during CCA-addition has remained elusive. The complex structures of the class-II CCA-adding enzyme with CTP and ATP revealed that both CTP and ATP are recognized in the same active pocket, through Watson-Crick-like base pairings between the nucleobases and the conserved Asp and Arg residues in the active pocket (Li et al., 2002; Toh et al., 2009). Thus, the template for CTP and ATP is the protein itself, rather than the RNA-protein complex. This is distinct from the mechanism of nucleotide selection by the class-I CCA-adding enzymes.

The complex structures of the *A. aeolicus* class-II CC-adding and A-adding enzymes with tRNA and a nucleotide were reported (Tomita et al., 2004; Yamashita et al., 2014). The recognition mechanisms of CTP and ATP by the *A. aeolicus* CC-adding and A-adding enzymes at the insertion stage of RNA polymerization, respectively, are the same as those of the class-II CCA-adding enzyme, as revealed by the complex structures of these enzymes with nucleotides (Li et al., 2002; Toh et al., 2009).

The head, neck, body, and tail domains are colored magenta, green, cyan, and orange, respectively. tRNAs are colored gray. Catalytic sites are colored yellow in the head domains.

The detailed molecular basis for the different activities between the A-adding enzyme and the CC-adding enzymes is not fully understood. The mechanism by which the CC-adding enzyme adds only C74C75, and then terminates RNA polymerization without adding A76, was explained well by the structural analyses (Yamashita et al., 2014). However, the mechanism by which the A-adding enzyme adds only A76, but not C74C75, has not been clarified yet, even though the complex structure of the *A. aeolicus* A-adding enzyme with tRNA and a nucleotide analog is available (Tomita et al., 2004).

The short loop in the head domain is suggested to be a hallmark of the CC-adding enzymes (Neuenfeldt et al., 2008). However, transplantation of the corresponding flexible longer loop from the A-adding enzyme (or CCA-adding enzyme) into the corresponding position of the CC-adding enzyme did not always transform the CC-adding enzyme into a CCA-adding enzyme, *in vitro* as well as *in vivo* (Toh et al., 2009). Thus, the longer loop in the head domain of the A-adding enzyme itself is not the main determinant for the A-adding enzyme to add only terminal A76. The C-terminal body and tail domains of the A-adding enzymes reportedly inhibit C74C75 addition *in vitro* (Tretbar et al., 2011). Since the overall structures of *A. aeolicus* A-adding enzyme and *T. maritima* CCA-adding enzyme superimposed well, the inhibitory effects of the C-terminal region of all A-adding enzymes on CC-addition probably do not dictate the specificity for the enzymes to add only terminal A76, in general.

**FIGURE 8 | Hypothetical mechanisms of nucleotide addition by class-II enzymes. (A)** CC-addition by CC-adding enzyme possessing two tRNA binding sites (sites A and B). Sites A and B are used for C74- and C75-addition, respectively. When a tRNA ending with the discriminator nucleoside (D73) is on site B, the 3′-terminal D73 cannot reach the active site for catalysis (inactive state). A tRNA ending with C74C75 cannot bind either site A or B. **(B)** A-addition by A-adding enzyme, possessing a single tRNA binding site. The neck domain is not flexible. The tRNA binds the enzyme, using the single site. On this site, the 3′-terminal nucleoside of a tRNA ending in either a discriminator nucleoside (tRNA-D73) or C74 (tRNA-C74) cannot reach the active site (inactive state). Only the 3′-terminal nucleoside of tRNA ending in C74C75 can reach the active site for the catalysis (active state). **(C)** CCA-addition by CCA-adding enzyme, possessing a single tRNA binding site. tRNA binds the enzyme, using the single site. Since, unlike the A-adding enzyme, the neck domain of the CCA-adding enzyme is flexible, the head domain of the CCA-adding enzyme could relocate toward the neck domain to catalyze C74 and C75 addition. The head, neck, body, and tail domains of the enzymes are colored magenta, green, cyan, and orange, respectively. tRNAs are colored gray. Catalytic sites are colored yellow in the head domains.

The C74C75-addition by the CC-adding enzyme involves the translocation and rotation of the tRNA relative to the enzyme (Yamashita et al., 2014). Apparently, there are two tRNA binding sites on the surface of *A. aeolicus* CC-adding enzyme. One is for C74-addition and the other is for C75-addition (**Figure 8A**). The body domain of *A. aeolicus* CC-adding enzyme adopts a bulging structure and the overall structure has a more closed conformation, as compared with the *A. aeolicus* A-adding enzyme and other CCA-adding enzymes (Li et al., 2002; Tomita et al., 2004; Toh et al., 2009; Yamashita et al., 2014). Thus, the distinct structures of the body domain of the CC-adding enzyme, with two tRNA binding sites, allow the tRNA to translocate and rotate during the CC-adding reactions and to terminate the RNA polymerization after CC-addition.

A chimeric enzyme of *A. aeolicus* A-adding enzyme and the closely related *T. maritima* CCA-adding enzyme (Tomita and Weiner, 2001, 2002), designed based on their crystal structures, was constructed, and the A-adding enzyme was transformed into an enzyme that could perform CCA-addition *in vitro* as well as *in vivo* (Toh et al., 2009). These biochemical and genetic analyses suggested the importance of the flexibility of the neck domain, in defining the number of nucleotides added onto the 3′-end of the tRNA and the nucleotide specificity by the class-II CCA-adding enzyme.

It could be hypothesized that only a single tRNA binding site exists in both the class-II A-adding and CCA-adding enzymes (**Figures 8B,C**). On the single tRNA binding site, the 3′-terminus of tRNA lacking CCA or CA could not reach the active site without structural change of the enzyme, while the 3′-terminus of tRNA lacking A76 could reach it. The absence of flexibility in the neck domain of the A-adding enzyme would not allow the head domain of the enzyme to relocate toward the neck domain for CC-addition, when tRNA binds the single tRNA binding site. Hence, the A-adding enzyme could not add C74C75, but could only add A76 (**Figure 8B**). On the other hand, the CCA-adding enzyme, with a flexible neck domain, could add CC as well as A, by relocating the catalytically active head domain toward the neck domain for catalysis (**Figure 8C**), even though there is only one tRNA binding site, as in the A-adding enzyme. In these models, neither translocation nor rotation of tRNA is involved in the CCA-addition by the class-II CCA-adding enzyme, as previously suggested (Cho et al., 2006).
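The single-binding-site hypothesis can be written down as a simple rule set. The sketch below encodes only the hypothesis stated in the text, and it should be read as an illustration of the model's logic rather than as established mechanism: a tail of `CC` can always reach the active site, whereas a tail of `""` or `C` can be extended only when the neck domain is flexible. The function names and the Boolean `neck_flexible` parameter are hypothetical simplifications.

```python
from typing import Optional

# Toy encoding of the hypothetical single-site model for class-II enzymes.
# Assumption: "neck_flexible" stands in for the ability of the head domain
# to relocate toward the neck domain; the tail is the sequence already added
# after the discriminator position 73.


def next_nucleotide(tail: str, neck_flexible: bool) -> Optional[str]:
    """Return the nucleotide the enzyme would add next, or None if none."""
    if tail == "CC":
        return "A"  # 3' end reaches the active site without rearrangement
    if tail in ("", "C") and neck_flexible:
        return "C"  # head domain relocates to reach the shorter 3' end
    return None     # 3' end cannot reach the active site, or tail is complete


def extend(tail: str, neck_flexible: bool) -> str:
    """Extend the tail until the enzyme can add nothing more."""
    while True:
        nucleotide = next_nucleotide(tail, neck_flexible)
        if nucleotide is None:
            return tail
        tail += nucleotide


if __name__ == "__main__":
    print("A-adding enzyme, tRNA ending at 73:", repr(extend("", False)))
    print("A-adding enzyme, tRNA ending in CC:", repr(extend("CC", False)))
    print("CCA-adding enzyme, tRNA ending at 73:", repr(extend("", True)))
```

Under these rules a rigid-neck enzyme behaves as an A-adding enzyme and a flexible-neck enzyme as a CCA-adding enzyme, which mirrors the outcome of the chimeric-enzyme experiment described above without claiming to explain it.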

In the future, the complex crystal structures of a class-II CCA-adding enzyme with various tRNAs, representing sequential CCA-addition, and the comparison of the structures with those of the A-adding enzyme and the CC-adding enzyme complexed with tRNAs, will provide clear and definitive answers to all of the above-mentioned unsettled questions about the mechanisms of the class-II CCA-adding enzyme and its relatives.

### **ACKNOWLEDGMENTS**

The research in our laboratory was supported by grants to Kozo Tomita from the Funding Program for Next Generation World-Leading Researchers (NEXT Program) of JSPS, Grants-in-Aid for Scientific Research (A) of JSPS, Precursory Research for Embryonic Science and Technology (PRESTO Program) of JST, the Takeda Science Foundation, the Mitsubishi Foundation, the Naito Foundation, and the Kurata Hitachi Memorial Foundation.

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 January 2014; accepted: 31 January 2014; published online: 17 February 2014.*

*Citation: Tomita K and Yamashita S (2014) Molecular mechanisms of template-independent RNA polymerization by tRNA nucleotidyltransferases. Front. Genet. 5:36. doi: 10.3389/fgene.2014.00036*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Tomita and Yamashita. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Methylated nucleosides in tRNA and tRNA methyltransferases

### *Hiroyuki Hori\**

*Department of Materials Science and Biotechnology, Applied Chemistry, Graduate School of Science and Engineering, Ehime University, Matsuyama, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

*Reviewed by:*

*Ramesh Gupta, Southern Illinois University, USA Akira Muto, Hirosaki University, Japan*

*\*Correspondence: Hiroyuki Hori, Department of Materials Science and Biotechnology, Graduate School of Science and Engineering, Ehime University, Bunkyo 3, Matsuyama, Ehime 790-8577, Japan e-mail: hori@eng.ehime-u.ac.jp*

To date, more than 90 modified nucleosides have been found in tRNA and the biosynthetic pathways of the majority of tRNA modifications include a methylation step(s). Recent studies of the biosynthetic pathways have demonstrated that the availability of methyl group donors for the methylation in tRNA is important for correct and efficient protein synthesis. In this review, I focus on the methylated nucleosides and tRNA methyltransferases. The primary functions of tRNA methylations are linked to the different steps of protein synthesis, such as the stabilization of tRNA structure, reinforcement of the codon-anticodon interaction, regulation of wobble base pairing, and prevention of frameshift errors. However, beyond these basic functions, recent studies have demonstrated that tRNA methylations are also involved in the RNA quality control system and regulation of tRNA localization in the cell. In a thermophilic eubacterium, tRNA modifications and the modification enzymes form a network that responds to temperature changes. Furthermore, several modifications are involved in genetic diseases, infections, and the immune response. Moreover, structural, biochemical, and bioinformatics studies of tRNA methyltransferases have been clarifying the details of tRNA methyltransferases and have enabled these enzymes to be classified. In the final section, the evolution of modification enzymes is discussed.

**Keywords: RNA modification, RNA methylation, RNA maturation**

### **INTRODUCTION**

The first tRNA sequence was determined in 1965 and numerous modifications were identified at various positions within the sequence (Holley et al., 1965). At almost the same time, several tRNA methyltransferase activities were detected in *Escherichia coli* cell extract (Hurwitz et al., 1964), which suggested that diverse enzymes are involved in tRNA modification. To date, more than 90 modified nucleosides have been identified in tRNA (Machnicka et al., 2013). Thus, the majority of modified nucleosides that have been discovered in different RNA species are found in tRNA. In the twenty-first century, the major modification pathways of tRNA have been elucidated on the basis of genome sequence data. These studies have demonstrated that the pathways of tRNA modification show diversity among living organisms. In this review, I focus on the methylated nucleosides in tRNA, together with tRNA methyltransferases, and introduce their basic roles as well as their more complex functions.

### **THE PRIMARY ROLE OF tRNA MODIFICATIONS IS THE REGULATION OF PROTEIN SYNTHESIS**

Transfer RNA is an adaptor molecule that enables the genetic code of nucleic acids to be converted to amino acids in protein. Consequently, the primary functions of individual tRNA modifications are linked to the different steps of protein synthesis. In fact, if a tRNA remains unmodified, it may become charged with a non-cognate amino acid, the corresponding codon in the mRNA is mistranslated, and a mutation is introduced. **Table 1** summarizes the typical methylated nucleosides and their positions within the tRNA, their distributions in the three domains of life, the corresponding tRNA methyltransferases, their contributions to tRNA structure, their functions in addition to structural roles, and related publications (Phillips and Kjellin-Straby, 1967; Taya and Nishimura, 1973; Yaniv and Folk, 1975; Delk et al., 1976; Watanabe et al., 1976, 2005, 2006; Pierre et al., 1978, 2003; Pope et al., 1978; Raba et al., 1979; Greenberg and Dudock, 1980; Ny and Bjork, 1980; Osorio-Almeida et al., 1980; Byström and Björk, 1982; Hopper et al., 1982; Walker, 1983; Gupta, 1984; Johnson et al., 1985; Ellis et al., 1986; Reinhart et al., 1986; van Tol et al., 1987; Ny et al., 1988; Björk et al., 1989, 2001; Jakab et al., 1990; Keith et al., 1990; Perret et al., 1990; Edmonds et al., 1991; Gu and Santi, 1991; Gustafsson and Björk, 1993; Hagervall et al., 1993; Edqvist et al., 1994; Kowalak et al., 1994; Martin and Hopper, 1994; Grosjean et al., 1995, 1996, 2008; Durand et al., 1997; Jiang et al., 1997; Li et al., 1997; Persson et al., 1997, 1998; Anderson et al., 1998, 2000; Constantinesco et al., 1998, 1999a,b; Helm et al., 1998; Hori et al., 1998, 2002, 2003; Matsuyama et al., 1998; Qian et al., 1998; Tomita et al., 1998; Cavaillé et al., 1999; Farabaugh and Björk, 1999; Liu et al., 1999, 2003, 2013; Motorin and Grosjean, 1999; Niederberger et al., 1999; Liu and Straby, 2000; Nordlund et al., 2000; Clouet-d'Orval et al., 2001, 2005; Dong et al., 2001; Urbonavicius et al., 2001, 2002, 2003, 2005; Yasukawa et al., 2001; Alexandrov et al., 2002, 2005, 2006; Johansson and Byström, 2002; King and Redman, 2002; Pintard et al., 2002; Suzuki et al., 2002, 2007, 2011a; Ahn et al., 2003; Bortolin et al., 2003; De Bie et al., 2003; Droogmans et al., 2003; Elkins et al., 2003; Jackman et al., 2003; Kalhor and Clarke, 2003; Kaneko et al., 2003; Kierzek and Kierzek, 2003; Takai and Yokoyama, 2003; Armengaud et al., 2004; Brulé et al., 2004; Bujnicki et al., 2004; Christian et al., 2004, 2013; Freude et al., 2004; Kadaba et al., 2004; Nasvall et al., 2004; Nureki et al., 2004; O'Dwyer et al., 2004; Okamoto et al., 2004; Roovers et al., 2004, 2008a,b, 2012; Singh et al., 2004; Cartlidge et al., 2005; Chen et al., 2005, 2011; Durant et al., 2005; Huang et al., 2005; Kalhor et al., 2005; Kirino et al., 2005; Leipuviene and Björk, 2005; Lu et al., 2005; Pleshe et al., 2005; Purushothaman et al., 2005; Renalier et al., 2005; Sakurai et al., 2005a,b; Umeda et al., 2005; Waas et al., 2005, 2007; Brzezicha et al., 2006; Goll et al., 2006; McCrate et al., 2006; Noma et al., 2006, 2010, 2011; Ote et al., 2006; Purta et al., 2006; Shigi et al., 2006; Takano et al., 2006; Takeda et al., 2006; Yim et al., 2006; Zegers et al., 2006; Auxilien et al., 2007, 2011, 2012; Begley et al., 2007; Christian and Hou, 2007; Choudhury et al., 2007a,b; Lee et al., 2007; Matsumoto et al., 2007; Ozanick et al., 2007; Walbott et al., 2007; Wilkinson et al., 2007; Alian et al., 2008; Barraud et al., 2008; Chernyakov et al., 2008; Goto-Ito et al., 2008, 2009; Ihsanawati et al., 2008; Klassen et al., 2008; Kurata et al., 2008; Kuratani et al., 2008, 2010; Leulliot et al., 2008; Meyer et al., 2008, 2009; Tomikawa et al., 2008, 2010, 2013; Toyooka et al., 2008; Awai et al., 2009, 2011; Lai et al., 2009; Moukadiri et al., 2009, 2014; Nishimasu et al., 2009; Osawa et al., 2009; Shi et al., 2009; Shimada et al., 2009; Umitsu et al., 2009; Ye et al., 2009; Arragain et al., 2010; Atta et al., 2010, 2012; Benítez-Páez et al., 2010; Böhme et al., 2010; Chen and Yuan, 2010; de Crécy-Lagard et al., 2010; Fu et al., 2010; Guelorget et al., 2010, 2011; Kempenaers et al., 2010; Mazauric et al., 2010; Ochi et al., 2010, 2013; Songe-Møller et al., 2010; Tkaczuk, 2010; D'Silva et al., 2011; Hamdane et al., 2011a,b, 2012, 2013; Joardar et al., 2011; Kitamura et al., 2011, 2012; Leihne et al., 2011; Liger et al., 2011; Lin et al., 2011; Menezes et al., 2011; Pearson and Carell, 2011; Qiu et al., 2011; van den Born et al., 2011; Wei et al., 2011; Armengod et al., 2012; Chan et al., 2012; Chatterjee et al., 2012; Chujo and Suzuki, 2012; Dewe et al., 2012; Fislage et al., 2012; Gehrig et al., 2012; Guy et al., 2012; Jöckel et al., 2012; Novoa et al., 2012; Pastore et al., 2012; Patil et al., 2012a,b; Perche-Letuvée et al., 2012; Sakaguchi et al., 2012; Towns and Begley, 2012; Vilardo et al., 2012; Wurm et al., 2012; Yamagami et al., 2012; Edelheit et al., 2013; Fujimori, 2013; Igoillo-Esteve et al., 2013; Kim and Almo, 2013; Ohira et al., 2013; Paris et al., 2013; Preston et al., 2013; Shao et al., 2013; Swinehart et al., 2013).

In **Table 1**, several important tRNA modifications such as pseudouridine (ψ), lysidine, agmatidine, queuosine (Q), and 2-thiouridine (s<sup>2</sup>U) are not listed because their biosynthetic pathways do not include any methylation steps. Nevertheless, **Table 1** outlines the roles of key tRNA modifications, and demonstrates that methylated nucleosides and tRNA methyltransferases are very important for such functions. The structures of typical methylated nucleosides are shown in **Figure 1**. It is impossible to depict all methylated nucleosides in **Figure 1** due to limitations of space. Please visit the database (http://modomics.genesilico.pl/modifications/) to obtain additional structural information (Machnicka et al., 2013). The structure of tRNA and positions of the methylated nucleotides are shown in **Figure 2**. For tRNA stabilization by methylated nucleosides, see the review by Motorin and Helm (2010). Even today, the contributions to tRNA structure and/or function in protein synthesis of many methylated nucleosides remain unknown (**Table 1**).

However, various tRNA methyltransferases and their corresponding disruptant strains have been analyzed, and their functions are gradually being elucidated. Among the phenotypes of the gene disruptant strains, many phenomena have been reported that are difficult to understand directly in terms of enzyme function or effects on protein synthesis. For example, *E. coli* miaA mutant strains, which contain A37 instead of ms<sup>2</sup>i<sup>6</sup>A37 in the tRNA, show a moderate mutator phenotype that results in an increased rate of GC→TA transversion (Zhao et al., 2001). Furthermore, inosine 34 modification in fission yeast is essential for cell cycle progression (Tsutsumi et al., 2007). These phenomena might be caused by changes in the amount of certain protein(s), such as transcription factors, in the disruptant strains. In fact, recently, it has been reported that Trm9-specific tRNA modifications enhance codon-specific elongation of translation and promote increased levels of DNA damage response proteins (Begley et al., 2007). Furthermore, several eukaryotic tRNA methyltransferases (for example, human ALKBH8 [Shimada et al., 2009; Fu et al., 2010] and yeast Trm2 [Choudhury et al., 2007a,b]) are involved directly in DNA repair and carcinogenesis because they exist as fusion proteins with other enzyme(s). However, it remains possible that some of the phenotypes observed in the disruptant strains are linked to unknown biological phenomena.

**Frontiers in Genetics** | Non-Coding RNA | **www.frontiersin.org** | May 2014 | Volume 5 | Article 144

### **MULTIPLE REGULATION OF tRNA MODIFICATION PATHWAYS AND IMPORTANCE OF THE AVAILABILITY OF METHYL DONORS**

In living cells, more than 50% of the high-energy compounds, such as ATP, that are produced by respiration are consumed by protein synthesis. Furthermore, the most important metabolic pathway of amino acids is protein synthesis. The metabolic pathways of energy and amino acids are closely linked. Studies on the pathways of tRNA modification have revealed that the RNA modification systems are located downstream of the pathways of energy and amino acid metabolism and that they are regulated at multiple steps (Herbig et al., 2002; Iwata-Reuyl, 2003; Ikeuchi et al., 2008, 2010; Shigi et al., 2008; Suzuki and Miyauchi, 2010; Phillips et al., 2012; Laxman et al., 2013; Miyauchi et al., 2013; Perrochia et al., 2013; **Figure 3** and **Table 1**). Thus, depletion of a certain compound (for example, one of the amino acids) or disruption of a metabolic pathway can result in incomplete modification of tRNA and thus an increased frequency of translational errors.

The structures of identified modified nucleosides suggest that the majority of tRNA modifications require a methylation step(s) (**Table 1** and **Figure 1**). The methyl-transfer reaction catalyzed by the majority of tRNA methyltransferases consumes S-adenosyl-L-methionine (AdoMet) as the methyl-group donor. Thus, the depletion of AdoMet leads to multiple incomplete modifications in tRNA. The precursors of AdoMet are ATP and methionine. These facts seem to provide an answer to the question, "Why do living organisms use the methionine codon as the initiation codon for protein synthesis?" Selecting the methionine codon as the initiation codon of protein synthesis avoids an increase in the frequency of translational errors under conditions where methionine is limited and the tRNA contains multiple incomplete modifications. Analogously, the fact that eubacterial methionyl-initiator tRNAMet is formylated, and that formylation is the transfer of one carbon atom, suggests that the supply of sources of single carbon atoms is very important for efficient and accurate protein synthesis in bacteria.

### **STRUCTURES OF tRNA METHYLTRANSFERASES**

Transfer RNA methyltransferases can be divided into two types on the basis of their methyl donor: one class uses AdoMet whereas the other utilizes 5,10-methylenetetrahydrofolate (**Table 2**). As mentioned above, the majority of tRNA methyltransferases are AdoMet-dependent. For information on the catalytic mechanisms of tRNA methyltransferases, see these references (Watanabe et al., 2005; Kuratani et al., 2008; Meyer et al., 2008; Osawa et al., 2009; Hou and Perona, 2010; Hamdane et al., 2012). Recently, a radical SAM enzyme was identified as a ribosomal RNA methyltransferase (Atta et al., 2010); radical SAM enzymes utilize a 4Fe-4S cluster to generate a reactive radical from AdoMet. No radical SAM enzymes that act as tRNA methyltransferases have been identified as yet. However, three types of radical SAM enzymes are involved in tRNA modifications (2-methylthiotransferases that generate ms2t6A derivatives, 2-methylthiotransferases that generate ms2i6A derivatives, and enzymes involved in the biosynthesis of yW37 derivatives) (Suzuki et al., 2007; Atta et al., 2010, 2012; de Crécy-Lagard et al., 2010; Fujimori, 2013 and **Table 1**). Radical SAM tRNA methyltransferase(s) might be identified in the near future, because there are many methylated nucleosides for which the responsible enzyme(s) have not yet been identified (**Table 1**).

AdoMet-dependent methyltransferases are classified by their catalytic domain (Schubert et al., 2003). Two different classes (classes I and IV) have been identified among the tRNA methyltransferases (**Table 2**). Class I enzymes contain the Rossmann fold in the catalytic domain (**Figure 4A**), whereas class IV enzymes have the topological-knot structure (**Figure 4B**). Class IV enzymes were predicted initially by bioinformatics studies to be members of the SpoU-TrmD (SPOUT) superfamily (Anantharaman et al., 2002). Subsequently, crystallographic studies (**Table 2**) revealed that these enzymes have a topological-knot structure. YibK was predicted initially to be an RNA methyltransferase of unknown function (Gustafsson et al., 1996). Determination of the crystal structure revealed the presence of the topological-knot structure in the catalytic domain of YibK (Lim et al., 2003). Later, YibK was shown to function as tRNA (Cm34/cmnm5Um34) methyltransferase and was renamed TrmL (Benítez-Páez et al., 2010; Liu et al., 2013). At almost the same time, three groups independently reported the crystal structures of TrmD proteins and revealed that TrmD proteins also contain the topological-knot structure (Ahn et al., 2003; Elkins et al., 2003; Liu et al., 2003). In 1997, SpoU was found to have tRNA (Gm18) 2′-*O*-methyltransferase activity and was renamed TrmH (Persson et al., 1997). We solved the crystal structure of TrmH in 2004 and confirmed that it is a class IV enzyme with the topological-knot structure (Nureki et al., 2004 and **Figure 4C**). These studies established the structural foundation of SPOUT enzymes (Anantharaman et al., 2002; Tkaczuk et al., 2007), which can be identified on the basis of the topological-knot structure. To date, several tRNA methyltransferases have been identified as members of the SPOUT superfamily on the basis of crystal structures (Kuratani et al., 2008; Chen and Yuan, 2010; Chatterjee et al., 2012; Wurm et al., 2012; Shao et al., 2013) or structures predicted from amino acid sequences and conserved motifs (Renalier et al., 2005; Purta et al., 2006; Tkaczuk et al., 2007; Kempenaers et al., 2010, and **Figure 4B**). Furthermore, the SPOUT superfamily is expanding beyond the SpoU and TrmD families: novel enzymes such as an archaeal Trm10 homolog (Kempenaers et al., 2010) and TrmY (Chen and Yuan, 2010; Chatterjee et al., 2012; Wurm et al., 2012) have been identified. These enzymes cannot be simply classified into the SpoU or TrmD families. Therefore, it might be necessary to reclassify the enzymes of the SPOUT superfamily on the basis of their structure, the methylated nucleosides produced, and their reaction mechanisms.

[**Table 2** (body not recovered). Surviving entries: Trm10 (Shao et al., 2013); aTrm56 (Kuratani et al., 2008); row heading "Radical SAM-tRNA methyltransferase".]

*Table 2 footnote: The enzymes whose structures have been determined by X-ray crystallography are listed. The structures of various other enzymes have been predicted from their amino acid sequences, conserved motifs and bioinformatics studies (Gustafsson et al., 1996; Anantharaman et al., 2002; Purta et al., 2006; Roovers et al., 2008a; Phizicky and Hopper, 2010; Tkaczuk, 2010). Detailed insight into the catalytic mechanisms of tRNA methyltransferases is available in only a few cases: see these references (Watanabe et al., 2005; Kuratani et al., 2008; Meyer et al., 2008; Osawa et al., 2009; Hou and Perona, 2010; Hamdane et al., 2012).*

The number of identified class I methyltransferases has also increased. Crystal structures of class I enzymes have been reported, as shown in **Table 2**; however, for many of the enzymes, structures have been predicted from their amino acid sequences and conserved motifs. The difficulty with crystallographic studies is that the eukaryotic and archaeal enzymes often require other subunit(s) to regulate (or stabilize) their activities (Anderson et al., 1998; Alexandrov et al., 2002, 2005; Purushothaman et al., 2005; Mazauric et al., 2010; Liger et al., 2011; Noma et al., 2011, and **Table 1**). Only a few structural studies of the multisubunit complexes have been performed, namely Trm8–Trm82 (Leulliot et al., 2008), and the Fibrillarin, Nop5 and L7Ae complex (Ye et al., 2009; Lin et al., 2011). In addition, structures for the tRNA-bound form of Trm5 (Goto-Ito et al., 2009) and the T-arm-like RNA-bound form of TrmA (Alian et al., 2008) have been reported. Furthermore, several eukaryotic tRNA methyltransferases are fused with other functional domains and are involved in other processes such as DNA repair (Choudhury et al., 2007a,b; Shimada et al., 2009; Fu et al., 2010; Songe-Møller et al., 2010; D'Silva et al., 2011; Leihne et al., 2011; Noma et al., 2011; van den Born et al., 2011; Pastore et al., 2012). Although the crystal structures of the RNA recognition motif and AlkB domains of ALKBH8, which also contains a methyltransferase domain, have been reported (Pastore et al., 2012), no crystal structure of an entire eukaryotic multidomain tRNA methyltransferase is available. To understand the reaction mechanisms, substrate specificity, subunit (domain) interactions, and regulation of activity of these enzymes, structural studies are necessary.

Among the enzyme complexes that are involved in tRNA methylation, the mnmEG and mnmC complexes, which are required for the mnm5U34 modification (Taya and Nishimura, 1973; Bujnicki et al., 2004; Yim et al., 2006; Meyer et al., 2008, 2009; Roovers et al., 2008b; Moukadiri et al., 2009, 2014; Osawa et al., 2009; Shi et al., 2009; Böhme et al., 2010; Kitamura et al., 2011, 2012; Pearson and Carell, 2011; Armengod et al., 2012; Kim and Almo, 2013), are only found in eubacteria, which shows the complexity of the Xm5U34 biosynthetic pathway. In eukaryotes, the biosynthetic pathways of Xm5U34 have not been completely clarified: Trm9 and the so-called "Elongator" complex are known to be involved (Huang et al., 2005; Chen et al., 2011; Leihne et al., 2011). Furthermore, although we determined recently that tRNALeu from *Thermoplasma acidophilum*, a thermo-acidophilic archaeon, has 5-carbamoylmethyluridine at position 34 (ncm5U34) (Tomikawa et al., 2013), the biosynthetic pathway in archaea is unknown.

[**Figure 4** caption (fragment): ... respectively. The known class IV enzymes work as a dimer. **(C)** The dimer ...]

As studies on eukaryotic enzymes have progressed, the number of complex enzymes identified has increased. For example, mammalian enzymes often have additional domains, regulatory subunits and/or paralogs. For information on the identification and prediction of human tRNA methyltransferases, see this review (Towns and Begley, 2012).

### **TRANSFER RNA RECOGNITION BY tRNA METHYLTRANSFERASES**

Transfer RNA methyltransferases strictly modify a specific nucleoside at a specific position in a tRNA. Within the field of nucleic acid-related enzymes, a common question is "How does the enzyme recognize a specific substrate and act at a specific position?" Consequently, the substrate specificities of tRNA methyltransferases have been studied by measuring activities in crude cell extracts, microinjecting labeled tRNA, biochemical studies with purified enzymes, crystallographic studies, and analyses of tRNA from disruptant strains.

In general, tRNA methyltransferases recognize the local structure around the target site in the tRNA, including tertiary structural elements such as stem-loop structure(s). TrmA from *E. coli* recognizes U54 in the ribose-phosphate backbone of the T-arm (Gu and Santi, 1991; Alian et al., 2008). *Aquifex aeolicus* TrmB requires the five nucleotides AGG∗UC sandwiched between two stem-loop structures (the asterisk corresponds to the methylation site, G46) (Okamoto et al., 2004). TrmFO recognizes the G53-C61 base pair and U54U55C56 sequence in the T-arm (Yamagami et al., 2012). TrmD recognizes the purine36G37 sequence in the anticodon-arm-like microhelix (Brulé et al., 2004; Takeda et al., 2006). In some cases, tertiary interactions are required. For example, crystallographic studies of the complex between Trm5 and tRNA revealed that the enzyme requires interaction between the D- and T-loop of the tRNA (Goto-Ito et al., 2009), which is consistent with the results of biochemical studies with the purified enzyme (Christian et al., 2004; Christian and Hou, 2007).
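The sequence-level part of two of these recognition rules can be illustrated with a minimal sketch: the purine36-G37 requirement described for TrmD and the AGG∗UC element required by *A. aeolicus* TrmB. The function names and the toy input sequence are hypothetical, the reading of the asterisk as marking the G immediately before it is an assumption, and the sketch ignores the secondary and tertiary structural context that the enzymes also require.

```python
import re
from typing import List

PURINES = {"A", "G"}


def has_trmd_motif(nt36: str, nt37: str) -> bool:
    """Check the purine36-G37 sequence requirement described for TrmD."""
    return nt36.upper() in PURINES and nt37.upper() == "G"


def find_trmb_sites(rna: str) -> List[int]:
    """Return 0-based indices of the marked G in every AGGUC occurrence.

    Assumption: the asterisk in AGG*UC marks the G immediately before it,
    i.e. the third nucleotide of the five-nucleotide element.
    """
    return [m.start() + 2 for m in re.finditer("AGGUC", rna.upper())]


if __name__ == "__main__":
    # Toy inputs only; these are not real tRNA sequences.
    print(has_trmd_motif("A", "G"))
    print(find_trmb_sites("CCAGGUCGG"))
```

A sequence match of this kind is only a necessary condition in the cases described above; it cannot by itself predict whether a given RNA is a substrate.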

[Figure caption (fragment): ... Clouet-d'Orval et al. (2005), Ochi et al. (2013) with slight modifications.]

The target site for methylation is often embedded in the L-shaped tRNA structure. Consequently, in many (or almost all) cases, recognition of tRNA by tRNA methyltransferases seems to involve multiple steps (initial binding and induced-fit processes). Although it is very difficult to prepare intermediate complexes, we recently analyzed the initial binding and changes in structure of TrmH by stopped-flow pre-steady-state kinetic analysis (Ochi et al., 2010, 2013). TrmH binds to tRNA within 10 ms in the initial binding process, in which substrate and non-substrate (methylated) tRNAs are not distinguished. Methylated tRNA is subsequently excluded from the complex, before the induced-fit process occurs, owing to steric hindrance between the methyl groups in the tRNA and AdoMet. The advantage of this mechanism is that methylated tRNA does not severely inhibit the methyl-transfer reaction as a competitive inhibitor. Subsequently, in the induced-fit process, which takes more than 50 ms, G18 is recognized and its ribose is introduced into the catalytic pocket. During the induced-fit process, movement of Trp126 in motif 2 is observed (Ochi et al., 2013 and **Figure 4C**).
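The separation of time scales between the two steps can be pictured with a deliberately simplified sketch that treats initial binding and induced fit as two consecutive, irreversible first-order steps. The rate constants below are hypothetical values chosen only so that binding is essentially complete within about 10 ms and induced fit proceeds over tens of milliseconds; they are not the measured constants from the stopped-flow studies, and the exclusion of methylated tRNA is not modeled.

```python
import math

# Hypothetical rate constants (s^-1), chosen for illustration only.
K_BIND = 500.0  # initial binding: ~99% complete within 10 ms
K_FIT = 20.0    # induced fit: time constant of 50 ms


def populations(t, k_bind=K_BIND, k_fit=K_FIT):
    """Fractions (free, initial complex, induced-fit complex) at time t (s).

    Uses the analytic solution for free -> initial -> induced-fit with
    irreversible first-order steps; requires k_bind != k_fit.
    """
    if k_bind == k_fit:
        raise ValueError("k_bind and k_fit must differ for this formula")
    free = math.exp(-k_bind * t)
    initial = k_bind / (k_fit - k_bind) * (
        math.exp(-k_bind * t) - math.exp(-k_fit * t)
    )
    fitted = 1.0 - free - initial
    return free, initial, fitted


if __name__ == "__main__":
    for t_ms in (1, 10, 50, 200):
        free, initial, fitted = populations(t_ms / 1000.0)
        print(f"{t_ms:4d} ms  free={free:.3f}  initial={initial:.3f}  fitted={fitted:.3f}")
```

With these assumed constants, most of the enzyme sits in the initial complex at 10 ms and converts to the induced-fit complex only later, which is the qualitative behavior described above.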

Several tRNA methyltransferases act on multiple sites in tRNA. For example, archaeal TrmI acts on both A57 and A58 (Roovers et al., 2004; Guelorget et al., 2010). Similarly, *Aquifex aeolicus* Trm1 acts on both G26 and G27 (Awai et al., 2009). On the basis of biochemical studies, we determined that this eubacterial Trm1 recognizes the methylation sites (G26 and G27) from the T-arm (Awai et al., 2009, 2011) whereas archaeal Trm1 recognizes G26 from the D-stem and variable region (Constantinesco et al., 1999b). These Trm1 proteins share high sequence homology (Awai et al., 2009); however, comparison of the crystal structures revealed that the distribution of positive charges on the enzyme surface differs between archaeal (Ihsanawati et al., 2008) and eubacterial (Awai et al., 2011) Trm1. Thus, these studies show how difficult it is to predict target sites on the basis of amino acid sequences. Furthermore, in some cases, other subunits regulate the site specificity. For example, the methylation site recognized by Trm7 is determined by its partner subunit (Guy et al., 2012) and the site specificity of archaeal Trm4 changes in the presence of archease (Auxilien et al., 2007). Moreover, the m5C modifications in eukaryotic tRNA are regulated by the presence of an intron in the precursor tRNA (Motorin and Grosjean, 1999; Brzezicha et al., 2006; Auxilien et al., 2012). In addition, some 2′-*O*-methylated nucleosides in archaeal tRNA are introduced by the aFib, Nop5p and L7Ae complex with the BoxC/D guide RNA system (Clouet-d'Orval et al., 2001, 2005; Bortolin et al., 2003; Singh et al., 2004; Renalier et al., 2005; Ye et al., 2009; Joardar et al., 2011; Lin et al., 2011). In some cases, an intron in the precursor tRNA acts as the guide RNA (Clouet-d'Orval et al., 2001, 2005; Bortolin et al., 2003; Singh et al., 2004). This system is useful in minimizing the size of the genome. In the future, it is possible that considerable numbers of 2′-*O*-methylated modifications in archaeal tRNA might be identified as products of this system.

### **REGULATION OF THE DEGRADATION AND LOCALIZATION OF tRNA BY METHYLATED NUCLEOSIDES**

As shown in **Table 1**, modifications of the anticodon loop (positions 32–38) are involved directly in protein synthesis whereas other modifications affect the structure of the tRNA. Consequently, for a long time, it was thought that modifications outside the anticodon loop acted to stabilize tRNA structure and regulate the half-life of tRNAs. Indeed, we observed in the thermophilic eubacterium *Thermus thermophilus* that hypomodification at multiple sites in tRNA owing to disruption of one of the modification enzymes promotes the degradation of tRNAPhe and tRNALys at high temperatures (Tomikawa et al., 2010).

In the case of eukaryotes, tRNA methylations work coordinately as stabilizing factors and markers of maturation, and the degree of modification changes in response to various stresses. Hypomodified tRNAs are degraded aggressively. For example, in the *Saccharomyces cerevisiae trm4* (synthesizes m5C at multiple sites) and *trm8* (produces m7G46) double knock-out strain, the half-life of tRNAVal is shortened and the strain shows a growth defect (Alexandrov et al., 2006). Therefore, tRNA modifications stabilize tRNA structure coordinately and systems to degrade hypomodified tRNAs exist in eukaryotic cells (Alexandrov et al., 2006; Chernyakov et al., 2008; Phizicky and Hopper, 2010; D'Silva et al., 2011; Dewe et al., 2012). Furthermore, in *S. cerevisiae*, the m1A58 modification by the Trm6–Trm61 complex regulates both the degradation of initiator tRNAMet and its transport from the nucleus to the cytoplasm (Anderson et al., 1998, 2000; Kadaba et al., 2004). The m1A58 modification functions as a marker of maturation, and absence of the modification leads to degradation of initiator tRNAMet during transport. Thus, m1A58 is part of the RNA quality control system. Moreover, in the case of *S. cerevisiae*, splicing is performed in the cytoplasm (Takano et al., 2005) and precursor tRNAs are matured during repeated transport between the nucleus and cytoplasm (Ohira and Suzuki, 2011). Therefore, some tRNA modifications might act as markers of maturation at halfway checkpoints. In *Leishmania tarentolae*, a proportion of tRNAGlu and tRNAGln is transported from the cytoplasm to the mitochondria (Kaneko et al., 2003). In the cytoplasmic tRNA, U34 is modified to mcm5s2U34, whereas in the mitochondrial tRNA it is modified to mcm5Um34. These results suggest that the s2U34 modification may suppress transport from the cytoplasm to mitochondria. Given that both the s2U and Um modifications shift the equilibrium of ribose puckering to the C3′-endo form (Kawai et al., 1992), these modifications have a nearly equivalent stabilizing effect on the codon-anticodon interaction. The 5-methoxycarbonylmethyl (mcm) group restricts wobble base pairing (Takai and Yokoyama, 2003). Taken together, these findings suggest that a substantial number of methylated nucleosides contribute to RNA quality control systems and/or the regulation of tRNA localization, even though they were considered previously to have simply a structural role.

### **ADAPTATION OF PROTEIN SYNTHESIS TO ENVIRONMENTAL CHANGE THROUGH A NETWORK BETWEEN MODIFIED NUCLEOSIDES AND tRNA MODIFICATION ENZYMES**

### **tRNA MODIFICATIONS IN** *T. THERMOPHILUS*

*Thermus thermophilus* provides an example of a living organism that utilizes changes in the structural rigidity (flexibility) of tRNA through multiple nucleoside modifications to adapt protein synthesis to environmental changes. *Thermus thermophilus* is an extremely thermophilic eubacterium found in hot springs and can grow at a wide range of temperatures (50–83°C). Under natural conditions, the temperature of hot springs can change dramatically owing to several factors, for instance the overflow of hot spring water, snowfall, and the influx of river water. *Thermus thermophilus* can synthesize proteins in response to these temperature changes. Three distinct modifications (Gm18, m5s2U54, and m1A58) are found in *T. thermophilus* tRNA and the combination of these modifications increases the melting temperature of tRNA by nearly 10°C as compared with that of the unmodified transcript (Watanabe et al., 1976; Horie et al., 1985; Shigi et al., 2006; Tomikawa et al., 2010). Although these modifications are very important as structural factors in tRNA, they do not have an effect on translational fidelity below 65°C and the level of modification is very low in tRNA from cells cultured at 50°C (**Figure 5A**). This change in the extent of modification reflects the adaptation of protein synthesis to temperature change (Yokoyama et al., 1987). Transfer RNAPhe from cells cultured at 80°C efficiently synthesizes poly(U) at high temperatures (above 65°C). In contrast, tRNAPhe from cells cultured at 50°C, in which the levels of the three modifications are low, works efficiently at low temperatures (**Figure 5B**). Thus, the levels of three modified nucleosides, Gm18, m5s2U54, and m1A58, in tRNA control the elongation of translation *via* the flexibility of the tRNA. These findings were reported in 1987 (Yokoyama et al., 1987). However, at the beginning of the twenty-first century, the mechanisms of regulation of these modifications remained unknown.

### **SWITCHING OF NETWORK BETWEEN MODIFIED NUCLEOSIDES AND tRNA MODIFICATION ENZYMES**

Initially, we assumed that transcriptional and/or translational regulation of the tRNA modification enzymes was involved in the regulation of the three modifications. However, unexpectedly, we have observed that the phenomenon can be simply explained by the RNA recognition mechanisms of the tRNA modification enzymes (Shigi et al., 2002; Tomikawa et al., 2010; Ishida et al., 2011; Yamagami et al., 2012). Several common modifications (for example, m7G46 and ψ55) are found in *T. thermophilus* tRNA in addition to Gm18, m5s2U54, and m1A58. When the genes for the modification enzymes for m7G46 and ψ55 (*trmB* and *truB*, respectively) were disrupted individually, the levels of Gm18, m5s2U54 and m1A58 in tRNA were changed dramatically (Tomikawa et al., 2010; Ishida et al., 2011). Thus, modified nucleosides and tRNA modification enzymes form a network, and this network regulates the extent of modifications on the basis of temperature (**Figures 5C,D**).

[**Figure 5** caption (beginning missing): ... **modification enzymes observed in** *T. thermophilus***. (A)** The proportion of Gm18, m5s2U54, and m1A58 in tRNA (contents in tRNA fraction) increases with increasing culture temperature. **(B)** Transfer RNAPhe from cells cultured at 80°C can efficiently synthesize poly(U) at high temperatures. In contrast, at low temperatures, tRNAPhe from cells cultured at 50°C can work more efficiently than tRNA from cells cultured at 80°C. **(C)** Modifications of tRNA in *T. thermophilus* are depicted on the clover-leaf structure. Dotted lines show the tertiary base pairs. The levels of the m7G46 and ψ55 modifications are nearly 100% at a wide range of temperatures. The levels of modifications marked by yellow are regulated by the m7G46 and ψ55 modifications. **(D)** At temperatures greater than 65°C, the presence of m7G46 increases the rates of modification of Gm18 by TrmH, m1A58 by TrmI and m1G37 by TrmD. The [...] exchange complex that is required for the formation of m5s2U54. Therefore, at high temperatures, m7G46, m5U54, and m1A58 coordinately promote the formation of m5s2U54 and increase tRNA stability. In contrast, at low temperatures below 65°C, the ψ55 modification increases rigidity within the local structure of the tRNA as described in the main text. This network provides a mechanism by which extreme thermophilic eubacteria adapt to temperature changes. The network regulates the order of modifications in tRNA. This figure summarizes the experimental data in these publications: Yokoyama et al. (1987), Shigi et al. (2006), Tomikawa et al. (2010), Ishida et al. (2011), Yamagami et al. (2012).]

At high temperatures (above 65°C), m7G46 functions as a marker of precursor tRNA and increases the reaction rates of other modification enzymes. In contrast, at low temperatures, ψ55 confers local structural rigidity and slows down the rate of formation of other modifications around ψ55 (that is, Gm18, m5s2U54, and m1A58). This inhibitory effect weakens as the temperature increases and is not observed above 65°C. Thus, the m7G46 and ψ55 modifications work as an accelerator and a brake in the network, respectively. The advantage of this mechanism is that the network does not include any transcriptional or translational regulatory steps: protein synthesis is not necessary. Thus, the response of the network to environmental changes is very rapid. This is a typical strategy in eubacteria, where genome size is limited.
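The accelerator-and-brake logic can be written down as a toy model, given here only to make the qualitative rules explicit. The 65°C threshold comes from the text; everything else (the function name, the two-fold boost, the maximal 50% brake, the linear weakening of the brake between 50°C and 65°C) is a hypothetical assumption and not a fitted or measured quantity.

```python
THRESHOLD_C = 65.0  # temperature above which the brake is no longer observed


def relative_rate(temp_c, has_m7g46, has_psi55,
                  boost=2.0, max_brake=0.5, low_c=50.0):
    """Relative formation rate of a downstream modification (1.0 = baseline).

    Hypothetical rules:
    - m7G46 multiplies the rate by `boost` above the threshold temperature.
    - psi55 reduces the rate below the threshold; the reduction is
      `max_brake` at or below `low_c` and weakens linearly to zero
      at the threshold.
    """
    rate = 1.0
    if has_m7g46 and temp_c > THRESHOLD_C:
        rate *= boost
    if has_psi55 and temp_c < THRESHOLD_C:
        span = THRESHOLD_C - low_c
        fraction = (THRESHOLD_C - max(temp_c, low_c)) / span
        rate *= 1.0 - max_brake * fraction
    return rate


if __name__ == "__main__":
    for temp in (50, 60, 65, 80):
        print(temp, relative_rate(temp, has_m7g46=True, has_psi55=True))
```

Because the model contains no gene-expression step, the output depends only on the current temperature and on which modifications are present, mirroring the point that the network responds without new protein synthesis.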

Similar networks between modified nucleosides and tRNA modification enzymes have also been reported in mesophiles. For example, ms2i6A37 modification in *E. coli* tRNA is required for 2′-*O*-methylation by TrmL (Benítez-Páez et al., 2010), and the Cm32 and Gm34 modifications in *S. cerevisiae* tRNAPhe are required for the formation of yW37 from m1G37 (Guy et al., 2012). However, the network in *T. thermophilus* is distinct because the modifications are almost all in the three-dimensional core of the tRNA and the network responds to environmental changes.

### **GENETIC DISEASE AND tRNA METHYLATION**

Modifications of tRNA regulate protein synthesis. Consequently, if a disruption of tRNA modification is not lethal, it can directly cause a genetic disease. In fact, there are several reports concerning the relationship between genetic disease and tRNA modification (Yasukawa et al., 2001; Suzuki et al., 2002, 2011b; Freude et al., 2004; Kirino et al., 2005; Umeda et al., 2005; Wei et al., 2011; Towns and Begley, 2012; Igoillo-Esteve et al., 2013). In particular, the number of reports of a link between diabetes and tRNA modification is increasing, which suggests that an increase in the frequency of translation errors has an effect on energy metabolism. The severe disruption of energy metabolism often damages muscle and neuronal cells, which consume large amounts of energy. This perspective enables mitochondrial diseases that are caused by a problem with mitochondrial tRNA modification to be understood (Yasukawa et al., 2001; Suzuki et al., 2002, 2011b; Kirino et al., 2005; Umeda et al., 2005). Furthermore, several tRNA methyltransferases are fused to DNA repair enzymes, which means that these enzymes are related directly to DNA repair and carcinogenesis (Choudhury et al., 2007a,b; Shimada et al., 2009). Moreover, abnormal tRNA modifications have also been reported in cancers (Kuchino and Borek, 1978; Kuchino et al., 1981; Shindo-Okada et al., 1981). These might be caused by the rearrangement of chromosomes in cancer cells.

### **INFECTION, IMMUNITY, AND tRNA METHYLATIONS—tRNA THERAPY**

Among the tRNA modification enzymes, tRNA-guanine transglycosylase (Tgt), which is required for the production of Q34, and tRNA ψ55 synthase (TruB), which generates ψ55, are essential factors for infection by *Shigella flexneri* (Durand et al., 1994) and *Pseudomonas aeruginosa* (Saga et al., 1997), respectively. Similarly, we also found that tRNA (m7G46) methyltransferase is essential for infection by *Colletotrichum lagenarium*, an infectious fungus (Takano et al., 2006). Furthermore, tRNAs that contain mcm5U modifications are the target of *Kluyveromyces lactis* gamma-toxin (Lu et al., 2005) and *Pichia acaciae* killer toxin (Klassen et al., 2008). Moreover, given that retroviruses utilize host tRNA as the primer for reverse transcription, tRNA methylation and methyltransferases are involved in both reverse transcription and the packaging of virus particles. For example, human immunodeficiency virus (HIV; AIDS virus) utilizes the m1A58 modification in tRNALys3 as the terminator of reverse transcription (see reviews, Marquet, 1998; Saadatmand and Kleiman, 2012; Sleiman et al., 2012). Consequently, the regulation of tRNA modification and modification enzymes might be a powerful tool to control infectious organisms.

When an exogenous single-stranded RNA such as *Haemophilus influenzae* tRNA is present in humans, Toll-like receptor 7 (TLR7) forms a dimer structure and then activates the immune response systems (**Figure 6**). However, endogenous or *E. coli* tRNA does not stimulate TLR7. The mechanism of differentiation was clarified recently by two groups, who found that the Gm18 modification in *E. coli* tRNA suppresses immunostimulation *via* TLR7 (Gehrig et al., 2012; Jöckel et al., 2012). Thus, enterobacteria exploit the Gm18 modification in tRNA to avoid the host immune system. Furthermore, given that Gm18-modified tRNA acts as an antagonist of TLR7 (Jöckel et al., 2012), Gm18-modified tRNA might be an effective anti-inflammatory drug.

### **EVOLUTION OF MODIFICATIONS IN tRNA**

Finally, it is worthwhile discussing the evolution of modifications in tRNA. During the early period of chemical evolution (see reviews Cermakian and Cedergren, 1998; Joyce and Orgel, 2006), inosine could be used as a basic component of RNA, because it can be synthesized from adenosine non-enzymatically. Inosine seems to have been excluded after the appearance of genes because it changes the genetic information during the replication process. Simple methylated nucleosides such as m1G became essential when the reading frame of protein synthesis was separated into three-nucleotide units (Björk et al., 1989, 2001). Thus, several methylated nucleosides seem to have appeared during the chemical evolution period (Cermakian and Cedergren, 1998). After the appearance of the reading frame, the importance of the availability of methyl groups increased and it seems that the methionine codon was selected as the translation initiation codon.

It appears that complicated enzymes were not formed during the period of chemical evolution (Joyce and Orgel, 2006). The early enzymes might have been oligopeptides and might have included metals as the catalytic center, as is the case for deaminases (Carter, 1998; Schaub and Keller, 2002). It is possible that the codons were not fixed strictly as is observed in the universal code (Jukes, 1973; Cedergren et al., 1986; Osawa et al., 1992). However, it is likely that the most basic catalytic core of tRNA methyltransferases was established when cell-like organisms began to exchange their components and genes because the basic structure of tRNA methyltransferases is shared by all living organisms found today (**Figure 4** and **Table 2**). The structures of methyltransferases (Schubert et al., 2003) suggest that RNA methyltransferases, which were required for protein synthesis, evolved to yield DNA and protein methyltransferases many times during the evolution of life. The mechanisms to generate the complicated modified nucleotides that regulate the wobble base pair seem to have arisen after the origination of living organisms because they show considerable diversity and involve multistep reactions (**Table 1**).

The temperature of primordial Earth was higher than that of the Earth at present. Consequently, several nucleoside modifications in tRNA and rRNA would be necessary to stabilize the structure of the RNA (Motorin and Helm, 2010). However, it is likely that the network between modified nucleosides and tRNA modification enzymes that is observed in extreme thermophiles (**Figure 5D** and section Adaptation of Protein Synthesis to Environmental Change Through a Network Between Modified Nucleosides and tRNA Modification Enzymes) was established after the cooling of the Earth because it responds to low temperatures (Ishida et al., 2011). Obviously, the functions of modified nucleotides with respect to the RNA quality control system and regulation of cellular localization were acquired after the appearance of eukaryotes (see section Regulation of the Degradation and Localization of tRNA by Methylated Nucleosides).

Transfer RNA modifications are still evolving. The most powerful driving force is the existence of infectious organisms (see section Infection, Immunity, and tRNA Methylations—tRNA Therapy). Hosts need to distinguish endogenous RNA from exogenous RNA to prevent infection and infectious organisms need to avoid the host defense system to survive. Consequently, tRNA modifications and modification enzymes are still subject to evolution even today.

### **REFERENCES**


members, together with pseudouridine synthase Pus10, catalyze the formation of 1-methylpseudouridine at position 54 of tRNA. *RNA* 18, 421–433. doi: 10.1261/rna.030841.111


Chen, P., Crain, P. F., Näsvall, S. J., Pomerantz, S. C., and Björk, G. R. (2005). A "gain of function" mutation in a protein mediates production of novel modified nucleosides. *EMBO J*. 24, 1842–1851. doi: 10.1038/sj.emboj.7600666


at 2.6 Å resolution: a novel methyltransferase fold. *Proteins* 53, 326–328. doi: 10.1002/prot.10479


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 March 2014; accepted: 04 May 2014; published online: 23 May 2014. Citation: Hori H (2014) Methylated nucleosides in tRNA and tRNA methyltransferases. Front. Genet. 5:144. doi: 10.3389/fgene.2014.00144*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Hori. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Biosynthesis and functions of sulfur modifications in tRNA

### *Naoki Shigi\**

*Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, Tokyo, Japan*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Hiroyuki Hori, Ehime University, Japan*

*Yumi Nakai, Osaka Medical College, Japan*

### *\*Correspondence:*

*Naoki Shigi, Biomedical Research Institute, National Institute of Advanced Industrial Science and Technology, 2-4-7 Aomi, Koto-ku, Tokyo 135-0064, Japan*

*e-mail: naoki-shigi@aist.go.jp*

Sulfur is an essential element for a variety of cellular constituents in all living organisms. In tRNA molecules, there are many sulfur-containing nucleosides, such as the derivatives of 2-thiouridine (s2U), 4-thiouridine (s4U), 2-thiocytidine (s2C), and 2-methylthioadenosine (ms2A). Earlier studies established the functions of these modifications for accurate and efficient translation, including proper recognition of the codons in mRNA or stabilization of tRNA structure. In many cases, the biosynthesis of these sulfur modifications starts with cysteine desulfurases, which catalyze the generation of persulfide (an activated form of sulfur) from cysteine. Many sulfur-carrier proteins are responsible for delivering this activated sulfur to each biosynthesis pathway. Finally, specific "modification enzymes" activate target tRNAs and then incorporate sulfur atoms. Intriguingly, the biosynthesis of 2-thiouridine in all domains of life is functionally and evolutionarily related to the ubiquitin-like post-translational modification system of cellular proteins in eukaryotes. This review summarizes the recent characterization of the biosynthesis of sulfur modifications in tRNA and the novel roles of this modification in cellular functions in various model organisms, with a special emphasis on 2-thiouridine derivatives. Each biosynthesis pathway of sulfur-containing molecules is mutually modulated via sulfur trafficking, and 2-thiouridine and codon usage bias have been proposed to control the translation of specific genes.

**Keywords: post-transcriptional modification, post-translational modification, sulfur, tRNA, ubiquitin**

### **INTRODUCTION**

A characteristic structural and functional feature of RNA is post-transcriptional modification. More than 100 forms of naturally occurring chemical modification have been reported to date<sup>1,2</sup> (Cantara et al., 2011; Machnicka et al., 2013). The roles of modified nucleosides in tRNA are important and wide-ranging, and include critical roles in biogenesis, structural stability, codon recognition, maintenance of reading frame, and identification elements for the translation machinery (Björk, 1995; Curran, 1998).

The biosynthesis and functions of thionucleosides have been elucidated mainly by using *Escherichia coli*, *Salmonella enterica* serovar Typhimurium, and *Saccharomyces cerevisiae* as model organisms. *E. coli* tRNAs contain five thionucleosides: 4-thiouridine (s4U) at position 8, 2-thiocytidine (s2C) at position 32, 5-methylaminomethyl-2-thiouridine (mnm5s2U) or 5-carboxymethylaminomethyl-2-thiouridine (cmnm5s2U) at position 34, and 2-methylthio-*N*6-isopentenyladenosine (ms2i6A) at position 37 (**Figure 1**). The biosynthesis of these thionucleosides can be divided into two major groups depending on the involvement of iron–sulfur (Fe–S) cluster biosynthesis. The thiouridines s4U8 and (c)mnm5s2U34 are synthesized independently of Fe–S cluster formation, while s2C32 and ms2i6A37 synthesis is dependent upon Fe–S cluster formation, which suggests that Fe–S-containing proteins are present in the latter biosynthesis pathways (**Figure 2**; Lauhon et al., 2004; Leipuviene et al., 2004).

The first step of mobilization of sulfur in both pathways starts with the activation of the sulfur atom of cysteine by an enzyme, cysteine desulfurase, IscS (**Figure 2**). IscS forms an enzyme-bound persulfide (IscS–SSH) using pyridoxal-5'-phosphate (PLP) as a cofactor, and this activated sulfur is transferred to the next acceptor protein in each pathway. For the biosynthesis of s4U8 and (c)mnm5s2U34, the sulfur atom is transferred to specific sulfur-carrier proteins or a "modification enzyme" (Palenchar et al., 2000; Ikeuchi et al., 2006). The modification enzymes bind and activate target tRNA and catalyze the final step of sulfur transfer to tRNA. For the biosynthesis of s2C32 and ms2i6A37 (Fe–S cluster dependent pathway), the persulfide generated by IscS is transferred to the "scaffold" protein IscU, on which the Fe–S cluster is synthesized, and the Fe–S cluster is then incorporated into the modification enzymes for ms2i6A37 (and maybe also for s2C32; Pierrel et al., 2002, 2003, 2004; Jäger et al., 2004). In ms2i6A synthesis, it was reported that the sulfur atom in the Fe–S cluster is not the sulfur donor (Forouhar et al., 2013); therefore, the ultimate sulfur donor *in vivo* remains to be determined.

The sulfur atom activated by IscS is also used in molybdenum cofactor (Moco) and thiamin biosynthesis (**Figure 2**). These are sulfur-containing cofactors whose biosynthesis also includes unique sulfur-carrier proteins. Moco is incorporated into the active sites of many molybdoenzymes, including nitrate reductase, sulfite oxidase, and xanthine dehydrogenase (Schindelin et al., 2001). Moco contains a molybdenum atom and a pterin named molybdopterin (MPT). In MPT biosynthesis, two sulfur atoms are incorporated into precursor Z using a protein-thiocarboxylate as a sulfur donor. Thiamin is an essential cofactor for enzymes involved in carbohydrate and branched-chain amino acid metabolism and is synthesized from thiazole and pyrimidine moieties (Settembre et al., 2003). The sulfur atom of the thiazole ring is added in most bacteria by a system similar to the Moco biosynthetic machinery.

<sup>1</sup>http://mods.rna.albany.edu/

<sup>2</sup>http://modomics.genesilico.pl/

**FIGURE 1 | Sulfur-containing tRNA modifications. (A)** Secondary structure of tRNA and positions of thiolated nucleosides in tRNA. **(B)** Chemical structure of thiolated nucleosides in *E. coli*: s4U, 4-thiouridine; s2C, 2-thiocytidine; xm5s2U, 5-methyl-2-thiouridine derivatives; ms2i6A, 2-methylthio-*N*6-isopentenyladenosine. **(C)** Conformation of the xm5s2U: the C3'-endo form is preferred because of the steric hindrance of the 2-thio and 2'-OH groups.

In *S. cerevisiae*, there are two thiouridines in tRNA: 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U34) in cytosolic tRNAs and 5-carboxymethylaminomethyl-2-thiouridine (cmnm5s2U34) in mitochondrial tRNAs. The biosynthesis pathway of 2-thiouridine in cytosolic tRNA is Fe–S cluster dependent, while the mitochondrial pathway is independent of Fe–S cluster formation (Umeda et al., 2005; Nakai et al., 2007). The biosynthesis of s2U in cytosolic tRNA in eukaryotes utilizes a protein-thiocarboxylate as an intermediate sulfur donor. This pathway is functionally and evolutionarily related to the ubiquitin-like post-translational modification system of cellular proteins in eukaryotes, and a similar biosynthesis pathway in archaea was reported (Humbard et al., 2010; Miranda et al., 2011).

In some thermophiles, 5-methyl-2-thiouridine (m5s2U) [also called 2-thioribothymidine (s2T)] occurs at position 54 in the T-loop (**Figure 1**). Intriguingly, the biosynthesis pathway (Shigi et al., 2008) is similar to that of cytosolic s2U34 in eukaryotes, and ubiquitin-like post-translational modification of cellular proteins has recently been discovered also in the bacterial domain (Shigi, 2012).

In this review, I summarize recent advances with respect to the characterization of the biosynthesis mechanisms of sulfur modifications in tRNA, with special reference to 2-thiouridine derivatives. Up to the time of writing, two major pathways for the biosynthesis of 2-thiouridine have been reported. These pathways differ in terms of the types of modification enzyme and the ultimate sulfur donor (**Table 1**). The novel roles of 2-thiouridine in cellular functions have been revealed by new techniques including genome-wide analyses in some model organisms. Interestingly, each biosynthesis pathway to sulfur-containing molecules has been suggested to be mutually modulated via sulfur trafficking and translational control of specific genes by 2-thiouridine derivatives in tRNAs.

### **FUNCTIONAL PROPERTIES OF 2-THIOURIDINE BASED ON ITS STRUCTURE**

The 2-thiouridine modification at positions 34 and 54 plays critical roles in protein synthesis. Position 34 (the wobble base) of tRNAs for Glu, Gln, and Lys is universally modified to 5-methyl-2-thiouridine derivatives (xm5s2U; **Figure 1**): 5-methylaminomethyl-2-thiouridine (mnm5s2U) and 5-carboxymethylaminomethyl-2-thiouridine (cmnm5s2U) in bacterial tRNAs, 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U) in eukaryotic cytosolic tRNAs, cmnm5s2U in yeast mitochondrial tRNA, and 5-taurinomethyl-2-thiouridine (τm5s2U) in mammalian mitochondrial tRNAs (Suzuki, 2005).

**Table 1 | Two pathways of 2-thiouridine biosynthesis.**

The conformation of xm5s2U preferentially takes the C3'-*endo* form of ribose puckering, because of the steric effect of the bulky 2-thiocarbonyl group toward the 2'-hydroxyl group (**Figure 1C**; Yokoyama et al., 1985; Agris et al., 1992). The xm5s2U34 base pairs preferentially with purines and prevents misreading of near-cognate codons ending in pyrimidines (Agris et al., 1973; Yokoyama et al., 1985; Murphy et al., 2004; Durant et al., 2005; Johansson et al., 2008) and frameshifting (Urbonavicius et al., 2001; Atkins and Björk, 2009; Isak and Ryden-Aulin, 2009; Jäger et al., 2013). The 2-thio group of xm5s2U34 is required for efficient codon recognition on the ribosome (Ashraf et al., 1999; Vendeix et al., 2012; Rodriguez-Hernandez et al., 2013). In addition, the 2-thio group of cmnm5s2U34 in tRNAGlu acts as the identity element for specific recognition by glutaminyl-tRNA synthetase (Sylvers et al., 1993; Rodriguez-Hernandez et al., 2013). In humans, a defect in mitochondrial translation is induced by the lack of xm5s2U34 modification in mutant mitochondrial tRNALys from patients with myoclonus epilepsy with ragged-red fibers (MERRF; Yasukawa et al., 2000, 2001).
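The decoding preference described above can be illustrated with a minimal sketch. It assumes a deliberately simplified rule table: an unmodified U34 is treated as unrestricted at the third codon position (an assumption made only for contrast), whereas xm5s2U34 is restricted to purines, as stated in the text. The function name `codons_read` and the rule table are hypothetical and are not a model of ribosomal decoding kinetics.

```python
# Simplified wobble rules for anticodon position 34 (illustrative only).
WOBBLE_PARTNERS = {
    "U": set("ACGU"),      # assumption: unmodified U34 treated as unrestricted
    "xm5s2U": set("AG"),   # from the text: pairs preferentially with purines
}

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def codons_read(anticodon_35_36: str, base34: str) -> list[str]:
    """Return the codons (5'->3') readable by an anticodon whose positions
    35 and 36 are given 5'->3' and whose wobble position 34 is `base34`."""
    # Anticodon positions 34-35-36 pair antiparallel with codon positions 3-2-1.
    first = COMPLEMENT[anticodon_35_36[1]]   # codon position 1 pairs with position 36
    second = COMPLEMENT[anticodon_35_36[0]]  # codon position 2 pairs with position 35
    return sorted(first + second + third for third in WOBBLE_PARTNERS[base34])


if __name__ == "__main__":
    # tRNALys has the anticodon UUU, so positions 35 and 36 are "UU".
    print(codons_read("UU", "xm5s2U"))  # lysine codons ending in a purine
    print(codons_read("UU", "U"))       # the whole AAN box under the simplified rule
```

Under these assumptions the thiolated wobble base confines tRNALys to the two purine-ending lysine codons, which is the restriction the cited studies attribute to the modification.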

The 2-thio modification of m5s2U (s2T) at position 54 in the T-loop also plays an important role in protein synthesis in high-temperature environments. In thermophilic organisms such as *Thermus thermophilus* and *Pyrococcus furiosus*, almost all tRNA species are modified to m5U54 and m5s2U54 (Watanabe et al., 1974; Kowalak et al., 1994). The m5s2U54 is also found in the hyperthermophilic bacterium *Aquifex aeolicus* (Awai et al., 2009). The 2-thiolation content of m5U54 increases with cultivation temperature (Watanabe et al., 1976; Kowalak et al., 1994). As deletion strains of *T. thermophilus* lacking the 2-thio group of the m5s2U54 modification show a temperature-sensitive phenotype, this modification is suggested to be required for survival of the thermophile at high temperature (Shigi et al., 2006a). In the L-shaped tRNA structure, m5s2U54 is buried inside the tertiary core, forms a reverse Hoogsteen base pair with m1A58, and also stacks with G53 and ψ55. In addition, ψ55 and C56 form tertiary base pairs with G18 and G19 in the D-loop, respectively. The rigid conformation of m5s2U54 stabilizes the A-form helix of the D-loop–T-loop interaction, contributing to the thermostability of tRNAs in the thermophile (Watanabe et al., 1974; Horie et al., 1985).

### **MnmA PATHWAY FOR 2-THIOURIDINE SYNTHESIS IN BACTERIA AND EUKARYOTE ORGANELLES**

In *E. coli*, seven proteins are responsible for 2-thiolation of 5-methylaminomethyl-2-thiouridine (mnm5s2U) or 5-carboxymethylaminomethyl-2-thiouridine (cmnm5s2U) in the wobble base of tRNAGluUUC, tRNAGlnUUG, and tRNALysUUU: a cysteine desulfurase (IscS), a modification enzyme (MnmA), and three persulfide carriers (TusA, TusBCD complex, and TusE; **Figure 3A**; Kambampati and Lauhon, 2003; Ikeuchi et al., 2006; Numata et al., 2006a). The sulfur atom of L-cysteine is first activated by IscS cysteine desulfurase to form an enzyme-bound persulfide. The small sulfur-carrier proteins TusA, TusBCD, and TusE relay this sulfur atom via their active-site cysteine residues to MnmA. Tus proteins stimulate sulfur transfer from IscS to the catalytic cysteine of MnmA (Cys199). MnmA is an N-type ATP-pyrophosphatase that possesses the characteristic PP-motif (Bork and Koonin, 1994) and two conserved cysteine residues (Cys102 and Cys199). The reaction mechanism was well documented in a biochemical study based on the crystal structure of the MnmA-tRNA complex (Numata et al., 2006b). MnmA binds the anticodon arm and D-stem regions of tRNA and activates the C2-position of the uracil ring at position 34 as an acyl-adenylated intermediate (tRNA-OAMP). This is then followed by nucleophilic attack by the persulfide sulfur of MnmA-Cys199-SSH, which results in the completion of 2-thiouridine formation.
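The order of the sulfur relay described above can be written down as a small data structure. The sketch below is only a bookkeeping aid: the class `RelayStep`, the list `MNMA_RELAY`, and the helper `relay_path` are hypothetical names, and the notes paraphrase the text rather than adding mechanistic detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelayStep:
    donor: str
    acceptor: str
    note: str


# Order taken from the text: IscS -> TusA -> TusBCD -> TusE -> MnmA -> tRNA.
MNMA_RELAY = [
    RelayStep("L-cysteine", "IscS", "enzyme-bound persulfide formed on IscS"),
    RelayStep("IscS", "TusA", "relay via an active-site cysteine"),
    RelayStep("TusA", "TusBCD", "relay via an active-site cysteine"),
    RelayStep("TusBCD", "TusE", "relay via an active-site cysteine"),
    RelayStep("TusE", "MnmA", "persulfide on the catalytic Cys199"),
    RelayStep("MnmA", "tRNA U34", "attack on the adenylated C2 position"),
]


def relay_path(steps):
    """Check that each acceptor is the next donor and return the linear path."""
    for prev, nxt in zip(steps, steps[1:]):
        if prev.acceptor != nxt.donor:
            raise ValueError(f"broken relay between {prev.acceptor} and {nxt.donor}")
    return [steps[0].donor] + [step.acceptor for step in steps]


if __name__ == "__main__":
    print(" -> ".join(relay_path(MNMA_RELAY)))
```

Representing the relay as explicit donor and acceptor pairs makes it straightforward to encode organism-specific variants, for example a pathway lacking TusBCD and TusE, without changing the checking code.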

Genomic analysis of bacteria revealed that IscS, TusA, and MnmA are mostly conserved, whereas TusBCD and TusE are not found in many organisms (Kotera et al., 2010). This implies that a variation in the sulfur-transfer pathways from IscS to MnmA may exist. In eukaryotic mitochondria, NifS and Mtu1 (homologs of IscS and MnmA, respectively) are responsible for 2-thiolation of cmnm5s2U in yeast and 5-taurinomethyl-2-thiouridine (τm5s2U) in mammals (Umeda et al., 2005). In eukaryotic mitochondria, the intermediate sulfur carriers remain to be identified.

### **Ncs6/Urm1 PATHWAY FOR 2-THIOURIDINE SYNTHESIS IN THE CYTOSOL OF EUKARYOTES**

The Ncs6/Urm1 pathway is responsible for the 2-thiolation of 5-methoxycarbonylmethyl-2-thiouridine (mcm5s2U) in the wobble base of tRNAGluUUC, tRNAGlnUUG, and tRNALysUUU in the cytosol of eukaryotes (*S. cerevisiae*, *Schizosaccharomyces pombe*, *Caenorhabditis elegans*, *Homo sapiens*; Esberg et al., 2006; Björk et al., 2007; Dewez et al., 2008; Huang et al., 2008; Nakai et al., 2008; Schlieker et al., 2008; Schmitz et al., 2008; Leidel et al., 2009; Noma et al., 2009). A similar pathway was reported subsequently in plants (Leiber et al., 2010; Nakai et al., 2012). With the exception of the first step catalyzed by cysteine desulfurase Nfs1, the eukaryotic pathway is quite different from the MnmA pathway in bacteria described above. The function of Nfs1 is to donate the sulfur to the Fe–S cluster and 2-thiouridine. Formation of 2-thiouridine is dependent on Fe–S cluster biosynthesis (ISC) and cytosolic Fe–S cluster assembly (CIA) machineries in yeast (Nakai et al., 2007). This suggests that the Ncs6/Urm1 pathway depends on Fe–S protein(s), although at the time of writing it remains to be determined which protein(s) possess Fe–S cluster(s).

The Ncs6/Urm1 pathway is composed of at least six proteins including a cysteine desulfurase (Nfs1), a modification enzyme complex (Ncs6/Ncs2), two sulfur carriers (Urm1 and Tum1), and an activation enzyme for Urm1 (Uba4; **Figure 3B**). The gene names here are those of *S. cerevisiae*, and homologs of Ncs6/Ncs2 and Uba4 in humans are designated ATPBD3/CTU2 and MOCS3 (molybdenum cofactor synthesis 3), respectively. Tum1 and Uba4 contain rhodanese-like domains (RLDs) bearing conserved cysteine residues. Rhodanese is a widespread sulfur-carrier enzyme that catalyzes sulfur-transfer reactions in various metabolic pathways (Bordo and Bork, 2002). The conserved cysteine residues of RLDs in Tum1 and Uba4 are critical for 2-thiouridine formation *in vivo*. Tum1 probably directs sulfur flow to 2-thiouridine formation (Noma et al., 2009). The persulfide of Nfs1 is transferred to the RLD of Uba4 mainly via the RLD of Tum1.


Urm1 is a ubiquitin-related modifier and Uba4 is an E1-like Urm1-activating enzyme involved in protein urmylation (see the following; Furukawa et al., 2000). The carboxy-terminus of Urm1 is first activated as an acyl-adenylate intermediate (Urm1-COAMP) and then thiocarboxylated (Urm1-COSH) by a persulfide from the RLD of Uba4 (**Figure 3B**; Schlieker et al., 2008; Schmitz et al., 2008; Leidel et al., 2009; Noma et al., 2009). The activated thiocarboxylate may be utilized in subsequent reactions for 2-thiouridine formation, which is mediated by a heterodimer complex, Ncs6/Ncs2 (Dewez et al., 2008; Noma et al., 2009).


Ncs6 has the PP-motif and many CXXC motifs (**Figure 4A**; see TtuA/TtuB Pathway for 2-Thiouridine Synthesis in Thermophile tRNAs). Thus, 2-thiolation of mcm5s2U shares a pathway and chemical reactions with protein urmylation. Intriguingly, eukaryotic 2-thiouridine formation employs a thiocarboxylated intermediate as the active form of the sulfur atom, which is a mechanism distinct from bacterial sulfur-relay based on persulfide chemistry.

### **POST-TRANSLATIONAL MODIFICATION OF CELLULAR PROTEINS BY Urm1 IN EUKARYOTES**

Ubiquitin (Ub) and ubiquitin-like proteins (Ubls) are post-translational protein modifiers with important roles in proteolysis and the regulation of diverse processes in eukaryotes (Hochstrasser, 2009). The breakdown of the Ub/Ubl system is often associated with the development of various diseases. In the first step of conjugation to target proteins, the conserved C-terminal glycine of Ub/Ubl is acyl-adenylated by an activating enzyme (E1) and covalently linked to a cysteine residue of E1 to form an Ub/Ubl-E1 thioester intermediate. The activated Ub/Ubl is next transferred to a conjugating enzyme (E2). Finally, Ub/Ubl is attached to a lysine residue in the target protein by a ligase (E3; Hochstrasser, 2009).

Proteins homologous to eukaryotic Ub/Ubl and E1s exist in almost all members of bacteria and archaea (Iyer et al., 2006; Burroughs et al., 2009, 2012). Earlier work established that these bacterial proteins function in the biosynthesis of sulfur compounds such as molybdenum cofactor and thiamin (Kessler, 2006). Bacterial Ubls (MoaD and ThiS) are adenylated by cognate E1 homologs (MoeB and ThiF), subsequently bind activated sulfur via their C-termini to form thiocarboxylates, and finally act as sulfur donors (**Figure 2**; Pitterle and Rajagopalan, 1993; Taylor et al., 1998; Lauhon and Kambampati, 2000; Leimkühler et al., 2001; Zhang et al., 2010). These findings imply an evolutionary link between the eukaryotic Ub/Ubl system and the bacterial sulfur-transfer reaction (Iyer et al., 2006; Hochstrasser, 2009). Urm1 is a ubiquitin-related modifier and Uba4 is an E1-like enzyme involved in protein urmylation in eukaryotes (Furukawa et al., 2000; **Figure 3B**). A thioester conjugate (Furukawa et al., 2000) or an acyldisulfide conjugate (Van der Veen et al., 2011) of Urm1 and Uba4 has been proposed in this process. As Urm1 also functions as a sulfur donor for 2-thiouridine synthesis (see the preceding) and has close sequence and structural homology with bacterial Ubls (Xu et al., 2006), Urm1 is thought to be the most ancient Ubl possessing dual functions of protein modifier and sulfur carrier. The E2 and E3 enzymes for urmylation have not been identified at the time of writing.

to group II of the TtcA family. TtuA is composed of two Zn finger domains and a catalytic domain. TtcA is responsible for the formation of 2-thiocytidine (s2C32). **(B)** Structure of *P. horikoshii* TtuA. The three domains are colored the same as in **(A)**. The PP-motif is shown in red. Three Cys residues that are important for enzyme activity are shown with a stick representation. The target K135 for TtuB conjugation (K137 in *T. thermophilus*) is shown in a blue stick model.

Several targets of urmylation have been identified upon cell exposure to an oxidant (Van der Veen et al., 2011), although earlier reports only identified a peroxiredoxin Ahp1 (Goehring et al., 2003a,b). Among them, a modification enzyme complex ATPBD3/CTU2 and an E1-like MOCS3, both of which are required for 2-thiouridine biosynthesis, have been identified. The target residues in these proteins have not been identified and the roles of urmylation of these proteins are unknown at the time of writing; however, regulation of the activities of these enzymes would be possible.

### **TtuA/TtuB PATHWAY FOR 2-THIOURIDINE SYNTHESIS IN THERMOPHILE tRNAs**

Prior work from our group identified the TtuA/TtuB pathway for the biosynthesis of thiouridine (m5s2U) at position 54 in tRNAs from the thermophilic bacterium *T. thermophilus*. The TtuA/TtuB pathway includes cysteine desulfurases (IscS or SufS), a modification enzyme (TtuA), a small ubiquitin-like sulfur carrier (TtuB), and an activation enzyme for TtuB (TtuC; **Figure 3C**; Shigi et al., 2006a,b, 2008). Similar to the eukaryotic Ncs6/Urm1 pathway described above, the C-terminal Gly of TtuB is acyl-adenylated (TtuB-COAMP) by TtuC and is then thiocarboxylated (TtuB-COSH) by cysteine desulfurases (IscS or SufS). The sulfur atom of the thiocarboxylated TtuB is transferred to tRNA by TtuA. This step also requires ATP as a cofactor and TtuA possesses the PP-motif, suggesting that TtuA may activate the target uridine as an acyl-adenylate. The sulfur-transfer activity in the *in vitro* system requires the addition of cell-free extract and the activity was low, suggesting that there may still be additional factors required for TtuA-mediated sulfur transfer to tRNA.

TtuA and eukaryotic Ncs6 are homologous to each other, and belong to group II of the TtcA family, whose members are characterized by five conserved CXXC(H) motifs and the PP-motif (**Figure 4A**; Bork and Koonin, 1994; Jäger et al., 2004; Björk et al., 2007). TtcA, which catalyzes 2-thiocytidine (s2C) synthesis (Jäger et al., 2004), has only two CXXC motifs and the PP-motif, and therefore belongs to group I of the TtcA family. The PP-motif is used for ATP binding to adenylate the target nucleotide, and is widely distributed among ATP pyrophosphatases, including modification enzymes MnmA (see the preceding; Numata et al., 2006b), ThiI for 4-thiouridine (s4U) synthesis (Mueller and Palenchar, 1999; Palenchar et al., 2000), and TilS for tRNAIle2 lysidine synthesis (Ikeuchi et al., 2005).
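As a rough illustration of how CXXC(H) motifs can be located in a protein sequence, the following sketch scans for the pattern C-x-x-C, optionally allowing His in the last position. It is a naive pattern scan under stated assumptions: the function name `find_cxxc` and the toy sequence are hypothetical, and conserved motifs in the TtcA family are defined from sequence alignments, so a raw pattern count should not be read as a group I or group II assignment.

```python
import re


def find_cxxc(protein: str, allow_his: bool = True) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for every C-x-x-C match.

    With allow_his=True the last residue may also be His, i.e. CXXC(H).
    A lookahead is used so that overlapping matches are all reported.
    """
    last = "[CH]" if allow_his else "C"
    pattern = re.compile(rf"(?=(C..{last}))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein.upper())]


if __name__ == "__main__":
    toy = "MKCAACGGSHCTTHLLCPECA"  # hypothetical sequence, not a real protein
    for position, motif in find_cxxc(toy):
        print(position, motif)
```

Setting `allow_his=False` restricts the scan to strict CXXC matches, which is the form the text describes for the fourth and fifth motifs of TtuA.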

We determined the crystal structure of the TtuA homolog (PH0300) of the archaeon *P. horikoshii* (Nakagawa et al., 2013). The *P. horikoshii* genome has two TtuA/Ncs6-like ORFs: one (PH0300) seems to be an ortholog of *T. thermophilus* TtuA; and the other (PH1680) seems to be an ortholog of eukaryotic Ncs6, based on their sequence homology to TtuA and Ncs6. The *P. horikoshii* TtuA forms a homodimer, and each subunit contains a catalytic domain and unique N- and C-terminal zinc fingers (**Figure 4B**). The N-terminal zinc finger is made up of the first and second CXXC/H motifs, where the zinc atom is coordinated by three Cys residues and one His residue. The C-terminal zinc finger is made up of the fourth and fifth CXXC motifs, where the zinc atom is coordinated by four Cys residues.

Interestingly, the catalytic domain of TtuA has much higher structural similarity to that of another tRNA modification enzyme, TilS (tRNAIle2 lysidine synthetase), than to the other type of tRNA 2-thiolation enzyme, MnmA (**Figure 4**). However, three Cys residues (128, 131, 220 in PhTtuA) are clustered in the putative catalytic site, which are absent in TilS. Cys128 and Cys131 are in the third CXXC motif and Cys220 is also conserved. By *in vivo* mutational analysis of TtuA in *T. thermophilus* (Nakagawa et al., 2013), it became apparent that the three conserved cysteine residues and the putative ATP-binding residues are important for TtuA activity, implying a key role for these Cys residues in sulfur transfer from TtuB-COSH to tRNA. A positively charged surface that includes the catalytic site and the two zinc fingers is likely to provide the tRNA binding site. TtuA recognizes the T-loop (Shigi et al., 2002) and Ncs6/Ncs2 is predicted to recognize the anticodon arm. The recognition mechanisms of the different target sites on tRNA require clarification.

### **POST-TRANSLATIONAL MODIFICATION OF CELLULAR PROTEINS BY TtuB IN A BACTERIUM** *T. thermophilus*

Homology modeling suggests that TtuB possesses a Ub/β-grasp fold and TtuC has significant sequence homology with the adenylation domain of eukaryotic E1s (Shigi et al., 2008). These findings suggest that Ub/Ubl homologous conjugation systems also exist in bacteria. A series of *in vivo* analyses in *T. thermophilus* revealed that TtuB is covalently attached to target proteins most likely via its C-terminal glycine (Shigi, 2012). TtuC is required for conjugate formation, and TtuC and TtuA are targets for TtuB conjugation. Mass spectrometric analysis combined with *in vivo* mutational analysis revealed that lysine residues (K137/K226/K229) in TtuA are covalently modified by the C-terminal carboxylate of TtuB. K137 in *T. thermophilus* TtuA is situated just after the third CXXC motif. In the crystal structure of *P. horikoshii* TtuA, K137 (K135 in PhTtuA) is situated close to the catalytic center of this enzyme family (**Figure 4B**; Nakagawa et al., 2013). K137 in TtuA is conserved in related bacteria and archaea, such as *Aquifex*, *Pyrococcus*, *Thermococcus*, and *Methanocaldococcus*. On the other hand, this position is occupied by a conserved arginine in eukaryotic Ncs6. K226 and K229 in *T. thermophilus* TtuA are situated just after Cys222 (Cys220 in PhTtuA), although the regions near the two lysine residues were disordered in the PhTtuA structure. However, K226 and K229 are only conserved in *T. thermophilus* and a few other species, possibly implying species-specific functions of the conjugation. Intriguingly, a deletion mutant of a JAMM [JAB1/MPN/Mov34 metalloenzyme (Ambroggio et al., 2004)] ubiquitin isopeptidase homolog in *T. thermophilus* showed aberrant TtuB-conjugates of TtuC and TtuA, and a ∼50% decrease in the amount of thiouridine in tRNA (Shigi, 2012). These results support the hypothesis that thiouridine synthesis is regulated by TtuB conjugation.

### **THE CASE IN ARCHAEA**

Although the precise chemical structure of the archaeal counterpart remains unknown at the time of writing, the existence of modified uridines at the wobble position in tRNALys and tRNAGlu from *Haloferax volcanii* has been reported (Gupta, 1984). The existence of 2-thiouridine in tRNALys from *H. volcanii* was suggested by APM-electrophoresis (Miranda et al., 2011), a method that can detect sulfur modifications in RNAs (Igloi, 1988). Because the homologs of eukaryotic Ncs6 are widely distributed in archaeal genomes, these proteins may be involved in this modification (Kotera et al., 2010). Genetic analysis in *H. volcanii* shows that SAMP2 (small archaeal modifier protein 2) and E1-like protein UbaA are required for thiouridine formation in this organism (Miranda et al., 2011). These results indirectly suggest that SAMP2-COSH is formed and used as a sulfur donor for thiouridine formation in archaea, as observed previously in eukaryotes and bacteria. SAMPs are the first example of a ubiquitin-like protein modifier identified other than from a eukaryote (Humbard et al., 2010) and extensive studies show that the archaeal protein modification system resembles that of eukaryotes in many aspects (Van der Veen et al., 2011; Hepowit et al., 2012; Miranda et al., 2014). SAMP2 covalently conjugates to many target proteins including UbaA (a Uba4 homolog), HVO\_0580 (a Ncs6 homolog), and HVO\_0025 (a Tum1 homolog) (Humbard et al., 2010), implying that the SAMP2 modification also regulates the thiolation machinery.

4-Thiouridine (s4U) is a modified nucleotide of tRNA that is conserved from bacteria to archaea, where it functions as a sensor for near-UV irradiation (Favre et al., 1969; Carre et al., 1974; Ryals et al., 1982). Sulfur transfer in the biosynthesis of s4U has been extensively studied in bacteria (Kambampati and Lauhon, 2000; Mueller et al., 2001; Leipuviene et al., 2004). The persulfide of IscS is transferred to the RLD of ThiI, a PP-motif-containing modification enzyme. Recently, ThiI lacking an RLD has been characterized in methanogenic archaea (Liu et al., 2012). It has three cysteine residues (two of which come from a CXXC motif) in the putative catalytic site, and they are all required for persulfide intermediate formation. This may provide a hint about the catalytic mechanism of TtuA/Ncs6, because of the sequence and structural similarities between TtuA and ThiI (Waterman et al., 2006; Nakagawa et al., 2013).

### **BIOSYNTHESIS NETWORK OF SULFUR-CONTAINING MOLECULES**

The mobilization of sulfur in biosynthesis pathways of sulfur-containing compounds starts with the activation of the sulfur atom of cysteine by the cysteine desulfurase IscS. IscS forms an enzyme-bound persulfide and this activated sulfur is transferred to the next acceptor protein in each pathway, such as TusA (2-thiouridine in tRNA), ThiI (4-thiouridine in tRNA), IscU (Fe–S cluster), ThiS (thiamin), and MoaD (molybdenum cofactor; **Figure 2**). It is conceivable that each biosynthesis pathway of sulfur-containing molecules is mutually modulated via competition in sulfur trafficking. An interesting observation with respect to this was made during a study of lambda phage infection in *E. coli* (Maynard et al., 2010, 2012). During viral infection, the normal amount of modified uridine in tRNALysUUU guarantees a normal translation and frameshifting rate for production of the proper ratio of viral gpG and gpGT proteins (gpGT production needs programmed ribosomal frameshifting). Hypomodification of tRNALysUUU caused by deletion of Tus genes in the host cell leads to increased frameshifting in the translation of viral mRNA of G and T genes, which affects the ratio of viral gpG to gpGT. A lower gpG:gpGT ratio leads to decreased virion production. Another factor lowering infection is overexpression of IscU in the host cell. In this situation, higher sulfur flow from IscS to IscU conversely lowers the sulfur flow to Tus proteins, which leads to hypomodification of tRNALysUUU and abnormal frameshifting, which finally affects the viral infection rate. The competitive binding of TusA and IscU to IscS has been analyzed in detail, based on the structures of the complexes IscS/TusA and IscS/IscU (Shi et al., 2010; Marinoni et al., 2012).
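The competition for IscS-derived sulfur can be pictured with a toy partition model. The sketch below is an assumption-laden illustration rather than a description of measured kinetics: it supposes that each acceptor receives sulfur in proportion to its abundance multiplied by a relative affinity for IscS, and all numerical values are arbitrary. The function name `sulfur_share` is hypothetical.

```python
def sulfur_share(levels: dict[str, float], affinity: dict[str, float]) -> dict[str, float]:
    """Split the IscS sulfur flux among acceptors in proportion to
    (abundance x relative affinity). A toy model; all inputs are arbitrary units."""
    weights = {name: levels[name] * affinity.get(name, 1.0) for name in levels}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one acceptor must have a positive weight")
    return {name: weight / total for name, weight in weights.items()}


if __name__ == "__main__":
    equal_affinity = {"TusA": 1.0, "IscU": 1.0}       # assumption, not measured
    baseline = {"TusA": 1.0, "IscU": 1.0}             # arbitrary abundances
    iscu_overexpressed = {"TusA": 1.0, "IscU": 4.0}   # arbitrary fold change

    print(sulfur_share(baseline, equal_affinity))
    print(sulfur_share(iscu_overexpressed, equal_affinity))
```

In this simplified picture, raising the IscU level lowers the fraction of sulfur reaching TusA, which mirrors the direction of the effect reported for IscU overexpression; the magnitude depends entirely on the assumed parameters.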

The sharing of a factor downstream of cysteine desulfurase also occurs in this sulfur-transfer network. TusA was originally identified as a sulfur carrier for 2-thiouridine synthesis (Ikeuchi et al., 2006); however, it was later reported to be involved in Moco synthesis in *E. coli* (Dahl et al., 2013; Kozmin et al., 2013). It has been proposed that the deletion of TusA leads to the overproduction of Fe–S clusters, which finally affects the expression of several genes (Dahl et al., 2013). A study of the link between the MnmA pathway and cellular redox state has recently been reported (Nakayashiki et al., 2013). By screening for mutants sensitive to hydroxyurea (HU) in *E. coli*, the authors identified mutations in all genes in the MnmA pathway (*iscS, mnmA,* and *tusA-E* in 2-thiouridine synthesis at the wobble position). These mutations resulted in a more reduced state, which may have led to a change in the activity of ribonucleotide reductase, an enzyme inhibited by HU. It is possible that a change in sulfur flow to each pathway in mutants of the MnmA pathway led to the reduced cellular state, although the precise mechanism underlying this phenomenon still remains unknown.

There is another interesting case. In *E. coli*, bacterial Ubls (MoaD and ThiS) are adenylated by cognate E1 homologs (MoeB and ThiF), subsequently bind activated sulfur at their C-termini to form thiocarboxylate, and work as sulfur donors for Moco and thiamin biosynthesis, respectively (**Figure 2**; Kessler, 2006). In the *T. thermophilus* genome, there is only one E1 homolog, TtuC. The *ttuC* mutant cannot synthesize the 2-thio modification of m5s2U54; moreover, Moco and thiamin biosynthesis are also defective. TtuC and cysteine desulfurase can activate and thiocarboxylate TtuB, MoaD, and ThiS *in vitro*. Thus, TtuC is a common E1-like enzyme shared by the biosynthesis pathways of these three sulfur-containing compounds (Shigi et al., 2008). Similarly, in humans, MOCS2A (a MoaD homolog) and URM1 are adenylated by an E1-like MOCS3 (Uba4 homolog; Chowdhury et al., 2012). In archaea, E1-like UbaA is involved in the biosynthesis of Moco and thiouridine (Miranda et al., 2011). Interestingly, these E1 homologs are also involved in the post-translational modification of cellular proteins in all domains of life (Furukawa et al., 2000; Miranda et al., 2011; Shigi, 2012).

### **FUNCTION VIA TRANSLATIONAL CONTROL OF A SPECIFIC GROUP OF GENES IN EUKARYOTES**

The inactivation of the genes in the Ncs6/Urm1 pathway results in a pleiotropic phenotype that includes increased sensitivity to high temperature, oxidative stress, and rapamycin, a Tor-signaling inhibitor (Furukawa et al., 2000; Goehring et al., 2003a,b; Dewez et al., 2008). These phenotypes can be suppressed by overexpression of tRNALys, tRNAGlu, and tRNAGln, which normally possess mcm<sup>5</sup>s<sup>2</sup>U34 (Leidel et al., 2009). The phenotypes are similar to those of mutants of the elongator complex (Esberg et al., 2006), which is essential for modification of the C5 position of U34. These observations provide evidence that the mcm<sup>5</sup>s<sup>2</sup>U modification in tRNAs affects global translation, resulting in a pleiotropic phenotype.

In *S. cerevisiae*, pioneering work has addressed the function of tRNA modifications on gene expression. The mcm<sup>5</sup> modification of the wobble base of specific tRNAs modulates the expression of a DNA damage response mRNA, whose cognate codons are unusually overrepresented (Begley et al., 2007). A similar observation has been made concerning telomeric gene silencing (Chen et al., 2011). Proteome analysis in *S. pombe* showed that the amount of a specific group of proteins, including those involved in cell division, was decreased in mutants defective in the mcm<sup>5</sup>s<sup>2</sup>U modification (Bauer and Hermand, 2012; Bauer et al., 2012). The genes coding for these proteins have skewed lysine codon usage, such that the AAA codon was overrepresented compared to the AAG codon. The mcm<sup>5</sup>s<sup>2</sup>U modification in tRNALysUUU is necessary especially for the efficient translation of mRNAs enriched in the AAA codon. Among the genes affected by the mcm<sup>5</sup>s<sup>2</sup>U modification, a central regulator of mitosis and cytokinesis, Cdr2, was identified. The amount of Cdr2, a protein kinase, was recovered by overexpression of tRNALysUUU in the mutant defective in mcm<sup>5</sup>s<sup>2</sup>U biosynthesis. In addition, after substituting AAG codons for all AAA codons in *cdr2* mRNA, the Cdr2 protein amount was no longer affected by the mcm<sup>5</sup>s<sup>2</sup>U modification. This study provides an interesting example of how translational control of a specific group of mRNAs can be affected by tRNA modifications and codon usage. The translation of specific genes in *S. pombe* and *S. cerevisiae* has also been reported by two other groups to be controlled by the s<sup>2</sup>U modification and codon usage (Fernandez-Vazquez et al., 2013; Rezgui et al., 2013).
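The lysine codon skew described above can be illustrated with a short calculation. The following is a minimal sketch, not the analysis used in the cited studies: the function name `lysine_codon_usage` and the toy sequence are assumptions made for illustration, and the input is assumed to be an in-frame coding sequence.

```python
from collections import Counter


def lysine_codon_usage(cds):
    """Count the two lysine codons (AAA, AAG) in an in-frame coding sequence.

    Accepts DNA or RNA letters; any trailing partial codon is ignored.
    Returns the two counts and the fraction of lysine codons that are AAA
    (None when the sequence contains no lysine codon).
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in ("AAA", "AAG"))
    total = counts["AAA"] + counts["AAG"]
    fraction = counts["AAA"] / total if total else None
    return {"AAA": counts["AAA"], "AAG": counts["AAG"], "AAA_fraction": fraction}


# Hypothetical toy sequence: ATG AAA AAA GAA AAG AAA TAA
toy_cds = "ATGAAAAAAGAAAAGAAATAA"
print(lysine_codon_usage(toy_cds))
# {'AAA': 3, 'AAG': 1, 'AAA_fraction': 0.75}
```

A gene with a high `AAA_fraction` would, under the model described in the text, be expected to depend more strongly on fully modified tRNALysUUU; the threshold at which this matters is not something this sketch can determine.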

In *S. cerevisiae*, 2-thiouridine formation in tRNA-Lys, -Glu, and -Gln is actively downregulated when methionine and cysteine are limiting, which leads to an overall reduction in translational capacity and reduced growth, because Glu, Gln, and especially Lys codons are overrepresented in the genes essential for translation and growth (Laxman et al., 2013). In this case, tRNA thiolation works as a key effector to maintain amino acid homeostasis.

### **CONCLUSION AND FUTURE PERSPECTIVES**

In the course of the characterization of biosynthesis pathways of sulfur-containing modifications, a number of sulfur-carrier proteins have been identified. The sulfur-carrier proteins may have evolved to deliver reactive sulfur atoms to specific targets and avoid non-specific transfer of activated sulfur atoms, which could inactivate other biomolecules. At the time of writing, it is still unclear whether or not the differences in the chemical properties of persulfide and thiocarboxylate result in different biological outcomes.

Each biosynthesis pathway of sulfur-containing molecules is mutually modulated by sulfur trafficking, and translational control of specific genes by 2-thiouridine and codon usage bias is now proposed in some model organisms. It would be interesting to determine whether similar mechanisms exist in higher eukaryotes.

Unexpectedly, the characterization of the biosynthesis of 2-thiouridine has revealed molecular fossils, namely, ancient ubiquitin-like molecules, including Urm1 in eukaryotes, TtuB in bacteria, and SAMP2 in archaea. These proteins have two functions: they act as sulfur carriers for 2-thiouridine synthesis and as protein modifiers. Therefore, these proteins may be evolutionary intermediates between ancient sulfur-carrier proteins and protein modifiers. It is possible that an adenylated or thiocarboxylated intermediate, formed in the course of 2-thiouridine biosynthesis, was incidentally attached to adjacent proteins at some time in the past. By this post-translational modification, the activities of the attached proteins have probably changed. This was certainly the origin of the post-translational modification of proteins by these Ubls. The primitive function of these conjugates was probably self-regulation. Conjugates of Ubls and the modification enzymes Ncs6/Ncs2 and TtuA have already been detected, but the function of these post-translational modifications remains to be clarified. The resultant conjugates have come to be used as tags for the recognition and regulation of other proteins. By acquiring E2 and E3 enzymes, which can recognize target proteins precisely, the Ub system evolved considerably to become the sophisticated system it is today in eukaryotes.

### **ACKNOWLEDGMENTS**

I would like to thank all collaborators, including Dr. Tsutomu Suzuki (University of Tokyo), Dr. Kimitsuna Watanabe (Tokyo University of Pharmacy and Life Sciences), and Dr. Shigeyuki Yokoyama (RIKEN), as well as members of their laboratories, namely Ms. Yuriko Sakaguchi (University of Tokyo) and Dr. Hirofumi Nakagawa (University of Tokyo and RIKEN). This study was supported in part by KAKENHI Grant (24570173) of the Ministry of Education, Culture, Sports, Science, and Technology of Japan, the Naito Foundation, the Kurata Memorial Hitachi Science and Technology Foundation, and the Takeda Science Foundation.

### **REFERENCES**

Agris, P. F., Sierzputowska-Gracz, H., Smith, W., Malkiewicz, A., Sochacka, E., and Nawrot, B. (1992). Thiolation of uridine carbon-2 restricts the motional dynamics of the transfer RNA wobble position nucleoside. *J. Am. Chem. Soc.* 114, 2652–2656. doi: 10.1021/ja00033a044


maintain genome integrity. *Proc. Natl. Acad. Sci. U.S.A.* 105, 5459–5464. doi: 10.1073/pnas.0709404105


thiolation of cytidine in position 32 of tRNA from *Salmonella enterica* serovar Typhimurium. *J. Bacteriol.* 186, 750–757. doi: 10.1128/JB.186.3.750-757.2004


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 February 2014; accepted: 17 March 2014; published online: 02 April 2014. Citation: Shigi N (2014) Biosynthesis and functions of sulfur modifications in tRNA. Front. Genet. 5:67. doi: 10.3389/fgene.2014.00067*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Shigi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## tRNAs as regulators of biological processes

### *Medha Raina<sup>1,2</sup> and Michael Ibba<sup>1,2</sup> \**

*<sup>1</sup> Department of Microbiology, The Ohio State Biochemistry Program, The Ohio State University, Columbus, OH, USA <sup>2</sup> Center for RNA Biology, The Ohio State University, Columbus, OH, USA*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Tohru Yoshihisa, University of Hyogo, Japan Dieter Soll, Yale University, USA*

### *\*Correspondence:*

*Michael Ibba, Department of Microbiology, The Ohio State Biochemistry Program, The Ohio State University, 484 West 12th Avenue, Columbus, OH 43210-1292, USA e-mail: ibba.1@osu.edu*

Transfer RNAs (tRNAs) are best known for their role as adaptors during translation of the genetic code. Beyond their canonical role during protein biosynthesis, tRNAs also perform additional functions in both prokaryotes and eukaryotes, for example in regulating gene expression. Aminoacylated tRNAs have also been implicated as substrates for non-ribosomal peptide bond formation, post-translational protein labeling, modification of phospholipids in the cell membrane, and antibiotic biosynthesis. Most recently, tRNA fragments, or tRFs, have also been recognized to play regulatory roles. Here, we examine in more detail some of the new functions emerging for tRNA in a variety of cellular processes outside of protein synthesis.

**Keywords: amino acid, protein synthesis, regulation, transfer RNA, translation**

### **INTRODUCTION**

tRNAs are important players in the protein synthesis pathway, linking the genetic code with the amino acid sequence of proteins. tRNAs are composed of 73–90 nucleotides and have a characteristic cloverleaf secondary structure made up of the D loop, T loop, variable loop, and the anticodon loop. The tRNA further folds into an L-shaped tertiary structure through coaxial stacking of the T and D loops. To function as a substrate in protein synthesis, tRNA is charged with an amino acid by its cognate aminoacyl-tRNA synthetase. The aminoacyl-tRNA (aa-tRNA) thus formed serves as a substrate and participates in the chemistry of peptide bond formation in the process of protein synthesis. Besides this well-known canonical role during protein biosynthesis, tRNAs have been shown to perform additional functions such as acting as signaling molecules in the regulation of numerous metabolic and cellular processes in both prokaryotes and eukaryotes. Aminoacylated tRNAs have also been implicated as substrates for non-ribosomal peptide bond formation in the case of cell wall formation, protein labeling for degradation, modification of phospholipids in the cell membrane, and antibiotic biosynthesis. Due to their universally conserved L-shaped three-dimensional conformation, which is stabilized by extensive secondary and tertiary structural contacts and modifications, tRNA molecules are among the most stable RNAs in a cell and are considerably more robust than mRNAs (Gebetsberger and Polacek, 2013). For a long time, tRNA fragments were considered to be non-functional degradation intermediates, but they have now been recognized as major RNA species in human cells for which regulatory roles are beginning to be discovered. It was also recently shown that tRNAs can act as an effective scavenger of cytochrome *c*, consistent with a role in regulating apoptosis. With new functions still emerging for tRNA, in this review we examine some of the many "non-protein synthesis" roles of tRNA in the cell.

### **ROLES OF tRNA IN GENE EXPRESSION**

While aminoacyl-tRNAs have been implicated in many roles outside translation, several important functions of tRNA have been found not to require the aminoacyl form (aa-tRNA). Uncharged tRNAs have been shown to regulate global gene expression in response to changes in amino acid pools in the cell. Bacteria have adopted various strategies to adapt to external stresses, of which the most-studied global regulatory system is the stringent response. The stringent response is mediated through the production of the alarmones 5′-diphosphate 3′-diphosphate guanosine (ppGpp) and 5′-triphosphate 3′-diphosphate guanosine (pppGpp), which were first discovered by Cashel and Gallant (1969) in *Escherichia coli* as a response to amino acid starvation. *E. coli* uses two pathways for the synthesis of ppGpp, dependent on RelA and SpoT. RelA is a ribosome-associated (p)ppGpp synthase which senses the presence of uncharged tRNAs that accumulate at the ribosome A site as a result of amino acid limitation. The presence of the uncharged tRNA acts as an effector molecule, stalling protein synthesis and activating RelA which then synthesizes pppGpp and ppGpp by phosphorylation of GTP or GDP using ATP as the phosphate donor (Haseltine and Block, 1973; Sy and Lipmann, 1973). ppGpp was recently shown to bind at an interface of ω and β subunits of RNA polymerase, thereby acting as an allosteric effector to inhibit global gene transcription, while stimulating the expression of only a few genes related to the synthesis of amino acids (Ross et al., 2013). rRNA and tRNA synthesis are primarily inhibited, resulting in the global downregulation of bacterial metabolism. SpoT is a bifunctional (p)ppGpp synthase and hydrolase, which presumably regulates the (p)ppGpp level in response to nutrient deficiency. The mechanism by which SpoT senses starvation and synthesizes ppGpp is unclear (Magnusson et al., 2005).
Many other bacterial species including *Bacillus subtilis* contain only one RelA-SpoT homolog, designated as Rel, which possesses both (p)ppGpp synthase and hydrolase activities. RelA-SpoT homologs have also been detected in plants (Givens et al., 2004). Two *Bacillus subtilis* genes, yjbM and ywaC, were found to encode a novel (p)ppGpp synthase that corresponds to the synthase domain of RelA-SpoT family members while having a different mode of action (Nanamiya et al., 2008).

Another mechanism by which bacteria regulate gene expression using uncharged tRNA as the effector molecule has been demonstrated in *B. subtilis* and other Gram-positive bacteria. In these organisms, the expression of aminoacyl-tRNA synthetase genes and genes involved in amino acid biosynthesis and uptake is regulated by the T box control system (reviewed in Green et al., 2010). Regulation by the T box mechanism most commonly occurs at the level of transcription attenuation (Henkin and Yanofsky, 2002). The 5′ untranslated regions of regulated genes contain a 200–300 nt conserved sequence and structural element (a G + C-rich helix followed by a run of U residues) that serves as an intrinsic transcriptional terminator and can also participate in formation of an alternate, less stable antiterminator structure. During amino acid starvation, binding of a specific uncharged tRNA stabilizes the antiterminator and in doing so prevents formation of the terminator helix. The T box binds specific uncharged tRNA at two conserved sites: the anticodon of the tRNA interacts with the codon sequence of the specifier loop (SL) in the 5′-UTR, while the 3′ acceptor end interacts with the UGGN sequence found in the antiterminator bulge, thus stabilizing the structure of the antiterminator and preventing the formation of the competing terminator. RNA polymerase then continues past the terminator region and transcribes the full-length mRNA. The N residue in the antiterminator bulge varies with the corresponding position of the tRNA. Both charged and uncharged tRNAs can interact with the specifier sequence in the 5′-UTR; however, the presence of the amino acid at the 3′ end of a charged tRNA prevents the interaction of its 3′ end with the antiterminator bulge region and allows formation of the terminator hairpin, which results in premature termination of transcription (Grundy et al., 2005). Recently a unique mechanism of tRNA-dependent regulation at the transcriptional level was discovered.
Saad et al. (2013) found a two-codon T box riboswitch binding two tRNAs in *Clostridium acetobutylicum*. This T box regulates the operon for the essential tRNA-dependent transamidation pathway and harbors an SL with two potential overlapping codon positions for tRNAAsn and tRNAGlu. Both tRNAs can efficiently bind the SL *in vitro* and *in vivo*. This feature allows the riboswitch to sense two tRNAs and balance the biosynthesis of two amino acids (Saad et al., 2013). Regulation at the level of translation initiation has also been demonstrated for T box riboswitches in certain bacteria (Fuchs et al., 2006). Translationally regulated leader RNAs include an RNA element with the ability to sequester the Shine-Dalgarno (SD) sequence by pairing with a complementary anti-SD (ASD) sequence. Binding of uncharged tRNA stabilizes a structure analogous to the antiterminator that includes the ASD sequence, and formation of this alternate structure releases the SD sequence for binding of the 30S ribosomal subunit, thereby enabling translation of mRNA coding for proteins involved in amino acid biosynthesis (Green et al., 2010).
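The transcription attenuation logic described above reduces to two checks: whether the tRNA anticodon pairs with the specifier codon, and whether the tRNA 3′ end is free to contact the antiterminator bulge. The following is a deliberately simplified sketch of that decision for a single bound tRNA. The function names, the restriction to Watson-Crick pairing, and the example codon are illustrative assumptions and do not model the discriminator base, the two-codon riboswitch, or translational T boxes.

```python
# Watson-Crick complements for RNA bases (wobble pairing is ignored here).
_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_pairs_with(specifier_codon, anticodon):
    """Return True if the anticodon (5'->3') is the exact antiparallel
    Watson-Crick complement of the specifier codon (5'->3')."""
    codon = specifier_codon.upper().replace("T", "U")
    anti = anticodon.upper().replace("T", "U")
    expected = "".join(_COMPLEMENT[base] for base in reversed(codon))
    return anti == expected


def t_box_outcome(specifier_codon, anticodon, charged):
    """Simplified outcome for one tRNA binding a T box leader RNA.

    Readthrough requires both a matching anticodon and an uncharged 3' end;
    a charged or non-matching tRNA leaves the terminator free to form.
    """
    if not anticodon_pairs_with(specifier_codon, anticodon):
        return "termination"
    return "termination" if charged else "readthrough"


# Hypothetical example: a UAC specifier codon and a tRNA with anticodon GUA.
print(t_box_outcome("UAC", "GUA", charged=False))  # readthrough
print(t_box_outcome("UAC", "GUA", charged=True))   # termination
```

In the cell the outcome depends on the ratio of charged to uncharged tRNA rather than on a single binding event, so a more realistic model would weight these two outcomes by the relative abundance of each form.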

Uncharged tRNAs also function as regulators in eukaryotes. In amino-acid-starved yeast and mammalian cells, uncharged tRNA activates a protein kinase named Gcn2p whose regulatory sequences include the amino-terminal region, a pseudokinase domain, a protein kinase (PK) region, a histidyl-tRNA synthetase (HisRS)-related region, and the C-terminal dimerization and ribosome-binding sequences. The tRNA has been shown to bind to the HisRS-like regulatory domain, thereby activating Gcn2p, which in turn phosphorylates eIF2, a protein involved in binding GTP and Met-tRNA<sub>i</sub><sup>Met</sup> and forming the ternary complex required for translation initiation (Wek et al., 1995). The activated Gcn2p phosphorylates the α subunit of eIF2 at serine 51, lowering its activity and thereby reducing global protein synthesis. Gcn2p was shown to bind several types of uncharged tRNA with similar affinities but showed a reduced affinity for the charged form of a tRNA, implying that Gcn2p can discriminate between charged and uncharged forms of tRNA (Dong et al., 2000). It was recently proposed that in the inactive form of Gcn2p present in non-starvation conditions, association with the substrate eIF2 is prevented by binding of the HisRS-like domain and the C-terminal portion to the PK domain of Gcn2p, thereby sequestering the substrate-binding cleft. However, under starvation conditions, uncharged tRNA binds to Gcn2p at both the HisRS-like and C-terminal domains, producing conformational changes that release these two regions from their inhibitory interactions with the PK domain. This opens up the substrate-binding cleft in the PK domain and allows eIF2 binding and phosphorylation (Qiu et al., 2001).

It has been proposed that Gcn2p discriminates between charged and uncharged tRNA by a mechanism analogous to RelA activation in *E. coli*, in which uncharged tRNA is sensed at the decoding (A) site of translating ribosomes. The activation of Gcn2p by uncharged tRNA requires its association with the ribosome via its C-terminal region and also interactions between the N terminus of Gcn2p and the Gcn1p–Gcn20p protein complex, which is also associated with the ribosome. Gcn1p has been proposed to facilitate the eviction of uncharged tRNA from the A site and its transfer to the HisRS-like domain in Gcn2p for kinase activation, and the Gcn1p–Gcn20p complex has also been implicated in increasing the binding of uncharged tRNA to ribosomes. The importance of the Gcn1p–Gcn20p complex in Gcn2p activation was shown by the Hinnebusch group, who demonstrated that deletion of GCN1 blocks eIF2 phosphorylation by Gcn2p (Marton et al., 1993). Sensing of uncharged tRNA at the A site of the ribosome could explain how starvation for a single amino acid can activate Gcn2p, even though Gcn2p itself cannot discriminate between uncharged tRNA species in cells starved for only one amino acid (Marton et al., 1997; Garcia-Barrio et al., 2000; Sattlegger and Hinnebusch, 2000). In yeast, phosphorylation of eIF2 allows selected mRNAs such as *GCN4* to be translated. Elevated levels of Gcn4, which acts as a transcription factor, stimulate the expression of genes involved in amino acid biosynthesis (reviewed in Hinnebusch, 2005). In comparison to *S. cerevisiae*, which has a single eIF2α kinase, Gcn2p, mammalian cells have expanded this stress response pathway to include additional eIF2α kinases, which each respond to different environmental stresses.
Analogous to yeast, phosphorylation of mammalian eIF2α leads to a block in global translation, accompanied by induced translational expression of ATF4 and ATF5, transcription factors related to Gcn4p (Harding et al., 2000; Lu et al., 2004; Vattem and Wek, 2004; Zhou et al., 2008).

The above mechanisms demonstrate that under certain nutritional stresses, the aminoacylation levels of tRNAs change and the accumulated uncharged tRNAs participate in numerous biological pathways that regulate global gene expression levels, helping the organism to survive under adverse conditions.

### **AMINOACYL-tRNAs AS NON-RIBOSOMAL SUBSTRATES**

In recent years, the diverse roles of aa-tRNAs have received a great deal of attention. While much of the research has focused on the use of aa-tRNA by the ribosome for protein synthesis, a number of studies have uncovered roles for aa-tRNAs as substrates in other biochemical processes, such as cell wall formation, protein labeling for degradation, aminoacylation of phospholipids in the cell membrane, and antibiotic biosynthesis (**Figure 1**). In this section, we will briefly review some of these various processes that use aa-tRNAs as substrates.

### **AMINOACYL-tRNAs IN CELL WALL BIOGENESIS**

### *Aminoacyl-tRNA-dependent building of peptidoglycan bridges*

Peptidoglycans (PG) are structural components of bacterial cell walls that can both serve as a barrier to environmental challenges and provide a scaffold for the attachment of various proteins including virulence factors (Vollmer et al., 2008). Peptidoglycan is a polymer of β(1–4)-linked *N*-acetylglucosamine (GlcNAc) and *N*-acetylmuramic acid (MurNAc), with all lactyl groups of MurNAc substituted with stem peptides, typically composed of alternating D- and L-amino acids with an overall common structure of L-Ala-γ-D-Glu-X-D-Ala-D-Ala. The composition of the peptide varies among different bacteria: Gram-negative bacteria and Gram-positive bacilli have meso-diaminopimelic acid (DAP) as the third amino acid (DAP-type peptidoglycan), whereas most other Gram-positive bacteria (including Gram-positive cocci) have L-lysine as the third amino acid (Royet and Dziarski, 2007). The stem peptides from adjacent strands are often crosslinked, either directly or through short peptides, between the X position of one pentapeptide side chain and the D-Ala at the fourth position of another. The amino acids required for bridge formation are typically derived from aminoacylated-tRNA donor molecules and are transferred onto the pentapeptide by tRNA-dependent aminoacyl-ligases, which catalyze peptide bond formation by using aminoacyl-tRNAs and peptidoglycan precursors as donor and acceptor, respectively.

The peptidoglycan in *Streptococcus pneumoniae* contains a "stem peptide" composed of up to five amino acids, Ala-γ-D-Glu-Lys-D-Ala-D-Ala, with an L-Ala-L-Ala or an L-Ser-L-Ala dipeptide branch that is attached to the third L-Lys of the pentapeptide side chain. MurM is responsible for the addition of either L-Ala or L-Ser as the first amino acid of the cross-link and then MurN invariably adds L-Ala as the second amino acid (Filipe et al., 2000). In both cases, appropriately aminoacylated-tRNA species serve as the amino acid donors for the reaction (Lloyd et al., 2008), although MurM also efficiently accepts mischarged tRNA substrates (Shepherd, 2011; Shepherd and Ibba, 2013b). In *Enterococcus faecalis,* BppA1 and BppA2 add L-Ala-L-Ala dipeptide to the pentapeptide chain (Bouhss et al., 2002), while FemXAB from *Staphylococcus aureus* sequentially adds one (FemX) or two (FemA and FemB) glycines (Schneider et al., 2004). Lif and Epr in *Staphylococcus simulans* and *Staphylococcus capitis*, FemX in *Weissella viridescens* and FemX and VanK in *Streptomyces coelicolor* all catalyze similar reactions using aa-tRNAs as substrates (reviewed in Shepherd and Ibba, 2013a). How aa-tRNAs are diverted from protein synthesis and used as substrates by these enzymes remains somewhat unclear in most instances. In *S. aureus,* the mechanism of escape from the protein synthesis machinery could be explained by the observation that three out of the five tRNAGly isoacceptors encoded in the *S. aureus* genome have sequence identity elements consistent with weak binding to EF–Tu (Giannouli et al., 2009). These specific tRNA sequence elements include replacement of the strong EF–Tu binding pairs G49–U65 and G51–C63 with A49–U65 and A51–U63, respectively, in the T loop (Roy et al., 2007; Sanderson and Uhlenbeck, 2007). The three non-proteinogenic tRNAGly isoacceptors also show replacement of GG at positions 18 and 19 with either UU or CU.
Hence the isoacceptors with weak binding to EF–Tu could escape protein synthesis and thus allow *S. aureus* to maintain an adequate supply of Gly-tRNAGly for two essential processes: translation and cell wall modification (Shepherd and Ibba, 2013a).
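The sequence features listed above lend themselves to a simple lookup. The sketch below flags which of the described substitutions are present in a tRNA, given its bases at the relevant positions. The function name, the dictionary-based input format, and the example isoacceptors are hypothetical; the sketch only reports features and makes no quantitative prediction of EF–Tu affinity.

```python
# Substitutions described in the text as consistent with weak EF-Tu binding:
# G49-U65 replaced by A49-U65, and G51-C63 replaced by A51-U63.
WEAK_PAIRS = {(49, 65): ("A", "U"), (51, 63): ("A", "U")}

# GG at positions 18 and 19 replaced by either UU or CU.
WEAK_18_19 = ("UU", "CU")


def weak_ef_tu_features(bases):
    """List the weak-binding features present in a tRNA.

    `bases` maps a tRNA position number to its base, e.g. {49: "A", 65: "U"}.
    Positions missing from the mapping are treated as not matching.
    """
    found = []
    for (i, j), (base_i, base_j) in WEAK_PAIRS.items():
        if bases.get(i) == base_i and bases.get(j) == base_j:
            found.append(f"{base_i}{i}-{base_j}{j}")
    dinucleotide = bases.get(18, "") + bases.get(19, "")
    if dinucleotide in WEAK_18_19:
        found.append(f"{dinucleotide} at 18-19")
    return found


# Hypothetical isoacceptors carrying the variant and the canonical bases.
variant = {18: "U", 19: "U", 49: "A", 65: "U", 51: "A", 63: "U"}
canonical = {18: "G", 19: "G", 49: "G", 65: "U", 51: "G", 63: "C"}
print(weak_ef_tu_features(variant))    # ['A49-U65', 'A51-U63', 'UU at 18-19']
print(weak_ef_tu_features(canonical))  # []
```

Reporting the individual features rather than a single yes/no call keeps the sketch within what the text states, since the source does not say how many of these substitutions are needed for a tRNA to escape EF–Tu.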

The specificity of peptidoglycan-modifying enzymes with respect to amino acid and tRNA substrates was demonstrated in the FemX enzyme from *Weissella viridescens.* In *W. viridescens* the peptide bridge is made up of L-Ala-L-Ser or L-Ala-L-Ser-L-Ala. FemX initiates peptide bridge formation by transfer of the first L-Ala residue to the amino group of L-Lys found at the third position of the pentapeptide side chain. The enzymes involved in the subsequent transfer of the second position Ser and third position Ala residues have not yet been identified. FemX prefers to add L-Ala to the UDP-MurNAc pentapeptide because it reacts much less favorably with both L-Ser and the acceptor arm of tRNAGly. *In vitro* assays show that FemX turns over Ser-tRNASer and Gly-tRNAGly 17- and 38-fold less efficiently than Ala-tRNAAla, respectively. In the latter case, the penultimate base pair of tRNAAla, G2-C71, was identified as an essential identity element for FemX. This is typically replaced by C2-G71 in tRNAGly species (Fonvielle et al., 2009). L-Ala is preferred 110-fold over D-Ala, suggesting relatively weak specificity toward different stereoisomers. The exclusion of serine is due to steric hindrance at the FemXWv active site rather than poor recognition of the nucleotide sequence of tRNASer. Hence, Fem enzymes discriminate non-cognate aa-tRNAs on the basis of both the aminoacyl moiety and the sequence of the tRNA.

### *Aminoacyl-tRNA-dependent aminoacylation of membrane lipids*

Bacteria are frequently exposed to cationic antimicrobial peptides (CAMPs), for example eukaryotic host defense peptides or prokaryotic bacteriocins, whose cationic properties impart strong affinities to the negatively charged bacterial lipids phosphatidylglycerol (PG) and cardiolipin (CL). Many bacteria, among them several important human pathogens, achieve CAMP resistance using MprF proteins, a unique group of enzymes that aminoacylate anionic phospholipids with L-lysine or L-alanine, thereby introducing positive charges into the membrane surface and reducing the affinity for CAMPs (Ernst and Peschel, 2011). MprF was first identified when its inactivation rendered an *S. aureus* transposon mutant susceptible to a wide range of CAMPs, leading to the name "multiple peptide resistance factor" (MprF; Peschel et al., 2001). MprFs can use lysyl or alanyl groups derived from aminoacyl-tRNAs for modification of PG (Roy and Ibba, 2008). MprF proteins are integral membrane proteins made up of a C-terminal, hydrophilic, cytoplasmic domain responsible for the transfer of amino acid onto PG, and an N-terminal transmembrane hydrophobic domain that flips newly synthesized Lys-PG to the membrane outer leaflet (Ernst et al., 2009). MprF homologs can be found in most bacterial phyla and are abundant in firmicutes, actinobacteria, and proteobacteria with the exception of enterobacteria. Some archaea also harbor genes for MprF, probably resulting from lateral gene transfer events (Roy and Ibba, 2009). MprF homologs exhibit differential specificity for the aa-tRNA substrate they use to modify PG, resulting in a broader classification of these enzymes as aminoacylphosphatidylglycerol synthases (aaPGS; Klein et al., 2009; Dare and Ibba, 2012). For example, the MprFs in *S. aureus* and *P. aeruginosa* only synthesize Lys-PG or Ala-PG, respectively (Staubitz et al., 2004; Klein et al., 2009). In contrast, *Enterococcus faecium* MprF2 exhibits rather relaxed specificity for the donor substrate and produces both Ala-PG and Lys-PG along with small amounts of Arg-PG (Roy et al., 2009). *Listeria monocytogenes* MprF is less strict in its specificity for the acceptor substrate and generates both Lys-PG and Lys-CL (Thedieck et al., 2006; Dare et al., 2014).
Based on the ability of MprF1 to efficiently recognize tRNAAla, tRNAPro, and a minihelixAla and recognition of the tRNALys species from both *Borrelia burgdorferi* and humans, which share less than 50% sequence identity, it was proposed that the specificity of MprF arises from direct recognition of the aminoacyl moiety of aa-tRNA (Roy and Ibba, 2008). The mechanism utilized by MprF and other similar enzymes raises the question of how aa-tRNA donor substrates are directed into membrane lipid modification and away from protein synthesis. Determination of the *K*<sub>D</sub>s of Lys-tRNA for EF–Tu and for MprF suggested that the two proteins have similar affinities for tRNA under physiological conditions (Roy and Ibba, 2008). Comparison of the sites in tRNA recognized by MprF and EF–Tu would give a better understanding of how aa-tRNAs are partitioned between translation and membrane lipid modification pathways.

### **ROLE OF AA-tRNA IN ANTIBIOTIC BIOGENESIS**

In addition to having essential roles in protein synthesis and non-ribosomal peptide bond formation, aminoacyl-tRNAs are also used in pathways where the donated amino acid moiety undergoes transformation into a significantly different compound. These pathways involve different amino acid-tRNA pairs and a variety of acceptor molecules (Banerjee et al., 2010). Examples of aa-tRNA-dependent addition of amino acids in antibiotic biogenesis, which have been reviewed in detail previously, include valanimycin, pacidamycin, and cyclodipeptide synthesis (Shepherd and Ibba, 2013a).

Valanimycin is a potent antitumor and antibacterial azoxy compound first isolated from *Streptomyces viridifaciens* by Yamato et al. (1986). A gene cluster has been identified that contains 14 genes involved in the biosynthesis of valanimycin (Garg et al., 2002). The functions of the products of eight of these genes have now been established. Valanimycin is derived from L-Val and L-Ser via an isobutylhydroxylamine intermediate. VlmD, VlmH, and VlmR catalyze the conversion of valine into isobutylhydroxylamine, while VlmL catalyzes the formation of L-seryl-tRNA from L-serine. VlmA, which is a homolog of the housekeeping SerRS, catalyzes the transfer of L-serine from L-seryl-tRNA to isobutylhydroxylamine, to produce *O*-(L-seryl)-isobutylhydroxylamine, while VlmJ and VlmK catalyze the phosphorylation and subsequent dehydration of the biosynthetic intermediate valanimycin hydrate to form valanimycin (Garg and Parry, 2010). The mechanism by which Ser-tRNASer is directed away from translation into the valanimycin pathway, and the identity elements of tRNASer that help in recognition by VlmA and VlmL, are still unknown.

Other examples of antibiotics derived from aa-tRNAs are the cyclodipeptides (CDP), a large group of secondary metabolites with a notable range of clinical activities (Rodriguez and Carrasco, 1992; Prasad, 1995; Magyar et al., 1996; Waring and Beaver, 1996; Kanoh et al., 1999; Strom et al., 2002; Cain et al., 2003; Kanzaki et al., 2004; Jia et al., 2005; Kohn and Widger, 2005; Musetti et al., 2007; Minelli et al., 2012). It was originally proposed that formation of the CDPs was catalyzed by nonribosomal peptide synthetases, which do not use aa-tRNAs as substrates. However, subsequent characterization of synthesis of the CDP albonoursin in *Streptomyces noursei* identified the tRNA-dependent CDP synthase AlbC (Lautru et al., 2002). AlbC synthesizes the albonoursin precursor cyclo (L-Phe-L-Leu) from aminoacylated tRNAs in an ATP-independent reaction (Lautru et al., 2002; Gondry et al., 2009). CDP synthase products identified to date include cyclo(L-Leu-L-Leu) (cLL), cyclo(L-Phe-L-Leu) (cFL), cyclo(L-Tyr-L-Tyr) (cYY), and cyclo(L-Trp-L-Xaa) (cWX), all of which are intermediates in antibiotic synthesis (Belin et al., 2012). CDP synthases use their two aa-tRNA substrates in a sequential ping-pong mechanism, with a similar first catalytic step: the binding of the first aa-tRNA and subsequent transfer of its aminoacyl moiety to the conserved serine residue of the enzyme pocket (e.g., Ser37 in the AlbC enzyme; Sauguet et al., 2011). The mechanism of addition of the second amino acid remains unclear, as do the specificity determinants for CDP synthases. Recently, similarities between the predicted secondary structure for PacB, a protein involved in the biosynthesis of the antibiotic pacidamycin, and structures of two Fem transferases led to the characterization of PacB as an alanyl-tRNA-dependent transferase (Zhang et al., 2011). 
Pacidamycins are a family of uridyl tetra/pentapeptide antibiotics produced by *Streptomyces coeruleorubidus* that exert antipseudomonal activity by inhibiting the translocase MraY during bacterial cell wall assembly. Analogous to the activity of CDP synthases, PacB hijacks aa-tRNAs and transfers L-Ala from aminoacyl-tRNA donors to the N-terminal m-Tyr2 residue of the growing PacH-anchored antibiotic scaffold (Zhang et al., 2010).

### **tRNA-DEPENDENT ADDITION OF AMINO ACIDS TO THE AMINO-TERMINUS OF PROTEINS**

Protein degradation plays an important role in maintaining cellular physiology and in regulation of various cellular processes such as cell growth, differentiation and apoptosis by removing damaged polypeptides and regulatory proteins in a timely manner. As compared to cellular compartments like lysosomes and vacuoles where proteases are involved in non-specific degradation of proteins, protein degradation in the cytosol of prokaryotes and eukaryotes is often strictly targeted to protect cellular proteins from unwanted degradation. One means to achieve specificity involves the aa-tRNA transferases, which recognize a secondary destabilizing residue (pro-N degrons) at the N-terminus of a target peptide and utilize an aminoacyl-tRNA to transfer a primary destabilizing amino acid (N-degron) to the N-terminal residue, making the protein a target for the cellular degradation machinery (N-recognins; Mogk et al., 2007). This specificity in protein degradation was discovered by Bachmair et al. (1986) when they found that different genetic constructs of β-galactosidase proteins from *E. coli* exhibited very different half-lives when produced in *Saccharomyces cerevisiae*, ranging from more than 20 h to less than 3 min, depending on the identity of their N-terminal amino acid [the N-end rule (Bachmair et al., 1986)]. The N-end rule relates the identity of the N-terminal residue of a protein to its *in vivo* half-life (Mogk et al., 2007) and has been shown to function in bacteria (Tobias et al., 1991), fungi (Bachmair et al., 1986), plants (Potuschak et al., 1998) and mammals (Gonda et al., 1989). In eukaryotes an N-terminal Arg residue is the preferred N-degron and acts as a target for ubiquitin conjugation and subsequent degradation by the eukaryotic proteasome (Tasaki et al., 2012). 
The degron is generated by the *ATE1* gene product arginyl (*R*)-transferase, which transfers Arg from Arg-tRNA to the N-terminal α-amino group of oxidized cysteine, Asp, or Glu, which constitute secondary destabilizing residues (pro-N-degrons; Rai and Kashina, 2005; Graciet et al., 2006). In prokaryotes, Leu and Phe act as the primary destabilizing N-terminal residues (N-degrons) and can be generated by two classes of aa-transferases, leucyl/phenylalanyl (L/F)-transferase encoded by the *Aat* gene and leucyl-transferase encoded by *Bpt*. The L/F-transferase attaches a primary destabilizing residue of either Leu or Phe to the secondary destabilizing residues Lys, Met, and Arg (Shrader et al., 1993), whereas *Bpt*-encoded L-transferase attaches Leu to the secondary destabilizing residues Asp and Glu (Graciet et al., 2006). The Leu/Phe N-degron acts as a target for ClpS, which transfers the protein to ClpAP for subsequent degradation (Mogk et al., 2007). The question that next arises is how the aa-tRNA transferases achieve specificity in binding aa-tRNAs. The crystal structure of leucyl/phenylalanyl-tRNA-protein transferase and its complex with an aminoacyl-tRNA analog, solved by Suto et al. (2006), revealed that the side chain of Leu or Phe is accommodated in a highly hydrophobic pocket, with a shape and size suitable for hydrophobic amino-acid residues lacking a branched β-carbon, such as leucine and phenylalanine. The adenosine group of the 3′ end of tRNA is recognized largely through π–π stacking with conserved Trp residues. However, L/F-transferases achieve specificity for aa-tRNAs through specific interaction with the aminoacyl moiety and not the tRNA, and only the presentation of the specific aminoacyl moiety by a single-stranded RNA region is required for recognition (Abramochkin and Shrader, 1996). The activity of L/F-transferases is reduced in the presence of an excess of EF-Tu, suggesting that L/F-transferase and EF-Tu compete for binding to aa-tRNA.
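The residue assignments above can be summarized as a small lookup table. The following is a minimal sketch that only restates the enzyme and residue pairings given in the text; the names `RULES` and `n_degron_options`, and the label `Cys(ox)` for oxidized cysteine, are hypothetical and chosen for illustration.

```python
from typing import Dict, List, Tuple

# Secondary destabilizing residue -> primary destabilizing residue(s) that the
# named aa-tRNA transferase can add, as listed in the text above.
RULES: Dict[str, Dict[str, Dict[str, List[str]]]] = {
    "eukaryote": {
        "ATE1 R-transferase": {"Cys(ox)": ["Arg"], "Asp": ["Arg"], "Glu": ["Arg"]},
    },
    "prokaryote": {
        "Aat L/F-transferase": {r: ["Leu", "Phe"] for r in ("Lys", "Met", "Arg")},
        "Bpt L-transferase": {"Asp": ["Leu"], "Glu": ["Leu"]},
    },
}


def n_degron_options(domain: str, n_terminal: str) -> List[Tuple[str, str]]:
    """Return (enzyme, added residue) pairs for a given N-terminal residue."""
    hits = []
    for enzyme, table in RULES[domain].items():
        for added in table.get(n_terminal, []):
            hits.append((enzyme, added))
    return hits


print(n_degron_options("prokaryote", "Asp"))
print(n_degron_options("eukaryote", "Glu"))
```

An N-terminal residue that is not a listed secondary destabilizing residue simply returns an empty list, which mirrors the idea that such proteins are not substrates for these transferases.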

### **tRNA-DERIVED FRAGMENTS**

Small non-coding RNA (sncRNA) molecules are major contributors to regulatory networks that control gene expression, and significant attention has been directed toward identifying them and studying their biological functions. sncRNA was first discovered in 1993 in *Caenorhabditis elegans*, and since then a large number of sncRNAs have been identified. sncRNAs are 16–35 nucleotides (nts) long and are classified into different groups such as microRNA (miRNA), small-interfering RNA (siRNA), piwi-interacting RNA, and small nucleolar RNA (snoRNA). Among them, miRNA and siRNA are the most extensively studied, and both suppress gene expression by binding to target mRNAs. The recent development of high-throughput sequencing technology has improved the identification of other types of small RNAs, such as tRNA-derived RNA fragments (tRFs), which have been identified by several research groups (Lee et al., 2009). There is increasing evidence that these are not by-products of random degradation, but rather functional molecules that can regulate translation and gene expression. The production of tRNA fragments and their emerging roles in the cell are discussed below (**Figure 2**).

### **PRODUCTION OF tRNA FRAGMENTS**

### *tRNA halves*

tRNA halves are composed of 30–35 nucleotides derived from either the 5′ or 3′ part of full-length, mature tRNA. These tRNA halves are produced by cleavage in the anticodon loop under nutritional, biological, physicochemical, or oxidative stress (Thompson et al., 2008; Dhahbi et al., 2013; Nowacka et al., 2013). In mammalian cells, tRNA halves are generated during stress conditions by the action of the nuclease angiogenin, a member of the RNase A family (Fu et al., 2009), whereas in yeast Rny1p, a member of the RNase T2 family, is responsible for tRNA half production. Apart from their roles as nucleases, both angiogenin and Rny1 act as sensors of cellular damage and can promote cell death and inhibit tumor formation (Thompson and Parker, 2009; Gebetsberger and Polacek, 2013). Under normal conditions, yeast Rny1 is usually localized in the vacuole (Thompson and Parker, 2009), while angiogenin is secreted to the plasma, sequestered in the nucleolus, or bound to its inhibitor RNH1 (Yamasaki et al., 2009; Saikia et al., 2012), and is released into the cytoplasm under certain stress conditions. tRNA halves have also been identified in bacteria, archaea, and plants. In bacteria, tRNA anticodon nucleases such as PrrC, colicin D, and colicin E5 have been shown to cleave specific subsets of tRNAs (reviewed in Kaufmann, 2000; Masaki and Ogawa, 2002).

**FIGURE 2 |** [Caption only partially recovered; figure not reproduced.] [...] processed by RNase P, RNase Z, the splicing endonuclease and CCA-adding enzyme to form mature tRNA in the nucleus. Processing of both the pre-tRNA and mature tRNA can give rise to small RNA. The figure shows [...] and 5′ leader exon tRF) production from tRNA. The dashed lines and question marks indicate mechanisms of formation or transport of these tRFs that are not clear.

### *tRNA-derived fragments*

tRNA-derived fragments (tRFs) are shorter than tRNA halves, ranging between 13 and 20 nt in size. They have been identified in all domains of life. Four types of tRFs are known, classified by the part of the mature tRNA or pre-tRNA from which they are derived: 5′ tRFs, 3′ CCA tRFs, 3′ U tRFs, and 5′ leader-exon tRFs. 5′ tRFs are derived from the 5′ end of the tRNA and can be generated at any point of tRNA processing, provided the 5′ leader sequence has been removed by RNase P; they are formed by a cleavage in the D loop. In mammalian cells, the biogenesis of 5′ tRFs is carried out by Dicer (Cole et al., 2009). However, Dicer-independent generation of 5′ tRFs takes place in *Schizosaccharomyces pombe*; the difference in length of the 5′ tRFs generated in these two organisms (19 nt long in mammals and 23 nt long in yeast) suggests that a protein other than Dicer is responsible for their production in yeast (Buhler et al., 2008). 3′ CCA tRFs are produced from the 3′ ends of mature tRNA by cleavage at the T loop and carry the trinucleotide CCA at the acceptor stem. Dicer has been implicated in the generation of the 3′ end fragment (Maute et al., 2013), although angiogenin and other RNase A members have also been proposed to function in Dicer-independent processing (Li et al., 2012; Gebetsberger and Polacek, 2013). 3′ U tRFs are cleaved from the 3′ end of tRNA precursors by RNase Z, and their biogenesis is normally Dicer independent. They commonly start directly after the 3′ end of mature tRNAs and end in a stretch of U residues produced by RNA polymerase III run-off (Lee et al., 2009; Haussecker et al., 2010). One 3′ U tRF is produced in an RNase Z-independent manner by the action of Dicer on the predicted bulged hairpin structure of the pre-tRNA (Babiarz et al., 2008).
The mechanism of formation of 5′ leader-exon tRFs is not known; however, they have been identified in CLP1 mutant cells, possibly arising due to aberrant splicing. CLP1 is an RNA kinase and is a component of the mRNA 3′ end cleavage and polyadenylation machinery in mammals (Hanada et al., 2013).
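The four tRF classes described above can be collected into a simple record structure. This is an illustrative sketch only: the class `TrfClass`, the tuple `TRF_CLASSES`, and the helper `classes_involving` are hypothetical names, and the table deliberately omits organism-specific exceptions mentioned in the text (Dicer-independent 5′ tRF production in *S. pombe*, and the single 3′ U tRF generated by Dicer).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class TrfClass:
    name: str
    derived_from: str
    cleavage: str
    enzymes: Tuple[str, ...]  # empty tuple: mechanism not known


# Simplified summary of the classification given in the text.
TRF_CLASSES = (
    TrfClass("5' tRF", "5' end of tRNA after 5' leader removal by RNase P",
             "D loop", ("Dicer",)),
    TrfClass("3' CCA tRF", "3' end of mature tRNA",
             "T loop", ("Dicer", "angiogenin")),
    TrfClass("3' U tRF", "3' end of tRNA precursor",
             "directly after the mature 3' end", ("RNase Z",)),
    TrfClass("5' leader-exon tRF", "not established (seen in CLP1 mutant cells)",
             "unknown", ()),
)


def classes_involving(enzyme: str) -> List[str]:
    """Names of tRF classes whose biogenesis has been linked to the enzyme."""
    return [c.name for c in TRF_CLASSES if enzyme in c.enzymes]


print(classes_involving("Dicer"))
print(classes_involving("RNase Z"))
```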

While it was previously thought that production of tRNA halves and tRFs was solely a mechanism to remove damaged tRNAs, increasing evidence suggests that their formation is regulated. Angiogenin and Rny1, which are involved in the production of tRNA halves, are usually sequestered in compartments before they are released into the cytoplasm, where they cleave tRNAs (Spriggs et al., 2010). However, the regulation of their release from these cellular compartments is not known. Also, a number of tRNAs [including tRNAAsp(GTC), tRNAVal(AAC), and tRNAGly(GCC)] can be methylated by Dnmt2, which has been shown to protect these tRNAs from cleavage during stress (Schaefer et al., 2010). This specificity in cleavage of tRNAs might be responsible for the different types of tRFs observed under various conditions.

### **FUNCTIONS OF tRFS**

Are tRFs merely the products of tRNA degradation or do they have *bona fide* biological functions? If so, how diverse are these functions given the various forms of tRFs identified? Several lines of evidence point toward regulated production, suggesting that they may be functional RNA species. First, the abundance of different types of tRF does not correlate with the number of parent tRNA gene copies (Kawaji et al., 2008; Cole et al., 2009; Hsieh et al., 2009; Sobala and Hutvagner, 2011), with the exception of those found in *Tetrahymena* (Couvillion et al., 2010). Second, the fragments of tRNA formed are produced by cleavage at specific points in the tRNA. Third, whilst tRFs corresponding to the 5′ and 3′ ends of tRNA have been reported, those corresponding to the middle (incorporating the anticodon loop) have not. Although the exact roles of tRNA halves and tRFs are yet to be elucidated, accumulating evidence suggests that tRNA-derived small RNAs participate in two main types of biological processes, as discussed in more detail below.

### *Translation regulation of gene expression under stress conditions*

tRNAs are indispensable components of the translational machinery; hence tRNA cleavage under stress conditions can affect protein synthesis. However, the mode of translational regulation by tRNA cleavage is not simple. It has been shown previously that during stress conditions, formation of tRNA cleavage products does not change the pool of full-length tRNA significantly; rather, these fragments represent only a small portion of the tRNA pool (Saikia et al., 2012). Ivanov et al. (2011) showed a more intricate role for tRNA halves in translational control. They observed that tRNA halves formed by angiogenin during stress were able to inhibit protein synthesis and trigger the phospho-eIF2α-independent assembly of stress granules (SGs). These granules are mainly composed of stalled pre-initiation complexes, suggesting that the translation initiation machinery can be targeted by 5′ tRNA halves. They demonstrated that selected tRNA halves inhibit protein synthesis by displacing eIF4G/eIF4A from capped and uncapped mRNA and eIF4E/G/A (eIF4F) from the m7G cap. Using pull-down of 5′-tiRNAAla–protein complexes, the authors implicated YB-1, a translational repressor known to displace eIF4G from RNA and eIF4E/G/A from the m7G cap (Evdokimova et al., 2001; Nekrasov et al., 2003). Analysis of the 5′ tRNA halves in complex with YB-1 revealed that a terminal oligo-G motif containing four to five consecutive guanosines, present in certain 5′ tRNA halves (Ala/Cys), was absolutely required for translational repression of a reporter mRNA, suggesting that the inhibition is caused by specific tRNA halves and is not a consequence of global upregulation of tRFs (Ivanov et al., 2011). This result came as a surprise, as regulation of translation during stress is carried out via phosphorylation of eIF2 (see Roles of tRNA in Gene Expression), which induces translational repression facilitated by active sequestration of untranslated mRNAs into SGs (Holcik and Sonenberg, 2005).

In addition to tRNA halves, tRFs have also been implicated in regulation of translation. In the archaeon *Haloferax volcanii*, a 26 nt-long 5′ tRF originating from tRNAVal in a stress-dependent manner was shown to directly bind to the small ribosomal subunit and inhibit translation by interfering with peptidyl transferase activity (Gebetsberger et al., 2012). A similar mechanism of translation inhibition by a 5′ tRF was recently observed in human cells (Sobala and Hutvagner, 2013). A 26 nt 5′ tRF derived from tRNAVal was able to inhibit translation by affecting peptide bond formation. An interesting observation from this study was that the tRFs required a conserved "GG" dinucleotide for their activity in inhibiting translation. A similar motif dependence is observed, as discussed above, in translation inhibition by a 5′ tRNA half. 5′ tRNA halves containing the 5′ tRF sequence were shown to require a run of at least four guanosine residues at the 5′ end of the molecule, which is present only in tRNAAla and tRNACys, whereas 5′ tRFs require only two guanosine residues at the 3′ end of the molecule, residues that are conserved between tRNAs. Mutating the di-guanosine motif required by the 5′ tRF in the 5′ tRNA half did not affect its inhibitory activity, and the precise mechanism of translation inhibition by these tRFs warrants further investigation (Sobala and Hutvagner, 2013).
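The two motif requirements described above are easy to state as sequence checks. The sketch below is an illustration under explicit assumptions: the function names are hypothetical, the test sequences are invented toy strings rather than real tRNA fragments, and the "GG" check assumes the dinucleotide sits at the very 3′ terminus, which is a simplification of the description in the text.

```python
import re


def has_5p_oligo_g(seq: str, min_run: int = 4) -> bool:
    """True if the sequence starts (5' end) with at least min_run guanosines."""
    return re.match(r"G{%d,}" % min_run, seq.upper()) is not None


def has_3p_gg(seq: str) -> bool:
    """True if the sequence ends (3' end) with a GG dinucleotide (assumed position)."""
    return seq.upper().endswith("GG")


# Invented toy sequences, written 5' -> 3'.
print(has_5p_oligo_g("GGGGAUCC"))  # run of four G at the 5' end
print(has_5p_oligo_g("GGGAUCC"))   # only three G at the 5' end
print(has_3p_gg("AUCGG"))          # GG at the 3' end
```

Keeping the two checks separate reflects the observation that the 5′ tRNA half and the 5′ tRF appear to depend on different motifs.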

### *tRNA-derived fragments as regulators of gene silencing*

One of the first studies showing the involvement of tRNA-derived fragments in gene regulation and silencing was carried out by Yeung et al., who addressed the role of small RNAs in human immunodeficiency virus (HIV) infected cells. A highly abundant, 18 nt-long tRF originating from the 3′ end of human cytoplasmic tRNALys3 was shown to target the HIV-1 primer-binding site (PBS), similarly to siRNAs that target complementary RNA (Yeung et al., 2009). tRNALys is used by viral reverse transcriptases as a primer for the initiation of reverse transcription and DNA synthesis (Marquet et al., 1995). The 3′ tRF was shown to be associated with Dicer and AGO2, and to cause RNA cleavage of the complementary PBS sequence, thereby showing the role of a tRF in viral gene silencing. Other tRFs, such as 3′ CCA, 5′, and 3′ U tRFs, have also been shown to be associated with Argonautes and hence have the potential to function as siRNAs or miRNAs. Haussecker et al. (2010) investigated the ability of 3′ CCA and 3′ U tRFs to associate with Argonaute proteins and cause silencing of a reporter luciferase transgene. They found that both types of 3′ tRF associated with Argonaute proteins, but often more effectively with the non-silencing Ago3 and Ago4 than with Ago1 or Ago2. They observed that 3′ CCA tRFs had a moderate effect on reporter transgene silencing, but 3′ U tRFs did not. However, upon co-transfection of a small RNA complementary to the 3′ U tRF, the tRF preferentially associated with Ago2 and caused 80% silencing of the reporter transgene. This correlated with redirection of the reconstituted, fully duplexed double-stranded RNA into Ago2, whereas Ago3 and Ago4 were skewed toward less structured small RNAs, particularly single-stranded RNAs. This phenomenon, known as sense-induced transgene silencing (SITS), is in stark contrast with results normally obtained in the miRNA field, where sequences complementary to miRNAs relieve repression.
Modulation of tRF levels had minor effects on the abundance of microRNAs, but caused more pronounced changes in the silencing activities of both microRNAs and siRNAs. This study provides compelling evidence that tRFs play a role in the global control of small RNA silencing through associating with different Argonaute proteins (Haussecker et al., 2010).
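Targeting of a primer-binding site by a tRF, as described above, rests on sequence complementarity between the two RNAs. A minimal sketch of such a check follows; the function names are hypothetical, the sequences are invented toy strings (not the tRNALys3 fragment or the HIV-1 PBS), and only Watson-Crick pairs are considered, so G-U wobble pairing is ignored by assumption.

```python
# Watson-Crick complement table for RNA (no G-U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence written 5' -> 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def pairs_perfectly(small_rna: str, target_site: str) -> bool:
    """True if small_rna is fully complementary to target_site (both 5' -> 3')."""
    return reverse_complement(small_rna) == target_site.upper()


toy_trf = "GGCUAAC"   # invented example sequence
toy_site = "GUUAGCC"  # its reverse complement
print(reverse_complement(toy_trf))
print(pairs_perfectly(toy_trf, toy_site))
```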

A tRF that functions as an miRNA was recently described: a 22 nt 3′ tRF generated in a Dicer-dependent manner from tRNAGly in mature B cells and associated with Argonaute proteins (Maute et al., 2013). The 3′ tRF was shown to inhibit RPA1, an essential gene involved in DNA repair, possibly by binding to the 3′ UTR region. Expression of this tRF was downregulated in a lymphoma cell line, indicating that loss of 3′ tRF expression might help the cancer cells to tolerate the accumulation of mutations and genomic aberrations during tumor progression.

### *Other biological functions of tRFs*

Apart from the two known biological functions of tRFs in regulation, other potential biological functions are beginning to be identified. Recently, a study by Ruggero et al. (2014) showed their role in viral infectivity. Large-scale sequencing of small RNA libraries was used to identify small non-coding RNAs expressed in normal CD4+ T cells compared to cells transformed with human T-cell leukemia virus type 1 (HTLV-1), the causative agent of adult T-cell leukemia/lymphoma (ATLL). Among the miRNAs and tRFs expressed, one of the most abundant tRFs found was derived from the 3′ end of tRNAPro, and exhibited perfect sequence complementarity to the primer binding site of HTLV-1. *In vitro* reverse transcriptase assays verified that this tRF was capable of priming HTLV-1 reverse transcriptase, thereby suggesting an important role in viral infection. One possible role suggested for the tRF is to support the initiation of reverse transcription, but not processivity, with failure to proceed to the strand transfer step (Ruggero et al., 2014). Further studies are now needed to compare the abilities of the tRF and of full-length tRNAPro to prime and support strand transfer. Variation in tRNA half accumulation was also shown in the parasites *Toxoplasma gondii*, the agent of toxoplasmosis, and the rodent malaria parasite *Plasmodium berghei*. These organisms exhibited increased tRNA half accumulation upon egress from host cells and in response to stage differentiation, amino acid starvation, and heat-shock. It was observed that avirulent isolates of *T. gondii* and attenuated *P. berghei* parasites displayed higher rates of tRNA cleavage compared to virulent strains. Also, tRNA half production was significantly higher in the metabolically quiescent bradyzoite and sporozoite stages of *T. gondii* compared to the fast-growing tachyzoite, indicating a relationship between half-tRNA production and growth rate in this important group of organisms (Galizi et al., 2013).
A role for tRF halves in Respiratory Syncytial Virus (RSV) infectivity was recently shown by Wang et al., who observed an induction of tRNA cleavage upon RSV infection, with a specific subset of tRNAs being cleaved. The 31 nt 5′ tRF(Glu) formed exhibited *trans*-silencing capability against target genes; however, the mechanism of gene silencing was found to be different from the gene-silencing mechanism of miRNA/siRNA, previously also shown for other tRFs. Interestingly, the tRF was also shown to promote RSV replication (Wang et al., 2013).

tRNA fragments have also been implicated in progressive motor neuron loss. Hanada et al. recently demonstrated that tRNA fragments generated in CLP1 mutant cells sensitize cells to oxidative stress-induced activation of the p53 tumor suppressor pathway. This in turn leads to progressive loss of spinal motor neurons, muscle denervation, and paralysis, thereby providing a possible link between tRNA cleavage and p53-dependent cell death. However, the exact mechanism by which these tRNA fragments affect the p53 pathway needs to be determined (Hanada et al., 2013).

### **REGULATION OF CELL DEATH BY tRNA**

Apoptosis is a cellular process by which damaged, harmful, and unwanted cells are eliminated. Apoptotic regulation is critical to cell homeostasis, immunity, multi-cellular development, and protection against infections and diseases like cancer (Thompson, 1995). Apoptotic cells have been shown to undergo various morphological and biochemical changes caused by a group of cysteine-dependent aspartate *s*pecific prote*ases*, or caspases. In healthy cells, caspases are inactive; however, during apoptosis caspases are activated and signal the onset of apoptosis via cleavage of various intracellular proteins, including apoptotic proteins, cellular structural and survival proteins, transcriptional factors, signaling molecules, and proteins involved in DNA and RNA metabolism (Li and Yuan, 2008; Hou and Yang, 2013). Cleavage of these intracellular proteins ultimately leads to phagocytic recognition and engulfment of the dying cell. While many factors have been discovered that regulate the apoptotic pathway, this section discusses the recently discovered role of tRNA as a regulator of cell death (**Figure 3**).

### **CASPASE ACTIVATION BY EXTRINSIC AND INTRINSIC PATHWAYS**

Apoptosis can be triggered via two major routes: an extrinsic, or extracellularly activated pathway and/or an intrinsic, or mitochondrial-mediated pathway. Both pathways activate caspases, a class of endoproteases that hydrolyze peptide bonds (Thornberry and Lazebnik, 1998). Although there are various types of caspases, those involved in apoptosis can be classified into two groups, the initiator (or apical) caspases and the effector (or executioner) caspases. Initiator caspases (e.g., Caspases-8 and 9) are capable of autocatalytic activation, whereas effector caspases (e.g., Caspases-3, 6 and 7) are activated by initiator caspase cleavage (Chang and Yang, 2000; Riedl and Shi, 2004). The extrinsic pathway begins outside the cell through activation of a group of pro-apoptotic cell surface receptors, such as Fas/CD95 and tumor necrosis factor receptor. Upon binding to their cognate ligand, these receptors recruit an adaptor protein Fas-associated death domain (FADD) that binds and dimerizes the initiator procaspase-8, to form an oligomeric death-inducing signaling complex (DISC), in which procaspase-8 becomes activated through an autoproteolytic cleavage event. The active caspase-8 then cleaves and activates the effector caspases 3 and 7 (Ashkenazi and Dixit, 1998; Krammer et al., 2007; Hou and Yang, 2013). The intrinsic pathway causes mitochondrial outer membrane permeabilization (MOMP), which leads to release of cytochrome *c*, a mitochondrial protein which transfers electrons from complex III to complex IV in the electron transport chain (Wang, 2001). The discovery of the role of cytochrome *c* in apoptosis by Liu et al. (1996) came as a surprise due to its essential role in the survival of the cell. In the cytosol, cytochrome *c* interacts with the apoptotic protease activating factor-1 (APAF-1) to form the apoptosome complex (Zou et al., 1997). The complex recruits procaspase-9, which converts to active caspase-9 by autocatalysis. Active caspase-9 activates effector caspases like caspase-7 and caspase-3 and causes apoptosis (**Figure 3**). Apoptosis is regulated by several pro-apoptotic proteins (Bax, Bak, and Bid), anti-apoptotic proteins (Bcl-2, Bcl-XL, and Mcl-1) and a range of cellular factors (HSP90, HSP70 and HSP27; Sreedhar and Csermely, 2004; Gorla and Sepuri, 2014) that is now known to include tRNA.
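The two activation routes described above converge on the same effector caspases, which can be made explicit by writing the cascade as a small directed graph. This is a deliberately simplified sketch: the node labels, the dictionary `ACTIVATES`, and the helper `downstream` are hypothetical, and complexes such as the DISC and the apoptosome are collapsed into single steps.

```python
from collections import deque
from typing import Dict, List

# Simplified "A leads to activation of B" edges taken from the text above.
ACTIVATES: Dict[str, List[str]] = {
    "death receptor": ["FADD"],            # e.g., Fas/CD95, TNF receptor
    "FADD": ["caspase-8"],                 # via the DISC
    "caspase-8": ["caspase-3", "caspase-7"],
    "MOMP": ["cytochrome c"],              # release into the cytosol
    "cytochrome c": ["apoptosome"],        # together with APAF-1
    "apoptosome": ["caspase-9"],
    "caspase-9": ["caspase-3", "caspase-7"],
}


def downstream(start: str) -> List[str]:
    """Breadth-first list of everything activated downstream of start."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        for nxt in ACTIVATES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


extrinsic = downstream("death receptor")
intrinsic = downstream("MOMP")
print(extrinsic)
print(intrinsic)
print(sorted(set(extrinsic) & set(intrinsic)))  # shared effector caspases
```

In this representation, an inhibitor acting on cytochrome *c* would affect only the intrinsic branch, since the cytochrome *c* node does not lie on the path from the death receptors to the effector caspases.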

### **INTERACTION BETWEEN tRNA AND CYTOCHROME c: POTENTIAL ROLE IN REGULATING APOPTOSIS**

To answer the long-standing conundrum of why 1 mM dATP is required to induce caspase-9 activation in cell lysates, when the intracellular concentration of dATP is only 10 μM, Mei et al. investigated the role of RNA, which is essentially a polymer of nucleoside monophosphates, in cytochrome *c*-mediated caspase activation. They observed that treatment of mammalian S100 extracts with RNase strongly increased cytochrome *c*-induced caspase-9 activation, while the addition of RNA to the extracts impaired caspase-9 activation. These results implicated an inhibitory role of RNA in the activation of caspase-9. Systematic evaluation of the steps leading to caspase-9 activation identified cytochrome *c* as the target of the RNA inhibitor. Analysis of cytochrome *c*-associated species revealed that tRNA binds specifically to cytochrome *c*. Microinjection of tRNA into living cells inhibited the ability of cytochrome *c* to induce apoptosis, while degradation of tRNA by onconase, an RNase that preferentially degrades tRNA, enhanced apoptosis via the intrinsic pathway. Taken together, these findings showed that tRNA binds to cytochrome *c* and inhibits formation of the apoptosome (Mei et al., 2010). This suggested a direct role for tRNA in regulating apoptosis and revealed an intimate connection between translation and cell death. This finding also raised an interesting question as to how the interaction between tRNA and cytochrome *c* modulates apoptosis. This question was addressed recently by Gorla et al., who proposed that tRNA interacts with the heme moiety of cytochrome *c* and thereby protects the positively charged surface of cytochrome *c* from being exposed to the APAF-1 complex. This model was further supported by the observation that cytochrome *c* lost its ability to interact with tRNA after treatment with oxidizing agents or cysteine modifying agents.
In such a state, hemin is unable to bind to tRNA and the exposed positively charged residues of cytochrome *c* then bind to APAF-1 (Gorla and Sepuri, 2014). Hence tRNA can regulate apoptosis by binding to cytochrome *c*. Further investigation of the nucleotide residues of tRNA involved in these interactions is required to answer questions about how tRNA binding to cytochrome *c* is regulated in the cell, whether specific tRNA isoacceptors are involved, and if this interaction is non-specific. Increased expression of tRNA has been detected in a wide variety of transformed cells (Marshall and White, 2008), such as ovarian and cervical cancer (Winter et al., 2000; Daly et al., 2005), carcinomas, and multiple myeloma cell lines (Zhou et al., 2009). Expression levels of tRNA molecules in breast cancer cells were 10-fold higher than in normal cells, and overexpression of tRNAiMet induces proliferation and immortalization of fibroblasts and also significantly alters the global tRNA expression profile (Pavon-Eternod et al., 2013). It was also observed that certain individual tRNAs were more strongly overexpressed than others. tRNAArg(UCU), tRNAArg(CCU), tRNAThr(CGU), tRNASer(CGA), and tRNATyr(GTA) were among the most over-expressed tRNAs, while tRNAHis(GTG), tRNAPhe(GAA), and tRNAMet(CAT) were the least over-expressed tRNAs (Pavon-Eternod et al., 2009), indicating that overexpression is not random and may be related to regulation of cytochrome *c*. Identification of the tRNA sites involved in binding to cytochrome *c* might help elucidate the connection between tRNA overexpression and cancer. tRNA cleavage has also been suggested as a mode of regulation of this interaction (Hou and Yang, 2013).

### **CONCLUSION**

While aa-tRNAs have been implicated in a variety of roles in biosynthetic pathways, much less is known about the various functions of uncharged tRNAs in cells apart from their role in acting as sensors for cellular stress like nutritional deprivation. The recent discovery of the role of tRNA in regulating apoptosis has opened a whole new field which requires investigation into tRNA-protein interactions and has created a link between regulation of cell death and cellular metabolism. With the advent of high-throughput sequencing techniques, studying the whole transcriptome of various organisms has become feasible. These techniques refuted the age-old assumption that rRNA, mRNA and tRNA constitute the main RNA species in the cell. It is now clear that almost all of the DNA in the cell is transcribed; however, only a small portion of these transcripts are translated into proteins or used as substrates for biological processes. The emergence of these sequencing techniques has resulted in discoveries of novel ncRNAs, and several studies have highlighted their role as important regulators of gene expression. Among the ncRNAs discovered, a number of cleavage products of tRNAs formed in response to stress have also been discovered. These cleavage products were initially thought to be a result of random degradation; however, a number of studies have revealed their production to be a result of specific cleavage, and possibly regulated. Although a number of cleavage products have been observed, the mechanisms of their production are not fully understood.

Also, in the case of 3′ U tRFs and 5′ leader-exon tRFs, which are most likely produced inside the nucleus, the mechanism of export from the nucleus is not understood. In addition, the regulation of tRNA fragment production, i.e., its initiation, efficiency, and termination, is not fully understood. A recent study by Hanada et al. (2013) demonstrated that tRNA fragments sensitize cells to oxidative stress-induced activation of the p53 tumor suppressor pathway. This suggests that tRNA cleavage activates apoptosis via activation of p53 and hence protects against cancer, while full-length tRNA binds cytochrome *c* and prevents apoptosis, thereby aiding cancer development. This hypothesis is strengthened by the overexpression of tRNAs observed in cancer cell lines. Further investigation into the link between tRNA cleavage and p53 activation is required to help understand how tRNAs help regulate the progression of cancer.

tRNA is post-transcriptionally modified at various nucleotides. While the role of these modifications in tRNA structural stability and translation is well studied, they might also aid in the regulation of tRNA fragmentation. Further studies are needed to answer why some tRNAs are cleaved and others are not; for example, could modifications make certain positions in tRNA more sensitive to RNases, or could they be responsible for blocking RNases? Modifications might also help regulate tRNA binding to cytochrome *c* during apoptosis. Understanding the regulation of this interaction and its role in metabolism and tumorigenesis will help our understanding of the regulation of cell death in both normal and cancer cells. Cells have various mechanisms to sense the absence of a modification and remove non-functional tRNAs (Phizicky and Alfonzo, 2010). Variations in the modification status of tRNAs during stress have been implicated directly in decoding (Dedon and Begley, 2014), and such effects may be accentuated by indirect effects on the generation of regulatory tRFs. Clearly, much still remains to be discovered about the various regulatory roles of both charged and uncharged tRNA.

### **ACKNOWLEDGMENTS**

Work in the authors' laboratory on this topic was supported by grants MCB 1052344 from the National Science Foundation and GM65183 from the National Institutes of Health.

### **REFERENCES**


with the kinase domain and is required for tRNA binding and kinase activation. *EMBO J.* 20, 1425–1438. doi: 10.1093/emboj/20.6.1425


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2014; accepted: 21 May 2014; published online: 11 June 2014. Citation: Raina M and Ibba M (2014) tRNAs as regulators of biological processes. Front. Genet. 5:171. doi: 10.3389/fgene.2014.00171*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Raina and Ibba. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 03 June 2014 doi: 10.3389/fgene.2014.00158

## Transfer RNA and human disease

### *Jamie A. Abbott, Christopher S. Francklyn and Susan M. Robey-Bond\**

*Department of Biochemistry, College of Medicine, University of Vermont, Burlington, VT, USA*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Lluís Ribas De Pouplana, IRB Barcelona, Andorra*

*Anthony Antonellis, University of Michigan, USA*

### *\*Correspondence:*

*Susan M. Robey-Bond, Department of Biochemistry, College of Medicine, University of Vermont, Health Sciences Complex, B403 Given Bldg., 89 Beaumont Avenue, Burlington, VT 05405, USA e-mail: susan.robey-bond@uvm.edu*

Pathological mutations in tRNA genes and tRNA processing enzymes are numerous and result in very complicated clinical phenotypes. Mitochondrial tRNA (mt-tRNA) genes are "hotspots" for pathological mutations, and over 200 mt-tRNA mutations have been linked to various disease states. Often these mutations prevent tRNA aminoacylation. Disrupting this primary function affects protein synthesis and the expression, folding, and function of oxidative phosphorylation enzymes. Mitochondrial tRNA mutations manifest in a wide panoply of diseases related to cellular energetics, including cytochrome *c* oxidase (COX) deficiency, mitochondrial myopathy, MERRF (myoclonic epilepsy with ragged red fibers), and MELAS (mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes). Diseases caused by mt-tRNA mutations can also affect very specific tissue types, as in the case of neurosensory non-syndromic hearing loss and pigmentary retinopathy, diabetes mellitus, and hypertrophic cardiomyopathy. Importantly, mitochondrial heteroplasmy plays a role in disease severity and age of onset as well. Not surprisingly, mutations in enzymes that modify cytoplasmic and mitochondrial tRNAs are also linked to a diverse range of clinical phenotypes. In addition to compromised aminoacylation of the tRNAs, mutated modifying enzymes can also impact tRNA expression and abundance, tRNA modifications, tRNA folding, and even tRNA maturation (e.g., splicing). Some of these pathological mutations in tRNAs and processing enzymes are likely to affect non-canonical tRNA functions, and contribute to the diseases without significantly impacting translation. This chapter will review recent literature on the relation of mitochondrial and cytoplasmic tRNA, and enzymes that process tRNAs, to human disease. We explore the mechanisms involved in the clinical presentation of these various diseases with an emphasis on neurological disease.

**Keywords: tRNA, neurodegenerative disease, localized translation, mitochondrial disease, aminoacyl-tRNA synthetase, Usher syndrome Type IIIB**

### **INTRODUCTION**

Although the role of tRNA in translation has been known since the late 1950s, the first report linking a mutation in tRNA to a human disease was published in 1990, when MELAS was associated with a mutation in the mitochondrial tRNALeu gene [mitochondrial tRNA Leu (L)] (*MTTL1*) (Kobayashi et al., 1990). This report was followed by the demonstration that tRNALys is associated with MERRF (Shoffner et al., 1990). Since then, multiple mutations in individual tRNA genes have been associated with multiple diseases, and individual diseases have been found to be caused by mutations in one of several tRNAs. Not unexpectedly, the penetrance and severity of disease caused by tRNA mutations have been demonstrated to be unpredictable (Phizicky and Hopper, 2010; Hurto, 2011; Suzuki et al., 2011; Tuller, 2012). The primary role of tRNA is to deliver amino acids to the nascent polypeptide chain during protein translation, a seemingly irreplaceable function. Disease-causing mutations in tRNA, to date, have been found only in mitochondrial tRNA, indicating that the etiology of tRNA-linked diseases is tightly associated with mitochondrial biology. How can different point mutations in the same molecule lead to very different diseases? Do tRNAs play additional roles in mitochondria or in the cell apart from translation, for example in organelle localization or replication, cell death, membrane environment, or DNA replication (Dimauro et al., 2013)? About 1700 mitochondrial proteins are encoded in the nuclear genome, and it is unknown how mutated mt-tRNA might interact with these.

Many macromolecules interact with tRNA in the cytosol as well. The association of mutations in tRNA binding or processing enzymes with disease is even more recent: tRNA splicing endonuclease mutations were found to cause pontocerebellar hypoplasia as recently as 2008 (Budde et al., 2008). Mutations in the canonical tRNA partners, the aminoacyl-tRNA synthetases (ARSs), were first identified as causative agents of Charcot-Marie-Tooth disease in 2003 (Antonellis et al., 2003). The subject of ARSs and disease has also received attention, with a number of useful recent reviews addressing various issues (Kim et al., 2011; Guo and Schimmel, 2013; Jia et al., 2013; Konovalova and Tyynismaa, 2013; Schwenzer et al., 2013; Diodato et al., 2014). A major question yet to be addressed is the localization of translation in neural cells, and how tRNA may be trafficked to ribosomes in distant regions of the cell (e.g., long axons and synapses in neurons).

In this review, we survey the known and emerging connections between tRNA and human diseases, and the role of its key cellular partners in pathophysiology. We begin in the mitochondria, as disease-causing tRNA mutations have so far been found only in mitochondrial tRNA, and briefly review how mitochondrial biology contributes to the effects of tRNA mutations. Several tissue-specific diseases are reviewed, followed by a discussion of potential therapeutics. Next, we move into the cytoplasm to investigate tRNA interactions with other molecules, specifically modifiers and splicing endonucleases, followed by the role tRNA binding plays in regulating protein synthesis. Finally, we review the canonical role of tRNA interactions with aminoacyl-tRNA synthetases, with an emphasis on recent disease discoveries.

### **MITOCHONDRIAL tRNA MUTATIONS AND DISEASE**

### **MITOCHONDRIAL FUNCTIONAL BIOLOGY AND GENOMICS**

Mitochondria perform the essential function of synthesizing ATP in eukaryotic cells, and this cellular energy resource powers the biosynthesis of key metabolites, mechanochemical and transport functions, and a vast array of other activities. In addition to these functional roles, the mitochondria also generate and regulate the production of reactive oxygen species, as well as activate important cellular pathways such as apoptosis. The approximately 16.6 kb double-stranded circular mitochondrial genome is enclosed by the inner mitochondrial membrane and contains a total of 37 genes. Thirteen of these are open reading frames that encode subunits of the respiratory chain complexes I (seven subunits), III (one subunit), IV (three subunits), and V (two subunits). The remaining subunits of the respiratory chain complexes are encoded by genes located in the nuclear genome. Two mitochondrial genes encode the 16S and 12S ribosomal RNAs (rRNAs), and the remaining 22 genes encode the set of mitochondrial transfer RNAs (mt-tRNAs). All of the RNA molecules necessary for mitochondrial translation are provided by the mitochondrial genome. However, many proteins necessary to perform protein synthesis, including mitochondrial aminoacyl-tRNA synthetases (mt-ARSs), ribosomal proteins, and tRNA processing enzymes, are encoded in the nuclear genome. These are synthesized in the cytoplasm and subsequently imported into the mitochondria.
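The gene inventory described above can be checked with a short tally. The following is only an illustrative sketch in Python; the variable names are our own, and the counts are taken directly from the paragraph above.

```python
# Gene content of the human mitochondrial genome, using the counts
# given in the text. Variable names are illustrative only.
protein_coding_by_complex = {"I": 7, "III": 1, "IV": 3, "V": 2}
rrna_genes = 2   # 12S and 16S rRNAs
trna_genes = 22  # mt-tRNAs

# Sum the mtDNA-encoded respiratory chain subunits across complexes.
protein_coding_total = sum(protein_coding_by_complex.values())

# Protein-coding genes plus rRNA and tRNA genes give the full inventory.
total_genes = protein_coding_total + rrna_genes + trna_genes

print(protein_coding_total, total_genes)
```

The per-complex subunit counts sum to the thirteen protein-coding genes, and adding the two rRNA and 22 tRNA genes recovers the 37 genes of the mitochondrial genome.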

### **MITOCHONDRIAL tRNA MUTATIONS AND DISEASE**

It is well established that many different mt-tRNA mutations can cause a wide range of human diseases. High-energy-consuming tissues such as those of the muscular and nervous systems are particularly affected by mitochondrial defects. Unlike chromosomal DNA localized to the nucleus and subject to Mendelian inheritance, mitochondrial DNA is inherited solely from the mother and thousands of identical copies can exist per cell. Cells that carry a homogeneous population of the mitochondrial genome, wild type or mutated, are referred to as homoplasmic. A condition in which more than one type of mitochondrial genome exists, either in a cell, tissue or in an individual, is referred to as heteroplasmy. Should a mutation spontaneously arise or be maternally inherited, mitochondria carrying the pathogenic mutated mtDNA can accumulate over the course of organismal development. When the mutational load of a certain tissue type reaches a particular threshold, disease symptoms can become evident. Depending on the mutation and the tissue, this threshold can vary from 50 to 90%. Mitochondrial diseases and their respective tRNA mutations have been extensively cataloged, and the information is available in a variety of useful databases (Putz et al., 2007; Ruiz-Pesini et al., 2007). Past and recent examples of disease-causing mt-tRNA mutations demonstrate that mutations in these tRNA genes can affect mitochondrial protein synthesis, aminoacylation activity, and the three-dimensional structure and folding of the tRNA in the mitochondria (**Table 1**). Mitochondrial tRNA structures can differ significantly from cytosolic tRNA (Florentz et al., 2003) and many require specific complex modifications in order to fold correctly. Here we will review some well-characterized and recently reported mt-tRNA mutations, particularly as they relate to neurobiology and specific human disease phenotypes.

*Disease abbreviations: MELAS, mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes; MERRF, myoclonic epilepsy with ragged red fibers; CMH1, cardiomyopathy familial hypertrophic; CPEO, chronic progressive external ophthalmoplegia; CM, cardiomyopathy; NSHL, non-syndromic hearing loss; RP, retinitis pigmentosa; C,SP,A, cataracts, spastic paraparesis, and ataxia.*
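The heteroplasmy threshold described in this section can be made concrete with a small calculation. The sketch below is a simplified illustration only: the function names and the mtDNA copy numbers are hypothetical, and the only values taken from the text are the 50 and 90% bounds of the reported threshold range.

```python
def heteroplasmy(mutant_copies: int, total_copies: int) -> float:
    """Return the fraction of mtDNA copies carrying the mutation."""
    if total_copies <= 0:
        raise ValueError("total_copies must be positive")
    if not 0 <= mutant_copies <= total_copies:
        raise ValueError("mutant_copies must lie between 0 and total_copies")
    return mutant_copies / total_copies


def reaches_threshold(mutant_copies: int, total_copies: int,
                      threshold: float) -> bool:
    """True if the mutational load reaches the given disease threshold."""
    return heteroplasmy(mutant_copies, total_copies) >= threshold


# Hypothetical tissue sample: 700 mutant copies out of 1000 mtDNA copies.
load = heteroplasmy(700, 1000)

# The text reports thresholds between 50 and 90%, depending on the
# mutation and the tissue; both ends of that range are tested here.
symptomatic_at_low_threshold = reaches_threshold(700, 1000, 0.50)
symptomatic_at_high_threshold = reaches_threshold(700, 1000, 0.90)

print(load, symptomatic_at_low_threshold, symptomatic_at_high_threshold)
```

In this hypothetical case, the same 70% mutational load would be expected to produce symptoms for a mutation with a 50% threshold but not for one with a 90% threshold, which is one way to picture why the same degree of heteroplasmy can have different consequences in different tissues.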

### **MELAS AND MERRF**

The two classic well-characterized diseases associated with mt-tRNA mutations are mitochondrial encephalomyopathy, lactic acidosis, and stroke-like episodes (MELAS) and myoclonic epilepsy with ragged red fibers (MERRF). These are linked to mutations in mt-tRNALeu and mt-tRNALys, respectively (Suzuki et al., 2011). Patients with MELAS present with seizures, recurrent headaches and vomiting, anorexia, exercise intolerance, and proximal limb weakness; these are often seen early in childhood. Recurring stroke-like episodes can progressively impair motor abilities, vision, hearing, and mentation. MELAS is primarily caused by defects in respiratory chain complexes I and IV, which lead to impaired oxidative phosphorylation. The A3243G mutation, which encodes a mutation in the mt-tRNALeu wobble position, is present in some 80% of MELAS patients (Goto et al., 1990; Kobayashi et al., 1990). This mt-tRNALeu A3243G mutation was initially confirmed in 5 patients with MELAS syndrome (Kirino et al., 2005). Interestingly, the A8344G mutation associated with MERRF is also located in a wobble base position, in this case in mt-tRNALys.

The effects of the mt-tRNALeu A3243G and mt-tRNALys A8344G mutations were initially studied in cybrid cell lines (Kirino et al., 2004), followed by further characterization using mass spectrometry. Cybrid cells are created by fusion of cells that are free of mitochondrial DNA (rho 0 cells) with donor platelets, or by fusion of enucleated patient-derived fibroblasts with osteosarcoma cells that lack mtDNA. These cells thus possess mtDNA from patients or controls in a control nuclear background. Mutations linked to MELAS were subsequently shown to affect the τm5U modification for mt-tRNALeu and the τm5s2U modification for mt-tRNALys at the wobble position in the anticodon region (Yasukawa et al., 2000). These wobble position modifications are both essential for proper decoding by the mitochondrial ribosome (Kirino et al., 2004). Additional details concerning the biochemical effects of these taurine-defective mutations have been recently reviewed (Suzuki et al., 2011).

Other mt-tRNA mutations associated with MELAS and MERRF phenotypes have been isolated in the gene encoding mt-tRNAHis (**Figure 1**). In the first reports, the mutation was found to be a heteroplasmic G12147A substitution (Melone et al., 2004; Taylor et al., 2004), and analysis of muscle biopsy samples revealed a deficiency in cytochrome *c* oxidase (COX). The G12147A mutation in the D-arm of the mt-tRNAHis molecule would be expected to alter tRNA folding and abundance, leading to a specific decrease in the His-rich COXIII polypeptide, as well as a general decrease in mitochondrial protein synthesis. Additionally, a homoplasmic A12146G mutation was identified in another MELAS patient (Calvaruso et al., 2011). As in the previous example, an analysis of cybrid cells and patient muscle biopsy samples showed that deficiencies in various respiratory chain complex enzymes were linked to the mt-tRNAHis mutation.

### **MITOCHONDRIAL BIOLOGY AND TISSUE SPECIFICITY OF DISEASE**

Diseases related to mitochondrial tRNA mutations can manifest in very specific tissue types, leading to complex phenotypes. For example, the heart is heavily reliant on mitochondrial ATP production for proper and synchronized muscle contraction, and is thus particularly sensitive to mt-tRNA mutations. Animal models for heart failure and ischemia/reperfusion frequently display morphological changes to mitochondria that would be expected to exert negative impacts on the function of adult cardiac cells (Piquereau et al., 2013). One such tRNA-linked cardiomyopathy is caused by the homoplasmic G12192A mt-tRNAHis mutation localized to the loop of the TΨC arm, which was initially identified in five patients (**Figure 1**) (Shin et al., 2000). Electron microscopy imaging carried out on patient cardiomyocytes showed a loss of sarcomere formation and an accumulation of mitochondria with aberrant morphology.

Mutations associated with hypertrophic cardiomyopathy have also been found in mt-tRNAIle (**Figure 1**) (Taylor et al., 2003; Hollingsworth et al., 2012). For patients carrying these A4300G and C4277T mt-tRNAIle mutations, the only symptom was hypertrophic cardiomyopathy. However, detailed immunohistochemistry and biochemical studies on the hearts of patients after transplant surgery indicated a large proportion of COX-deficient cardiomyocytes and defects in activity for respiratory chain complexes I, III, and IV (Giordano et al., 2013). Also, there appeared to be reduced expression of mt-tRNAIle in both left and right ventricle samples from two patients compared with control mt-tRNALeu. This suggests that the phenotype may arise from an increasing number of COX-deficient cardiomyocytes; however, the numbers of COX-deficient cells in cardiac muscle have been shown to increase with age.

In addition to the classic MELAS and MERRF cases, mutations in other mt-tRNAs are also linked to sensorineural defects. A number of reports suggest that inner ear hair cells and their connecting neuronal circuitry may be specifically affected by mt-tRNA mutations. In one report, a late onset hearing impairment in a Chinese family was linked to a T12201C mutation in the acceptor arm of mt-tRNAHis, with the severity of hearing impairment being linked to the degree of heteroplasmy (**Figure 1**) (Yan et al., 2011). For this mutation, cellular and biochemical analyses point to a reduction in the levels of mt-tRNAHis as being the source of reduced mitochondrial translation and subsequent respiratory chain defects. Two related patients carrying an inherited heteroplasmic G12183A mutation in mt-tRNAHis both developed sensorineural pathology, presenting with visual impairment around the average age of 9 (Crimi et al., 2003). This mt-tRNAHis mutation alters a highly conserved base pair in the TΨC arm and would most likely disrupt secondary structure. The older sibling had pigmentary retinopathy, neurosensory deafness, and some muscle atrophy and ataxia, while the younger sibling experienced only visual and inner ear neurosensory deficits. Interestingly, histochemical analysis of muscle biopsy tissues for COX/SDH activity did not convincingly demonstrate a complete COX deficiency.

Like the auditory system, the visual system is similarly sensitive to mutations in mt-tRNA. For example, carriers of the G12183A mutation in mt-tRNAHis often present with a pigmentary retinopathy that leads to photoreceptor degeneration, pigmentary deposits in the retina, and ultimately progressive loss of vision. Specific muscles responsible for ocular movement are another target in the visual system susceptible to mt-tRNA mutations. Chronic progressive external ophthalmoplegia (CPEO) is a neuromuscular disorder characterized by the loss of extraocular muscle mobility, which results in the inability to move the eyes.

In one case report, CPEO was found to be linked to a G4308A mutation in the T-stem of mt-tRNAIle; this substitution disrupts a conserved GC base pair (**Figure 1**) (Schaller et al., 2011; Souilem et al., 2011). While there was no evidence of maternal inheritance, patient blood samples were homoplasmic for the wild type mt-tRNAIle, and muscle tissue obtained by biopsy was heteroplasmic. Analysis of the latter tissue indicated a link between the mutation and respiratory chain defects, leading to abnormal mitochondrial morphology. The G4308A substitution leads to a misfolded conformation that is likely to be incompatible with 3′ end processing by tRNase Z. A G4302A mt-tRNAIle mutation located in the variable arm was also found in a small number of CPEO patients (Berardo et al., 2010). Despite a predicted disruption of a conserved base pair in mt-tRNAIle, a muscle biopsy showed normal function of all of the respiratory chain complex proteins (Berardo et al., 2010). Of 22 pathogenic mutations observed in mt-tRNAIle, nine cause CPEO, suggesting that mt-tRNAIle is a "hot-spot" for CPEO. Most reported CPEO cases involving *MTTI* mutations are heteroplasmic and restricted to skeletal muscle. However, *MTTI* mutations also exist that are primarily homoplasmic and can be found in various tissue types. The molecular rationale for the tight linkage between mt-tRNAIle and CPEO provides a fascinating research question for future study.

In some systems, a tRNA-linked disorder that is primarily a sensory defect may exert negative effects on additional brain and motor neuron functions. One recent study reported that the pathogenic G14685A mutation in mt-tRNAGlu affects both visual and hearing systems (Lax et al., 2013). In this case, the affected subject suffered from cataracts, peripheral retinal degeneration, and pigmentary retinopathy at 7 years of age. This was followed by bilateral sensorineural hearing impairment in adolescence. By the age of 40, the patient had developed symptoms of spastic paraparesis, ataxia, slurred speech, and incontinence. Subsequent immunohistochemical (IHC) and biochemical analysis of muscle and brain tissues collected post mortem indicated respiratory chain complex I deficiencies. Additional evidence suggested that this heteroplasmic mutation arose sporadically, with brain regions exhibiting the highest degree of heteroplasmy. The mutation is located in the TΨC arm and disrupts a conserved G-C Watson-Crick base pair; detailed functional studies have yet to be conducted. To date there are 12 mutations in the *MTTE* gene encoding mt-tRNAGlu that cause mitochondrial pathologies with very different phenotypes, such as encephalomyopathy, retinopathy, MELAS, and Leber's hereditary optic neuropathy (LHON).

### **MITOCHONDRIA IN NEUROBIOLOGY AND ADDITIONAL CONSIDERATIONS FOR MITOCHONDRIAL DISEASE PHENOTYPES**

While most cells are roughly spherical structures with diameters in the range of micrometers to tens of micrometers, neurons possess irregular structures with long dendrites and axons that extend out from the soma at distances from millimeters to a meter. The synapse has the highest energy consumption of the neuron, and mitochondria are responsible for meeting this major energy demand. Among the important ATP dependent processes occurring at the synapse are vesicle exocytosis and endocytosis, maintenance of membrane potential by ion channels, and actin rearrangements of the cytoskeleton. Accordingly, synaptic mitochondria may be docked or anchored in these specialized compartments in order to execute these essential functions (Court and Coleman, 2012; Sheng and Cai, 2012). Owing to their ability to be transported along filamentous structures in the cell, mitochondria are motile and can be readily localized at the synapses and other high demand areas. Additionally, mitochondria are dynamic and can undergo major structural changes such as fusion, fission, and fragmentation.

Neuronal synapses are rich in mitochondria, both at the presynaptic and postsynaptic nerve terminals, and recent work suggests that there may be differences between synaptic vs non-synaptic mitochondria (Gillingwater and Wishart, 2013). In one model, specialized synaptic mitochondria may be synthesized at the soma and then trafficked to the synapse, where they adapt to the specific demands of this specialized environment. Alternatively, synapse specific mitochondria may be generated, selected for, and then trafficked to the synapse. Other considerations are that the mitochondria at the synapse are "older" and therefore not as efficient or that synaptic mitochondria may be synthesized in axons and would be closer to their destination at the axonal synapse (Amiri and Hollenbeck, 2008; Yarana et al., 2012).

These issues bear on the investigation of the above described neurobiological phenotypes, where defects arising from mt-tRNA or mt-ARS mutations cause losses of mitochondrial function that are concentrated at the synapse (Baloh, 2008). While past work underscores how the altered structure, expression, or lack of tRNA modifications resulting from mt-tRNA mutations may disrupt protein translation (Florentz et al., 2003), the apparent tissue-specific phenotype of many of these pathophysiological mutations remains to be elucidated. For diseases that affect basic neurobiological functions, mtDNA tRNA mutations may affect processes beyond mitochondrial translation, including the mitochondrial inner membrane (MIM) lipid environment, dynamics, maintenance, and replication machinery (Dimauro et al., 2013). Mutations in mitofusin, a protein responsible for mitochondrial fusion and fission, cause the peripheral neuropathy Charcot-Marie-Tooth (CMT) disease type 2A (Zuchner et al., 2004; Kijima et al., 2005). In a zebrafish model, a loss-of-function mutation demonstrated that transport of mitochondria along the axon is indeed disrupted and may be the contributing factor in the CMT2A phenotype (Chapman et al., 2013). The issue of how mitochondrial movement may be regulated, and the role of the other essential factors in that process, is discussed in detail elsewhere (Schwarz, 2013). Clearly, how specific mt-tRNA mutations affect biological functions of mitochondria is only beginning to be understood, and will remain a fertile area of research for the foreseeable future.

### **FUTURE MITOCHONDRIAL RESEARCH STRATEGIES AND POTENTIAL THERAPEUTIC APPROACHES**

Therapeutic progress toward cures for mitochondrial diseases has been slowed by the complex genetics associated with mitochondrial inheritance. For example, a mother with a low degree of heteroplasmy in her mtDNA can transmit a higher level of heteroplasmy to her children, a process referred to as a "mitochondrial bottleneck." Current efforts to better understand mitochondrial inheritance and the pathophysiology of mt-tRNA mutations are benefitting both from traditional systems like cybrid cells and from induced pluripotent stem cell (iPSC) technology. iPSCs can be challenging to develop if only specific tissues of affected patients carry the mt-tRNA mutation. Mito-mouse models have been generated to recapitulate pathogenic mitochondrial DNA mutations (Inoue et al., 2000). Recently this group of investigators generated a MERRF mito-mouse model incorporating the mt-tRNALys G7731A mutation (Shimizu et al., 2014), arguably the first of many such models. The animals generated carried varying levels of the mt-tRNALys G7731A mutation, and these levels correlated with the severity of the disease phenotype. Selection of oocytes that carried low amounts of the mt-tRNALys G7731A mtDNA mutation produced animals that appeared to be phenotypically normal. This work highlights the promise of this technology in generating future mt-tRNA disease models.

Other therapeutic options under active investigation for mitochondrial diseases include cytoplasmic transfer, gene therapy, or even eliminating dysfunctional mitochondria by altering mitochondrial dynamics to favor quality control mechanisms (Dimauro et al., 2013). In yeast, overexpression of the corresponding mt-ARS provides a rescue mechanism for pathological mt-tRNA mutations (De Luca et al., 2006). In human cells that model MELAS, the mt-tRNALeu A3243G mutation can be rescued by overexpression of mt-LeuRS (Park et al., 2008; Li and Guan, 2010). Furthermore, in yeast models of some mt-tRNALeu, mt-tRNAVal, and mt-tRNAIle mutations, overexpression of the carboxy-terminal domain of mt-LeuRS is sufficient to rescue the respiratory chain defects (Francisci et al., 2011). In human cybrid cells, overexpression of mt-LeuRS, mt-ValRS, and mt-IleRS has been shown to rescue pathogenic mutations in mt-tRNAIle (Perli et al., 2014). How the mt-LeuRS C-terminal domain is able to specifically rescue the defect associated with the mutant tRNA remains to be determined, but may involve direct interactions with mt-tRNAIle (Perli et al., 2014).

Mitochondrial gene replacement offers one of the most dramatic routes to treatment of mitochondrial disease. A recent noteworthy experiment featured the transfer of the nuclear genome from a primate oocyte to an enucleated oocyte of another primate containing only mitochondria (Tachibana et al., 2009). The oocytes generated contained the nuclear genome from two parents but mitochondria from the donor; when implanted in a pseudopregnant mother, they were able to successfully produce healthy rhesus macaque offspring. Not surprisingly, human oocyte mitochondrial genome transfer is under way (Amato et al., 2014), although research in the U.S. is likely to lag relative to Europe, owing to U.S. funding restrictions on human embryo research. While this three-parent *in vitro* fertilization shows tremendous promise as a means to produce a child without mitochondrial DNA mutations that retains both parents' nuclear genome, the approach does not address curing existing patients with mitochondrial disease, or treating mtDNA mutations that arise spontaneously.

### **CYTOPLASMIC tRNA CELLULAR INTERACTIONS AND DISEASE**

### **DISEASE-ASSOCIATED CYTOPLASMIC tRNA MUTATIONS ARE NOT OBSERVED**

The human genome contains approximately 500 tRNA genes, including gene duplications (Lowe and Eddy, 1997; Schattner et al., 2005). However, there are only 61 anticodons specified by the triplet code, so many of these identified tRNA genes share the same anticodon but differ in sequence elsewhere. Remarkably, a human disease that is linked to a mutation in a cytoplasmic tRNA has not yet been reported, and this may be a direct result of the presence of multiple paralogs of the gene encoding each cytoplasmic tRNA molecule. The tRNA for each anticodon, save a single gene for tRNATyr (ATA), is encoded by as many as 32 paralogous genes (Chan and Lowe, 2009). The process of generating each mature tRNA is a complex sequence of events that includes gene transcription, splicing, 5′ and 3′ end processing, CCA addition, transportation, and aminoacylation (Hopper et al., 2010). Errors in this sequence arising from mutations in the genes encoding the enzymes that bind and process cytoplasmic tRNA molecules are known, and account for a number of reported diseases. Furthermore, a growing body of literature suggests that tRNA molecules possess previously unrecognized biological functions in eukaryotes which are not fully understood (Phizicky and Hopper, 2010), and that may be perturbed by disruptions of the tRNA-protein interaction. Some of these biological functions include regulation of cellular apoptosis (Mei et al., 2010; Hou and Yang, 2013). See **Figure 2** for a schematic overview.
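The figure of 61 follows from the triplet code: four bases give 64 possible codons, of which three are stop codons in the standard genetic code. The short Python sketch below enumerates them; the variable names are our own, and the mean number of genes per anticodon is only a rough illustration based on the approximate gene count quoted above, not a measured distribution.

```python
from itertools import product

BASES = "UCAG"
# Stop codons of the standard (nuclear) genetic code.
STOP_CODONS = {"UAA", "UAG", "UGA"}

# Enumerate every possible triplet of the four RNA bases.
all_codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Sense codons are those that specify an amino acid.
sense_codons = [codon for codon in all_codons if codon not in STOP_CODONS]

# Rough average only: ~500 tRNA genes spread over 61 possible anticodons.
approx_trna_genes = 500
mean_genes_per_anticodon = round(approx_trna_genes / len(sense_codons), 1)

print(len(all_codons), len(sense_codons), mean_genes_per_anticodon)
```

An average of roughly eight genes per anticodon hides considerable variation, since the text notes that a single anticodon can be represented by as many as 32 paralogous genes.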

### **ROLE FOR tRNA MODIFICATIONS IN DISEASE**

The tRNAs found in higher eukaryotic species are more highly modified than prokaryotic and viral tRNAs, a distinction that may have arisen from multiple selection pressures. One such pressure may have been the need to facilitate immune system discrimination of foreign tRNAs from self-tRNAs (Hanada et al., 2013), discussed below. Various human diseases can arise from improperly modified tRNA molecules (Torres et al., 2014). Modifications of tRNA molecules can serve to support proper folding or facilitate recognition by other interacting enzymes. Classically, impaired modification of mt-tRNA has been associated with reduced recognition by ARS. More recently, methylation of tRNA has been linked to immune response stimulation (or suppression) via toll-like receptor 7 (TLR7) (Kariko et al., 2005; Robbins et al., 2007; Hamm et al., 2010; Jockel et al., 2012). Toll-like receptor activation initiates intracellular signaling events in immune cells by stimulating the release of the cytokine type I interferon (IFN). Immune cell TLR7 stimulation can also occur when either viral or bacterial single-stranded RNA or siRNA binds to the receptor in the endosome or cytoplasm (Blasius and Beutler, 2010), and results in IFN production (Takeuchi and Akira, 2010). Purified tRNA from various bacterial species was used to test the immunostimulatory potential in human peripheral blood mononuclear cells (PBMCs). Interestingly, tRNA purified from *T. thermophilus* and *E. coli* did not stimulate IFN-α production from PBMCs. Differences in specific tRNA methylation sites between bacterial species may thus be responsible for loss of TLR7 stimulation and IFN-α production. Isolated tRNA from *E. coli* strains lacking a specific tRNA methyltransferase, the *trmH*-encoded Gm18-2′-O-methyltransferase, acquired the ability to stimulate TLR7. Curiously, only 7 of 42 tRNAs are Gm18 modified in *E. coli* (Sprinzl and Vassilenko, 2005). Inhibition of TLR7 stimulation by Gm18-modified tRNA in mouse FLT3L-induced dendritic cells (DCs) occurs in a dose-dependent manner. Gm18-modified tRNA can also inhibit TLR7 stimulation by influenza A virus (IAV). These experiments demonstrate that Gm18-modified tRNA can act as an antagonist of TLR7-mediated production of IFN-α. In a similar investigation, the 2′-O-methylated Gm18 of yeast tRNAPhe and tRNATyr was also found to exert a suppressive effect on the immune response via TLR7 antagonism (Gehrig et al., 2012).

### **tRNA SPLICING AND DISEASE**

After transcription by polymerase III, many pre-tRNA molecules undergo post-transcriptional alterations to produce a mature product that can be utilized for translation. Initial transcripts are cleaved by the nucleases tRNaseZ and RNase P, which act to form the 3′ and 5′ ends, respectively. Additional modifications are made in the nucleus before transportation into the cytoplasm where, if necessary, alternative splicing can occur. Overall, RNA editing and alternative splicing are highest in the brain compared with other tissues (Norris and Calarco, 2012; Tariq and Jantsch, 2012). In humans, the spleen has the highest expression of tRNA, followed by the brain (Dittmar et al., 2006). As tRNA splicing has an important role in the central nervous system (CNS), it is not surprising that mutations in the splicing machinery can cause specific CNS diseases. In the human genome there are 32 intron-containing tRNAs (Lowe and Eddy, 1997).

Pontocerebellar hypoplasia (PCH) is a heterogeneous neurodegenerative disorder characterized by defective growth, development, and function of the brainstem and cerebellum. Symptoms of PCH include microcephaly, seizures, hypotonia, hyper-reflexia, and optic atrophy. Magnetic resonance imaging (MRI) of patient brain tissue shows immature development of the cerebellum as well as mild structural defects (Budde et al., 2008). Mutations in the tRNA-splicing endonuclease complex (TSEN2, TSEN15, TSEN34, and TSEN54) were identified in multiple PCH2 and PCH4 patients. An A307S substitution in TSEN54 that is predicted to be non-catalytic appeared in multiple cases (Budde et al., 2008). Notably, TSEN54 is highly expressed in the pons and cerebellar dentate regions of the developing brain (Budde et al., 2008). Two mutations that encode Y309C and R58W substitutions in the catalytic subunits TSEN2 and TSEN34, respectively, were also identified. As these are unlikely to completely abolish endonuclease activity, altered tRNA splicing may not be the sole cause of the phenotype. Alternatively, these might compromise mRNA 3′-end formation or a function of TSEN54 other than tRNA editing, leading to insufficient protein synthesis in specific brain regions during development.

A mutation in a kinase that associates with the TSEN splicing complex affects tRNA processing and is embryonic lethal. Cleavage and polyadenylation factor I subunit 1 (CLP1) is a mammalian RNA kinase capable of phosphorylating the 5′-hydroxyl of double- and single-stranded RNA and has a role in mRNA 3′-end formation (Weitzer and Martinez, 2007; Ramirez et al., 2008). CLP1 associates with the TSEN splicing complex, and a role for CLP1 in tRNA processing has recently been demonstrated (Hanada et al., 2013). A mouse model for a CLP1 mutation that abolishes kinase activity was used to examine the phenotypic outcome. Neonatal death of the CLP1 mutant mice was a consequence of respiratory failure, and non-viable mouse pups showed a substantial loss of motor neurons. An accumulation of novel tyrosine tRNA fragments derived from pre-tRNA was observed in the brain, muscle, kidney, heart, and liver, while mature tRNA levels remained normal. Recent initial biochemical characterization of CLP1 as part of the TSEN tRNA splicing complex has demonstrated that ATP is not required for CLP1 binding to the complex, but that ATP hydrolysis by CLP1 is required for proper pre-tRNA cleavage (Hanada et al., 2013). Also, efficient pre-tRNA cleavage is dependent on protein phosphorylation.

### **tRNA FRAGMENTS AND DISEASE**

As shown above, tRNA fragments can be associated with disease (Fu et al., 2009). Angiogenin is a ribonuclease that is essential in blood vessel growth and development (Shapiro and Vallee, 1987) and has a role in angiogenesis (Shapiro and Vallee, 1989). Interestingly, full-length tRNAs were shown to be substrates for angiogenin *in vitro* (Saxena et al., 1992). Under a broad range of stresses that include heat shock, hypoxia, hypothermia, and nutritional deprivation, stress-induced small RNAs (tiRNAs) of 30–40 nucleotides can be generated by cleavage in the anticodon region (Fu et al., 2009). Mature tRNA fragments derived from tRNAGly, tRNAGlu, tRNAVal, and tRNAArg have been identified in human fetal liver, mouse liver and heart tissue, and various cell lines. Significantly, reduction in angiogenin levels by siRNA treatment has been shown to decrease the accumulation of tRNAVal halves under stress conditions. The production of tRNA halves appears to cause translational arrest by a mechanism distinct from the better-known eIF2α phosphorylation-dependent pathway (Yamasaki et al., 2009). Of further interest, only the 5′ fragments are associated with translational arrest, and 5′-tiRNACys and 5′-tiRNAAla are the most potent repressors of translation relative to other 5′-tiRNAs (Ivanov et al., 2011). The molecular mechanism of tiRNA translational repression has been linked to displacement of eIF4G/A from mRNA. This proposed translational repression mechanism in mammalian cells involves the formation of a complex between a tiRNA and Y box binding protein 1 (YB-1) that associates with the m7G cap of mRNA to inhibit eIF4G/A from initiating translation (Ivanov et al., 2011).

### **tRNA BINDING PROTEINS DURING CELLULAR STRESS CAUSE DISEASE**

Translation in eukaryotes can be attenuated in response to diverse environmental stresses, including nutrient deprivation, iron limitation, protein export blockage, viral infection, UV irradiation and others. The physiological responses to these environmental challenges are coordinated by members of the eIF2α kinase family, which possess the ability to phosphorylate eIF2α when activated (Wek et al., 2006; Baird and Wek, 2012). The GCN2 kinase serves as the principal responder to nutrient limitation, and reveals a special role for tRNA in its regulatory mechanism (Harding et al., 2003; Wek et al., 2006). In response to amino acid starvation, uncharged tRNA specifically activates GCN2 by virtue of binding to a HisRS-like domain. GCN2 is a serine-threonine kinase and, in addition to its kinase domain, also contains a HisRS-like domain (Wek et al., 1995). The binding of tRNA to the HisRS-like domain induces a conformational change that subsequently activates the kinase domain (Dong et al., 2000; Qiu et al., 2001).

Animals lack the ability to synthesize 9 of the canonical 20 amino acids found in proteins. When an animal consumes an amino acid imbalanced diet, the deficiency in one or more essential amino acids decreases the concentration of essential amino acids in the blood stream. This elicits a short-term response, mainly controlled by the brain, and a longer-term response, largely controlled by the liver. The short-term, brain-specific response is based on special properties of the anterior piriform cortex, a region of the brain that functions in olfaction (Hao et al., 2005). GCN2 specifically expressed in this region acts as a special sensor of indispensable (essential) amino acids (IAAs), regulating feeding behavior through its control of activating transcription factor 4 (ATF-4), the mammalian homolog of GCN4 (Harding et al., 2000; Hao et al., 2005). When rats are fed an IAA-deficient diet, the buildup of uncharged tRNA leads to activation of GCN2 (Rudell et al., 2011). This causes an adverse response, such that the animals cease feeding and search for new food sources that have a complete panel of essential amino acids. The neural circuitry for this behavior is only partially understood, but is being actively investigated (Maurin et al., 2014).

In addition to its role in nutrient deprivation, the expression of GCN2 in the brain has recently been shown to participate in the control of long-term potentiation (LTP), a major component of synaptic plasticity associated with the enhancement of signaling between neurons. The late phase of LTP (L-LTP) is dependent on gene expression and de novo protein synthesis. A key factor in long-term memory in general and LTP in particular is the repressor of cAMP responsive element binding protein (CREB), which plays a direct role in L-LTP, learning, and memory (Costa-Mattioli et al., 2005). Notably, CREB expression is negatively regulated by ATF4, which is controlled by GCN2. CREB, ATF4, and GCN2 establish a circuit in which perturbations of translational initiation lead to changes in both electrophysiological measurements of LTP and altered performance in quantitative tests of memory, such as the Morris water maze test. The expression of ATF4 is down-regulated in GCN2<sup>−/−</sup> animals, resulting in increased spatial memory after weak training, but poorer spatial memory after extensive training (Costa-Mattioli et al., 2005). Notably, the switch from short-term to long-term synaptic plasticity and memory is dependent on eIF2α phosphorylation, which is a direct consequence of GCN2 activity (Costa-Mattioli et al., 2007). Collectively, these observations underscore the role of protein synthesis control in the establishment of memory. GCN2 also serves a function in proper neurobiological development and is down-regulated through inhibition by the imprinted and ancient gene protein homolog (IMPACT) in order to promote protein synthesis and neurite growth (Pereira et al., 2005; Roffe et al., 2013).

Recently, two reports linked inherited mutations in the gene encoding GCN2, *EIF2AK4*, with complex phenotypes arising from vascular pathology. In the first report (Eyries et al., 2014), numerous mutations in the *EIF2AK4* gene were linked to pulmonary veno-occlusive disease (PVOD), which is characterized by obstruction and blockage of pulmonary veins by fibrous, collagen rich tissue. PVOD is characterized by low diffusing capacity for carbon monoxide, septal line and lymph node enlargement, as well as occult alveolar hemorrhage (Eyries et al., 2014). By whole-exome sequencing, subjects in 13 families with the disease were found to possess multiple mutations in the *EIF2AK4* gene, in both the homozygous and compound heterozygous state (Eyries et al., 2014). Age of onset in patients varied widely from 10 to 50 years of age. Most mutations encoded stop codons in the GCN2 open reading frame, but at least two encoded missense mutations that mapped to conserved residues in the kinase domain.

In a second study (Best et al., 2014), inherited and sporadic mutations in the *EIF2AK4* gene were linked to pulmonary capillary hemangiomatosis (PCH), a progressive and eventually fatal disease characterized by capillary proliferation and invasion into alveolar septa and the bronchial wall. Symptoms included pulmonary hypertension, fatigue, cough, weight loss, and dyspnea. Exome sequencing of related individuals with PCH determined that mutations in *EIF2AK4* are responsible for the autosomal recessive PCH phenotype. Both affected siblings carried the same compound heterozygous mutations, for which the parents were heterozygous carriers: a nonsense 3438C>T (Arg1150X) mutation and a frameshift 1153dupG (Val385fs) mutation (Best et al., 2014). Owing to clinical similarity and related immunohistochemistry, PVOD is grouped with PCH (Langleben et al., 1988; Humbert et al., 1998), and it is likely that these are different clinical descriptions of the same basic pathophysiology. There is insufficient information to rationalize the connection between GCN2 and lung vascular pathophysiology, but an interesting hypothesis is that mutations that compromise GCN2 stress response function may expose the pulmonary environment to more oxidative stress, which may predispose both mouse pulmonary hypertensive models and human subjects to pulmonary hypertension (Eyries et al., 2014).

Many studies have demonstrated the importance of GCN2 regulation during cellular stress (Grallert and Boye, 2013). In addition to its link to the pulmonary diseases described above, GCN2 has an essential role in supporting tumor cell growth (Wek and Staschke, 2010) and proliferation (Ye et al., 2010, 2012). An alternative hypothesis is that GCN2 interacts with SMAD4 and SMAD1 family members, which have been implicated in bone morphogenetic protein (BMP) signaling, a pathway that is also genetically linked to pulmonary arterial hypertensive disorders (West et al., 2004). As whole genome sequencing becomes more affordable and widely available, it is likely that other clinical manifestations will be linked to altered GCN2 function, perhaps even within the neurobiological context suggested by the previously described work on nutrition sensing and LTP formation.

### **THE MOLECULAR BIOLOGY OF TRANSLATION IN NEURONS AND PROTEIN SYNTHETIC MACHINERY**

Neurons possess distinct compartments, axons and dendrites collectively referred to as neurites, which are packed with specific sets of proteins to facilitate neuronal function. Positioning mRNA in these specialized compartments for protein synthesis allows neurons to remodel neurites during outgrowth, regulate synthesis of receptor subunits altering the synaptic proteome, and prevent uncontrolled recurring excitation (Turrigiano, 2011; Penn et al., 2013). A number of significant neurodegenerative diseases, such as amyotrophic lateral sclerosis (ALS), fragile X tremor ataxia syndrome (FXTAS), and myotonic dystrophy apparently involve dysfunctional messenger RNA (mRNA) localization (Ramaswami et al., 2013). Neuronal transport of mRNA from the soma to axons and dendrites is in part dependent on a family of proteins referred to as RNA binding proteins (RBPs) (Darnell, 2013). Localized translation can provide efficient increases in local protein concentrations, and regulate both immediate and delayed responses in neurons (Holt and Schuman, 2013). As our understanding of localized translation grows to include mRNA metabolism and spatial regulation, our understanding of how the other components of protein synthesis machinery are trafficked and spatially regulated in neurons should also expand.

In most cells, tRNA comprises approximately 12% of total RNA (Pace et al., 1970). Currently, tRNA molecules undergo selective transportation and localization in order for accurate protein synthesis to occur in eukaryotic cells. After transcription by RNA Pol III some tRNA molecules can be cleaved and posttranscriptionally modified in the nucleus (Hopper and Phizicky, 2003) before being transported into the cytoplasm for protein synthesis. Cytoplasmic tRNA can undergo retrograde transportation from the cytoplasm back into the nucleus (Rubio and Hopper, 2011). This mechanism may help control the rate of protein synthesis by sequestering tRNA into the nucleus during times of cellular stress. Evidence of nuclear accumulation of tRNA has been shown to occur within 15 min after glucose starvation (Whitney et al., 2007). Further indication shows that this appears to be linked to PKA activation and signaling but GCN2 pathway independent (Whitney et al., 2007). Amino acid starvation can also drive accumulation of tRNA into the nucleus (Shaheen and Hopper, 2005).

Given the emerging role of tRNA localization, it may be noteworthy that the interferon-induced tetratricopeptide repeat (IFIT) protein IFIT5 was recently shown to have tRNA binding properties (Katibah et al., 2013). IFIT5 was shown to bind RNA with 5′-phosphate caps, including initiator methionine tRNA (iMet), tRNAVal, tRNAGly, and tRNALys. IFIT5 appears to coimmunoprecipitate with shorter fragments rather than full-length tRNA molecules, suggesting it may help target tRNAs for degradation. IFIT5 can also bind actin, and localize to the cellular surface. In the same way that mRNA is preferentially trafficked to dendrites and axons (Wang et al., 2007), tRNA molecules may also be recruited to active sites of localized translation, setting the stage for translation to occur. Recently, single molecule studies using fluorescence *in situ* hybridization (FISH) showed that β-actin mRNA and ribosome mobilization to axons is stimulus induced, and that the β-actin mRNA becomes more available to translation machinery after synaptic stimulation (Buxbaum et al., 2014). This same group later developed a transgenic mouse with *in vivo* labeled β-actin mRNA (Park et al., 2014). β-actin mRNA appeared as part of granules, and often multiple copies of the transcript were found in granules localized to dendrites. The movement of endogenous β-actin mRNA-containing granules from the soma to the synapse was subsequently estimated to occur at a speed of 1.3 μm/s (Park et al., 2014). Comparably, IFIT5 actin localization and tRNA binding activity could serve as a potential mechanism to inhibit localized translation, but IFIT5 localization to actin structures was also shown to be independent of RNA binding. However, there may still be potential mechanisms in place to spatially regulate tRNA localization within neurons that remain to be elucidated.
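To put the reported granule speed in perspective, the short sketch below converts it into a transit time over a given distance. Only the 1.3 μm/s figure comes from the text; the function name `transit_time_s` and the 100 μm path length are hypothetical values chosen for illustration. The estimate also assumes uninterrupted, directed movement, so it should be read as a lower bound rather than a measured delivery time.

```python
def transit_time_s(distance_um: float, speed_um_per_s: float = 1.3) -> float:
    """Time in seconds to cover distance_um at a constant speed.

    The default speed is the granule speed quoted in the text; constant,
    directed motion is an assumption of this sketch.
    """
    if distance_um < 0:
        raise ValueError("distance must be non-negative")
    if speed_um_per_s <= 0:
        raise ValueError("speed must be positive")
    return distance_um / speed_um_per_s


if __name__ == "__main__":
    # Hypothetical 100 micrometre soma-to-synapse path length
    seconds = transit_time_s(100.0)
    print(f"{seconds:.1f} s")
```

Under these assumptions, a granule would need on the order of a minute or more to reach a synapse 100 μm from the soma, which illustrates why pre-positioning translation components in neurites could matter for rapid local responses.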

### **AMINOACYL-tRNA SYNTHETASES AND DISEASE**

### **OVERVIEW OF AMINOACYL-tRNA SYNTHETASES**

Aminoacyl-tRNA synthetases (ARSs) catalyze the attachment of amino acids to cognate tRNA, a key reaction that contributes to the accuracy of protein synthesis. ARSs can be divided into two distinct classes (Carter, 1993; Ibba and Soll, 2001, 2004). Class I enzymes are, with the exception of TyrRS and TrpRS, principally monomeric, contain a Rossmann fold catalytic domain, and aminoacylate the 2′ hydroxyl of the terminal adenosine of their cognate tRNAs. Class II ARSs typically form dimers and tetramers, and feature a catalytic domain organized around an antiparallel beta sheet fold flanked by alpha helices. The class II enzymes couple the amino acid moiety to the 3′ hydroxyl of the tRNA's terminal adenosine. A typical animal cell contains 37 cytoplasmic and mitochondrial synthetase genes, all of which are encoded on nuclear chromosomes. Mitochondrial synthetases are given the same name as cytosolic synthetases, appended with the numeral 2 (e.g., HARS and HARS2). Collectively, they ensure accurate and efficient protein synthesis under a broad range of conditions and, significantly, in multiple cellular compartments.

Pathological mutations have been identified in 10 of the genes encoding mt-ARSs (Schwenzer et al., 2013). The first of these was identified in the *DARS2* gene, which encodes mitochondrial aspartyl-tRNA synthetase (Scheper et al., 2007). Currently, pathological mutations that cause human diseases have been identified in ten genes that encode cytoplasmic tRNA synthetases. The first cytoplasmic ARS mutation associated with a human disease, Charcot-Marie-Tooth, was discovered in glycyl-tRNA synthetase (GARS) (Antonellis et al., 2003). Here, we will focus specifically on aminoacyl-tRNA synthetase mutations that affect the peripheral nervous system, sensorineural tissues, and the central nervous system (**Table 2**).

**Table 2 | Aminoacyl-tRNA synthetase mutations associated with neurological disease.**

### **AMINOACYL-tRNA SYNTHETASES AND PERIPHERAL NEUROPATHIES**

Charcot-Marie-Tooth disease (CMT) is a hereditary peripheral neuropathy that manifests as progressive degeneration of distal motor and sensory neurons, leading ultimately to muscle weakness and atrophy of the legs and hands/arms. The various forms of CMT can be further categorized into either demyelinating type 1, which features defects in the myelin sheath surrounding peripheral nerves, or axonal type 2, which results in abnormalities in the axon of the peripheral nerve. Nerve conduction studies and electromyography (EMG) examination of patients with CMT are typically abnormal. While more than 80 genes have been linked to CMT (Timmerman et al., 2014), 30 mutations in ARS genes are associated with the intermediate and axonal autosomal dominant and intermediate autosomal recessive forms of the disease (Yao and Fox, 2013).


*NC, not confirmed; ONA, OMIM not assigned; Na, not applicable; CMT, Charcot-Marie-Tooth disease; DHMN5A, distal spinal muscular atrophy type V; ARNSHI, autosomal recessive non-syndromic hearing impairment; HBSL, hypomyelination with brain stem and spinal cord involvement and leg spasticity; LBSL, leukoencephalopathy with brainstem and spinal cord involvement and elevated lactate; PCH, pontocerebellar hypoplasia; MCPH, autosomal recessive primary microcephaly; HSP, hereditary spastic paraplegia; LTBL, leukoencephalopathy with thalamus and brainstem involvement and high lactate.*

The first cytoplasmic ARS to be linked to CMT was glycyl-tRNA synthetase (GARS) (Antonellis et al., 2003). Since that initial report, mutations in the genes encoding four additional aminoacyl-tRNA synthetases have been linked to CMT (Wallen and Antonellis, 2013). Mutations in the *GARS* gene can cause both CMT2D and distal spinal muscular atrophy type V (DHMN5A) (Antonellis et al., 2003). It is likely significant that the synthetase enzyme encoded by the *GARS* gene is obliged to function in both the cytoplasmic and mitochondrial compartments. Similar to other class II enzymes, GARS is an obligate dimer, and several of the mutant substitutions linked to CMT alter residues located at the dimer interface (Xie et al., 2007). Some of these have been shown to either weaken or strengthen dimer formation (Xie et al., 2007). Additionally, it may also be relevant that the wild type protein forms punctate structures (granules) in neurite projections, a property lost in some of the GARS CMT mutants (Antonellis et al., 2006). Other *in vitro* models of CMT GARS mutations also demonstrate a defect in localization to neurite-like projections in a transfected neuroblastoma cell line (Nangle et al., 2007). It is not clear whether the phenotype is related solely to the cytoplasmic or mitochondrial distribution of GARS, or both. However, cytoplasmic ARS gene mutations associated with peripheral neuropathies (Wallen and Antonellis, 2013) indicate that dysfunctional mitochondrial protein translation is not the primary cause of these phenotypes.

A unique *Drosophila* model was created to examine the effects of the GARS CMT2D mutation on two different types of neurons, olfactory projection neurons and mushroom body γ neurons (Chihara et al., 2007). These represent two different CNS neuron populations and differ in development, morphology, and circuitry. The olfactory projection neurons showed severe defects in dendritic morphology but few axonal defects. However, mushroom body γ neurons had defects in both axon and dendritic morphology. Interestingly, as GARS is cytoplasmic and mitochondrial, individual disruption of cytoplasmic protein synthesis caused large defects in axonal and dendritic nerve end branching, while disruption of mitochondrial protein synthesis led to more dendritic nerve end branching than axonal. The *Drosophila* GARS mutations could be rescued by expression of human WT GARS. However, CMT2D mutations E71G and L129P could not rescue the defective neuronal projections and hence are loss of function mutations.

Mouse models of GARS CMT mutations have shed some light on the possible pathological mechanism by which ARSs can cause peripheral neuropathy. Interestingly, there did not appear to be significant mislocalization of GARS protein or change in GARS granule formation in the CMT2D mouse model, GARS<sup>Nmf294/+</sup> (Stum et al., 2011). The GARS<sup>Nmf294/+</sup> mouse model of CMT2D also showed defects in the neuromuscular junction (NMJ), reduced axon diameter, and sensory and peripheral axonal loss, but no myelination defects. Importantly, this mutation does not cause a substantial loss of primary aminoacylation function (Seburn et al., 2006). More recent studies of GARS<sup>Nmf294/+</sup> mice have shown that peripheral neuropathy is not caused by reduced neuronal connectivity to distal muscle fibers, but that the defects and denervation commonly observed at NMJs correspond to an inappropriate maturation process at distal muscle fibers (Sleigh et al., 2014).

The cytoplasmic tyrosyl-tRNA synthetase (YARS) is also linked to CMT. In particular, the mutant substitutions G41R and E196K, and deletion 153–156, are associated with dominant intermediate Charcot-Marie-Tooth C (DI-CMTC) (Jordanova et al., 2006). While the G41R and 153–156 mutations demonstrated severe catalytic defects, the E196K mutant was more active, suggesting that canonical activity defects are not responsible for the disease phenotype (Froelich and First, 2011). Furthermore, mutations are not found in domains responsible for previously reported non-canonical functions such as the cytokine activity of YARS (Wakasugi and Schimmel, 1999), suggesting that the CMT pathogenic substitutions may not be associated with YARS non-canonical activities. The mutated YARS enzyme exhibits stability *in vivo* that is comparable to the wild type enzyme (Froelich and First, 2011), but may be subject to a tendency to cellular mislocalization in neuroblastoma cell lines similar to that seen with GARS (Jordanova et al., 2006).

The bifunctional (cytoplasmic and mitochondrial) lysyl-tRNA synthetase (KARS) represents another targeted gene in CMT. One interesting CMT patient was found to be a compound heterozygote with mutations encoding both an L133H missense and a Y173SX7 frameshift mutation (McLaughlin et al., 2010). Both mutations localize to the anticodon-binding domain; however, the Y173SX7 frameshift likely results in complete loss of the catalytic domain. Catalytic analysis of the L133H mutant showed a reduction in aminoacylation to levels below 25% of wild type. In addition to peripheral neuropathy, a behavioral pathology was observed in this subject. KARS is also part of the multi-synthetase complex (MSC) and can exist as either a dimer or tetramer. KARS has additional non-canonical functions involved in signaling, cell migration, and HIV infection, some of which are activated by post-translational modification (Motzik et al., 2013). Mitogen-activated protein kinase (MAPK) phosphorylation induces release of one dimer from the MSC for translocation into the nucleus, while the other dimer remains with the MSC to continue protein synthesis (Ofir-Birin et al., 2013).

Another well characterized tRNA synthetase associated with hereditary motor neuropathy is the cytosolic alanyl-tRNA synthetase (AARS). In an earlier study, characterization of the mouse strain AARS<sup>sti</sup> (or "sticky" mouse), which has an uncharacteristic CMT phenotype with progressive neurodegeneration and a coat defect, suggested a link between these properties and an editing defect in AARS (Lee et al., 2006; Latour et al., 2010). Phenotypically, the AARS<sup>sti</sup> mice have significant Purkinje cell loss and ubiquitin inclusions suggestive of protein misfolding that are absent in GARS<sup>Nmf294/+</sup> mice (Stum et al., 2011). Another genetic investigation of AARS CMT2N subjects in an Australian family showed additional symptoms that included sensorineural deafness (McLaughlin et al., 2012). An attractive hypothesis is that the protein misfolding is a direct consequence of mistranslation originating from the misacylation of tRNAAla with serine instead of alanine (Guo et al., 2009).

Mutations in ARS genes have also been linked to non-CMT peripheral neuropathies. Recently, a cytoplasmic histidyl-tRNA synthetase (HARS) heterozygous mutation, R137Q, was found in a 54-year-old patient with peripheral neuropathy (Vester et al., 2013). In yeast, the mutation encoded a loss of function substitution in HARS, and the transfected mutant *HARS* gene failed to support growth. When introduced in *C. elegans*, the mutant *HARS* gene bearing the R137Q mutation caused aberrant motor neuron axonal growth and a progressive loss of motor coordination. In a fashion reminiscent of some of the GARS-linked CMT mutants, the R137Q substitution eliminates a salt bridge at the dimer interface with D64 of the opposite monomer (Xu et al., 2012). This provides one scenario by which the R137Q substitution could bring about a loss of function, and promote the peripheral neuropathy phenotype. However, the existence of a heterozygous asymptomatic carrier with the same mutation suggests that the genetics of the system are not straightforward, and may be confounded by incomplete penetrance effects. On a related note, patients with late-onset CMT2 were identified with mutations in the methionyl-tRNA synthetase (*MARS*) gene (Gonzalez et al., 2013). The mutation, R618C, affects a highly conserved arginine residue that forms a salt-bridge interaction at the interface between the catalytic and anticodon-binding domains. Much as with R137Q HARS, the R618C MARS phenotype has a late onset and does not show complete penetrance. Both mutations cause a loss of function in yeast models.

### **AMINOACYL-tRNA SYNTHETASES AND SENSORINEURAL DISEASES**

Sensorineural deafness is characterized by defects in either the inner ear or the connecting auditory neural circuitry. Perrault Syndrome is described clinically as ovarian dysgenesis with sensorineural hearing loss. Genetic studies indicate that mutations in mitochondrial HARS2 and LARS2 are both linked to Perrault Syndrome (Pierce et al., 2011, 2013). In the *HARS2* gene, a compound heterozygous mutation encoding both L200V and V368L is linked to Perrault Syndrome, and these substitutions alter highly conserved residues in the catalytic domain. Pyrophosphate exchange assays demonstrated that the resulting mutant proteins exhibit decreased activity, implying that the phenotype is a straightforward loss of function effect likely arising from reduced respiratory chain complex activity. Given the limitations of this single assay, the mutant substitutions may have other structural and functional consequences. Similarly, the LARS2 mutation associated with Perrault Syndrome, T522N, also targets a highly conserved residue located in the catalytic domain. By contrast, the frameshift mutation at codon 360 (c.1077delT) and T629M compound heterozygous mutations found in another Perrault Syndrome patient localize to the poorly conserved leucine-specific C-terminal domain. Yeast complementation assays showed that T522N LARS2 did not support growth, while T629M LARS2 did. Notably, *C. elegans* models carrying the mutations were completely sterile, recapitulating at least one component of the disease phenotype.

Cytoplasmic HARS has also been associated with a heterogeneous sensorineural disease that causes deaf-blindness, Usher Syndrome Type IIIB. Usher Syndrome is a polygenetic disease with at least 11 different loci identified. Most of the genes associated with Usher Syndrome are typically involved in inner ear hair cell morphology and development (Yan and Liu, 2010). The three clinical subtypes are distinguished by the severity of the symptoms and the age of onset at which the phenotype presents. A homozygous mutation in HARS, Y454S, was discovered in patients with Usher Syndrome IIIB (Puffenberger et al., 2012) and is the first cytoplasmic HARS mutation associated with disease. In these patients hearing and vision are severely impaired early in childhood, and fever-induced hallucinations can occur. While peripheral nerve function seems to be normal, there is some mild trunk ataxia. The Y454S mutation, which is localized specifically on the surface of the anticodon-binding domain (ABD), juxtaposes the catalytic domain of the second monomer. Y454 hydrogen-bonds with residue E439 within the anticodon-binding domain, and E439 is positioned to interact with K148 from the catalytic domain of the second monomer to form a salt-bridge interaction (Xu et al., 2012). Initial aminoacylation data employing wild type and mutant versions of the mouse enzyme demonstrate that the Y454S substitution is unlikely to be a simple loss of function. Accordingly, current efforts are testing the hypothesis that HARS has a secondary function that is impaired by the Y454S substitution.

Hearing loss phenotypes have been linked to a number of ARSs in addition to HARS. Patients in three unrelated consanguineous Pakistani families suffered from an autosomal recessive non-syndromic hearing impairment (ARNSHI) linked to mutations in KARS in the known ARNSHI-associated locus DFNB89 (Santos-Cortez et al., 2013). One of the mutations is predicted to encode a Y173H substitution in a residue located in the oligonucleotide/oligosaccharide-binding (OB) fold motif of the KARS anticodon-binding domain. In this location, the substitution could negatively affect tRNA binding and/or catalysis. Significantly, KARS is prominently expressed in many cell types of the mouse organ of Corti and vestibular system, including inner and outer hair cells, the spiral ligament, and sulcus and spiral limbus cells of the vestibular membrane epithelium.

### **AMINOACYL-tRNA SYNTHETASES AND CENTRAL NERVOUS SYSTEM DISEASES**

In contrast to the peripheral nervous system and the sensorineural system, there are few reports linking ARSs to central nervous system diseases, which often affect specific brain regions. The autosomal recessive monogenic disease Leukoencephalopathy with Brain stem and Spinal cord involvement and elevated Lactate (LBSL) represents the first well-characterized CNS disease associated with an ARS. Genetic analysis of some 30 different families helped link the disease to mutations in the *DARS2* gene encoding mitochondrial AspRS (Scheper et al., 2007). Hallmarks of the disease defined by MRI include abnormalities of the white matter in the cerebellum, spinal cord, and brainstem. While the mutant substitutions in LBSL are predicted to impair dimer formation and decrease DARS2 catalytic function, mitochondrial respiratory chain complex activity in these LBSL patients was normal (Scheper et al., 2007).

Mutations in the *DARS* gene encoding cytoplasmic AspRS were identified in patients with hypomyelination with brain stem and spinal cord involvement and leg spasticity (HBSL), an inherited white matter disease (Taft et al., 2013). These DARS coding changes may affect enzyme activity by either disrupting tRNA binding or reducing protein expression (Van Berge et al., 2012, 2013). Interestingly, patients with the DARS-linked white matter disease exhibited the same types of white matter abnormalities in brain stem and spinal cord regions by MRI that were observed in patients with the DARS2 mutations linked to LBSL. White matter disease associated with ARSs is not restricted to AspRS. A recent report described a disease called leukoencephalopathy with thalamus and brainstem involvement and high lactate (LTBL) linked to mutations in the gene encoding mitochondrial glutamyl-tRNA synthetase (EARS2) (Steenweg et al., 2012). The full characterization of the role of ARS function in these diseases is at an early stage, and it is too soon to conclude that they are entirely explained by the sensitivity of neurons to decreased output in translation.

Several other relatively recent reports highlight the potential association of mutations in ARS genes with neurodegenerative diseases. For example, the lethal heterogeneous neurodegenerative disease pontocerebellar hypoplasia type 6 (PCH6) was linked to the *RARS2* (mitochondrial arginyl-tRNA synthetase) gene in a patient with a homozygous frameshift mutation predicted to generate a truncated protein (Edvardson et al., 2007). Other case studies of PCH6 subjects subsequently identified additional *RARS2* mutations (Glamuzina et al., 2012). In the zebrafish model, RARS2 is highly expressed in the brain 24 h post fertilization. Of further interest, zebrafish knockdown models of TSEN54 (a subunit of the tRNA splicing complex described previously) and RARS2 produce comparable phenotypes characterized by brain hypoplasia and increased cell death, but little effect on brain patterning. These results suggest there may be a common pathological PCH phenotype associated with loss of function of the *TSEN* and *RARS2* alleles, as well as a demand for specific spliced tRNA products at specified times during neuronal development (Kasher et al., 2011). Recently, whole-exome sequencing identified *QARS* as a causative gene in two families with children affected by autosomal-recessive primary microcephaly (MCPH) (Zhang et al., 2014). This disease is associated with intellectual disability, seizures during infancy, and atrophy of the cerebellar vermis and cerebral cortex. The phenotype is similar to the PCH caused by mutations in RARS2 and the TSEN complex described previously. However, specific brain regions are differentially affected in each disease. Four variants were identified, two of which were localized to the catalytic domain and the remaining two to the tRNA-binding N-terminal domain. Activity studies showed that all mutations caused a loss of functional protein.
Also, a zebrafish model demonstrated that QARS is essential for proper eye and brain development.

The potential value of the zebrafish in studying ARS-linked neurodegenerative diseases is suggested by another recent report in which methionyl-tRNA synthetase (*MARS*) was identified in an exome sequencing study as one of 15 genes linked to hereditary spastic paraplegias (HSP) (Novarino et al., 2014). HSP is characterized by the degeneration and progressive loss of corticospinal motor neuron tract function; patients typically present with lower limb spasticity, seizures, ataxia, peripheral neuropathy, intellectual disability, skin, and visual defects. The potential roles of many of the genes reported in this study in HSP were validated in the zebrafish model; the MARS mutation was too severe to be fully evaluated.

Another important connection between ARS function and neurodegenerative disease reported recently is the linkage between the aminoacyl-tRNA synthetase complex interacting multifunctional protein-2 (AIMP2, also referred to as p38) and familial Parkinson's disease. Initially, AIMP2 was reported to be deposited in Lewy bodies, and its accumulation had been noted in some familial cases of Parkinson's disease (Corti et al., 2003; Ko et al., 2005). Later, AIMP2 was determined to be a substrate of the E3 ligase PARKIN. Multiple loss of function mutations in the gene encoding PARKIN are a common cause of familial Parkinson's disease. While the complete basis of AIMP2-linked neurological pathophysiology is not yet clear, it is possible that an important secondary role of AIMP2 is modulating neuronal protein turnover; when this process is dysfunctional, particular cell death pathways (e.g., "parthanatos") may become activated.

### **FUTURE PROSPECTS FOR THE ROLE OF tRNA IN HUMAN DISEASE: THE ROLE OF SPATIAL AND TEMPORAL CONTROL OF tRNA AND THE TRANSLATIONAL MACHINE IN DISEASE PATHOPHYSIOLOGY**

Given the emerging role of mRNA transport and localization in neuronal function (Wang et al., 2007), it would not be surprising if the temporal and spatial regulation of aminoacyl-tRNA synthetases in neuronal compartments such as axons and dendrites were similarly important. Protein synthesis in these diverse compartments clearly requires the full translational apparatus in addition to the messenger RNA. At present, this remains an underexplored area of tRNA and ARS biology, and a role for tRNA and ARS in localized translation in neurons should be investigated. Some of the models seeking to explain the link between the ARS and CMT have proposed that the mutant substitutions create localization defects for the affected enzymes. As yet, the evidence is not definitive, and the cellular mechanisms that traffic and localize these enzymes are essentially unknown. Is the localization of ARS enzymes to distal sites of the axon and dendrites dependent on vesicular transport, tRNA, or other trafficking proteins? In addition to the well-known Multi-Synthetase Complex (MSC), with what types of protein complexes are the ARS enzymes associated, and what positions them at the site of local translation in dendrites and neurons? Are there signaling pathways that are necessary to initiate localization of ARS enzymes to these regions? Furthermore, there may be mechanisms in place (e.g., post-translational control) to spatially regulate the activity of ARS such that tRNA is not aminoacylated until the proper cues are received to synthesize proteins. These questions, and other issues related to functional compartmentalization, should keep the role of tRNA and ARS in complex human diseases a major question for years to come.

### **REFERENCES**


type 2D and distal spinal muscular atrophy type V. *Am. J. Hum. Genet.* 72, 1293–1299. doi: 10.1086/375039


with the kinase domain and is required for tRNA binding and kinase activation. *EMBO J.* 20, 1425–1438. doi: 10.1093/emboj/20.6.1425


by mutation in the tRNALys gene. *Proc. Natl. Acad. Sci. U.S.A*. 111, 3104–3109. doi: 10.1073/pnas.1318109111


Zhang, X., Ling, J., Barcia, G., Jing, L., Wu, J., Barry, B. J., et al. (2014). Mutations in QARS, encoding Glutaminyl-tRNA synthetase, cause progressive microcephaly, cerebral-cerebellar atrophy, and intractable seizures. *Am. J. Hum. Genet*. 94, 547–558. doi: 10.1016/j.ajhg.2014.03.003

Zuchner, S., Mersiyanova, I. V., Muglia, M., Bissar-Tadmouri, N., Rochelle, J., Dadali, E. L., et al. (2004). Mutations in the mitochondrial GTPase mitofusin 2 cause Charcot-Marie-Tooth neuropathy type 2A. *Nat. Genet.* 36, 449–451. doi: 10.1038/ng1341

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 March 2014; accepted: 14 May 2014; published online: 03 June 2014. Citation: Abbott JA, Francklyn CS and Robey-Bond SM (2014) Transfer RNA and human disease. Front. Genet. 5:158. doi: 10.3389/fgene.2014.00158*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Abbott, Francklyn and Robey-Bond. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# The natural history of transfer RNA and its interactions with the ribosome

#### *Gustavo Caetano-Anollés<sup>1</sup>\* and Feng-Jie Sun<sup>2</sup>*

*<sup>1</sup> Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana-Champaign, IL, USA*

*<sup>2</sup> School of Science and Technology, Georgia Gwinnett College, Lawrenceville, GA, USA*

*\*Correspondence: gca@illinois.edu*

### *Edited by:*

*Akio Kanai, Keio University, Japan*

### *Reviewed by:*

*Kosuke Fujishima, NASA Ames Research Center, USA*

*Savio Torres Farias, Universidade Federal da Paraíba, Brazil*

**Keywords: structure, phylogenetic analysis, sequence, non-coding RNA, translation, ribosome, origin of life**

Transfer RNA (tRNA) is undoubtedly the most central and one of the oldest molecules of the cell. Without it genetics and coded protein synthesis are impossible. The crucial specificities responsible for the genetic code and accurate translation are by far entrusted to interactions between tRNA and translation proteins, fundamentally aminoacyl-tRNA synthetase (aaRS) enzymes and elongation factor (EF) switches (Yadavalli and Ibba, 2012). Discrimination mediated by aaRSs and EFs against misincorporated tRNA and amino acids is at least 20 times more stringent than ribosomal recognition, editing, and other proofreading mechanisms (Reynolds et al., 2010). The fact that crucial genetic code specificities in highly selective interactions with protein enzymes do not involve the ribosomal ribonucleoprotein biosynthetic machinery challenges the "replicators first" origin of life scenario of an ancient RNA world (Caetano-Anollés and Seufferheld, 2013). It also highlights the central functional, mechanistic, and evolutionary roles of tRNA and its recognition determinants, which enable coevolution between nucleic acids and proteins. These coevolutionary relationships are compatible with a late origin of the ribosome in its mechanism and not in protein biosynthesis, which was inferred from the computational analysis of thousands of RNAs and proteomes (Harish and Caetano-Anollés, 2012). These analyses showed tight coevolution of ribosomal RNA (rRNA) and ribosomal proteins (r-proteins). While these relationships delimit molecular makeup when organisms use translation to negotiate growth and viability amidst environmental change, coevolution also constrains recruitment of the canonical L-shaped structure of the tRNA molecule into a multiplicity of modern functions. 
These new functions include the synthesis of antibiotics, bacterial cell wall peptidoglycans and tetrapyrroles, modification of bacterial membrane lipids, protein turnover, and the synthesis of other aminoacyl-tRNA molecules (Francklyn and Minajigi, 2010). Here we unfold coevolutionary relationships between tRNA substructures and translation proteins that embody crucial protein-nucleic acid interactions. We focus on a series of computational biology analyses of the structure and conformational diversity of tRNAs and their interacting proteins that provide information about the history of structural accretion of this "adaptor" molecule. Using this information, we place tRNA history within the framework of an evolutionary timeline of protein domain innovation, uncovering the natural history of tRNA within the context of the geological record.

### **tRNA MOLECULES ARE OLD AND EVOLVE BY ACCRETION OF STRUCTURAL PARTS**

When studying the organismal distribution of a catalog of over a thousand RNA families describing the modern RNA world, tRNA was found to be one of only five families that were universally present (Hoeppner et al., 2012). These families showed a strong vertical evolutionary trace and included rRNA and ribonuclease P (RNase P) RNA, which are present (with exceptions; e.g., Randau et al., 2008) in all studied cellular organisms and are minimally affected by horizontal gene transfer. We note however that RNA-free RNase P (Gutmann et al., 2012; Taschner et al., 2012) can challenge RNase P RNA ancestrality (Sun and Caetano-Anollés, 2010). The ubiquity of tRNA in the cellular lineages of life and its central molecular role provide strong support to the very early origin of the molecule, prompting the study of the origin and evolution of the tRNA molecule using information in its sequence and structure (Fitch and Upper, 1987; Eigen et al., 1989; Di Giulio, 1994; Sun and Caetano-Anollés, 2008a; Farias, 2013). A computational analysis of the history of tRNA based on the structure of thousands of molecules revealed that tRNAs evolve by accretion of component parts (substructures) and that the "top half" of tRNA that includes the acceptor stem is more ancient than the "bottom half" with its anticodon arm (Sun and Caetano-Anollés, 2008a; reviewed in Sun and Caetano-Anollés, 2008b) (**Figure 1A**). While other models of evolutionary growth of the tRNA molecule have been proposed (Di Giulio, 2012), phylogenetic reconstructions are compatible with biochemical evidence of molecular recognition that makes amino acid charging ancestral and molecularly distant (∼70 Å) from codon recognition, which localizes to more modern regions of tRNA (Caetano-Anollés et al., 2013). These findings revive the "genomic tag" hypothesis in which tRNA harbored ancestral genomic information and the derived bottom half provided genetic code specificity (Weiner and Maizels, 1987).

**FIGURE 1 | The natural history of tRNA inferred from nucleic acid-protein interactions and structural phylogenomics. (A)** The history of tRNA portrays the history of its interactions with cognate aminoacyl-tRNA synthetase (aaRS) protein enzymes. This is exemplified by the domains of the tRNA and cysteinyl-tRNA synthetase binary complex (PDB entry 1U0B), which are colored according to their age. The ancient "top half" of tRNA embeds an "operational code" in the identity elements of the acceptor arm that interact with the catalytic domain of aaRSs through classes I and II modes of tRNA recognition. The evolutionarily recent "bottom half" of tRNA holds the standard code in identity elements of the anticodon loop that interact with anticodon-binding domains of aaRSs. **(B)** Flow diagram showing the retrodiction strategy used to build phylogenetic trees of RNA molecules (ToMs) and associated trees of substructures (ToSs), and trees of protein domains (ToDs). The structures of RNA molecules are first decomposed into substructures. Structural features of substructures such as helical stem tracts and unpaired regions are coded as phylogenetic characters and assigned character states according to an evolutionary model that polarizes character transformation toward an increase in conformational order (character argumentation). Coded characters (s) are arranged in data matrices, which can be transposed. Phylogenetic analysis using maximum parsimony optimality criteria generates rooted ToMs and ToSs. A census of domain structures in proteomes of hundreds of completely sequenced organisms is used to compose data matrices, which are then used to build ToDs. Elements of the matrix (g) represent genomic abundances of domain structures in proteomes, defined at different levels of classification of domain structure (e.g., SCOP folds, superfamilies, and families). They are converted into multi-state phylogenetic characters with character states transforming according to linearly ordered and reversible pathways. Embedded in the trees of nucleic acids and proteins are timelines that assign age to molecular structures and associated functions. **(C)** The natural history of tRNA and rRNA overlap when they are mapped onto a timeline of protein domain history. A tree of tRNA substructures (ToS) was derived from statistical phylogenetic characters that define a molecular morphospace (the Shannon entropy of the base-pairing probability matrix, base-pairing propensity and mean length of stem structures) in 571 tRNA molecules. The optimal most parsimonious tree (43,281 steps; consistency index = 0.853, retention index = 0.654, rescaled consistency index = 0.557, g1 = −1.033) was recovered from a branch-and-bound search. The most basal subtree of a ToS describing the evolution of the rRNA core (Harish and Caetano-Anollés, 2012) is also shown. Both trees are anchored to the geological record via an evolutionary timeline of first appearance of protein domains that are capable of establishing crucial interactions with the RNA molecules (see description in the main text). AC, anticodon; PTC, peptidyl transferase center.
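The three tree statistics quoted in the legend of panel C are related: under the usual definition, the rescaled consistency index is the product of the consistency index and the retention index. The short check below is only a sketch that uses the rounded values from the legend; the variable names are ours, and a small discrepancy in the last digit is expected because the inputs are themselves rounded.

```python
# Tree statistics as quoted in the legend of Figure 1C.
consistency_index = 0.853
retention_index = 0.654
reported_rescaled_ci = 0.557

# Usual definition: rescaled consistency index = CI * RI.
rescaled_ci = consistency_index * retention_index

print(f"RC recomputed from rounded CI and RI: {rescaled_ci:.4f}")
print(f"Difference from reported RC: {abs(rescaled_ci - reported_rescaled_ci):.4f}")
```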

### **PHYLOGENOMIC RETRODICTION UNCOVERS COEVOLUTION BETWEEN tRNA SUBSTRUCTURES AND INTERACTING aaRS PROTEIN DOMAINS**

In the studies mentioned above, phylogenetic analysis of nucleic acid structure was directly derived from structural topology and the thermodynamics of tRNA (Caetano-Anollés, 2002a,b; Sun et al., 2007; Sun and Caetano-Anollés, 2008a), taking unique advantage of links that exist between secondary structure and conformation, dynamics, and adaptation (Bailor et al., 2010). Specifically, a census of geometrical features that describe the length and topology of tRNA substructures (such as stem and non-paired segments) or statistical features describing their stability and conformational diversity were analyzed with modern phylogenetic methods to produce phylogenetic trees of molecules (ToMs) and trees of substructures (ToSs) that portray the history of the system (molecules) or its component parts (substructures), respectively. **Figure 1C** shows a ToS that describes the evolution of stem substructures of the tRNA molecule and of early evolving stem substructures of rRNA. The trees that are produced are rooted using a phylogenetic process model that complies with Weston's generality criterion. The model automatically roots the trees by assuming conformational stability increases in evolution as structures become canalized (Sun et al., 2010). The validity of polarization and rooting depends on the axiomatic component of character transformation, which is falsifiable and supported by considerable evidence (e.g., thermodynamic and phylogenetic; Sun et al., 2010).
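One of the statistical features mentioned above, the Shannon entropy of the base-pairing probability matrix (used for the tree in Figure 1C), can be illustrated with a minimal sketch. The function name `base_pair_entropy`, the toy matrices, and the choice of a natural logarithm summed over the upper triangle are assumptions made for illustration; the cited studies may scale or normalize the quantity differently.

```python
import math

def base_pair_entropy(pair_probs):
    """Shannon entropy of a base-pairing probability matrix.

    pair_probs[i][j] is the probability that bases i and j are paired.
    Only entries with i < j are summed; zero entries contribute nothing.
    """
    n = len(pair_probs)
    entropy = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            p = pair_probs[i][j]
            if p > 0.0:
                entropy -= p * math.log(p)
    return entropy

# Hypothetical 4-nucleotide toy molecules (not real tRNA data).
# "rigid": base 0 pairs with base 3 with certainty.
rigid = [
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
# "floppy": base 0 pairs with base 2 or base 3 with equal probability.
floppy = [
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]

print(base_pair_entropy(rigid))   # a single certain pair: no conformational uncertainty
print(base_pair_entropy(floppy))  # two competing pairs: higher entropy
```

A lower value indicates a more ordered structural ensemble, which is the kind of quantity that can then be coded as a phylogenetic character.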

While ToSs are powerful retrodiction statements that unfold the history of RNA accretion (Sun et al., 2007; Sun and Caetano-Anollés, 2008a,b,c, 2009, 2010; Harish and Caetano-Anollés, 2012), the gradual appearance of protein domains in evolutionary history can be inferred from phylogenomic trees of domains (ToDs) (**Figure 1B**) (Caetano-Anollés and Caetano-Anollés, 2003) and can illustrate the establishment of intermolecular interactions in evolution. Domains are structural and evolutionary units of proteins that are highly conserved (Caetano-Anollés et al., 2009). The evolutionary accumulation of these units unfolds recurrence patterns that encompass the entire history of proteins and can be mined with suitable phylogenomic methods. ToDs are derived from a structural census of protein domains in the proteomes of hundreds to thousands of genomes that have been completely sequenced. The fold structures of domains are defined using the different levels of structural abstraction of the accepted classification gold standards, the SCOP (Murzin et al., 1995) or CATH (Orengo et al., 1997) databases. Timelines of domain innovation are then derived directly from the trees, taking advantage of their highly imbalanced nature. Imbalance unfolds when the splitting of lineages depends on an evolving "heritable" trait (Heard, 1996). In our case, the evolving trait is the gradual accumulation of domains in proteomes and the semipunctuated discovery of new fold structures (made evident for example in simulations; Zeldovich et al., 2007). The predictive power of ToDs is considerable (Caetano-Anollés and Seufferheld, 2013) and central for the history of tRNA, as ToDs have established the evolutionary history of aaRS domain structures and their associated coevolving tRNA molecules (Caetano-Anollés et al., 2013).
The timeline of evolutionary appearance of fold families revealed the early emergence of the "operational" RNA code linked to the specificities of synthetases that were homologous to the catalytic domains of modern TyrRS and SerRS protein enzymes. These archaic synthetases interacted with the "top half" of tRNA and were capable of peptide bond formation and aminoacylation (Caetano-Anollés et al., 2013). The timeline also showed the late implementation of the standard genetic code with the late appearance of anticodon-binding domains that interacted with the "bottom half" of tRNA. **Figure 1A** shows a representative aaRS enzyme and the tight coevolutionary link between aaRS domains and tRNA arms. Remarkably, structural phylogenomic retrodictions indicate that genetics arose through episodes of structural recruitment as an exacting mechanism that favored flexibility and folding of the emergent proteins (Caetano-Anollés et al., 2013). These enhancements of phenotypic robustness matched evolutionary trends of folding speed in proteins (Debes et al., 2013) and are compatible with recent simulations of the origin of the genetic code (Jee et al., 2013).
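The census step that underlies ToDs, in which genomic abundances of domains are converted into linearly ordered multi-state characters (Figure 1B), can be sketched as follows. The logarithmic rescaling, the number of character states (`n_states`), the toy counts, and all function and variable names are assumptions for illustration only; the text above states only that abundances are converted into ordered multi-state characters.

```python
import math

def abundance_to_states(census, n_states=24):
    """Rescale raw domain abundances g to integer states 0..n_states-1.

    census maps a proteome name to a dict {domain id: abundance g}.
    Returns the sorted domain ids and a proteome -> list-of-states matrix.
    """
    g_max = max(g for counts in census.values() for g in counts.values())
    domains = sorted({d for counts in census.values() for d in counts})
    matrix = {}
    for proteome, counts in census.items():
        row = []
        for domain in domains:
            g = counts.get(domain, 0)  # an absent domain has abundance 0
            scaled = math.log(g + 1) / math.log(g_max + 1)
            row.append(round(scaled * (n_states - 1)))
        matrix[proteome] = row
    return domains, matrix

# Hypothetical abundances for three SCOP families named in the text.
census = {
    "proteome_A": {"c.26.1.1": 12, "d.104.1.1": 9},
    "proteome_B": {"c.26.1.1": 20, "d.104.1.1": 15, "c.51.1.1": 7},
}
domains, matrix = abundance_to_states(census)

# Transposing the matrix makes domains the taxa and proteomes the characters,
# which is the orientation needed to build a tree of domains.
proteomes = sorted(matrix)
transposed = {
    domain: [matrix[p][k] for p in proteomes]
    for k, domain in enumerate(domains)
}

for domain, states in transposed.items():
    print(domain, states)
```

The resulting ordered states would then be passed to a parsimony program; that step is outside the scope of this sketch.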

### **ABUNDANCE OF PROTEIN DOMAINS IN PROTEOMES FOLLOWS AN EVOLUTIONARY CLOCK**

The history of RNA does not represent a phylogenetic statement that applies to the entire world of RNA molecules. Consequently, it cannot be placed within a global historical context. In contrast, the history of protein domains inferred from ToDs follows a global molecular clock of fold structures that spans 3.8 billion years (Gy) of evolution (Wang et al., 2011). Traditionally, molecular clocks are based on rates of change in protein or nucleic acid sequences, which are limited by historical information existing in the individual protein or nucleic acid molecules being studied (Zuckerkandl and Pauling, 1965; Ayala et al., 1998). These clocks are therefore constrained by the highly dynamic nature of sequence change, including the problems of mutational saturation and rate heterogeneity (heterotachy). In contrast, molecular structures exhibit characteristics of recurrent change that are much more stable. The clocks of domain structures were calibrated by associating diagnostic domain structures with multiple geological ages derived from the study of fossils and microfossils, geochemical, biochemical, and biomarker data. Remarkably, excellent linear correlations between the ages of domain structures at fold and fold superfamily levels of SCOP and geological timescales were identified and used to time fundamental evolutionary events (Wang et al., 2011). These events included the rise of planetary oxygen and episodes of organismal diversification (Wang et al., 2011; Kim et al., 2012).
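The calibration described above can be pictured as an ordinary least-squares fit between the relative age of diagnostic domains on the tree and their geological age. In the sketch below the calibration points are invented for illustration (they are not the values of Wang et al., 2011), the convention that a relative age of 0 marks the oldest domain is an assumption, and the names `fit_clock` and `to_gy` are hypothetical.

```python
def fit_clock(relative_ages, geological_ages_gy):
    """Least-squares line: geological age (Gy) as a function of relative age."""
    n = len(relative_ages)
    mean_x = sum(relative_ages) / n
    mean_y = sum(geological_ages_gy) / n
    sxx = sum((x - mean_x) ** 2 for x in relative_ages)
    syy = sum((y - mean_y) ** 2 for y in geological_ages_gy)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(relative_ages, geological_ages_gy)
    )
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical calibration points: (relative age on the tree, age in Gy).
relative_ages = [0.0, 0.25, 0.5, 0.75, 1.0]
geological_ages_gy = [3.8, 2.9, 1.9, 0.9, 0.0]

slope, intercept, r_squared = fit_clock(relative_ages, geological_ages_gy)

def to_gy(relative_age):
    """Convert a relative domain age into an estimated geological age (Gy)."""
    return slope * relative_age + intercept

print(f"slope = {slope:.2f} Gy per unit of relative age, R^2 = {r_squared:.4f}")
print(f"estimated age at relative age 0.2: {to_gy(0.2):.2f} Gy")
```

Once such a line is fitted, any domain placed on the timeline can be assigned an approximate geological age, which is how interactions are dated in the next section.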

### **THE CLOVERLEAF STRUCTURE OF tRNA UNFOLDS EARLY IN EVOLUTION, PRIOR TO THE APPEARANCE OF A FUNCTIONAL RIBOSOMAL MACHINERY**

Assuming that the age of interactions that are established between RNA and proteins is the age of the interacting components, we tracked the appearance of domains in ribonucleoprotein complexes along the evolutionary timeline and used the molecular clock of folds to link interactions to a geological timescale (**Figure 1C**). The catalytic domains of classes I and II aaRS enzymes (belonging to SCOP families c.26.1.1 and d.104.1.1, respectively) are the first to appear in the timeline ∼3.7 Gy ago (Caetano-Anollés et al., 2013). These domains harbor pre-transfer and post-transfer editing and trans-editing activities. The most ancient of these editing structures, present in the catalytic domains of TyrRS, SerRS, and LeuRS, involve interactions with the oldest type II cognate tRNAs, which harbor a long variable loop necessary for tRNA recognition (Sun and Caetano-Anollés, 2008c). While the evolutionary significance of the variable loop in tRNA-aaRS interactions is unclear (Sun and Caetano-Anollés, 2008c), its late evolutionary appearance could simply represent the shift or recruitment of an archaic interacting region of the molecule. Interactions of tRNA with the "ValRS/IleRS/LeuRS editing" domain (SCOP family b.51.1.1) (Hale et al., 1997) suggest the D arm was already present ∼3.3 Gy ago, which is derived compared to the acceptor stem (Sun and Caetano-Anollés, 2008a). The late appearance of anticodon-binding domains (beginning with SCOP family c.51.1.1) in well over half of aaRSs ∼3 Gy ago confirms that the full "bottom half" of tRNA and its anticodon loop identity elements unfolded completely before the onset of planetary oxygenation and cellular diversification ∼2.9 Gy ago.

Comparing the natural history of tRNA (Sun and Caetano-Anollés, 2008a) and the ribosome (Harish and Caetano-Anollés, 2012) within the framework of the interacting proteins shows the remarkable functional connection of the cloverleaf structure and ribosomal functionality (**Figure 1C**). The origin of r-proteins in interaction with helix 44 (the ribosomal ratchet) of the small subunit (SSU) rRNA occurred 3.3–3.4 Gy ago once the tRNA molecule unfolded its anticodon arm. This manifests in the pivotal role of one of the two earliest r-proteins, S12, in tRNA selection (anticipated by Ogle and Ramakrishnan, 2005), which is mediated by a bonding network connecting two sites in S12 to the anticodon and the CCA arm of the tRNA-elongation factor bound state (Li et al., 2008). Similarly, the full cloverleaf structure of tRNA was already present when the ribosomal peptidyl transferase center (PTC) responsible for modern protein synthesis appeared in the emerging domain V of the large subunit of rRNA 2.8–3.1 Gy ago. This is an expected outcome since the structurally mature 70–80 Å-long and 20–25 Å-wide tRNA molecule must traverse a path of ∼100 Å and physically span the intersubunit interface of the ribosomal core for the ensemble to be fully functional (Agirrezabala and Frank, 2009). Remarkably, this late development of the ribosomal core coincided with the appearance of pathways of amino acid (Kim et al., 2012) and purine nucleotide biosynthesis (Caetano-Anollés and Caetano-Anollés, 2013). This suggests that tRNA and ribosomal functionality (anticodon loop recognition, decoding, protein biosynthesis) and modern metabolic pathways for amino acids and nucleotides developed concurrently, supporting the co-evolution theory of the genetic code (Wong, 2005).

### **CONCLUSION**

The natural and overlapping history of tRNA and rRNA reveals that: (1) the tRNA cloverleaf structure unfolded prior to the appearance of a fully functional ribosomal core, (2) the primordial role of tRNA, originally linked to archaic dipeptide-forming synthetases, was coopted into modern translation functions once anticodon-loop specificities appeared concurrently with the PTC, and (3) the emergence of modern genetics unfolded relatively quickly in a period of 0.3–0.5 Gy, starting with anticodon-loop recognition once the cloverleaf structure had formed.

### **REFERENCES**


protein synthesis. *PLoS ONE* 7:e32776. doi: 10.1371/journal.pone.0032776


molecules for replication: implications for the origin of protein synthesis. *Proc. Natl. Acad. Sci. U.S.A.* 84, 7383–7387. doi: 10.1073/pnas.84.21.7383


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 March 2014; accepted: 22 April 2014; published online: 09 May 2014.*

*Citation: Caetano-Anollés G and Sun F-J (2014) The natural history of transfer RNA and its interactions with the ribosome. Front. Genet. 5:127. doi: 10.3389/fgene.2014.00127*

*This article was submitted to Non-Coding RNA, a section of the journal Frontiers in Genetics.*

*Copyright © 2014 Caetano-Anollés and Sun. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*