# **DNA POLYMERASES IN BIOTECHNOLOGY**

**Topic Editors: Andrew F. Gardner and Zvi Kelman**

MICROBIOLOGY

#### *FRONTIERS COPYRIGHT STATEMENT*

© Copyright 2007-2015 Frontiers Media SA. All rights reserved.

All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.

The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.

Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.

Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.

As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.

All copyright, and all rights therein, are protected by national and international copyright laws.

The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.

**ISSN** 1664-8714 **ISBN** 978-2-88919-455-1 **DOI** 10.3389/978-2-88919-455-1


# **DNA POLYMERASES IN BIOTECHNOLOGY**

Topic Editors: **Andrew F. Gardner,** New England Biolabs, USA **Zvi Kelman,** National Institute of Standards and Technology, USA

Polymerase image reprinted from www.neb.com (2014) with permission from New England Biolabs, Inc.

DNA polymerases are core tools for molecular biology, including PCR, whole genome amplification, DNA sequencing and genotyping. Research has focused on discovery of novel DNA polymerases, characterization of DNA polymerase biochemistry and development of new replication assays. These studies have accelerated DNA polymerase engineering for biotechnology. For example, DNA polymerases have been engineered for increased speed and fidelity in PCR while lowering amplification sequence bias. Inhibitor-resistant DNA polymerase variants enable PCR directly from tissue (e.g., blood). The design of DNA polymerases that efficiently incorporate modified nucleotides has been critical for the development of next generation DNA sequencing, synthetic biology and other labeling and detection technologies. The Frontiers in Microbiology Research Topic on DNA polymerases in Biotechnology aims to capture current research on DNA polymerases and their use in emerging technologies.

# Table of Contents


*DNA Polymerases Drive DNA Sequencing-by-Synthesis Technologies: Both Past and Present*

Cheng-Yao Chen

*DNA Polymerases Engineered by Directed Evolution to Incorporate Non-Standard Nucleotides*

Roberto Laos, J. Michael Thomson and Steven A. Benner

*A Novel Thermostable Polymerase for RNA and DNA Loop-Mediated Isothermal Amplification (LAMP)*

Yogesh Chander, Jim Koelbl, Jamie Puckett, Michael J. Moser, Audrey J. Klingele, Mark R. Liles, Abel Carrias, David A. Mead and Thomas W. Schoenfeld

# DNA polymerases in biotechnology

#### *Andrew F. Gardner<sup>1</sup>\* and Zvi Kelman<sup>2,3</sup>*

*<sup>1</sup> New England Biolabs Inc., Ipswich, MA, USA*

*<sup>2</sup> National Institute of Standards and Technology, Gaithersburg, MD, USA*

*<sup>3</sup> Institute for Bioscience and Biotechnology Research, Rockville, MD, USA*

*\*Correspondence: gardner@neb.com*

#### *Edited by:*

*John R. Battista, Louisiana State University and A & M College, USA*

#### *Reviewed by:*

*Katarzyna Bebenek, National Institute of Environmental Health Sciences - National Institutes of Health, USA*

#### **Keywords: DNA polymerase, DNA polymerase evolution, DNA polymerase fidelity, DNA sequencing, molecular diagnostics, next generation sequencing, PCR, PCR inhibitors**

Accurate duplication of parental DNA is a fundamental biological process, conserved in function across all life forms. All organisms depend on DNA polymerases for genome replication and maintenance. DNA polymerases also play central roles in modern molecular biology and biotechnology, enabling techniques including DNA cloning, the polymerase chain reaction (PCR), DNA sequencing, single nucleotide polymorphism (SNP) detection, whole genome amplification (WGA), synthetic biology, and molecular diagnostics. Each of these applications relies on the ability of polymerases to duplicate DNA, yielding a product that accurately represents the initial input. This book on "DNA Polymerases in Biotechnology" focuses on how detailed understanding of DNA polymerase structure and function informs protein engineering efforts, leading to development of novel reagents for molecular biology and clinical diagnostics.

DNA polymerases are classified into several families (A, B, C, D, X, Y) and reverse transcriptase (RT) based on primary amino acid sequence similarities (Burgers et al., 2001). The book leads off with several reviews that describe how these DNA polymerase families are evolutionarily (Makarova et al., 2014) and structurally (Doublie and Zahn, 2014) related as well as how polymerases have been utilized in biotechnology (Ishino and Ishino, 2014). Subsequent research articles build on this basic knowledge to describe how DNA polymerases are engineered as tools in biotechnology.

The best-known, and one of the earliest, DNA polymerase-based biotechnology applications is PCR. Since its development over 30 years ago, PCR has been a foundational tool for amplifying and detecting specific alleles (Erlich et al., 1991). Advances in DNA polymerase fidelity, speed, and processivity continue to improve PCR workflows for genetic analysis, cloning, and diagnostics. Several articles in the issue highlight engineered polymerases with improved properties for PCR. Elshawadfy and co-workers demonstrate that combining desirable protein domains from several DNA polymerases into a single engineered chimeric enzyme can increase both speed and processivity during PCR (Elshawadfy et al., 2014). Similarly, Yamagami et al. create novel DNA polymerases by swapping domains among DNA polymerases found in hot springs and selecting hybrid polymerases with desirable PCR properties (Yamagami et al., 2014). Castillo-Lizardo and colleagues analyze replication slippage during PCR of repeat sequences and show that the processivity clamp of DNA polymerase, proliferating cell nuclear antigen (PCNA) (Indiani and O'Donnell, 2006), reduces slippage to permit error-free replication of repeat sequences (Castillo-Lizardo et al., 2014).

As nucleic acid analysis by PCR moves toward clinical diagnostics, there is a need both for faster DNA polymerases and for polymerases capable of amplifying DNA directly from clinical samples such as tissue, blood, body fluids, or stool, to speed and simplify diagnostic workflows. Several papers characterize DNA polymerases that tolerate PCR inhibitors and allow rapid DNA amplification from clinical samples without DNA purification, thereby reducing analysis time, cost, and potential for contamination. The contribution by Killelea et al. demonstrates that a family D polymerase from *Pyrococcus abyssi* is tolerant to high concentrations of PCR inhibitors, while Arezi and colleagues describe a method to select for DNA polymerase variants that enable direct PCR from whole blood (Arezi et al., 2014; Killelea et al., 2014).

In addition to PCR, DNA polymerases play key roles in DNA sequencing technologies. Sanger DNA sequencing was used to sequence the first draft of the human genome in 2001 (Lander et al., 2001; Venter et al., 2001) and remains a standard and widespread method to determine DNA sequence. Several reviews describe the recent progress in the use of DNA polymerases for DNA sequencing. The review by Zhu examines the pivotal role of T7 DNA polymerase and its engineered derivatives in accelerating Sanger sequencing techniques (Zhu, 2014). Reha-Krantz et al. combine genetic and biochemical methods to identify T4 DNA polymerase mutants with increased processivity that, together with T4 single-stranded DNA binding protein (gp32) and the T4 processivity factors (gp45 and the gp44/62 complex) (Indiani and O'Donnell, 2006), improve Sanger sequencing of difficult DNA regions (Reha-Krantz et al., 2014). In the years since the human genome was first sequenced, new next generation sequencing methods have dramatically increased sequencing output while lowering costs (Mardis, 2011). Again, engineered DNA polymerases form the core of these next generation DNA sequencing-by-synthesis technologies. Chen reviews how DNA polymerases enable sequencing-by-synthesis technologies including the Illumina, Ion Torrent, and Pacific Biosciences platforms (Chen, 2014). The contribution by Laos et al. details how DNA polymerases have been engineered to incorporate the modified nucleotides used in DNA sequencing, genotyping, and synthesis of artificial DNA (Laos et al., 2014).

Recently, point-of-care diagnostic tests that are cheap and reliable and do not depend on specialized instruments have emerged. For example, isothermal amplification techniques such as Loop-Mediated Isothermal Amplification (LAMP) are routinely used as diagnostic tests to detect infectious disease (Njiru, 2012). Chander and co-workers describe an engineered thermostable viral polymerase with RT and DNA polymerase activities that can be used for isothermal RT-LAMP detection of RNA (Chander et al., 2014). Additionally, they demonstrate that the reaction components can be lyophilized as a dry pellet, allowing storage without refrigeration, so that the reaction may be used in the field as a simple diagnostic test for RNA viruses.

#### **FUTURE CHALLENGES**

Engineered DNA polymerases will continue to play important roles in biotechnology and the delivery of health care. Over the next several years, molecular methods that are easier, cheaper, and faster will emerge. At the same time, molecular biology will move toward analysis of low-concentration biomolecules (e.g., a single set of chromosomes). Unfortunately, tools for analysis of minute quantities of DNA are currently inadequate or technically challenging. For example, some advanced sequencing technologies (e.g., nanopore sequencing) can read extremely long DNA molecules, but methods to create long DNAs have not kept pace (Branton et al., 2008; Metzker, 2010; Shendure et al., 2011). Novel amplification techniques are also required to profile genetic variation among single cells (Navin and Hicks, 2011; Schubert, 2011), because the quantity of genomic DNA in a single cell is insufficient for direct sequencing. Therefore, DNA must first be amplified before further analysis (Kalisky and Quake, 2011; Kalisky et al., 2011).

Additionally, synthetic biology aims to design new biological systems such as genetic pathways, operons, and genomes (Montague et al., 2012) and thus may require amplification of long, chromosome-size DNA. Pathway engineering relies on assembling the DNA coding for the desired characteristics and then using a host to activate the pathway *in vivo*. Current methods for DNA assembly are limited to about 20 kb, and larger fragments must be assembled *in vivo*, which succeeds only at very low frequency and therefore limits utility. Furthermore, current DNA polymerases introduce errors during amplification, so DNA polymerases with very low error rates are needed to ensure that long amplified DNAs are exact copies of the starting material.
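To make the fidelity argument concrete, the sketch below uses a simple, commonly applied approximation (an assumption on our part, not a model taken from the articles in this Research Topic): if a polymerase makes errors independently at a rate *f* per base per duplication, a product of length *L* that has gone through *d* doublings is error-free with probability (1 - *f*)^(*L·d*), which is close to exp(-*f·L·d*). The function names and all input values are hypothetical and chosen only for illustration.

```python
import math


def error_free_fraction(error_rate: float, length_bp: int, doublings: int) -> float:
    """Approximate fraction of amplified molecules carrying no polymerase errors.

    Simplified model (assumption): every base is copied independently with
    probability `error_rate` of an error, and each product molecule descends
    from `doublings` rounds of copying.
    """
    # log1p keeps the calculation accurate when error_rate is very small
    return math.exp(length_bp * doublings * math.log1p(-error_rate))


def required_error_rate(target_fraction: float, length_bp: int, doublings: int) -> float:
    """Largest per-base error rate that still yields `target_fraction` error-free copies."""
    # Solves (1 - f) ** (length_bp * doublings) == target_fraction for f;
    # expm1 avoids cancellation because the answer is tiny.
    return -math.expm1(math.log(target_fraction) / (length_bp * doublings))


if __name__ == "__main__":
    # Hypothetical inputs: a 20 kb product amplified through 20 doublings
    for rate in (1e-5, 1e-6):
        print(rate, round(error_free_fraction(rate, 20_000, 20), 3))
    # Error rate needed for 99% error-free copies of a hypothetical 5 Mb genome
    print(required_error_rate(0.99, 5_000_000, 20))
```

Under these assumptions, an error rate of 10^-5 leaves under 2% of 20 kb products error-free, versus about 67% at 10^-6, and keeping 99% of 5 Mb copies error-free would require roughly 10^-10 errors per base. The exponential dependence on *L·d* is why the text stresses very low error rates for long amplicons.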

Therefore, novel DNA amplification systems are needed to accelerate progress in emerging technologies and to make high-fidelity *in vitro* genome analysis and manipulation routine. Engineered DNA polymerases or cellular replication machineries capable of amplifying large DNA fragments have the potential to enable single cell genomics, genome synthesis, and manipulation. This issue summarizes the known properties of various DNA polymerase systems and how DNA polymerases are currently being manipulated to meet these growing demands.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 October 2014; accepted: 13 November 2014; published online: 01 December 2014.*

*Citation: Gardner AF and Kelman Z (2014) DNA polymerases in biotechnology. Front. Microbiol. 5:659. doi: 10.3389/fmicb.2014.00659*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Gardner and Kelman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Evolution of replicative DNA polymerases in archaea and their contributions to the eukaryotic replication machinery

#### *Kira S. Makarova<sup>1</sup>, Mart Krupovic<sup>2</sup> and Eugene V. Koonin<sup>1</sup>\**

*<sup>1</sup> National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA*

*<sup>2</sup> Unité Biologie Moléculaire du Gène chez les Extrêmophiles, Institut Pasteur, Paris, France*

#### *Edited by:*

*Zvi Kelman, University of Maryland, USA*

#### *Reviewed by:*

*Thijs Ettema, Uppsala University, Sweden; Uri Gophna, Tel Aviv University, Israel*

#### *\*Correspondence:*

*Eugene V. Koonin, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A, Room 5N503, 8600 Rockville Pike, Bethesda, MD 20894, USA e-mail: koonin@ncbi.nlm.nih.gov*

The elaborate eukaryotic DNA replication machinery evolved from archaeal ancestors that themselves show considerable complexity. Here we discuss the comparative genomic and phylogenetic analysis of the core replication enzymes, the DNA polymerases, in archaea and their relationships with the eukaryotic polymerases. In archaea, there are three groups of family B DNA polymerases, historically known as PolB1, PolB2 and PolB3. All three groups appear to descend from the last common ancestor of the extant archaea, but their subsequent evolutionary trajectories seem to have differed widely. Although PolB3 is present in all archaea with the exception of Thaumarchaeota and appears to be directly involved in lagging strand replication, the evolution of this gene does not follow the archaeal phylogeny, conceivably due to multiple horizontal transfers and/or dramatic differences in evolutionary rates. In contrast, PolB1 is missing in Euryarchaeota but otherwise seems to have evolved vertically. The third archaeal group of family B polymerases, PolB2, consists primarily of proteins in which the catalytic centers of the polymerase and exonuclease domains are disrupted, so that the enzymes appear to be inactivated. The members of the PolB2 group are scattered across archaea and might be involved in repair or regulation of replication, along with inactivated members of the RadA family ATPases and an additional uncharacterized protein, all encoded within the same predicted operon. In addition to the family B polymerases, all archaea with the exception of the Crenarchaeota encode enzymes of a distinct family D, the origin of which is unclear. We examine multiple considerations that appear compatible with the possibility that family D polymerases are highly derived homologs of family B. The eukaryotic DNA polymerases show a highly complex relationship with their archaeal ancestors, including contributions of proteins and domains from both the family B and the family D archaeal polymerases.

**Keywords: DNA replication, archaea, mobile genetic elements, DNA polymerases, enzyme inactivation**

# **INTRODUCTION**

Recent experimental and comparative genomic studies on DNA replication systems have revealed their remarkable plasticity in each of the three domains of cellular life (Li et al., 2013; Makarova and Koonin, 2013; Raymann et al., 2014). In particular, archaea, members of the prokaryotic domain that gave rise to the information processing systems of eukaryotes, show striking diversity even in the core components of the replication machinery, the DNA polymerases (DNAPs) (Makarova and Koonin, 2013). The main replicative polymerases of archaea belong to the B family of palm domain DNAPs (Burgers et al., 2001), which is also widely represented in eukaryotes, in eukaryotic and bacterial viruses, and in some bacteria; however, in bacteria these polymerases appear to be of viral origin and are involved mainly in repair, whereas replication relies on a distinct, unrelated enzyme (Gawel et al., 2008). In addition to the polymerase core, which consists of three domains known as the palm, fingers and thumb, most family B DNAPs contain an N-terminal 3′-5′ exonuclease domain and a uracil-recognition domain (Hopfner et al., 1999; Steitz and Yin, 2004; Rothwell and Waksman, 2005; Delagoutte, 2012).

Family B DNAPs are present in all archaeal lineages, and many archaea have multiple paralogs, some of which appear to be inactivated; at least two paralogs can be traced to the Last Archaeal Common Ancestor (LACA) (Rogozin et al., 2008; Makarova and Koonin, 2013). In addition to archaeal chromosomes, family B DNAPs are encoded by several mobile genetic elements (MGEs) that replicate in archaeal cells and could contribute to horizontal transfer of DNAPs (Filee et al., 2002). In particular, family B DNAPs closely related to those of the host species are encoded by haloarchaeal head-tailed viruses such as *Halorubrum* myoviruses HF1, HF2 (Filee et al., 2002; Tang et al., 2002) and HSTV-2 (Pietila et al., 2013), whereas more diverged, protein-primed family B DNAPs have been identified in other haloviruses such as His1 and His2 (Bath et al., 2006). More recently, family B DNAPs have been identified in a new group of self-synthesizing mobile elements called casposons, so named because they apparently employ Cas1, originally known as a component of the CRISPR-Cas immunity systems, as their integrase (Makarova et al., 2013; Krupovic et al., 2014a).

In addition to the family B polymerases, most archaeal lineages, with the exception of the Crenarchaeota, encode the unique family D DNAP (Cann et al., 1998), which accordingly can be inferred to have been present in LACA. The family D polymerases consist of two subunits. The large subunit, DP2, is a multidomain protein that forms a homodimer responsible for the polymerase activity (Shen et al., 2001; Matsui et al., 2011). Apart from its two C-terminal Zn-finger domains, DP2 shows no significant sequence similarity to any other proteins. The structure of the complete DP2 protein has not yet been solved, but the structure of its N-terminal domain reveals a unique fold (Matsui et al., 2011). The small subunit, DP1, contains at least two domains: an ssDNA-binding OB-fold and a 3′-5′ exonuclease domain of the metallophosphatase (MPP) family. DP1 is the ancestor of the small B subunits of eukaryotic replicative DNAPs, which, however, have lost the catalytic amino acid residues of the 3′-5′ exonuclease (Aravind and Koonin, 1998; Klinge et al., 2009). Evidence has been presented that in euryarchaea the family D DNAP specializes in the synthesis of the lagging strand, whereas the family B DNAP, PolB3, is involved in leading strand synthesis (Henneke et al., 2005). However, at least in *Thermococcus kodakarensis*, the family D DNAP is sufficient for the replication of both strands (Cubonova et al., 2013). The Crenarchaeota lack the family D DNAP but possess at least one additional active DNAP of the B family, suggesting that the two distinct B family DNAPs specialize in leading and lagging strand replication, respectively, as is the case in eukaryotes. In particular, biochemical data suggest that in *Sulfolobus solfataricus* one family B polymerase (PolB1/Dpo1) is responsible for leading strand synthesis, whereas the other, PolB3/Dpo3, is involved in lagging strand synthesis (Bauer et al., 2012).

Some crenarchaeal and euryarchaeal plasmids encode palm domain polymerases of the archaeo-eukaryotic primase superfamily (Iyer et al., 2005), known as prim-pol, but in these plasmids the protein apparently is employed for initiation of replication rather than elongation (Iyer et al., 2005; Lipps, 2011; Krupovic et al., 2013; Gill et al., 2014).

Here we summarize the results of an updated comparative genomic and phylogenetic analysis of archaeal polymerases, focusing primarily on the diversity of family B, including the polymerases associated with proviruses and mobile elements, and discuss their evolutionary relationships with eukaryotic DNAPs.

# **COMPARATIVE GENOMIC AND PHYLOGENETIC ANALYSIS OF ARCHAEAL DNA POLYMERASES**

#### **PHYLOGENY, DOMAIN ARCHITECTURE AND GENE NEIGHBORHOODS OF B FAMILY DNAPs IN ARCHAEA**

Using the latest update of archaeal clusters of orthologous genes (arCOGs) (Wolf et al., 2012), which includes 168 complete archaeal genome sequences (RefSeq update as of February 2014), we reconstructed a phylogenetic tree of family B polymerases for a representative set of archaeal genomes and analyzed their gene context (**Figure 1**). One of the selected sequences (YP\_006773615 from *Candidatus* Nitrosopumilus koreensis) belongs to the distinct protein-primed DNAP family (see discussion below) and was therefore used as an outgroup (**Figure 1**). Another protein (YP\_007906966 from *Archaeoglobus sulfaticallidus*) is extremely diverged and poorly alignable and therefore was not included in the tree reconstruction. Consistent with previous observations (Edgell et al., 1998; Rogozin et al., 2008), the tree encompassed three large branches: (i) PolB3, the "major" DNAP, present in all archaea except Thaumarchaeota; (ii) PolB1, the "minor" DNAP, present only in the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota and Korarchaeota) superphylum (Guy and Ettema, 2011; Martijn and Ettema, 2013); and (iii) PolB2, a distinct family of DNAP homologs, most of which appear to be inactivated, as inferred from the replacement of the catalytic amino acid residues (Rogozin et al., 2008), and which show a patchy distribution across most archaeal lineages (**Figures 1**, **2**, Supplementary Table S1).

Despite its presence in most archaeal genomes, the PolB3 branch shows little topological congruence with the archaeal phylogeny, which was established primarily through phylogenetic analysis of multiple components of the translation, transcription and replication systems (Guy and Ettema, 2011; Yutin et al., 2012; Podar et al., 2013; Raymann et al., 2014). The deviations include the polyphyly of Euryarchaeota, Methanomicrobia, and Thermoplasmatales, and the paraphyly of Sulfolobales-Desulfurococcales with respect to Thermoproteales. These discrepancies suggest that the history of archaeal family B DNAPs included multiple horizontal gene transfer (HGT) events and/or major accelerations of evolution. No recent duplications are observed within this group of polymerases, but some archaea possess two versions of PolB3 that could have different origins. In particular, acquisition of two versions of PolB3 (one from Archaeoglobales and another from Thermoplasmatales), followed by loss of the ancestral methanomicrobial gene, seems likely for the genus *Methanocella*.
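The text does not specify how topological congruence was assessed. As one illustration of what "little congruence" can mean quantitatively, gene-tree versus species-tree disagreement is often summarized with the Robinson-Foulds distance, the number of clades present in one tree but not the other. The sketch below computes the rooted version for trees written as nested Python tuples; the taxon labels A to E are placeholders rather than archaeal lineages, and both trees are invented.

```python
from typing import FrozenSet, Set, Tuple, Union

# A tree is either a leaf label or a tuple of subtrees (rooted, possibly multifurcating)
Tree = Union[str, Tuple["Tree", ...]]


def collect_clusters(tree: Tree, clusters: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every internal node to `clusters`; return this node's leaves."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(collect_clusters(child, clusters) for child in tree))
    clusters.add(leaves)
    return leaves


def rooted_rf_distance(tree_a: Tree, tree_b: Tree) -> int:
    """Count clusters (clades) that are present in exactly one of the two trees."""
    clusters_a: Set[FrozenSet[str]] = set()
    clusters_b: Set[FrozenSet[str]] = set()
    leaves_a = collect_clusters(tree_a, clusters_a)
    leaves_b = collect_clusters(tree_b, clusters_b)
    if leaves_a != leaves_b:
        raise ValueError("both trees must contain the same taxa")
    # The root cluster (all taxa) is shared by definition, so it is ignored
    clusters_a.discard(leaves_a)
    clusters_b.discard(leaves_b)
    return len(clusters_a ^ clusters_b)


if __name__ == "__main__":
    species_tree = ((("A", "B"), "C"), ("D", "E"))  # hypothetical reference topology
    gene_tree = ((("A", "C"), "B"), ("D", "E"))     # hypothetical gene tree with one conflict
    print(rooted_rf_distance(species_tree, gene_tree))
```

Here the gene tree groups A with C instead of B, so the clade {A, B} appears only in the species tree and {A, C} only in the gene tree, giving a distance of 2. Identical topologies score 0; a gene with a history of repeated HGT, as proposed for PolB3, would be expected to score high against the organismal tree.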

Several groups of archaea contain intein insertions in the PolB3 gene, up to three per gene (Perler, 2002). Inteins are parasitic genetic elements that insert into protein-coding genes, perform self-splicing at the protein level and typically encode an endonuclease that mediates propagation of the intein gene into ectopic DNA sites (Perler et al., 1994; Gogarten et al., 2002). Most intein insertion sites in PolB3 genes are shared between different archaea, but some are lineage-specific (Perler, 2002; MacNeill, 2009). It appears likely that the split PolB3 genes in Methanobacteriales (Kelman et al., 1999) evolved as a result of erratic intein excision, especially considering that in the tree these split DNAP genes cluster with those of Methanococcales and Thermococcales, both of which contain inteins in their PolB3 genes (**Figure 1**). Similarly, a split PolB gene, in this case with the two parts non-adjacent, is found in *Nanoarchaeum equitans*, where it could be trans-spliced via an intein whose two parts are associated with the two split gene fragments (Perler, 2002; Choi et al., 2006). In the recently sequenced nanoarchaeon Nst1, the orthologous PolB3 gene is not split (Podar et al., 2013), suggesting that the intein insertion and the gene split occurred late in the evolution of the Nanoarchaeota.

In most archaea, PolB3 genes do not form conserved genomic neighborhoods. The only notable exception is a conserved genomic context of this gene in most crenarchaea that includes the bacterial-type DNA primase *dnaG*; however, the *polB3* and *dnaG* genes are oriented convergently and accordingly are transcribed from different promoters. In addition, in all haloarchaeal genomes, PolB3 might be co-regulated with three uncharacterized genes that are specific to this group of archaea (e.g., HVO\_0855-HVO\_0857 from *Haloferax volcanii*); the protein product of one of these genes (HVO\_0855) contains a helix-turn-helix DNA-binding domain, suggesting that it could be a regulator of PolB3 transcription (**Figure 1**).

#### **FIGURE 1 | Phylogenetic analysis of the polymerase B family in archaea.**

The MUSCLE program (Edgar, 2004) was used for construction of sequence alignments. The tree was reconstructed using the FastTree program (Price et al., 2010) (179 sequences and 209 aligned positions). The complete tree is available in Supplementary Figure S3. The sequences are denoted by their GI numbers, species names, RefSeq genome UID numbers and the arCOG numbers to which the respective proteins are currently assigned. Several branches are collapsed and shown as triangles denoted by the respective lineage taxonomy name. Color code: Euryarchaeota, dark blue, with the exception of Halobacteria, which are shown in orange; Crenarchaeota, light blue; deeply branched archaeal lineages (Thaumarchaeota, Korarchaeota, Nanoarchaeota), purple; Nanoarchaea, red. The conserved neighborhoods (if any) are shown on the right side of the tree for the respective branches. Homologous genes are shown by arrows of the same color; genes are shown approximately to scale. Color code: polymerase genes are shown with a red outline, inteins are shown by yellow triangles, uncharacterized genes are rendered in gray. The arCOG numbers are provided underneath the respective gene arrows for all non-polymerase genes. Abbreviations: arORC2, ORC/CDC6 AAA+ ATPases, arORC2 subfamily (Makarova and Koonin, 2013); HTH, helix-turn-helix; P.AW, the conserved motif for the respective uncharacterized protein.
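The conclusion that the convergent *polB3* and *dnaG* genes use different promoters follows from gene orientation alone: only co-directional neighbors can be read from a single upstream promoter. A minimal sketch of that classification is shown below; the `Gene` record, its fields and the coordinates in the example are hypothetical and are not taken from any annotated genome.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical minimal gene record: chromosome coordinates plus strand."""
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def pair_orientation(a: Gene, b: Gene) -> str:
    """Classify two neighboring genes as co-directional, convergent or divergent."""
    for gene in (a, b):
        if gene.strand not in ("+", "-"):
            raise ValueError(f"unknown strand {gene.strand!r} for {gene.name}")
    left, right = sorted((a, b), key=lambda g: g.start)
    if left.strand == right.strand:
        return "co-directional"  # both genes are read in the same direction
    if left.strand == "+":
        return "convergent"      # 3' ends face each other
    return "divergent"           # 5' ends face each other; each promoter lies in the shared gap


def can_share_promoter(a: Gene, b: Gene) -> bool:
    """Only co-directional neighbors can be transcribed from one upstream promoter."""
    return pair_orientation(a, b) == "co-directional"


if __name__ == "__main__":
    pol = Gene("polB3", 1_000, 3_400, "+")  # hypothetical coordinates
    primase = Gene("dnaG", 3_550, 4_900, "-")
    print(pair_orientation(pol, primase), can_share_promoter(pol, primase))
```

Co-directional orientation is necessary but not sufficient for co-transcription, so the haloarchaeal PolB3 neighbors described above remain a prediction ("might be co-regulated") until transcription is measured.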

The second major branch of archaeal family B DNAPs includes the replicative polymerases of the PolB1 group, which is represented in all members of the TACK superphylum (**Figure 2**). Most Thaumarchaeota possess only this form of active family B DNAP, whereas *Korarchaeum* and Crenarchaeota encode both PolB3 and PolB1. In striking contrast to PolB3, the topology of this branch is almost fully consistent with the archaeal phylogeny, indicative of a primarily vertical mode of evolution of this gene. So far, inteins in the PolB1 gene have been found only in *Nitrososphaera gargensis*, which carries two (**Figure 1**).

The third large group of family B DNAPs, which includes the experimentally characterized PolB2/Dpo2 of *S. solfataricus*, shows a patchy distribution in archaea, but the group is growing rapidly as newly sequenced genomes, including those of several bacteria, are found to encode this gene (Rogozin et al., 2008). The PolB2 family is currently represented in Crenarchaeota, Euryarchaeota and also in *Caldiarchaeum subterraneum*, the only known member of the putative phylum Aigarchaeota (**Figure 1**). The topology of this branch is generally consistent with a predominantly vertical mode of evolution, along with multiple losses in several archaeal lineages. It appears likely that in this group the deviations from the archaeal phylogeny are due primarily to increased rates of evolution of this gene in some lineages (**Figure 1**). Thus, along with the PolB3 and PolB1 groups, PolB2 probably was already represented in LACA. Sequence comparison of this subfamily with other family B DNAPs shows that, in most members, multiple catalytic residues of both the polymerase and the exonuclease domains are replaced, suggesting that these proteins are inactivated DNAPs (Rogozin et al., 2008). However, very weak activities of both enzymatic domains have been reported for a single member of this group, PolB2/Dpo2 of *S. solfataricus* (Choi et al., 2011).
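The inference of inactivation rests on checking whether residues that are catalytic in an active reference polymerase are conserved at the corresponding alignment columns of a query. The sketch below assumes that a pairwise alignment already exists and that catalytic positions are given as 0-based indices along the ungapped reference; the function names are hypothetical and the sequences are short invented fragments, not real PolB2 or reference polymerase sequences.

```python
from typing import Dict, List, Sequence, Tuple


def reference_columns(aligned_ref: str) -> List[int]:
    """Map each ungapped reference residue (0-based) to its alignment column."""
    return [col for col, residue in enumerate(aligned_ref) if residue != "-"]


def replaced_catalytic_sites(
    aligned_ref: str, aligned_query: str, catalytic_positions: Sequence[int]
) -> Dict[int, Tuple[str, str]]:
    """Return {reference position: (reference residue, query residue)} for every
    catalytic position where the query lacks the reference residue.
    A gap ("-") in the query is reported as a replacement."""
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    columns = reference_columns(aligned_ref)
    replaced = {}
    for pos in catalytic_positions:
        col = columns[pos]
        if aligned_query[col] != aligned_ref[col]:
            replaced[pos] = (aligned_ref[col], aligned_query[col])
    return replaced


if __name__ == "__main__":
    ref = "MKYGD-TDSLE"    # invented fragment of an "active" reference
    query = "MKYGNATNSLE"  # invented query in which both aspartates are replaced
    catalytic = [4, 6]     # hypothetical catalytic Asp positions in the ungapped reference
    print(replaced_catalytic_sites(ref, query, catalytic))
```

A replacement flagged this way is only a sequence-based prediction; as the text notes, *S. solfataricus* PolB2/Dpo2 still shows very weak activity despite belonging to this group.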

A recent comparative genomic analysis identified an association between PolB2 genes, an uncharacterized gene of arCOG07300, and a *radA*-like gene in Sulfolobales (Makarova and Koonin, 2013). We analyzed the genomic neighborhoods for this family in greater detail and found that many diverged members of arCOG07300 had originally been missed because of their low sequence similarity to the proteins from Sulfolobales; these members were now detected using more sensitive methods, such as PSI-BLAST, which allowed the family to be expanded considerably (Supplementary Figure S1). The arCOG07300 proteins are small (∼90 aa), alpha-helical proteins that do not show statistically significant similarity to any available protein sequences. Three arCOGs (arCOG07763, arCOG04294, arCOG08101) found in the predicted operons with the inactivated polymerase PolB2 and arCOG07300 belong to the RadA family, but all appear to be inactivated, as judged by the substitution of key amino acid residues implicated in ATP binding and hydrolysis (Supplementary Figure S2). In one of these proteins (arCOG07763), the P-loop ATPase domain has deteriorated so severely that similarity to RadA could be detected only with highly sensitive methods such as HHpred (Supplementary Figure S2). Because the phyletic patterns of arCOG07763, arCOG04294, and arCOG08101 are complementary and the respective genes are embedded in the same genomic context, these genes appear to be orthologs that have evolved at high rates, losing readily detectable sequence similarity. Several haloarchaea possess an additional copy of a two-gene operon that consists of arCOG07763 and arCOG07300. In two *Methanocella* species, the arCOG07300 gene is also present in the same operon with predicted active DNAPs, which form the sister group to the inactivated PolB2/Dpo2 (**Figure 1**), suggesting that the functional link with arCOG07300 evolved before polymerase inactivation.
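The orthology argument combines two observations: complementary phyletic patterns (no genome carries more than one of the three RadA-like families) and a shared genomic context. A minimal sketch of both checks, using Python sets, is shown below. It is illustrative only: the genome labels and neighborhood contents are hypothetical placeholders rather than data from this analysis, and the two functions are a simplification of the reasoning, not the authors' pipeline.

```python
from itertools import combinations


def are_complementary(patterns: dict[str, set[str]]) -> bool:
    """True if no genome carries more than one of the families (pairwise-disjoint patterns)."""
    return all(a.isdisjoint(b) for a, b in combinations(patterns.values(), 2))


def share_context(neighbors: dict[str, set[str]], anchors: set[str]) -> bool:
    """True if every family's genomic neighbors include all of the anchor families."""
    return all(anchors <= nb for nb in neighbors.values())


# Hypothetical toy phyletic patterns: family -> genomes in which it is found.
patterns = {
    "arCOG07763": {"genome_A", "genome_B"},
    "arCOG04294": {"genome_C"},
    "arCOG08101": {"genome_D", "genome_E"},
}

# Hypothetical toy neighborhoods: family -> gene families in the same predicted operon.
neighbors = {
    "arCOG07763": {"PolB2", "arCOG07300"},
    "arCOG04294": {"PolB2", "arCOG07300", "arORC2"},
    "arCOG08101": {"PolB2", "arCOG07300"},
}

if are_complementary(patterns) and share_context(neighbors, {"PolB2", "arCOG07300"}):
    print("consistent with fast-evolving orthologs")
```

Neither check proves orthology on its own; it is the combination of mutually exclusive distribution and conserved neighborhood that makes the orthology interpretation plausible.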
In many euryarchaeal genomes, the neighborhood also includes an arORC2 family gene (Makarova and Koonin, 2013), which encodes an ATPase component of the origin recognition complex (**Figure 1**). The same three gene families are also linked in the several bacterial genomes that encode a PolB2 homolog. In addition to these three genes, some bacteria carry *lexA*, the master repressor gene of the SOS response, in the same predicted operon. This association implies that the putative protein complex encoded by this operon is involved in the DNA damage response. A typical example of these associations is a locus in *Leptospirillum ferriphilum* that consists of four genes, LFML04\_0990-LFML04\_0993, encoding, respectively, an "inactivated" polymerase, a homolog of arCOG07300, an inactivated *radA*, and *lexA*. Taken together, these observations indicate that PolB2/Dpo2, inactivated RadA, and arCOG07300 proteins are most likely functionally linked and could also form a complex, given that proteins encoded in evolutionarily conserved operons often interact both physically and functionally (Dandekar et al., 1998; Quax et al., 2013).

Given its wide distribution and likely ancestral origin in archaea, this complex might perform an important, albeit dispensable, function in DNA damage repair, perhaps more specifically in the elimination of stalled replication forks, and/or in the regulation of DNA replication.

The presence of the arORC2 gene in these neighborhoods in many euryarchaeal genomes implies a replication-related function (Makarova and Koonin, 2013) (**Figure 1**). Recently, it has been shown that in *Haloferax volcanii* a RadA protein is required for initiation of replication in originless cells (Hawkins et al., 2013). Although the RadA involved in this process is an active ATPase and belongs to a different family (arCOG00417), the association of the putative PolB2-inactive RadA-arCOG07300 operon with the arORC2 gene suggests that the complex of these proteins might be involved in an alternative mechanism of replication initiation or in the regulation of origin recognition. Clearly, an important step in further characterizing this predicted complex will be determining whether PolB2 possesses any enzymatic activity or, as comparative sequence analysis suggests, lacks it.

Several archaea possess another, divergent family B DNAP (arCOG04926) that is predicted to be active. Recently, it has been shown that this gene is tightly associated with several other genes, including Cas1 (a CRISPR-Cas system gene), and belongs to a new class of mobile elements called casposons (see details below). A sister branch of this family includes active polymerases from several closely related genomes of Thermoproteales and a single representative of Desulfurococcales, *Ignisphaera aggregans* (**Figure 1**). In *I. aggregans*, the DNAP gene of this group probably belongs to a provirus (see below), whereas the respective genes in Thermoproteales do not display any conserved genomic associations and are unlikely to belong to mobile genetic elements, although their origin from such elements cannot be ruled out.

#### **DNA POLYMERASES ENCODED WITHIN INTEGRATED MOBILE ELEMENTS**

Mobile genetic elements (MGEs), such as viruses and plasmids, often encode their own genome replication proteins. In archaea, viruses from at least four different families are known to encode DNA polymerases. Tailed viruses of the order *Caudovirales* encode RNA-primed family B DNA polymerases (PolB) (Sencilo et al., 2013), whereas certain members of the families *Ampullaviridae* (Peng et al., 2007), *Fuselloviridae* (Bath et al., 2006; Krupovic et al., 2014b) and *Pleolipoviridae* (Bath et al., 2006; Pietila et al., 2012) carry genes for protein-primed PolBs. Integration into the host chromosome of MGEs that contain genes for cellular-like replication proteins can be, and often is, confused with the duplication of the *bona fide* cellular genes encoding these proteins (Krupovic et al., 2010; Forterre and Prangishvili, 2013). Therefore, careful gene neighborhood analysis is necessary to ascertain the provenance of replication protein genes in genomes of cellular organisms, especially when multiple paralogs of a given gene appear to be present.

With regard to DNAPs, two types of elements encoding distinct family B polymerases are integrated in the genomes of diverse archaea (**Figure 3**). The first group includes the recently discovered transposon-like elements called casposons (Krupovic et al., 2014a). Unlike other known mobile genetic elements, casposons apparently rely on Cas1 endonucleases, the key enzymes of prokaryotic CRISPR-Cas immunity (hence the name), for integration into the cellular genome. These elements are found in both bacteria and archaea. Casposons are 7–20 kb in length and are flanked by terminal inverted repeats and target site duplications (**Figure 3**). Three families of casposons have been defined based on the phylogenetic analysis of the Cas1 endonucleases, gene content, and taxonomic distribution. Family 1 casposons have thus far been found exclusively in Thaumarchaeota (4 elements) and encode protein-primed PolBs (pPolBs) that are most closely related to the corresponding proteins of the archaeal viruses His1 (*Fuselloviridae*) and His2 (*Pleolipoviridae*). Phylogenetic analysis of the viral and casposon pPolBs suggests that pPolB genes have been exchanged between these two types of MGEs (Krupovic et al., 2014a).

Casposons of families 2 and 3 encode typical RNA-primed PolBs and are found in the genomes of euryarchaea (11 casposons) and bacteria (4 casposons), respectively. In the phylogenetic tree of PolB, these bacterial and archaeal casposons form a clade that emerges as a sister group to the DNAPs of different species of the crenarchaeal class Thermoprotei (**Figure 1**). Notably, in the latter group, the PolB of *Ignisphaera aggregans* DSM 17230 is also encoded within an integrated mobile element, which, however, is unrelated to the casposons (see below).

The second type of PolB-encoding MGEs includes two elements, IgnAgg-E3 (24.7 kb) and ArcSul-E1 (21.2 kb), found in the genomes of the crenarchaeon *I. aggregans* and the euryarchaeon *Archaeoglobus sulfaticallidus* PM70-1, respectively (**Figure 3**). These two elements share genes neither with each other nor with known archaeal viruses or plasmids (a detailed description of IgnAgg-E3 and ArcSul-E1 will be published elsewhere) and accordingly could be founding members of two new groups of MGEs.

**FIGURE 3 | Genome maps of archaeal PolB-encoding mobile genetic elements. (A)** Casposons of families 1 and 2. NitAR1-C1 is present in the genome of *Candidatus Nitrosopumilus koreensis* AR1 (NC\_018655; nucleotide coordinates: 655308 to 663492), whereas MetMaz-C1 is from *Methanosarcina mazei* Go1 (NC\_003901; nucleotide coordinates: 3946601 to 3956653). **(B)** Tyrosine recombinase-encoding elements. IgnAgg-E3 is found in the genome of *Ignisphaera aggregans* DSM 17230 (NC\_014471; nucleotide coordinates: 1844012 to 1868704) and ArcSul-E1 is from *Archaeoglobus sulfaticallidus* PM70-1 (NC\_021169; nucleotide coordinates: 873590 to 894826). Predicted protein-coding genes are shown as arrows that indicate the direction of transcription. Genes for PolBs are shown in red, *cas1* genes in cyan, and genes for tyrosine recombinases in blue. Abbreviations: TIR, terminal inverted repeats; att, attachment site; ZBD, zinc-binding domain-containing protein; HNH, HNH family endonuclease; (w)HTH, (winged) helix-turn-helix proteins; RBD, RNA-binding domain.
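As a quick consistency check, the element sizes can be recomputed from the nucleotide coordinates listed in the Figure 3 legend. This sketch assumes the coordinates are 1-based and inclusive (the usual GenBank convention), so that length = end − start + 1; the dictionary and function names are illustrative.

```python
# Element coordinates (start, end) copied from the Figure 3 legend.
# Assumption: 1-based, inclusive coordinates, so length = end - start + 1.
ELEMENTS = {
    "NitAR1-C1": (655308, 663492),    # casposon, family 1
    "MetMaz-C1": (3946601, 3956653),  # casposon, family 2
    "IgnAgg-E3": (1844012, 1868704),  # tyrosine recombinase-encoding element
    "ArcSul-E1": (873590, 894826),    # tyrosine recombinase-encoding element
}


def length_kb(start: int, end: int) -> float:
    """Length in kilobases of an inclusive [start, end] interval."""
    return (end - start + 1) / 1000


for name, (start, end) in ELEMENTS.items():
    print(f"{name}: {length_kb(start, end):.1f} kb")
```

Under this assumption, the two casposons come out at about 8.2 and 10.1 kb, within the 7–20 kb range stated for casposons, and the two tyrosine recombinase-encoding elements at 24.7 and 21.2 kb, matching the sizes given for IgnAgg-E3 and ArcSul-E1.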

#### **EVOLUTIONARY RELATIONSHIPS OF ARCHAEAL AND EUKARYOTIC DNA POLYMERASES**

Based on the above considerations and the respective phyletic patterns, three family B polymerases, PolB1, PolB2 ("inactivated" DNAPs) and PolB3, could be projected to LACA. In addition to these three families, two subunits of family D polymerase, arCOG04455 and arCOG04447, and a family Y polymerase, arCOG04582 (see the respective phyletic patterns in Supplementary Table S1), also are likely to be ancestral. A polymerase of family X, although common in archaea, cannot be projected to LACA with confidence (Wolf et al., 2012). The latter two polymerases (families X and Y) are unlikely to be involved in genome replication. In bacteria and eukaryotes, members of both families have been thoroughly characterized and shown to function in DNA repair (Jarosz et al., 2007; Moon et al., 2007; Silverstein et al., 2010; Sharma et al., 2013). The experimental data suggest that PolB1 and the family D DNAP are the main replicative polymerases in crenarchaea and euryarchaea, respectively, whereas PolB3 appears to be involved in the replication of the lagging strand in most archaea (Cubonova et al., 2013).

Most eukaryotes possess four paralogous family B DNAPs denoted Pol-α, Pol-δ, Pol-ε, and Pol-ζ, four family Y polymerases (Yang, 2014), four family X polymerases (Bebenek et al., 2014) and two family A polymerases involved in mitochondrial replication and DNA repair (Burgers et al., 2001). All these polymerases seem to have been present in the last eukaryotic common ancestor (LECA). The functions of family B polymerases in eukaryotes are diversified: Pol-ε is the main replicative polymerase specialized in the replication of the leading strand, Pol-δ replicates the lagging strand, Pol-α is the main component of the eukaryote-specific primase complex, which synthesizes short DNA primers during the lagging strand replication (Kunkel and Burgers, 2008; Pavlov and Shcherbakova, 2010 and references therein), and Pol-ζ is involved in lesion bypass (Sharma et al., 2013). Furthermore, the functions of all family B DNAPs in eukaryotes require an additional small subunit, the same for all family B DNAPs (Bell and Dutta, 2002).

Domain architectures and the relationships between archaeal and eukaryotic replicative polymerase families are shown schematically in **Figure 4A**. The small subunits of the eukaryotic family B DNAPs evolved from the small subunit (DP1) of the archaeal family D polymerase, which in archaea is a 3′–5′ exonuclease of the MPP superfamily that appears to be involved in proofreading during DNA replication (Aravind and Koonin, 1998; Jokela et al., 2004). However, the homologous small subunit of the eukaryotic DNAPs has lost the catalytic amino acid residues and performs an architectural role in the DNAP complex (Aravind and Koonin, 1998; Klinge et al., 2009; Yamasaki et al., 2010).

**FIGURE 4 | Reconstruction of the complements of replicative DNAPs in the last archaeal and eukaryotic ancestors and a hypothetical scenario of their evolutionary relationships. (A)** Polymerase B and D family genes projected to archaeal (LACA) and eukaryotic (LECA) last common ancestors and their domain organization. Homologous domains are shown by shapes of the same color. Inactivated domains are crossed. For eukaryotic polymerase families, human and yeast gene names are provided. **(B)** The unrooted phylogenetic tree of active family B polymerases. Multiple sequence alignments were constructed using the MUSCLE program (Edgar, 2004), and the tree was reconstructed using the FastTree program (Price et al., 2010) (141 sequences and 264 aligned positions). The complete tree is available in Supplementary Figure S4. The tree is rendered as a scheme, with all major groups collapsed. **(C)** The inferred evolutionary relationships between archaeal and eukaryotic replicative DNAPs. Details on the involvement of PolD in the evolution of eukaryotic DNAPs are discussed in the text. The question mark denotes uncertainty in the evolutionary scenario.

The evolutionary relationships between the polymerase subunits themselves are much more difficult to establish because of the multiplicity of paralogs in both archaea and eukaryotes and the apparent differences in evolutionary rates, which result in poorly resolved phylogenetic trees (Edgell et al., 1998; Filee et al., 2002; Tahirov et al., 2009). Furthermore, because considerably different sets of sequences and different tree reconstruction methods were employed, the results of different analyses are not directly comparable. The only observation that seems to be fully consistent is the grouping of eukaryotic polymerases δ and ζ. We made another attempt to reconstruct a phylogenetic tree of the family B DNAPs, including only major branches of active polymerases (hence excluding PolB2) from archaea, eukaryotes, and bacteria, and using an updated, representative set of sequences (**Figure 4B**). In the resulting tree, most of the deep branches are poorly resolved and unstable, depending on the set of sequences used and the method of tree reconstruction (data not shown). The only additional observation that appears reliable is the confident grouping of PolB3 from several Methanomicrobia with the eukaryotic branch containing DNAPs δ and ζ (**Figure 4B**); this affinity is supported by the relatively high BLAST scores of the pairwise alignments of these sequences to eukaryotic polymerases compared with other archaeal polymerases (Supplementary Table S2). However, the PolB3 sequences from these Methanomicrobia lack the two Zn fingers at the C-terminus, a synapomorphy of the eukaryotic family B DNAPs that is also present in the archaeal PolD (Tahirov et al., 2009) (**Figures 1**, **4A** and see discussion below).
If the grouping of PolB3 from Methanomicrobia with DNAPs δ and ζ reflects an actual evolutionary event, then a complicated scenario would have to be proposed, including acquisition of a eukaryotic polymerase by the ancestor of this group of organisms, loss of the "original" PolB3, and loss of the C-terminal Zn fingers in the acquired polymerase. The alternative is the even less plausible scenario whereby the common ancestor of the eukaryotic DNAPs δ and ζ evolved from an unknown variant of the methanomicrobial PolB3 that contained at least one Zn finger; however, this scenario contradicts the recent conclusions on the origin of eukaryotes from the archaeal TACK superphylum (Martijn and Ettema, 2013; Koonin and Yutin, 2014). Given the complexity of these scenarios, the possibility should be considered that, the apparent strong support notwithstanding, the eukaryote-methanomicrobial affinity is yet another tree reconstruction artifact caused by large differences in evolutionary rates among branches.

Thus, we have to conclude that phylogenetic analysis fails to resolve the evolutionary relationships between archaeal and eukaryotic family B DNAPs. Could any other considerations help in understanding the origin of the family B DNAPs that are responsible for eukaryotic DNA replication? In particular, this puzzle cannot be solved in full without uncovering the provenance of the family D polymerases, especially given that the DP1 subunit clearly made it to LECA and is an indispensable component of all replicative family B polymerases in eukaryotes (Yamasaki et al., 2010), whereas the DP2 subunit appears to have been lost. Furthermore, there is significant, specific sequence similarity between the C-terminal Zn fingers of Pol-ε and DP2 (Tahirov et al., 2009). Any scenario that strives to accommodate all these findings requires intricate chains of events (Tahirov et al., 2009).

An intriguing possibility is suggested by the conservation of several aspartate residues in the catalytic domain of DP2, including the DxD motif that is present in all palm domain polymerases and is involved in binding an essential divalent cation (Cann et al., 1998). This observation might indicate that, notwithstanding the absence of readily detectable sequence similarity, DP2 is a highly derived homolog of family B DNAPs. This hypothesis appears to accommodate all available facts in the simplest possible fashion. The fact that the small subunit of the family D DNAP, DP1, is the readily detectable ortholog of the B subunit shared by all eukaryotic family B polymerases is also compatible with this scenario. It has been shown that the eukaryotic Pol-ε consists of an N-terminal DNAP domain, in which all major catalytic motifs of family B are conserved, and a C-terminal DNAP domain, in which most of these motifs are disrupted, suggestive of inactivation (Tahirov et al., 2009). The present hypothesis could account for the origin of Pol-ε as a fusion of an ancestral form of DP2 (before its period of accelerated evolution), which would give rise to the active N-terminal domain of Pol-ε, followed by the insertion of an inactivated PolB2 domain between the active N-terminal domain and the Zn finger (**Figure 4**). The N-terminal polymerase domain of Pol-ε shows a pattern of insertions and deletions that is distinct from those in all other family B DNAPs, which is compatible with a distinct origin (Tahirov et al., 2009). The accelerated evolution of PolB1 at the origin of DP2 might have occurred within a viral genome, followed by reintroduction of this evolved gene into an ancestral euryarchaeal lineage via the so-called host-to-virus-to-host transfer loop, as has been proposed for the replicative MCM helicases of Methanococcales (Krupovic et al., 2010).
In functional terms, this hypothesis is compatible with the fact that Pol-ε is the leading-strand polymerase in eukaryotes. Obviously, this hypothesis will be put to the test when the structure of the catalytic domain of DP2 is solved. Furthermore, it remains possible that genome sequencing of currently uncharacterized, deep branches of archaea will identify novel DNAPs that help clarify the relationships between the B and D families and, possibly, other aspects of DNAP evolution.

# **CONCLUSIONS**

The DNAPs comprise the core of the DNA replication machinery, which performs one of the key functions in every cellular life form and in many viruses and mobile elements. Other genes involved in key information-processing functions, such as the core components of the translation and transcription systems, have highly conserved sequences, are rarely duplicated, and do not seem to experience major accelerations of evolution (Koonin, 2003; Puigbo et al., 2009). Therefore, reconstructing the evolution of the respective systems is a relatively straightforward task. Despite their major biological importance, the DNAPs evolve under a different regime that appears to involve multiple duplications, gene losses, horizontal gene transfers, and domain rearrangements. Moreover, inactivated DNAPs seem to have adopted new functions, the exact nature of which remains to be elucidated. The complexity of DNAP evolution is likely to stem partly from functional differentiation, because in archaea and eukaryotes the lagging and leading strands are replicated by distinct DNAPs. Another important factor is the common presence of DNAPs in viruses and other mobile genetic elements that can transfer DNAP genes between cellular organisms, providing an environment conducive to accelerated evolution and possibly replacing the original genes.

The starkest manifestation of the complexity of DNAP evolution is the intricate relationship between the archaeal and eukaryotic replication machineries. Here we proposed a parsimonious evolutionary scenario under which the archaeal family D of DNAPs is a highly derived form of family B. However, the available data are also compatible with various other scenarios that would involve contributions from different archaeal DNAPs and possibly also viruses.

#### **ACKNOWLEDGMENTS**

Eugene V. Koonin and Kira S. Makarova are supported by intramural funds of the US Department of Health and Human Services (to the National Library of Medicine). Mart Krupovic was partly supported by the European Molecular Biology Organization (ASTF 82-2014).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fmicb.2014.00354/abstract

#### **REFERENCES**


environments from five new Thermococcus plasmids. *PLoS ONE* 8:e49044. doi: 10.1371/journal.pone.0049044


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 May 2014; accepted: 24 June 2014; published online: 21 July 2014.*

*Citation: Makarova KS, Krupovic M and Koonin EV (2014) Evolution of replicative DNA polymerases in archaea and their contributions to the eukaryotic replication machinery. Front. Microbiol. 5:354. doi: 10.3389/fmicb.2014.00354*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Makarova, Krupovic and Koonin. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 25 August 2014 doi: 10.3389/fmicb.2014.00444

# *Sylvie Doublié\* and Karl E. Zahn*

*Department of Microbiology and Molecular Genetics, University of Vermont, Burlington, VT, USA*

#### *Edited by:*

*Andrew F. Gardner, New England Biolabs, USA*

#### *Reviewed by:*

*Erik Johansson, Umeå University, Sweden*

*Aneel Aggarwal, Mount Sinai School of Medicine, USA*

#### *\*Correspondence:*

*Sylvie Doublié, Department of Microbiology and Molecular Genetics, University of Vermont, 89 Beaumont Drive, Given E314A, Burlington, VT 05403, USA e-mail: sdoublie@uvm.edu*

Three DNA polymerases of the B family function at the replication fork in eukaryotic cells: DNA polymerases α, δ, and ε. DNA polymerase α, a heterotetramer composed of two primase subunits and two polymerase subunits, initiates replication. DNA polymerases δ and ε elongate the primers generated by pol α. The DNA polymerase from bacteriophage RB69 has served as a model for eukaryotic B family polymerases for some time. The recent crystal structures of pol δ, α, and ε revealed similarities but also a number of unexpected differences between the eukaryotic polymerases and their bacteriophage counterpart, and also among the three yeast polymerases. This review will focus on their shared structural elements as well as the features that are unique to each of these polymerases.

**Keywords: DNA polymerase, B family, eukaryotic replication, fidelity, proofreading**

# **INTRODUCTION**

Replication in the nucleus of eukaryotic cells employs three DNA polymerases: polymerases α, δ, and ε (Hubscher et al., 2002; Pavlov et al., 2006b; Kunkel and Burgers, 2008; Loeb and Monnat, 2008; Burgers, 2009; Pavlov and Shcherbakova, 2010; Lange et al., 2011). DNA synthesis is directional and proceeds from 5′ to 3′: nucleophilic attack by the 3′-OH of the primer on the α phosphate of an incoming nucleotide results in the incorporation of a nucleoside monophosphate and the release of pyrophosphate (Steitz, 1999). All DNA polymerases require a primer with a free 3′-OH to conduct DNA synthesis, and pol α is no exception. Pol α is a heterotetramer composed of two primase subunits and two polymerase subunits. The primase subunits initiate DNA replication by synthesizing short (7–12 ribonucleotides) RNA primers, which are then extended by polymerase α (Pellegrini, 2012). DNA polymerases δ and ε elongate the primers generated by pol α in an accurate and processive manner (Kunkel, 2004, 2011; Pellegrini, 2012). In yeast, DNA polymerase δ has been shown to be essential for DNA synthesis on the lagging strand, whereas pol ε appears to function mainly on the leading strand (Pursell et al., 2007; Nick Mcelhinny et al., 2008; Kunkel, 2011; Georgescu et al., 2014). In contrast, in mitochondria, replication is the responsibility of a single polymerase, DNA polymerase γ (Lee et al., 2009).
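The requirement for a primer with a free 3′-OH, and the strict 5′→3′ direction of synthesis, can be illustrated with a toy string model. This is a minimal sketch under simplifying assumptions (DNA alphabet only, perfect Watson-Crick pairing, primer fully complementary to the template); the function names are hypothetical, and the model tracks only the product sequence and the one pyrophosphate released per incorporated nucleotide, not kinetics or fidelity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a 5'->3' DNA string, also written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> tuple[str, int]:
    """Extend an annealed primer to the 5' end of the template.

    Both strands are written 5'->3'. Returns the product strand and the
    number of pyrophosphates released (one per nucleotide added to the 3'-OH).
    """
    full = reverse_complement(template)  # the complete strand complementary to the template
    start = full.find(primer)
    if start == -1:
        raise ValueError("primer is not complementary to the template")
    product = primer
    ppi_released = 0
    # Synthesis only ever adds nucleotides to the 3' end of the growing strand.
    for base in full[start + len(primer):]:
        product += base
        ppi_released += 1
    return product, ppi_released


# Template 5'-ACGTTAGC-3'; primer 5'-GCT-3' pairs with the template's 3' end.
print(extend_primer("ACGTTAGC", "GCT"))  # ('GCTAACGT', 5)
```

Because extension only appends to the primer's 3' end, any template sequence lying upstream of the annealed primer is never copied, which is why a separate priming activity (the primase subunits of pol α) is needed to start each new stretch of synthesis.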

DNA polymerases are grouped into seven families (A, B, C, D, X, Y, and RT). In eukaryotes the three nuclear replicative DNA polymerases happen to belong to the B family (Burgers et al., 2001; Patel and Loeb, 2001). There are now crystal structures of all three replicative DNA polymerases from yeast, which allow for the first time a comparison of their shared structural elements as well as a study of their unique features (Swan et al., 2009; Perera et al., 2013; Hogg et al., 2014; Jain et al., 2014a). All three replicative DNA polymerases are multi-subunit enzymes (**Table 1**) (Johansson and Macneill, 2010; Pavlov and Shcherbakova, 2010; Makarova et al., 2014). The main focus of this review is on their catalytic domain, or subunit A.

#### **OVERALL STRUCTURE OF B FAMILY POLYMERASES**

All DNA polymerases share a common polymerase fold, which has been compared to a human right hand, composed of three subdomains: fingers, palm, and thumb (Steitz, 1999; Patel and Loeb, 2001). The palm, a highly conserved fold composed of four antiparallel β strands and two helices, harbors two strictly conserved catalytic aspartates located in motif A, **D**XXLYPS, and motif C, DT**D**S (Delarue et al., 1990; Braithwaite and Ito, 1993). This RRM-like fold is shared by a very large group of enzymes, including DNA and RNA polymerases, reverse transcriptases, CRISPR polymerase, and even reverse (3′–5′) transferases such as Thg1 (Anantharaman et al., 2010; Hyde et al., 2010). In contrast, the thumb and fingers subdomains exhibit substantially more structural diversity (Steitz, 1999). The fingers undergo a conformational change upon binding DNA and the correct incoming nucleotide. This movement allows residues in the fingers subdomain to come in contact with the nucleotide in the nascent base pair. The thumb holds the DNA duplex during replication and plays a part in processivity (Doublié and Ellenberger, 1998; Doublié et al., 1999).
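The two motif definitions translate directly into a pattern search for the catalytic aspartates. The sketch below is a simplification: it treats motif A as the literal consensus D..LYPS and motif C as DTDS, whereas real B family sequences vary at several of these positions, and the protein string used is a hypothetical toy sequence, not a real polymerase.

```python
import re

# Consensus motifs from the text; the offset marks the catalytic aspartate
# (shown in bold in DXXLYPS and DTDS) within each match.
MOTIFS = {
    "A": (re.compile(r"D..LYPS"), 0),
    "C": (re.compile(r"DTDS"), 2),
}


def catalytic_aspartates(protein: str) -> dict[str, list[int]]:
    """Return 1-based positions of the catalytic Asp for each motif match."""
    return {
        name: [m.start() + offset + 1 for m in pattern.finditer(protein)]
        for name, (pattern, offset) in MOTIFS.items()
    }


toy_sequence = "MKTDFNLYPSGGGDTDSAA"  # hypothetical, for illustration only
print(catalytic_aspartates(toy_sequence))  # {'A': [4], 'C': [16]}
```

A literal regex is only practical for illustration; in practice, motifs are located from a multiple sequence alignment or a profile search, which tolerates the substitutions seen across the family.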

Eukaryotic DNA polymerases α, δ, and ε share homology with many archaeal, bacterial, bacteriophage, and viral polymerases (Delarue et al., 1990; Braithwaite and Ito, 1993; Franklin et al., 2001; Firbank et al., 2008; Wang and Yang, 2009). Koonin and collaborators contributed a detailed phylogenetic analysis of archaeal DNA polymerases and their relationship with eukaryotic polymerases to this issue of Frontiers in Microbiology dedicated to polymerases (Makarova et al., 2014).

All B family polymerases are composed of five subdomains: the fingers, thumb, and palm (described above), constituting the core of the enzyme, as well as an exonuclease domain and an N-terminal domain (NTD) (Franklin et al., 2001; Xia and Konigsberg, 2014) (**Figure 1**; Table S1). The exonuclease domain carries a 3′–5′ proofreading activity, which removes misincorporated nucleotides. The exonuclease active site is located 40–45 Å away from the polymerase active site. The NTD seems to be devoid of catalytic activity. In pol δ the NTD comprises three motifs, one of which has a topology resembling an OB fold (a single-stranded DNA-binding motif) and another of which bears an RNA-binding motif (RNA Recognition Motif, or RRM) (Swan et al., 2009). In bacteriophage T4, mutations in the NTD decrease expression of the polymerase (Hughes et al., 1987). In RB69 and T4, the gp43 polymerase binds its own messenger RNA, presumably through the NTD, and represses its translation (Petrov et al., 2002), which does not seem to be the case for pol δ (Swan et al., 2009). New data indicate that the NTD plays a role in polymerase stability and fidelity through its interactions with other domains (Li et al., 2010; Prindle et al., 2013) (see below).

All mammalian B family DNA polymerases are known to harbor two cysteine-rich metal-binding sites (CysA and CysB) in their C-terminal domain (CTD) (**Figure 2**). CysA is presumed to be a zinc-binding site, whereas CysB is an iron–sulfur cluster [4Fe-4S] (Netz et al., 2012). Loss of the [4Fe-4S] cluster in the CTD of yeast pol δ negatively affects interactions with its accessory B-subunit (Sanchez Garcia et al., 2004). The zinc-binding motif was shown to be important for interaction of pol δ with its processivity factor, PCNA (Netz et al., 2012).

# **DNA POLYMERASE α**

The catalytic subunit of DNA polymerase α is composed of 1468 amino acids (**Table 2**). The protein construct designed for

#### **A MECHANISM OF DISENGAGEMENT OF THE POLYMERASE**

The RNA/DNA oligonucleotide captured in the crystals adopts an A-form conformation, as expected. The thumb domain engages in multiple interactions with the RNA primer, via both hydrophobic contacts and polar interactions (Perera et al., 2013). Experiments in solution have shown that the extension of the RNA primer by pol α is limited to 10–12 nucleotides, which corresponds to roughly one turn of the helix. This observation led the authors to suggest a mechanism for termination of primer synthesis by pol α in which loss of specific interactions between the thumb and the RNA would trigger the polymerase to disengage from the RNA/DNA oligonucleotide and allow a hand-off to a replicative polymerase.

#### **MOVEMENTS IN THE PALM DOMAIN MAY FACILITATE TRANSLOCATION OF POL α**

Crystallizing the enzyme in three states (apo, binary, and ternary) allowed the authors to overlay all three structural models. Pol α is the only eukaryotic family B DNA polymerase for which all three states have been captured in crystal structures. The structural superposition revealed that, in addition to the well-documented movements of the fingers and thumb subdomains accompanying substrate binding and nucleotidyl transfer, the palm subdomain itself undergoes a structural rearrangement (Perera et al., 2013). The authors propose that the different conformations of the palm domain could facilitate translocation of pol α along and beyond the RNA/DNA duplex. As mentioned above, loss of contacts with the RNA strand is predicted to trigger release of the primer, which then becomes available for extension by pol δ or ε.

#### **A DIFFERENT PROTEIN FOLD IN THE INACTIVATED EXONUCLEASE SUBDOMAIN**

The proofreading activity is abolished in pol α because all four catalytic carboxylates are mutated (Asp114/Glu116/Asp222/Asp327 in RB69 gp43 correspond to Ser542/Gln544/Tyr644/Asn757 in pol α in a structure-based alignment) (**Table 2**). Moreover, the β-hairpin motif found in most polymerases of the B family (residues 246–267 in RB69 gp43) is replaced by a helical region in pol α (residues 667–676 and 681–693) (Hogg et al., 2007). The β hairpin is part of the exonuclease domain and has been shown in T4 and RB69 pols to participate in partitioning the DNA primer between the polymerase and exonuclease active sites (Reha-Krantz, 1988; Stocki et al., 1995; Hogg et al., 2007). In the absence of proofreading activity, it is not surprising that this motif was not retained in pol α. Residues His684 and Phe685 of the helical region in pol α stack with a thymine and a guanine base, respectively, at positions −3 and −2 in the unpaired 5′ end of the template (Perera et al., 2013). Thus, in pol α the region corresponding to the β-hairpin motif adopts a different fold (helices vs. β strands) and a different function (stabilizing the unpaired region of the template strand rather than facilitating active site switching). Since pol α is devoid of proofreading activity, the question arises as to whether the short oligonucleotides it synthesizes are corrected, and if so, by which DNA polymerase. It appears that proofreading of the primers synthesized by pol α is performed by pol δ (Pavlov et al., 2006a).

domain appears in gold, adjacent to the 3′–5′ exonuclease domain (cyan). **(A)** Polymerase α (PDBID 4FYD) binds an RNA/DNA hybrid, where the wide, shallow minor groove of A-form DNA is apparent near the thumb. The 3′–5′ exonuclease domain is devoid of activity. A helical region (magenta) in the inactivated exonuclease domain stabilizes the 5′ end of the template. **(B)** Polymerase δ (PDBID 3IAY) harbors a large β hairpin motif (magenta), which is

# **DNA POLYMERASE δ**

Human pol δ is composed of four subunits, whereas the *Saccharomyces cerevisiae* enzyme has three (Gerik et al., 1998; Liu et al., 2000) (**Table 1**). In addition to its function in DNA replication, pol δ has been shown to play a role in DNA repair and recombination (Hubscher et al., 2002; Lee et al., 2012; Tahirov, 2012). P12, the smallest subunit of human pol δ and the subunit that is absent in budding yeast, is degraded in response to DNA damage (Lee et al., 2014). The catalytic subunit of yeast pol δ (POL3) is composed of 1097 residues. The construct used for crystallization comprises residues 67–985 and thus lacks the CTD (**Figure 1**; **Table 2**).

domain organization, is evident when the model enzyme from bacteriophage RB69 gp43 (PDBID 2OZS) is viewed along with the three eukaryotic replicative polymerases. The domain delineation for each polymerase is given in Table S1. The figure was made with PyMOL (The PyMOL Molecular Graphics System, Version 1.5.0.4 Schrödinger, LLC.).

#### **A THIRD METAL ION IN THE POLYMERASE ACTIVE SITE**

The palm domain contains three conserved carboxylates (Asp608, Asp762, and Asp764). The two catalytic aspartates, Asp608 and Asp764, contact two metal ions (Ca2+) in the polymerase active site, separated by 3.7 Å. Intriguingly, a third metal ion was observed coordinated by the γ phosphate of the incoming nucleotide and Glu802, with Glu800 in the vicinity. Mutating both glutamates to alanine yielded a polymerase variant with reduced incorporation efficiency for both correct and incorrect nucleotides (Swan et al., 2009). At these amino acid positions, pol α and pol ε also have carboxylate residues (pol δ Glu800/Glu802 correspond to pol α Asp1033/Asp1035 and to pol ε Glu945/Asp947). Whether these carboxylates play similar roles in pol α and ε remains to be investigated.

#### **HIGH FIDELITY AND PROOFREADING**

Human pol δ is a high-fidelity polymerase, catalyzing the nucleotidyl transfer reaction with an error frequency of 1 per 22,000 (Schmitt et al., 2009). Proofreading boosts the fidelity of the polymerase by a factor of 10–100 (McCulloch and Kunkel, 2008; Prindle et al., 2013). Pol δ harbors polymerase and exonuclease active sites separated by about 45 Å (Swan et al., 2009). DNA polymerases with proofreading activity are able to sense misincorporated nucleotides by contacting the minor groove of base pairs beyond the insertion site. The protein interacts with the universal hydrogen bond acceptors at the N3 position of purines and the O2 position of pyrimidines (Seeman et al., 1976; Doublié et al., 1998; Franklin et al., 2001). These hydrogen bond contacts are preserved when the base pair adopts Watson-Crick geometry and lost in the event of a mismatch. In RB69 gp43, the contacts extend to the first two base pairs beyond the nascent base pair (Franklin et al., 2001; Hogg et al., 2004, 2005). The contacts are much more extensive in pol δ, extending to five base pairs postinsertion (Swan et al., 2009), which could contribute to its high fidelity.
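The two numbers above (an error frequency of 1 per 22,000 and a 10- to 100-fold gain from proofreading) can be combined with simple arithmetic. The Python below is a minimal sketch under one stated assumption: that the quoted 1 per 22,000 is the error frequency before proofreading and that proofreading divides it by a constant factor. The function name `combined_error_frequency` is hypothetical, and this is not a model taken from the cited studies.

```python
def combined_error_frequency(base_selection_error: float, proofreading_gain: float) -> float:
    """Overall error frequency, ASSUMING proofreading divides the
    pre-proofreading error frequency by a constant factor."""
    if base_selection_error <= 0 or proofreading_gain < 1:
        raise ValueError("error must be positive and gain must be at least 1")
    return base_selection_error / proofreading_gain


# Figure quoted in the text for human pol delta (Schmitt et al., 2009),
# read here (an assumption) as the error frequency before proofreading
BASE_ERROR = 1 / 22_000

for gain in (10, 100):  # proofreading boost range quoted in the text
    overall = combined_error_frequency(BASE_ERROR, gain)
    # Express the result as "1 error per N nucleotides" for readability
    print(f"{gain:>3}-fold proofreading: about 1 error per {round(1 / overall):,} nucleotides")
```

Under that assumption the quoted range corresponds to roughly 1 error per 220,000 to 1 per 2,200,000 nucleotides; measured overall error rates depend on the assay and are not implied by this arithmetic alone.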

As mentioned above, the β-hairpin segment of the exonuclease domain plays a critical role in partitioning the DNA between the polymerization and proofreading sites in T4 and RB69 pols (Stocki et al., 1995; Hogg et al., 2007). In RB69 gp43 the β-hairpin motif adopts different conformations depending on whether the complex was obtained with undamaged DNA (Franklin et al., 2001; Zahn et al., 2007) or with damaged DNA (Freisinger et al., 2004; Hogg et al., 2004). It was fully visualized contacting both the primer and template strands in a complex with thymine glycol (Aller et al., 2011). Similarly, the β hairpin in pol δ protrudes into the major groove of the DNA and acts as a wedge between the double-stranded DNA and the single-stranded 5′ end of the template strand, which is stabilized by two aromatic residues, Phe441 and Tyr446 (**Figure 1**) (Swan et al., 2009). The position of the β hairpin is consistent with a role in active site switching.

#### **INTERDOMAIN CONTACTS AND FIDELITY**

Cancer-associated mutations are mostly found in the exonuclease domains of pol δ and ε, emphasizing the critical role of proofreading in lowering the incidence of mutations (Church et al., 2013; Henninger and Pursell, 2014). One mutation found in human colorectal cancer cells, R689W, localizes to the fingers domain. The analogous mutation in yeast (R696W) results in a mutator phenotype (Daee et al., 2010). A mutation in the vicinity of Arg696 in the highly conserved motif B of the fingers subdomain of yeast pol δ (A699Q) also results in a mutator phenotype. This region of the fingers is in close proximity to the NTD. Mutating Met540 of the NTD to alanine abolishes the mutator phenotype of A699Q, illustrating that interactions between the fingers and the NTD can affect the fidelity of the polymerase (Prindle et al., 2013). Similarly, in T4 and RB69 pols the NPL core motif, which involves residues from the N-terminal and palm domains, is in contact with the fingers domain and was shown to stabilize polymerase-DNA complexes (Li et al., 2010).

**Table 2 | Compilation of DNA polymerases of the B family of known**

# **DNA POLYMERASE ε**

The catalytic subunit of DNA polymerase ε is the product of a very large gene (2222 amino acids in yeast; 2286 in humans) and is third in size only after polymerase ζ (also a member of the B family) and pol θ, a family A polymerase (3130 and 2590 amino acids, respectively, in humans) (Lange et al., 2011; Hogg and Johansson, 2012) (**Figure 1**; **Table 2**). Pol ε is twice as large as pol δ and is composed of two tandem polymerase/exonuclease regions. The N-terminal segment harbors both polymerase and proofreading activities, whereas the C-terminal segment is inactive. The two exonuclease-polymerase modules are distantly related (Tahirov et al., 2009). Although the inactive segment is presumed to play a structural role during replication, two groups were able to crystallize catalytically active pol ε constructs (residues 1–1228 and 1–1187) lacking the entire C-terminal module (Hogg et al., 2014; Jain et al., 2014a). Both crystal structures were of a ternary complex of the polymerase, a DNA primer/template, and an incoming nucleotide.

#### **A NOVEL PROCESSIVITY DOMAIN EMANATING FROM THE PALM DOMAIN**

Pol ε differs from pol δ in that it does not require the DNA sliding clamp PCNA for high processivity (Hogg and Johansson, 2012). The palm domain of pol ε is substantially larger (380 residues) than that of pol α or δ (175 and 203 residues, respectively). The recent pol ε crystal structures revealed that insertions in the palm domain collectively form a new domain consisting of three β strands and two helices (residues 533–555; 682–760) (Hogg et al., 2014; Jain et al., 2014a) (**Figure 1**; Table S1). Deleting residues 690–751 resulted in a variant with decreased polymerase activity. Moreover, mutating positively charged residues (His748, Arg749, and Lys751) located in the vicinity of the phosphate backbone affected the processivity of the enzyme (Hogg et al., 2014). The extra domain originating from the palm was thus named the processivity or P domain, after its function. The base of the P domain harbors a metal binding site (see below) (Hogg et al., 2014; Jain et al., 2014a,b).

#### **AN IRON SULFUR CLUSTER WITHIN THE POLYMERASE DOMAIN**

Unexpectedly, solution studies revealed that the catalytic subunit of yeast pol ε itself contains a [4Fe-4S] cluster within its polymerase fold (Jain et al., 2014b), in addition to the [4Fe-4S] cluster in the CTD (**Figure 2**; **Table 2**). This second [4Fe-4S] cluster suggests that pol ε may be more sensitive to oxidative stress (Jain et al., 2014b). The crystal structures of pol ε, however, did not reveal a [4Fe-4S] cluster in the polymerase domain (Hogg et al., 2014; Jain et al., 2014a; Zahn and Doublié, 2014). Two of the cysteine residues are disordered in the structural models, and the resulting metal-binding site appears to bind zinc (Hogg et al., 2014; Jain et al., 2014a). Substitution of a [4Fe-4S] cluster by non-native zinc in metal-binding proteins is not unusual (Netz et al., 2012), as [4Fe-4S] clusters are labile. Visualizing the [4Fe-4S] cluster within the polymerase domain of pol ε may require anaerobic conditions.

#### **A SHORT β-HAIRPIN MOTIF IN THE EXONUCLEASE DOMAIN**

In any DNA polymerase harboring both polymerase and exonuclease activities, the bound DNA is in equilibrium between the two active centers (Beechem et al., 1998). The concentration of incoming nucleotide and the presence of a damaged base or mispair are two factors that influence the transfer of DNA from the polymerase active site to the proofreading active site. Polymerases monitor the minor groove side of the newly formed base pairs and interact with the universal hydrogen bond acceptors, N3 of purines and O2 of pyrimidines, as a way of checking for mismatches (Seeman et al., 1976; Franklin et al., 2001). A unique feature of pol ε is a contact to the major groove side of the nascent base pair made by a residue from the exonuclease domain, Tyr431. Further analysis is warranted to elucidate the potential role of this tyrosine in the high fidelity of pol ε.
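The equilibrium between the two active centers described above can be pictured with a deliberately simple two-state model in which a primer terminus moves between the polymerase site and the exonuclease site. In the Python sketch below, the function name and the rate constants are hypothetical, unitless numbers chosen only to show the trend: a factor such as a mispair that speeds transfer to the exonuclease site raises the fraction of DNA held in the editing site. Real partitioning involves more states (for example, nucleotide binding and strand melting) than this model includes.

```python
def fraction_in_exo_site(k_pol_to_exo: float, k_exo_to_pol: float) -> float:
    """Equilibrium fraction of primer termini at the exonuclease site in a
    two-state model (pol <-> exo). With K = k_pol_to_exo / k_exo_to_pol,
    the fraction is K / (1 + K)."""
    if k_pol_to_exo < 0 or k_exo_to_pol <= 0:
        raise ValueError("forward rate must be >= 0 and reverse rate must be > 0")
    k_eq = k_pol_to_exo / k_exo_to_pol
    return k_eq / (1.0 + k_eq)


# Hypothetical, unitless rate constants chosen only to illustrate the trend
matched = fraction_in_exo_site(k_pol_to_exo=1.0, k_exo_to_pol=9.0)
mispaired = fraction_in_exo_site(k_pol_to_exo=9.0, k_exo_to_pol=1.0)
print(f"matched terminus: {matched:.2f} in exo site; mispaired: {mispaired:.2f}")
```

With these illustrative numbers the matched terminus spends about 10% of the time in the editing site and the mispaired one about 90%; the values say nothing about any particular polymerase.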

In pol δ the β-hairpin segment inserts itself into the DNA and acts as a wedge between single-stranded and double-stranded DNA (Swan et al., 2009). In *E. coli* DNA pol II, the insertion of a β barrel shifts the position of the β hairpin in such a way that polymerization is favored over proofreading (Wang and Yang, 2009). This modification presumably allows this polymerase to carry out translesion synthesis extension. Since pol ε is an accurate DNA polymerase, one would have expected, before its crystal structure was known, its β hairpin to resemble that of pol δ rather than that of *E. coli* Pol II. Surprisingly, the β-hairpin motif in pol ε is truncated and too short to contact the DNA (**Figure 1**). Which protein motif, then, might facilitate active site switching upon sensing a mispair? The P domain is a good candidate because of its contacts to both primer and template strands; residues from the P domain could sense replication errors and thus may help facilitate active site switching.

#### **CONCLUSIONS**

All three eukaryotic replicative DNA polymerases use a common B-family fold, and each has incorporated modified structural elements that are unique and tailored to its specific function (for example, the addition of the processivity domain in pol ε, a processive polymerase that does not use PCNA, or the modified region contacting the 5′ end of the template in pol α, a polymerase devoid of proofreading activity). The fold of B family polymerases is well suited to high-fidelity, replicative polymerases. Surprisingly, however, it is also used by translesion polymerases. Eukaryotic pol ζ (or REV3L) is a 353 kDa polymerase that functions in translesion synthesis and appears to suppress tumorigenesis (Wittschieben et al., 2010; Lange et al., 2011; Zahn et al., 2011; Hogg and Johansson, 2012; Sharma et al., 2013). The structure of *E. coli* Pol II revealed modifications in the NTD that affect the position of the β hairpin of the exonuclease domain, and thus the partitioning of the DNA between the polymerization and proofreading sites (Wang and Yang, 2009). The structure of pol ζ may reveal similar adjustments that alter the fold employed by high-fidelity, replicative polymerases, rendering the enzyme less faithful and able to perform translesion synthesis.

### **ACKNOWLEDGMENT**

This work was supported by a grant from the National Institutes of Health (NCI R01 CA 52040).

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/Journal/10.3389/fmicb.2014.00444/abstract

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 18 June 2014; paper pending published: 07 July 2014; accepted: 04 August 2014; published online: 25 August 2014.*

*Citation: Doublié S and Zahn KE (2014) Structural insights into eukaryotic DNA replication. Front. Microbiol. 5:444. doi: 10.3389/fmicb.2014.00444*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Doublié and Zahn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 29 August 2014 doi: 10.3389/fmicb.2014.00465

# DNA polymerases as useful reagents for biotechnology – the history of developmental research in the field

### *Sonoko Ishino and Yoshizumi Ishino\**

Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka, Japan

#### *Edited by:*

Zvi Kelman, University of Maryland, USA

#### *Reviewed by:*

Frank T. Robb, University of Maryland School of Medicine, USA; Lori Kelman, Montgomery College, USA

#### *\*Correspondence:*

Yoshizumi Ishino, Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581, Japan e-mail: ishino@agr.kyushu-u.ac.jp

DNA polymerase is a ubiquitous enzyme that synthesizes complementary DNA strands according to the template DNA in living cells. Multiple enzymes have been identified in each organism, and their functions have been investigated. In addition to their fundamental role in maintaining genome integrity during replication and repair, DNA polymerases are widely used for DNA manipulation in vitro, including DNA cloning, sequencing, labeling, mutagenesis, and other purposes. The fundamental ability of DNA polymerases to synthesize a deoxyribonucleotide chain is conserved. However, more specific properties, including processivity, fidelity (synthesis accuracy), and substrate nucleotide selectivity, differ among the enzymes. The distinctive properties of each DNA polymerase may lead to the development of unique reagents, and therefore searching for novel DNA polymerases has been one of the major focuses of this research field. In addition, protein engineering techniques to create mutant or artificial DNA polymerases have been used successfully to develop powerful DNA polymerases suited to specific purposes among the many kinds of DNA manipulation. Thermostable DNA polymerases are especially important for PCR-related techniques in molecular biology. In this review, we summarize the history of research on developing thermostable DNA polymerases as reagents for genetic manipulation and discuss the future of this research field.

**Keywords: thermostability, gene amplification,** *in vitro* **gene manipulation, Archaea, hyperthermophile**

#### **IN THE BEGINNING: TAQ POLYMERASE**

DNA polymerase I from *Thermus aquaticus* (Taq polymerase) is the best-known representative of the thermostable DNA polymerases. Taq polymerase was identified from *T. aquaticus* isolated from Yellowstone National Park in Wyoming, USA. The report was published by Chien et al. (1976) as part of the first author's Master's research. At that time, nobody foresaw how famous this enzyme would later become. In 1985, PCR (polymerase chain reaction) technology using the Klenow fragment of DNA polymerase I from *Escherichia coli* was reported (Saiki et al., 1985). It was easy to imagine that a heat-stable DNA polymerase, one that is not inactivated at the step that denatures double-stranded DNA into single strands, would transform this method of gene amplification into a practical technology. Subsequently, a simple and robust PCR method using Taq polymerase was published (Saiki et al., 1988). Because of the heat stability of Taq polymerase, the reaction tube could remain in the incubator once the reaction mixture containing the DNA polymerase was prepared, and only temperature changes were required for PCR. An instrument capable of rapidly changing the reaction temperature was developed, and the PCR market opened with a PCR kit (GeneAmp PCR Reagent Kit) and an instrument (Thermal Cycler) provided by Perkin-Elmer Cetus. DNA polymerase from *Thermus thermophilus* (Tth polymerase) was also developed as a commercial product in the early days of PCR, although the only scientific report was a 1974 ASBMB abstract from the Mitsubishi-Kasei Institute of Life Sciences, Japan, where this enzyme was originally identified. A distinctive property of Tth polymerase is its reverse transcriptase (RT) activity, and a single-tube RT-PCR method was developed with this enzyme.
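The practical point made above, that with a heat-stable enzyme only the temperature has to change between steps, can be sketched as a cycling program. The Python below is a minimal illustration: the step names, temperatures, and hold times in `CYCLE` are hypothetical placeholders rather than values from the original protocols, and `ideal_copies` uses the textbook idealization N = N0(1 + E)^n, which ignores the plateau reached in real reactions.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Step:
    name: str
    temperature_c: float
    seconds: int


# Hypothetical three-step program; temperatures and times are illustrative
# placeholders, not values taken from the original protocols.
CYCLE: Tuple[Step, ...] = (
    Step("denature", 94.0, 30),
    Step("anneal", 55.0, 30),
    Step("extend", 72.0, 60),
)


def run_program(cycles: int, steps: Sequence[Step] = CYCLE) -> Iterator[Tuple[int, Step]]:
    """Yield (cycle, step) pairs in the order a thermal cycler visits them.
    Only the temperature changes between steps; the tube is never reopened,
    which is practical only if the polymerase survives denaturation."""
    for cycle in range(1, cycles + 1):
        for step in steps:
            yield cycle, step


def ideal_copies(initial_copies: float, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized yield N = N0 * (1 + E)**n, with E in [0, 1] per cycle."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return initial_copies * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    holds = list(run_program(cycles=30))
    print(len(holds))  # number of temperature holds in 30 three-step cycles
    print(ideal_copies(1, 30))  # one template, perfect efficiency
```

With a heat-labile enzyme such as the Klenow fragment, fresh enzyme had to be added after each denaturation; the loop above simply has no such step.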

At the beginning of the PCR era, Taq polymerase was purified from *T. aquaticus* cells. However, the *pol* gene was soon cloned from the *T. aquaticus* genome and expressed in *E. coli* cells, and in the commercial field the native Taq polymerase was replaced by the recombinant enzyme, named AmpliTaq DNA polymerase. Although the protein quality was improved compared with the native Taq polymerase, the amount of recombinant Taq polymerase produced in *E. coli* cells was very low, probably because of poor expression of the *T. aquaticus* gene, which has a high GC content (70%) (Lawyer et al., 1989). We constructed an efficient overproduction system by changing the codons around the N-terminal region of the original gene either to codons with A or T at the third position or to the optimal codons for *E. coli*. These manipulations improved the production of Taq polymerase more than 10-fold compared with the production of AmpliTaq (Ishino et al., 1994). Taq polymerase has been the standard enzyme for PCR since its inception, and the abundance of PCR data accumulated with it provides a valuable resource for developing new products for useful PCR modifications.
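The codon strategy described above, making the third codon position A/T-rich near the start of a GC-rich gene, can be illustrated with a short sketch. The Python below is hypothetical and is not the construct of Ishino et al. (1994): it applies only the A/T-at-the-third-position idea (not the alternative of *E. coli* optimal codons), the function names and the number of codons treated are assumptions, and it keeps a substitution only when the standard genetic code says the encoded amino acid is unchanged.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order (first, second, third position); '*' = stop
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame, uppercase DNA sequence with the standard code."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[p:p + 3]] for p in range(0, usable, 3))


def gc_content(dna: str) -> float:
    """Fraction of G or C bases in the sequence."""
    return sum(base in "GC" for base in dna) / len(dna) if dna else 0.0


def at_rich_third_positions(dna: str, n_codons: int) -> str:
    """Hypothetical helper: in the first n_codons codons, replace a third-position
    G/C with A or T, keeping a change only if the amino acid is unchanged."""
    codons = [dna[p:p + 3] for p in range(0, len(dna), 3)]
    for idx in range(min(n_codons, len(codons))):
        codon = codons[idx]
        if len(codon) < 3 or codon[2] not in "GC":
            continue  # trailing partial codon, or already A/T at position 3
        for swap in ("A", "T"):
            candidate = codon[:2] + swap
            if CODON_TABLE[candidate] == CODON_TABLE[codon]:
                codons[idx] = candidate  # synonymous, lower-GC codon
                break
    return "".join(codons)
```

For example, the hypothetical fragment `ATGGCCCTGGAGGAG` (Met-Ala-Leu-Glu-Glu) becomes `ATGGCACTAGAAGAA`: the protein is unchanged, the GC content falls, and ATG is left alone because no A/T-ending codon also encodes Met.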

#### **THERMOSTABLE DNA POLYMERASES FROM THERMOPHILES**

Thermophilic organisms utilize thermostable DNA polymerases, and therefore thermophiles became popular as genetic resources of DNA polymerases and other enzymes for industrial use. The heat stability of the enzymes is directly related to the temperature at which the organism thrives. Thermophiles are classified into extreme thermophiles, which grow at temperatures greater than 75°C, and moderate thermophiles, which grow at 55–75°C. The thermostabilities of the DNA polymerases from extreme and moderate thermophiles clearly differ, as shown in **Figure 1**. Taq polymerase is applicable to PCR; however, the DNA polymerases from the moderately thermophilic *Bacillus* species are not suitable for PCR because of their insufficient stability. Hyperthermophiles are a subset of extreme thermophiles that grow optimally at temperatures above 80°C. Most hyperthermophilic organisms are Archaea, although some are bacteria (**Table 1**). Generally, hyperthermophiles have the potential to provide more heat-stable enzymes than other thermophiles. Indeed, the DNA polymerase from *Pyrococcus furiosus* (Pfu polymerase) is more stable than Taq polymerase (**Figure 1**). Hyperthermophilic archaea became popular not only as sources of useful enzymes for applications, but also as interesting model organisms for molecular biology. In the early 1990s, metabolic processes in archaeal cells were barely understood, and therefore the molecular biology of Archaea, the third domain of life, became a novel and exciting field.
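The temperature classes defined above can be written as a small decision function. This Python sketch is a hypothetical helper that uses only the thresholds given in the text (moderate 55–75°C, extreme above 75°C, hyperthermophile optimum above 80°C); it simplifies by using a single optimal growth temperature for every class, whereas the text defines hyperthermophiles by optimal growth and the other classes by growth range.

```python
def classify_by_growth_temperature(optimal_growth_c: float) -> str:
    """Hypothetical classifier using the thresholds stated in the text.

    Simplifying assumption: one optimal growth temperature (in degrees C)
    stands in for the growth range used to define the non-hyper classes.
    """
    if optimal_growth_c > 80:
        return "hyperthermophile"  # a subset of the extreme thermophiles
    if optimal_growth_c > 75:
        return "extreme thermophile"
    if optimal_growth_c >= 55:
        return "moderate thermophile"  # 55-75 C, inclusive at both ends here
    return "outside the thermophile ranges"
```

Because the classes are nested (every hyperthermophile is also an extreme thermophile), the checks run from the highest threshold down and return the most specific label.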

#### **DNA POLYMERASES FROM HYPERTHERMOPHILES**

When choosing thermostable DNA polymerases as reagents for genetic engineering, researchers generally do not consider the biology of the source organisms; the properties of the enzyme obtained are what matter, regardless of the source. To obtain a thermostable DNA polymerase, the growth temperature of the thermophile attracts the most attention. *Thermotoga maritima* DNA polymerase was the first commercial product from a hyperthermophilic bacterium (ULTIMA DNA polymerase). This enzyme has an associated 3′–5′ exonuclease activity and thus was expected to perform PCR more accurately through its proofreading activity. All PCR enzymes from the domain Bacteria belong to family A, and these generally lack 3′–5′ exonuclease activity; ULTMA DNA polymerase was an exception, like *E. coli* Pol I. In spite of this selling point, ULTIMA DNA polymerase was not a commercial success. One report described no significant differences between the fidelities of the ULTIMA and Taq polymerases for sequencing purposes when optimal buffer conditions were used for each enzyme (Diaz and Sabino, 1998).

circles), Thermus aquaticus (closed circles), and Bacillus caldotenax (open squares) were used as representatives of hyperthermophiles, extreme thermophiles, and moderate thermophiles, respectively.

DNA polymerases from hyperthermophilic archaea were also assessed as PCR enzymes. We cloned the *pol* gene from *P. furiosus* and expressed it in *E. coli* (Uemori et al., 1993). We expected ours to be the first report of the full-length sequence of an archaeal family B DNA polymerase, whose existence had been predicted earlier from the aphidicolin-sensitive phenotype of a halophile and a methanogen (Forterre et al., 1984; Zabel et al., 1985). However, two papers showing the deduced complete amino acid sequences of DNA polymerases from the hyperthermophilic archaea *Sulfolobus solfataricus* (Pisani et al., 1992) and *Thermococcus litoralis* (Perler et al., 1992) were published during the preparation of our manuscript (Uemori et al., 1993). All these reports clearly showed that the archaeal DNA polymerases have sequences similar to those of the eukaryotic replicative DNA polymerases, Pol α, δ, and ε (family B). It is also interesting that the *T. litoralis pol* gene encodes inteins that must be spliced out after translation (Perler et al., 1992). Thereafter, many DNA polymerases containing various patterns of inteins, inserted in motifs A, B, and C, were discovered (Perler, 2002). The fidelity of DNA synthesis *in vitro* is markedly affected by the reaction conditions. However, the archaeal family B enzymes generally perform more accurate DNA synthesis than Taq polymerase (Cariello et al., 1991; Ling et al., 1991; Lundberg et al., 1991; Mattila et al., 1991), suggesting that the strong 3′–5′ exonuclease activity of the hyperthermophilic family B polymerases contributes to the fidelity of PCR.

#### **Table 1 | Representative hyperthermophiles.**

#### **DEVELOPMENT OF LA-PCR**

DNA polymerases are classified into seven families based on amino acid sequence similarity (**Figure 2**). Of these, only enzymes from families A and B have so far been used for genetic engineering. Taq polymerase from family A has strong extension ability and amplifies the target DNA efficiently; however, its fidelity is low. On the other hand, Pfu polymerase from family B performs highly accurate PCR amplification, but its extension rate is slow, and a long extension time is required for each cycle of PCR. A method was therefore needed for the accurate PCR amplification of long DNA regions. One simple idea was to combine one enzyme each from family A and family B in a single PCR reaction mixture. In practice, however, this was not so simple, and persevering trials were necessary to find suitable conditions for a long and accurate (LA) PCR system. The amplification of a ∼35 kb DNA fragment from λ phage genomic DNA was accomplished in 1994 with a mixture of Klentaq1 (an N-terminal deletion mutant of Taq polymerase) and an archaeal family B DNA polymerase with 3′–5′ exonuclease activity (Barnes, 1994). Subsequently, commercial products for LA-PCR were rapidly developed by several manufacturers, and LA-PCR technology became popular throughout the world.

#### **FAST AND HIGHLY ACCURATE PCR BY AN ARCHAEAL FAMILY B DNA POLYMERASE**

A family B DNA polymerase from the hyperthermophilic archaeon *Thermococcus kodakarensis* (this strain was originally named *Pyrococcus kodakaraensis* KOD1) was identified and applied to PCR (Takagi et al., 1997). This enzyme has the typical amino acid sequence of the archaeal family B enzymes, but it showed a high extension rate while maintaining high fidelity; therefore, the commercial product KOD DNA polymerase (KOD Pol) was developed and became popular as a PCR enzyme. Commercial products related to KOD Pol, including a hot start kit with a monoclonal antibody and an LA-PCR kit with a mixture of the wild type and a 3′–5′ exonuclease-deficient mutant of this enzyme, were subsequently developed by the manufacturers. The reason why this family B enzyme extends so quickly is interesting. Comparisons of the crystal structure and amino acid sequence of KOD Pol with those of other archaeal family B enzymes suggested an explanation for its efficient extension. Many basic residues are located around the active site in the finger domain of KOD Pol. In addition, many Arg residues are located at the forked point, which is predicted to be the junction of the template-binding region and the editing cleft. This unique structure may stabilize the melted DNA at the forked point, resulting in high PCR performance (Hashimoto et al., 2001).

**life.** The names of DNA polymerases vary depending on the domain. Only DNA polymerases with in vitro activity, if applicable, are shown. Eukaryotic Polγ is from mitochondria and archaeal PolE is a plasmid-encoded enzyme.

# **BASIC RESEARCH ON ARCHAEAL DNA POLYMERASES**

Research on DNA polymerases in hyperthermophilic archaea is motivated not only by industrial applications, but also by basic molecular biology, to elucidate the molecular mechanisms of genetic information processing at extremely high temperatures. To identify all of the DNA polymerases in the archaeal cell, we tried to separate the DNA polymerase activities in the total cell extract of *P. furiosus*. Three major fractions showed nucleotide incorporation activity after anion exchange column chromatography (Resource Q column, GE Healthcare; Imamura et al., 1995). In addition to further purification of each fraction, screening for DNA polymerase activity in a heat-stable protein library, made from extracts of *E. coli* cells containing *P. furiosus* DNA fragments, revealed a new DNA polymerase gene (Uemori et al., 1997). The new DNA polymerase consisted of two proteins, the small and large subunits, which we named DP1 and DP2, respectively. These two proteins are strictly required for both the 5′–3′ polymerizing and 3′–5′ exonucleolytic activities *in vitro*. The genes encoding DP1 and DP2 are located in tandem on the *P. furiosus* genome and form an operon. Interestingly, this operon contains a total of five genes, including a gene encoding a eukaryotic Cdc6/Orc1-like protein (important for initiation of DNA replication) and a gene encoding a Rad51-like protein (involved in homologous recombination in Eukarya), in addition to DP1 and DP2 (**Figure 3**; Uemori et al., 1997). This was the first report of a eukaryotic-like initiator protein for DNA replication in Archaea. The amino acid sequences of DP1 and DP2 are not similar to those of any other DNA polymerases.

After the discovery of this DNA polymerase, the complete genome sequence of *Methanococcus jannaschii* was published as the first complete archaeal genome (Bult et al., 1996). One notable point of this report was that only one DNA polymerase (family B) was found among the deduced amino acid sequences, in contrast to the three DNA polymerases, Pol I, II, and III, in *E. coli* and the several DNA polymerases in eukaryotic cells (Gray, 1996). We searched the *M. jannaschii* genome for sequences homologous to DP1 and DP2 and found them. The two genes were not present in tandem, but were located separately on the genome. We cloned and expressed them in *E. coli* and demonstrated their polymerase and exonuclease activities *in vitro*. With this report, DP1 and DP2 became recognized as a novel archaeal DNA polymerase (Ishino et al., 1998). Three more complete genome sequences were subsequently reported, and the genes for DP1 and DP2 were found in all of them. Thus, this new DNA polymerase appeared to be widespread in Archaea (Cann et al., 1998). Because of its lack of sequence homology to other DNA polymerases, we proposed a new family, family D, for this enzyme (Cann and Ishino, 1999).

In parallel with the identification of DNA polymerase activities in the cell extract of *P. furiosus*, we amplified a gene fragment for the family B DNA polymerase from the genomic DNA of *Pyrodictium occultum*, which grows at 105°C, in an attempt to find a DNA polymerase more heat-stable than that from *P. furiosus*. With a set of mixed primers based on the conserved sequences of motifs A and C in family B DNA polymerases, a single band was amplified. However, two different fragments were found after cloning and sequencing of the PCR product. The full-length sequences of both *pol*-like genes were cloned from the *P. occultum* genome by primer walking, and they were expressed in *E. coli*. Both gene products exhibited heat-stable DNA polymerase activity (Uemori et al., 1995). Unfortunately, the performance of these two enzymes in PCR was not better than that of Pfu polymerase, and we discontinued further research on them. However, this was the first report that an archaeal cell has two different family B DNA polymerases. It was an exciting discovery because three family B DNA polymerases, Polα, Polδ, and Polε, were known in Eukarya, and so we proposed that multiple family B enzymes were a common feature of Archaea and Eukarya. However, there is only one gene encoding a family B DNA polymerase in the *M. jannaschii* genome, as described above. We subsequently found two family B DNA polymerases in *Aeropyrum pernix* (Cann et al., 1999), and thus the presence of two family B enzymes is not unique to *Pyrodictium*, but is more general in Archaea. In the early stages of complete genome sequencing, all sequences were from Euryarchaeota (*Archaeoglobus fulgidus, Methanothermobacter thermautotrophicus, Pyrococcus horikoshii*), and the genome sequence of a crenarchaeal organism was not determined until that of *A. pernix* was reported (Kawarabayasi et al., 1999). Taken together with the new knowledge at that time, it was predicted that euryarchaeal organisms have one DNA polymerase each from family B and family D, and that crenarchaeal organisms have at least two family B enzymes in the cell. This overview of the distribution of DNA polymerases in Archaea is generally correct, as shown in **Figure 4**, which displays DNA polymerases in the archaeal phyla (subdomains), including newly proposed phyla from recent ecological research.

All of the original biochemical data for *P. furiosus* PolD from our group, including thermostability, strong primer extension and 3′–5′ exonuclease activity, indicated that PolD is suitable for PCR (Ishino and Ishino, 2001). However, PolD has not been developed commercially. A recent analysis of *Pyrococcus abyssi* PolD showed that it is a suitable PCR enzyme (Killelea et al., 2014). In contrast, PolD from *Thermococcus* sp. 9°N offers no advantages over currently available commercial PCR enzymes (Greenough et al., 2014).

# **PROTEIN ENGINEERING OF THERMOSTABLE DNA POLYMERASES**

Once PCR technology was established, efforts to improve its performance followed. In the early stages, hot-start PCR was one of the major improvements for specific amplification. An antibody against Taq polymerase was used to suppress the enzyme's activity through antigen–antibody binding at low temperature; when PCR started with the denaturation step above 90°C, the antibody was heat-denatured and released from the enzyme. This hot-start method is generally effective at preventing non-specific amplification. Another approach was also tested for this purpose: chemical modification of Taq polymerase inactivated the enzyme at low temperature, but the modification was released at high temperature, activating Taq polymerase to start PCR. This temperature-dependent, reversible modification of the Taq protein led to the commercial hot-start enzyme AmpliTaq Gold. Alternatively, a cold-sensitive Taq polymerase with markedly reduced activity at 37°C relative to the wild-type enzyme was produced by site-directed mutagenesis, and this mutant is suitable for hot-start PCR (Kermekchiev et al., 2003).

Taq polymerase is a family A enzyme and can be used for practical dideoxy sequencing. However, the sequencing data it produced were not as good as those obtained with T7 DNA polymerase (known commercially as Sequenase; see below). An ingenious protein engineering strategy produced a mutant Taq polymerase that is better suited to dideoxy sequencing than the wild-type enzyme. *E. coli* PolI and Taq polymerase discriminate between deoxy- and dideoxynucleotides as substrates for incorporation into the DNA strand; therefore, a large excess (50- to 1000-fold) of dideoxynucleotides must be present in the reaction mixture to terminate DNA strand synthesis by their incorporation. Because of this property, the signal strength is not uniform but is distinctly unbalanced. In contrast, T7 DNA polymerase incorporates deoxynucleotides and dideoxynucleotides equally well, so the reaction conditions are easy to adjust to give very clear signals (Tabor and Richardson, 1990). A mutant T7 DNA polymerase lacking 3′–5′ exonuclease activity was developed as a commercial product named Sequenase. A detailed comparison of *E. coli* PolI and T7 DNA polymerase identified a single amino acid that discriminates between deoxy- and dideoxynucleotides, enabling this property to be successfully converted from PolI to T7 and from T7 to PolI (Tabor and Richardson, 1995). This finding was applied to Taq polymerase, and a modified Taq carrying F667Y, which gives Taq T7-type substrate recognition, was created (Tabor and Richardson, 1995). This enzyme, called Thermosequenase, became popular as the standard enzyme for fluorescently labeled sequencing (Reeve and Fuller, 1995).
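The link between discrimination and unbalanced signals can be made concrete with a deliberately simplified model. This model is an assumption for illustration, not the analysis of Tabor and Richardson. At each position where a given base is incorporated, the chain terminates with probability p = r/(r + D). Here r is the ddNTP:dNTP concentration ratio, and D is the polymerase's preference for the dNTP over the ddNTP. D = 1 describes an enzyme that, like T7 DNA polymerase, does not discriminate. The function names and D values below are hypothetical.

```python
def termination_probability(ratio, discrimination):
    """Chance that a ddNTP, rather than the dNTP, is added at one position.

    ratio: [ddNTP]/[dNTP] in the reaction (hypothetical value).
    discrimination: preference of the enzyme for the dNTP over the ddNTP
        (1.0 = no discrimination).
    """
    return ratio / (ratio + discrimination)


def termination_profile(synthesized, base, ratio, discrimination):
    """Relative amount of product ending at each occurrence of `base`.

    A chain reaches the k-th occurrence only if it escaped termination at
    every earlier occurrence, so the signal decays geometrically.
    """
    p = termination_probability(ratio, discrimination)
    profile = []
    surviving = 1.0
    for nucleotide in synthesized:
        if nucleotide == base:
            profile.append(surviving * p)
            surviving *= 1.0 - p
    return profile


if __name__ == "__main__":
    print(termination_probability(1.0, 1.0))      # non-discriminating, equimolar: 0.5
    print(termination_probability(500.0, 500.0))  # D = 500 needs a 500-fold excess: 0.5
```

In this toy model, an enzyme with D = 500 needs a 500-fold ddNTP excess to match the termination probability that a non-discriminating enzyme reaches with an equimolar mix. If D also differs among the four ddNTPs, the same ratio gives a different p for each base. That difference produces the uneven band intensities described above.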

Another goal of mutagenesis has been enzymes that are more resistant to PCR inhibitors present in blood or soil, such as hemoglobin and humic acid. A mutant Taq DNA polymerase with enhanced resistance to various inhibitors, including whole blood, plasma, hemoglobin, lactoferrin, serum IgG, soil extracts, and humic acid, was created by site-directed mutagenesis (Kermekchiev et al., 2009). Molecular breeding of *Thermus* DNA polymerases by a directed evolution technique, compartmentalized self-replication (CSR; Ghadessy et al., 2001), also generated a PCR enzyme with striking resistance to a broad spectrum of inhibitors of highly divergent composition, including humic acid, bone dust, coprolite, peat extract, clay-rich soil, cave sediment, and tar (Baar et al., 2011). Furthermore, enzymes with broadened substrate specificity, which are therefore useful for amplifying ancient DNA containing numerous lesions, were also obtained by CSR (Ghadessy et al., 2004; d'Abbadie et al., 2007). Mutational studies of the O-helix of Taq DNA polymerase produced enzymes with reduced fidelity, which may be useful for error-prone PCR (Suzuki et al., 1997, 2000; Tosaka et al., 2001).

One successful strategy for producing improved DNA polymerases is "domain tagging." For example, new DNA polymerases were created by flexibly attaching helix–hairpin–helix (HhH) domains of *Methanopyrus kandleri* topoisomerase V to the catalytic domains of Taq and Pfu polymerases. HhH is a widespread motif that generally functions in sequence-nonspecific DNA binding. These hybrid enzymes had increased thermostability and became more resistant to salt and to several inhibitors such as phenol, blood, and DNA-intercalating dyes (Pavlov et al., 2002). This tagging strategy was also applied to ϕ29 DNA polymerase (de Vega et al., 2010) and *Bacillus stearothermophilus* DNA polymerase (Pavlov et al., 2012). Another successful example of the tagging strategy is the commercial product "Phusion DNA polymerase" (**Figure 5**). This is a fusion of Pfu DNA polymerase and the DNA-binding protein Sso7d from *S. solfataricus* (Wang et al., 2004). Sso7d binds DNA strongly and keeps the fused Pfu DNA polymerase on the DNA once it starts synthesizing along the template strand. Phusion DNA polymerase thus compensates for the low extension rate of Pfu DNA polymerase while maintaining its high fidelity. This enzyme shows very high processivity and accurate PCR performance, and is now widely used.

Another idea for improving the processivity of the archaeal family B DNA polymerases was to use PCNA (proliferating cell nuclear antigen) as a processivity factor. The ring-shaped PCNA encircles the DNA strand and slides along it, and various proteins bind to PCNA (Pan et al., 2011). DNA polymerase is a typical PCNA-binding protein and is tethered to the DNA strand by PCNA during strand synthesis, which is why DNA polymerase synthesizes DNA highly processively in the presence of PCNA. Based on this property, scientists have searched for a thermostable PCNA to use in PCR together with a DNA polymerase. However, PCNA has not yet been used successfully for PCR; unexpectedly, PCR is inhibited rather than stimulated in the presence of PCNA. We developed a PCNA-assisted PCR method, a highly processive PCR with high fidelity, by using a mutant PCNA. We originally determined the crystal structure of *P. furiosus* PCNA (PfuPCNA; Matsumiya et al., 2001), and our continued research revealed that intermolecular ion pairs between the PfuPCNA protomers contribute to its ring stability (Matsumiya et al., 2003), which is strongly affected by the ionic strength of the solution. Mutations of the amino acid residues involved in the ion pairs clearly decreased ring stability, but unexpectedly, a less stable PfuPCNA mutant enhanced the primer extension reaction of Pfu DNA polymerase *in vitro* (Matsumiya et al., 2003). We therefore applied the mutant PfuPCNA to PCR. With Pfu DNA polymerase plus the PfuPCNA mutant, we successfully amplified DNA fragments of up to 15 kbp with a markedly shorter reaction time, under conditions in which Pfu DNA polymerase alone did not work (Ishino et al., 2012; Kawamura et al., 2012). This PCNA-assisted PCR (**Figure 5**) is another successful example of processive, high-accuracy PCR.

Because PCR is so sensitive, very small amounts of carry-over contaminants from previous PCRs are considered one of the major sources of false-positive results. The most common strategy for preventing carry-over contamination is to replace dTTP with dUTP during PCR amplification, so that the products contain uracil. Before PCR is started, the reaction mixture is treated with uracil-DNA glycosylase (UNG). During the initial denaturation step, the temperature is raised to 95°C, which cleaves the DNA at the resulting apyrimidinic sites and fragments any carry-over DNA. One problem with using archaeal family B DNA polymerases for this carry-over prevention is that they specifically recognize uracil and hypoxanthine, which stalls their progression along the template strand (Connolly, 2009). The crystal structure of the DNA polymerase revealed that this read-ahead recognition involves binding of the deaminated bases in an N-terminal pocket found specifically in the archaeal family B DNA polymerases (Fogg et al., 2002). Because of this specific recognition of uracil, the archaeal family B DNA polymerases, including Pfu DNA polymerase and KOD DNA polymerase, are not suitable for carry-over prevention PCR. To overcome this defect, a point mutation (V98Q) was introduced into Pfu polymerase. This mutant enzyme is completely unable to recognize uracil, while its DNA polymerase activity is unaffected (Fogg et al., 2002; Firbank et al., 2008). This mutant Pfu polymerase is therefore useful for carry-over prevention PCR. It is also useful for amplifying uracil-containing DNA, such as damaged DNA and bisulfite-converted DNA for epigenetic analysis.
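The logic of UNG-based carry-over prevention can be sketched as a toy string model. The model ignores strand pairing, incomplete uracil excision and enzyme kinetics, and the function names are illustrative only. In this picture, products made with dUTP contain U wherever T would be. UNG converts each U to an abasic site, and heating breaks the strand at those sites. Genuine template DNA contains T rather than U, so it passes through unchanged.

```python
def incorporate_dutp(strand):
    """Toy product made with dUTP instead of dTTP: every T becomes U."""
    return strand.replace("T", "U")


def ung_then_heat(strand):
    """Toy UNG treatment followed by heat.

    UNG removes each U, leaving an abasic site; heating breaks the strand
    at each abasic site. Returns the surviving fragments, so a strand
    without uracil comes back as a single intact piece.
    """
    return [piece for piece in strand.split("U") if piece]


if __name__ == "__main__":
    genomic_template = "ACGTACGTAC"                   # made with dTTP: no uracil
    carry_over = incorporate_dutp(genomic_template)   # "ACGUACGUAC"
    print(ung_then_heat(genomic_template))  # ['ACGTACGTAC']
    print(ung_then_heat(carry_over))        # ['ACG', 'ACG', 'AC']
```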

#### **FUTURE PERSPECTIVES**

The polymerase chain reaction started a revolution in molecular biology and is now used daily not only in research but also throughout society. Although PCR is a mature technology, more powerful and reliable PCR enzymes are still desired. In particular, enzymes with faster, longer, and more efficient extension than current commercial products would further improve PCR technology. In addition to these basic abilities, DNA polymerases that can incorporate various modified nucleotides, which are useful for highly sensitive labeling, are valuable for single-molecule analysis. Mutagenesis of the DNA polymerase itself, whether site-specific or random, is an effective way to create enzymes with improved PCR performance or with specific properties for *in vitro* DNA manipulation. Artificial evolution has also attracted considerable attention as a way to create DNA polymerases with novel activities (Brakmann, 2005; Henry and Romesberg, 2005; Holmberg et al., 2005; Ong et al., 2006). Our strategy of using environmental DNA as a genetic resource also works well for investigating the structure–function relationships of DNA polymerases. The region corresponding to the polymerase active center in the structural genes of Taq and Pfu polymerases was replaced with PCR fragments amplified from DNA in soil samples collected at various locations in Japan. The chimeric *pol* genes were constructed in the *E. coli* expression plasmids for the Taq and Pfu polymerases, and the resulting chimeric enzymes exhibited DNA polymerase activities with different properties (Matsukawa et al., 2009). The main focus of future DNA polymerase development will not be versatile enzymes but specialized enzymes suited to individual purposes, including whole genome amplification, rapid detection of short DNA, and new sequencing technologies. Continued research on DNA polymerases may lead to new genetic analysis technologies that are completely different from PCR or PCR-related techniques. Isothermal amplification, which requires no temperature cycling, is more convenient and practical than PCR, and such techniques have been actively developed (Gill and Ghaemi, 2008). Several methods in practical use today are based on the strand displacement (SD) activity of DNA polymerases; the DNA polymerases from bacteriophage ϕ29 and *B. stearothermophilus* are representative SD enzymes. Whole genome amplification using the SD activity of ϕ29 DNA polymerase is now especially useful for single-cell analysis.
Alternatively, based on the idea of mimicking DNA replication *in vivo*, a helicase was used to separate the double-stranded DNA (Vincent et al., 2004). Although the helicase-dependent amplification (HDA) technique has not yet come into practical use (Jeong et al., 2009), refining it may yield a powerful tool for genetic engineering.

#### **ACKNOWLEDGMENTS**

The writing of this review article was supported by a grant from the Ministry of Education, Culture, Sports, Science and Technology of Japan (grant number 26242075 to Yoshizumi Ishino).

#### **REFERENCES**


Gray, M. (1996). The third form of life. *Nature* 383:299. doi: 10.1038/383299a0


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 July 2014; accepted: 15 August 2014; published online: 29 August 2014. Citation: Ishino S and Ishino Y (2014) DNA polymerases as useful reagents for biotechnology – the history of developmental research in the field. Front. Microbiol. 5:465. doi: 10.3389/fmicb.2014.00465*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Ishino and Ishino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# DNA polymerase hybrids derived from the family-B enzymes of *Pyrococcus furiosus* and *Thermococcus kodakarensis*: improving performance in the polymerase chain reaction

# *Ashraf M. Elshawadfy†, Brian J. Keith, H'Ng Ee Ooi, Thomas Kinsman, Pauline Heslop and Bernard A. Connolly\**

*Institute of Cell and Molecular Biosciences, University of Newcastle, Newcastle upon Tyne, UK*

#### *Edited by:*

*Zvi Kelman, University of Maryland, USA*

#### *Reviewed by:*

*Juergen Reichardt, James Cook University, Australia; David L. Bernick, University of California, Santa Cruz, USA*

#### *\*Correspondence:*

*Bernard A. Connolly, Institute of Cell and Molecular Biosciences (ICaMB), University of Newcastle, The Medical School, Framlington Place, Newcastle upon Tyne, NE2 4HH, UK; e-mail: bernard.connolly@ncl.ac.uk*

#### *†Present address:*

*Ashraf M. Elshawadfy, Botany Department, Zagazig University, Zagazig, Egypt*

The polymerase chain reaction (PCR) is widely applied across the biosciences, with archaeal Family-B DNA polymerases being preferred due to their high thermostability and fidelity. The enzyme from *Pyrococcus furiosus* (Pfu-Pol) is used more frequently than the similar protein from *Thermococcus kodakarensis* (Tkod-Pol), despite the latter having better PCR performance. Here the two polymerases have been comprehensively compared, confirming that Tkod-Pol: (1) extends primer-templates more rapidly; (2) has higher processivity; (3) demonstrates superior performance in normal and real time PCR. However, Tkod-Pol is less thermostable than Pfu-Pol, and the two enzymes have equal fidelity. To understand the favorable properties of Tkod-Pol, hybrid proteins have been prepared. Single, double and triple mutations were used to place the arginines present at the "forked-point" (the junction of the exonuclease and polymerase channels) of Tkod-Pol at the corresponding locations in Pfu-Pol, slightly improving PCR performance. The Pfu-Pol thumb domain, responsible for double-stranded DNA binding, has been entirely replaced with that from Tkod-Pol, again giving better PCR properties. Combining the "forked-point" and thumb swap mutations resulted in a marked increase in PCR capability, maintenance of high fidelity and retention of the superior thermostability associated with Pfu-Pol. However, even the arginine/thumb swap mutant falls short of Tkod-Pol in PCR, suggesting that further improvement within the Pfu-Pol framework is attainable. The significance of this work is the observation that improvements in PCR performance are easily attainable by blending elements from closely related archaeal polymerases, an approach that may, in future, be extended by using more polymerases from these organisms.

**Keywords: PCR,** *Pyrococcus furiosus***,** *Thermococcus kodakarensis***, archaeal DNA polymerase, domain swapping, thermostability, fidelity**

### **INTRODUCTION**

Thermostable DNA polymerases are essential components of the polymerase chain reaction (PCR), a technique with myriad uses across the life sciences (Mullis et al., 1994; Weissensteiner et al., 2004; McPherson and Möller, 2006; Saunders and Lee, 2013). The family-B DNA polymerases from the *Thermococcales* order of the archaea are especially favored for PCR, due to their extreme stability at elevated temperatures and the presence of fidelity-conferring 3′–5′ exonuclease activity (Lundberg et al., 1991; Cline et al., 1996; Takagi et al., 1997; Nishioka et al., 2001). Both the amino acid sequences and the X-ray structures of these polymerases demonstrate a high degree of similarity. Structures are available for the enzymes isolated from *Thermococcus gorgonarius* (Tgo-Pol) (Hopfner et al., 1999; Firbank et al., 2008; Killelea et al., 2010), *Thermococcus kodakarensis* (Tkod-Pol) (Hashimoto et al., 2001; Kuroita et al., 2005; Bergen et al., 2013), *Thermococcus* species 9°N-7 (9°N-Pol) (Chapin-Rodriguez et al., 2000), *Pyrococcus furiosus* (Pfu-Pol) (Kim et al., 2008) and *Pyrococcus abyssi* (Pab-Pol) (Gougel et al., 2012). Despite these similarities of amino acid sequence and structure, the polymerases have diverse kinetic properties, which strongly influence PCR performance. Most notably, Tkod-Pol possesses higher processivity (the number of dNTPs incorporated per binding event) than the other enzymes (Takagi et al., 1997; Nishioka et al., 2001). The enhanced processivity may arise from the presence of seven arginines. These arginines are suggested to stabilize primer-template binding and to influence the movement of DNA between the polymerization and proofreading active sites (Hashimoto et al., 2001; Kim et al., 2008). They cluster near the "forked-point" (the junction between the template-binding and editing clefts) of Tkod-Pol (**Figure 1A**).
The seven amino acids are well conserved in *Thermococcales* DNA polymerases, with two (R243 and R264) being present in all species (**Figure 1B**). The other five locations show more variation, although the amino acid at 266 is an arginine in both Tkod-Pol and Pfu-Pol. At the remaining four positions (247, 265, 381, 501), the arginines present in Tkod-Pol are replaced by an alternative amino acid in Pfu-Pol. The situation at Tkod-Pol position 381 is slightly obscured by the insertion of an additional leucine at position 381 in Pfu-Pol. This arrangement, seen in many of the *Thermococcales* DNA polymerases, leads to two possible sequence alignments (**Figure 1B**). The first, generated by the alignment algorithm Clustal (Sievers et al., 2011), lines up Tkod-Pol R381 with Pfu-Pol R382. However, superimposition of the structures of the two polymerases (**Figure 1C**) shows that R381 in Tkod-Pol and L381 in Pfu-Pol are spatially equivalent, suggesting the alternative alignment shown in **Figure 1B** (Hashimoto et al., 2001).

Another region that may contribute to processivity is the thumb domain, which is responsible for binding double-stranded DNA (Firbank et al., 2008; Killelea et al., 2010; Gougel et al., 2012; Bergen et al., 2013). This domain grips DNA tightly and interacts with newly synthesized double strands, implying an important role in DNA translocation. In the absence of DNA, the thumb is highly flexible, and stretches of amino acids are often invisible in apo-enzyme crystal structures. The overall fold of the thumb domain does not differ significantly between archaeal DNA polymerases. However, its location relative to the other domains is quite variable, and motion of the entire domain is observed on DNA binding. Although the amino acid sequences in this domain are similar overall, many individual amino acid differences are seen when Pfu-Pol and Tkod-Pol are compared (**Figure 1D**; supplementary data, Figure S1). However, in the current absence of polymerase–DNA–dNTP ternary structures, it is not easy to correlate differences in processivity between Pfu-Pol and Tkod-Pol with individual amino acids in the thumb region.

**the** *Thermococcales* **order of the archaea. (A)** Close spatial proximity of the forked-point arginines in Tkod-Pol (pdb; 1GCX). The arginines shown in orange are not conserved in Pfu-Pol and are the subject of this study. The arginines shown in blue are present in both Tkod-Pol and Pfu-Pol. **(B)** Amino acid sequence alignment of the "forked-point" arginines and their immediate neighbors. The seven arginines present at the forked-point of *Thermococcus kodakarensis* DNA polymerase are shown in red, and their positions in the polypeptide chain are indicated by the numbers above the sequence. Retention of arginine in other polymerases is indicated in red. The change to an alternative residue at amino acid 381 allows two alternative alignments. For the species list, T = *Thermococcus*, P = *Pyrococcus*. **(C)** Superimposition of Tkod-Pol (main chain in blue) (pdb; 1GCX) and Pfu-Pol (main chain in green) (pdb; 2JGU) near arginine 381. The overlay clearly shows that the spatial equivalent of Tkod-Pol R381 (purple) is Pfu-Pol L381 (red), rather than Pfu-Pol R382 (yellow), which is near Tkod-Pol Q382 (gray). **(D)** Thumb domain of Tkod-Pol bound to DNA in a polymerization mode (pdb; 4K8Z). The amino acids that are different in Pfu-Pol are shown in blue (no direct contact to DNA) or green (direct contact to DNA).

During archaeal replication, processivity is facilitated by PCNA, a sliding clamp that completely encircles the DNA while simultaneously binding to the polymerase (Ishino and Ishino, 2011). Polymerase dissociation from DNA is hindered and overall processivity is enhanced, leading to rapid copying of the genome. In the *Thermococcales*, PCNA is a homotrimeric ring-shaped molecule, and the ring must be opened to load the protein onto DNA. *In vivo*, PCNA is loaded by RFC, an ATP-driven clamp loader (Ishino and Ishino, 2011). However, *in vitro*, e.g., during PCR, PCNA is not usually present, and copying of DNA relies on the intrinsic processivity of the polymerase. PCNA has occasionally been added and has been reported to enhance the PCR capability of Tkod-Pol (Kitabayashi et al., 2002). A recent comprehensive study showed that native Pfu-PCNA improved PCR only if RFC was also present, presumably to facilitate loading of the clamp onto DNA (Ishino et al., 2012). However, two PCNA mutants with reduced ring stability were able to self-load and considerably improved the PCR performance of Pfu-Pol in the absence of RFC. For biotechnology applications, more processive polymerases have been generated by fusion with double-stranded DNA binding proteins. Examples are helix-hairpin-helix motifs (Pavlov et al., 2002) and Sso7d (a thermostable protein from *Sulfolobus solfataricus*) (Wang et al., 2004), the latter giving significant improvement in many PCR applications. Elevated processivity leads to more rapid copying of DNA, and a polymerase with this property should perform better in PCR, i.e., generate product more rapidly or produce longer amplicons. As processivity *in vivo* is largely a function of the sliding clamp, there may be little evolutionary pressure on polymerases to maximize this property, leaving considerable scope for improvement.
In this publication we have sought to rationalize the differences in processivity and PCR ability previously observed between Tkod-Pol and Pfu-Pol. Hybrid polymerases have been produced, consisting mainly of Pfu-Pol but containing amino acids/domains from Tkod-Pol that are suggested to contribute to processivity. Do these hybrids transfer processivity from Tkod-Pol to Pfu-Pol and improve PCR performance? A better understanding of why two structurally similar DNA polymerases have different kinetic properties is intrinsically important and may also guide further rational mutagenesis.

### **MATERIALS AND METHODS**

#### **PROTEIN PURIFICATION AND MUTAGENESIS**

Wild-type Pfu-Pol B (gene inserted into pET17b) was purified as reported (Evans et al., 2000; Emptage et al., 2008). Wild-type Tkod-Pol B (present in pET21a, supplied by Dr. Zvi Kelman, University of Maryland) was purified in an identical manner. The single, double and triple mutants of Pfu-Pol were produced using a QuickChange® site-directed mutagenesis kit (Agilent-Stratagene, Stockport) with Velocity™ DNA polymerase (Bioline, London, UK). The genes encoding the Pfu-Tkod thumb swap derivatives were prepared using overlap extension PCR (Warrens et al., 1997). All mutated genes were completely sequenced to confirm the presence of the desired mutation and the absence of changes elsewhere. All mutants were purified in the same manner as the wild-type enzyme, with SDS-PAGE and Coomassie Blue staining indicating a purity of >95%.

#### **POLYMERASE AND EXONUCLEASE ASSAYS**

Primer-template extensions used the fluorescently labeled primer-templates given in the legends to **Figures 2**, **3** and supplementary data Figures S2, S3. These reactions were conducted at 30°C in 400 μl of 20 mM Tris pH 8.0, 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgSO4, 0.1% (v/v) Triton X-100 and 40 μg of bovine serum albumin. Each dNTP was used at 400 μM with 10 nM primer-template, and reactions were initiated by adding the polymerase to a final concentration of 500 nM. Subsequently, 40 μl aliquots were withdrawn at appropriate time-points and quenched with an equal volume of stop buffer. The stop buffer contained 95% formamide, 10 mM EDTA, 10 mM NaOH, 2 μM of a competitor oligodeoxynucleotide and 0.05% xylene cyanol indicator dye. Primer-templates were denatured by heating at 100°C for 10 min followed by cooling on ice. The competitor, which has the same sequence as the fully extended primer but lacks the fluorophore, prevents significant re-hybridization of the fluorescent products to the template (Russell et al., 2009). The samples (20 μl) were resolved on a denaturing 17% polyacrylamide gel containing 8 M urea, run at 4.5 W for 4.5 h, and visualized using a Typhoon scanner with ImageQuant software (GE Healthcare). Exonuclease assays were performed similarly, except that the dNTPs were omitted. The data were fitted to a first-order reaction, $\%\,\text{substrate remaining} = 100e^{-kt} + \text{offset}$ (where $k$ is the rate constant and $t$ is time), using GraFit (Erithacus Software, London, UK) to obtain rate constants.
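The first-order fit described above can be reproduced without specialized software. The sketch below is an assumed approach and is not GraFit's algorithm. For a fixed k, the least-squares offset is simply the mean residual, so only k needs to be searched. The search uses golden-section search over an assumed bracket, `k_bounds`, and the data in the example are synthetic.

```python
import math


def fit_first_order(times, percent_remaining, k_bounds=(1e-6, 5.0), iterations=200):
    """Least-squares fit of %remaining = 100*exp(-k*t) + offset; returns (k, offset).

    k is in reciprocal units of `times`; k_bounds must bracket the minimum.
    """
    def best_offset(k):
        # For fixed k, the optimal offset is the mean of y - 100*exp(-k*t).
        return sum(y - 100.0 * math.exp(-k * t)
                   for t, y in zip(times, percent_remaining)) / len(times)

    def sse(k):
        c = best_offset(k)
        return sum((y - 100.0 * math.exp(-k * t) - c) ** 2
                   for t, y in zip(times, percent_remaining))

    lo, hi = k_bounds
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden-ratio conjugate, ~0.618
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    for _ in range(iterations):
        if sse(a) < sse(b):   # minimum lies in [lo, b]
            hi, b = b, a
            a = hi - g * (hi - lo)
        else:                 # minimum lies in [a, hi]
            lo, a = a, b
            b = lo + g * (hi - lo)
    k = (lo + hi) / 2.0
    return k, best_offset(k)


# Synthetic example: k = 0.05 per min, with 5% of the substrate unreactive.
times_min = [0, 1, 2, 5, 10, 20, 40, 60]
observed = [100.0 * math.exp(-0.05 * t) + 5.0 for t in times_min]
k, offset = fit_first_order(times_min, observed)
half_life = math.log(2) / k  # about 13.9 min for k = 0.05 per min
```

The half-life ln 2/k is often easier to compare between enzymes than k itself. With real, noisy data, a library routine such as a general non-linear least-squares fitter would also report parameter uncertainties.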

**FIGURE 2 | Elongation of primer-templates by archaeal family-B DNA polymerases.** Gel electrophoretic analysis of primer strand extension observed with: **(A)** Pfu-Pol wild type; **(B)** Pfu-Pol M247R/L381R; **(C)** Pfu-Pol L381R/K502R; **(D)** Tkod-Pol. The primer-template used was 5′-GGGGATCCTCTAGAGTCGACCTGC (primer) annealed to 3′-CCCCTAGGAGATCTCAGCTGGACGACCGTTCGTTCGAACAGAGG (template). The primer was labeled at its 5′-terminus with either cyanine 5 (used with wild-type Pfu-Pol and the double mutants) or fluorescein (used with Tkod-Pol).

#### **REAL TIME PCR**

Real-time PCR used a Rotor-Gene 6000 thermocycler (Corbett Research, Qiagen). Genomic DNA from *S. cerevisiae* was used as the template, with the DNA Pol 2 gene targeted for amplification. A common forward primer (TACGTACCGCCGCAATACAATGGCAGG) was used with four different reverse primers to amplify products of 145, 232, 543, and 1040 base pairs. The reverse primers were TCGAATTGCCGCCGCCATTACTACCAC, TCGACTTGAAGCTCCCACCCTCTTCATC, GGCGTCAACTTTTTCCGAGCCATTTGC and TCATCGAACATGTCCAAGCCGTGAATCTTAC. Reactions were carried out in 25 μl containing 30 ng *S. cerevisiae* genomic DNA (Novagen), 1 μM of each primer, 400 μM of each dNTP, and 2.5 μl of SYBR Green (10,000× stock in dimethylsulfoxide, Invitrogen; initially diluted 1000-fold with water). The reactions were initiated by adding the polymerase (final concentration 20 nM) in the same buffer used for primer-template extensions. The PCR consisted of one step at 95°C for 2 min followed by 40 cycles of 95°C (10 s)/58°C (20 s)/72°C (the time used for the 72°C extension step varied as indicated in the results). On completion of the PCR, a melt curve analysis was carried out. It consisted of a 90 s pre-melt step at 67°C followed by a temperature increase to 95°C at 0.2°C per second. Twenty μl of each real-time PCR mixture was run on a 1% agarose gel (detection with ethidium bromide) to verify that amplification gave a product of the correct size.
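The SYBR Green amount in this protocol follows from simple dilution arithmetic. A 10,000× stock diluted 1000-fold gives a 10× working solution, and 2.5 μl of that in a 25 μl reaction gives 1× final. The small helper below makes the calculation explicit and can be reused for the SYPRO Orange used in the DSF analysis. The name `final_fold` is illustrative.

```python
def final_fold(stock_fold, predilution, volume_added, total_volume):
    """Final 'x' strength of a dye after pre-dilution and addition to a reaction.

    stock_fold:   strength of the commercial stock (e.g. 10,000x)
    predilution:  fold dilution made before use (e.g. 1000-fold)
    volume_added: volume of pre-diluted dye added to the reaction
    total_volume: final reaction volume (same units as volume_added)
    """
    working = stock_fold / predilution
    return working * volume_added / total_volume


print(final_fold(10_000, 1000, 2.5, 25))  # SYBR Green, real-time PCR: 1.0
print(final_fold(5_000, 100, 10, 100))    # SYPRO Orange, DSF: 5.0
```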

#### **PCR**

Amplification of a stretch of DNA ~5 kb long within the plasmid pET17b[Pfu-Pol] (Evans et al., 2000) was carried out in 50 μl. The reaction used TCTGCTATGTGGCGCGGTATTATCC and CAACTCAGCTTCCTTTCGGGCTTTG (1 μM each) as primers, 400 μM of each of the 4 dNTPs and either 20 or 100 nM of polymerase. Fifty ng of pET17b[Pfu-Pol] was used as template with two buffers, either 20 mM Tris-HCl pH 8 or 20 mM Bicine-NaOH pH 9. Both buffers contained 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgSO4, 0.1% (v/v) Triton X-100 and 5 μg bovine serum albumin. The PCR cycle comprised: pre-heat at 98°C (2 min); 30 cycles of 95°C (30 s), 60°C (30 s), 70°C (5 min); final hold at 70°C for 5 min. Twenty μl of each PCR mixture (expected size of the correct amplicon ~5 kb) was analyzed by 1% agarose gel electrophoresis with ethidium bromide staining.

#### **POLYMERASE FIDELITY**

Fidelities were determined using pSJ2, an assay based on *in vitro* gap-filling, by a polymerase, of the single-stranded *lacZ*α segment within the plasmid. The gapped derivative of pSJ2 (1 nM) was fully extended using the polymerase (100 nM) in 20 μl of 20 mM Tris (pH 8.0), 10 mM KCl, 10 mM (NH4)2SO4, 2 mM MgCl2, 0.1% (v/v) Triton X-100, 2 μg bovine serum albumin and 250 μM of each of the four dNTPs. Extension was carried out at 70°C for 30 min. The mixture was then used to transform *E. coli* Top 10 cells, which were plated on LB agar (containing X-gal, IPTG and ampicillin) and scored for blue/white colonies. Ratios of blue/white colonies were converted to mutation frequency and error rate as previously described (Jozwiakowski and Connolly, 2009; Keith et al., 2013).
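The conversion from colony counts to error rate is only cited here. The sketch below assumes a form commonly used for lacZα gap-filling assays, and the exact correction factors should be taken from Jozwiakowski and Connolly (2009) and Keith et al. (2013). In this form, mutation frequency (MF) is mutant colonies divided by total colonies, and the error rate is (MF − background MF)/(D × P). D is the number of detectable sites in the target, and P is the probability that a mutant strand is expressed. The function names and all numbers below are placeholders, not values from this study.

```python
def mutation_frequency(mutant_colonies, total_colonies):
    """Fraction of colonies with the mutant lacZ-alpha phenotype."""
    return mutant_colonies / total_colonies


def error_rate(mf, detectable_sites, expression_probability, background_mf=0.0):
    """Errors per nucleotide copied, assuming ER = (MF - background) / (D * P)."""
    return (mf - background_mf) / (detectable_sites * expression_probability)


# Placeholder numbers only: D and P here are NOT the values used in this study.
mf = mutation_frequency(30, 10_000)
er = error_rate(mf, detectable_sites=100, expression_probability=0.5)
```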

#### **PRIMER-TEMPLATE BINDING**

The binding of the polymerases to primer-templates was determined by fluorescence anisotropy with hexachlorofluorescein-labeled DNA (Shuttleworth et al., 2004). The buffer conditions are detailed in the supplementary data (Figure S6).
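The anisotropy fitting itself is not described here. The sketch below shows one common way titrations of this kind are interpreted, assuming a 1:1 binding model; this is not necessarily the analysis the authors used. The fraction of labeled primer-template bound follows the quadratic solution that accounts for ligand depletion. The observed anisotropy is then the bound-fraction-weighted mean of the free and bound values. `r_free`, `r_bound` and `kd` are hypothetical fit parameters.

```python
import math


def fraction_bound(protein_total, dna_total, kd):
    """Fraction of labeled DNA in a 1:1 complex, allowing for ligand depletion.

    All three arguments must be in the same concentration units.
    """
    b = protein_total + dna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def expected_anisotropy(protein_total, dna_total, kd, r_free, r_bound):
    """Anisotropy as the bound-fraction-weighted mean of free and bound values.

    Assumes the fluorophore is equally bright in the free and bound states.
    """
    fb = fraction_bound(protein_total, dna_total, kd)
    return r_free + (r_bound - r_free) * fb


# Hypothetical titration: 5 nM DNA, Kd = 20 nM, r_free = 0.05, r_bound = 0.20.
curve = [expected_anisotropy(p, 5.0, 20.0, 0.05, 0.20) for p in (0, 10, 50, 200)]
```

Fitting would adjust `kd`, `r_free` and `r_bound` until the computed curve matches the measured titration. If binding changes the dye's brightness, the simple weighted mean has to be corrected.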

#### **PROCESSIVITY**

Four hundred μl of 20 mM Tris pH 8.0, 10 mM KCl, 10 mM (NH4)2SO4, 1 mM EDTA, 0.1% Triton X-100, 40 μg of bovine serum albumin, 400 μM of each of the four dNTPs, 40 nM primer-template (sequence given in the legend to **Figure 7**) and 500 nM of polymerase were pre-incubated at 50°C for 5 min. Reactions were initiated by the simultaneous addition of 3 mM MgSO4 and 10 μM of a uracil-rich oligodeoxynucleotide trap (sequence given in the legend to **Figure 7**), and polymerization was allowed to proceed at 50°C. Aliquots of 40 μl were withdrawn at appropriate times, and extension was determined by denaturing gel electrophoresis as described above.
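Because the trap captures any polymerase that dissociates, the band pattern after trap addition reflects single binding events. One common way to summarize such a pattern is shown below; this is an assumed analysis, since the authors' own quantification is not described in this section. It computes the mean number of nucleotides added and, position by position, the probability of adding one more nucleotide. `intensities` is a hypothetical list of background-corrected band intensities, indexed by the number of nucleotides added, with the unextended primer excluded.

```python
def mean_extension(intensities):
    """Average number of nucleotides added per binding event.

    intensities[i] is the band intensity for primers extended by i + 1 nucleotides.
    """
    total = sum(intensities)
    return sum((i + 1) * v for i, v in enumerate(intensities)) / total


def continuation_probabilities(intensities):
    """Chance, at each length, that a polymerase that got this far adds another base.

    p_i = (signal in longer products) / (signal at this length or longer)
    """
    probs = []
    for i in range(len(intensities)):
        at_or_beyond = sum(intensities[i:])
        beyond = sum(intensities[i + 1:])
        probs.append(beyond / at_or_beyond if at_or_beyond > 0 else 0.0)
    return probs
```

The last band includes fully extended products, which stop at the end of the template rather than by dissociation. The final probability is therefore an artifact of template length, not a property of the enzyme.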

#### **DSF ANALYSIS**

The thermal stabilities of the polymerases were analyzed by heating samples in a Rotor-Gene 6000 (Corbett Research, Qiagen) in the presence of SYPRO Orange and measuring any increase in fluorescence. Thermal melts were carried out in 100 μl of 40 mM HEPES, pH 7, containing 400 mM NaCl, 2 M guanidinium hydrochloride and 10μl of SYPRO Orange (5000 × stock in dimethylsulfoxide, Sigma; initially diluted 100-fold with water). The polymerases were used at a concentration of 2μM. Excitation and emission were at 470 nm and 555 nm, respectively. The temperature was increased from 25 to 100◦C at a rate of 1◦C per min. Data analysis was carried out with the Rotor-Gene 6000 series software and the melting profiles are presented as first derivatives.
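A *Tm* is read from a first-derivative profile as the temperature of maximal *dF*/*dT*. Here that step was done in the Rotor-Gene software; the Python sketch below is an independent illustration. It assumes evenly sampled readings, uses central finite differences, and is run on a synthetic two-state (sigmoidal) melt centered on the Tkod-Pol first-transition *Tm* of 82.9◦C.

```python
import math


def first_derivative(temps, signal):
    """Central-difference dF/dT at interior points; returns (temps, derivatives)."""
    mids, derivs = [], []
    for i in range(1, len(temps) - 1):
        mids.append(temps[i])
        derivs.append((signal[i + 1] - signal[i - 1]) / (temps[i + 1] - temps[i - 1]))
    return mids, derivs


def tm_from_peak(temps, signal):
    """Temperature of maximal dF/dT (no sub-step peak refinement)."""
    mids, derivs = first_derivative(temps, signal)
    i_max = max(range(len(derivs)), key=derivs.__getitem__)
    return mids[i_max]


# Synthetic melt sampled every 0.1 degrees from 25 to 100 C.
temps = [25 + 0.1 * i for i in range(751)]
fluor = [1.0 / (1.0 + math.exp(-(t - 82.9) / 1.5)) for t in temps]
print(round(tm_from_peak(temps, fluor), 1))  # 82.9
```

For profiles with two transitions, such as Tkod-Pol, the search would need to be restricted to a temperature window around the first peak, since only the first transition is compared in this work.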

### **RESULTS**

#### **MANIPULATING AMINO ACIDS AND DOMAINS OF Pfu-Pol TO MAKE IT MORE SIMILAR TO Tkod-Pol**

Tkod-Pol contains seven arginines at the forked-point, four of which are replaced in Pfu-Pol (**Figures 1A,B**). The missing arginines have been investigated by introducing them into Pfu-Pol, initially as the single amino acid substitutions Pfu-PolM247R, T265R, and K502R. The situation at Tkod-Pol position 381 is more complex, due to the insertion of an additional leucine at position 381 in Pfu-Pol (**Figure 1B**). Therefore, two Pfu-Pol variants have been created, corresponding to the potential sequence alignments shown in **Figure 1B**. Deletion of L381, to give Pfu-PolΔL381, removes the extra amino acid, which may bring R382 in Pfu-Pol into register with R381 in Tkod-Pol. Additionally, the direct substitution mutant Pfu-PolL381R, which corresponds more closely to the structural data (**Figure 1C**), has been prepared. The insertion of L381 also means that lysine 502 in Pfu-Pol corresponds to arginine 501 in Tkod-Pol. Several of the single amino acid modifications improved the ability of Pfu-Pol to copy DNA (next section). To generate incremental increases in polymerization activity, two double mutants, Pfu-PolM247R/L381R and Pfu-PolL381R/K502R, and a triple variant, Pfu-PolM247R/L381R/K502R, were prepared.

A second type of Pfu-Pol mutant has the thumb domain (amino acids T591 to S775 at the carboxyl terminus) replaced with the corresponding region from Tkod-Pol (named Pfu-TkodTS, TS = thumb swap). There are multiple differences between the amino acid sequences of the thumb domains of the two polymerases (**Figure 1D**; supplementary data, Figure S1). Of the amino acids in the Tkod thumb that contact DNA directly, only two, R709 and G711 (shown in green in **Figure 1D**), are changed in Pfu-Pol, to proline and serine, respectively. However, many Tkod residues immediately adjacent to a DNA-contacting amino acid also vary in Pfu-Pol (supplementary data, Figure S1). The numerous sequence variations would make probing the contribution of individual amino acids a sizeable task, so the complete thumb was transplanted instead. Finally, the most advantageous "forked-point" double mutant (L381R/K502R) has been combined with the thumb domain swap.

#### **PRIMER-TEMPLATE EXTENSION BY DNA POLYMERASES**

Polymerase activities were initially compared using several primer-templates in a simple extension assay that indicates how effectively DNA is copied. The single amino acid substitutions Pfu-Pol M247R and K502R generated full-length product more rapidly than wild type (supplementary data, Figure S2). At amino acid position 381, where two mutations were made, L381R appeared slightly better than ΔL381 and both were superior to wild type. Pfu-Pol T265R was the only substitution giving poorer extension (supplementary data, Figure S3). Building on the single substitutions, two double mutants (M247R/L381R and L381R/K502R) were created; both extended the primer-template significantly more rapidly than wild type and slightly faster than their single-mutant parents. Faster extension with the double mutants was consistently observed with three primer-templates (**Figures 2**, **3**; supplementary data, Figure S2). As can be seen in **Figure 2**, wild type Pfu-Pol required 10–20 min before full-length product became visible, whereas completely extended primer was apparent after 1–2 min with both double mutants. An alternative primer-template (**Figure 3**) again indicated more rapid extension with the double mutants (compare the five and twenty minute time points for wild type, M247R/L381R and L381R/K502R in **Figure 3**). This figure additionally demonstrates that the triple mutant (M247R/L381R/K502R) shows little additional improvement. Although Pfu-Pol M247R/L381R and L381R/K502R are superior to wild type, they do not match the performance of Tkod-Pol, where full extension is apparent in less than 10 s (**Figures 2**, **3**). The assay also shows that Pfu-TkodTS extends primer-template more rapidly than wild type (**Figure 3**). With Pfu-TkodTS the reaction is almost complete after 5 min, whereas little full-length product is seen with the wild type at this time.
Even faster extension was seen with Pfu-TkodTSL381R/K502R, the reaction finishing in 1–2 min. Extensions were carried out at 30◦C, well below the temperature optima of about 75◦C for Pfu-Pol and Tkod-Pol (Takagi et al., 1997). Possibly, as temperature decreases, the activity of Pfu-Pol declines more steeply than that of Tkod-Pol, which would explain the superior performance of Tkod-Pol at 30◦C. However, Tkod-Pol maintained its advantage in both real time and standard PCR (**Figures 4**, **5**, respectively), techniques that involve DNA synthesis near the temperature optimum. Based on primer-template extensions, four mutants, M247R/L381R, L381R/K502R, Pfu-TkodTS and Pfu-TkodTSL381R/K502R, were selected for more detailed investigation.

#### **REAL TIME PCR ANALYSIS**

A key motivation in manipulating Pfu-Pol was to produce superior PCR enzymes and, to allow direct identification of useful mutants, priority was given to PCR-based assays. Initially real time PCR (RT-PCR), commonly used to measure DNA and RNA levels (Saunders and Lee, 2013), was applied to investigate polymerase performance. During RT-PCR a *Ct* value defines the number of cycles taken for product to become detectable; the more effective the polymerase, therefore, the lower the *Ct*. In these experiments, yeast genomic DNA targets of 232, 543, and 1040 bases were amplified and the products detected using the fluorophore SYBR green. With the shortest amplicon (232 bases), Pfu-Pol and the double mutant L381R/K502R generated product after a similar number of cycles (*Ct* 15.30 and 14.93, respectively) and M247R/L381R was slightly slower (*Ct* 15.96). Here, the behavior of M247R/L381R is anomalous, as it appeared superior to the wild type in every other test of polymerization ability. Tkod-Pol produced product more rapidly than the wild type, i.e., had a lower *Ct*, and the two thumb swap mutants showed intermediate *Ct* values. As an example, the real time PCR data obtained with the short amplicon are shown in **Figure 4**; all *Ct* values are summarized in **Table 1**. Further information was obtained using a longer amplicon of 543 bases and varying the extension time.
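A *Ct* is the (fractional) cycle at which the amplification curve first crosses a fluorescence threshold. Instrument software sets the baseline and threshold in its own way; the Python sketch below assumes a baseline-corrected curve and a user-chosen threshold, and locates the crossing by linear interpolation between the two flanking cycles.

```python
def threshold_cycle(fluorescence, threshold):
    """Fractional cycle at which fluorescence first reaches the threshold.

    fluorescence[0] is the reading after cycle 1. Returns None if the
    threshold is never reached (no detectable product).
    """
    for i, value in enumerate(fluorescence):
        if value >= threshold:
            cycle = i + 1
            if i == 0:
                return float(cycle)
            prev = fluorescence[i - 1]  # below threshold, so value > prev
            # Linear interpolation between the previous cycle and this one.
            return (cycle - 1) + (threshold - prev) / (value - prev)
    return None


curve = [1, 1, 1, 2, 4, 8, 16]     # synthetic, roughly doubling per cycle
print(threshold_cycle(curve, 6))   # 5.5
```

Interpolating gives fractional values such as those in Table 1 (e.g., 14.93); instruments may interpolate on a log scale instead, which gives slightly different numbers for the same curve.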

The double mutants appeared superior to Pfu-Pol, either giving a product where none was produced by the parent or generating amplicons more quickly; L381R/K502R performed better than M247R/L381R. Greater efficiency was again apparent for Tkod-Pol, which generated products more rapidly as measured by *Ct* values. Similarly, the thumb swap mutants performed better than Pfu-Pol (wild type and the double mutants) but were inferior to Tkod-Pol (**Table 1**). These trends were confirmed with the longest amplicon (1040 bases) at four different extension times. Overall, as summarized in **Table 1**, Tkod-Pol gives the best performance in RT-PCR followed, in order, by Pfu-TkodTSL381R/K502R, Pfu-TkodTS, Pfu-PolL381R/K502R and Pfu-PolM247R/L381R, with wild type Pfu-Pol the worst enzyme. This ranking is an excellent match to the results seen in primer-template extensions. While RT-PCR is a straightforward method for characterizing polymerases, incorrect products are often produced. Therefore, the integrities of the 232, 543, and 1040 base products were checked using melting temperature analysis (all three had similar *Tm* values of between 86 and 88◦C; for an example see **Figure 4**) and by gel electrophoresis to confirm the presence of product with the expected length (supplementary data, Figure S4). In **Table 1**, *Ct* values are only quoted if the anticipated product comprised at least 95% of the amplification mixture.

#### **PCR PERFORMANCE OF POLYMERASES**

To determine if the more rapid primer-template extensions and lower *Ct* values exhibited by the Pfu-Pol mutants translated into improved standard PCR performance, amplifications of a 5 kb stretch of DNA were carried out. Two buffers, at pH 8 and 9, were used and the polymerases were tested at two concentrations, 20 or 100 nM. The results are given in **Figure 5** and show that wild type Pfu-Pol failed to generate substantial amounts of product, traces being apparent only at pH 9 with 20 nM enzyme. The double mutants (M247R/L381R and L381R/K502R) were better; at 20 nM both gave obvious product at pH 9, and L381R/K502R also yielded product at pH 8. However, at the higher concentration of 100 nM the double mutants gave smeared bands, suggesting non-specific amplification. The thumb swaps showed marked PCR improvement: at both pH values and protein concentrations a clear and (apart from pH 8 at 20 nM) intense product band was visible, suggesting efficient amplification. Some non-specific products were apparent, particularly at pH 9. Wild type Tkod-Pol also performed well, perhaps marginally better than the two thumb swap mutants. Non-specific smaller products were also observed with Tkod-Pol at pH 9. The PCR results concur with all those above: the double mutants are superior to wild type Pfu-Pol and the thumb swaps better still. In these experiments the thumb swaps approach the performance of wild type Tkod-Pol.

#### **FIDELITY OF DNA POLYMERASES**

DNA polymerase accuracy is crucial for PCR and the main reason archaeal enzymes are widely applied. Two techniques were used to check the fidelity of the enzymes. Initially, proofreading exonuclease activity was directly measured using a primer-template containing a base mismatch at the point of extension. A protein concentration (500 nM) in excess of primer-template (10 nM) was used and exonuclease activity observed by monitoring the removal of the mismatched base at the 3′-end of the primer. All Pfu-Pol mutants demonstrated slightly faster exonuclease rate constants than wild type, although the increases were at most a factor of two (**Figure 6**, **Table 2**). Notably, wild type Tkod-Pol showed more pronounced activity, about 7.5 times faster than Pfu-Pol. The plots shown in **Figure 6** and the rate constants in **Table 2** measure the disappearance of the initial primer and so report on removal of the mismatched base. Subsequent degradation of the primer clearly also occurs, as a consequence of exonuclease activity at Watson-Crick base pairs. With Tkod-Pol, accumulation of the n-1 product is clear, suggesting preferential degradation of mismatched bases. Equivalent accumulation is less obvious with Pfu-Pol, implying less discrimination between mismatches and *bona fide* base pairs in this case. To investigate further, the proofreading activities of Pfu-Pol and Tkod-Pol were compared using a fully complementary primer-template. With this undamaged substrate the exonuclease activity of Tkod-Pol was only 1.5-fold faster than that of Pfu-Pol, less than the difference observed with the mismatched substrate (supplementary data, Figure S5). The faster exonuclease rate of Tkod-Pol, compared to Pfu-Pol, with mismatches, followed by near equal reactivity at normal bases, accounts for the n-1 product accumulation seen in **Figure 6**. Exonuclease activities were greater with mismatched than with fully complementary substrates (**Table 2**), an expected observation as mismatches are easier to unwind, a necessary step for proofreading (Reha-Krantz, 2010).

*aYeast genome DNA was the target for amplification and primers (given in Materials and Methods) were selected to give the amplicon lengths indicated.*

*bThe figures in the table represent the Ct value, the number of cycles required for the amplicon to become detectable. All experiments were conducted in triplicate with the Ct being the average of the three runs. In all cases where a figure is quoted, melting temperature analysis revealed a product with a Tm value of between 86 and 88◦C and gel electrophoresis showed an amplicon of the correct length (Figure 4; supplementary data, Figure S4). In all cases the anticipated product comprised at least 95% of the total amplified material.*

*cNP, no product; either no product was produced or non-specific amplification occurred, giving either an incorrect product or a mixture of amplicons containing both the desired and non-specific products.*
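Rate constants for primer degradation, such as those in Table 2, are typically obtained by fitting a single-exponential decay to the fraction of intact primer remaining over time. The fitting procedure behind Table 2 is not described in the text; the Python sketch below assumes first-order kinetics, f(t) = exp(−kt), and estimates k by least squares on ln f with the line forced through the origin, using synthetic data.

```python
import math


def first_order_rate(times, fraction_remaining):
    """Least-squares k for ln f = -k t (line through the origin).

    Assumes single-exponential decay to zero; points with f <= 0 are
    skipped because the logarithm is undefined.
    """
    num = den = 0.0
    for t, f in zip(times, fraction_remaining):
        if f > 0:
            num += t * math.log(f)
            den += t * t
    return -num / den


times = [0, 15, 30, 60, 120]                      # seconds (synthetic)
remaining = [math.exp(-0.02 * t) for t in times]  # generated with k = 0.02 per s
k = first_order_rate(times, remaining)
print(f"k = {k:.3f} s^-1, half-life = {math.log(2) / k:.1f} s")
```

A fold-difference such as the ~7.5-fold between Tkod-Pol and Pfu-Pol is then simply the ratio of the two fitted k values. Fitting the log-transformed data weights late, low-signal points heavily, so a direct nonlinear fit is usually preferred for real measurements.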

To complement the measurements of exonuclease activity, fidelities were determined using a *lacZ*α plasmid-based assay (Jozwiakowski and Connolly, 2009; Keith et al., 2013). A gapped plasmid, which contains the *lacZ*α gene in the single-stranded region, is fully extended *in vitro* at 70◦C by a polymerase and then used to transform an appropriate *E. coli* strain. The ratio of white to blue colonies on plates containing the indicator X-gal reflects the fidelity of the polymerase. For these experiments pSJ2 was used, a plasmid that has been characterized sufficiently to allow conversion of the mutation frequency (the proportion of white colonies) into an error rate, the number of mistakes made per base incorporated (Keith et al., 2013). The results are given in **Table 3**, which shows near identical error rates for each of the polymerases. All the mutants maintain the high accuracy associated with wild type Pfu-Pol and Tkod-Pol. Although Tkod-Pol has a measurably higher exonuclease activity than all the Pfu-Pol derivatives (**Table 2**), this does not translate into higher accuracy as determined using pSJ2. Fidelity critically depends on the balance between exonuclease and polymerase rates, and with Tkod-Pol both are elevated compared with Pfu-Pol. The ratio of polymerase to exonuclease activity may be similar for both enzymes, giving an equal propensity to continue synthesis rather than engage proofreading and conferring similar overall fidelity on both proteins.

#### **PRIMER-TEMPLATE BINDING**

The influence of the mutations on primer-template binding was determined using fluorescence anisotropy with hexachlorofluorescein-labeled primer-templates. Previous studies, carried out at pH 7.5 and 100 mM NaCl (ionic strength = 100 mM), indicated relatively poor binding of Pfu-Pol to primer-templates, a *KD* of 270 nM being measured (Shuttleworth et al., 2004). As expected, binding affinity was increased at the lower salt concentration of 20 mM KCl (ionic strength = 20 mM), reflected in a *KD* of 32 nM (Richardson et al., 2013). In this publication binding titrations were performed in the same buffer used for extensions, pH 8.5 containing 10 mM KCl and 10 mM (NH4)2SO4 (ionic strength = 40 mM). Under these conditions wild type Pfu-Pol bound the primer-template with a *KD* of 251 nM (supplementary data, Figure S6). Tkod-Pol showed similar affinity (*KD* = 276 nM), so the more rapid primer-template extensions seen with this enzyme cannot simply be accounted for by tighter binding to the DNA substrate. The two double mutants bound DNA more tightly, by a factor a little greater than two for L381R/K502R and just over three for M247R/L381R (supplementary data, Figure S6). Surprisingly, Pfu-TkodTS bound primer-template (*KD* = 95 nM) more strongly than either of the parent polymerases from which it is derived, and a further improvement was observed with Pfu-TkodTSL381R/K502R (*KD* = 50 nM). All the *KD* values are summarized in **Table 2**.
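The ionic strengths quoted above follow from I = ½ Σ cᵢzᵢ² summed over all ions. The Python sketch below reproduces them, assuming complete dissociation and ignoring the contribution of the Tris buffer and other minor components, which the quoted values also appear to leave out.

```python
# Each salt is described by its ions as (ions per formula unit, ion charge).
KCL = [(1, +1), (1, -1)]
NACL = [(1, +1), (1, -1)]
AMMONIUM_SULFATE = [(2, +1), (1, -2)]   # (NH4)2SO4 -> 2 NH4+ + SO4(2-)


def ionic_strength_mm(salts):
    """I = 1/2 * sum(c_i * z_i**2) in mM, assuming full dissociation.

    salts is a list of (salt concentration in mM, ion description) pairs.
    """
    total = 0.0
    for conc_mm, ions in salts:
        for count, charge in ions:
            total += conc_mm * count * charge ** 2
    return total / 2


print(ionic_strength_mm([(100, NACL)]))                       # 100.0
print(ionic_strength_mm([(20, KCL)]))                          # 20.0
print(ionic_strength_mm([(10, KCL), (10, AMMONIUM_SULFATE)]))  # 40.0
```

The divalent sulfate ion explains why 10 mM (NH4)2SO4 alone contributes 30 mM, three times its molar concentration.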

#### **PROCESSIVITY OF DNA POLYMERASES**

The processivity of a polymerase is the number of dNTPs incorporated per binding event (Von Hippel et al., 1994; Bambara et al., 1995). For accurate measurement each primer should undergo only one round of extension and, under these single-hit conditions, the number of dNTPs incorporated equals the processivity. Single hits can be achieved using a low concentration of polymerase, relative to primer-template, such that the probability of secondary initiation is low. Alternatively, a trap, such as heparin or poly(dA-dT), can be used to sequester the polymerase after it dissociates from the primer-template (Bambara et al., 1995). Archaeal DNA polymerases bind tightly to uracil-containing DNA (Shuttleworth et al., 2004), enabling a 23-mer containing 5 uracil residues (**Figure 7**) to be used as a trap. The uracil-rich trap bound all polymerase variants with *KD* values in the 1–10 nM range (data not shown). For processivity determination, 40 nM primer-template (sequence given in the legend to **Figure 7**) and 500 nM polymerase were pre-incubated in the absence of Mg²⁺ and reactions initiated by the simultaneous addition of the metal and 10 μM of the trap. These stringent conditions, a 250-fold excess (relative to primer-template) of a uracil-rich oligodeoxynucleotide with high affinity for the polymerase, should ensure efficient polymerase sequestration and inhibit re-binding to primer-template.
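In this work processivity was read from the gels by inspection. As an illustrative alternative only, the Python sketch below summarizes hypothetical, background-corrected band intensities for the +1, +2, ... products under single-hit conditions. It computes the intensity-weighted mean number of dNTPs added per extended primer and a microscopic processivity Pᵢ, the probability that a polymerase at position i adds another nucleotide rather than dissociating, in the spirit of the band-intensity analyses reviewed by Bambara et al. (1995). It also checks the 250-fold trap excess.

```python
def mean_extension(intensities):
    """Intensity-weighted mean dNTPs added, over extended primers only.

    intensities maps n (dNTPs added, n >= 1) to band intensity.
    """
    total = sum(intensities.values())
    return sum(n * i for n, i in intensities.items()) / total


def microscopic_processivity(intensities):
    """P_i = (signal beyond position i) / (signal at or beyond position i)."""
    positions = sorted(intensities)
    result = {}
    for pos in positions[:-1]:          # undefined for the longest product
        at_or_beyond = sum(intensities[n] for n in positions if n >= pos)
        beyond = sum(intensities[n] for n in positions if n > pos)
        result[pos] = beyond / at_or_beyond
    return result


# Hypothetical band intensities for a trap-containing reaction.
bands = {1: 10.0, 2: 5.0, 3: 5.0}
print(mean_extension(bands))             # 1.75
print(microscopic_processivity(bands))   # {1: 0.5, 2: 0.5}

# Trap excess used here: 10 uM (10,000 nM) trap over 40 nM primer-template.
print(10_000 / 40)                       # 250.0
```

Reporting a distribution (or a range such as 1–3) rather than a single integer reflects the point made below that polymerases show a spread of incorporations per binding event.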

All polymerases were observed to have very low processivities at a 250-fold trap excess, and nearly identical results were seen with a 100-fold excess (data not shown). Pfu-Pol acted largely in a distributive manner, with little product seen at short time intervals (**Figure 7A**; larger versions of all processivity gels are given in the supplementary data, Figure S7). A very faint band representing a single dNTP addition is seen after 2 min and +1 and +2 bands are visible after 5 min. Longer products were observed at 15 and 30 min but may result from multiple binding events. Thus the uracil trap is not perfect and at longer times polymerase "escape" is seen, possibly arising from trap degradation by the exonuclease activity. As expected, polymerization was more extensive when the trap was omitted (**Figure 7A**). Adding the trap in the pre-incubation step abolishes the barely extended bands seen at short times; instead, only slower-mobility bands are observed after prolonged incubation, which correspond well with the similar bands observed when the uracil trap is used in the standard manner. Pfu-Pol has a processivity of 1 (values are summarized in **Table 2**), implying dissociation prior to adding even a single dNTP is highly likely. With Tkod-Pol a prominent band, corresponding to the incorporation of 3 dNTPs, is seen at the shortest time (5 s) in the presence of the trap, representing the processivity (**Figure 7E**, **Table 2**). Tkod-Pol is more active than Pfu-Pol and, even in the presence of the trap, full-length product was seen at short times, presumably due to multiple binding events. As expected, these bands were more prominent when the trap was left out. Pronounced exonuclease activity, evidenced by shortened primer products, was also apparent. The four mutants showed processivity profiles close to those of the wild types (**Figures 7B–D**, **Table 2**). The double mutant M247R/L381R was very similar to Pfu-Pol, with a faint band representing incorporation of a single dNTP just visible at the one minute time point. L381R/K502R appeared marginally better: the band indicating one dNTP addition was predominant at 1 and 2 min, but fainter bands corresponding to the +2 and +3 products were also seen. The processivity of this mutant, 1–3, may therefore be slightly improved.
Processivity need not be a single integral value; rather, polymerases show a spread of dNTP incorporations per binding event. Pfu-TkodTS also demonstrated a processivity of between 1 and 3 (**Figure 7C**). Finally, Pfu-TkodTSL381R/K502R appeared to be equivalent to Tkod-Pol, with a single band at the +3 location visible at short times. Overall, Tkod-Pol has higher processivity than Pfu-Pol (3 vs. 1) and the mutations shift values toward Tkod-Pol.

*aThe affinity for DNA measured using fluorescence anisotropy titrations (supplementary data, Figure S6). Each value is the average of three determinations ± the standard deviation. The primer-template used had the following sequence (Hex = hexachlorofluorescein): 5′-Hex-GGGGATCCTCTAGAGTCGACCTGC-3′ / 3′-CCCCTAGGAGATCTCAGCTGGACGACCGTTCGTTCGAACAGAGG-5′ (the mismatched bases are underlined).*

*bRate observed for the degradation of a mispaired primer-template, determined with polymerase in excess of DNA (Figure 6). Each value is the average of three determinations ± the standard deviation. For the mispaired DNA, the same sequence as given above was used with the underlined dG changed to dA, giving a dC:dA mismatch at the primer-template junction. Fluorescein (Flu) was used as indicator dye.*

*cRate observed for the degradation of a fully base-paired primer-template, determined with polymerase in excess of DNA (supplementary data, Figure S5). Each value is the average of three determinations ± the standard deviation. This primer-template has the exact sequence given above and fluorescein (Flu) was used as indicator dye.*

*dProcessivity (number of dNTPs incorporated per binding event) of the polymerases measured using a uracil-containing single-stranded DNA trap (Figure 7).*

*eMelting temperatures were determined by DSF in the presence of 2 M guanidinium hydrochloride (Figure 8). The Tm of the first transition observed is given as an average ± standard deviation from three measurements.*

**Table 3 | Fidelities of DNA polymerases determined using the** *lacZα* **indicator pSJ2.**

*aSum of three independent experiments, each consisting of five repeats.*

*bThe mutation frequency is the ratio of mutant (white) colonies to total colonies and has been corrected by subtracting the background mutation frequency of 1.1 × 10⁻⁴ found for gapped pSJ2 (Keith et al., 2013).*

*cThe error rate is the number of mistakes made by the polymerase per base incorporated. The determination of the error rate from the mutation frequency has been described previously (Keith et al., 2013).*

*dData taken from an earlier publication (Keith et al., 2013).*

The processivities determined here, using a uracil-rich oligodeoxynucleotide trap, are much lower than previously observed. Values of 270 and >300 have been reported for Tkod-Pol and of 6, <20 and 80 for Pfu-Pol (Takagi et al., 1997; Wang et al., 2004; Kim et al., 2007). None of these studies used a trap to hinder re-binding of the polymerase and secondary extension, and the change in protocol may explain the discrepancy. All studies agree, however, that Tkod-Pol is more processive than Pfu-Pol. Yeast DNA polymerase δ, similar in structure to the archaeal enzymes (Swan et al., 2009), shows a processivity of 2–3 when evaluated using a trap (Hogg et al., 2014).

#### **THERMOSTABILITY OF DNA POLYMERASES**

The ability to survive elevated temperature is an essential polymerase feature for the PCR. Both archaea from which the polymerases used in this study are isolated are hyperthermophiles, but their preferred growth temperatures differ. *Pyrococcus furiosus* grows optimally at 100◦C and can survive at 110◦C, whereas *Thermococcus kodakarensis* grows best at 85◦C and can tolerate 94◦C (Fiala and Stetter, 1986; Borges et al., 2010). Our group earlier measured the thermostability of Pfu-Pol using differential scanning fluorimetry (DSF), where the protein is subjected to a steady increase in temperature in the presence of the dye SYPRO Orange (Killelea and Connolly, 2011). As thermally-induced unfolding takes place the dye binds to exposed hydrophobic regions, resulting in an increase in fluorescence that can be measured using a real time PCR apparatus. Pfu-Pol was found to be extremely thermostable, with the unfolding transition incomplete at 100◦C, the maximum achievable with this technique. Addition of guanidinium hydrochloride (GuHCl) destabilizes Pfu-Pol, bringing thermal unfolding within reach. A Pfu-Pol mutant that lacked two disulphide bridges, and was more heat sensitive, demonstrated two well separated melting transitions. Most likely the wild type behaves similarly but the second transition cannot be observed as it takes place above 100◦C (Killelea and Connolly, 2011). In the present study 2 M GuHCl was used to bring heat-induced unfolding into the DSF range and the melting profiles are shown in **Figure 8**. Tkod-Pol shows two melting transitions, the first of which has a *Tm* of 82.9◦C (all *Tm* values are summarized in **Table 2**). Pfu-Pol is more stable, with a *Tm* of 93.2◦C. As postulated earlier, a second melting event for Pfu-Pol probably takes place above 100◦C; therefore, only the first observed transitions have been used for comparing unfolding.
The *Tm* of 93.2◦C is slightly higher than the 89.7◦C observed earlier for Pfu-Pol in 2 M GuHCl (Killelea and Connolly, 2011). This variation may be accounted for by the different NaCl concentrations used, 400 mM here and 200 mM previously. Unsurprisingly, the double mutants of Pfu-Pol, M247R/L381R (*Tm* = 95.7◦C) and L381R/K502R (*Tm* = 94.5◦C), retain the thermostability of the wild type, if anything being slightly more heat resistant. More unexpectedly, the thumb swaps, in which about 24% of the protein is derived from Tkod-Pol, fully retain the stability of Pfu-Pol, with no lowering of *Tm* toward that of Tkod-Pol (**Figure 8** and **Table 2**).
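The figure of about 24% follows from the boundaries of the swapped region (T591 to S775, with S775 the carboxyl-terminal residue of Pfu-Pol). A quick Python check, assuming inclusive residue numbering, a 775-residue Pfu-Pol (implied by S775 being the C-terminus), and ignoring any length difference between the Pfu and Tkod thumbs:

```python
def swapped_fraction(first_residue, last_residue, protein_length):
    """Fraction of a protein contained in an inclusive residue range."""
    return (last_residue - first_residue + 1) / protein_length


frac = swapped_fraction(591, 775, 775)   # 185 of 775 residues
print(f"{frac:.1%}")  # 23.9%
```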

#### **DISCUSSION**

The use of the PCR across biological, medical, veterinary, agricultural and forensic sciences has aroused considerable interest in thermostable DNA polymerases, and many enzymes, especially from the *Thermococcales* order of the archaea, are employed (Terpe, 2013). Tkod-Pol and Pfu-Pol are the most established, and several investigations have pointed out the advantages of the former in terms of speed and processivity (Takagi et al., 1997; Nishioka et al., 2001; Kim et al., 2007). The abundance of arginines at the "forked-point" has been offered as one reason for the high processivity of Tkod-Pol (Hashimoto et al., 2001; Kim et al., 2008), but the absence of a closed ternary complex structure (enzyme/primer-template/dNTP) limits knowledge of the exact functions of these amino acids. It was originally proposed that Tkod-Pol R247 may separate the primer-template and stabilize the denatured substrate, in a similar manner to the corresponding amino acid in the bacteriophage RB69 polymerase (Shamoo and Steitz, 1999; Hashimoto et al., 2001). More recent studies have suggested that this amino acid is relatively unimportant (Aller et al., 2011; Richardson et al., 2013). The thumb domain of archaeal family-B polymerases, and indeed of all DNA polymerases, is responsible for binding double-stranded DNA and is expected to be critical in DNA translocation and processivity. Several "forked-point" arginines are missing in Pfu-Pol and the thumb domain shows subtle differences between the two enzymes. A major aim of this study was to graft these elements into Pfu-Pol in the hope of creating better PCR enzymes.

**Figure 8 | … the thermal unfolding of the polymerases.** The first derivatives of the DSF profiles are shown, with *dF*/*dT* indicating the change in fluorescence (relative units). Individual polymerases are identified by the color coding given in the figure.

As "forked-point" arginines and/or the thumb domain of Tkod-Pol are introduced into Pfu-Pol, extension rates become faster, PCR performance improves and processivity is increased, albeit only from one to three. In all cases the superior thermostability of Pfu-Pol is retained and there is no decrease in fidelity. The influence appears cumulative, with the best mutant combining both "forked-point" and thumb alterations. Pfu-TkodTSL381R/K502R represents an end-point (the entire thumb has been exchanged and addition of further missing arginines, e.g., R246, does not enhance properties), yet it still falls short of the performance of Tkod-Pol. This suggests that other regions of the polymerase, which may subtly differ between Tkod-Pol and Pfu-Pol, play a role. One important area is the fingers domain, made up of two long α-helices (Hopfner et al., 1999; Chapin-Rodriguez et al., 2000; Hashimoto et al., 2001; Kim et al., 2008), which undergoes a conformational change following dNTP binding to the polymerase/primer-template complex, to produce a catalytically competent "closed" ternary complex. The fingers domain is highly conserved in all DNA polymerases and plays a critical role in dNTP selection and accurate insertion of the incoming base into the extending primer (Brautigam and Steitz, 1998). The amino acid sequences of the fingers domains of Tkod-Pol and Pfu-Pol are similar, but not identical (supplementary data, Figure S1). Given the critical function of this area in catalysis, the small variations may impinge on PCR performance. A second key element may be the Y-GG/A motif (Brautigam and Steitz, 1998), shown to be important in the processivity, fidelity and PCR capability of archaeal polymerases (Bohlke et al., 2000). The sequences near this region are shown in **Figure 1**, corresponding to the amino acids near arginine 381 (Y-GG/A = YEGG for Tkod-Pol and YTGG for Pfu-Pol).
The presence of an additional leucine (L381) in Pfu-Pol alters the location of the conserved tyrosine and glycines, moving them away from the DNA phosphate backbone and disrupting conserved protein-DNA interactions (Kim et al., 2008). Deletion of this leucine did result in slightly improved primer-template extension, but it is not clear if this mutation correctly lines up the subsequent arginine in Pfu-Pol with R381 of Tgo-Pol (**Figure 1**), and the L381R substitution was slightly superior (supplementary data, Figure S2). Of course, Pfu-PolL381R still retains an extra amino acid and this may disrupt the downstream Y-GG/A motif as seen for the wild type. Finally, communication between the exonuclease and thumb domains has been implicated in the coordination of polymerase and proofreading exonuclease activities (Kuroita et al., 2005; Kim et al., 2008). Trans-domain interactions between amino acids in these regions appear to control the relative motions of the two domains, influencing catalytic activities. The inter-domain contacts appear to vary slightly between Tkod-Pol and Pfu-Pol, potentially impinging on PCR properties. Such considerations may be especially significant with the thumb swap mutants.

The significance of this publication is the demonstration that it is relatively simple to improve the PCR performance of Pfu-Pol by substituting individual amino acids, and even an entire domain, from the closely related Tkod-Pol. The Pfu-Pol framework seems remarkably tolerant of substantial alteration, maintaining thermostability and fidelity. Future possibilities include incorporating elements from other *Thermococcales* family-B polymerases; a reasonable number are known (**Figure 1**) and many have been applied in the PCR (Terpe, 2013). Further improvement may be achieved by combining gene segments using random techniques such as DNA shuffling (Stemmer, 1994) or the staggered extension process (StEP) (Zhao et al., 1998) and selecting for improved PCR performance with compartmentalized self-replication (CSR) (Ghadessy and Holliger, 2007). This approach could even be extended to environmental DNA from uncharacterized archaea, widening the genetic resource (Matsukawa et al., 2009). Experiments could target the "forked-point" and thumb, as well as the other elements suggested above to play a role in PCR ability. Such an undertaking should yield important clues about the features of archaeal DNA polymerases that are important for efficient and robust PCR and may, ultimately, lead to superior reagents. It may be possible to develop DNA polymerases better suited to demanding PCR applications, such as amplifying DNA from single cells or from old, damaged or degraded sources, and to the steps needed prior to high-throughput DNA sequencing.

#### **AUTHOR CONTRIBUTIONS**

All authors carried out experiments to acquire data and participated in analysing and interpreting the results. Bernard A. Connolly originally conceived the project and Ashraf M. Elshawadfy, Brian J. Keith, and Thomas Kinsman contributed to experimental design. Bernard A. Connolly wrote the first draft of the work and all authors contributed to its critical revision. All authors concur with the final version and agree to take responsibility for the work and conclusions described within.

#### **ACKNOWLEDGMENTS**

Ashraf M. Elshawadfy was supported by grants from the Egyptian government and the University of Zagazig (Zagazig, Egypt). Brian J. Keith and Thomas Kinsman were UK BBSRC-supported Ph.D. students. The research was supported by a Wellcome Trust (UK) equipment grant (grant number 064345). Zvi Kelman is thanked for kindly supplying the Tkod-Pol B overexpressing plasmid.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fmicb.2014.00224/abstract

# **REFERENCES**


*Thermococcus gorgonarius*. *Proc. Natl. Acad. Sci. U.S.A.* 96, 3600–3605. doi: 10.1073/pnas.96.7.3600


**Conflict of Interest Statement:** Brian J. Keith was partially funded by Bioline, a company with commercial interests in PCR enzymes. Apart from this the authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 04 April 2014; accepted: 28 April 2014; published online: 27 May 2014. Citation: Elshawadfy AM, Keith BJ, Ee Ooi H, Kinsman T, Heslop P and Connolly BA (2014) DNA polymerase hybrids derived from the family-B enzymes of Pyrococcus furiosus and Thermococcus kodakarensis: improving performance in the polymerase chain reaction. Front. Microbiol. 5:224. doi: 10.3389/fmicb.2014.00224*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Elshawadfy, Keith, Ee Ooi, Kinsman, Heslop and Connolly. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Mutant Taq DNA polymerases with improved elongation ability as a useful reagent for genetic engineering

#### *Takeshi Yamagami<sup>1</sup>, Sonoko Ishino<sup>1</sup>, Yutaka Kawarabayasi<sup>1,2</sup> and Yoshizumi Ishino<sup>1</sup>\**

*<sup>1</sup> Protein Chemistry and Engineering, Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka, Japan*

*<sup>2</sup> Health Research Institute, National Institute of Advanced Industrial Science and Technology, Amagasaki, Japan*

#### *Edited by:*

*Zvi Kelman, University of Maryland, USA*

#### *Reviewed by:*

*Paul Beare, National Institutes of Health, USA; Zhuo Li, State Oceanographic Administration, China*

#### *\*Correspondence:*

*Yoshizumi Ishino, Department of Bioscience and Biotechnology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka, Fukuoka 812-8581, Japan e-mail: ishino@agr.kyushu-u.ac.jp*

DNA polymerases are widely used for DNA manipulation *in vitro*, including DNA cloning, sequencing, DNA labeling, mutagenesis, and other experiments. Thermostable DNA polymerases are especially useful and became highly valuable after the development of PCR technology. The DNA polymerase from *Thermus aquaticus* (Taq polymerase) is the best-known PCR enzyme and has been widely used all over the world. In this study, fragments of family A DNA polymerase genes were amplified by PCR from the DNA of microorganisms in environmental soil samples, using a primer set designed against two conserved regions. The corresponding region of the *pol* gene for Taq polymerase was substituted with the amplified gene fragments, and various chimeric DNA polymerases were prepared. Based on the properties of these chimeric enzymes and their sequences, two residues of Taq polymerase, E742 and A743, were found to be critical for its elongation ability. Taq polymerases with mutations at positions 742 and 743 showed higher DNA affinity and faster primer extension. These factors also affected the PCR performance of the DNA polymerase, and improved PCR results were obtained with the mutant Taq polymerases.

**Keywords: thermostability, gene amplification,** *in vitro* **gene manipulation,** *Thermus aquaticus***, PCR**

## **INTRODUCTION**

In addition to their fundamental roles in maintaining genome integrity during replication and repair, DNA polymerases are widely used in genetic engineering techniques, including DNA cloning, dideoxy-sequencing, DNA labeling, mutagenesis, and other *in vitro* DNA manipulations. Among them, thermostable DNA polymerases are particularly useful for PCR and cycle sequencing (Perler et al., 1996; Ishino and Ishino, 2013; Terpe, 2013).

The fundamental ability to synthesize a deoxyribonucleotide chain is conserved, reflecting the structural conservation of DNA polymerases. However, more specific catalytic properties, including processivity, fidelity, and nucleotide substrate selectivity, differ among the enzymes. These factors should be considered when evaluating a DNA polymerase as an enzyme for genetic engineering (Ishino and Ishino, 2014). An enzyme with faster extension, better accuracy, and higher efficiency is preferable. In addition to these catalytic properties, thermostability is necessary for practical PCR. DNA polymerases are currently classified into seven families, based on their amino acid sequences (Braithwaite and Ito, 1993; Ishino and Cann, 1998; Cann and Ishino, 1999; Ohmori et al., 2001; Lipps et al., 2003). Enzymes within the same family have broadly similar properties. To date, commercial genetic engineering reagents have originated only from families A and B. The family A enzymes are used for dideoxy-sequencing, and both family A and family B enzymes are used for PCR. None of the DNA polymerases from the other families are suitable for general use in genetic engineering experiments. The 3′–5′ exonuclease activity, which contributes to proofreading during DNA strand synthesis, is generally associated with the family B enzymes but not with the family A enzymes, although some family A enzymes have a weak 3′–5′ exonuclease activity (Joyce and Steitz, 1994; Villbrandt et al., 2000). Because of these differences, family A enzymes are advantageous for the efficient amplification of long DNA regions, whereas family B enzymes are generally more suitable for the accurate amplification of shorter regions by PCR (Eckert and Kunkel, 1991). Researchers in this field have made continuous efforts toward longer extension and better accuracy in PCR, and have succeeded in developing practical and reliable PCR methods.
One notable example is the development of LA (long and accurate) PCR, which uses a mixture of two DNA polymerases, one from family A and one from family B (Barnes, 1994). Further improvements of PCR have included the identification of a processive enzyme (Takagi et al., 1997) and modifications of family B DNA polymerases that confer higher accuracy (Wang et al., 2004; Ishino et al., 2012).

Protein engineering techniques, using site-specific or random mutagenesis, are powerful ways to create mutant enzymes from known DNA polymerases, and several useful enzymes have been produced in this way. A cold-sensitive mutant of the DNA polymerase from *Thermus aquaticus* (Taq polymerase) was developed with markedly reduced activity at 37°C compared with the wild-type (WT) enzyme (Kermekchiev et al., 2003); this mutant may be applicable to hot-start PCR. Another example is a mutant Taq polymerase with enhanced resistance to various PCR inhibitors, including whole blood, plasma, hemoglobin, lactoferrin, serum IgG, soil extracts, and humic acid (Kermekchiev et al., 2009). The molecular breeding of *Thermus* DNA polymerases by a directed evolution technique (Brakmann, 2005; Henry and Romesberg, 2005; Holmberg et al., 2005; Ong et al., 2006), compartmentalized self-replication (CSR) (Ghadessy et al., 2001), also generated a PCR enzyme with striking resistance to a wide range of inhibitors (Baar et al., 2011). Furthermore, enzymes with broad substrate specificity were obtained by CSR (Ghadessy et al., 2004; d'Abbadie et al., 2007); these are useful for amplifying ancient DNA containing numerous lesions. Mutational studies of the O-helix of Taq polymerase produced enzymes with reduced fidelity (Suzuki et al., 1997, 2000; Tosaka et al., 2001), which may be useful for error-prone PCR. These studies have contributed to elucidating the detailed structure-function relationships of DNA polymerases, as well as to the creation of novel enzymes whose substrate specificities, stabilities, and activities differ from those of their naturally evolved counterparts.

In addition to engineering characterized enzymes to improve PCR performance, screening known organisms for suitable DNA polymerase activities is the most conventional way to discover useful enzymes. However, the number of culturable organisms is limited, and large-scale cultivation is needed to purify an enzyme to homogeneity for precise characterization. In this study, we analyzed the DNA of microorganisms in various soil samples obtained from hot spring areas, and compared the sequences of a region within the *pol* genes present in the environmental DNA. We then predicted the amino acid residues that are critical for the primer extension reaction of Taq polymerase by constructing numerous chimeric Taq polymerases containing *pol* gene fragments from the various environmental DNAs. A mutant Taq polymerase with substitutions at E742 and A743 showed more efficient DNA strand synthesis and better PCR performance. This polymerase should contribute to the development of high-speed PCR under the standard conditions optimized for Taq polymerase.

# **MATERIALS AND METHODS**

#### **ENZYMES AND SUBSTRATES**

Enzymes for *in vitro* DNA manipulation and oligonucleotides were purchased from New England Biolabs (Ipswich, MA, USA) and Sigma Aldrich (St. Louis, MO, USA), respectively. The [methyl-3H]TTP was purchased from Amersham (Buckinghamshire, UK) and the [γ-32P]ATP was purchased from NEN Life Science Products (Boston, MA, USA).

#### **DNA EXTRACTION**

DNA was extracted from the environmental specimens with the UltraClean Soil DNA Isolation Kit (MO BIO, San Diego, CA, USA), according to the manufacturer's instructions. The extracted DNA was assessed by agarose gel electrophoresis and quantified by spectrophotometric measurement.

#### **CONSTRUCTION OF THE EXPRESSION PLASMID FOR CHIMERIC Taq POLYMERASES**

The expression plasmid for Taq polymerase, pTV-Taq, which contains the entire structural gene encoding Taq polymerase in the pTV118N vector (Takara Bio, Shiga, Japan), was constructed exactly as described (Ishino et al., 1994). The gene in the pTV118N vector was expressed under the control of the *lac* promoter and the Shine–Dalgarno (SD) sequence. The pTV-Taq plasmid was subjected to site-specific mutagenesis, using the QuikChange™ kit (Agilent Technologies, Santa Clara, CA, USA), to introduce BlpI (GCTNAGC) and BglII (AGATCT) restriction sites at the positions corresponding to the 5′ and 3′ termini, respectively, of the substitution region within the Taq *pol* gene. The insertion of the BglII site introduces two amino acid substitutions, Leu787Ile and Val788Leu. The resultant plasmid, named pTV-Taq-, was used to express the chimeric Taq polymerases, which were produced by directly replacing the BlpI–BglII fragment with the *pol* gene fragments from the environmental DNA, as described in detail below.

#### **AMPLIFICATION OF THE** *pol* **GENE FRAGMENTS FROM METAGENOMIC DNA**

A region of the *pol* genes was amplified directly from the environmental DNA by PCR, using a primer set with the sequences 5′-dCGCAGGCTAAGCAGCTCC**GAYCCHAACYTSCARAAYATHCC**-3′ and 5′-dGAG**YAAGATCTCRTCGTGNACYTG**-3′, which correspond to the degenerate codons for DPNLQNIP (forward) and QVHDEIL (reverse), respectively (Y indicates C and T; H indicates A, C, and T; S indicates C and G; R indicates A and G; N indicates A, G, C, and T). These two regions, which are conserved in the family A DNA polymerases, were previously used successfully to design a mixed primer set for PCR (Uemori et al., 1993). The nucleotide sequences corresponding to the conserved regions are shown in boldface, and the restriction endonuclease recognition sequences in the primers are GCTAAGC (BlpI, forward primer) and AGATCT (BglII, reverse primer). PCR was performed in a 50 μl reaction containing 10 ng DNA, 25 pmol of each primer, 0.2 mM dNTPs, and 1 unit of PfuUltra DNA polymerase (Stratagene). After the mixture was incubated without the enzyme for 3 min at 95°C, 30 cycles of PCR were performed with a temperature profile of 30 s at 95°C, 30 s at 55°C, and 1 min at 72°C. The reaction mixtures were electrophoresed in a 1% agarose gel, and the amplified fragments were visualized by ethidium bromide staining.
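The primer design above can be checked mechanically. The Python sketch below is an illustration and not part of the original study. It counts how many distinct sequences each degenerate primer pool contains, locates the BlpI and BglII recognition sites, and reverse-complements the reverse primer so that its codons can be read against the QVHDEIL motif. The IUPAC table covers only the codes used here, plus D as the complement of H. The function names are illustrative.

```python
"""Sketch: inspect the degenerate primers used for the family A pol fragment."""
import re
from math import prod

# IUPAC codes appearing in the primers (D is included as the complement of H).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT", "R": "AG", "S": "CG", "H": "ACT", "D": "AGT", "N": "ACGT",
}
COMPLEMENT = {
    "A": "T", "T": "A", "C": "G", "G": "C",
    "Y": "R", "R": "Y", "S": "S", "H": "D", "D": "H", "N": "N",
}


def degeneracy(primer: str) -> int:
    """Number of distinct concrete sequences in a degenerate primer pool."""
    return prod(len(IUPAC[base]) for base in primer)


def reverse_complement(seq: str) -> str:
    """Reverse complement that keeps IUPAC ambiguity codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_site(primer: str, site: str) -> int:
    """Start index of a recognition site (N allowed) in a primer, or -1."""
    pattern = "".join(f"[{IUPAC[base]}]" for base in site)
    match = re.search(pattern, primer)
    return match.start() if match else -1


FORWARD = "CGCAGGCTAAGCAGCTCCGAYCCHAACYTSCARAAYATHCC"
REVERSE = "GAGYAAGATCTCRTCGTGNACYTG"

if __name__ == "__main__":
    print(degeneracy(FORWARD), degeneracy(REVERSE))                     # 288 32
    print(find_site(FORWARD, "GCTNAGC"), find_site(REVERSE, "AGATCT"))  # 5 5
    print(reverse_complement(REVERSE))  # CARGTNCACGAYGAGATCTTRCTC
```

The reverse complement reads in frame as CAR GTN CAC GAY GAG ATC TTR, which corresponds to Q-V-H-D-E-I-L. The same kind of check shows that the forward codon YTS includes TTC (Phe) as well as Leu codons. A small part of the forward pool therefore does not match DPNLQNIP exactly.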

#### **ANALYSIS OF THE AMPLIFIED GENE FRAGMENTS**

The DNA fragments of approximately 600 bp amplified from the environmental DNA were excised from the agarose gel, digested with BlpI and BglII, and ligated into the pTV-Taq plasmid predigested with the same two enzymes. The ligation mixtures were introduced into *E. coli* JM109 cells (TaKaRa Bio), and 20 clones were picked independently from the transformants for each amplification. Plasmid DNA was extracted from these clones, and the nucleotide sequences of the inserts were determined with a CEQ2000XL DNA analysis system (Beckman Coulter, USA).

#### **CONSTRUCTION OF MUTANT Taq POLYMERASES**

The pTV-Taq plasmid was subjected to site-specific mutagenesis to introduce mutations at positions 742 and 743 of Taq polymerase. The sequences of the 14 primers used for mutagenesis are shown in **Table 1**. The PCR reaction mixture contained 20 ng template DNA, 1× PCR buffer for KOD-Plus-Neo, 1.5 mM MgSO4, 0.2 mM of each dNTP, 0.3 μM of each primer, and 0.5 unit KOD-Plus-Neo DNA polymerase (TOYOBO, Osaka, Japan) in a final volume of 20 μl. The mixture was heated at 95°C for 30 s and then subjected to thermal cycling (14 cycles of 95°C for 30 s, 55°C for 1 min, and 68°C for 8 min). The PCR product was treated with DpnI at 37°C for 1 h and introduced into *E. coli* JM109 cells. For each mutant, the polymerase gene was fully sequenced to confirm that the mutation of interest was present and that no additional mutations had been introduced by the PCR.

#### **PURIFICATION OF WILD TYPE AND MUTANT DNA POLYMERASES**

*E. coli* JM109 cells carrying the expression plasmid were grown at 37°C in 1 L of LB medium containing 100 μg/ml ampicillin. The cells were cultured to an A600 of 0.2–0.3, and expression of the *pol* gene was then induced by further cultivation for 3 h in the presence of 1 mM isopropyl-β-D-thiogalactopyranoside (IPTG). The cells were harvested and disrupted by sonication in lysis solution (50 mM Tris-HCl, pH 8.0, 1 mM DTT, 1 mM EDTA, 1 mM PMSF). To prepare the WT and mutant Taq polymerases, the soluble cell extract, obtained by centrifugation at 12,000 × *g* for 20 min, was heated at 75°C for 30 min. The heat-stable fraction was recovered by centrifugation and treated with 0.15% (w/v) polyethyleneimine in the presence of 1 M NaCl to remove nucleic acids. The soluble proteins were precipitated with 80%-saturated ammonium sulfate. The precipitate was resuspended in buffer [50 mM Tris-HCl, pH 8.0, 0.5 M (NH4)2SO4] and subjected to hydrophobic chromatography (HiTrap Phenyl HP 5 ml, GE Healthcare). The column was washed with 50 mM Tris-HCl, pH 8.0, and the Taq polymerase was eluted with deionized water. An equal volume of 100 mM Tris-HCl, pH 8.0, was added to this fraction, which was then subjected to affinity chromatography (HiTrap Heparin HP 5 ml, GE Healthcare) with a gradient of 0–2 M NaCl. The proteins that eluted at 0.8 M NaCl were stored at −25°C as the final sample, in 20 mM Tris-HCl, pH 8.0, 100 mM KCl, 0.1 mM EDTA, 1 mM DTT, 0.5% NP-40, 0.5% Tween 20, and 50% (w/v) glycerol.

*In Table 1, the sites for the mutations are underlined.*

#### **MEASUREMENT OF NUCLEOTIDE INCORPORATION ACTIVITY**

DNA polymerase activity was assayed by measuring the incorporation of [methyl-3H]TTP into acid-insoluble material, essentially as described previously (Uemori et al., 1995). A constant amount of the enzyme fraction was added to a 50 μl solution containing 20 mM Tris-HCl, pH 8.8, 5 mM MgCl2, 14 mM 2-mercaptoethanol, 0.2 mM each of dATP, dGTP, dCTP, and dTTP, 400 nM [methyl-3H]dTTP, and 20 μg of activated salmon sperm DNA, and the reaction was incubated at 74°C for 2.5, 5, and 10 min. After the reaction, a 10 μl portion of each reaction mixture was spotted onto DE81 filters (GE Healthcare Japan, Tokyo, Japan). The filters were washed three times with 5% Na2HPO4, and the radioactivity incorporated into the DNA strands was measured with a scintillation counter. One unit of activity is defined as the amount of enzyme that catalyzes the incorporation of 10 nmol of dNTP into DNA in 30 min at 74°C. The specific activity of each DNA polymerase was calculated as units per mg of protein (units/mg).
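The unit definition above reduces to a short calculation. The sketch below assumes that filter counts have already been converted to nmol of total dNTP incorporated. That conversion is not shown: it needs the specific radioactivity of the [3H]dTTP mix and, if only dTTP is counted, an assumption about base composition. The sketch then scales the 10 μl spotted aliquot back to the 50 μl reaction, fits a least-squares slope to the 2.5, 5 and 10 min time points, and converts that rate into units and units/mg. The function names are hypothetical, the example numbers are invented, and the rate is assumed to stay linear.

```python
"""Sketch: convert a [3H]TTP incorporation time course into polymerase units."""

NMOL_PER_UNIT = 10.0   # 1 unit incorporates 10 nmol dNTP ...
UNIT_TIME_MIN = 30.0   # ... in 30 min at 74 degrees C


def reaction_nmol(filter_nmol, spotted_ul=10.0, reaction_ul=50.0):
    """Scale nmol counted on one DE81 filter up to the whole reaction volume."""
    return filter_nmol * reaction_ul / spotted_ul


def initial_rate(times_min, nmol):
    """Ordinary least-squares slope of incorporation vs time (nmol/min)."""
    n = len(times_min)
    t_mean = sum(times_min) / n
    y_mean = sum(nmol) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_min, nmol))
    sxx = sum((t - t_mean) ** 2 for t in times_min)
    return sxy / sxx


def units_from_rate(rate_nmol_per_min):
    """Units in the assay, assuming the rate would hold for the full 30 min."""
    return rate_nmol_per_min * UNIT_TIME_MIN / NMOL_PER_UNIT


def specific_activity(units, protein_mg):
    """Units per mg of protein added to the assay."""
    return units / protein_mg


if __name__ == "__main__":
    times = [2.5, 5.0, 10.0]                        # sampling times (min)
    on_filter = [0.1, 0.2, 0.4]                     # hypothetical nmol per filter
    totals = [reaction_nmol(x) for x in on_filter]  # 0.5, 1.0, 2.0 nmol
    u = units_from_rate(initial_rate(times, totals))
    print(f"{u:.2f} units; {specific_activity(u, 0.002):.0f} units/mg")
    # prints "0.60 units; 300 units/mg"
```

Using all three time points, rather than a single one, makes it easier to spot a reaction that has left its linear range. The unit definition is only meaningful inside that range.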

#### **PRIMER EXTENSION ACTIVITY**

Primer extension ability was investigated using M13 single-stranded DNA (ssDNA) annealed with a 32P-labeled primer, as described previously (Cann et al., 1999). M13 ssDNA (0.05 pmol) annealed with a 55-nucleotide primer (5′-dTCGTAATCATGGTCATAGCTGTTTCCTGTGTGAAATTGTTATCCGCTCACAATTC-3′) was mixed with each DNA polymerase (0.05 unit) and dNTPs (0.2 mM) in a 20 μl solution containing 20 mM Tris-HCl, pH 8.8, 5 mM MgCl2, and 14 mM 2-mercaptoethanol, and incubated at 74°C for 5 min. The reaction mixtures were analyzed by alkaline agarose gel electrophoresis, and the products were visualized by autoradiography.

#### **ELECTROPHORETIC MOBILITY-SHIFT ASSAY**

The electrophoretic mobility-shift assay (EMSA) was performed as described previously (Komori and Ishino, 2000) to measure the DNA binding ability of the DNA polymerases. The 32P-labeled 27mer oligonucleotide (5′-dAGCTATGACCATGATTACGAATTGCTT-3′) was annealed with the 49mer oligonucleotide (5′-dAGCTACCATGCCTGCACGAATTAAGCAATTCGTAATCATGGTCATAGCT-3′) and used as the DNA substrate. The radiolabeled DNA (3 nM) was mixed with DNA polymerase proteins (0.6–400 nM) in a 20 μl solution containing 20 mM Tris-HCl, pH 8.8, 10 mM NaCl, 5 mM MgCl2, 14 mM 2-mercaptoethanol, 0.1 mg/ml BSA, and 5% (w/v) glycerol, and incubated at 40°C for 5 min. The DNA-enzyme mixtures were fractionated by 1% agarose gel electrophoresis. The autoradiograms were scanned and the band intensities were quantified with an image analyzer (FLA5000; Fuji Film, Tokyo, Japan). The fraction of bound DNA in each lane was calculated as: fraction bound DNA = 1 − fraction unbound DNA. The fraction bound was plotted against enzyme concentration, and the apparent *K*d was taken as the protein concentration at which the fraction bound equals 0.5.
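The apparent *K*d definition above (the concentration at 50% bound) can be applied directly to a titration series. The text does not say how the 0.5 crossing was located between lanes. The sketch below assumes linear interpolation on a log10 concentration axis, which is a common but not unique choice; fitting a binding isotherm would be an alternative. The lane concentrations match those listed for the EMSA figure, while the bound fractions are invented for illustration.

```python
"""Sketch: apparent Kd from EMSA band intensities (concentration at 50% bound)."""
import math


def fraction_bound(unbound_intensity, total_intensity):
    """Fraction bound = 1 - fraction unbound, for one gel lane."""
    return 1.0 - unbound_intensity / total_intensity


def apparent_kd(conc_nm, bound):
    """Concentration at which the fraction bound crosses 0.5.

    Interpolates linearly in log10(concentration) between the two titration
    points that bracket 0.5. Assumes binding rises with protein concentration.
    """
    points = sorted(zip(conc_nm, bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo <= 0.5 <= f_hi:
            if f_hi == f_lo:
                return c_lo
            x_lo, x_hi = math.log10(c_lo), math.log10(c_hi)
            x = x_lo + (0.5 - f_lo) * (x_hi - x_lo) / (f_hi - f_lo)
            return 10 ** x
    raise ValueError("fraction bound never crosses 0.5 in this titration")


if __name__ == "__main__":
    conc = [0.4, 1.6, 6.3, 25, 100, 400]          # nM enzyme, one lane each
    bound = [0.05, 0.15, 0.35, 0.65, 0.90, 0.98]  # hypothetical fractions
    print(f"apparent Kd ~ {apparent_kd(conc, bound):.1f} nM")
    # prints "apparent Kd ~ 12.5 nM" (geometric mean of 6.3 and 25 nM)
```

Interpolating on a log axis suits the roughly fourfold dilution series. With so few points per decade, the estimate is only as precise as the spacing between lanes.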

### **RESULTS**

#### **AMPLIFICATION OF FAMILY A DNA POLYMERASE GENE**

We collected soil samples from various hot spring areas in Japan and isolated the DNA within these samples. Using this DNA as template, a region of the *pol* genes encoding family A DNA polymerase-like sequences was amplified. We previously reported the amplification of the *pol* gene encoding a family A DNA polymerase with a set of mixed primers based on two conserved sequence motifs (Uemori et al., 1993). This mixed primer set also worked for the specific amplification of *pol* gene fragments from the hot spring samples in this study. As shown in **Figure 1** (part of the experiments is shown), a 600 bp DNA fragment was amplified as a single band from several samples. We tested 384 different samples and detected amplified DNA in 37 samples, obtained at Onikobe, Hachimantai, Nasu, Kirishima, and Beppu, as shown in **Table 2**. These locations had various environmental conditions, including pH values of 1–7 and temperatures of mostly 70–100°C, and were therefore expected to be inhabited by microorganisms with highly diverse genetic resources. The efficiency of DNA isolation was not addressed in this experiment, so some samples may not have contained a sufficient amount of DNA. Therefore, the finding that only about 10% of the samples (37 of 384) yielded target gene amplification does not directly indicate the presence or absence of microorganisms in the samples.

**FIGURE 1 | Amplification of a region within the family A DNA polymerase gene from the environmental DNA.** PCR reaction mixtures were fractionated by 1% agarose gel electrophoresis, and the DNA bands were visualized by ethidium bromide staining. The lane numbers represent the serial numbers of the samples obtained from the hot spring areas in the Tohoku and Kyushu districts.

**Table 2 | Summary of metagenomic analyses.**

*"–" indicates "not analyzed."*

The amplified gene fragments were excised from the gel and cloned into a plasmid vector. Twenty colonies were isolated independently from each cloning of the amplified DNA, and a total of 740 plasmids (20 clones × 37 amplifications) were isolated. These cloned DNA fragments were sequenced, and the different sequences were counted (**Table 2**). In total, we obtained 250 different sequences, none of which were present in the public databases. This result suggests that, as expected, many uncharacterized DNA polymerases remain to be found in the soil samples.
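Counting the different sequences amounts to deduplicating the insert sequences within each amplification and across the study, then removing anything already present in public databases. The minimal sketch below assumes exact sequence identity after upper-casing. A real analysis would more likely cluster sequences and use database similarity searches to tolerate sequencing errors. The `database` argument stands in for such a lookup and, like the function name, is hypothetical. The toy sequences replace the roughly 600 bp inserts.

```python
"""Sketch: count distinct insert sequences among the picked clones."""


def tally_inserts(clones_by_sample, database=frozenset()):
    """Return (distinct count per sample, set of distinct novel sequences).

    clones_by_sample maps a sample ID to the insert sequences of its clones.
    Sequences are compared by exact identity after upper-casing; `database`
    stands in for a lookup against public sequence databases (hypothetical).
    """
    per_sample = {}
    distinct_all = set()
    for sample, inserts in clones_by_sample.items():
        distinct = {seq.upper() for seq in inserts}
        per_sample[sample] = len(distinct)
        distinct_all |= distinct
    known = {seq.upper() for seq in database}
    return per_sample, distinct_all - known


if __name__ == "__main__":
    toy = {
        "sample_A": ["ACGTAC", "ACGTAC", "ACGTTC"],  # 3 clones, 2 distinct
        "sample_B": ["acgttc", "GGGTAC"],            # shares one with sample_A
    }
    per_sample, novel = tally_inserts(toy, database={"GGGTAC"})
    print(per_sample)     # {'sample_A': 2, 'sample_B': 2}
    print(sorted(novel))  # ['ACGTAC', 'ACGTTC']
```

Because the study-wide set is a union, a sequence recovered from several hot springs is counted once. This is why the total of distinct sequences can be well below the 740 clones picked.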

#### **PREPARATION OF THE CHIMERIC Taq POLYMERASES**

The amplified gene fragments encode the region of family A DNA polymerases that is important for the nucleotide incorporation reaction. To investigate the structure-function relationships of family A DNA polymerases, we substituted the corresponding region of the Taq polymerase gene with the amplified gene fragments *in vitro*. To construct the expression plasmids for the chimeric Taq polymerases systematically, restriction sites were created at appropriate positions in the Taq *pol* structural gene (**Figure 2**). The BlpI and BglII recognition sequences were suitable for this purpose, although two amino acid substitutions, Leu787Ile and Val788Leu, could not be avoided when introducing the BglII site at the reverse priming site. This mutant Taq polymerase (L787I/V788L) was purified, and we confirmed that its fundamental properties were not affected. It was therefore considered equivalent to WT Taq polymerase. PCR primers containing the recognition sequences for BlpI (forward primer) and BglII (reverse primer) at their 5′ termini were synthesized, each cloned DNA was re-amplified, and 250 chimeric genes were thus constructed in the Taq polymerase expression plasmid. The total cell extracts of *E. coli* producing the recombinant chimeric Taq polymerases were heated at 75°C for 30 min, and the supernatants were assayed for nucleotide incorporation activity. About half of the chimeric enzymes were inactivated by the heat treatment. The thermostable chimeric Taq polymerases were further purified to apparent homogeneity by the procedure described in the Materials and Methods, and their specific activities (units/mg protein) were measured by the standard DNA polymerase assay using activated DNA. Furthermore, the primer extension ability of each enzyme was evaluated using a constant amount of activity (units). These results are summarized in **Table 3**, which shows only the chimeric Taq polymerases with extension abilities better than 5 kb per 5-min reaction. As shown in **Table 3**, 13 enzymes were superior to WT Taq polymerase. However, the thermostabilities of these high-speed DNA polymerases were not sufficient for PCR applications (data not shown).

**FIGURE 2 | Schematic diagram of the construction of the expression plasmid for chimeric** *Taq* **polymerase.** The recognition sites for BlpI (GCTNAGC) and BglII (AGATCT) were created in the Taq *pol* structural gene (gray arrow). The introduced gene fragments (black) also carry the recognition sequences for these two enzymes from the PCR primers, so the DNA fragments can be substituted directly on this plasmid (upper panel). The resulting chimeric protein is shown as a bar (lower panel), with the replaced region in black. The motifs conserved in family A DNA polymerases are indicated by white lines (Loh and Loeb, 2005).

### **SEQUENCE COMPARISON OF CHIMERIC Taq POLYMERASES AND CONSTRUCTION OF THE MUTANT Taq POLYMERASES**

The amino acid sequences of the chimeric Taq polymerases with extension rates greater than 1 kb/min (at 0.0025 unit/μl) are aligned in **Figure 3**. In this alignment, we focused on the region corresponding to amino acids 730–745 of Taq polymerase. One distinct feature is that three genes and four genes have insertions of 9 and 3 amino acids, respectively, compared with WT Taq polymerase. The other characteristic feature is that continuous stretches of basic amino acids were found at residues 741–743 in the chimeric Taq polymerases. WT Taq polymerase has Glu and Ala at positions 742 and 743, but many chimeric enzymes showing faster extension have Arg at both positions.

#### **Table 3 | Properties of the chimeric Taq polymerase.**

*"–" indicates "not analyzed."*

**FIGURE 3 | Alignment of the amino acid sequences of the chimeric Taq polymerases.** The conserved motifs are shown on the top (Loh and Loeb, 2005).

The crystal structure of the large fragment of Taq polymerase (Klentaq) in complex with DNA revealed that Glu742 directly interacts with the template DNA in the closed conformation, but not in the open conformation (Li et al., 1998). As shown in the right panel of **Figure 4**, residues Glu742 and Ala743 (magenta) are located in the finger subdomain and face the template DNA (blue). The basic amino acid cluster in the chimeric Taq polymerases is therefore presumed to interact with the template DNA. Based on this finding, we made a series of mutant polymerases with substitutions at positions 742 and 743 of WT Taq polymerase to change the affinity of the enzyme for DNA. The mutant enzymes are named as follows: RR (E742R/A743R), RA (E742R), AA (E742A), ER (A743R), AR (E742A/A743R), RK (E742R/A743K), KR (E742K/A743R), KK (E742K/A743K), QY (E742Q/A743Y), AH (E742A/A743H), EH (A743H), HA (E742H), HH (E742H/A743H), and HK (E742H/A743K). The fourteen recombinant mutant enzymes were purified to homogeneity from *E. coli* cells, and their specific activities (units/mg protein) were measured by the standard incorporation assay (**Table 4**). The thermal stabilities of the mutant Taq polymerases were similar to that of the WT enzyme (data not shown).
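The two-letter names above encode the residues at positions 742 and 743, so "EA" would be the WT pair (Glu742, Ala743). The helper below is hypothetical and only for illustration. It expands a name into conventional substitution notation and omits any position that keeps the WT residue, which reproduces the naming list given in the text.

```python
"""Sketch: expand the two-letter mutant names into substitution notation."""

WILD_TYPE = {742: "E", 743: "A"}  # Glu742 and Ala743 in WT Taq polymerase


def substitutions(name: str) -> str:
    """'RR' -> 'E742R/A743R'; positions that keep the WT residue are omitted."""
    if len(name) != 2:
        raise ValueError("expected a two-letter name such as 'RA'")
    changes = [
        f"{WILD_TYPE[pos]}{pos}{new}"
        for pos, new in zip((742, 743), name.upper())
        if new != WILD_TYPE[pos]
    ]
    return "/".join(changes) or "WT"


if __name__ == "__main__":
    for name in ["RR", "RA", "ER", "QY", "EH"]:
        print(name, "->", substitutions(name))
```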

#### **Table 4 | Properties of the mutant Taq polymerase.**


#### **FASTER PRIMER EXTENSION BY THE MUTANT Taq POLYMERASES**

The *in vitro* primer extension rates of these mutant Taq polymerases were compared with that of WT Taq polymerase. As shown in **Figure 5**, all of the mutant Taq polymerases extended primers faster than the WT enzyme. When these results were quantified, the increase in extension rate was generally related to the number of positive charges at this site (**Table 4**). The different basic residues had different effects: the positive charge of His appeared to have a smaller effect than those of Arg and Lys. The basic residues differ in pKa. The pKa values of His, Arg, and Lys are 6.8, 12.5, and 11.0, respectively, so at the pH of the reaction buffer (pH 8.8) His is expected to carry much less positive charge than Arg or Lys. The DNA binding affinity of each enzyme was evaluated by EMSA, using a 32P-labeled primed DNA as the probe. As shown in **Figure 6**, all the mutant Taq polymerases bound DNA distinctly better than WT Taq polymerase, and increasing the number of positive charges at positions 742 and 743 appeared to increase the binding efficiency of Taq polymerase. The apparent *K*d values determined by EMSA are listed in **Table 4**. All the mutants bound DNA up to two orders of magnitude more tightly than WT Taq polymerase. Although the apparent *K*d did not differ among these mutants, second shifted bands appeared in the EMSA gel images for the mutants carrying Arg or Lys at positions 742 and 743. The positive charge of Arg or Lys at these positions might cause nonspecific binding of the enzyme to DNA, in addition to functional binding.
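The pKa argument above can be made quantitative with the Henderson–Hasselbalch relation, where fraction protonated = 1/(1 + 10^(pH − pKa)). The sketch below uses the pKa values quoted in the text and treats Glu as fully ionized, which is an assumption. By default it uses pH 8.8, the nominal pH of the Tris-HCl reaction buffers. That value is also an assumption about conditions at the active site: the pH of Tris buffers falls as temperature rises, and side-chain pKa values in a folded protein can differ from free-residue values. The estimate covers charge only and says nothing about the geometry of the DNA contacts. The function names are illustrative.

```python
"""Sketch: expected side-chain charge at positions 742/743 (Henderson-Hasselbalch)."""

# pKa values of the basic side chains as quoted in the text.
BASIC_PKA = {"H": 6.8, "R": 12.5, "K": 11.0}
# Acidic side chains are counted as fully ionized (-1): an assumption that is
# reasonable well above their pKa (roughly 4 for Asp and Glu).
ACIDIC = {"D", "E"}


def fraction_protonated(pka, ph):
    """Fraction of a basic group carrying its positive charge at a given pH."""
    return 1.0 / (1.0 + 10 ** (ph - pka))


def site_charge(name, ph=8.8):
    """Approximate net side-chain charge of the two residues in a mutant name."""
    charge = 0.0
    for residue in name.upper():
        if residue in BASIC_PKA:
            charge += fraction_protonated(BASIC_PKA[residue], ph)
        elif residue in ACIDIC:
            charge -= 1.0
    return charge


if __name__ == "__main__":
    for name in ["EA", "EH", "HH", "AA", "RA", "RR", "KK"]:
        print(name, f"{site_charge(name):+.2f}")
```

With these inputs, HH adds only a few hundredths of a charge at pH 8.8, whereas RR and KK add almost two full charges. This is consistent with the weaker effect of His reported above.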

#### **BETTER PCR PERFORMANCE BY THE MUTANT Taq POLYMERASES**

The main goal of this study was to create PCR enzymes that perform better than WT Taq polymerase. Because the mutant Taq polymerases are as thermostable as the WT enzyme, they were promising candidates for PCR. We therefore compared their PCR performance on several target DNAs of different lengths. A representative example of the PCR experiments is shown in **Figure 7**. Several mutant Taq polymerases (AA, RA, AH, EH, HA, and HH) successfully amplified a 15 kb target DNA under conditions in which WT Taq polymerase failed. The other mutant enzymes prepared in this study did not work well under the same conditions; the results for some of these (RR, QY, ER, AR, and HK) are shown in **Figure 7**. These results were not consistent with those of the primer extension experiments. The target DNA product was not detected with the mutant enzymes showing primer extension rates of >8 kb/5 min, all of which have Arg or Lys at positions 742 and 743. The PCR inhibition observed with these mutations may be due to excessively tight DNA binding, as suggested by the EMSA results described in the Section Faster Primer Extension by the Mutant Taq Polymerases. These results indicate that positions 742 and 743 of Taq polymerase are important for DNA strand synthesis, and that the electrostatic environment of this site strongly affects PCR performance.

**FIGURE 5 | Primer extension by the mutant Taq polymerases.** M13 ssDNA annealed with a 32P-labeled deoxyoligonucleotide (55mer) was used as the substrate. For each DNA polymerase, 0.05 unit was added to a 20 μl reaction mixture containing 2.5 nM primed DNA. The reaction mixtures were incubated at 74°C for 5 min, and the products were analyzed by 1% alkaline agarose gel electrophoresis, followed by autoradiography. The size markers on the left are BstPI-digested λ phage DNA labeled with 32P at each 5′ end. The names of the proteins are indicated at the top.

**FIGURE 6 | DNA binding by the mutant Taq polymerases.** EMSA was performed using primed DNA (32P-labeled 27mer annealed to a 49mer). The names of the mutant proteins are indicated at the top. Lanes 3–8 contained 0.4, 1.6, 6.3, 25, 100, and 400 nM enzyme, respectively.

# **DISCUSSION**

DNA polymerase is an important enzyme both in fundamental cellular processes (DNA replication and repair) and in genetic engineering applications *in vitro*. Numerous structural and functional studies of DNA polymerases have therefore been reported. In this study, we developed PCR enzymes whose extension reaction is superior to that of Taq polymerase, the standard PCR enzyme. Compared with Taq polymerase, these enzymes amplified either the same length of DNA in a shorter time or a longer DNA in the same reaction time.
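The trade-off stated above (the same target in less time, or a longer target in the same time) follows from a simple proportionality if the extension rate is treated as constant. The sketch below makes that idealization explicit. It ignores processivity, enzyme dissociation and rebinding, template damage and cycle-to-cycle efficiency, so it cannot capture effects such as the PCR failure of the fastest Arg/Lys mutants. The 5 kb and 8 kb per 5 min figures echo thresholds mentioned in the text but serve here only as illustrative inputs. The function names are hypothetical.

```python
"""Sketch: idealized relation between extension rate and PCR extension step."""


def rate_kb_per_min(kb_extended, minutes):
    """Convert a primer-extension result such as '8 kb in 5 min' to kb/min."""
    return kb_extended / minutes


def extension_time_min(target_kb, rate):
    """Minimum extension time per cycle for a target at `rate` kb/min."""
    return target_kb / rate


def max_target_kb(extension_min, rate):
    """Longest target fully extended within `extension_min` at `rate` kb/min."""
    return extension_min * rate


if __name__ == "__main__":
    slow = rate_kb_per_min(5, 5)  # illustrative slower enzyme: 1.0 kb/min
    fast = rate_kb_per_min(8, 5)  # illustrative faster enzyme: 1.6 kb/min
    for label, rate in [("slow", slow), ("fast", fast)]:
        print(label, f"15 kb needs {extension_time_min(15, rate):.1f} min;",
              f"10 min covers {max_target_kb(10, rate):.0f} kb")
```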

Metagenomic analysis is a revolutionary technique for microbial ecology. Amplifying target genes from metagenomic DNA is a powerful way to investigate many different DNA polymerases from uncultivated microbes. In this study, we focused on thermophilic bacteria as genetic resources for new thermostable family A DNA polymerases. We obtained many new sequences encoding a region of family A DNA polymerases from hot spring soil samples. These results suggest that our strategy of amplifying a specific region of family A DNA polymerase genes should also be applicable to the analysis of microbial populations in other habitats. We employed the same strategy to search for new family B DNA polymerases, and some of this work was published previously (Matsukawa et al., 2009).

We constructed chimeric enzymes between Taq polymerase and the products of the various *pol* genes amplified from the metagenomic DNAs, and compared their primer extension abilities. Many chimeric polymerases with excellent extension ability were obtained in this experiment. However, none of the chimeric enzymes was sufficiently thermostable for PCR use. The microbial sources of the gene fragments used to construct the chimeric genes are not necessarily extreme thermophiles, and some moderate thermophiles and mesophiles may be included among the amplified genes. The chimeric Taq polymerases that extended primers faster than WT Taq polymerase probably carry gene fragments from organisms that are not extreme thermophiles. Nevertheless, the amino acid sequence comparison of the chimeric Taq polymerases provided an important clue for designing, by site-specific mutagenesis, a mutant Taq polymerase with a faster primer extension reaction. We focused on positions 742 and 743 in this study. These positions are located in the finger subdomain and affect the interaction with DNA. The PCR performance of some of the mutant Taq polymerases showed reproducible improvement, and these enzymes are useful both for faster PCR and for longer target DNAs. Converting the electrostatic environment at this position from a negative to a positive charge is expected to affect the stabilization of DNA binding near the active site of Taq polymerase. It is important to check whether mutations at this site affect the fidelity of Taq polymerase. Our preliminary data indicate that the fidelities of these enzymes do not differ from that of WT Taq polymerase (data not shown). We will confirm this with further experiments and provide statistical data in the future.

In addition to positions 742 and 743, we found one more remarkable feature in the sequences of the chimeric enzymes. These enzymes have an insertion of either 9 or 3 amino acids between positions 738 and 739 of Taq polymerase. It will be interesting to investigate the effects of these insertions in the finger subdomain on the PCR performance of Taq polymerase. Characterizations of the mutant Taq polymerases with the different inserted sequences are now underway.

In conclusion, we designed a method for engineering Taq polymerase to improve its primer extension rate, using information obtained from the metagenomic analysis of soil samples from various hot-spring areas. The created enzymes showed robust PCR performance that was better than that of Taq polymerase. Since Taq polymerase is the standard enzyme used for PCR, an abundance of PCR data using this enzyme has accumulated to date. The enzymes created in this study basically retain the properties of Taq polymerase and should therefore be applicable to the many protocols that have already been optimized for Taq polymerase.

# **ACKNOWLEDGMENTS**

We thank Drs. Masaaki Takahashi and Yukiko Miyashita for valuable discussions and encouragement. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan [grant numbers 21113005, 23310152, and 26242075 to Yoshizumi Ishino]. This work was partly supported by the Institute for Fermentation, Osaka (IFO).

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 July 2014; accepted: 14 August 2014; published online: 03 September 2014. Citation: Yamagami T, Ishino S, Kawarabayasi Y and Ishino Y (2014) Mutant Taq DNA polymerases with improved elongation ability as a useful reagent for genetic engineering. Front. Microbiol. 5:461. doi: 10.3389/fmicb.2014.00461*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Yamagami, Ishino, Kawarabayasi and Ishino. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Replication slippage of the thermophilic DNA polymerases B and D from the Euryarchaeota Pyrococcus abyssi

# *Melissa Castillo-Lizardo<sup>1</sup>, Ghislaine Henneke<sup>2,3</sup> and Enrique Viguera<sup>1</sup>\**

<sup>1</sup> Departamento de Biología Celular, Genética y Fisiología, Facultad de Ciencias, Universidad de Málaga, Málaga, Spain

<sup>2</sup> Laboratoire de Microbiologie des Environnements Extrêmes, UMR 6197, Institut Français de Recherche pour l'Exploitation de la Mer, Université de Bretagne Occidentale, Plouzané, France

<sup>3</sup> CNRS, UMR 6197, Laboratoire de Microbiologie des Environnements Extrêmes, Plouzané, France

#### *Edited by:*

Zvi Kelman, University of Maryland, USA

#### *Reviewed by:*

Juergen Reichardt, James Cook University, Australia; Bernard Connolly, University of Newcastle, UK

#### *\*Correspondence:*

Enrique Viguera, Departamento de Biología Celular, Genética y Fisiología, Facultad de Ciencias, Universidad de Málaga, Campus Universitario de Teatinos, 29071, Málaga, Spain e-mail: eviguera@uma.es

Replication slippage or slipped-strand mispairing involves the misalignment of DNA strands during the replication of repeated DNA sequences, and can lead to genetic rearrangements such as microsatellite instability. Here, we show that PolB and PolD replicative DNA polymerases from the archaeal model Pyrococcus abyssi (Pab) slip in vitro during replication of a single-stranded DNA template carrying a hairpin structure and short direct repeats. We find that this occurs in both their wild-type (exo+) and exonuclease-deficient (exo-) forms. The slippage behavior of PabPolB and PabPolD, probably due to limited strand displacement activity, resembles that observed for the high-fidelity P. furiosus (Pfu) DNA polymerase. The presence of PabPCNA inhibited PabPolB and PabPolD slippage. We propose a model whereby PabPCNA stimulates strand displacement activity and polymerase progression through the hairpin, thus permitting the error-free replication of repetitive sequences.

**Keywords: slippage, primer-template misalignment, DNA polymerases, strand displacement activity, Archaea**

# **INTRODUCTION**

Low-complexity DNA sequences such as microsatellites (1–9 nt repeat length), including mono-, di-, and trinucleotide repeats, and minisatellites (unit ≥10 nt) are frequently associated with mutagenesis "hot-spots" in both eukaryotic and prokaryotic genomes (Bierne et al., 1991; Michel, 2000; Aguilera and Gomez-Gonzalez, 2008). These sequences are highly unstable: they gain or lose repeated units, leading to variations in repeat copy number. Such genetic variations have been termed "dynamic mutations" (Richards and Sutherland, 1992; Pearson et al., 2005). This instability is associated with arrest of the replication machinery within a repeated region, where primer and template become misaligned (reviewed in Michel, 2000). This process, known as replication slippage, is involved in the generation of deletions or insertions within repeat regions (Viguera et al., 2001a; Lovett, 2004).

Replication slippage has been proposed to occur within homopolymeric runs (Kroutil et al., 1996) as well as in short and long tandem repeat sequences (Trinh and Sinden, 1993; Madsen et al., 1993; Tran et al., 1995; Bierne et al., 1996; Feschenko and Lovett, 1998). Repeated DNA sequences are generally characterized by the formation of non-B DNA structures, the majority of which can form intra-strand hairpin loops (Samadashwily et al., 1997; McMurray, 1999; Mirkin, 2007; Sinden et al., 2007). A direct role for replication slippage in the deletion of repeated sequences within hairpin structures has been demonstrated *in vitro* and *in vivo* (d'Alencon et al., 1994; Canceill and Ehrlich, 1996). Slippage-mediated deletions are believed to occur via a three-step mechanism, as illustrated in **Figure 1** (Viguera et al., 2001a). In this model, the polymerase pauses as it reaches the base of the hairpin after copying the first direct repeat (DR), and then dissociates. The 3′ end of the nascent strand then unpairs from the template before reannealing to the second DR. This new primer/template complex is recognized by the polymerase, allowing replication to continue but also generating a deletion.
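The outcome of the three-step model can be illustrated with a toy string model. The Python sketch below is an illustrative simplification, not taken from the cited studies: it works in template coordinates, ignores strand polarity and base pairing, and assumes the DR sequence occurs exactly twice. It returns the template sequence that would be copied if synthesis stops after the first DR and resumes after the second.

```python
def slipped_product(template: str, dr: str) -> str:
    """Template sequence copied after one slippage event between two direct repeats.

    Toy version of the three-step model: the first DR is copied, the polymerase
    stalls and dissociates at the hairpin, and the nascent 3' end reanneals to the
    second DR, so one DR copy plus everything between the repeats is skipped.
    """
    first = template.find(dr)
    if first == -1:
        raise ValueError("direct repeat not found")
    second = template.find(dr, first + len(dr))
    if second == -1:
        raise ValueError("only one copy of the direct repeat found")
    # Keep the sequence up to the end of the first DR, resume after the second DR.
    return template[: first + len(dr)] + template[second + len(dr):]


# Toy template: DR (TTAGC), hairpin (GGGATC / CCC loop / GATCCC), DR.
toy = "ACGA" + "TTAGC" + "GGGATC" + "CCC" + "GATCCC" + "TTAGC" + "AAAA"
print(slipped_product(toy, "TTAGC"))
```

In this toy case the product is 20 nt shorter than the template: one 5 nt DR plus the 15 nt hairpin between the repeats.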

Several DNA polymerases have been tested for their propensity to slip *in vitro* when replicating hairpin-containing templates. Surprisingly, the replicative DNA polymerase Pol III holoenzyme (HE) from *Escherichia coli* can slip *in vitro* (Canceill and Ehrlich, 1996; Canceill et al., 1999). This finding is significant because high-fidelity replication is required to maintain genome integrity. Studies on the DNA repair polymerases *E. coli* Pol I and Pol II and on the T4, T7, and φ29 phage DNA polymerases revealed that the strand displacement activity of a DNA polymerase is inversely related to its propensity to slip. DNA polymerases with high strand displacement activity, such as φ29 or T7 pol exo- (Sequenase™), can progress through template hairpin structures and consequently do not slip. On the other hand, DNA polymerases devoid of strand displacement activity, such as *E. coli* Pol II or T4 DNA pol, are blocked at the base of the hairpin, promoting DNA repeat misalignment and subsequent loss of repeat sequences. Depending on the template and the strand displacement activity of a DNA polymerase, the deletion error rate can exceed the base substitution rate (Kunkel and Bebenek, 2000). In the context of the model proposed for slippage-mediated deletions (**Figure 1**), a polymerase with high strand displacement activity would be able to open the hairpin duplex, avoiding the polymerase dissociation and nascent strand reannealing steps, and thus replicate the repeat-containing template faithfully.

Several thermostable DNA polymerases used for PCR can also slip, even at the high temperatures used during PCR amplification (Viguera et al., 2001b; **Table 1**). DNA polymerases with high fidelity in terms of base substitution rate, such as *P. furiosus* DNA polymerase (*Pfu*Pol), consistently undergo slippage and introduce deletions while replicating hairpin-containing templates (Viguera et al., 2001b). In contrast, a low-fidelity DNA polymerase such as *Thermus aquaticus* DNA polymerase (*Taq*Pol) can replicate the same hairpin sequence without introducing deletions, although this depends on the magnesium concentration used. Other thermostable DNA polymerases endowed with high strand displacement activity, such as those of *Thermococcus fumicolans* (*Tfu*Pol) or *Bacillus stearothermophilus* (*Bst*Pol), also do not slip when replicating hairpin-containing sequences (Viguera et al., 2001b).

We have studied here the biochemical properties of DNA polymerases PolB (*Pab*PolB) and PolD (*Pab*PolD) from the hyperthermophilic euryarchaeon *P. abyssi* in terms of slippage during *in vitro* primer extension reactions. Archaeal replication proteins are more closely related to their eukaryotic than to their bacterial counterparts. Euryarchaeal members contain DNA polymerases that belong both to the ubiquitous B family and to the D family, which is unique to archaea (Ishino et al., 1998; Barry and Bell, 2006; Raymann et al., 2014). Both PolB and PolD have an associated 3′–5′ exonuclease activity and moderate strand displacement activity, although PolB cannot displace an RNA-DNA hybrid (Henneke, 2012). However, in the presence of *Pab* proliferating cell nuclear antigen (PCNA), both *Pab* polymerases show strand displacement activity (Henneke et al., 2005; Rouillon et al., 2007). Moreover, *Pab*PCNA can be loaded onto DNA in the absence of the clamp-loader replication factor C (RF-C), although the presence of this factor does enhance its loading (Rouillon et al., 2007).

In this work, we report that both *P. abyssi* DNA polymerases slip *in vitro* on a template consisting of single-stranded DNA (ssDNA) with a hairpin structure flanked by short direct repeats. In addition, we find that *Pab*PCNA increases the replication fidelity of this template by triggering the strand displacement activity of *Pab*PolB. Furthermore, we describe the effect of magnesium concentration on the replication slippage of both *Pab* DNA polymerases. These results help in understanding the dynamics of replication through common non-B DNA structures and in identifying the key DNA polymerases involved in replication slippage, a crucial step for understanding genome stability in these organisms.

# **MATERIALS AND METHODS**

#### **PROTEINS**

*Pab*PCNA, *Pab*PolD, and exonuclease-deficient *Pab*PolD were obtained from G. Henneke (Ifremer, Brest, France). They were cloned, expressed, and purified as described (Gueguen et al., 2001; Henneke et al., 2002, 2005; Palud et al., 2008). *Pab*PolB (Isis™) and exonuclease-deficient *Pab*PolB (*Pyra*™ exo-) were purchased from MP Biomedicals. One unit of *Pab* pol corresponds to the incorporation of 1 nmol of total dTMP into acid-precipitable material per minute at 65°C in a standard assay containing 0.5 μg (nucleotides) of poly(dA)/oligo(dT)10:1. M13 gene protein II (gp II) was purified to homogeneity as described (Greenstein and Horiuchi, 1987). *Thermus thermophilus* SSB was a kind gift from Drs. C. Perales and J. Berenguer (CBM-SO, Madrid). Native *Pfu* Pol was from Stratagene. *Taq* Pol was from Roche Molecular Biochemicals.

The main product obtained for each reaction is shown. S indicates slipped molecules, generated by replication slippage error. P indicates parental molecules, indicative of faithful replication. rcr indicates high molecular weight molecules generated by rolling circle replication as a consequence of the strand displacement activity of a DNA polymerase. nd, not determined.

#### **ssDNA TEMPLATE**

Construction of the pHP727FXc plasmid has been described previously (Canceill and Ehrlich, 1996). ssDNA templates were prepared essentially as described (Canceill and Ehrlich, 1996), with the following modification: plasmid DNA was extracted using a Maxi Plasmid Kit (Qiagen). Briefly, a specific nick was introduced into the f1 replication origin (+) strand of purified FXc plasmid DNA using the M13 gpII protein. The reaction was stopped with 20 mM EDTA, and the products were treated with 200 μg/ml proteinase K for 10 min at 55°C, phenol extracted, and dialyzed against TE buffer. Nicked strands were removed by exonuclease III digestion (10–40 units per μg of DNA for 1 h at 37°C). Finally, Exo III, nucleotides, and oligonucleotides were removed using QIAquick® PCR purification kits (Qiagen).

#### **PRIMER EXTENSION REACTIONS**

*Pyrococcus abyssi* pols were tested in a primer extension reaction performed as described (Canceill and Ehrlich, 1996; Canceill et al., 1999). Briefly, 24.3 fmol of a 5′-end fluorescein-labeled primer (Applied Biosystems) designated 1233 (5′-AGC GGA TAA CAA TTT CAC ACA GGA-3′) was annealed 1235 bases upstream of the palindrome. All 10 μl primer extension reactions contained 25 ng (12.2 fmol) of primed ssDNA, and *Pab* pols were added to the reaction mixture as indicated in the figure legends. Unless otherwise mentioned, reactions also contained 50 mM Tris-HCl (pH 8.8), 50 mM KCl, 10 mM DTT, 2 mM MgCl2, and 200 μM dNTPs. Reactions were performed at 60°C for 30 min; synthesis was arrested by the addition of 25 mM EDTA and 500 μg/ml proteinase K, and the mixture was further incubated for 15 min at 55°C. Reaction products were analyzed by electrophoresis on 0.8% agarose gels under native conditions, run in TBE buffer (89 mM Tris-borate, 2 mM EDTA, pH 8.3) at 2 V/cm for 16 h, and visualized with a Typhoon 9400 Variable Mode Imager (Amersham Biosciences, GE Healthcare). Results were analyzed using ImageQuant 5.2 software. Quantification was performed with Visilog 6.3 (Visualization Sciences Group, Noesis). A common fixed area was selected at the center of the bands corresponding to parental and heteroduplex molecules, the average gray value of all pixels in each area was obtained, and the parental/heteroduplex proportion was calculated.
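The band quantification described above (same-sized area on each band, mean gray value, then a parental/heteroduplex proportion) can be sketched in a few lines. This Python sketch does not reproduce Visilog; the image format (a list of rows of gray values), the box size, the absence of background subtraction, and reporting both a P/H ratio and a P fraction are all assumptions made for illustration.

```python
from statistics import mean


def box_mean(image, center_row, center_col, half_h, half_w):
    """Average gray value of a fixed-size box centered on (center_row, center_col)."""
    top, bottom = center_row - half_h, center_row + half_h
    left, right = center_col - half_w, center_col + half_w
    if top < 0 or left < 0 or bottom >= len(image) or right >= len(image[0]):
        raise ValueError("box extends past the image edge")
    pixels = [v for row in image[top:bottom + 1] for v in row[left:right + 1]]
    return mean(pixels)


def parental_vs_heteroduplex(image, p_center, h_center, half_h=0, half_w=1):
    """Return (P/H ratio, P fraction) from same-sized boxes on the P and H bands."""
    p = box_mean(image, *p_center, half_h, half_w)
    h = box_mean(image, *h_center, half_h, half_w)
    return p / h, p / (p + h)


# Toy 6 x 5 "gel": a strong P band on row 1 and a weaker H band on row 4.
gel = [
    [0, 0, 0, 0, 0],
    [0, 90, 90, 90, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 30, 30, 30, 0],
    [0, 0, 0, 0, 0],
]
ratio, p_fraction = parental_vs_heteroduplex(gel, p_center=(1, 2), h_center=(4, 2))
print(ratio, p_fraction)
```

Using one fixed box size for every band keeps the P and H measurements directly comparable, which is the point of selecting a common area.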

*Pfu* Pol and *Taq* Pol were tested as above, except that 200 μM each of dGTP, dATP, and dTTP, 40 μM dCTP, and 50 μM (2.5 μCi) [α-³²P]dCTP were used. The reaction buffers were prepared magnesium-free, following the compositions of those furnished by the suppliers, and contained, in addition to the 30 mM NaCl contributed by the primed ssDNA, the following ingredients: (i) for *Taq* Pol: 10 mM Tris-HCl pH 8.3, 50 mM KCl; (ii) for native *Pfu* Pol: 20 mM Tris-HCl pH 8.0, 10 mM KCl, 6 mM (NH4)2SO4, 0.1% Triton® X-100, 10 μg/ml BSA. After gel electrophoresis, DNA was visualized by direct exposure of the dried gels to Imaging Plates (IP BAS-MP 2040S) and analyzed on a Fujifilm BAS-1500.

# **RESULTS**

#### **EXPERIMENTAL SYSTEM**

To study whether *P. abyssi* thermostable DNA polymerases (*Pab* pols) promote replication slippage, we performed primer extension assays using the circular ssDNA template FXc (Canceill and Ehrlich, 1996; Canceill et al., 1999; Viguera et al., 2001a,b). This template contains two 27 bp direct repeats (DRs) that flank a pair of 300 bp inverted repeats (IRs) separated by a 1.3 kb insert, as shown in **Figure 2**. The IRs anneal to form a stem-loop with the DRs at its base.

DNA synthesis was carried out with a fluorescently labeled primer, and the reaction products were analyzed by agarose gel electrophoresis. Faithful replication of the FXc template generates complete double-stranded parental (P) molecules, which migrate most slowly on the gel. A slippage event generates a heteroduplex molecule (H), composed of a parental strand annealed to a recombinant strand lacking one of the DRs and the hairpin region between them (the 1370 bp insert and its flanking IRs). Heteroduplex molecules migrate ahead of parental molecules. Stalled (S) replication, as the polymerase reaches the base of the hairpin, results in a truncated molecule that migrates farther than either parental or heteroduplex molecules.
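Adding up the template features gives a sense of how much sequence a single slippage event removes from the nascent strand. The Python arithmetic below uses the lengths given for the FXc template and its legend (27 bp DRs, 300 bp IRs, 1370 bp insert) and assumes, for illustration only, that the DRs sit immediately next to the IR arms with no spacer; if a spacer exists, the true figure is larger by its length.

```python
# Lengths taken from the FXc template description.
DR_BP = 27         # each direct repeat
IR_BP = 300        # each inverted-repeat arm
INSERT_BP = 1370   # central region between the two IR arms

# Assumption: the DRs sit immediately next to the IR arms (no spacer sequence).
hairpin_bp = IR_BP + INSERT_BP + IR_BP
deleted_bp = DR_BP + hairpin_bp  # one DR copy plus the whole hairpin

print(f"hairpin: {hairpin_bp} bp, deleted by one slippage event: {deleted_bp} bp")
# prints "hairpin: 1970 bp, deleted by one slippage event: 1997 bp"
```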

double-stranded pHP727FXc plasmid containing a central 1370 bp region (insert) flanked by 300 bp inverted repeats (IR; yellow arrows) and 27 bp direct repeats (DR; red arrows). The ssDNA FXc template is prepared *in vitro*, and primer extension reactions are performed at 60°C in the presence of a fluorescein-labeled primer (green arrow) and DNA polymerase as described in "Materials and Methods." Reaction products are then separated by

#### **SLIPPAGE OF** *P. abyssi* **PolB AND PolD REPLICATIVE POLYMERASES**

It has been proposed that PolB and PolD have different roles in the cell, both participating at the replication fork in a manner analogous to the *Bacillus subtilis* and eukaryotic replisomes. The current model for *P. abyssi* DNA replication proposes that *Pab*PolD performs RNA-primed DNA synthesis and is later displaced by *Pab*PolB, which carries out processive DNA synthesis, at least on the leading strand (Henneke et al., 2005; Rouillon et al., 2007). Because *Pab*PolD can displace RNA primers in a PCNA-dependent manner, it has been suggested that *Pab*PolD is involved in lagging strand replication. However, definitive confirmation using genetic approaches such as those employed with eukaryotic polymerases has yet to be obtained.

To test whether their putatively separate roles in leading and lagging strand replication also imply different slippage properties, we examined the slippage efficiency of wild-type *Pab*PolB and *Pab*PolD enzymes using the FXc template (**Figure 2**). *Pab*PolB generated both parental and heteroduplex molecules, indicating a mixture of faithful FXc replication and slippage events (**Figure 3**, lanes 1–3). *Pab*PolB produced similar proportions of parental and heteroduplex molecules, with a slightly higher ratio of parental molecules as the polymerase concentration was increased. *Pab*PolD behaved similarly, although the proportion of heteroduplex molecules was higher and overall synthesis improved at higher polymerase concentrations (**Figure 3**, lanes 7–9). These results indicate that *Pab*PolB and *Pab*PolD can slip under our assay conditions. Their behavior is similar to that observed for Pol III HE, T7 Pol, or *Taq* Pol, which also produce both parental and heteroduplex molecules (Canceill and Ehrlich, 1996; Canceill et al., 1999; Viguera et al., 2001b).

with one strand lacking one direct repeat unit and the hairpin. S indicates stalled molecules generated by arrest of the polymerase at the base of the hairpin. Bands migrating between S and H correspond to DNA polymerase arrest inside the hairpin. Bands migrating above P correspond to high molecular weight molecules generated by displacement of the extended primer (Viguera et al., 2001b).

To generate parental molecules, a DNA polymerase must open the hairpin formed by the annealed inverted repeats of the single-stranded template (**Figure 2**), a step that depends largely on its strand displacement activity. Consequently, polymerases with high strand displacement activity (e.g., φ29 DNA polymerase) do not slip, whereas DNA polymerases devoid of strand displacement activity (e.g., *E. coli* Pol II or T4 DNA pol) generate heteroduplex molecules as the sole product of the reaction (Canceill et al., 1999).

Strand displacement activity is altered in some DNA polymerase exo- mutants. For example, T7 DNA polymerase has relatively low strand displacement activity (Canceill et al., 1999). However, T7 pol exo- (Sequenase™), which carries a 28 amino acid deletion that inactivates its proofreading activity (Engler et al., 1983; Lechner et al., 1983), has increased strand displacement activity that prevents T7 pol exo- slippage (Canceill et al., 1999). Similarly, the *E. coli* Pol II exo- mutant gains a degree of strand displacement activity and the ability to synthesize parental molecules (Canceill et al., 1999). However, not all exo- forms exhibit increased strand displacement activity. For example, an exo- form of φ29 DNA polymerase carrying a point mutation shows a 90% reduction in strand displacement activity compared with the native enzyme (Soengas et al., 1992).

To test whether *Pab* pol exo- variants have modified slippage properties, we performed FXc template primer extension assays using exo- forms of *Pab*PolB and *Pab*PolD carrying single point mutations (D215A and H451A, respectively; see Palud et al., 2008). Our results show that the exo- forms of *Pab*PolB and *Pab*PolD both generated heteroduplex and parental molecules. However, unlike with the native forms, increasing polymerase concentration inhibited slippage and resulted in a higher proportion of parental molecules (**Figure 3**, lanes 4–6 and 10–12, respectively). Both the native *Pab*PolD enzyme and its exo- form produced some molecules that migrate between the heteroduplex and stalled molecules, possibly the result of inefficient polymerase progression through the hairpin (Viguera et al., 2001a).

#### **MAGNESIUM CONCENTRATION AFFECTS THE SLIPPAGE OF** *P. abyssi* **POLYMERASES**

The concentration of divalent cations needs to be precisely controlled during DNA synthesis, as it affects enzyme activity, enzyme fidelity, primer/template annealing, and the stability of secondary structures such as the stem-loop used in our assay. The fidelity of *Taq* and *Pfu* DNA polymerases in terms of base substitution and frameshift errors depends on magnesium concentration (Eckert and Kunkel, 1990; Cline et al., 1996). Moreover, trinucleotide repeat expansions are produced *in vitro* by *Taq*, *E. coli* Pol I, and the Pol I Klenow fragment at certain magnesium concentrations (Lyons-Darden and Topal, 1999). With respect to slippage, magnesium concentration affects the slippage errors of different thermostable DNA polymerases in different ways (Viguera et al., 2001b). *In vitro* experiments showed that heteroduplex molecules derived from slippage errors account for almost all the product generated by *Pfu* DNA polymerase over the magnesium concentration range that permits efficient DNA synthesis (0.5–7.5 mM MgSO4). In contrast, on the same template, *Taq* DNA polymerase generates parental molecules (faithful replication) at a low magnesium concentration (0.5 mM MgCl2), heteroduplex molecules at high magnesium concentrations (10–20 mM MgCl2), and a mixture of parental and heteroduplex molecules at intermediate magnesium concentrations (1–7.5 mM MgCl2; Viguera et al., 2001b).
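The *Taq* Pol results summarized above can be encoded as a simple lookup table. The Python sketch below does only that: it returns the reported main outcome for a given MgCl2 concentration and deliberately returns `None` in the gaps (for example, between 7.5 and 10 mM), since the text gives no data there and interpolating would be an assumption.

```python
from typing import Optional

# Main products reported for Taq Pol on the FXc template (Viguera et al., 2001b),
# as (lowest mM MgCl2, highest mM MgCl2, outcome).
TAQ_MG_RANGES = [
    (0.5, 0.5, "parental"),
    (1.0, 7.5, "parental + heteroduplex"),
    (10.0, 20.0, "heteroduplex"),
]


def taq_outcome(mg_mM: float) -> Optional[str]:
    """Look up the reported outcome; return None outside the reported ranges."""
    for low, high, outcome in TAQ_MG_RANGES:
        if low <= mg_mM <= high:
            return outcome
    return None


for mg in (0.5, 5.0, 8.5, 15.0):
    print(mg, taq_outcome(mg))
```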

These observations prompted us to analyze the effect of magnesium concentration on the slippage errors produced by the wild-type and exo- forms of *Pab*PolB and *Pab*PolD. We found that varying the magnesium concentration affected both slippage and overall DNA synthesis (**Figure 4**). There was almost no synthesis by *Pab*PolB, *Pab*PolD, or their exo- forms at low magnesium concentrations (0.1–0.5 mM; **Figure 4A**, lanes 2–3 and 12–13; **Figure 4B**, lanes 2–3 and 12–13). Synthesis was also inhibited at the highest concentrations tested (15–20 mM; **Figure 4A**, lanes 9–10 and 19–20; **Figure 4B**, lanes 9–10 and 19–20). For *Pab*PolB, parental molecules were readily detectable together with heteroduplex molecules at low to medium magnesium concentrations (1–5 mM; **Figure 4A**, lanes 4–6). Increasing the magnesium concentration up to 15 mM decreased the proportion of parental molecules, making heteroduplex molecules the main reaction product (**Figure 4A**, lanes 7–9). This latter result could be due to stabilization of the hairpin structure by high magnesium concentrations, which would make polymerase progression inside the hairpin more difficult (Canceill and Ehrlich, 1996). The effect of magnesium concentration on *Pab*PolB was similar at both nucleotide concentrations tested, 40 and 100 μM (data not shown). *Pab*PolB exo- behaved similarly to the wild-type enzyme (**Figure 4A**, lanes 14–18).

Although both *Pab*PolD and *Pab*PolD exo- generated parental molecules, the main reaction products were heteroduplex molecules whenever synthesis was efficient (**Figure 4B**, lanes 4–10 and lanes 14–20). Additionally, some of the molecules generated by *Pab*PolD and *Pab*PolD exo- migrated between heteroduplex and stalled molecules (**Figure 4B**, lanes 8–9 and 15–20); these probably represent partially replicated DNA molecules resulting from inefficient polymerase progression within the hairpin.

To confirm that the magnesium concentrations used in the previous experiment are compatible with efficient *Pab*PolB and *Pab*PolD DNA synthesis, we performed primer extension experiments using a 5′-fluorescently labeled primer (33mer) and a short single-stranded linear DNA template (87mer) that can form a 28 bp secondary structure but lacks DRs (Henneke, 2012). This assay assesses only the replication efficiency of a template with a small hairpin, without the possibility of slippage between DRs at its base. Both *Pab*PolB and *Pab*PolD generated fully replicated molecules over the same range of magnesium concentrations used for the primer extension assays (Figure S1), indicating that the reaction conditions used support DNA synthesis.

We conclude that, despite their high fidelity in terms of base substitution, *Pab*PolB and *Pab*PolD are highly prone to slip on ssDNA templates upon encountering secondary structures flanked by DRs, generating parental and heteroduplex molecules in a magnesium concentration-dependent manner. This agrees with previous results describing similar behavior for *Taq* DNA polymerase (Viguera et al., 2001b; Figure S2). In contrast, native *Pfu* DNA polymerase generates mostly heteroduplex molecules regardless of magnesium concentration (Viguera et al., 2001b; Figure S2).

#### *Pyrococcus abyssi* **PCNA CAN MODULATE THE SLIPPAGE OF** *Pab***PolB AND** *Pab***PolD**

The sliding clamp of Archaea, Eukarya, and Bacteria forms a ring around dsDNA that prevents the dissociation of DNA polymerases from their template, thus enhancing processivity (O'Donnell et al., 2013). Moreover, it acts as a platform that regulates polymerase switching, coupling DNA replication and DNA repair (López de Saro, 2009). The *E. coli* sliding clamp, the β subunit homodimer, requires the clamp loader (or γ complex) to load it onto the template. The addition of β to primer extension reactions on a hairpin-containing template favors Pol III HE slippage, as the synthesis of heteroduplex molecules is stimulated (Canceill and Ehrlich, 1996).

*Pyrococcus abyssi* possesses a single processivity clamp, PCNA, which forms a homotrimer (Castrec et al., 2009). In contrast to Bacteria and Eukarya, the archaeal PCNA can be loaded onto DNA without a clamp loader. *Pab*RF-C and *Pab*PolB, but not *Pab*PolD, enhance PCNA loading. *Pab*RF-C and *Pab*PolB associate with *Pab*PCNA, forming a stable complex on primed DNA (Rouillon et al., 2007).

We therefore investigated the role of PCNA in *Pab*PolB and *Pab*PolD slippage (**Figure 5**). The addition of equimolar amounts of *Pab*PCNA to the FXc replication assay reduced *Pab*PolB slippage: formation of parental molecules was stimulated and the proportion of heteroduplex molecules diminished (**Figure 5**, lanes 1–3) relative to reactions performed in the absence of PCNA (compare with **Figure 3**, lanes 1–3). *Pab*PCNA addition also reduced *Pab*PolB exo- slippage (**Figure 5**, lanes 4–6; compare with **Figure 3**, lanes 4–6). Furthermore, the addition of PCNA resulted in the appearance of slowly migrating high molecular weight molecules (**Figure 5**, lanes 5–6). These molecules could result from rolling circle replication (*rcr*), in which the newly synthesized strand is displaced after one round of replication is completed, allowing synthesis to continue (Canceill et al., 1999).

The presence of *Pab*PCNA also increased the proportion of parental versus heteroduplex molecules generated by *Pab*PolD (**Figure 5**, lanes 7–9), indicating that it also represses slippage by this DNA polymerase. However, *Pab*PCNA had only a slight effect on *Pab*PolD exo- slippage (**Figure 5**, lanes 10–12), as the proportion of parental molecules was only slightly higher.

We conclude from these experiments that *Pab*PCNA stimulates the ability of *Pab*PolB and *Pab*PolD to replicate through a hairpin structure by inhibiting slippage, although its effect on *Pab*PolD exo- was only slight. These results agree with those of Henneke et al. (2005), in which *Pab*PCNA stimulated the strand displacement activity of *Pab*PolB and *Pab*PolD (Henneke, 2012).

#### **THE** *Thermus thermophilus* **SINGLE-STRANDED DNA BINDING (SSB) PROTEIN DOES NOT AFFECT SLIPPAGE ERRORS PRODUCED BY** *Pab* **DNA POLS**

The amount of slippage exhibited by different DNA polymerases has been shown to be modulated by SSB proteins (Canceill and Ehrlich, 1996; Canceill et al., 1999). *E. coli* SSB stimulates the slippage of Pol III HE, inhibits the slippage of *E. coli* polymerase I and T7 DNA polymerase, and has no effect on *E. coli* Pol II or T4 DNA polymerase. On the other hand, T4 SSB protein (gp32) inhibits T4 DNA pol slippage but does not affect the slippage properties of the Pol I Klenow fragment. These contrasting effects of SSB proteins on the same DNA template cannot be explained solely by their interaction with DNA; rather, they reflect interactions between SSB proteins and the different polymerases that alter polymerase strand displacement activity (Canceill et al., 1999).

We therefore investigated whether an SSB protein could modify the slippage properties of *Pab* pols. *Thermus thermophilus* (*Tth*) SSB stimulates DNA synthesis by *Tth* DNA polymerase and by the heterologous DNA polymerase from the archaeon *P. furiosus* (Perales et al., 2003). Furthermore, *Tth*SSB increases the fidelity of proofreading-deficient *Thermus thermophilus* DNA polymerase (Perales et al., 2003). We tested the slippage properties of *Pab*PolB, *Pab*PolD, and their exo- forms in the presence of increasing amounts of *Tth*SSB. Results obtained for *Pab*PolB and its exo- form are shown in **Figure 6A**, lanes 1–8. Heteroduplex products were detected in both the presence and absence of *Tth*SSB. Similar results were obtained for *Pab*PolD and its exo- form (**Figure 6B**, lanes 1–6). We conclude that, under the reaction conditions assayed, *Tth*SSB stimulated neither the overall synthesis efficiency nor the slippage of *Pab* polymerases.

# **DISCUSSION**

Interest in DNA repeat instability has increased dramatically since links were established between expansions of trinucleotide repeats and neurodegenerative diseases (Bacolla and Wells, 2009; reviewed in Kim and Mirkin, 2013) and between microsatellite instability and certain types of cancer (Shah et al., 2010), and since frameshift-mediated regulation of gene expression was identified at simple sequence contingency loci of pathogenic bacteria such as *Neisseria* or *Haemophilus* (Moxon et al., 1994; Bayliss et al., 2001; reviewed in Gemayel et al., 2010).

Because of the association between DNA repeat instability and DNA replication, DNA polymerases have been analyzed *in vitro* to establish their ability to replicate repeated DNA sequences. We have shown previously that the main replicative DNA polymerase of the model bacterium *E. coli*, DNA Pol III HE, is able to slip *in vitro* on hairpin-containing templates despite the high fidelity required for genome replication (Canceill et al., 1999). Moreover, slippage is stimulated by factors that affect Pol III processivity, such as the β-clamp or SSB proteins (Canceill and Ehrlich, 1996).

Thermostable DNA polymerases are widely used in a number of applications, mostly involving PCR amplification. We have previously shown that replication slippage occurs efficiently even during the first PCR amplification cycle with *Taq* Pol, *Pfu* Pol, Pyra™ Pol (*Pab* PolB exo-), or the Expand™ mixture (*Taq* Pol and *Pwo* Pol; Viguera et al., 2001b). However, no slippage was detected during PCR performed with *Tfu* Pol or Vent® Pol. Since DNA polymerases with high fidelity in terms of base substitution can still undergo slippage, slippage appears to be inhibited only in those polymerases endowed with high strand displacement activity. Thus, the use of DNA polymerases with high strand displacement activity is advisable when amplifying DNA templates with potentially strong secondary structures.
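Choosing a polymerase on this basis presupposes some idea of whether a template can form strong secondary structures. The sketch below is a hypothetical, crude screen (not a method used in these studies): it reports perfectly matched inverted repeats, i.e., a stem followed within a short loop by its reverse complement. The function names, loop limits, and example template are assumptions; dedicated folding software that accounts for mismatches and duplex stability would be needed for real predictions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, stem: int, min_loop: int = 3, max_loop: int = 20):
    """Return (i, j) start positions of perfectly matched hairpin stems.

    seq[i:i+stem] and seq[j:j+stem] are reverse complements and are separated
    by a loop of min_loop..max_loop nucleotides. Mismatches, bulges and duplex
    stability are ignored, so this is only a crude screen.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - stem + 1):
        target = reverse_complement(seq[i:i + stem])
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(seq):
                break  # longer loops would run past the end of the sequence
            if seq[j:j + stem] == target:
                hits.append((i, j))
    return hits


# Hypothetical template: a 10 bp stem, a 4 nt loop, then the complementary arm.
arm = "GGACTTAGCA"
template = arm + "TTTT" + reverse_complement(arm)
print(find_inverted_repeats(template, stem=10))
```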

In this study, we have shown that the replicative *P. abyssi* DNA polymerases are able to slip *in vitro* on hairpin-containing templates: heteroduplex molecules, indicative of slippage errors, as well as parental molecules were obtained at every enzyme concentration assayed. This result is quite different from those obtained for the thermostable polymerase B from *P. furiosus* (*Pfu*) or the mesophilic DNA polymerases *E. coli* Pol II and T4, for which heteroduplex molecules are the only reaction product (Viguera et al., 2001b). Although native *Pfu* Pol generated some parental molecules, the main reaction products were heteroduplex structures (Figure S2B). In contrast, *Taq* DNA pol, Pol III HE, Pol I, and T7 DNA pol generate heteroduplex and/or parental molecules depending on the polymerase concentration used in the assay (Canceill et al., 1999; Viguera et al., 2001b; Figure S2A). High polymerase concentrations are believed to promote step-by-step progression inside the hairpin via multiple association/dissociation events (Canceill et al., 1999); thus, sensitivity of the slippage assay to polymerase concentration is consistent with a polymerase possessing some degree of strand displacement activity.

Our interpretation is that the different slippage properties of the closely related *Pab*PolB and *Pfu* Pol are most likely due to *Pab*PolB having higher strand displacement activity, which allows it to generate a higher proportion of parental molecules.

Both *Pab*PolB and *Pab*PolD generated either heteroduplex molecules alone or a combination of parental and heteroduplex products at the different magnesium concentrations tested whenever synthesis was efficient. This result was somewhat similar to that obtained for *Taq* DNA pol, where either parental or heteroduplex molecules were obtained depending on the magnesium concentration (Viguera et al., 2001b), although we did not find any reaction condition where parental molecules were the sole product of either of the *Pab* polymerases. The strand displacement activity of *Pab*PolB and *Pab*PolD is probably insufficient to reliably open and progress through the hairpin structure, even at the lowest magnesium concentration tested, a condition that should reduce DNA duplex stability. Consequently, even if *Pab*PolB and *Pab*PolD have different roles at the replication fork, they do not differ in terms of their slippage properties. This result prompted us to study other cellular factors that could inhibit the slippage errors detected by our *in vitro* assay.

We have shown that the *Pab*PCNA sliding clamp promotes the synthesis of parental molecules by *Pab*PolB. This effect is even more prominent for the exonuclease-deficient *Pab*PolB (**Figure 5**). In comparison to *Pab*PolB, inhibition of slippage by PCNA was weaker for *Pab*PolD (**Figure 5**). *Pab*PolB has been identified as the leading strand DNA polymerase (Henneke et al., 2005; Rouillon et al., 2007). *Pab*PCNA interacts with *Pab*PolB in a DNA-dependent way and stimulates its processivity, clamping *Pab*PolB to DNA (Henneke et al., 2005; Henneke, 2012). The higher processivity of the *Pab*PCNA-PolB complex and the stimulation of its strand displacement activity (Henneke et al., 2005) would facilitate opening of the DNA duplex, leading to the synthesis of parental molecules. The stronger effect of *Pab*PCNA on *Pab*PolB exo- could be explained if strand displacement activity is increased to some degree in this mutant, facilitating the formation of parental molecules, as has been observed for T7 pol exo- (Sequenase™) and Pol II exo- (Canceill et al., 1999).

In our opinion, this result confirms that *Pab*PCNA-PolB forms a competent and stable complex capable of continuously synthesizing the leading strand: upon encountering secondary structures such as hairpin loops, the *Pab*PCNA-PolB complex continues strand elongation without perturbation of DNA synthesis.

The weaker inhibition of *Pab*PolD replication slippage by *Pab*PCNA, compared with *Pab*PolB, is probably due to insufficient stimulation of strand displacement activity (Henneke et al., 2005) under the conditions tested. *Pab*PCNA binds PolB and PolD in different ways (Castrec et al., 2009). Two PCNA-interacting protein (PIP) boxes are needed for *Pab*PolD binding to *Pab*PCNA, whereas only one PIP motif is essential for *Pab*PolB binding. This suggests that the mechanism involved in the *Pab*PCNA-mediated stimulation of *Pab*PolD may differ from that involved in *Pab*PolB.

Previous work (Canceill and Ehrlich, 1996) has shown that in *E. coli*, the β-clamp does not stimulate the generation of parental molecules by Pol III HE but instead increases the formation of heteroduplex molecules. Our finding that the addition of *Pab*PCNA promotes the formation of parental molecules by *Pab*PolB and *Pab*PolD indicates that the interaction between DNA polymerases and PCNA in the archaeon *P. abyssi* promotes faithful replication of DNA secondary structures. The functional homology between archaeal and eukaryal proteins (e.g., human PCNA can be loaded onto DNA by the *P. abyssi* RF-C complex; Henneke et al., 2002) suggests that slippage by replicative eukaryal DNA polymerases may also be inhibited by the presence of the sliding clamp.

#### **ACKNOWLEDGMENTS**

We thank Dr. C. Perales and Dr. J. Berenguer (CBM-SO, Madrid) for their generous gift of *Tth*SSB and Dr. J. R. Pearson for scientific English editing of the manuscript. This work was supported by grants BFU2007-64153 from the Ministerio de Educación y Ciencia and P09-CVI-5428 from the Junta de Andalucía to Enrique Viguera. Ghislaine Henneke was financially supported by grant ANR-10-JCJC-1501 01 from the National Research Agency. Melissa Castillo-Lizardo acknowledges the short-term fellowship from EMBO and the FPI predoctoral Fellowship BES-2005-10150 from Ministerio de Educación y Ciencia, Spain.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fmicb.2014.00403/abstract

#### **REFERENCES**


*abyssi* does not need ATP hydrolysis for clamp-loading and contains a functionally conserved RFC PCNA-binding domain. *J. Mol. Biol.* 323, 795–810. doi: 10.1016/S0022-2836(02)01028-8


**Conflict of Interest Statement:** The Review Editor Bernard Connolly declares that, despite having collaborated and published with Ghislaine Henneke, the review process was handled objectively. All authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 30 May 2014; accepted: 17 July 2014; published online: 07 August 2014. Citation: Castillo-Lizardo M, Henneke G and Viguera E (2014) Replication slippage of the thermophilic DNA polymerases B and D from the Euryarchaeota Pyrococcus abyssi. Front. Microbiol. 5:403. doi: 10.3389/fmicb.2014.00403*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Castillo-Lizardo, Henneke and Viguera. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# PCR performance of a thermostable heterodimeric archaeal DNA polymerase

# *Tom Killelea<sup>1,2,3</sup>, Céline Ralec<sup>1,2,3</sup>, Audrey Bossé<sup>1,2,3</sup> and Ghislaine Henneke<sup>1,2,3</sup>\**

*<sup>1</sup> Université de Bretagne Occidentale, UMR 6197, Laboratoire de Microbiologie des Environnements Extrêmes, Plouzané, France*

*<sup>2</sup> Ifremer, UMR 6197, Laboratoire de Microbiologie des Environnements Extrêmes, Plouzané, France*

*<sup>3</sup> CNRS, UMR 6197, Laboratoire de Microbiologie des Environnements Extrêmes, Plouzané, France*

#### *Edited by:*

*Zvi Kelman, University of Maryland, USA*

#### *Reviewed by:*

*Uli Stingl, King Abdullah University of Science and Technology, Saudi Arabia*

*Dennis W. Grogan, University of Cincinnati, USA*

#### *\*Correspondence:*

*Ghislaine Henneke, Laboratoire de Microbiologie des Environnements Extrêmes, Ifremer, UMR 6197, ZI de la Pointe du Diable, CS 10070, 29280 Plouzané, France e-mail: ghenneke@ifremer.fr*

DNA polymerases are versatile tools used in numerous important molecular biological core technologies like the ubiquitous polymerase chain reaction (PCR), cDNA cloning, genome sequencing, and nucleic acid-based diagnostics. Given the multiple DNA amplification techniques in use, different DNA polymerases must be optimized for each type of application. One current trend is to reengineer or discover new DNA polymerases with increased performance and broadened substrate spectra. At present, there is a great demand for such enzymes in applications such as forensics or paleogenomics. Current major limitations hinge on the inability of conventional PCR enzymes, such as *Taq*, to amplify degraded or low amounts of template DNA. In addition, a wide range of PCR inhibitors can impede nucleic acid amplification reactions. Here we examined the PCR performance of the proofreading D-type DNA polymerase from *P. abyssi*, Pab-polD. Fragments up to 3 kilobases in length were specifically PCR-amplified in its optimized reaction buffer. Pab-polD showed not only a greater resistance than *Taq* to high denaturation temperatures during cycling, but also a superior tolerance to the presence of potential inhibitors. The proofreading-proficient Pab-polD enzyme could also extend a primer containing up to two mismatches at the 3′ primer terminus. Overall, we found valuable biochemical properties in Pab-polD compared with the conventional *Taq*, which make the enzyme ideally suited for cutting-edge PCR applications.

**Keywords: DNA polymerase, Archaea, family D, PCR,** *Pyrococcus*

# **INTRODUCTION**

On the basis of their amino acid sequence and structural analysis, DNA polymerases have been classified into seven families, A, B, C, D, E, X, and Y (Delarue et al., 1990; Braithwaite and Ito, 1993; Joyce and Steitz, 1994; Cann et al., 1998; Ishino et al., 1998; Ohmori et al., 2001; Lipps et al., 2003). Despite the conserved template-directed synthesis (or editing) of a complementary deoxyribonucleotide chain (Kornberg and Baker, 1992; Hubscher et al., 2010) and the similarity of the three-dimensional organization of their polymerase domain ("palm," "thumb," and "finger") (Joyce and Steitz, 1994; Rothwell and Waksman, 2005), DNA polymerases differ extensively in many of their specific features (e.g., processivity, fidelity, rate of DNA synthesis, and nucleotide selectivity) (Hubscher et al., 2010; Langhorst et al., 2012).

Beginning with the discovery and characterization of DNA polymerase I (Family A) from *Thermus aquaticus* (*Taq*) (Chien et al., 1976), a variety of thermostable DNA polymerases have been isolated and identified from prokaryotic organisms. Besides their crucial biological functions, thermostable DNA polymerases have proven to be technically and economically important enzymes. They are versatile tools used in DNA technologies such as cycle sequencing and polymerase chain reaction (PCR) (Pavlov et al., 2004). Since its invention by Saiki et al. (1985), PCR has become a widespread molecular biology method. Originally PCR was developed to specifically amplify a stretch of DNA prior to cloning; however, its flexibility underpins a number of applications such as site-directed mutagenesis, genetic diagnostics, gene therapy, forensics, and paleogenomics.

In the PCR, DNA amplification is performed by thermostable enzymes: invariably either family A DNA polymerases from thermophilic and hyperthermophilic Bacteria (e.g., *Thermus aquaticus*, Taq-polA, and *Thermotoga maritima*, Tma-polA) or family B DNA polymerases from hyperthermophilic Archaea (e.g., *Pyrococcus furiosus*, Pfu-polB, and *Pyrococcus abyssi*, Pab-polB; *Isis*™). The family Y DNA polymerase from the hyperthermophilic archaeon *Sulfolobus solfataricus*, Sso-polY, is also marketed for PCR, but for specialist applications (McDonald et al., 2006). Each thermostable DNA polymerase has different characteristics (e.g., thermostability, processivity, fidelity, specificity, modified nucleotide selection, resistance to contaminants and inhibitors, slippage, pyrophosphorolysis), and to achieve optimal results, the choice of a PCR enzyme depends on the application itself (e.g., high-yield PCR, high-fidelity PCR, routine PCR, multiplex PCR, colony PCR, difficult PCR, long-range PCR, fast PCR, incorporation of modified nucleotides). Detailed information about the individual properties of PCR enzymes and their related applications has recently been reviewed (Terpe, 2013).

Archaeal family B DNA polymerases are generally more thermostable than bacterial polymerases (Kong et al., 1993; Takagi et al., 1997; Cambon-Bonavita et al., 2000; Gueguen et al., 2001; Hogrefe et al., 2001; Moussard et al., 2006; Marsic et al., 2008). In addition, when accuracy is desired, archaeal enzymes are preferred because they possess a 3′→5′ proofreading exonuclease activity that is absent in most thermostable bacterial polymerases (Eckert and Kunkel, 1991; Cline et al., 1996; Perler et al., 1996). On the other hand, archaeal family B DNA polymerases can incorporate dUTP during DNA replication but cannot copy these strands in subsequent DNA amplification rounds (Fogg et al., 2002). With the constant development of new PCR-based techniques, improved DNA polymerase variants are continuously being engineered, including thermo-activated polymerases (hot start polymerases) (Sharkey et al., 1994), bacterial and archaeal DNA polymerase derivatives with increased processivity (Wang et al., 2004), archaeal family B DNA polymerase variants insensitive to uracil inhibition (Fogg et al., 2002), and thermostable enzymes proficient in the synthesis of fluorescent CyDNAs (Ghadessy et al., 2004; Wynne et al., 2013). Error-prone PCR, which introduces random mutations into the parental gene, relies on different strategies using either low-fidelity DNA polymerase variants (Biles and Connolly, 2004) or error-prone PCR conditions (McCullum et al., 2010; Le et al., 2013). Most of these conventional commercial PCR enzymes have the drawback of replicating only native DNA. Among thermostable DNA polymerases that can overcome this major limitation, archaeal translesion family Y DNA polymerases are of particular interest. They are able to bypass a variety of DNA lesions and are therefore well suited for the PCR amplification of ancient and damaged DNA (McDonald et al., 2006).

Over 14 years ago, a new family of archaeal DNA polymerases (the D-family) was discovered (Uemori et al., 1997). Despite obvious interest in the biochemical characteristics of archaeal family D DNA polymerases (Cann et al., 1998; Gueguen et al., 2001), there is limited information regarding the structure (Yamasaki et al., 2010; Matsui et al., 2011) and kinetics of these enzymes (Palud et al., 2008; Richardson et al., 2013a). However, it is known that these thermostable DNA polymerases are heterodimeric and comprise a small subunit (DP1), possessing 3′→5′ exonuclease activity, and a large subunit (DP2), exhibiting DNA polymerase activity (Cann and Ishino, 1999). The small subunit shares a low level of homology with the non-catalytic B-subunits of the eukaryotic family B DNA polymerases (Cann et al., 1998; Ishino et al., 1998; Gueguen et al., 2001). In contrast, the sequence of the large subunit shows no significant homology to any other DNA polymerase (Macneill et al., 2001). A growing body of evidence suggests that family D DNA polymerases are involved in genome replication in Archaea (Henneke et al., 2005; Rouillon et al., 2007; Castrec et al., 2009; Cubonova et al., 2013).

The family D DNA polymerases from hyperthermophilic Archaea that have been biochemically characterized to date are from the *Pyrococcus* genus, namely *Pyrococcus horikoshii* (Shen et al., 2001), *Pyrococcus furiosus* (Uemori et al., 1997), and *Pyrococcus abyssi* (Gueguen et al., 2001). These microorganisms contain only one family B enzyme in addition to the family D DNA polymerase. In contrast with commercialized family B enzymes (Pfu-polB and Pab-polB), none of the family D DNA polymerases have been reported as active enzymes in PCR or in other DNA technologies.

Family D DNA polymerase from *Pyrococcus abyssi* shows nucleotide selectivity comparable to that of family B, and its active proofreading increases fidelity (Palud et al., 2008; Richardson et al., 2013a). Family D DNA polymerase preferentially binds to primer/template DNA, with a higher affinity than family B, while showing reduced DNA synthesis of smaller DNA fragments (Henneke et al., 2005). Assembly of the two subunits into a heterodimer is required to substantially increase both polymerase and exonuclease activities in family D, whereas both activities are contained within the same polypeptide in the family B DNA polymerase (Castrec et al., 2010; Gouge et al., 2012). These functional properties suggest that family D DNA polymerase might show PCR performance distinct from that of Pab-polB. In this paper, the ability of the recombinant family D DNA polymerase from *Pyrococcus abyssi* (Pab-polD) to PCR-amplify DNA has been evaluated in terms of biochemical and PCR performance parameters (e.g., stability to heat denaturation steps, extension efficiency, resistance to common PCR inhibitors). These results are compared with data acquired from commercial thermostable DNA polymerases (Pab-polB and Taq-polA) and reveal that family D DNA polymerase has significant commercial value in PCR technology.

# **MATERIALS AND METHODS**

#### **CHEMICALS AND ENZYMES**

Unlabeled dNTPs were purchased from MP Biomedicals. Pab-polD was cloned, expressed, and purified as described (Henneke et al., 2005). One unit of Pab-polD corresponds to the incorporation of 1 nmol of total dTMP into acid-precipitable material per minute at 65°C in a standard assay containing 0.5 mg (nucleotides) of poly(dA)/oligo(dT)10:1. Pab-polB (*Isis* DNA polymerase) and Taq-polA (*Taq* DNA polymerase) were purchased from MP Biomedicals. All other chemicals and bioreagents were analytical grade and purchased from Sigma-Aldrich (St. Louis, MO). Bioactive small molecules (human hemoglobin, humic acid, hematin, heparin, and urea) were molecular biology grade from Sigma-Aldrich (St. Louis, MO). Genomic DNA of *Pyrococcus abyssi* GE5 (1.7 million base-pair genome) was obtained as described (Charbonnier et al., 1995).
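The unit definition above converts directly into arithmetic. A minimal sketch, assuming incorporation was linear over the incubation; the helper names, the 3 nmol in 30 min assay result, and the 0.5 U/µl stock concentration are hypothetical:

```python
def activity_units(dtmp_nmol: float, minutes: float) -> float:
    """Units as defined above: 1 U = 1 nmol dTMP incorporated into
    acid-precipitable material per minute (65 °C, poly(dA)/oligo(dT) assay).
    Assumes incorporation was linear over the incubation time."""
    if minutes <= 0:
        raise ValueError("incubation time must be positive")
    return dtmp_nmol / minutes


def volume_for_units(units_needed: float, stock_units_per_ul: float) -> float:
    """Microlitres of enzyme stock that deliver `units_needed` units."""
    return units_needed / stock_units_per_ul


# Hypothetical assay: 3 nmol dTMP incorporated in 30 min by one enzyme aliquot.
print(activity_units(3.0, 30.0))   # units contained in that aliquot
# Hypothetical 0.5 U/ul stock; 0.1 U of Pab-polD per PCR as in the figure legends.
print(volume_for_units(0.1, 0.5))  # microlitres to pipette
```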

#### **POLYMERASE CHAIN REACTION (PCR ENZYMES)**

PCR primers for the amplification of targets within the *P. abyssi* (*Pab*) genomic sequence spanning positions 1,323,272 to 1,333,272 (bp) were purchased from Eurogentec (Belgium). The primer sequences, the corresponding *Pab* genomic sequences, and the sizes of the expected amplicons (in kilobases, kb) are summarized in **Table 1**. These target choices were dictated by the availability of total *P. abyssi* genomic DNA devoid of any potential PCR inhibitors and by the use of thermally stable oligonucleotide primers. PCR performance parameters of Pab-polD were determined in the optimized buffer: 20 mM Tris-HCl pH 9, 25 mM KCl, 10 mM (NH4)2SO4, 2 mM MgCl2, 0.1 mg/ml bovine serum albumin (BSA), 0.1% (v/v) Tween 20. PCR reactions (25 µl) contained 200 nM of each primer, 200 µM dNTPs, and 100 ng of genomic DNA unless otherwise specified. The PCR conditions for commercial Taq-polA and Pab-polB were set according to the manufacturers' instructions. All reactions were run at least in duplicate. Negative controls included all reaction components except genomic DNA. Amplification was carried out in a GeneAmp® PCR System 9700 Thermal Cycler (Applied Biosystems) and in a Veriti® 96-Well Thermal Cycler (Applied Biosystems). Cycling conditions were 2 min at 94°C; 30 cycles of 1 min denaturation at 94°C, 1 min annealing at 58°C, and extension at 72°C for the indicated times. A final extension step at 72°C was applied before termination of the reaction, as specified in the corresponding figure legends. The elongation temperature was set at 72°C according to the manufacturers' protocols for Taq-polA and Pab-polB, and this temperature was therefore also used to validate assay performance by Pab-polD. The products were analyzed by 1% agarose gel electrophoresis, stained with ethidium bromide, and visualized with the Molecular Imager FX (BioRad). When mentioned, activity (%) is expressed as a percentage of the maximal value obtained in each experiment.

#### **Table 1 | Primers applied in this study.**

*F and R denote forward and reverse primers, respectively. M stands for mismatch.*
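The reaction composition described in the PCR methods can be turned into a per-reaction pipetting scheme with C1·V1 = C2·V2. In this sketch the final concentrations (200 nM of each primer, 200 µM dNTPs, 100 ng genomic DNA, 0.1 U Pab-polD in 25 µl) come from the text, whereas every stock concentration, including the 10× buffer, is an assumption chosen for illustration; the dNTP value is treated as the concentration of each dNTP, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    stock: float  # stock concentration
    final: float  # final concentration, in the same unit as `stock`


def pipetting_volumes(components, total_ul: float = 25.0) -> dict:
    """Volume (ul) of each stock for one reaction via C1*V1 = C2*V2; water tops up."""
    volumes = {c.name: c.final * total_ul / c.stock for c in components}
    water = total_ul - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for this reaction volume")
    volumes["water"] = water
    return volumes


# Final concentrations from the text; every stock concentration is an assumption.
reaction = [
    Component("10x buffer", stock=10.0, final=1.0),             # x (assumed 10x stock)
    Component("forward primer", stock=10_000.0, final=200.0),   # nM
    Component("reverse primer", stock=10_000.0, final=200.0),   # nM
    Component("dNTPs", stock=10_000.0, final=200.0),            # uM
    Component("genomic DNA", stock=50.0, final=100.0 / 25.0),   # ng/ul
    Component("Pab-polD", stock=0.5, final=0.1 / 25.0),         # U/ul
]
for name, vol in pipetting_volumes(reaction).items():
    print(f"{name:15s} {vol:5.2f} ul")
```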

PCR experiments in the presence of inhibitors were conducted with the optimized Pab-polD buffer as described above. The PCR conditions for Pab-polB and Taq-polA were set according to the manufacturers' instructions. A 0.5 kb fragment was amplified from *Pab* genomic DNA using the 500 bp reverse and forward primers (listed in **Table 1**). Titration of each inhibitor was performed at least in triplicate. Cycling conditions were 2 min at 94°C; 30 cycles of 1 min denaturation at 94°C, 1 min annealing at 58°C, and 2 min extension at 72°C; final extension, 5 min at 72°C. The products were analyzed by 1% agarose gel electrophoresis, stained with ethidium bromide, and visualized with the Molecular Imager FX (BioRad).
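The cycling program for the inhibitor experiments fixes the total programmed hold time per run. A small sketch, with the program encoded as (temperature, seconds) steps; the function name is hypothetical, and ramp times between temperatures depend on the thermal cycler and are ignored here:

```python
from typing import List, Tuple

Step = Tuple[float, int]  # (temperature in °C, hold time in seconds)


def hold_time_minutes(initial: List[Step], cycle: List[Step], n_cycles: int,
                      final: List[Step]) -> float:
    """Sum of programmed hold times; ramping between temperatures is ignored."""
    seconds = (sum(t for _, t in initial)
               + n_cycles * sum(t for _, t in cycle)
               + sum(t for _, t in final))
    return seconds / 60


# Inhibitor-titration program described in the methods.
program = {
    "initial": [(94, 120)],
    "cycle": [(94, 60), (58, 60), (72, 120)],
    "n_cycles": 30,
    "final": [(72, 300)],
}
print(hold_time_minutes(**program))  # 127.0
```

For this program the hold times add up to 127 min (2 min + 30 × 4 min + 5 min).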

# **RESULTS**

#### **OPTIMIZED PCR REACTION CONDITIONS**

The optimized buffer for PCR with Pab-polD was obtained by varying the components of the standard Pab-polB reaction buffer. Pab-polD PCR activity was optimal at 10–30 mM Tris-HCl (**Figure 1A**) and between pH 8.3 and 9 (measured at 25°C) (**Figure 1B**). When magnesium chloride (MgCl2) and magnesium sulfate (MgSO4) were compared over the same concentration range, MgSO4 led to the amplification of non-specific and undesirable PCR products (**Figure 1C**), whereas MgCl2 gave rise to specific products over the optimal concentration range tested (**Figure 1D**). The effects of potassium chloride (KCl) (**Figure 1E**) and ammonium sulfate ((NH4)2SO4) (**Figure 1F**) concentrations on PCR by Pab-polD were also analyzed. Maximal activity was detected at 0–20 mM KCl and 15–25 mM (NH4)2SO4. Although (NH4)2SO4 further enhanced PCR amplification at its optimal concentration, KCl could be dispensable. Finally, the two additives used in the standard Pab-polB reaction buffer, Tween and BSA, were added to or omitted from Pab-polD PCR reactions. Under the conditions tested, 0.1 mg/ml BSA and 0.1% Tween did not significantly improve the amount of PCR products generated by Pab-polD (**Figure 1G**). Overall, the optimal reaction buffer for *in vitro* amplification of DNA fragments by Pab-polD has been determined and is given in **Table 2**.

#### **EFFECT OF INPUT GENOMIC DNA ON PCR EFFICIENCY AND SPECIFICITY**

PCR amplification targeting the 0.5 kb fragment in the 1.7 million base-pair genome of *P. abyssi* (**Table 1**) was employed to determine the minimal amount of DNA required. Under its optimal reaction conditions, Pab-polD was able to specifically amplify the 0.5 kb target from 0.5 to 100 ng of input genomic DNA (**Figure 2A**). Although the yield of PCR products was severely reduced at 0.5–1 ng, all three enzymes retained polymerase activity (∼2–5% of activity) (**Figures 2A–C**). With 0.1 ng of input DNA, only Pab-polB was able to amplify the 0.5 kb DNA target (**Figure 2B**).
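To put these input amounts in perspective, genomic DNA mass can be converted into approximate genome copies. This sketch assumes an average mass of 650 g/mol per base pair of double-stranded DNA (a common approximation) and one copy of the 0.5 kb target per 1.7 Mb genome; neither assumption is stated by the authors, and the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mol
BP_MASS = 650.0           # g/mol per base pair of dsDNA (common approximation; assumption)
GENOME_BP = 1.7e6         # P. abyssi genome size given in the text


def genome_copies(mass_ng: float, genome_bp: float = GENOME_BP) -> float:
    """Approximate number of haploid genome copies in `mass_ng` of genomic DNA."""
    grams = mass_ng * 1e-9
    genome_molar_mass = genome_bp * BP_MASS  # g/mol
    return grams / genome_molar_mass * AVOGADRO


for ng in (100, 1, 0.5, 0.1):
    print(f"{ng:>5} ng  ~ {genome_copies(ng):.1e} genome copies")
```

Under these assumptions, 0.1 ng corresponds to roughly 5 × 10^4 genome copies and 100 ng to roughly 5 × 10^7.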

#### **IMPACT OF THERMAL DENATURATION DURING CYCLING**

The resistance of Pab-polD to the temperature of the denaturation step during cycling was investigated in comparison with Taq-polA and Pab-polB. PCR amplifications of the 0.5 kb DNA target were performed with four different denaturation temperatures during cycling (91, 95, 97, and 99°C). As shown in **Figure 3A**, Pab-polD could yield specific PCR products (15% activity compared to 91°C) when the denaturation temperature was as high as 97°C. PCR products were hardly detectable at 99°C. Interestingly, the PCR efficiency of Pab-polB was not profoundly affected by the increase in denaturation temperature during cycling, while Taq-polA was mostly inactive above 95°C (**Figures 3B,C**). Taken together, these results agree with those published previously (Dietrich et al., 2002) and indicate that the hyperthermophilic *Pab* DNA polymerases are more robust than the thermophilic Taq-polA.

#### **Table 2 | Properties of experimental thermophilic DNA polymerases.**

*N.P., Not Published.*

*Error rates for Pab-polB and Taq-polA were determined in the same experiments under the same conditions (Dietrich et al., 2002).*
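Values such as "15% activity compared to 91°C" are relative band intensities; the methods state that activity (%) is expressed relative to the maximal value obtained in each experiment. A minimal sketch of that normalization, with hypothetical intensities (not data from Figure 3), a hypothetical function name, and an optional reference condition:

```python
def relative_activity(signals: dict, reference=None) -> dict:
    """Express band intensities as % of a reference condition.

    By default the reference is the maximal signal in the experiment, matching
    the normalization stated in the methods; pass a key to normalize to a
    specific condition instead (e.g. the 91 °C denaturation step).
    """
    ref = max(signals.values()) if reference is None else signals[reference]
    if ref <= 0:
        raise ValueError("reference signal must be positive")
    return {key: 100.0 * value / ref for key, value in signals.items()}


# Hypothetical densitometry values (arbitrary units) per denaturation temperature.
bands = {91: 1000.0, 95: 600.0, 97: 150.0, 99: 10.0}
print(relative_activity(bands))
print(relative_activity(bands, reference=91))
```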

#### **RATE OF DNA EXTENSION**


In order to determine the rate of primer extension by Pab-polD, the 1.95 kb DNA target was amplified using variants of the end-point PCR method employed throughout this study. Numerous reactions were carried out, each with an incrementally longer extension time than the last, until end-point PCR products were detectable on an agarose gel. Our initial attempt, in which the longest extension time was set to 240 s, failed with Pab-polD and Taq-polA (**Figures 4A,C**). However, the 1.95 kb DNA target was successfully amplified by Pab-polB (**Figure 4B**). Pab-polD exhibited a lower extension rate than Taq-polA, as a single specific product of the expected size first appeared at 360 s and 300 s, respectively. According to these results, the rate of extension by Pab-polD was estimated at 0.33 kb/min, resembling that of Taq-polA (0.39 kb/min) but lower than that of Pab-polB (0.48 kb/min) (**Table 2**).

**FIGURE 2 | Effect of input genomic DNA on PCR efficiency and specificity.** PCR amplification of the 0.5 kb target (**Table 1**) was carried out with 0.1 U of Pab-polD **(A)**, 1 U of Pab-polB **(B)**, and 1 U of Taq-polA **(C)** in their respective reaction buffer (**Table 2**). PCR program was (2 min at 94°C) × 1; (1 min at 94°C, 1 min at 58°C, 2 min at 72°C) × 30; (5 min at 72°C) × 1. Molecular weight markers (M) are SmartLadder SF from Eurogentec. The arrow indicates the specific 0.5 kb band.
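The extension-rate estimates in the rate-of-extension paragraph follow from dividing the amplicon length by the shortest extension time that yielded a detectable end-point product. A sketch of that arithmetic, using the 1.95 kb target and the 360 s and 300 s time points reported for Pab-polD and Taq-polA (the function name is hypothetical):

```python
def extension_rate_kb_per_min(amplicon_kb: float, min_extension_s: float) -> float:
    """Apparent extension rate: amplicon length divided by the shortest extension
    time that still gave a detectable end-point product."""
    return amplicon_kb / (min_extension_s / 60.0)


# Shortest successful extension times for the 1.95 kb target reported in the text.
for enzyme, seconds in {"Pab-polD": 360, "Taq-polA": 300}.items():
    print(enzyme, f"{extension_rate_kb_per_min(1.95, seconds):.3f} kb/min")
```

This gives 0.325 kb/min for Pab-polD (reported as 0.33 kb/min) and 0.39 kb/min for Taq-polA. Because the time series is stepped, each value is an approximation bounded by the spacing between the tested extension times.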

#### **PCR AMPLIFICATION OF DNA FRAGMENTS OF VARIOUS LENGTHS**

To determine the ability of Pab-polD to amplify target sequences of various sizes (ranging from 0.5 to 10 kb) from genomic DNA, six specific primer sets were designed (**Table 1**). The extension time assigned to the amplification of each DNA fragment during cycling was consistent with the extension rate of Pab-polD described above. In the range of 0.5–1.1 kb, Pab-polD and Taq-polA efficiently and specifically amplified DNA fragments (**Figures 5A,C**). Amplification of DNA molecules ranging from 1.95 to 2.95 kb was severely reduced for Pab-polD and to a lesser extent for Taq-polA (∼8% activity for the 2.95 kb target compared with the 0.5 kb target for both DNA polymerases), with bands at these sizes being faint and barely detectable in ethidium bromide-stained agarose gels after 30 PCR cycles. Amplification of target sequences ranging from 4.15 to 10 kb by Pab-polD and Taq-polA failed under the conditions tested (**Figures 5A,C**), with Pab-polD producing weak and non-specific bands. With the exception of the 4.15 and 10 kb DNA targets, Pab-polB amplified single products of the expected size (**Figure 5B**). Except for the 10 kb DNA target, a single product was amplified with Pab-polB. However, the yield of specific PCR products was decreased for the 2.95 kb DNA target. These data show that Pab-polD is suitable for the specific amplification of DNA molecules in the range of 0.5–2.95 kb, albeit with reduced yields above 1.1 kb. Although not shown, replacing the Pab-polD optimal buffer with the Taq-polA or Pab-polB reaction buffer neither improved the yield of the amplified 1.95–2.95 kb DNA targets nor enhanced the synthesis of longer PCR products.
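Conversely, a measured rate can be used to set extension times for longer targets. This section does not list the exact times used, so the sketch below is hypothetical: it divides each target length by the 0.33 kb/min estimate for Pab-polD and rounds up to the next 30 s, an arbitrary granularity chosen for illustration.

```python
import math


def extension_time_s(amplicon_kb: float, rate_kb_per_min: float, step_s: int = 30) -> int:
    """Extension time needed at a given rate, rounded up to a multiple of step_s seconds."""
    seconds = amplicon_kb / rate_kb_per_min * 60.0
    return math.ceil(seconds / step_s) * step_s


PAB_POLD_RATE = 0.33  # kb/min, estimate reported in the text
for size_kb in (0.5, 1.1, 1.95, 2.95, 4.15, 10.0):
    print(f"{size_kb:5.2f} kb -> {extension_time_s(size_kb, PAB_POLD_RATE)} s")
```

For the 1.95 kb target this returns 360 s, matching the shortest successful extension time observed for Pab-polD.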

#### **PCR AMPLIFICATION IN THE PRESENCE OF PRIMER MISMATCHES**

Complete annealing of the primer 3′-terminus to its complementary target sequence is a very important factor for the success and stringency of PCR (Petruska et al., 1988; Ishii and Fukui, 2001; Sipos et al., 2007). To evaluate the impact of primer mismatches on the efficiency and specificity of PCR by Pab-polD, forward primers containing up to four mismatches at the 3′-end were designed for amplification of the 0.5 kb DNA target from genomic DNA (**Table 1**). As shown in **Figure 6A**, specific amplification of the 0.5 kb DNA target was achieved in the presence of one or two 3′-terminal mismatches, with lower PCR efficiency observed with two mismatches (∼5% remaining activity). Three or four mismatches had a detrimental effect on extension efficiency by Pab-polD. Taq-polA generated specific PCR products only with primers carrying a single 3′-terminal mismatch; additional mismatches prevented successful PCR amplification (**Figure 6C**). Although strand extension, and hence PCR amplification efficiency, was influenced by multiple mismatches at the 3′ end of the primer, detection of specific PCR products was never compromised for Pab-polB (**Figure 6B**). These data indicate that Pab-polD could substitute for Taq-polA when primers with 3′-terminal mismatches are refractory to PCR amplification.
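The number of 3′-terminal mismatches in each primer variant can be checked mechanically by comparing it with the perfectly matched primer for the same site, reading from the 3′ end. A minimal Python sketch; the 16-nt sequence and its variants are hypothetical and are not the **Table 1** primers:

```python
def terminal_mismatches(primer: str, perfect_match: str) -> int:
    """Count consecutive mismatched bases at the 3' end of `primer`.

    Both sequences are written 5'->3'; `perfect_match` is the fully
    complementary primer for the same template site. Counting stops at
    the first base (reading from the 3' end) that matches.
    """
    if len(primer) != len(perfect_match):
        raise ValueError("primers must have the same length")
    count = 0
    for p, q in zip(reversed(primer.upper()), reversed(perfect_match.upper())):
        if p == q:
            break
        count += 1
    return count


perfect = "GATTACAGATTACAGG"  # hypothetical perfectly matched forward primer
variants = [
    perfect,
    perfect[:-1] + "T",     # 1 terminal mismatch
    perfect[:-2] + "TT",    # 2 terminal mismatches
    perfect[:-3] + "CTT",   # 3 terminal mismatches
    perfect[:-4] + "ACTT",  # 4 terminal mismatches
]
for v in variants:
    print(v, terminal_mismatches(v, perfect))
```

Counting only the consecutive run at the 3′ end reflects the fact that it is the terminal base pairs that the polymerase (or its proofreading exonuclease) must deal with first.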

**FIGURE 4 | Rate of DNA extension.** PCR amplification of the 1.95 kb target (**Table 1**) was carried out with 0.1 U of Pab-polD **(A)**, 1 U of Pab-polB **(B)**, and 1 U of Taq-polA **(C)** in their respective reaction buffers (**Table 2**). The PCR program was (2 min at 94°C) × 1; (1 min at 94°C, 1 min at 58°C, varying times in seconds as indicated at 72°C) × 30. Molecular weight markers (M) are SmartLadder LF from Eurogentec. The arrow indicates the specific 1.95 kb band.

**FIGURE 5 | PCR amplification of DNA fragments of various lengths.** PCR reactions were carried out using 100 ng of genomic DNA with 0.1 U of Pab-polD **(A)**, 1 U of Pab-polB **(B)**, and 1 U of Taq-polA **(C)** in their respective reaction buffers (**Table 2**). Primer sets were chosen to amplify 0.5, 1.1, 1.95, 2.95, 4.15, and 10 kb targets (**Table 1**). PCR programs were (2 min at 94°C) × 1; (1 min at 94°C/1 min at 58°C/2, 4, 6, 8, 11, and 16 min at 72°C, according to target length) × 30. Molecular weight markers (M) are SmartLadder LF from Eurogentec.

**FIGURE 6 | PCR amplification in the presence of primer mismatches.** PCR amplification of the 0.5 kb target (**Table 1**) was carried out with 0.1 U of Pab-polD **(A)**, 1 U of Pab-polB **(B)**, and 1 U of Taq-polA **(C)** in their respective reaction buffers (**Table 2**). Primer sets were chosen to introduce 0, 1, 2, 3, and 4 mismatches at the 3′-terminus of the forward primer (**Table 1**). The PCR program was (2 min at 94°C) × 1; (1 min at 94°C, 1 min at 58°C, 2 min at 72°C) × 30; (5 min at 72°C) × 1. The arrow indicates the specific 0.5 kb band.

#### **PCR AMPLIFICATION IN THE PRESENCE OF KNOWN PCR INHIBITORS**

To investigate the impact of known PCR inhibitors (Al-Soud and Radstrom, 1998, 2001; Schrader et al., 2012), the 0.5 kb target was amplified from genomic DNA in reactions containing various levels of PCR-inhibiting compounds. The results are presented in **Table 3**. Varying the ionic strength affected the PCR performance of all three DNA polymerases, with Pab-polD being more resistant to NaCl than Pab-polB and Taq-polA. Under the conditions tested, a specific PCR product was still detected with Pab-polD at 50 mM NaCl, but not with Pab-polB or Taq-polA (Supplementary Figure 1). When SDS (sodium dodecyl sulfate), an anionic detergent well known for its protein-denaturing effects, was added at 0.02%, PCR amplification was successful with Pab-polD and Pab-polB (**Table 3**), although the yield of PCR products was dramatically reduced (Supplementary Figure 1). In comparison, Taq-polA was weakly active at an SDS concentration of 0.01%. CaCl2, a potent PCR inhibitor (Bickley et al., 1996; Al-Soud and Radstrom, 1998) found, for instance, in milk, cheese, and bones, impeded amplification of the specific 0.5 kb target by Pab-polD and Taq-polA at 2 mM (**Table 3**). The family B enzyme Pab-polB was highly sensitive to CaCl2, as shown by the severe reduction in PCR activity at a final concentration of 1 mM (Supplementary Figure 1).

Heme compounds such as hemoglobin and hematin have been reported to interfere with PCR amplification (Akane et al., 1994). We therefore investigated the PCR performance of Pab-polD in the presence of these two blood substances. Interestingly, hemoglobin never altered PCR specificity or efficiency with Pab-polD and Pab-polB, even at a final concentration of 13 mg/ml (Supplementary Figure 1). In contrast, PCR by Taq-polA was completely abolished at hemoglobin concentrations as low as 1.75–3.5 mg/ml. Hematin decreased the yield of PCR products by Pab-polD and Taq-polA at a final concentration of 12.5µM, while Pab-polB still retained significant activity in the presence of *>*25µM hematin.

Heparin, a blood anticoagulant known to inhibit PCR (Yokota et al., 1999), was also investigated. Its addition suppressed the formation of a specific PCR product in a dose-dependent fashion (from 0.006 to 0.2 U/µl) for all three PCR enzymes. EDTA (ethylenediaminetetraacetic acid), another common anticoagulant used to treat blood samples and a component of several elution buffers for nucleic acid purification, has also been described to interfere with PCR (Yokota et al., 1999). In this study, EDTA was inhibitory at concentrations greater than 0.5 mM for all three PCR enzymes. This result likely reflects chelation of the Mg2<sup>+</sup> ions present in the reaction buffer of each DNA polymerase, which compromises DNA amplification.
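The chelation explanation can be made concrete with simple 1:1 stoichiometry. The sketch below assumes a hypothetical buffer containing 2 mM MgCl2 and 0.8 mM total dNTPs (the actual **Table 2** compositions are not reproduced here) and treats EDTA and dNTPs as binding Mg<sup>2+</sup> 1:1 and completely, which may overstate the depletion:

```python
def free_mg_mM(total_mg_mM: float, edta_mM: float, dntp_total_mM: float = 0.0) -> float:
    """Crude estimate of free Mg2+ (mM) left after chelation.

    Assumes EDTA and dNTPs each bind Mg2+ 1:1 and essentially completely;
    pH, competing ions, primers and template DNA are ignored.
    """
    return max(total_mg_mM - edta_mM - dntp_total_mM, 0.0)


# Placeholder buffer: 2 mM MgCl2 and 4 x 0.2 mM dNTPs (not the Table 2 values).
for edta in (0.0, 0.5, 1.0, 2.0):
    print(f"{edta:.1f} mM EDTA -> ~{free_mg_mM(2.0, edta, 0.8):.1f} mM free Mg2+")
```

Because each millimole of EDTA removes roughly a millimole of Mg<sup>2+</sup>, even sub-millimolar carry-over can consume a large share of the free Mg<sup>2+</sup> available to the polymerase.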

Since urea has been identified as the main PCR-inhibiting component of urine (Khan et al., 1991), we challenged Pab-polD, Pab-polB, and Taq-polA in its presence. As shown in **Table 3**, Pab-polB was the most resistant DNA polymerase, specifically amplifying the 0.5 kb target even at the highest concentration of 100 mM. Pab-polD and Taq-polA showed inhibitory concentrations of *>*25 and 50 mM, respectively. Humic acid, which is representative of environmental samples (e.g., soil, water, and dead matter), is recognized as a potent PCR inhibitor even at low concentrations (Tsai and Olson, 1992; Ijzerman et al., 1997). We therefore tested all three enzymes with increasing amounts of humic acid (15–250 ng/µl). No PCR product was visible with Pab-polD, even at the lowest concentration. PCR amplification with Taq-polA was positive up to a humic acid concentration of 15 ng/µl. Pab-polB was the most resistant, amplifying the 0.5 kb target at 62.5 ng/µl humic acid (**Table 3**).

**Table 3 | Potential inhibitory effects of organic and inorganic substances on PCR.**

Phenol, ethanol, and isopropanol are common organic substances used in genomic DNA extraction procedures (Charbonnier et al., 1995) and are also relevant to food and environmental samples (Wilson, 1997). Interestingly, the PCR performance of the family B enzyme Pab-polB was never affected by the concentrations of these three organic compounds tested (Supplementary Figure 1). In contrast, Taq-polA and Pab-polD exhibited similar inhibitory concentrations of isopropanol and ethanol (*>*2 and *>*4%, respectively). Pab-polD was, however, more tolerant to PCR inhibition by phenol than Taq-polA (**Table 3**). Overall, this comparative study shows that Pab-polB is the most inhibitor-tolerant of the three enzymes. Pab-polD appears to be more resistant than Taq-polA to particular PCR inhibitors (e.g., NaCl, SDS, phenol, and hemoglobin), while sharing similar sensitivity to others (EDTA, ethanol, isopropanol, calcium, hematin, and heparin). Finally, Taq-polA exhibited greater resistance to humic acid and urea than Pab-polD.

### **DISCUSSION**

The family D DNA polymerase from *P. abyssi* was applied to PCR on genomic DNA under varying chemical conditions to evaluate its performance. For this purpose, a buffer was optimized using the Pab-polB reaction buffer as a starting point, because this family B DNA polymerase originates from the same archaeon, *P. abyssi*. Here, we show for the first time that a family D DNA polymerase is functional in PCR amplification of a 0.5 kb DNA target. Under the conditions listed in **Table 1**, the thermal resistance of the three enzymes was compared. Pab-polD was more resistant than Taq-polA to the increased thermal denaturation during cycling, yet not as robust as Pab-polB (**Table 2**). Since the data obtained for the commercial enzymes are comparable with those already published (Gueguen et al., 2001; Dietrich et al., 2002) (Supplementary Figure 2), these results suggest that Pab-polD is a thermostable enzyme. They also indicate that the denaturation temperature can be increased during cycling when required, which is particularly useful when genomic DNA contains secondary structures or GC-rich regions.

As expected, PCR efficiency for all three enzymes varied with the amount of the 1.7 million base-pair genomic DNA template, with 0.5 ng being the lowest permissive amount for Pab-polB, while Pab-polD and Taq-polA were more sensitive to template dilution. PCR analysis of trace amounts of DNA has become an important concern in forensic investigations (Van Oorschot et al., 2010). While some laboratories use 0.2 ng as a threshold for reliable analysis (Budowle et al., 2009), others continue to revise this limit (Kaminiwa et al., 2013). Under our conditions, Pab-polD was not an effective tool for amplifying limited amounts of genomic DNA. Increasing the number of cycles or modifying components of the reaction buffer may help overcome this limitation (Van Oorschot et al., 2010).
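To put these template amounts in perspective, DNA mass can be converted into genome copies. A minimal Python sketch, assuming an average mass of 650 g/mol per base pair and the 1.7 Mb genome size quoted above; by this estimate 0.5 ng corresponds to roughly 2.7 × 10<sup>5</sup> genome copies and the 100 ng used in **Figure 5** to roughly 5.4 × 10<sup>7</sup>:

```python
AVOGADRO = 6.02214076e23   # molecules per mol
BP_MASS_G_PER_MOL = 650.0  # assumed average mass of one base pair of dsDNA


def genome_copies(mass_ng: float, genome_bp: float) -> float:
    """Number of haploid genome copies contained in `mass_ng` of dsDNA."""
    grams = mass_ng * 1e-9
    genome_g_per_mol = genome_bp * BP_MASS_G_PER_MOL
    return grams / genome_g_per_mol * AVOGADRO


for ng in (100, 0.5):
    print(f"{ng} ng of a 1.7 Mb genome -> {genome_copies(ng, 1.7e6):.2e} copies")
```

The same function applies to any genome size, so thresholds quoted in nanograms for other organisms are not directly comparable in copy number.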

In this study, the highest amount of genomic DNA was used in all other PCR experiments, and under these conditions the 0.5 kb target was efficiently amplified by Pab-polD, Pab-polB, and Taq-polA. However, as the target length increased, the differences in PCR performance became more obvious (Pab-polB *>* Taq-polA *>* Pab-polD). A maximum length of 2.95 kb was obtained with Taq-polA and Pab-polD. Although barely detectable, the 4.15 kb target could be amplified by Pab-polB. The difficulty Pab-polD had in amplifying long DNA fragments was not due to high GC content, since all target regions had GC contents below 48% (**Table 1**). Pab-polD is known to have lower processivity than Pab-polB, requiring the PCNA (Proliferating Cell Nuclear Antigen) clamp for robust DNA synthesis (Henneke et al., 2005). Further optimization of the PCR amplification of large DNA fragments should therefore be possible, for instance by altering the reaction buffer components, adding PCNA, or mixing the two *Pab* PCR enzymes.

Full annealing between primer and template sequences is generally considered crucial for the specific amplification of a nucleic acid sequence (Ghadessy et al., 2004). PCR-based amplification of specific sequences is essential for detecting single nucleotide polymorphisms (SNPs), for identifying microbial and archaeal populations, and in diagnostics (Sipos et al., 2007; Liu et al., 2012). These approaches use "universal" primer sets, which can form mismatched base pairs at the primer-template 3′-termini. As a result, PCR amplification is reduced or fully inhibited (Huang et al., 1992), depending on the number of mismatched bases. In our study, Pab-polD was challenged in PCR with 1, 2, 3, or 4 mismatched base pairs at the primer-template 3′-termini. Pab-polD amplified the 0.5 kb DNA target despite the presence of 2 mismatches, albeit with reduced efficiency. Taq-polA retained activity in the presence of 1 mismatch at the 3′-terminus, as already published (Huang et al., 1992), but was inhibited by two mismatches. Pab-polB was functional even in the presence of 4 mismatches. The higher tolerance of the *Pab* PCR enzymes to mismatches is attributable to their associated 3′–5′ exonuclease activity, as previously reported (Gueguen et al., 2001). Up to four and two mismatches can be accommodated in the exonuclease active site of Pab-polB and Pab-polD, respectively, leading to degradation of the mismatched 3′-termini. Family B and D DNA polymerases from *Pfu* are also known to efficiently process the 3′-termini of primers (Richardson et al., 2013a,b).

Time-dependent PCR extension was carried out with Pab-polD and compared with Pab-polB and Taq-polA by amplifying a product of constant length (1.95 kb) with increasing extension times. Under these conditions, Pab-polD was the slowest enzyme, requiring 6 min to generate the 1.95 kb target, whereas 5 and 4 min were required for Taq-polA and Pab-polB, respectively. The calculated extension rate of Pab-polD was 0.33 kb/min, which is almost comparable with the value of 0.39 kb/min for Taq-polA and slightly lower than that of 0.48 kb/min for Pab-polB. The values confirmed that Pab-polB, like other family B DNA polymerases such as *Pfu* and *Tfu*, is a particularly slow enzyme (Perler et al., 1996; Terpe, 2013). Although the extension rate was determined by conventional end-point PCR, which is less suited to this purpose than real-time quantitative PCR (Arezi et al., 2003), Pab-polD also shows a reduced elongation rate in PCR. In Pab-polB, and potentially Pab-polD, this property could be explained by the slow kinetic partitioning of the primer between the exonuclease and polymerase active sites, which allows the DNA polymerase to proofread nucleotide incorporation events and, when required, to remove the misincorporated base (Gouge et al., 2012). The presence of secondary structures could also impede the efficiency of DNA synthesis by the *Pyrococcus* enzymes (Henneke, 2012).

We also investigated the negative effects of inorganic and organic substances on PCR efficiency and specificity with Pab-polD, Pab-polB, and Taq-polA. Pab-polD was the DNA polymerase most resistant to NaCl and CaCl2. This higher resistance to elevated NaCl concentrations is similar to that found for some bacterial and archaeal PCR enzymes investigated previously (Al-Soud and Radstrom, 1998). Its higher tolerance to calcium ions compared with other thermostable DNA polymerases (Al-Soud and Radstrom, 1998) makes Pab-polD a suitable enzyme for the amplification of, for instance, food (e.g., milk and cheese) and human samples (e.g., teeth and bones). Organic compounds (ethanol, phenol, and isopropanol), introduced during genomic DNA extraction or naturally present in food and environmental samples (Charbonnier et al., 1995; Wilson, 1997), reduced the PCR performance of Pab-polD. These negative effects are commonly observed with most PCR enzymes (Rossen et al., 1992), except for Pab-polB (shown in this study) and *Tth* (*Thermus thermophilus*) (Katcher and Schwartz, 1994). SDS and EDTA, which have direct and indirect negative effects on proteins, respectively, did not affect PCR performance by Pab-polD more dramatically than by Pab-polB. Surprisingly, the permissive concentrations of these two compounds for Taq-polA differed slightly from those found in another study (Yang et al., 2007), indicating that the source of the enzyme, the DNA target to be amplified, and the reaction conditions are important parameters influencing the outcome of such investigations. Urea inhibited PCR by Pab-polD at a lower concentration threshold than with Taq-polA or Pab-polB, and the values obtained with Taq-polA confirmed those previously published (Khan et al., 1991). Organic substances such as heparin and humic acid completely inhibited PCR by Pab-polD.
The strong inhibitory effect of heparin on all three PCR enzymes has been observed with other thermostable DNA polymerases (Yokota et al., 1999). This is not surprising, since heparin is commonly used to trap DNA polymerases in both polymerase assays and chromatography. The two heme compounds, hemoglobin and hematin, affected PCR performance by Pab-polD differently: while completely resistant to hemoglobin, Pab-polD was sensitive to hematin, to the same extent as Taq-polA. In general, heme compounds inhibit PCR, with thresholds that depend on the PCR enzyme used (Akane et al., 1994; Al-Soud and Radstrom, 2001). In this study, Taq-polA was the most sensitive to hemoglobin.

In conclusion, our results demonstrate for the first time that an archaeal family D DNA polymerase is functional in PCR. The PCR performance of Pab-polD (rate of DNA synthesis, maximal amplicon length, minimal input genomic DNA, resistance to thermal denaturation during cycling, and amplification with 3′-end mismatched primers) appears more comparable to that of Taq-polA than to that of Pab-polB, but with some valuable properties (e.g., high resistance to thermal denaturation during cycling and amplification with primers containing up to 2 mismatches). In addition, because of its greater resistance than Taq-polA to some inhibitors (e.g., calcium ions, sodium chloride, hemoglobin, and SDS), Pab-polD could replace Taq-polA in some applications. Additional investigations (e.g., of PCR fidelity and of the ability to specifically amplify GC-rich and degraded genomic DNA) are now required to establish whether Pab-polD is a suitable PCR enzyme that could overcome the limitations of conventional commercial PCR enzymes.

#### **ACKNOWLEDGMENTS**

The work was financially supported by the French National Research Agency (ANR-10-JCJC-1501-01 to Ghislaine Henneke) and the General Council of Finistère (CG29). We thank Sébastien Le Laz for help with PCR optimization.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at: http://www.frontiersin.org/journal/10.3389/fmicb.2014.00195/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 13 March 2014; accepted: 12 April 2014; published online: May 2014. Citation: Killelea T, Ralec C, Bosse A and Henneke G (2014) PCR performance of a thermostable heterodimeric archaeal DNA polymerase. Front. Microbiol. 5:195. doi: 10.3389/fmicb.2014.00195*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Killelea, Ralec, Bosse and Henneke. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Compartmentalized self-replication under fast PCR cycling conditions yields *Taq* DNA polymerase mutants with increased DNA-binding affinity and blood resistance

*Bahram Arezi\*, Nancy McKinney, Connie Hansen, Michelle Cayouette, Jeffrey Fox, Keith Chen, Jennifer Lapira, Sarah Hamilton and Holly Hogrefe*

*Agilent Technologies, La Jolla, CA, USA*

#### *Edited by:*

*Andrew F. Gardner, New England Biolabs, USA*

#### *Reviewed by:*

*Phil Holliger, MRC Laboratory of Molecular Biology, UK; Jennifer Lee Ong, New England Biolabs, USA*

#### *\*Correspondence:*

*Bahram Arezi, Agilent Technologies, 11011 North Torrey Pines Rd., La Jolla, CA 92037, USA e-mail: bahram.arezi@agilent.com*

Faster-cycling PCR formulations, protocols, and instruments have been developed to address the need for increased throughput and shorter turn-around times for PCR-based assays. Although run times can be cut by up to 50%, shorter cycle times have been correlated with lower detection sensitivity and increased variability. To address these concerns, we applied Compartmentalized Self Replication (CSR) to evolve faster-cycling mutants of *Taq* DNA polymerase. After five rounds of selection using progressively shorter PCR extension times, individual mutations identified in the fastest-cycling clones were randomly combined using ligation-based multi-site mutagenesis. The best-performing combinatorial mutants exhibit 35- to 90-fold higher affinity (lower *Kd*) for primed template and a moderate (2-fold) increase in extension rate compared to wild-type *Taq*. Further characterization revealed that CSR-selected mutations provide increased resistance to inhibitors, and most notably, enable direct amplification from up to 65% whole blood. We discuss the contribution of individual mutations to fast-cycling and blood-resistant phenotypes.

**Keywords: fast PCR, fast cycling,** *Taq* **mutants, blood resistant, inhibitor resistant**

#### **INTRODUCTION**

*Taq* DNA polymerase is still considered the workhorse of PCR, providing great economy and reliability in routine amplification of genomic targets up to 2 kb. Moreover, real-time PCR detection relies on *Taq*'s intrinsic 5′-flap endonuclease activity for TaqMan probe hydrolysis and on its lack of proofreading activity to minimize primer/probe degradation (Holland et al., 1991). With an extension rate of 60 nt/s, wild-type *Taq* produces high amplicon yields after 30–40 cycles using 1 min anneal-extension times. Run times are 1.5–2 h on conventional Peltier-based PCR instruments and approximately 1 h on advanced qPCR instrumentation with improved thermal ramp rates (2.2–3°C/s). Demand for higher throughput and shorter turn-around times continues to fuel interest in developing faster PCR instrumentation, along with polymerases with improved kinetic properties. In the near future, microchip-based technologies are expected to provide drastically reduced run times (<3–5 min), limited only by the kinetics of nucleotide incorporation (Hashimoto et al., 2004).

Currently, shorter qPCR run times are achieved by reducing hold times for the denaturation, annealing and extension steps, and/or by using a 2-step cycling regimen with combined annealing and extension steps (at 60°C). Although run times can be cut by up to 50%, shorter cycle times with *Taq* have been correlated with lower detection sensitivity and higher failure rates when applied across a range of primer-template combinations (Hilscher et al., 2005). Results can be improved by increasing the amount of *Taq* or other reagents, or by reducing the reaction volume and using thin-walled PCR tubes to improve heat transfer. Ultimately, however, PCR run times are limited by the kinetic properties of the PCR enzyme. In the most telling example, the processivity of a proofreading Family B DNA polymerase was directly correlated with PCR cycle times. When the processivity of *Pfu* DNA polymerase was increased 9-fold by fusion to Sso7d, a small basic double-stranded DNA-binding protein, PCR annealing/extension times could be reduced from 2 min to 30 s for a 5 kb λ target (Wang et al., 2004). Unfortunately, this strategy could not be employed to accelerate TaqMan assays, as *Pfu* lacks 5′-endonuclease activity and Sso7d fusions to full-length *Taq* are unstable (data not shown).
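Where the time goes in a cycling program can be estimated from hold times plus ramp times. A minimal Python sketch, assuming a constant ramp rate and ignoring overshoot, plate reads and lid heating; the two 40-cycle, 2-step programs and the 2.5°C/s ramp rate are hypothetical examples rather than protocols from this study:

```python
from typing import Sequence


def run_time_min(cycles: int, holds_s: Sequence[float], temps_c: Sequence[float],
                 ramp_c_per_s: float, initial_s: float = 0.0) -> float:
    """Rough qPCR run time in minutes.

    Each cycle holds at each temperature in `temps_c` for the matching
    entry of `holds_s`, then ramps back to the first temperature. Ramp time
    is |delta T| / ramp rate; overshoot, plate reads and lid heating are ignored.
    """
    if len(holds_s) != len(temps_c):
        raise ValueError("one hold time per temperature is required")
    # When i == 0, temps_c[i - 1] is the last step, i.e. the ramp back to step 1.
    ramp_s = sum(abs(temps_c[i] - temps_c[i - 1]) for i in range(len(temps_c))) / ramp_c_per_s
    return (initial_s + cycles * (sum(holds_s) + ramp_s)) / 60.0


# Hypothetical 2-step programs (95 °C denaturation, 60 °C anneal-extension).
standard = run_time_min(40, [15, 60], [95, 60], 2.5, initial_s=120)
fast = run_time_min(40, [5, 30], [95, 60], 2.5, initial_s=120)
print(f"standard holds: {standard:.1f} min, shortened holds: {fast:.1f} min")
```

Under these assumptions, shortening the holds cuts the run from about 71 to 44 min, while the 28 s of ramping per cycle is fixed by the instrument; once holds approach the time the enzyme needs to copy the amplicon, further savings depend on the polymerase itself.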

To accelerate qPCR run times, we applied the Compartmentalized Self Replication (CSR) technique (Ghadessy et al., 2001) to evolve faster-cycling *Taq* mutants. CSR employs emulsion PCR to trap individual *E. coli* cells harboring mutant polymerase genes in microscopic aqueous compartments along with nucleotides and *pol* gene-specific primers. When the emulsion is cycled under selective conditions, active mutant polymerases self-replicate and are enriched, while those with insufficient activity fail to replicate and are lost from the gene pool. CSR has been used to successfully evolve *Taq* mutants with increased thermostability or heparin resistance (Ghadessy et al., 2001), and chimeric polymerases that are broadly resistant to complex environmental inhibitors or can process non-canonical primer-template duplexes and bypass lesions found in ancient DNA such as abasic sites (d'Abbadie et al., 2007). In this study, we used CSR to evolve *Taq* mutants that can self-replicate under progressively shorter extension times. As we will show, CSR selection yielded *Taq* variants with a broad range of beneficial attributes in addition to increased polymerization rate.

# **MATERIALS AND METHODS**

All molecular biology reagents were from Agilent Technologies unless otherwise noted. Oligonucleotides were purchased from Integrated DNA Technologies. Radioactive nucleotides [γ-<sup>33</sup>P]ATP (3000 Ci/mmol, 1 mCi; NEG302H001MC) and [methyl-<sup>3</sup>H]deoxythymidine-5′-triphosphate, tetrasodium salt (1 mCi; NET221A001MC) were purchased from Perkin Elmer.

#### **RANDOM AND SITE-DIRECTED MUTAGENESIS**

*Taq* mutants were generated by random mutagenesis of the *Thermus aquaticus pol I* gene using the GeneMorph II random mutagenesis kit and PCR primers that introduce *Xba*I and *Sal*I sites and an N-terminal His6 tag (F: GGCGGCTCTAGATAACGAGGGCAAAAAATGCATCATCATCACCATCAC, R: GCGGTGCGGAGTCGACTTACTCCTTGGCGGAGAGCCAGTC). PCRs also included 5% DMSO and different amounts of plasmid template (10 ng, 1 ng, 0.1 ng) to generate libraries with mutation rates of 4.7–6.2 per kb. After *Dpn*I treatment, PCR products were gel purified and digested with *Xba*I and *Sal*I (NEB). Purified fragments were cloned into the pASK-IBA5C expression vector (IBA) and transformed into XL10-Gold Kan cells. Site-directed mutagenesis was performed using the QuikChange Lightning or QuikChange Lightning Multi Site-Directed Mutagenesis kit.
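The reported mutation rates translate into an expected mutational load per clone. A minimal Python sketch, assuming mutations fall independently along the gene so that their number per clone is Poisson-distributed (a common simplification for error-prone PCR libraries), and an open reading frame of about 2.5 kb (the 832-codon *Taq* pol gene); it prints the mean load and the fraction of clones expected to carry no mutation:

```python
import math

GENE_KB = 2.5  # assumed ORF length of the Taq pol gene (~832 codons)


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k mutations in a gene whose mean load is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


for rate_per_kb in (4.7, 6.2):
    lam = rate_per_kb * GENE_KB
    print(f"{rate_per_kb} per kb -> mean {lam:.1f} mutations per gene, "
          f"P(no mutation) = {poisson_pmf(0, lam):.1e}")
```

Under these assumptions nearly every clone carries roughly 12 to 16 mutations.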

#### **CSR SELECTION**

Approximately 250,000 independent clones were scraped from plates, re-suspended, and stored as glycerol stocks. LB/CAM cultures (40 ml) were freshly inoculated, grown at 30°C to OD600 of 0.6, and induced with 200 ng/ml anhydrotetracycline. After 3 h of growth, cells were harvested and washed in 1× *Taq* buffer. CSR was carried out essentially as described by Ghadessy et al. (2001). To select for faster-cycling mutants, extension times were successively reduced over five rounds of CSR from 2.5 min (round 1) to 15 s (round 5). PCR selection was performed on a Robocycler 96 using 2.5 min at 94°C followed by 30 cycles of 45 s at 94°C, 30 s at 60°C, and 0.25–2.5 min at 72°C.
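Only the first (2.5 min) and last (15 s) extension times are specified above. A minimal Python sketch of one way the intermediate rounds could be spaced, assuming a hypothetical geometric schedule and a self-replicated *pol* gene of about 2.5 kb, shows how steeply the implied synthesis rate rises if the whole gene had to be copied during the 72°C hold (a crude bound, since primers are also extended during annealing and ramping):

```python
def geometric_schedule(start_s: float, end_s: float, rounds: int) -> list[float]:
    """Extension times shrinking by a constant factor each round (hypothetical)."""
    factor = (end_s / start_s) ** (1.0 / (rounds - 1))
    return [start_s * factor ** i for i in range(rounds)]


GENE_BP = 2500  # assumed size of the self-replicated pol gene

schedule = geometric_schedule(150, 15, 5)
for rnd, t in enumerate(schedule, start=1):
    # Rate needed to copy the whole gene within the 72 °C hold alone.
    print(f"round {rnd}: {t:5.1f} s -> >= {GENE_BP / t:5.1f} nt/s")
```

A constant-factor schedule keeps the selection pressure increasing by the same proportion each round; a linear schedule would instead concentrate most of the pressure in the final rounds.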

#### **PROTEIN EXPRESSION**

Colonies expressing mutant *Taq* polymerases were randomly picked, replicated, and then grown overnight at 30°C in 96 deep well plates (VWR) containing 750μl LB/CAM. Overnight cultures (30μl) were inoculated into fresh media, induced with anhydrotetracycline at OD600nm of 0.3–0.5, and grown overnight with shaking at 30°C. Cells were collected and used to prepare lysates for direct PCR screening or for affinity protein purification (see below). Cell pellets were re-suspended in 50μl Tris pH 8 containing 4 mg/ml lysozyme, and incubated at 37°C for 10 min to disrupt cell walls and at 75°C for 15 min to inactivate *E. coli* proteins. Lysates were clarified by centrifugation for 15–30 min at 4000 RPM through a 96-well filter plate (Millipore, Multiscreen HTS, HV).

#### **PURIFICATION OF HIS-TAGGED** *Taq* **MUTANTS**

Cell pellets were re-suspended in 90μl of buffer prepared by adding one cOmplete EDTA-free Protease Inhibitor tablet (Roche) to 50 ml of 50 mM Tris pH 8, 0.5 M NaCl, 5 mM imidazole. An aliquot (60μl) of 1× FastBreak Cell Lysis Reagent (Promega) was added and the lysate was incubated at 37°C for 15 min and at 70°C for 15 min, before centrifugation through a Millipore 96-well filter plate. Clarified lysates were combined with 60μl Ni-NTA agarose (Qiagen) and incubated with shaking at room temperature for 2 h. After collecting the agarose resins using a fresh filter plate, resins were washed two times with 200μl wash buffer (50 mM Tris pH 8, 0.5 M NaCl, 20 mM imidazole) and eluted with 80μl of 50 mM Tris pH 8, 0.5 M NaCl, 200 mM imidazole.

#### **FAST PCR SCREENING**

*Taq* mutants were screened by amplifying a 549 bp GAPDH target on the Mx3005 qPCR system using fast-cycling conditions consisting of 1 min at 95°C followed by 50 cycles of 2 s at 99°C and 7 s at 59°C. PCRs (25μl) contained 1× *Taq* buffer (15 mM Tris pH 8.0, 50 mM KCl, 2.5 mM MgCl2, 0.01% Tween-20), 0.8 mM dNTPs, GAPDH primers (5′-ATCTTGAGGCTGTTGTCATAC; 5′-CAGGAAACAGCTATGACCATG), 10<sup>5</sup> copies of plasmid DNA, 0.8× EvaGreen, and either 2μl clarified lysate (neat or diluted 1:5) or 10–50 ng of purified His-tagged *Taq*. Primary hits displayed earlier Cqs than wild-type *Taq* controls processed in the same way on the same plates.
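Because hits were scored as earlier Cqs relative to wild-type controls on the same plate, the comparison reduces to a per-plate Cq shift. A minimal Python sketch with hypothetical well names and Cq values; the 1-cycle threshold is an arbitrary illustration, not the criterion used in the study:

```python
from statistics import mean


def call_hits(plate_cqs: dict[str, float], wt_wells: set[str],
              threshold: float = -1.0) -> list[str]:
    """Return mutant wells whose Cq is at least |threshold| cycles earlier than
    the mean of the wild-type control wells on the same plate, earliest first."""
    wt_mean = mean(plate_cqs[w] for w in wt_wells)
    shifts = {w: cq - wt_mean for w, cq in plate_cqs.items() if w not in wt_wells}
    return sorted((w for w, d in shifts.items() if d <= threshold), key=shifts.get)


# Hypothetical Cq values from one fast-cycling screening plate.
plate = {"H10": 24.0, "H11": 24.4, "H12": 24.2,   # wild-type Taq controls
         "A01": 21.1, "A02": 24.3, "A03": 22.6}   # mutant lysates
print(call_hits(plate, {"H10", "H11", "H12"}))
```

Normalizing to controls on the same plate, as described above, cancels plate-to-plate differences in thermal performance and reagent dispensing that would otherwise shift all Cqs together.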

#### **COLUMN PURIFICATION OF NON-TAGGED** *Taq* **MUTANTS**

Mutant *pol* genes were subcloned into pET11 (with no His tag) and expressed in *E. coli* strain BL21-DE3-RIPL. One-liter cultures were grown at 30°C in LB medium with 125μg/ml ampicillin and 30μg/ml chloramphenicol, and induced at OD600nm of 0.6 with 1 mM IPTG for 4–5 h. Cell pellets were recovered by centrifugation and stored at −20°C. For purification, pellets were suspended in Buffer A (50 mM Tris-Cl pH 8.2, 1 mM EDTA, 10 mM 2-mercaptoethanol) plus cOmplete Protease Inhibitor (Roche). Cell suspensions were disrupted by sonication, brought to 0.2 M with solid (NH4)2SO4, heated in a water bath at 80°C for 15 min, and then cooled on ice. Polyethyleneimine was added to 0.2% (w/v), and after mixing thoroughly, insoluble material was removed by centrifugation. The supernatant was loaded on an SP Sepharose FastFlow (GE Healthcare) column equilibrated and run with Buffer B (50 mM Tris-Cl pH 8, 1 mM EDTA, 0.2 M (NH4)2SO4, 10 mM 2-mercaptoethanol, 5% glycerol). Flow-through fractions were pooled and dialyzed against 15 volumes (with two changes) of Buffer C (50 mM Tris-Cl pH 8.3, 1 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol), and then loaded on Q Sepharose HP (GE Healthcare) equilibrated with Buffer C. The column was eluted with a 12.5 column-volume gradient to Buffer C containing 400 mM KCl. *Taq*-containing fractions (as judged by SDS-PAGE) were pooled, diluted with 2.5 volumes of Buffer D (50 mM Tris-Cl pH 7.5, 1 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol, 125 mM KCl) and then loaded on Heparin Sepharose HP (GE Healthcare). The column was eluted with a 25 column-volume gradient to Buffer D with 650 mM KCl. Substantially pure *Taq*-containing fractions were pooled, dialyzed into storage buffer (20 mM Tris-Cl pH 8, 0.1 mM EDTA, 1 mM DTT, 100 mM KCl, 50% glycerol), and stored at −20°C. Protein was quantified using the Coomassie Plus protein assay (Thermo Fisher Scientific).

#### **REAL-TIME PCR ASSAYS**

SYBR Green qPCR reactions (20μl) consisted of 10–20 ng wild-type or mutant *Taq*, 0.5–50 ng human genomic DNA, 200μM each dNTP, 600 nM total primers, 1× *Taq* buffer (adjusted to 95 mM KCl for all PCRs and biochemical assays employing *Taq* mutants), and 0.24× SYBR Green. Primer sequences were as follows: ABC (F: 5′-CCAAACCCTGGATCACGTGTT-3′; R: 5′-CCTCCGCGTCTCGTAGTTCT-3′), COMTE2 (F: 5′-GAGATCAACCCCGACTG-3′; R: 5′-GGCCCTTTTTCCAG-3′), Quantos (F: 5′-TATAAGAAACTACTAAGCACCCAAAGG-3′; R: 5′-AAGAAAGGAGTCTAAGTGACTCAACAG-3′), Aldolase (F: 5′-AGCCTAGCTCCAGTGCTTCTAGTA-3′; R: 5′-CTTTGGATGAGGAGCCGATATTG-3′), Numb (F: 5′-GAGGTTCCTACAGGCACCTGCCCAG-3′; R: 5′-CAAAATCACCCCTCACAGTACTCTG-3′). TaqMan qPCR reactions consisted of 10 ng wild-type or mutant *Taq*, human genomic DNA, 0.8 mM dNTPs, 1× *Taq* buffer (adjusted to 95 mM KCl for *Taq* mutants), and 1× β-actin primer/probe (171 bp) from Life Technologies' Assays-On-Demand. qPCR reactions were run on the StepOnePlus (Life Technologies) or CFX96 (BioRad) instrument using the cycling parameters indicated in the figure legends.

### **PRIMER EXTENSION ASSAYS**

Extension rate and processivity were measured at 70°C using M13mp18 template DNA (NEB), pre-annealed at a 1.3–1.5:1 molar ratio to a 5′-<sup>33</sup>P-labeled primer with the sequence 5′-GGTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGC-3′. Extension reactions (80 μl) consisting of 1.75 pmol primed M13, 200 μM each dNTP, and 1× *Taq* buffer were brought to 70°C prior to adding 0.55 pmol *Taq* enzyme. Aliquots (8 μl) were removed at 15, 30, 45, 60, 90, and 120 s and quenched with 50 mM EDTA. Extension products were denatured at 80°C for 3 min and analyzed on 1% alkaline-agarose gels to determine mean fragment length. Extension rates were calculated as the number of nucleotides incorporated divided by incubation time. Processivity assays were conducted similarly using 1.1 pmol primed M13 and limiting amounts of *Taq* (template/enzyme molar ratios of 37:1, 370:1, and 3700:1). Aliquots removed at various time points (15, 30, and 60 s) were quenched in gel loading buffer and analyzed on 6% TBE-Urea gels (Life Technologies). Median processivity was determined from reactions producing the same product length over several time points or enzyme amounts.
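The rate calculation described above (nucleotides incorporated divided by incubation time) and the enzyme amounts implied by the processivity template/enzyme ratios can be sketched as follows. The sketch makes three assumptions.

- Subtracting the 40-nt primer from the measured mean fragment length is an assumption about how "nucleotides incorporated" is derived.
- The least-squares slope is one possible way to pool time points, not necessarily the authors' procedure.
- The fragment lengths in the usage are made-up values describing a constant 50 nt/s.

Function names are illustrative.

```python
# Extension-rate and enzyme-amount arithmetic for the primer extension assays (illustrative).

PRIMER_NT = 40  # length of the M13 primer sequence given in the text


def extension_rates(times_s, mean_lengths_nt, primer_nt=PRIMER_NT):
    """Per-time-point rate: nucleotides added beyond the primer / incubation time (nt/s)."""
    return [(length - primer_nt) / t for t, length in zip(times_s, mean_lengths_nt)]


def fitted_rate(times_s, mean_lengths_nt):
    """Least-squares slope of mean fragment length versus time (nt/s)."""
    n = len(times_s)
    mean_t = sum(times_s) / n
    mean_l = sum(mean_lengths_nt) / n
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(times_s, mean_lengths_nt))
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    return sxy / sxx


def enzyme_pmol(template_pmol, template_to_enzyme_ratio):
    """Enzyme amount for a given template:enzyme molar ratio."""
    return template_pmol / template_to_enzyme_ratio


if __name__ == "__main__":
    times = [15, 30, 45, 60, 90, 120]                 # s, sampling times from the text
    lengths = [PRIMER_NT + 50 * t for t in times]     # hypothetical: 50 nt/s
    print(extension_rates(times, lengths))            # 50.0 at every time point
    print(fitted_rate(times, lengths))                # 50.0
    for ratio in (37, 370, 3700):
        print(ratio, f"{enzyme_pmol(1.1, ratio) * 1000:.2f} fmol")
```

For 1.1 pmol primed M13, the three ratios correspond to roughly 30, 3, and 0.3 fmol of enzyme.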

### **NUCLEOTIDE INCORPORATION ASSAYS**

Polymerase cocktails (10 μl) contained 200 μM each of dATP, dGTP, and dCTP, 100 μM TTP (<sup>3</sup>H-TTP), 1× *Taq* buffer (adjusted to 95 mM KCl for *Taq* mutants), and fixed or varying concentrations of primed (non-radiolabeled) M13 DNA for half-life (T1/2) or *Kcat* measurements, respectively. Steady-state kinetic parameters were determined using 0.005 pmol *Taq* and 0.5–100 nM primed M13. After 3 min at 60°C, incorporation reactions were quenched with ice-cold 0.1 M EDTA, and 5 μl aliquots were spotted on DE-81 filters. After five washes with 2× SSC, incorporated radioactivity was measured by scintillation counting. *Km* and *Vmax* values were determined from Lineweaver-Burk plots, and *Kcat* was calculated as the maximum number of nucleotides incorporated per *Taq* molecule per second (*Vmax*/[*Taq*]). To determine T1/2 (95°C), *Taq* was pre-heated in the absence or presence of genomic DNA before residual DNA polymerase activity was assayed as described above. Mixtures (10 μl) consisting of 0.02 pmol *Taq*, 1× *Taq* buffer, and 0 or 10 ng human genomic DNA were overlaid with mineral oil and incubated at 95°C for 5–180 min. At various time points, aliquots (2 μl) were transferred to polymerase assay cocktail (10 μl) containing 0.05 pmol primed M13. Polymerase activity (CPM) was plotted against pre-incubation time at 95°C.
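The analyses named above can be sketched in a few lines. The double-reciprocal fit uses 1/v = (Km/Vmax)(1/[S]) + 1/Vmax, and kcat = Vmax/[Taq]. The text only says residual activity was plotted against pre-incubation time. This sketch therefore *assumes* first-order loss of activity, so T1/2 = ln 2 / k from a fit of ln(CPM) against time. The data below are noise-free and synthetic (hypothetical Km = 10 nM, Vmax = 0.01 pmol nt/s, T1/2 = 60 min), and the function names are illustrative.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def lineweaver_burk(substrate, velocity):
    """(Km, Vmax) from the double-reciprocal fit 1/v = (Km/Vmax)(1/[S]) + 1/Vmax."""
    slope, intercept = linear_fit([1 / s for s in substrate], [1 / v for v in velocity])
    vmax = 1 / intercept
    return slope * vmax, vmax


def kcat(vmax, enzyme):
    """Turnover number Vmax / [Taq]; Vmax and enzyme must use matching amount units."""
    return vmax / enzyme


def half_life(times_min, activity_cpm):
    """T1/2 assuming first-order loss of activity: ln(A) = ln(A0) - k t."""
    slope, _ = linear_fit(times_min, [math.log(a) for a in activity_cpm])
    return math.log(2) / -slope


if __name__ == "__main__":
    primed_m13_nM = [0.5, 1, 2, 5, 10, 20, 50, 100]        # range from the text
    rates = [0.01 * s / (10 + s) for s in primed_m13_nM]   # synthetic pmol nt/s
    km, vmax = lineweaver_burk(primed_m13_nM, rates)
    print(f"Km = {km:.2f} nM, Vmax = {vmax:.4f} pmol/s, kcat = {kcat(vmax, 0.005):.2f} s^-1")
    times = [5, 15, 30, 60, 120, 180]                      # min at 95 C
    cpm = [20000 * 0.5 ** (t / 60) for t in times]         # synthetic residual activity
    print(f"T1/2 = {half_life(times, cpm):.1f} min")
```

Because the synthetic data are exact, the sketch recovers Km = 10 nM, kcat = 2 s^-1 (with the 0.005 pmol enzyme from the text), and T1/2 = 60 min. With real, noisy data the reciprocal transform weights the lowest-[S] points heavily, which is why a direct nonlinear Michaelis-Menten fit is a common alternative.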

### ***K*<sub>d</sub> ASSAY**

Dissociation constant *Kd* (DNA) was measured using a gel mobility-shift assay that employs a 5′-<sup>33</sup>P-labeled hairpin template (5′-CTCCAGACACGACGCAGTTGCCCGATGGTCGACGTTCGCGAAAGCGAACGTCGACCATCGGGCAACT-3′) blocked at the 3′ end with dideoxy TMP. The radiolabeled hairpin (0.25 nM) was incubated with varying concentrations of wild-type or mutant *Taq* DNA polymerases (0.5–1000 nM, bracketing the predicted *Kd*) in 1× *Taq* buffer at 37°C for 30 min. DNA-protein complexes were resolved on 10% non-denaturing TBE gels (Life Technologies) in 0.5× TBE buffer. Gels were dried and exposed to film. The fraction of DNA bound was quantified by densitometry using AlphaView SA software (Alpha Innotech) and plotted against enzyme concentration to determine *Kd* values by interpolation.
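One way to read *Kd* off the binding curve "by interpolation" is to find the enzyme concentration at which half of the hairpin is bound. This sketch interpolates linearly on a log-concentration axis between the two titration points that bracket 50%. The interpolation scheme is an assumption, because the text does not specify it. Equating the half-saturation point with *Kd* also assumes that ligand depletion by the 0.25 nM labeled hairpin can be neglected. The titration data are synthetic (hypothetical *Kd* = 20 nM), and the function name is illustrative.

```python
import math


def kd_by_interpolation(conc_nM, fraction_bound, target=0.5):
    """Concentration giving `target` fraction bound, interpolated on a log10 axis.

    Assumes fraction bound rises monotonically with enzyme concentration.
    """
    points = sorted(zip(conc_nM, fraction_bound))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 <= target <= f2 and f2 > f1:
            frac = (target - f1) / (f2 - f1)
            log_c = math.log10(c1) + frac * (math.log10(c2) - math.log10(c1))
            return 10 ** log_c
    raise ValueError("titration does not bracket the target fraction bound")


if __name__ == "__main__":
    # Synthetic single-site binding curve with a hypothetical Kd of 20 nM.
    titration = [0.5, 1, 5, 10, 50, 100, 500, 1000]
    bound = [c / (c + 20.0) for c in titration]
    print(f"Interpolated Kd ~ {kd_by_interpolation(titration, bound):.1f} nM")
```

The synthetic points at 10 and 50 nM bracket 50% binding, so the sketch returns about 20 nM. Interpolation error shrinks as titration points are spaced more closely around the midpoint.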

#### **PCR/qPCR INHIBITION ASSAYS**

Xylan, humic acid, CTAB, and dextran sulfate were purchased from Sigma. Blood was collected from healthy human volunteers using BD Vacutainer tubes with EDTA or heparin. Inhibitor-resistance screens ("spike-in" assays) were conducted by adding varying concentrations of inhibitors to qPCRs (20 μl) containing 20 ng wild-type or mutant *Taq* DNA polymerase, 200 μM each dNTP, 1× *Taq* buffer, 0.24× SYBR Green, 10<sup>5</sup> copies of plasmid DNA, and 600 nM GAPDH primers (5′-ATCTTGAGGCTGTTGTCATAC-3′ and 5′-CAGGAAACAGCTATGACCATG-3′) designed to amplify a 549 bp GAPDH target. Reactions were run on the Mx3005 qPCR system using a 2 min initial denaturation at 95°C followed by 40 cycles of 12 s at 95°C and 60 s at 60°C.

Endpoint PCRs (50 μl) performed directly from blood (endogenous targets) contained 50 ng DNA polymerase, 15 mM Tris pH 8.8, 50 mM (wild-type *Taq*) or 95 mM (mutant *Taq*) KCl, 2.5 mM MgCl2, 0.02% Tween-20, 200 μM each dNTP, and 2% DMSO. A 322 bp IGF target was amplified directly from blood using 400 nM each primer (IGF322-F 5′-ATGGAGGGACCAATAGTAGGGAA-3′; IGF322-R 5′-AGTACCACGTACAGGCTTTGCAT-3′) and the following cycling parameters: 5 min at 90°C followed by 30 cycles of 30 s at 95°C, 30 s at 60°C, and 1 min at 72°C. Water (25 μl) was added to PCRs containing blood before centrifugation to pellet debris. A portion (12 μl) of each sample was run on a 4% NuSieve® (3:1) TBE agarose gel (Lonza). Comparisons employing commercial enzymes were conducted using the 232 bp Quantos assay system (see above) with each manufacturer's recommended reaction buffer and cycling conditions.

#### **RESULTS**

#### *Taq* **MUTAGENESIS AND SCREENING**

We used the CSR technique to evolve mutants of *Taq* DNA polymerase that self-replicate using abbreviated extension times. Faster-replicating mutant polymerases are expected to provide robust performance with "fast" PCR instruments and cycling conditions. Moreover, identifying mutations that allow shorter cycle times may provide insight into the kinetic factors that limit the PCR performance of wild-type *Taq*.

*Taq* mutant libraries were subjected to CSR selection using progressively shorter extension times, ranging from 1 min down to 6 s per kb of *taq pol I*. After rounds 2, 4, and 5, crude lysates were prepared and tested in amplifications employing 2-step cycling with 2 s denaturation and 7 s annealing-extension times. To account for differences in protein expression, lysates were prepared and assayed at least 2–3 times before the top-performing *Taq* mutants were affinity-purified. From a screen of several hundred clones, we recovered 24 His-tagged *Taq* mutants that consistently produced earlier Cq values than wild-type *Taq* in real-time PCR under fast cycling conditions. DNA sequencing identified 8 mutations that appear in 2 or more independent clones, as follows (in order of frequency): E507K (11); G59W (9); L245M (5); V155I, L375V, F749I (4 each); K508R, E734G (2 each).

To identify the most effective mutation or combination of mutations, we performed multi-site mutagenesis with an equimolar mixture of 8 mutant primers to create all possible (single, double, triple, etc.) combinations of the G59W, V155I, L245M, L375V, E507K, K508R, E734G, and F749I mutations. The combinatorial library was enriched by 1 round of CSR selection using 15 s (6 s per kb) extension times, and random clones were screened as described above with a modified *Taq* PCR reaction buffer. We increased the KCl concentration from 50 to 95 mM to eliminate non-specific amplification products generated by the majority of fast-cycling mutants (<30% of total amplified product, as judged by agarose gels or qPCR melt curves; data not shown). Higher KCl concentrations (>50 mM) inhibit wild-type *Taq* DNA polymerase, to the extent that no PCR products are generated at 95 mM KCl. The most active *Taq* mutants (1C2, 2C2, 3B, and 42; see **Table 1**) were subcloned to remove the His tag, purified using a standard wild-type *Taq* protocol, and characterized further by PCR.
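The size of the combinatorial library follows directly from the eight selected mutations. Every non-empty subset is a possible variant, giving 2^8 − 1 = 255 mutant combinations in addition to the wild-type background. The quoted "15 s (6 s per kb)" extension time likewise implies a template of about 2.5 kb for the *taq* gene. The Python sketch below only enumerates these quantities and says nothing about which combinations were actually recovered. Function names are illustrative.

```python
from collections import Counter
from itertools import combinations

MUTATIONS = ["G59W", "V155I", "L245M", "L375V", "E507K", "K508R", "E734G", "F749I"]


def all_combinations(mutations):
    """Every single, double, triple, ... combination of the given mutations."""
    return [
        combo
        for k in range(1, len(mutations) + 1)
        for combo in combinations(mutations, k)
    ]


def extension_time_s(template_kb, s_per_kb):
    """Extension time (s) for a template of `template_kb` at `s_per_kb`."""
    return template_kb * s_per_kb


if __name__ == "__main__":
    combos = all_combinations(MUTATIONS)
    print(len(combos))                                      # 255
    print(sorted(Counter(len(c) for c in combos).items()))  # variants per mutation count
    print(extension_time_s(2.5, 6))                         # 15.0
```

The per-size counts follow the binomial coefficients (8 singles, 28 doubles, 56 triples, 70 quadruples, and so on), which is why a modest clone screen can sample only part of a library like this.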

In endpoint PCRs employing longer (>1 kb) genomic targets, *Taq* 1C2, 2C2, 3B, and 42 efficiently amplified targets using 15–30 s/kb extension times, whereas wild-type *Taq* required 1 min/kb to generate similar yields (data not shown). *Taq* 1C2, 2C2, 3B, and 42 were also tested with shorter targets to assess the benefits of fast cycling in quantitative PCR (qPCR) assays, which traditionally employ targets of <300 bp (SYBR Green detection) or <200 bp (probe detection) to achieve amplification efficiencies as close to 100% as possible. In qPCRs employing SYBR Green, wild-type *Taq* readily amplifies genomic DNA targets up to 300 bp using standard anneal-extension times of 1 min at 60°C (**Figure 1**). However, with 10 s anneal-extension times, wild-type *Taq* produced slightly lower ΔRn values in the 109 bp COMTE2 assay and completely failed to amplify a 305 bp NUMB target (**Figures 2B,C**). In contrast, *Taq* 1C2, 2C2, 3B, and 42 amplified the entire series (91–305 bp) of genomic DNA targets using abbreviated cycle times (10 s, **Figure 2**; 1 s, **Figure 1**), and no significant differences were observed among the mutants with respect to Cq values, ΔRn values, or amplification efficiencies. Compared to SYBR Green assays, abbreviated cycle times appear to have less of an impact on TaqMan assays designed according to standard primer-probe design rules (<200 bp targets). In the example shown in **Figure 3**, *Taq* 1C2, 2C2, 3B, and 42 produce results equivalent to wild-type *Taq* using 10 s anneal-extension times, indicating that probe hydrolysis (5′ structure-specific endonuclease activity) is not affected by the fast-cycling mutations.
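Amplification efficiency in qPCR comparisons like these is conventionally derived from a dilution series. Cq is regressed on log10(input), and E = 10^(−1/slope) − 1, so a slope of about −3.32 corresponds to 100% efficiency (perfect doubling each cycle). The text does not state how efficiencies were computed, so this Python sketch applies the standard formula rather than reproducing the authors' analysis. The Cq values for the 50, 5, and 0.5 ng genomic DNA series are synthetic, and the function name is illustrative.

```python
import math


def amplification_efficiency(input_amounts, cq_values):
    """Return (efficiency, slope) from a standard curve of Cq versus log10(input).

    Efficiency is E = 10**(-1/slope) - 1; E = 1.0 means 100% (doubling per cycle).
    """
    x = [math.log10(q) for q in input_amounts]
    n = len(x)
    mx, my = sum(x) / n, sum(cq_values) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, cq_values))
    slope = sxy / sxx
    return 10 ** (-1 / slope) - 1, slope


if __name__ == "__main__":
    inputs_ng = [50, 5, 0.5]
    # Synthetic Cq values for perfect doubling: each 10-fold dilution adds log2(10) cycles.
    cqs = [24 + math.log2(50 / q) for q in inputs_ng]
    eff, slope = amplification_efficiency(inputs_ng, cqs)
    print(f"slope = {slope:.2f}, efficiency = {eff * 100:.0f}%")
```

For these ideal data the sketch reports a slope of −3.32 and 100% efficiency. A polymerase that fails to complete extension within a short hold would show a steeper slope and lower E.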

# **MUTATIONS CONFERRING THE FAST-CYCLING PHENOTYPE**

DNA sequencing revealed that 3 of the original 8 mutations (L245M, E507K, and F749I) are absolutely conserved in *Taq* 1C2, 2C2, 3B, and 42, indicating that one or more of them are essential for fast cycling under CSR selection conditions (**Table 1**). Although not critical, other mutations emerging from CSR selection may further enhance the overall fitness of the *Taq* 245M/507K/749I mutants. For example, G59W (missing in *Taq* 3B) and/or V155I (missing in *Taq* 42) appear in the majority (3 out of 4) of the top performers, suggesting a role for these N-terminal domain mutations in enhancing robustness. The L375V, K508R, and E734G mutations appear less frequently and may be of limited importance to survival during CSR selection. **Figure 4A** shows the location of G59W, V155I, and L245M in the N-terminal 5′–3′ endonuclease domain, and the close proximity of E507K and F749I in the thumb and fingers domains, respectively.

To further elucidate the contribution of individual mutations to the fast-cycling phenotype, we constructed and purified the single-site mutants *Taq* G59W, *Taq* V155I, *Taq* E507K, and *Taq* F749I. When equivalent amounts of enzyme were compared in qPCRs, only *Taq* E507K amplified 286 and 232 bp genomic DNA targets with short (1 s) anneal-extension times (**Figure 1** and data not shown). Moreover, *Taq* E507K produced Cq values equivalent to those of *Taq* 42, 2C2, and 3B, indicating that the E507K mutation alone accounts for the fast-cycling phenotype. With standard cycle times (60 s anneal-extension), amplification profiles are comparable across all *Taq* enzymes, confirming that the single-site mutants *Taq* G59W, *Taq* V155I, and *Taq* F749I retain wild-type levels of polymerase activity.

### **KINETIC PARAMETERS OF** *Taq* **MUTANTS**

Kinetic parameters were investigated at each enzyme's KCl optimum (50 or 95 mM for *Taq* or *Taq* mutants, respectively) to determine how E507K confers the fast-cycling phenotype. In radiolabeled primer extension assays, *Taq* E507K and *Taq* 42 exhibit somewhat higher (1.7-fold) polymerization rates than wild-type *Taq* (85 vs. 50 nt s<sup>−1</sup>; **Table 2**), but no change in processivity (20 bases; **Table 2**). Measurements of steady-state kinetic parameters also show a moderate (2.2-fold) increase in *Kcat* values (2.5 ± 0.03 s<sup>−1</sup> for *Taq* 42; 1.1 ± 0.04 s<sup>−1</sup> for *Taq*; data not shown). The most compelling difference between wild-type and E507K mutants was observed in gel-based *Kd* assays employing a hairpin oligonucleotide template. As shown in **Table 2**, dissociation constants for *Taq* 42, 1C2, and E507K (2.75, 1.9, and 1.1 nM, respectively) were approximately 35–90-fold lower than those of *Taq* and *Taq* G59W (102 and 91 nM, respectively). Moreover, *Kd* measurements were comparable for assays run in the absence or presence of dNTPs (data not shown). In sum, these results indicate that the E507K mutation supports faster PCR cycling by increasing the binding affinity of *Taq* for primed DNA templates, irrespective of nucleotide binding.

## **MUTATIONS IMPROVING POLYMERASE FITNESS**

CSR has been shown to exert strong selective pressure on polymerase fitness, enriching for variants that self-replicate with sufficient accuracy and efficiency to remain in the gene pool (Ghadessy et al., 2001). In some cases, selected traits evolved in parallel with increased inhibitor resistance, presumably because survival requires self-replication in the presence of CSR emulsifiers/stabilizers and bacterial debris (Baar et al., 2011). In our study, several mutations including G59W, V155I, L245M, and F749I emerged during selection of the fast-cycling phenotype, prompting speculation that one or more of these ancillary mutations may enhance overall fitness of *Taq* DNA polymerase. To address this possibility, we further characterized the stability and inhibitor-resistance of a subset of *Taq* multi-site mutants.

Thermostability was investigated by determining the half-life (T1/2) at 95°C in the absence and presence of DNA template. Compared to wild-type, *Taq* 42 exhibits a slightly higher T1/2 in the presence of DNA (71 ± 4.6 vs. 60 ± 1.7 min), but no significant difference in the absence of DNA (60.5 ± 2.3 vs. 62 ± 2 min; data not shown). These results are consistent with increased thermal resistance conferred by tighter binding (of E507K mutants) to DNA, but also imply that none of the other mutations in *Taq* 42 (G59W, V155I, L245M, L375V, K508R, E734G, F749I) enhance intrinsic thermal resistance. Next, inhibitor resistance of *Taq* 2C2 (all mutations except K508R) was investigated by amplifying an exogenous target in the presence of varying amounts of known *Taq* inhibitors, including plant-associated substances (xylan, dextran sulfate, CTAB), soil inhibitors (humic acid), whole blood treated with various anti-coagulants, and other PCR inhibitors (NaCl, SYBR Green) (Demeke and Adams, 1992; Watson and Blackwell, 2000; Kermekchiev et al., 2009). In these studies, inhibitor resistance was determined relative to wild-type *Taq* using standard PCR cycling times. Compared to wild-type *Taq*, *Taq* 2C2 shows significantly (≥4-fold) higher tolerance to dextran sulfate (50-fold; data not shown), NaCl (4-fold; **Table 3**), and whole blood (see below).

95°C, 10 s at 60°C. Genomic DNA targets are as follows: **(A)** 91 bp ABC, **(B)** 109 bp COMTE2, **(C)** 305 bp Numb. A 60 s extension time is required for wild-type *Taq* to efficiently amplify the 305 bp target (data not shown).

assay (Life Technologies; 171 bp β-actin target) was performed using 50, 5, and 0.5 ng human genomic DNA and 10 ng of purified wild-type or mutant *Taq* DNA polymerase. Reactions were cycled on the StepOnePlus instrument using cycling conditions consisting of 2 min at 95°C followed by 40 cycles of: **(A)** 10 s at 95°C, 60 s at 60°C; or **(B)** 10 s at 95°C, 10 s at 60°C.

E507 resides in close proximity to the primer portion of the primer-template duplex **(B)** (Li et al., 1998).

#### **Table 2 | Kinetic parameters of wild-type and mutant** *Taq* **polymerases.**

*ND, Not Determined.*

Additional testing with single- and multi-site mutants revealed that E507K confers increased tolerance to NaCl (up to 100 mM) in addition to higher affinity for primed template. In contrast, mutations at other positions result in equivalent (V155I, F749I) or reduced (G59W; by 2.5-fold) tolerance to NaCl compared to wild-type *Taq*. Next, resistance to blood-associated inhibitors was investigated in more depth by amplifying endogenous targets directly from blood (EDTA tubes). As shown in **Figure 5**, the E507K mutation also provides increased tolerance to blood, allowing amplification of a 322 bp IGF target from up to 45% EDTA-blood. Resistance to blood may be partly explained by higher salt tolerance: the NaCl introduced with 22.5 μl blood (67.5 mM final) is within the range tolerated by *Taq* E507K (up to 100 mM; **Table 3**). Interestingly, the individual mutations G59W, V155I, and F749I also confer significant resistance to EDTA-blood, albeit less than E507K (15% compared to 45%), perhaps reflecting their intrinsic sensitivity to NaCl (they tolerate <10–25 mM, while 15% blood introduces 22.5 mM NaCl). In contrast, wild-type *Taq* fails to amplify the 322 bp endogenous target from as little as 1% EDTA-blood (**Table 3**), consistent with previous reports (Al-Soud and Radstrom, 1998; Kermekchiev et al., 2009). Together, these results indicate that the G59W, V155I, and F749I mutations confer significant (>10–15-fold) resistance to blood-associated inhibitors through a mechanism distinct from that of E507K (unrelated to increased binding affinity and salt resistance). Moreover, when the majority of CSR-selected mutations are added to E507K, resistance increases from 45% (*Taq* E507K) to 50% (*Taq* 3B; 3 additional mutations) and 60–65% (*Taq* 1C2, 2C2, 42; 4–6 additional mutations).
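The salt figures quoted in the blood experiments follow from simple dilution into a 50 μl reaction. The text gives 67.5 mM NaCl from 22.5 μl blood and 22.5 mM NaCl from 15% blood. Both figures are consistent with assuming roughly 150 mM NaCl in whole blood. That constant is an inference from the quoted numbers, not a value measured in the study, and other blood salts are ignored. Function names are illustrative.

```python
# Dilution arithmetic for PCR directly from blood (illustrative; 150 mM NaCl is an assumption).

BLOOD_NACL_MM = 150.0   # approximate NaCl in whole blood implied by the quoted figures
REACTION_UL = 50.0      # endpoint PCR volume used for blood amplifications


def percent_blood(blood_ul, reaction_ul=REACTION_UL):
    """Blood content of the reaction, % (v/v)."""
    return 100.0 * blood_ul / reaction_ul


def nacl_from_blood_mM(blood_ul, reaction_ul=REACTION_UL, blood_nacl_mM=BLOOD_NACL_MM):
    """Final NaCl concentration (mM) contributed by the added blood."""
    return blood_nacl_mM * blood_ul / reaction_ul


if __name__ == "__main__":
    for blood_ul in (0.5, 7.5, 22.5):
        print(f"{blood_ul:5.1f} ul blood = {percent_blood(blood_ul):4.1f}% "
              f"-> {nacl_from_blood_mM(blood_ul):5.1f} mM NaCl")
```

Inverting the same relation gives the blood volume at which a given NaCl tolerance would be reached. This is the comparison the paragraph above draws between E507K (up to 100 mM) and the single-site mutants (<10–25 mM).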

*\*The values represent the highest inhibitor concentration beyond which PCR amplification is inhibited.*

The fast-cycling *Taq* mutants were found to be much less resistant to heparinized blood. The difference was more pronounced for single-site mutants (≤2% for heparinized blood vs. 15–45% for EDTA-blood) than for combinatorial mutants (10–30% vs. 50–65% for heparin- vs. EDTA-blood, respectively). Presumably, heparinized blood poses a greater challenge due to the combined effects of these known inhibitors (blood and heparin; Satsangi et al., 1994). Resistance to heparinized blood decreases in the following order: 1C2, 2C2 (25–30%) > 42, 3B (10%) > G59W, E507K, F749I (2%) > V155I (≥1%) > wild-type *Taq* (<1%; **Table 3**), indicating that additional mutations (along with E507K) are essential to overcoming the inhibitory burden posed by heparinized blood. *Taq* 1C2 DNA polymerase has been incorporated into a commercially available master mix (SureDirect PCR, Agilent Technologies) designed for amplification of genomic targets, including cfDNA, from large volumes of blood. When compared to other blood PCR kits, we found that the majority can amplify the 232 bp Quantos target from 2.5% and 25% EDTA- and heparin-treated blood (**Figure 6**). Only the *Taq* 1C2-based formulation could amplify an endogenous target from 45% blood (heparin-treated).

# **DISCUSSION**

In this report, we describe the properties of *Taq* DNA polymerase mutants evolved to self-replicate under abbreviated cycle times. Mutations emerging from CSR selection (G59W, V155I, L245M, L375V, E507K, K508R, E734G, F749I) were randomly combined and subjected to a final round of selection employing 15 s anneal-extension times. The fastest-cycling combinatorial mutants (*Taq* 42, 1C2, 2C2, and 3B) readily amplified genomic DNA targets up to 300 bp using abbreviated (≤10 s hold times) two-step cycling protocols on fast-ramping instruments (**Figures 1**, **2**). When the mutations were tested individually, the fast-cycling phenotype could be attributed solely to the E507K mutation (**Figure 1B**). Further characterization revealed that, compared to wild-type, *Taq* E507K exhibits a dramatic (∼90-fold) reduction in *Kd* for primer-template (measured in the absence of nucleotides), and only a moderate (∼1.7-fold) or no improvement in polymerization rate or processivity, respectively. These findings suggest that *Taq*'s affinity for primed template, rather than its catalytic rate, is a limiting factor in PCR amplification under fast-cycling conditions.

**FIGURE 5 | Amplification directly from blood.** A 322 bp IGF target was amplified from 5 to 25 μl EDTA-blood using 50 ng of each mutant *Taq* DNA polymerase. Reactions were cycled on the SureCycler 8800 using the following parameters: 5 min at 90°C, followed by 30 cycles of 30 s at 95°C, 30 s at 60°C, 60 s at 72°C. Wild-type *Taq* can amplify the 322 bp target from human genomic DNA in the absence of blood, but not in the presence of 1% blood (data not shown), even though the amount of EDTA introduced with 0.5 μl EDTA-blood (0.089 mM EDTA) is well below inhibitory levels.

A previous study concluded that E507 plays a role in primer-template binding, as substituting Q for E at this position improves *Taq*'s RNA-dependent 5′ nuclease activity without altering DNA-dependent 5′ nuclease activity or RNA- and DNA-dependent DNA polymerase activities (Ma et al., 2000). This result would be expected if the amino acid side chain at position 507 comes in close proximity to the template strand, and substituting an amine for a negatively charged oxygen reduces discrimination against the 2′-OH of the ribose sugar. The crystal structure of *Taq* large fragment shows that E507 resides in the H1H2 loop of the thumb domain, which interacts with the distal portion of the primer-template duplex in both open and closed forms of binary and ternary complexes (Li et al., 1998). In this model, the peptide carbonyl of E507 comes in close proximity (3.8 Å) to the phosphate moiety between the 6th and 7th nucleotides from the 3′ end of the newly extended primer (**Figure 4B**). As a whole, these data suggest that the E507K mutation stabilizes the *Taq*-DNA binary complex by forming additional contacts with the distal (away from the active site) portion of the primed template. Higher KCl (95 mM instead of 50 mM) may be required with *Taq* E507K mutants to reduce binding affinity for mis-annealed primer-template during PCR annealing steps. The importance of the E507 residue in protein/primer-template interactions has been shown in other DNA polymerase families as well. For example, Cozens et al. (2012) made a mutation analogous to *Taq* E507K in the thumb domain of *Thermococcus gorgonarius* DNA polymerase (family B; E664K), which transforms *Tgo* into an RNA polymerase by lowering the *Kd* for the non-cognate RNA/DNA duplex and lowering the *Km* for ribonucleotide incorporation. This mutant was also capable of translesion synthesis across an abasic site or thymidine dimer.

As discussed above, CSR has been shown to exert strong selective pressure on polymerase fitness, enriching for variants that self-replicate with sufficient fidelity and catalytic efficiency to remain in the gene pool. For example, selected traits such as improved thermostability and increased tolerance to inhibitors have been shown to evolve with no cost to fidelity or catalytic efficiency (*Kcat*/*Km*) (Ghadessy et al., 2001; Baar et al., 2011). CSR selection for tolerance to individual inhibitors has also been shown to produce broad spectrum resistance, presumably because survival requires self-replication in the presence of CSR emulsifiers, stabilizers, and bacterial debris (e.g., denatured protein, nucleic acid, and membrane lipid in the aqueous PCR compartments). In a striking example, CSR selection with bone powder produced a chimeric polymerase (2D9) with broad tolerance to a variety of environmental inhibitors, including humic acid, coprolite, peat extract, clay-rich soil, cave sediment, and tar, but surprisingly not to inhibitors in whole blood (Baar et al., 2011). Despite 81 amino acid changes, 2D9 exhibited comparable fidelity and processivity to wild-type *Taq*, consistent with the intrinsic requirement for polymerase fitness. A broad resistance spectrum implies a common mechanism of inhibition (for bone and soil extracts), prompting the authors to speculate that non-specific binding of inhibitors to protein and/or nucleic acid may sequester *Taq* or DNA template and prevent PCR amplification (Baar et al., 2011).

Encouraged by these and other reports, we assayed our CSR-selected mutants for inhibitor resistance. In addition to fast cycling, the E507K mutation was shown to improve resistance to NaCl (tolerates up to 100 mM) and to inhibitors in whole blood (tolerates up to 45% (v/v) EDTA-blood). Individual mutations G59W, V155I, and F749I also confer blood resistance, but the magnitude of improvement is about 3-fold smaller than for E507K, and no corresponding increase in NaCl tolerance was observed (*Taq* L245M not tested). Heparinized blood posed a significant challenge due to the combined inhibitory effects of blood and heparin, as shown by the drastic difference in blood volumes tolerated by *Taq* E507K (22.5 μl EDTA-blood vs. 1 μl heparin-blood per 50 μl PCR). Apparently, the increased DNA-binding affinity and NaCl resistance conferred by the E507K substitution are insufficient to overcome this inhibitory burden, and additional mutations are required for amplification from larger (>5 μl) volumes of heparinized blood. Not all mutations or mutation combinations have been tested, but our results to date implicate G59W and 1–3 additional mutations (V155I, L245M, and/or F749I) in improving tolerance to heparinized blood. Compared to *Taq* E507K, *Taq* 3B (E507K *plus* V155I, L245M, F749I) can amplify genomic targets directly from 5% or 8% more EDTA- or heparin-blood, respectively. Adding the G59W mutation (*Taq* 3B *plus* G59W) further improves tolerance, allowing *Taq* 1C2 to amplify from 15% (EDTA-treated) or 28% (heparin-treated) more blood than *Taq* E507K.

Heparin and hemoglobin/hemin (the most potent inhibitors in blood) are thought to mimic and compete with duplex DNA for binding to the polymerase active site (Byrnes et al., 1975; Akane et al., 1994; Satsangi et al., 1994; Ghadessy et al., 2001). *Taq* mutants with increased tolerance toward heparin or blood have been described previously. After mutagenizing four amino acids implicated in cold sensitivity and overall performance, Kermekchiev et al. identified several substitutions at E708 that dramatically (>30-fold) improve the resistance of *Taq* E626K/I707L and KlenTaq (N-truncated *Taq*) E626K/I707L to blood, hemoglobin/hemin, soil extract, and humic acid (Kermekchiev et al., 2009). Although no kinetic data were provided, results of a competitive filter-binding assay suggested that generic resistance to PCR inhibitors was related to increased affinity for DNA rather than diminished binding to hemin or humic acid (Kermekchiev et al., 2009). With the possible exception of humic acid, the *Taq* E507K mutant exhibits a similarly broad spectrum of resistance relative to wild-type (*Taq* E507K vs. *Taq* E626K/I707L/E708Q: >45- vs. ∼100-fold for EDTA-blood; 0 vs. 2-fold for SYBR Green; 10- vs. 32-fold for SYBR Green in the presence of blood; 2- vs. 8-fold for humic acid), in addition to increased affinity for primer-template. E708 mutants have been commercialized by DNA Polymerase Technology, Inc. under the trade names OmniTaq (*Taq* E626K/I707L/E708Q) and OmniKlenTaq (KlenTaq E626K/I707L/E708K) (Zhang et al., 2010).

As discussed above, E507 resides in the primer-template binding site, where mutations are expected to modulate DNA-binding affinity. It is less clear how substitutions at or near E708 contribute to cold sensitivity or to increased DNA affinity/inhibitor resistance, as this region lies at the hinge point of the fingers, away from the fingertip (which contacts the incoming nucleotide and single-stranded template) and from the portions of the thumb and palm domains that interact with primer-template (Kermekchiev et al., 2009). *Taq* E507K and E708Q mutants are further distinguished by their relative resistance to heparinized blood. *Taq* E507K tolerates up to 45% EDTA-blood compared to 2% heparin-blood, suggesting that the combination of heparin and blood-associated (e.g., hemoglobin/hemin) inhibitors saturates a common (mutually exclusive) DNA/heme/heparin binding site. Alternatively, E507K may confer blood resistance by lowering sensitivity to NaCl, while mutations at other positions confer resistance to DNA mimics by reducing binding affinity for heme/heparin. In contrast to *Taq* E507K, E708 mutants (OmniTaq, OmniKlenTaq) exhibit similar tolerances to blood treated with EDTA, heparin, and citrate, consistent with a distinct mechanism of inhibitor resistance (Zhang et al., 2010).

The heparin-binding site of *Taq* was precisely delineated by Ghadessy et al. (2001), who used CSR selection with heparin to identify a *Taq* mutant (*Taq* H15) with 130-fold higher resistance to heparin and affinity for DNA comparable to that of wild-type *Taq* in BIACore and template-dilution assays. Heparin resistance-conferring mutations cluster in the DNA-binding domain, and four of the six residues mutated in *Taq* H15 make direct contacts with either the primer (K540, N583) or template (D578, M747) strand in open and closed forms of the binary/ternary complex (Ghadessy et al., 2001).

By selecting for increased speed rather than blood/heparin resistance, we and Kermekchiev et al. (2009) identified mutations that confer increased DNA-binding affinity and tolerance to blood; curiously, the E507K mutant retained sensitivity to heparin, whereas E708 mutants appear equally tolerant to EDTA-blood and heparinized blood. In contrast, Ghadessy et al. (2001) identified 6 mutations in H15 that collectively diminish heparin binding without altering *Kd* (DNA), even though heparin is a DNA mimic and the DNA and heparin binding sites are thought to overlap. These findings can be reconciled if one assumes that *Taq* makes additional contacts with primer-template outside the DNA/heparin binding site. In the Ghadessy study, one or more of the H15 mutations (K540, D578, N583, M747) may lower affinity for DNA and heparin through the same mechanism, while the remaining mutations restore *Kd* (to wild-type levels) through additional interactions with primer-template. Residing close to K540, N583, and the 3′ primer terminus, E507 is near the DNA-heparin binding site, and introducing a positively charged side chain (E→K) may increase binding affinity for both DNA and heparin (lowering heparin tolerance). In contrast to E507, E708 lies farther from the primer-template binding site, where mutations that increase DNA-binding affinity will not necessarily enhance heparin binding or sensitivity.

Among the other blood-resistant mutations identified here, F749I is of particular interest due to its location relative to the template strand. Although pointing inward, F749 is flanked by amino acids that interact directly with heparin (M747) or with the template nucleoside opposing the incoming nucleoside triphosphate (R746, M747, N750). F749 is also thought to stack against I707 (Kermekchiev et al., 2003), which may provide a causal link to the cold-sensitive (KlenTaq 706–708 mutants) and inhibitor-resistant (*Taq*/KlenTaq 708 mutants) phenotypes identified by Kermekchiev et al. (2003, 2009).

The contribution of mutations located outside the DNA-binding pocket is more difficult to rationalize. In this study, the individual mutations G59W and V155I confer increased (>15-fold) resistance to EDTA-blood, while G59W improves amplification from heparinized blood (by 3.5-fold) when added to *Taq* V155I/L245M/E507K/F749I. G59 and V155 reside in the 5′ structure-specific nuclease domain responsible for excising Okazaki RNA during lagging-strand synthesis (Li et al., 1998). The 5′ nuclease domain was implicated in an earlier study showing that N-truncated *Taq* (KlenTaq; residues 279–882) can amplify from 10- to 100-fold more blood (up to 5–10%) than full-length *Taq* (Kermekchiev et al., 2009). KlenTaq also exhibits higher thermostability and lower processivity than wild-type *Taq* (Barnes, 1992), suggesting that point mutations in the 5′ nuclease domain can modulate several properties of *Taq* that contribute to PCR performance. Mutations that increase PCR fitness may also alleviate the effects of other blood-associated inhibitors (immunoglobulin G, lactoferrin, proteases) that inhibit PCR by unknown means (Al-Soud and Radstrom, 1998, 2001; Al-Soud et al., 2000).

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 31 May 2014; paper pending published: 02 July 2014; accepted: 18 July 2014; published online: 14 August 2014.*

*Citation: Arezi B, McKinney N, Hansen C, Cayouette M, Fox J, Chen K, Lapira J, Hamilton S and Hogrefe H (2014) Compartmentalized self-replication under fast PCR cycling conditions yields Taq DNA polymerase mutants with increased DNA-binding affinity and blood resistance. Front. Microbiol. 5:408. doi: 10.3389/fmicb.2014.00408*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Arezi, McKinney, Hansen, Cayouette, Fox, Chen, Lapira, Hamilton and Hogrefe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

**REVIEW ARTICLE** published: 16 April 2014 doi: 10.3389/fmicb.2014.00181

# Bacteriophage T7 DNA polymerase – sequenase

# *Bin Zhu\**

Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA, USA

#### *Edited by:*

Andrew F. Gardner, New England Biolabs, USA

#### *Reviewed by:*

Kirk Matthew Schnorr, Novozymes A/S, Denmark

Samir Hamdan, King Abdullah University of Science and Technology, Saudi Arabia

#### *\*Correspondence:*

Bin Zhu, Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, C2-221, 240 Longwood Avenue, Boston, MA 02115, USA e-mail: bin\_zhu@hms.harvard.edu

An ideal DNA polymerase for chain-terminating DNA sequencing should possess the following features: (1) the ability to incorporate dideoxy- and other modified nucleotides with an efficiency similar to that of the cognate deoxynucleotides; (2) high processivity; (3) high fidelity in the absence of proofreading/exonuclease activity; and (4) the production of clear and uniform signals for detection. The DNA polymerase encoded by bacteriophage T7 is naturally endowed with, or can be engineered to have, all of these characteristics. The chemically or genetically modified enzyme (Sequenase) significantly expedited the development of DNA sequencing technology. This article reviews the history of studies on T7 DNA polymerase, with emphasis on the key steps leading to its use in DNA sequencing. Lessons from the study and development of T7 DNA polymerase have informed, and will continue to inform, the characterization of novel DNA polymerases from newly discovered microbes and their modification for use in biotechnology.

**Keywords: bacteriophage T7, DNA polymerase, sequenase, DNA sequencing, marine phages**

#### **INITIAL CHARACTERIZATION**

DNA polymerases catalyze the synthesis of DNA, a pivotal process both in living organisms and in biotechnology (Hamilton et al., 2001; Reha-Krantz, 2008). Family A DNA polymerases, including *Escherichia coli* DNA polymerase I, Taq DNA polymerase, and T7 DNA polymerase, have served as prototypes for biochemical and structural studies on DNA polymerases and have been widely used as molecular reagents (Patel et al., 2001; Loh and Loeb, 2005).

A DNA polymerase activity from bacteriophage T7 was first observed in a T7-infected *E. coli* mutant deficient in DNA polymerase I (Grippo and Richardson, 1971). The initial characterization of T7 DNA polymerase was intriguing. Although the gene responsible for the polymerase activity was mapped to gene 5 (Hinkle and Richardson, 1974; Hori et al., 1979b), gene 5 protein (gp5) by itself appeared to have no DNA polymerase activity, only ssDNA exonuclease activity (Hori et al., 1979a). Apparently, a host component was required to reconstitute the full DNA polymerase (Modrich and Richardson, 1975a). This host factor turned out to be a small redox protein, *E. coli* thioredoxin (Modrich and Richardson, 1975b; Mark and Richardson, 1976). The redox capacity of thioredoxin, however, is not required for stimulation of the DNA polymerase activity (Huber et al., 1986). Instead, thioredoxin plays a structural role in stabilizing the binding of gene 5 protein to a primer-template (Huber et al., 1987) and increases the processivity of the polymerase more than 100-fold (Tabor et al., 1987a), a unique function for this universal protein. Thioredoxin binds to a 71-residue loop of T7 gene 5 protein (Doublié et al., 1998) that is not present in other Pol I-type polymerases, forming a stable 1:1 complex (*K*<sub>D</sub> = 5 nM; Tabor et al., 1987a).

Another intriguing finding during the initial characterization of T7 DNA polymerase concerned its exonuclease activity. T7 DNA polymerase lacks the 5′–3′ exonuclease activity found in *E. coli* DNA polymerase I but possesses a strong 3′–5′ single- and double-stranded DNA exonuclease activity (Hori et al., 1979b). The double-stranded DNA exonuclease activity requires the presence of thioredoxin. Interestingly, different protein purification procedures, depending on the presence or absence of EDTA in the buffer, generated T7 DNA polymerase preparations that differed significantly in exonuclease activity, resulting in two forms of the enzyme (Fischer and Hinkle, 1980; Engler et al., 1983). By comparing the two forms and carefully tracking the purification procedures, it was revealed that the exonuclease activity of T7 DNA polymerase could be specifically inactivated in an oxidation reaction requiring oxygen, a reducing agent, and ferrous ion (Tabor and Richardson, 1987b). The easily modifiable exonuclease and extraordinary processivity of T7 DNA polymerase led to the emergence of a powerful tool in the DNA sequencing era.

#### **SEQUENASE ERA**

Invented by Sanger et al. (1977), the method of chain-terminating sequencing initiated a revolution toward the genome-sequencing era. However, the enzymes initially used for chain-terminating sequencing, the Klenow fragment of *E. coli* DNA polymerase I and avian myeloblastosis virus (AMV) reverse transcriptase, had low processivity: ∼15 nt for the Klenow fragment and 200 nt for AMV reverse transcriptase, which, despite its relatively higher processivity, synthesizes DNA at only several nucleotides per second. Processivity describes the number of nucleotides incorporated continuously by a DNA polymerase on the same primer-template before it dissociates. Thus, if the DNA polymerase used for chain-terminating sequencing is non-processive, artifactual bands arise at the positions where the polymerase dissociated. Frequent dissociation creates a strong background that obscures the true DNA sequence. Although the problem can be partially alleviated by prolonged incubation with high substrate concentrations, which may "chase" the artifactual bands to higher molecular weight, this is not an ideal solution, since reinitiation of primer elongation at dissociation sites (usually regions of compact secondary structure or hairpins) is inefficient and may result in the incorporation of incorrect nucleotides. Although T7 DNA polymerase by itself has a processivity of only a few nucleotides, its association with *E. coli* thioredoxin dramatically increases its processivity. Consequently, with T7 DNA polymerase, termination of a sequencing reaction occurs only at positions where a chain-terminating agent (such as a dideoxynucleotide) is incorporated, yielding long DNA sequence reads (Tabor and Richardson, 1987c).
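Processivity as defined above can be sketched as a geometric process. The model below assumes, as a simplification not taken from the source, that the polymerase dissociates with a fixed probability `p_off` after each incorporation regardless of sequence; real dissociation is sequence-dependent, as the hairpin example shows. The function names and the example probabilities are hypothetical, chosen only to contrast an enzyme that adds a few nucleotides per binding event with one that is ~100-fold more processive.

```python
# Minimal sketch: processivity modeled as a geometric process.
# Assumption (not from the source): after each incorporation the polymerase
# dissociates with a fixed probability p_off, independent of sequence.


def mean_run_length(p_off: float) -> float:
    """Expected nucleotides added per binding event (geometric mean 1/p_off)."""
    if not 0.0 < p_off <= 1.0:
        raise ValueError("p_off must be in (0, 1]")
    return 1.0 / p_off


def fraction_reaching(length: int, p_off: float) -> float:
    """Probability that a single binding event adds at least `length` nt.

    The first nucleotide is always added; each of the remaining
    length - 1 steps must avoid dissociation.
    """
    if length < 1:
        raise ValueError("length must be >= 1")
    return (1.0 - p_off) ** (length - 1)


if __name__ == "__main__":
    # Hypothetical values: ~5 nt per binding event versus a 100-fold
    # more processive enzyme (~500 nt per binding event).
    for label, p_off in [("low processivity", 0.2), ("100-fold higher", 0.002)]:
        print(label, mean_run_length(p_off), round(fraction_reaching(300, p_off), 3))
```

Under these assumptions essentially no low-processivity binding events reach 300 nt, whereas roughly half of the high-processivity events do, which is why a non-processive enzyme must reinitiate many times and leaves stops throughout a sequencing ladder.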

A more severe problem with the DNA polymerases used before T7 DNA polymerase is their discrimination against dideoxynucleotides, the chain-terminating nucleotides used in Sanger sequencing. Most known DNA polymerases strongly discriminate against ddNTPs. For example, T4 DNA polymerase, *E. coli* DNA polymerase I, Taq DNA polymerase, and Vent DNA polymerase incorporate a dideoxynucleoside monophosphate (ddNMP) at least 1000 times more slowly than the corresponding deoxynucleoside monophosphate (dNMP). To use these polymerases in DNA sequencing, a high ratio of ddNTP to dNTP must be used for efficient chain termination. Even though overall ddNMP incorporation can be improved in this uneconomical way, wide variation in the intensity of adjacent fragments still occurs, because the extent of discrimination varies with DNA sequence and structure. T7 DNA polymerase, however, is at the other end of the spectrum, discriminating against ddNTPs only several-fold. Thus, a much lower concentration of ddNTP can be used with T7 DNA polymerase, and the DNA bands on the gel are much more uniform (Tabor and Richardson, 1987c). Discrimination was further lowered by replacing magnesium with manganese in the sequencing reaction (Tabor and Richardson, 1989a). With Mn<sup>2+</sup> in an isocitrate buffer, T7 DNA polymerase incorporates dNMP and ddNMP at the same rate, resulting in uniform terminations in sequencing reactions.
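The effect of discrimination on the ddNTP:dNTP ratio can be sketched with a simple competition model. The sketch assumes, beyond what the text states, that the discrimination factor `D` is the ratio of incorporation efficiencies (dNTP over ddNTP), so that at a site where both compete the chance of inserting the ddNMP is r/(r + D), with r = [ddNTP]/[dNTP]. Solving for r gives the ratio needed for a target per-site termination probability. The value D = 4 below is a hypothetical stand-in for "several-fold"; D = 1 mirrors the Mn<sup>2+</sup> case.

```python
# Minimal sketch of ddNTP/dNTP competition at a single template position.
# Assumption: discrimination D is the ratio of incorporation efficiencies
# (dNTP over ddNTP), so the chance of terminating at a site where both
# compete is r / (r + D), with r = [ddNTP]/[dNTP].


def termination_probability(ratio_dd_to_d: float, discrimination: float) -> float:
    """Probability that the polymerase inserts the ddNMP at one site."""
    return ratio_dd_to_d / (ratio_dd_to_d + discrimination)


def required_ratio(discrimination: float, target_probability: float) -> float:
    """[ddNTP]/[dNTP] needed to reach a given per-site termination probability."""
    if not 0.0 < target_probability < 1.0:
        raise ValueError("target_probability must be in (0, 1)")
    return discrimination * target_probability / (1.0 - target_probability)


if __name__ == "__main__":
    # Hypothetical target: terminate at 1% of eligible sites.
    for label, d in [("1000-fold discrimination", 1000), ("several-fold (D=4)", 4), ("Mn2+ (D=1)", 1)]:
        print(label, round(required_ratio(d, 0.01), 4))
```

Because the required ratio scales linearly with D in this model, an enzyme with 1000-fold discrimination needs 250 times more ddNTP relative to dNTP than one with 4-fold discrimination to terminate at the same frequency.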

With its naturally high processivity and lack of discrimination against ddNTPs, the only obstacle to using T7 DNA polymerase as a DNA sequencing enzyme is its robust 3′–5′ exonuclease activity. Exonuclease activity increases the fidelity of DNA synthesis by excising newly synthesized bases that are incorrectly paired with the template, and for applications like PCR it is often a desired feature. For DNA sequencing, however, such activity is detrimental: when the dNTP concentration falls, the rate of exonucleolytic removal approaches that of polymerization, resulting in no net DNA synthesis or in net degradation of the DNA. The associated exonuclease activity also causes the DNA polymerase to idle at regions of secondary structure in the template, producing variability in signal intensity. The iron-catalyzed oxidation mentioned above can produce a modified T7 DNA polymerase with greatly reduced exonuclease activity, and this chemically modified enzyme was the basis for Sequenase and the first easy-to-use DNA sequencing kits, commercialized by United States Biochemical Co. However, the residual exonuclease activity can still cause some loss of labeled DNA bands upon prolonged incubation (Tabor and Richardson, 1987b). Tabor and Richardson carried out an extensive chemical and mutagenesis screen for selective elimination of the exonuclease activity of T7 DNA polymerase. The rapid screening of a large number of mutants was based on the observation that exonuclease-deficient mutants of T7 DNA polymerase can synthesize through a specific hairpin region in the DNA template (Tabor and Richardson, 1989b). As a result, many exonuclease-deficient mutants were identified; among them, a mutant lacking 28 amino acids in the N-terminal exonuclease domain had no detectable exonuclease activity, while its polymerase activity was significantly higher than that of the wild-type protein. This mutant was the basis of version 2 of Sequenase. Sequenase pioneered the development of thermostable enzymes and facilitated the automation of high-throughput sequencing.

Degradation of a DNA fragment can occur via nucleophilic attack on the 3′-terminal internucleotide linkage by H2O or pyrophosphate (PPi). The 3′–5′ exonuclease catalyzes the former reaction, generating dNMP or ddNMP. The latter reaction is called pyrophosphorolysis. As the reversal of polymerization, pyrophosphorolysis generates dNTP or ddNTP, sometimes resulting in "holes": the disappearance of ddNMP-terminated labeled DNA fragments from the gel. Pyrophosphorolysis can be eliminated by adding pyrophosphatase to the reaction to cleave PPi (Tabor and Richardson, 1990). The combination of modified T7 DNA polymerase, manganese ion, and pyrophosphatase generates bands on a DNA sequencing gel that are so accurate and uniform that the DNA sequence can be determined directly from the relative intensity of each band if the four ddNTPs are added in different amounts at defined ratios (Tabor and Richardson, 1990).

Thermostability is a highly desirable feature for a DNA polymerase. A thermostable enzyme like Taq DNA polymerase is superior for cycle sequencing, in which multiple rounds of DNA synthesis are carried out from the same template, with the newly synthesized DNA strand released after each cycle by heat denaturation. The heat-stable DNA polymerase survives the denaturation step and is available for the next cycle of polymerization. Cycle sequencing allows much less DNA template and polymerase to be used in a sequencing reaction. In cycle sequencing, low processivity is an advantage because a polymerase with low processivity cycles rapidly, decreasing the chance of strong specific stops. However, the strong discrimination against ddNTPs (at least 100-fold, often 10,000-fold) by most thermostable DNA polymerases was a significant obstacle to their use in cycle sequencing. Although manganese ion can decrease the discrimination (Tabor and Richardson, 1989a), manganese has several disadvantages compared with magnesium, such as a narrow working concentration range, precipitation, and lower polymerase activity than that supported by magnesium ion.

Studies on T7 DNA polymerase led to one of the most elegant demonstrations of enzyme engineering, which turned Taq DNA polymerase into "Thermo Sequenase." To uncover the molecular mechanism underlying the differences in ddNTP discrimination among family A DNA polymerases, Tabor and Richardson, guided by the 3D structure of *E. coli* DNA polymerase I, swapped the five most conserved regions in the crevice responsible for binding DNA and NTPs between T7 DNA polymerase and *E. coli* DNA polymerase I (Tabor and Richardson, 1995). Using an SDS-DNA activity assay, they observed that "Helix O" from *E. coli* DNA polymerase I conferred strong discrimination against ddNTPs on T7 DNA polymerase. Further mutagenesis in this region revealed that tyrosine-526 in T7 DNA polymerase, or the homologous phenylalanine-762 in *E. coli* DNA polymerase I, was the single determinant of discrimination against ddNTPs. When the corresponding residue in Taq DNA polymerase, F667, was replaced with tyrosine, the modified enzyme (Taq F667Y) actually preferred ddNTP 2-fold over dNTP, compared with the 6000-fold discrimination against ddNTP by the wild-type enzyme (Tabor and Richardson, 1995). Taq DNA polymerase F667Y, combining natural thermostability with engineered elimination of ddNTP discrimination, was the basis for "Thermo Sequenase," an enzyme that greatly expedited the Human Genome Project. The structure of T7 DNA polymerase in complex with a primer-template and a nucleoside triphosphate, solved later (Doublié et al., 1998), revealed that the 3′-hydroxyl of the incoming nucleotide and the hydroxyl of Tyr526 are both within hydrogen-bonding distance of the pro-S<sub>p</sub> oxygen of the β-phosphate, and suggested that one or both of these interactions may be required for nucleotide incorporation. However, even with the structure in hand, one could not have predicted the dramatic effect of tyrosine-526 on nucleotide analog discrimination.

#### **AN IDEAL MODEL TO STUDY INTERACTIONS WITHIN A REPLISOME**

T7 DNA polymerase (T7 gene 5 protein in complex with *E. coli* thioredoxin), together with the T7 gp4 bifunctional primase-helicase and the gene 2.5 ssDNA-binding protein, constitutes the simplest known replisome that mediates coordinated leading- and lagging-strand DNA synthesis (Richardson, 1983; Debyser et al., 1994; Lee et al., 1998; Hamdan and Richardson, 2009). The concise organization of the T7 replisome makes it ideal for studying the multiple interactions of DNA polymerase during the movement of the replisome, such as loading of the polymerase (Zhang et al., 2011), polymerase exchange (Johnson et al., 2007), processive synthesis (Hamdan et al., 2007), and translesion synthesis (Zhu et al., 2011). Interactions critical for coordinated DNA synthesis, including polymerase-thioredoxin (Johnson and Richardson, 2003; Ghosh et al., 2008; Akabayov et al., 2010; Tran et al., 2012), polymerase-helicase (Zhang et al., 2011; Kulczyk et al., 2012), polymerase-primase (Chowdhury et al., 2000; Zhu et al., 2010), and polymerase-gene 2.5 single-stranded DNA-binding protein (He et al., 2003; Hamdan et al., 2005; Ghosh et al., 2009, 2010) interactions, have been extensively studied. The solid biochemical background of T7 DNA polymerase has also attracted investigations using single-molecule methods (Lee et al., 2006; Hamdan et al., 2009; Pandey et al., 2009; Etson et al., 2010; Loparo et al., 2011; Geertsema et al., 2014).

#### **NOVEL T7-LIKE DNA POLYMERASES**

DNA polymerases from microbes advanced DNA sequencing technology, which in turn unveiled a much larger, more diverse, and largely unexplored microbial world. Metagenomic data indicate that marine phages are the most abundant and diverse organisms on Earth (Suttle, 2005), and 60–80% of their potential gene products do not match any entry in the databases. A large portion of these gene products must be involved in nucleic acid metabolism, so one can expect numerous novel nucleic acid enzymes that could enrich the present toolbox of enzymes derived from a small group of characterized microbes. Indeed, our own initial effort to characterize marine phage polymerases revealed unique features of a single-subunit RNA polymerase from the marine cyanophage Syn5 that can complement the predominantly used T7 RNA polymerase for *in vitro* RNA synthesis (Zhu et al., 2013a,b). Characterization of marine phage DNA polymerases appears even more promising, since one can readily target numerous interesting DNA polymerases from the reported marine phage genomes, even when restricting the search to T7-like or family A DNA polymerases such as those from cyanophages Syn5 (Pope et al., 2007) and P-SSP7 (Sullivan et al., 2005), phages infecting a SAR116-clade bacterium (Kang et al., 2013), and marine ssDNA phages (Schmidt et al., 2014). Considering the high probability that the 60–80% of unmatched genes harbor novel polymerase genes, marine phages are a virtually unlimited source of new polymerase tools that can fill niches in the biotech industry. The characterization and engineering of T7 DNA polymerase have shown the value of identifying novel properties of nucleic acid enzymes.

#### **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 10 March 2014; paper pending published: 31 March 2014; accepted: 01 April 2014; published online: 16 April 2014.*

*Citation: Zhu B (2014) Bacteriophage T7 DNA polymerase – sequenase. Front. Microbiol. 5:181. doi: 10.3389/fmicb.2014.00181*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Zhu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Engineering processive DNA polymerases with maximum benefit at minimum cost

# *Linda J. Reha-Krantz<sup>1</sup>\*, Sandra Woodgate<sup>2</sup> and Myron F. Goodman<sup>3</sup>*

<sup>1</sup> Department of Biological Sciences, University of Alberta, Edmonton, AB, Canada

<sup>2</sup> Trevigen, Inc., Gaithersburg, MD, USA

<sup>3</sup> University of Southern California, Los Angeles, CA, USA

#### *Edited by:*

Andrew F. Gardner, New England Biolabs, USA

#### *Reviewed by:*

Sylvie Doublie, University of Vermont, USA

Michael Trakselis, Baylor University, USA

#### *\*Correspondence:*

Linda J. Reha-Krantz, Department of Biological Sciences, University of Alberta, CW405 Biosciences Building, Edmonton, AB T6G 2E9, Canada e-mail: linda.reha-krantz@ualberta.ca

DNA polymerases need to be engineered to achieve optimal performance in biotechnological applications, which often require high-fidelity replication when using modified nucleotides and when replicating difficult DNA sequences. These tasks are achieved for the bacteriophage T4 DNA polymerase by replacing leucine with methionine in the highly conserved Motif A sequence (L412M). The costs are minimal. Although base substitution errors increase moderately, accuracy is maintained for templates with mono- and dinucleotide repeats, while replication efficiency is enhanced. The L412M substitution increases intrinsic processivity, and addition of the phage T4 clamp and single-stranded DNA-binding proteins further enhances the ability of the phage T4 L412M-DNA polymerase to replicate all types of difficult DNA sequences. Increased pyrophosphorolysis is a drawback of increased processivity, but pyrophosphorolysis is curbed by adding an inorganic pyrophosphatase or the divalent metal cations Mn<sup>2+</sup> or Ca<sup>2+</sup>. In the absence of pyrophosphorolysis inhibitors, the T4 L412M-DNA polymerase catalyzed sequence-dependent pyrophosphorolysis under DNA sequencing conditions. The sequence specificity of the pyrophosphorolysis reaction provides insights into how the T4 DNA polymerase switches between nucleotide incorporation, pyrophosphorolysis, and proofreading pathways. The L-to-M substitution was also tested in the yeast DNA polymerases delta and alpha. Because the mutant DNA polymerases displayed similar characteristics, we propose that amino acid substitutions in Motif A have the potential to increase processivity and to enhance performance in biotechnological applications. An underlying theme in this chapter is the use of genetic methods to identify mutant DNA polymerases with potential for use in current and future biotechnological applications.

**Keywords: A+T- and G+C-rich DNA templates, DNA polymerase-DNA dynamics, DNA polymerase processivity, DNA polymerase translocation, DNA sequencing, motif A, pyrophosphorolysis, replication fidelity**

# **INTRODUCTION**

The remarkable advances made in molecular biology in the last 40 years rely to a large extent on DNA polymerase-based methods. Researchers who pioneered the use of DNA polymerases for DNA sequencing, PCR, and site-directed mutagenesis were awarded Nobel prizes for their groundbreaking achievements, but DNA polymerases play key roles in numerous additional indispensable methods, including DNA labeling, cloning, whole genome amplification, and diagnostic techniques. Researchers, however, recognized from the earliest days that some DNA polymerase activities are counter-productive for *in vitro* applications. Nuclease activities, for example, can degrade primers and the newly synthesized DNA, yet some exonucleolytic proofreading activity is needed for high-fidelity DNA replication. Other DNA polymerase activities need to be modified, for example, the ability to utilize non-standard nucleotides and to replicate "difficult" DNA templates with simple repeats or sequences that are excessively rich in A+T or G+C base pairs. DNA polymerases from several organisms have been subjected to extensive engineering in order to remove or curb unwanted activities and to modify or enhance others that are required for optimal performance *in vitro* (see several chapters in this volume and reviews by Hamilton et al., 2001; Reha-Krantz, 2008).

One of the most challenging tasks has been to engineer DNA polymerases to replicate difficult DNA sequences. DNA polymerases stall and often dissociate at difficult template sequences, and these sequences are frequently sites of replication errors. For example, Streisinger et al. (1966) and Streisinger and Owen (1985) observed that simple repeat sequences in phage T4 genes are hotspots *in vivo* for frameshift mutations, which are typically insertions or deletions within mono- and dinucleotide repeat sequences. The length of the repeat affects the likelihood of frameshift mutation: hotspot sites have longer repeat tracts than colder, less mutable sites. Streisinger et al. (1966) and Streisinger and Owen (1985) proposed that frameshift mutations are created by transient separation of the primer and template strands followed by strand misalignment during reannealing, creating an intermediate with an unpaired repeat in the template or primer strand but with a correctly paired primer-template terminus that can be extended by a DNA polymerase. If the misaligned DNA strands are not corrected before the next round of chromosome replication, the repeat sequence will be expanded (if the unpaired repeat is in the primer strand) or contracted (if the unpaired repeat is in the template strand). As expected, these "slippery" DNA templates are also difficult to replicate *in vitro*, as they slow or even prevent further replication and are sites for insertions and deletions (Rao, 1994; Fidalgo da Silva and Reha-Krantz, 2000; Clarke et al., 2001; Fazekas et al., 2010). Kunkel et al. (1994) demonstrated that +1 insertions in homopolymeric runs increase dramatically in reactions with the phage T7 DNA polymerase in the absence of thioredoxin, the T7 DNA polymerase processivity factor. Thus, decreased DNA polymerase processivity correlates with increased strand-misalignment mutagenesis, presumably because more frequent polymerase dissociation gives the free primer end an opportunity to spontaneously denature and then re-anneal in a misaligned configuration. This proposal is supported by other studies, for example, Fazekas et al. (2010).

We observed that the leucine-to-methionine (L412M) substitution in the conserved Motif A (**Table 1**) of the bacteriophage T4 DNA polymerase produced a mutant DNA polymerase with improved ability to replicate difficult DNA sequences under typical DNA sequencing conditions. Furthermore, the L412M-DNA polymerase has increased ability to incorporate and extend modified nucleotides. Both activities are facilitated by increased intrinsic processivity (Reha-Krantz and Nonay, 1994), as explained below.

The L412M substitution in the phage T4 DNA polymerase was identified by a genetic selection strategy as a second-site amino acid substitution that suppressed the excessive proofreading observed for several mutant DNA polymerases (Stocki et al., 1995; Li et al., 2010). As expected, the L412M substitution reduces proofreading activity, but base substitution errors increase only moderately (10- to 40-fold) and frameshift errors at most threefold (**Table 2**). Thus, the high fidelity of the wild-type T4 DNA polymerase, about one error per 10<sup>7</sup>–10<sup>8</sup> nucleotides incorporated (Kunkel et al., 1984), is essentially retained; hence, the T4 L412M-DNA polymerase is an excellent candidate for single-molecule sequencing and other applications where error-free replication is required. Note that amino acid substitutions in Motif A of DNA polymerases from several organisms have been observed to cause minor to major changes in replication fidelity, both increasing and decreasing accuracy (**Table 2**; Reha-Krantz and Nonay, 1994; Patel and Loeb, 2000; Patel et al., 2001; Fidalgo da Silva et al., 2002; Niimi et al., 2004; Li et al., 2005; Venkatesan et al., 2006, 2007; Pursell et al., 2007; Sakamoto et al., 2007; Nick McElhinny et al., 2008; Zhong et al., 2008).
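The fidelity figures above can be turned into a rough per-copy estimate. This worked example makes two assumptions that are not stated in the source: the 10- to 40-fold increase in base substitutions is applied to the entire wild-type error rate (treating errors as mostly base substitutions), and errors are Poisson-distributed along the template. The 10-kb copy length is an arbitrary illustration.

```python
import math

# Worked arithmetic for the fidelity figures quoted above.
# Assumptions: the 10- to 40-fold increase in base substitutions is applied to
# the whole wild-type error rate, and errors are Poisson-distributed.

WT_ERROR_RATE = (1e-8, 1e-7)   # errors per nucleotide (1 in 10^8 to 1 in 10^7)
FOLD_INCREASE = (10, 40)       # reported range for base substitutions


def mutant_error_range(wt=WT_ERROR_RATE, fold=FOLD_INCREASE):
    """Lowest and highest plausible error rates for the mutant polymerase."""
    return wt[0] * fold[0], wt[1] * fold[1]


def error_free_fraction(error_rate: float, length_nt: int) -> float:
    """Poisson probability that a copy of length_nt contains no errors."""
    return math.exp(-error_rate * length_nt)


low, high = mutant_error_range()
for rate in (low, high):
    print(f"rate {rate:.0e}: {error_free_fraction(rate, 10_000):.3f} of 10-kb copies error-free")
```

Even at the upper bound (about 4 errors per 10<sup>6</sup> nucleotides) roughly 96% of 10-kb copies would be error-free under these assumptions, which is consistent with the statement that high fidelity is essentially retained.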

We review the development of the phage T4 L412M-DNA polymerase as an important DNA sequencing tool and its use in other applications. There are useful lessons to be learned because replication of difficult DNA templates remains salient with current DNA sequencing and amplification technologies.

Because amino acid substitutions in Motif A of the bacteriophage T4 DNA polymerase have profound effects on intrinsic processivity (**Figure 1**), we propose that amino acid substitutions in Motif A of other DNA polymerases, especially the L-to-M substitution in family B DNA polymerases, have the potential to increase processivity and enhance DNA polymerase performance in biotechnological applications. A strategy to "evolve" mutant DNA polymerases with increased processivity and other desirable activities for *in vitro* applications (Bourn et al., 2011) is compared to our genetic evolution strategies. We also present new data about pyrophosphorolysis activity, a byproduct of increased intrinsic processivity. While the scope of this chapter is limited to methods used to enhance the intrinsic processivity of the phage T4 DNA polymerase, information about the naturally highly processive φ29 DNA polymerase is presented by Blanco et al. (1983) and Eid et al. (2009). The association of thioredoxin with the T7 DNA polymerase and the development of Sequenase™ are presented by Zhu (2014). Increased processivity produced by fusing DNA-binding domains to DNA polymerases is discussed by Arezi et al. (2014) in this volume.

### **MATERIALS AND METHODS**

#### **DNA POLYMERASES AND DNA POLYMERASE ACCESSORY PROTEINS**

Expression, purification, and characterization of wild-type and mutant T4 DNA polymerases were described previously (Lin et al., 1987; Spicer et al., 1988; Reha-Krantz et al., 1993). The T4 gp45 clamp and clamp-loading proteins (gp44/gp62) and the T4 single-stranded DNA (ssDNA)-binding protein (gp32) were expressed and purified as described (Shamoo et al., 1986; Rush et al., 1989), but with modifications for large-scale production.

#### **DNA POLYMERASE INTRINSIC PROCESSIVITY**

The intrinsic processivity of wild-type and mutant phage T4 DNA polymerases was determined in the presence of a heparin trap as described by Reha-Krantz and Nonay (1994). Briefly, 20 μl reactions contained 25 mM HEPES, pH 7.5, 60 mM NaOAc, 1 mM dithiothreitol, 0.5 mM EDTA, 80 μM dNTPs, 0.2 mg/ml bovine serum albumin, 7.5 nM primed single-stranded circular M13 DNA (expressed as 3′-primer ends), and 150 nM DNA polymerase. Reactions were pre-incubated for 5 min at 30 °C and started by adding a solution of Mg<sup>2+</sup> [final concentration, 6 mM Mg(OAc)2] and heparin (final concentration, 0.1 mg/ml). Reactions were incubated for 15 s at 30 °C and stopped by adding 2 μl of 0.2 M EDTA. Reaction products were separated on DNA sequencing gels (7% acrylamide, 8 M urea), and the <sup>32</sup>P-labeled products were visualized by exposure to Kodak X-Omat AR film.
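Because the heparin trap limits each polymerase to a single binding event, the band pattern on such a gel can be reduced to per-position termination probabilities. The sketch below shows one common way to do this; it is not necessarily the analysis used by Reha-Krantz and Nonay (1994). It assumes that band intensities are proportional to the molar amount of each product (for example, with a 5′-labeled primer; internally labeled products would first need correcting for length), and the band values are hypothetical.

```python
# Minimal sketch for reducing a single-cycle (heparin-trap) gel to
# per-position termination probabilities. Assumption: band intensities are
# proportional to the molar amount of each product (e.g. a 5'-labeled primer);
# internally labeled products would first need correcting for length.


def termination_probabilities(intensities):
    """intensities[i] is the signal of products extended by i + 1 nt.

    Probability of terminating at position n = I_n / sum of I_m for m >= n.
    """
    probabilities = []
    remaining = float(sum(intensities))
    for band in intensities:
        probabilities.append(band / remaining if remaining > 0 else 0.0)
        remaining -= band
    return probabilities


def mean_extension(intensities):
    """Average number of nucleotides added per extended primer."""
    total = sum(intensities)
    return sum((n + 1) * band for n, band in enumerate(intensities)) / total


bands = [50, 25, 12.5, 12.5]  # hypothetical intensities for +1 to +4 nt
print(termination_probabilities(bands))  # [0.5, 0.5, 0.5, 1.0]
print(mean_extension(bands))             # 1.875
```

Positions with unusually high termination probability flag sites where the polymerase tends to dissociate, which is where a more processive mutant such as L412M would be expected to show the clearest difference.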

#### **PYROPHOSPHOROLYSIS ASSAY CONDITIONS AND INHIBITION BY Mn<sup>2+</sup> AND Ca<sup>2+</sup> IONS**

Pyrophosphorolysis activity was measured for the exonuclease-deficient D112A/E114A/L412M-DNA polymerase in 12 μl reactions containing partially digested duplex DNA that was 3′-labeled with [<sup>32</sup>P]dCMP (50 pmol of labeled 3′-ends/reaction), 10 nM DNA polymerase, 67 mM Tris-HCl (pH 7.5), 16.7 mM (NH4)2SO4, 0.5 mM DTT, 167 μg BSA/ml, 6.7 mM MgCl2, and 1 mM PPi. Reactions with Mn<sup>2+</sup> or Ca<sup>2+</sup> ions also contained 15 mM Na citrate. The PPi concentration was varied from 0.5 to 6 mM, and Mn<sup>2+</sup> or Ca<sup>2+</sup> ion concentrations were varied from 0.25 to 10 mM. Reactions were stopped by adding 2 μl of 0.2 M EDTA. The product of pyrophosphorolysis, [α-<sup>32</sup>P]dCTP, was separated by thin-layer chromatography on polyethyleneimine (PEI)-impregnated cellulose plates. Samples (2 μl) of the reactions were applied 1 cm from the bottom of the plate. The plates were developed in 50% ethanol up to 5 cm, dried, and then developed in 0.15 M KH2PO4–15% ethanol until the solvent reached the top of the plate. Plates were dried, and either the UV-absorbing spot was cut out and its radioactivity determined in a scintillation counter, or the dried TLC plate was exposed to a PhosphorImager screen (Molecular Dynamics). Pyrophosphorolysis and its inhibition were also examined under DNA sequencing conditions (see below).

<sup>a</sup>Data from *Table 2* and Reha-Krantz and Nonay (1994).

<sup>b</sup>Data from Li et al. (2005); mutator phenotype observed in the absence of mismatch repair.

<sup>c</sup>Data from Niimi et al. (2004).

<sup>d</sup>Unpublished observations, LR-K.

<sup>e</sup>V758M is near Motif C.

<sup>f</sup>Li (2004).
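The TLC readout can be converted into an amount of product using the 50 pmol of labeled 3′-ends per reaction given above. This sketch assumes that labeled DNA remains at the origin, that dCTP is the only mobile labeled species (the polymerase is exonuclease-deficient, so no dCMP is expected), that both spots are counted with equal efficiency, and that a per-spot background is subtracted. The function name and the counts are hypothetical.

```python
# Minimal sketch: converting TLC counts into pmol of [alpha-32P]dCTP formed.
# Assumptions: labeled DNA stays at the origin, dCTP is the only mobile
# labeled species (the polymerase is exonuclease-deficient), both spots are
# counted with equal efficiency, and background is subtracted per spot.

LABELED_ENDS_PMOL = 50.0  # labeled 3'-ends per reaction (from the methods)


def dctp_formed_pmol(cpm_dctp, cpm_origin, background=0.0,
                     labeled_ends_pmol=LABELED_ENDS_PMOL):
    """pmol of 3'-terminal dCMP converted to dCTP in the whole reaction."""
    product = max(cpm_dctp - background, 0.0)
    substrate = max(cpm_origin - background, 0.0)
    total = product + substrate
    if total == 0:
        raise ValueError("no counts above background")
    return labeled_ends_pmol * product / total


# Hypothetical counts from one 2-ul spot of a 12-ul reaction; the ratio is
# independent of how much of the reaction was spotted.
print(dctp_formed_pmol(cpm_dctp=1_050, cpm_origin=9_050, background=50))  # 5.0
```

Working from the product fraction rather than absolute counts means the 2 μl spotting volume cancels out, so the result refers to the whole 12 μl reaction.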

Pyrophosphorolysis was also measured in assays that use the fluorescence of the base analog 2-aminopurine (2AP). Reha-Krantz (2009) gives detailed instructions for using 2AP to study DNA polymerase function. Briefly, 2AP fluorescence in DNA is quenched by base-stacking interactions, but the free 2AP nucleotide is highly fluorescent. Thus, for primer-templates labeled with 2AP at the 3′-end of the primer, pyrophosphorolysis is detected as an increase in fluorescence intensity. The increase is due to production of free 2AP deoxynucleoside triphosphate (d2APTP). The 3′-exonuclease activity also releases the terminal 2AP nucleotide, but as the deoxynucleoside monophosphate (d2APMP). For this reason, the DNA polymerases were engineered to be exonuclease-deficient with the D112A/E114A amino acid substitutions, which prevent Mg<sup>2+</sup> binding in the exonuclease active site (Reha-Krantz and Nonay, 1993; Elisseeva et al., 1999).

Stopped-flow experiments were performed with an Applied Photophysics SX.18 MV spectrofluorometer. Samples were excited at 310 nm, and a 335 nm cut-off filter was used. Temperature in the sample-handling unit was maintained at 20.0 ± 0.5 °C. Reactions were initiated in the stopped-flow instrument by mixing equal volumes of two solutions. The first contained 1400 nM exonuclease-deficient T4 DNA polymerase, 400 nM 2AP-DNA substrate, 5 mM PPi, 25 mM HEPES (pH 7.5), 2 mM DTT, 50 mM NaCl, and 1 mM EDTA. The second contained 16 mM MgCl2, 25 mM HEPES (pH 7.5), and 50 mM NaCl. After mixing, the final concentrations of reaction components were 700 nM exonuclease-deficient T4 DNA polymerase, 200 nM 2AP-DNA, 2.5 mM PPi, 8 mM MgCl2, 1 mM DTT, 25 mM HEPES (pH 7.5), 50 mM NaCl, and 0.5 mM EDTA. Five to six determinations were performed for each reaction, and mean values were calculated. The experimental traces were fit to either single or double exponential equations. The quality of each fit was judged by the randomness of the distribution of residuals for the curves generated by the single or double exponential equations.

Two DNA substrates were used. They differed in whether the primer-terminal region was A+T-rich or G+C-rich:

- A+T DNA: primer, 5′-GCACGTCATCGG**TAATP**; template, 3′-CGTGCAGTAGCC**ATTAT**GGATCGATGGTTT.
- G+C DNA: primer, 5′-GCACGTCATTAA**CGG**TP; template, 3′-CGTGCAGTAATT**GCC**ATGGATCGATGGTTT.

The A+T and G+C primer-terminal DNA sequences are indicated in bold. P indicates 2AP.
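The stopped-flow instrument mixes equal volumes of the two syringe solutions. Each final concentration listed above is therefore the mean of the two syringe concentrations. A component that is absent from one syringe counts as zero in that syringe. The minimal Python sketch below does this bookkeeping with the concentrations given in the text. The dictionary keys are illustrative labels only, not names from any instrument software.

```python
# Equal-volume mixing in a stopped-flow experiment: each final concentration
# is the mean of the two syringe concentrations (an absent component counts as 0).

SYRINGE_A = {  # enzyme/DNA/PPi solution; nM for protein and DNA, mM otherwise
    "exo- T4 DNA pol (nM)": 1400,
    "2AP-DNA (nM)": 400,
    "PPi (mM)": 5,
    "HEPES pH 7.5 (mM)": 25,
    "DTT (mM)": 2,
    "NaCl (mM)": 50,
    "EDTA (mM)": 1,
}
SYRINGE_B = {  # Mg2+ solution
    "MgCl2 (mM)": 16,
    "HEPES pH 7.5 (mM)": 25,
    "NaCl (mM)": 50,
}


def mix_equal_volumes(a: dict, b: dict) -> dict:
    """Return the concentrations obtained by mixing equal volumes of a and b."""
    return {k: (a.get(k, 0) + b.get(k, 0)) / 2 for k in sorted(set(a) | set(b))}


final = mix_equal_volumes(SYRINGE_A, SYRINGE_B)
for name, conc in final.items():
    print(f"{name}: {conc:g}")
```

Running the sketch reproduces the post-mixing values stated in the text. HEPES and NaCl are present in both syringes at the same concentration, so they are not diluted.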



<sup>a</sup>Mutations that confer acriflavin resistance in the ac gene (Wang and Ripley, 1998).

<sup>b</sup>Reversion of rII199oc to rII; mostly AT→GC mutations (Reha-Krantz, 1995).

<sup>c</sup>Reversion of rII131 by +1 fs.

<sup>d</sup>Data from several students, especially Alia Daoud and Fatima Kamal.

<sup>e</sup>Total "other" are mostly deletions between direct repeats.

<sup>f</sup>nd; not determined.

<sup>g</sup>Data from Reha-Krantz and Nonay (1994).

#### **DNA SEQUENCING WITH THE T4 EXONUCLEASE-DEFICIENT L412M-DNA POLYMERASE HOLOENZYME COMPLEX: FIDELITY™**

FIDELITY™ (ONCOR, Gaithersburg, MD, USA) was a manual DNA sequencing kit. It was marketed for general sequencing and also for sequencing difficult DNA templates. The procedure has four steps.

In the first step, primer (1 pmol) was labeled at the 5′-end using T4 polynucleotide kinase and [γ-<sup>32</sup>P]ATP. The labeled primer was annealed to 1 μg of single-stranded M13 DNA in a 10 μl reaction in annealing buffer [25 mM Tris-HCl (pH 8.5), 20 mM MgCl2, and 50 mM NaCl]. Reactions were held at 65 °C for 2 min and then cooled slowly over 30 min to 35 °C. The reactions were then pulse-centrifuged and chilled briefly on ice.

If the primer was not labeled, a second step was performed to label the primer DNA internally. This step used [α-<sup>35</sup>S]dATP or [α-<sup>33</sup>P]dATP at 1500 Ci/mmol, 10 mCi/ml. Internal labeling reactions contained:

- the 10 μl of annealed DNA produced in the first step;
- 3 μl of T4 reaction buffer [0.4 M Tris-HCl (pH 8.5), 40 mM MgCl2, 40 mM DTT, 0.4 mg/ml acetylated BSA; 4 mM ATP; 1.4 μM each of dTTP, dGTP, and dCTP];
- exonuclease-deficient L412M-DNA polymerase (10 nM);
- water to a final volume of 18 μl.

The reaction was incubated at 40 to 42 °C for 15 min and then placed on ice.

Primer elongation with chain terminators was carried out in the third step. A 6 μl mixture of T4 DNA polymerase accessory proteins was added to the 18 μl of labeled DNA and mixed. The mixture contained gp32 (2.7 mg/ml), gp44/62 (0.2 mg/ml), and gp45 (0.7 mg/ml) in 20 mM Tris-HCl (pH 8.0), 100 mM NaCl, 1 mM EDTA, 1 mM DTT, and 62% glycerol. Then 5.5 μl of this elongation mix was added to each of four separate tubes, each containing 2 μl of one termination mix. All termination mixes contained 300 μM each of dATP, dTTP, dGTP, and dCTP. Each mix was supplemented with one chain terminator: 40 μM 3′-NH2-ddATP (A mix), 80 μM 3′-NH2-ddTTP (T mix), 40 μM 3′-NH2-ddGTP (G mix), or 80 μM 3′-NH2-ddCTP (C mix). The reactions were incubated at 40 to 42 °C for 5 min.

In the fourth and last step, reactions were stopped and proteins were degraded by adding 5 μl of a stop mixture. The mixture contained 3 parts Stop Solution (95% formamide, 20 mM EDTA, 0.05% bromophenol blue, 0.05% xylene cyanol blue) and 1 part Proteinase K Solution (100 μg/ml proteinase K in 50 mM Tris-HCl, pH 8.0). The reactions were mixed and incubated at 40 °C for 15 min. Samples were heated at 80 °C immediately before loading for gel electrophoresis.
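The short Python sketch below cross-checks the volumes in the four steps. It assumes, as the protocol text implies, that each of the four tubes holds 2 μl of one termination mix. It also assumes that the 5 μl stop addition is the 3:1 mixture of Stop Solution and Proteinase K Solution. The per-tube volumes and the proteinase K concentration during digestion are arithmetic consequences of the stated protocol, not values reported for the kit.

```python
# Per-tube volume bookkeeping for the four-step manual sequencing protocol.
# All volumes are in microlitres; the constants are taken from the protocol text.

LABELING_VOLUME = 18.0      # step 2 final volume
ACCESSORY_MIX = 6.0         # gp32, gp44/62, gp45 mixture added in step 3
ELONGATION_PER_TUBE = 5.5   # elongation mix added to each termination tube
TERMINATION_MIX = 2.0       # A, T, G or C termination mix already in each tube
N_TUBES = 4
STOP_ADDITION = 5.0         # 3 parts Stop Solution : 1 part Proteinase K Solution
PROTEINASE_K_STOCK = 100.0  # ug/ml in the Proteinase K Solution

elongation_total = LABELING_VOLUME + ACCESSORY_MIX        # elongation mix prepared
elongation_used = N_TUBES * ELONGATION_PER_TUBE           # mix dispensed to tubes
spare = elongation_total - elongation_used                # left over after dispensing
reaction_volume = ELONGATION_PER_TUBE + TERMINATION_MIX   # volume during step 3
final_volume = reaction_volume + STOP_ADDITION            # volume after step 4

# Proteinase K Solution is 1/4 of the stop addition, diluted into the final volume.
proteinase_k_volume = STOP_ADDITION / 4
proteinase_k_final = PROTEINASE_K_STOCK * proteinase_k_volume / final_volume

print(f"elongation mix: {elongation_total:g} ul prepared, "
      f"{elongation_used:g} ul used, {spare:g} ul spare")
print(f"termination reaction: {reaction_volume:g} ul; after stop: {final_volume:g} ul")
print(f"proteinase K during digestion: ~{proteinase_k_final:.1f} ug/ml")
```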

panel A. The figure is modified from Reha-Krantz and Nonay (1994) and is shown with permission from the ASBMB.

Note: for reactions with inorganic pyrophosphatase (iPPase), the T4 DNA polymerase solution also contained 8 U/ml yeast iPPase. Alternatively, pyrophosphorolysis was inhibited in reactions with 16.7 mM Mg<sup>2+</sup>, 15 mM Na citrate, and either 2.5 mM Mn<sup>2+</sup> or 10 mM Ca<sup>2+</sup>.

### **RESULTS**

#### **BIOCHEMICAL CHARACTERIZATION OF THE BACTERIOPHAGE T4 L412M-DNA POLYMERASE**

#### *Increased intrinsic processivity*

The intrinsic processivity of a DNA polymerase is the number of nucleotides it incorporates during one cycle of enzyme-DNA association. *In vivo*, intrinsic processivity is partially masked by the association of processivity factors. For many DNA polymerases that function at replication forks, a clamp protein tethers the DNA polymerase to the DNA template. The clamp is gp45 in phage T4, PCNA in eukaryotes, and the beta sliding clamp in *Escherichia coli*. The gp45 and PCNA clamps are composed of three identical subunits that form a doughnut-shaped ring around duplex DNA. The clamp is proposed to act as a tool belt that tethers one or more DNA polymerases or other proteins found at replication forks (Pagès and Fuchs, 2002; Indiani et al., 2005; Moldovan et al., 2007). Tethering of the T4 DNA polymerase in holoenzyme complexes has long been known to increase both polymerase and exonuclease activities. Yang et al. (2004) further observed that the primer-template appears to transfer between two tethered polymerases during proofreading. Thus, apparently continuous replication by the tethered T4 DNA polymerase may involve coordinated replication by two DNA polymerases tethered to the same clamp (Joyce, 2004). Tethering, however, does not fully compensate for differences in intrinsic processivity, as explained below.

Without tethering, the wild type T4 DNA polymerase has little ability to initiate or extend a primer (**Figure 1**; Reha-Krantz and Nonay, 1993, 1994; Spacciapoli and Nossal, 1994). These reactions used a <sup>32</sup>P-labeled primer, single-stranded circular DNA, and heparin to trap dissociated DNA polymerase. Under these conditions, the wild type T4 DNA polymerase does not extend the vast majority of primers. The few extensions detected are limited to incorporation of one or two nucleotides (**Figure 1B**, lane 1). Longer extension products become visible only with prolonged film exposure (**Figure 1A**, lane 1). This pattern means three things:

- most non-tethered T4 DNA polymerase molecules cannot extend the primer;
- the few complexes that can extend the primer incorporate just one or two nucleotides before dissociating;
- only a few of these complexes escape the initiation phase and begin primer elongation.

Even the primer extension complexes that escape early dissociation show preferential termination sites. These appear as discrete bands after incorporation of 46, 112, and 252 nucleotides (**Figure 1A**). Thus, the wild type T4 DNA polymerase starts in a state of very low intrinsic processivity, essentially nonprocessive. During the elongation phase it shifts to a more processive state. The elongating DNA polymerase nevertheless remains sensitive to difficult DNA sequences, as the preferential termination sites show. The role of the clamp, then, is to assist in initiation and in replication through difficult DNA sequences.
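Under single-hit conditions such as the heparin trap, primer-extension band intensities are often converted into a termination probability at each position. This probability is P<sub>n</sub> = I<sub>n</sub> / Σ<sub>i≥n</sub> I<sub>i</sub>. It is the fraction of complexes that reached position n and stopped there. The Python sketch below applies this common analysis to hypothetical band intensities, not to data from Figure 1. It assumes an end-labeled primer, so that each product carries one label and band intensity is proportional to the number of molecules. The last band is treated as the final bin.

```python
def termination_probabilities(intensities):
    """Return P_n = I_n / sum(I_i for i >= n) for each position n.

    intensities[n] is the signal of products extended by n nucleotides
    (n = 0 is unextended primer). The last entry is treated as the final
    bin, so its termination probability is 1 by construction.
    """
    probs = []
    remaining = sum(intensities)  # signal from complexes that reached position n
    for i_n in intensities:
        probs.append(i_n / remaining if remaining > 0 else 0.0)
        remaining -= i_n
    return probs


# Hypothetical band intensities (arbitrary units) for the +0 ... +4 products.
bands = [900, 50, 15, 7, 28]
probs = termination_probabilities(bands)
for n, p in enumerate(probs):
    print(f"+{n}: P_term = {p:.2f}")
print(f"fraction of primers extended: {1 - probs[0]:.2f}")
```

In this made-up example, P<sub>0</sub> is high and P<sub>n</sub> falls over the next few positions. That is the kind of pattern expected for a polymerase that is nearly nonprocessive at initiation and more processive once elongation begins.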

Amino acid substitutions in Motif A of the T4 DNA polymerase (**Table 1**) dramatically alter intrinsic processivity. Longer extension products are visible for the L412M-DNA polymerase (**Figure 1A**, lane 2), especially when concentrations of deoxyribonucleoside triphosphates (dNTPs) are reduced to 1 μM (**Figure 1**, lane 6). Thus, more L412M-DNA polymerase complexes are able to escape the initiation phase and to replicate past difficult sequences. Kinetic studies are consistent with these observations. In standing start primer extension assays, the *K*<sub>d</sub> for dAMP incorporation is lower for the L412M-DNA polymerase than for wild type, 11 versus 16 μM (Hariharan et al., 2006). In addition, the *k*<sub>off</sub> rate is about twofold slower (Fidalgo da Silva et al., 2002). The L412M substitution is the most conservative substitution, yet the slightly less conservative L412I substitution produces dramatic changes in the opposite direction. The L412I-DNA polymerase cannot replicate DNA when the dGTP pool is reduced. It also shows an antimutator phenotype instead of the mutator phenotype observed for the L412M-DNA polymerase (**Table 1**). These features resemble the consequences of another conservative substitution in Motif A, I417V. The I417V-DNA polymerase is much less processive than the wild type T4 DNA polymerase (**Figure 1**, lane 3), especially when dNTP pools are reduced (**Figure 1**, lane 7). Like the L412I-DNA polymerase, the I417V-DNA polymerase cannot replicate DNA when the dGTP pool is reduced (Reha-Krantz and Nonay, 1994; Stocki et al., 1995). Its proofreading is also increased, as indicated by the antimutator phenotype (**Table 2**). Because the I417V-DNA polymerase has low processivity, its replication is inhibited by subtle DNA damage, for example a phosphotriester, that does not impede replication by other DNA polymerases (Tsujikawa et al., 2003). The differences in replication fidelity and sensitivity to dNTP pools are observed *in vivo* for the mutant DNA polymerases in the presence of the gp45 clamp. Intrinsic processivity therefore still operates when the polymerase is tethered. It determines the stability of DNA polymerase complexes when dNTP pools are low, when DNA templates are damaged, or when difficult sequences are present, and it determines whether proofreading is initiated.

Sequences at preferential termination sites are informative because they identify DNA sequences that are difficult for the T4 DNA polymerase to replicate. Many termination sites occur at simple repeats: the +46 product terminates within the template ATAT sequence and the +112 product terminates at the beginning of the template GCGC sequence (**Figure 1**; Reha-Krantz and Nonay, 1993). Termination with another DNA template was observed at the beginning of a template CACACA sequence (Spacciapoli and Nossal, 1994). These dinucleotide repeat termination sites are distinct from another type of difficult sequence, namely hairpin DNA structures that form between inverted repeats in ssDNA ahead of an advancing DNA polymerase. The T4 DNA polymerase has little ability to bypass hairpin structures in the absence of T4 ssDNA binding protein (gp32) and the clamp (gp45) and clamp loading proteins (Huang et al., 1981; Roth et al., 1982; Bedinger et al., 1989; Jarvis et al., 1991; Hacker and Alberts, 1994).

The simple repeat sequences at the termination sites in **Figure 1** likely cause pausing, which causes enzyme dissociation that is associated with initiation of the proofreading pathway (see review by Reha-Krantz, 2010). While mismatches at the primer-end trigger proofreading as an error avoidance mechanism (Brutlag and Kornberg, 1972; Muzyczka et al., 1972), gratuitous proofreading of correctly paired primer ends can be caused by anything that hinders continued primer elongation. In contrast to dissociation of the T4 DNA polymerase when proofreading is initiated, return of the trimmed primer-end to the polymerase active site is rapid and processive, even in the absence of tethering (Reddy et al., 1992; Fidalgo da Silva and Reha-Krantz, 2007). With respect to *in vitro* applications, less gratuitous proofreading and subsequent dissociation is observed for the T4 L412M-DNA polymerase than for other DNA polymerases; hence, difficult template sequences, low dNTP concentrations, and other factors that impede nucleotide incorporation will be less problematic for the more processive DNA polymerase.

### *Biochemical mechanism for increased intrinsic processivity by the T4 L412M-DNA polymerase; insights provided by heightened sensitivity to PAA*

The L412M substitution converts the T4 DNA polymerase from resistance to sensitivity to phosphonoacetic acid (PAA), an anti-herpesvirus drug. PAA-sensitivity is observed for the T4 L412M-DNA polymerase *in vivo* (**Table 1**; Reha-Krantz et al., 1993; Li et al., 2010) and *in vitro* (Reha-Krantz and Nonay, 1994). PAA resembles pyrophosphate (PPi), a byproduct of nucleotide incorporation. PAA-sensitivity therefore indicates increased PPi binding. PAA-sensitivity can be understood in terms of the nucleotide incorporation pathway described in **Figure 2** (Reha-Krantz et al., 2011). Complex I is a pre-translocation (Pre-T) complex. Complex II is formed by forward translocation by one template position. This step removes the PPi binding site present in Complex I and creates a new nucleotide binding pocket (pre-insertion site). Complexes I and II are in rapid equilibrium (Fidalgo da Silva et al., 2002; Hariharan et al., 2006). Nucleotide binding produces the open ternary complex, Complex III, and a conformational change produces the closed ternary complex, Complex IV. Nucleotide incorporation takes place in Complex V. PPi is then released to form Complex VI, a Pre-T complex like Complex I except that the primer-end has been extended by one nucleotide.

Complex VI is at the crossroads of three competing pathways.

- **Continued incorporation.** The DNA polymerase may continue nucleotide incorporation by translocating to form Complex VII, a new Post-T complex.
- **Pyrophosphorolysis.** The reverse reaction can be initiated by rebinding PPi to form Complex V. The polymerase then catalyzes pyrophosphorolysis, which produces dNTP and shortens the primer strand by one nucleotide (Complex IV). This is the pathway that is sensitive to PPi-like antiviral drugs.
- **Proofreading.** Proofreading is initiated if the primer-end is mismatched. Gratuitous proofreading is also possible if polymerization conditions are not optimal, for example when dNTP pools are low. For proofreading, the end of the primer strand is separated from the template and transferred to the exonuclease site. A beta hairpin structure in the exonuclease domain of the T4 DNA polymerase acts as a wedge to stabilize the exonuclease complex, Complex VIII (Stocki et al., 1995; Marquez and Reha-Krantz, 1996; Reha-Krantz, 1998, 2010; Reha-Krantz et al., 1998; Hogg et al., 2004, 2007; Subuddhi et al., 2008).

All three pathways are in competition, and differences in how the DNA polymerase interacts with the primer-template affect which pathway is chosen. PAA-sensitivity indicates that the T4 L412M-DNA polymerase favors the Pre-T complex, which has a PPi-binding site, over the Post-T complex, which has a dNTP binding site. PPi-like antiviral drugs could conceivably inhibit replication by interfering with dNTP binding in Post-T complexes. Structural studies, however, show that the PPi-like phosphonoformic acid (PFA) traps a mutant RB69 DNA polymerase in a complex that resembles Complex V (Zahn et al., 2011). This result is expected if PFA binds to the Pre-T complex.

**FIGURE 2 | Nucleotide incorporation scheme for the bacteriophage T4 DNA polymerase.** Pre- and Post-T complexes are in equilibrium (Complexes I and II). dNTP binding results in formation of an open ternary complex (Complex III). A conformational change to evaluate the accuracy of the incoming nucleotide results in formation of the closed ternary complex (Complex IV). Chemistry takes place to join the incoming nucleotide to the primer end, and the PPi byproduct is formed (Complex V). PPi dissociates to form Complex VI, a new Pre-T, which is at the crossroads of three pathways. Complex VI is in equilibrium with the Post-T complex (Complex VII); binding the correct nucleotide starts another cycle of nucleotide incorporation. Alternatively, PPi may bind to Complex VI to form Complex V, which leads to the reverse reaction (pyrophosphorolysis). PAA, as a PPi mimic, also binds to Complex VI. A third alternative is proofreading (Complex VIII). The figure is modified from Reha-Krantz et al. (2011) and is shown with permission from the ACS.
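The branching in the Figure 2 scheme can be made explicit by encoding the complexes as a small directed graph. The Python sketch below includes only the steps stated in the text and caption. The edge labels are paraphrases, and the graph carries no rate information, so it shows which routes exist, not which one is favored. The helper names `successors` and `shortest_path` are illustrative.

```python
from collections import deque

# Directed steps (from, to, description) restricted to those stated for Figure 2.
STEPS = [
    ("I", "II", "translocation (Pre-T -> Post-T)"),
    ("II", "I", "reverse translocation"),
    ("II", "III", "dNTP binding (open ternary complex)"),
    ("III", "IV", "conformational change (closed ternary complex)"),
    ("IV", "V", "chemistry; PPi formed"),
    ("V", "VI", "PPi release (new Pre-T)"),
    ("VI", "VII", "translocation (new Post-T)"),
    ("VII", "VI", "reverse translocation"),
    ("VI", "V", "PPi rebinding"),
    ("V", "IV", "pyrophosphorolysis; primer shortened by one nucleotide"),
    ("VI", "VIII", "proofreading; primer end moved to exonuclease site"),
]


def successors(state):
    """Complexes reachable in one step from `state`, with step descriptions."""
    return [(dst, label) for src, dst, label in STEPS if src == state]


def shortest_path(start, goal):
    """Breadth-first search; returns the list of complexes or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt, _ in successors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


for dst, label in successors("VI"):
    print(f"VI -> {dst}: {label}")
print("reverse reaction route:", shortest_path("VI", "IV"))
```

Listing the successors of Complex VI recovers the three competing pathways: forward translocation, PPi rebinding leading to pyrophosphorolysis, and proofreading.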

Formation of Pre- and Post-T complexes can also be observed in assays that use the fluorescence of the base analog 2AP. For these assays, 2AP is placed in the +1 position of the template strand, adjacent to the terminal base pair. With such DNA substrates, Pre-T complexes have higher fluorescence intensity than Post-T complexes (Mandal et al., 2002; Hariharan and Reha-Krantz, 2005; Reha-Krantz et al., 2011). Polymerase complexes formed with the L412M-DNA polymerase show higher fluorescence intensity. This indicates that, in the absence of PPi and dNTPs, the L412M-DNA polymerase favors the highly fluorescent Pre-T complexes over Post-T complexes (Hariharan and Reha-Krantz, 2005). The role of the Pre-T complex in drug sensitivity is further supported by studies showing that PFA inhibits the HIV-1 reverse transcriptase by trapping a Pre-T complex (Marchand et al., 2007). We extend these findings by linking PAA-sensitivity of the L412M-DNA polymerase to increased intrinsic processivity. This link suggests that Pre-T complexes are more stable than Post-T complexes and less subject to enzyme dissociation (see further discussion by Li et al., 2010).

### *Increased pyrophosphorolysis is observed for the T4 L412M-DNA polymerase*

Increased pyrophosphorolysis is expected for PAA-sensitive DNA polymerases because the equilibrium between Pre- and Post-T complexes favors Pre-T complexes, which can bind PPi (**Figure 2**). The PAA-sensitive T4 L412M-DNA polymerase catalyzes a robust pyrophosphorolysis reaction (see Materials and Methods for assay conditions). Rates were compared in the presence of 2.5 mM PPi, using exonuclease-deficient (D112A/E114A) forms of each enzyme. The pyrophosphorolysis rate for the L412M-DNA polymerase is fivefold higher than that of the polymerase with the wild type Motif A sequence. It is more than 80-fold higher than that of the less processive I417V-DNA polymerase. The *K*<sub>m</sub> for PPi was ∼0.5 mM for the exonuclease-deficient L412M-DNA polymerase and 2 mM for the exonuclease-deficient DNA polymerase with the wild type Motif A. For the exonuclease-deficient I417V-DNA polymerase, the *K*<sub>m</sub> was too high to be measured (Damaraju and Reha-Krantz, unpublished observations).
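The *K*<sub>m</sub> values above raise a question: how much of the fivefold rate difference could they explain on their own? A rough check is possible if one assumes simple Michaelis-Menten dependence on PPi. That kinetic model is an assumption here; the text does not state it. The Python sketch below computes the fraction of maximal rate at 2.5 mM PPi for each enzyme.

```python
def fraction_of_vmax(substrate_mM: float, km_mM: float) -> float:
    """Michaelis-Menten saturation: v / Vmax = [S] / (Km + [S])."""
    return substrate_mM / (km_mM + substrate_mM)


PPI_mM = 2.5  # PPi concentration used for the rate comparison
KM_mM = {"exo- L412M": 0.5, "exo- wild-type Motif A": 2.0}  # Km for PPi from the text

sat = {name: fraction_of_vmax(PPI_mM, km) for name, km in KM_mM.items()}
for name, f in sat.items():
    print(f"{name}: v/Vmax = {f:.2f}")
ratio = sat["exo- L412M"] / sat["exo- wild-type Motif A"]
print(f"rate ratio expected from Km alone: {ratio:.1f}-fold")
```

Under this simple model, the *K*<sub>m</sub> difference alone would give only about a 1.5-fold higher rate at 2.5 mM PPi. A higher maximal rate for the L412M enzyme would then have to account for the rest of the observed fivefold difference.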

### *Mechanisms to curb pyrophosphorolysis catalyzed by the L412M-DNA polymerase; formation of divalent cation-PPi complexes*

Pyrophosphorolysis is the reversal of the polymerization reaction. It is usually not a problem during DNA replication *in vivo* because iPPases degrade PPi and prevent the build-up of the high PPi concentrations needed to drive pyrophosphorolysis. In contrast, the high concentrations of PPi formed during DNA sequencing reactions do support pyrophosphorolysis. The result is degradation of chain-terminated sequencing products and ambiguous sequencing data (**Figure 3**). Adding iPPase to sequencing reactions is a common way to prevent pyrophosphorolysis (Tabor and Richardson, 1990), and it works in sequencing reactions catalyzed by the L412M-DNA polymerase. The divalent metal ions Mn<sup>2+</sup> and Ca<sup>2+</sup>, however, are equally effective when combined with Mg<sup>2+</sup> in the presence of 15 mM Na citrate (**Figure 3**; Damaraju and Reha-Krantz, unpublished observations).

The mechanism by which divalent metal cations inhibit pyrophosphorolysis is complex. Ridlington and Butler (1972) demonstrated this complexity for yeast iPPase. Some divalent metal ions, such as Mg<sup>2+</sup>, are needed for catalysis (activators). Others, such as Mn<sup>2+</sup> or Ca<sup>2+</sup>, may inhibit activity by binding to the enzyme or DNA or by forming complexes with PPi. We observed the same complexity for the nucleotide incorporation and pyrophosphorolysis reactions catalyzed by the T4 exonuclease-deficient L412M-DNA polymerase (Damaraju and Reha-Krantz, unpublished data). Either Mg<sup>2+</sup> or Mn<sup>2+</sup> can support nucleotide incorporation and pyrophosphorolysis, but the two ions differ:

- Mg<sup>2+</sup> supports both reactions optimally at concentrations from 1 to >5 mM.
- Mn<sup>2+</sup> is only half as effective as Mg<sup>2+</sup>, and only over a narrow concentration range that peaks at about 0.5 mM.
- In the presence of 15 mM Na citrate, Mn<sup>2+</sup> supports nucleotide incorporation over a broad concentration range up to 5 mM. It supports pyrophosphorolysis only over the narrow range from 0.5 to 1 mM.

Thus, at high Mn<sup>2+</sup> concentrations, Na citrate protects the nucleotide incorporation reaction but not the pyrophosphorolysis reaction. Pyrophosphorolysis-catalyzed DNA degradation can therefore be avoided in DNA sequencing reactions by adding 15 mM Na citrate and 3 to 5 mM Mn<sup>2+</sup>. Further improvements are discussed below.

Na citrate protects the nucleotide incorporation reaction by acting as a weak chelator that reversibly binds divalent metal cations. PPi, however, also forms complexes with metal cations. This is shown by the observation that more Mn<sup>2+</sup> is needed to inhibit pyrophosphorolysis by the T4 L412M-DNA polymerase as the PPi concentration increases (Damaraju and Reha-Krantz). For 50% inhibition of pyrophosphorolysis:

- with 1 mM PPi, ∼1 mM Mn<sup>2+</sup> is needed;
- with 2 mM PPi, ∼2 mM Mn<sup>2+</sup> is needed;
- with 4 mM PPi, >3 mM Mn<sup>2+</sup> is required (**Figure 4**).

This stoichiometry is consistent with formation of Mn-PPi chelate complexes or, in the case of Ca<sup>2+</sup>, Ca-PPi chelate complexes. The system therefore involves complex equilibria among one or more metal ions (Me): DNA pol-Me, citrate-Me, PPi-Me, free Me<sup>2+</sup>, free DNA pol, free PPi, free Na citrate, and active and inactive DNA pol-Me-PPi complexes.

Mn-PPi binding may inactivate pyrophosphorolysis directly. This is not the most likely explanation for inhibition in reactions with Na citrate and Mn<sup>2+</sup> above 3 mM, because nucleotide incorporation is not inhibited under those conditions, and Mn-PPi binding would be expected to inhibit both reactions. A more likely explanation is that Mn<sup>2+</sup> sequesters PPi, lowering free PPi below the high concentration needed for pyrophosphorolysis. Meanwhile, enough free Mn<sup>2+</sup> (∼0.5 mM) remains to support, but not to inhibit, nucleotide incorporation. Others have proposed that Na citrate maintains low concentrations of free Mn<sup>2+</sup> (Beckman et al., 1985; Tabor and Richardson, 1989, 1990). In those studies, however, Mn<sup>2+</sup> was thought to make the band intensities of DNA sequencing products more even by reducing discrimination against chain-terminating nucleotides. Ca<sup>2+</sup> can also chelate PPi and effectively prevent pyrophosphorolysis (results not shown).
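The stoichiometric argument can be illustrated with a simple 1:1 binding model, Mn<sup>2+</sup> + PPi ⇌ Mn-PPi. The Python sketch below ignores competition from Na citrate, Mg<sup>2+</sup>, and the polymerase. It uses a hypothetical dissociation constant of 0.1 mM, chosen for illustration and not a measured value. Under these assumptions, the total Mn<sup>2+</sup> needed to complex half of the PPi is [PPi]<sub>total</sub>/2 + *K*<sub>d</sub>. It therefore rises in proportion to the PPi concentration, matching the trend reported above. The sketch is not expected to reproduce the measured 50% inhibition points. Inhibition depends on free PPi relative to the enzyme's *K*<sub>m</sub>, which this model does not include.

```python
import math

KD_MM = 0.1  # hypothetical Mn-PPi dissociation constant (mM), illustration only


def complex_conc(metal_total, ppi_total, kd=KD_MM):
    """[Mn-PPi] (mM) for 1:1 binding, from the exact quadratic solution."""
    b = metal_total + ppi_total + kd
    return (b - math.sqrt(b * b - 4 * metal_total * ppi_total)) / 2


def free_ppi(metal_total, ppi_total, kd=KD_MM):
    """Uncomplexed PPi (mM) remaining in solution."""
    return ppi_total - complex_conc(metal_total, ppi_total, kd)


def metal_for_free_fraction(ppi_total, free_fraction, kd=KD_MM):
    """Total Mn2+ (mM) that leaves `free_fraction` of the PPi uncomplexed.

    From Kd = [Mn][PPi]/[Mn-PPi]: free Mn = Kd * (1 - f) / f, and the
    complexed Mn equals the complexed PPi, (1 - f) * PPi_total.
    """
    bound = (1 - free_fraction) * ppi_total
    free_metal = kd * (1 - free_fraction) / free_fraction
    return bound + free_metal


for ppi in (1.0, 2.0, 4.0):
    mn = metal_for_free_fraction(ppi, 0.5)
    print(f"PPi {ppi:g} mM: {mn:.2f} mM Mn2+ complexes half of it "
          f"(check: free PPi = {free_ppi(mn, ppi):.2f} mM)")
```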

Conditions were optimized for Mn<sup>2+</sup>-dependent nucleotide incorporation with minimal pyrophosphorolysis. Even so, nucleotide incorporation was only 60% of that observed in Mg<sup>2+</sup>-dependent reactions. Ca<sup>2+</sup> does not support nucleotide incorporation by the T4 DNA polymerase at all. The best results came from mixed divalent metal reactions containing Mg<sup>2+</sup> and either Mn<sup>2+</sup> or Ca<sup>2+</sup> in the presence of Na citrate. These reactions gave optimal nucleotide incorporation and minimal pyrophosphorolysis. For DNA sequencing with FIDELITY™ (see Materials and Methods), the reactions contained 16.7 mM Mg<sup>2+</sup>, 15 mM Na citrate, and either 2.5 mM Mn<sup>2+</sup> or 10 mM Ca<sup>2+</sup>. Under these conditions pyrophosphorolysis was essentially eliminated without adverse effects on DNA sequencing efficiency (**Figure 3**).

#### *DNA sequence context effects on the pyrophosphorolysis reaction catalyzed by the T4 L412M-DNA polymerase*

DNA sequence effects have been reported for the pyrophosphorolysis reactions catalyzed by the Klenow fragment of *E. coli* DNA polymerase I (Mizrahi et al., 1986) and the phage T7 DNA polymerase (Tabor and Richardson, 1990), but systematic studies were not carried out to determine if specific DNA sequences are required for pyrophosphorolysis. We observed both general and specific DNA sequence contexts for the pyrophosphorolysis reactions catalyzed by the exonuclease-deficient T4 wild type and L412M-DNA polymerases.

Conditions that favor formation of Pre-T complexes (Complex VI, **Figure 2**) are predicted to increase pyrophosphorolysis, because PPi can bind to this complex. Previous studies used the fluorescence of the base analog 2AP to compare DNA substrates. More Pre-T than Post-T complexes were formed with A+T-rich than with G+C-rich substrates (Hariharan and Reha-Krantz, 2005; Hariharan et al., 2006). Pyrophosphorolysis is therefore predicted to favor DNA substrates with A+T-rich primer-terminal regions. This prediction was confirmed in reactions with A+T- and G+C-rich DNAs labeled with 2AP at the 3′-primer end (see Materials and Methods).

The exonuclease-deficient L412M-DNA polymerase released the terminal d2APTP about sixfold faster from the A+T-rich DNA than from the G+C-rich DNA in the presence of 2.5 mM PPi. The rates were 14 s<sup>−1</sup> and 2.4 s<sup>−1</sup>, respectively (Damaraju and Reha-Krantz). The exonuclease-deficient T4 DNA polymerase with the wild type Motif A sequence also favored A+T-rich DNA, but with slower rates: 3 and 0.5 s<sup>−1</sup> for A+T- and G+C-rich DNAs, respectively. Pyrophosphorolysis was observed for DNA substrates with a terminal 2AP opposite template T. It was not detected for 2AP opposite template C, as expected if pyrophosphorolysis requires a matched primer-end.

DNA substrates with A+T-rich primer-terminal regions also favor the exonucleolytic proofreading pathway (Bessman and Reha-Krantz, 1977; Bloom et al., 1994; Reha-Krantz, 2010). Increased formation of Pre-T complexes, which likely reflects a decreased ability to form Post-T complexes, therefore increases the likelihood of either proofreading or pyrophosphorolysis. Proofreading is favored if the primer-end is mismatched, and pyrophosphorolysis is observed if sufficient PPi is present.
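Rates such as these are obtained by fitting the stopped-flow fluorescence traces to exponential equations (see Materials and Methods). The minimal, dependency-free Python sketch below fits a single exponential, F(t) = F0 + A(1 − e<sup>−kt</sup>). For a fixed k, the model is linear in F0 and A, so the sketch scans k over a logarithmic grid. At each k it solves for F0 and A by least squares and keeps the k with the smallest residual sum of squares. The trace is synthetic (k = 14 s<sup>−1</sup>, no noise). The method is a generic illustration, not the fitting software used for these data. The double-exponential fit and the residual analysis described in the methods are not shown.

```python
import math


def fit_single_exponential(t, y, k_grid):
    """Least-squares fit of y = F0 + A * (1 - exp(-k t)) by scanning k.

    For each trial k the model is linear in F0 and A, so those are solved
    exactly by simple linear regression; the k with the lowest residual sum
    of squares (SSE) is returned as (k, F0, A, SSE).
    """
    n = len(t)
    best = None
    for k in k_grid:
        x = [1 - math.exp(-k * ti) for ti in t]
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        amp = sxy / sxx
        f0 = my - amp * mx
        sse = sum((yi - f0 - amp * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[3]:
            best = (k, f0, amp, sse)
    return best


# Synthetic trace: F0 = 1.0, A = 0.5, k = 14 s^-1, sampled every 5 ms for 0.5 s.
times = [i * 0.005 for i in range(101)]
trace = [1.0 + 0.5 * (1 - math.exp(-14.0 * ti)) for ti in times]

# Log-spaced grid of 2000 trial rate constants from 0.1 to 100 s^-1.
grid = [0.1 * 10 ** (3 * j / 1999) for j in range(2000)]

k_fit, f0_fit, amp_fit, sse = fit_single_exponential(times, trace, grid)
print(f"k = {k_fit:.2f} s^-1, F0 = {f0_fit:.3f}, A = {amp_fit:.3f}")
```

Separating the nonlinear parameter (k) from the linear ones (F0, A) keeps the search one-dimensional. With real, noisy traces, a finer local refinement around the best grid point would normally follow.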

Specific DNA sequences were found to promote pyrophosphorolysis under DNA sequencing conditions using the FIDELITY™ manual DNA sequencing kit developed by ONCOR (Gaithersburg, MD, USA). Reactions contained the exonuclease-deficient L412M-DNA polymerase, the gp45 processivity factor, and the gp45 loading proteins. They also contained the T4 ssDNA binding protein, gp32, to prevent formation of DNA hairpin structures in ssDNA. Reaction products were internally labeled with [α-<sup>33</sup>P]dATP, and the template was single-stranded circular DNA. Instead of 2′,3′-dideoxynucleoside triphosphate terminators (ddNTPs), 3′-amino-2′,3′-dideoxynucleoside triphosphates (3′-NH2-ddNTPs) were used because the T4 DNA polymerase discriminated less against them. 3′-NH2-ddNTPs can be purchased from TriLink BioTechnologies, San Diego, CA, USA. All reaction components and procedures are described in Materials and Methods. Four separate reactions were run, each with a different chain terminator. The reaction products were separated by polyacrylamide gel electrophoresis under denaturing conditions (**Figure 3A**). Without agents that reduce pyrophosphorolysis (iPPase or Mn<sup>2+</sup>), some bands are missing because of pyrophosphorolysis, and many others are faint (**Figures 3A,B**).

Not all chain-terminated primer termini are equally sensitive to pyrophosphorolysis. Only primers ending in "T" were subject to pyrophosphorolysis, and not all terminal Ts were affected (see right side of **Figure 3A**). The 200 nucleotides of DNA sequence examined (**Figure 3B**) included 54 sequencing products that terminated in T. Only 12 of these terminal Ts were subject to pyrophosphorolysis. What distinguishes the DNA sequences at pyrophosphorolysis-sensitive sites from those at insensitive sites?

Three related sequences in the primer-template region promote pyrophosphorolysis (**Table 3**). They differ only in the base pairs at the −1 and −2 positions:

- Sequence 1 has a G<sub>template</sub>C<sub>primer</sub> base pair in the −2 position.
- Sequence 2 has a C<sub>template</sub>G<sub>primer</sub> base pair in the −1 position, but not in the −2 position.
- Sequence 1/2 has a G<sub>template</sub>C<sub>primer</sub> base pair in the −2 position and a C<sub>template</sub>G<sub>primer</sub> base pair in the −1 position.

A template "G" was never observed in the +1 position of the template strand, although 2 or 3 would be expected statistically in the 11 sequences examined. A "G" in the +1 position also protected the terminal T at position 26 of the DNA sequence (**Figure 3B**) from pyrophosphorolysis, even though the rest of the consensus sequence is present. Similarly, a template "A" was not observed in the +4 position, and an A in that position stabilized the Ts at DNA sequence positions 5 and 8.

The consensus sequences 1, 2 and 1/2 (**Table 3**) are required to sensitize terminal Ts to pyrophosphorolysis, with one exception (Ea in **Figure 3**). At that site the consensus sequence is absent, but the terminal T lies just before a dinucleotide GT repeat. The repeat may stall formation of Post-T complexes and thus promote pyrophosphorolysis. The pyrophosphorolysis-sensitive sites identified in **Figure 3** may therefore underestimate the true number of sensitive sites. Nevertheless, pyrophosphorolysis was strongly biased toward primers ending in "T", and only toward a subset of those. This bias indicates that the pyrophosphorolysis reaction catalyzed by the T4 L412M-DNA polymerase is sequence specific.
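The text expects 2 or 3 of the 11 sites to carry a template G at +1. That expectation follows if each base is equally likely at that position (probability 1/4). This is a simplification, because the template's base composition is not stated. Under that assumption, the binomial probability of seeing no G at +1 in all 11 sites is (3/4)<sup>11</sup>. The short Python check below computes it.

```python
N_SITES = 11    # pyrophosphorolysis-sensitive sites examined
P_G = 0.25      # assumed probability of template G at +1 (uniform base composition)

expected_g = N_SITES * P_G          # binomial mean number of sites with +1 G
p_no_g = (1 - P_G) ** N_SITES       # binomial probability of zero such sites

print(f"expected sites with +1 G: {expected_g:.2f}")
print(f"probability of observing none: {p_no_g:.3f}")

# Fraction of terminal Ts in the examined sequence that were sensitive (12 of 54).
print(f"sensitive terminal Ts: {12 / 54:.0%}")
```

Under the uniform-composition assumption, the absence of +1 G would be expected by chance only about 4% of the time.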

#### *Increased incorporation of fluorophore-labeled nucleotides*

The L412M-DNA polymerase incorporates modified nucleotides more readily than the wild type T4 DNA polymerase, exonuclease-deficient T4 DNA polymerases, other mutant T4 DNA polymerases, and DNA polymerases from other organisms (Goodman and Reha-Krantz, 1999). US patent 5945312 shows reactions in which the L412M- and exonuclease-deficient L412M-DNA polymerases incorporate and extend a variety of modified nucleotides. The modified nucleotides tested include, but are not limited to, rhodamine-dUTP, fluorescein-dUTP, rhodamine-dCTP, biotin-dCTP, and DIG-dCTP. For all modified nucleotides, the L412M-DNA polymerase performed significantly better than the wild type T4 DNA polymerase or DNA polymerases from other organisms. Keller and colleagues proposed DNA sequencing by exonuclease digestion (Goodwin et al., 1995; Werner et al., 2003). The T4 L412M-DNA polymerase was used to prepare fluorescently labeled DNA with one or two fluorophore-labeled bases for proof-of-principle testing of that method.

#### **THE PHAGE T4 EXONUCLEASE-DEFICIENT L412M-DNA POLYMERASE AS A DNA SEQUENCING ENZYME: FIDELITY™**

The FIDELITY™ DNA sequencing kit (ONCOR, Gaithersburg, MD, USA) was developed for routine DNA sequencing, but especially for sequencing difficult DNA sequences. The exonuclease-deficient L412M-DNA polymerase, with its increased intrinsic processivity, combined with T4 processivity factors and ssDNA binding protein, made it possible to sequence all types of difficult DNA sequences with greater success than other methods, as judged by testimonials and applications (for example, see Devireddy and Jones, 1998; Tullius et al., 2001). An example of routine DNA sequencing is shown in **Figure 3** and examples of sequencing difficult DNAs are shown in **Figure 5**. Note the clean sequence for the G+C-rich template (**Figure 5A**) and for a template with several tracts of C repeats (**Figure 5B**).

#### **Table 3 | DNA sequences that promote pyrophosphorolysis by the phage T4 L412M-DNA polymerase.**

<sup>a</sup>n, terminal base pair.

<sup>b</sup>X is any nucleotide; A, C, G, T.

<sup>c</sup>Chain terminator.

<sup>d</sup>Letter code that corresponds to faint or missing Ts in the sequencing reactions shown in *Figure 3*.

Bold indicates highly conserved sequence.

### **USING GENETIC APPROACHES TO IDENTIFY "BIOTECH" DNA POLYMERASES:** *IN VIVO* **AND** *IN VITRO* **SELECTION METHODS**

Genetic selection methods are powerful tools to identify mutant DNA polymerases on the basis of phenotype without the need to have a complete understanding of structure or function. The advantage of genetic selection strategies is that if the stringency is sufficiently strong, then only mutants with the desired phenotype will survive. This expedites "finding a needle in a haystack." The phage T4 L412M-DNA polymerase was the result of two consecutive genetic selection schemes.

First, a genetic selection was carried out to identify mutant T4 DNA polymerases with a strong mutator phenotype. Details of the selection of mutator T4 DNA polymerases (Reha-Krantz et al., 1986; Reha-Krantz, 1988) and of a similar strategy for mutator yeast DNA polymerase δ mutants (Murphy et al., 2006) have been described. Several mutator T4 DNA polymerases were identified, but not the L412M-DNA polymerase, because this mutant replicates DNA with relatively high fidelity (**Table 2**). Many isolates carried more than one mutation in the DNA polymerase gene. Single-mutant strains had to be constructed to determine whether one or more mutations were required to confer the mutator phenotype and to rule out contributing mutations in other genes. For one isolate, called *mel5*, there were 11 mutations in the DNA polymerase gene, but only the mutation encoding the D131G amino acid substitution was required for mutator activity (Reha-Krantz, 1988). The D131G-DNA polymerase has a strongly reduced ability to form exonuclease complexes and, as a consequence, severely reduced 3′→5′ exonuclease activity (Baker and Reha-Krantz, 1998).

Why were there so many mutations in the *mel5* strain? One possibility is that error-prone replication by the mutator DNA polymerase created many mutations within the DNA polymerase gene and in other genes during propagation, but a more likely explanation is that additional DNA polymerase mutations were *selected* to temper the mutational burden produced by the D131G amino acid substitution. This proposal is supported by the presence in the *mel5* strain of a mutation that encodes the I417V substitution in Motif A (**Table 1**). The I417V-DNA polymerase has an antimutator phenotype, increased proofreading, and low intrinsic processivity (Reha-Krantz and Nonay, 1994; **Figure 1** and **Table 2**), activities that would help to maximize residual exonuclease activity in the presence of the D131G substitution and increase replication fidelity. These activities would also help to improve maturation of Okazaki fragments. Okazaki fragment maturation requires the coordinated action of the DNA polymerase 3′→5′ exonuclease activity with a 5′→3′ exonuclease, ExoI/RNaseH in T4 and Rad27 in yeast, to create a nick that can be sealed by DNA ligase (Jin et al., 2004; reviewed by Reha-Krantz, 2010). In the absence of DNA polymerase 3′→5′ exonuclease activity, the DNA polymerase catalyzes strong strand displacement synthesis at junctions between Okazaki fragments, creating 5′ flaps that prevent ligation unless they are removed. Persisting unjoined Okazaki fragments are dangerous because the strand discontinuities result in double-strand breaks in the next replication cycle. Thus, at least two strong selective pressures act to decrease the negative effects of reduced DNA polymerase proofreading.

The second selection, which identified the L412M-DNA polymerase, was for suppressors of the low processivity and high proofreading activity of the I417V-DNA polymerase (**Table 1** and **Figure 1**). One consequence of low processivity is the need for high dNTP pools to sustain replication. The I417V-DNA polymerase could not replicate DNA when the dGTP pool was reduced, which allowed selection of second-site mutations that restored replication under low dGTP conditions (Reha-Krantz and Nonay, 1994; Stocki et al., 1995; Li et al., 2010). Mutations encoding the L412M substitution in Motif A, as well as mutations elsewhere in the DNA polymerase gene, were identified (Stocki et al., 1995; Li et al., 2010). Thus, mutations encoding the L412M substitution and functionally equivalent amino acid substitutions are expected to be identified as suppressors of mutant DNA polymerases with low processivity.

Selection schemes can also be carried out *in vitro* to identify mutant DNA polymerases with desired properties for biotech applications. Bourn et al. (2011) describe an *in vitro* strategy beginning with a chimeric DNA polymerase derived from *Pyrococcus kodakaraensis* and *Pyrococcus furiosus*, named Kofu DNA polymerase. Several mutant DNA polymerases with increased processivity and other desirable features were identified, but many had multiple mutations, as we observed for *in vivo* selections with the T4 DNA polymerase. There are no reports linking amino acid substitutions in the Kofu DNA polymerase to specific polymerase properties, and the amino acid substitutions in the DNA polymerase marketed as KAPA HiFi DNA polymerase by KAPA Biosystems Ltd., South Africa, are proprietary. Bourn et al. (2011) suggest, however, that advantageous properties may require the combined action of several amino acid substitutions. None of the disclosed amino acid substitutions are in Motif A, but there could be substitutions in other DNA polymerase regions that affect processivity (for example, see Stocki et al., 1995; Reha-Krantz and Wong, 1996; Li et al., 2010). While our genetic selections were done *in vivo*, *in vitro* strategies similar to that of Bourn et al. (2011) can be envisioned that exploit sensitivity to dNTP pool concentrations, PAA, or PPi.

We have applied information learned about T4 DNA polymerase function to other DNA polymerases, namely yeast DNA polymerases alpha and delta. Amino acid changes in the yeast DNA polymerases that are analogous to the L412M substitution in the T4 DNA polymerase produce PAA-sensitivity (**Table 1**; Li et al., 2005). Because PAA-sensitivity is correlated with increased intrinsic processivity, the mutant yeast DNA polymerases are also predicted to have increased processivity; however, they are more error prone than the L412M-DNA polymerase (**Table 2**; Niimi et al., 2004; Li et al., 2005). The differences in replication fidelity between the T4 and yeast DNA polymerases suggest differences in the equilibria between Pre- and Post-T complexes and in the formation of exonuclease complexes. This balance can be fine-tuned, however, by testing other amino acid substitutions in Motif A and by combining amino acid substitutions. For example, the phage T4 L412M/I417V-DNA polymerase is PAA-sensitive, although less so than the L412M-DNA polymerase, but the double mutant has an antimutator phenotype and slightly increased proofreading compared to the wild type T4 DNA polymerase (**Tables 1** and **2**). Thus, single or multiple amino acid changes within the Motif A sequence of DNA polymerases in general may produce mutant DNA polymerases with the optimal balance of intrinsic processivity and proofreading, allowing replication of difficult DNA sequences and increased utilization of modified nucleotides while maintaining replication fidelity.

#### **DISCUSSION**

The phage T4 exonuclease-deficient L412M-DNA polymerase has proven to be an excellent DNA sequencing enzyme, especially when combined with processivity factors (clamp and clamp loaders) and ssDNA binding protein; this combination is the basis of FIDELITY™. All DNAs tested, including DNAs with long stretches of mono- and di-nucleotide repeats and high A+T- or G+C-content, are sequenced cleanly (**Figure 5**; S. Woodgate, personal observations). While the T4 L412M-DNA polymerase remains an attractive candidate as a DNA sequencing enzyme, other DNA polymerases may have additional desirable properties that would profit from modifications to the Motif A sequence. Indeed, in addition to our studies of Motif A in phage T4 and yeast DNA polymerases (Reha-Krantz and Nonay, 1994; Li et al., 2005; **Table 1**), Motif A in family B and A DNA polymerases is surprisingly amenable to engineering despite being a conserved sequence (**Table 1**; Patel and Loeb, 2000; Niimi et al., 2004; Venkatesan et al., 2006, 2007). Our studies demonstrate that Motif A helps determine intrinsic processivity in the phage T4 DNA polymerase (**Figure 1**) and, by extrapolation, in other DNA polymerases. The level of intrinsic processivity determines the ability of DNA polymerases to replicate difficult DNA sequences and to incorporate modified nucleotides; for the L412M-DNA polymerase, these properties are maintained even when dNTP pools are reduced (**Figure 1**). Different amino acid substitutions, and combinations of substitutions, in Motif A allow fine-tuning to achieve the optimal balance of intrinsic processivity for the replication of difficult DNA templates while maintaining high replication fidelity; these are highly desirable properties for sequencing, amplification and labeling technologies. We focus here on Motif A because this motif is conserved in all DNA and RNA polymerases, which means that site-directed mutagenesis of Motif A can be used to identify mutant DNA polymerases with increased intrinsic processivity that still retain adequate proofreading activity. However, other DNA polymerase regions also affect intrinsic processivity (Reha-Krantz et al., 1993; Stocki et al., 1995; Li et al., 2010) and are thus attractive secondary mutational targets.

Increased intrinsic processivity increases pyrophosphorolysis, but the pyrophosphorolysis activity of the exonuclease-deficient L412M-DNA polymerase was shown to be curbed by the addition of iPPase or of the divalent metal ions Mn<sup>2+</sup> or Ca<sup>2+</sup> (**Figure 3**). In the absence of these agents, pyrophosphorolysis under DNA sequencing conditions displayed sequence specificity: it was detected only for a subset of sequencing products terminated with T (**Figure 3**). Although pyrophosphorolysis by the T4 wild type and L412M-DNA polymerases was higher for DNA substrates with A+T-rich than with G+C-rich primer-terminal regions, the sequence specificity observed under DNA sequencing conditions was restricted to primer-ends with the T-terminating nucleotide plus either a C<sub>template</sub>G<sub>primer</sub> base pair in the −1 position, a G<sub>template</sub>C<sub>primer</sub> base pair in the −2 position, or both (**Table 3**). Dinucleotide repeats may also sensitize terminal Ts; see the pyrophosphorolysis-sensitive site Ea in **Figure 3** and **Table 3**. The preference of pyrophosphorolysis for A+T-rich primer-terminal regions is expected for the T4 DNA polymerase, because A+T-rich primer-termini favor formation of Pre-T complexes (Hariharan and Reha-Krantz, 2005), which can bind PPi and PPi-like antiviral drugs (**Figure 2**). The sequence dependence of pyrophosphorolysis under DNA sequencing conditions must also reflect sequences that favor Pre-T complexes, but the mechanism is unclear. Unfortunately, while there are numerous structural studies of the RB69 DNA polymerase (for example, Franklin et al., 2001; Hogg et al., 2004; Wang et al., 2011), which is a close relative of the phage T4 DNA polymerase (Hogg et al., 2006), the primer-template DNAs used in crystallography did not contain the pyrophosphorolysis-sensitive sequences indicated in **Table 3**; thus, DNA polymerase interactions with these sequences have not been tested directly. However, the DNA substrate used to capture a mutant RB69 DNA polymerase with the PPi analog PFA (foscarnet) was A+T-rich except for the terminal acyclo guanine (Zahn et al., 2011), which is consistent with our data demonstrating increased pyrophosphorolysis with A+T-rich DNAs.

We propose that the sequence specificity of the pyrophosphorolysis reaction under DNA sequencing conditions is caused by subtle interactions of the DNA polymerase with a dynamic primer-terminal region that breathes more or less depending on the DNA sequence, but we do not rule out the possibility of base recognition, especially for bases in the +1 and +4 positions in the template strand (**Table 3**). Breathing in the terminal region has been proposed to explain why A+T-richness favors proofreading, by increasing the strand separation needed to form exonuclease complexes (for example, see Bloom et al., 1994; Bessman and Reha-Krantz, 1977); we suggest here that breathing also affects the rapid equilibrium between Pre- and Post-T complexes and that increased breathing favors Pre-T complexes. Breathing has been detected at the primer-terminal junction by fluorescence studies that demonstrate measurable unwinding ∼2 base pairs into the duplex region (Jose et al., 2009), which may be exacerbated by DNA polymerase interactions. But how does the DNA polymerase choose between pyrophosphorolysis and proofreading? Proofreading a chain-terminated primer-end may always be preferred, but the DNA polymerase in sequencing reactions is exonuclease-deficient, which allows pyrophosphorolysis to be detected. Pyrophosphorolysis is not simply a default pathway, however, because sequence specificity is observed. The pyrophosphorolysis-sensitive DNA sequences (**Table 3**) must favor formation of stable Pre-T complexes (reduced dissociation) at the expense of Post-T complexes. In other words, sequences that promote pyrophosphorolysis must hinder translocation to form Post-T complexes.

A critical step in translocation may involve interactions of the DNA polymerase at the −1 position in the primer-terminal region, using the naming system indicated in **Table 1**, with "n" designating the terminal base pair. Structural studies show that a conserved lysine residue intercalates into the primer-terminal region at the −1 position of ternary complexes formed with the RB69 DNA polymerase (Franklin et al., 2001), but in complexes trapped with the PPi mimic PFA (Zahn et al., 2011), translocation was blocked after nucleotide incorporation and lysine intercalation did not change, so intercalation was observed at the −2 position. Thus, lysine intercalation advances stepwise after each nucleotide is incorporated: intercalation is at the −2 base pair position in Pre-T complexes before translocation and advances to the −1 base pair position to form Post-T complexes. The −1 and −2 positions are also critical for promoting pyrophosphorolysis; a G<sub>template</sub>C<sub>primer</sub> base pair is observed in the −2 position of sequences that promote pyrophosphorolysis (sequence 1) unless there is a C<sub>template</sub>G<sub>primer</sub> base pair in the −1 position (sequence 2), or GC and CG base pairs are present in the −2 and −1 positions, respectively (sequence 1/2; **Table 3**). If localized breathing in the primer-terminal region is the critical factor for forming stable Pre-T complexes, disfavoring exonuclease complexes, and hindering translocation to form Post-T complexes, then the requirement for GC and CG base pairs at the −2 and −1 positions suggests that GC base pairs impart stability that reduces the breathing and strand separation needed to form exonuclease complexes. But what hinders formation of Post-T complexes? One possibility is that duplex stability affects the ability of the conserved lysine residue to intercalate into duplex DNA, if the ease of intercalation depends on A+T/G+C-richness. Another possibility is that sequences which promote pyrophosphorolysis cause a slight distortion of the DNA helix. This possibility is suggested because not just any GC base pair is sufficient: only a G<sub>template</sub>C<sub>primer</sub> base pair in the −2 position or a C<sub>template</sub>G<sub>primer</sub> base pair in the −1 position works, not the same base pairs in the reverse orientation. There is also specificity for A<sub>template</sub>T<sub>primer</sub> or C<sub>template</sub>G<sub>primer</sub> base pairs in the −3 position. While base recognition has not been ruled out, subtle physical changes in the primer-terminal region appear to affect the equilibria between Pre- and Post-T complexes and exonuclease complexes. This suggestion is in keeping with efforts to understand how the conservative L412M substitution in the T4 DNA polymerase has such profound effects on processivity and sensitivity to the PPi-mimic PAA (**Figure 1**; Reha-Krantz and Nonay, 1994), while the L412I substitution also has profound effects, but in the opposite direction. Structural studies of the analogous RB69 L415M-DNA polymerase, in Post-T ternary complexes with dNTP (Xie et al., 2013) or in Pre-T complexes trapped with PFA (Zahn et al., 2011), do not shed light on the mechanism, which leaves open the possibility that rapidly changing conformations not yet captured in structural studies are behind the observed characteristics of the T4 L412M- and L412I-DNA polymerases.
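To make the positional rules discussed above concrete, the following Python sketch flags which Ts in a sequencing read would be candidate pyrophosphorolysis-sensitive termination sites under a simplified reading of the consensus. It assumes the read is the newly synthesized (primer) strand written 5′→3′ and that every base is Watson-Crick paired, so each template base is the complement of the read base opposite it. It encodes only rules stated in the text: a terminal T; a C<sub>template</sub>G<sub>primer</sub> pair at −1 and/or a G<sub>template</sub>C<sub>primer</sub> pair at −2; and no template G at +1 or template A at +4. The −3 preference, the dinucleotide-repeat exception (site Ea), and any further Table 3 detail are left out, and the function name `candidate_sites` is hypothetical.

```python
from typing import List

# Watson-Crick partner of each base: the template base opposite a read base
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def candidate_sites(read: str) -> List[int]:
    """Return 1-based positions of Ts in `read` that match the simplified rules.

    `read` is the synthesized (primer) strand, 5'->3'. For a product ending at
    index i, the terminal pair n is at i, the -1 pair at i-1, the -2 pair at
    i-2, and the downstream template base +k lies opposite read[i+k].
    Hypothetical helper for illustration only.
    """
    read = read.upper()
    sites = []
    for i, base in enumerate(read):
        if base != "T":  # only T-terminated products were sensitive
            continue
        # consensus 2: C(template):G(primer) at -1
        minus1_cg = i >= 1 and read[i - 1] == "G"
        # consensus 1: G(template):C(primer) at -2
        minus2_gc = i >= 2 and read[i - 2] == "C"
        if not (minus1_cg or minus2_gc):
            continue
        # a template G at +1 was never seen at a sensitive site
        if i + 1 < len(read) and COMPLEMENT.get(read[i + 1]) == "G":
            continue
        # a template A at +4 stabilized Ts against pyrophosphorolysis
        if i + 4 < len(read) and COMPLEMENT.get(read[i + 4]) == "A":
            continue
        sites.append(i + 1)
    return sites


print(candidate_sites("GCATGG"))    # [4]: G(template):C(primer) at -2
print(candidate_sites("GCATCG"))    # []: template G at +1
print(candidate_sites("AAGTAAAC"))  # [4]: C(template):G(primer) at -1
print(candidate_sites("AAGTAAAT"))  # []: template A at +4
```

Because the text notes that the observed sites may underestimate the true number of sensitive sites, a rule-based scan like this can at best shortlist positions for experimental checking.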

Future studies are needed to determine whether the pyrophosphorolysis-sensitive sequences described in **Table 3** promote pyrophosphorolysis in the absence of a chain-terminating T, or whether the chain-terminator contributes to the observed sequence specificity. DNA binding studies with the primer-templates described in **Table 3** are also needed, as are studies to explore the possibility that bases in the +1 and +4 positions in the template are recognized by the T4 DNA polymerase. The ability of an "A" in the +4 position in the template strand to prevent pyrophosphorolysis suggests that the DNA polymerase can detect sequence. Interestingly, the Pfu DNA polymerase detects deaminated bases in the +4 position as part of an apparent error avoidance mechanism (Connolly, 2009). The sequence specificity of the pyrophosphorolysis reaction also has possible implications for antiviral drug therapy. For example, is the sensitivity of herpes viral DNA polymerases to the PPi analogs PAA and PFA (foscarnet) dependent on DNA sequence? See additional discussion by Li et al. (2010).

In conclusion, amino acid substitutions in the phage T4 DNA polymerase Motif A increase or decrease intrinsic processivity by altering the equilibria between Pre- and Post-T complexes and the formation of exonuclease complexes. Amino acid changes in Motif A that increase intrinsic processivity do so by increasing the stability of Pre-T complexes, which in turn increases the ability to replicate difficult DNA sequences and to incorporate modified nucleotides. Motif A is an attractive target for engineering "biotech" DNA polymerases because intrinsic processivity can be fine-tuned by using different amino acid substitutions, and combinations of substitutions, to optimize replication of difficult DNA sequences and to enhance the ability to use modified nucleotides while maintaining replication fidelity. Another important take-home message is that DNA and DNA polymerase-DNA interactions are dynamic, especially in the primer-terminal region (Hariharan and Reha-Krantz, 2005; Hariharan et al., 2006; Jose et al., 2009). Lastly, DNA polymerases may be better able to detect specific sequences than previously recognized.

#### **AUTHOR CONTRIBUTIONS**

Linda J. Reha-Krantz was responsible for organizing and writing the initial manuscript and for providing new data. Sandra Woodgate and Myron F. Goodman added intellectual content and critically evaluated the manuscript. Sandra Woodgate developed the FIDELITY™ DNA sequencing kit in collaboration with Myron F. Goodman and Linda J. Reha-Krantz and provided data on the pyrophosphorolysis reaction under DNA sequencing conditions. All authors agree to be accountable for all aspects of the work presented.

#### **ACKNOWLEDGMENTS**

Linda J. Reha-Krantz acknowledges support from the Natural Sciences and Engineering Research Council of Canada (NSERC) for operating and strategic grants, and from the Canadian National Cancer Institute. Linda J. Reha-Krantz also gratefully acknowledges the contributions of more than 40 skilled research personnel, students, and postdoctoral fellows from 1982 to the present. Myron F. Goodman has received support from NIHGM, NIHEHS and NIHCA grants. Sandra Woodgate acknowledges Jay George, CSO at ONCOR.

Selection of the mutant L412M-DNA polymerase and other mutant T4 DNA polymerases, their biochemical characterization, and determination of their ability to incorporate modified nucleotides are covered in the following U.S. patents authored by Myron F. Goodman and Linda J. Reha-Krantz: Chain-terminating nucleotides for DNA sequencing, U.S. Patent Number 5,547,859, issued Aug. 20, 1996; Methods for identifying and isolating variant T4 DNA polymerases, U.S. Patent Number 5,660,980, issued Aug. 26, 1997; Variant DNA Polymerases, U.S. Patent Number 5,928,919, issued July 27, 1999; Synthesis of fluorophore-labeled DNA, U.S. Patent Number 5,945,312, issued Aug. 31, 1999. A licensing agreement was formed between ONCOR of Gaithersburg, Maryland, and Myron F. Goodman and the University of Southern California, and Linda J. Reha-Krantz and the University of Alberta. The FIDELITY™ kit was developed by Sandra Woodgate at ONCOR, Gaithersburg, Maryland, in collaboration with Myron F. Goodman, Linda J. Reha-Krantz, and other researchers at ONCOR. The kit is no longer manufactured.

#### **REFERENCES**


for primer extension and proofreading reactions. *Biochemistry* 44, 15674–15684. doi: 10.1021/bi051462y


of pyrophosphorolysis and misincorporation reactions. *Proc. Natl. Acad. Sci. U.S.A.* 83, 5769–5773. doi: 10.1073/pnas.83.16.5769


and *Escherichia coli* DNA polymerase I. *Nucleic Acids Res.* 31, 4965–4972. doi: 10.1093/nar/gkg722


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 03 June 2014; accepted: 07 July 2014; published online: 04 August 2014. Citation: Reha-Krantz LJ, Woodgate S and Goodman MF (2014) Engineering processive DNA polymerases with maximum benefit at minimum cost. Front. Microbiol. 5:380. doi: 10.3389/fmicb.2014.00380*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Reha-Krantz, Woodgate and Goodman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present

# *Cheng-Yao Chen\**

Protein Engineering Group, Illumina, San Diego, CA, USA

#### *Edited by:*

Andrew F. Gardner, New England Biolabs, USA

#### *Reviewed by:*

Suleyman Yildirim, Walter Reed Army Institute of Research, USA

Andreas Marx, University of Konstanz, Germany

#### *\*Correspondence:*

Cheng-Yao Chen, Protein Engineering Group, Illumina, 5200 Illumina Way, San Diego, CA 92122, USA e-mail: cchen2@illumina.com

Next-generation sequencing (NGS) technologies have revolutionized modern biological and biomedical research. The engines responsible for this innovation are DNA polymerases; they catalyze the biochemical reaction for deriving template sequence information. In fact, DNA polymerase has been a cornerstone of DNA sequencing from the very beginning. The *Escherichia coli* DNA polymerase I proteolytic (Klenow) fragment was originally utilized in Sanger's dideoxy chain-terminating DNA sequencing chemistry. These humble beginnings were followed by an explosion of organism-specific genome sequence information accessible via public databases. Family A/B DNA polymerases from mesophilic/thermophilic bacteria/archaea were modified and tested in today's standard capillary electrophoresis (CE) and NGS sequencing platforms. These enzymes were selected for their efficient incorporation of bulky dye-terminator and reversible dye-terminator nucleotides, respectively. Third-generation, real-time single-molecule sequencing platforms require slightly different enzyme properties. Enterobacterial phage φ29 DNA polymerase copies long stretches of DNA and possesses a unique capability to efficiently incorporate terminal phosphate-labeled nucleoside polyphosphates. Furthermore, the φ29 enzyme has been utilized in emerging DNA sequencing technologies, including nanopore- and protein-transistor-based sequencing. DNA polymerase is, and will continue to be, a crucial component of sequencing technologies.

**Keywords: Sanger sequencing, chain terminators, reversible terminators, sequencing-by-synthesis, DNA polymerase, next-generation sequencing, protein engineering**

# **INTRODUCTION**

Since the advent of enzymatic dideoxy DNA sequencing by Frederick Sanger (Sanger et al., 1977), sequencing DNA/RNA has become standard practice in most molecular biology research. The proliferation of next-generation sequencing (NGS) technologies has further transformed modern biological and biomedical research, and large-scale whole genome sequencing is now routine in life science research. Although technical advances in current NGS technologies have dramatically changed the way nucleic acids are sequenced, the engine ultimately responsible for these innovations remains unchanged. Like Sanger sequencing, today's NGS technologies, with the exception of oligonucleotide-based ligation sequencing (Drmanac et al., 2010), still require a DNA polymerase to carry out the biochemical reaction that copies template sequence information. This polymerase-dependent approach is generally referred to as DNA sequencing-by-synthesis (SBS), because each sequencing reaction concurrently generates a newly synthesized DNA strand.

However, unlike in Sanger sequencing, the DNA polymerases utilized in NGS technologies are more diverse and tailor-made. The Klenow enzyme, a proteolytic fragment of *Escherichia coli* DNA polymerase I, was originally utilized in Sanger's dideoxy chain-terminating DNA sequencing chemistry (Sanger et al., 1977). This enzyme was chosen for its efficient incorporation of 2′,3′-dideoxynucleotides (ddNTPs), which leads to chain termination of DNA synthesis (Atkinson et al., 1969). From this humble beginning, as sequencing chemistry steadily improved, the nucleotide substrates used for DNA sequencing became larger and bulkier. First, four fluorescent dyes with distinct, nonoverlapping optical spectra were attached to the purine or pyrimidine bases, or even to the terminal gamma phosphate, of the four (A, T, C, and G) nucleotides for ease of signal detection (Smith et al., 1986; Ju et al., 2006; Guo et al., 2008, 2010; Eid et al., 2009; Korlach et al., 2010). Second, the 3′ hydroxyl group on the deoxyribose of the four nucleotides was replaced with a larger, cleavable chemical group used to reversibly terminate DNA synthesis (Ju et al., 2006; Guo et al., 2008, 2010). As a result, the original Klenow enzyme no longer efficiently incorporated these newly modified nucleotides, and DNA polymerases with different enzymatic properties were required to improve the nucleotide incorporation reactions. Fortunately, the adoption of NGS in life science research allowed a rapid expansion of organism-specific genome sequence information accessible via public databases. Various DNA polymerases from mesophilic/thermophilic viruses, bacteria, and archaea were discovered and later screened for efficient incorporation of modified nucleotides in new DNA sequencing methods. A pool of new, advantageous DNA polymerases from a wide variety of microorganisms was selected and served as protein backbones for further improvement via protein engineering or directed enzyme evolution (Patel and Loeb, 2001). Evolved DNA polymerases with improved biochemical performance were ultimately adopted for each sequencing technology.

This article briefly covers (1) the decades-long progression of enzymatic DNA sequencing methods that rely on DNA polymerase to synthesize new DNA strands; (2) the novel properties of DNA polymerase that are required for high-precision DNA sequencing; (3) the influence of nucleotide modifications on DNA polymerase research that ultimately led to improved sequencing performance; and (4) the application of DNA polymerases in emerging DNA sequencing methods. Readers interested in learning more about other sequencing methods can refer to Landegren et al. (1988) and Ding et al. (2012).

# **OVERVIEW OF DNA POLYMERASE FAMILIES AND FUNCTIONS**

Since the discovery of DNA polymerase I in *E. coli* by Arthur Kornberg's group in the late 1950s (Lehman et al., 1958a,b), multiple DNA polymerases have been discovered, purified and characterized from bacteria, eukaryotes, archaea, and their viruses. The expansion of organism-specific genome sequence information accessible via public databases, together with advanced search algorithms based on DNA polymerase structure–function relationships, has reduced the time needed to identify additional, putative DNA polymerases from a variety of sources (Burgers et al., 2001). Based on the phylogenetic relationships of *E. coli* and human DNA polymerases, DNA polymerases are generally classified into seven main families: A (*E. coli* Pol I), B (*E. coli* Pol II), C (*E. coli* Pol III), D, X (human Pol β-like), Y (*E. coli* Pol IV and V and TLS polymerases), and RT (reverse transcriptase) (Burgers et al., 2001; Langhorst et al., 2012). All living organisms, except viruses, harbor multiple types of DNA polymerases for cellular functions. Interestingly, no single domain of life (bacteria, eukaryotes, or archaea) contains all families of DNA polymerases. As summarized in **Table 1**, the family C DNA polymerases are unique to bacteria and have not been found in either eukaryotes or archaea (Hübscher et al., 2010; Langhorst et al., 2012). Likewise, the family D polymerases are restricted to archaea (Euryarchaeota) and do not exist in bacteria or eukaryotes (Hübscher et al., 2010; Langhorst et al., 2012). Another characteristic exclusive to archaeal DNA polymerases is the presence of intervening sequences (inteins) within the polymerase coding genes (Perler et al., 1992). These inteins cause in-frame insertions in archaeal DNA pols and must be spliced out in order to form mature enzymes (Hodges et al., 1992).

The basic function of cellular DNA replicases is to replicate the organism's entire genome faithfully and pass the correct genetic information to future generations. In bacteria, family C DNA polymerases, such as the Pol III holoenzyme of *E. coli* or *Bacillus subtilis*, drive chromosomal replication and are therefore essential for cell viability (Gefter et al., 1971; Nusslein et al., 1971; Gass and Cozzarelli, 1973, 1974). Besides the Pol III holoenzyme, the A-family Pol I also participates in bacterial DNA replication (Olivera and Bonhoeffer, 1974). Pol I contains a 5′ to 3′ exonuclease, separate from its DNA polymerase domain, that removes RNA primers while the polymerase concurrently fills in the gaps between Okazaki fragments during lagging-strand DNA synthesis (Okazaki et al., 1971; Konrad and Lehman, 1974; Xu et al., 1997). In eukaryotes, by contrast, B-family DNA polymerases, such as Pol δ and ε in humans and yeast, are responsible for nuclear chromosomal replication (Miyabe et al., 2011). Studies in yeast by Thomas Kunkel's group suggest that Pol δ and ε divide their roles during DNA replication and are responsible for lagging- and leading-strand DNA synthesis, respectively (Pursell et al., 2007; Kunkel and Burgers, 2008; Nick McElhinny et al., 2008; Miyabe et al., 2011). In archaea, both B- and D-family pols are involved in genome replication, but the role of each Pol *in vivo* remains controversial. Based on biochemical studies, the Pol B and Pol D enzymes of the hyperthermophile *Pyrococcus abyssi* have been proposed to function together in DNA replication (Henneke et al., 2005). In contrast, a recent genetic study in *Thermococcus kodakarensis* showed that Pol D alone is sufficient for cell viability and genome replication, which argues that Pol D, rather than Pol B, is the main replicative DNA polymerase in this archaeon (Cubonova et al., 2013). The requirements for Pol B and Pol D in DNA replication may therefore differ among archaeal phyla.

**Table 1 | Families and properties of cellular DNA replicases (Kunkel, 2004; Hübscher et al., 2010; Greenough et al., 2014).**

In the bacterial column, the gene encoding each protein is given in brackets. In the eukaryotic and archaeal columns, the components of each holoenzyme are listed in parentheses. \*N.A., not applicable. \*\*Error rates are given as errors per incorporated base.

In summary, all DNA polymerases engaged in cellular genome replication, regardless of origin, share the following features (see **Table 1**): (1) they appear to form a multi-subunit enzyme complex (holoenzyme); and (2) they all possess an intrinsic 3′ to 5′ exonuclease (proofreading) activity that removes misincorporated nucleotides immediately after incorporation to ensure high-fidelity DNA synthesis (**Figure 3A**). In contrast, the X-, Y-, and RT-family Pols have more diverse and specialized functions in DNA processes such as DNA repair, translesion synthesis, and eukaryotic telomere maintenance (Hübscher et al., 2010). None of these Pols has an intrinsic 3′ to 5′ proofreading exonuclease activity, and they are therefore more error-prone during DNA synthesis (Kunkel and Bebenek, 2000; Kunkel, 2004, 2009).

# **CHOOSING THE RIGHT DNA POLYMERASE FOR DNA SEQUENCING**

A growing number of DNA polymerases, each with distinct functions, provides abundant enzymatic resources for improving current and emerging DNA sequencing techniques. However, not all families of DNA polymerases are suitable for high-precision DNA sequencing. To be considered for, and ultimately applied in, a particular sequencing method, a DNA polymerase should possess the following properties:


and catalytic divalent cations (Mg<sup>2+</sup>, Mn<sup>2+</sup>, etc.) for the sequencing reaction. DNA pols add nucleotides to the 3′ end of a primer through a highly ordered, temporally defined mechanism. A minimal catalytic mechanism for single-nucleotide incorporation by DNA pols has been proposed (Donlin et al., 1991; Johnson, 1993) and is illustrated in **Figure 1**; a brief description of each reaction step is given in the figure legend. As shown in **Figure 1**, the nucleotide incorporation efficiency (specificity) of a DNA polymerase, *k*pol/*K*d,dNTP, is determined by the rate of phosphodiester bond formation (*k*pol) and the dissociation constant of the cognate nucleotide (*K*d,dNTP; Wong et al., 1991; Johnson, 1993). DNA pols with a faster incorporation rate and tighter nucleotide binding (large *k*pol and small *K*d,dNTP) catalyze DNA synthesis much more efficiently. In this respect, none of the X- or Y-family pols meets the requirement: both families have much lower nucleotide incorporation efficiencies (Brown et al., 2011a,b) than the cellular DNA replicases of the A-, B-, C-, or D-families (Patel et al., 1991; Wong et al., 1991; Bloom et al., 1997; Zhang et al., 2009). Therefore, they are not ideal for DNA sequencing.
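The efficiency criterion above can be made concrete with a short calculation. The following is a minimal sketch, assuming simple single-turnover kinetics in which the observed rate depends hyperbolically on dNTP concentration, *k*obs = *k*pol[dNTP]/(*K*d,dNTP + [dNTP]). The class name `PolKinetics`, the two enzymes, and all parameter values are hypothetical. They are not measured constants for any real polymerase.

```python
from dataclasses import dataclass


@dataclass
class PolKinetics:
    """Hypothetical single-nucleotide incorporation parameters."""
    name: str
    k_pol: float       # maximal rate of phosphodiester bond formation (s^-1)
    kd_dntp_um: float  # dissociation constant for the incoming dNTP (uM)

    def efficiency(self) -> float:
        """Specificity constant k_pol / K_d,dNTP, in uM^-1 s^-1."""
        return self.k_pol / self.kd_dntp_um

    def observed_rate(self, dntp_um: float) -> float:
        """Single-turnover rate at a given dNTP concentration (hyperbolic)."""
        return self.k_pol * dntp_um / (self.kd_dntp_um + dntp_um)


# Illustrative, made-up values: a fast, tight-binding "replicase-like" enzyme
# versus a slow, weak-binding "repair-like" enzyme.
replicase_like = PolKinetics("replicase-like (hypothetical)", k_pol=300.0, kd_dntp_um=20.0)
repair_like = PolKinetics("repair-like (hypothetical)", k_pol=5.0, kd_dntp_um=100.0)

for pol in (replicase_like, repair_like):
    print(f"{pol.name}: k_pol/K_d = {pol.efficiency():.3g} uM^-1 s^-1, "
          f"k_obs at 10 uM dNTP = {pol.observed_rate(10.0):.3g} s^-1")

fold = replicase_like.efficiency() / repair_like.efficiency()
print(f"efficiency ratio: {fold:.0f}-fold")
```

With these made-up numbers, the replicase-like enzyme is about 300-fold more efficient (15 versus 0.05 µM<sup>−1</sup> s<sup>−1</sup>). Both of its parameters contribute: a larger *k*pol and a smaller *K*d,dNTP.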

**FIGURE 1 | The minimal catalytic steps required for single-nucleotide incorporation by DNA polymerase.** Addition of a nucleotide to the 3′ end of a primer by DNA polymerase proceeds through a temporally ordered mechanism. The reaction begins when free DNA polymerase (E) binds to a duplex primer/template DNA (DNAn), forming a binary enzyme–DNA complex (E•DNAn; step 1). *k*off,DNA is the rate constant for enzyme dissociation from the E•DNAn complex. Binding of the correct nucleotide (dNTP) in the presence of divalent cations such as Mg<sup>2+</sup> forms the enzyme–DNA–dNTP ternary complex (E•DNAn•dNTP; steps 2 and 3). *K*d,dNTP denotes the dissociation constant of the nucleotide from the enzyme. Nucleotide binding induces the first conformational change of the enzyme in the ternary complex (E\*•DNAn•dNTP; step 4; Wong et al., 1991). Chemistry then occurs (step 5): a phosphodiester bond forms between the α-phosphate of the incoming dNTP and the 3′-OH of the primer terminus. This extends the primer by one nucleotide (DNAn+1) and generates pyrophosphate (PPi) and a proton (H<sup>+</sup>). A second conformational change of the enzyme follows (step 6), allowing release of the PPi leaving group (step 7), which completes the nucleotide incorporation cycle. If the enzyme remains associated with the DNA, further rounds of nucleotide addition continue until the enzyme dissociates (processive synthesis).

(3) The pol must have high replicative fidelity to minimize systematic sequencing errors. To read the template sequence accurately, the DNA pol must faithfully incorporate the correct, matched nucleotide at each position along the DNA template. The fidelity of nucleotide incorporation by X-, Y-, and RT-family Pols ranges from ∼10<sup>−1</sup> to 10<sup>−4</sup> errors per base incorporated. This is two to three orders of magnitude lower than the fidelity of high-fidelity cellular DNA polymerases from the A-, B-, or C-families (Kunkel, 2004). These repair pols are inherently error-prone during DNA synthesis (Kunkel and Bebenek, 2000; Kunkel, 2004, 2009) and are therefore not appropriate for high-precision DNA sequencing applications.
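To see why error rates of this magnitude rule out a polymerase for high-precision sequencing, the sketch below converts a per-base error rate into the expected number of errors in a single read. It assumes that misincorporations are independent and occur at a constant rate along the template. Real polymerases do not strictly obey this, because errors are sequence-context dependent. The function names, the 1000-nt read length, and the 10<sup>−6</sup> "high-fidelity" rate are hypothetical illustrations.

```python
def expected_errors(error_rate: float, read_length: int) -> float:
    """Mean number of misincorporations in one read, assuming errors are
    independent and occur at a constant per-base rate."""
    return error_rate * read_length


def p_error_free(error_rate: float, read_length: int) -> float:
    """Probability that a read contains no misincorporation: (1 - p)^L,
    under the same independence assumption."""
    return (1.0 - error_rate) ** read_length


READ_LENGTH = 1000  # hypothetical read length in nucleotides

# 1e-1 and 1e-4 bracket the range quoted for X-, Y-, and RT-family pols;
# 1e-6 is an illustrative value for a high-fidelity replicase, not a measurement.
for rate in (1e-1, 1e-4, 1e-6):
    print(f"error rate {rate:.0e}: "
          f"{expected_errors(rate, READ_LENGTH):.3g} expected errors, "
          f"P(error-free read) = {p_error_free(rate, READ_LENGTH):.3g}")
```

Even at the best end of the quoted range (10<sup>−4</sup>), roughly one 1000-nt read in ten would carry at least one polymerase-introduced error under these assumptions. At 10<sup>−1</sup>, an error-free read is essentially impossible.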


In summary, only the following enzymes meet the above requirements for high-precision DNA sequencing and have been evaluated for sequencing chemistry development (see **Table 1**): A-family enzymes from bacteria and bacteriophages (such as T5 and T7), and B-family pols from bacteriophages (such as T4, RB69, and φ29), bacteria, and archaea (*Vent, 9°N, Pfu,* and *KOD1*). Most of these A- and B-family enzymes have an associated, intrinsic 3′ to 5′ exonuclease proofreading activity. When such an enzyme incorporates an incorrect nucleotide at the primer terminus, its ability to extend the primer terminus diminishes. This allows the nascent DNA strand to migrate to the 3′ exonuclease site for excision (see **Figure 3A**; Donlin et al., 1991; Joyce and Steitz, 1994; Patel and Loeb, 2001). This partitioning mechanism, characteristic of the 3′ exonuclease proofreading domain of A- and B-family polymerases, is undesirable for DNA sequencing: it makes the sequencing reactions asynchronous and generates systematic sequencing errors (**Figures 3B,C**). Therefore, most A- and B-family pols used for DNA sequencing either lack 3′ exonuclease proofreading activity or have an attenuated form of it.
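A toy simulation can illustrate why proofreading-driven partitioning desynchronizes an ensemble of strands. The sketch below is not a model of any particular enzyme or instrument. It simply assumes that in every cycle each strand has a small, fixed probability (`p_lag`, a hypothetical parameter) of failing to advance, for example because a nucleotide was excised by the 3′ exonuclease. The fraction of strands that remain in phase then decays roughly as (1 − `p_lag`)<sup>cycles</sup>.

```python
import random


def fraction_in_phase(n_strands: int, n_cycles: int, p_lag: float, seed: int = 0) -> float:
    """Toy model of an ensemble of primer strands sequenced in lockstep.

    In each cycle a strand should gain exactly one nucleotide. With
    probability p_lag it instead makes no net progress in that cycle
    (standing in for exonucleolytic excision that is not re-extended in time),
    so it falls behind the rest of the ensemble for good.
    Returns the fraction of strands that never lagged, i.e. are still in phase.
    """
    rng = random.Random(seed)
    in_phase = 0
    for _ in range(n_strands):
        lagged = any(rng.random() < p_lag for _ in range(n_cycles))
        if not lagged:
            in_phase += 1
    return in_phase / n_strands


for p in (0.0, 0.001, 0.01):
    sim = fraction_in_phase(10_000, 100, p)
    print(f"p_lag={p}: simulated {sim:.3f}, analytic {(1 - p) ** 100:.3f}")
```

Under this toy model, a 1% per-cycle lag leaves only about a third of the strands in phase after 100 cycles. The lagging strands report bases from earlier positions, which blurs the signal at each cycle and produces systematic miscalls.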

#### **NUCLEOTIDE SUBSTRATES FOR THE GENERATIONS OF DNA POLYMERASE-BASED SEQUENCING**

Generations of DNA polymerase-based sequencing methods and their corresponding commercial platforms are summarized in **Table 2**. As shown in **Table 2**, every method requires a DNA polymerase to catalyze the biochemical reaction from which DNA sequence information is extracted. The fundamental difference among these technologies is the type of nucleotide substrate incorporated. The structures of these nucleotides are illustrated in **Figure 2**, and more in-depth information can be found in the following articles (Metzker et al., 1996; Lee et al., 1997; Kumar et al., 2005; Metzker, 2010; Chen et al., 2013a). From classical Sanger sequencing to modern NGS technologies, the nucleotide substrates used for sequencing have changed over time. The original Sanger method uses four 2′,3′-ddNTPs (**Figure 2B**; Sanger et al., 1977). Unlike normal dNTPs (**Figure 2A**), ddNTPs lack the 3′-hydroxyl group (3′-OH) required to form a phosphodiester bond between the incoming nucleotide and the primer terminus. Once a ddNTP has been incorporated by the DNA polymerase, no further nucleotides can be added to the primer terminus, and elongation of the DNA chain stops (Atkinson et al., 1969). Besides ddNTPs, Sanger's protocol requires a set of radioisotope-labeled primers in four separate (A, T, C, and G) reactions. The resulting dideoxy-terminated DNA fragments must be resolved side by side by slab gel electrophoresis, and the sequence is deduced by autoradiography (Sanger et al., 1977). The procedure is extremely time-consuming and yields little data, so it cannot meet the growing demand for high-throughput DNA sequencing.
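The logic of reading a four-lane dideoxy ladder can be sketched in a few lines. The sketch assumes an idealized experiment in which every possible chain-termination product is present in its lane and the bands are resolved strictly by length. The template is written 3′→5′, so that position *i* of the new strand pairs with `template[i]`. The function names and the example sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def termination_products(template: str, primer_len: int) -> dict[str, list[int]]:
    """Lengths of chain-terminated products in each of the four ddNTP reactions.

    `template` is written 3'->5', so position i of the newly synthesized strand
    pairs with template[i]. The ddNTP complementary to that template base can
    terminate the chain there, giving a product of primer_len + i + 1 nucleotides.
    """
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for i, template_base in enumerate(template):
        lanes[COMPLEMENT[template_base]].append(primer_len + i + 1)
    return lanes


def read_ladder(lanes: dict[str, list[int]]) -> str:
    """Read bands from shortest to longest, reporting the lane of each band.

    This yields the newly synthesized strand 5'->3'.
    """
    bands = sorted((length, base) for base, lengths in lanes.items() for length in lengths)
    return "".join(base for _, base in bands)


lanes = termination_products("TACGGA", primer_len=18)
print(lanes)
print(read_ladder(lanes))
```

For the hypothetical template `TACGGA` (3′→5′), the ladder reads `ATGCCT`, which is the complementary strand written 5′→3′.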

To simplify and subsequently automate Sanger's method, Leroy Hood's group, then at the California Institute of Technology, developed the first fluorescence-based (dye-primer) sequencing method from Sanger's approach (Smith et al., 1986). In Hood's revised protocol, the sequencing primers are covalently labeled at the 5′ end with four distinct fluorophores, one for each of the A, T, C, and G reactions of Sanger sequencing. This approach has two advantages: (1) the four reaction mixtures can be combined and analyzed in a single sequencing lane; and (2) the results can be monitored directly by a computer-aided fluorescence detection system matched to the emission spectra of the four dyes. Together, these advantages allow DNA sequence information to be analyzed automatically by computer.

Hood's dye-primer method simplifies traditional Sanger sequencing. However, it is not ideal for fully automated DNA sequencing, mainly because four separate reactions are still required. To solve this problem, Prober et al. (1987) at DuPont soon introduced fluorescently labeled chain-terminating ddNTPs (dye-terminators). As with dye-primers, four spectrally distinguishable fluorophores are used, each covalently attached to one of the four ddNTPs (see **Figure 2C**). Using dye-terminators in the Sanger workflow allows all four base-specific chain-termination reactions to take place in a single tube: the DNA polymerase incorporates the four dye-terminators simultaneously and generates the terminated DNA fragments for sequence analysis (Rosenthal and Charnock-Jones, 1992, 1993). The speed and throughput of dye-terminator sequencing improved drastically when automated capillary-array electrophoresis (CAE) was adopted for DNA analysis (Drossman et al., 1990; Luckey et al., 1990; Zagursky and McCormick, 1990; Dovichi and Zhang, 2000).

\*On November 15, 2012, Helicos Biosciences filed for Chapter 11 bankruptcy.

The dye-terminator-CE method greatly improved sequencing performance and has been the laboratory standard for DNA sequencing over the past few decades. However, the technique is still very limited, especially for large-scale, whole-genome sequencing. Increasing the throughput of dye-terminator-CE chemistry requires adding more capillaries. This becomes impractical for high-throughput, multiplexed sequencing of millions of different DNA strands at once. To overcome this limitation, reversible dye-terminators were introduced into a modified dye-terminator sequencing scheme. Like dye-terminators (**Figure 2C**), reversible dye-terminators (**Figure 2D**) lack the free 3′-OH group that the DNA polymerase needs to extend the primer terminus, so their incorporation terminates DNA chain elongation (Bentley et al., 2008; Guo et al., 2008; Hutter et al., 2010). When reversible dye-terminators are used with DNA molecules immobilized on a solid surface, each DNA sequence can be read directly, one base per cycle, by a fluorescence imaging system that recognizes the base-specific, terminated molecules (Bentley et al., 2008; Guo et al., 2008, 2010). As a result, the capillary electrophoresis (CE) step of the dye-terminator approach is no longer required, and millions of different DNA molecules can be sequenced simultaneously. Unlike dye-terminators, reversible dye-terminators carry cleavable chemical groups both at the 3′ position of the pentose and in the linker between the base and the attached fluorophore (**Figure 2D**; Bentley et al., 2008; Guo et al., 2008; Hutter et al., 2010). Cleaving these groups restores the natural 3′-OH of the deoxyribose and removes the fluorophore from the base. The DNA chain can then be extended again by the DNA polymerase in the next reaction cycle (Bentley et al., 2008; Guo et al., 2008, 2010). A similar sequencing scheme has also been implemented with another class of reversible dye-terminators that retain a normal 3′-OH group (Wu et al., 2007; Pushkarev et al., 2009; Litosh et al., 2011; Gardner et al., 2012). In these 3′-unblocked reversible terminators, both the chemical blocking group and the fluorescent dye are attached to the base (**Figure 2E**). Both can be removed by either chemical cleavage or UV light (Pushkarev et al., 2009; Litosh et al., 2011).
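The incorporate–image–cleave cycle of 3′-blocked reversible dye-terminators can be sketched as a small state machine. This minimal, idealized sketch assumes 100% incorporation and cleavage efficiency for a single immobilized strand. The dye colours and the class and function names are hypothetical; they do not correspond to any commercial platform's chemistry or software. The point the sketch encodes is that the 3′ block permits exactly one addition per cycle, so each image reports exactly one base.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assignment, for illustration only.
DYE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}


class ImmobilizedStrand:
    """One surface-bound primer/template extended with reversible dye-terminators."""

    def __init__(self, template: str) -> None:
        self.template = template   # written 3'->5' relative to the new strand
        self.synthesized = ""
        self.blocked = False       # True while a 3'-blocked terminator is at the primer end
        self.dye: Optional[str] = None

    def incorporate(self) -> None:
        """Add one terminator; a blocked 3' end (or end of template) prevents addition."""
        if self.blocked or len(self.synthesized) == len(self.template):
            return
        base = COMPLEMENT[self.template[len(self.synthesized)]]
        self.synthesized += base
        self.blocked = True
        self.dye = DYE[base]

    def cleave(self) -> None:
        """Remove the 3' blocking group and the dye-bearing linker."""
        self.blocked = False
        self.dye = None


def sequence_by_synthesis(template: str, n_cycles: int) -> str:
    strand = ImmobilizedStrand(template)
    calls = []
    for _ in range(n_cycles):
        strand.incorporate()
        strand.incorporate()  # no-op: the 3' block allows only one addition per cycle
        if strand.dye is None:  # nothing incorporated, e.g. end of template
            break
        calls.append(BASE_FOR_DYE[strand.dye])  # "imaging" step
        strand.cleave()
    return "".join(calls)


print(sequence_by_synthesis("TACGGA", n_cycles=4))
```

Real instruments must also cope with incomplete incorporation or cleavage, which this sketch deliberately omits. Each such failure would leave a strand one cycle out of phase.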

In both classes of reversible dye-terminators, cleavage of the linker carrying the fluorescent dye leaves residual chemical groups on the purine and pyrimidine bases. These molecular remnants may perturb protein–DNA interactions and ultimately impair the sequencing performance of the DNA polymerase (Metzker, 2010; Chen et al., 2013a). To circumvent this problem, fluorescently labeled nucleoside polyphosphates carrying the label on the terminal (γ-) phosphate (**Figure 2F**) were developed for the more advanced, third-generation DNA sequencing technique (Kumar et al., 2005; Korlach et al., 2010). Sequencing with γ-phosphate-labeled nucleotides has two major advantages over conventional chain terminators. First, once incorporated, the nucleotides leave no molecular scar on the newly synthesized DNA. Second, they enable real-time, single-molecule SBS (Korlach et al., 2010). The phosphoryl transfer reaction occurs only between the 3′-OH group of the primer terminus and the α-phosphate of the incoming nucleotide. Each catalytic cycle therefore adds one nucleotide to the primer terminus and releases a pyrophosphate (PPi) leaving group (**Figure 1**, steps 5–7; Steitz, 1997, 1999). Hence, any fluorophore covalently attached to the PPi leaving group is released after nucleotide addition and leaves no molecular vestige in the DNA. Since the added nucleotide carries no blocking group to hinder elongation from the primer terminus, the sequencing reaction can continue uninterrupted.

Finally, DNA scars are not an issue for either of two further technologies. The first is pyrosequencing (Ronaghi et al., 1996, 1998), which detects the PPi released after nucleotide addition by DNA polymerase. The second is the semiconductor-based proton sequencing technique (Rothberg et al., 2011), which monitors the proton (H<sup>+</sup>) released during phosphodiester bond formation between the 3′-OH and the α-phosphate of the incoming nucleotide. Both technologies use natural nucleoside triphosphates (dNTPs) for their sequencing reactions (**Table 2** and **Figure 2A**).

**FIGURE 2 | … polymerase-based sequencing methods. (A)** Deoxynucleotides (dNTPs); **(B)** 2′,3′-dideoxynucleotides (ddNTPs); **(C)** Dye-terminators; **(D)** Reversible … an A, T, C or G base, and "B" indicates a cleavable chemical blockage group.

**FIGURE 3 |** … incorporate the next nucleotide (greatly reduced *k*pol) and triggers a rapid transfer of the DNA primer strand to the intrinsic 3′ to 5′ exonuclease domain. The mismatched nucleotide is then removed (as an incorrect deoxynucleoside monophosphate, dNMP) by the 3′ to 5′ … systematic DNA sequencing errors. In panels **B,C**, each filled circle indicates a nucleotide base. A string of filled gray circles represents the primer strand, and a string of filled blue circles represents the template DNA strand. Specific bases (dC, dG, and dT) are indicated inside the circles.

# **CHALLENGES OF RAPIDLY EVOLVING NUCLEOTIDE SUBSTRATES ON DNA POLYMERASE RESEARCH**

The series of nucleotide modifications created for rapidly changing DNA polymerase-based sequencing technologies has presented DNA polymerase researchers with a daunting task: finding, designing, or evolving enzymes compatible with ever-changing sequencing chemistries. From the beginning, the A-family *E. coli* DNA polymerase I (Pol I), or its proteolytic (Klenow) fragment, was chosen by Sanger for his dideoxy-sequencing chemistry (Sanger et al., 1977). It was the only DNA polymerase available at the time and, fortunately, tolerated incorporation of 2′,3′-ddNTPs (Atkinson et al., 1969). However, Pol I discriminates effectively between deoxy- and dideoxyribose in the nucleoside triphosphate and does not incorporate ddNTPs very well (Atkinson et al., 1969). In fact, Pol I incorporates ddNTPs several hundred-fold more slowly than normal dNTPs, and the rate is sequence context-dependent (Tabor and Richardson, 1989). This sequence-specific ddNTP incorporation by Pol I produces non-uniform band intensities on the sequencing gel. The problem becomes more serious in dye-primer and dye-terminator sequencing, because sequence information is retrieved by interpreting the fluorescence intensity of each dideoxy-terminated DNA band in the gel or capillary. Similar results were reported with the thermostable, family A *Thermus aquaticus* (*Taq*) DNA polymerase I (Innis et al., 1988).

In contrast, phage T7 DNA polymerase does not appreciably distinguish ddNTPs from dNTPs and incorporates both types of nucleotide at nearly equal efficiencies (Tabor and Richardson, 1987; Brandis et al., 1996). Consequently, dideoxy-terminated bands are much more uniform in intensity when T7 pol is used for Sanger sequencing. To understand the molecular basis of this difference, sequence analysis and biochemical studies were carried out on these three A-family enzymes. The results showed that a single residue in the highly conserved finger motif (motif B) is responsible. T7 pol has a tyrosine (Y526) at the position occupied by phenylalanine in Pol I (F762) and *Taq* pol (F667), and a phenylalanine-to-tyrosine change at this position greatly reduces an enzyme's ability to discriminate against ddNTPs (Tabor and Richardson, 1995). Biochemical studies further confirmed that Pol I or *Taq* carrying an F762Y or F667Y mutation, respectively, loses its discrimination against ddNTPs and therefore incorporates them very efficiently (Patel and Loeb, 2001). In addition, these two mutant proteins incorporate fluorescein- and rhodamine-labeled dye-terminators three orders of magnitude more efficiently than their wild-type parent enzymes (Tabor and Richardson, 1995). Subsequently, T7, F762Y Pol I, and F667Y *Taq* pols were all used for manual and automated Sanger sequencing (Tabor and Richardson, 1987, 1989; Rosenthal and Charnock-Jones, 1992; Tabor and Richardson, 1995). However, *Taq* pol became the preferred enzyme for dye-terminator sequencing because it has several advantages over Pol I and T7. It is more readily purified and engineered for further improvement, it has no intrinsic 3′ to 5′ exonuclease proofreading activity, and it is active over a broad range of temperatures (Innis et al., 1988). The thermostability of *Taq* pol became essential for sequencing after the PCR-based "cycle sequencing" approach was introduced (Rosenthal and Charnock-Jones, 1993).

The Phe-to-Tyr mutation at position 667 in conserved motif B of *Taq* pol addresses only the deoxy- versus dideoxyribose selectivity problem in dye-terminator sequencing. Like Pol I, the enzyme still shows a base-specific bias: uneven ddNTP incorporation results in variable DNA band intensities and unequal peak heights in CE analysis, creating unwanted sequencing errors (Parker et al., 1995; Li et al., 1999). Kinetic analysis reveals that *Taq* pol favors ddGTP over the other ddNTPs, with a much faster nucleotide incorporation rate (*k*pol; Brandis et al., 1996). To investigate the cause of the ddGTP bias, a structural analysis was performed on all four ddNTP-trapped ternary complexes of the large fragment of *Taq* pol (Klentaq1). The data reveal a selective interaction between the guanidinium side chain of arginine 660 (R660) and the O6/N7 atoms of the guanine base of the incoming ddGTP. Replacing Arg660 with a negatively charged aspartic acid completely eliminates the preference for ddGTP incorporation, and the R660D/F667Y double mutant of *Taq* pol greatly improves dye-terminator sequencing quality and accuracy (Li et al., 1999).
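A deliberately crude model shows how a base-specific incorporation bias turns into sequencing errors. The sketch makes two assumptions: each peak height is proportional to a relative ddNTP incorporation efficiency, and the base caller discards peaks below a fixed fraction of the tallest peak. Both assumptions, the ten-fold G preference, and the function names are hypothetical simplifications. They do not represent *Taq*'s measured kinetics or any real base-calling software.

```python
def peak_heights(product_bases: str, relative_rate: dict[str, float]) -> list[float]:
    """Approximate each CE peak height as proportional to the relative
    incorporation efficiency of the ddNTP that terminates that fragment."""
    return [relative_rate[base] for base in product_bases]


def call_peaks(product_bases: str, heights: list[float], threshold: float) -> str:
    """Naive caller: keep only peaks at least `threshold` x the tallest peak."""
    cutoff = threshold * max(heights)
    return "".join(base for base, h in zip(product_bases, heights) if h >= cutoff)


# Hypothetical relative incorporation efficiencies (not measured values):
biased = {"A": 1.0, "C": 1.0, "G": 10.0, "T": 1.0}   # strong ddGTP preference
uniform = {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0}   # bias removed

product = "AGCTGA"
for label, rates in (("biased", biased), ("uniform", uniform)):
    heights = peak_heights(product, rates)
    called = call_peaks(product, heights, threshold=0.2)
    print(f"{label}: max/min peak ratio = {max(heights) / min(heights):.0f}, called {called!r}")
```

Under these assumptions, the biased enzyme produces a ten-fold spread in peak heights, and the naive caller reports only the two G peaks. Uniform incorporation recovers the full sequence.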

Although the F667Y mutation greatly improves *Taq* pol's incorporation of dideoxy dye-terminators, the improvement is marginal for reversible dye-terminators. These carry chemical blocking groups at the 3′ position of deoxyribose that are larger than the normal 3′-OH (Bentley et al., 2008; Guo et al., 2008; Chen et al., 2010, 2013a; Hutter et al., 2010). The 3′ reversible terminating group is normally linked to the deoxyribose through the oxygen atom of the 3′-OH. A series of 3′-*O*-blocking groups have been developed, including 3′-*O*-allyl (Ruparel et al., 2005; Wu et al., 2007), 3′-*O*-(2-nitrobenzyl) (Wu et al., 2007), and 3′-*O*-azidomethylene (Bentley et al., 2008). Serendipitously, reversible dye-terminators bearing these blocking groups were found to be incorporated well by a variant of the B-family DNA polymerase from the hyperthermophilic archaeon *Thermococcus* sp. *9°N-7* (Southworth et al., 1996; Ruparel et al., 2005; Ju et al., 2006; Bentley et al., 2008). This variant carries A485L and Y409V mutations in conserved motifs B and A, respectively. It shows an enhanced preference over the parent enzyme for incorporating both acyclic and dideoxy dye-terminators (Gardner and Jack, 2002). The same effects were found with homologous mutations in other archaeal B-family DNA polymerases (Gardner and Jack, 1999; Gardner et al., 2004). Similarly, the analogous combination of mutations (P410L/A485T) in the same conserved regions of the closely related B-family DNA polymerase from *Thermococcus* sp. JDF-3 has an additive effect on dye-terminator incorporation (Arezi et al., 2002). Furthermore, an A485L variant of *9°N* DNA pol, sold commercially as Therminator DNA polymerase, was recently shown to incorporate 3′-OH-unblocked dye-terminators efficiently. These terminators carry a terminating 2-nitrobenzyl moiety attached to hydroxymethylated nucleobases (Gardner et al., 2012). Thus, mutations at these two conserved motifs of archaeal B-family DNA polymerases may affect the enzyme's selectivity for, and tolerance of, modifications and substitutions on the deoxyribose and nucleobase.

Recently, a more rational approach was taken to search for *Taq* pol variants that accept a new type of reversible terminator bearing a 3′-ONH2 blocking group (dNTP-ONH2; Chen et al., 2010). Using structure-guided reconstruction of ancestral sequences of *Taq* pol, a library of 93 protein variants carrying different combinations of mutations was designed. The variants were screened for the ability to incorporate dNTP-ONH2 in primer-extension assays. One beneficial mutation (L616A) was identified: *Taq* variants carrying L616A incorporated both dNTP-ONH2 and ddNTPs faithfully and efficiently.

Finding a DNA polymerase compatible with fluorescent, terminal polyphosphate-labeled nucleotides has been less straightforward. Historically, the specificities of DNA polymerases toward γ-phosphate-modified dNTPs have been found to vary widely. This variation is due to the different steric effects of the substituted chemical groups on each enzyme's dNTP binding pocket (Arzumanov et al., 1996; Martynov et al., 1997). For instance, dNTPs carrying a bulky 2,4-dinitrophenyl group on the γ-phosphate are good substrates for the RT-family AMV RT but are not accepted by A- or B-family DNA polymerases (Alexandrova et al., 1998). Similar findings were reported for bis(2′-deoxynucleoside) 5′,5′-triphosphates (Victorova et al., 1999): HIV-RT uses this type of γ-phosphate-modified nucleotide very effectively, whereas *E. coli* Pol I and *Taq* pol do not. Interestingly, in the same study, both Pol I and *Taq* pol incorporated the bis(2′-deoxynucleoside) 5′,5′-tetraphosphates more efficiently than the triphosphate analogs (Victorova et al., 1999). Thus, adding an extra phosphate to the terminal γ-phosphate of the dNTP seems to attenuate the steric effects on the enzyme. In other words, the extra phosphate spacer makes the modified nucleotide better tolerated. Indeed, incorporation rates were evaluated with fluorescent, terminal phosphate-labeled nucleoside polyphosphates carrying three or more phosphates at the 5′ position of the nucleoside. In these tests, nucleotides with more than three phosphates were more effective substrates for A- and B-family DNA polymerases (Kumar et al., 2005). Later studies showed that *Bacillus subtilis* phage φ29 DNA polymerase can use dye-labeled nucleoside penta- and hexaphosphates (dN5Ps and dN6Ps) alone to synthesize DNA thousands of bases long, at rates approaching those with natural dNTPs (Korlach et al., 2008, 2010). The exceptional processivity of φ29 DNA pol, together with its intrinsic ability to incorporate dye-labeled, terminal polyphosphate nucleotides, plays a key role in real-time, single-molecule SBS (Korlach et al., 2010).
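A toy single-molecule simulation can illustrate the scar-free, real-time readout that terminal-phosphate labeling enables. The sketch assumes an error-free polymerase, exponentially distributed waiting and dwell times with arbitrary, hypothetical means, and a made-up dye-to-base assignment. It is meant only to show two things: each incorporation produces one time-resolved colour pulse, and the label leaves with the released phosphate chain. It does not model any instrument's signal processing.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Hypothetical dye assignment on the terminal phosphate of each labeled nucleotide.
DYE = {"A": "red", "C": "blue", "G": "yellow", "T": "green"}
BASE_FOR_DYE = {dye: base for base, dye in DYE.items()}


def simulate_pulses(template: str, mean_wait_s: float = 0.3,
                    mean_dwell_s: float = 0.1, seed: int = 0) -> list[tuple[float, float, str]]:
    """Simulate fluorescence pulses from one polymerase molecule.

    For each template position (written 3'->5'), the matching phosphate-labeled
    nucleotide binds after an exponentially distributed wait. Its dye is held in
    the active site for an exponentially distributed dwell and then leaves with
    the released polyphosphate, so the growing strand carries no dye.
    Returns (start, end, dye) tuples in time order.
    """
    rng = random.Random(seed)
    t = 0.0
    pulses = []
    for template_base in template:
        dye = DYE[COMPLEMENT[template_base]]
        t += rng.expovariate(1.0 / mean_wait_s)      # wait for a cognate nucleotide
        dwell = rng.expovariate(1.0 / mean_dwell_s)  # dye residence until release
        pulses.append((t, t + dwell, dye))
        t += dwell
    return pulses


def call_bases(pulses: list[tuple[float, float, str]]) -> str:
    """Translate pulse colours, ordered by start time, into incorporated bases."""
    return "".join(BASE_FOR_DYE[dye] for _, _, dye in sorted(pulses))


pulses = simulate_pulses("TACGGA")
for start, end, dye in pulses:
    print(f"{start:6.3f}-{end:6.3f} s  {dye}")
print(call_bases(pulses))
```

Because no blocking group is involved, nothing in the loop pauses synthesis between bases. The sequence is recovered simply by reading the pulse colours in time order.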

## **APPLICATIONS OF DNA POLYMERASE FOR EMERGING SEQUENCING TECHNOLOGIES**

In contrast to current SBS approaches, emerging DNA sequencing methods rely on unconventional applications of DNA polymerase. These techniques use DNA polymerase either as a conventional nucleotide-incorporating enzyme or as a molecular motor that controls DNA translocation through a protein nanopore. One nanopore-based SBS approach uses the commercial Therminator γ DNA polymerase, a variant of *9°N* DNA pol, to incorporate terminal γ-phosphate-labeled nucleoside tetraphosphates. These modified nucleotides carry four PEG-coumarin tags of different lengths, corresponding to bases A, T, C, and G (Kumar et al., 2012). As the DNA polymerase incorporates each nucleotide, the released tags pass through an α-hemolysin nanopore in sequence. The DNA sequence can be determined by measuring the resulting ionic current fluctuations. A related but fundamentally different approach combines a mutant *Mycobacterium smegmatis* porin A (MspA) nanopore, φ29 DNA polymerase, and natural dNTPs (Manrao et al., 2012). In this approach, the enzyme serves both as the DNA-replicating enzyme and as a molecular motor that controls the speed of DNA translocation through the MspA nanopore.
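In the tag-based scheme, reading the released tags amounts to assigning each current-blockade event to one of four reference levels. The sketch below assumes that each tag gives a single event with a characteristic mean residual current and that events arrive in incorporation order. The reference levels, the names, and the nearest-level rule are hypothetical simplifications, not the analysis used by Kumar et al. (2012).

```python
# Hypothetical mean residual-current fractions (I_blocked / I_open) for four
# PEG-coumarin tags of different lengths; NOT published calibration values.
TAG_LEVELS = {"A": 0.40, "C": 0.48, "G": 0.56, "T": 0.64}


def classify_event(residual_fraction: float) -> str:
    """Assign one tag-capture event to the base whose reference level is closest."""
    return min(TAG_LEVELS, key=lambda base: abs(TAG_LEVELS[base] - residual_fraction))


def call_sequence(events: list[float]) -> str:
    """Translate an ordered list of tag-capture events into incorporated bases."""
    return "".join(classify_event(level) for level in events)


print(call_sequence([0.41, 0.63, 0.55, 0.49]))
```

A nearest-level rule works only if the four tag levels are well separated relative to event-to-event noise. That is the reason for using tags of clearly different lengths.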

Besides the nanopore-based approaches, a protein transistor-based sequencing method has been reported that measures the electrical conductance of φ29 DNA polymerase during its reactions (Chen et al., 2013b). However, this study has since been called into question, and the merits of the method must be reevaluated (Chen et al., 2013b).

#### **CONCLUSION**

Frederick Sanger introduced the first enzymatic DNA sequencing methods in the mid-1970s. Since then, decades of research on DNA polymerases, beginning with Arthur Kornberg's discovery of the enzyme in the mid-1950s, have provided a basic understanding of how these enzymes function and replicate DNA. This understanding laid the foundation for improving enzyme properties and applications in current and future DNA polymerase-based sequencing technologies. Large-scale, organism-specific genome research has revealed the diversity and unique characteristics of DNA polymerases in all domains of life and their viruses. These diverse polymerases, with their distinct functions and properties, provide a large pool of natural protein variants that can be tested, and later used, for continuously evolving sequencing chemistries. Tailor-made variants designed by protein engineering or directed evolution have become powerful protein engines that have propelled DNA sequencing technologies over the past few decades. DNA polymerase has been, and will undoubtedly remain, a crucial component of future sequencing technologies.

#### **ACKNOWLEDGMENTS**

The author thanks Ali Nikoomanzar for editing the manuscript, and Dr. Lawrence A. Loeb, Eddie Fox, and Thomas Lie for critically reading it.

#### **REFERENCES**


required for genome replication in *Thermococcus kodakarensis*. *J. Bacteriol.* 195, 2322–2328. doi: 10.1128/JB.02037-12


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 07 April 2014; paper pending published: 08 May 2014; accepted: 03 June 2014; published online: 24 June 2014.*

*Citation: Chen C-Y (2014) DNA polymerases drive DNA sequencing-by-synthesis technologies: both past and present. Front. Microbiol. 5:305. doi: 10.3389/fmicb.2014.00305 This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Chen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# DNA polymerases engineered by directed evolution to incorporate non-standard nucleotides

# *Roberto Laos\*, J. Michael Thomson and Steven A. Benner*

Foundation for Applied Molecular Evolution, Gainesville, FL, USA

#### *Edited by:*

Andrew F. Gardner, New England Biolabs, USA

#### *Reviewed by:*

Michael Metzker, Baylor College of Medicine, USA

Piet Herdewijn, KU Leuven, Belgium

#### *\*Correspondence:*

Roberto Laos, Foundation for Applied Molecular Evolution, 720 SW 2nd Avenue, Suite 208, Gainesville, FL 32601, USA

e-mail: manuscripts@ffame.org

DNA polymerases have evolved for billions of years to accept natural nucleoside triphosphate substrates with high fidelity and to exclude closely related structures, such as the analogous ribonucleoside triphosphates. However, polymerases that can accept unnatural nucleoside triphosphates are desired for many applications in biotechnology. This review focuses on non-standard nucleotides that expand the genetic "alphabet," and in particular on experiments that used directed evolution to create DNA polymerase variants better able to accept these unnatural nucleotides. In many cases, an analysis of the past evolution of these polymerases (as inferred from multiple sequence alignments) can help explain some of the mutations delivered by directed evolution.

**Keywords: DNA polymerases, non-standard nucleotides, AEGIS, directed evolution, CSR, protein engineering**

# **INTRODUCTION**

DNA polymerases are enzymes that catalyze the template-directed synthesis of DNA. Over billions of years, they have evolved to have the speed, specificity, and accuracy required for them to transmit valuable genetic information to and from living organisms with a level of infidelity just sufficient to support Darwinian evolution.

Many DNA polymerases are currently used in the polymerase chain reaction (PCR) and other procedures that copy nucleic acids, including multiplexed PCR, nested PCR, reverse transcription PCR, and DNA sequencing. Polymerases are also used to incorporate modified nucleotides, including those that tag, report, or signal the presence of product DNA molecules. They are now also being used to copy sequences built from "artificially expanded genetic alphabets," which add new base pairs to the standard A:T and G:C pairs. Combined, these technologies allow nucleic acids to be amplified from complex samples, including saliva, blood, forensic traces, and fossil remains. Furthermore, polymerases support *in vitro* selection with expanded genetic alphabets to create receptors that bind to cancer cells (Sefah et al., 2014). Accordingly, the demand for new polymerase variants, especially those with specialized attributes, shows no sign of diminishing, despite the large number of polymerases already available.

This review focuses primarily on polymerase variants that accept nucleic acids having additional nucleotide "letters" that form additional nucleobase pairs. Such expanded genetic systems are being developed in many laboratories (Rappaport, 1988; Switzer et al., 1989; Ishikawa et al., 2000; Tae et al., 2001; Kool, 2002; Geyer et al., 2003; Henry and Romesberg, 2003; Minakawa et al., 2003; Benner, 2004; Hirao et al., 2004; Sismour and Benner, 2005). Some of these simply shuffle the hydrogen-bonding groups that join base pairs within a Watson-Crick geometry, such as the artificially expanded genetic information system (AEGIS; Piccirilli et al., 1990; Geyer et al., 2003). Others add hydrogen bonds to hold the pair together (Minakawa et al., 2006). Still others aim to dispense with hydrogen bonds entirely (Morales and Kool, 1999). Some polymerases have been modified without the use of directed evolution; these cases nevertheless provide insight into polymerase structure and function.

Major advances in "next generation" sequencing, which requires the use of modified nucleotides and DNA polymerases, are considered in a separate review in this series (Chen, 2014).

# **DNA POLYMERASE FAMILIES**

DNA polymerases have been classified into evolutionary families based on an analysis of their amino acid sequences. About two decades ago, Braithwaite and Ito (1993) used an extensive compilation of the then-available sequences to classify polymerases into three families: A, B, and C. The family names indicated homology to the products of three genes, *pol*A, *pol*B, and *pol*C, which encode the three canonical polymerases from *Escherichia coli*: DNA polymerase I, DNA polymerase II, and the DNA polymerase III alpha subunit, respectively (Ito and Braithwaite, 1991; Braithwaite and Ito, 1993). The most studied polymerases belong to Family A (found in prokaryotes, eukaryotes, and bacteriophages) and Family B (found in prokaryotes, eukaryotes, archaea, and viruses). Family D groups polymerases from Archaea (Cann and Ishino, 1999). Families X and Y are involved in repair. Family X polymerases perform base excision repair and double-strand break repair by using their ability to fill short gaps (Moon et al., 2007; Yamtich and Sweasy, 2010), and some family X polymerases can synthesize DNA without a template (Berdis, 2014). Family Y includes eukaryotic polymerases (Ohmori et al., 2001) that show little homology to the previously identified families. Most family Y polymerases lack proofreading exonuclease domains and have a more open active site that accommodates damaged bases, which presumably allows them to bypass DNA lesions (Pryor et al., 2014). The RT family groups the reverse transcriptases, including eukaryotic telomerases and reverse transcriptases found in viruses (Le Grice and Nowotny, 2014).

Early studies recognized that mild proteolysis of DNA polymerase I from *E. coli* produces two fragments: a large fragment that lacks the 5′–3′ exonuclease activity and a small fragment that carries it. The large fragment, called the Klenow fragment, retains both the polymerization and the 3′–5′ proofreading activities of the native enzyme. The Klenow fragment yielded the first crystal structure of a family A polymerase, solved by Ollis et al. (1985). This structure revealed a "right hand" shape, with the active site located in the "palm," which holds the catalytic amino acids, a "thumb" that binds double-stranded DNA, and "fingers" where the incoming nucleotide binds and interacts with the template. The structures of *Thermus aquaticus* DNA polymerase and of its Klenow fragment analog, the large fragment of *Thermus aquaticus* DNA polymerase (Klentaq1), have also been studied by crystallography (Kim et al., 1995; Korolev et al., 1995).

#### **FAMILY A POLYMERASES**

Family A is the most studied of the seven DNA polymerase families. It includes many of the "workhorse" polymerases of classical molecular biology, including the Klenow fragments of *E. coli* and *Bacillus* DNA polymerase I, *Thermus aquaticus* DNA polymerase, and the T7 RNA and DNA polymerases. It also includes the first DNA polymerase to be characterized enzymatically, DNA polymerase I from *E. coli*, in seminal work by Kornberg (1960).

#### *Taq polymerase*

With the advent of PCR, it became clear that a polymerase stable to heating would be useful. DNA polymerase I from *Thermus aquaticus* (*Taq* polymerase) filled this role and is now widely used in PCR. *Thermus aquaticus*, which thrives at 70 °C, was isolated from hot springs in Yellowstone National Park, and its DNA polymerase was purified in 1976 (Chien et al., 1976). Because the enzyme remains active after exposure to the high temperatures required to denature DNA strands (typically 94 °C), it allows repeated cycles of denaturing, annealing, and extension (thermocycling) without the need to add fresh polymerase at each cycle. This made PCR a routine laboratory technique.

Eom et al. (1996) solved a co-crystal structure of *Taq* polymerase with a blunt-ended DNA duplex bound in the active-site cleft. This structure showed several features: (a) the DNA adopts a form intermediate between the B and A forms; (b) certain amino acid side chains hydrogen-bond to the *N*<sup>3</sup> of purines and the *O*<sup>2</sup> of pyrimidines at specific positions in the duplex; and (c) the 3′ hydroxyl of the primer strand lies near three carboxylate groups, delivered by Asp610, Asp785, and Glu786. These residues are considered to constitute the catalytic core of the enzyme.

As with its homolog, polymerase I from *E. coli*, *Taq* DNA polymerase can be cleaved to give an active fragment, called Klentaq, which retains polymerase activity but lacks the 5′–3′ exonuclease activity. Li et al. (1998) solved crystal structures of the large fragment of *Thermus aquaticus* DNA polymerase I (Klentaq1) in a ternary complex with primer/template and dCTP and in a binary complex with primer/template alone. These structures identified two conformations of the polymerase: (i) an "open" conformation, in which the tip of the fingers is rotated 46° outward and which is presumably not actively performing the polymerase reaction, and (ii) a "closed" conformation, "caught in the act" of incorporating a nucleotide. This was the first direct evidence in any DNA polymerase for a large conformational change as part of the catalytic cycle.

#### *Motifs relevant to the engineering of Family A polymerases*

Six motifs in the structure of *Taq* polymerase are conserved throughout Family A polymerases, including motifs A, B, and C (Delarue et al., 1990). Motifs A and B are the most conserved. Both are relevant to DNA polymerase fidelity and substrate specificity, which makes them of special interest to experimentalists seeking polymerases that accept unnatural substrates.

Motif A lies in the palm domain of the polymerase and spans amino acids 605–617; in *Taq*, its sequence is (LLVAL**DY**SQI**E**LR). Within this motif, Asp610 (bold D) cannot be changed without losing enzymatic activity, presumably because it coordinates the metal ion directly responsible for catalysis. Glu615 (bold E) can be changed to Asp without complete loss of activity. Tyr611 (bold Y), which is located in a hydrophobic pocket, can be replaced by a planar aromatic amino acid. The remaining amino acids in motif A can be replaced by many amino acids without destroying catalytic activity (Patel and Loeb, 2000a).

The amino acids of motif B are located in the fingers domain and form the O-helix, which contacts the base pair being formed in the primer extension step. This motif covers residues 659–671; in *Taq*, its sequence is (**R**RAA**K**TINFGVLY). It contains Arg659 (bold R) and Lys663 (bold K), which interact with the triphosphate moiety of the incoming nucleotide and are critical for enzymatic activity; they are therefore most likely immutable. By contrast, Phe667 and Tyr671, which are involved in base stacking, tolerate conservative substitutions. The remaining amino acids tolerate a wide range of substitutions. **Figure 1** shows the conserved motifs on the structure of *Taq* polymerase.
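Residue numbering within these motifs is simple offset arithmetic from the first residue of each motif. The short Python sketch below uses only the motif sequences and start positions given in this section (the helper names `number_motif`, `label`, and `check_mutation` are illustrative, not from any library) to number each residue and to check that substitution names used in this review, such as F667Y, agree with the wild-type *Taq* sequence.

```python
import re

# Taq DNA polymerase motif sequences and first-residue numbers, as given in the text.
TAQ_MOTIFS = {
    "A": (605, "LLVALDYSQIELR"),  # palm domain, residues 605-617
    "B": (659, "RRAAKTINFGVLY"),  # fingers domain (O-helix), residues 659-671
}


def number_motif(start: int, seq: str) -> dict:
    """Map each residue number in a motif to its one-letter amino acid code."""
    return {start + i: aa for i, aa in enumerate(seq)}


# Combine both motifs into one residue-number -> amino acid lookup.
TAQ = {}
for start, seq in TAQ_MOTIFS.values():
    TAQ.update(number_motif(start, seq))


def label(pos: int, numbered: dict) -> str:
    """Return a residue label such as 'D610'."""
    return f"{numbered[pos]}{pos}"


def check_mutation(name: str, numbered: dict) -> bool:
    """True if the wild-type residue in a name like 'F667Y' matches the sequence."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a substitution name: {name}")
    wild_type, pos = match.group(1), int(match.group(2))
    return numbered.get(pos) == wild_type


print([label(p, TAQ) for p in (610, 611, 615, 659, 663, 667, 671)])
print([check_mutation(m, TAQ) for m in ("F667Y", "R660D", "A661E", "T664R")])
```

A check like this is a cheap guard against off-by-one numbering when moving between papers that number the same enzyme differently.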

Certain substitutions within motifs A and B have been shown to lower fidelity without eliminating catalytic activity (Suzuki et al., 1997). Examples include substitutions at Ile614 in motif A and the Ala661Glu and Thr664Arg substitutions in motif B.

In one example, replacing an amino acid at a single site is known to change substrate specificity in a useful way. Replacement of Phe667 in *Taq* polymerase by a tyrosine eliminates the ability of the polymerase to discriminate against dideoxynucleotides. The *Taq* variant F667Y is, therefore, used for DNA sequencing. Interestingly, in T7 DNA polymerase, the replacement of Tyr526 by Phe increases the discrimination against dideoxynucleotides. This illustrates a general principle in protein engineering: rationales are best constructed *after* the replacement is made and its impact is evaluated.

#### **FAMILY B POLYMERASES**

Exploration of the natural microbiosphere led to the discovery of organisms that grow at temperatures even higher than those tolerated by *Thermus*. These came to be known as "hyperthermophiles," and many were shown by their ribosomal RNA sequences to belong to a third kingdom, or domain, of life on Earth: the Archaea. These organisms proved to be sources of polymerases that were even more thermostable.

For example, *Pyrococcus furiosus* (*Pfu*), a hyperthermophilic archaeon, was discovered in the Lower Geyser Basin of Yellowstone National Park (Brock and Freeze, 1969; Brock and Edwards, 1970). Its DNA polymerase (*Pfu*) has been used in many PCR applications. Crystallographic analysis of the native form (Kim et al., 2008) as well as a variant able to replace dCTP with a cyanine dye-labeled dCTP (Wynne et al., 2013) showed that it contains five distinct domains, called the finger, palm, thumb, N-terminal and exonuclease domains (Hopfner et al., 1999; Hashimoto et al., 2001).

*Pfu* polymerase has a feature that is absent in *Taq* polymerase: an exonuclease domain with 3′–5′ exonuclease activity. This allows *Pfu* to proofread using a conformational change (Hopfner et al., 1999). When the polymerase encounters a mismatch, it binds the primer/template more weakly, causing strand unwinding. The mismatch can then move into the exonuclease pocket, where it is excised (Freemont et al., 1988). A conserved loop in the exonuclease domain interacts with the thumb domain (Kuroita et al., 2005). Mutating a key residue in this loop, H147, to glutamate creates an electrostatic attraction between the thumb domain and the exonuclease domain that prevents the 3′ end of single-stranded DNA from entering the exonuclease domain, thus significantly reducing the 3′–5′ exonuclease activity (Wang et al., 1997; Kuroita et al., 2005). By comparing the crystal structures of *Pfu* and KOD1, a homologous Family B polymerase from the *Thermococcus* genus, Kim et al. (2008) suggested that an alternative residue, E148, is better positioned in the loop to interact with the thumb domain.

Family B DNA polymerases of hyperthermophilic archaea may generally have 3′–5′ exonuclease activity (Joyce, 1989; Joyce and Steitz, 1994; Benkovic et al., 2001; Joyce and Benkovic, 2004). Some have been shown to recognize uracil and hypoxanthine in the template strand, stalling when they sense these bases ahead of the extension site (Greagg et al., 1999; Fogg et al., 2002; Connolly, 2009). This may reflect functional adaptation. Both uracil and hypoxanthine are "mistakes" in a DNA sequence, arising via the deamination of cytosine and adenine, respectively. Such deaminations presumably occur more rapidly at the high temperatures where hyperthermophiles live. If the polymerase nevertheless extends further, incorporating dNTPs until the uracil sits at the +2 position, its proofreading exonuclease activity is stimulated (Connolly, 2009).

The need for additional proofreading in the natural environment may not be so pressing for a biotechnologist. Accordingly, many have altered the proofreading ability of *P. furiosus* DNA polymerase, for example by removing the exonuclease activity for use in error-prone PCR (ePCR; Biles and Connolly, 2004) or to increase the efficiency of ligation-mediated PCR protocols (Angers et al., 2001). Sanger sequencing also requires eliminating the exonuclease activity; otherwise, incorporated ddNTPs would be removed and the sequencing signals would disappear.

# **PROTEIN ENGINEERING AND DIRECTED EVOLUTION: THE "NEXT GENERATION" GOALS**

For classical Sanger sequencing, a polymerase need only accept, with modest fidelity, a tagged triphosphate carrying a 3′-blocking group. The termination:extension ratio is set in any case by adjusting the concentrations of the terminating and nonterminating triphosphates, so relatively inefficient incorporation of the unnatural species is not problematic.
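The point about adjusting concentrations can be made quantitative with a deliberately simplified model, which is our assumption rather than a description of any particular sequencing chemistry: at each template position calling for a given base, the ddNTP and dNTP compete, and the ddNTP is incorporated *r* times as efficiently per molecule as the dNTP. The termination probability is then r·[ddNTP]/(r·[ddNTP] + [dNTP]), so a poorly accepted terminator can be compensated for by raising its concentration ratio. The function names below are illustrative.

```python
def termination_probability(ratio_dd_to_d: float, rel_efficiency: float) -> float:
    """Chance that the terminator, not the dNTP, is added at one template position.

    ratio_dd_to_d: [ddNTP]/[dNTP]
    rel_efficiency: per-molecule incorporation efficiency of ddNTP relative to dNTP
    (a simplified competitive model assumed for illustration).
    """
    x = rel_efficiency * ratio_dd_to_d
    return x / (x + 1.0)


def ratio_for_target(p_term: float, rel_efficiency: float) -> float:
    """[ddNTP]/[dNTP] ratio that gives the desired termination probability."""
    if not 0.0 < p_term < 1.0:
        raise ValueError("p_term must lie strictly between 0 and 1")
    return p_term / (rel_efficiency * (1.0 - p_term))


# A 100-fold discrimination against the terminator is offset by a 100-fold higher ratio.
for r in (1.0, 0.01):
    ratio = ratio_for_target(0.01, r)
    print(f"r={r}: ddNTP/dNTP = {ratio:.4f}, "
          f"p_term = {termination_probability(ratio, r):.4f}")
```

Under this model, a 1% termination probability per position gives fragments that end, on average, about 100 same-base positions downstream (the mean of a geometric distribution), whatever the enzyme's discrimination, once the ratio has been adjusted.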

However, as the synthetic biology research paradigm has developed, the demands placed on polymerase performance have increased. Polymerases are now often called upon to copy DNA and to PCR-amplify molecules containing unnatural nucleotides, often at multiple sites. Here, the fidelity and (preferably) the processivity that a DNA polymerase needs to support PCR with unnatural nucleotides must be very high. Moreover, the structural differences between a DNA polymerase that makes one error per thousand nucleotides and one that makes one error per million can be quite subtle, arising through geometric differences that might not be distinguished even in a high-resolution crystal structure.
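A rough calculation shows why PCR with unnatural nucleotides demands high fidelity. The sketch below uses a standard simplification that is assumed here, not taken from the text: each molecule in the final pool descends through about *d* doublings, errors occur independently at rate *e* per nucleotide per copy, and an unnatural base pair is lost with probability *f* per doubling. The amplicon length, number of doublings, and loss rate are illustrative values.

```python
def expected_errors(error_rate: float, length_nt: int, doublings: int) -> float:
    """Mean number of errors per final amplicon: e * L * d."""
    return error_rate * length_nt * doublings


def error_free_fraction(error_rate: float, length_nt: int, doublings: int) -> float:
    """Fraction of final amplicons with no errors: (1 - e)^(L * d)."""
    return (1.0 - error_rate) ** (length_nt * doublings)


def retention(loss_per_doubling: float, doublings: int) -> float:
    """Fraction of amplicons still carrying one unnatural pair: (1 - f)^d."""
    return (1.0 - loss_per_doubling) ** doublings


for e in (1e-3, 1e-6):  # one error per thousand vs. one per million nucleotides
    print(f"e={e:g}: {expected_errors(e, 100, 20):.3f} errors per 100-nt amplicon, "
          f"{error_free_fraction(e, 100, 20):.3f} error-free after 20 doublings")

# Retention of a single unnatural pair at 99% fidelity per doubling
print(f"retained after 20 doublings: {retention(0.01, 20):.3f}")
```

Because losses compound multiplicatively with each doubling, a polymerase that looks adequate in a single primer extension can still lose a large fraction of unnatural pairs over a full PCR.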

#### **RATIONAL DESIGN**

#### *Information from structural biology*

Molecular biologists would like to believe that they command enough structural theory to "rationally" design polymerases with new, predictable properties. In some cases, this is possible, especially when it involves domain shuffling. This approach has been productive in improving one feature of polymerases that is important for a wide range of applications: processivity.

DNA binding factors are known to enhance processivity of many polymerases charged with copying complete microbial genomes. In principle, these might be added to improve the processivity of Pol I polymerases, which (as noted above) do not perform this role naturally. Their addition might also, in principle, be used to enhance the performance of any polymerase or polymerase variant. This addition, however, is not often used in biotechnology because of the complexity of the assembled combination. Indeed, *Taq* polymerase and other enzymes are used without accessory proteins for PCR because of their simplicity, which comes from their physiological roles in lagging strand replication and DNA repair.

The complexity of a multicomponent system could be avoided by directly fusing a processivity domain to the active polymerase domain. Adopting this rationale, Wang et al. (2004) covalently fused the double-stranded DNA binding protein Sso7d from *Sulfolobus solfataricus* to the N-terminus of *Taq* polymerase (S-*Taq*) and to a fragment of *Taq* polymerase lacking the first 289 amino acids, and hence the exonuclease domain [S-*Taq*(-289)]. The fusion increased the average length of primer extension before template-primer dissociation from 2.9 nucleotides for *Taq*(-289) to 51 nucleotides for S-*Taq*(-289). For full-length *Taq* polymerase, which is intrinsically more processive than *Taq*(-289), the average primer extension increased from 22 (*Taq*) to 104 (S-*Taq*) nucleotides (Wang et al., 2004).

In parallel work, Wang et al. (2004) also fused the Sso7d domain to the C-terminus of the polymerase from *P. furiosus* to give *Pfu*-S. As with *Taq* polymerase, fusion of the Sso7d domain increased the average primer extension, from 6.4 nucleotides for *Pfu* to 55 for *Pfu*-S.
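One way to read the average extension lengths reported by Wang et al. (2004) is through a simple geometric model of processivity, which is an assumption made here for illustration: at each step the polymerase either adds a nucleotide or dissociates with a constant probability p_off. A mean extension of *m* nucleotides then corresponds to p_off = 1/(m + 1), and the chance of adding at least *n* nucleotides in a single binding event is (1 − p_off)^n. The numbers below are the averages quoted in the text; the function names are illustrative.

```python
# Average primer extension (nucleotides) per binding event, from the text (Wang et al., 2004).
AVERAGE_EXTENSION = {
    "Taq(-289)": 2.9, "S-Taq(-289)": 51.0,
    "Taq": 22.0, "S-Taq": 104.0,
    "Pfu": 6.4, "Pfu-S": 55.0,
}


def dissociation_probability(mean_extension: float) -> float:
    """Per-step dissociation probability in a geometric model with this mean."""
    return 1.0 / (mean_extension + 1.0)


def prob_reaching(n: int, mean_extension: float) -> float:
    """Chance that a single binding event adds at least n nucleotides."""
    return (1.0 - dissociation_probability(mean_extension)) ** n


for name, mean in AVERAGE_EXTENSION.items():
    print(f"{name:12s} p_off = {dissociation_probability(mean):.3f}  "
          f"P(>= 100 nt) = {prob_reaching(100, mean):.2e}")

# Fold improvement in mean extension from the Sso7d fusion
for parent, fusion in (("Taq(-289)", "S-Taq(-289)"), ("Taq", "S-Taq"), ("Pfu", "Pfu-S")):
    print(f"{fusion}: {AVERAGE_EXTENSION[fusion] / AVERAGE_EXTENSION[parent]:.1f}-fold")
```

The fusions raise the mean extension about 17.6-, 4.7-, and 8.6-fold, respectively; because the chance of a long uninterrupted extension depends exponentially on length in this model, the gain for long products is much larger than the fold change in the mean.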

Further uses for the more processive *Pfu*-S built on the crystal structure of *Thermococcus gorgonarius* DNA polymerase (*Tgo*), solved in 1999. This structure identified a uracil-binding pocket, which the polymerase uses physiologically to avoid copying templates containing uracil, a product of cytosine deamination. The structure guided the construction of *Tgo* and *Pfu* DNA polymerase variants with reduced uracil stalling (Hopfner et al., 1999; Fogg et al., 2002). To increase the ability to read through uracil in the template, the Sso7d domain was fused to the high-fidelity *Pfu* variant V93Q. The result was higher processivity and improved uracil-excision cloning (Nour-Eldin et al., 2006).

Structural biology also provided a domain-swapping rationale for increasing the processivity of *Taq* polymerase. Here, the thioredoxin-binding domain (TBD) of bacteriophage T3 DNA polymerase was inserted into the thumb domain of *Taq* DNA polymerase, deleting amino acids 480–485 (Davidson et al., 2003). The rationale recognized that the processivity of the closely related T7 DNA polymerase increases from 15 to 2000 nucleotides when it forms a complex with *E. coli* thioredoxin, and that its affinity for the primer-template increases 80-fold upon binding thioredoxin. The polymerase arising from this domain fusion remains thermostable and has 20–50 times higher processivity than the original *Taq* polymerase.

### *Exploiting information from multiple sequence alignments (MSAs) in rational engineering*

Polymerases are, of course, widely distributed in the biosphere in homologous forms. During their divergent evolution, natural selection acting on random variation has carried out several billion years of "protein engineering" experiments, of a sort. With the explosion of microbial sequencing over the last two decades, the results of these "experiments" can be obtained from public sequence databases. To the extent that these results are not corrupted by sequencing errors, they provide "evolutionary guidance" to assist laboratory protein engineering (Weinhold et al., 1987).

Evolutionary guidance has been productively applied to engineer polymerases, and Tabor and Richardson (1995) provided a classic example. Seeking to improve the ability of *Taq* DNA polymerase I to accept 2′,3′-dideoxynucleoside triphosphates (ddNTPs) for sequencing applications, Tabor and Richardson (1995) examined the sequences of three Family A DNA polymerases (Braithwaite and Ito, 1993). "Wet" biochemistry had already shown that one of these, the polymerase from bacteriophage T7, incorporated ddNTPs better than the other two, the polymerases from *E. coli* and *Thermus aquaticus*.

Tabor and Richardson (1995) then constructed a multiple sequence alignment (MSA) of the three homologous Family A polymerases. They noticed that T7 polymerase had a tyrosine at a site (numbered 526) homologous to positions that held a phenylalanine in the *E. coli* and *Taq* polymerases (numbered 762 and 667, respectively). From this comparison, they hypothesized that this single amino acid difference was responsible for the different levels of discrimination against ddNTPs among the three polymerases.
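The logic of reading homologous positions from an MSA can be sketched in a few lines. Alignment columns, not residue numbers, correspond between sequences, so each sequence's residue numbers must be recomputed column by column, skipping gaps. In the Python sketch below, the *Taq* fragment is motif B as given in this review (residues 659–671); the second aligned sequence and its starting number are hypothetical, invented only to show how a gap shifts the numbering.

```python
from typing import List, Optional, Tuple


def residue_numbers(aligned: str, start: int) -> List[Optional[int]]:
    """Residue number for each alignment column, or None where the column is a gap."""
    numbers: List[Optional[int]] = []
    current = start
    for symbol in aligned:
        if symbol == "-":
            numbers.append(None)
        else:
            numbers.append(current)
            current += 1
    return numbers


def map_position(query_aln: str, query_start: int,
                 target_aln: str, target_start: int,
                 query_pos: int) -> Tuple[str, Optional[int]]:
    """Return the target residue (and its number) aligned with query_pos.

    Raises ValueError if query_pos is not present in the aligned query.
    """
    if len(query_aln) != len(target_aln):
        raise ValueError("aligned sequences must have equal length")
    column = residue_numbers(query_aln, query_start).index(query_pos)
    return target_aln[column], residue_numbers(target_aln, target_start)[column]


TAQ_MOTIF_B = "RRAAKTINFGVLY"   # Taq residues 659-671 (from the text)
HYPOTHETICAL = "RKA-KTMNYGILY"  # made-up homolog, numbered from 100

print(map_position(TAQ_MOTIF_B, 659, HYPOTHETICAL, 100, 667))
```

The hypothetical homolog carries a tyrosine in the column where *Taq* has Phe667, the kind of difference Tabor and Richardson looked for; because of the gap, its residue number (107) cannot be obtained by adding a fixed offset to the *Taq* numbering.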

Based on this hypothesis, Tabor and Richardson (1995) replaced the phenylalanine in the *Taq* polymerase by a tyrosine. The result was a variant *Taq* (F667Y) that retained the thermostability of the *Taq* parent but gained improved ability to accept ddNTPs. Similar improvements were seen when the analogous replacement was made in the polymerase from *E. coli*. The mutant *Taq* (F667Y) became one of the first "designed" polymerases to be used in DNA sequencing (Tabor and Richardson, 1995).

Subsequently, Li et al. (1999) studied the crystal structures of Klentaq1, a derivative of *Taq* DNA polymerase that lacks an exonuclease domain. In separate structures, protein crystals binding ddNTPs were observed to have closed ternary complexes, where a conformational change upon substrate binding was associated with a large shift in the position of the side chain of residue 660 in the O helix. Comparing the open and closed structures with ddGTP, Li et al. (1999) concluded that the selective interaction of arginine 660 with the *O*<sup>6</sup> and *N*<sup>7</sup> atoms of the G nucleobase might provide structural grounds for better incorporation of ddGTP by *Taq* polymerase. Guided by these observations, Li et al. (1999) then replaced amino acids at residue 660 in Klentaq1 already holding the Tabor-Richardson replacement (F667Y) and studied the resulting variants. Among the variants, the double mutant *Taq* (F667Y; R660D) showed superior performance in DNA sequencing architectures that used ddNTPs.

#### **THE NEED FOR DIRECTED EVOLUTION: SEQUENCE LANDSCAPES**

While structure, evolutionary comparison, and mechanistic analysis are all important tools in polymerase engineering, it remains a fact that chemical theory is inadequate to predict the exact outcome of any amino acid replacement on the performance of any protein, including polymerases. A degree of "trial and error" is inherent in protein engineering experiments. This, in turn, requires that we consider the size of the "protein sequence space" that might be explored as we set out to modify a protein to allow it to support a specific technological goal.

Background to this concept was presented by Smith (1970) almost half a century ago. The behavior of all possible proteins of length *n* with respect to a measurable property can be represented in a space of *n* dimensions, where each dimension takes one of 20 discrete values, representing the 20 natural amino acids. Each protein sequence is a point in that space. Two points are neighbors if one can be converted into the other by a single amino acid substitution. Thus, with 20 amino acids, each point in the sequence space has 19*n* neighbors. The measurable behavior is a real number displayed in an (*n* + 1)th dimension.
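Smith's neighbor count follows directly from the definition: at each of the *n* positions, the current residue can be exchanged for any of the other 19. The Python sketch below enumerates single-substitution neighbors for an arbitrary alphabet; the tripeptide in the example is arbitrary, and the 26-letter case corresponds to Smith's word game.

```python
import string

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def neighbors(seq: str, alphabet: str = AMINO_ACIDS):
    """Yield every sequence that differs from seq by exactly one substitution."""
    for i, current in enumerate(seq):
        for letter in alphabet:
            if letter != current:
                yield seq[:i] + letter + seq[i + 1:]


peptide = "MKV"  # arbitrary example: 3 positions x 19 alternatives
print(len(list(neighbors(peptide))))                         # 57 = 19 * 3
print(len(list(neighbors("WORD", string.ascii_uppercase))))  # 100 = 25 * 4
```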

Different sequences have different functions, and moving from one functional sequence to another can proceed via intermediates that either have or lack function. **Figure 2** illustrates this with a word game used by Smith (1970), in which functional protein sequences are analogous to strings of letters that have a meaning in English. In Smith's (1970) analogy, the word "WORD" is converted into the word "GENE" by exchanging one letter at a time, with each step on one path having a meaning (WORE, GORE, and GONE). Paths in which all intermediates are meaningful are illustrated by solid lines between points on the surface. Other paths proceed via strings lacking meaning, illustrated by broken lines (for example, WOND, GOND, and GEND).

In this example, linguistic "meaning" is equated to fitness, which provides the (*n* + 1)th dimension of the surface, a "fitness landscape" (Wright, 1932). The landscape is represented as a topographic map, with peaks marked with a (+) for optimal sequences and the absence of function depicted as dips marked with a (−). Smith (1970) proposed that natural evolution proceeds along a path only if all intermediates are functional, because non-functional sequences are removed by "purifying" selection. Thus, the only valid pathways through sequence space proceed via functional sequences, just as the evolution of words can proceed only via meaningful words.
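Smith's rule that evolution may pass only through functional intermediates amounts to a breadth-first search in which a step is allowed only if it lands on a functional point. The Python sketch below applies it to the word game; the small dictionary of "meaningful" words is a hypothetical stand-in for a fitness function, and `functional_path` is an illustrative name.

```python
import string
from collections import deque

# Hypothetical set of "functional" (meaningful) four-letter words
MEANINGFUL = {"WORD", "WORE", "GORE", "GONE", "GENE", "CORE", "CODE"}


def one_letter_neighbors(word: str):
    """Yield every string that differs from word at exactly one position."""
    for i in range(len(word)):
        for letter in string.ascii_uppercase:
            if letter != word[i]:
                yield word[:i] + letter + word[i + 1:]


def functional_path(start: str, goal: str, functional: set):
    """Shortest path of single-letter changes in which every word is functional,
    or None if the goal cannot be reached without a non-functional intermediate."""
    if start not in functional or goal not in functional:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        word = queue.popleft()
        if word == goal:
            path = []
            while word is not None:
                path.append(word)
                word = previous[word]
            return path[::-1]
        for nxt in one_letter_neighbors(word):
            if nxt in functional and nxt not in previous:
                previous[nxt] = word
                queue.append(nxt)
    return None


print(functional_path("WORD", "GENE", MEANINGFUL))
print(functional_path("WORD", "GENE", MEANINGFUL - {"GORE"}))  # None: no functional route
```

Removing a single functional intermediate (GORE) disconnects WORD from GENE even though the two words remain only four substitutions apart, which is the sense in which a rugged landscape can block evolution.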

Sequence space within a protein framework is vast, but enumerable. For example, a protein 100 amino acids long has 20<sup>100</sup> possible sequences. Typical polymerases, about eight times longer, occupy a space with 20<sup>800</sup> points. Both numbers are astronomical, and no experiment can sample such a space effectively.
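These sizes are easiest to compare on a logarithmic scale. The Python sketch below also counts the sequences exactly *k* substitutions away from a parent, C(n, k) × 19^k, which is the local neighborhood that directed evolution actually explores; the library size of one million is an illustrative assumption matching the "millions of variants" that selections can handle.

```python
import math


def log10_sequence_space(length: int, alphabet_size: int = 20) -> float:
    """log10 of alphabet_size ** length, the number of possible sequences."""
    return length * math.log10(alphabet_size)


def n_at_distance(length: int, k: int, alphabet_size: int = 20) -> int:
    """Number of sequences exactly k substitutions away from a parent."""
    return math.comb(length, k) * (alphabet_size - 1) ** k


for n in (100, 800):
    print(f"{n} residues: about 10^{log10_sequence_space(n):.1f} sequences")

library = 10**6  # illustrative library size
for k in (1, 2, 3):
    count = n_at_distance(800, k)
    print(f"k={k}: {count:.3e} variants; a 10^6 library covers at most "
          f"{min(1.0, library / count):.2e} of them")
```

Even the double mutants of an 800-residue polymerase (about 1.15 × 10^8) outnumber a million-member library, which is one reason library design matters.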

Several features of the fitness landscape influence how easily it can be searched: (a) the landscape may be "smooth," meaning that a useful protein sequence can be reached from any starting point via a path that encounters only other functional proteins; if not, then (b) useful functional proteins may be found no matter where the search starts, because the surface holds many of them; if not, then (c) the library must be guided so that the search starts in a region of the functional hypersurface where useful functional proteins reside. Directed evolution is an approach that mimics natural evolution on a time scale that can be reproduced in a laboratory. A directed evolution experiment starts by producing a library of variants (discussed further below), which is then subjected to a screen or a selection. A screen tests individual variants for the desired properties and is suitable for relatively small libraries, perhaps no more than a few hundred variants. A selection typically sorts millions of variants at the same time; the experimenter designs it so that only variants with the desired properties survive. The expected outcome of a directed evolution experiment is a pool enriched in variants having the desired characteristics. Directed evolution can be used to optimize and study any protein (Sterner, 2011).

**Figure 2 | Fitness landscape represented as a topographic map. Peaks (+) indicate the locations where function exists, while dips (–) represent regions lacking function.** The landscape is illustrated through an analogy to a word game. Solid arrows indicate a path of accepted mutations, while dashed arrows illustrate deleterious mutations that produce non-functional proteins.

#### **THE PRACTICE OF DIRECTED EVOLUTION WITH POLYMERASES**

In a directed evolution experiment, the search starts from a "parent" enzyme that has at least some of the properties desired in the enzyme that will ultimately be useful. The gene of this parent enzyme is then altered to create a library encoding variant forms of the enzyme, some of which might catalyze the desired transformation better than the parent. The members of the library that are of interest can be isolated by screening or selection.
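The cycle of library generation followed by screening can be illustrated with a toy simulation. Everything in it is hypothetical: the "fitness" is simply the number of positions matching an arbitrary target string, standing in for an assay of polymerase activity; mutations are introduced uniformly at random; and the best-scoring member of each round's library, with the current parent included, becomes the next parent.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQR"  # hypothetical optimum, used only to score variants


def fitness(seq: str) -> int:
    """Toy stand-in for an assay: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Replace each residue with a random amino acid with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else aa for aa in seq)


def directed_evolution(parent: str, rounds: int = 20, library_size: int = 200,
                       rate: float = 0.1, seed: int = 0):
    """Iterate library generation and screening; return the final parent and fitness history."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best)]
    for _ in range(rounds):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        # Screen: keep the highest-scoring sequence (ties go to the first found);
        # including the parent means fitness can never decrease between rounds.
        best = max(library + [best], key=fitness)
        history.append(fitness(best))
    return best, history


final, history = directed_evolution("AAAAAAAAAA")
print(final, history)
```

Real experiments differ in every detail (assay noise, library biases, expression), but the loop of diversify, screen, and re-parent is the same.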

#### *Library generation*

In fact, we have little information about the "smoothness" of any protein fitness landscape. The native polymerase used to initiate an experiment in directed evolution is, of course, already at an elevated point on the fitness landscape, at least for some conditions. It is not clear how many steps (amino acid replacements) can be taken away from the native sequence without losing activity. Further, we expect that certain replacements are more likely to retain core activity than others. All of this suggests that the nature of the library generated from that native sequence might influence the outcome of a directed evolution experiment. It is certainly expected that library generation, if intelligently biased, will allow desired outcomes to be generated faster.

*Error-prone PCR.* A common way to generate libraries from a starting sequence is "mutagenic" or "error-prone" PCR (ePCR). This approach takes advantage of the inherent propensity of *Taq* polymerase to introduce mistakes into DNA copies under certain conditions. The mismatch frequency is often increased by adding manganese (Mn<sup>2+</sup>) along with the natural cofactor Mg<sup>2+</sup> (Vartanian et al., 1996). Other additives, such as alcohols, or unbalanced nucleotide concentrations can also be used to introduce mutations through PCR.

The ePCR method does not produce a truly random set of amino acid replacements, for several reasons:

(i) *Taq* polymerase tends to replace purines (adenine and guanine) with other purines and pyrimidines (thymine and cytosine) with other pyrimidines; these changes are called transitions (as opposed to transversions, which exchange a purine for a pyrimidine or a pyrimidine for a purine). This bias toward transitions over transversions yields libraries whose amino acid replacements are non-random with respect to the parent protein.

(ii) Even if ePCR introduced transitions and transversions equally, the resulting amino acid replacements would not be random, because of the structure of the genetic code, in which amino acids with similar chemical properties tend to have closely related codons (Wong et al., 2007). For example, a single nucleotide replacement converts a valine codon (GTN<sup>1</sup>) into a codon for phenylalanine (TTY<sup>2</sup>), leucine (CTN, TTA, or TTG), isoleucine (ATT, ATC, or ATA), methionine (ATG), alanine (GCN), aspartate (GAY), glutamate (GAA or GAG), or glycine (GGN). Reaching codons for other amino acids, and thus more dramatically altering chemical properties in the variant protein, requires two or three nucleotide replacements.
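Which amino acids lie one nucleotide away from a given amino acid, and whether each route requires a transition or a transversion, can be read off the standard genetic code. The Python sketch below builds the standard code table and enumerates every single-nucleotide change from every codon of an amino acid; stop codons are excluded from the result, and `reachable` is an illustrative name.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop)
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
PURINES = {"A", "G"}


def is_transition(old: str, new: str) -> bool:
    """Purine<->purine or pyrimidine<->pyrimidine change."""
    return old != new and ((old in PURINES) == (new in PURINES))


def reachable(aa: str) -> dict:
    """Map each amino acid one nucleotide change away from aa to the change types used."""
    result = {}
    for codon, encoded in CODE.items():
        if encoded != aa:
            continue
        for i, old in enumerate(codon):
            for new in "ACGT":
                if new == old:
                    continue
                target = CODE[codon[:i] + new + codon[i + 1:]]
                if target not in (aa, "*"):
                    kind = "transition" if is_transition(old, new) else "transversion"
                    result.setdefault(target, set()).add(kind)
    return result


for target, kinds in sorted(reachable("V").items()):
    print(target, sorted(kinds))
```

For valine, only isoleucine, methionine, and alanine are reachable by a single transition, so a transition-biased mutagenic polymerase samples an even narrower set of replacements than the code alone allows.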

High levels of replacement are neither easily achieved by ePCR nor desirable. Typical ePCR introduces no more than 4–6 mutations per 1000 nucleotides. Moreover, a mutation rate high enough to search amino acid sequences independently of the code is almost certainly too high to yield variants that retain polymerase activity.
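The consequences of these rates can be estimated with a Poisson approximation, which assumes, as a simplification, that mutations fall independently and uniformly along the gene. For a polymerase gene of roughly 800 codons (2400 nucleotides, an approximate length), the Python sketch below computes the mean number of mutations per gene and the chance that any single codon receives two or more changes.

```python
import math

GENE_NT = 2400           # ~800 codons, an approximate polymerase gene length
N_CODONS = GENE_NT // 3


def mean_mutations(rate_per_nt: float, length_nt: int = GENE_NT) -> float:
    """Expected number of mutations per gene."""
    return rate_per_nt * length_nt


def fraction_unmutated(rate_per_nt: float, length_nt: int = GENE_NT) -> float:
    """Poisson probability of zero mutations in the gene."""
    return math.exp(-mean_mutations(rate_per_nt, length_nt))


def prob_multi_hit_codon(rate_per_nt: float) -> float:
    """Chance that one codon carries two or more mutations (sites independent)."""
    q = 1.0 - rate_per_nt
    return 1.0 - q**3 - 3 * rate_per_nt * q**2


for rate in (0.004, 0.006):  # 4-6 mutations per 1000 nucleotides
    multi_per_gene = prob_multi_hit_codon(rate) * N_CODONS
    print(f"rate {rate}: {mean_mutations(rate):.1f} mutations/gene, "
          f"unmutated fraction {fraction_unmutated(rate):.1e}, "
          f"expected multi-hit codons/gene {multi_per_gene:.3f}")
```

Under these assumptions a gene carries about 10 to 14 mutations, yet fewer than one gene in ten has even a single codon with two changes, which is why ePCR libraries rarely reach amino acids that need multiple nucleotide replacements.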

*Degenerate codons.* Recognizing this challenge, Reetz et al. (2008) developed an elegant approach to library generation that introduces the degenerate codons NNK and NDT. Here, N is any nucleobase, K is guanine or thymine, and D is guanine, adenine, or thymine. With the NNK degenerate codon, all 20 amino acids are covered by just 32 (= 4 × 4 × 2) of the 64 codons possible with standard nucleotides; only one of these 32 (TAG) is a stop codon. The twelve NDT degenerate codons (= 4 × 3 × 1) encode twelve different amino acids and no stop codons, covering a representative sample of the standard amino acids, including nonpolar, aromatic, hydrophobic, hydrophilic, and charged amino acids.
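The codon and amino acid counts quoted above can be verified by expanding each degenerate codon into its concrete codons and translating them. This is a sketch using the standard genetic code and the IUPAC nucleotide ambiguity codes; the function names are illustrative.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate_codon: str) -> list:
    """All concrete codons represented by a degenerate codon such as 'NNK'."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in degenerate_codon))]


def coverage(degenerate_codon: str):
    """Return (number of codons, set of amino acids encoded, number of stop codons)."""
    codons = expand(degenerate_codon)
    encoded = [CODE[c] for c in codons]
    return len(codons), set(encoded) - {"*"}, encoded.count("*")


if __name__ == "__main__":
    for scheme in ("NNK", "NDT"):
        n, aas, stops = coverage(scheme)
        print(scheme, n, len(aas), stops)
```

It reports 32 codons, 20 amino acids, and one stop codon for NNK, and 12 codons encoding 12 distinct amino acids with no stops for NDT.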

Behind this discussion are assumptions about the meaning of the word "random" when discussing amino acid replacements. Some amino acids are encoded by more codons than others; serine, for example, has six codons, whereas tryptophan is encoded by just one. A gene with a truly random sequence would give proteins with a codon-weighted distribution of amino acids. Even this might not be the desired goal of an unguided approach to library generation, as some amino acids, for example aspartate and glutamate (each with two codons), appear in natural proteins more abundantly than expected from their few codons. Thus, an "ideal" library might arguably be one in which amino acids are replaced by a process that leaves the naturally observed overall amino acid composition of the protein unchanged. Finally, our ignorance of the shape of function landscapes, and of the local topography around any individual parent sequence, means that we cannot state *a priori* which amino acid distribution is most likely to give a desired result in a directed evolution experiment.
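To make "codon-weighted" concrete, the sketch below computes the amino acid frequencies expected if every sense codon were equally likely (a uniformly random sequence with stop codons discarded), using the standard genetic code. It is illustrative only and does not model natural amino acid composition.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_weighted_frequencies() -> dict:
    """Expected amino acid frequencies when each of the 61 sense codons is equally likely."""
    sense = [aa for aa in CODE.values() if aa != "*"]
    counts = Counter(sense)
    return {aa: Fraction(n, len(sense)) for aa, n in counts.items()}


if __name__ == "__main__":
    freqs = codon_weighted_frequencies()
    for aa in ("S", "L", "R", "D", "E", "W"):
        print(aa, freqs[aa], f"{float(freqs[aa]):.3f}")
```

Serine, leucine, and arginine each receive 6/61 (about 9.8%), aspartate and glutamate 2/61 (about 3.3%), and tryptophan 1/61 (about 1.6%).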

*Libraries made by gene shuffling or molecular breeding.* Random mutagenesis of a parent gene fails, of course, to use all of the information available to a protein engineer, especially in a postgenomic world. As noted above, Nature has already run evolution experiments. These provide us with many homologs of a parent protein, carrying many amino acid replacements relative to the parent sequence. Most of these homologs are functional and therefore identify points in sequence space that are elevated on the fitness landscape. It would be desirable to use the information that these homologs provide.

<sup>1</sup>N is any nucleotide.

<sup>2</sup>Y is a pyrimidine.

Gene shuffling was introduced by Stemmer (1994), two decades ago, to make direct use of these homologs. Here, the starting point is a family of genes that share enough sequence similarity to undergo homologous recombination. Using a modified PCR protocol, in which fragments of the parent genes prime one another during reassembly, gene chimeras are produced.
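The idea that chimeras form where parental sequences anneal can be caricatured in a few lines. The sketch below is a toy model, not Stemmer's laboratory protocol: it builds a chimera of two aligned, equal-length parents and allows template switches only at positions preceded by a stretch of identity (`window`, a hypothetical parameter standing in for the homology needed for annealing). The parent sequences are invented for illustration.

```python
import random


def crossover_sites(a: str, b: str, window: int) -> list:
    """Positions i at which aligned parents a and b agree over the `window` bases ending at i."""
    if len(a) != len(b):
        raise ValueError("parents must be aligned to equal length")
    return [i for i in range(window, len(a)) if a[i - window:i] == b[i - window:i]]


def shuffle_pair(a: str, b: str, window: int = 4, n_switches: int = 2, seed=None) -> str:
    """Toy chimera: copy one parent, switching templates only after identical stretches."""
    rng = random.Random(seed)
    sites = crossover_sites(a, b, window)
    switch_points = sorted(rng.sample(sites, min(n_switches, len(sites))))
    parents = (a, b)
    current = rng.randrange(2)  # which parent supplies the first segment
    pieces, start = [], 0
    for point in switch_points:
        pieces.append(parents[current][start:point])
        current = 1 - current  # switch template at a region of identity
        start = point
    pieces.append(parents[current][start:])
    return "".join(pieces)


if __name__ == "__main__":
    parent_a = "ATGGCTAAAGGTCTGACCGAA"
    parent_b = "ATGGCAAAAGGTCTTACCGAG"
    for s in range(3):
        print(shuffle_pair(parent_a, parent_b, seed=s))
```

Because switches happen only where the parents are locally identical, every chimera is a position-by-position mosaic of the two parents, which is the sense in which shuffling recombines two "historically successful" sequences rather than introducing new ones.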

Those using shuffling in protein evolution assume, of course, that sequence space is searched more efficiently by combining the outcomes of two historically successful searches of a particular region of sequence space than by a search that simply replaces single amino acids starting from a single parent. These historical searches delivered the two functioning proteins whose genes are being shuffled. The landscape is thus assumed to be such that specific paths between two elevated points are also elevated.

This would be a more compelling hypothesis if natural evolution were observed to use shuffling. Natural evolution does, of course, have access to mechanisms that shuffle parts of genes. Natural evolution uses these mechanisms to rearrange (for example) the order of independently folded units in multi-unit polypeptides. This is famously done in the evolution of multi-unit proteins involved in metazoan signal transduction, where a regulatory protein might contain one "src homology domain 1" unit (SH1, a protein kinase), a few SH2 units, and a few SH3 units (Benner et al., 1993). Evolutionary analysis shows that these are all obtained by shuffling, implying that shuffling is an efficient way to search sequence space when no protein folding unit is disrupted.

However, natural evolution provides few examples in which polypeptide chains within a *single* folded unit are shuffled. This is presumably because the buried contacts that hold collections of secondary structural units together are finely tuned to permit packing. Changing a single hydrophobic side chain in a packed protein fold often converts a core that is (typically) as densely packed as an organic crystal into a "molten globule." These biophysical realities would therefore make it surprising if shuffling within a folded unit explored sequence space more effectively than point mutation. Such expectations rely, of course, on the view that natural evolution exploits the most effective ways to search sequence space.

*Use of evolutionary information to create smaller but better libraries.* Alternative approaches now exist to create libraries that search sequence space around parent sequences (Lutz and Patrick, 2004; Jackel et al., 2008; Lutz, 2010). One class of these exploits evolutionary guidance. For example, Cole and Gaucher (2011) introduced an approach called Reconstructing Evolutionary Adaptive Paths (REAP) to create libraries hypothesized to explore local sequence space more efficiently. REAP begins with a phylogenetic analysis of homologous sequences, seeking signatures of functional divergence. An amino acid at a site may be entirely conserved in one branch of a phylogenetic tree but not conserved at all in a second branch. This pattern of divergence, sometimes called *heterotachy*, indicates that the purifying selective pressures operating at this site in the first branch are different from, and stronger than, those in the second. This, in turn, implies that the function of the proteins within the first phylogenetic branch differs from their function in the second branch.
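The kind of pattern REAP looks for can be illustrated with a deliberately crude filter. The sketch below flags alignment columns that are invariant in one clade but variable in another. It is a hypothetical proxy for illustration, not the statistical test of functional divergence used by Cole and Gaucher (2011); the clade assignments, the `min_states` threshold, and the toy alignment are all assumptions.

```python
def residue_states(sequences: list, position: int) -> set:
    """Distinct residues at one alignment column, ignoring gaps ('-')."""
    return {seq[position] for seq in sequences if seq[position] != "-"}


def divergent_sites(clade_a: list, clade_b: list, min_states: int = 2) -> list:
    """0-based columns invariant in one clade but with >= min_states residues in the other.

    A crude proxy for the conserved-in-one-branch, variable-in-another pattern;
    it is not the statistical method used by REAP.
    """
    length = len(clade_a[0])
    if any(len(s) != length for s in clade_a + clade_b):
        raise ValueError("sequences must be aligned to equal length")
    hits = []
    for i in range(length):
        a, b = residue_states(clade_a, i), residue_states(clade_b, i)
        if (len(a) == 1 and len(b) >= min_states) or (len(b) == 1 and len(a) >= min_states):
            hits.append(i)
    return hits


if __name__ == "__main__":
    clade_a = ["MKVLA", "MKVLS", "MKVLT"]
    clade_b = ["MRVIA", "MKVLA", "MQVVA"]
    print(divergent_sites(clade_a, clade_b))
```

With the toy alignment, columns 1, 3, and 4 are flagged: each is invariant in one clade and variable in the other, whereas columns that are invariant everywhere (the candidate "core function" sites) are not.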

Of course, natural history has only rarely selected for a phenotype that a protein engineer desires, and it has rarely been responsive to the specific adaptive changes needed by today's biotechnologist. Ancient polymerases, for example, were most likely *not* evolving to become resistant to heparin, the target of one of the Holliger laboratory's selections. The rationale for exploiting "evolutionary guidance" is therefore more subtle.

A REAP analysis identifies sites that have historically been involved in *some* adaptive event. Because the amino acid at such a site *has* changed, it cannot be *absolutely* required for core function. Conversely, REAP-identified sites are unlikely to be those whose amino acids *never* have a phenotypic impact. The rationale is that sites that were involved in a past adaptive event without loss of core function are good candidates to examine when seeking to adapt the protein to a *new*, biotechnologist-demanded function.

Thus, the rationale behind REAP is the hypothesis that the most productive sites to replace in a protein engineering experiment are neither sites whose amino acids contribute to a core function (as indicated by their absolute conservation) nor sites at which the choice of amino acid is incidental to function (as indicated by their easy variability). By identifying sites at which replacement might have a phenotypic impact without destroying core function, REAP is proposed to have an advantage over other methods in generating libraries with productively altered behaviors. This advantage rests on the fact that nature has already tested replacements at these sites, and the resulting enzymes retained the original activity. Searching for new variants in a REAP library is therefore somewhat like starting from several parent enzymes at once.

In general, designing a high-fidelity DNA polymerase from a medium-fidelity polymerase is largely beyond current structure theory, which puts polymerases with such high-level behaviors out of reach of fully guided protein engineering. As a consequence, many investigators use directed evolution, starting from libraries of polymerase variants, to select for polymerases with properties improved with respect to a desired function. Directed evolution is today considered by many to be the method of choice for protein engineering (Bornscheuer and Pohl, 2001; Yuan et al., 2005; Leemhuis et al., 2009; Turner, 2009).

#### *Compartmentalization*

Directed evolution requires linking phenotype to genotype in a way that allows only genes that confer a desired phenotype to be propagated. This can be done in many ways. One method is compartmentalized self-replication (CSR), which builds on the *in vitro* compartmentalization developed by Tawfik and Griffiths (1998): proteins and genes are held together in water droplets suspended in an oil emulsion. In CSR, each droplet generally receives its gene-protein pair from a single *E. coli* cell encapsulated within it. When the protein is a polymerase variant, its gene is copied only if that variant is active under the conditions of the evolution experiment.

Compartmentalized self-replication was first applied to the directed evolution of DNA polymerases by Ghadessy et al. (2001). Here, a library of polymerase genes was delivered in plasmids to create clones in *E. coli* cells. These cells were dispersed into emulsified water droplets containing the primers and buffer components needed to perform a PCR amplification of the polymerase gene. Approximately 10<sup>8</sup>–10<sup>9</sup> compartments are formed per milliliter of emulsion; ideally, each compartment contains a single variant. PCR cycling is then performed. The first heat step lyses the *E. coli* cell, releasing its expressed thermostable polymerase variant and its encoding plasmid into the buffer, which contains the primers and other components necessary for PCR. The polymerase variant and the contents of the buffer remain encapsulated throughout PCR cycling.
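The goal that "each compartment contains a single variant" can be quantified if one assumes that cells are encapsulated randomly and independently, so that the number of cells per droplet follows a Poisson distribution with mean λ. The sketch below uses that assumption; λ is a parameter of the illustration, not a value reported by Ghadessy et al. (2001).

```python
import math


def droplet_occupancy(mean_cells_per_droplet: float) -> dict:
    """Poisson loading of cells into droplets (assumes random, independent encapsulation)."""
    lam = mean_cells_per_droplet
    empty = math.exp(-lam)
    single = lam * math.exp(-lam)
    return {
        "empty": empty,
        "single": single,
        "multiple": 1 - empty - single,
        # fraction of occupied droplets that hold exactly one cell, i.e. one variant
        "single_among_occupied": single / (1 - empty),
    }


if __name__ == "__main__":
    for lam in (0.1, 0.3, 1.0):
        occ = droplet_occupancy(lam)
        print(lam, round(occ["single_among_occupied"], 3))
```

Under this model, about 95% of occupied droplets hold a single cell at λ = 0.1, against about 58% at λ = 1. Lower loading trades many empty droplets for a cleaner genotype-phenotype link, a trade that the large number of droplets per milliliter makes affordable.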

Polymerases that functioned under the conditions imposed by the experiment could copy only their own genes. After 20 or more PCR cycles, the emulsions are broken to give a pool of PCR products enriched in the genes that encode the selected polymerase variants. These genes can be used directly or introduced into cells for another round of selection. This process is shown schematically in **Figure 3**. With iteration, this process mimics natural evolution, except that the selective pressures are applied by the bioengineer rather than by Nature.
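Why iteration is so effective can be seen from a toy model in which an active variant's gene is recovered a fixed factor (`advantage`, a hypothetical parameter) more efficiently than an inactive one's in each round. Under that assumption, the odds of active to inactive genes simply multiply by that factor every round.

```python
def enrichment_trajectory(active_fraction: float, advantage: float, rounds: int) -> list:
    """Fraction of active-variant genes in the pool after each selection round.

    Toy model: per round, an active variant's gene is recovered `advantage`
    times more efficiently than an inactive one's, so the odds
    (active : inactive) are multiplied by `advantage` each round.
    """
    fractions = [active_fraction]
    f = active_fraction
    for _ in range(rounds):
        f = f * advantage / (f * advantage + (1 - f))
        fractions.append(f)
    return fractions


if __name__ == "__main__":
    for r, f in enumerate(enrichment_trajectory(1e-4, 100, 3)):
        print(r, f"{f:.4f}")
```

Starting from one active variant in 10,000 and a 100-fold advantage, the active fraction rises to roughly 1%, 50%, and 99% over three rounds in this model; real selections are noisier, so the numbers are only indicative.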

Phage display is an alternative way to connect genotype and phenotype. Here, the protein of interest is displayed on the coat of a viral particle that also packages its encoding gene, so that each polymerase variant is physically linked to its gene. The Romesberg laboratory has been especially active in generating polymerase variants using this approach (Xia et al., 2002; Leconte et al., 2005, 2010).

### **EXAMPLES OF MODIFIED POLYMERASES**

#### **CONVERTING A DNA POLYMERASE TO AN RNA POLYMERASE**

Incorporation of ribonucleoside triphosphates, rather than deoxyribonucleoside triphosphates, by a DNA polymerase would disrupt normal pathways in living cells. Accordingly, DNA polymerases share a common mechanism for avoiding ribonucleotide misincorporation: a single active-site residue known as the "steric gate" (Joyce, 1997; Gardner and Jack, 1999; Brown and Suo, 2011). Mutations at the steric gate alone are sufficient to allow a DNA polymerase to incorporate ribonucleoside triphosphates. Yet, product lengths have not exceeded 58 nucleotides, and synthesis generally stalls at +6–7 nucleotides (Gao et al., 1997; Gardner and Jack, 1999; Patel and Loeb, 2000b; Xia et al., 2002; Yang et al., 2002; Ong et al., 2006; McCullum and Chaput, 2009; Brown et al., 2010; Staiger and Marx, 2010; Brown and Suo, 2011). Recently, Cozens et al. (2012) discovered a single amino acid replacement (E664K) in the DNA polymerase from *Thermococcus gorgonarius* that, in conjunction with a "steric gate" mutation, produced a DNA polymerase capable of synthesizing long RNAs, up to 1.7 kb.

**experiments start with the creation of a library of genes encoding variants of a polymerase.** Members of this library are introduced into *E. coli* cells by electroporation. Here, just two variant genes (red and blue) are represented. These genes drive the expression of mutant polymerases in each *E. coli* cell, each of which is isolated in its own water-in-oil emulsion droplet. **(B)** The first cycle of PCR lyses the *E. coli* cell, exposing the expressed polymerase molecules and their gene to the contents of a water droplet containing all of the components necessary for a

polymerase, and (iv) the enzyme expressed by this gene **(C)**. During PCR, any polymerases active under the selective pressure (blue) amplify their respective genes, enriching the pool of mutants having the desired properties; inactive polymerases (red) fail to do so **(D)**. The emulsion is then broken, and the amplified genes, enriched in those encoding polymerases with the desired behaviors, are extracted and inserted into a plasmid vector [circular DNA; **E**]. These then enter the cycle of selection again **(A)**. After repeated cycles, an enriched pool of variants of the original gene is produced.

Using phage display, the Romesberg laboratory evolved a DNA polymerase [the Stoffel fragment (Sf) of *Taq* polymerase] into an RNA polymerase. With just five mutations, one of them the "steric gate" mutation (E615G in *Taq*), the DNA polymerase incorporated ribonucleoside triphosphates (rNTPs) at rates 10<sup>3</sup>–10<sup>4</sup>-fold higher than the wild-type polymerase (Xia et al., 2002). The Holliger laboratory, known for its use of the CSR approach, has produced a variant of *Taq* polymerase that can incorporate both dNTPs and rNTPs; this variant has only four mutations, one of them the same steric gate mutation (Ong et al., 2006).

#### **DNA POLYMERASES ABLE TO BYPASS DEFECTS**

d'Abbadie et al. (2007) shuffled the polymerase genes of three *Thermus* species (*aquaticus*, *thermophilus*, and *flavus*) to generate libraries for a directed evolution experiment seeking DNA polymerases that can extend single, double, and quadruple mismatches, process non-canonical primer-template duplexes, and bypass hydantoins and abasic sites. They applied the resulting polymerases to PCR-amplify cave bear DNA from remains ca. 50,000 years old. These experiments showed that the polymerases obtained by directed evolution from these libraries outperformed *Taq* DNA polymerase and were therefore better able to solve a biotechnological problem, here, the sequencing of damaged ancient genomes.

#### **DNA POLYMERASES ABLE TO ACCEPT EXPANDED GENETIC ALPHABETS**

One of the AEGIS (artificially expanded genetic information system) base pairs created in our laboratories is formed between the nucleotides trivially called Z and P (**Figure 4**). The **Z**:**P** pair has a standard Watson-Crick geometry and is joined by three hydrogen bonds, differing from the standard C:G pair in the arrangement of the donor and acceptor groups that form those hydrogen bonds. Both nucleobases place electron density into the minor groove, density that can accept a hydrogen bond from a polymerase (Geyer et al., 2003). These features allow polymerases to accept d**Z**TP and d**P**TP as substrates and to form duplexes containing **Z**:**P** pairs in primer extension reactions, PCR, and nested PCR architectures.

To improve the ability of polymerases to accept these AEGIS components, we performed a CSR selection among variants of Δ(1–279) *Taq*. Two of the best variants identified, M444V/P527A/D551E/E832V and N580S/L628V/E832V, paused less when challenged *in vitro* to incorporate dZTP opposite P in a template (Laos et al., 2013). Interestingly, although our library was created by introducing random mutations into the *Taq* gene, the selection produced variants with replacements at several sites that have displayed heterotachy (different rates of change) in their natural history (Lopez et al., 2002). Heterotachy is a sequence pattern in which the rate of evolution at an individual site is slow in one portion of the phylogeny but rapid in a different portion. Such patterns arise from shifts in the selective constraints acting at individual sites over the evolutionary history of a gene family; by extension, the precise biomolecular behaviors of the homologous proteins are not identical across the phylogeny (Chen et al., 2010; Cole and Gaucher, 2011). The recurrence of such sites among our selected variants suggests that they were involved in adaptive change during natural polymerase evolution.

Using phage display, the Romesberg laboratory has produced polymerases with an improved ability to incorporate the self-pairing hydrophobic nucleobase analog propynylisocarbostyril (PICS; Leconte et al., 2005).

Loakes et al. (2009) produced a novel polymerase by shuffling the genes of three family A polymerases from the genus *Thermus* (*Taq* from *T. aquaticus*, *Tth* from *T. thermophilus*, and *Tfl* from *T. flavus*) and selecting by CSR.

#### **POLYMERASES FOR SEQUENCING BY SYNTHESIS METHODS**

Sequencing by synthesis (SBS) is a promising next-generation DNA sequencing approach, and several commercial instruments are currently on the market. Common features of these products include the use of solid-phase chemistry to amplify the initial sample and the use of reversible terminators. Reversible terminators are nucleotides that generally carry two modifications: one at the 3′-OH and the other at either the 5 or the 7 position of the nucleobase. The 3′-hydroxyl carries a cleavable moiety that terminates the polymerase extension reaction after a single-base incorporation. Some reversible terminators, however, carry no modification on the 3′-hydroxyl, such as the scarless photocleavable terminators of LaserGen (Wu et al., 2007), the virtual terminator of Helicos BioSciences (Bowers et al., 2009), and the more recently reported Lightning Terminators™ developed at New England Biolabs (Gardner et al., 2012).

The other modification, at C-5 of pyrimidines or C-7 of 7-deazapurines, is a fluorescent molecule that serves as a reporter for each of the individual bases. These positions are used because they point away from the catalytic pocket of the enzyme. Gardner and Jack (1999) studied variants of Vent DNA polymerase from the hyperthermophilic archaeon *Thermococcus litoralis*, replacing a tyrosine residue that is highly conserved in family B polymerases and was proposed to act as a steric gate.
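The logic of cyclic reversible termination, and one reason polymerase efficiency on these modified substrates matters, can be sketched as follows. The first function models an ideal run in which each cycle adds exactly one blocked, dye-labeled nucleotide; the second assumes (hypothetically) that each cycle succeeds independently with a fixed probability, so strands that miss a cycle fall out of phase with the rest of the population. Function names and parameters are illustrative.

```python
from math import comb

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def ideal_read(template: str, cycles: int) -> str:
    """Bases reported when every cycle adds exactly one 3'-blocked nucleotide.

    `template` is written in the order the polymerase copies it (3' to 5').
    """
    read = []
    for base in template[:cycles]:
        # incorporate the complementary terminator, image its dye,
        # then cleave the dye and the 3' block before the next cycle
        read.append(COMPLEMENT[base])
    return "".join(read)


def phase_distribution(step_efficiency: float, cycles: int) -> list:
    """P(a strand has been extended k times after `cycles` cycles), for k = 0..cycles.

    Assumes each cycle independently succeeds (incorporation plus deblocking)
    with probability `step_efficiency`; strands that miss a cycle lag behind.
    """
    e = step_efficiency
    return [comb(cycles, k) * e**k * (1 - e) ** (cycles - k) for k in range(cycles + 1)]


if __name__ == "__main__":
    print(ideal_read("TACGGA", 6))
    in_phase = phase_distribution(0.99, 100)[-1]
    print(f"in phase after 100 cycles at 99% per step: {in_phase:.2f}")
```

With a 99% per-cycle success rate, only about 37% of strands remain in phase after 100 cycles under this model, which illustrates why polymerases that incorporate modified nucleotides efficiently are valuable for SBS.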

The Romesberg laboratory has found polymerases with an improved ability to incorporate a fluorophore-modified dUTP (dUTP-Fl) that can be used for SBS. Leconte et al. (2010) generated a library of Sf, which is *Taq* DNA polymerase lacking the first 289 amino acids. This fragment retains polymerase activity but lacks the 5′→3′ exonuclease domain. The library was constructed by shuffling the genes of six homologous polymerases, from *Thermus aquaticus*, *Thermus thermophilus*, *Thermus caldophilus*, *Thermus filiformis*, *Spirochaeta thermophila*, and *Thermomicrobium roseum*. Among the three most active polymerase mutants were Sf168 (19 mutations) and Sf197 (14 mutations). These mutants showed a 10- to 50-fold increase in the efficiency of dUTP-Fl incorporation compared with wild-type Sf (Leconte et al., 2010).

Our laboratories have produced a variant of *Taq* polymerase by using evolutionary analysis to design a polymerase library and then screening a relatively small library for polymerases able to accept unnatural triphosphates modified on their sugar units. Using REAP, we identified 35 heterotachous sites, after filtering the candidates with additional information from evolutionary history, structural biology, and experiments. We then asked which replacements improve the ability of *Taq* polymerase to accept reversibly terminating triphosphates in which the 3′-OH group of the nucleoside triphosphate had been replaced by an –ONH<sub>2</sub> group, which prevents continued primer extension. A single replacement (L616A) appears to open space behind Phe-667, allowing the enzyme to accommodate a larger 3′-substituent (Chen et al., 2010).

When selecting for variants that accepted 2′-deoxycytidine derivatives carrying appended Cy3 and Cy5 fluorescent dyes, the Holliger laboratory recovered variants of *Pfu* DNA polymerase, each having two to six amino acid replacements (Ramsay et al., 2010).

### **COMMERCIAL APPLICATIONS**

The number of commercial applications for non-standard nucleotides is large and growing, implying a growing need for engineered polymerases. This review cannot describe all of the potential commercial applications, but a brief summary of those that already exist indicates their scope. For example, Sherrill et al. (2004) used isoC and isoG modified with reporter molecules to develop an assay to detect both RNA and DNA. These non-standard nucleotides are also used to quantify levels of HIV and hepatitis viruses in patients (Collins et al., 1997; Elbeik et al., 2004a,b) and to diagnose a panel of respiratory diseases (Nolte et al., 2007).

Sequencing-by-synthesis technology relies on nucleoside derivatives modified in two ways: with a fluorescent tag and (usually) with a reversibly terminating blocking group (Fedurco et al., 2006; Turcatti et al., 2008). Several commercially available modified polymerases have been suggested to improve the procedure (Aird et al., 2011; Fisher et al., 2011; Quail et al., 2012). Real-time sequencing also requires polymerases that accept nucleoside derivatives (Eid et al., 2009).

At least one polymerase obtained by directed evolution is commercially available; it was selected for the ability to incorporate dZTP opposite dP in a template and is available through Firebird Biomolecular Sciences LLC (www.firebirdbio.com).

### **CONCLUSION AND PERSPECTIVES**

The demand for polymerases capable of incorporating unnatural nucleotides is certain to grow as interest in building modified DNA structures continues, including alternative genetic alphabets (Geyer et al., 2003), highly tagged substrates (Hollenstein et al., 2009), modified backbones (Pinheiro et al., 2012), and other unusual structures (Fa et al., 2004; Leconte et al., 2005; Hirao et al., 2007).

The literature teaches that, in some cases, simple downstream screening can yield polymerases with the needed properties. This is illustrated by the efforts of Tabor and Richardson (1995), Gardner and Jack (1999), and Chen et al. (2010), to name a few examples, to create polymerases that accept various 3′-terminating groups. In these cases, the combination of structural biology and evolutionary biology analyses was powerful enough to identify regions of sequence space small enough to be screened yet containing polymerases with the desired properties. In each case, screening began with a relatively small number of variants drawn from the sequence space around a deftly chosen parent, so that useful enzymes could be found by inspection. However, because protein sequence space is so large, successes in designing polymerases are infrequent and rarely (if ever) *de novo*.

In our experience, the outcome of directed evolution experiments can be explained by analysis of the evolutionary history of the protein; here, we found the heterotachy pattern. Heterotachy analysis was originally used to design a small library that was screened for variants of *Taq* polymerase (Chen et al., 2010). We later found that several of the substitutions recovered in our directed evolution experiment with *Taq* polymerase fell at sites with this pattern.

It is interesting to note that some mutations reported in the literature by other research groups occur at amino acid sites considered heterotachous by our analysis. One particular amino acid change (D578N), found in our selection of polymerases better able to synthesize duplexes containing Z:P pairs, occurred at a site that was also substituted in the CSR experiment that obtained a *Taq* variant resistant to heparin inhibition (D578G; Ghadessy et al., 2001). Remarkably, position 614 of *Taq* polymerase has been reported at least three times as the outcome of directed evolution experiments (Patel et al., 2001; Xia et al., 2002; Fa et al., 2004). Other sites in *Taq* polymerase considered heterotachous and reported in directed evolution experiments are D144, F598 (Ghadessy et al., 2004), A597, A600, E615 (Xia et al., 2002), and L616 (Patel et al., 2001). This supports the general hypothesis behind REAP: sites involved in adaptation to one environmental novelty might also help adaptation to environmental novelties more generally.

We believe that the recapitulation of the natural history of proteins reflects the fact that these sites can be changed to meet new challenges presented to polymerases without damaging the catalytic power or fidelity of the proteins. The observations in the literature underline the importance of understanding the evolution of polymerases when designing libraries to better explore their sequence space. It will be interesting to study further the outcomes of contemporary *in vitro* selection experiments and the extent to which they recapitulate natural history.

For less effectively constructed libraries of variants, including those generated by shuffling and by undirected mutagenesis, selection tools can go where screening cannot. Here, CSR and phage display have been especially useful, yielding polymerases that support the copying of entirely different genetic systems (Pinheiro et al., 2012).

Although some of the mutations found to be useful for altering polymerases to accept unnatural nucleotides fall on or near the conserved motifs and could potentially be rationalized, there are still a number of mutations that cannot be easily explained and their effect could be subtle.

Some approaches that have proven useful for evolving other enzyme systems have not yet been applied to DNA polymerases; for example, neutral drift libraries (Amitai et al., 2007; Bloom et al., 2007a,b) have yet to be used as starting points for directed evolution experiments with DNA polymerases.

Further, we (and many others) are seeking to develop living systems that implement a "synthetic biology" based on unnatural DNA analogs. These have the potential for being "biosafe" platforms for artificial metabolisms, fermentations, diagnostics, and therapeutic tools, *inter alia* (Schmidt, 2010).

#### **ACKNOWLEDGMENTS**

The authors are thankful to Dr. Dietlind Gerloff from FFAME for useful suggestions in the preparation of this manuscript. We are indebted to DTRA for funding through grant HDTRA1-13-1-0004.

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 June 2014; accepted: 07 October 2014; published online: 31 October 2014. Citation: Laos R, Thomson JM and Benner SA (2014) DNA Polymerases engineered by directed evolution to incorporate non-standard nucleotides. Front. Microbiol. 5:565. doi: 10.3389/fmicb.2014.00565*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Laos, Thomson and Benner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A novel thermostable polymerase for RNA and DNA loop-mediated isothermal amplification (LAMP)

#### *Yogesh Chander<sup>1</sup>, Jim Koelbl<sup>1</sup>, Jamie Puckett<sup>1</sup>, Michael J. Moser<sup>1</sup>, Audrey J. Klingele<sup>1</sup>, Mark R. Liles<sup>2</sup>, Abel Carrias<sup>2</sup>, David A. Mead<sup>1</sup> and Thomas W. Schoenfeld<sup>1</sup>\**

*<sup>1</sup> Lucigen Corporation, Middleton, WI, USA*

*<sup>2</sup> Department of Biological Sciences, Auburn University, Auburn, AL, USA*

#### *Edited by:*

*Andrew F. Gardner, New England Biolabs, USA*

#### *Reviewed by:*

*Marla Tuffin, University of the Western Cape, South Africa*

*Daniel M. Jenkins, University of Hawaii, USA*

#### *\*Correspondence:*

*Thomas W. Schoenfeld, Lucigen Corporation, 2905 Parmenter Street, Middleton, WI 53719, USA e-mail: tschoenfeld@lucigen.com*

Meeting the goal of providing point-of-care (POC) tests for molecular detection of pathogens in low-resource settings places stringent demands on all aspects of the technology. OmniAmp DNA polymerase (Pol) is a thermostable viral enzyme that enables true POC use in clinics or in the field by overcoming important barriers to isothermal amplification. In this paper, we describe the multiple advantages of OmniAmp Pol as an isothermal amplification enzyme and provide examples of its use in loop-mediated isothermal amplification (LAMP) for pathogen detection. The inherent reverse transcriptase activity of OmniAmp Pol allows single-enzyme detection of RNA targets in RT-LAMP. Common methods of nucleic acid amplification are highly susceptible to sample contaminants, necessitating elaborate nucleic acid purification protocols that are incompatible with POC or field use. OmniAmp Pol was found to be less inhibited by whole blood components typical of certain crude sample preparations. Moreover, the thermostability of the enzyme, greater than that of alternative DNA polymerases (*Bst*) and reverse transcriptases, allows heat pretreatment of complete reaction mixes immediately prior to amplification, which facilitates amplification of highly structured genome regions. Compared to *Bst*, OmniAmp Pol has a faster time to result, particularly with more dilute templates. Molecular diagnostics in field settings can be challenging due to the lack of refrigeration. The stability of OmniAmp Pol is compatible with a dry format that enables long-term storage at ambient temperatures. A final requirement for field operability is compatibility with either commonly available instruments or, in other cases, a simple, inexpensive, portable detection mode requiring minimal training or power. Detection of amplification products is shown using lateral flow strips and analysis on a real-time PCR instrument.
Results of this study show that OmniAmp Pol is ideally suited for low resource molecular detection of pathogens.

**Keywords: diagnostics, RNA/DNA polymerase, infectious diseases, RT-LAMP, point-of-care**

### **INTRODUCTION**

Rapid, sensitive, easy-to-use methods for detection of pathogens are needed for timely diagnosis of infectious diseases, especially at the point of care (POC). Common molecular detection methods by end-point and real-time PCR are valuable tools for pathogen detection and are widely used in clinical diagnostics because of their high sensitivity and specificity (Segawa et al., 2014). However, there are several problems in implementing these methods at POC, particularly the need for trained personnel, extensive sample preparation protocols, and specialized laboratory equipment, which have prevented use of these methods in resource-limited settings. Isothermal amplification methods such as loop-mediated isothermal amplification (LAMP; Notomi et al., 2000) hold great promise to shorten nucleic acid detection times, simplify instrumentation, and reduce power requirements by eliminating the need for thermal cycling. These improvements are facilitating the movement of nucleic acid tests (NATs) from the central laboratory to POC environments like clinics, hospital emergency rooms, farms, and other remote areas of need, but these methods still require improvement before fulfilling their potential.

While LAMP has proven highly useful in laboratory environments, current formats have limited application under POC conditions (Njiru, 2012). Improvements in the DNA polymerase and in detection modes could allow use of the LAMP platform for POC testing in resource-limited settings. A key drawback of typical LAMP formulations is the inability to detect RNA directly without a second enzyme, a reverse transcriptase. Most LAMP methods currently use a truncated product (large fragment) of *Bacillus stearothermophilus* (*Bst*) Pol (Huang et al., 1999) or a highly similar enzyme from a closely related, moderately thermophilic bacterium. While this enzyme is highly effective for amplification of DNA targets, it cannot amplify RNA without the addition of a reverse transcriptase to convert the RNA template to cDNA, which then serves as the LAMP target. This adds steps and necessitates a buffer that is a compromise between the optimal conditions for the two enzymes. The requirement for a reverse transcriptase also limits the thermal stability of the reaction.

Some of the most important RNA targets for diagnostic detection are viral genomes, which can be highly structured. Thermal treatment during sample preparation immediately prior to amplification has been indispensable for direct detection of bacterial and viral targets. However, the currently available reverse transcriptases, Avian Myeloblastosis Virus RT (AMV RT) and Moloney Murine Leukemia Virus RT (MMLV RT), are relatively labile, so thermal melting to relieve secondary structure has not been possible with any current LAMP system. While single-enzyme RT-LAMP methods are available (http://www.optigene.co.uk/), most RT-LAMP uses the two-enzyme format.

To use NATs in POC settings, a simple, easy-to-use method for detecting amplification is essential. Ideally, the detection step would confer additional specificity and sensitivity while keeping total testing costs low. The most common detection methods for LAMP, such as agarose gel electrophoresis or real-time PCR instruments, are prohibitively expensive, slow, and require extensive user training. Dyes such as calcein (Tomita et al., 2008) or hydroxynaphthol blue (HNB; Goto et al., 2009) added to the LAMP reaction mixture allow direct visual detection of amplification, but they do not improve specificity, and their ambiguous readouts require more user judgment than is acceptable for POC use. An alternative detection mode is the lateral flow device (LFD), which is portable and requires neither instrumentation nor electrical power. The combination of LAMP and LFD provides an inexpensive, facile tool for NATs in remote, low resource environments. Finally, because a refrigerated cold chain is unavailable in many low resource settings, a molecular POC test must also remain stable during distribution and storage under ambient conditions.

Screening viral metagenomes from boiling hot springs uncovered new thermostable DNA polymerases (Schoenfeld et al., 2008). An engineered derivative of one of these, PyroPhage 3173 DNA polymerase, was effective in RT-PCR (Moser et al., 2012). This enzyme exhibits innate reverse transcriptase activity, thermostability, and potent strand-displacing activity, and it has now been formulated for direct detection of RNA and DNA pathogens by LAMP. Its thermostability provides the flexibility to include a thermal treatment in sample preparation and to amplify highly structured genome regions.

In this report, we describe the use of this novel polymerase in LAMP and RT-LAMP (reverse transcription LAMP). To understand the potential applications and limitations of OmniAmp polymerase in LAMP, we selected a diverse group of DNA and RNA targets (**Table 1**). In addition to developing a LAMP method for each pathogen, we evaluated a lateral flow device for detecting amplification and validated dried reagents that are stable under ambient storage, as steps toward POC LAMP assays.

### **MATERIALS AND METHODS**

#### **LAMP ENZYMES**

The discovery and initial characterization of PyroPhage 3173 DNA polymerase and its application in RT-PCR have been described previously by Schoenfeld et al. (2008) and Moser et al. (2012). The wild-type DNA polymerase had a potent proofreading exonuclease activity that was disabled by mutagenesis. The modified enzyme was formulated for use in LAMP and RT-LAMP and is commercially available as OmniAmp polymerase (Lucigen Corporation, Middleton, WI). *Bst* DNA polymerase (Lucigen Corporation, WI) was used for comparison with OmniAmp polymerase in DNA LAMP assays.

#### **PATHOGENS**

**Table 1** lists the pathogens for which LAMP assays were developed. Pathogens were obtained from various sources, and nucleic acids (DNA or RNA) were extracted from overnight cultures (bacteria) or from cell culture supernatants (viruses) using commercial kits (Qiagen, Valencia, CA). For Ebola virus (EBoV) and Crimean-Congo hemorrhagic fever virus (CCHFV), agents of viral hemorrhagic fever, RNA was extracted in a BSL-4 facility at Galveston National Laboratory, TX, and tested for safety for use in a BSL-II laboratory before being transferred to Lucigen.

**Table 1 | List of targets for which LAMP assays were developed using OmniAmp polymerase.**

*\*RNA extracts were provided by Galveston National Laboratory, TX, and were certified for use in a BSL-II facility.*

**Table 2 | List of LAMP primers used in this report.**

#### **LAMP PRIMER DESIGN**

For each pathogen, LAMP primers targeting conserved regions were designed using the online primer design utility PrimerExplorer (https://primerexplorer.jp/e/). Conserved regions of the target genes were identified by aligning target gene sequences from GenBank (www.ncbi.nlm.nih.gov) with ClustalW (www.megasoftware.net). Sequences of the conserved regions (200–300 bp) identified in the alignment were used to design LAMP primers. Primer designs were selected for 100% predicted specificity based on BLAST analysis (www.ncbi.nlm.nih.gov); the primers are listed in **Table 2**.
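The conserved-region search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the ClustalW alignment has already been read into equal-length strings with `-` for gaps, and it applies a strict criterion (every sequence identical and ungapped at a column). The helper names `conserved_columns` and `conserved_windows` and the toy alignment are hypothetical; for real targets, `min_len` would be set near the 200–300 bp regions used here, and some tolerance for sequence variation might be needed.

```python
from typing import List, Tuple


def conserved_columns(aligned: List[str]) -> List[bool]:
    """Flag alignment columns that are ungapped and identical in every sequence."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    flags = []
    for col in range(length):
        bases = {seq[col].upper() for seq in aligned}
        flags.append(len(bases) == 1 and "-" not in bases)
    return flags


def conserved_windows(aligned: List[str], min_len: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges of fully conserved runs of at least min_len."""
    windows = []
    start = None
    # A trailing False acts as a sentinel so that a run reaching the last column is closed.
    for i, ok in enumerate(conserved_columns(aligned) + [False]):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                windows.append((start, i))
            start = None
    return windows


if __name__ == "__main__":
    # Toy alignment (hypothetical); column 7 differs in the second sequence.
    toy_alignment = ["ACGTACGTTTGA", "ACGTACGATTGA", "ACGTACGTTTGA"]
    print(conserved_windows(toy_alignment, min_len=5))
```

The returned windows would then be exported as candidate regions for primer design; lowering `min_len` exposes shorter conserved stretches that the strict filter would otherwise discard.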

For use in LAMP reactions, a 20X primer mix was prepared by combining all six primers (F3 and B3 : FL and BL : FIP and BIP) in a 1:4:8 ratio (Nagamine et al., 2002). The primer mix was stored at −20°C until use.
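The 1:4:8 ratio translates into concrete pipetting volumes through C1V1 = C2V2. The sketch below assumes 1X concentrations of 0.2 μM for F3/B3 (hence 0.8 μM for FL/BL and 1.6 μM for FIP/BIP), 200 μM primer stocks, and a 100 μl batch of 20X mix; none of these absolute values are stated in this report, and the function names are hypothetical.

```python
from typing import Dict

# Outer : loop : inner primer ratio of 1:4:8 (Nagamine et al., 2002), as stated in the Methods.
RATIO = {"F3": 1, "B3": 1, "FL": 4, "BL": 4, "FIP": 8, "BIP": 8}


def final_concentrations(outer_um: float) -> Dict[str, float]:
    """1X (in-reaction) concentration of each primer in uM, scaled from the outer-primer value."""
    return {name: outer_um * r for name, r in RATIO.items()}


def primer_mix_recipe(final_um: Dict[str, float], stock_um: float,
                      fold: float = 20.0, mix_volume_ul: float = 100.0) -> Dict[str, float]:
    """Microliters of each primer stock, plus water, for mix_volume_ul of a fold-X primer mix.

    Uses C1 * V1 = C2 * V2, where C2 = fold * (1X concentration).
    """
    recipe = {name: conc * fold * mix_volume_ul / stock_um for name, conc in final_um.items()}
    water = mix_volume_ul - sum(recipe.values())
    if water < 0:
        raise ValueError("primer stocks are too dilute to make this fold-X mix")
    recipe["water"] = water
    return recipe


def mix_per_reaction(reaction_ul: float = 25.0, fold: float = 20.0) -> float:
    """Volume of fold-X primer mix that gives 1X in a reaction of reaction_ul."""
    return reaction_ul / fold


if __name__ == "__main__":
    final = final_concentrations(outer_um=0.2)  # assumed 0.2 uM F3/B3 at 1X
    for name, vol in primer_mix_recipe(final, stock_um=200.0).items():
        print(f"{name}: {vol:.2f} ul")
    print(f"20X mix per 25 ul reaction: {mix_per_reaction():.2f} ul")
```

Under these assumptions the batch needs 2 μl of each outer primer, 8 μl of each loop primer, 16 μl of each inner primer and 48 μl of water, and 1.25 μl of the 20X mix gives 1X in a 25 μl reaction. With 100 μM stocks the primer volumes alone would exceed the batch volume, which is why the function checks for a negative water volume.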

#### **OPTIMIZATION OF LAMP ASSAY**

LAMP assays were developed using OmniAmp 2X Isothermal Master Mix (Lucigen Corporation, WI). This master mix is formulated for LAMP and contains optimal concentrations of betaine, salts, dNTPs, and OmniAmp polymerase. Reactions were formulated and performed as described in Lucigen's OmniAmp manual. Each 25 μl reaction contained 1X OmniAmp Master Mix, 2 mM Fiona Green dye (Marker Gene, OR), 1X LAMP primer mix (IDT, IA; 20X stock solution), and 5 μl of target (DNA or RNA), brought to volume with DNase- and RNase-free water. Reactions were incubated at constant temperature for the indicated times in a real-time thermocycler (iQ5, Bio-Rad, CA), and Fiona Green fluorescence was measured and quantified by the instrument software at 30 s intervals. The time to result (TTR) was defined as the time at which fluorescence crossed a threshold set at 10% of maximal fluorescence; samples that failed to cross the threshold were considered negative.

In each case, at least three primer sets were synthesized and compared for TTR and specificity. Post-amplification melt analysis was used to distinguish correct (target-dependent) from spurious (target-independent) amplification products. To further verify specificity, reaction products were visualized by electrophoresis on ethidium bromide-stained 2% agarose gels. Optimal amplification temperatures for each assay were determined using a temperature gradient from 66 to 74°C. To determine assay sensitivity, 10-fold serial dilutions of DNA or RNA were prepared in water and tested by LAMP.
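The TTR rule can be expressed as a short function. This sketch makes assumptions that the Methods do not spell out: each curve is baseline-corrected by subtracting its first reading, the threshold is 10% of the maximal fluorescence observed across the whole run (so that a flat no-template curve can fail to reach it), and the crossing time is linearly interpolated between the 30 s reads. The function names and the example readings are hypothetical, and instrument software may compute TTR differently.

```python
from typing import List, Optional, Sequence

READ_INTERVAL_S = 30.0  # fluorescence read every 30 s, as stated in the Methods


def baseline_subtract(trace: Sequence[float]) -> List[float]:
    """Subtract the first reading so every curve starts at zero (assumed baseline method)."""
    return [value - trace[0] for value in trace]


def run_threshold(traces: Sequence[Sequence[float]], fraction: float = 0.10) -> float:
    """Threshold set at `fraction` of the maximal baseline-subtracted fluorescence in the run."""
    return fraction * max(max(trace) for trace in traces)


def time_to_result(trace: Sequence[float], threshold: float,
                   interval_s: float = READ_INTERVAL_S) -> Optional[float]:
    """Minutes until the trace first reaches the threshold, interpolated linearly between reads.

    Returns None when the threshold is never reached, i.e. the sample is called negative.
    """
    for i, value in enumerate(trace):
        if value >= threshold:
            if i == 0:
                return 0.0
            previous = trace[i - 1]  # below threshold, so value - previous > 0
            fraction = (threshold - previous) / (value - previous)
            return (i - 1 + fraction) * interval_s / 60.0
    return None


if __name__ == "__main__":
    raw = {
        "sample": [100.0, 100.0, 101.0, 105.0, 110.0],  # illustrative reads, not study data
        "ntc": [100.0, 100.2, 100.3, 100.2, 100.1],
    }
    curves = {name: baseline_subtract(trace) for name, trace in raw.items()}
    threshold = run_threshold(list(curves.values()))
    for name, curve in curves.items():
        print(name, time_to_result(curve, threshold))
```

For these illustrative readings the sample crosses the 1.0-unit threshold at 1.0 min, while the no-template control never reaches it and is therefore called negative.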

#### **DEVELOPMENT OF RAPID SAMPLE PREPARATION METHOD**

We also evaluated a simple heat lysis method for extracting nucleic acids from different clinical matrices. Heat lysis was performed by diluting the sample into an extraction buffer, followed by incubation at 90°C for 5 min. The lysates were then used as template in LAMP reactions as described above (Optimization of LAMP Assay).

To test this method, sheep whole blood (Hemostat Laboratories, CA) was spiked with particles of the *E. coli* RNA phage MS2 and then 10-fold serially diluted in the same matrix. As a control, 10-fold dilutions of virus particles were made in Tris buffer. Each spiked sample was divided in two: one part was extracted by heat lysis, and the other by a commercial viral nucleic acid extraction kit (QIAamp Viral RNA extraction kit, Qiagen, CA). For heat lysis, samples were diluted in a Tris-EDTA extraction buffer (Lucigen Corporation, WI) and incubated at 90°C for 5 min. The extracts from both methods were then used directly as template in LAMP.

#### **LAMP WITH LYOPHILIZED REAGENTS**

To allow ambient storage of formulated LAMP reagents, 1X isothermal master mix, including OmniAmp polymerase, was prepared without glycerol, primers, or Fiona Green dye. The LAMP formulation was lyophilized using patented technology from BioLyph (Hopkins, MN). Lyophilized reactions were rehydrated with template, primers, and dye to a total volume of 25 μl, then incubated isothermally and monitored in a real-time thermocycler as described above.

#### **DETECTION OF AMPLIFICATION BY USING LATERAL FLOW DEVICE**

To simplify detection of positive reactions, we evaluated the use of an LFD. For this application, the forward and reverse loop primers were synthesized with 5′-conjugated biotin and FITC, respectively. The LF strips were prepared in-house (Lucigen Corporation, Middleton, WI) using an anti-biotin antibody (Thermo Scientific, IL) for capture and a colloidal gold-conjugated anti-FITC antibody (British Biocell International, UK) for detection. In this application, LAMP was performed as described above using the labeled loop primers, and after completion the reaction products were loaded on the LFD. A positive reaction was indicated by red lines at both "Control" and "Test," whereas a red line only at "Control" indicated a negative reaction.
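The strip-reading rule above reduces to a small decision function. The "invalid" outcome for a strip without a control line is an assumption based on common lateral flow practice and is not described in this report; the function name is hypothetical.

```python
def interpret_lfd(control_line: bool, test_line: bool) -> str:
    """Call a lateral flow strip from the red lines observed after loading a LAMP reaction.

    The test line forms when biotin-labeled amplicon is captured by the anti-biotin antibody
    and made visible by gold-conjugated anti-FITC, so it requires amplicons carrying both labels.
    """
    if not control_line:
        # Assumption: a missing control line means the strip failed and the result is not usable.
        return "invalid"
    return "positive" if test_line else "negative"


if __name__ == "__main__":
    for control, test in [(True, True), (True, False), (False, True)]:
        print(control, test, interpret_lfd(control, test))
```

Keeping the control-line check first ensures that a test line on a failed strip is never reported as a positive result.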

This method was evaluated using two strains of *Edwardsiella ictaluri* (S97-9773 and 219). Specificity was determined using one strain each of *Edwardsiella tarda* and *Escherichia coli* (DH10B). For LAMP, six serial 100-fold dilutions (designated −2, −4, −6, −8, −10, and −12, i.e., 10<sup>−2</sup> to 10<sup>−12</sup>) of each strain were made in Tryptic Soy Broth (TSB) from overnight cultures. These dilutions were used directly as template in *E. ictaluri* LAMP assays, and the reaction products were then loaded onto the LFD for visualization.
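Because each 100-fold step lowers the cell number by two orders of magnitude, it helps to estimate how many cells each dilution delivers to a reaction. The sketch below assumes 5 μl of template per reaction (as in the LAMP reaction setup), an illustrative starting titer of 10⁹ CFU/mL, and Poisson-distributed cells in the template aliquot; the function names are hypothetical, and the Poisson model is an added assumption rather than part of this study's analysis.

```python
import math
from typing import Dict, Iterable


def cells_per_reaction(start_cfu_per_ml: float, exponents: Iterable[int],
                       template_ul: float = 5.0) -> Dict[int, float]:
    """Expected CFU delivered per reaction for each log10 dilution exponent (e.g. -6 means 10^-6)."""
    return {e: start_cfu_per_ml * 10.0 ** e * template_ul / 1000.0 for e in exponents}


def prob_at_least_one(mean_cells: float) -> float:
    """Poisson probability that a template aliquot contains at least one cell."""
    return 1.0 - math.exp(-mean_cells)


if __name__ == "__main__":
    # Illustrative starting titer of 1e9 CFU/mL; dilution exponents as in the Methods.
    series = cells_per_reaction(1e9, exponents=[-2, -4, -6, -8, -10, -12])
    for e, mean in series.items():
        print(f"10^{e}: {mean:.3g} CFU per reaction, P(>=1 cell) = {prob_at_least_one(mean):.3f}")
```

Under these assumptions, the 10<sup>−6</sup> dilution delivers about five cells per reaction, while the 10<sup>−8</sup> dilution is expected to contain no cell in most aliquots, so the 100-fold spacing brackets the detection limit only coarsely.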

### **RESULTS**

#### **COMPARISON OF OmniAmp AND** *Bst* **POLYMERASES**

Performance of OmniAmp polymerase in a LAMP reaction was compared with that of *Bst* polymerase (**Figure 1**). The temperature optimum of OmniAmp is about 70°C, while that of *Bst* is 65°C. At its optimal temperature, the OmniAmp polymerase was significantly faster than *Bst* polymerase. This translates to a shorter time to result (TTR), as shown for detection of a DNA target from *Edwardsiella ictaluri*, an important catfish pathogen. The TTR advantage was more pronounced at lower template concentrations, where detection by OmniAmp polymerase was 20% faster (**Figure 2**).

#### **DETECTION OF DNA TARGETS**

OmniAmp Pol-based LAMP assays were developed for detection of three DNA targets: *Staphylococcus aureus*, *Bacillus atrophaeus* (BAT), and Porcine circovirus type 2 (PCV-2). LAMP primer designs were tested with serial 10-fold dilutions of DNA under optimized reaction conditions. Amplification of all three pathogens was achieved in <30 min with minimal non-specific amplification (**Figure 3A**). In the *S. aureus* LAMP assay, amplification was observed in the no-template control (NTC), but this product was non-specific, as shown by its different melt temperature (82.5°C for the specific product vs. 84°C for the non-specific product; data not shown).
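The melt-curve discrimination used for the *S. aureus* NTC product can be sketched as follows. Tm is estimated as the temperature of the steepest fluorescence decrease (the maximum of −dF/dT), and a product is called specific if its Tm lies near the 82.5°C expected for the target amplicon. The ±0.75°C tolerance, the function names, and the example data are assumptions for illustration; instrument software may additionally smooth the derivative before calling peaks.

```python
import math
from typing import Sequence


def melt_peak_tm(temps: Sequence[float], fluor: Sequence[float]) -> float:
    """Estimate Tm as the midpoint of the temperature step with the steepest fall in fluorescence (max -dF/dT)."""
    if len(temps) != len(fluor) or len(temps) < 2:
        raise ValueError("need matching temperature and fluorescence series of length >= 2")
    best_tm, best_rate = temps[0], -math.inf
    for i in range(1, len(temps)):
        rate = -(fluor[i] - fluor[i - 1]) / (temps[i] - temps[i - 1])
        if rate > best_rate:
            best_rate = rate
            best_tm = (temps[i] + temps[i - 1]) / 2.0
    return best_tm


def is_specific(tm: float, expected_tm: float = 82.5, tolerance: float = 0.75) -> bool:
    """True if the melt peak lies within `tolerance` degrees C of the specific product's Tm."""
    return abs(tm - expected_tm) <= tolerance


if __name__ == "__main__":
    # Synthetic sigmoidal melt curve centred on 82.5 C, read every 0.5 C (not study data).
    temps = [75.25 + 0.5 * i for i in range(30)]
    fluor = [1.0 / (1.0 + math.exp((t - 82.5) / 0.4)) for t in temps]
    tm = melt_peak_tm(temps, fluor)
    print(tm, is_specific(tm), is_specific(84.0))
```

With these settings a product melting at 84°C, like the non-specific NTC product described above, falls outside the tolerance window and is flagged as non-specific.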

After amplification, reaction products were separated on a 2% agarose gel, and the ladder-like banding pattern characteristic of LAMP confirmed the expected amplification products. **Figure 3B** shows PCV-2 LAMP reaction products on a 2% gel.

To determine the limit of detection, 10-fold serial dilutions of PCV-2 DNA were prepared in water, and each dilution was tested in triplicate by LAMP. The results (**Figure 4**) show high sensitivity of the LAMP assay for PCV-2, with a limit of detection of about 4 DNA copies per μl. Regression analysis showed good correlation (*R*<sup>2</sup> = 0.95) between template dilution and time to result (minutes). No amplification signal was detected in any of the negative (no-template) controls.
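The regression of TTR against template amount can be sketched as an ordinary least-squares fit on log10 copy number, a common convention for amplification standard curves. The dilution series and TTR values below are hypothetical and are not the Figure 4 data; the function name is an assumption.

```python
import math
from typing import Sequence, Tuple


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)  # zero if all TTRs are identical
    return slope, intercept, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    copies = [4.0, 40.0, 400.0, 4000.0, 40000.0]  # hypothetical 10-fold series
    ttr_min = [24.0, 21.5, 18.0, 15.5, 12.0]      # hypothetical mean TTRs, not Figure 4 data
    log_copies = [math.log10(c) for c in copies]
    slope, intercept, r2 = linear_fit(log_copies, ttr_min)
    print(f"slope = {slope:.2f} min per log10 copies, R^2 = {r2:.3f}")
```

A negative slope means TTR shortens as template increases; an R<sup>2</sup> close to 1 indicates that TTR tracks the dilution closely, which is what allows TTR to serve as a rough indicator of template amount.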

#### **DETECTION OF RNA TARGETS**

OmniAmp Pol has inherent reverse transcriptase activity, which enables RT-LAMP detection of RNA targets without modifying the formulation used for DNA targets. RT-LAMP assays were developed for six viruses with RNA genomes (**Table 1**). All were detected in less than 30 min with no additional steps or changes to the reaction formulation (**Figure 5A**). After LAMP, reaction products were separated on a 2% agarose gel, and ladder-like patterns confirmed the correct amplification product, as shown for MS2 in **Figure 5B**.

The thermostability of OmniAmp Pol proved indispensable for detecting certain highly structured genome regions, which are especially common in RNA viruses. We evaluated this advantage while developing a LAMP method for detection of BVDV type 1. Primers targeting the 5′-UTR failed to amplify with standard isothermal incubation at 70°C (not shown), but a brief incubation at 92°C for 3 s immediately prior to isothermal incubation allowed efficient amplification (**Figure 6**).

#### **RAPID SAMPLE PREPARATION METHOD**

The results (**Figure 7**) show that the heat lysis method was equivalent in sensitivity to the commercial nucleic acid extraction kit. Inhibitors present in whole blood did not prevent amplification by OmniAmp Pol, although they did increase TTR compared with the control (dilutions in Tris buffer). We tested the same protocol with other matrices (serum and feces), and the results were comparable (data not shown).

#### **LAMP WITH LYOPHILIZED REAGENTS**

A lyophilized formulation of the LAMP reagents was compared with wet reagents for detection of an MS2 RNA phage target. The dried formulation showed no deterioration in TTR or sensitivity compared with wet reagents (**Figure 8A**), nor an increase in non-specific amplification, as visualized on a 2% agarose gel (**Figure 8B**). To evaluate stability, the lyophilized reaction mix (**Figure 9A**) was incubated at 23, 37, and 45°C and assayed in a *Clostridium difficile* LAMP reaction, with wet reagent stored at −20°C as the comparison (**Figure 9B**). The dried LAMP formulation was stable at 23°C and 37°C for 180 days. At 45°C, the dried reagent was stable for 50 days, although it showed a measurable decline in performance, as measured by TTR, after 90 days.

#### **LATERAL FLOW DETECTION OF LAMP PRODUCTS**

A combination of LAMP and LFD was used to detect the catfish pathogen *E. ictaluri*. The LAMP reaction was formulated and performed using the standard protocol, except that the forward and reverse loop primers were labeled as described in Materials and Methods to allow LF detection. **Figure 10** shows the sensitivity and specificity of *E. ictaluri* LAMP as determined by LFD. A red line only at "Control" indicated that no product was amplified or detected in any dilution of the *E. coli* or *E. tarda* cultures, confirming 100% specificity of the *E. ictaluri* LAMP against these non-target strains (**Figure 10**, lower panel).

In contrast, product was amplified and detected from cultures of both *E. ictaluri* strains, as well as the positive controls, indicating the presence of the target gene in these samples (**Figure 10**, upper panel). Positive reactions were detected only in dilutions −2, −4, and −6, indicating a sensitivity of *E. ictaluri* LAMP equivalent to approximately 8 cells (starting concentration = 10<sup>9</sup> CFU mL<sup>−1</sup>).

### **DISCUSSION**

Nucleic acid tests (NATs) offer major advantages in speed and sensitivity for pathogen detection, but these assays have not been simple or inexpensive enough to implement in resource-limited settings. The development of LAMP technology has changed this paradigm and given new impetus toward diagnostic methods suitable for use without extensive training or equipment. LAMP (Notomi et al., 2000; Mori and Notomi, 2009; Mori et al., 2013) is an isothermal nucleic acid amplification method that is well suited to overcoming some of the disadvantages of other NATs (PCR, real-time PCR). The method uses four or more primers, two of which are designed to generate loop structures in the nascent strand; these loops prime a cascade of DNA synthesis that yields micrograms of amplification product from as few as one target copy in as little as 10 min. Well-designed LAMP tests rival real-time PCR in sensitivity and specificity and excel in simplicity of set-up and time to result.

A POC molecular diagnostic test based on LAMP is readily achievable, as the only instrument requirement is an inexpensive heater. Portable, battery-operated heaters can be improvised (Hernandez et al., 2011) for remote detection by individuals with very little training. In some cases, these assays are miniaturized and coupled to handheld devices that allow immediate reporting of results to a central database from virtually anywhere (Stedtfeld et al., 2012; Myers et al., 2013). In other cases, field operation is facilitated by detecting the amplification product with an inexpensive lateral flow device that provides an unambiguous, easily interpreted result (Ge et al., 2013). During the last 10 years, LAMP-based methods have been developed for detection of various pathogens (Parida et al., 2008; Fu et al., 2011; Mori et al., 2013). Conventionally, LAMP uses *Bst* polymerase for amplification of DNA targets. In this paper, we report applications and advantages of OmniAmp polymerase in DNA LAMP and RT-LAMP reactions.

OmniAmp Pol has a unique combination of properties, including strand displacement, thermostability, and reverse transcriptase activity, that makes it well suited to LAMP formulations for detection of both DNA and RNA targets without modifying the buffer formulation or workflow. In this study, a single OmniAmp polymerase formulation amplified four bacterial and viral DNA targets (*E. ictaluri*, *S. aureus*, *B. atrophaeus*, and PCV-2) and six RNA viral targets (WNV, CCHFV, EBoV, SIV, MS2, and BVDV). All of the targets amplified in under 30 min with high sensitivity and no alteration of formulation or process. In comparison, RT-LAMP using *Bst* polymerase requires pre-incubation with a reverse transcriptase, typically AMV RT, for detection of RNA targets (Notomi et al., 2000; Tanner and Evans, 2014).

After incubation, separation of reaction products on a 2% agarose gel showed the ladder-like patterns typical of LAMP (Notomi et al., 2000). In cases where non-specific amplification was observed, melt analysis was used to differentiate specific from non-specific products. Yamamura et al. (2009) have shown the utility of melt analysis for identifying correct amplification products.

The higher reaction temperature permitted by the thermostability of OmniAmp polymerase (about 70°C, compared with 65°C for *Bst* polymerase) translates into a faster TTR, particularly with more dilute templates. Thermostability is especially important for amplification of GC-rich targets or those with extensive secondary structure, because a high-temperature incubation can be used to relax secondary structure. We showed the utility of this approach in the LAMP method for bovine viral diarrhea virus (BVDV). Design of BVDV type I LAMP primers is highly constrained by the overall variability of the BVDV genome (Deng and Brock, 1993). This variability limits primer designs to the conserved 5′-UTR, which is highly structured. However, a brief incubation at 92°C for 3 s before isothermal incubation enabled amplification through the secondary structure in the 5′-UTR. In contrast, *Bst* polymerase is not stable above approximately 68°C and cannot be used for high-temperature denaturation of structured targets.

A simple, easy-to-use sample preparation method is one of the major criteria for a true POC diagnostic test. Toward this end, we developed a simple heat lysis method for nucleic acid extraction and used the crude lysates directly as template in LAMP reactions. Sensitivity with crude lysates was equivalent to that with kit-purified nucleic acid, indicating that OmniAmp Pol performance is not substantially impaired by sample matrix components that act as contaminants in PCR-based amplification. Another major unmet requirement for POC diagnostics in resource-limited settings is long shelf life without refrigeration or other means of a cold chain (Mabey et al., 2004; Njiru, 2012). The dried formulations described in this report were stable at ambient temperature (23°C) and at 37°C for at least 6 months with no apparent loss in activity.

Positive LAMP reactions can be detected by agarose gel electrophoresis or spectrophotometric measurement of turbidity; however, these methods are not amenable to POC use. Dyes such as calcein (Tomita et al., 2008) or HNB (Goto et al., 2009) offer easy-to-use visual detection. Because these dyes respond to any amplification, they fail to distinguish between specific and non-specific products (Njiru, 2012; Ge et al., 2013). SYBR Green I is also unsuitable for field applications because it must be added after the reaction is complete, a step that increases the risk of contamination (Njiru, 2012). LFD has been explored as a means of detecting positive LAMP reactions (Njiru, 2011; Ge et al., 2013; Roskos et al., 2013). In this study, we evaluated LFD in combination with OmniAmp polymerase-based LAMP to visualize amplification products. This method improves specificity because the strip captures and detects only amplicons that carry both labeled loop primers, and it avoids the need for techniques and instruments unavailable in many low resource settings. Labeling the loop primers with biotin and FITC provided high sensitivity and specificity for detection of true positive amplification products. In the present study, we detected as few as eight cells from two different strains of *E. ictaluri*, with no amplification from non-target species (*E. tarda* and *E. coli*). These results indicate high sensitivity and specificity of LAMP coupled with LF detection and show the utility of LFD as a simple, easy-to-use readout for LAMP results.

### **CONCLUSION**

Results presented in this paper show the utility of OmniAmp polymerase in LAMP assays for detecting both RNA and DNA targets. This formulation provides advantages in sample preparation, speed, shelf stability, and reliability with structured templates compared with traditional LAMP enzymes. We also provide a POC-compatible means of detecting positive reactions using LFD.

#### **AUTHOR CONTRIBUTIONS**

Yogesh Chander helped conceive the project, designed and performed experiments, and wrote the manuscript. Jim Koelbl, Michael J. Moser, Audrey J. Klingele, Abel Carrias, and Jamie Puckett designed and executed experiments. Mark R. Liles conceived and interpreted experiments. David A. Mead helped conceive the project and edit the manuscript. Thomas W. Schoenfeld conceived the project and edited the manuscript.

#### **ACKNOWLEDGMENTS**

The authors acknowledge funding from NIH, NSF, and USDA. We thank Drs. Thomas Geisbert and Dennis Bente at Galveston National Laboratory, Galveston, TX, for providing VHF RNA extracts.

#### **REFERENCES**

Deng, R., and Brock, V. (1993). 5′ and 3′ untranslated regions of pestivirus genome: primary and secondary structure analysis. *Nucleic Acids Res*. 21, 1949–1957. doi: 10.1093/nar/21.8.1949


Yamamura, M., Makimura, K., and Ota, Y. (2009). Evaluation of a new rapid molecular diagnostic system for Plasmodium falciparum combined with DNA filter paper, loop-mediated isothermal amplification, and melting curve analysis. *Jpn. J. Infect. Dis.* 62, 20–25.

Zhao, K., Shi, W., Han, F., Xu, Y., Zhu, L., Zou, Y., et al. (2011). Specific, simple and rapid detection of porcine circovirus type 2 using the loop-mediated isothermal amplification method. *Virol. J.* 8:126. doi: 10.1186/1743-422X-8-126

**Conflict of Interest Statement:** Yogesh Chander, Jim Koelbl, Jamie Puckett, Michael Moser, Audrey Klingele, David Mead, and Thomas Schoenfeld are employed by Lucigen Corporation. Lucigen has commercialized the OmniAmp polymerase for research use only. Mark Liles and Abel Carrias have no commercial or financial relationship with Lucigen Corporation, WI, and declare no conflict of interest.

*Received: 15 April 2014; accepted: 14 July 2014; published online: 01 August 2014. Citation: Chander Y, Koelbl J, Puckett J, Moser MJ, Klingele AJ, Liles MR, Carrias A, Mead DA and Schoenfeld TW (2014) A novel thermostable polymerase for RNA and DNA loop-mediated isothermal amplification (LAMP). Front. Microbiol. 5:395. doi: 10.3389/fmicb.2014.00395*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology.*

*Copyright © 2014 Chander, Koelbl, Puckett, Moser, Klingele, Liles, Carrias, Mead and Schoenfeld. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*