# SYNTHETIC BIOLOGY: ENGINEERING COMPLEXITY AND REFACTORING CELL CAPABILITIES

EDITED BY: Francesca Ceroni, Karmella Ann Haynes, Pablo Carbonell and Jean Marie François PUBLISHED IN: Frontiers in Bioengineering and Biotechnology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-685-2 DOI 10.3389/978-2-88919-685-2



# Topic Editors:

**Francesca Ceroni,** Imperial College London, UK

**Karmella Ann Haynes,** Arizona State University, USA

**Pablo Carbonell,** SYNBIOCHEM, University of Manchester, UK

**Jean Marie François,** Université de Toulouse, France

Tackling multifaceted complexity in synthetic biology. This illustration by Karmella Haynes incorporates figures from the articles in this Special Topic into a graphical design that is inspired by the Frontiers logo.

One of the key features of biological systems is complexity, where the behavior of high level structures is more than the sum of the direct interactions between single components. Synthetic biologists aim to use rational design to build new systems that do not already exist in nature and that exhibit useful biological functions with different levels of complexity. One such case is metabolic engineering: with the advent of genetic and protein engineering, and by supplying cells with chemically synthesized non-natural amino acids and sugars as new building blocks, it is now becoming feasible to introduce novel physical and chemical functions and properties into biological entities.

The rules of how complex behaviors arise, however, are not yet well understood. For instance, cells cannot be treated as inert chassis in which synthetic devices are easily operated to impart new functions: the presence of these devices may affect cell physiology, with reported effects on transcription, translation, metabolic fitness and optimal resource allocation. The result of these changes in the chassis may be failure of the synthetic device, unexpected or reduced device behavior, or perhaps a more permissive environment in which the synthetic device is allowed to function.

While efforts have already been made to increase the standardization and characterization of biological components, so that well-known parts can serve as building blocks for more complex devices, new strategies are also emerging to better understand the biological dynamics underlying the phenomena we observe. For example, it has been shown that the features of single biological components (e.g., promoter strength, ribosome binding affinity) change depending on the context in which the sequences are placed. Thus, new technical approaches have been adopted to preserve the activity of single components, such as genomic insulation or the use of prediction algorithms able to take biological context into account.

There have been noteworthy advances for synthetic biology in clinical technologies, biofuel production, and pharmaceuticals production. In addition, metabolic engineering combined with microbial selection/adaptation and fermentation processes has enabled remarkable progress toward the formation of bio-products such as bioethanol, succinate, malate and, more interestingly, heterologous products or even non-natural metabolites. However, despite this progress, it is still clear that ad hoc trial and error predominates over purely bottom-up, rational design approaches in the synthetic biology community. In this scenario, modelling approaches are often used as a descriptive tool rather than for the prediction of complex behaviors. The initial confidence in a purely reductionist approach to the biological world has given way to a new and deeper investigation of the complexity of biological processes, to gain new insights and broaden the categories of synthetic biology.

In this Research Topic we host contributions that explore and address two areas of Synthetic Biology at the intersection between rational design and natural complexity: (1) the impact of synthetic devices on the host cell, or "chassis" and (2) the impact of context on the synthetic devices.

Particular attention will be given to the application of these principles to the rewiring of cell metabolism in a bottom-up fashion to produce non-natural metabolites or chemicals that should eventually serve as a substitute for petrol-derived chemicals, and, on a long-term view, to provide economical, ecological and ethical solutions to today's energetic and societal challenges.

**Citation:** Ceroni, F., Haynes, K. A., Carbonell, P. and François, J-M., eds. (2015). Synthetic biology: engineering complexity and refactoring cell capabilities. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-685-2

# Table of Contents

*Editorial – Synthetic biology: engineering complexity and refactoring cell capabilities*

Francesca Ceroni, Pablo Carbonell, Jean-Marie François and Karmella A. Haynes


Lizbeth M. Nieves, Larry A. Panyon and Xuan Wang


Jacob Beal


Tamás Fehér, Vincent Libis, Pablo Carbonell and Jean-Loup Faulon

*New transposon tools tailored for metabolic engineering of Gram-negative microbial cell factories*

Esteban Martínez-García, Tomás Aparicio, Víctor de Lorenzo and Pablo I. Nikel

# **Editorial – Synthetic biology: engineering complexity and refactoring cell capabilities**

*Francesca Ceroni<sup>1,2</sup>\*, Pablo Carbonell<sup>3</sup>, Jean-Marie François<sup>4,5,6</sup> and Karmella A. Haynes<sup>7</sup>*

*<sup>1</sup> Centre for Synthetic Biology and Innovation, Imperial College London, London, UK, <sup>2</sup> Department of Bioengineering, Imperial College London, London, UK, <sup>3</sup> SYNBIOCHEM, Manchester Institute of Biotechnology, Faculty of Life Sciences, University of Manchester, Manchester, UK, <sup>4</sup> LISBP, INSA, INP, UPS, Université de Toulouse, Toulouse, France, <sup>5</sup> MR792, Ingénierie des Systèmes Biologiques et des Bioprocédés, INRA, Toulouse, France, <sup>6</sup> UMR 5504, CNRS, Toulouse, France, <sup>7</sup> Ira A. Fulton School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ, USA*

**Keywords: synthetic biology, complexity, metabolic engineering, emerging properties, crosstalk**

Synthetic Biology is now in its second decade and many goals have been achieved toward the rational design of biological systems. This Research Topic features and reviews some of the latest progress in Synthetic Biology, with a focus on research at the intersection between rational design and natural complexity that has potential for concrete biotechnological applications. Kelwick et al. (2014) summarize the great expansion in the genetic toolkit and DNA assembly techniques that are currently available to synthetic biologists. These tools will advance the implementation of new functions and the production of useful metabolites in living cells in a controlled fashion. Using engineering formality, Synthetic Biology aims to identify biological design principles that can be used for practical applications. As one result, it is now becoming feasible to use metabolic engineering to introduce novel functions and properties into an increasing number of microbial hosts. Examples come from Yu et al. (2014) and Heider et al. (2014), who describe the production of fatty acid-derived chemicals and astaxanthin in microbes, respectively. Furthermore, bacteria can be engineered for the conversion of waste into renewable products, as Nieves et al. (2015) demonstrate with the bioconversion of lignocellulose.

Along with its great successes, Synthetic Biology is also encountering new challenges, represented by emerging behaviors in modified host cells (chassis) that are difficult to predict. Limitations in the robust prediction of gene networks arise from the lack of a proper understanding of the living systems used in synthetic biology. For instance, Akhtar and Jones (2014) present evidence that the failure of a number of pathway engineering strategies is often due to the lack of co-factors needed for the proper activity of the key enzymes. Co-factor production needs to be integrated into the system's design to achieve proper enzymatic activity. As synthetic network designs become more complex, emerging evidence shows that elements within these networks can exhibit crosstalk and lead to non-specific behavior. As presented in Davis et al. (2015), bacterial quorum sensing pathways, which are widely used in Synthetic Biology, exhibit crosstalk that can limit the number of nodes in a network, and therefore stifle efforts to build sophisticated systems. New efforts are needed to better understand the behavior of composable parts and to develop new orthogonal elements. Lastly, Beal (2015) addresses unresolved questions in the area of cell-based information processors and noise. He proposes a quantitative signal-to-noise ratio-based standard to assess circuit performance.

An important aspect in the engineering of living cells that has only recently been investigated in detail by the synthetic biology community is the interaction between the system and the chassis. The exploitation of the cell's resources for the operation of heterologous systems has proven to be deleterious, leading to non-robust gene expression and inefficient cellular performance with decreased population growth. In that direction, Moya (2014) reflects on the need for controllable systems that prevent obsolescence in synthetic entities, similar to the way living cells exhibit self-maintenance. In Fehér et al. (2015), the observation of the dynamic response of a malonyl-CoA biosensor in *Escherichia coli* was used to understand the toxicity of the overproduction of a synthetic compound, which interfered with the system's behavior. Martínez-García et al. (2014) present the development of new broad host range Tn5 vectors in order to relieve the burden of PHB production on the health of Gram-negative bacteria (*E. coli*). These studies address the need to investigate and develop controlled production of the molecule of interest to avoid burden-related negative feedback from the chassis.

As illustrated by the works in this Research Topic, now is a critical moment for Synthetic Biology, where the initial enthusiasm for the major achievements attained gives way to a deeper and better understanding of the complexity of biological systems. Advancing in this direction will significantly improve the applicability of design principles for living organisms.

#### *Edited by:*

*Pengcheng Fu, Beijing University of Chemical Technology, China*

#### *Reviewed by:*

*Qiang Wang, Chinese Academy of Sciences, China*

*\*Correspondence: Francesca Ceroni f.ceroni@imperial.ac.uk*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 06 July 2015 Accepted: 06 August 2015 Published: 21 August 2015*

#### *Citation:*

*Ceroni F, Carbonell P, François J-M and Haynes KA (2015) Editorial – Synthetic biology: engineering complexity and refactoring cell capabilities. Front. Bioeng. Biotechnol. 3:120. doi: 10.3389/fbioe.2015.00120*


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Ceroni, Carbonell, François and Haynes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Developments in the tools and methodologies of synthetic biology

**Richard Kelwick<sup>1,2</sup>\*, James T. MacDonald<sup>1,2</sup>, Alexander J. Webb<sup>1,2</sup> and Paul Freemont<sup>1,2</sup>\***

<sup>1</sup> Centre for Synthetic Biology and Innovation, Imperial College London, London, UK

<sup>2</sup> Department of Medicine, Imperial College London, London, UK

#### **Edited by:**

Karmella Ann Haynes, Arizona State University, USA

#### **Reviewed by:**

M. Kalim Akhtar, University College London, UK

Dong-Yup Lee, National University of Singapore, Singapore

#### **\*Correspondence:**

Richard Kelwick and Paul Freemont, Department of Medicine, Centre for Synthetic Biology and Innovation, Sir Ernst Chain Building, South Kensington Campus, Exhibition Road, London SW7 2AZ, UK e-mail: r.kelwick@imperial.ac.uk; p.freemont@imperial.ac.uk

Synthetic biology is principally concerned with the rational design and engineering of biologically based parts, devices, or systems. However, biological systems are generally complex and unpredictable, and are therefore intrinsically difficult to engineer. To address these fundamental challenges, synthetic biology is aiming to unify a "body of knowledge" from several foundational scientific fields, within the context of a set of engineering principles. This shift in perspective is enabling synthetic biologists to address complexity, such that robust biological systems can be designed, assembled, and tested as part of a biological design cycle. The design cycle takes a forward-design approach in which a biological system is specified, modeled, analyzed, and assembled, and its functionality is tested. At each stage of the design cycle, an expanding repertoire of tools is being developed. In this review, we highlight several of these tools in terms of their applications and benefits to the synthetic biology community.

**Keywords: synthetic biology, engineering biology, design cycle, tools, standardization**

## **INTRODUCTION**

The synthetic biology toolkit has expanded greatly in recent years, which can be attributed to the efforts of a highly dynamic community of researchers, ambitious undergraduate students in the International Genetically Engineered Machine competition (iGEM), and the growing number of amateur scientists from the DIY BIO movement. Each of these groups has bold ambitions for the rapidly growing field of synthetic biology, which aims to rationally engineer biological systems for useful purposes (Purnick and Weiss, 2009; Anderson et al., 2012; Landrain et al., 2013; Jefferson et al., 2014). The merging of several foundational sciences, including molecular, cellular, and microbiology with a set of engineering principles, is a profound shift and is the key distinction between synthetic biology and genetic engineering (Andrianantoandro et al., 2006; Heinemann and Panke, 2006; Khalil and Collins, 2010; Kitney and Freemont, 2012). Indeed, many social scientists, who are themselves a part of the synthetic biology community, have extensively explored the ontological implications of this perspective (Schark, 2012; Preston, 2013). Although many of the social aspects of synthetic biology are beyond the scope of this review, they will continue to shape the synthetic biology toolkit. In particular, society is an important stakeholder that has some influence over chassis (host cell) choice, the design of biosafety measures, biosecurity considerations, and long-term research applications (Marris and Rose, 2010; Anderson et al., 2012; Agapakis, 2013; Moe-Behrens et al., 2013; Wright et al., 2013; Douglas and Stemerding, 2014).

From a biological perspective, there have been important developments in the field across several areas, some of which have been reviewed elsewhere (Arpino et al., 2013; Lienert et al., 2014; Way et al., 2014). For instance, the number, quality, and availability of biological parts (bioparts, e.g., promoters and ribosomal binding sites) have continued to increase. This is exemplified by the iGEM student registry of standard biological parts, which has increased its biopart collection to include over 12,000 parts, across 20 different categories (partsregistry.org). However, due to its open nature, the iGEM registry contains parts of variable quality that are mostly uncharacterized. There are also professional parts registries, such as those at BIOFAB, which include expansive libraries of characterized DNA-based regulatory elements (Mutalik et al., 2013a,b). Although libraries of bioparts are indeed useful, putting them together into predictable devices, pathways and systems is incredibly challenging, as many biological design rules are not yet fully understood (Endy, 2005; Kitney and Freemont, 2012). Developing synthetic passive and active insulator sequences may help increase predictability and thus reduce context dependency (Davis et al., 2011; Lou et al., 2012; Qi et al., 2012; Mutalik et al., 2013a). Notwithstanding these challenges, the field is progressing across several areas. One such area is biopart characterization, which is critical to the field, primarily because it is fundamentally a realization of several of the core engineering principles adopted in synthetic biology, namely standardization, modularization, and abstraction. Discrete biological parts of known sequence and behavior can be abstracted based upon a descriptive function and thus, their true complexity can be masked behind a biological concept. For example, discrete DNA sequences (bioparts) that fit a standardized descriptive function, such as a promoter, can be functionally characterized and, as a consequence, bioparts become reusable (modularization) in other synthetic systems. Additionally, methods that provide standardized ways of assembling DNA parts, such as the BioBrick standard, can help establish platforms for the sharing and reuse of bioparts. At a higher level, abstraction and standardization are important because they permit the separation of design from assembly (Endy, 2005).
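The idea of abstraction and modularization described above can be illustrated in software terms. The following is a minimal sketch, not a representation of any registry's actual data model: the `BioPart` class, the `compose_device` function, the cassette ordering, and all part names and sequences are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import List

# Order of roles in a simple expression cassette (illustrative assumption).
CASSETTE_ROLES = ("promoter", "rbs", "cds", "terminator")


@dataclass(frozen=True)
class BioPart:
    """A DNA sequence abstracted behind a descriptive function (its role)."""
    name: str
    role: str
    sequence: str


def compose_device(parts: List[BioPart]) -> str:
    """Check that parts follow the cassette order, then join their sequences."""
    roles = tuple(p.role for p in parts)
    if roles != CASSETTE_ROLES:
        raise ValueError(f"expected roles {CASSETTE_ROLES}, got {roles}")
    return "".join(p.sequence for p in parts)


# Placeholder parts: names and sequences are made up for illustration only.
parts = [
    BioPart("P_example", "promoter", "TTGACA"),
    BioPart("RBS_example", "rbs", "AGGAGG"),
    BioPart("CDS_example", "cds", "ATGAAATAA"),
    BioPart("T_example", "terminator", "GCGCCGC"),
]
device = compose_device(parts)
```

In this sketch the designer works only with roles and names, while the sequence-level detail stays hidden inside each part, which loosely mirrors the separation of design from assembly.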

A desirable consequence of this perspective is that these engineering principles enable the separation of labor, expertise, and complexity at each level of the design hierarchy (Endy, 2005). In practical terms, this separation of biological design from DNA assembly enables innovation within these hierarchies to occur at different rates. For instance, it is generally true that with more recent DNA assembly methods it is currently easier to assemble multi-part genetic circuits consisting of several bioparts, or even entire genomes, than it is to reliably predict how these bioparts will interact in the final system (Purnick and Weiss, 2009; Ellis et al., 2011; Arpino et al., 2013; Ellefson et al., 2014). However, it is envisioned that this will change with the increasing adoption of high-throughput characterization platforms that can test entire biopart libraries in parallel. These platforms typically use automated liquid-handling robots coupled with plate readers, although microfluidics approaches are also gaining traction (Lin and Levchenko, 2012; Boehm et al., 2013; Benedetto et al., 2014). In either case, when coupled with automated data analysis, modeling, and sophisticated forward-design strategies (Marchisio and Stelling, 2009; Wang et al., 2009; Esvelt et al., 2011; Ellefson et al., 2014; Marchisio, 2014; Stanton et al., 2014), these high-throughput platforms provide the basis for the rapid prototyping workflows required to realize a synthetic biology design cycle (Kitney and Freemont, 2012).
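As an illustration of what one step of automated data analysis might look like, the sketch below blank-corrects plate-reader measurements and reports fluorescence per unit optical density for each part in a library. This is an assumed, simplified workflow rather than a method described in the review: the function names, the `min_od` threshold, and the input layout are hypothetical.

```python
def normalized_expression(fluorescence, od600, blank_fluorescence,
                          blank_od600, min_od=0.01):
    """Blank-correct one well and return fluorescence per unit OD600."""
    net_od = od600 - blank_od600
    if net_od < min_od:
        return None  # too little growth for a meaningful ratio
    return (fluorescence - blank_fluorescence) / net_od


def characterize_library(wells, blank_fluorescence, blank_od600):
    """Map each part name to its normalized expression.

    `wells` maps a part name to a (fluorescence, od600) tuple.
    """
    return {
        name: normalized_expression(fl, od, blank_fluorescence, blank_od600)
        for name, (fl, od) in wells.items()
    }
```

A call such as `characterize_library({"part_a": (1100.0, 0.55)}, 100.0, 0.05)` would return the blank-corrected fluorescence of `part_a` divided by its blank-corrected OD600; wells with negligible growth are reported as `None` rather than as an unstable ratio.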

In this review, we focus on several significant tools, both classical and emerging, that the field of synthetic biology employs as part of a typical design cycle workflow. Building upon a design cycle template, the review is organized to explore prominent tools and research methodologies across three core areas: designing predictable biology (design), assembling DNA into bioparts, pathways, and genomes (build), and rapid prototyping (test) (**Figure 1**). We first describe several of the core challenges that are associated with designing predictable biology, including the complexities associated with chassis selection, biopart design, engineering, and characterization. In parallel, we highlight relevant tools and methodologies that are particularly aligned with the engineering principles of synthetic biology. We then discuss established and newly developed DNA assembly methodologies, and group them according to four broad assembly strategies: restriction enzyme-based, overlap-directed, recombination-based, and DNA synthesis. Finally, we highlight several emerging rapid prototyping technologies that are set to significantly improve the field's capacity for testing synthetic parts, devices, and systems. We conclude with a summary of several of the core challenges that were described in each of the design, build, test sections of the review and discuss whether the synthetic biology toolbox is equipped to address them. In addition to this, we have also created an online community, the Synthetic Biology Index of Tools and Software (SynBITS) – synBITS.co.uk, which has also been structured according to the design cycle (**Figure 1**).

### **DESIGNING PREDICTABLE BIOLOGY**

From an engineering perspective, living systems can be perceived as overly complex, inefficient, and unpredictable (Csete and Doyle, 2002). It is this perception that has driven the concept of the biopart, in which a particular DNA sequence is defined by the function that it encodes (Endy, 2005). Thus, complex biological functions can be conceptually separated (abstracted) from the complexities of the sequence context from which they originated (Endy, 2005). As a consequence of this approach, biological pathways and circuits can potentially be redesigned into less complex and potentially more predictable designs. The defining examples of this perspective are the toggle switch (Gardner et al., 2000), a genetic circuit defined by two repressible promoters that were engineered to form a mutually inhibitory network, and the repressilator (Elowitz and Leibler, 2000), a type of oscillator (biological clock). What sets these examples apart from general genetic engineering is that modeling was used to predict and optimize the behavior of these genetic circuit designs prior to their construction.
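To make the role of modeling concrete, the mutually inhibitory network can be sketched as two coupled rate equations, one for each repressor. The snippet below is a minimal sketch that assumes the dimensionless two-repressor form commonly associated with the toggle switch of Gardner et al. (2000); the parameter values, function names, and the simple forward-Euler integrator are illustrative assumptions, not the original authors' implementation.

```python
def toggle_switch_rhs(u, v, alpha1=10.0, alpha2=10.0, beta=2.0, gamma=2.0):
    """Rates of change for two mutually repressing repressors u and v."""
    du = alpha1 / (1.0 + v ** beta) - u   # synthesis repressed by v, decay of u
    dv = alpha2 / (1.0 + u ** gamma) - v  # synthesis repressed by u, decay of v
    return du, dv


def simulate_toggle(u0, v0, t_end=50.0, dt=0.01, **params):
    """Integrate the toggle-switch model with forward Euler steps."""
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du, dv = toggle_switch_rhs(u, v, **params)
        u += dt * du
        v += dt * dv
    return u, v


# Two different starting points settle into opposite stable states.
u_high, v_low = simulate_toggle(5.0, 0.1)
u_low, v_high = simulate_toggle(0.1, 5.0)
```

With these assumed parameters the model is bistable: whichever repressor starts ahead ends up dominant and holds the other at a low level, which is the switch-like behavior a designer would want to confirm before building the circuit.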

While these forward-design approaches were hugely successful, the repressilator displayed noisy behavior as a result of stochastic fluctuations in components of the genetic circuit (Elowitz and Leibler, 2000). In other words, *in silico* modeling did not fully capture the true *in vivo* complexity of the synthetic circuit. Likewise, the toggle switch experienced natural fluctuations in gene expression that were sufficient to create variations in the level of inducer needed to switch the cells from one state to another. These variations were also not fully anticipated during *in silico* modeling (Gardner et al., 2000). While these genetic circuits have been improved, with novel oscillator (Stricker et al., 2008; Olson et al., 2014) and toggle switch designs, including those designed for mammalian cells (Muller et al., 2014b) and plants (Muller et al., 2014a), it is clear that the modeling of biological systems still requires a concerted and long-term effort. Critical to this effort is the availability of new synthetically designed bioparts and experimental data that accurately capture the behavior of the components or bioparts that constitute a synthetic system (Arkin, 2013) as well as the characteristics or influence that the chassis/host cell enacts upon them.
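One way to see why deterministic models can miss such fluctuations is to simulate individual reaction events. The sketch below applies Gillespie's stochastic simulation algorithm to a deliberately simple, hypothetical model of constitutive expression (production at a constant rate `k`, first-order degradation at rate `g`). It is not a model of the repressilator or the toggle switch, and all names and parameter values are assumptions made for illustration.

```python
import random


def gillespie_birth_death(k=10.0, g=1.0, n0=0, t_end=100.0, seed=0):
    """Simulate molecule counts for production (rate k) and decay (rate g*n)."""
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth, death = k, g * n
        total = birth + death
        t += rng.expovariate(total)  # waiting time until the next reaction
        if t >= t_end:
            break
        if rng.random() * total < birth:
            n += 1  # one molecule produced
        else:
            n -= 1  # one molecule degraded
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death()
```

The corresponding deterministic model would settle at a fixed level of `k / g` molecules, whereas the simulated trajectory keeps fluctuating around that level, which is the kind of variability that an ordinary differential equation description leaves out.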

#### **CHASSIS SELECTION**

As an engineering concept, the chassis refers to a physical internal framework or structure that supports the addition of other components that combine to form a finalized engineered structure. From a synthetic biology perspective, the concept invokes an understanding that a biological chassis is a tool to provide the structures that accommodate (host) the execution of a synthetic system, including the provision of a metabolic environment, energy sources, transcription and translation machinery, as well as other minimal cellular functions (Acevedo-Rocha et al., 2013; Danchin and Sekowska, 2014). Chassis selection is therefore a critical design decision that synthetic biologists are required to take, particularly since the chassis will directly influence the behavior and function of a synthetic system. Essentially, the chassis determines which bioparts can be used since they must be compatible with the biological machinery that is present. This can result in a difficult choice for the synthetic biologist: either to use an established chassis and design the circuit to be orthogonal with that host, or to design a synthetic system that fits a requirement and then choose a host chassis that is compatible with the resultant bioparts or system. These constraints can to some degree be designed around by engineering the chassis to knock out genes in order to optimize its orthogonality and reduce burden, through codon optimization (Chung and Lee, 2012), or through the use of insulator sequences that negate context dependency effects (Guye et al., 2013; Torella et al., 2014a). Ultimately, however, chassis selection will dictate the downstream design considerations for any given synthetic system, and therefore, chassis selection must be coordinated with biopart design efforts.

In order to rationalize which chassis selection strategy is most appropriate for an intended application, it is important to consider the consequences and advantages of each strategy. Where a chassis is selected as a priority above that of the design considerations of the synthetic system, it is important to consider whether the chassis has been extensively characterized in the literature and/or if the chassis has known intrinsic capabilities that complement the intended application (**Table 1**). Additionally, access to detailed biological knowledge of a chassis will aid modeling-guided design efforts and the implementation of chassis optimization strategies for dealing with burden or metabolic flux effects. Likewise, the wealth of knowledge acquired about model organisms across several biological disciplines may encourage synthetic biologists to consider them as a potential chassis in preference to established favorites (**Table 1**). Indeed, there are already several emerging chassis that are gaining traction and are set to be utilized more frequently in the field (**Table 1**).

Alternatively, a synthetic system could be specified and designed as a priority above that of chassis selection. As a consequence, there will be chassis that are not compatible with the synthetic system and others that may require extensive engineering to accommodate its design. However, this approach is complementary to those chassis that are bespoke engineered. "Synthia," the first organism to feature a fully synthetically manufactured genome, is indicative that the field of synthetic biology is shifting toward the development of rationally engineered chassis (Gibson et al., 2010a). It is important to recognize, though, that the "Synthia" genome, while synthetic in origin, was not designed to significantly alter the characteristics of the chassis, and therefore does not represent the first truly bespoke-engineered chassis. Yet, its successors, the synthetic yeast project (Annaluru et al., 2014), protocell developments (Xu et al., 2010), and even to some extent cell-free expression systems (Shin and Noireaux, 2012; Sun et al., 2013a) may all usher in an era in which the design of bespoke-engineered chassis is routine. Wholly rationally engineered chassis could conceivably be built around the specifications of a synthetic system, such that the chassis is both compatible with the synthetic system and the majority of its cellular resources are directed toward the execution of the synthetic system. In this sense, the function of the synthetic system would be free of chassis constraints; however, the full realization of this approach is still several decades away. Until then, chassis selection will remain a trade-off: which should be prioritized for each application, the chassis or the synthetic system? There are of course many other considerations to address, some of which we cover in the biopart design section of this review and others that have been previously discussed in the literature (Heinemann and Panke, 2006; Arpino et al., 2013; Danchin and Sekowska, 2014).

#### **BIOPART DESIGN AND ENGINEERING**

The field of synthetic biology continues to benefit from decades of biological research that has built a knowledge base of biological systems that can be deconstructed and re-engineered as bioparts and synthetic systems. Here, we highlight prominent bioparts that are particularly aligned with the engineering principles of synthetic biology. In most cases, existing natural biological parts can be reused in synthetic devices or systems. However, there are situations where new bioparts need to be designed and synthesized by modifying existing bioparts or by creating entirely new parts *de novo*. These novel parts could be enzymes that catalyze unnatural reactions (Jiang et al., 2008; Rothlisberger et al., 2008), molecular biosensors (Penchovsky and Breaker, 2005), protein scaffolds (Koga et al., 2012; Heider et al., 2014), DNA or RNA scaffolds (Rothemund, 2006; Delebecque et al., 2011), ribosome-binding sites with specifically designed translation rates (Salis et al., 2009), or promoters with novel regulatory features and/or specific transcription rates (Marples et al., 2000; Kelly et al., 2009).

#### **Table 1 | Synthetic biology chassis**.

Transcriptional circuits use RNA polymerase operations per second (PoPS) as the common signal carrier but, until recently, only a small set of DNA-binding proteins and associated operator sequences were used to regulate the flux of RNA polymerase (RNAP) and construct synthetic circuits. The lack of a large set of orthogonal regulatory proteins has limited the complexity of synthetic systems (Purnick and Weiss, 2009), but a new wave of engineered proteins has greatly increased the number of tools available to synthetic biology circuit designers. The clustered, regularly interspaced, short palindromic repeats (CRISPR)/Cas system consists of CRISPR and CRISPR-associated genes (cas) coding for related proteins, which together constitute an adaptive prokaryotic immune system (Barrangou et al., 2007). The CRISPR loci consist of repeats interspaced with spacer sequences, which are transcribed and processed into crRNAs containing individual spacer sequences that are complementary to foreign DNA. The crRNAs bind Cas9 nuclease and the resulting complex recognizes and cleaves sequences complementary to the spacer sequences. This natural system has been repurposed as a transcriptional regulator, the CRISPR interference (CRISPRi) system, by modifying the Cas proteins to deactivate nuclease activity and by creating artificial guide RNA (gRNA) sequences. Deactivated Cas9:gRNA complexes act as repressors by binding specific sites and inhibiting RNAP activity (Qi et al., 2013). Alternatively, deactivated Cas9 can be fused to domains that recruit RNAP in order to act as a transcriptional activator (Bikard et al., 2013; Mali et al., 2013).
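The targeting step described above can be illustrated with a minimal sketch that lists candidate gRNA target sites in a DNA sequence. It rests on assumptions that are not stated in the text: a 20-nt spacer and the NGG protospacer-adjacent motif (PAM) commonly associated with *Streptococcus pyogenes* Cas9. The function name `find_guide_sites` is hypothetical, and only the strand supplied is scanned.

```python
import re


def find_guide_sites(sequence, spacer_len=20):
    """List candidate target sites as (start, spacer, pam) tuples.

    Assumption: a site is a spacer of `spacer_len` bases immediately
    followed by an NGG PAM. Only the given strand is scanned.
    """
    seq = sequence.upper()
    # A lookahead is used so that overlapping candidates are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]


if __name__ == "__main__":
    demo = "A" * 20 + "TGG" + "CCCCC"  # made-up sequence for illustration
    for start, spacer, pam in find_guide_sites(demo):
        print(start, spacer, pam)
```

A practical design tool would also scan the reverse complement and screen candidates for off-target matches, both of which are omitted here.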

Transcription activator-like effectors (TALEs) are proteins secreted by *Xanthomonas* bacteria in order to activate expression of plant genes during the course of infection. They consist of tandem repeats of a small domain with two variable amino acid sites. The amino acid identities of the variable sites have a simple mapping to the DNA base recognized, enabling chains of domains to be strung together in order to bind specific sequences (Boch et al., 2009; Moscou and Bogdanove, 2009). The simple modular nature of TALEs has enabled the engineering of synthetic proteins such as TAL effector nucleases (TALENs) (Mahfouz et al., 2011) and artificial orthogonal activators and repressors (Morbitzer et al., 2010; Blount et al., 2012).
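The simple mapping between the two variable residues and the recognized base lends itself to a lookup table. The sketch below uses the commonly cited correspondences (NI for A, HD for C, NN for G, NG for T). The table and the function name `design_tale_repeats` are illustrative assumptions rather than a complete description of the code, and NN is known to tolerate A as well as G.

```python
# Commonly cited variable-residue pairs for each DNA base (simplified).
RVD_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_tale_repeats(target):
    """Return one variable-residue pair per base of the target sequence."""
    try:
        return [RVD_FOR_BASE[base] for base in target.upper()]
    except KeyError as err:
        raise ValueError("unsupported base: %s" % err.args[0])


if __name__ == "__main__":
    print("-".join(design_tale_repeats("TACGGA")))
```

Each entry in the returned list corresponds to one tandem repeat, so the length of the list equals the length of the target site.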

Translation initiation regulators are relatively easy to design *de novo* as they rely on the reasonably well-characterized thermodynamics of RNA structure (Liang et al., 2011). However, unlike transcriptional circuits, there is no common signal carrier and thus they cannot be as easily composed into complex regulatory designs. By repurposing a regulatory element from the *tnaCAB* operon of *E. coli*, Liu et al. (2012) have created an adapter to convert translational regulators into transcriptional regulators. The 5′-end of the operon codes for a short leader peptide, TnaC, that stalls the ribosome in the presence of free tryptophan. The stalled ribosome then blocks a Rho factor-binding site located adjacent to the stop codon of *tnaC*, allowing the transcription of the downstream genes *tnaA* and *tnaB*. The ribosome-binding site of *tnaC* in the native operon is constitutive, but replacing this with translational regulator sequences, such as the RNA-IN/OUT system (Ross et al., 2013), enables the control of transcription of downstream genes.

In recent years, there has been rapid progress in developing software algorithms to enable the design of synthetic proteins that can be controlled at the atomic level of resolution (Leaver-Fay et al., 2011). Computational protein design is generally split into two components. Initially, a backbone scaffold is either artificially generated or taken from an existing known structure. Secondly, the amino acid sequence is optimized such that it minimizes the free energy of folding. It appears that minimizing a potential energy function by trialing different amino acid identities and rotamers is sufficient to achieve this. There have been a number of dramatic successes using this approach including the *de novo* design of enzymes. The design of a novel enzyme requires knowledge about the transition state structure of the reaction to be catalyzed and a predicted spatial arrangement of chemical groups that are likely to stabilize the transition state. The transition state structure and the stabilizing constellation of chemical groups around it can be designed theoretically (theozyme) (Tantillo et al., 1998), and using this knowledge, known protein structures can be searched for sites capable of accommodating side chain functional groups in the desired geometry (Zanghellini et al., 2006). These methods have resulted in a number of successful enzyme designs (Jiang et al., 2008; Rothlisberger et al., 2008).

In the cell, many biochemical processes are spatially organized in order to locally concentrate substrates or isolate toxic substances (e.g., the carboxysome or peroxisome) and reduce cross talk between components. Engineering high-level organization in synthetic biological systems is a major challenge, with applications in encapsulating artificial organelles or protocells (Choi and Montemagno, 2005; Agapakis et al., 2012; Hammer and Kamat, 2012; Mali et al., 2013) and in the precise detection and delivery of payloads (Sukhorukov et al., 2005; Uchida et al., 2007). The computational protein design methods described above have been applied to create new self-assembling biomaterials at the atomic level from protein subunits that do not naturally form into higher-order structures (King et al., 2012, 2014). Other work has focused on using hydrophobic patterning of peptides to produce higher-order structures based on coiled-coils (Rajagopal and Schneider, 2004; Woolfson and Mahmoud, 2010; Zaccai et al., 2011; Fletcher et al., 2013). However, these are not designed to atomic level accuracy, tend to be chemically synthesized and so have not yet been reported to assemble *in vivo*. An alternative approach is to reuse naturally occurring protein–protein interfaces and assemblies (Padilla et al., 2001; Howorka, 2011; Sinclair et al., 2011), although ultimately it may be more desirable to design completely artificial protein scaffolds that are more likely to be biologically neutral and avoid the Mullerian complexity of naturally evolved biological systems (Dutton and Moser, 2011).

Novel protein biomaterials have applications in metabolic engineering by co-locating enzymes in the same pathway on a structural scaffold. This has the advantage of increasing the local concentration of substrates, which improves reaction kinetics and helps to prevent both the loss of intermediates to competing pathways and the accumulation of toxic intermediates (Dueber et al., 2009). Protein cages can be used to completely encapsulate metabolic pathways and create synthetic bacterial micro-compartments. In a recent example, genes from the propanediol utilization operon (*pdu*) encoding an empty protein shell in *Salmonella enterica* were expressed in *E. coli*. Short peptide sequences known to bind to *pdu* shell proteins were used to target pyruvate decarboxylase and alcohol dehydrogenase to the micro-compartment, resulting in increased ethanol production (Lawrence et al., 2014). This approach promises to be particularly useful for biosynthetic pathways involving toxic metabolites.

Similarly, structural scaffolds can be constructed using nucleic acids. Base pairing in nucleic acids makes predicting and designing structures somewhat more tractable than for proteins. For example, 2D and 3D structures have been engineered *in vitro* using long single-stranded DNA (ssDNA) and small ssDNA oligonucleotides called "staples," which direct the folding of the long ssDNA into a pre-designed structure (Rothemund, 2006; Douglas et al., 2009; Han et al., 2011). It has also been shown to be possible to express simpler nanostructures *in vivo* using RNA transcribed by the cell (Delebecque et al., 2011). These structures were used together with specific protein-binding aptamers to efficiently channel substrates from one enzyme to another and substantially increase hydrogen production. At short distances, substrate channeling has been found to be more effective than expected by simple 3D Brownian diffusion models (Fu et al., 2012, 2014).

A number of tools for predicting and designing relative translation rates of ribosome-binding sites have been developed (Reeve et al., 2014) including the RBS Calculator (Salis et al., 2009; Salis, 2011), the RBS Designer (Na and Lee, 2010), and the UTR Designer (Seo et al., 2013), which can aid operon design (Arpino et al., 2013). These software tools are based on thermodynamic models of the pre-initiation complex of the 30S ribosomal subunit and the messenger RNA (mRNA) and include terms based on the free energy required to unfold the unbound mRNA, the free energy of hybridization of the mRNA and the 16S rRNA, and various other terms. If the pool of free 30S ribosomal subunits is assumed to be roughly constant, then the translation initiation rate can be assumed to be proportional to exp(−β∆G), where ∆G is the total free energy change of forming the pre-initiation complex and β is an apparent Boltzmann factor. Mechanistic predictive models for promoters are somewhat more complicated as promoter strength is related to the binding of the sigma factor and RNAP, and also the efficiency of promoter escape. However, there has been some success in predicting the strength of promoters for the *E. coli* sigma factor σ<sup>E</sup> using relatively simple position weight matrix models (Rhodius and Mutalik, 2010; Rhodius et al., 2012).
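The proportionality to exp(−β∆G) can be made concrete with a short sketch. It compares two hypothetical ribosome-binding sites by their total free energies and is not taken from any of the tools named above. The value of β (0.45 mol/kcal) is an assumed apparent Boltzmann factor, and the ∆G values are invented for illustration.

```python
import math

BETA = 0.45  # mol/kcal; assumed apparent Boltzmann factor, not a fitted value


def relative_initiation_rate(delta_g_total, beta=BETA):
    """Translation initiation rate up to an unknown proportionality constant.

    delta_g_total is the total free energy change (kcal/mol) of forming the
    30S pre-initiation complex; more negative values give higher rates.
    """
    return math.exp(-beta * delta_g_total)


def fold_change(delta_g_a, delta_g_b, beta=BETA):
    """Predicted ratio of initiation rates, site A relative to site B."""
    return relative_initiation_rate(delta_g_a, beta) / relative_initiation_rate(delta_g_b, beta)


if __name__ == "__main__":
    # Hypothetical sites: A is 2 kcal/mol more favorable than B.
    print(round(fold_change(-7.0, -5.0), 2))
```

Because the proportionality constant cancels in the ratio, only differences in ∆G between sites matter for the predicted fold change.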

Most of the synthetic regulatory tools described above are used in the construction of transcriptional circuits. Nevertheless, post-transcriptional circuit design, particularly using RNA molecules, has attracted a great deal of interest in recent years (Liang et al., 2011; Wittmann and Suess, 2012). Compared with proteins, RNA molecules are somewhat easier to design due to their well-understood thermodynamics and the dominance of secondary structure formation on folding. One important application area is the use of RNA as switches (riboswitches) that respond to their environment. Riboswitches are RNA molecules that can regulate protein production in response to changes in the concentration of a small molecule; they occur naturally and can also be synthetically designed. These molecules are composed of an RNA aptamer that binds a specific small-molecule ligand. On binding the small molecule, the RNA aptamer may then change conformation, resulting in either the occlusion of the Shine–Dalgarno sequence or its increased accessibility. Expression of the downstream genes is then turned either on or off in response (Suess et al., 2004). Alternatively, aptamers may be coupled to a ribozyme that allosterically cleaves itself in response to ligand binding (Tang and Breaker, 1997; Penchovsky and Breaker, 2005). The *de novo* design of small-molecule binding RNA aptamers is a non-trivial task but novel aptamers can be evolved *in vitro* using methods such as SELEX (systematic evolution of ligands by exponential enrichment) (Ellington and Szostak, 1990; Tuerk and Gold, 1990; Jenison et al., 1994). Riboswitches may have uses in metabolic engineering such as down-regulating upstream genes if a metabolite reaches toxic levels (Zhang and Keasling, 2011). RNA aptamers have also found use as mimics of GFP by binding small-molecule fluorophores (Paige et al., 2011). As discussed in the metrology section, these aptamers can be used to monitor mRNA levels.

#### **BIOPART CHARACTERIZATION**

Biopart characterization describes the functional and experimental metadata that is required to sufficiently capture the biological behavior of a biopart and the context in which it is being tested. The type and range of these characterization data have evolved over time as highlighted by refinements in biopart characterization data sheets (Arkin, 2008; Canton et al., 2008). Typically, these experimental metadata include information on the plasmid vector, the testing organism or strain, any relevant growth conditions, and the equipment or methodologies used to capture the biopart's functionality. The primary purpose of biopart characterization data is to provide the necessary experimental data for predictive *in silico* biological modeling. Which biological data provide the greatest insight into the behavior of a given biological system remains debatable, at least until more biological design rules are understood. The context dependency of bioparts *in vivo* provides significant challenges in predicting their function as modular components. Therefore, for biopart characterization, measurement standards should largely be defined by those biological data that can be measured (metrology), how relevant those data are for predicting the behavior of a biological process (modeling) and how widely these data can be adopted (standardization). This last point is particularly important since bioparts should ideally be reusable (modular) across multiple applications and contexts. To enable this, the formatting of these data should ideally be standardized to facilitate the measurement and use of biopart characterization data across different *in silico* design tools, forward-design strategies, and workflows.

The most concerted effort is the Synthetic Biology Open Language (SBOL) consortium, a group of life scientists, engineers, computer scientists and mathematicians that is actively building a set of standards that define a common data format for bioparts and their accompanying characterization data (Bower et al., 2010; Galdzicki et al., 2011; Quinn et al., 2013; Roehner and Myers, 2013). The concept is to create a file structure that can capture biopart sequence, characterization and experimental data in a format that is platform independent. Crucially, the format is designed to be extendable to include additional parameters as new characterization technologies and methodologies emerge. In combination with SBOL visual (SBOLv), which defines a standardized way to visually denote bioparts through symbols, the SBOL standard is set to enable the seamless sharing of genetic designs. Several bioinformatics and molecular cloning design tools have already adopted SBOL, and the intention for SBOL is to provide an interoperable standard between several *in silico* tools such that individuals can optimize their workflow as required, yet retain information between them. Several of these *in silico* tools have been extensively reviewed (MacDonald et al., 2011; Galdzicki et al., 2014); however, we include an updated list here that combines *in silico* tools from these existing reviews with several new tools, in particular R2o designer and COOL (**Table 2**).

#### **Table 2 | Emerging tools for the forward-design of synthetic pathways and systems**.

In contrast to SBOL, which is still under development, the iGEM registry of standard biological parts (http://parts.igem.org/) has provided a relatively large-scale and publicly accessible repository of bioparts and some biopart characterization data for almost 10 years. Since its inception, the iGEM community has led, with mixed success, concerted efforts to improve the quality of its characterization data. The 2014 iGEM competition, for instance, has announced several specialist awards for teams that demonstrate advancements in metrology. This push for improvements in biopart characterization at the grassroots (undergraduate) level has permeated up to professional characterization efforts. For instance, early difficulties in the reproducibility of the behavior of DNA regulatory elements between iGEM teams and professional research groups provided the context for the emergence of the relative promoter unit (RPU) as a reference measurement standard (Kelly et al., 2009). The RPU standard expresses the activity of a promoter relative to that of a reference standard promoter tested under the same experimental conditions, with the activity of the reference arbitrarily set to 1 RPU. The rationale underpinning this standard is that while the absolute activity of a promoter may differ between experimental repeats, the relative activity should be less prone to such variability. Essentially, a promoter that is twice the strength of the standard should remain so, even between different experimental conditions and methodologies of different research groups. In agreement with this, Kelly et al. (2009) reported a 50% decrease in variability when RPUs were independently reported for a set of Anderson constitutive promoters. Inter-experimental variability and reproducibility of data are a significant problem facing all scientific endeavors (Collins and Tabak, 2014), and for synthetic biologists the RPU measurement standard has highlighted these issues within the context of biopart characterization.
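The rationale for a relative unit can be illustrated with a small sketch. The readings below are invented for illustration and are not data from Kelly et al. (2009); the function names are hypothetical. Each hypothetical laboratory measures a test promoter and the reference promoter on its own instrument scale, so the absolute values differ widely while the ratios stay close.

```python
from statistics import mean, stdev


def relative_promoter_units(test_activity, reference_activity):
    """Activity of a test promoter in RPU (the reference is 1 RPU by definition)."""
    if reference_activity <= 0:
        raise ValueError("reference activity must be positive")
    return test_activity / reference_activity


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


# Made-up (test, reference) activities in arbitrary units for three labs.
LAB_READINGS = [(200.0, 100.0), (410.0, 200.0), (95.0, 50.0)]

absolute = [test for test, _ in LAB_READINGS]
relative = [relative_promoter_units(test, ref) for test, ref in LAB_READINGS]

if __name__ == "__main__":
    print("CV of absolute activities:", round(coefficient_of_variation(absolute), 3))
    print("CV of RPU values:", round(coefficient_of_variation(relative), 3))
```

The comparison only holds when the test and reference promoters are measured under the same conditions within each laboratory, which is the premise of the standard.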

There are, however, no universally agreed standards for advancing biopart characterization metrology, though in general the field is shifting away from relative measurements toward absolute measurements (**Table 3**). Many research groups are currently interested in measuring absolute numbers of cells, DNA molecules, proteins, or other components that constitute the synthetic system and its context. But this shift is largely incremental as certain types of biological data are very difficult to measure directly. These challenges are, however, worth addressing since it is assumed that such biological data are essential to improve the predictive capabilities of forward-design *in silico* models (Bower et al., 2010; Cooling et al., 2010). Yet, because of such data limitations, current modeling approaches often depend upon inferred or assumed parameters that are derived from biological data that can be experimentally verified. One such modeling approach proposed a set of standardized measurement units, termed PoPS and ribosomes per second (RIPS), even though the absolute biological data that underpin them have not been directly measured *in vivo* (Canton et al., 2008; Cooling et al., 2010; Marchisio, 2014). PoPS describes the flow of RNAP past a point on the DNA per second and RIPS describes the flow of ribosomes across an mRNA molecule. As noted, PoPS and RIPS cannot be measured directly; instead they are calculated using fluorescence data from a reporter protein (e.g., GFP), growth data (OD), and largely assumed values for other parameters including protein or mRNA concentrations. These data are generally measured *in vivo* within a plate reader setup, though flow cytometry-based characterization efforts are increasingly being adopted and are set to progress metrology at the single cell level (Díaz et al., 2010; Tracy et al., 2010; Choi et al., 2013; Zuleta et al., 2014). In either case, if experimental setups are sufficiently standardized, it is possible to convert measurements between several widely adopted standards: RPU, PoPS/RIPS, and absolute measurements such as GFP cell<sup>−1</sup> s<sup>−1</sup> (Kelly et al., 2009).
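As a hedged illustration of how plate reader fluorescence and growth data are typically combined, the sketch below estimates a GFP synthesis rate per unit OD by central finite differences. This is a common simplification and not the full PoPS or RIPS calculation, which requires further assumed parameters. The function name and the example readings are hypothetical.

```python
def synthesis_rate_per_od(times, gfp, od):
    """Estimate d(GFP)/dt divided by OD at each interior time point.

    times: measurement times (e.g., seconds)
    gfp:   background-corrected fluorescence readings
    od:    background-corrected optical density readings
    Returns a list with len(times) - 2 entries (end points are skipped).
    """
    if not (len(times) == len(gfp) == len(od)):
        raise ValueError("times, gfp and od must have equal length")
    rates = []
    for i in range(1, len(times) - 1):
        # Central difference approximates the slope at time point i.
        dgfp_dt = (gfp[i + 1] - gfp[i - 1]) / (times[i + 1] - times[i - 1])
        rates.append(dgfp_dt / od[i])
    return rates


if __name__ == "__main__":
    # Made-up readings for illustration only.
    print(synthesis_rate_per_od([0, 1, 2, 3], [0, 10, 20, 30], [0.1, 0.2, 0.4, 0.8]))
```

Converting such a rate into molecules per cell per second would additionally require calibration of fluorescence to GFP molecules and of OD to cell number, neither of which is attempted here.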

Notwithstanding the above limitations of PoPS and RIPS, these units were primarily designed to reflect the behavior of genetic circuits at the level of information flow (inputs/outputs) rather than at the truly mechanistic level (Gardner et al., 2000; Canton et al., 2008; Stricker et al., 2008; Marchisio, 2014). For biologists, however, these terms represent an abstract merger of several elements of the transcriptional and translational machinery, which does not accurately reflect the underlying mechanistic biology. However, abstract and mechanistic modeling approaches are not necessarily mutually exclusive since both approaches can provide insightful information for the forward-design of predictable biological pathways and systems.

Advances in metrology and novel measurement standards that are accessible, and hence, more widely adopted will clearly benefit the whole field of synthetic biology. Yet, it is challenging to achieve consensus for developing measurement standards, since standards intrinsically empower those that promote them above those that have not adopted them (Calvert, 2012; Frow and Calvert, 2013). Conversely, it should be noted that consensus in measurement standards and metrology does not preclude innovation if such standards are flexible enough to accommodate developments in the tools and methodologies that enable researchers to easily share, reuse, and build upon existing genetic designs. Likewise, standardized biological information can still be combined with expert knowledge, or novel forward-design strategies for the construction of complex, robust, and efficient biological systems.

#### **Table 3 | Synthetic biology measurement standards**.

Metrology in biology has been enabled in part by continual advancements in microscopy and, in synthetic biology, technologies such as microfluidics coupled with quantitative microscopy are continuing to gain traction (Lin and Levchenko, 2012; Song et al., 2013; Walter and Bustamante, 2014). Microfluidic technologies enable the precise manipulation of fluids at small scales through engineered channels, chambers, and valves. Microfluidic chip designs are sufficiently advanced to enable a high degree of spatial-temporal control of liquid flows to and between individual cells or cell populations seeded within the chambers of prefabricated microfluidic chips. With this level of control, small molecules that induce gene expression or influence other biological processes can be precisely delivered to elicit acute, basal, or morphogenic responses. Within a synthetic biology context, such systems have been used to characterize DNA regulatory elements, intercellular communication, and synthetic pathways at high spatial–temporal resolution. One notable example is the work of Hansen and O'Shea (2013), in which the microfluidic control of the delivery of a small molecule (1-NM-PP1) was used to control the nuclear localization of a yeast stress-inducible transcription factor, Msn2. Deliberate alterations in the oscillatory or acute dynamics of Msn2 trans-nuclear localization revealed the extent to which promoters respond differently to transcriptional-activation dynamics. From this, promoters could be modeled *in silico*, according to the extent that they could elicit differential gene expression patterns, as a consequence of their ability to distinguish a genuine nuclear influx of Msn2 from background "noise" (Hansen and O'Shea, 2013). Manipulation of these dynamics could be used to reduce promoter leakiness or, conversely, to exploit different classes of promoter transcriptional-signal processing to coordinate multiple genetic programs through the modulation of a single transcription factor.

Another important technology for synthetic biology is flow cytometry, which relies upon hydrodynamic focusing to guide single cells through a fluidic channel where they are measured (Piyasena and Graves, 2014). Recent models of flow cytometers can simultaneously measure cell size, complexity, and up to 17 channels of fluorescence (Basiji et al., 2007; Piyasena and Graves, 2014), each of which could be used to capture data from different reporter outputs. Of the biological reporters available, RNA aptamers are particularly noteworthy, since they have the potential to increase the type and range of biological information that can be measured (Cho et al., 2013; Pothoulakis et al., 2014). For instance, several groups have reported the simultaneous measurement of both transcription (mRNA levels) and translation (protein levels) (Chizzolini et al., 2013; Pothoulakis et al., 2014). In both cases Spinach, an RNA aptamer that binds a fluorophore (Paige et al., 2011), was incorporated within the 3′ untranslated region (UTR) of a fluorescent reporter protein, either GFP or RFP (Chizzolini et al., 2013; Pothoulakis et al., 2014). Providing there is no spectral overlap between fluorophores, this strategy could conceivably be up-scaled to measure entire synthetic pathways, and thus inform operon design strategies (Hiroe et al., 2012; Chizzolini et al., 2013). Metabolic engineering efforts may also benefit from engineered RNA aptamer-hybrids that simultaneously bind cellular metabolites and a fluorophore, effectively enabling the real-time reporting of intracellular metabolic flux (Barrick and Breaker, 2007; Roth and Breaker, 2009; Sefah et al., 2013; Szeto et al., 2014). These exponential increases in biological data could significantly impact whole-cell modeling (Atlas et al., 2008; Gama-Castro et al., 2011; Shuler et al., 2012; O'Brien et al., 2013) and pave the way for novel measurement standards or modeling approaches that are wholly based upon directly measured biological processes.

# **ASSEMBLING DNA INTO BIOPARTS, PATHWAYS, AND GENOMES**

Recombinant DNA technology, in which DNA sequences are "cut and pasted" together via restriction enzymes and DNA ligases respectively, forms the foundation of the 1970s biotechnological revolution and has greatly expanded the possibilities of genetic engineering (Zimmerman et al., 1967; Cohen et al., 1973; Lobban and Kaiser, 1973). Synthetic biology continues to benefit from these foundational advancements in recombinant DNA-based biotechnology. For example, the BioBrick DNA assembly standard uses a set of standardized restriction sites, termed the prefix (*Eco*RI *Xba*I) and suffix (*Spe*I *Pst*I), that flank each biopart (BioBrick) (Rokke et al., 2014). Digestion and ligation using these sites allow several parts to be assembled together in a standard fashion. The BioBrick standard was originally developed by Tom Knight in 2003 and is still used within the synthetic biology community, particularly during the iGEM competition (Rokke et al., 2014). The BioBrick assembly standard is beneficial to the synthetic biology community for several reasons. Firstly, the flanking restriction site sequences set a physical border that defines individual bioparts. As a result, the BioBrick assembly standard realizes the idea that DNA sequences encode discrete functions and that these individual blocks (BioBricks) can be assembled together like "Lego™ bricks." Additionally, the use of standardized restriction sites ensures that the cloning strategy for assembling BioBricks is standardized across the entire research community, thereby eliminating the requirement for some cloning-based tacit knowledge. Despite these advantages, a major limitation of the approach is that BioBrick sequences must not contain the prefix and suffix restriction sites, thus limiting the range of sequences that can be assembled. Additionally, when *Xba*I and *Spe*I sites are ligated together, the ligated sequence creates a "scar," which does not contain either an *Xba*I or *Spe*I restriction site (Speer and Richard, 2011; Rokke et al., 2014). Scar sequences may alter the behavior of the flanking bioparts or prevent the generation of fusion proteins, and therefore can be undesirable (Anderson et al., 2010; Ellis et al., 2011).
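The forbidden-site limitation can be checked computationally before a part is ordered or cloned. The following minimal sketch scans a part sequence for the four prefix and suffix sites named above. The function name `find_forbidden_sites` is hypothetical and the sketch covers only those four enzymes; because all four recognition sequences are palindromic, scanning one strand is sufficient.

```python
# Recognition sequences of the prefix and suffix enzymes named in the text.
FORBIDDEN_SITES = {
    "EcoRI": "GAATTC",
    "XbaI": "TCTAGA",
    "SpeI": "ACTAGT",
    "PstI": "CTGCAG",
}


def find_forbidden_sites(part_sequence):
    """Return {enzyme: [0-based start positions]} for sites found in the part."""
    seq = part_sequence.upper()
    hits = {}
    for enzyme, site in FORBIDDEN_SITES.items():
        positions = [
            i for i in range(len(seq) - len(site) + 1) if seq.startswith(site, i)
        ]
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    print(find_forbidden_sites("AAGAATTCAA"))  # contains an EcoRI site
    print(find_forbidden_sites("ACTAGA"))      # mixed XbaI/SpeI junction
```

The second call illustrates the scar property described above: the mixed junction formed by ligating *Spe*I and *Xba*I overhangs (ACTAGA) matches neither recognition sequence, so it cannot be re-cut by either enzyme.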

BioBrick assembly is also an inefficient way to create large multi-part constructs since it is limited to the assembly of two bioparts per reaction, as defined by the three antibiotic (3A) assembly method (Speer and Richard, 2011). ePathBrick potentially overcomes this limitation through the use of an expansive set of BioBrick-compatible isocaudomer pairs of restriction sites (Xu et al., 2012). The combinatorial assembly of multiple inserts is possible through the restriction digestion and ligation of different isocaudomer pairs into an ePathBrick vector. Backwards compatibility with the BioBrick standard is certainly advantageous from the perspective of modularity (re-useable bioparts); however, ePathBrick is still subject to the BioBrick limitations of forbidden sequences and post-assembly scar sequences. With these limitations in mind, several DNA assembly methods have been developed to address them (**Figure 2**) (Chao et al., 2014).


# **RESTRICTION-DIRECTED ASSEMBLY: Bgl BRICKS, GOLDEN GATE, AND SEVA**


Golden Gate assembly (Engler et al., 2008; Engler and Marillonnet, 2011, 2013), Bgl Bricks (Anderson et al., 2010), and the Standard European Vector Architecture (SEVA) (Silva-Rocha et al., 2013) use a set of restriction sites to standardize DNA assembly. However, in contrast to the BioBrick standard, these assembly methods use rare restriction site sequences, and therefore support a greater range of sequences. The Bgl Brick standard uses *Bgl*II and *Bam*HI restriction sites. Annealed *Bgl*II and *Bam*HI restriction sites generate an inert, glycine-serine encoding scar sequence, which in contrast to the BioBrick standard scar allows the assembly of protein fusions. Golden Gate assembly supports scar-less assembly through the use of Type IIS restriction enzymes, which cleave outside of their recognition sequence and leave a variable overhang that directs the assembly order and ligation reaction. If cleavage sites are designed appropriately, the final assembled sequences are "scar-less." More recently, combinatorial Golden Gate assembly methods have been described that allow multi-gene constructs, including synthetic pathways, to be assembled in parallel (Engler and Marillonnet, 2011, 2013). SEVA and, to some extent, ePathBrick differ from the majority of assembly methods in that they are more correctly described as modular standards. SEVA describes a set of criteria for the physical assembly of plasmids according to a three-component architecture: an origin of replication segment, a selection marker segment, and a cargo segment (Silva-Rocha et al., 2013). These segments are flanked by insulator sequences and assembled together with a set of rare restriction sites. While the rationales for restriction site-based assembly methods support modularity, their limitations have led several research groups in the synthetic biology community to "trade in" standardization and modularity in favor of "bespoke" assembly methods that enable one-pot assembly of multiple DNA parts.
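The way overhangs direct assembly order in Golden Gate cloning can be sketched as a simple chain-following procedure. The example assumes 4-nt overhangs written on one strand, with each part described by its left and right overhang. The overhang sequences, part names, and the function `assembly_order` are hypothetical and chosen for illustration only.

```python
def assembly_order(parts, start_overhang):
    """Follow matching overhangs to find the order in which parts ligate.

    parts: dict mapping part name -> (left_overhang, right_overhang)
    start_overhang: overhang exposed on the upstream end of the vector
    Returns (ordered part names, final exposed overhang).
    """
    by_left = {}
    for name, (left, _right) in parts.items():
        if left in by_left:
            raise ValueError("overhang %s is used by more than one part" % left)
        by_left[left] = name

    order = []
    current = start_overhang
    while current in by_left:
        name = by_left.pop(current)  # each part is used once
        order.append(name)
        current = parts[name][1]     # its right overhang selects the next part
    if by_left:
        raise ValueError("parts not reachable: %s" % sorted(by_left.values()))
    return order, current


if __name__ == "__main__":
    demo_parts = {
        "cds": ("AATG", "GCTT"),
        "promoter": ("GGAG", "AATG"),
        "terminator": ("GCTT", "CGCT"),
    }
    print(assembly_order(demo_parts, "GGAG"))
```

The final overhang returned by the function should match the downstream end of the destination vector for the construct to close; checking that, and avoiding palindromic or near-identical overhangs, is left to the designer in this sketch.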

# **OVERLAP-DIRECTED ASSEMBLY: GIBSON, SLiC, CPEC, SLiCE, AND PAPERCLIP**

Daniel Gibson developed a widely adopted DNA assembly method that allows multiple DNA fragments to be assembled in a one-pot *in vitro* reaction (Gibson et al., 2009; Gibson, 2011). Gibson assembly uses a linearized destination vector and PCR-generated inserts as its starting material. Inserts are generated with PCR primers that include 20–40 bp overlaps that share sequence homology to adjacent DNA fragments. As a result, the correct arrangement of several inserts entering the same destination vector can be defined. During the reaction, a T5 exonuclease acts to chew back the 5′ ends of the linearized destination vector and inserts. Because the reaction occurs at 50°C, the T5 exonuclease is eventually inactivated. The destination vector and inserts anneal together, as defined by their exonuclease-exposed homologous ends, and Phusion polymerase activity acts to fill in the gaps. Finally, Taq ligase seals nicks between the joined DNA fragments. Gibson assembly is simple, can assemble five or more parts in a single reaction, and the reaction itself only takes around 60 min, after which the final assembled product can be directly transformed into *E. coli*.
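The overlap requirement can be verified *in silico* before primers are ordered. The sketch below checks each junction of an ordered list of fragments for a shared terminal sequence within the 20–40 bp range given above. The function names and the example sequences are hypothetical, and no attempt is made to assess melting temperature or secondary structure.

```python
def junction_overlap(upstream, downstream, min_len=20, max_len=40):
    """Length of the longest suffix of `upstream` that equals a prefix of
    `downstream`, searched between min_len and max_len; 0 if none exists."""
    up, down = upstream.upper(), downstream.upper()
    longest = min(max_len, len(up), len(down))
    for n in range(longest, min_len - 1, -1):
        if up[-n:] == down[:n]:
            return n
    return 0


def check_junctions(fragments, circular=True):
    """Overlap length at each junction of the ordered fragments.

    With circular=True the last fragment is also checked against the first,
    as needed when the list includes the linearized vector.
    """
    pairs = list(zip(fragments, fragments[1:]))
    if circular and len(fragments) > 1:
        pairs.append((fragments[-1], fragments[0]))
    return [junction_overlap(a, b) for a, b in pairs]


if __name__ == "__main__":
    shared = "ACGT" * 6  # made-up 24 bp homology region
    insert_a = "TTTTT" + shared
    insert_b = shared + "GGGGG"
    print(check_junctions([insert_a, insert_b], circular=False))
```

A junction reported as 0 indicates that the corresponding primers would need redesigning before the fragments could anneal in the intended order.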

Sequence and Ligase-independent Cloning (SLiC) (Li and Elledge, 2007), Circular Polymerase Extension Cloning (CPEC) (Quan and Tian, 2009, 2011), and Seamless Ligation Cloning Extract (SLiCE) (Zhang et al., 2012) are also overlap-directed DNA assembly methods that all result in the same final product. Therefore, inserts and destination vectors designed for Gibson assembly can also be used in SLiC, CPEC, and SLiCE assemblies. During SLiC reactions, the destination vector and inserts are independently treated *in vitro* with T4 DNA polymerase, which exhibits exonuclease activity in the absence of deoxynucleotide triphosphates (dNTPs). Exonuclease activity is subsequently inhibited with the addition of deoxycytidine triphosphate (dCTP), and the destination vector and inserts are then mixed together for annealing. However, because SLiC reactions do not include DNA ligase, gaps or nicks in the DNA are repaired only once the final product is transformed into *E. coli*. CPEC, on the other hand, is a PCR-based approach in which the linearized destination vector and inserts are initially denatured to produce single DNA strands. These are then annealed together, as directed by the homologous DNA overlap regions. Once annealed, the destination vector and inserts prime each other for extension by Phusion DNA polymerase. A low number of PCR cycles limits the propagation of PCR-based errors. SLiCE reactions differ markedly from the assembly methods just described in that they use an *ex vivo* bacterial cell extract (PPY, *E. coli* DH10B λ–red) as the reaction mix. Since exogenous polymerases and DNA ligases are not required, this is a potentially cost-effective method and, like Gibson assembly, reactions typically take just 60 min, although at 37°C instead of 50°C.

PaperClip DNA assembly is a relatively new overlap-directed assembly method that uses pairs of bridging oligonucleotides termed "Clips" to direct the assembly of multi-part constructs (Trubitsyna et al., 2014). Interestingly, PaperClip assembly protocols are derived from CPEC (PCR-based) and SLiCE (*ex vivo*-based) assembly methodologies. Yet, PaperClip assembly is advantageous over these assembly methods in that once the "Clips" have been prepared, the required assembly order of parts can be determined in a single reaction. While "Clips" introduce an alanine-encoding scar sequence between each part, the bridging oligos used to assemble multi-part constructs in ligase cycling reaction (LCR) assembly are scar-less (Rouillard et al., 2004; de Kok et al., 2014). However, as we describe below, PaperClip assembly differentiates itself from Gibson, CPEC, SLiCE, and LCR assembly methods in that assembly fragments do not need to be generated *de novo* each time the assembly order is changed (Trubitsyna et al., 2014).

Overlap-directed assembly methods use sequence homology to guide assembly and are therefore largely sequence independent. This is a clear advantage over restriction site-based DNA assembly methods and their forbidden sequences. It should be noted that repeat and short DNA sequences, particularly those that give rise to DNA secondary structures, can reduce the efficiency of overlap-directed methods and are best avoided. On the other hand, CPEC denaturation PCR cycles mitigate the effect of DNA secondary structures to some degree. Overlap-directed methods are also efficient at assembling multiple parts in a predefined order within a single one-pot reaction. Gibson assembly, for example, has been used to assemble genome-scale DNA fragments, including the complete assembly of the *M. genitalium* genome (583 kb) and more recently the entire mouse mitochondrial genome (16.3 kb) (Gibson et al., 2008, 2010b). It is clear, therefore, that overlap-directed assembly methods can be scaled toward the assembly of large genetic constructs, including synthetic genomes (Gibson et al., 2010a). Yet, despite their proven utility, they are inherently "bespoke" and are thus in conflict with the ideals of embedding standardization and modularity concepts within DNA assembly strategies. For instance, custom primers are needed to generate inserts *de novo* each time the assembly order is changed, and while it is now possible to automate overlap-directed assembly primer design (Hillson et al., 2012), these assembly methods still require tacit knowledge. To this end, additional methodologies are being developed with the aim of making overlap-directed DNA assembly modular.

# **OVERLAP-DIRECTED ASSEMBLY WITH BIOLOGICALLY NEUTRAL LINKER SEQUENCES**

Modular overlap-directed assembly with linkers (MODAL) makes use of standardized flanking sequences and biologically neutral (orthogonal) linkers as part of a modular overlap-directed DNA assembly strategy (Casini et al., 2013). MODAL assembly requires bioparts to be standardized with the addition of a common prefix and suffix sequence. The prefix and suffix sequences do not contain restriction sites and are not directly required for the assembly process. Instead, these sequences serve as a consistent set of PCR primer "landing pads" that enable all MODAL bioparts to be generated using the same primer set. Additionally, these sites serve as priming sites for the PCR-directed addition of biologically neutral linker sequences, which provide the homologous sequences for overlap-directed assembly. These linkers can be designed with R2oDNA Designer (Casini et al., 2013, 2014), an *in silico* tool that was developed to automatically design orthogonal linker sequences for use in MODAL and other applications. Similar strategies have also been developed in parallel, in which biologically inactive unique nucleotide sequences (UNSes) were utilized to guide the Gibson assembly of insulated genetic circuits (Guye et al., 2013; Torella et al., 2014a,b). These neutral sequences are often standardized and may also incorporate BioBrick restriction sites, thus enabling modularity and standardization to be embedded within overlap-directed assembly strategies.
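The way a common prefix and suffix decouple the parts from the assembly order can be sketched in a few lines of Python. All sequences below are hypothetical placeholders, not the published MODAL prefix and suffix or linkers produced by R2oDNA Designer, and the function names are illustrative assumptions.

```python
# Hypothetical placeholder sequences, chosen only for illustration.
PREFIX = "GCTAGCATTCGGACCTTAGC"
SUFFIX = "CGATTCAGGCTAAGTCCGAT"
LINKERS = {
    "L1": "ACGTTGCAACGGTTCAAGTC",
    "L2": "TTGGCCAATCGGATACCGTA",
    "L3": "GGATCTACCGTTAGCCATGA",
}


def standardize(part):
    """Flank a biopart with the common prefix and suffix 'landing pads'."""
    return PREFIX + part + SUFFIX


def add_linkers(standard_part, left, right):
    """Model the PCR step that appends linkers outside the prefix and suffix."""
    return LINKERS[left] + standard_part + LINKERS[right]


def plan_linear_assembly(order, parts):
    """Assign consecutive linkers so that neighboring fragments share one."""
    names = sorted(LINKERS)
    if len(order) + 1 > len(names):
        raise ValueError("not enough linkers for this many parts")
    return [
        add_linkers(standardize(parts[name]), names[i], names[i + 1])
        for i, name in enumerate(order)
    ]
```

In this sketch, reordering the parts only changes which linkers are attached to each standardized part; the primers still anneal to the same prefix and suffix, so no part-specific primers need to be redesigned.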

#### **IN VIVO DNA ASSEMBLY AND GENOME ENGINEERING**

An array of chassis with a broad set of useful, extensively characterized genotypes and phenotypes is available to the synthetic biology community (**Table 2**). However, there are applications where it is appropriate to rationally engineer a chassis. For instance, an application may require a novel strain that is optimized, at the genome level, to fit a set of specific design requirements that may be difficult or otherwise impractical to bioprospect. Typically, genome-engineering efforts are geared toward maximizing compatibility between a chassis and a synthetic system, increasing the efficiency of the metabolic flux across a synthetic pathway, or minimizing burden effects. The field is making progress in establishing rationally engineered genomes, of which the synthetic yeast 2.0 project (Dymond and Boeke, 2012; Annaluru et al., 2014; Lin et al., 2014) and minimal genome projects (Glass et al., 2006; Dewall and Cheng, 2011; Shuler et al., 2012) are currently the most prominent exemplars. These genome-engineering efforts are made possible by the emergence and ongoing development of an expanding set of *in vivo* DNA assembly methods and genome-engineering tools.

Recombineering approaches, in which synthetic linear ds/ssDNA sequences are introduced into genomic regions through a process of homologous recombination, have proven utility as an efficient method to knock out or knock in sequences of interest. Recombineering enables genomic engineering at all scales, from the introduction of single nucleotide polymorphisms, to the replacement of 40 kb+ DNA fragments, or even toward the assembly of entire genomes (Narayanan and Chen, 2011; Zhao et al., 2011; Bonde et al., 2014; Song et al., 2014). *S. cerevisiae* transformation-associated recombination (TAR) cloning (Kouprina and Larionov, 2008), *Bacillus* Domino (Ohtani et al., 2012), and the *E. coli* Single-Selective-Marker Recombination Assembly System (SRAS) (Shi et al., 2013) use the endogenous homologous recombination machinery of the indicated organisms to assemble DNA constructs *in vivo*. A variant of the yeast TAR method has successfully generated several genomes, including the first *in vivo* assembled synthetic genome of *M. genitalium* (Gibson et al., 2008). *Bacillus* Domino has had similar success, having assembled DNA at the genomic scale, including the mouse mitochondrial genome and the rice chloroplast genome (Itaya et al., 2008; Ohtani et al., 2012; Iwata et al., 2013). While *E. coli* SRAS could potentially support the assembly of large DNA fragments, it is currently optimized for the assembly of multi-part constructs and their simultaneous integration into the *E. coli* genome (Shi et al., 2013).

The lambda-red (λ-red) recombinase system is another recombineering strategy, which is used for the integration of ssDNA or dsDNA constructs into the *E. coli* genome (Murphy, 1998; Murphy and Campellone, 2003). Optimized lambda-red recombination protocols can integrate linear DNA sequences into a specific genomic target, guided by only 35–50 bases of flanking homologous sequence (Murphy and Campellone, 2003). Interestingly, lambda-red-mediated recombination events do not require endogenous recombination proteins (e.g., RecA) and instead linear ssDNA or dsDNA constructs are integrated into the *E. coli* genome via the action of three λ-red proteins: Gam, Exo, and Beta. Gam protects linear dsDNA from the exonuclease activity of the endogenous proteins RecBCD, thus increasing the efficiency at which the introduced dsDNA will be recombined into the genome. λ-red-mediated recombination itself is primarily mediated by Exo, a 5′–3′ dsDNA-specific exonuclease, and Beta, a ssDNA annealing protein. It is interesting to note that Gam-associated protection of dsDNA is exploited in SLiCE *ex vivo* DNA assembly and, as we discuss later, for *in vitro* transcription–translation (TX–TL) coupled reactions involving linear DNA as the input (Sitaraman et al., 2004).
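As a simple illustration of how short flanking homology defines the integration site, the following Python sketch builds a linear cassette with two homology arms and then models the expected recombinant sequence by string replacement. The function names and the default 40-base arm are assumptions for illustration; the accepted arm range follows the 35–50 bases stated above, and none of the underlying biochemistry is modeled.

```python
def recombineering_cassette(genome, start, end, payload, arm=40):
    """Flank `payload` with homology arms bordering genome[start:end]."""
    if not 35 <= arm <= 50:
        raise ValueError("arm length outside the 35-50 base range")
    if start > end or start < arm or end + arm > len(genome):
        raise ValueError("target region leaves no room for homology arms")
    return genome[start - arm:start] + payload + genome[end:end + arm]


def integrate(genome, cassette, arm=40):
    """Model the recombinant: sequence between the arms becomes the payload."""
    left, right = cassette[:arm], cassette[-arm:]
    i = genome.find(left)
    j = genome.find(right, i + arm) if i >= 0 else -1
    if i < 0 or j < 0:
        raise ValueError("homology arms not found in the genome")
    payload = cassette[arm:len(cassette) - arm]
    return genome[:i + arm] + payload + genome[j:]
```

An empty payload models a clean deletion, whereas a payload with `start == end` models an insertion that removes nothing.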

The introduction of a large number of rationally engineered genomic changes is a potentially laborious process; however, multiplex automated genome engineering (MAGE) enables the automation of large-scale recombineering strategies. MAGE was originally characterized within EcNR2, a variant strain of *E. coli* MG1655. EcNR2 was modified to incorporate the λ-red recombination system and also to be deficient in DNA mismatch repair via the knockout of the *mutS* gene (Wang et al., 2009). MAGE relies upon the λ-red Beta protein-assisted incorporation of ssDNA oligonucleotides, typically 90mers, into the lagging strand during DNA replication (Wang et al., 2009). MAGE oligonucleotide pools can be designed to incorporate highly specific changes at a single genomic site, to introduce multiple changes across a single locus, or to simultaneously target multiple genomic sites. These outcomes are largely defined through the diversity of the MAGE oligonucleotide pool, where mixtures of degenerate oligonucleotides can be designed to introduce divergent changes across a broad sequence and recombination efficiency space. Where a large number of simultaneous genomic changes are required, the process can be repeated through multiple MAGE cycles of cell growth, electroporation of oligonucleotides into the cell population, and phenotype/genotype characterization. MAGE cycles can be automated through a microfluidics-type setup and, in combination with MODEST or optMAGE, which are *in silico* MAGE oligonucleotide design tools (**Table 4**), the directed evolution of a rationally designed chassis can be accomplished within a timescale of several days. Indeed, MAGE has been used to optimize the DXP pathway in *E. coli*, such that variants capable of a fivefold increase in lycopene production were isolated in just 3 days (Wang et al., 2009).
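To make the notion of oligonucleotide pool diversity concrete, the sketch below builds a 90-mer around a target position and expands degenerate positions, written with standard IUPAC ambiguity codes, into the individual sequences they represent. It is an illustration only: the function names are hypothetical, this is not how MODEST or optMAGE design oligonucleotides, and lagging-strand targeting and recombination efficiency are ignored.

```python
from itertools import product
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mage_oligo(genome, start, replacement, length=90):
    """Return a `length`-nt oligo that swaps the bases at `start` for
    `replacement`, with the edit placed roughly in the center."""
    flank = (length - len(replacement)) // 2
    left = genome[max(0, start - flank):start]
    right_start = start + len(replacement)
    right = genome[right_start:right_start + length - len(left) - len(replacement)]
    oligo = left + replacement + right
    if len(oligo) != length:
        raise ValueError("target is too close to the end of the sequence")
    return oligo


def pool_size(oligo):
    """Number of distinct sequences encoded by a degenerate oligo."""
    return prod(len(IUPAC[base]) for base in oligo)


def expand(oligo):
    """Yield every concrete sequence encoded by a degenerate oligo."""
    for combo in product(*(IUPAC[base] for base in oligo)):
        yield "".join(combo)
```

For example, three fully degenerate positions (`NNN`) already encode 64 variants, so `pool_size` is worth checking before calling `expand` on heavily degenerate designs.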

#### **Table 4 | DNA assembly and genome-engineering tools**.

\*Sequence-independent assembly strategies do not place restrictions upon which DNA sequences are permitted within assembly fragments.

In parallel with MAGE, conjugative assembly genome engineering (CAGE) can be used to coordinate large-scale genomic engineering strategies across phases, such that subtle genetic combinations that are lethal can be screened out in a manner that does not impede overall progress toward the final strain. To achieve this, CAGE guides the conjugal transfer of MAGE genome modifications between hierarchical pairs of donor–recipient *E. coli*, such that a new strain emerges which incorporates all of the MAGE-optimized modifications from previous generations (Isaacs et al., 2011). Multiple MAGE–CAGE rounds enable a large set of genomic modifications to be generated and carefully integrated. As an example of such an approach, Isaacs et al. (2011) used a MAGE–CAGE strategy to replace 314 TAG stop codons with the synonymous TAA across the entire *E. coli* genome.
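The sequence-level outcome of such a codon replacement can be sketched as follows. The sketch assumes that coding sequences are supplied as hypothetical `(start, end, strand)` tuples with 0-based, end-exclusive coordinates on the top strand, and it only rewrites the sequence string; it does not model the MAGE or CAGE steps used to introduce the changes.

```python
def recode_tag_stops(genome, coding_sequences):
    """Replace terminal TAG stop codons with TAA.

    For a '+' strand gene the stop codon is the last three bases of the
    region. For a '-' strand gene it appears on the top strand as CTA at the
    start of the region, and is changed to TTA (the reverse complement of TAA).
    Returns the edited sequence and the number of codons changed.
    """
    seq = list(genome.upper())
    changed = 0
    for start, end, strand in coding_sequences:
        if (end - start) % 3 != 0:
            raise ValueError("coding sequence length is not a multiple of 3")
        if strand == "+" and "".join(seq[end - 3:end]) == "TAG":
            seq[end - 1] = "A"
            changed += 1
        elif strand == "-" and "".join(seq[start:start + 3]) == "CTA":
            seq[start] = "T"
            changed += 1
    return "".join(seq), changed
```

Genes that already end in TAA or TGA are left untouched, so the returned count reports only the TAG codons that were actually recoded.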

Engineered nucleases, which cleave specific DNA sequences, creating double-stranded DNA breaks, can be used to introduce genomic changes. These strategies depend upon the random occurrence of perturbations in DNA repair mechanisms, where double-stranded breaks are inappropriately repaired, resulting in erroneous sequence insertions, deletions, or even significant chromosomal rearrangements. Cells that contain desirable genomic alterations can subsequently be identified by screening and isolated as an engineered population. Zinc-finger nucleases (Ellis et al., 2013), TALENs (Mahfouz et al., 2011), and the CRISPR/Cas system (Sander and Joung, 2014) have all been engineered for these types of genome editing applications. The CRISPR/Cas system is particularly interesting since, as discussed above, a deactivated Cas9 nuclease:gRNA complex can also be fused with domains that act as transcriptional activators or repressors (Bikard et al., 2013; Mali et al., 2013; Qi et al., 2013). Nuclease-mediated genome editing strategies can also be combined with a recombineering-type approach, in which an engineered dsDNA that has sequence complementarity at the site of the nuclease breakage is introduced into the cell. Through the endogenous homologous recombination machinery (DNA repair mechanisms), it is possible to rationally integrate the engineered dsDNA into the genome (Cong et al., 2013; Sander and Joung, 2014). Thus, in combination, MAGE, CAGE, and engineered, targeted nucleases (zinc-finger nucleases, TALENs, and Cas9) represent a set of molecular tools that enable genome editing and the transcriptional control of natural and synthetic genomes.

# **DNA SYNTHESIS**

Synthetic biology has greatly benefited from the rapid decline in the cost of commercial gene synthesis, a phenomenon popularized by the Carlson curve (Carlson, 2009), which is analogous to Moore's law. Although the rate of decline has decreased in recent years, with DNA synthesis costs now relatively stable (Carlson, 2009, http://www.synthesis.cc/cgi-bin/mt/mt-search.cgi?blog\_id=1&tag=CarlsonCurves&limit=20), it is likely that new disruptive technologies will decrease DNA synthesis costs in the near future. DNA synthesis costs are already low enough that many research groups routinely order the synthesis of genes and gene fragments, although they remain prohibitive for library generation or for the synthesis of large multi-part pathways. In these cases, gene synthesis can be combined with additional cloning techniques such as overlap-directed assembly or mutagenic PCR to generate large constructs or biopart libraries, respectively. It is likely that as DNA synthesis costs decline, there will be a continual shift away from DNA assembly toward *de novo* DNA synthesis, which will have a transformative effect on synthetic biology and the design-build-test cycle.

# **RAPID PROTOTYPING**

High-throughput platforms bring scalability to biopart characterization efforts, through the parallel characterization of function and context of entire biopart libraries (Arkin, 2013; Keren et al., 2013; Mutalik et al., 2013b). To ensure consistency at such scale, high-throughput workflows typically couple liquid-handling robots with plate readers (Keren et al., 2013), flow cytometry (Piyasena and Graves, 2014; Zuleta et al., 2014), or microfluidics (Lin and Levchenko, 2012; Benedetto et al., 2014) in order to automate the majority of the experimental workflow. Several high-throughput platforms have been described, the majority of which were used to characterize DNA regulatory elements (Keren et al., 2013; Mutalik et al., 2013a,b); however, this is expanding to include the characterization of enzymes (Choi et al., 2013), multi-gene operons (Chizzolini et al., 2013), and RNA aptamers (Cho et al., 2013; Szeto et al., 2014). When coupled with automated data analysis and modeling, these technologies and workflows could become rapid prototyping platforms, enabling a truly biological design cycle approach (Kitney and Freemont, 2012). At present, these high-throughput workflows are typically semi-rational design strategies in which thousands of biopart variants are tested and screened as part of a discovery workflow. Yet, at the same time, these approaches are generating large data sets that provide useful insights into biological processes that may inform biological design rules. For example, characterization efforts have informed several systematic methodologies for the rational optimization of synthetic systems at the transcriptional, translational, and post-translational level (**Table 2**) (Arkin, 2008; Arpino et al., 2013; Reeve et al., 2014). In cases where synthetic systems could conceivably be rationally designed, it is still naïve to assume that the first iteration of a synthetic biological system will perfectly match the design specifications. Instead, multiple iterations of the design-build-test cycle will be needed until forward-design approaches are sufficiently advanced. Therefore, interoperable standards that allow researchers to apply the same protocols across different liquid-handling platforms are essential. To this end, Linshiz et al. (2013) have implemented a high-level robot programming language (PaR–PaR), which can translate biological protocols into instruction sets for an extendable range of liquid-handling robot platforms. As a consequence of this approach, the training requirements for end users to implement the same biological protocol across different liquid-handlers are significantly reduced (Linshiz et al., 2013). If, as the authors propose, PaR–PaR is combined with SBOL, then the adoption of PaR–PaR scripts will enable researchers to share the same high-throughput DNA assembly or characterization protocols, but have them implemented across different experimental and equipment setups.
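The general idea of describing a protocol once and translating it for different liquid handlers can be illustrated with a short Python sketch. This is not PaR–PaR syntax, and both output formats below are invented for illustration; they do not correspond to the instruction set of any real robot platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    """One platform-independent liquid transfer."""
    source: str
    destination: str
    volume_ul: float


def to_worklist(steps):
    """Render steps as a CSV-style worklist (invented format)."""
    rows = ["source,destination,volume_ul"]
    rows += [f"{s.source},{s.destination},{s.volume_ul:.1f}" for s in steps]
    return rows


def to_script(steps):
    """Render steps as verbose command lines (invented format)."""
    return [
        f"ASPIRATE {s.volume_ul:.1f} uL FROM {s.source}; DISPENSE INTO {s.destination}"
        for s in steps
    ]


# Each hypothetical platform registers one translation function.
BACKENDS = {"worklist": to_worklist, "script": to_script}


def compile_protocol(steps, platform):
    """Translate the same abstract protocol for the chosen platform."""
    if platform not in BACKENDS:
        raise ValueError(f"no backend registered for {platform!r}")
    return BACKENDS[platform](steps)
```

Supporting an additional platform in this sketch means registering one more function in `BACKENDS`, while the shared protocol description stays unchanged.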

The majority of the rapid prototyping platforms that we have described so far have been optimized for testing biological parts, devices, and systems *in vivo*; however, *in vitro* systems are emerging as a useful testing platform. Cell-free protein synthesis (CFPS) systems based upon *E. coli* (Nirenberg and Matthaei, 1961; Sitaraman et al., 2004; Hong et al., 2014), *B. subtilis* (Zaghloul and Doi, 1987), *S. cerevisiae* (Hodgman and Jewett, 2013; Gan and Jewett, 2014), or other cell extracts have been reported in the scientific literature for several decades. Several CFPS systems are commercially available and are principally marketed as protein expression systems. Optimized *E. coli* CFPS systems can synthesize up to 2.3 mg/ml of the target protein (Caschera and Noireaux, 2014), including those that are toxic *in vivo*. In recent years, the synthetic biology community has repurposed CFPS systems as *in vitro* transcription–translation (TX–TL) coupled characterization platforms. A typical TX–TL reaction combines a synthetic system encoded into plasmid, linear or closed circular DNA, with cell-free extract, and a reaction buffer, the contents of which can be optimized (Sun et al., 2013a). For instance, the addition of maltodextrin (Wang and Zhang, 2009) and to a lesser degree maltose (Caschera and Noireaux, 2014) as an additional energy source can increase protein production, essentially prolonging the duration of *in vitro* reactions for up to 10 h.

Transcription–translation characterization systems provide characterization data within a timescale of hours (Chappell et al., 2013), and are therefore amenable to a rapid prototyping workflow (Chappell et al., 2013; Sun et al., 2013b). For instance, Chappell et al. (2013) characterized a panel of Anderson constitutive promoters, using a commercially available TX–TL system, within a 5-h workflow. Interestingly, the *in vitro* characterization data of a set of Anderson promoters correlated with their performance *in vivo* (Chappell et al., 2013). Likewise, in the same study, a panel of LasR-responsive, AHL-inducible promoters also behaved similarly *in vitro* and *in vivo*, although meaningful comparisons could only be made where constructs were encoded into plasmid or closed circular DNA (Chappell et al., 2013). PCR-generated linear DNA templates did not produce sufficient transcription and translation of the reporter protein (Chappell et al., 2013). Based upon several reports, it is likely that linear DNA templates are unstable *in vitro* due to the presence of exonuclease activity in the cell-free extract (Sitaraman et al., 2004; Sun et al., 2013b). Expression of the phage lambda protein Gam, an inhibitor of RecBCD (ExoV), along with other modifications, can minimize linear DNA degradation, thus restoring protein expression to levels that are comparable to plasmid DNA constructs (Sitaraman et al., 2004; Sun et al., 2013b). Yet, in disagreement with several other studies (Chappell et al., 2013; Iyer et al., 2013; Lu and Ellington, 2014), Sun et al. (2013b) reported that *in vitro* characterization data were not comparable to *in vivo* data, though they did describe a methodology to calibrate between them. While the comparability between *in vitro* and *in vivo* characterization requires further investigation, several reports have demonstrated that cell-free TX–TL systems have proven utility in the rapid prototyping of logic-based genetic circuits (Karig et al., 2012; Shin and Noireaux, 2012; Iyer et al., 2013) or synthetic operons (Lu and Ellington, 2014). Within a systematic design context, *in vitro* characterization approaches have the potential to complement *in vivo* prototyping efforts by rapidly providing the characterization data required to rationally select a smaller number of designs for final testing (**Figure 3**).

# **CONCLUSION**

Synthetic biology is generally described as the "engineering of biology," yet since its inception, the field has faced the well-understood reality that biological systems are complex, stochastic, and difficult to predict, and are therefore intrinsically difficult to engineer. In order to address these fundamental challenges, synthetic biology must use and explore the existing large body of knowledge of biological systems at different scales, from molecular to cellular to organismal. Establishing a systematic design framework in which existing biological knowledge can be adapted and utilized will ensure the rapid development of successful applications using synthetic biology. Furthermore, the accumulated measurements and acquired knowledge of many synthetic biology experiments will allow synthetic biologists to establish design rules that tackle biological complexity, such that robust biological systems can be designed, assembled, and prototyped as part of a biological design cycle. At each stage of the design cycle, an expanding repertoire of tools is being developed. In this review, we have highlighted several of these tools in terms of their applications and benefits to the synthetic biology community within the context of the synthetic biology design cycle, namely, designing predictable biology (design), assembling DNA into bioparts, pathways, and genomes (build), and rapid prototyping (test).

Design encompasses the development of tools and methodologies that make it easier to forward-design predictable synthetic biological systems. While several areas are critical to designing predictable biology, including chassis selection, biopart design, and engineering strategies, as well as several accompanying *in silico* design tools, we would argue that measurement and characterization (metrology) of biological parts, devices, and systems is essential for the field of synthetic biology to fulfill its promise. It is only through improvements in our ability to measure and generate meaningful conclusions about the behavior of biological processes that the field can progress in terms of unlocking additional biological design rules. The RBS calculator is the current exemplar of this perspective, though further work is required to equip the synthetic biology toolbox with the tools to make it easier to engineer radically complex synthetic biological parts, devices, and systems.

Build encompasses DNA assembly and genome-engineering methods that enable synthetic systems to be assembled. The field has benefited immensely from the BioBrick assembly standard. By effectively making bioparts reusable (modular) at the physical DNA level, BioBrick assembly creates a standard that enables multiple research groups to use and share an expanding library of bioparts, without the need for bespoke cloning strategies. While limitations in the BioBrick assembly standard led to the emergence of powerful overlap-directed assembly methods, including Gibson, these methods also shifted away from several of the core principles of synthetic biology, since they rely on bespoke cloning strategies. However, emerging DNA assembly methods, including MODAL or Gibson with UNSes, aim to unify the advantages of overlap-directed assembly with the engineering principle of modularity. In addition, advances in DNA synthesis and the resultant reduction of costs could radically transform the field, such that more time could be diverted away from DNA assembly toward the designing or testing of synthetic systems.

Test encompasses elements of biopart characterization, since even the testing of non-functional designs may provide insights into our understanding of the biological design rules. Liquid-handling robot high-throughput characterization platforms, along with plate readers, are equipped to test prototypes of synthetic bioparts, devices, and systems. However, these systems benefit from the addition of flow cytometry and microfluidics, which bring single-cell analysis to these platforms. Thus, individual cells could be analyzed and selected based upon preferred biological performance from a heterogeneous cell mix. Additionally, an array of emerging *in vitro* TX–TL cell-free characterization systems provide characterization data within a timescale of hours, and are therefore amenable to a rapid prototyping workflow. Such systems are complementary to *in vivo* high-throughput approaches and may speed up iterations through the design cycle by reducing the number of final designs that need to be tested.

As the synthetic biology toolkit expands and more design rules are unlocked, the most successful forward-design strategies are likely to be those that encompass a diverse workflow that combines several interoperable tools at each stage of the design cycle.

# **ACKNOWLEDGMENTS**

We wish to acknowledge the support of the Engineering and Physical Sciences Research Council (EPSRC) and that of our colleagues in the Centre for Synthetic Biology and Innovation (CSynBI) at Imperial College. Funding: EPSRC [EP/K034359/1; EP/J02175X/1], Bill and Melinda Gates Foundation [OPP1046311], The Wellcome Trust [084369/Z/07/Z].

# **REFERENCES**


and cloning of a *Mycoplasma genitalium* genome. *Science* 319, 1215–1220. doi:10.1126/science.1151721


Lobban, P. E., and Kaiser, A. D. (1973). Enzymatic end-to end joining of DNA molecules. *J. Mol. Biol.* 78, 453–471. doi:10.1016/0022-2836(73)90468-3


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 September 2014; accepted: 12 November 2014; published online: 10 December 2014.*

*Citation: Kelwick R, MacDonald JT, Webb AJ and Freemont P (2014) Developments in the tools and methodologies of synthetic biology. Front. Bioeng. Biotechnol. 2:60. doi: 10.3389/fbioe.2014.00060*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Kelwick, MacDonald, Webb and Freemont . This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Production of fatty acid-derived valuable chemicals in synthetic microbes

# **Ai-Qun Yu<sup>1,2</sup>, Nina Kurniasih Pratomo Juwono<sup>1,2</sup>, Susanna Su Jan Leong<sup>1,2,3</sup> and Matthew Wook Chang<sup>1,2</sup>\***

<sup>1</sup> Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore

<sup>2</sup> Synthetic Biology Research Program, National University of Singapore, Singapore, Singapore

<sup>3</sup> Singapore Institute of Technology, Singapore, Singapore

#### **Edited by:**

Jean Marie François, Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés UMR-CNRS 5504, France

#### **Reviewed by:**

Zongbao K. Zhao, Chinese Academy of Sciences, China

Taek Soon Lee, Lawrence Berkeley National Laboratory, USA

#### **\*Correspondence:**

Matthew Wook Chang, Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 28 Medical Drive, 117456 Singapore

e-mail: bchcmw@nus.edu.sg

Fatty acid derivatives, such as hydroxy fatty acids, fatty alcohols, fatty acid methyl/ethyl esters, and fatty alka(e)nes, have a wide range of industrial applications including plastics, lubricants, and fuels. Currently, these chemicals are obtained mainly through chemical synthesis, which is complex and costly, and their availability from natural biological sources is extremely limited. Metabolic engineering of microorganisms has provided a platform for effective production of these valuable biochemicals. Notably, synthetic biology-based metabolic engineering strategies have been extensively applied to refactor microorganisms for improved biochemical production. Here, we reviewed: (i) the current status of metabolic engineering of microbes that produce fatty acid-derived valuable chemicals, and (ii) the recent progress of synthetic biology approaches that assist metabolic engineering, such as mRNA secondary structure engineering, sensor-regulator system, regulatable expression system, ultrasensitive input/output control system, and computer science-based design of complex gene circuits. Furthermore, key challenges and strategies were discussed. Finally, we concluded that synthetic biology provides useful metabolic engineering strategies for economically viable production of fatty acid-derived valuable chemicals in engineered microbes.

**Keywords: synthetic biology, metabolic engineering, fatty acid biosynthesis pathway, biochemical production, Escherichia coli, yeast**

# **INTRODUCTION**

Fatty acids are one of the major components found in all organisms, usually in the intracellular forms of fatty acyl–acyl carrier protein (acyl-ACP), fatty acyl-coenzyme A ester (acyl-CoA), storage lipids, eicosanoids, and unesterified free fatty acids. In industry, applications of free fatty acids are generally limited due to the ionic nature of their carboxyl group (Peralta-Yahya et al., 2012). Comparatively, fatty acid derivatives have wider applications such as biofuels, biomaterials, and other biochemicals (Lennen and Pfleger, 2013; Runguphan and Keasling, 2014).

The low abundance or yield of fatty acid-derived chemicals in organisms renders their isolation from natural sources economically unviable. The synthesis of fatty acid derivatives by chemical means also suffers from low efficiency and often requires harsh reaction conditions, prolonged reaction times, and expensive equipment (Song et al., 2013). The production of fatty acid-derived chemicals by engineering microbial cells into microbial factories is becoming an attractive alternative that can overcome the aforementioned bottlenecks associated with the other synthesis routes (Keasling and Chou, 2008; Schirmer et al., 2010; Lee et al., 2012).

To date, synthetic enzymatic pathways that lead to the production of fatty acid-derived valuable chemicals including fatty alkanes, fatty acid methyl/ethyl esters, fatty alcohols, hydroxy fatty acids, and lactones have been constructed in microorganisms such as *Escherichia coli* and *Saccharomyces cerevisiae*. However, it remains a challenge to achieve high yield, titer, and productivity of these fatty acid-derived chemicals. The major challenges faced in maximizing product titer are associated with: (1) improving the low enzyme activity of an entire metabolic pathway, (2) increasing the inadequate tolerance of the host microorganisms toward toxic target compounds, (3) recycling or replacing insufficient cofactors for enzymatic reactions, (4) enriching precursors and eliminating byproducts, and (5) optimizing and balancing the fluxes of whole metabolic networks to reduce the burden on the host and remove negative feedback regulation. Recently, advanced synthetic biology approaches have shown potential to address these problems in re-engineering microbial systems for fatty acid-derived chemical production (Clomburg and Gonzalez, 2010; Siddiqui et al., 2012; Zhang et al., 2012a), which narrows the gap toward full-scale commercialization and industrialization of this manufacturing route.

In this review, we focus on the recent progress in metabolic engineering efforts to convert fatty acids to valuable chemicals using microbes as hosts, and advancement in synthetic biology approaches for further optimizing biochemical production in microbial biofactories.

# **METABOLISMS OF FATTY ACIDS IN ORGANISMS**

Fatty acids are an integral part of all living organisms, and are generally composed of a hydrophobic hydrocarbon chain ending in one hydrophilic carboxylic acid functional group. The pathway of fatty acid metabolism in organisms is well studied (**Figure 1**). Fatty acids are commonly built via *de novo* synthesis and elongation. **Figure 1** shows that *de novo* fatty acid synthesis starts from the primer acetyl-CoA and the extender malonyl-CoA and proceeds through a cyclic series of reactions catalyzed by fatty acid synthases. The synthesized fatty acids are almost entirely composed of even-length and straight carbon chains that have various numbers of carbon atoms (<6, short chain; 6–12, medium chain; >14, long chain) and different degrees of unsaturation (saturated, monounsaturated, and polyunsaturated). Fatty acid breakdown takes place mainly via the β-oxidation pathway, which resembles the *de novo* synthesis pathway running in the reverse direction (**Figure 1**).

**FIGURE 1 | Overview of metabolic pathways that lead to the production of fatty acids and fatty acid-derived chemicals**. The fatty acid biosynthesis pathway (orange), the β-oxidation cycle (blue), and the biosynthesis pathways of fatty acid-derived chemicals (gray) are presented. Enzymes of fatty acid metabolism in *S. cerevisiae* are shown in blue, those in *E. coli* in black, and enzymes from other organisms that convert fatty acids to their derivatives in red. AAR, acyl-ACP reductase; ACC1, acetyl-CoA carboxylase; AccABCD, four subunits: biotin carboxyl carrier protein (AccB), biotin carboxylase (AccC), and acetyl-CoA carboxytransferase (AccA, AccD); Acr1 & Acr2, acyl-CoA reductase; ADC, aldehyde decarbonylase; ADH, alcohol dehydrogenase; ADO, aldehyde-deformylating oxygenase; AHR, aldehyde reductase; BVMO, Baeyer–Villiger mono-oxygenase; CAR, carboxylic acid reductase; CER1, fatty aldehyde decarbonylase; Des, fatty acid desaturase; DGAT1, acyl-CoA:diacylglycerol acyltransferase; Elo, fatty acid elongase; FAA1 & FAA4, long-chain fatty acyl-CoA synthetase; FAA2 & FAA3, fatty acyl-CoA synthetase; FabA & FabZ, β-hydroxy acyl-ACP dehydratase; FabB, β-keto acyl-ACP synthase I; FabD, malonyl-CoA:ACP transacylase; FabF, β-keto acyl-ACP synthase II; FabG, β-keto acyl-ACP reductase; FabH, β-keto acyl-ACP synthase III; FabI, enoyl acyl-ACP reductase; FadA & FadI, β-keto acyl-CoA thiolase; FadB & FadJ, enoyl-CoA hydratase/β-hydroxy acyl-CoA dehydrogenase; FadD, fatty acyl-CoA synthase; FadE, acyl-CoA dehydrogenase; FadM, long-chain acyl-CoA thioesterase III; FAMT, fatty acid methyltransferase; FAR, fatty acid reductase; FAS1, acyl-CoA:ACP transferase/β-hydroxyl acyl-ACP dehydratase/acyl-ACP reductase; FAS2, acyl-ACP synthase/β-keto acyl-ACP synthase; FOX2, enoyl-CoA hydratase/β-hydroxyl acyl-CoA dehydrogenase; LipL, lactonizing lipase; OhyA, oleate hydratase; OleABCD, four protein families for long-chain olefin biosynthesis; OleTJE, *Jeotgalicoccus* sp. terminal olefin-forming fatty acid decarboxylase; Ols, a type I polyketide synthase for α-olefin biosynthesis; PaaF, 2,3-dehydroadipyl-CoA hydratase; PhaJ & PhaC, polyhydroxyalkanoate (PHA) synthases to yield medium-chain length polyester (mcl-PHA); POX1, fatty acyl-CoA oxidase; POT1, β-keto acyl-CoA thiolase; TE, acyl-ACP thioesterase; WS/DGAT, wax ester synthase/acyl-CoA:diacylglycerol acyltransferase.

The fatty acid metabolic pathway generates both fatty acids and their derivatives. The fatty acids and their derivatives from the synthesis and breakdown pathways can ultimately be converted to desirable value-added chemicals through metabolic engineering.

# **METABOLIC ENGINEERING**

Metabolic engineering is undoubtedly an essential tool in biocatalytic systems because it can develop new cell factories or improve existing cell factories to produce non-native compounds. The primary objective of metabolic engineering is to improve the cellular properties by intentional modification of organisms through redirecting metabolic fluxes. Traditionally, metabolic engineering is performed by introducing completely new pathways for production of novel proteins, drugs, chemicals, or modifying native pathways to achieve desired metabolic goals such as high productivity of metabolites and high robustness of host strains. Here, metabolic engineering relies on directed genetic perturbations, usually in terms of modifying the promoter activity of a given gene, performing over-expression or deletion of endogenous genes/enzymes/pathways, and utilizing heterologous expression of genes/enzymes/pathways (Ostergaard et al., 2000).

However, traditional metabolic engineering approaches frequently fail to deliver the desired phenotypes because gene structures, functions, and regulation in cellular metabolic networks are unclear or complex. Hence, more effort is required to achieve an integrative and holistic view of the overall network of pathways in organisms rather than of individual pathways, which can then guide rational design strategies.

It is challenging to reconstruct certain biochemical pathways in a dynamic metabolic network without complete information on intracellular gene regulatory, metabolic, and signaling networks. Thus, fundamental knowledge of cellular genetics, biochemistry, and physiology is critical. Recently, multiple analytical and modeling tools, such as genomics, transcriptomics, proteomics, metabolomics, fluxomics, high-throughput screening, and *in silico* studies, have been incorporated into metabolic engineering workflows. They provide useful information to predict the altered behaviors of metabolic networks, guide strain design, and maximize the efficacy of metabolic engineering.

# **MICROBIAL HOSTS FOR THE PRODUCTION OF FATTY ACID-DERIVED CHEMICALS**

Metabolic engineering of microbial systems provides a renewable route to produce desired organic molecules such as fuels, materials, and chemicals. Many different types of microbes can naturally produce and accumulate varying levels of fatty acids efficiently. Some of them exhibit properties advantageous to the production of fatty acid-derived compounds through metabolic engineering.

*Escherichia coli* and *S. cerevisiae* are the most intensively studied and widely used model microorganisms in the development of metabolic engineering strategies aimed at providing heterologous bioproduction of value-added metabolites. They offer several key advantages, including low safety risks, fast growth rates, good genetic tractability, extensive prior characterization, and industrial relevance. So far, a number of fatty acid-derived chemicals have been successfully produced in metabolically engineered *E. coli* and *S. cerevisiae* (for references, see **Table 1**). Compared to *E. coli*, *S. cerevisiae* can be cultured at higher cell density and has a better fermentation performance at low temperature and pH (Aronsson and Ronner, 2001; Ageitos et al., 2011). *S. cerevisiae* is also more suited for the functional expression of eukaryotic enzymes (many enzymes involved in fatty acid production are from the plant kingdom) due to its endomembrane systems and post-translational modifications (Ageitos et al., 2011). However, in many cases, the production yields of fatty acid-derived chemicals from engineered *S. cerevisiae* are much lower than those from *E. coli* when identical heterologous genes are overexpressed. The reasons for this are not clearly understood.

Oleaginous microorganisms, which include bacteria, yeast, cyanobacteria, microalgae, and filamentous fungi, can accumulate intracellular lipids to at least 20% of their cellular dry mass. They are therefore considered attractive next-generation host candidates for production of fatty acid-derived chemicals, because these oleaginous species can provide fatty acids or lipids as precursors (Ratledge, 1994). Oleaginous bacteria have been less studied to date because their lipid content is lower than that of yeast, cyanobacteria, microalgae, and filamentous fungi, and they are also limited by lower growth rates. Oleaginous cyanobacteria and microalgae are attractive hosts for fatty acid-derived chemical production mainly because of their unique photosynthetic capability, which directly converts solar energy and recycles CO<sub>2</sub> into fuels (Parmar et al., 2011). For instance, the cyanobacterium *Synechococcus elongatus* strain PCC 7942 has already been successfully engineered to produce a number of different biofuel-related compounds, including 1-butanol (Lan and Liao, 2012), isobutanol (Atsumi et al., 2009), isobutyraldehyde (Atsumi et al., 2009), and 2-methyl-1-butanol (Shen and Liao, 2012). However, both cyanobacteria and microalgae are technically difficult to manipulate genetically, and their cultivation and growth processes are more complicated and expensive than those of bacteria, yeast, and fungi. These hurdles have hampered their use in the production of fatty acid-derived chemicals through metabolic engineering. Similarly, the exploitation of oleaginous filamentous fungi as production hosts is impeded by the lack of efficient genetic transformation techniques.

In comparison, oleaginous yeasts have many advantages over other oleaginous microbial sources that make this class of microbes the most promising cell factories for the production of fatty acid-derived chemicals. They can grow to high cell densities in simple and inexpensive culture, reaching extremely high levels of lipid accumulation of more than 70% of their dry weight (Beopoulos et al., 2008; Santamauro et al., 2014). They are also able to use different kinds of residues in waste resources as nutrients (Papanikolaou et al., 2003; Fickers et al., 2005). They are more genetically tractable than oleaginous cyanobacteria, microalgae, and filamentous fungi, with relatively well-developed genetic tools (Madzak et al., 2004). Oleaginous yeast candidates, which show great potential as hosts for fatty acid-derived chemical production, include *Yarrowia lipolytica* (Blazeck et al., 2014), *Lipomyces starkeyi* (Tapia et al., 2012), *Lipomyces tetrasporus* (Lomascolo et al., 1994), *Rhodotorula glutinis* (Saenge et al., 2011), *Rhodosporidium toruloides* (Li et al., 2007), *Cryptococcus albidus* (Fei et al., 2011), *Cryptococcus curvatus* (Gong et al., 2014), *Metschnikowia pulcherrima* (Santamauro et al., 2014), *Trichosporon pullulans* (Huang et al., 2011), and *Waltomyces lipofer* (Raschke and Knorr, 2009).

In particular, the model oleaginous yeast *Y. lipolytica* provides a promising platform as an oleaginous cell factory to convert fatty acids to more valuable metabolites. This oleaginous platform has the ability to utilize wide-scale renewable materials as substrates (Papanikolaou et al., 2003; Fickers et al., 2005) and multiple cheap carbon sources for growth (Papanikolaou et al., 2002; Athenstaedt et al., 2006). Furthermore, it is more competitive than the non-oleaginous yeast *S. cerevisiae* in terms of lipid yield and heterologous protein yield (Gellissen et al., 2005; Papanikolaou and Aggelis, 2009). All of these features make *Y. lipolytica* very attractive for use in the production of fatty acid-derived products. Recently, production of various fatty acid-derived biofuels and bioproducts using engineered *Y. lipolytica* has been investigated, including compounds such as triglycerides (Tai and Stephanopoulos, 2013), alkanes (Blazeck et al., 2013), lactones (Wache et al., 2003), hydroxy fatty acids (Beopoulos et al., 2014), dicarboxylic acids (Wache, 2013), and polyunsaturated fatty acids (Xue et al., 2013). However, the transport mechanisms, transcriptional regulation, and signal transduction pathways involved in lipid accumulation and degradation in *Y. lipolytica* need further exploration. This will pave the way to better utilization of this platform.

DCW, dry cell weight.

# **METABOLIC ENGINEERING OF MICROBES FOR PRODUCING FATTY ACID-DERIVED CHEMICALS**

As discussed above, most fatty acid-derived chemicals are hard to obtain efficiently from natural sources or through native metabolic pathways. Recent efforts of metabolic engineering have been made in developing microbial chemical factories for the production of target chemicals. **Figure 1** shows that the chemicals derived from fatty acids are generated by introducing the corresponding conversion steps associated with native fatty acid metabolic pathways. In this section, we describe pathway engineering for biochemical synthesis and review applications of metabolic engineering in the production of various fatty acid-derived chemicals, including: (1) fatty acid esters; (2) fatty alkanes and alkenes; and (3) fatty alcohols and other chemicals such as fatty ketones and lactones.

#### **METABOLIC ENGINEERING TOWARD FATTY ACID ESTER PRODUCTION**

Fatty acid methyl esters (FAMEs) and fatty acid ethyl esters (FAEEs) can be used as "biodiesel" fuel. The key enzyme to synthesize FAEEs in engineered microbes is wax ester synthase, which is responsible for catalyzing the esterification reaction of acyl-CoAs and alcohols.

In *S. cerevisiae*, by expressing heterologous wax ester synthase from *Marinobacter hydrocarbonoclasticus* DSM 8798 and up-regulating endogenous acetyl-CoA carboxylase, FAEEs were produced at a final titer of 8.2 mg/L (Shi et al., 2012). By further eliminating the pathways for triacylglycerol (TAG) formation, steryl ester (SE) formation, and β-oxidation, which compete with the FAEE-forming pathway, FAEE production reached 17.2 mg/L in the strain lacking these non-essential fatty acid utilization pathways (Valle-Rodriguez et al., 2014). FAEE production increased to 34 mg/L after integrating the wax ester synthase gene cassette into the yeast genome. To further improve FAEE production, endogenous acyl-CoA binding protein and NADP<sup>+</sup>-dependent glyceraldehyde-3-phosphate dehydrogenase from *Streptococcus mutans* were overexpressed in the final integration strain, achieving the highest FAEE titer of 47.6 mg/L (Shi et al., 2014). In *E. coli*, FAEEs at 674 mg/L were produced by using a combinatorial approach: (1) over-expression of wax ester synthases from *Acinetobacter baylyi* for conversion of fatty acids to FAEEs, native acyl-ACP thioesterases and acyl-CoA ligases for acyl-CoA production, and pyruvate decarboxylase and alcohol dehydrogenase from *Zymomonas mobilis* for non-native ethanol formation, and (2) deletion of the competing fatty acid β-oxidation pathway (*fadE* knockout) (Steen et al., 2010). It was reported that over-expression of acetyl-CoA carboxylase and optimization of cultivation conditions further raised FAEE production to 922 mg/L (Duan et al., 2011). A recent work demonstrated that a dynamic sensor-regulator system increased the FAEE titer to 1.5 g/L in a genetically engineered *E. coli* strain (Zhang et al., 2012a). Fed-batch pilot-scale cultivation of the engineered *E. coli* p(Microdiesel) strain could yield 15 g/L FAEEs, by first using glycerol as the sole carbon source for biomass production before glucose and oleic acid were added as carbon sources (Elbahloul and Steinbüchel, 2010).
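The stepwise gains quoted above are easier to compare as fold changes over the first reported titer. The following is a minimal Python sketch of that arithmetic. The titers are taken from the text; the step labels, the dictionary layout, and the function name `fold_changes` are illustrative assumptions. The *E. coli* values come from different studies and cultivation conditions, so their ratios give only a rough orientation, and the 15 g/L pilot-scale result is left out because oleic acid was fed as a substrate.

```python
# FAEE titers in mg/L as quoted in the text; labels are shorthand (assumed).
YEAST_FAEE_MG_L = {
    "wax ester synthase + acetyl-CoA carboxylase": 8.2,
    "competing pathways deleted": 17.2,
    "genomic integration of the cassette": 34.0,
    "binding protein + dehydrogenase over-expression": 47.6,
}
ECOLI_FAEE_MG_L = {
    "combinatorial engineering (Steen et al., 2010)": 674.0,
    "acetyl-CoA carboxylase + optimized cultivation (Duan et al., 2011)": 922.0,
    "dynamic sensor-regulator system (Zhang et al., 2012a)": 1500.0,
}


def fold_changes(titers):
    """Return each titer divided by the first titer in the mapping."""
    values = list(titers.values())
    if not values or values[0] <= 0:
        raise ValueError("need a positive baseline titer")
    baseline = values[0]
    return {step: round(titer / baseline, 2) for step, titer in titers.items()}


if __name__ == "__main__":
    for name, table in (("S. cerevisiae", YEAST_FAEE_MG_L),
                        ("E. coli", ECOLI_FAEE_MG_L)):
        print(name)
        for step, fold in fold_changes(table).items():
            print(f"  {step}: {fold}x")
```

Under these figures, the yeast titer rises roughly sixfold across the four steps, while the best *E. coli* titer listed remains about thirty times higher in absolute terms (1.5 g/L versus 47.6 mg/L).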

In *E. coli*, FAMEs were formed from free fatty acids and S-adenosylmethionine by expressing fatty acid methyltransferases from *Mycobacterium marinum* and *Mycobacterium smegmatis*. Over-expression of thioesterases can increase the level of free fatty acids and thereby increase FAME synthesis. It was reported that over-expression of thioesterases such as thioesterase II from *E. coli*, acyl-ACP thioesterases from *Clostridium phytofermentans*, *Clostridium sporogenes*, *Clostridium tetani*, and *M. marinum*, 3-hydroxyacyl ACP:CoA transacylases from *Pseudomonas putida*, and methionine adenosyltransferases from rat, combined with deletion of the global methionine regulator *metJ*, led to the production of FAMEs at up to 16 mg/L (Nawabi et al., 2011).

#### **METABOLIC ENGINEERING TOWARD FATTY ALKA(E)NE PRODUCTION**

Fatty alka(e)nes can exist as straight or branched chains. Both straight- and branched-chain alka(e)nes have the potential to serve as advanced biofuels. There are two primary pathways for alka(e)ne biosynthesis: (1) a pathway that starts from acyl-ACP, followed by reducing acyl-ACPs to form fatty aldehydes catalyzed by reductases, and then converting fatty aldehydes to alka(e)nes by aldehyde decarbonylases; and (2) a pathway that starts from free fatty acids, followed by reduction and decarboxylation to generate alka(e)nes.

Over-expression of acyl-ACP reductases and aldehyde decarbonylases from cyanobacteria in *E. coli* and *Synechocystis* sp. PCC 7002 achieved alka(e)ne concentrations of 300 mg/L (Schirmer et al., 2010) and 5% of cell dry weight (Reppas et al., 2010), respectively. Recently, in *E. coli*, free fatty acids were converted to fatty aldehydes by expressing the fatty acid reductase complex from *Photorhabdus luminescens*. Coupled with aldehyde decarbonylases from *Nostoc punctiforme*, fatty aldehydes were converted further to alka(e)nes. In this study, production of branched-chain alka(e)nes from branched-chain fatty acids at a titer of 2–5 mg/L was also reported by over-expression of the branched-chain α-keto acid dehydrogenase complex and β-ketoacyl-ACP synthase III from *Bacillus subtilis* (Howard et al., 2013).

Terminal alkenes can also be produced in microorganisms via two pathways: (1) conversion of free fatty acids to terminal alkenes by cytochrome P450 peroxygenase (Rude et al., 2011); and (2) conversion of acyl-ACP to terminal alkenes by a large multi-domain type I polyketide synthase (Mendez-Perez et al., 2011). However, the pathways involving free fatty acids and acyl-ACP need to be further optimized to improve efficiency and yield. Very long-chain alkenes can be generated by a head-to-head condensation of two acyl-CoAs catalyzed by the OleABCD protein families. In a previous study, heterologous expression of the Ole cluster from *Micrococcus luteus* ATCC 4698 in *E. coli* led to the production of very long-chain alkenes at a total concentration of 40 µg/L (Beller et al., 2010).

#### **METABOLIC ENGINEERING TOWARD PRODUCTION OF FATTY ALCOHOLS AND OTHER CHEMICALS**

Fatty alcohols (or long-chain alcohols) can be formed by reduction of fatty aldehyde intermediates using aldehyde reductases, for example, from the cyanobacterium *Synechocystis* sp. PCC 680 (Steen et al., 2010). Fatty alcohols can also be directly produced by acyl-CoA reductases from *Marinobacter aquaeolei*, mouse, jojoba, and *Arabidopsis thaliana*. Another fatty aldehyde reductase from *M. aquaeolei* was found to reduce not only fatty aldehydes but also acyl-CoA or acyl-ACP to the corresponding fatty alcohols (Hofvander et al., 2011; Liu et al., 2013). In these pathways, fatty aldehyde intermediates can be bypassed (Tan et al., 2011). In addition, another synthetic pathway from *Clostridium* species, leading to 1-butanol (a short-chain fatty alcohol), was functionally constructed in *E. coli* (Shen et al., 2011), *S. cerevisiae* (Steen et al., 2008), and *Thermoanaerobacterium saccharolyticum* (Bhandiwad et al., 2014). This pathway begins with a CoA-dependent Claisen condensation of two acetyl-CoA molecules, followed by reduction, dehydration, and hydrogenation. This sequence of reactions therefore runs in the reverse direction of the β-oxidation pathway. Recently, this CoA-dependent 1-butanol synthesis pathway has been extended to produce other linear short-chain fatty alcohols (C6–C8) in *E. coli* (Zhang et al., 2008; Tseng and Prather, 2012).

In addition, chemicals derived from fatty acids also include methyl ketones, hydroxy fatty acids, lactones, and dicarboxylic acids. Methyl ketones can be synthesized through conversion of fatty acids to β-keto acyl-CoAs in β-oxidation, hydrolysis of the β-keto acyl-CoAs by thioesterases to form β-keto fatty acids, and decarboxylation of the β-keto fatty acids to methyl ketones (Goh et al., 2012). Hydroxy fatty acids can be synthesized by diverse kinds of fatty acid-hydroxylation enzymes, including P450, lipoxygenase, hydratase, 12-hydroxylase, and diol synthase (Kim and Oh, 2013). Lactones can generally be obtained by one-step biotransformation of hydroxy fatty acid precursors (Wache et al., 2003). To generate dicarboxylic acids, hydroxy fatty acids can be oxidized to fatty ketones by alcohol dehydrogenases, followed by further oxidation of the fatty ketones to esters by Baeyer–Villiger monooxygenases. The esters are subsequently hydrolyzed by esterases to yield dicarboxylic acids (Song et al., 2013). Representative valuable chemicals derived from fatty acids in engineered microbes are listed in **Table 1**.

Taken together, metabolic engineering of microorganisms serves as a good platform for effective production of desired fatty acid-derived valuable chemicals. However, more research efforts are required to achieve industrially relevant titers of these chemicals.

# **FACILITATION OF FATTY ACID-DERIVED CHEMICAL BIOPRODUCTION WITH ADVANCED SYNTHETIC BIOLOGY TOOLS**

Successful production of fatty acid-derived chemicals by metabolic engineering of microbial systems has already been achieved. However, the productivity and titers of each of these processes remain to be improved. Further improvement in production efficiency is critical because high productivity and product yield are the most important prerequisites for cost-effective, large-scale industrial production of fatty acid-derived chemicals.

Recent years have witnessed the emergence of, and marked progress in, synthetic biology. Many advanced synthetic biology tools have been applied to improve the ability to re-engineer microbial cells for achieving high yields of valuable chemicals, e.g., modular control over metabolic flux in the mevalonate biosynthesis pathway using synthetic protein scaffolds in *E. coli* (Dueber et al., 2009), enhancement in production of fatty acid-derived biofuels by using a dynamic sensor-regulator system in *E. coli* (Zhang et al., 2012a), and improvement of tolerance against alkane biofuels by transporter engineering in *S. cerevisiae* (Chen et al., 2013a). Although these tools are not yet widely used in metabolic engineering of microorganisms for fatty acid-derived chemicals, these innovations hold tremendous potential for improving microbial systems that produce various fatty acid-derived products.

In summary, advanced synthetic biology approaches for pathway optimization show great promise in enhancing the speed and efficiency of creating improved microbial strains in combination with common metabolic engineering efforts. The production of fatty acid-derived chemicals could benefit from the integration of synthetic biology tools with the work already accomplished through metabolic engineering. Implementing advanced synthetic biology tools in redesigning the fatty acid biosynthesis pathway and heterologous metabolic pathways can thus guide rational manipulation toward high yields and titers of the target products. In this section, we briefly review the recent development of synthetic biology methodologies and possible applications for construction and optimization of metabolic pathways in microbes at the DNA, transcription, translation, and post-translation levels (**Figure 2**).

#### **DNA ENGINEERING**

The first step of most metabolic engineering and synthetic biology studies is to reconstruct a completely or partially synthetic pathway. Therefore, rapid assembly of heterologous pathways with many enzymatic steps is a major challenge in metabolic engineering. Traditional DNA molecular cloning approaches, which are tedious, time-consuming, and mainly limited by template-based synthesis, restriction digestion, and ligation-based cloning, are increasingly being replaced with *de novo* DNA synthesis and more sophisticated assembly capabilities. Many simple, rapid, high-throughput, high-fidelity, and low-cost DNA synthesis and assembly methods in synthetic biology have been developed, including programmable microfluidic chips (Tian et al., 2004), BioBricks assembly (Sleight et al., 2010), BglBricks assembly (Anderson et al., 2010), In-Fusion assembly (Zhu et al., 2007), Gibson DNA assembly (Gibson et al., 2009), TAR-based assembly (Benders et al., 2010), Circular polymerase extension cloning (CPEC) (Quan and Tian, 2009), Sequence and ligase independent cloning (SLIC) (Li and Elledge, 2007), Seamless Ligation Cloning Extract (SLiCE) (Zhang et al., 2012c), DNA assembler (Shao et al., 2009), Uracil-specific excision reagent cloning (USER) (Gulig et al., 2009), Methylation-assisted tailorable ends rational ligation (MASTER) (Chen et al., 2013b), Site-specific recombination-based tandem assembly (SSRTA) (Zhang et al., 2011), PCR-based two-step DNA synthesis (PTDS) (Xiong et al., 2004), Golden Gate assembly (Cermak et al., 2011), and Polymerase incomplete primer extension cloning (PIPE) (Liu and Naismith, 2008). Together, these approaches enable the efficient construction of synthetic DNA fragments with no apparent limits on either sequence or length. These toolboxes therefore allow efficient manufacture of genes, regulatory elements, circuits, gene clusters, and metabolic pathways for the production of novel chemicals.

The laborious, site-specific gene targeting by homologous recombination, which has limited applicability for genome-wide modification, is now being increasingly displaced by genome-scale engineering techniques such as multiplex automated genome engineering (MAGE), conjugative assembly genome engineering (CAGE), and transcription activator-like effector nucleases (TALENs). MAGE simultaneously targets multiple locations on chromosomes to introduce small modifications in a single cell or across a population of cells, facilitating rapid generation of a diverse set of genetic changes. The 1-deoxy-D-xylulose-5-phosphate (DXP) biosynthesis pathway in *E. coli* was optimized by this technique. Twenty-four genetic components in the DXP pathway were modified simultaneously using a complex pool of synthetic oligonucleotides, creating over 4.3 billion combinatorial genomic variants per day and achieving a more than fivefold increase in lycopene production within 3 days (Wang et al., 2009). CAGE, which builds on MAGE, enabled large-scale assembly of many modified genomes (Isaacs et al., 2011). TALENs are another powerful tool, created to introduce double-strand breaks at specific locations in the genome (Christian et al., 2010).
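To illustrate why repeated MAGE cycles diversify a population quickly, the sketch below uses a deliberately simple probability model. It assumes that each cycle edits a given target site independently with the same per-cycle probability and that edits are never reverted. Both are simplifications, and the per-cycle efficiency used in the usage note is a hypothetical number, not a value from Wang et al. (2009). Only the count of 24 targeted components comes from the text.

```python
def fraction_modified(p_per_cycle: float, cycles: int) -> float:
    """Fraction of cells edited at one site after `cycles` rounds.

    Assumes independent rounds with equal efficiency and no reversion,
    so the unedited fraction shrinks by (1 - p) each cycle.
    """
    if not 0.0 <= p_per_cycle <= 1.0:
        raise ValueError("p_per_cycle must lie in [0, 1]")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return 1.0 - (1.0 - p_per_cycle) ** cycles


def expected_sites_modified(n_sites: int, p_per_cycle: float, cycles: int) -> float:
    """Expected number of edited sites per cell across `n_sites` targets."""
    return n_sites * fraction_modified(p_per_cycle, cycles)


if __name__ == "__main__":
    N_TARGETS = 24          # targeted components, as stated in the text
    P_HYPOTHETICAL = 0.1    # assumed per-cycle efficiency, illustration only
    for n_cycles in (1, 5, 10, 20):
        mean_sites = expected_sites_modified(N_TARGETS, P_HYPOTHETICAL, n_cycles)
        print(f"{n_cycles:2d} cycles: ~{mean_sites:.1f} of {N_TARGETS} sites edited per cell")
```

Under these assumptions the expected number of edited sites grows with every cycle and saturates at the number of targets, which is why cycling is combined with screening rather than run indefinitely.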

#### **TRANSCRIPTIONAL ENGINEERING**

Transcription is the first dedicated phase of gene expression, and different toolsets have therefore been developed in synthetic biology for controlling gene expression and modulating RNA levels in engineered cells. The primary goal of transcriptional engineering is synthetic control of RNA transcription and transcript levels by controlling gene copy number, transcription initiation rate, transcription termination efficiency, and transcript decay rate. Gene copy number can be modified by changing the origin of replication of recombinant expression plasmids or the number of chromosomally integrated gene copies (in particular, through strategies for chromosomal integration at multiple loci). In addition, promoter engineering can be applied to regulate the rate of transcription initiation by using different types of promoters such as constitutive promoters, inducible promoters, specific promoters, hybrid promoters, synthetic promoters, and synthetic promoter libraries (De Mey et al., 2007). Transcription termination efficiency can be regulated as well by changing terminator sequence contexts (Cambray et al., 2013). Studies on mRNA folding and degradation rates, which are determined by the mRNA itself (primary sequence and/or secondary structure), and on the 5′- and 3′-UTR regions have allowed further control of the transcript abundance of genes of interest (Dori-Bachash et al., 2011; Zaborske et al., 2013).
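How these levers combine can be seen in a minimal one-gene model. The sketch below assumes constant synthesis proportional to copy number and initiation rate, plus first-order decay, so the transcript level m obeys dm/dt = n·k − δ·m. Termination efficiency is left out because its effect depends on sequence context. The function and parameter names, and the numbers in the usage note, are illustrative assumptions rather than measured values.

```python
import math


def steady_state_mrna(copy_number: float, initiation_rate: float,
                      decay_rate: float) -> float:
    """Steady-state transcripts per cell for dm/dt = n*k - delta*m."""
    if decay_rate <= 0:
        raise ValueError("decay_rate must be positive")
    return copy_number * initiation_rate / decay_rate


def mrna_at_time(t: float, copy_number: float, initiation_rate: float,
                 decay_rate: float) -> float:
    """Transcript level at time t, starting from zero transcripts."""
    m_ss = steady_state_mrna(copy_number, initiation_rate, decay_rate)
    return m_ss * (1.0 - math.exp(-decay_rate * t))


if __name__ == "__main__":
    # Hypothetical values: 10 copies, 0.5 transcripts/min/copy, decay 0.25/min.
    base = steady_state_mrna(10, 0.5, 0.25)
    more_copies = steady_state_mrna(20, 0.5, 0.25)
    faster_decay = steady_state_mrna(10, 0.5, 0.5)
    print(base, more_copies, faster_decay)
```

In this model, doubling the copy number or the initiation rate doubles the steady-state level, whereas doubling the decay rate halves it and also shortens the time needed to approach the steady state.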

Based on the principles above, a growing number of attempts have recently been made to further improve the sensitivity and precision of transcriptional regulation. First, an RNA control system based on engineered RNA hairpins enables conditional activation of an endogenous pathway and can operate autonomously within a complex cellular regulatory network (Venkataraman et al., 2010). Second, a dynamic sensor-regulator system uses a transcription factor to specifically sense key intermediates and dynamically regulate the expression of genes. In biodiesel biosynthetic pathways in *E. coli*, this system substantially improved the stability of biodiesel-producing strains and increased the yield threefold. This strategy can also be extended to other biosynthetic pathways to balance metabolism, thereby increasing product titers and conversion yields and stabilizing production hosts (Zhang et al., 2012a). Third, a regulatable expression system has been developed for modulating gene expression in *Corynebacterium glutamicum*. This work also provided a synthetic promoter library that enabled the selection of strong promoters. This technology should have many future applications for optimizing bioproduction in *C. glutamicum* and other organisms (Rytter et al., 2014). In addition, transcription factor engineering (Lee et al., 2011) and global transcription machinery engineering (Zhang et al., 2012b) serve as good examples of using synthetic biology tools to re-engineer transcriptional regulation in organisms. All these efforts have already shown promise and could lead to highly optimized expression of synthetic pathways at the transcriptional level.
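The sensing step of a dynamic sensor-regulator system is often described phenomenologically with a Hill-type dose-response curve, in which promoter output rises from a basal to a maximal level as the sensed intermediate accumulates. The sketch below shows that curve only. It is a generic modeling assumption and not the model used by Zhang et al. (2012a), and all parameter names and values are hypothetical.

```python
def hill_response(intermediate: float, basal: float, maximal: float,
                  k_half: float, hill_n: float = 2.0) -> float:
    """Promoter output as a Hill function of intermediate concentration.

    `k_half` is the concentration giving the half-maximal response and
    `hill_n` sets the steepness of the switch; both are assumed values.
    """
    if intermediate < 0:
        raise ValueError("intermediate must be non-negative")
    if k_half <= 0 or hill_n <= 0:
        raise ValueError("k_half and hill_n must be positive")
    x_n = intermediate ** hill_n
    return basal + (maximal - basal) * x_n / (k_half ** hill_n + x_n)


if __name__ == "__main__":
    # Hypothetical sensor: basal output 1, maximal output 11, K = 2 (arbitrary units).
    for conc in (0.0, 1.0, 2.0, 4.0, 8.0):
        print(conc, round(hill_response(conc, 1.0, 11.0, 2.0), 2))
```

With such a response, downstream pathway genes stay near basal expression while the intermediate is scarce and are induced once it accumulates, which is the qualitative behavior that lets a dynamic system balance flux without manual induction.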

# **TRANSLATIONAL ENGINEERING**

After gene transcription is complete, translational engineering tools can be used to increase translation rates, lower degradation rates, and tune protein yields. Synthetic ribosome binding sites (Salis, 2011), antisense RNA (Chang et al., 2012), ribozymes (Meaux and Van Hoof, 2006), translation machinery (rRNA, tRNA, and amino acid) (Harris and Jewett, 2012), peptide tags, and codon optimization have proved effective in controlling cellular protein levels at the translational level. mRNA secondary structure engineering is a newly developed method for translational regulation of gene expression. Engineered mRNA molecules that exhibit diverse activities, including sensing, regulatory, information processing, and scaffolding activities, have been implemented as key control elements in synthetic genetic networks to program biological function (Liang et al., 2011). Compared with DNA engineering and transcriptional engineering, translational engineering tools have not yet been extensively developed. Although translational regulation in cellular systems is not as well studied, these advances have been shown to be effective in removing translation-level limitations.

## **POST-TRANSLATIONAL ENGINEERING**

Post-translational modifications of proteins take place after translation and include phosphorylation, glycosylation, ubiquitination, methylation, acetylation, and proteolysis. In synthetic biology, regulation of these processes is especially important for either prolonging or shortening the half-life of desirable proteins. To this end, addition of a synthetic ligand that binds to the destabilizing domains of specific proteins shields them from degradation, allowing fused proteins to perform their cellular functions in mammalian cells (Banaszynski et al., 2006). In addition, a synthetic gene network for tunable degradation of a tagged protein has been constructed in *S. cerevisiae* using components of the *E. coli* degradation machinery (Grilly et al., 2007), opening the door for engineering and optimization of protein degradation for a variety of future applications in microbial cell factories.

#### **PATHWAY ENGINEERING**

Once the enzymes are expressed from specific genes, the last major challenge lies in optimizing gene expression, protein abundance, enzyme activities, synthetic pathways, and metabolic products as a system, especially in a dynamic manner. To address this problem, researchers have recently developed an array of tools, including global regulator engineering (Hong et al., 2010), computational protein design (Samish et al., 2011), protein engineering (Bommarius et al., 2011), protein trafficking (Hou et al., 2012), protein scaffolds (Dueber et al., 2009), transporter engineering (Chen et al., 2013a), cellular efflux pump engineering (Dunlop et al., 2011), ultrasensitive input/output control system (Dueber et al., 2007), and computer-based complex gene circuits (Daniel et al., 2013). For example, transporter engineering through expression of heterologous ABC transporters from *Y. lipolytica* has been utilized successfully to significantly improve tolerance of *S. cerevisiae* against alkanes. In particular, the tolerance limit of *S. cerevisiae* against decane was increased about 80-fold (Chen et al., 2013a). Ultrasensitive switches with a non-linear input/output function can be effectively harnessed to control many complex biological behaviors in higher-order regulatory systems. These switches approximate digital behavior, providing an input detection threshold at which small changes in input concentration lead to large changes in output behavior. Another successful example of pathway engineering is computer-based complex gene circuits. Synthetic analog gene circuits were engineered to execute sophisticated computational functions in living cells using three transcription factors. Such circuits could lead to new applications for synthetic biology and biotechnology that require complex computations with limited parts (Daniel et al., 2013). 
These methods and technologies can be combined to optimize the metabolic pathway and significantly boost the production of target compounds in a controllable, scalable, and effective way within host cells.
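The threshold behavior of the ultrasensitive switches mentioned above can be illustrated with a Hill function, a common textbook model of a non-linear input/output response. This is only a sketch under that modeling assumption; the function names, the half-maximal constant `k`, and the Hill coefficient `n` are illustrative and are not taken from Dueber et al. (2007).

```python
def hill_response(x: float, k: float = 1.0, n: float = 1.0) -> float:
    """Fractional output (0..1) of a Hill-type switch for input concentration x.

    k is the input giving half-maximal output; n is the Hill coefficient.
    Both are illustrative model parameters, not measured values.
    """
    if x < 0 or k <= 0 or n <= 0:
        raise ValueError("x must be >= 0 and k, n must be > 0")
    return x**n / (k**n + x**n)


def input_ratio_10_to_90(n: float) -> float:
    """Fold change in input needed to move the output from 10% to 90%.

    For a Hill function this ratio is 81**(1/n): the larger n, the
    narrower the input window and the more 'digital' the switch.
    """
    if n <= 0:
        raise ValueError("n must be > 0")
    return 81.0 ** (1.0 / n)


if __name__ == "__main__":
    for n in (1.0, 2.0, 4.0):
        print(f"n = {n}: 10%-90% input window = {input_ratio_10_to_90(n):.1f}-fold")
```

With `n = 1` an 81-fold change in input is needed to go from 10 to 90% output, whereas with `n = 4` a threefold change suffices, which is the sense in which such switches approximate digital behavior.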

# **CONCLUSION AND FUTURE PERSPECTIVES**

Diverse, valuable fatty acid-derived chemicals are in great demand. This class of chemicals has recently been successfully produced by introducing different biosynthesis genes, enzymes, and pathways into various microbial hosts. Although much progress has been made in the metabolic engineering of microbes for the production of fatty acid-derived chemicals, sub-optimal product yields and productivities keep these platforms far from large-scale commercial exploitation.

Conventional metabolic engineering efforts on the microbial production of fatty acid-derived chemicals predominantly rely on identifying the activity of related enzymes isolated from different sources. In this regard, future efforts should be invested in finding and adopting novel sources of enzymes either in existing pathways or from completely novel producing pathways with such desired features as higher enzyme activity, stability, and specificity. High-throughput enzyme screening methods and bioinformatics tools could be used to screen these enzymes from vastly different organisms.

However, many attempts have demonstrated that the simple import of heterologous pathways into microbial hosts, without a good understanding of the complex regulatory networks underlying their biosynthesis pathways, is unlikely to yield high-level production of target fatty acid-derived chemicals. Hence, the exploration of such metabolic and regulatory information is crucial for the heterologous production of these chemicals. Due to the complexity of regulatory networks, the difficulties can be formidable. Synthetic biology-based tools can help to elucidate complex regulatory networks, enhance gene expression, increase enzyme activities and substrate specificity, improve metabolic flux, and boost product titer in heterologous microbial hosts. Taken together, combinatorial approaches encompassing metabolic engineering and synthetic biology, together with more detailed knowledge of metabolic and genetic regulatory mechanisms, will be effective in overcoming bottlenecks inherent in the production of fatty acid-derived valuable chemicals in microbes. Ultimately, successful engineering strategies will be key to pushing efficient microbial production of fatty acid-derived valuable chemicals toward industrialization.

### **ACKNOWLEDGMENTS**

We gratefully acknowledge funding support from the Competitive Research Program of the National Research Foundation of Singapore (NRF-CRP5-2009-03), the Agency for Science, Technology and Research of Singapore (1324004108), the National Environment Agency of Singapore (ETRP 1201102), and Global R&D Project Program, the Ministry of Knowledge Economy, the Republic of Korea (N0000677).

# **REFERENCES**


potential applications. *FEMS Yeast Res.* 5, 527–543. doi:10.1016/j.femsyr.2004.09.004


for improved biosynthesis of fatty acid ethyl esters. *Biotechnol. Bioeng.* 111, 1740–1747. doi:10.1002/bit.25234


Zhu, B., Cai, G., Hall, E. O., and Freeman, G. J. (2007). In-Fusion assembly: seamless engineering of multidomain fusion proteins, modular vectors, and mutations. *Biotechniques* 43, 354–359. doi:10.2144/000112536

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 August 2014; accepted: 10 December 2014; published online: 23 December 2014.*

*Citation: Yu A-Q, Pratomo Juwono NK, Leong SSJ and Chang MW (2014) Production of fatty acid-derived valuable chemicals in synthetic microbes. Front. Bioeng. Biotechnol. 2:78. doi: 10.3389/fbioe.2014.00078*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Yu, Pratomo Juwono, Leong and Chang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Sabine A. E. Heider, Natalie Wolf, Arne Hofemeier, Petra Peters-Wendisch and Volker F. Wendisch\***

Faculty of Biology and Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany

#### **Edited by:**

Jean Marie François, Laboratoire d'Ingénierie des Systèmes Biologiques et des Procédés UMR-CNRS 5504, France

#### **Reviewed by:**

Klaas J. Jan Hellingwerf, University of Amsterdam, Netherlands Tiangang Liu, Wuhan University, China

#### **\*Correspondence:**

Volker F. Wendisch, Faculty of Biology and CeBiTec, Bielefeld University, Universitätsstr. 25, Bielefeld 33615, Germany e-mail: volker.wendisch@uni-bielefeld.de

The biotechnologically relevant bacterium Corynebacterium glutamicum, currently used for the million ton-scale production of amino acids for the food and feed industries, is pigmented due to synthesis of the rare cyclic C50 carotenoid decaprenoxanthin and its glucosides. The precursors of carotenoid biosynthesis, isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate, are synthesized in this organism via the methylerythritol phosphate (MEP) or non-mevalonate pathway. Terminal pathway engineering in recombinant C. glutamicum permitted the production of various non-native C50 and C40 carotenoids. Here, the role of engineering isoprenoid precursor supply for lycopene production by C. glutamicum was characterized. Overexpression of dxs, encoding the enzyme that catalyzes the first committed step of the MEP pathway, by chromosomal promoter exchange in a prophage-cured, genome-reduced C. glutamicum strain improved lycopene formation. Similarly, an increased IPP supply was achieved by chromosomal integration of two artificial operons comprising MEP pathway genes under the control of a constitutive promoter. Combined overexpression of dxs and the other six MEP pathway genes in C. glutamicum strain LYC3-MEP was not synergistic with respect to improving lycopene accumulation. Based on C. glutamicum strain LYC3-MEP, astaxanthin could be produced in the milligrams per gram cell dry weight range when the endogenous genes crtE, crtB, and crtI for conversion of geranylgeranyl pyrophosphate to lycopene were coexpressed with the genes for lycopene cyclase and β-carotene hydroxylase from Pantoea ananatis and carotene C(4) oxygenase from Brevundimonas aurantiaca.

**Keywords: carotenoid production, genome-reduced Corynebacterium glutamicum, MEP pathway, synthetic operons, astaxanthin**

#### **INTRODUCTION**

Carotenoids are ubiquitous natural pigments with colors ranging from yellow to red. They are composed of isoprene units and belong to the family of terpenoids. These pigments not only play important and versatile roles in their biological hosts, but are also suggested to have a beneficial effect on human health. Furthermore, they are widely applied for food and beverage coloration (Downham and Collins, 2000; Gassel et al., 2013). Hence, carotenoids have received considerable attention, and interest in efficient and environmentally friendly production by microbial hosts in particular is increasing (Lee and Schmidt-Dannert, 2002; Das et al., 2007; Harada and Misawa, 2009; Cutzu et al., 2013). In order to compete with already existing production processes, such as chemical synthesis or extraction from organic material, large-scale production in microbial hosts requires process as well as strain optimization. One of the most common strategies for enhanced production is the efficient supply of precursor molecules, as all carotenoids derive from the universal C5 precursor molecule isopentenyl pyrophosphate (IPP) and its isomer dimethylallyl pyrophosphate (DMAPP). IPP and DMAPP can be synthesized via two independent pathways, the mevalonate (MVA) and the 2-methylerythritol 4-phosphate (MEP) pathway (Rodriguez-Concepcion and Boronat, 2002). The MVA pathway starts from acetyl-CoA and operates mainly in eukaryotes (mammals, fungi, in the cytoplasm of plant cells), archaea, and a limited number of bacteria. The MEP pathway, which starts from pyruvate and glyceraldehyde 3-phosphate (GAP) and proceeds via the eponymous intermediate MEP, was identified much later (Rohmer et al., 1993) and is found in most bacteria as well as in plant plastids (Rohmer, 1999; Lange et al., 2000; Lee and Schmidt-Dannert, 2002). The two pathways also differ regarding redox and energy requirements (Steinbüchel, 2003). As the MEP pathway is present in several pathogens such as *Plasmodium falciparum* and *Mycobacterium tuberculosis*, but not in mammals, it is considered a drug target (Jomaa et al., 1999; Testa and Brown, 2003).

The MEP pathway consists of nine reactions catalyzed by eight enzymes (**Figure 1**), starting with the transfer of an acetaldehyde group derived from pyruvate to GAP, forming 1-deoxy-D-xylulose 5-phosphate (DXP), in the reaction of DXP synthase Dxs (EC 2.2.1.7). The intermediate DXP is also the precursor for thiamine (vitamin B1) (Begley et al., 1999) and pyridoxol (vitamin B6) (Hill et al., 1996) biosynthesis. Subsequently, DXP reductoisomerase Dxr (EC 1.1.1.267) converts DXP to MEP using NADPH as cofactor. MEP is then converted to the cyclic diphosphate 2-C-methyl-D-erythritol-2,4-cyclodiphosphate (ME-cPP) by the three enzymes IspD, IspE, and IspF (Gräwert et al., 2011). ME-cPP is then converted to IPP and DMAPP by a reduction and elimination reaction catalyzed by the two iron–sulfur proteins IspG and IspH (Rohdich et al., 2004). It is proposed that flavodoxin is an essential redox partner for one of the enzymes (Adam et al., 2002; Gräwert et al., 2004; Puan et al., 2005). IPP and DMAPP can be synthesized independently by IspH (Gräwert et al., 2004). IPP and DMAPP often do not occur in the same ratio; for example, in *Escherichia coli* IPP is synthesized in a 5:1 proportion to DMAPP (Rohdich et al., 2002; Gräwert et al., 2004; Xiao et al., 2008). The IPP:DMAPP isomerase Idi (EC 5.3.3.2) facilitates the isomerization between IPP and DMAPP. In microorganisms using the MVA pathway, which produces IPP exclusively, isomerases are essential enzymes, whereas in bacteria possessing the MEP pathway *idi* is not essential for the survival of the cells (Hahn et al., 1999; Julsing et al., 2007).
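The order of the MEP pathway described above can be written down as a small lookup table. The following is a minimal sketch: the `Step` type and the function names are illustrative, and the table groups reactions the way the text does (for instance, IspD, IspE, and IspF as one block from MEP to ME-cPP) rather than listing all nine individual reactions.

```python
from typing import NamedTuple, Tuple


class Step(NamedTuple):
    """One block of the MEP pathway as summarized in the text."""
    enzymes: Tuple[str, ...]
    substrates: Tuple[str, ...]
    products: Tuple[str, ...]


# Grouping follows the prose description, not a full reaction list.
MEP_PATHWAY = (
    Step(("Dxs",), ("pyruvate", "GAP"), ("DXP",)),
    Step(("Dxr",), ("DXP",), ("MEP",)),
    Step(("IspD", "IspE", "IspF"), ("MEP",), ("ME-cPP",)),
    Step(("IspG", "IspH"), ("ME-cPP",), ("IPP", "DMAPP")),
    Step(("Idi",), ("IPP",), ("DMAPP",)),  # reversible isomerization
)


def enzymes_in_order() -> list:
    """Flatten the table into the ordered list of pathway enzymes."""
    return [enzyme for step in MEP_PATHWAY for enzyme in step.enzymes]


def ipp_fraction(ipp_to_dmapp: float = 5.0) -> float:
    """Fraction of the C5 pool that is IPP for a given IPP:DMAPP ratio.

    The default of 5.0 is the 5:1 proportion cited for E. coli.
    """
    if ipp_to_dmapp < 0:
        raise ValueError("ratio must be non-negative")
    return ipp_to_dmapp / (ipp_to_dmapp + 1.0)


if __name__ == "__main__":
    print(" -> ".join(enzymes_in_order()))
    print(f"IPP share at 5:1 = {ipp_fraction():.2f}")
```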

*Corynebacterium glutamicum* is a pigmented Gram-positive bacterium with a long and safe history in the food and feed sector as it is used for the fermentative production of amino acids. Annually, about 2.6 million tons of L-glutamate and about 1.95 million tons of L-lysine are produced biotechnologically worldwide (Ajinomoto, Food Products Business. Available from http://www.ajinomoto.com/en/ir/pdf/Food-Oct2012.pdf and /Feed-useAA-Oct2013.pdf, Cited 18 March 2014). Besides amino acids, the diamines cadaverine and putrescine (Mimitsuka et al., 2007; Schneider and Wendisch, 2010) and the alcohols ethanol and isobutanol (Sakai et al., 2007; Blombach and Eikmanns, 2011), among others, can be produced from sugars by recombinant *C. glutamicum* strains. Furthermore, access of *C. glutamicum* to alternative feedstocks like glycerol from the biodiesel process (Meiswinkel et al., 2013), pentoses from lignocellulosics (Gopinath et al., 2011), amino sugars (Uhde et al., 2013; Matano et al., 2014), starch (Seibold et al., 2006), and β-glucans (Tsuchidate et al., 2011) has been engineered.

Recently, the potential of *C. glutamicum* for production of carotenoids has been explored. *C. glutamicum* synthesizes the cyclic C50 carotenoid decaprenoxanthin and its glucosides (**Figure 1**). Its carotenogenic pathway and the respective genes have been elucidated (Krubasik et al., 2001; Heider et al., 2012, 2014a), and overproduction of the C50 carotenoids decaprenoxanthin, sarcinaxanthin, and C.p. 450 in the milligrams per gram cell dry weight (DCW) range by *C. glutamicum* was achieved by metabolic engineering of the terminal carotenoid pathway (Heider et al., 2014a). Moreover, the heterologous production of the C40 carotenoids β-carotene and zeaxanthin could be established (Heider et al., 2014a) and hydroxylated carotenoids could be produced either as aglycons or as di-glucosides (Heider et al., 2014a). Engineering of *C. glutamicum* for the production of a sesquiterpene, (+)-valencene, was possible as well (Frohwitter et al., 2014).

Based on its genome sequence, all genes of the MEP pathway of *C. glutamicum* have been putatively assigned. However, neither have the respective genes or enzymes of the MEP pathway been functionally analyzed nor has engineering for an increased IPP supply been reported. The MEP pathway genes are distributed over the genome of *C. glutamicum*. The MEP pathway genes *dxs* (cg2083), *ispH* (cg1164), and *idi* (cg2531) are monocistronic, while *dxr* (cg2208), *ispD* (cg2945), *ispE* (cg1039), *ispF* (cg2944), and *ispG* (cg2206) belong to operons. *IspE* is the third gene of the operon cg1037-*ksgA*-*ispE*-cg1040-*pdxK* with genes for a putative resuscitation-promoting factor (cg1037), putative dimethyladenosine transferase KsgA, and putative pyridoxamine kinase PdxK. *IspD* and *ispF* are encoded in the cg2946-*ispDF* operon with cg2946, which codes for a CarD-like transcriptional regulator. *Dxr* and *ispG* are organized in a transcriptional unit separated by an uncharacterized gene (cg2207) putatively encoding a membrane-embedded Zn-dependent protease. In bacteria, two bottlenecks in the MEP pathway have been proposed. On the one hand, DXP synthase, which catalyzes the first reaction, is claimed to be rate-limiting (Sprenger et al., 1997; Xiang et al., 2007) and is essential in *E. coli* (Sauret-Gueto et al., 2003) and *Bacillus subtilis* (Julsing et al., 2007) and possibly further bacteria. On the other hand, overproduction of Idi, which is not essential in bacteria possessing the MEP pathway (Hahn et al., 1999; Julsing et al., 2007), improved carotenoid production (Harker and Bramley, 1999; Kim and Keasling, 2001).
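For readers who want to work with the gene assignments listed above, they can be collected in a simple mapping. This is only a bookkeeping sketch: the locus tags and the monocistronic/operon classification are those given in the text, while the dictionary layout and the function name are illustrative choices.

```python
# Locus tags and transcriptional context of the C. glutamicum MEP pathway
# genes as stated in the text; the data structure itself is illustrative.
MEP_GENES = {
    "dxs": {"locus": "cg2083", "context": "monocistronic"},
    "ispH": {"locus": "cg1164", "context": "monocistronic"},
    "idi": {"locus": "cg2531", "context": "monocistronic"},
    "dxr": {"locus": "cg2208", "context": "operon"},
    "ispD": {"locus": "cg2945", "context": "operon"},
    "ispE": {"locus": "cg1039", "context": "operon"},
    "ispF": {"locus": "cg2944", "context": "operon"},
    "ispG": {"locus": "cg2206", "context": "operon"},
}


def genes_by_context(context: str) -> list:
    """Return the sorted gene names with the given transcriptional context."""
    return sorted(g for g, info in MEP_GENES.items() if info["context"] == context)


if __name__ == "__main__":
    print("monocistronic:", genes_by_context("monocistronic"))
    print("in operons:   ", genes_by_context("operon"))
```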

In this study, two synthetic operons (*ispDFE* and *dxr-ispGH*) under control of the strong promoter P*tuf* of the *C. glutamicum* translation elongation factor EF-Tu gene were integrated into the prophage-cured, genome-reduced *C. glutamicum* strain MB001 (Baumgart et al., 2013). Furthermore, *dxs* was overexpressed from the chromosome by exchanging the endogenous promoter with the P*tuf* promoter. Finally, *idi* was overexpressed from an IPTG-inducible plasmid. The genome-reduced strain overexpressing all of the eight MEP pathway genes was then shown to be suitable for production of lycopene and endogenous decaprenoxanthin as well as for production of the non-native astaxanthin.

# **MATERIALS AND METHODS**

# **BACTERIAL STRAINS, MEDIA AND GROWTH CONDITIONS**

The strains and plasmids used in this work are listed in **Table 1**. *C. glutamicum* ATCC13032 was used as wild type (WT); for metabolic engineering, the prophage-cured *C. glutamicum* MB001 (Baumgart et al., 2013) was used as platform strain. Precultivation of *C. glutamicum* strains was performed in LB medium or LB with glucose. For cultivation in CGXII medium (Eggeling and Reyes, 2005), precultivated cells were washed once with CGXII medium without carbon source and inoculated to an initial OD<sub>600</sub> of 1. Glucose was added as carbon and energy source to a concentration of 100 mM. Standard cultivations of *C. glutamicum* were performed at 30°C in a volume of 50 ml in 500 ml flasks with two baffles shaking at 120 rpm. The OD<sub>600</sub> was measured in dilutions using a Shimadzu UV-1202 spectrophotometer (Duisburg, Germany). Alternatively, cultivations were performed in 1 ml volume in microtiter plates at 1100 rpm at 30°C using the Biolector® micro fermentation system (m2p-labs GmbH, Baesweiler, Germany). For cloning, *E. coli* DH5α was used as host and cultivated in LB medium at 37°C. When appropriate, kanamycin or spectinomycin was added to concentrations of 25 and 100 µg ml<sup>−1</sup>, respectively. Gene expression was induced by adding 50 µM and 1 mM IPTG, respectively, at inoculation of the main culture.

# **RECOMBINANT DNA WORK**

Plasmids were constructed in *E. coli* DH5α from PCR-generated fragments (KOD, Novagen, Darmstadt, Germany) and isolated with the QIAprep spin miniprep kit (QIAGEN, Hilden, Germany). Oligonucleotides used in this study were obtained from Eurofins MWG Operon (Ebersberg, Germany) and are listed in **Table 2**. Standard reactions like restriction, ligation, and PCR were performed as described previously (Sambrook and Russell, 2001). Besides the common ligation reaction, Gibson assembly was applied for the construction of plasmids (Gibson et al., 2009). If applicable, PCR products were purified using the PCR purification kit or MinElute PCR purification kit (QIAGEN, Hilden, Germany). For transformation of *E. coli*, the RbCl method was used (Hanahan, 1983) and *C. glutamicum* was transformed via electroporation (van der Rest et al., 1999) at 2.5 kV, 200 Ω, and 25 µF. All cloned DNA fragments were shown to be correct by sequencing.

## **DELETION OF CAROTENOGENIC GENES IN C. GLUTAMICUM MB001**

For deletion of the carotenogenic genes *crtYe/f* and *crtEb*, encoding the C45/C50 carotenoid ε-cyclase and the lycopene elongase, respectively, the suicide vector pK19*mobsacB* was used (Schäfer et al., 1994). Genomic regions flanking the *crtYEb* cluster were amplified from genomic DNA of *C. glutamicum* WT using primer pairs *crtY*-A/*crtY*-B and *crtEb*-C/*crtEb*-D (**Table 2**), respectively. The PCR products were purified and linked by crossover PCR using the primer pair *crtY*-A/*crtEb*-D (**Table 2**). The purified PCR product was cloned into pK19*mobsacB*, resulting in the construction of deletion vector pK19*mobsacB*-Δ*crtYEb* (**Table 1**). The targeted deletion of *crtYEb* via two-step homologous recombination as well as the selection for the first and second recombination events were carried out as described previously (Eggeling and Bott, 2005). Deletion of *crtYEb* was verified by PCR analysis of the constructed mutant using primer pair *crtY*-E/*crtEb*-F (**Table 2**).

# **CONSTRUCT DESIGN OF THE SYNTHETIC MEP OPERONS AND THEIR INTEGRATION INTO THE GENOME OF C. GLUTAMICUM LYC3**

The integration of the synthetic operons Op1 and Op2 was conducted by using the suicide vector pK19*mobsacB* (Schäfer et al., 1994). Op1 consists of the MEP-pathway genes *ispD*, *ispF*, and *ispE* under the control of the constitutive P*tuf* promoter. *IspD* and *ispF* form a transcription unit and were amplified as such from genomic DNA from *C. glutamicum* WT using the oligonucleotides 5 and 6. The primer pair 7/8 was used to amplify *ispE* from *C. glutamicum* WT, introducing an artificial ribosome binding site (RBS) in front of the gene. The promoter region was amplified using the oligonucleotides 3 and 4. In Op2, *dxr*, *ispG*, and *ispH* were combined by amplification from the *C. glutamicum* WT genome using the primer pairs 15/16, 17/18, and 19/20, respectively. An artificial RBS was introduced in front of *ispG* and *ispH* by the oligonucleotides 17 and 19, respectively. The genes of Op2 were also put under the control of the P*tuf* promoter, amplified from genomic DNA using the primers 13 and 14. Genomic regions flanking the selected insertion region were amplified from genomic DNA of *C. glutamicum* LYC3 using primer pairs 1/2 and 9/10 for integration in the cgp2 cured region in the case of Op1, or 11/12 and 20/22 for integration of Op2 in the cgp1 cured region (**Table 2**). The purified PCR products were either linked by crossover PCR or were directly combined together with the plasmid by Gibson assembly (Gibson et al., 2009). The final assembly of the insert with linearized pK19*mobsacB* led to the construction of the respective integration vectors pK19*mobsacB*-Op1 and pK19*mobsacB*-Op2 (**Table 1**). The subsequent integration of each operon by two-step homologous recombination was performed as described for the deletion of genes. The integration of operons 1 and 2 was verified by PCR using the primers 29/30 and 31/32, respectively.

#### **Table 1 | Strains and plasmids used in this study**.

## **PROMOTER EXCHANGE OF THE dxs GENE IN C. GLUTAMICUM LYC3**

The plasmid pK19*mobsacB*-P*tufdxs* was constructed to replace the native *dxs* promoter with the *tuf* promoter region from *C. glutamicum* WT. For this purpose, the upstream region of *dxs* (483 bp), the 3′ part of *dxs*, and the *tuf* promoter region [200 bp upstream of the coding sequence of the *tuf* gene (cg0587)] were amplified from chromosomal DNA of *C. glutamicum* LYC3 using the oligonucleotide pairs 27/28, 23/24, and 25/26, respectively (**Table 2**). By crossover PCR, the *dxs* 3′ fragment and the *tuf* promoter region were fused with oligonucleotides 23/26. Afterward, the *dxs* upstream region was fused to this 644 bp long fragment using oligonucleotides 27/26. The final purified PCR product was cloned into pK19*mobsacB*, resulting in the vector pK19*mobsacB*-P*tufdxs* (**Table 1**). The subsequent promoter exchange by two-step homologous recombination was performed as described earlier for the deletion of genes. The promoter exchange was verified by PCR using the primers dxs\_E and 33, and by sequencing of the PCR product.

## **OVEREXPRESSION OF CAROTENOGENIC GENES**

Plasmids harboring a carotenogenic gene (general abbreviation *crt*), pEKEx3-*crt* or pVWEx1-*crt*, allowed IPTG-inducible overexpression of *crt*. They were constructed on the basis of pEKEx3 (Stansen et al., 2005) or pVWEx1 (Peters-Wendisch et al., 2001), respectively. The *crt* genes were amplified by polymerase chain reaction (PCR) using the respective primers (**Table 2**), with genomic DNA of *C. glutamicum* WT, *P. ananatis*, and *B. aurantiaca*, prepared as described (Eikmanns et al., 1995), as template. The amplified products were cloned into the appropriately restricted pEKEx3 or pVWEx1 plasmid DNA.

# **EXTRACTION ANALYSIS OF CAROTENOIDS**

To extract carotenoids from the *C. glutamicum* strains, 15 ml aliquots of the cell cultures were centrifuged at 10,000 × g for 15 min and the pellets were washed with deionized H2O. The pigments were extracted with 10 ml methanol:acetone mixture (7:3) at 60°C for 30 min with thorough vortexing every 10 min. When necessary, several extraction cycles were performed to remove all visible color from the cell pellet (Heider et al., 2012).

#### **Table 2 | Oligonucleotides used in this study.**

Sequence in bold: artificial ribosome binding site; sequence underlined: restriction site; sequence in italics: linker sequence for hybridization.

The extraction mixture was centrifuged at 10,000 × g for 15 min and the supernatant was transferred to a new tube. The carotenoid content in the extracts was quantified through absorbance at 470 nm by HPLC analysis (see below) and the concentrations were calculated using a standard curve and appropriate dilutions. High performance liquid chromatography (HPLC) analyses of the *C. glutamicum* extracts were performed as described earlier (Heider et al., 2014a) on an Agilent 1200 series HPLC system (Agilent Technologies Sales & Services GmbH & Co., KG, Waldbronn), including a diode array detector (DAD) for UV/visible (Vis) spectrum recording. For separation, a column system consisting of a precolumn (10 mm × 4 mm MultoHigh 100 RP18-5, CS Chromatographie Service GmbH, Langerwehe, Germany) and a main column (ProntoSIL 200-5 C30, 250 mm × 4 mm, CS Chromatographie Service GmbH, Langerwehe, Germany) was used. Quantification of carotenoids was performed using the extracted wavelength chromatogram at 470 nm for decaprenoxanthin and carotenoids with corresponding UV/Vis profiles as well as for lycopene and corresponding carotenoids. Lycopene from tomato (Sigma, Steinheim, Germany), astaxanthin (Ehrenstorfer GmbH, Augsburg, Germany), and β-carotene (Merck, Darmstadt, Germany) were used as standards. The carotenoids were dissolved in chloroform according to their solubility and diluted in methanol:acetone (7:3). Due to the lack of appropriate standards, decaprenoxanthin and zeaxanthin were quantified based on a β-carotene standard and reported as β-carotene equivalents. The HPLC protocol comprised a gradient elution for 10 min with a mobile phase composition of (A) methanol and (B) methanol/methyl tert-butyl ether/ethyl acetate (5:4:1), starting from 10 to 100% eluent B, followed by 20 min of isocratic elution with 100% B. After that, the eluent composition was set back to 10% B for 3 min. The injection volume was 50 µl and the flow rate was kept constant at 1.4 ml/min.
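The HPLC gradient program described above can be summarized as a function of run time. The sketch below assumes a linear ramp during the 10 min gradient and an immediate step back to 10% B at 30 min; the text gives only the start and end compositions, so the shape of the ramp and the function names are assumptions for illustration.

```python
GRADIENT_END_MIN = 10.0      # end of the 10 -> 100% B gradient
ISOCRATIC_END_MIN = 30.0     # end of the 20 min hold at 100% B
RUN_END_MIN = 33.0           # end of the 3 min re-equilibration at 10% B
FLOW_ML_PER_MIN = 1.4        # constant flow rate from the protocol


def percent_b(t_min: float) -> float:
    """Percentage of eluent B at time t_min (linear ramp assumed)."""
    if t_min < 0 or t_min > RUN_END_MIN:
        raise ValueError("time outside the 0-33 min run")
    if t_min <= GRADIENT_END_MIN:
        return 10.0 + 90.0 * t_min / GRADIENT_END_MIN
    if t_min <= ISOCRATIC_END_MIN:
        return 100.0
    return 10.0


def run_volume_ml() -> float:
    """Total mobile phase volume pumped during one run."""
    return FLOW_ML_PER_MIN * RUN_END_MIN


if __name__ == "__main__":
    for t in (0, 5, 10, 20, 31):
        print(f"t = {t:>2} min: {percent_b(t):5.1f}% B")
    print(f"volume per run: {run_volume_ml():.1f} ml")
```

Under these assumptions one injection consumes about 46 ml of mobile phase in total.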

# **DXS ACTIVITY ASSAY**

The DXS activity of *C. glutamicum* crude extracts was determined using an endpoint assay adopted from Xiang et al. (2007), which is based on the measurement of the remaining pyruvate level in the reaction mixture. The assays were carried out at 30°C in a total volume of 1 ml containing 50 mM Tris (pH 7.5), 60 µM pyruvate, 60 µM GAP, 10 mM dithiothreitol (DTT), 5 mM MgCl2, and 600 µM TPP. Reactions were stopped after 5, 15, 30, and 60 min of incubation by heat inactivation (5 min at 95°C). Subsequently, the leftover pyruvate was converted to lactate with lactate dehydrogenase and the concomitant consumption of NADH was determined by fluorescence. Therefore, the reaction was allowed to proceed for 60 min at room temperature. Then, 2.5 U ml<sup>−1</sup> lactate dehydrogenase and 0.1 mM NADH were added to the reaction mixture and incubated for 30 min at 37°C. The NADH diminution was determined photometrically at 340 nm.
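The arithmetic behind such an endpoint assay can be sketched as follows. The text does not spell out the calculation, so this is an assumed workflow: the decrease in absorbance at 340 nm is converted to NADH consumed using the commonly used extinction coefficient of 6.22 mM<sup>−1</sup> cm<sup>−1</sup>, NADH consumed is equated with leftover pyruvate (1:1 in the lactate dehydrogenase reaction), and the pyruvate used up by Dxs is expressed per minute and per milligram of protein. Function names and the example numbers are hypothetical, and any dilution introduced by adding the detection reagents is ignored.

```python
NADH_EXT_COEFF_PER_MM_CM = 6.22  # commonly used value at 340 nm (assumption)


def nadh_consumed_um(delta_a340: float, path_cm: float = 1.0) -> float:
    """Convert a decrease in A340 into micromolar NADH consumed (Beer-Lambert)."""
    if delta_a340 < 0 or path_cm <= 0:
        raise ValueError("delta_a340 must be >= 0 and path_cm > 0")
    return delta_a340 / (NADH_EXT_COEFF_PER_MM_CM * path_cm) * 1000.0


def dxs_specific_activity(pyruvate_start_um: float, pyruvate_left_um: float,
                          volume_ml: float, minutes: float,
                          protein_mg: float) -> float:
    """Specific activity in mU per mg protein (1 mU = 1 nmol per min).

    Micromolar times millilitres gives nanomoles of pyruvate consumed.
    """
    if minutes <= 0 or protein_mg <= 0 or volume_ml <= 0:
        raise ValueError("volume, time, and protein must be positive")
    nmol_used = (pyruvate_start_um - pyruvate_left_um) * volume_ml
    return nmol_used / minutes / protein_mg


if __name__ == "__main__":
    # Hypothetical reading: A340 dropped by 0.1866 after the LDH step.
    left = nadh_consumed_um(0.1866)  # leftover pyruvate in µM
    activity = dxs_specific_activity(60.0, left, volume_ml=1.0,
                                     minutes=15.0, protein_mg=0.25)
    print(f"leftover pyruvate: {left:.1f} µM, activity: {activity:.1f} mU/mg")
```

The 60 µM starting concentration and the 1 ml volume match the assay conditions given above; the absorbance change and protein amount are invented purely to make the example run.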

# **RESULTS**

## **OVEREXPRESSION OF dxs INCREASED LYCOPENE YIELD**

The first and often rate-limiting reaction in the MEP pathway is the condensation of pyruvate and GAP to DXP catalyzed by Dxs (Harker and Bramley, 1999; Kim and Keasling, 2001). To test if Dxs is a bottleneck in carotenoid biosynthesis in *C. glutamicum*, *dxs* was overexpressed in *C. glutamicum* LYC3, a mutant derived from the genome-reduced *C. glutamicum* strain MB001 (Baumgart et al., 2013) that accumulates lycopene due to deletion of the lycopene elongase and C45/C50 carotenoid ε-cyclase genes *crtEb* and *crtYe/f*. To exchange the native *dxs* promoter for the strong constitutive promoter of *tuf* (cg0587), which encodes the elongation factor EF-Tu (Fukui et al., 2011), the replacement vector pK19*mobsacB*-P*tufdxs* was constructed and *C. glutamicum* LYC3-P*tufdxs* was obtained. Dxs activities measured in crude extracts were about twofold higher in *C. glutamicum* LYC3-P*tufdxs* (16 ± 1 mU mg<sup>−1</sup>) than in the control strain *C. glutamicum* LYC3 (**Table 3**). As a consequence of enhanced Dxs activity, lycopene production doubled (0.08 ± 0.01 mg g<sup>−1</sup> DCW as compared to 0.04 ± 0.01 mg g<sup>−1</sup> DCW) (**Table 3**). Thus, increased Dxs activity improved lycopene production by *C. glutamicum*. Increased specific Dxs activities were also observed when a plasmid-borne copy of *dxs* was overexpressed from an IPTG-inducible promoter in LYC3, but lycopene production was only slightly improved (**Table 3**). Hence, chromosomal overexpression proved better and was therefore chosen for subsequent metabolic engineering of the MEP pathway.

**Table 3 | Influence of chromosomal promoter exchange of the 1-deoxy-D-xylulose 5-phosphate synthase gene *dxs* on Dxs activities, growth rates, and lycopene production**.

Cells were grown in glucose CGXII minimal medium for 24 h. Means and standard deviations of three cultivations are reported.
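The doubling of lycopene production reported above can be reproduced from the stated means, and the standard deviations can be carried through to the ratio. The sketch below uses first-order error propagation for a quotient and assumes the two measurements are independent; this uncertainty estimate is an illustration and is not a value reported by the authors.

```python
import math


def fold_change(mean_a: float, sd_a: float, mean_b: float, sd_b: float):
    """Return (ratio, sd_of_ratio) for mean_a / mean_b.

    Uses first-order propagation for independent errors:
    sd_ratio = ratio * sqrt((sd_a/mean_a)**2 + (sd_b/mean_b)**2).
    """
    if mean_a == 0 or mean_b == 0:
        raise ValueError("means must be non-zero")
    ratio = mean_a / mean_b
    rel = math.sqrt((sd_a / mean_a) ** 2 + (sd_b / mean_b) ** 2)
    return ratio, abs(ratio) * rel


if __name__ == "__main__":
    # Lycopene in mg per g DCW: LYC3-Ptufdxs versus LYC3 (values from the text).
    ratio, sd = fold_change(0.08, 0.01, 0.04, 0.01)
    print(f"fold change = {ratio:.1f} +/- {sd:.1f}")
```

With the reported values the ratio is 2.0 with a propagated uncertainty of roughly 0.6, which reflects the relatively large standard deviation of the control strain at this low lycopene level.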

## **OVERPRODUCTION OF ENZYMES CONVERTING DXP TO IPP USING TWO SYNTHETIC OPERONS INTEGRATED INTO THE C. GLUTAMICUM CHROMOSOME**

For overproduction of the six MEP pathway enzymes catalyzing the conversion of DXP to IPP, two synthetic operons were constructed and integrated into the chromosome of *C. glutamicum* LYC3. Operon 1 was constructed to drive expression of *ispDF*, which are cotranscribed naturally, fused to *ispE* from P*tuf*. The RBS of the *tuf* gene was inserted upstream of *ispD*, while the endogenous RBS of *ispF* and a perfect *C. glutamicum* RBS upstream of *ispE* were used. To construct operon 2, *dxr*, *ispG*, and *ispH* were fused for expression from P*tuf* and perfect *C. glutamicum* RBS were inserted upstream of *ispG* and *ispH* while the RBS of the *tuf* gene was used upstream of *dxr*. Both operons were integrated by homologous recombination into the chromosome of *C. glutamicum* LYC3, which lacks prophages cgp1 and cgp2. Operon 1 was integrated into the chromosome of *C. glutamicum* LYC3 between cg1506 and cg1525, i.e., at the position that harbors prophage cgp2 in the *C. glutamicum* WT, but which is absent from LYC3, and the resulting strain was named LYC3-Op1. Similarly, *C. glutamicum* LYC3-Op2 was obtained by integrating operon 2 into the chromosome of *C. glutamicum* LYC3 at the position (between cg1745 and cg1753) that in *C. glutamicum* WT harbors prophage cgp1, but which is absent from LYC3. The constructed *C. glutamicum* strain LYC3-Op1Op2 contains both operons in the chromosome instead of prophages cgp1 and cgp2. *C. glutamicum* LYC3-Op1 showed slightly higher lycopene accumulation than *C. glutamicum* strains LYC3 and LYC3-Op2. *C. glutamicum* LYC3-Op2 grew slower than LYC3 and LYC3-Op1. *C. glutamicum* LYC3-Op1Op2 that harbors both operons also grew slower, but accumulated almost threefold more lycopene than LYC3. Thus, overexpression of MEP pathway genes from two chromosomally integrated synthetic operons improved lycopene production (**Figure 2**).
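The operon and strain designs described above can be summarized as a small data structure, which makes it easy to list which MEP pathway genes are overexpressed in each strain. This is a bookkeeping sketch, not a tool used in the study. The class and function names are hypothetical, and the gene order within each operon follows the order in which the text lists the genes, which is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operon:
    name: str
    promoter: str
    genes: tuple              # (gene, ribosome binding site) in assumed order
    flanking_genes: tuple     # chromosomal integration site
    replaced_prophage: str    # prophage located there in the wild type


OP1 = Operon(
    name="Op1",
    promoter="Ptuf",
    genes=(("ispD", "tuf RBS"), ("ispF", "endogenous RBS"), ("ispE", "perfect RBS")),
    flanking_genes=("cg1506", "cg1525"),
    replaced_prophage="cgp2",
)
OP2 = Operon(
    name="Op2",
    promoter="Ptuf",
    genes=(("dxr", "tuf RBS"), ("ispG", "perfect RBS"), ("ispH", "perfect RBS")),
    flanking_genes=("cg1745", "cg1753"),
    replaced_prophage="cgp1",
)

STRAINS = {
    "LYC3": (),
    "LYC3-Op1": (OP1,),
    "LYC3-Op2": (OP2,),
    "LYC3-Op1Op2": (OP1, OP2),
}


def overexpressed_genes(strain):
    """List the operon-borne genes of a strain in operon order."""
    return [gene for operon in STRAINS[strain] for gene, _rbs in operon.genes]


for strain in STRAINS:
    print(strain, overexpressed_genes(strain))
```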

## **IMPROVED IPP SUPPLY BY CHROMOSOME-BASED ENHANCEMENT OF MEP PATHWAY GENE EXPRESSION**

To combine chromosome-based overexpression of the genes necessary for conversion of DXP to IPP with overproduction of Dxs, the first enzyme of the MEP pathway, the endogenous promoter of chromosomal *dxs* was exchanged for P*tuf* in *C. glutamicum* LYC3-Op1Op2 and the resulting strain was named *C. glutamicum* LYC3-MEP. Surprisingly, LYC3-MEP showed slower growth on solid as well as in liquid medium. Poor growth in liquid glucose medium was accompanied by little lycopene production, although LYC3-MEP colonies appeared well pigmented on plates. Since the central carbon metabolites pyruvate and GAP are the immediate precursors of the MEP pathway, it was tested if lycopene production by *C. glutamicum* LYC3-MEP was affected by the carbon source. To this end, pyruvate and glycerol were tested as carbon sources. Since glycerol is not a carbon source for *C. glutamicum* WT, *glpFKD* from *E. coli*, encoding the enzymes for conversion of glycerol to GAP, were expressed from plasmid pVWEx1-*glpFKD* (Rittmann et al., 2008) in *C. glutamicum* LYC3-MEP. Growth of *C. glutamicum* LYC3-MEP(pVWEx1-*glpFKD*) on glycerol, glycerol + glucose, or glycerol + pyruvate was still impaired, but about twofold more lycopene (around 0.07 ± 0.01 mg g−<sup>1</sup> DCW) accumulated than with glucose as sole carbon source (**Figure 3**).

Since IspH synthesizes both IPP and DMAPP, but typically not in equimolar amounts (Rohdich et al., 2002; Gräwert et al., 2004; Xiao et al., 2008), it is possible that unbalanced biosynthesis of IPP and DMAPP in *C. glutamicum* LYC3-MEP impairs growth and carotenogenesis. To test this hypothesis, isopentenyl pyrophosphate isomerase Idi was overproduced. Indeed, *C. glutamicum* LYC3-MEP(pVWEx1-*idi*)(pEKEx3) produced twofold more lycopene (0.08 ± 0.02 mg g−<sup>1</sup> DCW) than *C. glutamicum* strains LYC3, LYC3-MEP, and the empty vector control strain, but still showed impaired growth (**Table 4**). Thus, a lycopene-producing *C. glutamicum* strain with improved IPP supply overexpressing all MEP pathway genes and *idi* could be constructed. However, lycopene production by this strain (**Table 4**) was comparable to that by *C. glutamicum* strains LYC3-P*tufdxs* (**Table 3**) and LYC3-Op1Op2 (**Figure 2**), indicating that the positive effects did not act synergistically. This was also observed when the strains were grown in LB medium supplemented with 100 mM glucose; however, they grew faster (data not shown). Taken together, *C. glutamicum* strains with improved IPP and DMAPP supply showed higher lycopene production than the respective parental strains.

## **APPLICATION OF C. GLUTAMICUM WITH IMPROVED IPP SUPPLY FOR PRODUCTION OF DECAPRENOXANTHIN AND ASTAXANTHIN**

To test if *C. glutamicum* LYC3-MEP overexpressing *idi* is suitable for production of the endogenous C50 carotenoid decaprenoxanthin, this strain was transformed with plasmid pEKEx3-*crtEbY*. Expression of the lycopene elongase gene *crtEb* and of the carotenoid ε-cyclase gene *crtYe/f* from this plasmid complements the chromosomal *crtEb* and *crtYe/f* deletions of the lycopene-producing *C. glutamicum* LYC3-MEP, allowing for decaprenoxanthin biosynthesis. The resulting strain LYC3-MEP(pVWEx1-*idi*)(pEKEx3-*crtEbY*) overproduces all enzymes of endogenous carotenogenesis except those encoded by *crtE*, *crtB*, and *crtI* (**Figure 1**). Although it grew slowly, LYC3-MEP(pVWEx1-*idi*)(pEKEx3-*crtEbY*) produced 0.35 ± 0.02 mg g−<sup>1</sup> DCW decaprenoxanthin (**Table 5**) and, thus, is a genome-reduced strain with improved IPP supply suitable for the overproduction of the endogenous C50 carotenoid decaprenoxanthin.

Cells were grown in glucose CGXII minimal medium and plasmid-carrying strains were induced with 50 µM IPTG. Means and standard deviations of three cultivations are shown.

*C. glutamicum* has previously been engineered for the production of the non-native C40 carotenoids β-carotene and zeaxanthin (Heider et al., 2014a). When *crtY*<sub>Pa</sub> (PANA\_4160) encoding lycopene cyclase from *Pantoea ananatis* was expressed, β-carotene accumulated. Additional expression of *crtZ*<sub>Pa</sub> (PANA\_4163), which encodes β-carotene hydroxylase, resulted in partial conversion of β-carotene to zeaxanthin (Heider et al., 2014a). To enable astaxanthin production, *crtW*<sub>Ba</sub> encoding carotene C(4) oxygenase from *Brevundimonas aurantiaca*, which oxidizes zeaxanthin to yield astaxanthin, was expressed in addition to *crtY*<sub>Pa</sub> and *crtZ*<sub>Pa</sub>. The resulting plasmid pEKEx3-*crtZWY* was used to transform LYC3-MEP(pVWEx1-*idi*). *C. glutamicum* LYC3-MEP(pVWEx1-*idi*)(pEKEx3-*crtZWY*) produced 0.14 ± 0.01 mg g−<sup>1</sup> DCW astaxanthin and neither β-carotene nor zeaxanthin accumulated (**Table 5**). Thus, to the best of our knowledge, this is the first documentation of astaxanthin production by recombinant *C. glutamicum*. Although levels were low, LYC3-MEP(pVWEx1-*idi*)(pEKEx3-*crtZWY*) produced astaxanthin as the only carotenoid.

Based on our previous findings that overexpression of the genes *crtE*, *crtB*, and *crtI* (**Figure 1**) strongly increased lycopene production (Heider et al., 2012) as well as decaprenoxanthin production (Heider et al., 2014a), these genes were overexpressed from plasmid pVWEx3-*crtEBI*. The resulting strain *C. glutamicum* LYC3-MEP(pVWEx3-*crtEBI*)(pEKEx3-*crtZWY*) produced 2.1 ± 1.3 mg g−<sup>1</sup> DCW β-carotene and 1.2 ± 0.2 mg g−<sup>1</sup> DCW zeaxanthin (**Table 5**), but also ninefold more astaxanthin (1.2 ± 0.5 mg g−<sup>1</sup> DCW) than LYC3-MEP(pVWEx1-*idi*)(pEKEx3-*crtZWY*). Thus, it was shown that astaxanthin can be produced by recombinant *C. glutamicum* in the milligrams per gram DCW range.

**Table 5 | Astaxanthin and decaprenoxanthin production by recombinant C. glutamicum strains with improved IPP supply**.

Cells were grown in glucose CGXII minimal medium with 50 µM IPTG. Means and standard deviations of three cultivations are reported.

# **DISCUSSION**

Recently, *C. glutamicum* has been engineered for production of diverse lycopene-derived carotenoids (Heider et al., 2014a) and of a sesquiterpene (Frohwitter et al., 2014). There is an increasing demand for efficient, low-cost, and natural production of terpenoids (Zhu et al., 2014) as they have many applications, e.g., in the medicinal and nutraceutical industries or as fuels (Martin et al., 2003; Ajikumar et al., 2010; Peralta-Yahya et al., 2011). Besides terminal terpenoid pathway engineering, an efficient supply of the prenyl pyrophosphate precursors is important (Heider et al., 2014b). It was shown here that MEP pathway engineering to improve IPP supply in *C. glutamicum* improved lycopene production. However, as observed in similar studies of MEP pathway engineering in other bacteria, individual bottlenecks may be overcome, but the individual beneficial effects do not necessarily add up (Kim and Keasling, 2001; Martin et al., 2003; Rodriguez-Villalon et al., 2008). Overexpressing the initial MEP pathway gene, *dxs*, improved lycopene production by *C. glutamicum* (see **Figure 1**) and by other bacteria (Harker and Bramley, 1999; Matthews and Wurtzel, 2000). However, optimal overexpression levels need to be established since, e.g., chromosomal overexpression proved better than overexpression from a multi-copy plasmid (Yuan et al., 2006). Similarly, when *dxs* was overexpressed in *C. glutamicum* by exchanging the native promoter of *dxs* for the strong constitutive *tuf* promoter, more lycopene accumulated than when plasmid-borne *dxs* overexpression, which led to higher Dxs activities, was tested (**Table 3**). The complex interplay of MEP pathway enzymes is also reflected by the fact that overexpression of *dxr*, *ispG*, and *ispH* in LYC3-Op2 only improved lycopene accumulation when combined with overexpression of *ispDF* and *ispE* (Op1) (**Figure 2**). Although lycopene titers obtained with *C. glutamicum* LYC3-Op1Op2 were comparable to those of the *dxs*-overexpressing strain LYC3-P*tufdxs* (**Figure 2** and **Table 3**), their combination in strain LYC3-MEP was not synergistic and even perturbed growth. This may result from accumulation of inhibitory MEP pathway intermediates as shown for *B. subtilis* (Sivy et al., 2011) and *E. coli* (Martin et al., 2003; Zou et al., 2013), from an excessive drain of central metabolic intermediates (Kim and Keasling, 2001), and/or from an imbalance between IPP and DMAPP (Kajiwara et al., 1997). In *C. glutamicum*, improved lycopene production as a consequence of overexpression of the IPP isomerase gene *idi* was observed in LYC3-MEP (**Table 4**). However, lycopene production by LYC3-MEP overexpressing *idi* was not higher than by LYC3-P*tufdxs* or by LYC3-Op1Op2. Moreover, when *dxs* was overexpressed in the WT-derived strain ∆*crtEb*, lycopene production increased from about 0.04 to about 0.12 mg g−<sup>1</sup> DCW, but combined overexpression of *dxs* and *idi* did not further increase lycopene production (data not shown). Thus, the perturbed growth may not only be due to an imbalance between IPP and DMAPP.

It remains to be shown if combinatorial approaches to optimize multiple gene expression levels (Zelcbuch et al., 2013; Nowroozi et al., 2014) would improve the IPP precursor supply in *C. glutamicum*. Fine-tuning of gene expression in recombinant *C. glutamicum* by varying promoters (Holátko et al., 2009; van Ooyen et al., 2011; Schneider et al., 2012), RBSs (Schneider et al., 2012), translational start codons (Schneider et al., 2012), or translational stop codons (Jensen and Wendisch, 2013) improved production of amino acids and diamines. In addition, overexpression of heterologous instead of endogenous genes may be beneficial, e.g., as shown for improving isoprene production by *E. coli* via overexpression of two MEP pathway genes *dxs* and *dxr* from *B. subtilis* (Zhao et al., 2011) or by combining overexpression of *xylA* from *Xanthomonas campestris* with endogenous *xylB* to accelerate xylose utilization of *C. glutamicum* (Meiswinkel et al., 2013).

Besides fine-tuning of MEP pathway gene overexpression, growth and terpenoid production by recombinant *C. glutamicum* with increased IPP supply could be improved by metabolic pull, i.e., by overexpression of genes of the downstream terpenoid pathway (**Table 5**). Similarly, amorphadiene synthase overexpression prevented accumulation of inhibitory isoprenoid pathway intermediates in *E. coli* (Martin et al., 2003). Overcoming the toxicity of accumulating IPP and DMAPP was successfully used as a screening method for the identification of genes that are involved in isoprenoid biosynthesis (Withers et al., 2007). Accumulation of the MEP pathway intermediate ME-cPP inhibits growth and isoprenoid production by recombinant *E. coli*. To abolish its accumulation, overproduction of the two enzymes downstream of ME-cPP (IspG and IspH) needed to be combined with overexpression of an operon for iron–sulfur cluster assembly, since both IspG and IspH contain iron–sulfur clusters (Zou et al., 2013).

To the best of our knowledge, production of astaxanthin by recombinant *C. glutamicum* was shown here for the first time. Astaxanthin is the third most important carotenoid after β-carotene and lutein and its global market amounted to about 230 million US\$ in 2010 (BBC Research, 2011). The economically most significant application of astaxanthin is its use as a feed additive in the aquaculture industry (Lorenz and Cysewski, 2000; Higuera-Ciapara et al., 2006; Schmidt et al., 2011), but it also exhibits high potential as a nutraceutical and as an approved ingredient for cosmetics due to its remarkably high antioxidative activity (Miki, 1991; Schmidt et al., 2011). Astaxanthin is mainly produced by marine bacteria and microalgae, but only the green freshwater microalga *Haematococcus pluvialis* and the red yeast *Xanthophyllomyces dendrorhous/Phaffia rhodozyma* are established as hosts for commercial production (Bhosale and Bernstein, 2005; Rodriguez-Saiz et al., 2010). Algae-based production of astaxanthin is still more costly than chemical synthesis (Jackson et al., 2008), but markets increasingly demand naturally produced carotenoids. The astaxanthin titers by recombinant *C. glutamicum* reported here are in the milligrams per gram DCW range and, thus, they are comparable to yields described for *P. rhodozyma* (ranging from 0.16 to 6.6 mg g−<sup>1</sup> DCW; Cruz and Parajo, 1998; Jacobson et al., 1999). The highest product titer of 9.7 mg g−<sup>1</sup> DCW is reported for a *P. rhodozyma* strain improved by metabolic engineering and classical mutagenesis (Gassel et al., 2013), while the highest titer in a recombinant bacterium, an *E. coli* strain, was 5.8 mg g−<sup>1</sup> DCW astaxanthin (Zelcbuch et al., 2013). Thus, the astaxanthin titers reported for *C. glutamicum* are comparable and it is conceivable that they may be improved further by combining metabolic engineering with classical mutagenesis as in *P. rhodozyma* (Gassel et al., 2013), by combinatorial approaches to gene expression (Zelcbuch et al., 2013), or by high-cell density cultivation since biomass concentrations of up to 95 g DCW/l have been reported for *C. glutamicum* (Riesenberg and Guthke, 1999).
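The link between specific content (mg per g DCW) and a volumetric titer (mg per liter) is a simple product, sketched below. This is a back-of-the-envelope illustration, not a result of the study. It assumes that the specific astaxanthin content measured in shake-flask cultures would be maintained at high cell density, which has not been shown. The function name is hypothetical.

```python
def volumetric_titer_mg_per_l(content_mg_per_g_dcw, biomass_g_dcw_per_l):
    """Volumetric titer as specific content multiplied by biomass concentration."""
    return content_mg_per_g_dcw * biomass_g_dcw_per_l


# 1.2 mg per g DCW astaxanthin (reported here) at 95 g DCW per liter
# (the biomass concentration cited for C. glutamicum).
estimate = volumetric_titer_mg_per_l(1.2, 95.0)
print(f"about {estimate:.0f} mg astaxanthin per liter, if content were maintained")
```

Under that assumption the product would reach on the order of 100 mg per liter, which illustrates why high-cell density cultivation is named as a route for improvement.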

# **REFERENCES**

Abe, S., Takayama, K., and Kinoshita, S. (1967). Taxonomical studies on glutamic acid producing bacteria. *J. Gen. Appl. Microbiol.* 13, 279–301. doi:10.2323/jgam.13.279


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 May 2014; accepted: 31 July 2014; published online: 20 August 2014. Citation: Heider SAE, Wolf N, Hofemeier A, Peters-Wendisch P and Wendisch VF (2014) Optimization of the IPP precursor supply for the production of lycopene, decaprenoxanthin and astaxanthin by Corynebacterium glutamicum. Front. Bioeng. Biotechnol. 2:28. doi: 10.3389/fbioe.2014.00028*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Heider, Wolf, Hofemeier, Peters-Wendisch and Wendisch. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Engineering sugar utilization and microbial tolerance toward lignocellulose conversion

#### **Lizbeth M. Nieves†, Larry A. Panyon† and Xuan Wang\***

School of Life Sciences, Arizona State University, Tempe, AZ, USA

#### **Edited by:**

Pablo Carbonell, University of Evry, France

#### **Reviewed by:**

Weiwen Zhang, Tianjin University, China; Taek Soon Lee, Lawrence Berkeley National Laboratory, USA

#### **\*Correspondence:**

Xuan Wang, School of Life Sciences, Arizona State University, 427 E. Tyler Mall, Tempe, AZ 85287, USA; e-mail: wangxuan@asu.edu

†Lizbeth M. Nieves and Larry A. Panyon have contributed equally to this work.

Production of fuels and chemicals through a fermentation-based manufacturing process that uses renewable feedstock such as lignocellulosic biomass is a desirable alternative to petrochemicals. Although it is still in its infancy, synthetic biology offers great potential to overcome the challenges associated with lignocellulose conversion. In this review, we will summarize the identification and optimization of synthetic biological parts used to enhance the utilization of lignocellulose-derived sugars and to increase the biocatalyst tolerance for lignocellulose-derived fermentation inhibitors. We will also discuss the ongoing efforts and future applications of synthetic integrated biological systems used to improve lignocellulose conversion.

**Keywords: synthetic biology, metabolic engineering, lignocellulose, xylose, furan aldehydes**

# **INTRODUCTION**

One of the daunting challenges faced by the modern world is our unsustainable dependence on petroleum as the primary source for transportation fuels and many chemical products including solvents, fertilizers, pesticides, and plastics (Service, 2007). To fulfill future societal needs, we have to find a sustainable supply of energy and chemicals. Synthetic biology has emerged as a young discipline with great potential to construct novel biological systems that produce fuels and chemicals from renewable sources in a cost-effective manner, thus ultimately achieving energy self-sufficiency independent of petroleum. We will apply the synthetic biology definition of "the design and construction of new biological components, such as enzymes, genetic circuits, and cells, or the redesign of existing biological systems" throughout this review (Keasling, 2008). The engineered biological systems created by synthetic biology include enzymes with new functions, genetic circuits, and engineered cells with unique specifications (Cameron et al., 2014; Way et al., 2014). In many cases, the ultimate goal is to rationally manipulate organisms to facilitate novel functions, which do not exist in nature (Cameron et al., 2014; Way et al., 2014). Thus far, synthetic biology has contributed to many fields such as bio-based production (Keasling, 2008; Jarboe et al., 2010), tissue and plant engineering (Bacchus et al., 2012; Moses et al., 2013; Xu et al., 2013; Trantidou et al., 2014), and cell-free synthesis (Lee and Kim, 2013).

Plant biomass (lignocellulose) represents arguably the most important renewable feedstock on the planet. Lignocellulose is a complex matrix of various polysaccharides, phenolic polymers, and proteins that are present in the cell walls of woody plants (Saha, 2003; Girio et al., 2010). Conversion of non-food plant biomass, especially agricultural residues such as corn stover and sugarcane bagasse, avoids many of the concerns about the production of fuels and chemicals derived from food sources (Lynd, 1990).

Additionally, non-food-based biofuels offer greater cost reduction in the longer term (Lynd, 1990). For numerous types of agricultural residues, the sugar content is comparable to corn (Saha, 2003). However, the conversion of these sugars from agricultural residues to fuels and chemicals in a cost-effective manner still remains challenging. There are at least three major challenges to be solved before lignocellulose bioconversion becomes financially feasible (**Figure 1**). First, in contrast to starch, which is easily degraded into fermentable sugar monomers, sugars in lignocellulose are locked into very stable polymeric structures including cellulose and hemicellulose (Saha, 2003; Girio et al., 2010). These polymers are designed by nature to resist deconstruction (Alvira et al., 2010). The crystalline-like fibers of cellulose are encased in a covalently linked mesh of lignin and hemicellulose. Cellulose (30–40% of biomass dry weight) is composed of only D-glucose linked by β-1,4 glycosidic bonds while a mixture of pentoses, especially D-xylose, and hexoses comprises the main component of hemicellulose (20–40% of biomass dry weight) (Saha, 2003). Lignin is not a polysaccharide but a complex polymer of aromatic alcohols. Different types of lignocellulosic biomass vary in the composition of cellulose, hemicellulose, and lignin (Saha, 2003). Chemical pretreatment processes are commonly required for lignocellulose conversion. Steam pretreatment with dilute mineral acids is an efficient approach to depolymerize hemicellulose into sugar monomers and to increase the accessibility of cellulose to cellulase enzymes (Saha, 2003; Sousa et al., 2009; Alvira et al., 2010). After pretreatment and cellulase digestion, most of the sugars in agricultural waste will be released into the broth and thus ready to be converted into fuels and chemicals if a suitable biocatalyst is applied.
The cost of cellulase enzymes is currently still prohibitive for wide application of lignocellulose conversion. Continuing efforts of synthetic biologists from academic and industrial labs are improving cellulase enzymes or
enzyme complexes aiming to develop catalysts that are cost-effective enough to be suitable for commercialization. The recent advancements in cellulases have been extensively reviewed (Elkins et al., 2010; Garvey et al., 2013; Hasunuma et al., 2013; Bommarius et al., 2014) and therefore are beyond the scope of this review. Second, one of the major carbohydrates in typical lignocellulosic biomass is D-xylose, a five-carbon aldose, which is difficult for many microbes to metabolize. For instance, common ethanol-producing industrial microbes such as *Saccharomyces cerevisiae* and *Zymomonas mobilis* do not natively metabolize xylose (Saha, 2003). Although some microbes such as *Escherichia coli* and *Klebsiella pneumoniae* have a native xylose metabolic pathway, it is not efficient and is commonly repressed by the presence of glucose (Saha, 2003). Third, side products that hinder cell growth and fermentation, such as furfural, 5-hydroxymethylfurfural, formate, acetate, and soluble lignin products, are formed during common chemical pretreatment processes (Saha, 2003; Mills et al., 2009). For example, furfural (a dehydration product of pentose sugars) is widely regarded as one of the most potent inhibitors (Mills et al., 2009; Geddes et al., 2010a, 2011). It can completely inhibit cellular growth at low concentrations (Zaldivar et al., 1999; Liu and Blaschek, 2010). The concentration of furfural is correlated with the toxicity of dilute acid hydrolyzates (Martinez et al., 2000). Overliming to pH 10 with Ca(OH)<sub>2</sub> or treatment with an activated carbon filter reduces the level of furfural and toxicity, but increases the process complexity and operational cost, thus reducing economic viability (Martinez et al., 2000). There has been a growing interest in engineering industrially relevant strains to be more resistant to these inhibitors (Wang et al., 2012a,b; Zheng et al., 2012; Geddes et al., 2014; Xiao and Zhao, 2014).
For example, beneficial genetic traits to increase host tolerance of furan aldehydes have been identified (Taherzadeh et al., 2000; Liu et al., 2004, 2005, 2008; Gorsich et al., 2006; Petersson et al., 2006; Almeida et al., 2008; Geddes et al., 2014; Glebes et al., 2014a,b; Luhe et al., 2014), knowledge about toxicity mechanisms has been accumulated (Lin et al., 2009a; Miller et al., 2009a,b; Ma and Liu, 2010; Glebes et al., 2014a,b), and thus the integrated synthetic detoxification systems have been constructed and proven effective in different biocatalysts (Wang et al., 2013).
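The composition ranges quoted earlier in this section (cellulose at 30 to 40% and hemicellulose at 20 to 40% of biomass dry weight) can be turned into a rough upper bound on releasable sugar. The sketch below is only an illustration under stated assumptions: complete hydrolysis, hemicellulose treated as pure xylan (which overstates xylose, since hemicellulose also contains other sugars), and each anhydro-sugar unit taking up one water molecule on hydrolysis. The function name is hypothetical.

```python
# Mass gain on hydrolysis: each anhydro-sugar unit in the polymer takes up
# one water molecule when it is released as a free monomer.
GLUCAN_TO_GLUCOSE = 180.16 / 162.14   # about 1.11
XYLAN_TO_XYLOSE = 150.13 / 132.12     # about 1.14


def monomer_yield_g(biomass_g, polymer_fraction, hydration_factor, recovery=1.0):
    """Grams of free sugar released from a polymer fraction of dry biomass."""
    return biomass_g * polymer_fraction * hydration_factor * recovery


# Per kilogram of dry biomass, at the low and high ends of the quoted ranges.
for label, low, high, factor in [
    ("glucose from cellulose", 0.30, 0.40, GLUCAN_TO_GLUCOSE),
    ("xylose from hemicellulose (pure xylan assumed)", 0.20, 0.40, XYLAN_TO_XYLOSE),
]:
    lo_g = monomer_yield_g(1000.0, low, factor)
    hi_g = monomer_yield_g(1000.0, high, factor)
    print(f"{label}: {lo_g:.0f} to {hi_g:.0f} g per kg dry biomass")
```

The `recovery` argument is included so that incomplete hydrolysis or sugar degradation during pretreatment can be represented by a value below 1.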

Despite government incentives and mandates, these grand challenges have prohibited the commercialization of lignocellulose conversion into fuels and chemicals at low cost (Sheridan, 2013). Until now, most efforts for lignocellulose conversion have been devoted to microbial ethanol production. By pathway engineering and metabolic engineering, the microbial hosts can extend their metabolism to produce valuable chemicals other than ethanol from lignocellulose. This review focuses on engineering new biological components by synthetic biology to improve lignocellulose conversion. The past efforts, current status, and future challenges will be discussed.

# **GENETIC IMPROVEMENT OF UTILIZATION AND TRANSPORT OF MONOSACCHARIDES DERIVED FROM LIGNOCELLULOSE**

Hydrolysis of hemicellulose and cellulose into five- and six-carbon sugars by pretreatments yields a mixture of sugars. Microorganisms tend to selectively utilize a preferred sugar, usually D-glucose, by a regulation mechanism called catabolite repression. Synthetic biology has the potential to re-design microbial biology to simultaneously use D-glucose and pentoses efficiently. Lignocellulosic raw materials commonly contain much higher amounts of D-xylose compared to other pentoses, and therefore, improving xylose fermentation has become a priority (Girio et al., 2010). Xylose degradation is not universal for all microbes in spite of xylose being the most abundant monosaccharide in hemicellulose. At the current stage, most related research still uses the trial-and-error approach to accelerate xylose transport and xylose metabolism. A more quantitative understanding of sugar catabolism is necessary before synthetic biologists are able to predict and design a biological system that efficiently transports and metabolizes sugars.

There are two major metabolic pathways to catabolize xylose: the xylose isomerase pathway and the oxidoreductase pathway, used by bacteria and fungi, respectively (**Figure 2**). These pathways have been constructed and optimized in industrial biocatalysts such as *S. cerevisiae* and *Z. mobilis*, which cannot natively metabolize xylose. Several comprehensive reviews summarize this research topic (Jeffries and Jin, 2004; Chu and Lee, 2007; Matsushika et al., 2009; Young et al., 2010; Cai et al., 2012; Kim et al., 2013). Here, we only briefly review some important past efforts. The xylose oxidoreductase pathway is commonly used by some ascomycetous yeasts such as *Pichia stipitis* (**Figure 2**). Although the *S. cerevisiae* chromosome has genes encoding xylose reductase, xylitol dehydrogenase, and xylulokinase, their native expression level is too low to support cellular growth when using xylose as the sole carbon source (Yang and Jeffries, 1997; Richard et al., 2000; Traff et al., 2002; Toivari et al., 2004). Anaerobic xylose fermentation by *S. cerevisiae* was first demonstrated by heterologous expression of *XYL1* (Rizzi et al., 1988) and *XYL2* (Rizzi et al., 1989) genes encoding xylose reductase and xylitol dehydrogenase from *P. stipitis* (Kotter et al., 1990; Tantirungkij et al., 1994). However, xylitol accumulates as a significant side product when genes *XYL1* and *XYL2* are overexpressed in the recombinant *S. cerevisiae*, which lowers the ethanol yield. The accumulation of xylitol is likely due to the cofactor imbalance of the first two steps in the oxidoreductase pathway (**Figure 2**). NADPH is the preferred cofactor for xylose reductase to reduce xylose, while NAD<sup>+</sup> is used by xylitol dehydrogenase to oxidize xylitol, resulting in the formation of xylulose (**Figure 2**). Unlike many bacteria, *S. cerevisiae* lacks pyridine nucleotide transhydrogenases, which catalyze the conversion between these two reducing cofactors, NADPH and NADH (Nissen et al., 2001). Therefore, this imbalance of cofactors caused by these two reactions will eventually lead to slow kinetics for xylose degradation and xylitol accumulation. Although overexpression of the xylose reductase and xylitol dehydrogenase genes has been shown to enable xylose metabolism in recombinant *S. cerevisiae* strains, overexpression of the xylulokinase gene is often required to create a complete functional heterologous pathway and to further reduce xylitol production (Ho et al., 1998; Jin et al., 2005; Bettiga et al., 2008). One successful example of engineering an efficient xylose-metabolizing yeast is the recombinant *Saccharomyces* sp. strain 1400(pLNH32) (Ho et al., 1998). In this strain, the *P. stipitis* xylose reductase, *P. stipitis* xylitol dehydrogenase, and *S. cerevisiae* xylulokinase genes under the control of strong native glycolytic promoters were cloned into the plasmid pLNH32 to achieve a high expression level. The aerobic conversion of xylose to ethanol reached a relatively high titer (23 g/L), yield (~0.45 g ethanol/g xylose; the theoretical yield is ~0.5 g ethanol/g xylose for ethanol fermentation), and productivity (4 g/L/h) in a complex medium (Ho et al., 1998). Further improvements of ethanol titer and yields in several xylose-fermenting industrial yeast strains such as TMB 3400 and 424A(LNF-ST) have been achieved by utilizing the heterologous xylose oxidoreductase pathway and other genetic modifications that enhance the downstream pentose phosphate pathway (Matsushika et al., 2009). This demonstrates the potential of the xylose oxidoreductase pathway to improve xylose metabolism.

some bacteria or reduced to xylitol by xylose reductase in some fungi. Xylitol is oxidized to xylulose and then phosphorylated to form xylulose-5-phosphate by xylulokinase. Xylulose-5-phosphate enters the pentose phosphate pathway for further degradation. The isomerase pathway avoids the production of xylitol.
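The theoretical ethanol yield of about 0.5 g per g xylose quoted above follows from the overall fermentation stoichiometry, and reported yields can be expressed as a fraction of that maximum. The sketch below is a worked calculation, not taken from the cited studies. It assumes the commonly used overall balance of 3 xylose to 5 ethanol plus 5 CO2 and ignores carbon diverted to biomass or by-products. The function names are hypothetical.

```python
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "O": 15.999}

XYLOSE = {"C": 5, "H": 10, "O": 5}    # C5H10O5
ETHANOL = {"C": 2, "H": 6, "O": 1}    # C2H6O


def molar_mass(formula):
    """Molar mass in g per mol from an element-count dictionary."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def max_ethanol_yield_g_per_g():
    """Maximum mass yield assuming 3 xylose -> 5 ethanol + 5 CO2."""
    return 5 * molar_mass(ETHANOL) / (3 * molar_mass(XYLOSE))


def fraction_of_theoretical(observed_yield_g_per_g):
    """Observed ethanol yield as a fraction of the stoichiometric maximum."""
    return observed_yield_g_per_g / max_ethanol_yield_g_per_g()


print(f"theoretical maximum: {max_ethanol_yield_g_per_g():.3f} g ethanol per g xylose")
print(f"0.45 g/g corresponds to {fraction_of_theoretical(0.45):.0%} of theoretical")
```

On this basis the yield of about 0.45 g per g reported for strain 1400(pLNH32) corresponds to close to 90% of the stoichiometric maximum.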

The xylose isomerase pathway, predominantly used by many bacteria including *E. coli* and *Bacillus subtilis*, has also been constructed in *S. cerevisiae* strains. In this pathway, xylose is directly converted to xylulose through a one-step reaction catalyzed by xylose isomerase or other aldose isomerases (**Figure 2**). This pathway does not involve xylitol formation and it does not require a reducing cofactor. However, this isomerization reaction thermodynamically favors xylose over xylulose at equilibrium (Jeffries, 1983), so a driving force such as efficient downstream reactions is required to pull the reaction toward the formation of xylulose (**Figure 2**). In addition, it has been shown that the expression of functional bacterial xylose isomerase genes often results in inefficient enzymatic activities and thus low xylose utilization (Sarthy et al., 1987; Gardonyi and Hahn-Hagerdal, 2003). The unsuccessful heterologous expression is probably due to protein misfolding and post-transcriptional modification. Even though active xylose isomerases derived from different microbes, including the thermophilic bacterium *Thermus thermophilus* (Walfridsson et al., 1996), *Piromyces* sp. E2 (Kuyper et al., 2003), *Orpinomyces* (Madhavan et al., 2009), and *Clostridium phytofermentans* (Brat et al., 2009), have been successfully synthesized in *S. cerevisiae* at high levels, the rate of growth on xylose was still poor. It is possible that further optimization is needed to increase metabolic flux of downstream reactions, especially the pentose phosphate pathway. Ethanol yield is often higher in these recombinant *S. cerevisiae* using the xylose isomerase pathway than in those using the heterologous xylose oxidoreductase pathway because xylitol production is avoided. However, the titer and productivity of *S. cerevisiae* using the xylose isomerase pathway are very low. The *Piromyces* sp. 
xylose isomerase has been extensively engineered to increase catalytic efficiency, and the *S. cerevisiae* BY4741-S1 derivatives expressing this mutant enzyme showed improved aerobic growth rate and ethanol production (Lee et al., 2012). However, in terms of xylose utilization and ethanol production, these optimized recombinant *S. cerevisiae* strains still perform poorly, with a final ethanol titer lower than 4 g/L. The heterologous xylose isomerase pathway has also been successfully constructed in other biocatalysts such as *Z. mobilis*, a bacterium notable for its bioethanol-producing capabilities, which has been used as a natural fermentative agent in alcoholic beverage production (Skotnicki et al., 1983). Similar to *S. cerevisiae*, *Z. mobilis* cannot metabolize xylose, which limits its application in lignocellulose conversion. In addition, *Z. mobilis* metabolizes glucose into pyruvate using the Entner–Doudoroff pathway instead of glycolysis (Embden–Meyerhof–Parnas pathway) and then converts pyruvate into ethanol and CO<sub>2</sub> (Conway, 1992). Even with the successful expression of the xylose isomerase and xylulokinase genes from *Xanthomonas campestris* or *Klebsiella pneumoniae*, *Z. mobilis* was still unable to grow using xylose as the sole carbon source (Liu et al., 1988; Feldmann et al., 1992). Interestingly, in addition to overexpression of the xylose isomerase and xylulokinase genes, overexpression of the transaldolase and transketolase genes (the main enzymes in the pentose phosphate pathway) resulted in a recombinant *Z. mobilis* with a functional xylose metabolism (Zhang et al., 1995). The resulting strain CP4 (pZB5) is able to convert xylose to ethanol with a higher titer (11 g/L) and yield (0.44 g/g xylose) compared to recombinant *S. cerevisiae* using the xylose isomerase pathway (Zhang et al., 1995).
This excellent work strongly suggests that a high flux of downstream metabolic reactions such as the pentose phosphate pathway is required for a functional xylose catabolism using the xylose isomerase pathway (**Figure 2**). A high performance of xylose to ethanol conversion using a bacterial xylose isomerase pathway has been achieved in a wild-type *E. coli* strain (ATCC9637) after extensive metabolic engineering and adaptive laboratory evolution (Jarboe et al., 2007). The recombinant *E. coli* strain LY180 uses the native xylose isomerase pathway and the *Z. mobilis* ethanol-producing pathway to achieve the efficient conversion of xylose to ethanol with a high titer (45 g/L after 48 h) and yield (0.48 g/g xylose) using mineral salts medium (Miller et al., 2009b; Yomano et al., 2009). These successful examples of engineering *Z. mobilis* and *E. coli* suggest that the bacterial xylose isomerase pathway has the potential for efficient xylose conversion when the metabolic flux in downstream pathways is efficient.
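The yields quoted above for the different engineered strains are easier to compare when expressed as a fraction of the stoichiometric maximum. The following minimal sketch assumes the standard overall stoichiometry of 3 xylose → 5 ethanol + 5 CO<sub>2</sub> and standard molar masses; neither is stated in the text, which gives only the rounded value of ~0.5 g/g. The strain yields are those reported above, and the function name is illustrative.

```python
# Molar masses in g/mol (standard values, not taken from the review).
M_XYLOSE = 150.13   # C5H10O5
M_ETHANOL = 46.07   # C2H5OH

# Assumed overall stoichiometry: 3 xylose -> 5 ethanol + 5 CO2.
THEORETICAL_YIELD = (5 * M_ETHANOL) / (3 * M_XYLOSE)  # g ethanol per g xylose

# Yields (g ethanol / g xylose) as quoted in the text.
reported_yields = {
    "Saccharomyces sp. 1400(pLNH32)": 0.45,
    "Z. mobilis CP4(pZB5)": 0.44,
    "E. coli LY180": 0.48,
}


def percent_of_theoretical(yield_g_per_g: float) -> float:
    """Express a mass yield as a percentage of the stoichiometric maximum."""
    return 100.0 * yield_g_per_g / THEORETICAL_YIELD


if __name__ == "__main__":
    print(f"theoretical yield: {THEORETICAL_YIELD:.3f} g/g")
    for strain, y in reported_yields.items():
        print(f"{strain}: {percent_of_theoretical(y):.0f}% of theoretical")
```

On this basis all three strains operate close to the stoichiometric limit, so the differences between them lie mainly in titer and productivity rather than in yield.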

Another challenge for the conversion of sugars derived from lignocellulose is the sequential metabolism of sugar mixtures, a phenomenon called catabolite repression. D-glucose represses the utilization of other sugars such as xylose in many industrial catalysts, thus impeding the rapid and complete utilization of sugar mixtures during fermentation. The mechanism of glucose repression is very complex and involves multiple levels of regulation. For example, *E. coli* has complex glucose repression mechanisms that act mainly through cyclic AMP, cyclic AMP-binding protein, and enzymes of the phosphotransferase system (Kim et al., 2010). There are also other mechanisms involving the inhibition of transport of alternative sugars and a dual transcriptional regulator called Cra (Ramseier, 1996). Strains with relaxed glucose repression should be able to use a heterogeneous sugar mixture simultaneously. However, genetic perturbation of glucose repression components can disrupt regular glucose metabolism and decrease the rate of glucose utilization. It is challenging to engineer a biocatalyst with relaxed glucose repression while keeping a high glucose utilization rate. Different engineering strategies have been developed to improve sugar co-utilization (Yomano et al., 2009; Chiang et al., 2013). In a recombinant *E. coli* strain, a combinatorial engineering strategy has achieved efficient co-utilization of glucose and xylose (30 g/L for each) in 16 h (Chiang et al., 2013). This genetic engineering strategy includes (1) deletion of *ptsG* (the glucose permease in the phosphotransferase system) to release catabolite repression; (2) overexpression of a glucose transporter from *Z. mobilis* to restore glucose transport and metabolism; and (3) overexpression of the genes *rpiA*, *tktA*, *rpe*, and *talB* to increase flux through the pentose phosphate pathway. Recently, a completely different approach to decrease glucose repression has been developed (Galazka et al., 2010; Ha et al., 2011).
Cellodextrins are glucose polymers of varying length (two or more glucose monomers) resulting from degradation of cellulose. Wild-type *S. cerevisiae* cannot assimilate cellodextrin because it lacks both the cellodextrin transporter and a β-glucosidase capable of hydrolyzing cellodextrin into glucose. After integration of efficient transporters, the complementary hydrolytic enzymes for cellodextrin, and the xylose oxidoreductase pathway (**Figure 2**) into *S. cerevisiae*, the recombinant strain is able to simultaneously consume cellodextrin and xylose, probably because the glucose concentration is never high enough to induce the catabolite repression phenotype (Ha et al., 2011). It is plausible that intracellular hydrolysis of cellodextrin minimizes glucose repression of xylose fermentation, allowing this co-consumption (Galazka et al., 2010; Ha et al., 2011). This novel strategy has the potential to enable efficient co-utilization of sugar mixtures derived from lignocellulose.
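The contrast between sequential and simultaneous sugar use described above can be illustrated with a deliberately simple toy model. The sketch below is not taken from any of the cited studies: it assumes constant biomass, Monod-type uptake for both sugars, and a single hypothetical inhibition constant `k_i` that lumps together all repression mechanisms. Every parameter value is an illustrative assumption.

```python
def xylose_left_at_glucose_exhaustion(k_i, glucose=30.0, xylose=30.0,
                                      biomass=2.0, q_glc_max=2.0,
                                      q_xyl_max=1.0, k_m=0.5,
                                      dt=0.01, t_end=48.0):
    """Integrate a toy two-sugar model with explicit Euler steps.

    k_i is a hypothetical inhibition constant (g/L): the smaller it is,
    the more strongly glucose represses xylose uptake.
    Returns the xylose concentration (g/L) at the time glucose falls
    below 0.1 g/L, or at t_end if that happens first.
    """
    t = 0.0
    while t < t_end and glucose > 0.1:
        # Monod-type specific uptake rates (g sugar / g cells / h).
        q_glc = q_glc_max * glucose / (k_m + glucose)
        # Xylose uptake is scaled down by a glucose-dependent inhibition term.
        q_xyl = q_xyl_max * xylose / (k_m + xylose) / (1.0 + glucose / k_i)
        glucose = max(glucose - q_glc * biomass * dt, 0.0)
        xylose = max(xylose - q_xyl * biomass * dt, 0.0)
        t += dt
    return xylose


if __name__ == "__main__":
    repressed = xylose_left_at_glucose_exhaustion(k_i=0.1)    # strong repression
    relaxed = xylose_left_at_glucose_exhaustion(k_i=1000.0)   # relaxed repression
    print(f"xylose left, strong repression:  {repressed:.1f} g/L")
    print(f"xylose left, relaxed repression: {relaxed:.1f} g/L")
```

With strong repression almost all xylose remains when glucose runs out, which corresponds to sequential utilization; with relaxed repression a substantial fraction has already been consumed. The model says nothing about the cost of relaxing repression on glucose metabolism itself, which is the engineering difficulty noted above.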

Successful lignocellulose conversion requires efficient transport of the mixture of sugars into the cells. The transport of xylose is less efficient than the transport of glucose and is often inhibited by D-glucose, which suggests that xylose transport is a limiting factor for lignocellulose conversion (Jeffries and Jin, 2004; Luo et al., 2014). Overexpression of homologous and heterologous sugar transporters enables recombinant strains to transport xylose, but has a very limited positive effect on xylose fermentation and growth (Weierstall et al., 1999; Hamacher et al., 2002; Gardonyi et al., 2003; Sedlak and Ho, 2004; Saloheimo et al., 2007; Hector et al., 2008; Runquist et al., 2009). To improve xylose transporters, the substrate affinities for xylose of different yeast hexose transporters were altered and selected through mutagenesis and screening approaches (Young et al., 2012, 2014; Farwick et al., 2014). These efforts identified regions and motifs of the hexose transporters as engineering targets for reprogramming transporter properties (Farwick et al., 2014; Young et al., 2014). However, whether the transport of xylose is the limiting factor for xylose fermentation requires more characterization. Theoretically, xylose uptake becomes a limiting step only when the rate of xylose fermentation is higher than the rate of xylose uptake (Cai et al., 2012). The wild-type *S. cerevisiae* CEN.PK2-1C with its native hexose transporter Hxt was reported to be able to take up 0.14 g xylose/h/g dry cell weight in the presence of 50 mM xylose, which exceeds the xylose consumption rate in most recombinant *S. cerevisiae* strains (Hamacher et al., 2002; Cai et al., 2012). Without optimization of sugar transporters, engineered yeast strains have already achieved relatively high performance of xylose fermentation using native hexose transporters for xylose uptake (Ho et al., 1998; Sonderegger et al., 2004).
The potential beneficial effect of these improved xylose transporters in the recombinant yeast strains with high xylose metabolism remains to be tested.
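The argument that uptake limits fermentation only when downstream metabolism is faster than transport amounts to a comparison of two specific rates. A minimal sketch follows, using the uptake capacity of 0.14 g xylose/h/g dry cell weight quoted above; the two metabolic capacities are hypothetical values chosen only to show both outcomes, and the function names are illustrative.

```python
XYLOSE_MOLAR_MASS = 150.13  # g/mol (standard value, not from the review)

# Uptake capacity of wild-type CEN.PK2-1C at 50 mM xylose, as quoted in the text
# (g xylose / h / g dry cell weight).
NATIVE_UPTAKE_CAPACITY = 0.14


def mm_to_g_per_l(conc_mm: float) -> float:
    """Convert a xylose concentration from mM to g/L."""
    return conc_mm * XYLOSE_MOLAR_MASS / 1000.0


def limiting_step(uptake_capacity: float, metabolic_capacity: float) -> str:
    """Return which step caps the xylose flux.

    Both arguments are specific rates in g xylose / h / g dry cell weight.
    Transport is limiting only if metabolism could run faster than uptake.
    """
    return "transport" if metabolic_capacity > uptake_capacity else "metabolism"


if __name__ == "__main__":
    print(f"50 mM xylose = {mm_to_g_per_l(50):.1f} g/L")
    # Hypothetical metabolic capacities for two imaginary strains.
    for name, capacity in {"slow strain": 0.05, "fast strain": 0.20}.items():
        print(name, "->", limiting_step(NATIVE_UPTAKE_CAPACITY, capacity))
```

Under these assumptions an improved transporter would only pay off in the "fast strain" case, which is why such transporters need to be tested in strains that already metabolize xylose efficiently.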

# **ENGINEERING BIOCATALYSTS RESISTANT TO LIGNOCELLULOSE INHIBITORS**

Pretreatments such as dilute acid at elevated temperature are effective for the hydrolysis of pentose polymers in hemicellulose and also increase the access of cellulase enzymes to cellulose fibers. However, the fermentation of the resulting syrups, called hydrolyzates, is hindered by minor reaction products such as furan aldehydes including furfural and 5-hydroxymethylfurfural (5-HMF), organic acids, and phenolic compounds (Saha, 2003). Furfural and 5-HMF are formed by the dehydration of sugars (pentoses and hexoses, respectively) during pretreatment and more furfural than 5-HMF is present in most hemicellulose hydrolyzates (Saha, 2003; Geddes et al., 2010a,b, 2013). Furfural is of particular importance as a fermentation inhibitor because of its abundance and toxicity (Saha, 2003; Almeida et al., 2009; Mills et al., 2009; Geddes et al., 2010b, 2011). Furfural is more toxic than 5-HMF to industrial catalysts such as *E. coli* and *S. cerevisiae* (Zaldivar et al., 1999; Gorsich et al., 2006). In model studies with various hydrolyzate inhibitors, furfural was unique in potentiating the toxicity of other compounds (Zaldivar et al., 1999). The advancement of engineering tolerance to organic acids and phenolic compounds has been excellently summarized in recent reviews (Mills et al., 2009; Laluce et al., 2012). This review mainly focuses on furan aldehydes as important lignocellulose inhibitors.

A significant amount of effort has been devoted to the identification and optimization of biological components that increase resistance to furan aldehydes, especially furfural (**Table 1**). The toxicity mode of furan aldehydes is complex and involves multiple factors (Almeida et al., 2009; Lin et al., 2009a,b; Mills et al., 2009). Cellular growth is arrested in the presence of furan aldehydes and resumes after the complete reduction of furfural. This furan-induced delay in growth was observed in both *E. coli* and *S. cerevisiae* (Taherzadeh et al., 2000; Miller et al., 2009b; Wang et al., 2012b). There are two major metabolic pathways to metabolize or reduce furan aldehydes in nature (**Figure 3**). Some bacteria such as *Cupriavidus basilensis* HMF14 can catabolize furan aldehydes as a sole carbon source when growing aerobically (Koopman et al., 2010). Furan aldehydes such as furfural are first oxidized to 2-furoic acid and then further metabolized to 2-oxoglutaric acid, which eventually enters the TCA cycle to provide energy and biosynthetic building blocks (Trudgill, 1969; Koenig and Andreesen, 1990; Koopman et al., 2010) (**Figure 3**). The key step of this furfural degradation pathway depends on oxygen, which limits its application for anaerobic fermentative production (Koopman et al., 2010; Ran et al., 2014). *E. coli* and *S. cerevisiae* do not have oxidative degradation pathways for furan aldehydes. Under anaerobic fermentation conditions, these microbes use their native oxidoreductases to reduce furan aldehydes to furan alcohols, which are much less toxic (Zaldivar et al., 1999, 2000). Furan alcohols are secreted outside of cells and remain in the fermentation broth without further degradation (Liu and Blaschek, 2010; Wang et al., 2012b). Cells do not grow until furfural or 5-HMF is reduced to a low threshold concentration (~5 mM) (Liu and Blaschek, 2010; Wang et al., 2012b; Ran et al., 2014) (**Figure 3**).
This native detoxification approach has been strengthened in *S. cerevisiae* strains by overexpression of native oxidoreductase genes such as *ADH1* (Laadan et al., 2008), *ADH6* (Petersson et al., 2006; Almeida et al., 2008; Liu et al., 2008), and *ADH7* (Liu et al., 2008), which encode enzymes able to reduce furan aldehydes (**Table 1**). Overexpression of these oxidoreductase genes increases the 5-HMF reduction rate and shortens the lag time of cell growth. Interestingly, this native detoxification response causes the growth arrest in *E. coli*. The presence of furfural activates the expression of the *yqhD* gene encoding an oxidoreductase able to reduce furfural to furfuryl alcohol using NADPH as the reducing cofactor (Miller et al., 2009b; Turner et al., 2010). However, NADPH is essential for biosynthesis but is very limited under anaerobic xylose fermentation (Frick and Wittmann, 2005; Miller et al., 2009a). It is this depletion of NADPH by YqhD that has been proposed as the mechanism for growth inhibition in *E. coli* (Miller et al., 2009a,b; Turner et al., 2010). The NADPH-intensive pathway for sulfate assimilation was identified as a sensitive site that may be responsible for growth inhibition (Miller et al., 2009a). Addition of cysteine, deletion of *yqhD*, or increased expression of *pntAB* (transhydrogenase for interconversion of NADH and NADPH) conferred tolerance to furan aldehydes including furfural and 5-HMF in *E. coli* (Miller et al., 2009a,b, 2010). To accelerate furfural reduction while avoiding the use of NADPH as the reducing cofactor, an alternative NADH-dependent furfural reductase is desired. A native oxidoreductase, FucO, was identified to have such properties and its overexpression did increase furfural tolerance in different *E. coli* biocatalysts (Wang et al., 2011). FucO normally functions in fucose metabolism and its catalytic efficiency for furfural reduction is low (Wang et al., 2011). The enzyme properties of FucO as a furfural reductase were improved by site-saturated mutagenesis and growth-based selection (Zheng et al., 2013). Overall, optimization of NADH-dependent furfural reductases has the potential to shorten the lag phase and to increase the tolerance of biocatalysts under fermentation conditions.

#### **Table 1 | Beneficial genetic traits for furan aldehydes degradation and tolerance**.
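The relationship between furfural reduction rate and lag phase discussed in this section can be made concrete with a simple estimate. The sketch below assumes a constant (zero-order) volumetric reduction rate and uses the ~5 mM growth threshold quoted above; the initial furfural concentration and the two reduction rates are hypothetical. It deliberately ignores the cofactor cost of reduction, so it applies only where the reductase does not deplete NADPH, for example with an NADH-dependent enzyme.

```python
def lag_time_h(initial_mm: float, reduction_rate_mm_per_h: float,
               threshold_mm: float = 5.0) -> float:
    """Estimate the growth lag (h) caused by furfural.

    Assumes growth resumes once furfural falls to threshold_mm and that
    furfural is reduced at a constant rate (a simplifying assumption).
    """
    if reduction_rate_mm_per_h <= 0:
        raise ValueError("reduction rate must be positive")
    # No lag is predicted if furfural starts at or below the threshold.
    return max(initial_mm - threshold_mm, 0.0) / reduction_rate_mm_per_h


if __name__ == "__main__":
    initial = 15.0  # mM furfural, hypothetical starting concentration
    for rate in (0.5, 1.0):  # mM/h, hypothetical reduction rates
        print(f"rate {rate} mM/h -> lag of {lag_time_h(initial, rate):.0f} h")
```

Doubling the reduction rate halves the predicted lag in this picture, which is the rationale for improving reductases such as FucO.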

A variety of genomic and transcriptomic approaches have yielded many beneficial genetic traits related to furan aldehyde tolerance (**Table 1**). An *S. cerevisiae* gene disruption library was screened for mutants with growth deficiencies in the presence of furfural, and *ZWF1* was found to be related to furfural tolerance (Gorsich et al., 2006). Overexpression of *ZWF1* increased furfural tolerance (Gorsich et al., 2006). *ZWF1* encodes glucose-6-phosphate dehydrogenase, which catalyzes the first step of the pentose phosphate pathway, the major pathway providing NADPH when glucose is the carbon source. A similar approach using a genome-wide RNAi screen showed that inactivation of the *SIZ1* gene increased furfural tolerance (Xiao and Zhao, 2014). *SIZ1* encodes an E3 SUMO-protein ligase, and inactivation of *SIZ1* increases tolerance to oxidative stress as well as to furfural (Xiao and Zhao, 2014). At least part of the toxicity mechanism induced by furfural is suggested to be associated with oxidative stress (Mills et al., 2009). Furfural was shown to induce the accumulation of reactive oxygen species inside *S. cerevisiae* cells and to cause damage to mitochondria, vacuole membranes, and cytoskeletons (Allen et al., 2010). Furan aldehydes were also reported to act as thiol-reactive electrophiles, to directly activate the Yap1 transcription factor, and to deplete glutathione (Kim and Hahn, 2013). Overexpression of either wild-type *YAP1* or its target genes *CTA1* and *CTT1* encoding catalases increased tolerance to furan aldehydes (Kim and Hahn, 2013). Interestingly, furan aldehydes do not induce oxidative responses in *E. coli*. The expression of the genes in major oxidative regulons such as the OxyR and SoxRS regulons is not activated by the presence of furfural (Miller et al., 2009a). This strain difference adds another layer of complexity to engineering tolerance to furan aldehydes. In *E.
coli*, an oxidoreductase, UcpA, with an undefined function was found to be associated with furfural tolerance by a transcriptomic analysis and its overexpression increased furan aldehyde tolerance (Wang et al., 2012b). Genomic libraries from three different bacteria were screened for genes that conferred furfural resistance to *E. coli* on plates. Beneficial plasmids containing the *thyA* gene were recovered from all three genomic libraries. The *thyA* gene encodes thymidylate synthase, important for dTMP biosynthesis, suggesting that furfural toxicity is possibly related to DNA damage (Zheng et al., 2012). Microarray studies and whole genome sequencing of furfural-resistant *E. coli* mutants led to the discovery of several polyamine transporters, including PotE, PuuP, PlaP, and PotABCD, with a beneficial role in furfural tolerance (Geddes et al., 2014). The detoxification mechanism was proposed to relate to the protective role of polyamines for important cellular constituents such as DNA (Geddes et al., 2014). Other advanced genomic tools such as multiSCale Analysis of Library Enrichments (SCALE) (Lynch et al., 2007) and trackable multiplex recombineering (TRMR) (Warner et al., 2010) have been used to identify more furfural-related genetic traits in *E. coli* (Glebes et al., 2014a,b). These experiments showed that the *lpcA*, *groESL*, *ahpC*, *yhiH*, *rna*, and *dicA* genes are associated with furfural tolerance, although the overexpression of these genes individually showed only a limited positive effect (Glebes et al., 2014a,b). Another interesting approach is to select a mutant form of the stress-related exogenous regulator IrrE, which confers tolerance to furan aldehydes on *E. coli* (Wang et al., 2012a). Considering the complexity of the toxicity mode induced by furfural, it is not surprising to identify multiple biological parts beneficial for furan tolerance (**Table 1**).
However, all these individual beneficial genetic traits discussed above only provide limited improvement for furan aldehyde tolerance. How to combine multiple beneficial genetic traits to achieve a significant increase of tolerance is a great challenge for synthetic biologists. An ideal synthetic detoxification system should contain a furfural responsive promoter driving the expression of the optimal combinations of different effector genes to minimize metabolic burden and maximize the benefit of effector genes (**Figure 4**).
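The furfural-responsive promoter at the heart of such a design can be pictured as a dose-response function. The sketch below models relative effector expression with a Hill function, a common phenomenological choice rather than anything measured in the cited work; the half-maximal concentration, Hill coefficient, and basal level are all hypothetical parameters.

```python
def promoter_activity(furfural_mm: float, k_half_mm: float = 5.0,
                      hill_n: float = 2.0, basal: float = 0.02,
                      maximal: float = 1.0) -> float:
    """Relative effector expression as a Hill function of furfural (mM).

    basal is the leaky expression without furfural; maximal is the fully
    induced level. All parameter values are illustrative assumptions.
    """
    if furfural_mm < 0:
        raise ValueError("concentration must be non-negative")
    occupancy = furfural_mm ** hill_n / (k_half_mm ** hill_n + furfural_mm ** hill_n)
    return basal + (maximal - basal) * occupancy


if __name__ == "__main__":
    for conc in (0.0, 1.0, 5.0, 20.0, 50.0):
        print(f"{conc:5.1f} mM furfural -> relative expression "
              f"{promoter_activity(conc):.2f}")
```

A low basal level keeps the metabolic burden small when furfural is absent, while a half-maximal concentration near the growth threshold would switch the effectors on only when they are needed.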

There are at least two major challenges for designing such an integrated detoxification system. First, most epistatic interactions between beneficial genetic traits are not predictable and the experimental search for the optimal combination of multiple effector genes is time-consuming and labor-intensive (Sandoval et al., 2012b). Negative epistatic interactions are present among different beneficial genetic traits for furan aldehyde tolerance. For example, the combination of two beneficial traits, the increased expression of *pntAB* and the deletion of the *yqhD* gene, made cells less tolerant to furfural than cells with either one of these two beneficial genetic traits alone (Wang et al., 2013). Further characterization of the beneficial traits in a high-throughput manner is desired to eventually construct an optimal combination of multiple effector genes. Second, the technical challenges of achieving optimal expression of effector genes at the chromosomal level remain to be solved. The effector genes are normally expressed from an expression vector with expensive inducers and antibiotics or other selective conditions. The application of a plasmid-based expression system is undesirable in large-scale bio-based production because of genetic instability, metabolic burden, and cost (Keasling, 2008; Jarboe et al., 2010). Integration of furan aldehyde detoxification systems into the chromosome is desired. However, it is challenging to achieve the optimal expression of target genes at the chromosomal level, especially when high expression is needed.

**FIGURE 4 |** […] are produced to mitigate the toxicity of furan aldehydes. Example effectors shown in the graph are furfural reductase **(A)**, anti-oxidative protein **(B)**, polyamine transporter **(C)**, and chaperonin **(D)**, assuming these effectors have synergistic epistatic interaction. When the furfural level decreases, the promoter is silenced and no new effectors are made. This design provides a controllable mechanism for furfural tolerance to minimize metabolic burden and maximize the benefit of effector genes.
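Two points from the discussion of epistasis can be illustrated numerically: how an interaction is scored, and how quickly the number of candidate combinations grows. The sketch below uses the common multiplicative definition of epistasis, which is an assumption here and not necessarily the measure used in the cited studies; the tolerance scores are hypothetical values relative to a parent strain set to 1.0.

```python
from itertools import combinations


def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of the double mutant from the multiplicative expectation.

    All values are tolerance scores relative to the parent strain (= 1.0).
    """
    return w_ab - w_a * w_b


def classify(w_a: float, w_b: float, w_ab: float, tol: float = 0.05) -> str:
    """Label an interaction as negative, positive, or absent within tol."""
    eps = epistasis(w_a, w_b, w_ab)
    if eps < -tol:
        return "negative"
    if eps > tol:
        return "positive"
    return "none"


def count_multi_gene_designs(n_traits: int) -> int:
    """Number of distinct combinations of two or more traits."""
    return sum(1 for k in range(2, n_traits + 1)
               for _ in combinations(range(n_traits), k))


if __name__ == "__main__":
    # Hypothetical scores for two traits that are each beneficial alone
    # but perform worse in combination than either single trait.
    print(classify(w_a=1.5, w_b=1.4, w_ab=1.2))
    for n in (4, 10):
        print(f"{n} traits -> {count_multi_gene_designs(n)} combinations")
```

Even ten candidate traits give more than a thousand combinations, before expression levels are considered, which is why high-throughput characterization is needed.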

# **CONCLUSION AND FUTURE PERSPECTIVES**

Efficient xylose metabolism and tolerance to furan aldehydes are desired features of microbial catalysts used in lignocellulose conversion. Past efforts in synthetic biology focused on the identification and optimization of individual biological parts needed for successful lignocellulose conversion. We have gradually accumulated much knowledge about xylose metabolism and transport, glucose repression, and furan aldehyde toxicity. Limited success in lignocellulose conversion has been achieved using these individually optimized parts (Sandoval et al., 2012a; Wang et al., 2013). Instead of taking a reductionist approach, we are entering a new phase in which epistatic interactions are characterized and optimal combinations of different biological parts are integrated. This development depends on modular high-throughput approaches for epistasis characterization and on large-scale genome editing. With the development of new high-throughput techniques and genome editing tools such as CRISPR/Cas9 technology (Doench et al., 2014; Harrison et al., 2014; Sampson and Weiss, 2014), constructing an effective platform strain for lignocellulose conversion is now within reach. Platform strains with high efficiency of sugar co-utilization and tolerance to chemical insult can be used to produce a variety of fuels and chemicals from lignocellulosic biomass by metabolic engineering. These common platforms can also be tuned to different types of biomass by adaptive laboratory evolution.

# **ACKNOWLEDGMENTS**

This work was supported by a start-up fund from Arizona State University.

# **REFERENCES**


Cameron, D. E., Bashor, C. J., and Collins, J. J. (2014). A brief history of synthetic biology. *Nat. Rev. Microbiol.* 12, 381–390. doi:10.1038/nrmicro3239


pathway genes ZWF1, GND1, RPE1, and TKL1 in *Saccharomyces cerevisiae*. *Appl. Microbiol. Biotechnol.* 71, 339–349. doi:10.1007/s00253-005-0142-3


Service, R. F. (2007). Cellulosic ethanol – biofuel researchers prepare to reap a new harvest. *Science* 315, 1488–1491. doi:10.1126/science.315.5818.1488


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 26 November 2014; accepted: 04 February 2015; published online: 18 February 2015.*

*Citation: Nieves LM, Panyon LA and Wang X (2015) Engineering sugar utilization and microbial tolerance toward lignocellulose conversion. Front. Bioeng. Biotechnol. 3:17. doi: 10.3389/fbioe.2015.00017*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2015 Nieves, Panyon and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Cofactor engineering for enhancing the flux of metabolic pathways

# **M. Kalim Akhtar <sup>1</sup>\* and Patrik R. Jones <sup>2</sup>\***

<sup>1</sup> Department of Biochemical Engineering, University College London, London, UK

<sup>2</sup> Department of Life Sciences, Imperial College London, London, UK

#### **Edited by:**

Pablo Carbonell, University of Evry, France

#### **Reviewed by:**

Juan Manuel Pedraza, Universidad de los Andes, Colombia; Gary Sawers, Martin-Luther University Halle-Wittenberg, Germany

#### **\*Correspondence:**

M. Kalim Akhtar, Department of Biochemical Engineering, University College London, Torrington Place, London, WC1E 7JE, UK e-mail: kalim.akhtar@ucl.ac.uk; Patrik R. Jones, Department of Life Sciences, Imperial College London, Sir Alexander Fleming building, London, SW7 2AZ, UK e-mail: p.jones@imperial.ac.uk

The manufacture of a diverse array of chemicals is now possible with biologically engineered strains, an approach that is greatly facilitated by the emergence of synthetic biology. This is principally achieved through pathway engineering in which enzyme activities are coordinated within a genetically amenable host to generate the product of interest. A great deal of attention is typically given to the quantitative levels of the enzymes with little regard to their overall qualitative states. This highly constrained approach fails to consider other factors that may be necessary for enzyme functionality. In particular, enzymes with physically bound cofactors, otherwise known as holoenzymes, require careful evaluation. Herein, we discuss the importance of cofactors for biocatalytic processes and show with empirical examples why the synthesis and integration of cofactors for the formation of holoenzymes warrant a great deal of attention within the context of pathway engineering.

**Keywords: cofactors, metabolic pathway engineering, Fe–S clusters, enzymatic activity, synthetic biology**

# **INTRODUCTION**

Synthetic biology permits the engineering of biological devices or systems with novel or enhanced functions (Church et al., 2014). Such an approach has numerous applications, most notably in the manufacture of a diverse range of molecules including household chemicals, biofuels, and pharmaceutical drugs (Keasling, 2010). This is principally achieved through pathway engineering in which enzyme activities are carefully coordinated to generate the product of interest. Approaches based on the activities of isolated enzymes (*in vitro*) or whole cells (*in vivo*) can be employed for this purpose (Guterl et al., 2012; Stephanopoulos, 2012). The latter approach in particular offers a significant benefit with respect to complex, multi-step pathways that rely on secondary cellular factors, as well as additional pathway processing. Since the enzymes serve as the workhorse components of these biocatalytic systems, both their quantitative levels and qualitative states are important parameters to consider for pathway engineering. A great deal of focus is typically placed on the quantitative levels of the enzymes (Zelcbuch et al., 2013) with little attention given to their overall qualitative states. Thus, the assumption is usually made that these enzyme components are functioning at full capacity. However, this may not always be the case given that a large subset of enzymes depend on cofactors for functionality.

# **SIGNIFICANCE OF COFACTORS IN BIOLOGY**

All biological organisms possess a network of pathways that lead to the production of metabolites with an array of cellular functions (Feist et al., 2009). The synthesis and interconversion of these metabolites are made possible by the catalytic activities of countless enzymes. By lowering the activation energy barrier, enzymes catalyze reactions at considerably faster rates than their chemical counterparts. This characteristic property, along with the relatively high degree of substrate selectivity, permits the use of enzymes as catalysts for industrial purposes. For those enzymes that rely solely on amino acids for catalysis, the types of reactions are extremely narrow in scope and, for the most part, restricted to acid/base and electrophilic/nucleophilic reactions (Broderick, 2001). To further extend the range of reactions, enzymes are commonly associated with non-protein moieties known as cofactors (Broderick, 2001).

In the broadest sense of the term, cofactors are thought to be associated with well over half of known proteins (Fischer et al., 2010a). However, for this article, the term "cofactor" will refer specifically to those moieties, either organic or inorganic, which remain physically associated with the enzyme throughout the catalytic cycle (Fischer et al., 2010a). This excludes dissociable cosubstrates such as NADPH and glutathione. Additionally, since the primary focus of this article is on pathway engineering, only those cofactors that require *de novo* pathways for their synthesis will be mentioned, and sole metal entities such as calcium and selenium will also be excluded. Using this strict definition, a selection of common cofactors is listed in **Table 1**. These are categorized into two types: organic and inorganic (Rees, 2002; Fischer et al., 2010b). Members of the organic group of cofactors tend to be derivatives of vitamins and undertake numerous types of reactions, while the inorganic group is usually based on various arrangements of iron–sulfur (Fe–S) clusters.

In their cofactor-bound state, enzymes are referred to as holoenzymes, while in the unbound state they are known as apoenzymes (**Figure 1A**). Two discrete structural parts are required for the synthesis of a holoenzyme: a polypeptide chain and a cofactor moiety. The former is generated by the ubiquitous translational machinery while the latter, depending upon the cofactor, is synthesized by a defined metabolic pathway. In certain cases, subtle variations of the cofactor pathway may exist. For example, in animals, fungi, and α-proteobacteria, heme synthesis is initiated by 5-aminolevulinate synthase, via condensation of glycine and succinyl CoA, while in photosynthetic eukaryotes and some species of the α-proteobacterial group, it depends on glutamate, via the concerted actions of three enzymes (Layer et al., 2010). Once synthesized, the cofactor is integrated with the apoenzyme, either in a co-translational or post-translational manner, to form the holoenzyme. The nature of the association may be covalent and, in such cases, linkages are typically formed with serine, threonine, histidine, tyrosine, and lysine residues and are catalyzed by a distinct maturation system. As an example, the heme *c* in cytochrome *c* is attached to a cysteine residue, via the vinyl group, with the aid of Ccm (cytochrome *c* maturation) factors (Sanders et al., 2010). Alternatively, the interaction may be tight but non-covalent, as in the case of the flavin-containing enzyme acyl CoA dehydrogenase (Thorpe and Kim, 1995).

#### **Table 1 | Examples of enzyme-bound cofactors**.

For a more comprehensive list of organic cofactors refer to Fischer et al. (2010b).

# **IMPORTANCE OF COFACTOR SYNTHESIS FOR BOTH ENZYME AND PATHWAY FUNCTIONALITY**

As integral components of numerous holoenzymes, cofactors are required for the majority of metabolic pathways. To name but a few, these include: FAD in succinate dehydrogenase for the Krebs cycle; TPP in transketolase for the pentose phosphate pathway; pyridoxal phosphate in glycogen phosphorylase for glycogenolysis; H-clusters in hydrogenases for hydrogen metabolism; Fe–MoCo in nitrogenases for nitrogen fixation; biotin in acetyl CoA carboxylase for fatty acid biosynthesis; and heme in cytochrome P450 for various detoxification pathways. The most crucial point to consider is that the functional output of holoenzymes can only arise if the apoenzyme is correctly folded with its cofactor. Without the cofactor, such enzymes would be rendered inoperable and the associated pathways would become redundant. This situation, though undesirable, is likely to be encountered in a bottom-up approach to microbial engineering in which a genetically amenable and well-characterized organism, such as *Escherichia coli* or *Saccharomyces cerevisiae*, may be completely devoid of the cofactors necessary for heterologous enzyme activity. Alternatively, the capability of the host to synthesize the required cofactor exists but may be insufficient, resulting in a pool of enzymes with a high ratio of apo to holo form (see next section). The most obvious strategy for resolving these issues would be to resort to "cofactor engineering," by genetically modifying the host so that the cofactor assembly system is present or so that native cofactor assembly is sufficient.

**FIGURE 1 | (A)** Generalized overview of the synthesis of holoenzymes. **(B)** Significance of cofactor engineering for enhancing the output of holoenzyme-dependent pathways. The example (Akhtar and Jones, 2009) illustrates a synthetic pyruvate:H2-pathway that is heavily dependent on Fe–S clusters. These clusters are required for (i) the proteins/enzymes directly involved in the pathway for hydrogen production and (ii) the maturation factors that are responsible for the synthesis and integration of the H-cluster present in Fe–Fe hydrogenases.

Consider the expression of the clostridial Fe–Fe hydrogenase. This enzyme depends on an H-cluster that is essentially a di-iron arrangement with three carbon monoxide ligands, two cyanide ligands, and one dithiolate bridge (Mulder et al., 2011). The formation of this cluster is catalyzed by three maturation enzymes, namely HydE, HydF, and HydG (Posewitz et al., 2004). Since *E. coli* is not naturally endowed with the Hyd maturation enzymes, it invariably produces a non-functional Fe–Fe hydrogenase. This can be circumvented by simply complementing the expression of Fe–Fe hydrogenases with the maturation pathway for the H-cluster in order to form the active Fe–Fe hydrogenase (Posewitz et al., 2004; Akhtar and Jones, 2008b). Pyrroloquinoline quinone (PQQ) is another example of a cofactor that is not naturally synthesized in *E. coli* (Matsushita et al., 1997). This cofactor, present in a family of quinoproteins, has potential uses in biofuel cells, bioremediation, and biosensing (Matsushita et al., 2002). By incorporating the *pqqABCDE* gene cluster from *Gluconobacter oxydans*, Yang et al. (2010) were able to demonstrate the activity of a PQQ-requiring d-glucose dehydrogenase in *E. coli*. They noted, however, that the gene cluster was most likely complemented by the native *tldD* gene. A final example is tetrahydrobiopterin, which in itself is a desirable commodity for the treatment of mild and moderate forms of phenylketonuria (Perez-Duenas et al., 2004). It can be synthesized *in vivo* from GTP via a three-step pathway comprising GTP cyclohydrolase I, 6-pyruvoyl-tetrahydropterin synthase, and sepiapterin reductase (Yamamoto et al., 2003). By augmenting the pathway with expression of a GTP cyclohydrolase I sourced from *Bacillus subtilis*, a 1.5-fold improvement was observed with titers reaching as high as 4 g biopterin per liter of culture (Yamamoto et al., 2003).

# **COFACTOR INSERTION IS KEY TO MAXIMIZING TOTAL AND SPECIFIC HOLOENZYME ACTIVITY**

To maximize the specific activity of a holoenzyme, cofactor synthesis would need to be complemented and/or coupled with cofactor insertion. For enzymes that bind cofactors in a noncovalent fashion, the process of cofactor insertion is somewhat of an enigma. It is still not known, even for the well-known heme *b* cofactor, whether this process is facilitated by dedicated *in vivo* components or is a spontaneous process (Thöny-Meyer, 2009). Though overwhelming *in vitro* data show that heme *b* insertion can be a spontaneous event, recent evidence from *in vivo* model systems has implicated the involvement of cellular factors that have yet to be elucidated (Waheed et al., 2010; Correia et al., 2011). For those enzymes that have covalently attached cofactors, specialized maturation systems have evolved to catalyze both insertion and covalent linkage of the cofactor. Induction of the maturation system can increase holoenzyme activity, as in the case of a carboxylic acid reductase (CAR), which was recently employed for the production of a broad range of chemical commodities (Akhtar et al., 2013). Venkitasubramanian et al. (2007) had verified that CAR requires a cofactor known as phosphopantetheine. This cofactor, during its synthesis, is concomitantly integrated with the enzyme, via a phosphodiester bond, by a maturation enzyme known as phosphopantetheinyl transferase. Expression of CAR alone in *E. coli* leads to an observable, but exceedingly poor, activity. However, with coexpression of the phosphopantetheinyl transferase Sfp from *Bacillus subtilis*, the specific activity of CAR can be enhanced several-fold to a level that is on par with that of the enzyme purified from the native organism.

In addition to stimulating the specific enzyme activity, increasing the intracellular levels of cofactors is also known to improve the overall production levels of holoenzymes, suggesting a relationship between holo/apo-forms and degradation. This is a phenomenon that has been frequently observed for hemoproteins (Harnastai et al., 2006; Lu et al., 2010, 2013; Michener et al., 2012). By elevating heme levels, via supplementation of the media with δ-aminolevulinic acid, the expression levels of hemoglobin and cytochrome *b*<sub>5</sub> can be significantly improved (Gallagher et al., 1992; Liu et al., 2014). Likewise, increasing Fe–S levels, via overexpression of the *isc* (Fe–S cluster) operon, leads to similar effects (Nakamura et al., 1999; Akhtar and Jones, 2008a). An explanation for the increased holoenzyme levels may be gleaned from studies of proteins that utilize divalent metal ions as cofactors (Wilson et al., 2004; Bushmarina et al., 2006; Palm-Espling et al., 2012). Evidence from these published reports suggests that cofactors may aid in the folding of the polypeptide chain and, in turn, accelerate the formation of a functional protein (Goedken et al., 2000). In the case of ribonuclease HI, metal cofactor integration was found to impart a greater degree of rigidity on the final native conformational state of the protein, in addition to improving the refolding rate of the protein (Wittung-Stafshede, 2002). Further insights on the importance of cofactors in protein folding can be gained with the *S*-adenosylmethionine-containing biotin synthase. This enzyme relies on an intact Fe–S cluster for the addition of sulfur to dethiobiotin to form the biotin thiophane ring. Reyda et al. (2008) noticed that the loss of the Fe–S cluster destabilized the protein, which led to transient unfolding of specific regions, as well as increased proteolysis. Proteolytic degradation was found to proceed by an apparent ATP-dependent proteolysis mechanism, via sequential cleavage of small C-terminal fragments. Interestingly, it was also speculated that since the activity of the protein is generally well maintained under high-iron conditions, a repair process, possibly mediated by the Isc and/or Suf (Sulfur mobilization) machinery, may be active under conditions of destabilization (Reyda et al., 2008).

# **STIMULATING COFACTOR SYNTHESIS CAN ENHANCE THE FLUX OF SYNTHETIC PATHWAYS**

With regard to the actual impact that cofactor engineering can have on the metabolic performance of synthetic pathways, two studies are particularly worthy of mention. In the first study relating to the production of the vitamin C precursor, 2-keto-l-gulonic acid, Gao et al. (2013) utilized two PQQ-dependent dehydrogenases with d-sorbitol as the starting substrate. The authors noted that induced expression did not improve titers beyond a certain threshold and hypothesized that PQQ was the limiting factor. This was proven to be correct since induction of a pathway for PQQ synthesis resulted in a 20% increase in overall titer. A later refinement of the work in which the two pathway enzymes were incorporated as a fusion protein in *Ketogulonigenium vulgare* also resulted in a similar improvement in titer (Gao et al., 2014).

The second study relates to a synthetic pathway for hydrogen production consisting of a pyruvate:ferredoxin oxidoreductase (PFOR, also known as YdbK in the literature), Fdx and Fe–Fe hydrogenase (Akhtar and Jones, 2009). The design of this pathway was based on the observation that pyruvate, rather than NAD(P)H, was a metabolically superior source of electrons for hydrogen synthesis (Veit et al., 2008). Initial structural analysis of the PFOR-based pathway, in addition to the maturation enzymes, revealed that each protein component was associated with at least one Fe–S cluster that essentially provides the route of electron transfer from pyruvate to the H-cluster of the Fe–Fe hydrogenase (**Figure 1B**). Up to a total of 12 Fe–S clusters were found to be required for the pathway. In *E. coli*, the formation of Fe–S clusters is undertaken by the *isc* operon, though an analogous *suf* operon is also present (Fontecave et al., 2005; Roche et al., 2013). Since the *isc* operon is controlled by the negative IscR transcriptional regulator, our initial work using a ∆*iscR* background strain had shown that the levels of holo-ferredoxin and the *in vitro* hydrogenase activity could be improved two and threefold, respectively, relative to the wild-type strain (Akhtar and Jones, 2008a). Based on this insight and given the heavy demand for Fe–S clusters, we reasoned that the ∆*iscR* strain may improve the pathway flux toward hydrogen production by increasing the availability of Fe–S clusters. In accordance with our prediction, we observed a twofold improvement in hydrogen yield relative to the wild-type, resulting in an overall yield of 1.5 moles of H<sub>2</sub> per mole of glucose. Even more remarkably, with addition of TPP, which serves as a cofactor for the PFOR component, the hydrogen yield was further increased to give a final yield of 1.9, out of a theoretical yield of two moles per mole of glucose (Akhtar and Jones, 2009). Given also that both the specific and total *in vitro* activity of PFOR was improved, this suggests that the availability of the TPP cofactor may well be another potential limiting factor for hydrogen production (Akhtar and Jones, 2009).
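The yields quoted above can be placed on a common scale with a few lines of arithmetic. The following is a minimal sketch that takes the figures in the text at face value; the wild-type yield is not reported here and is only back-calculated from the stated twofold improvement, so it should be read as an inferred value rather than a measurement. All variable and function names are illustrative.

```python
# Yields quoted in the text, in mol H2 per mol glucose.
THEORETICAL_YIELD = 2.0       # theoretical maximum stated in the text
YIELD_DELTA_ISCR = 1.5        # ΔiscR strain
YIELD_DELTA_ISCR_TPP = 1.9    # ΔiscR strain with added TPP
FOLD_OVER_WILD_TYPE = 2.0     # "twofold improvement" relative to wild type


def fraction_of_theoretical(observed, theoretical=THEORETICAL_YIELD):
    """Return the observed yield as a fraction of the theoretical yield."""
    return observed / theoretical


# Inferred, not reported in the text: wild-type yield implied by the twofold gain.
implied_wild_type_yield = YIELD_DELTA_ISCR / FOLD_OVER_WILD_TYPE

for label, value in [
    ("wild type (inferred)", implied_wild_type_yield),
    ("ΔiscR", YIELD_DELTA_ISCR),
    ("ΔiscR + TPP", YIELD_DELTA_ISCR_TPP),
]:
    print(f"{label}: {value:.2f} mol/mol, "
          f"{100 * fraction_of_theoretical(value):.1f}% of theoretical")
```

On these numbers, relieving the Fe–S limitation takes the pathway to three quarters of the theoretical yield, and supplying TPP closes most of the remaining gap.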

Although cofactor biosynthesis and integration under native control can be limiting and unresponsive to high-expression levels of the apoform, presumably to conserve cellular resources, cofactor engineering can quite clearly be advantageous for the assembly of metabolic pathways that depend on the activity of heterologous holoenzymes. This benefit presumably arises from the improved holoenzyme activity, via increased structural stability and/or folding rates in conjunction with reduced protein degradation and/or protein unfolding (explained in the previous section). Interestingly, data from our work on the *in vitro* activity of Fe–S proteins also suggest that, under certain conditions, a steady and constant supply of cofactors may well aid the restoration of damaged or inactivated enzymes (Akhtar and Jones, 2008a).

# **CONCLUDING REMARKS**

Achieving a balanced production of polypeptide and cofactor for optimal holoenzyme activity would be an iterative process involving the (i) modulated induction of genes, (ii) monitoring of cofactor levels, (iii) evaluation of enzyme activity, and (iv) evaluation of whole-system productivity. Genetic modulation can be controlled at the transcriptional and translational levels by varying the strengths of promoters and ribosomal binding sites, while cofactor levels can be monitored and quantified using suitable analytical methods, e.g., mass spectrometry or high-performance liquid chromatography (HPLC). Combining this with information relating to specific enzyme activity should provide insights into the intracellular cofactor levels required to achieve optimal holoenzyme activity. Furthermore, if a product reporter system is available, it may also be possible to screen for optimal cofactor metabolism, for example using RBS-variation libraries (Zelcbuch et al., 2013).
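The combinatorial side of this iterative process can be sketched in a few lines of code. The example below is a toy illustration only: the promoter and RBS names, their relative strengths, the assumption that expression scales as promoter strength multiplied by RBS strength, and the scoring rule (holoenzyme limited by the scarcer of apoenzyme and cofactor, minus a burden for cofactor-pathway expression) are all hypothetical and are not taken from the cited studies. In practice the score would come from steps (ii) to (iv), i.e., measured cofactor levels, enzyme activity, and productivity.

```python
from itertools import product

# Hypothetical relative strengths (arbitrary units); not values from the text.
PROMOTERS = {"weak": 0.1, "medium": 0.5, "strong": 1.0}
RBS_SITES = {"rbsA": 0.2, "rbsB": 0.6, "rbsC": 1.0}


def design_library():
    """Enumerate every promoter/RBS pair with its assumed expression level.

    Assumption: expression scales as promoter strength x RBS strength.
    """
    return {
        (promoter, rbs): p_strength * r_strength
        for (promoter, p_strength), (rbs, r_strength)
        in product(PROMOTERS.items(), RBS_SITES.items())
    }


def toy_score(apo_level, cofactor_level, burden=0.3):
    """Toy stand-in for measured productivity (hypothetical model).

    Holoenzyme is limited by whichever of apoenzyme or cofactor is scarcer;
    expressing the cofactor pathway carries a proportional burden.
    """
    holoenzyme = min(apo_level, cofactor_level)
    return holoenzyme - burden * cofactor_level


def best_cofactor_design(apo_design, burden=0.3):
    """Pick the cofactor-pathway design with the highest toy score."""
    library = design_library()
    apo_level = library[apo_design]
    return max(
        library,
        key=lambda design: toy_score(apo_level, library[design], burden),
    )


print(len(design_library()), "designs in the library")
print(best_cofactor_design(("medium", "rbsC")))
```

Under this toy scoring rule, the preferred cofactor design is the one whose expression level matches that of the apoenzyme, which mirrors the argument that supply of polypeptide and cofactor should be balanced rather than maximized independently.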

Given the importance of cofactor synthesis and integration for holoenzyme activity, a few key points need to be considered when assembling pathways involving holoenzymes. Firstly, holoenzyme activity will only be possible within a host that is metabolically equipped to synthesize the necessary cofactor; otherwise, a complementary pathway for cofactor production would also need to be implemented. This is particularly relevant for synthetic pathways that employ holoenzymes from diverse origins. Secondly, to ensure maximal activity of the holoenzyme, expression of the apoenzyme needs to be sufficiently coupled with the synthesis and insertion of its respective cofactor. An imbalance between the two will lead to poor enzyme activity, one that would most likely be inadequate for catalytic purposes. Even native cofactor biosynthesis may not be optimally tuned or responsive to demand from an over-expressed apoenzyme, as would be required in a biocatalytic system where the objective function has shifted from biomass to metabolite production. Thirdly, since cofactor stimulation is known to improve the stability and activity of holoenzymes, cofactor engineering is likely to be a useful strategy for enhancing the total activity of all holoenzymes in the engineered pathway, and maximizing chances for high flux toward the product of interest. As it currently stands, the impact of cofactor engineering on the activity of holoenzymes is still a poorly studied area and certainly warrants more attention given its potential impact on the success of engineered biocatalytic systems.

# **ACKNOWLEDGMENTS**

The work that spurred the views reflected within this perspective was supported in part by the Academy of Finland project no. 25369 and the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013)/European Research Council Grant Agreement 260661 (Patrik R. Jones).

# **REFERENCES**


Yang, X. P., Zhong, G. F., Lin, J. P., Mao, D. B., and Wei, D. Z. (2010). Pyrroloquinoline quinone biosynthesis in *Escherichia coli* through expression of the *Gluconobacter oxydans* *pqqABCDE* gene cluster. *J. Ind. Microbiol. Biotechnol.* 37, 575–580. doi:10.1007/s10295-010-0703-z

Zelcbuch, L., Antonovsky, N., Bar-Even, A., Levin-Karp, A., Barenholz, U., Dayagi, M., et al. (2013). Spanning high-dimensional expression space using ribosome-binding site combinatorics. *Nucl. Acids Res.* 41:e98. doi:10.1093/nar/gkt151

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 June 2014; accepted: 12 August 2014; published online: 28 August 2014. Citation: Akhtar MK and Jones PR (2014) Cofactor engineering for enhancing the flux of metabolic pathways. Front. Bioeng. Biotechnol. 2:30. doi: 10.3389/fbioe.2014.00030 This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Akhtar and Jones. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Can the natural diversity of quorum-sensing advance synthetic biology?

# **René Michele Davis<sup>1,2</sup>\*, Ryan Yue Muller<sup>3,4</sup> and Karmella Ann Haynes<sup>1</sup>**

<sup>1</sup> Ira A. Fulton School of Biological and Health Systems Engineering, Arizona State University, Tempe, AZ, USA

<sup>2</sup> Biological Design Graduate Program, Arizona State University, Tempe, AZ, USA

<sup>3</sup> Department of Chemistry and Biochemistry, Arizona State University, Tempe, AZ, USA

<sup>4</sup> School of Life Sciences, Arizona State University, Tempe, AZ, USA

#### **Edited by:**

Zhanglin Lin, Tsinghua University, China

#### **Reviewed by:**

Baojun Wang, University of Edinburgh, UK

Jesus Picó, Universitat Politecnica de Valencia, Spain

#### **\*Correspondence:**

René Michele Davis, Ira A. Fulton School of Biological and Health Systems Engineering, Arizona State University, 501 E Tyler Mall, 9709, Tempe, AZ 85287, USA e-mail: rene.davis@asu.edu

Quorum-sensing networks enable bacteria to sense and respond to chemical signals produced by neighboring bacteria. They are widespread: over 100 morphologically and genetically distinct species of eubacteria are known to use quorum sensing to control gene expression. This diversity suggests the potential to use natural protein variants to engineer parallel, input-specific, cell–cell communication pathways. However, only three distinct signaling pathways, Lux, Las, and Rhl, have been adapted for and broadly used in engineered systems. The paucity of unique quorum-sensing systems and their propensity for crosstalk limit the usefulness of our current quorum-sensing toolkit. This review discusses the need for more signaling pathways, roadblocks to using multiple pathways in parallel, and strategies for expanding the quorum-sensing toolbox for synthetic biology.

**Keywords: quorum sensing, homoserine lactone, crosstalk, orthogonal, genetic wire, synthetic gene circuit**

# **MODULES FROM NATURAL QUORUM-SENSING NETWORKS CAN BE DECOUPLED AND INTEGRATED INTO SYNTHETIC SYSTEMS**

Scientists first explored the genetic circuitry of a quorum-sensing system through basic research on *Vibrio fischeri*, a symbiotic microbe that populates the light organ of the bobtail squid, *Euprymna scolopes*. Researchers identified an operon called "Lux" that allowed individual *V. fischeri* cells to produce a glowing phenotype by expressing luciferase specifically in dense bacterial populations (Ruby and Nealson, 1976). Explorations of other microbial genomes revealed dozens of Lux homologs that are collectively known as homoserine lactone (HSL) quorum-sensing networks (Fuqua et al., 1996; Williams et al., 2007; Dickschat, 2010). In addition to bioluminescence, researchers found that bacteria use quorum sensing to couple population density with the onset of group behaviors such as virulence, biofilm formation, sporulation, competence, and disruption of neighboring bacterial biofilms (Eberl, 1999).

Homoserine lactone networks are more commonly known as *N*-acyl homoserine lactone (AHL) quorum-sensing networks. However, our discussion includes LuxI-like synthases that produce compounds with a homoserine lactone ring but with groups other than an acyl tail. In this review, we will consider homoserine lactone, HSL, to include AHLs as well as non-acyl tail compounds. We will refer to HSL with an acyl tail as acyl-HSL.

Homoserine lactone quorum-sensing networks generally consist of an HSL synthase LuxI-like protein, an HSL-binding LuxR-like regulator, and promoters that are regulated by LuxR-like/HSL complexes. The LuxI-like HSL synthase enzyme produces chemical signals called HSLs (Engebrecht and Silverman, 1984; Kaplan and Greenberg, 1987). Most HSLs diffuse passively across the cell membrane, while some require active transport. Quorum sensing is triggered when high external HSL concentrations drive net influx, allowing HSLs to bind and activate a LuxR-like regulator. The activated LuxR-like/HSL complex binds to a 20 base pair inverted repeat known as a Lux-like-box and regulates expression of downstream genes (**Figure 1A**). Synthases from various species of bacteria produce different HSL signals, and their corresponding regulators generally bind their cognate HSL with varying levels of specificity.

Researchers have also identified two other families of cell–cell communication networks: autoinducer-2 (AI-2) networks (Vendeville et al., 2005; Reading and Sperandio, 2006) and autoinducing peptide (AIP) networks, also called peptide pheromone networks (Kleerebezem et al., 1997; Reading and Sperandio, 2006). While AI-2 and AIP networks may be used in engineered systems, the molecular components of HSL networks are simpler, more diverse, and require little modification to function as expected when they are transferred into new host cells. These characteristics make the HSL family of quorum-sensing networks well suited for building sophisticated, multi-component, synthetic systems. Therefore, we focus primarily on the HSL networks in this review.

Synthetic biologists and genetic engineers often use HSL quorum-sensing pathways to engineer novel behaviors in prokaryotic microorganisms. In these engineered systems, quorum-sensing pathways are used as a set of decoupled components where the HSL synthase is the "Sender" component and the regulator and promoter are collectively the "Receiver" component (**Figure 1B**) (Miller and Bassler, 2001). They can be employed as "genetic wires" linking the functional elements of multi-component biological systems (Tamsir et al., 2011; Goñi-Moreno et al., 2013). The wires connect circuit components within a cell or across a population of single or multiple strains. The Sender converts an input stimulus into a transmittable signal, the HSL, which activates the Receiver. The Receiver modulates expression of an output as designated by the designer (**Figure 2A**). This input stimulus may be anything that activates a promoter, including heavy metals (Prindle et al., 2012a; Wang et al., 2013), specific wavelengths of light (Tabor et al., 2009), biochemical signals secreted by pathogens (Saeidi et al., 2011; Gupta et al., 2013), the hypoxic microenvironment surrounding a tumor (Anderson et al., 2006), and HSLs from tandem quorum-sensing networks (Tamsir et al., 2011). The output may be any gene controlled by a Lux-like promoter, such as a visible reporter (Canton et al., 2008; Tabor et al., 2009; Tamsir et al., 2011), cell motility (Liu et al., 2011), antimicrobial proteins (Saeidi et al., 2011; Gupta et al., 2013), and anti-cancer drugs (Anderson et al., 2006).

The simplicity of these networks allows researchers to model how quorum-sensing-controlled gene expression is regulated in response to HSL signal concentration (McMillen et al., 2002; Pai and You, 2009; Pai et al., 2014). These models can inform how a quorum-sensing network should be implemented in a synthetic circuit to achieve the desired behavior (McMillen et al., 2002; Pai and You, 2009). Furthermore, dry lab researchers have used modeling to demonstrate how quorum-sensing systems control group response in the presence of noisy signal concentrations, supporting their use in synthetic biology as robust circuit components (Koseska et al., 2009; Weber and Buceta, 2013).
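To illustrate the kind of relationship such models capture, the following minimal sketch couples a steady-state HSL concentration to a Hill-type Receiver response. It is a generic textbook form and is not the model used in any of the studies cited above; the function names and all parameter values (production rate, degradation rate, half-maximal concentration, Hill coefficient) are hypothetical and in arbitrary units.

```python
def steady_state_hsl(cell_density, production_rate=1.0, degradation_rate=1.0):
    """HSL level when Sender production balances first-order loss.

    Assumes d[HSL]/dt = production_rate * cell_density - degradation_rate * [HSL],
    which settles at production_rate * cell_density / degradation_rate.
    """
    return production_rate * cell_density / degradation_rate


def receiver_output(hsl, half_max=10.0, hill_coefficient=2.0, max_output=1.0):
    """Hill-type activation of the Receiver promoter by HSL (assumed form)."""
    activated = hsl ** hill_coefficient
    return max_output * activated / (half_max ** hill_coefficient + activated)


# Output stays low at low density and approaches saturation at high density.
for density in (1, 10, 100):
    hsl = steady_state_hsl(density)
    print(f"density={density:>3}  HSL={hsl:6.1f}  output={receiver_output(hsl):.3f}")
```

With a Hill coefficient above one, the response is switch-like around the half-maximal HSL concentration, which is the qualitative behavior that makes cell density act as a threshold for gene expression.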

Incorporating quorum-sensing networks into production strains has advanced the field of metabolic engineering. Quorum sensing has been used to synchronize gene expression across a population to reduce cell-to-cell variability and to increase yields in engineered strains (Danino et al., 2010; Prindle et al., 2012a,b; Anesiadis et al., 2013). For example, by linking a Lux-based genetic oscillator with a gas phase signal oscillator, researchers coordinated gene expression among 2.5 million cells across 5 mm of space with minimal noise (Danino et al., 2010; Prindle et al., 2012a,b). Anesiadis et al. (2013) employed this type of circuit in a production strain, where they engineered a cell-density-dependent switch using the Lux system to control production of serine in an *Escherichia coli* knockout strain. Group-controlled gene expression implemented by an HSL quorum-sensing network leads to overall higher serine production.

Quorum-sensing networks are also used in genetic circuits to perform computation. Tabor et al. (2009) took advantage of the diffusibility of HSL through agar to build a bacterial edge detector using the Lux network. They demonstrated that stationary physical spacing of bacteria relative to different inputs drives controlled expression of an output. The circuit was designed such that bacteria exposed to darkness expressed HSLs but no output (LacZ). The circuit allowed only bacteria that were both adjacent to HSL-producers and exposed to light to express LacZ, which resulted in a pigmented outline at the edges of a light-masked region (**Figure 2B**). While most biocomputation is digital, Daniel et al. (2013) showed the versatility of quorum-sensing networks by demonstrating analog computation using the Lux network; their circuit converts logarithmic HSL input into linear fluorescent output over a large range of HSL concentrations. Thus far, engineered biocomputation has used monolayers of cells. Three-dimensional (3D) colony-printing techniques will increase the sophistication of these systems (Connell et al., 2013). Controlled spacing of colonies based on HSL-diffusion rates could allow engineering a temporal element into a split circuit.
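The edge-detection logic described above for Tabor et al. (2009) can be sketched as a simple rule applied to a row of cells. This is a toy abstraction under stated assumptions: cells sit on a one-dimensional strip, HSL reaches only cells within a fixed number of positions (the hypothetical `reach` parameter), and each cell is either on or off. It does not model diffusion kinetics or the actual genetic parts.

```python
def edge_detect(light_mask, reach=1):
    """Return which cells express the output (LacZ) on a 1D strip.

    light_mask: list of bools, True where a cell is exposed to light.
    reach: assumed number of positions over which HSL spreads (hypothetical).
    """
    n = len(light_mask)
    # Cells in the dark act as Senders: they make HSL but no output.
    produces_hsl = [not lit for lit in light_mask]
    output = []
    for i, lit in enumerate(light_mask):
        lo, hi = max(0, i - reach), min(n, i + reach + 1)
        senses_hsl = any(produces_hsl[lo:hi])
        # Output requires both light and HSL from a nearby dark cell.
        output.append(lit and senses_hsl)
    return output


mask = [True, True, False, False, False, True, True]
print(edge_detect(mask))
```

Only the illuminated cells that border the dark region switch on, which corresponds to the pigmented outline at the edge of the masked area.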

In the preceding examples, the cells in each system are expressing the same circuit. However, engineers may also coordinate gene circuits distributed among multiple populations. Brenner et al. (2007) used the Rhl and Las networks from *Pseudomonas aeruginosa* to build two strains of *E. coli* that form a biofilm together once both populations reach a threshold density. Balagaddé et al. (2008) used components from the Lux and Las networks to engineer a predator–prey relationship between *E. coli* strains. High predator population density induces cell death in the prey strain, while high prey population density supports survival of the predator strain. You et al. (2004) placed a cell death gene under the control of the Lux promoter and built a bistable system that maintains population density within a defined range. At high cell density, the Lux network activates the cell death gene. At decreased cell density, the cell death gene is inactive and the population begins to grow again. Computation may be split across multiple strains, distributing the energy demands of a complex computation that are too great for a single cell (Ji et al., 2013; Payne and You, 2014) (**Figure 2A**). Wang et al. (2013) built a two-strain, three-input biosensor in *E. coli* that produces RFP only in the presence of three heavy metal contaminants: arsenic, mercury, and copper. Cell 1 produces LuxI after exposure to arsenic and mercury; Cell 2 expresses RFP in response to copper and the 3O-C6-HSL produced by Cell 1. Tamsir et al. (2011) linked circuits expressed in multiple cell populations using two quorum-sensing networks derived from *P. aeruginosa*, Rhl and Las (**Figure 2C**). They implemented complex Boolean expressions using different spatial arrangements on agar plates. Their system is built with the functionally complete NOR operator and can therefore implement any Boolean expression.
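Two of the logical ideas in this paragraph can be made concrete with a short sketch. The first part shows why NOR is called functionally complete by building NOT, OR, AND, and XOR from NOR alone; the second part writes out the two-strain biosensor logic exactly as described in the text. This is an abstraction of the Boolean behavior only: the function names are illustrative, and nothing here represents the genetic parts or spatial layout used by Tamsir et al. (2011) or Wang et al. (2013).

```python
from itertools import product


def nor(a, b):
    """The single primitive: true only when both inputs are false."""
    return not (a or b)


def not_gate(a):
    return nor(a, a)


def or_gate(a, b):
    return not_gate(nor(a, b))


def and_gate(a, b):
    return nor(not_gate(a), not_gate(b))


def xor_gate(a, b):
    # (a OR b) AND NOT (a AND b), using only NOR-derived gates.
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))


def two_strain_biosensor(arsenic, mercury, copper):
    """Boolean view of the two-strain, three-input sensor described above."""
    # Cell 1: makes LuxI (and hence HSL) only with arsenic AND mercury.
    hsl_from_cell_1 = arsenic and mercury
    # Cell 2: makes RFP only with HSL from Cell 1 AND copper.
    return hsl_from_cell_1 and copper


# Check the NOR-derived gates against Python's own operators.
for a, b in product((False, True), repeat=2):
    assert not_gate(a) == (not a)
    assert or_gate(a, b) == (a or b)
    assert and_gate(a, b) == (a and b)
    assert xor_gate(a, b) == (a != b)
```

Because each derived gate is just a particular wiring of NOR units, any Boolean expression can in principle be assembled from colonies that each carry one NOR gate, provided the wires between them do not interfere with one another.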

# **CROSSTALK BETWEEN QUORUM-SENSING PATHWAYS CHALLENGES THE DEVELOPMENT OF SYNTHETIC GENETIC CIRCUITS**

Attempts to isolate, study, and apply quorum-sensing pathways for bioengineering are often thwarted by unexpected crosstalk. Quorum sensing is a popular tool among synthetic biologists for designing multicellular systems, but widely utilized HSL quorum-sensing networks are currently limited to three pathways: Lux, Las, and Rhl. These networks all exhibit crosstalk with each other, complicating the design of complex genetic systems implemented with quorum-sensing networks.

For instance, a single regulator can be activated by multiple acyl-HSL-class molecules, resulting in cross-activation of regulators from different species of bacteria. This phenomenon was observed in a proof-of-concept system designed by Canton et al. (2008) wherein an output gene for green fluorescent protein (GFP) was placed under the control of a LuxR receiver module (Wu et al., 2014). Four chemically distinct acyl-HSLs, C6-HSL, C7-HSL, 3O-C8-HSL and, at higher concentrations, C8-HSL, all activated expression of GFP at levels comparable to or even greater than the cognate LuxI acyl-HSL, 3O-C6-HSL (Canton et al., 2008). Many different HSL synthases, including EsaI, ExpI, and AhlI, produce the same major cognate acyl-HSL as LuxI, suggesting that these pathways would have high levels of crosstalk if built into the same network (Miller and Bassler, 2001; Põllumaa et al., 2012). In the report of their predator–prey, two-strain system, Balagaddé et al. (2008) discussed low-level crosstalk between LuxI and LasR, which was recently confirmed by observing LuxI and LasR interactions in a single-strain system (Wu et al., 2014). However, Balagaddé's system functioned such that crosstalk was apparently below the threshold for altering intended behavior. Interestingly, this type of crosstalk is also observed in nature (Fuqua et al., 1996). For example, two opportunistic pathogens, *Burkholderia cepacia* and *P. aeruginosa*, are known to co-infect patients with cystic fibrosis (Lewenza et al., 2002). Each pathogen's quorum-sensing regulators respond to the other's HSLs, resulting in coordination of virulence-gene expression.

Crosstalk can also occur at the level of the target, or "output," gene; similarities between promoter sequences and the DNA-binding domains within the regulator proteins contribute to crosstalk between quorum-sensing pathways. The acyl-HSL-activated LuxR regulator stimulates transcription at its cognate promoter as well as the Esa promoter, while acyl-HSL-activated LasR, EsaR, and ExpR regulators are also capable of initiating transcription at the Lux promoter (von Bodman et al., 2003; Saeidi et al., 2011; Shong et al., 2013). While this type of crosstalk can be avoided by using only one regulator per strain, such regulators will not behave as two orthogonal wires within a single cell.
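The promoter-level interactions listed above can be recorded as a small lookup table and queried for conflicts before regulators are combined in one cell. The sketch below encodes only the interactions named in this paragraph; it is not a complete interaction map, and the absence of an entry should not be read as evidence of orthogonality. The dictionary and function names are illustrative.

```python
# Regulator -> promoters it is reported to activate, as stated in the text.
# Only interactions named in the paragraph above are included.
REPORTED_ACTIVATION = {
    "LuxR": {"Lux", "Esa"},
    "LasR": {"Lux"},
    "EsaR": {"Lux"},
    "ExpR": {"Lux"},
}


def shared_promoters(regulators):
    """Return promoters activated by two or more of the given regulators.

    The result maps each conflicting promoter to the sorted list of
    regulators reported to act on it.
    """
    hits = {}
    for regulator in regulators:
        for promoter in REPORTED_ACTIVATION.get(regulator, set()):
            hits.setdefault(promoter, []).append(regulator)
    return {p: sorted(regs) for p, regs in hits.items() if len(regs) > 1}


# Co-expressing LuxR and LasR in one cell: both act on the Lux promoter.
print(shared_promoters(["LuxR", "LasR"]))
# A single regulator per strain has no promoter-level conflict by construction.
print(shared_promoters(["LuxR"]))
```

A check of this kind only flags known conflicts; establishing that two regulators are genuinely orthogonal still requires experimental measurement of every regulator and promoter pairing.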

# **EXPANDING THE SET OF ORTHOGONAL QUORUM-SENSING PATHWAYS ENABLES DESIGN OF COMPLEX GENETIC CIRCUITS**

Synthetic circuits may be engineered to detect specific combinations of input signals so long as each sensing pathway functions independently (orthogonally) without undesired intercommunication (crosstalk). Genetic circuits designed to respond to complex combinations of environmental conditions must distinguish and integrate multiple distinct input signals. Orthogonal quorum-sensing pathways are necessary to implement complex circuits that respond to signals produced by living cells, rather than requiring synthetic, exogenous inputs. Engineered division of labor is a major research area in metabolic engineering (Bernstein et al., 2012; Vinuselvi and Lee, 2012); orthogonal quorum-sensing modules will enable further development of cell-autonomous metabolic regulation in multi-strain bioreactor systems. Quorum-sensing circuits could be used to engineer multi-strain, self-monitoring microbial populations that perform energetically expensive metabolic processes in a single culture. Multiple co-cultured strains could be designed to monitor and maintain a target population ratio, or steps in a metabolic process could be timed for accumulation of precursors (Tamsir et al., 2011).

Circuit sophistication is limited by metabolic capacity, transcription and translation resources, and crosstalk within the cell. Moon et al. (2012) pushed the bounds of single-cell computational capability by building a four-input AND gate in *E. coli*. Their complex logic gate allows living bacterial cells to express GFP in the presence of four exogenous compounds and no fewer. Transcription activator complexes were decoupled and placed under the control of distinct inducible promoters that respond to the presence of soluble compounds (arabinose, IPTG, tetracycline, and the acyl-HSL 3O-C6-homoserine lactone) in the cell culture medium. More complex circuits could be implemented by replacing the exogenous inputs in the Moon et al. system with quorum-sensing wires linking cells performing independent computation. Scaling can be achieved through modularity by building complex computational systems with simple independent components. By designing the components in separate strains and connecting them with orthogonal quorum-sensing wires, computational steps can be performed independently without exhausting cellular resources.
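The modularity argument can be illustrated by comparing a single-cell four-input AND gate with the same function split across strains. In the sketch below, the input names follow the text, but the particular split (two strains that each integrate two inputs and report over their own wire to a third strain) is a hypothetical arrangement chosen for illustration, not a published design, and it presumes the two wires are orthogonal.

```python
from itertools import product

INPUTS = ("arabinose", "IPTG", "tetracycline", "3O-C6-HSL")


def single_cell_and4(present):
    """One cell integrates all four inputs; GFP only if every input is present."""
    return all(present[name] for name in INPUTS)


def modular_and4(present):
    """Hypothetical three-strain version of the same logic.

    Strain A and strain B each compute a two-input AND and signal the result
    over separate, assumed-orthogonal quorum-sensing wires; strain C expresses
    GFP only when it receives both signals.
    """
    wire_a = present["arabinose"] and present["IPTG"]
    wire_b = present["tetracycline"] and present["3O-C6-HSL"]
    return wire_a and wire_b


# The two arrangements agree on all 16 input combinations.
for bits in product((False, True), repeat=len(INPUTS)):
    condition = dict(zip(INPUTS, bits))
    assert single_cell_and4(condition) == modular_and4(condition)
```

The equivalence holds only if the two wires do not cross-activate; with crosstalk, a signal from strain A could be misread as coming from strain B, and the split circuit would no longer compute the intended function.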

Connecting complex circuits requires orthogonal HSL networks to independently signal each strain's computation. However, using even two quorum-sensing networks in parallel is constrained by crosstalk. To our knowledge, there is no published demonstration of three or more orthogonal quorum-sensing networks in a single system. The complexity of multi-input integration circuits remains constrained by reliance on exogenous signals and by the limited number of orthogonal input–output pathways.

# **STRATEGIES FOR MINIMIZING CROSSTALK**

Promiscuous interactions between HSLs and regulators, as well as between regulators and promoters, prevent many quorum-sensing systems from operating independently and in parallel. Some have used gene-network engineering approaches to mitigate crosstalk (Brenner et al., 2007; Balagaddé et al., 2008; Tamsir et al., 2011; Wu et al., 2014). For instance, Brenner et al. (2007) engineered their system to avoid crosstalk between the Las and Rhl networks. They split the networks between two strains to eliminate promoter–regulator crosstalk and controlled HSL synthase production via a positive feedback loop to achieve a two-strain, biofilm-forming consortium.

Another approach to eliminate crosstalk between signaling pathways is using quorum-sensing pathways from distinct families (the aforementioned HSL, AI-2, and AIP pathways). Significant variance in the chemistry of the signaling molecules suggests that cross-reactivity is unlikely: HSLs contain a lactone ring with a hydrocarbon acyl or aryl tail, AI-2 is a furanosyl borate diester composed of two five-membered rings stabilized by a boron atom, and AIPs are relatively large circular peptides composed of amino acids (Chen et al., 2002; Marchand and Collins, 2013). However, this approach may be limited in its flexibility since both AI-2 and AIP require active transport and multiple proteins to generate and detect the signals. With a few exceptions, HSL networks require only two proteins and one promoter. While AI-2 quorum sensing is limited to only one signaling molecule, multiple AIP pathways may exist that do not have cross-reactivity. Marchand and Collins (2013) recently demonstrated modularity and orthogonality of two AIP signals from *Staphylococcus aureus*. In their system, *E. coli* was the AIP sender, producing and secreting two AIPs, and two engineered strains of *Bacillus megaterium* each received one of the signals but not the other. While the ability to use two AIPs in a single cell was not explored, this is a promising result and further research could demonstrate orthogonality between AIPs and HSL quorum sensing.

Directed evolution could also be used to generate regulator proteins that specifically respond to any desired HSL. Mutational analyses and 3D protein structure data have helped to identify key amino acid residues that govern the interaction between regulators and acyl-HSL ligands. Using positive and negative selection, Collins et al. (2006) generated a LuxR mutant that no longer responds to the cognate 3O-C6-HSL but gained responsiveness to C10-HSL, to which wild-type LuxR does not respond. They then demonstrated the orthogonality of LuxR wild type versus the mutant. However, directed evolution of regulator proteins to generate novel orthogonality is technically daunting and only generates mutants with minor changes to the wild-type binding pocket, limiting the range of possible novel behaviors. Furthermore, they bind to and activate the same promoter and, while this feature could be leveraged to build OR gates, they cannot be used as orthogonal networks in the same cell without further mutagenesis to alter promoter-binding specificity.

Finally, scientists could explore other microbial genomes for quorum-sensing homologs that have not yet been exploited for synthetic biology. Comparative genomics has identified dozens of HSL family (Lux-like) homologs in divergent species (Case et al., 2008). A major advantage of exploring wild-type homologs over directed evolution is that natural evolution has already "discovered" functional regulators in a very broad exploration space of amino acid sequences. Evolution has selected for regulator proteins of significantly different sizes, as opposed to artificial selection techniques that, due to practical constraints, do not deviate significantly from pre-existing primary structures.

#### **THE BASIS OF SPECIFICITY IN THE HSL SIGNALING FAMILY**

Investigations of microbial quorum-sensing pathways have revealed molecular characteristics that underlie the diversity of HSL signaling pathways in different species. These signaling pathways have been distinguished on the basis of the operator binding sites at promoters elsewhere (Vannini et al., 2002). In this review, we focus on diversity in the geometries of HSL signaling molecules and the HSL-binding pockets within the regulator proteins.

The extensive molecular diversity of naturally occurring HSL signaling molecules suggests that many functionally distinct HSLs, and thus orthogonal pathways, may exist. HSLs vary in the R-group, an acyl or aryl tail that extends from a homoserine lactone head (Dickschat, 2010) (**Figure 3**). HSL synthases have been reported to generate HSLs of varying carbon chain lengths, branching functional groups, and hydrocarbon saturation (**Figure 4**). Straight-chain acyl R-groups vary by chain lengths (e.g., C4-HSL versus C6-HSL in **Figure 3**) from 4 to 18 carbon atoms. Some acyl R-groups carry side-group replacements at the third or fourth carbon in the chain: a carbonyl group at C3 (e.g., 3O-C6-HSL), a hydroxyl group at C4 (e.g., 3OH-C6-HSL), or a methyl group at C3 (e.g., branched-chain isovaleryl-HSL). Aryl R-groups have a phenol group at C4 (p-Coumaroyl-HSL), or a phenyl group at C4 (Cinnamoyl-HSL). R-groups also differ by the degree of saturation in the carbon chain (e.g., monounsaturated 3OH-C14:1); unsaturation results in a carbon–carbon double bond, which changes the shape of the acyl tail, compared to the saturated form.

In some instances, a single synthase can produce two or more HSL variants. This variety arises from the species-specific combination of acyl tails that are carried into the HSL synthesis pathway by acyl-carrier proteins (ACPs) or aryl-CoA (Lindemann et al., 2011). Some HSL synthases display promiscuity in ACP or CoA-binding affinity and can catalyze formation of several different HSL molecules. Other HSL molecules show no overlap across species, suggesting that the cognate regulators may have evolved to respond specifically to certain HSLs, and orthogonal quorum-sensing systems may exist in nature.

Regulator proteins from the HSL quorum-sensing family (LuxR homologs) consist of two major domains: an N-terminal autoinducer (HSL) binding region and a C-terminal region that binds DNA (**Figure 5**). To visually compare the topologies of functional regions in different regulators, we have generated scaled protein domain maps using descriptions from the literature (Egland and Greenberg, 2001; Zhang et al., 2002; Bottomley et al., 2007; Chen et al., 2011) and annotations from protein domain-scanning databases UniProt (UniProt Consortium, 2014), Prosite (Sigrist et al., 2013), InterPro (Mitchell et al., 2015), and the Protein Data Bank (Berman et al., 2000) (**Figure 5**). Autoinducer-binding regions (InterPro IPR005143) contain roughly six alpha helices and six beta strands. Published 3D structures for TraR (PDB 1L3L) (Zhang et al., 2002), LasR (PDB 2UV0) (Bottomley et al., 2007), CviR (PDB 3QP6) (Chen et al., 2011), and SdiA (Yao et al., 2006) reveal that a five-strand beta sheet is sandwiched between two three-helix bundles. The C-terminal DNA-binding domains are characterized as "helix–turn–helix" (HTH) regions (Prosite PS50043) that consist of four alpha helices. The second and third helices within the HTH region are often identified as a conserved H–T–H motif (Prosite PRU00411); the third helix has been characterized as the DNA recognition helix in TraR (Zhang et al., 2002). When HSL molecules bind to their corresponding quorum-sensing regulator, they often induce multimerization of regulator proteins. This multimeric state is the active form, capable of binding an inverted DNA sequence repeat at the target promoter and inducing transcription of downstream genes.

Analysis of the HSL-regulator binding pockets suggests that the shape, size, hydrophobicity, and functionalization determine the binding affinity of a regulator for a specific HSL. This implies that comparison of known HSL–regulator interactions may identify likely candidates for orthogonal quorum-sensing networks. For example, it has been hypothesized that quorum-sensing systems that produce long, straight-chain acyl-HSLs have regulators with longer binding pockets; likewise, a system that uses acyl-HSL molecules with branching functional groups will have regulators with binding pockets that accommodate the branches (Bottomley et al., 2007). Thus, taking sterics into account, a quorum-sensing system that uses HSL molecules with a relatively short hydrocarbon tail and bulky functional groups may be orthogonal to a system that uses long-chain, non-branched HSL molecules.

Hydrophobic interactions between the HSL tail and amino acid residues within the binding pocket suggest that these binding interactions are dominated by van der Waals forces (Bottomley et al., 2007) (**Figure 3**). Because the HSL tail is buried within the binding pocket, the hydrophobicity of each component also determines the entropic stability, with a predominantly hydrophobic tail pairing stably with a predominantly hydrophobic binding pocket and a less hydrophobic tail pairing stably with a less hydrophobic binding pocket. Therefore, HSL tail and binding pocket hydrophobicity may be a predictor of orthogonality between quorum-sensing pathways.

three-dimensional (3D) structure of TraR is shown as an example of how domains and the homoserine lactone (HSL) ligand are typically positioned in space. The underlined letters in the b–b–a–a–b–a–b–b secondary structure motif indicate the location of highly conserved amino acids that form hydrogen bonds with the homoserine lactone head of HSLs. Published 3D structure data (Protein Data Bank) are listed where available ("–" = not available). Abbreviations used are: Reg. = regulator protein,

IPR005143, HTH LuxR-type = PS50043. Inferred binding pockets are patterns of secondary structures that are similar to the TraR binding pocket. Inferred recognition helices are the second alpha helix from the C-terminus. Secondary structures for proteins with no available 3D structure data were mapped using the Jpred prediction tool (Cole et al., 2008). Maps were generated using DomainDraw (Fink and Hamilton, 2007).

Pharmacophore models for HSL-regulator binding developed by Geske et al. (2008) support the idea that functionalization of the HSL molecule underlies binding pocket selectivity. Their models are based on the response of Tra, Las, and Lux regulators to libraries of HSLs and synthetic analogs in a system that used beta-galactosidase as the output gene. Comparison of the atomic geometries of ligands reveals three general properties linked with HSL efficacy: spacing of hydrophobic regions, hydrogen bond donor regions, and hydrogen bond acceptor regions within the R-group attached to the lactone ring. For instance, TraR shows the greatest response to a group of ligands in which the acyl tail contains one hydrogen bond donor region followed by two hydrogen bond acceptor regions arranged in *trans* and ended in a hydrophobic region (Geske et al., 2008).

Conservation and divergence in the conformation of regulator N-terminal HSL-binding regions support the idea that variation in HSL R-groups coordinates selective regulator–ligand interactions. Here, we explore whether motifs in the protein structures of regulators provide insight into the underlying mechanism of HSL-binding selectivity. Primary sequence alignments show 10–25% identity in regulator homologs (Bottomley et al., 2007) and therefore provide very limited information. We have attempted a more coarse-grained approach on a select set of well-characterized regulators by annotating secondary structures that correspond to the TraR binding pocket. For regulators that lack published 3D structure data, we have annotated secondary structures as hypothetical HSL-binding pockets (**Figure 5**).

The autoinducer-binding region contains two functional domains in its tertiary structure: the multimerization surface and the HSL-binding pocket. The multimerization surface of the TraR homodimer consists primarily of alpha helices a1 and a6 (Bottomley et al., 2007), plus other residues within loops that link helices and beta strands (**Figure 5**). The HSL-binding pocket binds a single HSL molecule in the space between a five-strand antiparallel beta sheet and a three-helix bundle (Bottomley et al., 2007). In the primary structure of TraR, these secondary structures are arranged in the order of b–b–a–a–b–a–b–b. The first and second alpha helix and the last beta strand of this motif (underlined in **Figure 5**) contain the amino acids that form hydrogen bonds with the homoserine lactone head of the HSL ligand. These residues are highly conserved in LuxR protein homologs, reflecting a common binding mechanism at the non-variable head regions of HSL molecules. In contrast, the variable acyl tail extends into the region of the binding pocket that is formed by residues that show less conservation in LuxR homologs, suggesting a mechanism for HSL selectivity (Bottomley et al., 2007).

TraR and SdiA are most responsive to the ligand 3O-C8-HSL (Michael et al., 2001; Geske et al., 2008). These regulators contain the same b–b–a–a–b–a–b–b pattern of secondary structures in their HSL-binding pocket domains (**Figure 5**). This pair of regulators fits the attractive idea that binding pockets with similar secondary structures may prefer the same HSL ligands. However, comparisons of other regulators challenge this idea. While some regulators that respond to HSLs with smaller or larger R-groups deviate from the b–b–a–a–b–a–b–b motif, there are others, namely RhlR, LasR, and SinR, which contain the same motif yet respond to different ligands: C4-HSL, 3O-C12-HSL, and C18-HSL, respectively (Llamas et al., 2004; Kumari et al., 2006; Geske et al., 2008). It is possible that variations in specific residues in RhlR, LasR, and SinR account for their preferences for different ligands. AubR contains a substitution of the third beta strand with an alpha helix in the b–b–a–a–b–a–b–b motif, similar to LuxR and BjaR. LuxR and BjaR respond to 3O-C6-HSL (Canton et al., 2008) and isovaleryl-HSL (Lindemann et al., 2011), respectively. Assuming that the ligand for AubR is C12-HSL [produced by AubI (Nasuno et al., 2012)], AubR, LuxR, and BjaR represent another set of regulators where similarities in secondary structures do not appear to correspond to similar ligands. Exploration of the range of HSL-responsiveness of these regulators may provide more insight into their structure–function relationships.
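The comparison in the preceding paragraph can be tabulated directly. The following is a minimal sketch rather than part of the original analysis: the motif and ligand assignments are transcribed from the text, the AubR ligand is the assumption stated there, the string `b-b-a-a-a-a-b-b` is one reading of "third beta strand replaced with an alpha helix", and the names `REGULATORS` and `ligands_by_motif` are hypothetical.

```python
from collections import defaultdict

CANONICAL = "b-b-a-a-b-a-b-b"
# Assumption: the third beta strand (fifth element) is replaced by an alpha helix.
SUBSTITUTED = "b-b-a-a-a-a-b-b"

# regulator -> (binding-pocket motif, most effective ligand), as described in the text
REGULATORS = {
    "TraR": (CANONICAL, "3O-C8-HSL"),
    "SdiA": (CANONICAL, "3O-C8-HSL"),
    "RhlR": (CANONICAL, "C4-HSL"),
    "LasR": (CANONICAL, "3O-C12-HSL"),
    "SinR": (CANONICAL, "C18-HSL"),
    "LuxR": (SUBSTITUTED, "3O-C6-HSL"),
    "BjaR": (SUBSTITUTED, "isovaleryl-HSL"),
    "AubR": (SUBSTITUTED, "C12-HSL"),  # ligand assumed, as in the text
}


def ligands_by_motif(regulators):
    """Group the distinct preferred ligands by secondary-structure motif."""
    groups = defaultdict(set)
    for motif, ligand in regulators.values():
        groups[motif].add(ligand)
    return dict(groups)


for motif, ligands in ligands_by_motif(REGULATORS).items():
    print(motif, "->", sorted(ligands))
```

Each motif maps to several distinct ligands, which is the point made above: in this small set, a shared secondary-structure pattern does not by itself predict ligand preference.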

Interestingly, no HSL-regulator protein-related motifs appear in GtaR. Leung et al. (2012) reported that GtaR regulates its target promoter (a Lux promoter homolog) in response to C16-HSL and cell-free growth medium collected from HSL-producing strains. GtaR shows sequence conservation with the TatD family of deoxyribonuclease proteins. Like the LuxR homologs, TatD proteins contain interspersed beta strands and alpha helices. Here, we have annotated predicted secondary structures within GtaR; these domains are inferred from comparisons of GtaR with closely related TatD proteins that have published 3D structures. Given its distinct arrangement of secondary structures, GtaR might represent a unique class of HSL-responsive regulator proteins.

# **CONCLUSION AND DISCUSSION**

The information we present here on the diversity of HSL molecules and regulator proteins is insufficient to conclude that the structures of regulator binding pockets and the atomic geometries of the HSL ligands imply orthogonality. Regulators that respond to distinct HSL ligands show different protein folding patterns in some cases but similar structures in others. With limited data on regulator promiscuity, secondary structure alone cannot predict HSL ligand preference; thus, interaction between specific amino acid residues and atoms within the HSL molecule may need to be considered. This investigation is limited by the lack of 3D structure data for LuxR homologs.

Many gaps in knowledge remain in understanding the extent of orthogonality or interaction between the homologous pathways in living cells. To date, the published functional studies of the quorum-sensing homologs in synthetic circuits (HSL synthases, regulators, and promoters) include just three homologous quorum-sensing pathways, or they use purified compounds to stimulate one or a few regulators. Furthermore, the available 3D structure data for regulator proteins is sparse compared to the total number of putative regulator homologs that have been identified via metagenomic analysis (Nasuno et al., 2012). More comprehensive analyses to study the responses of regulator proteins to different HSLs, such as that of Geske et al. (2008), may enable us to predict and select orthogonal pathways for use in complex synthetic systems. For instance, *E. coli* could be used as a universal host to carry dozens of decoupled sender and receiver components (**Figure 1**), derived from the genomes of various bacterial species. Culture media from sender strains could be used to stimulate receiver strains carrying a reporter driven by a receiver system (regulator protein and its corresponding promoter).
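The screening experiment proposed above would yield a matrix of receiver responses to each sender's culture medium. As a rough sketch of how such a matrix might be searched for orthogonal pairs, the code below uses invented placeholder numbers (not measured data), generic labels `R1`–`R3` and `S1`–`S3`, and arbitrary `on`/`off` thresholds; all of these are assumptions made purely for illustration.

```python
from itertools import combinations

# HYPOTHETICAL placeholder values (fold induction of a reporter), not real data.
# RESPONSE[receiver][sender_medium]
RESPONSE = {
    "R1": {"S1": 50.0, "S2": 1.2, "S3": 20.0},
    "R2": {"S1": 1.1, "S2": 40.0, "S3": 1.3},
    "R3": {"S1": 15.0, "S2": 1.0, "S3": 60.0},
}
# Each receiver's intended (cognate) sender.
COGNATE = {"R1": "S1", "R2": "S2", "R3": "S3"}


def is_orthogonal_pair(response, cognate, r_a, r_b, on=10.0, off=2.0):
    """True if both receivers respond to their own sender (>= on)
    and neither responds to the other's sender (<= off)."""
    s_a, s_b = cognate[r_a], cognate[r_b]
    return (response[r_a][s_a] >= on and response[r_b][s_b] >= on
            and response[r_a][s_b] <= off and response[r_b][s_a] <= off)


def orthogonal_pairs(response, cognate, on=10.0, off=2.0):
    """List all receiver pairs that pass the pairwise orthogonality test."""
    return [(a, b) for a, b in combinations(sorted(response), 2)
            if is_orthogonal_pair(response, cognate, a, b, on, off)]


print(orthogonal_pairs(RESPONSE, COGNATE))
```

With these placeholder values, the R1/R3 pair is rejected because each responds strongly to the other's signal, while R2 is insulated from both. Real thresholds would need to be chosen from the dynamic range of the reporter assay.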

The discovery of novel orthogonal quorum-sensing pathways will provide metabolic engineers and synthetic biologists with HSL signaling wires that do not cross-react. Using these insulated, independently functioning pathways, synthetic circuits could be designed to detect distinct combinations of multiple input signals and scale simple single-cell components to sophisticated multistrain circuits. It is imperative to continue research on quorum-sensing pathway behavior across multiple disciplines, including crystallography, molecular biology, microbiology, metabolic engineering, and synthetic biology, to fill critical gaps in knowledge that have prevented us from engineering highly sophisticated biological systems.

# **ACKNOWLEDGMENTS**

RD is supported by Achievement Rewards for College Students (ARCS). RM is supported by the ASU School of Life Sciences Undergraduate Research Program (SOLUR). RD and KH are supported by the ASU Foundation: Women and Philanthropy. The authors thank F. Wu, W. Alexander, D. Nyer, S. Hays, J. Kemper, and K. Breeden for constructive criticism and help in finalizing the manuscript.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fbioe.2015.00030/abstract

# **REFERENCES**


homologues within the LuxR family, retain the ability to function as activators of transcription. *J. Bacteriol.* 185, 7001–7007. doi:10.1128/JB.185.23.7001-7007.2003


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 28 December 2014; accepted: 21 February 2015; published online: 10 March 2015.*

*Citation: Davis RM, Muller RY and Haynes KA (2015) Can the natural diversity of quorum-sensing advance synthetic biology? Front. Bioeng. Biotechnol. 3:30. doi: 10.3389/fbioe.2015.00030*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2015 Davis, Muller and Haynes. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Signal-to-noise ratio measures efficacy of biological computing devices and circuits**

*Jacob Beal\**

*Raytheon BBN Technologies, Cambridge, MA, USA*

Engineering biological cells to perform computations has a broad range of important potential applications, including precision medical therapies, biosynthesis process control, and environmental sensing. Implementing predictable and effective computation, however, has been extremely difficult to date, due to a combination of poor composability of available parts and of insufficient characterization of parts and their interactions with the complex environment in which they operate. In this paper, the author argues that this situation can be improved by quantitative signal-to-noise analysis of the relationship between computational abstractions and the variation and uncertainty endemic in biological organisms. This analysis takes the form of a ∆SNRdB function for each computational device, which can be computed from measurements of a device's input/output curve and expression noise. These functions can then be combined to predict how well a circuit will implement an intended computation, as well as evaluating the general suitability of biological devices for engineering computational circuits. Applying signal-to-noise analysis to current repressor libraries shows that no library is currently sufficient for general circuit engineering, but also indicates key targets to remedy this situation and vastly improve the range of computations that can be used effectively in the implementation of biological applications.

#### *Edited by:*

*Karmella Ann Haynes, Arizona State University, USA*

#### *Reviewed by:*

*Ilias Tagkopoulos, University of California Davis, USA*

*Linh Huynh, University of California Davis, USA (in collaboration with Ilias Tagkopoulos)*

*Naglis Malys, University of Warwick, UK*

#### *\*Correspondence:*

*Jacob Beal, Raytheon BBN Technologies, 10 Moulton Street, Cambridge, MA 02138, USA jakebeal@bbn.com*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 19 January 2015 Accepted: 15 June 2015 Published: 30 June 2015*

#### *Citation:*

*Beal J (2015) Signal-to-noise ratio measures efficacy of biological computing devices and circuits. Front. Bioeng. Biotechnol. 3:93. doi: 10.3389/fbioe.2015.00093*

**Keywords: synthetic biology, controls, signals, digital circuits, Boolean logic, analysis**

# **1. Introduction**

Engineering biological cells to perform computations has been one of the major goals of synthetic biology from its inception (Knight and Sussman, 1998; Elowitz and Leibler, 2000; Gardner et al., 2000; Weiss, 2001). The complexity of computations that have actually been implemented, however, has been quite small (Purnick and Weiss, 2009), only quite recently rising as high as a 3-layer logic circuit comprising 6 regulatory devices (Moon et al., 2012). A number of well-known obstacles have contributed to the difficulty of building multi-element logic circuits, including insufficient numbers of strong regulatory elements for building circuits, undesirable interactions between genetic elements, difficulties in constructing and delivering large genetic constructs, and difficulty in modeling and predicting circuit behavior. A number of ongoing efforts are showing progress toward decreasing these problems in circuit engineering, promising to soon deliver many more strong regulatory elements [e.g., Bonnet et al. (2013), Kiani et al. (2014), and Stanton et al. (2014)], improved isolation between components [e.g., Lou et al. (2012) and Mutalik et al. (2013)], fast and easy construction and delivery [e.g., Weber et al. (2011) and Linshiz et al. (2012)], and better predictive circuit models [e.g., Davidsohn et al. (2015) and Beal et al. (2015)].

Among all of these improvements in our ability to engineer computational circuits, however, there are two critical and surprisingly unresolved questions:


A number of efforts have been made toward providing a clear definition for biological computational devices [e.g., Knight and Sussman (1998) and Weiss (2001)] and toward characterizing their performance [e.g., Canton et al. (2008), Ellis et al. (2009), Kelly et al. (2009), and Beal et al. (2012)]. None of these efforts to date, however, has provided a practical method for quantifying the performance of real devices and circuits that can be implemented with readily obtainable information about biological devices.

This paper aims to provide such a method, based on the mathematical foundation of a signal-to-noise ratio (SNR). The basic idea is this: although biological computation is defined in the platonic realm of abstract numbers and symbols, it must be realized in the noisier physical reality of quantities like chemical concentration. Such reality is never perfect, and a signal-to-noise ratio quantifies how much of a problem the noise is with respect to the intended representation. In electronics, signal-to-noise analysis is a foundational tool for the engineering of computation and communication; this paper now adapts this tool to the engineering of biological computation circuits.

To this end, Section 2.1 of this paper begins by reviewing these foundational concepts and adjusting their application to be suitable for biological circuits. Section 2.2 applies this to computing devices, analyzing them in terms of the degree to which they enhance or degrade signal strength under various conditions. Section 2.3 shows how SNR analysis of individual devices can be used to predict the behavior of circuits, and Section 3.1 follows the implications of these methods to develop a new framework for engineering biological circuits based on SNR analysis. Building on this framework, Section 3.2 applies SNR analysis to existing libraries of biological computational devices, finding that none are yet suitable for large-scale circuit engineering and identifying targets for improvement that may remedy this situation. Finally, Section 4 summarizes and considers future directions.

# **2. Materials and Methods**

### **2.1. Boolean Biochemical Signals**

For any biochemical implementation of Boolean values, we need to choose what physical phenomena will be interpreted as the abstract values "true" and "false." In this paper, we will focus on one of the earliest proposed (Knight and Sussman, 1998; Weiss, 2001) and most commonly used representations, in which Boolean values are represented by the concentration of particular chemical species within a cell.

Many other biological phenomena have also been proposed or used to represent Boolean values, including extracellular concentration of chemicals [e.g., Danino et al. (2009)], rate of transcription by DNA polymerase or translation by ribosomes [e.g., Canton et al. (2008)], presence, absence, or inversion of a given DNA sequence [e.g., Bonnet et al. (2013)], epigenetic markings on a DNA sequence [e.g., Keung et al. (2014)], fluorescence or light emission [e.g., Kim and Lin (2013)], and trans-membrane voltage [e.g., Adams and Levin (2013)]. For nearly all such mechanisms of biological computation, however, at some point the coupling between elements in the computation is regulated by the concentration of some chemical species. Thus, for many of these alternative representations of Boolean values, it is possible to identify an equivalent chemical representation to which the signal analysis developed in this paper can be applied.

We can evaluate the quality of a chemical concentration representation of Boolean values by comparing the distribution of concentrations per cell produced when the chemical should be in the "true" state with the distribution of concentrations per cell when the chemical should be in the "false" state. The more that these two distributions overlap, the harder it is to distinguish between them, and therefore the worse the quality of the signal and the more difficult it is to engineer an effective computation. Likewise, the more that the two distributions are separated, the better the quality and the easier it is to engineer.

In electromagnetic systems, this notion of quality is typically quantified as a signal-to-noise ratio (SNR).<sup>1</sup> Signal-to-noise ratio is normally measured on a logarithmic scale of decibels, which can be computed using the standard definition:

$$\text{SNR}\_{\text{dB}} = 20 \log\_{10} \frac{A\_{\text{signal}}}{A\_{\text{noise}}} \tag{1}$$

where *A* is the root-mean-square (RMS) amplitude of the signal and noise waveforms, respectively (Oppenheim and Willsky, 1997). Applied to a general Boolean signal, this becomes:

$$\text{SNR}\_{\text{dB}} = 20 \log\_{10} \frac{|\mu\_{\text{true}} - \mu\_{\text{false}}|}{2\sigma} \tag{2}$$

computing expected signal amplitude as half the difference between mean "true" value and the mean "false" value (i.e., approximated by the RMS amplitude of a square wave), and noise amplitude as *σ*, the mean standard deviation for true and false states (i.e., the RMS amplitude for the waveform remaining when the intended Boolean signal is subtracted).
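Equations (1) and (2) can be written out as a short sketch. The function names `snr_db` and `boolean_snr_db` and the numbers in the usage line are illustrative assumptions, not values from this paper.

```python
import math


def snr_db(a_signal, a_noise):
    """Equation (1): SNR in decibels from RMS signal and noise amplitudes."""
    return 20.0 * math.log10(a_signal / a_noise)


def boolean_snr_db(mu_true, mu_false, sigma):
    """Equation (2): the signal amplitude is half the separation between the
    mean 'true' and mean 'false' levels; the noise amplitude is sigma."""
    return snr_db(abs(mu_true - mu_false) / 2.0, sigma)


# Assumed example values: levels of 5.0 and 0.0 with a standard deviation of 0.25.
print(boolean_snr_db(5.0, 0.0, 0.25))
```

Here the signal amplitude is 2.5 and the noise amplitude is 0.25, a tenfold ratio, which corresponds to 20 dB. Equal signal and noise amplitudes would give 0 dB.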

<sup>1</sup>Note that the comparison of signal and noise distinguishes this discussion from prior investigations of gene expression noise in cells [e.g., Elowitz et al. (2002), Ozbudak et al. (2002), Rosenfeld et al. (2005), Bar-Even et al. (2006), and Friedman et al. (2006)]: these prior investigations characterize the properties of noise, but we cannot analyze the efficacy of computation without comparing such noise to an intended signal.

Superficially, it seems that the same analysis should be applicable to biochemical systems. In fact, however, this is not the case. The problem is that strong cellular expression of chemicals typically exhibits a log-normal distribution of concentration per cell [see, e.g., Friedman et al. (2006), Beal et al. (2012), Bonnet et al. (2013), Davidsohn et al. (2015), and Stanton et al. (2014)]. Even if output might be population-level, computation is currently typically carried out within individual cells, because there are currently many more intracellular than intercellular devices available. This means that both signal and noise are generally better represented using geometric statistics, implying that the signal-to-noise ratio calculation becomes:

$$\text{SNR}\_{\text{dB}} = 20 \log\_{10} \frac{|\log\_{10}(\mu\_{g, \text{true}}/\mu\_{g, \text{false}})|}{2 \cdot \log\_{10}(\sigma\_{g})} \tag{3}$$

where the *µ<sub>g</sub>* variables are the geometric means of the true and false values and *σ<sub>g</sub>* is the geometric standard deviation for both states (i.e., variation expressed in fold times/divide rather than value plus-minus).
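As a minimal sketch of Equation (3), the function below takes the two geometric means and a shared geometric standard deviation expressed in fold. The name `geometric_snr_db` and the example values are assumptions for illustration.

```python
import math


def geometric_snr_db(mu_g_true, mu_g_false, sigma_g):
    """Equation (3): SNR in dB from geometric means and a geometric SD (fold)."""
    if sigma_g <= 1.0:
        raise ValueError("geometric standard deviation must be greater than 1")
    separation = abs(math.log10(mu_g_true / mu_g_false))
    if separation == 0.0:
        return float("-inf")  # identical levels carry no signal
    return 20.0 * math.log10(separation / (2.0 * math.log10(sigma_g)))


# Assumed example: a 10-fold separation with a geometric SD of sqrt(10)-fold.
print(geometric_snr_db(1000.0, 100.0, math.sqrt(10)))
```

In this example the log-scale separation (one decade) equals twice the log-scale noise (half a decade each way), so the result is approximately 0 dB. Because only the ratio of the means enters the formula, swapping the true and false levels gives the same value.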

The SNR that is actually required depends on the application. For example, if the goal is simply to detect that a computation follows a specified truth table, this can be accomplished even when the signal is significantly less than the noise: achieving a twofold difference in signal levels in a system with twofold SD of noise requires an SNRdB of only $20 \log\_{10} \frac{|\log\_{10}(2)|}{2 \cdot \log\_{10}(2)} = -6.0$ dB. For another example, controlling cells in an industrial fermenter, such that they select the more efficient of two modes of operation based on changing local conditions, is still a fairly permissive application, since individual cells selecting the wrong choice is likely to have only a minor effect on the overall batch process, and thus might require a fairly low SNR, in the 0–5 dB range. At the opposite end of the scale, a system intended to identify and kill cancer cells inside of a human patient likely needs to have a much higher SNR, perhaps in the range of 20–30 dB, since even a small fraction of cells erroneously killing healthy cells may have a major adverse impact on the patient's health.
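Equation (3) can also be inverted to ask how large a fold separation between true and false levels a target SNR implies for a given noise level. Solving for the ratio gives *σ<sub>g</sub>* raised to the power 2·10^(SNRdB/20). The sketch below applies this rearrangement; the function name `required_fold_separation` and the choice of a twofold noise level for the loop are illustrative assumptions.

```python
import math


def required_fold_separation(target_snr_db, sigma_g):
    """Invert Equation (3): fold ratio between true and false geometric means
    needed to reach target_snr_db when the geometric SD is sigma_g (fold)."""
    exponent = 2.0 * 10.0 ** (target_snr_db / 20.0)
    return sigma_g ** exponent


# The twofold-signal, twofold-noise case from the text sits at 20*log10(0.5) dB.
print(required_fold_separation(20.0 * math.log10(0.5), 2.0))

# Separation needed at several target SNRs, assuming twofold noise.
for target in (0.0, 5.0, 20.0, 30.0):
    print(target, "dB ->", required_fold_separation(target, 2.0), "fold")
```

At 0 dB with twofold noise the required separation is fourfold, and the requirement grows very steeply with the target SNR, which illustrates why the high-SNR applications mentioned above are so demanding.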

For an example of such an SNR calculation, consider the simulated distributions shown in **Figure 1**. These distributions are generated from a log-normal process using values drawn from within the typical range of expression in mammalian cells, based on the experimental data in Beal et al. (2012, 2015), Kiani et al. (2014), and Davidsohn et al. (2015): *µ<sub>g,true</sub>* = 10<sup>6</sup> molecules of equivalent fluorescein (MEFL),<sup>2</sup> *µ<sub>g,false</sub>* = 10<sup>4</sup> MEFL, and *σ<sub>g</sub>* = 3.2-fold. Each distribution shown is a 10 bins/decade histogram of expressed fluorescence from 50,000 simulated cells.

Here, **Figure 1A** shows high expression representing a true state, while **Figure 1B** shows low expression representing a false state. The geometric means of these two distributions are nicely separated, with an approximately 100-fold ratio between the true and false levels. The cell-to-cell variation, however, is also fairly strong, with a *σ<sub>g</sub>* of more than threefold, resulting in an overall SNR of only 6.2 dB.
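A simulation of this kind can be approximated with the Python standard library. The sketch below is not the code used to produce **Figure 1**: the random seed, the helper names, and the way the two states' spreads are pooled (averaging in log space) are all assumptions. The parameter values are the nominal ones given above.

```python
import math
import random
import statistics


def simulate_cells(mu_g, sigma_g, n_cells, rng):
    """Sample per-cell expression from a log-normal distribution specified by
    its geometric mean (mu_g) and geometric standard deviation (sigma_g)."""
    return [rng.lognormvariate(math.log(mu_g), math.log(sigma_g))
            for _ in range(n_cells)]


def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation) of the values."""
    logs = [math.log10(v) for v in values]
    return 10 ** statistics.fmean(logs), 10 ** statistics.stdev(logs)


def histogram_per_decade(values, bins_per_decade=10):
    """Count values in log-spaced bins; key k covers
    [10**(k / bins_per_decade), 10**((k + 1) / bins_per_decade))."""
    counts = {}
    for v in values:
        k = math.floor(math.log10(v) * bins_per_decade)
        counts[k] = counts.get(k, 0) + 1
    return counts


rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
true_cells = simulate_cells(1e6, 3.2, 50_000, rng)
false_cells = simulate_cells(1e4, 3.2, 50_000, rng)

mu_true, sd_true = geometric_stats(true_cells)
mu_false, sd_false = geometric_stats(false_cells)
# Assumption: pool the two states' spreads by averaging them in log space.
sigma_g = math.sqrt(sd_true * sd_false)

snr_db = 20 * math.log10(abs(math.log10(mu_true / mu_false))
                         / (2 * math.log10(sigma_g)))
true_hist = histogram_per_decade(true_cells)
print(f"estimated SNR: {snr_db:.1f} dB")
```

With these nominal parameters the estimate should come out close to 6 dB, in line with the value quoted above; the exact figure depends on the random draw and on how the spread of the two states is pooled.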

Notice that the SNR value here is not very high, due to the high degree of cell-to-cell variation. Such relatively low SNRs are unfortunately rather typical for biological systems, and are an important factor in the difficulty of engineering reliable biological computations. The consequence is a low margin for error in design, putting even more importance on the quality of computing elements.
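A calculation of this kind can be reproduced in a few lines. The sketch below is not the simulation used to generate **Figure 1**; it assumes only the stated parameters (geometric means of 10<sup>6</sup> and 10<sup>4</sup> MEFL, 3.2-fold geometric standard deviation, 50,000 cells), and it combines the spreads of the two populations by averaging their log-scale standard deviations, which is an assumption of this sketch. All function names are illustrative.

```python
import math
import random
import statistics


def simulate_population(mu_g, sigma_g, n_cells, rng):
    """Draw n_cells expression values from a log-normal distribution with
    geometric mean mu_g and geometric standard deviation sigma_g."""
    return [rng.lognormvariate(math.log(mu_g), math.log(sigma_g))
            for _ in range(n_cells)]


def geometric_stats(values):
    """Return (geometric mean, geometric standard deviation) of values."""
    logs = [math.log10(v) for v in values]
    return 10.0 ** statistics.fmean(logs), 10.0 ** statistics.pstdev(logs)


def estimated_snr_db(true_cells, false_cells):
    """Estimate SNR in dB from two samples, averaging their log-scale SDs
    (an assumption of this sketch)."""
    mu_t, sd_t = geometric_stats(true_cells)
    mu_f, sd_f = geometric_stats(false_cells)
    log_sd = 0.5 * (math.log10(sd_t) + math.log10(sd_f))
    return 20.0 * math.log10(abs(math.log10(mu_t / mu_f)) / (2.0 * log_sd))


rng = random.Random(0)  # fixed seed so the run is repeatable
true_cells = simulate_population(1e6, 3.2, 50_000, rng)
false_cells = simulate_population(1e4, 3.2, 50_000, rng)
print(f"estimated SNR: {estimated_snr_db(true_cells, false_cells):.2f} dB")
```

The estimate should come out at roughly 6 dB; the exact value depends on the random sample and on how the two spreads are combined.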

# **2.2. Effects of Computation on Signal Strength**

Each computational element in a biological circuit, in addition to performing its intended purpose, also affects the signal-to-noise characteristics of the signals passing through it. An element with strong amplification and inputs that are well-matched to its range of operation will produce *true* and *false* outputs that are more distinct than the inputs, i.e., with an increased SNR. An element with poorly matched inputs or poor amplification, on the other hand, will produce outputs that are less distinct than the inputs, i.e., with a decreased SNR. We may thus summarize the "quality" of a computational element in terms of the difference between input SNR and output SNR across the various combinations of inputs with which it may be supplied:

$$
\Delta \text{SNR}_{\text{dB}} = \text{SNR}_{\text{dB,output}} - \text{SNR}_{\text{dB,input}} \tag{4}
$$

Under this definition, the higher the ∆SNRdB, the better the biological element is at implementing a computation.

In general, the effect of a computational element on SNR is not uniform, but depends on the circumstances of its use. This fact is independent of any additional biological effects of context, such as metabolic competition, toxicity, or translational read-through. Rather, it is an inherent characteristic of the nonlinear relationships between input and output found in most computational elements: different combinations of input levels and noise environment (*µg,*true, *µg,*false, and *σ<sup>g</sup>*) produce different output SNRs. The output SNR is also affected by the dynamics of a signal, e.g., how often the value of the input changes. For the analysis in this paper, however, we will focus only on converged behavior in response to a stable input.<sup>3</sup>

<sup>2</sup>MEFL units will be the unit of choice throughout this paper, as population histograms can be readily obtained experimentally in MEFL using protocols such as Beal et al. (2012), whereas other units such as concentration or number of molecules are much more difficult to obtain for large numbers of single cells at present. Using MEFL thus provides a simpler path to laboratory validation and application of the results presented.

The ∆SNRdB for a computational element can be computed from an input/output curve, i.e., a function measuring the outputs observed across a range of input levels. **Figure 2A** shows three simulated examples of such curves for repressor devices with one input and one output (note that to allow easy visualization, all examples in this paper will be restricted to one input and one output, but the methods presented work for multiple inputs and multiple outputs as well). The three example devices have input/output curves *f<sup>i</sup>* generated using Hill equations (Hill, 1910)<sup>4</sup> of the form:

$$Out = \alpha \cdot \frac{1 + K^{-1} \left(\frac{In}{D}\right)^H}{1 + \left(\frac{In}{D}\right)^H}$$

with parameters selected to place the curve within the typical observed range for current repressor devices (Bonnet et al., 2013; Kiani et al., 2014; Stanton et al., 2014; Davidsohn et al., 2015), and both *In* and *Out* concentrations expressed in MEFL. In particular, Device A (blue) uses *K* = 10<sup>3</sup>, *D* = 10<sup>5</sup>, *H* = 2, *α* = 3 × 10<sup>7</sup>, while Device B (red) uses *K* = 10<sup>2</sup>, *D* = 10<sup>6</sup>, *H* = 2, *α* = 2 × 10<sup>6</sup>, and Device C (black) uses *K* = 10<sup>3</sup>, *D* = 10<sup>5</sup>, *H* = 1.2, *α* = 3 × 10<sup>7</sup>. As can be seen in **Figure 2A**, Device A has a larger range than Device B, though their slope is similar in the transition between values; Device A also has a more similar range for its outputs and inputs. Device C, meanwhile, has similar input and output ranges to Device A, but a significantly flatter slope.
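The shape of these curves can be explored numerically. The following sketch evaluates the Hill equation above with the three stated parameter sets and reports each device's output range together with the local log-log slope at *In* = *D*. The helper names (`hill_repressor`, `log_log_slope`, `DEVICES`) and the choice of *In* = *D* as the point at which to measure slope are assumptions of this sketch.

```python
import math


def hill_repressor(inp, K, D, H, alpha):
    """Output level (MEFL) of a repressor for input level inp (MEFL).

    Reading the equation: alpha is the unrepressed output, alpha / K the
    fully repressed output, and In = D gives roughly half-maximal output.
    """
    x = (inp / D) ** H
    return alpha * (1.0 + x / K) / (1.0 + x)


def log_log_slope(inp, params, step=1e-4):
    """Numerical slope of log10(Out) versus log10(In) at the given input."""
    below = hill_repressor(inp * 10.0 ** (-step), **params)
    above = hill_repressor(inp * 10.0 ** step, **params)
    return (math.log10(above) - math.log10(below)) / (2.0 * step)


# Parameters of the three example devices as given in the text.
DEVICES = {
    "A": dict(K=1e3, D=1e5, H=2.0, alpha=3e7),
    "B": dict(K=1e2, D=1e6, H=2.0, alpha=2e6),
    "C": dict(K=1e3, D=1e5, H=1.2, alpha=3e7),
}

for name, p in DEVICES.items():
    unrepressed = hill_repressor(0.0, **p)   # equals alpha
    repressed = hill_repressor(1e12, **p)    # approaches alpha / K
    slope = log_log_slope(p["D"], p)
    print(f"Device {name}: {unrepressed:.3g} -> {repressed:.3g} MEFL, "
          f"slope at In = D: {slope:.2f}")
```

Devices A and B show nearly the same slope at their midpoints, while Device C is markedly flatter, in keeping with the description of **Figure 2A**.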

When the expression noise *σ<sup>g</sup>* is low, the ∆SNRdB for a computational element converges to a maximum determined by the difference between input range and output range. At the opposite extreme, as *σ<sup>g</sup>* continues to increase, the ∆SNRdB decreases, eventually converging to a linear slope entirely dominated by noise. For example, **Figure 2B** shows ∆SNRdB for the three example devices as a function of uncorrelated noise<sup>5</sup> for inputs with *µg,*true = 10<sup>8</sup> MEFL and *µg,*false = 10<sup>4</sup> MEFL, simulating 10 samples of 50,000 cells per sample. Notice that as expression noise decreases toward the minimal-noise limit of *σ<sup>g</sup>* = 1, the ∆SNRdB converges to an upper limit of around −2.5 dB for Device A, −6 dB for Device B, and −3 dB for Device C. As the expression noise increases, the SNR degrades as the distributions become less separated. By *σ<sup>g</sup>* = 3, the noise is having a noticeable effect on all three devices, and by *σ<sup>g</sup>* = 10 it degrades device performance by around 1.5 dB for all three devices.
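The minimal-noise limits quoted above can be approximated without any sampling. The sketch below assumes that, in the saturated regions of the curve, the output carries the same *σ<sup>g</sup>* as the input, so that the noise terms cancel and ∆SNRdB reduces to the ratio of the log-scale output separation to the log-scale input separation. This simplification is an assumption of the sketch rather than the paper's simulation, and it would overestimate performance near the steep transition region, where input noise is passed through to the output.

```python
import math


def hill_repressor(inp, K, D, H, alpha):
    """Hill-equation repressor curve, with In and Out in MEFL."""
    x = (inp / D) ** H
    return alpha * (1.0 + x / K) / (1.0 + x)


# Parameters of the three example devices as given in the text.
DEVICES = {
    "A": dict(K=1e3, D=1e5, H=2.0, alpha=3e7),
    "B": dict(K=1e2, D=1e6, H=2.0, alpha=2e6),
    "C": dict(K=1e3, D=1e5, H=1.2, alpha=3e7),
}


def delta_snr_db_minimal_noise(device, mu_true, mu_false):
    """Approximate minimal-noise delta-SNR in dB.

    Assumes input and output share the same sigma_g, so only the
    log-scale separations of the signal levels remain.
    """
    out_true = hill_repressor(mu_true, **device)
    out_false = hill_repressor(mu_false, **device)
    sep_in = abs(math.log10(mu_true / mu_false))
    sep_out = abs(math.log10(out_true / out_false))
    return 20.0 * math.log10(sep_out / sep_in)


for name, p in DEVICES.items():
    print(name, round(delta_snr_db_minimal_noise(p, 1e8, 1e4), 2))
```

For inputs of 10<sup>8</sup> and 10<sup>4</sup> MEFL this gives roughly −2.5, −6.0, and −2.9 dB for Devices A, B, and C, consistent with the low-noise limits described for **Figure 2B**.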

A good upper limit on the computational quality of a device can thus be estimated by considering the noise-free limit of its performance for various input levels. **Figure 3** illustrates this with a simulation parameter scan of *µg,*true and *µg,*false for each example device. Specifically, the scan simulates all combinations of *µg,*true > *µg,*false in the range of 10<sup>4</sup>–10<sup>8</sup> MEFL in logarithmic steps at 50 levels/decade, for each combination running one sample of 50,000 cells at a very low noise *σ<sup>g</sup>* = 1.02. Given the input/output functions for each of the example devices, the maximum output SNR at this noise level is 43.5 dB for Device A, 40 dB for Device B, and 43.1 dB for Device C. For each device, the strongest output SNR is found for input signals in the saturated regions of the device input/output curves: for Device A roughly *µg,*true > 10<sup>6.5</sup> and *µg,*false < 10<sup>5</sup>, for Device B roughly *µg,*true > 10<sup>7</sup> and *µg,*false < 10<sup>6</sup>, and for Device C roughly *µg,*true > 10<sup>7.5</sup> and *µg,*false < 10<sup>5</sup>. At the boundary of this region, the steep slope of Devices A and B allows some minor signal restoration, but outside of a relatively small "sweet spot" the output SNR degrades badly with respect to the input SNR. Device C has a similar "sweet spot" pattern, but its lower input/output curve slope means that even its best possible performance still sees a significant signal degradation of ∆SNRdB = −1.6 dB.

<sup>3</sup>Such static analysis is expected to be a reasonable first approximation of cellular behavior with strong signals, given the apparent dominance of extrinsic vs. intrinsic noise under such conditions per Elowitz et al. (2002) and Rosenfeld et al. (2005).

<sup>4</sup>Hill equations simulate regulated production, not concentration, but in steady state (as we consider here) the two are linearly related by a constant.

<sup>5</sup>Correlations in expression noise can shift the results slightly, but the overall trends remain the same.

Such a ∆SNRdB chart can provide a good first analysis of the efficacy and operating range of a device. For example, with our three example devices, Device A has a decent range of potential use, while Device B is much narrower, and Device C, although it has a very strong on/off ratio, significantly degrades signal strength even under ideal conditions of usage.

In practice, of course, there is typically a significant level of expression noise, which further degrades the SNR characteristics of a device. With measurements of the expected *σ<sup>g</sup>* for a device (which can be readily obtained through high-throughput per-cell assays such as flow cytometry or microscopy with automated image analysis), we can apply the same SNR analysis to estimate the actual expected ∆SNRdB, which will always be overall worse (more negative) than in the ideal minimal-noise condition. **Figure 4** shows an example of such an analysis for Device A with *σ<sup>g</sup>* = 3, a typical level of observed expression noise (simulated using the same parameters as before). Notice that the essential character of the chart is not changed, meaning that the conclusions drawn from the maximum SNR analysis still apply. All of the features of the SNR chart, however, are more "blurred," degrading the regions of high performance. Ironically, this also somewhat mitigates the regions of worst performance, but performance in these regions is still generally too poor to be useful.
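The effect of expression noise at a single pair of input levels can be illustrated with a small Monte Carlo sketch. The noise model here is an assumption, not the paper's stated procedure: each cell's input is drawn from a log-normal distribution, passed through the Device A curve, and then multiplied by independent log-normal output noise with the same *σ<sup>g</sup>*. The SNR of each sample pair is estimated using the mean of the two log-scale standard deviations, and the cell count is reduced to 20,000 to keep the run short.

```python
import math
import random
import statistics


def hill_repressor(inp, K, D, H, alpha):
    """Hill-equation repressor curve, with In and Out in MEFL."""
    x = (inp / D) ** H
    return alpha * (1.0 + x / K) / (1.0 + x)


DEVICE_A = dict(K=1e3, D=1e5, H=2.0, alpha=3e7)


def sample_snr_db(true_vals, false_vals):
    """SNR in dB from two samples, using the mean of their log10 SDs."""
    logs_t = [math.log10(v) for v in true_vals]
    logs_f = [math.log10(v) for v in false_vals]
    separation = abs(statistics.fmean(logs_t) - statistics.fmean(logs_f))
    log_sd = 0.5 * (statistics.pstdev(logs_t) + statistics.pstdev(logs_f))
    return 20.0 * math.log10(separation / (2.0 * log_sd))


def simulated_delta_snr_db(device, mu_true, mu_false, sigma_g, n_cells, rng):
    """Monte Carlo delta-SNR: noisy inputs pass through the device curve,
    then independent output noise of the same sigma_g is applied."""
    s = math.log(sigma_g)

    def population(mu):
        return [rng.lognormvariate(math.log(mu), s) for _ in range(n_cells)]

    def device_output(inputs):
        return [hill_repressor(x, **device) * rng.lognormvariate(0.0, s)
                for x in inputs]

    in_true, in_false = population(mu_true), population(mu_false)
    snr_in = sample_snr_db(in_true, in_false)
    snr_out = sample_snr_db(device_output(in_true), device_output(in_false))
    return snr_out - snr_in


rng = random.Random(1)  # fixed seed so the run is repeatable
for sigma_g in (1.02, 3.0, 10.0):
    d = simulated_delta_snr_db(DEVICE_A, 1e8, 1e4, sigma_g, 20_000, rng)
    print(f"sigma_g = {sigma_g:>5}: delta SNR = {d:.2f} dB")
```

Under these assumptions the ∆SNRdB becomes more negative as *σ<sup>g</sup>* grows, because a growing fraction of cells have inputs that stray into the transition region of the curve.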

Computation of ∆SNRdB can be used as a first stage of triage in analyzing whether a given biological device will be useful in attempting to realize digital computations. First, a device cannot be used at all unless it has both a ∆SNRdB that is sufficient to meet application SNR requirements, and also can achieve that ∆SNRdB requirement in a range matched with its inputs [such evaluation has the obvious pre-requisite of characterizing the input/output relation using SI units rather than relative units, e.g., by means of the protocols in Beal et al. (2012, 2015), Davidsohn et al. (2015), and Kiani et al. (2014)]. Beyond that, the wider the region of good ∆SNRdB, the easier it will be to match a device with others to form a circuit and the more tolerant a device will be of other types of perturbations inflicted by its context of deployment.

# **2.3. Multi-Device Computational Circuits**


Just as the computational efficacy of a single biological device may be analyzed in terms of its signal-to-noise characteristics, so can the same approach be applied to analyzing the computational efficacy of an entire computational circuit. The complete circuit can, after all, be viewed as just a more complicated device, and the SNR for its inputs and outputs be computed in the same way as for a single device.

The converged SNR characteristics of a circuit with no feedback loops can be predicted using the single-device SNR charts presented in the previous section. As has recently been demonstrated (Davidsohn et al., 2015), the mean and expression variation of such circuits can be predicted with high accuracy. Given such predictions, the maximum possible ∆SNRdB can be predicted directly from the input signal levels, using the input-output curves and ∆SNRdB analyses for the individual devices. For example, consider a chain of repressors, acting as logical inverters. For the *i*th inverter in the chain, its output is given by its input/output function *f<sup>i</sup>*, producing the input for the next stage. In the minimal-noise case, the SNR changes at each step are independent, meaning that they add linearly. The ∆SNRdB for the circuit can thus be computed by composing together input-output curves to predict the inputs for each device, then summing for each device *i* the device ∆SNRdB*,i* along the path from input to output. This produces a total end-to-end change of:

$$\begin{aligned} \Delta \text{SNR}_{\text{dB,total}} &= \Delta \text{SNR}_{\text{dB},1}(\mu_{g,\text{true}}, \mu_{g,\text{false}}) \\ &+ \Delta \text{SNR}_{\text{dB},2}(f_1(\mu_{g,\text{false}}), f_1(\mu_{g,\text{true}})) \\ &+ \Delta \text{SNR}_{\text{dB},3}(f_2(f_1(\mu_{g,\text{true}})), f_2(f_1(\mu_{g,\text{false}}))) + \dots \end{aligned} \tag{5}$$

The overall efficacy of a circuit is thus a function of both the SNR properties of individual devices and how well signal levels are matched between devices. As seen in the previous section, positive SNR ranges may often be quite narrow, and even a relatively small mismatch can be disastrous for the efficacy of a computation.
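The composition of stages can be sketched for a chain of identical repressors. As a simplifying assumption, each stage's ∆SNRdB is approximated by the minimal-noise ratio of log-scale output separation to log-scale input separation (taking *σ<sup>g</sup>* to be the same at every stage), and the per-stage values are then summed as in Eq. 5. The function name `chain_delta_snr_db` and the choice of 10<sup>8</sup> and 10<sup>4</sup> MEFL as inputs are illustrative.

```python
import math


def hill_repressor(inp, K, D, H, alpha):
    """Hill-equation repressor curve, with In and Out in MEFL."""
    x = (inp / D) ** H
    return alpha * (1.0 + x / K) / (1.0 + x)


DEVICES = {
    "A": dict(K=1e3, D=1e5, H=2.0, alpha=3e7),
    "B": dict(K=1e2, D=1e6, H=2.0, alpha=2e6),
}


def chain_delta_snr_db(device, mu_true, mu_false, stages):
    """Per-stage and total minimal-noise delta-SNR (dB) for a chain.

    Each stage's outputs become the next stage's inputs, and the
    per-stage changes are summed along the path from input to output.
    """
    level_a, level_b = mu_true, mu_false
    per_stage = []
    for _ in range(stages):
        out_a = hill_repressor(level_a, **device)
        out_b = hill_repressor(level_b, **device)
        sep_in = abs(math.log10(level_a / level_b))
        sep_out = abs(math.log10(out_a / out_b))
        per_stage.append(20.0 * math.log10(sep_out / sep_in))
        # An inverter swaps which level is high; abs() above handles this.
        level_a, level_b = out_a, out_b
    return per_stage, sum(per_stage)


for name in ("A", "B"):
    for stages in (1, 2, 3, 4):
        _, total = chain_delta_snr_db(DEVICES[name], 1e8, 1e4, stages)
        print(f"Device {name} x{stages}: {total:.2f} dB")
```

With these inputs the total for Device A changes little after the first stage, whereas for Device B the second stage loses more than the first, so that the two-stage total is worse than twice the single-stage value.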


For example, **Figures 5A–D** show the ∆SNRdB for chains of one to four repressors, each with the characteristics of Device A. This is a nice example of (potentially) effective digital computation: Device A is strong enough to restore signal and has a fairly good match between its input and output ranges. As a result, any input starting in a fairly broad region of the upper left can maintain a strong SNR over multiple stages of computation – in fact, an unbounded number in the absence of noise. Inputs falling outside of this good operating range, however, quickly degrade away to very low SNR.

This presages the problems that occur when the output and input levels of devices are not as well matched (or less strong, which makes for a smaller "sweet spot" and more difficulty in matching levels). For example, **Figures 5E–H** show chains of one to four repressors with the characteristics of Device B. Although its performance characteristics are not much worse than Device A for a single repressor (as seen in the previous section), the poor match with a narrow high-SNR "sweet spot" means that ∆SNRdB collapses when a second repressor is added – much worse than twice the original ∆SNRdB – and continues to degrade thereafter. Indeed, the "least bad" region is where the high and low inputs hold almost the same value to begin with, meaning there is little signal to be lost in the first place.

With a good match between signal levels but not a steep enough slope of the input/output curve, there is a third mode of behavior. This is exemplified by **Figures 5I–L**, which show chains of one to four repressors with the characteristics of Device C. Without a region of positive ∆SNRdB, the signal cannot be sustained, but degrades incrementally. With devices of this sort, it is impossible to implement many-layer computations, but computations with only a few devices between any input and output are viable.

As with individual devices, of course, the minimal-noise model gives only a best-case evaluation of the computational efficacy of a circuit. This is still valuable, because it can be used to eliminate many non-viable options and to triage viable options based on the difficulty of attaining the ∆SNRdB required for an application.

Just as with individual devices, however, we can use the same signal-to-noise models to estimate the performance of a circuit with higher *σg*. As before, the best ∆SNRdB is expected to be less than can be achieved in a minimal-noise circuit, though some of the worst performance can be mitigated. Unlike the minimal-noise case, however, we cannot precisely predict performance of the circuit by adding single-device SNR losses. At higher levels of expression noise, SNR losses are not independent because the operation of each device affects the effective *σ<sup>g</sup>* observed by the devices that consume its output. We can, however, estimate a conservative lower bound on performance by adding single-device SNR losses. For example, **Figure 6** shows ∆SNRdB for the Device A repressor chains with *σ<sup>g</sup>* = 3. **Figures 6A–D** estimate the value from the ∆SNRdB of Device A with *σ<sup>g</sup>* = 3 shown in **Figure 4A**, while **Figures 6E–H** simulate chains of Device A using the same parameters as in Section 2.2 (*K* = 10<sup>3</sup>, *D* = 10<sup>5</sup>, *H* = 2, *α* = 3 × 10<sup>7</sup>, *σ<sup>g</sup>* = 3, 50,000 cells per sample). As expected, these show that the estimate from individual devices is a good lower bound on the performance that can be attained from the device under conditions of noise, with the actual simulated performance being somewhere above that and below the minimal-noise performance.

# **3. Results**

# **3.1. Implications for Biological Circuit Engineering**

Let us now consider how the engineering of biological circuits can be assisted by these models. We must, however, remember that having a strong predicted SNR for a circuit will not ensure that a biological circuit computes effectively, any more than using standard TTL components will ensure that an electronic circuit computes effectively: there are many other types of problems that also might interfere with the desired behavior. Importantly, though, having a strong SNR (both for the overall circuit and in the ∆SNRdB at individual devices) does mean there is more margin for error in dealing with these other aspects of circuit engineering. Complementarily, an insufficient predicted SNR is a virtual guarantee that the circuit will not work. SNR analysis may thus be expected to be a useful tool for discriminating between possible circuit design alternatives.

As seen in the previous sections, in order to apply SNR analysis, it is necessary to have the following characterization data for each computational device:


To apply SNR analysis to a circuit requires the following additional information:

*•* The topology of the circuit, specifying the interconnections between device inputs and outputs.


Given a library of characterized devices and a circuit specification, it is then possible to search for good candidate circuits. The best candidates should go beyond satisfying output SNR requirements and maximize output SNR, in order to have the most margin for dealing with other engineering difficulties. With a homogeneous library of devices with very similar behavior [e.g., as CRISPR-based repressors appear likely to provide (Kiani et al., 2014)], circuit viability can be determined by a straightforward application of the SNR analysis presented in the prior section and devices assigned arbitrarily. With a more heterogeneous library, [e.g., the TetR homolog library in Stanton et al. (2014)], different combinations of devices will have different properties, but the design problem should still be susceptible to efficient search with any number of well-established constrained-search methods (Russell and Norvig, 2003).
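As a toy illustration of such a search, the sketch below enumerates every assignment of device types to a three-stage repressor chain and ranks the assignments by end-to-end ∆SNRdB. Several things here are assumptions made for illustration: the "library" is just the three simulated example devices from Section 2.2, a type may be reused at several positions, scoring uses the minimal-noise approximation (ratio of log-scale separations), and exhaustive enumeration stands in for the constrained-search methods mentioned above, which would be needed for realistically sized libraries.

```python
import itertools
import math


def hill_repressor(inp, K, D, H, alpha):
    """Hill-equation repressor curve, with In and Out in MEFL."""
    x = (inp / D) ** H
    return alpha * (1.0 + x / K) / (1.0 + x)


# Hypothetical library: the three simulated example devices.
LIBRARY = {
    "A": dict(K=1e3, D=1e5, H=2.0, alpha=3e7),
    "B": dict(K=1e2, D=1e6, H=2.0, alpha=2e6),
    "C": dict(K=1e3, D=1e5, H=1.2, alpha=3e7),
}


def chain_total_delta_snr_db(device_names, mu_true, mu_false):
    """End-to-end minimal-noise delta-SNR (dB) for a chain of devices.

    In this approximation the per-stage changes telescope, so only the
    initial and final log-scale separations are needed.
    """
    level_a, level_b = mu_true, mu_false
    for name in device_names:
        level_a = hill_repressor(level_a, **LIBRARY[name])
        level_b = hill_repressor(level_b, **LIBRARY[name])
    sep_in = abs(math.log10(mu_true / mu_false))
    sep_out = abs(math.log10(level_a / level_b))
    return 20.0 * math.log10(sep_out / sep_in)


def rank_chains(depth, mu_true, mu_false):
    """Score every assignment of library devices to a chain, best first."""
    scored = [(chain_total_delta_snr_db(combo, mu_true, mu_false), combo)
              for combo in itertools.product(sorted(LIBRARY), repeat=depth)]
    return sorted(scored, reverse=True)


ranking = rank_chains(3, 1e8, 1e4)
print("best: ", ranking[0])
print("worst:", ranking[-1])
```

A fuller version would add the per-device *σ<sup>g</sup>*, the application's required output SNR as a constraint, and orthogonality constraints between devices sharing a cell.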

<sup>6</sup>Technically, relative units could also be used [e.g., RFUs, per Kelly et al. (2009)], but the lack of SI measurements means that it is much more difficult to debug any problems that arise, particularly with regard to differences between practitioners or laboratories.

More important to the success of circuit engineering are the SNR characteristics of the devices in the library. The three circuit examples in the previous section are characteristic of three general qualitatively different "phases" of expected difficulty in engineering biological circuits. These phases are predicted by considering the selection of devices from a library as a search process, as proposed by Beal (2014). The behavior of such a search process is critically affected by the degree of coupling between design choices (i.e., the likelihood that two independent choices are incompatible), as has been well-established in complexity theory (Cheeseman et al., 1991; Hogg et al., 1996) and statistical physics (Krzakala and Kurchan, 2007; Dall'Asta et al., 2008; Zdeborová, 2008). In this case, the degree of coupling is determined by the likelihood of two devices having an output/input match with a high ∆SNRdB, leading to three qualitatively different expected engineering environments:

*• Difficult circuits*: when most biological devices in a library are either weak or poorly matched (e.g., having characteristics like Device B), it is difficult to discover a working combination of components even in the best circumstances.

Engineering computational circuits using such devices is expected to be characterized by extensive and lengthy "tuning" and many failed attempts, since even small perturbations in device characteristics (e.g., from the biological operating context) can result in massive SNR losses.

*• Shallow circuits*: when many biological devices in a library have a large region of small negative SNR (e.g., having characteristics like Device C), it is easy to find acceptable matches, but there is still significant signal loss at every device.

Engineering computational circuits using such devices is expected to be relatively simple for circuits up to a certain depth, because there is tolerance for small perturbations and many good candidates for working circuits. When the circuit requires more depth than can be readily attained while maintaining sufficient SNR, however, this strain raises the effective coupling and it will be extremely difficult to engineer an effective circuit of such depth, just as in the prior case.

*• Deep circuits*: when many biological devices in a library have a well-matched region with positive SNR (e.g., having characteristics like Device A), it is easy to find combinations of devices where signals do not degrade from layer to layer.

Engineering computational circuits using such devices is expected to no longer be constrained by issues of computational efficacy: in principle, circuits of any depth and complexity can be readily engineered, and limits instead come from other aspects of the biological implementation, such as circuit delivery and demand on cellular resources.

Analysis of circuit and library SNR characteristics can determine which of these engineering environments we are operating in. Note, however, that there are no "hard" boundaries between phases: rather, as SNR characteristics improve, there is a gradual shift in the dominating engineering constraint from signal matching to signal degradation to non-signal constraints (with concomitant conclusions that can be drawn about the likely difficulty of circuit engineering). Unfortunately, knowing for certain if we are in trouble, while useful, does not actually make it any easier to engineer circuits. Quantification of SNR characteristics can, however, point to what target properties need to be achieved in order to move to a better engineering environment.

# **3.2. Prospects for Deep Circuit Libraries**

Given the widely observed difficulty of engineering biological systems [e.g., Kwok (2010)], it seems intuitive to guess that synthetic biology is currently operating in the "difficult circuits" regime. By applying SNR analysis to current high-efficacy device families, we can verify that this is actually the case. More importantly, however, we can also estimate approximately how far these device families are from the "shallow circuit" or "deep circuit" regimes, and what changes would be likely to allow them to attain those goals. When analyzing some properties of some device families, the relevant device characteristics are well enough known to allow rough quantification of requirements; in other cases, only qualitative conclusions can be drawn at present.

At present, there are several families of biological computational devices with the prospect of producing large numbers of universal logic devices with a high differential between output signal levels. The strongest current candidates are homolog mining, integrase logic, TALE and zinc finger repressors, and CRISPR-based repressors, each of which we discuss in detail. Other promising candidates include miRNA, aptamers, RNA-binding proteins, riboregulators, and protein/protein regulation, but all of these currently face various obstacles that mean they appear to be significantly farther away from providing large families of strong universal logic devices. As these technologies continue to mature, however, the same type of analysis presented in this section can be applied to them as well.

# 3.2.1. Homolog Mining

The TetR repressor is a naturally occurring strong repressor that has been used successfully in many systems. Genomic mining for TetR homologs has produced a library of 20 orthogonal repressors, many of them with quite strong on/off ratios (Stanton et al., 2014). Each repressor has also been characterized with a high-resolution input/output curve (though only in relative units), and the models for these input/output curves are published in Stanton et al. (2014). **Figure 7** shows parameter scans of ∆SNRdB for a wide range of input level combinations for all 20 devices, using *σ<sup>g</sup>* = 2.0 as a conservatively low estimate of a typical value of bacterial expression noise, as estimated from the histograms reported in Stanton et al. (2014) and the noise values reported in Ozbudak et al. (2002). Parameter scans are performed as in Section 2.2, except shifted to the relative unit range of the devices (10<sup>−2</sup>–10<sup>2</sup> relative units) and performed more coarsely, at five levels/decade. A summary of the results is given in **Figure 8A**, which lists the maximum ∆SNRdB for each device, along with the on/off ratio reported in Stanton et al. (2014). Of the 20 reported gates, only 4 have the positive ∆SNRdB that is a pre-requisite for deep circuits. Somewhere around another 5–10 are likely to have sufficiently strong ∆SNRdB for shallow circuits, given their relatively high amplification and moderate signal loss. Since the library is highly heterogeneous (**Figure 8B**), signal matching must be done on a circuit-by-circuit basis. One significant challenge that is clear from the input/output functions, however, is that few devices have an output *σg*,false low enough to achieve the optimal ∆SNRdB input; the mismatches between devices can thus be expected to lower the effective ∆SNRdB that can be achieved for any circuit.

Nevertheless, this library is the closest currently in existence to supporting deep circuits. Key targets for developing that capability are to further expand the library by additional mining, to calibrate the input/output curves to SI units, and to adjust the signal levels to better match, likely by decreasing output expression via 5*′*UTR modifications.

# 3.2.2. Integrase Logic

Integrase logic gates, which operate by inverting segments of DNA, have been demonstrated to produce input/output curves with a very high amplification in their transition between high and low output (Bonnet et al., 2013). No model parameters were included in the publication, but the very steep slope of the curves makes it clear that these devices should have a high maximum ∆SNRdB. This is tempered, however, by a significant number of cells that do not change state, leading to a ∆SNRdB that appears to be net negative rather than net positive.


At present, however, these integrase logic gates have quite poorly matched input and output signal levels. In addition, to date, very few have been demonstrated: it is reasonable to expect that many more might be discovered through homolog mining, though the availability of usable naturally occurring orthogonal integrases is not yet clear. Key targets for expanding this technology into a library capable of deep computation are genomic mining to expand the number of devices, calibration of the input/output curves to SI units, and adjustment of the signal levels to better match, likely by decreasing output expression via 5*′*UTR modifications.


# 3.2.3. TALE and Zinc Finger Repressors

TALE proteins are modular DNA-binding proteins that can be engineered to bind to specific sequences with high specificity. Coupled with appropriately designed promoters, TALE proteins have been used to implement extensible libraries of strong promoters (Garg et al., 2012; Davidsohn et al., 2015; Li et al., 2015). TALE repressors can produce remarkably strong repression [measured at a maximum of nearly 5000-fold repression in Garg et al. (2012)]. Detailed input/output curves taken for TALE repressors in Davidsohn et al. (2015) and Li et al. (2015), however, have found a poor slope and uncertain match between input and output levels, implying a poor ∆SNRdB for composed TALEs – consistent with the low input/output differential observed in the composite circuits investigated in that paper.

At present, TALEs are thus viable only for implementing very shallow circuits with low SNR. One likely path for increasing their potential depth is to increase repression strength by adjusting the synthetic promoter architectures used for TALE repressors. Given the level of deamplification observed in circuits in Davidsohn et al. (2015) and Li et al. (2015), an approximately 10-fold increase would likely be sufficient and may be attainable through this approach. Another possibility might be to heighten cooperativity (steepening the input/output curve) by changing the TALE to a fusion protein. Furthermore, the characterization in Davidsohn et al. (2015) was of transient rather than converged operation (i.e., fluorescence levels were still changing over time, rather than having reached a stable level of expression), and it is possible that TALE repressors may have a significantly steeper input/output curve when converged.

Zinc finger repressors are a very similar modular protein technology, which has also been demonstrated to produce strong orthogonal repressors [e.g., Khalil et al. (2012) and Lohmueller et al. (2012)]. No detailed input/output curves of these strong repressors have been produced to date, so obtaining input/output curves in SI units is the first key step to evaluating the viability of zinc finger repressors as a library. Given the similarity in promoter architectures used in the two technologies, however, it seems likely that they will face similar challenges to TALE repressors.

# 3.2.4. CRISPR-Based Repressors

CRISPR-based repressors are a recent addition to the set of candidate libraries (Kiani et al., 2014), based on a protein that can be targeted with high specificity by a separately expressed sequence of guide RNA (gRNA). Like TALE and zinc finger repressors, they have shown very high repression strength, and may be significantly more homogeneous and easier to engineer with, since the sequences are much shorter and do not involve any protein design. They have not yet had detailed input/output curves measured, however, and what characterization has been done to date has been of transient rather than converged behavior, as with TALE repressors.

For this family, the clear next step toward deep computation is to determine the SNR characteristics of the components, though this is complicated by their current use of a Pol III promoter to express gRNA, which is not compatible with the fluorescent proteins typically used for characterization. If the CRISPR-based repressors prove to have a steep slope in their converged behavior, then their SNR may already be sufficient for deep circuits; otherwise, they will likely require similar promoter engineering to TALE and zinc finger devices.

All told, we see that the current situation of synthetic biology is one of difficult circuit engineering. Even though some devices provide good SNR, there are not enough and there is not enough compatibility to reliably support engineering of either shallow or deep circuits. Other devices may also provide good SNR, but require characterization before this can be determined and, if true, effectively exploited. For all of these families of devices, however, the SNR approach identifies key targets for improvement that appear to be reasonable to aim for and that offer the prospect of enabling deep circuit engineering and the transformative capabilities that would imply.

# **4. Discussion of Contributions**

This paper has developed methods for characterizing the efficacy of biological computing devices and circuits based on signal-to-noise ratio. This approach has the advantage of being firmly mathematically grounded in the fundamental definition of a signal, and can be applied using readily obtainable characterization data. This paper has also illustrated the use of SNR methods by applying them to analyze individual devices and predict the behavior of circuits in simulation, as well as to develop a framework for SNR-based circuit engineering. Finally, an SNR-based analysis of current device libraries indicates that, while no library is yet sufficient to support deep biological circuits, several may be able to if particular targeted improvements can be realized.


One important direction for further development of this method is to extend it to a broader range of circuits and behaviors. Although this paper considered only static analysis of combinational Boolean logic circuits, there is no reason to think the method cannot be extended to feedback circuits, analog circuits, and dynamic behavior of circuits. Another important direction is verification of the analysis and predictions made in this paper in the laboratory. This paper has also made specific predictions about particular targeted improvements to existing device libraries that should enable the engineering of deep biological circuits. In parallel with the progression of these other efforts, SNR methods are largely complementary to the methodologies considered by the various prototype higher-level genetic circuit design tools [e.g., Myers et al. (2009), Beal et al. (2011), Bilitchenko et al. (2011), Marchisio and Stelling (2011), Yaman et al. (2012), and Huynh et al. (2013), to name a few], and have the potential to improve their operation by improving the metrics used by such tools for evaluating various design options. Investment to realize these improvements may thus have a revolutionary effect on the capabilities of synthetic biology, by enabling rapid engineering of complex computation and control circuits.


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Beal. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Andrés Moya<sup>1,2,3</sup>\***

<sup>1</sup> Cavanilles Institute of Biodiversity and Evolutionary Biology, University of Valencia, Valencia, Spain

<sup>2</sup> Valencian Region Foundation for the Promotion of Health and Biomedical Research (FISABIO), Valencia, Spain


#### **Edited by:**

Pablo Carbonell, University of Evry, France

#### **Reviewed by:**

Daniel James Nicholson, University of Exeter, UK

**Keywords: natural entities, artificial entities, metabolism, evolution, machines**

Oftentimes, topics that might fall outside of science's remit seem to end up becoming a part of it, sooner or later. This appears to be the case of synthetic biology, a new biological science (although some maintain that it is a form of engineering, or treat it as such; Endy, 2005), which seems to have become essential to the understanding of living beings and their extreme manipulation. I believe it to be a new form of biology. In truth, synthetic biology has a long history and, conceptually speaking, may well have formed part of the interests and research efforts of our illustrious predecessors throughout the first half of the twentieth century and even earlier. In any event, and broadly speaking, it may be asserted that such efforts were premature and that the state of the science, at that time, did not allow for progress, in terms of the modification, creation, or recreation of organisms or parts thereof that we have today. Now, it has rather come of age, which is why I think of synthetic biology as a young or new biological science (Moya, 2014).

To a great extent, when you talk about synthetic biology, what you are doing, first and foremost, is making a statement of intent, if not expressing your concern, wish, or hope that any biological organism which this discipline might examine should be completely under control, and that it should not deviate even the slightest bit from the role that has been ascribed to it. Of course, if some achievements had not already been made in the field of synthetic biology, clinging to a mere statement of intent would do little to further its chances of survival in the future. If some sciences make headway, it is because some of their initial achievements thrust fame, awareness, and recognition upon them, in the eyes of the scientific community and the rest of society. Here is where we start to get to grips with wishes and hopes for these new sciences and their potential. This is what is happening with synthetic biology.

Fundamentally, the synthetic biologist aspires to make manufactured biological entities behave just as a car might, once it has been put together on an assembly line. The metaphor of the car is as valid as that of any other mechanical entity, or any other type of entity, with perhaps the only condition being that they should all be put together on an assembly line. A car, effectively, begins to take shape on an assembly line and ends up having a specific physical form. It comprises many different components and each one has a specific function, as designated by the manufacturer. Together, its components combine to form a mechanical entity that functions in a pre-determined way because of prior knowledge of the manner in which each component works, and the manufacturer puts them all together following a predetermined plan, so that the whole may function as desired. Attempting to make a manufactured biological entity function in the same way as a car leads us to make two relevant observations. To begin with, there is the obsolescence of the entity, then there is intervention in the entity itself. The nature of these properties differs in cells when they are compared to cars.

First, let us take a look at the question of obsolescence. As everyone knows, a car has a limited lifespan. Its constituent parts, in particular those that are essential in order for it to run, become worn down, inevitably, until the materials [from which they are made] break down and become altered through the contact of some parts with others, or on account of the various reasons for which the car might cease to run. I refer not to the failure of the car to run, on account of damage incurred following an accident, but quite simply to wear and tear, rendering it unable to perform the purpose for which it has been intended. To what degree might this metaphor apply to the biological entity, in general, and the synthetic-biological entity, in particular? In truth, the following question might be asked, equally, of the biological entity and the car. Might it suffer some type of wear and tear that causes it either to stop functioning or to function in a different way before, for example, reaching the culminating point of its division or reproduction? In other words, before it can reproduce itself? This is, in effect, the case. Let us take, for example, an individual biological entity: a microorganism or a cell from a multicellular organism which, on account of its specialization, could be a germ cell – the purpose of which is the reproduction of the organism – or a non-germ cell, which will subdivide through mitosis, creating copies of itself. In the case of the microorganism, or the non-germ cells, there comes a point at which they divide, producing two genetic copies of themselves; however, up until that point, they have undergone transformation processes, ending with metabolism, that have altered them in relation to what they might have become in the moment that they were generated by their respective parent cells. It is generally believed that it is only at the point of division (reproduction), when genetic changes occur, that they are at their most relevant, when considering any possible transformation. These changes manifest themselves once they start to exhibit genetic differences from their progenitors; however, what I am talking about here are previous changes in the metabolic machinery. The cell is transformed, grows, and, before multiplying, changes. Consequently, this is not just about genetic changes; it is also about changes in the cell's metabolism (de Lorenzo, 2014). Unlike cars, which cannot avoid wear and tear, cells undergo metabolic changes to avoid obsolescence, up to a point. A microbial cell is usually said to be immortal, in that before it ceases to exist in that form, it has already divided. And in a multicellular organism, with specialized reproductive cells, it is usually these cells themselves that join with the reproductive cells of other organisms, having previously undergone meiosis, in order to reproduce. Yet, all of these different types of cells – the immortal microbial cells, the germ cells that have the capacity to reproduce, and all the other non-germ cells that undergo mitosis – transform metabolically, before reaching their respective points of division and reproduction. And these changes, which are quite noticeable and dramatic, can be crucial. So much so that they may end the very life of the corresponding entities before division or reproduction can take place. All of these cells have a lifespan that has been optimized, through evolution, to ensure that they reach the point of division or reproduction, whichever is relevant. We can, it seems, only take the metaphor of the cell as car so far.
There is something that does not quite fit, because eventually cars become obsolete and end up in the scrapyard, whereas cells, even though they have a limited lifespan, and undergo transformation and decomposition eventually, are able to reproduce. They arrive, it would seem, at these stages, with a certain degree of autonomy. The higher autonomy displayed by living systems has its basis in their internal organizational dynamics, that is, in the fact that cells (and organisms in general) are self-maintaining, self-organizing, self-repairing, and self-reproducing systems (Nicholson, 2013). It is these properties that confer on organisms a far greater degree of functional autonomy when compared to machines. The metaphor of the car would only be valid if it were possible to extend the lifespan of its component parts and if, by means of systematic repair, the car continued to be the same car that rolled off the assembly line. In order for the car to remain the same, intervention must take place. And here is where we come to our second observation: the biological entity – in order to remain the same and reach the point of division or reproduction – effects its own intervention, regulating itself by means of its metabolism. All of evolution – more than 3000 million years of ceaseless effort – has conspired to ensure that this self-driven intervention of single-cell organisms, which divide, and multicellular organisms, which have cells that divide and others that reproduce, is effective; in other words, ensuring that the transformations that take place as a result of these organisms being in permanent interaction with their environment do not do so to such an extent that they cause them to decompose or degenerate before they can divide or reproduce (Danchin, 2009). The obsolescence of a cell is remedied by self-intervention, whereas the wear and tear of a car is externally remedied. The crucial difference with regard to intervention seems to be that cars require external intervention to remain viable and operational, whereas cells have internal means of repairing damaged parts and compensating for external perturbations. This observation, which I consider to be crucial, and which I have formulated with regard to biological entities, must be kept in mind when considering synthetic-biological entities. Because even when cars become obsolescent, and their obsolescence can be remedied through intervention, biological entities more or less remedy their own obsolescence by effecting their own intervention.

If a synthetic-biological entity is, by definition, a biological entity, it will resist obsolescence autonomously, or much more autonomously than a car might (Nicholson, 2013). It would only fail to do so if, focusing on the synthetic part of the entity, we were to introduce some types of control that might prevent this natural dynamic. I cannot, however, conceive of another way of controlling the biological entity that is not based on absolute knowledge of the entity. In other words, intervention into the biological entity or, failing that, manufacturing a biological entity from biological components, i.e., creating a synthetic-biological entity, differs fundamentally from the case of the entity we refer to as a car. Every aspect of the latter is designed, right from the start. By contrast, the biological entity is not designed, and our knowledge of how it functions is in no way complete. This being the case, we find ourselves in uncharted territory, not that I wish to suggest that we are in completely uncharted territory, or that there is no way to control what we are dealing with. There exist various controls of and external interventions into the entity itself, which may render said entity controllable, or perhaps, I should say, increasingly controllable, rather than totally controllable. The synthetic-biological entity is preceded by a biological entity which, as an autonomous system in the process of evolution, requires a certain level of knowledge in order for its self-effected interventions to be controlled (Serrano, 2007).

#### **ACKNOWLEDGMENTS**

This work was supported by projects SAF2012-31187 from the Ministerio de Ciencia e Innovación, PrometeoII/2014/065 from the Generalitat Valenciana, Spain, and ST-FLOW from the European Commission. The author acknowledges comments of the reviewer to improve the manuscript.

### **REFERENCES**


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 20 October 2014; accepted: 11 November 2014; published online: 26 November 2014.*

*Citation: Moya A (2014) Obsolescence and intervention: on synthetic-biological entities. Front. Bioeng. Biotechnol. 2:59. doi:10.3389/fbioe.2014.00059*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Moya. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# A sense of balance: experimental investigation and modeling of a malonyl-CoA sensor in Escherichia coli

# **Tamás Fehér<sup>1,2</sup>, Vincent Libis<sup>1,3</sup>, Pablo Carbonell<sup>1,4,5</sup> and Jean-Loup Faulon<sup>1,5</sup>\***


<sup>5</sup> SYNBIOCHEM Center, Manchester Institute of Biotechnology, School of Chemistry, University of Manchester, Manchester, UK

#### **Edited by:**

Jean Marie François, CNRS, France

#### **Reviewed by:**

Mattheos Koffas, Rensselaer Polytechnic Institute, USA

Shota Isogai, Technology Research Association of Highly Efficient Gene Design, Japan

#### **\*Correspondence:**

Jean-Loup Faulon, 5 rue Henri Desbrueres, Genavenir 6, F-91030, Evry Cedex, France e-mail: jean-loup.faulon@issb.genopole.fr

Production of value-added chemicals in microorganisms is regarded as a viable alternative to chemical synthesis. In the past decade, several engineered pathways producing such chemicals, including plant secondary metabolites, in microorganisms have been reported; upscaling their production yields, however, was often challenging. Here, we analyze a modular device designed for sensing malonyl-CoA, a common precursor for both fatty acid and flavonoid biosynthesis. The sensor can be used either for high-throughput pathway screening in synthetic biology applications or for introducing a feedback circuit to regulate production of the desired chemical. We used the sensor to compare the performance of several predicted malonyl-CoA-producing pathways, and validated the utility of malonyl-CoA reductase and malonate-CoA transferase for malonyl-CoA biosynthesis. We generated a second-order dynamic linear model describing the relation of the fluorescence generated by the sensor to the biomass of the host cell, representing a filter/amplifier with a gain that correlates with the level of induction. We found the time constants describing filter dynamics to be independent of the level of induction but distinctively clustered for each of the production pathways, indicating the robustness of the sensor. Moreover, by monitoring the effect of the copy-number of the production plasmid on the dose–response curve of the sensor, we managed to coarse-tune the level of pathway expression to maximize malonyl-CoA synthesis. In addition, we provide an example of the sensor's use in analyzing the effect of inducer or substrate concentrations on production levels. The rational development of models describing sensors, supplemented with the power of high-throughput optimization, offers promising potential for engineering feedback loops regulating enzyme levels to maximize productivity yields of synthetic metabolic pathways.

**Keywords: malonyl-CoA, dynamic pathway regulation, high-throughput screening, synthetic regulatory circuit, fluorescent reporter circuit, sensor–actuator circuit**

# **INTRODUCTION**

Natural metabolic pathways are highly regulated and can adapt dynamically to changes in the levels of chemicals within and around the cell (Chubukov et al., 2014). For this purpose, cells have developed a large number of sensors to monitor and control their own metabolic state. This stringent regulatory process is impaired or lost when metabolic engineers insert heterologous enzymes into artificial pathways, usually at non-optimal levels of activity, leading to accumulation of intermediate metabolites and reduced growth rate (Holtz and Keasling, 2010). To address this issue, several projects have attempted to mimic natural regulation by adding such sensors to engineered metabolic pathways. The advantages of using natural sensors such as transcription factors to implement synthetic dynamic regulation in heterologous pathways have been demonstrated in several studies: following pioneering work on lycopene synthesis regulation (Farmer and Liao, 2000), this strategy also led recently to higher yields in fatty acid synthesis with the help of fatty acid (Zhang et al., 2012) and malonyl-CoA (Xu et al., 2014; Liu et al., 2015) biosensors.

Such biosensor-based approaches allow the bacterial factory to artificially monitor its own level of metabolites and to modify expression levels of certain enzymes in order to balance pathway function. They can also be used to devise external monitoring systems to aid the selection of cells displaying improved biological functions. A major application for this monitoring is the screening of libraries of variant strains in directed evolution of enzymes (Michener and Smolke, 2014). Screening of producers is still a major bottleneck in metabolic engineering, as quantification of the compound of interest usually relies on low-throughput measurement techniques such as liquid or gas chromatography. However, the typical throughput necessary for directed evolution applications or for screening combinatorial libraries is several orders of magnitude higher (Dietrich et al., 2010). Therefore, *in vivo* biosensing offers a relevant non-destructive, cost-effective, and high-throughput alternative for monitoring production levels (Schallmey et al., 2014). This strategy has proven successful in identifying high-producing strains, via the coupling of biosensors to reporter genes such as fluorescent proteins or fitness-related proteins. The screening can be done at the colony level, in microtiter plates, or by fluorescence-activated cell sorting (FACS). For instance, screening of *Escherichia coli* colonies expressing a transcription factor evolved to detect mevalonate allowed successful identification of high producers of mevalonate in a library of mevalonate-synthesis variants (Tang and Cirino, 2011). Theoretically, FACS may allow even higher throughputs and was notably used to evolve the caffeine demethylase enzyme in yeast cells harboring riboswitches able to detect theophylline, the enzyme's product (Smolke and Michener, 2012). FACS was also used in combination with transcription-factor-based sensors to measure L-lysine production in *C. glutamicum* (Binder et al., 2012) and detect production of kaempferol in engineered *E. coli* (Siedler et al., 2014). Alternatively, growth rate can be coupled to the production of the product of interest via transcription factor-dependent expression of antibiotic resistance proteins. This strategy has been used in a proof-of-principle experiment to enrich a population of *E. coli* for cells harboring the pathway for 1-butanol production (Dietrich et al., 2013). Later, the power of multiplex automated genome engineering was combined with this approach to optimize the production of naringenin and glucaric acid in *E. coli* (Raman et al., 2014).

Because both dynamic pathway regulation and screening rely on the same type of biosensors, a synergy exists between these two approaches. Illustrating such synergy, here we adapted a malonyl-CoA biosensor previously developed for dynamic regulation (Liu et al., 2015) to a new screening goal. Transcription factor-based biosensing is particularly interesting for monitoring malonyl-CoA, as it is difficult and time-consuming to quantify this molecule *in vivo* via standard LC/MS. In addition to being an important step for fatty acid synthesis in *E. coli*, malonyl-CoA production is also a limiting step for the production of plant flavonoids in engineered *E. coli*. Our group is working on the development of computer-assisted design of metabolic pathways and is seeking high-throughput techniques to test predicted pathways and enzymes for continual feedback optimization of the prediction algorithm. In our earlier proof-of-concept work, we used our software RetroPath (Carbonell et al., 2011) to identify and rank pathways capable of producing the high-value flavonoid pinocembrin in *E. coli*. During the implementation of multiple constructs, malonyl-CoA was identified as a bottleneck preventing high-yield production. Here, we used the biosensor based on transcription factor FapR to analyze the malonyl-CoA-producing pathways that were proposed by our software to remove the bottleneck. We identified optimal conditions for malonyl-CoA monitoring by fine-tuning levels of expression of the transcription factor, and we fitted our experimental data to a model giving insights into proper interpretation of output profiles encountered in such screening protocols. This approach helped us identify the best candidate pathways, validate two novel routes of malonyl-CoA synthesis, and optimize enzyme-expression levels through modification of plasmid copy-number.

**FIGURE 1 | The effect of cerulenin on fluorescence generated by the sensor circuits**. On all three graphs, the solid lines show the cultures challenged with cerulenin, while the segmented lines depict the corresponding untreated cultures. **(A)** DH5α cells harboring the pBFR1k\_RFP\_8FapR sensor plasmid in mineral salts medium. **(B)** DH5α cells harboring the pCFR sensor plasmid in LB medium. Three cultures were tested, originating from three colonies obtained after transformation of the pCFR ligate. **(C)** DH5α cells harboring the pCFR sensor plasmid from the best performer colony seen in **(B)**, cultured in mineral salts medium.

# **MATERIALS AND METHODS**

## **STRAINS AND PLASMIDS**

Plasmid construction was done in *E. coli* strain DH5α, and all malonyl-CoA sensing experiments were carried out in strain BL21(DE3). Plasmid pBFR1k\_RFP\_8FapR (Liu et al., 2015) was a kind gift of Dr. Fuzhong Zhang (Washington University, St. Louis, MO, USA). To construct plasmid pCFR, pBFR1k\_RFP\_8FapR was PCR-amplified using primers pBFR-FW (ACT*GTCGAC*GAAACGATCCTCATCCTG) and pBFR-Rev (GC*TCTAGA*TTCTTCTGAGCGGGACTCTG), and the Cm-resistance marker of pSG76-CS (GenBank: AF402780.1) was amplified with primers pSGCSH-FW (GC*TCTAGA*GTGAGGCACCAATAACTG) and pSGCSH-Rev (ACT*GTCGAC*GATCGGCACGTAAGAGGTTC). Both PCR products were double-digested with *Xba*I and *Sal*I, and were ligated using T4 ligase. Plasmid pMSD8 was a kind gift of Prof. John Cronan (University of Illinois, Urbana, IL, USA), and plasmids pETM6-M*accABCD* and pETM6-P*accABCD* (Xu et al., 2013) were kindly provided by Prof. Mattheos Koffas (Rensselaer Polytechnic Institute, Troy, NY, USA). The construction of plasmids pRSF*matCmatB*, pRSF*mmsA*, pRSF*cagg1256*, pRSFM*accABCD*, and pRSF*matCatoDA* was described elsewhere (Fehér et al., 2014). Plasmid pACYC*matCmatB* (Wu et al., 2013) was a kind gift of Dr. Jingwen Zhou (LBBE, Jiangnan, China). pACYCM*accABCD* was constructed by double-digesting pRSFM*accABCD* with *Bam*HI and *Avr*II, and ligating the 4855 bp fragment with the similarly digested pACYC*matCmatB*. pACYC*matCatoDA* was engineered by double-digesting pRSF*matCatoDA* with *Nde*I and *Xho*I, and ligating the 1408 bp fragment with the 5273 bp fragment of the similarly digested pACYC*matCmatB*. pACYC*mmsA* and pACYC*cagg1256* were constructed by double-digesting pRSF*mmsA* and pRSF*cagg1256*, respectively, both with *Eco*RI and *Xho*I, and ligating the respective 1685 and 3848 bp fragments with the similarly digested pACYC*matCmatB*. Restriction endonucleases and T4 ligase were obtained from Thermo Fisher Scientific (Waltham, MA, USA); Q5 DNA polymerase for PCR was from New England Biolabs.

# **FLUORESCENCE MEASUREMENT OF BACTERIAL CULTURES**

Each bacterial strain to be measured was grown overnight in mineral salts medium (Hall, 1998) supplemented with 0.2% glucose and the appropriate antibiotics. On the day of the measurement, the cultures were diluted 20-fold in fresh medium, and dispensed into clear-bottomed black-walled 96-well plates (Costar Ref. 3603, Corning, NY, USA). Growth and measurement of the cultures took place in a TECAN Infinite 500 fluorescent reader, shaken and incubated at 37°C. Optical density was determined at 600 nm. Fluorescence intensity was recorded using an excitation wavelength of 580 ± 10 nm, and an emission wavelength of 610 ± 5 nm, with a gain set to 40. Readings were taken every 12 min. Antibiotics were used at the following concentrations: ampicillin (Ap): 50 µg/ml, chloramphenicol (Cm): 25 µg/ml, kanamycin (Km): 30 µg/ml. Na-malonate was administered at 2 mg/ml, cerulenin was used at 20 µg/ml, and β-alanine at 3 mM end concentrations. For characterization of the sensors' response to malonyl-CoA, IPTG was administered in a concentration series of 0.01, 0.1, 0.3, 0.6, 1, and 10 mM. Arabinose was used in a log<sub>10</sub> series from 10<sup>−4</sup> to 10<sup>−1</sup>% during optimization and at 10<sup>−2</sup>% for comparison of alternate malonyl-CoA producer constructs. All chemicals were obtained from SIGMA (St. Louis, MO, USA).

## **IDENTIFICATION OF THE DYNAMIC MODEL OF THE SENSOR**

In order to determine the dynamic response of the sensor, we consider three approximate dynamic models.

#### **Zero-pole model**

This model considers that the dynamics between biomass *X*(*t*) and fluorescence *R*(*t*) can be approximately described by an integral time constant τ<sub>*p*</sub>, a derivative time constant τ<sub>*z*</sub>, and a gain *K*:

$$\tau_p \frac{dR(t)}{dt} = -R(t) + K \left( X(t) - \tau_z \frac{dX(t)}{dt} \right).$$

The transfer function between the input and the output expressed through the Laplace transform notation is:

$$\frac{R(s)}{X(s)} = K \frac{1 - \tau_z s}{1 + \tau_p s}$$
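
As a brief check of how this transfer function follows from the differential equation (a worked step added here for clarity, assuming zero initial conditions, which the text does not state explicitly), taking the Laplace transform of both sides of the zero-pole model gives:

$$\tau_p s R(s) = -R(s) + K\left(X(s) - \tau_z s X(s)\right) \quad\Longrightarrow\quad (1 + \tau_p s)\,R(s) = K\,(1 - \tau_z s)\,X(s)$$

Dividing by (1 + τ<sub>*p*</sub>*s*)*X*(*s*) yields the ratio above. Setting *s* = 0 shows that *K* is the steady-state gain: for a constant input, the output settles at *K* times the input.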

#### **1 zero-2 poles model**

In this model, we consider an additional integral time constant τ<sub>*p*2</sub> between the input *X*(*t*) and the output *R*(*t*), represented in the following equation through an auxiliary state variable *X*<sub>1</sub>(*t*):

$$\begin{aligned} \tau_{p1}\frac{dX_1(t)}{dt} &= -X_1(t) + K\left(X(t) - \tau_z \frac{dX(t)}{dt}\right),\\ \tau_{p2}\frac{dR(t)}{dt} &= -R(t) + X_1(t) \end{aligned}$$

In Laplace transform notation:

$$\frac{R(s)}{X(s)} = K \frac{1 - \tau_z s}{(1 + \tau_{p1} s)(1 + \tau_{p2} s)}$$

#### **1 zero-3 poles model**

In this model, we consider an additional integral time constant τ<sub>*p*3</sub> between the input *X*(*t*) and the output *R*(*t*), represented in the following equation through an additional state variable *X*<sub>2</sub>(*t*):

$$\begin{aligned} \tau_{p1}\frac{dX_1(t)}{dt} &= -X_1(t) + K\left(X(t) - \tau_z\frac{dX(t)}{dt}\right) \\ \tau_{p2}\frac{dX_2(t)}{dt} &= -X_2(t) + X_1(t) \\ \tau_{p3}\frac{dR(t)}{dt} &= -R(t) + X_2(t) \end{aligned}$$

In Laplace transform notation:

$$\frac{R(s)}{X(s)} = K \frac{1 - \tau_z s}{(1 + \tau_{p1}s)(1 + \tau_{p2}s)(1 + \tau_{p3}s)}$$

In order to identify the model from the samples, we used an approximate discrete model based on the bilinear transform (Oppenheim and Schafer, 2010):

$$s \leftarrow \frac{2}{T_s} \frac{z-1}{z+1}$$

where *z*<sup>−1</sup> corresponds to a pure delay of one sample and *T*<sub>s</sub> is the sampling period. Parameters of the discrete model were fitted by linear regression in R using the dynlm package (Zeileis, 2014), which was also employed to perform the simulations.
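
To make the identification step concrete, the following Python sketch applies the bilinear substitution to the simplest (zero-pole) model and then recovers the parameters by ordinary least squares on lagged samples. It is an illustration under stated assumptions rather than the procedure used in this study: the analysis here was done in R with dynlm, the exact regression formula is not given in the text, and the function names, the logistic input curve, and the parameter values below are hypothetical. *T*<sub>s</sub> is taken as the interval between readings (12 min = 0.2 h).

```python
import numpy as np


def bilinear_zero_pole(K, tau_p, tau_z, Ts):
    """Discretize K(1 - tau_z*s)/(1 + tau_p*s) via s <- (2/Ts)(z - 1)/(z + 1).

    Returns (b0, b1, a1) of R[n] = -a1*R[n-1] + b0*X[n] + b1*X[n-1].
    """
    c = 2.0 / Ts
    a0 = 1.0 + tau_p * c                 # leading denominator term, used to normalize
    b0 = K * (1.0 - tau_z * c) / a0
    b1 = K * (1.0 + tau_z * c) / a0
    a1 = (1.0 - tau_p * c) / a0
    return b0, b1, a1


def simulate(X, b0, b1, a1):
    """Run the difference equation on input samples X, starting from R[0] = 0."""
    R = np.zeros(len(X))
    for n in range(1, len(X)):
        R[n] = -a1 * R[n - 1] + b0 * X[n] + b1 * X[n - 1]
    return R


def fit_zero_pole(X, R, Ts):
    """Least-squares fit of the lagged model, mapped back to (K, tau_p, tau_z)."""
    # Regressors for each n >= 1: R[n-1], X[n], X[n-1] (no intercept).
    A = np.column_stack([R[:-1], X[1:], X[:-1]])
    phi, b0, b1 = np.linalg.lstsq(A, R[1:], rcond=None)[0]
    K = (b0 + b1) / (1.0 - phi)                      # steady-state gain
    tau_p = 0.5 * Ts * (1.0 + phi) / (1.0 - phi)     # inverts phi = (tau_p*c - 1)/(tau_p*c + 1)
    tau_z = 0.5 * Ts * (b1 - b0) / (b0 + b1)
    return K, tau_p, tau_z


# Hypothetical example: a logistic "biomass" curve sampled every 12 min (0.2 h).
Ts = 0.2
t = np.arange(0.0, 12.0, Ts)
X = 1.0 / (1.0 + np.exp(-(t - 5.0)))
R = simulate(X, *bilinear_zero_pole(K=100.0, tau_p=1.5, tau_z=0.4, Ts=Ts))

# Noise-free data, so the fit should return values close to the inputs above.
print(fit_zero_pole(X, R, Ts))
```

The same substitution extends to the two- and three-pole models, at the cost of more lagged terms in the regression.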

#### **RESULTS**

#### **EXPERIMENTAL ASSESSMENT OF MALONYL-CoA SENSING**

Malonyl-CoA is an important building block from the aspect of metabolic engineering, for it is required for the biosynthesis of fatty acids, polyketides, flavonoids, and other compounds (Fowler et al., 2009; Xu et al., 2011). In an earlier work, we have predicted and tested several alternative pathways that yield malonyl-CoA in order to boost pinocembrin production in *E. coli* (Fehér et al., 2014). Briefly, one candidate for malonyl-CoA synthase (*matB*, EC 6.2.1), one for malonate-CoA transferase (*atoDA*, EC 2.8.3.3), two for malonyl-CoA reductase (*mmsA* and *cagg1256*, EC 1.2.1.75) as well as one for the acetyl-CoA carboxylase complex (*accABCD* from *E. coli*, EC 6.4.1.2) were expressed together with the complete pinocembrin pathway, and their efficiencies were deduced from the resulting normalized pinocembrin titers. Here, we used the molecular sensor as a more direct method to investigate the levels of malonyl-CoA produced by the alternative enzymes.

## **Verifying the function of the malonyl-CoA sensor**

The core of the malonyl-CoA sensor is an RFP gene driven by the synthetic pFR1 promoter (Liu et al., 2015). pFR1 is a combination of the PA1 promoter from phage T7 (Deuschle et al., 1986) and two FapR-binding segments flanking the −10 region. The FapR protein, originating from *Bacillus subtilis*, dissociates from the DNA upon binding malonyl-CoA (Schujman et al., 2003), thereby allowing the *E. coli* RNA polymerase to transcribe the downstream sequences. Because the transcription of FapR is driven by a pAra promoter (controlled by the AraC protein), pFR1 is increasingly repressed as the l-arabinose level is raised. The AraC and FapR transcription factors, as well as RFP, are all encoded on pBFR1k\_RFP\_8FapR, a kanamycin-resistant plasmid constructed by the workgroup of Fuzhong Zhang (Liu et al., 2015). The authors validated the quantitative nature of malonyl-CoA sensing by generating a calibration curve that correlated the fluorescence levels to LC-MS measurements of intracellular malonyl-CoA. Our construct collection is carried on the kanamycin-resistant pRSF vector; to use this sensor to compare the malonyl-CoA productivity of these constructs, we therefore generated a chloramphenicol-resistant version of the sensor plasmid, which we call pCFR (plasmids used in this study are listed in Table S1 in Supplementary Material). The functionality of pCFR, as well as that of the original sensor plasmid, was benchmarked with the help of cerulenin, an inhibitor of 3-ketoacyl-ACP synthase I and II (D'Agnolo et al., 1973), which participate in the committed step of free fatty acid biosynthesis. The inhibition of these enzymes results in a substantial intracellular accumulation of malonyl-CoA (Davis et al., 2000), and should therefore cause an increase in RFP fluorescence. Indeed, *E. coli* cells carrying either pBFR1k\_RFP\_8FapR or pCFR responded to cerulenin with a sharp increase in OD-normalized fluorescence (**Figures 1A,B**). A small-scale screening of four pCFR-carrying clones identified the most effectively functioning plasmid, which was chosen for all downstream experiments (**Figure 1B**). As expected, the ratio of RFP/OD values of cerulenin-treated to control cultures was higher in minimal medium than in LB broth (**Figure 1C**), due to the relatively high background fluorescence of LB medium (Xu et al., 1999). For this reason, all further measurements were carried out in minimal medium.
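The influence of medium background on the treated-to-control ratio can be illustrated with a short sketch. The function name `fold_induction` and all numbers are hypothetical and are not measurements from this study; the sketch only assumes that the medium adds the same uncorrected background to both readings.

```python
def fold_induction(treated_rfp_per_od, control_rfp_per_od, background=0.0):
    """Ratio of OD-normalized fluorescence of treated to control cultures.

    `background` is a hypothetical additive medium signal that is present in
    both readings and is not subtracted, mimicking an uncorrected assay.
    """
    return (treated_rfp_per_od + background) / (control_rfp_per_od + background)


# Illustrative values only (arbitrary units), not data from the paper.
low_background = fold_induction(500.0, 100.0, background=0.0)
high_background = fold_induction(500.0, 100.0, background=400.0)
print(low_background, high_background)  # 5.0 1.8
```

With identical cellular signals, a larger shared background compresses the ratio toward 1, which is consistent with the lower ratio seen in a strongly autofluorescent medium.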

## **Optimizing sensitivity toward malonyl-CoA detection**

In its original publication by Liu et al. (2015), the malonyl-CoA sensor was characterized in a split form: the FapR and RFP genes were carried by two separate plasmids. Therefore, we recharacterized pBFR1k\_RFP\_8FapR in order to find the optimal conditions for comparing various malonyl-CoA producer enzymes using the single-plasmid sensor construct. The expression level of the FapR repressor was varied by administering a dilution series of arabinose, while the expression level of the malonate transporter (*matC*) and *matB* from plasmid pACYC*matCmatB* was controlled by altering the IPTG concentration. A 4 × 4 concentration matrix was thus applied to cells carrying either both plasmids or just the sensor plasmid. As apparent from the OD-normalized fluorescence values shown in **Figure 2**, an arabinose concentration of 0.01% proved to be optimal, taking both specificity and sensitivity into account, similarly to the originally published sensor system. At lower arabinose concentrations, specificity deteriorated, since the sensor plasmid produced slight increases in fluorescence in response to IPTG even in the absence of the malonyl-CoA producer plasmid. This could indicate non-specific binding of LacI to the promoter, which is competitively inhibited by sufficient quantities of the FapR protein. Increasing arabinose levels to 0.1% strongly reduced fluorescence levels of the sample, leading to a decrease of sensitivity and of the signal-to-noise ratio, possibly resulting from over-repression by FapR. The initial decrease in fluorescence/OD values, seen on all graphs, is likely due to the delay of RFP expression compared to culture growth.

#### **Comparison of malonyl-CoA-producing construct collection**

To compare the malonyl-CoA production efficiencies of alternative malonyl-CoA-producing pathways, at least one representative enzyme for each pathway was cloned, as described earlier (Fehér et al., 2014). Briefly, the *matB*, *atoDA*, *mmsA*, and *cagg1256* genes, as well as the *accABCD* gene complex, were expressed from the high-copy plasmid pRSFduet. To provide sufficient amounts of substrate, MatC was co-expressed with *matB* and *atoDA*, and the medium was supplemented with Na-malonate. To elevate the substrate levels for *mmsA* and *cagg1256*, cultures were supplemented with β-alanine, which is converted to 3-oxopropanoate (malonate semialdehyde) by the cell's endogenous 4-aminobutyrate aminotransferase. In every case, the fluorescence/OD values were monitored over time after growing the cells in various IPTG concentrations. Cells carrying the empty pRSFduet plasmid (besides the sensor plasmid) were used as a negative control. Based on their response, the strains carrying the various constructs fell into one of two categories: (i) the cells exhibited growth and a dose-dependent fluorescence/OD response to IPTG (pRSF*mmsA* and pRSF*cagg*) (**Figure 3**) or (ii) the cells exhibited no growth at all upon IPTG induction in minimal medium (pRSFM*accABCD* and pRSF*atoDA*) (**Figures 4A,B**). It is important to note that certain strains of the latter category (such as pRSF*atoDA*) also gave a seemingly dose-dependent response to IPTG, but their fluorescence and OD values were practically unchanged during the course of the experiment, most probably indicating that the trend in their ratio is an artifact (Figure S1 in Supplementary Material). Interestingly, both types of responses were seen with pRSF*matCmatB*-carrying cells, in a quite irreproducible manner (see Discussion and **Figures 4C,D** and Figure S2 in Supplementary Material). The growth curves obtained by inducing strains carrying various producer plasmids are summarized in Figure S3 in Supplementary Material.

#### **Investigating lower-copy-number alternative constructs**

Since the overexpression of certain members of the *accABCD* complex (Davis et al., 2000), as well as protein overproduction in general (Flores et al., 2004), have been described to have a toxic effect on the cells, the overexpression of certain constructs was repeated using lower-copy vectors. For the *accABCD* complex, pMSD8, an ultra-low-copy variant expressing the four genes as a T7-driven operon, was available. In addition, two medium-copy alternatives, pETM6-P*accABCD* and pETM6-M*accABCD*, carrying the genes as a pseudo-operon (separate promoters) and in monocistronic form (separate promoters and terminators), respectively, were also obtained and tested (Xu et al., 2012, 2013). All three plasmids displayed a clear dose-dependent fluorescence/OD pattern, well above the levels of the negative control (**Figures 5A–C**). For *matCmatB*, pACYC*matCmatB* was readily available as a low-copy alternative that functioned in a well-reproducible fashion (**Figure 2**) when tested with the pBFR1k\_RFP\_8FapR sensor plasmid.

These results prompted us to subclone our pathway collection into a pACYC backbone, thereby reducing the copy number of all constructs to ~15. When expressed this way, all five constructs (*matCmatB*, *mmsA*, *cagg1256*, *M.accABCD*, and *matCatoDA*) allowed cell growth and displayed a scalable, IPTG-dependent fluorescence (**Figures 2** and **6**). As it turned out, the concise nature of the dataset obtained with the pACYC collection allowed us to fit various models describing the response of the sensor to malonyl-CoA production (see below).

# **Quantification and reproducibility of malonyl-CoA levels**

Since *E. coli* strain BL21DE3, the host used in our experiments, is not deleted for the genes of arabinose catabolism, it breaks down arabinose once the medium is depleted of glucose. This likely causes a decrease in FapR expression, and a consequent derepression of RFP at the end of growth. As a result, a second, sudden increase in fluorescence was seen in many samples upon the transition to stationary phase, as apparent from juxtaposing the temporal evolution of fluorescence and OD (Figure S4 in Supplementary Material). To avoid this effect, we analyzed fluorescence levels corresponding to the log phase (OD = 0.6), a physiological state in which arabinose catabolism is still inhibited by the catabolite repression of *E. coli*. Figure S5 in Supplementary Material shows some of the fluorescence vs. OD plots generated to obtain such readings.
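One way such a reading at a fixed OD could be extracted from a plate-reader time course is linear interpolation between the two measurements that bracket OD = 0.6. The sketch below is an assumption about the procedure, not the authors' documented method; the helper `fluorescence_at_od` and the sample values are hypothetical.

```python
def fluorescence_at_od(od, fluo, target_od=0.6):
    """Linearly interpolate fluorescence at the first crossing of `target_od`.

    `od` and `fluo` are equally long lists sampled at the same time points.
    """
    for i in range(1, len(od)):
        lo, hi = od[i - 1], od[i]
        if lo <= target_od <= hi and hi > lo:
            frac = (target_od - lo) / (hi - lo)
            return fluo[i - 1] + frac * (fluo[i] - fluo[i - 1])
    raise ValueError("target OD is not bracketed by the measurements")


# Illustrative time course (arbitrary units), not data from the paper.
od_series = [0.1, 0.2, 0.4, 0.8, 1.2]
rfp_series = [10.0, 20.0, 50.0, 150.0, 400.0]
print(fluorescence_at_od(od_series, rfp_series))  # close to 100 (a.u.)
```

Taking the first crossing keeps the reading in the growth phase and ignores any later fluorescence rise at stationary phase.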

Prior to comparing the performance of various malonyl-CoA-producing constructs, we investigated the reproducibility of our measurements at three different levels. First, we compared the fluorescence values of parallel clones, resulting from the transformation of a given producer construct. These displayed a strong variation in their response to IPTG, which was not unexpected taking into account their heterogeneous colony morphologies (**Figure 7A**). Second, we re-tested the best performing clones after their storage in the form of glycerol stocks. Reproducibility was substantially better, being the best for the lowest copy-number pMSD8 plasmid (**Figure 7A**). For other constructs, the performance seemed to fall into two distinct clusters: it either matched that of earlier measurements, or deteriorated to a lower, near-zero value, as demonstrated by the dose–response curves of pETM6-M*accABCD* (**Figure 7B**). This phenomenon was reminiscent of a burden-relieving mutation that swept through the population early after induction. The third level of reproducibility corresponds to comparing the performance of a given strain in the same microtiter plate in the same measurement run, starting from the same overnight seed culture (**Figure 7C**). The high reproducibility of such measurements was sufficient to draw conclusions on the effect of providing the substrates for malonyl-CoA production (see below).

**FIGURE 7 | Reproducibility of measurements of malonyl-CoA production in BL21DE3 cells using the fluorescent sensor plasmid pCFR and various production plasmids**. The IPTG-dependence of fluorescence measured at OD = 0.6 is shown in every case. **(A)** Two cultures harboring pMSD8, originating from two distinct colonies after transformation, are depicted by open squares and open triangles. Open diamonds represent the re-measurement of the strain corresponding to the open squares, 1 day later. **(B)** Six cultures harboring pETM6-MaccABCD measured on different days. **(C)** Two cultures of the same clone of pACYCmatCmatB originating from the same starter culture, measured in the same run, in the presence of Na-malonate.

#### **Investigating the effect of administering the substrate**

To test whether providing the substrates of malonyl-CoA synthesis in the growth medium offers a measurable advantage in the product levels, we repeated the experiment for pACYC*matCmatB* and pACYC*matCatoDA* with and without Na-malonate, as well as for pRSF*mmsA* and pRSF*cagg* with and without β-alanine. For pACYC*matCmatB*, we observed significantly boosted production at 0.3 and 0.6 mM IPTG (**Figure 8A**). However, the presence of Na-malonate significantly hindered the performance of pACYC*matCatoDA* at IPTG concentrations of 0.3 mM and above (**Figure 8B**). Similarly, supplementing β-alanine to pRSF*mmsA* seemed to have a negative effect at 0.6 mM IPTG induction (**Figure 8C**). No significant effect was seen for adding the substrate to pRSF*cagg* (not shown).

#### **Investigating the efficiencies of malonyl-CoA production**

The four alternative pathways for malonyl-CoA production, as well as the top-scoring enzymes that catalyze the required steps, were identified with RetroPath, our retrosynthetic biology tool developed for metabolic engineering (Carbonell et al., 2011). Contrary to our initial plans, the growth-inhibitory effect of certain pRSF-based constructs prevented us from validating the ranking of the alternative pathways by direct comparison of their measured fluorescence values. Indeed, the toxicity of overproducing malonyl-CoA by overexpression of acetyl-CoA carboxylase has been described before (Davis et al., 2000). The performance of the readily available lower-copy variants of *accABCD* (pMSD8, pETM6-P*accABCD*, and pETM6-M*accABCD*), however, indicated the importance of using lower-copy expression vectors. After recloning our gene set into pACYC-derived plasmids, *matCatoDA*, *MaccABCD*, *mmsA*, *cagg1256*, and *matCmatB* all permitted growth when expressed. Importantly, based on their fluorescence values, these constructs all produced significantly more malonyl-CoA when induced with 1 mM IPTG than the control cells harboring no production construct (see below). Ranking the alternative malonyl-CoA-producing pathways was not feasible due to the high variation of the data. Important practical conclusions could nevertheless be drawn; these are listed in the Section "Discussion."

**FIGURE 8 | The effect of administering the substrate on the level of fluorescence obtained upon induction of malonyl-CoA production**. Hashed bars represent the effect of IPTG on the fluorescence obtained at OD = 0.6 with constructs **(A)** pACYCmatCmatB, **(B)** pACYCmatCatoDA, and **(C)** pRSFmmsA. Gray bars depict the peak fluorescence detected when inducing the same constructs in the presence of their substrate: Na-malonate for matCmatB and matCatoDA and β-alanine for mmsA. The sensor plasmid was pBFR1k\_RFP\_8FapR for **(A,B)** and pCFR for **(C)**. Asterisks indicate a significantly different fluorescence caused by the substrate (p < 0.05).

#### **MODELING THE MALONYL-CoA SENSOR**

#### **Characterization of the sensor response**

In order to characterize the sensor, we modeled its response to the production of malonyl-CoA, the signal it measures. Ideally, the sensor, once calibrated, should display good reproducibility as well as stable sensitivity for its use in applications such as screening or feedback regulation. For that purpose, it is desirable to keep the dynamic response of the sensor uncoupled from any parameter variation, so that the relationship between the measured signal and the sensor output is essentially linear. The response time of such a linear sensor, on the other hand, should be kept below some reasonable limit in order to avoid large lags that could introduce instability into the system. Here, as we base our model on the observations of OD and RFP, we need to consider the relation between these variables and the malonyl-CoA concentration, the signal that is measured by the sensor. In this study, the model is deliberately kept simple, as our purpose is to provide a set of parameters that can be easily measured in order to calibrate the sensor.

Similarly to other proposed models (Anesiadis et al., 2008), we consider that the signal undergoes several transformation stages in cascade-mode going from the biomass to the fluorescence levels:

1. We assume that the change in the concentration of malonyl-CoA, *M*(*t*), depends on its instantaneous production, given by the rate *ν*<sub>m</sub> (specific to each selected malonyl-CoA-producing construct or pathway) multiplied by the biomass *X*(*t*), minus the malonyl-CoA consumed for growth, given by a constant rate *ν*<sub>g</sub> multiplied by the time derivative of the biomass, minus a degradation term defined by a rate constant γ<sub>m</sub>:

$$\frac{dM(t)}{dt} = \nu_{\rm m} X(t) - \nu_{\rm g} \frac{dX(t)}{dt} - \gamma_{\rm m} M(t) \tag{1}$$

During the exponential growth phase, where d*X*(*t*)/d*t* = µ*X*(*t*), the previous equation simplifies to:

$$\frac{dM(t)}{dt} = (\nu_{\rm m} - \mu\,\nu_{\rm g}) X(t) - \gamma_{\rm m} M(t) \tag{2}$$

where µ is the specific growth rate.

2. Malonyl-CoA binds to the transcription factor FapR to form a complex whose concentration *C*(*t*) depends on a dissociation constant *K*<sub>d</sub> [2.4 µM according to Schujman et al. (2006)]. The concentration of FapR, *F*, is considered constant and depends on the concentration of the inducer l-arabinose. With *k*<sub>f</sub> and *k*<sub>r</sub> denoting the forward and reverse rate constants of the binding reaction:

$$F + M \overset{k_{\rm f}}{\underset{k_{\rm r}}{\rightleftharpoons}} C$$

$$\frac{dC(t)}{dt} = k_{\rm f} F M(t) - k_{\rm r} C(t) \tag{3}$$

3. The change in the concentration of RFP, *R*(*t*), depends on the concentration of the complex *C*(*t*) through a constant κ associated with the promoter strength (Oyarzún and Stan, 2013) and a decay constant γ<sub>r</sub>:

$$\frac{dR(t)}{dt} = \kappa C(t) - \gamma_{\rm r} R(t) \tag{4}$$

**FIGURE 9 | An approximate model of the response behavior of the sensor based on a cascade of first-order filters**. The biomass X(t) is assumed to be approximately related to the concentration of malonyl-CoA M(t) through a first-order pole-zero filter. The sensor complex malonyl-CoA-FapR concentration C(t) is then related to M(t) through a first-order filter and finally, the RFP concentration R(t) is again related to C(t) through another first-order filter.

Under the assumptions given by the model described by Eqs 1–4, the response of the sensor can be approximated as a cascade of filters as shown in **Figure 9**. The advantage of using such an approximation is that we can assume that the dynamics due to the time constants of these three filters occur at different time scales and therefore, under the appropriate conditions, time-scale separations can be performed. Based on this principle, we can approximate the model by considering the main time constants present in the model, i.e., by estimating the parameters of a cascade of first-order models (see Materials and Methods).
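The cascade described by Eqs 1–4 can be explored numerically with a minimal forward-Euler sketch. Everything beyond the structure of the four equations is an assumption made for illustration: the logistic growth law for *X*(*t*), all parameter values (arbitrary units), the name `k_r` for the reverse rate constant of the binding step, and the function name `simulate_sensor`.

```python
def simulate_sensor(nu_m, hours=200.0, dt=0.01, mu=0.5, x0=0.01, x_max=1.0,
                    nu_g=0.1, gamma_m=1.0, k_f=1.0, k_r=2.4, fapr=1.0,
                    kappa=1.0, gamma_r=0.2):
    """Integrate Eqs 1-4 with forward Euler and return the final R/X ratio.

    All default parameter values are hypothetical, chosen only so that the
    three stages relax on different time scales.
    """
    x, m, c, r = x0, 0.0, 0.0, 0.0
    for _ in range(int(hours / dt)):
        dx = mu * x * (1.0 - x / x_max)            # logistic growth (assumption)
        dm = nu_m * x - nu_g * dx - gamma_m * m    # Eq. 1
        dc = k_f * fapr * m - k_r * c              # Eq. 3, constant FapR level
        dr = kappa * c - gamma_r * r               # Eq. 4
        x += dt * dx
        m += dt * dm
        c += dt * dc
        r += dt * dr
    return r / x


gain_low = simulate_sensor(nu_m=1.0)
gain_high = simulate_sensor(nu_m=2.0)
print(gain_high / gain_low)  # approaches 2: the gain scales with nu_m
```

In this sketch, doubling the production rate doubles the long-time R/X ratio while the relaxation rates are untouched, which is the property the filter-cascade view relies on.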

#### **Sensor model fitting**

In our sensor model, we assume that, under the same inducer and regulatory conditions in a strain population, malonyl-CoA levels should change essentially because of the change in the rate of production of this metabolite by the strain (*ν*<sub>m</sub> in Eq. 1), which would correspond to a different gain of the sensor, i.e., a different ratio between the measured RFP and biomass *X* in steady state.

According to the sensor model in Eqs 1–4, this gain, i.e., the steady-state relationship between RFP and *X*, is given by the following relationship between model parameters:

$$\frac{R}{X} = K_{\rm d} \frac{\nu_{\rm m}}{\gamma_{\rm m}} \frac{\kappa}{\gamma_{\rm r}} \tag{5}$$

Regarding model dynamics, we considered three simplified models for the response dynamics depending on the time constants that are considered (see Materials and Methods): (1) a first-order with integral and derivative terms; (2) a second-order model consisting of a first-order model with integral and derivative terms and an additional first-order integral term; and (3) a third-order model with the integral and derivative terms and two additional first-order integral terms.

As shown in **Figure 10**, measured responses were in general fitted with an increasing level of accuracy to each of the three approximate sensor models. For some constructs, however, measurements could not be successfully fitted to model 3 (see Figure S4 in Supplementary Material). Therefore, we decided to use model 2 (the one considering one derivative and two integral terms) as the reference model in order to characterize the dynamics of the sensor. **Table 1** provides a list of estimated parameters for different constructs. Time constants ranged between approximately 1 and 10 h.
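To make the idea of fitting a gain and a time constant concrete, the sketch below fits a plain first-order lag (without the derivative term used in the models above, so it is a deliberately simpler stand-in) to an input/output pair by linear least squares on the discrete recursion y[k+1] = a·y[k] + b·u[k]. It is not the authors' fitting procedure, which is described in their Materials and Methods; the function names and the synthetic data are hypothetical.

```python
import math


def simulate_first_order(u, gain, tau, dt):
    """Discrete first-order lag y[k+1] = a*y[k] + b*u[k], starting from y = 0."""
    a = math.exp(-dt / tau)
    b = gain * (1.0 - a)
    y = [0.0]
    for uk in u[:-1]:
        y.append(a * y[-1] + b * uk)
    return y


def fit_first_order(u, y, dt):
    """Least-squares estimate of (gain, tau) from input u and output y."""
    y0, u0, y1 = y[:-1], u[:-1], y[1:]
    s_yy = sum(v * v for v in y0)
    s_uu = sum(v * v for v in u0)
    s_yu = sum(p * q for p, q in zip(y0, u0))
    s_y1y = sum(p * q for p, q in zip(y1, y0))
    s_y1u = sum(p * q for p, q in zip(y1, u0))
    det = s_yy * s_uu - s_yu ** 2          # 2x2 normal-equation determinant
    a = (s_y1y * s_uu - s_y1u * s_yu) / det
    b = (s_yy * s_y1u - s_yu * s_y1y) / det
    return b / (1.0 - a), -dt / math.log(a)


# Synthetic, noise-free example: logistic OD as input, lagged RFP as output.
dt = 0.1  # hours
times = [k * dt for k in range(500)]
od = [1.0 / (1.0 + 99.0 * math.exp(-0.5 * t)) for t in times]
rfp = simulate_first_order(od, gain=3.0, tau=2.0, dt=dt)
gain_hat, tau_hat = fit_first_order(od, rfp, dt)
print(gain_hat, tau_hat)
```

On noise-free synthetic data the fit recovers the gain and time constant used to generate it; with real plate-reader data, noise and the extra poles and zero of the higher-order models would have to be handled explicitly.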

#### **Dependence of sensor parameters on induction**

**FIGURE 10 | Example of the mode of fitting models to describe the response of the pBFR1k\_RFP\_8FapR sensor in the presence of pACYCmatCmatB, induced with 1 mM of IPTG**. Na-malonate was included in the medium. The input signal is the measured OD (shown in dotted line and scaled to 50% of maximum in the plot) and the simulated response of the model for the observed output RFP (in black) is shown for a fitting to a first-order model with input derivative (red), a second-order with input derivative (blue), and a third-order with input derivative (green).

**Table 1 | Gain and integral time constants of the sensor fitted to model 2 estimated for different constructs for IPTG = 1 × 10<sup>−5</sup>**.

According to our simplified model of the sensor, consisting of a second-order transfer function, the gain between biomass and RFP should depend on the level of induction of the malonyl-CoA-producing enzyme. Time constants, on the contrary, should not depend on the level of induction, but should be constitutive parameters of the sensor that can be calibrated for each construct. As shown in **Figure 11**, we indeed found a good correlation between IPTG levels and sensor gain (*r* = 0.72, *p*-value = 8.1 × 10<sup>−6</sup>), indicating that the sensor is responsive to changes in induction of the enzyme. Time constants, as expected, were not significantly affected by changes in the IPTG concentration. These time constants, on the other hand, appear distinctly clustered for each construct, showing the robustness of the sensor (**Figure 12**).
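For readers who want to reproduce this kind of check on their own fitted gains, a Pearson correlation coefficient can be computed as sketched below. The IPTG levels and gain values are hypothetical placeholders, not the data behind the reported *r*; the significance test is omitted because it requires the *t*-distribution, which the Python standard library does not provide.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical induction levels (mM IPTG) and fitted sensor gains (a.u.).
iptg_mM = [0.0, 0.1, 0.3, 0.6, 1.0]
sensor_gain = [1.0, 1.4, 1.9, 2.1, 3.0]
print(pearson_r(iptg_mM, sensor_gain))
```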

#### **DISCUSSION**

In this work, we have successfully implemented a malonyl-CoA sensor for the partial optimization of malonyl-CoA production in *E. coli* BL21DE3. Initially, we screened several conditions to be used for the measurements, and found the optimal one to match that described earlier (Liu et al., 2015), despite the possible changes in relative gene dosage caused by the single-plasmid vs. dual-plasmid layout. We also constructed a chloramphenicol-resistant version of the plasmid to compare our construct collection encoding four alternative pathways for malonyl-CoA production, predicted by RetroPath. Besides their comparison, our aim was to use malonyl-CoA sensing to seek optimal conditions for efficient malonyl-CoA production, previously seen to be hindered by growth retardation (Fehér et al., 2014). Our current results also indicated that the viability of the cells can, in many cases, be strongly impaired by the high expression levels of the original pRSF-based constructs (Figure S3 in Supplementary Material). It was also apparent that the lower-copy alternatives available for the *accABCD* gene complex (pMSD8, pETM6-M*accABCD*, and pETM6-P*accABCD*) outperformed all or most members of the pRSF collection. Therefore, a partial optimization of expression levels was carried out by using lower-copy-number vectors to obtain viable cells with measurable malonyl-CoA production upon induction, leading to a successful circumvention of the possible toxicity caused by the high-copy expression vectors. Due to the relatively high variation of the resulting data, comparison of these constructs was not conclusive enough to validate all our predicted rankings. However, the obtained measurements were quite useful for the practical purpose of selecting the most efficient implementation of our malonyl-CoA-producing constructs. In this respect, the best performer of our construct set (in terms of fluorescence at OD = 0.6) turned out to be pACYC*matCatoDA*, carrying the malonate-CoA transferase on a p15A-derived low-copy-number plasmid (**Figure 13**). Its performance could nonetheless be matched by the expression of the acetyl-CoA carboxylase complex, which, in turn, was indistinguishable from that of *matB*. The two genes encoding malonyl-CoA reductase (*mmsA* and *cagg1256*) performed significantly weaker. Importantly, the activity of these constructs uncovered two pathways, malonyl-CoA reductase and malonate-CoA transferase, which have not been implemented before in boosting malonyl-CoA biosynthesis.

When comparing the variation of measured fluorescence values, we observed a consistent decrease in the coefficient of variation for all constructs when recloned into the low-copy pACYC vector, possibly marking an increased genetic stability (Table S2 in Supplementary Material). In theory, this change could also be caused by the use of different sensor plasmids in the comparison. However, no such difference was seen in the case of the pET-based plasmids when switching the sensor from pCFR to pBFR1k\_RFP\_8FapR (Table S2 in Supplementary Material), indicating that it is the change in copy number of the malonyl-CoA producer plasmid, and not the change in the sensor plasmid, that caused the decrease of variation. This variation can be, at least in part, explained by burden-relieving mutations that sweep through the population early after induction. In fact, such an expansion of a non-fluorescent subpopulation was seen in real time during one of our tests concerning pRSF*matCmatB* (**Figure 4C**), which was the least reproducible among all of our constructs. In that specific experiment, the fluorescence of the population ceased to grow after a certain point in time, despite the accelerating increase of the OD, leading to a decrease of the fluorescence/OD ratio (Figure S2 in Supplementary Material). The fact that no such effect is seen in **Figure 4D** indicates the random and incidental nature of this phenomenon, supporting the assumption that it is caused by a mutation. The specific mutations inactivating the production were not sought.

In their original publication, the designers of this sensor detected a saturation of malonyl-CoA levels at 25 µM when inducing the expression of the *acc* operon at different strengths (Liu et al., 2015). This phenomenon was not investigated further, but could have been caused by the toxicity of the produced compound. Although we did not observe an unambiguous plateau of fluorescence values in our experiments, the possibility of such a scenario should be taken into account when using similar systems in the future. One possible solution for such cases could be to transform the molecule of interest *in vivo* into a compound that is better tolerated by the cell. The prerequisites of this strategy are to have a high-capacity, unsaturated pathway for transformation as well as a sensory device for the downstream product.

To characterize further the dynamic response of the sensor, we proposed and validated a second-order model linking biomass to fluorescence. We showed that dynamic parameters were specific to each construct and that they were not significantly altered by changes in the production rate of malonyl-CoA. This result shows the robustness of the sensor for its use in pathway regulation, since a dynamic response uncoupled from the measured signal is often necessary in order to assure the stability of the feedback loop. Moreover, in such regulation strategies, knowledge of the sensor's parameters is of critical importance in the choice of one particular control architecture over another (Stevens and Carothers, 2014). Therefore, we hope our effort of modeling experimental data provided by this malonyl-CoA sensor will facilitate future developments in dynamic regulation of pathways such as the ones involved in fatty acid or flavonoid production.

Our most unexpected findings arose when we investigated the effect of providing the substrates of malonyl-CoA production on the fluorescence levels produced by the sensor. On the one hand, providing Na-malonate to *matB* resulted in an increased fluorescence, possibly indicating more product formation. On the other hand, administering β-alanine or Na-malonate to *mmsA* and *atoDA*, respectively, had the opposite effect. This phenomenon was especially pronounced for *atoDA*, which was notably the most effective malonyl-CoA producer among the constructs tested in this study. We can only speculate on the mechanism, which may even be a direct effect on the sensor plasmid, and may not represent an actually reduced malonyl-CoA synthesis. According to our hypothesis, it could be due to the fact that *atoDA* (as well as *matB*) are consumers of cellular acetyl-CoA, thereby causing an energy depletion. This is probably more severe if we provide the missing substrate (malonate), with very high enzyme concentrations, and thereby push the reaction toward the product. Perhaps this energy depletion inhibits RFP production more than the newly produced malonyl-CoA would elevate it. Clarifying this issue nevertheless requires further experiments.

When summarizing the observed results, several conclusions important to the metabolic engineer in general can be drawn. First, it is useful to test the performance of several parallel clones after transformation of the constructs, and choose the best for further experiments. Second, the reproducibility of cultures re-grown from glycerol stocks is acceptable, but repeated measurements are advisable to uncover the repeated emergence of deleterious mutants, possibly indicating genetic instability. Third, this instability was minimal when using the lowest copy-number vector for expressing the producer construct. This provides a further argument to start testing various alternative enzymes by expressing them from low-copy vectors with gradually controllable promoters, and varying ribosome binding sites to increase expression and find a construct that is optimal both in efficiency and stability. Finally, the variation among measurements obtained in the same run was small enough to test the effect of adding the substrates of the tested pathways to the cell culture, and to obtain the optimal IPTG levels for most effective utilization. This approach could be useful in the future to optimize the substrate concentration itself, as well as any other component of the culture medium or the induction process, in a high-throughput, combinatorial fashion.

# **ACKNOWLEDGMENTS**

We thank Prof. John Cronan, Prof. Mattheos Koffas, Prof. Fuzhong Zhang, and Dr. Jingwen Zhou for providing their plasmids and Dr. Brian Jester for helpful discussions. PC is supported by Genopole through an ATIGE Grant by PRES UniverSud Paris, by Agence Nationale de la Recherche, and by UPFellows program with the support of the Marie Curie COFUND program. VL is supported by a DGA (French Ministry of Defense) graduate fellowship. The authors acknowledge funding provided by Toulouse White Biotech (TWB) and the GIP GENOPOLE.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00046

# **REFERENCES**


**Conflict of Interest Statement:** The authors certify that there is no conflict of interest with any financial organization regarding the material discussed in the manuscript. The Associate Editor, Jean Marie François, declares that, despite hosting a Frontiers Research Topic alongside the author Pablo Carbonell, the review process was handled objectively and no conflict of interest exists.

*Received: 02 September 2014; accepted: 23 March 2015; published online: 08 April 2015.*

*Citation: Fehér T, Libis V, Carbonell P and Faulon J-L (2015) A sense of balance: experimental investigation and modeling of a malonyl-CoA sensor in Escherichia coli. Front. Bioeng. Biotechnol. 3:46. doi: 10.3389/fbioe.2015.00046*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2015 Fehér, Libis, Carbonell and Faulon. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# New transposon tools tailored for metabolic engineering of Gram-negative microbial cell factories

# **Esteban Martínez-García, Tomás Aparicio, Víctor de Lorenzo and Pablo I. Nikel \***

Systems and Synthetic Biology Program, Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain

#### **Edited by:**

Jean Marie François, CNRS, France

#### **Reviewed by:**

M. Kalim Akhtar, University College London, UK Daehee Lee, Korea Research Institute of Bioscience and Biotechnology, South Korea

#### **\*Correspondence:**

Pablo I. Nikel, Systems and Synthetic Biology Program (CNB-CSIC), C/Darwin, 3, Madrid 28049, Spain e-mail: pablo.nikel@cnb.csic.es

Re-programming microorganisms to modify their existing functions and/or to bestow bacteria with entirely new-to-Nature tasks has largely relied so far on specialized molecular biology tools. Such endeavors are not only relevant in the burgeoning metabolic engineering arena but also instrumental to explore the functioning of complex regulatory networks from a fundamental point of view. À la carte modification of bacterial genomes thus calls for novel tools to make genetic manipulations easier. We propose the use of a series of new broad-host-range mini-Tn5-vectors, termed pBAMDs, for the delivery of gene(s) into the chromosome of Gram-negative bacteria and for generating saturated mutagenesis libraries in gene function studies. These delivery vectors endow the user with the possibility of easy cloning and subsequent insertion of functional cargoes with three different antibiotic-resistance markers (kanamycin, streptomycin, and gentamicin). After validating the pBAMD vectors in the environmental bacterium Pseudomonas putida KT2440, their use was also illustrated by inserting the entire poly(3-hydroxybutyrate) (PHB) synthesis pathway from Cupriavidus necator in the chromosome of a phosphotransacetylase mutant of Escherichia coli. PHB is a completely biodegradable polyester with a number of industrial applications that make it attractive as a potential replacement of oil-based plastics. The nonselective nature of chromosomal insertions of the biosynthetic genes was evidenced by a large landscape of PHB synthesis levels in independent clones. One clone was selected and further characterized as a microbial cell factory for PHB accumulation, and it achieved polymer accumulation levels comparable to those of a plasmid-bearing recombinant. Taken together, our results demonstrate that the new mini-Tn5-vectors can be used to confer interesting phenotypes in Gram-negative bacteria that would be very difficult to engineer through direct manipulation of the structural genes.

**Keywords: metabolic engineering, Pseudomonas putida, Escherichia coli, transposon mini-Tn5, central metabolism, chromosomal integration, polyhydroxyalkanoates**

# **INTRODUCTION**

Over the last few years, a number of Gram-negative bacteria have become increasingly attractive *chassis* for a variety of synthetic biology and metabolic engineering purposes. One conspicuous case is the environmental bacterium *Pseudomonas putida*, a robust host for strong oxidative bioreactions that combines GRAS (generally recognized as safe) status with an inherent ability to grow on a wide range of substrates (Nikel et al., 2014a). This situation calls for the expansion of the available tools for rewiring its extant genetic features to further extend its metabolic potential, or even to introduce new-to-Nature functions.

One frequently used molecular biology resource for analyses and manipulations of bacterial genomes is the Tn*5* transposon. Historically, a number of plasmid vectors based on both wild-type and minimized versions of Tn*5* (i.e., mini-transposons) have allowed the user to introduce stable insertions of foreign DNA into the chromosome of virtually any Gram-negative bacterium (de Lorenzo et al., 1990, 1998; Herrero et al., 1990; de Lorenzo and Timmis, 1994; Martínez-García et al., 2011; Martínez-García and de Lorenzo, 2012; Nikel and de Lorenzo, 2013a). Such Tn*5*-derived elements present clear advantages over their plasmid-based counterparts for the introduction and expression of heterologous genes into several bacterial species. These advantages include (but are not limited to) (i) the maintenance of the corresponding transgenes without antibiotic selective pressure, (ii) the long-term stability of the constructs and the re-usability of the functional parts, and (iii) the capacity of Tn*5* vectors to admit cloning and chromosomal delivery of considerably long DNA fragments. Finally, as the transposase gene (*tnpA*) is lost following each transposition event (Berg, 1989; Reznikoff, 2006, 2008), one added value of mini-Tn*5*-vectors is the possibility to use them recursively in the same host, provided that they bear different selection markers. Moreover, as the TnpA transposase tends to act in *cis* (Phadnis et al., 1986), it promotes the insertion of DNA sequences borne by the plasmid, irrespective of previous insertions in a given target chromosome. These features allow for the integration of more than one DNA cargo into the same genome.

In this study, we report a series of synthetic, modular broad-host-range mini-Tn*5*-vectors for the delivery of gene(s) into the chromosome of a diversity of Gram-negative bacteria and for the construction of saturated mutagenesis libraries for gene function studies. These vectors were termed pBAMDs, and they enable easy cloning and subsequent chromosomal insertion of functional cargoes with three different and interchangeable antibiotic-resistance markers (kanamycin, streptomycin, and gentamicin). Moreover, the functional parts of the new vectors can be easily swapped by digestion with the appropriate restriction enzymes, allowing for the shuffling of each element as needed. Potential applications of the new tools are illustrated in two different genetic contexts. In one case, a systematic validation of the Tn*5* vectors was carried out in *P*. *putida* KT2440, demonstrating the potential of the new insertional plasmids to be used in a sequential fashion for constructing (and deconstructing) complex phenotypes. In a second case, one of the pBAMD plasmids was used to insert a gene cluster from *Cupriavidus necator*, encoding all the biochemical functions needed for the formation of poly(3-hydroxybutyrate) (PHB), into the chromosome of *Escherichia coli*, thereby resulting in a new microbial cell factory tailored for biopolymer synthesis.

# **MATERIALS AND METHODS**

#### **BACTERIAL STRAINS, PLASMIDS, AND GROWTH CONDITIONS**

The bacterial strains and plasmids used in this study are described in **Table 1**. Bacteria were routinely grown batchwise in LB medium (10 g l<sup>−1</sup> tryptone, 5 g l<sup>−1</sup> yeast extract, and 5 g l<sup>−1</sup> NaCl) with rotary agitation (170 rpm). *P*. *putida* was grown at 30°C while *E*. *coli* cells were grown at 37°C. Selection of *P*. *putida* transconjugants was performed by spotting the cells onto M9 minimal medium agar plates (Sambrook et al., 1989) supplemented with 0.2% (w/v) sodium citrate as the sole carbon source (to counterselect *E*. *coli* cells). PHB accumulation was assessed in selected *E*. *coli* transconjugants cultured in M9 minimal medium containing 30 g l<sup>−1</sup> glucose as the sole carbon source. Aerobic culture conditions in experiments aimed at polymer synthesis were achieved essentially as described by Nikel et al. (2010a), by using a 1:10 culture medium-to-flask volume ratio. Antibiotics were added at the following final concentrations whenever needed: ampicillin, 150 µg ml<sup>−1</sup> for *E*. *coli* or 500 µg ml<sup>−1</sup> for *P*. *putida*; chloramphenicol, 30 µg ml<sup>−1</sup>; kanamycin, 50 µg ml<sup>−1</sup>; streptomycin, 80 µg ml<sup>−1</sup>; and gentamicin, 10 µg ml<sup>−1</sup>. All solid media also contained 15 g l<sup>−1</sup> agar. Growth was estimated spectrophotometrically by measuring the optical density at 600 nm (OD<sub>600</sub>) of the cultures (appropriately diluted in 9 g l<sup>−1</sup> NaCl whenever needed) in an Ultrospec 3000 *pro* UV/Visible spectrophotometer (GE Healthcare Bio-Sciences Corp., Piscataway, NJ, USA). When culturing *E*. *coli* strains that accumulate PHB, for which OD<sub>600</sub> readings are no longer useful to estimate the cell dry weight (CDW), cells from 15 ml aliquots were washed and concentrated, and the CDW was determined after drying the samples at 80°C to constant weight as previously indicated by Nikel et al. (2008a,b).

## **NUCLEIC ACID MANIPULATIONS AND GENERAL CLONING TECHNIQUES**

DNA manipulations followed routine laboratory techniques as described by Sambrook et al. (1989) and Martínez-García and de Lorenzo (2012). Plasmid DNA was obtained using the QIAprep Spin™ Miniprep kit (Qiagen, Inc., Valencia, CA, USA). Restriction enzymes were obtained from New England Biolabs Inc. (Ipswich, MA, USA), and T4 DNA ligase was purchased from Roche Applied Science Co. (Indianapolis, IN, USA). Plasmid p-R-SETA111 was constructed using isothermal assembly essentially as detailed by Gibson et al. (2009) but using a home-made mixture of enzymes. Colony PCR was performed by transferring a single colony from a fresh LB agar plate directly into the reaction tube. PCR reactions were purified either with the NucleoSpin™ Gel and PCR clean-up kit (Macherey-Nagel GmbH & Co. KG, Düren, Germany) or the ExoSAP-IT™ PCR product clean-up kit (USB, Affymetrix Ltd., Santa Clara, CA, USA). Oligonucleotides were purchased from Sigma-Aldrich Co. (St. Louis, MO, USA). The oligonucleotides used in this work for specific DNA constructions are indicated in Table S1 in the Supplementary Material; the oligonucleotides used to identify the location of the chromosomal insertions are indicated in **Table 2** (see also next section). The three different mini-Tn*5* modules were chemically synthesized *de novo* by GeneCust Europe S.A. (Dudelange, Luxembourg). DNA sequencing was carried out by Secugen SL (Madrid, Spain).

Three separate DNA segments, carrying the transposon module along with the corresponding antibiotic-resistance determinant, were obtained as follows. In the first step, we used pBAM1 (Martínez-García et al., 2011) as the template to amplify the *bla* gene using primers Ap-*Asi*SI-F and Ap-*Mlu*I-R (Table S1 in the Supplementary Material), thereby substituting the *Swa*I and *Psh*AI restriction sites by target recognition sites for *Asi*SI and *Mlu*I, respectively, while maintaining the transcriptional control of *bla* through the native P3 promoter (Brosius et al., 1982). The backbone of plasmid pSEVA111 (Silva-Rocha et al., 2013) was amplified with the SEVA111-F and SEVA111-R oligonucleotide pair to obtain the second DNA fragment. Finally, the *tnpA* gene from pBAM1 was obtained using the oligonucleotides *tnpA*-*San*DI-F and *tnpA*-*Asi*SI-R that add the corresponding *San*DI and *Asi*SI restriction sites to the amplified fragment. These fragments were joined together by isothermal assembly, giving rise to plasmid p-R-SETA111 (Figure S1 in Supplementary Material).

Tripartite conjugative matings were set using *E*. *coli* CC118 λ*pir* (carrying pBAMD1-*x*, where *x* stands for any of the three antibiotic markers; see below) as the donor strain, the mating-helper strain *E*. *coli* HB101 (carrying pRK600), and *P*. *putida* KT2440 as the recipient strain. Conjugative matings were performed as described elsewhere (Martínez-García et al., 2011; Martínez-García and de Lorenzo, 2012). Briefly, the OD<sub>600</sub> of overnight cultures grown in LB medium with the appropriate antibiotics was adjusted to 1, then the cells were washed twice with 10 mM MgSO<sub>4</sub> to remove antibiotics from the culture medium, and each bacterial suspension was added to a 10 ml tube containing 5 ml of 10 mM MgSO<sub>4</sub> to obtain a final OD<sub>600</sub> of *ca*. 0.03. Biparental matings were done by following a similar procedure, but using *E*. *coli* S17-1 λ*pir* as the donor strain.

The mixture was concentrated by filtration and the cells were laid onto a filter disk (0.45 µm pore-size, 23 mm diameter, EMD Millipore Corp., Billerica, MA, USA). The filter was placed onto the surface of an LB agar plate and incubated at 30°C for 6 h. Finally, the biomass from the filter was suspended in 5 ml of 10 mM MgSO<sub>4</sub> and different dilutions were plated onto suitable selective media.

#### **Table 1 | Bacterial strains and plasmids used in this work**.


<sup>a</sup>Antibiotic markers: Ap, ampicillin; Cm, chloramphenicol; Gm, gentamicin; Km, kanamycin; Nal, nalidixic acid; Rif, rifampicin; Sm, streptomycin; Sp, spectinomycin; and Tc, tetracycline.

<sup>b</sup>Strain obtained from the E. coli Genetic Stock Center (Yale University, New Haven, CT, USA).

# **LOCALIZATION OF THE mini-Tn5 TRANSPOSON INSERTION SITES BY ARBITRARY PCR**

In order to specifically map the landing sites of the mini-transposon within the chromosome, a set of oligonucleotides was designed that specifically hybridizes in each of the mini-Tn*5* elements (**Table 2**). Transconjugants were streaked onto M9 minimal medium agar plates with 0.2% (w/v) sodium citrate as the sole carbon source and supplemented with the appropriate antibiotics to obtain isolated colonies, which were re-streaked in the same culture medium to obtain isolated clones. These colonies were used as the template for arbitrary PCR. The transposon insertion site could be independently identified using either the oligonucleotides that are close to the ME-O end or the ones designed for the ME-I end. However, when these plasmids carry any heterologous DNA in the multiple cloning site (MCS), the use of the oligonucleotides that are close to the ME-O end is the preferable option, since the ones for the ME-I end are located downstream of the MCS and therefore would amplify through it. In the latter case, and depending on the length of the DNA cloned in the MCS, the corresponding amplicon may not provide a long enough sequence to ascertain the insertion site of the mini-transposon module.

The conditions of the first round of arbitrary PCR were as follows: 5 min at 95°C (initial denaturation); six cycles of 30 s at 95°C, 30 s at 30°C, and 90 s at 72°C; and 30 cycles of 30 s at 95°C, 30 s at 45°C, and 90 s at 72°C (Das et al., 2005). The ARB6 oligonucleotide was used together with the external oligonucleotides within the mini-transposon (indicated with the "Ext" acronym in **Table 2**). Then, we used 1 µl of the first PCR round as the template for the second round of arbitrary PCR by applying the following conditions: 1 min at 95°C (initial denaturation); 30 cycles of 30 s at 95°C, 30 s at 52°C, and 90 s at 72°C; followed by an extra extension of 4 min at 72°C (Das et al., 2005). For the second round of arbitrary PCR, the ARB2 oligonucleotide was used together with the internal primers within the mini-transposon (indicated with the "Int" acronym in **Table 2**). Finally, the PCR amplification product obtained in the second round was directly purified and sent for sequencing with the corresponding internal oligonucleotide.
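The two cycling programs above can be written down as plain data, which makes them easy to check or to transfer to a thermocycler. The sketch below is only illustrative: the dictionary layout and the function name `hold_time_seconds` are assumptions introduced here, and the computed durations ignore the ramping time between temperatures.

```python
# Two-round arbitrary PCR programs transcribed from the text (times in seconds).
# The data layout and names are illustrative assumptions, not part of the protocol.

ROUND_1 = {
    "initial_denaturation": 5 * 60,            # 5 min at 95 °C
    "stages": [
        # (number of cycles, [(temperature in °C, seconds), ...])
        (6, [(95, 30), (30, 30), (72, 90)]),    # low-stringency annealing at 30 °C
        (30, [(95, 30), (45, 30), (72, 90)]),   # annealing raised to 45 °C
    ],
    "final_extension": 0,
}

ROUND_2 = {
    "initial_denaturation": 1 * 60,            # 1 min at 95 °C
    "stages": [(30, [(95, 30), (52, 30), (72, 90)])],
    "final_extension": 4 * 60,                 # extra 4 min at 72 °C
}


def hold_time_seconds(program):
    """Sum of the programmed hold times; ramping between steps is ignored."""
    cycling = sum(
        cycles * sum(seconds for _, seconds in steps)
        for cycles, steps in program["stages"]
    )
    return program["initial_denaturation"] + cycling + program["final_extension"]


print(hold_time_seconds(ROUND_1) / 60)  # 95.0 (minutes)
print(hold_time_seconds(ROUND_2) / 60)  # 80.0 (minutes)
```

Under these settings the first round accounts for 95 min and the second for 80 min of programmed hold time.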

DNA sequences were thoroughly inspected visually for any error and analyzed using the *Pseudomonas* Genome Database (Winsor et al., 2011), and BlastN (Altschul et al., 1990, 1997) was subsequently employed to map the precise transposon insertion point. To ascertain the conservation level of the 9-bp target sequence of Tn*5* transposase, we resorted to the web-based application WebLogo 3.4 (Crooks et al., 2004).

#### **ANALYTICAL PROCEDURES**

For a coarse estimation of the PHB content in *E*. *coli* transconjugants, we resorted to a fluorimetric assay based on Nile red staining (Spiekermann et al., 1999). Cells were grown overnight in LB medium with the proper antibiotic. The cultures were diluted to an OD<sub>600</sub> of 0.1 in fresh LB medium containing 30 g l<sup>−1</sup> glucose and 200 µl aliquots were placed in a 96-well microtiter plate (Costar™ black plates with clear bottom; Thermo Fisher Scientific Inc.). After growing the cells for 24 h at 37°C, 0.002 volume of a Nile red stock solution, freshly prepared by dissolving the dye (purchased from Sigma-Aldrich Co.) to 1 mg ml<sup>−1</sup> in dimethyl sulfoxide, was added to each well. The microtiter plates were incubated at 37°C in the dark for 30 min, and the fluorescence at 585 nm was measured in a SpectraMax M2e plate reader (Molecular Devices, LLC., Sunnyvale, CA, USA) in cells before and after staining with Nile red. The raw fluorescence readings were normalized to the biomass in each well by dividing the values by the OD<sub>600</sub> of the corresponding culture.
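As a minimal sketch of the arithmetic in this assay, the snippet below computes the volume of Nile red stock per well and the biomass-normalized fluorescence. The text states that readings were taken before and after staining and that values were divided by the OD<sub>600</sub>; subtracting the pre-staining reading as a background is an assumption made here for illustration, as are the function names and the example readings.

```python
def nile_red_volume_ul(culture_volume_ul, volume_fraction=0.002):
    """Volume of Nile red stock to add (0.002 volume of the culture)."""
    return culture_volume_ul * volume_fraction


def normalized_fluorescence(f_stained, f_unstained, od600):
    """Fluorescence per OD600 unit.

    Assumption: the reading taken before staining is used as a background
    and subtracted from the reading taken after staining.
    """
    if od600 <= 0:
        raise ValueError("OD600 must be positive")
    return (f_stained - f_unstained) / od600


# A 200 µl well receives 0.4 µl of the 1 mg/ml stock.
print(nile_red_volume_ul(200))

# Hypothetical readings for one well (arbitrary fluorescence units).
print(normalized_fluorescence(f_stained=1200.0, f_unstained=200.0, od600=2.0))  # 500.0
```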

The quantification of PHB content in *E*. *coli via* fluorescence-activated cell sorting (FACS) was conducted by following a slight modification of the protocol described by Tyo et al. (2006). In brief, cultures to be analyzed were promptly cooled to 4°C by placing them in an ice bath for 15 min. Cells were harvested by centrifugation (5 min, 5,000 × *g*, 4°C), resuspended to an OD<sub>600</sub> of 0.4 in cold TES buffer [10 mM Tris·HCl (pH = 7.5), 2.5 mM EDTA, and 10% (w/v) sucrose], and further incubated on ice for 15 min. Bacteria were recovered by centrifugation as explained above, and finally resuspended in the same volume of cold 1 mM MgCl<sub>2</sub>. A 1 ml aliquot of this suspension was mixed with 3 µl of a 1 mg ml<sup>−1</sup> Nile red solution and incubated in the dark at 4°C for 30 min. Cells were analyzed by FACS immediately after the staining procedure. FACS was carried out in a MACSQuant™ VYB cytometer (Miltenyi Biotec GmbH, Bergisch Gladbach, Germany). Cells were excited with an Ar laser (488 nm, diode-pumped solid state), and the Nile red fluorescence at 585 nm was detected with a 614/50 nm band-pass filter. FACS analysis was done on at least 50,000 cells and the results were analyzed with the built-in MACSQuantify™ software 2.5 (Miltenyi Biotec). The geometric mean of fluorescence in each sample was correlated to the PHB content (expressed as a percentage) through a calibration curve as described previously (Tyo et al., 2006).

Cell-free extracts were obtained from cells harvested by centrifugation from an appropriate culture volume at 4,000 × *g* at 4°C for 10 min and processed as described previously (Nikel and de Lorenzo, 2013b; Nikel et al., 2014b). The total protein concentration in cell extracts was assessed by means of the Bradford method (Bradford, 1976) using a commercially available kit from BioRad Laboratories, Inc. (Hercules, CA, USA), with crystalline bovine serum albumin as the standard for determinations. *In vitro* quantification of the specific 3-ketoacyl-coenzyme A (CoA) thiolase activity in the thiolysis direction was conducted according to the protocols developed by Palmer et al. (1991) and Slater et al. (1998), with some modifications. The assay mixture (1 ml) contained 65 mM Tris–HCl (pH = 7.5), 50 mM MgCl<sub>2</sub>, 62.5 µM CoA, and 65 µM 3-acetoacetyl-CoA (Sigma-Aldrich Co.). Solutions of both CoA and 3-acetoacetyl-CoA were freshly prepared just prior to the assay. The assay was initiated upon the prompt addition of the cell-free extract, and the disappearance of 3-acetoacetyl-CoA was measured with time at 304 nm (using an extinction coefficient for 3-acetoacetyl-CoA of ε<sub>304</sub> = 16.9 × 10<sup>3</sup> M<sup>−1</sup> cm<sup>−1</sup>). One enzyme unit was defined as the amount of enzyme catalyzing the conversion of 1 µmol of substrate to product per min at 25°C.
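The conversion from an absorbance slope to specific activity follows from the Beer–Lambert law and the unit definition given above. The sketch below is an illustration under stated assumptions: a 1 cm light path (not specified in the text), a hypothetical function name, and hypothetical example values for the slope and the amount of protein.

```python
EPSILON_304 = 16.9e3  # M^-1 cm^-1, extinction coefficient of 3-acetoacetyl-CoA


def thiolase_specific_activity(delta_a304_per_min, protein_mg,
                               assay_volume_ml=1.0, path_cm=1.0):
    """Specific activity in U per mg protein (1 U = 1 µmol substrate per min).

    Assumption: 1 cm light path unless stated otherwise.
    """
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    # Beer-Lambert: rate of concentration change in mol l^-1 min^-1
    molar_rate = abs(delta_a304_per_min) / (EPSILON_304 * path_cm)
    # Scale to the assay volume (ml -> l) and convert mol to µmol
    umol_per_min = molar_rate * (assay_volume_ml / 1000.0) * 1e6
    return umol_per_min / protein_mg


# Hypothetical example: A304 decreases by 0.169 per min with 0.01 mg protein.
print(round(thiolase_specific_activity(-0.169, 0.01), 3))  # 1.0
```

The absolute value is taken because the assay follows the disappearance of the substrate, so the measured slope is negative.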

Residual glucose and acetate concentrations in culture supernatants were determined in selected samples using adequate enzymatic kits (R-Biopharm AG, Darmstadt, Germany), essentially as per the manufacturer's instructions. In either case, control mock assays were made by spiking M9 minimal medium with different amounts of the metabolite under examination. Metabolite yields and kinetic culture parameters were analytically calculated from the raw growth data as described elsewhere (Nikel et al., 2008a, 2010a, 2014b; Nikel and de Lorenzo, 2013a,b).

## **STATISTICAL ANALYSIS**

The reported experiments were independently repeated at least twice (as indicated in the corresponding figure legend), and the mean value of the corresponding parameter ± SD is presented. When appropriate, data were statistically treated with an unpaired Student's *t* test, and 95% confidence intervals for each parameter were calculated to assess whether the differences in means among the experimental samples were statistically significant. For the flow cytometry experiments, the geometric mean values (from which the PHB content is derived) were analyzed *via* the Mann–Whitney *U* test.
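For readers who want to reproduce the summary statistics, the following sketch computes the mean ± SD and the statistic of an unpaired, equal-variance Student's *t* test using only the Python standard library. The function names and the example data are hypothetical; obtaining the *p* value or the 95% confidence interval from the statistic would require a *t* distribution table or a statistics package, which is not shown here.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(values):
    """Return (mean, sample standard deviation) of a set of replicates."""
    return mean(values), stdev(values)


def student_t(sample_a, sample_b):
    """Unpaired Student's t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * stdev(sample_a) ** 2
                  + (n_b - 1) * stdev(sample_b) ** 2) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical replicate measurements for two strains.
strain_a = [1.0, 2.0, 3.0]
strain_b = [4.0, 5.0, 6.0]
print(summarize(strain_a))            # (2.0, 1.0)
print(student_t(strain_a, strain_b))  # t is about -3.67 with 4 degrees of freedom
```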

#### **NUCLEOTIDE SEQUENCE ACCESSION NUMBERS**

The sequences of the pBAMD vectors were deposited in the GenBank database with the following GenBank accession numbers: KM403113 (pBAMD1-2), KM403114 (pBAMD1-4), and KM403115 (pBAMD1-6).

# **RESULTS AND DISCUSSION**

## **RATIONALE, DESIGN, AND GENERAL CHARACTERISTICS OF THE mini-Tn5-BASED VECTORS pBAMDs**

Vector pBAM1 (*b*orn *a*gain *m*ini-transposon) is a synthetic and modular plasmid with a number of features to facilitate the genome editing of Gram-negative bacteria (Martínez-García et al., 2011). We decided to further extend the range of such applications by constructing a new set of pBAM1-derivative plasmids, which are compatible with the rules set in the *Standard European Vector Architecture* (SEVA) format (Silva-Rocha et al., 2013). We addressed this challenge by constructing a standardized version of the mini-transposon delivery plasmid that takes full advantage of all the benefits of the pBAM1 plasmid together with the functional elements available within the SEVA collection. The starting idea was to design a plasmid series in which the mini-transposon module, the antibiotic-resistance marker, and the *tnpA* gene could be easily interchanged at the user's will. In doing so, several changes were needed to re-structure the constituents of pBAM1, giving rise to three plasmids that were collectively termed pBAMD (i.e., *pBAM d*erivative) vectors. These insertion plasmids share all the advantages and several structural features of their pBAM1 predecessor. In brief, these features include (i) the narrow host-range origin of replication of plasmid R6K [*ori*(R6K)], dependent on the Π protein (encoded by the *pir* gene of plasmid R6K); (ii) an origin of transfer, *oriT*, that allows for the conjugative transfer of the plasmid from a host strain to a new bacterial recipient through RK2-mediated mobilization; (iii) the *bla*-encoded β-lactamase marker that confers resistance to ampicillin as a selective marker of the backbone vector; and (iv) a modified, hyper-active transposase encoded by *tnpA* just outside (but adjacent to) a DNA segment that is flanked by the terminal sequences of Tn*5* (i.e., the mini-transposon module itself). The Tn*5*-transposition system is an optimal source of biological parts because of its genetic promiscuity, and it can be considered to operate as a virtually orthogonal part, since it displays an autonomous behavior with respect to the host metabolic and regulatory traits.

We first constructed an intermediary plasmid, named p-R-SETA111, that was later used as the backbone in which the different mini-Tn*5* antibiotic modules were implanted to obtain the pBAMD1-*x* vectors. A mixture of three separate DNA fragments was isothermally assembled (Gibson et al., 2009) to construct the p-R-SETA111 intermediary plasmid (Figure S1 in Supplementary Material). The sequence of p-R-SETA111 was thoroughly checked after assembly with the set of oligonucleotides described in Table S1 in the Supplementary Material. This intermediate vector bears an R6K origin of replication that depends on the Π protein supplied in *trans* (Kolter et al., 1978) for replication. This situation calls for the use of *E*. *coli pir*<sup>+</sup> strains to propagate these plasmids (Miller and Mekalanos, 1988; Herrero et al., 1990), such as *E*. *coli* DH5α λ*pir*, CC118 λ*pir*, or S17-1 λ*pir* (**Table 1**). Another feature of plasmid p-R-SETA111 is the presence of a minimized origin of transfer (*oriT*) from the promiscuous conjugative plasmid RP4 (Lyras and Rood, 1998; Silva-Rocha et al., 2013). Another trait of this intermediary vector shared with the SEVA plasmids is that the cargo module is flanked by the strong T1 and T0 transcriptional terminators (Silva-Rocha et al., 2013), which transcriptionally isolate any DNA sequence cloned in the MCS of p-R-SETA111. Importantly, the modular design of this plasmid allows for the convenient exchange of the *tnpA* gene by restriction of p-R-SETA111 with the rare cutters *San*DI (5′-GG/GWCCC-3′, W = A or T; Simcox et al., 1995) and *Asi*SI (5′-GCGAT/CGC-3′). Likewise, the antibiotic marker of the plasmid backbone (*bla* in this particular case) can be easily exchanged by enzymatic restriction with *Asi*SI and *Mlu*I (5′-A/CGCGT-3′).

Three different mini-Tn*5* modules were designed as the cargo segments to be implanted into the p-R-SETA111 plasmid. These elements were devised as cargoes for the SEVA plasmid collection (Silva-Rocha et al., 2013; Durante-Rodríguez et al., 2014), and so they were bracketed by *Pac*I and *Spe*I restriction sites. The cargo modules have a similar structural design, only differing in the antibiotic marker placed within the mini-transposon (see below). The mini-Tn*5* modules are flanked by the two mosaic end (ME) sequences. ME elements are optimized 19-bp DNA sequences recognized by the Tn*5* TnpA transposase to promote the specific transposition of any DNA segment bracketed by these elements (Zhou et al., 1998). Even though they are identical in sequence, and with the aim to facilitate the orientation of each functional element within the plasmid, we termed them as either ME-I (the one just after the *Pac*I recognition site) or ME-O (the one close to the *Spe*I recognition site).

The next relevant feature, placed right after the ME-I element, is a 10-bp buffer DNA sequence that is immediately followed by a MCS (spanning recognition sites for *Avr*II, *Sfi*I, *Not*I, *Eco*RI, *Sac*I, *Sma*I, *Bam*HI, *Xba*I, *Sal*I, *Pst*I, *Sph*I, *Hin*dIII, and *Not*I, in the 5′→3′ direction). This DNA stretch has the same restriction sites as those present in cargo 1 in the SEVA database. The synthetic T500 terminator (Yarnell and Roberts, 1999) was placed after the MCS to prevent transcriptional read-through from any heterologous DNA cloned within the cargo into the following component of the plasmid. The next functional element of the mini-transposon is the antibiotic selection marker that allows for the proper selection of transconjugants. For this set of plasmids, we have used resistances to kanamycin (*aphA*, encoding an aminoglycoside 3′-phosphotransferase; Martínez-García et al., 2011), streptomycin/spectinomycin (*aadA*, encoding a streptomycin 3″(9)-*O*-nucleotidyltransferase; Fling et al., 1985), and gentamicin (*aacC1*, encoding a gentamicin 3′-*N*-acetyltransferase; Kovach et al., 1995). These antibiotic-resistance cassettes are coded as 2, 4, and 6 in the SEVA database. Since the antibiotic selection cassettes are flanked by *Swa*I and *Psh*AI recognition sites, the user has the ability to further expand the plasmid collection by exchanging the cognate resistance genes with any of the two other markers present in the SEVA collection, i.e., *cat* (encoding a chloramphenicol *O*-acetyltransferase) or *tetA* (encoding a tetracycline efflux protein). The *rho*-independent transcriptional terminator from gene 32 of the phage T4 genome (Gorski et al., 1985; Miller et al., 2003; Martínez-García et al., 2011) was placed immediately downstream of the antibiotic-resistance gene to prevent any possible read-through from the corresponding promoter elements. Furthermore, the motif 5′-**G**GGACCC-3′ was changed to 5′-**C**GGACCC-3′ to eliminate a *San*DI restriction target within the existing terminator sequence.
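The effect of the single-base change in the terminator can be checked against the *San*DI recognition sequence (GGGWCCC, W = A or T) with a simple pattern search. The sketch below is only an illustration: the helper names are hypothetical, and the flanking bases in the example strings are invented for the demonstration. Because the recognition sequence is palindromic, scanning one strand is sufficient.

```python
import re

# Subset of the IUPAC nucleotide code needed for the SanDI site.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]"}


def site_regex(site):
    """Compile a recognition sequence written in IUPAC code into a regex."""
    return re.compile("".join(IUPAC[base] for base in site.upper()))


SANDI = site_regex("GGGWCCC")


def has_site(sequence, pattern=SANDI):
    """True if the sequence contains at least one match to the site."""
    return pattern.search(sequence.upper()) is not None


# Motif before and after the change described in the text
# (flanking bases are hypothetical).
print(has_site("TTGGGACCCAA"))  # True: the original motif is a SanDI target
print(has_site("TTCGGACCCAA"))  # False: the G-to-C change removes the site
```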

The last step in the construction of the pBAMD1-*x* vectors was the insertion of the mini-transposon modules themselves into the p-R-SETA111 backbone. The sequences of the three mini-Tn*5* modules were edited *in silico* to follow the SEVA rules and synthesized *de novo*. The mini-transposon modules were restricted using *Pac*I and *Spe*I and inserted into p-R-SETA111 previously digested with the same enzymes. The correctness of this last cloning step was verified by DNA sequencing with the SEVA oligonucleotides PS1 and PS2 (Table S1 in the Supplementary Material), which flank the cargo module. The resulting delivery plasmids were termed pBAMD1-2 (Km<sup>R</sup>), pBAMD1-4 (Sm<sup>R</sup>/Sp<sup>R</sup>), and pBAMD1-6 (Gm<sup>R</sup>) (**Figure 1A**). Note that the first digit in this numeric nomenclature stems from the fact that all the pBAMD1-*x* vectors are derivatives of the pSEVA111 vector, while the second number identifies the antibiotic-resistance marker. All the vectors share the same MCS (**Figure 1B**), which is also compatible with the rest of the SEVA vectors already available.

# **FUNCTIONAL VALIDATION OF THE pBAMD1-x VECTORS IN P. PUTIDA KT2440**

To evaluate the functionality of the new set of Tn*5* plasmids, the frequency of transposition into the environmental bacterium *P*. *putida* KT2440 was first tested in 6 h triparental mating assays. In order to estimate the frequency of transposition events, the number of antibiotic-resistant colonies was assessed after 24 h of incubation at 30°C, and normalized to the total of 1.5 × 10<sup>8</sup> recipient cells used in each experiment. The average frequency of transposition obtained at 6 h was (3.8 ± 2.9) × 10<sup>−4</sup> transconjugants per recipient cell (ranging from a minimum of 2.5 × 10<sup>−5</sup> to a maximum of 1.0 × 10<sup>−3</sup>). These figures were independent of the antibiotic marker used (either pBAMD1-2, pBAMD1-4, or pBAMD1-6), and no spontaneous antibiotic-resistant clones (i.e., Km<sup>R</sup>, Sm<sup>R</sup>, or Gm<sup>R</sup>, respectively) were detected under such growth conditions (data not shown).
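The normalization described above can be sketched as a short calculation. The 5 ml resuspension volume and the 1.5 × 10<sup>8</sup> recipient cells come from the text; the plated volume, the dilution factor, and the colony count in the example are hypothetical values chosen only to illustrate the arithmetic, and the function name is an assumption.

```python
RECIPIENT_CELLS = 1.5e8  # recipient cells used in each mating (from the text)


def transposition_frequency(colonies, dilution_factor, plated_ml,
                            suspension_ml=5.0, recipients=RECIPIENT_CELLS):
    """Antibiotic-resistant transconjugants per recipient cell.

    colonies        -- colonies counted on the selective plate
    dilution_factor -- e.g. 10 for a 1:10 dilution of the mating suspension
    plated_ml       -- volume spread on the plate (assumed value)
    suspension_ml   -- volume in which the filter biomass was resuspended
    """
    total_transconjugants = colonies * dilution_factor * (suspension_ml / plated_ml)
    return total_transconjugants / recipients


# Hypothetical count: 114 colonies from 0.1 ml of a 1:10 dilution.
print(f"{transposition_frequency(114, 10, 0.1):.1e}")  # 3.8e-04
```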

The next step was to differentiate between *bona fide* transposition events and non-specific plasmid integration events in the *P*. *putida* genome. To do so, transconjugant cells were re-streaked onto LB medium plates containing 500 µg ml<sup>−1</sup> ampicillin and incubated for 24 h to check for possible growth, which would indicate that the corresponding pBAMD vector had integrated into the target chromosome instead of transposing the ME-flanked DNA cargo. Besides this simple test, the user could also perform colony PCR amplifications to confirm that the transconjugants do not carry the pBAMD plasmid backbone by using either of the following two SEVA oligonucleotide combinations. If the plasmid is present, the PS5-PS4 oligonucleotide pair will produce a 225-bp amplicon within the *oriT* region, and the PS5-PS6 oligonucleotide pair will generate a 665-bp amplicon including the *oriT* segment and the R6K origin of replication (Silva-Rocha et al., 2013). By conducting these two assays, we noticed that 4.2 ± 2.4% of the potential transconjugants obtained with the pBAMD vectors resulted from plasmid co-integration events (a percentage very similar to that reported for other Tn*5*-based plasmid systems; de Lorenzo et al., 1990). Therefore, it is highly recommended to confirm the nature of the antibiotic-resistant clones obtained after each round of insertions.

We also studied whether the pBAMD mini-transposon delivery plasmids can be used serially to generate double and even triple transconjugant mutants by taking advantage of the three different antibiotic resistance markers (**Figure 2A**). A first round of transposition was performed with the three individual pBAMD plasmids using *P*. *putida* KT2440 as the recipient strain. Transconjugants were re-streaked onto M9 minimal medium plates containing 0.2% (w/v) citrate and supplemented with the appropriate antibiotics to obtain isolated colonies. These colonies were re-streaked again in the same culture medium to obtain pure clones. We randomly picked 22 clones obtained with each pBAMD1-*x* plasmid and characterized the corresponding transposon insertion site. In 12 out of 22 transconjugant clones, the ME-O related oligonucleotides (**Table 2**) were used, while in the remaining 10 transconjugant clones, the sequence was obtained by using the ME-I related oligonucleotides for the arbitrary PCR amplification. One Km<sup>R</sup> clone obtained after the first round of transposition with pBAMD1-2 was selected and used as the recipient strain for the pBAMD1-4 and pBAMD1-6 mini-transposon plasmids. After this second insertion round, we selected four transconjugants obtained with each system, and mapped the landing point of the mini-transposon using only the ME-O related oligonucleotides. Finally, we selected one *P*. *putida* KT2440 Km<sup>R</sup> Gm<sup>R</sup> clone and one *P*. *putida* KT2440 Km<sup>R</sup> Sm<sup>R</sup> clone to perform the third round of transposition with either pBAMD1-4 or pBAMD1-6, respectively. In the last step, we chose one mutant per plasmid system and characterized again the insertion site of the three mini-transposons using the ME-O oligonucleotides. The precise site of transposon insertion could be ascertained in 30 transconjugants out of 32 independent clones (**Table 3**). Specifically, in one of the cases in which it was not possible to identify the localization of the mini-Tn*5* element, the transposon insertion could have happened in either PP2612 or PP3616, since both genes share a 97% sequence identity. In the other case, the transposon insertion site could not be precisely mapped due to the presence of a large number of internal repeats within the *lapA* gene (PP0168).

**FIGURE 2 | Functional characterization of the pBAMD1-x delivery vectors in P. putida KT2440**. **(A)** Sequential insertion of different mini-Tn5 modules from pBAMD1-x plasmids carrying all three possible antibiotic-resistance determinants. The flowchart shows the procedure followed for the combinatorial integrations, starting from the wild-type strain KT2440. The names given to the intermediate strains reflect the order in which each antibiotic was delivered into the recipient bacteria (K, kanamycin; S, streptomycin/spectinomycin; and G, gentamicin). The exact chromosomal localization of the insertions in these strains is given in **Table 3**. Plasmids used in each round of integration are indicated in red. **(B)** Assessment of the possible sequence preference in the target DNA during the insertion process of the pBAMD1-x delivery vectors in P. putida KT2440. The WebLogo 3.4 software was used to identify the DNA signature (if any) in which the transposon lands in the chromosome of recipient bacteria. The software was fed with the 9-bp DNA sequence targeted by mini-Tn5 in independent trials (**Table 3**). Note the slight preference for G/C pairs at both ends of the target DNA motif (i.e., in positions 1 and 9).

#### **Table 3 | Insertion sites of the mini-transposon borne by the different pBAMD1-x vectors in the genome of P. putida KT2440<sup>a</sup>**.


<sup>a</sup>The insertion sites were ascertained by means of arbitrary PCR with the oligonucleotides indicated in **Table 2**, and the assigned function of the corresponding ORF is given according to the information available in the Pseudomonas Genome Database (Winsor et al., 2011). NA, not available.

<sup>b</sup>In the two transconjugant clones indicated, the insertion site of the mini-Tn5 element could not be unambiguously identified.

Poly(3-hydroxybutyrate) (PHB) biosynthesis pathway. Three enzymes are necessary for de novo synthesis of PHB in C. necator: a 3-ketoacyl-coenzyme A (CoA) thiolase (PhaA), a NADPH-dependent 3-acetoacetyl-CoA reductase (PhaB1), and a PHB synthase (PhaC1). PhaA and PhaB1 catalyze the condensation of two molecules of acetyl-CoA to 3-acetoacetyl-CoA and the reduction of acetoacetyl-CoA to R-(–)-3-hydroxybutyryl-CoA, respectively. PhaC1 polymerizes these monomers to PHB, whereas one CoA-SH molecule per monomer is released. The resulting PHB polymer is stored as water-insoluble granules in the cytoplasm of the cells. **(B)** Organization of the functional elements borne by plasmid pBAM1-6-pha and transferred into the

We then used the insertion sequence data of each mapped clone in **Table 3** to detect any sequence preference for the integration of the mini-Tn*5* cassettes within the genome of *P*. *putida* KT2440. The web-based application WebLogo 3.4 (Crooks et al., 2004) was fed with the 9-bp landing sequence targeted by the mini-transposon. The program was set to show the probability of having a defined base at any specific position within the 9-bp motif, and the G + C percentage of the genome was adjusted to the value of *P*. *putida* KT2440 (G + C = 61.5%; Nelson et al., 2002). The results shown in **Figure 2B** reveal that there is no marked DNA sequence bias for the integration site of the mini-transposon, as previously observed for plasmid pBAM1 (Martínez-García et al., 2011). However, a relatively minor preference for G/C pairs at both ends of the target DNA motif could be observed, as detected for other systems based on Tn*5* (Lodge et al., 1988). These experiments confirm that the three mini-transposons borne by the pBAMD1-*x* vectors can be used serially to carry out a second or even a third round of insertion mutagenesis, or to stably integrate multiple genetic devices into the genome of a single microbial strain.
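The kind of analysis described above can be sketched in a few lines of Python. This is a minimal illustration, not the WebLogo implementation: it tabulates the per-position base probabilities of a set of equal-length target sequences and scores each position against a background derived from a G + C content of 61.5%. The function names and the four 9-bp sequences are hypothetical placeholders and are not the insertion sites listed in Table 3.

```python
import math
from collections import Counter

BASES = "ACGT"


def background(gc_fraction):
    """Background base probabilities for a genome with the given G+C fraction."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return {"A": at, "C": gc, "G": gc, "T": at}


def position_probabilities(sites):
    """Per-position base probabilities for a list of equal-length sequences."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all target sequences must have the same length")
    columns = []
    for i in range(length):
        counts = Counter(s[i].upper() for s in sites)
        columns.append({b: counts.get(b, 0) / len(sites) for b in BASES})
    return columns


def relative_entropy(column, bg):
    """Kullback-Leibler divergence (in bits) of one column from the background."""
    return sum(p * math.log2(p / bg[b]) for b, p in column.items() if p > 0)


# Placeholder 9-bp sequences for illustration only (NOT data from Table 3).
example_sites = ["GTCATGAAC", "CAGTTCATG", "GATCCAGTC", "CTTAGACAG"]
bg = background(0.615)  # G + C content of P. putida KT2440 quoted in the text
for pos, col in enumerate(position_probabilities(example_sites), start=1):
    probs = {b: round(p, 2) for b, p in col.items()}
    print(pos, probs, round(relative_entropy(col, bg), 2))
```

A position whose relative entropy stays close to zero shows no preference beyond what the genome composition already predicts; with only a few dozen sequences, small deviations should be read with caution.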

# **DESIGN AND CONSTRUCTION OF A MICROBIAL CELL FACTORY FOR POLY(3-HYDROXYBUTYRATE) SYNTHESIS**

# **Engineering a stable PHB**<sup>+</sup> **phenotype in E. coli**

Poly(3-hydroxybutyrate) is an isotactic polyester composed of 3-hydroxybutyrate units (Anderson and Dawes, 1990). The PHB synthesis pathway in *C*. *necator* (formerly known as *Ralstonia eutropha*) comprises three enzymes (**Figure 3A**) (Steinbüchel and Hein, 2001). PhaA, a 3-ketoacyl-CoA thiolase, condenses two acetyl-CoA moieties, yielding 3-acetoacetyl-CoA. This intermediate is the substrate for PhaB, an NADPH-dependent 3-acetoacetyl-CoA reductase (encoded by *phaB1*). In the final step of this biosynthetic pathway, (*R*)-(–)-3-hydroxybutyryl-CoA is polymerized to PHB by PhaC, a poly(3-hydroxyalkanoate) synthase (encoded by *phaC1*). The idea of a thermoplastic, biocompatible material that is also readily biodegraded by a number of bacteria has become very attractive in an era of increasing environmental concern (Keshavarz and Roy, 2010). A number of different recombinant *E*. *coli* strains designed for polymer accumulation have been constructed thus far (Li et al., 2007; Chen et al., 2013; Ruiz et al., 2013), sourcing the *pha* genes from several bacteria (Verlinden et al., 2007). However, most of the PHB production systems available thus far suffer from a number of drawbacks (Wang et al., 2014). Among them, the transcriptional regulation of the *pha* genes is of particular importance. In natural producer bacteria, PHB accumulation is triggered by an imbalance in the availability of critical nutrients (e.g., the N or S source; Anderson and Dawes, 1990). In recombinant *E*. *coli*, however, the constitutive expression of the *pha* genes leads to a growth-dependent accumulation of PHB, which normally results in metabolic burden in the producing cells (Wang and Lee, 1997). Controlling the rate of polymer accumulation in recombinant *E*. *coli* is thus of paramount importance for the design of efficient microbial cell factories. Besides this feature, the segregational stability of plasmids in *E*. *coli* recombinants could also be an issue in prolonged fermentation processes aimed at biopolymer production (Nikel et al., 2010b; Ruiz et al., 2013).

chromosome of the recipient E. coli strain. The transcriptional terminators (Gm<sup>R</sup> ) determinant (accC1), are depicted as T<sup>500</sup> and T32. Note that the elements in this outline are not drawn to scale. **(C)** Exploring the landscape of PHB synthesis in E. coli transconjugants. The phaC1AB1 gene cluster from C. necator was randomly integrated into the chromosome of E. coli JW2293-1 (∆pta), and 24-h cultures of individual colonies were analyzed for PHB accumulation by fluorimetry after staining the cells with Nile red (see Materials and Methods for details). Several colonies, identified by numbers in the heat-map, were kept and further analyzed to establish the precise site of mini-Tn5(phaC1AB1) insertion (Table S2 in the Supplementary Material). AFU, arbitrary fluorescence units.

To overcome this state of affairs, we decided to explore the landscape of potentially useful transcription levels in *E*. *coli* by randomly integrating the *pha* genes from *C*. *necator* into the chromosome. To this end, we used vector pBAMD1-6 as the backbone to clone a 5.3-kb DNA fragment spanning the *phaC1AB1* genes. Plasmid pAeT41 (Peoples and Sinskey, 1989) was digested with *Eco*RI and *Sma*I to liberate the aforementioned DNA segment, which was then inserted into the corresponding restriction sites of pBAMD1-6 to generate plasmid pBAMD1-6-*pha* (**Table 1**). The functional parts of the DNA element to be transferred into *E*. *coli* are shown in **Figure 3B**.

As acetyl-CoA is the precursor metabolite for PHB formation, competing pathways that use this intermediate are expected to drain building blocks away from biopolymer synthesis. In *E*. *coli*, the acetate formation pathway, comprising Pta and AckA (phosphotransacetylase and acetate kinase), uses acetyl-CoA as the starting metabolite (Neidhardt et al., 1990). This pathway, which diverts a considerable amount of carbon from the central metabolism (Wolfe, 2005), is active under both oxic and anoxic conditions (Clark, 1989). For this reason, the *phaBAC* gene cluster was delivered into a ∆*pta* recipient strain (**Table 1**), in which acetate formation is expected to be low. Plasmid pBAMD1-6-*pha* was first transferred to *E*. *coli* S17-1 λ*pir* to perform a biparental mating. *E*. *coli* S17-1 λ*pir* or SM10 λ*pir* are the preferred donor strains when mobilizing RP4-based plasmids to *E*. *coli* recipient strains since they bear the functional *tra* and *mob* elements integrated in the genome, thus avoiding the inadvertent transfer of the mating-helper plasmid (pRK600) alongside the mini-transposon delivery system. Alternatively, *E*. *coli* strain MFD *pir*<sup>+</sup> (Ferrières et al., 2010), devoid of the Mu element present in the strains detailed above, could be used for transposon insertions.

After integration of the mini-Tn*5*:*phaC1AB1* device into *E*. *coli* JW2293-1, individual Gm<sup>R</sup> colonies were purified and separately grown in microtiter plates as explained in the Section "Materials and Methods" to explore PHB accumulation in 100 independent transconjugants after 24 h of incubation. **Figure 3C** shows the level of PHB accumulation in the transconjugants as compared to that of *E*. *coli* JW2293-1 carrying the pAeT41 plasmid, in which the expression of the *phaC1AB1* gene cluster is driven by the native promoter. All the transconjugants tested accumulated the polymer from mono-copy chromosomal insertions of the PHB biosynthesis pathway to some extent, ranging from 3% up to 78% of the accumulation levels observed in *E*. *coli* JW2293P (which carries plasmid pAeT41). Table S2 in the Supplementary Material shows the insertion locus for some selected *phaC1AB1*<sup>+</sup> transconjugants, clearly illustrating the non-selective nature of the chromosomal incorporation of Tn*5* (and consequently, the wide range of PHB accumulation levels), as it was already observed in *P*. *putida* transconjugants. We selected one of the transconjugants, termed *E*. *coli* TA2293P, that accumulated high polymer levels (clone 1 in **Figure 3C** and Table S2 in Supplementary Material), and the insertion site of the transposon was determined to be *ykgH*, an open reading frame encoding a predicted inner membrane protein (Table S2 in the Supplementary Material). The physiology of PHB accumulation of this strain was studied as detailed below.
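The screening logic described above (expressing the signal of each transconjugant relative to a plasmid-bearing reference and keeping the best performers) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the optional blank subtraction, and all fluorescence readings are hypothetical and do not reproduce the data of Figure 3C or the authors' actual processing.

```python
def relative_accumulation(afu_by_clone, afu_reference, afu_blank=0.0):
    """Express each clone's Nile red signal as a percentage of the reference strain.

    afu_blank is an assumed, optional background reading (e.g., a non-producing
    strain); it defaults to zero, i.e., no background subtraction.
    """
    span = afu_reference - afu_blank
    if span <= 0:
        raise ValueError("reference signal must exceed the blank")
    return {
        clone: 100.0 * (afu - afu_blank) / span
        for clone, afu in afu_by_clone.items()
    }


def top_clones(percentages, n=5):
    """Return the names of the n clones with the highest relative accumulation."""
    return sorted(percentages, key=percentages.get, reverse=True)[:n]


# Made-up readings in arbitrary fluorescence units (AFU), for illustration only.
readings = {"clone_a": 780.0, "clone_b": 30.0, "clone_c": 410.0}
pct = relative_accumulation(readings, afu_reference=1000.0)
print(pct)                   # {'clone_a': 78.0, 'clone_b': 3.0, 'clone_c': 41.0}
print(top_clones(pct, n=2))  # ['clone_a', 'clone_c']
```

Ranking on a normalized scale rather than on raw AFU makes clones measured on different plates comparable, provided that the reference strain is included on every plate.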

# **Physiological and biochemical characterization of E. coli TA2293P as a microbial cell factory for PHB synthesis**

The growth parameters of several *E*. *coli* strains were determined to explore their potential as biopolymer cell factories (**Table 4**). We decided to compare side by side the performance of strains in which the *pha* genes are expressed either from a multi-copy plasmid or from a mono-copy insertion in the bacterial chromosome. Interestingly, the elimination of Pta resulted in a reduction of the specific growth rate, probably owing to an imbalance in the acetyl-CoA pool that was partially restored by the heterologous expression of the *phaC1AB1* gene cluster. This positive effect was more evident in the strain in which the genes were inserted into the chromosome (the specific growth rate attained *ca*. 85% of that in the wild-type strain), thus suggesting that an adequate expression level of the PHB biosynthesis genes is important to recover a homeostatic acetyl-CoA balance. This kinetic pattern was also mirrored in the final biomass density of the cultures. In fact, among the strains tested, *E*. *coli* TA2293P attained the highest cell density. This result highlights the advantage of integrating the *pha* genes in the chromosome, as such an approach not only avoids the metabolic burden usually associated with heterologous gene expression from plasmids and other extra-chromosomal elements but also allows for the selection of an integrant strain exhibiting the appropriate level of transcription of the corresponding genes. Moreover, the selection of *ykgH* as a target was not at all obvious, indicating, again, the value of random insertion of the genes of interest in the bacterial chromosome.

Since we selected a *pta* mutant of *E*. *coli* as the recipient strain, in which acetate formation is expected to be impaired, by-product formation was also explored in these cultures as a measure of the carbon flow from glucose to PHB (**Table 4**). In cultures of the wild-type strain, up to 60% of the total carbon source was converted into acetate, pinpointing this metabolite as the key by-product of hexose catabolism in *E*. *coli* (and therefore, as the main side pathway competing for acetyl-CoA). The *pta* mutant still produced some acetate (probably through the action of pyruvate oxidase, PoxB); however, the molar conversion of glucose into acetate reached only *ca*. 20% of that observed in the wild-type strain. Acetate formation in both *E*. *coli* strains carrying PhaC1AB1 was comparable, and much lower than in the other two strains. In all, these results bear witness to (i) the suitability of a *pta* mutant, deficient in acetate formation, as the starting point to construct a PHB cell factory, and (ii) the effect of the PHB biosynthetic pathway in using acetyl-CoA as the precursor metabolite. Once the coarse physiological characterization of these strains was completed, the next relevant question was how they perform as PHB producers.

**Table 4 | Physiological characterization of wild-type and mutant E. coli strains as microbial cell factories for PHB biosynthesis in shaken-flask cultures**.

| E. coli strain | Relevant characteristics | µ (h<sup>−1</sup>)<sup>a</sup> | CDW (g l<sup>−1</sup>)<sup>a</sup> | Y<sub>A/S</sub> (mol mol<sup>−1</sup>)<sup>a</sup> |
| --- | --- | --- | --- | --- |
| BW25113 | Wild-type strain | 0.74 ± 0.04 | 3.8 ± 0.2 | 0.64 ± 0.05 |
| JW2293-1 | ∆pta | 0.44 ± 0.03 | 2.9 ± 0.3 | 0.13 ± 0.02 |
| JW2293P | ∆pta phaC1AB1<sup>+</sup> | 0.51 ± 0.07 | 3.5 ± 0.1 | 0.05 ± 0.02 |
| TA2293P | ∆pta ykgH:mini-Tn5(phaC1AB1) | 0.63 ± 0.05 | 4.2 ± 0.4 | 0.09 ± 0.03 |

<sup>a</sup>Physiological parameters. Cells were grown aerobically in M9 minimal medium containing 30 g l<sup>−1</sup> glucose as the sole carbon source. The specific growth rate (µ) was determined during logarithmic growth, whereas the final cell density (expressed as the cell dry weight, CDW) and the molar yield of acetate on glucose (Y<sub>A/S</sub>) were calculated after 24 h of incubation. Reported results represent the mean value ± SD of triplicate measurements from at least two independent cultures.
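The relative figures quoted in the text for growth rate and acetate yield follow directly from the mean values in Table 4. The short sketch below recomputes them; the dictionary layout and the helper name `percent_of` are illustrative choices, and the standard deviations are deliberately left out.

```python
# Mean values transcribed from Table 4 (standard deviations omitted).
TABLE_4 = {
    "BW25113":  {"mu": 0.74, "cdw": 3.8, "y_as": 0.64},
    "JW2293-1": {"mu": 0.44, "cdw": 2.9, "y_as": 0.13},
    "JW2293P":  {"mu": 0.51, "cdw": 3.5, "y_as": 0.05},
    "TA2293P":  {"mu": 0.63, "cdw": 4.2, "y_as": 0.09},
}


def percent_of(strain, reference, parameter):
    """Value of `parameter` in `strain` as a percentage of the reference strain."""
    return 100.0 * TABLE_4[strain][parameter] / TABLE_4[reference][parameter]


# Specific growth rate of the chromosomal integrant relative to the wild type.
print(round(percent_of("TA2293P", "BW25113", "mu")))    # 85
# Molar acetate yield of the pta mutant relative to the wild type.
print(round(percent_of("JW2293-1", "BW25113", "y_as")))  # 20
# Strain reaching the highest final cell dry weight.
print(max(TABLE_4, key=lambda s: TABLE_4[s]["cdw"]))     # TA2293P
```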

Since PhaA is the first committed enzymatic step of PHB formation from acetyl-CoA, the *in vitro* activity of this enzyme was assayed as a proxy for the activity of the whole biosynthetic pathway in shaken-flask cultures using glucose as the sole carbon source (**Figure 4A**). Note that there was some degree of thiolase activity in both *E*. *coli* BW25113 and JW2293-1, probably attributable to FadA (an enzyme normally involved in the degradation of fatty acids via the β-oxidation cycle). However, the specific PhaA activity was six- and fourfold higher in the strains carrying the *phaC1AB1* gene cluster (*E*. *coli* JW2293P and TA2293P, respectively) as compared to that of *E*. *coli* JW2293-1. As expected, the highest enzymatic activity corresponded to the strain carrying the *pha* genes in a multi-copy plasmid. Nevertheless, *E*. *coli* TA2293P had an activity level *ca*. 60% of that of the strain bearing pAeT41, indicating that the appropriate insertion of the gene cluster could result in thiolase activities similar to those of a typical recombinant PHB producer. Yet, how do these activities translate into PHB accumulation?

**Figure 4B** shows that *E*. *coli* TA2293P accumulated PHB up to 62% of the level observed in the same strain but expressing the *pha* genes in a plasmid (i.e., *E*. *coli* JW2293P). As previously observed in other traits during the physiological characterization, the latter strain had the highest PHB accumulation level among all the strains tested. That *E*. *coli* TA2293P accumulates such a high amount of PHB is a somewhat surprising (and welcome) result considering the difference in copy number between the two *phaC1AB1*<sup>+</sup> strains under comparison. Note that a possible effect of the absence (or an altered expression level) of YkgH on the properties of *E*. *coli* TA2293P cannot be completely ruled out. Interestingly, when this strain was persistently cultured in LB medium without any selective pressure, its phenotypic traits, particularly regarding polymer accumulation, remained unchanged. By contrast, the segregational stability of pAeT41 was assessed in cultures of *E*. *coli* JW2293P, and, after growing the cells in LB medium and sub-culturing them daily seven times without any antibiotic, <25% of the cells were resistant to ampicillin.
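To put the plasmid-loss figure in perspective, the back-of-the-envelope sketch below asks what constant retention per passage would leave 25% of the cells ampicillin-resistant after seven daily transfers. This is a simplifying assumption introduced here, not a model used in the study: it supposes that the same fraction of cells keeps the plasmid at every passage and that ampicillin resistance reports plasmid retention. The function name is hypothetical.

```python
def retention_per_passage(final_fraction, passages):
    """Constant per-passage retention giving `final_fraction` after `passages` transfers.

    Assumes a purely geometric loss, i.e., final_fraction = r ** passages.
    """
    if not 0 < final_fraction <= 1:
        raise ValueError("final_fraction must lie in (0, 1]")
    if passages < 1:
        raise ValueError("passages must be at least 1")
    return final_fraction ** (1.0 / passages)


# Fewer than 25% resistant cells after seven passages, so this is an upper bound.
r = retention_per_passage(0.25, 7)
print(round(r, 2))  # 0.82
```

Under these assumptions, no more than about 82% of the plasmid-bearing population would be retained per passage. Real loss kinetics depend on the number of generations per passage and on any growth advantage of plasmid-free cells, neither of which is given in the text.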

# **CONCLUSION**

In this study, we have presented a set of new mini-Tn*5*-derived vectors that can be used to engineer the genome of Gram-negative bacteria. While the worth of these tools has been demonstrated in two model bacteria, *P*. *putida* and *E*. *coli*, the inherent promiscuity of Tn*5* ensures its functioning in a number of different microbial hosts. Of particular importance is the possibility of sequentially using the three pBAMD1-*x* vectors, thereby enabling the user to accumulate insertions in the same genetic background in a combinatorial fashion. This is a particularly interesting feature for the construction of complex phenotypes, such as biopolymer formation, which depend on more than one enzyme. Although the expression of the biosynthetic genes in a multi-copy plasmid is in principle enough to bestow the desired phenotype on the recipient bacterium, fine-tuned expression levels (together with the appearance of emergent phenotypic properties in the host, brought about by the insertion process itself) can be easily achieved by randomly integrating the structural genes into the chromosome. In this way, and since the genetic context of each integration will surely result in different regulatory patterns at the transcriptional level, the user could choose among a library of insertions those clones that meet any desired criterion (in our case, PHB accumulation). Moreover, as the delivery plasmids described in this study can be used in a sequential manner, other polymer-associated enzymes, such as phasins, can also be incorporated in the same strain to further enhance polymer production.

**FIGURE 4 | Biochemical characterization of E. coli TA2293P as a microbial cell factory for PHB synthesis**. **(A)** In vitro determination of the specific (Sp) 3-ketoacyl-coenzyme A thiolase (PhaA) activity. Cells were harvested after growing them for 24 h in M9 minimal medium supplemented with 30 g l<sup>−1</sup> glucose as the sole carbon source, and the activity of PhaA was determined in the cell-free extract as detailed in the Section "Materials and Methods." **(B)** Poly(3-hydroxybutyrate) (PHB) accumulation. The PHB content (expressed as a percentage of the cell dry weight) was assessed by flow cytometry after growing the cells for 24 h in M9 minimal medium supplemented with 30 g l<sup>−1</sup> glucose as the sole carbon source. In all cases, each bar represents the mean value of the corresponding enzymatic activity ± SD of triplicate measurements from at least two independent experiments. The strains used to explore these biochemical traits were E. coli BW25113 (wild-type strain), E. coli JW2293-1 (∆pta), E. coli JW2293P (∆pta, carrying the phaC1AB1 gene cluster in a multi-copy plasmid), and E. coli TA2293P [∆pta, ykgH:mini-Tn5(phaC1AB1)]. See **Table 1** for further details about the genotype of each E. coli strain. The relevant features of each strain are indicated at the bottom of the figure.

## **ACKNOWLEDGMENTS**

We are indebted to Prof. A. Sinskey (Massachusetts Institute of Technology) for sharing research materials and to A. Goñi (CNB-CSIC) for his help in image processing. I. Benedetti (CNB-CSIC) is gratefully acknowledged for her help in FACS measurements. This study was supported by the BIO Program of the Spanish Ministry of Economy and Competitiveness, the ST-FLOW and ARISYS Contracts of the EU, the ERANET-IB Program, and the PROMT Project of the CAM to VDL. PIN is a researcher from the Consejo Nacional de Investigaciones Científicas y Técnicas (Argentina) and holds a Marie Curie Actions Program grant from the EC (ALLEGRO, UE-FP7-PEOPLE-2011-IIF-300508). The authors declare no conflict of interest.

# **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fbioe.2014.00046/abstract

# **REFERENCES**


of the metabolically versatile *Pseudomonas putida* KT2440. *Environ. Microbiol.* 4, 799–808. doi:10.1046/j.1462-2920.2002.00366.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 22 August 2014; accepted: 13 October 2014; published online: 28 October 2014.*

*Citation: Martínez-García E, Aparicio T, de Lorenzo V and Nikel PI (2014) New transposon tools tailored for metabolic engineering of Gram-negative microbial cell factories. Front. Bioeng. Biotechnol. 2:46. doi: 10.3389/fbioe.2014.00046*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2014 Martínez-García, Aparicio, de Lorenzo and Nikel. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
