# ENGINEERING SYNTHETIC METABOLONS: FROM METABOLIC MODELLING TO RATIONAL DESIGN OF BIOSYNTHETIC DEVICES

EDITED BY: Lars M. Voll and Zoran Nikoloski PUBLISHED IN: Frontiers in Bioengineering and Biotechnology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-921-1 DOI 10.3389/978-2-88919-921-1



#### Topic Editors:

**Lars M. Voll,** Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany **Zoran Nikoloski,** Max-Planck-Institute for Molecular Plant Physiology, Germany

Side view of an ScrY trimer (orange ribbons) embedded in a phospholipid membrane (blue spheres and sticks) with the periplasmic side facing downward. The substrate of this porin, sucrose, is depicted as green ball-and-stick models; water molecules are represented by blue dots.

Image adapted from Sun et al. (2016), DOI: 10.3389/fbioe.2016.00009.

The discipline of Synthetic Biology has recently emerged at the interface of biology and engineering. Its definition has kept changing ever since, which reflects how rapidly the field is moving and how broad a range of research areas it comprises.

Within this Research Topic, we focus on Synthetic Biology approaches that aim at rearranging biological parts or entities in order to generate novel biochemical functions with inherent metabolic activity. This Research Topic encompasses Pathway Engineering in living systems as well as the *in vitro* assembly of biomolecules into nano- and microscale bioreactors.

Both the engineering of metabolic pathways *in vivo* and the conceptualization of bioreactors *in vitro* require rational design of assembled synthetic pathways and depend on careful selection and optimization of individual biological functions. Mathematical modelling has proven to be a powerful tool for predicting metabolic flux in living and artificial systems, although modelling approaches have to cope with a shortage of experimentally verified, reliable input variables. This Research Topic puts special emphasis on the vital role of modelling approaches for Synthetic Biology, i.e., the predictive power of mathematical simulations for (i) the manipulation of existing pathways and (ii) the establishment of novel pathways *in vivo*, as well as (iii) the translation of model predictions into the design of synthetic assemblies.

**Citation:** Voll, L. M., Nikoloski, Z., eds. (2016). Engineering Synthetic Metabolons: From Metabolic Modelling to Rational Design of Biosynthetic Devices. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-921-1

# Table of Contents

*Editorial: Engineering Synthetic Metabolons: From Metabolic Modeling to Rational Design of Biosynthetic Devices* Lars M. Voll and Zoran Nikoloski

**Chapter 1: Molecular Dynamics Simulations at Biomembranes**

Liping Sun, Franziska Bertelshofer, Günther Greiner and Rainer A. Böckmann

*GroPBS: Fast Solver for Implicit Electrostatics of Biomolecules* Franziska Bertelshofer, Liping Sun, Günther Greiner and Rainer A. Böckmann

**Chapter 2: From Mathematical Modelling to Biochemical Microreactors**

**Chapter 3: Metabolic Modelling and Metabolic Engineering in Plants**

Georg Basler, Anika Küken, Alisdair R. Fernie and Zoran Nikoloski

*Optimization of Engineered Production of the Glucoraphanin Precursor Dihomomethionine in Nicotiana benthamiana* Christoph Crocoll, Nadia Mirza, Michael Reichelt, Jonathan Gershenzon and Barbara Ann Halkier

**Chapter 4: Protein Engineering**

*Synthetic Peptides as Protein Mimics* Andrea Groß, Chie Hashimoto, Heinrich Sticht and Jutta Eichler


# Editorial: Engineering Synthetic Metabolons: From Metabolic Modeling to Rational Design of Biosynthetic Devices

#### *Lars M. Voll<sup>1</sup>\* and Zoran Nikoloski<sup>2</sup>*

*<sup>1</sup> Division of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, <sup>2</sup> Max-Planck-Institute for Molecular Plant Physiology, Potsdam-Golm, Germany*

Keywords: synthetic biology, pathway engineering, bioreactors, model-based design, computational models, molecular dynamics modeling

#### **The Editorial on the Research Topic**

#### **Engineering synthetic metabolons: from metabolic modelling to rational design of biosynthetic devices**

#### *Edited by:*

*Zhanglin Lin, Tsinghua University, China*

*Reviewed by: Yinjie Tang, Washington University, USA*

> *\*Correspondence: Lars M. Voll lars.voll@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 17 March 2016 Accepted: 21 April 2016 Published: 06 May 2016*

#### *Citation:*

*Voll LM and Nikoloski Z (2016) Editorial: Engineering Synthetic Metabolons: From Metabolic Modeling to Rational Design of Biosynthetic Devices. Front. Bioeng. Biotechnol. 4:39. doi: 10.3389/fbioe.2016.00039*

Synthetic Biology is an emerging discipline that enjoys rapidly increasing popularity. However, there is no widely accepted definition of the term *Synthetic Biology*, because the field is multifaceted and rapidly moving. Viewed from its underlying principle, Synthetic Biology can be defined as the "application of engineering principles to biological systems." It ranges from pathway engineering in living systems, i.e., the introduction of functional biochemical pathways into existing organisms (see featured research topic review by Pröschel et al. and references cited therein), and the design of vesicle-based multicompartmented biochemical reactors and protocells (see featured research topic review by Schmitt et al. and references cited therein) to the creation of entirely synthetic cells with reproductive potential that is encoded by synthetic genes [the work of Gibson et al. (2008, 2010) attracted the greatest publicity in recent years]. Likewise, the definition provided at http://syntheticbiology.org covers these two extremes in that Synthetic Biology is defined as "(A) the design and construction of new biological parts, devices, and systems and (B) the re-design of existing, natural biological systems for useful purposes." This also means that Synthetic Biology is not limited to living systems but also comprises the combination and re-design of biological parts into artificial bioreactors. In this respect, Synthetic Biology is highly interdisciplinary, bringing together molecular biology with biophysics, material sciences, bioengineering, and computational approaches.

Besides their invaluable role in helping biochemists understand the metabolic complexity of living systems and the extent to which observations from existing metabolic engineering strategies match predictions from large-scale modeling (see, e.g., the featured research topic article by Basler et al.), models are crucial for the rational design of synthetic systems (see, e.g., the featured research topic article by Elbinger et al.). Metabolic modeling has the capacity to guide the biochemical layout of engineered pathways *in vivo* and the design of artificial bioreactors *in vitro*. For instance, computational modeling can play an important role in the selection of individual enzyme isoforms for engineered metabolic pathways. *Vice versa*, *in vitro* studies in artificial bioreactors have advanced our understanding of biochemically complex processes, such as starch biosynthesis, as summarized in the featured research topic review by O'Neill and Field. On the other hand, molecular dynamics simulations help to understand complex biochemical and biophysical processes such as enzymatic catalysis, molecular interactions of biomolecules (see featured research topic reviews by Horn and Sticht and Groß et al.), or membrane transport (see featured research topic articles by Róg et al., Bertelshofer et al. and Sun et al.). The knowledge gained from these simulations can be exploited for the structural and molecular design of synthetic systems, i.e., for the choice of artificial protein or nucleic acid scaffolds for the spatial modular organization of artificial systems (see featured research topic review by Pröschel et al. and references cited therein).

This research topic focuses on the importance of computational modeling for the rational design of synthetic systems on all these levels. Consequently, a wide variety of approaches is covered in this issue, from pathway engineering in eukaryotic cells to molecular dynamics simulations of transport processes. The modeling approaches range from partial differential equations, allowing predictions of the spatiotemporal concentration of metabolites in a biochemical microreactor (see featured research topic original research article by Elbinger et al.), to steady-state equations, testing the effect of metabolic engineering strategies in large-scale metabolic networks (see, e.g., the featured research topic article by Basler et al.). A new metabolic engineering strategy for the glucosinolate pathway that increases dihomomethionine levels without increasing the levels of leucine-derived side products was experimentally validated in *Nicotiana benthamiana* (see featured research topic original research article by Crocoll et al. and references to previous attempts therein). Sharing of modeling results and ensuring reproducibility of model-data integration require setting and following a set of standards, which are currently being established in the context of metabolomics research (see featured research topic review by Hill et al.).
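The two ends of the modeling spectrum named above, partial differential equations and steady-state equations, can be written in their generic textbook forms. This is a sketch under stated assumptions and not the specific equations used in the featured articles: the symbols $c_i$ (concentration of metabolite $i$), $D_i$ (diffusion coefficient), $R_i$ (net reaction rate), $S$ (stoichiometric matrix), and $\mathbf{v}$ (flux vector) are introduced here for illustration only.

```latex
% Reaction-diffusion form: spatiotemporal concentration of metabolite i
\frac{\partial c_i(\mathbf{x},t)}{\partial t}
  = D_i \,\nabla^{2} c_i(\mathbf{x},t) + R_i\bigl(c_1,\dots,c_n\bigr)

% Steady-state mass balance over a metabolic network with bounded fluxes
S\,\mathbf{v} = \mathbf{0},
\qquad v_j^{\min} \le v_j \le v_j^{\max}
```

The first form tracks where and when each metabolite accumulates, whereas the second discards time and space and asks only which flux distributions are compatible with the network stoichiometry.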

The majority of the articles in this research topic were contributed by members of the interdisciplinary project SynBio, which is funded by the Emerging Fields Initiative of the Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany).

## AUTHOR CONTRIBUTIONS

LV and ZN conceived and wrote the article.

## FUNDING

The topic editor would like to gratefully acknowledge generous financial support by the Emerging Fields Initiative of the FAU Erlangen-Nürnberg in the framework of the EFI SynBio program and funding by the FAU open access fund.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Voll and Nikoloski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Building synthetic sterols computationally – unlocking the secrets of evolution?**

*Tomasz Róg<sup>1</sup>, Sanja Pöyry<sup>1</sup> and Ilpo Vattulainen<sup>1,2</sup>\**

*<sup>1</sup> Department of Physics, Tampere University of Technology, Tampere, Finland, <sup>2</sup> MEMPHYS-Center for Biomembrane Physics, University of Southern Denmark, Odense, Denmark*

Cholesterol is vital in regulating the physical properties of animal cell membranes. While it remains unclear what renders cholesterol so unique, it is known that other sterols are less capable of modulating membrane properties, and there are membrane proteins whose function is dependent on cholesterol. Practical applications of cholesterol include its use in liposomes for drug delivery and cosmetics, cholesterol-based detergents in membrane protein crystallography, and its fluorescent analogs in studies of cholesterol transport in cells and tissues. Clearly, despite the difficulty of their synthesis, producing synthetic analogs of cholesterol is of great commercial and scientific interest. In this article, we discuss how synthetic sterols that do not exist in nature can be used to elucidate the roles of cholesterol's structural elements. To this end, we discuss recent atomistic molecular dynamics simulation studies that have predicted new synthetic sterols with properties comparable to those of cholesterol. We also discuss more recent experimental studies that have vindicated these predictions. The paper highlights the strength of computational simulations in making predictions for synthetic biology, thereby guiding experiments.

**Keywords: cholesterol, synthetic sterol, computer simulation, molecular dynamics simulation**

## **Why Are Synthetic Lipids and Sterols Important?**

Nature has designed thousands of lipid species, so why would we need synthetic lipids in addition? Nevertheless, the use of synthetic lipids is commonplace in both applied and basic sciences. The largest applications of synthetic lipids are in pharmacology, where they are used, e.g., in drug delivery and gene transfection. In drug delivery, the most commonly used carriers are liposomes; however, simple micelles or nanodiscs can be used as well. Technical requirements for the carriers include optimal lifetime, just-in-time triggered release of their contents, feasible targeting agents, etc. Numerous synthetic lipids have been synthesized and tested for this purpose [for a recent review, see Kohli et al. (2014)]. As in several other cases, atomistic molecular dynamics simulations have been used here to unravel the physicochemical properties of these lipids [e.g., Bunker (2012)]. In gene transfection, one possible form of DNA packaging is the so-called *genosome*, commonly also called the *lipoplex*. A lipoplex is an aggregate of DNA and lipids; however, the cationic lipids needed to form this aggregate do not exist in nature. Consequently, only synthetic lipids can be used for this purpose.

Synthetic lipids also have numerous applications in basic research. Possibly the most apparent example is labeling lipids with fluorescent or spin labels. For instance, cholesterol labeled with BODIPY or NBD has been used to study cholesterol trafficking in cells. Here also, molecular dynamics (MD) simulations have been used to examine the different behaviors of native and modified molecules, thus complementing and explaining experiments (Hölttä-Vuori et al., 2008; Robalo et al., 2013). Synthetic detergents like cholesteryl hemisuccinate are commonly used in G-protein coupled receptor crystallography, and again MD simulations have elucidated the differences between native and modified molecules (Kulig et al., 2014, 2015). More sophisticated applications of synthetic lipids include modifying a molecule's native structure by removing functional groups in order to understand their individual function. Sphingolipids in particular have been extensively studied in this manner (Slotte, 2013).

#### *Edited by:*

*Lars Matthias Voll, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany*

#### *Reviewed by:*

*Rainer A. Böckmann, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany Olle Edholm, KTH Royal Institute of Technology, Sweden*

#### *\*Correspondence:*

*Ilpo Vattulainen, Department of Physics, Tampere University of Technology, POB 692, Tampere FI-33101, Finland ilpo.vattulainen@tut.fi*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 27 May 2015 Accepted: 07 August 2015 Published: 21 August 2015*

#### *Citation:*

*Róg T, Pöyry S and Vattulainen I (2015) Building synthetic sterols computationally – unlocking the secrets of evolution? Front. Bioeng. Biotechnol. 3:121. doi: 10.3389/fbioe.2015.00121*

In this perspective article, we show an example of this last approach. The studies discussed here aimed at understanding the detailed structure–function relationships of cholesterol, in particular the role of the methyl groups attached to the steroid ring system. As we explain in detail next, these groups might with good reason be thought of as unnecessary molecular fossils. However, as the discussion below highlights, extensive atomistic MD simulations showed that the methyl groups are indeed important parts of the cholesterol molecule, and the simulation results were later confirmed by experiments.

## **What is So Special About Cholesterol?**

Cholesterol is a truly special molecule and absolutely vital for animals' wellbeing. This is probably best demonstrated by the complete lack of mutations that would totally block the synthesis of cholesterol. Furthermore, some rare genetic syndromes caused by impaired cholesterol synthesis lead to serious conditions or death (Kelley and Herman, 2001). To ensure proper function, cholesterol needs a high degree of structural specificity. Indeed, cholesterol's precursors that have one additional double bond compared to cholesterol cannot substitute for it on their own, irrespective of whether the bond is located in the ring structure (7-dehydrocholesterol) or in the hydrocarbon tail (desmosterol) (Kelley and Herman, 2001). Highlighting its pivotal role, cholesterol is the single most common lipid species in our body. Its concentration in cell membranes varies from 30 to 50 mol% (van Meer et al., 2008), whereas in specialized membranes, such as the ocular lens (Mason et al., 2003), its concentration may reach 75 mol%. Ten percent of brain dry mass is cholesterol (Snipes and Suter, 1997). In the intracellular membranes, the concentration of cholesterol is lower but still typically 10–20 mol%. Deservedly, cholesterol is one of the most studied lipid molecules of all time.

Many of the various functions of cholesterol are related to modifying the structural properties of membranes. For example, cholesterol increases the mechanical strength of membranes, decreases their permeability, and affects membrane thickness and condensation [for reviews, see Ohvo-Rekilä et al. (2002), Almeida (2009), and Róg and Vattulainen (2014)]. The presence of cholesterol alters the pressure profile across membranes (Ollila et al., 2007); this effect is sensitive to even small modifications in sterol structure. Cholesterol also modulates the phase behavior of lipid bilayers in a complex way (Ipsen et al., 1987; Vist and Davis, 1990). At larger cholesterol concentrations, a new phase called the *liquid ordered (Lo) phase* occurs, while at lower concentrations a *liquid disordered phase* is observed. Cholesterol is able to promote the formation of so-called *lipid rafts*, functional nanoscale domains that are rich in cholesterol, sphingolipids, and saturated phospholipids (Lingwood and Simons, 2010), and numerous cellular functions, such as signaling and intracellular trafficking, actually depend on cholesterol (Coskun and Simons, 2011). Other cellular functions of cholesterol include its role as a metabolite and precursor of bile salts, some vitamins, and adrenal, pituitary, and sex (steroid) hormones.

All of the points discussed above give rise to a picture of cholesterol as having a very special and specific structure. As early as the 1970s, cholesterol was established to be composed of three structural elements: a small hydroxyl head group, a rigid steroid ring system, and a short iso-octyl tail (Demel et al., 1972; Wenz, 2012). Modifications of these elements typically decrease the strength of cholesterol's effects on the physical properties of lipid bilayers and, as mentioned above, other sterols cannot substitute for cholesterol in its biological function.

#### **Does Cholesterol's Biosynthetic Pathway Reflect Molecular Evolution?**

The biosynthesis of cholesterol is a complex process. The first sterol on the path is lanosterol (**Figure 1**), which is synthesized from squalene in a reaction that requires molecular oxygen. Consequently, the emergence of this sterol can be dated to a time in Earth's history after prokaryotic life had developed. Thus, perhaps not surprisingly, sterols are not typical bacterial lipids, with the exception of *Mycoplasma*, one of the simplest parasitic bacteria, which utilizes lipids produced by its hosts and is actually often thought of as an intermediate form of life between viruses and bacteria. Next, lanosterol is converted into cholesterol through two alternative pathways: one ending in desmosterol and the other in 7-dehydrocholesterol, the direct precursors of cholesterol. Although textbooks show these as separate pathways, it should be kept in mind that at each of the individual steps it is possible to swap to the other pathway, as appropriate enzymes for this do exist. The conversion of lanosterol to cholesterol requires a minimum of only 7 steps; however, 18 steps are possible, and thus 18 enzymes exist! This has to be energetically very expensive for cells, once more stressing the great importance of cholesterol.

This amazing redundancy was noticed long ago, and it has given rise to the question of what is so special about the structure of cholesterol that sets it apart from lanosterol and other precursors. When looking at the structures of lanosterol and cholesterol in **Figure 1**, one notices that the differences are limited to the number and position of double bonds (one more in lanosterol) and the number of methyl groups attached to the steroid ring system (three more in lanosterol). While these do not seem to be large differences, they have substantial consequences. First, it has been shown that lanosterol does not induce the *Lo* phase, and thus lipid rafts cannot be formed by this sterol (Miao et al., 2002). Even more intriguingly, it was shown as early as the 1960s that cholesterol's precursors affect the properties of lipid bilayers progressively more strongly along the pathway, ending in cholesterol, whose effect is the strongest of all. Thus, it has been proposed that the biosynthetic pathway of cholesterol reflects the evolutionary optimization of its structure (Bloch, 1979; Nielsen et al., 2000).

This idea was the starting point for our first investigation into the matter using atomistic MD simulations. Intriguingly, the methyl groups stick out from one side of the cholesterol molecule, called the β-side, while the other side, called the α-side, is flat (**Figure 1**). Lanosterol has three additional methyl groups as compared to cholesterol. Two of these additional methyl groups stick out from the α-side, while the third is directed along the ring plane. Our first results showed greater ordering of saturated lipids neighboring the α-side of cholesterol as compared to lipids next to the β-side (Róg and Pasenkiewicz-Gierula, 2001). Subsequent studies showed that the packing of lipid carbon atoms near the α-side is tight, whereas near the β-side it is much looser (Róg and Pasenkiewicz-Gierula, 2004). In other words, we showed that the flatness of the ring is associated with higher ordering of lipids. These results fit perfectly with the idea of considering the removal of methyl groups as optimization of cholesterol's structure.

[Figure caption fragment: "... molecules are given. The middle panel shows only the ring system of each ..."]

At this point, another open question remains about the role of double bonds in the sterols' structure. In the case of desmosterol, atomistic MD studies showed it to be inferior to cholesterol in its ability to order saturated lipids, whereas in the case of unsaturated lipids there is no significant difference between the two sterols (Vainio et al., 2006; Róg et al., 2008). These results agree with experimental data (Huster et al., 2005; Scheidt et al., 2005). Subsequently, studies of 7-dehydrocholesterol showed very small or non-existent differences as compared to cholesterol. This was observed both in MD simulations (Róg et al., 2008; Liu et al., 2011) and in experimental studies (Chen and Tripp, 2012). However, there are two conjugated double bonds in the ring structure of 7-dehydrocholesterol, which may render the molecule prone to oxidation. This might be the reason why 7-dehydrocholesterol is not the sterol of choice for biological membranes.

[Figure caption fragment: "... oxygen in red, and hydrogen in silver."]

Eukaryotic cells require the ordering properties of sterols. At the same time, all of the above considerations lead us to the conclusion that these ordering properties are decreased in the presence of methyl groups. Then, why would any methyl groups remain on the β-side of the ring system? Are they molecular fossils? Could we further optimize the structure of sterols by removing these last remaining groups?

## **Are Cholesterol's Methyl Groups Molecular Fossils? – Simulations said No!**

Molecular dynamics is a very flexible method and provides an inexpensive way to start investigating a new molecule. Admittedly, if the new molecule does not exist yet, validating the model may be problematic. Nevertheless, taking into account the current development of organic synthesis methods, one may expect the results from MD to be validated eventually.

In the second phase of our investigations, we designed our first sterol, which lacks the methyl groups C19 and C18: 18,19-di-nor-cholesterol, which we called Dchol (see **Figure 1**) (Róg et al., 2007). To our surprise, this sterol does not induce more order in saturated bilayers than cholesterol does, even though the packing of the lipid tails' atoms is almost identical on both sides of Dchol and even slightly higher than in the case of cholesterol. On the contrary, Dchol's ordering capability is clearly worse. In unsaturated bilayers, the differences were smaller; however, cholesterol was still superior to our artificial Dchol. The molecular-level mechanism behind the weaker ordering and condensing effects was related to the larger tilt of Dchol in the bilayer (Aittoniemi et al., 2006). Studies of several sterols have shown that a sterol's tilt correlates with its ordering capability (Aittoniemi et al., 2006; Khelashvili and Harries, 2013). Thus, our conclusion was that the methyl groups on the β-side are needed to ensure the proper orientation of cholesterol. Following the initial idea, we then designed alternative sterols with the methyl groups removed one by one, expecting that perhaps not all of the methyl groups are needed for maintaining the optimal tilt (Pöyry et al., 2008). Contrary to expectations, however, all the designed sterols again turned out to be inferior to cholesterol in their ordering capabilities, although in some cases the differences were very small. These studies also showed the C18 methyl group to be the most important one, as its removal had the largest effect. Still, the other methyl groups also enhanced the sterols' ordering abilities. All this was very surprising and contrary to our expectations, so we continued our investigations even further.
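The two quantities that this argument relates, sterol tilt and lipid ordering, can be illustrated with a minimal sketch. It assumes, for illustration only, that the tilt is measured as the angle between the bilayer normal and a vector running from the C3 to the C17 atom of the sterol ring, and that ordering is summarized by the generic second-rank order parameter S = ⟨(3cos²θ − 1)/2⟩; the function names and the choice of the z axis as the bilayer normal are hypothetical and are not taken from the cited studies.

```python
import math


def _unit(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length vector")
    return tuple(x / norm for x in v)


def tilt_angle_deg(c3, c17, normal=(0.0, 0.0, 1.0)):
    """Angle (degrees) between the C3->C17 vector and the bilayer normal.

    The absolute value of the cosine folds the result into 0-90 degrees,
    so sterols in either leaflet are treated the same way (an assumption).
    """
    ring = _unit(tuple(b - a for a, b in zip(c3, c17)))
    n = _unit(normal)
    cos_t = min(1.0, abs(sum(r * m for r, m in zip(ring, n))))
    return math.degrees(math.acos(cos_t))


def order_parameter(angles_deg):
    """Average of (3 cos^2(theta) - 1) / 2 over a list of angles in degrees."""
    if not angles_deg:
        raise ValueError("no angles given")
    values = [
        (3.0 * math.cos(math.radians(a)) ** 2 - 1.0) / 2.0 for a in angles_deg
    ]
    return sum(values) / len(values)
```

With these definitions, a sterol standing upright has a tilt of 0°, and chain segments perfectly aligned with the normal give S = 1, while segments lying in the membrane plane give S = −0.5. In an actual analysis, both quantities would be averaged over many molecules and simulation frames.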

The observation that C18 is the most important methyl group has interesting connotations. The most common lipid chain is an 18-carbon, monounsaturated chain, with the double bond located at position 9–10, attached at the *sn*-2 position of a glycerol moiety. Cholesterol's effect on unsaturated lipids is weaker than on saturated ones. However, as our studies have shown (Martinez-Seara et al., 2008), the position of the double bond is significant. The largest differences between saturated and unsaturated lipids were observed when the double bond was located at position 9–10. Shifting the double bond up or down led to stronger effects of cholesterol, and the interactions of the unsaturated and saturated tails with cholesterol gradually converged. Even shifting an unsaturated tail from the *sn*-2 position to *sn*-1 slightly increased cholesterol's effects (Martinez-Seara et al., 2009). Plausibly, the reason for this is the difference in equivalent atom positions in the two tails. Consequently, we proposed an additional function for the C18 methyl group: discrimination between saturated and unsaturated chains. We also hypothesized that lipids and sterols coevolved, leading to the known cholesterol structure and to a selection of hydrocarbon chains that together optimize the desired membrane properties. Moreover, the differences in cholesterol's effects on saturated and unsaturated lipids affect phase separation and the properties of the formed domains.

Another difference between cholesterol and Dchol can be easily visualized. If we look at the cholesterol molecule perpendicularly from its side (**Figure 1**), we see a clear pattern – a flat and a rough face. Now, if we instead look at the cholesterol molecule from top down, we see a kind of threefold symmetry, shown in **Figures 1** and **2**. This is caused by the β-face being subdivided into two further faces (Martinez-Seara et al., 2010). Dchol, due to its lack of methyl groups on the β-face, does not display this kind of threefold symmetry. The difference can be visualized well by looking at the two-dimensional radial distribution of cholesterols around a tagged cholesterol shown in **Figure 2**. This difference may affect the phase behavior of lipid bilayers. As we mentioned above, lanosterol does not promote the *Lo* phase formation, and due to the additional methyl group does not possess the threefold symmetry. As depicted in **Figure 2**, our preliminary data suggest that the symmetry of cholesterol's ring affects the sterol–sterol arrangement. Sterols tend to locate in the second coordination shell of each other, with a lipid molecule in between (Martinez-Seara et al., 2010). Due to the threefold symmetry, cholesterol molecules are able to form a fork net (**Figure 2**) that is likely capable of covering large areas. By contrast, Dchol has only twofold symmetry and thus forms linear structures. It seems plausible that this different form of molecular packing will affect also the phase behavior of Dchol. At this point, we need more extensive studies to further clarify the matter.

## **Experiments Confirmed the Results from Atomistic MD Simulations**

To validate the results from these MD simulation studies, one first has to synthesize the de-methylated form of cholesterol. This task is not to be taken lightly, as cholesterol has seven chiral centers, which make its synthesis particularly complicated. Nevertheless, Dchol was recently synthesized, 7 years after our first simulation studies of de-methylated sterols (Mydock-McGrane et al., 2014). The synthesis started from a compound whose own synthesis was already established, perhydrochrysenone, from which 18-19-dinor-cholesterol was obtained in eighteen steps. The overall yield was 3.5%, which, taking into account the complexity of the process, is a very good result.

The properties of Dchol were carefully examined via an extensive set of biophysical methods (Krause et al., 2014). Langmuir monolayers and fluorescence anisotropy measurements showed that Dchol has slightly weaker condensing and ordering ability than cholesterol, in agreement with our simulation data. Calorimetric study showed that the temperature of the main phase transition is within error range for lipid bilayers with both sterol types. Nevertheless, excess heat capacity endotherms showed that cholesterol affects the phase transition more strongly than Dchol, indicating differences in interactions of both sterols with phospholipids. Most interestingly, the results of this study showed decreased nearest neighbor interactions in bilayers with Dchol, compared to those with cholesterol. This result cannot be directly compared to results from MD simulations; however, it has interesting consequences. The difference of nearest neighbor interactions of tens of calories per mole, as observed in this experimental study, might lead to substantial changes in domain size distribution as documented by Monte Carlo simulations (Almeida, 2009).

## **Conclusion**

Both atomistic MD simulations and experimental studies have shown that cholesterol's methyl groups are definitely not molecular fossils. On the contrary, they are important structural elements. Removal of these groups clearly decreases the sterol's ordering and condensing effects. MD simulation studies have indicated that the decreased ordering is related to a larger tilt of the de-methylated sterols, suggesting that the methyl groups are involved in maintaining the proper orientation of cholesterol in lipid bilayers. Both experimental and MD studies imply that the presence of methyl groups might affect the sterol's ability to induce phase separation by affecting domain sizes or changing the structure of the formed sterol–lipid–sterol patches. This problem clearly requires more studies, as it might potentially be the most important reason for nature to select cholesterol.

This perspective article has provided a clear example of how MD simulations can independently provide powerful predictions and thus guide experiments. We have shown how constructing molecules that do not exist in nature can increase our understanding of molecular design of lipids and how simulations of these systems are capable of providing correct, valuable predictions later confirmed by experiments. Despite these kinds of successes, the current editorial practice in high-impact journals clearly favors papers that include both MD simulations and experiments. This happens at the expense of pure simulation papers, which are considerably harder to publish. Surely, every model has to be validated with experimental data, yet lifting the requirement that every single simulation study has to be coupled to experiments *in the same paper* might result in the publication of a greater number of progressive, high-quality simulation articles, which would likely provide fresh, valuable ideas, and predictions for experimental scientists.

## **Acknowledgments**

This work was supported by the Academy of Finland (Center of Excellence program, project no. 272130), the European Research Council (Advanced Grant CROWDED-PRO-LIPIDS), and the Sigrid Juselius Foundation. CSC-IT Center for Science (Espoo, Finland) is acknowledged for computational resources.

#### **References**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Róg, Pöyry and Vattulainen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Characteristics of Sucrose Transport through the Sucrose-Specific Porin ScrY Studied by Molecular Dynamics Simulations

*Liping Sun1 , Franziska Bertelshofer1,2 , Günther Greiner2 and Rainer A. Böckmann1 \**

*1Computational Biology, Department of Biology, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany, 2Computer Graphics Group, Department of Computer Science, Friedrich-Alexander University of Erlangen-Nürnberg, Erlangen, Germany*

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*Alexander Schulz, University of Copenhagen, Denmark Mario Andrea Marchisio, Harbin Institute of Technology, China Ulrich Kleinekathöfer, Jacobs University Bremen, Germany*

> *\*Correspondence: Rainer A. Böckmann rainer.boeckmann@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 10 September 2015 Accepted: 25 January 2016 Published: 15 February 2016*

#### *Citation:*

*Sun L, Bertelshofer F, Greiner G and Böckmann RA (2016) Characteristics of Sucrose Transport through the Sucrose-Specific Porin ScrY Studied by Molecular Dynamics Simulations. Front. Bioeng. Biotechnol. 4:9. doi: 10.3389/fbioe.2016.00009*

Sucrose-specific porin (ScrY) is a transmembrane protein that allows for the uptake of sucrose under growth-limiting conditions. The crystal structure of ScrY was previously resolved by X-ray crystallography, both in its uncomplexed form and with bound sucrose. However, little is known about the molecular characteristics of the transport mechanism of ScrY. To date, sucrose transport through ScrY has not been clearly demonstrated. Here, the dynamics of the ScrY trimer embedded in a phospholipid bilayer as well as the characteristics of sucrose translocation were investigated by means of atomistic molecular dynamics (MD) simulations. The potential of mean force (PMF) for sucrose translocation through the pore showed two main energy barriers within the constriction region of ScrY. Energy decomposition allowed us to pinpoint three aspartic acids as key residues opposing the passage of sucrose, all located within the L3 loop. Mutation of two aspartic acids to uncharged residues resulted in accordingly modified electrostatics and a decreased PMF barrier. The chosen methodology and results will aid in the design of porins with modified transport specificities.

Keywords: molecular dynamics, ScrY, porin, sucrose binding, transport mechanism, potential of mean force

## 1. INTRODUCTION

Most bacteria produce cell walls surrounding the cytoplasmic membrane to protect their vulnerable cell structure and to maintain their mechanical rigidity. Gram-negative bacteria possess an outer membrane surrounding the inner cell wall with a peptidoglycan layer in between (Glauert and Thornley, 1968). The outer membrane acts as a selective permeability barrier to exclude noxious compounds and exchange nutrients and waste products with the external medium (Nikaido and Nakae, 1980). For this purpose, the outer membrane contains porins, a special class of proteins, which function as channels across the membrane. Through porins a variety of substrates can pass in a diffusion-like process (Nakae, 1976). Such porins can be classified into (i) general diffusion porins that are responsible for the non-specific and spontaneous transport of ions and small hydrophilic molecules, and (ii) specific diffusion channels that contain stereospecific binding sites within the pore, facilitating the uptake of solutes of certain types (Nikaido, 1992). The expression of these specific porins is usually induced under special environmental conditions (Nikaido and Vaara, 1985).

General diffusion porins usually form tightly assembled homotrimers. Each subunit is a water-filled *β*-barrel consisting of typically 16 or 18 antiparallel *β*-strands oriented perpendicular to the membrane plane and tilted by 30–60° with respect to the molecular symmetry axis (Nabedryk et al., 1988; Jap, 1989). The *β*-strands are connected by short turns on the periplasmic side and by long loops on the external side (Nikaido, 1994). Interestingly, porin channels are constricted by the so-called eyelet loop that folds inwardly and is attached to the inner side of the barrel wall, resulting in a cross-section of a minimal size of ~7 × 11 Å, which allows the passage of hydrophilic solutes up to an exclusion size of ≈600 Da (Weiss et al., 1991; Schirmer, 1998). Since the pore size is similar to the diameter of most nutrient molecules, the diffusion rates are strongly affected by the physical properties of the substrates. As typical general porins in *E. coli*, OmpF and OmpC were reported to favor both neutral molecules and cations, while PhoE favors anions (Nikaido and Vaara, 1985; Bauer et al., 1989).

One example of a specific porin is the sucrose-specific porin (ScrY) of enteric bacteria, expressed when *E. coli* is starved for sucrose (Schmid et al., 1982). It permits the rapid influx of sucrose across the outer membrane, allowing cells to grow on sucrose as a sole carbon source (García, 1985; Schmid et al., 1991). ScrY was found during the investigation of the plasmid-encoded metabolic pathway of sucrose in *Salmonella typhimurium*, where sucrose has only a small rate of translocation through the outer membrane in the absence of the *scr* genes pUR400 (Schmid et al., 1988). The crystal structure of sucrose-specific porin was determined at a resolution of 2.4 Å, both in its uncomplexed form and with bound sucrose (Forst et al., 1998). Apart from the shared architectural properties with general porins, each polypeptide chain of the ScrY channel, containing 413 structurally well-defined amino acids, traverses the membrane 18 times as antiparallel *β*-strands surrounding a hydrophilic pore. Importantly, the eyelet loop (L3) folds inwardly into the lumen of the *β*-barrel forming a selective gate, comparable to general porins as mentioned above. This is in excellent agreement with the physiological function of the porin to exclude toxic compounds and to maximize the uptake of nutrients using wide openings and a greasy pathway for sugars (see below), minimizing frictional interactions to the constriction site in the L3 region (Welte et al., 1995). Sucrose-specific porin binds two sucrose molecules at the same time in a certain configuration (Forst et al., 1998) [similar to maltoporin (Dutzler et al., 1996)]. Besides, it also has the features of a general diffusion pore with a comparable single-channel conductance, which is much smaller for other specific porins (Schmid et al., 1991; Schülein et al., 1995).

The essential role of the inner loop L3 for modulating the translocation of molecules was also suggested for the outer membrane porin OprD, specific for the uptake of small natural substrates like cationic amino acids, by means of *in silico* electrophysiology and metadynamics simulation techniques (Samanta et al., 2015). A combination of *in silico* and *in vitro* studies was used in the study of the outer membrane channels OprP (specific for phosphate transport) and OprO (specific for diphosphate transport). Two amino acids in the central constriction region were suggested to generate the substrate specificity. Reciprocal exchange of these amino acids resulted in an interchange of substrate specificities for these channels (Modi et al., 2015). For OmpF, the cation selectivity was reported to be highly influenced by the electrostatic environment of the constriction region. It was found that removing the cationic residues in the cross-sectional area enhanced the cation selectivity, whereas removal of the anionic residues reversed the selectivity (Pezeshki et al., 2009).

Another specific porin, maltoporin (LamB), was first identified as the receptor for *λ*-phage in *E. coli* (Randall-Hazelbauer and Schwartz, 1973). It forms trimeric channels, which are specific for the transport of maltose and malto-oligosaccharides, and is synthesized at maltodextrin concentrations below 10 μM (Nikaido and Vaara, 1985; Saurin et al., 1989; Death et al., 1993). Notably, ScrY can also function as a malto-oligosaccharide porin, and LamB is also able to activate the influx of sucrose, but only at high substrate concentrations (Szmelcman and Hofnung, 1975; Schülein and Benz, 1990; Schmid et al., 1991). Therefore, it was suggested that ScrY is a sugar-specific porin with a similar function as LamB. This is further stressed by their structural similarity (Dali Server yields a *Z* score of 36.8) and their sequence similarity (BLAST *E* value = 4e−173) (Altschul et al., 1990; Holm and Rosenström, 2010). Although the similarity in topology is remarkable, the depicted porins share only 23% amino acid sequence homology (Protein Data Bank). The sequence identity is increased at the prospective glucose-binding regions, located in the first half of the primary sequence from the N-terminal end (Hardesty et al., 1991; Schülein et al., 1995). Interestingly, the binding constant of ScrY and LamB with malto-oligosaccharides increases with the number of glucose residues but becomes saturated after five residues, as inferred from current noise and ion flow inhibition studies, respectively (Benz et al., 1987; Schülein et al., 1991). Based on these observations, Benz and colleagues proposed that the binding site has a length of about five glucose residues along the wall of the channel and that the dextrin penetrates in single file because of the hourglass-shaped constriction of the pore. However, evidence for this mechanism is still scarce (Benz et al., 1987; Benz and Bauer, 1988).

Very few experiments have been carried out so far that provide insights into the structure–function relationship of ScrY. The first clear demonstration of sucrose uptake was provided by Schmid et al. (1982) in their *in vivo* study of *E. coli* using chromatography techniques, where the apparent *K*m value (i.e., the Michaelis constant) was determined to be 10 μM. Separately, the binding of sucrose to ScrY was investigated by Schülein et al. (1995), who reported a stability constant K, defined as the ratio between the on and off rate constants for sugar binding, of 20 l mol<sup>−1</sup> based on the relative rates of permeation of ScrY and LamB and the stability constant of sucrose binding to LamB. The penetration rates of sugars through porin channels are likely to be strongly affected by the sugar concentration on each side of the membrane, as revealed by the current noise studies of Jordy et al. (1996) on LamB, in which the ionic flow is blocked during sugar passage. Besides, the concept of a "greasy slide" was described for both ScrY and LamB: five or six contiguous aromatic side chains line up on one side of the channel and form a smooth hydrophobic path (Schirmer et al., 1995; Wang et al., 1997; Forst et al., 1998). The greasy slide extends from the channel vestibule through the constriction zone to the periplasmic exit and guides sugar molecules sliding through the channel by engaging in non-specific hydrophobic interactions with the pyranose rings (Schirmer et al., 1995). Based on this concept, the passage of sucrose has been hypothesized to consist of a series of steps: first, sucrose diffuses from the external solution to a trapping zone in sucrose-specific porin. Second, a sucrose molecule slides along the greasy slide and enters the binding region. Finally, sucrose passes through the binding region and enters the cell (Forst et al., 1998). However, to date there is no convincing evidence supporting this hypothesis.

In this study, we have investigated the dynamics of the ScrY trimer embedded in a phospholipid bilayer as well as the characteristics of sucrose translocation by means of atomistic molecular dynamics (MD) simulations. Various structural properties of ScrY and their implications for sucrose transport are discussed. Based on the results of potential of mean force (PMF) calculations for sucrose transport through ScrY, key residues were pinpointed and a mutant was suggested that showed a significantly decreased free energy barrier for the passage of sucrose.

#### 2. COMPUTATIONAL METHODS

#### 2.1. System Setup

The simulation system was prepared by embedding the refined protein model in the lipid bilayer using the INSANE approach (Pluhackova et al., 2013; Wassenaar et al., 2015). The crystal structure of sucrose-specific porin was taken from the Protein Data Bank (PDB entry 1A0T, see **Figure 1A**) (Forst et al., 1998). The parameters for sucrose were generated using the GLYCAM06 Carbohydrate Builder. The lipid type and the bilayer composition have been shown to be non-essential for both the porin function and the pore properties (Parr et al., 1986; Wiese et al., 1994, 1996). Here, a bilayer patch composed of 331 1-palmitoyl-2-oleoyl-sn-glycero-3-phosphocholine (POPC) lipids was constructed in the fluid phase with equal cell lengths in the membrane plane (*Lx* = *Ly*, thickness of approx. 3.76 nm). The protein–membrane system was solvated with TIP3P water (Jorgensen et al., 1983) in a rhombic box (>36,000 water molecules). Na<sup>+</sup> and Cl<sup>−</sup> ions were added to the solution at a physiological concentration of 50 mM, and the protein net charge was compensated by ions. All starting structures were initially energy minimized using the steepest descent algorithm. Two setups constructed from the crystal structure, without sucrose (system "ScrY-nosuc") and with 6 bound sucrose molecules ("ScrY-suc"), were simulated for 100 ns each. Subsequently, the sucrose-free system after 100 ns was taken as the start configuration for two additional simulations, with 36 (system "Sucsol-36") and 72 sucrose molecules ("Sucsol-72") added to the solvent, respectively. These systems were simulated for 0.5 μs each. A snapshot of one simulation system is shown in **Figure 1B**; it contains more than 173,000 atoms in total.

#### 2.2. Simulation Details

Atomistic molecular dynamics simulations were carried out using the open-source software package GROMACS version 4.6.3 with a time step of 2 fs (Berendsen et al., 1995; Hess et al., 2008). The AMBER99SB-ILDN force field (Hornak et al., 2006; Lindorff-Larsen et al., 2010) for proteins was combined with GLYCAM06 parameters (Kirschner et al., 2008; Tessier et al., 2008) for sucrose and the SLIPIDS force field (Jämbeck and Lyubartsev, 2012a,b,c) for lipids. The cutoff for the van der Waals and the (short-range) Coulomb potential was set to 1.2 nm. Short-range electrostatic interactions were calculated explicitly, whereas long-range electrostatic interactions were computed using the Particle Mesh Ewald (PME) method (Darden et al., 1993). Temperature coupling was achieved with the Nosé-Hoover scheme with a reference temperature of 310 K and a time constant of 1.0 ps (Nosé, 1984; Hoover, 1985). Semiisotropic pressure coupling was applied using the Berendsen barostat with a time constant of 5.0 ps (Berendsen et al., 1984). The LINCS algorithm was applied to constrain the lengths of bonds involving hydrogen atoms (Hess et al., 1997). Periodic boundary conditions were applied in all three dimensions to avoid boundary effects caused by a finite simulation system.
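As an illustration only, the settings described above could be collected into a GROMACS `.mdp` fragment along the following lines. This is a minimal sketch, not the authors' input file: the dictionary `MDP_SETTINGS` and the helper `render_mdp` are hypothetical names, the `integrator` entry is an assumption (the text does not name the integrator), and values the text does not state (for example coupling groups, reference pressure, and compressibility, which a complete input would also need) are deliberately left out.

```python
# Hypothetical mapping of the simulation settings described in the text onto
# GROMACS .mdp option names. Values not given in the text are omitted.
MDP_SETTINGS = {
    "integrator": "md",               # ASSUMPTION: integrator is not stated in the text
    "dt": 0.002,                      # 2 fs time step (ps)
    "coulombtype": "PME",             # long-range electrostatics
    "rcoulomb": 1.2,                  # short-range Coulomb cutoff (nm)
    "rvdw": 1.2,                      # van der Waals cutoff (nm)
    "tcoupl": "nose-hoover",          # temperature coupling scheme
    "ref_t": 310,                     # reference temperature (K)
    "tau_t": 1.0,                     # temperature coupling time constant (ps)
    "pcoupl": "berendsen",            # pressure coupling scheme
    "pcoupltype": "semiisotropic",    # membrane plane coupled separately from the normal
    "tau_p": 5.0,                     # pressure coupling time constant (ps)
    "constraints": "h-bonds",         # constrain bonds involving hydrogen atoms
    "constraint_algorithm": "lincs",
    "pbc": "xyz",                     # periodic in all three dimensions
}


def render_mdp(settings):
    """Render a dict of options as aligned 'key = value' lines."""
    width = max(len(key) for key in settings)
    return "\n".join(f"{key:<{width}} = {value}" for key, value in settings.items())


mdp_text = render_mdp(MDP_SETTINGS)
print(mdp_text)
```

Keeping the options in a plain dictionary makes it easy to compare a run input against the parameters reported in a methods section.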

Sucsol-36. The periplasmic side is at the bottom. The porin is depicted in cartoon representation (orange), lipid tails as sticks (dark blue), nitrogen and phosphorus atoms as spheres (dark blue), and sucrose molecules in stick-and-ball representation (green). The system was fully solvated (light blue, ions not shown).

Analysis was conducted using in-house routines, GROMACS analysis utilities, and the HOLE program (Smart et al., 1997). Molecular visualization was performed using VMD (Humphrey et al., 1996) and PyMOL (DeLano and Bromberg, 2004).

#### 2.3. Potential of Mean Force

The Potential of Mean Force (PMF) was computed using umbrella sampling in order to gain insight into the energetic determinants of sucrose transport through ScrY. In umbrella sampling simulations, the sampling of high-energy regions is improved by adding a biasing "umbrella" potential [e.g., Christ et al. (2010)]. For this purpose, Steered Molecular Dynamics (SMD) simulations were initially carried out by pulling a sucrose molecule through one of the three porins in the trimer (Isralewitz et al., 2001). To avoid ambiguities at the channel openings, a structure with a sucrose bound inside the pore was taken as the start configuration. This sucrose molecule was pulled in both directions (along the *z*-axis, 50 ns for each direction). From these pulling simulations, 105 starting structures for subsequent umbrella sampling simulations were extracted with a spacing of the sucrose positions along the channel axis of ≤0.08 nm (in total 103 simulations for the mutant). The umbrella potential was introduced between the center of mass of the channel and the sucrose. The harmonic force constant was set to 1,000 kJ mol<sup>−1</sup> nm<sup>−2</sup>. Other molecular dynamics parameters were identical to the equilibrium simulations as described above.
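A short calculation may help to see why a window spacing of ≤0.08 nm fits a force constant of 1,000 kJ mol<sup>−1</sup> nm<sup>−2</sup>. The sketch below assumes a locally flat underlying PMF, so that each window samples a Gaussian of width √(*k*<sub>B</sub>*T*/*k*) around its center; on a steep free energy slope the sampled distribution shifts and this estimate is only approximate. The function names are illustrative and not taken from the study.

```python
import math

R_KJ = 0.0083144626  # gas constant in kJ mol^-1 K^-1


def umbrella_energy(z, z0, k=1000.0):
    """Harmonic umbrella bias (kJ/mol) for position z (nm) and window center z0 (nm)."""
    return 0.5 * k * (z - z0) ** 2


def window_sigma(k=1000.0, temperature=310.0):
    """Width (nm) of the sampled Gaussian in one window, ASSUMING a flat underlying PMF."""
    return math.sqrt(R_KJ * temperature / k)


def gaussian_overlap(spacing, sigma):
    """Overlap coefficient of two equal-width Gaussians whose centers are `spacing` apart."""
    return math.erfc(spacing / (2.0 * math.sqrt(2.0) * sigma))


sigma = window_sigma()                      # roughly 0.05 nm at 310 K
overlap = gaussian_overlap(0.08, sigma)     # shared area of neighboring windows
bias_at_neighbor = umbrella_energy(0.08, 0.0)
print(f"sigma = {sigma:.3f} nm, overlap = {overlap:.2f}, "
      f"bias at neighboring center = {bias_at_neighbor:.1f} kJ/mol")
```

Under these assumptions, the spacing is about 1.6 window widths, so neighboring histograms share a substantial fraction of their area, which is the precondition for combining them into a single PMF.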

Each umbrella was simulated for at least 10 ns. Several samples where the sucrose was close to the pore constriction were equilibrated for longer times in order to obtain an improved equilibration of the system. Histograms distal to the mutation site and the constriction region were used both for the wild type and the mutant. **Figure 2** shows the umbrella histograms, which exhibit sufficient overlap between adjacent windows. The potential of mean force as a function of the channel coordinate (*z*-coordinate along the membrane normal) was calculated using the Weighted Histogram Analysis Method (WHAM) based on these umbrella sampling simulations (Rosenbergl, 1992) as implemented in the GROMACS tool g\_wham (Hub et al., 2010). The error was obtained from bootstrapping of the histograms using *Bayesian bootstrap* with 200 bootstraps. The PMF was analyzed for a periodic reaction coordinate, and the integrated autocorrelation times of the bootstrapped trajectories were smoothed using a Gaussian filter (Hub et al., 2010).
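The core of WHAM is a self-consistent iteration that combines the biased histograms of all windows into one unbiased distribution. The following is a minimal sketch of that iteration for harmonic umbrellas on a non-periodic coordinate; it is not the g\_wham implementation and omits the periodic treatment, autocorrelation handling, and bootstrapping mentioned above. All function names are illustrative, and the histograms used to exercise it are synthetic, noise-free data generated from an assumed Gaussian barrier rather than simulation output.

```python
import numpy as np

KT = 0.0083144626 * 310.0  # thermal energy in kJ/mol at 310 K


def bias_factors(centers, grid, k=1000.0):
    """exp(-U_i(z)/kT) for harmonic umbrellas; shape (n_windows, n_bins)."""
    return np.exp(-0.5 * k * (grid[None, :] - centers[:, None]) ** 2 / KT)


def wham(hist, centers, grid, k=1000.0, tol=1e-10, max_iter=20000):
    """Return the PMF (kJ/mol, minimum shifted to 0) from umbrella histograms.

    hist has shape (n_windows, n_bins). Bins that no window visited would
    give an infinite PMF; this sketch does not treat them specially.
    """
    bias = bias_factors(centers, grid, k)
    n_per_window = hist.sum(axis=1)
    total = hist.sum(axis=0)
    f = np.ones(len(centers))  # per-window normalization factors

    def unbiased(f_current):
        return total / (n_per_window[:, None] * f_current[:, None] * bias).sum(axis=0)

    for _ in range(max_iter):
        p = unbiased(f)
        f_new = 1.0 / (bias * p[None, :]).sum(axis=1)
        converged = np.max(np.abs(np.log(f_new / f))) < tol
        f = f_new
        if converged:
            break
    pmf = -KT * np.log(unbiased(f))
    return pmf - pmf.min()


def expected_histograms(pmf_true, centers, grid, k=1000.0, n_samples=10000):
    """SYNTHETIC noise-free histograms each window would record for a known PMF."""
    weights = bias_factors(centers, grid, k) * np.exp(-pmf_true / KT)[None, :]
    return n_samples * weights / weights.sum(axis=1, keepdims=True)


# Synthetic check: an assumed 8 kJ/mol Gaussian barrier, 13 windows spaced 0.08 nm apart.
grid = np.linspace(-0.6, 0.6, 121)
centers = np.linspace(-0.48, 0.48, 13)
pmf_true = 8.0 * np.exp(-0.5 * (grid / 0.2) ** 2)
hist = expected_histograms(pmf_true, centers, grid)
recovered = wham(hist, centers, grid)
max_error = np.max(np.abs(recovered - (pmf_true - pmf_true.min())))
print(f"largest deviation from the input PMF: {max_error:.2e} kJ/mol")
```

Because the synthetic histograms contain no sampling noise, the iteration should return the input profile up to numerical tolerance; with real trajectories the statistical error has to be estimated separately, for instance by bootstrapping as described in the text.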

#### 3. RESULTS

The ScrY trimer, both with and without bound sucrose, was stable in the 100 ns simulations, with comparably small root mean square deviations (rmsd) of 1–1.5 Å for the backbone atoms of each monomer (not shown; the rmsd values for the backbone atoms of the trimer structure were similar with and without bound sucrose). The spontaneous binding of sucrose to the channel was addressed in two 0.5-μs simulations for different sucrose concentrations in solution (36 and 72 sucrose molecules, corresponding to a concentration of ~50 and 100 mM in solution, respectively).
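The quoted concentrations can be checked with a back-of-the-envelope estimate from the molecule counts. The sketch below assumes a dilute solution in which the solvent volume is set by the water alone (about 55.5 mol of water per liter) and uses 36,000 water molecules, which is only the lower bound reported for the system setup; the true count is not given here, so the result is an upper estimate. The function name is illustrative.

```python
WATER_MOLARITY = 55.5  # mol/L of pure water (approximate; ASSUMED to set the volume)


def molar_concentration(n_solute, n_water, water_molarity=WATER_MOLARITY):
    """Dilute-solution estimate of solute concentration (mol/L) from molecule counts."""
    return n_solute / n_water * water_molarity


# 36,000 waters is the lower bound quoted for the system, so these are upper estimates.
for n_sucrose in (36, 72):
    c = molar_concentration(n_sucrose, 36000)
    print(f"{n_sucrose} sucrose molecules: about {1000 * c:.0f} mM or less")
```

With somewhat more than 36,000 water molecules, the estimates come out close to the ~50 and ~100 mM stated in the text.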

#### 3.1. Sucrose Binding to ScrY

Two bound sucrose molecules per monomer were resolved in the crystal structure of ScrY (Forst et al., 1998). In the simulations, the sucrose in the periplasmic binding sites spontaneously left ScrY within the first 20 ns of simulation (see **Figure 3A**, traces colored light blue, gray, and magenta; sucrose colored brown in ScrY structure). For the external binding sites, only one sucrose molecule stayed bound for the full 100 ns (colored green). It is located above the inner loop L3 and below the more flexible loops connecting the *β*-strands of the barrel at the external side. No sucrose molecule passed the center of the channel defined by the midpoint between the bound sucrose molecules of the crystal structure (*z* = 0 nm). These simulation results together with the reported sucrose occupancies of only 0.71 (periplasmic binding site) and 0.80 (external binding site) in the crystal structure (Forst et al., 1998) suggest that not all ScrY sucrose binding sites are occupied under physiological conditions. In addition, sucrose may be trapped in the region between the inner loop L3 and the free loops at the external side of the channel.

FIGURE 2 | The histograms of umbrella sampling both for the wild type (A) and for the mutant protein (B). The brown sphere depicts the initial position of the sucrose molecule.

FIGURE 3 | Displacement of the sucrose molecules along the pore axis (*z*-coordinate) as a function of simulation time for the simulation systems ScrY-suc (A), Sucsol-36 (B), and Sucsol-72 (C). The green and brown dashed lines indicate the external and the periplasmic binding sites for sucrose in the crystal structure. In addition, the channel structures of chain A (ChA) at 0 ns (i.e., the crystal structure) and after 100 ns (ScrY-suc system) are shown in the right panel [(A); The three monomers of the homo-trimer were assigned the labels (A–C) to distinguish between them.]. The channel structures of chain C (ChC, Sucsol-36 system) and chain B (ChB, Sucsol-72 system) after 0.5 μs are provided in subfigures (B,C), respectively. The inward folded loop L3 is highlighted in blue. The sucrose molecules bound from the external and the periplasmic sides of the channel are shown in stick representation (green and brown, respectively). The coloring of the sucrose traces is as follows: chain A: black and gray, chain B: blue and light blue, chain C: red and magenta.

**Figures 3B,C** show the binding of sucrose molecules from the solution to the apo-ScrY as observed in 0.5 μs simulations (Sucsol-36 and Sucsol-72). No sucrose molecule passed the channel in the simulations. However, a number of sucrose binding events were observed at both the periplasmic and the external openings of the channel. At intermediate concentration (Sucsol-36), two sucrose molecules bound to each of two of the three chains; at high concentration (Sucsol-72), two sucrose molecules were bound to each chain. The binding positions were, however, shifted with respect to the positions reported in the crystal structure. Sample snapshots after 0.5 μs of simulation with marked sucrose positions are provided in **Figure 3**. In conclusion, sucrose is able to spontaneously move into and bind within the channel close to the periplasmic binding site and within the external trapping area, but is not transported on the submicrosecond timescale.

#### TABLE 1 | Number of ions passing through ScrY in equilibrium simulations.

*The three monomers of the homo-trimer were assigned the labels A, B, and C to distinguish between them.*

In addition to sucrose binding, transport of Na<sup>+</sup> and Cl<sup>−</sup> ions could be observed. As detailed in **Table 1**, in total seven Na<sup>+</sup> ions and one Cl<sup>−</sup> ion moved through ScrY during the 100 ns equilibrium simulations (ScrY-nosuc and ScrY-suc), and 33 (1) Na<sup>+</sup> (Cl<sup>−</sup>) ions during the 0.5 μs simulations (Sucsol-36, Sucsol-72). Thus the passage of positively charged ions through ScrY (both directions) is strongly preferred over the passage of negatively charged ions. This may be explained by the negative electrostatic potential at the external opening and through the pore as shown in **Figure 4** that is probably caused by several (negatively charged) aspartic acids in the inner L3 loop of ScrY. Exchange of two aspartic acids of L3 (Asp194 and Asp201) by alanines significantly shifted the electrostatic potential within the pore (see below for the selection of mutants). Interestingly, ion passage was blocked for the wild type ScrY if a sucrose molecule bound close to the inner constriction zone. This is in agreement with earlier experimental studies on porins reporting ion blockage by sugar binding to the porin (Andersen et al., 1998; Kullman et al., 2002).
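Counts of this kind can be obtained from the *z*-coordinate of each ion over time. The sketch below shows one simple way to do so and is not necessarily the procedure used in this study, which is not described here: an ion is counted only if it enters a slab spanning the channel from one side and leaves it on the other, so that jumps across the periodic box boundary, which bypass the slab, are ignored. The function name and the slab limits are hypothetical, and the sketch does not check the lateral position, so it cannot by itself assign a passage to one of the three monomers.

```python
def count_crossings(z, z_low, z_high):
    """Count complete passages of one particle through the slab z_low < z < z_high.

    z is a sequence of positions over time. Returns (upward, downward) counts.
    A passage requires at least one frame inside the slab between frames on
    opposite sides, so direct jumps between the two sides (e.g., through the
    periodic boundary) are not counted.
    """
    upward = downward = 0
    origin = None    # side ("below"/"above") on which the particle was last seen
    inside = False   # whether the particle has been inside the slab since then
    for value in z:
        if value < z_low:
            if inside and origin == "above":
                downward += 1
            origin, inside = "below", False
        elif value > z_high:
            if inside and origin == "below":
                upward += 1
            origin, inside = "above", False
        else:
            inside = True
    return upward, downward


# Hypothetical traces (nm) with the slab between -2 and 2 nm.
full_passage = count_crossings([-3.0, -1.0, 0.0, 1.0, 3.0], -2.0, 2.0)
periodic_jump = count_crossings([-3.0, 3.0, -3.0], -2.0, 2.0)
partial_entry = count_crossings([-3.0, 0.0, -3.0], -2.0, 2.0)
print(full_passage, periodic_jump, partial_entry)
```

Summing the results over all ions of one species would then give totals comparable to those discussed above, provided the frame spacing is short enough that an ion cannot traverse the whole slab between two saved frames.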

## 3.2. ScrY Pore Size

The shape and size of the ScrY channel were analyzed using the HOLE program (Smart et al., 1996). **Figures 5A,B** provide both the average pore radius along the channel axis as well as the pore flexibility or pore fluctuations (gray shaded area). The radius profile as well as the channel structure (**Figure 5C**) reflects an hourglass shape of the pore with a minimum radius of only 0.3 nm in the constriction area (0–1 nm). In this region, the pore is narrowed down by the inwardly folded loop L3; it forms the external binding site for sucrose. The loop L3 is stabilized by a hydrogen bond network, in particular with loop L1 (**Figure 5D**). Arg110 of L1 forms hydrogen bonds with His196 and Trp197 (both L3), and Tyr97 (L1) forms hydrogen bonds with Asp199 and Ser200 (both L3).

The external opening of the porin shows a comparably high flexibility. The shape of the pore and also its flexibility appear unchanged for high sucrose concentrations in the medium.

FIGURE 4 | Electrostatic potential along the channel axis. The electrostatic potential was calculated using a recently developed Poisson–Boltzmann solver (Bertelshofer et al., 2015). The membrane was implicitly modeled using a dielectric constant of 2 (Böckmann et al., 2008); the membrane boundaries are indicated by black lines, and the extension of the ScrY protein into the solvent phase by dashed gray lines. The potential for the wild type (black line) was averaged over 100 ns of simulation, and the potential for the Asp194Ala:Asp201Ala mutant over 50 ns (red line). The path through the individual channels was analyzed using the HOLE program (Smart et al., 1997). The blue line shows the electrostatic potential for the crystal structure.

## 3.3. Potential of Mean Force for Passage of Sucrose through ScrY

Umbrella sampling was applied to study the energetics of sucrose passage through ScrY. Starting structures for the individual umbrellas along the channel were extracted from pulling simulations; in these, starting from an initial configuration after 100 ns of simulation (ScrY-suc system, see **Figure 3A**) a bound sucrose was pulled to both channel openings.

The PMF for sucrose transport through ScrY using a periodic reaction coordinate shows a total barrier height of ≈22 kJ/mol (**Figure 6**, black line). Two equally steep regions, labeled B1 and B2, were identified. The first barrier B1 at the periplasmic part of the porin has a height of ≈19 kJ/mol, the second energy barrier B2 a height of ≈22 kJ/mol. The latter barrier is close to the external sucrose trapping region of ScrY and is located within the constriction area. The two crystal binding sites for sucrose are found at two metastable positions, right above the B2 barrier, and close to the minimum following the intracellular B1 barrier (dashed lines in **Figure 6**). Smaller barriers preceding B1 (*z* < −0.6 nm) are caused by electrostatic interactions of sucrose with the mobile loops at the periplasmic pore entrance (not shown). **Figure 7** shows sampled sucrose positions from the umbrella simulations. As suggested before, the passage of sucrose mainly follows the amino acids that form the greasy slide (highlighted).

An energy decomposition was applied to determine the influence of all residues lining the pore on the passage of sucrose, and key residues were selected for subsequent mutation. The enthalpic contributions of the 115 amino acids forming the ScrY pore to the PMF for sucrose transport were estimated as the sum of the corresponding Lennard-Jones and (short-range) Coulomb interactions (short-range cutoff of 1.2 nm) between each amino acid and the sucrose molecule along the path. The average residue–sucrose interaction energy for each umbrella window was computed and analyzed.

FIGURE 5 | Pore radius profiles along *z*-coordinate for equilibrium simulations of ScrY without sucrose [ScrY-nosuc, 100 ns, (A)] and mutant [mutant-chA, 50 ns, (A)], and for ScrY in a solution with high sucrose concentration [ScrY-72, 500 ns, (B)]. The gray shadow describes the flexibility caused by the channel fluctuation. The middle of two crystal sucrose binding sites is found at *z* = 0 nm. The pore for a monomer is visualized (C), where the porin is described by cartoon presentation in orange, and the inner loop L3 in blue. The red spherical probes describe the lumen along the pore axis. The hydrogen bond interactions between loop L1 (red) and loop L3 (blue) of the channel are shown in (D). The detailed description is shown in the enlarged square. The involved amino acids are described by sticks and spheres, where ARG110 (white) from L1 forms hydrogen bonds with HIS196 (blue) and TRP197 (gray) from L3, TYR97 (green) from L1 forms hydrogen bonds with ASP199 (red), and SER200 (yellow) from L3.

FIGURE 6 | Potential of mean force (PMF) profiles along the channel coordinate both for the wild type (black) and for the mutant protein (red). The shaded areas indicate the statistical uncertainty (67% confidence interval). The two main energy barriers for wt ScrY are highlighted by gray shaded areas (B1, B2). The brown dot depicts the initial position of the sucrose molecule for the initial pulling simulation. The two sucrose binding sites of the crystal structure are found at the dashed lines. The external side of ScrY is found on the right side. The PMF was analyzed using a periodic reaction coordinate in the Weighted Histogram Analysis Method (WHAM); a profile using non-periodic WHAM is provided as Supplementary Information.

The interaction energies of the three residues interacting most strongly with sucrose are shown in **Figure 8**. Asp194, Asp199, and Asp201 show strong Coulomb interactions, which coincide with the largest barriers of the PMF profile (see **Figure 6**). Interestingly, these three amino acids are located in the central L3 loop, highlighting the importance of L3 for sucrose transport. For the design of a ScrY mutant with improved sucrose transport characteristics, we accordingly chose Asp194 and Asp201 for mutation to alanine (termed ScrYmut). Asp199 was retained because of its role in fixing the position of the L3 loop within the pore (see **Figure 5D**). The influence of the double mutant Asp194Ala:Asp201Ala on the PMF was subsequently tested in additional umbrella simulations, restricted to the region from the external sucrose binding site to the periplasmic ScrY opening. Starting structures for the umbrella simulations were obtained from a sucrose pulling simulation; the applied pulling forces for wt ScrY and mutant ScrY are compared in **Figure 9**. The maximal pulling force is reduced from ≈240 pN for wt ScrY to <200 pN for the double mutant. However, considerable fluctuations of the pulling forces are to be expected due to activated processes. Still, the location of the main barrier is in agreement with the PMF result. As expected from the residue-resolved sucrose interaction energies and the pulling forces, the potential of mean force is significantly lowered for the designed ScrY double mutant (**Figure 6**, red line). Both the B1 and B2 barriers are diminished by the Asp-to-Ala mutations within the L3 loop. The overall barrier height decreased by ≈8 kJ/mol. The electrostatic potential along the pore axis also increased significantly (see **Figure 4**, green line). Provided that this mutant is also stable *in vivo*, the change in the PMF profile suggests a significantly enhanced sucrose transport for this mutant.
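As a rough illustration of what a barrier reduction of ≈8 kJ/mol could mean for transport, the following sketch evaluates the Boltzmann factor exp(ΔΔG/RT). It assumes an Arrhenius-type dependence of the passage rate on the highest PMF barrier and a temperature of 300 K. Both are assumptions made here for illustration only and are not results of the simulations; the function name is hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def rate_enhancement(barrier_drop_kj_mol: float, temperature_k: float = 300.0) -> float:
    """Boltzmann factor exp(ddG / RT) for a barrier lowered by ddG.

    Assumption: the passage rate scales like exp(-barrier / RT); the
    300 K default is an illustrative choice, not a simulation parameter.
    """
    return math.exp(barrier_drop_kj_mol / (R_KJ_PER_MOL_K * temperature_k))


# Barrier reduction of about 8 kJ/mol reported for the double mutant
factor = rate_enhancement(8.0)
```

Under these assumptions the factor is roughly 25. Given the statistical uncertainty of the PMF, this should be read as an order-of-magnitude indication only.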

#### 4. DISCUSSION

The characteristics of sucrose passage through the porin ScrY were studied using atomistic molecular dynamics simulations. Equilibrium simulations showed that very high sucrose concentrations are required to see (meta-)stable binding of sucrose to both the periplasmic and the external openings of the porin, i.e., the two binding sites reported for the crystal structure are most probably not simultaneously occupied at low or intermediate sucrose concentrations. The inside width of the hourglass-shaped pore, and thus the size of molecules allowed for passage, is mainly restricted by the inwardly folded L3 loop. It forms a constriction zone with a radius of only 0.3 nm. Larger fluctuations of the pore width were observed only for the external pore mouth, providing a larger access volume for sucrose approach. It has been reported that the sucrose molecule has to adopt a specific configuration (left-handed helical conformation) in order to pass through this narrow pore (Dutzler et al., 1996; Forst et al., 1998).

The potential of mean force for sucrose passage through the porin was obtained from extensive umbrella sampling simulations. The PMF exhibits two main energy barriers of 19 and 22 kJ/mol. The enthalpic contribution to these barriers could be mainly ascribed to Coulombic interactions with the aspartic acids of the central L3 loop (Asp194, Asp199, Asp201), suggesting these acidic residues as mutation candidates for the design of ScrY variants with altered transport characteristics. How well will the porin structure be retained for such mutants? A study of OmpF with four point mutants and one deletion mutant showed no alteration of the barrel structure but only local effects on the structure of the pore constriction region (Lou et al., 1996). The authors replaced pore wall arginines at the constriction zone of OmpF by shorter uncharged residues. Similar to ScrY, OmpF contains acidic residues on the L3 loop and basic residues on the facing barrel wall (Nikaido, 2003). Therefore, one may conclude from the OmpF study that non-conservative mutations of the counterpart residues of the pore wall arginines, i.e., the mutations of acidic residues of the L3 loop to uncharged ones, will also leave the overall porin structure unaltered.

Since one of the aspartic acids (Asp199) is involved in hydrogen bonding with loop L1, which appears essential for the configurational stabilization of loop L3, only two aspartic acids of L3 (Asp194 and Asp201) were mutated to alanine in order to generate a mutant showing enhanced sucrose passage capabilities. Comparisons of forces for pulling sucrose through the porin, of the electrostatic potential along the pore, and of the PMF between wt ScrY and ScrYmut showed significantly altered energetics for sucrose passage. The PMF barrier is decreased by ≈8 kJ/mol. However, while sucrose passage through ScrYmut will be enhanced, the mutation will possibly also affect the specificity of the porin. Additionally, despite the use of closely spaced umbrella windows and long simulation times (total simulation time for wt was 1.29 μs and for the mutant 1.72 μs), the PMFs are not fully converged and show a substantial error. Still, the results indicate a substantially improved passage of sucrose through the mutated porin.

In summary, key residues for passage of sucrose through ScrY were identified and a double mutant with improved transport characteristics was designed *in silico*. The suggested mutant is currently being further characterized in experiments.

#### AUTHOR CONTRIBUTIONS

RB designed research, LS performed simulations, LS, FB, and RB performed analysis, and all wrote the manuscript.

#### ACKNOWLEDGMENTS

We acknowledge support by the Emerging Fields Initiative *Synthetic Biology* (EFI) and the Research Training Group 1962, *Dynamic Interactions at Biological Membranes: From Single Molecules to Tissue*. Liping Sun was supported by the China Scholarship Council (CSC). We acknowledge computational support from the Computer Center of the Friedrich-Alexander University of Erlangen-Nürnberg (RRZE).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2016.00009

#### REFERENCES


pores and substrate-specific porins. *Mol. Microbiol.* 5, 2233–2241. doi:10.1111/j.1365-2958.1991.tb02153.x


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Sun, Bertelshofer, Greiner and Böckmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **GroPBS: Fast Solver for Implicit Electrostatics of Biomolecules**

*Franziska Bertelshofer<sup>1,2</sup>\*, Liping Sun<sup>2</sup>, Günther Greiner<sup>1</sup> and Rainer A. Böckmann<sup>2</sup>\**

*<sup>1</sup> Computer Graphics Group, Department of Computer Science, University Erlangen-Nürnberg, Erlangen, Germany, <sup>2</sup> Computational Biology, Department of Biology, University Erlangen-Nürnberg, Erlangen, Germany*

Knowledge about the electrostatic potential on the surface of biomolecules or biomembranes under physiological conditions is an important step in the attempt to characterize the physico-chemical properties of these molecules and, in particular, their interactions with each other. Additionally, knowledge about solution electrostatics may guide the design of molecules with specified properties. However, explicit water models come at a high computational cost, rendering them unsuitable for large design studies or for docking purposes. Implicit models with the water phase treated as a continuum require the numerical solution of the Poisson–Boltzmann equation (PBE). Here, we present a new flexible program for the numerical solution of the PBE, allowing for different geometries and for the explicit and implicit inclusion of membranes. It involves a discretization of space and the computation of the molecular surface. The PBE is discretized using finite differences, and the resulting set of equations is solved using a Gauss–Seidel method. It is shown for the example of the sucrose transporter ScrY that the implicit inclusion of a surrounding membrane has a strong effect on the electrostatics within the pore region as well and, thus, needs to be carefully considered, e.g., in design studies on membrane proteins.

#### **Keywords: electrostatics, Poisson–Boltzmann equation, finite-difference method, molecular surface, membranes**

## **1. INTRODUCTION**

Electrostatic interactions govern the physical–chemical interactions in and between biomolecules (Perutz, 1978). A quantitative description of electrostatic energies (Warshel et al., 2006) is required both for a thorough understanding of biomolecular systems, e.g., of membrane-embedded ion channels or of pK<sub>a</sub> changes during enzymatic function, and in protein design, e.g., the design of novel protein folds or the search for high-affinity ligands.

Coulombic forces are modulated by the environment (Warshel et al., 2006), i.e., in case of soluble proteins by water and ions and possibly other proteins, and additionally by phospholipids in the case of membrane proteins. In atomistic molecular dynamics (MD) simulations (Karplus and McCammon, 2002), this environment is treated explicitly for a proper description, in particular, of the local electrostatics, e.g., in water-mediated hydrogen bonds. Computationally less demanding models introduced uniform dielectric constants for both the protein and the solvent (Tanford and Kirkwood, 1957), a distance-dependent dielectric constant accounting for electrostatic shielding within the solvent (Brooks et al., 1983), or modeled the solvent using an explicit grid of Langevin dipoles (Warshel and Levitt, 1976).

The electrostatic contribution to free energy differences between two states of a biomolecular system – e.g., two proteins bound to each other vs. the two proteins separated in space – is in many cases difficult to directly access; however, it may be computed from ensembles of microscopic structures. The simulation-based free energy perturbation (FEP) approach or the thermodynamic integration (TI) method is frequently used to analyze free energy differences of biomolecular states. The computational cost of such methods is, however, at variance with the need for high throughput, e.g., in protein design or protein–ligand docking and in the pK<sub>a</sub> analysis of titratable sites as well. The latter problem requires the accurate estimation of the free energy differences between protonated and de-protonated states of all titratable groups in proteins, either on crystal structures or on trajectories obtained from molecular dynamics simulations to better grasp the protein flexibility (Narzi et al., 2008a). Protonation or pK changes are, e.g., important during the function of enzymes (Narzi et al., 2008a), or in protein–ligand binding (Narzi et al., 2008b; Onufriev and Alexov, 2013).

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*Zhaoyong Yang, Chinese Academy of Medical Sciences, China Daehee Lee, Korea Research Institute of Bioscience and Biotechnology, South Korea*

#### *\*Correspondence:*

*Franziska Bertelshofer franziska.bertelshofer@fau.de; Rainer A. Böckmann rainer.boeckmann@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 11 September 2015 Accepted: 30 October 2015 Published: 17 November 2015*

#### *Citation:*

*Bertelshofer F, Sun L, Greiner G and Böckmann RA (2015) GroPBS: Fast Solver for Implicit Electrostatics of Biomolecules. Front. Bioeng. Biotechnol. 3:186. doi: 10.3389/fbioe.2015.00186*

Due to the computational efficiency required to tackle the above problems, the solvation free energy is usually addressed in implicit models by solving the Poisson–Boltzmann equation (PBE) for the different states [see, e.g., Ullmann and Bombarda (2014)]. Different values for the dielectric continuum inside the protein have been used in the literature. The states may be crystal structures, modeled structures, snapshots from molecular dynamics simulations [MM/PBSA (Kollman et al., 2000)], or structural ensembles generated from crystal structures [CC/PBSA (Benedix et al., 2009)]. The solution of the PBE on structural ensembles should yield a more accurate solution for the solvation free energies of the studied biomolecules as it takes the flexibility of the protein into account.

A number of program packages for the numerical solution of the Poisson–Boltzmann equation have been developed in the past, namely APBS (Baker et al., 2001), UHBD (Madura et al., 1995), or DelPhi (Li et al., 2012), to mention a few. Here, we present a sequential PBE solver (*GroPBS*) that allows the accurate analysis of the electrostatic component of the solvation free energy of soluble proteins and of membrane proteins in explicit or implicit membranes. *GroPBS* is compatible with the GROMACS simulation suite and can, thus, easily be combined with simulations of biomolecules and the various biomolecular force fields available in Gromacs.

## **2. METHODS**

#### **2.1. Poisson–Boltzmann Equation**

The analysis of the electrostatic potential Φ of a biomolecule in an implicitly treated solvent environment requires the numerical solution of the Poisson–Boltzmann equation (PBE). For the simplest case of a solute molecule in a homogeneous medium, three different domains (Holst, 1994) may be distinguished (see **Figure 1**):

1. Inside the molecule (green) (i.e., within the van der Waals radii of the atoms of the solute), Φ can be computed using the Poisson equation and Green's identity, resulting in

$$\nabla^2 \Phi(\mathbf{x}) = \sum_{i=1}^{M} \frac{-4\pi q_i}{\varepsilon_1} \delta(\mathbf{x} - \mathbf{x}_i)$$

where *M* is the number of atoms of the solute with point charges *q<sub>i</sub>* at positions *x<sub>i</sub>*.

2. Outside the molecule (light blue), the charge density of the ions in the solvent is assumed to follow a Boltzmann distribution that yields

$$
\nabla^2 \Phi(\mathbf{x}) = \kappa^2 \frac{k_B T}{e_c} \sinh\left(\frac{e_c \Phi(\mathbf{x})}{k_B T}\right),
$$

where *k<sub>B</sub>* is the Boltzmann constant, *e<sub>c</sub>* the elementary charge, *T* the temperature, and *κ* the modified Debye–Hückel parameter, given by

$$\kappa^2 = \frac{8\pi N_A e_c^2 I_s}{1000\varepsilon_3 k_B T}$$

with *N<sub>A</sub>* Avogadro's number and *I<sub>s</sub>* the ionic strength of the solvent.

3. In the ion exclusion layer (dark blue) around the molecule no mobile ions are present. The Poisson equation reads accordingly

$$
\nabla^2 \Phi(\mathbf{x}) = 0.
$$

Combining these conditions into one equation results in the non-linear PBE

$$-\nabla(\varepsilon(\mathbf{x})\nabla\Phi(\mathbf{x})) + \kappa^2(\mathbf{x})\frac{k_B T}{e_c}\sinh\left(\frac{e_c\Phi(\mathbf{x})}{k_B T}\right) = 4\pi\rho(\mathbf{x}) \tag{1}$$

with the charge distribution function *ρ*(*x*) of the molecule, the spatial relative dielectric function *ε*(*x*), and *κ*<sup>2</sup>(*x*), which is 0 in the solute and the ion exclusion layer and *κ*<sup>2</sup>(*x*) = *ε*<sub>3</sub>*κ*<sup>2</sup> for *x* in the solvent. *ε*(*x*) allows for the description of the dielectric discontinuity between the molecule and the solvent and is typically chosen to adopt values between 2 and 20 inside the biomolecule and 80 outside. More complicated models for the dielectric "constant" taking smoothed boundaries into account were suggested recently by Li et al. (2013) and will be considered for future work.
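To give a feeling for the magnitude of the Debye–Hückel parameter, the sketch below computes the Debye length 1/*κ* for a 1:1 electrolyte. It uses the SI form of the expression for *κ*<sup>2</sup> given above (the factor 8π in Gaussian units corresponds to 2/*ε*<sub>vacuum</sub> in SI units). The function name, the temperature of 298.15 K, and the example ionic strength are assumptions chosen for illustration.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in C
K_B = 1.380649e-23           # Boltzmann constant in J/K
N_A = 6.02214076e23          # Avogadro's number in 1/mol
EPS_VACUUM = 8.8541878128e-12  # vacuum permittivity in F/m


def debye_length_nm(ionic_strength_mol_l: float,
                    eps_solvent: float = 80.0,
                    temperature_k: float = 298.15) -> float:
    """Debye length 1/kappa in nm for a given ionic strength (mol/l)."""
    if ionic_strength_mol_l <= 0.0:
        return math.inf  # no salt: no screening
    # The factor 1000 converts mol/l to mol/m^3
    number_density = N_A * ionic_strength_mol_l * 1000.0
    kappa_sq = (2.0 * number_density * E_CHARGE ** 2
                / (EPS_VACUUM * eps_solvent * K_B * temperature_k))
    return 1e9 / math.sqrt(kappa_sq)


# Example: 0.15 mol/l salt gives a screening length below 1 nm
length = debye_length_nm(0.15)
```

Because *κ*<sup>2</sup> is proportional to the ionic strength, a fourfold increase in salt concentration halves the Debye length.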

The framework provided by equation (1) allows the straightforward inclusion of various regions with different dielectric properties, e.g., of membranes. The membrane is modeled as another bulk medium like the solvent but with its own dielectric constant [*ε*mem *≈* 2*. . .*3 (Böckmann et al., 2008)]. In a first step, the membrane is treated as a box in the *xy*-plane represented by its upper and lower *z*-value. However, more sophisticated models like curved membranes or taking the surface structure into account may easily be implemented.

The above partial differential equation cannot be solved analytically for objects with shapes more complex than, e.g., a single sphere or cylinder. Therefore, the PBE has to be solved numerically.

#### **2.2. Finite-Difference Method**

The first step in the numerical solution of the PBE is to map all physical quantities (charges at atom centers, dielectric values, etc.) onto a three-dimensional uniform grid. Such a grid allows differential operators to be replaced by grid value differences. This approach is facilitated by linearizing the PBE [LPBE (Holst, 1994)]; for sinh(*x*) *≈ x*, the latter simplifies to

$$-\nabla(\varepsilon(\mathbf{x})\nabla\Phi(\mathbf{x})) + \kappa^2(\mathbf{x})\Phi(\mathbf{x}) = 4\pi\rho(\mathbf{x}).\tag{2}$$

Discretization of this equation yields for every grid point

$$\Phi_0 = \frac{\sum_{k=1}^6 \varepsilon_k \Phi_k + 4\pi q_0/h}{\sum_{k=1}^6 \varepsilon_k + (\kappa_0 \cdot h)^2},\tag{3}$$

where *q*<sub>0</sub> and *κ*<sub>0</sub> denote the charge and Debye–Hückel parameter at the grid point, Φ<sub>*k*</sub> are the potentials at the six neighboring grid positions (in *x-*, *y-*, and *z*-direction), and *ε<sub>k</sub>* are the dielectric values at the midpoints between Φ<sub>0</sub> and its neighbors Φ<sub>*k*</sub> (see **Figure 2**) (Nicholls and Honig, 1991). *h* is the step size, i.e., the distance between the grid points. While the charge and the Debye–Hückel parameter are given values at the grid points, one important step is to define in which medium a grid point or midpoint is located. This problem is described in more detail below.
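The update rule of equation (3) for a single grid point can be sketched as follows. This is a minimal illustration rather than the actual GroPBS code; the function name and argument layout are assumptions, and units follow the Gaussian-style convention of the equations above.

```python
import math


def update_potential(phi_neighbors, eps_midpoints, q0, kappa0, h):
    """Evaluate equation (3) for one grid point.

    phi_neighbors: potentials at the six neighboring grid points
    eps_midpoints: dielectric values at the six midpoints towards them
    q0, kappa0:    charge and Debye-Hueckel parameter at the grid point
    h:             grid spacing
    """
    numerator = (sum(e * p for e, p in zip(eps_midpoints, phi_neighbors))
                 + 4.0 * math.pi * q0 / h)
    denominator = sum(eps_midpoints) + (kappa0 * h) ** 2
    return numerator / denominator


# Uncharged point in a uniform, salt-free medium: plain neighbor average
phi0 = update_potential([1, 2, 3, 4, 5, 6], [80.0] * 6, q0=0.0, kappa0=0.0, h=1.0)
```

For an uncharged point in a uniform medium without salt, the expression reduces to the average of the six neighboring potentials. A non-zero *κ*<sub>0</sub> enlarges the denominator and thus damps the potential.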

Application of equation (3) for each grid point results in a system of *N*<sup>3</sup> linear equations where *N* is the grid size. This system can be reformulated as

$$
\Phi = M\Phi + v,\tag{4}
$$

with Φ as a vector containing the potential at all grid points and *M* being a sparse matrix containing zeros at positions (*i*, *j*) if *i* and *j* are not neighboring grid points and

$$\frac{\varepsilon_j}{\sum_{k=1}^{6} \varepsilon_k + (\kappa_0 \cdot h)^2}$$

at positions (*i*, *j*) otherwise, with *k* denoting all neighboring grid points of *i*. *v* is a vector containing the remaining terms of the discretized LPBE.

#### **2.3. Successive Over-Relaxation**

Several alternatives were suggested for the treatment of the grid boundaries: setting the electrostatic potential at the boundaries to 0, application of distance-dependent quasi-Coulombic boundary conditions, or periodic extensions of the system box in one or more directions.

The set of linear equations (4) is iteratively solved using methods like Jacobi or Gauss–Seidel (Demmel, 1997). Here, we used in a first serial implementation a successive over-relaxation (SOR) for Gauss–Seidel yielding the iteration rule

$$
\Phi^{(n+1)} = \omega \Phi^{\text{new}} + (1 - \omega) \Phi^{(n)}, \tag{5}
$$

where *n* is the iteration step, Φ<sup>new</sup> is computed using equation (3), and *ω >* 1 is the relaxation parameter. In a sequential implementation, every newly computed grid point value is immediately used for computing further grid points in the same iteration step, in contrast to the Jacobi method; this renders the Gauss–Seidel approach much faster. Since the convergence rate of the Gauss–Seidel method depends on the spectral radius, i.e., the largest eigenvalue of *M*, *ω* should be chosen such that the spectral radius becomes smaller. It can be shown that the optimal value is given by

$$
\omega = \frac{2}{1 + \sqrt{1 - \lambda_N}},
$$

where *λ<sub>N</sub>* is the spectral radius. The spectral radius of *M* can be computed using the Connected-Moments Expansion (Cioslowski, 1987; Nicholls and Honig, 1991).
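The iteration rule of equation (5) can be illustrated with a small, self-contained solver. The sketch below is not the GroPBS implementation: it assumes a uniform dielectric, a cubic grid with the potential fixed to 0 at the outermost layer, and a stopping test on the maximal change per sweep. All function and parameter names are hypothetical, and the connected-moments estimate of the spectral radius is not included.

```python
import math


def optimal_omega(spectral_radius: float) -> float:
    """Relaxation parameter from the formula above (lambda_N = spectral radius)."""
    return 2.0 / (1.0 + math.sqrt(1.0 - spectral_radius))


def solve_sor(n, charges, eps=1.0, kappa_h_sq=0.0, h=1.0,
              omega=1.5, tol=1e-8, max_iter=5000):
    """Gauss-Seidel with over-relaxation on an n x n x n grid.

    charges maps (i, j, k) -> point charge; the outermost grid layer is
    kept at zero potential. Returns (phi, number of sweeps used).
    """
    phi = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for iteration in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                for k in range(1, n - 1):
                    neighbors = (phi[i - 1][j][k] + phi[i + 1][j][k]
                                 + phi[i][j - 1][k] + phi[i][j + 1][k]
                                 + phi[i][j][k - 1] + phi[i][j][k + 1])
                    q0 = charges.get((i, j, k), 0.0)
                    # Equation (3) for a uniform dielectric eps
                    phi_new = ((eps * neighbors + 4.0 * math.pi * q0 / h)
                               / (6.0 * eps + kappa_h_sq))
                    # Equation (5): the mixed value is stored at once, so
                    # later points in this sweep already use it
                    updated = omega * phi_new + (1.0 - omega) * phi[i][j][k]
                    max_change = max(max_change, abs(updated - phi[i][j][k]))
                    phi[i][j][k] = updated
        if max_change < tol:
            return phi, iteration
    return phi, max_iter


# Unit point charge in the center of a 7 x 7 x 7 grid
phi_gs, sweeps_gs = solve_sor(7, {(3, 3, 3): 1.0}, omega=1.0)     # plain Gauss-Seidel
phi_sor, sweeps_sor = solve_sor(7, {(3, 3, 3): 1.0}, omega=1.33)  # over-relaxed
```

Setting `omega=1.0` recovers plain Gauss–Seidel. In this toy example the over-relaxed run needs fewer sweeps to reach the same tolerance.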

#### **2.4. Solvation Free Energy**

The grid-based electrostatic potential is used to compute the solvation free energy of the system. This is achieved by summing the product of the potential and the charge at each grid point, followed by subtraction of the corresponding energy as obtained for the potential using a uniform dielectric inside and outside of the solute. This approach eliminates the self-energy terms that are physically not meaningful. The disadvantage of this approach is the required double computation of the electrostatic potential.
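The two-run approach described above can be sketched in a few lines. The prefactor of 0.5 is an assumption here: it is the usual reaction-field convention and matches the prefactor of equation (6), but it is not stated explicitly for this variant. The function name and the flat-list data layout are hypothetical.

```python
def solvation_energy(charges, phi_solvated, phi_uniform):
    """Electrostatic solvation energy from two potential maps.

    charges:      charges at the charged grid points
    phi_solvated: potential at these points with the dielectric boundary
    phi_uniform:  potential at these points for a uniform dielectric
    The self-energy contained in both maps cancels in the difference.
    """
    return 0.5 * sum(q * (p_solv - p_uni)
                     for q, p_solv, p_uni in zip(charges, phi_solvated, phi_uniform))


# Two opposite charges whose potentials are damped by the solvent
energy = solvation_energy([1.0, -1.0], [0.2, -0.1], [0.5, -0.4])
```

Only charged grid points contribute, so the sum can be restricted to them.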

A different approach can be used if the molecular surface is known (see below). The reaction field effects due to a dielectric boundary are replaced by the computation of the induced charges at this boundary (Rocchia et al., 2002). The solvation energy *G<sub>W</sub>* may then be calculated by applying Coulomb's law to the induced and the real charges. Formalizing this approach yields the following equation:

$$G_W = 0.5 \cdot \sum_{b \in \text{boundary}} \left( \sum_{p \in \text{grid}} \frac{q_p}{\text{dist}(b_s, p)} \right) \cdot \left( \frac{3h}{2\pi} \left( \Phi_b - \frac{1}{6} \sum_{k=1}^6 \Phi_k \right) - q_b \right), \tag{6}$$

where *b* denotes the grid points at the dielectric boundary, *p* all grid points (these can be reduced to all charged grid points, as for all other points the term is 0), and *b<sub>s</sub>* the points at the molecular surface obtained by projecting the boundary grid points onto the surface.

#### **2.5. Optimizations**

Some simplifications of equation (3) are possible (Nicholls and Honig, 1991): first, most grid points do not hold a point charge, i.e., the term 4*πq*<sub>0</sub>/*h* is equal to 0. Second, most grid points are not found at a dielectric boundary, i.e., all neighboring midpoints are located in only one medium. These modifications lead to a quite simple 7-point stencil (modified by salt if present)

$$\underbrace{\Phi_0 = \frac{1}{6} \sum_{k=1}^{6} \Phi_k}_{\text{without salt}} \qquad \text{or} \qquad \underbrace{\Phi_0 = \frac{\sum_{k=1}^{6} \Phi_k}{6 + \frac{(\kappa_0 \cdot h)^2}{\varepsilon_0}}}_{\text{with salt}}.\tag{7}$$

In each iteration step of the SOR, this stencil is applied first, followed by a correction for dielectric discontinuities and charges where necessary. This approach is also a first step towards parallelization, as the uniform stencil in equation (7) is well suited to being applied in parallel.
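The reduction from equation (3) to equation (7) can be made explicit. Assuming an uncharged grid point (*q*<sub>0</sub> = 0) whose six midpoints all carry the same dielectric value, written here as *ε*<sub>0</sub> to match equation (7), equation (3) becomes

$$
\Phi_0 = \frac{\varepsilon_0 \sum_{k=1}^{6} \Phi_k}{6\,\varepsilon_0 + (\kappa_0 \cdot h)^2} = \frac{\sum_{k=1}^{6} \Phi_k}{6 + \frac{(\kappa_0 \cdot h)^2}{\varepsilon_0}},
$$

which reduces to the plain average of the six neighbors for *κ*<sub>0</sub> = 0. Only grid points that carry a charge or touch a dielectric boundary then require the additional terms of equation (3).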

## **2.6. Extraction of Molecular Surface**

The solute surface needs to be defined in order to determine which grid points lie inside and outside of the solute, to assign dielectric constants to the midpoints, and, in particular, to compute the free energy in the more sophisticated way described above. The molecular surface is determined by its implicit description from the atom positions and radii of the solute. This surface is defined as the contact surface between the van der Waals surface and the surface of a spherical probe representing the solvent.

The first step is to determine a van der Waals map by mapping the atoms onto the grid. Every midpoint is categorized as inside or outside of the solute. This is done by examining all midpoints inside a box surrounding each atom and comparing their distance to the atom center with the radius of the atom. The same approach is used to determine which grid points are in solution, which is important for knowing which grid points' stencil has to be modified by salt. Additionally, the grid points are classified as internal points if all surrounding midpoints are inside, external points if all surrounding midpoints are outside, or boundary points if some midpoints are in solution and some are not.
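The classification step can be sketched as follows. For brevity, this illustration tests every midpoint against every atom instead of restricting the search to a box around each atom as described above. The function names and the data layout (atoms as pairs of center and radius) are assumptions.

```python
import math


def midpoints_inside(midpoints, atoms):
    """Flag each midpoint that lies within the radius of at least one atom.

    midpoints: list of (x, y, z) coordinates
    atoms:     list of ((x, y, z), radius) pairs
    """
    return [any(math.dist(point, center) <= radius for center, radius in atoms)
            for point in midpoints]


def classify_grid_point(surrounding_inside):
    """Classify a grid point from the inside/outside flags of its midpoints."""
    if all(surrounding_inside):
        return "internal"
    if not any(surrounding_inside):
        return "external"
    return "boundary"


# One atom of radius 1.5 at the origin and two midpoints along x
atoms = [((0.0, 0.0, 0.0), 1.5)]
flags = midpoints_inside([(0.5, 0.0, 0.0), (2.5, 0.0, 0.0)], atoms)
```

The brute-force loop scales with the number of atoms times the number of midpoints; the per-atom bounding box described in the text avoids most of these distance evaluations.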

Taking into account the probe radius, some of these points have to be reclassified as sketched in the red marked region in **Figure 3**. This is done by iteratively examining the midpoints that surround the boundary points. For each such point, the distance to the solvent accessible surface is computed. This distance is defined as the distance from the van der Waals surface (violet line in **Figure 3**) extended by the radius of the probe, e.g., by 1.4 Å for water (one has to take into account that these extended atoms can overlap, see green line in **Figure 3**). If this distance is smaller than the probe radius, the midpoint remains outside; otherwise, it will be turned into an inside midpoint. Next, the grid points are reclassified using the new midpoints. This is repeated until no new boundary grid points are produced.

The actual surface points are finally constructed by projecting the boundary grid points onto the molecular surface either directly by moving them along the line connecting the grid point and the nearest atom center or by projecting them first onto the closest point of the solvent accessible surface and then back on the molecular surface in a similar way (Rocchia et al., 2002).

**FIGURE 3 | 2D representation of a very simple molecule consisting of two atoms on a grid**. The midpoints inside are marked as green dots, black dots (and no dots) denote midpoints outside. The grid points of interest at the boundary are marked with orange diamonds.

**Figure 4** shows the result of the surface computation for a small peptide. In the convex parts, the molecular surface corresponds to the van der Waals surface (brightly colored parts), whereas they differ in concave parts and cavities.

## **3. RESULTS**

## **3.1. Evaluation of Current Program**

The approach described in the Section "Methods" has been implemented in a sequential program using C++. As input, either separate files containing the atom positions (".pdb," Protein Data Bank format) and force field parameters, as in the program DelPhi (Li et al., 2012), or the Gromacs (Pronk et al., 2013) input file format (".tpr") providing atom positions, Lennard-Jones parameters for the atom sizes, and partial atomic charges can be used. The leading biomolecular force fields for molecular modeling such as CHARMM, GROMOS, OPLS, or Amber are supported by Gromacs. This combination with the widely used Gromacs simulation package considerably simplifies the usage and enhances the applicability of the presented PBE solver in combination with biomolecular simulations, e.g., in the analysis of protein–protein binding free energies using the MM/PBSA approach.

#### **TABLE 1 | Possible parameters**.

| Parameter | Description | Default / remarks |
|---|---|---|
| in(tpr, path) | Gromacs input file | Can also be given in the command line |
| in(pdb, path) | Protein positions | Alternative to tpr |
| in(siz, path) | Protein sizes | Necessary with pdb, optional with tpr |
| in(crg, path) | Protein charges | Necessary with pdb, optional with tpr |
| in(sph, path) | Positions in pdb format | Optional, to compute the potential along a path or at specific positions |
| Filling | Percentage of box that is filled with protein | Default: 80% |
| Spacing | Spacing (in Å) between two grid points | Default: 1 Å |
| gridS | Grid size, i.e., number of grid points in each direction (odd) | Default: is computed |
| rmsc | Convergence criterion | Default: 0.0001 |
| maxc | Convergence criterion | Default: 0.0001 |
| maxit | Maximal iteration | Default: 500 |
| epsIn | Dielectric constant inside the molecule | Default: 2.0 |
| epsOut | Dielectric constant in the solution | Default: 80.0 |
| salt | Concentration in moles/liter | Default: 0.0 |
| temp | Temperature | Default: 273.15 |
| bc = {1,2} | Boundary conditions | 1: 0-boundary, 2: quasi-Coulombic |
| pb = *xyz* | Periodic boundary conditions in *x*-, *y*-, *z*-direction | Default: false |
| nomem | No membrane input | Default: no information |
| mem = zmin, zmax, eps | Minimal and maximal spread of membrane in *z*-direction and its dielectric constant | Default: no information |
| tprmem = res, atm, eps | Residue and atoms to be considered membrane in tpr, and its dielectric constant | Default: no information |

Two of the three parameters Filling, Spacing, and gridS can be chosen.

The above-described input files and additional parameters like grid size, percentage of filling, probe radius, etc. are listed in one parameter file that serves as input. **Table 1** contains all currently available parameters.

In the following, some of these parameters are explained in more detail.

#### 3.1.1. Membranes

Membranes are modeled as a slab in the *xy*-plane whose extent is defined by its minimal and maximal *z*-value. The membrane extension can be specified in several ways using the membrane-related parameters listed in **Table 1** (nomem, mem, and tprmem).


#### 3.1.2. Boundary conditions

There are three possibilities to specify boundary conditions. These have to be fixed, as the grid points at the grid's boundary do not have enough neighbors to be computed directly. The first and easiest way is to set the boundary grid points to 0. Another possibility is to approximate the potential at the boundary using quasi-Coulombic dipole conditions:

$$\Phi(\mathbf{x}) = \frac{c\_{+} \cdot \exp\left(\frac{-d\_{+}}{\lambda}\right)}{d\_{+} \cdot \varepsilon\_{sol}} + \frac{c\_{-} \cdot \exp\left(\frac{-d\_{-}}{\lambda}\right)}{d\_{-} \cdot \varepsilon\_{sol}},$$

where *ε*<sub>*sol*</sub> is the dielectric constant of the solvent, *c*<sub>+</sub> and *c*<sub>−</sub> are the total positive and negative charges, *d*<sub>+</sub> and *d*<sub>−</sub> are the distances of the grid point to the centers of the positive and negative charges, and *λ* denotes the Debye length.

Alternatively, periodic boundary conditions can be used in each grid dimension separately. Then for the missing neighbors, the corresponding points at the opposite side of the grid are used. This provides the opportunity to model infinite boxes.
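The two non-trivial boundary treatments described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GroPBS implementation (which is written in C++): the function names `quasi_coulombic_potential` and `periodic_neighbors` are hypothetical, all quantities are taken in whatever consistent unit system the formula above implies, and no additional prefactor is applied.

```python
import math


def quasi_coulombic_potential(point, c_plus, center_plus, c_minus, center_minus,
                              debye_length, eps_sol):
    """Evaluate the two-term screened Coulomb formula at one boundary grid point.

    point, center_plus, center_minus: coordinate tuples of equal dimension.
    c_plus, c_minus: total positive and negative charge.
    """
    d_plus = math.dist(point, center_plus)    # distance to centre of positive charge
    d_minus = math.dist(point, center_minus)  # distance to centre of negative charge
    term_plus = c_plus * math.exp(-d_plus / debye_length) / (d_plus * eps_sol)
    term_minus = c_minus * math.exp(-d_minus / debye_length) / (d_minus * eps_sol)
    return term_plus + term_minus


def periodic_neighbors(index, n_points):
    """Left and right neighbour indices along one periodic grid axis.

    A point on the grid edge takes its missing neighbour from the opposite side.
    """
    return (index - 1) % n_points, (index + 1) % n_points
```

For a grid point that is equidistant from the two charge centers and a net-neutral solute (`c_minus == -c_plus`), the two terms cancel, which is a convenient sanity check for such a routine.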

#### 3.1.3. Convergence Criterion

There are three possible ways to define the convergence of the iterative procedure. The first is to set a threshold on the root mean squared change (rmsc), defined as

$$rmsc^{(i)} = \sqrt{\frac{1}{M} \sum\_{\mathbf{x}} \left(\Phi^{(i)}(\mathbf{x}) - \Phi^{(i-1)}(\mathbf{x})\right)^2},$$

where *M* is the number of grid points and *i* is the iteration count. The rmsc measures the mean difference in the potential between two consecutive iterations. The option maxc sets a threshold on the maximal change in the potential at any grid point between two iterations. The third option, maxit, limits the number of iterations irrespective of convergence. These criteria can be combined, in which case the iteration stops as soon as one criterion is fulfilled.
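The combined stopping rule can be illustrated with a short Python sketch. It assumes the potential is stored as a flat list of grid values and that one solver sweep is available as a function `step`; both are assumptions made for this example, as are the function names. The default thresholds are the ones listed in **Table 1**.

```python
import math


def rmsc(phi_new, phi_old):
    """Root mean squared change between two iterates (flat lists of equal length)."""
    m = len(phi_new)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(phi_new, phi_old)) / m)


def maxc(phi_new, phi_old):
    """Maximal absolute change at any grid point between two iterates."""
    return max(abs(a - b) for a, b in zip(phi_new, phi_old))


def iterate(step, phi, rmsc_tol=1e-4, maxc_tol=1e-4, maxit=500):
    """Apply `step` until one criterion is met.

    Returns (potential, iterations used, converged flag).
    """
    for it in range(1, maxit + 1):
        phi_new = step(phi)
        # Stop as soon as either change-based criterion is fulfilled.
        if rmsc(phi_new, phi) < rmsc_tol or maxc(phi_new, phi) < maxc_tol:
            return phi_new, it, True
        phi = phi_new
    # maxit reached without meeting a change-based criterion.
    return phi, maxit, False
```

Since the root mean squared change can never exceed the maximal change, identical thresholds for rmsc and maxc mean that the rmsc criterion is the first to be met.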

**FIGURE 5 | The number of iterations and, therefore, the time to solve the linear system depends on the grid size, i.e., the size of the studied solute molecule**.

**FIGURE 6 | The CPU time for the computation of the molecular surface and of the energy, as well as the total CPU time, depends on the number of residues**.

The implementation was tested using molecules of different sizes, ranging from a small protein with 49 residues to a protein as large as 1,070 residues (1,260 and 16,260 atoms, respectively). The grid size was chosen such that the filling rate was ≈80% at a grid spacing of 1 Å. The correlations between the number of atoms and the grid size, the number of iterations, and the time consumption of different parts of the program are shown in **Figures 5** and **6**.
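How a grid size might follow from a chosen filling rate and spacing can be sketched as below. The exact rule used by GroPBS is not stated in the text, so this is an assumption: the filling is interpreted, as is common for Delphi-style solvers, as the fraction of the box edge covered by the longest extent of the solute, and the helper name `grid_points` is hypothetical.

```python
import math


def grid_points(extent, filling=0.80, spacing=1.0):
    """Odd number of grid points per direction (hypothetical helper).

    extent:  longest extent of the solute, in the same length unit as `spacing`.
    filling: assumed fraction of the box edge occupied by the solute.
    """
    box_length = extent / filling                # box edge implied by the filling rate
    n = math.ceil(box_length / spacing) + 1      # grid points needed to span the box
    return n if n % 2 == 1 else n + 1            # Table 1 asks for an odd grid size
```

Halving the spacing roughly doubles the number of grid points per direction, so the total number of grid points in three dimensions grows by about a factor of eight.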

The results show that the number of iterations and, therefore, the time to solve the linear system strongly depends on the number of grid points. The surface computation and energy calculation, however, depend on the size of the molecule.

The electrostatic potential computed using GroPBS may be mapped onto the surface of biomolecules using, e.g., PyMOL (Schrödinger, 2010). As an example, **Figure 7** shows the surface potential of acetylcholinesterase.

#### **3.2. Influence of an Implicit Membrane**

In order to evaluate the influence of the low dielectric constant of a lipid membrane on the electrostatic potential of an embedded membrane protein, the potential was compared for the sucrose-specific porin ScrY (Forst et al., 1998) in different environments. **Figure 8** shows this porin embedded in a POPC bilayer. The electrostatic potential was analyzed along a path through the pore of each monomer of the ScrY homo-trimer. This path was obtained using the program HOLE (Smart et al., 1993). For the solution of the PBE, the grid was chosen such that the filling rate was ≈80% using a step size of 0.5 Å, with periodic boundary conditions in the lateral direction (membrane plane) and dipole boundary conditions normal to the membrane. The dielectric constants were set to 4.0 inside the protein and 80.0 in the solvent phase. The dielectric constant of the surrounding implicit membrane was varied to study the influence of the membrane on the central, membrane-distal pore region. The PBE was solved every 0.5 ns of a 100 ns simulation, excluding the initial equilibration period of 20 ns. The electrostatic potential was averaged over the trajectory.

**FIGURE 9 | Potential profiles along the paths through the sucrose-specific porin ScrY under different conditions**. Upper panel: the electrostatic potential computed without consideration of a membrane slab (red), and with an implicit membrane slab with *ε* = 2 (blue) and *ε* = 4 (green). Lower panel: Results for the electrostatic potential along the sucrose pore averaged over snapshots of a 100 ns molecular dynamics simulation. The protein flexibility is highlighted by its influence on the electrostatic potential (gray shaded region). The vertical lines describe the extent of the protein (gray) and the membrane (black).

The obtained electrostatic potential through the ScrY pore (see **Figure 9**, upper panel) is drastically decreased if an implicit membrane is included in the calculations. While the shape of the potential is similar, the minimum is shifted by approximately 1.5 nm, from the membrane interfacial region to the interior of the pore. The additional inclusion of flexibility by averaging the potential along a molecular dynamics trajectory results in an increase of the potential by up to 7 kT/e (**Figure 9**, lower panel).

## **4. DISCUSSION AND FUTURE WORK**

A new program for the numerical solution of the Poisson–Boltzmann equation around biological macromolecules is presented (GroPBS). Apart from soluble proteins, GroPBS may as well be used to analyze the electrostatic potential of integral membrane proteins. The low-dielectric membrane environment may be modeled implicitly or explicitly. Additionally, GroPBS is shown to efficiently perform such computations both on pdb files and using the Gromacs input file format. This significantly simplifies the fast calculation of, e.g., the solvation free energies of biomolecules for different force fields or on ensembles of structures obtained from molecular dynamics simulations.

Using the example of the sucrose-specific porin ScrY, we show that the inclusion of a membrane may also have a substantial influence on the potential inside the protein, and thus should not be neglected in PB calculations of membrane proteins.

In a subsequent step, GroPBS will be parallelized for multicore architectures, and in particular for GPUs, to enable, e.g., the fast analysis of changes in pK<sub>a</sub> values on the fly during biomolecular simulations. Parallelization may be easily achieved by the so-called *checkerboard ordering* in the update of the electrostatic potential (Adams and Ortega, 1982).
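The idea behind checkerboard (red-black) ordering can be sketched on a simplified model problem. The example below is an assumption-laden illustration rather than GroPBS code: it applies one Gauss–Seidel sweep to the two-dimensional Laplace equation with fixed boundary values instead of the full PBE, and the function name `red_black_sweep` is hypothetical. The sweep is written serially; the point is that all grid points of one colour depend only on points of the other colour, so each half-sweep could be executed in parallel.

```python
def red_black_sweep(phi):
    """One Gauss-Seidel sweep in checkerboard order on a 2D grid (list of lists).

    Boundary values are kept fixed; interior points are replaced by the average
    of their four neighbours (discrete Laplace equation).
    """
    ny, nx = len(phi), len(phi[0])
    for colour in (0, 1):                      # first one colour, then the other
        for i in range(1, ny - 1):
            for j in range(1, nx - 1):
                if (i + j) % 2 == colour:
                    # All four neighbours have the opposite colour, so updates
                    # within this half-sweep are independent of each other.
                    phi[i][j] = 0.25 * (phi[i - 1][j] + phi[i + 1][j]
                                        + phi[i][j - 1] + phi[i][j + 1])
    return phi
```

Repeating the sweep until one of the convergence criteria of Section 3.1.3 is met gives a complete, if slow, solver for the model problem.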

The program will be made available free of charge on the following website: www.biotechnik.nat.uni-erlangen.de/research/boeckmann/downloads/GroPBS.

#### **AUTHOR CONTRIBUTIONS**

Research was designed by FB and RB, performed by FB and LS, and the manuscript was written by all.

#### **ACKNOWLEDGMENTS**

We acknowledge support by the Deutsche Forschungsgemeinschaft within the Research Training Group grant No. 1962/1, Dynamic Interactions at Biological Membranes – From Single Molecules to Tissue (FB, LS, and RB). This work was also supported by the Emerging Field Initiative Synthetic Biology of the Friedrich-Alexander University of Erlangen-Nürnberg. Liping Sun was supported by the China Scholarship Council (CSC).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Bertelshofer, Sun, Greiner and Böckmann. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Model-Based Design of Biochemical Microreactors**

*Tobias Elbinger <sup>1</sup> , Markus Gahn<sup>1</sup> , Maria Neuss-Radu<sup>1</sup> \*, Falk M. Hante<sup>2</sup> , Lars M. Voll <sup>3</sup> , Günter Leugering<sup>2</sup> and Peter Knabner <sup>1</sup>*

*<sup>1</sup> Chair of Applied Mathematics 1, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, <sup>2</sup> Chair of Applied Mathematics 2, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, <sup>3</sup> Chair of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany*

Mathematical modeling of biochemical pathways is an important resource in Synthetic Biology, as the predictive power of simulating synthetic pathways represents an important step in the design of synthetic metabolons. In this paper, we are concerned with the mathematical modeling, simulation, and optimization of metabolic processes in biochemical microreactors able to carry out enzymatic reactions and to exchange metabolites with their surrounding medium. The results of the reported modeling approach are incorporated in the design of the first microreactor prototypes that are under construction. These microreactors consist of compartments separated by membranes carrying specific transporters for the input of substrates and export of products. Inside the compartments of the reactor, multienzyme complexes assembled on nano-beads by peptide adapters are used to carry out metabolic reactions. The spatially resolved mathematical model describing the ongoing processes consists of a system of diffusion equations together with boundary and initial conditions. The boundary conditions model the exchange of metabolites with the neighboring compartments and the reactions at the surface of the nano-beads carrying the multienzyme complexes. Efficient and accurate approaches for numerical simulation of the mathematical model and for optimal design of the microreactor are developed. As a proof-of-concept scenario, a synthetic pathway for the conversion of sucrose to glucose-6-phosphate (G6P) was chosen. In this context, the mathematical model is employed to compute the spatio-temporal distributions of the metabolite concentrations, as well as application-relevant quantities like the outflow rate of G6P. These computations are performed for different scenarios, where the number of beads as well as their loading capacity are varied. The computed metabolite distributions show spatial patterns, which differ for different experimental arrangements. Furthermore, the total output of G6P increases for scenarios where microcompartmentation of enzymes occurs. These results show that spatially resolved models are needed in the description of the conversion processes. Finally, the enzyme stoichiometry on the nano-beads that maximizes the production of glucose-6-phosphate is determined.

**Keywords: biochemical microreactor, multienzyme complexes, spatio-temporal mathematical model, numerical simulation, PDE-constrained optimization, model-based design**

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*Xinyao Liu, SABIC, Saudi Arabia Jesus Picó, Universitat Politècnica de València, Spain*

*\*Correspondence: Maria Neuss-Radu maria.neuss-radu@math.fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 17 September 2016 Accepted: 28 January 2016 Published: 15 February 2016*

#### *Citation:*

*Elbinger T, Gahn M, Neuss-Radu M, Hante FM, Voll LM, Leugering G and Knabner P (2016) Model-Based Design of Biochemical Microreactors. Front. Bioeng. Biotechnol. 4:13. doi: 10.3389/fbioe.2016.00013*

## **1. INTRODUCTION**

One of the greatest challenges in biology is to understand the fundamental principles by which evolution has selected networks to fulfill specific functional needs in the control of metabolism or transcription. Synthetic biology approaches may help to shed light on such principles by identifying modular functional units of a network and uncovering how units can be linked together to yield new function (Bashor et al., 2010). There has already been early success in engineering simple regulatory circuits that recapitulate some of the behaviors of natural regulatory circuits, e.g., circuits that regulate gene expression oscillations, bistable switches, or circuits that perform combinatorial logic operations; see Boyle and Silver (2009) and Purnick and Weiss (2009) for reviews. However, synthetic biology approaches are not limited to this field. Another important area of synthetic biology is the development of synthetic organelles, which host metabolic processes and are able to communicate with the outside environment *via* transport processes over semipermeable membranes, which can be built either from natural constituents or from synthetic polymers. Such robust bioreactors can be applied, e.g., in biotechnology or drug delivery for the production of bioactive ingredients (Roodbeen and van Hest, 2009; Marguet et al., 2013).

A fundamental principle in the development of bioreactors is compartmentation. Here, the strategy is to mimic cellular organization, see, e.g., Roodbeen and van Hest (2009). The presence of compartments (organelles) inside living cells allows for better regulatory control over the biological processes that occur inside these compartments, e.g., pathways competing for intermediates can occur in spatially separated compartments and can be regulated differently. Furthermore, microcompartmentation by means of metabolic channeling prevents the loss of intermediates and minimizes competing cross-reactions. However, clear-cut experimental evidence demonstrating the importance of metabolic channeling for metabolic flux *in vivo* is lacking. Mimicking the natural situation, nanoreactors can be built by encapsulating enzymes in vesicular compartments [as summarized in Peters et al. (2012)], and in addition microcompartmentation of enzymes can be achieved with the help of synthetic protein scaffolds (Chen and Silver, 2012). Microreactors based on microfluidic devices carrying out enzymatic processes, reviewed in Nomura et al. (2004) and Asanomi et al. (2011), have also come into the focus of research in recent years. Several methods for immobilizing enzymes are available, e.g., immobilization of enzymes on magnetic microparticles. One approach in the development of microreactors for biosynthesis is the creation of fluidic assay-based microreactors with membrane-bounded subcompartments, which carry out biochemical conversions and allow the exchange of substrates and products between individual subcompartments.

In this paper, we focus on the model-based design of biochemical microreactors able to carry out enzymatic reactions and to exchange metabolites between individual subcompartments. The bioreactor is built according to the reported model predictions and is based on a microfluidic system consisting of chambers separated by porous walls, which are interspersed with biomembranes (**Figure 1**). The membranes carry specific transporters for the input of substrates and export of products. Inside the chambers, magnetic nano-beads are present that are positioned by an external magnetic field.

Multienzyme complexes assembled on these nano-beads by peptide adapters are used to carry out metabolic reactions. The nano-beads allow a maximal enzyme concentration, which depends on the beads' surface. Furthermore, it is possible to provide beads with a given enzyme stoichiometry. We exploit the microcompartmentation offered by the immobilized enzymes on the bead surface to address the question of whether the spatial proximity of the individual enzymes and their stoichiometry have an impact on the productivity of the microreactor. In **Figure 1**, a microreactor consisting of an array of compartments is illustrated.

Our goal is to describe the spatio-temporal dynamics of the metabolite concentrations involved in the biosynthetic conversions, and to optimize the microreactor in order to increase the accumulation of the final product. To achieve this goal, we have developed a mathematical model describing the ongoing metabolic processes in spatial resolution. By applying a model with spatial resolution, we can assess whether the spatial arrangement of the individual enzymes and their stoichiometry have an impact on the productivity of the microreactor. Hence, our model will be able to predict whether metabolic channeling plays a role in the described *in vitro* assembly. Our model consists of a system of diffusion equations together with boundary and initial conditions. The boundary conditions model the exchange of metabolites with the neighboring chambers and the reactions at the surface of the nano-beads carrying the multienzyme complexes. For the numerical simulation of the mathematical model, efficient and accurate approaches are developed. These allow the computation of the spatio-temporal distribution of the metabolite concentrations inside the compartments and the flux of products through the export boundaries. Here, different computational scenarios can be considered, including different numbers and loading capacities of nano-beads, different enzyme stoichiometries on the surface of the beads, as well as the distribution of enzymes both inside the fluid and on the beads' surfaces. Comparing the metabolic flux through the system for those scenarios allows us to test the hypothesis that microcompartmentation of enzymes increases the efficiency of the metabolic pathway. Based on the mathematical model and the simulation approach, the optimal design of the microreactor is performed.

As a proof-of-concept scenario, we choose a synthetic pathway for the conversion of sucrose to glucose-6-phosphate. This metabolic pathway is localized in one chamber of the microreactor, allowing the import of sucrose and the export of glucose-6-phosphate. Based on the mathematical model, several investigations are performed. First, the spatio-temporal distributions of the metabolite concentrations, as well as application-relevant quantities like the production rate of the metabolites and the outflow rate of G6P, are computed. These computations are performed for different scenarios, where the number of beads as well as their loading capacity are varied. Furthermore, for the calculations, we consider two different hexokinases, namely HsHK2 and ScHK2, and we suppose sucrose and ATP to be present in surplus quantities. Finally, for the mentioned scenarios, we determine the stoichiometry of enzyme concentrations on the nano-beads that maximizes the production of glucose-6-phosphate.

**substrates and export of products**. This microreactor is currently under construction and the modeling results reported here are being utilized to influence the conceptual design of the microreactor.

#### **2. MATERIALS AND METHODS**

#### **2.1. The Mathematical Model**

In this section, we set up a mathematical model describing the conversion of sucrose (S) to glucose-6-phosphate (G6P). This metabolic pathway is carried out in a microreactor consisting of a chamber separated from the surrounding medium by membranes carrying transport proteins for the input of sucrose and export of glucose-6-phosphate. Nano-beads loaded with multienzyme complexes are distributed inside the chamber. The enzymatic reactions constituting the metabolic pathway, as well as the corresponding enzymes and metabolites are listed in **Table 1**.

The layout of the microreactor is given in **Figure 2**. We denote the reactor chamber by Ω<sub>*c*</sub> ⊂ ℝ<sup>*n*</sup> (*n* = 2 or *n* = 3). The space inside the chamber occupied by nano-beads, which are balls with diameter *d*, is denoted by Ω<sub>*b*</sub>, whereas the remainder, representing the domain occupied by the bulk solution, is denoted by Ω := Ω<sub>*c*</sub> ∖ Ω<sub>*b*</sub>. We assume that Ω<sub>*b*</sub> is strictly included in Ω<sub>*c*</sub>, which means that the beads do not touch the walls of the chamber. The number of beads in the reactor is denoted by *n*<sub>*b*</sub>. The boundary ∂Ω of the domain Ω is decomposed into ∂Ω<sub>*c*</sub>, the boundary of the chamber, and Γ<sub>*b*</sub> := ∂Ω<sub>*b*</sub>, the reactive surface of the nano-beads. Furthermore, the boundary ∂Ω<sub>*c*</sub> of the chamber consists of Γ<sub>*i*</sub>, Γ<sub>*e*</sub>, and Γ<sub>0</sub>, i.e.,

$$\partial\Omega\_c = \Gamma\_i \cup \Gamma\_e \cup \Gamma\_0,$$

and the three sets are pairwise disjoint. The sets Γ<sub>*i*</sub> and Γ<sub>*e*</sub> represent the boundary parts where metabolites are transported into and out of the chamber Ω<sub>*c*</sub>, respectively. These import/export boundaries have a complex geometric structure. They consist of fenestrations lined with lipid membranes carrying transporters for the exchange of metabolites. On the boundary Γ<sub>*i*</sub> the sucrose/proton cotransporters are distributed, whereas Γ<sub>*e*</sub> contains the transporters for the exchange of glucose-6-phosphate and inorganic phosphate. In our model, the microscopic structure of the boundaries is taken into account in an averaged (homogenized) way, which makes the model amenable to numerical calculations. More precisely, we assign to each of the boundaries Γ<sub>*i*</sub>, Γ<sub>*e*</sub> an effective permeability for the transported metabolites, denoted by *θ*<sup>*i*</sup> and *θ*<sup>*e*</sup>, respectively (*θ*<sup>*i*</sup>, *θ*<sup>*e*</sup> ∈ [0,1]). These permeabilities can be calculated by an averaging approach (homogenization), assuming that the pores of the transporters are very small compared to the dimension of the reactor chamber and occur in a large number, and that the transporters are uniformly distributed within the lipid membrane. A sketch illustrating the idea behind the homogenization approach is given in **Figure 3**.

**TABLE 1 | Metabolic reactions and the corresponding enzymes and metabolites**.


*Metabolites: S, sucrose; H<sup>+</sup>, protons; G, glucose; F, fructose; G6P, glucose-6-phosphate; F6P, fructose-6-phosphate; Pi, inorganic phosphate. The index e identifies metabolites located outside the chamber.*

The spatio-temporal dynamics of the concentrations *y*<sub>*j*</sub>, *j* = 1, …, *m*, of the metabolites involved in the metabolic pathway is governed by a system of reaction–diffusion equations of the form:

$$\partial\_t y\_j(t, x) - D\_j \Delta y\_j(t, x) = R\_j^{\Omega}(y(t, x), \lambda) \qquad \text{for } (t, x) \in (0, T) \times \Omega, \tag{1a}$$

together with the boundary conditions

$$-D\_j \nabla y\_j(t, x) \cdot \nu(x) = -R\_j^{b}(y(t, x), \lambda) \qquad \text{for } (t, x) \in (0, T) \times \Gamma\_b, \tag{1b}$$

$$-D\_j \nabla y\_j(t, x) \cdot \nu(x) = -\theta^{e} R\_j^{e}(y(t, x)) \qquad \text{for } (t, x) \in (0, T) \times \Gamma\_e, \tag{1c}$$

$$-D\_j \nabla y\_j(t, x) \cdot \nu(x) = -\theta^{i} R\_j^{i}(y(t, x)) \qquad \text{for } (t, x) \in (0, T) \times \Gamma\_i, \tag{1d}$$

$$-D\_j \nabla y\_j(t, x) \cdot \nu(x) = 0 \qquad \text{for } (t, x) \in (0, T) \times \Gamma\_0, \tag{1e}$$

and the initial condition

$$y\_j(0, x) = y\_j^0(x) \qquad \text{for } x \in \Omega. \tag{1f}$$

Here, *ν* denotes the outer unit normal on ∂Ω with respect to Ω. Equation (1a) describes diffusion with diffusivity *D*<sub>*j*</sub> and enzymatic reactions with kinetics *R*<sub>*j*</sub><sup>Ω</sup> for the metabolite number *j*. If the enzymes are localized on beads (and thus no reactions are carried out in the bulk domain), the terms *R*<sub>*j*</sub><sup>Ω</sup> are equal to zero. The boundary conditions involve the quantity −*D*<sub>*j*</sub>∇*y*<sub>*j*</sub> · *ν*, which describes the normal component of the diffusive flux of the *j*-th metabolite at the boundary of the domain Ω. Thus, condition (1e) models an impermeable boundary, where the normal flux is equal to zero. Conditions (1c) and (1d) model the flux of metabolites through the export and import boundary, respectively. This flux is proportional to the permeability of the boundary and to the kinetics of the corresponding transport protein. Finally, condition (1b) describes the normal flux of metabolites at the boundary of the beads, generated by enzymatic reactions at the beads' surface. We emphasize that the reaction rates *R*<sub>*j*</sub><sup>*b*</sup> and *R*<sub>*j*</sub><sup>Ω</sup> depend on an additional parameter *λ* = (*λ*<sub>1</sub>, …, *λ*<sub>*n*<sub>*e*</sub></sub>) ∈ ℝ<sup>*n*<sub>*e*</sub></sup>. This parameter describes the stoichiometry of enzymes on the beads and in the bulk; thus, for *i* = 1, …, *n*<sub>*e*</sub>, where *n*<sub>*e*</sub> denotes the number of enzyme species involved in the reactions, we have

$$
\lambda\_i \ge 0 \qquad \text{and} \qquad \sum\_{i=1}^{n\_e} \lambda\_i = 1. \tag{2}
$$

(The case *λ*<sub>*i*\*</sub> = 1 for some *i*\* ∈ {1, …, *n*<sub>*e*</sub>} means that all binding sites on the bead surface are occupied by the enzyme *i*\*, whereas the case *λ*<sub>*i*</sub> = 1/*n*<sub>*e*</sub> for all *i* ∈ {1, …, *n*<sub>*e*</sub>} means that all enzymes occupy the same number of binding sites.)

The existence and uniqueness of positive solutions for the model (1) can be shown by arguments similar to those in Gahn et al. (under review)<sup>1</sup> . We also emphasize that the model (1), valid for one conversion chamber, can easily be extended to model a multicompartment microreactor. This is done by adding transmission conditions at the interfaces separating the compartments, which model the exchange of metabolites between neighboring compartments.
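To make the structure of model (1) more concrete, the following Python sketch solves a strongly simplified one-dimensional analogue for a single metabolite. It is an illustration under several assumptions and not the method of this paper: an explicit finite-volume scheme replaces the finite element discretization, the bead surface is reduced to the left end point with a constant production rate `bead_rate` (playing the role of the bead reaction term in (1b)), the export boundary at the right end uses a hypothetical linear transporter kinetics with rate constant `k_export` (scaled by the permeability `theta_e` as in (1c)), there is no bulk reaction, and all parameter values are dimensionless and made up.

```python
def simulate(n_cells=20, length=1.0, diff=1.0, bead_rate=1.0,
             theta_e=0.5, k_export=2.0, t_end=5.0):
    """Explicit finite-volume scheme for 1D diffusion with flux boundary conditions.

    Returns (cell concentrations, exported amount, simulated time).
    """
    dx = length / n_cells
    dt = 0.4 * dx * dx / diff            # below the explicit stability limit of 0.5
    y = [0.0] * n_cells                  # initial condition: y(0, x) = 0
    exported = 0.0
    steps = int(t_end / dt)
    for _ in range(steps):
        # Diffusive flux -D * dy/dx across the interior cell faces.
        flux = [-diff * (y[i + 1] - y[i]) / dx for i in range(n_cells - 1)]
        inflow = bead_rate                        # production at the bead surface
        outflow = theta_e * k_export * y[-1]      # export through the membrane
        new = y[:]
        new[0] += dt / dx * (inflow - flux[0])
        for i in range(1, n_cells - 1):
            new[i] += dt / dx * (flux[i - 1] - flux[i])
        new[-1] += dt / dx * (flux[-1] - outflow)
        exported += dt * outflow
        y = new
    return y, exported, steps * dt
```

Because every update is written in terms of fluxes across cell faces, the scheme conserves mass: the amount still in the chamber plus the exported amount equals the amount produced at the bead. With the time step chosen above, the concentrations also remain nonnegative.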

In this paper, we want to investigate the interplay between the metabolic processes at the beads and the export of the product glucose-6-phosphate. We assume that during the conversion processes the sucrose and ATP concentrations can be regarded as constant, since these two metabolite species are present in excess. As a consequence, we drop reaction (0). The equation for ADP can be neglected, since ADP is just a product of irreversible reactions and therefore has no direct effect on the reaction rates in equation (1). Finally, we assume that the concentrations of G6P<sub>e</sub> and Pi<sub>e</sub> (glucose-6-phosphate and inorganic phosphate outside the chamber) are constant. This is motivated by the fact that G6P<sub>e</sub> may be consumed in a following reaction, and Pi<sub>e</sub> may be delivered at a desirable rate in the space outside the chamber. Hence, we can drop the equations for G6P<sub>e</sub> and Pi<sub>e</sub>. The constant values of the concentrations mentioned above are given by the corresponding initial concentrations (**Table 3**). The scenario taking into account these assumptions shall be referred to as the sucrose excess scenario (SE-scenario).

The reactions relevant for the SE-scenario are reactions (1)–(5) (**Table 1**). These are in general multisubstrate enzymatic reactions. Their reaction mechanisms and the corresponding reaction kinetics are displayed in **Table 2A**. Note that in **Table 2**, the concentrations of metabolites are denoted with brackets (e.g., for sucrose, we use [S] instead of *y*<sub>S</sub>). This is chosen in order to keep the notation clear.
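As a small illustration of such a rate law, the sketch below evaluates an irreversible Michaelis–Menten rate, the mechanism that **Table 2** assigns to reaction (1), scaled by the stoichiometry weight of the enzyme. The function name and the numerical values in the usage note are hypothetical and are not taken from **Table 3**.

```python
def rate_irreversible_mm(conc, v_max, k_m, lam=1.0):
    """Irreversible Michaelis-Menten rate: lam * v_max * [S] / (K_m + [S]).

    conc:  substrate concentration [S] (must be nonnegative)
    v_max: maximal rate of the enzyme
    k_m:   Michaelis constant (must be positive)
    lam:   fraction of binding sites occupied by this enzyme (stoichiometry weight)
    """
    if conc < 0.0 or k_m <= 0.0:
        raise ValueError("concentration must be >= 0 and K_m must be > 0")
    return lam * v_max * conc / (k_m + conc)
```

For example, `rate_irreversible_mm(2.0, v_max=10.0, k_m=2.0, lam=0.5)` evaluates the rate at a substrate concentration equal to the Michaelis constant, where the rate is half of its saturation value `lam * v_max`.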

The unknowns of the model in the SE-scenario are the following metabolite concentrations: *y*<sub>G</sub>, *y*<sub>F</sub>, *y*<sub>F6P</sub>, *y*<sub>Pi</sub>, *y*<sub>G6P</sub>. For each unknown, a reaction–diffusion equation of type (1a), complemented by boundary conditions of type (1b)–(1e) and an initial condition of type (1f), holds. The reaction terms occurring in the equations are denoted by *R*<sub>G</sub><sup>Ω</sup>, …, *R*<sub>G6P</sub><sup>Ω</sup>, whereas the fluxes at the boundaries Γ<sub>*b*</sub>, Γ<sub>*e*</sub>, Γ<sub>*i*</sub> are given by reaction terms denoted by *R*<sub>G</sub><sup>*b*</sup>, …, *R*<sub>G6P</sub><sup>*b*</sup>, *R*<sub>G</sub><sup>*e*</sup>, …, *R*<sub>G6P</sub><sup>*e*</sup>, and *R*<sub>G</sub><sup>*i*</sup>, …, *R*<sub>G6P</sub><sup>*i*</sup>. The precise form of these reaction terms, in the case when enzymes are distributed on beads, i.e., *R*<sub>*j*</sub><sup>Ω</sup> = 0, is given in **Table 2B**. We emphasize that, when the metabolite *j*, *j* = G, …, G6P, participates in different reactions, the reaction term *R*<sub>*j*</sub><sup>*b*</sup> is given as a sum of all relevant reaction kinetics. If reactions also take place in the bulk, the reaction terms *R*<sub>*j*</sub><sup>Ω</sup>, for *j* = G, …, G6P, have the same structure as *R*<sub>*j*</sub><sup>*b*</sup> with potentially different parameters. Finally, we mention that in the SE-scenario the inflow boundary Γ<sub>*i*</sub> is impermeable, thus the reaction terms *R*<sub>G</sub><sup>*i*</sup>, …, *R*<sub>G6P</sub><sup>*i*</sup> are set to zero.

<sup>1</sup>Gahn, M., Neuss-Radu, M., and Knabner, P. (2015). Homogenization of reaction–diffusion processes in a two-component porous medium with nonlinear flux conditions at the interface. *SIAM J. Appl. Math.* (Under Review).

**TABLE 2 | (A) Metabolic reactions relevant for the sucrose excess scenario and the corresponding reaction kinetics; reaction mechanisms: (1) irreversible Michaelis–Menten; (2), (3) irreversible bi–bi ordered; (4) reversible Michaelis–Menten; (5) bi–bi ping pong. See, e.g., Segel (1975) for an overview on reaction mechanisms for multisubstrate enzymatic reactions; (B) reaction terms in equation (1) corresponding to the concentrations** *y*<sub>*j*</sub>**,** *j* **= G, …, G6P, for the SE-scenario with enzymes distributed on beads, i.e.,** *R*<sub>*j*</sub><sup>Ω</sup> **= 0**.

**(A) Reaction rates**

Reaction (1):

$$r\_{\mathrm{inv}}(\mathrm{S}) = \lambda\_{\mathrm{inv}}\, v\_{\max}^{\mathrm{inv}}\, \frac{[\mathrm{S}]}{K\_{m\mathrm{S}} + [\mathrm{S}]}$$

Reaction (5):

$$r\_{\mathrm{G6P}}(\mathrm{G6P}, \mathrm{Pi}\_e, \mathrm{G6P}\_e, \mathrm{Pi}) = \frac{v\_{\max}^{\mathrm{G6P},f} \left( [\mathrm{G6P}][\mathrm{Pi}\_e] - \frac{[\mathrm{G6P}\_e][\mathrm{Pi}]}{K\_{\mathrm{eq}}} \right)}{[\mathrm{G6P}][\mathrm{Pi}\_e] + K\_{m\mathrm{Pi}\_e}[\mathrm{G6P}] + K\_{m\mathrm{G6PT}}[\mathrm{Pi}\_e]\left(1 + \frac{[\mathrm{Pi}]}{K\_{i\mathrm{Pi}}}\right) + \frac{v\_{\max}^{\mathrm{G6P},f}}{v\_{\max}^{\mathrm{G6P},b} K\_{\mathrm{eq}}} \left( K\_{m\mathrm{Pi}}[\mathrm{G6P}\_e]\left(1 + \frac{[\mathrm{G6P}]}{K\_{i\mathrm{G6P}}}\right) + [\mathrm{Pi}]\left(K\_{m\mathrm{G6P}\_e} + [\mathrm{G6P}\_e]\right) \right)}$$

[The rate expressions for reactions (2)–(4) and part (B) of the table are missing from the source.]

The values *v*<sup>*i*</sup><sub>max</sub> for an enzyme *i*, *i* ∈ {inv, hk, pgi}, in **Table 2** can be calculated by *v*<sup>*i*</sup><sub>max</sub> = *k*<sup>*i*</sup><sub>cat</sub>[*E*]<sub>0</sub>, with a turnover number *k*<sup>*i*</sup><sub>cat</sub> given in **Table 3** and the concentration [*E*]<sub>0</sub> of occupied binding sites on a bead. The enzymatic activity for the enzyme *i*, *i* ∈ {inv, hk, pgi}, on each bead is then given by *λ*<sub>*i*</sub>*v*<sup>*i*</sup><sub>max</sub>, where the vector *λ* = (*λ*<sub>inv</sub>, *λ*<sub>hk</sub>, *λ*<sub>pgi</sub>) describes the enzyme stoichiometry on the bead. For the simulations, we choose *λ* = *λ*<sup>*E*</sup> := (*λ*<sup>*E*</sup><sub>inv</sub>, *λ*<sup>*E*</sup><sub>hk</sub>, *λ*<sup>*E*</sup><sub>pgi</sub>), where *λ*<sup>*E*</sup> satisfies

$$\begin{aligned} k\_{\rm cat}^{\rm inv} \lambda\_{\rm inv}^{E} &= k\_{\rm cat}^{\rm hk} \lambda\_{\rm hk}^{E} = k\_{\rm cat}^{\rm pgi} \lambda\_{\rm pgi}^{E}, \\ \lambda\_{\rm inv}^{E} &+ \lambda\_{\rm hk}^{E} + \lambda\_{\rm pgi}^{E} = 1. \end{aligned} \tag{3}$$

This enzyme stoichiometry leads to equal enzymatic activity for all enzyme species. The values of *λ*<sup>*E*</sup> are given in **Table 3**. Please note that in Section 2.3 optimal values for the parameter *λ* are computed.
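Condition (3) determines the equal-activity stoichiometry in closed form: if the product of turnover number and weight is the same for all enzymes and the weights sum to one, each weight is proportional to the inverse of its turnover number. The Python sketch below computes this; the function name is hypothetical, and the turnover numbers in the usage note are made-up placeholders, not the values of **Table 3**.

```python
def equal_activity_stoichiometry(k_cat):
    """Weights lambda_i with k_cat_i * lambda_i equal for all i and sum(lambda_i) = 1.

    k_cat: dict mapping enzyme name to its (positive) turnover number.
    """
    if any(k <= 0.0 for k in k_cat.values()):
        raise ValueError("turnover numbers must be positive")
    inverse_sum = sum(1.0 / k for k in k_cat.values())
    # lambda_i is proportional to 1 / k_cat_i, normalized to sum to one.
    return {name: (1.0 / k) / inverse_sum for name, k in k_cat.items()}
```

Calling `equal_activity_stoichiometry({"inv": 100.0, "hk": 200.0, "pgi": 400.0})` assigns the largest share of binding sites to the slowest enzyme, so that all three enzymatic activities on the bead coincide.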

**TABLE 3 | Parameter values for the SE-scenario, corresponding to two different hexokinases HsHK2 and ScHK2**.


*These parameters correspond to the 2-dimensional case, and their units of measurement were adapted to this case. Approaches for the experimental determination of the k<sub>cat</sub> values can be found in Gao and Leary (2003), Gloyn et al. (2005), Lin et al. (2009), Lafraya et al. (2011), and Somarowthu et al. (2011). We use the values below to calculate the v<sub>max</sub> values. The initial values y<sup>0</sup><sub>j</sub> in equation (6) for the species j = G, F, F6P, G6P, Pi are equal to zero. For the diffusion coefficients D<sub>j</sub> of these species, we use the value D<sub>j</sub> = 5.5 · 10<sup>−10</sup> m<sup>2</sup>/s.*

## **2.2. Numerical Methods**

We use a finite element method in order to find an approximation to the solution of equation (1) on a fixed time interval [0, *T*] and with a given initial state.

For the numerical accuracy and the computational complexity of an implementation, also in view of the optimization, it is crucial to choose a suitable discretization. For our problem, we use lowest-order Raviart–Thomas elements (Raviart and Thomas, 1977). At first glance, linear finite elements seem superior to Raviart–Thomas elements because they require fewer unknowns and converge at a higher order. For our problem, however, linear finite elements produce solutions with negative concentrations unless the triangulation *T<sub>h</sub>* is extremely fine. More precisely, for *n<sub>b</sub>* = 1 we investigated two scenarios and compared the number of unknowns that each discretization needs to obtain realistic results. For Ω<sub>*c*</sub> = (0, 50 *µ*m)<sup>2</sup>, linear finite elements need 3200 degrees of freedom per species, whereas Raviart–Thomas elements need only 993. For Ω<sub>*c*</sub> = (0, 300 *µ*m)<sup>2</sup>, linear finite elements need 122,544 degrees of freedom per species, whereas Raviart–Thomas elements need only 38,256. The computational complexity for Raviart–Thomas elements can be further reduced by hybridization; in this case, only 607 degrees of freedom are needed for the smaller domain and 23,016 for the larger domain.

The resulting space-discrete system is stiff; therefore, we use the implicit Euler method for the time integration. The resulting finite-dimensional non-linear problems are solved with Newton's method. Whenever the non-linear solver cannot find a non-negative solution, the time step is rejected and a smaller time step size is used (until the non-linear solver finds a non-negative solution or the time step size falls below a predefined minimal time step size). We emphasize that decreasing the time step size was not necessary on the meshes used for spatial discretization (see Supplementary Material) with Raviart–Thomas elements and a time step size of 0.25 s.
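The time-stepping strategy can be illustrated on a scalar toy problem. The sketch below is not the M++ implementation; it applies implicit Euler with Newton's method to a single Michaelis–Menten consumption term and halves the step whenever Newton fails or returns a negative value. The constants `V_MAX` and `K_M`, the function names, and the halving rule are assumptions made for illustration.

```python
V_MAX = 5.0   # hypothetical maximal velocity
K_M = 0.01    # hypothetical Michaelis constant


def newton_step(y_old, dt, tol=1e-12, max_iter=50):
    """Solve g(y) = y - y_old + dt * V_MAX * y / (K_M + y) = 0 with Newton's method.

    Returns the root, or None if an iterate leaves the domain y > -K_M
    or the iteration does not converge.
    """
    y = y_old
    for _ in range(max_iter):
        if K_M + y <= 0.0:
            return None
        residual = y - y_old + dt * V_MAX * y / (K_M + y)
        if abs(residual) < tol:
            return y
        slope = 1.0 + dt * V_MAX * K_M / (K_M + y) ** 2
        y -= residual / slope
    return None


def integrate(y0, t_end, dt_init, dt_min=1e-8):
    """Implicit Euler with step rejection; returns the history and the number of rejections."""
    t, y, dt = 0.0, y0, dt_init
    history = [(t, y)]
    rejected = 0
    while t < t_end - 1e-12:
        dt = min(dt, t_end - t)
        y_new = newton_step(y, dt)
        if y_new is None or y_new < 0.0:
            # No non-negative solution found: reject the step and retry with half the size.
            rejected += 1
            dt *= 0.5
            if dt < dt_min:
                raise RuntimeError("time step fell below the minimal step size")
            continue
        t += dt
        y = y_new
        history.append((t, y))
    return history, rejected


history, rejected = integrate(y0=1.0, t_end=2.0, dt_init=0.25)
```

In this toy setting the initial step of 0.25 is rejected because the first Newton iterate leaves the admissible domain; in the simulations reported here, no such rejection was needed.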

In order to decrease the computational complexity of problem (1), arising from the number of species, we use the scheme presented in Kräutle and Knabner (2005) with minor modifications: instead of finding the basis of the image of the stoichiometric matrix *S* on the right-hand side of the partial differential equation, we find the basis of the image of the matrix (*S<sub>b</sub>* | *S<sub>e</sub>*), where *S<sub>x</sub>* refers to the stoichiometric matrix of the reactions on Γ<sub>*x*</sub>, *x* ∈ {*b*, *e*}. The decoupled problem consists of 6 coupled species, whereas the original problem consists of 11 coupled species. In the SE-scenario, the decoupling scheme does not decrease the computational complexity, since the resulting stoichiometric matrix has full rank.
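To make the role of the image of (*S<sub>b</sub>* | *S<sub>e</sub>*) concrete, the following sketch determines a column-space basis of a stacked stoichiometric matrix by exact Gauss–Jordan elimination. The five-species network (S, G, F, G6P, F6P) and its matrices are a hypothetical toy example, not the 11-species system of the paper; the rank gives the number of coupled combinations that remain after decoupling.

```python
from fractions import Fraction


def pivot_columns(matrix):
    """Return the indices of columns forming a basis of the column space (exact arithmetic)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):
        pivot_row = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot_row is None:
            continue  # column c depends on the previous pivot columns
        rows[r], rows[pivot_row] = rows[pivot_row], rows[r]
        pivot_value = rows[r][c]
        rows[r] = [v / pivot_value for v in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
        if r == n_rows:
            break
    return pivots


# Hypothetical toy network; rows: S, G, F, G6P, F6P.
# Bead reactions (columns): S -> G + F, G -> G6P, G6P -> F6P.
S_b = [[-1, 0, 0],
       [1, -1, 0],
       [1, 0, 0],
       [0, 1, -1],
       [0, 0, 1]]
# Exit-boundary reaction (column): export of G6P.
S_e = [[0], [0], [0], [-1], [0]]

stacked = [row_b + row_e for row_b, row_e in zip(S_b, S_e)]
basis_columns = pivot_columns(stacked)
rank = len(basis_columns)
```

In this toy network the rank is 4 for 5 species, so one linear combination of the concentrations decouples from the reactions.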

The numerical scheme has been implemented using the software package M++ that is based on the data structure in Wieners (2005). The algorithm uses MPI parallelization and can therefore use an arbitrary number of processors for computation. This is particularly useful for problems involving many species.

#### **2.3. Optimization Methods**

Besides setting up a mathematical model for the microreactor to describe the temporal and spatial distribution of enzymes and reactants, our goal was also to determine optimal parameters for enhanced performance of the microreactor based on this model. We demonstrate here how to do so systematically using derivative-based non-linear optimization techniques. As an example, we focus on how to determine the real parameters *λ* that model the stoichiometry of the enzymes loaded on the beads and enter the non-linear terms in equation (1). We then discuss extensions of such methods to other parameters, such as the discrete number of beads *n<sub>b</sub>*, the combinatorial problem of which specific enzymes should be used for the metabolic pathway, and continuous geometry parameters such as the bead diameters or locations, the total activity of the transporter proteins, the shape and size of the chamber itself, etc.

The criterion for the performance of the reactor will be the total outflow of a desired product, say *y*<sub>*j*∗</sub>(*λ*) on Γ<sub>*e*</sub>, over a fixed production time horizon [0, *T*],

$$\begin{aligned} J(y) &= \int_{0}^{T} \int_{\Gamma_e} -D_{j_*} \nabla y_{j_*}(t, x) \cdot \nu(x) \, d\sigma(x) \, dt \\ &= \int_{0}^{T} \int_{\Gamma_e} -\theta^{e} R_{j_*}^{e}(y(t, x)) \, d\sigma(x) \, dt, \end{aligned} \tag{4}$$

where we have used equation (1c) for *j* = *j*<sub>∗</sub> in equation (4). For our sucrolytic chamber, we have *j*<sub>∗</sub> = *j*<sub>G6P</sub>. By optimal parameters, we mean that any feasible choice close to the optimal one does not lead to a higher total outflow predicted by the model.
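Once the outflow rate over Γ<sub>*e*</sub> has been evaluated at discrete times, the time integral in equation (4) reduces to a one-dimensional quadrature. The sketch below uses the trapezoidal rule, which is an assumption made here for illustration (the quadrature actually used is not stated); the sample data are hypothetical.

```python
def total_outflow(times, outflow_rates):
    """Approximate J as the time integral of the outflow rate (trapezoidal rule)."""
    if len(times) != len(outflow_rates) or len(times) < 2:
        raise ValueError("need at least two samples of equal length")
    total = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        total += 0.5 * dt * (outflow_rates[k - 1] + outflow_rates[k])
    return total


# Hypothetical samples: an outflow rate rising linearly from 0 to 2 over one hour (in s).
sample_times = [0.0, 900.0, 1800.0, 2700.0, 3600.0]
sample_rates = [0.0, 0.5, 1.0, 1.5, 2.0]

J_example = total_outflow(sample_times, sample_rates)
```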

While derivative-based non-linear optimization techniques are conceptually well-known, we emphasize that, due to the large-scale, non-linear system of equations to be solved for each direct simulation of the model, the main challenge is an efficient computation of the derivative of the reduced cost function

$$
\hat{J}(\lambda) = J(y(\lambda)), \tag{5}
$$

where *y*(*λ*) denotes the numerical solution of the system [equations (1a)–(1f)] as a function of the parameter *λ*. We exploit that the dimension *n<sub>p</sub>* of the parameter space is typically small compared to the dimension obtained from discretization of the infinite-dimensional state space for *y* in the model, and therefore follow a sensitivity-based approach to compute the derivative [see, e.g., Hinze et al. (2009)]. Letting *e<sub>i</sub>*, *i* = 1, …, *n<sub>p</sub>*, be a basis of the parameter space, this means that we can compute the derivative *Ĵ′*(*λ*) as

$$\hat{J}'(\lambda) = \sum_{i=1}^{n_p} J_y(y(\lambda)) \, \delta_{e_i} y, \tag{6}$$

where the sensitivity *δ<sub>e<sub>i</sub></sub>y* = *y′*(*λ*)*e<sub>i</sub>* is given by the solution (*ỹ*<sub>1</sub>, …, *ỹ<sub>m</sub>*) of the following linearized problem

$$\begin{aligned} \partial_t \tilde{y}_j(t, x) - D_j \Delta \tilde{y}_j(t, x) = {} & \partial_y R^{\Omega}_j(y(t, x), \lambda)\, \tilde{y}_j - \partial_\lambda R^{\Omega}_j(y(t, x), \lambda)\, e_i \\ & \text{for } (t, x) \in (0, T) \times \Omega, \end{aligned} \tag{7a}$$

together with the boundary conditions

$$-D_j \nabla \tilde{y}_j(t, x) \cdot \nu(x) = -\partial_y R^{b}_j(y(t, x), \lambda)\, \tilde{y}_j + \partial_\lambda R^{b}_j(y(t, x), \lambda)\, e_i \quad \text{for } (t, x) \in (0, T) \times \Gamma_b, \tag{7b}$$

$$-D_j \nabla \tilde{y}_j(t, x) \cdot \nu(x) = -\theta^{e} (R^{e}_j)'(y(t, x))\, \tilde{y}_j \quad \text{for } (t, x) \in (0, T) \times \Gamma_e, \tag{7c}$$

$$-D_j \nabla \tilde{y}_j(t, x) \cdot \nu(x) = -\theta^{i} (R^{i}_j)'(y(t, x))\, \tilde{y}_j \quad \text{for } (t, x) \in (0, T) \times \Gamma_i, \tag{7d}$$

$$-D_j \nabla \tilde{y}_j(t, x) \cdot \nu(x) = 0 \quad \text{for } (t, x) \in (0, T) \times \Gamma_0, \tag{7e}$$

and the initial condition

$$
\tilde{y}_j(0, x) = 0 \qquad \text{for } x \in \Omega, \tag{7f}
$$

where *∂<sub>y</sub>* and *∂<sub>λ</sub>* denote partial derivatives and (·)*′* [e.g., (*R<sup>i</sup><sub>j</sub>*)*′*] denotes the total derivative of the corresponding kinetic function, and where *y* is the solution of equations (1a)–(1f). The solution of this linearized problem can be computed simultaneously with the direct simulation using the same discretization method as described in Section 2.2. Compared to the evaluation of the objective function (5), the additional computational effort to obtain the derivative consists of solving *n<sub>p</sub>* linear equation systems in each time step. Instead of using the linearized problem (7), we can also obtain the derivative from an adjoint problem; see again, e.g., Hinze et al. (2009). The latter approach is more efficient when *n<sub>p</sub>* is large.
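The idea of computing the sensitivity alongside the direct simulation can be sketched on a scalar toy ODE *y′* = *f*(*y*, *λ*). This is an illustrative analogue only: it uses the sign convention of that toy ODE rather than the exact form of problem (7), and the kinetics, constants, and function names are assumptions. Differentiating the implicit Euler equation with respect to *λ* gives one linear equation per parameter and time step.

```python
V_MAX, K_M = 1.0, 0.5  # hypothetical kinetic constants


def rate(y, lam):
    """Toy kinetics f(y, lam) = -lam * V_MAX * y / (K_M + y) with its partial derivatives."""
    f = -lam * V_MAX * y / (K_M + y)
    f_y = -lam * V_MAX * K_M / (K_M + y) ** 2
    f_lam = -V_MAX * y / (K_M + y)
    return f, f_y, f_lam


def simulate_with_sensitivity(lam, y0=1.0, t_end=2.0, n_steps=40):
    """Implicit Euler for y and for the sensitivity s = dy/dlam, advanced together."""
    dt = t_end / n_steps
    y, s = y0, 0.0
    for _ in range(n_steps):
        # Non-linear solve for the state (Newton's method).
        y_new = y
        for _ in range(50):
            f, f_y, _ = rate(y_new, lam)
            residual = y_new - y - dt * f
            if abs(residual) < 1e-14:
                break
            y_new -= residual / (1.0 - dt * f_y)
        # Linear solve for the sensitivity, using the Jacobian at the new state.
        _, f_y, f_lam = rate(y_new, lam)
        s = (s + dt * f_lam) / (1.0 - dt * f_y)
        y = y_new
    return y, s


y_final, sensitivity = simulate_with_sensitivity(0.7)
```

Because the sensitivity is the exact derivative of the time-discrete solution, it can be checked against a central finite difference of `simulate_with_sensitivity` in the parameter.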

With the derivative at hand, we can then solve the reduced optimization problem subject to further parameter constraints, for instance with primal-dual interior-point algorithms using a quasi-Newton approximation of the Hessian. These methods are known to have both good theoretical and practical behavior; see, e.g., Forsgren et al. (2002). Given some initial parameter *λ*<sup>(0)</sup>, they compute a sequence of parameters *λ*<sup>(*k*)</sup>, *k* = 1, 2, 3, …, converging to an optimal *λ* until, for some *k*\*, first-order conditions or stationarity in the objective function are satisfied within a tolerance tol<sub>*X*</sub> or tolfun, respectively. We set *λ*\* = *λ*<sup>(*k*\*)</sup>, *J*\* = *J*(*y*(*λ*\*)), and *J*<sup>0</sup> = *J*(*y*(*λ*<sup>(0)</sup>)).

In order to determine an optimal stoichiometry *λ* for our sucrolytic chamber, we have the three parameters *λ*<sub>inv</sub>, *λ*<sub>hk</sub>, and *λ*<sub>pgi</sub>, but we may eliminate, for example, *λ*<sub>pgi</sub> by the condition (2). This yields *n<sub>p</sub>* = 2, together with a linear inequality constraint *λ*<sub>inv</sub> + *λ*<sub>hk</sub> ≤ 1 and box constraints *λ*<sub>inv</sub>, *λ*<sub>hk</sub> ∈ [0, 1]. For our numerical results in Section 3.2 for the SE-scenario, we have used the interior-point algorithm implemented in the Optimization Toolbox in MATLAB (2014) with a BFGS quasi-Newton approximation of the Hessian, where equations (5) and (6) are computed using the methods described in Section 2.2. Again we use the software package M++ for our implementation. As initial parameter *λ*<sup>(0)</sup>, we use either the stoichiometry from equal activity *λ<sup>E</sup>* given by equation (3) or the uniform stoichiometry *λ*<sup>(0)</sup> = *λ<sup>U</sup>* = (1/3, 1/3, 1/3), depending on which has the larger value *J*<sup>0</sup>. Furthermore, we use tol<sub>*X*</sub> = 1.0e−06 and tolfun = 1.0e−04 for all our optimization runs.
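The reduced parametrization and the choice of the starting point can be sketched as follows. This is not the MATLAB interior-point setup used for the results; it only illustrates the elimination of *λ*<sub>pgi</sub>, the feasibility conditions, and the selection of *λ*<sup>(0)</sup>. The objective `toy_outflow` and the numerical value used for the equal-activity candidate are hypothetical stand-ins, since the true objective requires a full simulation.

```python
def expand(lam_inv, lam_hk):
    """Recover the full stoichiometry (inv, hk, pgi) from the two free parameters."""
    return (lam_inv, lam_hk, 1.0 - lam_inv - lam_hk)


def is_feasible(lam_inv, lam_hk):
    """Box constraints on both parameters and the linear constraint lam_inv + lam_hk <= 1."""
    return (0.0 <= lam_inv <= 1.0
            and 0.0 <= lam_hk <= 1.0
            and lam_inv + lam_hk <= 1.0)


def toy_outflow(lam_inv, lam_hk):
    """Hypothetical stand-in for the objective; NOT the PDE-based total outflow."""
    lam_pgi = 1.0 - lam_inv - lam_hk
    series_rate = 1.0 / (1.0 / (1000.0 * lam_inv) + 1.0 / (50.0 * lam_hk))
    return series_rate - 10.0 * lam_pgi


def choose_initial(objective, candidates):
    """Pick the feasible candidate with the largest objective value."""
    feasible = [c for c in candidates if is_feasible(*c)]
    return max(feasible, key=lambda c: objective(*c))


lam_uniform = (1.0 / 3.0, 1.0 / 3.0)
lam_equal_activity = (0.05, 0.85)  # hypothetical values, for illustration only

lam_start = choose_initial(toy_outflow, [lam_uniform, lam_equal_activity])
```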

The derivative computation, and hence the above method, can directly be adapted to other real parameters entering the kinetic functions such as, e.g., the permeability *θ*. Combinatorial problems from (additional) discrete parameters such as the number of beads can be solved using enumeration or, approximately, by derivative-based optimization techniques based on relaxation methods, even for time-dependent parameters as in Hante and Sager (2013). Continuously varying geometry parameters such as bead diameters or shape and size of the chamber Ω<sub>*c*</sub> can be handled similarly by shape derivatives as in Leugering et al. (2011). For the SE-scenario, we enumerate optimal *λ*\* for various combinations of number of beads *n<sub>b</sub>*, different choices of hexokinases in the sucrolytic pathway, and different time horizons *T*.

#### **3. RESULTS**

We apply our simulation and optimization methods to the microreactor in the SE-scenario for realistic parameters. We consider two-dimensional square reaction chambers Ω<sub>*c*</sub> = (0, *L*)<sup>2</sup> with *L* = 500 *µ*m and *L* = 3000 *µ*m, and different numbers of uniformly distributed beads, each with a diameter of 10 *µ*m. The outflow boundary is Γ<sub>*e*</sub> = {*L*} × (0, *L*). The rest of the chamber boundary is impermeable. The parameters of the mathematical model are given in **Table 3**. The final time *T* varies between 0.5 and 4 h. This is appropriate because the outflow membrane Γ<sub>*e*</sub> is in general unstable and bursts after several hours. We stress that, due to the need for small time step sizes and the ratio of bead to chamber size, larger time horizons and chamber sizes lead to a higher computational complexity.

For our numerical investigations, we consider two types of hexokinase, namely HsHK2 and ScHK2. Two stoichiometries play a role in our calculations: the uniform stoichiometry *λ<sup>U</sup>* = (1/3, 1/3, 1/3) and the equal activity stoichiometry *λ<sup>E</sup>* described in equation (3). More precisely, *λ<sup>E</sup>* is used for the numerical simulations, whereas either of the two is used as initial parameter in the optimization, depending on which leads to a larger value of the total glucose-6-phosphate outflow.

#### **3.1. Simulation Results**

We consider a square chamber Ω<sub>*c*</sub> with a side length of 3000 *µ*m, a fixed run time of *T* = 3 h, a fixed enzyme stoichiometry *λ* = *λ<sup>E</sup>*, and a number of beads *n<sub>b</sub>* ∈ {0<sup>2</sup>, 1<sup>2</sup>, 2<sup>2</sup>, …, 6<sup>2</sup>, 12<sup>2</sup>, 18<sup>2</sup>, …, 60<sup>2</sup>}, where *n<sub>b</sub>* = 0 refers to a scenario with enzymes distributed in the bulk solution.

The meshes used for spatial discretization range from 15,190 triangles and 22,941 edges for 1 bead to 93,184 triangles and 151,592 edges for 2916 beads. Detailed information about the meshes is available in the Supplementary Material. We used a time step size of 0.25 s. All calculations were repeated for *T* = 1.5 h on uniformly refined meshes with half the time step size. No significant differences between the results on the coarse and on the fine meshes were observed.

First results are concerned with the numerical computation of the spatial and temporal distribution of the metabolites. **Figure 4** shows the distribution of G6P for a scenario with 1296 beads and the hexokinase HsHK2 after 3 h for completely and partially loaded beads.

Next results are concerned with the computation of quantities, which are particularly relevant for our microreactor, namely the export of G6P at time *t ∈* [0,*T*], i.e., the outflow rate at time *t* given by

$$\int_{\Gamma_e} -D_{\rm G6P} \nabla y_{\rm G6P}(t, x) \cdot \nu(x) \, d\sigma(x), \tag{8}$$

and the production rate for the metabolites at the beads at time *t* given by

$$\int_{\Gamma_b} D_j \nabla y_j(t, x) \cdot \nu(x) \, d\sigma(x), \tag{9}$$

for *j* = G, *. . .* , G6P. In computing these quantities, we consider the following arrangements:


**loaded beads, where the total number of enzymes distributed to the beads corresponds to that on one completely loaded bead (right), after 3 h**.

**ScHK2 (left) and HsHK2 (right)**.

Both arrangements are simulated for the two hexokinases HsHK2 and ScHK2, and the beads are distributed periodically in the bulk, as, e.g., in **Figure 4**. For the arrangement (S.2), we also consider the case without beads, where all enzymes are distributed in the bulk. Then the reaction terms *R*<sup>Ω</sup><sub>*j*</sub> in equation (1a) are equal to *R<sup>b</sup><sub>j</sub>* from **Table 2**. In (S.1), the maximal velocity *v*<sup>*i*</sup><sub>max</sub> = *k*<sup>*i*</sup><sub>cat</sub>[*E*]<sub>0</sub> corresponding to enzyme *i* ∈ {inv, hk, pgi} is calculated with [*E*]<sub>0</sub> = [*E*]<sub>full</sub>, where [*E*]<sub>full</sub> is the concentration of binding sites on one bead. These values are denoted by *v*<sup>*i*,full</sup><sub>max</sub> and are given in **Table 3**. In (S.2), we fix the total concentration of enzyme to be [*E*]<sub>full</sub>. However, this concentration is now distributed over *n<sub>b</sub>* beads. Thus, the values of *v*<sup>*i*</sup><sub>max</sub>, *i* ∈ {inv, hk, pgi}, are now obtained by dividing the values used in (S.1) by the number of beads *n<sub>b</sub>*.
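The difference between the two arrangements amounts to a rescaling of the maximal velocities, which the following sketch makes explicit. The function name, the turnover numbers, and the binding-site concentration are hypothetical placeholders, not the values of **Table 3**.

```python
def v_max_per_bead(k_cat, e_full, n_beads, arrangement):
    """Maximal velocities on each bead for arrangement 'S.1' (fully loaded beads)
    or 'S.2' (the load of one full bead shared among n_beads beads)."""
    if n_beads < 1:
        raise ValueError("n_beads must be at least 1")
    if arrangement == "S.1":
        e0 = e_full
    elif arrangement == "S.2":
        e0 = e_full / n_beads
    else:
        raise ValueError("arrangement must be 'S.1' or 'S.2'")
    return {name: k * e0 for name, k in k_cat.items()}


# Hypothetical values for illustration only.
k_cat_example = {"inv": 1000.0, "hk": 50.0, "pgi": 500.0}
e_full_example = 2.0e-3

full = v_max_per_bead(k_cat_example, e_full_example, n_beads=36, arrangement="S.1")
shared = v_max_per_bead(k_cat_example, e_full_example, n_beads=36, arrangement="S.2")
```

In (S.1) the total activity in the chamber grows with the number of beads, whereas in (S.2) it stays equal to that of one fully loaded bead.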

The production and outflow rates for G6P are plotted in **Figure 5** for (S.1) and in **Figure 6** for (S.2). For fully loaded beads, the production rate increases with increasing number of beads and reaches saturation after approximately 1000 s in the case of HsHK2 and after approximately 10,800 s in the case of ScHK2 (**Figure 5**, upper pictures). Concerning the outflow rate, in the case of HsHK2 (**Figure 5**, bottom right) an upper bound is reached asymptotically, the value of which seems to be the same for all *n<sub>b</sub>* ≥ 576 (simulations with longer runtime suggest that the value is the same for *n<sub>b</sub>* ≥ 16).

In case of partially loaded beads corresponding to arrangement (S.2), the production rate of G6P at the beads up to a time of 3 h first decreases with increasing number of beads and then stays nearly constant (**Table 4**; **Figure 6**, upper figures). Note that for the arrangement (S.2) the outflow rates do not reach saturation, which implies that the activity of the transporters is not limiting the outflow for the chosen enzyme concentration [*E*]<sub>full</sub>. We also remark that for the chosen stoichiometry *λ<sup>E</sup>*, in case of arrangement (S.2), the production rate of G6P for HsHK2 is larger than for ScHK2 by a factor of 10<sup>3</sup> to 10<sup>4</sup>, i.e., in (S.2) the selected hexokinase has a greater influence on the conversion process than in (S.1).

**TABLE 4 | Production and outflow rates at** *T* **= 3 h for scenario (S.2) with the hexokinase HsHK2 and** *n<sub>b</sub>* **∈ {1<sup>2</sup>, 2<sup>2</sup>, 6<sup>2</sup>, 18<sup>2</sup>, 30<sup>2</sup>, 42<sup>2</sup>, 54<sup>2</sup>, 0}, and times until half of these rates are reached**.

In order to quantify how fast the production and outflow rates increase in time for various numbers of beads in the scenario (S.2) with the hexokinase HsHK2, we give in **Table 4** the values of these rates at *T* = 3 h and the times the reactor needs to reach half of these values.
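A half-time of this kind can be extracted from a sampled rate curve as sketched below. Linear interpolation between samples is an assumption made here for illustration; the sample data are hypothetical and do not reproduce **Table 4**.

```python
def half_time(times, rates):
    """First time at which the rate reaches half of its final value.

    Uses linear interpolation between consecutive samples; returns None
    if the half value is never reached from below.
    """
    target = 0.5 * rates[-1]
    if rates[0] >= target:
        return times[0]
    for k in range(1, len(times)):
        if rates[k - 1] < target <= rates[k]:
            frac = (target - rates[k - 1]) / (rates[k] - rates[k - 1])
            return times[k - 1] + frac * (times[k] - times[k - 1])
    return None


# Hypothetical rate curve sampled every 30 minutes up to T = 3 h (times in s).
sample_times = [0.0, 1800.0, 3600.0, 5400.0, 7200.0, 9000.0, 10800.0]
sample_rates = [0.0, 0.2, 0.5, 0.7, 0.8, 0.85, 0.9]

t_half = half_time(sample_times, sample_rates)
```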

#### **3.2. Optimization Results**

We aim at finding the optimal stoichiometry *λ*\* for the SE-scenario. Since our focus is on computing a variety of different scenarios with a reasonable computational effort, we have chosen the smaller chamber Ω<sub>*c*</sub> = (0, 500 *µ*m)<sup>2</sup> for these experiments. The meshes used for the spatial discretization of this chamber range from 820 triangles and 1270 edges for 1 bead to 2394 triangles and 3883 edges for 64 beads. Further information about the meshes is available in the Supplementary Material.

For our study, we consider the following arrangements:


We study these arrangements for the hexokinases ScHK2 and HsHK2.

For the enzyme ScHK2, we find in case of arrangement (O.1) that the optimal stoichiometry for a runtime of 1 h is between 3 and 12% for invertase, between 82 and 94% for hexokinase, and between 3 and 6% for phosphoglucose isomerase. With an increasing number of beads, the proportion of ScHK2 increases and the proportions of invertase and phosphoglucose isomerase decrease. The exact results are listed in **Table 5**, where for each number of beads *n<sub>b</sub>* the optimal stoichiometry *λ*\* as well as the corresponding total outflow *J*\* are displayed. From the arrangement (O.2), we find that the optimal stoichiometry for one bead is between 8 and 13% for invertase, between 80 and 87% for hexokinase, and between 5 and 6% for phosphoglucose isomerase, again with an increase in the proportion of binding sites loaded with ScHK2 at the expense of the proportion for ScINV, but with an almost constant proportion of binding sites loaded with ScPGI when the runtime is successively increased. The exact results are listed in **Table 6**, where for each run time *T* the optimal stoichiometry *λ*\* as well as the corresponding total outflow *J*\* are displayed.

**TABLE 5 | Optimal parameters** *λ***\* for the SE-scenario with ScHK2 (top) and HsHK2 (bottom), for a fixed runtime** *T* **= 3600 s, and different numbers** *n<sub>b</sub>* **of beads**.

**TABLE 6 | Optimal parameters** *λ***\* for the SE-scenario with ScHK2 (top) and HsHK2 (bottom), for one bead and different runtimes** *T* **on a mesh with** *n<sub>h</sub>* **= 820**.

In the presence of the hexokinase HsHK2, we find from the first arrangement that the optimal stoichiometry for a runtime of 1 h is between 5 and 9% for invertase, between 91 and 95% for hexokinase, and approximately 0% for the phosphoglucose isomerase. With an increasing number of beads, the proportion of hexokinase increases and the proportion of invertase decreases. The exact results are listed in **Table 5**. From the arrangement (O.2), we find that the optimal stoichiometry for one bead is between 7 and 10% for invertase and between 90 and 93% for the hexokinase, again with an increase in the proportion of binding sites loaded with HsHK2 at the expense of the proportion for ScINV. As in arrangement (O.1), the optimal proportion of phosphoglucose isomerase is approximately 0%. The exact results are again listed in **Table 6**. We stress that the optimization results in **Table 5** correspond to local maxima.

#### **4. DISCUSSION**

In this paper, we developed mathematical techniques for the simulation and optimization of metabolic processes in biological microreactors with membrane-bounded subcompartments. The results from our models will be incorporated into the design of a microreactor and will be validated by experiments with this microreactor. The microreactor is based on a microfluidic system, and consists of chambers separated by membranes which carry specific transporters for input of substrates and export of products. Inside the chambers, magnetic nano-beads carrying multienzyme-complexes are distributed, and immobilized by an outer magnetic field.

We have set up a mathematical model describing the spatiotemporal dynamics of the metabolite concentrations involved in the assembled metabolic pathway, and have developed efficient and accurate approaches for the numerical simulation of this model. Furthermore, we have shown that model-based optimization for such systems is feasible with the methods presented. These approaches are applied to the proof-of-concept microreactor carrying out the synthetic pathway for the conversion of sucrose to glucose-6-phosphate. The obtained results shed light on the following important questions linked to the design and functionality of microreactors.

## **4.1. Are Spatially Resolved Models Necessary in the Description of Biochemical Microreactors?**

The simulation of the distribution of metabolites, e.g., that of G6P in the presence of the enzyme HsHK2 (**Figure 4**), shows spatial patterns, which differ for different experimental arrangements. In case of completely loaded beads (**Figure 4**, left), the highest concentration of G6P can be found around the beads. If the beads are only partially loaded (**Figure 4**, right), a concentration gradient across the chamber prevails. In this case, the total number of enzymes distributed to the beads corresponds to the number of enzymes on one completely loaded bead. These spatial patterns in the metabolite distributions confirm the need for spatially resolved models in the description of metabolic processes carried out in synthetic microreactors and, ultimately, in living cells, which are more complex in architecture. Furthermore, the limiting effect of diffusion also shows that spatial effects have to be taken into account. This is visible in arrangement (S.2), where in the initial phase of the simulation the production rate of G6P at the beads decreases with increasing number of beads (**Figure 6**, upper figures); however, the outflow rate of G6P is higher for a large number of beads. This results from the fact that for a high number of beads there are more beads close to the outflow boundary Γ<sub>*e*</sub>, and consequently G6P needs less time to diffuse to the boundary Γ<sub>*e*</sub>. We mention, however, that for longer runtimes the higher production rate of G6P for lower numbers of beads predominates, and the outflow rates of G6P are higher for fewer beads (**Figure 6**, lower pictures).

## **4.2. Has Spatial Organization and Microcompartmentation of Enzymes the Potential to Influence the Efficiency of Metabolic Pathways?**

Comparing the outflow rate of the product G6P for a fixed total enzyme quantity distributed over different numbers of beads, it turns out that the production rate can be increased by decreasing the number of beads (**Table 4**; **Figure 6**, lower figures); in particular, assembling all enzymes on one bead leads to the maximal outflow of G6P. This remains true when the above scenarios are compared with the scenario in which enzymes are distributed in the bulk fluid within the reactor chamber. This last scenario turns out to lead to the lowest G6P outflow. For the scenario with one bead and the hexokinase HsHK2, the outflow of G6P is approximately 60% higher than the total outflow for the scenario with enzymes distributed within the bulk fluid with the same hexokinase (**Table 4**). These results suggest that microcompartmentation of enzymes increases the total output of G6P, not only in the model but also *in vitro*.

## **4.3. What Are the Limitations for the Productivity of the Modeled (Synthetic) Membrane-Bounded Microreactors?**

The simulation results for the arrangement (S.1) (with completely loaded beads) show that the outflow rates for G6P reach saturation, in spite of the fact that the activity inside the chamber increases with increasing number of beads. This can directly be seen for the hexokinase HsHK2 (**Figure 5**, bottom right), whereas for ScHK2 this effect does not occur up to the run time of 3 h (**Figure 5**, bottom left). However, our computations show that at later times this limiting effect also occurs for this hexokinase. This suggests that the limiting factor of the G6P outflow is the activity of the transporters in the membrane Γ<sub>*e*</sub>. Finally, we mention that for the arrangement (S.2) the outflow rates do not reach saturation, which implies that the activity of the transporters is not limiting the outflow for the chosen enzyme concentration [*E*]<sub>full</sub>.

## **4.4. Can Optimal Design Help to Improve the Yield of a Biosynthetic Pathway?**

Our optimization results for the microreactor in the SE-scenario with the hexokinase ScHK2 show that after a run time *T* = 3600 s, the total outflow of G6P, denoted by *J*\*, for the optimal stoichiometry *λ*\* is on average twice as large as the total outflow *J*<sup>0</sup> for the uniform stoichiometry *λ<sup>U</sup>*, and even much larger than for the stoichiometry *λ<sup>E</sup>*, e.g., 2746% in case of one bead. For one bead, this improvement is also visualized in **Figure 7** (left). When the run time is successively increased (for a reactor with one bead) starting from *T* = 1800 s, the optimal stoichiometries again enhance the total outflow of the desired product G6P on average twofold compared to the uniform stoichiometry and even much more compared to the stoichiometry *λ<sup>E</sup>*. The comparison of the G6P outflow over time for stoichiometries computed for different runtimes shows that a larger proportion of binding sites loaded with ScINV leads to a higher G6P outflow in the initial phase, while a larger proportion of binding sites loaded with ScHK2 leads to a higher G6P outflow in the long run (**Figure 8**).

Optimization of the microreactor setup with the hexokinase HsHK2 suggests that the phosphoglucose isomerase can be eliminated for the run times considered in the optimization scenarios. Furthermore, in contrast to the results for ScHK2, the degree of improvement of the optimal stoichiometries is between 50% (for one bead) and 1% (for 64 beads) compared to the stoichiometry computed from equal activity and slightly higher compared to the uniform stoichiometry. For one bead, the improvement is also visualized in **Figure 7** (right). When the run time is successively increased (for a reactor with one bead) from *T* = 1800 to 14,400 s, the optimal stoichiometries now enhance the total outflow of G6P. This enhancement is 20% up to 78% compared to the stoichiometry computed from equal activity and slightly more compared to the uniform stoichiometry, e.g., 147% for *T* = 1800 s. The observation obtained for ScHK2 concerning the role of the invertase and hexokinase in the initial phase and in the long run is now even more pronounced (**Figure 8**).

**FIGURE 7 | G6P outflow for the SE-scenario with** *n<sub>b</sub>* **= 1 compared for the uniform (***λ<sup>U</sup>***), equal activity (***λ<sup>E</sup>***) and optimal (***λ***\*) stoichiometry for** *T* **= 7200 s and ScHK2 (left) and HsHK2 (right) computed with ∆***t* **= 0.25 s and** *n<sub>h</sub>* **= 820**.

To understand the optimization result obtained for the proportion of phosphoglucose isomerase (the enzyme catalyzing the reaction G6P ⇌ F6P) in case of the microreactor with HsHK2, namely that this enzyme should preferably be eliminated, we simulated the production rates for all metabolites for one completely loaded bead (**Figure 9**). We see that, with respect to our considered time interval, the production rates for G and F increase very fast to a maximum level of around 0.3 mol/s. The reaction rates in **Table 2** imply that the conversion of S into F is constant, since the concentration of S is constant. Furthermore, **Figure 9** shows that the production rate of F6P increases in time, but the production rate of F remains nearly constant. Hence, we conclude that there is a metabolic flux from G6P to F6P, which counteracts the goal of a maximal outflow of G6P and should be avoided by choosing a small activity of phosphoglucose isomerase.

We stress that none of the reported maxima can be guaranteed to be global optima. Nevertheless, concerning the quality of the maxima, we note that all search paths from *λ*<sup>(0)</sup> to *λ*\* traverse a large part of the parameter space and are associated with a significant decrease in the costs. Moreover, we have tested various initial points *λ*<sup>(0)</sup> for the scenario with one bead, HsHK2, and *T* = 3600. In all cases, we have observed convergence to the same *λ*\*. We conclude that the reported local maxima are reasonable candidates for an optimized stoichiometry for the respective microreactor.

Finally, we emphasize that the simulation and optimization approaches developed in this work can be repeated with minor adaptations for more complex biochemical microreactors and for other optimization parameters, such as other real parameters entering the reaction kinetics.

#### **AUTHOR CONTRIBUTIONS**

MN-R and LV gave substantial contributions to the Abstract and Introduction. MG and MN-R gave substantial contributions to

#### **REFERENCES**


the conception of the Derivation of the Mathematical Model, see Section 2.1. TE and PK gave substantial contributions to the conception of the Numerical Analysis and Simulations, see Section 2.2 and 3.1. FH and GL gave substantial contributions to the conception of the Optimization Methods and Results, see Section 2.3 and 3.2. LV gave substantial contributions to the acquisition, analysis, and interpretation of data for the work. All authors contributed essentially in the discussion of the results, see Section 4. Also, all authors have revisited the work critically for important intellectual content, approved the version to be published, and agreed to be accountable for all aspects of the work.

#### **ACKNOWLEDGMENTS**

The work of the first and the second author was supported by the Emerging Fields Initiative (EFI) for Synthetic Biology at the Friedrich-Alexander-Universität Erlangen-Nürnberg.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2016.00013



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Elbinger, Gahn, Neuss-Radu, Hante, Voll, Leugering and Knabner. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Underpinning starch biology with** *in vitro* **studies on carbohydrate-active enzymes and biosynthetic glycomaterials**

*Ellis C. O'Neill <sup>1</sup> and Robert A. Field<sup>2</sup> \**

*<sup>1</sup> Department of Plant Sciences, University of Oxford, Oxford, UK, <sup>2</sup> Department of Biological Chemistry, John Innes Centre, Norwich Research Park, Norwich, UK*

Starch makes up more than half of the calories in the human diet and is also a valuable bulk commodity that is used across the food, brewing and distilling, medicines and renewable materials sectors. Despite its importance, our understanding of how plants make starch, and what controls the deposition of this insoluble, polymeric, liquid crystalline material, remains rather limited. Advances are hampered by the challenges inherent in analyzing enzymes that operate across the solid–liquid interface. Glyconanotechnology, in the form of glucan-coated sensor chips and metal nanoparticles, presents novel opportunities to address this problem. Herein, we review recent developments aimed at the bottom-up generation and self-assembly of starch-like materials, in order to better understand which enzymes are required for starch granule biogenesis and metabolism.

#### *Edited by:*

*Lars Matthias Voll, Friedrich-Alexander-University Erlangen-Nuremberg, Germany*

#### *Reviewed by:*

*Weiwen Zhang, Tianjin University, China; M. Kalim Akhtar, University of Edinburgh, UK*

#### *\*Correspondence:*

*Robert A. Field, Department of Biological Chemistry, John Innes Centre, Norwich Research Park, Norwich NR4 7UH, UK rob.field@jic.ac.uk*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 03 July 2015; Accepted: 24 August 2015; Published: 07 September 2015*

#### *Citation:*

*O'Neill EC and Field RA (2015) Underpinning starch biology with in vitro studies on carbohydrate-active enzymes and biosynthetic glycomaterials. Front. Bioeng. Biotechnol. 3:136. doi: 10.3389/fbioe.2015.00136*

**Keywords: starch, glucans, phosphorylase, self-assembly, nanotechnology, synthetic biology**

## **Introduction**

Starch makes the highest calorific contribution to the human diet (U.S. Department of Agriculture, 2012) and is a bulk commodity, with a global market of billions of tons per year (Ellis et al., 1998). We cannot meet the ever-increasing demand for starch-based products solely by further commitment of agricultural land to conventional starch-based crops – as well as increases in crop yield, new ways to produce modified starch-based materials are also needed (Diouf, 2009). In order to achieve this goal, the production of modified starches *in planta* is an attractive alternative to post-harvest modification (Jobling, 2004) – i.e., moving the diversification of starch functionality from the chemical plant into the crop plant. To this end, a better understanding of the enzymes involved in starch metabolism and those used in industrial starch modification is needed.

## **Starch Structure and Metabolism**

Starch granules comprise linear α-1,4-glucans with periodic α-1,6-branches, giving long chains capable of wrapping around each other to form double helical arrangements, which stack side by side, forming alternating layers of highly ordered liquid crystalline lamellae interspersed with amorphous regions (**Figure 1**) (Waigh et al., 1998). This self-organizing nanostructure makes the surface of a starch granule highly resistant to enzymatic attack, requiring specialized enzymes to initiate degradation.

In order to engineer precisely defined starch structures, we need to understand the metabolic enzymes involved, insights into which are provided by continued research in the model plant *Arabidopsis thaliana* (Santelia and Zeeman, 2011). Multiple enzymes are required to create the correct morphology of the starch granule, and the specific role of individual enzyme isoforms is not well understood (Ball and Morell, 2003; Tetlow, 2011). For example, the correct *synthesis* of granular starch counterintuitively requires two *degradative* isoamylases (Bustos et al., 2004). Storage starch is formed in the amyloplasts of specialist storage organs, such as the grain or tuber, while transitory starch is stored in the leaf chloroplasts, as a carbon source for photosynthetic cells at night. The two plastids use different suites of enzymes to attack the starch granule (Smith et al., 2005; Smirnova et al., 2015). In the storage organs, the granule is enzymatically hydrolyzed (Radchuk et al., 2009), ultimately releasing glucose; in photosynthetic organs, the transitory starch granule is initially phosphorylated by specific glucan kinases, before being broken down into short maltooligosaccharides (Fettke et al., 2009). There are several isoforms of each class of enzyme involved in starch synthesis and breakdown, but differences in their action are generally interpreted in terms of the expression of the gene and protein in question, rather than of the enzyme activity *per se*. The complication of enzymatic reactions taking place on an insoluble starch substrate accounts for these crude approximations, a situation that needs to be addressed in order to inform plant engineering/synthetic biology studies.

## **Current Status on the Enzymatic Degradation of Insoluble Starch**

The enzymology of the starch granule actually takes place at a solid–liquid interface, rather than solely in the solution phase used in most experiments. This can have profound effects on the reaction; for example, the reaction rate of granule-bound starch synthase is enhanced on crystalline amylopectin (Edwards et al., 1999). There are few enzyme studies involving crystalline maltodextrins (Hejazi et al., 2009) or purified starch granules (Edner et al., 2007), but even these studies measured discrete product formation, not breakdown of the starch granule. Electron microscopy clearly shows that the initial attack on the surface alters the granule by disrupting the insoluble packing of the glucan chains (Sun and Henson, 1990), undermining endpoint kinetic assessments. Direct monitoring of purified starch granules during degradation, either using electron microscopy (Planchot et al., 1995) or synchrotron radiation (Tawil et al., 2011), provides snapshots of these processes during the degradation of the granule. Clearly such experiments provide information on structures, but not on reaction rate as such.

## *In Vivo* **Modification of Starch**

The source of starch can have a marked impact on physicochemical properties that are important for industrial uses. For instance, cereal starch is virtually free of phosphate, while potato starch has high phosphate content, with increased viscosity and decreased crystallization, factors important in the paper industry (Blennow et al., 2003). For commercial applications, starches may need to be chemically modified – by fragmentation, oxidation, or esterification – although this is costly and potentially hazardous. Production of fit-for-purpose modified starches *in planta* is therefore an attractive alternative (Jobling, 2004) and has been achieved in a number of ways. For instance, the gelatinization properties of the starch can be controlled by using genetic engineering to alter the ratio of linear amylose to branched amylopectin (Visser et al., 1991; Jobling et al., 2002). Removal of the glucan water dikinase, which adds phosphate to starch chains during degradation, from potato tubers produced a starch that was resistant to degradation during storage (Lorberth et al., 1998). By replacing the plant disproportionating enzyme, which transfers glucans between chains during starch degradation, with its bacterial homolog, the complex soluble heteroglycan could be essentially bypassed during transitory starch degradation (Ruzanski et al., 2013). It has also been possible to engineer a vaccine-displaying starch by fusing starch-binding proteins to known antigens in algal chloroplasts (Dauvillée et al., 2010). However, it is not easy to predict outcomes at the whole plant level due to poor knowledge of individual enzyme functions (Stanley et al., 2011), coupled with compensatory mechanisms and genetic redundancy or partially overlapping enzymatic capabilities.

## *In Vitro* **Synthesis and Modification of Starches**

A number of approaches have been investigated to produce natural and non-natural starch-like materials *in vitro.* To simulate the branch point of a starch granule, specific glucans could be linked synthetically using "click chemistry," producing known length chains with known positioning of the branch point (Marmuse et al., 2005; Nepogodiev et al., 2007). Longer unbranched amylose polymers can be synthesized enzymatically in solution, either by the debranching of amylopectin with isoamylase (Harada et al., 1972) or by the extension of acceptor glucans with amylosucrase or glucan phosphorylase (Yanase et al., 2007). Amylosucrase has been used to synthesize dendritic nanoparticles, based on a glycogen core (Putaux et al., 2006); in combination with branching enzyme, a highly branched glucan can be accessed from sucrose (Grimaud et al., 2013). Amylomaltases, which transfer glucans from one chain onto another, can also be used to generate fluorinated glucans (Tantanarat et al., 2012) and to modify starches in order to make thermoreversible gels (Kaper et al., 2005). Interestingly, phosphorylases also display some promiscuity toward the donor substrate and they have been used to prepare a range of modified glucans (O'Neill and Field, 2015). 2-Deoxy-maltooligosaccharides (Klein et al., 1982) have been synthesized from D-glucal in the presence of Pi, which can then be phosphorolyzed to synthesize 2-deoxy-α-glucose-1-phosphate. Deoxy- and fluoro-glucose moieties can also be transferred onto glycogen by glucan phosphorylase, but in low yield (Withers, 1990).
Alternative sugar-1-phosphates can be utilized by glucan phosphorylase, including those derived from xylose (Nawaji et al., 2008b), mannose (Evers and Thiem, 1997), glucosamine (Nawaji et al., 2008a), *N*-formyl-glucosamine (Kawazoe et al., 2010), and glucuronic acid (Umegatani et al., 2012), although the products were all isolated after a single residue extension, indicating that these sugars cannot be bound in the acceptor site of the enzyme for further extension. Under certain conditions, and with careful choice of phosphorylase, further extension has been achieved to make a polyglucosamine, analogous to the α-stereoisomer of chitosan (Kadokawa et al., 2015). By addition of both glucosamine and glucuronic acid onto a glycogen core, pH-responsive amphoteric hydrogels could be synthesized (Takata et al., 2015).

Plant phosphorylases, with their lack of allosteric regulation (Fukui et al., 1982), have proven more useful than the mammalian equivalent in the synthesis of long chain amylose derivatives, for instance, by extension of glucan immobilized on chitosan (Kaneko et al., 2007) or on polystyrene (Loos and Müller, 2002), or by twining polysaccharides around a hydrophobic core to assemble a macromolecular complex, such as amylose-wrapped lipid (Gelders et al., 2005).

#### **Engineering Insoluble Starch Surfaces for Enzymatic Analysis**

When amylose is synthesized in solution, it self-assembles into crystalline structures (Buleon et al., 2007). Using electron microscopy, this type of crystalline material can be seen after extension of glycogen particles using amylosucrase (Putaux et al., 2006) and X-ray diffraction shows that it has assembled into head-to-tail, B-type starch (Potocki-Veronese et al., 2005). Many physiologically important enzymatic reactions occur on surfaces and their analysis is far from trivial. A range of techniques have been deployed to address this point, including mass spectrometry, radioactivity, and fluorescence-based assays (Gray et al., 2013). Quartz crystal microbalance technology has also been used to monitor real-time extension of amylopectin by phosphorylase (Murakawa et al., 2007) and to provide associated kinetic information (Nishino et al., 2004). The recent development of plant oligosaccharide microarrays also enables high throughput analysis of carbohydrate-active enzymes (Pedersen et al., 2012).

While it has been possible to detect glycosylation reactions on surfaces using surface plasmon resonance (SPR), this technique has typically been deployed to assess non-catalytic lectin binding to sugars (Karamanska et al., 2008). Direct measurement of *trans*-glycosylation reactions on SPR sensors has been achieved (Cle et al., 2008, 2010), but the enzymes to which this technique has been applied use soluble substrates and act in solution. Any kinetic analysis therefore has to take into account the unnatural nature of the surface-immobilized substrates. Starch-active enzymes would benefit markedly from studies using immobilized substrates, which mimic the insoluble surface upon which they naturally act.

The plant phosphorylase, PHS2, can rapidly synthesize α-1,4 glucans in solution and, by immobilizing its acceptor oligosaccharide substrates, insoluble glucan surfaces could be developed (O'Neill et al., 2014). The SPR surfaces behaved in a manner dependent on the density of the immobilized glucan. When relatively dilute, the glucan surface behaved in the same way as in solution; when a denser surface was used, the material produced took on the pattern of enzyme resistance seen in natural granular starch. These results indicate that the amyloglucan polymer created by PHS2 on the high density surface formed a macromolecular architecture, which may be crystalline, rendering the surface resistant to enzymatic digestion. These surfaces could be used to assay proteins that interact with the starch granule to ascertain binding kinetics in classical SPR experiments. Starch-degrading enzymes are normally assayed in solution, which is not a good analog of their natural insoluble granule substrate. Classic enzyme kinetics cannot be utilized to study reactions on an insoluble surface, and care will be needed to determine whether the binding of soluble enzymes to the insoluble surface or the reaction itself is the rate-limiting step. This new surface technology offers the prospect of a more informative assay to provide kinetic information on starch surface degradation.

While a quantifiable 2D system is relevant for kinetic analysis of starch-active enzymes, a third dimension is required to represent the true spatial arrangement of the starch granule. The uniform shape and size and defined physical and chemical properties of gold nanoparticles make them useful materials for the study of biological interactions (Saha et al., 2012). Gold nanoparticles can be simultaneously used to display carbohydrates and to provide visual output of interactions (Marin et al., 2015), using the same photophysical effects exploited in conventional SPR studies, to give a change in color from red to purple. Glucan primers immobilized on gold nanoparticles could be extended substantially by PHS2 (O'Neill et al., 2014). The resulting nanoparticle-based glucan surfaces were subject to aging, with the glucan layer appearing to reorganize and assemble into a much more tightly packed structure over the course of ~12 h (**Figure 2**). This type of glyconanoparticle has potential for the analysis of enzymes that naturally act on starch granules, and may serve as a model for bottom-up, *in vitro* synthetic biological approaches to biocompatible inorganic surfaces.

imaged with TEM after PHS2-mediated extension with two staining methods (PATAg and UA). As illustrated in the cartoon, the carbohydrate (blue) on the nanoparticles (red) increases after treatment with PHS2 and rearranges to form a thinner layer after 24 h. The schematic shows that the starting glucan (black) is extended by AtPHS2 (blue) and then rearranges by forming intra- and inter-chain interactions to produce a more condensed overall structure. Before extension the gold nanoparticles (a) stained weakly with iodine (b), but after PHS2-mediated extension (c) they stained strongly (d), indicating the formation of ordered starch helices. Adapted from O'Neill et al. (2014).

#### **Conclusion**

Reflecting the global need for sustainability and a drive toward environmentally benign manufacturing practices, new ways to produce bulk commodities, such as starch, are key. In order to achieve this goal, better information about how nature handles (bio)chemistry across the liquid–solid interface is necessary. Here, we have highlighted the potential of bottom-up synthetic biology approaches to produce starch-like surfaces in a format suitable for both structural analysis and real-time kinetic assessment of enzyme action thereon. These glyconanotechnology methods will help to pave the way to a more complete understanding of natural starch metabolism and may be used to inform understanding and exploitation of the natural processes, including the recapitulation of starch production in non-starch producing organisms.

## **Acknowledgments**

The authors gratefully acknowledge support from the UK BBSRC Institute Strategic Programme Grant on Understanding and Exploiting Metabolism (MET) [BB/J004561/1] and the John Innes Foundation.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 O'Neill and Field. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Compartmentalization and Transport in Synthetic Vesicles

*Christine Schmitt<sup>1†</sup>, Anna H. Lippert<sup>2†‡</sup>, Navid Bonakdar<sup>2</sup>, Vahid Sandoghdar<sup>2</sup> and Lars M. Voll<sup>1</sup> \**

*<sup>1</sup> Division of Biochemistry, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany, <sup>2</sup> Max-Planck-Institute for the Science of Light, Erlangen, Germany*

Nanoscale vesicles have become a popular tool in life sciences. Besides liposomes that are generated from phospholipids of natural origin, polymersomes fabricated from synthetic block copolymers enjoy increasing popularity, as they represent more versatile membrane building blocks that can be selected based on their specific physicochemical properties, such as permeability, stability, or chemical reactivity. In this review, we focus on the application of simple and nested artificial vesicles in synthetic biology. First, we provide an introduction into the utilization of multicompartmented vesosomes as compartmentalized nanoscale bioreactors. In the bottom-up development of protocells from vesicular nanoreactors, the specific exchange of pathway intermediates across compartment boundaries represents a bottleneck for future studies. To date, most compartmented bioreactors rely on unspecific exchange of substrates and products. This is based either on changes in permeability of the coblock polymer shell by physicochemical triggers or on the incorporation of unspecific porin proteins into the vesicle membrane. Since the incorporation of membrane transport proteins into simple and nested artificial vesicles offers the potential for specific exchange of substances between subcompartments, it opens new vistas in the design of protocells. Therefore, we devote the main part of the review to summarizing the technical advances in the use of phospholipids and block copolymers for the reconstitution of membrane proteins.

Keywords: liposomes, vesosomes, block copolymers, reconstitution techniques, porins, metabolite transporters, membrane transport, compartmentalized bioreactors

#### *Edited by:*

*Shota Atsumi, University of California Davis, USA*

#### *Reviewed by:*

*Sebastien Lecommandoux, University of Bordeaux, France; Oscar Ces, Imperial College London, UK*

#### *\*Correspondence:*

*Lars M. Voll lars.voll@fau.de*

*† Christine Schmitt and Anna H. Lippert contributed equally.*

#### *‡Present address:*

*Anna H. Lippert, Department of Chemistry, University of Cambridge, Cambridge, UK*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 16 November 2015; Accepted: 11 February 2016; Published: 29 February 2016*

#### *Citation:*

*Schmitt C, Lippert AH, Bonakdar N, Sandoghdar V and Voll LM (2016) Compartmentalization and Transport in Synthetic Vesicles. Front. Bioeng. Biotechnol. 4:19. doi: 10.3389/fbioe.2016.00019*

**Abbreviations:** DOPC, 1,2-dioleoyl-*sn*-glycero-3-phosphocholine; DPPC, 1,2-dipalmitoyl-*sn*-glycero-3-phosphocholine; PB-*b*-PEO, polybutadiene-*b*-poly(ethylene oxide); PEG-*b*-PSBA, poly(ethylene glycol)-*b*-poly(styrene boronic acid); PEO–PPO–PEO, poly(ethylene oxide)–poly(propylene oxide)–poly(ethylene oxide); PEtOz-PDMS-PEtOz, poly(2-ethyl-2-oxazoline)-*b*-poly(dimethylsiloxane)-*b*-poly(2-ethyl-2-oxazoline); PMOXA–PDMS–PMOXA, poly(2-methyloxazoline)–poly(dimethylsiloxane)–poly(2-methyl-oxazoline); PS-*b*-PDMAEMA, polystyrene-*b*-poly(*N*,*N*-dimethylaminoethyl methacrylate); PS-*b*–PIAT, polystyrene–poly(l-isocyanoalanine(2-thiophen-3-yl-ethyl)amide), aka polystyrene-*b*-poly(3-(isocyano-l-alanyl-amino-ethyl)-thiophene); PVFc-*b*-P2VP, polyvinylferrocene-*b*-poly(2-vinylpyridine).

## INTRODUCTION

Compartmentalization is a key feature of eukaryotic cells to spatially separate distinct biochemical processes from each other. Lipid bilayer membranes serve as impermeable barriers that effectively separate subcellular compartments. This (i) enables the simultaneous operation of metabolic pathways that utilize the same intermediates and (ii) allows for the adjustment of specific reaction conditions inside individual organelles. In order to translate this natural principle of biological organization, the use of membranes to encapsulate chemical reactions has attracted interest in the design of synthetic systems. Lipids or block copolymers are commonly used to build membranes in synthetic systems, in so-called liposomes or polymersomes, respectively.

Liposomes consist of a shell of amphiphilic lipid species, such as phospholipids, that encapsulate an aqueous solution. The lipids are arranged in a bilayer with the polar head groups of the two leaflets facing toward the inside and the outside aqueous phase and the hydrophobic tails of the phospholipids facing toward each other (**Figure 1B**). Based on the number of membrane layers, vesicles are called unilamellar or multilamellar. For biotechnological applications, the use of unilamellar vesicles is desirable, and these vesicle species can be defined according to their size, ranging from small unilamellar vesicles (SUVs having a diameter between 25 and 100 nm) to large unilamellar vesicles (LUVs with a diameter between 100 nm and 1 μm) and giant unilamellar vesicles (GUVs being larger than 1 μm up to 100 μm in diameter). Polymersomes consisting of amphiphilic block copolymers are of rising interest for compartmentalization in synthetic systems due to their increased mechanical stability and low membrane permeability compared to liposomes. The ability to (i) encapsulate specific cargo, (ii) trigger cargo release by external stimuli, and (iii) incorporate particular membrane transport proteins are distinct features of artificial vesicles that depend on their liposome or polymersome composition.
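The size classes given above can be written down as a small lookup, shown here as a minimal sketch. The function name is hypothetical, and the assignment of the shared boundary values (exactly 100 nm is treated as an LUV, exactly 1 μm as an LUV) is an assumption, since the text does not specify how the boundaries are handled.

```python
def classify_unilamellar_vesicle(diameter_nm: float) -> str:
    """Map a vesicle diameter in nanometres to the size class named in the text.

    Boundary handling is an assumption: 100 nm and 1000 nm are both
    assigned to the LUV class.
    """
    if diameter_nm < 25:
        return "below SUV range"
    if diameter_nm < 100:
        return "SUV"   # small unilamellar vesicle, 25 to 100 nm
    if diameter_nm <= 1_000:
        return "LUV"   # large unilamellar vesicle, 100 nm to 1 um
    if diameter_nm <= 100_000:
        return "GUV"   # giant unilamellar vesicle, above 1 um up to 100 um
    return "above GUV range"


if __name__ == "__main__":
    for d in (50, 400, 20_000):
        print(d, "nm ->", classify_unilamellar_vesicle(d))
```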

#### THE POTENTIAL OF LIPOSOMES, POLYMERSOMES, AND VESOSOMES – AN OVERVIEW

Both liposomes and polymersomes have become popular as vectors for targeted and tailored drug delivery and for the application in biochemical microreactors. Due to the amphiphilic nature of the lipid or polymer building blocks, a spontaneous assembly into vesicles occurs in aqueous environments (Discher and Eisenberg, 2002). The phase transition temperature represents an important parameter for the choice of specific lipid or polymer building blocks for drug delivery purposes. At the phase transition temperature, lipids and polymers are transformed from a liquid crystalline phase to a gel phase, which leads to maximal bilayer permeability (Van Hoogevest et al., 1984), and hence, to the release of cargo from the lumen of the vesicles. Depending on the choice of lipid or block copolymer, the discharge of cargo from artificial vesicles can also be achieved by shifts in pH or redox potential, which we will discuss later in this article. However, the use of artificial vesicles for drug delivery is beyond the scope of this review, and we would like to refer the interested reader to recent reviews focusing on this topic (Ohya et al., 2011; Lee and Feijen, 2012; Khan et al., 2015; Thambi et al., 2016).

For vesicle formation, a variety of lipids with different properties is available (Marsh, 2012), which can either be used separately or as mixtures. Polymersomes made of amphiphilic block copolymers came into focus because of their potential for functionalization and increased mechanical stability compared to liposomes (Bermudez et al., 2002). Polyethylene glycol (PEG) and polystyrene (PS)-based block copolymers are widely used to produce polymersomes for all kinds of applications. In addition, polypeptide-based polymersomes have become increasingly popular for biomedical applications, which is not only due to their biodegradability and high tissue compatibility but also based on their ability to change aggregation state and permeability in response to environmental stimuli [as recently reviewed by Zhao et al. (2014)]. Membrane thickness of copolymer-derived polymersomes predominantly depends on the length of the hydrophobic block (Smart et al., 2008). However, not only the chain length of the individual hydrophilic and hydrophobic blocks in diblock and triblock copolymers but also the length ratio of the hydrophilic and hydrophobic segments were found to represent an important parameter for membrane permeability and stiffness (Rodríguez-García et al., 2011). Copolymers that combine a low molecular weight with high hydrophobicity were found to preferably arrange into GUVs (Rodríguez-García et al., 2011). For a more in-depth view on the use of polymersomes as vesicle scaffolds in biotechnology, please see recent reviews on the topic (e.g., Lee and Feijen, 2012; Zhao et al., 2014).

Besides simple, single-compartment vesicles, the formation of multicompartmentalized vesicular systems has been engineered in recent years to allow the encapsulation of distinct cargos in different vesicular compartments. Various approaches were investigated for this purpose, such as the encapsulation of smaller vesicles into larger vesicles, so-called vesosomes (Walker et al., 1997; Bolinger et al., 2008; Marguet et al., 2012; Paleos et al., 2012). Vesosomes represent nested vesicles that harbor multiple compartments of different sizes encapsulated in each other without a direct connection between the individual compartment boundaries (**Figures 1A,B,D**).

#### VESICLES AND VESOSOMES AS COMPARTMENTALIZED NANOREACTORS

Compartmentation of enzymatic reactions represents the basic biochemical principle of all living systems. The compartmentation of cellular metabolism has many advantages: the cell can be protected against toxic intermediates formed in one compartment, cellular subcompartments offer optimal reaction conditions for subsets of enzymes, compartmentation can avoid competition for substrates by different metabolic pathways, and it permits the differential regulation of isoenzymes within distinct compartments. Therefore, it seems reasonable to attempt to design multicompartmentalized synthetic systems that are confined by lipid or polymer bilayers, which provide optimal reaction conditions for different enzyme species. Pioneering work in the field was published in 2000, when Meier and coworkers reported an enzymatic reaction inside a PMOXA–PDMS–PMOXA triblock copolymer vesicle. Passive diffusion of the substrate ampicillin into the polymersome was mediated by the reconstituted bacterial porin OmpF; the encapsulated enzyme β-lactamase subsequently hydrolyzed the substrate before the product ampicillinoic acid diffused out again through OmpF (Nardin et al., 2000, 2001).

To date, vesosomes have successfully been applied to generate compartmentalized biochemical systems, e.g., in the cofactor-dependent enzymatic formation of the fluorescent dye resorufin from a profluorescent substrate (Peters et al., 2014). To this end, the two different enzyme species *Candida antarctica* lipase B (CalB) and alcohol dehydrogenase (ADH) were encapsulated separately into intrinsically porous sub-micrometer-sized PS-*b*–PIAT polymersome subcompartments, which, in turn, were engulfed by PB-*b*-PEO GUVs (**Figure 1A**). In such a vesosome assembly, the lumen of the PB-*b*-PEO vesicle resembles an artificial cytosol, while the PS-*b*–PIAT vesicle lumen corresponds to artificial organelles (**Figure 1A**). In the initial step, the substrate was converted by NADPH-dependent phenylacetone monooxygenase (PAMO) into an ester in the artificial cytosol of the vesosomes, before the ester intermediate diffused into the CalB-containing subcompartments, where it was subsequently hydrolyzed to a primary alcohol. After diffusion of the alcohol product out of the first subcompartment into the ADH-containing subcompartments, the alcohol was oxidized in a NAD<sup>+</sup>-dependent reaction into an aldehyde, which then produced the fluorescent dye resorufin by spontaneous beta-elimination (**Figure 1A**). Likewise, other cascade reactions have also been established in polymersomes using glucose oxidase (GOx), horseradish peroxidase (HRP), and CalB (Vriezema et al., 2007; Kuiper et al., 2008).

FIGURE 1 | Multicompartmented artificial vesicles as bioreactors. (A) Vesosomes consisting of inner and outer coblock polymersomes with different physicochemical properties [adapted from (Peters et al., 2014)]. (B) Immobilization of biotinylated lipid vesosomes *via* the interaction with neutravidin in nanofluidic reactors [adapted from (Bolinger et al., 2008)]. The release of liposome contents is triggered by specific, consecutive temperature shifts. (C) Continuous-flow polymersome reactor with immobilized polymersomes in hydrogel (De Hoog et al., 2010). (D) Vesosomes using the porin OmpF as shuttle system [adapted from (Siti et al., 2014)]. (E) Multicompartment liposomes generated by the phase transfer technique [adapted from (Elani et al., 2014)]. For explanations, please refer to the manuscript text.

Even complex cellular processes, such as the synthesis of ATP, could be achieved by the coupled activity of bacteriorhodopsin and F0F1-ATP synthase in synthetic vesicles. An H<sup>+</sup> gradient was built up by bacteriorhodopsin in a light-dependent manner and this H<sup>+</sup> gradient was subsequently utilized by ATP synthase to convert ADP and Pi to ATP. These two membrane-associated proteins have successfully been reconstituted into amphiphilic triblock copolymer PEtOz-PDMS-PEtOz polymersomes leading to ATP synthesis (Choi and Montemagno, 2005). This example nicely demonstrates the potential of synthetic systems to mimic complex cellular functions.

The potential to immobilize vesicular systems can also be exploited for the application as nanoreactors in nanofluidic devices, since the provision of substrate to and the harvest of product from immobilized vesicular compartments is much easier compared to open reaction systems. To this end, vesosomes have been immobilized on neutravidin-coated glass surfaces in nanoreactor systems through the integration of biotin–PEG–lipids into the outer vesosome bilayer [**Figure 1B**; Bolinger et al. (2008)]. In these systems, the inner SUVs consisted of lipids with different phase transition temperatures compared to the outer SUVs. The inner SUVs were loaded with the profluorescent dyes dichlorodimethylacridinone phosphate or fluorescein diphosphate, while the outer compartment was loaded with alkaline phosphatase (AP). The sequential, temperature-triggered release of the substrates from the encapsulated SUVs drove the conversion of the substrates by AP in the outer compartment in two distinct, consecutive steps (**Figure 1B**). The produced fluorescent products dichlorodimethylacridinone and fluorescein, respectively, were still trapped inside the outer lipid bilayer.

Alternatively, artificial vesicles have been immobilized in alginate capsules or hydrogels (De Hoog et al., 2010; Ullrich et al., 2015). A "continuous-flow polymersome reactor" was constructed by the immobilization of CalB and GOx loaded polymersomes in a hydrogel (**Figure 1C**) (De Hoog et al., 2010). The substrate was added on top of the reactor in this setup, while the product was collected at the bottom (**Figure 1C**). Since enzyme leakage from immobilized polymersomes was more than four times lower compared to free enzyme, the total enzyme activity required for nanoreactors can be decreased, once the proteins are encapsulated into polymersomes (De Hoog et al., 2010). These two examples illustrate the advantages of vesicular functional units in nanoreactor assemblies.

#### CONTROLLED RELEASE OF CARGO FROM LIPOSOMES, POLYMERSOMES, AND VESOSOMES BY PHYSICOCHEMICAL TRIGGERS

An important feature in the construction of vesicle-based nanoreactors is the design of the vesicle shell by the choice of lipids, polymers, or a mixture of both. Based on the chosen lipid or block copolymer, a specific release of cargo by external stimuli can be achieved following a physical or chemical trigger that alters membrane permeability. Before we focus on membrane transport proteins for the specific exchange of solutes between vesicle compartments, we would like to briefly summarize the advances in the use of block copolymers that permit a triggered release of solutes.

The release of ABTS<sup>2−</sup> by repeated thermal stimuli and the subsequent conversion to ABTS<sup>1−</sup> by laccase inside the alginate capsules was observed by Ullrich et al. (2015). The heat stimulus was applied either by heating above the phase transition temperature of DPPC in a water bath or by subjecting encapsulated superparamagnetic iron oxide nanoparticles to radiofrequency to cause heat emission.

In addition, stimuli- and cargo-selective content release was achieved in dual stimuli-responsive polymersomes with two kinds of cargo (Staff et al., 2014). On the one hand, the polymer vesicles consisted of the redox- and pH-responsive polymer polyvinylferrocene-*b*-poly(2-vinylpyridine) (PVFc-*b*-P2VP) or the pH- and temperature-responsive polymer polystyrene-*b*-poly(*N*,*N*-dimethylaminoethyl methacrylate) (PS-*b*-PDMAEMA). On the other hand, the cargos dimethyldodecylamine (DDA) and diphenyl disulfide (DPDS) were selectively switchable from the water-insoluble to the soluble form by pH change or H<sub>2</sub>O<sub>2</sub> redox trigger, respectively. The substances were stored in the water-insoluble form inside the polymersomes and were then specifically released by an external stimulus that allowed the passage of DPDS (by oxidation) or DDA (by pH change) across the membrane.

Finally, vesosomes composed of lipids with different phase transition temperatures were used to trigger the successive mixing of the contents inside the vesosome by a suite of specific temperature changes (Bolinger et al., 2004, 2008). Likewise, the incorporation of stimuli-responsive polymers into polymersomes, such as the sugar- and pH-responsive PEG-*b*-PSBA block copolymer or pH-responsive non-ionic amphiphilic triblock copolymers such as PEO-PPO-PEO, can lead to partial permeabilization of these vesicles. Pore-like structures are formed upon applying an appropriate external stimulus, but the vesicles are not disrupted (Binder, 2008; Kim et al., 2009).

## THE USE OF TRANSPORT PROTEINS FOR EXCHANGE OF SUBSTRATE BETWEEN VESICULAR SUBCOMPARTMENTS

While the release of cargo by triggered permeabilization of the bilayer is accompanied by at least partial or temporal loss of compartmentalization, a more controlled discharge of cargo from vesicles can be achieved by the integration of membrane proteins into the vesicle membrane. This can either be achieved by the reconstitution of unspecific diffusion pores, such as porins, or by the integration of substrate-specific transporters. This strategy avoids the increase in bilayer permeability by external stimuli but instead enables specific substrate flow across compartment boundaries. A frequently used protein for this purpose is the *Escherichia coli* porin OmpF (outer membrane protein F), which is a trimeric integral membrane protein that enhances the passive diffusion of small hydrophilic molecules (Cowan et al., 1995).

The integration of OmpF in a lipid bilayer encapsulating β-lactamase inside the vesicle led to hydrolysis of the externally added substrate ampicillin and yielded the product ampicillinoic acid (Graff et al., 2001). The product was first detected inside the vesicle before its accumulation in the medium occurred, which was also facilitated by OmpF. Since the diffusion across the membrane bilayer represented the bottleneck of the reaction, a higher substrate concentration was necessary to achieve comparable activity of encapsulated enzymes compared to free enzymes. Recently, OmpF was also reconstituted into an ABA triblock copolymer bilayer that served as the inner compartment of vesosomes that encapsulated horseradish peroxidase (**Figure 1D**; Siti et al., 2014). The semi-permeable outer compartment was built of PS-PIAT diblock copolymers and contained GOx as well as the inner ABA polymersomes. After glucose and Amplex Red were added to the outside solution, they diffused into the outer compartment, where glucose was oxidized by GOx to produce H<sub>2</sub>O<sub>2</sub> (**Figure 1D**). Hydrogen peroxide then diffused into the inner compartment, where it subsequently oxidized Amplex Red in the presence of horseradish peroxidase to yield the fluorescent end-product resorufin (**Figure 1D**). Most importantly, this study by Siti et al. (2014) showed an increased reaction rate in the presence of OmpF, which enhanced diffusion into the inner reaction compartment.
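The observation that encapsulated enzymes need a higher external substrate concentration than free enzymes can be illustrated with a toy steady-state calculation. The sketch below is not the analysis used in the cited studies: it assumes Michaelis-Menten kinetics inside the vesicle and a single lumped first-order permeation constant (`k_perm`, a hypothetical parameter standing in for pore-mediated diffusion), and all numbers are arbitrary placeholders in consistent units.

```python
import math


def free_rate(s_out, vmax, km):
    """Michaelis-Menten rate of the free enzyme at substrate level s_out."""
    return vmax * s_out / (km + s_out)


def internal_substrate(s_out, vmax, km, k_perm):
    """Steady-state substrate concentration inside the vesicle.

    Assumed balance (illustrative model only):
        k_perm * (s_out - s_in) = vmax * s_in / (km + s_in)
    which rearranges to a*s_in**2 + b*s_in - c = 0 with the terms below.
    """
    if s_out < 0 or vmax < 0 or km <= 0 or k_perm <= 0:
        raise ValueError("need s_out >= 0, vmax >= 0, km > 0, k_perm > 0")
    a = k_perm
    b = k_perm * km + vmax - k_perm * s_out
    c = k_perm * s_out * km
    root = math.sqrt(b * b + 4.0 * a * c)
    # Two algebraically equivalent forms of the positive root; choose the
    # one that avoids subtracting nearly equal numbers.
    if b > 0:
        return 2.0 * c / (b + root)
    return (root - b) / (2.0 * a)


def encapsulated_rate(s_out, vmax, km, k_perm):
    """Turnover rate of the encapsulated enzyme at external level s_out."""
    s_in = internal_substrate(s_out, vmax, km, k_perm)
    return free_rate(s_in, vmax, km)


if __name__ == "__main__":
    # Placeholder parameters, chosen only to make the trend visible.
    vmax, km = 1.0, 1.0
    for k_perm in (0.1, 1.0, 10.0):
        for s_out in (1.0, 10.0):
            ratio = encapsulated_rate(s_out, vmax, km, k_perm) / free_rate(s_out, vmax, km)
            print(f"k_perm={k_perm:5.1f}  s_out={s_out:5.1f}  encapsulated/free={ratio:.3f}")
```

In this simplified picture, raising `k_perm` (for example, by adding more pores) or raising the external substrate concentration both move the encapsulated rate toward the free-enzyme rate, which is qualitatively in line with the rate increase reported in the presence of OmpF.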

The constituents of the heptameric protein α-hemolysin can self-assemble in membrane bilayers to form pores, which makes it an attractive target for the use in artificial membranes. Elani et al. (2013, 2014) incorporated α-hemolysin into DOPC bilayers of multicompartment vesicle networks by phase transfer of water-in-oil droplets (**Figures 1E** and **2E**). To prove the functionality of the pore protein in a two-compartment system, the Ca<sup>2+</sup>-sensitive dye Fluo-4 was encapsulated in one compartment and Ca<sup>2+</sup> in the other compartment. Only those vesicular systems with α-hemolysin in the internal bilayer showed an increase in fluorescence, while no fluorescence was detectable in vesicle systems that did not contain α-hemolysin (Elani et al., 2013). A spatially segregated reaction setup was established in a three-compartment system with α-hemolysin pores connecting the first compartment with the second and with the third compartment as well as with the surroundings (**Figure 1E**; Elani et al., 2014). Each reaction step was performed in a single compartment: in the first compartment, lactose was hydrolyzed to glucose and galactose by lactase, and glucose was then oxidized to gluconolactone in the second compartment *via* GOx, thereby producing hydrogen peroxide (**Figure 1E**). The diffusion of glucose from the first into the second compartment was conferred by α-hemolysin, while lipid bilayers are permeable to hydrogen peroxide to allow for the diffusion of hydrogen peroxide from the second into the third compartment (**Figure 1E**). Finally, hydrogen peroxide initiated the oxidation of Amplex Red by horseradish peroxidase in the third compartment to yield the fluorescent product resorufin. No increase in fluorescence was detectable in vesicular systems without α-hemolysin (Elani et al., 2014).

## SPECIFIC TRANSPORT PROTEINS AS TOOLS IN ARTIFICIAL VESICLE SYSTEMS

The sections above deal with the unspecific release of cargo and substrates from vesicles *via* physicochemical triggers or by unspecific pores such as OmpF or α-hemolysin. To enable specific transport of cargo and substrates, appropriate transport proteins can be reconstituted into liposome or polymersome membranes. The incorporation of specific transport proteins into artificial membranes of liposomes, polymersomes, or vesosomes allows much more tightly controlled vesicle-based bioreactors to be conceptualized. Therefore, we devote the rest of this review to recent advances in the reconstitution of membrane proteins.

#### RECONSTITUTION OF TRANSMEMBRANE PROTEINS IN LIPID SYSTEMS

The studies on reconstitution of transmembrane proteins into a membrane environment mainly focus on the work with liposomes (Kahya et al., 2001; Montes et al., 2007; Kaneda et al., 2009; Aimon et al., 2011; Dezi et al., 2013; Hansen et al., 2013; Liu et al., 2013) but have also been successfully applied to polymersomes (Meier et al., 2000; Choi and Montemagno, 2005; Nallani et al., 2011; Martino et al., 2012).

There are multiple ways to reconstitute transmembrane proteins in artificial membrane systems. The basic problem behind transmembrane protein reconstitution is the nature of these proteins. Since transport proteins are anchored in the hydrophobic core of the cell membrane, they have a hydrophobic nature. This hydrophobicity complicates their extraction from, as well as their insertion into, membrane systems. The cell itself circumvents this problem by employing several strategies (Wickner and Lodish, 1985; Gutensohn et al., 2006). One of these is the so-called Sec pathway found in bacteria and eukaryotes (Economou, 1999; Rapoport et al., 2004). Hydrophobic protein parts are co-translationally recognized by a signal recognition particle, which leads to a translocation of the protein translation machinery to the endoplasmic reticulum (ER) and the co-translational insertion of the protein into the cell membrane *via* the Sec apparatus. Another strategy involves the post-translational insertion of membrane proteins. After protein translation in the cytoplasm of eukaryotic cells, the protein is delivered, unfolded, inserted, and refolded in the target membrane *via*, for instance, the TIC TOC (Gutensohn et al., 2006; Andrès et al., 2010) complex in the chloroplast envelope or the TOM TIM (Bauer et al., 2000) complex in mitochondria. In addition, certain transmembrane proteins, predominantly cytochromes, exhibit spontaneous insertion into the membrane (Wickner and Lodish, 1985). It is debated whether helical hairpin motifs can mediate this spontaneous insertion of proteins into membranes (Engelman and Steitz, 1981). All in all, there is a huge variety and complexity of mechanisms the cell uses to insert membrane proteins into the target membrane, and the membrane proteins involved are heterogeneous as well. Similarly, several methods are available for the functional reconstitution of membrane proteins, which need to be tested for individual proteins of interest.

#### DETERGENT-MEDIATED RECONSTITUTION

Since the solubilization of membrane proteins, i.e., their removal from their natural environment, is usually performed in the presence of detergent, detergent-mediated reconstitution is one of the most common strategies for protein reconstitution (**Figure 2A**). Here, liposomes are formed *via* extrusion or sonication (Torchilin and Weissig, 2003), and after a presolubilization step, the solubilized proteins are subsequently added to the liposome preparation, which eventually leads to the incorporation of the proteins into the membranes.

After the pioneering work of Kagawa and Racker (1971), detergent-mediated reconstitution has been successfully applied to various transmembrane proteins in vesicles up to sizes larger than 1 μm (Rigaud et al., 1988; Steinberg-Yfrach et al., 1998; Seddon et al., 2004; Dezi et al., 2013). The successfully reconstituted proteins include

FIGURE 2 | Overview of different transmembrane protein reconstitution methods. (A) Detergent-mediated protein reconstitution uses detergent molecules to trigger reconstitution (upper row) and fusion events (lower row). After successful reconstitution, the detergent molecules are removed to form stable protein-containing vesicles. Illustration adapted from Dezi et al. (2013). (B) Protein reconstitution *via* spontaneous swelling of vesicles from a protein-containing agarose film. The solubilized proteins are added to the agarose gels and incorporate spontaneously upon lipid addition and swelling. (C) An internalized cell-free extract supplied with specific DNA of the transmembrane protein leads to reconstitution of the protein in the vesicle membrane. Illustration adapted from Kaneda et al. (2009). (D) A proteoliposome is formed with the use of antibody or protein ligand-coated beads and the subsequent addition of lipids. Illustration adapted from Frank et al. (2015). (E) Spontaneous protein insertion in a double-emulsion setup. A lipid monolayer forms on the border between a lipid-containing oil phase and an aqueous phase. Upon addition of an aqueous droplet containing the solubilized protein, a micelle forms around the droplet, enclosing the solution. When the micelle passes through the lipid monolayer into the aqueous phase, a vesicle forms. The solubilized protein spontaneously inserts. Illustration adapted from Yanagisawa et al. (2011).

porins, such as FhuA (Dezi et al., 2013), transporters such as bacteriorhodopsin (Steinberg-Yfrach et al., 1998), the Glucose-6-P/P antiporter (Kammerer et al., 1998), H<sup>+</sup>-ATPase (Steinberg-Yfrach et al., 1998), Ca<sup>2+</sup>-ATPase (Steinberg-Yfrach et al., 1998), CFOF1-ATPase (Steinberg-Yfrach et al., 1998), and channels, such as the voltage-gated potassium channel (Ruta et al., 2003), BmrC/BmrD, a bacterial heterodimeric ATP-binding cassette efflux transporter (Dezi et al., 2013), as well as receptor proteins such as GPCRs (Ishihara et al., 2005).

With the advantage of very low external influences on the membrane protein, the detergent-mediated reconstitution is a suitable method for many proteins. The reconstitution of the membrane proteins is either achieved through direct incorporation of solubilized membrane proteins or, in order to form larger proteoliposomes, through detergent-mediated fusion of vesicles. In both cases, the vesicles are often pre-solubilized by detergent prior to the addition of proteins. The detergent-to-lipid ratio needs to be meticulously adjusted to achieve a fusogenic liposome state that ranges between detergent saturation and solubilization of the vesicle. For detergents, the relationship between the critical micelle concentration and the detergent-to-lipid ratio is given by (Ollivon et al., 2000; Rigaud and Lévy, 2003):

$$D_{\text{total}} = D_{\text{water}} + R_{\text{eff}} \cdot L,$$

where *D*<sub>total</sub> represents the total detergent concentration, *D*<sub>water</sub> provides the monomeric detergent concentration in water, i.e., the cmc determined in the presence of lipids, while *L* represents the lipid concentration and *R*<sub>eff</sub> the effective detergent-to-lipid ratio in the bilayer that is specific for the solubilization state. *D*<sub>water</sub> and *R*<sub>eff</sub> are constants that need to be determined for the detergent of choice (Ollivon et al., 2000; Rigaud and Lévy, 2003).
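As a worked illustration of this relationship, the following minimal sketch computes the total detergent concentration needed to reach a given bilayer state, and the window between the onset of saturation and complete solubilization. The function names, the labels `r_sat` and `r_sol` for the two boundary values of *R*eff, and all numerical values are assumptions introduced here for illustration only; real constants have to be determined experimentally for the detergent and lipid of choice.

```python
def total_detergent(lipid_mM, d_water_mM, r_eff):
    """Total detergent (mM) from D_total = D_water + R_eff * L."""
    if lipid_mM < 0 or d_water_mM < 0 or r_eff < 0:
        raise ValueError("concentrations and ratios must be non-negative")
    return d_water_mM + r_eff * lipid_mM


def detergent_window(lipid_mM, d_water_mM, r_sat, r_sol):
    """Return (saturation onset, full solubilization) as total detergent in mM.

    r_sat and r_sol are hypothetical names for the R_eff values at the
    saturation and solubilization boundaries described in the text.
    """
    if r_sol < r_sat:
        raise ValueError("r_sol must not be smaller than r_sat")
    return (
        total_detergent(lipid_mM, d_water_mM, r_sat),
        total_detergent(lipid_mM, d_water_mM, r_sol),
    )


if __name__ == "__main__":
    # Placeholder constants, NOT measured values for any real detergent.
    low, high = detergent_window(lipid_mM=5.0, d_water_mM=0.2, r_sat=0.5, r_sol=2.0)
    print(f"Vesicles are detergent-saturated from about {low:.2f} mM total detergent")
    print(f"and fully solubilized at about {high:.2f} mM total detergent")
```

Because the relation is linear in *L*, doubling the lipid concentration shifts both boundaries by *R*eff times the added lipid, which is why the detergent amount has to be re-adjusted whenever the lipid concentration of a preparation changes.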

With increasing detergent concentration, the amount of incorporated detergent molecules in the vesicles increases, leading to the fragile state of saturation (Ollivon et al., 2000; Rigaud and Lévy, 2003). At this stage, further addition of detergent leads to vesicle shrinkage until full solubilization occurs (Ollivon et al., 2000; Rigaud and Lévy, 2003). The incorporation rates are dependent on the protein and detergent used (Rigaud and Lévy, 2003), and the state of solubilization also represents an important factor for the reconstitution rate and the orientation of the reconstituted protein (Rigaud et al., 1995; Rigaud and Lévy, 2003). It was shown that some proteins incorporate better in a state of low detergent incorporation, while others need complete saturation for an efficient reconstitution (Rigaud et al., 1995). The reconstitution efficiency thereby also depends on the type of detergent and protein used (Rigaud et al., 1995; Dezi et al., 2013). The method has to be tuned for the protein of interest to achieve the best fusion rate; therefore, various protocols are available to achieve reconstitution, but no specific protocol is applicable to all membrane proteins (Ollivon et al., 2000; Rigaud and Lévy, 2003).

After protein reconstitution, the detergent molecules need to be removed for stable vesicles to form (Ollivon et al., 2000; Rigaud and Lévy, 2003). The removal method and its efficiency are thereby dependent on the type of detergent (Rigaud et al., 1995; Rigaud and Lévy, 2003). Detergents with a high cmc, such as CHAPS, CHAPSO, cholate, and octyl glucoside, generally form small micelles, which makes them easy to remove *via* dialysis or gel filtration (Rigaud et al., 1995; Rigaud and Lévy, 2003). Detergents with a lower cmc, which form larger micelles, are barely removable by gel filtration or dialysis. Here, the removal can be done using detergent-adsorbent beads. These detergents include Triton X-100 (Rigaud et al., 1995; Rigaud and Lévy, 2003). Since there are no general protocols available for protein solubilization or reconstitution, these steps are still regarded as bottleneck processes (Rigaud et al., 1995; Rigaud and Lévy, 2003; Yanagisawa et al., 2011).

The detergent-mediated reconstitution can also be combined with liposome fabrication methods, such as the double-emulsion approach (Pautot et al., 2003; Yanagisawa et al., 2011). To this end, the solubilized potassium channel protein KcsA was added to the vesicle interior as well as to the external aqueous phase (Yanagisawa et al., 2011). In both cases, protein reconstitution was detected and the protein was functional (Yanagisawa et al., 2011). During this study, it was observed that not every lipid composition favored the insertion of the target protein. In the case of KcsA, there was no reconstitution detectable when PC lipids were used (Yanagisawa et al., 2011). It was also observed that the reconstitution of KcsA was oriented, with the directionality depending on the outside or inside configuration of the protein and the size of the intracellular and extracellular domains (Yanagisawa et al., 2011). KcsA and α-hemolysin were successfully reconstituted from the outer aqueous phase as well as from the inner aqueous solution (Takiguchi et al., 2011; Yanagisawa et al., 2011).

For the reconstitution of KcsA, the detergent DDM was used and due to its high dilution, the detergent concentration was far below the cmc. Nevertheless, detergent molecules can remain in the membranes after successful reconstitution. Since remaining detergent might alter lipid composition and consequently protein behavior, this approach might be unsuitable for certain studies.

## DIRECT INCORPORATION OF MEMBRANE PROTEINS INTO VESICLES IN CELL-FREE SYSTEMS

Another approach to reconstitute transmembrane proteins is to encapsulate the protein translation machinery in vesicles (**Figure 2C**). The use of cell-free extract to reconstitute transmembrane proteins in liposomes or polymersomes has been successfully reported by various groups (Kalmbach et al., 2007; Murtas et al., 2007; Goren and Fox, 2008; Liguori et al., 2008a,b; Kaneda et al., 2009; Katzen et al., 2009; Kuruma et al., 2009; Maeda et al., 2012; Martino et al., 2012; Liu et al., 2013).

The addition of cDNA coding for the protein of interest allows specific protein synthesis, and it has been shown that membrane proteins can be functionally synthesized and incorporated in the surrounding compartment membrane. Synthesized membrane proteins include pores (Shimizu et al., 2001), channels (Liguori et al., 2008a,b), transporters (Liguori et al., 2008a,b), and receptors (Junge et al., 2011) from eukaryotic and prokaryotic origin (Zanders, 2005). The reconstituted membrane proteins cover sizes from 15 kDa, such as the mechanosensitive heptamer protein channel MscL (Madin et al., 2000), up to the 114-kDa transporter MdtB (Zanders, 2005; Liguori et al., 2008a). The synthesis of presecretory and integral membrane proteins requiring SecA-dependent translocation, for example, proteins with large periplasmic regions, such as FtsQ, or presecretory proteins, such as OmpA or MtlA (Kim et al., 2006), was also reported. An overview of the successful use of cell-free systems is given in Zanders (2005).

There are various cell-free extracts available, but the most commonly used extracts are the whole-wheat germ extract, the *E. coli* extract, and the PURESYSTEM. The PURESYSTEM was developed by Kuruma et al. (2008) and includes the chaperone-free *E. coli* translation machinery assembled from purified recombinant components. This defined environment might be beneficial for the functional characterization of proteins, with protein yields of around 6 μg/ml at relatively high cost (Kuruma et al., 2008). The wheat germ extract shows stable protein expression for weeks (Zanders, 2005) but is considerably more labor intensive in preparation (Berrier et al., 2004; Sawasaki et al., 2004) than the *E. coli* extract. The *E. coli* extract can be prepared in approximately 1 day (Swartz, 2006; Hovijitra et al., 2009) and shows similar efficiencies as the wheat germ extract with protein yields of around 1–6 mg/ml (Ishihara et al., 2005; Kuruma et al., 2008). While reactions with the *E. coli* and the wheat germ extract take longer, between 6 and 24 h for the latter, the reaction time of 2 h is considerably shorter for the PURESYSTEM (Kuruma et al., 2008).

All methods have been used to synthesize and reconstitute troublesome proteins exceeding 100 kDa in size as well as membrane proteins (PURESYSTEM: Kim et al., 2006, wheat germ: Schwarz et al., 2010, and *E. coli*: Madin et al., 2000). Posttranslational modifications (Kaiser et al., 2008), such as phosphorylation, prenylation, and glycosylation, as well as the formation of disulfide bonds can be achieved by adding the corresponding enzymes (Kalmbach et al., 2007).

The general workflow to achieve protein synthesis in cell-free systems can be summarized as follows: (i) identify the best expression vector compatible with the cell-free expression system (Kuruma et al., 2008). (ii) If immunological detection or fusion with reporter proteins is desired, N-terminal tags have been shown to generate a higher yield. (iii) After a small-scale optimization step to identify the best expression conditions in expression tests and (iv) a scale-up step, (v) the proteoliposomes can be purified. The estimated time scale from vector design to protein synthesis is around 15 days to 1 month.

An interesting feature of the cell-free systems is the compatibility of the expression systems with some detergents (Madin et al., 2000; Goren and Fox, 2008). A wide range of non-ionic or zwitterionic detergents, Triton X-100, Tween 20, Brij 58p, *n*-dodecyl β-d-maltoside, and CHAPS, were compatible with cell-free synthesis, allowing the expression of proteins in the presence of detergents, while *n*-octyl β-d-glucoside and deoxycholate had an inhibitory effect on protein yield (Madin et al., 2000).

The incorporation of cell-free extract expression systems into vesicles can be achieved using various liposome formation techniques, including natural swelling (Nomura et al., 2003; Kaneda et al., 2009), double-emulsion (Noireaux and Libchaber, 2004; Maeda et al., 2012; Liu et al., 2013), and microfluidic (Martino et al., 2012) approaches.

A drawback of this reconstitution method is the introduction of the complete translation machinery into the lumen of vesicles, which might introduce unwanted complexity to the synthetic system. In addition, post-translational modifications that might be necessary for the formation of a fully functional protein may not occur, unless the responsible enzymes, if known, are added to the cell-free extract.

## REHYDRATION OF PROTEIN-CONTAINING AGAROSE

For their reconstitution, solubilized membrane proteins can be dissolved in warm, molten agarose gels (**Figure 2B**) (Hansen et al., 2013, 2015; Gutierrez and Malmstadt, 2014). In this technique, precipitation of the detergents is prevented at a dilution below the cmc. The gel is spread on a coverslip and partially dehydrated. Since agarose retains a high water content (Horger et al., 2009), it is hypothesized that the proteins do not denature (Hansen et al., 2013). Subsequently, lipid droplets are deposited on top of the gel. Under a stream of nitrogen, the solvent of the droplets is evaporated and the gel can be rehydrated using a protein compatible buffer (Hansen et al., 2013). Upon rehydration of the lipids, protein-containing liposomes form from the surface of the protein-containing agarose. The proteins successfully used in this approach so far were aquaporin-Z, bacteriorhodopsin, and SoPIP2 (Hansen et al., 2013), as well as the glucose transporter GLUT1 (Hansen et al., 2015) and the human serotonin receptor 5-HT1A (Gutierrez and Malmstadt, 2014).

This method is relatively easy but requires a comparably large amount of protein. Moreover, it has been reported that liposomes grown on agarose gels contain agarose in the membrane as well as in the interior (Horger et al., 2009). This introduces a change in the mechanical properties (Lira et al., 2014) of the vesicles, which might introduce artifacts in protein diffusion as well as in enzyme kinetics.

## USING PROTEOLIPOBEADS FOR THE RECONSTITUTION OF MEMBRANE PROTEINS

Transmembrane protein-coated beads can be applied for the reconstitution of membrane proteins into lipid bilayers (Mirzabekov et al., 2000; Frank et al., 2015). The beads are coated with streptavidin together with a tag or antibody (**Figure 2D**). The tag allows the purification of the target proteins from detergent-containing cell lysates (Mirzabekov et al., 2000) as well as the coating of the bead with the solubilized transmembrane protein. After addition of detergent-solubilized lipids, the lipids cluster around the protein. The addition of biotinylated lipids, which bind to the streptavidin on the bead, supports the stable and saturated formation of a lipid bilayer around the bead (Mirzabekov et al., 2000). This approach was successfully applied to reconstitute the G protein-coupled receptor CCR5, a seven transmembrane helix protein, into liposomes in the native conformation and in a uniformly oriented fashion. The detergent is then removed *via* a dialysis step. Here, the non-ionic maltoside detergent cymal was used (Mirzabekov et al., 2000). Throughout the process, the use of paramagnetic beads facilitates buffer changes and the fabrication process (Mirzabekov et al., 2000). A disadvantage of the method is the fixed position of the transmembrane proteins since they are anchored on the bead surface. This reconstitution method is therefore not universally applicable.

## PARTIAL DRYING OF LIPOSOMES

Another method to achieve protein reconstitution into cell-sized scale vesicles of 1 μm diameter is based on protein-containing bilayers (Girard et al., 2004; Aimon et al., 2011; Fenz et al., 2014). These layers are created by partially drying protein-containing small liposomes. Then, various methods can be employed to form liposomes, including electroformation or swelling (Girard et al., 2004; Aimon et al., 2011; Fenz et al., 2014). A drawback of these techniques is the inevitable rupture and partial drying of the protein-containing vesicles. This is accompanied by the risk of protein denaturation as well as the need for vesicle formation in protein compatible buffers, since mostly high-salt conditions are still required. Electroformation offers a wide range of parameters to fine-tune the vesicle formation process, with a number of protocols available for the use of physiological buffers (Pott et al., 2008). Changes in the electric field (amplitude and frequency), the duration of the protocol, and the swelling buffer can be applied to influence vesicle formation. However, the process of vesicle formation is not yet fully understood (Pott et al., 2008) and side effects, such as lipid-peroxidation (Zhou et al., 2007), or impact of the electric field on the proteins are often not assessable.

Gel-assisted swelling techniques offer a comparably limited range of parameters to improve vesicle production. The use of buffers and lipids involved, as well as the substrate, PVA (Weinberger et al., 2013), agarose (Horger et al., 2009), or others, dominate vesicle size and yield. The processes of vesicle formation in physiological buffers are still not fully understood and require further investigation. The method of proteoliposome formation in the size range greater than one micron is therefore still limited by the available protocols.

#### PEPTIDE-INDUCED FUSION

Membrane protein incorporation into larger vesicles can also be achieved by the fusion of vesicles. Besides detergent-mediated fusion (as already discussed above), other, non-detergent-mediated fusion techniques have been elaborated, including the work of Kahya et al. (2001). Here, the fusogenic peptide WAE has been used to initiate vesicle fusion. Liposomes were formed out of DOPC:chol/PE-PDP (3.5:1.5:0.25), and the WAE peptide was subsequently covalently attached to the vesicles (Pécheur et al., 1997, 1999; Kahya et al., 2001). Larger vesicles, with positively charged lipids, a mixture of DOPC:DOPE:SAINT-2 (10:3:1.3), were used as peptide target. Under these conditions, fusion events were observed and the method was successfully used to reconstitute bacteriorhodopsin (Kahya et al., 2001) as well as a complex of the seven-helix photoreceptor NpSRII and its cognate transducer NpHtrII, with the latter containing two transmembrane α-helices and a large cytoplasmic domain (Kriegsmann et al., 2009).

## MECHANICAL AND SPONTANEOUS INSERTION OF MEMBRANE PROTEINS

In other reconstitution techniques, solubilized proteins are added to the already formed liposomes. Defects in the liposome membrane are induced by sonication pulses (Rigaud and Lévy, 2003), electrical pulses (Rigaud and Lévy, 2003), or freeze-thawing steps (Kammerer et al., 1998; Rigaud and Lévy, 2003). The drawback of these techniques is that mechanical stress is imposed on the solubilized proteins, which can lead to denaturation or low reconstitution rates (Rigaud and Lévy, 2003).

Some classes of proteins, e.g., cytochromes, bacteriorhodopsin and F0F1-ATPases, and porins (Elani et al., 2013, 2014), show spontaneous incorporation into lipid bilayers without the addition of any detergent. Nevertheless, a certain lipid composition of mostly acidic lipids (Eytan and Broza, 1978), as well as vesicles of small size (Eytan and Broza, 1978; Eytan, 1982), is required for spontaneous insertion of these transmembrane proteins. This process has been examined more closely: Jain and Zakim (1987) showed that membrane defects associated with amphiphilic contaminants such as cholesterol, short-chain lipids, and others facilitate spontaneous insertion.

## CONCLUSION

Artificial vesicles are a versatile resource to establish compartmentation in synthetic biochemical nanoreactors. To this end, compartmentation can either be achieved by immobilizing artificial liposome or polymersome vesicles to the matrix of microfluidic reactors or by generating nested or concatenated vesicular systems such as vesosomes.

In order to maximize yield of biochemical processes inside vesicular nanoreactors, it is indispensable to control the exchange of substrates and products across the membranes between the reactor compartments. Technically, it is possible to reconstitute entire metabolic pathways into liposomes or polymersomes, and by the use of vesosomes or immobilized vesicles, it seems even feasible to mimic the natural compartmentation of these pathways in synthetic systems. However, the exchange of substances across membrane boundaries needs to be highly specific to make compartmented reconstituted biochemical pathways work.

Exchange of substances across membranes can be accomplished by utilizing lipid and coblock polymers that change permeability in response to physicochemical triggers such as heat, redox potential, or pH. However, only unspecific mixing of contents can be achieved by altering the permeability of the membrane boundary. The incorporation of unspecific protein pores into the membranes of vesicle-compartmented reactors, e.g., porins or α-hemolysin, allows for a more selective exchange of low molecular weight substances but still permits the passage of a variety of intermediates that are similar in charge and/or structure. Ultimately, the reconstitution of specific membrane transporters that only allow the passage of individual substrates is necessary for the functional reconstitution of compartmented biochemical pathways. As outlined in this review, diverse approaches can be undertaken to achieve successful reconstitution of membrane transport proteins into artificial membranes. Since the optimal conditions for successful reconstitution are quite specific for each individual transport protein and can hardly be transferred to other candidates, we have taken care to summarize the available techniques for membrane transporter reconstitution. We believe that the integration of metabolite transporters into vesicle-based nanoreactors will greatly advance bottom-up approaches in the development of compartmented protocells in the future.

#### REFERENCES


Eytan, G. D. (1982). Use of liposomes for reconstitution of biological functions. *Biochim. Biophys. Acta* 694, 185–202. doi:10.1016/0304-4157(82)90024-7

#### AUTHOR CONTRIBUTIONS

CS, AL, and LV wrote the article. NB, VS, and LV conceptualized the article.

#### FUNDING

The authors would like to acknowledge generous funding of the EFI SynBio program by the emerging fields initiative of the FAU Erlangen-Nürnberg.


optimization, structure, and ligand binding analyses. *Proc. Natl. Acad. Sci. U.S.A.* 105, 15726–15731. doi:10.1073/pnas.0804766105


use of detergents. 2. Incorporation of the light-driven proton pump bacteriorhodopsin. *Biochemistry* 27, 2677–2688. doi:10.1021/bi00408a007


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Schmitt, Lippert, Bonakdar, Sandoghdar and Voll. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Metabolomics, standards, and metabolic modeling for synthetic biology in plants

*Camilla Beate Hill¹\*, Tobias Czauderna², Matthias Klapperstück², Ute Roessner¹ and Falk Schreiber²,³*

*¹School of BioSciences, University of Melbourne, Parkville, VIC, Australia, ²Faculty of Information Technology, Monash University, Clayton, VIC, Australia, ³Institute of Computer Science, Martin Luther University Halle-Wittenberg, Halle, Germany*

Life on earth depends on dynamic chemical transformations that enable cellular functions, including electron transfer reactions, as well as synthesis and degradation of biomolecules. Biochemical reactions are coordinated in metabolic pathways that interact in a complex way to allow adequate regulation. Biotechnology, food, biofuel, agricultural, and pharmaceutical industries are highly interested in metabolic engineering as an enabling technology of synthetic biology to exploit cells for the controlled production of metabolites of interest. These approaches have only recently been extended to plants due to their greater metabolic complexity (such as primary and secondary metabolism) and highly compartmentalized cellular structures and functions (including plant-specific organelles) compared with bacteria and other microorganisms. Technological advances in analytical instrumentation in combination with advances in data analysis and modeling have opened up new approaches to engineer plant metabolic pathways and allow the impact of modifications to be predicted more accurately. In this article, we review challenges in the integration and analysis of large-scale metabolic data, present an overview of current bioinformatics methods for the modeling and visualization of metabolic networks, and discuss approaches for interfacing bioinformatics approaches with metabolic models of cellular processes and flux distributions in order to predict phenotypes derived from specific genetic modifications or subjected to different environmental conditions.

#### *Edited by:*

*Lars Matthias Voll, Friedrich-Alexander-University Erlangen-Nuremberg, Germany*

#### *Reviewed by:*

*Weiwen Zhang, Tianjin University, China; Holger Hesse, Freie Universität Berlin, Germany; Esteban Marcellin, The University of Queensland, Australia*

> *\*Correspondence: Camilla Beate Hill camilla.hill@unimelb.edu.au*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 30 July 2015; Accepted: 05 October 2015; Published: 21 October 2015*

#### *Citation:*

*Hill CB, Czauderna T, Klapperstück M, Roessner U and Schreiber F (2015) Metabolomics, standards, and metabolic modeling for synthetic biology in plants. Front. Bioeng. Biotechnol. 3:167. doi: 10.3389/fbioe.2015.00167*

Keywords: metabolic engineering, synthetic biology, metabolic modeling, systems biology, metabolomics

## INTRODUCTION

One of the greatest challenges for scientists is to understand the genetics, physiology, and biochemistry of plants and the interaction of genes with the environment in order to provide strategies to manipulate these processes to improve plant growth and performance and to prevent diseases. Immense advances in the past decades have allowed scientists to sequence the genomes of organisms and to address many questions, such as how the genome determines a plant's response to environmental stimuli. Simultaneously, the development of analytical technologies allows us to take a comprehensive and unbiased look at the gene products, such as gene transcripts (mRNA), proteins, lipids, and metabolites. The post-genomics era was born with the establishment of transcriptomics, proteomics, lipidomics, and metabolomics, each with its associated computational advances providing the path for data analysis, visualization, and integration to establish the relationships between the genome and gene products under certain conditions.

Metabolites are synthesized by enzyme-catalyzed reactions in any living cell. They are important for the maintenance and survival of cells, most importantly for energy storage and provision, and they also contribute to building and maintaining the cell's structural components. Metabolites and their functionalities are indispensable in the interaction of a cell with the environment, and it has been argued that the metabolome of any biological system represents the final "read-out" of the expression of many genes in that system in a particular situation and reflects gene × environment relationships (Hill et al., 2014).

In comparison to transcripts or proteins, metabolites have a vast range of different chemical structures with an astonishing array of different functional groups that lead to differences in their physical and chemical properties, such as solubility, reactivity, stability, and polarity (Trethewey, 2004). This sheer diversity presents challenges to assay these compounds in a multiparallel fashion. Firstly, a number of different solvent extraction procedures need to be utilized to extract metabolites efficiently from any given plant tissue. In addition, no single analytical approach is capable of detecting and quantifying such chemical diversity; therefore, a range of different approaches (more detail below) has to be employed to analyze as many metabolites as possible. Today, metabolomics is considered as the science combining modern and sophisticated analytical instrumentation for metabolite detection and quantification with appropriate computational and statistical approaches to extract, mine, and interpret metabolomics data.

Metabolomics is now also becoming an important tool for biotechnological and metabolic engineering approaches, which aim to manipulate biochemical pathways to enhance the accumulation of compounds of interest (Dromms and Styczynski, 2012). Since metabolomics can provide a more complete picture of the biological system studied, it has been argued that it can be applied to identify metabolite markers that indicate a particular phenotype [e.g., level of target compound(s)], so that the success of engineering steps can be assessed and future engineering strategies can be guided (Beckles and Roessner, 2011). Historically, metabolic engineers have analyzed the levels of the target compound(s) and potentially a few closely related metabolites to define metabolic engineering strategies. However, the potential of metabolomics, which measures hundreds of metabolites rather than just a few, has only rarely been explored in metabolic engineering approaches, particularly in plants (Rios-Estepa and Lange, 2007; Fernie and Morgan, 2013). The unbiased and broad approach of metabolomics helps to assess how plants maintain energy, carbon, and nutrient resources and provides information on how these resources may be redirected into the synthesis of the desired metabolites, thereby allowing smart engineering strategies to be developed.

In the following, we review current metabolomics technologies, information resources for metabolomics, as well as computational analysis and modeling approaches with a focus on plant-related research.

## TOOLS AND TECHNOLOGIES TO STUDY PLANT METABOLISM

## Analytical Technologies

The plant metabolome, compared to the metabolome of other organisms, is represented by a particularly vast variety of chemical structures with an enormous diversity of chemical and physical properties (Villas-Boas et al., 2007). In the past decade, researchers have developed and validated a number of complementary analytical approaches to extract, separate, detect, and quantify this diversity. Different solvent extraction procedures may need to be employed to cover the range of polarity of metabolites (Dias et al., 2012). However, most routine metabolite extractions are based on a methanol/water/chloroform biphasic extraction, which captures a large complement of the plant metabolome.

Once metabolites are extracted, the complex mixtures need to be separated to allow individual detection and quantification of compounds. The polarity of metabolites also influences the choice of the separation approach. In liquid chromatography (LC)-based separations, researchers now commonly use two types of separation chemistry: C18 reversed phase, which separates the more hydrophobic complement of a metabolite extract, such as many secondary metabolites and lipids, and hydrophilic interaction chromatography, which is better suited to polar metabolites (Callahan et al., 2009; Hill and Roessner, 2015). In addition, many metabolites carry either a positive or a negative charge; therefore, molecules need to be ionized in both positive and negative ionization modes (Beckles and Roessner, 2011; Hill et al., 2014). A number of different ionization techniques are available, such as electrospray ionization or atmospheric pressure chemical ionization, each again better suited to a particular subclass of metabolites. However, the most common technique used in LC-MS-based metabolomics is electrospray ionization, which allows reliable ionization of thousands of compounds (Hill et al., 2014).
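To make the role of the two ionization modes concrete, the following minimal sketch computes the mass-to-charge ratio expected for a metabolite after simple protonation (positive mode) or deprotonation (negative mode). The function name `mz_protonation` is hypothetical, glucose is used only as an illustrative compound, and other adducts (e.g., sodium or formate) are deliberately ignored.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton (hydrogen atom minus one electron)


def mz_protonation(monoisotopic_mass, charge):
    """m/z of [M+nH]n+ (charge > 0) or [M-nH]n- (charge < 0)."""
    if charge == 0:
        raise ValueError("charge must be non-zero")
    return (monoisotopic_mass + charge * PROTON_MASS) / abs(charge)


glucose = 180.06339  # monoisotopic mass of C6H12O6 in Da
print(f"[M+H]+ {mz_protonation(glucose, +1):.4f}")
print(f"[M-H]- {mz_protonation(glucose, -1):.4f}")
```

In practice, the same neutral mass therefore appears at different m/z values in the two modes, which is one reason why peak lists from positive and negative runs are usually annotated separately.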

An alternative separation technique is gas chromatography (GC), which is known for its superior separation power and reproducibility (Dias et al., 2015). The Metabolomics Standards Initiative (MSI; Fiehn et al., 2007) has developed minimal reporting standards for metabolomics data, and strategies to further enhance reproducibility, experimental and data standardization are continuously developed (Allwood et al., 2009). However, GC requires compounds to be volatile to be amenable for analysis. Most metabolites are not volatile and therefore require chemical derivatization to make them volatile. This limits the utility of GC-based separation in metabolomics applications; however, GC coupled to MS still remains the "work horse" in metabolomics due to its reproducibility and its ease of use (Hill and Roessner, 2013).

Mass spectrometry (MS) is the most commonly used detector in metabolomics approaches. The power of MS is that it can distinguish the size of the ionized molecule (by determining the mass-to-charge ratio of each detected ion) and allows the determination of the number of individual ions detected. The mass-to-charge ratio determined by MS is used for compound identification, and the number of ions detected can be related back to the initial concentration of the molecule in the metabolite extract. Another advantage of MS is that it can fragment an ion once (MS/MS) or multiple times (MSⁿ), a feature used for structural elucidation of unknown compounds. MS also has excellent capabilities to detect and distinguish isotopic patterns of each ion under analysis. This is particularly important for stable isotope labeling experiments used for metabolic flux analysis. MS analysis is able to determine enrichments in stable isotope labels in all the analyzed compounds and therefore allows determining metabolic fluxes through particular pathways. Most commonly, ¹³C-labeled metabolic precursors are used, and the distribution and enrichment of the label across pathways of interest are quantified (O'Grady et al., 2012; Chokkathukalam et al., 2014). ¹³C-labeling patterns can also be detected by nuclear magnetic resonance spectroscopy (NMR), which also represents an important platform for untargeted, nondestructive metabolite profiling (Eisenreich and Bacher, 2007).
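As a rough illustration of how isotopologue intensities translate into label enrichment, the sketch below computes the mean fractional ¹³C enrichment of a metabolite from its mass isotopologue distribution (M+0, M+1, ...). The function names and the example intensities are hypothetical, and the calculation assumes that the intensities have already been corrected for natural isotope abundance.

```python
def isotopologue_fractions(intensities):
    """Normalize raw isotopologue intensities (M+0, M+1, ...) to fractions."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return [i / total for i in intensities]


def mean_enrichment(intensities):
    """Average fraction of labeled carbons in the metabolite pool.

    intensities[k] is the signal of the M+k isotopologue; a molecule with
    n carbons has n + 1 isotopologues, so n = len(intensities) - 1.
    """
    n_carbons = len(intensities) - 1
    if n_carbons < 1:
        raise ValueError("need at least the M+0 and M+1 intensities")
    fractions = isotopologue_fractions(intensities)
    return sum(k * f for k, f in enumerate(fractions)) / n_carbons


# Hypothetical three-carbon metabolite: intensities for M+0 .. M+3
mid = [50.0, 10.0, 10.0, 30.0]
print(round(mean_enrichment(mid), 3))
```

Time courses of such enrichment values for several intermediates are the kind of input that flux estimation methods build on; the estimation itself is beyond this sketch.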

## Metabolomics Data Analysis and Visualization

Metabolic data give us a snapshot of the current state of an organism (Hill and Roessner, 2013). They represent the outcome of a preceding gene expression profile, which influences the activity of pathways, transport processes, as well as production and consumption of metabolites. The resulting metabolic profile can be used to classify organisms, for example, by genotype or treatment. Furthermore, these profiles enable comparative analysis between selected treatments or genotypes (Hill et al., 2013a,b, 2015) and reveal which metabolites change most or least as a result of changes in the gene expression profile.

There are two general methods to analyze metabolic data, which can also be combined. The first, analytical method uses established statistics and clustering algorithms; the second uses networks to visualize spatial and temporal properties of the data. Computing statistics within the context of a network for visualization purposes combines both methods.

Scientists can choose from a broad repertoire of statistical methods depending on the objective at hand. Methods such as frequency distribution, analysis of variance (ANOVA), min–max, and Pearson's and Spearman's rank correlation are examples of univariate data analysis. Principal component analysis (PCA), partial least squares regression (PLS), and multivariate analysis of variance (MANOVA) are commonly used for multivariate data analysis. Self-organizing maps (SOM), support vector machines (SVM), and k-means are popular methods for cluster analysis and classification. Generated results can be visualized using different kinds of diagrams such as plots, histograms, cluster diagrams, and heat maps.
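Two of the measures named above can be sketched in a few lines. The example below is a minimal, dependency-free illustration of Pearson's and Spearman's rank correlation between the abundance profiles of two metabolites; the function names and the sample values are hypothetical, and in practice one of the software tools listed in Table 1 would normally be used instead.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation undefined for constant input")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical relative abundances of two metabolites across six samples
citrate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
malate = [1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
print(pearson(citrate, malate), spearman(citrate, malate))
```

Because the second profile rises monotonically but not linearly with the first, the rank-based measure reports a perfect association while Pearson's coefficient stays below 1, which illustrates why the choice of measure matters for metabolite data.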

Several software tools can be used to perform the above-mentioned analytical methods. A distinction is made between low-level tools, which provide the actual set of algorithms, and high-level tools, which build on them to offer user-friendly application of those algorithms. **Table 1** shows a selection of frequently used software tools.

Graphical representations of metabolic pathways and networks have been used for a long time to represent knowledge about metabolic processes, and with the availability of pathways in databases, several tools have been developed to visualize metabolic data in the context of networks. These tools mostly support tasks such as data mapping and network analysis but often also try to help with layout and exploration of data. The latter touches the field of visual analytics for metabolic information (Kerren and Schreiber, 2012, 2014). **Table 2** shows a selection of tools supporting visualization of metabolic data as diagrams or heat maps in the context of biological networks.

#### TABLE 1 | Software tools for metabolomics data analysis.

#### TABLE 2 | Network visualization software tools that support metabolic data.

As a result of the common procedures for extracting and measuring metabolite abundance and concentration, metabolomic data usually carry no spatial information. With the advent of modern methods such as imaging MS (Kaspar et al., 2011; Miura et al., 2012), additional spatial information can be gathered and displayed using 2D (Rohn et al., 2011) or 3D immersive techniques (Sommer et al., 2011).

**Figure 1** shows an exemplary metabolic network in SBGN style (see section Standards for Systems and Synthetic Biology). Time series data for different genotypes and treatments have been mapped onto the tricarboxylic acid (TCA) cycle pathway. The left panel shows not only the data as box plots but also a Pearson's correlation analysis for starch production (yellow). Red- and blue-colored elements have a positive or negative correlation, respectively. For further investigation, single elements can be enlarged to support additional exploration of data, as shown on the right side.

## Databases and Repositories for Metabolites and Metabolism

Given the tools introduced before, metabolite and pathway databases and repositories are valuable resources that can be used to manage, explore, and export knowledge about metabolites and reactions in meaningful ways. They thereby deliver an encyclopedia on metabolic information as well as a base for integration of complex data into metabolic pathways in the context of graphical pathway representations (e.g., data about compound levels, reaction flow, enzyme activity, and gene expression, see Metabolomics Data Analysis and Visualization). Often these databases and repositories also provide or allow building metabolic models which can be analyzed and simulated using mathematical modeling techniques, see Section "Methods for Metabolic Modeling in Plants." A typical example of information provided by databases is shown in **Figure 2**.

A range of pathway databases and repositories are currently available; an overview is available from the Pathguide resource (Bader et al., 2006). In **Table 3,** we provide a summary of important databases and repositories especially for plant research. It should be noted that in addition to general metabolic pathway databases such as Reactome, KEGG (Kyoto Encyclopedia of Genes and Genomes), and PANTHER Pathway, there are also general plant metabolic pathway databases (see **Table 3**) and several species-specific plant metabolic pathway databases available, for example, for *Arabidopsis* AraCyc (Mueller et al., 2003) and MetNetDB (Yang et al., 2005).

Examples for multispecies plant metabolic databases are:


## CURRENT STATUS OF ENGINEERING SYNTHETIC METABOLIC NETWORKS

For a long time, the use of biological organisms for the production of chemicals was limited by the repertoire of biosynthetic pathways naturally present in these organisms. Microbial low-molecular-weight metabolic end products have long been used as commodity chemicals for medical applications: for example, antibiotics, such as penicillin, and immunosuppressive drugs, such as cyclosporine, were discovered in 1929 and 1972, respectively (Drews, 2000). Microbes have also been used to produce chemical compounds via biotransformation, a process in which the compound of interest is produced by the microorganism through enzymatic conversion of an external substrate added to the microbial culture medium; one example is the production of acrylamide by nitrile-assimilating bacteria (Asano et al., 1982).

With the advent of recombinant DNA technology in the early 1970s, it became possible to engineer specific genes in biological organisms, which has significantly reduced the time required for mutagenesis and selection of desirable traits. Genetic engineering made it possible to use heterologous hosts for the production of chemical compounds that are not naturally present in the organism. Clustered, regularly interspaced, short palindromic repeat (CRISPR) and related technologies that use targeted genome editing via engineered nucleases are the latest developments for altering genome sequences and gene expression; they can ultimately also be used to modify existing metabolic pathways and to transfer novel traits into agricultural crops (Shan et al., 2013; Sander and Joung, 2014).

The development of new sophisticated genomic sequencing and other enabling technologies for synthetic biology facilitated the production of naturally present chemicals at levels that made extraction economically feasible, and the field of metabolic engineering began to emerge. Metabolic engineering is defined as the targeted modification of metabolic pathways of biological organisms for metabolite overproduction or the improvement of cellular properties (Lessard, 1996). Over the last decade, significant progress has been made in engineering the metabolism of plants to produce specific lipids, secondary metabolites, derivatives of complex natural products, and even vaccines (Mortimer et al., 2012). Many recent studies show that it is often not sufficient to modify existing metabolic pathways; rather, metabolic pathways need to be designed *de novo* from other plants or bacteria. Editing or redesigning existing plant metabolic networks is a challenging task that will benefit from advances in targeted genome modification, tissue-, cell-, and organelle-specific gene expression, the controlled expression of multigene pathways, and improvements in analytical technologies (as described in Analytical Technologies) as well as computational analysis and modeling methods (as described in Metabolomics Data Analysis and Visualization, Databases and Repositories for Metabolites and Metabolism, and Computational Approaches for Metabolic Engineering).

#### Metabolite databases

(provide information about metabolites/compounds such as names, chemical structures, molecular weight, occurrence in pathways, EC number, and mass spectrum references)

#### Reaction databases

(provide information about reactions and enzymes such as names, reaction diagrams, reaction mechanisms, enzymatic parameters, occurrence in pathways, and links to encoding genes)

#### Pathway databases

(provide information about plant-specific metabolic pathways such as names, involved reactions, metabolites and enzymes, and pathway structure)

The ultimate goal of synthetic biology is the efficient design of biological systems (Heinemann and Panke, 2006). In this section, we will discuss the current status of engineering synthetic metabolic networks using recent examples for synthetic biology endeavors in plants: engineering of synthetic metabolic networks of plant lipids to provide an alternative and sustainable source of nutrients (see Metabolic Engineering of Plant Lipids to Provide an Alternative and Sustainable Source of Nutrients) and for the production of fuels from renewable resources (see Metabolic Engineering of Plant Lipids for the Production of Fuels from Renewable Resources), and plant secondary metabolites including alkaloids and lignins (see Metabolic Engineering of Plant Secondary Metabolites).

## Metabolic Engineering of Plant Lipids to Provide an Alternative and Sustainable Source of Nutrients

Plant oils are a major component of human diets, comprising as much as 25% of average caloric intake (Broun et al., 1999). However, certain fatty acids such as omega-3 long-chain polyunsaturated fatty acids (ω3 LC-PUFA) are present predominantly in fish and have important functions for human health, as deficiencies in these fatty acids can increase the risk or severity of cardiovascular and inflammatory diseases (Abeywardena and Patten, 2011). Until recently, the chemical composition of plant oils was constrained by the repertoire of naturally present lipid biosynthetic pathways. Novel opportunities have emerged to tailor the composition of plant-derived lipids so that they are optimized with respect to food functionality and human dietary needs. For example, Petrie et al. (2010, 2012) have recently described metabolic engineering of ω3 LC-PUFA in plants: after inserting seven genes of the docosahexaenoic acid (DHA) biosynthesis pathway from microalgae into the genome of *Arabidopsis thaliana*, they were able to obtain ω3 LC-PUFA levels in seeds similar to those observed in bulk fish oil. If applied to oilseed crops such as *Brassica napus*, this technology could potentially form the basis of a plant-based sustainable source to complement the existing marine fish oil supply.

## Metabolic Engineering of Plant Lipids for the Production of Fuels from Renewable Resources

Fossil fuels are the primary source of many industrial products, but reserves are non-renewable and decreasing rapidly, and their widespread use has contributed to environmental problems arising from increased CO2 levels in the atmosphere (Le Quéré et al., 2009). Currently, biologically derived fuels from plant oils represent one of the main strategies to provide renewable and sustainable source material that can potentially substitute for fossil fuels in some industrial applications. Among the many proposed solutions, algal biofuels are seen as one of the most promising: algal biomass is less resistant to conversion into simple sugars than plant biomass due to its lack of lignin, and the food versus fuel dilemma does not arise, as no farmland has to be diverted for the production of biofuels (Daroch et al., 2013). Over the last few years, progress has been made in bioethanol production through fermentation of algal feedstock (Kim et al., 2012) as well as in biodiesel production from algal oils (Singh and Dhar, 2011). Increasingly, research efforts focus on metabolically engineering lipid pathways to increase lipid accumulation without compromising growth (Trentacoste et al., 2013). Although *A. thaliana* mutants of lipid catabolism were previously found to show impaired growth (Graham, 2008), Trentacoste et al. (2013) demonstrated that disrupting lipid catabolism via the knockdown of a multifunctional lipase/phospholipase/acyltransferase in the microalga *Thalassiosira pseudonana* led to increased lipid accumulation without compromising algal growth. Further elucidation of lipid metabolism has the potential to lead to new strategies to engineer improved algal strains for the production of fuel molecules.

## Metabolic Engineering of Plant Secondary Metabolites

Plant secondary metabolites, such as alkaloids, flavonoids, terpenes, and phenylpropanoids (Hill et al., 2014), are considered to be non-essential for normal growth and development but play important roles in plant defense against pathogens and other environmental stresses. Additionally, plant secondary metabolites are of great interest to pharmaceutical industries, as they often have beneficial medicinal effects on humans. For example, many plant alkaloids are currently in medical use, such as atropine derived from the nightshade *Atropa belladonna*, morphine from the opium poppy *Papaver somniferum*, and quinine from the *Cinchona* tree (Roberts and Wink, 1998). Recent progress has been made in the metabolic engineering of morphine, a medicinally important benzylisoquinoline alkaloid: Runguphan et al. (2012) reengineered a codeine *O*-demethylase mutant that selectively demethylates codeine instead of both codeine and thebaine, as is common in the wild-type morphinan biosynthesis pathway. The integration of this highly selective mutant enzyme into commercial poppy plants as part of a future metabolic engineering effort has the potential to increase yields of morphine and codeine.

The phenylpropanoid pathway is conserved in all terrestrial plants and is responsible for the biosynthesis of many compounds that are involved in plant cell wall structure and integrity, water transport, and plant defense. Phenylpropanoids are required for the biosynthesis of lignins, aromatic natural polymers in secondary cell walls derived from the oxidative polymerization of monolignols. Decreasing lignin content or altering lignin structure enhances cell wall digestibility and can greatly increase the utilization of lignin itself or of cell wall polysaccharides. Due to the importance of lignin in agriculture and industry, the genes participating in lignin biosynthesis have been identified and modified in many plant species including switchgrass (Fu et al., 2011), *A. thaliana* (Gallego-Giraldo et al., 2011), and sugarcane (Jung et al., 2012). In a recent study, Zhang et al. (2012) were able to manipulate lignification in *A. thaliana* without compromising plant growth by introducing an artificial enzyme that etherifies the para-hydroxyl of phenols by 4-*O*-methylation. The resulting 4-*O*-methylated lignin monomers can no longer participate in oxidative dehydrogenation, which decreases the level of monolignols available for lignin polymerization and thus depresses lignin biosynthesis. Further metabolic engineering efforts are currently underway to integrate this artificial enzyme into poplar with the potential to manipulate lignin levels.

## COMPUTATIONAL APPROACHES FOR METABOLIC ENGINEERING

A variety of experimental methods, such as metabolomics (see Analytical Technologies), can be used to quantify metabolites and other components of regulatory networks in plants. Computational modeling is an important tool for metabolic engineering, as it facilitates the integration and analysis of experimental datasets to quantify metabolic fluxes and model metabolic networks. Section "Methods for Metabolic Modeling in Plants" presents a brief overview of modeling approaches. Because many tools and databases are available for computational modeling, a standardized exchange of models is highly relevant; Section "Standards for Systems and Synthetic Biology" provides an introduction to major standards in systems and synthetic biology.

## Methods for Metabolic Modeling in Plants

Several approaches have been developed to qualitatively and quantitatively model and simulate metabolic systems *in silico*. These range from topological analysis of network models (which looks at the interconnections between metabolites) to stoichiometric models (where constraints can be applied to define the potential metabolic flux state space or which can be analyzed using Petri nets) to detailed kinetic models (which model changes of metabolite concentration over time). A recent review (Baghalian et al., 2014) discusses the different modeling approaches, modeling software, and metabolic models of several plants in detail. A particular challenge in plants compared to prokaryotic cells is the number of different compartments, which need to be considered in metabolic models. In addition, plants have a greater complexity of metabolic pathways and, in particular, a large number of specialized pathways for secondary metabolites.

**Figure 3** summarizes the major modeling approaches and their advantages and disadvantages. The most detailed are kinetic models, which allow for a comprehensive quantitative description and prediction of metabolic fluxes. However, in plants, such models typically comprise only 10–20 reactions. As we move further to the right in **Figure 3**, the model size increases, but the level of detail of the descriptions and predictions decreases. At the other extreme, topological models can cover the complete metabolism of a plant, but predictions are restricted to qualitative information such as reachability of metabolites.
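To make the notion of reachability concrete, the following is a minimal sketch of one common way to compute it on a topological model: starting from a set of seed metabolites, a reaction is allowed to fire once all of its substrates are available, and its products are then added to the available set. The function name, the data layout, and the toy network are assumptions made for illustration only; they are not taken from any of the cited tools.

```python
def reachable_metabolites(reactions, seeds):
    """Return every metabolite producible from `seeds`.

    `reactions` is a list of (substrates, products) pairs; a reaction can
    fire only once all of its substrates are available.
    """
    available = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if set(substrates) <= available and not set(products) <= available:
                available |= set(products)
                changed = True
    return available


# Hypothetical toy network; the metabolite names are placeholders.
toy_reactions = [
    (("A",), ("B",)),
    (("B", "C"), ("D",)),
    (("E",), ("F",)),  # never fires: E is neither a seed nor produced
]
scope = reachable_metabolites(toy_reactions, seeds={"A", "C"})
print(sorted(scope))  # ['A', 'B', 'C', 'D']
```

The result is purely qualitative: it states which metabolites can be formed, but nothing about rates or yields.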

As detailed in Baghalian et al. (2014), plant-specific computational models of metabolism can be used for different purposes such as predicting the behavior of the metabolism under different conditions, analyzing the effect of mutations, and investigating the effect of changes due to manipulation of the metabolic system, for example, via the introduction of new metabolic pathways. Computational models usually allow effects to be investigated much faster and more cheaply than wet laboratory experiments (Rohwer, 2012). In addition, a computational model can often generate a set of alternative strategies (Copeland et al., 2012). Finally, computational models allow the integration of additional data such as transcriptomics and proteomics data sets, which together with bioinformatics approaches can support a better understanding of metabolic behavior in plants (Töpfer et al., 2012, 2013).

FIGURE 3 | Overview of metabolic modeling approaches and their advantages and disadvantages, adapted from Hartmann and Schreiber (2014). More details are given in the text.

## Standards for Systems and Synthetic Biology

This section presents a short introduction to major standards in systems and synthetic biology related to software infrastructure (see also **Table 4**). Software infrastructure plays an important role in systems biology research (Kitano, 2002), in particular in supporting standardized exchange of information between different tools and databases. The major standards are Systems Biology Markup Language (SBML), CellML, Systems Biology Graphical Notation (SBGN), and Synthetic Biology Open Language (SBOL); see Schreiber et al. (2015) for detailed specifications.

#### Systems Biology Markup Language

Systems Biology Markup Language is a machine-readable format for representation and exchange of computational models in systems biology. It can represent models of metabolism, signal transduction, and gene regulation. The main goals of SBML are (1) sharing and publication of models, (2) reusability of models, and (3) survival of models beyond the lifetime of the software used to create them. The SBML web page currently lists more than 270 software applications that support SBML, and thousands of SBML-encoded models are available from public repositories such as BioModels including Path2Models (Büchel et al., 2013; Chelliah et al., 2013).

TABLE 4 | Overview of major standards in systems and synthetic biology.

An SBML model consists of hierarchical lists of conceptual elements: (1) species (biological entities taking part in reactions), (2) compartments (physical containers for species), and (3) reactions (transformation, transport, or binding processes occurring over time). For the analysis and simulation of a model, more properties need to be defined such as stoichiometries, rate laws, local and global parameters, as well as units on quantities. A formal description of SBML can be found in the detailed specification (Hucka et al., 2015).
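As a rough illustration of this hierarchical layout, the sketch below assembles a skeleton document with Python's standard `xml.etree.ElementTree` module. The element names (`listOfCompartments`, `listOfSpecies`, `listOfReactions`, `speciesReference`) follow SBML, but the model, its identifiers, and the reaction are hypothetical. The skeleton deliberately omits the XML namespace, required attributes, rate laws, and units, so it should not be expected to pass an SBML validator.

```python
import xml.etree.ElementTree as ET


def build_skeleton():
    """Build a bare-bones, non-validated SBML-like document (toy example)."""
    sbml = ET.Element("sbml", {"level": "3", "version": "1"})
    model = ET.SubElement(sbml, "model", {"id": "toy_model"})

    # (2) compartments: physical containers for species
    compartments = ET.SubElement(model, "listOfCompartments")
    ET.SubElement(compartments, "compartment", {"id": "cytosol"})

    # (1) species: entities taking part in reactions
    species = ET.SubElement(model, "listOfSpecies")
    for species_id in ("A", "B"):
        ET.SubElement(species, "species",
                      {"id": species_id, "compartment": "cytosol"})

    # (3) reactions: here a single hypothetical conversion A -> B
    reactions = ET.SubElement(model, "listOfReactions")
    reaction = ET.SubElement(reactions, "reaction", {"id": "R1"})
    reactants = ET.SubElement(reaction, "listOfReactants")
    ET.SubElement(reactants, "speciesReference",
                  {"species": "A", "stoichiometry": "1"})
    products = ET.SubElement(reaction, "listOfProducts")
    ET.SubElement(products, "speciesReference",
                  {"species": "B", "stoichiometry": "1"})
    return sbml


xml_text = ET.tostring(build_skeleton(), encoding="unicode")
print(xml_text)
```

For real models, a dedicated SBML library is the safer route, since it enforces the constraints of the specification that this sketch ignores.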

#### CellML

CellML is a machine-readable format for representation, publication, and sharing of mathematical models of cellular function. In comparison to SBML, the focus of CellML is on the representation of a variety of models such as models of biological pathways, electrophysiological models, and mechanical models. The CellML web page lists a couple of software tools which support the CellML format and the development of models. The CellML Model Repository (Lloyd et al., 2008) contains several hundred models including a subrepository providing SVPs (Standard Virtual Biological Parts) for the composition of synthetic biology models (Cooling et al., 2010).

A CellML model description consists of components and lists of connections between the components. A component contains at least one variable and mathematical equations describing its behavior. Connections are mappings of variables between components, enabling information exchange between them. Because components and connections can be imported from an existing model, CellML allows parts of other models to be reused. A detailed description of CellML can be found in the specification (Cuellar et al., 2006).

#### Systems Biology Graphical Notation

Systems Biology Graphical Notation is a standard for the graphical representation of processes and networks studied in systems biology. Three SBGN languages (Process Description, PD; Entity Relationship, ER; and Activity Flow, AF) allow for the representation of different aspects of biological systems at different levels of detail as SBGN maps, thus providing corresponding views on the underlying biological system. The PD language (Moodie et al., 2015) describes biological entities and processes between these entities, the ER language (Sorokin et al., 2015) focuses on interactions between biological entities, and the AF language (Mi et al., 2015) depicts information flow between biological activities. An example of an SBGN PD map is shown in **Figure 1**.

The standardization of graphical representations helps to exchange biological knowledge more efficiently and accurately between different research communities, industry, and other players in systems biology. Several databases already provide maps in SBGN, e.g., BioModels Database including Path2Models (Büchel et al., 2013; Chelliah et al., 2013), MetaCrop (Schreiber et al., 2012), PANTHER Pathway (Mi et al., 2013), Reactome (Croft et al., 2014), and RIMAS (Junker et al., 2010). The SBGN web page lists more than 20 software tools that support creating, editing, and viewing of SBGN maps; some of these tools can visualize SBML models in SBGN PD.

#### Synthetic Biology Open Language

Synthetic Biology Open Language is a data format for sharing and exchanging synthetic biology designs. It allows synthetic biologists to provide an unambiguous description of a design in a hierarchical and fully annotated form with the goal to improve designing, building, testing, and dissemination of synthetic biology designs. For the visualization of synthetic biology designs in SBOL, SBOL Visual (Synthetic Biology Open Language Visual) has been developed. It is a graphical notation allowing depiction of the structure of a design using glyphs to specify genetic parts, devices, modules, and systems.

The SBOL web page currently lists more than 20 software applications supporting SBOL and SBOL Visual. Some of these applications allow the generation of SBML models from synthetic biology designs in SBOL (Roehner et al., 2015) as well as the creation of synthetic biology designs in SBOL by automatically generating DNA sequences from annotated SBML and CellML models (Misirli et al., 2011). More detailed information about SBOL and SBOL Visual can be found in the specifications (Quinn et al., 2013; Bartley et al., 2015).

## CONCLUSION

The total number of metabolites in the plant kingdom is estimated to be between 100,000 and 200,000 and can be highly variable depending on the physiological and environmental conditions as well as the genetic background of the plant (Hill et al., 2013a,b). Such great metabolic diversity holds great promise for expanding our repertoire of known beneficial plant compounds, as many metabolic pathways and regulatory mechanisms are still awaiting discovery. Reaching significant benchmarks toward attaining these goals will be possible with better analytical tools. More accurate representations of metabolite identities and quantities will require better analytical instruments and improved techniques for sample extraction and data analysis. The engineering of synthetic metabolic networks of plants will require further advances in targeted genome modification such as the application of the CRISPR/Cas system, as well as tissue-, cell-, and organelle-specific gene expression, and the controlled expression of multigene pathways. The development of methods for measuring metabolic flux directly, the quantification of metabolites in individual plant compartments, and the analysis of metabolites and activities between compartments *in vivo* will be very important next steps to further enhance the predictive capabilities of existing metabolic models. Continuous development of more user-friendly software, databases, languages, and computer models that incorporate and interpret complex information will be crucial to handle the acquired data and to aid interpretation in a biological context. We are just at the beginning of a new area of synthetic biology in plants based on metabolomics and metabolic modeling.

#### ACKNOWLEDGMENTS

UR and CH are funded through an Australian Research Council Future Fellowship program and are also grateful to the Victorian Node of Metabolomics Australia, which is funded through Bioplatforms Australia Pty Ltd, a National Collaborative Research Infrastructure Strategy (NCRIS), 5.1 biomolecular platforms and informatics investment, and coinvestment from the Victorian State government and The University of Melbourne. The authors are grateful for financial support from EFI SynBio.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Hill, Czauderna, Klapperstück, Roessner and Schreiber. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Photorespiratory Bypasses Lead to Increased Growth in *Arabidopsis thaliana*: Are Predictions Consistent with Experimental Evidence?

*Georg Basler<sup>1,2†</sup>, Anika Küken<sup>3†</sup>, Alisdair R. Fernie<sup>4</sup> and Zoran Nikoloski<sup>3</sup>\**

*<sup>1</sup>Department of Chemical and Biomolecular Engineering, University of California Berkeley, Berkeley, CA, USA, <sup>2</sup>Department of Environmental Protection, Estación Experimental del Zaidín CSIC, Granada, Spain, <sup>3</sup>Systems Biology and Mathematical Modeling Group, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany, <sup>4</sup>Central Metabolism Group, Max Planck Institute of Molecular Plant Physiology, Potsdam-Golm, Germany*

#### *Edited by:*

*Patrik R. Jones, Imperial College London, UK*

#### *Reviewed by:*

*Biswapriya Biswavas Misra, University of Florida, USA Dong-Yup Lee, National University of Singapore, Singapore*

*\*Correspondence:*

*Zoran Nikoloski nikoloski@mpimp-golm.mpg.de*

*† Georg Basler and Anika Küken contributed equally.*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 20 October 2015 Accepted: 24 March 2016 Published: 07 April 2016*

#### *Citation:*

*Basler G, Küken A, Fernie AR and Nikoloski Z (2016) Photorespiratory Bypasses Lead to Increased Growth in Arabidopsis thaliana: Are Predictions Consistent with Experimental Evidence? Front. Bioeng. Biotechnol. 4:31. doi: 10.3389/fbioe.2016.00031*

Arguably, the biggest challenge of modern plant systems biology lies in predicting the performance of plant species, and crops in particular, upon different intracellular and external perturbations. Recently, an increased growth of *Arabidopsis thaliana* plants was achieved by introducing two different photorespiratory bypasses via metabolic engineering. Here, we investigate the extent to which these findings match the predictions from constraint-based modeling. To determine the effect of the employed metabolic network model on the predictions, we perform a comparative analysis involving three state-of-the-art metabolic reconstructions of *A. thaliana*. In addition, we investigate three scenarios with respect to experimental findings on the ratios of the carboxylation and oxygenation reactions of Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO). We demonstrate that the condition-dependent growth phenotypes of one of the engineered bypasses can be qualitatively reproduced by each reconstruction, particularly upon considering the additional constraints with respect to the ratio of fluxes for the RuBisCO reactions. Moreover, our results lend support for the hypothesis of a reduced photorespiration in the engineered plants, and indicate that specific changes in CO2 exchange as well as in the proxies for co-factor turnover are associated with the predicted growth increase in the engineered plants. We discuss our findings with respect to the structure of the used models, the modeling approaches taken, and the available experimental evidence. Our study sets the ground for investigating other strategies for increase of plant biomass by insertion of synthetic reactions.

Keywords: flux balance analysis, *Arabidopsis thaliana*, photorespiration, metabolic bypasses, metabolic engineering, crop optimization

## INTRODUCTION

The investigation and understanding of cell metabolism has recently experienced a paradigm shift largely propelled by the development of high-throughput methods. As a result, the classical pathway-centered approach has given way to a network-driven perspective, which considers the entire set of characterized biochemical reactions. This has led to the construction of genome-scale metabolic models (GEMs) for organisms from each of the three domains of life: archaea, bacteria, and eukarya (Schellenberger et al., 2010). While a GEM constitutes an organized and comprehensive system of knowledge about an organism, it also allows *in silico* analyses based on constraint-based methods, relying on the corresponding stoichiometric matrix representation and assumptions about cellular metabolism (e.g., operability in a steady state and reversibility of reactions). Flux balance analysis (FBA) has provided the basic framework for predicting growth and biomass yield as well as investigating reaction fluxes in a metabolic network (Varma and Palsson, 1993a,b; Lewis et al., 2012). Approaches in this so-called constraint-based framework usually invoke the steady-state assumption, whereby there is no change in the size of the metabolic pools. Therefore, without additional assumptions, the constraint-based modeling framework does not account for changes over time [e.g., due to circadian rhythm, which in plants necessitates the switch between autotrophic and heterotrophic metabolism (Cheung et al., 2014)]. Extensions of this framework have subsequently allowed systematic investigations of network modifications (e.g., deletion and insertion of reactions from other species or under- and over-expression of gene products) directed at enhancing particular metabolic functions (Burgard et al., 2003; Pharkya et al., 2004; Ranganathan et al., 2010; Yang et al., 2011; Larhlimi et al., 2012; Lakshmanan et al., 2013).

A metabolic network that includes all known biochemical reactions of an organism may not be realistic in a particular cellular scenario (i.e., context), as mounting evidence shows that cells adapt their metabolism to the prevailing circumstance (e.g., external environment, developmental stage, cell type in multicellular organisms). In different cellular contexts, only a subset of reactions is typically active, which may lead to differences in biomass composition (Chang et al., 2011; Arnold and Nikoloski, 2014). Therefore, the shift toward reconstructing context-specific models of cell metabolism has become necessary to provide more accurate and biologically meaningful insights (Bordbar et al., 2014; Machado and Herrgård, 2014; Robaina Estévez and Nikoloski, 2014). This is of particular importance when tackling the physiology of multicellular organisms, not only to better understand tissue- or cell-specific metabolism, but also as a first step to reconstruct the metabolic network of an entire plant, whereby multiple specialized models are mutually interconnected (de Oliveira Dal'Molin et al., 2015).

Despite these recent developments, it remains questionable to what extent constraint-based methods and large-scale modeling can be used for devising metabolic engineering strategies in plants. The ideal approach to test whether or not modeling based on existing plant GEMs can provide insights into crop optimization and predict novel optimization strategies is to investigate the extent to which the predictions match observations from metabolic engineering experiments. Since photorespiratory bypasses have recently been successfully engineered and tested in the model plant *Arabidopsis* (Peterhansel et al., 2013b), they can be used to evaluate the capability of existing models to correctly predict novel synthetic engineering strategies. We, therefore, focus on the analysis of plant photorespiratory metabolism in the metabolic network context.

Photorespiratory metabolism and its experimentally investigated bypasses represent an excellent test case to investigate the potential of plant GEMs in plant metabolic engineering for two reasons. First, the existing GEMs in the model C3 plant *Arabidopsis thaliana* (and other photosynthetic organisms) include almost all components of the photorespiratory metabolism (Arnold and Nikoloski, 2013); therefore, they provide a suitable starting point for investigating the role of photorespiration in the network context. Second, photorespiratory metabolism does not operate in isolation, but shapes the energetics of photosynthesis, compartmental reductant exchange, nitrate assimilation, one-carbon (C1) metabolism, and redox signal transduction [for recent reviews, see Foyer et al. (2009) and Bauwe et al. (2012)]; therefore, investigating the effects of modulating this pathway necessitates adoption of a network perspective.

The plant photorespiration pathway involves 12 reactions which are partitioned among three compartments, namely chloroplast, peroxisome, and mitochondrion, with bypasses in the cytosol (**Figure 1**) (Timm et al., 2008). Ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) reacts with ribulose-1,5-bisphosphate (RuBP), resulting in the enediol-enzyme complex that can in turn react with oxygen (O2), termed *oxygenation* (Calvin, 1954; Ogren and Bowes, 1971), or carbon dioxide (CO2), termed *carboxylation*. Upon carboxylation, one molecule of RuBP is transformed into two molecules of 3-phosphoglycerate. The oxygenation of RuBP yields one molecule of 3-phosphoglycerate and one molecule of phosphoglycolate. To mitigate the inhibition of the photosynthetic pathway by phosphoglycolate (Peterhansel et al., 2013b), this compound is recycled in a series of reactions comprising the photorespiratory pathway. For completeness, **Table 1** includes the enzyme names, Enzyme Commission (EC) numbers, and the corresponding biochemical reactions together with their compartmentation and reversibility, obtained from AraCyc 9.0 (Mueller et al., 2003). Based on the considered reactions, photorespiration can be regarded as uptake of O2 and phosphoglycolate-related evolution of CO2 and ammonia (NH3) (effectively refixed by the glutamine synthetase and the glutamate synthase).

Three photorespiratory bypasses have been experimentally investigated to date, recently reviewed for their benefits and energy balances (Peterhansel et al., 2013a,b). The bypasses of Kebeish et al. (2007) and Maier et al. (2012) were tested in *A. thaliana*, while the bypass of Carvalho et al. (2011) was examined in tobacco. Since GEMs for tobacco have not been assembled to date, we focus on investigating the first two bypasses (see **Table 2** for the corresponding lists of reactions).

The "Kebeish bypass" starts with glycolate and ends with glycerate by considering three reactions in the chloroplast, namely, glycolate dehydrogenase (EC 1.1.99.14), tartronate semialdehyde carboxylase (EC 4.1.1.47), taking glyoxylate and producing CO2 and 2-hydroxy-3-oxopropanoate, as well as 2-hydroxy-3-oxopropionate reductase (EC 1.1.1.60), transforming 2-hydroxy-3-oxopropanoate (and NADH, NADPH, and H<sup>+</sup>) into glycerate (and NAD, NADP) (**Figure 1**). The "Maier bypass" consists of a complete glycolate catabolic cycle, including glycolate oxidase (EC 1.1.3.15), malate synthase (EC 2.3.3.9), transforming glyoxylate (and acetyl-CoA, H2O) into malate (and CoA), and catalase (EC 1.11.1.6) (**Figure 1**), extending earlier promising findings obtained using only the latter two enzymes (Fahnenstich et al., 2008). Malate is decarboxylated to pyruvate by the chloroplastic NADP-malic enzyme (EC 1.1.1.40), while the chloroplastic pyruvate dehydrogenase (PDH, EC 1.2.4.1) converts pyruvate into acetyl-CoA, yielding NADH and one molecule of CO2. By interconversions in the glycolate cycle, one molecule of glycolate is converted into two molecules of CO2. Whereas the "Kebeish bypass" reintroduces three-fourths of the glycerate into Calvin–Benson cycle intermediates, the second bypass operates without recycling of 3PGA [and Calvin–Benson cycle intermediates are depleted (Peterhansel et al., 2013a)].

The aim of our study is to test the extent to which large-scale metabolic models reproduce the experimental observations on engineering the two photorespiratory bypasses. For this purpose, we evaluated three state-of-the-art *A. thaliana* models, and used them to assess the effect of model size and model quality on the accuracy of predictions. In addition, we tested different scenarios with respect to the increase predicted upon enforcing additional biochemical constraints on the ratio between the fluxes of the RuBisCO carboxylation and oxygenation reactions. Because the subcellular concentrations of the participating metabolites cannot be assessed, the directionality of the introduced bypass reactions is uncertain; to avoid introducing bias, we systematically investigated the effect of all combinations of reaction reversibilities. As our main contribution, we demonstrate that the predictions for the increase in biomass upon insertion of the photorespiratory bypasses are in accord with experimental evidence across the studied models only upon the consideration of additional biochemical constraints. Therefore, our findings indicate the need for inclusion of additional constraints when using large-scale plant models for the design of viable metabolic engineering strategies.
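Enumerating all combinations of reaction reversibilities can be sketched as follows for the three reactions of the Kebeish bypass. This is an illustrative sketch only: the reaction identifiers, the symmetric flux bound `V_MAX`, and the encoding of reversibility as a negative lower bound are assumptions, not the settings used in the study.

```python
from itertools import product

# Hypothetical identifiers for the three Kebeish bypass reactions.
KEBEISH_REACTIONS = (
    "glycolate_dehydrogenase",
    "tartronate_semialdehyde_carboxylase",
    "hydroxy_oxopropionate_reductase",
)
V_MAX = 1000.0  # assumed default flux bound, arbitrary units


def reversibility_variants(reaction_ids, v_max=V_MAX):
    """Yield one {reaction: (lower, upper)} dict per reversibility pattern.

    An irreversible reaction gets bounds (0, v_max); a reversible one
    gets (-v_max, v_max). For k reactions there are 2**k patterns.
    """
    for flags in product((False, True), repeat=len(reaction_ids)):
        yield {
            rid: ((-v_max if reversible else 0.0), v_max)
            for rid, reversible in zip(reaction_ids, flags)
        }


variants = list(reversibility_variants(KEBEISH_REACTIONS))
print(len(variants))  # 8
```

Each of the eight bound patterns would then be applied to the bypass reactions before solving the optimization problem, so that no single directionality is imposed a priori.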



*The abbreviations of the compartment names in which the corresponding reaction takes place are as follows: (h), chloroplast; (m), mitochondrion; (p), peroxisome. The compound abbreviations are as follows: carbon dioxide (CO2), oxygen (O2), ribulose-1,5-bisphosphate (RuBP), 3-phosphoglycerate (3PGA), phosphoglycolate (2PG), glycolate (GLC), glyoxylate (GOX), hydrogen peroxide (H2O2), glutamate (Glu), glycine (Gly), α-ketoglutarate (AKG), serine (Ser), ammonia (NH3), 3-hydroxypyruvate (HPA), glycerate (GA), glutamine (Gln), water (H2O), phosphate (Pi), lipoylprotein (LPL), aminomethyldihydrolipoylprotein (amDHP), tetrahydrofolate (THF), 5,10-methylenetetrahydrofolate (M-THF), dihydrolipoylprotein (DHP), oxidized nicotinamide adenine dinucleotide (NAD), reduced nicotinamide adenine dinucleotide (NADH), adenosine diphosphate (ADP), adenosine triphosphate (ATP), ferredoxin (Fd). While EC 1.1.1.81 also operates in the cytosol (Timm et al., 2008), this function is included only in the peroxisome of the considered models; hence, the compartment given for this enzyme is (p).*

TABLE 2 | Enzymes and reactions of the photorespiratory bypasses of Kebeish et al. (2007) and Maier et al. (2012) in the chloroplast.


*Carbon dioxide (CO2), oxygen (O2), glycolate (GLC), glyoxylate (GOX), hydrogen peroxide (H2O2), glycerate (GA), 2-hydroxy-3-oxopropanoate (HOP), malate (Mal), pyruvate (Pyr), coenzyme A (CoA), acetyl-coenzyme A (acetyl-CoA), oxidized nicotinamide adenine dinucleotide (NAD), and reduced nicotinamide adenine dinucleotide (NADH).*

## MATERIALS AND METHODS

#### Models and Bypasses

The analysis is based on three large-scale models of *A. thaliana* metabolism: (1) a bottom-up (i.e., experiment-driven) reconstruction, where the operability of the incorporated reactions and metabolites is ensured by starting from well-documented and necessary biochemical pathways (Arnold and Nikoloski, 2014), (2) AraGEM, the first Arabidopsis genome-scale metabolic network, including primary metabolism of compartmentalized plant cells (de Oliveira Dal'Molin et al., 2010), and (3) a larger model that includes both primary and secondary metabolisms (Mintz-Oron et al., 2012). While the last two models come with a single biomass reaction, the smaller model of Arnold and Nikoloski considers three biomass reactions corresponding to biomass compositions that pertain to realistic and frequently examined scenarios: carbon-limiting, nitrogen-limiting, and optimal growth conditions (Kleessen et al., 2012). These biomass reactions were assembled by considering the composition of 1 g dry weight of Arabidopsis leaf under the respective conditions: growth under optimal condition reflects only, if any, light limitation under autotrophic conditions; nitrogen-limitation is based on a protocol that results in a mild but sustained restriction of growth upon restriction of nitrogen availability, while the carbon limitation is experimentally realized via short-day conditions (8:16 light–dark cycle) (Arnold and Nikoloski, 2014). The functional differences of these models have already been compared with respect to the ability to simulate photoautotrophic growth, number of blocked reactions, and flux coupling of reactions (Arnold and Nikoloski, 2014). This systematic comparative analysis indicated the suitability and added value of the bottom-up reconstruction, in which every reaction can carry flux, despite the greater degree of flux coupling, and, hence, more constrained flux space.

The photorespiratory bypasses were introduced by adding the corresponding reactions to the chloroplast of the three models: glycolate dehydrogenase, tartronate semialdehyde carboxylase, and 2-hydroxy-3-oxopropionate reductase for the Kebeish bypass; glycolate oxidase, malate synthase, and catalase for the Maier bypass. Since AraGEM does not contain glyoxylate or hydrogen peroxide in chloroplasts, we introduced the bypasses using the corresponding compounds from the cytosol (which is equivalent to introducing a transport reaction for allowing the exchange of these metabolites between the cytosol and chloroplast). The same approach was used for 2-hydroxy-3-oxopropanoate in the model by Mintz-Oron et al., which contains this metabolite only in the cytoplasm. Moreover, AraGEM and the model of Arnold and Nikoloski do not contain 2-hydroxy-3-oxopropanoate in any compartment. We, therefore, added the metabolite to the chloroplasts to allow introducing the Kebeish bypass. Note that this leads to a full coupling (i.e., fixed flux ratios) between the tartronate semialdehyde carboxylase and 2-hydroxy-3-oxopropionate reductase reactions in these models, since 2-hydroxy-3-oxopropanoate can only be produced and consumed by these reactions.
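The full coupling noted above follows directly from the steady-state balance of a metabolite that takes part in exactly two reactions. The sketch below illustrates this on a toy chloroplast fragment. The reaction identifiers, the dictionary layout, and the stoichiometric coefficients (with cofactors partly omitted) are assumptions made for illustration and are not copied from Table 2 or from the models.

```python
from fractions import Fraction

# Toy fragment: reaction id -> {metabolite: stoichiometric coefficient}.
# The suffix _h marks the chloroplast; all coefficients are illustrative.
base_model = {
    "PGP": {"2PG_h": -1, "GLC_h": 1},  # phosphoglycolate -> glycolate
}
kebeish_bypass = {
    "GDH": {"GLC_h": -1, "GOX_h": 1},                # cofactors omitted
    "TSC": {"GOX_h": -2, "HOP_h": 1, "CO2_h": 1},
    "HOPR": {"HOP_h": -1, "NADH_h": -1, "GA_h": 1, "NAD_h": 1},
}


def add_reactions(model, new_reactions):
    """Return a copy of `model` extended by `new_reactions`."""
    return {**model, **new_reactions}


def coupling_through(model, metabolite):
    """If exactly two reactions touch `metabolite`, return (r1, r2, ratio).

    The steady-state balance c1*v1 + c2*v2 = 0 fixes ratio = v2/v1 = -c1/c2.
    Returns None when the metabolite has fewer or more than two reactions.
    """
    users = [(rid, st[metabolite]) for rid, st in model.items()
             if metabolite in st]
    if len(users) != 2:
        return None
    (r1, c1), (r2, c2) = users
    return r1, r2, Fraction(-c1, c2)


model = add_reactions(base_model, kebeish_bypass)
print(coupling_through(model, "HOP_h"))
```

In this toy fragment, 2-hydroxy-3-oxopropanoate (HOP) appears only in the two added reactions, so their fluxes are locked at a ratio of one, whichever flux distribution the optimization selects.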

#### Flux Balance Analysis

Constraint-based modeling investigates the solution space of feasible flux distributions $v \in \mathbb{R}^n$ for *n* biochemical reactions in a metabolic network that is assumed to operate in a (quasi) steady state, i.e., the concentrations $x \in \mathbb{R}^m$ of the *m* metabolites in the network are constant (Varma and Palsson, 1993a,b; Bordbar et al., 2014). The steady-state condition can be written as $\frac{dx}{dt} = Sv = 0$, whereby the stoichiometric matrix $S \in \mathbb{R}^{m \times n}$ captures the stoichiometry of all reactions. Upper and lower boundaries of the flux vector, $v_{\min} \le v \le v_{\max}$, further constrain the solution space and are used to model physiologically relevant scenarios (e.g., limit on nutrient import, reaction reversibility, and environmental conditions). The steady-state condition and the flux boundaries determine the solution space that usually contains infinitely many flux distributions, since the system of linear equations $Sv = 0$ is, in practice, underdetermined. To probe the functionality of the network, FBA (Varma and Palsson, 1993a,b; Bordbar et al., 2014) assumes that metabolic behavior is guided by some optimization principles (e.g., optimal biomass yield). Here, the objective is to maximize the flux through the biomass reaction, $v_{\text{biomass}}$, and the resulting optimization problem yields a linear program:

$$\begin{aligned} \max\ z &= v_{\text{biomass}} \\ \text{s.t.}\quad Sv &= 0 \\ v_{\min} &\le v \le v_{\max}. \end{aligned} \tag{1}$$
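To illustrate the structure of the linear program in Eq. 1, the following sketch solves it for a hypothetical five-reaction toy network using only the Python standard library. It enumerates candidate vertices of the feasible region, which is workable only for very small networks; it is not the TOMLAB/CPLEX setup used in this study, and the network, bounds, and function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product


def solve_square(A, b):
    """Solve A x = b exactly (Gauss-Jordan); return None if A is singular."""
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def fba(S, lb, ub, objective):
    """Maximise v[objective] s.t. S v = 0 and lb <= v <= ub (toy sizes only).

    Fixes n - m fluxes at a bound, solves for the remaining m fluxes, and
    keeps the best feasible candidate.
    """
    m, n = len(S), len(S[0])
    best = None
    for fixed in combinations(range(n), n - m):
        free = [j for j in range(n) if j not in fixed]
        A = [[S[i][j] for j in free] for i in range(m)]
        for values in product(*[(lb[j], ub[j]) for j in fixed]):
            b = [-sum(S[i][j] * x for j, x in zip(fixed, values))
                 for i in range(m)]
            sol = solve_square(A, b)
            if sol is None:
                continue
            v = [Fraction(0)] * n
            for j, x in zip(fixed, values):
                v[j] = Fraction(x)
            for j, x in zip(free, sol):
                v[j] = x
            feasible = all(lb[j] <= v[j] <= ub[j] for j in range(n))
            if feasible and (best is None or v[objective] > best[objective]):
                best = v
    return best


# Toy network: uptake -> A; A -> B; A -> C; B -> biomass; C -> export.
S = [[1, -1, -1, 0, 0],   # metabolite A
     [0, 1, 0, -1, 0],    # metabolite B
     [0, 0, 1, 0, -1]]    # metabolite C
lb = [0, 0, 0, 0, 0]
ub = [10, 1000, 1000, 1000, 1000]  # uptake limited to 10 (arbitrary units)
v_opt = fba(S, lb, ub, objective=3)
print([int(x) for x in v_opt])  # [10, 10, 0, 10, 0]
```

In this toy case the limited uptake is routed entirely to the biomass reaction, and the competing branch through C carries no flux at the optimum.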

#### Flux Variability Analysis

The solution of the linear programming problem in Eq. 1 is the maximum flux value of the biomass reaction, denoted by *z*\*. Flux variability analysis (FVA) determines the minimum and the maximum flux that a given reaction can carry while ensuring maximum flux through the biomass reaction. These values can be obtained by solving the following linear program for a given reaction *i*:

$$\begin{aligned} \max\,(\min)\; & v_i \\ \text{s.t.}\quad S\boldsymbol{v} &= \mathbf{0} \\ \boldsymbol{v}_{\min} &\le \boldsymbol{v} \le \boldsymbol{v}_{\max} \\ v_{\text{biomass}} &= z^{*}. \end{aligned} \tag{2}$$
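The two-step procedure of Eq. 2 can be sketched as follows, again on a hypothetical toy network rather than on the published models. The network has two parallel routes, so the individual route fluxes are not uniquely determined at the optimum. The names `minimize`, `unit`, and `fva_bounds` are illustrative, and the tiny slack on the biomass constraint is an assumption added only to guard against round-off.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network with two parallel routes from A to B:
#   uptake: -> A   route1: A -> B   route2: A -> B   biomass: B ->
S = np.array([
    [1.0, -1.0, -1.0,  0.0],   # metabolite A
    [0.0,  1.0,  1.0, -1.0],   # metabolite B
])
names = ["uptake", "route1", "route2", "biomass"]
bounds = [(0.0, 10.0), (0.0, 1000.0), (0.0, 1000.0), (0.0, 1000.0)]
biomass = names.index("biomass")


def minimize(c, bounds):
    """Minimize c . v subject to S v = 0 and the given flux bounds."""
    res = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]), bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.fun


def unit(i):
    """Objective vector that selects flux i."""
    c = np.zeros(len(names))
    c[i] = 1.0
    return c


# Step 1 (Eq. 1): optimal biomass flux z*.
z_star = -minimize(-unit(biomass), bounds)

# Step 2 (Eq. 2): pin the biomass flux at z* (tiny slack against round-off),
# then minimize and maximize every flux in turn.
fva_bounds = list(bounds)
fva_bounds[biomass] = (z_star - 1e-9, z_star)
ranges = [(minimize(unit(i), fva_bounds), -minimize(-unit(i), fva_bounds))
          for i in range(len(names))]

for name, (low, high) in zip(names, ranges):
    print(f"{name}: [{low:.3f}, {high:.3f}]")
```

Here the uptake and biomass fluxes are fixed at the optimum, whereas each of the two routes can carry anything between zero and the full flux, which is the kind of interval FVA reports.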

#### Flux-Sum

At steady state, the net rate of metabolite consumption and production is zero, but the metabolite's turnover is not. The flux-sum φ<sub>*i*</sub> of internal metabolite *i*, used as a proxy for metabolite turnover, is defined as half the sum of the absolute fluxes of all reactions *j* around metabolite *i*, φ<sub>*i*</sub> = 0.5 ∑<sub>*j*</sub> |*S<sub>ij</sub>v<sub>j</sub>*| (Chung and Lee, 2009). The flux-sum at the optimum biomass *z*\* from Eq. 1 is determined by the linear program:

$$\begin{aligned} \max\,(\min)\; & 0.5 \sum_{j=1}^{n} \left| S_{ij} v_j \right| \\ \text{s.t.}\quad S\boldsymbol{v} &= \mathbf{0} \\ \boldsymbol{v}_{\min} &\le \boldsymbol{v} \le \boldsymbol{v}_{\max} \\ v_{\text{biomass}} &= z^{*}. \end{aligned} \tag{3}$$
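For a single, given flux distribution, the flux-sum is a direct evaluation of the definition above. The following minimal sketch uses a made-up stoichiometric matrix and flux vector; it only evaluates the flux-sum for one steady state and does not perform the optimization over all optimal flux distributions in Eq. 3.

```python
def flux_sum(S, v):
    """Return 0.5 * sum_j |S_ij * v_j| for every metabolite i (row of S)."""
    return [0.5 * sum(abs(s_ij * v_j) for s_ij, v_j in zip(row, v)) for row in S]


# Hypothetical toy network: uptake -> A, A -> B, B -> biomass, A -> waste.
S = [
    [1, -1,  0, -1],   # metabolite A
    [0,  1, -1,  0],   # metabolite B
]
v = [10, 6, 6, 4]      # a steady state: production equals consumption for A and B

print(flux_sum(S, v))  # [10.0, 6.0]
```

Metabolite A is produced at rate 10 and consumed at rate 6 + 4, so its turnover is 10 even though its net rate of change is zero.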

#### Implementation

The models were obtained from the respective publications in which they were first analyzed. All analyses were carried out with the optimization platform TOMLAB version 8.1 for MATLAB R2015a, using the CPLEX solver with default parameters. All models used, with the irreversible variants of the introduced bypasses, are provided as Datasheet S1 in Supplementary Material. The flux-sum was implemented with TOMLAB version 8.1 using the SNOPT solver and default parameters.

#### RESULTS

Our analysis includes three scenarios to compare and contrast the predictions from the models of *A. thaliana* metabolism with respect to actual biomass increase upon insertion of the two photosynthetic bypasses. In Scenario A, we used the models with the default flux boundaries and employed FBA to determine the extent of the increase in the optimal flux through the biomass reaction upon introduction of the bypass reactions (see Materials and Methods). The predictions from Scenario A may not be realistic, since the optimal biomass may not correspond to a flux distribution in which the ratio of the fluxes of the carboxylation and the oxygenation reactions catalyzed by RuBisCO is physiologically plausible. We therefore considered Scenarios B and C, in which the optimal flux through the biomass reaction is determined with the additional constraint that *v*<sub>carb</sub> = ε*v*<sub>oxy</sub>, with ε = 1.5 and ε = 4, respectively, for those cases from Scenario A where an increase in biomass was predicted. The values for the ratios were selected to match observations about the range for the rate of photorespiration estimated either from labeling experiments or from gas exchange data (Sharkey, 1988; Szecowka et al., 2013; Ma et al., 2014; Heise et al., 2015).
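To make the ratio constraint concrete, write the fixed carboxylation-to-oxygenation ratio as ε (a ratio of 3:2 gives ε = 1.5, and 4:1 gives ε = 4). The constraint is linear in the fluxes, so it can be appended to the steady-state equations of Eq. 1 as one additional row without changing the type of the optimization problem. The share of RuBisCO flux going to oxygenation follows directly:

$$\begin{aligned} v_{\text{carb}} - \varepsilon\, v_{\text{oxy}} &= 0 \\ \frac{v_{\text{oxy}}}{v_{\text{carb}} + v_{\text{oxy}}} &= \frac{1}{1 + \varepsilon} \\ \varepsilon = 1.5:\quad \frac{1}{1 + 1.5} &= 0.4 \\ \varepsilon = 4:\quad \frac{1}{1 + 4} &= 0.2 \end{aligned}$$

Under this reading, Scenario B assigns 40% and Scenario C assigns 20% of the total RuBisCO flux to the oxygenation reaction.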

Since AraGEM and the model of Arnold and Nikoloski already include the glycolate oxidase, we introduced only the malate synthase and the catalase reaction in the chloroplast to simulate the Maier bypass. For both bypasses, we determined and compared the maximum biomass yield, obtained by FBA, with and without each of the bypasses.

In addition, by applying FVA (see Materials and Methods), we determined the interval of flux values that the reactions in the photorespiratory pathway (**Table 1**) can take at the optimum for each of the three scenarios. Comparison of the intervals allowed us to test the hypothesis that upon insertion of the bypass the photorespiratory flux is reduced, but not completely diverted, in comparison to the case without bypass. Validating the hypothesis would imply that the predictions from the large-scale models are in line with experimental observations suggesting a two- to fivefold reduction of photorespiratory flux (Peterhansel et al., 2013a) upon insertion of the considered bypasses. The results obtained from the three models under the different scenarios are summarized in **Table 3**.

#### Scenario A – No Constraints on the Flux Ratio of Carboxylation and Oxygenation Reactions

We first considered the core model of *A. thaliana* with the three biomass reactions for photoautotrophic growth: carbon-limiting, nitrogen-limiting, and optimal growth conditions (Arnold and Nikoloski, 2014). The carbon-limiting scenario pertains to short-day conditions, where energy efficiency of carbon fixation may be more limiting for growth than under long-day conditions. Incidentally, this is the condition under which the advantage of both investigated bypasses was detectable (Peterhansel et al., 2013b).

Under the assumption that all of the inserted reactions are irreversible, our results showed that no biomass increase could be predicted for any of the considered environments following the insertion of the Maier bypass. By contrast, by allowing the malate synthase to act as a reversible reaction, the model predicted a very small increase (<0.03%) over the three growth conditions (Table S1 in Supplementary Material). Reversibility of the other two reactions (glycolate oxidase and catalase) does not affect the predicted biomass yield. This finding implies that the insertion of the Maier bypass in the model of Arnold and Nikoloski had a negligible effect on biomass increase. The Kebeish bypass with any configuration of reversibilities of the three individual reactions did not result in an increase of biomass, similar to the Maier bypass scenario above (see Table S4 in Supplementary Material, increase <0.02%).
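Testing every configuration of reversibilities amounts to enumerating, for each bypass reaction, whether its lower flux bound is zero (irreversible) or negative (reversible). The sketch below illustrates this enumeration for the three Maier bypass reactions; the bound value `V_MAX` and the function name are hypothetical and are not taken from the models or the original implementation.

```python
from itertools import product

V_MAX = 1000.0  # hypothetical default flux bound, not taken from the models


def reversibility_configs(reactions, v_max=V_MAX):
    """Yield one {reaction: (lower, upper)} dict per reversibility pattern."""
    for pattern in product((False, True), repeat=len(reactions)):
        yield {
            name: ((-v_max if reversible else 0.0), v_max)
            for name, reversible in zip(reactions, pattern)
        }


maier = ["glycolate oxidase", "malate synthase", "catalase"]
configs = list(reversibility_configs(maier))
print(len(configs))  # 8 configurations for three reactions
```

Each configuration would then be passed as flux bounds to the FBA problem of Eq. 1, and the resulting optimal biomass yields compared.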

The AraGEM model with all irreversible reactions for the Maier bypass did not result in an increase in biomass. However, the insertion of all reversible reactions led to an increase of 0.6% for the single biomass reaction that the model includes. The FVA indicates that the malate synthase carries a negative (net) flux (Table S5 in Supplementary Material). The insertion of the Kebeish bypass resulted in an increase of 0.3%, see Table S6 in Supplementary Material. However, both results pertain to the unrealistic scenario in which the RuBisCO oxygenase practically does not carry flux, which was revealed by the FVA.

TABLE 3 | Predicted percentage increase in biomass yield of the three models when introducing the Maier and Kebeish bypasses under the three analyzed scenarios.


*irrev indicates that all reactions of the bypass are considered irreversible; MS rev and TS rev indicate that only malate synthase and tartronate semialdehyde carboxylase, respectively, are considered reversible; rev indicates that all reactions of the bypass are considered reversible. This includes each configuration of reaction reversibilities, which leads to an increase in predicted growth with respect to a more constrained configuration.*

We next investigated the larger model of Mintz-Oron et al., which also includes the glycolate oxidase, but only in the cytoplasm, mitochondria, and peroxisome. Upon introducing the Maier bypass, the predicted increase in the optimal biomass yield was 1.0% with every combination of reaction reversibilities. By inspecting the results from FVA, we found that the increase in biomass yield was associated, as one would anticipate, with larger flux ranges. The only reaction that had a fixed value at the optimum biomass with and without inclusion of the bypass was glutamine synthetase; its flux was more than fivefold reduced, lending support for the hypothesis of reduced photorespiration by Peterhansel et al. (2013a). These findings were invariant over all combinations of reaction reversibilities (Table S2 in Supplementary Material).

Inserting the Kebeish bypass into the model of Mintz-Oron et al. resulted in a 0.4% increase of biomass with all reactions considered irreversible. Allowing for reversibility of tartronate semialdehyde carboxylase while keeping the other two reactions irreversible resulted in a maximum increase of 0.8%. Glutamine synthetase was, again, the only reaction that had a fixed value at the optimum biomass with and without inclusion of the bypass; its flux was reduced by more than 1.2-fold (Table S3 in Supplementary Material).

#### Scenario B – Constrained Flux Ratio of Carboxylation and Oxygenation Reactions (3:2)

In this section, we repeated the analysis under the additional assumption that the flux ratio of RuBisCO carboxylase and oxygenase reactions was fixed to 3:2. The Maier bypass in the model of Arnold and Nikoloski did not predict an increase in biomass in any of the three environments. On the other hand, the Kebeish bypass resulted in a predicted biomass increase of 6.2% for optimal, 6.2% for carbon-limited, and 6.3% for nitrogen-limited growth conditions. We also inspected the change in flux variability ranges in the model upon insertion of the Kebeish bypass. As in Scenario A, the fluxes of the reactions in the photorespiratory pathway at the optimal biomass yield with the additional ratio constraint vary in a small range. Upon insertion of the bypass, the upper bounds for the photorespiratory flux were lowered by ~10% (Table S4 in Supplementary Material).

Scenario B with AraGEM resulted in a biomass increase of 2.7% for the Maier bypass with all reactions reversible. Here too, we found that the increase in biomass yield was associated with a negative net flux for the malate synthase reaction (Table S5 in Supplementary Material). The increase for the Kebeish bypass was 1.1% for the three reversibility configurations listed in **Table 3**.

Scenario B with the model of Mintz-Oron et al. resulted in the same increase in biomass yield as in Scenario A, above, for the Maier bypass. Here too, we found that the increase in biomass yield was associated with larger flux ranges. The reaction catalyzed by glutamine synthetase was the only one that showed a fixed value at the optimum biomass with and without inclusion of the bypass; its net flux shifted in the direction of synthesizing glutamine (more than fivefold decrease). As in Scenario A, these findings were invariant over all combinations of reaction reversibilities (Table S2 in Supplementary Material). The results for the Kebeish bypass were largely unchanged in comparison to those of Scenario A (Table S3 in Supplementary Material).

#### Scenario C – Constrained Flux Ratio of Carboxylation and Oxygenation Reactions (4:1)

In this section, we repeated the analysis under the assumption that the flux ratio of RuBisCO carboxylase and oxygenase reactions was fixed to 4:1. The model of Arnold and Nikoloski predicted an increase in biomass of 0.1% for all three reversibilities for the Maier bypass under nitrogen-limiting growth conditions. The Kebeish bypass resulted in an increase of 3.0% for optimal, 3.0% for carbon-limiting, and 3.9% for nitrogen-limiting growth conditions, accompanied by an increase in flux variability bounds (Table S4 in Supplementary Material). AraGEM resulted in an increase of 0.1% with the Kebeish pathway and 1.6% for the Maier bypass, while the model of Mintz-Oron et al. resulted in the same predictions as in Scenario B for both bypasses (Tables S2, S3, S5, and S6 in Supplementary Material). As in Scenario B, above, glutamine synthetase in the model of Arnold and Nikoloski shows a constant value at the optimum biomass upon the insertion of the Kebeish pathway, which is at least 20% smaller than the value at the optimum biomass for the wild type.

Our earlier analysis demonstrated that consistent predictions can only be made for the Kebeish pathway with the three models. We also showed that this was the case for the model of Arnold and Nikoloski with biomass functions for three different scenarios. Since this model was reconstructed entirely from experimental data, lacks any blocked reactions, and gives the best quantitative match between the predictions and observations from the bypasses, the remaining analyses will consider only this model with and without the Kebeish pathway. Specifically, in the following we focus on (1) biomass increase and (2) variability of CO2 uptake upon varying the carboxylation to oxygenation ratio as well as (3) the comparison of co-factor flux-sums (as proxies for turn-over) between the wild type and transformants (i.e., upon insertion of the bypass).

#### Biomass Increase at Varying Values for the Carboxylation to Oxygenation Ratio

Due to the observed differences in increase of biomass for the model of Arnold and Nikoloski upon insertion of the Kebeish bypass at two different fixed values of carboxylation to oxygenation ratios (i.e., 1.5 and 4), we expanded the range of investigated ratios to values between 1 and 100. This modeling scenario would account for differences in the internal CO2 concentration, resulting from the bypass insertion and manifesting itself in increases of carboxylation rate (Kebeish et al., 2007; Peterhansel et al., 2013a). Considering the wild type (**Figure 2A**), we observed, expectedly, an increase in biomass with the increase in the ratio between the carboxylation and oxygenation. The largest increase in biomass (~8%) for the transformants is observed at the smallest tested value for the ratio of 1. In each case, the largest difference between the three biomass functions was observed for ratios from 2.5 to 5, with an increase for the nitrogen-limiting biomass that outperforms the increase for the optimal and the carbon-limiting biomass functions. Interestingly, this region includes the previously tested value for the ratio of 4:1, which was derived from external gas exchange and labeling measurements [see the discussion in Heise et al. (2015)]. These observations held for the three cases of reversibilities considered in the Scenarios A–C, above (see **Figures 2B–D** and **Table 3**).

FIGURE 2 | Biomass increase at varying values for the carboxylation to oxygenation ratio. For the three investigated biomass functions, (A) shows the optimal biomass yield obtained for varying carboxylation (c) to oxygenation (o) ratios, (B–D) show the fraction of transformant to wild-type biomass yield upon introduction of the Kebeish bypass with all reactions irreversible (irrev), the Kebeish bypass with reversible tartronate semialdehyde carboxylase (TS) and the Kebeish bypass with all reactions considered as reversible (rev).

TABLE 4 | Minimum and maximum CO2 uptake and exchange flux at optimum biomass for wild type and transformant (T), including the irreversible Kebeish pathway for carboxylation to oxygenation (c/o) ratios of 3:2 and 4:1.

*A, flux range for CO2 uptake from environment into cytosol; B, CO2 exchange flux from cytosol to chloroplast; and C, ranges for CO2 exchange flux from cytosol to the mitochondrion. Results are presented for the biomass function derived from the optimal growth conditions (Arnold and Nikoloski, 2014).*

#### Biomass Increase Is Associated with Differences in CO2 Uptake and Compartmentalization at Optimal Biomass Yields

In addition, we tested whether a higher CO2 concentration in the chloroplast could explain the growth increase observed by introduction of the Kebeish bypass (Kebeish et al., 2007). To this end, we analyzed the flux variability of CO2 import into the cytosol, the exchange of CO2 between the cytosol and the chloroplast, as well as the exchange of CO2 between the cytosol and the mitochondrion. We found that, upon insertion of the Kebeish bypass, the flux of CO2 into the chloroplast is reduced by 30% under Scenario B (carboxylation/oxygenation ratio of 3:2) and 6 to 9% for Scenario C (carboxylation/oxygenation ratio of 4:1) with all reversibilities examined above (**Table 4** and Table S7 in Supplementary Material). From these results, we concluded that less CO2 needs to be imported from the cytosol, likely due to a higher CO2 concentration resulting from its release from the bypass. In line with a shift of CO2 release from mitochondria to the chloroplast, we observed a decrease in CO2 release from the mitochondrion into the cytosol. The uptake of CO2 from the environment into the cytosol, however, was increased. The latter result is in accordance with the experimentally determined increase in the apparent rate of CO2 assimilation in bypass transformants (Kebeish et al., 2007).

#### Biomass Increase Is Accompanied by Flux-Sum Differences in Co-Factors ATP, NADH, and NADPH

The flux-sum for a metabolite (see Materials and Methods) can be regarded as a measure for the overall flux through a metabolic pool at a feasible steady state (Chung and Lee, 2009). It has recently been applied in understanding changes in maize metabolism under different nitrogen conditions (Simons et al., 2014). Here, we were interested in comparing the range of the flux-sum values at the optimum biomass yield, while additionally enforcing either one of the previously considered values for the carboxylation-oxygenation ratios (3:2 and 4:1). We focused on interpreting the changes in compartment-specific flux-sum differences in co-factors ATP, NADH, and NADPH only in the cases where the ranges of their flux-sum in the transformants do not overlap with the ranges of the flux-sum in the wild type. In other words, we interpreted a change in turn-over that was valid in every possible steady state (with the imposed constraints). Therefore, we can conclude that the observed increase in the biomass is clearly associated with the difference in flux-sum values for these metabolites.

More specifically, we obtained a consistent increase in the flux-sum of ATP in the chloroplast and mitochondrion across all biomass functions and for both carboxylation to oxygenation ratios (Table S8 in Supplementary Material). In fact, the flux-sum of the chloroplastic ATP increased by at least 7% in Scenario B and 3.5% in Scenario C (the percentage is determined from the maximum flux-sum in the wild type and the minimum flux-sum in the transformant). For the mitochondrial ATP, the increase was 85% in Scenario B and 143% in Scenario C. Moreover, we observed that in the transformants the flux-sum of the peroxisomal NADH is fixed to 0 at the optimal biomass. The latter implies that NADH is neither consumed nor produced by any of the modeled reactions in the peroxisome. For the remaining metabolites, while we observe changes in the ranges of the flux-sum, they overlap between the transformant and the wild type, and hence no conclusive statement can be made (Töpfer et al., 2015).
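The comparison rule used here (interpret a change only when the flux-sum ranges do not overlap, and report the increase from the wild-type maximum to the transformant minimum) can be sketched as below. The function names and the numerical ranges are made up for illustration and are not values from Table S8.

```python
def ranges_overlap(wild_type, transformant):
    """True if the two closed (min, max) intervals share at least one value."""
    wt_low, wt_high = wild_type
    t_low, t_high = transformant
    return wt_low <= t_high and t_low <= wt_high


def guaranteed_increase_percent(wild_type, transformant):
    """Smallest possible increase: transformant minimum vs. wild-type maximum."""
    wt_high = wild_type[1]
    t_low = transformant[0]
    return 100.0 * (t_low - wt_high) / wt_high


# Made-up flux-sum ranges (min, max) at optimum biomass, for illustration only.
wild_type = (90.0, 100.0)
transformant = (107.0, 120.0)

if not ranges_overlap(wild_type, transformant):
    print(guaranteed_increase_percent(wild_type, transformant))  # 7.0
```

Because the percentage compares the most favorable wild-type value with the least favorable transformant value, it is a lower bound on the increase that holds in every steady state satisfying the imposed constraints.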

#### DISCUSSION

The two photorespiratory bypasses considered in our study have recently been investigated by Xin et al. (2015) through kinetic modeling by considering extensions to the model of Zhu et al. (2007). However, this approach cannot be employed to provide predictions about biomass yield, since the kinetic model only considered the core of plant carbon metabolism and, thus, lacks important biomass precursors, such as amino acids and cell wall components. Importantly, the kinetic modeling approach was based on the assumption that the photorespiratory pathway is not operational upon introduction of a bypass pathway, which has not been experimentally confirmed and, thus, precludes the possibility of testing the hypothesis of a reduced photorespiration. With respect to the rate of photosynthesis, the kinetic modeling predicted an increase following insertion of the Kebeish bypass but a decrease following insertion of the Maier bypass. However, gas exchange measurements indicated that the Maier bypass enhances the photosynthetic rate when calculated per mol chlorophyll (Maier et al., 2012). In addition, the predictions based on kinetic modeling depend on the type of enzyme kinetics, the values of the respective parameters, reversibility, and number and type of effectors. These are usually unknown, but will affect the predictions from the insertion of the bypasses and the reference state (i.e., fluxes and concentrations) used in the comparison.

In contrast to kinetic modeling, the constraint-based approach relies only on the assembled stoichiometry together with the assumption that the organism optimizes a particular objective – here biomass yield. While constraint-based modeling cannot make predictions about concentrations of intermediates without additional assumptions (Töpfer et al., 2015), using this approach one can make predictions of higher level phenotypes (e.g., biomass yield), as well as the fluxes associated with these phenotypes, based only on the metabolic reactions. To mitigate the effect of the model size and different reconstruction strategies on the predictions, we conducted a comparative analysis involving three state-of-the-art metabolic reconstructions of *A. thaliana*. We performed comparison with respect to increase in biomass yield and difference in the flux ranges for reactions involved in the photorespiratory pathway (**Table 1**) upon insertion of the bypasses. Moreover, unlike the kinetic modeling approach by Xin et al. (2015), we did not block photorespiratory flux, but instead left its flux unconstrained, which allowed predicting the effect of the bypasses on photorespiration with respect to optimal biomass yield. For instance, in Scenarios A and B for the model of Mintz-Oron et al., we found that the flux through the glutamine synthetase (which was constant at the optimum biomass with and without the bypass) was fivefold reduced upon introduction of the Maier bypass. Moreover, in Scenario B for the model of Arnold and Nikoloski, we found that the introduction of the Kebeish bypass lowered the upper bounds for the photorespiratory flux by 10%. Therefore, this constraint-based modeling approach allowed us to arrive at the prediction that the reduction of flux through the photorespiratory pathway, suggested by the experiments, indeed, holds.

Our results demonstrated that the constraint-based modeling approach predicts an increase in biomass upon insertion of the bypasses without the need for extensive model parameterization. Furthermore, this approach allowed us to test the effects of all possibilities for reaction reversibility in the bypass and the reactions involved in the photorespiratory pathway. However, we concluded that the qualitative match (i.e., increase in biomass yield of up to 6.22%) between the predictions and experimental evidence can be obtained only by further constraining the optimal states using experimental observations (namely, the ratio of RuBisCO carboxylation to oxygenation rates). Therefore, our findings indicate that predictions of metabolic engineering strategies are tightly bound to the reference state used, and bring to question approaches which attempt pathway engineering without considering reaction rates in the rest of the network.

In addition, our approach of simultaneous usage of *A. thaliana* metabolic models with different characteristics allows us to test the extent to which the metabolic engineering predictions may differ due to the intrinsic differences between the models. For instance, in the model of Arnold and Nikoloski, all reactions involving metabolic conversions (i.e., all reactions except those involved in maintenance, transport, import, and export processes, as well as biomass production) are annotated and there is experimental evidence for their occurrence in *A. thaliana*. By contrast, such evidence is missing for 21 and 37% of the annotated reactions in AraGEM and the model of Mintz-Oron et al. as a result of their (semi-)automated reconstruction using reaction databases. Moreover, and most importantly, the model of Arnold and Nikoloski alongside AraGEM does not include blocked reactions (i.e., reactions not carrying flux in any steady state). The percentage of blocked reactions in the model of Mintz-Oron et al. is 59%. Due to the inclusion of fewer reactions, the model of Arnold and Nikoloski is, however, less flexible (i.e., has smaller flux variability ranges and higher coupling of reactions) than AraGEM and the model of Mintz-Oron et al. In addition, the model of Arnold and Nikoloski predicts more efficient conversion of CO2 into biomass than AraGEM, at the cost of assimilating more photons. These increased levels of realism, discussed in detail in Arnold and Nikoloski (2014), may explain the better quantitative match between the predictions from the model of Arnold and Nikoloski upon placing physiologically meaningful constraints on the RuBisCO catalyzed reactions.

We attempted to link the increase in biomass to changes in several key molecular properties proposed to be associated with photorespiratory metabolism. To this end, we investigated the change in biomass increase as a function of an expanded range of values for the ratio of the RuBisCO carboxylation and oxygenation reactions. This allowed us to conclude that the increase in growth conferred by the bypasses may be due to the release of CO2 in chloroplasts and the resulting increase in internal CO2 concentration. This finding was further strengthened by simulating the changes in the CO2 transport fluxes between compartments as well as the uptake of CO2 from the environment, which is in line with the experimental evidence. Finally, we found changes in the flux-sum for ATP and NADH at optimum biomass, indicating that their pool sizes are associated with the predicted increase in biomass. Altogether, these molecular properties provide further evidence for the plausibility of the predictions from the constraint-based modeling of the photorespiratory bypasses in Arabidopsis.

Nevertheless, one question remains open: why is the predicted biomass yield much smaller than what was observed in experiments [e.g., ~17–20% for the Maier bypass (Maier et al., 2012)]? The reasons may very well be related to the quality of the underlying models, the modeling strategy taken, and the available experimental evidence: the considered models already incorporate some of the reactions that are integral to the analyzed bypasses. For instance, the model of Arnold and Nikoloski includes a catalase in the chloroplast, which is deemed essential for growth (i.e., its knock-out, by restricting the flux to 0, results in zero biomass yield). Therefore, the effect of inserting the remaining reactions of the bypass may be masked. In addition, experimental evidence indicates that the reference point (i.e., enzyme activity, metabolite levels) leading to biomass values used for gauging the increase may not correspond to the absolute optimum (i.e., optimum with no additional constraints) (Maier et al., 2012). Hence, the predicted optimal growth from the original (unmodified) plant GEMs may overestimate the actual growth rate of the wild-type plants, therefore partially masking the beneficial effect of introducing the bypasses. We have partly attempted to remedy the latter by enforcing particular flux ratios for the RuBisCO carboxylase and oxygenase reactions; however, this is only one of many conceivable options. In addition, none of the experimental studies of the bypasses reported the growth rate of the plants, but instead the increase in biomass between wild type and the transgenic plant at a certain point of time. Therefore, the growth rates (or yields) predicted by the constraint-based modeling framework cannot be directly compared with the end-point cumulative measurements from the experiments.

Further developments of this modeling framework may improve the accuracy of predictions by considering additional constraints from data (e.g., transcriptomics, proteomics, and metabolomics profiles) (Nikoloski et al., 2015). These could be complemented with easy-to-use computational tools for prediction and investigation of intervention strategies. Altogether, our study indicates that constraint-based modeling indeed has the potential to identify target reactions and pathways whose insertion could increase the performance of plant species; however, it also indicates that the consideration of more physiologically meaningful constraints, rather than only simple optimization criteria, leads to predictions in line with existing experimental findings. Our study demonstrated that predictions across models of different sizes and levels of realism can provide additional reinforcement for the validity of plant-growth predictions from the constraint-based modeling framework.

#### AUTHOR CONTRIBUTIONS

GB and ZN designed the research. AK and GB performed the analyses. All authors interpreted the results and wrote the manuscript.

#### FUNDING

GB would like to acknowledge financial support by a Marie Curie Intra-European Fellowship within the 7th European Community Framework Programme and the Max Kade Foundation. ZN and AF would like to acknowledge the financial support of PROMICS, Research Unit 1186 of the German Research Foundation. AK, AF, and ZN would like to thank the Max Planck Society for support.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2016.00031




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Basler, Küken, Fernie and Nikoloski. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Optimization of Engineered Production of the Glucoraphanin Precursor Dihomomethionine in *Nicotiana benthamiana*

*Christoph Crocoll<sup>1,2</sup>, Nadia Mirza<sup>1,2</sup>, Michael Reichelt<sup>3</sup>, Jonathan Gershenzon<sup>3</sup> and Barbara Ann Halkier<sup>1,2</sup>\**

*<sup>1</sup>DNRF Center DynaMo, Department of Plant and Environmental Sciences, Faculty of Science, University of Copenhagen, Frederiksberg, Denmark, <sup>2</sup>Copenhagen Plant Science Center, Department of Plant and Environmental Sciences, Faculty of Science, University of Copenhagen, Frederiksberg, Denmark, <sup>3</sup>Department of Biochemistry, Max Planck Institute for Chemical Ecology, Jena, Germany*

#### *Edited by:*

*Lars Matthias Voll, Friedrich-Alexander-University Erlangen-Nuremberg, Germany*

#### *Reviewed by:*

*Sixue Chen, University of Florida, USA; Judith Becker, Saarland University, Germany*

> *\*Correspondence: Barbara Ann Halkier bah@plen.ku.dk*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 20 November 2015 Accepted: 01 February 2016 Published: 16 February 2016*

#### *Citation:*

*Crocoll C, Mirza N, Reichelt M, Gershenzon J and Halkier BA (2016) Optimization of Engineered Production of the Glucoraphanin Precursor Dihomomethionine in Nicotiana benthamiana. Front. Bioeng. Biotechnol. 4:14. doi: 10.3389/fbioe.2016.00014*

Glucosinolates are natural products characteristic of the Brassicales order, which includes vegetables such as cabbages and the model plant *Arabidopsis thaliana*. Glucoraphanin is the major glucosinolate in broccoli and is associated with the health-promoting effects of broccoli consumption. Toward our goal of creating a rich source of glucoraphanin for dietary supplements, we have previously reported the feasibility of engineering glucoraphanin in *Nicotiana benthamiana* through transient expression of glucoraphanin biosynthetic genes from *A. thaliana* (Mikkelsen et al., 2010). As side-products, we obtained fivefold to eightfold higher levels of chain-elongated leucine-derived glucosinolates, not found in the native plant. Here, we investigated two different strategies to improve engineering of the methionine chain elongation part of the glucoraphanin pathway in *N. benthamiana*: (1) coexpression of the large subunit (*LSU1*) of the heterodimeric isopropylmalate isomerase and (2) coexpression of the *BAT5* transporter for efficient transfer of intermediates across the chloroplast membrane. We succeeded in raising dihomomethionine (DHM) levels to a maximum of 432 nmol g<sup>−1</sup> fresh weight, which is equivalent to a ninefold increase compared to the highest production of this intermediate previously reported (Mikkelsen et al., 2010). The increased DHM production without increasing leucine-derived side-product levels provides new metabolic engineering strategies for improved glucoraphanin production in a heterologous host.

Keywords: dihomomethionine, glucoraphanin, glucosinolates, metabolic engineering, *Nicotiana benthamiana*

## INTRODUCTION

Plants are the source of an immense diversity of natural compounds, many of which are of high value as medicine or health-promoting agents. Often these compounds are difficult or impossible to produce by chemical synthesis, and extraction from plants is the only source.

Epidemiological studies strongly indicate that dietary consumption of cruciferous vegetables (e.g., broccoli) is correlated with a reduced risk of developing cancer (Verkerk et al., 2009). These and other health-promoting effects have been associated with glucosinolates, natural products characteristic of the Brassicales order, which includes vegetables such as broccoli and cabbages and the model plant *Arabidopsis thaliana* (Halkier and Gershenzon, 2006).

Particular attention has been given to the glucosinolate glucoraphanin, which is present in broccoli, as it is generally thought to be the major bioactive compound associated with the cancer-preventive effects of broccoli (Traka and Mithen, 2009; Kensler et al., 2013). A recent human intervention study showed that diets with glucoraphanin-enriched broccoli resulted in retuning of cellular processes in the mitochondria to a basal level that is critical for maintaining a healthy metabolic balance (Armah et al., 2013). These health-promoting effects have resulted in a strong desire to increase the intake of glucoraphanin. The current market is based on products with unreliable amounts of glucoraphanin, if any at all. This has prompted interest in engineering the production of glucoraphanin into a heterologous host to obtain a stable, rich source of this product and enable intake of well-defined doses for dietary and pharmaceutical applications.

As a prerequisite for pathway engineering, all glucoraphanin biosynthetic genes have been identified in the model plant *A. thaliana* (Sonderby et al., 2010). Previously, we have engineered the six core pathway genes of the simple indolyl- and benzylglucosinolates, which are derived directly from the protein amino acids tryptophan and phenylalanine, into the non-cruciferous plant *Nicotiana benthamiana* (Geu-Flores et al., 2009; Pfalz et al., 2011), and for indolylglucosinolates also into yeast (Mikkelsen et al., 2012). Engineering of the complex glucoraphanin pathway presents additional challenges as it consists of 12 biosynthetic enzymes, which are partitioned between the chloroplast and the cytosol (Halkier and Gershenzon, 2006; Sonderby et al., 2010).

Briefly, biosynthesis of glucoraphanin can be divided into three major parts (**Figure 1**). First, methionine is transaminated into an α-keto acid (α-KA) by a cytosolic branched-chain aminotransferase (BCAT4) (Schuster et al., 2006). This α-KA then enters the chloroplast where it undergoes side-chain elongation. The carbon side chain is elongated by a condensation reaction catalyzed by methylthioalkylmalate synthase (MAM) (Textor et al., 2004), followed by isomerization and oxidative decarboxylation catalyzed by an isopropylmalate isomerase (IPMI) and an isopropylmalate dehydrogenase (IPMDH), respectively (He et al., 2010, 2011). Following two cycles of chain elongation, dihomomethionine (DHM) is formed, which is subsequently converted by the cytosolic, ER-associated core structure pathway to 4-methylthiobutyl (4MTOB) glucosinolate (GLS) (**Figure 1**), which finally is *S*-oxygenated to 4-methylsulfinylbutyl (4MSOB) glucosinolate, commonly known as glucoraphanin (**Figure 1**) (Sonderby et al., 2010).

The feasibility of metabolically engineering the complex glucoraphanin pathway was recently shown by transient expression of 10 biosynthetic genes in *N. benthamiana* (Mikkelsen et al., 2010). Though formation of ~50 nmol g<sup>−1</sup> fresh weight (fw) of the chain-elongated methionine-derived glucoraphanin was detected, fivefold to eightfold more of chain-elongated leucine-derived glucosinolates were detected. The latter do not accumulate in native *Arabidopsis* but have been observed under conditions when *MAM3* in the methionine chain elongation pathway was overexpressed using the 35S promoter (Field et al., 2004). Evolutionarily, the recursive methionine chain elongation pathway has evolved from the non-recursive valine to leucine chain elongation pathway in primary metabolism (Field et al., 2004; Textor et al., 2004; Schuster et al., 2006; Binder et al., 2007; He et al., 2010; de Kraker and Gershenzon, 2011). The ability to accumulate chain-elongated leucine-derived glucosinolates upon engineering in tobacco supports the proposed promiscuity of enzymes in specialized metabolism compared with those in primary metabolism (Weng and Noel, 2012). It also indicates that the native plant has evolved mechanism(s) to prevent the formation of these leucine-derived side-products.

Homologs are described for all genes in methionine chain elongation. Coexpression analysis and knockout mutants have helped to identify specific roles of the different homologs. MAM1 was identified as the enzyme catalyzing two condensation reactions (Textor et al., 2004). For IPMDH, three homologs are known, of which IPMDH1 was recently reported to be the best candidate for methionine chain elongation, whereas IPMDH2 and IPMDH3 were involved in leucine biosynthesis (LeuC) (He et al., 2009, 2011, 2013). Nevertheless, IPMDH3 was previously shown to be functional in metabolic engineering of DHM (Mikkelsen et al., 2010). The isomerization reaction catalyzed by IPMI has yet another level of complexity. IPMI is a heterodimeric enzyme consisting of a single large subunit (LSU1) that forms a catalytically active enzyme with any one of three small subunits (SSU1/2/3). The small subunits define in which pathway IPMI is active. SSU2 and SSU3 are generally associated with methionine chain elongation and SSU1 with LeuC (He et al., 2010). Nevertheless, it was recently hypothesized that SSU1 was actively involved in the first two cycles of methionine chain elongation (Imhof et al., 2014). Transport of α-keto acids (formed initially by cytosolic BCAT4 and after each cycle by the chloroplastic IPMDH enzyme) across the chloroplast membranes was suggested to be performed by the bile acid transporter 5 (BAT5) (Gigolashvili et al., 2009). This was based on *bat5* knockout mutants showing a 50% reduction in methionine-derived, aliphatic glucosinolates and impaired transport of α-keto acids into the chloroplast.

Toward our goal of establishing high glucoraphanin production in a heterologous host, optimization of DHM production is essential. General means to enhance product formation in pathway engineering projects include screening for lacking enzymes and for enzymes with improved properties (i.e., substrate specificity, kinetics), alleviating catalytic bottlenecks and increasing flux through the pathway, and taking compartmentalization into account (Heinig et al., 2013). Here, we report optimization of the production of the glucoraphanin precursor DHM in *N. benthamiana*. As reference for comparison, we use the highest-producing gene combination, as previously reported. The increase is obtained by optimizing the combination of biosynthetic genes used. Moreover, we provide additional evidence that BAT5 is the transporter for α-keto acids across the chloroplast membrane system.

#### MATERIALS AND METHODS

#### Plant Material

*Nicotiana benthamiana* plants were grown in small pots of 5.5 cm diameter in a greenhouse at 24°C (day) and 18°C (night) with 50–60% humidity for ~3–4 weeks (to the four- to six-leaf stage).

#### Cloning and Transformation

All genes were cloned into a USER-compatible version of the pCambia3300<sup>1</sup> plasmid by USER cloning (Nour-Eldin et al., 2006; Bitinaite et al., 2007; Geu-Flores et al., 2007). In brief, coding sequences of individual genes were amplified with primers containing a single uracil that were compatible with the USER ready-made plasmid. For PCR primers, see Table S1 in Supplementary Material. PCR products were purified (QIAquick PCR Purification Kit, Qiagen, Hilden, Germany), and 1–5 μL of purified PCR product were subsequently mixed with 1 μL of plasmid. The volume was adjusted to 10 μL and, after addition of 1 μL USER Enzyme (NEB, Ipswich, MA, USA), the mix was incubated at 37 and 25°C for 30 min each. Two microliters of the USER cloning mix were added to 60 μL of chemically competent *E. coli* DHB10 (NEB, Ipswich, MA, USA) cells, which were transformed by heat shock: 10 min on ice and 90 s at 42°C, followed by 2 min on ice. Cells were incubated for 60 min at 37°C after addition of 250 μL LB media. Subsequently, 100 μL were plated on LB agar plates containing kanamycin (50 μg mL<sup>−1</sup>). Correct gene insertions were verified by sequencing (Macrogen Europe, Amsterdam, Netherlands). Constructs with correct insertions were subsequently transformed into *Agrobacterium tumefaciens* strain pGV3850 by electroporation (2 mm cuvette, 2.5 kV, 400 Ω, and 25 μF) in a Bio-Rad GenePulser (Bio-Rad, Hercules, CA, USA). One milliliter of YEP media was added, and cells were incubated for 3 h at 28°C with shaking. Subsequently, 150 μL were plated on YEP agar plates containing antibiotics (30 μg mL<sup>−1</sup> rifampicin and 50 μg mL<sup>−1</sup> kanamycin).

#### Transient Expression by Infiltration of *Nicotiana benthamiana*

Overnight cultures of *A. tumefaciens* carrying the different gene constructs were grown in 10 mL YEP media (containing 50 μg mL<sup>−1</sup> kanamycin and 30 μg mL<sup>−1</sup> rifampicin) at 28°C and 220 rpm. Overnight cultures were harvested by centrifugation at 20°C for 15 min at 4500 × *g*. The cell pellets were resuspended in infiltration buffer (10 mM MgCl2, 10 mM MES, pH 5.6) containing 100 μM acetosyringone (3,5-dimethoxy-4-hydroxyacetophenone, Sigma-Aldrich, Steinheim, Germany) and shaken at 150 rpm for 1–2 h at room temperature prior to plant infiltration. Cell densities for all cultures were adjusted to OD600 ≈ 0.21–0.25, which resulted in a final concentration for each individual construct of OD600 ≈ 0.03 for experiments with chain elongation enzymes only and OD600 ≈ 0.015 for experiments that included chain elongation and core structure enzymes. These low ODs were sufficient to ensure efficient transformation while keeping the stress levels for *N. benthamiana* leaves low. In all experiments, the suppressor protein p19 was included to reduce silencing effects (Voinnet et al., 2003). For each individual experiment, two to three leaves of four *N. benthamiana* plants (3–4 weeks old) were infiltrated with the different combinations of *A. tumefaciens* cultures harboring the different gene constructs. A maximum of seven different *A. tumefaciens* cultures were mixed. The volume of combinations with fewer than seven constructs was adjusted by addition of the respective amount of infiltration buffer.
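The dilution arithmetic behind these densities can be illustrated with a minimal Python sketch. It assumes that every culture is pre-adjusted to the same OD600 and that equal volumes are combined into seven "slots", with unused slots filled by infiltration buffer; the function names, the 1 mL slot volume, and the example construct list are hypothetical and are not taken from the protocol.

```python
def per_construct_od(stock_od: float, n_slots: int = 7) -> float:
    """OD600 contributed by one culture after equal-volume mixing into n_slots."""
    if n_slots < 1:
        raise ValueError("n_slots must be at least 1")
    return stock_od / n_slots


def mixing_plan(constructs, slot_volume_ml: float = 1.0, n_slots: int = 7) -> dict:
    """Volume of each culture plus the buffer needed to fill the unused slots."""
    if len(constructs) > n_slots:
        raise ValueError("more constructs than available slots")
    plan = {name: slot_volume_ml for name in constructs}
    buffer_ml = (n_slots - len(constructs)) * slot_volume_ml
    if buffer_ml > 0:
        plan["infiltration buffer"] = buffer_ml
    return plan


# Hypothetical six-construct mix; the seventh slot is filled with buffer.
example = ["p19", "chl BCAT4", "MAM1", "LSU1", "SSU3", "IPMDH3"]
print(per_construct_od(0.21))   # about 0.03 per construct
print(mixing_plan(example))
```

With a stock OD600 of 0.21 and seven slots, each construct ends up at roughly OD600 0.03, which matches the value given for the chain elongation experiments. Topping up with buffer keeps the per-construct density constant when fewer than seven cultures are combined.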

<sup>1</sup>http://www.cambia.org/

## Plant Material Harvesting and Sample Preparation

Plant material was harvested 5 days after infiltration with *A. tumefaciens*. From each leaf, four leaf disks of 1 cm diameter were harvested from infiltrated areas and weighed. Amino acids were extracted with 400 μL of 85% methanol containing norleucine (10 μM) as internal standard (IS). Amino acid concentrations were determined by comparison to 13C,15N-labeled algal amino acids, as described below.

#### Amino Acid Analysis by LC-MS

The resulting extract was diluted in a ratio of 1:10 (v:v) in water containing the 13C,15N-labeled amino acid mix (Isotec, Miamisburg, OH, USA). Amino acids in the diluted extracts were directly analyzed by LC-MS/MS. The analysis method was modified from a protocol described by Jander et al. (2004). Chromatography was performed on an Agilent 1200 HPLC system (Agilent Technologies, Boeblingen, Germany). Separation was achieved on a Zorbax Eclipse XDB-C18 column (50 mm × 4.6 mm, 1.8 μm, Agilent Technologies, Germany). Formic acid (0.05%) in water and acetonitrile were employed as mobile phases A and B, respectively. The elution profile was 0–1 min, 3% B in A; 1–2.7 min, 3–100% B in A; 2.7–3 min, 100% B; 3.1–6 min, 3% B in A. The mobile phase flow rate was 1.1 mL/min. The column temperature was maintained at 25°C. The liquid chromatography was coupled to an API 5000 tandem mass spectrometer (AB Sciex, Darmstadt, Germany) equipped with a Turbospray ion source operated in positive ionization mode. The instrument parameters were optimized by infusion experiments with pure standards (amino acid standard mix, Fluka, St. Louis, MO, USA). The ionspray voltage was maintained at 5500 V. The turbo gas temperature was set at 700°C. Nebulizing gas was set at 70 psi, curtain gas at 35 psi, heating gas at 70 psi, and collision gas at 2 psi. Multiple reaction monitoring (MRM) was used to monitor analyte parent ion → product ion transitions. MRMs were chosen as in Jander et al. (2004) except for Arg (*m*/*z* 175 → 70) and Lys (*m*/*z* 147 → 84). In addition, MRMs were included for homomethionine (HM, *m*/*z* 164 → 118), DHM (*m*/*z* 178 → 132), and *S*-adenosylmethionine (SAM, *m*/*z* 399 → 136). The chain-elongated leucine products homo-leucine (HL, *m*/*z* 146 → 100), dihomo-leucine (DHL, *m*/*z* 160 → 114), and trihomo-leucine (THL, *m*/*z* 174 → 128) were also monitored, but exact quantification was not possible for DHL and THL due to lack of reference standards. Values for DHL and THL were calculated based on the assumption of an equal response factor of 1 compared to 13C,15N-labeled phenylalanine due to their similar behavior in fragmentation and ionization compared to leucine and HL. Detailed values for mass transitions can be found in Table S2 in Supplementary Material. Both Q1 and Q3 quadrupoles were maintained at unit resolution. Analyst 1.5 software (AB Sciex, Darmstadt, Germany) was used for data acquisition and processing. Linearity in ionization efficiencies was verified by analyzing dilution series of standard mixtures (amino acid standard mix, Fluka + Gln, Asn, and Trp, also Fluka). All samples were spiked with 13C,15N-labeled amino acids (algal amino acids 13C,15N, Isotec, Miamisburg, OH, USA) at a concentration of 10 μg of the mix per milliliter. The concentration of the individual labeled amino acids in the mix had been determined by classical HPLC–fluorescence detection analysis after pre-column derivatization with ortho-phthalaldehyde-mercaptoethanol using external standard curves made from standard mixtures (amino acid standard mix, Fluka + Gln, Asn, and Trp, also Fluka). Individual amino acids in the sample were quantified by the respective 13C,15N-labeled amino acid IS, except for tryptophan and asparagine: tryptophan was quantified using 13C,15N-Phe applying a response factor of 0.42, and asparagine was quantified using 13C,15N-Asp applying a response factor of 1.0.
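As a quick consistency check on the chain-elongated analytes, the transitions quoted above can be tabulated in a short Python sketch. The dictionary and helper names are hypothetical, and only nominal *m*/*z* values from the text are used. Each listed transition shows a 46 Da difference between parent and product ion, which is consistent with the loss of formic acid (H2O + CO) commonly seen for protonated amino acids, and successive homologs differ by 14 Da, as expected for one additional CH2 group per elongation cycle.

```python
# Parent -> product ion m/z pairs (nominal masses) as listed in the text.
MRM = {
    "HM": (164, 118),
    "DHM": (178, 132),
    "HL": (146, 100),
    "DHL": (160, 114),
    "THL": (174, 128),
}


def neutral_loss(transition):
    """Difference between parent and product ion m/z."""
    parent, product = transition
    return parent - product


def homolog_step(shorter, longer):
    """Parent-ion mass difference between two members of a homolog series."""
    return MRM[longer][0] - MRM[shorter][0]


for name, transition in MRM.items():
    print(f"{name}: {transition[0]} -> {transition[1]}, "
          f"neutral loss {neutral_loss(transition)} Da")

print("HM -> DHM:", homolog_step("HM", "DHM"), "Da")
print("HL -> DHL:", homolog_step("HL", "DHL"), "Da")
print("DHL -> THL:", homolog_step("DHL", "THL"), "Da")
```

The Arg, Lys, and SAM transitions follow different fragmentation routes and are therefore left out of this check.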

#### Statistical Analysis

Statistical analysis was performed with the SigmaPlot 12.0 statistics package (Systat Software, San Jose, CA, USA).

#### Accession Numbers

Sequence data from this article can be found *via* the TAIR database<sup>2</sup> under the AGI locus identifiers: *BCAT4* (At3g19710), *RBSC1A* (At1g67090), *BAT5* (At4g12030), *MAM1* (At5g23010), *IPMI-LSU1* (At4g13430), *IPMI-SSU1* (At2g43090), *IPMI-SSU2* (At2g43100), *IPMI-SSU3* (At3g58990), *IPMDH1* (At5g14200), and *IPMDH3* (At1g31180).

## RESULTS AND DISCUSSION

In this study, we identified a new combination of genes for methionine chain elongation that produced the highest level of DHM in *N. benthamiana*. Additionally, we measured how these optimizations affected the formation of chain-elongated leucine-derived side-products.

## Definition of a Reference Value for Optimization of DHM Production

Toward our goal of optimizing DHM production by transient expression experiments in *N. benthamiana*, we chose as reference a gene combination identical to the highest-producing gene combination previously reported (Mikkelsen et al., 2010). In that study, genes were expressed from multi-gene constructs with two or three genes separated by 2A sequences (Mikkelsen et al., 2010). However, because we wanted to compare multiple gene combinations in the current study, we expressed all genes from single gene constructs, as this enabled us to freely combine individual genes. The previously reported gene combination for highest DHM production included a chloroplast-localized *BCAT4* together with *MAM1*, *IPMI-SSU3*, and *IPMDH3* and resulted in the formation of 51.4 ± 20.8 nmol g<sup>−1</sup> fw (Mikkelsen et al., 2010). When we expressed the same genes individually, we obtained only 14.6 ± 4.4 nmol g<sup>−1</sup> fw, which was used as reference value in this study (**Table 1**; **Figures 2A** and **3A**). Several parameters can account for the discrepancy between the values published by Mikkelsen et al. (2010) and the present study. Using single gene constructs increases the number of *Agrobacterium* strains that need to be mixed for coexpression. It has also been reported that constructs containing 2A sequences for self-processing of multi-gene constructs can result in incomplete cleavage and formation of fusion proteins, which can influence the outcome of metabolic engineering from transient expression in plants (Burén et al., 2012). Other potential reasons include differences in growth conditions for the tobacco plants and differences in detection and quantification of the individual compounds by LC-MS between the two studies. In combination, the experimental and technical differences do not allow for a direct comparison of the DHM production. All calculations are therefore based on the DHM amounts produced in the present study by the reference gene combination, which previously resulted in the highest DHM production.

<sup>2</sup>http://www.arabidopsis.org

*Data are represented as mean* ± *SEM in nanomole per gram fresh weight (N* = *8). DHM, dihomomethionine; fw, fresh weight; chl, chloroplastic signal peptide; BCAT4, branched-chain aminotransferase 4; MAM1, methylthioalkylmalate synthase 1; LSU1, large subunit of isopropylmalate isomerase (IPMI); SSU, small subunit of IPMI; IPMDH, isopropylmalate dehydrogenase; Ctrl, control.*

*<sup>a</sup>Reference to highest-producing gene combination previously reported (Mikkelsen et al., 2010) with 51.4 nmol DHM g<sup>−1</sup> fw. For values of all amino acids, see also Table S3 in Supplementary Material.*

FIGURE 2 | Comparison of production of DHM and chain-elongated leucine-derived products. (A) DHM levels. (B) Levels of homo-leucine (HL), dihomo-leucine (DHL), and trihomo-leucine (THL). BCAT4 is relocalized to the chloroplast (chl BCAT4) in combinations A1–A3. The transporter protein BAT5 is coexpressed in combinations A5–A8 and the large subunit (LSU1) of IPMI is coexpressed in all combinations except A1. Ctrl represents non-infiltrated. Chl BCAT4 = BCAT4 with signal peptide for relocation to chloroplast, +LSU1 = combinations where LSU1 was coexpressed, and +BAT5 = combinations where BAT5 was coexpressed. Data are represented as mean ± SEM in nanomole per gram fresh weight (*N* = 8).

#### Coexpression of *Arabidopsis IPMI-LSU1* is Essential for Efficient Formation of DHM by the Methionine Chain Elongation Pathway

In the heterodimeric isopropylmalate isomerase (IPMI), the large subunit (LSU1) forms a functional enzyme with one of three small subunits (SSU1–3) catalyzing the isomerization step in the methionine and valine (to leucine) chain elongation machinery (Knill et al., 2009). Previously, it was shown that endogenous LSU of tobacco was able to substitute for the *A. thaliana* LSU1 (which was not included in the gene combination) and form a functional IPMI enzyme with *A. thaliana* SSUs that resulted in production of DHM (Mikkelsen et al., 2010). Here, we show that inclusion of *Arabidopsis* *LSU1* in the gene combination resulted in a 21-fold increase of DHM production (**Table 1**; **Figures 2A** and **3A**, combinations A1 and A2). Coexpression of *A. thaliana* LSU1 has probably alleviated a bottleneck and thus increased flux through the pathway. Another possibility could be better interaction between the two *A. thaliana* subunits in comparison to a heterodimer formed from *N. benthamiana* LSU and *A. thaliana* SSU. However, the high amino acid sequence identity of ~95% between *A. thaliana* LSU1 and *Nicotiana sylvestris* LSU, a close relative of *N. benthamiana*, suggests that functionality is not impaired. Recently, it was also demonstrated that *A. thaliana* SSUs can to a certain extent complement *E. coli* knockout mutants lacking the respective SSU homolog in LeuC (Imhof et al., 2014). Therefore, the increased production of DHM is most likely a result of a better ratio of large to small subunits and the formation of a higher number of catalytically active IPMI heterodimers as a result of coexpressing *LSU1* under the control of the same CaMV35S promoter as the other constructs.

#### Choice of Small Subunit in IPMI Has No Significant Influence on DHM Production

It has been reported that the three small subunits (SSU1/2/3) define in which pathway IPMI is active, with SSU2 and SSU3 being associated with methionine chain elongation and SSU1 with LeuC (He et al., 2010). More recently, it was hypothesized that SSU1 was active in the first two cycles of methionine chain elongation and that either SSU2 or SSU3 catalyzes the formation of the longer chain-elongated products (Imhof et al., 2014). Interestingly, we detected no significant difference in DHM production when we coexpressed the individual small subunits of *IPMI* with the rest of the methionine chain elongation machinery (**Table 1**, combinations A5–A7). Furthermore, coexpression of *SSU1* with *SSU3* did not result in significantly higher production of DHM (**Table 1**, combination A8). Coexpression of *SSU2* and *SSU3* was previously shown to have no positive effect on DHM production (Mikkelsen et al., 2010). Nevertheless, our data show that SSU1 has the ability to support the first two rounds of methionine chain elongation. At this point, it remains unclear if SSU1 catalyzes the first round(s) of methionine chain elongation in native *A. thaliana* plants and whether either SSU2 or SSU3 catalyzes the formation of the longer chain-elongated products, as previously suggested (Imhof et al., 2014). A previously observed sevenfold difference in DHM production between coexpression of *SSU2* and *SSU3* (Mikkelsen et al., 2010) was not observed. This was puzzling insofar as differences in the present study were not as dramatic and not significant for the different products upon coexpression of any of the three *SSU*s together with *LSU1* (**Figure 2**; **Table 1**, combinations A5–A7). One possible explanation could be that coexpression of *IPMI-LSU1* has alleviated effects that might be related to functional differences between the three SSUs. Even though *A. thaliana* IPMI can complement *E. coli* mutant strains deficient for the respective homologs in LeuC (*LeuC* and *LeuD*) (He et al., 2010; Imhof et al., 2014), it cannot be excluded that there are functional differences when catalyzing reactions in methionine chain elongation. Finally, it still remains unclear whether functional differences and the involvement of the different SSUs in either leucine biosynthesis or methionine chain elongation are the result of structural differences in the active sites or due to differences in temporal and spatial expression within different tissues in the plant (He et al., 2010; Imhof et al., 2014).

## Coexpression of the BAT5 Transporter Facilitates DHM Production with Cytosolic BCAT4

Bile acid transporter 5 has been proposed to translocate substrates and/or products of methionine chain elongation across the chloroplast membranes (Gigolashvili et al., 2009). The α-keto acid product of BCAT4 was proposed to be the substrate for import into the chloroplast by BAT5 as *bat5* knockout mutants show 50% reduced levels of aliphatic glucosinolates (including 4MSOB and 4MTB) and transport of α-keto acids into the chloroplast was impaired (Gigolashvili et al., 2009). As BAT5 may be critical for translocating the chain elongation products out of the chloroplast, we investigated whether inclusion of BAT5 together with cytosolic BCAT4 improves DHM production.

In experiments where *BCAT4* was expressed in the cytosol without coexpression of *BAT5*, a massive reduction in DHM production to only 41.9 (±9.2) nmol g<sup>−1</sup> fw was observed compared to chloroplast-localized BCAT4 (**Table 1**; **Figures 2A** and **3A**). This demonstrates that although methionine chain elongation is a heterologous pathway, *N. benthamiana* still has the ability to transport α-keto acid intermediates at a low level. DHM production was fully restored upon coexpression of *BAT5* with the cytosolic *BCAT4* (**Figure 3B**). DHM levels were higher than in combinations with chloroplast-localized BCAT4, though not significantly higher (**Table 1**). It is not known whether cytosolic BCAT4 (or other chloroplast-localized BCATs) is involved in transaminating the final chain-elongated methionine product, and whether a chain-elongated α-keto acid is a substrate for export out of the chloroplast by BAT5. Nevertheless, the fact that DHM accumulated to such high levels indicates that BCAT4 may also transaminate the chain-elongated α-keto acids into the respective chain-elongated methionine and that BAT5 is an antiporter for α-keto acids of different chain lengths.

## Differences in DHM Production with IPMDH3 Compared to IPMDH1

Three homologs exist for IPMDH, of which IPMDH3 was used in the previous report for engineering DHM (Mikkelsen et al., 2010). A recent study linking association by coexpression suggested that IPMDH1 was the key player in methionine chain elongation while IPMDH2 and IPMDH3 were involved in LeuC (He et al., 2013). When we compared inclusion of *IPMDH1* or *IPMDH3* together with the other chain elongation genes, we observed that the production of DHM was always higher with *IPMDH3*, though never significantly higher (**Table 1**). The variation from transient expression in tobacco made it impossible to identify significant differences. Nevertheless, association by coexpression in *Arabidopsis*, as previously reported (He et al., 2013), does not exclude enzymatic promiscuity (Weng and Noel, 2012) of the different IPMDH enzymes. This is also supported by the fact that there was no significant difference detected between the three IPMI small subunits (see above).

#### Presence of LSU1 Has Major Effects on Formation of Leucine-Derived Side-Products but is Independent of BCAT4 Localization

Previously, when the approximately fivefold to eightfold higher levels of leucine-derived glucosinolates were monitored (Mikkelsen et al., 2010), it was not possible to differentiate if both leucine and isoleucine were taken as substrates by the chain elongation machinery. Here, we confirmed by UHPLC-MS analysis that only leucine and not isoleucine is taken as substrate (Mirza et al., 2016). Optimization of the gene combination for DHM production also affected the formation of chain-elongated leucine-derived products (**Figure 2**). Similar to DHM production, chain-elongated leucine-derived products [homo-leucine (HL), dihomo-leucine (DHL), and the newly detected trihomo-leucine (THL)] drastically increased by including LSU1 together with the other chain elongation genes (**Figure 2**). In contrast to DHM, leucine-derived products were formed in similar amounts independent of whether *BCAT4* was expressed in the cytosol or the chloroplast, except for combination A1 where *LSU1* was not coexpressed (**Figure 2**, combinations A1 and A4; Table S3 in Supplementary Material). In particular, the formation of DHL was higher when BCAT4 was localized to the chloroplast (**Figure 2**, combinations A2 and A3). Interestingly, formation of leucine-derived products, especially the longer chain-elongated products DHL and THL, was reduced in the gene combination when *SSU1* (A6) was coexpressed rather than *SSU2* (A7) or *SSU3* (A5) (**Figure 2B**). Also, THL was found only in combinations where *LSU1* was coexpressed; its absence from combination A1 may be related to the overall lower production in that combination or to a reduced ability of tobacco LSU to support three rounds of leucine chain elongation. THL had not been described previously as a side-product from engineering of methionine chain elongation in tobacco (Mikkelsen et al., 2010), nor from metabolic engineering of DHM in *E. coli* (Mirza et al., 2016).

## Increased DHM Formation Considerably Improves DHM to Side-Product Ratio

Increased DHM formation had positive effects on the ratio of DHM to leucine-derived products. When the ratios were calculated for the sum of leucine-derived products (HL + DHL + THL) compared to DHM in the individual experiments, we detected 8.8-fold more leucine-derived products compared to DHM in the reference combination A1, as previously reported for this combination (Mikkelsen et al., 2010). All other combinations showed a more favorable ratio of DHM to chain-elongated leucine-derived products. The best ratios, with almost equal amounts, were found in combinations A6 and A7 (DHM:HL/DHL/THL 0.8:1) containing the complete set of genes and compartmentalization (**Figures 2** and **3**; Table S4 in Supplementary Material). The amounts of leucine-derived products remained largely unchanged throughout our experiments. DHL and the newly detected THL contributed most to the high amounts of leucine-derived products. The latter was drastically reduced in the gene combination including SSU1. Interestingly, we detected similar amounts of leucine-derived products in the combination without BAT5 (A4), which suggests that leucine α-keto acids are present and taken up by the methionine chain elongation machinery. A similar effect was seen in experiments without coexpression of *BCAT4* (data not shown). Therefore, other solutions for further reduction of leucine-derived products may include exchange or mutation of single enzymes in the methionine chain elongation to increase affinity toward methionine and away from leucine.

In summary, our experiments provide new insights toward improving engineering of DHM production by transient expression in *N. benthamiana*. Here, we demonstrated a substantial 30-fold increase in production of DHM (432 nmol g<sup>−1</sup> fw) compared to the highest-producing gene combination previously reported, as measured in the present study, and a no less impressive 9-fold increase compared to the highest DHM production levels previously reported (Mikkelsen et al., 2010). Simultaneously, the amounts of leucine-derived side-products were substantially reduced, especially by re-establishing the compartmentalized organization of the methionine chain elongation in the transient expression host system. In conclusion, the optimized gene combination for production of DHM consists of five (or six) genes: *BCAT4* (*BAT5*), *MAM1*, *LSU1*, *SSU1*, and *IPMDH1*. *BAT5* is only necessary if the methionine chain elongation is expressed in plants or other chloroplast-containing organisms, such as microalgae, while in the case of a microbial host, the transport step can be omitted. Our results provide important insights for optimizing the engineering of glucoraphanin production in a heterologous host. In particular, the fact that the separation of the transamination step from the chain elongation itself improved the production of DHM mainly by reducing formation of leucine-derived side-products has implications for metabolic engineering in microbial host organisms where such spatial separation is difficult to establish. This implies that other techniques might be required to create a similar level of spatial separation between the two parts of the chain elongation pathway. This could include creation of fusion proteins or coexpression of chaperones to create microenvironments that also could increase flux through the pathway. In addition, engineering of pathway enzymes to increase substrate specificity toward methionine and the corresponding intermediates should be considered. These steps should be considered before adding another level of complexity by engineering the core structure biosynthetic pathway on top of the methionine chain elongation pathway to ultimately create an expression system for sustainable production of glucoraphanin.

## AUTHOR CONTRIBUTIONS

CC cloned the constructs for DHM biosynthesis, planned and conducted the experiments, and prepared the samples for LC-MS analysis for DHM production. NM conducted initial experiments and contributed to the discussion of results and the manuscript. MR performed analysis of amino acid and DHM production and suggested analytical improvements. JG contributed to the experimental design and made improvements to the manuscript. BH was involved in discussions about the experimental setup and results. CC and BH wrote the manuscript based on a draft written by CC.

#### FUNDING

This work was supported by Danish National Research Foundation (grant DNRF99) and Danish Council for Strategic Research (grant 0603-00387B).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2016.00014




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Crocoll, Mirza, Reichelt, Gershenzon and Halkier. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Synthetic Peptides as Protein Mimics

*Andrea Groß<sup>1</sup>, Chie Hashimoto<sup>1</sup>, Heinrich Sticht<sup>2</sup> and Jutta Eichler<sup>1</sup>\**

*<sup>1</sup>Department of Chemistry and Pharmacy, University of Erlangen-Nuremberg, Erlangen, Germany, <sup>2</sup>Institute of Biochemistry, University of Erlangen-Nuremberg, Erlangen, Germany*

The design and generation of molecules capable of mimicking the binding and/or functional sites of proteins represents a promising strategy for the exploration and modulation of protein function through controlled interference with the underlying molecular interactions. Synthetic peptides have proven an excellent type of molecule for the mimicry of protein sites because such peptides can be generated as exact copies of protein fragments, as well as in diverse chemical modifications, which includes the incorporation of a large range of non-proteinogenic amino acids as well as the modification of the peptide backbone. Apart from extending the chemical and structural diversity presented by peptides, such modifications also increase the proteolytic stability of the molecules, enhancing their utility for biological applications. This article reviews recent advances by this and other laboratories in the use of synthetic protein mimics to modulate protein function, as well as to provide building blocks for synthetic biology.

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*Juan Manuel Pedraza, Universidad de los Andes, Colombia Alexander D. Frey, Aalto University, Finland*

> *\*Correspondence: Jutta Eichler jutta.eichler@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 16 September 2015 Accepted: 22 December 2015 Published: 19 January 2016*

#### *Citation:*

*Groß A, Hashimoto C, Sticht H and Eichler J (2016) Synthetic Peptides as Protein Mimics. Front. Bioeng. Biotechnol. 3:211. doi: 10.3389/fbioe.2015.00211*

Keywords: protein–protein interactions, protein mimics, peptides, structure-based design, biomaterials

## INTRODUCTION

The detailed insight into the human genome does not in itself enable a comprehensive understanding of human protein function, health, and disease. In the post-genome era, an important challenge is the structural and functional analysis of the gene products, i.e., proteins. Proteins play a major role in almost all biological processes, including enzymatic reactions, structural integrity of cells, organs and tissues, cell motility, immune responses, signal transduction, and sensing. All protein-mediated biological processes are based on specific interactions between proteins and their ligands. Therefore, exploring disease-associated protein–ligand and protein–protein interactions is essential to gain insight into the molecular mechanisms underlying diseases and other phenomena, as well as for the development of novel therapeutic strategies.

Molecules that present the binding sites of proteins, which are involved in a disease-associated protein–protein interaction, are promising candidates for therapeutic intervention. Such binding site mimetic molecules can be generated either through recombinant protein synthesis or by means of chemical peptide synthesis. A specific advantage of synthetic peptides is that they can be generated as exact copies of protein fragments as well as in diverse chemical modifications, which include the incorporation of a large range of non-proteinogenic amino acids, as well as the modification of the peptide backbone. Apart from extending the chemical and structural diversity presented by peptides, such modifications also increase the proteolytic stability of the molecules, enhancing their potential as drug candidates.

Three conceptually different approaches are available for the design of protein-binding site mimetic peptides. These approaches are based on one or more of the following types of information about the proteins of interest: structure, sequence, and function. In random combinatorial methods that are based solely on protein function, such as phage display (Li and Caberoy, 2010) and synthetic peptide combinatorial libraries (Houghten et al., 1999), large populations of peptides are screened for binders to the respective partner protein, or for inhibitors of the protein–protein interaction of interest. A strategy termed peptide scanning is based on the synthesis of the entire protein sequence, or large parts of it, in the shape of short, overlapping peptides, which are then individually tested for binding to the respective partner protein (Frank, 2002), enabling the identification of protein-binding sites. The utility of this method, however, is largely limited to the identification of sequentially continuous binding sites, which are located in a protein sequence stretch of consecutive amino acids (**Figure 1A**). Structure-based design, finally, involves the design and generation of protein-binding site mimics based on the 3D structure of the protein–protein complexes (Eichler, 2008). This structural information enables the design and generation of mimics of continuous, as well as of sequentially discontinuous protein-binding sites, which are composed of two or more protein segments that are distant in protein sequence, but brought into spatial proximity through protein folding (**Figure 1B**). Mimicking such discontinuous protein-binding sites by synthetic peptides typically involves presentation of the respective protein fragments through a molecular scaffold (**Figure 1B**).
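The peptide scanning strategy described above amounts to sliding a window of fixed length along the protein sequence. The following sketch shows one way to enumerate such overlapping peptides. The function name, the default peptide length and offset, and the example sequence are illustrative assumptions and are not taken from the cited work.

```python
def overlapping_peptides(sequence, length=15, offset=3):
    """Split a protein sequence into overlapping peptides.

    Returns a list of (start_position, peptide) tuples, where
    start_position is 1-based. A final peptide is appended if the
    regular stepping would otherwise miss the C-terminal residues.
    """
    if length <= 0 or offset <= 0:
        raise ValueError("length and offset must be positive")
    if len(sequence) <= length:
        return [(1, sequence)]

    last_start = len(sequence) - length
    peptides = [
        (start + 1, sequence[start:start + length])
        for start in range(0, last_start + 1, offset)
    ]
    # Make sure the C-terminus is covered by a full-length peptide.
    if peptides[-1][0] != last_start + 1:
        peptides.append((last_start + 1, sequence[last_start:]))
    return peptides


# Hypothetical 20-residue sequence used only for illustration.
for position, peptide in overlapping_peptides("ACDEFGHIKLMNPQRSTVWY", length=8, offset=5):
    print(position, peptide)
```

A smaller offset gives a finer map of a continuous binding site at the cost of more peptides to synthesize and test.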

Here, we review strategies for the use of synthetic peptides as protein mimics. Focusing on structure-based design, the potential of such peptides as drugs against diseases, such as viral and bacterial infections, cancer, and autoimmune diseases, is discussed.

## TOOLBOX FOR PEPTIDE SYNTHESIS: NON-PROTEINOGENIC AMINO ACIDS AND SITE-SELECTIVE LIGATION

Most current methods for the chemical synthesis of peptides utilize Merrifield's concept of solid-phase synthesis (Merrifield, 1963), which enables the synthesis of peptides and small proteins of up to 100 amino acids. A major advantage of chemical peptide synthesis, as compared to recombinant protein synthesis, is the extended set of amino acids and other building blocks that can be incorporated, which includes d-amino acids, as well as a wide range of non-proteinogenic amino acids (**Figure 2**). While the recombinant synthesis of proteins containing non-proteinogenic amino acids is possible only through alternative codon usage (Mehl et al., 2003), hundreds of such building blocks are commercially available for use in chemical peptide synthesis. This opens the door to improved biological activity and peptide stability, as well as structural modifications. One possibility is the use of β- and γ-amino acids (Seebach et al., 2004), which differ from α-amino acids in having one or two additional methylene groups between the carboxy and the amino function of the amino acid (**Figure 2A**). Peptides composed of these amino acids are stable against proteolysis *in vitro* and *in vivo*, as well as against metabolism and degradation by microbial colonies (Seebach et al., 2004; Seebach and Gardiner, 2008). On the functional side, they can act similarly to the natural α-peptides, as exemplified by β- and γ-peptide agonists of naturally occurring α-peptide hormones such as somatostatin (Seebach et al., 2004; Seebach and Gardiner, 2008).

Another possibility is the use of d-amino acids. Due to the chirality of the Cα-atom, amino acids exist as two different stereoisomers (l and d). While recombinantly synthesized peptides and proteins are typically composed entirely of l-amino acids, chemical peptide synthesis can also use d-amino acids. Introducing d-amino acids at defined positions of an antimicrobial peptide has been shown to increase proteolytic stability while maintaining biological activity (Hong et al., 1999b). At other positions, on the other hand, using d-amino acids instead of l-amino acids had the opposite effect due to structural damage to the peptide (Hong et al., 1999b).

Furthermore, oligomers of *N*-alkyl glycine monomers, termed peptoids, have been introduced as proteolytically stable peptide derivatives (Simon et al., 1992) (**Figure 2A**). As the amide hydrogen is missing in peptoids, the typical backbone hydrogen bonds present in proteins and peptides cannot be formed, altering the conformational preferences of these molecules. Peptoids have been used as mimics of antimicrobial peptides (termed as ampetoids) (Chongsiriwatana et al., 2008; Mojsoska et al., 2015) as well as novel therapeutics (Zuckermann and Kodadek, 2009).

In addition to alteration of the peptide backbone, the use of non-proteinogenic amino acids enables the introduction of chemical moieties that are not presented by the proteinogenic amino acids, and which can be used to dissect the binding mode of peptides. A prominent example is the substitution of aromatic side chains of phenylalanine or tryptophan with larger aromatic groups such as naphthyl or biphenyl (**Figure 2B**), which increases the size and hydrophobicity of the side chain and affects π-stacking with the respective protein ligand (Muraki et al., 2000; Bachmann et al., 2011). Functionalized and orthogonally protected amino acids are often used for chemoselective ligation strategies (Tornoe et al., 2002; Kimmerlin and Seebach, 2005). In addition, lysine, among other amino acids, can be used for the synthesis of branched peptides (Franke et al., 2007). Furthermore, a range of scaffold molecules, such as trimesic acid derivatives (Berthelmann et al., 2014), triazacyclophane derivatives (Opatz and Liskamp, 2001; Chamorro et al., 2009), bis-, tris-, and tetrakis(bromomethyl)-benzene (CLIPS technology) (Timmerman et al., 2005) (**Figure 2C**), as well as cyclic β-tripeptide derivatives (Seebach and Gardiner, 2008) have been introduced for the generation of multivalent peptides.

## PROTEIN SECONDARY STRUCTURE MIMICS

The three-dimensional (3D) arrangement of proteins contains unstructured, as well as structured regions, in which peptide chains are organized into secondary structures, such as α-helices and β-sheets. As α-helices and β-sheets mediate protein folding and protein–protein interactions, they are related to various biochemical phenomena and diseases (Fairlie et al., 1998). These secondary structures are stabilized by hydrogen bonds between amide nitrogen and carbonyl oxygen atoms. Bullock et al. have analyzed the full set of helical protein interfaces in the Protein Data Bank (Berman et al., 2000) and found that about 62% of the helical interfaces contribute to protein–protein interactions (Bullock et al., 2011). Although natural proteins contain less β-sheet structure than α-helical structure, β-sheets contribute to protein aggregation, as well as to protein–protein interactions. Thus, peptides that mimic α-helices and β-sheets of proteins are attractive targets for drug development and useful tools to explore protein binding mechanisms. A range of α-helix and β-sheet mimics have been developed, which will be discussed below. The various strategies of mimicking protein-binding sites through secondary structure mimics have also been extensively reviewed recently (Pelay-Gimeno et al., 2015).

## **α**-Helix Mimics

The α-helical conformation of a peptide can be stabilized, and even induced, by introducing covalent links between amino acid side chains at selected positions. These links can be formed by lactam (Ösapay and Taylor, 1992; Yu and Taylor, 1999; Sia et al., 2002; Yang et al., 2004; Mills et al., 2006) and disulfide bridges (Jackson et al., 1991; Leduc et al., 2003), triazole-based linkages (Scrima et al., 2010; Kawamoto et al., 2011; Madden et al., 2011), and hydrocarbon staples (Blackwell and Grubbs, 1998; Schafmeister et al., 2000; LaBelle et al., 2012; Verdine and Hilinski, 2012; Brown et al., 2013; Chang et al., 2013; Nomura et al., 2013; Walensky and Bird, 2014; Chu et al., 2015). Replacing hydrogen bonds by salt bridges has been reported by Otaka et al. as an alternative means of stabilizing α-helices (Otaka et al., 2002). Further examples for hydrogen bond surrogates include cation–π interaction (Olson et al., 2001; Shi et al., 2002; Tsou et al., 2002) and π–π interaction (Albert and Hamilton, 1995).

Foldamers are a very prominent class of α-helix mimetic peptides. They are composed of β-amino acids (Seebach and Matthews, 1997; Gellman, 1998; Cheng et al., 2001; Martinek and Fulop, 2003), α/β-amino acid oligomers (Johnson and Gellman, 2013), or *N*-substituted glycine residues (peptoids) (Sun and Zuckermann, 2013). Such foldamers have been shown to inhibit the proteolytic activity of γ-secretase (Imamura et al., 2009), an enzyme that is involved in the processing of amyloid-β (Aβ) in Alzheimer's disease, by blocking the initial substrate binding site of γ-secretase (Lichtenthaler et al., 1999). For these foldamers, the conformationally constrained β-amino acid *trans*-2-aminocyclopentanecarboxylic acid (ACPC) was used as a building block. As such α-helix mimics can increase α-helicity, stability, and cell-permeability, they are increasingly attracting attention in both academia and the pharmaceutical industry as candidates for novel therapeutics. Apart from biomedical use, α-helical peptide mimics are also of interest as biomaterials, such as self-assembling nanotubes (Burgess et al., 2015) and hydrogels (Mehrban et al., 2015).

## **β**-Sheet Mimics

In β-sheets, two or more β-strands are connected via loops or turns, and the parallel or antiparallel orientation of β-strands is stabilized by hydrogen bonds between carbonyl oxygen atoms in one strand and amide nitrogen atoms of the opposite strand. Methods to mimic turn structures include macrocyclization as well as the use of turn-inducing building blocks, such as a dipeptide of d-proline and l-proline (Robinson, 2008), or α-aminoisobutyric acid in combination with either a d-α-amino acid or an achiral α-amino acid (Aravinda et al., 2002; Masterson et al., 2007). One noteworthy example of macrocyclization used the cyclic cysteine ladder of θ-defensin as a scaffold to stabilize a turn structure (Conibear et al., 2014). The cyclic cysteine ladder of θ-defensin comprises two antiparallel β-strands connected via two β-turns, and has high thermal and serum stability. Grafting of the integrin-binding peptide Arg–Gly–Asp (RGD) onto this molecule resulted in a 10-fold increase in affinity to integrin, illustrating the utility of θ-defensin as a molecular scaffold.

It has been difficult to develop robust chemical models of β-sheets that tolerate a wide range of amino acid sequences, because amyloidogenic sequences vary enormously and folding of β-sheet mimics depends on their amino acid sequences. Woods et al. overcame this problem by using 42-membered rings, which contain two strands connected via two δ-linked ornithine turns (Woods et al., 2007). These 42-membered macrocyclic β-sheets present a pentapeptide β-strand on one side (recognition strand), while the other β-strand contains the unnatural amino acid Hao (5-hydrazino-2-methoxybenzoic acid) and two α-amino acids. The relatively rigid structure of Hao-containing peptides preserves the structure of the recognition strand, and at the same time serves as a template for the recognition strand. Furthermore, Hao is useful for the intermolecular β-sheet interaction to form fibril-like assembled oligomers (Pham et al., 2014). Similar to α-helical peptides, β-sheet mimics have also been used for biomaterials, such as nanotubes (Hamley, 2014).

## STIMULI RESPONSIVE PEPTIDES IN BIOMATERIAL ENGINEERING

Some peptides undergo structural rearrangement in response to external stimuli, such as temperature, pH, ionic strength, the presence of specific ions, and light. Mart et al. (2006) reviewed different responsive systems based on peptides and their applications, including switchable surfaces, nanoparticle (dis)-assembly, hydrogel-formation, metal ion sensing, and electron transfer. In addition, special applications in medicine, such as drug delivery, tissue engineering, tissue regeneration, wound healing, and nerve cell regrowth rely upon stimuli-responsive peptides. Several conformational transitions of peptides have been reported, ranging from α-helix to random coil and *vice versa* or β-sheet to random coil and *vice versa*, among others. In this review, two selected examples are presented.

One example is the use of an azobenzene moiety as a light-sensitive switch (Woolley, 2005; Renner and Moroder, 2006). As a photoswitchable device, azobenzene, which is more stable in the *trans*-conformation, can switch into *cis*-conformation upon irradiation with light at 340 nm, leading to a 3.5 Å shortening of the C–C-distance of azobenzene (Fliegl et al., 2003; Beharry and Woolley, 2011). Incorporation of the reactive azobenzene derivative 3,3′-bis(sulfonate)-4,4′-bis(chloroacetamido)azobenzene at defined positions of the sequence can result either in a loss of helical conformation (positions *i*, *i* + 11, **Figure 3A**) or in helix stabilization (positions *i*, *i* + 7), upon light stimulus (Woolley, 2005). To make this approach more feasible for *in vivo* application, longer wavelengths should be used for azobenzene isomerization, considering UV-light scattering through cells and tissues. Samanta et al. (2013) recently reported an azobenzene derivative that can be switched using red light (630–660 nm), enabling the development of photo-switchable compounds for *in vivo* use.
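The dependence of the switching outcome on the attachment positions can be related to the geometry of an ideal α-helix. The sketch below uses textbook values for an ideal α-helix (a rise of about 1.5 Å and a rotation of 100° per residue, i.e., 3.6 residues per turn); these values and the function name are assumptions for illustration and are not taken from the studies cited above. The calculation considers only the helix backbone and ignores side chain and linker geometry.

```python
RISE_PER_RESIDUE = 1.5    # Å along the helix axis (ideal alpha-helix, assumed)
TURN_PER_RESIDUE = 100.0  # degrees of rotation per residue (3.6 residues/turn)


def helix_spacing(offset):
    """Return (axial distance in Å, angular offset in degrees) between
    residues i and i + offset in an ideal alpha-helix.

    The angular offset is reported in the range (-180, 180].
    """
    axial = offset * RISE_PER_RESIDUE
    angle = (offset * TURN_PER_RESIDUE) % 360.0
    if angle > 180.0:
        angle -= 360.0
    return axial, angle


for offset in (4, 7, 11):
    axial, angle = helix_spacing(offset)
    print(f"i, i+{offset}: {axial:.1f} Å along the axis, {angle:+.0f}° around it")
```

Under these assumptions, positions *i* + 7 and *i* + 11 both lie close to the same face of the helix as position *i* but differ in their axial separation, so a crosslinker that changes length upon isomerization will fit the two spacings differently.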

Another example of stimuli-responsive peptides is the temperature-dependent formation of hydrogels by β-sheet peptides. Pochan et al. (2003) designed a β-hairpin mimic called Max3 (**Figure 3B**) that undergoes gelation upon heating (*T*gel = 60°C), which was completely reversible upon cooling. This peptide is composed of alternating non-polar and polar amino acids bridged via a type II' β-turn. Other peptides undergo non-reversible hydrogelation when heated (Max1, Max2) (Pochan et al., 2003). These β-hairpin peptides were the starting point for the design of peptides whose folding can be triggered by UV light (Haines et al., 2005), changes in pH (Rajagopal et al., 2009), or recognition of electronegative cancer cell membranes (Sinthuvanich et al., 2012). Because of their biocompatibility, biodegradability, weak immunogenicity and selectivity, peptidic hydrogels can serve as potential cancer drugs and antimicrobials, as well as for wound healing (Mart et al., 2006; Branco et al., 2011).

## PROTEIN MIMICS IN BIOMEDICAL RESEARCH

Current drug discovery and development approaches are focused on three different types of molecules (Craik et al., 2013; Fosgerau and Hoffmann, 2015). The traditional approach of using small molecules as drugs is still widely used. While small molecules have been shown to be excellent tools to block the catalytic site of enzymes, as well as the ligand binding sites of numerous receptors, they are less promising for the inhibition of protein–protein interactions, which often involve larger interfaces that typically cannot be adequately addressed by small molecules. Therefore, protein-based drugs, so-called biologics, are increasingly used as inhibitors of protein–protein interactions. Many proteins, however, have additional effector functions or binding sites for other ligands, causing problems in *in vivo* applications. Furthermore, proteins can be immunogenic, resulting in immunological clearance before they reach their target site. As an alternative to both small molecule and protein-based drugs, peptides are becoming more relevant as drug candidates, as documented by an increasing number of peptide drugs approved for clinical use (Fosgerau and Hoffmann, 2015). Due to their potential for highly specific binding, combined with low immunogenicity, peptides are promising candidates as inhibitors of protein–protein interactions.

Specific protein–protein interactions are involved in the pathogenesis of numerous diseases. The design and generation of peptides that mimic the respective protein-binding site, as potential inhibitors of the interactions, is therefore a promising therapeutic strategy. Such mimetic molecules are typically designed based on the 3D structure of the protein–protein complex, which yields information on the location of the binding sites within the proteins, as well as the hot spot amino acids directly involved in the intermolecular interaction (Eichler, 2008). This general strategy will be illustrated here using examples of the various protein–protein interactions, which are involved in the entry of the human immunodeficiency virus type 1 (HIV-1) into cells. Furthermore, a range of protein-mimicking peptides used in the treatment of cancer and as antibiotics or anti-inflammatory compounds, will be reviewed.

## Peptides as Mimics of the Viral Spike of HIV-1

Highly active antiretroviral therapy (HAART) has been a breakthrough in the treatment of HIV-1 infection, leading to an effective reduction of morbidity and mortality through drastic suppression of viral replication and, hence, reduction of plasma HIV-1 viral load. HAART consists of a mixture of at least three different drugs with at least two different molecular targets [for details see Arts and Hazuda (2012)]. Almost all of these drugs are small molecules that address intracellular targets. Due to the high genetic variability of HIV-1, the virus is able to rapidly become resistant to drugs. Therefore, there is an ongoing need for new therapeutic strategies against HIV-1. One of these strategies is the prevention of HIV-1 entry into its host cell by blocking the interactions between viral and host proteins that are involved in the entry process. This can be achieved by using peptides that mimic the binding sites of the involved proteins.

Entry of HIV-1 into its host cells is initiated by a cascade of protein–protein interactions between the viral and host cell proteins. These interactions involve the trimeric viral spike, composed of glycoproteins gp120 and gp41, as well as the primary receptor CD4 and coreceptors CCR5 and CXCR4 on the host cell (Wilen et al., 2012).

The initial event of HIV-1 entry is an interaction of viral gp120 with the host receptor CD4. In contrast to the generally high genetic variability of HIV-1, the CD4-binding site of gp120 is highly conserved. Peptides mimicking the CD4-binding site are therefore promising candidates as HIV-1 entry inhibitors. Furthermore, as the epitopes of various broadly neutralizing anti-HIV-1 antibodies have been shown to overlap the CD4 binding site, this part of gp120 is an immunogen candidate for the generation of HIV-1 neutralizing antibodies. Based on the X-ray structure of gp120 in complex with CD4 (Kwong et al., 1998) (**Figure 4A**), novel peptides that mimic the CD4-binding site have been developed (**Figure 4**) (Franke et al., 2007; Chamorro et al., 2009). A special characteristic of these peptides is the fact that they present three sequentially discontinuous fragments of the gp120 sequence, either in linear form, or as cyclic loops, on molecular scaffolds, such as a branched peptide composed of spacer amino acids, CD4bs-M (**Figure 4A**), and a triazacyclophane scaffold (**Figure 4B**). While the triazacyclophane scaffold peptide did not affect HIV-1 infection (Chamorro et al., 2009), CD4bs-M was surprisingly found to strongly enhance HIV-1 infection of both CD4 positive and CD4 negative cells, and this effect could be linked to a strong tendency of the peptide to assemble into amyloid fibrils (Groß et al., 2015b).

Understanding the molecular and structural details of the interaction of antibodies with their viral antigens is an important step in the quest for a still elusive HIV-1 vaccine (Burton et al., 2012). A prominent class of anti-HIV-1 antibodies recognizes the V3-loop of the gp120 protein (Zolla-Pazner and Cardozo, 2010), which forms a β-hairpin structure when in the antibody-bound state (**Figures 5A,B**). Robinson et al. were able to stabilize this β-hairpin structure in V3-loop peptides by grafting them onto a d-Pro-l-Pro scaffold (Riedel et al., 2011; Robinson, 2013) (**Figure 5C**). Coupling of such a stabilized V3-loop mimic to a lipopeptide carrier, which self-assembles into virus-like particles (Ghasparian et al., 2011), resulted in increased immunogenicity, enabling an alternative, carrier-independent immunization. Phage display peptide libraries (Smith, 1985) have often been used to identify peptides that bind to antibodies and thus mimic their epitopes (mimotopes). Mimotopes of the broadly neutralizing HIV-1 antibody b12 have been found this way (Boots et al., 1997). As the viral spike proteins gp120 and gp41 are presented as trimers, Schellinger et al. (2011) generated a potential immunogen based on a trimer of the b12 mimotope in conjunction with a T-helper cell epitope peptide (**Figure 6**). This trimeric peptide bound to b12 substantially better than the monomeric mimotope, illustrating the importance of trimeric presentation, which was achieved using the so-called click reaction (Rostovtsev et al., 2002) as a chemoselective ligation reaction.

In addition to gp120 mimetic peptides, peptides that present parts of gp41 are also intensively researched (Cai et al., 2011). In particular, this applies to peptides that mimic a six-helix bundle, consisting of three-stranded coiled-coil structures formed by an N-terminal (NHR) and a C-terminal (CHR) heptad repeat of gp41 (Chan et al., 1997). This region of gp41 plays a key role in the process of fusion of the viral and cellular membranes (**Figure 7**). Peptides presenting parts of the six-helical bundle are thought to be able to interfere with its correct formation and, consequently, inhibit virus-cell fusion. As early as 1992, Wild et al. (1992) described an approach to mimic the secondary structure of NHR, which was predicted to be α-helical. Using CD spectroscopy, it could be shown that the NHR-mimetic peptide forms a stable α-helix under physiological conditions. Furthermore, the peptide exhibited a strong anti-HIV-1 activity, which could be further enhanced through dimerization. Trimers of the NHR-mimetic peptide were later found to be better HIV-1 entry inhibitors than the respective monomeric peptide (Nakahara et al., 2010). Covalent stabilization of such peptide trimers through inter-chain disulfide bridges dramatically increased the antiviral potency (Bianchi et al., 2005), as well as the HIV-1 neutralizing capacity of anti-peptide antisera (Bianchi et al., 2010).

Similar to the NHR mimics, peptides mimicking the CHR region of gp41 were developed to inhibit the formation of the six-helical bundle. In 1994, Wild et al. (1994) demonstrated a strong anti-HIV-1 activity of a peptide that overlaps the CHR. Later on, the first and so far only HIV-1 fusion inhibitor approved for clinical use (Enfuvirtide) was developed based on this peptide (Kilby and Eron, 2003; Lalezari et al., 2003). Another fusion inhibitor, called Sifuvirtide, was developed based on the 3D structure of HIV-1 gp41 and computer modeling (He et al., 2008; Wang et al., 2009). Sifuvirtide could effectively block six-helical bundle formation and was active even against Enfuvirtide-resistant HIV-1 strains. Otaka et al. increased the α-helicity of a CHR mimetic peptide by introducing Glu–Lys pairs at the *i* and *i* + 4 positions of the helix (Otaka et al., 2002), which greatly enhanced the solubility and stability of the peptide. Trimeric presentation of a CHR mimetic peptide on a C3-symmetric scaffold dramatically increased the antiviral activity of the peptide (Nomura et al., 2012).

## Peptides as Mimics of Cellular Receptors

Cellular receptors play important roles in signal transduction pathways, as well as in viral entry. As discussed in the previous section, HIV-1 contacts two receptors on the host cell surface prior to fusion with the cell membrane. Peptides that mimic these receptors are useful tools to explore the details of virus infection mechanisms, as well as to develop new drugs against HIV-1. In 1998, Drakopoulou et al. (1998) developed a peptidic CD4 mimic, called CD4M, based on the analysis of site-directed mutagenesis studies, antibody-blocking experiments and the structure of the extracellular fragment of CD4, which identified the CDR H2-like loop of CD4 as the binding site for gp120 of HIV-1. To retain the native structure of the CDR H2-like loop, the peptide was transferred onto a scorpion toxin, which served as a structural scaffold. Optimizing CD4M led to a variant with 100-fold increased affinity to gp120, as well as infection-inhibitory activity (Vita et al., 1999). Based on the X-ray structure of CD4 in complex with gp120 (Kwong et al., 1998), Martin et al. (2003) further optimized the CD4 mimic, resulting in a 27-mer peptide mimicking the CD4 binding site for gp120. This peptide was able to bind to gp120 at low nanomolar concentrations, to inhibit binding of CD4 to gp120, and to induce conformational changes in gp120 similar to those triggered by CD4, from which it was derived. The importance of conformational stability of CD4 mimetic peptides was further confirmed by Meier et al. (2012). Peptides that present the binding site of CD4 for gp120 were covalently stabilized in their loop structure by cyclization through a disulfide bond between the N- and C-terminus. Using alanine and d-phenylalanine substitution analogs, the importance of the hot spot amino acid phenylalanine 43 could be confirmed at the peptide level. These results were further supported by molecular dynamics simulations.
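Substitution analogs such as those mentioned above are typically generated by replacing one residue at a time. The following sketch enumerates single-residue alanine analogs of a peptide given in one-letter code. The function name and the example sequence are hypothetical and do not correspond to the CD4-derived peptides of the cited studies; d-amino acid substitutions would need a separate notation and are not covered here.

```python
def alanine_scan(peptide, replacement="A"):
    """Return single-substitution analogs of a peptide.

    Each entry is (label, analog_sequence), where the label follows the
    common 'original residue, 1-based position, new residue' pattern,
    e.g., 'F2A'. Positions that already carry the replacement residue
    are skipped.
    """
    analogs = []
    for position, residue in enumerate(peptide, start=1):
        if residue == replacement:
            continue
        analog = peptide[:position - 1] + replacement + peptide[position:]
        analogs.append((f"{residue}{position}{replacement}", analog))
    return analogs


# Hypothetical tetrapeptide used only for illustration.
for label, analog in alanine_scan("GFAK"):
    print(label, analog)
```

Comparing the binding of each analog with that of the parent peptide indicates which positions behave as hot spot residues.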

The concept of mimicking protein-binding sites through complex synthetic peptides has recently been extended to peptides that mimic the extracellular domains of seven transmembrane G protein-coupled receptors (GPCRs), which are composed of the N-terminus (NT) and the three extracellular loops (ECLs). GPCRs make up the largest class of drug targets; in fact, 27% of all clinically used drugs target a GPCR.

In the context of HIV-1 infection, two GPCRs are important, i.e., the chemokine coreceptors CCR5 and CXCR4. Although the 3D structures of both receptors are available (Wu et al., 2010; Tan et al., 2013), our knowledge of the structural details of their interaction with HIV-1 gp120 remains limited. Therefore, peptides that mimic the binding site of these receptors for gp120 could be useful tools for the exploration of HIV-1-coreceptor interaction at the molecular level. We have generated a peptide that mimics the three ECLs of CXCR4 (Möbius et al., 2012) (**Figure 8A**). This peptide, named CX4-M1, is able to discriminate between CXCR4- and CCR5-recognizing gp120 (Möbius et al., 2012) and V3-loop peptides mimicking the corresponding binding site on gp120 (Groß et al., 2013), and also inhibits HIV-1 infection of susceptible target cells in a CXCR4-specific manner (Möbius et al., 2012; Groß et al., 2015a). Furthermore, CX4-M1 is recognized by the natural CXCR4-ligand, i.e., the chemokine CXCL12 (also called SDF-1α), as well as anti-CXCR4-antibodies (Groß et al., 2015a).

In a similar approach, Pritz et al. (2008) generated, via a combination of recombinant, enzymatic and chemical synthesis, a molecule that mimics the extracellular domain of the corticotropin-releasing factor receptor type 1 (CRF1) (**Figure 8B**). Improving the scaffold for the presentation of the ECLs and N-terminus, as well as increasing the overall yields through synthesis optimization, enabled structural analysis of the receptor mimic – ligand interaction through NMR spectroscopy (Abel et al., 2014).

The epidermal growth factor receptor (EGFR), which is a key protein of cell proliferation and differentiation (Yarden and Sliwkowski, 2001), has also been subject to structure-based design of receptor mimetic peptides. As the receptor forms dimers or even oligomers, Hanold et al. (2015) generated a peptide mimic of the EGFR dimerization arm, which forms a β-hairpin in the native conformation. This peptide was stabilized *via* a triazole crosslink to increase proteolytic stability, while retaining the native structure, resulting in inhibition of EGFR dimerization and, consequently, a reduction of cell viability. Sequence and functional optimization of EGFR mimetic peptides may be useful for the development of novel cancer drugs addressing EGFR overexpression in tumors.

#### Peptides in Cancer Research

The uncontrolled growth and spread of cells into tumor tissue (Vogelstein and Kinzler, 2004) defines cancer as one of the main fatal diseases worldwide. Therefore, a major focus in peptide drug development is on oncology (Kaspar and Reichert, 2013; Fosgerau and Hoffmann, 2015). Apart from using peptides directly as anticancer drugs (Thundimadathil, 2012), they can also serve as targeting agents to direct highly toxic chemotherapeutics to their respective targets, reducing the systemic toxicity of these drugs [for details see Kaspar and Reichert (2013)].

Structure-based approaches are often used in the design of anticancer peptides, such as the inhibitor of cell migration and invasion published by Bifulco et al. (2008). The urokinase-type plasminogen activator receptor (uPAR), which plays a critical role in cancer cell growth, survival, invasion and metastasis, contains a five amino acid sequence (SRSRY) between two of its three domains, which is exposed through ligand binding, and mediates chemotactic properties of uPAR. Using the pentapeptide SRSRY as a template, glutamine scanning and insertion of a pyroglutamine (pE) resulted in the identification of the peptide pERERY-NH2 as a highly active uPAR inhibitor. Further optimization through structure-based design led to the tetrapeptide Ac-RERF-NH2, which is 500- to 1000-fold more active than pERERY-NH2 (Carriero et al., 2009). Ac-RERF-NH2, which has a high propensity to adopt an α-turn structure, represents a promising drug candidate against cancer.

An important target for the therapy of pancreatic, gastric, and colorectal tumors is gastrin, a peptide hormone whose activity can be blocked by antibodies that recognize gastrin as their epitope, delaying tumor growth (Watson et al., 1996; Barderas et al., 2008b). Detailed analysis of the antibody epitopes through alanine scanning of gastrin (Barderas et al., 2008b) and docking of the epitope into the antibody binding site, followed by affinity maturation through phage display and *in silico* methods (Barderas et al., 2008a), resulted in the development of antibody fragments with enhanced potency to inhibit gastrin-induced tumor growth. With the aim of shrinking these antibodies to the size of peptidomimetics, Timmerman et al. (2005, 2009) used a strategy in which up to three peptides derived from the complementarity-determining regions (CDRs) of an antibody are presented in one molecule using the CLIPS strategy (see Toolbox for peptide synthesis: non-proteinogenic amino acids and site-selective ligation). In most cases, the activity of the obtained peptides was much lower than that of the parent antibodies. Nevertheless, neutralization of gastrin in cell-based assays by the mimetic peptides could be demonstrated (Timmerman et al., 2009). The mode of action of the peptides, however, may be different from that of the parent antibodies (Timmerman et al., 2010), leading to the conclusion that further efforts in peptide design have to be made.

Small GTPases, such as Ras, Rab, and Rho, are key proteins in many cancers, as malfunction of these proteins results in abnormal cell growth and differentiation, prolonged cell survival, membrane trafficking, and vesicular transport (Bourne et al., 1990; Cherfils and Zeghouf, 2013). Inhibiting the activity of these small GTPases could lead to new chemotherapeutic drugs for cancer treatment. One strategy to achieve this is to address the GDP–GTP exchange of Ras, which is the rate-limiting step and requires interaction with the Ras-specific guanine nucleotide exchange factor Sos (Konstantinopoulos et al., 2007). Patgiri et al. (2011) published the structure-based design of an α-helical peptide derived from the Sos protein, which is able to inhibit Sos-mediated Ras activation through interference with the Sos–Ras interaction, providing a promising lead compound for anti-cancer drugs. Likewise, peptide mimics of the Rab ligands R6IP, LidA, REP1, and Rabin8 have been reported (Spiegel et al., 2014). Using the hydrocarbon-peptide stapling approach, α-helical peptides were stabilized at positions *i* and *i* + 4, resulting in up to 200-fold increased affinity of the peptide to Rab proteins. In addition, one of the peptides, being a pioneer inhibitory compound for Rab GTPase–protein interactions, was found to inhibit the Rab8a–effector interaction.

A challenge in cancer drug delivery is the discrimination between self and non-self, i.e., clearance of drug-loaded nanoparticles before they reach their target. To overcome this problem, synthetic polymers such as polyethylene glycol are used, but these can hamper uptake by cancer cells (Hong et al., 1999a). As an alternative strategy, Rodriguez et al. (2013) generated, based on the crystal structure of the hCD47–hSIRPα complex, and in combination with computational simulations, a minimal "self" 21-mer peptide. This peptide, which originates from CD47, an established marker of "self" (Rodriguez et al., 2013), was able to prolong the circulation of nanobeads in mice by preventing phagocytosis, providing a new opportunity for enhanced delivery of drugs or imaging agents. As an example for its utility as a marker of "self," the anti-cancer drug paclitaxel was loaded onto nanoparticles, which also presented the marker-peptide on their surface. Due to delayed clearance, treatment with peptide-coated nanoparticles induced a more efficient size-reduction of lung adenocarcinoma epithelial tumors in mice than beads without the peptide. Although this peptide is not the bio-active compound, it provides an excellent tool for the delivery of drugs to tumor tissues.

Peptides are also promising candidates for cancer immunotherapy, where they are used as vaccines that present tumor-associated antigens, which trigger an immune response against the tumor in the patient. It can be expected that peptides presenting tumor-associated antigens will increasingly gain significance for cancer immunotherapy in the future (Miller et al., 2013).

#### Peptides as Antibiotics and Anti-Inflammatory Compounds

The growing multi-resistance of bacteria to clinically used antibiotics is one of the current challenges in biomedical research (Dennesen et al., 1998). The development of new antibacterial drugs is therefore an urgent necessity, and peptides have proven beneficial in this area of drug development as well. Robinson et al. (2005) demonstrated improved antimicrobial activity, as well as improved plasma half-life, of β-hairpin mimics of the naturally occurring membranolytic host-defense peptide protegrin 1 (Shankaramma et al., 2002; Srinivas et al., 2010) (**Figure 5D**). These peptides were cyclized via a d-proline–l-proline template, reducing flexibility and stabilizing the conformation of the peptide (Shankaramma et al., 2002; Robinson et al., 2005; Srinivas et al., 2010). Furthermore, these peptides were shown to directly interact with the bacterial β-barrel protein LptD, which sets them apart from other antimicrobial peptides, whose effect is mainly based on a membranolytic activity.

An anti-inflammatory peptide, named CHOPS (Bunschoten et al., 2011) (**Figure 9**), was designed based on the structure of the chemotaxis inhibitory protein of *Staphylococcus aureus* (CHIPS) (Veldkamp et al., 2000; Haas et al., 2005; Ippel et al., 2009). CHIPS is known to bind to the C5a-receptor and to inhibit the C5a–C5a-receptor interaction (Postma et al., 2004), thus addressing an important element in the complement cascade of the innate immunity. As full-length CHIPS is highly immunogenic (Gustafsson et al., 2009), its peptide mimic CHOPS, whose conformation is similar to the respective CHIPS fragment, and which binds to the N-terminus of the C5a-receptor (Bunschoten et al., 2011), may become a promising alternative for the treatment of inflammatory and autoimmune diseases.

Proteins in the outer membrane of Gram-negative bacteria often have β-barrel structures. The proper assembly of these proteins is ensured by the β-barrel assembly machine (Bam) (Hagan et al., 2011). One important component of the Bam multiprotein complex is BamD, which interacts with unfolded protein substrates, like BamA, and facilitates their assembly in the outer membrane (Hagan et al., 2013). Using a peptide scanning approach of the C-terminal region of BamA, a 15-mer peptide was identified as an inhibitor of outer membrane protein assembly (Hagan et al., 2015). *In vivo* expression of this peptide resulted in bacterial growth defects, and sensitized resistant *Escherichia coli* to antibiotics, marking a starting point for the development of new antibiotic compounds for Gram-negative bacteria (Hagan et al., 2015).

It should also be noted that a plethora of antimicrobial peptides are found in numerous organisms, including insects, mammals, plants, and bacteria (Mojsoska et al., 2015), which are not the subject of this review. Furthermore, computer-based design strategies are aimed at the design of antimicrobial peptides with improved activity and reduced mammalian cell toxicity (Fjell et al., 2012).

#### CHALLENGES AND FUTURE DIRECTIONS

Due to their intrinsic properties, such as their potential for highly specific interactions with target molecules, generally low toxicity and immunogenicity, and rapid clearance, peptides are increasingly appreciated as candidates for novel drugs. This is particularly true for the development of protein–protein interaction inhibitors, where peptides are often better able than small molecules to cover large protein interface areas.

On the other hand, peptides also present severe bottlenecks that need to be considered and, if necessary, addressed in the development of peptide drugs. The biggest challenge clearly is the limited metabolic stability of peptides, since they are rapidly degraded by proteolytic enzymes, precluding oral administration of peptide drugs. This challenge can be addressed by different means. First, unlike recombinant protein synthesis, chemical peptide synthesis is not limited to the proteinogenic amino acids as building blocks. A plethora of additional amino acids are currently available for chemical peptide synthesis. Apart from dramatically increasing the metabolic stability of peptides, incorporation of these amino acids also increases the chemical diversity presented by synthetic peptides, as these additional amino acids introduce chemical moieties that are not presented by the proteinogenic amino acids. Furthermore, conformational stabilization through cyclization, or through introduction of defined secondary structures, has been shown to shield peptides from proteolytic enzymes. Such shielding effects can also be achieved by coupling the peptide to larger inert molecules, such as polyethylene glycol (Swierczewska et al., 2015).

Due to their molecular size, peptides are rarely able to passively pass cell membranes, limiting their utility to address intracellular target molecules. This drawback, however, can be counteracted by attaching the drug peptide to one of a large group of available cell-penetrating peptides (Kurrikof et al., 2015), which are able to transport a variety of molecular cargo into cells.

In general, the chemical synthesis of peptides through solid-phase synthesis is fairly straightforward and has been optimized over the past decades, so that virtually all peptide sequences are accessible synthetically today. In our experience, however, the synthesis of specific peptides may require the use of specific protected amino acids and other building blocks, solid supports, linkers and other reagents, which significantly increases the cost of synthesis. These considerations may become relevant for the large-scale synthesis of peptide drugs, as well as peptide biomaterials.

The design of peptides as protein–protein interaction inhibitors is typically based on the resolved 3D structure of the respective protein–protein complex. While such structures are increasingly becoming available through powerful X-ray crystallography technology, their generation is not trivial and is contingent on the availability of suitable crystals of the protein complexes.

Overall, taking into account the tremendous technical and scientific progress in the field of using peptides as protein mimics, we strongly believe that the significance of synthetic peptides in biomedical research, as well as in biomaterial engineering, will continue to grow in the future.

#### CONCLUSION

The design of peptides as protein mimics has evolved as a promising strategy for the exploration of, as well as the controlled interference with, protein–protein interactions. Due to their chemical nature, peptides are an appropriate type of molecules for the mimicry of protein-binding sites, including those involving large protein–protein interfaces. The possibility to use non-proteinogenic amino acids, as well as various methods of chemical modification, greatly enhances the scope of chemical and structural versatility, as well as stability, of synthetic peptides. Apart from their significance as molecular tools to explore protein–protein interactions, such protein mimetic peptides are also candidates for the inhibition of protein–protein interactions involved in disease processes. Furthermore, peptides play an important role in biomaterial engineering, as they are biocompatible, biodegradable, and functionally selective. Photo-switchable peptides can be used to temporally and/or spatially control processes in organisms, such as drug release at specific organs or tissues. These applications illustrate the utility and versatility of synthetic peptides as molecular tools in biomedical research, as well as in synthetic biology.

#### AUTHOR CONTRIBUTIONS

AG and CH have written individual chapters and prepared figures. HS and JE have written and edited the manuscript.

#### ACKNOWLEDGMENTS

The authors acknowledge support through the Emerging Fields Initiative, Project Synthetic Biology, by the University of Erlangen-Nuremberg. CH was supported by a Research Fellowship from the Alexander von Humboldt Foundation.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Groß, Hashimoto, Sticht and Eichler. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Synthetic Protein Scaffolds Based on Peptide Motifs and Cognate Adaptor Domains for Improving Metabolic Productivity**

*Anselm H. C. Horn\* and Heinrich Sticht\**

*Bioinformatik, Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany*

The efficiency of many cellular processes relies on the defined interaction among different proteins within the same metabolic or signaling pathway. Consequently, a spatial colocalization of functionally interacting proteins has frequently emerged during evolution. This concept has been adapted within the synthetic biology community for the purpose of creating artificial scaffolds. A recent advancement of this concept is the use of peptide motifs and their cognate adaptor domains. SH2, SH3, GBD, and PDZ domains have been used most often in research studies to date. The approach has been successfully applied to the synthesis of a variety of target molecules including catechin, D-glucaric acid, H<sub>2</sub>, hydroquinone, resveratrol, butyrate, gamma-aminobutyric acid, and mevalonate. Increased production levels of up to 77-fold have been observed compared to non-scaffolded systems. A recent extension of this concept is the creation of a covalent linkage between peptide motifs and adaptor domains, which leads to a more stable association of the scaffolded systems and thus bears the potential to further enhance metabolic productivity.

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*M. Kalim Akhtar, University of Edinburgh, UK Mattheos Koffas, Rensselaer Polytechnic Institute, USA*

#### *\*Correspondence:*

*Anselm H. C. Horn anselm.horn@fau.de; Heinrich Sticht heinrich.sticht@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 26 August 2015 Accepted: 05 November 2015 Published: 23 November 2015*

#### *Citation:*

*Horn AHC and Sticht H (2015) Synthetic Protein Scaffolds Based on Peptide Motifs and Cognate Adaptor Domains for Improving Metabolic Productivity. Front. Bioeng. Biotechnol. 3:191. doi: 10.3389/fbioe.2015.00191*

**Keywords: adaptor domain, linear peptide motif, protein scaffold, fusion protein, metabolic engineering**

## **INTRODUCTION**

Nature has developed highly efficient ways for signal and substrate processing in living cells. Synthetic biology is inspired by the natural archetype and tries to mimic and optimize biological processes for tailor-made applications (Luo et al., 2013). Milestone achievements of this relatively young area include the microbial production of artemisinic acid, a key precursor of the antimalarial drug artemisinin (Martin et al., 2003; Ro et al., 2006), the industrial production of 1,3-propanediol, used in a multitude of further applications (Nakamura and Whited, 2003), and the reconstruction of a complete microbial genome (Gibson et al., 2010).

The rational design in synthetic biology is frequently inspired by the spatial proximity of enzymes observed in nature (Conrado et al., 2008; Luo et al., 2013). Following evolution, such artificial bioreactors implement proximity mostly via modular scaffolds (Carroll, 2005; Bhattacharyya et al., 2006). In this concept, modular building blocks are used for the creation of large custom scaffold systems. Such scaffolds define the spatial organization of enzymes and allow substrate channeling like in natural systems, which has several advantages: it rescues the intermediates from diffusion or competing pathways, decreases their transit times, and avoids unfavorable equilibria and kinetics from metabolite concentrations in the bulk phase (Miles et al., 1999; Spivey and Ovadi, 1999). In nature, many organisms have developed multifunctional enzyme systems with a pivotal role in both primary metabolism [e.g., amino acid biosynthesis (Welch and Gaertner, 1980) or fatty acid oxidation (Ishikawa et al., 2004)] and secondary metabolism [e.g., multifunctional polyketide synthases in bacteria (Pfeifer and Khosla, 2001) and flavonoid or alkaloid biosynthesis in plants (Jorgensen et al., 2005)].

Multifunctional enzyme systems that mimic natural systems can be artificially constructed via at least four general strategies: (I) *colocalization* or *immobilization* of enzymes was the first approach to be of practical use; (II) *compartmentalization* generates an enclosed reaction area that can be defined similarly to biological systems (e.g., in cell organelles); (III) *DNA/RNA building blocks* can be utilized for spatial organization of reactive centers; and (IV) *protein scaffolding* presents a versatile approach in synthetic biology. This last scaffolding principle can be divided further into several approaches: fusion proteins are constructed by linking two or more enzymes into a single protein sequence. Non-covalent protein–protein interactions via the mutual recognition of folded domains or coiled-coil pairs as well as amyloid assemblies can be used to construct scaffolds with defined stoichiometry. A brief overview of all these scaffolding strategies is provided as supplementary information. Another concept in protein-based scaffolding, which is the focus of this review, uses protein adaptor domains and peptide ligands for bringing enzymes into spatial proximity.

## **SCAFFOLDING BASED ON ADAPTOR DOMAINS: STRUCTURAL PRINCIPLES**

Many important physiological protein interactions are mediated by relatively small protein domains, which bind to peptides exhibiting specific sequence motifs (**Figure 1A**) (Dinkel and Sticht, 2010). In this type of interaction, only the adaptor domain adopts a globular three-dimensional structure while the interaction motif is mostly linear and has, therefore, been termed short linear interaction motif (SLiM). This type of protein–ligand interaction presents a promising concept in protein scaffolding (**Figure 1B**) that has gained a lot of attention in synthetic biology applications.

Protein scaffolding based on cognate adaptor domains and peptide motifs requires a careful selection of candidate domains and SLiMs as well as the choice of proper linkers to interconnect these moieties and to attach them to the enzymes of interest (**Figure 1B**). The properties of these three building blocks, i.e., domain, linker, and peptide ligand, critically affect the shape of the resulting scaffold (cf. **Figure 1**) and will be described in the following in more detail.

**FIGURE 1 | Scaffolding with adaptor domains and peptide motifs**. **(A)** Schematic view of scaffolding modules, i.e., three different adaptor domains (D1–D3) with their peptide ligands (black line forms). **(B)** Scaffold protein built by the three adaptor domains D1–D3 with three enzymes (E1–E3) bound via peptide ligands, which are fused by a linker region (pink) to the respective enzyme. **(C)** Alternative scaffold system formed by three peptide ligands [cf. Lu et al. (2014)], which bind to the respective adaptor domain fused to an enzyme. **(D)** Three-dimensional structures of protein domains used for scaffolding: SH2 domain in yellow [PDB-code: 3WA4, Higo et al. (2013)], SH3 domain in red [PDB-code: 1WA7, Schweimer et al. (2002)], PDZ domain in blue [PDB-code: 4UU5, Ivanova et al. (2015)], and GBD domain in green [PDB-code: 2K42, Cheng et al. (2008)]. Structural representations were created with VMD (Humphrey et al., 1996).

## **Adaptor Domains**

Adaptor domains used in protein-peptide scaffolding need to fulfill two basic requirements. First, they should have a strong affinity toward their peptide ligands to allow for effective coupling. Second, they should provide a distinct specificity for their ligands to allow for defined coupling when several domain-ligand pairs are used simultaneously. The most often used domains are SH3, SH2, PDZ, and GTPase-binding domain (GBD) (cf. **Figure 1D**).

The first protein modules that were reported to mediate interaction with SLiMs are the "Src homology 2" (SH2) and "Src homology 3" (SH3) domains (Koch et al., 1991). SH3 domains are small modules of ca. 60 residues. They recruit proline-rich ligands, which bind to the domain surface at three shallow grooves formed by conserved aromatic residues (Mayer, 2001) and exhibit two different binding orientations. Over the last few years, an increasing number of SH3 domains with different ligand binding specificity have been described (Saksela and Permi, 2012).

SH2 domains are highly conserved structures of ca. 100 residues comprising two α-helices and seven β-strands (Pawson et al., 2001). In nature, this domain possesses either a promiscuous or a strict specificity for a motif of 3–5 residues flanking a phosphorylated tyrosine; as for the SH3 domain, additional SH2 binding modes were discovered, underscoring the plasticity of this recognition type in a physiological context (Machida and Mayer, 2005).

PDZ domains are also widely used for scaffolding. They are of similar size as SH2 domains and target specific motifs at the C-terminus of the binding partner. The peptide ligand adopts a β-strand and extends an existing β-sheet within the PDZ domain upon binding (Schultz et al., 1998; Harris and Lim, 2001). At least four different classes of ligands are known for PDZ domains exhibiting a distinct binding specificity (Songyang et al., 1997).

The last example of an established domain-ligand pair in synthetic biology originates from GBDs. In contrast to the other domains discussed above, isolated GBDs do not adopt a single, discrete structure under physiological conditions but rather sample multiple, loosely packed conformations in solution (Abdul-Manan et al., 1999; Kim et al., 2000). The corresponding peptide ligand has been deduced from the autoinhibited form of the GBD (Dueber et al., 2009). **Figure 1D** shows three-dimensional structures of the SH2, SH3, PDZ, and GBD domains. Beyond the examples presented above, other domain/ligand pairs may also be utilized for synthetic scaffolds if they exhibit a sufficiently high affinity and specificity for their ligand.

### **Linear Motif Peptides**

Short linear interaction motifs are the complementary binding partner to protein adaptor domains. These peptide motifs occur in disordered protein regions and are present in 20–50% of all eukaryotic proteins, while up to 17% of the proteins are completely disordered in eukaryotic cells. To date, ~300 known motif patterns are listed in electronic databases, e.g., ELM database (Dinkel et al., 2014), PROSITE (Hulo et al., 2004), and Minimotif-Miner (Balla et al., 2006). Interestingly, there are estimates that in the proteome the SLiM-mediated instances in signaling pathway modulation outnumber those mediated by globular domains (McEntyre and Gibson, 2004).

Linear motif peptides possess a number of properties that make them well suited as ligands in synthetic biology. The interaction motifs normally comprise only 3–10 amino acids and are thus rather short and intrinsically disordered. Furthermore, SLiMs may constitute the sites of post-translational modification (e.g., phosphorylation), which enables them to function as inducible switches.

In addition to the key residues necessary for binding, SLiMs also frequently contain variable residues (denoted as "x") to ensure proper spacing between the binding residues. Because the peptide lacks a defined structure prior to binding, this peptide–domain interaction differs from the well-known domain–domain interactions in protein complexes. Prominent examples for SLiM sequence patterns include the classical P–x–x–P motif for binding to SH3 domains or a phosphorylated tyrosine with specific sequence neighbors for binding to SH2 domains.
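To illustrate how such a sequence pattern can be handled computationally, the following minimal Python sketch scans a protein sequence for the P–x–x–P core motif using a regular expression. The function name `find_pxxp` and the test sequence are hypothetical and serve illustration only; real SH3 ligand motifs, such as those catalogued in the ELM database, carry additional flanking specificity determinants that this simple pattern ignores.

```python
import re

# Minimal P-x-x-P core pattern; "." stands for the variable residue "x".
# A lookahead is used so that overlapping occurrences are reported as well.
PXXP = re.compile(r"(?=(P..P))")


def find_pxxp(sequence: str) -> list:
    """Return (1-based start position, matched segment) for every P-x-x-P hit."""
    return [(m.start() + 1, m.group(1)) for m in PXXP.finditer(sequence.upper())]


# Hypothetical example sequence (not taken from any study cited here)
example = "MAPPLPPRNRPRLGS"
print(find_pxxp(example))  # -> [(3, 'PPLP'), (4, 'PLPP')]
```

A hit from such a scan only marks a candidate site; whether the segment actually binds a given SH3 domain depends on the residues surrounding the core motif and has to be verified experimentally.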

## **Linkers**

The last part necessary for modular protein scaffolding is the linker region connecting the engineered enzymes and the attached peptide ligands (**Figure 1B**). The importance of linker design is well known from fusion proteins (Chen et al., 2013), as length and amino acid composition may influence the activity and folding properties of the protein construct (Robinson and Sauer, 1998; Bai and Shen, 2006; Zhao et al., 2008).

As a guide for the rational design of artificial linkers, an inspection of natural linkers is helpful. Two independent studies with different data sets gave similar results: while Argos found a preferred mean linker length of 6.5 residues (Argos, 1990), George and Heringa obtained a value of 10.0 *±* 5.8 residues (George and Heringa, 2002). Generally, polar or charged residues were enriched in the natural linkers, with a secondary structure preference for coil (Argos, 1990) or helix (George and Heringa, 2002), respectively. Natural linkers lack interaction with neighboring protein domains and adopt mainly non-globular conformations (Chen et al., 2013).

Designed linkers may be classified according to their structure, which defines their functionality. Flexible linkers are normally rich in small or hydrophilic amino acids and allow for an increased spatial separation and reorientation of the fused parts. A prominent and very early example for a flexible linker is (GGGGS)<sub>3</sub>, which connected the heavy and light chain domains (V<sub>H</sub> and V<sub>L</sub>) of an engineered antibody fragment (Huston et al., 1988). Rigid linkers with the sequence (EAAAK)<sub>*n*</sub> exhibit a stable helical structure and thus maintain a certain distance between the fused parts. This linker type has been successfully used to increase the enzymatic efficiency of bifunctional fusions of β-glucanase and xylanase (Lu and Feng, 2008).
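The repeat-based nature of these linkers makes them easy to generate programmatically. The short Python sketch below builds flexible (GGGGS)<sub>*n*</sub> and rigid (EAAAK)<sub>*n*</sub> linker sequences and assembles an enzyme–linker–ligand fusion as in **Figure 1B**. All function names as well as the enzyme and ligand sequences are hypothetical placeholders, not constructs from the cited studies.

```python
def flexible_linker(repeats: int = 3) -> str:
    """Flexible linker of the (GGGGS)n type; n = 3 gives the classical 15-mer."""
    return "GGGGS" * repeats


def rigid_linker(repeats: int) -> str:
    """Helical, rigid linker of the (EAAAK)n type."""
    return "EAAAK" * repeats


def fuse(enzyme_seq: str, linker_seq: str, ligand_seq: str) -> str:
    """Concatenate enzyme, linker, and C-terminal peptide ligand."""
    return enzyme_seq + linker_seq + ligand_seq


# Placeholder sequences for illustration only
enzyme = "MKTAYIAKQR"
ligand = "APPLPPR"

construct = fuse(enzyme, flexible_linker(3), ligand)
print(construct)
print(len(flexible_linker(3)), len(rigid_linker(2)))
```

Varying `repeats` is a simple way to enumerate candidate linker lengths, which, as the studies discussed in the following section indicate, can influence the productivity of the scaffolded system.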

It should be noted that linker regions may have additional benefits. They potentially improve folding and stability (Huston et al., 1988; Takamatsu et al., 1990; Werner et al., 2006; Hagemeyer et al., 2009), expression (Amet et al., 2009), or even bioactivity (Bai and Shen, 2006).

## **SCAFFOLDING BASED ON ADAPTOR DOMAINS: APPLICATION TO METABOLIC ENGINEERING**

In this section, several applications of scaffolding using adaptor domains and peptide ligands are presented. The key features of the engineered systems are summarized in **Table 1**.

As one prominent example for this approach, Dueber et al. (2009) engineered a model scaffold for the three-step synthesis of mevalonate (Martin et al., 2003), which is an important precursor for the large field of isoprenoids, starting from acetyl-CoA. The enzymatic system comprised three modules, acetoacetyl-CoA thiolase (AtoB), hydroxymethylglutaryl-CoA synthase (HMGS) and hydroxymethylglutaryl-CoA reductase (HMGR). Of these modules, only AtoB is native to the host system *Escherichia coli*, whereas the two others were imported from *Saccharomyces cerevisiae*. To avoid flux imbalances with high metabolic load and to increase the overall production, scaffold constructs of three domains (GBD, SH3, and PDZ) connected via flexible linkers were created, and AtoB, HMGS, and HMGR were each extended by the corresponding peptide ligand. As this first simple scaffold design yielded only slightly increased product titers compared to the scaffold-free system, the authors designed scaffolding proteins with a varying number of SH3 and PDZ domains. This systematic search revealed the best synthetic scaffold for this system, GBD<sub>1</sub>–SH3<sub>2</sub>–PDZ<sub>2</sub>, i.e., one GBD domain linked to two SH3 and two PDZ domains, which exhibited a remarkable 77-fold increase of the product. Furthermore, the authors also investigated the influence of the spatial orientation of the domains toward each other by changing the order of the two SH3 and PDZ domains. A further increase, however, was not observed in these additional systems (Dueber et al., 2009).
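The design space of such a systematic search can be sketched in a few lines of Python. The example below enumerates scaffold architectures with one GBD domain and a variable number of SH3 and PDZ domains. The upper limit of four copies per domain type and the function name `scaffold_designs` are assumptions made for illustration; they do not reproduce the exact set of constructs tested by Dueber et al. (2009).

```python
from itertools import product


def scaffold_designs(max_copies: int = 4) -> list:
    """Enumerate GBD(1)-SH3(x)-PDZ(y) architectures for x, y in 1..max_copies.

    Returns a list of (name, domain_order) tuples, where domain_order lists
    the domains from N- to C-terminus.
    """
    designs = []
    for n_sh3, n_pdz in product(range(1, max_copies + 1), repeat=2):
        domain_order = ["GBD"] + ["SH3"] * n_sh3 + ["PDZ"] * n_pdz
        name = f"GBD(1)-SH3({n_sh3})-PDZ({n_pdz})"
        designs.append((name, domain_order))
    return designs


designs = scaffold_designs()
print(len(designs))  # number of candidate architectures
for name, order in designs[:3]:
    print(name, order)
```

Each enumerated architecture corresponds to one scaffold construct to be cloned and tested, so the quadratic growth of the list with `max_copies` gives a rough impression of the experimental effort behind such a screen.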

In order to demonstrate the generality of this approach, the same group strove to increase the production of D-glucaric acid from D-glucose via scaffolding. The synthetic pathway had originally been constructed by Moon et al. (2009): myoinositol-1-phosphate synthase (Ino1) from *S. cerevisiae*, myoinositol oxygenase (MIOX) from mouse, and uronate dehydrogenase (Udh) from *Pseudomonas syringae* were coexpressed in *E. coli*. A domain-based scaffold for the two enzymes Ino1 and MIOX, which were equipped with the respective peptide ligand sequences, tripled the product titers compared to the original system (Dueber et al., 2009). Further optimization of the system by including Udh in the scaffold and also varying the number of cognate domains within the scaffold allowed for an additional product increase of ~50% (Moon et al., 2010).

The first artificially scaffolded redox pathway was presented by Agapakis et al. (2010). They engineered a hydrogen-producing electron transfer circuit in *E. coli* composed of the heterologously expressed enzymes [Fe-Fe]-hydrogenase, ferredoxin, and pyruvate-ferredoxin oxidoreductase. A major issue was the risk of side reactions caused by high-energy electrons stored in iron-sulfur cluster proteins. They thus applied several methods to insulate the synthetic pathway, one of which was to utilize a protein scaffold constructed from the three domains GBD, SH3, and PDZ. This approach yielded a threefold increase of H<sub>2</sub> production. Furthermore, the authors investigated the influence of scaffold protein composition and peptide ligand linker length on the yield and found both to be significant factors.

**TABLE 1 | Examples of engineered scaffolds comprising adaptor domains and peptide ligands**.

*<sup>a</sup>Unscaffolded enzymes in parentheses.*

*<sup>b</sup>Scaffold domains without a ligand/enzyme counterpart in parentheses.*

*<sup>c</sup>Compared to the unscaffolded system.*

Scaffolds consisting of the same domains, GBD, SH3, and PDZ, were used to increase the production of butyrate in *E. coli* (Baek et al., 2013). For the complete biosynthetic pathway, the five enzymes acetoacetyl-CoA thiolase, 3-hydroxybutyryl-CoA dehydrogenase, 3-hydroxybutyryl-CoA dehydratase, trans-enoyl-coenzyme A reductase, and acyl-CoA thioesterase II were overexpressed in the host. For the three enzymes in the middle of the pathway, a domain scaffold was created to bring the reaction centers into closer spatial proximity. After additional variation of the number of each domain within the scaffold, production increased threefold.

Wang and Yu (2012) used the set of scaffold proteins composed of GBD, SH3, and PDZ domains established by Dueber et al. (2009) for another biotechnological application. Their work aimed to recruit two enzymes, 4-coumarate:CoA ligase and stilbene synthase, via covalently attached SH3 and PDZ peptide ligands for the biosynthesis of resveratrol, a naturally occurring defense molecule from plants with significant physiological effects on humans and animals. In contrast to the experimental settings discussed above, *S. cerevisiae* was used as the host system. The product yield increased fivefold via the scaffolding approach compared to the unscaffolded enzymes and 2.7-fold compared to a direct fusion protein approach.

The biosynthesis pathway of catechins from flavanone was the target of metabolic engineering efforts of Koffas and coworkers (Zhao et al., 2015). In their pathway optimization, they focused on three enzymes: flavanone-3-hydroxylase, dihydroflavonol 4-reductase, and leucoanthocyanidin reductase. Application of scaffolds composed of GBD, SH3, and PDZ domains yielded only marginal metabolic improvement in some cases, whereas most constructs tested exhibited a decreased productivity.

Besides the creation of a scaffold protein containing multiple adaptor domains, domain–ligand interactions can also be exploited in a different fashion as exemplified by the work of Vo et al. (2013). They enhanced the productivity of *E. coli* producing gamma-aminobutyric acid (GABA) by coupling glutamate decarboxylase (GadA/GadB) to the membrane protein glutamate/GABA antiporter (GadC). For that purpose, they attached an SH3 domain to GadA/GadB and three peptide ligand sequences to GadC each separated with flexible linkers. In that way, they could increase the GABA productivity by 2.5-fold.

A further and different approach used the localization of substrate and enzyme on a self-assembled monolayer for a 30-fold product increase (Li et al., 2010); 4-hydroxyphenyl 2-methylvalerate, which is converted by cutinase to a hydroquinone product, and an SH2 ligand were presented on the surface to the enzyme fused to an SH2 domain. As the SH2 domain only recognizes its ligand in phosphorylated form, the system contains a potential switch, which might be exploited in future applications.

## **OUTLOOK**

Inspection of **Table 1** reveals that the increase in metabolic productivity is highly dependent on the system investigated, and for some of the systems, there is little benefit from scaffolding. As suggested by Zhao et al. (2015), a further increase in catechin biosynthesis might be achieved from an optimization of linkers. An additional factor for optimization might be the use of alternative adaptor domains or the rational design of covalent bonds between the two binding partners in order to increase the stability of the scaffolded complex.

Recently, Lu et al. (2014) constructed two domain-ligand pairs for both SH3 and PDZ domains, in which the ligand–domain interaction was reinforced by an engineered thioether bond. For that purpose, a residue within the domain was mutated to cysteine, while the peptide ligand was equipped with an unnatural amino acid carrying a reactive α-chloroacetyl group. Binding of the ligand to the domain brought the two reactants in close proximity and established the covalent bond. Using this approach, the authors constructed several Y-shaped ligand structures via triazole bonds branched from a lysine site as mini-scaffolds (**Figure 1C**).

Similarly, Guan et al. (2013) created a disulfide bond between a PDZ domain and its ligand by mutating one residue to cysteine in each of the binding partners to reinforce the domain–ligand interaction. Fusing these modified moieties to the trimeric protein CutA, they were able to build stable hydrogels. By adding a second peptide ligand sequence to one CutA species, the hydrogel could be functionalized by an enzyme and formed an enzymatic biocathode for direct electron transfer.

A new versatile approach for covalent protein linkage is based on CnaB domains from bacteria, which autocatalytically establish isopeptide bonds between the sidechains of a lysine and an asparagine/aspartate residue (Veggiani et al., 2014). By a structure-based splitting of the CnaB domain into two parts, it was possible to create a domain-ligand pair that enables spontaneous formation of intermolecular isopeptide linkages (Zakeri et al., 2012; Li et al., 2014). A modification of this approach even allows two peptides to become covalently joined by an artificial ligase (Fierer et al., 2014). A recently described ester bond that forms autocatalytically in a bacterial cell surface adhesion protein (Kwon et al., 2014) also bears the potential for the construction of orthogonal covalent domain/ligand pairs. The enhanced stability due to the isopeptide or ester bonds may present a promising strategy to design more efficient scaffolds for artificial bioreactors in the future.

Synthetic biology is an emerging field with tremendous biotechnological potential. The efforts reviewed above clearly demonstrate that promising steps in this field have been made, though individual system design will require a tailor-made approach for achieving optimization. More complex metabolic pathways or large-scale industrial applications, however, would clearly benefit from an extended and well characterized tool-box of scaffolding components (Kwok, 2010).

## **FUNDING**

This work was supported by the Emerging Fields Initiative from the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), project "Synthetic Biology."

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00191

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Horn and Sticht. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Engineering of metabolic pathways by artificial enzyme channels

*Marlene Pröschel<sup>1</sup>, Rainer Detsch<sup>2</sup>, Aldo R. Boccaccini<sup>2</sup> and Uwe Sonnewald<sup>1</sup>\**

*<sup>1</sup>Department of Biology, Biochemistry Division, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany, <sup>2</sup>Department of Materials Science and Engineering, Institute of Biomaterials, Friedrich-Alexander-University Erlangen-Nuremberg, Erlangen, Germany*

Application of industrial enzymes for production of valuable chemical compounds has greatly benefited from recent developments in Systems and Synthetic Biology. Both *in vivo* and *in vitro* systems have been established, allowing conversion of simple into complex compounds. Metabolic engineering in living cells needs to be balanced, which is achieved by controlling gene expression levels, translation, scaffolding, compartmentation, and flux control. *In vitro* applications are often hampered by limited protein stability/half-life and insufficient rates of substrate conversion. To improve stability and catalytic activity, proteins are post-translationally modified and arranged in artificial metabolic channels. In this review article, we first discuss the supramolecular organization of enzymes in living systems and then summarize current and future approaches to design artificial metabolic channels by additive manufacturing for the efficient production of desired products.

#### *Edited by:*

*Zoran Nikoloski, Max-Planck Institute of Molecular Plant Physiology, Germany*

#### *Reviewed by:*

*Daehee Lee, Korea Research Institute of Bioscience and Biotechnology, South Korea; Lee Sweetlove, University of Oxford, UK*

> *\*Correspondence: Uwe Sonnewald uwe.sonnewald@fau.de*

#### *Specialty section:*

*This article was submitted to Synthetic Biology, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 29 July 2015 Accepted: 06 October 2015 Published: 21 October 2015*

#### *Citation:*

*Pröschel M, Detsch R, Boccaccini AR and Sonnewald U (2015) Engineering of metabolic pathways by artificial enzyme channels. Front. Bioeng. Biotechnol. 3:168. doi: 10.3389/fbioe.2015.00168*

Keywords: metabolic engineering, matrix-bound enzymes, protein scaffolding, enzyme arrays, metabolic channels, isopeptide-bonding, SpyCatcher/SpyTag, additive manufacturing

## INTRODUCTION

Living cells are highly dynamic and complex metabolic systems in which most enzymes do not function in isolation but form supramolecular complexes (Jørgensen et al., 2005). By providing spatial and temporal organization of molecules within the cell, these complexes allow optimized substrate channeling and thereby prevent loss of intermediates and improve control and efficiency of catalysis. To mimic supramolecular complexes, several approaches to co-localize functionally related enzymes have been followed. These include scaffolding of enzymes to generate artificial substrate channels (Dueber et al., 2009; Lee et al., 2012). *In vitro* scaffolding of enzymes can be achieved by, e.g., cross-linking, encapsulation, and binding to nucleic acid or protein scaffolds. The latter two options allow the sequential arrangement of enzymes in a correct, programmable, and defined spatial order. Protein-based scaffolding requires specific binding domains for interaction. This poses some problems: only a limited number of high-affinity interaction domains are available, the binding efficiency of different domains may not be comparable, and interactions are reversible, which may result in a short half-life of the artificial channel. To circumvent these problems, covalent linkages between the synthetic scaffold platform and the enzymes to be arranged would be advantageous. In nature, inter- and intramolecular isopeptide bonds are formed to stabilize proteins or to label proteins for proteolysis by ubiquitinylation (Kang and Baker, 2011). By dissecting the mechanism of spontaneous intramolecular isopeptide formation within the CnaB2 domain of the fibronectin-binding protein FbaB from *Streptococcus pyogenes* (Spy), Howarth and co-workers developed a versatile tool to allow covalent binding of tagged enzymes to modified macromolecules (Zakeri and Howarth, 2010). This approach can be applied to cell-free and possibly even to cellular systems.
Besides designing covalent/irreversible or reversible synthetic protein complexes for metabolic engineering, three-dimensional (3-D) printing of enzyme arrays may enable the design of *in vitro* protein channels. These channels do not rely on protein–protein interactions but are based on the sequential printing of individual enzymes. In this review article, we describe examples of supramolecular organization in cells and attempts to immobilize and stabilize enzymes for industrial use, and finally summarize current approaches to design artificial metabolic channels by additive manufacturing (AM) for efficient production of valuable chemical products.

#### CELLULAR PROTEINS ARE ORGANIZED IN SUPRAMOLECULAR STRUCTURES

Cellular systems are highly complex and contain high concentrations of macromolecules (Long et al., 2005; Conrado et al., 2008; Good, 2011; Chen and Silver, 2012). Within the cell, these molecules are organized in a temporal and spatial manner allowing the cell to fulfill its many distinct reactions that take place simultaneously (Good, 2011). Coordination and organization of cellular processes is achieved through compartmentation (Chen and Silver, 2012). The need for spatial and temporal organization of proteins in signaling pathways and metabolism is evident when looking at the crowded milieu of macromolecules inside cells and the many complex and competing reactions running concurrently (Sweetlove and Fernie, 2013). In signaling pathways the question arises, how correct interaction partners find each other while avoiding interaction and cross-talk with the wrong ones (Good, 2011). This is important since the correct communication of functionally interacting proteins is a prerequisite for the coordination and regulation of many cellular processes required for appropriate cellular responses to external and internal stimuli (Chen et al., 2014). Strict control and tight regulation of flux through metabolic pathways is of equal importance (Dueber et al., 2009). Metabolic regulation faces many challenges, including avoidance of flux imbalances, slow turnover rates of enzymes, toxic pathway intermediates, and competing metabolic reactions (**Figure 1**; Conrado et al., 2008; Chen and Silver, 2012; Lee et al., 2012). Consequently, engineering of artificial metabolic pathways in living cells often suffers from low productivity and yield if spatial organization/compartmentation strategies are not included in the engineering concepts (Conrado et al., 2008). 
To increase the overall cellular efficiency, accuracy, and specificity, nature has evolved compartmentation strategies to control and regulate flux through metabolic and signaling pathways (Chen and Silver, 2012; Conrado et al., 2012).

Intracellular compartmentation can be divided into macro-compartmentation and micro-compartmentation (Sweetlove and Fernie, 2013). Macro-compartmentation refers to the separation of reaction compartments, organelles, by biological membranes. Organelles, which are a hallmark of eukaryotic cells, contain a certain subset of metabolic enzymes that carry out distinct biological reactions. This physical separation of biological reactions increases the overall metabolic efficiency and allows even incompatible or contradictory reactions such as synthesis (anabolism) and degradation (catabolism) or oxidation and reduction to take place within the same cell at the same time. Compartmentation also allows detoxification of toxic pathway intermediates without harming the cell. In peroxisomes, for example, the cytotoxic reactive oxygen species H<sub>2</sub>O<sub>2</sub> is efficiently degraded by catalase that is specifically present in these organelles. Because the amount of catalase in peroxisomes is so high, released H<sub>2</sub>O<sub>2</sub> is degraded instantly.

In nature, most cellular multi-cascaded reactions are not catalyzed by free-floating, isolated enzymes but by multienzyme complexes. In these microcompartments, several enzymes form so-called metabolons or metabolic channels overcoming flux imbalances, diffusion and loss of intermediates, and release of toxic intermediates (**Figure 2**; Conrado et al., 2008; Lee et al., 2012; Jia et al., 2014). The assembly of sequential pathway enzymes into metabolons offers several advantages (Jørgensen et al., 2005; Conrado et al., 2008) when compared to isolated, soluble enzymes. The overall catalytic efficiency is increased because active centers of sequential pathway enzymes are brought into close proximity (enforced proximity) allowing direct transfer of intermediates from one enzyme to the other while avoiding metabolic interference. This phenomenon is called substrate or metabolic channeling and is based on the ordered cascading of subsequent enzymatic steps in which one enzyme produces the substrate of the following enzyme. Thanks to the reduction in diffusion distance and transit time, the effective local concentration of pathway intermediates is higher around the enzyme complex compared to the rest of the cell (Lee et al., 2012). This increase in local concentrations of metabolites prevents unspecific side reactions, favors reaction kinetics, and directs the intracellular flux of metabolites toward the synthesis of the desired product. However, it should be noted that this model is only true for diffusion-limited enzymes (Lee et al., 2012; Sweetlove and Fernie, 2013). In fact, the majority of enzymes are not diffusion-limited, which means that the chemistry/conversion process is slower than the diffusion rate. For those non-diffusion-limited enzymes, the most prominent benefits of metabolic channeling are a reduced time to steady state and a better control of reaction specificity and regulation of metabolic branch points (Lee et al., 2012; Sweetlove and Fernie, 2013). 
Moreover, the release of toxic and/or unstable intermediates into the bulk phase is restricted by such multienzyme complexes that function as pipelines that strictly control the metabolic flux.
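The distinction between diffusion-limited and non-diffusion-limited enzymes drawn above can be made concrete with a rough rule of thumb. The following Python sketch compares the catalytic efficiency k<sub>cat</sub>/K<sub>M</sub> of an enzyme with an assumed threshold of 10<sup>8</sup> M<sup>-1</sup> s<sup>-1</sup>, a value often quoted as the lower end of the diffusion limit. The function names, the threshold, and the kinetic constants are assumptions chosen for illustration and do not come from the studies cited here.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/KM in M^-1 s^-1."""
    if km_molar <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / km_molar


def is_diffusion_limited(kcat_per_s, km_molar, threshold=1e8):
    """Crude classification against an assumed diffusion-limit threshold.

    The default threshold (1e8 M^-1 s^-1) is a rule-of-thumb assumption,
    not a sharp physical boundary.
    """
    return catalytic_efficiency(kcat_per_s, km_molar) >= threshold


# Hypothetical kinetic constants, chosen only for illustration.
fast = is_diffusion_limited(kcat_per_s=1e4, km_molar=1e-5)   # very efficient enzyme
slow = is_diffusion_limited(kcat_per_s=10.0, km_molar=1e-3)  # typical slower enzyme
print(fast, slow)
```

For enzymes in the second category, co-localization would not be expected to raise the steady-state rate by itself, which is consistent with the benefits listed above for non-diffusion-limited enzymes.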

There are many examples in a wide range of organisms, including eukaryotes and prokaryotes, where the assembly of different individual sequential enzymes of a metabolic pathway into functional multiprotein complexes increases the overall catalytic efficiency. In the case of primary metabolism, multienzyme complexes are involved in central carbon metabolism (e.g., glycolysis, citric acid cycle), fatty acid oxidation, the Calvin cycle, amino acid biosynthesis (tryptophan synthesis), the carboxysome, and the proteasome (Conrado et al., 2008). Al-Habori (1995) stated for example that glycolytic enzymes are co-localized on actin filaments to form an active complex. Depending on the energy demand, other authors observed that many of the glycolytic enzymes can be functionally associated with the outer mitochondrial membrane when there is a need for pyruvate to fuel respiration (Giegé et al., 2003; Lunn, 2007; Møller, 2010). This defined intracellular spatial localization of the glycolytic enzymes makes sense as the product pyruvate can directly be transferred into the mitochondria where the next reactions of the central carbon catabolism (citric acid cycle) take place. Reversibility of this spatial organization is important, since pyruvate is an important and central intermediate of several biosynthetic pathways. Several enzymes performing sequential conversion steps in the citric acid cycle are also thought to be associated in a multienzyme complex within the mitochondrial matrix (Barnes and Weitzman, 1986; Lunn, 2007; Jia et al., 2014). The pyruvate dehydrogenase complex that catalyzes the conversion of pyruvate into acetyl-CoA also exhibits an efficient multienzyme structure allowing for substrate channeling and active-site coupling (Smolle and Lindsay, 2006; Jia et al., 2014).

proteins), and competing pathways (metabolic interference) also leading to undesirable side reactions.

In plant secondary metabolism, biosynthesis of isoprenoids, alkaloids, flavonoids, cyanogenic glucosides (e.g., dhurrin), and phenylpropanoids are examples of the involvement of multienzyme complexes (Winkel, 2004; Jørgensen et al., 2005; Conrado et al., 2008). The phenylpropanoid pathway demands a substantial portion of carbon and energy fixed during photosynthesis. It is organized in a metabolic grid giving rise to a large number of different metabolites (Laursen et al., 2015). Synthesis of the cyanogenic glucoside dhurrin requires seven enzymatic steps starting from tyrosine. In *Sorghum bicolor*, these reactions are catalyzed by two multifunctional enzymes and one monofunctional enzyme that form a metabolic channel. The importance of this channel became evident in transgenic *Arabidopsis* plants expressing only the first two enzymes of the pathway. These transgenic plants showed significant stunting, most likely caused by accumulation of the toxic pathway intermediate, p-hydroxymandelonitrile. Co-expression of the third enzyme restored normal plant growth and eliminated accumulation of the intermediate.

In fungi, supramolecular enzyme organization can be found in the polyaromatic/shikimate pathway. This pathway includes a multifunctional enzyme known as the AROM complex that has evolved to link five distinct enzymatic activities into a single pentafunctional polypeptide (Conrado et al., 2008). The fungal AROM complex is encoded by a single gene cluster that is expressed in a coordinate manner and also remains associated after synthesis. Interestingly, in *Escherichia coli* and other bacteria, the enzymes are encoded by individual genes distributed throughout the genome (Bachmann, 1983). This precise organization of multiple enzymatic activities on one single peptide, as found in the fungal AROM complex, allows very efficient substrate tunneling between adjacent active sites. Another example of supramolecular enzyme organization in the polyaromatic pathway is the tryptophan synthase. The enzyme is composed of two subunits α and β that assemble as a stable αββα multienzyme complex. Close vicinity of both subunits allows the α subunit to channel the reactive indole intermediate to the β subunit via a hydrophobic, physical tunnel exactly matching indole (Conrado et al., 2008; Dueber et al., 2009). In the case of the tryptophan synthase complex, the active sites are only 25 Å apart from each other leading to the prevention of diffusion of the reactive indole intermediate (Dueber et al., 2009). Thereby the cell is protected and the enzymatic conversion is significantly increased because of the high effective, local concentration of indole (Winkel, 2004; Conrado et al., 2008; Dueber et al., 2009; Chen and Silver, 2012; Lee et al., 2012; Jia et al., 2014).

Bacteria also spatially organize their interior milieu for specialized functions (Chen and Silver, 2012) although lacking membrane-bound organelles as isolated reaction compartments. The interior of prokaryotes can contain various protein-based compartments to physically separate distinct, often critical, enzymatic reactions (Boyle and Silver, 2012). These protein-based compartments are composed of multiple proteins that co-assemble to form thin protein shells and typically encapsulate sequential pathway enzymes (encapsulation). These shells are named bacterial microcompartments (BMCs). The number or amount of encapsulated enzymes is defined by the size of the protein-based compartments. The incorporation of the enzymes into the complex is thought to take place during the assembly of the shell proteins. Small pores (~0.5 nm) on the shells allow the exchange of metabolites according to the size exclusion limit of the pores. Therefore, unrestricted metabolite diffusion across the shell is prevented. Additionally, some of these pores have been shown to be selective (Lee et al., 2012). One example of BMCs found in cyanobacteria and other autotrophic prokaryotes (Chen and Silver, 2012) are carboxysomes (approximately 100 nm in diameter) that are estimated to contain 270 molecules of the key carbon fixation enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) (Lee et al., 2012; Chen et al., 2014). In these proteinaceous microcompartments, the local CO<sub>2</sub> concentration is dramatically increased in the vicinity of RuBisCO preventing the oxygenase reaction of the enzyme and hence increasing its catalytic activity (Lee et al., 2012; Chen et al., 2014). Another example of defined spatial organization to increase metabolic efficiency in prokaryotes is the multienzyme cellulosome complex found in the cellulolytic *Clostridium thermocellum* (Chen and Silver, 2012). 
Due to the specific interaction of the complementary protein domains Dockerin and Cohesin, many different enzymes required for the degradation of plant cell walls are organized as extracellular nanomachinery on a scaffold on the cell surface (**Figure 3**; Lytle and Wu, 1998; Bayer et al., 2004; Pinheiro et al., 2009; Mazzoli et al., 2012). The organization and co-localization of the different hydrolytic enzymes and the close proximity to the substrate provides an efficient synergistic strategy to degrade cellulose and hemicellulose (Gefen et al., 2012).
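The carboxysome figures quoted above (about 100 nm in diameter, an estimated 270 RuBisCO molecules) give a feel for how crowded such a compartment is. The following Python sketch converts them into an approximate molar concentration. It treats the carboxysome as a perfect sphere and ignores shell thickness and other contents, which are simplifying assumptions made only for this back-of-the-envelope estimate; the function name is hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration_in_sphere(n_molecules, diameter_nm):
    """Molar concentration of n_molecules inside a sphere of given diameter.

    Assumes an idealized spherical compartment (a simplification).
    """
    radius_m = diameter_nm * 1e-9 / 2.0
    volume_l = (4.0 / 3.0) * math.pi * radius_m ** 3 * 1000.0  # m^3 -> L
    return n_molecules / AVOGADRO / volume_l


conc = molar_concentration_in_sphere(270, 100.0)
print(f"{conc * 1e3:.2f} mM")  # prints "0.86 mM"
```

Under these assumptions the enzyme concentration is on the order of 1 mM, which illustrates why encapsulation can create a local environment very different from the bulk cytosol.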

In mammalian cells, the purinosome catalyzes the conversion of phosphoribosyl pyrophosphate (PRPP) to inosine monophosphate (IMP). This conversion requires six enzymes that are co-localized on microtubules to form an efficient metabolon. Assembly and disassembly are highly regulated and link the rate of *de novo* purine synthesis to the cellular purine nucleotide pool (DeLisa and Conrado, 2009). The purinosome complex is formed upon depletion of purines. The complex is associated with microtubules and stabilized by molecular chaperones, including HSP70 and HSP90. Accumulation of purines induces the disassembly of the complex. This process involves reversible protein phosphorylation by protein kinase CK2 (for review, see Laursen et al., 2015). Assembly and disassembly of purinosomes are remarkable examples of the dynamic regulation of multienzyme complexes, which is of utmost importance to tightly coordinate metabolic flux through metabolic channels (Conrado et al., 2008; DeLisa and Conrado, 2009; Møller, 2010).

If not encoded by a single gene cluster (e.g., the AROM complex in fungi), the described supramolecular enzyme complexes assemble post-translationally into rather stable or transient complexes. Transient complexes allow dynamic responses to intra- (e.g., metabolic demands) and extracellular stimuli or signals (e.g., abiotic and biotic challenges), which is important for fine-tuning metabolism according to the state/demand of the cell or organism (Conrado et al., 2008; DeLisa and Conrado, 2009; Møller, 2010; Sweetlove and Fernie, 2013). Transient or dynamic metabolons are also called "functioning-dependent structures" (Møller, 2010). This micro-compartmentation can be achieved in different ways. One strategy of living cells is to co-localize interacting proteins/enzymes by anchoring them on membranes or on cytoskeleton structures (actin, microtubules). Alternatively, multienzyme complexes are organized by scaffold proteins (Good, 2011). The latter strategy is found in some metabolic pathways such as the cellulosomes but occurs frequently in signal cascades. In yeast, for example, the Ste5 scaffold protein organizes the interaction between Fus3 (MAPK), Ste7 (MAPKK), and Ste11 (MAPKKK) and is essential for mating. In mammalian cells, the Kinase suppressor of Ras (KSR) functions as a scaffold in the Ras–Raf–MEK–MAPK pathway (Roy et al., 2002) and the PSD-95 synaptic scaffold is crucial for the organization of neuronal synapses controlling the neurotransmitter receptor density (Good, 2011). Scaffold proteins are defined as extremely diverse proteins that coordinate the physical assembly of individual partner molecules (Good, 2011). Scaffold proteins, composed of multiple modular interaction domains (for example, protein–protein interaction domains) or motifs (Good, 2011), form flexible platforms to which other proteins or relevant molecular components of a specific pathway can bind. Thereby, the interaction partners are co-localized in a modular manner. Signaling pathways, where cascaded enzyme reactions often take place, especially benefit from this subcellular, spatial organization. The scaffolding strategy supports specificity, accuracy, and efficiency of signal transduction pathways by enforcing proximity of the correct interaction partners, whereas incorrect interaction partners are excluded (Good, 2011).

## SYNTHETIC MULTIENZYME COMPLEX FORMATION TO MIMIC NATURE'S STRATEGY TO INCREASE METABOLIC AND SIGNALING EFFICIENCY

In metabolic engineering where a natural endogenous biosynthetic pathway is manipulated to increase productivity and yield of a valuable molecule (Sonnewald, 2003; Capell and Christou, 2004; Na et al., 2010; Chen and Silver, 2012), several challenges have to be overcome. One is the expression level of heterologous enzymes. In bacterial systems, expression levels are often much higher compared to endogenous enzymes. This can lead to cellular stress responses, for example, due to the huge amount of proteins produced that are often even unfolded or misfolded, flux imbalances coupled with unpredictable and non-controllable metabolic changes and high energy and molecule (amino acids, nucleotides) consumption. An additional challenge is to balance expression of consecutive enzymes (Na et al., 2010). If the reactions are not balanced in a manipulated pathway, toxic intermediates are likely to accumulate which can lead to death of the expression host (Dueber et al., 2009). With nature's strategies to increase metabolic efficiency in mind, metabolic engineers are trying to engineer artificial multienzyme complexes, where the enzymes performing consecutive reactions are spatially organized (directed enzyme organization). The idea is to co-localize functional enzymes into complexes. Due to the enforced proximity of the enzyme active sites and the formation of enzyme microdomains built as a consequence of coclustering of multiple enzymes into higher aggregates, catalytic efficiency and metabolic pathway performance are improved (Sweetlove and Fernie, 2013; Castellana et al., 2014). Overall, pathway balancing involves several layers, including DNA copy number, transcriptional and translational regulation, scaffolding and compartmentation, as well as inclusion of metabolic sensors balancing the flux through synthetic pathways (Boyle and Silver, 2012; Jones et al., 2015). 
In addition to balancing protein amount and organization, enzyme engineering allows the activity, selectivity, and stability of enzymes to be improved (Otte and Hauer, 2015). Several approaches can be followed to stabilize enzymes. One promising approach involves cyclization of enzymes, which has been achieved by using the split intein (Zhao et al., 2010) or SpyTag/SpyCatcher system (Schoene et al., 2014).
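The consequence of unbalanced consecutive reactions mentioned above can be illustrated with a deliberately simple toy model. The following Python sketch assumes that a first enzyme supplies an intermediate at a constant rate `v1` and that a second enzyme consumes it with Michaelis–Menten kinetics (`vmax2`, `km2`). The model, the function name, and all parameter values (arbitrary units) are assumptions for illustration and are not taken from the cited studies.

```python
def simulate_intermediate(v1, vmax2, km2, t_end=100.0, dt=0.01):
    """Euler integration of d[I]/dt = v1 - vmax2 * [I] / (km2 + [I]).

    Returns the intermediate concentration [I] at time t_end,
    starting from [I] = 0.
    """
    intermediate = 0.0
    for _ in range(int(round(t_end / dt))):
        v2 = vmax2 * intermediate / (km2 + intermediate)
        intermediate += (v1 - v2) * dt
    return intermediate


# Balanced case: the second enzyme has spare capacity (vmax2 > v1),
# so the intermediate settles at v1 * km2 / (vmax2 - v1).
balanced = simulate_intermediate(v1=1.0, vmax2=2.0, km2=1.0)

# Unbalanced case: supply exceeds the capacity of the second enzyme,
# so the intermediate keeps accumulating.
unbalanced = simulate_intermediate(v1=2.0, vmax2=1.0, km2=1.0)

print(round(balanced, 3), round(unbalanced, 1))
```

In the balanced case the intermediate reaches a low steady state, whereas in the unbalanced case it grows without bound in this model, which mirrors the accumulation of potentially toxic intermediates described in the text.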

#### NON-PROGRAMMABLE MATRIX-BOUND ENZYME COMPLEXES

One strategy of engineering efficient multienzyme complexes that mimic those found in nature is to co-immobilize multiple enzymes of a sequential/cascaded pathway on the same carrier or on the same matrix. Early studies on immobilized enzymes clearly demonstrated that tethering of enzymes to particles significantly improves product formation. Comparing the activity of three successive enzymes, β-galactosidase, hexokinase, and glucose-6-phosphate-dehydrogenase, either matrix-bound or soluble, Mattiasson and Mosbach (1971) observed a faster conversion of the substrate when the sequential enzymes are co-localized on the same particle. Coupling was achieved by the CNBr method yielding covalently bound enzymes. Obvious disadvantages of the system include (i) spatial arrangement of enzymes is impossible and (ii) chemical cross-linking bears the risk of losing enzyme activity because chemical cross-linking is random and can also affect amino acid residues in the active center of the enzyme. The group of Mallapragada (Jia et al., 2013) sequentially co-localized the two model enzymes glucose oxidase (GOX) and horseradish peroxidase (HRP) on dual-functionalized polystyrene nanoparticles. The nanoparticles had been functionalized with carboxyl groups which have partially been modified using biotin hydrazide resulting in biotinylated carboxyl-polystyrene nanoparticles. This dual-functionalization allowed the use of different attachment strategies for each enzyme to better control the relative amounts of the enzymes on the nanoparticle. The streptavidin-tagged HRP was attached to the nanoparticles via the high-affinity biotin–streptavidin interaction. The unmodified carboxyl groups on the nanoparticles were used to covalently attach GOX by amide bond formation between the reactive carboxyl groups on the nanoparticle and amino groups of the enzyme. Immobilized enzymes retained their enzymatic activity that was comparable to free enzymes. 
Interestingly, sequential co-localization of GOX and HRP resulted in a twofold enhancement of the overall product conversion rate compared to the free enzymes and a mixture of individual immobilized enzymes on nanoparticles (Jia et al., 2013).

In several studies, immobilization had positive catalytic or kinetic effects and at the same time stabilized the enzymes (Sheldon, 2007; Garcia-Galan et al., 2011; Homaei et al., 2013; Guzik et al., 2014). For industrial applications, immobilized enzymes have additional advantages: they can be reused over multiple cycles, and they are sequestered from the product stream. These properties improve industrial processes. The first applications of immobilized enzymes include glucose isomerase for high-fructose corn syrup, lipase for biodiesel production from triacylglycerides, and thermolysin for aspartame synthesis (DiCosimo et al., 2013). Compared with conventional fossil fuel-based chemistry, biomanufacturing offers many advantages, including biocompatibility and sustainability, which are driving massive growth of the world market for industrial enzymes.

## SYNTHETIC COMPARTMENTATION BY PROTEIN ENCAPSULATION

Besides sequestering or tethering enzymes on different matrices, encapsulating proteins in semi-permeable compartments physically separates different metabolic reactions and thereby increases pathway efficiency. As discussed above, bacteria encode proteinaceous shells, BMCs, in which functionally related enzymes are sequestered. To date, BMC shell-encoding genes have been found in over 400 sequenced bacterial genomes (Choudhary et al., 2012). The shell proteins form polyhedral structures similar to virus-like particles. Co-expression in *E. coli* of recombinant shell proteins and selected proteins of interest (e.g., pathway enzymes) fused to shell-targeting signal peptides leads to the functional compartmentation of the heterologous enzymes in recombinant shells (Choudhary et al., 2012). In addition to bacterial shell proteins, viral capsid proteins have also been used as a tool for compartmentalizing engineered pathways (Chen et al., 2014). Co-expression of the bacteriophage P22 capsid protein with two or three enzymes, fused by linker sequences into one multifunctional protein, allowed the design of a synthetic metabolon (Patterson et al., 2014). Several modifications have been tested to improve the suitability of capsid and cargo proteins. Adding positively charged peptides to capsid proteins and negatively charged peptides to cargo proteins improved binding through enhanced electrostatic forces (Chen et al., 2014).

## PROGRAMMABLE AND REVERSIBLE SCAFFOLDING

Besides co-immobilizing enzymes on carriers to build efficient synthetic multiprotein complexes, various genetic modules (proteins, nucleic acids) have been described as building blocks for programmable, modular scaffolds on which enzymes can be specifically co-localized in a spatially ordered manner by simple tethering mechanisms. Specific protein–protein, DNA–DNA/RNA–RNA, and DNA–protein/RNA–protein interactions have been used to co-localize metabolic enzymes and improve pathway flux. Dueber et al. (2009) designed synthetic protein scaffolds from known, well-characterized, and widespread protein–protein interaction domains of metazoan signaling proteins (SH3, PDZ, and GBD domains). By fusing enzymes of the mevalonate biosynthetic pathway (AtoB, HMGS, and HMGR) to the respective peptide ligands (SH3, PDZ, and GBD ligands), the authors generated a modular, genetically encoded scaffold system on which the enzymes can be co-localized in a programmable and defined manner (**Figure 4**). This scaffold approach is based on simple, specific, high-affinity interactions between protein binding domains and their cognate peptide ligands. In addition to the close proximity of the enzymes co-immobilized on the protein scaffold and the resulting substrate channeling, the reaction efficiency and the overall production rate or product yield can be further improved by varying the number of binding domain repeats in the scaffold according to the kinetic properties of the individual enzymes. Potential enzymatic bottlenecks, e.g., those caused by low kcat values or weak substrate binding (high KM values), can thus be compensated simply by increasing the amount of "weak" enzymes with slow turnover rates. With the optimal scaffold stoichiometry (optimal enzyme ratio), the conversion of acetyl-CoA to mevalonate by the three sequential enzymes AtoB, HMGS, and HMGR yielded a 77-fold higher level of mevalonate than the un-scaffolded pathway (Dueber et al., 2009; Lee et al., 2012; Chen et al., 2014). This demonstrates once again that the spatial organization of enzymes into functional complexes that allow effective substrate channeling increases overall metabolic efficiency, because the local concentrations of metabolic intermediates are increased while their accumulation to toxic levels is prevented (Dueber et al., 2009).

The same modular protein-based scaffold strategy was applied to the three-enzyme glucaric acid pathway, in which glucaric acid is produced from glucose. Compared to the free enzymes, scaffolding improved glucaric acid levels fivefold in *E. coli* cells (Moon et al., 2010; Lee et al., 2012). Another example of the use of synthetic protein scaffolds to increase pathway flux is resveratrol biosynthesis in yeast cells (Wang and Yu, 2012). Here, the authors scaffolded two enzymes, 4-coumarate:CoA ligase and stilbene synthase, and achieved a fivefold increase in resveratrol synthesis compared to the un-scaffolded control.

While the examples given above demonstrate the power of protein scaffolding with protein–protein interaction domains from metazoan signal transduction pathways, these domains may not be applicable in all systems. They may misfold or aggregate, and cross-talk between engineered scaffolds and native signaling molecules may occur. The risk of cross-talk depends on the organism and may be negligible for *E. coli* cells, in which the described domains are not present (Dueber et al., 2009). To avoid unintended perturbation of the signaling pathways of the expression host, minimized synthetic domains or alternatives are required. One alternative is the use of interaction domains derived from bacterial cellulosomes. Based on the modular architecture of bacterial cellulosomes (discussed above), complementary Cohesin and Dockerin protein modules can be used to generate artificial multienzyme complexes. Any enzyme of interest can be genetically fused to a Dockerin domain. A scaffold consisting of various Cohesin domains then targets the enzymes in a defined spatial orientation through the specific protein–protein interaction between Cohesin domains and their cognate Dockerin domains. In this way, so-called "Designer-Cellulosomes" can be generated.
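
The stoichiometry argument made earlier in this section (compensating slow enzymes with additional binding domain repeats) can be sketched as a small calculation. This is a simplified illustration, not the procedure used in the cited studies: it assumes that every scaffold site is occupied, that each enzyme works near saturation so that its capacity scales with copy number times kcat, and that the number of repeats per domain is capped. The enzyme labels `E1` to `E3` and all kcat values are hypothetical placeholders.

```python
def domain_repeats(kcat_per_s, max_repeats=4):
    """Suggest binding-domain repeats per enzyme so that
    repeats * kcat is roughly equal along the pathway.

    kcat_per_s  : dict mapping enzyme label -> turnover number (1/s)
    max_repeats : assumed practical upper limit on repeats per domain
    """
    fastest = max(kcat_per_s.values())
    repeats = {}
    for enzyme, kcat in kcat_per_s.items():
        ideal = fastest / kcat  # copies needed to match the fastest enzyme
        repeats[enzyme] = min(max_repeats, max(1, round(ideal)))
    return repeats


def pathway_capacity(kcat_per_s, repeats):
    """Capacity of the slowest step (copies * kcat) per scaffold."""
    return min(repeats[e] * kcat_per_s[e] for e in kcat_per_s)


if __name__ == "__main__":
    kcat = {"E1": 10.0, "E2": 5.0, "E3": 2.5}  # hypothetical values
    one_each = {enzyme: 1 for enzyme in kcat}
    balanced = domain_repeats(kcat)
    print(balanced)
    print(pathway_capacity(kcat, one_each), pathway_capacity(kcat, balanced))
```

Under these assumptions, the slowest step limits the capacity of a 1:1:1 scaffold, whereas giving the slow enzymes more sites raises the capacity of the bottleneck step. Real scaffolds also depend on binding affinities and expression levels, which this sketch ignores.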

An alternative to protein scaffolds is the use of nucleic acid-based (DNA or RNA) scaffolds. DNA and RNA molecules are suitable modular tools for the specific, programmable spatial organization of pathway enzymes because they provide specific interactions either by hybridization (base-pair complementarity, DNA–DNA/RNA–RNA binding) or through protein binding sequences specific for engineered zinc-finger or TALE proteins (DNA/RNA–protein binding). Conrado et al. (2012) used a configurable DNA-based scaffold to spatially arrange multiple pathway enzymes in a distinct order in the cytoplasm of *E. coli*. To target pathway enzymes specifically to the plasmid DNA-based scaffold, which consisted of multiple unique zinc-finger binding sites, the enzymes of interest were genetically fused to the corresponding zinc-finger domains, which bind the DNA sequences present in the engineered scaffold with high affinity. Increased overall production rates due to DNA scaffold-mediated enzyme co-assembly were observed for enzymes of both resveratrol biosynthesis and mevalonate synthesis (Conrado et al., 2012).
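
Because binding sites on a DNA scaffold are separated by a defined number of base pairs, the spacing translates directly into an approximate distance and rotational offset between the bound enzymes. The following sketch estimates both from the textbook geometry of ideal B-form DNA (about 0.34 nm rise per base pair and about 10.5 base pairs per helical turn). The function name and the example offsets are illustrative assumptions, and real plasmid DNA in a cell will deviate from this idealized geometry.

```python
RISE_PER_BP_NM = 0.34  # axial rise per base pair, ideal B-DNA
BP_PER_TURN = 10.5     # base pairs per helical turn, ideal B-DNA


def site_geometry(offset_bp):
    """Approximate geometry between two binding sites on a DNA scaffold.

    offset_bp : centre-to-centre offset between the two sites (base pairs)

    Returns (distance along the helix axis in nm,
             rotation around the helix axis in degrees, 0 <= angle < 360).
    """
    distance_nm = offset_bp * RISE_PER_BP_NM
    angle_deg = (offset_bp * 360.0 / BP_PER_TURN) % 360.0
    return distance_nm, angle_deg


if __name__ == "__main__":
    # Hypothetical offsets: 21 bp is two full turns (same face of the helix),
    # whereas 16 bp places the second site roughly on the opposite face.
    for offset in (16, 21):
        print(offset, site_geometry(offset))
```

Such a calculation only indicates how changing the spacer length could alter both the distance and the relative orientation of scaffolded enzymes; the optimal spacing for a given pathway still has to be determined experimentally.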

Another example of a DNA-assembled artificial multienzyme complex made use of luciferase and oxidoreductase, which catalyze the two consecutive reactions of flavin mononucleotide reduction and aldehyde oxidation (Niemeyer et al., 2002; Müller and Niemeyer, 2008). DNA hybridization of complementary ssDNA oligonucleotides, combined with the high-affinity biotin–streptavidin interaction, was used to build the bienzymatic complex. Biotinylated enzymes were linked to covalent ssDNA–streptavidin conjugates, yielding enzyme–DNA conjugates held together by the specific and very strong, but non-covalent, interaction between biotin and streptavidin. Single-stranded DNA carrier strands containing regions complementary to the DNA oligomers of the enzyme–DNA conjugates had previously been immobilized on a surface (microtiter plate) via the biotin–streptavidin interaction, so that co-immobilization of the enzymes, and therefore multienzyme complex formation, takes place by specific DNA–DNA hybridization (complementary base pairing) (Niemeyer et al., 2002). To explore the proximity effect caused by the spatial arrangement of the two enzymes, activity assays were performed. The results indicate that the enforced spatial proximity of luciferase and oxidoreductase increases the overall enzymatic activity of the bienzyme complex in a manner that depends on the scaffold architecture (Niemeyer et al., 2002). Almost the same DNA-scaffold approach was used by Müller and Niemeyer (2008), who reported the DNA-directed assembly of GOX and HRP. These enzymes were chosen because they form a convenient reporter system whose kinetic rates and output are easy to measure: when Amplex Red is provided, the two-step reaction they perform produces a highly fluorescent dye (Resorufin). Catalytic efficiency could therefore be measured by monitoring the fluorescence emission. 
In this study, the authors could show that the efficiency of the two DNA–enzyme conjugates was dependent on the position and steric parameters (Müller and Niemeyer, 2008).

RNA molecules also provide a modular tool for scaffold approaches in which multiple pathway enzymes are targeted and thereby spatially organized. Delebecque et al. (2011) used RNA aptamers (short single-stranded oligonucleotides) to create 1D and 2D scaffolds. These scaffolds were used for the spatial organization of two bacterial enzymes required for hydrogen production. Similar to protein and DNA scaffolds, RNA-based scaffolds increased the rate of substrate conversion and the product yield (Delebecque et al., 2011; Conrado et al., 2012). While proteins, and especially large protein fusions, often tend to misfold or aggregate, nucleic acids (DNA and RNA) have highly predictable local structures, and enzymes can be arranged in a given, programmable order by changing the distance between the protein binding sites (Conrado et al., 2012; Chen et al., 2014). Furthermore, DNA and RNA can easily be produced, fold into various structures (high-order assemblies), can have different lengths, and can consist of flexible numbers of repetitive scaffold units (Chen et al., 2014). For protein-based scaffolds, adding only short peptide ligands to each pathway enzyme is recommended to ensure correct folding and therefore function of the enzymes (Lee et al., 2012). The fusion of larger modules such as zinc-finger domains to enzymes is more critical, and an impact on enzyme stability and activity cannot be excluded. Consequently, all modifications made to the enzymes and to the modules (proteins, nucleic acids) used to design the synthetic scaffold have to be optimized.

Despite advantages such as modularity and specificity, scaffolds based on high-affinity protein–protein interaction domains have a major disadvantage: the interactions are reversible, to an extent that depends on the dissociation constants of the binding domain/peptide ligand pairs, and they do not withstand mechanical forces or boiling. Furthermore, only a few protein–protein or protein–peptide interactions are known that have low (nM) dissociation constants, the dissociation constant being a measure of affinity. In addition, each specific pair has its own dissociation constant, and these constants often differ considerably from one another. This affects how the ligand-fused enzymes are targeted to the scaffold built from the protein binding domains: the different affinities make it difficult to target individual enzymes to the scaffold precisely and in a defined ratio. Because the interaction is not covalent, there is also the risk of uncontrolled dissociation of the binding domain/peptide ligand pairs. Therefore, it would be desirable to create covalent linkages between proteins that are specific, form under a wide range of conditions, and require the fusion of small peptides only.
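
The effect of unequal dissociation constants on scaffold loading can be made concrete with the standard expression for simple 1:1 binding, in which the fraction of occupied binding domains is [L] / ([L] + Kd). The sketch below assumes independent sites and free ligand-fused enzyme in excess over scaffold; the function names, concentrations, and Kd values are hypothetical and do not refer to measured affinities of any particular domain/ligand pair.

```python
def fraction_bound(ligand_nM, kd_nM):
    """Fraction of binding domains occupied for simple 1:1 binding,
    assuming the free ligand-fused enzyme is in excess over scaffold."""
    return ligand_nM / (ligand_nM + kd_nM)


def ligand_for_occupancy(target_fraction, kd_nM):
    """Free ligand concentration (nM) needed to reach a target occupancy."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return kd_nM * target_fraction / (1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical domain/ligand pairs with different affinities.
    kd_by_pair = {"pair_A": 100.0, "pair_B": 1000.0, "pair_C": 10000.0}
    for pair, kd in kd_by_pair.items():
        # The same 1000 nM of each tagged enzyme gives unequal occupancy.
        print(pair, fraction_bound(1000.0, kd), ligand_for_occupancy(0.9, kd))
```

With equal enzyme concentrations, the weaker pairs are occupied less, so the enzyme ratio on the scaffold differs from the ratio encoded in the scaffold design. This is the targeting problem described above.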

## PROGRAMMABLE AND IRREVERSIBLE SCAFFOLDING OF ENZYMES

Inspiration for the design of intermolecular covalent linkages between proteins originates from the analysis of pili formation in Gram-positive bacteria. Pili of Spy are composed of three subunits, Spy0125, Spy0128, and Spy0130 and are required for adhesion to host cells. Besides the well-known disulfide bridges between two cysteine residues, structural analysis of these and other bacterial surface proteins uncovered several additional covalent intramolecular bonds, including isopeptide bonds between lysine and asparagine or lysine and aspartate (Kang et al., 2014), ester bonds between threonine and glutamine (Kwon et al., 2014) or thioesters between cysteine and glutamine (Pointon et al., 2010; Walden et al., 2015).

The covalent SpySystem, pioneered by the group of Howarth, is based on the CnaB2 domain (immunoglobulin-like collagen adhesion domain) of the extracellular surface protein FbaB (fibronectin-binding protein) from the Gram-positive bacterium Spy. Within this CnaB2 domain, an autocatalytic, spontaneous, intramolecular reaction takes place between a reactive lysine residue and a reactive aspartic acid residue. The formation of this covalent isopeptide bond between Lys31 and Asp117 is catalyzed by a third amino acid residue, Glu77. These three amino acids form a catalytic triad and are directly involved in isopeptide bond formation. Many Gram-positive bacteria have been found to form spontaneous, intramolecular isopeptide bonds within their surface proteins. Because the covalent bond links the protein to itself, it confers thermal, proteolytic, and pH stability on the protein. Bacteria thus use covalent intramolecular bond formation to stabilize extracellular proteins that are, for example, essential for the penetration and invasion of host cells. By splitting the CnaB2 domain, two protein partners, SpyTag and SpyCatcher, were generated (Zakeri et al., 2012). The SpyTag (13 amino acids) contains the reactive aspartic acid residue, whereas the SpyCatcher (138 amino acids) harbors the reactive lysine residue and the catalytic glutamic acid residue. *In vitro* and *in vivo* experiments have shown that the two protein partners find each other, reconstitute, and react covalently simply upon mixing. The isopeptide bond forms within minutes and is stable under a wide range of conditions (pH, temperature, buffer composition, reducing agents, and detergents) (Zakeri et al., 2012). This is an important point, because other covalent linkages such as disulfide bonds are readily cleaved under reducing conditions and are therefore reversible (Veggiani et al., 2014). 
Due to the robustness and covalent character of the SpySystem, and because all components are genetically encoded and can be expressed efficiently in *E. coli*, the SpySystem is ideal for irreversibly attaching proteins to each other. By fusing either SpyTag or SpyCatcher to the enzymes of interest, the catalysts can be covalently linked to carriers modified with the corresponding SpyCatcher or SpyTag partner (**Figure 5**). The SpySystem can also be used to create stable artificial multienzyme complexes. Co-localizing sequential pathway enzymes requires a synthetic scaffold composed of repeats of the SpyCatcher domain together with SpyTag–enzyme conjugates. The final ratio of enzymes bound to a SpyCatcher scaffold is determined by the ratio of the soluble SpyTag-modified enzymes mixed with the scaffold, and it can only be controlled through a defined input enzyme stoichiometry, which might be a disadvantage of the covalent SpySystem. The artificial multienzyme complex forms spontaneously upon mixing SpyTag- and SpyCatcher-modified enzymes and scaffolds: the SpyTag and SpyCatcher domains find each other, reconstitute, and form a covalent isopeptide bond. To provide enough flexibility for the SpyCatcher and SpyTag domains to fold properly and to undergo isopeptide bonding, specific linker sequences, such as glycine-serine linkers, are inserted between the enzymes and the SpyTag or SpyCatcher domains. As the SpyTag sequence is very small, its fusion to enzymes should not interfere with the correct folding, structure, and function of the enzyme. The covalent SpyCatcher–SpyTag system can also be used to covalently immobilize sequential pathway enzymes on carriers such as magnetic beads. Co-localizing pathway enzymes on the same magnetic bead should increase metabolic efficiency because of the enforced proximity of the enzymes and the favored enzyme-to-enzyme substrate channeling. 
In addition to the metabolic and kinetic benefits, co-immobilization can enhance the stability of the enzymes. It is also feasible to create multifunctional proteins by fusing SpyTag sequences to both the N- and the C-terminus of a selected enzyme. Mixing this double-tagged enzyme with two enzymes that each carry one SpyCatcher makes trimeric complexes possible (**Figure 6**).
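
The dependence of scaffold composition on input stoichiometry, noted above for the covalent SpySystem, can be illustrated with a small simulation. It is a deliberately simple sketch that assumes all SpyTag-modified enzymes react with SpyCatcher sites equally fast and irreversibly, so that each site captures a molecule drawn at random from the remaining pool. The function name, enzyme labels, and molecule counts are hypothetical.

```python
import random
from collections import Counter


def load_scaffold_sites(input_counts, n_sites, seed=0):
    """Irreversibly fill SpyCatcher sites from a pool of SpyTag-enzymes.

    input_counts : dict mapping enzyme label -> number of tagged molecules
    n_sites      : total number of SpyCatcher sites to be filled
    seed         : seed for the random number generator (reproducibility)

    Assumes equal reactivity of all tagged enzymes; once a molecule is
    bound it never dissociates, so sampling is without replacement.
    """
    pool = [name for name, count in input_counts.items() for _ in range(count)]
    if n_sites > len(pool):
        raise ValueError("not enough tagged enzymes to fill all sites")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, n_sites))


if __name__ == "__main__":
    # Hypothetical 2:1 input mixture of two SpyTag-modified enzymes.
    bound = load_scaffold_sites({"E1": 2000, "E2": 1000}, n_sites=600)
    total = sum(bound.values())
    for enzyme, count in sorted(bound.items()):
        print(enzyme, count, round(count / total, 3))
```

Under these assumptions, the bound ratio simply mirrors the input ratio on average, and individual scaffolds carry a random arrangement of enzymes. Positional control would require orthogonal covalent pairs, as discussed in the following section.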

… respectively, SpyTag-modified enzymes, artificial, covalent enzyme pipes can be generated. Linker sequences between the enzymes and the Spy domains should ensure correct protein folding and structure. Metabolic channeling between the individual, covalently linked enzymes (indicated by the arrows) should be facilitated.

## ARE THERE ADDITIONAL SYSTEMS ALLOWING CONTROLLED FORMATION OF INTERMOLECULAR LINKAGES IN PROTEINS?

To allow the design of precisely ordered enzyme arrays, alternative systems for the covalent tagging of proteins are required in addition to the SpyTag–SpyCatcher system. Intermolecular isopeptide bonds have long been known; they form, for instance, between SUMO or ubiquitin and lysine residues of target proteins. These modifications are linked to protein function, localization, and degradation. Intramolecular isopeptide bonds have only recently been discovered. They were first found in crystal structures of Spy0128 and have subsequently been detected in the *Staphylococcus aureus* adhesin Cna, the *Enterococcus faecalis* adhesin Ace, the *Streptococcus gordonii* antigen I/II adhesin SspB, and others (Kang and Baker, 2011). As intramolecular isopeptide bonds are frequently seen in extracellular proteins of Gram-positive bacteria, there is a good chance of finding alternative covalent interaction pairs analogous to SpyCatcher–SpyTag. If this search is successful, the targeting of Tag–enzyme conjugates to their cognate Catcher binding domains could become more specific and precisely controllable (**Figure 7**). As an alternative to isopeptide bonding, it may also be feasible to utilize the autocatalytic formation of Thr–Gln ester bonds, as reported for the *Clostridium perfringens* adhesion protein Cpe0147 (Kwon et al., 2014). In a recent study, Gao et al. (2015) combined the non-covalent, high-affinity protein–protein interaction between the PDZ domain and its peptide ligand with covalent disulfide bonding. To this end, the authors replaced one amino acid in the PDZ domain and one in the ligand with cysteine. By fusing these modified interaction domains to target proteins, they demonstrated that disulfide bridges form between the two domains depending on the redox potential of the buffer. This disulfide bonding allowed the formation of "disulfide-locked multienzyme supramolecular devices" (Gao et al., 2015).

FIGURE 8 | Scheme of the AM technologies inkjet printing and bioplotting, with light microscope images of typical generated dot and grid structures (scale bar in all images: 200 μm).

## 3-D PRINTING – A NEW TOOL IN DESIGNING ENZYME ARRAYS

Three-dimensional printing is revolutionizing industry and holds great promise in biotechnology and biomedicine. It is now possible to process a wide range of biological materials, from animal cells, plant cells, and bacteria to proteins and DNA sequences. Different strategies have been employed to transfer biological systems into micropatterns, lines, channels, or other 3-D structures. These include photolithography, dip-pen nanolithography, microcontact printing, replica molding, and additive manufacturing (AM), where appropriate coupled with self-assembly techniques (Herzer et al., 2010). For example, photolithographic patterning of proteins on surfaces has been used extensively in the past to analyze cell behavior with micrometer-scale resolution (Kane et al., 1999). However, it is difficult to deposit several different proteins on the same surface by photolithographic patterning. For printing microarrays with high-resolution structures down to approx. 100 nm, microcontact printing has been established as the method of choice: a 3-D shaped stamp transfers its surface layout onto a planar target structure (surface patterning). 3-D printing, as an AM technique, adds the desired material in its final shape without subtractive removal. Historically, AM started at an industrial level in the early 1990s as an alternative to traditional model-making techniques for rapid prototyping (sometimes referred to as 3-D printing). AM techniques are now defined by the American Society for Testing and Materials (ASTM) as processes of joining materials to make objects from 3-D digital data, usually layer upon layer, as opposed to subtractive manufacturing methods such as computer numerical control (CNC) milling (ASTM F2792-10 Standard Terminology for Additive Manufacturing Technologies, ASTM Intern., USA). 
Engineering of metabolic pathways is a highly complex and demanding task that involves large-scale, high-throughput biology and calls for manufacturing techniques with high resolution and enormous flexibility. Inkjet printing and bioplotting could be suitable AM techniques for the engineering of metabolic pathways.

The inkjet technique is a non-contact digital printing method that creates a functional pattern by delivering ink droplets to the substrate surface. One type is continuous inkjet printing, which ejects a permanent ink stream. The second is the drop-on-demand (DOD) technique, which expels a single drop only in response to a signal. The two main DOD techniques are piezoelectric and electro-thermal printing, which are fast, cheap, digitally controlled, and simple methods for the structured patterning of biomaterials, cells, and protein molecules (Calvert, 2001). Electro-thermal printing is the simplest and most widely used DOD technique, while piezoelectric printing is milder. Inkjet printing has been applied as an AM method to create microenvironments for cells, proteins, and oligonucleotides by printing water-based matrices in spatially defined patterns (Barbulovic-Nad et al., 2006). Generally, inkjet systems enable the fast printing of small droplets or patterns with volumes in the range of 50–250 pl at high resolution (50–100 μm), as illustrated in **Figure 8**. The bioplotting process is significantly slower and has a lower resolution than inkjet printing (also **Figure 8**). However, larger volumes of up to 10 nl can be processed, so that the process is especially well suited for handling immobilized cells (Zehnder et al., 2015). In contrast to the inkjet printer, the 3-D bioplotter can produce larger and more complex scaffold geometries without the need for support structures. The advantage of this method is that cells, growth factors, and enzymes can be printed at the same time into a precisely defined 3-D structure. With a multi-nozzle bioplotter, it is possible to create scaffold structures consisting of several different materials and metabolic enzymes (Detsch et al., 2014).
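
The droplet volumes and resolutions quoted above can be related by a simple geometric estimate. The sketch below converts a droplet volume into the diameter of an equivalent sphere, assuming a free spherical droplet; the function name is illustrative, and the size of the spot actually deposited will also depend on spreading and drying on the substrate.

```python
from math import pi


def droplet_diameter_um(volume_pl):
    """Diameter (micrometres) of a sphere with the given volume (picolitres).

    Uses V = (pi / 6) * d**3 and the conversion 1 pl = 1e-15 m^3.
    """
    volume_m3 = volume_pl * 1e-15
    diameter_m = (6.0 * volume_m3 / pi) ** (1.0 / 3.0)
    return diameter_m * 1e6


if __name__ == "__main__":
    # Inkjet range quoted in the text (50-250 pl) and the bioplotter
    # upper volume (10 nl = 10,000 pl).
    for volume_pl in (50, 250, 10_000):
        print(volume_pl, round(droplet_diameter_um(volume_pl), 1))
```

For 50 and 250 pl this gives diameters of roughly 46 and 78 μm, of the same order as the 50–100 μm resolution quoted for inkjet printing, while a 10 nl volume corresponds to a sphere of roughly 270 μm, consistent with the coarser resolution of bioplotting.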

## CONCLUSION

Engineering of metabolic pathways holds great promise to pave the way for green and sustainable chemistry. Recent successes in systems and synthetic biology have opened up a large number of possibilities, unforeseen until recently, to design increasingly complex metabolic pathways *in vitro* and *in vivo*. Thanks to structural analysis and intensive research on protein–protein interactions, it is becoming feasible to organize artificial pathways in ordered and regulated arrays. Synthetic scaffolds offer a modular and highly flexible tool for rationally and post-translationally organizing and co-localizing multiple functionally related enzymes in a defined and controllable manner. This is because the interactions involved (protein binding domain/peptide ligand, DNA–DNA/RNA–RNA, and DNA–protein/RNA–protein) are specific, adaptable, and engineerable, and can in theory be applied to any metabolic pathway, depending only on the ligand–enzyme or DNA/RNA–enzyme conjugates developed. The main advantage of the scaffold-based strategy for multienzyme complex formation is therefore its modularity. The scaffold architecture, and with it the enzyme stoichiometry and enzyme ratio in the synthetic complex, can be easily controlled just by varying single, modular biological parts. Adding, removing, or replacing distinct domains changes the scaffold structure, which can lead to completely new behavior of the immobilized components and thereby of the whole pathway. Strategies for reversible interactions between proteins are complemented by covalent bonding, which enables the stable construction of large multienzyme complexes. In addition to advances in the field of protein chemistry, technical developments such as 3-D printing will enable the design of industrial-scale metabolic channels in the near future.

#### AUTHOR CONTRIBUTIONS

All authors contributed to writing and providing the literature, which has been cited in this review article.

#### ACKNOWLEDGMENTS

We are grateful to the reviewers for their comments, which improved the quality of our manuscript.

#### FUNDING

We acknowledge support by the emerging field initiative "Synthetic Biology" of the Friedrich-Alexander University of Erlangen-Nuremberg.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Pröschel, Detsch, Boccaccini and Sonnewald. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*