# BIOMASS MODIFICATION, CHARACTERIZATION AND PROCESS MONITORING ANALYTICS TO SUPPORT BIOFUEL AND BIOMATERIAL PRODUCTION

EDITED BY: Robert Henry, Blake Simmons and Jason Lupoi PUBLISHED IN: Frontiers in Bioengineering and Biotechnology and Frontiers in Energy Research

### *Frontiers Copyright Statement*

*© Copyright 2007-2016 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws. The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-867-2 DOI 10.3389/978-2-88919-867-2

### About Frontiers

Frontiers is more than just an open-access publisher of scholarly articles: it is a pioneering approach to the world of academia, radically improving the way scholarly research is managed. The grand vision of Frontiers is a world where all people have an equal opportunity to seek, share and generate knowledge. Frontiers provides immediate and permanent online open access to all its publications, but this alone is not enough to realize our grand goals.

### Frontiers Journal Series

The Frontiers Journal Series is a multi-tier and interdisciplinary set of open-access, online journals, promising a paradigm shift from the current review, selection and dissemination processes in academic publishing. All Frontiers journals are driven by researchers for researchers; therefore, they constitute a service to the scholarly community. At the same time, the Frontiers Journal Series operates on a revolutionary invention, the tiered publishing system, initially addressing specific communities of scholars, and gradually climbing up to broader public understanding, thus serving the interests of the lay society, too.

### Dedication to quality

Each Frontiers article is a landmark of the highest quality, thanks to genuinely collaborative interactions between authors and review editors, who include some of the world's best academicians. Research must be certified by peers before entering a stream of knowledge that may eventually reach the public - and shape society; therefore, Frontiers only applies the most rigorous and unbiased reviews.

Frontiers revolutionizes research publishing by freely delivering the most outstanding research, evaluated with no bias from both the academic and social point of view. By applying the most advanced information technologies, Frontiers is catapulting scholarly publishing into a new generation.

### What are Frontiers Research Topics?

Frontiers Research Topics are very popular trademarks of the Frontiers Journals Series: they are collections of at least ten articles, all centered on a particular subject. With their unique mix of varied contributions from Original Research to Review Articles, Frontiers Research Topics unify the most influential researchers, the latest key findings and historical advances in a hot research area! Find out more on how to host your own Frontiers Research Topic or contribute to one as an author by contacting the Frontiers Editorial Office: researchtopics@frontiersin.org

# **BIOMASS MODIFICATION, CHARACTERIZATION AND PROCESS MONITORING ANALYTICS TO SUPPORT BIOFUEL AND BIOMATERIAL PRODUCTION**

Topic Editors:

**Robert Henry,** The University of Queensland, Australia; **Blake Simmons,** Joint BioEnergy Institute, USA; **Jason Lupoi,** Joint BioEnergy Institute, USA & University of Queensland, Australia

Cover image: Sugarcane, a lignocellulosic biomass feedstock. Image by Robert Henry.

The conversion of lignocellulosic biomass into renewable fuels and other commodities has provided an appealing alternative towards supplanting global dependence on fossil fuels. The suitability of multitudes of plants for deconstruction to useful precursor molecules and products is currently being evaluated. These studies have probed a variety of phenotypic traits, including cellulose, non-cellulosic polysaccharide, lignin, and lignin monomer composition, glucose and xylose production following enzymatic hydrolysis, and an assessment of lignin-carbohydrate and lignin-lignin linkages, to name a few. These quintessential traits can provide an assessment of biomass recalcitrance, enabling researchers to devise appropriate deconstruction strategies. Plants with high polysaccharide and lower lignin contents have been shown to break down to monomeric sugars more readily. Not all plants contain ideal proportions of the various cell wall constituents, however. The capabilities of biotechnology can alleviate this conundrum by tailoring the chemical composition of plants to be more favorable for conversion to sugars, fuels, etc. Increases in the total biomass yield, cellulose content, or conversion efficiency through, for example, a reduction in lignin content, are pathways being evaluated to genetically improve plants for use in manufacturing biofuels and bio-based chemicals. Although plants have been previously domesticated for food and fiber production, the collection of phenotypic traits prerequisite for biofuel production may necessitate new genetic breeding schemes. Given the plethora of potential plants available for exploration, rapid analytical methods are needed to more efficiently screen through the bulk of samples to home in on which feedstocks contain the desired chemistry for subsequent conversion to valuable, renewable commodities.

The standard methods for analyzing biomass and related intermediates and finished products are laborious, potentially toxic, and/or destructive. They may also necessitate complex data analysis, significantly increasing the experimental time and adding unwanted delays in process monitoring, where delays can incur significant costs. Advances in thermochemical and spectroscopic techniques have enabled the screening of thousands of plants for different phenotypes, such as cell-wall cellulose, non-cellulosic polysaccharide, and lignin composition, lignin monomer composition, or monomeric sugar release. Some instrumental methods have been coupled with multivariate analysis, providing elegant chemometric predictive models enabling the accelerated identification of potential feedstocks. In addition to the use of high-throughput analytical methods for the characterization of feedstocks based on phenotypic metrics, rapid instrumental techniques have been developed for the real-time monitoring of diverse processes, such as the efficacy of a specific pretreatment strategy, or the formation of end products, such as biofuels and biomaterials. Real-time process monitoring techniques are needed for all stages of the feedstocks-to-biofuels conversion process in order to maximize efficiency and lower costs by monitoring and optimizing performance. These approaches allow researchers to adjust experimental conditions during, rather than at the conclusion of, a process, thereby decreasing overhead expenses.

This Frontiers Research Topic explores options for the modification of biomass composition and the conversion of these feedstocks into biofuels or biomaterials, the related innovations in methods for the analysis of the composition of plant biomass, and advances in assessing up- and downstream processes in real time. Finally, a review of the computational models available for techno-economic modeling and lifecycle analysis will be presented.

**Citation:** Henry, R., Simmons, B., Lupoi, J., eds. (2016). Biomass Modification, Characterization and Process Monitoring Analytics to Support Biofuel and Biomaterial Production. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-867-2

# Table of Contents

*Editorial: Biomass Modification, Characterization, and Process Monitoring Analytics to Support Biofuel and Biomaterial Production*

Jason Lupoi, Blake Simmons and Robert Henry

*Efficient Eucalypt Cell Wall Deconstruction and Conversion for Sustainable Lignocellulosic Biofuels*

Adam L. Healey, David J. Lee, Agnelo Furtado, Blake A. Simmons and Robert J. Henry

*Optimization of alkaline and dilute acid pretreatment of agave bagasse by response surface methodology*

Abimael I. Ávila-Lara, Jesus N. Camberos-Flores, Jorge A. Mendoza-Pérez, Sarah R. Messina-Fernández, Claudia E. Saldaña-Duran, Edgar I. Jimenez-Ruiz, Leticia M. Sánchez-Herrera and Jose A. Pérez-Pimienta

*Evaluating lignocellulosic biomass, its derivatives, and downstream products with Raman spectroscopy*

Jason S. Lupoi, Erica Gjersing and Mark F. Davis

*Standard flow liquid chromatography for shotgun proteomics in bioenergy research*

Susana M. González Fernández-Niño, A. Michelle Smith-Moritz, Leanne Jade G. Chan, Paul D. Adams, Joshua L. Heazlewood and Christopher J. Petzold

*Development of a high throughput platform for screening glycoside hydrolases based on Oxime-NIMS*

Kai Deng, Joel M. Guenther, Jian Gao, Benjamin P. Bowen, Huu Tran, Vimalier Reyes-Ortiz, Xiaoliang Cheng, Noppadon Sathitsuksanoh, Richard Heins, Taichi E. Takasuka, Lai F. Bergeman, Henrik Geertz-Hansen, Samuel Deutsch, Dominique Loqué, Kenneth L. Sale, Blake A. Simmons, Paul D. Adams, Anup K. Singh, Brian G. Fox and Trent R. Northen

*Use of nanostructure-initiator mass spectrometry to deduce selectivity of reaction in glycoside hydrolases*

Kai Deng, Taichi E. Takasuka, Christopher M. Bianchetti, Lai F. Bergeman, Paul D. Adams, Trent R. Northen and Brian G. Fox

Fan Lin, Christopher L. Waters, Richard G. Mallinson, Lance L. Lobban and Laura E. Bartley

*Potential for Genetic Improvement of Sugarcane as a Source of Biomass for Biofuels*

Nam V. Hoang, Agnelo Furtado, Frederik C. Botha, Blake A. Simmons and Robert J. Henry

*Identification and molecular characterization of the switchgrass AP2/ERF transcription factor superfamily, and overexpression of PvERF001 for improvement of biomass characteristics for biofuel*

Wegi A. Wuddineh, Mitra Mazarei, Geoffrey B. Turner, Robert W. Sykes, Stephen R. Decker, Mark F. Davis and C. Neal Stewart Jr.

*Phenotypic Changes in Transgenic Tobacco Plants Overexpressing Vacuole-Targeted Thermotoga maritima BglB Related to Elevated Levels of Liberated Hormones*

Quynh Anh Nguyen, Dae-Seok Lee, Jakyun Jung and Hyeun-Jong Bae

*Book review: Socio-economic impacts of bioenergy production*

Sheikh Adil Edrisi and P. C. Abhilash

# Editorial: Biomass Modification, Characterization, and Process Monitoring Analytics to Support Biofuel and Biomaterial Production

*Jason Lupoi<sup>1,2,3</sup>, Blake Simmons<sup>1,2,4</sup> and Robert Henry<sup>2,3</sup>\**

*<sup>1</sup>Joint BioEnergy Institute, Emeryville, CA, USA, <sup>2</sup>University of Queensland, Brisbane, QLD, Australia, <sup>3</sup>Queensland Alliance for Agriculture and Food Innovation, Brisbane, QLD, Australia, <sup>4</sup>Sandia National Laboratories, Livermore, CA, USA*

Keywords: high-throughput screening assays, genetic manipulation, glycoside hydrolases, Raman spectroscopy, NIMS, cell-wall degrading enzymes, biomass pretreatment and fractionation

### **The Editorial on the Research Topic**

### **Biomass Modification, Characterization, and Process Monitoring Analytics to Support Biofuel and Biomaterial Production**

### *Edited by:*

*Uwe Schröder, Technische Universität Braunschweig, Germany*

### *Reviewed by:*

*Falk Harnisch, UFZ – Helmholtz-Centre for Environmental Research, Germany*

> *\*Correspondence: Robert Henry robert.henry@uq.edu.au*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 08 January 2016 Accepted: 22 February 2016 Published: 10 March 2016*

### *Citation:*

*Lupoi J, Simmons B and Henry R (2016) Editorial: Biomass Modification, Characterization, and Process Monitoring Analytics to Support Biofuel and Biomaterial Production. Front. Bioeng. Biotechnol. 4:25. doi: 10.3389/fbioe.2016.00025*

This Frontiers Research Topic journeys through various challenges facing researchers seeking to develop fuels and products derived from lignocellulosic biomass. These challenges include the rapid quantification of plant cell wall chemistry, which enables yields of potential monomeric sugars to be assessed, and the identification of plants possessing ideal traits that can be brought to the forefront of research efforts. Once the native plant chemistry is known, how can yields be improved by chemically or genetically altering plant cell walls to reduce recalcitrance? Does genetic modification of plants to increase accessibility to saccharification enzymes hinder the plant's growth and/or function? Are the innovative methods identified by researchers cost-effective and scalable to a commercial level? These topics are a sampling of the obstacles researchers combat when nominating a specific plant for downstream applications or implementing new deconstruction, genetic, or measurement strategies.

How well can different plants be broken down into useful, downstream precursor molecules? Efforts to develop a lucid picture of the chemical composition of abundant, diverse plants being explored as potential starting feedstocks have resulted in the evolution of high-throughput techniques that permit many more samples to be screened in much shorter periods of time. Advances in thermochemical and spectroscopic techniques have enabled the screening of thousands of plants for different phenotypes, such as cell-wall composition and monomeric sugar release. Some instrumental methods have been coupled with multivariate analysis, providing elegant chemometric predictive models enabling the accelerated identification of potential feedstocks. Rapid instrumental techniques have been developed for real-time monitoring of diverse processes, such as the efficacy of specific pretreatment strategies, or the formation of downstream products, such as biofuels and biomaterials.
Real-time process monitoring techniques are needed for all stages of the feedstocks-to-biofuels conversion process to maximize efficiency and lower costs by monitoring and optimizing performance. These approaches allow researchers to adjust experimental conditions during, rather than at the conclusion of, processes, thereby decreasing overhead expenses.
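As a rough illustration of what coupling an instrumental method with multivariate analysis can look like, the sketch below fits a principal component regression (PCR) model that predicts a compositional trait from spectra. It is not taken from any article in this Research Topic: PCR is only one of several chemometric approaches (partial least squares is another common choice), the spectra are synthetic, and the band positions, noise level, and trait ranges are made-up assumptions chosen so the example runs on its own. NumPy is assumed to be available.

```python
import numpy as np


def fit_pcr(spectra, reference, n_components):
    """Principal component regression: PCA on mean-centered spectra,
    followed by least squares of the reference values on the PCA scores."""
    x_mean = spectra.mean(axis=0)
    y_mean = reference.mean()
    centered = spectra - x_mean
    # Rows of vt are the principal axes of the centered spectra.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[:n_components].T            # (n_channels, n_components)
    scores = centered @ loadings              # (n_samples, n_components)
    coef, *_ = np.linalg.lstsq(scores, reference - y_mean, rcond=None)
    return x_mean, y_mean, loadings, coef


def predict_pcr(model, spectra):
    """Project new spectra onto the stored loadings and apply the regression."""
    x_mean, y_mean, loadings, coef = model
    return (spectra - x_mean) @ loadings @ coef + y_mean


def synthetic_spectra(n_samples, rng):
    """Hypothetical data: two Gaussian bands scaled by made-up lignin and
    sugar contents (percent dry weight), plus random noise."""
    axis = np.linspace(0.0, 1.0, 200)
    lignin_band = np.exp(-((axis - 0.3) / 0.05) ** 2)
    sugar_band = np.exp(-((axis - 0.7) / 0.08) ** 2)
    lignin = rng.uniform(15.0, 35.0, n_samples)
    sugar = rng.uniform(40.0, 70.0, n_samples)
    noise = rng.normal(0.0, 0.5, (n_samples, axis.size))
    spectra = np.outer(lignin, lignin_band) + np.outer(sugar, sugar_band) + noise
    return spectra, lignin


def demo(seed=0):
    """Calibrate on 60 synthetic samples and return the prediction error
    (RMSE, in percent lignin) on 20 held-out samples."""
    rng = np.random.default_rng(seed)
    train_x, train_y = synthetic_spectra(60, rng)
    test_x, test_y = synthetic_spectra(20, rng)
    model = fit_pcr(train_x, train_y, n_components=2)
    predicted = predict_pcr(model, test_x)
    return float(np.sqrt(np.mean((predicted - test_y) ** 2)))
```

Calling `demo()` returns the root-mean-square error on the held-out synthetic samples. With real spectra, the calibration set would be paired with reference values from standard wet-chemical methods, and the number of components would normally be chosen by cross-validation rather than fixed in advance.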

The article in this book entitled "*Evaluating Lignocellulosic Biomass, Its Derivatives, and Downstream Products with Raman Spectroscopy*" illustrates how advances in Raman instrumentation and the use of multivariate analysis have aided researchers in developing efficient, rapid methods to evaluate diverse varieties of plants. Another hot research topic is how to make better, more efficient enzymes beyond what may be naturally available. Just as high-throughput and automated methods are desirable to partition a diverse portfolio of potential feedstocks, gauging enzymatic performance and efficiency presents similar challenges that can be alleviated by employing techniques such as those illustrated in the two articles "*Use of Nanostructure-Initiator Mass Spectrometry to Deduce Selectivity of Reaction in Glycoside Hydrolases*" and "*Development of a High Throughput Platform for Screening Glycoside Hydrolases Based on Oxime-NIMS*". Studies like these have probed a variety of phenotypic traits, including the assessment of dominant plant cell wall chemical constituents (glucan, xylan, lignin, etc.), and the quantification of products from enzymatically deconstructing the plants into simpler, molecular building blocks. "*Immunological Approaches to Biomass Characterization and Utilization*" reviews recent advances in the use of glycan-directed probes to evaluate plant cell walls, including glycan-directed monoclonal antibodies, in efforts to understand cell-wall composition and structure, and how manipulations to these affect the biosynthetic processes. Proteomics, the large-scale study of proteins and their functions, is typically throughput limited. "*Standard Flow Liquid Chromatography for Shotgun Proteomics in Bioenergy Research*" validates the standard flow technique, and shows how 800–1000 different proteins could be identified in complex samples.

Pyrolysis of plants to form bio-oil presents an interesting option for the synthesis of fossil fuel replacements. Before these oils can be utilized, they must be fully characterized. How will the intrinsic biomass chemistry affect the resultant bio-oil? The article entitled "*Relationships between Biomass Composition and Liquid Products formed via Pyrolysis*" explores this topic.

These important traits can aid in gauging biomass recalcitrance, enabling researchers to devise appropriate deconstruction strategies. The article "*Optimization of Alkaline and Dilute Acid Pretreatment of Agave Bagasse by Response Surface Methodology*" describes a technique for optimizing the deconstruction of agave cell walls to maximize the total reducing sugars produced during saccharification. Various techniques used to combat biomass recalcitrance in eucalypts have been outlined in the article "*Efficient Eucalypt Cell Wall Deconstruction and Conversion for Sustainable Lignocellulosic Biofuels*". Plants with high polysaccharide and lower lignin contents are known to break down to monomeric sugars more readily, but what happens if a plant does not naturally possess the ideal traits for biofuel production, like high cellulose and low lignin? Researchers can purposely tailor the plant's chemical composition to be more favorable for conversion to sugars, fuels, or other value-added products. In "*Potential for Genetic Improvement of Sugarcane as a Source of Biomass for Biofuels*", different strategies for improving the degradability of sugarcane are reviewed. Increases in the total biomass yield, cellulose content, or saccharification efficiency through, for example, reductions in lignin content, are pathways being evaluated to genetically improve plants as feedstocks for biofuels and biobased chemicals. The biological pathways with which the plant cell wall is fabricated can shed light on potential genetic manipulations that can favorably alter the cell wall chemistry. The articles "*Phenotypic Changes in Transgenic Tobacco Plants Overexpressing Vacuole-Targeted Thermotoga maritima BglB Related to Elevated Levels of Liberated Hormones*" and "*Identification and Molecular Characterization of the Switchgrass AP2/ERF Transcription Factor Superfamily, and Overexpression of PvERF001 for Improvement of Biomass Characteristics for Biofuel*" illustrate this technique. Although plants have been previously domesticated for food and fiber production, the collection of phenotypic traits prerequisite for biofuel production may necessitate new genetic breeding schemes.

Lastly, the economic and social ramifications of using lignocellulosic biomass, such as the food versus fuel conflict, have been well documented. Can marginal lands provide us with the land required to achieve our energy needs in a cost-effective way? Cost-effective enough that we can seriously consider supplanting some fossil fuel usage on a much larger scale? This Research Topic includes a book review "*Book Review: Socio-Economic Impacts of Bioenergy Production*" of the recent Springer publication *Socioeconomic impacts of bioenergy production*.

This collection of papers demonstrates how advances at multiple levels are likely to contribute to the successful industrial scale production of biofuels and biomaterials. Innovations in biomass composition have a major role to play in facilitating conversion, and appropriate analytical methodologies are fundamental to managing the industrial process and guiding research and development.

### AUTHOR CONTRIBUTIONS

JL wrote the editorial for this research topic. BS and RH critically read and edited the document.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Lupoi, Simmons and Henry. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Efficient Eucalypt Cell Wall Deconstruction and Conversion for Sustainable Lignocellulosic Biofuels

*Adam L. Healey<sup>1</sup>\*, David J. Lee<sup>2,3</sup>, Agnelo Furtado<sup>1</sup>, Blake A. Simmons<sup>1,4,5</sup> and Robert J. Henry<sup>1</sup>*

*<sup>1</sup>Queensland Alliance for Agriculture and Food Innovation, University of Queensland, St. Lucia, QLD, Australia, <sup>2</sup>Forest Industries Research Centre, University of the Sunshine Coast, Maroochydore, QLD, Australia, <sup>3</sup>Department of Agriculture and Fisheries, Forestry and Biosciences, Agri-Science Queensland, Gympie, QLD, Australia, <sup>4</sup>Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, Emeryville, CA, USA, <sup>5</sup>Biological and Engineering Sciences Center, Sandia National Laboratories, Livermore, CA, USA*

### *Edited by:*

*Subba Rao Chaganti, University of Windsor, Canada*

### *Reviewed by:*

*Chiranjeevi Thulluri, Jawaharlal Nehru Technological University Hyderabad, India; Maria Carolina Quecine, University of São Paulo, Brazil; Brahmaiah Pendyala, University of Toledo, USA*

> *\*Correspondence: Adam L. Healey a.healey1@uq.edu.au*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 15 August 2015 Accepted: 04 November 2015 Published: 20 November 2015*

### *Citation:*

*Healey AL, Lee DJ, Furtado A, Simmons BA and Henry RJ (2015) Efficient Eucalypt Cell Wall Deconstruction and Conversion for Sustainable Lignocellulosic Biofuels. Front. Bioeng. Biotechnol. 3:190. doi: 10.3389/fbioe.2015.00190*

In order to meet the world's growing energy demand and reduce the impact of greenhouse gas emissions resulting from fossil fuel combustion, renewable plant-based feedstocks for biofuel production must be considered. The first-generation biofuels, derived from starches of edible feedstocks, such as corn, create competition between food and fuel resources, both for the crop itself and the land on which it is grown. As such, biofuel synthesized from non-edible plant biomass (lignocellulose) generated on marginal agricultural land will help to alleviate this competition. Eucalypts, the broadly defined taxa encompassing over 900 species of *Eucalyptus*, *Corymbia*, and *Angophora*, are the most widely planted hardwood trees in the world, harvested mainly for timber, pulp and paper, and biomaterial products. More recently, due to their exceptional growth rate and amenability to grow under a wide range of environmental conditions, eucalypts have become a leading option for the development of sustainable lignocellulosic biofuels. However, efficient conversion of woody biomass into fermentable monomeric sugars is largely dependent on pretreatment of the cell wall, whose formation and complexity make it naturally recalcitrant to efficient deconstruction. A greater understanding of this complexity within the context of various pretreatments will allow the design of new and effective deconstruction processes for bioenergy production. In this review, we present the various pretreatment options for eucalypts, including research into understanding structure and formation of the eucalypt cell wall.

### Keywords: eucalypts, biotechnology, pretreatment, lignocellulosic biofuel, bioenergy

## INTRODUCTION

Currently, approximately 40% of the world's transportation fuels (fossil fuels) are derived from non-renewable sources, the combustion of which directly contributes to global climate change (Simmons et al., 2008; González-García et al., 2012). As such, renewable plant-based feedstocks for fuel synthesis, aptly referred to as "biofuels," are under consideration to alleviate these concerns. The first generation of feedstocks used for biofuel synthesis was mainly derived from sugarcane and corn, as their energy storage polysaccharides are readily available and easily hydrolyzed into monosaccharides for microbial fermentation. However, as these feedstocks are important links within the human food chain, generation of biofuel from these crops creates a direct competition for resources. Furthermore, in the USA alone, the maximum biofuel yield from the first-generation biofuel feedstocks is roughly 30% of the renewable fuel target (Perlack et al., 2005), creating a large gap that must be filled with alternatives. Plant cell wall structural polysaccharides, although more complex than starch molecules, represent the most abundant biopolymers in the world, containing large stores of carbon for conversion into liquid fuels, such as ethanol and butanol (Wyman, 1999). As structural polysaccharides represent the non-edible portions of plants, fuel synthesized from cellulose and hemicellulose can help alleviate the competition between energy and agriculture. Crops intended for this purpose are known as the second-generation biofuel feedstocks.

There are numerous advantages to using the second-generation feedstocks as a source of renewable energy. Combustion of fossil fuels adds carbon dioxide to the atmosphere, the main contributor to the greenhouse effect and subsequent climate change. Biofuel crops help to mitigate the effect of CO2 by sequestering more carbon within their biomass than is released during biofuel combustion, thus creating a net reduction in CO2 levels (Rubin, 2008; Shepherd et al., 2011; Soccol et al., 2011). High-production grassy species, such as those belonging to the *Miscanthus* and *Saccharum* genera, are high-value bioenergy crops due to their exceptional growth rate and desirable biomass composition that is relatively easy to deconstruct for polysaccharides using mild pretreatments (Rubin, 2008). However, high-production crops such as these require nutrient-rich soils, normally reserved for intensive agriculture. This indirect competition for land and soil between food and fuel crops can be avoided through the cultivation of feedstocks that grow well on marginal land, of which there are approximately 1.4 billion hectares available globally (Carroll and Somerville, 2009; Somerville et al., 2010). Fuel production from woody (or lignocellulosic) biomass also offers several advantages over grassy biomass. Growing trees for energy production allows biomass to be "stored on the stump" to be harvested when needed (Shepherd et al., 2011), a luxury not afforded by grasses, which must be harvested at particular times during the year and must be processed immediately before fungal degradation begins. Also, woody biomass can be transported to processing facilities more economically, as it is more energy dense than grassy biomass, which requires greater amounts of fuel to move the biomass than can be generated from its fibers (Kaylen et al., 2000; Somerville et al., 2010).

Eucalypts, a native Australian taxon that includes genera *Eucalyptus*, *Corymbia*, and *Angophora*, are an attractive prospective biofuel crop, being the most widely planted hardwood trees in the world (Myburg et al., 2007; Grattapaglia and Kirst, 2008). Having adapted to the terrestrial environment of Australia, eucalypts are well suited for plantations in a wide variety of climates, soil types, and rainfall conditions (Ladiges et al., 2003; Myburg et al., 2007; Grattapaglia and Kirst, 2008). They are grown commercially in over 100 countries with well-established silviculture practices already in place, such as clonal propagation, allowing plantations to achieve high rates of productivity, up to 25 dry tonnes/hectare/year (Stricker et al., 2000; Rockwood et al., 2008), more than double the required productivity rate estimated by the US Department of Energy for a long-term renewable energy crop (Hinchee et al., 2009). Furthermore, many eucalypt species also regenerate shoots after harvesting, which ensures ease of management by potentially eliminating the need for re-planting (Shepherd et al., 2011).

Eucalypts, due to differences in flowering times (protandry) and self-incompatibility, are predominantly outcrossing species, which maintains high levels of heterozygosity in their genomes and encourages genetic diversity and phenotypic variation (Horsley and Johnson, 2007; Grattapaglia and Kirst, 2008). This variation is exploited by breeders through selection and combination of desirable traits for industrial application, such as controlled-cross hybrids that combine the high cellulose and fiber content of *Eucalyptus globulus* with the growth rate and form of *E. grandis* (Poke et al., 2005; Grattapaglia, 2008). Phenotypic traits that are desirable for efficient biofuel production are closely aligned with those sought by the pulp and paper industry. High-quality wood pulp is primarily composed of cellulosic fibers, which upon enzymatic hydrolysis release monomeric glucose subunits that serve as the main substrate for microbial fermentation and conversion to liquid fuel (Hisano et al., 2009; Wegrzyn et al., 2010).

Despite the advantages of lignocellulosic biofuel crops, woody biomass conversion into a source of renewable energy is hindered by its natural complexity and recalcitrance to deconstruction (Ramos and Saddler, 1994; Blanch et al., 2011). Many of the options for deconstruction require harsh and expensive chemicals (such as acids and alkalis) or energy intensive methods, such as grinding and ball milling. An increased understanding of eucalypt biomass will allow the engineering of more cost-effective pretreatments that can increase fuel production efficiency, while lessening the formation and impact of inhibitory compounds produced during conversion. In this review, we present an overview of the major contributing components of eucalypt cell wall recalcitrance, and the current research surrounding eucalypt biomass pretreatment for fuel production.

## CHALLENGES TO LIGNOCELLULOSIC BIOFUEL CONVERSION

Efficient conversion of lignocellulosic biomass to biofuel requires pretreatment, saccharification, and fermentation, each presenting unique challenges (**Figure 1**). Pretreatment breaks down and separates each of the major components of biomass (cellulose, hemicellulose, and lignin), either through mechanical or through chemical means to reduce cellulose crystallinity, increase surface area, and remove lignin, the largest barrier to efficient enzymatic saccharification (Furtado et al., 2014). **Table 1** summarizes common lignocellulose pretreatments and highlights pros and cons of each process.
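To make the pretreatment, saccharification, and fermentation chain concrete, the following sketch estimates ethanol output from biomass composition using standard stoichiometry: hydrolysis adds one water molecule per sugar unit released from glucan or xylan, and fermentation converts glucose (C6H12O6 → 2 C2H5OH + 2 CO2) and xylose (3 C5H10O5 → 5 C2H5OH + 5 CO2) to ethanol. This is an illustrative calculation, not a method from this review. The function name, the two efficiency parameters, and the composition values in the usage note are assumptions introduced for the example.

```python
# Molar masses in g/mol.
GLUCOSE = 180.16
XYLOSE = 150.13
WATER = 18.02
ETHANOL = 46.07

# Hydrolysis adds one water per sugar unit released from the polymer.
GLUCAN_TO_GLUCOSE = GLUCOSE / (GLUCOSE - WATER)
XYLAN_TO_XYLOSE = XYLOSE / (XYLOSE - WATER)

# Stoichiometric mass of ethanol formed per mass of sugar fermented.
ETHANOL_PER_GLUCOSE = 2 * ETHANOL / GLUCOSE
ETHANOL_PER_XYLOSE = 5 * ETHANOL / (3 * XYLOSE)


def ethanol_yield_kg(dry_biomass_kg, glucan_fraction, xylan_fraction,
                     saccharification_efficiency=1.0,
                     fermentation_efficiency=1.0):
    """Estimate ethanol mass (kg) from dry biomass.

    Fractions are mass fractions of dry biomass. Efficiencies are values
    between 0 and 1; they are hypothetical lumped parameters standing in
    for pretreatment/enzyme performance and microbial conversion.
    With both efficiencies at 1.0 the result is the theoretical maximum.
    """
    checks = {
        "glucan_fraction": glucan_fraction,
        "xylan_fraction": xylan_fraction,
        "saccharification_efficiency": saccharification_efficiency,
        "fermentation_efficiency": fermentation_efficiency,
    }
    for name, value in checks.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    if glucan_fraction + xylan_fraction > 1.0:
        raise ValueError("glucan and xylan fractions cannot sum to more than 1")

    glucose_kg = (dry_biomass_kg * glucan_fraction
                  * GLUCAN_TO_GLUCOSE * saccharification_efficiency)
    xylose_kg = (dry_biomass_kg * xylan_fraction
                 * XYLAN_TO_XYLOSE * saccharification_efficiency)
    sugar_to_ethanol = (glucose_kg * ETHANOL_PER_GLUCOSE
                        + xylose_kg * ETHANOL_PER_XYLOSE)
    return sugar_to_ethanol * fermentation_efficiency
```

For example, `ethanol_yield_kg(1000, 0.45, 0.15, 0.8, 0.9)` estimates the ethanol obtained from one dry tonne of a hypothetical feedstock containing 45% glucan and 15% xylan, with 80% of the polysaccharide released as sugar and 90% of the stoichiometric fermentation yield achieved. Comparing that figure with the default call (both efficiencies at 1.0) shows how much of the theoretical yield is lost at each step.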

Despite the low cost of producing lignocellulosic biomass, the economic cost of producing biofuel remains high (Lange, 2007). Pretreatment, a required process for increasing saccharification, is costly, requiring large amounts of energy or expensive chemicals (e.g., sulfuric acid) to promote enzymatic access to polysaccharides. Fermentation also represents a significant cost to biofuel production as the production of enzymes is expensive, and the efficiency at which microorganisms can convert sugars into fuel is dependent on pretreatment (Hamelinck et al., 2005). Therefore, harsh pretreatments that are used for biomass deconstruction in other industrial processes (e.g., pulping) may not be appropriate for biofuel production. There are also significant operational costs associated with biofuel conversion, including capital costs, labor, and waste water processing. As such, the development of simple, cost-effective, and environmentally safe pretreatments is critical for large-scale sustainable production. Given that pretreatment and fermentation represent the highest costs of producing biofuel, feedstock selection is also critical for fuel production.

The simplest pretreatment option is grinding and milling of biomass to increase reactive surface area for hydrolysis. However, the energy required to generate small enough particles is often too high to be a cost-effective option (Zheng et al., 2009; Talebnia et al., 2010). A more common pretreatment is acid hydrolysis, where strong acids (e.g., H2SO4) solubilize the hemicellulose polysaccharide matrix, leaving behind cellulose and lignin (Galbe and Zacchi, 2007). Although effective, acid pretreatment generates compounds that inhibit downstream biomass conversion processes through reduction of microbial growth and enzymatic release (Jönsson et al., 2013). For instance, while the majority of lignin present in the cell wall is acid insoluble (Klason lignin), upon pretreatment, a small portion hydrolyzes, releasing phenolics, such as vanillin, trans-cinnamic acid, and 4-hydroxybenzoic acid (Palmqvist and Hahn-Hägerdal, 2000; Ximenes et al., 2010). Additionally, monomeric subunits of cellulose and hemicellulose degrade in low pH conditions, generating aldehydes [furfural and 5-hydroxymethyl-2-furaldehyde (5-HMF)] and organic acids. Formation of these degradation products inhibits fermentation by reducing available sugars and limiting microbial growth (Zheng et al., 2009; Soccol et al., 2011; Puri et al., 2012). Similarly, organosolv pretreatment combines an organic solvent (e.g., ethanol) with an inorganic acid catalyst (e.g., sulfuric acid) to destroy internal lignin and hemicellulose bonds, resulting in effective recovery of high-quality cellulose and lignin portions of biomass. Although an effective pretreatment for both hardwood and softwood biomass, downstream ethanol production still suffers from the formation of inhibitory products (Sun and Cheng, 2002; Zhu and Pan, 2010).

TABLE 1 | Summary and assessment of common pretreatment options for lignocellulose.

*Hendriks and Zeeman (2009), Alvira et al. (2010), and Blanch et al. (2011).*

Alkaline pretreatment, which employs chemicals such as sodium hydroxide, lime, and hydrazine to disrupt the linkage between hemicellulose and lignin, reduces the formation of inhibitory products but remains an expensive option whose efficacy depends on lignin content (Blanch et al., 2011). An alternative method, which seeks to work universally well regardless of biomass composition, is ionic liquid (IL) pretreatment. ILs are non-volatile, stable compounds that solubilize lignocellulosic biomass, allowing selective precipitation of components for easy recovery. Once dissolved, cellulose precipitates from solution upon addition of an antisolvent (e.g., water or ethanol) while lignin and other solutes remain intact (Zhu et al., 2006; Singh et al., 2009).

## CELLULOSE CRYSTALLINITY

Cellulose, the most abundant biopolymer on earth, is composed of thousands of glucose monomers linked together by β 1–4 glycosidic bonds. Its function within the cell wall is to provide strength and rigidity, while remaining flexible during cell expansion and growth (Mutwil et al., 2008; Mansfield, 2009). Sucrose, generated through photosynthesis, supplies the glucose molecule required for cellulose synthesis, which is phosphorylated by hexokinase and incorporated into growing cellulose microfibrils by cellulose synthase (CESA) enzymes (Somerville, 2006; Joshi and Mansfield, 2007; Mohnen et al., 2008). During synthesis, each cellulose microfibril associates with other glucan chains through extensive hydrogen bonding and van der Waals forces, creating a highly compact polysaccharide. Within the cell wall, cellulose exists in primarily two forms, a highly ordered crystalline structure that lacks surface area and a less ordered, amorphous type (Harris and DeBolt, 2010). The highly compact crystalline structure contributes to the natural recalcitrance of woody biomass to deconstruction, as it prevents cellulase enzymes from accessing microfibrils, thus inhibiting efficient saccharification (Mosier et al., 2005; Hall et al., 2010).

Crystalline cellulose formation in eucalypts has traditionally been researched through the formation of tension wood. Tension wood, characterized by the formation of a gelatinous layer of crystalline cellulose (G-layer), serves to re-direct a growing stem upwards in response to gravitational stress (Jourez et al., 2001). As tension wood can be artificially induced, Paux et al. (2005) investigated tension wood formation in *E. globulus* by tying the growing stems of 2-year-old trees to the adjacent tree, bending their trunks to a 45° angle. By extracting RNA from the xylem of the bent trees on either side of the bend (tension wood and opposite wood) at various timepoints (0, 6, 24, and 168 h), the authors were able to identify differentially expressed genes during cellulose formation using a xylem complementary DNA (cDNA) array. In a much larger bent-stem experiment performed in *Eucalyptus nitens* with 4,900 xylem cDNAs, Qiu et al. (2008) found that tension wood, although lacking the characteristic "G-layer," contained high concentrations of cellulose and low amounts of Klason lignin. Additionally, X-ray diffraction of upper and lower bent stems revealed that the cellulose microfibril angle (MFA) on the upper branch was much less than that of the lower branch. MFA, the angle at which cellulose polymers are synthesized within the cell wall, affects their tendency to form hydrogen bonds. MFA, which also affects wood stiffness (Schimleck et al., 2001), is an indirect biofuel trait as cellulose content negatively correlates with MFA and lignin content (Plomion et al., 2001). Qiu et al. (2008) also found that in tension wood, the highest expression profiles belonged to β-tubulin genes and fasciclin-like arabinogalactan (FLA) proteins. β-Tubulin proteins are responsible for transporting cellulose synthesis machinery to the plasma membrane, which may in turn affect MFA. FLA genes, known to associate with pectic side-chains and other structural polysaccharides, also affect MFA, as demonstrated through transformation of *E. nitens* with FLA3, identified from the *E. grandis* genome (Macmillan et al., 2015).

To investigate the effect of tension and opposite wood on saccharification and fermentation, Muñoz et al. (2011) subjected *E. globulus* biomass to organosolv (ethanol/water) pretreatment, followed by simultaneous saccharification and fermentation (SSF) (discussed later). The authors found that tension wood (as compared to opposite wood) contained similar glucan content (46–47%), higher xylan amounts (16.0 and 12.0%, respectively), and lower lignin content (22.1 and 26.1%, respectively). Upon pretreatment, residual lignin was lower in tension wood, which required less time and cooking (as expressed by H factor, a single variable calculated from the combination of cooking temperature and time) for delignification. Pulps from tension and opposite wood were assayed for glucose conversion by enzymatic hydrolysis, which showed that, despite similar or higher lignin content, glucan to glucose conversion was more efficient in opposite wood. However, investigation into pulp viscosity showed that tension wood glucans were of higher molecular mass, which may have influenced their rate of conversion. Upon submission of pulps from tension and opposite wood for SSF, the authors found that harsh pretreatment conditions (H factor – 12,500) outperformed milder conditions (H factor – 3,900) to produce 35 and 30 g/L of ethanol, respectively. Considering that the maximum theoretical conversion of ethanol from glucose is 51%, these concentrations represent 95 and 85% conversion efficiency, which scales to a yield of 290 L of ethanol/tonne of biomass. Because the formation of tension wood is undesirable from a timber standpoint, and good management practices within plantations dictate that trees of low economic value are removed to increase the growth of high-value trees (McIntosh et al., 2012), ethanol production from eucalypt plantation thinnings is a potential option for bioenergy production, dependent on distance required for biomass transport, growth rate, and stocking rate.
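The H factor mentioned above folds cooking temperature and time into a single number. The following is a minimal sketch of how such a value can be computed, assuming the relative-rate expression commonly quoted for kraft pulping (Vroom's form, exp(43.2 − 16115/T) with T in kelvin, which is close to 1 at 100°C) and trapezoidal integration over a temperature profile. The constants, the function names, and the example cook are illustrative assumptions and are not taken from Muñoz et al. (2011).

```python
import math

def relative_rate(temp_c):
    """Relative delignification rate (close to 1.0 at 100 degC), assumed Vroom form."""
    temp_k = temp_c + 273.15
    return math.exp(43.2 - 16115.0 / temp_k)

def h_factor(times_h, temps_c):
    """Integrate the relative rate over time (hours) with the trapezoidal rule."""
    if len(times_h) != len(temps_c) or len(times_h) < 2:
        raise ValueError("need matching time and temperature lists with >= 2 points")
    total = 0.0
    for i in range(1, len(times_h)):
        dt = times_h[i] - times_h[i - 1]
        mean_rate = 0.5 * (relative_rate(temps_c[i]) + relative_rate(temps_c[i - 1]))
        total += mean_rate * dt
    return total

# Hypothetical cook: 1 h linear ramp from 100 to 170 degC, then a 2 h hold at 170 degC.
ramp_times = [i / 60.0 for i in range(61)]
ramp_temps = [100.0 + 70.0 * t for t in ramp_times]
times = ramp_times + [3.0]
temps = ramp_temps + [170.0]
print(round(h_factor(times, temps)))
```

Because the rate term is exponential in temperature, most of the H factor accumulates during the hold at peak temperature, which is why a modest increase in cooking temperature can shorten the required time considerably.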

## NON-CELLULOSIC POLYSACCHARIDES

Before the formation of the secondary cell wall, the plant primary cell wall is a thin yet flexible structure that resists gravity and internal pressure while allowing growth and expansion (Cosgrove, 2005). Cellulose, being the core of the internal structure, provides the scaffold that non-cellulosic polysaccharides, such as hemicellulose and pectin, surround within a polysaccharide matrix (Carpita and Gibeaut, 1993; Mellerowicz and Sundberg, 2008). Although hemicellulose and pectin are polysaccharides, and thus can hydrolyze into monomeric subunits, these monomers consist mainly of pentose sugars which are more difficult to ferment than glucose. As such, based on their difficulty to ferment and how they reduce access to cellulose, hemicellulose and pectin also contribute to biomass recalcitrance (Himmel et al., 2007; Sticklen, 2008).

Xyloglucan is the most abundant hemicellulose polysaccharide of woody dicot species, with a repeating structure of β 1–4 glucan residues with various side-chains, predominantly unbranched glycosyl residues or α 1–6 xylose. Other side-chain molecules include galactose, fucose, and arabinose (Harris and DeBolt, 2010; Scheller and Ulvskov, 2010). Xyloglucan interacts with cellulose by crosslinking with non-crystalline regions or through hydrogen bonding with the microfibrils themselves (Cosgrove, 2005). For further reinforcement and strength, woody plant cell walls synthesize a secondary cell wall of cellulose, hemicellulose, and lignin. However, unlike the primary cell wall with a repetitive hemicellulose structure, the secondary cell wall polysaccharide matrix is composed of highly variable xylan molecules. This varied structure is highly substituted, with the most common modification in woody dicots being glucuronosyl residues, which generate glucuronoxylan (Li et al., 2006; Scheller and Ulvskov, 2010). Given that *E. globulus* is a major source of fiber for the pulp and paper industry, the structure of its non-cellulosic polysaccharides has been extensively researched. Originally, eucalypts were believed to possess glucuronoxylan as found in woody dicot species, but investigations by Shatalov et al. (1999) and Evtuguin et al. (2003) found that *E. globulus* xylan structure was highly substituted by galactosyl and acetyl residues. These residues, although not targets for saccharification, can affect downstream conversion efficiency. Galactose is one of the most difficult sugars to ferment (Lee et al., 2011), while acetyl groups can contribute acetic acid under fermentation conditions, which inhibits ethanol production in *Pichia* (Ferrari et al., 1992) and *Saccharomyces* (Taherzadeh and Karimi, 2007).

Acid pretreatment, designed to hydrolyze the hemicellulose matrix surrounding cellulose, requires various acid concentrations, pretreatment times, and temperatures to be effective. To examine these parameters on various eucalypt species, McIntosh et al. (2012) conducted a 3³ factorial design (acid concentration, temperature, and pretreatment time) to understand sugar solubilization and degradation, enzymatic saccharification in response to pretreatment, and the fermentation of various hydrolyzates. Thinned trees of *Eucalyptus dunnii* and *Corymbia citriodora* subsp. *variegata* at ages 6 and 10 were tested within the factorial design, and their biomass was found to contain approximately 47–48% glucan, 16–17% xylan, 5% minor sugars, and 30% lignin. The authors found that under mild pretreatment conditions [expressed as a combined severity factor (CSF)], monomeric xylose was the first to solubilize. However, as pretreatment became more severe, recovered xylose yields decreased, likely lost to degradation. Glucose release correlated with CSF increase, with temperature being the main contributing factor, followed by acid concentration and reaction time. In the presence of crude *E. dunnii* hydrolyzate, *Saccharomyces cerevisiae* could be cultured for fermentation, although the time (30 h) required to convert 38 g of glucose into 18 g/L of ethanol (92% efficiency) was double that of starch-fed fermentations (Sánchez and Cardona, 2008). This study highlights the cost/benefit trade-off of biomass conversion, where more severe treatments result in greater glucose yields but generate more degradation products from matrix polysaccharides that inhibit fermentation. The authors also encountered significant differences in saccharification yield between biomass of different ages. After two pretreatment severity conditions (CSF 1.60 and 2.48), 6-year-old eucalypt biomass yielded greater amounts of glucose than its 10-year-old counterpart, despite similar chemical composition. These differences were attributed to changes in cellulose crystallinity, which may be species specific based on similar studies in *Populus* (DeMartini and Wyman, 2011).
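The 92% efficiency quoted above can be checked against glucose-to-ethanol stoichiometry. The sketch below assumes that the 38 g of glucose and the 18 g/L of ethanol both refer to one litre of hydrolyzate (the text mixes the two units) and uses standard molar masses; the function name is illustrative.

```python
GLUCOSE_MW = 180.156  # g/mol, C6H12O6
ETHANOL_MW = 46.069   # g/mol, C2H5OH

# C6H12O6 -> 2 C2H5OH + 2 CO2, so at most 2 mol ethanol per mol glucose.
THEORETICAL_YIELD = 2 * ETHANOL_MW / GLUCOSE_MW  # about 0.511 g ethanol per g glucose

def fermentation_efficiency(glucose_g_per_l, ethanol_g_per_l):
    """Ethanol obtained as a percentage of the stoichiometric maximum."""
    if glucose_g_per_l <= 0:
        raise ValueError("glucose concentration must be positive")
    return 100.0 * ethanol_g_per_l / (glucose_g_per_l * THEORETICAL_YIELD)

print(f"theoretical yield: {THEORETICAL_YIELD:.3f} g/g")
print(f"efficiency: {fermentation_efficiency(38.0, 18.0):.1f}%")
```

Under these assumptions the calculation gives a theoretical yield of about 0.511 g/g (the 51% ceiling cited earlier) and an efficiency between 92 and 93%, in line with the reported figure.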

Although xylose, the main monosaccharide present within hemicellulose, is more difficult to ferment by fungi due to an overproduction of nicotinamide adenine dinucleotide (NADH) under anaerobic conditions (Bruinenberg et al., 1983), hemicellulose exists as a matrix polysaccharide and is thus far less resistant to pretreatment than cellulose. To demonstrate the ease with which xylose, generated from residual *E. grandis* wood chips during pulp production, could be fermented into fuel, Silva et al. (2011) optimized ethanol production from hemicellulose hydrolyzate generated by mild acid pretreatment. Dilute sulfuric acid was mixed with the wood chips and then autoclaved (121°C, 45 min) to allow separation of the hemicellulose hydrolyzate portion from the solids (cellulose and lignin) portion. The hydrolyzate was then fermented to ethanol by a *Pichia stipitis* strain, known for its ability to ferment xylose, to achieve an ethanol concentration of 15.3 g/L (100 L/tonne of biomass). As a comparison, the solids portion, which was delignified using an alkaline NaOH pretreatment step (4%, w/v, 121°C, 20 min), was fermented by *S. cerevisiae* in an SSF process, yielding a final ethanol concentration of 28.7 g/L.

Although this study demonstrates eucalypt biomass conversion from debarked biomass, bark accounts for approximately 10–12% of tree biomass residue processed from a plantation (Perlack et al., 2005; Zhu and Pan, 2010), which contains considerable levels of glucose (40%) and xylose (10%) (Lima et al., 2013). Given that bark is often not considered or optimized during lignocellulose pretreatment, Lima et al. (2013) tested various options for bark deconstruction from commercial *E. grandis* (EG) and *E. grandis* × *urophylla* (EGU) trees. The authors tested both one- and two-step acid and alkaline combinations in order to maximize sugar recovery. A combination of acid (1%) and NaOH (4%) pretreatment resulted in a solids fraction containing high concentrations of glucose from EG and EGU (78 and 81% dry weight, respectively); however, only 54.2 and 66.6% of total glucose was actually recovered after treatment. Upon saccharification, 65.4 and 84.5% of glucose was released from the acid + alkaline-treated bark samples. Alternatively, a single NaOH (4%) pretreatment step, while retaining lesser amounts of glucose within the solids fraction (56 and 62%), resulted in higher total recovered glucose (63.4 and 73.1%) and more efficient enzymatic saccharification (78.5 and 98.6%).

Although alkaline pretreatments are widely used, particularly in the pulp and paper industry, the chemicals required are considered pollutants and require multiple purification steps for removal from hydrolyzate. More recently, ILs, organic salts that are liquid at room temperature and act as solvents to solubilize cellulose, hemicellulose, and lignin without degradation, have been used as an effective pretreatment (Zhu et al., 2006). ILs, although not yet developed for large-scale use, are prized for their stability, recyclability, and low volatility during biomass solubilization (Zhu et al., 2006; Shi et al., 2015). As they are an emerging pretreatment option, their exact interaction with biomass during solubilization is not well understood. To examine changes in cell wall structure and composition in woody biomass in response to IL pretreatment, Çetinkol et al. (2010) compared the cell wall of *E. globulus* before and after exposure to the IL 1-ethyl-3-methyl imidazolium acetate [C2mim][OAc]. Using a variety of imaging and spectroscopy techniques [2-dimensional nuclear magnetic resonance spectroscopy (2D-NMR), Fourier transform infrared spectroscopy, scanning electron microscopy, small angle neutron scattering, and X-ray diffraction], they found that IL pretreatment resulted in the deacetylation of xylan, acetylation of lignin, and the selective removal of G lignin monomers, thereby increasing the S/G ratio. Subsequent saccharification of the treated biomass showed a significant increase (5×) in glucose yield after 1 h of saccharification, which the authors attributed to a decrease in cellulose crystallinity. Xylose yield, which was undetectable after saccharification of untreated biomass, also increased after IL treatment.

Depending on their chemistry, ILs interact with biomass differently. Protic ILs (PILs) can be prepared via a one-step process with low-cost acids and bases and preferentially solubilize lignin, while aprotic ILs (AILs) require a multistep preparation and preferentially dissolve carbohydrate macromolecules (Greaves et al., 2006; Zhang et al., 2015). Zhang et al. (2015) developed a concerted IL pretreatment (CIL) for *Eucalyptus* bark, combining pyrrolidinium acetate ([Pyrr][AC]; PIL) with 1-butyl-3-methylimidazolium acetate ([BMIM][AC]; AIL). The CIL pretreatment ([Pyrr]/[BMIM]) resulted in 91% enzymatic hydrolysis of cellulose, compared with 5% for untreated bark, 67 and 50% for each IL pretreatment alone ([Pyrr] and [BMIM], respectively), and 77% for separate combinations of each ([Pyrr] and [BMIM]). The same trend (13, 48, 65, and 79%) was observed during enzymatic hemicellulose hydrolysis as well (untreated biomass, [Pyrr], [BMIM], [Pyrr] and [BMIM], and [Pyrr]/[BMIM]). Reduced lignin content correlated with cellulose conversion, which was further enhanced through the removal of hemicellulose. These strategies of converting underutilized (bark, thinned trees, and hemicellulose hydrolyzate) or undesirable (tension wood) lignocellulose will be key to the sustainable generation of biofuels through coupling bioenergy production with traditional industrial forestry practices (van Heiningen, 2006).

While acid pretreatment remains a common method of pretreatment due to its effectiveness, strong industrial acids are expensive to generate and difficult to recycle and neutralize (Menon et al., 2010). An alternative pretreatment method utilizes residues on the xylan backbone to disrupt the structure of lignocellulose. Hot water pretreatment, or autohydrolysis, is a cost-effective pretreatment option that mixes pressurized hot water with biomass in a reaction vessel, causing acetyl residues on the xylan backbone to generate *in situ* acetic acid. The internal generation of acetic acid reduces the pH of the biomass liquor and accelerates delignification and the solubilization of hemicellulose (Galbe and Zacchi, 2007). To demonstrate the effectiveness of liquid hot water pretreatment for eucalypt biomass, Yu et al. (2010) developed a two-step pretreatment assay (step 1: 180–200°C, 0–60 min and step 2: 180–240°C; 0, 20, 40, and 60 min) to maximize xylose recovery and minimize cellulose degradation. Their results demonstrated that during the first pretreatment step, degradation of xylose to furfural increases linearly with reaction severity, a trend which continues during the second pretreatment step, where furfural concentration increases between 180 and 200°C and then seemingly decreases through the formation of other aldehyde products. During the second pretreatment step, furfural and 5-HMF production increased steadily over time at constant temperature (200°C), demonstrating that extended pretreatments are detrimental for recovery of monomeric sugars. Temperature had the greatest effect on the formation of inhibitory products, with the authors finding that shorter reaction times and lower temperatures (180°C, 20 min; 200°C, 20 min) maximized sugar recovery (96.6%) and enzymatic digestion (81.5%).
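Reaction severity for hydrothermal pretreatments is often summarized with a single severity factor. The sketch below assumes the widely used form R0 = t · exp((T − 100)/14.75), with t in minutes and T in °C, and the combined severity factor (CSF) obtained by subtracting the liquor pH from log10(R0). The text does not state which expression Yu et al. (2010) or McIntosh et al. (2012) used, so the formula, the function names, and the example pH are assumptions for illustration.

```python
import math

def log_severity(temp_c, time_min, ref_temp_c=100.0, omega=14.75):
    """log10 of the severity factor R0 = t * exp((T - T_ref) / omega)."""
    if time_min <= 0:
        raise ValueError("time must be positive")
    return math.log10(time_min * math.exp((temp_c - ref_temp_c) / omega))

def combined_severity(temp_c, time_min, ph):
    """Combined severity factor: log10(R0) minus the pH of the liquor."""
    return log_severity(temp_c, time_min) - ph

# Conditions taken from the hot water pretreatment ranges quoted above.
for temp_c, time_min in [(180.0, 20.0), (200.0, 20.0), (200.0, 60.0)]:
    print(temp_c, time_min, round(log_severity(temp_c, time_min), 2))

# Hypothetical acid-catalyzed case: same time and temperature, liquor at pH 2.
print(round(combined_severity(180.0, 20.0, 2.0), 2))
```

In this form a 20°C rise at constant time raises log10(R0) by about 0.59, whereas tripling the time raises it by about 0.48, which is consistent with temperature being reported as the dominant factor.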

Although autohydrolysis pretreatment can effectively solubilize hemicellulose, cellulose will remain in its recalcitrant, crystalline form after pretreatment. To reduce cellulose crystallinity in conjunction with autohydrolysis pretreatment, Inoue et al. (2008) used ball milling to improve saccharification yield from *Eucalyptus* biomass. The authors demonstrated that milling alone for short periods of time (20 min) could dramatically reduce cellulose crystallinity from 59.7 to 7.6%, although only 44.2% of sugars were captured after saccharification. To achieve higher rates of enzymatic saccharification from ball-milled biomass (86.2%), restrictively long milling times were required (120 min). To combat this, the authors combined a hot water pretreatment (160°C, 30 min) and ball milling (20 min) step to yield approximately 70% of total sugars with a low enzyme loading [4 filter paper units (FPU)/g substrate]. By comparison, the same yields were achieved by hot water pretreatment (160°C, 30 min) or ball milling (40 min) separately, each requiring 10× enzyme loading (40 FPU/g). This study demonstrates how combining methods can effectively reduce the severity of the pretreatment required to deconstruct biomass, which will lessen the formation of inhibitory products and the costs associated with enzymatic saccharification.

Traditionally, lignocellulosic biofuel production required separate process vessels where polysaccharide hydrolysis was carried out independently from microbial fermentation. Separate hydrolysis and fermentation (SHF) required additional processing and distilling steps to remove contaminants that prevent biofuel production (Olofsson et al., 2008). To improve biomass conversion efficiency and reduce fuel production costs, SSF processes generate liquid fuel from sugars as they are hydrolyzed from a polysaccharide. The advantages of SSF over SHF include the use of a single reactor for production to reduce capital costs, lower accumulation of sugars, which bolsters saccharification rate and yield, and the presence of ethanol in the reaction vessel, which helps reduce microbial contamination (Krishna and Chowdary, 2000; Olofsson et al., 2008). To examine the efficiency of organosolv (in this case, ethanol and water) pretreatment with SSF processes, Yáñez-S et al. (2013) subjected organosolv-pretreated *E. globulus* biomass to an SSF process with various substrate loadings (10 and 15%, w/v), thermostable yeast concentrations (6 and 12 g/L), and enzyme loadings (expressed as cellulase FPU/β-glucosidase IU [10/20, 20/40, and 30/60]). The authors found that the highest ethanol concentration (42 g/L) was obtained from 15% (w/v) substrate loading and 20 FPU/40 IU enzyme loading, at either yeast concentration. Although higher substrate loading decreased the overall ethanol yield, ethanol concentration within the reaction vessel was increased. Furthermore, mass balance calculation from 15% substrate loading within SSF and SHF processes suggested that greater ethanol amounts could be achieved by SSF (164 and 107 L/tonne, respectively).

Increasing the solids loading during SSF is another strategy to further reduce operation costs associated with fuel production. By increasing the weight of solids to 15–20% of the SSF reaction, the energy required to heat and distil the reaction is dramatically reduced (Wang et al., 2011). Of course, this requires optimization of process parameters, such as liquid-to-solid ratio (LSR) and enzyme-to-substrate ratio (Romaní et al., 2011). Optimization of these parameters with *E. globulus* biomass, as well as autohydrolysis pretreatment severity, allowed Romaní et al. (2012) to reach an ethanol concentration of 67.4 g/L, representing 91% conversion of ethanol from cellulose, which scales to 291 L of ethanol per tonne of biomass.
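The link between solids loading and LSR can be made explicit with a short calculation. This sketch assumes that LSR is defined as mass of liquid per mass of dry solids and that solids loading is a mass fraction of the whole slurry; the cited studies may define these quantities differently (e.g., on a w/v basis), and the function names are illustrative.

```python
def solids_fraction(lsr):
    """Solids as a fraction of total slurry mass for a given liquid-to-solid ratio."""
    if lsr < 0:
        raise ValueError("LSR cannot be negative")
    return 1.0 / (1.0 + lsr)

def lsr_for_solids(fraction):
    """Liquid-to-solid ratio needed to reach a target solids mass fraction."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return (1.0 - fraction) / fraction

# Mass of liquid carried per unit mass of solids at several loadings.
for pct in (5, 10, 15, 20):
    print(pct, round(lsr_for_solids(pct / 100.0), 2))
```

On this basis, moving from 10 to 20% solids cuts the liquid carried per unit of solids from 9 to 4, so less than half as much liquid has to be heated and distilled for the same amount of biomass.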

Steam explosion (SE), another cost-effective pretreatment that is similar to autohydrolysis, solubilizes hemicellulose and disrupts the structure of biomass through the breakage of linkages caused by a sudden drop in pressure. SE pretreatment is often combined with alkaline or dilute acid catalysts to increase saccharification through either delignification or increased recovery of xylose (respectively). However, addition of catalysts increases biofuel production costs either through the cost of the chemical itself or through the additional washing and neutralization steps. Thus, optimization of SE pretreatment can provide an environmentally friendly process for biofuel production. Romaní et al. (2013) optimized the temperature (173–216°C) and pretreatment time ranges (6–34 min) with fixed enzyme loadings (15 FPU/10 IU) to improve ethanol production from *E. globulus* biomass. Using scanning electron microscopy (SEM) to visualize the biomass after explosion, the authors observed that exposure to a temperature of 210°C for 30 min completely opened up the fibrillar structure of the biomass. However, maximum ethanol production from the SE-treated material was achieved under less severe conditions (210°C, 10 min), which produced 50.9 g/L from an SSF reactor. This again represents a 91% theoretical conversion of ethanol from cellulose, scaling to 248 L/tonne of biomass.

Microbial fermentation efficiency is another limiting step during lignocellulosic biofuel production. High-fuel production strains of yeast can readily convert glucose to ethanol while withstanding ethanol toxicity but are largely unable to utilize hemicellulose-derived pentose sugars (Lange, 2007). Alternate strains, belonging to *Pichia* and *Candida* genera, are capable of xylose fermentation but lack productivity. Metabolic engineering through transformation to generate an organism capable of efficiently utilizing multiple carbon sources, particularly one unfettered by high concentrations of ethanol or aldehydes such as furfural and 5-HMF, will greatly increase lignocellulosic fuel production (Sun and Cheng, 2002; Wen et al., 2009). Despite eucalypts' desirable biofuel characteristics, their preferred climate ranges from cool temperate to tropical rainforest (Grattapaglia and Kirst, 2008; Shepherd et al., 2011). As such, their productivity as an energy crop outside of these climates is limited. To combat this limitation, Castro et al. (2014) investigated *E. benthamii*, a naturally cold-resistant species that is commercially grown in the Southeast USA, as a potential biofuel feedstock. To maximize biomass conversion, the authors used a process known as liquefaction plus simultaneous saccharification and cofermentation (L + SScF), which combines dilute acid SE pretreatment with SSF processes using an inhibitor-resistant *E. coli* strain (SL100) capable of dual glucose/xylose fermentation. In addition, the authors used phosphoric acid instead of sulfuric acid, as it forms fewer inhibitory products during deconstruction and allows the use of lower grades of stainless steel in reaction vessels, which saves on capital costs. Through optimization of temperature, acid concentration, and pretreatment time (combined as a function of CSF), Castro et al. (2014) found that sugar yields were affected primarily by pretreatment time and temperature, with acid concentration having the smallest impact. Within the reaction vessel, glucose was completely consumed within 48 h of fermentation, at which point the SL100 strain began fermenting xylose for the remainder of the 96 h fermentation. The cofermentation strategy to utilize all available carbon for conversion was successful, producing 240 g of ethanol/kg of raw biomass (304 L/tonne). For comparison, average ethanol production from sugarcane bagasse using the same process achieved 270–280 g/kg (342–355 L/tonne) (Geddes et al., 2013). Given the low costs of producing woody biomass (Hamelinck et al., 2005), this combination of alternative chemicals, SSF reaction vessels, and cofermentation microbial strains that are engineered to withstand the detrimental effects of inhibitors demonstrates the feasibility of using eucalypts as a cost-effective crop for bioenergy production.
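The mass-based and volume-based yields quoted above are related through the density of ethanol. A minimal sketch of the conversion, assuming a density of about 0.789 kg/L (near 20°C); the constant and function name are illustrative choices rather than values stated by the cited studies.

```python
ETHANOL_DENSITY_KG_PER_L = 0.789  # assumed density of ethanol near 20 degC

def litres_per_tonne(grams_ethanol_per_kg_biomass):
    """Convert a mass yield (g ethanol / kg biomass) to a volume yield (L / tonne)."""
    kg_ethanol_per_tonne = grams_ethanol_per_kg_biomass  # g/kg is numerically kg/tonne
    return kg_ethanol_per_tonne / ETHANOL_DENSITY_KG_PER_L

for g_per_kg in (240.0, 270.0, 280.0):
    print(g_per_kg, round(litres_per_tonne(g_per_kg)))
```

With this density, 240 g/kg corresponds to roughly 304 L/tonne for eucalypt, and the bagasse range of 270 to 280 g/kg corresponds to roughly 342 and 355 L/tonne.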

While ethanol is the most widely produced biofuel due to its ease of production, butanol is another fermentation product that can be used as a liquid fuel. Butanol is less volatile, hygroscopic, corrosive, and explosive than ethanol, can be transported with current infrastructure, and has similar energy content to gasoline (Antoni et al., 2007; Dürre, 2007; Ezeji et al., 2007; Fortman et al., 2008). Despite its advantages, microbial fermentation to butanol lacks efficiency given butanol's toxicity and often requires nutrient supplementation, which increases operating costs (Zheng et al., 2015). Zheng et al. (2015) demonstrated the feasibility of acetone–butanol–ethanol (ABE) production by *Clostridium saccharoperbutylacetonicum* from steam-exploded *Eucalyptus* biomass without nutrient supplementation. Various glucose concentrations (30–75 g/L) were achieved through varying solid loadings (6.7–25%), and a hydrolyzate loading of 10% (39.5 g/L) generated the highest concentration of ABE (acetone 4.07 g/L, butanol 7.72 g/L, and ethanol 0.467 g/L). However, further optimization of glucose concentration (dilution from 75 to 45 g/L) produced the highest ABE concentrations (4.27 g/L acetone, 8.16 g/L butanol, and 0.643 g/L ethanol). Solids loading beyond 10% had a detrimental effect on ABE production, likely due to formation of fermentation inhibitors such as 5-HMF and phenolics.
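The two sets of ABE titres reported above can be compared more directly by summing the solvents and expressing butanol as a share of the total. The sketch below uses only the concentrations quoted in the text; the function name and run labels are illustrative.

```python
def abe_summary(acetone, butanol, ethanol):
    """Return total ABE titre (g/L) and the butanol share of that total."""
    total = acetone + butanol + ethanol
    if total <= 0:
        raise ValueError("total solvent concentration must be positive")
    return total, butanol / total

runs = {
    "10% loading (39.5 g/L glucose)": (4.07, 7.72, 0.467),
    "75 g/L diluted to 45 g/L": (4.27, 8.16, 0.643),
}
for label, titres in runs.items():
    total, butanol_share = abe_summary(*titres)
    print(f"{label}: total ABE {total:.2f} g/L, butanol {100 * butanol_share:.0f}%")
```

The totals come to about 12.3 and 13.1 g/L, with butanol making up a little over 60% of the solvents in both cases, so the diluted hydrolyzate raised the overall titre without materially changing the product split.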

## LIGNIN

Lignin, the second most abundant biopolymer in plant tissue, accounts for roughly 25% of biomass. Its primary role is to provide strength and rigidity to the plant, as well as assisting in vascular water transport and protection from pathogens (Boerjan et al., 2003; Ralph et al., 2004). While providing critical functions for the plant, lignin effectively surrounds structural polysaccharides within the secondary cell wall, resulting in inefficient release of fermentable sugars from chemical or enzymatic hydrolysis (Hinchee et al., 2010; Jönsson et al., 2013).

Lignin synthesis begins with the conversion of phenylalanine to trans-cinnamic acid, catalyzed by the enzyme phenylalanine ammonia lyase (PAL). The remaining enzymatic steps have been well-reviewed (Ona et al., 1997; Li et al., 2006; Déjardin et al., 2010; Vanholme et al., 2010), but ultimately this biosynthetic pathway ends with the generation of the main precursors of the lignin molecule: coniferyl, *p*-coumaryl, and sinapyl alcohol (Bonawitz and Chapple, 2010). Upon transportation to the secondary cell wall, each alcohol precursor undergoes an oxidation reaction, mediated by laccase and peroxidase enzymes, which destabilizes the monolignol, causing it to form a covalent bond with another monolignol. Once bonded, these subunits form *p*-hydroxyphenyl (H), guaiacyl (G), and syringyl (S) lignin (Ralph et al., 2004; Bonawitz and Chapple, 2010; Vanholme et al., 2010). The most common covalent bond to occur, particularly in eucalypt lignin, is the β-O-4 linkage, which is predominantly formed from S lignin monomers. Other linkages are present, such as β–β and β-5 dimers, but β-O-4 linkages are preferential for pulp and biofuel production as they are less stable than other bonds, branch less frequently, and are more easily broken during alkaline pretreatment (Huntley et al., 2003; Hinchee et al., 2010).

Lignin represents the largest barrier to efficient deconstruction of woody biomass. Studies performed in transgenic lines of alfalfa, poplar, and *Arabidopsis* have demonstrated how slight alterations in the quantity and composition of lignin can result in large downstream effects for the saccharification of biomass (Chen and Dixon, 2007; Leplé et al., 2007; Eudes et al., 2012). Given its importance to the survivability of the plant, genetic control of cell lignification is tightly regulated. Using the promoter region of cinnamoyl CoA reductase (*CCR*) from *E. gunnii*, paired with a reporter gene (*GUS*), Lacombe et al. (2000) demonstrated in transgenic tobacco plants that *EgCCR* was highly activated during development and lignification of xylem tissues. Control of the lignin biosynthetic pathway is achieved through AC-rich elements within gene promoters. These AC elements serve as a binding platform for transcription factors (such as LIM and MYB) that modulate gene expression (Rogers and Campbell, 2004; Zhong and Ye, 2007). The LIM transcription factor, first identified in tobacco, upregulates lignin genes. When silenced in tobacco using antisense *NtLIM1* constructs, transcripts for phenylpropanoid genes *PAL*, 4-coumarate CoA ligase (*4CL*), and cinnamyl alcohol dehydrogenase (*CAD*) were also downregulated, resulting in plants with 27% less lignin than wild type (Kawaoka et al., 2000). Similarly, suppression of the *LIM1* ortholog in *E. camaldulensis* also downregulated the *PAL*, *4CL*, and *CAD* gene pathways, resulting in plants with not only 29% less lignin but also 5% higher structural polysaccharides. The polysaccharide increase could result from a shift in carbon resources caused by downregulating the phenylpropanoid pathways (Kawaoka et al., 2006).

The MYB transcription factor, first discovered as a regulator of the lignin pathway in snapdragons, also affects the transcription of the lignin gene pathways. Identified from cDNA libraries of differentiating xylem tissue, the *E. grandis MYB2* gene, when overexpressed in tobacco, resulted in abnormal secondary cell wall thickening and altered lignin composition. Interestingly, while the expression of phenylpropanoid genes was unaltered, downstream genes responsible for monolignol synthesis [*4CL*, *p*-coumarate 3-hydroxylase (*C3H*), hydroxycinnamoyl:shikimate hydroxycinnamoyl transferase (*HCT*), caffeoyl CoA *O*-methyltransferase (*CCoAOMT*), ferulate 5-hydroxylase (*F5H*), caffeic acid *O*-methyltransferase (*COMT*), *CCR*, and *CAD*] were upregulated, increasing the S/G ratio of the lignin (Goicoechea et al., 2005). Another MYB transcription factor identified from *E. grandis*, *EgMYB1*, when overexpressed in poplar and *Arabidopsis*, resulted in plants with dwarfed leaves and stems and downregulated lignin, cellulose, and hemicellulose transcripts. That upregulation of *EgMYB1* altered the major components of secondary cell wall structures suggests that MYB1 is a weak activator of lignocellulose genes and that its upregulation outcompetes stronger activators, thereby reducing overall transcription (Rogers and Campbell, 2004; Legay et al., 2010).

To investigate the effects of various wood properties, such as lignin content, S/G ratio, cellulose crystallinity, fiber pore size, and enzyme adsorption, on the enzymatic saccharification of woody biomass, Santos et al. (2012) characterized the biomass of nine woody plants, including *E. nitens*, *E. globulus*, and *E. urograndis*. Using a Kraft alkaline pretreatment and fixed enzyme loading, the authors found that, of all the parameters investigated, lignin content is the most significant contributing factor for saccharification. *E. globulus* biomass conversion resulted in the highest sugar recovery, the most efficient enzymatic conversion, and the least residual lignin (75.2, 97.9, and 6.9%, respectively). However, lignin content alone did not fully explain saccharification yields, as biomass with similar lignin levels released much less glucose than *E. globulus*. Lignin S/G ratios were also found to impact enzymatic hydrolysis, as increased S lignin monomers undergo less frequent branching, producing a more linear polymer that increases enzymatic access to polysaccharides. However, the effect of S/G ratio on saccharification appears to depend on biomass pretreatment: acid hydrolysis has been shown to have a greater effect on low S/G lignin (Davison et al., 2006), while Papa et al. (2012) demonstrated using three mutant lines of *E. globulus* with varying S/G ratios (0.94, 1.13, and 2.15) that lignin composition did not affect saccharification after IL pretreatment.

Given that lignin remains the largest barrier to effective deconstruction of woody biomass for fermentation, treatments to increase the efficiency at which it can be removed from biomass will aid biofuel production. To improve enzymatic saccharification of eucalypt biomass, Sykes et al. (2015) generated transgenic *E. grandis* × *urophylla* hybrids with RNA interference (RNAi) downregulated lignin biosynthetic genes *C3H* and cinnamate 4-hydroxylase (*C4H*). Total lignin content in transgenic lines was reduced by 8–9%, and after hot water pretreatment (designed as a mild, cost-effective method for biomass disruption) and enzymatic saccharification, both *C3H* (94%) and *C4H* (97%) transgenic lines released higher total sugars than control biomass (80% saccharification). However, transgenic lines were dwarfed (*C3H* – 2.0 m and *C4H* – 3.4 m) as compared to controls (6.0 m), a common issue for lignin transgenic plants that could be alleviated through silviculture practices.

Until low-lignin transgenic plants are further developed, large-scale biofuel production will depend on harsher pretreatments that inhibit microbial growth and enzymatic action through solubilization of phenolics (Ximenes et al., 2010; Jönsson et al., 2013). An alternative option to aid in delignification of biomass is the addition of laccase to destabilize the lignin network through phenol oxidation. Gutiérrez et al. (2012) and Rico et al. (2014) tested the potential of a laccase enzyme to increase saccharification from *E. globulus* biomass. Tested in the presence of an enzyme mediator, either 1-hydroxybenzotriazole (HBT) or methyl syringate (respectively), both studies reported lignin reduction (~48%) in *E. globulus* substrate and increased glucose and xylose yields after saccharification. Using pyrolysis-gas chromatography/mass spectrometry to understand the effect of the laccase treatment, the authors found an increased S/G ratio (4.9 vs. 4.0) within the lignin because of preferential removal of G lignin subunits, resulting in a less condensed phenolic polymer. Continued investigation of laccase pretreatment with mediators was conducted by Rico et al. (2015), who used 2D-NMR to characterize each step of delignification by fungal enzymes with *E. globulus* biomass and cellulolytic lignin. The low redox potential *M. thermophila* laccase with methyl syringate mediator was tested against a high redox potential laccase, isolated from *Pycnoporus cinnabarinus*, with HBT mediator across several stages of pretreatment and alkaline extraction. Though various structural changes occurred throughout each stage of the fungal pretreatments, the most striking effects involved the preferential removal of guaiacyl units, a reduction in β-O-4 alkyl–aryl ether linkages, and an increase in S/G ratio. Syringyl lignin subunits underwent Cα oxidation during laccase pretreatment, and the oxidized units were incompletely removed through alkaline extraction. Both fungal enzyme treatments achieved similar delignification results (~50%), although multistage analysis suggests that the rate of oxidation by *P. cinnabarinus* laccase + HBT was greater. The 50% delignification result correlated with a 30% increase in glucose yield after enzymatic saccharification. These results suggest that the largest gains in sugar release from biomass result from total delignification of biomass rather than the alteration of lignin composition.


### TABLE 2 | Advances in lignocellulosic biofuel production from eucalypt biomass.


*IL, ionic liquid; S, syringyl; G, guaiacyl; SSF, simultaneous saccharification fermentation; RNAi, RNA interference; C3H, p-coumarate 3-hydroxylase; C4H, cinnamate 4-hydroxylase; ABE, acetone/butanol/ethanol.*

While *M. thermophila* is a commercially available strain, its laccase enzymes may lack specificity when applied to various lignocellulose feedstocks. To investigate novel laccase enzymes from endophytic fungi occurring in symbiosis with *Eucalyptus* trees, Martín-Sampedro et al. (2015) screened more than 100 strains, selecting five for their ligninolytic enzymes. These strains, tested against a white rot *Trametes* sp. reference, were combined with 10 g of *Eucalyptus* wood chips, before or after mild autohydrolysis pretreatment (selected to minimize the production of fungal inhibitory products). Enzymatic saccharification released greater sugar yields after the combined treatment (fungal + autohydrolysis) than after either pretreatment alone. Endophytic fungi strains *Ulocladium* sp. and *Hormonema* sp. outperformed the *Trametes* sp. reference strain, resulting in 3.3- and 2.9-fold increases in total sugars (compared with a 2.3-fold increase for the reference) relative to autohydrolyzed control biomass (~3 g/L). The authors postulated that the specific activity of the ligninolytic enzymes could be a result of evolutionary processes, and that endophytic fungi represent a large reservoir of biodiversity to aid biofuel production.
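The fold increases above can be turned into approximate sugar concentrations by multiplying by the control value. The sketch below assumes the fold changes are all expressed relative to the same autohydrolyzed control and takes that control as exactly 3 g/L, although the text gives it only approximately; the resulting figures are therefore rough estimates, not values reported by Martín-Sampedro et al. (2015).

```python
# Approximate total-sugar concentration of the autohydrolyzed control (g/L).
# The text reports "~3 g/L"; treating it as exactly 3.0 is an assumption.
CONTROL_G_PER_L = 3.0

# Fold increases in total sugars quoted in the text.
fold_increase = {
    "Ulocladium sp.": 3.3,
    "Hormonema sp.": 2.9,
    "Trametes sp. (reference)": 2.3,
}

# Implied concentration = fold increase x control concentration.
implied_g_per_l = {
    strain: fold * CONTROL_G_PER_L for strain, fold in fold_increase.items()
}

for strain, conc in implied_g_per_l.items():
    print(f"{strain}: roughly {conc:.1f} g/L total sugars")
```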

## CONCLUSION

Given the global demand and potential for lignocellulosic biofuels, selection and research into alternative feedstocks is essential. Eucalypts, given their wide range of phenotypic diversity, genetic potential, environmental adaptability, and desirable cell wall chemistry, are excellent candidates for bioenergy production (**Table 2**). While eucalypt biomass is highly prized for other industrial processes, such as pulp and paper and timber production, the most economical way to introduce lignocellulose into the energy supply chain will be in conjunction with other plantation practices where thinned and undesirable trees are removed to promote growth of high-value trees. In addition, the production of fuel from waste wood chips and bark within pulping factories will help convert mills into complete biorefineries. Indeed, as global paper consumption diminishes, alternative uses for eucalypt biomass will require research and development. While pulping plants are efficient at deconstruction, harsh pretreatments are not suitable for downstream microbial conversion of polysaccharides to monosaccharides to fuel. High temperatures and pressures, while effective for deconstruction, generate inhibitory compounds from lignin and carbohydrates that result in sugar losses and inefficient downstream processes. Lignocellulosic fuel will require mild, low-cost pretreatments, coupled with SSF or "one-pot" processes to promote efficient biofuel production.

Genetic and chemical exploitation of eucalypt cell walls has allowed the design of mild and environmentally friendly pretreatments, such as autohydrolysis and SE, relying on *in situ* acid generation to aid deconstruction without expensive and caustic chemicals. Although these pretreatments help to reduce the formation of inhibitory products, aldehydes and phenolics formed from cellulose, hemicellulose, and lignin will likely remain in low concentrations within reaction vessels, necessitating robust fermentative microbial strains. Metabolic engineering to exploit genetic variation has great potential to overcome the largest barriers to fuel conversion. These techniques have already generated dual fermentation strains that utilize all present carbon sources and resist the effects of degradation products within reaction vessels to maintain productivity. Application of the same principles to feedstocks has downregulated lignin gene pathways, designing plant cell walls that deconstruct with ease under mild conditions. Coupled with screening and isolation of endophytic fungi with specific ligninolytic enzymes, lignin deconstruction and removal from process vessels will maximize enzyme adsorption, sugar recovery, and fermentation.

Ionic liquids (ILs) are the most promising option for biomass pretreatment, given their stability, low volatility, and action at low temperatures. Despite the commercial use of cold-resistant *E. benthamii*, eucalypts are not the ideal biofuel feedstock in all climates. ILs work universally well regardless of feedstock composition, solubilizing whole biomass without degradation, and selectively precipitating cellulose upon the addition of an antisolvent. Efficient saccharification of the cellulose precipitate maximizes sugar recovery and maintains intact lignin for alternate chemical processing.

In addition to the biological components of biofuel production, process optimization, such as single-reaction SSF and high solids loading, increases achievable ethanol concentrations and lowers capital costs for production. Additional savings will be gained through the combination of pretreatments to reduce energy costs and enzyme loading for efficient saccharification. Increased understanding of eucalypt cell wall formation, particularly lignin formation, will allow the engineering of new and effective pretreatment options to make biofuel production suitable for a wide range of lignocellulosic feedstocks to provide renewable fuels for the future.

## ACKNOWLEDGMENTS

This work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org) supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes.

## REFERENCES

Antoni, D. W., Zverlov, V. V., and Schwarz, W. H. (2007). Biofuels from microbes. *Appl. Microbiol. Biotechnol.* 77, 23–35.

Alvira, P., Tomás-Pejó, E., Ballesteros, M., and Negro, M. J. (2010). Pretreatment technologies for an efficient bioethanol production process based on enzymatic hydrolysis: a review. *Bioresour. Technol.* 101, 4851–4861. doi:10.1016/j.biortech.2009.11.093


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Healey, Lee, Furtado, Simmons and Henry. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Optimization of alkaline and dilute acid pretreatment of agave bagasse by response surface methodology**

*Abimael I. Ávila-Lara<sup>1</sup> , Jesus N. Camberos-Flores <sup>1</sup> , Jorge A. Mendoza-Pérez <sup>2</sup> , Sarah R. Messina-Fernández <sup>3</sup> , Claudia E. Saldaña-Duran<sup>3</sup> , Edgar I. Jimenez-Ruiz <sup>4</sup> , Leticia M. Sánchez-Herrera<sup>4</sup> and Jose A. Pérez-Pimienta<sup>1</sup> \**

*<sup>1</sup> Department of Chemical Engineering, Universidad Autónoma de Nayarit, Tepic, Mexico, <sup>2</sup> Department of Engineering in Environmental Systems, Instituto Politécnico Nacional, Mexico City, Mexico, <sup>3</sup> Cuerpo Académico de Sustentabilidad Energética, Universidad Autónoma de Nayarit, Tepic, Mexico, <sup>4</sup> Food Technology Unit, Universidad Autónoma de Nayarit, Tepic, Mexico*

### *Edited by:*

*Robert Henry, The University of Queensland, Australia*

### *Reviewed by:*

*Jian Xu, Chinese Academy of Sciences, China Maria Gonzalez Alriols, University of the Basque Country, Spain*

### *\*Correspondence:*

*Jose A. Pérez-Pimienta, Department of Chemical Engineering, Universidad Autónoma de Nayarit, Ciudad de la Cultura "Amado Nervo" S/N, Tepic, Nayarit 63155, Mexico japerez@uan.edu.mx*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 25 July 2015 Accepted: 09 September 2015 Published: 23 September 2015*

### *Citation:*

*Ávila-Lara AI, Camberos-Flores JN, Mendoza-Pérez JA, Messina-Fernández SR, Saldaña-Duran CE, Jimenez-Ruiz EI, Sánchez-Herrera LM and Pérez-Pimienta JA (2015) Optimization of alkaline and dilute acid pretreatment of agave bagasse by response surface methodology. Front. Bioeng. Biotechnol. 3:146. doi: 10.3389/fbioe.2015.00146*

Utilization of lignocellulosic materials for the production of value-added chemicals or biofuels generally requires a pretreatment process to overcome the recalcitrance of the plant biomass before the enzymatic hydrolysis and fermentation stages. Two of the most widely employed pretreatment processes use dilute acid (DA) and alkaline (AL) catalysts, which have specific effects on the physicochemical structure of the biomass, such as high xylan removal for DA and high lignin removal for AL. Another important factor that needs to be studied is the use of high solids pretreatment (≥15%), since it offers many advantages over lower solids loadings, including increased sugar and ethanol concentrations (in combination with a high solids saccharification), which will be reflected in lower capital costs; however, these data are currently limited. In this study, several variables, such as catalyst loading, retention time, and solids loading, were studied using response surface methodology (RSM) based on a factorial central composite design of DA and AL pretreatment on agave bagasse using a range of solids from 3 to 30% (w/w) to obtain optimal process conditions for each pretreatment. Subsequently, enzymatic hydrolysis was performed using Novozymes Cellic CTec2 and HTec2, with results presented as total reducing sugar (TRS) yield. Pretreated biomass was characterized by wet-chemistry techniques, and selected samples were analyzed by calorimetric techniques and scanning electron/confocal fluorescent microscopy. RSM was also used to optimize the pretreatment conditions for maximum TRS yield. The optimum conditions were determined for AL pretreatment as 1.87% NaOH concentration, 50.3 min, and 13.1% solids loading, and for DA pretreatment as 2.1% acid concentration, 33.8 min, and 8.5% solids loading.

**Keywords: agave bagasse, high solids, biomass pretreatment, optimization, characterization**

## **Introduction**

Lignocellulosic biomass is the most abundant renewable carbohydrate source in the world and is expected to dominate biofuel production in the future (Avci et al., 2013). The plant cell wall is composed mainly of cellulose, hemicellulose, and lignin, and the organization of and interactions between these polymeric structures make it naturally recalcitrant to biological degradation (da Costa Sousa et al., 2009). A pretreatment step is fundamental to alter the structure of cellulosic biomass and make cellulose more accessible to the enzymes that convert the carbohydrate polymers into fermentable sugars (Mosier et al., 2005).

Many options exist for the pretreatment of biomass to increase saccharification efficiency and improve the yields of monomeric sugars; the leading examples use liquid catalysts, such as sulfuric acid, ammonia, ionic liquid, or water, which penetrate the cell wall and alter its chemistry and ultrastructure (Dadi et al., 2006; Chundawat et al., 2011).

Recently, agave bagasse (AGB), a byproduct of the Tequila industry that represents 40% of the harvested plant, with an annual generation in Mexico of about 1.12 × 10<sup>8</sup> kg, has been studied for biomass conversion using different pretreatment approaches, such as ionic liquid (Perez-Pimienta et al., 2013) and organosolv (Caspeta et al., 2014). Moreover, AGB has also been subjected to acid and enzymatic hydrolysis followed by a fermentation step using a native microorganism (*Pichia caribbica* UM-5), obtaining ~57% of theoretical ethanol (w/w) (Saucedo-Luna et al., 2011), and different *Agave* species have been used for the production of n-butanol and ethanol (Mielenz et al., 2015).

Dilute acid (DA) and alkaline (AL; NaOH) pretreatments are among the most extensively studied biomass pretreatments for different feedstocks, such as grasses, agricultural residues, and woods (Kumar et al., 2009; Xu et al., 2010; Sathitsuksanoh et al., 2013; Zhang et al., 2014). DA pretreatment typically uses sulfuric acid, which removes hemicellulose to a great extent and thereby improves enzyme accessibility to cellulose; its effectiveness depends on the acid concentration and temperature applied during the process. However, if severe conditions are applied, several degradation products are formed, mainly furfural, 5-hydroxymethylfurfural, phenolic acids and aldehydes, levulinic acid, and other aliphatic acids, which can inhibit both enzymatic hydrolysis and fermentation (Mosier et al., 2005; da Costa Sousa et al., 2009). On the other hand, AL pretreatment uses an alkaline catalyst, such as sodium hydroxide, whose effectiveness depends on the lignin content of the biomass; it increases cellulose digestibility through lignin solubilization/removal while causing less cellulose and hemicellulose solubilization than acid or hydrothermal processes (Alvira et al., 2010).

In recent years, the need to investigate the use of high solids loading (≥15%) in biomass pretreatment has increased, since it offers many advantages over lower solids loadings, including increased sugar and ethanol concentrations, which will be reflected in lower capital costs (Modenbach and Nokes, 2012; Li et al., 2013); however, these data are currently limited for DA and AL pretreatments in AGB (Hernández-Salas et al., 2009; Saucedo-Luna et al., 2011).

In the present manuscript, optimization of DA and AL pretreatment strategies for conversion of AGB to sugars was studied using a central composite design (CCD) for response surface methodology (RSM). The objective of this study was to identify the optimum process conditions for the selected operating variables, namely catalyst concentration, retention time, and solids loading, for the maximum production of fermentable sugars. Furthermore, the untreated and selected samples from both pretreatments were characterized by calorimetric techniques (TGA), fluorescence and energy dispersive X-ray spectroscopy (EDS), and scanning electron microscopy (SEM).

## **Materials and Methods**

The biomass used in this study was obtained from Destilería Rubio, a Tequila plant from western Mexico. The AGB was harvested in August 2014. The biomass was milled with a Thomas-Wiley Mini Mill fitted with a 40-mesh screen (Model 3383-L10 Arthur H. Thomas Co., Philadelphia, PA, USA) and stored at 4°C in a sealed plastic bag. Cellic® CTec2 (Cellulase complex for degradation of cellulose) and HTec2 (Endoxylanase with high specificity toward soluble hemicellulose) were a gift from Novozymes (Davis, CA, USA).

### **Experimental Design**

Optimization of processing conditions for fermentable sugar recovery was studied using a factorial CCD within RSM. The independent variables were catalyst concentration, residence time, and solids loading. The experimental data were fit to Eq. 1, a low-order polynomial, to evaluate the effect of each independent variable on the response; the fitted model was then analyzed to obtain the optimum process conditions (Tan et al., 2011). In this study, a quadratic polynomial was employed as follows:

$$y = \beta_0 + \sum_{i=1}^{3} \beta_i X_i + \sum_{i=1}^{3} \beta_{ii} X_i^2 + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} X_i X_j \tag{1}$$

where *y* is the response, *X<sub>i</sub>* and *X<sub>j</sub>* are independent variables, β<sub>0</sub> is the constant coefficient, β*<sub>i</sub>* is the *i*th linear coefficient, β*<sub>ii</sub>* is the *i*th quadratic coefficient, and β*<sub>ij</sub>* is the *ij*th interaction coefficient. A CCD consists of 2*<sup>k</sup>* factorial points, 2*k* axial points (*±* α), and, in this study, six central points, where *k* is the number of independent variables. Each variable was investigated at five coded levels (*−*α, *−*1, 0, 1, α), as listed in **Table 1**, and the complete experimental design matrix for this study is shown in **Table 2**. For each pretreatment (DA and AL), a total of 20 experiments were carried out: eight factorial points, six axial points, and six replicates at the central point.
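The design described above can be sketched in a few lines of Python. This is only an illustrative sketch in coded units: the function name `central_composite_design` is hypothetical, the run order does not follow **Table 2**, and the mapping from coded levels to actual concentrations, times, and loadings (given in **Table 1**) is not reproduced here.

```python
from itertools import product


def central_composite_design(k=3, n_center=6):
    """Return (alpha, runs) for a rotatable CCD in coded units.

    alpha is the fourth root of the number of factorial points (2**k).
    The run order here is arbitrary and is NOT the order used in Table 2.
    """
    alpha = (2 ** k) ** 0.25

    # 2**k factorial (corner) points at coded levels -1 and +1.
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]

    # 2*k axial (star) points at -alpha and +alpha on each axis.
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))

    # Replicated center points.
    center = [tuple([0.0] * k)] * n_center

    return alpha, factorial + axial + center


alpha, runs = central_composite_design()
print(f"alpha = {alpha:.4f}, total runs = {len(runs)}")
```

For *k* = 3 this yields 8 + 6 + 6 = 20 coded runs, matching the number of experiments per pretreatment stated in the text.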

### **Alkaline Pretreatment**

A NaOH solution at a specific concentration was placed in a serum bottle and mixed with AGB using a glass rod, forming a slurry with a precise biomass concentration. The pretreatment was performed under autoclave conditions (121°C and ~15 psi) for the appropriate time according to **Table 1** (Xu et al., 2010). Pretreated biomass was recovered by filtration and washed with 400 mL of distilled water to remove excess alkali and dissolved byproducts. All experiments were conducted in triplicate.

*<sup>a</sup>α (axial distance) = <sup>4</sup>√N (the fourth root of N), where N is the number of experiments of the factorial design (N = 2<sup>3</sup> = 8). In this case, α = 1.6818.*

### **TABLE 2 | Experimental design matrix of CCD and corresponding results (sugars and solids recovery)**.

*AL, alkaline pretreatment; DA, dilute acid pretreatment.*

### **Dilute Acid Pretreatment**

The DA pretreatment with H<sub>2</sub>SO<sub>4</sub> was conducted in an autoclave at 130°C and 20 psi, using the acid concentration, solids loading, and time listed in **Table 1** (Sathitsuksanoh et al., 2013). After DA pretreatment, the hydrolyzate was separated by filtration and the pretreated AGB was washed with 400 mL of distilled water prior to enzymatic hydrolysis. All experiments were conducted in triplicate.

### **Scanning Electron Microscopy**

The morphology of untreated and selected pretreated AGB solids was analyzed by high-resolution SEM using a JEOL JSM-7800F instrument. Representative images were acquired at an accelerating voltage of 1 kV, and the analysis was performed at 20 kV.

### **Confocal Fluorescent Microscopy**

Confocal fluorescence microscope images of untreated and selected pretreated AGB samples were taken using a Carl Zeiss LSM 710 NLO with two laser sources (405 and 633 nm). To reveal the microstructure based on the distribution of lignin (autofluorescence) and cellulose, all samples were labeled with Calcofluor white stain (0.1%) for 5 min, washed four times with distilled water, and allowed to dry in the dark until analysis under the confocal microscope.

### **DSC and TGA Analysis**

A differential scanning calorimeter (Pyris 1, Perkin Elmer) was operated under an argon atmosphere from 50 to 450°C at a ramp of 10°C/min. DSC curves were obtained with 3.3 mg of sample. The TGA curves were obtained using around 3.8 mg of AGB as the initial sample mass. The samples were tested in a SETARAM thermal analysis instrument over a temperature range of 50–800°C at a heating rate of 10°C/min under an argon atmosphere. Untreated and selected pretreated samples were measured by both DSC and TGA.

### **Biomass Porosimetry**

Nitrogen porosimetry (ASAP 2406, Micromeritics) was employed to measure the surface area, pore volume, and pore size distribution of the untreated and selected pretreated AGB, following the ASTM methods ASTM D-3663(R2008), ASTM D-4222-03(R2008), and ASTM D-4641-12(R2008). Samples were degassed at 120°C.

### **Enzymatic Saccharification**

Enzymatic saccharification of untreated and pretreated AGB samples was carried out using the commercially available Cellic® CTec2 and HTec2 enzyme mixtures at 55°C and 150 rpm in 50 mM citrate buffer (pH 4.8). A 3% biomass loading was used, and untreated AGB was run concurrently with the pretreated samples to eliminate potential differences in temperature history or enzyme loading. The enzyme loadings of CTec2 and HTec2 were set at 35 FPU/g biomass and 60 CBU/g biomass, respectively. All assays were performed in triplicate.

### **DNS Assay**

The total reducing sugar (TRS) yield of the final hydrolyzate, calculated as mg sugar/g biomass, was determined by the DNS assay (Miller, 1959) on a DTX 880 Multimode Detector (Beckman Coulter, CA, USA) at 550 nm, with solutions of D-glucose in water (0–10 g/L) as calibration standards. All assays were performed in triplicate.
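As a minimal sketch of how a DNS calibration of this kind can be turned into a TRS yield, the Python below fits a straight line to glucose standards and converts a sample absorbance to mg sugar/g biomass. The absorbance readings, the dilution factor, and the reading of the 3% biomass loading as 30 g/L are all assumptions made for illustration; none of these values come from the study, and the helper names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def trs_yield_mg_per_g(absorbance, slope, intercept, biomass_g_per_l, dilution=1.0):
    """Convert an absorbance reading to TRS yield in mg sugar per g biomass."""
    sugar_g_per_l = (absorbance - intercept) / slope * dilution
    return 1000.0 * sugar_g_per_l / biomass_g_per_l


# Hypothetical calibration data: glucose standards (g/L) and absorbance at 550 nm.
standards_g_per_l = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
absorbance_550 = [0.02, 0.22, 0.42, 0.62, 0.82, 1.02]

slope, intercept = fit_line(standards_g_per_l, absorbance_550)

# Hypothetical sample: 5-fold diluted hydrolyzate, 3% loading assumed to mean 30 g/L.
example_yield = trs_yield_mg_per_g(0.32, slope, intercept,
                                   biomass_g_per_l=30.0, dilution=5.0)
print(f"TRS yield = {example_yield:.1f} mg sugar/g biomass")
```

In practice the calibration would be checked for linearity over the 0–10 g/L range before any sample readings are converted.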

### **Statistical Analysis**

Analysis of experimental CCD results was carried out with the software Design-Expert 7.1.5 (Stat-Ease, Minneapolis, MN, USA). Each coefficient in Eq. 1 was calculated and the possible interaction effects of the process variables on the response were obtained. Their significance was checked by variance analysis (ANOVA) of experimental results.

## **Results and Discussion**

### **Biochemical Composition Analysis of Untreated Agave Bagasse**

Following the National Renewable Energy Laboratory (NREL, Denver, CO, USA) protocols, the composition of untreated AGB on a dry basis was 41.5% glucan, 20.3% xylan, 17.0% insoluble lignin, 3.8% soluble lignin, and 5.4% ash, which is consistent with other reported values (Davis et al., 2011; Perez-Pimienta et al., 2013). Together, glucan and xylan, the total carbohydrates measured, account for 61.8% of the AGB.
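For orientation, the composition above implies an upper bound on the sugar that could be released. The estimate below is only a rough sketch: it assumes complete hydrolysis of glucan and xylan and the usual hydration factors for converting anhydro-sugars to monomers (162 to 180 g/mol for glucose, 132 to 150 g/mol for xylose), neither of which is stated in the text.

$$\text{TRS}_{\max} \approx 1000 \left( 0.415 \times \frac{180}{162} + 0.203 \times \frac{150}{132} \right) \approx 461 + 231 = 692 \ \text{mg sugar/g biomass}$$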

### **Model Development**

The experimental data shown in **Table 2** were first analyzed with Design-Expert software to obtain second-order polynomial equations that include interaction terms between the experimental variables. The following models describe the TRS yield (mg sugar/g biomass) of AL and DA pretreatment in terms of coded parameters and actual parameters.

The final equations for AL pretreatment were as follows:

$$\begin{aligned} \text{TRS yield} &= 513.35 + 21.08 \times A + 3.57 \times B - 16.87 \times C \\ &- 9.95 \times AB + 4.87 \times AC + 1.67 \times BC \\ &- 38.44 \times A^2 - 2.52 \times B^2 - 18.14 \times C^2 \end{aligned} \tag{2}$$

$$\begin{aligned} \text{TRS yield} &= 277.0937 + 203.1844 \times \text{NaOH} + 1.0059 \times \text{Time} \\ &+ 5.6903 \times \text{Solids} - 0.4313 \times \text{NaOH} \times \text{Time} \\ &+ 0.7172 \times \text{NaOH} \times \text{Solids} + 0.0076 \times \text{Time} \times \text{Solids} \\ &- 53.8367 \times \text{NaOH}^2 - 0.0034 \times \text{Time}^2 - 0.2813 \times \text{Solids}^2 \end{aligned} \tag{3}$$

In the same way, the final equations for DA pretreatment were as follows:

$$\begin{aligned} \text{TRS yield} &= 427.27 - 5.87 \times A + 0.26 \times B - 12.80 \times C \\ &- 13.65 \times AB + 2.19 \times AC + 2.92 \times BC \\ &- 27.48 \times A^2 - 11.45 \times B^2 - 0.62 \times C^2 \end{aligned} \tag{4}$$

$$\begin{aligned} \text{TRS yield} &= 305.8687 + 137.0487 \times \text{Acid} + 2.1816 \times \text{Time} \\ &- 2.4166 \times \text{Solids} - 0.5917 \times \text{Acid} \times \text{Time} \\ &+ 0.3231 \times \text{Acid} \times \text{Solids} + 0.0133 \times \text{Time} \times \text{Solids} \\ &- 38.4832 \times \text{Acid}^2 - 0.0154 \times \text{Time}^2 - 0.0097 \times \text{Solids}^2 \end{aligned} \tag{5}$$

where A, B, and C are catalyst concentration (NaOH for AL and H<sub>2</sub>SO<sub>4</sub> for DA), retention time, and solids loading, respectively. An analysis of variance (ANOVA) was performed to test the significance of the developed models, and the results for AL and DA pretreatment are presented in **Tables 3** and **4**, respectively. A model is considered significant if its *p*-value (also reported as the Prob *> F* value) is lower than 0.05, meaning there is less than a 5% chance that an *F*-value this large could occur due to noise. Both models describe the response effectively; nevertheless, the AL pretreatment model has a lower *p*-value (0.0003) than the DA pretreatment model (0.0247). In addition, the Prob *> F* values for the individual model terms suggest that A, C, and A<sup>2</sup> have significant effects on the TRS yield in AL pretreatment, whereas only A<sup>2</sup> is significant in DA pretreatment. To determine the suitability of the models, the lack-of-fit test was used, which indicated an insignificant lack of fit, with *F*-values of 0.1393 and 0.3009 for AL and DA pretreatment, respectively. The coefficient of determination (*R*<sup>2</sup>) of the pretreatment models was 0.9151 for AL and 0.7270 for DA, implying a good and an average correlation, respectively, between the observed and predicted values, as shown in **Figures 1A,B**. Finally, the quadratic models developed for AL and DA pretreatment are appropriate for predicting TRS yield under different pretreatment conditions within the range used in the present study.
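The coded models can be evaluated directly. The sketch below copies the coefficients of Eqs. 2 and 4 and scans a coarse grid over the coded design region to locate the predicted maximum of the AL model. The function names, the grid resolution, and the choice of a plain grid search are illustrative assumptions; the study itself used Design-Expert, and its numerical optimization may apply other criteria, so the point found here need not coincide with the reported optimum.

```python
# Coded-unit coefficients copied from Eq. 2 (AL) and Eq. 4 (DA).
AL = {"b0": 513.35, "A": 21.08, "B": 3.57, "C": -16.87,
      "AB": -9.95, "AC": 4.87, "BC": 1.67,
      "AA": -38.44, "BB": -2.52, "CC": -18.14}
DA = {"b0": 427.27, "A": -5.87, "B": 0.26, "C": -12.80,
      "AB": -13.65, "AC": 2.19, "BC": 2.92,
      "AA": -27.48, "BB": -11.45, "CC": -0.62}


def trs_yield(m, a, b, c):
    """Predicted TRS yield (mg sugar/g biomass) at coded levels a, b, c."""
    return (m["b0"] + m["A"] * a + m["B"] * b + m["C"] * c
            + m["AB"] * a * b + m["AC"] * a * c + m["BC"] * b * c
            + m["AA"] * a * a + m["BB"] * b * b + m["CC"] * c * c)


def grid_maximum(m, alpha=1.6818, steps=68):
    """Coarse grid search over the cube [-alpha, alpha]^3 in coded units."""
    levels = [-alpha + 2.0 * alpha * i / steps for i in range(steps + 1)]
    best_point, best_value = None, float("-inf")
    for a in levels:
        for b in levels:
            for c in levels:
                value = trs_yield(m, a, b, c)
                if value > best_value:
                    best_point, best_value = (a, b, c), value
    return best_point, best_value


al_point, al_value = grid_maximum(AL)
print("AL model, center point:", trs_yield(AL, 0.0, 0.0, 0.0))
print("AL model, grid maximum (coded A, B, C):", al_point, round(al_value, 1))
print("DA model, center point:", trs_yield(DA, 0.0, 0.0, 0.0))
```

Converting the coded coordinates back to catalyst concentration, time, and solids loading requires the level definitions in **Table 1**.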

### **Effect of Pretreatment Conditions on Solids Recovery**

The highest solids recovery for AL and DA was obtained in the same run (13), with 87.6 and 86.1%, respectively, under experimental conditions of 0.73% catalyst concentration, 74.8 min, and 24.53% solids loading. On the other hand, the lowest solids recovery for AL pretreatment, 60.7%, was obtained during run 8 (3.00% catalyst concentration, 52.5 min, and 16.5% solids loading), while for DA pretreatment it was 54.4% in run 10 (2.42% catalyst concentration, 74.8 min, and 8.47% solids loading). The differences between the highest and lowest solids recovery, which reflect process severity, are 26.9 and 31.7% for AL and DA pretreatment, respectively.

*DF, degree of freedom.*

**TABLE 4 | ANOVA table for the quadratic model of dilute acid pretreatment**.

*DF, degree of freedom.*

### **Effect of Pretreatment Catalyst Concentration and Retention Time**

The effect of catalyst concentration and retention time in AL and DA pretreatment on TRS yield during enzymatic saccharification at 3% biomass loading is shown in **Figure 2**. For both AL and DA pretreatment, the TRS yield decreased at short retention times combined with low catalyst concentrations, and likewise at long retention times combined with high catalyst concentrations.

However, for AL pretreatment a TRS yield above ~460 mg sugar/g biomass is obtained from 1.58 to 2.43% NaOH across the studied range of 15–90 min. On the other hand, DA pretreatment shows a more distributed region in which the highest TRS yields were obtained at the central design points, with a relatively small difference between the highest yield, which occurred in run 7 (457 mg/g biomass), and the average of the central data points (433 mg/g biomass).

### **Effect of Pretreatment Catalyst Concentration and Solid Loading**

The response surface plots in **Figure 3** present the effect of catalyst concentration and solids loading on the TRS yield of both AL and DA pretreatment. For AL pretreatment, one area is clearly defined, with the highest TRS yield in the middle range of both parameters. A TRS yield above 500 mg/g biomass is obtained in the range of 1.1–2.3% NaOH and a solids loading between 4 and 20%. These results are supported by previous reports on AL pretreatment in which the same temperature (121°C), a moderate NaOH concentration (1%), and a moderate time (30–60 min) achieved the highest TRS yield (Wang et al., 2010; Xu et al., 2010). For DA pretreatment, a clear region with a TRS yield above 430 mg/g biomass was reached within the range of 0.7–2% acid and a solids loading of 3–15%. A notable difference exists between the highest experimental TRS yields of the two pretreatments: ~533 mg/g biomass from run 15 in AL and ~457 mg/g biomass from run 7 in DA. This difference arises from the objective of each pretreatment: AL pretreatment mainly removes lignin, whereas DA pretreatment mainly removes xylan. Consequently, a lower TRS yield is expected after DA pretreatment, because less xylan remains available as a substrate for enzymatic conversion into xylose.

### **Optimization of Pretreatment Conditions**

In both of the evaluated pretreatment processes (AL and DA), a lower catalyst concentration, shorter time, and high solids loading are preferred to obtain an optimum TRS yield. The optimum catalyst concentration, retention time, and solids loading were found to be 1.87% NaOH, 50.3 min, and 13.1% solids loading for AL pretreatment, and 2.1% acid, 33.8 min, and 8.5% solids loading for DA pretreatment. When the optimum conditions are compared with the experimental conditions that gave the highest yields (Run 7, **Table 2**), AL pretreatment shows an 18% increase in NaOH concentration, a 4% reduction in retention time, and a 20% reduction in solids loading, whereas DA pretreatment shows a 33% increase in acid concentration, a 35.6% reduction in retention time, and a 283% increase in solids loading.

**FIGURE 2 | Response surface plots showing the effects of time and catalyst concentration for (A) alkaline pretreatment and (B) dilute acid pretreatment**.

### **Thermogravimetric and Differential Scanning Calorimetry Analysis**

Untreated and selected pretreated AGB samples were analyzed thermogravimetrically to compare their degradation characteristics as a function of pretreatment. Two samples were selected for TGA analysis for each pretreatment: AL-1 and DA-1, corresponding to experimental run 8, and AL-2 and DA-2, corresponding to experimental run 16 (one of the CCD points). **Figure 4** shows standard weight loss plots, while **Figure 5** shows the differential TGA plots of the untreated and pretreated AGB samples. All samples exhibit three decomposition regions, with some initial weight loss from 50 to 125°C (mainly due to moisture evaporation). Up to 200°C, the samples were thermally stable. The decomposition temperature (*T*<sub>d</sub>) decreased for both AL- and DA-pretreated samples as compared to the untreated AGB, as shown in **Table 5**. Among the analyzed pretreated samples, the lowest value corresponds to AL-1 (run 8 sample). These results indicate that AL pretreatment reduced the activation energy needed to decompose the AGB to a greater extent than DA pretreatment by deconstructing the tight plant cell wall structures. AL-pretreated AGB samples had a lower *T*<sub>d</sub> value than an ionic liquid-treated AGB from a recent report (310 vs. 347°C) (Perez-Pimienta et al., 2015). Thermal depolymerization of hemicelluloses and the cleavage of glycosidic linkages of cellulose occur in the region of 220–300°C, and the degradation of cellulose takes place between 275 and 400°C, while lignin decomposition extends over the whole temperature range, from 200 to 700°C, due to the different activities of the chemical bonds present in its structure (Deepa et al., 2011). The final decomposition stage for all samples was completed above 400°C, where weight loss due to thermolysis of carbon-containing residues takes place (Fisher et al., 2002).

DSC curves of untreated AGB and selected samples from AL and DA pretreatment (Figures S1–S3 in Supplementary Material) show two endothermic peaks, and Table S1 in Supplementary Material summarizes those events. The first thermal event appears below 200°C with low energy, between 5.3 and 13.9 J/g°C: the untreated AGB presents an onset temperature of 83°C (8.6 J/g°C), the AL-4 sample (run 16 of AL pretreatment) reached 13.9 J/g°C, and the highest-energy event for DA was 12.2 J/g°C with DA-1 (run 8), which employed a 3% acid loading. A similar peak was obtained with an IL-treated AGB sample, where the untreated sample showed a dehydration peak at 89°C (Perez-Pimienta et al., 2015). On the other hand, the second thermal event presents a high-energy peak for all samples, with ΔH in the range of 120–627 J/g°C at temperatures from 262 to 415°C. AL pretreatment achieved its highest energy with run 16 (AL-4), with a peak at 335°C (627 J/g°C), whereas among the evaluated DA-pretreated samples the highest was run 9 (AL-2) at 358°C and 296 J/g. Thus, compared to the untreated sample, the pretreated samples show a reduction in calorific value, turning them into a more digestible biomass.

### **Scanning Electron and Confocal Fluorescence Microscopy**

The SEM images of untreated and pretreated samples (run 16 sample for both AL and DA pretreatment) were taken at 500*×* (**Figure 6**). Untreated AGB (**Figure 6A**) presents an intact structure without degradation. In contrast, AL pretreatment dissolves lignin, disrupting the biomass and increasing the number of pores, as can be observed in **Figure 6B**. Finally, DA pretreatment disrupts the lignocellulosic structure mainly by dissolving hemicellulose; hence, major microfibrous cellulose structures remain (**Figure 6C**), and some lignin or lignin–carbohydrate complexes may be condensed on the surface of the cellulose fibers.

The elemental contents of untreated and pretreated AGB (run 16 from AL and DA pretreatment) are presented in **Table 6**. In the untreated AGB, C and O account for 98.5% of the total mass fraction, with the remaining 1.4% being Ca, which is attributable to calcium oxalate (CaC<sub>2</sub>O<sub>4</sub>) crystals present in considerable quantities along the surface of the plant cell wall, as reported in a previous paper (Perez-Pimienta et al., 2015). In contrast, in the DA-treated AGB the available Ca was removed during the process at these conditions (1.58% acid concentration, 52.5 min, and 16.5% solids loading). This Ca removal did not occur in the AL-treated sample, where a small amount of Na (1.2%) was found, possibly because some of the alkali was converted to irrecoverable salts and/or incorporated into the biomass.

**TABLE 5 | Decomposition** *T***<sub>d</sub> temperatures for untreated and pretreated AGB**.

Confocal fluorescence microscopy was used to investigate the surface morphologies of untreated and pretreated AGB (run 16 from AL and DA pretreatment), as presented in **Figures 7A–F**. Compared to the untreated AGB, only the DA-pretreated sample shows a significant reduction in the fluorescence signal intensity in cell walls (lignin is represented by a green signal and cellulose by a blue signal), while the AL-treated sample shows only a slight reduction.


**FIGURE 7 | Confocal fluorescence images of AGB samples: (A,D) untreated, (B,E) alkaline pretreated, and (C,F) dilute acid pretreated**.

## **Effect of Pretreatment on Biomass Porosimetry**

Pretreatment can affect cellulose accessibility and is often accompanied by a change in surface area. Surface area, pore volume, and average pore diameter were measured using the Brunauer–Emmett–Teller (BET) method by argon adsorption, which relates gas pressure to the volume of gas adsorbed, although these values might not be directly associated with enzyme accessibility because of the size difference between argon molecules and enzymes (Li et al., 2013). **Table 7** summarizes the surface area, pore volume, and average pore diameter of untreated AGB and of the run 16 samples (one of the CCD points) from both AL- and DA-pretreated AGB. Compared to the untreated sample, the pretreated samples show a noticeable increase in surface area, from 0.6 up to 1.1 m<sup>2</sup>/g. This is consistent with the changes in the SEM images upon AL and DA pretreatment described above. However, the pore volume of all samples (untreated and pretreated) shows a negligible difference, remaining close to 0.0008 cm<sup>3</sup>/g, whereas the average pore diameter is reduced in the pretreated samples.

**TABLE 7 | Comparison of porosimetry parameters in untreated and pretreated AGB**.

## **Conclusion**

The effects of catalyst concentration, retention time, and solids loading on the TRS yield of AL and DA pretreatment of AGB were investigated. This study demonstrated that AGB is a promising biofuel feedstock that can achieve high sugar yields using both DA and AL pretreatment. For both pretreatments, a model was generated with a high correlation to the actual TRS data. Furthermore, the results indicate that TRS yield was enhanced by catalyst concentration and solids loading, but not by longer retention times. Both pretreatments increase porosity and surface area, but AL pretreatment achieved a lower decomposition temperature. RSM was also used to optimize the pretreatment conditions for maximum TRS yield. The optimum conditions were 1.87% NaOH, 50.3 min, and 13.1% solids loading for AL pretreatment, and 2.1% acid, 33.8 min, and 8.5% solids loading for DA pretreatment. Finally, fuel synthesis studies should be performed on the sugars obtained under the best conditions for both pretreatments in order to obtain meaningful data for a scale-up process.

## **Acknowledgments**

The authors gratefully thank CONACYT through the project 229711 and internal funding from Universidad Autónoma de Nayarit (Autonomous University of Nayarit).

## **Supplementary Material**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00146


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Ávila-Lara, Camberos-Flores, Mendoza-Pérez, Messina-Fernández, Saldaña-Duran, Jimenez-Ruiz, Sánchez-Herrera and Pérez-Pimienta. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Evaluating lignocellulosic biomass, its derivatives, and downstream products with Raman spectroscopy

### **Jason S. Lupoi 1,2, Erica Gjersing1,2 and Mark F. Davis 1,2\***

<sup>1</sup> Oak Ridge National Laboratory, BioEnergy Science Center, Oak Ridge, TN, USA

<sup>2</sup> National Renewable Energy Laboratory, National Bioenergy Center, Golden, CO, USA

### **Edited by:**

P. C. Abhilash, Banaras Hindu University, India

### **Reviewed by:**

Zongbao K. Zhao, Dalian Institute of Chemical Physics, China Yunqiao Pu, Georgia Institute of Technology, USA

### **\*Correspondence:**

Mark F. Davis, National Renewable Energy Laboratory, BioEnergy Science Center, 15013 Denver West Parkway, Golden, CO 80401-3305, USA e-mail: mark.davis@nrel.gov

The creation of fuels, chemicals, and materials from plants can aid in replacing products fabricated from non-renewable energy sources. Before using biomass in downstream applications, it must be characterized to assess chemical traits, such as cellulose, lignin, or lignin monomer content, or the sugars released following an acid or enzymatic hydrolysis. The measurement of these traits allows researchers to gage the recalcitrance of the plants and develop efficient deconstruction strategies to maximize yields. Standard methods for assessing biomass phenotypes often have experimental protocols that limit their use for screening sizeable numbers of plant species. Raman spectroscopy, a non-destructive, noninvasive vibrational spectroscopy technique, is capable of providing qualitative, structural information and quantitative measurements. Applications of Raman spectroscopy have aided in alleviating the constraints of standard methods by coupling spectral data with multivariate analysis to construct models capable of predicting analytes. Hydrolysis and fermentation products, such as glucose and ethanol, can be quantified off-, at-, or on-line. Raman imaging has enabled researchers to develop a visual understanding of reactions, such as different pretreatment strategies, in real-time, while also providing integral chemical information. This review provides an overview of what Raman spectroscopy is, and how it has been applied to the analysis of whole lignocellulosic biomass, its derivatives, and downstream process monitoring.

**Keywords: Raman spectroscopy, high-throughput, lignin, glucose, xylose, process monitoring, cellulose, ethanol**

### **INTRODUCTION**

The production of fuels, chemicals, and materials from plants has offered an opportunity to supplant usage of products fashioned from non-renewable energy sources. Lignocellulosic biomass is predominantly composed of cellulose, non-cellulosic polysaccharides (NCPs), and lignin, and provides a useful starting feedstock for industrial processes. Before a specific plant can be considered for downstream applications, the chemical traits of the biomass must be characterized. These assessments include, but are not limited to, the compositional analysis of the plant's cellulose, NCP, and lignin contents, the ratio of syringyl (S), guaiacyl (G), and *p*-hydroxyphenyl (H) lignin monomers, the release of simple sugars following an acid or enzymatic hydrolysis, and the cellulose crystallinity index. Many of these evaluations gage the recalcitrance of the plant cell wall, and enable researchers to develop appropriate pretreatment strategies to deconstruct the biomass (Blanch et al., 2011), or genetic strategies to synthesize a more ideal starting feedstock (Furtado et al., 2014). The standard methods developed for biomass characterization are beneficial for evaluating small sample sets, but specific experimental attributes limit their use for screening large arrays of prospective plants to isolate those possessing quintessential traits for biofuel and/or biomaterial production. These attributes include laborious sample preparation protocols [derivatization of samples in gas chromatography (GC) analysis and sample clean-up for liquid chromatography or GC], use of toxic reagents that may require remediation (acetyl bromide, boron trifluoride etherate, trifluoroacetic acid, sulfuric acid), long analysis times [chromatography, nuclear magnetic resonance (NMR)], complex data analysis [pyrolysis GC/mass spectrometry (MS) analysis of lignin monomer content], and/or the destruction of the sample (pyrolysis, GC, solution state NMR). In order to circumvent some of these limitations, techniques have been developed that are non-destructive, require little to no sample preparation, and have increased throughput, allowing more plants to be assessed in less time and with reduced experimental costs (Lupoi et al., 2014b).

The phenomenon of Raman scatter was first envisaged theoretically in 1921 by Smekal, and was proven experimentally in 1928 by Raman and Krishnan, as well as Landsberg and Mandelstam (Smekal, 1923; Landsberg and Mandelstam, 1928; Raman and Krishnan, 1928). Raman spectroscopy is a vibrational spectroscopy technique in which the scattered photons, generated during the interaction between light and matter, are measured. While the light source C. V. Raman used was sunlight, modern applications of Raman spectroscopy employ ultraviolet (UV), visible, or near-infrared (NIR) lasers. The scattering produced can have an identical (elastic), lower (inelastic), or higher (inelastic) frequency than that of the excitation source [**Figure 1**; Lupoi (2012)]. These types of scattering are named Rayleigh, Stokes, and anti-Stokes, respectively (Carey, 1982; McCreery, 2000; Smith and Dent, 2005; Popp, 2006). Rayleigh scattering is the most intense, and needs to be thoroughly removed from the optical beam path using specialized optics such as holographic notch filters (HNFs) (Smith and Dent, 2005; Dao, 2006). If not eliminated, Rayleigh scattering can lead to saturation of the detector, and can obscure the Raman signal from Stokes scattering, a much weaker phenomenon, as only approximately one in one million photons generated leads to this type of inelastic scattering (Smith and Dent, 2005). Stokes scattering is the most common type measured using Raman spectroscopy, and results in an energy shift to higher vibrational levels. Anti-Stokes scattering results in a shift from a higher to a lower vibrational level, and is less common due to the lower probability of molecules populating higher vibrational levels at ambient conditions. An important feature of the Raman phenomenon is that, unlike in infrared (IR) spectroscopy, molecules are promoted to short-lived, virtual vibrational levels (**Figure 1**). Therefore, matching the excitation frequency to that necessary to promote molecules from the ground state to the first excited vibrational level is not requisite.
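The relationship between excitation wavelength, Raman shift, and the wavelength of the scattered light can be sketched numerically. The Python below is a minimal illustration, not taken from the review: the function names are hypothetical, the 1064 nm excitation and 1600 cm<sup>−1</sup> shift are chosen only as examples, and the population estimate uses a simple two-level Boltzmann factor that ignores degeneracy.

```python
import math

# Second radiation constant hc/k, expressed in cm*K.
HC_OVER_K_CM_K = 1.4388


def stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength of Stokes-scattered light (lower frequency than the laser)."""
    return 1e7 / (1e7 / excitation_nm - shift_cm1)


def anti_stokes_wavelength_nm(excitation_nm, shift_cm1):
    """Wavelength of anti-Stokes-scattered light (higher frequency than the laser)."""
    return 1e7 / (1e7 / excitation_nm + shift_cm1)


def excited_state_fraction(shift_cm1, temperature_k=298.15):
    """Boltzmann ratio N(v=1)/N(v=0) for a mode at the given wavenumber."""
    return math.exp(-HC_OVER_K_CM_K * shift_cm1 / temperature_k)


stokes_nm = stokes_wavelength_nm(1064.0, 1600.0)
anti_stokes_nm = anti_stokes_wavelength_nm(1064.0, 1600.0)
fraction = excited_state_fraction(1600.0)

print(f"Stokes line:      {stokes_nm:.1f} nm")
print(f"Anti-Stokes line: {anti_stokes_nm:.1f} nm")
print(f"N(v=1)/N(v=0) at room temperature: {fraction:.1e}")
```

The very small excited-state population at room temperature is the reason anti-Stokes lines are much weaker than Stokes lines for modes in this region.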

A molecule is considered "Raman active" if there is a change in the polarizability of the electron cloud during the interaction of the molecule with light. Vibrational modes including C–C, C=C, C–H, C–O, H–C–C, C–O–H, H–C–H, etc., can be expected in an archetypal Raman spectrum (Wiley and Atalla, 1987; Agarwal and Ralph, 1997; Larsen and Barsberg, 2010; Agarwal et al., 2011). As a rule of thumb, symmetric bonds will have the largest changes in polarizability and therefore the strongest Raman signals. **Table 1** lists representative vibrational modes measured in biomass constituents, and their respective band assignments. In contrast to Raman, a change in dipole moment leads to molecules being "IR active" in IR spectroscopy. Therefore, asymmetric bonds have strong peaks in IR spectra. This difference in selection rules signifies that these two techniques provide complementary information. Vibrational modes that are Raman active will not be present or will have small contributions in IR spectra, and vice versa. If a molecule has a center of symmetry, the principle of mutual exclusion states that a given vibrational mode will be either IR *or* Raman active, but not both. Some non-centrosymmetric molecules, such as those possessing C<sub>1</sub> symmetry, and hence no symmetry, can have both IR and Raman active vibrational modes (Ingle and Crouch, 1988). Examples of these types of molecules include isopropyl alcohol, propylene glycol, and 2-butanol (National Institute of Standards and Technology (NIST), 2013). The diatomic nitric oxide is another molecule that, although it produces only one peak, gives rise to IR and Raman active modes, since there is both a change in dipole and polarizability (Smith and Dent, 2005). Another significant difference between the two techniques is the ability of Raman spectroscopy to be used for measuring aqueous and biological samples, whereas IR spectra are appreciably hindered by the presence of water. Lastly, Raman spectra are often less complex than IR spectra due to the diminished signals of overtone and combination vibrational modes, leading to more spectrally resolved peaks.

### **Table 1 | Vibrational modes and band assignments measured in lignocellulosic biomass**.

NCSPs, non-cellulosic structural polysaccharides.

Typical Raman instruments are composed of an excitation source (i.e., a laser) and beam-steering optics that focus the incident light onto the sample, collect the generated scattering, and guide the scattering to the entrance slit of a spectrometer (McCreery, 2000; Dao, 2006; Meyer et al., 2011). Holographic notch filters (HNFs) block Rayleigh scattering from entering the spectrometer. Dispersive instruments utilize gratings to diffract the scattering. The diffracted light is projected through the exit slit and onto a detector. Various detectors, such as the charge-coupled device or the indium gallium arsenide NIR detector, convert light into an electronic response, producing the Raman spectrum. The selection of the excitation wavelength is critical to obtaining quality Raman spectra, as the Raman intensity is directly proportional to the fourth power of the incident frequency. Therefore, when employing UV or visible lasers, the Raman signal will, in theory, be more intense. This is not often achieved in practice when using visible excitation to measure biomass, as the intrinsic fluorescence generated from plants can significantly conceal the Raman signal. Higher energy UV lasers require some precautions to be taken, such as the prevention of sample degradation, which can be achieved by equipping the instrument with a rotating stage, and the enactment of adequate safety strategies. Researchers have developed various techniques to combat the difficulty in obtaining Raman spectra from highly fluorescent molecules like lignin. These methods have typically included the employment of NIR excitation sources, such as 1064 nm neodymium-doped yttrium orthovanadate or neodymium-doped yttrium aluminum garnet lasers (Agarwal et al., 2011; Meyer et al., 2011; Lupoi and Smith, 2012). NIR lasers, having the longest wavelengths, lead to diminished spectral intensities. Conversely, since fluorescence emission maxima occur at shorter wavelengths, the employment of NIR excitation can result in significantly reduced spectral background. As an example, the use of a 785 nm laser, compared with a 1064 nm laser, will produce 3.8-times more scattering (Meyer et al., 2011). The analysis of a lignin sample using both excitations, however, revealed a background that was 160-times higher when employing the higher frequency 785 nm laser (Meyer et al., 2011). Most of these applications utilizing NIR lasers have been Fourier-transform Raman (FT-Raman) spectroscopy experiments. However, instrumental advances, such as better detectors for NIR wavelengths, have enabled NIR dispersive Raman spectroscopy to provide a lower cost alternative to FT-Raman systems (Chase and Talmi, 1991; Lewis et al., 1993; Barbillat and Da Silva, 1997). Other instrumental methods like coherent anti-Stokes Raman scattering (CARS) and stimulated Raman scattering (SRS) spectroscopies have also provided fluorescence-free Raman spectra (Saar et al., 2010; Zeng et al., 2012; Pohling et al., 2014).
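The wavelength trade-off described above can be checked with a short calculation. The following is a minimal sketch that assumes only the fourth-power frequency dependence stated earlier; the function name and the 1000 cm−<sup>1</sup> example band are illustrative choices, not values taken from Meyer et al. (2011).

```python
def scattering_gain(short_nm, long_nm, raman_shift_cm1=0.0):
    """Relative Raman intensity of the shorter-wavelength laser versus the longer one.

    Assumes intensity scales with the fourth power of the frequency
    (in wavenumbers). With raman_shift_cm1 > 0 the Stokes-shifted
    frequency is used instead of the incident frequency.
    """
    nu_short = 1e7 / short_nm - raman_shift_cm1  # nm converted to cm^-1
    nu_long = 1e7 / long_nm - raman_shift_cm1
    return (nu_short / nu_long) ** 4


# Ratio based on the incident laser frequencies alone.
incident_only = scattering_gain(785, 1064)

# Ratio for a hypothetical band at a 1000 cm^-1 Stokes shift.
stokes_1000 = scattering_gain(785, 1064, raman_shift_cm1=1000.0)

print(round(incident_only, 2), round(stokes_1000, 2))
```

Under these assumptions the incident-frequency ratio is about 3.4, and the ratio for a band shifted by 1000 cm−<sup>1</sup> is about 3.8. The latter is close to the figure quoted above, although the source does not state how that figure was derived.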

Due to the complex composition of biomass, Raman spectra should be prudently interpreted. There can be significant spectral overlap between vibrational modes, which complicates the routine assignment of peaks. Cellulose and hemicellulose are structurally similar, and therefore, exhibit comparable Raman spectra. Subtle differences do exist, however, and quantitation may require the use of minor, rather than the most intense, peaks (Shih et al., 2011). Hemicelluloses, due to their disorder and complexity, typically result in broader Raman bands than cellulose (Gierlinger and Schwanninger, 2006). Raman vibrational modes of cellulose are strongly affected by crystallinity and fiber orientation, enabling studies of cellulose polymorphs (Schenzel and Fischer, 2001). The dominant lignin vibrational mode near 1600 cm−<sup>1</sup> is assigned to ring breathing, and therefore, can include contributions from any phenyl-containing molecules, like flavonoids. If a biomass sample has a high extractable content, e.g., herbaceous feedstocks, the 1600 cm−<sup>1</sup> peak will include contributions from lignin and other extractable molecules (Lupoi and Smith, 2012). Studies on lignin, therefore, require the efficient removal of extraneous species. Additionally, the 1600 cm−<sup>1</sup> lignin peak contains overlapping signals from S, G, and H lignin monomers, complicating quantitative or semi-quantitative analyses between different biomass species (Lupoi and Smith, 2012). If the ratio of the monomers is known and does not significantly change between samples, and the samples have been exhaustively extracted, the 1600 cm−<sup>1</sup> mode may be useful for evaluating relative lignin contents within feedstocks.

### **DISPERSIVE RAMAN SPECTROSCOPY**

As previously mentioned, NIR dispersive Raman spectroscopy can provide a suitable, less costly alternative to FT-Raman spectroscopy. Despite this, there are relatively few instances of researchers using this instrumental configuration (Roder and Sixta, 2005; Shih and Smith, 2009; Li et al., 2010, 2011, 2013; Meyer et al., 2011; Shih et al., 2011; Zakzeski et al., 2011; Lupoi and Smith, 2012; Ewanick et al., 2013; Gray et al., 2013; Azimvand, 2014; Iversen et al., 2014). A comparison between 785 and 1064 nm excitation sources revealed the latter to provide better signal-to-noise (S/N) when measuring hydrolytic lignin using home-built Raman spectrometers (**Figure 2**) (Meyer et al., 2011). The spectrum generated using the 785 nm laser exhibited a broad, featureless fluorescence background (**Figure 3**). The fluorescence emission peak maximum is expected to be in the visible region of the electromagnetic spectrum. When excited with the 785 nm light, however, a low intensity peak was detected that resembled the background measured in the Raman spectrum. Although the intensities of the peaks generated using the 1064 nm laser were relatively weak, the fluorescence was virtually eliminated (**Figure 3**). This instrumental configuration also provided higher S/N when compared to a commercial FT-Raman spectrometer using acquisition times greater than 15 s. The same system was used to develop a principal component regression (PCR) model to predict the S and G lignin content of a diverse assortment of feedstocks, including *Miscanthus*, switchgrass, poplar, and pine (Lupoi and Smith, 2012). The model was constructed from Raman spectral data combined with thioacidolysis/GCMS S and G lignin percentages.

**FIGURE 2 | Instrumental schematic of a 1064 nm dispersive multichannel Raman spectrometer**. The 1064 nm laser is focused onto a sample using a plano-convex lens (L1). The Raman scatter is collected with another plano-convex lens (L2) and focused onto the entrance slit of the spectrometer with a third plano-convex lens (L3). A holographic notch filter (HNF) is used to remove Rayleigh scattering. The spectrometer is equipped with a 1024-multichannel InGaAs detector. The helium–neon laser is oriented co-linearly with the 1064 nm laser, using a dichroic mirror, to facilitate instrumental alignment [reprinted with permission from Elsevier, Meyer et al. (2011)].

The quantitation of glucose, xylose, and ethanol in complex matrices illustrated other novel applications of NIR, dispersive Raman spectroscopy (Shih and Smith, 2009; Shih et al., 2011). The Raman results were compared with those obtained using UV/visible (UV/VIS) spectrophotometry and headspace-GCMS. The UV/VIS methods required longer sample preparation and incubation steps. The GCMS analysis required the samples to be preheated to promote ethanol into the headspace, and had an experimental run time over 10 min per sample. The Raman measurements required essentially no sample preparation, and the spectral data were obtained using a 200 s acquisition time for glucose and xylose, and 100 s for ethanol. Another interesting feature of this work was the demonstration of the ability to simultaneously quantify glucose and xylose in hydrolyzate liquor using a multi-peak curve fit, with detection limits of 3 ± 2 and 1 ± 1 mg mL−<sup>1</sup> for glucose and xylose, respectively (**Figure 4**). The authors also evaluated the effects of various biomass pretreatment strategies on the ability to measure glucose. Soaking the biomass in aqueous ammonia or extracting using an aqueous ethanol solution resulted in lower detection limits. An acid pretreatment did not lower the detection limit, indicating that it was likely lignin and/or extractives like non-lignin phenolics that caused the higher spectral background, and thus elevated detection limits. These results clearly demonstrate the competence of Raman spectroscopy to measure hydrolysis and fermentation products rapidly and accurately.
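The idea behind resolving two overlapping sugar bands can be illustrated with a simplified sketch. The source does not describe the fitting procedure, so the code below swaps in a linear least-squares fit of two Gaussian bands with fixed positions and widths; a full multi-peak curve fit would normally also refine positions, widths, and a baseline. The band centers and widths are hypothetical placeholders, not the glucose and xylose assignments used by Shih et al. (2011).

```python
import math


def gaussian(x, center, width):
    """Unit-height Gaussian band shape."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def fit_two_bands(shifts, spectrum, band_a, band_b):
    """Least-squares amplitudes of two fixed-shape bands.

    band_a and band_b are (center, width) tuples. The 2x2 normal
    equations are solved in closed form.
    """
    ga = [gaussian(x, *band_a) for x in shifts]
    gb = [gaussian(x, *band_b) for x in shifts]
    saa = sum(a * a for a in ga)
    sbb = sum(b * b for b in gb)
    sab = sum(a * b for a, b in zip(ga, gb))
    say = sum(a * y for a, y in zip(ga, spectrum))
    sby = sum(b * y for b, y in zip(gb, spectrum))
    det = saa * sbb - sab * sab
    amp_a = (say * sbb - sby * sab) / det
    amp_b = (sby * saa - say * sab) / det
    return amp_a, amp_b


# Hypothetical overlapping bands (center, width) in cm^-1.
BAND_A = (1120.0, 15.0)
BAND_B = (1090.0, 15.0)

# Synthetic, noise-free spectrum built from known amplitudes.
shifts = [1000.0 + i for i in range(200)]
spectrum = [2.0 * gaussian(x, *BAND_A) + 0.5 * gaussian(x, *BAND_B) for x in shifts]

amp_a, amp_b = fit_two_bands(shifts, spectrum, BAND_A, BAND_B)
```

The recovered amplitudes would then be converted to concentrations through separate calibration curves for each analyte.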

**FIGURE 4 |** [reprinted with permission from Elsevier, Shih et al. (2011)].

In addition to evaluating samples after the reaction has concluded (off-line), Raman spectroscopy can provide a valuable on-line, process monitoring tool, such as during the fermentation of glucose to ethanol. A fiber optic probe can be inserted directly into the reaction slurry. When glucose solutions were used as the starting feedstock, a partial least squares (PLS) model that related the Raman spectral data to standard HPLC ethanol and glucose measurements revealed correlation coefficients (*R* <sup>2</sup>) of 0.984 for ethanol and 0.92 for glucose (**Figure 5**) and good root mean square errors of cross validation (RMSECV = 0.41, ethanol; 0.53, glucose) given the concentration range evaluated (Ewanick et al., 2013). When switchgrass hydrolyzate liquor was used as the fermentation feedstock, the measurement of glucose was significantly hindered. The hampered ability to quantify glucose resulted from its low concentration as well as the complex, heterogeneous nature of the hydrolyzate, which likely contained lignin-derived phenolics. The ability to measure the spectra of and use PLS to predict the concentration of ethanol was not impeded (*R* <sup>2</sup> = 0.935, RMSECV = 0.60). A Raman spectrometer equipped with a 993 nm laser and a fiber optic probe enabled the real-time study of the formation of a complex assortment of products generated during a simultaneous saccharification and fermentation reaction (Gray et al., 2013). A simple univariate calibration using the 883 cm−<sup>1</sup> vibrational mode allowed the quantitation of ethanol. The calibration was validated using a separate set of fermentation samples, and exhibited an *R* <sup>2</sup> of 0.996 and a standard error of prediction (SEP) of 0.604. Multivariate PLS calibration models were generated for total starch, dextrins, maltotriose, maltose, glucose, and ethanol using HPLC standard measurements. The percentage error (defined as the SEP/modeling concentration range) was quite low for ethanol (2.1%), starch (2.5%), and dextrin (4.7%) when the calibration sets were broken up into low and high concentration series. The error was approximately two to seven times higher when only one calibration set was employed for these analytes. The percentage errors of glucose, maltose, and maltotriose were 12% or higher. On-line fermentation monitoring has been further illustrated using a similar instrumental configuration for the estimation of ethanol, glucose, and yeast concentrations (Iversen et al., 2014). Increasing concentrations of yeast were found to decrease the intensities of ethanol and glucose peaks because of Mie scattering from the cells. The attenuation of the Raman signal was corrected using the 1627 cm−<sup>1</sup> water band as an internal standard to adjust for the scattering from cell particulates. Once the spectra were corrected using the developed quadratic equations for each analyte, a simple linear regression allowed the quantitation of glucose and ethanol with high correlation (*R* <sup>2</sup> = 0.99, ethanol; 0.995, glucose). This method also enabled the estimation of the yeast concentration.
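The univariate calibration and percentage-error metric described above for on-line fermentation monitoring can be sketched as follows. This is a minimal illustration on synthetic numbers: the data values are invented, the function names are hypothetical, and the bias-corrected form of the SEP is one common chemometric definition that is assumed here because the source does not give a formula. Only the percentage error (SEP divided by the modeling concentration range) follows a definition given in the text.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def standard_error_of_prediction(predicted, reference):
    """Bias-corrected standard deviation of the prediction residuals (assumed definition)."""
    residuals = [p - r for p, r in zip(predicted, reference)]
    bias = sum(residuals) / len(residuals)
    squared = sum((e - bias) ** 2 for e in residuals)
    return math.sqrt(squared / (len(residuals) - 1))


# Synthetic calibration set: band intensity versus reference concentration.
cal_intensity = [1.0, 2.0, 3.0, 4.0, 5.0]
cal_conc = [10.0, 20.0, 30.0, 40.0, 50.0]
slope, intercept = fit_line(cal_intensity, cal_conc)

# Separate synthetic validation set with reference (e.g., HPLC-like) values.
val_intensity = [1.5, 2.5, 3.5]
val_reference = [15.5, 24.5, 35.5]
val_predicted = [slope * i + intercept for i in val_intensity]

sep = standard_error_of_prediction(val_predicted, val_reference)

# Percentage error as defined in the text: SEP / modeling concentration range.
percent_error = 100.0 * sep / (max(cal_conc) - min(cal_conc))
```

Expressing the SEP relative to the calibration range makes errors comparable between analytes that span very different concentrations.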

As previously discussed, visible Raman excitation sources are not commonly employed, due to the intrinsic fluorescence of biomass; however, there have been some recent applications of visible Raman spectroscopy. A frequency-doubled Nd:YAG laser (1064 nm fundamental; 532 nm green output used for analysis) was employed in an interesting study of laser-induced fluorescence (LIF) (Lähdetie et al., 2013). A variety of model compounds representing typical lignin sub-structures were evaluated, including erol, bioerol, dibenzodioxocin, 4-*O*-methylated bioerol, two synthesized phenolic compounds, and dehydrodivanillin-5-5′. Erol and dibenzodioxocin were easily measured with a relatively flat baseline. Bierol and 4-*O*-methylated bioerol revealed broad fluorescence backgrounds containing essentially no Raman bands. The synthesized molecules could be measured with only moderate spectral contributions from fluorescence; however, measurement of dehydrodivanillin-5-5′ resulted in the suppression of Raman modes by a fluorescence background. The authors conclude that the 5-5′ linkage is likely a strong source of LIF. Molecules that did not possess a conjugated link between two phenolic moieties did not exhibit fluorescence in the Raman spectra. Dibenzodioxocin, although it possesses the 5-5′ linkage, did not display a strong fluorescent background, which the authors deduce likely stems from the molecule's rigid octagonal ring. Raman spectra from spruce wood and thermomechanical pulp (TMP), using the 532 nm laser, showed fluorescent backgrounds; however, the characteristic cellulose and lignin peaks were clearly discernible. When chemically treated pulps [kraft, enzymatic mild acidolysis lignin (EMAL), and milled wood lignin (MWL)] were analyzed, however, LIF was more pronounced. While EMAL and MWL isolation procedures are considered to be mild, retaining the native lignin structure, the spectra of these samples were distinctly different from those of lignin in wood. The authors hypothesized that the lack of a strong enough LIF background to prevent analysis of spruce likely arises from lignin being bound to the polymer matrix, preventing a release of fluorescence emission. Since EMAL and MWL are no longer connected to the polymeric network, a more malleable conformation results, which could trigger the increased fluorescence background. Other analyses using visible laser sources for Raman spectroscopy of biomass include the analysis of carbonaceous plant materials like bio-char (Ochoa et al., 2014; Tsaneva et al., 2014), and how changes in the cellulose crystallinity of delignified hybrid poplar samples affected the enzymatic hydrolysis yields (Laureano-Perez et al., 2006).

### **FOURIER-TRANSFORM RAMAN SPECTROSCOPY**

Fourier-transform Raman spectroscopy has been the most commonly used instrumental configuration for the analysis of biomass (Agarwal and Atalla, 1993; Sene et al., 1994; Agarwal and Ralph, 1997; Ona et al., 1997; Takayama et al., 1997; Kacurikova et al., 1998; Ona et al., 1998a; Ona et al., 1998b,c; Ona et al., 2000; Schenzel and Fischer, 2001; Sivakesava et al., 2001a; Sivakesava et al., 2001b; Kihara et al., 2002; Proniewicz et al., 2002; Agarwal et al., 2003; Ona et al., 2003; Cao et al., 2004; Vester et al., 2004; Agarwal and Kawai, 2005; Schenzel et al., 2005; Keown et al., 2007; Schulz and Baranska, 2007; Agarwal and Ralph, 2008; Keown et al., 2008; Schenzel et al., 2009; Agarwal and Atalla, 2010; Agarwal et al., 2010; Larsen and Barsberg, 2010; Agarwal, 2011; Agarwal et al., 2011; Chundawat et al., 2011; Larsen and Barsberg, 2011; Sun et al., 2012; Agarwal et al., 2013; Kim et al., 2013; Lupoi et al., 2014a; Wójciak et al., 2014; Lupoi et al., 2015). A recent study surveyed three high-throughput vibrational spectrometers (NIR, FTIR, and FT-Raman) to evaluate which was best suited for developing PLS models for predicting lignin S/G ratios (Lupoi et al., 2014a). Pyrolysis-molecular beam MS (pyMBMS) data from 245 diverse *Acacia* and eucalypt (*Eucalyptus* and *Corymbia*) samples, encompassing 17 different biomass species, were coupled with NIR, FTIR, and Raman spectral data to build one global model. Iterations of different spectral processing techniques were conducted to see which permitted the most robust, accurate PLS model(s). The 245 samples were split into randomly generated 195-sample calibration and 50-sample validation sets. Additionally, the metrics used for evaluating each model were the result of three independent, randomized models for each type of spectral transformation. The low error in the calibration and validation statistics indicated that these models were highly robust, as in most cases, it did not matter which samples were in the calibration or validation sets since every combination employed led to similar metrics. The best models (**Table 2**), based on RMSEP, were constructed using first-derivative, seven-point smoothed Raman spectra with an extended multiplicative scatter correction (EMSC) (RMSEP = 0.13) and FTIR spectra that had been transformed using EMSC first, and then the second derivative with 15-point smoothing (RMSEP = 0.13). In a follow-up study, the best Raman model was used to predict the lignin S/G ratio of 269 unknown *Acacia* and eucalypt samples (Lupoi et al., 2015). The calibration and validation sets used to generate the model were recombined to provide a larger data set, enabling more accurate predictions. The Raman predicted S/G ratios displayed no statistical differences from the pyMBMS measured results for all but one of the biomass species (**Table 3**). Additionally, the plant samples were ranked to illustrate which had the lowest and highest S/G ratios.
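The derivative preprocessing mentioned above can be sketched with a seven-point Savitzky–Golay first-derivative filter. This is an illustration only: it assumes a quadratic polynomial (the standard seven-point coefficients for that case are used), evenly spaced data, and no edge handling, and it does not reproduce the EMSC step or the exact settings of Lupoi et al. (2014a). The function name is hypothetical.

```python
# Standard Savitzky-Golay coefficients for a 7-point window,
# quadratic polynomial, first derivative (normalization factor 28).
SG7_FIRST_DERIVATIVE = (-3, -2, -1, 0, 1, 2, 3)
SG7_NORM = 28.0


def sg7_first_derivative(intensities, spacing=1.0):
    """Smoothed first derivative for the interior points of an evenly spaced spectrum.

    The three points at each edge are dropped, so the result is six
    elements shorter than the input.
    """
    half = 3
    derivative = []
    for center in range(half, len(intensities) - half):
        window = intensities[center - half:center + half + 1]
        weighted = sum(c * v for c, v in zip(SG7_FIRST_DERIVATIVE, window))
        derivative.append(weighted / (SG7_NORM * spacing))
    return derivative


# A quadratic test signal: the filter should return its exact slope, 2x.
spectrum = [float(x * x) for x in range(10)]
derivative = sg7_first_derivative(spectrum)
```

Taking a derivative removes constant baseline offsets, which is one reason it is commonly applied before building PLS models from Raman spectra with residual fluorescence background.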

Lignin S/G ratios of Eucalyptus, sorghum, switchgrass, maize, and *Arabidopsis* were evaluated using the deconvolution of FT-Raman spectra into peaks identified as representative of S or G lignin monomers (Sun et al., 2012). The specific vibrational modes unique to the different biomass constituents were determined through the measurement of cellulose, xylan, and various model compounds, such as coniferaldehyde, sinapic acid, creosol, 5-methylpyrogallol trimethyl ether, sinapinaldehyde, and sinapyl alcohol. Spectrally resolved peaks corresponding to S or G lignin derivatives were then applied to the biomass samples. The ratios of the resolved S and G peaks were determined and compared to pyGCMS results. The ratios calculated using Raman spectroscopy were consistently higher than those measured using pyGCMS, which could be due to the presence of polysaccharide vibrational modes overlapping with spectral regions designated for each monomer. The deconvolution process itself also contributed to some false peaks such as an artificial S band for pine, a plant known to contain no real S components. Nonetheless, a calibration curve generated using the pyGCMS and Raman calculated ratios resulted in a reasonable correlation (*R* <sup>2</sup> = 0.983). *Arabidopsis* mutants were used to validate the regression model, resulting in a better correlation with the pyGCMS S/G ratios.

When analyzing lignocellulosic materials with Raman spectroscopy, a phenomenon termed "self-absorption" must be considered (Agarwal and Kawai, 2005). Self-absorption occurs when scattered photons are re-absorbed by the analyte, resulting in an attenuation of the scattered light reaching the detector. This can be visually identified in a Raman spectrum by the decrease in intensity of a vibrational mode where the molecule absorbs light. An analysis of cellulose filter paper, spruce TMP, and MWL illustrated that most of the spectral suppression occurred at the 2895 cm−<sup>1</sup> C–H peak of the filter paper and TMP (Agarwal and Kawai, 2005). Evaluation of the spectra pointed to cellulose and water as the main contributors of self-absorption, while lignin's involvement was unmeasured. FT-Raman spectroscopy enabled the analysis of the structure of MWLs produced from hard- and softwoods and chemically treated black spruce (Agarwal et al., 2011). The Raman spectra revealed distinct differences between the untreated and chemically treated samples. Acetylation and methylation produced sizeable changes in aliphatic C–H vibrational modes, and also resulted in the formation of several new peaks.

**Table 2 | Comparison of partial least squares models using vibrational spectroscopy and pyrolysis-molecular beam mass spectrometry [reprinted from Lupoi et al. (2014a)]**.

<sup>a</sup>Standard error of the laboratory for the validation data.

<sup>b</sup>Standard error of prediction.

<sup>c</sup>Root mean square error of prediction.

<sup>d</sup>Correlation coefficient for the validation set.

<sup>e</sup>Pearson coefficient of determination for validation.

<sup>f</sup>Number of outliers removed from validation models.

<sup>g</sup>Average errors of three randomly generated models using data provided. Models were not statistically different.

The numbers listed parenthetically reflect the degree of Savitzky–Golay spectral smoothing.

Statistical values are the average of three independent models.

The viability of FT-Raman spectroscopy for monitoring a bioethanol process has also been explored (Sivakesava et al., 2001b). Glucose, ethanol, and optical cell density were evaluated during ethanol fermentation. Raman spectra were coupled with HPLC results for the construction of PLS and PCR models. Although the predictions of glucose and ethanol were acceptable, the cell density modeling proved to be less accurate due to the weak scattering generated from the cultures. Another study analyzed glucose, lactic acid, and cell density, at-line, during a lactic acid fermentation process (Sivakesava et al., 2001a). PLS models generated using IR, NIR, and Raman spectral data were contrasted, with the Raman models having the second lowest SEP in glucose prediction. The Raman SEP of lactic acid and cell density predictions ranked third among the three instruments. The authors attribute this lack of accuracy to the fact that glucose, lactic acid, and proteins have weaker Raman signals compared to IR spectroscopy.

### **RESONANCE RAMAN SPECTROSCOPY**

Resonance Raman (RR) spectroscopy is achieved when a molecule has an electronic absorption that overlaps with the excitation source wavelength, resulting in the promotion of the molecule to a real, rather than a virtual, electronic state. In complex analytes such as biomass, molecules resonating with the excitation source will be selectively enhanced. For example, lignin has an electronic absorption in the UV region, leading to increased lignin spectral intensities when UV lasers are employed. This resonance allows lignin to be preferentially studied while polysaccharides generate limited spectral response. An advantage of evaluating lignin with UV resonance Raman (UVRR) spectroscopy, as previously discussed, is that the lignin can be measured *in situ*. This allows a more pragmatic analysis of lignin structure, since the techniques commonly employed to extract or isolate lignin from plants are known to alter the lignin. Another benefit to using UV lasers is that fluorescence, ubiquitous in visible Raman spectroscopy, and still a hindrance at some shorter NIR wavelengths, is not problematic. The analysis of lignin model compounds using UVRR enabled S, G, and H lignin markers to be characterized (Saariaho et al., 2003, 2005). The use of a tunable argon laser allowed three different excitation wavelengths to be evaluated: 229, 244, and 257 nm (Saariaho et al., 2003). The lignin S, G, and H markers were preferentially enhanced based upon which excitation wavelength was used. H lignin structures showed the strongest enhancement when 244 nm was employed, while G moieties were more intense when 257 nm was used. The spectra generated from S functionalities were essentially indistinguishable when using either 244 or 257 nm. A follow-up study utilized PLS to determine the specific wavenumbers corresponding to each type of lignin monomer, as well as condensed structures, conjugated C=C and C=O, and stilbenes (Saariaho et al., 2005). The authors note that using multivariate analysis in this fashion can aid in qualitatively interpreting complex spectra of polymeric lignin, since the UVRR spectra typically have broad peaks. The evaluation of the PLS model loadings plots allowed the identification of important vibrational modes corresponding to the different model compound classes.



<sup>a</sup>Data compiled from Lupoi et al. (2014a), "High-throughput prediction of eucalypt lignin syringyl/guaiacyl content using multivariate analysis: a comparison between mid-infrared, near-infrared, and Raman spectroscopies for model development," Biotechnology for Biofuels, Volume 7, p. 93 and Lupoi et al. (2015), "High-throughput prediction of Acacia and eucalypt lignin syringyl/guaiacyl content using FT-Raman spectroscopy and partial least squares modeling" Bioenergy Research, open access.

The functional groups of lignin contribute to its chemical properties and its valorization potential. Phenolic moieties, one of the principal functionalities in lignin, define the reactivity and solubilization of lignin (Zakis, 1994). The ionization of phenolic species in alkaline media results in a concomitant shift in the vibrational modes of lignin in pulps and lignin model compounds (Warsta et al., 2012). Shifts from 8 to 35 cm−<sup>1</sup> were measured when the pH was increased from 6 to 12. In general, as the pH became more alkaline, a shift to lower wavenumbers was detected. When wood pulps were analyzed, a less pronounced shift resulted, since the pulps have fewer phenolic functionalities than model compounds. When non-phenolic 3,4-dimethoxytoluene was measured, no shift was detected, indicating that the shifting occurred due to ionization of the phenolic group. Increases in pH also resulted in augmented band intensities; however, the band intensity was still directly proportional to analyte concentration, as exemplified by the construction of a calibration curve for guaiacol. While these bands were detected at more neutral pH levels, the enhancement of these bands at strongly basic pH provided a more detailed structural analysis. The authors suggest that the shifting of the aromatic band near 1600 cm−<sup>1</sup> from increasing the alkalinity of the matrix may aid in determining the amount of free phenolic groups (for example, an 11 cm−<sup>1</sup> shift can be expected if *all* of the phenylpropanoid functionalities have a free phenolic group).

Ultraviolet resonance Raman spectroscopy enabled the analysis of extractable lipophilic and hydrophilic components from Scots pine wood resin (Nuopponen et al., 2004b,c). The authors employed a tunable argon laser set to one of three different excitation wavelengths: 229, 244, or 257 nm. The level of the enhancement for different structures depended on the particular laser wavelength employed. Molecules such as resin (dehydroabietic, abietic, and pimaric type) and fatty acids, sitosterol, and sitosterol acetate were evaluated as standards, in hexane extracts from the biomass, and in solid wood samples. Double-bond moieties, such as those found in alkenes, were resonantly enhanced using the UV laser wavelengths. When the 257 nm wavelength was used, compounds with isolated double bonds provided the most information, while the 229 nm wavelength was more useful for analyzing conjugated resin acids. Additionally, the 257 nm laser was best suited for studying sapwood hexane extracts, while either the 229 or the 244 nm lasers could be employed for evaluating heartwood extracts. The measurement of solid wood revealed a vibrational mode at 1650 cm−<sup>1</sup>, indicative of unsaturated wood resin constituents. For the hydrophilic extractables, only the 244 and 257 nm wavelengths were used. Aromatic and unsaturated moieties of pinosylvin and chrysin were found to be resonantly enhanced. Wavelength selection had a minimal effect on chrysin analysis. The heartwood acetone/water extract included pinosylvin plus resin and fatty acid markers. The sapwood extract contained oleophilic structures of the resin and fatty acids, as well as some guaiacyl modes. The measurement of Scots pine knotwood unveiled an abundant resin contribution, illustrating that the resin was more resonantly enhanced than lignin. These two analyses have demonstrated the competence of UVRR to selectively analyze extractable compounds. Although extractives are a smaller proportion of biomass compared to polysaccharides and lignin, they have significant impacts on plant properties and may also present a source of bio-based chemicals.

A UVRR method was established for quantifying lignin in bleached hardwood kraft pulps (Jaaskelaeinen et al., 2005). Lignin quantification techniques typically are developed using unbleached biomass, and therefore are not readily transferable to bleached samples. A strong linear correlation (*R* <sup>2</sup> = 0.987) was calculated when the 1604 cm−<sup>1</sup> peak was normalized to the 1093 cm−<sup>1</sup> cellulose peak, and plotted against increasing lignin concentration. A 244 nm excitation wavelength provided more accurate lignin content measurements, since the use of 257 nm resulted in more fluctuations in the spectral baseline. The Raman measured lignin contents were compared with kappa numbers measured using a standard method, and were found to linearly correlate. Other applications of UVRR include studies of the degradation of lignin following a chemical treatment such as bleaching (Halttunen et al., 2001; Mononen et al., 2005; Jaaskelainen et al., 2006; Lähdetie et al., 2009) or steam treatment (Nuopponen et al., 2004a), the changes in TMP after laccase treatments (Lähdetie et al., 2009), photodegradation using a UV laser (Pandey and Vuorinen, 2008), and an evaluation of 25 diverse tropical hardwoods using UVRR spectral data and principal component analysis (PCA) (Nuopponen et al., 2006).
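The benefit of normalizing the lignin band to a cellulose band, as described above, can be illustrated on synthetic data. In the sketch below the intensities, the per-sample scale factors (standing in for fluctuations in laser power or focus), and the helper names are all invented for illustration; they are not data from Jaaskelaeinen et al. (2005).

```python
def r_squared(x, y):
    """Squared Pearson correlation, equal to R^2 for a simple linear fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


# Synthetic samples: lignin content (arbitrary units) and an overall
# intensity scale factor that differs from measurement to measurement.
lignin_content = [0.5, 1.0, 1.5, 2.0]
scale = [1.0, 0.7, 1.3, 0.9]

# Cellulose band intensity depends only on the scale factor; the lignin
# band is proportional to both the scale factor and the lignin content.
cellulose_band = [100.0 * s for s in scale]
lignin_band = [100.0 * s * 0.2 * c for s, c in zip(scale, lignin_content)]

# Normalized response: ratio of the lignin band to the cellulose band.
band_ratio = [l / c for l, c in zip(lignin_band, cellulose_band)]

r2_raw = r_squared(lignin_content, lignin_band)
r2_ratio = r_squared(lignin_content, band_ratio)
```

Because the scale factor appears in both bands, it cancels in the ratio, and the normalized response tracks lignin content far more closely than the raw lignin band intensity does.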

Resonance Raman spectroscopy using visible excitation sources with Kerr-gated fluorescence rejection has enabled structural analyses of lignin that were previously unattainable (Barsberg et al., 2005, 2006). A Kerr-gate is a device consisting of two polarizers and a Kerr medium (carbon disulfide, in this instrument). When closed, the polarizers blocked scattered photons from reaching the detector. The Kerr-gate provided a time-window of 4 ps to collect Raman spectra free from fluorescence, a phenomenon occurring on a nanosecond timeframe. Once the Raman data had been acquired, the Kerr-gate was switched to the closed position, thereby blocking fluorescence. Syringyl moieties were resonantly enhanced when a 400 nm laser was used, whereas the use of 500 nm light caused a reduction in selectivity. The effects of laccase plus various mediators on beech lignin were studied using both excitation wavelengths and RR difference spectra. In a follow-up study, the authors successfully measured lignin radicals produced enzymatically using laccase (Barsberg et al., 2006). A 1570 cm−<sup>1</sup> band was measured in dry wood, regardless of the type of biomass analyzed. When wet beech was evaluated, a lignin radical peak at 1606 cm−<sup>1</sup> was detected. Density functional theory was used to correlate the experimental results with the predicted vibrational modes of lignin radicals, and indicated that the radicals were formed from syringyl and guaiacyl moieties in beech and spruce, respectively. RR difference spectra were calculated to subtract spectral contributions from the main lignin peak near 1600 cm−<sup>1</sup>. The radicals could only be detected when the 500 nm light was used to generate Raman scatter.
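The reason a picosecond gate rejects most fluorescence can be estimated with a simple model. The sketch below assumes an ideal gate, Raman scattering that arrives entirely within the gate, and a single-exponential fluorescence decay with a hypothetical 1 ns lifetime; the source states only that fluorescence occurs on a nanosecond timeframe, so the lifetime is an assumption and the function name is illustrative.

```python
import math


def gated_fraction(gate_ps, lifetime_ps):
    """Fraction of single-exponential emission that arrives while the gate is open.

    The gate is assumed to open at the moment of excitation and to stay
    open for gate_ps picoseconds.
    """
    return -math.expm1(-gate_ps / lifetime_ps)


GATE_PS = 4.0          # gate width quoted in the text
LIFETIME_PS = 1000.0   # hypothetical 1 ns fluorescence lifetime

fraction = gated_fraction(GATE_PS, LIFETIME_PS)
suppression = 1.0 / fraction
```

With these inputs only about 0.4% of the fluorescence passes the gate, a suppression factor of roughly 250. A longer assumed lifetime would give proportionally stronger rejection, so the result should be read as an order-of-magnitude estimate only.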

Resonance Raman spectroscopy coupled with Kerr-gated fluorescence suppression allowed the measurement of strongly fluorescent chemical pulps using 400 nm light (Saariaho et al., 2004). Although these pulps are not typically assessable, due to lignin fluorescence, the use of the Kerr-gate permitted a 250-fold reduction in the fluorescence background, enabling much weaker Raman bands to be detected. Chromophoric vibrational modes at 1605 and 1655 cm−<sup>1</sup> were measured in peroxide-bleached pulps, while only the 1605 cm−<sup>1</sup> mode was identified in biomass treated with chlorine dioxide. When a 257 nm laser was used to evaluate the pulps, the intensity of the aromatic lignin peak was approximately 20-times higher than the main cellulose mode. The square root of the ratio of the 1605 cm−<sup>1</sup> vibrational mode to the 1098 cm−<sup>1</sup> peak correlated linearly with brightness percentage, as measured using a standard method. The authors concluded that while UV excitation preferentially probed lignin, visible lasers allowed the detection of chromophoric lignin structures. Lignin remaining in chemically treated pulps could be quantified using RR spectroscopy, although the detection limit can be lowered when UV lasers are employed.

### **RAMAN IMAGING**

Raman imaging techniques have enabled the visual examination of biomass cell and cell wall structure, and the evaluation of real-time changes in the morphology and chemical content of plants, such as after different pretreatment strategies. These experiments have provided a glimpse into the chemistry of plants before and after treatments, permitting researchers to identify the biomass modification approaches best suited for reducing recalcitrance and increasing yields from downstream conversion into simple sugars. The laser can be focused to small spot sizes, enabling minute areas of interest to be evaluated. Instrumental advances have allowed the rapid acquisition of images with short integration times, preventing the photodegradation of the sample. Another advantage of Raman imaging, compared with other imaging techniques, is that no staining or embedding of the sample is required. Raman spectra are collected as the instrument moves the sample to defined locations using a set step-size, resulting in a plethora of structural and chemical data that can be daunting to analyze. Multivariate analysis, coupled with imaging techniques, has enabled enhanced data mining for valuable information.

Raman microspectroscopy has been used to evaluate how a room temperature pretreatment with the ionic liquid (IL) 1-*n*-ethyl-3-methylimidazolium acetate modified the cell walls of poplar (Lucas et al., 2011). A 785 nm diode laser was used to collect spectral data from 50 µm poplar sections. Raman spectra from untreated poplar revealed the characteristic vibrational modes from cellulose, hemicellulose, and lignin. When the wood was swollen with water, the same peaks were identified; however, the intensities differed from the untreated samples. The intensity ratio of the 1460 cm−<sup>1</sup> cellulose peak to the 1605 cm−<sup>1</sup> lignin peak decreased, which signified diminutions in the cellulose-abundant S2 sub-layer compared to the hydrophobic, lignin-rich compound middle lamella (CML) region. The authors concluded that the swelling must be pushing the fibers apart and progressing into more amorphous cellulose regions, since crystalline cellulose fibers are recalcitrant to water penetration. The Raman spectra of the IL-treated poplar samples depicted strong signals from the IL itself.

When the samples were washed with water prior to analysis, the spectra showed no traces of IL vibrational modes, and resembled the water-swollen poplar Raman spectrum, leading to the conclusion that both the water and IL treatments led to similar overall cell wall compositions. Confocal Raman spectroscopy using a 785 nm diode laser enabled an evaluation of tissue-specific changes when pretreating corn stover with the IL 1-ethyl-3-methylimidazolium acetate (Sun et al., 2013). A temporal study was conducted to gage the lignin and cellulose remaining in the plant cell walls during the IL pretreatment at 120°C using 0, 30 min, 1, 2, and 3-h time points. To assess the changes brought about by the IL treatment, tracheids, sclerenchyma, and parenchyma cell structures were probed (see **Figures 6** and **7**). Before pretreating the corn stover, cellulose and lignin concentrations were highest in the cell corners (CCs) and CML portions of the three cell structures and in the secondary walls of the sclerenchyma and parenchyma cell types. The lignin content was measured to decrease rapidly during the IL treatment, while no preferential cellulose dissolution was detected. The IL pretreatment is known to cause swelling of the secondary wall, but not of the CML. Accordingly, more significant swelling was observed in tracheid and sclerenchyma cells than parenchyma cells, which are composed of primary cell walls. Although tracheids contained higher lignin concentrations and thicker walls than parenchyma cells, the lignin dissolution occurred more rapidly in the tracheid cells. Confocal Raman microscopy was also employed to evaluate normal and tension wood sections from poplar (Gierlinger and Schwanninger, 2006). The allocation of cell wall components was calculated following the integration of distinct vibrational modes. 
The Raman images of normal wood illustrated higher lignin concentrations in the CCs and the CML, and increased cellulose content in the S2 layer of parenchyma ray cells and two lesser layers located on each side of the CML, presumed to be S1. A higher fluorescence background was measured for CCs and the CML, which is expected due to the greater lignin concentrations in these regions. Analysis of tension wood samples revealed lignin to be localized in CCs and the CML, while no lignin was detected in the gelatinous, or G-layer. Signals from lignin increased, however, in the lumen. Aromatic compounds were measured to coalesce along an inner region of the G-layer, and were also found deeper in the G-layer, toward the CCs of the S2 layer.

**FIGURE 7 | Raman mapping of sclerenchyma cells during an ionic liquid pretreatment**. **(A)** Bright-field microscopy images; **(B)** lignin maps (black boxes are the locations of cell corners); and **(C)** cellulose maps generated over 0–3 h of pretreatment [reprinted with permission from the Royal Society of Chemistry, Sun et al. (2013)].

Many applications of Raman imaging utilize NIR excitation sources. Visible excitation, however, has been demonstrated as offering a higher energy source for obtaining Raman images. A
novel, polarized 633 nm laser was used to attain images of black spruce cross-sections (Agarwal, 2006). Fluorescence was efficiently blocked by acquiring data in confocal mode using a 100 µm pinhole. Lignin concentrations were highest in the CCs, consistent with other studies, but were not profoundly different in the CML and secondary wall. The distribution of coniferaldehyde and coniferyl alcohol, measured using the 1650 cm−<sup>1</sup> band, corresponded with that of lignin. Cellulose localization followed an opposite pattern to lignin distribution (high S2 and low CC and CML concentrations). A confocal Raman microscope, equipped with a 532 nm laser and a 100 µm pinhole, was used to characterize black cottonwood (Perera et al., 2011). Given the heterogeneity of the sample, the abundance of spectral information, and spectrally unresolved vibrational modes, the authors developed a new analysis strategy to aid in determining the structural characteristics and chemical composition of the wood. The method encompasses three main phases: spectral preprocessing, stepwise clustering, and estimation of spectral profiles of pure components and their respective weights. The spectral preprocessing included wavelet analysis to remove noise, second-derivative transformations to remove contributions from fluorescence, and PCA to reduce the number of variables as well as reduce noise from the data matrix. Stepwise clustering was achieved using *k*-means clustering to classify the samples according to a preordained number of groups. The image can then be reconstructed using the cluster groupings, facilitating the identification of diverse sub-layers within the cell wall. The last step involves determining which factors are important in contributing to the distinct localization of different cell wall components in the images. A technique called spectral entropy minimization methodology allowed the pure component spectra to be captured. Estimated pure polysaccharide and lignin spectra were generated.
Pure cellulose and hemicellulose components could not be generated due to the structural, and therefore, spectral resemblance between the two polysaccharides. The lignin spectrum included regions typically assigned to lignin monomers, permitting an *in situ* analysis of monolignol composition. The authors note that this is not possible with routine data processing techniques, since the lignin monomers have significant spectral overlap with carbohydrate vibrational modes. The partitioning of lignin and carbohydrates in the images was determined by subtracting first the pure lignin and then the pure carbohydrate spectra from the average spectra determined for each cluster. The image analysis procedure can be extended to other types of spectral data such as IR, MS, or fluorescence. In a follow-up study, this method was employed to evaluate the S and G lignin contents of *Arabidopsis*, *Miscanthus*, and poplar. Spectral distinctions between the three plants were clearly discernible in the estimated lignin spectra, indicative of differences in S, G, and H contents. The *Miscanthus* spectrum was less intricate than the dicots, which, the authors deduce, is illustrative of *Miscanthus* having a higher percentage of non-condensed lignin. Lignin S/G ratios were calculated to be 0.5 ± 0.08, 0.6 ± 0.1, and 1.9 ± 0.2 for *Miscanthus*, *Arabidopsis*, and poplar, respectively. The S/G ratios within different cell wall structures could also be calculated (0.8 ± 0.1 for *Miscanthus* xylary fiber cells, 0.6 ± 0.1 for *Miscanthus* interfascicular cells of basal stems). A transgenic poplar sample, in which the monolignol biosynthesis gene encoding 4-coumarate-CoA ligase was suppressed, revealed reduced total lignin contents and decreased S/G ratios. These examples demonstrate the power of Raman spectroscopy coupled with chemometric techniques to exhaustively extract obscured information from the spectra.
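The preprocessing and clustering phases of the image analysis strategy described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of Perera et al. (2011): wavelet denoising and the spectral entropy minimization step are omitted, a plain *k*-means loop stands in for the stepwise clustering, and the function names (`second_derivative`, `pca_scores`, `kmeans`, `cluster_map`) and the hyperspectral cube layout (rows × columns × spectral channels) are hypothetical.

```python
import numpy as np

def second_derivative(spectra):
    # Finite-difference second derivative along the spectral axis;
    # this suppresses broad, slowly varying backgrounds such as fluorescence.
    return np.gradient(np.gradient(spectra, axis=1), axis=1)

def pca_scores(data, n_components):
    # PCA via singular value decomposition of the mean-centered data matrix.
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def kmeans(points, k, n_iter=50, seed=0):
    # Plain k-means with a preset number of groups (assumption: random
    # initialization from the data points, fixed seed for repeatability).
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def cluster_map(cube, k, n_components=3):
    # cube: (rows, cols, channels) array of Raman spectra, one per pixel.
    rows, cols, n_channels = cube.shape
    spectra = cube.reshape(rows * cols, n_channels)
    scores = pca_scores(second_derivative(spectra), n_components)
    # Reshape the cluster labels back into an image of cell wall regions.
    return kmeans(scores, k).reshape(rows, cols)
```

Calling `cluster_map(cube, k=2)` on a measured cube would return a label image in which each pixel is assigned to one of the preset groups; averaging the spectra within each group would then give the cluster spectra used in the later unmixing step.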

Coherent anti-Stokes Raman scattering (CARS) microscopy can be used to obtain images of biomass devoid of fluorescence (Zeng et al., 2012; Pohling et al., 2014). In CARS, multiple lasers, termed pump, probe, and Stokes excitation sources, interact with the analytes. These lasers are used to generate the anti-Stokes photons. When the frequency difference between the pump and Stokes lasers is tuned to coincide with a specific Raman vibrational mode, the signal is enhanced. CARS intensities are stronger than those obtained using spontaneous Raman spectroscopy, leading to increased sensitivity and shorter acquisition times. Spectral assignments in CARS spectra are identical to those made using traditional Raman spectra. Wood samples of birch, oak, and spruce were evaluated using CARS microscopy (Pohling et al., 2014). Standards of pure cellulose, xylan, and lignin were measured to establish indicative marker bands. Through the use of spectrally broad lasers, the CARS protocol can probe multiple vibrations (MCARS). Using a technique called the maximum entropy method, Raman spectra could be extracted from the MCARS spectral data, revealing spectra that resembled standard spectra collected with spontaneous Raman spectroscopy, minus fluorescence contributions. Transverse and longitudinally oriented cuts of the wood samples illustrated the cell wall structure and composition. Cellulose, hemicellulose, and lignin were localized, allowing the assignment of cellulose-rich secondary walls and lignin-rich intercellular space. The longitudinally cut images showed polarization dependence. Stronger cellulose signals were detected using horizontal polarization, while more intense lignin peaks were measured using vertical polarization. A semi-quantitative assessment of cellulose, hemicellulose, lignin, and water illustrated the significance of polarization in the longitudinal plant sections, as there was greater disparity in the results when measured with horizontal or vertical polarization.
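The tuning condition for CARS described above can be made concrete with a short calculation. The sketch below assumes a degenerate two-color arrangement in which the pump beam also acts as the probe, and an 800 nm pump wavelength chosen purely for illustration; neither detail is taken from the studies cited here. The 1605 cm−<sup>1</sup> target is the aromatic lignin band discussed earlier in this review.

```python
def wavenumber_cm1(wavelength_nm):
    # Convert a wavelength in nm to an absolute wavenumber in cm^-1.
    return 1.0e7 / wavelength_nm

def stokes_wavelength_nm(pump_nm, raman_shift_cm1):
    # The pump/Stokes frequency difference must equal the Raman shift.
    return 1.0e7 / (wavenumber_cm1(pump_nm) - raman_shift_cm1)

def anti_stokes_wavelength_nm(pump_nm, stokes_nm):
    # Degenerate CARS (assumption: probe = pump): w_as = 2 * w_pump - w_stokes.
    return 1.0e7 / (2.0 * wavenumber_cm1(pump_nm) - wavenumber_cm1(stokes_nm))

PUMP_NM = 800.0            # hypothetical pump wavelength
LIGNIN_SHIFT_CM1 = 1605.0  # aromatic lignin mode

stokes_nm = stokes_wavelength_nm(PUMP_NM, LIGNIN_SHIFT_CM1)
anti_stokes_nm = anti_stokes_wavelength_nm(PUMP_NM, stokes_nm)
print(f"Stokes: {stokes_nm:.1f} nm, anti-Stokes: {anti_stokes_nm:.1f} nm")
```

Because the anti-Stokes signal appears at a shorter wavelength than either excitation beam, it is spectrally separated from the red-shifted fluorescence, which is consistent with the fluorescence-free images described above.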

Coherent anti-Stokes Raman scattering, however, experiences an electronic background that can alter spectral data, obscuring the quantitation of analytes from CARS imaging techniques (Li et al., 2005). SRS microscopy provides orders-of-magnitude higher spectral signals, eliminating the effects of the higher background (Freudiger et al., 2008). The SRS phenomenon is similar to that observed in CARS. Two lasers are overlapped and focused onto the analyte. When the difference frequency of the two lasers resonates with a vibrational mode in the sample, the rate at which photons migrate to higher vibrational levels is enhanced due to stimulated photon excitation. Energy transfers only occur when in resonance with a molecule's fundamental vibrational mode(s). Although the signals are weak and obscured in the background produced from the laser, the laser noise can be eliminated by using a high-frequency (>1 MHz) amplitude modulation/lock-in detection procedure (Saar et al., 2010). Like CARS, the assignments of vibrational modes are equivalent to those generated from spontaneous Raman scattering. Also analogous to spontaneous Raman spectroscopy, signals produced using SRS are linearly dependent on analyte concentration. SRS microscopy was employed to evaluate the real-time processing of corn stover (Saar et al., 2010). The images were acquired in approximately 3 s, whereas the same image would have required nearly 2 h using spontaneous Raman scattering. Cellulose and lignin localization were generated in the images, without using labels or staining, by tuning the frequency difference of the two lasers to the well-known vibrational modes of each biopolymer. The validity of this technique was confirmed by comparing with common staining techniques, such as phloroglucinol for lignin detection. The vessel, tracheid, and fiber cells revealed significant lignification compared to phloem cells. 
The cellulose content was more uniformly dispersed throughout the cells, juxtaposed to lignin. Areas of higher and lower cellulose and lignin concentration could be detected, an observation that is more challenging in CARS, due to the inability to separate the signal from the higher background. The authors used this method to monitor the delignification of corn stover using a sodium chlorite treatment. An eightfold decrease in lignin content was measured while the cellulose content remained relatively unchanged. Analysis of the SRS images provided information on where lignin was preferentially removed from the corn stover during the bleaching process. The bleaching rates were fastest for the lignin contained in the phloem and CCs. Parenchyma, tracheid, vessel, and fiber cells demonstrated similar delignification patterns, signifying similar accessibilities of lignin to sodium chlorite.
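The amplitude modulation/lock-in detection idea mentioned above can be illustrated numerically. The following is a simplified sketch, not the detection scheme of Saar et al. (2010): the sampling rate, the 2 MHz modulation frequency, the signal amplitude, the drift, and the noise level are all hypothetical values chosen so that a weak modulated signal sits beneath a much larger, slowly varying background.

```python
import numpy as np

def lock_in_amplitude(signal, t, f_mod):
    # Multiply by in-phase and quadrature references at the modulation
    # frequency, then average; components at other frequencies average out.
    x = 2.0 * np.mean(signal * np.sin(2.0 * np.pi * f_mod * t))
    y = 2.0 * np.mean(signal * np.cos(2.0 * np.pi * f_mod * t))
    return np.hypot(x, y)

# Hypothetical acquisition: 1 ms sampled at 20 MHz, modulation at 2 MHz.
fs, f_mod, n = 20.0e6, 2.0e6, 20000
t = np.arange(n) / fs
rng = np.random.default_rng(0)

true_amp = 0.05                                # weak modulated signal
drift = 5.0 * np.sin(2.0 * np.pi * 1.0e3 * t)  # slow laser intensity drift
noise = rng.normal(0.0, 0.2, n)                # broadband noise
signal = true_amp * np.sin(2.0 * np.pi * f_mod * t) + drift + noise

recovered = lock_in_amplitude(signal, t, f_mod)
print(f"true amplitude {true_amp:.3f}, recovered {recovered:.3f}")
```

In this toy case the drift is two orders of magnitude larger than the modulated signal, yet the demodulated amplitude stays close to the true value because only the component at the modulation frequency survives the averaging.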

Other Raman imaging applications include an analysis of the structural changes in polyaromatic molecules and non-aromatic moieties following the carbonization of Japanese cedar, cotton cellulose, and lignin at 500–1000°C (Ishimaru et al., 2007), studies on deformation properties of native and regenerated celluloses (Hamad, 2008), the monitoring of structural and chemical changes in *Miscanthus x giganteus* following a sodium hydroxide treatment (Chu et al., 2010), the localization of cellulose and lignin in corn stover and *Eucalyptus globulus* (Sun et al., 2010), the *in situ* detection of a single carotenoid crystal (Baranska et al., 2011), and the characterization of cellulose nanocrystal (CNC)-polypropylene composites, including the spatial distribution of the CNC in the filaments (Agarwal et al., 2012).

### **CONCLUSION**

As the search for ideal wild-type or transgenic biofuel and biomaterial feedstocks progresses, methods that rapidly and accurately screen large arrays of different plants are becoming essential. Raman spectroscopy, in its diverse configurations, has proven to be a viable asset to these qualitative and quantitative studies. As instrumental innovations evolve, such as field-portable devices, measurements of the feedstocks can be conducted in their natural environments, reducing the need for time-consuming sampling protocols. The construction of robust, multivariate predictive models coupled to Raman spectral data will increase experimental throughput, thereby narrowing the pool of potential plants suitable for downstream renewable energy applications. Raman imaging techniques have empowered researchers to evaluate deconstruction strategies in real-time, providing both fundamental insights into how specific reagents affect the morphology of the biomass, and the ability to nominate or exclude a pretreatment method based on how efficiently it renders the cell wall less recalcitrant, as judged by end-product yields. The extent of endeavors explored for the characterization of lignocellulosic biomass using Raman spectroscopy continues to escalate. Future advancements in instrumentation, multivariate analysis modeling, and the revolutionary ways in which Raman spectroscopy is utilized will continue to proffer researchers a versatile, non-destructive, non-invasive, user-friendly, high-throughput analytical tool.

### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 17 February 2015; paper pending published: 19 March 2015; accepted: 27 March 2015; published online: 20 April 2015.*

*Citation: Lupoi JS, Gjersing E and Davis MF (2015) Evaluating lignocellulosic biomass, its derivatives, and downstream products with Raman spectroscopy. Front. Bioeng. Biotechnol. 3:50. doi: 10.3389/fbioe.2015.00050*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2015 Lupoi, Gjersing and Davis. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Standard flow liquid chromatography for shotgun proteomics in bioenergy research

### **Susana M. González Fernández-Niño<sup>1</sup>, A. Michelle Smith-Moritz<sup>1</sup>, Leanne Jade G. Chan<sup>1</sup>, Paul D. Adams<sup>1,2</sup>, Joshua L. Heazlewood<sup>1,3</sup> and Christopher J. Petzold<sup>1</sup>\***

<sup>1</sup> Joint BioEnergy Institute and Physical Biosciences Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA

<sup>2</sup> Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA

<sup>3</sup> Australian Research Council Centre of Excellence in Plant Cell Walls, School of Botany, The University of Melbourne, Melbourne, VIC, Australia

### **Edited by:**

Robert Henry, The University of Queensland, Australia

### **Reviewed by:**

Qaisar Mahmood, COMSATS Institute of Information Technology, Pakistan

Yu-Shen Cheng, University of California Davis, USA

### **\*Correspondence:**

Christopher J. Petzold, Joint BioEnergy Institute, Lawrence Berkeley National Laboratory, 5885 Hollis Street, 4th Floor, Emeryville, CA 94608, USA

e-mail: cjpetzold@lbl.gov

Over the past 10 years, the bioenergy field has realized significant achievements that have encouraged many follow-on efforts centered on biosynthetic production of fuel-like compounds. Key to the success of these efforts has been transformational developments in feedstock characterization and metabolic engineering of biofuel-producing microbes. Lagging far behind these advancements are analytical methods to characterize and quantify systems of interest to the bioenergy field. In particular, the utilization of proteomics, while valuable for identifying novel enzymes and diagnosing problems associated with biofuel-producing microbes, is limited by a lack of robustness and limited throughput. Nano-flow liquid chromatography coupled to high-mass accuracy, high-resolution mass spectrometers has become the dominant approach for the analysis of complex proteomic samples, yet such assays still require dedicated experts for data acquisition, analysis, and instrument upkeep. The recent adoption of standard flow chromatography (ca. 0.5 mL/min) for targeted proteomics has highlighted the robust nature and increased throughput of this approach for sample analysis. Consequently, we assessed the applicability of standard flow liquid chromatography for shotgun proteomics using samples from *Escherichia coli* and *Arabidopsis thaliana*, organisms commonly used as model systems for lignocellulosic biofuels research. Employing 120 min gradients with standard flow chromatography, we were able to routinely identify nearly 800 proteins from *E. coli* samples, while for samples from *Arabidopsis*, over 1,000 proteins could be reliably identified. An examination of identified peptides indicated that the method was suitable for reproducible applications in shotgun proteomics. Standard flow liquid chromatography for shotgun proteomics provides a robust approach for the analysis of complex samples. To the best of our knowledge, this study represents the first attempt to validate the standard flow approach for shotgun proteomics.

**Keywords: proteomics, standard flow chromatography, biofuels, mass spectrometry**

### **INTRODUCTION**

Advances in biofuels research focusing on feedstock characterization and engineering (Persil-Cetinkol et al., 2012; DeMartini et al., 2013; Shen et al., 2013; Eudes et al., 2014) as well as the genetic manipulation of microbes (Alper et al., 2006; Tyo et al., 2007; Keasling, 2008; Lee et al., 2008) have progressed significantly in the last few years. Unfortunately, analytical capabilities required to efficiently monitor and assess these changes are lagging. Furthermore, many modern bioanalytical techniques are focused toward medical and health related research, which has significantly different priorities and requirements for success. Most biotechnology research challenges are not constrained by the sensitivity or resolution of an assay; rather, they depend on accurate identification and quantitation of target molecules for a large number of samples. Consequently, an important component for biotechnological research is sample throughput supported by a robust analytical platform. Recent advances in proteomics and metabolomics have focused on liquid chromatography-mass spectrometry (LC-MS) methods by increasing their sensitivity to aid discovery-based research efforts. This is most evident with the development of nano-LC coupled to high-resolution mass spectrometers; however, this technology has yet to mature into a robust platform capable of consistently analyzing hundreds of samples per week. Consequently, alternate technologies capable of answering the questions necessary for biotechnology progress are needed. Recently, we published a high-throughput targeted proteomic toolkit based on standard flow chromatography coupled to mass spectrometry (MS) to help address these issues for *Escherichia coli*; however, a significant amount of methods development was necessary. This work prompted us to assess the utility of standard flow liquid chromatography (LC) for shotgun proteomic methods related to biotechnology.

The discipline of proteomics has been dominated by nano-flow LC coupled to MS since its early development over 20 years ago (Emmett and Caprioli, 1994; Gatlin et al., 1998). The adoption of nano-flow LC for protein identification was driven by the substantial increases in sensitivity and detection capabilities of nano-flow (ca. 500 nL/min) over capillary (ca. 50 µL/min) and standard flow (ca. 0.5 mL/min) chromatography. Typically, shotgun proteomic studies utilize nano-flow chromatography methods due to limited amounts of sample and to obtain optimal ionization efficiency for sensitive detection of peptides, but at the cost of ease of use and system robustness (Gapeev et al., 2009). These issues are particularly problematic for biotechnology research that depends heavily on high sample throughput. Recently, ultra-high performance LC coupled to triple quadrupole mass spectrometers has been shown to yield comparable sensitivity and better analytical metrics (coefficient of variation, dynamic range) than nano-flow LC for MRM-based analysis of biomarker proteins when the amount of sample is adjusted to the column size (Percy et al., 2012). Additionally, the well-established robust operation of standard flow chromatography makes this instrumentation attractive for applications that rely on high sample throughput, consistent results, and less system downtime (Swartz, 2005). For applications where sample abundance is not limited, one of the greatest concerns with ultra-high pressure liquid chromatography-mass spectrometry (UHPLC-MS) workflows is the loss of sensitivity, relative to nano-LC-MS workflows, leading to datasets of insufficient depth to answer questions of interest.

The adoption of a UHPLC-MS workflow for sample delivery requires both efficient ionization and instrument speed to handle both the sample delivery rate and reduced elution times for peptides. The past decade has seen significant developments and advances in instrumentation associated with proteomics-based MS. Advances in reversed phase C18 chromatography columns yield greater separation efficiency and more stable retention times. The current generation of instrumentation is faster, more sensitive, and is better able to deal with the dynamic range inherent in biological samples. Moreover, the development and adoption of off-axis nebulizers for sample delivery when using electrospray ionization significantly reduces contamination of capillaries and skimmers (Banerjee and Mazumdar, 2012). Collectively, these improvements have enabled the development of a UHPLC-MS workflow for MRM-based analysis of biomarker proteins (Percy et al., 2012). Consequently, we were interested in assessing the capacity of this workflow in shotgun proteomic experiments. The work described here details the results of an analysis of whole cell proteomes from a prokaryote (*E. coli*) and a eukaryote (*Arabidopsis thaliana*) on an Agilent UHPLC-QTOFMS system, but would be generally applicable for any current generation of tandem mass spectrometer being utilized for shotgun proteomics.

### **MATERIALS AND METHODS**

### **PROTEIN EXTRACTION**

Protein was extracted using standard techniques with analytical reagents where suitable. For *E. coli* DH5α samples, cell lysis and protein precipitation were accomplished using a chloroform/methanol precipitation. A 100 µL aliquot of cells was transferred to a 1.7 mL tube, followed by the addition of 400 µL of methanol, 100 µL of chloroform, and 300 µL of water, with mixing by vortex after each addition. Following centrifugation at 21,000 × *g* for phase separation, the methanol and water layer was removed and 300 µL of methanol was added. The tube was briefly vortexed to dislodge the protein pellet, then centrifuged at 21,000 × *g* for 2 min. The chloroform and methanol layer was removed and the protein pellet was dried for 5 min in a vacuum concentrator. The protein pellet was re-suspended in 100 mM NH<sub>4</sub>HCO<sub>3</sub> with 20% methanol, reduced with 5 mM TCEP [Tris(2-carboxyethyl)phosphine hydrochloride] for 30 min at room temperature, treated with 10 mM iodoacetamide (IAA) for 30 min in the dark at room temperature, and digested with trypsin (1:50 w/w) overnight at 37°C. Aliquots of 40 µg were taken for analysis by LC-MS/MS. For *A. thaliana* (L.) Heynh. (ecotype Landsberg erecta), protein was extracted from a previously described heterotrophic cell culture (Ito et al., 2011). A total of 1 g plant material (fresh weight) was used for the isolation of total protein. The plant material was harvested and frozen with liquid nitrogen in an Eppendorf tube with two small steel balls. The protein extraction was performed by the addition of 0.4 mL of fresh disruption buffer [125 mM Tris-HCl, 7% (w/v) SDS, and 10% β-mercaptoethanol], followed by vortexing for 10 min at room temperature. The samples were centrifuged at 10,000 × *g* for 5 min at 4°C and the supernatant separated into two 2 mL tubes. Samples were further extracted in 800 µL methanol and mixed, then 200 µL chloroform was added and mixed, and finally 500 µL of ddH<sub>2</sub>O was added and vortexed (30 s each time). Samples were centrifuged for 5 min at 10,000 × *g* at 4°C, the aqueous phase removed, and 500 µL of methanol added. Samples were vortexed for 30 s and centrifuged for 10 min at 9,000 × *g* at 4°C. The supernatant was discarded and the pellet air-dried. The dried pellet was suspended in 200 µL of re-suspension buffer [3 M urea, 50 mM NH<sub>4</sub>HCO<sub>3</sub>, and 5 mM dithiothreitol, pH 8], and incubated with IAA at a final concentration of 10 mM for 30 min in the dark. Prior to analysis by MS, 40 µg of extracted protein was digested with trypsin (1:10 w/w) overnight at 37°C. Peptides were desalted using C18 Micro SpinColumns (Harvard Apparatus) as previously outlined (Parsons et al., 2013). Eluted peptides were re-suspended in 2% acetonitrile, 0.1% formic acid prior to analysis by LC-MS/MS.

### **STANDARD FLOW MASS SPECTROMETRY**

All samples were analyzed on an Agilent 6550 iFunnel Q-TOF mass spectrometer (Agilent Technologies) coupled to an Agilent 1290 UHPLC system. Peptide samples were loaded onto a Sigma-Aldrich Ascentis Peptides ES-C18 column (2.1 mm × 100 mm, 2.7 µm particle size, operated at 60°C) via an Infinity Autosampler (Agilent Technologies) with Buffer A (2% acetonitrile, 0.1% formic acid) flowing at 0.400 mL/min. Peptides were eluted into the mass spectrometer via a gradient with initial starting condition of 5% buffer B (98% acetonitrile, 0.1% formic acid). For analysis of all samples, buffer B was increased to 35% over 120 min. Buffer B was then increased to 50% over 5 min, then up to 90% over 1 min, and held for 7 min at a flow rate of 0.6 mL/min, followed by a ramp back down to 5% B over 1 min where it was held for 6 min to re-equilibrate the column to original conditions. Peptides were introduced to the mass spectrometer from the LC by using a Jet Stream source (Agilent Technologies) operating in positive-ion mode (3,500 V). Source parameters employed gas temp (250°C), drying gas (14 L/min), nebulizer (35 psig), sheath gas temp (250°C), sheath gas flow (11 L/min), VCap (3,500 V), fragmentor (180 V), OCT 1 RF Vpp (750 V). The data were acquired with Agilent MassHunter Workstation Software, LC/MS Data Acquisition B.05.00 (Build 5.0.5042.2) operating in Auto MS/MS mode whereby the 20 most intense ions (charge states, 2–5) within 300–1,400 m/z mass range above a threshold of 1,500 counts were selected for MS/MS analysis. MS/MS spectra (100–1,700 m/z) were collected with the quadrupole set to "Medium" resolution and were acquired until 45,000 total counts were collected or for a maximum accumulation time of 333 ms. Former parent ions were excluded for 0.1 min following MS/MS acquisition.
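The elution program above can be written down as a small timetable, which makes the total run time and the solvent composition at any point easy to check. The sketch below assumes that each step is a linear ramp and that the steps follow one another without gaps; the array and function names are hypothetical, and the flow rate change during the 90% B hold is not modeled.

```python
import numpy as np

# Breakpoints reconstructed from the method text (minutes, % buffer B):
# 5% -> 35% over 120 min, -> 50% over 5 min, -> 90% over 1 min,
# hold 7 min, -> 5% over 1 min, hold 6 min for re-equilibration.
GRADIENT_TIMES_MIN = np.array([0, 120, 125, 126, 133, 134, 140], dtype=float)
GRADIENT_PERCENT_B = np.array([5, 35, 50, 90, 90, 5, 5], dtype=float)

def percent_b(t_min):
    # Piecewise-linear interpolation between the gradient breakpoints.
    return float(np.interp(t_min, GRADIENT_TIMES_MIN, GRADIENT_PERCENT_B))

def acetonitrile_percent(t_min):
    # Buffer A is 2% acetonitrile and buffer B is 98% acetonitrile.
    frac_b = percent_b(t_min) / 100.0
    return 2.0 * (1.0 - frac_b) + 98.0 * frac_b

total_run_min = GRADIENT_TIMES_MIN[-1]
print(f"total run time: {total_run_min:.0f} min")
print(f"%B at 60 min: {percent_b(60):.1f}, acetonitrile: {acetonitrile_percent(60):.1f}%")
```

Under these assumptions, each injection occupies 140 min of instrument time, of which 120 min is the analytical gradient.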

### **LC-MS/MS DATA ANALYSIS AND INTEGRATION**

The acquired data were exported as .mgf files using the Export as MGF function of the MassHunter Workstation Software, Qualitative Analysis (Version B.05.00 Build 5.0.519.13 Service Pack 1, Agilent Technologies) using the following settings: for peak filters (MS/MS), the absolute height (≥ 20 counts), relative height (≥ 0.100% of largest peak), maximum number of peaks (300) by height; for charge state (MS/MS), the peak spacing tolerance (0.0025 m/z plus 7.0 ppm), isotope model (peptides), charge state limit assigned to (5) maximum. Resultant data files were interrogated with the Mascot search engine version 2.3.02 (Matrix Science) with a peptide tolerance of ±50 ppm and MS/MS tolerance of ±0.1 Da; fixed modifications Carbamidomethyl (C); variable modifications Oxidation (M); up to one missed cleavage for trypsin; peptide charge 2+, 3+, and 4+; and the instrument type was set to ESI-QUAD-TOF. Searches were performed against either an *E. coli* (strain K12) dataset obtained from UniProt (Magrane and Consortium, 2011) or the latest release of the *A. thaliana* dataset comprising TAIR10 obtained from The Arabidopsis Information Resource (Lamesch et al., 2012). Both databases incorporated proteins comprising the common Repository of Adventitious Proteins (cRAP v2012.01.01 from The Global Proteome Machine). The *E. coli* database comprised 4,429 sequences (1,398,775 residues) while the *Arabidopsis* database comprised 35,508 sequences (14,522,421 residues). Protein and peptide matches identified after interrogation of MS/MS data by Mascot were filtered and validated using Scaffold v4.3.0 (Proteome Software Inc.). Peptide identifications were accepted if they could be established at >95.0% probability by the Peptide Prophet algorithm (Keller et al., 2002) with Scaffold delta-mass correction. Protein identifications were accepted if they could be established at >95.0% probability and contained at least 1 identified peptide (at 95% and greater). Protein probabilities were assigned by the Protein Prophet algorithm (Nesvizhskii et al., 2003). This resulted in false discovery rates of 0.9% (*E. coli*) and 0.3% (*Arabidopsis*) for proteins and 0.37% (*E. coli*) and 1.66% (*Arabidopsis*) for peptides. Proteins that contained similar peptides and could not be differentiated based on MS/MS analysis alone were grouped to satisfy the principles of parsimony.
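The two search tolerances quoted above are expressed in different units: the precursor window is relative (ppm) and therefore widens with m/z, whereas the fragment window is absolute (Da). The following sketch shows how such windows are conventionally evaluated; it is an illustration only and does not reproduce Mascot's internal matching logic. The function names and example m/z values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    # Relative mass error in parts per million.
    return (observed_mz - theoretical_mz) / theoretical_mz * 1.0e6

def precursor_within_tolerance(observed_mz, theoretical_mz, tol_ppm=50.0):
    # Peptide (precursor) tolerance of +/-50 ppm, as set in the search.
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

def fragment_within_tolerance(observed_mz, theoretical_mz, tol_da=0.1):
    # MS/MS (fragment) tolerance of +/-0.1 Da, as set in the search.
    return abs(observed_mz - theoretical_mz) <= tol_da

# Hypothetical precursor at m/z 500: a 0.02 m/z deviation is 40 ppm (accepted),
# while a 0.03 m/z deviation is 60 ppm (rejected).
print(precursor_within_tolerance(500.02, 500.0))
print(precursor_within_tolerance(500.03, 500.0))
```

At m/z 500 the ±50 ppm window corresponds to ±0.025 m/z, while at m/z 1,400 it corresponds to ±0.07 m/z.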

### **STATISTICAL DATA ANALYSIS**

Statistical analysis was performed using the statistical toolbox in MATLAB v2009b (Mathworks). The following analysis was performed on data from both *Arabidopsis* and *E. coli*. After data filtering using Scaffold v4.3.0 (outlined above), peptide data were exported as .csv files and only peptides derived from the same protein and identified across all replicates were considered. The Mascot ions score and total ion current values were taken only from the best matching peptide (based on ions score) in each replicate. The total spectrum count for each peptide from each replicate was calculated manually. The coefficient of variation (CV: SD/mean) of total spectrum counts, Mascot ions score, and total ion current for each of the common peptides across the replicates was calculated to determine the variation across the technical replicates. Histograms of the CV were plotted to determine the distribution, and a generalized extreme value fit was used to determine the fit and mean of the distribution. For principal component analysis (PCA), the total spectrum count, Mascot ions score, and total ion current for all the common peptides of each technical replicate were mean centered. PCA was done by eigenvalue decomposition of the data covariance, resulting in a set of linearly uncorrelated variables. The principal component scores were then plotted to identify groupings.
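The two calculations described above (CV per peptide and PCA by eigenvalue decomposition of the covariance) can be sketched in Python with NumPy. This is an illustration only, not the MATLAB code used in the study. The function names are hypothetical, the use of the sample standard deviation (`ddof=1`) is an assumption, and so is the layout of the input arrays (rows as peptides or observations, columns as replicates or variables).

```python
import numpy as np


def coefficient_of_variation(values):
    """CV = SD / mean for each row (one row per peptide, one column per replicate).

    Assumption: the sample standard deviation (ddof=1) is used.
    """
    values = np.asarray(values, dtype=float)
    return values.std(axis=1, ddof=1) / values.mean(axis=1)


def pca_eig(data):
    """PCA by eigenvalue decomposition of the covariance of mean-centered data.

    data: 2-D array, rows = observations, columns = variables.
    Returns (scores, eigenvalues, eigenvectors), sorted by decreasing variance.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)          # mean centering
    cov = np.cov(centered, rowvar=False)         # covariance between variables
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric matrix, so eigh applies
    order = np.argsort(eigvals)[::-1]            # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                  # principal component scores
    return scores, eigvals, eigvecs
```

The resulting score columns are linearly uncorrelated, and the variance of each column equals the corresponding eigenvalue, which is the property the text relies on when plotting scores to look for groupings.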

### **RESULTS AND DISCUSSION**

The initial setup and tuning of the standard flow LC-MS/MS parameters was undertaken using 20 fmol aliquots of trypsin digested bovine serum albumin (BSA). The system settings were deemed adequate once identification of BSA was comparable (i.e., unique peptides, total peptides, and coverage) to that achieved using previously benchmarked nano-flow LC-MS/MS approaches on a variety of instruments over the past 5 years.

### **APPLICATION OF STANDARD FLOW LC-MS/MS WITH PROKARYOTIC SAMPLES**

Initial experiments were performed on *E. coli*, a well-characterized organism with minimal proteomic complexity. The *E. coli* genome of the widely utilized laboratory strain K-12 was completed nearly 20 years ago (Blattner et al., 1997). The genome has undergone multiple revisions and is estimated to code for over 4,300 proteins (Magrane and Consortium, 2011). Shake flasks of *E. coli* DH5α cultures were grown aerobically at 37°C on Luria Broth (LB) medium supplemented with 1% glucose. Total protein was extracted and digested overnight with trypsin at 37°C. The analysis procedure for *E. coli* samples was developed to take advantage of the speed and robustness of a standard flow analysis and, as such, incorporated a 120 min gradient. A total of four biological replicates, each equivalent to 40 µg of peptide, were analyzed by a standard flow UHPLC-QTOFMS operating with typical shotgun proteomics data acquisition parameters. The samples yielded on average 31,938 ± 989 (SE) MS/MS spectra (**Table 1**). Each of these MS/MS datasets was used to query the *E. coli* (K12) protein database from UniProt (Magrane and Consortium, 2011) using the Mascot search engine (Matrix Science) with resultant protein matches filtered using Scaffold (Proteome Software). In total, 802 proteins were identified from the four replicates with an average of 786 ± 4.5 (SE) per sample (**Table 1** and Table S1 in Supplementary Material). This represents about 20% of the protein coding capacity of *E. coli*. On average, 14,639 ± 484 (SE) MS/MS spectra were successfully matched to a peptide, corresponding to around 46% of the total queries for each sample (**Table 1**). This would be regarded as a high conversion rate, indicating that the acquired MS/MS data were of sufficient quality to confidently assign nearly 50% of the queries using the approach. The consistency of the standard flow approach was highlighted by the total number of proteins assigned in each sample. A total of 768 proteins were identified in three out of four replicates, while 746 proteins (93%) were identified collectively in all four samples. This compares well to the total number of proteins identified in all samples (802). Few proteins were identified uniquely in a single replicate except for replicate Ecoli-1, where 30 unique proteins were identified (**Figure 1A**). Consequently, we sought to examine whether this setup was adequately dealing with the sample complexity given the flow rate and an approximate peptide elution time of around 6 s (Figure S1 in Supplementary Material). Nearly 75% of the matched MS/MS spectra were redundant, with an average of 4,009 unique peptides per sample (**Table 1**). Only 2,825 unique spectra (54%) were shared between all four samples; however, the new unique peptides (2,393) accounted for only 56 new proteins (**Figure 1A**). The approach consistently identifies the major proteins in a sample, and a majority of the unique spectra assigned between replicates are derived from previously identified proteins. This indicates that the QTOFMS has the capacity to handle the standard flow UHPLC setup in shotgun mode even at these higher flow rates and reduced peptide elution times.

**Table 1 | Values obtained from E. coli (Ec) and Arabidopsis (At) samples analyzed using the standard flow technique**.
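The percentages quoted for the *E. coli* runs can be reproduced from the reported counts. The following sketch is only a worked check of that arithmetic; the variable names are hypothetical, and the definition of redundancy used here (matched spectra not accounted for by the average number of unique peptides) is an assumption about how the "nearly 75%" figure was derived.

```python
# Counts reported in the text for the four E. coli replicates.
total_spectra = 31_938         # average MS/MS spectra per sample
matched_spectra = 14_639       # average spectra matched to a peptide
unique_peptides = 4_009        # average unique peptides per sample
proteins_total = 802           # proteins identified across all replicates
proteins_in_all_four = 746     # proteins identified in every replicate
shared_unique = 2_825          # unique spectra shared by all four samples
non_shared_unique = 2_393      # unique peptides not shared by all four samples


def percent(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


conversion_rate = percent(matched_spectra, total_spectra)
# Assumed definition: spectra beyond the first match of each unique peptide.
redundancy = 100.0 - percent(unique_peptides, matched_spectra)
protein_overlap = percent(proteins_in_all_four, proteins_total)
shared_fraction = percent(shared_unique, shared_unique + non_shared_unique)

print(f"conversion rate:  {conversion_rate:.1f}%")
print(f"redundancy:       {redundancy:.1f}%")
print(f"protein overlap:  {protein_overlap:.1f}%")
print(f"shared fraction:  {shared_fraction:.1f}%")
```

Rounded to whole numbers, these come to 46%, 73%, 93%, and 54%, in line with the values stated in the text.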

### **APPLICATION OF STANDARD FLOW LC-MS/MS TO A COMPLEX EUKARYOTIC SAMPLE**

Next, we were interested in assessing the suitability of the standard flow proteomics platform on a complex eukaryotic proteome. The genome of the reference plant *Arabidopsis* was completed over a decade ago (Arabidopsis Genome Initiative, 2000) and now represents one of the most highly curated and annotated genomes in biology. As a consequence, *Arabidopsis* is being utilized as a proving ground for plant synthetic biology approaches, many of which have focused on biomass manipulation for efficient biofuels production (Eudes et al., 2012, 2015). The most recent proteome release (TAIR10) from The Arabidopsis Information Resource (Lamesch et al., 2012) comprises over 27,000 loci and over 35,000 distinct protein products. Protein extractions were performed from 7-day-old *Arabidopsis* cell cultures that have previously been extensively employed for proteomic assays (Parsons et al., 2012). A total protein sample was analyzed in triplicate by UHPLC-QTOFMS using the standard flow proteomics platform. We analyzed the equivalent of ca. 40 µg of digested total protein over a 120 min gradient with MS conditions identical to those used with the *E. coli* samples. An average of 30,575 ± 3,458 (SE) MS/MS spectra were collected over the 2-hour run from the three replicates (**Table 1**). These numbers are similar to those obtained using the *E. coli* samples and may reflect the upper capacity of the method given the increased peptide complexity that would be expected from a eukaryotic sample.

These *Arabidopsis* datasets were each used to interrogate the most recent *Arabidopsis* protein dataset using the Mascot search engine, and matches were filtered and integrated using Scaffold. This resulted in an average of 7,532 ± 1,115 (SE) matched spectra from the three replicates, which corresponds to about 25% of the total spectra obtained (**Table 1**). This is about half the conversion rate observed for the *E. coli* samples and likely reflects the quality of the MS/MS spectra, given that the overall ion intensities are likely on average considerably lower due to the increased size of the proteome. The average number of proteins identified over the three samples was 1,288 ± 73 (SE) with a total of 1,364 unique proteins identified across the three replicates (Table S2 in Supplementary Material). The number of proteins consistently identified by all three replicates was 1,188 proteins or 87% of the total number (**Figure 1B**). The minor variation in identifications between these technical replicates demonstrates the reproducibility of the standard flow approach for shotgun proteomics, as the method was capable of consistently identifying the same proteins in each of the replicates. In an attempt to understand whether the approach is adequately dealing with the increased complexity of this eukaryotic sample, we examined the proportion of unique matched peptides identified in each replicate. The 1,364 unique proteins identified in these samples were matched using 5,419 unique peptides, while the 1,188 proteins identified in all three replicates only required 2,187 (40.4% of the total) unique peptides (**Figure 1B**). These results indicate that again, as found with the *E. coli* samples, the majority of peptides exclusive to a given replicate are derived from proteins that have already yielded a high scoring peptide match. It is conceivable that the complexity of the eukaryotic sample contributes to many more co-eluting peptides during the 120 min gradient than the *E. coli* sample, resulting in unique peptides selected for MS/MS by the data-dependent acquisition method in each replicate (**Figure 1B**). Although similar results with regard to unique peptides were also observed with the *E. coli* samples, on average, an *E. coli* protein was identified by 6.8 unique peptides (768 proteins from 5,218 unique peptides) compared to the *Arabidopsis* proteins at 4.0 unique peptides each (1,364 proteins from 5,419 unique peptides). Nearly twice as many proteins were identified in the *Arabidopsis* samples (1,364 proteins compared to 768 proteins), which would be expected given the differences in coding capacity between these species. However, considerably more identified proteins would have been expected given proteome sizes, indicating that the standard flow approach has a more limited capacity to deal with the eukaryotic sample due to issues such as increased sample complexity at any given point in time, lower overall ion intensity, and dynamic range limitations (as discussed above) of this QTOFMS system.

**LC-MS/MS approach**.
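The comparison between the two organisms rests on a few simple ratios, which can be checked directly from the counts given in the text. The sketch below is a worked calculation only; the dictionary layout and names are hypothetical.

```python
# Counts reported in the text for each organism.
ecoli = {"proteins": 768, "unique_peptides": 5_218}
arabidopsis = {"proteins": 1_364, "unique_peptides": 5_419}

# Arabidopsis proteins found in all three replicates and the peptides they required.
arabidopsis_core_proteins = 1_188
arabidopsis_core_peptides = 2_187


def peptides_per_protein(sample):
    """Average number of unique peptides supporting each identified protein."""
    return sample["unique_peptides"] / sample["proteins"]


ecoli_ratio = peptides_per_protein(ecoli)
arabidopsis_ratio = peptides_per_protein(arabidopsis)
protein_fold = arabidopsis["proteins"] / ecoli["proteins"]
core_protein_pct = 100.0 * arabidopsis_core_proteins / arabidopsis["proteins"]
core_peptide_pct = 100.0 * arabidopsis_core_peptides / arabidopsis["unique_peptides"]

print(f"E. coli peptides per protein:     {ecoli_ratio:.1f}")
print(f"Arabidopsis peptides per protein: {arabidopsis_ratio:.1f}")
print(f"Arabidopsis / E. coli proteins:   {protein_fold:.2f}")
print(f"proteins in all three replicates: {core_protein_pct:.0f}%")
print(f"peptides needed for those:        {core_peptide_pct:.1f}%")
```

The roughly 1.8-fold difference in identified proteins is small compared with the roughly 8-fold difference in database size (35,508 versus 4,429 sequences), which is the basis for the statement that more *Arabidopsis* identifications would have been expected.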

### **ASSESSING THE PERFORMANCE OF STANDARD FLOW LC-MS/MS**

Last, we investigated the reproducibility of the standard flow UHPLC-QTOFMS by comparing parameters of the identified peptides between the replicates for both *E. coli* and *Arabidopsis* samples. The peptide ions score obtained from Mascot after data interrogation can be indicative of the quality of the fragmentation spectra (Perkins et al., 1999), total ion current obtained from the MS/MS spectra can provide information about the intensity of an eluted peptide (Asara et al., 2008), and total spectral count can provide an indication of peptide intensity and spectral complexity (Lundgren et al., 2010). These values can be used as a proxy to assess sample limitations and reproducibility of the LC-MS system. PCA was employed to ascertain whether there were any differences between ion scores, total ion current, and spectral count for peptides identified in all the replicates for either *E. coli* or *Arabidopsis* samples (**Figure 2**). The analysis demonstrated that none of the replicates for either *E. coli* or *Arabidopsis* could be separated by principal component scores for these attributes, indicating that the majority of the differences for either data set can be largely attributed to biological heterogeneity of the sample (**Figure 2**).

The similarities in ion score, total ion current, and spectral counts can also be observed when analyzing the distributions of CV for each identified peptide across the replicates (Figure S2 in Supplementary Material). The variations in Mascot ion score and total ion current (derived from MS/MS spectra) for the peptides identified over the replicates were similar for both *E. coli* and *Arabidopsis*. This indicates that from sample to sample, identical ions performed similarly with regard to the intensity of matched peptides (total ion current) as measured by the mass spectrometer. This is further supported by the small variation in Mascot ion scores, since ion intensity is related to the quality of the MS/MS spectra and subsequent spectral matching procedures. The variation in total spectrum count was more pronounced between the identified peptides of *E. coli* (0.67) and *Arabidopsis* (0.32), with samples of lower complexity (i.e., *E. coli*) having a larger variation in spectral counts for a peptide across replicates. This could be due to the higher repeat sampling rate that likely occurred during analysis of *E. coli* samples due to the reduced number of distinct ions/peptides in the sample. This conclusion is supported when looking at the average number of spectral counts for a given identified peptide; for *E. coli*, it was 3.74 spectra per peptide while for *Arabidopsis* it was 2.70. The lower average number of spectral counts for a peptide from *Arabidopsis* is likely indicative of the increased sample complexity. Collectively, these data indicate that the quality and intensities of peptides identified across the replicates were similar, indicating that the standard flow approach was not significantly impeding the performance of the mass spectrometer to acquire tandem mass spectra.

**ions score, and total ion current (derived from MS/MS) from common peptides identified over all replicates from (A) E. coli and (B) Arabidopsis**.

### **CONCLUSION**

The recent progress of bioenergy research has relied heavily on transformational developments in feedstock characterization and metabolic engineering. Yet, omics methods to characterize and quantify systems of interest have mainly been adapted from the health and clinical fields that have very different research needs (i.e., high sensitivity, deep proteome coverage). We report the application of a standard flow LC-MS/MS approach that is suitable for large numbers of shotgun proteomic experiments where sample abundance is not limiting. The setup is capable of undertaking a rapid analysis of low complexity samples as well as handling highly complex samples by employing extended analysis times. Although its application requires instrumentation with the ability to deal with increased flow rates and shorter peptide elution times, it is apparent that the current generation of tandem mass spectrometers is capable of handling these parameters. While traditional nano-LC-MS/MS approaches are likely to continue to dominate shotgun analyses as they produce a greater number of protein identifications and have the ability to deal with increased sample complexity and dynamic range, the robust nature and simplicity of standard flow coupled to MS makes this approach an attractive alternative for applications where sample throughput and reproducibility are important factors.

### **AUTHOR CONTRIBUTIONS**

JH and CP conceived and advised in all aspects of the study. PA supervised all aspects of the study. SG and LC performed experiments. AS analyzed the data. JH and CP interpreted the data and wrote the manuscript. All authors discussed and commented on the manuscript.

### **ACKNOWLEDGMENTS**

This work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org) supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy.

### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00044/abstract

### **REFERENCES**


MRM-based quantitation of putative plasma biomarker proteins. *Anal. Bioanal. Chem.* 404, 1089–1101. doi:10.1007/s00216-012-6010-y


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 11 February 2015; accepted: 18 March 2015; published online: 01 April 2015. Citation: González Fernández-Niño SM, Smith-Moritz AM, Chan LJG, Adams PD, Heazlewood JL and Petzold CJ (2015) Standard flow liquid chromatography for shotgun proteomics in bioenergy research. Front. Bioeng. Biotechnol. 3:44. doi: 10.3389/fbioe.2015.00044*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology.*

*Copyright © 2015 González Fernández-Niño, Smith-Moritz, Chan, Adams, Heazlewood and Petzold. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

*Edited by:* 

*Bo Hu, University of Minnesota, USA*

### *Reviewed by:*

*Chiranjeevi Thulluri, Jawaharlal Nehru Technological University Hyderabad, India Jiwei Zhang, University of Minnesota, USA*

### *\*Correspondence:*

*Kai Deng kdeng@sandia.gov; Trent R. Northen trnorthen@lbl.gov*

### *†Present address:*

*Noppadon Sathitsuksanoh, University of Louisville, Louisville, KY, USA*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 10 August 2015 Accepted: 21 September 2015 Published: 13 October 2015*

### *Citation:*

*Deng K, Guenther JM, Gao J, Bowen BP, Tran H, Reyes-Ortiz V, Cheng X, Sathitsuksanoh N, Heins R, Takasuka TE, Bergeman LF, Geertz-Hansen H, Deutsch S, Loqué D, Sale KL, Simmons BA, Adams PD, Singh AK, Fox BG and Northen TR (2015) Development of a high throughput platform for screening glycoside hydrolases based on Oxime-NIMS. Front. Bioeng. Biotechnol. 3:153. doi: 10.3389/fbioe.2015.00153*

*Kai Deng1,2\*, Joel M. Guenther1,2 , Jian Gao3 , Benjamin P. Bowen 3 , Huu Tran1,2 , Vimalier Reyes-Ortiz1,3 , Xiaoliang Cheng1,3 , Noppadon Sathitsuksanoh1,3† , Richard Heins1,2 , Taichi E. Takasuka4 , Lai F. Bergeman4 , Henrik Geertz-Hansen1 , Samuel Deutsch3,5 , Dominique Loqué1,3 , Kenneth L. Sale1,2 , Blake A. Simmons1,2 , Paul D. Adams1,3,6 , Anup K. Singh1,2 , Brian G. Fox4,7 and Trent R. Northen1,3\**

*1US Department of Energy Joint BioEnergy Institute, Emeryville, CA, USA, 2Sandia National Laboratories, Livermore, CA, USA, 3 Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 4US Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, WI, USA, 5 Joint Genome Institute, Walnut Creek, CA, USA, 6Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA, 7Department of Biochemistry, University of Wisconsin, Madison, WI, USA*

Cost-effective hydrolysis of biomass into sugars for biofuel production requires high-performance low-cost glycoside hydrolase (GH) cocktails that are active under demanding process conditions. Improving the performance of GH cocktails depends on knowledge of many critical parameters, including individual enzyme stabilities, optimal reaction conditions, kinetics, and specificity of reaction. With this information, rate- and/or yield-limiting reactions can be potentially improved through substitution, synergistic complementation, or protein engineering. Given the wide range of substrates and methods used for GH characterization, it is difficult to compare results across a myriad of approaches to identify high performance and synergistic combinations of enzymes. Here, we describe a platform for systematic screening of GH activities using automatic biomass handling, bioconjugate chemistry, robotic liquid handling, and nanostructure-initiator mass spectrometry (NIMS). Twelve well-characterized substrates spanning the types of glycosidic linkages found in plant cell walls are included in the experimental workflow. To test the application of this platform and substrate panel, we studied the reactivity of three engineered cellulases and their synergy of combination across a range of reaction conditions and enzyme concentrations. We anticipate that large-scale screening using the standardized platform and substrates will generate critical datasets to enable direct comparison of enzyme activities for cocktail design.

### Keywords: cellulase, NIMS, oxime bioconjugation, high throughput screening, enzyme assays

**Abbreviations:** AFEX-SG, ammonia fiber expansion pretreated switchgrass; CBM, carbohydrate binding module; CelE, broad specificity GH family 5 (GH5) domain from *C. thermocellum* Cthe\_0797; DA-SG, diluted acid pretreated switchgrass; GH, glycoside hydrolase; IL-SG, ionic liquid pretreated switchgrass; NIMS, nanostructure-initiator mass spectrometry; UT-SG, untreated switchgrass.

## INTRODUCTION

Lignocellulosic biomass (Carroll and Somerville, 2009) is a renewable source of energy, capable of providing the nation with clean, renewable transportation fuels. To convert biomass into biofuels, one key step is enzymatic saccharification, which is known to be inefficient and expensive (Klein-Marcuschamer et al., 2012). Thus, low-cost, robust, high-performance enzymes or enzyme cocktails are needed to reduce the overall cost of biofuel production. There are many sources of inedible plant biomass that can serve as feedstocks for biofuel production, including agricultural wastes [corn stover, switchgrass (SG), wood trimming], municipal solid wastes, and emerging bioenergy crops. However, these various feedstocks have different glycan composition, bond linkages, and individual sugar contents, which complicates the development of cost-effective saccharification approaches. Moreover, the content and structure of glycans and lignin from the same biomass may respond in different ways to the prerequisite pretreatments (Li et al., 2010). These variations contribute to the observation that there is no universal enzyme or enzyme cocktail for all substrates and biofuels processes.

Given these process constraints, it is important to characterize saccharification enzymes against a wide range of biomass compositions, pretreatments, and processing conditions to generate data that can help enable customized optimal enzyme cocktails for saccharification (Banerjee et al., 2010; Walton et al., 2011). Given the large number of potential combinations of substrate-pretreatment reaction conditions, it is desirable to have reliable high throughput enzyme assay methods and standardized panels of substrates and conditions.

Currently, several high throughput glycoside hydrolase (GH) assays are available to characterize enzyme activity. The majority of these are based on detection of colorimetric or fluorescent products. For example, the 3,5-dinitrosalicylic acid (DNS) reducing sugar assay (Decker et al., 2009) can provide rapid analysis of large enzyme libraries. However, it is a non-specific method that can only provide total reducing sugar content. Fluorescence-based enzyme assays using surrogate substrates, e.g., 4-methylumbelliferyl-β-glucopyranoside (van Tilbeurgh et al., 1982), are also available for the evaluation of β-glycosidases. Surrogate substrate methods require preparation of each substrate for each enzyme type, which may be laborious, and the reactivity may be biased versus the native substrate. Moreover, care is required to avoid interference from background absorbance/fluorescence.

To address some of these limitations, we developed a mass spectrometry-based enzyme assay platform called Nimzyme (Northen et al., 2008; Reindl et al., 2011; Greving et al., 2012). The first-generation Nimzyme platform was based on soluble model substrates, which were synthesized chemically (Deng et al., 2012). This approach provided the specificity, sensitivity, and high throughput needed to screen a GH1 library of 175 enzymes. Several high-performance enzymes were identified under desired bioprocessing conditions (70°C, 20% ionic liquid) (Heins et al., 2014). Another manuscript in this volume reports use of the first-generation Nimzyme platform for numerical analysis of reactions with cellotetraose-nanostructure-initiator mass spectrometry (NIMS) (Deng et al., 2014). Results of this work demonstrate diagnostic behaviors of several classes of GH enzymes.

However, the use of surrogate substrates does not allow interrogation of the myriad of bonding types present in plant biomass or the three-dimensional arrangements of these bonds present on the plant biomass, and so does not represent a fully realistic approach for the study of enzyme reactions.

To overcome this limitation, we developed an oxime-Nimzyme probe (Deng et al., 2014) to directly study enzyme hydrolysis of plant biomass. In this approach, soluble oligosaccharide products are captured in a stable oxime linkage and then delivered to the NIMS chip for subsequent analysis, while inclusion of 13C-labeled monosaccharide standards (glucose and xylose) allows quantitation of the derivatized glycans. Besides the diagnostic detection of solubilized products, this next-generation Nimzyme approach also allows quantitative studies of the time-dependence of product formation, and dissection of individual apparent rates for reactions of individual enzymes with plant biomass.

To further advance application of the Nimzyme approach, here we report the development of a process platform and a panel of 12 diverse glycan substrates. These substrates were selected to represent the diversity of plant glycosidic bond linkages and the sugar compositions that are relevant to biofuel production. We use this panel of substrates to characterize three previously described enzymes from *Clostridium thermocellum* that have been engineered to perform outside of their natural cellulosomal location. These are fusions of the catalytic domains of CelA (gene locus Cthe\_0269, GH8 endoglucanase family), CelR (gene locus Cthe\_0578, GH9 cellotetrahydrolase), and CelE (gene locus Cthe\_0797, GH5 endoglucanase family) to the CBM3a domain from the scaffoldin protein CipA of *C. thermocellum*. CBM3a is a well-studied carbohydrate-binding module that helps to promote binding of the enzyme onto the polysaccharide and thus plays a key role in promoting the efficient hydrolysis of cellulose (Yaniv et al., 2012). We envision that the standardization and automation of GH assays enabled by this approach will be valuable in providing large datasets of GH performance needed to select enzymes for improved biomass deconstruction.

## MATERIALS AND METHODS

### Materials

1,4-β-d-cellotetraose ~95% was purchased from Megazyme (Ireland), Cat. No. O-CTE100, Lot No. 130604; 1,4-β-d-xylotetraose ~95% was purchased from Megazyme (Ireland), Cat. No. O-XTE, Lot No. 120204; 1,4-β-d-mannotetraose ~95% was purchased from Megazyme (Ireland), Cat. No. O-MTE, Lot No. 111004; arabinoxylan was purchased from Megazyme (Ireland), Cat. No. P-WAXYI, Lot No. 120801a; carob galactomannan was purchased from Megazyme (Ireland), Cat. No. P-GALML, Lot No. 10501b; beechwood xylan was purchased from Sigma-Aldrich (St. Louis, MO, USA), Cat. No. X4252-100G, Lot No. BCBL2915V.

Switchgrass has emerged as a potential bioenergy crop because it is perennial, resource-efficient, and requires low inputs for maintenance (Keshwani and Cheng, 2009). Untreated switchgrass (UT-SG), also named Putnam SG, was obtained from Daniel Putnam at UC Davis; diluted acid pretreated switchgrass (DA-SG) was obtained by mixing Putnam SG with 1% sulfuric acid at 190°C for 0.5 min at NREL. Ammonia fiber expansion pretreated switchgrass (AFEX-SG) was prepared at Michigan State University. Ionic liquid pretreated switchgrass (IL-SG) was prepared at 140°C for 3 h with 15% solid loading in [EMIM][OAc] (1-ethyl-3-methylimidazolium acetate) at JBEI. All four SG substrates (IL-SG, AFEX-SG, DA-SG, and UT-SG) were milled with a Thomas Wiley Mill (Model 3383 L1) for 20 min and sieved (passage through a 20-mesh sieve and retention by an 80-mesh sieve). The particle size was found to be in the range of 200–450 μm. Avicel PH-101 cellulose was requested from FMC Biopolymer (Philadelphia, PA, USA), Lot No. P112824596. Phosphoric acid swollen cellulose (PASC) was prepared from Avicel PH-101 cellulose (FMC Biopolymer) with 85% phosphoric acid as reported (Zhang et al., 2006).

### Synthesis

The synthesis of *O*-alkyloxyamine fluorous tag has been reported previously (Deng et al., 2014).

### Enzymes

The catalytic domains of the enzymes studied were obtained from the following gene loci: CelA (Cthe\_0269); CelR (Cthe\_0578); and CelE (Cthe\_0797). Each of these catalytic domains was fused to the CBM3a domain from the scaffoldin CipA (Cthe\_3077). Additional information on these genes can be found at UniProt (Apweiler et al., 2011). All genes were prepared by PCR using *C. thermocellum* ATCC 27405 genomic DNA as template and cloned into the *Escherichia coli* expression vector pEC\_CBM3a to create enzyme\_CBM3a fusion proteins, e.g., CelAcc\_CBM3a. The vector pEC\_CBM3a is a hybrid of pEU\_HSBC\_CBM3a and pVP65K (Takasuka et al., 2014; Aceti et al., 2015) that yields fusion proteins having an N-terminal enzyme catalytic domain fused by an ~40 aa linker sequence to the CBM3a domain from Cthe\_3077. Methods for PCR amplification, capture, and sequence verification of protein coding sequences, transformation into *E. coli* 10G competent cells (Lucigen, Middleton, WI, USA) for DNA manipulations, and *E. coli* B834 for protein expression were as previously reported (Takasuka et al., 2014). Additional details of the properties and methods for use of pEU are described elsewhere (Aceti et al., 2015).

## Enzyme Plate Construction

Three enzymes (CelAcc-CBM3a, CelRcc-CBM3a, and CelEcc-CBM3a) were chosen to construct an enzyme plate with varied enzyme concentrations (microgram per microliter) shown in **Table 1**. Each enzyme was present in triplicate as shown for CelAcc-CBM3a (columns A, B, C), CelRcc-CBM3a (columns D, E, F), and CelEcc-CBM3a (columns G, H, I). Some enzyme combinations were also included to investigate the potential synergy among these three enzymes as shown in columns J, K, L.


## Procedures for Handling Insoluble Substrates with Labman Solids-Handling Robot

To minimize static electricity, all plastic labware was prophylactically treated using a Tabletop Ionizing Transport System Model IT-7000 (Electrostatics Incorporated, 90610-03610, Harleysville, PA, USA). Commercially sourced substrates – Avicel PH-101, arabinoxylan, carob galactomannan, and beechwood xylan – were received as fine powders and aliquoted without further processing. The Labman (North Yorkshire, UK) solids-handling robot at JBEI was used to aliquot insoluble substrates (**Table 2**, numbers 4 through 12) from 2-mL Sarstedt vials (72.694.007) into 340-μL 96-well thermal cycler plates (plate: VWR 82006-636, holder: Axygen R-96-PCR-FY) with a target mass of 2 ± 0.25 mg per well. Feeder vibration power was set to 40% for all substrates and feeding durations were optimized for each to minimize overfeeding. Average feeding durations ranged from 3 to 5 h per 96-well plate depending on the substrate. After aliquoting, substrate plates were sealed using peelable heat seal (Agilent 24210-001) using an Agilent PlateLoc heat sealer set to 175°C for 2.5 s and stored at 4°C.

## Procedures for Handling Insoluble Substrates with Biomek FX Robot

For all nine insoluble solid substrates, all liquid handling for the enzymatic hydrolysis reactions and bioconjugation chemistry was performed using a Biomek FX robot equipped with an AP96 multichannel pod (Beckman Coulter). For the enzymatic hydrolysis step, the Biomek FX transferred 180 μL of 50 mM phosphate buffer, pH 6.0, into 96-well PCR plates containing ~2 mg of solid substrate that had previously been aliquoted using the Labman robot (plate A). Then 20 μL of enzyme solution was transferred from the enzyme plate into plate A. After sealing the 96-well plate with PlateLoc peelable seal (2.5 s, 175°C), the plate was incubated at 60°C for 18 h in a shaker (HT INFORS) set to 200 rpm. A typical NIMS bioconjugation solution was a mixture of the following: (1) 1 mL of probe solution (150 mM in 1:1 (v/v) H2O:MeOH); (2) 6 mL of 100 mM glycine buffer, pH 1.3; (3) 0.5 mL of 5 mM 13C glucose aqueous solution; (4) 0.5 mL of 5 mM 13C xylose aqueous solution; (5) 2 mL of acetonitrile; (6) 1 mL of methanol. Plate B is a 96-well PCR plate prepared by the Biomek robot containing 22.2 μL of NIMS tagging solution in each well. After plate A (enzymatic reaction) had cooled to room temperature, a 4 μL aliquot from plate A was transferred into plate B by the Biomek robot. Plate B was sealed with PlateLoc peelable heat seal (2.5 s, 175°C) and left at room temperature for 16 h. Samples (12 μL) from plate B were transferred into the assay plate (Greiner bio-one, 384 well μ clear-plate, coc black, Lo base, 10 pcs/bag, Lot No. E11060DN; Cat. No. 788876, Made in Germany).

*(1) Soluble substrates 1–3 can be dissolved in water and bypass the solid dispersion step by Labman. (2) Insoluble solid substrates 4–12 were dispersed into a 96-well PCR plate by Labman, and the Biomek was then used for liquid handling.*
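The tagging-solution recipe and the plate B transfer described above imply concentrations that are easy to check by hand. The following is a minimal Python sketch of that arithmetic, not part of the published protocol. It assumes that volumes are simply additive on mixing (which is only approximately true for water/organic mixtures); the function and variable names are illustrative.

```python
# Components of the tagging solution as listed above:
# name -> (volume in mL, stock concentration in mM). None marks a pure solvent.
RECIPE = {
    "probe": (1.0, 150.0),
    "glycine buffer": (6.0, 100.0),
    "13C glucose": (0.5, 5.0),
    "13C xylose": (0.5, 5.0),
    "acetonitrile": (2.0, None),
    "methanol": (1.0, None),
}


def final_concentrations(recipe):
    """Return (total volume in mL, {solute: concentration in mM after mixing}).

    Assumption: volumes are additive (ideal mixing).
    """
    total_ml = sum(volume for volume, _ in recipe.values())
    concs = {
        name: volume * stock / total_ml
        for name, (volume, stock) in recipe.items()
        if stock is not None
    }
    return total_ml, concs


def dilute(conc_mm, transferred_ul, final_ul):
    """Concentration after a transferred volume is diluted to a final volume."""
    return conc_mm * transferred_ul / final_ul


total_ml, tagging = final_concentrations(RECIPE)

# Plate B well: 22.2 uL of tagging solution plus a 4 uL aliquot from plate A.
well_ul = 22.2 + 4.0
glucose_in_well = dilute(tagging["13C glucose"], 22.2, well_ul)

print(f"total volume: {total_ml:.1f} mL")
for name, conc in tagging.items():
    print(f"{name}: {conc:.3f} mM")
print(f"13C glucose in plate B well: {glucose_in_well:.4f} mM")
```

Under these assumptions the recipe gives 11 mL of tagging solution, and each internal standard is diluted a further 22.2/26.2-fold once the reaction aliquot is added to plate B.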

## Procedures for Handling Soluble Substrates with Biomek FX Robot

For the three soluble substrates, 1,4-β-d-cellotetraose (G4), 1,4-β-d-xylotetraose (X4), and 1,4-β-d-mannotetraose (M4), 10 mM aqueous solutions were prepared as stock solutions. The Biomek robot was used to transfer 40 μL of 50 mM phosphate, pH 6.0, into each well of a 96-well PCR plate (plate A). Then 5 μL of soluble substrate [G4 (5 mM), X4 (5 mM), or M4 (5 mM)] was transferred to plate A. After that, the Biomek transferred 5 μL of enzyme from the enzyme plate to plate A. After sealing the 96-well plate with PlateLoc peelable heat seal (2.5 s, 175°C), the whole plate was incubated at 60°C for 18 h in a shaker (HT INFORS) set to 200 rpm. After this step, the Biomek robot was used to perform all subsequent liquid-handling steps as indicated above for insoluble substrates.
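The composition of the plate A reactions follows directly from the transfer volumes given above. The sketch below works through the arithmetic for both the soluble and the insoluble substrates. It is illustrative only: it takes the 5 mM substrate concentration stated for the transfer step, neglects the volume of the solid, and uses a hypothetical enzyme stock of 5 μg/μL, which is not a value stated in the text.

```python
def final_concentration(stock, transferred_ul, total_ul):
    """Concentration of a component after dilution into the reaction volume."""
    return stock * transferred_ul / total_ul


def enzyme_loading_mg_per_g(enzyme_ug_per_ul, enzyme_ul, substrate_mg):
    """Enzyme loading in mg of enzyme per g of substrate."""
    enzyme_mg = enzyme_ug_per_ul * enzyme_ul / 1000.0
    substrate_g = substrate_mg / 1000.0
    return enzyme_mg / substrate_g


# Soluble substrates: 40 uL buffer + 5 uL substrate (5 mM) + 5 uL enzyme.
soluble_total_ul = 40.0 + 5.0 + 5.0
substrate_mm = final_concentration(5.0, 5.0, soluble_total_ul)

# Insoluble substrates: 180 uL buffer + 20 uL enzyme over ~2 mg of solid.
# Assumption: the volume occupied by the solid itself is neglected.
insoluble_total_ul = 180.0 + 20.0
solids_mg_per_ml = 2.0 / (insoluble_total_ul / 1000.0)

# Hypothetical enzyme stock concentration, chosen only for illustration.
HYPOTHETICAL_ENZYME_STOCK_UG_PER_UL = 5.0
loading = enzyme_loading_mg_per_g(HYPOTHETICAL_ENZYME_STOCK_UG_PER_UL, 20.0, 2.0)

print(f"soluble substrate in reaction: {substrate_mm:.2f} mM")
print(f"solids loading: {solids_mg_per_ml:.1f} mg/mL")
print(f"enzyme loading: {loading:.1f} mg/g substrate")
```

With these inputs the soluble reactions contain 0.5 mM oligosaccharide and the insoluble reactions about 10 mg/mL solids; the same helper converts any enzyme stock concentration into a mg/g loading.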

## Acoustic Printing of Sample Arrays

Samples from the 384-well Greiner plate (1 μL) were acoustically transferred by ATS Acoustic Liquid Dispenser (EDC Biosystems) onto a 2 × 2 inch NIMS chip. Individual reaction spots on the NIMS chip were ionized by a laser and products were detected by a time-of-flight mass spectrometer (TOF/TOF 5800 MALDI systems, AB Sciex).

## Nanostructure-Initiator Mass Spectrometry Imaging

The 4800 imaging acquisition software was used in these experiments. Chips were loaded using a modified standard MALDI plate. The instrument was set to a laser intensity of 2,550 and 15 shots per sub-spectrum. The detector voltage multiplier was set to 0.77. For imaging, the step size was set to 50 μm in both the *x* and *y* directions.

## Mass Spectrometry Imaging Data Processing (Open MSI)

The imaging file was uploaded to openmsi.nersc.gov by Globus Connect Personal. The converted file was analyzed by a draggable points notebook written in Python. Signal intensities were identified for the ions of the tagging products. Enzyme activities were determined by measuring the concentration of glycan products using either [*U*]-13C glucose or [*U*]-13C xylose as an internal standard.
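Quantification against an isotopically labeled internal standard can be illustrated with a short calculation. The sketch below is not the OpenMSI notebook used in this work. It assumes that the tagged product and the tagged 13C standard respond equally in the mass spectrometer (response factor of 1 unless calibrated otherwise), and the intensities and standard concentration in the example are hypothetical.

```python
def product_concentration(product_intensity, standard_intensity,
                          standard_conc_mm, response_factor=1.0):
    """Estimate a tagged product concentration from an internal standard.

    Assumption: signal is proportional to concentration, and the product and
    standard share the same response unless response_factor says otherwise.
    """
    if standard_intensity <= 0:
        raise ValueError("internal standard signal must be positive")
    ratio = product_intensity / standard_intensity
    return response_factor * ratio * standard_conc_mm


# Hypothetical example values (not measured data).
example_conc = product_concentration(
    product_intensity=3000.0,
    standard_intensity=1500.0,
    standard_conc_mm=0.19,
)
print(f"estimated product concentration: {example_conc:.2f} mM")
```

Because the standard is added with the tagging solution, it passes through the same tagging, printing, and ionization steps as the analyte, which is why a simple intensity ratio can serve as a first estimate.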

## RESULTS AND DISCUSSION

### Substrate Panel

**Table 2** lists some of the properties of the substrates included in the substrate panel. More detailed rationale for selection of individual substrates is described in the following experimental sections. The following general principles were used to assemble the substrate panel: (1) substrates should be readily available; (2) some substrates should have known structures (e.g., cellotetraose, xylotetraose, mannotetraose) so that enzyme specificity can be rapidly determined; (3) some substrates should be plant biomass substrates such as SG; and (4) examples of different pretreatments of the same biomass should be included. Based on evolving needs, other substrates can be added to the panel using similar principles.

After selection of these 12 substrates, we sourced large quantities of each substrate so as to permit detailed analytical characterizations and extended experimentation. Each of the substrates was acquired and then divided such that our two institutions (GLBRC and JBEI) each had large aliquots of each substrate.

## Enzymes Selected for Platform Testing

To test this assay platform, we used catalytic domains from three *C. thermocellum* enzymes. In earlier studies (Deng et al., 2014; Takasuka et al., 2014), we showed that CelAcc, CelRcc, and CelEcc were able to release a variety of oligosaccharide products from IL- and AFEX-pretreated SG. As these enzymes are normally included in the cellulosome, we removed the dockerin domains and instead fused the catalytic domains to CBM3a (Yaniv et al., 2012). This engineered addition of CBM3a targets the catalytic domain to a polysaccharide surface. Our studies also showed that CelEcc was a multifunctional GH5 catalytic domain that reacted with cellulose, xylan, and mannan (classified as CMX). This broad specificity of reaction provides opportunities to understand contributions of an identical active site to biomass hydrolysis. By contrast, CelRcc\_CBM3a reacted only with cellulose (classified as C), while CelAcc\_CBM3a reacted with cellulose and only weakly with xylan (classified as CX), and so provided more specific enzyme reactions.

## Automated Enzyme Assay Platform

Efforts have been made to establish a simple automated workflow that enables high-throughput and that minimizes assay error. In a preparatory step (**Figure 1**), insoluble substrates (**Table 2**, 4 through 12) were aliquoted into 96-well thermal cycler plates for activity assays using a Labman solids-handling robot and stored at 4°C until use. Immediately before initiating the activity assays, enzymes were manually aliquoted into 96-well plates.

The individual steps of the workflow are as follows. In step 1, reactions were prepared using a Biomek FX liquid-handling robot to add enzyme to substrate. In step 2, reactions were incubated at 60°C for 18 h with shaking. In step 3, an aliquot from the reaction plate was combined with the oxime-NIMS tagging solution using the liquid-handling robot, after which the NIMS tag reacted with reducing sugar to form a stable oxime linkage during overnight incubation. In step 4, the solution containing NIMS-tagged glycans was transferred from tagging reactions onto a NIMS chip mass spectrometry surface by sequentially (1) using the liquid-handling robot to reformat aliquots from the reactions in 96-well plates into 384-well plates and (2) using an ATS Acoustic Liquid Dispenser to print 1-nL droplets onto the chip via non-contact dispensing. In step 5, an AB SCIEX TOF/TOF 5800 mass spectrometer was used to image the NIMS chip to collect data on the tagged glycans. In step 6, the data were processed using OpenMSI (a free web-based visualization, analysis and management package for mass spectrometry imaging (MSI) data) to quantify the tagged glycans in order to compute enzyme activities.

The different characteristics of the substrates introduced challenges for automated sample handling. Thus, while soluble samples can be dispensed using standard liquid handling (**Figure 1**), insoluble samples were more challenging and required application of solid dispensing using a Labman system. As shown in **Table 2**, only 3 of the 12 substrates were soluble and could be processed using simple liquid handling, whereas the nine other substrates were insoluble and required the Labman to carry out a relatively slow protocol to generate batches of substrate-filled 96-well PCR plates that were then stored in a −20°C freezer until required.

### Reactions with Soluble Substrates

Cellotetraose (G4), xylotetraose (X4), and mannotetraose (M4) are purified oligosaccharides that can be used to study cellulase, xylanase, and mannanase activities, respectively. These soluble substrates also permit simple liquid-handling approaches. However, it is important to recognize that some GH enzymes require longer oligosaccharide chains to show effective catalysis.

The reactions of G4, X4, and M4 were screened using a 96-well enzyme plate containing individual wells of CelAcc\_CBM3a, CelRcc\_CBM3a, and CelEcc\_CBM3a, and mixtures of these three (**Table 1**). All three endoglucanases produced cellobiose as the major product from G4 under various concentrations (Tables S1–S3 in Supplementary Material). No products were observed at the lowest enzyme concentration tested (1 ng/μL) for all three enzymes. When the enzyme concentration was increased (5 ng/μL), CelEcc\_CBM3a gave complete hydrolysis of the cellotetraose present into smaller oligosaccharides, while CelRcc\_CBM3a hydrolyzed about 80% of G4 and CelAcc\_CBM3a hydrolyzed about 13% of G4. For CelAcc\_CBM3a, increasing the enzyme concentration above 0.01 μg/μL was needed to obtain complete hydrolysis. Thus, the concentration screening can give a preliminary assessment of the affinity of an enzyme for the oligosaccharide. In the hydrolysis of cellotetraose, the following effective concentrations of enzyme were determined: CelAcc\_CBM3a (10 ng/μL), CelRcc\_CBM3a (10 ng/μL), and CelEcc\_CBM3a (5 ng/μL).

Xylotetraose (X4) is an oligosaccharide of β-1,4-linked xylose units, and the cellulases CelAcc\_CBM3a and CelRcc\_CBM3a were not reactive with it. However, the multifunctional enzyme CelEcc\_CBM3a was active with xylotetraose. Increasing the enzyme concentration increased oligomer and monomer products (xylotriose, xylobiose, and xylose, Table S4 in Supplementary Material). For CelEcc\_CBM3a with xylotetraose, the minimum amount of enzyme needed for complete hydrolysis of X4 was 0.25 μg/μL. This is a 50-fold increase over the 5 ng/μL needed to hydrolyze cellotetraose, suggesting higher affinity for the hexose oligosaccharide.

None of the enzymes studied reacted with mannotetraose (M4). Since CelAcc\_CBM3a and CelRcc\_CBM3a are cellulases, this was not surprising. However, since CelEcc\_CBM3a has been shown to react with mannan and glucomannan, the lack of reaction with M4 was not expected. This may arise from the possibility that mannotetraose is not long enough to bind effectively in the catalytic channel of CelEcc\_CBM3a.

## Reactions with Avicel and PASC

Phosphoric acid swollen cellulose (Zhang et al., 2006) has become a very popular substrate to study cellulase activities because it has a relatively easily hydrolyzed amorphous habit (Sharrock, 1988; Wood, 1988; Wood and Bhat, 1988). Solid-state cross-polarization magic angle spinning 13C NMR was used to determine the crystallinity (Park et al., 2010) of both Avicel and PASC used in this work. The NMR results demonstrated that Avicel had crystallinity of 53% while PASC had crystallinity of less than 5% (Figure S1 in Supplementary Material). Compared with the microcrystalline cellulose (Avicel), amorphous PASC has the advantages of practical solubility and increased accessibility to the cellulases. Therefore, it is more easily hydrolyzed by cellulases. As expected, all three cellulase enzymes produced multiple times more soluble hexose products from PASC than Avicel (4 times for CelEcc\_CBM3a, 2.5 times for CelRcc\_CBM3a, and 3 times more for CelAcc\_CBM3a) (Tables S5 and S6 in Supplementary Material). For the hydrolysis of Avicel, CelRcc\_CBM3a performed better than CelEcc\_CBM3a and CelAcc\_CBM3a. Interestingly, the enzyme combination of CelEcc\_CBM3a and CelRcc\_CBM3a worked better than the combination of either CelAcc\_CBM3a and CelRcc\_CBM3a or CelEcc\_CBM3a and CelAcc\_CBM3a (**Figure 2**). Since the amount of glycan products produced by the combination of CelEcc\_CBM3a and CelRcc\_CBM3a was greater than the scaled contributions of the individual enzymes, this combination is demonstrated to have a synergistic effect in reaction.
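The synergy argument above compares the product released by an enzyme mixture with the sum of the scaled contributions of the individual enzymes. A minimal sketch of that comparison follows. It assumes that each enzyme's contribution scales linearly with its mass fraction in the mixture, which is one common convention and not necessarily the exact scaling used for **Figure 2**; the product values are hypothetical, not data from this study.

```python
def expected_additive(individual_products, fractions):
    """Product expected if the enzymes acted independently.

    individual_products: product released by each enzyme alone at full loading.
    fractions: mass fraction of each enzyme in the mixture (must sum to 1).
    Assumption: each contribution scales linearly with the enzyme's fraction.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("enzyme fractions must sum to 1")
    return sum(fractions[name] * individual_products[name] for name in fractions)


def degree_of_synergy(mixture_product, individual_products, fractions):
    """Ratio of observed to expected product; values above 1 suggest synergy."""
    return mixture_product / expected_additive(individual_products, fractions)


# Hypothetical product amounts in arbitrary units (not measured values).
alone = {"CelEcc_CBM3a": 1.0, "CelRcc_CBM3a": 2.0}
mix_fractions = {"CelEcc_CBM3a": 0.5, "CelRcc_CBM3a": 0.5}
synergy = degree_of_synergy(2.4, alone, mix_fractions)
print(f"degree of synergy: {synergy:.2f}")
```

A ratio near 1 would indicate purely additive behavior, whereas a ratio clearly above 1, as in this made-up example, corresponds to the synergistic case described in the text.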

## Reactions with Beechwood Xylan and Arabinoxylan

Beechwood xylan has a high xylose content (~84%) with a majority of β-1,4 linkages. Arabinoxylan consists of a mixture of arabinose and xylose in an approximate 40:60 ratio with β-1,4- and β-1,6 linkages. Our screening results showed that CelRcc\_CBM3a was unable to hydrolyze either beechwood xylan or arabinoxylan, even at high enzyme loading. By contrast, CelAcc\_CBM3a showed weak activity with beechwood xylan, and at the highest enzyme loading tested (50 mg/g xylan), about 20% of the xylan was hydrolyzed (Table S8 in Supplementary Material). Furthermore, CelEcc\_CBM3a reacted well with beechwood xylan, and at the highest enzyme concentration tested (50 mg/g xylan), about 40% of the xylan was hydrolyzed into soluble pentose products. Both CelAcc\_CBM3a and CelEcc\_CBM3a had only weak activities with arabinoxylan, even at the highest enzyme loading tested (50 mg/g xylan).

## Reactions with Galactomannan

Galactomannans are polysaccharides consisting of a mannose backbone with galactose side groups. Mannans are an important constituent of hemicellulose in some plant biomass (Malherbe et al., 2014). For example, softwoods contain 15–20% (w/w) mannans (Rodríguez-Gacio et al., 2012) and legume seeds can contain more than 30% mannans (Buckeridge, 2010). For hydrolysis of these types of biomass into simple monosaccharides, it is important to find enzymes that can efficiently degrade mannans. Consequently, galactomannan was included in the substrate panel to test for mannanase activities. The screening results show that cellulases (CelAcc\_CBM3a and CelRcc\_CBM3a) could not deconstruct galactomannan at all. However, CelEcc\_CBM3a (Table S9 in Supplementary Material) was able to hydrolyze galactomannan to hexose products (Fox et al., 2014).

It is interesting to compare the results of CelEcc\_CBM3a reaction with galactomannan and its lack of reaction with mannotetraose. Besides the length of the oligosaccharide suggested above, it is also plausible that CelEcc\_CBM3a may prefer reaction with the galactose-substituted mannans.

### Reactions with Switchgrass

The structural features of SG, including surface area, crystallinity, and the contents of cellulose, hemicellulose, and lignin, vary considerably depending on pretreatment (Xu and Huang, 2014). For example, cellulose is changed from cellulose I (untreated) to cellulose II after ionic liquid pretreatment (Cui et al., 2014). Other factors, such as the degree of delignification, hemicellulose solubilization (especially in dilute acid pretreatment), and changes in porosity, affect enzyme accessibility, so that different glycan products are produced from differently pretreated SG. The compositional analysis of these four pretreated samples demonstrates the significant differences caused by pretreatment (Table S10 in Supplementary Material). These compositional differences imply the need for customized enzyme cocktails.

Four types of pretreated SG were included in the substrate panel to permit comparative studies of the consequences of pretreatment on enzymatic saccharification. These are UT-SG as control, IL-SG (Li et al., 2010), AFEX-SG (Bals et al., 2010), and DA-SG (Pu et al., 2013). This selection covers the three predominant pretreatment methods under investigation by the US-DOE funded Bioenergy Research Centers (Singh et al., 2015).

Screening results show that neither the three individual enzymes (CelAcc\_CBM3a, CelRcc\_CBM3a, and CelEcc\_CBM3a) nor their combinations could deconstruct UT-SG. This is consistent with the substantial value of pretreatment before enzymatic saccharification.

For DA-SG, all assays across the breadth of enzyme concentrations gave similar, low yields for both hexose and pentose products. The relatively low reactivity of these individual enzymes with crystalline cellulose is consistent with the low yield of hexose product. Presumably, the low amount of xylan remaining in DA-SG (4.58%) is not easily accessible to further enzyme hydrolysis. For example, although CelEcc\_CBM3a reacts well with pure xylan and the hemicellulose fractions in AFEX-SG and IL-SG (see below), it did not react with the remaining xylan in DA-SG. Whether this is a result of depletion of reactive substructures or other aspects of the dilute acid pretreatment is not clear.

For reactions with AFEX-SG, both CelRcc\_CBM3a and CelAcc\_CBM3a produced little soluble glycan and no soluble pentose. By contrast, CelEcc\_CBM3a performed much better, and yielded both hexose and pentose products. CelEcc\_CBM3a was especially reactive with the hemicellulose portion of AFEX-SG, yielding about 2.5× more pentose products than in its reaction with IL-SG (Table S11 in Supplementary Material).

For IL-SG, all three enzymes (**Figure 3**) produced significant amounts of hexose products (Table S12 in Supplementary Material), which can be attributed to the ability of the IL pretreatment to reduce the crystallinity of cellulose. CelAcc\_CBM3a produced the most hexose products among these three enzymes, and under the highest enzyme loading of 50 mg/g biomass a conversion of 24% of the glycan was observed. A comparison of the reaction of CelEcc\_CBM3a with either IL-SG or AFEX-SG under the same experimental conditions (enzyme loading of 50 mg/g biomass) showed that CelEcc\_CBM3a produced 8× more hexose products with IL-SG than with AFEX-SG. This result from the automated platform matches the conclusion from earlier oxime-NIMS studies (Deng et al., 2014) that cellulose in IL-SG is much more easily accessed and digested by CelEcc\_CBM3a.

## CONCLUSION

In this work, we described a panel of 12 substrates and an automated platform for characterization of GH enzymes using the oxime-NIMS approach. Standardization is an important step toward more in-depth comparison of GH enzyme activities, both from natural environments and from engineered systems. These studies are consistent with our earlier assignment that CelEcc\_CBM3a is a multifunctional enzyme that has cellulase, mannanase, and hemicellulase activities. By contrast, CelAcc\_CBM3a has cellulase and only weak hemicellulase activity, while CelRcc\_CBM3a has only cellulase activity. This platform automates the handling of both solid biomass and soluble substrates, the introduction of enzymes as individuals or combinations, and the recovery of products for high-sensitivity and high-resolution mass spectral analysis. Simplex optimization of the ratios of natural or engineered enzyme combinations produced by systems biology approaches such as gene synthesis and robotic cell-free translation, as well as optimization of the pretreatment conditions, can be readily undertaken using this platform.

## AUTHOR CONTRIBUTIONS

KD, BF, and TN designed experiments. KD, JMG, JG, HT, VR-O, XC, NS, RH, TT, LB, HG-H, and SD carried out experimental work, KD, BB, BF, and TN analyzed results, and KD, JMG, BF, and TN prepared the manuscript. DL, KS, BS, and PA supervised the study. All authors read and approved the final manuscript.

## ACKNOWLEDGMENTS

The DOE Joint BioEnergy Institute and DOE Great Lakes Bioenergy Research Center are supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 and through contract DE-FC02-07ER64494, respectively. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's Nuclear Security Administration under contract DE-AC04-94AL85000.

## SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00153


**Conflict of Interest Statement:** Kai Deng and Trent R. Northen are co-inventors on a patent application that covers the oxime-NIMS assay. Taichi E. Takasuka and Brian G. Fox are co-inventors on a patent application that covers use of multifunctional enzymes. The remaining authors have no conflict of interest to declare.

*Copyright © 2015 Deng, Guenther, Gao, Bowen, Tran, Reyes-Ortiz, Cheng, Sathitsuksanoh, Heins, Takasuka, Bergeman, Geertz-Hansen, Deutsch, Loqué, Sale, Simmons, Adams, Singh, Fox and Northen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Use of nanostructure-initiator mass spectrometry to deduce selectivity of reaction in glycoside hydrolases

*Kai Deng1,2 , Taichi E. Takasuka3† , Christopher M. Bianchetti3,4 , Lai F. Bergeman3 , Paul D. Adams1,5,6 , Trent R. Northen1,5 and Brian G. Fox3,7\**

*1US Department of Energy Joint BioEnergy Institute, Emeryville, CA, USA, 2Sandia National Laboratories, Livermore, CA, USA, 3US Department of Energy Great Lakes Bioenergy Research Center, Madison, WI, USA, 4Department of Chemistry, University of Wisconsin-Oshkosh, Oshkosh, WI, USA, 5 Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 6Department of Bioengineering, University of California Berkeley, Berkeley, CA, USA, 7Department of Biochemistry, University of Wisconsin-Madison, Madison, WI, USA*

### *Edited by:*

*Robert Henry, The University of Queensland, Australia*

### *Reviewed by:*

*Lixin Cheng, Aarhus University, Denmark Chiranjeevi Thulluri, Jawaharlal Nehru Technological University Hyderabad, India*

### *\*Correspondence:*

*Brian G. Fox bgfox@biochem.wisc.edu*

### *†Present address:*

*Taichi E. Takasuka, Research Faculty of Agriculture, Hokkaido University, Sapporo, Japan*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 28 July 2015 Accepted: 02 October 2015 Published: 27 October 2015*

### *Citation:*

*Deng K, Takasuka TE, Bianchetti CM, Bergeman LF, Adams PD, Northen TR and Fox BG (2015) Use of nanostructure-initiator mass spectrometry to deduce selectivity of reaction in glycoside hydrolases. Front. Bioeng. Biotechnol. 3:165. doi: 10.3389/fbioe.2015.00165*

Chemically synthesized nanostructure-initiator mass spectrometry (NIMS) probes derivatized with tetrasaccharides were used to study the reactivity of representative *Clostridium thermocellum* β-glucosidase, endoglucanases, and cellobiohydrolase. Diagnostic patterns for reactions of these different classes of enzymes were observed. Results show sequential removal of glucose by the β-glucosidase and a progressive increase in specificity of reaction from endoglucanases to cellobiohydrolase. Time-dependent reactions of these polysaccharide-selective enzymes were modeled by numerical integration, which provides a quantitative basis to make functional distinctions among a continuum of naturally evolved catalytic properties. Consequently, our method, which combines automated protein translation with high-sensitivity and time-dependent detection of multiple products, provides a new approach to annotate glycoside hydrolase phylogenetic trees with functional measurements.

Keywords: cellulase, assay, kinetics, Nimzyme, mass spectrometry, protein engineering, biofuels

## INTRODUCTION

The enzymatic hydrolysis of plant cell wall material is a formidable task because of the complexity of the plant cell wall (Himmel et al., 2007). In most currently deployed cellulosic ethanol plants, enzyme cocktails containing multiple classes of polysaccharide-degrading enzymes are used to hydrolyze plant biomass into fermentable sugars. Understanding the function, synergy, and stability of enzymes is thus of paramount importance in biofuels production.

Polysaccharide-degrading enzymes are classified into families in the carbohydrate active enzyme (CAZy) database (Henrissat and Davies, 1997; Cantarel et al., 2009; Levasseur et al., 2013), including glycoside hydrolases (GHs), pectic lyases (PLs), carbohydrate esterases (CEs), and others. Only a small fraction of the enzymes included in CAZy have a function assigned by biochemical analyses. One root of this limitation arises from difficulties in succeeding with heterologous expression of enzymes after selection from phylogenetic trees (Watson et al., 2007; Fox et al., 2008; Markley et al., 2009; Nair et al., 2009; Pieper et al., 2013). As an option to address this limitation, we (Takasuka et al., 2014; Bianchetti et al., 2015) and others (Beebe et al., 2011, 2014; Madono et al., 2011; Hirano et al., 2013, 2015; Makino et al., 2014) have found that wheat germ cell-free protein translation can be used as an effective expression platform to make functional assignments of enzyme function.

**Abbreviations:** AFEX-SG, ammonia fiber expansion pretreated switchgrass; CBM, carbohydrate-binding module; CelE, broad specificity GH family 5 (GH5) domain from *C. thermocellum* Cthe\_0797; GH, glycoside hydrolase; IL-SG, ionic liquid pretreated switchgrass; NIMS, nanostructure-initiator mass spectrometry.

Another limitation arises from experimental complications of carrying out high-throughput multisubstrate assays to screen for enzyme function (Gerlt et al., 2011). A breadth of assay methods have been developed for GHs, including use of soluble and insoluble chromogenic and/or fluorogenic substrates, HPLC, and others (Sharrock, 1988; Decker et al., 2003; Chundawat et al., 2008; Bansal et al., 2009; Dowe, 2009; Dashtban et al., 2010; Selig et al., 2011; Eklof et al., 2012; Horn et al., 2012; Kosik et al., 2012; McCleary et al., 2012; Pena et al., 2012; Whitehead et al., 2012; Wischmann et al., 2012). Each of these approaches has intrinsic advantages, but can suffer in sensitivity, complexity of analysis, throughput time, and volumes of reagents and enzyme needed. In comparison, nanostructure-initiator mass spectrometry (NIMS) offers high sensitivity, simplicity of detection of products derived from biomass hydrolysis, microliters or smaller volumes for reaction, and options for automation (Northen et al., 2008; Deng et al., 2012; de Rond et al., 2013; Heins et al., 2014). Recently, we used oxime-NIMS and numerical integration methods to provide time-dependent, quantitative characterization of reducing sugars released by individual enzymes in reactions with pretreated biomass (Deng et al., 2014).

Here, we report a new use of NIMS to provide quantitative analysis of time-dependent reactions of cellulases. The enzymes selected for this study were from *Clostridium thermocellum*, a Gram-positive anaerobe with high cellulolytic capacity (Ding et al., 2008; Fontes and Gilbert, 2010; Smith and Bayer, 2013). The *C. thermocellum* genome encodes ~130 CAZyme domains and ~90 carbohydrate-binding module (CBM) domains (Feinberg et al., 2011). The majority of CAZyme domains also possess dockerin domains, which serve to recruit these enzymes into the cellulosome via dockerin–cohesin interactions (Ding et al., 2008; Smith and Bayer, 2013). The specific gene regulatory and protein secretory patterns of this model consolidated bioprocessing organism have also been well described (Brown et al., 2007; Gold and Martin, 2007; Roberts et al., 2010; Feinberg et al., 2011; Raman et al., 2011; Riederer et al., 2011), and many of the enzymes have been characterized. Given this state of knowledge, individual enzymes from *C. thermocellum* have proven useful for the development and testing of new approaches for assignment of GH function.

In this work, we have used chemically synthesized tetrasaccharide-NIMS probes to study the reactivity of some cellulases from *C. thermocellum*. Patterns of reactivity identified by using the tetrasaccharide-NIMS probes provide a diagnostic approach to assess reaction specificity and also provide comparative apparent rate information. Our results show diagnostic patterns for reactions of a β-glucosidase, relaxed but varied specificity of several endoglucanases and high specificity of a cellobiohydrolase with the model substrate. Time-dependent reactions of these polysaccharide-selective enzymes were modeled by numerical integration, which provides a quantitative basis to make functional distinctions among a continuum of naturally evolved reactive properties. Consequently, this method, which combines high-sensitivity detection of multiple products with quantitative numerical analysis of their time-dependent formation, provides a new approach to enhance the annotation of GH phylogenetic trees with functional measurements.

## MATERIALS AND METHODS

## Enzyme Preparation

Methods for cloning, cell-free translation, and purification of the enzymes studied have been reported elsewhere (Takasuka et al., 2014). Briefly, enzymes were cloned by PCR amplification of catalytic domains delimited by the first and last codons listed in **Table 1**. Cloned genes were transferred into an optimized wheat germ cell-free translation plasmid pEU-HSCB (Beebe et al., 2011; Takasuka et al., 2014), which is also available from the NIH Protein Structure Initiative Materials Repository (http://psimr.asu.edu/). Enzymes were prepared by cell-free translation using either bilayer or dialysis methods (Beebe et al., 2011, 2014; Makino et al., 2014), and active enzymes were identified (Takasuka et al., 2014). The enzymes listed in **Table 1** were also cloned by PCR into the *Escherichia coli* expression vector pEC\_CBM3a to create enzyme\_CBM3a fusion proteins, e.g., CelAcc\_CBM3a. The vector pEC\_CBM3a is a derivative of pEU\_HSBC\_CBM3a (Takasuka et al., 2014) that yields fusion proteins having an N-terminal enzyme catalytic domain fused by an ~40 aa linker sequence to the CBM3a domain from Cthe\_3077. A stop codon was added to the PCR primer used to amplify the 3′ end of the BglA gene so that no fusion to CBM3a was produced from pEU\_HSBC\_CBM3a. As needed, protein coding sequences were transferred between pEU and pEC vectors by use of FlexiVector cloning (Blommel et al., 2009). Methods for PCR amplification, capture and sequence verification of protein coding sequences, and transformation into *E. coli* 10G competent cells (Lucigen, Middleton, WI, USA) for DNA manipulations and *E. coli* B834 for protein expression were as previously reported (Takasuka et al., 2014). Additional details of the properties and methods for the use of pEU and pVP are described elsewhere (Aceti et al., 2015).

## Synthesis of Cellotetraose-NIMS Substrate

The cellotetraose-NIMS substrate (**Figure 1A**) is an amphiphilic molecule that has a sugar head group coupled to a perfluorinated (F17) tag. The detailed synthetic procedure has been reported previously (Deng et al., 2012).

## Enzyme Reactions

An enzyme reaction consisted of 10 μL of 50 mM phosphate, pH 6.0, supplemented with 1 μL of 1 mM cellotetraose-F17 dissolved in water. An aliquot of each enzyme preparation (containing 1–10 ng of enzyme) was added to initiate the reaction and the resulting mixture was incubated at 37°C. At times of 5, 10, 20, 40, 80, and 120 min, 0.2 μL of the reaction mixture was withdrawn for analysis.



*a First codon of the indicated gene locus that was included in the PCR primer design (Takasuka et al., 2014).*

*bLast codon of the indicated gene locus that was included in the PCR primer design.*

*c Function assigned from annotation as defined in CAZy (Cantarel et al., 2009), from experimental evidence cited in the table, or a combination of both.*

*dRepresentation of the breadth of substrate specificity for each enzyme (Deng et al., 2014). The CMX classification indicates that CelE can hydrolyze cellulose, xylan, or mannan; CX indicates that CelA and CelL can hydrolyze cellulose and xylan, while CelI, CelR, and CelK can only hydrolyze cellulose. This classification derives from reactions with pure polysaccharides and pretreated biomass (Deng et al., 2014; Takasuka et al., 2014).*

*e CBM3a was subcloned from the scaffoldin CipA gene.*

### Mass Spectrometry

In each case, 0.2 μL per reaction sample was spotted onto the NIMS surface and removed after an incubation of ~30 s. A grid drawn manually on the NIMS chip using a diamond-tip scribe helped with spotting and identification of sample spots in the spectrometer. Chips were loaded using a modified standard MALDI plate. NIMS was performed on a 4800 MALDI TOF/TOF mass analyzer from AB Sciex (Foster City, CA, USA). In each case, signal intensities were identified for the ions of the cellotetraose substrate and, when present, each product shown in **Figure 1**. For each assay, ~1000 laser shots were collected. Enzyme activities were determined by measuring the ratio of the intensity of each product to the total intensity of the ions for the cellotetraose-, cellotriose-, cellobiose-, glucose-, and aglycone-NIMS (**Figure 2**).
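The intensity-ratio calculation described above amounts to normalizing each ion signal by the summed signal of all five NIMS species in the same spectrum. The following sketch shows one way to express this. The species names follow the text, while the function name and the example intensities are hypothetical and only serve to illustrate the calculation.

```python
# The five NIMS-tagged species monitored in each spectrum.
SPECIES = ("cellotetraose", "cellotriose", "cellobiose", "glucose", "aglycone")


def species_fractions(intensities):
    """Normalize each ion intensity by the summed intensity of all five species."""
    total = sum(intensities[name] for name in SPECIES)
    if total <= 0:
        raise ValueError("total ion intensity must be positive")
    return {name: intensities[name] / total for name in SPECIES}


# Hypothetical ion intensities for one time point (arbitrary units).
example_spectrum = {
    "cellotetraose": 400.0,
    "cellotriose": 100.0,
    "cellobiose": 300.0,
    "glucose": 150.0,
    "aglycone": 50.0,
}
fractions = species_fractions(example_spectrum)
for name in SPECIES:
    print(f"{name}-NIMS: {fractions[name]:.2f}")
```

Because every species carries the same perfluorinated tag, normalizing within a spectrum in this way reduces the effect of spot-to-spot variation in absolute signal, and the resulting fractions at each time point are the quantities that a kinetic fit would use.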

### Kinetic Analyses

The time dependence of hydrolysis of the tetrasaccharide-NIMS was analyzed by non-linear global optimization of differential equations accounting for the appearance and decay of products (Deng et al., 2014) using Mathematica routine NDSolve and the Nelder-Mead simplex method for constrained minimization (Nelder and Mead, 1965). The differential equations corresponding to the kinetic scheme of **Figure 3** are as follows:

$$\text{y}[1] = \text{cellotetraose-NIMS} \tag{1}$$

$$\text{y}[2] = \text{cellotriose-NIMS} \tag{2}$$

$$\text{y}[3] = \text{cellobiose-NIMS} \tag{3}$$

$$\text{y}[4] = \text{glucose-NIMS} \tag{4}$$

$$\text{y}[5] = \text{aglycone-NIMS} \tag{5}$$

$$\text{dy}[1]/\text{d}[t] = -(\text{k1} + \text{k9} + \text{k11} + \text{k13})\,\text{y}[1][t] \tag{6}$$

$$\text{dy}[2]/\text{d}[t] = (\text{k1})\,\text{y}[1][t] - (\text{k3} + \text{k15} + \text{k17})\,\text{y}[2][t] \tag{7}$$

$$\begin{aligned} \text{dy}[3]/\text{d}[t] &= (\text{k9})\,\text{y}[1][t] + (\text{k3})\,\text{y}[2][t] \\ &- (\text{k5} + \text{k19})\,\text{y}[3][t] \end{aligned} \tag{8}$$

$$\begin{aligned} \text{dy}[4]/\text{d}[t] &= (\text{k11})\,\text{y}[1][t] + (\text{k15})\,\text{y}[2][t] \\ &+ (\text{k5})\,\text{y}[3][t] - (\text{k7})\,\text{y}[4][t] \end{aligned} \tag{9}$$

$$\begin{aligned} \text{dy}[5]/\text{d}[t] &= (\text{k13})\,\text{y}[1][t] + (\text{k17})\,\text{y}[2][t] \\ &+ (\text{k19})\,\text{y}[3][t] + (\text{k7})\,\text{y}[4][t] \end{aligned} \tag{10}$$
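As a rough illustration of how the system in Eqs 6 to 10 can be integrated numerically, the following Python sketch uses a hand-written fourth-order Runge-Kutta step in place of the Mathematica routine NDSolve that was actually used in this work. The rate constants are hypothetical placeholders, not the fitted values of Table 2, and the names `rhs`, `rk4_step`, and `simulate` are introduced here only for illustration.

```python
import math

# Hypothetical apparent rate constants (min^-1); NOT the values of Table 2.
K = {"k1": 0.05, "k3": 0.08, "k5": 0.12, "k7": 0.15,
     "k9": 0.0, "k11": 0.0, "k13": 0.0,
     "k15": 0.0, "k17": 0.0, "k19": 0.0}

SAMPLING_TIMES = [5, 10, 20, 40, 80, 120]  # min, as in the assay


def rhs(y, k):
    """Right-hand sides of Eqs 6-10; y holds y[1]..y[5] as fractions."""
    y1, y2, y3, y4, _ = y
    return [
        -(k["k1"] + k["k9"] + k["k11"] + k["k13"]) * y1,
        k["k1"] * y1 - (k["k3"] + k["k15"] + k["k17"]) * y2,
        k["k9"] * y1 + k["k3"] * y2 - (k["k5"] + k["k19"]) * y3,
        k["k11"] * y1 + k["k15"] * y2 + k["k5"] * y3 - k["k7"] * y4,
        k["k13"] * y1 + k["k17"] * y2 + k["k19"] * y3 + k["k7"] * y4,
    ]


def rk4_step(y, k, h):
    """Advance the state by one classical Runge-Kutta step of size h."""
    def shifted(base, slope, scale):
        return [b + scale * s for b, s in zip(base, slope)]
    s1 = rhs(y, k)
    s2 = rhs(shifted(y, s1, h / 2), k)
    s3 = rhs(shifted(y, s2, h / 2), k)
    s4 = rhs(shifted(y, s3, h), k)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]


def simulate(k, times, h=0.05):
    """Integrate from pure cellotetraose-NIMS; return the state at each time."""
    y, t, out = [1.0, 0.0, 0.0, 0.0, 0.0], 0.0, {}
    for target in sorted(times):
        n = max(1, math.ceil((target - t) / h))  # steps to land exactly on target
        step = (target - t) / n
        for _ in range(n):
            y = rk4_step(y, k, step)
        t = float(target)
        out[target] = y
    return out


trajectory = simulate(K, SAMPLING_TIMES)
for minute in SAMPLING_TIMES:
    print(minute, [round(v, 4) for v in trajectory[minute]])
```

Since the right-hand sides of Eqs 6 to 10 sum to zero, the five fractions should add up to one at every time point, which is a convenient sanity check on any implementation of the scheme.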

Initial guesses for apparent rate constants were made by visual inspection of the match between the results of single NDSolve calculations and the experimental data. This process was repeated iteratively until a set of initial apparent rates that adequately matched the experimental data was obtained. Successive rounds of least-squares parameter optimization, with adjustment of parameter constraints, were then carried out until the sum of squared differences between calculated and experimental values reached a minimum and no parameter was artificially constrained.
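The fitting procedure can be sketched as follows, under several loudly stated assumptions. The sketch is written in Python rather than Mathematica, it fits only the four sequential constants k1, k3, k5, and k7 with all side reactions fixed at zero, the "experimental" data are synthetic values generated from hypothetical constants, and the simplex routine is a compact textbook variant of the Nelder-Mead method with simple bound constraints (0 to 1 min^-1), not the implementation used in this work.

```python
TIMES = (5, 10, 20, 40, 80, 120)  # sampling times in min


def rhs(y, k):
    """Sequential pathway only: side reactions k9..k19 are assumed to be zero."""
    k1, k3, k5, k7 = k
    return [-k1 * y[0], k1 * y[0] - k3 * y[1], k3 * y[1] - k5 * y[2],
            k5 * y[2] - k7 * y[3], k7 * y[3]]


def rk4_step(y, k, h):
    def shifted(slope, scale):
        return [b + scale * s for b, s in zip(y, slope)]
    s1 = rhs(y, k)
    s2 = rhs(shifted(s1, h / 2), k)
    s3 = rhs(shifted(s2, h / 2), k)
    s4 = rhs(shifted(s3, h), k)
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, s1, s2, s3, s4)]


def model(k):
    """Species fractions at each sampling time, using 1-min integration steps."""
    y, rows = [1.0, 0.0, 0.0, 0.0, 0.0], []
    for minute in range(1, TIMES[-1] + 1):
        y = rk4_step(y, k, 1.0)
        if minute in TIMES:
            rows.append(y)
    return rows


def sse(k, data):
    """Sum of squared differences; infinite outside the bounds 0 <= k <= 1."""
    if min(k) < 0.0 or max(k) > 1.0:
        return float("inf")
    return sum((c - d) ** 2 for calc, obs in zip(model(k), data)
               for c, d in zip(calc, obs))


def nelder_mead(f, x0, step=0.02, max_iter=500, tol=1e-12):
    n = len(x0)
    simplex = [list(x0)] + [[x + (step if i == j else 0.0)
                             for j, x in enumerate(x0)] for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(coef):
            return [c + coef * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        fr = f(reflected)
        if fr < values[0]:                      # try to expand further
            expanded = along(2.0)
            fe = f(expanded)
            simplex[-1], values[-1] = (expanded, fe) if fe < fr else (reflected, fr)
        elif fr < values[-2]:                   # accept the reflection
            simplex[-1], values[-1] = reflected, fr
        else:                                   # contract toward the centroid
            contracted = along(-0.5)
            fc = f(contracted)
            if fc < values[-1]:
                simplex[-1], values[-1] = contracted, fc
            else:                               # shrink toward the best vertex
                best = simplex[0]
                simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                    for p in simplex[1:]]
                values = [values[0]] + [f(p) for p in simplex[1:]]
    i_best = min(range(n + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


TRUE_K = [0.05, 0.08, 0.12, 0.15]   # hypothetical constants (min^-1)
GUESS = [0.03, 0.05, 0.08, 0.10]    # stand-in for the visually chosen start
data = model(TRUE_K)                # synthetic, noise-free "measurements"

initial_sse = sse(GUESS, data)
fitted, best_sse = nelder_mead(lambda k: sse(k, data), GUESS)
print("fitted constants:", [round(k, 4) for k in fitted])
print("SSE before and after:", initial_sse, best_sse)
```

Fitting all sampling times and all five species in one objective is what makes the optimization global in the sense used here: every rate constant is constrained by the full time course rather than by a single trace.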

### RESULTS AND DISCUSSION

### Enzymes Chosen for Study

*Clostridium thermocellum* enzymes were chosen for this study based on previous transcriptomic and proteomic results (Gold and Martin, 2007; Raman et al., 2011; Riederer et al., 2011) and other biochemical and structural results (**Table 1**). Genes encoding these enzymes were expressed using wheat germ cell-free protein synthesis and the translated proteins were assayed using fluorogenic substrates (Takasuka et al., 2014); among the synthesized enzymes, 13 reacted with MUG or MUC, 11 reacted with MUX or MUX2, and 5 reacted with other diagnostic fluorogenic substrates. Reactions of these enzymes with ionic liquid pretreated switchgrass (IL-SG) have been published (Deng et al., 2014). Enzymes from cell-free translation reactions that showed promising characteristics were produced by expression in *E. coli* and purified for use in the studies described here.

## Cellotetraose-NIMS Substrate

**Figure 1** shows the structure of cellotetraose-NIMS and the products that can be formed by various GH reactions. In the synthesized probe, the tetrasaccharide is linked to the NIMS probe by a potentially hydrolyzable anomeric linkage. Syntheses of the NIMS probe and the tetrasaccharide derivatives are summarized in Materials and Methods (Deng et al., 2012; de Rond et al., 2013). The guanidinium group on the NIMS probe provides improved ionization properties in the mass spectrometry experiment, while the perfluorinated portion of the NIMS probe provides hydrophobic anchoring of the molecule into the NIMS surface. Enzyme-catalyzed hydrolysis of the anomeric linkages gives rise to a cascade of potential products retained on the NIMS surface. Reactions of GHs can progressively remove single glucose units or carry out other reactions that remove cellobiose, cellotriose, or cellotetraose.

## Kinetic Scheme

**Figure 2** shows a representative mass spectrum obtained after partial reaction with BglA (Cthe\_0212), a β-glucosidase. At the selected time point in the reaction (120 min), the cellotetraose-NIMS probe (G4, green) has been partially converted into a mixture of cellotriose (G3, red), cellobiose (G2, blue), glucose (G1, purple), and aglycone (G0, black) derivatives of the NIMS probe. **Figure 3** shows a kinetic scheme that accounts for the potential products shown in **Figure 1**. The scheme accounts for release of one or more glucose units from the cellotetraose-NIMS probe (G4) and its successive products. Time course profiles provide the fundamental data used in this work for numerical analysis of enzyme hydrolysis reactions.

## **β**-Glucosidase BglA Reaction

The nucleotide sequence of BglA (Grabnitz et al., 1991) was published before the genome sequence and annotated to be a β-glucosidase from the GH1 family (Cantarel et al., 2009). The Cthe\_0212 gene does not encode a signal peptide, so the entire gene was cloned for the studies described here. Beyond our characterization of the reaction of BglA with IL-SG (Deng et al., 2014), no other functional studies have been reported for this enzyme.

**Figure 4** shows the time course for reaction of BglA with cellotetraose-NIMS. The plotted proportions of the different products come from time series of mass spectra like those shown in **Figure 2**. The solid colored lines are results of simulations of the concentration of individual products based on the kinetic scheme of **Figure 3** and the differential equations shown in the section "Materials and Methods." The apparent rate constants provided by the numerical simulation are given in **Table 2**, and a pictorial representation of the relative magnitudes of the apparent rate constants is also given in **Figure 4**. In the time course of the BglA reaction, cellotetraose-NIMS (green circles) was converted to a succession of intermediates by hydrolysis of a single glucose from the position most distal to the NIMS probe. This pattern of reactivity is as expected for the reaction of an exo-β-glucosidase with an oligosaccharide. Thus, cellotriose-NIMS (red squares) accumulated and was subsequently converted to cellobiose-NIMS (blue diamonds), to glucose-NIMS (purple down triangles), and ultimately to aglycone-NIMS (black up triangles).

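FIGURE 4 | Time course for reaction of BglA with cellotetraose-NIMS. The relative magnitudes of the apparent rate constants from Table 2 are indicated by width of arrows in the modified kinetic scheme. A dashed line indicates that the apparent rate was zero.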

TABLE 2 | Apparent rate constants (min<sup>−1</sup>) from numerical integration of time course reactions with cellotetraose-NIMS.


*Rates for individual enzymes are color coded with smallest values in blue and larger values in red.*

Several features of the BglA reaction and simulation warrant attention. The apparent rates k1, k3, k5, and k7, which correspond to successive removal of single glucose groups, dominate the numerical solution (**Table 2**; **Figure 4**). Under the reaction conditions used, BglA was able to completely convert cellotetraose-NIMS to aglycone-NIMS. It is also noteworthy that shortening the oligosaccharide chain enhanced the rate of hydrolysis, with reactions k5 (converting cellobiose-NIMS to glucose-NIMS) and k7 (converting glucose-NIMS to aglycone-NIMS) being fastest. Other apparent rates corresponding to side reactions for removal of cellobiose or larger oligosaccharides (e.g., k9 for removal of cellobiose from cellotetraose-NIMS) were less than 1/100th of the value observed for k1, the smallest of the central reactions. These simulation results are consistent with the assigned function of BglA as a β-glucosidase. Indeed, prior oxime-NIMS studies of the reaction of BglA with IL-SG revealed that glucose was the only product released from the biomass substrate (Deng et al., 2014). In the following paragraphs, these diagnostic behaviors of a β-glucosidase are contrasted with those of two other classes of GHs, represented by five phylogenetically diverse endoglucanases and one cellobiohydrolase.

## Endoglucanase and Cellobiohydrolase Reactions

**Figure 5** shows time courses for reactions of endoglucanases CelA, CelI, CelE, CelR, CelL, and cellobiohydrolase CelK with cellotetraose-NIMS. The reactions of the individual enzymes were carried out and evaluated as described above for **Figure 4**. The appearance of the reaction time courses and the relative rates observed are markedly different from those observed for BglA. Unlike the β-glucosidase reaction, no intermediates were observed to form and decay, and the central reactions corresponding to release of glucose units were negligible. This likely reflects the requirement of endoglucanases for a longer oligosaccharide chain to occupy the active site as a determinant of productive binding and catalysis. In effect, the endoglucanases and cellobiohydrolase primarily reacted only once with the cellotetraose-NIMS probe, leading to a markedly simpler cascade of products than observed for the β-glucosidase. None of the enzymes characterized in **Figure 5** was able to carry out reactions that yielded the aglycone-NIMS product (black up triangles), suggesting unproductive binding or blocking steric interactions of the NIMS product with adjacent features of the active site. In contrast, the β-glucosidase BglA (**Figure 4**) was able to successively remove all glucose groups from cellotetraose-NIMS to yield aglycone-NIMS.

## Endoglucanase CelA Reactions

CelA (Cthe\_0269) is a GH8 endoglucanase. It is one of the most abundantly transcribed and secreted proteins in *C. thermocellum* during growth on cellulosic substrates (Brown et al., 2007; Gold and Martin, 2007; Raman et al., 2011; Riederer et al., 2011). Analysis of the crystal structure of the enzyme suggested that the substrate binding channel was optimally configured to bind a cellopentaose molecule (Alzari et al., 1996).

The functional characterizations of **Figure 5** demonstrate a progression in reaction selectivity among the enzymes studied. Resolving such a progression is a particular strength of combining time-dependent NIMS with numerical analysis. For CelA (**Figure 5A**), k11 governed removal of cellotriose from cellotetraose-NIMS, leading to the predominant accumulation of glucose-NIMS (88%, purple down triangles). The alternative removal of cellotriose via the two-step pathway of k1 (removal of glucose) and k15 (removal of cellobiose) contributed ~9% to the overall product yields, while reaction via k9 (removal of cellobiose) added only ~3% of total products as cellobiose-NIMS (blue diamonds). It is worth noting that CelA gave the slowest hydrolysis of cellotetraose-NIMS of all enzymes tested, which is reflected in the values of apparent rates reported in **Table 2** and also in the shape of the plots in **Figure 5**. This may also reflect a partial rate diminution caused by a mismatch between cellotetraose-NIMS and a preferred cellopentaose occupying the active site channel.
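The way competing first-order rate constants translate into product shares can be illustrated with a short Python sketch. For parallel first-order routes, the share of substrate consumed by each route at completion is that route's rate constant divided by the sum of the competing constants. The numerical values below are hypothetical, chosen only to echo the CelA percentages quoted above; they are not the constants of Table 2, and the sketch assumes that cellobiose-NIMS and glucose-NIMS are not hydrolyzed further.

```python
def branching_fractions(rates):
    """Share of a substrate consumed by each parallel first-order route."""
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("at least one rate constant must be positive")
    return {route: k / total for route, k in rates.items()}


# Hypothetical constants (min^-1) for the routes leaving each species.
from_g4 = {"k1": 0.0009, "k9": 0.0003, "k11": 0.0088, "k13": 0.0}
from_g3 = {"k3": 0.0, "k15": 0.002, "k17": 0.0}

g4 = branching_fractions(from_g4)
g3 = branching_fractions(from_g3)

# Final yields at completion, assuming no further hydrolysis of G2 or G1.
direct_glucose = g4["k11"]                    # cellotriose removed in one step
two_step_glucose = g4["k1"] * g3["k15"]       # glucose, then cellobiose removed
cellobiose = g4["k9"] + g4["k1"] * g3["k3"]   # cellobiose-NIMS from either route

print(f"glucose-NIMS via k11:      {direct_glucose:.2f}")
print(f"glucose-NIMS via k1 + k15: {two_step_glucose:.2f}")
print(f"cellobiose-NIMS:           {cellobiose:.2f}")
```

Multiplying the fractions along a path gives the contribution of a multistep route, which is how a minor pathway such as k1 followed by k15 can be separated from the dominant single-step reaction.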

In our earlier reactions of CelA with IL-SG (Deng et al., 2014), a mixture of glucose, cellobiose, cellotriose, and cellotetraose was observed. Other than cellotetraose, whose release from cellotetraose-NIMS was probably prevented by improper binding of the NIMS moiety in the active site channel, the suite of products given by CelA reaction with cellotetraose-NIMS was comparable to that observed from reactions with the pretreated biomass (Deng et al., 2014).

## Endoglucanase CelI, CelE, and CelR Reactions

For the reactions of CelI (**Figure 5B**), CelE (**Figure 5C**), and CelR (**Figure 5D**), the dominant pattern of preferred removal of cellotriose units to yield glucose-NIMS (purple down triangles) was retained. However, functional differences of these three enzymes were identified as the removal of cellobiose leading to cellobiose-NIMS (blue diamonds) assumed an increasing contribution to the total product distribution. For example, the observed change corresponds to an approximately eightfold increase in k9 between CelI and CelR. In the middle of these boundary enzymes, CelE was unique among the endoglucanases tested as it was also able to release a glucose unit from cellotetraose-NIMS in ~2% yield. In reactions with IL-SG and ammonia fiber expansion pretreated switchgrass (AFEX-SG) (Deng et al., 2014), these three enzymes released a mixture of glucose, cellobiose, and cellotriose, with the distribution of products in the biomass reaction shifted toward cellobiose and glucose. However, this shift is, in part, due to the ability of these enzymes to cleave solubilized cellotriose into cellobiose and glucose. Subsequent hydrolysis of released oligosaccharides could not be detected when cellotetraose-NIMS was the substrate.

FIGURE 5 | Time courses for reactions with cellotetraose-NIMS. (A) CelA; (B) CelI; (C) CelE; (D) CelR; (E) CelL; and (F) CelK.

CelI (Cthe\_0040) has a structure consisting of GH9 and two CBM3 domains (Hazlewood et al., 1993). It catalyzes the hydrolysis of 1,4-β-glucosidic linkages in cellulose and other glucans. The structure suggests the position of a tunnel that can permit the release of either cellotriose or cellobiose from cellotetraose-NIMS (PDB 2XFG, no associated publication).

CelE (Cthe\_0797) is a multidomain enzyme consisting of GH5, dockerin, and GDSL-lipase domains. Our work has shown that the GH5 domain has broad specificity for reaction with cellulose, xylan, mannan, xyloglucan, and other polysaccharides (Deng et al., 2014; Takasuka et al., 2014). The active site channel of this enzyme is open and tolerates the placement of each of these different linear and branched polysaccharides in a way that a glycosidic bond can be placed in the appropriate position for hydrolysis (Bianchetti et al., 2015). The release of cellotriose, cellobiose, and glucose from cellotetraose-NIMS is compatible with this broad specificity active site. Nevertheless, the active site is not sufficiently tolerant to remove cellotetraose, the reaction that would lead to the formation of aglycone-NIMS.

Previous studies have reported that CelR (Cthe\_0578) is a β-glucanase with preference for release of cellotetraose in reactions with amorphous cellulose (Zverlov et al., 2005). Subsequently, CelR was able to convert the longer solubilized oligosaccharide to shorter oligosaccharides. The present studies provide support for this conclusion, as k11 for release of cellotriose was the predominant reaction with cellotetraose-NIMS. Our studies of CelR in reactions with IL-SG and AFEX-SG gave glucose and cellobiose as the dominant hydrolysis products (Deng et al., 2014), suggesting a kinetically rapid conversion of longer oligosaccharides to shorter ones over the course of the reaction. Removal of cellotetraose was not observed from cellotetraose-NIMS, which, as proposed above, likely represents ineffective binding of the NIMS probe in the region adjacent to the active site.

## Endoglucanase CelL and Cellobiohydrolase CelK Reactions

We tested the cellotetraose-NIMS reactions with an additional endoglucanase, CelL (Cthe\_0405, **Figure 5E**), and a reducing end cellobiohydrolase, CelK (Cthe\_0212, **Figure 5F**). These enzymes show a shift in reaction specificity so that removal of cellobiose to produce cellobiose-NIMS (blue diamonds) became the dominant pattern of reaction. Notably, CelL had an approximately threefold enhanced ability to remove cellobiose relative to CelR because of a higher k9 value and also an ~10-fold decrease in the ability to remove cellotriose associated with a lower k11 value (**Table 2**). In reactions with IL-SG, CelL also showed a preference for release of cellobiose (Deng et al., 2014). Furthermore, although CelK also had an approximately threefold enhanced ability to remove cellobiose relative to CelR because of a higher k9 value, it showed no ability to produce either cellotriose or glucose (i.e., k1 and k11 = 0; **Table 2**).

High specificity for release of cellobiose is a characteristic reactivity of cellobiohydrolases (Amano et al., 1996; Barr et al., 1996; Divne et al., 1998), including CelK (Kataeva et al., 1999), and was also observed when CelK was reacted with IL-SG (Deng et al., 2014). Thus, cellotetraose-NIMS clearly reports on this catalytic function of CelK. There are no previously published reactivity studies or crystal structures of CelL, beyond our studies of reaction with IL-SG, where CelL showed strong preference for release of cellobiose and xylobiose from the pretreated biomass (Deng et al., 2014).

## CONCLUSION

This work establishes the utility of a chemically synthesized mass spectral probe for characterization of GHs. We have shown remarkable correspondence between the products obtained from enzyme reactions with the synthetic cellotetraose-NIMS probe and those obtained with IL- and AFEX-pretreated switchgrass (Deng et al., 2014). Because of the emerging success of robotic cell-free translation to provide active enzyme samples from synthesized genes (Takasuka et al., 2014; Bianchetti et al., 2015), the substantial advantages of automation and miniaturization afforded by the Nimzyme platform (Deng et al., 2012, 2014; de Rond et al., 2013; Heins et al., 2014), and the predictive power inherent in numerical analysis of enzyme reaction time courses (Cleland, 1975; Orsi and Tipton, 1979; Duggleby, 1995; Marangoni, 2003), our combination offers a powerful new approach for functional annotation of bioenergy phylogenetic space.

## AUTHOR CONTRIBUTIONS

KD, TT, CB, LB, PA, TN, and BF designed experiments, carried out experimental work, analyzed results, and prepared the manuscript. All authors read and approved the final manuscript.

## FUNDING

The DOE Great Lakes Bioenergy Research Center and DOE Joint BioEnergy Institute are supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-FC02-07ER64494 and through contract DE-AC02-05CH11231, respectively.

## REFERENCES

xyloglucan. *J. Biochem.* 120, 1123–1129. doi:10.1093/oxfordjournals.jbchem.a021531


structural studies including membrane proteins. *N Biotechnol.* 28, 239–249. doi:10.1016/j.nbt.2010.07.003


Marangoni, A. G. (2003). *Enzyme Kinetics*. Hoboken, NJ: John Wiley & Sons, Inc.


**Conflict of Interest Statement:** Kai Deng and Trent R. Northen are coinventors on a patent application that covers the oxime-NIMS assay. Taichi E. Takasuka, Christopher M. Bianchetti, and Brian G. Fox are coinventors on a patent application that covers use of multifunctional enzymes. Lai F. Bergeman and Paul D. Adams have no conflict of interest to declare.

*Copyright © 2015 Deng, Takasuka, Bianchetti, Bergeman, Adams, Northen and Fox. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

*Sivakumar Pattathil1,2\*, Utku Avci1,2 , Tiantian Zhang1 , Claudia L. Cardenas1† and Michael G. Hahn1,2*

*1Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA, 2Oak Ridge National Laboratory, BioEnergy Science Center (BESC), Oak Ridge, TN, USA*

### *Edited by:*

*Jason Lupoi, University of Queensland, USA*

### *Reviewed by:*

*Xu Fang, Shandong University, China; Arumugam Muthu, Council of Scientific and Industrial Research, India*

*\*Correspondence: Sivakumar Pattathil, siva@ccrc.uga.edu*

*†Present address: Claudia L. Cardenas, Central Piedmont Community College, Charlotte, NC, USA*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 20 August 2015; Accepted: 12 October 2015; Published: 28 October 2015*

### *Citation:*

*Pattathil S, Avci U, Zhang T, Cardenas CL and Hahn MG (2015) Immunological approaches to biomass characterization and utilization. Front. Bioeng. Biotechnol. 3:173. doi: 10.3389/fbioe.2015.00173*

Plant biomass is the major renewable feedstock resource for sustainable generation of alternative transportation fuels to replace fossil carbon-derived fuels. Lignocellulosic cell walls are the principal component of plant biomass. Hence, a detailed understanding of plant cell wall structure and biosynthesis is an important aspect of bioenergy research. Cell walls are dynamic in their composition and structure, varying considerably among different organs, cells, and developmental stages of plants. Tools are therefore needed that are highly efficient and broadly applicable at various levels of plant biomass-based bioenergy research. Plant cell wall glycan-directed probes have seen increasing use over the past decade as an excellent approach for the detailed characterization of cell walls. Large collections of such probes directed against most major cell wall glycans are currently available worldwide. The largest and most diverse set of such probes consists of cell wall glycan-directed monoclonal antibodies (McAbs). These McAbs can be used as immunological probes to comprehensively monitor the overall presence, extractability, and distribution patterns among cell types of most major cell wall glycan epitopes using two mutually complementary immunological approaches, glycome profiling (an *in vitro* platform) and immunolocalization (an *in situ* platform). Significant progress has been made recently in the overall understanding of plant biomass structure, composition, and modifications with the application of these immunological approaches. This review focuses on such advances made in plant biomass analyses across diverse areas of bioenergy research.

Keywords: glycome profiling, immunolocalization, cell walls, biomass, antibodies

## INTRODUCTION

### Complexity and Dynamics of Plant Cell Walls Constituting Biomass

Plant biomass, the prime feedstock for lignocellulosic biofuel production, constitutes the principal sustainable resource for renewable bioenergy. Identifying the plant biomass types that are most suitable for biofuel production and optimizing their downstream processing and utilization are at the forefront of modern-day lignocellulosic feedstock research. The focus of much of this research is the examination of diverse classes of plants for their potential as cost-effective and sustainable raw materials for biofuel production. For example, biomass materials originating from classes of plants ranging from herbaceous dicots (e.g., alfalfa), woody dicots (e.g., poplar), perennial monocots (e.g., *Agave* spp.), and herbaceous monocots (e.g., grasses such as *Miscanthus*, sugarcane, and switchgrass) to woody gymnosperms (e.g., pines) are regarded as potentially promising resources for biofuel production (Galbe and Zacchi, 2007; Gomez et al., 2008; Somerville et al., 2010).

Cell walls constitute the major part of plant biomass, and physicochemical features of these cell walls vary among biomass materials from diverse plant classes (Pauly and Keegstra, 2008; Popper, 2008; Fangel et al., 2012). For example, cell walls from grass biomass have distinct structural and compositional features [with a higher abundance of glucuronoarabinoxylans and the presence of mixed-linkage glucans (Vogel, 2008)] that are quite different from those of highly lignified woody biomass (Studer et al., 2011) or herbaceous dicot biomass (Burton et al., 2010; Liepman et al., 2010). Even within a plant, the structure and composition of cell walls can vary significantly depending on the cell types, organs, age, developmental stage, and growth environment (Freshour et al., 1996; Knox, 2008). These cell wall variations are the result of differences in the relative proportions and structural dynamics that occur among the major cell wall polymers, which include (but are not limited to) cellulose, hemicelluloses, pectic polysaccharides, and lignin (Pauly and Keegstra, 2008). Several structural models for plant cell walls have been proposed (McNeil et al., 1984; McCann and Roberts, 1991; Carpita and Gibeaut, 1993; Carpita, 1996; Cosgrove, 1997; Somerville et al., 2004; Loqué et al., 2015); all of these models focus on the primary wall. To our knowledge, no model has been proposed for secondary plant cell walls, which constitute the bulk of the biomass used for bioenergy production. In vascular plants, non-glycan components such as lignin (especially in secondary cell wall-containing tissues such as sclerenchyma and xylem cells) are important for optimal growth and development because they maintain the cell wall integrity that facilitates water transport, provide mechanical support, and contribute to defense against pathogens (Weng and Chapple, 2010; Voxeur et al., 2015).
A high abundance of lignin in cell walls is regarded as disadvantageous for biomass utilization for biofuel production as it contributes significantly to recalcitrance. Transgenic plants that are genetically modified for reduced lignin biosynthesis have been shown to exhibit reduced recalcitrance properties (Chen and Dixon, 2007; Pattathil et al., 2012b). The abundance of diverse potential plant biomass feedstocks that are available to be studied and the aforementioned variations among the cell walls constituting them pose a major challenge in lignocellulosic bioenergy research.

Research on the structure, function, and biosynthesis of plant cell walls has received new impetus with advances in genome sequencing that have made available, for the first time, whole genomes from diverse plant families. Thus, complete genomes have been sequenced for plants from diverse phylogenetic classes including both herbaceous [e.g., *Arabidopsis* (The Arabidopsis Genome Initiative, 2000); *Medicago* (Young et al., 2011)] and woody dicots [e.g., *Populus* (Tuskan et al., 2006)] and monocotyledonous grasses [e.g., maize (Schnable et al., 2009), rice (Goff et al., 2002; Yu et al., 2002), and *Brachypodium* (The International Brachypodium Initiative, 2010)]. The availability of these genome sequences has, in turn, dramatically expanded experimental access to genes and gene families involved in plant primary and secondary cell wall biosynthesis and modification. Functional characterization of cell wall-related genes and the proteins that they encode, combined with expanded research on cell wall deconstruction, has dramatically enhanced our understanding of wall features important for biomass utilization.

## Genetic Approaches to Studies of Cell Walls with Impacts on Lignocellulosic Bioenergy Research

Cell walls are known for their innate resistance to degradation and specifically to the breakdown of their complex polysaccharides into simpler fermentable sugars that can be utilized for microbial production of biofuels. This property of plant cell walls is referred to as "recalcitrance" (Himmel et al., 2007; Fu et al., 2011). Cell wall recalcitrance has been identified as the most well-documented challenge that limits biomass conversion into sustainable and cost-effective biofuel production (Himmel et al., 2007; Pauly and Keegstra, 2008; Scheller et al., 2010). Hence, identifying cell wall components that affect recalcitrance has been an important target of lignocellulosic bioenergy research (Ferraz et al., 2014). A number of plant cell wall polymers, including lignin, hemicelluloses, and pectic polysaccharides, have been shown to contribute to cell wall recalcitrance (Mohnen et al., 2008; Fu et al., 2011; Studer et al., 2011; Pattathil et al., 2012b).

Most of the studies directed toward overcoming recalcitrance focus on genetically modifying plants by specifically targeting genes involved in the biosynthesis or modification of wall polymers (Chen and Dixon, 2007; Mohnen et al., 2008; Fu et al., 2011; Studer et al., 2011; Pattathil et al., 2012b) with the objective of generating a viable, sustainable biomass crop that synthesizes cell walls with reduced recalcitrance. Identification of target genes for reducing recalcitrance has relied largely on studies in model plant systems, particularly *Arabidopsis*, followed by transfer of that information to biofuel crops. This has been particularly successful for genes and pathways that participate directly or indirectly in secondary cell wall biosynthesis and development. Secondary walls constitute the bulk of most biofuel feedstocks and thus become a main target for genetic modification (Chundawat et al., 2011; Yang et al., 2013). Secondary wall synthetic genes that have been investigated in this way include, for example, several genes that are involved in cellulose [such as various *CesA* genes (Joshi et al., 2004, 2011; Taylor et al., 2004; Brown et al., 2005; Ye et al., 2006)] and xylan biosynthesis [*IRX8* (Brown et al., 2005; Ye et al., 2006; Peña et al., 2007; Oikawa et al., 2010; Liang et al., 2013), *IRX9* (Brown et al., 2005; Lee et al., 2007, 2011a; Peña et al., 2007; Oikawa et al., 2010; Liang et al., 2013), *IRX9L* (Oikawa et al., 2010; Wu et al., 2010), *IRX14* (Oikawa et al., 2010; Wu et al., 2010; Lee et al., 2011a), *IRX14L* (Wu et al., 2010; Lee et al., 2011a), *IRX15* (Brown et al., 2011), and *IRX15L* (Brown et al., 2011)] in dicots.
In addition, a number of transcription factors including plant-specific NAC-domain transcription factors [*SND1*, *NST1*, *VND6*, and *VND7* in *Arabidopsis* (Kubo et al., 2005; Zhong et al., 2006, 2007b)], WRKY transcription factors [in *Medicago* and *Arabidopsis* (Wang et al., 2010; Wang and Dixon, 2012)], and MYB transcription factors [*MYB83* (McCarthy et al., 2009) and *MYB46* (Zhong et al., 2007a) in *Arabidopsis*] with potential involvement in secondary wall biosynthesis and development have been functionally characterized. Examples of the successful transfer of insights gained in model dicots to studies of orthologous genes in monocots include investigations of rice *IRX* orthologs involved in xylan biosynthesis and secondary wall formation (Oikawa et al., 2010) and experiments on transcription factors controlling secondary wall formation in several grasses (Handakumbura and Hazen, 2012; Shen et al., 2013; Valdivia et al., 2013). These molecular genetic approaches toward understanding and manipulating cell wall-related genes for biofuel feedstock improvement would be assisted by improved methods for rapidly identifying and characterizing the effects of genetic changes on cell wall components.

## Need for Efficient Tools for Plant Cell Wall/Biomass Analyses

The structural complexity of plant cell walls, regardless of their origin, is challenging to analyze, particularly in a high-throughput manner. To date, most of the plant cell wall analytical platforms have been based on the preparation of cell wall materials and/or extracts that are selectively enriched for particular wall polysaccharides, followed by colorimetric assays (Selvendran and O'Neill, 1987), chemical derivatizations coupled with gas chromatography (Albersheim et al., 1967; Sweet et al., 1974, 1975a,b), mass spectrometry (Lerouxel et al., 2002), and nuclear magnetic resonance spectroscopy (NMR) (Peña et al., 2008) to gain compositional and structural information about those polysaccharides. Some of these methods have been adapted for biomass analytics [see, for review, Sluiter et al. (2010)]. Overall, these tools have allowed extensive progress in delineating basic structural features of diverse classes of plant cell wall polysaccharides. However, these experimental approaches for plant cell wall/biomass analysis are time-consuming, require specialized and, in some cases, expensive equipment, are low in throughput, and usually provide information only about a single polysaccharide of specific interest. Given the number of wall components that have already been shown to influence cell wall recalcitrance, and the complex and heterogeneous nature of cell wall components in diverse plants, it is desirable to have additional tools, particularly those with higher throughput and the capability to monitor a broad spectrum of wall polymers. Over the past 10 years, immunological approaches for plant cell wall and biomass analyses have emerged as tools that are broadly applicable to multiple aspects of interest to the biofuel research community, including characterization of genetically altered plant feedstocks, investigations of the effects of diverse biomass pretreatment processes, and the effects of enzymatic or microbial deconstruction of cell walls. 
In the following sections, we review applications of two immunological tools for studies on plant biomass that employ a comprehensive collection of plant cell wall glycan-directed probes.

## PROBES FOR BIOMASS ANALYSES

Currently, well-characterized cell wall-directed probes range from small molecules (Wallace and Anderson, 2012) to larger proteinaceous probes such as carbohydrate-binding modules (CBMs) and monoclonal or polyclonal antibodies (Knox, 2008; Pattathil et al., 2010; Lee et al., 2011b). In this review, we will focus on the latter, proteinaceous class of cell wall-directed probes.

### Glycan-Directed Probes: Monoclonal Antibodies

Plant cell wall glycan-directed monoclonal antibodies (McAbs) are among the most commonly used probes for plant cell wall analyses. McAbs, commonly available as hybridoma culture supernatants, are monospecific probes that recognize specific glycan substructures (epitopes) present in plant polysaccharides (Knox, 2008; Pattathil et al., 2010). McAbs have several advantages that make them particularly suited for use as glycan-directed probes. First, since each antibody is the product of a single clonal cell line, each McAb is by definition monospecific with regard to the epitope that is recognized. This is important for studies of glycans, whose structures are frequently repetitive and whose substructures can be found in multiple macromolecular contexts (e.g., arabinogalactan epitopes present on glycoproteins and on rhamnogalacturonan I). The monospecific nature of McAbs also means that, in theory, the binding specificity of the antibody can be determined unambiguously, although this is still difficult for glycan-directed antibodies given the complexity of plant cell wall glycan structures. McAbs also typically bind to their epitopes with high affinity (*K*<sub>d</sub> ~10<sup>−6</sup> M), which makes them very sensitive reagents for detecting and quantitating molecules to which they bind. Finally, a significant advantage of McAbs is that their supply is not limited: cell lines producing them can be cryopreserved indefinitely (some hybridoma lines whose plant glycan-directed antibodies are frequently used today were generated more than 20 years ago) and can be regrown at any time to produce additional McAb, in any quantity required, that retains the binding selectivity and affinity of the original McAb. 
Currently, a worldwide collection of over 200 McAbs (Pattathil et al., 2010, 2012a) exists (**Figure 1**) that encompasses antibodies recognizing diverse structural features of most major non-cellulosic cell wall glycans, including arabinogalactans, xyloglucans, xylans, mannans, homogalacturonans, and rhamnogalacturonan I. So far, McAbs that bind reliably and specifically to rhamnogalacturonan II have not been reported. The available plant glycan-directed McAbs can be obtained from several stock centers (see **Table 1**) or from the individual research laboratories that generated them. A listing of the McAbs currently available is not practical here. The reader is referred to a plant cell wall McAb database, Wall*Mab*DB,<sup>1</sup> where detailed descriptions of most of the currently available plant glycan-directed McAbs, including immunogen, antibody isotype, and epitope structure (to the extent known), can be obtained.

<sup>1</sup>http://www.wallmabdb.net.

### TABLE 1 | List of major CBMs currently used for plant cell wall analyses.

Early studies in our laboratory screened 130 of the plant glycan-directed McAbs available at the time for their binding specificity to 54 structurally characterized polysaccharide preparations from diverse plants (Pattathil et al., 2010). Hierarchical clustering analyses of the resultant binding response data resolved the McAbs into 19 antibody clades based on their binding specificities to the 54 plant glycans tested (Pattathil et al., 2010). A more recent study that included almost all available plant glycan-directed McAbs further resolved the antibody collection into about 31 clades of McAbs (Pattathil et al., 2012a). **Figure 1** shows the data from the most recent screening studies employing ~210 plant glycan-directed McAbs. While these broad specificity screens provide considerable information about the binding specificities of the McAbs in the collection, they do not provide complete detailed epitope information for the antibodies. Such detailed epitope characterization studies require the availability of purified, structurally characterized oligosaccharide fragments and/or purified and characterized glycosylhydrolases capable of selectively attacking epitope structures. To date, a relatively small number of plant glycan-directed McAbs have had their epitopes characterized in detail using these resources (Meikle et al., 1991, 1994; Puhlmann et al., 1994; Steffan et al., 1995; Willats et al., 2000a; Clausen et al., 2003, 2004; McCartney et al., 2005; Verhertbruggen et al., 2009; Marcus et al., 2010; Ralet et al., 2010; Pedersen et al., 2012; Schmidt et al., 2015). Recent advances in methods for immobilization of oligosaccharides on solid surfaces (Fukui et al., 2002; Wang et al., 2002; Willats et al., 2002; Blixt et al., 2004; Pedersen et al., 2012) are facilitating such epitope characterization studies, but the bottleneck remains the availability of comprehensive sets of purified, well-characterized plant glycan-related oligosaccharides.
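The grouping of McAbs into clades by hierarchical clustering of their binding responses can be illustrated with a minimal sketch. The antibody names, binding values, Euclidean distance, average linkage, and cut-off threshold below are all assumptions chosen for illustration; the screening studies cited above do not specify these details here, and this is not a reconstruction of their analysis.

```python
from itertools import combinations
from math import sqrt

# Hypothetical binding responses (arbitrary ELISA units). Each row is one
# antibody; each column is one polysaccharide preparation. All names and
# values are invented for illustration.
BINDING = {
    "mAb_A": [1.20, 1.10, 0.05, 0.02],
    "mAb_B": [1.15, 1.05, 0.10, 0.04],
    "mAb_C": [0.03, 0.08, 0.95, 1.00],
    "mAb_D": [0.06, 0.02, 1.05, 0.90],
    "mAb_E": [0.50, 0.02, 0.04, 0.03],
}


def distance(u, v):
    """Euclidean distance between two binding profiles (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(distance(data[a], data[b]) for a, b in pairs) / len(pairs)


def cluster(data, threshold):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the closest pair is farther apart than `threshold`."""
    clusters = [frozenset([name]) for name in data]
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], data),
        )
        if average_linkage(c1, c2, data) > threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return sorted(sorted(c) for c in clusters)


clades = cluster(BINDING, threshold=0.5)
print(clades)
```

In this toy data set, antibodies with similar binding patterns across the polysaccharide panel fall into the same group, while an antibody with a distinct pattern remains on its own. The number of groups obtained depends on the chosen threshold and linkage rule.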

### Carbohydrate-Binding Modules

Carbohydrate-binding modules are another set of proteinaceous probes that have been used to study plant polysaccharide localization patterns *in vivo* (Knox, 2008). CBMs are amino acid sequences that are contiguous with the catalytic domain in a carbohydrate-active enzyme and are capable of binding to a carbohydrate structural domain (McCartney et al., 2006; Knox, 2008). CBMs have been shown to enhance the efficiency of cell wall hydrolytic enzymes by facilitating sustained and close contact between their associated catalytic modules and targeted substrates (Boraston et al., 2004; Zhang et al., 2014). Although CBMs have been known to occur in several plant enzymes, most CBMs that are used as probes for cell wall glycans are microbial in origin (Boraston et al., 2004; Shoseyov et al., 2006). CBMs, in contrast to the antibody probes described above, are relatively easy to prepare, given that their gene/protein sequences are known (McCann and Knox, 2011). CBMs have been classified into 71 sequence-based families.<sup>2</sup> CBMs from approximately half of these families have been shown to bind to diverse plant cell wall polysaccharides, including cellulose (Blake et al., 2006), mannans (Filonova et al., 2007), xylans (McCartney et al., 2006), and most recently, the galactan side chains of rhamnogalacturonan I (Cid et al., 2010). Protein engineering of a xylan-binding CBM using random mutagenesis, phage-display technology, and affinity maturation has been employed to generate xyloglucan-specific CBMs (Gunnarsson et al., 2006; von Schantz et al., 2009, 2012), showing that it is possible to generate CBMs with new and heretofore unseen specificities.

<sup>2</sup>http://www.cazy.org/Carbohydrate-Binding-Modules.html.

Carbohydrate-binding modules that have been used to detect cellulose, xylan, mannan, xyloglucan, and pectic galactans in plant cells and tissues, together with information about their origins, are listed in **Table 1**. Binding of various CBMs is usually assessed by an indirect triple-labeling immunofluorescence procedure (His-tagged CBM, anti-His mouse-Ig, and anti-mouse Ig fluorescein isothiocyanate) in plant tissue sections (Knox, 2008; Hervé et al., 2010), which is slightly more complicated than the double-labeling procedure used with McAbs (Avci et al., 2012). The binding specificities exhibited by the CBMs enlarge the suite of probes available for biomass analyses, given that at least some of them bind to carbohydrate structures, such as cellulose substructures, for which no McAb probes have been developed to date. Additional advantages of the CBMs are the availability of their gene and protein sequences and the wealth of structural information, including in many instances X-ray crystal structures, about their binding sites. Potential disadvantages of CBMs are their typically lower affinity for their ligands and the lower selectivity of their binding sites compared with McAb probes. Nonetheless, CBMs are useful probes for analyzing biomass.

### Immunological Probes Against Lignin

Lignins are phenylpropanoid polymers comprising 5–30% of biomass weight and are considered important sources of renewable aromatics (McKendry, 2002). Lignin composition and structure vary considerably depending on the plant species and on the cell type where lignins are deposited (Ruel et al., 1994; Donaldson, 2001). For example, in gymnosperms, lignins are mainly composed of guaiacyl units, whereas in angiosperms, lignins are formed by guaiacyl and syringyl units (Donaldson, 2001). In angiosperms, the guaiacyl-containing lignins are located mainly in secondary cell walls of vessels while syringyl-containing lignins are found in fibers (Ruel et al., 1994; Joseleau et al., 2004; Patten et al., 2010). Lignin composition and localization are also affected by pretreatment strategies aimed at removing lignin from biomass. For example, potassium permanganate labeling and electron microscopy studies revealed morphological alterations in *Zea mays* lignins subjected to different thermochemical pretreatments (Donohoe et al., 2008).

Lignin is most frequently visualized in plant tissue sections using selectively reactive histochemical stains, such as phloroglucinol–HCl and the Mäule reaction, that can distinguish guaiacyl-enriched from syringyl-enriched cell wall regions (Patten et al., 2010). Although the various histochemical lignin stains provide general information about the localization of different lignin types, they cannot provide detailed information about specific lignin substructures; this would require more highly selective probes.

Given the structural complexity and variability of lignin, several laboratories have undertaken the development of immunological probes for lignins and/or lignin substructures. Much of the early work in this area focused on the production of polyclonal antisera. Thus, polyclonal antisera were raised against synthetic dehydrogenative polymers (DHPs) prepared from the appropriate *p*-hydroxycinnamic alcohols [*p*-hydroxyphenylpropane (H), guaiacyl (G), or syringyl (S), or mixtures of these] (Ruel et al., 1994; Joseleau et al., 2004). These polyclonal sera showed specificity toward the DHPs used to generate them. Other laboratories have generated polyclonal sera against milled wood lignin (Kim and Koh, 1997) or model compounds based on lignin substructures (Kukkola et al., 2003, 2004). The main difficulty with these polyclonal sera is that they are in limited supply, and many of these antisera are no longer available. Thus, new immunizations must be carried out, with uncertain outcomes with regard to the ability to reproduce the specificity of the original antisera, which is a fundamental problem with polyclonal antisera. In an effort to overcome this limited supply issue, two lignin-related model compounds, dehydrodiconiferyl alcohol and pinoresinol, were used to generate McAbs against these two lignin dimers (Kiyoto et al., 2013); supplies of these antibodies should not be limited. The antibody directed to dehydrodiconiferyl alcohol (KM1) displayed specificity toward a dehydrodiconiferyl alcohol 8-5′ model compound, whereas the antibody directed against pinoresinol (KM2) responded to two 8-8′ model compounds, pinoresinol and syringaresinol. This recent development suggests that it will be possible, in principle, to generate specific McAbs against diverse lignin substructures. 
The number and diversity of lignin-directed McAbs will need to be increased in order to fully exploit these probes for greater insights into lignin structural diversity, localization patterns, and integration into the plant cell wall.

## TWO MAJOR APPROACHES FOR McAb/CBM-BASED ANALYSES OF PLANT BIOMASS

The use of McAb/CBM probes to define the localization of plant cell wall components has a long history. These probes have been used in basic plant cell wall research to study the effects of mutations in wall-related genes on plant cell wall structure and composition, to study changes in plant cell walls during growth, development, and differentiation, and to study changes in plant cell walls that result from environmental and pathogenic influences. A comprehensive review of this literature is beyond the scope of this minireview and the reader is referred to several recent reviews to gain an overview of this literature (Knox, 1997, 2008; Willats et al., 2000b; Lee et al., 2011b; McCann and Knox, 2011). The use of McAb probes, in particular, is rapidly expanding due to the recent dramatic increase in the number and diversity of plant cell wall-directed antibodies (Pattathil et al., 2010) and the availability of more detailed information about the epitopes recognized by these McAbs (Pedersen et al., 2012; Schmidt et al., 2015).

We will concentrate here on an overview of recent studies that have taken advantage of the availability of the comprehensive collection of cell wall-directed McAb/CBM probes for studying plant biomass of interest as possible lignocellulosic feedstocks for biofuel production. These studies have focused on using these probes to understand the effects of genetic modification on biomass recalcitrance, to study the effects of different pretreatment regimes on biomass digestibility, and to study how microbes being considered for consolidated bioprocessing deconstruct plant biomass. Two complementary experimental approaches have been principally employed in these studies, namely, glycome profiling (Moller et al., 2007, 2008; Pattathil et al., 2012a) and immunolocalization (Avci et al., 2012). The following sections provide an overview of the studies with bioenergy implications done to date using these approaches.

### Studies Using Glycome Profiling

Glycome profiling involves the sequential extraction of insoluble cell wall/biomass samples with a series of reagents of increasing harshness and then screening the extracted cell wall materials with McAbs to determine which cell wall polymers are released in which extract. Thus, this experimental method provides two pieces of important information: (1) it provides detailed information about the composition of the biomass/cell walls; and (2) it provides information on how tightly the various wall components that can be detected are linked into the wall structure. The method is limited by the number of probes (McAbs, CBMs, etc.) used in the screen and the extent to which they are able to recognize the full breadth of wall components released by the extractive reagents. The substantial increase in number and diversity of cell wall probes over the past 10 years has dramatically improved the power and versatility of glycome profiling as a technique for rapid screening of cell wall/biomass samples.

The versatility of glycome profiling is also limited by the ability to immobilize the extracted wall components on a solid support. Diverse solid supports have been used, including nitrocellulose (Moller et al., 2007, 2008), glass slides (Pedersen et al., 2012), and multiwell plastic plates (Pattathil et al., 2012a). All of these suffer from the limitation that most low-molecular-weight cell wall components that might be released in the wall extracts, especially low-molecular-weight glycans, do not bind to the solid supports without modification and therefore cannot be assayed by glycome profiling. The lower limit of the glycan size that will adhere has not been definitively determined, but is greater than 10 kDa (Pattathil et al., 2010).

The choice of extractive reagents that have been used for glycome profiling analyses has varied, as has their order. However, typically, the extractive reagents are used in order of increasing severity. Thus, relatively mild reagents, such as CDTA (Moller et al., 2007) or oxalate (Pattathil et al., 2012a), are used first, typically extracting primarily arabinogalactans and pectins. Harsher base extractions then follow, in which primarily hemicelluloses (e.g., xylans and xyloglucans) are extracted (Moller et al., 2007; Pattathil et al., 2012a). For samples that contain significant amounts of lignin, which is the case for most biomass samples of interest to the biofuel industry, an acidic chlorite extraction (Ahlgren and Goring, 1971; Selvendran et al., 1975) is used to degrade the lignin and release lignin-associated wall glycans; this chlorite extraction has most frequently been used after the first base extractions (Pattathil et al., 2012a) but has also been used as the first extraction step (de Souza et al., 2013). None of the extraction sequences used to date yield exclusively one kind of polymer in any given extract, an indication that each wall glycan exists as different subclasses that vary in their extent of cross-linking/interactions within the wall. Ultimately, the choice of extraction reagents and their order depends on the individual investigator and the specific research questions under investigation.

Two approaches for glycome profiling of plant biomass/cell wall samples have been described. The first, termed comprehensive microarray polymer profiling (CoMPP), is a dot blot-based assay system utilizing nitrocellulose as the solid support (Moller et al., 2007, 2008) and typically employs ~20 glycan-directed probes for screening of three sequential extracts [CDTA (50 mM), 4M NaOH, and Cadoxen (33%; v/v)] prepared from plant cell walls. The number of glycan-directed probes that could be used in CoMPP can readily be expanded. An alternative, ELISA-based approach, termed glycome profiling, uses 384-well microtiter plates as the solid support, and uses a broadly diverse toolkit of 155 plant glycan-directed McAbs (Pattathil et al., 2012a) to screen sets of sequentially prepared plant biomass/cell wall extracts [typically, oxalate (50 mM), carbonate (50 mM), 1M KOH, 4M KOH, acidified chlorite, and 4M KOH post-chlorite]. The use of a suite of 155 McAbs ensures a wide-ranging coverage of multiple structural features on most of the major non-cellulosic plant wall glycans (Zhu et al., 2010; Pattathil et al., 2012a). The ELISA-based approach used in glycome profiling lends itself to facile automation and quantitation of antibody binding, hence substantially increasing the throughput of the analyses.
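As a rough illustration of how data from such a sequential extraction series might be organized and summarized, the sketch below stores one signal per probe per extract, in the extraction order listed above, and reports the mildest extract in which each probe's epitope is detected and the extract giving the strongest signal. The probe names, signal values, and detection cut-off are hypothetical; real glycome profiles involve many more probes and are typically displayed as heat maps.

```python
# Extraction order taken from the ELISA-based protocol described in the text,
# from mildest to harshest reagent.
EXTRACT_ORDER = [
    "oxalate",
    "carbonate",
    "1M KOH",
    "4M KOH",
    "chlorite",
    "4M KOH post-chlorite",
]

# Hypothetical ELISA signals (arbitrary units), one value per extract in the
# order above. Probe names and numbers are invented for illustration.
PROFILE = {
    "pectin_probe": [0.9, 0.7, 0.3, 0.1, 0.2, 0.0],
    "xylan_probe": [0.0, 0.1, 0.8, 1.1, 0.4, 0.6],
    "xyloglucan_probe": [0.0, 0.0, 0.1, 0.9, 0.1, 0.3],
}


def mildest_releasing_extract(signals, cutoff=0.2):
    """Return the first (mildest) extract whose signal reaches the cutoff,
    or None if the epitope is not detected in any extract."""
    for extract, signal in zip(EXTRACT_ORDER, signals):
        if signal >= cutoff:
            return extract
    return None


def peak_extract(signals):
    """Return the extract with the highest signal for this probe."""
    index = max(range(len(signals)), key=signals.__getitem__)
    return EXTRACT_ORDER[index]


summary = {
    probe: (mildest_releasing_extract(signals), peak_extract(signals))
    for probe, signals in PROFILE.items()
}

for probe, (first, peak) in summary.items():
    print(f"{probe}: first detected in {first}; strongest in {peak}")
```

An epitope first detected only in the harsher extracts would, under these assumptions, be read as more tightly integrated into the wall than one released by the mild reagents.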

Glycome profiling has seen broad application to diverse experimental approaches in lignocellulosic bioenergy research, including analyzing cell walls from native/genetically modified, variously pretreated, and microbially/enzymatically converted plant biomass (DeMartini et al., 2011; Duceppe et al., 2012; Lee et al., 2012; Tan et al., 2013; Biswal et al., 2015; de Souza et al., 2015; Pattathil et al., 2015; Trajano et al., 2015). Both CoMPP and glycome profiling have been used to undertake comparative glycomics of plant cell wall samples originating from diverse plant phylogenies (Popper et al., 2011; Sørensen et al., 2011; Duceppe et al., 2012; Kulkarni et al., 2012). Examples of such analyses applied to questions related to bioenergy research include a recent study assessing the genetic variability of cell wall degradability of a selected number of *Medicago* cultivars with superior saccharification properties (Duceppe et al., 2012) and an examination of five grass species that revealed commonalities and variations in the overall wall composition and extractability of epitopes among these grasses (Kulkarni et al., 2012). Glycome profiling has also been employed as an effective tool for analyzing cell walls from biomass crops that are genetically modified with the aim of reducing recalcitrance. Examples include examination of the effects on recalcitrance of mutations in lignin biosynthesis in alfalfa [*cad1* (cinnamyl alcohol dehydrogenase 1) (Zhao et al., 2013) and *hct* (hydroxycinnamoyl CoA:shikimate hydroxycinnamoyl transferase) (Pattathil et al., 2012b)] and overexpression of the secondary wall-related transcription factor, PvMYB4 in switchgrass (Shen et al., 2013).

Analyses using cell wall-directed probes have allowed the rapid identification and monitoring of structural and compositional alterations that occur in plant biomass under various regimes of pretreatments (Alonso-Simón et al., 2010; DeMartini et al., 2011; Li et al., 2014; Socha et al., 2014; Pattathil et al., 2015; Trajano et al., 2015). Studies on hydrothermally pretreated wheat straw using CoMPP showed that severe pretreatment regimes induce significant alterations in wheat straw biomass, including reduction in various hemicellulose and mixed-linkage glucan epitopes (Alonso-Simón et al., 2010). In a more recent study, glycome profiling of poplar biomass subjected to low, medium, and severe hydrothermal pretreatment regimes demonstrated that a series of structural and compositional changes occur in poplar cell walls during this pretreatment, including the rapid disruption of lignin–polysaccharide interactions even under mild conditions, with a concomitant loss of pectins and arabinogalactans, followed by significant removal of hemicellulose (xylans and xyloglucans) (DeMartini et al., 2011). The major inference from this study was that lignin content *per se* does not affect recalcitrance; instead, it is the associations/cross-links between polymers, for example, between lignin and various polysaccharides, within cell walls that play a larger role (DeMartini et al., 2011). Glycome profiling has also been used to examine the effects of other types of pretreatment regimes such as Ammonia Fiber Expansion (AFEX™), alkaline hydrogen peroxide (AHP), and various types of ionic liquids (ILs) on the composition and extractability of wall glycan epitopes in biomass samples from diverse bioenergy crop plants (Li et al., 2014; Socha et al., 2014; Pattathil et al., 2015). 
These studies demonstrate that, unlike hydrothermal pretreatment, these three types of pretreatment, in general, cause loosening of specific classes of non-cellulosic glycans from plant cell walls, thereby contributing to the reduced recalcitrance exhibited by the pretreated biomasses. Conclusions from these studies contribute significantly to a deeper understanding of pretreatment mechanisms and ultimately will enable optimization of biomass pretreatment regimes and perhaps further downstream utilization processes for biomass from different plant feedstocks.

Glycome profiling has also been used to identify cell wall components that affect biomass recalcitrance. A recent study examined poplar and switchgrass biomass subjected to different pretreatments and correlated pretreatment-induced changes in the biomass with recalcitrance properties of the treated biomass samples (DeMartini et al., 2013). A set of samples with varying composition and structure was generated from native poplar and switchgrass biomass via defined chemical and enzymatic extraction. Subsequently, glycome profiling of the extracts was employed to delineate which wall components were removed, and the residual solid pretreated biomass samples were analyzed for their recalcitrance features. Major conclusions from this study are that pretreatment regimes affect distinct biomass samples differently and that the most important contributors to recalcitrance vary depending on the biomass. Thus, lignin content appears to play an important role in biomass recalcitrance, particularly in woody biomass such as poplar (which contains higher levels of lignin). However, subclasses of hemicellulose were key recalcitrance-causing factors in grasses such as switchgrass. These results may have important implications for the biofuel industry as they suggest that biomass-processing conditions may have to be tailored to the biomass being used as the feedstock for biofuel generation (DeMartini et al., 2013).

Another bioenergy-related area that has benefited from the use of plant cell wall glycan-directed probes is research into how microbes, particularly those being selected for biomass deconstruction, degrade plant biomass during culture. Such knowledge will be useful for bioengineering microbes for better biomass conversion. An analysis of biological conversion of unpretreated wild-type sorghum and various *brown midrib* (*bmr*) lines by *Clostridium phytofermentans* examined variations in extractable polysaccharide epitopes of the cell-wall fractions in detail using glycome profiling (Lee et al., 2012). The conclusions were that the loosely integrated xylans and pectins are the primary polysaccharide targets of *C. phytofermentans* and that these are more accessible in the *bmr* mutants than in the wild-type plants (Lee et al., 2012). In another study, an anaerobic thermophilic bacterium, *Caldicellulosiruptor bescii*, was shown to solubilize both lignin and carbohydrates simultaneously in switchgrass biomass at high temperature (Kataeva et al., 2013). Further studies with *C. bescii* demonstrated that deletion of a cluster of genes encoding pectin-degrading enzymes in this organism compromised the ability of *C. bescii* to grow on diverse biomass samples (Chung et al., 2014). A comparative analysis of hemicellulose utilization potentials of *Clostridium clariflavum* and *Clostridium thermocellum* strains demonstrated that *C. clariflavum* strains were better able to grow on untreated switchgrass biomass and degraded easily extractable xylans more readily than did *C. thermocellum* strains (Izquierdo et al., 2014). In all of these studies, glycome profiling proved to be a very effective tool for understanding what was happening to the biomass during culture with the microbes. Studies of this kind provide information about the mode of action of microbial strains on plant biomass, thus identifying wall components that are resistant/recalcitrant to microbial actions.

### Studies Using Immunolocalization

Immunolocalization techniques use fixed and embedded (generally in plastic resins) biomass samples (Knox, 1997; Lee et al., 2011b). Primary probes (polyclonal antibodies, McAbs, and CBMs) are applied to semithin sections, followed by probing with a fluorescently tagged secondary antibody that allows visualization of glycan epitope localization/distribution under a fluorescence microscope (Avci et al., 2012; Lee and Knox, 2014). This approach for biomass analyses provides information regarding the distribution of cell wall glycans at the cellular and subcellular levels.

A handful of studies thus far have employed this technique in the context of bioenergy research for analyses of cell walls in wall biosynthetic mutants and in pretreated biomass. Examination of *Arabidopsis* and *Medicago* mutants in which a WRKY transcription factor was knocked out revealed secondary cell wall thickening in pith cells caused by ectopic deposition of lignin, xylan, and cellulose. In the *Arabidopsis* mutant, this ectopic secondary wall formation resulted in an approximately 50% increase in biomass density in stem tissue (Yu et al., 2014). The use of three xylan-directed McAbs and a cellulose-directed CBM was instrumental in proving the ectopic deposition of these cell wall glycans in pith cells. In another recent study, the use of two xylan-directed CBMs (CBM2b-1-2 and CBM35 recognizing different degrees of methyl esterification on xylan) on the *Arabidopsis gxmt-1* mutant demonstrated a reduction of 4-*O*-methyl esterification of xylans (up to 75% as detected by chemical analyses) with a concomitant reduction in the recalcitrance of mutant walls (Urbanowicz et al., 2012). Additional studies also implicate secondary wall xylan in cell wall recalcitrance. Restoration of xylan synthesis in xylan-deficient mutants, as documented using xylan-directed McAbs, could, in some cases, yield plants with reduced xylan deposition compared with wild-type plants, but with normal growth habits and decreased recalcitrance (Petersen et al., 2012). Likewise, reduction of xylan in rice culm cell walls yielded plants with slightly lower stature, but with reduced recalcitrance (Chen et al., 2013).

Plant glycan-directed probes (McAbs and CBMs) can also be used to study the distribution patterns of glycan epitopes in plant biomass after diverse pretreatments used to reduce cell wall recalcitrance. One example of such a study is the demonstration that increasingly harsh hydrothermal pretreatments lead to an increased loss of various hemicellulosic, pectic, and cellulosic epitopes in cell walls of the pretreated tissues (DeMartini et al., 2011). The effects of other pretreatment methods (Alonso-Simón et al., 2010; DeMartini et al., 2013; Li et al., 2014; Socha et al., 2014; Pattathil et al., 2015; Trajano et al., 2015) on glycan epitope distribution patterns have not yet been examined. Such information could be potentially useful to chemical engineers for the optimization of pretreatment conditions to enable optimal biomass conversion.

Immunolocalization studies have documented lignin distribution patterns in plant cell walls that may be relevant to bioenergy research. For instance, cell wall ultrastructure studies using three polyclonal antisera against DHPs allowed visualization of where these types of lignin-related polymers were located in cells of *Zea mays* L. (Joseleau and Ruel, 1997), *Arabidopsis thaliana*, *Nicotiana tabacum*, and *Populus tremula* (Ruel et al., 2002). These studies showed that H-DHPs were present in cell corners and middle lamella, whereas G-DHPs and G/S-DHPs were mainly present in secondary cell walls. The syringylpropane DHP epitope was visualized mainly in the S2 layer of secondary cell walls of *A. thaliana*, *N. tabacum*, and *P. tremula* (Joseleau et al., 2004). Recently, immunogold labeling analyses using KM1 and KM2 demonstrated the presence of 8-5′ and 8-8′ linked structures, respectively, on either developed xylem or phloem fibers of *Chamaecyparis obtusa* (Kiyoto et al., 2013). It will likely be informative to use these and other lignin-directed probes to monitor lignin distribution patterns in biomass that has been subjected to various pretreatment regimes and/or subjected to microbial degradation in the context of biomass conversion.

### Concluding Remarks

The application of high affinity, highly selective molecular probes against plant cell wall polymers clearly has high potential to provide complementary and supplementary data to existing chemical and biochemical analyses for studies on plant biomass structure and conversion. The number and diversity of McAb and CBM probes directed against plant polymers is now sufficiently large that these probes can provide extensive information about cell wall composition and structure in native and pretreated or microbially digested biomass. We have reviewed two main approaches using these probes for biomass characterization and conversion studies. Both glycome profiling/CoMPP and immunolocalization methods provide distinct but complementary information about the cell walls that constitute the bulk of plant biomass. Glycome profiling and CoMPP provide extensive information about the epitope composition and epitope extractability of polymers present in the biomass. Histochemical approaches using these probes provide valuable information about the spatial distribution of wall epitopes at all levels of organization, ranging from whole plants, to organs, to tissues, to cells, and even to individual cell walls and cell wall domains.

It is important to recognize several attributes of molecular probes directed against cell wall glycan epitopes, in particular when interpreting the results of experiments. Both McAbs and CBMs are epitope-directed probes, that is, they specifically recognize particular structural motifs. Hence, glycan-directed McAbs and CBMs may not always be polymer-specific, inasmuch as glycan structures are frequently present in multiple molecular contexts within plant cell walls (e.g., arabinogalactan epitopes present on both polypeptide and polysaccharide backbones). Consequently, positive binding of a McAb or CBM probe does not necessarily imply the presence of a particular cell wall glycan polymer. Likewise, the absence of binding of a given McAb or CBM does not unambiguously imply the absence of the glycan detected by this probe; the epitope may be absent or chemically modified (e.g., acetylated or methylated) such that the probe does not bind, but the polymer may still be present (Avci et al., 2012). Furthermore, plant glycans exist as families of polymers, whose epitope composition may not be uniform among all family members. Thus, a single McAb or CBM probe may not bind to all members of a polymer family, and it is therefore advisable to use multiple probes against diverse epitopes on a particular glycan to obtain a comprehensive picture of its abundance either in cell wall extracts or in histochemical localization studies. The size and diversity of the McAb/CBM collections now make such comprehensive studies possible.

Glycome profiling and CoMPP are dependent on the successful immobilization of cell wall-derived molecules to solid supports (e.g., plastic ELISA plates or nitrocellulose). Cell wall glycans with lower molecular masses (less than 20 kDa) have been found not to adhere reliably to the plates (Pattathil et al., 2010, 2012a). Hence, using glycome profiling as a tool to gather information regarding low-molecular-weight cell wall glycans is not advisable unless alternative strategies are employed to ensure adherence of these molecules to a solid support [e.g., covalent attachment directly to the solid support (Schmidt et al., 2015) or to a protein carrier that adheres to the solid support (Pedersen et al., 2012)]. Both glycome profiling and CoMPP also rely on chemical/enzymatic extractions of biomass/cell wall samples. Such extractions are rarely complete or quantitative and thus absolute quantitation of epitope composition in biomass/cell wall samples using these approaches is problematic. Thus, these approaches are best used as initial broad glycome characterization screens, particularly in comparative studies (e.g., mutant vs. wild-type and pretreated vs. untreated) where they provide valuable information regarding changes in the cell wall/biomass samples as a result of a particular experimental manipulation. In histochemical studies, the embedding medium used may influence the results of labeling experiments; in our laboratory, we have found LR White to give the most consistent results with both McAb and CBM probes (Avci et al., 2012).

## FUTURE PERSPECTIVES

The molecular probe toolkits (McAb and CBM) currently available provide an invaluable resource for plant biomass analyses of relevance to bioenergy research and biomass conversion process development. In spite of the number and diversity of the probes currently available, there is still a need for additional probes against structural features not encompassed by the binding specificities of the probes currently available. Thus, additional probes against lignin substructures, rhamnogalacturonan II, and cellulose would further enhance the utility of the probe toolkit. In addition, coverage by the current probe collection of the epitope diversity for some cell wall glycans (e.g., mannans, glucomannans, and galactomannans) is limited. Finally, there remains a need to obtain more detailed information regarding the binding specificities of many of the molecular probes in the toolkit; about one third of the glycan-directed McAbs have had their epitope specificities characterized in detail. Efforts are underway in multiple laboratories to address these needs. Thus, we can look forward to an enhanced toolkit of probes against plant cell wall polymers in the future.

### ACKNOWLEDGMENTS

Immunological studies on biomass characterization conducted in our laboratory are supported by the BioEnergy Science Center administered by Oak Ridge National Laboratory and funded by a grant (DE-AC05-00OR22725) from the Office of Biological and Environmental Research, Office of Science, United States Department of Energy. The generation of the CCRC series of plant cell wall glycan-directed monoclonal antibodies used in this work was supported by the NSF Plant Genome Program (DBI-0421683 and IOS-0923992).



**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Pattathil, Avci, Zhang, Cardenas and Hahn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Relationships between biomass composition and liquid products formed via pyrolysis

*Fan Lin1†, Christopher L. Waters2†, Richard G. Mallinson2, Lance L. Lobban2 and Laura E. Bartley1\**

*1 Department of Microbiology and Plant Biology, University of Oklahoma, Norman, OK, USA; 2 School of Chemical, Biological, and Materials Engineering, University of Oklahoma, Norman, OK, USA*

Thermal conversion of biomass is a rapid, low-cost way to produce a dense liquid product, known as bio-oil, that can be refined to transportation fuels. However, utilization of bio-oil is challenging due to its chemical complexity, acidity, and instability – all results of the intricate nature of biomass. A clear understanding of how biomass properties impact yield and composition of thermal products will provide guidance to optimize both biomass and conditions for thermal conversion. To aid elucidation of these associations, we first describe biomass polymers, including phenolics, polysaccharides, acetyl groups, and inorganic ions, and the chemical interactions among them. We then discuss evidence for three roles (i.e., models) for biomass components in the formation of liquid pyrolysis products: (1) as direct sources, (2) as catalysts, and (3) as indirect factors whereby chemical interactions among components and/or cell wall structural features impact thermal conversion products. We highlight associations that might be utilized to optimize biomass content prior to pyrolysis, though a more detailed characterization is required to understand indirect effects. In combination with high-throughput biomass characterization techniques, this knowledge will enable identification of biomass particularly suited for biofuel production and can also guide genetic engineering of bioenergy crops to improve biomass features.

### Keywords: thermochemical conversion, plant biomass, bio-oil, lignin, polysaccharides, cell wall, fast pyrolysis, minerals

### *Edited by:*

*Jason Lupoi, Joint BioEnergy Institute, USA; University of Queensland, Australia*

### *Reviewed by:*

*Suyin Gan, The University of Nottingham Malaysia Campus, Malaysia; Xu Fang, Shandong University, China*

*\*Correspondence:*

*Laura E. Bartley lbartley@ou.edu*

*† Fan Lin and Christopher L. Waters have contributed equally to this work.*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Energy Research*

*Received: 22 July 2015 Accepted: 29 September 2015 Published: 21 October 2015*

### *Citation:*

*Lin F, Waters CL, Mallinson RG, Lobban LL and Bartley LE (2015) Relationships between biomass composition and liquid products formed via pyrolysis. Front. Energy Res. 3:45. doi: 10.3389/fenrg.2015.00045*

Biomass can be a renewable and sustainable source of transportation fuels not associated with fossil CO2 release. Numerous studies highlight the advantages of displacing petroleum fuels with industrial production of liquid fuels from thermochemical conversion of biomass (Bridgwater et al., 1999; Perlack et al., 2005; Mohan et al., 2006; NSF, 2008). Thermochemical conversion entails heating of biomass in an anoxic environment; condensation of organic liquid products, known as bio-oil; and subsequent treatment of the products with catalysts to create liquid fuels, i.e., refined bio-oil, similar to petroleum-derived gasoline or diesel. This is in contrast to biochemical conversion, which utilizes enzymes to release sugars followed by microbial production of ethanol or other fuel molecules (Somerville, 2007; Youngs and Somerville, 2012). Relative to biochemical approaches, thermal conversion has the potential to make use of all carbon (C)-containing biomass components, would allow society to retain existing infrastructure associated with liquid hydrocarbon fuels, and, due to the rapidity of the process, may reduce production costs by permitting scalability and distribution of production (Huber et al., 2006; Mettler et al., 2012). For both thermochemical and biochemical biofuels, lowering processing costs and improving fuel yields per hectare are major engineering challenges that hinder economic viability. Thermochemical fuel production also faces challenges related to maintaining a high C-yield while obtaining a fungible fuel. We posit that this latter challenge might be addressed by understanding the relationships between biomass composition and bio-oil components and using this information to alter biomass through genetic, chemical, or thermal means.

### THERMAL CONVERSION CHALLENGES

Two types of pyrolysis have been developed: fast pyrolysis and slow pyrolysis. Slow pyrolysis is usually performed over several hours and has a high solid yield, and as such has little relevance for liquid fuels production. Fast pyrolysis, in contrast, is completed within seconds at temperatures between 400 and 600°C and decomposes most of the solid biomass into a volatile mixture of various organic molecules, water, and CO/CO2. Pyrolysis oil or bio-oil constitutes the condensable portion of this vapor. Non-condensable components (primarily CO2 and CO) and a mineral-rich solid (char) are other product classes that will not be addressed here, except in that they detract from the overall C-yield of raw and refined bio-oil. Bio-oil comprises water (15–30%) plus compounds from several chemical families including the following (**Table 1**): organic acids, light (C1–C3) oxygenates, furan and furan derivatives, phenolic species with various methyl and methoxy substituents, pyrones, and sugar derivatives like levoglucosan (Faix et al., 1991a,b). Bio-oil's chemically complex nature prohibits its direct use in combustion applications or petroleum refining. The reasons for this include low heating value; ignition difficulty; high chemical reactivity, which results in oligomerization and polymerization over time and upon heating, prohibiting distillative separation (Oasmaa and Czernik, 1999; Demirbas, 2011; Patwardhan et al., 2011a); immiscibility with petroleum; and high corrosivity (Oasmaa and Czernik, 1999). Many of these features are associated with the high oxygen content of biomass and the resulting bio-oil, relative to fossil fuels.

TABLE 1 | The percentage ranges and categories of major bio-oil components.

*Source: Huber et al. (2006).*

In order to obtain desirable fuel properties and allow integration with the existing transportation fuels infrastructure (gasoline and diesel engines), the bio-oil must be chemically converted to reduce the undesirable characteristics mentioned above. Catalytic upgrading is typically used to refine bio-oil, improving its stability and making it an acceptable liquid fuel. The simplest method is hydrotreating or hydrodeoxygenation, which removes oxygen via catalytic hydrogenation (Furimsky, 2000), decreasing both the chemical reactivity and corrosivity. However, this process converts any C1–C5 oxygenates, representing as much as half of the carbon in bio-oil, to C1–C5 hydrocarbons that are too volatile for liquid fuels (Resasco, 2011). Another straightforward approach is to "crack" the pyrolysis vapors using acidic zeolite catalysts into light olefins and aromatic hydrocarbons (primarily benzene, toluene, and *o*/*m*/*p*-xylene) (Bridgwater, 1994; Carlson et al., 2008, 2009). This approach is appealing because of the lack of an external H2 requirement and the simplicity of the product streams. Furthermore, because zeolite cracking is widely used in traditional petroleum refining/valorization, the products are compatible with existing refinery infrastructure and the process is mature (Wan et al., 2015). However, zeolite cracking is hampered by a poor usable carbon yield due to the high amounts of coke, CO, and CO2 formed during the catalytic process (Carlson et al., 2008) and the concomitant rapid catalyst deactivation. Additionally, further catalytic oligomerization of the olefins and reforming of the aromatics are needed to make these products suitable for addition to refinery fuel product streams, increasing the process costs and further reducing overall carbon yield.

More advanced strategies propose to use reactions such as ketonization, condensation, alkylation, and others to retain a higher fraction of the biomass carbon in liquid fuel-range molecules (Zhu et al., 2011; Zapata et al., 2012; Pham et al., 2013; Gonzalez-Borja and Resasco, 2015). However, catalytic upgrading of any one family of compounds (e.g., light oxygenates) typically requires a catalyst and reaction conditions different from those required for another family of compounds (e.g., substituted phenolics). Moreover, catalysts used for upgrading one family of compounds may be ill-suited for other families, either facilitating undesirable reactions (breaking C–C bonds unnecessarily or increasing H:C ratios above the 2:1 optimum) or undergoing rapid deactivation due to reactions with other non-targeted bio-oil oxygenates. These upgrading challenges suggest the desirability of thermal conversion processes that produce more selective product streams, i.e., streams that each comprise fewer families of chemical compounds. Developing such thermal conversion processes would be aided by clearer knowledge of the relationship between biomass composition and thermal conversion products.
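The oxygen content and H:C ratio discussed above can be made concrete with a short calculation. The following is a minimal sketch, not part of the reviewed studies: it computes molar H:C and O:C ratios from molecular formulas for a few bio-oil compounds named in this review, with n-octane added as an assumed hydrocarbon reference. The function names and the simple formula parser are illustrative assumptions.

```python
import re

# Matches an element symbol followed by an optional count, e.g. "C6", "H10", "O".
ATOM = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a simple formula such as 'C6H10O5'.

    Illustrative parser only: no parentheses, charges, or hydrates.
    """
    counts = {}
    for element, number in ATOM.findall(formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def molar_ratios(formula):
    """Return (H:C, O:C) molar ratios for a carbon-containing formula."""
    counts = parse_formula(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        raise ValueError("formula contains no carbon: " + formula)
    return counts.get("H", 0) / carbon, counts.get("O", 0) / carbon


# Representative bio-oil compounds; n-octane is an assumed reference fuel molecule.
COMPOUNDS = {
    "levoglucosan": "C6H10O5",
    "furfural": "C5H4O2",
    "guaiacol": "C7H8O2",
    "syringol": "C8H10O3",
    "acetic acid": "C2H4O2",
    "n-octane (reference)": "C8H18",
}

for name, formula in COMPOUNDS.items():
    h_to_c, o_to_c = molar_ratios(formula)
    print(f"{name:22s} {formula:8s} H:C = {h_to_c:.2f}  O:C = {o_to_c:.2f}")
```

In this sketch, the carbohydrate-derived compounds show higher O:C ratios than the phenolics, which is consistent with the idea that raising phenolic content relative to carbohydrates lowers the oxygen content of bio-oil.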

### BIOMASS COMPOSITION AND CHEMICAL STRUCTURES

Recent reviews have addressed the general relationships between biomass composition and thermal products, such as increasing the content of phenolics relative to carbohydrates to reduce the oxygen content of bio-oil (Tanger et al., 2013). Here, we provide a more detailed description of the chemical structure and interactions among major cell wall components to aid in understanding more subtle relationships between biomass and bio-oil content. Biomass consists of cell walls that establish the structure of the plant and, to a lesser extent, non-structural components (**Table 2**). Cell walls determine the shape of leaves and stems and the cells that compose them and consist of cellulose, hemicellulose, lignin, as well as structural proteins and wall-associated mineral components (O'Neill and York, 2009; Vogel et al., 2011; Tanger et al., 2013). Non-structural components include sugars, proteins, and additional minerals (O'Neill and York, 2009; Vogel et al., 2011; Tanger et al., 2013). For example, in switchgrass, an important potential bioenergy crop, dry biomass consists of ~70% cell walls, 9% intrinsic water, 8% minerals, 6% proteins, and 5% non-structural sugars (Vogel et al., 2011). The relative fractions of different components, chemical linkages within and between polymers, and cellular patterning vary among plant species, organs, developmental stages, and growth conditions (Adler et al., 2006; El-Nashaar et al., 2009; Singh et al., 2012; Zhao et al., 2012). Here, we review the components of secondary cell walls, which are formed as plant growth ceases, as they constitute the majority of plant biomass (Pauly and Keegstra, 2008), and then discuss evidence for interactions among components. **Table 2** lists the different major and minor components of biomass and the broad ranges of their representation within biomass for biofuel conversion.



*a Percent mass composition of secondary cell walls.*

*b Pauly and Keegstra (2008).*

*c Scheller and Ulvskov (2010).*

*d As the highest percentage of xylan in Scheller and Ulvskov (2010) is higher than the highest percentage of hemicellulose in Pauly and Keegstra (2008), the highest percentage of hemicellulose is set to the highest percentage of xylan.*

*e MLG is only abundant in grasses. The maximum percentage of MLG we are aware of is that of the mature rice stem after flowering (Vega-Sanchez et al., 2012).*

*f Galactoglucomannan is only abundant in gymnosperm woods. Dicots and grasses possess <8% of mannan and galactoglucomannan (Scheller and Ulvskov, 2010).*

*g The high abundance of solubles is only for sorghum biomass. Other plants usually have less than 15% soluble content (Pauly and Keegstra, 2008).*

*h Vogel (2008).*

*i McKendry (2002).*

**Figure 1(1–3)** shows the chemical structures and atom numbering of the most abundant cell wall monomeric species.

Cellulose and hemicellulose represent 15–49% and 12–50% of biomass by dry weight, respectively (Pauly and Keegstra, 2008; Vogel, 2008; Zhao et al., 2012). Cellulose is an unbranched homopolymer of >500 β-(1,4)-linked glucose units. In plant cell walls, cellulose is primarily in the form of crystalline microfibrils consisting of approximately 36 hydrogen-bonded cellulose chains, but also has amorphous regions (Somerville, 2006; Newman et al., 2013).

Hemicelluloses are typically branched polysaccharides substituted with various sugars and acyl groups. As discussed further in the Section "Evidence Relating Biomass Content and Bio-oil Composition," the different sugar composition and linkages of hemicelluloses influence thermal products (Shafizadeh et al., 1972; Mante et al., 2014). The structure and composition of hemicellulosic polysaccharides differ depending on plant species classification, i.e., taxonomy. Major taxonomic divisions with relevance to bioenergy production are grasses, such as switchgrass and wheat; woody dicots, i.e., hardwoods, such as poplar; and woody gymnosperms, i.e., softwoods, such as pine. The most abundant grass hemicelluloses are mixed-linkage glucan (MLG) and glucuronoarabinoxylan (GAX) (Scheller and Ulvskov, 2010; Vega-Sanchez et al., 2013); the hemicelluloses of hardwood are primarily composed of glucuronoxylans (GX) but also contain a small amount of galactomannans (GM) (Pauly and Keegstra, 2008); and softwood hemicelluloses are largely galactoglucomannan (GGM) and GAXs (Scheller and Ulvskov, 2010). MLG is an unbranched glucose polymer similar to cellulose but containing both β-(1-3)- and β-(1-4)-linkages (Vega-Sanchez et al., 2012). MLG is nearly unique to the order Poales, which includes the grasses, but has also been found in horsetail (*Equisetum*). Its abundance in mature tissues and secondary cell walls has recently been recognized (Vega-Sanchez et al., 2013). Xylans consist of a β-(1-4)-linked xylose backbone with various substitutions. GXs are xylans substituted mostly by glucuronic acid and 4-*O*-methyl glucuronic acid through α-(1-2)-linkages. GAXs are not only substituted by glucuronic acid but also substituted by arabinofuranoses at the O-3, which can be further substituted by the phenylpropanoid acids, to form feruloyl- and *p*-coumaryl esters linked at the O-5 (Scheller and Ulvskov, 2010). 
Acetyl groups are often attached to the O-3 of backbone xyloses but also attach to the O-2. Unlike xylans, which mainly consist of pentoses, mannans consist of hexoses like mannose, glucose, and galactose. GM and GGM have a β-(1-4)-linked backbone with mannose or a combination of glucose and mannose, respectively. Both GM and GGM can be acetylated and substituted by α-(1-6)-linked galactoses (Scheller and Ulvskov, 2010; Rodriguez-Gacio Mdel et al., 2012; Pauly et al., 2013). Relatively depleted in secondary walls, but rich in growing primary walls of dicot species, xyloglucan and pectins are two other polysaccharides in cell walls. Xyloglucan consists of β-(1-4)-linked glucose residues, modified by xylose and other sugar residues; and pectin is another branched or unbranched polymer that is rich in galacturonic acid, rhamnose, galactose, and several other monosaccharide residues (Somerville et al., 2004; Scheller and Ulvskov, 2010).

Lignin is a cross-linked heteropolyphenol mainly assembled from three monolignols: sinapyl (S), coniferyl (G), and *p*-coumaryl (H) alcohols. As waste products are often selected as biofuel feedstocks, it is also relevant to note that lignin derived from other monolignols, such as caffeyl alcohol and 5-hydroxyconiferyl alcohol, has been found in the seedcoat of both monocots and dicots (Chen et al., 2012, 2013). Lignin structural heterogeneity and the various types of incorporated groups can lead to a variety of different depolymerization reactions during pyrolysis (Kawamoto et al., 2007). The three major lignin units, which are often traceable to the corresponding bio-oil components, differ in the degree of methoxylation of their carbon ring. S-units are methoxylated at both the O-3 and O-5 ring positions; G-units have one methoxy group at the O-3 position; and H-units lack ring methoxy groups (**3**, **Figure 1**) (Boerjan et al., 2003). Lignin units undergo oxidative coupling in the cell wall to form many types of dimers, including β–O–4, β–5, β–β, 5–5, 5–O–4, and β–1, leaving other atoms free to further polymerize, which significantly increases the structural heterogeneity of lignin. Lignin units can also be esterified with *p*-coumaryl, *p*-hydroxybenzoyl, and acetyl groups, primarily at the γ position of terminal units (Petrik et al., 2014; Lu et al., 2015). Lignin composition and acylating groups vary among plant clades (Boerjan et al., 2003). Woody dicot lignins have G- and S-units and trace amounts of H-units. Poplar wood, for example, has a G:S:H ratio of 55:45:1 (Vanholme et al., 2013). The lignin of many hardwoods is acylated by *p*-hydroxybenzoates (Lu et al., 2015) and acetyl groups in low amounts (Sarkanen et al., 1967). Biomass from other species, such as palms and kenaf, possesses a high degree of lignin acetylation (Lu and Ralph, 2002). Grass lignins also contain G- and S-units, with a slightly higher amount of H-units than woody dicots. Wheat straw, for example, has a G:S:H ratio of 64:30:6 (Bule et al., 2013). Grass lignin possesses high levels of *p*-coumarate esters (Hatfield et al., 2008) and can also be etherified by tricin and ferulic acid (Ralph et al., 1995; Lan et al., 2015), as discussed further below. Woody gymnosperm lignins are different from angiosperm lignins, being primarily composed of G-units and a lower amount of H-units (Boerjan et al., 2003).
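Because S-, G-, and H-units carry two, one, and zero ring methoxy groups, respectively, a G:S:H ratio can be condensed into an average number of methoxy groups per lignin unit. The sketch below illustrates this arithmetic with the poplar and wheat straw ratios quoted above. The function name is hypothetical, and the calculation ignores minor units and acylating groups.

```python
# Ring methoxy groups per unit type, as described in the text.
METHOXY_PER_UNIT = {"G": 1, "S": 2, "H": 0}


def mean_methoxy_per_unit(g, s, h):
    """Average ring methoxy groups per lignin unit for a G:S:H ratio.

    Illustrative simplification: only G, S, and H units are counted.
    """
    total = g + s + h
    if total <= 0:
        raise ValueError("G:S:H ratio must contain at least one unit")
    weighted = (
        g * METHOXY_PER_UNIT["G"]
        + s * METHOXY_PER_UNIT["S"]
        + h * METHOXY_PER_UNIT["H"]
    )
    return weighted / total


# G:S:H ratios quoted in the text.
poplar = mean_methoxy_per_unit(55, 45, 1)        # (55 + 90) / 101
wheat_straw = mean_methoxy_per_unit(64, 30, 6)   # (64 + 60) / 100

print(f"poplar wood: {poplar:.2f} methoxy groups per unit")
print(f"wheat straw: {wheat_straw:.2f} methoxy groups per unit")
```

Under these assumptions, the more S-rich poplar lignin carries more methoxy groups per unit than wheat straw lignin. Since pyrolysis products tend to retain the ring decoration of their parent units, this would be expected to show up in the methoxy content of the phenolic products.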

Biomass also contains inorganic elements, including Ca, K, Si, Mg, Al, S, Fe, P, Cl, and Na, and some trace elements (<0.1%), such as Mn and Ti, as determined by analysis of the ash formed by oxidation of biomass at 575°C (Masia et al., 2007; Vassilev et al., 2010). As with other biomass components, the abundance of mineral elements varies among species. In general, compared with grass biomass, woody biomass contains less ash, Cl, K, N, S, and Si, but more Ca (Vassilev et al., 2010).

Plant biomass components do not accumulate independently of each other, though their relationships are still an active area of research (Dick-Perez et al., 2011; Tan et al., 2013; Mikkelsen et al., 2015). Biomass component amounts can correlate because they are physically bound to each other through covalent and non-covalent bonds or because they accumulate in the same plant organ or stage of plant development, though a physical interaction may not exist. Because the abundance of some biomass components is correlated, the thermal products from one biomass component may also correlate with other components. For example, the abundance of cellulose correlates with the abundance of lignin in five different biomass sources (Pearson's correlation coefficient = 0.83) and lignin-derived thermal products correlate with cellulosic glucose (Mante et al., 2014). Many mineral elements are also correlated with each other, for example, N, S, and Cl; Si, Al, Fe, Na, and Ti; Ca, Mg, and Mn; K, P, S, and Cl (Vassilev et al., 2010). Numerous interactions between lignins and hemicelluloses and among hemicelluloses have been observed. Among the best-studied examples, GAXs of grasses and other recently evolved monocot species covalently link to lignin through ether bonds with ferulate esters on arabinose moieties of arabinoxylan (Bunzel et al., 2004). In poplar and spruce wood, NMR results indicate that lignin and carbohydrates are directly bonded through several types of ether linkages (Yuan et al., 2011; Du et al., 2014). The data provide evidence for ether bonds between lignin and C1, C5, and C6 atoms of pentoses and hexoses (Yuan et al., 2011). Generally, xylan is the most closely associated polysaccharide to lignin, and NMR studies have also clearly identified lignin–glucuronic acid ester bonds (Yuan et al., 2011). Also, MLGs closely coat low-substituted xylan regions, likely via non-covalent interactions (Carpita et al., 2001; Kozlova et al., 2014). 
Furthermore, some components can also affect the distribution of other components. For example, rice plants that overexpress an enzyme that cleaves MLG exhibit reduced MLG and have an altered distribution profile of Si, though they maintain the same total amount of Si (Kido et al., 2015). In sum, mounting evidence supports covalent and non-covalent interactions among cell wall polymers and components; however, these connections have been difficult to study, and questions persist about how different cell wall preparations and manipulations may alter observations.
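The correlations among biomass components described above are typically summarized with Pearson's correlation coefficient. As a minimal sketch of how such a coefficient is computed, the example below uses invented cellulose and lignin percentages for five hypothetical biomass sources. These numbers are placeholders for illustration only and are not the data of Mante et al. (2014).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical composition data (% dry weight) for five biomass sources.
cellulose_pct = [32.0, 36.5, 40.0, 43.0, 45.5]
lignin_pct = [17.0, 21.0, 20.0, 25.5, 27.0]

r = pearson_r(cellulose_pct, lignin_pct)
print(f"Pearson's r (cellulose vs. lignin) = {r:.2f}")
```

A coefficient near 1 indicates that the two components rise together across samples. As the text notes, it does not by itself distinguish physical bonding from co-accumulation in the same organ or developmental stage.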

## MODELS FOR RELATIONSHIPS BETWEEN BIOMASS COMPONENTS AND BIO-OIL PRODUCT COMPOSITION

Reaction pathways of individual biomass components to formation of thermal products have been described (Collard and Blin, 2014). However, the pyrolysis literature suggests that biomass components tend to have more complex effects on bio-oil yield and product composition than simply their quantity. Here, we introduce three possible "models" of how biomass components may influence the yield or composition of thermal products, and in Section "Evidence Relating Biomass Content and Bio-oil Composition," we discuss evidence supporting each of them. **Figure 2** provides schematic representations of the following models:

*Model 1*: Biomass components are the direct sources of thermal products. Components are converted to products through depolymerization and secondary reactions such as cracking, i.e., splitting, and recombination (**Figure 2A**).

*Model 2*: Components or their derived products act as catalysts that accelerate thermal reactions of other components, altering product yields and ratios (**Figure 2B**).

*Model 3*: Chemical interactions or structural relationships among cell wall components alter bio-oil composition and/or yield (**Figure 2C**). This "indirect" model applies when variation in a biomass component alters the yield of a chemically unrelated product in a manner not easily explained by a catalytic effect. Chemical interactions that alter products may be either covalent or non-covalent bonds between cell wall components. Structural relationships refer to correlations between components, often minor ones, and physical features of the biomass. For example, the abundance of a cell wall component may be indicative of the structure of the plant material, such as biomass bulk density differences caused by different leaf-to-stem ratios, without reflecting chemical bonding between components. As of the preparation of this review, very little evidence addresses how biological correlations affect bio-oil products, so the discussion focuses on potential chemical interactions.

## EVIDENCE RELATING BIOMASS CONTENT AND BIO-OIL COMPOSITION

Evidence in the literature for the three models described above is presented in **Table 3** and discussed below. In the reviewed experiments, relationships between biomass components and pyrolysis products have been identified by varying the starting biomass, either through experimentation on purified components, via naturally occurring variation among different biomass sources, or via pretreatment of the biomass. Most studies included in this discussion report the chemical products derived from pyrolysis of biomass or biomass components. Studies that only reported weight losses or elemental balances were not considered. The two dominant techniques in this body of literature are pyrolysis-gas chromatography/mass spectrometry, in which pyrolysis vapors from microgram- to milligram-scale samples are transported directly to a GC for analysis, and pyrolysis in a gram- to kilogram-scale reactor system followed by condensation of the vapors and subsequent chromatographic analysis of the liquid.

## Model 1: Direct Products of Cellulose, Hemicellulose, and Lignin

Thermal breakdown of purified cellulose, hemicellulose, and lignin has been relatively well studied. Levoglucosan, a six-carbon 1,6-anhydrosugar (see **Figure 1**), was identified as the main product of cellulose pyrolysis nearly a century ago (Pictet and Sarasin, 1918). Levoglucosan is formed alongside other smaller decomposition products, with maximum levoglucosan production occurring at 500°C (Shafizadeh et al., 1979). Minor products of cellulose pyrolysis are dominated by other anhydrosugars that retain all six carbons of glucose, such as 1,6-anhydroglucofuranose and 5-hydroxymethyl furfural, but also include smaller molecules, like furfural (**5**, **Figure 1**), formic acid, and glycolaldehyde, among others (Patwardhan et al., 2011b).

As with cellulose, hemicellulose pyrolysis products depend mostly on the number of carbons in the monosaccharide residues of the starting polymer (Shafizadeh et al., 1972). Pentoses and hexoses produce similar light C1–C3 oxygenates but differ in the types and selectivities (i.e., relative ratios) of heavier C4–C6 products. Consistent with expectations, pyrolysis of monosaccharides reveals that hexoses can form more unique compounds than pentoses, including pyranic species; additionally, pentoses yield more light fragmentation products than hexoses and only trace amounts of C6 and higher products (Raisanen et al., 2003).

Lignin thermal degradation products generally retain the characteristic ring decoration of the monolignols from which they originate (**3**, **Figure 1**). For example, syringol derivatives are bio-oil products derived from S-lignin units and guaiacols are products derived from G-lignin units (**6**, **Figure 1**). The derivative groups possess 1–3 carbons and/or oxygenate moieties at the fourth position (**6**, **Figure 1**). Consistent with expectations, softwood lignins yield almost exclusively guaiacyl derivatives, while hardwood lignins yield both guaiacyl and syringyl derivatives. Grasses yield not only guaiacyl, syringyl, and *p-*hydroxyphenyl derivatives but also vinylphenol, propenyl-phenols, and *p-*hydroxybenzaldehyde that are not produced during pyrolysis of softwood and hardwood (Saizjimenez and Deleeuw, 1986; Mante et al., 2014) and are likely derived from ferulate and coumarate esters (Penning et al., 2014b). Phenol derivatives are the large majority of the products formed from lignin pyrolysis; aromatic hydrocarbons and some furan derivatives are also detectable, but at very low amounts that might represent lignin sample contaminants (Saizjimenez and Deleeuw, 1986). Lignins from spruce wood with different dimer compositions also show different product distributions, including variations in the yield of major products like guaiacol (Du et al., 2014). This suggests that bonds between lignin units and the lignin structure determined by those bonds may impact pyrolysis as well.
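The assignment of phenolic pyrolysis products to their monolignol origin, as described above, can be sketched in a few lines. This is an illustration only: the mapping follows the text (guaiacol derivatives from G units, syringol derivatives from S units, *p*-hydroxyphenyl derivatives from H units), whereas the specific product names, the peak areas, and the function names are hypothetical placeholders and not data from any cited study.

```python
# Illustrative sketch: group lignin-derived pyrolysis products by monolignol origin.
# The product list and the peak areas are HYPOTHETICAL placeholders, not measured data.
from collections import defaultdict

UNIT_OF_PRODUCT = {
    "guaiacol": "G",
    "4-methylguaiacol": "G",
    "4-vinylguaiacol": "G",
    "syringol": "S",
    "4-methylsyringol": "S",
    "phenol": "H",
    "4-methylphenol": "H",
}

def unit_totals(peak_areas):
    """Sum peak areas per lignin unit type (H, G, S); unassigned products are ignored."""
    totals = defaultdict(float)
    for product, area in peak_areas.items():
        unit = UNIT_OF_PRODUCT.get(product)
        if unit is not None:
            totals[unit] += area
    return dict(totals)

def s_to_g_ratio(peak_areas):
    """Return the S/G ratio, or None when no G-derived product is present."""
    totals = unit_totals(peak_areas)
    g_total = totals.get("G", 0.0)
    return totals.get("S", 0.0) / g_total if g_total > 0 else None

# Made-up peak areas (arbitrary units) for a hardwood-like sample; furfural is not
# lignin-derived in this mapping and is therefore skipped.
example = {"guaiacol": 40.0, "4-vinylguaiacol": 10.0, "syringol": 75.0, "furfural": 20.0}
print(unit_totals(example))
print(s_to_g_ratio(example))
```

Under this sketch, a softwood-like sample with no syringol derivatives would return a ratio of zero, in line with the near-exclusive guaiacyl output described for softwood lignins.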

## Model 2: Secondary Reactions Catalyzed by Inorganic Components

The biopolymers that make up the majority of the biomass by weight are established as the primary source of bio-oil products formed during thermal degradation. However, secondary reactions occur during the pyrolysis process involving other components present within the biomass (Ponder and Richards, 1991; Kleen and Gellerstedt, 1995; Muller-Hagedorn et al., 2003; Fahmi et al., 2007; Patwardhan et al., 2010; Ronsse et al., 2012; Lou et al., 2013; Mante et al., 2014). As products form, they can interact with catalytic minerals in the residual solid. For example, levoglucosan has been shown to react on minerals present in the residual char from pyrolysis of biomass. The products formed include levoglucosenone, furan derivatives, and lighter oxygenates such as acetic acid, acetone, and acetol. Demineralization prohibits the formation of these products (Fahmi et al., 2007; Ronsse et al., 2012).

Different inorganics are responsible for different kinds of secondary reactions. In general, the presence of metal cations enhances the homolytic cleavage of pyranose ring bonds over the heterolytic cleavage of glycosidic linkages, leading to the increased formation of light oxygenate decomposition products at the expense of levoglucosan formation. While Na<sup>+</sup>, K<sup>+</sup>, Mg<sup>2+</sup>, and Ca<sup>2+</sup> all catalyze levoglucosan decomposition, the effects of group 1 (alkali metal) and group 2 (alkaline earth metal) elements differ. Increased loading of the alkali metals Na<sup>+</sup> and K<sup>+</sup> increased formic acid, glycolaldehyde, and acetol more than similar amounts of the alkaline earth metals Mg<sup>2+</sup> and Ca<sup>2+</sup>, though more furfural is produced with increasing concentrations of Mg<sup>2+</sup> and Ca<sup>2+</sup>. Additionally, the alkali metals reduce levoglucosan production at very low thresholds. This suggests that Na<sup>+</sup> and K<sup>+</sup> ultimately promote cracking reactions while Mg<sup>2+</sup> and Ca<sup>2+</sup> promote dehydration reactions (Muller-Hagedorn et al., 2003; Patwardhan et al., 2010; Eom et al., 2012).

## Model 3: Interactions and Linkages Between Primary Components

While the first two models address the direct conversion of biopolymer organic components to related bio-oil products and their further reaction catalyzed by biomass inorganics, the third addresses compositional and structural relationships among cell wall components and their impact on products. Interactions between polysaccharides and lignin have been shown to alter pyrolysis products (Du et al., 2014; Zhang et al., 2015). The cellulose–lignin interaction can lead to a decrease in levoglucosan yield and an increase in light (C1–C3) compounds, especially glycolaldehyde and furans. Based on the nature of the small products, Zhang et al. (2015) hypothesized that the cellulose–lignin interaction occupies the C6 position, disfavoring glycosidic bond cleavage that is required for the formation of levoglucosan and favoring light compound and furan formation through ring scission, rearrangement, and dehydration reactions. The strength of this effect on pyrolysis products is most pronounced in grasses, followed by softwood and then hardwood, possibly due to the increased prevalence of covalent bonds between cellulose and lignin in grass cell walls (Jin et al., 2006; Zhou et al., 2010). Hemicellulose–lignin interactions, especially the xylan–lignin interaction revealed in NMR experiments (Yuan et al., 2011), may also affect pyrolysis. Indeed, enzymatic removal of hemicelluloses from lignin–carbohydrate complexes increased coniferyl alcohol yields (Du et al., 2014).

An example of a compositional feature that may impact product distribution is the degree of acetylation of the biopolymers. As mentioned, acetyl groups decorate hemicellulose side chains and are also present in the lignin. The increased abundance of these groups in biomass correlates with increasing yields of acetic acid, methyl pyruvate, acetone, and furan; additionally, this acetylation correlates with decreasing yields of furfural and acetaldehyde (Shafizadeh et al., 1972; Mante et al., 2014). While the acetic acid and perhaps the methyl pyruvate can be explained by the direct production of these compounds upon pyrolytic decomposition (Model 1), the nature of the relationship between acetate and the furanic and other 4-carbon species has not been clearly defined. The production of the 4-carbon species may be due to an indirect effect (Model 3) or may be the result of catalytic reaction of acetate with itself (Model 2).

Several investigations (Westerhof et al., 2007; He et al., 2009; Burhenne et al., 2013) suggest that feedstock moisture content can also play a role in the yield and product distribution of the organic fraction of the bio-oil. As previously discussed, the presence of water in bio-oil precludes its direct use and creates challenges to catalytic valorization. For these reasons, biomass is typically subjected to drying prior to pyrolysis, which both reduces the required energy of the pyrolysis step and limits the water in the liquid condensate to water produced by decomposition reactions. However, the degree to which feedstock moisture should be reduced is still under investigation. Burhenne et al. (2013) found that higher feedstock moisture content led to slightly lower char and gas yields upon pyrolysis with minimal changes to the elemental composition of the char. However, this is in disagreement with Westerhof et al. (2007), who observed slightly higher char yields with increasing moisture content. The feedstock water weight fractions in the two studies spanned quite different ranges, 2.4–55.4% in Burhenne et al. versus 0–20% in Westerhof et al. Beyond impacts to the yields, He et al. (2009) studied the change in selectivity to the organic fraction produced upon pyrolysis of switchgrass with 5, 10, and 15% feedstock moisture contents. The authors found that at 500°C, the lowest moisture content feedstock produced the highest amounts of levoglucosan and acetic acid. The authors note that while significant differences in pyrolysis products were observed, they could not identify clear trends in their data. Among these studies, the observable but sometimes contradictory or unclear trends suggest that the feedstock moisture content may have multiple impacts on the pyrolysis process, possibly related to the physical location of the water in biomass.

In addition to compositional factors, morphological factors also influence the bio-oil product distributions. Biomass undergoing thermal decomposition retains its morphology even in harsh thermal treatment regimes (Pohlmann et al., 2014). Biomass is a poor conductor of heat (conductivity <0.1 W/m K) (Bridgwater et al., 1999), and large temperature gradients occur in heated biomass particles (Bryden et al., 2002). Most reactor systems for thermal degradation require size reduction of biomass particles; as an example, fluidized beds require particle sizes no larger than 2 mm (Bridgwater et al., 1999) to ensure rapid reaction. These particle sizes are larger than the tissue structures present in biomass. While the overall tissue and cellular morphology remain intact, micropore formation and shrinkage during the reaction process can occur in a non-uniform manner throughout the biomass (Davidsson and Pettersson, 2002; Pohlmann et al., 2014). Piskorz and colleagues observed decreasing liquid yields with increasing particle size, attributed to increasing incidence of secondary reactions within wood particles (Scott and Piskorz, 1984). The principles of internal and external diffusion and the impacts of tortuosity, surface area, and diffusion path lengths are all fundamental to catalytic reaction engineering, and in the case of thermal biomass conversion, these important parameters are all dictated by the reacting feedstock (Fogler, 2006). Some evidence supports the notion that different plant developmental stages, which are related to the ratio of leaves to stems and biomass density, result in different pyrolysis products. For example, switchgrass harvested at later times during the growing season produced increased yields of condensable products, relative to that from younger, leafier material (Boateng et al., 2006), though compositional and developmental differences of the starting material were not carefully assessed.
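The sensitivity to particle size can be made concrete with an order-of-magnitude estimate of the characteristic heat-conduction time, t = L²/α, where α = k/(ρ·c<sub>p</sub>) is the thermal diffusivity. The sketch below is a rough illustration rather than a reactor model: only the conductivity bound (0.1 W/m K) and the 2 mm particle size come from the text, while the density and heat capacity are assumed round numbers for dry, wood-like material, and the function name is hypothetical.

```python
# Order-of-magnitude sketch: characteristic heat-conduction time in a biomass particle.
# Only K (upper bound from the text) and the 2 mm size are taken from the review;
# RHO and CP are ASSUMED values chosen for illustration.

def conduction_time_s(length_m, k_w_per_m_k, density_kg_m3, cp_j_per_kg_k):
    """Characteristic time t = L^2 / alpha, with alpha = k / (rho * cp)."""
    alpha = k_w_per_m_k / (density_kg_m3 * cp_j_per_kg_k)
    return length_m ** 2 / alpha

K = 0.1       # W/(m K), upper bound quoted in the text
RHO = 500.0   # kg/m^3, assumed
CP = 1500.0   # J/(kg K), assumed

for diameter_mm in (0.5, 1.0, 2.0, 4.0):
    half_thickness_m = diameter_mm / 2 / 1000  # conduction path: surface to center
    t = conduction_time_s(half_thickness_m, K, RHO, CP)
    print(f"{diameter_mm:4.1f} mm particle: ~{t:.1f} s")
```

Because the time scales with the square of the conduction path, doubling the particle size quadruples the estimate, and a conductivity below the 0.1 W/m K bound lengthens it further. Under these assumptions, larger particles spend longer with steep internal temperature gradients, which is consistent with the greater incidence of secondary reactions reported for them.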

## CONCLUSION

Years of research have led to understanding of the direct pyrolysis conversion pathways of the major monomeric and polymeric constituents of biomass (Model 1, **Table 2**). The observation that these constituents often represent minor components in raw bio-oil (**Table 1**) highlights the importance of catalytic degradation (Model 2) and possibly indirect effects (Model 3) on pyrolysis products. The latter model has only recently received attention as knowledge of cell wall structures and analytical repertoires expand (Mante et al., 2014; Zhang et al., 2015). Detailed examination of the relationships between components and products is still sparse, with the biological literature providing detailed characterization of cell wall components, while the engineering literature analyzes the chemical components, or often just total yields, of different pyrolysis fractions. We would argue that further investigations on the relationships between biomass components and thermal products will allow improvement of thermal product "quality." Short of attaining (or improving on) petroleum fuel-like properties, even the criteria for a high-quality thermal product remain unclear. As discussed, this is, in part, because methods for upgrading are so dependent on bio-oil composition. Thus, methods that economically separate and/or simplify the different product streams, while still maintaining C–C bonds and overall C-content, are more likely to be amenable to catalytic upgrading.

Greater and more systematic analysis of biomass composition and pyrolysis products within species that show significant compositional variation will aid in better understanding biomass–bio-oil relationships. Much of the existing literature relies on comparisons of thermal degradation products across diverse taxonomic groups that vary greatly in cell wall composition beyond the biomass components measured (**Table 3**). An analysis of more subtle compositional differences, in which compositional factors are varied across different samples, may aid in refining biomass–bio-oil relationships. For example, genetic mutants that vary in only one component relative to near-isogenic, unmutated "wild-type" plants can directly address relationships between starting components and products (Li et al., 2012). In addition to genetically determined compositional differences, biomass composition also depends on growth conditions and developmental stage, which relates to harvest time. Taken together, the scale of the problem points to the value of developing high-throughput methods to help identify species and genotypes that are most suitable for production of specific thermal products and to guide the optimization of genetic stocks and growth conditions for bioenergy crops. Methods available to identify such "high-quality" biomass include near-infrared reflectance spectroscopy (Vogel et al., 2011), Fourier transform near-infrared spectroscopy (Liu et al., 2010), and pyrolysis molecular beam mass spectrometry, at least for lignin components (Sykes et al., 2009; Penning et al., 2014b). In general, these methods can be trained, either rationally or in a model-independent manner, to detect spectroscopic or molecular signatures in biological materials with linear or nonlinear relationships to thermal products.
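What "training" such a method means can be illustrated with the simplest possible case: a linear calibration that relates one spectral feature to a reference measurement and then predicts that trait for a new sample. The sketch below is a teaching example under strong assumptions. All numbers are synthetic and exactly linear, the trait and feature names are hypothetical, and real calibrations typically use many wavelengths and multivariate or nonlinear models.

```python
# Minimal sketch of a linear calibration: fit a reference trait against a single
# spectral feature by ordinary least squares, then predict the trait for a new sample.
# All numbers are SYNTHETIC placeholders, not data from any cited study.

def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def predict(model, x_new):
    """Apply a fitted (slope, intercept) model to a new feature value."""
    slope, intercept = model
    return slope * x_new + intercept

absorbance = [0.10, 0.20, 0.30, 0.40]   # synthetic spectral feature
lignin_pct = [12.0, 16.0, 20.0, 24.0]   # synthetic reference values, exactly linear

model = fit_line(absorbance, lignin_pct)
print(model)
print(predict(model, 0.25))
```

A calibration of this kind is only as good as the reference measurements and the range of samples used to fit it, which is one reason the systematic within-species datasets argued for above matter.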

Besides selecting or breeding for natural variation in biomass composition (Wegrzyn et al., 2010; Penning et al., 2014a), it is also possible to genetically modify biomass composition (Bartley and Ronald, 2009). Most simply, genetic engineering of bioenergy plants can be achieved by modifying the plant's genome to (1) express genes from other organisms, (2) increase expression of native genes, or (3) reduce expression of native genes. More complex schemes are also possible, in which expression patterns of genes are altered through synthetic biology approaches that recombine various genetic elements (Yang et al., 2013). The most common method for plant genetic engineering co-opts the molecular machinery of a bacterial pathogen that introduces genes into plant chromosomes to facilitate its pathogenesis.

Genetic engineering to improve bio-oil production would aim to increase biomass components that enhance the yield of favored products and/or to decrease components that produce disfavored products or interfere with upgrading strategies. Advances in understanding cell wall biosynthesis, including genes responsible for synthesizing the major polymer classes (Bonawitz and Chapple, 2010; Scheller and Ulvskov, 2010; Pauly et al., 2013) and covalent interactions among them (Chiniquy et al., 2012; Bartley et al., 2013; Schultink et al., 2015); regulation of expression of the cell wall biosynthesis genes (Zhao and Dixon, 2011); and metal ion transport proteins that determine the abundance and distribution of plant mineral content (Ma et al., 2006; Yamaji and Ma, 2009; Zhong and Ye, 2015), lay the foundation for genetically engineering bioenergy crop cell wall content and structure. For example, lignin is an important target for genetic engineering for pyrolysis since the major lignin-derived products have a lower O:C ratio, a higher energy value, and are more stable than sugar-derived products (Tanger et al., 2013; Mante et al., 2014). Some important genes that participate in or regulate lignin synthesis have already been modified in energy crops without major interference with plant biomass yield (Baxter et al., 2014, 2015; reviewed in Bartley et al., 2014). However, current genetic engineering strategies are focused on developing low lignin biomass for saccharification and biochemical conversion to fuels. Therefore, more work is required to develop biomass with high-lignin content for thermal conversion. Producing corrosive acetic acid in bio-oil (Mante et al., 2014), acetyl groups on cell wall polymers are another potential target for genetic engineering of "pyrolysis crops." 
Three enzyme classes, the reduced wall acetylation (RWA), Trichome birefringence-like (TBL), and Altered Xyloglucan (AXY) proteins, acetylate cell wall polysaccharides (Lee et al., 2011; Xiong et al., 2013; Schultink et al., 2015). A mutant of the dicot reference plant, *Arabidopsis thaliana*, which lacks expression of all four RWA genes, shows a 40% reduction in secondary wall-associated acetyl groups (Lee et al., 2011). Reducing expression of this family of genes in bioenergy crops may help to solve the problems caused by acetic acid in bio-oil produced from such plants.

Pretreatments such as washing/leaching and torrefaction are another class of strategies to improve biomass quality by changing biomass composition (Zheng et al., 2013; Banks et al., 2014). For example, by washing biomass with detergent (Triton) or acid to remove minerals, the yield of bio-oil is increased and reaction water content is reduced (Banks et al., 2014). Coupling biochemical conversion of biomass, which depletes the polysaccharide fraction, with pyrolysis of the resulting residue, or bagasse, is another avenue to explore further (Islam et al., 2010; Cunha et al., 2011). Torrefaction is a low-temperature (200–400°C) thermal pretreatment that decomposes hemicellulose and may segregate disfavored products such as water and acid into intermediate streams before the next stage of pyrolysis (Zheng et al., 2013). More efficient torrefaction may be achieved by changing the composition or chemical structure of hemicellulose through genetic methods to further separate the decomposition temperatures of hemicellulose from lignin and cellulose. By identifying and studying the roles of key biomass components during thermal conversion, it will be possible to maximize the economic and environmental benefits of plant biomass-derived biofuels in the future.

### AUTHOR CONTRIBUTIONS

FL and CW drafted and revised the manuscript. LB, LL, and RM revised the manuscript.

### ACKNOWLEDGMENTS

This material is based upon work supported by the National Institute of Food and Agriculture, U.S. Department of Agriculture, under award number 2010-38502-21836, through the South Central Sun Grant Program.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Lin, Waters, Mallinson, Lobban and Bartley. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Potential for Genetic Improvement of Sugarcane as a Source of Biomass for Biofuels**

*Nam V. Hoang<sup>1,2</sup>\*, Agnelo Furtado<sup>1</sup>, Frederik C. Botha<sup>1,3</sup>, Blake A. Simmons<sup>1,4</sup> and Robert J. Henry<sup>1</sup>*

*<sup>1</sup> Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St. Lucia, QLD, Australia,*

*<sup>2</sup> College of Agriculture and Forestry, Hue University, Hue, Vietnam, <sup>3</sup> Sugar Research Australia, Indooroopilly, QLD, Australia, <sup>4</sup> Joint BioEnergy Institute, Emeryville, CA, USA*

Sugarcane (*Saccharum* spp. hybrids) has great potential as a major feedstock for biofuel production worldwide. It is considered among the best options for producing biofuels today due to an exceptional biomass production capacity, high carbohydrate (sugar + fiber) content, and a favorable energy input/output ratio. To maximize the conversion of sugarcane biomass into biofuels, it is imperative to generate improved sugarcane varieties with better biomass degradability. However, unlike many diploid plants, where genetic tools are well developed, biotechnological improvement is hindered in sugarcane by our current limited understanding of the large and complex genome. Therefore, understanding the genetics of the key biofuel traits in sugarcane and optimization of sugarcane biomass composition will advance efficient conversion of sugarcane biomass into fermentable sugars for biofuel production. The large existing phenotypic variation in *Saccharum* germplasm and the availability of the current genomics technologies will allow biofuel traits to be characterized, the genetic basis of critical differences in biomass composition to be determined, and targets for improvement of sugarcane for biofuels to be established. Emerging options for genetic improvement of sugarcane for the use as a bioenergy crop are reviewed. This will better define the targets for potential genetic manipulation of sugarcane biomass composition for biofuels.

**Keywords: sugarcane, biofuels, biomass for biofuels, biofuel traits, association studies**

### *Edited by:*

*P. C. Abhilash, Banaras Hindu University, India*

### *Reviewed by:*

*Yu-Shen Cheng, National Yunlin University of Science and Technology, Taiwan*

*Tianju Chen, Chinese Academy of Sciences, China*

> *\*Correspondence: Nam V. Hoang hoang.nam@uq.net.au*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 25 July 2015; Accepted: 26 October 2015; Published: 17 November 2015*

### *Citation:*

*Hoang NV, Furtado A, Botha FC, Simmons BA and Henry RJ (2015) Potential for Genetic Improvement of Sugarcane as a Source of Biomass for Biofuels. Front. Bioeng. Biotechnol. 3:182. doi: 10.3389/fbioe.2015.00182*

**Abbreviations:** AFLP, amplified fragment length polymorphism; BAC, bacterial artificial chromosome; CAD, cinnamyl alcohol dehydrogenase (EC 1.1.1.195); cDNA, complementary DNA; COMT, caffeic acid *O*-methyltransferase (EC 2.1.1.68); DArT, diversity array technology; EST, expressed sequence tag; Gb/Mb, gigabase/megabase; LD, linkage disequilibrium; Lignin G, lignin guaiacyl; Lignin H, lignin hydroxyphenyl; Lignin S, lignin syringyl; NGS, next-generation sequencing; QTL, quantitative trait locus; RFLP, restriction fragment length polymorphism; RNAi, RNA interference; S/G ratio, syringyl/guaiacyl ratio; SNP, single nucleotide polymorphism; SUCEST, sugarcane EST database; TF, transcription factor; TIGR, The Institute for Genomic Research.

## **INTRODUCTION**

Plant biomass from grasses such as sugarcane or woody species contains mostly cellulose, hemicellulose, and lignin (also referred to as lignocellulosic biomass), which can be converted to biofuels as a source of renewable energy. At the moment, plant biomass-derived biofuels have great potential in countries that have limited oil resources because they reduce the dependence on fossil fuel, mitigate air pollution by cutting down greenhouse gas emissions, and can be produced from a wide range of abundant sources (Matsuoka et al., 2009). Biofuels generated from plant lignocellulosic biomass (also known as the second generation of biofuels) have been shown to be advantageous over the first generation (from plant starches, sugar, and oil) in terms of net energy and CO<sub>2</sub> balance and, more importantly, they do not compete with food industries for supplies (Yuan et al., 2008). To date, producing bioethanol from the sugar in sugarcane has been one of the world's most commercially successful biofuel production systems, with the potential to deliver second-generation fuels with a high positive energy balance and at a relatively low production cost (Yuan et al., 2008; Botha, 2009; Matsuoka et al., 2009). The rapid growth and high yield of sugarcane compared to other grasses and woody plants make it a good candidate for an ethanol processing platform and for the second generation of biofuels in general (Pandey et al., 2000). Sugarcane has an exceptional ability to produce biomass as a C4 plant with the potential of a perennial grass crop, allowing harvest four to five times by using ratoons without requiring replanting (Verheye, 2010), resulting in a lower cost of energy production from sugarcane than for most of the other potential sources of biomass (Botha, 2009). Brazil was the first country in the world to launch a national fuel alcohol program (Proálcool). This program is based on the use of sugarcane and substitutes ethanol for gasoline (Dias De Oliveira et al., 2005). Approximately 23.4 billion liters (6.19 billion U.S. liquid gallons) of ethanol were produced in Brazil in 2014 (Renewable Fuels Association, 2015). As of 2009, sugarcane bagasse contributed about 15% of the total electricity consumed in Brazil, and it is predicted that energy generated from sugarcane stalks could supply more than 30% of the country's energy needs by 2020 and will be equal to or more than the electricity produced from hydropower (Matsuoka et al., 2009).

Conventionally, sugarcane bagasse is burned to produce fertilizer or to generate steam and electricity in the boilers of sugar mills (Pandey et al., 2000). Recently, it has been used for biofuel production; however, the production cost of biofuels from lignocellulosic biomass is still considered relatively high, which makes it difficult for them to be price-competitive and commercialized on a large scale (Halling and Simms-Borre, 2008). At the moment, bagasse pretreatment (to remove or separate its recalcitrant components before conversion to biofuels) and microbial enzymes account for most of the total production cost, which reduces the incentive to transition from first-generation to second-generation biofuels in sugarcane (Yuan et al., 2008). To maximize the efficiency of conversion of sugarcane biomass into biofuels, it is imperative, in addition to improving pretreatment and enzyme digestion technologies, to generate improved sugarcane cultivars with not only high biomass yield and fiber content but also better biomass degradability.

This review focuses on the potential for the genetic improvement of sugarcane as a source of biomass for biofuels, exploring the beneficial characteristics of sugarcane, the available genetic resources and germplasm, the potential of cell wall modification by breeding and biotechnology, and the potential of whole genome/transcriptome sequencing applications in dissecting important biofuel traits to improve sugarcane biomass composition. This will define the targets for potential genetic manipulation and better exploitation of sugarcane biomass for biofuels.

## **SUGARCANE AT A QUICK GLANCE**

## **Biology and Genetics**

Taxonomically, sugarcane belongs to the genus *Saccharum* (established by Carl Linnaeus in 1753), in the grass family *Poaceae* (or *Gramineae*), subfamily *Panicoideae*, tribe *Andropogoneae*, subtribe *Saccharinae*, under the group *Saccharastrae* and has a very close genetic relationship to sorghum and other grass family members such as *Erianthus* and *Miscanthus* (Amalraj and Balasundaram, 2006). Typically, the genus is divided into six species, namely *Saccharum barberi*, *Saccharum edule*, *Saccharum officinarum*, *Saccharum robustum*, *Saccharum sinense*, and *Saccharum spontaneum* (Daniels and Roach, 1987; Amalraj and Balasundaram, 2006), in which *S. spontaneum* and *S. robustum* are wild species; *S. officinarum*, *S. barberi*, and *S. sinense* are early cultivars; and *S. edule* is a marginal specialty cultivar. All genotypes of the *Saccharum* genus are reported to be polyploid, with ploidy levels ranging from 5*×* to 16*×*, and are considered to have among the most complex plant genomes (Manners et al., 2004). The cytotype (2*n*, the number of chromosomes in the cell) was reported to differ among species as follows: *S. officinarum* (2*n* = 80), *S. spontaneum* (2*n* = 40–128), *S. barberi* (2*n* = 111–120), *S. sinense* (2*n* = 81–124), *S. edule* (2*n* = 60–80), and *S. robustum* (2*n* = 60, 80); hence, the basic chromosome number (x, the number of chromosomes in the monoploid set) ranges from 5, 6, 8, 10 to 12 (Sreenivasan et al., 1987). The basic chromosome number of *S. spontaneum* is 8 (even though highly variable cytotypes are observed) and that of *S. officinarum* and *S. robustum* is 10 (Panje and Babu, 1960; D'Hont et al., 1998; Piperidis et al., 2010). For the other three species, *S. sinense*, *S. barberi*, and *S. edule*, no consensus has been reported because these are early interspecific hybrid cultivars, but a study by Ming et al. (1998) suggested that the basic chromosome number for these three species could also be 10.
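The relationship between chromosome count, basic chromosome number, and ploidy level used above is a simple ratio, sketched here with the values quoted in the text. The function name is illustrative only.

```python
# Sketch: ploidy level implied by a chromosome count (2n) and a basic chromosome
# number (x). Input values are those quoted in the text.

def ploidy_level(two_n, basic_number):
    """Number of monoploid chromosome sets per cell: 2n divided by x."""
    return two_n / basic_number

print(ploidy_level(80, 10))   # S. officinarum: 2n = 80, x = 10
print(ploidy_level(40, 8))    # S. spontaneum, lower bound: 2n = 40, x = 8
print(ploidy_level(128, 8))   # S. spontaneum, upper bound: 2n = 128, x = 8
```

With x = 8, the *S. spontaneum* range of 2*n* = 40–128 corresponds to 5*×* to 16*×*, which matches the span of ploidy levels reported for the genus, while *S. officinarum* comes out as 8*×*.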

Hybrid sugarcane was derived from crosses between a female *S. officinarum* (2*n* = 80) and a male *S. spontaneum* (2*n* = 40–128). Due to the female restitution phenomenon, the F1 hybrid at first conserves the whole *S. officinarum* chromosome set and half of the *S. spontaneum* set (2*n* + *n* transmission); a few backcrosses later, transmission breaks down to *n* + *n*, establishing the chromosome set of modern sugarcane hybrids (Bremer, 1961). For this reason, current sugarcane cultivars (*Saccharum* spp. hybrids) have a combination of a highly aneuploid and interspecific set of chromosomes. Genomic *in situ* hybridization (GISH) and fluorescent *in situ* hybridization (FISH) have revealed that among chromosomes in the nucleus of modern hybrid sugarcane, approximately 80% are contributed by *S. officinarum*, 10–20% from *S. spontaneum*, and less than 5–17% from recombination of chromosomes of the two species (D'Hont et al., 1996; Piperidis et al., 2001; Cuadrado et al., 2004). Modern sugarcane hybrids are normally crosses between varieties/clones, which makes the combination of the chromosomes in each offspring unique and unpredictable due to the random sorting of the chromosomes in the genome (Grivet and Arruda, 2002). The first sugarcane breeding program, which started more than one century ago, generated a few interspecific hybrids that constitute the basic germplasm used by sugarcane breeding programs (Ming et al., 2010). Modern sugarcane cultivars are derived from the basic germplasm, but there have been only a few generations in which chromosomes could recombine (chromosomes have typically undergone only about 2–7 meioses), as sugarcane breeding normally takes between 10 and 15 years per cycle (Raboin et al., 2008; Ming et al., 2010). As a result, the modern sugarcane population has a narrow genetic basis and high linkage disequilibrium (Roach, 1989; Lima et al., 2002; Raboin et al., 2008).
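The 2*n* + *n* transmission described above amounts to simple chromosome arithmetic: an unreduced female gamete plus a reduced male gamete. The sketch below illustrates this under the simplifying assumption that the male parent has an even chromosome number and contributes exactly half of it; the function name and the intermediate example value (2*n* = 64) are illustrative choices within the range quoted in the text.

```python
# Sketch of "2n + n" chromosome transmission in the first interspecific cross.
# Assumes an even male chromosome number so that the reduced gamete is exactly half.

def f1_chromosome_count(female_two_n, male_two_n):
    """Unreduced female gamete (2n) plus reduced male gamete (n = half of male 2n)."""
    if male_two_n % 2:
        raise ValueError("male 2n must be even for a balanced reduced gamete")
    return female_two_n + male_two_n // 2

S_OFFICINARUM_2N = 80
for s_spontaneum_2n in (40, 64, 128):   # within the 2n = 40-128 range quoted in the text
    print(s_spontaneum_2n, f1_chromosome_count(S_OFFICINARUM_2N, s_spontaneum_2n))
```

Under this idealized scheme the F1 always carries more than 80 chromosomes, most of them from *S. officinarum*, which gives a sense of why that parent dominates the chromosome complement of later hybrids.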

## **The Nature of a Complex, Polyploid, and Repetitive Genome**

The complex, polyploid genome of sugarcane is difficult to analyze and understand with the standard methods applied to diploid plants. The sugarcane genome is about 10 Gb in size, and its complexity stems from the mixture of euploid and aneuploid chromosome sets, with homologous genes present in 8 to 12 copies (Souza et al., 2011). The estimated monoploid genome size is about 750–930 Mb (the monoploid genome sizes of the two parental species, *S. officinarum* and *S. spontaneum*, are 930 Mb and 750 Mb, respectively), not much larger than the sorghum genome (~730 Mb) and about twice the size of the rice genome (~380 Mb) (D'Hont and Glaszmann, 2001). On the other hand, studies revealed that despite this complex and polyploid genome, sugarcane shows synteny with other grasses, especially sorghum (collinear, due to the limited divergence time) and maize (orthologous but with altered locus collinearity) [reviewed in Grivet and Arruda (2002)]. It was thought that the sugarcane genome contains roughly the same amount of repetitive DNA as the sorghum genome (Jannoo et al., 2007); however, studies on BAC-end sequences by Wang et al. (2010), Figueira et al. (2012), and Kim et al. (2013) suggested that there is less repetitive content in the sugarcane genome (e.g., 45.2% and 42.8% repetitive sequences observed in large BAC collections, compared with 61% in the sorghum genome). More recently, using a *k*-mer approach, Berkman et al. (2014) found that the repetitive proportion in three sugarcane hybrid cultivars ranges from 63.74 to 78.37%, which is higher than that in the sorghum genome (55.5%) estimated with the same approach. The authors postulated that the increased proportion could be attributed to ploidy level rather than repetitive content in the sugarcane genome. A high gene copy number, the integration of two chromosome sets from two different species, and a significant repeat content hinder both the understanding of how the genome functions and the assembly of a genuine monoploid genome (Souza et al., 2011; Figueira et al., 2012).
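As a rough illustration of how the figures above fit together, the following Python sketch divides the quoted total genome size by each quoted monoploid genome size. It is only a back-of-the-envelope check under the assumption that the ~10 Gb genome is made up of whole monoploid equivalents; the function name `implied_copies` is hypothetical and is not taken from any of the cited studies.

```python
# Back-of-the-envelope check: how many monoploid genome equivalents
# fit into the ~10 Gb hybrid genome quoted in the text?
TOTAL_GENOME_MB = 10_000  # ~10 Gb, value quoted in the text

# Monoploid genome sizes of the two parental species (Mb), as quoted in the text
MONOPLOID_MB = {"S. officinarum": 930, "S. spontaneum": 750}


def implied_copies(total_mb: float, monoploid_mb: float) -> float:
    """Return the number of monoploid genome equivalents in the total genome."""
    if monoploid_mb <= 0:
        raise ValueError("monoploid genome size must be positive")
    return total_mb / monoploid_mb


for species, size_mb in MONOPLOID_MB.items():
    copies = implied_copies(TOTAL_GENOME_MB, size_mb)
    print(f"{species} monoploid size {size_mb} Mb: ~{copies:.1f} equivalents")
```

Under this simplification, the quoted sizes imply roughly 11 to 13 monoploid equivalents, which sits at the upper end of the 8 to 12 homologous copies mentioned above; the estimate ignores aneuploidy and the unequal contribution of the two parental species.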

## **Candidate Crop for Future Biofuels**

To date, sugarcane is among the most efficient crops in the world together with other C4 grasses such as switch grass (*Panicum virgatum*), *Miscanthus* species (*Miscanthus x giganteus*), and *Erianthus* species (*Erianthus arundinaceus* Retz.) in terms of converting solar energy into stored chemical energy and biomass accumulation (Tew and Cobill, 2008; Furtado et al., 2014). In general, C4 plants outperform C3 plants in biomass yield, including grain, stem, and leaf yield (Jakob et al., 2009; Wang and Paterson, 2013). Sugarcane and other C4 grasses are the highest yield potential feedstocks (**Table 1**), and for sugarcane, the potential yield can exceed 100 tons dry matter per hectare per year (Jakob et al., 2009; Moore, 2009; Henry, 2010a). At present, the most suitable energy crop is probably sugarcane because of its high biomass yield and the potential for production on other than prime agricultural land avoiding competing with the land used for food industries (Waclawovsky et al., 2010). Globally, sugarcane is the most important crop in about 100 countries with a production area of 26.9 million hectares, total production of ~1.9 billion tons, and yield of 70.9 tons of fresh cane per hectare (FAOSTAT, 2015). At present, Brazil is the world's largest sugarcane producer followed by India, China, Thailand, Pakistan, Mexico, Colombia, Indonesia, Philippines, U.S., and Australia (FAOSTAT, 2015). In sugarcane internodal tissue, sucrose concentration ranges from 14 to 42% of the dry weight (Whittaker and Botha, 1997), while the rest of dry biomass comes from the cell wall lignocellulose, mostly containing cellulose, hemicellulose, lignin, and ash (Pereira et al., 2015). Biofuels from sugarcane can be produced extensively not only from its soluble sugar but also from main residues in sugarcane production, bagasse and trash, on the same production area (Seabra et al., 2010; Alonso Pippo et al., 2011a,b; Macrelli et al., 2012). The total estimated available lignocellulosic biomass from sugarcane worldwide was 584 million dry tons per year, with an average lignocellulosic biomass yield of 22.9 dry tons per hectare per year (Van Der Weijde et al., 2013). Sugarcane bioethanol yield from bagasse is estimated at about 3,000 L per hectare in a total yield of 9,950 L per hectare from sugar and bagasse (Somerville et al., 2010).

**TABLE 1 | Average lignocellulosic biomass yield (dry matter) from sugarcane compared to other sources**.

*<sup>a</sup>Average total cane biomass dry matter is 39 tons/ha/year (Moore, 2009).*
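The production figures quoted in this section can be cross-checked with simple arithmetic. The Python sketch below is a minimal consistency check using only numbers stated in the text; the helper names (`per_hectare`, `implied_area_ha`, `share`) are illustrative assumptions, and the inputs are rounded values, so small discrepancies with the reported statistics are expected.

```python
# Consistency check on production figures quoted in the text.
# All inputs are the rounded values given in the section.


def per_hectare(total: float, area_ha: float) -> float:
    """Average yield per hectare from a total quantity and an area."""
    return total / area_ha


def implied_area_ha(total: float, yield_per_ha: float) -> float:
    """Area implied by a total quantity and an average per-hectare yield."""
    return total / yield_per_ha


def share(part: float, whole: float) -> float:
    """Fraction of the whole contributed by one part."""
    return part / whole


# Fresh cane: ~1.9 billion tons on 26.9 million hectares (FAOSTAT figures in the text)
fresh_yield = per_hectare(1.9e9, 26.9e6)

# Lignocellulosic biomass: 584 million dry tons at 22.9 dry tons/ha/year
ligno_area = implied_area_ha(584e6, 22.9)

# Ethanol: ~3,000 L/ha from bagasse out of 9,950 L/ha in total
bagasse_share = share(3_000, 9_950)

print(f"Fresh cane yield: {fresh_yield:.1f} t/ha")
print(f"Area implied by lignocellulosic figures: {ligno_area / 1e6:.1f} million ha")
print(f"Bagasse share of total ethanol yield: {bagasse_share:.0%}")
```

The rounded production total gives about 70.6 tons of fresh cane per hectare, close to the reported 70.9, and the bagasse contribution works out to roughly 30% of the estimated per-hectare ethanol yield.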

## **AVAILABLE SUGARCANE GENETIC RESOURCES FOR BIOFUELS**

## **Existing Variations within** *Saccharum* **Germplasm**

Genetically diverse sugarcane germplasm may play a key role in improving sugarcane for biofuels through breeding and biotechnological approaches. Genetic variation may be found in biomass yield, fiber content, and sugar composition in the *Saccharum* germplasm. This includes diversity among the cultivars within one species and also diversity among species within the genus. Relatively high genetic variability has been reported within sugarcane hybrid cultivars, owing to their heterozygosity and high polyploidy, even though they originate from a few clones with a narrow genetic base (Aitken and McNeil, 2010). There is also great genetic and morphological diversity within *Saccharum* species, *Miscanthus* species, and *Erianthus* species that could be exploited and incorporated to broaden the genetic base in breeding programs (Harvey et al., 1994; Aitken and McNeil, 2010). To date, the genetic diversity of *S. officinarum* has been exploited in breeding programs; however, the diversity of *S. spontaneum* and other species has not been used much (Aitken and McNeil, 2010). *Saccharum* species have also been shown to vary in genome size: the *S. officinarum* genome is about 7.50–8.55 Gb, *S. robustum* ranges from 7.56 to 11.78 Gb, and *S. spontaneum* ranges from 3.36 to 12.64 Gb, whereas the other three species (*S. sinense*, *S. barberi*, and *S. edule*) and modern sugarcane are interspecific hybrids whose genome size depends upon each cross (Zhang et al., 2012).

The world's two largest collections of *Saccharum* germplasm are located in Florida (USA) and Kerala (India) and contain approximately 1,200 accessions collected from 45 countries (Tai and Miller, 2001; Todd et al., 2014). Accessions from these collections could be selected and used in breeding to improve sugarcane germplasm for new biofuel traits (Todd et al., 2014). The wild sugarcane species show wider variability than the domesticated species. In the *Saccharum* genus, *S. spontaneum* has the widest range of morphological variability, ratoon yield, and biotic and abiotic stress tolerance (Tai and Miller, 2001; Aitken and McNeil, 2010; Govindaraj et al., 2014). The coefficients of variation (CV%) for traits such as internode length, midrib width, leaf width, plant height, and stalk height studied by Govindaraj et al. (2014) were reported to be between 15 and 30%, which indicates very high variability within the collection. It has been shown that the diversity within modern sugarcane hybrids was mostly contributed by introgression from *S. spontaneum* (D'Hont et al., 1996). On the other hand, *S. robustum* also shows a large amount of phenotypic variation in many of the traits studied (Aitken and McNeil, 2010). Sugarcane parental species (*S. officinarum*, *S. spontaneum*, and *S. robustum*), *Miscanthus* species, *Erianthus* species, and sorghum species, with their diversity in genome content and structure and their tremendous allelic variation, are a valuable and significant genetic reservoir that could be exploited for improving sugarcane biomass.
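For readers less familiar with the statistic, the coefficient of variation mentioned above is the standard deviation expressed as a percentage of the mean. The Python sketch below shows the calculation on made-up plant heights; the data are purely hypothetical and are not taken from Govindaraj et al. (2014), and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation (%) = 100 * sample standard deviation / mean."""
    avg = mean(values)
    if avg == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * stdev(values) / avg


# Hypothetical plant heights (cm) for five accessions; illustrative only
plant_height_cm = [180, 220, 250, 300, 210]

cv = cv_percent(plant_height_cm)
print(f"CV = {cv:.1f}%")
```

Because the CV is dimensionless, it allows variability to be compared across traits measured in different units, such as leaf width and stalk height.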

## **Genetic Markers and Maps**

To support the effort of understanding the sugarcane genome, many physical maps, molecular markers, and resources such as RFLP, RAPD, AFLP, SSR, and ESTs have been developed over time. These common markers have been applied in genetic studies of diversity, mapping, quantitative trait loci (QTL), and synteny; however, these systems have been developed mostly for well-established diploid species and are less effective for polyploid plants (Garcia et al., 2013). Markers like AFLP, SSR, and RFLP are unable to estimate the number of allelic copies and the level of polyploidy in complicated genomes such as those of potato, strawberry, and sugarcane (Garcia et al., 2013). More recently, the use of SNP markers, which are distributed at high density across the genome, has made it possible to estimate the number of allelic copies and the ploidy level of complex genomes (Zhu et al., 2008; Hall et al., 2010). The currently available genetic maps and markers for sugarcane have been generated using low-throughput methods and provide limited information on genome organization because of the low density of markers and coverage (most of them have fewer than 1,000 markers) (Aitken et al., 2014). Therefore, it is difficult to allocate these markers to linkage groups, cosegregation groups, or the expected chromosome number of sugarcane (Souza et al., 2011). More detailed linkage maps of *S. officinarum* cultivar IJ76-545 (534 markers in 123 linkage groups) and cultivar Green German (615 markers in 72 linkage groups); *S. spontaneum* cultivar IND (536 markers in 69 linkage groups); and the hybrid cultivars R570 and Q165 (with 2,000 markers placed in more than 100 linkage groups) have been constructed (Souza et al., 2011; Aitken et al., 2014). Most recently, using Diversity Array Technology (DArT), Aitken et al. (2014) integrated DArT markers, RFLPs, AFLPs, SSRs, and SNPs into the largest marker collection for sugarcane, which contains 2,467 single-dose markers for the cross between Q165 and IJ76-514 (a *S. officinarum* accession) and 2,267 markers from the cultivar Q165. These were placed into 160 linkage groups and eight homology groups, with some uncategorized linkage groups indicating that more markers are required. There is still a need to develop high-throughput marker arrays for sugarcane association studies, to generate more markers, and also to make use of the available markers. These markers will be a valuable resource for unraveling the complex genome structure of sugarcane. It is worth considering that information on DNA-based molecular markers of progenitor plants can potentially reveal available genetic polymorphism for the analysis of their progenies (Henry et al., 2012). This could be a useful strategy in the case of sugarcane, where the genomes of the parental species are less complex than that of the hybrids.

## **Transcriptome Sequences and Transcription Factors**

Expressed sequence tags (ESTs) and complementary DNA (cDNA) sequences provide direct evidence of the genes present in the samples, and this sequence information is very useful for genome exploration, gene prediction/discovery, genome structure identification, SNP characterization, and transcriptome and proteome analysis (Nagaraj et al., 2007). As of May 2015, the GenBank EST database (dbEST) was composed of 75,906,308 ESTs from different organisms, of which 284,818 hits were detected under the search term sugarcane ("*S. officinarum*" or "*Saccharum* hybrid cultivar" or sugarcane). In the last 20 years, sugarcane ESTs have been used for gene discovery, BAC clone selection, and dissecting the coding regions of the genome, involving many projects in South Africa, France, U.S., Australia, and Brazil (Carson and Botha, 2000, 2002; Vettore et al., 2001; Casu et al., 2003, 2004; Grivet et al., 2003; Pinto et al., 2004; Bower et al., 2005). The largest collection of sugarcane ESTs was generated by SUCEST and is composed of approximately 238,208 ESTs from 26 diverse cDNA libraries of different tissues of sugarcane cultivars, e.g., SP80-3280, SP70-1143, RB845205, RB845298, and RB805028 (Vettore et al., 2001, 2003; Souza et al., 2011). These sequences were assembled into 42,982 sugarcane assembled sequences representing more than 30,000 unique genes (~90% of the estimated genes, about 43,141, of *S. officinarum*) (Vettore et al., 2003; Hotta et al., 2010; Grassius: Grass Regulatory Information Server, 2015). Other sugarcane EST collections containing fewer EST entries were generated by Casu et al. (2003, 2004) (8,342 ESTs), Ma et al. (2004) (7,993 ESTs), and Gupta et al. (2010) (~35,000 ESTs), and a small number of ESTs were reported by Carson and Botha (2000, 2002).

Due to the homology between genomes, genome-wide mapping of ESTs of one species provides an important framework for the genome structure of other related species (Sato et al., 2011). However, it is noteworthy that the discovery of ESTs may be restricted to specific cultivars, as within sugarcane germplasm each cultivar has been shown to have different gene expression levels [reviewed in Hotta et al. (2010)]. Moreover, for biofuel trait analysis, the transcription factors (TFs) regulating monolignol biosynthesis in the lignin pathway have received attention, because understanding them allows lignin content and composition to be reduced and modified, which is essential in addressing the recalcitrance problem in biomass conversion (Santos Brito et al., 2015). It has been shown that lignin regulation can be species specific, and information on TFs obtained from model plants such as *Arabidopsis* may need to be validated in other species (Santos Brito et al., 2015). A limited number of TFs in grasses and sugarcane have recently been preliminarily characterized, including those involved in monolignol biosynthesis, for example, in grass (Handakumbura and Hazen, 2012), rice (Yoshida et al., 2013), sorghum (Yan et al., 2013), and sugarcane (Santos Brito et al., 2015). Gene discovery in sugarcane has progressed to some extent despite the complexity of the genome. The valuable information from ESTs, TFs, full-length cDNAs, and BACs provides an understanding of allelic variation in the genome while a full-genome sequence is not available.

## **BAC Libraries to Construct a Reference Genome for Sugarcane**

Sugarcane cultivar R570 and other cultivars, including ones from the parental species *S. officinarum* and *S. spontaneum*, have been used for constructing bacterial artificial chromosome (BAC) libraries (Hotta et al., 2010). BAC libraries from sugarcane include those of hybrid cultivar R570 (103,296 clones, average insert size of 130 kb, and two other libraries of 100,000 clones) (Tomkins et al., 1999; Grivet and Arruda, 2002), *S. spontaneum* cultivar SES208 (38,400 clones, average insert size of 120 kb), and *S. officinarum* cultivar LA Purple (74,880 clones, average insert size of 150 kb), generated with different restriction enzymes, e.g., *Hin*dIII and *Bam*HI [reviewed in Souza et al. (2011)]. BAC sequencing in sugarcane is currently based on the sequencing of BAC clones anchored to an available physical map. Even though it costs more than whole-genome shotgun sequencing (using high-throughput platforms such as Illumina), it is a reliable approach for reference construction, especially for highly repetitive genomes that cannot be sequenced and resolved by a short-read method (Eversole et al., 2009; Steuernagel et al., 2009). This BAC sequencing approach has been used successfully in sequencing the *Arabidopsis*, rice, and maize genomes and in producing the barley reference genome [reviewed in Steuernagel et al. (2009)]. The ongoing Sugarcane Genome Sequencing Initiative (SUGESI) has selected 5,000 BAC clones for sequencing from the library by Tomkins et al. (1999) of cultivar R570, the most intensively characterized cultivar to date, to help assemble the monoploid coverage (monoploid tiling path) of the sugarcane genome using the sorghum sequence as the guide (Souza et al., 2011; Sugesi, 2015).
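To put the library sizes above in perspective, the depth of a BAC library is commonly summarized as genome equivalents (clones × average insert size ÷ genome size), and the chance that a given sequence is represented can be approximated with the classical Clarke and Carbon formula. The Python sketch below applies both to the R570 library using the ~10 Gb genome size and the 930 Mb monoploid size quoted earlier in this article; the function names are illustrative, and the calculation assumes random, unbiased cloning, which real libraries only approximate.

```python
def library_coverage(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Genome equivalents represented by a library (total cloned DNA / genome size)."""
    return n_clones * insert_kb / genome_kb


def prob_sequence_cloned(n_clones: int, insert_kb: float, genome_kb: float) -> float:
    """Clarke-Carbon estimate of the probability that a given sequence is in the library.

    Assumes clones are sampled uniformly at random from the genome.
    """
    fraction_per_clone = insert_kb / genome_kb
    return 1.0 - (1.0 - fraction_per_clone) ** n_clones


GENOME_KB = 10_000_000   # ~10 Gb hybrid genome, value quoted in the text
MONOPLOID_KB = 930_000   # upper monoploid estimate (930 Mb) quoted in the text

# R570 library: 103,296 clones, 130 kb average insert (Tomkins et al., 1999, as cited)
cov = library_coverage(103_296, 130, GENOME_KB)
mono_cov = library_coverage(103_296, 130, MONOPLOID_KB)
p = prob_sequence_cloned(103_296, 130, GENOME_KB)

print(f"Whole-genome equivalents: {cov:.2f}x")
print(f"Monoploid-genome equivalents: {mono_cov:.1f}x")
print(f"Probability a given sequence is represented: {p:.2f}")
```

On these assumptions the library represents only a little more than one equivalent of the full polyploid genome but many equivalents of the monoploid genome, which is consistent with the strategy of assembling a monoploid tiling path.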

## *Sorghum bicolor* **Genome as the Closest Related Reference Genome**

Sorghum is the most closely related species to sugarcane (Grivet et al., 1994; Dillon et al., 2007). The sorghum genome sequencing project was initiated and completed in 2007 with a total genome size of ~730 Mb and 34,496 protein-coding loci, at a coverage of 8.5× using whole-genome shotgun sequencing by standard Sanger methodologies (Paterson et al., 2009). The sequenced genome is composed of 10 pairs of chromosomes and 3,294 supercontigs (most of these have been placed into chunks on 10 chromosomes), covering 90% of the genome and 99% of protein-coding regions (including the majority of available non-repetitive markers, known sorghum protein-coding genes, and the majority of ESTs) (Paterson et al., 2009). The sorghum genome has approximately 61% repetitive DNA, a low level of gene duplication compared to other C4 grasses, and a high degree of gene parallelism with sugarcane, even though the sugarcane genome is much more polyploid (Paterson et al., 2009, 2010). Microcollinearity between the sugarcane and sorghum genomes indicated that sorghum is suitable as the template for sugarcane genome assembly (Ming et al., 1998; Wang et al., 2010; Figueira et al., 2012). It has been suggested that the sugarcane genome could be 20–30% smaller than that of sorghum, despite the estimated monoploid genome size of sugarcane being about 760–930 Mb, approximately the size of the sorghum genome (Figueira et al., 2012).

## **BIOMASS-DERIVED BIOFUELS AND THE CHALLENGING ISSUES IN BIOMASS CONVERSION TO BIOFUELS**

## **The Second Generation of Biofuels – Cell Walls for Fuels**

Due to the depletion of fossil fuel sources, the potential for oil to become more expensive, and the rising awareness of the negative impact of fossil fuels on the environment, biomass-derived biofuels have been investigated and developed recently as an alternative source of renewable, sufficient, and clean energy (Botha, 2009). The demand for renewable biofuels is predicted to increase (Fedenko et al., 2013). The first generation of biofuels from plant biomass involved the conversion of stored polysaccharides, non-structural carbohydrates, and oils from plants (starchy, sugary, and oily parts of plants such as corn starch, sugarcane molasses, soybeans, canola seeds, and palm oil) into fuels like ethanol and diesel (Schubert, 2006; Yuan et al., 2008). However, these sources are also used as food supplies and are limited due to the increasing demand from the world's growing population (Schubert, 2006). The second generation of biofuels can be generated by using the non-food parts of plants such as cell walls, which are composed of structural polysaccharides such as cellulose and hemicellulose (Schubert, 2006; Yuan et al., 2008; Henry, 2010a). This is considered advantageous over the first generation of biofuels because it has a higher energy production potential, lower cost, a sustainable CO<sub>2</sub> balance, and no competition with food production, and because a wide range of plant biomass sources is available at costs affordable to a biorefinery (Yuan et al., 2008; Henry, 2010a). As of 2009, sugarcane biomass as sucrose accounted for about 40% of biofuel feedstock worldwide for first-generation biofuel production (Lam et al., 2009). Using sugarcane bagasse as a feedstock for second-generation biofuels would double the current output of biofuel production from sugarcane (Halling and Simms-Borre, 2008).

## **Sugarcane Cell Wall and Biomass Composition**

Physically, sugarcane biomass can be divided into four major fractions, whose content depends on the industrial process: fiber (heterogeneous organic solid fraction), non-soluble solids (inorganic substances), soluble solids (sucrose, waxes, and other chemicals), and water (Canilha et al., 2012; Shi et al., 2013). Second generation of biofuels focuses on using the fiber fraction especially the cell wall constituents of the plant to produce biofuels (Schubert, 2006; Henry, 2010a). This approach may be made more efficient by optimizing the composition of the biomass source for biofuel production. This could be achieved by advances in pretreatment methods or biotechnological modification of cell wall synthesis pathways to create a biomass that can be more efficiently processed (Sims et al., 2006; Yuan et al., 2008; Simpson, 2009; Viikari et al., 2012). Three major components make up the fiber fraction of sugarcane, namely, cellulose, hemicellulose (or non-cellulosic polysaccharide components), and lignin. Cellulose constitutes around 50% of the dry weight sugarcane bagasse while hemicellulose and lignin each account for about 25% (Loureiro et al., 2011). These three components are biosynthesized through different complex pathways (Higuchi, 1981; Whetten and Ron, 1995; Saxena and Brown, 2000; Mutwil et al., 2008; Harris and DeBolt, 2010; Pauly et al., 2013). Cellulose and hemicellulose molecules form the cell walls which act as the skeleton of plants and are strengthened by lignin and phenolic cross-linkages (Carpita, 1996; Henry, 2010b). The complex interlinking between cell wall components plays an important role in grass defense and yet challenges the biofuel production by requiring the pretreatment to separate them (De O. Buanafina, 2009).

The cell walls of sugarcane and other grasses are categorized as type II cell walls, which differ from the type I and type III cell walls of other plants [reviewed in Souza et al. (2013)]. In general, there is little pectin, less lignin, and less structural protein in grass cell walls than in those of non-grasses (Carpita, 1996; Henry, 2010b; Saathoff et al., 2011). Cellulose content is similar between grass and non-grass primary and secondary cell walls; however, hemicellulose composition differs between the two groups. Grass cell walls have four to eight times more xylans, higher mixed linkage glucans, and lower levels of xyloglucans, mannans, glucomannans, and pectin in the primary cell wall, but higher phenolics and lignin in the secondary cell wall (Loureiro et al., 2011). Grass lignin is composed of three monolignol-derived subunits (syringyl, S; guaiacyl, G; and hydroxyphenyl, H) in various ratios and normally has more of the H subunit (more coumaryl derivatives) than lignin in non-grasses (Vogel, 2008). A recent study by Bottcher et al. (2013) showed that sugarcane lignin content and composition vary depending on tissue type and stem position on the plant. Within one plant, the bottom internode has higher lignin accumulation than the top internode, and the inner part of the stem has a higher syringyl/guaiacyl (S/G) ratio than the outer part. Polysaccharides found in sugarcane leaf and culm walls were similar but differed in the proportions of xyloglucan and arabinoxylan (Souza et al., 2013). The major monosaccharides released from sugarcane cell walls were glucose, xylose, and arabinose (Loureiro et al., 2011; Rabemanolontsoa and Saka, 2013; Souza et al., 2013). Understanding the fine structure and detailed composition of the sugarcane cell wall will assist in optimizing tissue pretreatment and cell wall hydrolysis protocols. At present, converting sugarcane lignocellulosic biomass to ethanol includes (1) pretreatment to remove the lignin and other recalcitrant cellular constituents (or hemicellulose) to free cellulose, (2) enzyme-mediated action to depolymerize carbohydrates to simple sugars, and (3) fermentation of sugars and distillation of ethanol as the end product (Canilha et al., 2012).

## **Dealing with the Conversion Issues**

Even though sugarcane biomass is less resistant to enzymatic digestion than that from woody plants, recalcitrant biomass components are reported to impede the efficiency of conversion to ethanol (Jung, 1989; Anterola and Lewis, 2002; Chen and Dixon, 2007; Himmel et al., 2007; Balat et al., 2008; Li et al., 2013). Biomass recalcitrance is caused by many factors such as the presence of epidermal and sclerenchyma tissues, vascular bundle density and arrangement, degree of lignification, heterogeneity and complexity of cell wall constituents, insoluble matter, natural inhibitors, and cellulose crystallinity (Himmel et al., 2007). Most current approaches for producing biofuels from biomass rely on disruption of the biomass to separate lignocellulose and remove lignin, followed by conversion using microbial enzymes (Sticklen, 2006). In general, the recalcitrance issue can be addressed by physical, chemical, and genetic approaches. Physical and chemical strategies deal mainly with pretreatment and involve loosening the cell wall structure, lowering the biomass heterogeneity, giving the enzymes access to the cellulose, cleaving the cross-linkages, and removing enzymatic inhibitors (Balat et al., 2008; Saathoff et al., 2011). To make the physical and chemical changes in plant biomass, pretreatment processing conditions must be tailored to the specific chemical and structural composition of the various and variable sources of lignocellulosic biomass (Mosier et al., 2005). Currently available physical and chemical pretreatment methods are varied and include uncatalyzed steam explosion, flow-through acid, liquid hot water, pH-controlled hot water, dilute acid, ammonia, lime and, more recently, ionic liquids (Mosier et al., 2005; Shi et al., 2013; Sun et al., 2013). Genetic approaches involve genetic enhancement, molecular biology, and plant breeding efforts to improve biomass sources by developing crops with less lignin, crops with modified lignin, crops that self-produce enzymes, and crops with increased cellulose and overall biomass [reviewed in Sticklen (2006)]. The costs of the enzymatic pretreatment of cellulosic biomass (which accounts for about 25% of total processing expenses), biomass conversion, and microbial tanks limit the price-competitiveness of biofuel from lignocellulosic biomass in comparison to fossil fuel (Gnansounou and Dauriat, 2010; Macrelli et al., 2012, 2014; Van Der Weijde et al., 2013). This emphasizes the value of genetic improvement of biomass composition to reduce processing costs.

## **POTENTIAL IMPROVEMENT OF SUGARCANE BY BREEDING FOR BIOFUELS**

The complex and highly polyploid genome of sugarcane poses a great challenge in unraveling and studying its functions. Each cross of modern sugarcane cultivars has a unique set of chromosomes due to the random sorting of chromosomes and recombination of alleles from two progenitor species (Grivet and Arruda, 2002). There are several distinct alleles at each locus in sugarcane chromosomes, making the characteristics of the offspring unpredictable and requiring evaluation of thousands of lines from many parents to gather sufficient information in breeding programs (Matsuoka et al., 2009). In conventional breeding, after crossing and obtaining the F1 generation, hundreds of thousands of F1 seedlings are screened for the desired traits such as disease resistance, sugar content, agronomic characteristics, and adaptability (Matsuoka et al., 2009). The process is normally repeated for some vegetatively propagated generations to obtain the required stability of the traits. For industrial purposes, after a long process of selection that begins with hundreds of thousands of seedlings, breeders normally end up with a limited number of clones for release as commercial lines or cultivars.

To facilitate the second generation of biofuels, sugarcane breeding programs need to focus not only on important traits such as total biomass yield, sugar yield, adaptability to the local environment, and resistance to major pathogens but also on biofuel traits (e.g., less lignin, improved biomass composition for conversion) as a whole (Matsuoka et al., 2009; Waclawovsky et al., 2010). In sugarcane breeding, to maximize heterosis, the parents are usually selected from genotypes of divergent genetic background (Tabasum et al., 2010). Increasing sugarcane biomass yield and productivity is becoming more and more difficult to achieve by conventional methods; hence, broadening the sugarcane genetic base by introgression from its ancestors or closely related species such as *Miscanthus* and *Erianthus* is being explored in sugarcane improvement [reviewed in Dal-Bianco et al. (2012) and De Siqueira Ferreira et al. (2013)]. This is normally done by crossing *S. officinarum* with *Erianthus* or *Miscanthus*, or by backcrossing the hybrids to *S. spontaneum* (Matsuoka et al., 2009). Dual-purpose cane and energy cane, sugarcane lines for lignocellulosic biomass production, have been derived from two sugarcane species, *S. spontaneum* and *S. robustum*, by crossing to develop lines with a high ability to accumulate fiber and high biomass content in addition to accumulating soluble sugars (De Siqueira Ferreira et al., 2013). Another case is *Miscane*, which was the result of crossing *Saccharum* x *Miscanthus*. This produces cane varieties with more biomass (lignocellulose and total fermentable sugars), disease resistance, and cold tolerance. This effectively adapts *Miscanthus* to a tropical climate and expands sugarcane production to temperate, dry, and cold conditions (Alexander, 1985; Burner et al., 2009; Lam et al., 2009). Recently, the use of molecular markers in sugarcane breeding programs (marker-assisted selection) has allowed the direct comparison of DNA genetic diversity and provides a precise tool for assessing the genetic diversity of the germplasm (Tabasum et al., 2010; Berkman et al., 2012). The use of markers associated with the desired traits in combination with advances in next-generation sequencing (NGS) technology, bioinformatics tools, and high-throughput phenotyping methods will significantly improve sugarcane breeding programs (Lam et al., 2009). NGS will allow a great number of markers such as SNPs to be generated, which could be used to obtain a high density of markers at high coverage across the genome and to dissect the important traits with which they are associated. These sources of markers will be essential in breeding programs for screening parental plants from germplasm collections and progenies derived from the crosses, and for selecting traits where phenotypic methods are not practical (Berkman et al., 2012). High-throughput phenotyping methods will collect data from a large number of samples to overcome the small effects of genes, especially the QTL, controlling the traits (Lam et al., 2009).

## **POTENTIAL IMPROVEMENT BY MOLECULAR GENETICS FOR BIOFUELS**

The competitiveness of biofuels over other options relies on advances in biotechnology. Efficient conversion of plant biomass to biofuels requires the supply of appropriate feedstocks that can be sustainably produced in large quantities at high yields. The efficient conversion of the biomass in these feedstocks will be facilitated by a composition that is optimized for processing to deliver high yields of the desired end products. Manipulating the carbohydrates of the cell walls is the key to improving the biomass composition for biofuels (Harris and DeBolt, 2010). Powerful biotechnology tools could be used to produce genetically modified sugarcane plants with a favorable ratio of cellulose to non-cellulose content; with *in planta* enzymes that can digest the biomass or degrade the lignin prior to its conversion to ethanol; or with pest and disease resistance, flowering inhibition, and abiotic stress resistance; these traits could also be incorporated into elite sugarcane cultivars for better agronomic performance (Sticklen, 2006; Yuan et al., 2008; Matsuoka et al., 2009; Arruda, 2012).

Among the grasses potentially used for biofuel production such as sugarcane, switch grass, *Miscanthus*, and *Erianthus*, sugarcane has been used more for gene transformation studies (Falco et al., 2000; Manickavasagam et al., 2004; Basnayake et al., 2011), and the first transgenic sugarcane was established by Bower and Birch (1992). Progress in improving sugarcane biomass using genetic tools is currently hindered by genome complexity, low transformation efficiency, transgene inactivation (gene silencing and regulation), somaclonal variation, and difficulty in backcrossing (Ingelbrecht et al., 1999; Hotta et al., 2010; Arruda, 2012; Dal-Bianco et al., 2012). Targets tackled so far in sugarcane include sucrose and biomass yield increase [e.g., in Ma et al. (2000) and Botha et al. (2001)], downregulation of lignin content or monolignol changes in lignin to lower biomass recalcitrance (described later), expression and accumulation of microbial cellulosic enzymes in the leaf [e.g., in Harrison et al. (2011)], herbicide tolerance [e.g., in Gallo-Meagher and Irvine (1996) and Enríquez-Obregón et al. (1998)], disease or pest resistance [e.g., in Joyce et al. (1998), Arencibia et al. (1999), and Zhang et al. (1999)], flowering inhibition [reviewed in Matsuoka et al. (2009) and Hotta et al. (2010)], and drought tolerance [e.g., in Zhang et al. (2006)]. Genetically modified sugarcane has great potential to contribute to biofuel production, with new varieties incorporating these characteristics (Arruda, 2012). Unexploited genes, not only from the *Saccharum* germplasm but also from other related species, such as cold-tolerance genes in *S. spontaneum* and *Miscanthus* or drought-tolerance genes in sorghum, could be integrated into the sugarcane genome once identified, facilitating the production of more sugarcane biomass in temperate areas or under dry conditions (Lam et al., 2009).

Increasing plant cellulose and total biomass content may be achieved by approaches such as manipulating growth regulators or key nutrients, increasing the ability of the plant to fix carbon by increasing atmospheric CO<sub>2</sub>, and manipulating some key metabolic enzymes in biomass synthesis pathways [reviewed in Sticklen (2006)]. Reduction of the cross-links of maize cell walls (including ferulate and diferulate cross-links, and benzyl ether and ester cross-links) has been shown to increase the initial hydrolysis of cell wall polysaccharides by up to 46% (Grabber, 2005). In general, selection of grasses with less ferulate cross-linking or potent microbial xylanases by breeding or engineering tools is more attractive than pretreatment of the cell wall with a feruloyl esterase (Grabber, 2005).

Lignin content accounts for about 25% of sugarcane total lignocellulosic biomass and is probably the main obstacle affecting the efficiency of saccharification during conversion to ethanol (Canilha et al., 2012, 2013). Lignin and other recalcitrant components in cell walls prevent cellulase accessing the cellulose molecules and need to be removed before further processing (Sticklen, 2006). Lignin biosynthesis pathways are complicated and at least 10 different enzymes have been found involved in the lignin pathway in sugarcane (Higuchi, 1981; Whetten and Ron, 1995) and a total of 28 unigenes associated with monolignol biosynthesis were identified in sugarcane using SUCEST database and annotated genes from closely related species such as sorghum, maize, and rice (Bottcher et al., 2013). Tailoring sugarcane biomass composition for biofuels can be achieved by manipulating some of the key genes in lignin pathway (downregulation of some key enzymes), mostly targeting genes which encode the terminal enzymes such as caffeic acid *O*-methyltransferase (COMT) and cinnamyl alcohol dehydrogenase (CAD), to minimize the impact of the modifications on growth and development of the plant [as reviewed in Sticklen (2006), Jung et al. (2012), and Furtado et al. (2014)]. Not only lignin content but also the lignin S/G ratio is a very important aspect to consider in terms of modifying the lignin content because these two are both associated with biomass recalcitrance (Chen and Dixon, 2007; Li et al., 2010). Sugarcane lignin content was reduced by 3.9–13.7% using RNA interference (RNAi) suppression to downregulate the *COMT* gene [which has at least 31 different ESTs involved (Ramos et al., 2001)] by 67–97% and at the same time, the lignin S/G ratio was reduced from 1.47 to 1.27–0.79 (Jung et al., 2012). This resulted in an increase of up to 29% in total sugar yield without pretreatment (34% with dilute acid pretreatment). 
This study suggests that RNAi-mediated gene suppression is a promising method for suppressing target genes not only in the lignin pathway but also in cell wall constituent biosynthesis (Jung et al., 2012; Bottcher et al., 2013).

Producing enzymes *in planta* is another way to cut the cost of biofuel production as it reduces the expense of enzymes and enzyme treatment. Cellulase has been produced within the plant (in the apoplast) of *Arabidopsis*, rice, and maize without effects on the growth and development of the host plants [reviewed in Sticklen (2006)]. *In planta* enzyme expression in sugarcane is still in its infancy; however, a high-yield biofuel plant such as sugarcane must be a target for the production of enzymes within the biomass. Recombinant enzymes have been targeted to organelles such as chloroplasts, vacuoles, and the endoplasmic reticulum to separate the enzymes produced from their substrates (Harrison et al., 2011). In sugarcane, thanks to its well-established transformation methods via *Agrobacterium*, the expression of enzymes in leaves and other tissues is feasible (Manickavasagam et al., 2004; Taylor et al., 2008). Endoglucanases and exoglucanases have been overexpressed in sugarcane leaves by using the maize *PepC* promoter, achieving an accumulation level of 0.05% of total soluble proteins (endoglucanase, in chloroplast) and lower levels of exoglucanases without altering the phenotype (Harrison et al., 2011). In the future, enzymes might be synthesized in specific energy cane plants that could be coprocessed with other biomass sources from sugarcane for sugar and biomass production (e.g., bagasse from sugar mills) (Arruda, 2012).

## **POTENTIAL OF SUGARCANE WHOLE GENOME AND TRANSCRIPTOME SEQUENCING FOR BIOFUELS**

The advent of NGS technology and a sharp reduction in per-base cost in the past decade [as reviewed in Van Dijk et al. (2014)] allow us to sequence the whole genome of a species, even a complex genome such as sugarcane, at a relatively low price within a relatively short time. At present, the cost of sequencing a human genome at 30× coverage using the latest Illumina HiSeq X is around US\$1,000. Since the first plant genome was completely sequenced (*Arabidopsis thaliana* in 2000) using the traditional Sanger sequencing platform, sequencing strategies have moved to high-throughput and cost-effective approaches (Henry et al., 2012). High-throughput genome sequencing platforms have recently advanced and facilitated improved genotyping, allowing huge data output to be generated for polymorphism detection (especially SNPs) and marker discovery.
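To make the notion of "30× coverage" concrete, the following minimal sketch applies the standard relationship between average depth, read count, read length, and genome size (depth = reads × read length / genome size). The genome sizes and the 150 bp read length are illustrative assumptions, not values taken from the studies cited here.

```python
import math


def sequencing_depth(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Expected average depth of coverage: C = N * L / G."""
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_depth: float, read_length_bp: int, genome_size_bp: int) -> int:
    """Number of reads required to reach a target average depth."""
    if read_length_bp <= 0:
        raise ValueError("read_length_bp must be positive")
    return math.ceil(target_depth * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Assumed, rounded genome sizes for illustration only.
    human_bp = 3_100_000_000
    polyploid_cane_bp = 10_000_000_000  # hypothetical order of magnitude for a cultivar

    for name, size in [("human", human_bp), ("sugarcane (assumed)", polyploid_cane_bp)]:
        n = reads_needed(30, 150, size)
        print(f"{name}: {n:,} reads of 150 bp for 30x")
```

The same target depth therefore demands proportionally more reads as genome size grows, which is one reason a large polyploid genome remains more expensive to sequence deeply than a diploid one of moderate size.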

## **Potential Strategies in Dissection of Biofuel Traits in Sugarcane**

At present, a whole-genome sequence of sugarcane is not available to support its biofuel trait analysis. However, a strategy to overcome this using the currently available resources, for dissecting biofuel traits, for example, in sugarcane biomass, is to carry out association studies, in which a population with genetic variability is selected, phenotyped, and genotyped. Association studies use molecular markers from this genetic variability to detect associations between markers and traits of interest in order to validate the location of the genes, especially for quantitative traits (Huang et al., 2010). This strategy has been used for human and animal genetic studies since it was first established and more recently also for plants. To date, association studies have been applied successfully to many different plants including *Arabidopsis*, wheat, barley, rice, cotton, maize, potato, soybean, sugar beet, *Pinus*, *Eucalyptus*, ryegrass [also Zhu et al. (2008); for a review, see Hall et al. (2010)], and sugarcane (Aitken et al., 2005; Wei et al., 2006) for important traits like pathogen resistance, flowering time, grain composition, and quality. Association studies differ from traditional QTL studies: in QTL analysis, the linkage disequilibrium between markers and QTLs is established in a segregating population from a cross of different genotypes, whereas in association studies a non-structured population is used (Neale and Savolainen, 2004; Ingvarsson and Street, 2011). Therefore, association studies investigate variation across the whole population, not just variation between parents. Association studies analyze the direct linkage disequilibrium between genetic markers and traits to overcome the limitations of traditional QTL analysis in sample size, low variation, and recombination in the population (Ingvarsson and Street, 2011). 
In sugarcane, association studies are a powerful method for understanding complex traits that are controlled by many loci and dosage effects (e.g., Ming et al., 2001; Wei et al., 2006; Banerjee et al., 2015). In general, association studies involve population selection, phenotyping, genotyping, analysis of population structure, and statistical testing for association. These require a population with genetic variability and high linkage disequilibrium; for sugarcane, the most important aspect of doing association studies is having marker data and a breeding population of elite varieties (Huang et al., 2010). Due to the limited number of generations, low recombination rate between chromosomes, and strong founder effect, sugarcane is expected to have extensive linkage disequilibrium despite the large number of chromosomes and being an outcrossing species (Huang et al., 2010). In fact, attaining an F<sub>2</sub> population (such as inbred backcrosses or recombinant inbred lines and doubled haploid lines) in sugarcane is not practical due to its clonal propagation, high heterozygosity, and inbreeding depression (Aitken and McNeil, 2010; Sreedhar and Collins, 2010). Therefore, more commonly, a segregating F<sub>1</sub> population from biparental crosses or self-pollinated progenies from heterozygous parents (as the *pseudo* F<sub>2</sub> population) is used, and hence, most sugarcane linkage maps (based on AFLP, RAPD, isozyme, and SSR markers) were developed on this type of F<sub>1</sub> population (Sreedhar and Collins, 2010). To date, most of these maps have low coverage and a limited number of markers because of the genome complexity and high cost of marker generation (Aitken et al., 2014). The high redundancy of the chromosomes in the sugarcane genome implies that with conventional approaches only single-dose markers (present on only one of the homologous/homoeologous haplotypes) can be used to obtain high-resolution mapping (Hoarau et al., 2002; Le Cunff et al., 2008).
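The statistical testing step described above can be illustrated with a deliberately simplified single-marker test. The sketch below regresses a phenotype on marker allele dosage and assesses significance by permutation. It is only an illustration under stated assumptions: it ignores population structure and kinship, which real association analyses must model, and all function names and data values are hypothetical rather than taken from the cited studies.

```python
import random
from statistics import mean


def slope_and_r2(dosage, phenotype):
    """Least-squares slope and R^2 of phenotype regressed on marker dosage."""
    mx, my = mean(dosage), mean(phenotype)
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in phenotype)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, phenotype))
    if sxx == 0 or syy == 0:
        # Monomorphic marker or constant trait: no association can be estimated.
        return 0.0, 0.0
    return sxy / sxx, (sxy * sxy) / (sxx * syy)


def permutation_p_value(dosage, phenotype, n_perm=2000, seed=0):
    """Empirical p-value: how often shuffled phenotypes give an R^2 at least as large."""
    rng = random.Random(seed)
    _, observed = slope_and_r2(dosage, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        _, r2 = slope_and_r2(dosage, shuffled)
        if r2 >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: allele dosage (0-3 copies) and a fiber-content-like trait.
    dosage = [0, 0, 1, 1, 2, 2, 3, 3]
    trait = [10.1, 9.8, 11.2, 10.9, 12.1, 12.4, 13.0, 13.3]
    slope, r2 = slope_and_r2(dosage, trait)
    print(f"slope={slope:.2f}, R^2={r2:.2f}, p={permutation_p_value(dosage, trait):.4f}")
```

In a genome-wide setting this test would be repeated for every marker, so a multiple-testing correction would also be needed before declaring any association significant.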

The potential applications of current genotyping technologies to sugarcane association studies employ both whole-genome sequencing and whole transcriptome sequencing technologies. Genotyping is normally carried out either by analysis of candidate genes or by genome-wide approaches; the candidate gene approach is restricted to genes that are thought likely to be associated with traits of interest based on prior knowledge (Hirschhorn and Daly, 2005; Ingvarsson and Street, 2011). At present, whole-genome sequencing based on the random sequencing of fragments of whole genomic DNA has been successfully applied to medium-size genomes with a limited amount of repetitive elements, genome resequencing with the guide of a reference sequence, or *de novo* assembly of small genomes (Steuernagel et al., 2009; Henry et al., 2012; Xu et al., 2012; Edwards et al., 2013). The large genome size of sugarcane is partially attributable to sugarcane being a polyploid and the genome having a significant amount of repetitive sequences (Berkman et al., 2014). As a result, the short reads currently generated by NGS technologies cannot completely resolve the challenges of the sugarcane genome. For highly repetitive genomes, genomic complexity will be lost or reduced by *de novo* assembly of NGS-derived short reads because identical repeat sequences in the genome will be collapsed (Green, 2002). Therefore, efficient genotyping strategies using whole-genome sequencing data need to be developed for sugarcane to overcome these challenges. Moreover, whole transcriptome sequencing gives details of all transcripts expressed in the samples across the whole genome and could be applied to the sugarcane genome to identify biologically significant variations (SNPs) between different developmental stages or between varieties, or for *de novo* transcript assembly and gene discovery (Henry et al., 2012).

For large and polyploid genomes, there are still requirements to enrich the genomic DNA to capture the coding regions to ensure the depth of coverage, resolve the variable short reads, and lessen the effect of repetitive sequences in the genome on discovery of polymorphisms (Bundock et al., 2012; Henry et al., 2012). Selective sequencing of genomic loci of interest (genes or exomes) can reduce the cost compared to whole-genome sequencing and therefore simplify the data interpretation since non-coding regions are not abundant in the data. The enrichment techniques can be hybrid capture (e.g., Agilent SureSelect, NimbleGen, FlexGen), selective circularization (e.g., Selector probes), or PCR amplification (e.g., Raindance). Hybrid capture supported by a microarray platform has been applied to sugarcane and other complex genomes due to its high capacity to enrich large regions of interest (1–50 Mb), the possibility of multiplexing, the availability of kits, and the small amount of input DNA required (<1–3 μg) (Mertes et al., 2011). This approach uses a selection library of fragmented DNA or RNA representing the targets (normally oligonucleotides from 80 to 180 bases produced from known information such as gene indices, ESTs) to capture the cDNA fragments from a shotgun DNA library based on hybridization and then sequences the captured fragments (Mertes et al., 2011; Bundock et al., 2012). Bundock et al. (2012) conducted solution-based hybridization (Agilent SureSelect) to capture the exome regions of sugarcane using sorghum and sugarcane coding probes, enriched the genome 10–11-fold, and detected 270,000–280,000 SNPs in each genotype of the material tested. At the moment, a great number of SNPs from a genome or haplotype can be generated by using high-capacity genome sequencing instruments or high-density oligonucleotide arrays (Zhu et al., 2008). The continuous advancement in genotyping technology allows generation of up to 1 million SNPs spanning the entire genome in one reaction (e.g., using a SNP chip), and the newest SNP chips can measure copy number as well as allelic variation. Examples of available platforms are Affymetrix (e.g., Affymetrix Genome-Wide Human SNP Array 6.0) and Illumina (e.g., Illumina's WGGT Infinium BeadChips). Due to the multiple chromosomes in the homologous groups of the sugarcane genome and the number of alleles at each locus (and consequently the number of SNPs), an allele would likely be defined by a combination of SNPs, not just a single SNP (McIntyre et al., 2006, 2015). SNP genotyping, including SNP calling and statistical methods to estimate the ploidy level and allele dosage within homologous groups, has been developed for sugarcane by Garcia et al. (2013) to allow in-depth association analysis of the genome. In this study, SNPs were developed by SEQUENOM iPLEX MassARRAY and capture primers and then discovered by QualitySNP software, mass-based procedures, and the SuperMASSA software. For whole transcriptome sequencing, Cardoso-Silva et al. (2014) identified 5,106 SSRs and 708,125 SNPs from the unigenes assembled from RNA-seq data of contrasting sugarcane varieties. These advances in sugarcane genotyping technology, together with well-developed high-throughput phenotyping methods for biofuel traits [reviewed in Lupoi et al. (2013) and Lupoi et al. (2015)] and bioinformatics tools, could accelerate sugarcane analysis while a reference genome is not available.
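The idea of estimating allele dosage in a polyploid can be sketched with a simple binomial model of read counts at one SNP. This is not the method of Garcia et al. (2013) or the SuperMASSA algorithm; it is a minimal illustration that assumes a known ploidy for the homologous group, independent reads, and a uniform sequencing error rate, with hypothetical function names and parameter values.

```python
from math import comb, log


def dosage_log_likelihoods(alt_reads, total_reads, ploidy, error_rate=0.01):
    """Binomial log-likelihood of the read counts for each possible dosage 0..ploidy."""
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    result = {}
    for k in range(ploidy + 1):
        true_fraction = k / ploidy
        # Sequencing errors pull the expected fraction slightly toward 0.5.
        p = true_fraction * (1 - error_rate) + (1 - true_fraction) * error_rate
        result[k] = (
            log(comb(total_reads, alt_reads))
            + alt_reads * log(p)
            + (total_reads - alt_reads) * log(1 - p)
        )
    return result


def call_dosage(alt_reads, total_reads, ploidy, error_rate=0.01):
    """Most likely number of copies of the alternative allele."""
    ll = dosage_log_likelihoods(alt_reads, total_reads, ploidy, error_rate)
    return max(ll, key=ll.get)


if __name__ == "__main__":
    # Hypothetical locus with 10 homologous copies and 100 reads, 10 carrying the SNP.
    print(call_dosage(alt_reads=10, total_reads=100, ploidy=10))
```

Because adjacent dosage classes differ by only 1/ploidy in expected allele fraction, distinguishing them at high ploidy requires considerably greater read depth than diploid genotyping, which is consistent with the emphasis on enrichment and depth of coverage above.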

Several association studies have been carried out on sugarcane recently, such as those for QTLs that control *Pachymetra* root rot and brown rust resistance on 154 genotypes (McIntyre et al., 2005); genetics of root rot, leaf scald, Fiji leaf gall, cane sugar, and yield using 1,068 AFLP, 141 SSR (on 154 genotypes), and 1,531 DArT markers (on 480 genotypes) (Wei et al., 2006, 2010); smut and *eldana* stalk borer using 275 RFLP and 1,056 AFLP markers on 77 genotypes (Butterfield, 2007); resistance to sugarcane yellow leaf virus using 3,949 polymorphic markers (DArT and AFLP) on 189 genotypes (Debibakas et al., 2014); agro-morphological traits, sugar yield, disease resistance, and bagasse content using 3,327 DArT, AFLP, and SSR markers on 183 genotypes (Gouy et al., 2015); and sucrose and yield contributing traits using 989 SSR markers on 108 genotypes (Banerjee et al., 2015). Using the Affymetrix GeneChip Sugarcane Genome Array, Casu et al. (2007) identified 119 transcripts associated with enzymes of cell wall metabolism and development in sugarcane variety Q177. These promising preliminary studies were carried out on small sample sizes and limited numbers of markers (even though a small number of significant associations have been identified), whereas the polyploid sugarcane genome and the small effects of quantitative traits require larger sample sizes and more markers (e.g., genome-wide markers) so that significant associations can be detected (Huang et al., 2010; Gouy et al., 2015).

## **The Reference Sequence Matters**

As mentioned earlier, construction of a sugarcane nuclear genome reference sequence is an important objective, even though it might take some time to finish. However, in the meantime, sugarcane genome analysis can still exploit the currently available genetic resources such as the sorghum gene indices (sorghum gene models), sugarcane gene indices (DFCI Sugarcane Gene Index version 3.0, an integrated collection of sugarcane ESTs, complete cDNA sequences, non-redundant data of all sugarcane genes and their related information), transcription factors (TFs), and sugarcane tentative consensus/assembled sequences. For example, in the study mentioned earlier, Bundock et al. (2012), based on the gene sequences in the sorghum genome and sugarcane gene indices, captured the exomic regions of two sugarcane genotypes, Q165 and IJ76-514, and detected SNPs present in 13,000–16,000 targeted genes from Illumina short read data of these samples; 87–91% of the SNPs were validated and confirmed by 454 sequencing. For transcript profiling, the reference transcriptome sequence can be constructed for specific tissues using *de novo* assembly such as in Vargas et al. (2014) and Cardoso-Silva et al. (2014) and validated to find suitable reference gene sets to be used for gene expression normalization as in Guo et al. (2014). The currently available resources, on the other hand, are also utilized. Park et al. (2015) used the Sugarcane Assembled Sequences from the SUCEST-FUN database as reference sequences in a study on cold-responsive gene expression profiling of sugarcane hybrids and *S. spontaneum* and found that more than 600 genes were differentially expressed in each genotype after applying stress.

## **CONCLUSION**

Sugarcane has been shown to be a good candidate for use as a lignocellulosic biomass feedstock for second-generation biofuel production. However, its genome complexity still remains a great bottleneck restricting the dissection of biofuel traits. The most significant achievements in improving sugarcane biomass for biofuels so far have been the establishment of high-fiber cane varieties to generate more lignocellulosic biomass, and preliminary results in modifying biomass with more cellulose, less lignin content, a preferable lignin S/G ratio, and enzymes expressed *in planta* (in leaves) for easy conversion to biofuels. The improvement of sugarcane biomass has been achieved by traditional breeding and molecular genetics approaches and, more recently, accelerated with the use of NGS technology. The future of second-generation biofuel production using sugarcane lignocellulosic biomass will depend greatly on advances in understanding of the key biofuel traits required to deliver more efficient and price-competitive biofuels. This objective will be facilitated once the whole genome of sugarcane is fully sequenced. Optimizing sugarcane lignocellulosic bagasse composition may result in biomass with better digestibility, modified carbohydrates, reduced cross-linking, or self-produced enzymes (*in planta*). Currently available sugarcane genetic resources include diverse germplasm in the genus *Saccharum*, genetic markers and maps, ESTs, and the genome sequence of a closely related species. However, novel strategies need to be developed to overcome the challenges posed by the complex genetics. Traditional approaches using breeding and molecular genetics have potential for wider use in improving sugarcane, while the advent of NGS technology and high-throughput phenotyping technologies will accelerate the genome-wide dissection of biofuel traits. By using these approaches, the loci of interest will be defined for use in improving sugarcane biomass. 
Once a better understanding of the genes controlling cell wall biosynthesis is achieved, breeding programs will be able to accelerate the selection and development of varieties with optimized biomass composition to generate better sugarcane biomass sources to meet the demand of biofuel production.

## **AUTHOR CONTRIBUTIONS**

NVH wrote the paper. AF, FCB, BAS, and RJH discussed and edited the manuscript. All authors read and approved the final manuscript.


## **ACKNOWLEDGMENTS**

We are grateful to the Australian Agency for International Development (AusAID) for financial support through an Australian Development Scholarship (ADS) to NVH and to the Queensland Government for funding this research. This work was part of the DOE Joint BioEnergy Institute (http://www.jbei.org) supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the U.S. Department of Energy.




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Hoang, Furtado, Botha, Simmons and Henry. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Identification and molecular characterization of the switchgrass AP2/ERF transcription factor superfamily, and overexpression of** *PvERF001* **for improvement of biomass characteristics for biofuel**

### *Edited by:*

*Robert Henry, The University of Queensland, Australia*

### *Reviewed by:*

*Jaime Puna, Instituto Superior Engenharia Lisboa, Portugal Arumugam Muthu, Council of Scientific and Industrial Research, India*

### *\*Correspondence:*

*C. Neal Stewart Jr., Department of Plant Sciences, University of Tennessee, 2431 Joe Johnson Drive, 252 Ellington Plant Sciences, Knoxville, TN 37996-4561, USA nealstewart@utk.edu*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

*Received: 29 May 2015; Accepted: 29 June 2015; Published: 20 July 2015*

### *Citation:*

*Wuddineh WA, Mazarei M, Turner GB, Sykes RW, Decker SR, Davis MF and Stewart CN Jr. (2015) Identification and molecular characterization of the switchgrass AP2/ERF transcription factor superfamily, and overexpression of PvERF001 for improvement of biomass characteristics for biofuel. Front. Bioeng. Biotechnol. 3:101. doi: 10.3389/fbioe.2015.00101*

*Wegi A. Wuddineh<sup>1,2</sup>, Mitra Mazarei<sup>1,2</sup>, Geoffrey B. Turner<sup>2,3</sup>, Robert W. Sykes<sup>2,3</sup>, Stephen R. Decker<sup>2,3</sup>, Mark F. Davis<sup>2,3</sup> and C. Neal Stewart Jr.<sup>1,2</sup>\**

*<sup>1</sup> Department of Plant Sciences, University of Tennessee, Knoxville, TN, USA, <sup>2</sup> Bioenergy Science Center, Oak Ridge National Laboratory, Oak Ridge, TN, USA, <sup>3</sup> National Renewable Energy Laboratory, Golden, CO, USA*

The APETALA2/ethylene response factor (AP2/ERF) superfamily of transcription factors (TFs) plays essential roles in the regulation of various growth and developmental programs including stress responses. Members of this superfamily in other plant species have been implicated in the regulation of cell wall biosynthesis. Here, we identified a total of 207 AP2/ERF TF genes in the switchgrass genome and grouped them into four gene families comprising 25 AP2-, 121 ERF-, 55 DREB (dehydration responsive element binding)-, and 5 RAV (related to ABI3/VP1) genes, as well as a singleton gene not fitting any of the above families. The ERF and DREB subfamilies comprised seven and four distinct groups, respectively. Analysis of exon/intron structures of switchgrass AP2/ERF genes showed high diversity in the distribution of introns in AP2 genes versus a single or no intron in most genes in the ERF and RAV families. The majority of the subfamilies or groups within them were characterized by the presence of one or more specific conserved protein motifs. *In silico* functional analysis revealed that many genes in these families might be associated with the regulation of responses to environmental stimuli via transcriptional regulation of the response genes. Moreover, these genes had diverse endogenous expression patterns in switchgrass during seed germination, vegetative growth, flower development, and seed formation. Interestingly, several members of the ERF and DREB families were found to be highly expressed in plant tissues where active lignification occurs. These results provide vital resources to select candidate genes to potentially impart tolerance to environmental stress as well as reduced recalcitrance. 
Overexpression of one of the ERF genes (*PvERF001*) in switchgrass was associated with increased biomass yield and sugar release efficiency in transgenic lines, exemplifying the potential of these TFs in the development of lignocellulosic feedstocks with improved biomass characteristics for biofuels.

**Keywords: AP2, ethylene response factors, stress response, transcription factors, biofuel, PvERF001, overexpression, sugar release**

## **Introduction**

Switchgrass (*Panicum virgatum*) is an outcrossing perennial C4 grass known for its vigorous growth and wide adaptability and, hence, is being developed as a candidate lignocellulosic biofuel feedstock (Yuan et al., 2008). The feasibility of commercial production of liquid transportation biofuel from switchgrass biomass is hampered by biomass recalcitrance (the resistance of cell wall to enzymatic breakdown into simple sugars). Lignin is considered to be a primary contributor to biomass recalcitrance as it hinders the accessibility of cell wall carbohydrates to hydrolytic enzymes. Substantial progress has been made in engineering the switchgrass lignin biosynthesis pathway to reduce lignin content and/or modify its composition (Fu et al., 2011a,b; Shen et al., 2012, 2013a,b; Baxter et al., 2014, 2015). The downregulation of individual genes in the lignin biosynthesis pathway has been effective to reduce lignin, but can result in the production of metabolites that can impede downstream fermentation processes (Tschaplinski et al., 2012). Alternatively, overexpression of transcription factors (TFs), such as switchgrass *MYB4*, has been shown to circumvent this inhibitory effect while leading to significantly reduced biomass recalcitrance and improved ethanol production (Shen et al., 2012, 2013b; Baxter et al., 2015).

The master regulators of gene cluster TFs with altered expression could, in turn, endow such traits as increased biomass yield, tiller number, improved germination/plant establishment, or root growth as well as tolerance to environmental stresses (Xu et al., 2011; Licausi et al., 2013; Ambavaram et al., 2014). Therefore, identification of TFs with such putative roles would provide a dynamic approach to developing better biofuel feedstocks that could thrive under adverse environmental conditions. The availability of switchgrass ESTs (Zhang et al., 2013) and draft genome sequences produced by Joint Genome Institute (JGI), Department of Energy, USA, provides a vital resource for the discovery of relevant target genes that could be utilized in the genetic improvement of perennial grasses, which could be used as dedicated bioenergy feedstocks. However, compared to dicots such as *Arabidopsis*, relatively little is known about the key regulatory mechanisms in monocots that control lignification and cell wall formation; this is especially true of switchgrass. Likewise, we also have depauperate knowledge about stress responses and defense against pests in these species.

APETALA2/ethylene responsive factor (AP2/ERF) is a large group of regulatory protein families in plants that are characterized by the presence of one or two conserved AP2 DNA binding domains. AP2/ERF TFs are involved in the transcriptional regulation of various growth and developmental processes and responses to environmental stressors. The AP2 domain is a conserved stretch of 60–70 amino acids that is essential for the activity of AP2/ERF TFs (Jofuku et al., 2005). It has been demonstrated that the AP2 domain binds *cis*-acting elements including the GCC box motif (Ohme-Takagi and Shinshi, 1995), the dehydration responsive element (DRE)/C-repeat element (CRT) (Sun et al., 2008), and/or the TTG motif (Wang et al., 2015) present in the promoter regions of target genes, thereby regulating their expression. The AP2/ERF superfamily can be divided into three major families, namely ERF, AP2, and RAV (related to ABI3/VP1) (Licausi et al., 2013). The ERF family is further subdivided into two subfamilies, ERF and dehydration responsive element binding proteins (DREB), based on similarities in amino acid residues in the AP2 domain. The DREB subfamily in *Arabidopsis* and rice has been further classified into 4 distinct groups, while the ERF subfamily was clustered into 8 groups in *Arabidopsis* and 11 groups in rice based on analysis of gene structure and conserved motifs (Nakano et al., 2006). The AP2 family comprises two groups of proteins differing in the number of AP2 domains in their amino acid sequences. The majority of proteins in this family are characterized by the presence of two AP2 domains, but a few members have only a single AP2 domain that is more similar to the AP2 domains in the double-domain group. RAV proteins, on the other hand, are a small family of TFs characterized by the presence of a B3 DNA binding domain in addition to a single AP2 domain. 
Genome-wide analysis of AP2/ERF TFs has been conducted extensively in many dicots including *Arabidopsis* (Nakano et al., 2006), *Populus* (Zhuang et al., 2008; Vahala et al., 2013), Chinese cabbage (Liu et al., 2013), grapevine (Licausi et al., 2010), peach (Zhang et al., 2012), and castor bean (Xu et al., 2013). However, with the exception of rice (Nakano et al., 2006; Rashid et al., 2012) and foxtail millet (Lata et al., 2014), little information is available on the AP2/ERF TF families in monocots such as switchgrass.

Numerous genes coding for AP2/ERF superfamily TFs have been identified and functionally characterized in various plant species (Xu et al., 2011; Licausi et al., 2013). The DREB subfamily proteins have been extensively studied with regard to tolerance to abiotic stresses such as freezing (Jaglo-Ottosen et al., 1998; Ito et al., 2006; Fang et al., 2015), drought (Hong and Kim, 2005; Oh et al., 2009; Fang et al., 2015), heat (Qin et al., 2007), and salinity (Hong and Kim, 2005; Bouaziz et al., 2013). Moreover, it has been reported that DREB genes play roles in the regulation of ABA-mediated gene expression in response to osmotic stress during germination and the early vegetative growth stage (Fujita et al., 2011). ERF TFs, on the other hand, have been shown to participate in the regulation of defense responses against various biotic stresses (Guo et al., 2004; Dong et al., 2015) and/or tolerance to environmental stressors, such as drought (Aharoni et al., 2004; Zhang et al., 2010b), osmotic stress (Zhang et al., 2010a), salinity (Guo et al., 2004), hypoxia (Hattori et al., 2009), and freezing (Zhang and Huang, 2010). Moreover, AP2/ERF TFs in aspen (PtaERF1) and *Arabidopsis* (AtERF004 and AtERF038) have been suggested to be associated with the regulation of cell wall biosynthesis in some tissues (Van Raemdonck et al., 2005; Lasserre et al., 2008; Ambavaram et al., 2011). The functions of AP2 family TFs, on the other hand, have been associated with plant organ-specific regulation of growth and developmental programs (Elliott et al., 1996; Jofuku et al., 2005; Horstman et al., 2014). Genes in the RAV TF family have been shown to play a role in the regulation of gene expression in response to phytohormones such as ethylene and brassinosteroid as well as in response to biotic and abiotic stresses (Mittal et al., 2014). 
Therefore, the AP2/ERF TF superfamily may hold tremendous potential for the improvement of bioenergy feedstocks, such as switchgrass, that are intended to be grown on marginal lands that could impose undue environmental stress.

In this study, we report the identification of 207 AP2/ERF TF genes in the switchgrass genome. Cluster analysis of the identified proteins, the distribution of conserved motifs, analysis of their gene structure, and expression profiling are presented. We highlight the potential application of these data to identify putative target genes that might be exploited to improve bioenergy feedstocks. To that end, we cloned one of the ERF subfamily genes, which was subsequently overexpressed in switchgrass to improve biomass productivity and sugar release efficiency.

## **Materials and Methods**

### **Identification of AP2/ERF Gene Families in Switchgrass Genome**

We used representative genes from appropriate rice gene families as the basis to search for orthologs in switchgrass. The amino acid sequences of AP2 domain-containing rice genes represented three families: AP2 (Os02g40070), ERF (Os06g40150), and RAV (Os01g04800). These proteins were used to query the derived amino acid sequences of all switchgrass AP2/ERF TFs using tblastn against the switchgrass EST database (Zhang et al., 2013) or blastp against the *P. virgatum* draft genome (Phytozome v1.1 DOE-JGI)<sup>1</sup>. The sequences were retrieved and evaluated for the presence of AP2 domains by searching against the conserved domain database (CDD) at NCBI. The AP2-containing switchgrass sequences were further evaluated for any redundant and missing sequences by blastp searches using the previously identified homologous counterparts in foxtail millet (Lata et al., 2014) and rice (Nakano et al., 2006; Rashid et al., 2012). The presence of multiple gene copies from the tetraploid switchgrass genome was addressed by retaining only the single gene copy with the highest similarity to the corresponding homolog in foxtail millet or rice. Genes with additional domains besides the AP2 domain and with no corresponding homologs among foxtail millet, rice, and *Arabidopsis* AP2/ERF TFs were excluded from our subsequent analysis.

### **Cluster and Protein Sequence Analysis of AP2/ERF TFs**

The amino acid sequences of the AP2/ERF TFs were imported into the MEGA6 program, and multiple sequence alignment was conducted using MUSCLE with default parameters (Edgar, 2004). Cluster trees were constructed in MEGA6 using the neighbor-joining (NJ) method with 1000 bootstrap replicates, Poisson correction, and pairwise deletion (Tamura et al., 2013). Conserved motifs in switchgrass AP2/ERF TFs were identified with the online tool MEME version 4.10.0<sup>2</sup> using the following parameters: optimum width, 6–200 amino acids; any number of repetitions; and maximum number of motifs, 25 (Bailey and Elkan, 1994).
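The distance underlying an NJ tree built with Poisson correction and pairwise deletion can be illustrated with a minimal sketch. It is not the MEGA6 implementation; the function name and the choice of `-` and `?` as gap or missing-data symbols are assumptions made for this example. For each pair of aligned sequences, sites with a gap in either sequence are skipped (pairwise deletion), the proportion *p* of differing sites is computed, and the Poisson-corrected distance is *d* = −ln(1 − *p*).

```python
import math

# Assumption for this sketch: '-' marks alignment gaps and '?' marks missing data.
GAP_CHARS = {"-", "?"}


def poisson_distance(seq_a, seq_b):
    """Poisson-corrected distance between two aligned protein sequences.

    Sites where either sequence has a gap are ignored for this pair only
    (pairwise deletion), then d = -ln(1 - p) is applied to the proportion
    p of differing sites.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # pairwise deletion: skip this site for this pair
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites between the two sequences")
    p = different / compared
    if p >= 1.0:
        return math.inf  # the correction is undefined when every site differs
    return -math.log(1.0 - p)
```

A full tree would apply this function to every pair of sequences and pass the resulting matrix to an NJ routine, with bootstrap resampling of alignment columns repeated 1000 times.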

### **Analysis of Gene Structure and Gene Ontology Annotation**

The genomic and coding DNA sequences of the identified AP2/ERF TFs were retrieved from Phytozome (*P. virgatum* v1.1 DOE-JGI). The exon–intron organizations in these genes were visualized by the gene structure display server<sup>3</sup> (Guo et al., 2007). To evaluate the gene ontology (GO) annotation of the identified AP2/ERF TFs, their amino acid sequences were imported into the Blast2GO suite (Conesa and Gotz, 2008). A blastp search was performed against rice protein sequences at NCBI. The resulting hits were mapped to obtain the GO terms, which were annotated to assign functional terms to the query sequences. Plant GOslim was used to filter the annotation to plant-related terms. The protein subcellular localization prediction tool WoLF PSORT<sup>4</sup> was used to complement the results of the cellular localization predicted by Blast2GO.

### **Analysis of Transcript Data from the Switchgrass Gene Expression Atlas**

The transcript data for the AP2/ERF superfamily TFs were extracted from the publicly available switchgrass gene expression atlas (PviGEA)<sup>5</sup> (Zhang et al., 2013), which was obtained by Affymetrix microarray analysis. The probe set IDs of 108 matching genes representing the switchgrass unitranscripts (PviUT) were identified by tblastn search using the amino acid sequences of the AP2/ERF TFs as queries. The transcript data for each tissue and stage of development were retrieved using the probe set IDs. The expression values of the genes were log2 transformed and a heatmap was created using an online graphing tool, Plotly<sup>6</sup>. Tissues used for the extraction of RNA to determine the level of expression included the following: whole seeds for seed germination at 24, 48, 72, and 96 h post-imbibition; whole shoots and roots at vegetative stages V1–V5; pooled leaf sheath (LSH), leaf blade (LB), and nodes; whole crown; the bottom, middle, and top portions of the fourth internode; vascular bundle tissues; and the middle portion of the third internode, all at the E4 (stem elongation stage 4) developmental stage. For analysis of the expression level during reproductive developmental stages, inflorescence tissues and whole seeds along with floral tissues such as lemma and palea were used.
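The log2 transformation applied before drawing the heatmap can be sketched as follows. This is an illustration only: the nested-dictionary layout and the optional `pseudocount` argument are assumptions for the example and are not described in the text.

```python
import math


def log2_transform(expression, pseudocount=0.0):
    """Return log2-transformed expression values.

    `expression` maps gene ID -> {tissue label -> raw expression value}.
    `pseudocount` is a hypothetical convenience (default 0) for data that
    may contain zeros; the source does not state that one was used.
    """
    transformed = {}
    for gene, by_tissue in expression.items():
        transformed[gene] = {}
        for tissue, value in by_tissue.items():
            shifted = value + pseudocount
            if shifted <= 0:
                raise ValueError(f"non-positive value for {gene} in {tissue}")
            transformed[gene][tissue] = math.log2(shifted)
    return transformed
```

The resulting gene-by-tissue matrix of log2 values is what a heatmap tool would color, so that a one-unit difference in color scale corresponds to a twofold difference in expression.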

### **Vector Construction and Plant Transformation**

Cloning and tissue culture were performed as previously described (Wuddineh et al., 2015). Briefly, the putative homolog of *Arabidopsis* AtSHN2 (At5g11190) and rice OsSHN (Os06g40150) was identified by tblastn or blastp against the switchgrass EST database or draft genome (Phytozome v1.1 DOE-JGI), followed by cluster and multiple sequence alignment analysis to discriminate the most closely related gene for cloning. For construction of the overexpression cassette, the open reading frame (ORF) of *PvERF001* was isolated from cDNA obtained from the ST1 clonal genotype of 'Alamo' switchgrass using gene-specific primers flanking the ORF of the gene and cloned into the pANIC-10A expression vector by GATEWAY recombination (Mann et al., 2012). The primer pairs used for cloning are shown in Table S1 in Supplementary Material. Embryogenic callus derived from the SA1 clonal genotype of 'Alamo' switchgrass (King et al., 2014) was transformed with the expression vector construct through *Agrobacterium*-mediated transformation (Burris et al., 2009). Antibiotic selection was carried out for about 2 months on 30–50 mg/L hygromycin, followed by regeneration of orange fluorescent protein (OFP) reporter-positive callus sections on regeneration medium (Li and Qu, 2011) containing 400 mg/L timentin. Regenerated plants were rooted on MSO medium (Murashige and Skoog, 1962) with 250 mg/L cefotaxime to ensure elimination of *Agrobacterium* from the tissues as well as to promote shoot regeneration from transgenic callus (Grewal et al., 2006), and the transgenic lines were screened based on the presence of the insert and expression of the transgene. Simultaneously, a non-transgenic control line was also generated from callus.

<sup>1</sup> http://phytozome.jgi.doe.gov/pz/portal.html

<sup>2</sup> http://meme-suite.org/

<sup>3</sup> http://gsds.cbi.pku.edu.cn/

<sup>4</sup> http://www.genscript.com

<sup>5</sup> http://switchgrassgenomics.noble.org/

<sup>6</sup> https://plot.ly/plot

### **Plants and Growth Conditions**

T0 transgenic and non-transgenic control plants were grown in growth chambers under standard conditions (16 h day/8 h night at 24°C, 390 μE·m<sup>−2</sup>·s<sup>−1</sup>) and watered three times per week, including weekly nutrient supplements with 100 mg/L Peter's 20-20-20 fertilizer. Transgenic and non-transgenic control lines were propagated from a single tiller to produce three clonal replicates for measuring growth parameters (Hardin et al., 2013). The plants were grown in 12-L pots in Fafard 3B soil mix (Conrad Fafard, Inc., Agawam, MA, USA) for 4 months to the R1 stage, at which shoot samples were collected to assay the transgene transcript abundance (Moore et al., 1991; Shen et al., 2009). Each sample was snap frozen in liquid nitrogen and macerated with mortar and pestle. The macerated samples were used for RNA extraction as described below.

### **RNA Extraction and Quantitative Reverse Transcription Polymerase Chain Reaction**

RNA extraction and analysis of transgene transcripts were performed as previously described (Wuddineh et al., 2015). Briefly, total RNA was extracted from shoot tip samples of transgenic and non-transgenic control lines using Tri-Reagent (Molecular Research Center, Cincinnati, OH, USA), and 3 μg of the RNA was treated with DNase-I (Promega, Madison, WI, USA). The High-Capacity cDNA Reverse Transcription kit (Applied Biosystems, Foster City, CA, USA) was used for the synthesis of first-strand cDNA. Power SYBR Green PCR master mix (Applied Biosystems) was used to conduct quantitative reverse transcription polymerase chain reaction (qRT-PCR) analysis according to the manufacturer's protocol. All experiments were conducted in triplicate. The list of all primer pairs used for qRT-PCR is shown in Table S1 in Supplementary Material. Analysis of the relative expression was done as previously described (Wuddineh et al., 2015). No amplification products were observed with any of the primer pairs when RNA alone or water was used as template instead of cDNA.

### **Determination of Leaf Water Loss**

The rate of water loss through the leaf epidermal layer was determined as previously described (Zhou et al., 2014). The second fully expanded leaves of both transgenic and non-transgenic plants were excised and soaked in 50 mL distilled water for 2 h in the dark to saturate the leaves. Excess water was then removed, the initial leaf weight was measured, and water loss was determined by weighing the leaves every 30 min for at least 3 h. Finally, the detached leaves were dried for 24 h at 80°C to determine the final dry weight. The rate of water loss was calculated as the weight of water lost divided by the initial leaf weight.
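The calculation in the last sentence can be written as a short sketch. The function name and the example weights below are hypothetical; only the formula (water lost divided by initial leaf weight) comes from the text.

```python
def water_loss_fraction(initial_weight, weights_over_time):
    """Fraction of the initial leaf weight lost as water at each time point.

    `initial_weight` is the saturated leaf weight at time zero and
    `weights_over_time` holds the weights recorded at each later weighing
    (for example every 30 min), all in the same unit.
    """
    if initial_weight <= 0:
        raise ValueError("initial leaf weight must be positive")
    return [(initial_weight - w) / initial_weight for w in weights_over_time]


# Hypothetical weights (g) for one detached leaf weighed three times.
losses = water_loss_fraction(2.0, [2.0, 1.5, 1.0])
```

Plotting these fractions against time for transgenic and control leaves gives water loss curves whose slopes can be compared between lines.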

### **Analysis of Lignin Content and Composition**

Both qualitative (phloroglucinol–HCl staining) and quantitative [pyrolysis molecular beam mass spectrometry (py-MBMS)] analyses of lignin content were performed as previously described (Wuddineh et al., 2015). Briefly, leaf samples collected at the R1 developmental stage and cleared in a 2:1 solution of ethanol and glacial acetic acid for 5 days were used for staining analysis. The cleared leaf samples were immersed in 1% phloroglucinol (in 2:1 ethanol/HCl) overnight for staining, and pictures were taken at 2× magnification. For the quantification of lignin content and S:G lignin monomer ratio by the NREL high-throughput py-MBMS method, tillers were collected at the R1 developmental stage, air-dried for 3 weeks at room temperature, and milled to 1 mm (20 mesh) particle size. Lignin content and composition were determined on extractives- and starch-free samples (Sykes et al., 2009).

### **Determination of Sugar Release**

For analysis of sugar release efficiency, tiller samples at the R1 developmental stage were collected and air-dried for 3 weeks at room temperature. The dry samples were pulverized to 1 mm (20 mesh) particle size, and sugar release efficiency was determined via NREL high-throughput sugar release assays on extractives- and starch-free samples (Decker et al., 2012). Glucose release and xylose release were measured by colorimetric assays and summed for total sugar release.

### **Statistical Analysis**

To analyze the differences between treatment means, analysis of variance (ANOVA) with the least significant difference (LSD) procedure was used, while the PROC TTEST procedure was used to examine the statistical difference between the expression of target genes in transgenic vs. non-transgenic lines, using SAS version 9.3 (SAS Institute Inc., Cary, NC, USA). Pearson's correlation coefficient, calculated in SAS, was used to determine the relationship between relative transcript levels and growth parameters.
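For readers without SAS, the Pearson correlation coefficient used above can be reproduced with a few lines of Python. This is an independent sketch of the standard formula, not the SAS procedure used in the study, and the function name is an assumption for the example.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length with n >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)
```

In this setting, `x` would hold the relative transgene transcript level of each line and `y` the corresponding growth parameter, such as dry biomass or tiller height.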

## **Results**

### **Identification of AP2/ERF TFs in Switchgrass Genome**

A total of 207 unique switchgrass genes containing one or two AP2 DNA-binding domains were identified from the currently available switchgrass EST and genome databases. Amino acid sequence similarities within the conserved AP2 domain between these proteins and previously characterized AP2/ERF TFs from rice and *Arabidopsis*, along with the presence of a conserved B3 domain, suggest that these proteins might be categorized as putative AP2/ERF TFs. The characteristic features of these genes are summarized in Table S2 in Supplementary Material. The amino acid sequences of AP2/ERF TFs showed wide variation in size (ranging from 119 to 666 amino acids) and sequence composition. Twenty-two of these TFs contained two AP2 DNA-binding domains and hence were classified under the AP2 family. Five of the AP2/ERF proteins had a conserved B3 domain at the C-terminus in addition to the common AP2 domain, and these genes were grouped into the RAV family. Three of the remaining 180 proteins, namely PvERF049, PvERF160, and PvERF177, each have a single AP2 domain that is more similar to the AP2 domains of AP2 family TFs and were therefore also grouped under the AP2 family. Moreover, one AP2/ERF protein (the singleton) showed a distinct AP2 domain that differed from those of all other switchgrass AP2/ERF proteins but shared higher sequence similarity with the previously identified genes in rice and *Arabidopsis*. The remaining 176 proteins were grouped into the ERF family, which was further subdivided into two subfamilies (ERF and DREB) based on sequence similarity in the AP2 domain. The ERF subfamily included 121 proteins, while the DREB subfamily had only 55 proteins (**Table 1**).

### **TABLE 1 | Summary of the AP2/ERF superfamily gene members found in various plant species**.

*The switchgrass (P. virgatum) data are from this study. Note that switchgrass is the only polyploid species listed above.*

*<sup>a</sup>Nakano et al. (2006).*
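The domain-based part of this classification can be summarized as a first-pass rule, sketched below. The function is hypothetical and deliberately incomplete: it cannot reproduce the similarity-based decisions described above (the three single-domain proteins assigned to the AP2 family, the singleton, and the ERF versus DREB split), which require comparison of the AP2 domain sequences. The counts are those reported in the text.

```python
def classify_family(n_ap2_domains, has_b3_domain):
    """First-pass family assignment from domain composition alone.

    Similarity-based exceptions (single-domain AP2 members, the singleton,
    ERF vs. DREB) are outside the scope of this sketch.
    """
    if n_ap2_domains < 1:
        raise ValueError("not an AP2/ERF protein: no AP2 domain found")
    if n_ap2_domains >= 2:
        return "AP2"
    if has_b3_domain:
        return "RAV"
    return "ERF"  # single AP2 domain, no B3 domain


# Family sizes as reported in the text; AP2 = 22 two-domain + 3 single-domain.
REPORTED_COUNTS = {"AP2": 22 + 3, "RAV": 5, "Singleton": 1, "ERF": 121, "DREB": 55}
```

Summing `REPORTED_COUNTS` recovers the 207 genes reported for the superfamily, which is a quick consistency check on the family sizes.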

The distribution of the identified switchgrass AP2/ERF genes across the nine chromosomes was also evaluated. Thus far, only about half of the switchgrass genomic sequences have been mapped to their chromosomal locations in the DOE-JGI draft genome assembly available at Phytozome. Accordingly, 166 of the 207 genes could be assigned a chromosomal location. The genes were unevenly distributed across the nine switchgrass chromosomes: the highest numbers of genes were localized on chromosomes 9, 2, and 1, and the fewest genes were assigned to chromosome 8 (Table S3 in Supplementary Material).

### **Cluster Analysis of Switchgrass AP2/ERF Proteins**

To confirm the classification and evaluate the sequence similarities between the switchgrass AP2/ERF TFs, a dendrogram was constructed by the NJ method using the whole amino acid sequences of the proteins. The analysis showed distinct clustering of the proteins into specific groups and families, as previously described in other species (**Figure 1**). Specifically, these clusters highlighted the distinction between the switchgrass AP2, ERF, and RAV families as well as between the ERF and DREB subfamilies. The ERF and DREB subfamilies were further subdivided into seven (groups V–XI) and four (groups I–IV) distinct groups, respectively. The cluster analysis also resolved the RAV protein family and the singleton into separate clusters, in accordance with the sequence similarities in the conserved domains as well as the presence of additional domains in the families/clusters.

### **Characterization of AP2/ERF Gene Structures and Conserved Motifs**

To complement the cluster analysis-based classification, the exon–intron structures of AP2/ERF genes were evaluated. The schematic representations of protein and gene structures of the switchgrass AP2/ERF superfamily are presented in **Figure 2** (ERF), **Figure 3** (DREB), and **Figure 4** (AP2, RAV, and Singleton). The ORF lengths of these genes vary from 394 bp for the shortest gene to 5409 bp for the longest gene. Analysis of their gene structure showed a highly diverse distribution of intron regions within the ORF of the different gene groups or families. The majority of genes belonging to the ERF and DREB subfamilies and all but one of the RAV genes appeared to be intronless. Only nine DREB genes (16%) belonging to groups I, III, and VI had a single intron in their gene structures. Among ERF genes, 45 (37%) had a single intron in their ORF, while eight genes had two introns and three had three introns. On the other hand, genes in the AP2 family contained a higher number of introns, ranging from 1 to 10. Only one gene in the AP2 family had a single intron, while the majority of the genes had more than five introns. The position and state of the introns in the ORF of ERF family genes belonging to groups V, VII, and X show high functional conservation. For instance, about half of the genes belonging to phylogenetic group V in the ERF family showed highly conserved intron positions with an intron phase of two, meaning that the intron is located between the second and third nucleotides of a codon. Similarly, the intron positions and splicing phases seem to be conserved in group VII of the ERF subfamily (**Figures 2**–**4**).

*<sup>b</sup>Zhuang et al. (2008).*
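The intron phase mentioned above follows from the number of coding nucleotides that precede the intron. The sketch below illustrates the standard definition; the function names and the example exon lengths are hypothetical and are not taken from the switchgrass gene models.

```python
def intron_phase(coding_bases_before_intron):
    """Phase of an intron from the count of coding bases upstream of it.

    Phase 0: the intron lies between two codons.
    Phase 1: the intron lies after the first nucleotide of a codon.
    Phase 2: the intron lies between the second and third nucleotides.
    """
    if coding_bases_before_intron < 0:
        raise ValueError("base count cannot be negative")
    return coding_bases_before_intron % 3


def intron_phases(coding_exon_lengths):
    """Phases of all introns, given the coding length of each exon in order."""
    phases = []
    upstream = 0
    for length in coding_exon_lengths[:-1]:  # no intron follows the last exon
        upstream += length
        phases.append(intron_phase(upstream))
    return phases
```

For example, a hypothetical gene whose first coding exon is 50 bp long has a phase 2 intron, because 50 leaves a remainder of 2 when divided by 3. Only the coding portion of each exon should be counted, so untranslated regions must be excluded first.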

Analysis of amino acid sequence conservation in the whole proteins of the AP2/ERF superfamily showed the presence of unique conserved motifs shared between proteins within families, subfamilies, or groups (**Figures 2**–**4**). Moreover, shared conserved motifs across families, subfamilies, or between groups within subfamilies were also detected, signifying the conservation of the proteins in the AP2/ERF superfamily. In total, 25 conserved motifs (M1–M25) were identified in the superfamily, of which 14 motifs (M1–M7, M9, M11, M12, M16, M20, M22, and M23) were related to the AP2 domain (Table S4 in Supplementary Material). The conserved motifs from the non-AP2 domain region appear to specify individual groups within the subfamilies. Within the ERF subfamily, proteins in groups VII and IX have the most diverse sets of motifs, while proteins in group XI harbor only two motifs, M1 and M23, the latter being unique to the group (**Figure 2**). Moreover, shared unique motifs were found in the ERF subfamily proteins belonging to groups VII (M25), IX (M10 and M15), VI (M18), and VI-L (M18). Most of the DREB genes belonging to group II have only one specific motif (M12), while a few others have additional motifs such as M5 (**Figure 3**). The pattern of conserved motif distribution within the largest group in the DREB subfamily (group III) showed the presence of two unique subgroups, each sharing a set of three conserved motifs: (M2, M9, and M16) and (M4, M11, and M21), respectively. Three of these motifs (M11, M16, and M21) were specific to proteins in group III of the DREB subfamily. DREB subfamily proteins in group I were distinguished by conserved motifs M13 and M24, while group IV DREB genes have the unique motif M2 (**Figure 3**). Proteins of AP2 family genes harbor four family-specific motifs, namely M7, M8, M20, and M22 (**Figure 4**). In addition, the majority of AP2 family proteins share M3 with ERF proteins belonging to group IX.

**FIGURE (caption fragment) |** [...] motifs within the deduced amino acid sequences as determined by the MEME tool (Bailey and Elkan, 1994). The colored boxes represent the conserved motifs. **(B)** The gene features as visualized by the gene structure display server; [...] introns are shown by thick black lines. The splicing phases of the introns are indicated by numbers. The Roman numerals indicate the group of the genes within the subfamily.

Similarly, RAV proteins also possess two unique motifs, M14 and M17 spanning the B3 DNA binding domain, in addition to M6 and M12 spanning the AP2 domain (**Figure 4**). M6 and M12 motifs are also present in most proteins in the ERF and DREB (group II) subfamilies (**Figures 2** and **3**; Table S4 in Supplementary Material).

### **Gene Ontology Annotation**

Gene ontology analysis of switchgrass AP2/ERF TFs, based on rice reference sequences, predicted the candidate genes' molecular functions, putative roles in the regulation of diverse biological processes, and cellular localization (**Figure 5**; Table S5 in Supplementary Material). According to the Blast2GO outputs, over 95% of the switchgrass genes in the AP2/ERF superfamily were predicted to have sequence-specific DNA binding activities (**Figure 5A**; Table S5 in Supplementary Material). Furthermore, these genes were anticipated to be involved in the regulation of various biosynthetic processes, which could include the biosynthesis of cuticle, waxes, hormones, and other organic compounds. Importantly, many of these genes were also predicted to participate in the regulation of responses to various environmental stresses caused either by biotic factors such as pathogens and insect pests or by abiotic factors such as flooding, water deprivation, wounding, and osmotic stress (**Figure 5B**; Table S5 in Supplementary Material). Cellular localization of the AP2/ERF TFs was predicted by Blast2GO analysis, complemented with the subcellular localization prediction tool WoLF PSORT for proteins with otherwise ambiguous results. The results showed that the majority of switchgrass AP2/ERF proteins (>80%) were at least dual targeted, i.e., localized to the nucleus, plastid, and/or mitochondrion (**Figure 5C**; Table S5 in Supplementary Material). Only 39 gene products (20%) were predicted to be localized solely to the nucleus (Table S5 in Supplementary Material).

### **Expression Pattern of Switchgrass** *AP2/ERF* **Genes**

A switchgrass gene expression atlas (PviGEA) containing expression data for about 78,000 unique transcripts in various tissues was recently developed (Zhang et al., 2013) and is publicly available on a web server<sup>7</sup>. To investigate whether the identified switchgrass AP2/ERF genes may be associated with the biological processes that occur during seed germination, vegetative and reproductive development, and lignification or cell wall development, transcript data were pooled from the PviGEA web server to assess their expression profiles.

During seed germination, some genes in the DREB subfamily showed high expression at early stages of germination (radicle emergence, 48 h after imbibition), while others showed increased expression at later stages of germination (mainly coleoptile emergence) (**Figure 6**; Table S6 in Supplementary Material). Similarly, the expression of many ERF genes increased dramatically during the early germination stage, while numerous others had peak expression at later stages [coleoptile emergence (72 h) and mesocotyl elongation (96 h)]. Four of the AP2 family genes (*PvERF193*, *PvERF194*, *PvERF195*, and *PvERF201*) displayed increased expression at radicle emergence, whereas the other two (*PvERF049* and *PvERF203*) showed increased expression at coleoptile emergence. The expression of the RAV genes and the singleton gene was relatively less variable throughout the seed germination process (**Figure 6**; Table S6 in Supplementary Material).

Comparison of the expression patterns of AP2/ERF genes in roots and shoots at three vegetative phases of development (first, third, or fifth fully collared leaf stages) revealed differential expression between the organs and between the different stages of vegetative development (Figure S1 and Table S6 in Supplementary Material). Moreover, the expression of AP2/ERF genes during reproductive development also differed between the reproductive tissues, from the initiation of the inflorescence meristem to the maturation of the seeds (Figure S2 and Table S6 in Supplementary Material).

<sup>7</sup> http://switchgrassgenomics.noble.org/

**FIGURE 5 (caption fragment) |** [...] processes **(A)**, molecular function **(B)**, and cellular localization **(C)**.

### **Expression Profiles of Switchgrass AP2/ERF Genes in Lignified Tissues**

To evaluate whether the identified switchgrass genes coding for AP2/ERF TFs are associated with the regulation of cell wall biosynthetic genes during cell wall formation or lignification, the transcript data extracted from the PviGEA web server were used to compare expression levels in the lignified tissues of vascular bundles and internode fragments against those in less lignified plant tissues such as LBs and LSH. Four genes in group I (*PvERF95*, *PvERF98*, *PvERF101*, and *PvERF102*) and one gene in group II (*PvERF148*) of the DREB subfamily showed the highest expression in vascular bundles and internode tissues, followed by internode portions where active lignification is expected (**Figure 7**; Table S6 in Supplementary Material). The majority of DREB genes belonging to group III were highly expressed mainly in the vascular bundles. Similarly, many genes in the ERF subfamily groups VIII (*PvERF013*, *PvERF015*, *PvERF016*, *PvERF018*, *PvERF019*, and *PvERF020*) and X (*PvERF047*, *PvERF065*, and *PvERF103*) showed the highest expression in the vascular bundles, followed by the youngest internode sections (**Figure 7**; Table S6 in Supplementary Material). In comparison, only two genes in group IX (*PvERF037* and *PvERF038*), one gene in group VI-L (*PvERF088*), and three genes in group VII (*PvERF111*, *PvERF112*, and *PvERF116*) had high expression in vascular bundles. In contrast, some genes in the ERF subfamily belonging to groups V (*PvERF001* and *PvERF002*) and VI (*PvERF068*) showed the highest expression in the basal fragments of the fourth internode (E4), which are under less active lignification. Other genes, including *PvERF178* (VI); *PvERF110* (VII), *PvERF115* (VII), and *PvERF164* (VII); and *PvERF038* (IX), had notably higher relative expression in roots than in other tissues. Compared to the ERF family genes, the expression of AP2 genes was highly diverse, with some genes having high specificity to roots and vascular bundles. The expression of the two RAV genes analyzed was uniformly low throughout, whereas the singleton gene was highly expressed in the LBs and LSH as well as in the vascular bundles and young internode sections (**Figure 7**; Table S6 in Supplementary Material).

### **Overexpression of** *PvERF001* **in Switchgrass Enhanced Plant Growth and Sugar Release Efficiency**

Transgenic switchgrass with less recalcitrant biomass is desirable for biofuel production. To that end, we selected PvERF001, a putative switchgrass homolog of *Arabidopsis* AtERF004 (AtSHN2) and rice OsERF057 (OsSHN) in ERF subfamily group V, for overexpression analysis in switchgrass. This gene was selected because the expression of its *Arabidopsis* homolog in transgenic rice resulted in modified cell wall composition (Ambavaram et al., 2011). Sequence grouping/cluster and sequence alignment analysis suggested that PvERF001 is closely related to its rice and *Arabidopsis* homologs, sharing two highly conserved motifs: the middle motif (mm) and the C-terminal motif (cm) specific to the *Arabidopsis* SHINE clade of TFs (AtERF001, AtERF004, and AtERF005) and OsERF012 and OsERF057 (**Figures 8A,B**). Thus, the ORF of *PvERF001* was cloned and overexpressed in switchgrass, producing more than six independent transgenic lines, which were confirmed based on genomic PCR for the insertion of the transgene and the hygromycin-resistance gene, as well as visualization of OFP in transgenic plants compared to the non-transgenic control lines (**Figure 9A**; Figures S3A–C in Supplementary Material). Analysis of the transgene expression level by qRT-PCR showed 1–12-fold overexpression in transgenic lines (**Figure 9B**). The expression of the endogenous gene in transgenic lines was not affected compared to the non-transgenic control line (**Figure 9C**). All transgenic lines had equivalent or improved vegetative growth metrics relative to the non-transgenic control lines under greenhouse conditions, which was congruent with the relative transcript levels of the transgene [Pearson's correlation for biomass weights (*R* = 0.77 at *P* < 0.05) and tiller height (*R* = 0.73 at *P* = 0.06)] (**Figure 9B**; **Table 2**; Figure S4 in Supplementary Material). Three transgenic lines (3, 7, and 9) had increased biomass. Line 3 had statistically significant increases in four of the six growth traits and approximately twice the dry biomass of the control line (**Table 2**).

**FIGURE 7 | The expression pattern of putative switchgrass AP2/ERF genes in roots and shoot parts including portions of developing internodes and vascular bundles at stem elongation stage 4 (E4)**. The heat-map depicting the log2-transformed values of the expression level of each gene was obtained from the switchgrass gene expression atlas (PviGEA). The color scale represents the log2 values of gene expression, with blue denoting low expression and red denoting high expression. The level of expression [...] (E4-I3mVB), middle fragments of the third internode (E4-I3mdl) and from the bottom (E4-I4btm), middle (E4-I4mdl), and top (E4-I4top) fragments of the fourth internode. The Roman numerals I–IV represent the groups of the genes in the DREB subfamily, while V–X represent the groups of genes in the ERF subfamily.

To investigate whether *PvERF001* overexpression could affect leaf cuticular permeability, the water retention capacity of transgenic and non-transgenic control lines was analyzed in detached leaves measured in the dark to minimize transpirational water loss through stomata. Transgenic lines showed a relative reduction in the rate of water loss compared with the control lines (Figure S5 in Supplementary Material). However, no tangible difference was observed in the rate of leaf chlorophyll leaching between the transgenic and control lines (data not shown). Subsequently, we analyzed whether the changes in leaf morphology might be accompanied by changes in the expression level of genes in the cutin and wax biosynthesis pathway, but none were observed (Figure S6 in Supplementary Material). Moreover, overexpression of *PvERF001* in transgenic switchgrass was associated with relatively reduced expression of some lignin (*PvC4H* and *PvPAL*), hemicellulose (*PvCSLS2*), and cellulose (*PvCESA4*) biosynthetic genes, as well as some of the transcriptional regulators (*PvMYB48/59* and *PvNST1*) of cell wall biosynthesis (Figures S7A–C in Supplementary Material). The total lignin content in R1 tillers determined by py-MBMS of cell wall residues and in leaves determined by phloroglucinol–HCl staining did not show a sizeable difference between the transgenic and non-transgenic control lines (Figures S8 and S9A in Supplementary Material). Similarly, the S/G lignin monomer ratio in transgenic lines did not differ significantly from that of the non-transgenic control line (Figure S9B in Supplementary Material). However, significant improvement in glucose release efficiency was observed in lines 7 (10%) and 8 (16%) relative to the non-transgenic control line (**Table 3**). In contrast, none of the transgenic lines released significantly more xylose than the control. The total sugar release, however, was significantly increased in transgenic line 8, by 11% relative to the non-transgenic control (**Table 3**).

**FIGURE 9 | Representative** *PvERF001* **overexpressing and non-transgenic control (WT) switchgrass lines (A)**. Relative transcript levels of the transgene **(B)** and endogenous gene **(C)** in *PvERF001* overexpressing and non-transgenic (WT) plants. The expression analysis was done using RNA from the shoot tips at E4 developmental stage. The dissociation curve for the qRT-PCR products showed that the primers were gene-specific. The relative levels of transcripts were normalized to ubiquitin (UBQ). Bars represent mean values of three replicates ± SE. Bars represented by different letters are significantly different at *P* ≤ 0.05 as tested by LSD method with SAS software (SAS Institute Inc.).



*Tiller height estimates were determined for each plant by taking the mean of the five tallest tillers within each biological replicate. The fresh and dry biomass measurements were obtained from aboveground plant biomass harvested at similar growth stages. Values are means of three biological replicates ± SE (n = 3). Values represented by different letters are significantly different at P ≤ 0.05 as tested by LSD method with SAS software (SAS Institute Inc.).*

### **TABLE 3 | Sugar release by enzymatic hydrolysis in transgenic and nontransgenic control (WT) lines**.


*All data are means ± SE (n* = *3). CWR, cell wall residues. Values represented by different letters are significantly different at P ≤ 0.05 as tested by LSD method with SAS software (SAS Institute Inc.).*

## **Discussion**

### **Significance of AP2/ERF TFs for Improvement of Bioenergy Crops**

AP2/ERF TFs constitute one of the largest protein superfamilies in plants. These TFs regulate a wide array of developmental and growth processes and are therefore attractive targets for crop genetic engineering and breeding (Licausi et al., 2013; Bhatia and Bosch, 2014). Numerous TFs belonging to this superfamily have been characterized in various plant species, and their potential biotechnological applications in crop improvement have focused primarily on biotic and abiotic stress tolerance (Xu et al., 2011; Licausi et al., 2013; Hoang et al., 2014). However, less effort has been made to utilize this potential for the genetic improvement of bioenergy feedstocks such as switchgrass (Bhatia and Bosch, 2014). We found this lack of development surprising, since these TFs are variably associated with plant growth and cell wall biosynthesis, which relate directly to the two most important traits of a bioenergy crop such as switchgrass: biomass and cell wall recalcitrance.

### **Sequence-Based Classification of Putative AP2/ERF TFs in Switchgrass**

With this in mind, we conducted a whole-genome search for putative switchgrass AP2/ERF superfamily TFs and found 207 members (**Figure 1**; **Table 1**; Table S2 in Supplementary Material). Based on comparative genome analysis with the published results in rice, foxtail millet, and *Arabidopsis*, the identified proteins were classified into three families, namely AP2, RAV, and ERF, with the latter further divided into two subfamilies (ERF and DREB) (Nakano et al., 2006; Lata et al., 2014). The number of genes in the DREB subfamily found in switchgrass (55) was comparable with that of rice (56), *Arabidopsis* (57), and *Populus* (66). All three species, along with switchgrass, have a singleton in their genome. Consistent with the previous report in rice (Nakano et al., 2006), the switchgrass DREB and ERF subfamilies comprise four and seven groups, respectively. Moreover, based on comparative analysis of the AP2/ERF TFs between different plant species, it seems that group XI of the ERF subfamily is specific to monocots, while the Xb-L group was reported only in dicots (Nakano et al., 2006; Liu et al., 2013). In general, the relative distribution of genes within the different groups in each subfamily appears to be conserved between the three plant species (**Table 1**). Classification of the switchgrass AP2/ERF TFs into distinct groups was clearly supported by the amino acid sequence-based dendrogram of the identified proteins, suggesting robust evolutionary conservation of the superfamily among plant species.

### *In Silico* **Predicted Gene Functions and Subcellular Localization of AP2/ERF TFs in Switchgrass**

Consistent with the purported role of AP2/ERF proteins as transcriptional regulators of target genes (Magnani et al., 2004), GO analysis predicted that the majority of the switchgrass AP2/ERF genes have DNA-binding activity, in agreement with the previous observation in foxtail millet (Lata et al., 2014). Therefore, these genes might be associated with the regulation of various biosynthetic processes as well as responses to environmental stimuli, as previously demonstrated for numerous genes in other plant species (Xu et al., 2011; Mizoi et al., 2012; Licausi et al., 2013) (**Figures 5A,B**). The predicted subcellular localization of AP2/ERF superfamily proteins in switchgrass was mainly nuclear, as would be expected for transcriptional regulators, although some proteins were also predicted to localize to the plastids and/or mitochondria; this pattern was comparable to that reported in foxtail millet (**Figure 5C**) (Lata et al., 2014). Such multi-localization of the proteins could be attributed to post-translational modifications, protein folding, or interactions with other proteins (Karniely and Pines, 2005), and might serve to facilitate the coordinated regulation of the expression of nuclear and organellar genomes (Duchene and Giege, 2012).

### **Gene and Protein Sequence Diversity of Switchgrass AP2/ERF TFs**

The exon/intron structures of switchgrass *AP2/ERF* genes were analogous to those of foxtail millet (Lata et al., 2014), castor bean (Xu et al., 2013), rice, and *Arabidopsis* (Nakano et al., 2006). Consistent with these species, we observed a high diversity in the distribution of introns in AP2 genes versus a single or no intron in most genes in the ERF and RAV families (**Figures 2** and **4**). The pattern of intron distribution within the ORF and the intron splicing phases were highly conserved in genes within specific groups, as reported in castor bean (Xu et al., 2013). Consistent with the observation in rice, the majority of proteins in the groups or subfamilies of the switchgrass AP2/ERF superfamily could be distinguished by the presence of one or more diagnostic motifs located outside the AP2 domain region (Table S4 in Supplementary Material) (Rashid et al., 2012). This group- or subfamily-specific conservation in gene structures and protein motifs supported the accuracy of the predicted cluster relationships between the switchgrass AP2/ERF TFs.

AP2/ERF TFs that function as repressors or activators of specific target genes are distinguished by the presence of repression domains (RD), which are highly conserved, or activation domains, which are generally less conserved (Licausi et al., 2013). A characteristic motif of AP2/ERF transcriptional activators is the EDLL activation domain (Tiwari et al., 2012), while repressors carry distinctive RDs, namely the ERF-associated amphiphilic repression (EAR) motif (LxLxL or DLNxxP) (Kagale and Rozwadowski, 2011) and the B3 repression domain (BRD; R/KLFGV) motif (Ikeda and Ohme-Takagi, 2009). Analysis of the switchgrass AP2/ERF TF sequences indicated the presence of these motifs in many proteins (Table S4 in Supplementary Material). For instance, many genes in group IX of the ERF subfamily appear to be transcriptional activators due to the presence of motif M10, which is an EDLL-like motif. Moreover, this motif is rich in acidic amino acid residues, which has been suggested to be characteristic of transcriptional activators (Licausi et al., 2013). The majority of the ERF family TFs in group VIII and DREB family TFs in group I displayed a DLNxxP-like motif. Four TFs belonging to the AP2 family (PvERF204, PvERF205, PvERF206, and PvERF207) also displayed a similar EAR motif, while PvERF203 and PvERF207 harbor DLELSL- and NLDLS-like RDs, respectively. Similarly, switchgrass TFs in the RAV family displayed a unique repression domain, RLFGV (Ikeda and Ohme-Takagi, 2009). ERF subfamily TFs in groups VI and VI-L share a characteristic motif at the *N*-terminus (M18), also known as the cytokinin response factor (CRF) domain in *Arabidopsis*, which is also shared by rice ERF genes belonging to the same groups (Nakano et al., 2006). Genes containing the CRF domain (VI and VI-L) were shown to be responsive to cytokinin (Rashotte et al., 2006).
The distinguishing *N*-terminal motif in group VII ERF subfamily proteins, M25, was conserved in both *Arabidopsis* and rice as described previously (Nakano et al., 2006). This motif was shown to dictate the stability of proteins based on the level of oxygen via the *N*-end rule pathway (Dubouzet et al., 2003; Licausi et al., 2011). DREB genes in rice with the characteristic LWSY motif have been shown to function in the regulation of drought-, cold-, and salinity-responsive gene expression (Dubouzet et al., 2003). Switchgrass genes belonging to group III of the DREB subfamily (*PvERF133*, *PvERF134*, *PvERF135*, *PvERF136*, *PvERF137*, *PvERF139*, *PvERF140*, *PvERF141*, *PvERF142*, *PvERF143*, *PvERF145*, and *PvERF146*) displayed the conserved LWSY motif (M21) at the C-terminus and thus may play similar roles. No information is available in the literature on some of the conserved motifs identified here, including M8, M13, M14, M15, M17, and M24 (Table S4 in Supplementary Material), which might potentially be specific to switchgrass.
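The repression motifs named above are written as short consensus patterns in which "x" stands for any residue, so a first-pass scan of a protein sequence can be sketched with regular expressions. The following is a minimal illustration only, not the motif-discovery procedure used in this study: the pattern names, the function name, and the toy peptide are hypothetical, and only the consensus strings (LxLxL, DLNxxP, R/KLFGV) come from the text.

```python
import re

# Consensus patterns as written in the text; "x" (any residue) becomes ".".
# The dictionary keys are illustrative labels, not established identifiers.
MOTIF_PATTERNS = {
    "EAR_LxLxL": re.compile(r"L.L.L"),
    "EAR_DLNxxP": re.compile(r"DLN..P"),
    "BRD_R/KLFGV": re.compile(r"[RK]LFGV"),
}


def find_repression_motifs(protein_seq):
    """Return {motif_name: [(1-based start, matched residues), ...]}.

    re.finditer reports non-overlapping matches only, which is enough
    for a first-pass screen.
    """
    seq = protein_seq.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        hits[name] = [(m.start() + 1, m.group()) for m in pattern.finditer(seq)]
    return hits


# Hypothetical toy peptide, NOT a real PvERF sequence.
toy_peptide = "MSDLNFAPGGRLFGVTTDLELSLQ"
motif_hits = find_repression_motifs(toy_peptide)
for name, found in motif_hits.items():
    print(name, found)
```

A hit from such a scan is only a candidate: short patterns like LxLxL occur by chance in many proteins, so any match would still need to be checked against motif position and group membership.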

### **Diverse Expression Profiles of Switchgrass AP2/ERF TFs and Functional Implications**

Differential expression of genes according to developmental stages and tissue or organ types may provide an insight into the specialized biological processes that are taking place in specific plant parts (Cassan-Wang et al., 2013; Zhang et al., 2013). The observed pattern of expression for the majority of switchgrass AP2/ERF genes at different stages of plant development as well as in different tissues/organ types highlights the significance of these genes in the regulation of various plant growth and developmental processes at the specific stages (**Figures 6** and **7**; Figures S1 and S2 in Supplementary Material). One of the notable observations in this study is the association of the expression of numerous genes with tissues/organs undergoing lignification or secondary cell wall development/modification, suggesting that these genes may have an intrinsic association with the regulatory machinery of cell wall formation/lignification, a role that is less well characterized than their roles in stress response (Licausi et al., 2013; Bhatia and Bosch, 2014). Activation of genes responsible for cell wall modification has already been reported to be key during the initiation of seed germination in barley (Sreenivasulu et al., 2008; An and Lin, 2011). In agreement with this, we report here the transcriptional upregulation of ERF (*PvERF057*, *PvERF068*, *PvERF088*, and *PvERF119*), DREB (*PvERF101*, *PvERF102*, and *PvERF148*), and AP2 genes (*PvERF193*, *PvERF201*, and *PvERF204*) during the initiation of seed germination as well as in vascular bundles and internode sections. Moreover, the observed robust expression of 14 DREB, 17 ERF, and 3 AP2 genes in tissues or organs undergoing active lignification (vascular bundles, top or middle internode sections, and roots) but less robust expression in less lignified tissues (leaves) also supports this assertion (**Figure 7**).
It should also be noted that the transcript levels of several of these genes showed a relative increase with the developmental stage of the plants (**Figure 7**; Figure S1 in Supplementary Material) while exhibiting only marginal expression in less lignified tissues such as inflorescence meristem and germinating seedlings (**Figure 6**; Figure S2 in Supplementary Material). Differential gene expression profiling between elongating and non-elongating internodes in maize identified a total of seven AP2/ERF TFs that are highly expressed in non-elongating internodes undergoing secondary wall development, suggesting that these genes may be involved in the regulation of secondary cell wall formation (Bosch et al., 2011). Moreover, recent studies in *Arabidopsis* and rice identified several putative secondary cell wall-related AP2/ERF TFs based on preferential expression in secondary cell wall-related tissues and coexpression analysis (Cassan-Wang et al., 2013; Hirano et al., 2013a; Bhatia and Bosch, 2014). Some of the switchgrass genes identified in this study (PvERF037, PvERF115, PvERF116, PvERF143, PvERF148, and PvERF164) appear to be putative homologs of maize, rice, and *Arabidopsis* genes identified in the aforementioned studies. Overexpression of *Populus* ERF genes in wood-forming tissues of hybrid aspen was recently shown to result in modified stem growth (including increased stem diameter following the overexpression of five different ERF genes), reduced lignification, and enhanced carbohydrate (cellulose) content in the wood of transgenic lines, hinting that these TFs may indeed interact with the transcriptional machinery regulating cell wall biosynthesis (Vahala et al., 2013). Further supporting evidence comes from a recent study suggesting that an ERF TF from loquat fruit (*Eriobotrya japonica*), EjAP2-1, is an indirect transcriptional repressor of lignin biosynthesis via interaction with EjMYB1 TFs (Zeng et al., 2015).

### **Overexpression of** *PvERF001* **Improved Biomass Productivity and Sugar Release Efficiency in Switchgrass**

Based on global gene coexpression analysis, the rice homolog of AtSHN2, OsSHN (OsERF057), was proposed to have a native association with cell wall regulatory and biosynthetic pathways, yet this was not experimentally verified (Ambavaram et al., 2011). In this study, we investigated whether PvERF001, the closest putative switchgrass homolog of these genes based on clustering, sequence alignment analysis, and the sharing of conserved motifs (mm and cm) specific to the *Arabidopsis* SHN clade of TFs and the rice SHN, may participate in the regulation of cell wall biosynthesis (**Figure 8**). Our results suggest that PvERF001 may not be directly involved in the regulation of cell wall biosynthesis, though its transgenic overexpression resulted in increased sugar release efficiency (Figure S7 in Supplementary Material; **Table 3**). The observed reduction in relative expression of some lignin biosynthetic genes and their transcriptional regulators in switchgrass resembles the results in rice overexpressing *AtSHN2*; nevertheless, no significant changes in lignin content and composition were detected in transgenic switchgrass, in contrast to the reduced lignin content observed in rice overexpressing *AtSHN2* (Ambavaram et al., 2011) (Figures S7, S8, and S9A in Supplementary Material). The increased sugar release might be attributed to altered storage carbohydrates such as starches, as recently reported in *Arabidopsis*, where ectopic expression of a rice ERF TF gene (SUB1A-1) resulted in improved enzymatic saccharification efficiency via increased levels of starch (Nunez-Lopez et al., 2015). Similar results were obtained from overexpression of maize *corngrass1* microRNA in switchgrass (Chuck et al., 2011). However, whether PvERF001 is associated with starch biosynthesis remains to be determined.
Moreover, in contrast to the previous report in which heterologous expression of *AtSHN2* in rice did not significantly affect the growth characteristics of transgenic lines (Ambavaram et al., 2011), overexpression of *PvERF001* resulted in increased plant growth, including plant height, stem diameter, and aboveground biomass weight, in transgenic lines (**Table 2**). The discrepancy in lignin content and biomass productivity traits between AtSHN2 and PvERF001 may indicate differences in functional specialization between the two genes in monocots and dicots, even though sequence analysis seems to suggest that they might be homologs. The fact that overexpression of *AtSHN* genes in *Arabidopsis* instead showed association with the regulation of wax, cutin, and pectin biosynthesis supports this assertion (Aharoni et al., 2004; Shi et al., 2011). Moreover, a recent study showed that the homolog of *Arabidopsis SHN* genes in tomato (*SlERF52*) was expressed mainly in the abscission zone and was functionally associated with the regulation of pedicel abscission zone-specific transcription of genes including cell wall hydrolytic enzymes (polygalacturonase and cellulase) required for abscission (Nakano et al., 2014). These differences in expression pattern and function may suggest functional divergence between *SlERF52* and its *Arabidopsis* homologs. Functional divergence between homologous TFs in monocots and dicots has also been reported in previous studies involving the homologs of AtMYB58/63, known activators of lignin biosynthesis whose homologs did not appear to play similar roles in rice (Hirano et al., 2013b).

A recent study involving overexpression of the rice homolog of *AtSHN2*, *OsSHN*, in rice showed enhanced tolerance of transgenic plants to water deprivation and association of the gene with the regulation of wax and cutin biosynthesis; the gene was hence named rice wax synthesis regulatory gene (OsWR2) (Zhou et al., 2014). The closest homolog of this gene, OsERF012 (OsWR1), was also shown to be induced by drought stress and involved in the regulation of wax synthesis (Wang et al., 2012). Therefore, we examined whether PvERF001 might be involved in the regulation of wax and cutin biosynthesis. Consistent with previous studies in rice, a relative increase in leaf water retention capacity was detected in transgenic plants, though the effect on the expression of wax and cutin biosynthetic genes was minimal (Figures S5 and S6 in Supplementary Material). One possible explanation for the observed differences between overexpression of the rice and switchgrass homologs is functional divergence of the switchgrass genes following gene duplication. This may explain the discrepancy between transgenic rice overexpressing rice *SHN* (*OsWR2*), which exhibited reduced plant height but an increased number of tillers (Zhou et al., 2014), and transgenic switchgrass overexpressing *PvERF001*, which showed increased plant height but no difference in the number of tillers. This suggests that ERF genes might be functionally highly diversified and that PvERF001 may be part of a different pathway than we anticipated, such as the regulation of responses to biotic stress or other abiotic stresses, or the regulation of cell elongation or division in coordination with the cytokinin pathway, with the latter perhaps explaining the observed increase in biomass and vegetative growth in transgenic lines.

In summary, the expression profiling of the switchgrass AP2/ERF genes provides baseline information as to the putative roles of these genes and is thus a useful resource for future reverse genetic studies to characterize genes for economically important bioenergy crops. With the current advancements in switchgrass research and the establishment of an efficient transformation system, this inventory of genes, along with the information provided here, could facilitate our understanding of the functional roles of AP2/ERF TFs in plant growth and development. Furthermore, it would aid in the identification of potential target genes that may be used to improve stress adaptation, plant productivity, and sugar release efficiency in bioenergy feedstocks such as switchgrass. The increased biomass yield and sugar release efficiency from overexpressing *PvERF001* highlight the potential of these TFs for improvement of bioenergy feedstocks.

## **Author Contributions**

WW designed and performed the experiments, analyzed the data, and wrote the manuscript. MM participated in experimental design and data analysis, and assisted with revisions to the manuscript and coordination of the study. GT, RS, SD, and MD assisted with performing lignin and sugar release assays and contributed to revision of the manuscript. CS conceived the study and its design and coordination, and assisted with revisions to the manuscript. All authors read and consented to the final version of the manuscript.

## **Acknowledgments**

We thank Angela Ziebell, Erica Gjersing, Crissa Doeppke, and Melvin Tucker for their assistance with the cell wall characterization and Susan Holladay for her assistance with data entry into LIMS. We thank Wayne Parrot for providing the switchgrass SA1 clone. This work was supported by funding from the BioEnergy Science Center (DE-PS02-06ER64304). The BioEnergy Science Center is a U.S. Department of Energy Bioenergy Research Center supported by the Office of Biological and Environmental Research in the DOE Office of Science. We also thank Tennessee Agricultural Experiment Station for providing partial financial support for WW.

## **References**


## **Supplementary Material**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00101




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Wuddineh, Mazarei, Turner, Sykes, Decker, Davis and Stewart. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Phenotypic Changes in Transgenic Tobacco Plants Overexpressing Vacuole-Targeted** *Thermotoga maritima* **BglB Related to Elevated Levels of Liberated Hormones**

*Quynh Anh Nguyen<sup>1</sup>, Dae-Seok Lee<sup>2</sup>, Jakyun Jung<sup>1</sup> and Hyeun-Jong Bae<sup>1,2</sup>\**

*<sup>1</sup> Department of Bioenergy Science and Technology, Chonnam National University, Gwangju, South Korea, <sup>2</sup> Bio-Energy Research Center, Chonnam National University, Gwangju, South Korea*

### *Edited by:*

*Robert Henry, The University of Queensland, Australia*

### *Reviewed by:*

*Tianju Chen, Chinese Academy of Sciences, China Chiranjeevi Thulluri, Jawaharlal Nehru Technological University Hyderabad, India*

> *\*Correspondence: Hyeun-Jong Bae baehj@chonnam.ac.kr*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 18 August 2015 Accepted: 26 October 2015 Published: 09 November 2015*

### *Citation:*

*Nguyen QA, Lee D-S, Jung J and Bae H-J (2015) Phenotypic Changes in Transgenic Tobacco Plants Overexpressing Vacuole-Targeted Thermotoga maritima BglB Related to Elevated Levels of Liberated Hormones. Front. Bioeng. Biotechnol. 3:181. doi: 10.3389/fbioe.2015.00181*

The hyperthermostable β-glucosidase BglB of *Thermotoga maritima* was modified by adding a short tetrapeptide (AFVY, which transports phaseolin to the vacuole) to its C-terminal sequence. The modified β-glucosidase *BglB* was transformed into tobacco (*Nicotiana tabacum* L.) plants. We observed a range of significant phenotypic changes in the transgenic plants compared to the wild-type (WT) plants. The transgenic plants had faster stem growth, earlier flowering, enhanced root system development, an increased biomass biosynthesis rate, and higher salt stress tolerance in young plants compared to WT. In addition, programed cell death was enhanced in mature plants. Furthermore, the C-terminal AFVY tetrapeptide efficiently sorted *T. maritima* BglB into the vacuole, where it was maintained in an active form and could perform its glycoside hydrolysis function on hormone conjugates, leading to elevated hormone [abscisic acid (ABA), indole 3-acetic acid (IAA), and cytokinin] levels that likely contributed to the phenotypic changes in the transgenic plants. The elevation of cytokinin led to upregulation of the transcription factor *WUSCHEL*, a homeodomain factor that regulates the development, division, and reproduction of stem cells in the shoot apical meristems. Elevation of IAA led to enhanced root development, and the elevation of ABA contributed to enhanced tolerance to salt stress and programed cell death. These results suggest that overexpressing vacuole-targeted *T. maritima* BglB may have several advantages for molecular farming technology to improve multiple targets, including enhanced production of the β-glucosidase BglB, increased biomass, and shortened developmental stages, that could play pivotal roles in bioenergy and biofuel production.

**Keywords:** *Thermotoga maritima,* **hyperthermostable** β**-glucosidase BglB, C-terminal AFVY tetrapeptide, vacuole-targeted, hormone conjugates, shoot apical meristem**

## **INTRODUCTION**

β-glucosidase is critical for many developmental processes in plants, and the hydrolysis of phytohormone conjugates is one of its most important roles (Schliemann, 1984; Sembdner et al., 1994; Kleczkowski et al., 1995). The rolC gene of the bacterial pathogen *Agrobacterium rhizogenes* encodes a β-glucosidase, and results in abnormal development when transformed into plants. In particular, heterologous β-glucosidase can release active forms of phytohormones from their inactive conjugates, which are joined by glycosidic linkages (Spena et al., 1992; Brzobohaty et al., 1993). Inactive conjugates of each phytohormone can be found abundantly in plant tissues. Their active forms are liberated via β-glucosidase-mediated hydrolysis. Furthermore, many studies have revealed that the inactive forms of phytohormone conjugates act as reversibly deactivated storage molecules and are important for the regulation of physiologically active hormone levels; however, their normal biological functions remain unknown (Staswick, 2009; Piotrowska and Bajguz, 2011).

The vacuole is considered a storage organelle and is an important component of the secretory pathway in plants. Detailed knowledge of the mechanisms that sort proteins out of and into the vacuole is lacking (Hall, 2000; Vitale and Hinz, 2005). Previous studies of several lytic enzymes that are specifically targeted to the vacuole (e.g., phaseolin) have revealed some sorting signals (N-terminal or C-terminal polypeptides or internal sequences) that can sort proteins into the vacuole (Frigerio et al., 2001; De Marcos Lousa et al., 2012). Unfortunately, because the internal environment of the vacuole leads to the rapid degradation and hydrolysis of proteins and other compounds, it has been difficult to determine whether heterologously expressed proteins maintain their functions and features inside the vacuole.

We previously expressed the hyperthermostable β-glucosidase BglB of *T. maritima* in tobacco plants to obtain transgenic plants for application in bioconversion. The optimal temperature and pH of the plant-expressed BglB were 80°C and 4.5, respectively (Jung et al., 2010). Moreover, we also observed some phenotypic modifications, such as longer stems, larger leaves, and shortened developmental stages (Jung et al., 2010, 2013), which we hypothesized may have been due to changes in hormone homeostasis. Therefore, in the present study, we overexpressed heterologous BglB of *T. maritima* in tobacco plants and targeted it to the vacuole by insertion of the AFVY tetrapeptide, to examine whether BglB maintains its function of hydrolyzing glycoside bonds to release free hormones from their conjugates, and to determine how such changes in hormone levels may affect the growth and development of transgenic plants. All of the changes in the aboveground or belowground organs in plants can be explained via the development, division, and reproduction of stem cells harbored in the shoot and root apical meristems, which are regulated by the expression of homeodomain genes and hormone levels. For example, in the shoot apical meristem, the transcription factor *WUSCHEL (WUS)* can be upregulated via cytokinin, and a group of dividing cells called the quiescent center (QC) is upregulated by indole 3-acetic acid (IAA) in the root apical meristem (Kerk et al., 2000; Overvoorde et al., 2010; Yadav et al., 2010; Zhao et al., 2010).

## **MATERIALS AND METHODS**

### **Vector Constructions, Plant Transformation, and Molecular Analysis**

For cytosol expression, the full-length sequence of the *T. maritima* β-glucosidase *BglB* gene (Jung et al., 2010) was constructed under control of the 35S promoter, and named Cyt-BglB (CB). For vacuole targeting, *BglB* was modified by replacing its stop codon with nucleotide sequences encoding the AFVY signal tetrapeptide from the vacuolar storage glycoprotein phaseolin (Frigerio et al., 2001), with a stop codon inserted at the end, and named Vac-BglB (VB). According to previous studies, AFVY tetrapeptide signals are sufficient to target a heterologous protein to the vacuole (Frigerio et al., 2001; Lau et al., 2010). The 35S promoter was also used for vacuole targeting of the recombinant variants. These expression cassettes were then sub-cloned into the modified multiple cloning sites of the binary vector pCambia 2300 (Kim et al., 2010), as shown in **Figure 1A**. *Agrobacterium tumefaciens* strain GV3013 was used for transformation of tobacco (*Nicotiana tabacum* L.) via the leaf-disk method (Helmer et al., 1984). Transformed shoots were selected on solid Murashige–Skoog (MS) medium (Murashige and Skoog, 1962) containing 100 μg/ml kanamycin and 500 μg/ml cefotaxime. Transgenic tobacco plants were grown in a growth chamber under a 16-/8-h light/dark cycle at 25 ± 3°C. After the presence of the transgene was confirmed by genomic DNA polymerase chain reaction (PCR), reverse transcription (RT)-PCR, and Western blotting, the T<sub>0</sub> generation of transgenic and wild-type (WT) plants was moved to a greenhouse for development.
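The VB construct described above amounts to a simple sequence edit: remove the stop codon of the coding sequence, append codons for AFVY, and close the reading frame with a new stop codon. The sketch below illustrates that edit under stated assumptions. The AFVY codons used (GCA TTC GTA TAC) are those implied by the VB-specific reverse primer reported in the molecular analysis; the choice of TAA as the new stop codon, the function name, and the toy coding sequence are hypothetical and are not taken from the study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Codons for Ala-Phe-Val-Tyr as implied by the VB reverse primer in the text.
AFVY_CODONS = "GCATTCGTATAC"


def append_c_terminal_signal(cds, signal=AFVY_CODONS, stop="TAA"):
    """Replace the stop codon of `cds` with `signal` followed by `stop`.

    `stop` defaults to TAA purely for illustration; the stop codon used
    in the actual construct is not given in the text.
    """
    cds = cds.replace(" ", "").upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    if cds[-3:] not in STOP_CODONS:
        raise ValueError("coding sequence does not end with a stop codon")
    if len(signal) % 3 != 0:
        raise ValueError("signal length is not a multiple of 3")
    if stop not in STOP_CODONS:
        raise ValueError("stop must be one of TAA, TAG, TGA")
    return cds[:-3] + signal + stop


# Hypothetical three-codon CDS (Met-Lys-stop), NOT the BglB sequence.
toy_cds = "ATG AAA TGA"
toy_vb = append_c_terminal_signal(toy_cds)
print(toy_vb)
```

Validating the reading frame before editing guards against the most common mistake in this kind of fusion, a frameshift that would change or truncate the appended signal.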

Total genomic DNA was isolated from the leaves of T<sub>0</sub> generation transgenic plants using genomic DNA extraction buffer [200 mM Tris–HCl, 250 mM NaCl, 25 mM Na<sub>2</sub>-EDTA, 0.5% sodium dodecyl sulfate (SDS)]. The concentration of genomic DNA was measured using a NanoDrop spectrophotometer (Thermo Fisher Scientific, USA). To confirm the presence of *BglB*, PCR of genomic DNA was performed using two sets of flanking primers: the first set, FP 5′-GTC GCT CAT CAC GAA ACC GT-3′ and RP 5′-ACT ACA GAG GAA AAG GTG AA-3′, for checking the presence of a 0.7-kb sequence within *BglB* in the CB and VB constructs, and the second set, FP 5′-TAT GCA GGC TCC CAC CCC TT-3′ and RP 5′-GTA TAC GAA TGC ACT ACA GA-3′, for checking the presence of a 0.4-kb sequence spanning *BglB* and the nucleotide sequence encoding the AFVY tetrapeptide in the VB constructs. For RT-PCR, total RNA was extracted from the leaf tissues and cDNA was synthesized using avian myeloblastosis virus (AMV) reverse transcriptase (Promega, USA) with random hexamers, and the RT-PCR of *BglB* was performed using the first primer set mentioned above. RT-PCR was also performed to determine the expression level of *N. tabacum WUS* (JQ686923.1) after RNA was extracted from the stems of seedlings and cDNA was synthesized, with the specific primers FP 5′-ATG CAC ATG AGA GGT GTT TG-3′ and RP 5′-TTA AGG GGA ATT AGG AGA TC-3′.
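The reverse primer of the second set is described as covering the AFVY-encoding sequence, and this can be checked directly from the primer itself: its reverse complement should end with codons for Ala-Phe-Val-Tyr. The sketch below performs that check. The primer sequence is taken from the text; the helper names and the deliberately minimal four-codon lookup are illustrative assumptions, and a complete codon table would be needed for any other sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Only the four codons needed for this check (standard genetic code);
# any other codon is reported as "?".
CODON_TO_AA = {"GCA": "A", "TTC": "F", "GTA": "V", "TAC": "Y"}


def reverse_complement(dna):
    """Reverse-complement a DNA string, ignoring spaces between triplets."""
    cleaned = dna.replace(" ", "").upper()
    return cleaned.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons using the minimal lookup table above."""
    usable = len(dna) - len(dna) % 3
    codons = [dna[i:i + 3] for i in range(0, usable, 3)]
    return "".join(CODON_TO_AA.get(codon, "?") for codon in codons)


# VB-specific reverse primer as printed in the text (5' to 3').
vb_reverse_primer = "GTA TAC GAA TGC ACT ACA GA"

sense_strand = reverse_complement(vb_reverse_primer)
signal_codons = sense_strand[-12:]  # last four codons on the coding strand
print(sense_strand, signal_codons, translate(signal_codons))
```

Because this primer anneals to the appended signal sequence, it should amplify only the VB construct, which matches its stated use for checking VB but not CB.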

BglB proteins were extracted from the T<sub>0</sub> to T<sub>3</sub> generation transgenic tobacco leaves by grinding the leaf material to a powder in liquid nitrogen and then suspending the powder in protein extraction buffer at pH 8.0 (50 mM Tris–HCl, 5 mM Na<sub>2</sub>-EDTA, 20 mM Na<sub>2</sub>S<sub>2</sub>O<sub>5</sub>·5H<sub>2</sub>O, 100 mM KCl, 5% glycerol, 1% β-mercaptoethanol). Leaf debris was removed by centrifugation at 13,000 × *g* for 20 min at 4°C. Total soluble protein (TSP) in the supernatants was measured using the Bradford method (Bradford, 1976). For Western blotting, 10 μg of protein was electrophoresed on 12% polyacrylamide gels and transferred to polyvinylidene fluoride membranes (Immobilon-P; Millipore) using transfer buffer (39 mM glycine, 48 mM Tris, 10% SDS, 20% methanol). The membrane was blocked by incubation with 5% skimmed milk (Difco, USA) in phosphate-buffered saline at pH 7.0 (1 mM KH<sub>2</sub>PO<sub>4</sub>, 10 mM Na<sub>2</sub>HPO<sub>4</sub>·12H<sub>2</sub>O, 137 mM NaCl, 2.7 mM KCl), and then incubated with a polyclonal anti-β-glucosidase antibody as the primary antibody. Alkaline phosphatase-conjugated goat anti-rabbit IgG antibody (Promega, USA) was used at a 1:2500 dilution as the secondary antibody. To detect vacuole-targeted β-glucosidase BglB, total protein from isolated vacuoles was subjected to Western blotting as described above, except that a fluorophore-conjugated anti-rabbit IgG (H + L) secondary antibody (DyLight™ 680 conjugate; red) was used at a 1:2500 dilution.

## **Growth Conditions, Sampling, and Phenotypic Observation**

After the presence of the transgene was confirmed, T<sup>0</sup> generation transgenic and WT plants were moved to the greenhouse for development. Seeds from the T<sup>0</sup> generation transgenic plants were sown on MS medium containing kanamycin and grown in a growth chamber under a 16-/8-h light/dark cycle at 25 ± 3°C to produce the T<sup>1</sup> generation, and the germination day was recorded. After 2 weeks, seedlings from ten lines of the CB and VB transgenic plants were used to determine β-glucosidase activity. A 100 mg sample of ground leaf powder was used to extract TSP (Jung et al., 2010), and after TSP was quantified by the Bradford method, an amount of extract equivalent to 10 μg of TSP was used to assay β-glucosidase enzymatic activity with p-nitrophenyl β-D-glucopyranoside (pNPG) as the substrate. One unit of β-glucosidase is defined as the amount of enzyme that released 1 mmol of p-nitrophenol from the pNPG substrate under the assay conditions described below. The assay mixture containing 10 mM pNPG in citrate–phosphate buffer (pH 4.5) was incubated with the enzyme for 30 min at 70°C in a total volume of 1 ml. The reaction was stopped by adding 1 M Na2CO3, and absorbance was measured at 405 nm. Based on these results, we selected the three lines from each of the CB and VB transgenic plants that showed the highest β-glucosidase enzymatic activity in crude extracts. Thirty plants from each of the chosen lines were grown in a growth chamber and then moved to 10-l pots containing a soil–perlite mixture in a greenhouse at 25 ± 3°C under a 16-/8-h light/dark photoperiod and a light intensity of 100 μmol m<sup>−2</sup> s<sup>−1</sup> for further analysis.
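The conversion from an absorbance reading at 405 nm to an amount of released p-nitrophenol can be sketched with the Beer-Lambert law. This is a minimal illustration and not the authors' calculation: the function names are hypothetical, and the molar extinction coefficient is left as a parameter because it depends on pH and instrument and is normally obtained from a p-nitrophenol standard curve. The value 18,000 M⁻¹ cm⁻¹ used in the usage lines is only a placeholder.

```python
def pnp_released_nmol(a405, blank_a405, epsilon_per_M_cm, path_cm, volume_ml):
    """Amount of p-nitrophenol (nmol) in the measured volume.

    Beer-Lambert: concentration (M) = absorbance / (epsilon * path length).
    epsilon_per_M_cm must come from the user's own calibration.
    """
    if epsilon_per_M_cm <= 0 or path_cm <= 0 or volume_ml <= 0:
        raise ValueError("epsilon, path length and volume must be positive")
    conc_molar = (a405 - blank_a405) / (epsilon_per_M_cm * path_cm)
    moles = conc_molar * (volume_ml / 1000.0)  # ml -> l
    return moles * 1e9  # mol -> nmol


def release_per_ug_tsp(pnp_nmol, tsp_ug):
    """Normalize released p-nitrophenol to the TSP loaded in the assay."""
    if tsp_ug <= 0:
        raise ValueError("TSP amount must be positive")
    return pnp_nmol / tsp_ug


if __name__ == "__main__":
    # Placeholder readings; 10 ug TSP and 1 ml follow the assay described above.
    nmol = pnp_released_nmol(0.95, 0.05, 18000.0, 1.0, 1.0)
    print(f"{nmol:.1f} nmol pNP, {release_per_ug_tsp(nmol, 10.0):.2f} nmol per ug TSP")
```

If the stop solution changes the final volume, the measured volume (rather than the 1 ml reaction volume) should be passed as `volume_ml`.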

The fifth leaf from the top of three plants of each chosen transgenic line and of the WT plants was harvested at the same time, every 20 days from 30 to 90 days after germination (DAG), stored at −70°C, and ground in liquid N<sup>2</sup> to analyze β-glucosidase enzymatic activity.

The phenotypic characteristics of the transgenic and WT plants were also recorded from the T<sup>1</sup> to T<sup>3</sup> generation, including stem height, number of leaves, root length, number of lateral roots, time from germination to initial flowering, and dry weight (the leaves, stems, and roots of non-sampled plants were separately harvested and freeze-dried after the seeds were harvested). The carbohydrate content of each part of the plant was also determined using gas chromatography (Coleman et al., 2009).

To conduct a salt stress tolerance experiment in the growth chamber, seedlings were transferred after germination to new MS medium (without sucrose) containing 200 mM NaCl, and phenotypic characteristics were measured at 15 DAG. Mature (80 DAG) transgenic and WT plants grown in the greenhouse were also used for a salt stress experiment, in which the plants were watered with equal volumes of 200 mM NaCl for 10 days. The weight of the pots (including pot, soil, and plant) was measured before and after the 10-day NaCl treatment. At the same time, leaves from the same position (the 10th leaf counted upward from the ground) were harvested and ground in liquid N2; chlorophyll was extracted with 90% ethanol, the extract was boiled for 5 min, and absorbance was measured at 620 nm to calculate the chlorophyll concentration (Lichtenthaler and Wellburn, 1983).

## **Phytohormone Extraction and Measurement**

The phytohormones [including abscisic acid (ABA), IAA, and cytokinin] from young leaves or seedlings of CB, VB, and WT plants were extracted with 80% methanol (Oliver et al., 2007) and measured using Phytodetek competitive enzyme-linked immunosorbent assay (ELISA) kits (Agdia; Elkhart, IN, USA). Briefly, young leaves or seedlings of transgenic and WT plants were harvested, ground in liquid N<sup>2</sup>, and stored at −70°C for further analysis. One gram of ground powder was mixed with 1 ml of 80% methanol and incubated overnight at −4°C. One milliliter (ml) of the supernatant was collected after centrifuging at 13,000 rpm for 10 min to remove debris, and then freeze-dried. The freeze-dried powder was used to measure the levels of each hormone, according to the kit protocols. Each measurement was conducted in triplicate.
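In a competitive ELISA the signal falls as the hormone concentration rises, so sample concentrations are read from a standard curve. One common way to linearize such a curve is the logit-log transform, and the sketch below assumes that approach. It should not be read as the calculation prescribed by the kit protocol or used by the authors; all function names and example values are hypothetical.

```python
import math


def logit(fraction_bound):
    """logit(B/B0) for a bound fraction strictly between 0 and 1."""
    if not 0.0 < fraction_bound < 1.0:
        raise ValueError("fraction bound must lie strictly between 0 and 1")
    return math.log(fraction_bound / (1.0 - fraction_bound))


def fit_standard_curve(concentrations, fractions_bound):
    """Least-squares fit of logit(B/B0) = intercept + slope * ln(concentration)."""
    xs = [math.log(c) for c in concentrations]
    ys = [logit(f) for f in fractions_bound]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_fraction_bound(concentration, slope, intercept):
    """Forward model: bound fraction expected at a given concentration."""
    z = intercept + slope * math.log(concentration)
    return 1.0 / (1.0 + math.exp(-z))


def estimate_concentration(fraction_bound, slope, intercept):
    """Invert the fitted line to estimate a sample concentration."""
    return math.exp((logit(fraction_bound) - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical standards (arbitrary concentration units) and bound fractions.
    standards = [1.0, 10.0, 100.0, 1000.0]
    bound = [0.88, 0.50, 0.12, 0.02]
    slope, intercept = fit_standard_curve(standards, bound)
    print(f"sample at B/B0 = 0.30: {estimate_concentration(0.30, slope, intercept):.1f}")
```

Samples whose bound fraction falls outside the range covered by the standards should be diluted and re-measured rather than extrapolated.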

## **Vacuole Isolation**

For purification of vacuole-targeted BglB, transformed protoplasts from young plants (30 DAG) were isolated by digestion with cell-wall hydrolyzing enzymes and fractionated by ultracentrifugation according to Mettler and Leonard (1979) and Robert et al. (2007), with some modifications. Because highly purified vacuoles were required, the transformed protoplasts were loaded on top of step gradients consisting of 4, 7, 12, and 15% Ficoll, and centrifuged at 97,000 × *g* for 4 h. The vacuoles were recovered from the top layer of the gradient and then disrupted by sonication before β-glucosidase enzymatic activity was measured with 10 μg TSP.

## **RESULTS**

## β**-glucosidase Enzymatic Activity from Isolated Vacuoles and Total Hormone Levels were Significantly Higher in Transgenic than in WT Plants**

Hormone conjugates, which are found in each class of plant hormones, are mainly localized in plant vacuoles. The mechanisms controlling their transport across membranes and between plant organs remain unknown (Bajguz and Piotrowska, 2009). To analyze the effects of thermostable *T. maritima* BglB on phytohormone metabolism and the consequences for plant development, we built two BglB constructs. The CB construct was for ectopic expression of *BglB* in the cytosol, and the VB construct was for expression of vacuole-targeted BglB, under the control of the 35S promoter (**Figure 1A**). In total, 10 and 12 lines of CB and VB transgenic plants, respectively, were confirmed, and three lines of each were used for further analysis after confirmation of the transgenes by genomic DNA PCR, reverse transcription (RT)-PCR, and western blotting (**Figures 1B–E**). The presence of the nucleotide sequence encoding the AFVY tetrapeptide in the VB construct was confirmed using PCR with a reverse primer specific to the VB construct (**Figure 1C**), and the transcript of the heterologous *BglB* in the transgenic plants was confirmed by RT-PCR (**Figure 1D**). BglB in TSP from the CB and VB plants was detected by western blot at a molecular weight similar to that reported previously (Jung et al., 2010), with no difference between CB and VB plants (**Figure 1E**, upper panel). In isolated vacuoles, however, the level of BglB was significantly higher in VB than in CB plants (**Figure 1E**, lower panel), as indicated by the higher intensity of the BglB band in isolated vacuoles from the VB transgenic plants than in those from the CB transgenic and WT plants (**Figure 1F**). These results indicate that VB plants accumulated the highest level of vacuole-targeted heterologous BglB.

The three best-performing T<sup>0</sup> generation transgenic lines were selected according to their β-glucosidase enzymatic activity, and then self-pollinated. The β-glucosidase enzymatic activity was significantly higher in the transgenic plants than in WT plants over three generations (T<sup>1</sup> to T<sup>3</sup>; **Figure 2A**). A slight reduction of β-glucosidase enzymatic activity after a few generations was observed, possibly due to factors such as epigenetic silencing mechanisms (Iyer et al., 2000; Matzke et al., 2000). Heterologous *BglB* was also stably expressed in the transgenic plants, as indicated by the pattern of β-glucosidase enzymatic activity during plant development from 30 to 90 days after germination (DAG) of the CB1 and VB9 transgenic plants (**Figure 2B**).

To examine the efficiency of vacuole targeting, we isolated vacuoles from the WT plants and from the CB and VB transgenic lines of the T<sup>1</sup> generation. The isolated vacuoles of the VB transgenic lines had the highest β-glucosidase enzymatic activity, compared to the CB and WT plants (**Figure 2C**). In particular, compared to WT plants, increases in β-glucosidase enzymatic activity of 452 and 759% were recorded in the vacuoles of CB1 and VB2 transformants, respectively. Together with the western blot results described above (**Figure 1E**), these results indicate that the VB construct carrying the AFVY tetrapeptide effectively targeted β-glucosidase to the vacuole, and that BglB was still active in the vacuole.

**FIGURE 2 |** β**-glucosidase enzymatic activity and total hormone levels**. **(A)** Significantly higher β-glucosidase enzymatic activity of heterologously expressed BglB over three generations of CB and VB transgenic plants compared to WT plants. **(B)** Profile of β-glucosidase enzymatic activity of heterologously expressed BglB in CB and VB transgenic plants of the T<sup>1</sup> generation from 30 to 90 DAG, showing significantly higher β-glucosidase enzymatic activity in CB and VB transgenic plants than in WT plants. **(C)** Significantly higher β-glucosidase enzymatic activity of heterologously expressed BglB from isolated vacuoles, and total hormone levels, in CB and VB transgenic plants compared to WT plants at 14 DAG. Vacuoles were isolated after protoplast preparation from 14 DAG tobacco seedlings (WT and transgenic), disrupted by sonication for 5 min at 0.5 cycle/60% amplitude in protein extraction buffer at pH 8.0 (50 mM Tris–HCl, 5 mM Na2-EDTA, 20 mM Na2S2O5 × 5H2O, 100 mM KCl, 5% glycerol, 1% β-mercaptoethanol), and concentrated with a UFC 710008/Centricon Plus-70 centrifugal filter (EM Millipore, USA). Average values were calculated from triplicates (*n* = 3) of each transgenic line and WT plants. \*indicates significant differences from the control (WT) (*P <* 0.05).

Moreover, significantly higher total hormone (IAA, ABA, and cytokinin) levels were recorded in the transgenic plants than in the WT plants, based on the ELISA results, with the highest hormone levels obtained from the VB transformants (**Figure 2C**). In particular, the maximum increases of 268 and 463% in total extracted hormone levels in CB1 and VB3, respectively, relative to WT plants were attributed to higher levels of each hormone in the transgenic plants (Figure S1 in Supplementary Material).
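Percentages such as these can be read in two ways, as an increase over the WT value or as a level relative to WT, and the text does not state which convention was applied. The sketch below makes the two conventions explicit. It assumes, without confirmation from the source, that "increase" means (transgenic − WT)/WT × 100; the function names are hypothetical.

```python
def percent_increase(treated, control):
    """Increase over control, in percent: (treated - control) / control * 100."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (treated - control) / control * 100.0


def percent_of_control(treated, control):
    """Level relative to control, in percent: treated / control * 100."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return treated / control * 100.0


def fold_change_from_increase(increase_percent):
    """Fold change implied by a given percent increase."""
    return 1.0 + increase_percent / 100.0


if __name__ == "__main__":
    # Percent increases quoted in the text for total hormone levels.
    for line, increase in (("CB1", 268.0), ("VB3", 463.0)):
        fold = fold_change_from_increase(increase)
        print(f"{line}: {increase:.0f}% increase would correspond to {fold:.2f}-fold WT")
```

Under the "increase" reading, 268% corresponds to 3.68 times the WT level; under the "percent of control" reading, the same figure would correspond to 2.68 times the WT level.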

## **Pronounced Phenotypic Changes in the Transgenic Plants**

The transgenic CB and VB tobacco plants displayed pronounced phenotypic changes compared to WT plants. Phenotypic characteristics such as stem height, time from germination to initial flowering, and dry weight were proportional to the levels of β-glucosidase enzymatic activity of transgenic and WT plants, suggesting a correlation between the enhancement of β-glucosidase enzymatic activity and these phenotypic changes. In particular, faster development was observed in the transgenic plants than in the WT plants, as indicated by increased stem height, earlier flowering, increased biomass accumulation, and enhanced root system development (**Figure 3A**; Figure S2A in Supplementary Material). Moreover, a shorter time from germination to initial flowering was recorded in the T<sup>1</sup> generation of transgenic plants compared to WT plants (an average of 103.6 and 94.1 DAG in the CB and VB transgenic plants, respectively, compared to 141.7 DAG in WT plants; **Figure 3B**), and similar results were observed for the T<sup>2</sup> and T<sup>3</sup> generations (Figure S2B in Supplementary Material). Higher β-glucosidase enzymatic activity and total hormone levels were recorded at flowering time, with maximum increases in total hormone levels of 222 and 387% for CB1 and VB3 compared to WT plants, respectively (**Figure 3C**; Figure S2C in Supplementary Material), while no significant difference in β-glucosidase enzymatic activity between the CB and VB transgenic plants was observed (**Figure 3C**). These results indicate that more liberated hormones were released in the VB than in the CB transgenic plants. After the seeds were harvested, the stem height and the dry weight of total biomass were significantly higher in the mature transgenic plants than in the WT plants, with maximum increases of 133% for stem height (CB1 compared to WT plants) and 124% for total dry weight (VB9 compared to WT plants; **Figures 3D,E**). Similar results were obtained for the T<sup>2</sup> and T<sup>3</sup> generations (Figures S2D,E in Supplementary Material). These results indicate that the increase in liberated hormone levels (particularly IAA and cytokinin) contributed to increased biomass accumulation, despite the shortened growth cycle (earlier flowering after germination) in the transgenic plants. The same phenotypic characteristics were observed in previous studies that targeted β-glucosidase to either general or particular cellular compartments (Jung et al., 2010; Jin et al., 2011). However, despite the significant changes in biomass accumulation and the shortened growth cycle, no significant differences in carbohydrate content were observed between the transgenic and WT plants (**Table 1**), indicating that only total biomass accumulation was influenced in the transgenic plants.
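To put the flowering times in perspective, the short calculation below converts the averages reported above for the T<sub>1</sub> generation (103.6, 94.1, and 141.7 DAG) into days gained and the percentage by which the pre-flowering period was shortened relative to WT. The variable and function names are illustrative only.

```python
# Average days after germination (DAG) to initial flowering, as reported in the text.
WT_DAG = 141.7
FLOWERING_DAG = {"CB": 103.6, "VB": 94.1}


def days_earlier(transgenic_dag, wt_dag):
    """Number of days by which flowering was advanced relative to WT."""
    return wt_dag - transgenic_dag


def percent_shorter(transgenic_dag, wt_dag):
    """Reduction in time to flowering as a percentage of the WT value."""
    return 100.0 * (wt_dag - transgenic_dag) / wt_dag


if __name__ == "__main__":
    for line, dag in FLOWERING_DAG.items():
        print(
            f"{line}: {days_earlier(dag, WT_DAG):.1f} days earlier, "
            f"{percent_shorter(dag, WT_DAG):.1f}% shorter than WT"
        )
    # CB: 38.1 days earlier, 26.9% shorter than WT
    # VB: 47.6 days earlier, 33.6% shorter than WT
```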

individuals (*n* = 20) of each transgenic line and WT plants for **(B,D,E)**, and from triplicates (*n* = 3) of each transgenic line and WT plants for **(C)**. \*indicates significant differences from the control (WT) (*P <* 0.05).

## **Transgenic Plants Showed Faster Development of the Stem and Roots, Elevation of IAA and Cytokinin Levels, and Upregulation of WUS**

Based on the increase in stem height and enhanced root system development, which appeared to be correlated with increased hormone levels in the transgenic plants, we asked whether the increased levels of cytokinin and auxin would affect the development of stems and roots of transgenic seedlings. Seeds from the T<sup>1</sup> to T<sup>3</sup> generations, and from the WT plants, were sown on MS medium containing kanamycin. Immediately after the seeds germinated, the seedlings were transferred to new MS medium and arranged in a line to compare stem development. Faster development of the transgenic plants was clearly observed, as shown by the larger size of the transgenic plants compared to WT plants (**Figure 4A**). At 15 DAG, along with the increase in β-glucosidase enzymatic activity, IAA and cytokinin levels were significantly higher in the transgenic plants than in WT plants, with the highest hormone level obtained in VB transgenic plants (an increase of 585% in VB3 compared to WT plants), whereas there was no difference in β-glucosidase enzymatic activity between the CB and VB transgenic plants (**Figure 4B**). These results indicate that more liberated hormones were released from the vacuole in VB than in CB transgenic plants.

*Average values were calculated from triplicates (n = 3) of each CB and VB transgenic line and WT plants.*

Correlation analysis between the development of the stem (height) and root system (root length) and the cytokinin and auxin levels showed that a maximum increase of 200% in stem height corresponded to an increase of 458% in cytokinin level in VB9 compared to WT plants (**Figure 4C**). The maximum increases in root length and IAA level were 186 and 725%, respectively, in VB3 compared to WT plants (**Figure 4D**). Moreover, these results indicate that, despite slight differences in stem height and root length between the CB and VB transgenic plants, the increase in liberated IAA and cytokinin levels promoted faster development in the transgenic lines compared to WT plants. We observed maximum increases in stem height and root length of 164 vs. 200%, and 183 vs. 194%, for CB and VB vs. WT plants, respectively. Faster development of the stem and roots was also observed in the T<sup>2</sup> and T<sup>3</sup> generations (Figures S3A,B in Supplementary Material). Furthermore, a higher number of leaves and lateral roots, and a greater average fresh weight of 20 young plants, were also observed in the transgenic plants compared to WT plants (Figures S3C–E in Supplementary Material), indicating that the faster development of the transgenic plants, compared to WT plants, was stable over three generations.
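The text does not specify which correlation statistic was used. As a minimal sketch, assuming Pearson's correlation coefficient, the relationship between a hormone level and a growth trait across lines could be quantified as follows. The function name is hypothetical and the values in the usage lines are placeholders for illustration, not data from this study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("at least two paired observations are required")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sxx = sum(d * d for d in dx)
    syy = sum(d * d for d in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values only: relative cytokinin level and stem height per line.
    cytokinin_relative = [1.0, 2.4, 3.1, 4.6, 5.6]
    stem_height_relative = [1.0, 1.5, 1.6, 1.9, 2.0]
    print(f"r = {pearson_r(cytokinin_relative, stem_height_relative):.3f}")
```

With only a handful of lines per construct, such a coefficient describes the trend but gives little statistical power, so it would normally be reported together with the number of lines and a significance test.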

Plant stem cells are harbored inside the meristems, which are located in the growing tips of the shoots and roots. The faster development observed in the transgenic plants suggests stronger stimulation of stem cell reproduction, which could then induce changes in plant growth and organogenesis (Murray et al., 2012). The population of stem cells in shoot apical meristems is regulated by expression of the homeodomain gene *WUS*, which encodes a transcription factor and can be upregulated by cytokinin. In the root apical meristem, a group of dividing cells called the quiescent center (QC) is upregulated by IAA (Yadav et al., 2010; Zhao et al., 2010). To determine the expression levels of *WUS* in the transgenic lines and WT plants, RNA was extracted from the stems of young plants (15 DAG) for cDNA synthesis and RT-PCR. The results showed higher *WUS* expression levels in the transgenic plants than in WT plants (**Figure 4E**), providing evidence that the superior development of the transgenic plants compared to WT plants was due to elevated hormone levels.

## **Enhanced Resistance to NaCl Stress and Elevation of ABA in Transgenic Plants**

Next, we asked whether increased ABA levels in the transgenic plants led to increased tolerance of salt stress, as reported in previous studies (Lee et al., 2006; Wang et al., 2011; Han et al., 2012; Xu et al., 2012). Seeds from the T<sup>1</sup> to T<sup>3</sup> generations were used, and after germination, the seedlings were transferred to new MS medium containing 200 mM NaCl to examine the response to high NaCl stress. Observations at 15 DAG showed that the transgenic plants were more resistant to high NaCl, as indicated by enhanced development in the transgenic compared to WT seedlings in terms of increased root length, number of leaves, number of lateral roots, and fresh weight (**Figure 5A**). As shown in **Figure 5B**, the higher ABA level was clearly related to increased β-glucosidase enzymatic activity in the transgenic seedlings, with the highest ABA levels recorded in the VB transformants (maximum increase of 504% in VB3 compared to WT seedlings). Increased tolerance to high NaCl stress was also shown by the longer roots and greater fresh weight of 100 transgenic seedlings compared to WT seedlings (maximum increases of 271% in root length and 256% in fresh weight in VB2 compared to WT seedlings; **Figures 5C,D**).

Next, to examine the resistance of mature plants to salt stress, transgenic and WT plants grown in the greenhouse were subjected to a salt stress experiment at 80 DAG. As shown in **Figure 5E**, more senescent leaves appeared in the transgenic than in the WT plants, which may explain the higher rate of weight reduction (8.3% in VB2 compared to 3.4% in WT plants; **Figure 5F**). The appearance of leaf senescence indicated that a faster programed cell death process occurred in the transgenic than in the WT plants, which was also reflected in the higher rate of chlorophyll degradation in transgenic plants (43.2% in VB3 compared to 14.1% in WT plants; **Figure 5G**).

## **DISCUSSION**

Because of its thermostability and transglycosylation properties, the *T. maritima* BglB enzyme is considered to be a useful catalyst for biotechnological applications (Goyal et al., 2001). According to Jung et al. (2010), transgenic tobacco plants can be utilized for the mass production of BglB; moreover, the overexpression of heterologous BglB in tobacco has led to changes in phenotypic characteristics, such as larger leaves and taller plants (Jung et al., 2013). Plants contain their own β-glucosidase genes, and previous studies have demonstrated that the expression of β-glucosidase, including heterologous expression, affects the hydrolysis of hormone conjugates and hormone homeostasis in plants, which in turn control plant development (Schliemann, 1984; Brzobohaty et al., 1993; Dietz et al., 2000; Kiran et al., 2006). In the present study, having observed pronounced phenotypic changes in the *T. maritima* BglB transgenic tobacco compared to WT plants that remained stable over three offspring generations (**Figures 3**–**5**), we were encouraged to evaluate the relationship between the β-glucosidase enzymatic activity of *T. maritima* BglB and changes in plant hormone levels.

For vacuole targeting, among the three different types of vacuolar sorting signals (N- or C-terminal polypeptides or internal sequences) that have been identified (Jiang and Rogers, 1998; Matsuoka and Neuhaus, 1999), C-terminal polypeptides, such as the C-terminal AFVY tetrapeptide from phaseolin, are considered to be the most efficient (Frigerio et al., 2001; Nausch et al., 2012a,b). However, due to the presence of numerous hydrolytic enzymes in the vacuole of plant cells, it is generally difficult for proteins to maintain their activity inside the vacuole (Boller and Kende, 1979; Marty, 1999). Here, we showed that the β-glucosidase enzymatic activity of heterologously expressed BglB was significantly higher in the transgenic plants (both the CB and VB transformants) than in WT plants (**Figure 2A**). This activity remained stable throughout the life cycle and persisted over three offspring generations (**Figures 2A,B**). These results indicate that the *T. maritima* BglB was effectively expressed in the transgenic tobacco plants.

For the first time, the presence of heterologous BglB and its β-glucosidase enzymatic activity were assayed after vacuole isolation, which showed that BglB accumulation was dramatically higher in the VB than in the CB transgenic plants (**Figures 1E** and **2C**). This result indicated that the AFVY tetrapeptide was effective for sorting *T. maritima* BglB into the vacuole, and that the β-glucosidase enzymatic activity was maintained and could tolerate the protein-degrading conditions of the vacuolar environment. Therefore, vacuole-targeted *T. maritima* BglB transgenic plants should be considered candidates for plant molecular farming, where plants are used as bioreactors to produce degrading enzymes for hydrolysis of lignocellulosic material, similar to chloroplast-targeted *T. maritima* BglB transgenic plants (Jung et al., 2010, 2013).

Hormone glucoside conjugates, which are mainly stored in the plant vacuole, are considered inactive forms in hormone metabolism, and can be liberated by β-glucosidases, a large group of enzymes that can hydrolyze glucoside ester linkages (Sembdner et al., 1994; Bajguz and Piotrowska, 2009). A wide variety of β-glucosidase enzymes from plants have been shown to be capable of hydrolyzing hormone conjugates (Schliemann, 1984; Brzobohaty et al., 1993; Dietz et al., 2000; Kiran et al., 2006; Lee et al., 2006; Yao et al., 2007; Jin et al., 2011). We demonstrated a novel approach in which transformation with *BglB*, encoding a thermostable β-glucosidase from the bacterium *T. maritima* (Goyal et al., 2001), affected plant hormone levels through hydrolysis of glucoside ester links in hormone conjugates in the transgenic plants, which seemed to be the result of non-specific activity. By contrast, previous studies demonstrated that each kind of β-glucosidase likely acts on specific hormone conjugates (Brzobohaty et al., 1993; Dietz et al., 2000). Kiran et al. (2012) reported that Zm-p60.1 is capable of releasing active cytokinin from O- and N-glucosides, and confirmed that the liberated hormones are still in the active state. Knowledge of the mechanism of transport out of the vacuole is still lacking (Vitale and Hinz, 2005; De Marcos Lousa et al., 2012). In the present study, significantly higher enzymatic activity, particularly in isolated vacuoles, was accompanied by dramatically higher levels of hormones (IAA, ABA, and cytokinin) in the VB plants compared to CB transgenic plants, with WT plants showing the lowest levels (**Figure 2C**). These results demonstrated that, when greater amounts of BglB were targeted to the vacuole, more liberated hormones were released.

In contrast to the results obtained by Kiran et al. (2012), who found no significant phenotypic changes in vacuole-targeted *Zmp60.VAL* transgenic plants, our results showed pronounced phenotypic changes in *T. maritima* BglB transgenic plants compared to WT plants. Mature transgenic plants exhibited enhanced development, in terms of faster growth in stem height and a shortened growth cycle, with earlier flowering (**Figure 3**). Young seedlings had increased stem height and longer roots (**Figure 4**), which were accompanied by significantly higher hormone levels that were maintained over three offspring generations of the transgenic plants. These results provide evidence that heterologously expressed BglB increases plant hormone levels, which then influence the plant phenotype.

Because IAA, ABA, and cytokinin levels were all elevated, it is difficult to determine the specific factor that directly contributes to the phenotypic changes in the transgenic plants. Fortunately, previous work provides clues to the cause of such changes. For example, IAA is known to regulate root development (Overvoorde et al., 2010), cytokinin plays pivotal roles in the formation and activity of shoot meristems (Werner et al., 2003; Werner and Schmülling, 2009), and ABA functions in the plant response to dehydration/salinity stresses and programed cell death (Finkelstein, 2006; Yang et al., 2014). Previous studies have also shown that the reproduction and differentiation of stem cells harbored in the shoot and root apical meristems contribute to development and organogenesis in plants (Williams and Fletcher, 2005; Powell and Lenhard, 2012). Therefore, the taller stems, longer roots, and earlier flowering observed in the transgenic plants could indicate enhanced activity of the shoot and root apical meristems in the transgenic plants compared to WT plants. Specifically, the expression level of *WUS*, a transcription factor that regulates the development and division of stem cells in the shoot apical meristem, is upregulated by cytokinin (Kurakawa et al., 2007; Werner and Schmülling, 2009; Zhao et al., 2010), which sheds light on how cytokinin, which was increased in our transgenic plants, enhances the development of the stems and aboveground organs.

Our results showed enhanced development of the root systems (represented by increased root dry weight, number of lateral roots, and root length), confirming the effect of a larger amount of IAA on the development of root systems in the transgenic plants (**Figures 3** and **4**). ABA mainly functions in the plant's response to dehydration by regulating stomatal opening/closing, and also plays a role in limiting cell division and expansion, decreasing shoot growth and lateral root initiation, and promoting developmental phase changes such as vegetative-to-reproductive transitions (Finkelstein, 2006). In the present study, the increased ABA levels were related to increased salt stress tolerance in young seedlings. The faster chlorophyll degradation and higher rates of weight reduction after treatment with NaCl solution in mature plants revealed that programed cell death was triggered more promptly in the transgenic plants, for both the VB and CB transformants, than in WT plants (**Figure 5**). Notably, the VB transgenic plants showed no significant difference in β-glucosidase enzymatic activity but significantly higher hormone levels compared to the CB transgenic plants. This supports the view that the hormone conjugates are mainly stored in the vacuole and that more liberated hormones were released from the conjugates in the VB transgenic plants, which contributed to the greater effect on plant development in these plants.

## **CONCLUSION**

After *T. maritima BglB* was overexpressed and effectively targeted to the vacuole by the addition of the C-terminal AFVY tetrapeptide, BglB was still active and functional. The main results emerging from this study are that the hormone (ABA, IAA, and cytokinin) conjugates are mainly stored in the vacuole and, perhaps more importantly, that higher levels of hormones liberated from their conjugates via BglB-mediated hydrolysis enhance growth and development in VB transgenic plants to a greater extent than in CB transgenic plants. Therefore, the use of heterologously overexpressed vacuole-targeted *T. maritima BglB* may be an approach to develop molecular farming technology that achieves multiple targets: increased production of the β-glucosidase BglB, increased biomass accumulation, and shortened developmental stages. In addition, this vacuole-targeted *BglB* plant farming system influences total biomass accumulation and as such may be useful in increasing biomass production for bioenergy and biofuel production.

## **FUNDING**

This work was supported by the Priority Centers Program (2010-0020141) through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology, and by a grant (S211314L010120) from Forest Science & Technology Projects, Forest Service, Republic of Korea.

## **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/article/10.3389/fbioe.2015.00181

## **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Nguyen, Lee, Jung and Bae. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Book review: Socio-economic impacts of bioenergy production**

*Sheikh Adil Edrisi and P. C. Abhilash\**

*Institute of Environment and Sustainable Development, Banaras Hindu University, Varanasi, India*

**Keywords: biomass and biofuel, socio-economic factors, sustainable development, waste land development, job opportunities, income effect**

### **A book review on Socio-economic impacts of bioenergy production**

by Rutz, D., and Janssen, R. (eds). (2014). Dordrecht: Springer, 310 pp. ISBN 978-3-319-03828-5

During the last few decades, bioenergy production has been strongly promoted worldwide (EIA, 2013) as a clean and eco-friendly source of energy, mainly because of global concern about climate change, which is primarily attributed to increasing CO<sup>2</sup> emissions from fossil fuel consumption. Although vegetable oils were first used as liquid fuels (bioenergy) in an internal combustion engine by Rudolf Diesel in the early 1900s (Pousa et al., 2007), their low cost and easy availability have made fossil fuels, such as petroleum, the primary fuel for vehicular transport. Nevertheless, the limited reserves of petroleum and its derivatives have necessitated the search for cleaner, alternative sources of energy (Parente, 2003; Pousa et al., 2007). Although bioenergy is considered a versatile type of renewable energy, the large-scale production of bioenergy from biomass is frequently criticized because it requires large tracts of arable land for bioenergy plantations. Hence, there is a direct conflict between food and biofuel production in the societies concerned. As a result, present global bioenergy production (55 EJ) (World Bioenergy Association, 2014) is not promising and is still not enough to satisfy the energy demands of a burgeoning global population. Hence, there is an urgent need to maximize bioenergy production without creating social and economic conflicts. However, socio-economic considerations are mainly influenced by the local and regional frameworks (societal and economic) of the nations concerned, including educational level, cultural aspects, and policies, including environmental and social targets. In this context, the book "*Socio-economic impacts of bioenergy production*" is a relevant and in-depth analysis of the socio-economic issues associated with bioenergy production; it provides suitable indicators for assessing the sustainability of bioenergy production programs as well as strategies for overcoming their negative impacts in an amicable way. As the editors rightly point out, the book "*illustrates the complexity of interrelated topics in the bioenergy value chain, ranging from agriculture to conversion processes, as well as from social implications to environmental effects. It furthermore gives an outlook on future challenges associated with the expected boom of a global bio-based economy, which contributes to the paradigm shift from a fossil based to a biomass and renewable energy based economy*." Therefore, the book is targeted at a wide readership ranging from "policy makers, scientists, and NGOs in the fields of agriculture, forestry, biotechnology, and energy."

Though there are a few books in the same domain, such as "*Sustainable Bioenergy Production: An Integrated Approach*" (Ruppert et al., 2013) and "*Bioenergy for Sustainable Development in Africa*" (Janssen and Rutz, 2012), addressing one aspect or another of the socio-economic implications of bioenergy production, the current book provides a complete deliberation on the above issues along with practicable solutions. Therefore, we could appropriately say that the current book has its own perspectives, ideas, and deliberations that handle the cross-cutting edges of the socio-economic impacts of bioenergy production. Importantly, "*this publication builds upon the results of the Global-Bio-Pact project on 'Global Assessment of Biomass and Bio-product Impacts on Socio-economics and Sustainability' which was supported by the European Commission in the 7th Framework Programme for Research and Technological Development from February 2012 to January 2013*." Moreover, the "*contributions to this book are based on the experience of selected authors from Europe, Africa, Asia, and Latin America, including researchers, investors, policy makers and other stakeholders such as representatives from NGOs*."

### *Edited by:*

*Robert Henry, The University of Queensland, Australia*

*Reviewed by: Abu Yousuf, Universiti Malaysia Pahang, Malaysia*

> *\*Correspondence: P. C. Abhilash pca.iesd@bhu.ac.in, pcabhilash@hotmail.com*

### *Specialty section:*

*This article was submitted to Bioenergy and Biofuels, a section of the journal Frontiers in Bioengineering and Biotechnology*

> *Received: 13 September 2015 Accepted: 12 October 2015 Published: 26 October 2015*

### *Citation:*

*Edrisi SA and Abhilash PC (2015) Book review: Socio-economic impacts of bioenergy production. Front. Bioeng. Biotechnol. 3:174. doi: 10.3389/fbioe.2015.00174*

The book focuses on the following aspects: (i) various tools for socio-economic impact assessments, (ii) indicators for assessing socio-economic sustainability, (iii) test auditing of the indicators, (iv) linkages between socio-economic and environmental impacts of bioenergy, (v) socio-economic impact of biofuels on land use change, (vi) effects on food security, (vii) socio-economic impacts of sweet sorghum value chains in temperate and tropical regions, (viii) the use of soybean by-products as a biofuel in Argentina, (ix) socio-economic impacts of palm oil and biodiesel products in Indonesia, (x) socio-economic experiences of Jatropha production in Africa and Mali, (xi) socio-economic impacts of bioethanol from sugarcane in Brazil and Costa Rica, (xii) socio-economic analysis of a lignocellulosic ethanol refinery in Canada, (xiii) biogas production from organic waste in Africa, (xiv) socio-economic indicators on different bioenergy case studies, and (xv) the contribution of bioenergy to energy access and energy security.

The editors present the intended themes in an organized and progressive manner through ordered, interrelated, and well-structured chapters. The book shows that socio-economic impact assessment basically consists of scoping and determination of issues, social and economic baselines, impacts, significance, mitigation, management, and monitoring. It also describes how these assessments can be "*used as an add-on to environmental impact assessment and/or to support biomass certification schemes*." The introductory chapter delivers a conceptual framework for the different tools used to assess socio-economic impacts, with brief glimpses of sustainability concepts.

The book also highlights various issues related to the impacts of bioenergy production and its assessment and screening strategies, including key indicators for test auditing, and provides linkages between the socio-economic and environmental impacts of bioenergy production and utilization. The editors have made a special attempt to address the further implications of bioenergy production for land use change and food security, but fail to explore these issues under current global land use scenarios, as the chapter concerned offers only basic definitions of several land use patterns. However, there are some interesting sections, such as "land use rights, land tenure, and ownership," which are useful for assigning ownership rights to various stakeholders. Moreover, the impact analysis of food security is well handled: it covers the connections and controversies between food and fuel production, outlines a methodology for an economy-wide assessment of food security and biofuels, and quantifies the impact of different biofuel policies on food security. The chapter on the socio-economic impacts of sweet sorghum is well supported by value-chain analysis ranging from cultivation to conversion scenarios in tropical and temperate production systems. The book also deals with bioenergy production from soybean, palm, and sugarcane in Argentina, Indonesia, and Brazil, respectively, in individual chapters. It treats Jatropha separately, with its different business models in tropical areas, specifically in Africa and Mali, which will certainly attract scientists, policy makers, entrepreneurs, stakeholders, and NGOs concerned with agroforestry for sustainable bioenergy production.

Apart from the socio-economic perspectives of biomass conversion for bioenergy production, the book also addresses the value-chain analysis of biomass conversion technologies, with special reference to sugarcane-to-ethanol production in Costa Rica, and separately illustrates the impacts of a refinery in Canada targeting lignocellulosic ethanol production. It clearly elucidates the socio-economics of the lignocellulosic biomass supply chain in a national context, with special emphasis on the forestry sector and land ownership. The section also includes regional and local case studies of British Columbia that deal with the Lignol technology, the supply chain, products of the Lignol process, Biomass Technology Group (BTG) pyrolysis, products of the BTG process, macroeconomics in the lignocellulosic biomass chain in Canada and British Columbia, employment generation, working conditions, relevance of impacts, threshold determination, mitigation options, and biomass certifications.

Although the use of genetically modified sweet sorghum and other plant products is discussed superficially, it would be much better if separate sections or chapters, with specific case studies at national, regional, or local levels, were devoted to the topic. The volume also lacks a discussion of future challenges and recommendations for lignocellulosic biomass conversion processes. Moreover, the quality of the display items (figures and illustrations) is poor, and they are difficult to understand. However, our overall impression is that the book is a well-written "*guideline to assess the socio-economic implications of bioenergy production for scientists, practitioners, and decision makers who are interested in a biomass supply, cost-value, macro and micro-economics and value-chain perspective of bioenergy production*."

## **AUTHOR CONTRIBUTIONS**

SE and PA wrote the review.

### **REFERENCES**

Parente, E. J. S. (2003). *Biodiesel: Uma Aventura Tecnológica num País Engraçado*, first Edn. Fortaleza: Unigráfica.

World Bioenergy Association. (2014). *Global Bioenergy Statistics*. Available at: www.worldbioenergy.org

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Edrisi and Abhilash. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*