# STRUCTURAL AND COMPUTATIONAL GLYCOBIOLOGY: IMMUNITY AND INFECTION

EDITED BY: Elizabeth Yuriev and Mark Agostino PUBLISHED IN: Frontiers in Immunology

#### *Frontiers Copyright Statement*

*© Copyright 2007-2015 Frontiers Media SA. All rights reserved. All content included on this site, such as text, graphics, logos, button icons, images, video/audio clips, downloads, data compilations and software, is the property of or is licensed to Frontiers Media SA ("Frontiers") or its licensees and/or subcontractors. The copyright in the text of individual articles is the property of their respective authors, subject to a license granted to Frontiers.*

*The compilation of articles constituting this e-book, wherever published, as well as the compilation of all other content on this site, is the exclusive property of Frontiers. For the conditions for downloading and copying of e-books from Frontiers' website, please see the Terms for Website Use. If purchasing Frontiers e-books from other websites or sources, the conditions of the website concerned apply.*

*Images and graphics not forming part of user-contributed materials may not be downloaded or copied without permission.*

*Individual articles may be downloaded and reproduced in accordance with the principles of the CC-BY licence subject to any copyright or other notices. They may not be re-sold as an e-book.*

*As author or other contributor you grant a CC-BY licence to others to reproduce your articles, including any graphics and third-party materials supplied by you, in accordance with the Conditions for Website Use and subject to any copyright notices which you include in connection with your articles and materials.*

*All copyright, and all rights therein, are protected by national and international copyright laws.*

*The above represents a summary only. For the full conditions see the Conditions for Authors and the Conditions for Website Use.*

ISSN 1664-8714 ISBN 978-2-88919-638-8 DOI 10.3389/978-2-88919-638-8


## **STRUCTURAL AND COMPUTATIONAL GLYCOBIOLOGY: IMMUNITY AND INFECTION**

#### Topic Editors:

**Elizabeth Yuriev,** Monash University, Melbourne, Australia

**Mark Agostino,** School of Biomedical Sciences, Curtin University, Perth, Australia

Image caption: Computational methods play an increasingly crucial role in structural glycobiology studies. This complex of a tetrasaccharide xenoantigen (Galα1-3Galβ1-4GlcNAcβ1-3Gal) with the anti-Gal mAb 8.17 was predicted through a combination of molecular docking and computational site mapping techniques. Based on Agostino et al. Glycobiology. 2010;20:724-735.

Interest in understanding the biological role of carbohydrates has increased significantly over the last 20 years. The use of structural techniques to understand carbohydrate-protein recognition is still a relatively young area, but one that is of emerging importance. The high flexibility of carbohydrates significantly complicates the determination of high quality structures of their complexes with proteins. Specialized techniques are often required to understand the complexity of carbohydrate recognition by proteins. In this Research Topic, we focus on structural and computational approaches to understanding carbohydrate recognition by proteins involved in immunity and infection. Particular areas of focus include cancer immunotherapeutics, carbohydrate-lectin interactions, glycosylation and glycosyltransferases.

**Citation:** Elizabeth Yuriev and Mark Agostino, eds. (2015). Structural and computational glycobiology: immunity and infection. Lausanne: Frontiers Media. doi: 10.3389/978-2-88919-638-8

# Table of Contents

*Molecular recognition of gangliosides and their potential for cancer immunotherapies*

Ute Krengel and Paula A. Bousquet

*Structure based refinement of a humanized monoclonal antibody that targets tumor antigen disialoganglioside GD2*

Mahiuddin Ahmed, Jian Hu and Nai-Kong V. Cheung

Spandana Makeneni, Ye Ji, David C. Watson, N. Martin Young and Robert J. Woods

*Differential site accessibility mechanistically explains subcellular-specific N-glycosylation determinants*

Ling Yen Lee, Chi-Hung Lin, Susan Fanayan, Nicolle H. Packer and Morten Thaysen-Andersen

Mark B. Richardson and Spencer J. Williams

*Computational and experimental prediction of human C-type lectin receptor druggability*

Jonas Aretz, Eike-Christian Wamhoff, Jonas Hanske, Dario Heymann and Christoph Rademacher

*Carbohydrates in cyberspace*

Elizabeth Yuriev and Paul A. Ramsland

## **Editorial: Structural and computational glycobiology – immunity and infection**

*Mark Agostino<sup>1,2</sup>\* and Elizabeth Yuriev<sup>3</sup>\**

*<sup>1</sup> CHIRI Biosciences and Curtin Institute for Computation, School of Biomedical Sciences, Curtin University, Perth, WA, Australia, <sup>2</sup> Centre for Biomedical Research, Burnet Institute, Melbourne, VIC, Australia, <sup>3</sup> Medicinal Chemistry, Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC, Australia*

**Keywords: glycobiology, structural biology, infection, cancer immunotherapy, molecular modeling, molecular recognition, lectins, signaling**

Historically deemed the realm of the brave or the foolhardy, glycobiology has grown considerably as a discipline over the last 50 years. Carbohydrates, which were once considered to be mere "decorations" on proteins and lipid membranes, are increasingly shown to play specific roles in signaling and communication (1).

Although the number of structures deposited into the Protein Data Bank continues to grow exponentially, the characterization of new structures of carbohydrate–protein complexes is growing more modestly, as it remains very challenging and prone to errors (2). Computational methods are increasingly being pursued to provide structural insight into carbohydrate–protein interactions. The complex structure and high flexibility of carbohydrates, as well as difficulties associated with accurately computing binding energies for these interactions, present considerable challenges for the use of these methods in both understanding carbohydrate–protein recognition and the structure-aided design of carbohydrate-based therapeutics. However, numerous computational approaches have been developed in recent years that address some of these issues (3–9). The Opinion piece in this Research Topic further highlights some computational resources that have been developed specifically for glycobiology (10).

Several carbohydrate classes, most notably gangliosides, Lewis antigens, and the Thomsen–Friedenreich antigen, are of considerable interest for the development of cancer immunotherapeutics. Krengel and Bousquet (11) present a comprehensive review on the importance of gangliosides not only to cancer therapeutics but also to signaling and to infection by pathogens, as well as on how their structure and presentation on glycolipids and glycoproteins influence their function and potential to be exploited in therapeutics. Ahmed et al. (12) describe the use of molecular modeling to optimize framework regions of an anti-ganglioside antibody, resulting in the identification of a new construct with enhanced stability, antigen binding, and cytotoxic properties. Kieber-Emmons et al. (13) discuss the challenges and frontiers associated with the development of peptides as immunogenic mimics of carbohydrates, particularly focusing on mimics of tumor-associated carbohydrate antigens.

Despite considerable advances in the understanding of many aspects of glycobiology, several fundamental processes remain only partially understood. An excellent example of this is the structural basis of antibody recognition of the blood group antigens (A, B, H). Makeneni et al. (14) combine docking with a recently developed carbohydrate-specific scoring function and molecular dynamics simulation to demonstrate the structural basis of A vs. B specificity of an anti-A antibody. Lee et al. (15) performed LC-MS/MS-based glycomics and proteomics, combined with structural analyses, of a wide range of glycosylated proteins in order to understand the differences in the glycosylation of secreted, cell-surface, and intracellular proteins. The study correlates the presence of specific *N*-glycan terminations with their subcellular location, providing insight into pathophysiological conditions caused by glycosylation disorders. Brockhausen (16) provides a comprehensive review detailing known glycosyltransferases with overlapping activities between bacteria and mammals. In many cases, similar catalytic mechanisms between bacterial and mammalian glycosyltransferases can be identified, despite limited sequence similarity.

#### *Edited and reviewed by:*

*Kendall Arthur Smith, Weill Medical College of Cornell University, USA*

#### *\*Correspondence:*

*Mark Agostino mark.agostino@curtin.edu.au; Elizabeth Yuriev elizabeth.yuriev@monash.edu*

#### *Specialty section:*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology*

*Received: 19 June 2015 Accepted: 01 July 2015 Published: 14 July 2015*

#### *Citation:*

*Agostino M and Yuriev E (2015) Editorial: Structural and computational glycobiology – immunity and infection. Front. Immunol. 6:359. doi: 10.3389/fimmu.2015.00359*

Lectins, particularly C-type lectins, are of considerable importance for immunity, mediating cell–cell recognition, and representing potential targets for the development of therapeutics. Notable C-type lectins include DC-SIGN and the selectins, known for their roles in the progression of HIV and cancer, respectively. Richardson and Williams (17) review the discovery and characterization of the macrophage C-type lectin (MCL) and the macrophage-inducible C-type lectin (Mincle), their roles in initiating the immune response to infection, and the identification of activating ligands for these receptors. Aretz et al. (18) predict the druggability of a panel of C-type lectins and perform fragment-based screening by nuclear magnetic resonance spectroscopy against DC-SIGN, langerin, and MCL. Their work highlights limitations in the application of computational methods to predict the druggability of this class of proteins.

The work presented in this Research Topic illustrates a small selection of the wide-ranging research in this area and the considerable challenges associated with both understanding glycan function and targeting glycan interactions for the development of therapeutic agents.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Agostino and Yuriev. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Molecular recognition of gangliosides and their potential for cancer immunotherapies

### **Ute Krengel\* and Paula A. Bousquet\***

Department of Chemistry, University of Oslo, Oslo, Norway

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Paul A. Ramsland, Burnet Institute, Australia Anne Imberty, CNRS, France

#### **\*Correspondence:**

Ute Krengel and Paula A. Bousquet, Department of Chemistry, University of Oslo, P.O 1033 Blindern, NO-0315 Oslo, Norway e-mail: ute.krengel@kjemi.uio.no; paula.bousquet@kjemi.uio.no

Gangliosides are sialic-acid-containing glycosphingolipids expressed on all vertebrate cells. They are primarily positioned in the plasma membrane with the ceramide part anchored in the membrane and the glycan part exposed on the surface of the cell. These lipids have highly diverse structures, not least with respect to their carbohydrate chains, with N-acetylneuraminic acid (NeuAc) and N-glycolylneuraminic acid (NeuGc) being the two most common sialic-acid residues in mammalian cells. Generally, healthy human tissue is deficient in NeuGc, but this molecule is expressed in tumors and in human fetal tissues, and was hence classified as an onco-fetal antigen. Gangliosides perform important functions through carbohydrate-specific interactions with proteins, for example, as receptors in cell–cell recognition, which can be exploited by viruses and other pathogens, and also by regulating signaling proteins, such as the epidermal growth factor receptor (EGFR) and the vascular endothelial growth factor receptor (VEGFR), through lateral interaction in the membrane. Through both mechanisms, tumor-associated gangliosides may affect malignant progression, which makes them attractive targets for cancer immunotherapies. In this review, we describe how proteins recognize gangliosides, focusing on the molecular recognition of gangliosides associated with cancer immunotherapy, and discuss the importance of these molecules in cancer research.

**Keywords: biological membranes, cancer immunotherapy, cell signaling, gangliosides, protein–carbohydrate interactions, glycosphingolipids, sialic acid, tumor-associated antigens**

#### **INTRODUCTION**

Few lipid species included in biological membranes have received as much attention as glycosphingolipids (GSLs), and especially gangliosides, sialic-acid-containing GSLs. They were discovered by Ernst Klenk in the 1940s, who proposed the term "ganglioside" due to the abundance of these molecules in "Ganglionzellen" (neurons). Gangliosides were later classified by Svennerholm according to the number of sialic-acid residues and chromatographic mobility (1). In contrast to glycerolipids, the lipid anchor in sphingolipids builds on the long-chain amino alcohol sphingosine, which is coupled *via* its amino group to a fatty acid to form ceramide (**Figure 1**). In gangliosides, the ceramide anchor is linked to a hydrophilic glycan head group, which is characterized by the presence of one or more sialic-acid residues (carbohydrates with a nine-carbon backbone and a carboxylic acid group); however, there is large variability of this structure. One example, the GM3 ganglioside, abundant in almost all healthy tissues, is shown in **Figure 1**. The large structural variability is related to developmental stage and cell type, and hundreds of gangliosides are known today (3–5). Variations in carbohydrate structure alone account for over 100 different structures, and this number increases significantly when ceramide variations are taken into account (4–7). Accumulating evidence indicates that many cellular events, including differentiation, growth, signaling, interactions, and immune reactions are highly influenced by gangliosides, and that these molecules may also cause malignancies. Positioned in the plasma membrane, gangliosides interact with other lipids and proteins, both laterally in the membrane and *via* their head groups, acting as cellular receptors that can be recognized by antibodies and other ganglioside-binding molecules. Here, we highlight the function and molecular interactions of gangliosides with high clinical significance.
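The shorthand names used throughout this review (GM3, GD3, GD1a, and so on) follow the Svennerholm scheme mentioned above. As a minimal sketch, the Python snippet below splits such a name into its parts. The letter-to-count mapping (A = asialo, M = mono, D = di, T = tri, Q = tetra) is the conventional reading of the scheme; the function name `parse_ganglioside` and the dictionary keys are hypothetical and chosen only for illustration.

```python
import re

# Letter following "G": number of sialic-acid residues
# (A = asialo, M = mono, D = di, T = tri, Q = tetra).
SIALIC_ACIDS = {"A": 0, "M": 1, "D": 2, "T": 3, "Q": 4}

# "G" + letter + one digit + optional isomer suffix, e.g. "GM3" or "GD1a".
_PATTERN = re.compile(r"^G([AMDTQ])(\d)([a-c]?)$")


def parse_ganglioside(name: str) -> dict:
    """Split a Svennerholm-style shorthand name into its parts (illustrative helper)."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised shorthand name: {name!r}")
    letter, digit, isomer = match.groups()
    return {
        "sialic_acids": SIALIC_ACIDS[letter],  # from the letter
        "series_number": int(digit),           # historically tied to chromatographic mobility
        "isomer": isomer or None,              # e.g. the "a" in GD1a
    }


if __name__ == "__main__":
    for name in ("GM3", "GD3", "GT3", "GD1a"):
        print(name, parse_ganglioside(name))
```

The parser deliberately accepts only the compact forms that appear in this text; longer systematic names are outside its scope.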

#### **GANGLIOSIDES – GENERAL ARCHITECTURE, CELLULAR LOCALIZATION, AND BIOSYNTHESIS**

Gangliosides consist of a lipid anchor, the ceramide, decorated by a glycan head group of varying complexity. In cells, gangliosides are mainly found in the outer leaflets of the plasma membrane. Together with sphingomyelin and cholesterol, they form membrane microdomains, which play important roles in cell–cell communication and signal transduction (8–10). The synthesis of gangliosides starts in the ER compartment with the synthesis of the ceramide, the common precursor of all GSLs. Aided by the ceramide-transfer protein, CERT, ceramide is then transferred to the Golgi apparatus, and thereafter converted to glucosylceramide (GlcCer) (11). Subsequently, other carbohydrate residues are attached, one by one, catalyzed by glycosyltransferases, as described below (12, 13). The glycosyltransferases are specific to the sugar residues that they transfer and are grouped into families according to their specificity. Interestingly, all glycosyltransferase promoters lack the TATA sequence, and hence do not have any core promoter element characteristic for housekeeping genes. Although some indications relate their transcription to complex developmental and tissue-specific regulation, very little is known about how glycosyltransferases are regulated (14). The molecular products are further subject to remodeling, by sialidases, sialyltransferases, and other enzymes, followed by vesicle sorting and fusion with the plasma membrane (15). Gangliosides are assumed to recycle to the plasma membrane from early endosomes, and a degradation process is thought to take place at the late endosomal level (16).

**Figure 1 |** … Consortium for Functional Glycomics (2); purple diamond – N-acetylneuraminic acid; yellow circle – D-galactose; blue circle – D-glucose.

The biosynthetic pathways of gangliosides are shown in **Figure 2**. After formation of the initial glucosylceramide, a galactose moiety is added to GlcCer to yield lactosylceramide (LacCer), the common precursor for almost all gangliosides (except GM4). Addition of one sialic-acid residue to LacCer subsequently converts this precursor molecule to GM3. This reaction is catalyzed by sialyltransferase I (ST-I) or GM3 synthase. In the same manner, GD3 and GT3 can be generated by further addition of sialic-acid residues, catalyzed by ST-II or GD3 synthase and ST-III or GT3 synthase, respectively. The number of sialic-acid residues linked to the inner galactose residue (0, 1, 2, or 3) classifies the gangliosides into asialo-, a-, b-, or c-series (**Figure 2**); however, only trace amounts of gangliosides from the asialo- and c-series are found in adult human tissue (17).
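The sequential sialylation of LacCer and the resulting series classification can be summarized in a small lookup structure. The Python sketch below encodes only the steps and enzyme names given in the paragraph above; the names `SIALYLATION_STEPS`, `series_for`, and `route_from_laccer` are hypothetical, and the later glycosylation steps shown in Figure 2 are not modeled.

```python
# Sequential sialylation steps named in the text: (substrate, product, enzyme).
SIALYLATION_STEPS = [
    ("LacCer", "GM3", "ST-I (GM3 synthase)"),
    ("GM3", "GD3", "ST-II (GD3 synthase)"),
    ("GD3", "GT3", "ST-III (GT3 synthase)"),
]

# Number of sialic acids on the inner galactose -> ganglioside series.
SERIES = {0: "asialo", 1: "a", 2: "b", 3: "c"}


def series_for(inner_gal_sialic_acids: int) -> str:
    """Return the series name for a given sialic-acid count on the inner Gal."""
    if inner_gal_sialic_acids not in SERIES:
        raise ValueError("expected 0, 1, 2, or 3 sialic-acid residues")
    return SERIES[inner_gal_sialic_acids]


def route_from_laccer(target: str) -> list:
    """List the sialylation steps leading from LacCer to the target precursor."""
    if target == "LacCer":
        return []
    route = []
    for step in SIALYLATION_STEPS:
        route.append(step)
        if step[1] == target:
            return route
    raise ValueError(f"{target!r} is not produced by the steps modeled here")


if __name__ == "__main__":
    for substrate, product, enzyme in route_from_laccer("GT3"):
        print(f"{substrate} -> {product}  [{enzyme}]")
    print("two sialic acids on inner Gal:", series_for(2), "series")
```

Because each precursor carries one more sialic acid on the inner galactose than the previous one, the length of the route to a precursor equals the key used in `SERIES` for the series it founds.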

#### **GANGLIOSIDES – BIOLOGICAL FUNCTION AND EXPLOITATION BY PATHOGENS**

Gangliosides are key molecules in cellular recognition and signaling. They are primarily present in the plasma membranes of vertebrates, but have recently also been found in nuclear membranes, recognized as functionally important constituents (18, 19). Knock-out studies in mice have been essential for revealing the functions of gangliosides, especially in embryonic development and differentiation. For example, Yamashita et al. observed that mouse embryos carrying a knock-out in the glucosylceramide synthase enzyme did not survive more than 7.5 days (20). Other examples are studies of mice with a knock-down of GM3 synthase and GM2/GD2 synthase, which exhibit increased insulin sensitivity and decreased ability to repair nervous tissues, respectively (21, 22).

Because of the tight packing of lipids in membranes, gangliosides associate with other types of lipids, forming membrane subcompartments such as lipid rafts, to which specific proteins can associate (8, 23, 24). The organization of gangliosides in membranes will be further discussed in the Section "Organization and Presentation of Gangliosides in Biological Membranes." Since gangliosides have the ability to interact with both sugars and proteins (see Sections "Gangliosides – Structure and Molecular Recognition", "Organization and Presentation of Gangliosides in Biological Membranes", and "Effect of Gangliosides on Membrane Proteins and Cellular Signaling"), a large range of events can be triggered or inhibited by these molecules. Cell growth, migration, differentiation, adhesion, and apoptosis are some examples (25, 26). The terminal sialic-acid residue(s) in particular are targets for many important intercellular interactions, but can also be exploited by pathogens that use these residues as a docking station to enter the cell (27).

Various pathogens, from viruses to bacteria and parasites, recognize sialic-acid residues on host cell membranes, and several of these pathogens are known to cause cancer. The most common recognition module is NeuAc; in addition, NeuGc and 9-*O*-acetylated sialic acids are also well-known receptors (28, 29). Examples of viral pathogens recognizing gangliosides are the influenza virus (30), simian virus 40 (SV40) (31), and polyomavirus (32, 33). Bacteria interact with gangliosides *via* toxins and adhesins, with the cholera toxin (34) and the sialic-acid binding adhesin from the Class 1 carcinogen *Helicobacter pylori*, SabA (35, 36), being prominent examples. Gangliosides may also suppress natural killer (NK) cell cytotoxicity, through interaction with Siglec-7 (sialic-acid binding immunoglobulin-like lectin 7), as elaborated further in the Section "Gangliosides and Cancer."

#### **GANGLIOSIDES – STRUCTURE AND MOLECULAR RECOGNITION**

The molecular recognition of carbohydrates, with their large number of hydroxyl groups, is dominated by hydrogen bonds, with the binding specificity determined by the recognition of the characteristic OH-scaffolds of different sugars (37, 38). Many of these interactions are water-mediated, and sometimes, metal ions are involved. In addition, hydrophobic interactions contribute significantly to carbohydrate recognition, which may involve methyl groups such as in the monosaccharide fucose or the stacking against exposed hydrophobic patches of the sugar rings. A particularly typical molecular recognition mechanism of carbohydrates involves the CH–π stacking of sugar rings against the side chains of aromatic amino acids (so-called "aromatic stacking interactions"), promoted by weak hydrogen bonds (39) (**Figure 3**).

Gangliosides are characterized by the presence of at least one sialic-acid residue, which in contrast to many other sugars is charged. This charge can be exploited by salt bridges with positively charged residues, but this is not necessarily the case (and in fact quite rare). The carboxylate group is often not even the most important recognition motif. For example, the fingerprint of the most common sialic acid, *N*-acetylneuraminic acid (NeuAc), which is derived from pyruvate and *N*-acetylmannosamine, generally involves the recognition of the *N*-acetyl group and the adjacent 4-OH-group, originating from mannose (which corresponds to 3-OH in hexoses) (41). Further H-bonding interactions are provided by the sialic-acid glycerol chain (also originating from mannose), which is recognized by a conserved binding motif common to a number of viral and bacterial sialic-acid binding proteins (42). In addition, conformer selection and clustering play important roles for the molecular recognition of gangliosides, as shown for example for the recognition of GM1 by the cholera toxin or galectin-1 (34, 43–45).

Carbohydrates in general are flexible molecules, but due to internal carbohydrate–carbohydrate interactions, the influence of the lipid anchor, or due to interactions with other molecules in the immediate neighborhood, rigid molecular epitopes may arise. As gangliosides are localized in the plasma membrane, the presentation of the carbohydrate epitopes in particular depends on the interaction with other lipids (8). However, the structural characterization of anchored gangliosides is difficult to achieve. State-of-the-art lipid simulations are described by Vattulainen and Róg (46), but these often fail to take the glycan head groups into account. Nevertheless, a few studies have been undertaken that do just that. One interesting example is the atomic-resolution conformational analysis of GM3 in a bilayer composed of dimyristoylphosphatidylcholine (DMPC) (47). Two known GM3-binding proteins [sialoadhesin, PDB ID: 1QFO (48), and wheat germ agglutinin, PDB ID: 2CWG (49)] were studied in order to evaluate the importance of carbohydrate accessibility and ganglioside recognition. Probing the presentation and dynamics of the glycan head group, DeMarco and Woods observed significantly altered accessibility of the less exposed carbohydrate residues Gal and Glc, even though the internal structural properties for membrane-bound versus soluble GM3 were unchanged. On the other hand, the terminal NeuAc-residue remained almost fully exposed. The difference in accessibility is likely of considerable importance for the initial recognition of GM3 by a receptor protein, although subsequent recognition events may include the glycan residues embedded deeper in the membrane. The less exposed residues may also indirectly affect recognition, by ceramide–Glc and Glc–Gal rotations, altering NeuAc presentation. Furthermore, the hydrophobic ceramide together with the polar Glc residue may regulate the insertion depth.

#### **ORGANIZATION AND PRESENTATION OF GANGLIOSIDES IN BIOLOGICAL MEMBRANES**

Cellular membranes serve both as segregation barriers and as facilitators of cellular communication. Positioned in the cell membrane, lipids interact laterally with other membrane components (lipids or membrane proteins), and also serve as cellular receptors, through their exposed head groups. In the past decade many studies have focused on the lateral characterization of membranes and it is now well-established that highly unsaturated components, like glycerophospholipids, provide the membrane with flexibility, while saturated components, such as GSLs, create order in biological membranes (10). Furthermore, the shape and length of the lipids determine the shape, size, and stability of cellular membranes (50). The ceramide part of gangliosides is characterized by a rigid and planar structure, composed of saturated acyl chains, which can be more tightly packed. Together with other membrane sphingolipids and cholesterol, they can segregate and form dynamic nanoscale "clusters", also called lipid rafts (8, 24, 51), to which specific proteins associate, hitching a ride.

Apparently, the density of GSLs can also influence their structure, affecting antigen specificity. For example, an antibody established by immunizing mice with syngeneic B16 melanoma, named M2590, reacted only with melanoma and not with healthy tissues (52). Remarkably, the target epitope was later identified as GM3, an abundant ganglioside in membranes of normal cells (53). Further studies showed that a ganglioside density above a threshold value was required for reactivity, suggesting that this antibody recognized more densely packed GM3 (54). These results indicate that ganglioside antigens can be differently organized in tumor cells compared to normal cells and that some ganglioside antigens are fully antigenic when organized in clusters, but fail to bind antibodies when their density is under a threshold value (54, 55).

How can this be explained? This brings us back to the structural characterization of GSLs in biological membranes. One example has already been described [GM3 in DMPC bilayer; (47)]. Two other interesting studies evaluate the effect of cholesterol on GSL structure (56, 57), building on earlier work by Pascher and coworkers (58). Notably, cholesterol was found to introduce a tilt in the glycolipid head group from a conformation almost perpendicular to the membrane surface to an alignment parallel to the membrane (**Figure 4**). The culprit appears to be an H-bonding network involving the cholesterol OH-group, the sphingosine amide, and the oxygen of the glycosidic bond (56). Similar lipid-raft-specific conformational changes of GSLs may be critical for the entry of bacterial toxins or viruses into host cells (8, 59).
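The head-group tilt described above can be expressed as the angle between a head-group axis and the membrane normal: near 0° when the head group stands perpendicular to the membrane surface, near 90° when it lies parallel to it. The following Python sketch shows that calculation. The function name `tilt_from_normal`, the choice of the z axis as membrane normal, and the example vectors are illustrative assumptions, not values taken from the cited studies.

```python
import math


def tilt_from_normal(axis, normal=(0.0, 0.0, 1.0)) -> float:
    """Angle in degrees between a head-group axis and the membrane normal.

    0 degrees  : head group perpendicular to the membrane surface
    90 degrees : head group parallel to the membrane surface
    """
    dot = sum(a * n for a, n in zip(axis, normal))
    len_axis = math.sqrt(sum(a * a for a in axis))
    len_normal = math.sqrt(sum(n * n for n in normal))
    if len_axis == 0.0 or len_normal == 0.0:
        raise ValueError("vectors must be non-zero")
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (len_axis * len_normal)))
    return math.degrees(math.acos(cosine))


if __name__ == "__main__":
    # Hypothetical head-group axes, for illustration only.
    upright = (0.1, 0.0, 1.0)   # nearly along the membrane normal
    tilted = (1.0, 0.0, 0.1)    # nearly in the membrane plane
    print(f"upright: {tilt_from_normal(upright):.1f} degrees")
    print(f"tilted:  {tilt_from_normal(tilted):.1f} degrees")
```

In a simulation trajectory, the axis would typically be defined from atom coordinates (for example, from the glycosidic oxygen to a terminal sugar atom); that definition is left open here because the cited studies are not reproduced in this text.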

Glycosphingolipids are not always fully accessible, however. Their short head groups may be hidden in the "jungle" of membrane proteins or even masked by sialic-acid binding proteins positioned near the GSLs in the membranes (i.e., in *cis*). Such a scenario is postulated, e.g., for Siglecs, a family of lectins that modulate innate and adaptive immune functions. *Trans* interactions may still occur, e.g., for higher-affinity ligands that can out-compete the *cis* ligands; in general, however, accessibility will be reduced.

#### **EFFECT OF GANGLIOSIDES ON MEMBRANE PROTEINS AND CELLULAR SIGNALING**

It has been suggested that the activation of membrane proteins can also be influenced by lipid cluster association. In addition to lateral interaction with the lipid tails in the cell membrane, such interactions may exploit the unique properties of sphingolipids, bearing a carbonyl oxygen, a hydroxyl group, and an amide nitrogen, thus being able to act as both H-bond donors and acceptors (60). As described in the previous section, gangliosides and other GSLs may further cause conformational changes of the glycan head group, which may either interact directly with amino acids of the extracellular part of the protein or alternatively interact with the sugar residues of a glycosylated protein, affecting protein activity.

Most growth factor receptors are known to be regulated by gangliosides (9). Here, we will discuss two examples of membrane proteins important for cancer research and immunotherapy: the epidermal growth factor receptor (EGFR) and the vascular endothelial growth factor receptor (VEGFR) (**Table 1**). A number of cancers are characterized by hyper-activated EGFRs, either caused by mutations or over-expression (61–63). Another important factor for tumor progression is the growth of new blood vessels. Tumor cells produce and release the growth factor VEGF, stimulating the VEGFR, and ultimately resulting in proliferation and migration of vascular endothelial cells (64).

The EGFR is known to undergo ligand-dependent dimerization, resulting in autophosphorylation of tyrosine residues at the C-terminal tail of the protein (78). This initiates downstream signaling, leading to adhesion, cell migration, and proliferation (79). More recently, the EGFR has also been shown to undergo ligand-independent dimerization, a phenomenon that is poorly understood (80). Such ligand-free dimers can also be functionally active, but this is not always the case.

**Table 1 | Gangliosides affecting the growth factor receptors EGFR and VEGFR**.

Several membrane ligands have been shown to affect signaling by the EGFR and the VEGFR. The GM3 ganglioside, a well-known regulator of the insulin receptor (81), has an inhibitory effect on both the EGFR and the VEGFR, while the ganglioside GD1a strongly induces VEGFR-2 activation (26, 66, 70, 75, 82, 83). Moreover, the proangiogenic effects of GD1a can be efficiently reduced by GM3 (75). GM3 has been suggested to inhibit VEGFR-2 activation by blocking both growth factor binding and receptor dimerization through direct interaction with the extracellular domain of the VEGFR (74). The molecular interaction between the EGFR and GM3 is not fully elucidated, although it has been studied extensively. It has been shown that the inhibition of EGFR activation by GM3 involves the binding of the ganglioside to the GlcNAc-terminated *N*-glycans on the EGFR, suggesting carbohydrate–carbohydrate interactions (65, 67, 84, 85). In addition, increasing evidence points to the integral importance of ganglioside organization in the membrane for signal transduction (affecting the localization and activation of growth factor receptors). For example, recent computer simulations of the EGFR embedded in the membrane suggest that membrane lipids, especially anionic species, interact extensively with the EGFR (86). These interactions are more pronounced for the inactive EGFR, owing to electrostatic interactions with the EGFR's intracellular domain, which may explain the inhibitory effect of GM3 on EGFR activation.

Cellular biological membranes are complex, and their dynamics are difficult to study. Even small modifications like the fluorescent labeling of lipids may critically affect bulk membrane properties as well as ligand–receptor interactions in biological environments (87). To generate a more controllable system, Coskun et al. reconstituted EGFR into proteoliposomes with defined lipid composition, with either uniform liquid-disordered (ld) membrane phases or a combination of disordered and ordered (ld/lo) domains. Adding gangliosides to this system, they found that GM3 had a strong inhibitory effect on EGFR activation, without interfering with ligand binding, but in ld/lo proteoliposomes only (66). It would be of significant clinical interest to investigate how targeting GM3 by immunotherapy affects EGFR and VEGFR signaling, and whether the presence of both targets (GM3 clusters and EGFR/VEGFR) affects antibody efficiency and affinity.

#### **GANGLIOSIDES AND CANCER**

Gangliosides play important roles in many normal physiological processes, such as cell growth, differentiation, and embryogenesis (20), but also in pathological events like cellular malignancy and metastasis (88) (see **Table 2** for examples of gangliosides expressed in human cancer cells). Tumor formation results from autonomous uncontrolled proliferation of neoplastic cells, while metastasis occurs when tumor cells are released from the primary tumor and continue to proliferate at a distant site. Multiple factors affect these processes, in which gangliosides may serve both as inhibitory and stimulating molecules. For example, it has been shown that highly metastatic melanoma cells have high expression levels of GD3. This is in contrast to poorly metastatic cells or the normal counterpart, melanocytes, which express very low levels of GD3 (89–91), suggesting a role of GD3 in transforming melanocytes into melanomas and in promoting metastasis. Gangliosides may suppress NK cell cytotoxicity through interaction with Siglec-7, which preferentially binds to gangliosides of the b-series, as found for cells engineered to overexpress GD3 (92). The high expression levels of the GD3 ganglioside in melanoma may hence reflect the suppressed efficiency of NK cell cytotoxicity against these tumor cells. The function of gangliosides as suppressors of the anti-tumor immune response is well-documented in many studies, with tumor-associated gangliosides reported to down-regulate the activity of T and B cells, NK cytotoxicity and active dendritic cells, among others (93–95). For instance, T-cell dysfunction is promoted by the GM2 ganglioside; an antibody targeting GM2, however, was able to block 50–60% of T-cell apoptosis (94).

**Table 2 | Gangliosides expressed in human cancer cells**.

Gangliosides are also shed from the tumor to the microenvironment in greater quantities than from normal cells. Shed gangliosides can interact with proteins or be incorporated into the membrane of other cells, leading to signaling events or interactions with healthy cells (112–114). For example, the addition of exogenous GD3 to the culture medium of glioma cells was found to stimulate the release of VEGF (115). Taken together, these observations suggest a multitude of mechanisms by which tumor-associated gangliosides may contribute to malignancy and cancer progression.

Many of the tumor-associated gangliosides are also found in normal healthy tissues, but are over-expressed in tumors, while other antigens are only found in cancer cells. An interesting example is the sialic acid NeuGc, which is found in several tumor types, such as melanoma and breast cancer (116). Among all variants of sialic acids, NeuAc and NeuGc are the most abundant; however, humans are a notable exception. Due to a 92-bp deletion in the gene coding for CMP-NeuAc hydroxylase (*cmah*), humans lack a functional enzyme required for generation of NeuGc (117, 118). Nevertheless, NeuGc is present in fetal tissues and malignant cells (99, 119, 120). For this reason, NeuGc was assumed to classify as an "onco-fetal" antigen, being expressed in the fetus, suppressed during adult life and re-expressed in malignant cells. However, since humans lack the putative active site of the enzyme, other explanations must lie at the heart of this change in carbohydrate profile. Diet incorporation, hypoxic conditions, and endogenous metabolic mechanisms are currently being discussed as possible origins of the increased levels of NeuGc (116, 121–124). Getting to grips with the high NeuGc-ganglioside levels is important, since this property appears to correlate with a poor prognosis. Specifically, recent studies indicate that non-small-cell lung cancer (NSCLC) patients with high NeuGc-ganglioside expression exhibit a low overall survival rate and a significantly lower progression-free survival rate (125). These findings are consistent with recent experiments demonstrating that the silencing of the *cmah* gene in NeuGc GM3-expressing L1210 mouse lymphocytic leukemia B cells caused a shift to NeuAc GM3 expression and a concomitant reduction of tumorigenicity (126).

αNeuAc = 5-acetyl-α-neuraminic acid, αNeuGc = 5-glycolyl-α-neuraminic acid, βDGal = β-D-galactopyranose, βDGalNAc = N-acetyl-β-D-galactosamine, βDGlc = β-D-glucopyranose, Cer = ceramide, NSCLC = non-small-cell lung carcinoma, SCLC = small-cell lung carcinoma.

Interestingly, it has been shown that serum from healthy humans contains antibodies recognizing glycoconjugates exhibiting NeuGc (127, 128). These antibodies are called Hanganutziu–Deicher (HD) antibodies, and were first described by Hanganutziu (129) and Deicher (130) [as cited in Ref. (131)] independently in the 1920s. HD antibodies attract complement molecules to malignant cells (132, 133). Their level decreases with age, which may correlate with an increased cancer risk at higher age (133). A characteristic of natural antibodies is that they recognize highly conserved antigens (134). Importantly, auto-antibodies against tumor-associated antigens can arise and be detected early, before symptoms occur, and hence have potential for early diagnosis (135–137). In line with this hypothesis, a recent study reported that healthy donors exhibited low levels of anti-NeuGc GM3 antibodies (decreasing with age), while these antibodies were absent in NSCLC patients (138).

#### **GANGLIOSIDE-BASED THERAPY**

Cancer immunotherapy is a highly promising approach to cancer treatment, which has been gaining ground only recently (139). In contrast to traditional therapies like chemo- or radiation-therapy, immunotherapies constitute a much more targeted approach that promises higher specificity while eliciting fewer side effects. As the name states, this type of therapy uses the immune system to treat cancer. There are two main approaches (139, 140): (i) tumor-associated antigens or derivatives or mimics of these may be used as active therapeutic vaccines, priming the body to launch an immune attack against these molecules and hence the tumor cells (overcoming the body's tolerance of self-antigens); (ii) alternatively, antibodies may be used for passive immunotherapy, either coupled to toxins or radioactivity or on their own, relying on processes like antibody-dependent cell-mediated cytotoxicity (ADCC) or complement-dependent cytotoxicity (CDC). In both cases, effective immunotherapy relies on the choice of the antigen. Notably, in a recent project for prioritization of cancer antigens, 4 of the 75 selected antigens were gangliosides (GD2, GD3, fucosyl-GM1, and *N*-acetyl GM3), and additional targets, like the EGFR and the VEGFR, are known to interact with gangliosides (141).

Several antibodies targeting tumor-associated gangliosides, as well as molecular vaccines, are currently under investigation in pre-clinical or clinical studies. One example, the antibody 3F8, targets GD2, which is highly expressed in aggressive cancers, such as pediatric neuroblastoma (142). Other examples are 14F7 and chP3, both of which specifically recognize NeuGc GM3, discriminating it from the highly similar NeuAc GM3. So far, no crystal structures of these complexes have been reported; however, computer docking studies, *in silico* site mapping and phage display studies are helping to reveal the recognition mechanisms of these promising tools (143–146). In addition, two NeuGc-ganglioside-based vaccines are currently being tested in clinical trials (phase III); these are Racotumomab, an anti-idiotypic antibody<sup>1</sup> registered and launched in Cuba and Argentina under the trade name Vaxira (147), and NeuGc GM3/VSSP, a NeuGc GM3 ganglioside conjugated into very small proteoliposomes. In the ongoing clinical trials, the NeuGc GM3/VSSP and Racotumomab vaccines show efficacy and are well-tolerated by patients with advanced cutaneous melanoma (148) and NSCLC (149), respectively. This represents a significant step forward from the first, unsuccessful, attempt at developing a ganglioside-based vaccine: the GMK (GM2-based) vaccine for melanoma (150, 151). These molecules are part of a growing arsenal of targeted molecular weapons against cancer, which may be used as stand-alone therapy, but will more likely be employed as adjuvant therapy, in combination with or following standard treatment such as surgery, radiation, or chemotherapy. For example, based on the important roles of NeuGc GM3 and the EGFR for tumor cell immune evasion and proliferation, a combination therapy targeting both molecules may provide a rationale for fighting tumor cells. This combination is currently being tested using Racotumomab and a vaccine targeting EGF in NSCLC patients, showing, so far, promising clinical results (152).

#### **CONCLUSION**

Today, we are still far from fully understanding the roles, structures, and mechanisms of gangliosides in biological systems, and only at the beginning of the exploitation of these molecules in potential therapies. However, the importance of these molecules is evident, and technology development is picking up pace (7, 46, 153, 154). We are looking forward to a bright future, in which gangliosides are fully appreciated, and unfold their full potential in targeted therapies.

#### **ACKNOWLEDGMENT**

We would like to thank Steffi Munack for improving the quality of **Figure 1**.

#### **REFERENCES**


<sup>1</sup>*Explanation*: Antibodies (Ab2) raised against the primary antibody (Ab1, generated in the original immune response against cancer antigens) are also called "anti-idiotypic" antibodies. They may have potential as therapeutic vaccines if they give rise to a third class of antibodies (Ab3) that resembles the primary antibody (Ab1).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 May 2014; accepted: 27 June 2014; published online: 21 July 2014. Citation: Krengel U and Bousquet PA (2014) Molecular recognition of gangliosides and their potential for cancer immunotherapies. Front. Immunol. 5:325. doi: 10.3389/fimmu.2014.00325*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Krengel and Bousquet. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Structure based refinement of a humanized monoclonal antibody that targets tumor antigen disialoganglioside GD2

#### **Mahiuddin Ahmed, Jian Hu and Nai-Kong V. Cheung\***

Department of Pediatrics, Memorial Sloan Kettering Cancer Center, New York, NY, USA

#### **Edited by:**

Mark Agostino, Curtin University, Australia

#### **Reviewed by:**

Paul A. Ramsland, Burnet Institute, Australia

An-Suei Yang, Academia Sinica, Taiwan

#### **\*Correspondence:**

Nai-Kong V. Cheung, Department of Pediatrics, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA

e-mail: cheungn@mskcc.org

Disialoganglioside GD2 is an important target on several pediatric and adult cancer types including neuroblastoma, retinoblastoma, melanoma, small-cell lung cancer, brain tumors, sarcomas, and cancer stem cells. We have utilized structural and computational methods to refine the framework of humanized monoclonal antibody 3F8, the highest affinity anti-GD2 antibody in clinical development. Two constructs (V3 and V5) were designed to enhance stability and minimize potential immunogenicity. Construct V3 contained 12 point mutations and had higher thermal stability than the parental hu3F8, with comparable affinity and *in vitro* tumor cell killing. Construct V5 had nine point mutations to minimize potential immunogenicity, but these resulted in weaker thermal stability, weaker antigen binding, and reduced tumor killing potency. When construct V3 was combined with the single point mutation HC:G54I, the resulting V3-Ile construct had enhanced stability, antigen binding, and a nearly sixfold increase in tumor cell killing. The resulting product is a lead candidate for clinical development for the treatment of GD2-positive tumors.

**Keywords: antibody engineering, ganglioside, neuroblastoma, melanoma, structure, computational chemistry**

#### **INTRODUCTION**

GD2 is a ganglioside expressed in several pediatric and adult cancer types and has been actively targeted by cancer immunotherapy approaches [see Ref. (1) for a recent review]. GD2 is a member of the b-series gangliosides, which are normally expressed during fetal development and are highly restricted to the central nervous system in healthy adults, with low levels of expression on peripheral nerves and skin melanocytes (2). GD2 has been found to be expressed in neuroectoderm-derived tumors and sarcomas, including neuroblastoma, retinoblastoma, melanoma, small-cell lung cancer, and brain tumors (3–5). Recent evidence has also shown that GD2 can be found on breast cancer stem cells (6, 7), as well as on neuroectodermal (8) and mesenchymal stem cells (9, 10).

Because of its surface expression on tumor cells and restricted normal expression in the brain and low levels in the periphery, GD2 has been an ideal target for the development of monoclonal antibodies (MoAbs), which cannot cross the blood–brain barrier. Several anti-GD2 antibodies have been developed and tested in the clinic over the past 20 years, primarily in pediatric neuroblastoma patients. 3F8 was the first anti-GD2 MoAb to be tested in patients with neuroblastoma (3, 11, 12). MoAb 3F8 is a murine IgG3 with the highest reported affinity for GD2 (*K*<sub>D</sub> = 5 nM) (13). It binds specifically to the pentasaccharide epitope on GD2. Phase II clinical data have demonstrated that 3F8, when combined with the cytokine GM-CSF, can significantly improve the survival of children with high-risk stage 4 metastatic neuroblastoma (14). Murine 3F8 was more recently humanized (hu3F8) based on complementarity determining region (CDR) grafting (13), and is currently in Phase I clinical trials (clinicaltrials.gov NCT01419834, NCT01757626, and NCT01662804).

We have previously solved the crystal structure of murine 3F8 to 1.65 Å resolution (protein data bank 3VFG) and used completely *in silico* methods to find a single point mutation (HC:G54I) that could significantly enhance the antibody-dependent cell-mediated cytotoxicity (ADCC) of hu3F8 (15). Based on computational modeling, we have developed two additional hu3F8 frameworks, named V3 and V5, which were designed to optimize the properties of hu3F8. More specifically, V3 was designed to maximize stability and V5 was designed to minimize potential immunogenicity. We present here the computational methods used to derive the hu3F8 V3 and V5 frameworks along with their experimental properties of antigen binding, thermal stability, and *in vitro* ADCC.

#### **MATERIALS AND METHODS**

#### **MOLECULAR MODELING**

Molecular modeling, energy calculations, and image renderings were done using Discovery Studio 4.0 (Accelrys, San Diego, CA, USA). The crystal structure of m3F8 Fab (pdb 3VFG) and the homology model of hu3F8 Fab were simulated using CHARMm (CHemistry at Harvard Molecular mechanics) force fields, and the effects of point mutations were calculated from the difference between the folding free energies of the mutated structure and the parental protein. A generalized Born approximation was used to account for the effect of the solvent, and all electrostatic terms were calculated as a sum of coulombic interactions and polar contributions to the solvation energy. A weighted sum of the van der Waals, electrostatic, entropy, and non-polar terms was calculated for each point mutation.
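The mutation energy described above can be sketched as the difference between two weighted sums of energy terms. This is a minimal illustration only, not the Discovery Studio implementation: the class and function names, the weights, and the component values are all hypothetical, since the source does not report them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnergyTerms:
    """Energy components of one structure, in kcal/mol."""
    van_der_waals: float
    electrostatic: float  # coulombic plus polar solvation contribution
    entropy: float
    non_polar: float


# Hypothetical weights for illustration only; the source does not report them.
WEIGHTS = EnergyTerms(van_der_waals=0.5, electrostatic=0.1, entropy=0.8, non_polar=0.5)


def weighted_energy(terms: EnergyTerms, weights: EnergyTerms = WEIGHTS) -> float:
    """Weighted sum of the four energy terms."""
    return (
        weights.van_der_waals * terms.van_der_waals
        + weights.electrostatic * terms.electrostatic
        + weights.entropy * terms.entropy
        + weights.non_polar * terms.non_polar
    )


def mutation_energy(mutant: EnergyTerms, parent: EnergyTerms,
                    weights: EnergyTerms = WEIGHTS) -> float:
    """Mutant minus parent weighted energy; negative values are stabilizing."""
    return weighted_energy(mutant, weights) - weighted_energy(parent, weights)


# Made-up component values, not data from the study.
parent = EnergyTerms(-10.0, -20.0, 5.0, -4.0)
mutant = EnergyTerms(-12.0, -20.0, 5.0, -4.0)
ddg = mutation_energy(mutant, parent)
print(f"mutation energy: {ddg:+.2f} kcal/mol")
```

The sign convention follows the text: a negative mutation energy is read as stabilizing and a positive one as destabilizing.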

#### **CONSTRUCTION AND EXPRESSION OF hu3F8 CONSTRUCTS**

Humanized 3F8 genes were synthesized for CHO cells (Blue Heron Biotechnology or Genscript) as previously described (13). Using the bluescript vector, these heavy and light chain genes of hu3F8 were transfected into DG44 cells and selected with G418 (Invitrogen, CA, USA). Hu3F8 producer lines were cultured in OptiCHO serum-free medium (Invitrogen) and the mature supernatant was harvested as previously described (13). The protein A affinity column was pre-equilibrated with 25 mM sodium citrate buffer with 0.15 M NaCl, pH 8.2. Bound hu3F8 was eluted with 0.1 M citric acid/sodium citrate buffer, pH 3.9 and alkalinized (1:10 v/v ratio) in 25 mM sodium citrate, pH 8.5. It was passed through a Sartobind-Q membrane and concentrated to 5–10 mg/mL in 25 mM sodium citrate, 0.15 M NaCl, pH 8.2.

#### **THERMAL STABILITY MEASUREMENTS**

The thermal stabilities of MoAbs were measured by differential scanning fluorimetry using the Protein Thermal Shift assay (Life Technologies). MoAbs (0.2 mg/mL) were mixed with Sypro Orange dye and fluorescence was monitored using a StepOne-Plus quantitative PCR machine (Applied Biosystems) with a 1% thermal gradient from 25 to 99°C. Data were analyzed using Protein Thermal Shift Software (Applied Biosystems) to calculate the Tm using the derivative method. Fab and F(ab')<sub>2</sub> preparations of hu3F8 were used to correctly assign the Fab peak for the hu3F8 samples. All samples were prepared in triplicate. Statistical significance was calculated using Student's *t*-test.
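The idea behind a derivative-based Tm estimate can be sketched as follows: the melting temperature is taken as the temperature at which the fluorescence rises most steeply. This is a simplified illustration on synthetic data, not the algorithm of the Protein Thermal Shift Software; the function names, the idealized sigmoid curve, and its parameters are assumptions.

```python
import math


def melt_curve(temps, tm, slope=1.5):
    """Idealized sigmoidal unfolding curve (synthetic data, not measurements)."""
    return [1.0 / (1.0 + math.exp((tm - t) / slope)) for t in temps]


def tm_by_derivative(temps, fluorescence):
    """Return the temperature where dF/dT, estimated by central differences, is largest."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if d > best_slope:
            best_temp, best_slope = temps[i], d
    return best_temp


# Temperature ramp from 25.0 to 99.0 °C in 0.5 °C steps, as in the scan range above.
temps = [25.0 + 0.5 * i for i in range(149)]
signal = melt_curve(temps, tm=70.0)  # hypothetical Tm used to generate the curve
estimated_tm = tm_by_derivative(temps, signal)
print(estimated_tm)
```

Real melt curves are noisy and may contain several transitions (for example, separate Fab and Fc peaks), so smoothing and peak assignment would be needed in practice.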

#### **BINDING KINETICS BY SURFACE PLASMON RESONANCE**

*In vitro* binding kinetics were measured using Biacore T-100 (GE Healthcare) as previously described (13). In brief, gangliosides were directly immobilized onto the CM5 sensor chip via hydrophobic interaction. Purified anti-GD2 MoAbs were diluted in HBS-E buffer containing 250 mM NaCl at increasing concentrations (50–1600 nM) prior to analysis. Samples (60 µL) were injected over the sensor surface at a flow rate of 30 µL/min over 2 min. Following completion of the association phase, dissociation was monitored in HBS-E buffer containing 250 mM NaCl for 300 s at the same flow rate. At the end of each cycle, the surface was regenerated using 50 µL 20 mM NaOH at a flow rate of 50 µL/min over 1 min and 100 µL 4 M MgCl<sub>2</sub> at a flow rate of 50 µL/min over 2 min. The data were analyzed by the bivalent analyte model and default parameter setting for the rate constants using the Biacore T-100 evaluation software, and the apparent association rate constant (*k*<sub>on</sub>), dissociation rate constant (*k*<sub>off</sub>), and equilibrium dissociation constant (*K*<sub>D</sub> = *k*<sub>off</sub>/*k*<sub>on</sub>) were calculated.
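As a small worked example of the relationships in this protocol, the sketch below computes an apparent equilibrium dissociation constant from the two rate constants and checks that the stated injection volumes match flow rate times duration. The function names and the rate constants are hypothetical; only the volumes, flow rates, and durations come from the text.

```python
def dissociation_constant_nm(k_on_per_m_s: float, k_off_per_s: float) -> float:
    """Apparent K_D in nM from k_on (1/(M*s)) and k_off (1/s), using K_D = k_off / k_on."""
    if k_on_per_m_s <= 0:
        raise ValueError("k_on must be positive")
    return (k_off_per_s / k_on_per_m_s) * 1e9  # convert M to nM


def injected_volume_ul(flow_rate_ul_per_min: float, minutes: float) -> float:
    """Volume delivered at a constant flow rate."""
    return flow_rate_ul_per_min * minutes


# Hypothetical rate constants chosen for illustration, not values from the study.
kd_nm = dissociation_constant_nm(k_on_per_m_s=1.0e5, k_off_per_s=5.0e-4)
print(f"K_D = {kd_nm:.1f} nM")

# Volumes stated in the protocol: sample, NaOH and MgCl2 injections.
print(injected_volume_ul(30, 2), injected_volume_ul(50, 1), injected_volume_ul(50, 2))
```

Because the sensorgrams were fitted with a bivalent analyte model, the resulting constants are apparent values rather than intrinsic 1:1 constants.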

#### **ANTIBODY-DEPENDENT CELL-MEDIATED CYTOTOXICITY BY <sup>51</sup>CHROMIUM RELEASE**

Human neuroblastoma cell line LAN-1 was provided by Dr. Robert Seeger (Children's Hospital of Los Angeles). LAN-1 cells were grown in F10 RPMI 1640 medium supplemented with 10% fetal bovine serum (Hyclone, South Logan, UT, USA), 2 mM glutamine, 100 U/mL penicillin, and 100 µg/mL streptomycin at 37°C in a 5% CO<sub>2</sub> incubator. ADCC assays were performed using NK-92MI cells stably transfected with the human CD16 Fc receptor as previously described (13). LAN-1 target cells were detached with 2 mM EDTA in Ca<sup>2+</sup>/Mg<sup>2+</sup>-free PBS and washed in F10, before radiolabeling with <sup>51</sup>Cr for ADCC assays. All samples were prepared in triplicate. Dose–response curves were fitted by non-linear regression to a sigmoidal dose–response (variable slope) model, using GraphPad Prism software, to allow for determination of EC<sub>50</sub>. For comparison of curves, best-fit values for EC<sub>50</sub> were analyzed for significance using F tests.
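A variable-slope sigmoidal dose–response curve is commonly written as a four-parameter logistic function. The sketch below evaluates such a function in its linear-dose form to show what EC50 means in the model; it performs no fitting and is not GraphPad Prism's code. The parameter names and all numerical values are illustrative assumptions.

```python
def sigmoidal_response(dose: float, bottom: float, top: float,
                       ec50: float, hill_slope: float) -> float:
    """Four-parameter logistic curve: response rises from bottom to top with dose."""
    if dose <= 0:
        raise ValueError("dose must be positive")
    return bottom + (top - bottom) / (1.0 + (ec50 / dose) ** hill_slope)


# Hypothetical parameters: 0-100% specific lysis, EC50 of 2.0 (arbitrary dose units).
params = dict(bottom=0.0, top=100.0, ec50=2.0, hill_slope=1.3)

half_max = sigmoidal_response(2.0, **params)      # response at dose == EC50
low = sigmoidal_response(0.02, **params)          # far below EC50
high = sigmoidal_response(200.0, **params)        # far above EC50
print(half_max, round(low, 2), round(high, 2))
```

By construction, the response at a dose equal to EC50 lies halfway between the bottom and top plateaus, which is why a lower EC50 indicates a more potent antibody.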

### **RESULTS**

#### **DESIGN OF CONSTRUCTS V3 AND V5**

Constructs V3 and V5 (see **Figure 1**) were designed utilizing completely *in silico* methods, based on both the crystal structure of murine 3F8 Fab (pdb 3VFG) and a homology model of hu3F8 Fab that was built using MODELLER followed by CHARMm energy minimizations. The original hu3F8 that was built by CDR grafting methods utilized the human germline sequences IGHV3-33 for the heavy chain template and IGKV3-15 for the light chain template (www.imgt.org). These same templates were utilized in deciding which mutations to incorporate into V3 and V5, in order to minimize potentially immunogenic sequences.

**Table 1** shows the 12 mutations that were incorporated into hu3F8 resulting in construct V3, along with their predicted mutational energies. *In silico* mutagenesis was done on every potential humanizing mutation in the murine 3F8 structure that was not directly predicted to be involved in antigen recognition, based on our previous 3F8:GD2 docked model (15). In addition, potential back mutations and humanizing mutations were analyzed in the homology model of hu3F8. **Table 1** shows both of these sets of calculations. In choosing which mutations to incorporate into construct V3, more emphasis was placed on the first set of calculations for stabilizing the murine 3F8 structure, since this was the experimentally verified high-resolution crystal structure, in comparison to the homology model of hu3F8, which can contain inherent error. Another consideration in placing emphasis on the native structure of murine 3F8 was the fact that murine 3F8 had consistently shown higher antigen-binding affinity than hu3F8 (13).

The analysis showed that five mutations that were made in the original hu3F8 were destabilizing, and so for construct V3, those mutations were reverted to the murine sequence (LC:E1S, LC:T10F, LC:S12L, HC:V11L, and HC:S21T). Two additional back mutations were made (LC:P40A and LC:Q100G) because they involved Gly or Pro residues that can affect protein backbone conformation. To offset the potential immunogenicity of these seven back mutations in the V3 construct, five humanizing mutations were added (LC:K24R, LC:S56T, LC:V58I, HC:I20L, and HC:M89V), which either enhanced stability or had a negligible effect (<0.5 kcal/mol). Two of these mutations involved mutating CDR residues (LC:K24R and LC:S56T). The net result of all 12 mutations was predicted to be a stabilizing mutational energy of −3.55 kcal/mol for the murine 3F8 structure. However, this same set of mutations in the model of hu3F8 was predicted to have a destabilizing mutational energy of +4.61 kcal/mol.

**Table 2** shows the nine point mutations that were incorporated into hu3F8 to make construct V5, in an effort to minimize potential immunogenicity. In addition to the five humanizing mutations from construct V3 (LC:K24R, LC:S56T, LC:V58I, HC:I20L, and HC:M89V), construct V5 also includes four additional humanizing mutations (HC:A62S, HC:F63V, HC:M64K, and HC:S65G), which are located on CDR H2. These four CDR residues were predicted to be a part of a potentially moderate affinity MHC class II T-cell epitope, which can result in enhanced immunogenicity [as identified using the NN-align method on the Immune Epitope Database (http://www.iedb.org/)]. The net mutational energy of all nine mutations in construct V5 was predicted to be a moderately destabilizing +1.62 kcal/mol for the murine 3F8 structure, and +0.90 kcal/mol for the hu3F8 model.

Potential immunogenicity of constructs V3 and V5 as compared to hu3F8 was analyzed using the T20 score analyzer (16), a new *in silico* tool that can predict the "humanness" content of antibody variable regions derived from a database of ~38,700 human antibody variable sequences. **Table 3** shows the T20 scores for hu3F8, V3, and V5. As expected, the net two additional murine mutations in V3 compared to hu3F8 resulted in slightly lower T20 scores, and the net nine humanizing mutations in V5 resulted in higher T20 scores, a characteristic of low immunogenicity MoAbs.

#### **Table 2 | Mutation energies associated with the design of construct V5**.






Scale is on the order of 0–100, with 100 being the most human in sequence.

**Table 4 | Thermal stability of hu3F8 constructs**.


Samples were prepared in triplicate and measured by differential scanning fluorimetry. Values are shown as mean ± standard deviation.

#### **THERMAL STABILITY**

The thermal stability of hu3F8, V3, and V5 was measured using differential scanning fluorimetry (see **Table 4**). The Fab domain of construct V3, which was designed to be more stable, had a nearly 2°C increase in Tm compared to hu3F8 (*p* = 0.006). Construct V5, on the other hand, had substantially lower thermal stability (9°C lower Tm than hu3F8). Based on the enhanced stability, V3 was chosen as a lead candidate, and the HC:G54I mutation, which we had previously shown to enhance tumor cell killing, was incorporated to make construct V3-Ile. The measured thermal stability of V3-Ile was nearly identical to V3.

#### **ANTIGEN BINDING KINETICS**

The GD2 binding kinetics of hu3F8, V3, V3-Ile, and V5 were measured by surface plasmon resonance (see **Figure 2** for normalized composite sensorgram, Figure S1 in Supplementary Material for complete sensorgrams, and **Table 5** for analysis). Construct V3 (*K*<sub>D</sub> = 11.5 nM) had similar binding properties to hu3F8 (*K*<sub>D</sub> = 9.1 nM). Construct V5, on the other hand, had an approximately twofold loss in binding (*K*<sub>D</sub> = 19.1 nM), which may have resulted from the additional CDR mutations and/or the weakened thermal stability. Construct V3-Ile had the highest GD2 affinity (*K*<sub>D</sub> = 3.7 nM). Interestingly, this enhancement in affinity is higher than what we had previously observed with the HC:G54I mutation in the parental hu3F8 framework.

#### **Table 5 | Analysis of binding kinetics measured by surface plasmon resonance**.


*k*<sub>on</sub> and *k*<sub>off</sub> were determined from individual sensorgrams, shown in Figure S1 in Supplementary Material.

#### **IN VITRO ANTIBODY-DEPENDENT CELL-MEDIATED CYTOTOXICITY**

Antibody-dependent cell-mediated cytotoxicity assays were done to test the effectiveness of hu3F8, V3, V3-Ile, and V5 on human neuroblastoma LAN-1 cells (see **Figure 3** and **Table 6**). Cytotoxicity of an isotype matched non-targeting control is shown in Figure S2 in Supplementary Material. Consistent with the antigen binding data, V3 had similar killing to hu3F8 (EC<sub>50</sub> of 3.83 ± 0.51 × 10<sup>−3</sup> µg/mL for V3 compared to EC<sub>50</sub> of 2.61 ± 0.48 × 10<sup>−3</sup> µg/mL for hu3F8, *p* = 0.1138). Construct V5 had significantly weaker killing (EC<sub>50</sub> of 6.55 ± 1.45 × 10<sup>−3</sup> µg/mL, *p* = 0.0025). Construct V3-Ile had the highest level of killing (EC<sub>50</sub> of 0.46 ± 0.08 × 10<sup>−3</sup> µg/mL), a nearly sixfold increase in killing relative to hu3F8 (*p* < 0.0001) and an eightfold increase over its parental V3.
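The fold changes quoted for binding and killing follow directly from ratios of the reported *K*<sub>D</sub> and EC<sub>50</sub> values. The short sketch below reproduces that arithmetic from the mean values given in the text; the helper name is hypothetical, and the reported uncertainties are ignored for simplicity.

```python
# Mean values reported in the text (K_D in nM; EC50 in units of 1e-3 µg/mL).
KD_NM = {"hu3F8": 9.1, "V3": 11.5, "V5": 19.1, "V3-Ile": 3.7}
EC50 = {"hu3F8": 2.61, "V3": 3.83, "V5": 6.55, "V3-Ile": 0.46}


def fold_improvement(values: dict, reference: str, variant: str) -> float:
    """Ratio reference/variant; above 1 means the variant has the lower (better) value."""
    return values[reference] / values[variant]


killing_vs_hu3f8 = fold_improvement(EC50, "hu3F8", "V3-Ile")
killing_vs_v3 = fold_improvement(EC50, "V3", "V3-Ile")
binding_vs_hu3f8 = fold_improvement(KD_NM, "hu3F8", "V3-Ile")
v5_binding_loss = fold_improvement(KD_NM, "V5", "hu3F8")

print(round(killing_vs_hu3f8, 2), round(killing_vs_v3, 2),
      round(binding_vs_hu3f8, 2), round(v5_binding_loss, 2))
```

These ratios correspond to the nearly sixfold and eightfold gains in killing for V3-Ile, its greater than twofold gain in GD2 binding, and the approximately twofold loss in binding for V5.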

#### **DISCUSSION**

Aberrant glycosylation has long been considered to be a hallmark of cancer (17). Ganglioside markers such as GD2 have become an attractive target in recent years because of the number of tumor types and cancer stem cells that express it on their surface, as well as GD2's restricted expression in normal tissue. Monoclonal antibody 3F8 is a lead therapeutic candidate in this area, and its derivatives are being tested in a number of different targeting strategies including bispecific T-cell engaging antibodies, pretargeted radio-immunotherapy, drug/toxin conjugates, nanoparticles, and even chimeric antigen receptors for use in adoptive cell therapy.

#### **Table 6 | Analysis of in vitro antibody-dependent cell-mediated cytotoxicity of neuroblastoma LAN-1 cells**.

Samples were prepared in triplicate. Values are shown as mean ± standard error.

As with all antibody therapeutics, *in vivo* efficacy is affected by antigen affinity, antibody stability, and immunogenicity, as well as a number of factors related to serum stability and pharmacokinetics. In this study, we have investigated a structural and computational approach to refine the stability and to reduce the computationally predicted immunogenicity of a humanized form of the anti-GD2 MoAb 3F8. By introducing site-specific mutations based on force-field simulations of the antibody crystal structure, we generated construct V3, which had significantly higher thermal stability, and comparable antigen binding and *in vitro* ADCC.

We have additionally attempted to minimize potential immunogenicity in designing construct V5, which had major mutations to CDR residues to eliminate a predicted T-cell epitope. Immunogenicity is a major component of clinical efficacy. A large percentage of neuroblastoma patients treated with the murine 3F8 developed a human anti-mouse antibody response, limiting repeated administrations of this antibody. In the case of designing construct V5, however, the mutations were too stringent, resulting in lowered thermal stability, weaker antigen binding, and weaker tumor cell killing. Finally, we combined our stability enhanced V3 construct with the cytotoxicity enhancing mutation (HC:G54I), which resulted in enhanced stability, antigen binding, and *in vitro* tumor cell killing, compared to the parental hu3F8.

While there are several examples of using computational methods to enhance the properties of antibodies [see Ref. (18) for review], there are few examples of using site-specific *in silico* based framework mutations to enhance thermal stability profiles. Wang and Duan (19) did suggest mutations to the VH–VL interface of anti-VEGF single-chain variable fragment (scFv) to enhance thermal stability based on molecular dynamics simulations, but with no experimental validation. We have recently shown that disulfide stabilization at the VH–VL interface of the anti-GD2 scFv 5F11 in the context of a GD2xCD3 tandem scFv bispecific antibody resulted in a 10°C increase in thermal stability and a nearly 150-fold increase in tumor killing potency (20). Enhancing thermal stability can also lead to less aggregation and less immunogenicity. Liu et al. (21) have shown that disulfide stabilization of an anti-CD22 antibody–toxin fusion protein resulted in enhanced thermal stability and less immunogenicity in mice. What is novel and less obvious in this investigation is that we used *in silico* predictions to make site-specific framework mutations, which resulted in a nearly 2°C increase in thermal stability of the V3 framework and, when combined with a cytotoxicity-enhancing mutation also derived by *in silico* methods, resulted in enhancement of both antigen binding affinity and tumor cell killing potency. In fact, we had previously shown that the cytotoxicity-enhancing Ile mutation (HC:G54I) had nearly the same binding to GD2 as the parental hu3F8 (15), but when the same mutation was inserted into the more stable V3 framework in this investigation, there was a greater than twofold enhancement of GD2 binding. We have therefore demonstrated that structural and computational methods can be used to refine MoAbs that bind to complex carbohydrate targets such as GD2, with further *in vivo* validation necessary to progress toward clinical development.

#### **ACKNOWLEDGMENTS**

The authors would like to thank Ms. Hong-fen Guo and Ms. Yi Feng for excellent technical assistance.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fimmu.2014.00372/abstract

#### **REFERENCES**


patients with neuroblastoma and malignant melanoma. *J Clin Oncol* (1987) **5**(9):1430–40.


**Conflict of Interest Statement:** Mahiuddin Ahmed and Nai-Kong V. Cheung were named as inventors in patents related to antibody 3F8 filed by Memorial Sloan Kettering Cancer Center. Jian Hu declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 May 2014; accepted: 21 July 2014; published online: 14 August 2014. Citation: Ahmed M, Hu J and Cheung N-KV (2014) Structure based refinement of a humanized monoclonal antibody that targets tumor antigen disialoganglioside GD2. Front. Immunol. 5:372. doi: 10.3389/fimmu.2014.00372*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Ahmed, Hu and Cheung. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Carbohydrate-mimetic peptides for pan anti-tumor responses

#### **Thomas Kieber-Emmons <sup>1</sup>\*, Somdutta Saha<sup>1</sup> , Anastas Pashov <sup>2</sup> , Behjatolah Monzavi-Karbassi <sup>1</sup> and Ramachandran Murali <sup>3</sup>**

<sup>1</sup> Department of Pathology and Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA

<sup>2</sup> Stephan Angelov Institute of Microbiology, Bulgarian Academy of Sciences, Sofia, Bulgaria

<sup>3</sup> Research Division of Immunology, Department of Biomedical Sciences, Cedars-Sinai Medical Center, Los Angeles, CA, USA

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Mark Agostino, Curtin University, Australia; Mauro Sergio Sandrin, University of Melbourne, Australia

#### **\*Correspondence:**

Thomas Kieber-Emmons, University of Arkansas for Medical Sciences, 4301 West Markham Street, #824, Little Rock, AR 72205, USA e-mail: tke@uams.edu

Molecular mimicry is fundamental to biology and extends across many disciplines, ranging from immune pathology to drug design. Structural characterization of molecular partners has provided insight into the origins and relative importance of complementarity in mimicry. Chemical complementarity is easy to understand; amino acid sequence similarity between peptides, for example, can lead to cross-reactivity, triggering similar reactivity from their cognate receptors. However, conformational complementarity is difficult to decipher, and molecular mimicry of carbohydrates by peptides is often considered one such case. Extensive studies of innate and adaptive immune responses suggest the existence of carbohydrate mimicry, but the structural basis for this mimicry yields confounding details; peptides mimicking carbohydrates in some cases fail to exhibit both chemical and conformational mimicry. Deconvolution of these two types of complementarity in mimicry and its relationship to biological function can nevertheless lead to new therapeutics. Here, we discuss our experience examining the immunological aspects and implications of carbohydrate–peptide mimicry. Emphasis is placed on the rationale, the lessons learned from the methodologies to identify mimics, a perspective on the limitations of structural analysis, the biological consequences of mimicking tumor-associated carbohydrate antigens, and the notion of reverse engineering to develop carbohydrate-mimetic peptides in vaccine design strategies to induce responses to glycan antigens expressed on cancer cells.

**Keywords: glycans, carbohydrate-mimetic peptide, mimotope, vaccines, structural design, cancer**

#### **INTRODUCTION**

Among the most challenging of antigen targets for vaccine design are glycans (1). They are ubiquitous in nature and can be considered as one of the unique antigens expressed across pathogens and cancer cells. Glycans are fundamental to the biological functions of cell–cell communication, cell proliferation, and differentiation, and they mediate cell attachment as well as pathogen attachment and infection. Cancer cells, in particular, are noted for their aberrant glycosylation profiles that affect the metastatic process. Consequently, certain carbohydrate forms profoundly affect the pathophysiology of both infection and neoplasia (**Table 1**). A unique advantage in targeting tumor-associated carbohydrate antigens (TACAs) is that multiple proteins and lipids on cancer cells can be modified with the same carbohydrate structure, which might be shared with bacterial antigens (2). Thus, targeting TACAs has the potential to broaden the spectrum of antigens recognized by the immune response, thereby lowering the risk of developing resistant tumors due to the loss of a given protein antigen.

We have come to learn that the manner in which a TACA is expressed will dictate how an immune effector mechanism will be invoked (8). Antibodies against glycolipids and globular glycoproteins are found to mediate complement-dependent cytotoxicity (CDC) because these antigens extend less than 100 angstroms from the cell membrane, whereas antibodies to mucins, which extend up to 5000 angstroms from the cell surface, do not (8). TACAs are also associated with cell signaling activities, whereby anti-TACA antibodies are capable of directly inducing cell death in a number of tumor cell lines, although this activity has not been investigated in great detail (9, 10). In this context, TACAs are pan-targets on tumor cells because they are collectively and intimately involved in cell-death signaling pathways. Strategies that target TACAs have, therefore, potential clinical benefit as cell-death therapies. Anti-TACA antibodies can mediate significant reprogramming of signaling events, with profound anti-tumor activities. The ability to induce antibodies reactive with multiple TACAs is relevant because heterogeneity of antigen expression in different cancers of the same type, as well as in different cells of the same cancer, and heterogeneity of immune response in different patients make it likely that maximal anticancer effect may not result from immunization against a single antigen.

The success of carbohydrate-based vaccines against pathogens has led to technological advances in vaccine design, but they have typically been developed as mono or singular vaccine types requiring a polyvalent formulation to induce responses across carbohydrate types (11). While glycans are diverse in expression patterns and in their composition, the structural commonalities among glycans provide a template to target, at least some of them collectively, by directing the immune response toward these commonalities. Therefore, it is logical to target glycans in vaccine design, which can lead to the interruption of disease processes (11).

Lacto series, neolactoseries, Globosides, and Ganglioside antigens are found on tumor cells (3, 4) and on LOS of multiple bacteria (5–7).

Among potential technological strategies is the use of carbohydrate-mimetic peptides (CMPs) to induce responses to glycans on pathogens and cancer cells (12). Peptides can substitute as immunogens to target pathways involving protein–carbohydrate interactions and in carbohydrate-specific immunological reactions. However, there is a noted distinction between antigenic mimicry and the ability of a mimic to induce a response cross-reactive with a carbohydrate/glycan moiety.

Antigenic mimicry, in simple terms, is when one ligand competes with another for antibody binding. The origin of cross-reactivity involves thermodynamic and structural interpretations (13–15). The notion of immunological mimicry is less precise. Does it mean that the mimic generates the same antibody subset as the nominal antigen or just that it induces a response that cross-reacts with the nominal antigen?
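The competition that defines antigenic mimicry can be made concrete with the standard equilibrium expression for two ligands sharing one binding site. This is a generic textbook sketch, not a model taken from the cited studies; the symbols $[C]$ (free carbohydrate antigen), $[P]$ (free peptide mimic), and the dissociation constants $K_C$ and $K_P$ are assumptions introduced here for illustration, and free concentrations are assumed to approximate total concentrations.

$$
\theta_C = \frac{[C]/K_C}{1 + [C]/K_C + [P]/K_P},
\qquad
\mathrm{IC}_{50} = K_P\left(1 + \frac{[C]}{K_C}\right)
$$

Here $\theta_C$ is the fraction of antibody sites occupied by the carbohydrate, and $\mathrm{IC}_{50}$ is the peptide concentration that halves that occupancy. Under these assumptions, a peptide can be an effective competitor (low $K_P$) without this saying anything about whether it will induce carbohydrate cross-reactive antibodies, which is the distinction the text draws.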

Early on, CMPs were shown to function as antigenic mimics (16, 17), but more importantly they were shown to induce serum antibodies in a variety of systems, having utility in directing responses to cancer cells and against pathogens (18–26). Most of all, unlike carbohydrate antigens, CMPs can prime for memory responses to TACAs (27), suggesting that CMPs facilitate cognate interactions between B cells and T cells. Carbohydrates/polysaccharides do not facilitate such interactions, whereas surrogate antigens of carbohydrates, such as anti-idiotypic antibodies and CMPs, should and can. CMPs are not only a functional strategy to induce carbohydrate-reactive responses, but they can also function as probes to understand the structural basis for the dual recognition properties of antibodies, lectins, and T cells (12, 14, 15, 28, 29). Understanding the structural requirements for antibody and T-cell recognition provides a basis for identifying potentially new sets of immunogens that may have both fundamental immunological and clinical value. However, it has been argued that translation of such information into viable vaccines is still a long way off (30–32). Here, we briefly discuss the various perspectives and elements of CMPs useful to translate them into the clinic in tumor vaccine design applications to target glycans.

#### **MOLECULAR MIMICRY AT A GLANCE**

Molecular mimicry is now firmly considered as the basis of many autoimmune disorders, proposed as a pathogenic mechanism for autoimmune disease, as well as a probe useful in uncovering its etiologic agents (33). On the other hand, self-limiting autoimmunity may underlie some of the pathogenic mechanisms in infectious disease. This hypothesis is based in part on observed cross-reactivity of immune reagents with host "self" antigens and microbial determinants (33). Molecular mimicry is also suggested as a means to regulate immune homeostasis and to elicit responses against target antigens, as evidenced by studies on anti-idiotypic antibodies (34). This model suggests that conventional T-cell/B-cell collaboration can explain communication between complementary idiotype [Id(+)] and anti-Id antibody at the cellular level in a way that integrates present and previous data on B-cell regulation. Furthermore, this model provides a tool to probe carbohydrate immunology paradigms because the synergistic interaction of effector T and B cells requires common recognition of identical tumor-associated antigen(s) (35). Anti-idiotypic antibodies have been proposed to mimic carbohydrate antigens and have been tested in the clinic (36–40).

At one level, an explanation for molecular mimicry is that a foreign antigen shares sequence or structural similarities with self-antigens. But on another level, what defines the recognition and interaction basis for antigenic mimicry that ties to a functional immune response? Molecular mimicry in the context of antibody–antigen recognition is interpreted at several levels (**Figure 1**). The work by Hoffmuler et al. (41) suggests that a common epitope can be preserved among an ensemble of peptide variants. They characterized the binding modes of intermediate conformations of selected peptides using complete sets of substitution analogs, revealing that a number of sequential substitutions accumulated without changing the pattern of key interacting residues. At a distinct step, however, one single amino acid exchange induces a change in the binding mode, indicating a flip in specificity and conformation (41). Regions of proteins with biased amino acid composition [so-called Low-Complexity Regions (LCRs)] are abundant in the protein universe (42). LCR-containing proteins tend to have more binding partners across different networks than proteins that have no LCRs. LCRs may be involved in flexible binding associated with specific functions, and their positions within a sequence may be important in determining both their binding properties and their biological roles (42).
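The idea of "biased amino acid composition" can be illustrated with a simple compositional measure. The following is a minimal sketch, assuming Shannon entropy over a sliding window as the complexity score; it is not the method used in the cited work (42), and the function names, window length, threshold, and example sequence are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq."""
    n = len(seq)
    counts = Counter(seq)
    # Sum of -p * log2(p) over the residues that actually occur.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=12, threshold=2.2):
    """Return (start, entropy) for each window whose entropy <= threshold.

    The window length and threshold are illustrative assumptions,
    not values taken from the literature.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        h = shannon_entropy(seq[start:start + window])
        if h <= threshold:
            hits.append((start, h))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence: a two-residue repeat followed by a diverse stretch.
    example = "KDKDKDKDKDKD" + "ACDEFGHIKLMN"
    for start, h in low_complexity_windows(example, window=12, threshold=2.0):
        print(start, round(h, 2))
```

A repeat built from two residue types scores 1 bit per position, whereas a window of twelve distinct residues scores about 3.6 bits, so only the repetitive end of the example is flagged. Dedicated low-complexity detection tools apply more refined statistics than this sketch.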

**FIGURE 1 | Illustrative models highlighting the polyspecificity or cross reactivity of antigens for an antibody**. **(A)** Two different molecules may carry the same structure. **(B)** The same paratope may accommodate multiple smaller epitopes in different parts. **(C)** The flexibility of the paratope may allow for interaction with different epitopes. **(D)** Different flexible molecules with repetitive low complexity structure containing common groups (e.g., sugars) have a high probability of fitting in the same paratope. These are aspects of polyspecific binding, which are partially related (like **A** and **D**) and sometimes may occur in combination (**C** and any one of the rest).

Intrinsically disordered regions of proteins have also been associated with molecular mimicry (43), indicating the potential of highly flexible peptides as mimics. Such peptides may be attractive to induce pan anti-tumor responses, due to their potential ability to mimic multiple TACAs *in situ*. However, the structural diversity inherent to such peptides makes defining the precise nature of their mimicry of any or multiple TACAs even more challenging. Geometrical shape complementarity in antigen–antibody interaction, the "lock and key" hypothesis, has long dominated immunological thinking. However, studies demonstrating the existence of a large number of monoclonal antibodies that can bind to a variety of totally unrelated self and foreign antigens (i.e., polyreactive antibodies) have modified this view. Consequently, the lock and key model has been supplemented with an explanation focusing on the flexibility of antibody binding sites that can change conformation to accommodate different antigens (44).

Antibodies induced by a CMP to the meningococcal group C capsular polysaccharide (18) were shown to be reactive with the Lewis Y antigen (20). Carbohydrate-reactive antibodies show potential cross-reactivity for dissimilar carbohydrate forms, which highlights the common epitope basis for cross-reactivity (**Figure 2**); **Figure 2A** shows that a common epitope is formed between α2–8 sialic acid and the neolactoseries antigen Lewis Y (18, 20). Antibodies that recognize three hydroxyl groups might potentially cross-react with three hydroxyl groups displayed on two glycosyl groups (**Figure 2B**). This level of recognition leads to the idea that antibodies can recognize carbohydrates in the context of pan-recognition. The cases discussed above relate as much to the common epitope mechanism as to low-complexity epitopes, which seems to be often the case in carbohydrate recognition.

What is discussed here, strictly speaking, is molecular interaction at the atomic level, while recognition is rather the system-level processing of information relevant to immune function, i.e., self/non-self distinction and identification of previously met danger (32, 45). Specificity of interaction serves these purposes only in some aspects, while others favor polyspecific binding. For T-cell receptors (TCRs), antigen specificity is an emergent property of the system rather than a characteristic of the individual receptor (46). On a molecular level, TCRs are rather promiscuous binders. Furthermore, in terms of pre-immune antibodies, polyspecificity also has the role of ensuring a complete repertoire. It funnels antigen/pre-immune antibody interactions into the somatic hypermutation process of refining specificity.

An interesting twist to this topic is the emerging notion of reverting specific antibodies to polyspecific binding, or induced polyspecificity, as a physiological mechanism operating, for instance, at sites of inflammation (47–49). Yet perhaps the most typical polyspecific immune binding makes use of pattern recognition to generalize a danger context (50, 51). Functionally, the boundary between pattern recognition receptors and natural antibodies is fuzzy (52–54). Being intrinsically prone to polyspecificity by several mechanisms, antibody recognition of carbohydrates conceptually merges antigen and pattern recognition. In this regard, carbohydrate mimotopes (e.g., CMPs), instead of mimicking one particular structure by another, come about rather as mimics of patterns, not unlike synthetic TLR agonists. But carbohydrate mimotopes are not exclusively artificial. CMPs from natural proteins have been known for some time (55, 56). Peptides from Mucin 1 cell surface receptor (MUC1) are the most interesting because they are considered mimics of the Gal-epitope (56). Natural peptides can adopt structures similar to carbohydrate antigens (21) and can exhibit binding kinetics similar to the nominal antigens that they mimic (21, 28).

CMPs often share no obvious consensus sequence, but their amino acid sequences frequently contain aromatic and hydrophobic residues, as well as proline, which has a cyclic side chain, and glycine, both of which affect the conformational properties of the mimic (13, 57). The predominance of aromatic residues in CMPs invokes interaction scenarios that include stacking and hydrophobic interactions. A basis for this is the notion that carbohydrate recognition by antibodies uses hydrophobic faces on carbohydrate antigens (58). It is important to note that cohesive solvent–solvent interactions are the major driving force behind apolar association in solution (59). Consequently, interaction models that implicate important roles for dispersion forces in molecular recognition events should be interpreted with caution in solvent-accessible systems (59). In addition, other antibody recognition systems also suggest that dual antigen recognition could involve divergent antibody conformations of nearly equivalent energetic states (60). Therefore, developing high-affinity binders might make use of antibody structural plasticity to mediate the recognition step without increasing the entropic cost (60).

#### **HUMORAL RESPONSES TO CMPs**

While a variety of CMPs have been developed with the ability to induce immune responses of desired specificities and functionality (61), they are perhaps most appealing as a probe to understand the immunological response to carbohydrate antigens. An important feature of CMPs is their ability to mediate contact-dependent T-cell help, which plays an obligatory role in humoral immune responses to T-cell dependent antigens. Cognate B-cell/T-cell interactions during the immune response to protein antigens depend on T-cell co-stimulation. Details of how such interactions govern immune responses to carbohydrate-conjugate vaccines are few. We have shown that immunization with CMPs activates peptide-specific T helper type 1 (Th1) and type 2 (Th2) responses (62, 63). However, while behaving like a Th1 antigen (63), multivalent peptide mimetics still could induce a high carbohydrate-reactive IgM/IgG ratio with an endpoint titer of 1:2,000 (20). These results suggest that the multiple antigen peptide form might function like a Th2-independent immunogen in BALB/c mice. Furthermore, we observed that CMPs mediate cognate B and T-cell interactions, as CMPs can induce antibodies in hosts with a deficiency in IgM production that typically do not respond to carbohydrate antigens (62). In these studies, the B cells apparently functioned as antigen-presenting cells. In addition, these studies suggest that B-cell subsets influence the interactions. More importantly, the type of TACA mimicked by the CMPs is expressed in mice (29, 64). Consequently, these results were obtained in a tolerogenic mouse model, further suggesting that tolerance is broken upon CMP immunization.

**FIGURE 2 | Examples depicting similarity of epitopes in dissimilar carbohydrate antigens**. Epitopes (hydroxyls) are represented by red spheres. **(A)** Relationship between Lewis Y antigen on left side of panel with MCP on right side. **(B)** Relationship between Lewis Y antigen on left side of panel with α1–4 Glucose on right side of panel. Interestingly, the epitope defined on the glucose moiety defines a three-dimensional epitope on the Lewis Y antigen.

A characteristic of an effective mimotope-based vaccine would be to prime for secondary responses upon boosting or challenge with native antigen (18, 27, 65–67). Peptide-mimotope anamnestic responses have been noted for mimotope-conjugates (65, 66). The identification of peptide mimetics relies upon the idea that antibody fine specificity epitope mapping patterns of carbohydrates and peptide mimetics might be used as a proxy for individual B-cell receptor specificity activated during a secondary antibody response. However, the idea of functional mimicry would suggest that immunization with a carbohydrate-mimic peptide might also induce a specific subset or restricted anti-carbohydrate response. Our studies indicate that since peptide-conjugates elicit immune responses in xid mice (62), it is likely that antibodies to peptide and carbohydrate immunogens might be structurally unique and derived from different antibody subsets.

#### **POTENTIAL FOR CELLULAR IMMUNITY TARGETING CARBOHYDRATE ANTIGENS**

Up until a few years ago, carbohydrate determinants were traditionally not considered as targets for cytotoxic T lymphocytes (CTLs), although a variety of immunogenicity and specificity studies of the glycan moiety of synthetic *O*-glycosylated MHC-binding peptides suggest otherwise (68–70). GD2 was also implicated as a target upon CTL activation early on (71). Crystal structure analyses indeed show that T cells can recognize glycopeptides bound by MHC molecules on the surface of antigen-presenting cells (72, 73). T cells, therefore, have the potential to react with the carbohydrate moiety of neoglycopeptide antigens, suggesting that T cells can target carbohydrate antigens expressed on tumor cells. However, it is also possible to generate carbohydrate-specific unrestricted CTL responses with MHC class-I-binding carrier peptides (74), which might explain the GD2 response (71). Nonetheless, how such T-cell responses are generated is presently unclear. From a vaccine perspective, the construction of glycopeptide/protein immunogens is problematic.

Rather than simple molecular mimicry, unpredictable arrays of common and differential contacts on class-I complexes can be used for their recognition by the same TCR. For example, bacterial polysaccharides with a distinct charge-motif can be emulated by peptides that can activate T cells (75). Lysine–aspartic acid (KD) peptides with repeating units are able to stimulate CD4<sup>+</sup> T cells *in vitro* and confer protection against abscesses induced by bacteria such as *Bacteroides fragilis* and *Staphylococcus aureus* (75). CMPs can induce a Th1 response in mice using a DNA platform (76). We have observed an augmented induction of CTL activity against Meth A tumor cells upon peptide-mimotope immunization (63, 77). The induction of carbohydrate-reactive T-lymphocytes with peptide mimics is based upon a functional definition of T-cell mimotopes. One possible explanation is that the peptide-mimotope activates CTLs, which bind to *O*-linked GlcNAc or GalNAc glycopeptides associated with MHC Class-I. Based upon crystal structure analysis of MHC complexes with glycopeptides, it appears that the central region of the putative T-cell-receptor-binding site is dominated by the extensive exposure of the tethered carbohydrate (72, 73). Our modeling of CMPs in the MHC Class-I groove suggests that amino acids and glycans attached to a glycopeptide overlap in 3D space, providing an array of contacts for TCR recognition (12).

#### **FIDELITY IN MIMICRY**

The ability to augment or enhance TACA-reactive antibodies using CMPs would be noteworthy. Much like anti-idiotypes, CMPs may elicit anti-saccharide responses, but fail to elicit the idiotypes and isotypes observed in the protective response to the microbial antigen (78). Functional antibodies depend not only on the host's ability to mount an immune response, but also on its ability to mount the appropriate immune response. Whether an antibody response is protective or not may depend on both the fine antigenic specificity that may be associated with particular idiotypes and epitope binding characteristics, and the isotype, which determines antibody effector function. Studies of peptide mimics selected by lectins or antibodies and then analyzed by structural approaches often come to the conclusion that mimicry at the structural level is minimal at best (13–15). The same conclusions are drawn in considering anti-idiotypic antibodies (79). Rather, mimics as peptides or anti-idiotypes serve as imprints of the structural characteristics of the nominal carbohydrate antigen and, consequently, give rise to antibodies with carbohydrate-like properties upon immunization. The question remains how to enhance the ability of TACA-mimetic peptides to induce TACA-reactive antibodies with higher titers and association constants. Herein lies the problem with mimics; the immune response is only assayed after a choice is made as to which mimic is to be followed. So what lessons can be learned about choosing the "true" mimic?

#### **From lectins to vaccines**

While lectins have been generally used to identify CMPs and to understand the general features of recognition phenomena, **Figure 3** outlines the general development of CMPs in vaccine design using lectins as a template to induce antibodies that would emulate the actions of lectins. We have shown that this concept can be brought into practice (80). Plant lectins like *Griffonia simplicifolia* lectin I (GS-1) and wheat germ agglutinin (WGA) mediate the apoptosis of tumor cells. We have investigated the possibility of using these lectins as templates to select peptide mimotopes of TACAs as immunogens to generate cross-reactive antibodies capable of mediating apoptosis of tumor cells (80).

**FIGURE 3 | General scheme for translating random phage library screening into a functional vaccine**. It is important to start with a lectin or antibody with functionality, but not all CMPs selected will induce the desired response. CMPs can be defined in a four-step process. (1) Lectins that trigger apoptosis of tumor cells are defined. (2) Biopanning against a random peptide display library identifies potential CMPs, which are confirmed by carbohydrate-peptide inhibition assays. (3) The potential of the CMPs to induce TACA-reactive antibodies is evaluated, as is (4) the ability of CMP-induced antibodies to mediate apoptosis of tumor cells.

Vaccine-induced anti-carbohydrate antibodies to both 106 and 107 (**Table 2**) reduced the outgrowth of micrometastases in the 4T1 spontaneous tumor model, significantly increasing survival time of tumor-bearing animals. This finding parallels suggestions that carbohydrate-reactive IgM with cytotoxic activity may have merit in the adjuvant setting if the right carbohydrate-associated targets are identified (81, 82). Interestingly, while both CMPs 106 and 107 are reactive with lectins, only 107 induced responses that were directly cytotoxic to tumor cells. Both CMPs induced antibodies that mediated CDC; however, only CMP 107 induced serum IgM antibodies in mice that mediated the apoptosis of murine 4T1 and human MCF7 cell lines *in vitro*, paralleling the apoptotic activity of the lectins (80). This finding again highlights that selection of CMPs based upon antigenic mimicry does not automatically translate into inducing antibodies with a desired functionality.

A fundamental feature of these CMPs was their hydrophobic nature, being built on motifs containing aromatic residues. Early on, peptides that mimic carbohydrate antigens were identified by analysis of reactivity of random peptide libraries on phage with the lectin Concanavalin A (ConA) (16, 17). These early peptides contain aromatic side chains, representing a generalized Trp/Tyr/X/Tyr (where X is a number of different residues) motif. Subsequent to these seminal studies, other aromatic peptides displaying similarities to ConA-reactive ones were described (18, 83). Aromatic, hydrophobic, and hydrogen-bonding amino acids seem favored, with the possibility that the W/YXY motif functionally mimics elements of Core 1 and 2 structures shared among otherwise dissimilar carbohydrate structures (**Figure 2**). Consequently, this motif type has been observed in peptides isolated by a number of anti-carbohydrate antibodies and lectins and might represent low-complexity surfaces (**Figure 1**), perhaps because of the bias in the amino acid composition of the mimetics. Such biased sequences do not necessarily converge on a canonical set of patterns, although some motifs stand out. It is important to note that those peptides reactive with ConA were identified based upon a conception of antigen mimicry, whereas the work of Westerink et al. (18) was based upon immunological studies starting with an anti-Id that displayed immunological functionality.
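The W/YXY motif described above lends itself to a simple sequence scan. The sketch below assumes the motif is read as Trp or Tyr, followed by exactly one residue of any type, followed by Tyr; that reading, the function name, and the example sequences are assumptions for illustration only, and the sequences are hypothetical rather than the CMPs listed in Table 2.

```python
import re

# Assumption: "W/YXY" means (Trp or Tyr), one residue of any type, Tyr.
# The lookahead makes overlapping occurrences visible to finditer.
_MOTIF = re.compile(r"(?=([WY][ACDEFGHIKLMNPQRSTVWY]Y))")


def find_wyxy_motifs(peptide):
    """Return (start_index, tripeptide) for each W/Y-X-Y occurrence."""
    seq = peptide.upper()
    return [(m.start(), m.group(1)) for m in _MOTIF.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical peptides, not sequences from the article.
    for pep in ("GGYPYDA", "AWRYAYG", "GGSGGS"):
        print(pep, find_wyxy_motifs(pep))
```

A scan of this kind only flags sequence patterns. As the text stresses, the presence of the motif indicates possible antigenic mimicry and says nothing about whether a peptide will induce antibodies with the desired function.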

#### **Structure-based reverse engineering to discover peptide mimics**

A caveat of screening libraries with either lectins or antibodies is that the screen often identifies mimics that fail to reproduce critical contacts that the carbohydrate makes with the protein. There is also a possibility that such peptides bind to sites on the protein other than the carbohydrate-binding site, making optimization of a true structural mimic from such a peptide impossible (84). The structural approaches to define the basis of mimicry have been previously discussed (13, 14). As mentioned above, high-affinity peptides *per se* may not necessarily mimic critical contacts required for function. In addition, the judicious choice of peptides for testing antibody responses should be based on the peptide's interaction with both the heavy and light chains in order to induce antibodies with similar antigen-specific properties (28), as the combination of heavy and light chains will influence specificity. Thus, both the variable and the constant region of the antibodies induced by a peptide mimic or mimotope must be considered when assessing the success of any immunization.

To overcome the lack of immunological mimicry of high-affinity peptides, we adopted a "reverse engineering approach" some time ago, which places emphasis on maintaining the critical contacts between a carbohydrate and its protein partner (28, 29). This method is similar to fragment-based drug discovery (28). We have previously reviewed the structural concepts and approaches used in vaccine design applications that illustrate the value and limitations of using chemical information (peptide libraries that are mimics of a ligand) and immunological information to define novel peptide immunogens that function as mimotopes to generate immune responses targeting TACA (85) and glycans on the human immunodeficiency virus (86). In this context, we showed that concepts associated with pharmacophore design (now considered reverse engineering) could be used to define CMPs applied to vaccine design (21, 28). We demonstrated a structure-assisted vaccine design approach, whereby small molecules, defined in crystallographic databases, could be used in principle to define peptide mimetics emulating the three-dimensional interaction scheme of a native carbohydrate antigen (21, 28). More importantly, it was shown that virtual screening led to motifs that were also observed experimentally and that these could display binding energetics similar to the nominal carbohydrate antigen (28).

#### **Table 2 | Selected CMPs that we have studied**.

We have also shown that by using this approach, an immunogenic peptide (911; **Table 2**) can be designed *de novo* using ConA as a template, inducing antibodies with the same functionality as ConA in neutralizing HIV isolates (21). In addition, we showed that peptides could adopt structures that are similar to carbohydrate conformations, including extended beta-strand-type and helical structures (21). Using reactivity patterns of glycan binding to ConA coupled with structural design concepts, we identified a peptide (referred to as 911) (**Table 2**) that, when rendered as a multiple antigenic peptide (MAP), was reactive with ConA at lower concentrations than those required for reaction of some native oligosaccharide ligands of ConA (21). The 911-MAP displayed competitive inhibition with carbohydrate ligands of ConA, indicating that it binds at an overlapping carbohydrate-binding site on ConA. Isothermal calorimetric analyses and immunoprecipitation experiments suggest that a shorter monovalent putative peptide, 912 (**Table 2**), exhibited a weak affinity comparable to that of MeαMan (21). The 911-MAP exhibited a higher association constant and free energy of association with ConA than the putative 912 peptide, and the Ka and ∆G values of 911-MAP are comparable to those of the ConA-reactive trimannoside and pentasaccharide (21). Most importantly, the 911-MAP induces antibodies in mice that are capable of neutralizing HIV-1 III-B, as assessed by p24 ELISA (21). This work demonstrated, perhaps for the first time, that design principles associated with CMPs could be useful to induce functional antibodies. Similar approaches have since been applied to investigate peptide recognition by anti-alpha-Gal antibodies (87) and in developing CMPs of gangliosides (88). As in our studies, it was found that peptides could interact with the same residues as those involved in carbohydrate recognition. In this context, CMPs are envisioned to be further enhanced either as inhibitors, much as in mainstream pharmacophore development, or, as in our case, to develop vaccines targeting glycans.
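Because the comparison above is stated in terms of both association constants (Ka) and free energies of association (∆G), it may help to recall how the two are related through the standard expression ∆G = −RT ln Ka. The sketch below is illustrative only: the temperature default and the example Ka values are assumptions chosen for the demonstration and are not measurements from Ref. (21).

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_ka(ka_per_molar: float, temperature_k: float = 298.15) -> float:
    """Standard free energy of association (kcal/mol) from Ka (1/M)."""
    if ka_per_molar <= 0:
        raise ValueError("Ka must be positive")
    return -R_KCAL * temperature_k * math.log(ka_per_molar)


def ka_from_delta_g(delta_g_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse relation: Ka (1/M) from a free energy of association."""
    return math.exp(-delta_g_kcal / (R_KCAL * temperature_k))


# Hypothetical association constants, for illustration only.
for ka in (1e3, 1e4, 1e6):
    print(f"Ka = {ka:.0e} 1/M  ->  dG = {delta_g_from_ka(ka):.2f} kcal/mol")
```

On this logarithmic scale, each tenfold increase in Ka corresponds to roughly 1.4 kcal/mol of additional favorable free energy at room temperature, which is why ligands with "comparable" ∆G values can still differ several-fold in Ka.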

To further emphasize the design principles to enhance the fidelity of mimicry, we tested the hypothesis that improving the hydrogen bond pattern through amino acid substitutions in a CMP, to be coincident with that for the carbohydrate ligand, will enhance the ability of CMPs to elicit anti-TACA antibodies with high titers and association constants (29). Based on anti-Id/Id crystal structures, highly directional bonds represent an important set of interactions to establish a basis for mimicry because they mainly confer the specificity in binding of the peptide and the carbohydrate antigen. In this exercise, we developed the CMP P10s (**Table 2**) (29). This CMP was identified from a random peptide library screen using the anti-GD2/GD3 antibody ME36.1 (89). P10 was shown to generate immune responses in mice that inhibited tumor growth *in vivo* (90).

In the development of P10s, we made use of the crystal structure of the anti-ganglioside antibody ME36.1 (29). Briefly, the crystal structure of ME36.1 was analyzed in the context of comparing GD2 binding and CMP binding using a molecular docking approach (29). P10s was designed based on the hydrogen-bond interactions between GD2 and the CDRs of ME36.1. Conformational and docking calculations suggested that P10s would form an increased number of hydrogen bonds with ME36.1 that are in common with the GD2 hydrogen-bond interaction pattern with ME36.1 [see **Table 1** in Ref. (29)]. This increased level of mimicry would suggest that the immune response to GD2 upon immunization with P10s would be better. We observed that P10s did indeed induce higher-titer antibodies to the target antigen and to antigen-expressing tumor cells than the parent CMP, P10. These studies suggest that, for carbohydrate mimics, pharmacophore-based design is superior to the conformational approach undertaken for other peptide mimics.
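One simple way to express the idea of hydrogen-bond mimicry described above is to count how many of the antibody residues that hydrogen bond to the carbohydrate are also contacted by a given peptide. The sketch below is only a schematic of that bookkeeping; the residue labels are placeholders invented for the example and do not correspond to the ME36.1 contacts tabulated in Ref. (29).

```python
from typing import Set


def shared_hbond_fraction(carbohydrate_contacts: Set[str],
                          peptide_contacts: Set[str]) -> float:
    """Fraction of carbohydrate hydrogen-bond partners also used by the peptide."""
    if not carbohydrate_contacts:
        raise ValueError("carbohydrate contact set must not be empty")
    shared = carbohydrate_contacts & peptide_contacts
    return len(shared) / len(carbohydrate_contacts)


# Placeholder antibody residue labels (chain:residue), purely hypothetical.
carbohydrate = {"H:res33", "H:res52", "H:res98", "L:res32", "L:res91"}
parent_peptide = {"H:res33", "L:res91", "L:res50"}
redesigned_peptide = {"H:res33", "H:res52", "H:res98", "L:res91"}

print("parent:    ", shared_hbond_fraction(carbohydrate, parent_peptide))
print("redesigned:", shared_hbond_fraction(carbohydrate, redesigned_peptide))
```

A higher shared fraction corresponds to the "increased level of mimicry" in the sense used here, although such a count ignores bond geometry and energetics.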

#### **PRECLINICAL ASSESSMENTS OF CMPs**

Tumor-associated carbohydrate antigens (TACAs) vary considerably in their expression profiles on tumor cells and on normal tissue. TACAs are upregulated in many types of tumors and therefore represent a potential vaccination target with widespread application. Cancer vaccines functionally resemble the process of autoimmune-mediated tissue damage (91).

Since tissue rejection is the goal of cancer immunotherapies, broad-spectrum, pan-antigens like TACA are plausible effective targets once the problem of their low immunogenicity is solved. This is the hope of CMP and anti-idiotypic vaccine research.

The basis of TACA-mediated tumor rejection is akin to the observation that anti-Gal IgM and IgG mediate rejection of xenografts expressing α-Gal glycoconjugates with terminal Galalpha1-3Galbeta1-4GlcNAc sequences (alpha-galactosyl epitopes, natural xenoreactive antigens) that are present on various tissues in pigs and are recognized by human anti-alpha-galactosyl (alpha-Gal) antibodies (92). The tissue rejection mediated by α-Gal-reactive antibodies demonstrates the feasibility of targeting TACAs for tumor therapy because tumor-induced antibody responses resemble autoimmune responses (93).

The generation of tissue rejection represents an important conceptual approach to cancer immunotherapy. Alpha-galactosylated xenoantigens (Galalpha1-3Galbeta1-4GlcNAcbeta1 and Galalpha1-3Galbeta1-4GlcNAcbeta1-3Galbeta1-4Glc) are often detected with the alpha-Gal-reactive lectin GS-I. However, this lectin exhibits a broad and variable specificity for carbohydrates terminating in alpha-Gal (94). The blood group-reactive lectin GS-I, which recognizes alpha-galactosyl moieties, is regarded as a surrogate marker for identifying tumor-expressed antigens reactive with anti-Gal antibodies, and it is useful for interrogating terminal α-GalNAc/Gal expression on human tissues (95). Some of these antigens are also expressed on normal cells at low levels, potentially creating a state of immune tolerance.

We have previously demonstrated that vaccination with the CMPs 106 and 107 (**Table 2**) can induce antibody responses leading to cell-mediated cytotoxicity and apoptosis, respectively, in murine models of cancer (80). In preclinical studies, we observed that immunization of mice with these CMPs does not induce significant immunopathology. Organs including liver, kidney, heart, lungs, intestines, stomach, lymph nodes, spleen, brain, spinal cord, and eyes were examined in H&E-stained sections. These organs are reactive with GS-I, and the CMPs induce antibodies reactive with GS-I antigens (29, 64). No significant cellular infiltrates were identified in any organ, including brain and spinal cord, from any animal, and there was no evidence of necrosis or extensive apoptosis in these sections (29, 64). It is likely that the level or pattern of expression of these molecules on the surface of tumor cells differs significantly from that on normal cells, with recognition mediated by antibody avidity and the clustering of glycan epitopes (96). This difference in expression may account for the relative specificity of immunologic injury for tumor cells over normal cells.

Antibodies induced by CMPs are thought to have low affinities for TACA that might compensate for the low affinity of the carbohydrate cross-reactive antibodies, minimizing the destruction of normal tissue. Such results demonstrate that repeated injections of CMPs do not necessarily lead to immune-mediated injury and support the development of CMPs for clinical testing (29, 64). Bringing such vaccines into the treatment armamentarium may significantly improve outcomes for patients.

#### **CLINICAL ASPECTS OF MIMICRY**

The potential benefits of inducing TACA-reactive antibodies in patients with cancer are demonstrated by observations that patient survival significantly correlates with ganglioside-reactive IgM levels (97). The fact that survival rates of cancer patients are correlated with low-titer, and presumably low-affinity, TACA-reactive antibodies argues that more robust antibody responses may not be necessary. Cross-reactions are important issues in the vaccine development field. As self-antigens induce tolerance, vaccination with non-self-antigens that molecularly mimic self-antigens may avoid tolerance and lead to the generation of anti-tumor immune responses. In this context, little attention has been paid to the fact that the tumor-associated antigen MUC1 might be a natural CMP. In a series of studies from McKenzie's group, it was noted that anti-α-Gal antibodies reacted with MUC1 antigens and that anti-MUC1 antibodies reacted with the α-Gal sugar (56). In mice, MUC1 peptide immunization resulted in cellular responses with little reported humoral response. In contrast, the MUC1 peptide induced a strong antibody response upon immunization in humans. It was argued that pre-existing anti-Gal antibodies in humans were the basis for the differential response, as the Gal-epitope is a natural antigen in mice (56).

The mimicking of MUC1 with the Gal-epitope might have important consequences. The natural cross-reactivity of anti-Gal antibodies against MUC1 might lead to confusion, making it difficult to ascertain the relative contributions of antibodies in binding to MUC1 upon MUC1 immunization; e.g., whether one can dissect if anti-MUC1 antibodies are not anti-TF and whether anti-MUC1 antibodies are not anti-Gal antibodies (98). On the other hand, the cross-reactivity of anti-Gal antibodies with MUC1 might lead to anti-MUC cellular immunity. Dendritic cells (DCs) play an important role in the induction of T-cell responses. Fc gamma receptors (FcγR), expressed on DCs, facilitate the uptake of complexed antigen, resulting in efficient MHC class-I and MHC class-II Ag presentation and DC maturation (99, 100). IgG-complexed MUC1 internalized through FcγR on DCs might be efficiently presented to CTLs through the MHC class-I pathway, as observed in other systems (99, 100). However, these mechanisms might also be responsible for antibody-mediated enhancement *in vivo*, as suggested by the McKenzie work in humans and in animal models where antigen–IgG and antigen–IgE complexes exacerbated Th2 cells rather than Th1 cells (101). Therefore, mimicry of the Gal-epitope by MUC1 might skew responses to MUC1 vaccines toward the Th2 type, which is contradictory to the present paradigm that stresses Th1 responses as being beneficial against MUC1 and other tumor-associated antigens.

While CMPs of TACA have been described that include the ganglioside GD2 (89, 95, 102–105), the ganglioside GD3 (106), sialylated Lewis a/x (107), and LeY (89), none of these CMPs have made it to the clinic except for our P10s. In contrast, several anti-idiotypic antibodies that mimic different gangliosides, including GD2 (108–111), GD3 (112), and N-glycolyl (NGc) gangliosides (40), have made it to clinical trials. The most advanced is the GD3-mimicking antibody Bec2, which has been tested in a Phase III trial (112). Unfortunately, there was no improvement in survival or progression-free survival in the vaccination arm with Bec2. Each of these anti-idiotypes seems to have a different mechanism of action against cancer cells, but these mechanisms parallel those observed with CMPs. The anti-idiotype that mimics NGc gangliosides generates a humoral response that triggers cell death, but in a manner different from typical apoptosis (113). Patients who developed IgG and/or IgM Abs against NeuGcGM3 showed longer median survival times (114). Immunizations with the GD2/GD3 surrogates are less mechanism based. Bec2 induces antibody responses in about 25% of subjects (115). Consequently, different strategies using Bec2 have been considered, including priming (38) and combination therapy with adjuvant (116). For GD2, the anti-idiotypes induce GD2-reactive antibodies, which mediate ADCC activity. This type of data suggests that the anti-idiotypes generally generate IgG1-type antibodies, which are efficient at ADCC (108), while IgG2, which is considered carbohydrate reactive, is minimal at mediating ADCC.

#### **FUTURE DEVELOPMENT OF STRUCTURE-BASED VACCINES**

Glycans or TACAs are important targets for cancer immunotherapy, as suggested by immune surveillance mechanisms. TACAs display important biological effects in tumor biology and tumor immunology. Most importantly, the recognition properties of glycans by immune effector cells have suggested translational strategies in immune therapy. In this review, we elaborated on achievements that facilitate rational vaccine design using CMPs. In nature, immunogenic parts of pathogens and cancer cells that provide antigens for B-cell receptors, and antigenic peptides that are presentable by MHC molecules to the TCR, have to be identified. There is much to learn from the B-cell and T-cell receptors that see carbohydrates as antigens and then as immunogens. Carbohydrates define recognition patterns, which activate the innate immune system to induce an appropriate adaptive immune response. CMPs that are selected upon binding to these receptors have not been pursued in the clinic with much fervor. This is partly due to the perception of utility and the idea that we need "specific" responses to singular carbohydrate antigens. More thought needs to be directed toward rational design approaches, which we have shown can be successfully implemented, rather than toward indiscriminate co-crystallization or NMR studies with CMPs derived from random phage screening, whose selection is biased toward high-affinity binders (13–15).

While there are no universally accepted strategies and tools to rationally design vaccines to elicit antibody responses, vaccines should include B-cell receptor epitopes, but these might be more of a clustered type, as we have shown using MAP platforms. Nanoparticle concepts could play a role here if they can be manufactured to GMP clinical grade. MHC Class-I/II molecules process glycopeptides, and so it is thought that these could be incorporated as well. The choice of carbohydrate might also impact the induction of Th1 or Th2 responses. We have shown, however, that naked peptides can do the same (63, 77). These glycopeptides or naked peptides should display sequences that allow T-cell epitope formation in a complex with MHC molecules, but with the realization that there are hundreds of alleles that are differentially combined between individuals. Choosing immunogenic peptides presented by MHC faces the challenge of not only predicting sequences appropriate for complex formation with a particular MHC allele, but also finding peptides that can reliably build epitopes in the diverse genetic background within a human population.

The diversity of regulatory mechanisms involving glycans expands the range of possible effects of TACA-targeting immunotherapeutic approaches (117). Anti-TACA antibodies, thus, may be involved in more than direct tumor cytotoxicity, even though this mechanism is exciting. Although the exact mechanism may represent a cascade of steps that are still to be established, TACA targeting has the potential to yield anti-tumor effects mediated by Natural Killer cells, which has not been thoroughly investigated in humans even though there is some evidence of therapeutic benefit (118, 119), or through neutralization of tumor immunosuppressive factors in the form of soluble gangliosides (120–122). Future work should clarify the points of involvement of antibody/carbohydrate interactions in modulating tumor growth and facilitating innate surveillance mechanisms.

The number of manuscripts published on CMPs has certainly diminished in recent years. The promise of CMPs to be functional in animal models of bacterial infection has been impressive, starting with our own work (18), yet no CMP for bacterial antigens has made it to the clinic. The same can be said for anti-idiotypes of bacterial carbohydrate antigens. In the cancer area, only our P10s CMP has made it to the clinic. Perhaps the problem with diminished work on CMPs is more about the perception of mimicry than about outcomes. The same has been suggested for anti-idiotypes (34). In mouse models, CMPs are functionally relevant, much like anti-idiotypes. But clinically, all vaccine types, mimic or not, display less than optimal activity. The focus in immunotherapy has therefore been centered on checkpoints that mediate the immune response. Nevertheless, the promise of CMPs and other surrogates of carbohydrates is to better understand the structural implications of the antibody-mediated interactions, which have the potential for innovation in the rational design of reagents with biological, chemical, and pharmaceutical applications and which underlie the concepts of reverse immunology highlighted herein.

#### **REFERENCES**


cross-reactive carbohydrate antigens. *Vaccine* (2004) **22**:898–908. doi:10.1016/ j.vaccine.2003.11.036


anti-neuroblastoma immune response. *Cancer Immunol Immunother* (2013) **62**:999–1010. doi:10.1007/s00262-013-1413-y


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 25 April 2014; accepted: 17 June 2014; published online: 30 June 2014. Citation: Kieber-Emmons T, Saha S, Pashov A, Monzavi-Karbassi B and Murali R (2014) Carbohydrate-mimetic peptides for pan anti-tumor responses. Front. Immunol. 5:308. doi: 10.3389/fimmu.2014.00308*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Kieber-Emmons, Saha, Pashov, Monzavi-Karbassi and Murali. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Predicting the origins of anti-blood group antibody specificity: a case study of the ABO A- and B-antigens

#### **Spandana Makeneni<sup>1</sup>, Ye Ji<sup>1</sup>, David C. Watson<sup>2</sup>, N. Martin Young<sup>2</sup> and Robert J. Woods<sup>1,3</sup>\***

<sup>1</sup> Complex Carbohydrate Research Center, University of Georgia, Athens, GA, USA

<sup>2</sup> Human Health Therapeutics, National Research Council Canada, Ottawa, ON, Canada

<sup>3</sup> School of Chemistry, National University of Ireland, Galway, Ireland

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Jessica Kate Holien, St. Vincent's Institute of Medical Research, Australia

Ricardo Mancera, Curtin University, Australia

#### **\*Correspondence:**

Robert J. Woods, Complex Carbohydrate Research Center, University of Georgia, 315 Riverbend Road, Athens, GA 30602, USA

e-mail: rwoods@ccrc.uga.edu

The ABO blood group system is the most important blood type system in human transfusion medicine. Here, we explore the specificity of antibody recognition toward ABO blood group antigens using computational modeling and biolayer interferometry. Automated docking and molecular dynamics simulations were used to explore the origin of the specificity of an anti-blood group A antibody variable fragment (Fv AC1001). The analysis predicts a number of Fv-antigen interactions that contribute to affinity, including a hydrogen bond between a HisL49 and the carbonyl moiety of the GalNAc in antigen A. This interaction was consistent with the dependence of affinity on pH, as measured experimentally; at lower pH there is an increase in binding affinity. Binding energy calculations provide unique insight into the origin of interaction energies at a per-residue level in both the scFv and the trisaccharide antigen. The calculations indicate that while the antibody can accommodate both blood group A and B antigens in its combining site, the A antigen is preferred by 4 kcal/mol, consistent with the lack of binding observed for the B antigen.

**Keywords: molecular docking, MD simulations, blood group antigens, antibody specificity, GLYCAM, AMBER**

#### **INTRODUCTION**

Since its discovery in 1900 (1), the ABO blood group system has played a crucial role in defining human blood and tissue compatibility. The blood type of an individual indicates the presence or absence of relevant antigens and antibodies. The three blood types share a core oligosaccharide antigen (H), and based on the glycosyl transferases inherited, different antigens are synthesized (2–4); type A transferase adds a terminal non-reducing *N*-acetylgalactosamine (GalNAc) residue; type B transferase adds galactose (Gal), whereas individuals with blood group O retain the unmodified H antigen. During the first years of life, the immune system forms antibodies upon exposure to non-self antigens from various exogenous factors. Thus an A-type individual will have circulating antibodies specific for the B-antigen, and vice-versa. The high degree of specificity is notable given that the only difference between the structures of the A- and B-antigens is the replacement of an acetamido moiety (in A) with a hydroxyl group (in B). Because of the presence of circulating antibodies, a mismatched blood transfusion or organ transplant can lead to hyperacute immune response and death (5, 6). Additionally, under certain circumstances, incompatibilities in blood groups between mother and child can trigger the mother's immune system to produce antibodies against the fetus, causing hemolytic disease (7).

Alterations in the structures of the ABO antigens often occur during carcinogenesis and therefore they have also been considered tumor markers (8, 9). Recently, strong correlations have been established between the presence of particular ABO and Lewis antigens and susceptibility to infectious diseases, such as *Helicobacter pylori*, norovirus, and cholera (10), wherein the blood group antigens can be exploited as receptors for bacterial and viral adhesion. Conversely, it has been suggested that endogenous anti-blood group antibodies can recognize blood-group-like carbohydrate antigens on pathogen surfaces, conferring protection against infection (11).

Despite their clinical importance, relatively little is known about the structural basis for these highly specific antibody–antigen interactions. Although X-ray crystallography has been used to characterize antibody–carbohydrate complexes, the generally enhanced flexibility and conformational heterogeneity of oligosaccharides detract from the ability to generate co-crystals (12). Additionally, anti-carbohydrate antibodies bind to their antigens with an affinity that is 3–5 orders of magnitude lower than that of typical antibodies that bind to protein or peptide antigens. Difficulties in generating 3D structures for carbohydrate–antibody complexes have led to the increasing use of theoretical structure prediction methods (13, 14), which, while convenient, are prone to predicting false positives due to inaccuracies in pose scoring functions (15) and to the omission of carbohydrate conformational preferences (16).

In this study, we examined the structural origin of the antigenicity (the specificity and affinity) of a monoclonal antibody raised against blood group A (BGA) antigen, for which an apo structure of the single-chain variable fragment (scFv AC1001) has been reported (17). The specificity data from screening two independent glycan arrays [Consortium for Functional Glycomics (v4.0, request ID: 1808) and from the group of Jeff Gildersleeve] confirmed that the scFv displayed no detectable binding to any B-antigens and only bound to BGA-containing glycans. To provide a structural interpretation for the specificity of AC1001 for BGA over blood groups H (BGH) and B (BGB), we generated a 3D model of the immune complex using molecular docking and refined it by molecular dynamics (MD) simulation. Despite its limitations, molecular docking, with or without additional experimental constraints, such as from NMR data, is often the only approach that may be employed to generate the structure of a ligand–protein complex, in the absence of direct crystallographic data. To enhance the success rate, a recent carbohydrate conformational energy function (16) was employed with AutoDock VINA (18), which quantifies the conformational preferences of oligosaccharides based on their glycosidic torsion angles. MD simulations (50 ns) were subsequently performed to ensure that the docked complexes were stable under physically realistic conditions, and in that event, the MD data were employed in binding free energy calculations. A particular advantage of MD-based energy calculations is that they provide statistically converged values that may be partitioned into contributions from individual residues in the protein and ligand (19).

#### **MATERIALS AND METHODS**

#### **CLONING, EXPRESSION, AND PURIFICATION OF scFv**

An scFv gene containing a short linker (RADAA) and the Leu 103H Val mutation (17), with a His<sub>6</sub> tag, was assembled by PCR and cloned into the phagemid pSK4. The construct was maintained in *Escherichia coli* TG1 cells. Cells from positive clones, as judged by DNA sequence analysis, were grown in minimal media, induced, and subjected to periplasmic extraction. The scFv dimer was purified from the extract by Ni<sup>2+</sup> immobilized metal affinity chromatography, by elution with an imidazole gradient.

#### **BIOLAYER INTERFEROMETRY**

Affinity measurements were performed on a biolayer interferometer (Octet Red96, ForteBio). Data were processed using the Data Acquisition and Analysis 8.0 software (ForteBio), and kinetic binding constants were determined from a 1:1 binding model using the OriginPro software (OriginLab). The scFv was immobilized on an amine reactive second-generation (AR2G) biosensor (Lot No. 1311212, ForteBio). The BGA trisaccharide was analyzed as the conjugate to bovine serum albumin (BSA–BGA) and was dissolved in an analysis buffer containing 10 mM HEPES, 150 mM NaCl, 3.4 mM EDTA, and 0.005% Tween 20 at a range of pH values (5, 5.5, 6, 6.5, and 7). A BSA–Le<sup>X</sup> trisaccharide conjugate (Prod. No. NGP0302, V-Labs, Inc.) and BSA (Prod. No. 23209, Pierce Thermo Scientific, Rockford, IL, USA) were used as negative controls. Details of the biolayer interferometry (BLI) conditions are provided in Supplemental Material.
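For readers unfamiliar with the 1:1 binding model mentioned above, the sketch below writes out the standard Langmuir expressions for the association and dissociation phases and the resulting equilibrium dissociation constant. It is a generic illustration, not the fitting procedure implemented in the ForteBio or OriginPro software, and the rate constants, concentration, and maximal response are hypothetical values chosen only for the demonstration.

```python
import math


def equilibrium_kd(k_on: float, k_off: float) -> float:
    """Equilibrium dissociation constant KD = k_off / k_on (in M)."""
    return k_off / k_on


def association_response(t: float, conc: float, k_on: float,
                         k_off: float, r_max: float) -> float:
    """Signal at time t (s) during association for a 1:1 interaction."""
    k_obs = k_on * conc + k_off                    # observed rate constant (1/s)
    r_eq = r_max * conc / (conc + k_off / k_on)    # plateau at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t: float, r_start: float, k_off: float) -> float:
    """Signal at time t (s) after the analyte is removed."""
    return r_start * math.exp(-k_off * t)


# Hypothetical parameters: k_on in 1/(M*s), k_off in 1/s, analyte in M.
k_on, k_off, conc, r_max = 1.0e4, 1.0e-2, 2.0e-6, 1.0
print("KD (M):", equilibrium_kd(k_on, k_off))
for t in (0, 30, 120, 600):
    print(t, round(association_response(t, conc, k_on, k_off, r_max), 4))
```

In practice, sensorgrams recorded at several analyte concentrations (and, here, several pH values) would be fitted to these expressions to extract the rate constants.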

#### **AUTOMATED DOCKING**

Docking was performed using AutoDock VINA (18), with 20 docked poses generated for each experiment. The protein and ligand files were prepared using AutoDock Tools (ADT) (20), with Gasteiger (21) partial atomic charges assigned to both the protein and ligand residues. The crystal structure of the scFv (PDB ID: 1JV5) was employed, together with a 3D structure of BGA obtained from the GLYCAM-Web server (www.glycam.org). Crystal waters were removed prior to docking and hydrogen atoms were added to the protein using ADT, whereas hydrogen atoms in the ligand were assigned from the GLYCAM residue templates. The glycosidic φ and ψ torsion angles were allowed to be flexible during docking, as were all the hydroxyl groups. The protein was maintained rigid. The docking grid box (dimensions: 26.25 Å × 26.25 Å × 37.5 Å) was centered relative to the complementarity determining regions (CDRs) of the antibody as described previously (16). For the mutational-docking approach, TrpH100 was mutated to Ala by deleting the side-chain atoms of the Trp residue in the crystal structure, followed by processing with the tleap module in AMBER (22). After docking, AlaH100 was reverted to Trp by restoring the crystal coordinates of the TrpH100 side chain. The docked poses from the mutational approach were filtered based on clashes with the reverted Trp. Poses in which the clashes could not be eliminated by implicit-solvent energy minimization (details are in the "MD simulations" section) were rejected. Ligand conformations of all the docked poses from both the flexible and mutational-docking approaches were scored using the recently reported carbohydrate intrinsic (CHI) energy scoring function (16). Any conformations with total CHI-energies >5 kcal/mol were rejected. The BGB complex was generated directly from the BGA complex by simple replacement of the NAc group with an OH group.
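The pose-rejection rules stated above (unresolved clashes with the reverted Trp, and total CHI-energy above 5 kcal/mol) can be summarized as a simple filter. The sketch below is schematic: the `DockedPose` record, its field names, and the example values are hypothetical, and it assumes that the clash assessment and CHI-energy have already been computed by other tools.

```python
from dataclasses import dataclass
from typing import List

CHI_ENERGY_CUTOFF = 5.0  # kcal/mol; conformations above this value are rejected


@dataclass(frozen=True)
class DockedPose:
    name: str
    vina_score: float       # docking score, kcal/mol (more negative ranks higher)
    chi_energy: float       # total CHI-energy of the ligand conformation, kcal/mol
    unresolved_clash: bool  # True if minimization could not remove the Trp clash


def filter_poses(poses: List[DockedPose]) -> List[DockedPose]:
    """Keep poses that pass both rejection rules, best docking score first."""
    kept = [p for p in poses
            if not p.unresolved_clash and p.chi_energy <= CHI_ENERGY_CUTOFF]
    return sorted(kept, key=lambda p: p.vina_score)


# Invented example poses, for illustration only.
poses = [
    DockedPose("pose_01", -6.1, 1.2, False),
    DockedPose("pose_02", -6.8, 7.4, False),  # rejected: CHI-energy too high
    DockedPose("pose_03", -5.9, 0.8, True),   # rejected: unresolved clash
    DockedPose("pose_04", -6.5, 4.9, False),
]
print([p.name for p in filter_poses(poses)])
```

Sorting by docking score after filtering is a choice made for this example; the text does not specify how surviving poses were ranked.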

#### **MD SIMULATIONS**

All the MD simulations were performed with the GPU implementation of the pmemd code, pmemd.cuda\_SPDP (23), from AMBER12 (22). The calculations employed the ff99SB (24) parameters for the protein and the GLYCAM06h (25) parameters for the carbohydrate. For the BGA– and BGB–scFv complex simulations, an implicit-solvent energy minimization (5000 steps of steepest descent followed by 5000 steps of conjugate gradient) was performed to optimize the side-chain position of the reverted Trp residue. During this minimization, the backbone atoms of the framework regions were restrained with a force constant of 5 kcal/mol·Å<sup>2</sup>, while the CDRs and the ligand were allowed to be flexible. The systems were then solvated in a cubic water box [120 Å per side, with TIP3P water (26)]. Each system was energy minimized using explicit solvent (10,000 steps of steepest descent, 10,000 steps of conjugate gradient). During this energy minimization, the protein residues were restrained with a force constant of 100 kcal/mol·Å<sup>2</sup>, allowing only the solvent and ligand to relax. This minimization was followed by heating from 5 to 300 K over the course of 50 ps at constant volume. Production MD simulations were performed for 50 ns at constant pressure (NPT ensemble) with the temperature held constant at 300 K using a Langevin thermostat. During the heating and the production MD, the backbone atoms of the protein were restrained with a force constant of 5 kcal/mol·Å<sup>2</sup>, with the protein side chains and ligand atoms allowed to be flexible. The backbone atoms were restrained in order to ensure that the protein fold remained stable during the course of the simulation. For the BGA trisaccharide MD simulation, the system was solvated in a cubic water box (120 Å per side, with TIP3P water) and energy minimized using explicit solvent (5000 steps of steepest descent, 5000 steps of conjugate gradient). This was followed by heating from 5 to 300 K over a period of 50 ps at constant volume. Production MD simulations were performed for 50 ns at constant pressure (NPT). During the minimization, heating, and production MD simulations, no restraints were placed on the trisaccharide. For both the BGA– and BGB–scFv complexes and the BGA trisaccharide simulations, all covalent bonds involving hydrogen atoms were constrained using the SHAKE (27) algorithm, allowing a time step of 2 fs. A non-bonded cut-off of 8 Å was used and long-range electrostatics were treated using the particle mesh Ewald (PME) method (28). Snapshots were collected at 1 ps intervals for subsequent analysis.
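To make the simulation lengths above concrete, the short calculation below converts the stated durations, time step, and snapshot interval into integration-step and frame counts. The function name is illustrative and is not part of any AMBER tool.

```python
from typing import Tuple


def md_step_counts(total_ns: float, timestep_fs: float,
                   save_interval_ps: float) -> Tuple[int, int, int]:
    """Return (integration steps, steps between snapshots, snapshots saved)."""
    total_fs = total_ns * 1_000_000            # 1 ns = 1,000,000 fs
    n_steps = round(total_fs / timestep_fs)
    steps_per_frame = round(save_interval_ps * 1000 / timestep_fs)  # 1 ps = 1000 fs
    n_frames = n_steps // steps_per_frame
    return n_steps, steps_per_frame, n_frames


# Production run: 50 ns, 2 fs time step, snapshots every 1 ps.
print(md_step_counts(50, 2, 1))        # (25000000, 500, 50000)

# Heating phase: 50 ps (0.05 ns) at the same 2 fs time step.
print(md_step_counts(0.05, 2, 1)[0])   # 25000
```

Each 50 ns production trajectory therefore corresponds to 25 million integration steps and 50,000 stored snapshots available for the subsequent analyses.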

#### **ANALYSIS**

The stability of the complexes was assessed by monitoring the root-mean-square deviation (RMSD) of the ligand position, the glycosidic torsion angles, the ring conformations, and the protein–ligand hydrogen bonds. All of these quantities, except for the ring conformations, were computed using the ptraj module of AMBERTOOLS 12 (29). Ligand RMSD values were calculated for the ring atoms, relative to the first time step of the simulation. Hydrogen bond interactions between the protein and the ligand were measured with distance and angle cut-off values of 3.5 Å and 120°, respectively. The ring conformations of each individual residue in the ligand during the course of the simulation were analyzed using the recently reported BFMP method (Makeneni et al., submitted). Binding free energies were calculated with the MM-GBSA (30, 31) module in AMBERTOOLS 12. All the water molecules were removed prior to the MM-GBSA calculation, and desolvation free energies were approximated using the generalized Born implicit solvation model (igb = 2) (32).
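The geometric hydrogen-bond criterion described above can be sketched in a few lines of Python. This is a minimal illustration and not the ptraj implementation: it assumes that the 3.5 Å cut-off applies to the donor–acceptor heavy-atom distance and that the 120° cut-off is a lower bound on the donor–H···acceptor angle. The function names and coordinates are hypothetical.

```python
import math


def hbond_geometry(donor, hydrogen, acceptor):
    """Return (donor-acceptor distance in A, donor-H-acceptor angle in degrees)."""
    dist = math.dist(donor, acceptor)
    # Vectors pointing from the hydrogen to the donor and to the acceptor
    v1 = [d - h for d, h in zip(donor, hydrogen)]
    v2 = [a - h for a, h in zip(acceptor, hydrogen)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))  # guard against rounding
    return dist, math.degrees(math.acos(cos_angle))


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, min_angle=120.0):
    """Apply the distance and angle cut-offs to one donor/H/acceptor triplet."""
    dist, angle = hbond_geometry(donor, hydrogen, acceptor)
    return dist <= max_dist and angle >= min_angle


# Hypothetical coordinates (A): a linear arrangement, then a bent one
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.9, 0.0, 0.0)))
print(is_hbond((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.5, 0.0)))
```

Applying such a test to every saved snapshot and dividing the number of hits by the number of frames gives the hydrogen-bond occupancy over the trajectory.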

#### **RESULTS AND DISCUSSION**

#### **DOCKING ANALYSIS**

In preliminary experiments, docking to the rigid scFv structure yielded complexes that failed to remain stable during subsequent 10 ns MD simulations (Table S1 in Supplementary Material). The spontaneous dissociation of the complex during MD simulation suggested that the docking had failed to detect the correct, high-affinity pose (33). Upon inspection of the MD data, it was observed that light chain residue His49 (HisL49) forms a stacking interaction with heavy chain residue Trp100 (TrpH100), which occupies a large volume of the presumed binding site, potentially preventing deeper penetration of the ligand (**Figure 1**).

As Trp residues can also form stacking interactions with the apolar face of monosaccharides in antibody complexes (34), we hypothesized that the trisaccharide ligand might compete for formation of such an interaction with TrpH100. For example, the galactose (Gal) residue in a *Salmonella* trisaccharide antigen stacks against TrpL93 in the complex with Fab Se155-4 (34). In addition, in the same complex, TrpH33 stacks against the C-6 position in the 6-deoxy sugar abequose. The BGA antigen contains GalNAc and a 6-deoxy monosaccharide (fucose, Fuc); a revised docking experiment was therefore sought that would permit the formation of such interactions with the aromatic residues in the binding pocket. Two alternative docking experiments were designed: in the first, the side-chain torsion angles of TrpH100 were allowed to be flexible during docking (termed flexible residue docking), while in the second, TrpH100 was mutated to Ala prior to docking and then reverted to Trp after docking (mutational residue docking). The docked poses were filtered based on three criteria. First, poses in which the GalNAc was not located within the binding pocket were eliminated (**Figure 2C**). This criterion was adopted based on the results from two array screenings, which indicated that the antibody interacts exclusively with the BGA antigens (Tables S2 and S3 in Supplementary Material), and because the only structural difference between BGA and BGB is the presence of the NAc moiety in the former. Therefore, it was hypothesized that the ability of the antibody to discriminate between these two antigens would be dependent on interactions with this residue. Second, in the case of the mutational approach, poses were rejected if the Ala-to-Trp reversion led to irreconcilable steric clashes with the antigen (**Figure 2B**). Third, all the docked poses obtained from each of these approaches were scored using a CHI scoring function. After applying these criteria, both docking approaches identified essentially equivalent antigen poses (0.48 Å RMSD between ligand positions) (**Figure 2A**), in which the C-6 atom of the GalNAc forms a CH/π stacking interaction with the TrpH96. This complex was selected for further analysis by MD simulation.

**FIGURE 1 | (A)** Docked antigen A (green) from preliminary docking experiments with residues lining the binding pocket (shown in yellow). The antibody is shown in gray. **(B)** Residues lining the binding pocket before (yellow) and after (ice blue) the 50 ns MD simulation. Residues HisL49 and TrpH100 (shaded rings) form stacking interactions during the course of the simulation thereby causing the ligand to become unstable.

**FIGURE 2 | Docked complexes of BGA (stick structure) in the scFv binding site (heavy and light chains shown as solvent accessible surfaces in cyan and pink, respectively, the TrpH100 surface is shown in dark blue)**. **(A)** The stick structures in green and yellow represent the best-docked poses from the TrpH100-mutagenesis and the flexible residue docking approaches, respectively. **(B)** An example of a docked pose (red) that was eliminated on the basis of clashes ensuing from the AlaH100Trp mutation. **(C)** An example of a docked pose (red) that was eliminated on the basis of the orientation of the ligand in the binding pocket.

the MD simulation that is closest to the average RMSD of the structure during the simulation. Unless shown with an H, all residues are from VL.

**Table 1 | Comparison of glycosidic torsion angles between experimentally observed values and average values obtained from the MD simulations**.

<sup>a</sup>Glycosidic torsion angles for the GalNAcα(1,3)Gal (φ1, ψ1).

<sup>b</sup>Torsion angles for Fucα(1,2)Gal (φ2, ψ2).

#### **STRUCTURAL STABILITY OF THE IMMUNE COMPLEXES**

#### **Blood group A**

The final docked model of the blood group antigen A bound to the antibody remained stable during the course of a 50 ns simulation based on the RMSD of the ring atoms of the ligand, which remained between 2 and 4 Å over the course of the simulation (**Figure 5**). An analysis of the ring conformational preferences showed that all three residues in the trisaccharide remained in the <sup>4</sup>C<sub>1</sub> chair conformation. The φ- and ψ-glycosidic torsion angles for the GalNAcα(1,3)Gal (φ1, ψ1) and Fucα(1,2)Gal (φ2, ψ2) linkages were monitored throughout both simulations (BGA–scFv complex and BGA trisaccharide in solution), and the average values were found to be in agreement with the values observed for the same trisaccharide in the complex with *Dolichos biflorus* lectin as well as the conformations of the trisaccharide in solution (35) (**Table 1**). The stacking interaction between the GalNAc and TrpH96 was characterized by the angle (θ) between the normals to the ring planes and the distance (*R*) between their centroids (36). For an ideal stacking conformation, θ should be around 180° or 0°, and for a CH/π interaction, it should be around 90°. The average θ value was close to the latter at 108° (with a standard deviation of 9°) at a distance of 6.5 Å.
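The two stacking descriptors used for the GalNAc–Trp interaction, the angle θ between the ring-plane normals and the centroid–centroid distance *R*, can be computed from ring-atom coordinates as in the sketch below. It is an illustrative sketch rather than the procedure of reference (36): the plane normal is estimated with Newell's method, which assumes the ring atoms are listed in order around the ring, and the function names and test coordinates are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def ring_normal(points):
    """Unit normal of a (near-)planar ring via Newell's method.

    Atoms must be listed in order around the ring.
    """
    nx = ny = nz = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(points, points[1:] + points[:1]):
        nx += (y1 - y2) * (z1 + z2)
        ny += (z1 - z2) * (x1 + x2)
        nz += (x1 - x2) * (y1 + y2)
    norm = math.sqrt(nx * nx + ny * ny + nz * nz)
    return (nx / norm, ny / norm, nz / norm)


def stacking_geometry(ring_a, ring_b):
    """Return (theta in degrees, R in the units of the input coordinates)."""
    na, nb = ring_normal(ring_a), ring_normal(ring_b)
    cos_t = max(-1.0, min(1.0, sum(a * b for a, b in zip(na, nb))))
    theta = math.degrees(math.acos(cos_t))
    r = math.dist(centroid(ring_a), centroid(ring_b))
    return theta, r


# Hypothetical test rings: a planar hexagon, a parallel copy 3.5 A above it,
# and a perpendicular (edge-on) copy.
hexagon = [(math.cos(math.radians(60 * k)), math.sin(math.radians(60 * k)), 0.0)
           for k in range(6)]
parallel = [(x, y, z + 3.5) for x, y, z in hexagon]
edge_on = [(x, 0.0, y + 5.0) for x, y, _ in hexagon]

print(stacking_geometry(hexagon, parallel))  # theta near 0: face-to-face
print(stacking_geometry(hexagon, edge_on))   # theta near 90: perpendicular
```

Because the sign of a plane normal depends on the atom ordering, θ values near 0° and near 180° describe the same parallel arrangement, which is why both are quoted as ideal stacking.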

**Table 2 | Hydrogen bonds between BGA and the scFv during the MD simulation**.


<sup>a</sup>Standard deviations in parentheses.

During the course of the MD simulation, the side chain of HisL49 was observed to flip from its initial orientation (χ<sub>2</sub> = ⟨−73°⟩) to one (⟨115°⟩) in which it could form a hydrogen bond with the N-acetyl group of the GalNAc residue (**Figures 3** and **4**; **Table 2**). This interaction remained stable for the remainder of the 50 ns simulation. This side-chain flip may represent an example of induced fit during ligand binding; however, at the resolution of the present X-ray data (2.2 Å), it is not possible to reliably discriminate between histidine χ<sub>2</sub> rotamers (37).

#### **Blood group B**

To probe the specificity of the antibody for antigen B, the scFv was screened experimentally against an array of neoglycoconjugates including ABO and related blood group antigens. The screening confirmed the exclusive specificity of the antibody for BGA-related antigens (Tables S2 and S3 in Supplementary Material). Computational carbohydrate grafting (39) of the relevant glycans from the array onto the bound BGA trisaccharide in the scFv complex confirmed that all of the BGA- and BGB-related glycans could be accommodated in the binding pocket (Table S3 in Supplementary Material). Therefore, the lack of binding of the BGB-glycans does not appear to be due to steric collisions, but rather to the loss of affinity arising from the absence of the NAc group that is present in the BGA congeners. MD simulation of the BGB–scFv complex was employed to examine the effect of the loss of the NAc moiety on the stability and affinity of the putative immune complex. Although the MD simulations of the two complexes (BGA and BGB) were started with the antigens aligned in identical binding modes, the BGB antigen dissociated from the antibody after a relatively short simulation period of 10 ns. To eliminate the possibility that this instability arose from artifacts of the conversion of the BGA to the BGB antigen, two additional simulations were performed with independent initial atomic velocities. In both cases, the ligand appeared to dissociate from the antibody after approximately 10 ns (**Figure 5**). To enable comparison with the BGA complex, only the data from the initial stable 10 ns period of the BGB complex were chosen for analysis.

In the antigen–scFv complexes, the Gal or GalNAc residues are flanked by residues TyrL50, AsnL34, and HisL49 on one side of the antigen (Group 1) and residues TrpH100 and TrpL96 (Group 2) on the other; the Fuc interacts with GlyL91 and AsnL92 (Group 3) (**Figure 6**). In contrast to the case of the BGA antigen, in the BGB–scFv simulation HisL49 does not form a stabilizing interaction with the terminal Gal residue. Additionally, the Gal and Fuc residues display enhanced flexibility owing to the loss of stabilizing interactions with residues from Groups 2 and 3.

#### **INVOLVEMENT OF HISL49 IN BINDING AFFINITY**

**BGA (green) and BGB (from three independent simulations, blue, purple, and red) antigens, relative to the starting conformation of the complex**.

All histidines in the scFv were protonated by default for modeling with a hydrogen atom at the δ nitrogen position. During the MD simulation of the BGA–scFv complex, the χ<sub>2</sub> angle of HisL49 flips (−73° to 115°), enabling a hydrogen bond to form with the carbonyl moiety of the NAc group in the GalNAc residue in BGA, which would be expected to enhance the stability of the BGA–scFv complex. In the BGB complex, the same HisL49 forms an interaction with the non-terminal Gal residue. The interaction with HisL49 suggests that there might also be a pH dependence on binding; at lower pH all histidines would be positively charged, potentially enhancing the strength of the HisL49–BGA hydrogen bond, leading to higher binding affinity. This prediction was confirmed by BLI measurements, which showed a marked decrease in the apparent *K*<sub>D</sub> as the pH dropped below the p*K*<sub>a</sub> of histidine (**Figure 7**). It should be noted that this protonation would not be localized to HisL49; nevertheless, no enhanced non-specific binding was observed at low pH for either BSA or BGA–Le<sup>x</sup> (Figures S1–S3 in Supplementary Material), supporting a role for a direct interaction between HisL49 and the BGA antigen.
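The pH argument above can be made semi-quantitative with the Henderson–Hasselbalch relation, which gives the fraction of a titratable group that is protonated at a given pH. The sketch below uses the histidine p*K*<sub>a</sub> of 6.04 quoted in the caption of Figure 7; it assumes a single, independent titration site with the p*K*<sub>a</sub> of free histidine, which is a simplification because the p*K*<sub>a</sub> of HisL49 in the folded scFv may differ. The function name and the pH values chosen are illustrative.

```python
def fraction_protonated(ph, pka=6.04):
    """Henderson-Hasselbalch: fraction of the group in its protonated form."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Illustrative pH values spanning the histidine pKa
for ph in (5.0, 6.04, 7.4):
    print(f"pH {ph:4.2f}: {100 * fraction_protonated(ph):5.1f}% protonated")
```

Under these assumptions, most histidine side chains are charged one pH unit below the p*K*<sub>a</sub> and only a few percent remain charged at physiological pH, which is the regime over which the apparent *K*<sub>D</sub> was reported to change.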

#### **BINDING ENERGY ANALYSIS**

**FIGURE 7 | The reference (BSA)-subtracted pH dependence of the apparent *K*<sub>D</sub> for the interaction between scFv AC1001 and the BSA–BGA conjugate**. Error bars are derived from replicates of five measurements. Note, the p*K*<sub>a</sub> of histidine is 6.04 (40).

<sup>a</sup>Key residues defined as those that contribute >0.5 kcal/mol to the total interaction energy for either the BGA or BGB in the complexes. Only the initial stable 10 ns period of the BGB simulation was employed, whereas the entire 50 ns trajectory for BGA was analyzed.

<sup>b</sup>Upper row, values for BGA; lower, BGB.

A per-residue decomposition of the interaction energies in the immune complexes indicated that, in the case of BGA, the GalNAc residue contributed 25% (-8.1 kcal/mol) toward the binding energy, compared to a reduced (-4.3 kcal/mol) contribution from the corresponding Gal residue in BGB (**Table 3**). This loss of 4 kcal/mol of interaction energy is the predominant difference between the two antigens, and would be enough to reduce the affinity by nearly 800-fold, consistent with the lack of apparent binding of the BGB analogs in the glycan array screening. In addition, this analysis identified the residues that contributed significantly toward antigen binding.
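The link between a 4 kcal/mol loss of interaction energy and a roughly 800-fold change in affinity follows from the standard relation between a free energy difference and a ratio of equilibrium constants, fold change = exp(ΔΔG/RT). The sketch below evaluates it at 300 K, the simulation temperature; treating the interaction energy difference as a binding free energy difference is an approximation, and the function name is illustrative.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def affinity_fold_change(ddg_kcal, temp_k=300.0):
    """Ratio of dissociation constants implied by a free energy difference."""
    return math.exp(ddg_kcal / (R_KCAL * temp_k))


print(round(affinity_fold_change(4.0)))  # rounded loss quoted in the text
print(round(affinity_fold_change(3.8)))  # unrounded difference, 8.1 - 4.3
```

At 300 K the rounded value of 4 kcal/mol gives a factor of about 800, while the unrounded difference of 3.8 kcal/mol gives a somewhat smaller factor of several hundred; either is large enough to be consistent with the absence of detectable BGB binding.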

In the BGA–scFv complex, residues from CDR L3 make the largest contribution to binding (GlyL91 + TrpL96 + AsnL92 + ThrL93 = -7.2 kcal/mol), followed by H3 (AsnH98 + TrpH100 + LeuH99 = -5.5 kcal/mol), L1 (TyrL32 + AsnL34 = -4.5 kcal/mol), and L2 (TyrL50 = -1.02 kcal/mol). In contrast, in the case of BGB, the same residues from L3 contribute less than a total of 1 kcal/mol to the interaction energies. The most significant single residues are TyrL32, GlyL91, TrpH100, and TrpL96, each of which contributes more than 2 kcal/mol and which together account for 50% of the total affinity. Residues GlyL91 and AsnL92, which form hydrogen bonds with the Fuc residue, together contribute -4.0 kcal/mol to the binding of BGA, but fail to make any stable interactions in the BGB simulation and therefore contribute negligibly to the affinity. It is these interactions that provide the predominant contributions to the preferential binding of the BGA antigen. Although HisL49 does not form any stable hydrogen bonds with the terminal Gal in the BGB complex, it is able to form new, albeit transient, interactions with the non-terminal Gal for 30% of the stable simulation period. Therefore, while the per-residue contribution values indicate that HisL49 makes a contribution greater than -1.5 kcal/mol in both cases, the interactions it forms in BGA are more stable than those in BGB.

#### **CONCLUSION**

In this study, 3D models of the BGA and BGB trisaccharides in complex with scFv AC1001 were generated that provided a detailed atomic level rationalization of the interactions and dynamics responsible for antigen specificity. Quantification of the binding affinities identified key residues in the binding site that are predicted to contribute to specific and non-specific interactions with each antigen and led to the confirmed prediction of enhanced binding at lower pH. The spontaneous dissociation of antigen B from the scFv–BGB complexes (in three different simulations) is consistent with the known preference of this antibody for the A antigen, and supports a role for MD simulations in overcoming limitations associated with ligand docking. The present study illustrates that integration of multiple experimental (affinity measurements, glycan array screening, and crystallography) and theoretical (ligand docking, MD simulation, and energy decomposition) methods provides a powerful platform for predicting the origin of antibody–carbohydrate specificity.

#### **AUTHOR CONTRIBUTIONS**

Spandana Makeneni, N. Martin Young, and Robert J. Woods conceived and designed the experiments. Spandana Makeneni, Ye Ji, and David C. Watson performed the experiments. Spandana Makeneni, Ye Ji, and Robert J. Woods analyzed the data. Spandana Makeneni, Ye Ji, N. Martin Young, David C. Watson, and Robert J. Woods contributed reagents/materials/analysis tools. Spandana Makeneni, N. Martin Young, and Robert J. Woods wrote the paper.

### **ACKNOWLEDGMENTS**

We would like to thank Dr. Jeffery C. Gildersleeve and W. Shea Wright for screening the scFv against the carbohydrate array. We wish to acknowledge the Consortium for Functional Glycomics grant number GM62116 for screening the antibody against their array. We thank the National Institutes of Health [R01 GM094919 (EUREKA) and P41 GM103390], as well as the Science Foundation of Ireland (08/IN.1/B2070) and the European Research Development Fund for support.

#### **SUPPLEMENTARY MATERIAL**

Included in the supplemental information are details and raw data from the BLI experiments. Results from screening of the scFv against both CFG and carbohydrate arrays (Dr. Jeffery Gildersleeve's group) are also presented in the supplemental material. The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fimmu.2014.00397

#### **REFERENCES**


39. Tessier MB, Grant OC, Heimburg-Molinaro J, Smith D, Jadey S, Gulick AM, et al. Computational screening of the human TF-glycome provides a structural definition for the specificity of anti-tumor antibody JAA-F11. *PLoS One* (2013) **8**(1):e54874. doi:10.1371/journal.pone.0054874

40. Wood EJ. Data for biochemical research (third edition) by R M C Dawson, D C Elliott, W H Elliott and K M Jones. *Biochem Educ* (1987) **15**(2):97. doi:10.1016/0307-4412(87)90110-5

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 02 July 2014; accepted: 05 August 2014; published online: 22 August 2014. Citation: Makeneni S, Ji Y, Watson DC, Young NM and Woods RJ (2014) Predicting the origins of anti-blood group antibody specificity: a case study of the ABO A- and B-antigens. Front. Immunol. 5:397. doi: 10.3389/fimmu.2014.00397*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Makeneni, Ji, Watson, Young and Woods. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Differential site accessibility mechanistically explains subcellular-specific N-glycosylation determinants

#### **LingYen Lee, Chi-Hung Lin, Susan Fanayan, Nicolle H. Packer and Morten Thaysen-Andersen\***

Department of Chemistry and Biomolecular Sciences, Biomolecular Frontiers Research Centre, Macquarie University, Sydney, NSW, Australia

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Jamie Heimburg-Molinaro, Emory University, USA Tony Velkov, Monash University, Australia

#### **\*Correspondence:**

Morten Thaysen-Andersen, Department of Chemistry and Biomolecular Sciences, Biomolecular Frontiers Research Centre, Macquarie University, Sydney, NSW 2109, Australia

e-mail: morten.andersen@mq.edu.au

Glycoproteins perform extra- and intracellular functions in innate and adaptive immunity by lectin-based interactions with exposed glyco-determinants. Herein, we document and mechanistically explain the formation of subcellular-specific N-glycosylation determinants on glycoproteins trafficking through the shared biosynthetic machinery of human cells. LC-MS/MS-based quantitative glycomics showed that the secreted glycoproteins of eight human breast epithelial cells displaying diverse geno- and phenotypes consistently displayed more processed, primarily complex type, N-glycans than the high-mannose-rich microsomal glycoproteins. Detailed subcellular glycome profiling of proteins derived from three breast cell lines (MCF7/MDA468/MCF10A) demonstrated that secreted glycoproteins displayed significantly more α-sialylation and α1,6-fucosylation, but less α-mannosylation, than both the intermediately glycan-processed cell-surface glycoproteomes and the under-processed microsomal glycoproteomes. Subcellular proteomics and gene ontology revealed substantial presence of endoplasmic reticulum resident glycoproteins in the microsomes and confirmed significant enrichment of secreted and cell-surface glycoproteins in the respective subcellular fractions. The solvent accessibility of the glycosylation sites on maturely folded proteins of the 100 most abundant putative N-glycoproteins observed uniquely in the three subcellular glycoproteomes correlated with the glycan type processing, thereby mechanistically explaining the formation of subcellular-specific N-glycosylation. In conclusion, human cells have developed mechanisms to simultaneously and reproducibly generate subcellular-specific N-glycosylation using a shared biosynthetic machinery. This aspect of protein-specific glycosylation is important for structural and functional glycobiology and is discussed here in the context of the spatio-temporal interaction of glyco-determinants with lectins central to infection and immunity.

**Keywords: N-glycosylation, solvent accessibility, N-glycome, subcellular location, glycoproteome, glycosylation site, N-glycan, glycoprotein**

#### **INTRODUCTION**

Significant parts of the human genome and cellular energy are dedicated to producing and regulating protein glycosylation (1). Hence, it is no surprise that this abundant post-translational modification is important in a wide spectrum of biological processes to maintain cellular homeostasis (2). Dysregulation of protein glycosylation is a cause and/or effect of numerous pathological conditions including, but not limited to, congenital disorders of glycosylation (3), cystic fibrosis (4), inflammation (5), auto-immunity (6), and cancer (7). The extracellular location of secreted and cell-surface-tethered proteins carrying *N*-linked glycosylation is ideal for facilitating molecular interactions with the surrounding environment (8). Intracellular functions of *N*-glycoproteins are also known (9, 10). The terminal determinants of host *N*-glycans (so-called "self" and "altered self" in disease) are recognized by endogenous and exogenous glycan-binding proteins commonly called lectins. Interactions between lectins and *N*-glycans are central in innate and adaptive immunity (11). Important examples include the C-type lectins, which may be crudely divided into lectins having affinity for α-mannose/α-fucose-terminated *N*-glycans, including dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN), macrophage mannose receptors, and Langerin (12), and lectins having affinity for galactose/GalNAc-terminating glycans, such as macrophage galactose lectin and DC-asialoglycoprotein receptor (13, 14). In addition, siglecs (I-type lectins) and galectins (S-type lectins) are important for facilitating a functional immune response (15).

The human *N*-glycosylation biosynthetic machinery is relatively well understood (16, 17). In brief, the synthesis is initiated by the transfer of common immature glycan precursors, i.e., Glc<sub>3</sub>Man<sub>9</sub>GlcNAc<sub>2</sub>, to conserved sequons (NxT/S, x ≠ P) on translocating polypeptide chains. The glycan precursor is then remodeled through sequential trimming and elongation by specific glycosidases and glycosyltransferases located in the endoplasmic reticulum (ER) and the *cis*-, medial, and *trans*-Golgi, respectively. This series of enzymatic processes first results in the trafficking *N*-glycoproteins carrying high-mannose-type *N*-glycans, which progress to the hybrid- and complex-type stage if sufficient interactions with the processing enzymes occur (17). The Golgi-based *N*-glycan processing, including the formation of glycan types and the addition of terminal determinants such as α-fucosylation and α-sialylation, occurs on maturely folded glycoproteins (18, 19). An extensive and reproducible repertoire of *N*-glycans is usually present on a given glycosylation site (20). This *N*-glycan microheterogeneity on proteins results from incomplete processing by the multiple competing enzymatic reactions that can be influenced by cellular factors including the availability of nucleotide sugars, glycosylation enzyme activity, and glycoprotein trafficking time through the biosynthetic machinery. Such cellular factors contribute to cell- and tissue-specific *N*-glycosylation (21). Importantly, the structures of the individual glycoproteins trafficking through the glycosylation machinery dramatically influence the degree of *N*-glycan processing, creating protein- and site-specific *N*-glycosylation (22).
By thorough literature-based curation of published site-specific glycoprofiling data of mammalian *N*-glycoproteins, we recently confirmed that several structural features including glycan type formation, α1,6- (core) fucosylation, and β1,4/6-GlcNAc branching of *N*-glycans are strongly correlated with the solvent accessibility of the glycosylation sites of maturely folded glycoproteins (19). As such, extensive *N*-glycan processing was observed for proteins displaying solvent accessible glycosylation sites relative to spatially hidden sites. Thus, differential site accessibility can explain how glycoproteins produced simultaneously in the same cell, and even sequons on the same glycoproteins, can present widely different *N*-glycan structural repertoires.
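The sequon rule mentioned above (NxT/S, where x is any residue except proline) is straightforward to express as a pattern search over a protein sequence. The following Python sketch is only an illustration of that rule: the function name and the toy sequence are hypothetical, and the presence of a sequon indicates a potential, not an occupied, *N*-glycosylation site.

```python
import re

# Lookahead so that overlapping sequons (e.g., N-N-S-S) are all reported
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence):
    """Return (1-based Asn position, tripeptide) for each N-x-S/T, x != P."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(sequence.upper())]


# Toy sequence (hypothetical): NGT is a sequon, NPS is excluded by the proline
print(find_sequons("MNGTNPSANNSS"))
```

A search of this kind only identifies candidate sites; whether a given site is glycosylated, and how far its glycans are processed, depends on the structural context discussed in the text.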

Considering the importance of protein- and site-specific *N*-glycosylation in many aspects of glycobiology including glycoimmunology, we here seek to further explore this feature in the context of the multiple subcellular glycoproteomes that traffic through the shared glycosylation machinery in the secretory pathway of human cells, yet end up at different cellular locations. Due to the functional implications of both intra- and extracellular *N*-glycoproteins, we focus on the secreted, cell-surface, and intracellular glycoproteomes, the latter fraction largely represented by microsomal proteins (23). Understanding how the subcellular glycoproteomes are generated and regulated under normal and altered physiological conditions of the cell is valuable to the understanding of their involvement in immune biology. Recent analytical developments in glycomics (24–27) and glycoproteomics (28–31) have, together with more conventional proteomics, enabled sensitive and detailed system-wide investigations of the regulation of protein *N*-glycosylation in immunity (32).

Using LC-MS/MS-based glycomics and proteomics on multiple subcellular fractions from a panel of human cell lines displaying diverse cellular characteristics, we here document that human cells have developed a general mechanism to reproducibly generate vastly different *N*-glycan determinants on their differently located subcellular glycoproteomes that trafficked simultaneously through a shared biosynthetic machinery. We provide evidence that the subcellular-specific protein *N*-glycosylation arises from differential solvent accessibilities of the glycosylation sites of maturely folded glycoproteins that localize to different subcellular compartments following the glycan processing. This aspect of protein-specific glycosylation is discussed here in the context of immunity and infection due to the crucial role of endogenous and exogenous lectins recognizing exposed self, and altered self, glyco-determinants to facilitate the functional immune response.

#### **MATERIALS AND METHODS**

#### **CELLULAR ORIGIN, CULTURE CONDITIONS, AND DOUBLING TIME**

Multiple human cells showing diverse geno- and phenotypical characteristics were used to demonstrate the general nature of the cellular mechanisms observed in this study. Human mammary epithelial cells (HMEC) were purchased (product # CC-2551, Lonza). Human breast epithelial cell lines MCF10A, MCF7, SKBR3, MDA-MB-157 (MDA157), MDA-MB-231 (MDA231), and HS578T as well as a human colon cancer epithelial cell line SW480 were obtained from American Type Culture Collection (Manassas, VA, USA). HMEC was grown in HuMEC Ready Media (Invitrogen). MCF10A was cultured in DMEM/F12 with the addition of 5% horse serum (Invitrogen), 20 ng/mL epidermal growth factor (EGF) (Invitrogen), 0.5 µg/mL hydrocortisone (Sigma), 100 ng/mL cholera toxin (Sigma), and 8 µg/mL insulin (Invitrogen). Other cell lines were grown in RPMI (Sigma) supplemented with 5% fetal bovine serum (FBS) (Invitrogen), 10 mM glutamine (Invitrogen), and 10 µg/mL insulin. Cells were maintained at 37°C in 5% CO<sub>2</sub> for all experiments. The breast cell lines were grown in triplicate to ~80% confluence, washed at least four times with ice-cold phosphate buffered saline (PBS) to remove traces of FBS, and incubated in serum-free media at 37°C in 5% CO<sub>2</sub> for 48 h prior to subcellular fractionation.

To measure the cellular doubling times of the breast cell lines, cells were seeded at 1.3 × 10<sup>4</sup> cells/mL/well in six-well plates and incubated overnight at 37°C in 5% CO<sub>2</sub>. Cells were counted every 24 h over a four-day period using a cell counter (Bio-Rad). The doubling time for each cell line was determined from its exponential growth phase. For an overview of the investigated cells and associated data, see Table S1 in Supplementary Material.
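For cells in exponential growth, the doubling time follows from two counts taken a known interval apart: T<sub>d</sub> = Δt · ln 2 / ln(N<sub>2</sub>/N<sub>1</sub>). The sketch below applies this standard relation; it is a generic illustration rather than the authors' exact fitting procedure, and the cell counts used in the example are hypothetical.

```python
import math


def doubling_time(count_start, count_end, hours_elapsed):
    """Doubling time (h) from two counts within the exponential growth phase."""
    if count_start <= 0 or count_end <= count_start:
        raise ValueError("counts must be positive and increasing")
    return hours_elapsed * math.log(2) / math.log(count_end / count_start)


# Hypothetical counts: a four-fold increase over 48 h is two doublings
print(doubling_time(1.3e4, 5.2e4, 48.0))
```

In practice, fitting a line to ln(count) against time over all points in the exponential phase is more robust than using two time points, since it averages out counting noise.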

#### **COLLECTION AND PREPARATION OF SUBCELLULAR GLYCOPROTEOMES FROM BREAST CELL LINES**

The *secreted* subcellular glycoproteomes were collected by sampling 30 mL of serum-free culture media followed by centrifugation at 2,000 × *g* to pellet any floating cells. The supernatants were concentrated and buffer exchanged into PBS (1×) using 10,000 MWCO Amicon Ultra membranes (Millipore). Proteins were then precipitated with nine volumes of acetone overnight at −20°C. The pellets were stored at −80°C until further analysis.

The *cell-surface* subcellular glycoproteomes were isolated from MCF7, MDA468, and MCF10A breast epithelial cell lines using a commercial biotinylation kit (product # 89881, Pierce) to specifically biotinylate the cell-surface glycoproteins. The protocol supplied by the manufacturer was followed. Briefly, monolayers of cultured cells grown in 75 cm<sup>2</sup> culture flasks were washed three times with PBS (1×) before incubation in EZ-Link sulfo-NHS-SS-biotin in ice-cold PBS (1×) for 30 min at 4°C on a rocking platform. The labeling reactions were terminated and the biotinylated cells were washed and collected by scraping in Tris-buffered saline (TBS) (1×), followed by centrifugation at 500 × *g* for 3 min. The supernatants were discarded and the cell pellets were disrupted in manufacturer-provided lysis buffer by ultra-sonication using five 1-s bursts with a Sonifier 450 (Branson Sonifier, Wilmington, NC, USA). The cell lysates were centrifuged at 10,000 × *g* for 2 min at 4°C. Solubilized biotinylated cell-surface proteins in the clarified supernatants were isolated using NeutrAvidin Agarose. Cell-surface-bound proteins were eluted using 50 mM DTT and precipitated with acetone overnight at −20°C. The pellets were stored at −80°C until analysis.

The *microsome* (total membrane) subcellular glycoproteomes were obtained by first removing serum-free media, thoroughly washing cells with PBS (1×), and harvesting cells in 25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1 mM EDTA containing a protease inhibitor cocktail (Roche Diagnostics). The cells were ultra-sonicated on ice for three rounds of 10-s bursts using a Sonifier 450 and centrifuged at 2,000 × *g* for 20 min at 4°C to remove intact cells and nuclei. The supernatants were ultra-centrifuged at 120,000 × *g* for 80 min after which the supernatants were discarded. The microsomal membrane pellets were washed twice with ice-cold 0.1 M sodium carbonate and resuspended in 25 mM Tris-HCl pH 7.4, 150 mM NaCl, and 1% (v/v) Triton X-114. Samples were phase partitioned by incubation at 37°C for 20 min, followed by 1,000 × *g* centrifugation for 10 min. The upper aqueous layer was carefully removed and nine volumes of ice-cold acetone were added to the lower detergent phase and incubated overnight at −20°C to precipitate the proteins. The pellets were stored at −80°C until further analysis.

The protein concentrations of the subcellular fractions were measured using Bradford reagents (Sigma). Equal protein amounts were precipitated in the three subcellular fractions and the resulting pellets were solubilized in 8 M urea for spotting on PVDF membranes for *N*-glycome profiling or in NuPAGE LDS sample buffer for gel electrophoresis prior to proteome profiling.

#### **SUBCELLULAR FRACTIONATION OF HUMAN COLON CANCER CELL LINES**

SW480 cells (5 × 10<sup>7</sup>) were washed twice with homogenization buffer (20 mM HEPES, pH 7.5, and 0.25 M sucrose). Cell pellets were resuspended to a final volume of 2 mL in homogenization buffer and lysed using an Ultra-Turrax disperser (Ika). After a low-speed centrifugation at 1,000 × *g* for 10 min, the supernatant was collected as the post-nuclear fraction (PNF). The PNF was subjected to ultracentrifugation at 30,000 rpm for 1 h in a SW41Ti rotor (Beckman Coulter) to pellet the microsomes. ER- and Golgi-enriched membranes were prepared as described (33). Briefly, 1 mL of PNF (usually 2.5–3 mg protein) was adjusted to 1.4 M sucrose by adding 2 mL of 2 M sucrose. A discontinuous sucrose gradient was made by sequentially loading 1.5 mL of 1.6 M sucrose, 3 mL PNF in 1.4 M sucrose, 3 mL of 1.2 M sucrose, and 3 mL of 0.8 M sucrose. All sucrose solutions contained 20 mM HEPES pH 7.5. Ultracentrifugation was conducted at 28,500 rpm for 2 h in a SW41Ti rotor. Golgi-enriched membranes were harvested at the 0.8 M/1.2 M interface. ER-enriched membranes were harvested from the 1.4 M layer. The collected ER and Golgi membranes were diluted with homogenization buffer to reduce the sucrose concentration and subsequently pelleted by ultracentrifugation at 30,000 rpm for 1 h in a SW41Ti rotor. Pelleted ER- and Golgi-enriched membranes were resuspended in 8 M urea and protein concentrations were determined by BCA assays (Pierce).
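The sucrose adjustment of the PNF can be checked with a short calculation. The sketch below is illustrative only: it assumes that volumes are additive on mixing and that the PNF starts at the 0.25 M sucrose of the homogenization buffer; the function name `mixed_molarity` is hypothetical.

```python
def mixed_molarity(volumes_ml, molarities):
    """Concentration after mixing solutions, assuming volumes are additive."""
    total_volume = sum(volumes_ml)
    # mL x mol/L gives mmol of solute contributed by each solution
    total_mmol = sum(v * c for v, c in zip(volumes_ml, molarities))
    return total_mmol / total_volume

# 1 mL PNF in homogenization buffer (0.25 M sucrose) plus 2 mL of 2 M sucrose
pnf_sucrose = mixed_molarity([1.0, 2.0], [0.25, 2.0])
print(f"{pnf_sucrose:.2f} M")  # roughly the nominal 1.4 M of the sample layer
```

Under these assumptions the mixture comes out slightly above 1.4 M, which is consistent with the nominal concentration given for the sample layer.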

#### **RELEASE AND PREPARATION OF N-GLYCANS FROM SUBCELLULAR GLYCOPROTEOMES**

*N*-glycans were released from ~20 µg secreted proteins, 50 µg cell-surface proteins, and 50 µg microsome membrane proteins as previously described (27). Briefly, protein mixtures were immobilized on methanol-activated PVDF membranes (Millipore) and allowed to dry overnight. Membrane-bound proteins were incubated with 2.5 U PNGase F (*Flavobacterium meningosepticum*, Roche) for 16 h at 37°C to ensure complete release of *N*-glycans. Released *N*-glycans were incubated with 100 mM ammonium acetate (pH 5) for 1 h at RT and subsequently dried by vacuum centrifugation. Reduction of *N*-glycans was performed with 20 µL 1 M sodium borohydride (Sigma) in 50 mM potassium hydroxide (Sigma) for 3 h at 50°C. Reactions were quenched with 2 µL glacial acetic acid. Dual desalting was performed in micro-SPE formats using strong cation exchange/C<sub>18</sub> and carbon columns (27). Desalted *N*-glycans were eluted from the carbon columns with 20 µL 40% acetonitrile (ACN) containing 0.1% (v/v) trifluoroacetic acid and dried by vacuum centrifugation (34). Samples were stored at −80°C if not analyzed immediately.

#### **DIGESTION AND PREPARATION OF PEPTIDE MIXTURES FROM SUBCELLULAR GLYCOPROTEOMES**

The subcellular glycoproteomes of the breast cells (~50 µg protein/fraction), i.e., secreted, cell-surface, and microsome fractions, and of colon cells (~10 µg protein/fraction), i.e., microsome and ER- and Golgi-enriched membrane fractions, were reduced and alkylated and subsequently digested in-gel (breast cells) or in-solution (colon cells). Prior to in-gel digestion, samples were loaded in 10 µL NuPAGE LDS buffer and separated on 4–12% Bis-Tris PAGE gels (Invitrogen). Electrophoresis was performed at 200 V for 50 min. After separation of proteins, gels were fixed in 40% (v/v) ethanol and 10% (v/v) acetic acid for at least 2 h, stained overnight with Coomassie Blue G250 (Bio-Rad), and destained in ultra-pure water (Millipore). In-gel trypsin digestion of all samples was performed from eight equal-sized gel fractions. Each fraction was sliced into 1 mm pieces and placed in a 96-well plate. The gel pieces were destained with 50% (v/v) ACN in 50 mM ammonium bicarbonate until clear, dehydrated in 100% (v/v) ACN, and dried. Sequence-grade porcine trypsin (Promega) (1:30 enzyme/substrate, w/w) was used to digest the proteins overnight at 37°C. Tryptic peptide mixtures were then collected and two rounds of gel extractions of peptides were performed with 2% (v/v) formic acid in 50% (v/v) ACN and 50 mM ammonium bicarbonate. The extracts were combined, and the peptide mixtures were dried by vacuum centrifugation, redissolved in 10 µL 0.1% (v/v) formic acid, and desalted as described below. For in-solution digestion, samples were diluted to <1 M urea (final concentration) and trypsinized (sequence-grade porcine trypsin, 1:40 enzyme/substrate, w/w) overnight at 37°C. Following proteolysis, the peptide mixtures were acidified by adding formic acid to a final concentration of 0.1% (v/v). Desalting of peptide mixtures was performed using self-packed C<sub>18</sub> SPE tips. Briefly, C<sub>18</sub> tips were washed three times with 20 µL 100% ACN, three times with 20 µL 50% (v/v) ACN in 0.1% formic acid, and equilibrated with 50 µL 0.1% (v/v) formic acid. After sample loading, tips were washed three times with 20 µL 0.1% formic acid. Peptides were eluted with 20 µL 60% (v/v) ACN in 0.1% formic acid and 20 µL 90% (v/v) ACN in 0.1% formic acid and dried. The desalted fractions were stored at −80°C until LC-MS/MS.

#### **LC-MS/MS-BASED N-GLYCOMICS**

*N*-glycan alditols were separated using a porous graphitized carbon (PGC) LC column [5 µm (particle size) Hypercarb KAPPA, 100 mm (length) × 200 µm (ID), 250 Å (pore size), Thermo Scientific] using an Ultimate 3000 HPLC system (Dionex) connected directly to an ESI-MS/MS HCT Ultra ion trap (Bruker Daltonics). Separation was performed using a binary gradient solvent system made up of solvent A (aqueous 10 mM NH<sub>4</sub>HCO<sub>3</sub>) and solvent B (90% ACN/10 mM NH<sub>4</sub>HCO<sub>3</sub>). The flow rate was 2 µL/min and a total gradient of 100 min was programmed as follows: 0–2.5% solvent B for 0–13 min; 2.5–17.5% solvent B for 14–48 min; 17.5–50% solvent B for 48–65 min; 50–100% solvent B for 65–75 min; 100% solvent B for 75–80 min; back to 0% solvent B for 80–85 min, and 100% solvent A equilibration for 15 min. Settings for the MS/MS were as follows: drying gas flow: 6 L/min; drying gas temperature: 300°C; nebulizer gas: 12 p.s.i.; skimmer: −40.0 V; trap drive: −99.1 V; and capillary exit: −166 V. Smart fragmentation was used with start- and end-amplitude of 30 and 200%, respectively. Ions were detected in ion charge control set at 100,000 ions/scan and with a maximum accumulation time of 200 ms. MS spectra were obtained in negative ion mode with three scan events: a full scan (*m/z* 400–2,200) at a scan speed of 8,100 *m/z*/s and data-dependent MS/MS scans after CID fragmentation of the top two most intense precursor ions with an absolute intensity threshold of 30,000 and a relative intensity threshold of 5% relative to the base peak. Dynamic exclusion was inactivated to ensure MS/MS generation of closely eluting *N*-glycan isomers. Precursors were observed mainly in charge states *Z* = −1 and/or −2. Mass accuracy calibration of the mass spectrometer was performed using a well-defined tune mix (Agilent) prior to acquisition. *N*-glycans released from bovine fetuin served as positive controls for the sample preparation and the LC-MS/MS performance. Differences between observed and theoretical precursor and fragment masses were generally <0.2 Da. Three LC-MS/MS technical replicates were performed for the subcellular fractions.

#### **LC-MS/MS-BASED PROTEOMICS**

Three LC-MS/MS technical replicates of the subcellular proteomes of the breast cells were analyzed using a Q-Exactive (Thermo Scientific). Peptide mixtures in 0.1% (v/v) formic acid were loaded onto a C<sub>18</sub> reversed phase column packed in-house [2.7 µm (particle size) HaloLink Resins, Promega, column dimensions: 100 mm (length) × 75 µm (ID)]. Separation of peptides was performed over a 60 min gradient with the first 50 min of the linear gradient increasing from 0 to 50% solvent B [0.1% (v/v) aqueous formic acid/100% (v/v) ACN] and then to 85% solvent B for the next 2 min and maintained at 85% for 8 min. The flow rate was constant at 300 nL/min. The Easy-nLC (Thermo Scientific) was connected directly to the nano-ESI source of the Q-Exactive. MS full scans were acquired with a resolution of 35,000 in the positive ion mode over the *m/z* 350–2,000 range and an automatic gain control (AGC) target value of 1 × 10<sup>6</sup>. The top 10 most intense precursor ions were then isolated for MS/MS using higher energy collisional dissociation fragmentation at 17,500 resolution with the following settings: collision energy: 30%; AGC target: 2 × 10<sup>5</sup>; isolation window: *m/z* 3.0; and dynamic exclusion enabled. Precursors with unassigned or *Z* = +1 charge states were ignored for MS/MS selection.

The subcellular proteomes of the colon cells were analyzed by LC-MS/MS using a Triple TOF 5600 (ABSciex). Peptides were separated by a nanoLC system (Eksigent) on a C<sub>18</sub> reversed phase column [ProteCol, 100 mm (length) × 150 µm (ID), 3 µm (particle size), 300 Å (pore size); SGE Analytical Science] with a 90 min gradient from 5 to 40% solvent B [90% (v/v) ACN with 0.1% formic acid] at a constant flow rate of 600 nL/min. The top 10 most intense precursor ions with *Z* = +2, +3, and +4 were selected for MS/MS using CID fragmentation.

#### **ANALYSIS OF N-GLYCOME LC-MS/MS DATA**

*N*-glycome raw data for all subcellular glycoproteomes were viewed and manually analyzed using DataAnalysis v4.0 (Bruker Daltonics). Monoisotopic masses were obtained and searched against GlycoMod<sup>1</sup> to obtain possible monosaccharide compositions, which were subsequently verified manually by *de novo* sequencing of corresponding MS/MS spectra and by taking account of PGC chromatographic retention time. The glycan type and the terminating monosaccharide determinants could unambiguously be identified using this method (27). The relative abundances of the observed *N*-glycans were determined using the ratio of the extracted ion chromatogram (EIC) peak area of each *N*-glycan species over the sum of EIC peak areas of all observed *N*-glycans in the sample. This has been shown to be a reasonably accurate method for relative *N*-glycan quantitation (35). The extent of *N*-glycan processing was measured by evaluating the relative molar proportions of the relatively unprocessed species (i.e., immature mono-glucosylated glycans and high-mannose type *N*-glycans) and the processed species (i.e., hybrid, complex, and paucimannose type *N*-glycans) of the total *N*-glycome. In addition, the proportions of *N*-glycans terminating in monosaccharide determinants, including α1,2/3/6-mannose, β1,3/4-galactose, α1,3/4/6-fucose, and α2,3/6-sialic acid, were calculated as relative molar abundances of both the entire *N*-glycome and the potentially modified *N*-glycan substrates (e.g., complex/hybrid types). Since multiple determinants may be displayed by a given *N*-glycan, the totals may sum to more than 100%.
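The EIC-based relative quantitation described above can be illustrated with a minimal sketch. The peak areas, compositions, and type labels below are hypothetical values chosen for illustration, not data from this study, and the function names are likewise assumptions.

```python
from collections import defaultdict

# Hypothetical EIC peak areas (arbitrary units), keyed by (composition, glycan type).
eic_areas = {
    ("Man9GlcNAc2", "high-mannose"): 400.0,
    ("Man5GlcNAc2", "high-mannose"): 100.0,
    ("Hex5HexNAc4NeuAc2", "complex"): 300.0,
    ("Hex5HexNAc4Fuc1NeuAc1", "complex"): 200.0,
}

# Glycan types counted as relatively unprocessed; all other types count as processed.
UNPROCESSED = {"immature", "high-mannose"}

def relative_abundances(areas):
    """Percentage of each glycan: its EIC area over the summed EIC areas."""
    total = sum(areas.values())
    return {key: 100.0 * area / total for key, area in areas.items()}

def processing_split(rel):
    """Sum relative abundances into unprocessed and processed proportions."""
    split = defaultdict(float)
    for (_, glycan_type), pct in rel.items():
        label = "unprocessed" if glycan_type in UNPROCESSED else "processed"
        split[label] += pct
    return dict(split)

rel = relative_abundances(eic_areas)
split = processing_split(rel)
print(split)
```

Determinant-level proportions (e.g., sialylation) could be tallied the same way, although a glycan carrying several determinants would then contribute to more than one category.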

#### **ANALYSIS OF LC-MS/MS-BASED PROTEOMIC DATA AND GENE ONTOLOGY**

For breast cell proteomes, raw spectra were converted to .mgf files using Proteome Discoverer Daemon v1.3 (Thermo Scientific) and searched against the SwissProt protein database (*Homo sapiens*, 20,279 reviewed entries, November 2013 release) using the Global Proteome Machine (Cyclone). The following search criteria were used: carbamidomethylation was a fixed modification and oxidation and deamidation were variable modifications for methionine and asparagine/glutamine residues, respectively. Mass tolerances of 10 ppm and 0.02 Da were selected for precursor and product ions, respectively, with a maximum of two missed tryptic cleavages.
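As an aside on what a ppm precursor tolerance means in practice, the following sketch computes a mass error in ppm and tests it against a tolerance. It is a generic illustration, not the matching logic of any search engine named here; the *m/z* values and function names are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# Hypothetical precursor: theoretical m/z 1000.000, observed m/z 1000.015 (about 15 ppm)
theoretical, observed = 1000.000, 1000.015
print(round(ppm_error(observed, theoretical), 1))
print(within_tolerance(observed, theoretical, tol_ppm=10.0))
print(within_tolerance(observed, theoretical, tol_ppm=20.0))
```

In this hypothetical case the precursor would fall outside a 10 ppm window but inside a 20 ppm window.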

For colon cell proteomes, MS/MS spectra were extracted by ProteinPilot v4.2 (ABSciex) and searched using Mascot v2.4.0 (Matrix Science) against the SwissProt protein database (*Homo sapiens*, 20,253 entries, April 2013 release) using trypsin as the digestion enzyme. Precursor and product ion tolerances were 20 ppm and 0.50 Da, respectively. Oxidation of methionine residues and carbamidomethylation of cysteine residues were used as variable modifications.

<sup>1</sup>http://web.expasy.org/glycomod/

Scaffold v4.2.1 (Proteome Software) was used to validate MS/MS-based peptide and protein identifications. Peptides were accepted if they were confidently identified at ≥95.0% probability as evaluated by the local false discovery rate (FDR) algorithm. Proteins were included if they were confidently identified at ≥99.0% probability as assigned by the Protein Prophet algorithm incorporated in the software. Proteins containing shared or similar peptides, and which could not be differentiated based on MS/MS analysis alone, were grouped to satisfy the principles of parsimony. Proteins that confidently shared identified peptides were grouped into clusters. Proteins were annotated using gene ontology (GO) terms from NCBI. The protein identifications were stringently filtered based on the presence of a minimum of two peptides in all replicates. The relative abundances of proteins were determined by conventional spectral counting and adjusted by taking the polypeptide length into account. Putative *N*-glycoproteins in the proteome of the subcellular fractions were predicted *in silico* based on the presence of one or more sequons (NxT/S, x ≠ P) and a signal peptide (for secreted proteins) and/or transmembrane regions (for cell-surface and microsome proteins) using prediction tools including SignalP (v4.1) (36), Transmembrane Hidden Markov Model (TMHMM v2.0) (37), PrediSi (38), and Phobius (39). Mitochondrial and nuclear membrane proteins were excluded as these are unlikely to enter the ER–Golgi glycosylation pathway. Ambiguous assignments were manually checked (validated or discarded) with information from UniProt. Potential sequons were obtained using NetNGlyc (40). These *in silico* prediction tools generated lists of experimentally validated and putative glycoproteins. The 100 most abundant glycoproteins in each subcellular fraction were used to assess glycosylation site accessibility. The contribution of these glycoproteins to the total glycoproteome in each sample was estimated by multiplying the normalized spectral count of the individual glycoproteins with their potential glycosylation sites, a measure termed "sequon-weighted normalized spectral count."
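The sequon rule and the sequon-weighted normalized spectral count can be sketched as follows. This is a simplified illustration under stated assumptions: sequons are found with a plain regular expression rather than NetNGlyc, the length adjustment is implemented as an NSAF-style normalization (spectral count divided by length, scaled to sum to 1), which is an assumption about how the length adjustment might be done, and the toy sequences and counts are hypothetical.

```python
import re

# N-x-T/S where x is any residue except proline.
# The lookahead lets overlapping sequons (e.g., NNST) be found.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(sequence):
    """1-based positions of asparagines in N-x-T/S (x != P) sequons."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence)]

def normalized_spectral_counts(proteins):
    """Length-adjusted spectral counts scaled to sum to 1 (NSAF-style assumption)."""
    saf = {name: p["spectral_count"] / len(p["sequence"]) for name, p in proteins.items()}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}

def sequon_weighted_counts(proteins):
    """Normalized spectral count multiplied by the number of potential sites."""
    nsc = normalized_spectral_counts(proteins)
    return {
        name: nsc[name] * len(sequon_positions(p["sequence"]))
        for name, p in proteins.items()
    }

# Hypothetical toy proteins (not real sequences or counts)
toy = {
    "protA": {"sequence": "MNGTANPSNNSTK", "spectral_count": 26},
    "protB": {"sequence": "MKNASLLLLL", "spectral_count": 10},
}
print(sequon_positions(toy["protA"]["sequence"]))
print(sequon_weighted_counts(toy))
```

Ranking proteins by the resulting values would give a list analogous to the "100 most abundant glycoproteins", with heavily sequon-bearing proteins weighted up relative to a plain spectral count.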

#### **SELECTION OF PDB 3D STRUCTURE FOR GLYCOSYLATION SITE ACCESSIBILITY DETERMINATION**

Three-dimensional protein structures were obtained from the Protein Data Bank (PDB)<sup>2</sup>. If multiple structures were available for a glycoprotein, the best match to the naturally occurring variant was chosen by considering the following parameters in a prioritized order: (1) high protein sequence coverage and resolution of the 3D structure, (2) source of protein (purified from organism/tissue over artificial expression system), (3) known site-specific mutations, (4) presence of artificial/natural ligands, and (5) oligomerization of the solved 3D structure. The experimentally obtained PDB structures used in this study were all based on X-ray crystallography, Table S2 in Supplementary Material. Where no experimentally determined structures were available (43%), structure homologs were obtained from Protein Model Portal<sup>3</sup>, SWISS-MODEL Repository<sup>4</sup>, or ModBase<sup>5</sup>. High sequence homology was used as a selection criterion when choosing a homology model. The average sequence homology for all structures was 67%, which is considered very reliable for homology modeling (41), Table S1 in Supplementary Material. 3D protein structures were viewed with RasMol v2.7.5 (RasWin Molecular Graphics) for visual inspection.

<sup>2</sup>http://www.rcsb.org/pdb

#### **GLYCOSYLATION SITE ACCESSIBILITY DETERMINATION FROM MATURELY FOLDED GLYCOPROTEINS**

The glycosylation site solvent accessibility was determined by measuring the accessibility to the individual asparagine residues forming the glycosylation sites using NACCESS<sup>6</sup> (42), an accurate and frequently used solvent accessibility determination program (19, 43–45). NACCESS calculates the atomic accessible area by predicting van der Waals interactions when a probe is rolled around on the protein surface (46, 47). The maximum probe size offered by the program (5 Å radius) was used as a default in this study to simulate as closely as possible the accessibility of the glycosylation enzymes to the glycosylation sites. NACCESS produces unit-less and absolute accessibility values as the output format (denoted "arb. units"), which are comparable between glycosylation sites of different glycoproteins (19). Prior to the measurements of site accessibility, any water molecules, sugars, ligands, and other hetero-atoms/molecules not part of the core polypeptide chain were removed from the protein surface. Negligible accessibility differences were observed for the "native" and the monomeric form of glycoproteins with quaternary structures (data not shown). Hence, in the case of multimers, glycosylation site solvent accessibilities derived from the monomeric structures were not considered in the analysis.

#### **STATISTICAL ANALYSIS**

All relative abundances of *N*-glycans were presented as percentages (mean ± SD). Glycosylation site accessibilities were presented as mean ± SEM to illustrate the potential spread of the mean rather than of the individual data points, which can be hugely influenced by the (local) accuracy and quality of the PDB structures. To overcome this potential issue of PDB "noise," relatively large numbers of data points (*n*) were needed. Data were analyzed using Prism v6 (GraphPad). One-way ANOVA was performed for statistical comparison between the three subcellular fractions followed by *post hoc* Tukey's tests. All *p* values were adjusted taking into account the multiple comparisons made and reported as multiplicity adjusted *p* values. *p* < 0.05 was regarded as statistically significant and indicated with "\*." Stronger statistical significance was indicated as follows: \*\**p* < 0.01; \*\*\**p* < 0.001; \*\*\*\**p* < 0.0001. Simple linear regression and corresponding correlation coefficients (*R*<sup>2</sup>) were obtained to evaluate the relationship between the degree of *N*-glycan processing in terms of glycan type and expression of terminal glycan determinants and the glycosylation site solvent accessibility.

<sup>3</sup>http://www.proteinmodelportal.org

<sup>4</sup>http://swissmodel.expasy.org

<sup>5</sup>http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi

<sup>6</sup>http://wolf.bms.umist.ac.uk/naccess/
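The summary statistics and regression measure named above can be sketched in a few lines. The study itself used Prism v6; the code below is only a generic standard-library illustration of mean ± SD, SEM, and *R*<sup>2</sup> for a simple least-squares line, applied to hypothetical numbers.

```python
import math
import statistics

def mean_sd_sem(values):
    """Mean, sample standard deviation (n - 1), and standard error of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(len(values))
    return mean, sd, sem

def r_squared(x, y):
    """Coefficient of determination for the least-squares line y = a + b*x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical site accessibilities (arb. units) for one fraction
accessibilities = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean, sd, sem = mean_sd_sem(accessibilities)
print(f"{mean:.2f} ± {sd:.2f} (SD), ± {sem:.2f} (SEM)")

# Hypothetical mean accessibility vs. percentage of processed N-glycans
print(r_squared([60.0, 75.0, 85.0], [30.0, 60.0, 90.0]))
```

Because the SEM shrinks with the square root of *n*, it reflects the uncertainty of the mean rather than the scatter of individual sites, which is the rationale given above for reporting it.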

#### **RESULTS**

#### **SUBCELLULAR-SPECIFIC N-GLYCOSYLATION OF HUMAN BREAST EPITHELIAL CELLS**

Label-free quantitative *N*-glycome mapping of the secreted and microsome (total membrane) subcellular glycoproteomes of a panel of eight cultured human breast cells (i.e., MCF7, SKBR3, MDA157, MDA231, MDA468, HS578T, HMEC, and MCF10A) displaying diverse cellular features showed differential *N*-glycan processing of the two fractions, **Figure 1A**. The glycoproteins secreted into the culture media consistently displayed a significantly higher proportion of processed *N*-glycan types (i.e., hybrid, complex, and paucimannose) (74.2–95.0% mol/mol of total *N*-glycome) than the high-mannose-rich microsomal subcellular glycoproteomes (22.1–55.6%, *p* < 0.0001–0.05). For the secreted glycoproteomes, little, if any, correlation was detected across the cell line panel between the *N*-glycan processing stage and the cellular doubling time (*R*<sup>2</sup> = 0.13) or the protein secretion rate (*R*<sup>2</sup> = 0.35), Figure S1 in Supplementary Material. No correlation was detected between the *N*-glycan processing stage of the microsomal glycoproteins and the cellular doubling time (*R*<sup>2</sup> = 0.04) or the protein secretion rate (*R*<sup>2</sup> = 0.01).

In-depth *N*-glycan profiling of the secreted, microsomal, and cell-surface enriched glycoproteomes was carried out for MCF7, MDA468, and MCF10A cells as representative cells for the breast cell line panel. Differential *N*-glycan processing was evident, as exemplified by the clear differences seen in the *N*-glycome *m/z* profiles of the three subcellular fractions of MCF7 cells, **Figure 1B**. The cell-surface glycoproteins derived from MCF7 and MDA468 (but not MCF10A) cells were subjected to more *N*-glycan processing than microsomal proteins (*p* < 0.01–0.05) and all three cell lines showed significantly increased abundance of the more processed *N*-glycans on the secreted proteins (*p* < 0.0001–0.01), **Figure 1C**.

#### **SUBCELLULAR-SPECIFIC DISTRIBUTION OF N-GLYCAN DETERMINANTS**

To further evaluate the subcellular-specific distribution of common *N*-glycosylation determinants, which may be recognized by different immuno-lectins, terminal α-mannose, α-fucose, and α-sialic acid residues were mapped based on the obtained *N*-glycome profiles, **Figure 1D**. As expected from the glycan type distribution, terminating α-mannosylation was found to be significantly reduced on the secreted and cell-surface proteins relative to the microsomal proteins. The α-fucosylation, primarily of the α1,6-(core) type, and α2,3/6-sialylation were concomitantly significantly higher in the secreted fractions than in the cell-surface-enriched fraction (with the exception of fucosylation of MCF7) and in the microsomal fraction of all three cell lines. Taking the incomplete subcellular fractionation into account (see "Proteomics- and GO-Based Assessment of Subcellular Fractionation"), we estimate that very little terminal α-mannosylation is present on protein *N*-glycans in contact with the extracellular environment in the investigated cells and that little α-sialylation and α-fucosylation are carried by intracellular (microsomal) *N*-glycoproteins.

#### **PROTEOMICS- AND GO-BASED ASSESSMENT OF SUBCELLULAR FRACTIONATION**

In total, 2,297, 2,636, and 2,042 human proteins were identified across the three subcellular fractions in MCF7, MDA468, and MCF10A, respectively. Putative *N*-glycoproteins fulfilling our strict prediction criteria, i.e., presence of one or more sequons (NxT/S, x ≠ P) and signal peptides (for secreted proteins) and/or transmembrane regions (for membrane-tethered proteins), comprised significant proportions of the subcellular proteomes (15.7–31.0%), Table S3A in Supplementary Material. The GO terms "ER", "Golgi/endosome/plasma membrane", and "extracellular" were used to evaluate the localization/origin of the glycoproteins identified in the subcellular fractions. In agreement with a previous study (23), the GO annotation of the identified proteins showed that the microsomes in general contained a high proportion of ER-residing proteins, **Figures 2A–C**. Although the proteins are only broadly, and possibly somewhat inaccurately, classified on the basis of GO terms, the trends clearly indicated significant enrichment, although not complete isolation, of the desired proteins in the respective subcellular fractions. The ER-based contribution to the microsome was supported by the fact that a significant proportion of the high-mannose *N*-glycans identified in this fraction were of the immature type, i.e., Man<sub>9</sub> ± Glc<sub>1</sub> (MCF7: 35.3 ± 0.9%, MDA468: 40.2 ± 2.0%, and MCF10A: 31.8 ± 0.4%, mol/mol of the total high-mannose *N*-glycans), **Figure 2D** (MCF7 data) and Figure S2 in Supplementary Material (MDA468 and MCF10A data).

To further investigate the intracellular *N*-glycosylation and confirm the presence of ER-rich microsomes, the *N*-glycome and proteome of ER- and Golgi-enriched fractions of human colon epithelial cancer cells (SW480), prepared by sucrose density gradient centrifugation, were mapped and compared to the microsome profiles derived from the same cells, Figure S3A in Supplementary Material. Quantitative analysis of four reliable and representative markers of the ER (i.e., 78 kDa glucose-regulated protein, protein disulfide bond isomerase, calreticulin, and protein transport protein Sec61 alpha isoform 1) and Golgi (i.e., polypeptide *N*-acetylgalactosaminyltransferase 2, β-1,4-galactosyltransferase 1, Golgi apparatus protein 1, and Golgi membrane protein 1) compartments revealed a high abundance of ER-specific proteins in the ER-enriched fraction, Figure S3B in Supplementary Material. However, there was still a significant presence of ER proteins in the Golgi-enriched and microsome fractions. In contrast, the ER-enriched and microsome fractions were essentially free of Golgi proteins, Figure S3C in Supplementary Material. In line with our breast epithelial cell data, the proteins in the ER-enriched fraction contained a significantly higher degree of high-mannose (Glc<sub>0–1</sub>Man<sub>5–9</sub>GlcNAc<sub>2</sub>) *N*-glycans (92%) than the proteins in the microsome (75%) and the Golgi-enriched fraction (51%). Taken together, the data confirm that the microsomes of human breast and colon epithelial cells predominantly contain ER proteins and that such intracellular proteins mostly carry high-mannose type *N*-glycosylation. Since the Golgi fraction contains few, if any, ER proteins, it becomes clear that the majority of post-ER *N*-glycans are of the complex type.

**FIGURE 1 |** […] and phenotypically different cultured human breast epithelial cells (i.e., MCF7, SKBR3, MDA157, MDA231, MDA468, HS578T, HMEC, and MCF10A) were profiled, see Table S1 in Supplementary Material for information on the investigated cells. The relative molar abundances (mean ± SD) of more processed N-glycans comprising the complex, hybrid, and paucimannose type are presented in light red and the less processed N-glycans of the immature and high-mannose type in green (inset). Subcellular-specific N-glycosylation of boxed cell lines was investigated further in greater detail. **(B)** Summed m/z profiles of the N-glycomes derived from microsomal (top), cell-surface (middle), and secreted (bottom) proteins of MCF7 cells. Signals corresponding to N-glycans have been assigned as less processed (green) […] mannose, green bars) processed N-glycan types of the microsomal (dotted bars), cell-surface (brick), and secreted (banded) proteins of MCF7, MDA468, and MCF10A. **(D)** Subcellular-specific distribution of the N-glycan determinants. The proportions of terminal α-mannosylation, α-fucosylation, and α-sialylation (non-reducing end) N-glycans of the total N-glycome (mol/mol %) on the microsome, cell-surface, and secreted glycoproteomes across MCF7 (i), MDA468 (ii), and MCF10A (iii) breast cell lines were determined from the N-glycome profiles. N-glycans may terminate with multiple monosaccharide determinants, making the values sum to more than 100%. For all panels: ns, not significant; \*p < 0.05; \*\*p < 0.01; \*\*\*p < 0.001; \*\*\*\*p < 0.0001.

**FIGURE 2 |** […] breast epithelial cell lines were mapped according to GO terms into ER, Golgi/endosome/plasma membrane, and extracellular region classifiers. This confirmed enrichment, but not isolation, of cell-surface and secreted proteins in the respective subcellular fractions. In addition, the classification confirmed that the microsomes contained a significant proportion of ER-residing […] Man<sub>9</sub> ± Glc<sub>1</sub>, the latter representing immature N-glycans normally only associated with intracellular ER N-glycosylation. See Figure S2 in Supplementary Material for the subcellular distribution of the high-mannose glycan type series of MDA468 and MCF10A.

#### **DIFFERENTIAL Asn SITE ACCESSIBILITIES EXPLAIN SUBCELLULAR-SPECIFIC N-GLYCOSYLATION**

To investigate a possible link between the observed subcellular-specific *N*-glycosylation and protein *N*-glycosylation site accessibility, *in silico* assessment of site accessibility was performed for the identified proteins predicted to be *N*-glycosylated. Due to the laborious and time-consuming approach of determining glycoprotein site accessibility (19), only the most abundant subset of the putative *N*-glycoproteins observed in the subcellular fractions was included in the accessibility assessment. The relative abundances of the individual putative glycoproteins were calculated by a conventional normalized spectral counting strategy; however, the number of sequons of the individual proteins was factored into the calculation to ensure a fair representation of heavily and lightly *N*-glycosylated proteins. We call this term "sequon-weighted normalized spectral counts." Based on sequon-weighted normalized spectral counts, the 100 most abundant glycoproteins uniquely present in the three subcellular fractions, which, by weight, comprised 70–100% of the individual subcellular glycoproteomes, were used to assess glycosylation site accessibility, Table S3B in Supplementary Material. The solvent site accessibilities were determined using an established approach based on van der Waals interactions of the asparagine residue of the glycosylation sites with solvent (19). 3D-glycoprotein structures (experimental or homology modeled) were available for approximately one-third of the 189, 89, and 183 putative *N*-glycoproteins identified uniquely in the microsome, cell-surface, and secreted fraction, respectively, Figure S4 in Supplementary Material. This yielded site-accessibility datasets covering in total 161 (microsome), 189 (cell-surface), and 236 (secreted) *N*-glycosylation sites from the three cell types.

Differential site accessibilities were observed for the three subcellular glycoproteomes for all three investigated breast cell lines, **Figure 3A** (see also Figures S5A–C in Supplementary Material for an alternative representation showing 95% confidence intervals). Glycosylation sites of secreted glycoproteins were significantly more accessible [MCF7: 85.63 ± 35.47, *n* = 73; MDA468: 85.44 ± 36.85, *n* = 112; MCF10A: 86.56 ± 33.54, *n* = 95 (all unit-less arbitrary values)] than sites on microsomal proteins (MCF7: 59.44 ± 46.58, *n* = 32; MDA468: 64.98 ± 46.99, *n* = 40; MCF10A: 64.84 ± 40.97, *n* = 22, *p* < 0.01). In agreement with the *N*-glycomes that carried a mixture of less processed high-mannose and more processed *N*-glycan types, the sites of cell-surface proteins were intermediately accessible: cell-surface sites were either statistically similar in accessibility to the microsomal protein sites (MCF10A: 67.70 ± 37.66, *n* = 44) or similar to the secreted protein sites (MCF7: 76.20 ± 38.13, *n* = 84; MDA468: 85.95 ± 34.08, *n* = 40). For all three breast cell lines, the glycosylation site accessibilities were strongly correlated with the *N*-glycan processing as measured by their glycan type (MCF7: *R*<sup>2</sup> = 0.94; MDA468: *R*<sup>2</sup> = 0.75; MCF10A: *R*<sup>2</sup> = 0.92), **Figure 3B**. Higher average glycosylation site accessibility of the secreted and partly also the cell-surface glycoproteins resulted, as such, in more *N*-glycan processing in terms of glycan type formation.

Other subcellular-specific *N*-glycosylation signatures including core fucosylation, β-galactosylation, and α-sialylation were found to correlate only weakly or not at all with glycosylation site accessibility upon search for consistent trends across the three different cell lines, Table S4 in Supplementary Material.

#### **DISCUSSION**

#### **SUBCELLULAR-SPECIFIC PROTEIN N-GLYCOSYLATION OF HUMAN CELLS**

All *N*-linked glycoproteins synthesized by a given cell are processed by a common glycosylation machinery. Despite this shared biosynthetic machinery, we observed that a panel of human breast epithelial cells of different geno- and phenotypes reproducibly produced subcellular glycoproteomes with distinct *N*-glycosylation signatures. The *N*-glycans attached to proteins enriched from the cell-surface, and in particular the secreted glycoproteins, were significantly more processed with respect to their glycan type (i.e., hybrid/complex/paucimannose) than the predominantly high-mannose type microsomal proteins for all investigated cells. As such, subcellular-specific *N*-glycosylation can be predicted to be a general cellular feature not restricted to the investigated breast epithelial cells. Deeper dissection of the intracellular organelle-specificity of colon cell *N*-glycosylation supported this concept. The capacity of human cells to generate multiple subcellular glycoproteomes displaying specific *N*-glycosylation profiles has, to the best of our knowledge, not been systematically investigated.

The importance of cell-surface *N*-glycosylation for cell–cell and cell–protein interactions has prompted several investigations of the cell-surface (alternatively termed plasma membrane) *N*-glycosylation. High-mannose type *N*-glycans, in particular Man8–9 structures, were previously reported to be the dominating features of the plasma membrane of human embryonic stem cells (48) and of cancer cells (49, 50). However, cell lysates and total membrane fractions similar to our microsome preparations were used in these studies, suggesting significant contributions from intracellular high-mannose-rich ER-residing *N*-glycoproteins (23). Hence, the actual cell-surface *N*-glycomes in the previous work may not have been accurately captured. Specific cell-surface enrichment methods, such as the biotinylation labeling strategies used in this study or adhesion-based isolation methods (23), indicate that human cell-surfaces instead are generally decorated with more processed *N*-glycan types.

Of the six cancerous breast cells investigated in this study, only MCF7 and MDA468 displayed predominantly (>70%) high-mannose *N*-glycans of the microsomal proteins. Approximately equal distributions of high-mannose and the more processed *N*-glycan types of microsomal proteins were detected in the remaining four cancerous (SKBR3, MDA157, MDA231, and HS578T) and the two non-cancerous cells (HMEC and MCF10A). In addition, no consistent over-representation of high-mannose *N*-glycans was detected for the secreted proteins derived from the cancerous cell lines relative to the non-cancerous cell lines. Together this indicates that high-mannose *N*-glycosylation is not linked directly to tumorigenesis. Others have associated serum-derived high-mannose *N*-glycoproteins with pathogenesis including cancer and inflammation (5, 51); however, whether these under-processed species are a result of leakage of intracellular glycoproteins as a consequence of cell death or of active cellular secretion from intact cells remains to be described. Based on in-depth comparative analysis of the *N*-glycomes derived from secreted proteins of breast and colon epithelial cells of non-cancerous and cancerous nature, we have recently identified several tumor- and sub-type specific *N*-glycosylation signatures amongst the complex *N*-glycans including alterations of sialylation, α1,6-fucosylation, and bisecting β1,4-GlcNAcylation (submitted) (52).

#### **SITE ACCESSIBILITIES MECHANISTICALLY EXPLAIN SUBCELLULAR-SPECIFIC N-GLYCOSYLATION**

We have previously shown that solvent accessibility of the glycosylation site of *N*-glycoproteins is an important factor in generating protein- and site-specific *N*-glycosylation (19). We used literature-based glycoprofiling of more than 100 mammalian glycoproteins produced under different cellular and physiological conditions to establish that site accessibility of maturely folded glycoproteins correlates with *N*-glycan processing features including glycan type, α1,6-fucosylation and β1,4/6-GlcNAc-branching. We emphasized in that study that relatively large datasets were required to compensate for the potential inaccuracy of the individual PDB structures and the relatively simplistic solvent accessibility assessment simulating the accessibility of the processing glycosylation enzymes to the protein glycosylation sites.

Herein, we used a similar approach with our own *N*-glycosylation data acquired from eight cell lines fractionated into subcellular glycoproteomes to further explore the determining features of site-specific *N*-glycosylation in the context of subcellular localization of proteins. Homogenous cell cultures were an essential tool to ensure that the isolated subcellular glycoproteomes were produced simultaneously under the same physiological conditions of the glycosylation machinery. Although the *N*-glycomes, as expected, varied considerably between the different cell lines, our experimental data not only validated the strong correlation of the *N*-glycan type and the glycosylation site accessibility of maturely folded glycoproteins in agreement with our previous report (19), but also mechanistically explained that subcellular-specific *N*-glycosylation is driven by differences in site accessibilities of the individual glycoproteins ending up at different subcellular destinations, **Figure 4**. Intracellular (microsome) *N*-glycoproteins receive little glycan processing of the high-mannose intermediates as a result of limited site accessibility, whereas the secreted *N*-glycoproteins are modified almost entirely to more processed *N*-glycan types due to high site accessibilities. As such, *N*-glycan processing may be a targeting signal or a requirement for intracellular (ER–Golgi-residing) glycoproteins to translocate to the surface for cell-surface integration/secretion via vesicles. Keeping in mind there may be many exceptions to the molecular trends presented here, it is tempting to view the glycosylation site accessibility, and, thus, the *N*-glycan type, as a crude predictor of subcellular location of human glycoproteins.

We have previously linked core fucosylation to glycosylation site accessibility (19). Interestingly, glycosylation site accessibility alone could not explain the differential core fucosylation of the subcellular fractionated proteins in our data: the secreted proteins did not have a higher degree of core fucosylation of complex/hybrid-type *N*-glycans than the cell-surface proteins although the secreted proteins had significantly higher accessibilities. This surprising observation may be explained by a possible advantage of the membrane-embedded cell-surface glycoproteins to achieve preferential interaction with the membrane-bound fucosyltransferase 8 (FUT8) facilitating the addition of α1,6-fucose residues to the chitobiose cores of *N*-glycans. Soluble (luminal) glycoproteins may be less likely to interact with FUT8. This explanation is congruent with our previous observation describing FUT8 discrimination of soluble *N*-glycoproteins over membrane *N*-glycoproteins (19). Similar processing preference was not observed for the multiple processing enzymes responsible for the formation of the glycan type. As expected, the glycan modifications more distal from the protein surface, i.e., β1,3/4-galactosylation and α2,3/6-sialylation, were not found to be correlated with glycosylation site accessibility since the glycosyltransferases most likely have unhindered access to the substrates relatively far from the protein surface. By the same token, we cannot rule out that a more refined accessibility determination approach, which takes into account not only the glycosylation site solvent accessibility but also the conjugated *N*-glycans (53–56), may reveal that other subcellular-specific *N*-glycan features correlate with site accessibility. New developments in glycoproteomics may also support and strengthen these observations by giving more accurate insight into the connectivity of glycosylation of the individual protein carriers (31).
Finally, it should be emphasized that although the subcellular glycoproteomes share a common biosynthetic machinery, slightly different trafficking rates and/or routes to their final destinations are factors that may contribute to distinct subcellular *N*-glycosylation. Other cellular factors including the glycosylation enzyme activity or the availability of nucleotide donors may also indirectly contribute to subcellular-specific *N*-glycosylation by having differential effects on the individual subcellular glycoproteomes.

#### **SUBCELLULAR-SPECIFIC GLYCO-DETERMINANTS IN IMMUNITY**

The distinct *N*-glycosylation signatures carried by the subcellular glycoproteomes may be functionally important in immunity if we consider the key role of *N*-glycans as mediators for an effective innate and adaptive immune response through their specific interaction with endogenous lectins. In addition, opportunistic pathogens often use exposed *N*-glycan determinants as receptors for adhesion using exogenous lectins (11). The observed subcellular-specific glycosylation is here briefly discussed in the context of glyco-immunity and infection; it is stressed that further empirical evidence is required to validate these proposed relationships.

We found that α-sialylation was a more abundant feature of the secreted *N*-glycoproteins than of the cell-surface proteins. High sialylation of secreted glycoproteins is essential to mask penultimate galactose residues from being exposed and recognized by asialoglycoprotein receptors, which are C-type lectins (12). Thus, the high sialylation of secreted glycoproteins may be a requirement to ensure prolonged circulation half-life. In addition, high sialylation of secreted glycoproteins can act as a strong decoy for the less sialylated cell-surface proteins, to which opportunistic pathogens are known to adhere through sialic acid-recognizing I-type lectins (alternatively termed siglecs) (57, 58). Displaying less-than-complete sialylation of the cell-surface proteins also ensures that a gradient of biological activity toward endogenous siglecs for cellular signaling and endocytosis (59) is maintained through structural diversity, which may confer an immunological advantage to the host cells (60).

The secreted *N*-glycoproteins were over-represented in α1,6 core fucosylation relative to the cell-surface proteins. In line with our previous observations, the higher degree of core fucosylation may serve to either mask hydrophobic patches to regulate stability/solubility of the secreted *N*-glycoproteins (19) or to protect these more exposed proteins from proteolytic degradation in the extracellular environment. It could be speculated that the membrane-embedded nature of cell-surface glycoproteins would make them more stable by not facing solubility issues in their local environment and less vulnerable to proteolytic digestion, thereby having less requirement for steric protection provided by a bulky fucose residue proximal to the protein surface.

We and others have observed that α-mannose is an unusual terminating structural determinant in the extracellular environment (61, 62). This may partly be explained by the intracellular functions of mannose (and glucose) terminating *N*-glycans (16, 17). The presence of several mannose recognizing lectins in the extracellular environment including mannan binding protein (MBP), DC-SIGN, and macrophage mannose receptors may be relevant in the context of apoptosis when mannose terminating *N*-glycoproteins are exposed to the extracellular environment. In particular, MBP is a key player and a first line of defense in innate immunity, enabling phagocytosis of apoptotic cells through its binding to exposed immature or under-processed glycans or to pathogens carrying mannosylated glycoproteins (63, 64). Hiding mannose inside cells under physiological conditions could thus be viewed as being critical to avoiding the unnecessary onset of inflammation and auto-immunity. The presence of extracellular α-mannosylation would, as such, be indicative of pathophysiological conditions. In support of this hypothesis, high-mannose containing glycoforms of intercellular adhesion molecule 1 and EGF receptor on cell-surfaces were shown to contribute to endothelial inflammation (61) and correlated with poor prognosis of various cancers, respectively (61, 62).

It has been noted that the structure and function of the protein *N*-glycome is different within and outside human cells and that these differences may be shaped by evolutionary forces (60). We are the first to systematically investigate and mechanistically explain some aspects of subcellular-specific *N*-glycosylation. We conclude that human cells have developed protein structure-specific mechanisms including differential *N*-glycosylation site accessibilities to generate subcellular glycoproteomes that display distinct *N*-glycosylation phenotypes using a shared biosynthetic machinery. Establishing this relationship is of general significance to glycobiologists and in particular to molecular immunologists due to the functional relevance of *N*-glycan determinants acting as ligands for the spectrum of endogenous lectins involved in facilitating an efficient immune response.

#### **ACKNOWLEDGMENTS**

This work was supported by a Macquarie University Research Excellence Scheme postgraduate scholarship and by ARC Super Science (FS110200026) and Discovery (DP110104958) Grants. Morten Thaysen-Andersen was funded by an Early Career Fellowship from Cancer Institute NSW. The authors declare no conflict of interest.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://journal.frontiersin.org/Journal/10.3389/fimmu.2014.00404/abstract

#### **REFERENCES**


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 09 July 2014; accepted: 07 August 2014; published online: 25 August 2014. Citation: Lee LY, Lin C-H, Fanayan S, Packer NH and Thaysen-Andersen M (2014) Differential site accessibility mechanistically explains subcellular-specific N-glycosylation determinants. Front. Immunol. 5:404. doi: 10.3389/fimmu.2014.00404 This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Lee, Lin, Fanayan, Packer and Thaysen-Andersen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

### Crossroads between bacterial and mammalian glycosyltransferases

#### **Inka Brockhausen1,2\***

<sup>1</sup> Department of Medicine, Queen's University, Kingston, ON, Canada

<sup>2</sup> Department of Biomedical and Molecular Sciences, Queen's University, Kingston, ON, Canada

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Thomas Dandekar, University of Würzburg, Germany Anne Harduin-Lepers, Centre National de la Recherche Scientifique, France

#### **\*Correspondence:**

Inka Brockhausen, Department of Biomedical and Molecular Sciences, Queen's University, 18 Stuart Street, Kingston, ON K7L3N6, Canada e-mail: brockhau@queensu.ca

Bacterial glycosyltransferases (GT) often synthesize the same glycan linkages as mammalian GT; yet, they usually have very little sequence identity. Nevertheless, enzymatic properties, folding, substrate specificities, and catalytic mechanisms of these enzyme proteins may have significant similarity. Thus, bacterial GT can be utilized for the enzymatic synthesis of both bacterial and mammalian types of complex glycan structures. A comparison is made here between mammalian and bacterial enzymes that synthesize epitopes found in mammalian glycoproteins, and those found in the O antigens of Gram-negative bacteria. These epitopes include Thomsen–Friedenreich (TF or T) antigen, blood group O, A, and B, type 1 and 2 chains, Lewis antigens, sialylated and fucosylated structures, and polysialic acids. Many different approaches can be taken to investigate the substrate binding and catalytic mechanisms of GT, including crystal structure analyses, mutations, comparison of amino acid sequences, NMR, and mass spectrometry. Knowledge of the protein structures and functions helps to design GT for specific glycan synthesis and to develop inhibitors. The goals are to develop new strategies to reduce bacterial virulence and to synthesize vaccines and other biologically active glycan structures.

**Keywords: glycosyltransferases, protein structure, specificities, glycoprotein epitopes, glycan mimics**

#### **INTRODUCTION**

Glycans play important roles in most biological processes in health and disease. Bacteria and human beings have a close relationship in the intestine, which can be symbiotic or pathogenic. Bacteria often produce human-like glycan structures with bacteria-specific glycosyltransferases (GTs) that have given them a selective advantage for adhesion, colonization, and survival. Knowledge of these enzymes can help us understand their human counterparts and provides a convenient technology to synthesize both bacterial and human glycans. Bacterial GTs can be easily expressed and stored; they are more soluble and often remarkably active and stable.

Currently, GTs from many different organisms have been classified into 96 GT families in the Carbohydrate-active Enzymes (CAZy) classification system (http://www.cazy.org), based on their sequence similarity derived from GenBank (ftp://ftp.ncbi.nih.gov/genbank/), EMBL, or DDBJ (1, 2). Very few of the bacterial GTs have been biochemically and functionally characterized; thus, proposed enzymes are assigned based on similarity searches. The CAZy database also contains genetic, structural, mechanistic, and functional information of known GTs. The former Enzyme Commission (EC) nomenclature for GTs as well as the currently accepted nomenclature and alternative names for GTs are included. A number of databases provide sequence analyses of GTs (e.g., NCBI BLAST, PFAM, INTERPRO, DBCAN, Swiss-Prot – ExPASy).

For searches of glycan structures, a number of databases are useful (3). For example, GLYCOSuiteDB contains information on N- and O-linked glycans and glycoproteins and Glycobase on N- and O-glycan structures. For glycomics analyses by mass spectrometry (MS), GlycoMaster DB at http://www-novo.cs.uwaterloo.ca:8080/GlycoMasterDB is helpful (4). The current *E. coli* O-antigen database (ECODAB) contains known O antigen structures of *E. coli*, the analytic data available, and has links to genes involved in O antigen synthesis from the O antigen gene cluster (5). Many of the *E. coli* antigens can be found in other bacterial strains. Finally, the Consortium for Functional Glycomics (http://www.functionalglycomics.org/) provides a large database for glycan functions.

Because of the widespread development of antibiotic resistance, we need new anti-bacterial strategies, and bacterial GTs are virulence factors that could be targeted. Understanding GTs can help in the production of vaccines to protect against bacterial infections and cancer, and in applications in inflammation and autoimmune disease. In this review, we will compare mammalian and bacterial GTs that show remarkable similarity of action, protein folding, or mechanisms, in spite of surprisingly large differences in amino acid sequences.

#### **MAMMALIAN GLYCOPROTEINS AND BACTERIAL GLYCANS**

Mammalian glycoproteins are involved in virtually all cellular activities; they serve as ligands for antibodies or lectins, or as receptors involved in signaling, cellular interactions, cell growth, differentiation, and cell death (6–11). Glycans are important in the inflammatory response, the innate and adaptive immune system, and cancer metastasis, as well as microbial colonization and infections. Glycoproteins have many functional epitopes attached to either N-glycans or O-glycans, and the amounts of many of these epitopes can be altered in disease, for example, in cancer. Although there is remarkable diversity in glycan structures in mammals, and hundreds of different chains can be found in glycoproteins, only six sugar residues (Man, GlcNAc, GalNAc, Gal, sialic acid, Fuc) form the extended and branched varieties of glycans, with few modifications such as O-acetylation and sulfation. N- and O-glycans can affect the chemical and physical properties and the conformations of proteins and the accessibility of peptide epitopes.

Bacteria display an astounding variety of unusual sugars and sugar linkages as well as modifications of sugars that are foreign to human beings and, therefore, can trigger immune responses. However, a number of specific bacterial glycans are mimics of mammalian glycoprotein epitopes (**Table 1**). Partial structures of O-antigenic polysaccharides of Gram-negative bacteria (ECODAB) often mimic human glycans and may help bacteria to evade the immune system and promote colonization. The mimicry may prevent the production of effective vaccines to protect against bacterial infections, which requires new considerations of antibacterial strategies. About half of the *EC* strains have some form of mammalian epitope within their O antigens. This includes Galβ1-3GlcNAcβ- and Galβ1-4GlcNAcβ-linkages, which are part of the glycan backbone structures (type 1 and type 2, respectively) in mammalian glycoproteins. In bacteria, those are internal structures within the O antigen repeating unit. The cancer-associated Thomsen–Friedenreich antigen (TF or T antigen, O-glycan core 1) is common in glycoproteins and also found in several O antigens of *E. coli*. Blood group O, A, and B, sialylated glycans, and polysialic acid are mimics found in a number of bacterial strains. The fact that bacteria are able to synthesize these human-like structures suggests that they have the appropriate biosynthetic enzymes (**Table 2**), although this would be difficult to anticipate from the inspection of the amino acid sequences of their GTs. Biochemical characterization of bacterial enzymes and structure/function studies are important prerequisites to utilize these enzymes in chemoenzymatic synthesis of mammalian glycoprotein epitope structures.

#### **ROLE OF O ANTIGENS**

The LPS of Gram-negative bacteria are essential structures of the outer membrane. LPS binds to the LPS-binding protein, requiring the CD14/TLR4/MD2 receptor complex, which elicits a strong response during infections, through TLR4 signaling. LPS consist of a lipid A base (endotoxin), which carries a relatively invariable inner oligosaccharide core, strain-specific outer core oligosaccharides, and the serotype-specific outer O antigen polysaccharide. O antigens are polysaccharides composed of up to 50 repeating units of oligosaccharides with one (homopolymeric) to 10 sugars (heteropolymeric) that play a role in bacterial adhesion and colonization, affect pathogenicity and survival, and can be bacteriophage receptors. The enormous structural variability of O antigens is mediated by many specific GTs and other enzymes that modify O antigens, thus increasing structural diversity, e.g., by adding phosphate, acetyl groups, or branching sugar residues. The LPS molecules are necessary for stabilization of the outer membrane and form a barrier against penetration of toxins. In particular, the O antigens serve to evade complement; they protect against phagocytosis and give the bacteria a strain-specific and diversity-selective advantage. The molecular mimicry found in a large proportion of bacteria adds to their ability to prevent recognition by the host immune system and thus promotes virulence.

**Table 1 | Glycan mimics: examples of mammalian glycoprotein epitopes also found in the lipopolysaccharides (LPS) or lipooligosaccharides of Gram-negative bacteria**.

Bo, Bacteroides ovatus; Cj, Campylobacter jejuni; EC, Escherichia coli; Hp, Helicobacter pylori; Nm, Neisseria meningitidis; Psp, Photobacterium sp.; Rsp, Rhizobium sp.

A number of bacteria do not have an extended O-antigenic polysaccharide but instead have a short lipooligosaccharide that may have structural identity with human glycoproteins or glycolipids and may lead to pathological conditions. The close relationship between bacteria and human beings is also apparent in the abundance of bacterial lectins that bind to mammalian glycoproteins and thus promote adhesion to mammalian tissues.

#### **N-GLYCOSYLATION OF MAMMALIAN AND BACTERIAL GLYCOPROTEINS**

In eukaryotic cells, N-glycans are assembled first on a dolichol-phosphate (P-Dol) intermediate on the cytoplasmic side of the endoplasmic reticulum (ER) membrane (7, 12) (**Figure 1**). GlcNAc-phosphate is transferred to P-Dol by GlcNAc-phosphotransferase in a reversible reaction that is inhibited by tunicamycin, followed by the transfer of another GlcNAc residue to form chitobiose. This is followed by five Man residues, all transferred from nucleotide sugar donor substrates to form the common N-glycan core structure Man5-chitobiose linked to PP-Dol. This heptasaccharide is flipped across the membrane and further addition of sugars occurs on the inside of the ER lumen through transfer from Man-P-Dol and Glc-P-Dol donor substrates. After completion of the lipid-linked N-glycan, it is transferred *en bloc* to the Asn residue(s) of Asn-X-Ser/Thr sequons in a glycoprotein by the oligosaccharyltransferase complex (OST), and Glc and Man residues are selectively cleaved by glycosidases. After transfer to the Golgi, further removal of Man residues occurs, and GlcNAc-transferase I (GnT I, MGAT1) adds the first of the N-glycan antennae in β1–2 linkage to the Manα1–3 arm of the core. This can then be followed by several steps that depend on the presence of this first GlcNAc antenna and the expression of processing enzymes, which remove two Man residues from the Manα1–6 arm, add Fucα1–6 to the inner chitobiose GlcNAc, and add further antennae to form complex type N-glycans. N-glycans can be extended by repeating Gal-GlcNAc residues to form type 1 or type 2 chains (**Table 1**); they may be branched by GlcNAcβ1–6 linkages and may be decorated with specific functional epitopes and blood group determinants (7). This results in hundreds of different N-glycan structures, depending on the glycosylation potential of the cell.

The donor substrates of mammalian and bacterial enzymes that synthesize the same linkage are the same but the preferred acceptor substrate may differ. The bacterial enzyme can mimic the mammalian enzyme and can be used to synthesize mammalian glycan structures but the amino acid sequences have little sequence identity. The percentage of amino acid identity is much higher among eukaryotic GT families and between GTs with similar action and using similar acceptors in bacteria. The accession numbers in bold (Accession No.) are from PubMed and/or UniProt database. Bo, Bacteroides ovatus; Cj, Campylobacter jejuni; EC, E. coli; Hh, Helicobacter hepaticus; Hm, Helicobacter mustelae; Hp, Helicobacter pylori; Nm, Neisseria meningitidis; Pm, Pasteurella multocida; Psp, Photobacterium sp.; Rsp, Rhizobium sp.; Sb, Shigella boydii; PP-, diphosphate-.

N-glycosylation is initiated at the endoplasmic reticulum (ER) membrane using nucleotide sugar donor substrates and a membrane-bound acceptor phospholipid with multiple isoprenyl units (dolichol-phosphate, P-Dol). The first sugar (GlcNAc) is transferred as GlcNAc-phosphate from UDP-GlcNAc by GlcNAc-P-transferase, resulting in GlcNAc-diphosphate-dolichol (GlcNAc-PP-Dol). This step can be inhibited by the UDP-GlcNAc analog tunicamycin. On the outside face of the ER membrane, another GlcNAc is added to form chitobiose, followed by five Man residues to form a heptasaccharide (Man5GlcNAc2)-PP-Dol. This heptasaccharide is flipped to the inside of the ER where the chain grows by transfer of sugars from membrane-bound Man-P-Dol and Glc-P-Dol. The completed saccharide Glc3Man9GlcNAc2 is then transferred by an oligosaccharyltransferase complex (OST) to the Asn residue in an Asn-x-Ser/Thr sequon of nascent proteins. After trimming of sugar residues in the ER by removal of Glc and Man residues to the Man8GlcNAc2 structure, glycoproteins are exported to the Golgi where further trimming occurs by mannosidases. Many N-glycan chains are processed to the complex type by the addition of GlcNAc residues by GlcNAc-transferases I to V (MGAT1 to 5). Chains grow further by the addition of Gal-GlcNAc sequences and termination by sialyl-, Fuc-, Gal-, GlcNAc-, and GalNAc-transferases, which are all highly specific for both the donor and the acceptor substrates and with few exceptions form only one type of linkage between sugars. This creates hundreds of different structures and epitopes with many possible functions, depending on the final destination of the glycoprotein, e.g., in the cell membrane or in secretions. Glycoprotein biosynthesis is regulated at many different levels, e.g., by the synthesis and delivery of nucleotide sugar substrates, the expression, activities and localization of glycosyltransferases and trimming hydrolases, the competition of enzymes for common substrates, levels of metal ion activating factors, localization of enzymes involved, and rate of transport of glycoproteins.
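The stepwise build-up of the lipid-linked precursor described above can be followed as simple bookkeeping of residue counts. The sketch below is only an illustration: the split of the lumenal additions into four Man and three Glc residues is inferred from the difference between the stated intermediates (Man5GlcNAc2 and Glc3Man9GlcNAc2), the ER trimming step is collapsed into a single entry, and the names `STEPS`, `label`, and `walk` are hypothetical.

```python
from collections import Counter

# Each entry: (description, residues added (+) or removed (-)).
# Counts for the lumenal steps are inferred from the stated intermediates.
STEPS = [
    ("GlcNAc-P transferred to P-Dol (cytoplasmic face)", {"GlcNAc": 1}),
    ("second GlcNAc added to form chitobiose", {"GlcNAc": 1}),
    ("five Man added from nucleotide sugar donors", {"Man": 5}),
    ("four Man added in the ER lumen from Man-P-Dol", {"Man": 4}),
    ("three Glc added in the ER lumen from Glc-P-Dol", {"Glc": 3}),
    ("ER trimming after transfer to protein", {"Glc": -3, "Man": -1}),
]


def label(comp):
    """Format a composition in the order used in the text, e.g. Glc3Man9GlcNAc2."""
    order = ("Glc", "Man", "GlcNAc")
    return "".join(f"{sugar}{comp[sugar]}" for sugar in order if comp[sugar] > 0)


def walk(steps):
    """Apply each step in turn and record the resulting composition label."""
    comp = Counter()
    history = []
    for description, delta in steps:
        comp.update(delta)  # Counter.update adds counts; negative values subtract
        history.append((description, label(comp)))
    return history


history = walk(STEPS)
for description, composition in history:
    print(f"{composition:>16}  {description}")
```

Listing the steps as data rather than hard-coding the compositions keeps the intermediates (Man5GlcNAc2, Glc3Man9GlcNAc2, Man8GlcNAc2) as computed values, so they can be checked against the text.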

Not all N-glycosylation sites carry N-glycans, and there are differences in chain processing between different glycosylation sites of the same protein. The peptide has been shown to interact with the glycan chains, and this controls the conformations of the glycan and the peptide and leads to site-specific glycosylation. Many sequentially acting and competing GTs assemble glycoproteins in a cell type-specific pattern. Most of the GTs involved exist as families of enzymes (**Table 3**). Several of these have been shown to be localized in specific Golgi compartments according to their action within the complex pathways.

In the mammalian biosynthetic pathways, the sequence of sugar additions is controlled by the gene expression, the relative activities of competing enzymes, the enzyme localizations, levels of substrates and cofactors, and the distinct substrate specificities of GTs. These types of controls still need to be investigated for glycosylation reactions in bacteria.

Bacteria such as *Campylobacter jejuni* (*Cj*) also have N-glycosylated proteins (13). An oligosaccharide is first assembled on undecaprenol-phosphate (P-Und), an analog of P-Dol, in the cytoplasmic compartment. The sugar-PP-Und is then flipped to the periplasmic space where the glycan chain is transferred *en bloc* to protein by oligosaccharyltransferases. These GTs have a broad specificity toward their donor substrates but also require a sequon in the protein acceptor, Asp/Glu-x-Asn-y-Ser/Thr, where x and y cannot be Pro, that bears close resemblance to the mammalian N-glycosylation sequon (**Figure 1**).
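The two sequons can be compared directly as pattern searches over a one-letter protein sequence. The following is a minimal sketch rather than a validated predictor: it assumes the commonly applied rule that X in the eukaryotic Asn-X-Ser/Thr sequon is not Pro, it ignores structural context and site occupancy, and the function names and the toy sequence are made up for illustration.

```python
import re

# Zero-width lookaheads so that overlapping sequons are all reported.
EUKARYOTIC = re.compile(r"(?=N[^P][ST])")        # Asn-X-Ser/Thr, X != Pro (assumed)
BACTERIAL = re.compile(r"(?=[DE][^P]N[^P][ST])")  # Asp/Glu-x-Asn-y-Ser/Thr, x, y != Pro


def eukaryotic_sites(seq):
    """1-based positions of Asn residues in Asn-X-Ser/Thr sequons."""
    return [m.start() + 1 for m in EUKARYOTIC.finditer(seq)]


def bacterial_sites(seq):
    """1-based positions of Asn residues in Asp/Glu-x-Asn-y-Ser/Thr sequons."""
    # The match starts at Asp/Glu, two residues before the Asn.
    return [m.start() + 3 for m in BACTERIAL.finditer(seq)]


toy = "MDQNATKNPSAENGTLL"  # made-up sequence, not a real protein
print(eukaryotic_sites(toy))
print(bacterial_sites(toy))
```

Because the bacterial pattern is the eukaryotic one with an added upstream Asp/Glu requirement, every bacterial hit is also a eukaryotic hit, while the reverse does not hold.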

#### **PROTEIN O-GLYCOSYLATION**

O-glycans of glycoproteins and mucins are assembled in mammals without a lipid intermediate and without removal of sugar residues by glycosidases (14, 15). The first sugar is always GalNAc α-linked to Ser or Thr (the cancer-associated Tn antigen). All sugars are transferred from nucleotide sugars in the Golgi, resulting in extended and branched O-glycans with hundreds of different structures. The most common structure is Galβ1-3GalNAc, core 1, the T antigen, which is normally masked by the addition of other residues but exposed in many cancer cells. In a number of cells, core 1 is branched by core 2 β6-GlcNAc-transferase (C2GnT) or extended in a fashion that is similar to the synthesis of complex N-glycans.

#### **Table 3 | Examples of glycosyltransferase families (CAZy)**.

Fold, expected overall fold; Hh, Helicobacter hepaticus; Hm, Helicobacter mustelae; I, Inverting mechanism; Psp, Photobacterium sp.; R, Retaining mechanism; -T, -transferase.

GalNAc is transferred from UDP-GalNAc by up to 20 polypeptide GalNAc-transferases (GALNTs) (14–16). All GALNTs are classified in the GT27 family with a GT-A fold (**Table 3**). They have a catalytic domain linked to a lectin (ricin-like) domain at the C terminus. This lectin domain has three subdomains and may play an important role in binding products or substrates containing GalNAc residues. A crystal structure of mouse GALNT1 with Mn<sup>2+</sup> supported the importance of a DxH motif and the role of Asp209, His211, and His344 (17) (**Table 4**). The conformations of human GALNT2 (18) crystallized with UDP and with or without an acceptor peptide showed a loop formed over UDP. It appeared that the acceptor peptide connected the otherwise separate catalytic and lectin domains. Kinetic analyses showed that the presence of GalNAc in the acceptor was beneficial for activity. Human GALNT10 was crystallized complexed with UDP, GalNAc, and Mn<sup>2+</sup> (19). GalNAc-peptides appear to bind to the second beta-subdomain of the lectin domain. Binding of the donor induces a conformation change that opens the acceptor-binding site. These three crystallized GALNTs are similar in overall structure and mechanism.

An equivalent GALNT that transfers GalNAc to protein has not been identified in bacteria, although bacteria are known to O-glycosylate Ser/Thr residues of proteins with various sugar residues. In contrast to mammalian O-glycosylation, bacteria transfer a pre-assembled oligosaccharide to Ser/Thr. Bacterial protein OGTs have no sequence homology to GALNT and their action is reminiscent of that of OST in the N-glycosylation pathway. In several bacteria, for example in *Campylobacter* and *Neisseria*, an oligosaccharide or monosaccharide is first pre-assembled on PP-lipid in the cytoplasmic compartment, flipped to the periplasm, and then transferred *en bloc* to Ser/Thr residues of proteins. These enzymes have a relaxed oligosaccharide donor specificity (46). Oligosaccharyltransferase PglL (which has not yet been assigned to a GT family) from *Neisseria meningitidis* (*Nm*) can transfer many different glycans from sugar-PP-Und or sugar-PP-lipid (including sugar-PP-Dol) donor substrates to protein in the periplasmic space. UDP-*N*,*N*′-diacetylbacillosamine was also a donor substrate *in vitro*, showing that even nucleotide sugars can be donors and that a single sugar can be transferred to protein. Mutagenesis experiments showed that PglL from *Nm* requires His349 for activity and for interaction with the lipid-linked oligosaccharide (47).

#### **BIOSYNTHESIS OF BACTERIAL O ANTIGENS**

There are many similarities in the pathways and mechanisms by which bacterial O antigens and mammalian glycoproteins are synthesized. In Gram-negative bacteria, O antigens are synthesized by specific GTs at the cytosolic face of the inner membrane, where the nucleotide sugar donor substrates are present, as well as the membrane-bound P-Und, an analog of the mammalian P-Dol, which serves as the acceptor substrate for the transfer of the first sugar (48) (**Figure 2**). The first GT to act is always a sugar-phosphate transferase that produces the sugar-PP-Und substrate for subsequent transfer of monosaccharides by GTs. Most *E. coli* strains have GlcNAc or GalNAc at the reducing end of the repeating unit; thus, sugar-phosphate transferase WecA and its orthologs are responsible for the first reaction, maintaining the α-anomeric configuration of GlcNAc. 4-Epimerases may also be involved in interconverting GlcNAc and GalNAc in the activated form (UDP-GlcNAc/UDP-GalNAc) or after the sugar transfer (49).


Enzymes are listed that have been characterized and crystallized and shown to be involved in the synthesis of glycoproteins or their glycan mimics in bacteria. Bo, *Bacteroides ovatus*; Bs, *Bacillus subtilis*; Cj, *Campylobacter jejuni*; DxD, presence of a DxD motif or its analog or proposed catalytic residue; GT, Glycosyltransferase family (CAZy); Hp, *Helicobacter pylori*; multi, multifunctional enzyme; Inverting, enzymes invert the anomeric configuration of the sugar of the donor substrate to form the enzyme product; N-glycan core Gn, reducing GlcNAc residue of the N-glycan core structure; Nm, *Neisseria meningitidis*; Pm, *Pasteurella multocida*; Pp, *Photobacterium phosphoreum*; Psp, *Photobacterium* sp.; Rsp, *Rhizobium* sp.; Retaining, enzymes retain the anomeric configuration of the sugar of the donor substrate in the enzyme product; SA, sialic acid; -T, -transferase.

The common heteropolymeric O antigens are synthesized by sequential transfer of sugar units by donor- and acceptor-specific, membrane-associated GTs. The specificities of these bacterial GTs are distinct and comparable to those of eukaryotic GTs. A completed repeating unit is then translocated across the inner membrane to the periplasmic side by the multiple membrane-spanning flippase Wzx, a process resembling the transfer of the Man5GlcNAc2–PP-Dol intermediate across the ER membrane. Polymerization involves the addition of repeating units to the reducing end of the growing chain by Wzy polymerase. This enzyme has 12 predicted transmembrane domains, with a catalytic domain in the periplasm that has some specificity for the structure of the repeating unit. Wzy may invert the anomeric linkage of the first sugar in the polysaccharide since many repeating units have the GlcNAcβ-linkage in the O antigen. Many genes specifically involved in the synthesis of the O antigen are found in the O antigen gene cluster. The presence of the *wzy* gene suggests that the O antigen is synthesized by the Wzy-dependent pathway (**Figure 2**). A much less specific chain terminator Wzz then helps to restrict the number of repeating units assembled in the O antigen. This is followed by a ligase (polysaccharide transferase)-catalyzed transfer of the O antigen to a specific sugar of the outer core structure, synthesizing the complete LPS. This releases PP-Und, which is recycled to P-Und. LPS is then extruded to the outer membrane by the Lpt complex (50).
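The gene-cluster heuristic used in this section (a *wzy* gene points to the Wzy-dependent pathway, while *wzm* together with *wzt* points to the ABC transporter-dependent pathway) can be sketched as a small lookup. This is only an illustration: the function name, the return labels, and the example gene lists are hypothetical, and real annotation would need far more evidence than gene names.

```python
def infer_o_antigen_pathway(cluster_genes):
    """Guess the O antigen assembly pathway from gene names in a gene cluster.

    Heuristic only: wzy -> Wzy-dependent; wzm + wzt -> ABC transporter-dependent.
    """
    genes = {g.lower() for g in cluster_genes}
    has_wzy = "wzy" in genes
    has_abc = {"wzm", "wzt"} <= genes  # both transporter genes must be present
    if has_wzy and has_abc:
        return "ambiguous"
    if has_wzy:
        return "Wzy-dependent"
    if has_abc:
        return "ABC transporter-dependent"
    return "undetermined"


# Hypothetical gene lists, for illustration only.
print(infer_o_antigen_pathway(["wzx", "wzy", "wbwC"]))
print(infer_o_antigen_pathway(["wzm", "wzt", "wbdA"]))
```

Returning "ambiguous" rather than picking one pathway is a design choice here, so that conflicting evidence is surfaced instead of hidden.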

The less common homopolymeric O antigens, such as the d-Rha polymers of *Pseudomonas aeruginosa* (*Pa*) and the d-Man polymers of *E. coli* O9, are synthesized by the transfer of monosaccharides from nucleotide sugars to R-GlcNAc-PP-Und in a processive fashion in the ABC transporter-dependent pathway (**Figure 3**) (51). Some of the processive GTs can have multiple catalytic domains, e.g., Man-transferase WbdA. The entire O antigen is assembled on the cytosolic side, and chain growth is ended by termination reactions, e.g., methylation. This is followed by translocation of the large O-antigen-PP-Und by the Wzm exporter and the ATP-binding Wzt to the periplasm, where it is further processed. The presence of *wzm* and *wzt* genes in the O antigen gene cluster would suggest that this pathway is operative.

**bacteria by the polymerase-dependent pathway**. Many steps of the complex sequences and controls in the biosynthesis of LPS in Gram-negative bacteria are similar to those in mammalian glycoprotein biosynthesis. The inner membrane serves as the site of glycan biosynthesis, and the membrane-bound acceptor is undecaprenol-phosphate (P-Und) having 11 isoprenyl units, which is fewer than those found in eukaryotic Dol. Nucleotide sugars are synthesized in the cytosol and used for most glycosylation reactions. As in N-glycan biosynthesis, the first sugar is transferred as sugar-phosphate by membrane-bound WecA to synthesize GalNAc/GlcNAc-PP-Und. This step can also be blocked by tunicamycin. It is possible that a 4-epimerase is involved. Subsequently, sugars are added individually to form the repeating unit of the O antigen. The glycosyltransferases that transfer sugars from nucleotide sugars usually have a high specificity for their donor and acceptor substrates and are associated with the membrane. After Wzx transports the repeating units to the periplasm, they are polymerized by Wzy by addition of repeating units to the reducing end of the growing polysaccharide linked to PP-Und. The O antigen can be further processed and modified to form completed O antigens and the biosynthesis is usually terminated with Wzz. The O polysaccharide is then transferred to a sugar of the core oligosaccharide linked to lipid A by a ligase, forming the LPS, which is exported to the outer membrane by the Lpt complex. The O antigenic polysaccharide is then exposed to the environment on the outer membrane. Although many bacterial enzymes involved in LPS synthesis have been cloned, the individual steps of LPS synthesis are not well understood, mainly because of the major challenge of finding appropriate enzyme substrates and assay conditions. The example shows the biosynthesis of the *E. coli* O104 antigen. The repeating unit tetrasaccharide contains the cancer-associated T antigen (Galβ1-3GalNAc), as well as the sialyl-T antigen (sialylα2-3Galβ1-3GalNAc). The WbwA sialyltransferase and the WbwB Gal-transferase remain to be characterized.

The events utilizing membrane-bound acceptor substrates in bacteria are similar to those of the early N-glycan synthesis in eukaryotes at the ER inner membrane (**Figure 1**). In both mammals and bacteria, isoenzymes are known that can synthesize the same linkage, often with slightly different substrate specificity. These isoenzymes are interesting models to study the catalytic sites and requirement for specific amino acids critical for catalysis and specificity.

#### **CHARACTERIZATION OF GLYCOSYLTRANSFERASES**

Chemical synthesis has been used to produce natural-like or unnatural glycans, but the required stereochemistry and regioselectivity are difficult to achieve. Nature has developed GTs, excellent tools to synthesize an amazing diversity of glycan structures with defined anomeric configurations and linkages. GT reactions do not require harsh conditions or protection of reactive groups. GTs have distinct specificities for their donor and acceptor substrates. More than 100,000 genes from various species are thought to encode GTs, and organisms have 1–2% of their genes dedicated to GTs (52).

In order to assess the requirements and characteristics of GT activities, specific and accurate enzyme assays have to be developed. Nucleotide sugar donor substrates for mammalian glycoprotein biosynthesis are usually commercially available, but those for bacterial enzymes may have to be chemically or enzymatically synthesized. It would be difficult to extract the natural donor and acceptor substrates from bacteria in pure form. Therefore, syntheses for bacteria-specific donor substrate analogs have been developed, e.g., for UDP-QuiNAc (UDP-6-deoxy-GlcNAc) found in *E. coli* and *Pseudomonas aeruginosa* (*Pa*) (53) or for GDP-d-Rha found in *Pa* (54). Oligosaccharides linked to a synthetic aglycone group may be suitable acceptors for both mammalian and bacterial GTs. However, bacterial GTs that act early in the O antigen synthesis pathway seem to require sugar-diphosphate-lipids as acceptors, which are difficult to synthesize. We developed the natural acceptor analog GlcNAcα-diphosphate-lipid to mimic the product of the first sugar-phosphate addition (55), which was very active as an acceptor. In order to isolate the enzyme product from the assay mixture for quantification, a number of different chromatographic methods have been employed, including hydrophobic or anion exchange methods, HPLC, TLC, and capillary electrophoresis. Enzyme-coupled assays or lectin and antibody binding have also been used to determine activities. Methods to assay specific GTs are essential prerequisites for studying their properties, optimal conditions, and substrate specificities, and for developing inhibitors.

GTs can be classified in the CAZy database based on similarities of their amino acid sequences, the sugar they transfer, and the stereochemistry of the reaction. If at least 100 amino acids in two different stretches of the protein have significant similarity to other members of the same family but not to other families, GTs are assigned to a specific family with the same predicted fold and the same mechanism, either inverting or retaining (**Tables 3** and **4**). However, not all known sequences fit into a GT family, some are reclassified when the specific function of the GT has been established, and the number of families is growing. Sequence similarity of unknown proteins can be used to predict function and protein folding. However, the final proof of function has to be obtained by biochemical analysis of enzymes. Most GTs in bacteria have not been functionally characterized, and this area is both challenging and tedious, often because the appropriate donor and acceptor substrates have to be specially prepared.

Crystal structures for GTs from eukaryotic and prokaryotic sources have been helpful in delineating the catalytic actions of GTs. It is interesting that this large and important class of thousands of enzymes that bind to many different nucleotide sugars as well as to a very large variety of monosaccharides, oligosaccharides, glycopeptides, and glycolipids occurs in only two major fold types, GT-A and GT-B. GTs, thus, have a relatively conserved three-dimensional architecture within their catalytic sites and share mechanisms, resulting in an extremely large number of product structures with linear or branched glycans of mostly unknown functions.

The binding of substrates to GTs and the transfer reactions have been shown to involve conformational changes in the enzyme proteins. GT-A folded enzymes have two tightly associated α/β/α Rossmann nucleotide-binding-like domains with two α-helices surrounding an open twisted, central β-sheet. The donor and acceptor substrates bind in different domains. The GT-B folded enzymes have two β/α/β Rossmann-like domains, which are less tightly associated with each other and have the active site in the cleft between the domains (56). Usually, the sugar donor substrate binds first. This induces a conformational change in the enzyme, forming a lid over the nucleotide sugar and facilitating the binding of the acceptor substrate and catalysis in an ordered sequential, regio- and stereo-specific mechanism (57, 58). Internal disordered loops seem to be a common feature in mammalian and bacterial enzymes (40). A short, disordered protein loop becomes ordered when the donor substrate is bound. A change in orientation and conformation of the resulting ordered loop appears to facilitate binding of the second substrate and catalysis. Thus, the function of an ordered loop could be to allow catalysis, possibly by excluding water that would hydrolyze the donor substrate, to form a lid over the nucleotide binding site that allows the acceptor to bind, or to allow movement that facilitates the reaction.

Generally, GTs have a distinct acceptor substrate specificity and with few exceptions, utilize only one type of nucleotide sugar donor substrate. Although few of the bacterial GTs have been biochemically characterized, it appears that both bacterial and mammalian GTs generally have similar properties with respect to their optimal pH, metal ion requirement, and donor specificity, although some bacterial GTs have a more promiscuous acceptor specificity (59). They can, thus, synthesize unnatural linkages that may find application as inhibitors or for biological studies. For example, β4-Gal-transferase LgtB from *Helicobacter pylori* (*Hp*) has been used to synthesize thio-glycosides.

A comparison of mammalian and corresponding bacterial GTs (**Table 2**) shows that there is a low percentage of amino acid identity (often <12%), although the activities are comparable and the sugar transfer reactions follow similar mechanisms. Exceptions are the ABO transferases, GTA and GTB, which synthesize blood group A and B, respectively, similarly in human beings and in certain bacteria, and show about 20% identity. Some of the α2 and α3/α4-Fuc-transferases also have similar activities when comparing human and bacterial GTs and show 14.5–17.5% identity. This suggests an exchange of genes between mammals and bacteria or a common evolutionary origin. The similarity and identity between GTs with similar function in bacteria or within the eukaryotic GT families can be much higher. The arrangements of amino acids in the catalytic site may therefore be similar, leading to the binding of the same nucleotide sugars and acceptors with the transfer of the sugar in a specific linkage. The requirement of a metal ion to stabilize the negative charge of the nucleotide sugar may also be the same. An evolutionarily conserved feature of GTs is that the catalytic mechanism usually involves a catalytic base.
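To make the identity percentages quoted above concrete, the following minimal sketch computes percent identity from two sequences that have already been aligned. It assumes one common convention (identical residues divided by the number of columns in which neither sequence has a gap); other conventions use a different denominator and give different numbers, and the source does not state which one was used. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy pre-aligned fragments (not real GT sequences): 4 identical of 6 compared columns.
print(round(percent_identity("MKT-DVD", "MRTADLD"), 1))
```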

#### **Table 5 | Conserved peptide motifs in glycosyltransferases**.

Inverting GTs (52, 57, 58) invert the anomeric configuration of the sugar in the donor substrate. This inversion is expected to follow a single displacement in which the catalytic base deprotonates the hydroxyl group of the acceptor to be glycosylated, which then becomes an active nucleophile attacking carbon-1 of the sugar of the donor substrate. This mechanism involves an SN2 reaction and an oxocarbenium ion transition state. Crystal structures show that the catalytic base (Asp, Glu, or His) is properly positioned near the hydroxyl to be glycosylated. In many cases, this catalytic residue is within a conserved DxD motif (**Table 5**). Both the DxD motif and the negatively charged phosphate group of the nucleotide leaving group may be stabilized by a divalent metal ion, but positively charged amino acids could also serve this function (20). Inverting GTs have been shown to act with a sequential ordered mechanism.
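As a rough illustration of how a short peptide motif such as DxD can be located in a protein sequence, the sketch below uses a regular expression with a lookahead so that overlapping matches are also reported. The helper name and the toy sequence are hypothetical, and `.` here matches any character, so a real scan would restrict `x` to the amino acid alphabet. Finding the pattern says nothing about whether it is functional: short motifs like this occur by chance, and the biochemical proof discussed in the text is still required.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based start position, matched text) for every match, overlaps included."""
    regex = re.compile(f"(?=({pattern}))")  # zero-width lookahead allows overlapping hits
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


DXD = "D.D"  # Asp, any residue, Asp

# Toy sequence, not a real glycosyltransferase.
toy_sequence = "MKLDVDAAGDDDR"
print(find_motif(toy_sequence, DXD))  # [(4, 'DVD'), (10, 'DDD')]
```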

GTs that retain the anomeric linkage of the nucleotide sugar may function in a double displacement mechanism (58). Thus, for retaining GTs, a short-lived glycosyl-enzyme intermediate may form. This is followed by a shift in protein conformation that allows a nucleophilic attack on the anomeric center of the sugar by the deprotonated hydroxyl of the acceptor substrate to be glycosylated, maintaining the original anomeric linkage. A double displacement mechanism has been proposed for GalNAc-transferase GTA and Gal-transferase GTB, and a covalent glycosyl-enzyme intermediate through Cys303 was found (68). Other mechanisms may be possible and need to be investigated for retaining enzymes (58). GTs can also transfer sugar to water and thus have a nucleotide sugar hydrolase activity.
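The stereochemical outcome of the two mechanism classes can be summarized in a few lines. The sketch below assumes only that the anomeric configuration of the sugar in the donor is known (for example, Gal is α-linked in UDP-Gal); the function name and string labels are hypothetical, and the code describes the net outcome only, not the reaction path.

```python
def product_anomer(donor_anomer, mechanism):
    """Predict the anomeric configuration of the product linkage.

    donor_anomer: "alpha" or "beta" (configuration of the sugar in the nucleotide sugar)
    mechanism: "inverting" or "retaining"
    """
    flip = {"alpha": "beta", "beta": "alpha"}
    if donor_anomer not in flip:
        raise ValueError("donor_anomer must be 'alpha' or 'beta'")
    if mechanism == "inverting":
        return flip[donor_anomer]
    if mechanism == "retaining":
        return donor_anomer
    raise ValueError("mechanism must be 'inverting' or 'retaining'")


# UDP-Gal carries an alpha-linked Gal:
print(product_anomer("alpha", "inverting"))  # beta, as in Galβ1-4GlcNAc
print(product_anomer("alpha", "retaining"))  # alpha, as in Galα1-4 linkages
```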

Mammalian GTs are single or multiple membrane-spanning proteins in the ER or single-pass type II membrane proteins in the Golgi, with a short cytosolic domain, a transmembrane anchor domain, and a stem region that helps the globular catalytic domain to protrude into the Golgi lumen. In the bacterial inner membrane, the first enzyme that adds the sugar-phosphate to P-Und, such as WecA, as well as related sugar-phosphate transferases, have multiple membrane-spanning domains. The remaining bacterial GTs that assemble O antigen repeating units do not have a transmembrane domain but have short hydrophobic stretches that may contribute to an association with membrane components. It is possible that both mammalian and bacterial GTs exist in protein/membrane complexes that activate enzymes and make the assembly of glycan chains highly efficient.

#### **A LARGE FAMILY OF Gal-TRANSFERASES**

Families of at least 5 β3-Gal-transferases (B3GALTs) and at least 7 β4-Gal-transferases (B4GALTs) participate in forming the extensions of glycoproteins (69) that are the basis for the attachment of epitopes including the Lewis<sup>x</sup> antigen, the selectin ligand involved in the inflammatory response (8). These inverting, metal ion-dependent GTs have a DxD motif and bind UDP-Gal and a number of GlcNAc-terminating acceptor substrates.

#### **THE B4GALT FAMILY**

The crystal structures of both human and bovine β4-Gal-transferase 1 (B4GALT1), in complexes with donor and acceptor substrates and as several mutants, have been thoroughly studied (22). UDP-Gal binds in a deep catalytic pocket of the bovine B4GALT1 together with Mn<sup>2+</sup>, in the vicinity of the Asp252, Asp318, and Glu317 residues. The conformational change induced by binding UDP-Gal creates the binding site for GlcNAc-terminating oligosaccharides. The GlcNAc moiety, which needs to be in the β-anomeric configuration, is bound by Phe280, Phe360, Tyr286, Arg259, and Ile363. The enzyme has three DxD sequences.

In the bovine B4GALT1 enzyme, the first Asp254 residue in the DVD motif has contact with UDP and Mn<sup>2+</sup>, but mutations of Asp318 or Asp320 within the DDD sequence show that these residues are essential for activity. His344 normally interacts with Mn<sup>2+</sup>. A His344Met mutant is active in the presence of Mg<sup>2+</sup> instead of Mn<sup>2+</sup> and maintains a closed conformation bound to Mg<sup>2+</sup> and UDP-hexanolamine, allowing an acceptor to bind. The mutant is, thus, useful to study the role of conformational changes and the binding of various acceptors (70, 71).

The catalytic domain of B4GALT1 has a short and a long flexible loop containing the metal ion binding site. The binding of the donor and metal ion induces conformational changes in the long flexible loop, which changes from the open to the closed conformation, creating a lid over the bound nucleotide sugar. This opens an acceptor-binding site at the C terminus of the flexible loop. After the transfer reaction, the loop changes back to the open conformation, releasing the nucleotide (72). β4-Gal-transferase 7 (B4GALT7) is another member of the same family, involved in priming glycosaminoglycan synthesis by adding Gal to xylose (24). B4GALT7 also works by an SN2-type mechanism and changes from the closed to the open conformation upon binding UDP and Mn<sup>2+</sup>. The mammalian β4-Gal-transferases have a common B4GALT motif, GWGxED, which is not found in β3-Gal-transferases or in the bacterial counterparts of B4GALT (61) (**Table 5**).

β4-Gal-transferases that synthesize Galβ1-4GlcNAc sequences are also found in bacteria. For example, β4-Gal-transferase LgtB from *Helicobacter pylori* (*Hp*) can synthesize Galβ4-S-GlcNAc and Galβ4-Man linkages (59). The repeating unit of *Shigella boydii* (*Sb*) also contains the Galβ1-4GlcNAc sequence, which is synthesized by β4-Gal-transferase WfeD (73). The sequences of human β4-Gal-transferase and WfeD have about 9% identity; yet, the reaction catalyzed is similar. Both enzymes are inverting GTs, bind UDP-Gal and GlcNAc-R acceptor substrates, are activated by Mn<sup>2+</sup>, and have a DxD motif. Interestingly, we found that both enzymes are also activated by Pb<sup>2+</sup>, although the activation of the bacterial enzyme is much higher and is similar to Mn<sup>2+</sup> activation. While human β4-Gal-transferase is in the GT7 family with a GT-A fold, the structure and predicted fold of WfeD in GT family 26 are uncertain (**Table 3**). The human enzyme does not accept the negatively charged bacterial acceptor substrate, GlcNAc-PP-lipid, and, vice versa, the bacterial enzyme cannot act on GlcNAcβ-Bn, which is the standard acceptor for assays of the human enzyme. Mutagenesis of WfeD showed that the central Glu101 residue of the DxExE sequence is essential for activity. Lys211 was also found to be important, possibly by binding one or two phosphate group(s) of the acceptor substrate (73). Lys residues are apparently not involved in catalysis by the human enzyme. WfeD is not inhibited by GlcNAcβ-naphthyl, which is a potent inhibitor of the mammalian β4-Gal-transferase (74).

#### **THE FAMILY OF** β**3Gal-TRANSFERASES (B3GALT)**

Human glycoproteins can be extended with Galβ1–3GlcNAc (type 1) sequences that are also found in O antigens, e.g., in the repeating unit structure of the *E. coli* O7 antigen. There are five enzymes that synthesize the Galβ1–3GlcNAc linkage on a variety of acceptors in mammals. They are inverting GTs having a DxD motif and a requirement for divalent metal ions such as Mn<sup>2+</sup> (15, 69). B3GALT5 has a distinct specificity for O-glycan core 3 (GlcNAcβ1–3GalNAc-) acceptors. However, crystal structures are not available for β3-Gal-transferases. Members of the β3-Gal-transferase family have two common peptide motifs, in addition to the DxD motif (**Table 5**).

A β3-Gal-transferase, WbbD, from *E. coli* O7 was detected that can act on GlcNAcα-PP-lipids, where the lipid structure apparently contributes little to the activity (75). The enzyme belongs to the GT2 family with a predicted GT-A fold and synthesizes the disaccharide Galβ3GlcNAc α-linked to PP-lipid as the second step in repeating unit synthesis. Deletion of the enzyme eliminates the synthesis of O antigen on LPS. This supports the idea that inhibition of this second step could create bacteria that are more susceptible to the mammalian immune system.

#### **BIOSYNTHESIS OF THE THOMSEN–FRIEDENREICH (TF) ANTIGEN**

The cancer-associated T antigen, Galβ1–3GalNAc-, core 1, is the precursor for most O-glycans. In cancer, core 1 is often found in the unsubstituted form, while in normal glycoproteins, it is substituted by other sugars and is thus not recognized by anti-T antibodies. Sialylation of core 1 is also common in glycoproteins and often overexpressed in cancer and is recognized as the sialyl-T antigen (15). Several bacteria carry the T antigen as an internal structure within their O antigen repeating unit. The Shiga toxin producing O104 serogroup of *E. coli* is unusual in that it contains the T antigen in its O antigen repeating unit, as well as the sialyl-T antigen, sialylα2–3Galβ1–3GalNAc- (ECODAB).

The core 1 structure in human beings is synthesized by core 1 β3-Gal-transferase (T synthase, C1GALT1), and deficiencies of the enzyme are associated with pathological conditions including cancer. T synthase is the only known GT that requires the co-expression of a chaperone protein, Cosmc (C1GALT1C1) (76). C1GALT is a GT31 family member with a predicted GT-A fold, requires Mn<sup>2+</sup> for activity, and prefers GalNAcα-glycopeptides as substrates but can also transfer Gal from UDP-Gal to GalNAcα-benzyl and related substrates (77).

The GTs responsible for the synthesis of the T antigen in bacteria have a similar function (**Table 2**). The T synthase WbwC in the *E. coli* O104 strain is within the GT2 family (**Table 3**), and has only 10.5% identity compared to human C1GALT. No chaperone is necessary for the expression and activity of the bacterial enzyme (78). Both human C1GALT and WbwC have a GT-A fold and DxD motifs, utilize UDP-Gal as a donor, and require Mn<sup>2+</sup> as a cofactor. However, in contrast to C1GALT, WbwC has a specificity for the GalNAcα-diphosphate-lipid acceptor, while GalNAcα-peptides are not substrates. At this time, no crystal structure is available for T synthases, but it is conceivable that the three-dimensional amino acid arrangements in the catalytic sites are similar. WbwC and human C1GALT could be distinguished using bis-imidazolium salt inhibitors: only WbwC, and not human C1GALT, was strongly inhibited, with IC<sub>50</sub> values of 8 µM (78). These inhibitors could selectively attack GTs in pathogenic bacteria. However, a potent inhibitor for T synthase has yet to be discovered (77).
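To give a feel for what an IC<sub>50</sub> of 8 µM implies, the sketch below estimates the fraction of enzyme activity remaining at a given inhibitor concentration. It assumes a simple dose-response curve with a Hill coefficient of 1; this model is an assumption made for illustration and is not taken from the cited inhibition study, and the function name is hypothetical.

```python
def residual_activity(inhibitor_um, ic50_um, hill=1.0):
    """Fraction of uninhibited activity remaining, from a simple Hill-type curve.

    Assumed model: activity = 1 / (1 + ([I] / IC50) ** hill)
    """
    if inhibitor_um < 0 or ic50_um <= 0:
        raise ValueError("concentrations must be non-negative and IC50 positive")
    return 1.0 / (1.0 + (inhibitor_um / ic50_um) ** hill)


# With IC50 = 8 µM, half of the activity remains at 8 µM inhibitor by definition.
for conc in (0.0, 8.0, 80.0):
    print(conc, round(residual_activity(conc, 8.0), 3))
```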

#### **P BLOOD GROUP SYNTHESIS**

Human blood group P (**Table 1**) and related, complex structures containing the Galα1–4 linkage are synthesized by α4-Gal-transferases (A4GALT), mainly using glycolipids with Gal residues as acceptors, e.g., lactosylceramide (79). However, a different α4-Gal-transferase from pigeon, related to β4-Gal-transferase from the same species, but not to β4-Gal-transferases from human beings, has been described that preferably acts on the N-glycans of glycoproteins (80).

A number of bacteria, including *Cj* (81), also express an α4-Gal-transferase with about 11% identity to the human enzyme (**Table 2**). The LgtC α4-Gal-transferase from *Nm* synthesizes the bacterial mimic of the human P blood group (45). The enzyme is a member of the GT8 family with a GT-A fold and follows a bi-bi kinetic mechanism where UDP-Gal binds first. The crystal structure of LgtC with analogs of UDP-Gal and lactose substrates suggests that Asp103 and Asp105 of one of the four DxD motifs, as well as His244, are in the vicinity of the donor substrate, while a Mn<sup>2+</sup> ion coordinates the phosphates of UDP. The mainly helical C terminus is expected to form hydrophobic and electrostatic interactions with the bacterial membrane. Multiple conformational states of LgtC with and without bound substrate analogs were found by methyl-TROSY NMR (82), providing information that cannot be obtained by static crystal structure analysis.

#### **A NEW DxDD MOTIF IN GT2 TRANSFERASES**

A new DxDD motif (**Table 5**), essential for activity, was discovered in WbwC (78). This motif is also present in WbdN, WfaP, WfgD, WbgO, WbiP, and CgtB (83–87). All of these GTs in the GT2 family having a DxDD motif are specific for the transfer of either Gal or Glc in β1–3 linkage to GalNAc or GlcNAc. Mutagenesis showed that in WbiP from *E. coli* O127 (83), the first Asp of the DxDD sequence was critical for activity while the second contributed but was not essential. In WbwC from *E. coli* O104 and O5, all three Asp residues were mutated and found to be important for activity. The first Asp (D91) is probably the catalytic base. The other Asp residues may support the nucleophilic property of the catalytic base (78).
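Site-directed mutagenesis of a motif like DxDD usually starts by listing the candidate substitutions. The sketch below names Asp-to-Ala substitutions for every Asp inside each DxDD match of a sequence. The choice of alanine as the replacement, the function name, and the toy sequence are all assumptions made for illustration; the cited studies may have used other substitutions, and the toy sequence is not WbwC.

```python
import re


def asp_to_ala_mutants(sequence, motif=r"D.DD"):
    """List substitutions such as 'D4A' for each Asp within every motif match."""
    names = []
    for m in re.finditer(f"(?=({motif}))", sequence):  # lookahead catches overlapping matches
        for offset, residue in enumerate(m.group(1)):
            if residue == "D":
                name = f"D{m.start() + offset + 1}A"  # 1-based residue numbering
                if name not in names:
                    names.append(name)
    return names


# Toy sequence with one DxDD occurrence (D4-G5-D6-D7).
print(asp_to_ala_mutants("MSADGDDK"))  # ['D4A', 'D6A', 'D7A']
```

If the variable position `x` happens to be Asp as well, it is listed too, which may or may not be wanted in a real design.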

While WbwC synthesizes the Galβ1–3 linkage attached to the first GalNAc residue at the reducing end of the O antigen repeating unit, several other GTs having a DxDD motif in the GT2 family were shown to synthesize the T antigen at a more internal position of the repeating unit. These GTs have a different specificity from that of WbwC and do not require the diphosphate in the acceptor. The T synthase activities of variants of CgtB from *Cj* mainly act on β-linked GalNAc acceptors. Variants of CgtB have distinct acceptor specificities (86) and synthesize lipooligosaccharides, which mimic mammalian glycolipids and glycoproteins.

#### **GlcNAc-TRANSFERASES FORM BACKBONE STRUCTURES**

Gal-transferases cooperate with five or more β3-GlcNAc-transferases (B3GNT) within the GT31 family to form the type 1 and 2 backbone structures of mammalian glycan chains (15, 88, 89) (**Table 3**). B3GNTs have significant sequence similarity with Gal-transferases. It is not known if these enzymes are physically associated, although their combined action would suggest this. A family of three β6-GlcNAc-transferases (IGnT, GCNT2) can then add 1–6 branches to the linear chains. The β3-GlcNAc-transferases, but not the β6-GlcNAc-transferases, require divalent metal ions for activity. No crystal structures are yet available for B3GNTs.

In the N-glycosylation pathways, GnT I to V (MGAT1 to 5) (12) are responsible for forming GlcNAc-based antennae that can be further extended through repeating linear or branched GlcNAcβ1–3Gal-disaccharides. MGAT1 is an inverting GT with a GT-A fold within the GT13 family. The crystal structure of rabbit GnT I with UDP-GlcNAc and Mn<sup>2+</sup> supports an ordered sequential mechanism. The DxD motif is present as EDD, with Glu211 being the likely catalytic base (19). MGAT2, 3, 4, and 5 are all inverting GTs and have been classified in the GT16, GT17, GT54, and GT18 families, respectively. Although the GT17 family also contains uncharacterized bacterial proteins, no bacterial equivalents of MGAT have been found.

In the O-glycosylation pathways, the basis of most extended chains is core 2. Core 2 β6-GlcNAc-transferase C2GnT1 (GCNT1) adds a branch to O-glycan core 1 to form the core 2 structure GlcNAcβ1–6(Galβ1–3)GalNAc-R (15). The enzyme has a GT-A fold and is classified in the GT14 family. The crystal structure of the catalytic domain of mouse C2GnT1 shows that the protein has four conserved intramolecular disulfide bonds (20, 21). Cys217, however, has to be reduced to support the activity, although it is not an essential residue (90). The human enzyme expressed in insect cells has two flexible N-glycans that protect the protein from degradation (91). C2GnT1 is an inverting GT that is active in the presence of EDTA and does not require Mn<sup>2+</sup>. The crystal structure suggests that the conserved, basic amino acids Arg378 and Lys401 stabilize the diphosphate group of UDP-GlcNAc and thus serve the function of Mn<sup>2+</sup>. The structure supports specificity studies of C2GnT1, showing an absolute requirement for the 4- and 6-hydroxyl groups of the Gal and GalNAc residues and the 2-acetamido group of GalNAc (77). Glu320 of the conserved SPDE sequence may be the catalytic base; it binds to the 4- and 6-oxygens of GalNAc and could thus deprotonate and activate the 6-hydroxyl to induce a nucleophilic attack on the C-1 of the GlcNAc moiety of UDP-GlcNAc (20, 21).

Bacteria do not appear to have C2GnT or GnT I equivalents, but they express type 1 or type 2 chains and β3-GlcNAc-transferases comparable to the mammalian enzymes in their activities. For example, a β3-GlcNAc-transferase from *Hp* is involved in the synthesis of lipooligosaccharides and GlcNAcβ1–3Gal- extensions that resemble mammalian epitopes (92). The β3-GlcNAc-transferase LgtA from *Nm* acts on lactose and has a relaxed donor specificity. It is most active with UDP-GlcNAc but can also utilize UDP-GalNAc (93). Both the mammalian and the bacterial β3-GlcNAc-transferases accept a wide variety of acceptor substrates but have low sequence identity (**Table 2**) (15).

#### **FUCOSYLTRANSFERASES**

Three different types of Fuc-transferases (FUT) are involved in glycoprotein biosynthesis. Peptide motifs have been identified that are specific for α2-Fuc-transferases, α3-Fuc-transferases, or α6-Fuc-transferases or are shared by α2- and α6-Fuc-transferases (62). The mammalian α3-Fuc-transferases (FUT3–7, 9–11) are inverting enzymes (**Table 3**) and have two motifs with similar spacing that are shared by eukaryotic and bacterial α3FUT. The α2-Fuc-transferases FUT1 and 2 have less than 30% sequence identity with their bacterial FUT counterparts, but have well-preserved α2-Fuc-transferase motifs. There are also similarities between α2- and α6-FUT (FUT8) (60) with three common motifs (I to III), shared among eukaryotic and bacterial enzymes (62). The different types of FUT may have evolved from a common ancestor by divergent evolution (63).

#### **FUCOSYLTRANSFERASES THAT SYNTHESIZE THE H ANTIGEN**

The blood group O (H antigen, Fucα1-2Gal-R) is found in virtually all human beings and in certain bacteria and is the precursor substrate structure to form blood groups A and B. The enzymes that synthesize the H antigen in human beings are inverting α2-Fuc-transferases 1 and 2 (FUT1 and FUT2) that are closely related in sequence to the GT6 family ABO transferases GTA and GTB, although FUT1 and 2 have been classified into a different (GT11) family. FUT1 has a broad acceptor specificity for Galβ-R while FUT2 prefers O-glycan core 1 (T antigen) as a substrate (15).

Similar enzymes (**Table 2**) have been identified in *Hp* as FutC (94), in *E. coli* O86 as WbwK (95), in *E. coli* O128 as WbsJ (64), and in *E. coli* O127 as WbiQ (96). WbwK and WbiQ have a distinct specificity for the T antigen (95, 96) and do not act on Galβ1–4 glycans. These FUT, therefore, have an activity resembling that of human FUT2 and have 12–17.5% sequence identity. HpFucT2 (FutC) adds Fuc preferably to Lewis x acceptors but also uses Lewis a and type 1 chains (94). In contrast, WbsJ prefers acceptors with terminal Galβ1-4Glc structures (64). WbsJ functions in the absence of a divalent metal ion and does not have a DxD motif. The first Arg residue of the HxRRxD motif, conserved in α2- and α6-Fuc-transferases, is especially critical for activity due to its positive charge. Domain swapping between WbwK and WbsJ showed that the C-terminal motifs function in determining acceptor specificity (95). All of the identified α2-Fuc-transferases have significant homology in GT family 11 (**Table 3**) with a predicted GT-B fold, but none has been crystallized.
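Short peptide motifs such as DxD and HxRRxD, where x stands for any residue, can be located in a protein sequence with a simple pattern search. The following is a minimal sketch in Python, not a method taken from the cited studies; the function name `find_motifs` and the toy sequence are hypothetical and serve only to illustrate the idea.

```python
import re

# Motifs from the text written as regular expressions ("x" = any residue).
MOTIFS = {
    "HxRRxD": re.compile(r"H.RR.D"),
    "DxD": re.compile(r"D.D"),
}

def find_motifs(sequence):
    """Return {motif name: [1-based start positions]} for a one-letter protein sequence."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well (e.g. DADAD has two DxD).
        lookahead = re.compile(f"(?={pattern.pattern})")
        hits[name] = [m.start() + 1 for m in lookahead.finditer(seq)]
    return hits

# Hypothetical toy sequence, not a real enzyme.
toy_sequence = "MKTAHVRRLDGSLLDADADK"
print(find_motifs(toy_sequence))
```

A hit from such a search only indicates that the pattern is present. Whether the residues are functionally important, as reported for the first Arg of the HxRRxD motif, still has to be established experimentally.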

#### **FUCOSYLTRANSFERASES INVOLVED IN THE SYNTHESIS OF LEWIS ANTIGENS**

Lewis type antigens play essential roles in cell adhesion in the immune system and during inflammation, and aberrant amounts are often found in cancer. A family of mammalian, inverting α3-Fuc-transferases (FUT3–7, 9–11) is involved in Lewis antigen synthesis by linking Fuc to GlcNAc (9, 15, 97, 98). The enzymes vary in their acceptor substrate specificities and cell type expression and are in the GT10 family with a GT-B fold. FUT3 is an exceptional enzyme that has a dual specificity and adds Fucα1–3 on type 2 chains to synthesize Lewis x and y, as well as Fucα1–4 to type 1 chains to synthesize Lewis a and b (**Table 1**). FUT5 also shows some α4-Fuc-transferase activity. Human FUT3 and 5 have Trp111, responsible for type I acceptor recognition and 1–4 linkage synthesis. FUT that do not have this Trp synthesize the 1–3 linkage (99).

The bacterial α3FUTs show weak homology to mammalian FUT in two small segments of the catalytic domains (α3FUT motifs). They have about 10% sequence identity and a common GT-B fold but no transmembrane domain (25, 62). Two amphipathic α-helices serve to anchor the enzymes in the membrane. The gastric pathogen *Hp* is a prime example of expressing human-like type 1 and type 2 chains that are fucosylated and include Lewis antigens, which may play a role in adhesion to gastric epithelial cells or in internalization. *Hp* have short O antigens (lipooligosaccharides) and the human glycan mimics help to mask the immunogenic determinants of *Hp*, thus evading immune surveillance and supporting persistent *Hp* infections. The different pH environments in the various regions of the stomach influence the expression of Lewis antigens, and likely the activities of GTs, leading to phase variations.

A number of bacteria have Fucα1–4 linkages but *Hp* is especially rich in Lewis a, b, x, and y structures and in α3/4-Fuc-transferase activities (100). All of the eukaryotic and most bacterial α3-Fuc-transferases are in the GT10 family. *Hp* has futA and futB genes encoding α3FUT, in addition to 1-3/4 FUT (FucTa). FucTa has the CNDAHYSALH sequence near the C terminus that controls type I chain recognition. It seems that in this α3/4 FUT, it is Tyr instead of Trp that determines the acceptor preference. Thus, the Y350A mutant synthesizes Lewis x since it has dramatically reduced α4 FUT activity (100).

The crystal structure of α3-Fuc-transferase from *Hp* shows that a Glu95 residue is positioned closely to the anomeric carbon of Fuc of the donor GDP-βFuc and could be a catalytic base (25) while Glu249 could stabilize the intermediate oxonium ion. Mutants in these Glu residues are virtually inactive. Interestingly, tandem repeats of 7 amino acids (DDLRINY) are found in this α3FUT. The 2–10 heptad repeats appear to connect the N terminus to 2 amphipathic helices at the C terminus and are thought to be involved in maintaining secondary structure and activity (101). The C terminal sequence appears to determine the stability and overall structure of the protein.

A different α3-Fuc-transferase HhFT2 from *Helicobacter hepaticus* (*Hh*) synthesizes the Lewis x as well as the sialyl-Lewis x antigen (102). This enzyme is a member of the GT11 family and has more homology to α2-Fuc-transferases such as WbsJ of GT11, but less to α3/4 FUT in GT family 10. It has 10.4% sequence identity with the human enzyme FUT4. HhFT2 has three conserved motifs, one at the N terminus, one central, and one near the C terminus (**Table 5**).

#### **SYNTHESIS OF THE Fucα1-6GlcNAc LINKAGE**

The α6-Fuc-transferases add Fuc in α1–6 linkage to the reducing end GlcNAc of the N-glycan core. The human enzyme (FUT8) requires the prior action of GnT I and cannot act when the chitobiose of the N-glycan core carries an α3Fuc residue, or if the internal Manβ residue carries the bisecting GlcNAc. FUT8 is classified in family GT23 with a GT-B fold. The crystal structure of human FUT8 shows three domains: an N-terminal coiled-coil domain, a catalytic domain that resembles GT-B folded GTs, and a C-terminal SH3 domain of unknown significance. The C-terminal part of the catalytic domain contains a Rossmann-like fold with three regions, conserved in α2-, α6-, and other Fuc-transferases. Both Arg365 and Arg366 are critical for binding to GDP-Fuc while Asp453 may be a critical catalytic base (26).

A bacterial α6-Fuc-transferase with similar activity in the GT23 family with a GT-B fold and only 8% sequence identity is NodZ from *Rhizobium* sp. *(Rsp)* (**Tables 2** and **4**) (103). The crystal structure of NodZ shows two domains of nearly equal size but with different shape, separated by a central cleft (27). There are three conserved sequence motifs near the C terminus that play a role in GDP-Fuc binding or catalysis.

#### **GLYCOSYLTRANSFERASES THAT SYNTHESIZE BLOOD GROUPS A AND B**

The two human ABO transferases that synthesize the antigenic blood group A and B determinants from the H antigen (α3-GalNAc-transferase GTA and α3-Gal-transferase GTB, respectively) are homologous retaining enzymes within the GT6 family with a GT-A fold (**Table 3**). It is astounding that the critical difference in donor specificities determining blood group A or B lies in a difference of only four amino acids. While GTB that transfers Gal has Gly176, Ser235, Met266, and Ala268, the GTA protein that transfers the slightly larger GalNAc has mostly smaller amino acids Arg176, Gly235, Leu266, and Gly268.

In GTA and GTB, two domains are separated by a catalytic cleft containing the DxD motif (Asp211–Asp213) (39). However, a highly conserved Glu303 is likely to be the active nucleophile. UDP binds in the nucleotide sugar-binding domain at the N terminus and the Mn<sup>2+</sup> ion coordinates the β-phosphate of UDP. The H antigen acceptor binds to the C terminus. A disordered and flexible internal loop adjacent to the active site (40) becomes ordered when the nucleotide (sugar) is bound. This leads to a conformational change in the protein (43). Two amino acids are in contact with donor or acceptor in GTA and GTB (39) but only one of them determines the binding of the nucleotide-bound sugar moiety, i.e., either Gal or GalNAc. Leu266 in GTA has contact with the acetamido group, which allows binding of UDP-GalNAc. In GTB, the larger Met in this position (Met266) cannot accommodate GalNAc and, therefore, Gal binds. Ala/Gly268 has contact with the 3- and 4-hydroxyl groups of Gal and thus does not contribute to the difference in donor specificity.

Human beings have antibodies against the absent blood group (A or B), and it is possible that this is induced by bacteria displaying this blood group. A number of bacterial GTA-like enzymes are also in the GT6 family and resemble the human counterpart with relatively high sequence identity of about 20%. The similarities between human and bacterial enzymes suggest a horizontal gene transfer between species and between bacteria. The bacterial enzymes have an NxN sequence instead of the eukaryotic DxD motif, and most of these enzymes do not have a metal ion requirement. Thus, bacterial enzymes may have altered catalytic mechanisms, although there is a strong conservation of mammalian-type of residues in the active sites (104).

*Helicobacter mustelae* (*Hm*) synthesize the blood group A determinant, which reacts with anti-human blood group A antibodies (105). The enzyme responsible, GTA-like α3-GalNAc-transferase (BgtA), has 20% sequence identity to its human counterpart GTA and can act on Fucα1-2Galβ1–3-R or Fucα1–2Galβ1–4-R substrates (**Table 2**). Thus, bacteria may have acquired the GTA gene from a mammalian host, enhancing their molecular mimicry, although it is not clear how the human blood group is giving them a selective advantage.
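Sequence identity values such as those quoted here depend on how the two sequences are aligned and on which positions are counted. The sketch below is an illustration under stated assumptions rather than the procedure used in the cited work: it takes two sequences that have already been aligned (gaps written as `-`) and counts identical residues over the columns where both sequences have a residue. The function name `percent_identity` and the example strings are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the denominator
        compared += 1
        if a.upper() == b.upper():
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared

# Hypothetical pre-aligned fragments, for illustration only.
print(percent_identity("ACD-EF", "ACEGEF"))
```

Other conventions divide by the full alignment length or by the length of the shorter sequence, so identity values from different sources are only comparable when the convention is known.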

The GTA-like enzyme BoGT6a from *Bacteroides ovatus* (*Bo*) (44) and GTB-like α3-Gal-transferase WbnI from *E. coli* O86 (95) that synthesize blood group B are related to the human enzymes with significant sequence homology in the GT6 family. Both donor and acceptor substrates are the same as those for GTA and GTB from human beings. The crystal structure of BoGT6a revealed a disordered region, which becomes ordered when acceptor Fuc-lactose is bound. This is accompanied by a large conformational change from the open to a closed state. Isothermal titration calorimetry (ITC) experiments showed that BoGT6a binds UDP-GalNAc with high affinity.

In non-primate mammals and New World monkeys, the linear blood group B occurs (Galα1–3Gal-), without the α2-linked Fuc residue. This structure is foreign to human beings who have anti-linear B antibodies, thus hindering xenotransplantation. The α3-Gal-transferase A3GALT that synthesizes the linear B determinant is a homolog of GTA and GTB and has been crystallized with UDP and Mn<sup>2+</sup> (41). The invariable Glu317 was identified as the catalytic base. The crystal structure in a complex with Galβ-pnp suggests that Trp residues are critical for binding the natural substrate Galβ1-4GlcNAc (106). The disordered C terminal region is critical for allowing the substrate to bind (42). Bacterial analogs of this α3-Gal-transferase remain to be characterized.

#### **SIALYLTRANSFERASES IN MAMMALS AND BACTERIA**

Sialyltransferases are ubiquitous in eukaryotes and are also expressed in certain bacteria (107). These enzymes synthesize sialic acid linkages commonly found on the non-reducing termini of N- and O-glycans, and gangliosides as sialylα2–3Galβ1- or sialylα2–6Gal(NAc)-linkages. In addition, sialylα2–8 linkages are found, especially in polysialic acids (PSA), which are extremely large, linear polymers, expressed in a cell type specific, restricted fashion in embryonic, neuronal, and other selected cell types (108). Sialic acids contribute to the acidity and hydration of a glycoprotein, the metal ion binding, and epitope exposure. While sialic acid can mask the underlying epitope, certain lectins of the immune system (e.g., siglecs) directly recognize sialic acid in specific linkages. Metastatic cancer cells and leukemia cells are often hypersialylated, which reduces further processing of glycans and causes glycan chains to be shorter (15). Sialylation significantly affects the adhesive properties of cells and has also been implicated in the functions of cell surface receptors (109).

Sialyltransferases are inverting GTs that usually lack a DxD motif and do not require divalent metal ions. Thus, general acids and bases identified in the crystal structures of α3- and α6-sialyltransferases that may interact closely with the substrates include His residues (28, 33). All known eukaryotic sialyltransferases have been classified as inverting GT29 with a GT-A fold, having at least four sialylmotifs (**Table 5**), a large (L), small (S), very small (VS), and motif III (65). The L motif contains the donor binding site while the S motif also binds the acceptor.

Bacterial sialyltransferases are inverting enzymes that bind CMP-sialic acid donor substrate and acceptors terminating in Gal or sialic acid but do not have these sialylmotifs and do not belong to the GT29 family. Instead, they are classified as GT42 (with a GT-A fold), GT52 or GT80 (with a GT-B fold), and GT38 (polysialyltransferases, PSTs). The 6-sialyltransferases (ST6GalNAc) acting on O-glycans do not appear to have a bacterial counterpart. Two highly conserved short motifs have been identified in bacterial PST and other bacterial sialyltransferases (GT52 and GT80), a C-terminally located HP sequence and a more N-terminally located D/E-D/E-G sequence (66). Certain bacterial sialyltransferases have multiple activities, including CMP-sialic acid hydrolase, trans-sialidase, and neuraminidase activities and are usually from the GT80 family. Thus, sialyltransferases can be promiscuous with respect to the linkages they form (or cleave) and the acceptor substrates they recognize. Bacterial sialyltransferases probably evolved separately from the eukaryotic enzymes, although their functions and mechanisms can be similar.

#### **ALPHA3-SIALYLTRANSFERASES**

In human beings, six α3-sialyltransferases (ST3GAL) form the sialylα2–3 linkage. The expression and activity of α3-sialyltransferase ST3GAL1 that synthesizes sialyl-T antigen are increased in breast cancer (110) and appear to promote survival of cancer cells in the blood (111). In keeping with its activity in adding a terminal structure, ST3GAL1 is localized to the medial and late Golgi compartments in human mammary cells (112). ST3GAL1 acts on glycopeptides with core 1 structure and also on Galβ1–3GalNAcα-R acceptors that have hydrophobic aglycone groups. In contrast, a bacterial equivalent, WbwA from *E. coli* O104, responsible for the rare occurrence of the sialyl-T antigen in *E. coli*, is in the GT52 family with a GT-B fold. WbwA, but not mammalian ST3GAL1, also has HP and D/E-D/E-G motifs (**Table 5**). The crystal structure of porcine ST3GAL1 with CMP and Galβ1–3GalNAc-acceptor substrate suggests that the essential His302 interacts with the phosphate of CMP-sialic acid. His319 is the catalytic base in motif III that is proposed to be positioned near carbon-2 of the sialic acid moiety (28). The conserved Tyr269 residue interacts with the 4-hydroxyl of GalNAc and thus determines the enzyme specificity for Galβ1–3GalNAc- over the Galβ1–3GlcNAc- acceptors.

An α3-sialyltransferase of the GT80 family from *Photobacterium phosphoreum* (*Pp*) has been crystallized with CMP (29). The acceptor-binding site has a wide access, which explains why a range of disaccharides with Galα and Galβ linkages can serve as substrates. CMP binds in a cleft between the two domains of the GT-B fold. The main chain nitrogen of His317 in the HP motif is close to the nitrogen-4 of cytidine and the side chain of His317 is near the oxygen of the phosphate. This suggests a critical role of this His residue in catalysis. Another α3-sialyltransferase with a GT-B fold in family GT52 from *Nm* was crystallized (31) with the donor analog CMP-3F-Neu5Ac. Asp258 could be a general base and His280 (within the HP motif) a general acid.

#### **ALPHA6-SIALYLTRANSFERASES**

Human α6-sialyltransferases add sialic acid to the Gal termini of N-acetyllactosamine chains of N-glycans. ST6GAL1 is highly expressed in colon cancer and metastatic cells (113) and also resides in the trans-Golgi (114). A homolog with 48% sequence identity (ST6GAL2) is mainly expressed in the brain and has the additional ability to synthesize sialylα2–6GalNAcβ1–4GlcNAc- structures (115). Human ST6GAL1 is a glycoprotein stabilized by three disulfide bonds (33). The catalytic residue, His370, deprotonates the 6-hydroxyl of Gal, generating an active nucleophile that attacks the carbon-2 of sialic acid. The reaction follows a random-order mechanism of substrate binding. Rat ST6GAL has three disulfide bonds and two N-glycans (32). Like many GTs, the enzyme has a disordered loop, and His367 is the catalytic base within the sialyl motif VS.

The bifunctional bacterial α6-sialyltransferase PM0188 from *Pasteurella multocida* (*Pm*) of GT family 80 has 14.6% sequence identity to human ST6GAL1. The crystal structure showed the GT-B fold and that Asp141, His311 (within the HP motif), Glu338, Ser355, and Ser356 were important for catalysis (37). The *Photobacterium* sp. *(Psp)* α6-sialyltransferase was also crystallized with CMP and lactose (35). The enzyme is in the GT80 family with a GT-B fold and has three domains, with the donor and acceptor bound between domains 2 and 3. Asp232 (within the D/E-D/E-G motif) is near the 6-hydroxyl of Gal while the nitrogen of His405 (within the HP motif) is close to the phosphate-oxygen. Thus, Asp232 could be a catalytic base that deprotonates the 6-hydroxyl of Gal, and His405 could be a catalytic acid that protonates the donor substrate.

#### **MULTIFUNCTIONAL SIALYLTRANSFERASES**

In bacteria, mimics of human sialylα2-3/6/8Galβ1- structures occur, e.g., in the lipooligosaccharides of Gram-negative bacteria such as *Cj* (116, 117). Cells of the human nervous system are rich in gangliosides as well as glycoproteins containing similar sialyl-linkages. Thus, after bacterial infections, cross reactivity of antibodies could cause the rare development of neurological disorders. Guillain–Barré syndrome is an example (118). *Cj* expresses a bifunctional α3/8-sialyltransferase CstII and an α3-sialyltransferase CstI, which are responsible for the molecular mimicry of *Cj* in their lipooligosaccharide structures. Both enzymes have a predicted GT-A fold within the GT42 family. The structure of CstI shows that His202 is the catalytic base (30). Similarly, His188 is likely the catalytic base in CstII that deprotonates the 3-hydroxyl of Gal, which then attacks carbon-2 of sialic acid of the donor (119). The flexible lid in the CstII protein becomes ordered in a closed form when CMP binds (36). The acceptors lactose (for α3-sialyltransferase activity) or sialyl-lactose (for α8-sialyltransferase activity) bind in a cleft and Arg129, Asn51, and Tyr81 contribute to the binding of the sialylated acceptor. The role of His188 as a catalytic base in CstII has also been confirmed by NMR studies (120). The intrinsic p*K*<sub>a</sub> values of His188 were measured in monomeric mutants by determining the pH-dependent chemical shifts of [<sup>13</sup>C]-labeled His188.

The monofunctional sialyltransferases function with mechanisms similar to those of the mammalian enzymes. Multifunctional enzymes, however, are primarily found in bacteria and include the α3-sialyltransferase PmST1 from *Pm*, which binds CMP-sialic acid as donor and lactose, Gal, GalNAc as well as sialic acid as acceptor. The crystal structure shows that binding of CMP-sialic acid donor substrate causes a change in conformation and opens the acceptor-binding site. The activities of PmST1 function optimally at different pH values. It has a GT-B fold within the GT80 family (**Table 3**). The crystal structure shows that Asp141 is the catalytic base (34) with His112 also being important for enzyme activity. Another multifunctional α3-sialyltransferase, PdST from *Pasteurella dagmatis* (*Pd*), in the GT80 family with a GT-B fold is also a CMP-sialic acid hydrolase. At low pH, it can act as a trans-sialidase and a sialidase (121).

#### **POLYSIALYLATION**

Important sialic acid structures are the PSA, found in human neuronal and other selected cell types (107). Only a selected number of proteins carry the PSA modification (122). For example, polysialylated neural cell adhesion molecule N-CAM is prominent in the developing nervous system but also occurs in leukocytes with roles in the regulation of cell adhesion. N-CAM becomes anti-adhesive when long polymers of α2–8-linked sialic acids are covalently attached to its N-glycans (108). The sialylα2–8 linkages of PSA are synthesized based on sialylα2–3/6Gal residues of N-glycans by developmentally regulated PSTs, which are highly expressed in the developing and embryonic brain (123). Neuropilin-2 (NRP2) is a glycoprotein containing multiple N-glycosylation sites, as well as O-glycans with sialylated core 1 and 2 structures. In cells lacking core 2, human PST (ST8SiaIV) was shown to assemble PSA on sialylated core 1 chains of neuropilin (124). The presence of these PSA polymers extends the half-life of proteins.

*E. coli* and *Nm* are examples of bacteria that carry sialylα2–8 polymeric PSA capsules, which help bacteria to resist phagocytosis. These PSA capsules mimic the eukaryotic chains, although they are linked to the membrane via a lipid anchor, and may have bacteria-specific modifications such as O-acetylation (125). The large, charged and hydrated polymeric enzyme product is assembled in the cytoplasmic compartment and then extruded through the membranes by ABC transporter and export systems (**Figure 3**). PSA confers a selective advantage to bacteria in the human nervous system and is associated with meningitis or other neurological conditions. Bacteria may also have PSA with α2–9 linkages or alternating α2–8 and α2–9 linkages.

In mammals, PSTs synthesize PSA by the addition of individual sialic acid residues in a processive fashion. Like the other sialyltransferases, mammalian PSTs are inverting enzymes of the GT29 family (**Table 3**). In addition to four sialyl motifs, ST8SIAII and ST8SIAIV (formerly STX and PST, respectively) have a unique, conserved, polybasic PST domain (PSTD motif) (**Table 5**) (67), which is absent from the other types of sialyltransferases. Basic residues in the PSTD motif are responsible for acceptor substrate recognition (126, 127).

In bacteria, the PSA capsule is synthesized by GT38 family PST that are inverting enzymes. In *E. coli*, PST has only 5.4% sequence identity with the human enzyme (128). The human PST equivalent from *Nm* has <10% sequence identity with human PST and has a requirement for Mg<sup>2+</sup> (129). Kinetic experiments of His and Pro mutants of the PST from *Nm* suggested that the HP motif contributes to CMP-sialic acid but not acceptor binding. The acceptors can be a glycolipid containing two sialic acid residues as a primer. Gal-terminating glycans of glycopeptides, including the T antigen linked to Ser, also served as acceptor substrates for the PST from *Nm*. Different PSTs synthesize either the α2–8 or α2–9 linked polymers of bacterial PSA capsules.

#### **METHODS TO STUDY GLYCOSYLTRANSFERASE PROTEIN STRUCTURES AND FUNCTIONS**

It is often difficult to produce sufficient pure enzyme in order to analyze protein structure by X-ray crystallography. In addition, the protein may not show exactly the same properties in a crystal as in solution and body fluids. To approximate the protein structure present in the natural environment, protein NMR studies have been helpful (130). Enzyme substrate or inhibitor interactions have been determined by biochemical kinetics studies but can also be studied by MS and Saturation Transfer Difference (STD) NMR (131, 132). Conformational dynamics of proteins, which help to explain molecular recognition, can be studied by molecular dynamics simulation and docking programs, which require knowledge of protein structure. Theoretical modeling has been undertaken to predict protein structure, substrate binding, and dynamic properties of GTs. Thus, the three-dimensional interactions between substrates and enzyme protein, cofactor binding sites, ligand flexibility, and movements can be estimated by computational methods (133, 134). Multivariate data analysis of the amino acid property patterns also helps to predict a protein fold (135).

New enzymes can be designed based on knowledge of protein structure and substrate binding. For example, the blood group B GTB Gal-transferase has been re-designed with a model Epimer Propensity Index (EPI) to transfer Glc instead of Gal (136). The orientation of the sugar donor in the folded enzyme is highly conserved. The R228K mutant of β4GalT1 has higher Glc-transferase activity due to the inability to effectively bind the axial 4-hydroxyl of Gal (23). Similarly, GTB modeling correctly predicted a higher Glc-transferase activity of GTB in the presence of the unnatural UDP-Glc donor upon increasing the sizes of Ser185 to Asn and Cys (136).

One approach to developing good GT inhibitors is to obtain qualitative and quantitative information on the substrate binding sites from NMR spectroscopy. STD NMR measures the signals of the unbound substrate, which are then compared to those of the bound substrate. Saturation transferred from the enzyme to specific sites of the bound substrate is seen as an attenuation of resonance signals. The difference spectra at different ligand concentrations allow identification of the bound substrate and determination of the binding affinity. In spin-lock filtering experiments, transverse relaxation of substrate signals is recorded, which is enhanced when the ligand is bound. Thus, signals are attenuated upon ligand binding. This process can be enhanced by using spin labels. The conformations and relative placements of bound GnT V substrates have been determined using transferred NOE and STD measurements (137).
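Estimating a binding affinity from signals recorded at several ligand concentrations can be illustrated with a simple titration fit. The sketch below assumes a one-site binding isotherm, signal = s_max × [L] / (K_D + [L]), which is a common simplification and not necessarily the model used in the cited studies. The function name `fit_one_site`, the search range for K_D, and the synthetic data are hypothetical.

```python
def fit_one_site(conc, signal, kd_min=1e-6, kd_max=1e-1, steps=2000):
    """Least-squares fit of signal = s_max * L / (K_D + L); returns (K_D, s_max).

    K_D is searched on a log-spaced grid (same units as conc); for each
    candidate K_D the best s_max follows from linear least squares.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        frac = [L / (kd + L) for L in conc]           # fraction of the plateau reached
        s_max = sum(f * y for f, y in zip(frac, signal)) / sum(f * f for f in frac)
        sse = sum((y - s_max * f) ** 2 for f, y in zip(frac, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, s_max)
    return best[1], best[2]

# Synthetic, noise-free titration generated with K_D = 1 mM and s_max = 10 (hypothetical).
ligand_molar = [1e-4, 3e-4, 1e-3, 3e-3, 1e-2]
observed = [10.0 * L / (1e-3 + L) for L in ligand_molar]
kd_fit, s_max_fit = fit_one_site(ligand_molar, observed)
print(f"K_D ~ {kd_fit:.2e} M, plateau ~ {s_max_fit:.2f}")
```

With real STD data, the apparent K_D can depend on experimental settings such as saturation time, so values obtained this way are best treated as estimates and compared with an independent method.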

In addition to these NMR experiments, surface plasmon resonance (Biacore) experiments can be used to determine the binding affinities (132). The ligand binding ability of GTs with and without donor or a number of potential inhibitors can be assessed with biotinylated substrates bound to a streptavidin-coated chip. The binding of donor and acceptor analogs to the blood group B enzyme GTB has also been analyzed by ITC combined with STD NMR titration (138). The results show a binding stoichiometry of one donor and one acceptor molecule per protein, together with the binding affinities (dissociation constants) and the enthalpy and entropy changes upon binding. The study also emphasizes that there can be differences in the binding of substrate analogs that should be considered.
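The thermodynamic quantities obtained from such experiments are linked by the standard relations ΔG° = RT ln(K_D / 1 M) and ΔG° = ΔH − TΔS. The following sketch shows the arithmetic; the function name `binding_thermodynamics` and the input values are hypothetical and are not data from the cited study.

```python
import math

R_GAS = 8.314462618  # gas constant in J mol^-1 K^-1

def binding_thermodynamics(kd_molar, delta_h_kj, temperature_k=298.15):
    """Return (delta_G, -T*delta_S) in kJ/mol and delta_S in J/(mol K).

    Uses delta_G = R*T*ln(K_D) with a 1 M standard state and
    delta_G = delta_H - T*delta_S.
    """
    delta_g_kj = R_GAS * temperature_k * math.log(kd_molar) / 1000.0
    minus_t_delta_s_kj = delta_g_kj - delta_h_kj
    delta_s_j = -minus_t_delta_s_kj * 1000.0 / temperature_k
    return delta_g_kj, minus_t_delta_s_kj, delta_s_j

# Hypothetical inputs: K_D = 1 uM and a measured binding enthalpy of -50 kJ/mol.
dg, minus_tds, ds = binding_thermodynamics(1e-6, -50.0)
print(f"dG = {dg:.2f} kJ/mol, -TdS = {minus_tds:.2f} kJ/mol, dS = {ds:.1f} J/(mol K)")
```

In this hypothetical case the favorable enthalpy is partly offset by an unfavorable entropy term, a pattern that ITC can resolve but an affinity measurement alone cannot.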

Electrospray-mass spectrometry (ES-MS) has been used to determine the thermodynamics and affinities of substrates (139). Association constants were measured from the relative abundance of ions in the ES-MS spectra for GTA and GTB in aqueous solution with native donor and acceptor substrates as well as substrate analogs, products, and metal ion cofactor. To confirm the retaining mechanism of the enzymes, a mutant of blood group A (GTA) GalNAc-transferase in solution containing UDP-GalNAc and Mn<sup>2+</sup> was studied, as well as the similar mutant of GTB. The catalytic Glu303 was replaced with Cys. After trypsin digestion, a covalent intermediate could be trapped. Thus, tandem MS using collision-induced dissociation confirmed that Cys303 in both GTA and GTB enzymes was responsible for forming the glycosyl-enzyme intermediate. The formation of trisaccharide product can also be proven by MS (68). Thus, the double displacement mechanism of these retaining GTs was supported by MS.
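How an association constant can follow from relative ion abundances may be sketched for a simple 1:1 equilibrium P + L ⇌ PL. The relation used below, K_a = R / ([L]<sub>0</sub> − [P]<sub>0</sub> R / (1 + R)) with R the abundance ratio of bound to free protein ions, follows from mass balance. It assumes that bound and free protein ions are detected with equal response, which is an assumption of this sketch and not a detail given in the text. The function name `association_constant` and all numbers are hypothetical.

```python
def association_constant(ab_complex, ab_free, protein_total, ligand_total):
    """K_a (per molar) for P + L <=> PL from ion abundances and total concentrations (molar).

    Assumes the abundance ratio of PL to P ions equals the solution ratio [PL]/[P].
    """
    ratio = ab_complex / ab_free
    bound = ratio / (1.0 + ratio) * protein_total   # [PL] from protein mass balance
    free_ligand = ligand_total - bound              # [L] from ligand mass balance
    if free_ligand <= 0:
        raise ValueError("inputs imply no free ligand; check abundances and concentrations")
    return ratio / free_ligand

# Hypothetical example: equal abundance of bound and free protein ions,
# 10 uM protein and 50 uM ligand in total.
ka = association_constant(1.0, 1.0, 10e-6, 50e-6)
print(f"K_a ~ {ka:.3g} per molar")
```

For enzymes that bind donor, acceptor, and metal ion at the same time, the equilibria are coupled and this single-site expression would need to be extended.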

#### **GLYCOSYLTRANSFERASE INHIBITORS**

Detailed knowledge of GT structure and function is the basis for the development of effective GT inhibitors that may re-direct glycan biosynthesis. GT substrates usually bind through a small number of essential hydrogen bonds or hydrophobic interactions. Thus, not all of the substituents of the donor sugar, the base of the nucleotide, or the sugars of the acceptor are critically involved in binding (140). Therefore, modifications of these residues can result in competitive inhibitors that still bind in the catalytic site but do not support catalysis. Inhibitors can be ligands that bind well to the enzyme but cannot be released easily, or interfere with catalysis either as donor substrate analogs, acceptor substrate analogs, transition state analogs, compounds that prevent conformational changes necessary for catalysis, or compounds that distort protein conformation (74, 77, 103). Small structural modifications of compounds can have a dramatic effect on their inhibitory activity. Inhibitors have been designed that interfere with conformational changes and flexible loop movements that are essential events for substrate binding and catalysis (141). Sugar donor analogs for ABO transferases (GTA and GTB), carrying a substituent at the uracil moiety, block the stacking of amino acids required for the proper folding of the internal loop. A heterocyclic compound inhibited GTB by interfering with its ability to bind metal ion, as well as donor and acceptor substrates. The compound does not appear to be structurally related to the acceptor but partly binds in the acceptor-binding site (142). A combination of crystal structure, Biacore, STD NMR, and docking experiments suggested that the inhibitor competes with binding of Fuc of the acceptor and the Mn<sup>2+</sup> ion. Non-competitive inhibitors have also been described that potentially alter the structure of the enzyme leading to inactive proteins (77, 78). Modified nucleotide sugars are often recognized by GTs leading to transfer of unnatural sugars (143). Fluorescent groups modifying the base of the sugar-nucleotide can be useful as indicators of binding (144).

#### **CHEMOENZYMATIC SYNTHESIS OF SHARED EPITOPES**

The preparation of bacterial GTs that lack a transmembrane domain is relatively inexpensive and they can be used in chemoenzymatic synthesis not only of bacterial glycoconjugates but also for mammalian oligosaccharides and glycoproteins with specific epitopes (**Table 1**). Due to the variety of bacterial enzymes with different specificities, a diverse range of glycan structures can be synthesized and processed for use as vaccines, to prepare antibodies for passive immunity, and for further studies of glycan functions. Examples include the synthesis of the complete blood group Forssman antigen GalNAcα3GalNAcβ3Galα4Galβ4Glc-p-nitrophenyl by β3-GalNAc-transferase and α4-Gal-transferase from *Cj*, followed by α3-GalNAc-transferase from *Pm* (81, 140). The assembly of the entire blood group B determinant was achieved using GTs from *E. coli* O86 (145). Bacterial enzymes α4-Gal-transferase LgtC, β3-Gal(NAc)-transferase LgtD, and α2-Fuc-transferase WbsJ (146) efficiently synthesized the tumor-associated epitope Globo-H-hexasaccharide (Fucα2-Galβ1–3GalNAcβ1–3Galα1–4Galβ1–4Glcβ-benzyl).

Knowing the amino acids and mechanisms involved in substrate binding and catalysis, bacterial enzymes or new mutant enzymes can be engineered for use in the production of new natural or unnatural glycan structures, or for more efficient synthesis of known structures (147). For example, new donor specificity can be engineered by mutating only one or two critical amino acids that convert the function of the enzyme (140, 148).

Phosphorylases can also reversibly form glycosidic linkages (149). They can have similarity to either inverting or retaining glycohydrolases or to GT-B-folded retaining GT (CAZy). Sugar-1-P can be used as a substrate for phosphorylases to produce a wealth of different glycans with regio-selectivity. An interesting chemoenzymatic synthesis of the T antigen and the Galβ1–3GlcNAc linkage has been achieved using a combination of galactokinase (GalK) from *E. coli*, which synthesizes Gal-1-P, and a Galβ1–3 HexNAc phosphorylase from *Bifidobacterium infantis* that has promiscuous acceptor specificity (150). These enzymes could add Gal in the presence of ATP to synthetic GalNAc- and GlcNAc- substrates with various aglycone groups. The phosphorylase has multiple DxD motifs and an Asp-rich domain at the C terminus. The T antigen was also synthesized from sucrose and GlcNAc using phosphorylase from *Bifidobacterium longum* (151) together with sucrose phosphorylase, UDP-Glc-hexose-1-P uridyltransferase and UDP-Glc 4-epimerase.

Glycosidases catalyze reversible reactions and have also been used to form sugar linkages using high concentrations of reactants. Glycosidases act with an inverting or retaining mechanism, utilizing a catalytically active nucleophile in the active site such as Asp or Glu. Mutant glycohydrolases that lack the catalytic base as well as hydrolase activity can be used to efficiently transfer a sugar to an acceptor substrate and synthesize specific linkages (glycosynthases). For example, large N-glycan-type oligosaccharides can be transferred to the GlcNAc residue linked to Asn of glycoproteins by a mutant endo-glucosaminidase that normally cleaves the chitobiose and releases the N-glycan. Thus, engineered glycosidases can be stereo-selective and very useful in achieving high yields of complex glycans (152).

#### **CONCLUDING REMARKS**

It is astounding that proteins can be so different in amino acid sequence and yet become similarly specific and effective catalysts for the transfer of sugars to proteins, lipids, and sugars, while having only two major protein folds. There are many possibilities for the binding of donor and acceptor substrates, but the transfer only involves inversion or retention of the anomeric configuration of the sugar. Mechanisms common to eukaryotes and bacteria include a change in protein conformation upon nucleotide sugar binding that facilitates acceptor binding, and the action of a base (Glu, Asp, His) that deprotonates the hydroxyl to be glycosylated, which then becomes a nucleophile that results in cleavage of the sugar from the donor substrate. Bacterial and mammalian enzymes are often comparable in their action, so that mammalian epitopes can easily be synthesized with bacterial enzymes, for example, to produce vaccines for cancer. However, the bacterial world is much more complex, variable, and challenging. Knowledge of bacterial GTs can lead to the synthesis of glycans, enzyme substrates, and antigens to study their biological functions and role in disease, and to synthesize vaccines against specific pathogenic strains of bacteria. Bacteria may have evolved to express the GTs that make human-like structures, giving them a selective advantage. Most of the time, bacteria and human beings are symbiotic or compatible, but once in a while, the mimicry of bacteria can lead to infection and serious consequences. We speculate that bacterial and mammalian enzymes with similar functions may have evolved in parallel, or may be derived from an ancient common ancestor. There may have been exchange of genes between these species (horizontal gene transfer), or GTs may be derived by convergent evolution. The many similar genes of a particular family may have been derived by gene duplication from an ancestral gene.

Further detailed understanding of GT structures and mechanisms helps to visualize how amino acids cooperate in forming a catalytic site, to predict their functions, and to gain valuable insight into the syntheses of complex glycans in mammals and in our close neighbors, bacteria. Both the bacterial world and human beings can benefit from this relationship. In addition, inhibitors of bacterial GTs may help to eliminate virulence factors, and this is an urgently needed goal in light of growing antibiotic resistance.

#### **ACKNOWLEDGMENTS**

The work by the author has been supported by the Canadian Institutes of Health Research and the Natural Sciences and Engineering Research Council of Canada.

#### **REFERENCES**


O-glycans. *Biochim Biophys Acta* (2013) **1830**:4274–81. doi:10.1016/j.bbagen.2013.04.001


correlates with the degree of enterocytic differentiation. *Biochem Biophys Res Commun* (1992) **184**:1405–10. doi:10.1016/S0006-291X(05)80039-7


NMR experiments. *J Am Chem Soc* (2006) **128**:13529–38. doi:10.1021/ja063550r


**Conflict of Interest Statement:** The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 16 July 2014; accepted: 23 September 2014; published online: 20 October 2014.*

*Citation: Brockhausen I (2014) Crossroads between bacterial and mammalian glycosyltransferases. Front. Immunol. 5:492. doi: 10.3389/fimmu.2014.00492*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Brockhausen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

#### **Mark B. Richardson and Spencer J. Williams\***

School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Parkville, VIC, Australia

#### **Edited by:**

Mark Agostino, Curtin University, Australia

#### **Reviewed by:**

Katsumi Maenaka, Hokkaido University, Japan; Norberto Walter Zwirner, Institute of Biology and Experimental Medicine, Argentina

#### **\*Correspondence:**

Spencer J. Williams, School of Chemistry and Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, 30 Flemington Road, Parkville, VIC 3010, Australia e-mail: sjwill@unimelb.edu.au

#### **INTRODUCTION**

Macrophage C-type lectin (MCL; Clec4d, ClecSf8) and macrophage inducible C-type lectin (Mincle; Clec4e, ClecSf9) are transmembrane germline-encoded pattern recognition receptors (PRRs) that form part of the innate immune system. These C-type lectin receptors (CLRs) recognize damage-associated molecular patterns (DAMPs), enabling immune sensing of damaged self, and pathogen-associated molecular patterns (PAMPs) from a growing list of bacteria and fungi (**Figure 1**) (1, 2). The PAMPs notably include mycobacterial trehalose dimycolate (TDM, cord factor), and these CLRs appear to play significant roles in the immune response to certain bacterial and fungal infections. In the case of Mincle, recognition of PAMPs is mediated through the carbohydrate binding part of the carbohydrate recognition domain (CRD) in the extracellular region of the CLR, whereas recognition of DAMPs occurs through a distinct region of the CRD (3). For both CLRs, signal transduction occurs through the immunoreceptor tyrosine-based activation motif (ITAM)-containing adaptor molecule Fc receptor γ-chain (FcRγ). Ligand binding to Mincle leads to phosphorylation of the ITAM of FcRγ and recruitment of spleen tyrosine kinase (Syk) (3). Syk recruitment by FcRγ leads to nuclear factor kappa-light-chain-enhancer of activated B cells (NF-κB) activation through Card9–Bcl10–MALT1 signalosomes, pivotal regulators that link innate and adaptive immune responses (4). DNA transcription leads to the production of cytokines and chemokines that shape the development of naïve T cells into effector T helper (TH) cell TH1 and TH17 subtypes (5). There is growing interest in the immunological roles of MCL and Mincle, both for the development of defined synthetic adjuvants for TH1/TH17 vaccination and because of the involvement of these CLRs in the recognition of bacterial and fungal pathogens and in the response to dysregulated cell death.

Macrophage C-type lectin (MCL) and macrophage inducible C-type lectin (Mincle) comprise part of an extensive repertoire of pattern recognition receptors with the ability to sense damage-associated and pathogen-associated molecular patterns. In this review, we cover the discovery and molecular characterization of these C-type lectin receptors, and highlight recent advances in the understanding of their roles in orchestrating the response of the immune system to bacterial and fungal infection, and damaged self. We also discuss the identification and structure–activity relationships of activating ligands, particularly trehalose dimycolate and related mycobacterial glycolipids, which have significant potential in the development of TH1/TH17 vaccination strategies.

**Keywords: cord factor, C-type lectin receptors, glycolipids, PAMP, DAMP, T cell**

#### **MCL AND MINCLE: C-TYPE LECTIN RECEPTORS**

The genetic and molecular analysis of MCL and Mincle preceded the understanding of their function. MCL was originally cloned as a mouse macrophage-restricted C-type lectin (6). Mincle was cloned as a transcriptional target of the nuclear factor NF-IL-6, which binds the interleukin-1 (IL-1) responsive element of the IL-6 gene (7). Mincle RNA was induced upon exposure to inflammatory stimuli including lipopolysaccharide (LPS), tumor necrosis factor-α (TNFα), IL-6, and interferon-γ (IFNγ). Cloning/identification of the human and rat genes followed shortly thereafter (8, 9). According to the HUGO Gene Nomenclature Committee, the gene encoding MCL is CLEC4D (formerly CLECSF8) and that encoding Mincle is CLEC4E (formerly CLECSF9). The genes for both CLRs are located on mouse chromosome 6 and human chromosome 12, clustered with closely related CLRs of the Dectin-2 family (Dectin-2, DCIR, DCAR, and BDCA-2) within the telomeric region of the natural killer complex (10). Sequence analyses predict type 2 transmembrane proteins with an N-terminal cytoplasmic tail, a transmembrane region, and a membrane proximal stalk, followed by a single C-terminal C-type lectin domain (11). The transmembrane region of Mincle contains a positively charged arginine that mediates interaction with ITAM-bearing adaptors; a direct interaction of Mincle with FcRγ was demonstrated by immunoprecipitation of Mincle using an anti-FcRγ antibody in transfected human embryonic kidney cells, with the precipitation abolished upon substitution of arginine by leucine (**Figure 2A**) (3). This interaction is critical for the ability of Mincle to signal, as Mincle-dependent signaling is lost in the leucine mutant (3).

Macrophage C-type lectin is an FcRγ coupled receptor (12) that signals through Syk (13); however, the nature of the adaptor protein interaction is unclear (**Figure 2A**). MCL does not contain an arginine residue within the transmembrane domain that is typically required for association with FcRγ and so a direct interaction with this adaptor seems unlikely. Immunoprecipitation of MCL in a rat myeloid cell line led to co-precipitation with FcRγ (14) but this could not be replicated in a transfected non-myeloid cell line, indicating that additional cellular components are required for this interaction (15). Co-transfection of non-myeloid cells with MCL, Mincle, and FcRγ followed by immunoprecipitation gave evidence for a covalently linked heterodimer of MCL and Mincle that associated with FcRγ, suggesting that Mincle acts as a bridge for the interaction between MCL and FcRγ (**Figure 2B**) (15, 16). Phagocytic internalization of anti-Mincle and anti-MCL coated beads provided strong evidence for a functional complex of MCL–Mincle–FcRγ on the cell surface. In primary rat peritoneal macrophages, expression of Mincle and MCL is tightly coupled, suggesting that Mincle–MCL heterodimers are formed. It was proposed that in Mincle–MCL heterodimers, TDM binding is mediated through carbohydrate binding to Mincle and lipid binding to MCL, a model that explains the ability of MCL to recognize soluble TDM but not TDM associated with mycobacterial cells.

**Figure 2** | … pair with the signaling adaptor molecule Fc receptor γ-chain (FcRγ). With Mincle this association is driven by the presence of a positively charged arginine in the transmembrane region whereas the corresponding residue is not present in MCL. Upon binding to DAMPs or fungal and bacterial PAMPs, phosphorylation of the immunoreceptor tyrosine-based activation motifs (ITAMs) of FcRγ recruits spleen tyrosine kinase (Syk) … Transcription results in the formation of immunoregulatory cytokines and chemokines. **(B)** Heterodimers of MCL and Mincle have the capacity to sense trehalose dimycolate (TDM). In these complexes, Mincle recognizes the carbohydrate headgroup while MCL recognizes the lipid tail, and Mincle acts as a bridge to enable formation of a functional MCL–Mincle–FcRγ complex.

In humans and rodents, Mincle is expressed on monocytes, macrophages, neutrophils, dendritic cells (DCs), and some subsets of B cells (3, 7, 17–19), and has been additionally detected on T cells and concanavalin A blasts in rats (9). Mincle is inducible on mouse macrophages upon activation of TLR4 (18), and is also inducible on human B cells through activation of TLR9 (17). In guinea pig, Mincle is expressed in the spleen, lymph nodes, and peritoneal macrophages and is up-regulated upon stimulation by zymosan and TDM (20).

Macrophage C-type lectin expression was first characterized as macrophage-restricted in mice (6). Human MCL is expressed on macrophages (synovial, peritoneal, and blood monocyte-derived) and Langerhans cells (8), and on neutrophils, monocytes, and immature and mature DCs (13). In rats, expression of MCL has been detected on macrophages, neutrophils, DCs, B cells, and T cells (9, 14). Upon treatment with various stimuli including PAM3CSK4, TNF-α, and IFNγ, MCL expression is inducible on monocytes and neutrophils (8, 13), but considerable variation is found between studies. For example, using monocytes from human donors, Arce et al. qualitatively demonstrated that IL-6, TNF-α, IL-10, and IFNγ caused up-regulation of MCL expression, whereas LPS caused down-regulation (8). In contrast, using similar protocols and cell lines, Graham et al. obtained different results (13). LPS caused up-regulation; and TNF-α, IFNγ, IL-6, and IL-10 did not change expression levels, although a less than twofold change in expression levels was observed. Expression of MCL on rat macrophages is inducible with IFNγ and pro-inflammatory stimuli from Gram-negative bacteria (14).

Carbohydrate recognition by lectins is frequently associated with conserved glutamic acid–proline–asparagine (EPN) or glutamine–proline–aspartate (QPD) motifs within the CRDs of C-type lectins (see **Box 1**) (21). Mincle contains an EPN motif within its CRD, leading to the suggestion of mannose/fucose/*N*-acetylglucosamine/glucose specificity, which is commonly seen for such motifs. While MCL contains a conserved Ca<sup>2+</sup> binding site, it lacks a canonical EPN or QPD motif, instead possessing an EPX motif in rat (X = K) and human (X = D), whereas mouse possesses an ESN sequence. Carbohydrate recognition by Mincle and MCL was assessed for six hexoses: mannose, fucose, *N*-acetylglucosamine, glucose, galactose, and *N*-acetylgalactosamine (22). Relative affinities for Mincle were mannose ~ fucose > glucose > *N*-acetylglucosamine > galactose ~ *N*-acetylgalactosamine. These relative affinities were roughly paralleled by MCL: mannose ~ fucose > glucose ~ *N*-acetylglucosamine > galactose ~ *N*-acetylgalactosamine, although this CLR bound only very weakly to all hexoses examined. A screen of a 326-member carbohydrate microarray for ligands for a soluble MCL–Fc fusion protein failed to identify any carbohydrate ligands for this receptor (13), suggesting that MCL may not in fact be a carbohydrate binding lectin.
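The motif-based reasoning above can be illustrated with a small sequence scan. The following Python sketch is only an illustration under stated assumptions: the function name `find_crd_motifs`, the labels, and the toy peptide are hypothetical and are not taken from the cited studies, and the presence of a motif does not by itself establish sugar binding (see **Box 1**). It reports canonical EPN and QPD tripeptides and flags any other EPX tripeptide as non-canonical; the mouse MCL ESN sequence would not be reported by this pattern.

```python
import re

# Canonical tripeptide motifs named in the text; labels are predictions only.
CANONICAL = {
    "EPN": "canonical EPN (mannose-type prediction)",
    "QPD": "canonical QPD (galactose-type prediction)",
}

def find_crd_motifs(sequence):
    """Return a list of (1-based position, tripeptide, label) for EPN, QPD,
    and other EPX tripeptides found in a one-letter amino acid sequence."""
    seq = sequence.upper()
    hits = []
    # A lookahead is used so that overlapping candidates are not skipped.
    for match in re.finditer(r"(?=(EPN|QPD|EP[A-Z]))", seq):
        tripeptide = match.group(1)
        label = CANONICAL.get(tripeptide, "non-canonical EPX")
        hits.append((match.start() + 1, tripeptide, label))
    return hits

if __name__ == "__main__":
    # Hypothetical toy peptide, not a real CRD sequence.
    toy = "AAEPNCCQPDGGEPKTT"
    for position, tripeptide, label in find_crd_motifs(toy):
        print(position, tripeptide, label)
```

In practice such a scan would be restricted to the annotated CRD and treated as a first-pass filter, since the text notes that MCL binds all tested hexoses only very weakly despite its conserved Ca<sup>2+</sup> site.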

Functional studies of CLRs have been greatly accelerated through the development of reporter cell lines. In the case of Mincle, a useful reporter is a T cell hybridoma that expresses Mincle and FcRγ, as well as green fluorescent protein (GFP) under the control of the transcription factor nuclear factor of activated T cells (NFAT) to detect ITAM-mediated signals (3). An MCL reporter strain in which LacZ β-galactosidase is expressed under the control of NFAT in a T cell hybrid has been reported (15).

#### **MINCLE IN THE DAMAGED CELL RESPONSE**

Dead cells activate Mincle-expressing cells. The factor causing activation was identified to be spliceosome-associated protein 130 (SAP130) (3). SAP130 binds to Mincle in a Ca<sup>2+</sup>-independent manner and mutation of the EPN motif of Mincle did not affect binding. Conversely, a mutant in the region recognized by a blocking antibody, VEGQW, was not activated by dead cells, suggesting that SAP130 binds to the CRD but at a site distinct from that of carbohydrate binding. SAP130 derived from either living or dead cells has a similar ability to activate through Mincle. As SAP130 is located in the nucleus in live cells, Mincle recognition of dead cells must occur after translocation to the external milieu. SAP130 therefore acts as a DAMP, providing an alarm signal for dysregulated cell death. Gamma-ray irradiation causes cell death in the thymus and induces neutrophil infiltration. Mincle RNA is upregulated after irradiation, and use of a Mincle-blocking antibody suppressed neutrophil infiltration into the thymus after irradiation. The observation that macrophage inflammatory protein 2 (MIP-2), a specific signal produced by thymic macrophages, was inhibited by the Mincle-blocking antibody suggests that Mincle activation by ligands induces the production of inflammatory cytokines and/or chemokines.

#### **Box 1 C-type lectins: structure and classification**.

C-type lectins are a subclass of lectins that are distinguished by a requirement for Ca<sup>2+</sup> for binding (21). Crystallographic studies reveal that C-type lectins contain a compact globular structure that comprises the carbohydrate recognition domain (CRD). The CRD contains conserved amino acid residue motifs, which allow the prediction of new C-type lectins on the basis of sequence data. Somewhat confusingly, it has since been found that many predicted C-type lectin CRDs do not bind either carbohydrates or Ca<sup>2+</sup>. C-type lectins can be soluble or transmembrane proteins. An early classification of C-type lectins was introduced on the basis of (1) the identification of mannose- or galactose-specific binding motifs [glutamic acid–proline–asparagine (EPN) or glutamine–proline–aspartate (QPD) motifs, respectively] in well-characterized mannose- and galactose-specific lectins known at the time, and (2) conversion of the ligand specificity from mannose to galactose by mutagenesis (23). Although based on compelling studies at the time, this further adds to the confusion as some predicted mannose or galactose binding C-type lectins do not in fact bind to these carbohydrates. C-type lectins are classified into 17 groups; Mincle and MCL fall into Group II of the CTL family and are typically grouped in with a subset of CTLRs termed the Dectin-2 cluster that contains other PRRs including Dectin-2, DCIR, DCAR, and BDCA-2 (10).

Ischemia results in dysregulated cell death and the exit of cellular components, suggesting the possible involvement of Mincle-mediated inflammation. Mincle knockout mice show a better outcome after stroke (24). Cerebral ischemia results in induction of Mincle expression in immune, neuronal, and endothelial cells, which paralleled increases in SAP130 expression (25). Levels of phosphorylated-Syk (p-Syk) were raised following ischemia, suggesting that Mincle activation leads to increased levels of p-Syk. Application of the Syk inhibitor piceatannol reduced infarct volume and swelling, suggesting that signaling through the Syk–Card9–Bcl10–MALT1 axis is an important factor in the response to ischemia.

#### **IDENTIFICATION OF MYCOBACTERIAL GLYCOLIPIDS AS MCL AND MINCLE ANTIGENS AND THEIR ROLE IN MYCOBACTERIAL INFECTION**

Research in the areas of mycobacterial immunogenicity and C-type lectins began to merge with the report that TDM activates macrophages and DCs via Syk–Card9–Bcl10–MALT1 signaling to produce innate activation that was distinct from that produced by Toll-like receptor ligands (4). Activation of antigen presenting cells was independent of Dectin-1 but required the ITAM-bearing signaling adaptor FcRγ, leading to the suggestion that a range of C-type lectins, including Mincle, were possible TDM receptors (4). Independently, Mincle had been shown to be associated with FcRγ (3). Two contemporaneous reports identified Mincle as a TDM receptor. Using Mincle-expressing reporter cells, Yamasaki and co-workers showed that while heat-killed mycobacteria could activate Mincle-expressing cells, delipidated cells could not, and the activity was located within the lipid extract (26, 27). Sub-fractionation of this extract identified TDM as the activating species. Independently, Lang and co-workers mined a gene expression array database for genes expressed and up-regulated in bone marrow macrophages treated with the TDM analog trehalose dibehenate (TDB) (28). The candidate Mincle was expressed as an Fc fusion protein and was shown by ELISA to bind to TDB and TDM in a dose-dependent manner. TDM is an important glycolipid produced by all mycobacteria that possesses potent immunostimulatory properties, in particular the ability to cause granulomas. Using Mincle<sup>−/−</sup> mice, Mincle was shown to be essential for the granulomatous response to TDM, providing compelling evidence that Mincle is a major TDM receptor (26). While initial reports demonstrated that TDM activates mouse Mincle, a recent report using a reporter cell has shown that TDM also activates human Mincle (29).

The discovery that MCL is a receptor for TDM arose from the initial observation that MCL is expressed on neutrophils and monocytes and triggers cellular activation through Syk (13). The observation that resting macrophages barely express Mincle, yet addition of TDM drives Mincle expression, suggested the existence of another TDM receptor (12). Innate immune responses were impaired in MCL-deficient mice, including the TDM-induced acquired immune response, experimental autoimmune encephalomyelitis (EAE) (12). Further, MCL was shown to be required to drive Mincle expression in a Clec4e–GFP fusion reporter mouse (12). Daws reported that MCL reporter strains were not responsive to intact mycobacteria and argued that this observation is consistent with MCL recognizing the lipid portion of TDM, which is exposed in the purified antigen but embedded in the bacterial cell wall in intact bacteria (15). Recent studies of guinea pig homologs of Mincle (gpMincle) and MCL (gpMCL) revealed that only gpMincle binds TDM and that gpMincle is constitutively expressed (20). gpMCL lacks the hydrophobic region proposed to be involved in TDM binding, although it does bind FcRγ. This work suggested that of these two receptors, only gpMincle is involved in guinea pig immune responses against mycobacteria and that the functions of MCL and Mincle in recognition of mycobacteria are not conserved between humans and guinea pig.

The identification of Mincle and MCL as TDM receptors and the finding that Mincle is required for the TDM-induced granulomatous response have resolved longstanding questions in the field. However, the connection of Mincle and MCL to the antimycobacterial response in the context of infection has been less clearly answered (30). An early role for the cytosolic adaptor caspase recruitment domain family, member 9 (Card9) in pulmonary tuberculosis was demonstrated when Card9<sup>−/−</sup> mice were shown to succumb rapidly to aerosol infection by *Mycobacterium tuberculosis* H37Rv (31). Comparison of bone marrow derived macrophages of wild-type and Card9<sup>−/−</sup> mice infected with *M. tuberculosis* revealed a significant reduction in the homozygous mutant of the pro-inflammatory cytokines TNF, IL-1β, and IL-6, and reduced IL-12 and CCL5, compared to wild-type. Heat-killed *Mycobacterium bovis*, *Mycobacterium smegmatis*, and *M. tuberculosis* H37Rv all activated an NFAT-driven GFP reporter strain that expresses Mincle and FcRγ, with activation ablated upon mutation of the Mincle CRD EPN motif to QPD (26). The effect of Mincle deletion upon *M. tuberculosis* infection has been studied in Mincle<sup>−/−</sup> mice. While in the absence of Mincle, macrophages did not produce the reactive nitrogen species NO<sub>2</sub><sup>−</sup> upon TDM stimulation, Mincle-deficient macrophages were responsive to *M. tuberculosis* infection, and costimulation with IFNγ resulted in normal levels of granulocyte colony-stimulating factor (G-CSF), TNF, and NO<sub>2</sub><sup>−</sup>. Although Mincle<sup>−/−</sup> mice did not form granulomas upon stimulation with TDM, the same mice formed granulomas that were indistinguishable from those of wild-type mice upon infection with *M. tuberculosis*. Furthermore, while TDM resulted in induction of TH1 and TH17 cells, Mincle<sup>−/−</sup> mice infected with *M. tuberculosis* developed normal TH1 and TH17 immune responses, indicating that in the context of *M. tuberculosis* infection, Mincle is not required for instructing maturation of naïve T cells. This observation is consistent with studies with *Fonsecaea pedrosoi* that suggest that while specific signals are transmitted by Mincle during vaccination with purified TDM, PRR co-stimulation during infection leads to a cocktail of cytokines and chemokines that shape T cell development (32) (see **Box 2**).

MCL-deficient mice have defective immune responses to mycobacterial infection. Induction of TNF and MIP-2 RNA was impaired in bone marrow derived macrophages from MCL<sup>−/−</sup> mice (12). Further, MCL<sup>−/−</sup> mice gave impaired IFNγ levels in Mantoux test responses when stimulated with purified protein derivative compared to wild-type.

#### **Box 2 The role of MCL and Mincle in T cell development**.

Upon recognition of their cognate antigen, naïve CD4<sup>+</sup> T cells (TH0 cells) can differentiate into novel effector CD4<sup>+</sup> T cell lineages that can regulate or assist active immune responses (**Figure 3**). The fate of the transforming TH0 cell is determined by the pattern of cytokines it receives at the moment of antigen recognition by the T cell receptor. Activation of Mincle and MCL induces expression of interleukin-1 (IL-1) and IL-6 in antigen presenting cells (4, 19, 26, 33, 34), which flood the microenvironment of the transforming TH0 cell and shape the development of TH1 and TH17 phenotypes in both humans and mice (12, 28). The TH17 lineage is defined by the production of IL-17, which induces and mediates pro-inflammatory responses leading to the recruitment of monocytes and neutrophils, which can clear infections (35). TH1 cells characteristically produce IFNγ, an activator of natural killer cells (which provide direct killing of pathogens) and macrophages (leading to phagocytosis).

Glycerol monomycolate (GroMM) has been identified as a Mincle-activating lipid (29). Using NFAT–GFP reporter strains, it was shown that human Mincle-expressing cells could be activated by GroMM, although less potently than by TDM. While mouse Mincle-expressing cells were not activated by GroMM, transgenic mice expressing human Mincle gained the ability to recognize GroMM, and injection of GroMM liposomes into the skin of these mice resulted in infiltration of macrophages and eosinophils. Primary human monocyte-derived macrophages produced TNFα in response to GroMM, and this response could be blocked by an anti-human Mincle antibody, demonstrating that this glycolipid is a ligand for human, but not mouse, Mincle.

#### **STRUCTURE–ACTIVITY RELATIONSHIPS OF TREHALOSE AND GLYCEROL-BASED ANTIGENS FOR MINCLE**

Mincle is potently activated by TDM (**Figure 4**). Treatment of TDM with trehalase, which apparently cleaves this molecule into glucose monomycolate (GMM), abolished Mincle binding (26). This result suggests that Mincle specifically recognizes the two glucose residues within TDM and is particularly interesting as GMM is a glycolipid produced upon infection by *M. tuberculosis* and is itself a potent antigen when presented to T cells by CD1b (36). Cells are activated to similar degrees by the TDM analog trehalose dibehenate, which suggests that complex mycolate structures are not necessary for Mincle activation. Measurement of the direct interaction of Mincle with TDM analogs is limited by their poor solubility and so accurate data have only been obtained with shorter acyl groups. For bovine Mincle, affinities of trehalose diesters generally increase with increasing chain length (37). Trehalose monoesters are also effective ligands for human and bovine Mincle with affinities increasing up to 6-*O*-lauryltrehalose (C<sub>12</sub>) and 6-*O*-octanoyltrehalose, respectively (37, 38). Trehalose itself is a weak ligand for Mincle, and methyl α-glucoside, representing a monomer of trehalose, possesses 36-fold weaker affinity for Mincle (37). The C<sub>22</sub> and C<sub>26</sub> trehalose monoesters were able to activate mouse macrophages in a Mincle-dependent manner (34).
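To give a feel for the size of the 36-fold affinity difference between trehalose and methyl α-glucoside, the fold change can be converted into an approximate difference in binding free energy using ΔΔG = RT ln(fold change). The Python sketch below is a back-of-the-envelope illustration only. It assumes that the reported fold difference can be treated as a ratio of equilibrium dissociation (or inhibition) constants and that the temperature is 298.15 K; neither assumption comes from the cited work, and the function name `ddg_from_fold_change` is hypothetical.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 1.987204e-3

def ddg_from_fold_change(fold_change, temperature_k=298.15):
    """Approximate binding free energy difference (kcal/mol) that corresponds
    to a fold difference in affinity, via ddG = R * T * ln(fold_change).
    Assumes the fold change is a ratio of equilibrium constants."""
    if fold_change <= 0:
        raise ValueError("fold_change must be positive")
    return R_KCAL_PER_MOL_K * temperature_k * math.log(fold_change)

if __name__ == "__main__":
    # 36-fold weaker affinity, as quoted in the text for methyl alpha-glucoside.
    print(round(ddg_from_fold_change(36.0), 2), "kcal/mol")
```

Under these assumptions the second glucose residue of trehalose contributes roughly 2 kcal/mol to binding, which is in the range of a small number of additional hydrogen bonds or contacts rather than a second full binding site.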

Glycerol monomycolate is an antigenic ligand for human, but not mouse Mincle (**Figure 4**) (29). Plate-bound GroMM is less potent than TDM in an NFAT–GFP reporter cell assay. Glycerol monobehenate (GMB) possesses similar activity to GroMM toward human Mincle and gave only marginal responses for mouse Mincle reporter cells.

#### **STRUCTURES OF CRDs OF MCL AND MINCLE**

The classification of Mincle and MCL as CLRs leads to the prediction that the CRD will conform to the typical domains seen for this class of proteins. Two recent papers have independently reported structures of Mincle and MCL C-type lectin domains. Maenaka and co-workers reported the structures of human Mincle and MCL, recombinantly expressed in *Escherichia coli* (38). While wild-type human Mincle failed to crystallize, the I99K mutant (which matches the equivalent residue in human MCL) provided diffracting crystals. Drickamer and co-workers have reported the structure of bovine Mincle, expressed using a similar approach and crystallized without the need for mutagenesis (37).

The human MCL C-type lectin domain comprises a globular fold containing two alpha helices around a beta-strand core, with a single Ca<sup>2+</sup> bound (**Figure 5A**) (38). The human and bovine Mincle C-type lectin domains reveal largely identical folds, and both contain two Ca<sup>2+</sup> ions, with the bovine structure containing an additional Na<sup>+</sup> ion (**Figure 5B**) (38). Structural insight into carbohydrate recognition by Mincle was obtained through a complex of bovine Mincle with trehalose (**Figure 5C**) (37). In this complex, only one pyranose ring of trehalose binds to calcium through O3 and O4 of glucose (**Figure 5D**). Insight into a possible molecular basis for signal transduction was obtained through a conformational change in a loop between Asn-170 and Asp-177 near the conserved Ca<sup>2+</sup> upon binding of trehalose. Examination of the surface of Mincle near the trehalose binding site identified a hydrophobic channel lined with lipophilic amino acids that is adjacent to the 6-OH of the Ca<sup>2+</sup>-bound glucose ring, which was proposed to comprise a lipid binding channel (**Figure 5E**).

#### **MCL AND MINCLE IN FUNGAL DISEASE**

Effective protection of the host from pathogenic fungi and clearance of infection require a coordinated immune response from both TH1 and TH17 cells that restrict fungal cell growth and drive phagocytic clearance (2, 5). The Syk–Card9–Bcl10–MALT1 signaling axis leads to NF-κB activation, which drives the expression of multiple cytokines including IL-1, IL-6, and IL-23 that facilitate TH17 cell differentiation, and IL-12, which is essential for TH1 differentiation (see **Box 2**).

Yamasaki and co-workers screened 50 species of pathogenic fungi as Mincle activators using an NFAT–GFP reporter cell line (39). Of these, only *Malassezia* species, including *Malassezia pachydermatis*, *Malassezia dermatis*, *Malassezia japonica*, *Malassezia nana*, *Malassezia slooffiae*, *Malassezia sympodialis*, and *Malassezia furfur*, induced strong NFAT–GFP activation. Interestingly, particularly given results from other laboratories, *Candida albicans* did not result in NFAT–GFP expression. In normal skin, *Malassezia* spp. are commensals; however, in atopic eczema and psoriasis, these fungi can elicit inflammatory responses in skin lesions and can cause diseases such as tinea versicolor, atopic dermatitis, and lethal sepsis. Reporter cells expressing Mincle mutated in the CRD were not activated by *Malassezia*, indicating that the likely ligand is a carbohydrate, and as discussed below, subsequent work led to the isolation of glycolipids including gentiobiosyl diacylglycerides and a complex mannosyloxylstearyl mannitol glycolipid (40). Co-culture of *M. pachydermatis* with wild-type murine bone marrow macrophages resulted in production of MIP-2, TNFα, keratinocyte-derived chemokine (CXCL1), and IL-10 cytokines, which were reduced but not ablated in Mincle<sup>−/−</sup> macrophages. When *M. pachydermatis* was injected into the peritoneal cavity, wild-type mice produced IL-6 and TNF, which were reduced in the Mincle<sup>−/−</sup> mutant, suggesting that Mincle is important in immune responses to these fungi.

Chromoblastomycosis is a chronic skin infection caused by fungi including *F. pedrosoi*. *F. pedrosoi* is recognized by Mincle, resulting in production of high levels of IL-10 and low levels of TNF and IFNγ (32). Under these conditions, *F. pedrosoi* establishes a chronic infection that cannot be cleared. Co-stimulation of macrophages and DCs infected with *F. pedrosoi* with the TLR2 agonist PAM3CSK4, the TLR7 agonist Imiquimod, or the TLR4 agonist LPS gave robust levels of TNF, suggesting that this fungus fails to cause co-stimulation of PRRs. Indeed, Mincle<sup>−/−</sup> bone marrow DCs lacked the ability to be co-stimulated by LPS and *F. pedrosoi*, and *F. pedrosoi*-infected mice effectively cleared disseminated infections with this fungus when treated with a single dose of LPS. As well, topical application of Imiquimod significantly decreased fungal burden in skin after subcutaneous infection. To gain further insight into the mechanisms of inflammatory responses, Wevers and co-workers studied stimulation of human DCs with the related fungus *Fonsecaea monophora*, also a causative agent of chromoblastomycosis (41, 42). *F. monophora* triggered the maturation of DCs and production of IL-6, IL-1β, and IL-23, but not IL-12p70. In this case, *F. monophora* is recognized by both Dectin-1 and Mincle, and while activation through Dectin-1 resulted in cytokine production, Mincle activation resulted in suppression of IL-12p70 through suppression of IL-12p35 transcription. This in turn was achieved by Mincle signals targeting the proteasomal degradation of nuclear IRF1 via the ubiquitin E3 ligase Mdm2. Mdm2 activation and translocation were a result of Mincle triggering of the signaling kinase PKB, which was not triggered by Dectin-1 or TLR4. Overall, this process leads to impairment of IL-12 and is important for shaping the development of CD4<sup>+</sup> T cells toward TH17 cells. This appears to be a general phenomenon for Mincle activation by other fungi including *Fonsecaea compacta* and *Cladophialophora carrionii*, and indeed by the TDM mimic TDB. Naïve CD4<sup>+</sup> T cells co-cultured with DCs primed with the TLR4 agonist LPS differentiate to TH1 polarized cells; however, in the presence of *F. pedrosoi* and other fungi, these are skewed to a TH2 response.

Mincle appears to play crucial roles in *C. albicans* infection. Mincle-deficient mice are more susceptible to systemic candidiasis, and production of TNF-α by macrophages was reduced *in vivo* and *in vitro* (43). The soluble CRD of human and mouse Mincle was found to bind whole *C. albicans* cells (44). However, in a subsequent screening study, different *C. albicans* strains did not activate a Mincle-expressing NFAT–GFP reporter cell line, leading to the suggestion that strain-specific features are required for Mincle activation (39). The identity of the *C. albicans* ligands is not known.

A direct role for MCL in fungal disease is yet to be demonstrated. No significant differences were observed in the ability of MCL-deficient mice to resist infection with *C. albicans* (13).

#### **IDENTIFICATION OF MINCLE ANTIGENS FROM FUNGI**

Gentiobiosyl diacylglycerides from *M. pachydermatis* have been identified as mouse Mincle ligands (**Figure 6**) (40). These glycolipids bear anteiso fatty acyl groups (anteiso-C15, C17, C19, or C20) at the *sn*-1 and *sn*-2 positions of the glycerol. Four lipoforms were identified with the following substituent permutations (*sn*-1/*sn*-2): anteiso-C19/anteiso-C15, anteiso-C17/anteiso-C15, anteiso-C20/anteiso-C15, and anteiso-C19/anteiso-C17. All four lipoforms activated Mincle to similar degrees, but were significantly less potent than TDM. While these gentiobiosyl diacylglycerides bear significant resemblance to the glycolipid membrane anchor of lipoteichoic acid, the latter did not activate Mincle.

*Malassezia pachydermatis* produces a complex mannosyloxylstearyl mannitol glycolipid that is a potent activator of mouse Mincle, with a potency approaching that of TDM (**Figure 7**) (40). This glycolipid comprises β-linked mannose residues attached to 10-hydroxystearic acid and esterified onto an L-mannitol core.

#### **CONCLUSION**

T cell-mediated immune responses induced by PRRs are important in recognition of damaged self and pathogens. The identification of DAMPs and PAMPs that activate Mincle has elevated the importance of this CLR, and the involvement of MCL as an auxiliary CLR that forms heterodimers with Mincle provides the potential for an expansion in the range of ligands recognized by Mincle. MCL and Mincle appear to be an important part of a larger repertoire of PRRs, and recent studies of the interplay of Mincle with Dectin-1 and TLRs demonstrate the potential for modulation of immune signals through PRR co-stimulation. The availability of three-dimensional X-ray structures of the CRDs of these receptors has unveiled a molecular picture of ligand recognition that may inform development of novel TH1/TH17 vaccines. However, at this stage it is not clear how Mincle can recognize a diverse range of structurally dissimilar antigens, or which structural changes lead to signal transduction and transcription. Further, it is likely that additional pathogens that activate MCL and Mincle remain to be identified, as well as new ligands from existing pathogens such as *Klebsiella pneumoniae* (45) and *F. pedrosoi* that activate through Mincle. It is noteworthy that the majority of studies with MCL and Mincle have been performed on mice, and it will require additional work to translate these findings to the human system.

The species differences noted in the expression and functional properties of MCL and Mincle, and the selective activation of human, but not mouse Mincle by GroMM, suggest that the development of humanized animal models and cell lines will be essential for understanding the role of these CLRs in human health and disease.

#### **ACKNOWLEDGMENTS**

We thank the Australian Research Council for financial support. Spencer J. Williams is an ARC funded Future Fellow.

#### **REFERENCES**


mycobacterial cord factor. *Immunity* (2013) **38**:1050–62. doi:10.1016/j.immuni.2013.03.010


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 29 April 2014; paper pending published: 19 May 2014; accepted: 03 June 2014; published online: 23 June 2014.*

*Citation: Richardson MB and Williams SJ (2014) MCL and Mincle: C-type lectin receptors that sense damaged self and pathogen-associated molecular patterns. Front. Immunol. 5:288. doi: 10.3389/fimmu.2014.00288*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Richardson and Williams. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## Computational and experimental prediction of human C-type lectin receptor druggability

#### **Jonas Aretz<sup>1,2</sup>, Eike-Christian Wamhoff<sup>1,2</sup>, Jonas Hanske<sup>1,2</sup>, Dario Heymann<sup>1</sup> and Christoph Rademacher<sup>1,2</sup>\***

<sup>1</sup> Department of Biomolecular Systems, Max Planck Institute of Colloids and Interfaces, Potsdam, Germany <sup>2</sup> Department of Biology, Chemistry, and Pharmacy, Freie Universität Berlin, Berlin, Germany

#### **Edited by:**

Elizabeth Yuriev, Monash University, Australia

#### **Reviewed by:**

Anthony G. Coyne, University of Cambridge, UK Stephen James Headey, Monash University, Australia

#### **\*Correspondence:**

Christoph Rademacher, Department of Biomolecular Systems, Max Planck Institute of Colloids and Interfaces, Am Mühlenberg 1, Potsdam 14424, Germany e-mail: christoph.rademacher@mpikg.mpg.de

Mammalian C-type lectin receptors (CTLRs) are involved in many aspects of immune cell regulation such as pathogen recognition, clearance of apoptotic bodies, and lymphocyte homing. Despite a great interest in modulating CTLR recognition of carbohydrates, the number of specific molecular probes is limited. To this end, we predicted the druggability of a panel of 22 CTLRs using DoGSiteScorer. The computed druggability scores of most structures were low, characterizing this family as either challenging or even undruggable. To further explore these findings, we employed a fluorine-based nuclear magnetic resonance screening of fragment mixtures against DC-SIGN, a receptor of pharmacological interest. To our surprise, we found many fragment hits associated with the carbohydrate recognition site (hit rate = 13.5%). A surface plasmon resonance-based follow-up assay confirmed 18 of these fragments (47%) and equilibrium dissociation constants were determined. Encouraged by these findings, we expanded our experimental druggability prediction to Langerin and MCL and found medium to high hit rates as well, being 15.7 and 10.0%, respectively. Our results highlight limitations of current *in silico* approaches to druggability assessment, in particular with regard to carbohydrate-binding proteins. In sum, our data indicate that small molecule ligands for a larger panel of CTLRs can be developed.

**Keywords: C-type lectin receptors, druggability, inhibitor, DC-SIGN, langerin, MCL, fragment screening, NMR screening**

#### **INTRODUCTION**

Glycans are present in a large diversity on cell surfaces and are essential in many aspects of life such as embryonic development, cell–cell communication, and regulation of the immune system (1). In particular, our understanding of the role of glycans in immunobiology has grown significantly during the last decades. Three major families of secreted or membrane-bound lectins recognize carbohydrates. Complementary to other receptors of the innate and adaptive immune system, Galectins, Siglecs, and C-type lectins shape the response to incoming signals (2, 3). Among many other processes, they are involved in pathogen recognition and killing, antigen processing, and tumor progression (2, 4, 5).

Mammalian C-type lectin receptors (CTLRs) represent a large family of lectins, which is subdivided into 17 groups based on their phylogenetic relationships and domain structure (6). CTLRs are present in a variety of tissues and the glycan specificity of receptors present on cells of the innate immune system has been studied extensively. For example, they function as homing receptors on leukocytes as well as pattern recognition receptors (2, 3, 7). A particularly well-studied pattern recognition receptor is the dendritic cell-specific intercellular adhesion molecule-3-grabbing non-integrin (DC-SIGN) (8, 9). This CTLR is expressed on dendritic cells and macrophages and is involved in the recognition of a large array of pathogens such as *Mycobacterium tuberculosis*, *Leishmania*, HCV, Ebola, and HIV (3, 10–15). It was demonstrated that DC-SIGN promotes HIV *trans*-infection of T cells, and the receptor has since drawn attention as a therapeutic target in anti-viral therapy (10, 16, 17).

Aside from interference with pathogen recognition, leukocyte homing has been a target for small molecule inhibition of CTLR function. To this end, Selectins, a group of three CTLRs, have been in the focus as anti-inflammatory drug targets since the mid-90s (18). Only recently, the glycomimetic GMI-1070 has entered clinical trials for the treatment of sickle cell anemia (19). Likewise, agonistic CTLR ligands hold promise to serve as adjuvants for immune stimulation (20). However, despite increasing interest in CTLRs as pharmacological targets, there is only a limited set of small molecule agonists or antagonists available (17). Partially, this can be attributed to the limited success of previous attempts to find lead structures from classical drug discovery campaigns.

All CTLRs share a C-type lectin domain (CTLD) that has a conserved fold with a characteristic double-loop stabilized by two disulfide bridges (7, 21). This domain is often referred to as the carbohydrate recognition domain (CRD) for those CTLRs involved in glycan binding. Additional domains are frequently present; in particular, heptad repeats and collagen-like neck domains promote oligomerization, resulting in high-avidity glycan binding. In transmembrane CTLRs, the CRD and neck domain are together referred to as the extracellular domain (ECD). Canonical carbohydrate recognition is mediated by a calcium ion and, although there are four Ca<sup>2+</sup> binding sites, only the second site (Ca<sup>2+</sup>-2) is described to be involved in coordinating glycans (21). While Ca<sup>2+</sup>-4 has not been associated with carbohydrate binding, positive cooperative effects are observed between the other sites (22, 23). Not all potential Ca<sup>2+</sup> sites are occupied in every CTLD, which reflects the fine-tuned physiological role of this interaction. For endocytic CTLRs, the pH sensitivity of the heptad-repeat neck formation and Ca<sup>2+</sup> coordination as well as active Ca<sup>2+</sup> export from the endosome are major contributors to endosomal ligand release (23, 24). Some CTLRs bind carbohydrates in a Ca<sup>2+</sup>-independent, non-canonical binding site, with Dectin-1 being the prime example (25). All CRDs share a carbohydrate recognition site that is largely flat and hydrophilic. This is a consequence of glycans being highly hydrophilic themselves (17, 26). Hence, binders are also often hydrophilic and do not satisfy the requirements for orally available drugs (27).

Whether a protein is a suitable candidate for drug development is of major concern during the drug discovery process. Considering the expenses involved in the development of a pharmacologically active small molecule, target selection has to be done carefully (28). The modulation of a suitable drug target with a rule-of-five-compliant molecule should result in a therapeutic effect (29). The term druggability, however, refers to the ability of a protein to bind a drug-like ligand with high affinity and specificity (29–31). Furthermore, this interaction has to result in a modulation of the protein function. Importantly, high druggability does not imply that the protein is a good drug target, because the latter definition also requires a therapeutic effect induced by small molecule binding (32, 33). Methods to assess the druggability of a target protein have become good predictors prior to starting a drug discovery campaign, as low scores are indicators for a high failure rate during later stages of the project (30, 33).
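As a brief illustration of what rule-of-five compliance means in practice, the following minimal sketch counts violations of the four commonly cited Lipinski criteria (molecular weight ≤ 500 Da, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The `Molecule` class, the tolerance of one violation, and both example compounds with their descriptor values are hypothetical and serve only to show why hydrophilic, glycan-like binders tend to fall outside this space.

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Hypothetical container for precomputed descriptors (not from the study)."""
    name: str
    mol_weight: float       # molecular weight in Da
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(m: Molecule) -> int:
    """Count how many of the four Lipinski criteria are violated."""
    checks = [
        m.mol_weight > 500,
        m.log_p > 5,
        m.h_bond_donors > 5,
        m.h_bond_acceptors > 10,
    ]
    return sum(checks)


def is_rule_of_five_compliant(m: Molecule, max_violations: int = 1) -> bool:
    """Assumption: at most one violation is tolerated, a common convention."""
    return rule_of_five_violations(m) <= max_violations


# Illustrative, made-up descriptor values for two compound classes.
fragment_like = Molecule("fragment-like", 210.0, 1.8, 1, 3)
glycan_like = Molecule("glycan-like", 666.0, -6.0, 14, 20)

if __name__ == "__main__":
    for mol in (fragment_like, glycan_like):
        print(mol.name, rule_of_five_violations(mol), is_rule_of_five_compliant(mol))
```

The glycan-like example fails on size and on both hydrogen-bonding counts, but not on lipophilicity.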

The availability of structural information enables computational assessment of druggability. Limited resources are required and many computational tools have been developed to deduce druggability scores from crystallographic information (34, 35). In a two-step process, pockets on the protein surface are first identified and then scored (28, 32, 34, 36). Large sets of proteins can be analyzed and predictions have been found to correlate well with experimental data (31, 34, 37, 38). To the best of our knowledge, there are only two studies on the computational druggability assessments of glycan-binding proteins, both reporting low scores (39, 40).

Experimental assessment of target druggability can be pursued even in the absence of structural information. For this, screening of drug-like molecules in a high-throughput screening format can be performed. Previous reports on micromolar inhibitors of DC-SIGN resulting from a screening campaign highlight the success of this approach (41, 42). Alternatively, a diverse library of fragments of drug-like molecules is screened against the target. The molecular weight of these fragments ranges between 150 and 300 Da. Estimates propose that 1000 fragments can cover a similar chemical space as 10 trillion drug-sized molecules would (33). This in turn allows applying smaller libraries to test the druggability of a candidate protein (31, 33). The low complexity of fragments increases their likelihood of binding a receptor and consequently hit rates of 5–15% are regularly observed for druggable targets (31, 37, 43).

Small molecule fragments have low affinities with dissociation constants in the upper micro- to lower millimolar range. Hence, sensitive biophysical techniques are necessary to monitor this interaction and nuclear magnetic resonance (NMR) spectroscopy has established itself as one of the major techniques used for fragment screening (31, 33, 37–39, 44, 45). In particular, hit rates from NMR-based screenings have proven to be reliable measures of druggability (31, 37, 44). In ligand-observed NMR, mixtures of fragments are screened against a target and changes in NMR observables such as chemical shift, line width, and signal intensity upon binding allow hit identification. Notably, deconvolution of the fragment mixtures is not necessary. The use of fluorine atoms in drug-like fragments has proven to be instrumental (38, 46). As fluorine is rare in biological samples, <sup>19</sup>F NMR spectra of fragment cocktails are not perturbed by background resonances. Moreover, the fluorine spin is highly susceptible to changes in its chemical environment and allows sensitive identification of hits.

To predict the druggability of human CTLRs, we compiled a set of 22 crystal structures and analyzed it by applying computational methods. We then chose DC-SIGN and conducted experimental fragment screening to compare these findings. Low druggability scores derived *in silico* did not match the moderate to high fragment hit rates during experimental evaluation. Hence, we expanded our screening by two additional CTLRs, namely Langerin and MCL, and discovered similarly high experimental druggability estimates. Taken together, our results highlight the limitations of *in silico* druggability prediction for CTLRs, while our fragment screening presents promising grounds for inhibitor design against this family.

#### **MATERIALS AND METHODS**

#### **STRUCTURE-BASED MULTIPLE SEQUENCE ALIGNMENT AND CONSENSUS STRUCTURE**

The scope of structural data on human CTLRs was assessed using the protein family (Pfam) database (accession code: PF00059) (47). Natural killer (NK) cell lectin-like receptors were treated as a closely related, yet physiologically distinct subfamily according to the Pfam annotation and were not included in the analysis. Furthermore, CTLRs crystallized as domain-swapped dimers, namely blood dendritic cell antigen 2 (BDCA-2) and mannose receptor (MR), were omitted (48, 49). Murine Dectin-1 was included in the selection as it has an unusual Ca<sup>2+</sup>-independent carbohydrate-binding mode and no structural information on the human ortholog is available (25). All structures considered for analysis are listed in **Table 1**. If available, a structure in complex with a carbohydrate ligand was selected. Prior to the calculations, all structures were trimmed down to the respective CRD as inferred from the Pfam domain definition. A structure-based multiple sequence alignment was performed in Molecular Operating Environment (MOE) (50). Pairwise root mean square deviation (RMSD) values were determined for all pairs of Cα atoms unless a gap was found in one of the compared sequences. Next, a phylogenetic analysis based on the pairwise sequence similarities was conducted in R (51, 52). Hierarchical clustering was performed based on the Manhattan metric and via the complete linkage criterion. To complement the phylogenetic analysis, MOE was used to predict a consensus structure of all CRDs. During model construction, up to 20 gaps and RMSD values of Cα up to 10 Å were allowed for a single position in the multiple sequence alignment.
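The clustering step can be sketched in a few lines. The example below is a minimal pure-Python illustration of agglomerative clustering with the Manhattan metric and complete linkage; it is not the R workflow used in the study. It assumes that each receptor is represented by its row of pairwise similarity values, and the receptor names and numbers are hypothetical.

```python
from itertools import combinations


def manhattan(u, v):
    """Manhattan (city-block) distance between two equally long vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def complete_linkage(profiles):
    """Agglomerative clustering with the complete linkage criterion.

    profiles maps a name to its feature vector (here: a row of pairwise
    similarities). Returns the merges as (members_a, members_b, height).
    """
    clusters = [frozenset([name]) for name in profiles]
    merges = []

    def linkage(a, b):
        # Complete linkage: distance between the two most distant members.
        return max(manhattan(profiles[x], profiles[y]) for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((sorted(a), sorted(b), linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges


# Hypothetical similarity rows for four receptors A-D (illustration only).
toy_profiles = {
    "A": [1.0, 0.8, 0.3, 0.2],
    "B": [0.8, 1.0, 0.3, 0.3],
    "C": [0.3, 0.3, 1.0, 0.7],
    "D": [0.2, 0.3, 0.7, 1.0],
}
toy_merges = complete_linkage(toy_profiles)

if __name__ == "__main__":
    for left, right, height in toy_merges:
        print(left, right, round(height, 3))
```

In this toy example A and B merge first, then C and D, and the two pairs are joined last at the largest cross-pair distance.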

#### **BINDING SITE PREDICTION AND IN SILICO DRUGGABILITY ASSESSMENT**

Initially, CTLR structures were superposed in MOE. For superposition and the subsequent druggability assessment, physiologically relevant oligomerization states were assumed (**Table 1**). The EGF domains of Selectin structures were removed. The resulting files served as input data for binding site prediction with DoGSite (72). The predicted binding sites were mapped on the structure and classified into four categories following the reported nomenclature of secondary structure elements and Ca<sup>2+</sup> binding sites (21): (i) Ca<sup>2+</sup>-2-binding sites, (ii) Ca<sup>2+</sup>-associated binding sites in the long loop, (iii) Ca<sup>2+</sup>-independent carbohydrate-binding sites, and (iv) other binding sites. A binding site was assigned to category (i) if the Ca<sup>2+</sup>-2 ion was part of the predicted binding site. For category (ii), the criteria were less restrictive and all binding sites with residues within a 6 Å radius of either Ca<sup>2+</sup>-1, 2, or 3 were included (Figure S1 in Supplementary Material). Binding sites in category (iii) are located in close proximity to the experimentally determined Ca<sup>2+</sup>-independent carbohydrate-binding site. The druggability of all binding sites was scored with DoGSiteScorer (73). Finally, the category (i), (ii), or (iii) binding site that displayed the highest score for each receptor was selected, and this selection served to determine a mean druggability score for the analyzed CTLRs.
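The categorization and selection logic described above can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the authors' scripts and not the DoGSite or DoGSiteScorer API: the `Site` class, the boolean flags, the rule that category (i) is checked before (ii) and (iii), and all coordinates and scores are hypothetical.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Site:
    """Hypothetical record for one predicted binding site."""
    receptor: str
    score: float                          # druggability score (made-up values below)
    residue_coords: list                  # representative (x, y, z) positions in Å
    contains_ca2: bool = False            # Ca2+-2 ion is part of the predicted site
    near_noncanonical_site: bool = False  # close to a Ca2+-independent glycan site


def classify(site, calcium_coords, radius=6.0):
    """Assign category i-iv; calcium_coords holds the Ca2+-1, 2, and 3 positions."""
    if site.contains_ca2:
        return "i"
    if any(math.dist(r, ca) <= radius
           for r in site.residue_coords for ca in calcium_coords):
        return "ii"
    if site.near_noncanonical_site:
        return "iii"
    return "iv"


def mean_best_score(sites, calcium_by_receptor):
    """Keep the best category i-iii site per receptor and average those scores."""
    best = {}
    for s in sites:
        if classify(s, calcium_by_receptor.get(s.receptor, [])) == "iv":
            continue  # sites unrelated to carbohydrate binding are ignored
        best[s.receptor] = max(best.get(s.receptor, 0.0), s.score)
    return mean(best.values()), best


# Hypothetical input: receptor R1 with one Ca2+ at the origin, R2 without Ca2+.
toy_calcium = {"R1": [(0.0, 0.0, 0.0)], "R2": []}
toy_sites = [
    Site("R1", 0.40, [(1.0, 0.0, 0.0)], contains_ca2=True),
    Site("R1", 0.70, [(20.0, 0.0, 0.0)]),   # distant pocket, category iv
    Site("R1", 0.50, [(3.0, 4.0, 0.0)]),    # 5 Å from Ca2+, category ii
    Site("R2", 0.30, [(5.0, 5.0, 5.0)], near_noncanonical_site=True),
]
toy_mean, toy_best = mean_best_score(toy_sites, toy_calcium)

if __name__ == "__main__":
    print(toy_best, toy_mean)
```

Note that the highest-scoring pocket of R1 is deliberately excluded here because it falls into category (iv), which mirrors the restriction of the mean score to carbohydrate-related sites.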

#### **CLONING**

Codon optimized genes for DC-SIGN and human Langerin for expression in *E. coli* were purchased from Life Technologies (Carlsbad, CA, USA) and GenScript (Piscataway, NJ, USA), respectively. The DC-SIGN gene included a C-terminal TEV (tobacco etch virus) cleavage site and a Strep-tag II for affinity purification. The ECD and CRD, ranging from amino acids 62 to 404 and 250 to 404 (Figure S3 in Supplementary Material), respectively, were cloned into a pUC19 vector using primers including a T7 promoter and ribosomal binding site (RBS) upstream of the gene (**Table 2**). Human Langerin truncated ECD, ranging from amino acids 148 to 328, was cloned with a C-terminal TEV cleavage site and a Strep-tag II into a pET32a expression vector (EMD Millipore, Billerica, MA, USA). The MCL gene was obtained from the DNASU Plasmid Repository (HsCD00507041, Arizona State University, Phoenix, AZ, USA) and the ECD was cloned into a pUC19 vector already carrying a Strep-tag II, a T7 promoter and an RBS. For MCL ECD, amino acids ranging from 61 to 215 were used (65).

#### **PROTEIN EXPRESSION AND PURIFICATION**

All growth media or chemicals used for protein expression and purification were purchased from Carl Roth (Karlsruhe, Germany) if not stated otherwise. Proteins were expressed in insoluble form in *E. coli* BL21(DE3) (New England Biolabs, Ipswich, MA, USA) or KRX (Promega, Fitchburg, WI, USA). Precultures were grown in 50 mL Luria–Bertani (LB) medium supplemented with 100 mg L<sup>−1</sup> carbenicillin for DC-SIGN and MCL expression or 35 mg L<sup>−1</sup> kanamycin for Langerin expression at 37°C in 250 mL baffled shaking flasks at 220 rpm shaking frequency. The precultures of DC-SIGN and MCL were centrifuged (3,000 × *g*, 10 min, 4°C), the supernatant was discarded, and the sediment was resuspended in 500 mL LB medium supplemented with 50 mg L<sup>−1</sup> carbenicillin. The cells were afterwards cultivated at 37°C in 2.5 L baffled shaking flasks at 220 rpm shaking frequency. Protein expression was induced with 1 mM IPTG (isopropyl β-D-1-thiogalactopyranoside) at OD<sub>600</sub> = 0.4–0.6 for an additional 4 h at 37°C. The preculture of Langerin trECD was diluted directly to OD<sub>600</sub> = 0.1 into 500 mL of LB medium supplemented with 35 mg L<sup>−1</sup> kanamycin, 0.01% D-glucose, and 0.05% L-rhamnose for autoinduction of expression. Bacteria were harvested (4,000 × *g*, 20 min, 4°C), frozen, and resuspended in lysis buffer [50 mM Tris–HCl, pH 7.5, 10 mM magnesium chloride, 0.1% Triton X-100, 4 mg lysozyme (Sigma-Aldrich, St. Louis, MO, USA) and 500 U DNaseI (Applichem, Darmstadt, Germany) per gram of wet biomass] and incubated on ice for 4 h. Inclusion bodies were harvested by centrifugation (10,000 × *g*, 10 min, 4°C) and washed thrice with 20 mL washing buffer (50 mM Tris–HCl, pH 8.0, 4 M urea, 500 mM sodium chloride, 1 mM EDTA) to remove soluble proteins.

**Table 2 | Primer sequences used for cloning**.

**Table 1 | List of analyzed CTLR structures**.

For DC-SIGN ECD and Langerin ECD purification, the washed inclusion bodies were resuspended and denatured in 40 mL denaturation buffer (6 M guanidine hydrochloride, 100 mM Tris–HCl, pH 8.0, 1 mM DTT) and incubated at 30°C for 1 h or at 4°C overnight, followed by centrifugation (42,000 × *g*, 1 h, 4°C). The denatured inclusion bodies were slowly diluted threefold with cold binding buffer (TBS, pH 7.8 with 25 mM calcium chloride), supplemented with 1 mM reduced glutathione (GSH, Applichem) and 0.1 mM oxidized glutathione (GSSG, Applichem), and afterwards dialyzed twice against 2 L of this buffer for 24 h at 4°C. After another 2 L dialysis against binding buffer, proteins were purified according to previously published protocols using mannan agarose affinity chromatography (Sigma-Aldrich) (74).

The washed inclusion bodies of DC-SIGN CRD were resuspended and denatured in 10 mL denaturation buffer and incubated at 30°C for 1 h or at 4°C overnight, followed by centrifugation (42,000 × *g*, 1 h, 4°C). The solubilized inclusion bodies in the supernatant were refolded by rapid dilution into 50 mL of cold refolding buffer (100 mM Tris–HCl, pH 8.0, 1 M L-arginine, 150 mM sodium chloride, 120 mM sucrose) while stirring at 4°C. After 2 days, the protein solution was dialyzed against 2 L of cold buffer W (100 mM Tris–HCl, pH 8.0, 150 mM sodium chloride, 1 mM EDTA) and aggregated protein was removed by centrifugation (42,000 × *g*, 1.5 h, 4°C). The protein was purified using Streptactin affinity chromatography (IBA, Goettingen, Germany) according to the manufacturer's instructions.

MCL refolding and purification were performed according to Furukawa and co-workers, with minor changes to the protocol. Briefly, purification was performed via Streptactin affinity chromatography after dialysis against 2 L of buffer W.

#### **FRAGMENT LIBRARY**

Fragments were selected from a pool of commercially available compounds from different manufacturers (Sigma-Aldrich, St. Louis, MO, USA; KeyOrganics, Camelford, UK; ACB Blocks, Toronto, ON, Canada; Santa Cruz Biotechnology, Santa Cruz, CA, USA; Vistas-MLab, Moscow, Russia; LifeChemicals, Kyiv, Ukraine; Alfa Aesar, Ward Hill, MA, USA; TCI, Tokyo, Japan; Apollo Scientific, Stockport, UK) using chemoinformatic tools as implemented in MOE and KNIME (75). Only compounds with <23 non-hydrogen atoms and at least one ring were PAINS-filtered and subsequently included in the diversity selection (76). Fragment selection was based on normalized moments of inertia for shape diversity and on the Tanimoto coefficient (<0.8) using MACCS fingerprints for chemical diversity; scaffold diversity was ensured following definitions given by Murcko and co-workers (77, 78).

Maximum pairwise similarities were calculated in MOE using three-point pharmacophore-based fingerprints (GpiDAPH3) as descriptors and Tanimoto coefficient as similarity metric. The same descriptor was used to assess the chemical complexity of the fragments (31).
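For readers unfamiliar with the similarity metric, the sketch below shows how a Tanimoto coefficient and a maximum pairwise similarity can be computed for binary fingerprints. It is a generic illustration, not the MOE implementation: fingerprints are represented as sets of on-bit indices, the three example fingerprints are hypothetical, and returning 0.0 for two empty fingerprints is a convention chosen here.

```python
from itertools import combinations


def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits in either set."""
    union = len(fp_a | fp_b)
    return len(fp_a & fp_b) / union if union else 0.0


def max_pairwise_similarity(fingerprints: dict) -> dict:
    """For each compound, the highest Tanimoto similarity to any other compound."""
    result = {name: 0.0 for name in fingerprints}
    for (name_a, fp_a), (name_b, fp_b) in combinations(fingerprints.items(), 2):
        s = tanimoto(fp_a, fp_b)
        result[name_a] = max(result[name_a], s)
        result[name_b] = max(result[name_b], s)
    return result


# Hypothetical fingerprints given as sets of on-bit positions.
toy_fps = {
    "frag1": frozenset({1, 2, 3, 4}),
    "frag2": frozenset({3, 4, 5, 6}),
    "frag3": frozenset({7, 8}),
}
toy_max_sim = max_pairwise_similarity(toy_fps)

if __name__ == "__main__":
    for name, sim in toy_max_sim.items():
        print(name, round(sim, 3))
```

A library member whose maximum pairwise similarity stays below a chosen cutoff has no close analog in the set under the chosen fingerprint.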

Fragments were dissolved in d6-DMSO (Euriso-Top, Saint-Aubin, France) to 100 mM stock solutions under a nitrogen atmosphere in Matrix plates (Thermo Scientific, Waltham, MA, USA) followed by shaking at room temperature for 18 h at 140 rpm. Fragments were stored at −20°C. Next, each fragment was dissolved under a nitrogen atmosphere at 1 mM in 500 µL 10 mM deuterated phosphate buffer, pH 7.0, containing 50 µM d4-TSP [3-(trimethylsilyl)-2,2′,3,3′-tetradeuteropropionic acid, Sigma-Aldrich], 50 µM TFA (trifluoroacetic acid, Sigma-Aldrich), and 0.01% sodium azide (Carl Roth). A <sup>19</sup>F and a <sup>1</sup>H NMR spectrum of each fragment were recorded for quality control. All NMR experiments were performed at 298 K in Norell SP5000-7 5 mm tubes (Norell, Landisville, NJ, USA) on a Varian PremiumCOMPACT 600 MHz spectrometer equipped with a OneNMR probe (Agilent, Santa Clara, CA, USA) with TSP and TFA as internal references. All spectra were analyzed in MestReNova 9.0.0 (Mestrelab Research, Santiago de Compostela, Spain) for identity and for solubility in D<sub>2</sub>O of at least 200 µM. Substances that did not fulfill these quality criteria (17%) were removed from the library. Chemical shifts were used to design 8 screening mixtures consisting of 36 compounds each. A genetic algorithm was used to solve the optimization problem of mixture prediction (unpublished data). Prior to screening, all mixtures were analyzed by <sup>19</sup>F NMR after 18–24 h incubation at room temperature to ensure stability of the mixtures. Compounds experiencing precipitation or changes in chemical shift were removed from the following experiments. The quality control left 281 compounds (83%) to be prepared in mixtures of 100 µM of each compound, 100 µM TFA, and 150 mM sodium chloride in 20 mM Tris–HCl, pH 7.8, in 20% D<sub>2</sub>O (Euriso-Top) that were stored at −20°C as aliquots until used.

#### **NMR SCREENING**

All protein samples were prepared at a final concentration of 20 µM in 20 mM Tris–HCl, pH 7.8, with 150 mM sodium chloride and 1 mM EDTA and mixed 1:1 with the screening mixture aliquots, resulting in final protein and compound concentrations of 10 and 50 µM, respectively, in 500 µL final volume. Fluorine spectra were recorded with a spectral width of 140 ppm and a transmitter offset at −120 ppm, acquiring 128 scans, with an acquisition time of 0.8 s and a relaxation time of 2 s. T2-filtered spectra were recorded using a CPMG pulse sequence with a 180° pulse repetition rate of 50 Hz and a duration of 1.0 s using the same acquisition and relaxation times (79, 80). Two CPMG spectra were recorded per mixture to cover the full spectral width: one ranging from −50 to −100 ppm and one from −100 to −150 ppm, recorded with 96 and 256 scans, respectively. Screening was performed first in the presence and absence of protein including 0.5 mM EDTA. Next, calcium chloride was added to a final concentration of 10 mM and measurements were repeated. All spectra were analyzed for changes in peak intensity and chemical shift. As an additional quality control, frequent hitters identified during unrelated screening campaigns were removed.
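The final sample composition follows from simple dilution arithmetic, which the short sketch below makes explicit. It assumes that 1:1 mixing means two equal 250 µL aliquots and that EDTA is contributed only by the protein buffer; the reading of the CPMG settings as 50 refocusing pulses over 1.0 s is likewise an interpretation of the stated rate and duration.

```python
def dilute(stock_conc: float, stock_volume: float, final_volume: float) -> float:
    """Concentration after dilution, from C1 * V1 = C2 * V2 (units carried by caller)."""
    return stock_conc * stock_volume / final_volume


FINAL_VOLUME_UL = 500.0
ALIQUOT_UL = FINAL_VOLUME_UL / 2   # assumption: equal volumes of protein and mixture

protein_uM = dilute(20.0, ALIQUOT_UL, FINAL_VOLUME_UL)    # protein stock at 20 µM
compound_uM = dilute(100.0, ALIQUOT_UL, FINAL_VOLUME_UL)  # each fragment at 100 µM
edta_mM = dilute(1.0, ALIQUOT_UL, FINAL_VOLUME_UL)        # EDTA from protein buffer only

# CPMG filter: refocusing pulses applied at 50 Hz for 1.0 s (interpretation).
cpmg_rate_hz = 50.0
cpmg_duration_s = 1.0
n_refocusing_pulses = round(cpmg_rate_hz * cpmg_duration_s)
pulse_spacing_ms = 1000.0 / cpmg_rate_hz

if __name__ == "__main__":
    print(f"protein {protein_uM} µM, fragment {compound_uM} µM, EDTA {edta_mM} mM")
    print(f"{n_refocusing_pulses} pulses, one every {pulse_spacing_ms} ms")
```

Under these assumptions each fragment is present in fivefold excess over the protein, and the 0.5 mM EDTA quoted for the first screening round matches the diluted protein buffer.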

#### **SPR FOLLOW-UP SCREENING**

All surface plasmon resonance (SPR) measurements were performed on a Biacore® T100 (GE Healthcare, Chalfont St. Giles, UK) with a flow rate of 10 µL min<sup>−1</sup> using HBS-P buffer [10 mM HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid), pH 7.6, 150 mM sodium chloride, 0.05% Tween-20] at 298 K. DC-SIGN ECD was immobilized on a CM7 Series S sensor chip at a density of 3317 RU using 0.2 M EDC (1-ethyl-3-(3-dimethylaminopropyl)carbodiimide, Sigma-Aldrich) and 0.05 M NHS (*N*-hydroxysuccinimide, Merck, Hohenbrunn, Germany) as coupling reagents. The activated surface was saturated with 1 M ethanolamine (Sigma-Aldrich), pH 8.5, after immobilization. The reference flow cell was treated in the same manner without immobilizing protein. Prior to measurements, the solubility of each compound in SPR buffer was determined by recording absorption spectra between 400 and 800 nm at different concentrations in clear 96-well plates (Nalge Nunc International, Penfield, NY, USA) in a SpectraMax M5 plate reader (Molecular Devices, Sunnyvale, CA, USA). During SPR measurements, fragments were injected for 30 s, followed by a dissociation time of 120 s, at a flow rate of 10 µL min<sup>−1</sup>; regeneration was omitted as fast off-rates were observed for all ligands. To estimate the apparent affinity of a compound, at least three dilutions between 0.1 and 1 mM, depending on solubility, were run in triplicate, and the data were blanked against a corresponding DMSO control. A positive control was included during screening to ensure stability of the sensorgrams. A 1:1 binding model was applied for data fitting:

$$\text{RU} = \text{RU}_{\text{max}} \frac{[L]}{K_{\text{D,app}} + [L]} \tag{1}$$

where [*L*] is the fragment concentration, RU the measured relative response units, *K*<sub>D,app</sub> the apparent dissociation constant, and RU<sub>max</sub> the maximal relative response units. Fitting was performed in OriginPro 8.6G (OriginLab, Northampton, MA, USA). The maximal relative response units were estimated using:

$$\text{RU}_{\text{max}} = \text{A} \cdot \text{RU}_{\text{immobilized}} \frac{\text{MW}_{\text{compound}}}{\text{MW}_{\text{protein}}} \tag{2}$$

where RU<sub>immobilized</sub> is the immobilization level of the protein, MW<sub>compound</sub> and MW<sub>protein</sub> are the molecular weights of the compound and the protein, respectively, and A is the remaining activity of the protein on the chip. The latter was determined to be 0.6 using compound 4 as a positive control (Figure S4 in Supplementary Material). The apparent affinity constant for each compound was determined under two conditions, with either 0.5 mM EDTA or 2 mM calcium chloride included in the running and sample buffer. Ligand efficiencies (LE) were calculated applying

$$\text{LE} = -\frac{RT\ln\left(K_{\text{D,app}}\right)}{\text{HA}}\tag{3}$$

using the apparent dissociation constant *K*<sub>D,app</sub>, the temperature *T*, the gas constant *R*, and the number of non-hydrogen atoms HA (81).
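The three relations of this section (the 1:1 binding model, the RU<sub>max</sub> estimate, and the ligand efficiency) can be chained in a short sketch. It is not the Origin fitting procedure used in the study: the golden-section search, the molecular weights, the heavy-atom count, and the synthetic responses are all assumptions made for illustration. Only the immobilization level (3317 RU), the activity factor (0.6), and the temperature (298 K) come from the text. LE is expressed in kcal mol<sup>−1</sup> per non-hydrogen atom, with *K*<sub>D,app</sub> in mol/L.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ru_max(activity, ru_immobilized, mw_compound, mw_protein):
    """RU_max estimate: activity * immobilization level * MW ratio."""
    return activity * ru_immobilized * mw_compound / mw_protein


def response(conc, kd, rmax):
    """Steady-state response of the 1:1 binding model."""
    return rmax * conc / (kd + conc)


def fit_kd(concs, responses, rmax, lo=1e-6, hi=1.0, iters=200):
    """Least-squares K_D,app with RU_max held fixed (golden-section search on log10 K_D)."""
    def sse(log_kd):
        kd = 10.0 ** log_kd
        return sum((response(c, kd, rmax) - r) ** 2 for c, r in zip(concs, responses))

    phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log10(lo), math.log10(hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if sse(c) < sse(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return 10.0 ** ((a + b) / 2.0)


def ligand_efficiency(kd, heavy_atoms, temperature=298.0):
    """Ligand efficiency in kcal mol^-1 per non-hydrogen atom."""
    return -R_KCAL * temperature * math.log(kd) / heavy_atoms


# Hypothetical fragment (200 Da, 12 heavy atoms) and protein mass (40 kDa).
rmax = ru_max(0.6, 3317.0, mw_compound=200.0, mw_protein=40000.0)
concs = [1e-4, 3e-4, 1e-3]                           # mol/L, within the 0.1-1 mM range used
synthetic = [response(c, 1e-3, rmax) for c in concs]  # noise-free data for K_D = 1 mM
kd_fit = fit_kd(concs, synthetic, rmax)
le = ligand_efficiency(kd_fit, heavy_atoms=12)        # roughly 0.34 for this hypothetical case
```

Fixing RU<sub>max</sub> from the immobilization level turns the fit into a one-parameter problem, which matters when only a few concentrations below saturation are available.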

#### **RESULTS**

#### **STRUCTURE-BASED SEQUENCE ALIGNMENT IDENTIFIES CANONICAL CARBOHYDRATE-BINDING SITES**

A comparative framework between the CTLRs served as the starting point of our druggability prediction. To this end, a structure-based sequence alignment was performed for 22 CRDs (Figure S3 in Supplementary Material). The global sequence similarity within the set of receptors is low, averaging 41% and ranging from 26 to 86% (**Figure 1A**). A phylogenetic analysis based on this alignment yields a dendrogram that resembles the canonical classifications of CTLRs, in particular with respect to the correct assignment of members of the groups II, III, IV, V, and VII (1). Collectin-12 deviates from this classification, as it is part of the group II cluster. Moreover, Tetranectin and eosinophil major basic protein (EMBP) are the only representatives of groups IX and XII used in this study. Both display elevated distances to other branches. EMBP and Tetranectin as well as Clec9a, Lox-1, Clec1b, and Reg1a have been reported to interact with non-carbohydrate ligands, and all of these CTLRs were assigned to cluster B. Strikingly, CRDs known to recognize carbohydrates via the Ca<sup>2+</sup>-2-binding site are exclusively present in cluster A (**Figure 1A**).
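The pairwise statistics quoted above (minimum, mean, and maximum over all receptor pairs) can be sketched as follows. The sketch uses percent identity over ungapped alignment columns as a simpler stand-in for the similarity measure of the study, which is not specified here, and the three aligned sequences are hypothetical toy strings.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def summarize(alignment):
    """Minimum, mean, and maximum pairwise identity over all sequence pairs."""
    values = [
        percent_identity(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    ]
    return min(values), sum(values) / len(values), max(values)


# Hypothetical toy alignment; real input would be the 22 aligned CRD sequences.
toy = {
    "crd_a": "CWEK-DGTW",
    "crd_b": "CWEKFDGSW",
    "crd_c": "CYQ--NGAW",
}
lowest, average, highest = summarize(toy)
```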

#### **CONSENSUS STRUCTURE PREDICTION REVEALS ELEVATED STRUCTURAL VARIABILITY IN THE LONG LOOP**

In contrast to the low global sequence similarity, the overall structure of the CTLD is highly conserved. RMSD values of Cα atoms obtained from the structure-based multiple sequence alignment are uniformly low and do not exceed 3.2 Å (**Figure 1A**). To visualize the conservation of the domain architecture, we calculated a consensus structure (**Figure 1B**). While the core of the CTLD displays only minor deviations, a higher level of structural variability characterizes the two loop regions. The long loop is of particular interest as it harbors the Ca<sup>2+</sup>-1, -2, and -3 sites and thus plays a fundamental role in Ca<sup>2+</sup>-dependent carbohydrate recognition (21).
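For readers unfamiliar with the metric, a minimal sketch of a Cα RMSD calculation is given below. It assumes that the two structures have already been superimposed and that the coordinate lists contain only aligned residue pairs in matching order; the superposition step itself is not shown, and the coordinates are hypothetical.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD (same unit as the input) between two pre-superimposed, equally long C-alpha coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Hypothetical coordinates in angstrom: every atom is displaced by (3, 4, 0).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
example_rmsd = ca_rmsd(reference, shifted)
```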

#### **COMPUTATIONAL ANALYSIS PREDICTS LOW DRUGGABILITY FOR THE MAJORITY OF CTLRs**

The initial identification of binding sites with DoGSite yielded between three and nine sites for CRDs and 9–19 for ECDs. Next, DoGSiteScorer was applied to calculate druggability scores. In the scoring scheme of this program, scores over 0.5 are indicative of a druggable binding site (73). At least one site that meets this criterion is found for the majority of the analyzed CTLRs (**Figure 2A**). However, targeting these sites with drug-like molecules will not necessarily exert an effect on the physiological function of the respective CTLR.

We propose that binding sites in proximity to Ca<sup>2+</sup> ions located in the long loop region are relevant to carbohydrate recognition. Therefore, we assumed that small-molecule binding to these sites potentially modulates CTLR function. On this basis, binding sites were assigned to four categories: (i) Ca<sup>2+</sup>-2-dependent binding sites, (ii) Ca<sup>2+</sup>-associated binding sites, (iii) Ca<sup>2+</sup>-independent carbohydrate-binding sites, and (iv) other binding sites (**Figure 2A**). Ca<sup>2+</sup>-associated binding sites (i, ii) were identified by DoGSite for all CTLRs coordinating a Ca<sup>2+</sup>-2 ion except for Mincle and the Langerin trimer. Experimentally determined Ca<sup>2+</sup>-independent carbohydrate-binding sites (iii) were identified for DC-SIGN, DC-SIGNR, and Reg3a (58, 82). The existence of a single druggable site is sufficient to render a target druggable. Accordingly, for each CTLR, the site assigned to category (i) or (ii) displaying the highest score was selected for statistical analysis, and a mean druggability score of 0.47 was calculated (**Figure 2B**). This classifies CTLRs as "difficult" or even "undruggable" targets (73). Notably, individual receptors such as SP-D and Collectin-12 possess favorable pockets in the long loop region. Other targets such as E-Selectin display druggability values well below the mean.
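The selection and averaging described above can be sketched in a few lines. The receptor names and scores below are hypothetical placeholders, not values from Figure 2; only the 0.5 cut-off and the rule "best site in category (i) or (ii) per receptor" are taken from the text.

```python
from statistics import mean

DRUGGABLE_THRESHOLD = 0.5  # score above which a site is considered druggable


def best_ca_site_score(sites):
    """Highest score among category (i) and (ii) sites, or None if there are none."""
    scores = [s for category, s in sites if category in ("i", "ii")]
    return max(scores) if scores else None


def summarize(receptors):
    """Per-receptor best Ca-associated score, the mean over receptors, and the druggable subset."""
    best = {name: best_ca_site_score(sites) for name, sites in receptors.items()}
    scored = {name: s for name, s in best.items() if s is not None}
    druggable = sorted(n for n, s in scored.items() if s > DRUGGABLE_THRESHOLD)
    return scored, mean(scored.values()), druggable


# Hypothetical input: (category, score) pairs per receptor.
example = {
    "receptor_A": [("i", 0.31), ("ii", 0.58), ("iv", 0.80)],
    "receptor_B": [("ii", 0.42), ("iii", 0.56)],
    "receptor_C": [("iv", 0.65)],
}
scored, mean_score, druggable = summarize(example)
```

Note that receptor_C contributes nothing to the mean even though it has a high-scoring pocket, because that pocket is not associated with a Ca<sup>2+</sup> site.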

#### **FRAGMENT SCREENING REVEALS HIGH HIT RATES AGAINST DC-SIGN, LANGERIN, AND MCL**

The existence of pockets on the surface of a receptor that are suitable to accommodate drug-like ligands can be assessed experimentally using fragment screening. The resulting hit rate serves as a predictor for druggability. Therefore, we composed a chemical library of fragments to be used in a homogeneous, label-free NMR-based screening assay. All fragments carry a fluorine atom, which allows for <sup>19</sup>F NMR spectroscopy-based assessment of fragment binding. After quality control, 281 fragments were available for screening in 8 mixtures of at most 36 fragments. The fragment library displays high shape and chemical diversity (**Figures 3A,B**).

DC-SIGN CRD and ECD were screened against the fragment library using <sup>19</sup>F and T<sub>2</sub>-filtered <sup>19</sup>F NMR spectra. Fragment binding to DC-SIGN was observed by monitoring changes in chemical shift, line broadening, and T<sub>2</sub> relaxation. Three spectra were recorded per fragment mixture. First, a spectrum was recorded in the absence of protein to exclude false positives such as Ca<sup>2+</sup> chelators. The second spectrum was acquired in the presence of 10 µM protein to monitor fragment binding. Finally, Ca<sup>2+</sup> was added in excess to the protein-fragment mixture, hypothesizing that metal binding to DC-SIGN modulates the interaction of those fragments that are good candidates for inhibition of carbohydrate recognition (*vide supra*). Hits for DC-SIGN CRD and ECD were combined and frequent hitters were removed. In total, we identified 38 hits (13.5%) binding to DC-SIGN in a Ca<sup>2+</sup>-dependent manner (**Figure 3C**). Of these hits, 16 were found in both screenings and 21 were identified only during the CRD screening. Only one fragment was found exclusively in the ECD screening.

To further validate these hits, SPR spectroscopy was employed as an orthogonal biophysical assay. This method not only detects binding of small molecules to macromolecules, but also allows for the determination of equilibrium dissociation constants. DC-SIGN ECD was immobilized on the chip surface and two experimental setups were utilized to differentiate Ca<sup>2+</sup>-mediated fragment binding from Ca<sup>2+</sup>-fragment competition. Fragments were injected either in the presence of 0.5 mM EDTA or 2 mM calcium chloride (**Figure 3D**), confirming a 1:1 binding model for 18 fragments (47%). Five fragments (13%) bound with a higher stoichiometry, 3 (8%) showed no change in response in the presence or absence of Ca<sup>2+</sup>, and 12 fragments (32%) did not give rise to detectable signals. The highest affinities measured were in the upper micromolar to lower millimolar range (0.6 mM < *K*<sub>D,app</sub> < 1.3 mM). Of the 18 fragments confirmed by SPR, 9 showed an increase in affinity upon Ca<sup>2+</sup> addition and 9 displayed competitive behavior. Moreover, fragments similar to substructures of an already published submicromolar DC-SIGN inhibitor were identified (41, 42) (**Figure 4**). While fragments 1 and 2 bound competitively with the polysaccharide mannan in a <sup>19</sup>F NMR competition assay, fragment 3 showed no such behavior upon addition of the natural carbohydrate ligand of DC-SIGN (data not shown).
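The percentages reported for the NMR and SPR screens follow directly from the hit counts, as the short bookkeeping sketch below shows. All counts are taken from the text; the dictionary keys and variable names are hypothetical labels.

```python
library_size = 281

# Primary 19F NMR hits by the construct in which they were detected.
nmr_hits = {"crd_and_ecd": 16, "crd_only": 21, "ecd_only": 1}

# SPR outcome for the same set of NMR hits.
spr_outcome = {
    "one_to_one_binding": 18,
    "higher_stoichiometry": 5,
    "no_ca_dependence": 3,
    "no_detectable_signal": 12,
}


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


total_nmr_hits = sum(nmr_hits.values())
nmr_hit_rate = percent(total_nmr_hits, library_size)
spr_fractions = {k: percent(v, total_nmr_hits) for k, v in spr_outcome.items()}
confirmed_hit_rate = percent(spr_outcome["one_to_one_binding"], library_size)
```

The last quantity is the fraction of the whole library that survives both assays, which is the more conservative figure to compare with hit rates from other fragment campaigns.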

In light of our computational analysis, we were surprised to find such a high fragment hit rate for DC-SIGN, and decided to expand our <sup>19</sup>F NMR-based druggability prediction to the ECDs of two further CTLRs. We screened our fragment library against Langerin, which is sufficiently distant from DC-SIGN in our structural sequence alignment (**Figure 1A**). To compare these findings to a CTLR more closely related to Langerin, we also included MCL in our analysis. Both proteins were expressed in *E. coli* and screened following the same protocol as for DC-SIGN. Again, Ca<sup>2+</sup> was utilized as a competitor (Figures S5A,B in Supplementary Material) and several hits associated with Ca<sup>2+</sup> binding were identified (**Table 3**). The pairwise overlap between the three CTLRs was low and none of the fragment hits bound to all CTLRs (Figure S5C in Supplementary Material).

#### **DISCUSSION**

In this report, we assessed the potential of human CTLRs to be targeted with drug-like molecules. To this end, we explored the ability of a set of CTLRs to accommodate inhibitors that modulate the receptor–carbohydrate interaction. This druggability prediction is an important part of the decision on whether a drug discovery campaign should be pursued (28–30). Despite a large body of recent research highlighting the importance of CTLRs in immune cell regulation, pathogen uptake, and as targets for adjuvants, only a few drug-like molecules have been developed for the CTLR family (2, 16, 17, 20). Herein, we aimed to rationalize why these receptors are considered challenging targets.

To start our investigations, CTLR druggability was predicted by computational methods. No data focusing on CTLRs are available and more general reports on glycan-binding proteins presented low druggability scores (39, 40). Unfortunately, the exact structures were either not disclosed or highly redundant and no CTLR was explicitly included. We assembled a set of 21 human CTLRs, and the murine Dectin-1. The latter was included as a reference as it is a well-studied CTLR and harbors a potential noncanonical, calcium-independent carbohydrate recognition site. The druggability prediction was performed using DoGSiteScorer, recently released software to predict the druggability of protein targets based on structural and physicochemical properties (73). Here, potential pockets on the protein surface were identified first, and then scored according to their physicochemical properties. Major determinants of druggability are depth, volume, and amino acid composition of the pocket (28, 32, 34, 36, 73). Generally, highly hydrophilic binding sites are considered undruggable (36).

Between three and nine binding sites were identified for CRDs, which is in accordance with values reported for other protein families (32). For Langerin, MBP-C, and Tetranectin, data on the homo-trimeric form were available. Here, the algorithm identified more potential sites, which is not surprising due to the larger surface area and symmetry of the assemblies (**Figure 2A**). Yet, targeting this initial set of binding sites does not necessarily interfere with carbohydrate recognition. Therefore, we categorized pockets according to their potential to modulate glycan binding. We argue that a druggable pocket located in close proximity to the long loop is a potential binding site for an inhibitor. The loop exhibits considerable movement in the absence of calcium as observed for other CRDs (65, 67, 83, 84) and adjacent sites have been proposed to communicate with the primary carbohydrate recognition site (22, 23). Four categories of sites were defined, of which only two, namely categories (i) and (ii), are either directly or indirectly associated with calcium ion binding.

The success rate of detecting the canonical Ca<sup>2+</sup>-2 site (i) was low. Only 4 out of the 14 structures known to harbor such a site were identified (**Figure 2A**). This low number reflects a limitation of the employed pocket prediction, potentially due to the shallow architecture of the Ca<sup>2+</sup>-2 sites. The low druggability score of the successfully identified Ca<sup>2+</sup>-2 sites corroborates this finding. Overall, these findings suggest that identification of carbohydrate recognition sites with computational algorithms such as DoGSite is challenging (*vide infra*).

Moreover, we analyzed a larger panel of sites associated with either the Ca<sup>2+</sup>-1, -2, or -3 site, summarized in category (ii) (**Figure 2A**). The criteria of this category were less stringent and based on an extended definition of sites potentially interfering with carbohydrate binding. Again, druggable sites were sparse. Collectin-12 and SP-D, both members of the Collectin group (CTLR group III), represent notable exceptions. Furthermore, our data on Langerin, for which monomer and trimer were analyzed side by side, highlight that subtle changes in the long loop region upon oligomerization abrogate the recognition of these sites by DoGSite (62).

**Table 3 | Hit rates for three C-type lectin receptors from <sup>19</sup>F NMR screening against a library of 281 compounds and hits confirmed by SPR for DC-SIGN**.

Low scores for category (ii) sites are also found for members of cluster B of the sequence alignment. This cluster exclusively contains CTLRs not known to bind carbohydrates with their Ca<sup>2+</sup>-2 site (**Figure 2A**). The Ca<sup>2+</sup>-independent carbohydrate-binding site of category (iii) found for Reg3a (group VII) is located in another region of the CRD fold and has a druggability score of 0.56, predicting this CTLR to be challenging (82). Overall, only a few members of the CTLR family were predicted to be druggable (**Figure 2B**), which is in line with previous reports on glycan-binding proteins (39, 40).

To substantiate the computational studies, a <sup>19</sup>F NMR-based fragment screening against one of the analyzed CTLRs was conducted. We chose DC-SIGN because, as a viral uptake receptor, it is of pharmacological interest and has been targeted in a high-throughput screening (41). While the successful HTS was already an indicator of DC-SIGN being amenable to fragment binding, the low druggability assessment by our computational analysis predicted a low hit rate of fragments interfering with any of the three DC-SIGN calcium sites. To our surprise, 13.5% of the fragments from our library bound to DC-SIGN in Ca<sup>2+</sup>-associated sites during the NMR screening. The follow-up screening via SPR validated 18 (47%) of these fragments, a value not unusual for these two assay systems (85). Hits that were not validated by the SPR screening were either superstoichiometric binders (13%), not competitive with Ca<sup>2+</sup> (8%), or had affinities below the detection limit of the SPR assay. The latter can be attributed to the high sensitivity of <sup>19</sup>F NMR as a primary screen (38). Together, NMR and SPR result in a hit rate of 6.4%, which is in the expected range for fragment-based screenings and does not suggest a low likelihood to bind drug-like molecules (31, 37, 43, 86).

We performed the primary NMR screen against the CRD and the tetrameric ECD of DC-SIGN. Notably, only one fragment was uniquely identified during the screening of the ECD compared to 21 in the CRD screening. Conversely, many fragments binding to the ECD were later discovered to be false positives, such as frequent hitters from unrelated screening campaigns against non-CTLR targets. Hence, we conclude that screening for inhibitors has a lower false positive rate in the absence of the neck region of DC-SIGN.

Another indicator for the validity of our screen to discover fragments inhibiting carbohydrate binding to DC-SIGN was the identification of the three fragments 1, 2, and 3. These hits are similar to substructures of the previously reported micromolar DC-SIGN inhibitor 4 (**Figure 4**) (41). Compound 4 has been shown to compete with carbohydrate binding and to antagonize DC-SIGN-mediated cell adhesion and particle uptake (41, 42). Direct competition experiments between 4 and the three fragments were hampered by direct interaction of the fragments with 4 in the absence of DC-SIGN (data not shown). Thus, mannan was employed to compete with fragments 1–3, which resulted in reproducible competition with fragments 1 and 2 (data not shown). Although fragment 3 did not experience competition with the natural ligand, it can be speculated that it is associated with the binding site, as recognition was detected in SPR only in the presence of Ca<sup>2+</sup> (**Figure 4**). Moreover, other fragment hits showed even higher LE, ranging from 0.30 to 0.37, which is a good starting point for further fragment evolution. A subsequent expansion of our <sup>19</sup>F NMR-based screening to Langerin and MCL also revealed similarly high hit rates (**Table 3**). Following up on these initial hits is the subject of current research in the laboratory.

These encouraging experimental results are in contrast to our computational predictions. We attribute this conflict to the limitations of the DoGSiteScorer algorithm, which on the one hand is not parameterized for carbohydrate or metal binding sites (72) and on the other does not account for protein flexibility. Currently, there is no single software for druggability prediction available that is able to overcome these limitations.

Throughout the experimental evaluation, we employed competition with calcium ions as an indicator for the inhibition of carbohydrate recognition. We assumed the existence of allosteric sites originating from the flexibility of the long loop and cooperativity between the adjacent sites as previously described for other CTLRs (22, 23, 65, 67, 83, 84). In this context, it should be noted that accounting for conformational dynamics is recognized as a particular challenge for the development of improved algorithms (34).

To summarize, we report high *in silico* druggability scores for group III and V CTLRs as well as high experimental hit rates from fragment screenings against group II CTLRs. These data stand alongside a successful drug design campaign that has already been launched against group IV CTLRs (19). Hence, we conclude that our data, while highlighting the limitations of current computational methods, support the assessment of CTLRs as suitable targets for drug-like molecules.

#### **ACKNOWLEDGMENTS**

The authors thank the Max Planck Society, the German Research Foundation (DFG, Emmy Noether program RA1944/2-1), and the Chemical Industry Fund for financial support. Dr. Andrea Volkamer is gratefully acknowledged for support using DoGSiteScorer and Olaf Niemeyer for NMR technical support. Jonas Hanske is supported by a fellowship from the Chemical Industry Fund and Eike Christian Wamhoff acknowledges funding from the Collaborative Research Centre 765. We thank Prof. Dr. Peter H. Seeberger for support and helpful discussions.

#### **SUPPLEMENTARY MATERIAL**

The Supplementary Material for this article can be found online at http://www.frontiersin.org/Journal/10.3389/fimmu.2014.00323/abstract

#### **REFERENCES**


implications for ligand binding and signaling. *J Biol Chem* (2011) **286**:24208–18. doi:10.1074/jbc.M111.226142


in methodology affect hit identification? *J Biomol Screen* (2013) **18**:147–59. doi:10.1177/1087057112465979

86. Mashalidis EH, Sledz P, Lang S, Abell C. A three-stage biophysical screening cascade for fragment-based drug discovery. *Nat Protoc* (2013) **8**:2309–24. doi:10.1038/nprot.2013.130

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Received: 12 May 2014; accepted: 26 June 2014; published online: 10 July 2014. Citation: Aretz J, Wamhoff E-C, Hanske J, Heymann D and Rademacher C (2014) Computational and experimental prediction of human C-type lectin receptor druggability. Front. Immunol. 5:323. doi: 10.3389/fimmu.2014.00323*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology.*

*Copyright © 2014 Aretz, Wamhoff, Hanske, Heymann and Rademacher. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

## **Carbohydrates in cyberspace**

*Elizabeth Yuriev<sup>1</sup>\* and Paul A. Ramsland<sup>2,3,4,5</sup>\**

*<sup>1</sup> Monash Institute of Pharmaceutical Sciences, Monash University, Parkville, VIC, Australia, <sup>2</sup> Centre for Biomedical Research, Burnet Institute, Melbourne, VIC, Australia, <sup>3</sup> Department of Immunology, Alfred Medical Research and Education Precinct, Monash University, Melbourne, VIC, Australia, <sup>4</sup> Department of Surgery Austin Health, University of Melbourne, Heidelberg, VIC, Australia, <sup>5</sup> CHIRI Biosciences, School of Biomedical Sciences, Curtin University, Perth, WA, Australia*

**Keywords: Carbohydrate, curation, glycan, molecular modeling, online database, three-dimensional structure**

Many research areas, particularly those focused on structural aspects of biomolecules, have "moved into cyberspace" in the last 20–30 years. This move has become even more prominent in the last 5–10 years, amplified by the increased space accessibility (from mobile devices to the Cloud) and by ever more sophisticated and powerful resources available for high-performance computing. The field of structural glycobiology has greatly benefitted from such progress and has taken advantage of the computational tools and resources developed specifically for the structural analysis of carbohydrates. In this short opinion piece, we do not intend to provide comprehensive coverage of all such resources. We will briefly survey four websites [GLYCAM, UniCarb KnowledgeBase (UniCarbKB), Glycosciences.de, and Glyco3D] offering tools for structural glycobiology, and will provide some examples of studies where such tools have been successfully used for research relevant to the understanding of carbohydrate recognition by proteins involved in immunity and infection. For a more comprehensive listing of computational resources for structural glycobiology, readers should consult Refs. (1–4).

#### *Edited by:*

*Lee Mark Wetzler, Boston University School of Medicine, USA*

#### *Reviewed by:*

*Serge Perez, Centre national de la recherche scientifique, France*

#### *\*Correspondence:*

*Elizabeth Yuriev elizabeth.yuriev@monash.edu; Paul A. Ramsland pramsland@burnet.edu.au*

#### *Specialty section:*

*This article was submitted to Immunotherapies and Vaccines, a section of the journal Frontiers in Immunology*

> *Received: 23 April 2015 Accepted: 26 May 2015 Published: 10 June 2015*

#### *Citation:*

*Yuriev E and Ramsland PA (2015) Carbohydrates in cyberspace. Front. Immunol. 6:300. doi: 10.3389/fimmu.2015.00300*

GLYCAM-Web (5) is focused on the prediction of three-dimensional (3D) structures of carbohydrates and macromolecular structures that include carbohydrates. The server is created and operated by the research group of Prof. Robert J. Woods in the Complex Carbohydrate Research Center (CCRC) (6) at the University of Georgia in Athens. The server can perform conformational modeling of oligosaccharides as well as 3D modeling of glycoproteins. The main interfaces for modeling oligosaccharide conformations are the Carbohydrate Builder [including a special Builder for glycosaminoglycans (GAGs)], the Glycoprotein Builder, and the Oligosaccharide Libraries. The server is user-friendly and allows a range of upload and download options. Given a range of file formats – often, program-specific – used in molecular modeling, user-friendliness related to the portability of file formats is a significant "real-estate" feature highly valued by the cyberspace dwellers.

Most of the back-end software used by the server is open source and freely available to the public. This feature is also highly valued by viewers. The reason is not only reduced cost but also the openness of science that could be performed with open-source software. We consider such open access as critical in the field of molecular modeling: it allows researchers to use the program codes and develop them further, thus advancing the development of better modeling methods.

The main engine room of the server is the GLYCAM force field, the outcome of more than 20 years of development. Its latest incarnation, GLYCAM06 (7), has been significantly modified to satisfy the following requirements: transferability of the parameter set to all carbohydrate ring conformations and sizes, carbohydrate derivatives, and other biomolecules; self-containment and transferability to many quadratic force fields; ability to treat both α- and β-anomers without specific atom types. The GLYCAM-Web server (5) is a very widely and extensively used resource. Its statistics records show that an average of 1700–1800 unique users access the site monthly (data examined: December 2014–March 2015).

The GLYCAM-Web server (5) and the GLYCAM force field (7) were used for the investigation of recognition specificity of ABO blood group antigens by antibodies (8). Such circulating antibodies could cause a hyperacute immune response and sometimes death resulting from mismatched blood transfusion or organ transplantation (9). Another recent example of application of GLYCAM was in the study of structure and immune recognition of HIV-1 envelope (10), where it was used for modeling the *N*-linked glycan shield.

UniCarb KnowledgeBase (11, 12) was conceived to support online data storage and search capabilities for glycomics and glycobiology by integrating structural, experimental, and functional information. The aim of UniCarbKB is to advance the understanding of structures, pathways, and networks involved in glycosylation and processes mediated by carbohydrates. This information-rich resource is freely accessible, supports data annotation, and promotes adoption of common standards, necessary for seamless and meaningful integration of structural and functional data. UniCarbKB is the outcome of the collaborative effort of researchers from Australia (Macquarie University and University of NSW), Germany (Max Planck Institute of Colloids and Interfaces), Ireland (National Institute for Bioprocessing Research and Training), Japan (Soka University), Sweden (University of Gothenburg), Switzerland (Swiss Institute for Bioinformatics and University of Geneva), and USA (CCRC).

UniCarb KnowledgeBase provides querying interfaces for three structural databases (GlycoSuiteDB, EUROCarbDB, and GlycoBase) and a link to the UniCarb-DB (13). UniCarbKB continues efforts started with GlycoSuiteDB (14) in curating structural glycobiology information from research literature. GlycoBase (15) and UniCarb-DB (16) are databases of experimental glycan structures. Support for EUROCarbDB (17) is planned for a future release.

At the time of writing, over 906 references, 3238 glycan structure entries, and 898 glycoproteins have been curated in UniCarbKB. A total of 598 protein glycosylation sites have been annotated using experimentally confirmed glycan structures. Beyond storing, curating, and providing searchable access to structural and functional data, UniCarbKB offers a range of tools for data pre-processing and analysis: GlycanBuilder (18, 19) – for fast and intuitive drawing of glycan structures; GlycoMod tool (20, 21) – for predicting the possible glycan structures on proteins from their experimentally determined masses. UniCarbKB is connected to the GLYCAM site (described above) and RINGS – a web resource offering algorithms and data mining tools for glycobiology research (22). Further, through its SugarBindDB module, UniCarbKB is linked to the Functional Glycomics Gateway of the Consortium for Functional Glycomics (CFG) (23) and the Glyco3D Portal for Structural Glycosciences (24, 25).
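To give a feel for mass-based composition prediction of the kind GlycoMod performs, the sketch below enumerates monosaccharide compositions whose calculated mass matches an observed value. It is not GlycoMod's algorithm or interface: the brute-force enumeration, the restriction to four residue types, the upper limits, and the tolerance are all assumptions. Masses are monoisotopic residue masses for a neutral, underivatized glycan with a free reducing end.

```python
from itertools import product

# Monoisotopic residue masses in Da (monosaccharide minus water).
RESIDUE_MASS = {
    "Hex": 162.0528,
    "HexNAc": 203.0794,
    "dHex": 146.0579,
    "NeuAc": 291.0954,
}
WATER = 18.0106  # added once for the free reducing end


def composition_mass(composition):
    """Neutral monoisotopic mass of a glycan with the given residue counts."""
    return WATER + sum(RESIDUE_MASS[name] * n for name, n in composition.items())


def candidate_compositions(observed_mass, tolerance=0.02, limits=None):
    """All compositions within the limits whose mass lies within the tolerance (Da)."""
    limits = limits or {"Hex": 10, "HexNAc": 8, "dHex": 3, "NeuAc": 4}  # assumed search bounds
    names = list(RESIDUE_MASS)
    hits = []
    for counts in product(*(range(limits[n] + 1) for n in names)):
        composition = dict(zip(names, counts))
        if abs(composition_mass(composition) - observed_mass) <= tolerance:
            hits.append(composition)
    return hits


# Example query: the calculated mass of a high-mannose Hex5HexNAc2 glycan.
man5_mass = composition_mass({"Hex": 5, "HexNAc": 2, "dHex": 0, "NeuAc": 0})
candidates = candidate_compositions(man5_mass)
```

A composition is only a starting point: several isomeric structures can share it, which is why such tools report possible structures rather than a single answer.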

The GlycoMod tool of the UniCarbKB was used in a study of the formation of subcellular-specific *N*-glycosylation glycoprotein determinants (26). Lee et al. used liquid chromatography and mass spectrometry-based quantitative glycomics to investigate eight human breast epithelial cells with diverse genotypes and phenotypes – mostly human breast cancer cell lines – and have shown that the secreted glycoproteins consistently displayed more processed, primarily complex type, *N*-glycans than the high-mannose-rich microsomal glycoproteins. They have also demonstrated that secreted glycoproteins displayed significantly more α-sialylation and α-1,6-fucosylation, but less α-mannosylation, than both cell-surface and microsomal glycoproteomes.

Of particular interest to those working on the glycobiology of infectious disease is the SugarBindDB module of UniCarbKB – a database of pathogen lectins and corresponding glycan targets (27, 28). The SugarBindDB advances a pathogen-capture technology specifically for the binding of pathogen lectins to carbohydrate epitopes. The contents of the database come from publicly accessible information. After experts (glycobiologists, microbiologists, and medical histologists) locate candidate papers, the binding data is extracted for human pathogens and specific glycans.

The SugarBindDB database can be searched using names of different entities involved in the carbohydrate-protein interactions. For pathogenic agents (e.g., *Pseudomonas aeruginosa*), the outcomes include agent type and taxonomy ID. For lectins themselves (e.g., PA-IIL), the database stores genes, Protein Data Bank (PDB) codes, and associated Glyco3D and CFG references (see below for more information about the Glyco3D resources). A query could also be constructed for specific ligands (e.g., Lewis X). The entries in SugarBindDB are limited to sugar sequences that demonstrate consistent binding to pathogen lectins. Sugar sequences include "glycotope" within a larger carbohydrate structure (the precise lectin-binding site) or entire structures, depending on specific findings within the cited papers. Similar to the GLYCAM-Web server mentioned above, the SugarBindDB is user-friendly in that it allows displaying glycans in a variety of formats; specifically, it supports the modified 2D condensed IUPAC nomenclature, Oxford symbol notation, and the colored CFG cartoons. A sugar query can be input as a structure drawn with GlycanBuilder (18, 19). The SugarBindDB database can also be browsed by disease (e.g., Influenza) or by published references (by author or title descriptions) or searched by the affected area in the pathology (e.g., intestine). Search outputs are very informatively interconnected, i.e., the biological context of each ligand is shown in relation to a disease, to other known ligands, to similar lectins, etc. At the time of writing, over 178 references, 549 pathogenic agents, and 200 ligands have been curated in SugarBindDB.

Glyco3D (25) was developed at the Centre de Recherches sur les Macromolécules Végétales (CERMAV-CNRS), France. It comprises a group of databases containing the 3D data collected from an extensive screening of scientific literature: BiOligo (more than 250 bioactive oligosaccharides along with about 120 of their constituent disaccharide and about 80 monosaccharide segments); PolySac3DB (157 polysaccharides); Lectin3D (more than 1,000 structures); GAG3D (GAG-binding proteins co-crystallized with their ligands); MAbs (a limited set of high-resolution structures of carbohydrate–antibody complexes); GT3D (glycosyltransferases crystallized with or without their carbohydrate ligands). Lectins are oligomeric proteins that can specifically recognize carbohydrates and are involved in many processes crucial to infection and immunity (29); ~70% of the Lectin3D structures are present in complex with a carbohydrate ligand, from monosaccharides to oligosaccharides or glycoproteins, allowing great insight into the molecular basis of carbohydrate recognition by lectins. GAGs are complex anionic polysaccharides (e.g., heparin), specifically recognized by protein receptors, which participate in the regulation of many processes, including cell adhesion. Anticarbohydrate antibodies, demonstrating specificity to various carbohydrate epitopes, are of high importance in immunology and vaccine development (30).

The individual databases within Glyco3D are publicly accessible via the portal (24), which provides a user-friendly graphical interface with several search options. All 3D structures, established using a variety of structure determination methods, can be visualized, and some basic measurements can be made on the spot. For a more detailed analysis, the structures can be downloaded in a variety of commonly used molecular file formats.

Why would a user access 3D structural information relevant to glycobiology via databases such as Glyco3D when it can be obtained from the original sources, mainly PDB (31)? The answer is that in databases such as Glyco3D the data is manually curated and value-added. Such curation and value-adding is indispensable, given the poor quality of structural information for carbohydrates in PDB. Agirre et al. have recently demonstrated that 64% of all *N*-glycan D-pyranosides in PDB reflect poor fit to the electron density (32). They have revealed significant structural errors, such as incorrect conformations, stereochemical configurations, and linkages, resulting from incorrect model building from the start, as well as more subtle errors, resulting from incorrect refinement. Even subtle errors could be detrimental if incorrect structures are interpreted in a biological context. The reasons for errors are various, but the main ones seem to be poor chemical understanding on the part of crystallographers and the lack of appropriate torsional restraints, often in disagreement with the low-resolution data. The detriment of such errors goes beyond incorrect interpretations in specific cases. It also leads to skewed statistics based on all published/deposited structures, creating new abnormal "norms." Thus, structural glycobiology is now ripe for remediation of the type that the protein crystallography field underwent in the not too distant past (33).

Tools developed specifically for the structural validation of carbohydrate 3D data are available and publicly accessible (34). For example, pdb-care (35) can check residue notation (which can be a source of confusion for carbohydrate residues) and atom connectivities and evaluate biological correctness of glycan structures. CARP (carbohydrate Ramachandran plot) (36) analyses carbohydrate data by generating a separate Phi/Psi plot for each linkage type, such as in the study of conformational preferences of *Shigella flexneri* O-antigens and their implications for vaccine design (37). Information necessary for such plots can be obtained from the GlyTorsion database (36), containing glycosidic torsions observed in PDB entries, or from conformational maps based on molecular dynamics simulations provided by GlycoMapsDB (38). Both pdb-care and CARP are available at the Glycosciences.de portal (39).
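The per-linkage Phi/Psi analysis described above rests on computing torsion angles from atomic coordinates and then binning them by linkage type. The following is a minimal sketch of that calculation, not the CARP implementation: the function names, the linkage labels, and the synthetic coordinates are assumptions made for illustration, and the choice of which four atoms define Phi and Psi (conventions differ between tools) is left to the caller.

```python
import math
from collections import defaultdict


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle (degrees, range -180..180) about the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    y = _dot(b2, _cross(n1, n2))
    x = math.sqrt(_dot(b2, b2)) * _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def group_by_linkage(linkages):
    """Collect (phi, psi) pairs per linkage label.

    `linkages` is an iterable of (label, phi_atoms, psi_atoms), where each
    *_atoms entry is a sequence of four (x, y, z) coordinates chosen by the
    caller according to their preferred Phi/Psi definition.
    """
    plots = defaultdict(list)
    for label, phi_atoms, psi_atoms in linkages:
        plots[label].append((torsion_deg(*phi_atoms), torsion_deg(*psi_atoms)))
    return dict(plots)


if __name__ == "__main__":
    # Synthetic coordinates, not taken from any PDB entry.
    plus90 = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)]
    minus90 = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, -1, 1)]
    data = [("linkage-A", plus90, minus90), ("linkage-A", minus90, plus90),
            ("linkage-B", plus90, plus90)]
    for label, points in group_by_linkage(data).items():
        print(label, points)
```

Each list returned by `group_by_linkage` holds the points for one scatter plot, so outliers relative to the populated regions of a given linkage type can be flagged for closer inspection.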

To summarize, the online resources for structural glycobiology reviewed here demonstrate features of good scientific citizenship: a multinational collaborative approach to design and development, open access to data and code, and the drive for common and accepted standards. The collaborative approach is illustrated not only by initiatives such as UniCarbKB, but also by efforts to cross-link and integrate existing resources (e.g., SugarBindDB and Glyco3D) and map functional and structural information (e.g., mapping of SugarBindDB to GlycoSuiteDB to link saccharide sequences with their physical location in the body). A further, somewhat altruistic, aspect of these developments is the extensive effort that goes into manual curation of data. Efforts like these should be recognized, credited, and supported more strongly by relevant scientific societies and funding bodies.

### **Acknowledgments**

The authors gratefully acknowledge the contribution toward this study from the Victorian Operational Infrastructure Support Program received by the Burnet Institute.

### **References**


a global glycan reference MS/MS repository. *Biochim Biophys Acta* (2014) **1844**:108–16. doi:10.1016/j.bbapap.2013.04.018


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Yuriev and Ramsland. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*
