# PROTEOMICS OF MICROBIAL HUMAN PATHOGENS

EDITED BY: Nelson C. Soares, Jonathan M. Blackburn and German Bou PUBLISHED IN: Frontiers in Microbiology


ISSN 1664-8714 ISBN 978-2-88945-088-6 DOI 10.3389/978-2-88945-088-6


# **PROTEOMICS OF MICROBIAL HUMAN PATHOGENS**

Topic Editors:

**Nelson C. Soares,** University of Cape Town, South Africa

**Jonathan M. Blackburn,** University of Cape Town, South Africa

**German Bou,** Complejo Hospitalario Universitario A Coruña, Spain

The murine macrophage cell line RAW 264.7 was infected with GFP-labelled Mycobacterium smegmatis at a multiplicity of infection (MOI) of 10:1. The nuclei of RAW cells were visualised with Hoechst and lysosomal compartments with LysoTracker® Red DND-99. Infection of macrophages was followed using a Zeiss Axiovert 200M LSM 510 Meta confocal microscope. Credit: Dr. Clemens Hermann and Ms Nazla Hassen (UCT-iBMS).

According to the World Health Organization (WHO), in 2012 infectious diseases and related conditions accounted for more than 70% of premature deaths across 22 African countries, and an estimated 450,000 people worldwide developed multi-drug resistant tuberculosis. This alarming situation, of great public health concern, calls for the urgent development of novel and efficient response strategies. The employment of important research platforms, such as genomics and proteomics, has contributed significant insight into the mechanisms underlying microbial infection and microbe-host interaction. In this Frontiers Research Topic, we aim to produce a timely and pertinent discussion regarding the current status of "Proteomics of Microbial Human Pathogens" and the role of proteomics in combating the challenges posed by microbial infection and, indeed, acquired anti-microbial resistance.

As the field of proteomics progressed from 2-DE gel based approaches to modern LC-MS/MS based workflows, remarkable advances have been reported in terms of data quantity and quality. Given the immediate and enormous advantages that high resolution and accurate mass spectrometers have brought to the field, proteomics has now evolved into a robust platform capable of generating large amounts of comprehensive data comparable to that reported previously in genomics studies. For example, detection of the complete yeast proteome has been reported and other small proteomes, such as those of bacteria, are within reach. Mass spectrometry-based proteomics has become an essential tool for biologists and biochemists, and is now considered by many as an essential component of modern structural biology.

Additionally, the introduction of high-resolution mass spectrometers has driven the development of various strategies aimed at accurate quantification of the absolute and relative amounts of protein(s) of interest. Emerging targeted mass spectrometry methodologies, such as Selected Reaction Monitoring (SRM), Parallel Reaction Monitoring (PRM) and SWATH, are perhaps the latest breakthrough within the proteomics community. Indeed, through a label-free approach, targeted mass spectrometry offers an unequalled capability to characterize and quantify a specific set of proteins reproducibly in any biological sample. Usefully, Aebersold and colleagues have recently generated and validated a number of assays to quantify 97% of the 4,012 annotated Mycobacterium tuberculosis (Mtb) proteins by SRM. As such, the Mtb Proteome library represents a valuable experimental resource that will undoubtedly bring new insight to the complex life cycle of Mtb. Finally, as reviewed recently in a Frontiers Research Topic, mass spectrometry-based proteomics has had a tremendous impact on our current understanding of post-translational modifications (PTMs) in bacteria, including the key role of PTMs during the interaction between pathogenic bacteria and their hosts.

We believe that our understanding of microbial Human pathogens has benefited enormously from both 2-DE gel and modern LC-MS/MS based proteomics. It is our wish to produce an integrated discussion surrounding this topic to highlight the existing synergy between these research fields. We envisage this Research Topic as a window to expert opinions and perspectives on the realistic practicalities of proteomics as an important tool to address healthcare problems caused by microbial pathogens.

**Citation:** Soares, N. C., Blackburn, J. M., Bou, G., eds. (2017). Proteomics of Microbial Human Pathogens. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-088-6

# Table of Contents


*Ionic Liquids as Unforeseen Assets to Fight Life-Threatening Mycotic Diseases*

Diego O. Hartmann, Marija Petkovic and Cristina Silva Pereira

# **Section II: Focus on bacterial post translational modifications**


Bridget Calder, Claudia Albeldas, Jonathan M. Blackburn and Nelson C. Soares

*Mass Spectrometry Targeted Assays as a Tool to Improve Our Understanding of Post-translational Modifications in Pathogenic Bacteria*

Nelson C. Soares and Jonathan M. Blackburn

# Editorial: Proteomics of Microbial Human Pathogens

#### Nelson C. Soares <sup>1</sup> \*, German Bou<sup>2</sup> and Jonathan M. Blackburn<sup>1</sup>

<sup>1</sup> Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa, <sup>2</sup> Servicio de Microbiologia-Instituto de Investigación Biomédica, Complejo Hospitalario Universitario A Coruña, A Coruña, Spain

#### Keywords: proteomics, mass spectrometry, microbes, bacteria, virulence, pathogens, protein posttranslational modifications

**The Editorial on the Research Topic**

#### **Proteomics of Microbial Human Pathogens**

Despite remarkable advances in treatment and prevention, infectious diseases remain amongst the leading causes of death worldwide, particularly in the developing world, and drug resistant pathogens are ominously on the rise. By way of one specific example, the global incidence of human tuberculosis (TB) disease, caused by infection with the pathogen Mycobacterium tuberculosis, is estimated to be ∼9 million new cases (∼0.1% of the global population) per annum, causing 1.5–2 million deaths per annum, with an estimated 450,000 people developing multi-drug resistant tuberculosis each year; moreover, the WHO estimates that ∼1/3rd of the World's population carries a latent M. tuberculosis infection, thus representing a huge reservoir of potential future TB cases. In the context of this Research Topic, it is also relevant that the burden of TB disease is very uneven globally, with certain countries in Africa having a much higher incidence than observed in the developed world; for example, in South Africa the national incidence of TB disease is around 1% of the population and a single city, Cape Town, reports more cases of TB annually than the whole of North America and Europe combined. Furthermore, roughly 21% of all deaths in South Africa are associated with TB disease today, despite proactive efforts at TB control since the beginning of the twentieth century and despite the TB control program in Cape Town achieving 75% case finding and 85% completion rates for smear positive disease. Similarly depressing statistics abound for numerous other infectious diseases in the developing world, with disease burden being driven by both microbial adaptation (to drug resistance and to differing ecologies), as well as by the emergence of new, often zoonotic pathogens (e.g., Ebola and Zika viruses).
According to the World Health Organization (WHO), infectious diseases today account for more than 70% of premature deaths across 22 African countries (WHO, 2014), with co-infections, for example with HIV or helminths, being rife. It is therefore critical that we develop new molecular knowledge now that will inform new strategies in the global fight against human microbial pathogens, including those that are not prevalent in the developed world, with the use of state-of-the-art "omics" technologies set to take center-stage through providing a detailed understanding of the mechanistic basis of pathogenicity and by creating a comprehensive description of the molecular interplays that mediate host-pathogen interactions.

During the last two decades, powerful high throughput research platforms, namely genomics, transcriptomics, and proteomics, have contributed significantly to our understanding of the lifestyles of pathogenic microbes, including important aspects of host interactions and infection (Raskin et al., 2006; Merhej et al., 2013; Yang et al., 2015). Of all the omics platforms, proteomics is perhaps the research platform that has undergone the greatest transformation, including a structural transformation, as it evolved from initial two-dimensional gel electrophoresis (2-DE gel) based workflows to sophisticated shotgun proteomics analysis, capable of generating comprehensive data with a coverage of individual proteomes that now begins to approach that of corresponding genomics/transcriptomics datasets. Modern mass spectrometry (MS)-based approaches today enable the detection and relative/absolute quantification of several thousand proteins in a single run, providing the opportunity to make a major impact on our general understanding of microbial pathogens. The goal of this Research Topic is thus to provide a survey of recent mass spectrometry-based research on the proteomics of human pathogens, which term we use broadly to refer to microorganisms that cause disease in humans, including inter alia viruses, bacteria, fungi, and protozoa.

#### Edited by:

Marc Strous, University of Calgary, Canada

#### Reviewed by:

Nadeem Omar Kaakoush, University of New South Wales, Australia

\*Correspondence: Nelson C. Soares nelson.dacruzsoare@uct.ac.za

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 26 August 2016 Accepted: 18 October 2016 Published: 04 November 2016

#### Citation:

Soares NC, Bou G and Blackburn JM (2016) Editorial: Proteomics of Microbial Human Pathogens. Front. Microbiol. 7:1742. doi: 10.3389/fmicb.2016.01742

In this Research Topic, we have particularly sought to highlight the powerful synergies that can be found between the fields of proteomics and microbiology, as exemplified by the review of Baarda and Sikora, which outlines the treasure hunt for counter-measures against an old disease (Baarda and Sikora). Their review starts by acknowledging early contributions of 2-DE gel approaches, and how this approach implicated novel proteins (such as peroxidoxin, outer membrane protein Rmp, and the 50S ribosomal proteins L7/L12) in the acquired resistance of Neisseria gonorrhoeae to spectinomycin. They then highlight how the employment of recent quantitative proteomics (SILAC, iTRAQ, and iCAT) has illuminated the pathways utilized by N. gonorrhoeae to adapt to different lifestyles, including during host interaction itself. A similar trend, suggesting that progression in the field of proteomics strongly augments classical microbiological research on microbial pathogens, is also evident in several other reviews (Ravikumar et al.; Pérez-Llarena and Bou; Soufi and Soufi), as well as opinion articles, in this Research Topic (Soares and Blackburn).

Elsewhere in this Research Topic, Pérez-Llarena and Bou describe proteomics as a tool to study bacterial virulence (Pérez-Llarena and Bou) and, following this lead, this Research Topic carries several original reports that employ proteomics-based approaches to investigate different aspects of bacterial and fungal virulence, including biofilm formation (Arnal et al.), motility (Merino et al.), virulence factors (Diehl et al.), altered virulence between strains (Peters et al.), and determinants that mediate host interaction in Candida albicans (Marin et al.). For example, Peters et al. used a combination of discovery and targeted liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomics to compare the proteomes of six clinically relevant mycobacterial strains within the M. tuberculosis complex (MTBC) (Peters et al.); MTBC members show a high degree of genetic conservation (∼99.9%), yet clinically they exhibit differing pathogenicity and virulence. Peters et al. identified an average of 3290 protein groups for each MTBC organism, corresponding to >80% coverage of the theoretical proteomes; thereafter, the authors identified quantitative differences between strains for specific proteins that could be linked to enhanced bacterial fitness in the more virulent W. Beijing lineage of M. tuberculosis (Peters et al.).

This Research Topic also includes a number of manuscripts that explore novel applications of mass spectrometry within microbial research (Hartmann et al.; Potgieter et al.; Soares and Blackburn; Soufi and Soufi), providing evidence that proteomics has now evolved into a robust platform capable of generating comprehensive datasets comparable in size and utility to those described in genomic studies and enabling the application of mass spectrometry-based proteomics to genome reannotation and refinement (Krug et al., 2011). For example, in this topic, Potgieter et al. report the results of a proteogenomic analysis of M. smegmatis, integrating mass spectrometry-based proteomics with genomic six-frame translation and ab initio gene prediction databases to identify a total of 2887 open reading frames (ORFs), including 2810 ORFs previously annotated to a reference protein and 63 ORFs not previously annotated to any reference protein (Potgieter et al.).
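As a rough illustration of the six-frame translation step mentioned above, the following Python sketch translates both strands of a DNA sequence in all three reading frames. It is not the pipeline used by Potgieter et al.; the function names and the toy sequence are hypothetical, and the standard codon assignments (which bacterial translation table 11 shares) are assumed.

```python
from itertools import product

# Standard codon assignments, built in the conventional T, C, A, G order.
# Assumption: the same amino acid assignments apply to bacterial table 11.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    # Hypothetical toy sequence, not taken from the M. smegmatis genome.
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

In a proteogenomic workflow, translated frames of this kind would typically be split at stop codons (`*`) and assembled into a search database against which MS/MS spectra are matched; that downstream step is omitted here.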

Finally, several papers in this Research Topic have described the enormous contribution of proteomics to the knowledge base regarding bacterial protein post-translational modifications (PTMs) and their intimate association with virulence (Ravikumar et al.; Calder et al.; Pérez-Llarena and Bou; Soares and Blackburn; Soufi and Soufi), and it will be interesting to see whether future investigation of bacterial PTMs provides a means to discover targets for novel therapies (Soufi and Soufi). In an Opinion article, we challenge researchers in the field to join efforts and to take advantage of recent advances in mass spectrometry instrumentation and in the development of targeted quantitative workflows that enable detection and accurate quantification of bacterial PTMs (Soares and Blackburn), since we believe that this represents a golden opportunity to access the dynamics of bacterial PTMs at sites of infection and that, through this, researchers will gain meaningful insights into the functional role of such PTMs during host-pathogen interactions.

# CONCLUSION

Overall, this Research Topic demonstrates the huge potential for modern mass spectrometry-based (phospho)proteomics to yield major breakthroughs in our understanding of host-pathogen interactions. Therefore, it seems reasonable to suppose that as proteomics-based experimentation is married ever more closely with biologically relevant models of human microbial disease, the analytical power of quantitative mass spectrometry will yield testable hypotheses about key molecular host-pathogen interactions and might identify new candidate drug or vaccine targets. However, in order to reach this Holy Grail, the burgeoning microbial proteomics field will need not only to infer but also to experimentally validate true biological significance from amongst the vast datasets generated. Tighter integration and iteration between computational modeling of proteomic data and model-driven further experimentation, including use of cell biology, imaging and CRISPR technologies, therefore seems likely to represent the future.

# AUTHOR CONTRIBUTIONS

NS and JB wrote the original draft of the paper and read and edited the final draft. GB critically discussed the content of the manuscript.

# ACKNOWLEDGMENTS

NS and JB thank the NRF for the South African Research Incentive Funding for Rated Researchers and Research Chair grant, respectively. NS thanks the South African Medical Research Council for a Fellowship. We would like to thank all authors, reviewers, editors, participants, and the Frontiers Editorial Office for their valuable contributions to this Research Topic.

# REFERENCES

interactions. Protein Cell 6, 265–274. doi: 10.1007/s13238-015-0136-6

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Soares, Bou and Blackburn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Identification of Quantitative Proteomic Differences between *Mycobacterium tuberculosis* Lineages with Altered Virulence

Julian S. Peters<sup>1†</sup>, Bridget Calder<sup>2†</sup>, Giulia Gonnelli<sup>3</sup>, Sven Degroeve<sup>3</sup>, Elinambinina Rajaonarifara<sup>2</sup>, Nicola Mulder<sup>2</sup>, Nelson C. Soares<sup>2</sup>, Lennart Martens<sup>3</sup> and Jonathan M. Blackburn<sup>2</sup>\*

<sup>1</sup> Centre of Excellence for Biomedical TB Research, Witwatersrand University, Johannesburg, South Africa, <sup>2</sup> Department of Integrative Biomedical Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa, <sup>3</sup> VIB, Ghent University, Ghent, Belgium

#### *Edited by:*

Marc Bramkamp, Ludwig-Maximilians-University Munich, Germany

#### *Reviewed by:*

Andreas Burkovski, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany Julia Elisabeth Bandow, Ruhr-Universität Bochum, Germany

#### *\*Correspondence:*

Jonathan M. Blackburn jonathan.blackburn@uct.ac.za

† These authors have contributed equally to this work.

#### *Specialty section:*

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> *Received:* 23 January 2016 *Accepted:* 12 May 2016 *Published:* 31 May 2016

#### *Citation:*

Peters JS, Calder B, Gonnelli G, Degroeve S, Rajaonarifara E, Mulder N, Soares NC, Martens L and Blackburn JM (2016) Identification of Quantitative Proteomic Differences between Mycobacterium tuberculosis Lineages with Altered Virulence. Front. Microbiol. 7:813. doi: 10.3389/fmicb.2016.00813

Evidence currently suggests that as a species Mycobacterium tuberculosis exhibits very little genomic sequence diversity. Despite limited genetic variability, members of the M. tuberculosis complex (MTBC) have been shown to exhibit vast discrepancies in phenotypic presentation in terms of virulence, elicited immune response and transmissibility. Here, we used qualitative and quantitative mass spectrometry tools to investigate the proteomes of seven clinically relevant mycobacterial strains (four M. tuberculosis strains, M. bovis, M. bovis BCG, and M. avium) that show varying degrees of pathogenicity and virulence, in an effort to rationalize the observed phenotypic differences. Following protein preparation, liquid chromatography tandem mass spectrometry (LC-MS/MS) and data capture were carried out using an LTQ Orbitrap Velos. Data analysis was carried out using a novel bioinformatics strategy, which yielded high protein coverage and was based on high confidence peptides. Through this approach, we directly identified a total of 3788 unique M. tuberculosis proteins out of a theoretical proteome of 4023 proteins and identified an average of 3290 unique proteins for each of the MTBC organisms (representing 82% of the theoretical proteomes), as well as 4250 unique M. avium proteins (80% of the theoretical proteome). Data analysis showed that all major classes of proteins are represented in every strain, but that there are significant quantitative differences between strains. Targeted selected reaction monitoring (SRM) assays were used to quantify the observed differential expression of a subset of 23 proteins identified by comparison to gene expression data as being of particular relevance to virulence. This analysis revealed differences in relative protein abundance between strains for proteins which may promote bacterial fitness in the more virulent W. Beijing strain. These differences may contribute to this strain's capacity for surviving within the host and resisting treatment, which has contributed to its rapid spread. Through this approach, we have begun to describe the proteomic portrait of a successful mycobacterial pathogen. Data are available via ProteomeXchange with identifier PXD004165.
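The coverage figures quoted above follow from simple ratios. The short Python sketch below reproduces them under one stated assumption: the 4023-protein theoretical proteome is used as the denominator for both M. tuberculosis figures, since the per-strain theoretical proteome sizes are not listed here. The function name is hypothetical.

```python
def coverage_percent(identified, theoretical):
    """Percentage of a theoretical proteome covered by identified proteins."""
    return 100.0 * identified / theoretical


# Counts taken from the abstract; the shared denominator is an assumption.
THEORETICAL_MTB_PROTEOME = 4023

total_mtb = coverage_percent(3788, THEORETICAL_MTB_PROTEOME)
average_mtbc = coverage_percent(3290, THEORETICAL_MTB_PROTEOME)

if __name__ == "__main__":
    print(f"All M. tuberculosis identifications: {total_mtb:.1f}%")
    print(f"Average per MTBC organism: {average_mtbc:.1f}%")
```

Under this assumption the pooled identifications cover roughly 94% of the theoretical proteome, and the per-organism average rounds to the 82% reported.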

Keywords: *Mycobacterium tuberculosis*, virulence, proteomics, SRM, fitness, stress response

# INTRODUCTION

Tuberculosis disease is caused by the bacterium Mycobacterium tuberculosis, and remains one of the leading causes of death by a single pathogen worldwide. Despite the presence of a vaccine and a number of antibiotics for the disease, it continues to cause about 2 million deaths and 8 million new cases worldwide per year. The emergence of multiple and extremely drug resistant strains, together with HIV co-infection, is fuelling the pandemic, especially in developing countries. Furthermore, latent and subclinical tuberculosis infection complicates tuberculosis control strategies by creating an unseen pathogenic reservoir. Even though various strains of M. tuberculosis have whole genome sequences available, the bacterium still closely guards the secrets of its success as a human pathogen.

According to whole-genome analysis, members of the M. tuberculosis complex (MTBC) exhibit a greater degree of genetic conservation (99.9%) than any other pathogenic bacteria. This strict level of observed genetic homogeneity initially led to the assumption that genetic variety amongst different strains would not be of any clinical significance (Homolka et al., 2008). However, subsequent research has led to the understanding that traits manifested by members of the MTBC are influenced by the genetic and evolutionary background of the strains (Gagneux and Small, 2007). Although thousands of strains have been identified, only a few seem to drive widespread disease outbreaks and multi-drug resistance (Bifani et al., 2002). M. tuberculosis isolates have been observed to exhibit vast discrepancies in phenotypic presentation, especially with regard to clinical outcome and epidemiological behavior (Shimono et al., 2003; Baker et al., 2004; Gagneux and Small, 2007; Nicol and Wilkinson, 2008). The East Asian/Beijing M. tuberculosis lineage is of particular interest due to its increasing prevalence in the global TB community, implying an apparent selective advantage compared to existing strains (Parwati et al., 2010). It is therefore cause for concern that modern Beijing lineages appear to be accumulating mutations which enhance pathogenicity, apparently under positive selection pressure (Merker et al., 2015). The exact mode by which increased pathogenicity is conferred in this lineage remains undetermined and is likely to be a combination of factors (Ribeiro et al., 2014); however, some proposed mechanisms include enhanced stress response, drug resistance and altered host-pathogen interactions, as has been reviewed previously (Hanekom et al., 2011; Warner et al., 2015). On the other hand, some closely related strains in the M. tuberculosis complex (MTBC) have attenuated virulence in humans (such as the vaccine strain BCG), or are not typically human pathogens and will only opportunistically infect immunocompromised humans (Desforges and Horsburgh, 1991; Wang and Behr, 2014; Halstrom et al., 2015).

Whilst genetic variation across multiple strains has been studied in depth, the clinical and epidemiological consequences of genetic differences between mycobacterial strains remain poorly understood (Malik and Godfrey-Faussett, 2005). As a consequence, it is not known whether the proteome is comparatively static between different strains of M. tuberculosis or whether quantitative differences in the expressed proteomes could contribute in some way to differential virulence. Here, we used liquid chromatography mass spectrometry (LC-MS)-based proteomics to define and compare the proteomic complement of six clinically relevant mycobacterial strains within the MTBC as well as a strain of Mycobacterium avium as an outlier. While these strains are all pathogenic in principle, the extent to which they cause disease in humans varies greatly. We therefore aimed to identify protein expression profiles that might correlate with altered virulence amongst these strains by comparing more pathogenic strains in the MTBC to less pathogenic Mycobacterium bovis, BCG and M. avium strains.

# MATERIALS AND METHODS

M. tuberculosis isolates H37Rv, W-Beijing, CAS and LAM3 were obtained from the Medical Microbiology Division of the University of Cape Town. The clinical strains representing lineages 2 (Beijing), 3 (CAS), and 4 (LAM3/F11) were isolated from pediatric patients at the Red Cross War Memorial Hospital, Cape Town. M. tuberculosis H37Rv was used in all assays as a reference strain. Phylogeny of the isolates was determined using spoligotyping and MIRU-VNTR as described in Sarkar et al. (2012) and is shown in Supplementary Table 1. The Danish strain of M. bovis BCG was used in this study. The M. avium strain was obtained from the National Health Laboratory Services (NHLS) laboratory and was verified using line probe assays. M. bovis was obtained from the Stellenbosch University Health Sciences Department in Tygerberg Hospital, Cape Town.

# Cell Culture

Cells were maintained in wholly synthetic Sauton's medium (2% glycerol, 0.4% L-asparagine, 0.2% glucose, 0.2% citric acid, 0.05% mono-potassium phosphate, 0.05% magnesium sulfate, 0.015% Tween 80, 0.005% ferric citrate, 0.00001% zinc sulfate at pH 7.4). Briefly, 190 ml of Sauton's medium was inoculated with a 10 ml starter culture (approximately 10<sup>8</sup> bacteria/ml). The flasks were sealed and incubated at 37 °C and 5% CO<sub>2</sub> with gentle agitation until OD<sub>600</sub> reached 0.9 (approximately 6 weeks).
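The inoculation step implies a starting cell density that is easy to verify. The Python sketch below works through the dilution, assuming the starter culture is mixed evenly into the fresh medium and using the approximate density of 10<sup>8</sup> bacteria/ml given above; the function name is hypothetical.

```python
def diluted_density(stock_density_per_ml, stock_volume_ml, medium_volume_ml):
    """Cell density (per ml) after adding a starter culture to fresh medium."""
    total_volume_ml = stock_volume_ml + medium_volume_ml
    return stock_density_per_ml * stock_volume_ml / total_volume_ml


# 10 ml starter culture at ~1e8 bacteria/ml added to 190 ml of medium.
starting_density = diluted_density(1e8, 10, 190)

if __name__ == "__main__":
    print(f"Starting density: {starting_density:.1e} bacteria/ml")
```

This corresponds to a 1:20 dilution, giving roughly 5 × 10<sup>6</sup> bacteria/ml in the 200 ml culture at the time of inoculation.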

# Protein Extraction

Proteins were extracted in a Biosafety level 3 facility in line with health and safety guidelines. Briefly, the cell pellet was separated from the culture filtrate by centrifugation at 4000 × g for 15 min in a bench-top centrifuge. Cell lysis was carried out by boiling the cell pellet in 1% SDS buffer (1% SDS, 100 mM Tris-HCl pH 7.6, 0.1 mM dithiothreitol (DTT), 1 mM PMSF) for 30 min. Cell debris was separated from the protein-containing supernatant by centrifugation at 10,000 × g for 15 min in a bench-top centrifuge and the supernatant was transferred into a clean tube. Protein extracts from cell lysates were concentrated and buffer exchanged to 2 M urea buffer using 3 kDa MWCO filters (Millipore). Culture filtrate proteins were concentrated and buffer exchanged into 2 M urea using 15 ml 10 kDa MWCO filters. Protein concentration was determined using the BCA assay kit (Thermo Scientific). A 10 kDa MWCO filter was used for the culture filtrate instead of a 3 kDa filter because this is the lowest filter size available for large volumes; however, according to the manufacturer's product specifications (Millipore), proteins as small as 3 kDa are still retained on 10 kDa MWCO filters.

# Protein Separation (1D SDS PAGE)

Proteins were separated according to molecular weight using an SDS PAGE gel system. The separating gels were made from 10% acrylamide: bis-acrylamide, 0.375 M Tris-HCl (pH 8.8), 7.5% SDS, 0.5% ammonium persulphate and 0.1% TEMED. The stacking gels consisted of 4% acrylamide: bis-acrylamide, 0.125 M Tris-HCl (pH 6.8), 0.1% SDS, 0.5% ammonium persulphate and 0.1% TEMED. 40 µg of each sample (culture filtrate and intracellular protein) was mixed with an equal volume of 2x sample buffer and heated at 65°C for 5 min. Electrophoresis was performed from cathode to anode at 100 V using a BioRad mini-Protean II gel system until the bromophenol blue dye reached the bottom of the gel.

# Protein Visualization

Visualization of the proteins on the gel was performed using Coomassie brilliant blue R250 for 1 h (50% methanol, 10% acetic acid and 0.1% Coomassie brilliant blue R250). Destaining of the gels was carried out by incubating on a shaker overnight at room temperature in destaining solution (10% methanol, 10% acetic acid).

# In Gel Trypsin Digestion

Each gel lane for each strain sample was divided into 5 pieces (i.e., 5 culture filtrate fractions and 5 intracellular protein fractions, hence a total of 10 fractions per strain). Each gel piece was cut into smaller cubes and washed twice with water followed by 50% (v/v) acetonitrile for 10 min. The acetonitrile was replaced with 50 mM ammonium bicarbonate and incubated for 10 min. Washes with 50 mM ammonium bicarbonate were repeated twice to remove acetonitrile. All the gel pieces were then incubated in 100% acetonitrile until they turned white, after which the gel pieces were dried in vacuo. Proteins were reduced with 10 mM DTT for 1 h at 57°C. This was followed by brief washing steps of ammonium bicarbonate followed by 50% acetonitrile before proteins were alkylated with 55 mM iodoacetamide for 1 h in the dark. Following alkylation, the gel slices were washed with ammonium bicarbonate for 10 min followed by 50% acetonitrile for 20 min, before being dried in vacuo. The proteins in the gel cubes were digested with trypsin (Promega) at 37°C overnight in a 1:50 trypsin: protein ratio. The resulting peptides were extracted twice with 70% acetonitrile in 0.1% formic acid for 30 min and then dried and stored at −20°C. Dried peptides were dissolved in 5% acetonitrile in 0.1% formic acid and 10 µl injections were made for nano-LC chromatography.

# Mass Spectrometry

All experiments were performed on a Thermo Scientific EASYnLC II coupled to an LTQ Orbitrap Velos mass spectrometer (Thermo Scientific, Bremen, Germany) equipped with a nanoelectrospray source. For liquid chromatography, separation was performed on an EASY-Column (2 cm, ID 100 µm, 5 µm, C18) pre-column followed by an EASY-Column (10 cm, ID 75 µm, 3 µm, C18) analytical column with a flow rate of 300 nl/min. The gradient used was from 5–15% B in 5 min, 15–35% B in 90 min, 35–60% B in 10 min, 60–80% B in 5 min, and kept at 80% B for 10 min. Solvent A was 100% water in 0.1% formic acid; solvent B was 100% acetonitrile in 0.1% formic acid. MS/MS data was acquired from the Orbitrap Velos in Top 20 CID mode.
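The gradient programme described above can be written down as a small table, which makes the total run time and the solvent composition at any time point easy to check. The following is a minimal sketch only: the segment values are taken from the text, but the assumption that each ramp is linear, and the names `GRADIENT`, `total_run_time` and `percent_b_at`, are illustrative and not part of the original method.

```python
# Gradient segments as (start %B, end %B, duration in minutes), from the text.
GRADIENT = [
    (5, 15, 5),
    (15, 35, 90),
    (35, 60, 10),
    (60, 80, 5),
    (80, 80, 10),  # hold at 80% B
]


def total_run_time(segments):
    """Return the total gradient length in minutes."""
    return sum(duration for _, _, duration in segments)


def percent_b_at(segments, t_min):
    """Return %B at t_min minutes, assuming each ramp is linear (an assumption)."""
    if t_min < 0:
        raise ValueError("time must be non-negative")
    elapsed = 0.0
    for start, end, duration in segments:
        if t_min <= elapsed + duration:
            fraction = (t_min - elapsed) / duration
            return start + fraction * (end - start)
        elapsed += duration
    raise ValueError("time is beyond the end of the gradient")


if __name__ == "__main__":
    print("Total run time (min):", total_run_time(GRADIENT))
    print("%B at 50 min:", percent_b_at(GRADIENT, 50))
```

Under the linear-ramp assumption the five segments add up to a 120 min gradient, most of which (90 min) is spent on the shallow 15–35% B ramp where the bulk of tryptic peptides would be expected to elute.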

# Post MS Data Analysis

Raw data was captured from the mass spectrometer and converted to MS2 files using MakeMS2 software (Thermo Scientific). The data was then analyzed using Crux (McIlwain et al., 2014) and Mascot (Cottrell and London, 1999), and the output of MS2PIP (Degroeve and Martens, 2013) was used to provide additional features for the Percolator algorithm. Spectra were obtained from each fraction of the gel (5 fractions per gel lane, i.e., 10 fractions per strain) and were viewed using Peaks v5.3. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (Vizcaíno et al., 2016) partner repository with the dataset identifier PXD004165.

# Protein Preparation for SRM-MS

Proteins were extracted in the BSL3 facility in line with health and safety guidelines as described in the Protein Extraction section. Protein concentration was determined using a BCA assay according to the manufacturer's protocol (#23227, Thermo Fisher Scientific). Protein preparation was performed using a filter aided sample preparation (FASP) method. Briefly, 200 µg of each protein sample was placed into a 10 kDa molecular weight cut-off (MWCO) filter (Millipore). Protein cysteine residues were alkylated in the dark for 30 min in 10 mM iodoacetamide. Iodoacetamide was then removed by centrifugation at 14,000 × g for 15 min. Buffer exchange was performed twice with 8 M urea in 0.1 mM Tris-HCl pH 8.5 by centrifugation at 14,000 × g for 15 min in a refrigerated bench-top centrifuge at 18°C. The urea buffer was then exchanged for 0.05 M ammonium bicarbonate by centrifugation. Sequencing-grade modified trypsin (#608-274-4330, Promega) was added at a ratio of 1:100 enzyme: substrate and incubated overnight at 37°C in a wet chamber. The peptides were finally collected through the filter into a clean collection tube by centrifugation at 14,000 × g for 10 min.

To stop the tryptic digest, the pH was lowered to 2 using 50% trifluoroacetic acid (TFA) followed by an incubation for 15 min at 37°C with shaking at 500 rpm. The peptide solution was desalted with C18 reversed-phase columns (Pierce #89870-25) according to the manufacturer's instructions. Briefly, the C18 columns were activated with 50% methanol, followed by equilibration with 5% ACN: 0.5% TFA. After loading the sample, the columns were washed 3 times with 5% ACN: 0.5% TFA. Finally, peptides were eluted with 70% ACN, dried under vacuum, and re-solubilized in 0.1% FA to a final concentration of 8 µg/µl.

# SRM Mass Spectrometry

All SRM experiments were performed on a TSQ Vantage triple quadrupole mass spectrometer (Thermo Fisher Scientific) equipped with a heated electrospray II ion source. For liquid chromatography and separation of peptides, a Synergi 4 µm Hydro RP 150 × 4.60 mm 80 Å pore size C18 column (serial #630710-14) was used with a column flow rate of 300 µl/min. The gradient used was from 5–15% B in 5 min, 15–35% B in 90 min, 35–60% B in 10 min, 60–80% B in 5 min and kept at 80% B for 10 min. Solvent A was 100% water in 0.1% formic acid, and solvent B was 100% acetonitrile in 0.1% formic acid.

The mass spectrometer was operated in positive mode using electrospray ionization with a voltage of 3500 V. The capillary temperature was set to 350°C and the collision gas pressure to 1.2 mTorr. Up to 336 transitions per run were acquired with a cycle time of 3 s and a dwell time of at least 20 ms. Collision energies were calculated per individual peptide transition ion using Skyline software and further optimized by a series of energy ramping experimental steps (10 steps of 5 V) to obtain the optimum energy of each transition. MS/MS data was acquired in CID mode. Raw data was captured from the mass spectrometer and analyzed using Skyline software.
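The relationship between cycle time and dwell time quoted above can be checked with simple arithmetic. The sketch below assumes, as a simplification not stated in the text, that the cycle time is spent entirely on dwell time (no inter-transition overhead); the function name `max_concurrent_transitions` is illustrative.

```python
def max_concurrent_transitions(cycle_time_s, min_dwell_ms):
    """Upper bound on transitions monitored in one cycle.

    Assumes the whole cycle is spent dwelling on transitions, which
    ignores any instrument overhead between transitions.
    """
    if cycle_time_s <= 0 or min_dwell_ms <= 0:
        raise ValueError("cycle time and dwell time must be positive")
    cycle_ms = round(cycle_time_s * 1000)
    return cycle_ms // min_dwell_ms


if __name__ == "__main__":
    # Values from the text: 3 s cycle time, dwell time of at least 20 ms.
    print(max_concurrent_transitions(3, 20))
```

Under this assumption no more than 150 transitions fit into a single 3 s cycle at a 20 ms dwell time, so acquiring up to 336 transitions per run would require that not all transitions are monitored at the same moment (for example by restricting each peptide to a retention time window). The text does not state the acquisition scheme explicitly, so this is an inference rather than a description of the method.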

The proteins chosen for SRM analysis were selected based on their relevance to pathogenicity and/or virulence as reported in the literature (Supplementary Table 2). To design SRM assays, peptides were chosen for analysis based on their prior observation in our discovery experiments. Although each protein was typically observed by two or more peptides in the discovery experiment, the two best performing peptides were chosen to confirm each protein in the SRM assay, with the exception of 2 proteins (Rv1818c and Rv0833) (Supplementary Table 3). Due to the small size and non-redundant nature of the M. tuberculosis proteome, 3 ion transitions were set as the minimum number required to identify a peptide in an SRM assay.

# SRM Data Analysis

Data analysis was carried out using Skyline software (MacCoss Lab software). Intra-assay variability (CV) for each peptide was based on the calculated average protein concentration for a set of technical duplicate injections of each sample. Inter-assay CV was calculated for each peptide from across three biological replicates of the 7 strains. Quantitation of each peptide was carried out using the area under the curve for the peptide transitions assayed in Skyline. Retention times for the peptide standards were obtained by pre-assessment on the MS. For peptides without standards, the retention times were obtained from predictions made by Skyline software and gated at 5 s from the predicted retention time.
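The coefficient of variation used for the intra- and inter-assay comparisons is the standard deviation of replicate measurements divided by their mean. The following is a minimal sketch; it assumes the sample (n − 1) standard deviation, which the text does not specify, and the function name and replicate values are illustrative.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation / mean * 100."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    average = mean(values)
    if average == 0:
        raise ValueError("mean is zero; CV is undefined")
    return stdev(values) / average * 100.0


if __name__ == "__main__":
    # Hypothetical peak areas for one peptide across replicate injections.
    replicate_areas = [90.0, 110.0]
    print(f"CV = {coefficient_of_variation(replicate_areas):.1f}%")
```

The same function can be applied to technical replicates of one sample (intra-assay) or to per-replicate averages across biological replicates (inter-assay).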

# Protein Inference

The protein databases used to generate theoretical spectra were strain specific individual non-redundant fasta files obtained from Ensembl (www.ensembl.org), with the exception of LAM for which there is no Ensembl annotation, and W-Beijing whose annotated file is not sufficient for downstream cross strain comparison. For LAM, the UniProt fasta file was used (www.uniprot.org) and for W-Beijing, the H37Rv Ensembl fasta file was used. The parameters were standard across both searches and included carbamidomethylation of cysteine residues as a fixed modification and oxidation of methionine residues as a variable modification. Two missed cleavages were allowed and peptide mass tolerance was set at 10 ppm whilst fragment mass tolerance was set to 0.5 Da. Decoy databases were used for FDR analysis and a cut-off was set at 5% for protein identifications.
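A precursor tolerance expressed in ppm scales with the mass being matched, unlike the fixed 0.5 Da fragment tolerance. The sketch below illustrates the 10 ppm window quoted above; the function names and example masses are illustrative and are not taken from either search engine.

```python
def ppm_window(theoretical_mass, tolerance_ppm=10.0):
    """Return the (low, high) mass window for a given ppm tolerance."""
    delta = theoretical_mass * tolerance_ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


def within_tolerance(observed_mass, theoretical_mass, tolerance_ppm=10.0):
    """Return True if the observed mass lies within the ppm tolerance."""
    error_ppm = (observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return abs(error_ppm) <= tolerance_ppm


if __name__ == "__main__":
    low, high = ppm_window(1000.0)
    print(f"10 ppm window at 1000 Da: {low:.4f} to {high:.4f}")
    print(within_tolerance(1000.005, 1000.0))  # 5 ppm error
    print(within_tolerance(1000.02, 1000.0))   # 20 ppm error
```

At 1000 Da a 10 ppm tolerance corresponds to ±0.01 Da, which is why the precursor tolerance is far tighter than the 0.5 Da allowed for ion trap fragment masses.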

The PSMs obtained from the search were used to predict the expressed protein repertoire of each sample. MS/MS spectra from the 10 fractions per strain were searched with each individual search engine. Combining results from multiple search engines yields higher protein identifications (Shteynberg et al., 2013), and therefore all the proteins identified from each individual search engine were combined and redundancy was removed to give one complete non-redundant dataset per study organism.
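Combining the identifications from several search engines into one non-redundant list amounts to taking the union of the per-engine protein sets. The sketch below shows the idea; the function name and the example identifiers are hypothetical and do not reproduce the authors' pipeline.

```python
def combine_identifications(results_by_engine):
    """Return the non-redundant set of proteins identified by any engine."""
    combined = set()
    for proteins in results_by_engine.values():
        combined.update(proteins)
    return combined


if __name__ == "__main__":
    # Hypothetical protein IDs that passed the FDR cut-off for each engine.
    results = {
        "Crux": ["Rv0001", "Rv0002"],
        "Mascot": ["Rv0002", "Rv0003"],
        "Mascot+MS2PIP+Percolator": ["Rv0003", "Rv0004"],
    }
    combined = combine_identifications(results)
    print(len(combined), sorted(combined))
```

A protein reported by more than one engine is counted once, so the combined list can only be as large as or larger than the largest single-engine list.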

# RESULTS

Total protein extracts from the discovery MS approach were quantified using BCA assay with concentrations ranging between 2.5–15 µg/µl for total cellular proteins and 10–40 µg/µl for culture filtrate proteins. Each tryptically digested sample was analyzed on the Orbitrap Velos to produce an LC-MS and MS/MS dataset. To assess the efficiency of the tryptic digest a descriptive analysis software package in Protein Pilot was used and the results are summarized in Supplementary Table 4.

As shown in Supplementary Figure 1A, the MS1 scan of H37Rv confirms a successful tryptic digest, with the total ion chromatogram showing a steady elution of peptides across the LC gradient. The 2D MS chromatogram (Supplementary Figure 1B) demonstrates the complexity of the sample, showing a significant number of discrete tryptic peptides eluting at the marked time point indicated on the 1D chromatogram (∼2600 s).

For the analysis of the MS2 spectra a software pipeline was implemented that combines the results of different peptide identification strategies. At the core of our pipeline is the semi-supervised learning algorithm implemented in Percolator (Brosch et al., 2009) that has been shown to obtain high identification sensitivity. The first tool in the pipeline is Crux (McIlwain et al., 2014) which is a reimplementation of the popular tool Sequest with added post-processing by the Percolator algorithm. The second tool is Mascot (Cottrell and London, 1999) for which again Percolator was used to post-process the MOWSE identification scores<sup>1</sup>.

The Percolator tool allows for adding new features that can be exploited by the semi-supervised learning algorithm to further increase peptide identification sensitivity. It has been shown that adding features obtained from MS2 peak intensity predictions can significantly increase sensitivity (Sun et al., 2007). Therefore, we employed the MS2PIP (Degroeve and Martens, 2013) tool to predict the b- and y-ion peak intensities for all peptides suggested by Mascot (top ranked peptide for each MS2 spectrum). We then computed several features from the difference between the predicted and the observed MS2 peak intensities, such as the Pearson correlation. We observed that adding these features to the Percolator algorithm for Mascot did indeed increase identification sensitivity significantly.
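One of the features mentioned above, the Pearson correlation between predicted and observed fragment ion intensities, can be sketched as follows. This is an illustration of the feature only: the intensity values are hypothetical, and the code does not call MS2PIP or Percolator.

```python
from math import sqrt


def pearson_correlation(predicted, observed):
    """Return the Pearson correlation between two equal-length intensity lists."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    covariance = sum((p - mean_p) * (o - mean_o)
                     for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant intensities")
    return covariance / sqrt(var_p * var_o)


if __name__ == "__main__":
    # Hypothetical normalized b- and y-ion intensities for one candidate peptide.
    predicted = [0.10, 0.40, 0.80, 0.30]
    observed = [0.12, 0.35, 0.90, 0.25]
    print(round(pearson_correlation(predicted, observed), 3))
```

A candidate peptide whose predicted fragmentation pattern closely matches the observed spectrum gives a correlation near 1, which provides the learning algorithm with evidence that is independent of the search engine score.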

<sup>1</sup>http://www.sanger.ac.uk/science/tools/mascotpercolator

Comparison of protein numbers obtained at 1% and 5% FDR showed that the use of 5% FDR allows a substantial increase in the absolute number of true positives with an insignificant increase in the absolute number of false positives, hence providing an apparently favorable trade-off in true positives over false positives (**Figure 1**). The final proteome obtained from each strain using all algorithms represented a high proportion of the theoretical protein fasta files for each strain, as shown in **Table 1**.
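The trade-off between 1% and 5% FDR can be illustrated with a simple target-decoy estimate. The sketch below assumes that FDR is estimated as the ratio of decoy to target hits above a score threshold; this is a common simplification and is not necessarily the calculation performed by Percolator in the authors' pipeline. The function name and the scores are hypothetical.

```python
def targets_accepted_at_fdr(scored_hits, max_fdr=0.05):
    """Count target hits accepted at the most permissive score threshold
    whose estimated FDR (decoys / targets) does not exceed max_fdr.

    scored_hits is a list of (score, is_decoy) pairs; higher scores are better.
    """
    ranked = sorted(scored_hits, key=lambda hit: hit[0], reverse=True)
    targets = 0
    decoys = 0
    accepted = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            accepted = targets
    return accepted


if __name__ == "__main__":
    # Hypothetical hits: 20 high-scoring targets, one decoy, 5 lower targets.
    hits = [(100 - i, False) for i in range(20)]
    hits.append((50, True))
    hits += [(40 - i, False) for i in range(5)]
    print("Accepted at 1% FDR:", targets_accepted_at_fdr(hits, 0.01))
    print("Accepted at 5% FDR:", targets_accepted_at_fdr(hits, 0.05))
```

In this toy example relaxing the threshold from 1% to 5% admits five additional target hits at the cost of one decoy, which mirrors the kind of trade-off described above.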

FIGURE 1 | (A) The contribution of each of the search engines to the total number of non-redundant proteins obtained per strain at 5% FDR. Mascot is shown in blue bars, Crux is shown in red bars, Mascot+MS2PIP+Percolator is shown in green bars and the total non-redundant library is shown in purple bars. (B) The comparison between 1 and 5% FDR across all strains. This illustrates the total number of proteins obtained for each strain at each FDR, and the proportion of true and false positives in that FDR bracket.

TABLE 1 | Total non-redundant number of proteins obtained in the experiment compared to the total number of proteins in the theoretical fasta file.


# Data Alignment for Downstream Comparison

To carry out an effective cross strain comparison, it was crucial to ensure that as much of the theoretical proteome as possible was observed by discovery MS. After obtaining a non-redundant dataset for each strain using strain specific databases, it became necessary to convert all the protein IDs into a standard protein ID by orthology mapping to allow effective cross strain comparison. To achieve this, the total non-redundant IDs from each strain obtained by searching against its individual Ensembl fasta file were then mapped back to the Ensembl H37Rv protein IDs. These were all in turn mapped to UniProt accession numbers and Tuberculist "Rv" loci numbers to facilitate downstream analysis with tools such as GO analysis and pathway mapping. It was observed that there is a slight deficiency in ortholog mapping data between databases (Ensembl and UniProt), as well as shortcomings in ortholog mapping between strains. These discrepancies lead to a minor loss of information, as represented in **Figure 2**. The discrepancy in ortholog mapping was much more pronounced when mapping protein IDs to H37Rv from the more distantly related M. avium, which lies outside the MTBC group; this resulted in loss of approximately 50% of the biological information in downstream comparisons to M. avium. Other strains showed relatively small losses in the number of experimentally observed proteins mapped to H37Rv orthologs, for instance the LAM strain had approximately 600 proteins with no orthologs in H37Rv which were therefore not included in the cross-species comparison.
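The ID conversion described above can be pictured as a dictionary lookup in which proteins without an ortholog in the reference are set aside. The sketch below is illustrative only: the function name, the mapping table and the identifiers are hypothetical and do not reflect the actual Ensembl or UniProt files.

```python
def map_to_reference(protein_ids, ortholog_map):
    """Split protein IDs into those with a reference ortholog and those without.

    Returns (mapped, unmapped), where mapped is a dict from the strain-specific
    ID to the reference ID and unmapped is a list of IDs with no ortholog.
    """
    mapped = {}
    unmapped = []
    for protein_id in protein_ids:
        reference_id = ortholog_map.get(protein_id)
        if reference_id is None:
            unmapped.append(protein_id)
        else:
            mapped[protein_id] = reference_id
    return mapped, unmapped


if __name__ == "__main__":
    # Hypothetical strain-specific IDs and a hypothetical ortholog table.
    observed = ["STRAIN_0001", "STRAIN_0002", "STRAIN_0003"]
    orthologs = {"STRAIN_0001": "Rv0001", "STRAIN_0003": "Rv0005"}
    mapped, unmapped = map_to_reference(observed, orthologs)
    loss = len(unmapped) / len(observed)
    print(mapped, unmapped, f"{loss:.0%} lost")
```

Tracking the unmapped list explicitly makes the information loss per strain visible, which is the quantity summarized in **Figure 2**.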

# Qualitative Cross Species Comparison

With congruent IDs, strains were cross-compared to obtain a comprehensive qualitative comparison as summarized in **Figure 3** using the Venn diagram tool Venny [267]. Protein IDs from the 4 Mycobacterium tuberculosis strains were compared (**Figure 3A**) and, as expected, the majority of observed proteins were found to be shared amongst all M. tuberculosis strains, with less than 5% being strain specific. A total of 1938 proteins were shared among the 4 M. tuberculosis strains, perhaps representing an M. tuberculosis core proteome. A second diagram was generated comparing the collective, non-redundant proteomes of the four M. tuberculosis strains to those of M. bovis, BCG, and M. avium (**Figure 3B**). Surprisingly, M. avium had many proteins in common with the MTBC strains, with only 12 proteins apparently unique to M. avium, though this may simply reflect the deficiencies in ortholog mapping between more distantly related organisms. The four M. tuberculosis strains also share 989 common proteins with M. bovis and BCG which they do not share with M. avium. A group of 168 proteins was observed uniquely in the M. tuberculosis strains and not found in M. bovis, BCG or M. avium; these proteins were therefore earmarked as candidate virulence factors to be further explored.
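The Venn-style comparisons above reduce to set operations once all proteins share a common identifier. The sketch below shows how a core proteome (intersection) and a group-specific set (difference) could be computed; the function names and the protein IDs are hypothetical, and this is not the Venny tool itself.

```python
def core_proteome(proteomes):
    """Return the proteins observed in every strain."""
    sets = [set(proteins) for proteins in proteomes.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


def unique_to(group, others):
    """Return the proteins in group that are absent from all other proteomes."""
    return set(group) - set().union(*others)


if __name__ == "__main__":
    # Hypothetical, already ortholog-mapped protein IDs per strain.
    mtb = {
        "H37Rv": {"Rv0001", "Rv0002", "Rv0003"},
        "Beijing": {"Rv0001", "Rv0002", "Rv0004"},
        "CAS": {"Rv0001", "Rv0002"},
        "LAM": {"Rv0001", "Rv0002", "Rv0003"},
    }
    non_mtb = [{"Rv0001", "Rv0003"}, {"Rv0001"}, {"Rv0002"}]
    print(sorted(core_proteome(mtb)))
    collective = set().union(*mtb.values())
    print(sorted(unique_to(collective, non_mtb)))
```

In the study's terms, the first function corresponds to the shared M. tuberculosis proteins in **Figure 3A** and the second to the proteins observed only in the M. tuberculosis strains in **Figure 3B**.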

# Protein Expression Profiling

We subsequently sought to quantitatively assess a subset of the candidate virulence factors identified by discovery MS and cross-strain comparison using selected reaction monitoring (SRM), a sensitive, reproducible and quantitative MS technique. Since the design of robust SRM assays can be a lengthy process and the capacity to highly multiplex hundreds of SRM assays remains challenging, we devised a strategy to create a short list of candidate proteins with possible relevance in differential clinical phenotypes observed between the M. tuberculosis isolates for subsequent quantitative proteomic analysis. To do this, we compared our proteomic data on each of the 168 proteins (Supplementary Table 5) that were observed only in the M. tuberculosis strains with 771 gene expression data sets contained in the TBDB, representing varying in vitro models of TB disease. We focussed attention in particular on 7 categories of experiment from the TBDB that aimed to more closely reflect in vivo conditions (e.g., starvation models, macrophage infection models, hypoxic models, etc.), the logic being that consistent over-expression of a protein in one of those categories might plausibly confer a selective advantage to the bacterium in vivo; the categories chosen are listed in **Table 2**.

For each of the 168 proteins, we assessed whether they were significantly over- or under-expressed in each of the 7 chosen categories of gene expression models and carried out statistical analysis on the gene expression values obtained from the TBDB experiments using packages in R (strategy depicted in **Figure 4**). Proteins from the 168 protein set whose gene expression showed an average fold change of ≥2 SD from the mean across all datasets in an individual condition were taken as significantly differentially expressed in that condition. Proteins that had significant fold change for fewer than 4 out of the 7 categories were removed from this list, resulting in a final shortlist of 23 proteins, summarized in **Table 3**. In order to create SRM assays for the shortlisted proteins, the M. tuberculosis proteome library (Schubert et al., 2013) was consulted and validated SRM assays for the proteins of interest were extracted.
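The two filtering rules described above (a fold change at least 2 SD from the mean within a category, and significance in at least 4 of the 7 categories) can be sketched as follows. The authors carried out this analysis in R; the Python below is an illustration under stated assumptions, namely that the mean and sample standard deviation are taken over all genes in a category, and all names and values are hypothetical.

```python
from statistics import mean, stdev


def significant_in_category(fold_changes, gene, n_sd=2.0):
    """Return True if the gene's fold change is >= n_sd SDs from the category mean."""
    values = list(fold_changes.values())
    spread = stdev(values)
    if spread == 0:
        return False  # no variation in this category, nothing stands out
    return abs(fold_changes[gene] - mean(values)) >= n_sd * spread


def shortlist(candidates, categories, min_categories=4, n_sd=2.0):
    """Keep candidates that are significant in at least min_categories categories."""
    kept = []
    for gene in candidates:
        hits = sum(
            1
            for fold_changes in categories.values()
            if gene in fold_changes
            and significant_in_category(fold_changes, gene, n_sd)
        )
        if hits >= min_categories:
            kept.append(gene)
    return kept


if __name__ == "__main__":
    # Hypothetical average fold changes: 20 unchanged background genes plus
    # two candidates, across 7 hypothetical experiment categories.
    background = {f"bg{i}": 0.0 for i in range(20)}
    categories = {}
    for i in range(7):
        fold_changes = dict(background)
        fold_changes["geneA"] = 10.0 if i < 5 else 0.0
        fold_changes["geneB"] = 10.0 if i < 3 else 0.0
        categories[f"category{i}"] = fold_changes
    print(shortlist(["geneA", "geneB"], categories))
```

In this toy data set `geneA` is an outlier in five categories and is kept, whereas `geneB` is an outlier in only three and is removed, mirroring the reduction from 168 candidates to a shortlist.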

TABLE 2 | The number of significantly differentially expressed proteins from the 168 proteins unique to human clinical strains segregated into over- or under-expressed per category assessed (www.tbdb.org).

A minimum of 2 peptides per protein and 3 transitions per peptide were assessed by SRM for the shortlisted proteins in all seven strains. Intra-assay and inter-assay coefficients of variation were determined for each individual peptide. Intra-assay variability was based on 3 technical replicates per strain whilst inter-assay variability was assessed based on 2 biological replicates. The signal of each peptide observed was obtained by summing the peak areas of each measured transition for that peptide and then normalizing by the total number of cells per strain at the point of protein extraction. An ANOVA test was used to determine if there was a significant difference in the expression of each of the peptides representing each protein. The single factor ANOVA results (**Table 3**) show that of the 23 peptides assessed, 18 had a significant difference in expression between the 7 strains. The proteins were then broadly classified into 4 groups denoting some aspect of the organisms' success in the host (Supplementary Figure 2) in order to aid further interpretation of the data.
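The peptide signal calculation and the single factor ANOVA described above can be sketched in a few lines. The code below computes the normalized signal and the F statistic only; obtaining a p-value would additionally require the F distribution from a statistics package. Function names and all numbers are hypothetical.

```python
def peptide_signal(transition_areas, cell_count):
    """Sum the transition peak areas and normalize by the number of cells."""
    if cell_count <= 0:
        raise ValueError("cell count must be positive")
    return sum(transition_areas) / cell_count


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(group) for group in groups) / n_total
    group_means = [sum(group) / len(group) for group in groups]
    ss_between = sum(
        len(group) * (m - grand_mean) ** 2
        for group, m in zip(groups, group_means)
    )
    ss_within = sum(
        (x - m) ** 2 for group, m in zip(groups, group_means) for x in group
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


if __name__ == "__main__":
    # Hypothetical normalized signals for one peptide in three strains,
    # three replicates each.
    strains = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print("F =", one_way_anova_f(strains))
```

A large F statistic indicates that the variation between strains is large relative to the variation between replicates of the same strain, which is the basis for calling a peptide differentially expressed.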

The four proteins assayed with roles in drug response are PyrB, PyrC, CarA, and CarB, and all form part of the pyrimidine biosynthetic operon in M. tuberculosis. All with the exception of carB (Rv1384) are more abundant in M. avium compared to the other strains, but within the MTBC the Beijing strain has the highest expression of PyrB, PyrC, and CarA.

Amongst proteins which are known to modulate the host immune response, the uncharacterized hypothetical protein Rv0966 is highly expressed in the LAM strain compared to the other strains. In this functional category, Rv2136c, Rv1002, Rv2703, and Rv2108 are more abundant in the Beijing strain. Rv1818c appears to be more abundant in the CAS strain while in the other strains it is present at comparatively low amounts.

Amongst proteins responsible for the growth of M. tuberculosis in the host, a possible toxin with unknown function, VapC2 (Rv0301), has the highest relative expression in the Beijing strain. PE\_PGRS13 (Rv0833), MetC (Rv3340), and conserved hypothetical protein Rv3412 are all more abundant in the LAM strain than in BCG or M. avium, although similarly highly abundant in H37Rv and M. bovis. PPE65 (Rv3621c) appears to be upregulated in both LAM and M. avium strains.

In terms of adaptation to stress, the Acyl-CoA dehydrogenase MbtN (Rv1346) is particularly abundant in the Beijing strain. The protein Rv0901 is apparently less abundant in M. bovis and BCG strains whereas it is relatively abundant in the other strains, particularly so in H37Rv and M. avium. Aspartate kinase (Rv3709c) is more abundant in H37Rv and LAM strains comparatively, and almost entirely absent in M. avium and M. bovis.

The contribution to total signal detected for each protein as measured by SRM in each of the 7 strains is represented in **Figure 5**. This analysis demonstrates that proteins which were detected in a particular strain in the discovery experiment, e.g., Rv0301 and Rv1002c which were observed only in the Beijing strain, tend to contribute the highest signal when measured by SRM. This appears to be clearly the case for 14 of the 23 proteins assessed (Rv0301, Rv1381, Rv1383, Rv1002c, Rv3709c, Rv1346, Rv3412, Rv2108, Rv3621, Rv0833, Rv0966c, Rv1818c, Rv2136c, and Rv3340).

# DISCUSSION

Although genetically similar, different strains of M. tuberculosis present very different clinical phenotypes in terms of virulence. We therefore postulated that there may be a proteomic mechanism underpinning the differences in pathogenicity observed between strains of M. tuberculosis and we explored this possibility using discovery and targeted mass spectrometry techniques. In order to reveal candidate proteins that might be involved in differential pathogenicity, we aimed to compare proteomes between individual pathogenic and non-pathogenic mycobacteria, noting that both BCG and M. avium can cause TB-like disease in immune compromised individuals, suggesting that their pathogenicity is attenuated, not lost entirely. To underpin our intended cross-strain and cross-species proteomic comparisons, we first carried out an exhaustive mass spectrometry-based discovery proteomics analysis of the 7 mycobacterial strains. Through use of a sophisticated bioinformatic strategy, combining data from multiple search engines, we obtained >80% coverage at the protein level for 6 out of the 7 theoretical proteomes. Combining data across the 4 M. tuberculosis strains, we identified a total of 3788 M. tuberculosis proteins with high confidence, meaning that we failed to observe only 235 out of the predicted 4023 proteins in the M. tuberculosis proteome. To our knowledge, this level of discovery proteomic coverage across multiple M. tuberculosis strains is unprecedented, although we note that recently reported SWATH-based analyses on M. tuberculosis H37Rv have come close to this figure (Schubert et al., 2015). Surprisingly, although the 23 proteins subsequently quantified by SRM were initially observed only in the M. tuberculosis strains by discovery MS analysis, they were in fact all identified in all 7 strains by SRM analysis, albeit with relative quantifications that correlated to a large degree with our discovery data. Furthermore, SRM analysis of 45 predicted M. tuberculosis proteins that had not been observed in our discovery MS analysis of any of the 7 mycobacterial strains revealed that half were in fact expressed in each of the M. tuberculosis strains (data not shown), presumably reflecting low absolute expression levels for those proteins, below the detection limit for discovery MS. Taken together, our data suggests that the total expressed complement of proteins is remarkably similar in the different clinical strains of M. tuberculosis and moreover that virtually the entire M. tuberculosis proteome is expressed in all strains, at least under optimal in vitro conditions. However, our data also clearly demonstrates that significant, quantitative differences in expression levels exist between strains which may directly influence the phenotype of these strains. While it is possible that protein quantity does not track with the enzyme activity of that protein in a cell, due to allosteric effects or post-translational modifications, enzyme activity was not measured in our study.

Cross-strain and cross-species comparisons of our semiquantitative discovery proteomics data were limited only by the relatively poor ortholog mapping found between M. avium and M. tuberculosis H37Rv and enabled identification of 168 proteins that were originally observed only in the M. tuberculosis strains and were therefore considered candidates that might contribute to the differential virulence of these mycobacterial strains. However, we were conscious that our proteomic data had been generated under one set of culture conditions that were likely far removed from the true host environment in TB disease. We therefore cross-correlated our proteomic data with 771 gene expression models deposited in the TBDB, covering a wide range of different in vitro culture conditions and exogenous stresses on M. tuberculosis that can be thought of as mimicking various aspects of the host environment (e.g., macrophage infection; hypoxia; starvation; etc.). The genes for 23 of the 168 proteins were found to be significantly up- or down-regulated in macrophage-based and related in vitro gene expression models of TB disease and we therefore carried out quantitative analysis of their protein expression in vitro across the 7 strains, focussing particularly on differential expression in the Beijing and LAM lineages that are known to have particularly virulent clinical phenotypes (Pillay and Sturm, 2007; Cowley et al., 2008).

One of M. tuberculosis's many features as a pathogen is its ability to evade the host innate and acquired immune responses such that it is capable of attaining latency and can potentially remain relatively quiescent in alveolar macrophages for decades (Flynn and Chan, 2003). Here, we identified four proteins which modulate the host immune response which are significantly more abundant in the Beijing strain compared to other strains and may therefore have functional significance in conferring virulence in M. tuberculosis: Rv2136c, Rv1002c, Rv2703, and Rv2108. Mutant M. tuberculosis with insertionally inactivated Rv2136c, a known virulence factor of the MTBC (Forrellad et al., 2013), has severe hypersensitivity to acid and a number of other stresses (Vandal et al., 2009). Rv2136c (uppP) is an undecaprenyl pyrophosphate phosphatase which recycles undecaprenyl pyrophosphate back to undecaprenyl phosphate so that it can act again as a receptor for the UDP-MurNAc-pentapeptide to make C55-PP-MurNAc-pentapeptide (lipid 1). The antibiotic bacitracin inhibits peptidoglycan synthesis by sequestering undecaprenyl diphosphate, thereby reducing the pool of lipid carrier available, whilst increased expression of uppP provides resistance; by extension, Rv2136c might therefore be involved in virulence by speeding up the recycling of key lipid intermediates and hence cell wall biosynthesis, thus conferring a selectable advantage on the virulent Beijing strains. Similarly, sigma factor A (sigA), Rv2703, is the primary sigma factor in this bacterium and is essential for growth. Increased initiation of transcription, and thus RNA processing capacity, may therefore be another mechanism by which this strain has achieved hypervirulence, perhaps coupled to the observed increased expression of several proteins involved in pyrimidine biosynthesis (PyrB, PyrC, CarA). Although the function of Rv1002c is unknown, it is essential for growth in H37Rv (Sassetti et al., 2003), whereas the PPE family protein PPE36 (Rv2108) has no known function and is nonessential for growth in H37Rv. Both of these proteins therefore represent attractive targets for further investigation.

Once inside the host macrophage, M. tuberculosis becomes dependent on the intracellular environment for sources of carbon (McKinney et al., 2000; Eisenreich et al., 2010) and iron. The acyl-CoA dehydrogenase, MbtN (Rv1346), is involved in the production of mycobactins which are thought to be vital for the acquisition of iron within the macrophage and are therefore considered to be virulence factors (De Voss et al., 2000). It is notable therefore that MbtN protein was significantly more abundant in the Beijing strain, suggesting that this strain's capacity to acquire iron intracellularly may be superior. The capacity to produce essential amino acids in the host may also provide a selective advantage in terms of virulence, as demonstrated by the increased relative abundance of the O-acetylhomoserine sulfhydrylase MetC (Rv3340) and aspartate kinase Ask (Rv3709c) in the LAM strain.

Another important feature of intracellular M. tuberculosis in the persistent phase is the toxin/antitoxin system, of which M. tuberculosis has a remarkable 79 encoded loci (Sala et al., 2014). Here we found that the possible toxin VapC2 (Rv0301) was significantly more abundant in the Beijing strain than other clinical strains. VapC has been reported to suppress translation by hydrolysis of mRNA, so its increased expression in the Beijing strain may represent an evolutionary advantage by providing an efficient means to erase previous transcriptional profiles, thus allowing M. tuberculosis to rapidly reprogram the proteome and hence change the metabolic state of the cell in response to rapidly changing external stresses during the bacterium's host-based lifecycle.

Finally, proteins with unknown function are clear targets for further investigation, for example the PE\_PGRS and PPE family proteins, which are known virulence factors yet currently have no functional categorization. Both PE\_PGRS13 (Rv0833) and PPE65 (Rv3621c) as well as conserved hypothetical proteins Rv0966 and Rv3412 are abundant in the LAM strain. Interestingly, Rv0901, a possible exported or membrane protein with unknown function, is abundant in all measured strains except for M. bovis and BCG, both of which have attenuated pathogenicity. The loss of this protein could therefore alter the pathogenicity of the bacterium, indicating that this protein is a potential therapeutic target.

An important caveat for any quantitative in vitro study on M. tuberculosis that aims to correlate gene or protein expression levels with in vivo clinical phenotypes is that significant differences in expression observed under specific in vitro conditions may not accurately reflect the situation at the site of disease due to altered environmental influences on expression. This is further compounded by the fact that it is not currently technically possible to isolate sufficient M. tuberculosis bacilli from the site of disease in a human lung for a discovery proteomics experiment and by the fact that the clinical definitions of "virulence" and "pathogenicity" are themselves largely qualitative. In order to mitigate this caveat, we therefore correlated our quantitative proteomic data with over 700 gene expression models of TB disease (themselves acquired under a number of different in vitro conditions that each mimic in some way aspects of the stress likely to be experienced by M. tuberculosis at the site of disease) in order to provide a logical means to infer biological significance from in vitro data with greater confidence. A testable prediction from the in vitro quantitative proteomic data presented here is thus that the observed differential expression of specific mycobacterial proteins across these 7 strains when cultured under a common set of environmental conditions will affect clinical phenotype in vivo. However, the true role of these proteins in virulence will need to be validated in due course by targeted analysis of limiting numbers of M. tuberculosis bacilli isolated from the site of disease.

# CONCLUSION

Through our combined discovery- and quantitative proteomic analysis of differential protein expression in 7 mycobacterial strains of varying pathogenicity and virulence, we have uncovered previously unknown, statistically significant quantitative differences in the expression of numerous proteins which begin to shed new light on differential virulence in M. tuberculosis strains. In particular, our data suggest strain-specific bacterial fitness in the W-Beijing lineage, including: the ability to rapidly remodel the M. tuberculosis proteome in response to altered environments; up-regulation of key sigma factors to support rapid transcriptional responses; up-regulation of enzymes involved in pyrimidine biosynthesis and cell wall biosynthesis to promote rapid growth; and enhanced mycobactin biosynthesis to promote iron scavenging in the host. These individually selectable traits may then conceivably work together to provide the W-Beijing lineage with an enhanced ability to establish primary infection and active TB disease in a new host. These are testable hypotheses, and further research on them is now underway.

# AUTHOR CONTRIBUTIONS

JP carried out research, wrote initial draft. BC analyzed data, refined and rewrote initial draft to produce final draft. GG, SD, ER, NM, NS, LM assisted with data analysis and JB is principal investigator.

# ACKNOWLEDGMENTS

We thank the South African Medical Research Council for Fellowships. JB acknowledges the South African National Research Foundation for the Research Chair grant. We would like to acknowledge Olga Schubert and Ruedi Aebersold for access to the M. tb SRM Atlas pre-publication.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00813




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Peters, Calder, Gonnelli, Degroeve, Rajaonarifara, Mulder, Soares, Martens and Blackburn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# The polar and lateral flagella from *Plesiomonas shigelloides* are glycosylated with legionaminic acid

*Susana Merino1, Eleonora Aquilini1, Kelly M. Fulton2, Susan M. Twine2 and Juan M. Tomás1\**

*<sup>1</sup> Departamento de Microbiología, Facultad de Biología, Universidad de Barcelona, Barcelona, Spain, <sup>2</sup> National Research Council, Ottawa, ON, Canada*

*Plesiomonas shigelloides* is the only member of the *Enterobacteriaceae* family able to produce polar flagella when grown in liquid medium and lateral flagella when grown in solid or semisolid media. In this study on *P. shigelloides* strain 302-73, we found two different gene clusters, one exclusively for lateral flagella biosynthesis and the other containing the biosynthetic polar flagella genes with additional putative glycosylation genes. *P. shigelloides* is the first member of the *Enterobacteriaceae* in which a complete lateral flagella cluster leading to lateral flagella production has been described. We also show that both flagella in *P. shigelloides* strain 302-73 are glycosylated by a derivative of legionaminic acid (Leg), which explains the presence of Leg pathway genes between the two polar flagella regions in their biosynthetic gene cluster. It is the first bacterium reported with *O*-glycosylated Leg in both polar and lateral flagella. Flagella *O*-glycosylation is essential for bacterial flagella formation, either polar or lateral, because mutants in Leg biosynthesis genes are non-flagellated. Furthermore, the presence of the lateral flagella cluster and of the Leg *O*-flagella glycosylation genes is widespread among the *P. shigelloides* strains tested.

Keywords: *Plesiomonas shigelloides*, polar flagella, lateral flagella, *O*-glycosylation, legionaminic acid

#### *Edited by:*

*Nelson Cruz Soares, University of Cape Town, South Africa*

#### *Reviewed by:*

*Jason Warren Cooley, University of Missouri, USA; Akos T. Kovacs, Friedrich Schiller University Jena, Germany*

#### *\*Correspondence:*

*Juan M. Tomás, Departamento de Microbiología, Facultad de Biología, Universidad de Barcelona, Diagonal 643, Barcelona 08071, Spain jtomas@ub.edu*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 30 January 2015 Accepted: 15 June 2015 Published: 26 June 2015*

#### *Citation:*

*Merino S, Aquilini E, Fulton KM, Twine SM and Tomás JM (2015) The polar and lateral flagella from Plesiomonas shigelloides are glycosylated with legionaminic acid. Front. Microbiol. 6:649. doi: 10.3389/fmicb.2015.00649*

# Introduction

*Plesiomonas shigelloides* is a Gram-negative, flagellated bacillus. This facultative anaerobic bacterium is ubiquitous and has been isolated from different water sources (freshwater or surface water) and from animals (wild and domestic; Farmer et al., 1992). In humans, *P. shigelloides* is associated with diarrheal disease (Brenden et al., 1988). It can also cause gastroenteritis, including acute secretory gastroenteritis (Mandal et al., 1982), an invasive shigellosis-like disease (McNeeley et al., 1984), and a cholera-like illness (Tsukamoto et al., 1978). Extraintestinal infections, such as meningitis, bacteremia (Billiet et al., 1989), and pseudoappendicitis (Fischer et al., 1988), are also associated with *P. shigelloides* infection. Of particular concern are the severe cases of meningitis and bacteremia (Fujita et al., 1994) caused by *P. shigelloides*.

*Plesiomonas shigelloides* was initially classified in the *Vibrionaceae* family; however, molecular studies by Martinez-Murcia et al. (1992) indicated that it is phylogenetically related to the enterobacterial genus *Proteus*. Huys and Sings (1999), while genotyping *Aeromonas* spp. by amplified fragment length polymorphism, found that *P. shigelloides* clearly falls outside the major *Aeromonas* cluster. On the basis of these features, the genus *Plesiomonas* was reclassified to the family *Enterobacteriaceae*, being the only oxidase-positive member of this family (Garrity et al., 2001). To distinguish different strains of *P. shigelloides*, two major serotyping schemes are used, one based on the *O*-antigen lipopolysaccharide (O) and the other on flagellar (H) antigens, with a total of 102 somatic antigens and 51 flagellar antigens recognized (Aldova and Shimada, 2000).

Flagella biosynthesis is a costly commitment for the bacterium in terms of resources and energy (Macnab, 1996). The number of flagella is variable, and the distributions most frequently found in pathogenic bacteria are monotrichous (a single flagellum) and peritrichous (multiple flagella around the cell; Macnab, 1996). Flagella expression depends on the growth conditions. When grown on plates, several bacterial species produce more flagella than when grown in liquid medium. Some species, like *Proteus mirabilis*, have been observed to show an increase in the number of flagella. *Vibrio parahaemolyticus* has a single polar flagellum in liquid medium but, when grown on solid medium, produces the polar flagellum (Fla) and peritrichous (or lateral) flagella (Laf; Allison and Hughes, 1991; Allison et al., 1992; Merino et al., 2014). Lateral flagella have been shown in about seven other *Vibrio* species (some of which evoke a disease spectrum similar to that of *V. parahaemolyticus*; Shinoda et al., 1992) and in only a small number of other bacterial species, including *Rhodospirillum centenum* (a purple photosynthetic bacterium; McClain et al., 2002), *Azospirillum* spp. (nitrogen-fixing rhizobacteria that colonize plants; Moens et al., 1996), *Helicobacter mustelae* (the causative agent of chronic gastritis and ulcer disease in ferrets; O'Rourke et al., 1992), *P. shigelloides* (Inoue et al., 1991), and *Aeromonas* spp. (opportunistic and gastroenteric pathogens of man; Gavín et al., 2002). Other species that show lateral flagella include *Bradyrhizobium japonicum* (Kanbe et al., 2007), *Photobacterium profundum* (Eloe et al., 2008), and *Rhodobacter sphaeroides* (Poggio et al., 2007). Furthermore, *Selenomonas ruminantium* subsp. *lactilytica* is a solely laterally flagellated bacterium (Haya et al., 2011).

Protein glycosylation is one of the most common protein post-translational modifications and consists of the covalent attachment of carbohydrates to amino acids. This mechanism was thought to occur exclusively in eukaryotes. However, protein glycosylation systems have been identified in all forms of life, including prokaryotes. *N*-glycosylation is the covalent linkage of carbohydrates to asparagine residues, while *O*-glycosylation is the linkage to serine or threonine residues. *O*-glycosylation in bacteria has been extensively reviewed recently (Iwashkiw et al., 2013). As more bacterial genomes become available, and bioinformatic analysis is coupled with functional analysis, glycosylation pathways are increasingly being elucidated, including the identification of many genes that participate in flagellin glycosylation (Iwashkiw et al., 2013). The number of *O*-glycosylation genes involved differs among bacterial species (Goon et al., 2003; Schirm et al., 2003; Faridmoayer et al., 2007; Iwashkiw et al., 2012). In spite of these advances, knowledge of the structure and composition of the glycans that modify the flagellins of Gram-negative bacteria is restricted to certain species, and these glycans have been observed to be strain-dependent [as reviewed by Merino and Tomás (2014)].

In this work we study the genetics of the *P. shigelloides* flagella (polar and lateral) and their post-translational modifications, in the first report of flagellar glycosylation in enteric bacteria.

# Materials and Methods

# Bacterial Strains, their Growth Conditions, and Plasmids Used

The bacterial strains and plasmids used are listed in **Table 1**. Bacteria were grown in TSB broth and on TSA medium, supplemented when needed with kanamycin (25 μg/ml), tetracycline (20 μg/ml), and rifampicin (100 μg/ml).

# MiniTn5Km-1 Mutagenesis

Conjugal transfer of the miniTn5Km-1 transposable element from *Escherichia coli* S17-1λ*pir*Km-1 to *P. shigelloides* 302-73R (rifampicin-resistant derivative of the wild type strain) was carried out in a conjugal drop as previously described (Aquilini et al., 2013).

# Construction of a *P. shigelloides* Genomic Library

*Plesiomonas shigelloides* strain 302-73 (serotype O1) genomic DNA was isolated and partially digested with *Sau3A* as described by Sambrook et al. (1989). The *P. shigelloides* strain 302-73 genomic library was constructed in cosmid pLA2917 (Allen and Hanson, 1985) as described (Guasch et al., 1996).

## General DNA Methods

General DNA manipulations were done essentially as previously described (Sambrook et al., 1989; Aquilini et al., 2014).

# Southern Blot Hybridizations

Southern blotting was performed by capillary transfer (Sambrook et al., 1989) from the gel to a nylon membrane (Hybond N+, Amersham). Probe labeling, hybridization, and detection were carried out as previously described (Aquilini et al., 2014) using the enhanced chemiluminescence labeling and detection system (Amersham) according to the manufacturer's instructions.

# DNA Sequencing and *In Silico* Analysis of Sequence Data

These studies were previously described (Wilhelms et al., 2013). The dideoxy-chain termination method (Sanger et al., 1977), BLAST (Altschul et al., 1997; Bateman et al., 2002), and Clustal W were used.

### Complementation Studies

Complementation of the different mutants carrying the miniTn5 was done as previously described (Aquilini et al., 2013) by conjugal transfer of positive recombinant clones from the genomic library.

### Antisera

Anti-*P. shigelloides* polar flagellum and lateral flagella sera were independently obtained using purified polar flagellum or lateral flagella obtained after cesium chloride purification, and assayed as previously described for other surface molecules (Tomás et al., 1991; Merino et al., 1992).

#### TABLE 1 | Bacterial strains and plasmids used.

<sup>a</sup>/ <sup>=</sup> *resistant.*

### Motility Assays (Swarming and Swimming)

The studies were performed as previously described (Wilhelms et al., 2012). Bacterial colonies were picked with a sterile toothpick and deposited in the center of a swarm agar or swim agar plate. The plates were incubated for 16–24 h at 25°C, and motility was examined by the migration of bacteria through the agar from the center toward the plate periphery. Swimming motility in liquid medium was observed by phase-contrast microscopy at a magnification of ×400 as previously described (Wilhelms et al., 2012).

### Transmission Electron Microscopy (TEM)

Transmission electron microscopy (TEM) studies were performed as previously described (Wilhelms et al., 2012).

### Flagella Purification

*Plesiomonas shigelloides* strain 302-73 was grown in TSB for the polar flagellum purification. For the isolation of lateral flagella the strains were grown on TSA and recovered with 100 mM Tris (pH = 7.8). Purified flagella were isolated as previously described (Merino et al., 2014).

# Cytoplasmic Fraction

The *Plesiomonas shigelloides* cytoplasmic fraction from strain 302-73 cells grown in TSB at 37°C was obtained as previously described (Wilhelms et al., 2012).

### Immunological Methods

Western blotting of cytoplasmic fractions or purified flagella was performed as previously described (Wilhelms et al., 2012). Immunoblotting was carried out as described (Towbin and Gordon, 1984) using specific anti-polar or anti-lateral flagellin polyclonal serum (1:2000; Canals et al., 2006).

# Electrospray Liquid Chromatography Mass Spectrometry

Mass spectrometry studies of intact flagellin proteins were carried out using 1 μg or less of protein, as described in our previous work (Wilhelms et al., 2012). Briefly, purified flagellin samples were injected onto a protein microtrap (Michrom Bioresources Inc., Auburn, CA, USA) connected to a gradient HPLC pump (Agilent 1100 HPLC). To resolve the proteins, a gradient of 5–60% solvent B (1 mL/min) over 60 min was used, where solvent A was 0.1% formic acid in HPLC grade water and solvent B was 0.1% formic acid in acetonitrile. A precolumn splitter was used to direct ∼35 μl/min of the HPLC mobile phase through the trap or column and into the electrospray interface of the QTOF2 (Waters, Milford, MA, USA) or Orbitrap XL Mass Spectrometer (Thermal, CA, USA) to allow real-time monitoring of ion elution profiles. Intact masses of proteins were calculated by spectral deconvolution using MaxEnt software (Waters, Beverly, MA, USA).
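The principle behind deconvoluting an envelope of multiply charged ions into a single neutral mass can be illustrated with a short calculation. The sketch below is not the MaxEnt algorithm; it is a minimal illustration that assumes two adjacent charge states of the same protein have already been picked from the spectrum. The function names and the synthetic m/z values are hypothetical and introduced here only for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz_of(mass, z):
    """m/z of a species of neutral mass `mass` carrying z protons."""
    return (mass + z * PROTON) / z


def deconvolute_pair(mz_high, mz_low):
    """Infer charge and neutral mass from two adjacent charge states.

    mz_high is the peak with charge z, mz_low the neighbouring peak
    with charge z + 1 (hence the lower m/z).
    """
    # (mz_low - P) / (mz_high - mz_low) equals z for adjacent charge states
    z = round((mz_low - PROTON) / (mz_high - mz_low))
    mass = z * (mz_high - PROTON)
    return z, mass


# Synthetic example: a hypothetical 40201 Da protein observed at 30+ and 31+
z, mass = deconvolute_pair(mz_of(40201.0, 30), mz_of(40201.0, 31))
print(z, round(mass, 1))
```

Real deconvolution software fits the whole envelope at once and handles noise and overlapping species, which this two-peak calculation does not attempt.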

To identify potential glycopeptides, flagellin (50–200 μg) was digested and analyzed as previously described (Wilhelms et al., 2012). Unmodified peptides were identified using MASCOT (Matrix Science, London, UK) as described (Wilhelms et al., 2012). Glycopeptide MS/MS spectra were *de novo* sequenced as previously described (Wilhelms et al., 2012).

# Construction of Defined in Frame Legionaminic Acid Mutants and their Complementation

The chromosomal in-frame *pgmL* and *legF* deletion mutants, 302*pgmL* and 302*legF*, respectively, were constructed by allelic exchange as described (Milton et al., 1996) and as previously used by us (Merino et al., 2014). The primers used to obtain the mutants are listed in **Table 2**. Two DNA fragments (A–B and C–D) were obtained by asymmetric polymerase chain reactions (PCRs) and annealed at their overlapping region, and a single DNA fragment was then obtained by PCR using primers A and D. The pDM4*pgmL* and pDM4*legF* plasmids were obtained as previously described (Merino et al., 2014). These plasmids were transferred by triparental mating using *E. coli* MC1061 (λ*pir*), the mobilizing strain *E. coli* HB101/pRK2073, and *P. shigelloides* mutant 302-73R as the recipient strain. Colonies grown on plates with chloramphenicol and rifampicin were confirmed for genome integration of the vector by PCR analysis. After sucrose treatment, colonies that were rifampicin resistant (Rif<sup>R</sup>) and chloramphenicol sensitive (Cm<sup>S</sup>), and in which the mutation was confirmed by PCR, were chosen.
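The joining of the A–B and C–D fragments at their overlapping region, and the requirement that the deletion be in frame, can be sketched in a few lines. This is only an illustration of the logic: the sequences below are toy strings, not the primers or flanking regions of **Table 2**, and the function names are hypothetical.

```python
def fuse_at_overlap(ab, cd, min_overlap=6):
    """Join fragment A-B to fragment C-D at their longest shared overlap.

    The 3' end of A-B must match the 5' end of C-D over at least
    `min_overlap` bases; the overlap is kept only once in the product.
    """
    for k in range(min(len(ab), len(cd)), min_overlap - 1, -1):
        if ab.endswith(cd[:k]):
            return ab + cd[k:]
    raise ValueError("fragments share no overlap of the required length")


def is_in_frame(deleted_bp):
    """A deletion keeps the reading frame only if it removes whole codons."""
    return deleted_bp % 3 == 0


# Toy fragments (hypothetical): both carry the same 12-base overlap
overlap = "GGTACCGAGCTC"
fragment_ab = "ATGAAACCC" + overlap
fragment_cd = overlap + "TTTGGGTAA"

print(fuse_at_overlap(fragment_ab, fragment_cd))
print(is_in_frame(1350), is_in_frame(1351))
```

In practice the fused product is amplified with the outer primers (A and D) and cloned into the suicide vector; the check on the deleted length is what distinguishes an in-frame deletion from one that could cause polar effects on downstream genes.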

Plasmids pBAD33-*pgmL* and pBAD33-*legF*, carrying the wild type genes *pgmL* and *legF*, were constructed by PCR amplification of genomic DNA using specific primer pairs (see the list of primers in **Table 2**) and ligation to plasmid pBAD33 from ATCC (American Type Culture Collection). Plasmids pBAD33-*pgmL* and pBAD33-*legF* were introduced into *E. coli* DH5α by electroporation, and then introduced into the corresponding mutants by triparental mating. Induction or repression of genes in pBAD33 was achieved as described by ATCC.

TABLE 2 | (A) Primers used in the construction of chromosomal in-frame deletion mutants. (B) Primers used for mutant complementation using vector pBAD33.


<sup>b</sup>*Underlined letters show BamHI or BglII restriction site.*


<sup>a</sup>*Primers contain SmaI(bold) and XbaI(underlined), the PCR amplified product (1496 bp) was ligated to SmaI- XbaI digested pBAD33.* <sup>b</sup>*Primers contain SmaI(bold) and XbaI(underlined), the PCR amplified product (982 bp) was ligated to SmaI- XbaI digested pBAD33.*

# Results

*Plesiomonas shigelloides* 302-73 [serogroup O1 (Pieretti et al., 2010)] grown in liquid medium or semisolid medium (swimming agar plates) showed the typical three to four flagella located at a single point of one cell pole (lophotrichous; **Figure 1**). However, when the agar concentration was increased, the flagellar distribution shifted from a single pole to a more dispersed pattern. The agar concentration seems to be involved in this change in flagella distribution. When the bacteria were grown in solid or semisolid media (swarming agar plates), a completely different flagella distribution was observed. As can be seen in **Figure 1**, the flagella showed a typical peritrichous distribution over the entire cell surface.

A similar pattern of flagellar distribution with changes in growth medium was observed with 12 *P. shigelloides* strains. Among these strains, eight represented five different serotypes (O1, O2, O3, O17, and O54), while four were non-serotyped strains. The strains were isolated from clinical stools (7) and fish (5); four were from Japan, four from Spain, three from Brazil, and one from Poland.

# MiniTn5Km-1 Mutagenesis

A spontaneous rifampicin-resistant *P. shigelloides* mutant (named 302-73R) derived from the wild type strain 302-73 was isolated by our group. *P. shigelloides* 302-73R showed a pattern of flagella production identical to that described above for the wild type strain. We selected insertional mutants, as described in Materials and Methods, and grouped them by their inability to swim, to swarm, or both.

FIGURE 1 | *Plesiomonas shigelloides* 302-73, serotype O1. TEM from cells grown in liquid medium (A) and swarming agar plates (B). Motility in swimming (C) and swarming (D) agar plates.

In an initial screening of 2500 colonies, four mutants were selected (initially named A, B, C, and D) based upon inability to swim while retaining the ability to swarm. A further three mutants (initially named E, F, and G) were selected based upon inability to swarm while retaining the ability to swim. Lastly, two mutants (initially named H and I) were selected that were unable to swim or swarm. Mutants A, B, C, and D, when observed by EM in appropriate conditions, showed lateral flagella but not polar flagella (**Figure 2**), while mutants E, F, and G showed polar but not lateral flagella by EM when grown in appropriate conditions (**Figure 3**). Mutants H and I were unable to produce polar or lateral flagella, as observed by EM in any growth conditions (**Figure 4**). The presence of a single copy of the minitransposon in their genome was determined by Southern blot analysis. We were unable to clone the minitransposon-containing DNA fragment from the mutants using methodologies that were successful in other bacteria (Aquilini et al., 2013).
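The grouping of the nine mutants by motility phenotype can be written out explicitly. The following sketch simply encodes the swim/swarm results reported above and assigns each mutant to the flagellar system presumed to be affected; the dictionary layout and function names are illustrative assumptions, not part of the original screening procedure.

```python
from collections import defaultdict

# (can swim, can swarm) for each insertional mutant, as reported in the text
PHENOTYPES = {
    "A": (False, True), "B": (False, True),
    "C": (False, True), "D": (False, True),
    "E": (True, False), "F": (True, False), "G": (True, False),
    "H": (False, False), "I": (False, False),
}


def affected_flagella(swims, swarms):
    """Map a motility phenotype to the flagellar system presumed defective."""
    if swims and swarms:
        return "none"
    if swarms:                 # swarming retained, swimming lost
        return "polar"
    if swims:                  # swimming retained, swarming lost
        return "lateral"
    return "polar and lateral"


def group_mutants(phenotypes):
    groups = defaultdict(list)
    for mutant, (swims, swarms) in sorted(phenotypes.items()):
        groups[affected_flagella(swims, swarms)].append(mutant)
    return dict(groups)


print(group_mutants(PHENOTYPES))
```

The three resulting groups correspond to the three classes examined by EM in **Figures 2**–**4**.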

Complementation of the mutants using a cosmid-based genomic library of *P. shigelloides* 302-73 (see Materials and Methods) reversed the observed phenotype, restoring the ability to swim or to swarm in motility plates.

# Polar Flagella Mutants

We found several positive recombinant clones able to complement the A, B, C, and D mutants. Complementation was assessed by the recovery of swimming behavior under appropriate conditions. All complemented mutants were able to produce polar flagella when observed by EM after growth in liquid conditions (**Figure 2**). Sequencing of the complete inserts of the positive recombinant clones revealed that the region corresponds to PLESHI\_03205 to PLESHI\_03505 in the complete *P. shigelloides* 302-73 genome (Piqué et al., 2013).

The polar flagella gene cluster, as shown in **Figure 5A**, is organized in two gene regions (I and II) adjacent to a group of putative biosynthetic Leg genes. Region I includes several genes encoding chemotaxis proteins, the σ<sup>28</sup> factor gene *fliA*, the clusters *flhB* to *G*, *fliK* to *R*, and *fliE* to *J*, *flrA* and *C*, and *flaC* to *J* (transcribed in the same direction). Region I is similar to *V. parahaemolyticus* region 2 in gene distribution and direction of transcription, and also lacks the motor genes (McCarter, 2001). Region II, downstream of the group of putative biosynthetic Leg genes, contains cluster *flgP*,*O*,*T*, or *flgA*,*M*,*N* with the direction of transcription typical of the different Gram-negative bacteria described, two genes encoding chemotaxis proteins, and cluster *flgB* to *L*. In gene distribution and direction of transcription, region II is similar to region 1 of *V. parahaemolyticus* and *Aeromonas hydrophila* (McCarter, 2001; Canals et al., 2006).

**Table 3** shows the ORFs with their predicted function based on their homology to proteins of known function. Proteins of unknown function were not included. The last gene in this region encodes an ORF (named Gt) that showed homology to domains of a glycosyltransferase. It was provisionally assigned to the polar flagella cluster and not to the putative biosynthetic Leg genes. Once the DNA fragment was completely sequenced, we used several primers derived from the DNA sequence to locate the miniTn5 insertions [A = *flgE*, B = *flhA*, C = *fliI*, and D = *flgK* (**Figure 5A**)].

FIGURE 3 | *Plesiomonas shigelloides* E mutant (as an example of the insertional lateral flagella mutants). TEM of the E mutant grown in liquid medium (A) and swarming agar plates (B), and of the complemented mutant with COS-LAFI harboring the corresponding wild type gene grown in semisolid medium (C). As can be observed in B, the polar flagella are constitutively expressed in semisolid medium. Bar corresponds to 0.5 μm. Motility of the E mutant in swimming (D) and swarming (E) agar plates. The complemented mutant with COS-FLAregI-1 harbouring the corresponding wild type gene in swarming agar plate (F).

#### Lateral Flagella Mutants

Several positive recombinant clones complemented the E, F, and G mutants separately. Some clones were observed to complement two mutants. Complementation was assessed by the recovery of swarming behavior on appropriate growth plates. All complemented mutants were able to produce lateral flagella when observed by EM after growth in semisolid conditions (**Figure 3**). We used the same strategy indicated above to sequence the entire DNA region contained in the positive recombinant clones. This complete region corresponds to PLESHI\_07125 to PLESHI\_07305 in the complete *P. shigelloides* 302-73 genome (Piqué et al., 2013).

The lateral flagella gene cluster contains 37 genes grouped in a single region (**Figure 5B**). Five typical groups of genes (*lafA* to *U*; *flgBL* to *LL*; *flgAL*,*ML*,*NL*; *fliEL* to *JL*; and *fliML* to *RL* plus *flhB*-*AL*) were found when compared to the most similar lateral flagella region, that of *A. hydrophila* AH-3. All the genes were found in a single region, as in *A. hydrophila* or enteric bacteria. In contrast, the equivalent genes in *V. parahaemolyticus* are found in two separate regions (Canals et al., 2006; Merino et al., 2006). The groups of genes *fliEL* to *JL* and *fliML* to *RL* plus *flhB*-*AL* are adjacent in all the lateral flagella clusters described. The groups of genes have been shown to be transcribed in the same direction in *A. hydrophila* and divergently in *Vibrio*, enteric bacteria, and *P. shigelloides* (Merino and Tomás, 2009). **Table 4** shows the ORFs with their predicted function based on their homology to proteins of known function. All protein matches of unknown or poorly established homology were discarded. Between the groups of genes *flgB*-*LL* and *lafA*-*U* there is a gene encoding a hypothetical protein without the classical motility accessory factor domains found in *A. hydrophila* Maf-5. However, this encoded protein showed minimal similarity to Maf-5, and the gene was denoted *maf-5* (Parker et al., 2014). Once the DNA fragment was completely sequenced, we used several primers derived from the DNA sequence to locate the miniTn5 insertions in *lafA* (E), *flhAL* (F), and *flgEL* (G; **Figure 5B**).

# Mutants Unable to Produce Flagella

A single positive recombinant clone was observed to complement both mutants H and I, as they recovered swimming and swarming in plates. The complemented mutants were able to produce polar and lateral flagella when observed by EM after growth in appropriate conditions (**Figure 4**). Sequencing the entire DNA region in the positive recombinant clone showed that this region contains the group of putative biosynthetic Leg genes (**Figure 5A**) located between regions I and II encoding the polar flagella. This complete region corresponds to PLESHI\_03365 to PLESHI\_03405 in the complete *P. shigelloides* 302-73 genome (Piqué et al., 2013). **Table 5** shows the ORFs with their predicted function based on their homology to proteins of known function.

The *Campylobacter jejuni* CMP-Leg biosynthetic pathway described involves two segments: synthesis of a GDP-sugar building block and synthesis of the final CMP-nonulosonate which are linked by the *N*-acetyl transferase GlmU (Schoenhofen et al., 2009). We found all the genes encoding for the necessary two segments of the CMP-Leg biosynthetic pathway in this region besides the one encoding phosphoglucosamine mutase (PgmL) included in the first segment of the biosynthesis. Once the DNA fragment was completely sequenced, we used several primers derived from the DNA sequence to establish that the miniTn5 was located in *ptmA* (H) and *legH* (I; **Figure 5A**).

# Flagella Purification

Polar flagellins were purified from the wild type strain after growth in liquid medium, and a mixture of polar and lateral flagellins after growth on swarm agar plates (**Figure 6A**). Lateral flagellin was also isolated from insertion mutant A (unable to produce the constitutive polar flagella, with unaltered lateral flagella).

# Intact Mass Analysis of Purified Flagellins

Purified polar flagellin preparations showed a well-resolved ion envelope of multiply charged protein ions, which deconvoluted into three distinct masses at 40201, 40652, and 40931 Da. The mass of the translated gene sequence for polar flagellin was 38710 Da, giving mass excesses of 1491, 1942, and 2221 Da, respectively (data not shown). During front end CID experiments on the purified polar flagellin preparation, labile glycan-related ions were observed at *m/z* 359 and 317. Using increasing cone voltages, fragmentation of the ion at *m/z* 359 was observed, as shown in **Figure 7**. The fragment ions observed at *m/z* 317, 299, 281, 222, and 181 were characteristic fragment ions of nonulosonic acids, such as pseudaminic or legionaminic acid.
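The mass excesses quoted above follow directly from subtracting the predicted polypeptide mass from each deconvoluted mass. A minimal sketch of that arithmetic, using only the values given in the text (the variable and function names are illustrative):

```python
PREDICTED_POLAR_FLAGELLIN = 38710          # Da, translated gene sequence
OBSERVED_MASSES = [40201, 40652, 40931]    # Da, deconvoluted intact masses


def mass_excess(observed, predicted):
    """Difference between each observed mass and the predicted mass."""
    return [m - predicted for m in observed]


print(mass_excess(OBSERVED_MASSES, PREDICTED_POLAR_FLAGELLIN))
# -> [1491, 1942, 2221]
```

A positive excess of this size on every observed species is what points to post-translational modification rather than, for example, a sequencing error in the gene.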

From the observed mass of 316.124, the top ranked plausible elemental formula was C13H21N2O6, suggesting that this moiety is a carbohydrate. The additional glycan ion observed at *m/z* 359 gave a top ranked plausible elemental formula of C15H23N2O8, suggesting that this species is a nonulosonic acid with an additional acetyl group. An intense fragment ion was observed at *m/z* 341, most likely arising from a loss of water from the glycan ion observed at *m/z* 359.
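The relationships between the glycan-related ions can be checked by looking at the nominal mass differences between them. The sketch below is an illustration under simple assumptions: it uses nominal (integer) *m/z* values from the text and a small, hypothetical lookup table containing only the two neutral losses discussed here, water (18 Da) and an acetyl group lost as C2H2O (42 Da).

```python
# Nominal neutral losses considered in this illustration (assumed table)
NEUTRAL_LOSSES = {18: "H2O", 42: "C2H2O (acetyl)"}


def annotate_losses(ions):
    """List every ion pair whose nominal m/z difference is a known loss."""
    ions = sorted(ions, reverse=True)
    annotations = []
    for i, parent in enumerate(ions):
        for fragment in ions[i + 1:]:
            label = NEUTRAL_LOSSES.get(parent - fragment)
            if label is not None:
                annotations.append((parent, fragment, label))
    return annotations


for parent, fragment, label in annotate_losses([359, 341, 317, 299, 281]):
    print(f"{parent} -> {fragment}: loss of {label}")
```

The 42 Da spacing between *m/z* 359 and 317 is consistent with the acetylated and non-acetylated forms of the same sugar, and the 18 Da steps with successive water losses; accurate-mass data would be needed to confirm each assignment.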

The preparation containing purified polar and lateral flagellins showed a more complex elution profile when separated by HPLC, with two sequentially eluting protein peaks. The area under each peak was combined separately, and each showed a complex ion envelope. The ion envelope of the first eluted protein deconvoluted into two distinct masses at 39325 and 40678 Da. The second eluting protein ion envelope deconvoluted to give a single protein mass at 30940 Da. It is possible that the larger MW proteins correspond to the polar flagellin and the 30 kDa protein to the lateral flagellin. The A mutant, which is unable to produce polar flagella, showed only this second eluting peak when grown in swarming conditions (**Figure 6A**). In each case, the measured molecular mass is greater than that of the translated gene sequence for each protein. This suggests that both polar and lateral flagellins are post-translationally modified. Front end CID experiments showed almost identical profiles when compared with the polar flagellin preparation, with intense ions observed at *m/z* 359 and 317. These data suggest that both polar and lateral flagellins are modified with the same nonulosonic acid sugar, with or without acetylation.

# Bottom Up Mass Spectrometry Studies of Flagellins

Tandem mass spectrometry studies of tryptic digests of purified polar flagellins identified a number of unmodified peptides. *De novo* sequencing of the MS/MS data showed a number of spectra that were identified as flagellin peptides harboring a mass excess of 316 Da. Also observed was an intense ion at *m/z* 317, suggestive of a glycan oxonium ion. **Figure 8A** shows the MS/MS spectrum of the polar flagellin glycopeptide AIASLSTATINK, modified with a putative 316 Da glycan. Peptide type y and b fragment ions are annotated and confirm the peptide sequence. In addition, low *m/z* fragment ions that did not correspond to peptide type y or b ions were also observed at *m/z* 317, 299, 281, 240, 221, 196, and 181. Taken together, the mass excess, the glycan oxonium ion, and the putative glycan fragment ions suggest that the flagellin peptides are modified with a legionaminic acid-like glycan.
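To make the notion of a peptide "harboring a mass excess" concrete, the following sketch computes the monoisotopic mass of the peptide AIASLSTATINK with and without a glycan, and lists the Ser/Thr residues that are candidate *O*-glycosylation sites. It assumes standard monoisotopic residue masses and uses the nominal glycan mass of 316 Da from the text; the function names are hypothetical, and the site list shows only candidates, not the site actually modified.

```python
# Standard monoisotopic residue masses (Da) for the residues in this peptide
RESIDUE_MASS = {
    "A": 71.03711, "I": 113.08406, "K": 128.09496, "L": 113.08406,
    "N": 114.04293, "S": 87.03203, "T": 101.04768,
}
WATER = 18.01056
PROTON = 1.00728


def peptide_mass(sequence, modification=0.0):
    """Neutral monoisotopic mass of a peptide plus an optional modification."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + modification


def mz(mass, charge):
    """m/z of the protonated species at the given charge."""
    return (mass + charge * PROTON) / charge


def ser_thr_positions(sequence):
    """1-based positions of Ser/Thr, the candidate O-glycosylation sites."""
    return [i for i, aa in enumerate(sequence, start=1) if aa in "ST"]


peptide = "AIASLSTATINK"
unmodified = peptide_mass(peptide)
glycosylated = peptide_mass(peptide, modification=316.0)  # nominal glycan mass
print(round(unmodified, 2), round(glycosylated, 2))
print(round(mz(glycosylated, 2), 2), ser_thr_positions(peptide))
```

Locating the glycan on one particular Ser or Thr would require fragment ions that retain the modification, which this mass calculation alone cannot provide.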

#### TABLE 3 | Characteristics of the *P. shigelloides* 302-73 strain polar flagella gene regions I and II.


#### TABLE 4 | Characteristics of the *P. shigelloides* 302-73 strain lateral flagella cluster.


The purified polar and lateral flagellins were also digested with trypsin and analyzed by tandem mass spectrometry, identifying a number of unmodified flagellin peptides. Once again, *de novo* sequencing showed several flagellin peptides from both polar and lateral flagellins to be modified with putative glycan moieties. The lateral flagellin (LafA) harbored peptides modified with glycans of 316 and 358 Da (**Figure 8B**). In some cases peptides were shown to harbor both glycans. It was not clear from the data whether two monosaccharides were modifying two separate amino acids or whether a single disaccharide was modifying one site.

The polar flagellin was also observed to be modified with 316 and 358 Da glycan moieties. In some cases, glycan chains composed of multiple 358 Da glycans were observed; in other cases a single modification of 316 or 358 Da was noted. Very low levels of peptides harboring distinct glycan masses were also observed; for example, the peptide AIASLSTATINK was observed to be modified with either a 316 Da glycan or a 523 or 481 Da glycan. Glycan related ions were observed in each case, with intense ions observed at *m/z* 524 and 184 or *m/z* 424 and 184. The ion at *m/z* 184 was also observed in front end CID experiments with the intact polar and lateral flagellin preparations, and gave a top ranked plausible elemental formula of C9H12O4, suggesting that it is a related nonulosonic acid type sugar. The low abundance of these glycopeptides made any further analyses challenging.


### Legionaminic Acid Biosynthetic Mutants

The insertional mutants in *ptmA* (H) and *legH* (I) were unable to produce polar or lateral flagella under induced conditions, as shown by TEM or by immunodetection of polar (**Figure 6B**) or lateral flagellins (**Figure 6C**) in purified flagella. The introduction of the *P. shigelloides* wild type genes was observed to recover the production of polar and lateral flagella in the mutants. This was demonstrated by immunodetection, as shown in **Figures 6B,C**. These data prompted us to examine the production of the polar flagellin in the mutants by immunodetection. Western blot analysis showed the presence of polar flagellin in the cytoplasmic subcellular fraction. Interestingly, only a single protein band was observed, with a lower than expected molecular weight (**Figure 6D**). Wild type flagellin typically migrates as two distinct bands, both detectable by Western blot. We speculate that the single, lower molecular weight species is a non-glycosylated form of flagellin. The complemented mutants showed the same cytoplasmic polar flagellin molecular weight bands as observed with the wild type strain. Similarly, where lateral flagellin was detected in the cytoplasmic fraction, it was observed at a lower molecular weight, likely the non-modified form of the protein. Thus, the lack of polar and lateral flagella formation observed in the mutants is not due to a lack of flagellin protein or of master regulator transcription.

In order to prove at the genomic level that mutations in the CMP-Leg biosynthetic pathway were responsible for the phenotypic traits shown by insertional mutants H and I, two in-frame *pgmL* and *legF* deletion mutants were generated, 302*pgmL* and 302*legF*, respectively. Our genomic studies indicate that all the genes of the Leg pathway are included in the cluster between polar regions I and II, with the exception of the PgmL ortholog, which is found in another region of the chromosome [703.5 peg 1785 (Piqué et al., 2013)]. PgmL or GlmM, phosphoglucosamine mutase, is involved in the first step to produce GDP-GlcNAc. LegF, CMP-legionaminic acid synthase, is the final enzyme of the second step to produce CMP-Leg. Using TEM, neither mutant was observed to produce polar or lateral flagella under induced conditions. Both show the same phenotypic traits as insertional mutants H and I. When mutants 302*pgmL* and 302*legF* were complemented with their single corresponding wild type gene (pBAD33-*pgmL* and pBAD33-*legF*, respectively) under inducing conditions (plus arabinose), all the wild type phenotypic traits (production of polar and lateral flagella, and swimming and swarming motilities) were fully recovered. The control plasmid pBAD33 alone, under the same inducing conditions, did not restore these traits.

# Lateral Flagella and Leg *O*-Flagella Glycosylation Gene Distribution on *P. shigelloides*

To test whether the presence of lateral flagella and Leg *O*-flagella glycosylation genes is specific to the strain studied, the 12 previously mentioned *P. shigelloides* strains were examined by PCR: eight strains representing five different serotypes (O1, O2, O3, O17, and O54) plus four non-serotyped strains described in Materials and Methods. Initially, genomic DNA from the 302-73 strain was used as template for PCR amplification with two sets of oligonucleotides: 5′-ATCGCGTCTGAAAGGCTAC-3′ and 5′-CTGCGCCATAGAACTACCC-3′, which amplified a 2160 bp DNA fragment from the lateral flagella cluster (partial *lafA* and complete *maf-5*); and another oligonucleotide set (5′-CGGGTTAAAGCTATCCCATC-3′ and 5′-CCAATGACAGCTGAATCTCC-3′), which amplified a 1985 bp DNA fragment from the Leg biosynthesis genes (partial *legH* and complete *legI*). DNA fragments of the same size (2160 and 1985 bp, respectively) were PCR amplified from the genomic DNA of all the strains studied, as shown in **Figure 9**. The DNA sequence of the amplified fragments confirmed the presence of the lateral and Leg biosynthetic genes. In addition, in all the amplified *maf-5* and *legI* fragments, a sequence coding for the N-terminal amino acid residues of the *lafA* and *legH* genes, respectively, was found adjacent to *maf-5* or *legI*, suggesting that in the analyzed strains the genomic location is the same as that found in *P. shigelloides* wild type strain 302-73 (**Figure 5**).

FIGURE 6 | (A) *Plesiomonas shigelloides* 302-73 wild type strain serotype O1 purified flagella according to Section "Materials and Methods" when grown in liquid medium (1) and swarming agar plates (2). As could be observed in 2, and previously indicated in Figure 3, polar flagella are constitutively expressed in semisolid medium. Purified flagella from *P. shigelloides* insertional polar A mutant grown in swarming agar plates (3). (B) Western blot with specific polar flagella antiserum of purified flagella from wild type (1), *P. shigelloides* insertional polar H mutant (2), and complemented mutant with COS-LEG harboring the corresponding wild type gene (3) obtained in liquid medium growth. (C) Western blot with specific lateral flagella antiserum of purified flagella from wild type (1), *P. shigelloides* insertional polar H mutant (2), and complemented mutant with COS-LEG harboring the corresponding wild type gene (3) obtained in swarming agar plates. (D) Western blot with specific polar flagella antiserum of cytoplasmic fractions obtained as described in Section "Materials and Methods" of wild type (1), *P. shigelloides* insertional polar H mutant (2), and complemented mutant with COS-LEG harboring the corresponding wild type gene (3) obtained in liquid medium growth. The low molecular weight band could correspond to the non-glycosylated form, and the upper band (not present in the mutant) to the glycosylated form.
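As a quick sanity check on the two oligonucleotide sets, the sketch below computes length, GC content, and a rough melting temperature for each primer. The Wallace rule (2 °C per A/T, 4 °C per G/C) is a swapped-in textbook approximation, not a value or method reported by the authors, and the primer labels are hypothetical because the text does not say which oligonucleotide of each pair is forward or reverse.

```python
# Primer sequences as given in the text; the labels are illustrative only.
PRIMERS = {
    "lateral_1": "ATCGCGTCTGAAAGGCTAC",
    "lateral_2": "CTGCGCCATAGAACTACCC",
    "leg_1": "CGGGTTAAAGCTATCCCATC",
    "leg_2": "CCAATGACAGCTGAATCTCC",
}


def gc_count(seq):
    """Number of G and C bases in the sequence."""
    return sum(seq.count(base) for base in "GC")


def wallace_tm(seq):
    """Rough melting temperature (deg C) by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    gc = gc_count(seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


summary = {
    name: (len(seq), round(100 * gc_count(seq) / len(seq), 1), wallace_tm(seq))
    for name, seq in PRIMERS.items()
}

for name, (length, gc_percent, tm) in summary.items():
    print(f"{name}: {length} nt, GC {gc_percent}%, Wallace Tm ~{tm} C")
```

By this estimate the four primers are closely matched, which is what one would expect for primer pairs used under a shared cycling program; actual annealing conditions would still need to be taken from the Methods.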

## Discussion

Motility is an essential mechanism in adaptation to different environments for free living bacteria. Bacteria show three flagella types, classified according to their location on the cell: peritrichous, polar, and lateral. Dual flagella systems have been reported in some polar flagellated bacteria when grown in viscous environments or on surfaces. This allows bacteria to swarm on solid or semisolid media by mixed flagellation (polar and lateral flagella). *P. shigelloides* has been observed to express mixed flagellation (Inoue et al., 1991).

Two different gene clusters were described in *P. shigelloides* 302-73, one exclusively involved in lateral flagella biosynthesis and a second containing the polar flagella genes distributed in two regions separated by putative glycosylation genes. It is characteristic of bacteria with dual flagella systems to keep the two systems in different gene clusters (McCarter, 2001; Canals et al., 2006; Merino et al., 2006; Merino and Tomás, 2009). Of note, *P. shigelloides* is the first member of the *Enterobacteriaceae* with lateral flagella production, as shown herein.

The *Plesiomonas shigelloides* lateral gene cluster is nearly identical to the lateral gene cluster of *A. hydrophila* in gene grouping and transcription direction, with the exception of the group of genes *fliML* to *RL* plus *flhB*-*AL*, which are transcribed in the opposite direction (Canals et al., 2006). However, no *lafK* ortholog could be detected in the *P. shigelloides* lateral gene cluster. This gene has been reported in all other lateral gene clusters, including the non-functional ones in the *Enterobacteriaceae* (Canals et al., 2006; Merino and Tomás, 2009). A non-functional Flag-2 flagella cluster with high similarity to the *V. parahaemolyticus* lateral flagella system was found in different enteroaggregative *E. coli*, *Yersinia pestis*, or *Y. pseudotuberculosis* strains (Ren et al., 2004). However, as we have shown, the *P. shigelloides* lateral gene cluster is fully functional.

The transcriptional hierarchy of *V. parahaemolyticus* lateral flagella is one of the *Gammaproteobacteria* models. LafK (σ54-associated transcriptional activator) is the master regulator in this model, controlling transcription of Class II lateral flagella genes. Class II genes include the σ<sup>28</sup> factor (*fliA*L), which is involved in transcription of Class III lateral flagella genes (Stewart and McCarter, 2003). In *V. parahaemolyticus* the absence of polar flagellum induces the expression of lateral flagella in liquid medium, and LafK is able to compensate for the lack of FlaK (σ54-associated polar transcriptional activator) and activate polar flagellum class promoters. The *A. hydrophila* lateral flagella transcriptional hierarchy represents the second *Gammaproteobacteria* model. Class I gene transcription in *A. hydrophila* lateral flagella is σ70-dependent, as is LafK, in contrast to what is described in *V. parahaemolyticus* (Stewart and McCarter, 2003). It is important to point out that *A. hydrophila* lateral flagella genes are transcribed in liquid and solid or semisolid media, and unlike in *V. parahaemolyticus* the genes are not induced by mutation of polar flagellum genes. The transcription hierarchy of *A. hydrophila* lateral flagella is complex because LafK is not strictly the master lateral flagella regulator, and many clusters of genes are transcribed independently of LafK (Wilhelms et al., 2013). The *A. hydrophila* LafK protein is unable to compensate for the lack of FlrA, which is the polar-flagellum regulator (σ54-associated transcriptional activator for polar flagellum), whereas such compensation occurs in *V. parahaemolyticus* (Wilhelms et al., 2013). This is in agreement with the observation that *A. hydrophila* FlrA mutation does not affect lateral flagella, although it abolishes polar flagellum formation in liquid and on solid surfaces (Wilhelms et al., 2013).

ions at *m/z* 299, 281, 240, 221, and 181. (B) From lateral flagellin, the

fragment ions are indicated with an asterisk (∗).

The *P. shigelloides* polar flagella gene regions show greater similarity to those reported in *Vibrio* or *Aeromonas* than to the regions in *Enterobacteriaceae* [e.g., *E. coli* or *S. typhimurium* (Chilcott and Hughes, 2000)]. Bacteria with peritrichous flagella, such as *E. coli* and *Salmonella*, show three hierarchy levels. The σ<sup>70</sup> factor is required for transcription of class I and II genes; the class I promoter responds to different regulatory factors and transcribes the FlhDC master activator, which allows expression from the class II σ70-dependent promoters. At the top of the *Vibrio* sp. or *A. hydrophila* polar flagella hierarchy is a σ54-associated transcriptional activator (FlrA, named FleQ in *Pseudomonas aeruginosa*), which activates class II genes with σ54-dependent promoters. Class II promoters encode a two component signal-transducing system (FlrBC in *Vibrio* sp. or *A. hydrophila*, and FleSR in *P. aeruginosa*) whose regulator (FlrC/FleR) activates class III genes with σ54-dependent promoters.

In the *P. shigelloides* polar flagella region I only *flrA* and *C* orthologs were observed. *P. shigelloides* FlrA shows the characteristic three domains (FleO, σ<sup>54</sup>-interaction domain and family regulatory protein Fis) as in *Vibrio* sp. or *A. hydrophila* (Kim and McCarter, 2004; Wilhelms et al., 2011). Class II promoters encode a two component signal-transducing system (FlrBC of *Vibrio* sp. or *A. hydrophila* and FleSR in *P. aeruginosa*) whose regulator (FlrC/FleR) activates class III σ54-dependent promoters. However, analysis of the *P. shigelloides* FlrC encoded protein revealed domains corresponding to both FlrB and FlrC. Thus, *P. shigelloides* FlrC contains two domains of *Vibrio* sp. or *A. hydrophila* FlrB (PAS domain and His Kinase A) as well as two domains of *Vibrio* sp. or *A. hydrophila* FlrC (σ54-interaction domain and family regulatory protein Fis). We suggest that *P. shigelloides* FlrC could be able to activate class III genes with σ54-dependent promoters, as observed in *Vibrio* sp. or *A. hydrophila*. No FlrB ortholog was observed in the *P. shigelloides* 302-73 genome (Piqué et al., 2013). It could be suggested that in *P. shigelloides* the FlrB and FlrC functions are performed by a single bifunctional protein encoded by the single *flrC* gene, as happens for some LPS-core biosynthetic genes (Jiménez et al., 2009). Taken together, the data presented herein (no *lafK* and no separate *flrB* in *P. shigelloides*) indicate that its lateral and polar flagella transcriptional hierarchy represents a different *Gammaproteobacteria* model that requires further study.

Within this large *P. shigelloides* polar flagella gene cluster, genes putatively linked to glycosylation were identified between the two polar flagella regions. These genes were not found in the other *Enterobacteriaceae* studied. *O*-glycosylation can be performed by a mechanism that is either dependent on or independent of an oligosaccharyltransferase (OTase; Kim and McCarter, 2004; Iwashkiw et al., 2013). *O*-glycosylation frequently affects protein stability, flagella filament assembly, bacterial adhesion, biofilm formation, and virulence in general, as has been described in several bacteria (Lindenthal and Elsinghorst, 1999; Logan, 2006; Faridmoayer et al., 2008; Egge-Jacobsen et al., 2011; Iwashkiw et al., 2013; Lithgow et al., 2014). The predominant *O*-glycans linked to flagellins are mainly derivatives of pseudaminic acid (PseAc, where Ac represents an acetamido group) and, to a lesser extent, an acetamidino form of legionaminic acid (LegAm, where Am represents acetamidino; Merino et al., 2014). Both are nine-carbon sugars related to sialic acid. The flagellin glycosylation pathways in both cases have been elucidated, including the Pse pathway of *Helicobacter pylori* and *C. jejuni* (Fox, 2002) and the Leg pathway of *C. jejuni* (Schoenhofen et al., 2009). Until now, Leg flagella glycosylation has been restricted to *C. jejuni* or *coli*. The CMP-legionaminic acid biosynthetic pathway in *C. jejuni* involves two steps: synthesis of a GDP-GlcNAc and synthesis of the final CMP-Leg (Schoenhofen et al., 2009). The genes disrupted in the insertional mutants obtained, *ptmA* (H) and *legH* (I), encode key enzymes in the first and second steps of CMP-Leg biosynthesis, consistent with the observation that both mutants are unable to produce polar or lateral flagella.
Furthermore, the in-frame mutants obtained in *pgmL* and *legF*, encoding one enzyme of the first step and the last enzyme of the second step of CMP-Leg biosynthesis, respectively, clearly confirmed the legionaminic acid glycosylation of polar and lateral flagella, as both mutants are unable to produce polar or lateral flagella, as is the case for the insertional mutants.

Mass spectrometry studies show that both flagella in *P. shigelloides* strain 302-73 are glycosylated by a derivative of Leg, as is also indicated by the presence of Leg biosynthetic pathway genes near the polar flagella gene regions. It is the first member of the *Enterobacteriaceae* reported to harbor *O*-glycosylation on both polar and lateral flagella. Moreover, it is also the first bacterium reported to express lateral flagella glycosylated by Leg. We also demonstrated that flagella *O*-glycosylation is essential for bacterial flagella formation, either polar or lateral. However, flagella *O*-glycosylation is not a determinant of cytoplasmic flagellin production, as can be observed from the immunodetection studies.

The *P. shigelloides* homologous recombination rates are extremely high (Salerno et al., 2007), similar to those of naturally transformable species such as *Streptococcus pneumoniae*. In the rest of the *Enterobacteriaceae* the recombination rate is much lower. The high recombination observed in this bacterium could offer a reason for the variety of *P. shigelloides* LPS-core structures (Salerno et al., 2007). The PCR experiments using several *P. shigelloides* strains and lateral flagella or Leg pathway genes, together with the motility and EM studies, demonstrated that the presence of lateral flagella and Leg *O*-flagella glycosylation is a widespread feature, not a strain specific observation. Furthermore, the maintenance of these genes among the different strains, despite the high recombination rate observed for *P. shigelloides*, indicates the importance of glycosylated polar and lateral flagella production for this bacterium.

# Acknowledgments

This work was supported by Plan Nacional de I + D + i (Ministerio de Educación, Ciencia y Deporte and Ministerio de Sanidad, Spain) and by Generalitat de Catalunya (Centre de Referència en Biotecnologia). We thank Maite Polo for her technical assistance and the Servicios Científico-Técnicos of the University of Barcelona.

## References

for swimming and swarming under high-pressure conditions. *Appl. Environ. Microbiol.* 74, 6298–6305. doi: 10.1128/AEM.01316-08

flagellum genes. *J. Bacteriol.* 193, 5179–5190. doi: 10.1128/JB.05355-11

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Merino, Aquilini, Fulton, Twine and Tomás. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# *Bordetella pertussis* Isolates from Argentinean Whooping Cough Patients Display Enhanced Biofilm Formation Capacity Compared to Tohama I Reference Strain

#### *Edited by:*

*German Bou, University Hospital La Coruña, Spain*

#### *Reviewed by:*

*Alberto A. Iglesias, Instituto de Agrobiotecnología del Litoral, UNL-CONICET, Argentina Hülya Ölmez, TÜBITAK Marmara Research Center, Food Institute, Turkey*

#### *\*Correspondence:*

*Osvaldo M. Yantorno yantorno@quimica.unlp.edu.ar; Monika Ehling-Schulz monika.ehlingschulz@vetmeduni.ac.at †These authors have contributed equally to this work.*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 01 September 2015 Accepted: 16 November 2015 Published: 08 December 2015*

#### *Citation:*

*Arnal L, Grunert T, Cattelan N, de Gouw D, Villalba MI, Serra DO, Mooi FR, Ehling-Schulz M and Yantorno OM (2015) Bordetella pertussis Isolates from Argentinean Whooping Cough Patients Display Enhanced Biofilm Formation Capacity Compared to Tohama I Reference Strain. Front. Microbiol. 6:1352. doi: 10.3389/fmicb.2015.01352*

*Laura Arnal1†, Tom Grunert2†, Natalia Cattelan1, Daan de Gouw3,4, María I. Villalba1, Diego O. Serra1,5, Frits R. Mooi6, Monika Ehling-Schulz2\* and Osvaldo M. Yantorno1\**

*<sup>1</sup> CINDEFI–Centro Científico Tecnológico CONICET La Plata, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, Buenos Aires, Argentina, <sup>2</sup> Functional Microbiology, Institute of Microbiology, Department of Pathobiology, University of Veterinary Medicine Vienna, Vienna, Austria, <sup>3</sup> Laboratory of Pediatric Infectious Diseases, Department of Pediatrics, Radboud University Medical Centre, Nijmegen, Netherlands, <sup>4</sup> Laboratory of Medical Immunology, Department of Laboratory Medicine, Radboud University Medical Centre, Nijmegen, Netherlands, <sup>5</sup> Mikrobiologie, Institut für Biologie, Humboldt-Universität zu Berlin, Berlin, Germany, <sup>6</sup> Netherlands Centre for Infectious Disease Control, National Institute for Public Health and the Environment, Bilthoven, Netherlands*

Pertussis is a highly contagious disease mainly caused by *Bordetella pertussis*. Despite the massive use of vaccines since the 1950s, the disease re-emerged around 2000, with a shift in incidence from infants to adolescents and adults. Clearly, the efficacy of current cellular or acellular vaccines, formulated from bacteria grown in stirred bioreactors, is limited, presenting a challenge for future vaccine development. To gain insight into the role of *B. pertussis* biofilm development in host colonization and persistence within the host, we examined the biofilm forming capacity of eight Argentinean clinical isolates recovered from 2001 to 2007. All clinical isolates showed an enhanced potential for biofilm formation compared to the reference strain Tohama I. We further selected the clinical isolate *B. pertussis* 2723, which exhibited the highest biofilm biomass production, for quantitative proteomic profiling by means of two-dimensional fluorescence difference gel electrophoresis (2D-DIGE) coupled with mass spectrometry, accompanied by targeted transcriptional analysis. Results revealed an elevated expression of several virulence factors, including adhesins involved in biofilm development. In addition, we observed a higher expression of energy metabolism enzymes in the clinical isolate compared to the Tohama I strain. Furthermore, all clinical isolates carried a polymorphism in the *bvgS* gene. This mutation was associated with an increased sensitivity to modulation and a faster rate of adhesion to abiotic surfaces. Thus, the phenotypic biofilm characteristics shown by the clinical isolates might represent an important, hitherto underestimated, adaptive strategy for host colonization and long-term persistence within the host.

Keywords: whooping cough, *Bordetella pertussis,* clinical isolates, biofilm, proteomic, real time PCR

# INTRODUCTION

*Bordetella pertussis* is a human-restricted pathogen specifically adapted to infect the respiratory tract, producing whooping cough or pertussis. Despite the success of mass immunization in reducing the incidence of the disease in the 1950s, after six decades of sustained high vaccination coverage, whooping cough remains endemic with epidemic cycles every 2–5 years (Mooi et al., 1998, 2001, 2009, 2014; Heikkinen et al., 2007; King et al., 2008). Although the disease has been associated with an acute infection, mainly affecting unvaccinated infants aged <6 months, in the last two decades a shift in the incidence toward vaccinated children, adolescents, and adults has become increasingly evident (Cherry, 1999; Hellenbrand et al., 2009; de Greeff et al., 2010). This new scenario represents a significant health concern, since these individuals could provide reservoirs for *B. pertussis* transmission. Several reasons for the resurgence and persistence of pertussis in the population have been discussed, including waning immunity over time, variation between circulating isolates and vaccine strains as a result of constant pathogen adaptation, and reduced efficiency of vaccine formulations (He et al., 2003; Hewlett and Edwards, 2005; Brinig et al., 2006; Bottero et al., 2007; Berbers et al., 2009; Elomaa et al., 2009; Lavine et al., 2011). For the commercial production of both cellular and acellular vaccines, *B. pertussis* cells are grown in stirred bioreactors operated in batch culture. This planktonic (free-floating) mode of growth does not reflect the lifestyle of the pathogen in its host, where bacteria must primarily adhere to ciliated respiratory epithelial cells; in this hostile environment, bacteria must resist mucociliary clearance and avoid the immune system's mechanisms, adjusting their growth state and virulence accordingly to survive.

Numerous reports provide evidence that the ability of pathogens to adhere and grow attached to tissue surfaces in microbial communities, known as biofilms, is crucial for the development and progression of human infections (Costerton et al., 1999; Hall-Stoodley et al., 2004). Generally, biofilm development, which is often associated with an enhanced resistance to antimicrobial agents and host defenses, is considered an important survival strategy for bacteria (Ito et al., 2009; Hoiby et al., 2010; Gurung et al., 2013). In addition, the switch from planktonic to biofilm lifestyle is accompanied by significant changes in bacterial metabolism and phenotypic features, which represent a unique challenge for the development of novel prophylactic therapeutics. We as well as others have shown the capacity of *Bordetella* spp. to grow adhered to abiotic and biotic surfaces as biofilms (Irie et al., 2004; Mishra et al., 2005; Serra et al., 2008, 2011). The two-component sensory transduction system BvgAS, which controls the expression of nearly all known virulence factors, was reported to play an important role in the regulation of biofilm formation in these bacteria (Irie et al., 2004; Mishra et al., 2005). However, the role of biofilm in *B. pertussis* pathogenesis is not yet fully understood and, with a few exceptions (de Gouw et al., 2014), this mode of growth is still largely ignored when new antigens are selected for the formulation of novel pertussis vaccines. Thus, the aim of this study was to compare biofilm formation by a well-characterized reference strain and eight *B. pertussis* clinical isolates, retrieved from children with pertussis during a 7-year period in Argentina.
A comparative analysis employing proteomics, targeted transcriptomics, and complementary genetic studies, including the reference strain Tohama I (which has been sub-cultured *in vitro* since the 1950s) and the clinical isolate Bp2723 (selected for its high capacity for biofilm growth), was carried out to gain insight into the mechanisms responsible for the different behavior of sessile cells exposed to similar external conditions. Our results support the hypothesis that the phenotypic heterogeneity between the reference strain and clinical isolates reflects a specific adaptation of clinical *B. pertussis* to its host. Interestingly, a single nucleotide polymorphism in the *bvgS* gene was found in all clinical isolates, which might indicate an intrinsic feature of the circulating cells. This mutation allowed a fast adaptive response of modulated cells (vir-), incubated under non-modulating conditions, to adhere to abiotic surfaces. Our results foster the hypothesis that these bacteria have developed a repertoire of mechanisms that enable an adaptive response to growth adhered to surfaces, allowing these cells to persist in unfriendly environments.

# MATERIALS AND METHODS

# Bacterial Strains and Culture Conditions

*Bordetella pertussis* Tohama I strain (isolated in Japan in the 1950s), obtained from the Collection of Institute Pasteur, Paris, France (CIP 8132); BPSM, a streptomycin resistant (Smr) strain derivative from Tohama I; BpK705E, a mutant derivative of BPSM in which the amino acid lysine (K) at position 705 of BvgS has been replaced by glutamic acid (E) (Herrou et al., 2009); and eight *B. pertussis* clinical isolates collected at La Plata Children's Hospital (Hospital Interzonal de Agudos Especializado en Pediatría Sor Maria Ludovica, La Plata, Argentina) from 2001 to 2007 (**Table 1**) were used throughout this study. Stock cultures were grown on Bordet–Gengou agar (Difco Laboratories, Detroit, MI, USA) plates supplemented with 1% w/v Bactopeptone (Difco) and 15% v/v defibrinated sheep blood (Instituto Biológico, Ministerio de Salud, La Plata, Argentina; BGA) for 72 h at 37°C. Colonies were cultured for another 48 h and then inoculated into 250-mL Erlenmeyer flasks containing 50 mL of Stainer–Scholte (SS) broth and incubated for 24 h at 37°C on a rotatory shaker (160 rpm). The suspension was used as inoculum for 1-L Erlenmeyer flasks containing 200 mL of SS broth. The initial optical density at 650 nm (OD650) was adjusted to 0.2 and the flasks were incubated for 24 h with agitation. Bacteria were harvested (15°C, 8000 × *g*, 20 min) at exponential phase, frozen in liquid nitrogen for 30 s, and stored for 48 h at –80°C before being freeze-dried. To study the growth kinetics in liquid medium, samples were collected every 2 h and the OD650 was measured. Three independent experiments were performed for each strain; averages and standard deviations of the experimental data obtained are indicated in the figures. Specific growth rates (μ) were obtained from plots of ln OD650 vs. time.
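The specific growth rate described above is the slope of ln OD650 against time during exponential growth. The sketch below illustrates that calculation with an ordinary least-squares fit; the OD650 values are synthetic (generated with μ = 0.1 h<sup>−1</sup> from the stated starting OD of 0.2), not data from the study, and the function name is hypothetical.

```python
import math


def specific_growth_rate(times_h, od_values):
    """Least-squares slope of ln(OD) versus time, in 1/h."""
    log_od = [math.log(od) for od in od_values]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(log_od) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, log_od))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    return sxy / sxx


# Synthetic exponential-phase readings, sampled every 2 h as in the protocol.
times_h = [0, 2, 4, 6, 8]
od650 = [0.2 * math.exp(0.1 * t) for t in times_h]

mu = specific_growth_rate(times_h, od650)
doubling_time_h = math.log(2) / mu

print(f"mu = {mu:.3f} 1/h, doubling time = {doubling_time_h:.2f} h")
```

In practice only the time points lying on the linear portion of the ln OD650 curve should enter the fit, since lag and stationary phases would bias the slope.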

# Biofilm Cultures

Biofilm growth was performed as indicated previously (de Gouw et al., 2014). Briefly, for each *B. pertussis* strain, a bacterial suspension of planktonic cells (24 h of growth), adjusted to an OD650 = 1.0 (1.0 × 10<sup>9</sup> CFUs/mL), was incubated with 20 g of polypropylene beads (4.2 mm diameter and 2 mm high, average density: 0.901 g/cm<sup>3</sup>, Petroken SA, Argentina) contained in glass tubes for 4 h at 37°C under static conditions. The cell suspension was drained and 20 mL of fresh medium was added to each reactor (glass tubes). The tubes were incubated for 72 h on roller drums under 60 rpm agitation. The culture medium was replaced every 24 h with fresh broth. Before harvest, the beads were washed three times with phosphate-buffered saline (PBS) and then used for crystal violet staining. Growth kinetics for sessile cells of the *B. pertussis* 2723 strain were evaluated by analyzing colony forming units (CFU) every 24 h until 120 h of development. These experiments were performed in triplicate. In a similar way, a semi-continuous biofilm culture was performed to obtain samples suitable for analysis by confocal laser scanning microscopy (CLSM). Duplicates of Bp Tohama I and Bp2723 biofilms were grown attached to glass cover slips. In a first step, bacteria from a 24 h planktonic culture were incubated for 4 h with the cover slips inside a Petri dish. Then, the broth was replaced with fresh medium and incubated under agitation (60 rpm) for 72 h. Every 24 h the medium was replaced with fresh broth. After 72 h the cover slips were carefully washed with PBS and stained for CLSM analysis.

For proteomic studies, sessile cells were cultured on polypropylene beads contained in column bioreactors as previously described (Serra et al., 2008) with minor modifications. Briefly, *B. pertussis* Tohama I strain or the clinical isolate *B. pertussis* 2723 were grown in 200 mL of SS broth for 24 h and then used to inoculate column reactors. After 4 h of static incubation to allow cell attachment, the suspension was drained to remove non-adhered cells and 200 mL of fresh SS broth was added to each column. Bioreactors were incubated with a constant air supply (0.1 vvm) at 37°C for 72 h (mature biofilm stage). Every 24 h the broth was replaced with fresh medium. Afterward, polypropylene beads were washed three times with PBS prior to harvesting the cells. The biofilm was detached from the surface by light agitation in PBS; subsequently, cells were harvested (15°C, 8000 × *g*, 20 min), immediately frozen in liquid nitrogen, and stored at –80°C before being freeze-dried.
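For readers unfamiliar with the unit, vvm denotes volumes of air per volume of liquid per minute. The short sketch below converts the stated 0.1 vvm into an absolute air flow, assuming the 200 mL of broth per column is the working volume (the text does not state the column's working volume separately); the function name is hypothetical.

```python
def air_flow_ml_per_min(vvm, liquid_volume_ml):
    """Absolute air flow for a given aeration rate (vvm) and liquid volume."""
    return vvm * liquid_volume_ml


# Assumption: the working volume equals the 200 mL of SS broth per column.
flow = air_flow_ml_per_min(0.1, 200.0)

# Total air delivered over the 72 h incubation, in liters.
total_air_l = flow * 60 * 72 / 1000

print(f"air flow: {flow:.1f} mL/min, total over 72 h: {total_air_l:.1f} L")
```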

# Fourier Transform Infrared Spectroscopy (FT-IR)

For infrared analysis of biofilms, each strain was grown in 6 well plates. After 72 h incubation, the biofilms were washed three times with distilled water and the biomass attached to the wells was suspended in bi-distilled water, adjusting the OD650 to 10. Samples were prepared from three independent assays, in triplicate in each case. One hundred microliters of each bacterial suspension were transferred to a zinc selenide (ZnSe) optical cell and vacuum dried (3.6 kPa) to obtain transparent films on the cell. FT-IR absorption spectra from 4,000 to 600 cm<sup>−1</sup> were acquired with an FT-IR spectrometer (Spectrum One, Perkin Elmer, USA) as reported (Serra et al., 2007). Infrared analysis was carried out by means of OPUS 7.0 software (Bruker Optics, USA).

# Quantification of Biofilm Biomass

Biofilm biomass was quantified using the crystal violet assay described by O'Toole and Kolter (1998), adapted to the system. The absorbance of the solubilized dye was measured at 590 nm (OD590). Triplicates were made for each strain, and Student's *t*-test was used to compare the absorbance against that of the *B. pertussis* Tohama I biofilm. Samples were considered significantly different when *p* ≤ 0.05. CLSM was used to study the architecture of 72 h biofilms and to obtain quantitative information about them. An inverted confocal microscope (Carl Zeiss LSM510-Axiovert 100M, Germany) was used as previously reported (Serra et al., 2007). Briefly, biofilms from semi-continuous culture, adhered to glass cover slips, were first washed very carefully in PBS and fixed with 4% paraformaldehyde. The adhered cells were then rinsed in PBS, stained for 20 min with Acridine Orange, and washed three times. To obtain quantitative information on mature biofilm structure, images were analyzed with COMSTAT software (Heydorn et al., 2000).
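The statistical comparison described above can be sketched as follows. This is a minimal illustration, assuming an equal-variance two-sample Student's *t*-test on triplicates (4 degrees of freedom, two-tailed critical value 2.776 at α = 0.05). The OD590 readings and the function name `student_t` are hypothetical and are not data or code from this study.

```python
import math
import statistics

# Two-tailed critical t value for alpha = 0.05 and 4 degrees of freedom
# (two groups of three replicates each).
T_CRIT_DF4 = 2.776

def student_t(sample_a, sample_b):
    """Return the equal-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(sample_a)
                  + (n_b - 1) * statistics.variance(sample_b)) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / std_err, df

# Hypothetical OD590 readings (assumed values, for illustration only).
tohama_od590 = [0.40, 0.45, 0.42]
isolate_od590 = [2.00, 2.10, 1.95]

t_stat, df = student_t(isolate_od590, tohama_od590)
significant = abs(t_stat) > T_CRIT_DF4  # equivalent to p <= 0.05 for df = 4
```

Comparing the statistic against a tabulated critical value avoids a dependency on a statistics library; a package that reports exact *p* values would serve equally well.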

# Preparation of Soluble Protein Fraction

Cytosolic proteins were obtained following the protocol described by Ehling-Schulz et al. (2002) with minor modifications. Planktonic and sessile freeze-dried bacteria were suspended in detergent buffer containing 7 M urea, 2 M thiourea, 4% CHAPS and 30 mM TRIS (Sigma, St. Louis, MO, USA), cooled, and then passed twice through a precooled French pressure cell (SLM AMINCO) operated at 140 MPa. Cellular debris was removed by centrifugation (4 °C, 10000 × *g*, 15 min). The supernatant was transferred to ultracentrifuge tubes (Beckman, USA) and centrifuged at 30000 × *g* for 40 min at 15 °C. The supernatant containing cytosolic proteins was stored in aliquots at –80 °C. Protein concentration was estimated using the 2-D Quant kit following the manufacturer's protocol (GE Healthcare, Amersham Biosciences).

# 2D-DIGE

For two-dimensional difference gel electrophoresis (2D-DIGE), protein samples were minimally labeled as previously described (Radwan et al., 2008) with minor modifications. CyDye DIGE fluorescent dyes (GE Healthcare Life Science, Munich, Germany) were used to label 33 μg of protein per sample using 8 nmol dye/mg protein. For each mode of growth, three biological replicates were used. Biofilm and planktonic samples from each strain were labeled with Cy3 and Cy5. The internal standard, comprising a pool of equal amounts of all samples, was labeled with Cy2. Isoelectric focusing was carried out on an IPGphor III system (GE Healthcare, Amersham Biosciences) using 18 cm IPG Dry strips with linear pH gradients of 4–7 and 6–9 (all GE Healthcare, Amersham Biosciences). The IPG strips were rehydrated overnight with rehydration buffer [7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 0.4% (w/v) DTT, 0.5% carrier ampholytes] at room temperature. DTT 0.4% (w/v) and carrier ampholytes 0.5% (v/v) were added to the mixed protein samples in detergent buffer, and the final volume was adjusted to 50 μL with rehydration buffer. Protein samples were then loaded onto the strips via loading cups. The pH 4–7 strips were focused for a total of 36 kVh and the pH 6–9 strips for a total of 27 kVh. The IPG strips were reduced with 1% (w/v) DTT for 15 min and alkylated with 4% (w/v) iodoacetamide for 15 min in equilibration buffer [6 M urea, 30% (v/v) glycerin, 2% (w/v) SDS]. SDS-PAGE (12.5% T) was subsequently performed overnight (13 mA per gel) using an Ettan Dalt Six Electrophoresis Chamber (GE Healthcare, Amersham Biosciences).
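As a quick check of the labeling ratio stated above, the amount of dye per sample follows directly from the protein load. The sketch below is purely illustrative; the function name `dye_amount_pmol` is hypothetical and not part of any protocol or vendor software.

```python
def dye_amount_pmol(protein_ug, dye_nmol_per_mg=8.0):
    """Return the dye amount (pmol) needed to label `protein_ug` micrograms of protein."""
    protein_mg = protein_ug / 1000.0          # ug -> mg
    dye_nmol = protein_mg * dye_nmol_per_mg   # nmol of dye at the stated ratio
    return dye_nmol * 1000.0                  # nmol -> pmol

# 33 ug of protein at 8 nmol dye per mg protein, as stated in the text.
dye_per_sample = dye_amount_pmol(33.0)
```

At the stated ratio, each 33 μg sample therefore receives roughly 264 pmol of dye.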

# Imaging and Data Processing

Fluorescence images of the gels were acquired using a Typhoon 9400 scanner (GE Healthcare). Data analysis was performed with the DeCyder software version 7.0 (GE Healthcare). Spot detection, matching, and normalization were performed using a standard algorithm of the software. Spot matching and the spot quality of proteins of interest were checked manually to eliminate false positives. To assess the reproducibility of gel replicates, principal component analysis (PCA) was carried out with the DeCyder analysis module. For PCA, all spots within the ANOVA 95% confidence interval were included. Spots showing a more than threefold change (*p* ≤ 0.05) in abundance between the strains growing as biofilm or in planktonic mode were considered significantly different, manually excised from silver-stained gels (Blum et al., 1989; Miller and Gemeiner, 1992), and subjected to mass spectrometry for protein identification.
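The selection rule stated above (more than threefold change, *p* ≤ 0.05) can be expressed as a simple filter. This is a sketch only: the `Spot` record, the function `is_significant`, and the example values are hypothetical, and it assumes that "threefold change" applies in either direction (up or down). It does not reproduce the DeCyder implementation.

```python
from dataclasses import dataclass

@dataclass
class Spot:
    spot_id: int
    ratio: float    # abundance ratio between the two conditions compared
    p_value: float

def is_significant(spot, min_fold=3.0, alpha=0.05):
    """Apply the fold-change and p-value thresholds to one spot."""
    # Express down-regulation (ratio < 1) as a fold change greater than 1.
    fold = spot.ratio if spot.ratio >= 1 else 1 / spot.ratio
    return fold > min_fold and spot.p_value <= alpha

# Hypothetical spots (assumed values, for illustration only).
spots = [
    Spot(1, 4.2, 0.01),    # up-regulated, significant
    Spot(2, 0.25, 0.03),   # fourfold down-regulated, significant
    Spot(3, 3.5, 0.20),    # large change but p above threshold
    Spot(4, 1.8, 0.001),   # small p but change below threefold
]
selected = [s.spot_id for s in spots if is_significant(s)]
```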

# In-gel Trypsin Digestion and MS-Based Protein Identification

Protein identification was carried out using a matrix-assisted laser desorption/ionization time-of-flight (MALDI–TOF) mass spectrometer (Bruker Daltonics, Ultraflex I) in MS and MS/MS modes. Spot destaining, in-gel digestion, and sample purification using ZipTip μ-C18 (Millipore) pipette tips were performed as reported previously (Blum et al., 1989). Samples were applied to a disposable target plate (Bruker Daltonics, PAC target) prespotted with α-cyano-4-hydroxycinnamic acid as matrix. Spectral pre-processing and peak annotation were carried out using FlexAnalysis 3.0 and Biotools 3.2 (Bruker Daltonics). Processed MS and MS/MS spectra were submitted to a MASCOT server (Matrix Science) and searched against the NCBInr database, restricted to *B. pertussis* Tohama I strain. Peptide mass fingerprint (PMF) search parameters were set as follows: mass accuracy, <150 ppm; fixed modification, carbamidomethylation; variable modifications, methionine oxidation and acetylation at the protein N-terminus; missed cleavages, one. Based on the measured PMF, at least one peptide was selected for MS/MS experiments. Search parameters were identical to those of the PMF experiments, except for the product ion tolerance (±1.0 Da). A protein was considered identified if the scores of the database searches clearly exceeded the algorithm's significance threshold (*p* < 0.05) for PMF data and for sequence tag ion analyses of at least one peptide.
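To make the mass accuracy criterion concrete, the relative error between an observed and a theoretical peptide mass can be computed in parts per million and compared with the 150 ppm limit. The function names and the example masses below are hypothetical; this sketch illustrates the tolerance check only and is not how MASCOT scores matches.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Return the signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=150.0):
    """True if the absolute mass error is below the tolerance (default: 150 ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm

# Hypothetical peptide masses (assumed values, for illustration only).
accepted = within_tolerance(1500.15, 1500.00)   # error of about 100 ppm
rejected = within_tolerance(1500.30, 1500.00)   # error of about 200 ppm
```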

# RNA Isolation, cDNA Synthesis, and Quantitative Real-time PCR (qRT-PCR)

Total RNA was isolated from planktonic and biofilm bacteria using Trizol reagent (Life Technologies, Invitrogen) following the manufacturer's instructions. The RNA was treated with DNase I (Promega, Madison, WI, USA) to remove contaminating DNA, and cDNA synthesis was performed using hexamer primers (Promega, Madison, WI, USA) and M-MLV reverse transcriptase (Invitrogen, Carlsbad, CA, USA) following the supplier's protocol. Specific primers were used to determine transcript levels of the selected genes (**Table 2**). SYBR premix (Thermo Scientific) was used for qPCR assays following the manufacturer's instructions. Reactions were carried out on triplicate samples, including technical duplicates. Relative mRNA expression ratios of selected genes were normalized to the expression of 16S rRNA. Calculations for comparison between samples were performed using the ΔΔ*CT* method (where *CT* is the threshold cycle) as described by Conover et al. (2012). Where primer efficiencies differed, the calculation was modified following the method described by Pfaffl (2001).
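The two calculations named above can be sketched in a few lines. The first function implements the standard 2<sup>−ΔΔCT</sup> ratio, which assumes a perfect amplification efficiency of 2 for both genes; the second implements the efficiency-corrected ratio of Pfaffl (2001). The function names, the choice of "sample" and "control" labels, and the *CT* values are assumptions made for illustration and are not taken from this study.

```python
def ddct_ratio(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCT method (efficiency of 2 assumed)."""
    dct_sample = ct_target_sample - ct_ref_sample      # normalize to reference gene
    dct_control = ct_target_control - ct_ref_control
    return 2 ** -(dct_sample - dct_control)

def pfaffl_ratio(e_target, e_ref,
                 ct_target_control, ct_target_sample,
                 ct_ref_control, ct_ref_sample):
    """Efficiency-corrected relative expression (Pfaffl, 2001)."""
    return (e_target ** (ct_target_control - ct_target_sample)
            / e_ref ** (ct_ref_control - ct_ref_sample))

# Hypothetical CT values: target gene and 16S rRNA reference in a sample
# (e.g., clinical isolate) and a control (e.g., reference strain).
ratio = ddct_ratio(ct_target_sample=22.0, ct_ref_sample=10.0,
                   ct_target_control=24.0, ct_ref_control=10.0)

# With both efficiencies equal to 2, the Pfaffl ratio reduces to 2^-ddCT.
ratio_pfaffl = pfaffl_ratio(2.0, 2.0, 24.0, 22.0, 10.0, 10.0)
```

In this example the target transcript crosses the threshold two cycles earlier in the sample at an unchanged reference *CT*, which corresponds to a fourfold higher relative expression.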

# DNA Sequencing and Data Analysis

The *bvgS* gene was sequenced for all clinical isolates used in this study. In addition, the following genes and their promoter regions were sequenced for the clinical isolate Bp2723 using the primers listed in **Table 2**: *fhaB*, *ptxS1*, *fim3*, *prn*, *bsp22*, *bcrH2*, *vag8*, *brkA*, and *bvgA*. Chromosomal PCR amplification followed the procedure described by Van Loo and Mooi (2002). Briefly, 1 μL of DNA was added to 19 μL of buffer comprising 50% HotStar Taq Master Mix (Qiagen), 1 μM of each primer, and 5–10% dimethyl sulfoxide. Amplification of genes was performed in a Hybaid Omnigene incubator. The PCR fragments were purified with the QIAquick PCR purification kit (Qiagen) and sequenced with the primers used for amplification and with internal primers (not shown). Sequence reactions were performed with an ABI Prism Big Dye terminator reaction kit, and the reactions were analyzed using a 377 or 3700 ABI DNA Sequencer (Perkin–Elmer, Applied Biosystems). The resulting sequences were searched against the NCBI nucleotide or non-redundant protein database using the BLAST tool.



# RESULTS

# Planktonic Growth and Biofilm Formation Capacity on Abiotic Surfaces by *B. pertussis* Tohama I Strain and Clinical Isolates

Using a collection of eight clinical isolates recovered in Argentina from whooping cough patients and the reference strain *B. pertussis* Tohama I, a comparative growth analysis under planktonic conditions in SS broth was performed. **Figure 1** shows the growth kinetics for Tohama I strain and the clinical isolates *B. pertussis* 2723 (Bp2723), *B. pertussis* 892, *B. pertussis* 1918, and *B. pertussis* 492. These four isolates are depicted as representative of the clinical isolates, all of which showed similar growth behavior (data not shown). At stationary phase, the planktonic biomass of the isolates (measured by optical density) was approximately 70% higher than the biomass reached by *B. pertussis* Tohama I strain. The specific growth rate (μ) of each strain was calculated from batch cultures. *B. pertussis* 2723, as well as the other clinical isolates, exhibited similar specific growth rates of 0.091 ± 0.003 h<sup>−1</sup>, while the μ of Tohama I strain under the same experimental conditions was significantly lower (0.052 ± 0.002 h<sup>−1</sup>). Next, we compared the adhesion and the mature biofilm biomass of the clinical isolates on an abiotic surface. All clinical isolates showed higher adhesion to polypropylene beads after 4 h of static incubation (data not shown) and higher biofilm biomass production after 72 h of culture (mature biofilm) compared to the reference strain (**Figure 2**). However, the final sessile biomass was different for each isolate. The *B. pertussis* 2723 strain was selected for further analysis since it exhibited fivefold more biofilm biomass as well as 70% more biomass under planktonic conditions compared to the reference strain Tohama I. The growth kinetics of this clinical isolate and *B. pertussis* Tohama I strain growing as biofilm were also studied. The clinical isolate Bp2723 showed a final biomass of 2.6 × 10<sup>10</sup> CFU/cm<sup>2</sup>, while the biomass of the reference strain was 2.5 × 10<sup>9</sup> CFU/cm<sup>2</sup>. In addition, the specific growth rate was 0.033 ± 0.002 h<sup>−1</sup> for Bp2723 and 0.028 ± 0.001 h<sup>−1</sup> for Tohama I strain. Next, to differentiate the structure of the mature biofilms produced by the Bp2723 clinical isolate and Tohama I strain, micrograph images of biofilms were obtained from CLSM stacks (Supplementary Figure S1). The images were then analyzed using COMSTAT software. The images showed a characteristic biofilm architecture with channels for both Bp Tohama I and Bp2723. A more detailed analysis revealed that the clinical strain produced a larger biofilm, with a maximum thickness roughly twice that achieved by the reference strain. Five architectural parameters calculated for the two biofilms are provided in **Table 3**. These parameters show significant differences not only in the thickness of the biofilms but also in the covered surface and the roughness coefficient, which were higher for the clinical strain biofilm. 
This analysis revealed an apparently higher complexity of the biofilm produced by Bp2723 compared to the biofilm produced by the reference strain and confirms its enhanced biomass production under this culture condition. To gain insight into the nature of the biofilm developed by the clinical isolate, we employed FT-IR spectroscopy to compare the chemical composition of the biofilms developed by the reference strain and the clinical isolate growing under similar environmental conditions. FT-IR spectroscopy, one of the most frequently used spectroscopic techniques for comparing the biochemical composition of biological samples, in association with PCA showed clear chemical differences between the FT-IR spectra obtained from the two biofilm communities (**Figure 3**). Differences in biomass composition were detected between spectral areas assigned to proteins and carbohydrates. The protein:carbohydrate ratio was 1.684 for Bp2723 and 1.142 for *B. pertussis* Tohama I strain. Based on this information, we decided to explore the molecular basis for these significant phenotypic differences between the reference strain and the selected clinical isolate.
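The specific growth rates reported above can be related to doubling times through the standard exponential growth model. The sketch below assumes purely exponential growth between two measurements; the function names are hypothetical, and the source does not state which time points were used to derive μ.

```python
import math

def specific_growth_rate(biomass_start, biomass_end, hours):
    """Specific growth rate (h^-1) assuming exponential growth between two readings."""
    return math.log(biomass_end / biomass_start) / hours

def doubling_time(mu):
    """Doubling time (h) for a specific growth rate mu (h^-1)."""
    return math.log(2) / mu

# Specific growth rates reported in the text (h^-1).
td_isolates_planktonic = doubling_time(0.091)   # about 7.6 h
td_tohama_planktonic = doubling_time(0.052)     # about 13.3 h
td_bp2723_biofilm = doubling_time(0.033)        # about 21.0 h
```

The value obtained for sessile Bp2723 cells agrees with the duplication time of 21.0 h quoted in the Discussion.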

TABLE 3 | Biometric parameters obtained from 72 h biofilms formed by *Bordetella pertussis* wild type (Bp Tohama I) and the clinical isolate *B. pertussis* 2723.

*The features of the biofilms were quantified using the program COMSTAT. Parameters correspond to mean values. Standard errors are indicated.*

# 2D-DIGE Analysis and Protein Identification

*Bordetella pertussis* Tohama I strain and the clinical isolate *B. pertussis* 2723 were grown in parallel as planktonic cells for 24 h in SS medium (exponential phase) and as biofilm on polypropylene beads for 72 h (mature stage). To investigate the differentially expressed proteins in the reference strain and the clinical isolate under both culture conditions, a comparative proteomic analysis was performed. The soluble cellular protein fraction was isolated from three replicates per strain and growth condition and subjected to differential 2D-DIGE analysis in two pH ranges (4–7 and 6–9), and a PCA was carried out. As shown in Supplementary Figure S2, the statistical analysis of the biological replicates clearly indicates a distinct clustering of the four groups, demonstrating high reproducibility between the replicate samples. In addition, the analysis demonstrates that the highest variation was strain-dependent (PC1), whereas PC2 discriminates between the growth conditions. Representative 2D electrophoresis patterns of bacterial proteins based on the internal standard sample for both pH ranges are depicted in **Figure 4**. The global proteome analysis showed that, out of a total of 1275 spots analyzed, 65 proteins (5.1%) were differentially expressed in *B. pertussis* 2723 compared to the reference strain growing attached to a surface and under planktonic culture conditions. Forty-eight differentially expressed protein spots were selected for protein identification based on a combination of selection criteria published elsewhere (Radwan et al., 2008). MALDI–TOF MS/MS analysis resulted in the identification of 35 different proteins and/or protein species (Supplementary Table S1). In comparison to Tohama I strain, the clinical isolate showed 10 up-regulated proteins (*p* < 0.05) and five down-regulated proteins (*p* < 0.05) when growing under biofilm conditions, and 27 up-regulated and eight down-regulated proteins when growing under planktonic conditions. These differentially expressed proteins can be assigned to five functional categories, namely metabolism and energy production, amino acid and protein synthesis, transport, virulence, and cellular process (Supplementary Table S1). 
More specifically, within the metabolic group, four proteins related to energy production [aconitate hydratase (*acnB*), dihydrolipoamide succinyltransferase component of the 2-oxoglutarate dehydrogenase complex (*odhB*), citrate synthase (*gltA*), and enoyl-CoA hydratase/isomerase (*acnA*)] were found in higher abundance in the clinical isolate under both culture conditions. Three out of four proteins belonging to the "amino acids and protein biosynthesis" pathways were down-regulated in the clinical isolate 2723. These proteins correspond to enzymes involved in phenylalanine, tyrosine, tryptophan, and lysine biosynthesis. The fourth protein, cystathionine beta-lyase (*metC*), which is related to methionine synthesis and sulfur metabolism, was up-regulated in *B. pertussis* 2723. In addition, three proteins involved in amino acid transport [leu/ile/val (branched-chain amino acid)-binding protein (*livJ*, Q7VYN1), amino acid-binding periplasmic protein (Q7VS83), and ABC transporter ATP-binding protein (Q7VTG4)] and four proteins related to stress response and adaptation [putative zinc protease (Q7VVY4), antioxidant protein (Q7VZE7), chaperone protein DnaJ (Q7VVY3), and protein tex (Q45388)] were up-regulated in the clinical isolate under both culture conditions (Supplementary Table S1). Notably, four BvgAS-activated virulence factors, BcrH2, OmpQ, Vag8, and BrkA, were found in higher abundance in the clinical isolate 2723 than in the reference strain under both growth conditions. One of these proteins, BcrH2, is a chaperone protein of the Type III secretion system (T3SS). The T3SS is an injection system that delivers virulence factors into host cells, altering their physiological functions (Mattoo et al., 2004; Medhekar et al., 2009). It is known to protrude from the outer membrane of many Gram-negative bacteria. Another BvgAS-regulated protein found to be differentially expressed in the clinical isolate is OmpQ, an outer membrane porin protein that so far has no clearly assigned function (Finn et al., 1995). In addition, Vag8 and BrkA were also found in higher abundance in the clinical isolate. Vag8 is an autotransporter recently described to participate in serum resistance and to bind C1; BrkA, another serum resistance protein of *B. pertussis*, was reported to be necessary for efficient colonization of mice (Finn and Amsbaugh, 1998; Barnes and Weiss, 2001; Marr et al., 2011). These latter observations were particularly interesting since we detected four BvgAS-activated proteins highly expressed in the soluble fraction of the clinical isolate compared to the Tohama I strain. As BvgAS-activated genes play an essential role in biofilm development, we next studied the expression of BvgAS-regulated adhesins in both strains under biofilm and planktonic culture conditions.

# qRT-PCR for *bvgAS* and its Positively Regulated Genes

qRT-PCR was performed to measure mRNA expression levels of the BvgAS-regulated genes *fhaB*, *fim* (*fim2* for Tohama I and *fim3* for Bp2723), *prn*, and *bipA* in cells grown under biofilm and planktonic conditions. In addition, the *vag8*, *brkA*, *bcrH2*, and *ompQ* genes, encoding the four proteins that showed increased expression in *B. pertussis* 2723 compared to Tohama I strain in the cytosolic proteome (Supplementary Table S1), as well as the *bvgS*, *bvgA*, and *ptx* genes, were included in the analysis. This approach revealed higher mRNA expression levels of *vag8*, *bcrH2*, *prn*, and *brkA* in planktonic cells of the clinical isolate compared to Tohama I strain (**Figure 5A**). Interestingly, significantly higher mRNA expression levels of the *fhaB*, *fim3*, *ptxS1*, *bipA*, *vag8*, *prn*, *bvgA*, *bsp22*, and *brkA* genes were found in sessile cells of the clinical isolate compared to the reference strain (*p* < 0.05; **Figure 5B**). The higher expression of the *fhaB*, *prn*, *fim3*, and *bipA* genes may be associated with the increased biofilm formation capacity displayed by the clinical isolate, since these genes encode adhesins reported to participate in the development of *B. pertussis* mature biofilm (Serra et al., 2011; Sugisaki et al., 2013; de Gouw et al., 2014).

# Sequence Analysis of *bvgA*, *bvgS*, Virulence Genes, and Promoters

To determine whether specific nucleotide changes had occurred within virulence genes or their promoter regions in Bp2723, the respective DNA sequences and their flanking regions were analyzed. No changes were detected when compared with the published sequences of Tohama I strain (GenBank accession number BX470248.1). However, one nucleotide polymorphism was observed in Bp2723: a single exchange of an A for a G at position 2113 of the *bvgS* gene (**Figure 6**), resulting in the replacement of K by E in the amino acid sequence of the BvgS protein. Based on this result, we sequenced the *bvgS* gene of all clinical isolates used in this study and, surprisingly, found the same mutation in each clinical isolate (**Figure 6**). Interestingly, the same nucleotide exchange was previously described by Herrou et al. (2009) for strains circulating in the Netherlands.
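The correspondence between nucleotide position 2113 and amino acid position 705 can be verified with a short calculation. The sketch assumes that nucleotide numbering starts at the first base of the *bvgS* start codon; the function names are hypothetical. It relies only on the standard genetic code, in which AAA and AAG encode lysine (K) and GAA and GAG encode glutamic acid (E).

```python
def codon_position(nt_pos):
    """Map a 1-based nucleotide position to (codon number, position within codon)."""
    return (nt_pos - 1) // 3 + 1, (nt_pos - 1) % 3 + 1

# Standard genetic code entries for the two residues involved.
LYS_CODONS = {"AAA", "AAG"}
GLU_CODONS = {"GAA", "GAG"}

def mutate_first_base(codon, new_base):
    """Return the codon with its first base replaced by `new_base`."""
    return new_base + codon[1:]

codon_number, base_in_codon = codon_position(2113)

# An A-to-G exchange at the first base converts either lysine codon
# into a glutamic acid codon.
mutated = {mutate_first_base(codon, "G") for codon in LYS_CODONS}
```

Position 2113 is the first base of codon 705, so an A-to-G exchange at that base is consistent with the K-to-E replacement at position 705 of BvgS.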

# Effect of E at 705 Position of BvgS Sensor on the Biofilm Formation Ability of *B. pertussis*

To evaluate whether the mutation detected in the *bvgS* gene could affect biofilm formation ability, *B. pertussis* 2723 and *B. pertussis* Tohama I strain were tested for their capacity for biofilm development. Growth was evaluated in tube bioreactors containing polypropylene beads. The mutant strain BpK705E, with an E at position 705 of BvgS, and the wild type strain (BPSM), with a K at position 705, were also included in this analysis. Our results did not reveal any significant differences in the mature biofilm biomass of BpK705E strain, the wild type BPSM, and Tohama I strain after 72 h of culture (data not shown), suggesting that this point mutation in the *bvgS* gene is not associated with the increased biofilm biomass shown by the clinical isolate. Since this mutation has previously been described to confer a more sensitive response to modulatory agents, such as MgSO4 and nicotinic acid, we investigated whether it could trigger a faster attachment of bacteria when they are transferred from a modulatory environment to a non-modulatory one. The four strains described above were therefore modulated using SS culture medium supplemented with 40 mM MgSO4 and then incubated in non-modulating SS medium under static conditions with polypropylene beads for 4 h. Under this condition, the surface adhesion of each strain was monitored every 30 min. Notably, the clinical isolate Bp2723 and the mutant BpK705E strain adhered to polypropylene beads significantly faster than the Tohama I and BPSM wild type strains (**Figure 7**). The adhesion kinetics of the Tohama I and BPSM wild type strains showed a lag period of 120 min, whereas an increase in attached biomass was already observed for the clinical isolate and BpK705E strain after 30 min. These findings suggest that the *bvgS* allele coding for the 705 E protein variant might be associated with an accelerated expression of adhesins involved in the initial adhesion steps.

FIGURE 6 | [...] isolates analyzed are shown. The single nucleotide mutation at position 2113 is highlighted in yellow.


# DISCUSSION

In the current work, we investigated the biofilm formation capacity of eight Argentinean *B. pertussis* clinical isolates recovered over a 7-year period in a local children's hospital, compared to a laboratory-adapted strain grown under biofilm conditions. Clinical strains showed an increased ability to grow attached to polypropylene surfaces compared to the laboratory strain. The architecture of the biofilm developed by the reference strain was compared by CLSM with that developed by the isolate Bp2723 growing under similar culture conditions. We found marked variability in three-dimensional biofilm structure between the two strains studied. Parameters extracted from confocal stack images analyzed by COMSTAT software, such as thickness, roughness coefficient, and surface to biovolume ratio, revealed that the biofilm developed by the clinical isolate is significantly thicker than that formed by the reference strain. In addition to the differences between the architectures of the two biofilms, an FT-IR spectroscopic analysis also showed clear phenotypic variations between them. Thus, to better understand the different growth performances of the clinical isolates and Tohama I strain, we carried out a proteome investigation. The major functional groups of differentially expressed proteins in the clinical isolate included energy metabolism, transport, stress and regulation, and virulence factors. Seven proteins involved in metabolism and energy production were found to be up-regulated in sessile cells of the clinical isolate *B. pertussis* 2723. Among them, citrate synthase, which catalyzes the first reaction in the TCA cycle, represents an important control point for determining the metabolic rate of the cell (Park et al., 1994). The higher expression of proteins associated with energy production under respiratory conditions was attributed primarily to the biosynthetic needs of the cell to produce larger quantities of biomass. Our results indicate that the clinical isolate 2723 could have different metabolic and energetic requirements than the reference strain, which is supported by the higher final biomass reached by this isolate under both liquid medium and biofilm growth conditions. However, this higher capacity for biofilm formation is not associated with a higher growth rate. The TCA cycle is not only a central point in the metabolism of living organisms but also important for the survival of infectious biofilms. Therefore, its inhibition could be a promising strategy for the control of biofilms (Yahya et al., 2014). Similar results were previously reported in comparative proteomic studies of two *Burkholderia cenocepacia* isolates retrieved from a chronically infected cystic fibrosis patient. The *B. cenocepacia* isolate obtained after 3 years of persistent infection and antibiotic therapy showed an up-regulation of citrate synthase (Madeira et al., 2011), which was reported to be important for biofilm formation and virulence (Subramoni et al., 2011). The results from the *B. cenocepacia* study, as well as the results from our current study on *B. pertussis*, point toward a tight link between primary metabolism and biofilm formation capability.

Our proteome analysis revealed an increased expression of the Bvg-activated factors BcrH2, OmpQ, BrkA, and Vag8 in the clinical isolate *B. pertussis* 2723. These proteins are positively regulated by the *B. pertussis* BvgAS two-component signal transduction system. This system is known for its key role in the regulation of *Bordetella* virulence gene expression, including adhesins and toxins, and it has also been shown to be a determinant of the ability of *Bordetella* species to produce biofilms (Irie et al., 2004; Mishra et al., 2005; Serra et al., 2007). If BvgAS is not active, *B. pertussis* is unable to adhere to the respiratory tract and colonize the host (Bassinet et al., 2000; Scheller et al., 2015). Using quantitative real-time PCR assays, we analyzed the relative expression level of adhesin genes known to be positively regulated by the BvgAS system. After 72 h of biofilm growth, these genes were transcribed at higher levels in the clinical isolate compared to Tohama I strain. In addition, the *bvgA* regulatory gene showed three times higher transcription in *B. pertussis* 2723 sessile cells compared to the reference strain, although no significant increase was found at the protein level. The latter results underpin the importance and value of combined analysis at the transcriptional and translational levels. Although beyond the scope of our current work, these studies should be expanded in the future to the post-translational and metabolic levels to gain a holistic picture of pathogen host adaptation mechanisms.

The adhesins tested, namely FHA, Fim3, Prn, and BipA, showed higher transcriptional levels by qRT-PCR in the clinical isolate grown as biofilm compared to the reference strain Tohama I. FHA, one of the main adhesins described for *B. pertussis*, is involved in different steps of biofilm formation *in vitro* and *in vivo* in the mouse nasopharynx, contributing not only to the first adhesion to the surface but also enhancing cell–cell interactions (Serra et al., 2011). Recently, we also reported that BipA is a common signature of *B. pertussis* biofilms (de Gouw et al., 2014). Interestingly, although Prn-negative strains are now increasingly being isolated from patients with whooping cough (Barkoff et al., 2012; Pawloski et al., 2014), our results showed that this protein is up-regulated in biofilm culture for *B. pertussis* 2723. Proteomics and targeted transcriptomics approaches provide a picture of the changes between the reference strain and a clinical *B. pertussis* isolate growing under similar culture conditions. These results are in agreement with those from the FT-IR analysis, which show a chemical abundance of proteins in the mature structure of the biofilm produced by the clinical isolate. Therefore, it is tempting to speculate that (i) the high expression of adhesins, mediating a faster and enhanced attachment, and (ii) the higher expression of enzymes involved in energy metabolism, leading to the augmented biomass, are responsible for the robust biofilm structure of the clinical isolate. When bacteria are under stress conditions, they often aggregate to form biofilms, which suggests that this bacterial lifestyle increases the fitness of the cells in harsh environments. 
Differential gene expression patterns between planktonic cells of Tohama I strain and clinical isolates were previously attributed to either sequence divergence in *cis*-regulatory regions or variation in the levels, activity, or encoding of transcriptional regulatory proteins (Cummings et al., 2006). However, in our current study, the higher transcription of adhesin genes could not be assigned to specific polymorphisms in the sequences of structural genes or promoters, suggesting that trans-acting factors could be involved. A single nucleotide mutation was found in the *bvgS* gene of all clinical isolates tested, resulting in an exchange of lysine for glutamic acid at position 705 in the linker domain of the sensor protein. Interestingly, Herrou et al. (2009) reported the same mutation previously. Using an experimental infection model, they demonstrated that this mutation in the BvgS sensor of *B. pertussis* BPSM strain does not lead to better pulmonary survival of the pathogen, though a faster response to modulatory agents such as MgSO4 and nicotinic acid was observed. The latter results are in agreement with our current findings. After being modulated by MgSO4, the clinical isolate Bp2723 was transferred to SS medium and incubated under non-modulating conditions with polypropylene beads. Under the latter culture condition, cells showed faster adherence to abiotic surfaces than the reference strain. Under our experimental conditions, both the clinical isolate Bp2723 and the BpK705E strain showed accelerated adhesion kinetics on polypropylene beads compared to Tohama I and the wild type BPSM strain. This high phase variation capability might represent an important adaptive advantage during pathogen colonization of its host. Fast adhesion suggests that the phases are tuned to different environmental niches favoring spatially defined regulation. 
This timing of attachment could promote persistence by protecting bacteria from clearance by hydrodynamic forces in the upper respiratory tract and from the killing activity of host defense mechanisms. It is important to note that the increase in adhered cells (Bp2723 and BpK705E strain) detected after 30 min of incubation under our experimental conditions is not due to growth on the surface, since these sessile bacteria have a reduced specific growth rate (μ = 0.033 h<sup>−1</sup>, corresponding to a duplication time of 21.0 h). *B. pertussis* transmission from human to human can probably occur by aerosolized respiratory droplets containing modulated bacteria. Thus, the results shown in our work could be important for understanding the rapid adaptation of clinical isolates to a new environment. Our combined approach of proteomics and targeted transcriptional and genetic analysis revealed that multiple realms of regulation govern the adaptation of *B. pertussis* to the biofilm lifestyle.

# CONCLUSION

The divergent biofilm responses of *B. pertussis* clinical isolates and the laboratory-adapted reference strain suggest that clinical isolates have probably evolved to increase their capacity to form biofilm and, eventually, to adapt rapidly to the fluctuations that they encounter at the site of infection. To date, the emergence of pertussis remains a critical issue that should encourage researchers to develop novel control measures, with particular consideration of the biofilm as a *B. pertussis* way of life.

# AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: OY and ME-S. The experiments were performed by LA, TG, NC, DdG, MV, DS. Analysis of data: all the authors. Contributed reagents/materials/analysis tools: OY, ME-S, and FM. Contributed to the writing of manuscript: all the authors.

# FUNDING

OY is supported by grants from the Ministry of Science and Technology of Argentina (ANPCyT-PICT 2012-2514 and MINCYT-Dirección de Relaciones Internacionales). LA and NC are fellows of CONICET-Argentina. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

# ACKNOWLEDGMENTS

We would like to acknowledge the support of the Science and Technology Cooperation Program between Argentina and Germany as well as Argentina and Austria. We are grateful to Francoise Jacob-Dubuisson from Institut Pasteur, Lille, France, for providing the bacterial strain BpK705E. We thank Nicole Guiso from Institut Pasteur, Paris, France for the analysis of PtxS1, Prn and Fim of clinical isolates.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2015.01352

# REFERENCES


one possible explanation for the emergence of antigenic variants? *J. Infect. Dis.* 187, 1200–1205. doi: 10.1086/368412


factors P.*69*/pertactin and pertussis toxin in The Netherlands: temporal trends and evidence for vaccine-driven evolution. *Infect. Immun.* 66, 670–675.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Arnal, Grunert, Cattelan, de Gouw, Villalba, Serra, Mooi, Ehling-Schulz and Yantorno. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# *Candida albicans* Shaving to Profile Human Serum Proteins on Hyphal Surface

Elvira Marín<sup>1</sup>, Claudia M. Parra-Giraldo<sup>1†</sup>, Carolina Hernández-Haro<sup>1</sup>, María L. Hernáez<sup>2</sup>, César Nombela<sup>1,3†</sup>, Lucía Monteoliva<sup>1,3</sup>\* and Concha Gil<sup>1,2,3</sup>

#### *Edited by:*

Nelson Cruz Soares, University of Cape Town, South Africa

#### *Reviewed by:*

Gustavo Antonio De Souza, University of Oslo, Norway Maria Dolores Moragues, Universidad del País Vasco, Spain

> *\*Correspondence:* Lucía Monteoliva luciamon@ucm.es

#### *†Present Address:*

Claudia M. Parra-Giraldo, Grupo de Enfermedades Infecciosas, Departamento de Microbiología, Facultad de Ciencias, Pontificia Universidad Javeriana, Colombia; César Nombela, Universidad Internacional Menéndez Pelayo, Madrid, Spain

#### *Specialty section:*

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> *Received:* 04 September 2015 *Accepted:* 16 November 2015 *Published:* 08 December 2015

#### *Citation:*

Marín E, Parra-Giraldo CM, Hernández-Haro C, Hernáez ML, Nombela C, Monteoliva L and Gil C (2015) Candida albicans Shaving to Profile Human Serum Proteins on Hyphal Surface. Front. Microbiol. 6:1343. doi: 10.3389/fmicb.2015.01343

<sup>1</sup> Departamento de Microbiología II, Facultad de Farmacia, Universidad Complutense de Madrid, Madrid, Spain, <sup>2</sup> Unidad de Proteómica, Facultad de Farmacia, Universidad Complutense de Madrid, Madrid, Spain, <sup>3</sup> Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain

Candida albicans is a human opportunistic fungus responsible for a wide variety of infections, either superficial or systemic. C. albicans is a polymorphic fungus and its ability to switch between yeast and hyphae is essential for its virulence. Once C. albicans gains access to the human body, the host serum constitutes a complex environment that interacts with the C. albicans cell surface in the bloodstream. To draw a comprehensive picture of this relevant step in host-pathogen interaction during invasive candidiasis, we have optimized a gel-free shaving proteomic strategy to identify both human serum proteins coating C. albicans cells and fungal surface proteins simultaneously. This approach was carried out with normal serum (NS) and heat inactivated serum (HIS). We identified 214 human and 372 C. albicans unique proteins. Proteins identified in C. albicans included 147 that were described as located at the cell surface and 52 that were described as immunogenic. Interestingly, among these C. albicans proteins, we identified 23 GPI-anchored proteins; Gpd2 and Pra1, which are involved in complement system evasion; and 7 other proteins that are able to attach plasminogen to the C. albicans surface (Adh1, Eno1, Fba1, Pgk1, Tdh3, Tef1, and Tsa1). Furthermore, 12 proteins identified at the surface of C. albicans hyphae induced with 10% human serum were not detected in other hypha-inducing conditions. The most abundant human proteins identified are involved in complement and coagulation pathways. Remarkably, with this strategy, all main proteins belonging to complement cascades were identified on the C. albicans surface. Moreover, we identified immunoglobulins, cytoskeletal proteins, metabolic proteins such as apolipoproteins and others. Additionally, we identified more inhibitors of complement and coagulation pathways, some of them serpin proteins (serine protease inhibitors), in HIS than in NS. On the other hand, we detected a higher amount of C3 at the C. albicans surface in NS than in HIS, as validated by immunofluorescence.

Keywords: *Candida albicans*, shaving, surface proteins, human serum, complement pathways, coagulation pathways, GPI-anchored proteins, host-pathogen interaction

# INTRODUCTION

The yeast Candida albicans is the most important opportunistic fungus; it causes a wide variety of infections, ranging from superficial to invasive candidiasis, and is often found in the mucosal microbiota. C. albicans infections produce high morbidity and mortality in intensive care, post-surgery and cancer patients, as well as in other types of immunocompromised patients (Leleu et al., 2002; Almirante et al., 2005; Hube, 2006; Brown et al., 2012). The high mortality of these infections is determined by diagnostic limitations, the scarcity of antifungal agents and the emergence of resistance to them (Viudes et al., 2002; Wilson et al., 2002). For these reasons, the incidence of invasive candidiasis is still high.

The cell wall of C. albicans is a dynamic and complex multilayered structure located external to the plasma membrane. It is responsible for maintaining the shape that characterizes each growth form (mainly yeast and hyphae) of the fungus (Klis et al., 2009). The cell wall mediates the first contact with the environment and integrates multiple cues into complex signaling networks that are coordinated by various transcription factors. Consequently, C. albicans differentially expresses cell surface proteins and virulence factors. The dimorphic transition, that is, the ability of C. albicans to reversibly switch from yeast to hyphal growth, is essential for virulence; strains that are locked in either form are avirulent (Yan et al., 2013; Lu et al., 2014). The hyphal form allows the pathogen to penetrate tissues to acquire nutrients or escape from the host defense, whereas yeast cells disseminate across the human body. Many signals are capable of inducing the dimorphic switch: N-acetyl-D-glucosamine, physiological temperature and pH (37°C and neutral), high CO<sub>2</sub> concentration, hypoxia and nutrient starvation; nonetheless, growth in human serum is the most physiological condition in which to study this process (Kumamoto and Vinces, 2005; Karkowska-Kuleta et al., 2009; Sudbery, 2011; Mayer et al., 2013; Ene et al., 2014).

C. albicans is able to adhere to host cells and tissues; to do so, it exposes surface proteins such as adhesins and many other pathogenic factors (Chaffin, 2008). Adhesion to host tissue is a prerequisite for tissue invasion and infection. The adhesins include the Als family (agglutinin-like sequence), the Hwp family and the Iff/Hyr family (de Groot et al., 2013). Pathogenic factors include tissue-digesting hydrolytic enzymes such as the Sap family (Secreted Aspartyl Proteinases) (Kretschmar et al., 2002; Buu and Chen, 2014), lipases and phospholipases. The Sap family is composed of 10 proteins with different expression patterns; they act optimally at acidic pH. Their main role is to digest molecules for nutrient acquisition and to digest or distort host cell membranes in order to avoid or resist antimicrobial attack by the host immune system (Naglik et al., 2003). Some adhesins are attached to the wall through a C-terminal glycosylphosphatidylinositol (GPI) anchor sequence (De Groot et al., 2003; Pardo et al., 2004; Klis et al., 2009).

Human serum is a very complex body fluid; its qualitative and quantitative composition has been an active topic of study for years (Anderson and Anderson, 2002; Mitchell, 2010). The Plasma Proteome Database collects proteomic data for 10,546 proteins detected in human serum and plasma (Nanjappa et al., 2014). The abundance of these proteins spans more than 10 orders of magnitude. The complement system is an efficient component of the innate immune response and establishes a link with the adaptive system. The three conventional pathways by which complement is activated (classical, alternative and lectin) are composed of around 30 proteins (Engleberg et al., 2012). The complement system also mediates several physiological processes such as tissue regeneration and clearance of immune complexes. C2 and factor B (FB) are the main heat-sensitive complement components (Soltis et al., 1979). C2 is the central component in the activation of the classical and lectin pathways and FB plays the same role in the alternative pathway; both are necessary for the formation of C3- and C5-convertases in their respective complement cascades.

C. albicans has developed several strategies to evade the complement system, delay detection by the host immune system and even manipulate the host defenses for its own purposes (Collette and Lorenz, 2011). In this way, C. albicans acquires host-derived inhibitory proteins such as factor H (FH), factor H-like protein 1 (FHL-1) and C4b-binding protein (C4BP), to prevent activation of complement pathways on its surface. These three protein regulators are closely related molecules composed of variable numbers of short consensus repeat (SCR) domains. Four C. albicans proteins have been described with the ability to bind FH at the fungal surface: glycerol 3-phosphate-dehydrogenase 2 (Gpd2), high-affinity glucose transporter 1 (Hgt1), phosphoglycerate mutase 1 or complement regulator-acquiring surface protein 1 (Gpm1/CRASP1) and pH-regulated antigen 1 (Pra1/CRASP2) (Poltermann et al., 2007; Luo et al., 2009, 2013). With the exception of Hgt1, these proteins are also able to bind plasminogen. Additionally, Pra1 and Hgt1 bind C4BP (Lesiak-Markowicz et al., 2011; Luo et al., 2011). Factor H mediates binding to polyanions (sialic acids and human heparin) present on the C. albicans cell wall (Soares et al., 2000; Rodríguez de Córdoba et al., 2004; Green et al., 2013). Pra1 also binds fibrinogen and C3 directly (Zipfel et al., 2011).

The aim of this study is to characterize the human serum protein coat on the C. albicans surface and to determine which C. albicans proteins could be candidates for diagnostic markers, potential vaccines or therapeutic targets, as they are detected when human serum is present. The interaction between human serum proteins and the surface of other microorganisms has been studied by different proteomic approaches in Staphylococcus aureus and Streptococcus pyogenes (Dreisbach et al., 2011; Sjöholm et al., 2014). In this work, we carried out a gel-free proteomic approach by shaving live C. albicans cells with trypsin after serum interaction to obtain exposed peptides of C. albicans and human proteins. With this shaving strategy, which was previously applied in our laboratory to C. albicans and S. cerevisiae cultured in synthetic media (Hernáez et al., 2010; Insenser et al., 2010; Gil-Bona et al., 2015), it is not necessary to perform sub-cellular fractionation. In the present work, we have optimized this protocol for the detection of C. albicans and human proteins together in conditions that mimic in vivo interaction during systemic infections; overall, 372 C. albicans and 214 human proteins were identified. Among C. albicans proteins, 147 are described as located at the cell surface. Interestingly, Pra1 and Gpd2 are able to attach inhibitors of complement cascades and plasminogen, while a further 7 proteins are also capable of attaching plasminogen. Regarding human proteins, all main proteins of complement pathways were identified, as well as many proteins involved in coagulation and metabolism, immunoglobulins and cytoskeletal proteins. Furthermore, in this work proteins of complement and coagulation pathways were observed surrounding the surface of C. albicans hyphal cells.

# MATERIALS AND METHODS

# *Candida albicans* Strain and Growth Conditions

The C. albicans strain used in this work was SC5314, from a clinical isolate (Gillum et al., 1984). This strain was grown on YPD plates (2% D-glucose, 2% peptone, 1% yeast extract, and 2% agar) and incubated at 30°C. A colony was picked and grown in YPD medium at 30°C and 200 rpm for 6–8 h. The OD600 nm was measured and the culture was diluted to 0.00003 OD600 nm in minimum medium (MM; 2% D-glucose, 0.17% yeast nitrogen base, 0.5% (NH4)2SO4, 0.192% synthetic complete mixture (Kaiser) drop-out -ura and 0.01% uracil). The culture was grown overnight at 30°C and 200 rpm and collected at a concentration of ∼10<sup>6</sup> cells/ml. Then, 1.5 × 10<sup>7</sup> cells were resuspended in 9 ml of Lee medium at pH 6.7 (Lee et al., 1975). One milliliter of human serum was added and the culture was grown for 5 h at 37°C and 200 rpm. Human sera were from healthy volunteer donors who did not have any clinical or microbiological evidence of infection. They were used in the experiments as normal serum (NS) or heat inactivated serum (HIS). Serum was inactivated by heating at 56°C for 30 min in a water bath.
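
The volumes and cell numbers given above fix the serum fraction and the cell density of the induction culture. The following sketch makes that arithmetic explicit; it assumes that the resuspended cell pellet adds negligible volume, and the function and key names are illustrative only.

```python
def induction_mix(cells_needed, stock_cells_per_ml, medium_ml, serum_ml):
    """Summarize the hyphal induction culture described in the text.

    Assumes the cell pellet contributes no volume of its own.
    """
    total_ml = medium_ml + serum_ml
    return {
        # volume of overnight culture that contains the required cells
        "stock_volume_ml": cells_needed / stock_cells_per_ml,
        # serum as a percentage of the final volume (v/v)
        "serum_percent": 100.0 * serum_ml / total_ml,
        # cell density in the final induction culture
        "cells_per_ml": cells_needed / total_ml,
    }


mix = induction_mix(
    cells_needed=1.5e7,       # cells resuspended in Lee medium
    stock_cells_per_ml=1e6,   # approximate density of the overnight culture
    medium_ml=9.0,            # Lee medium, pH 6.7
    serum_ml=1.0,             # NS or HIS
)
print(mix)
```

With these values, about 15 ml of overnight culture supplies the cells, and the final culture contains 10% (v/v) serum at roughly 1.5 × 10^6 cells/ml, which matches the 10% human serum condition referred to throughout the Results.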

# Surface Shaving

The surface shaving procedure was adapted from previous work by our group (Hernáez et al., 2010; Gil-Bona et al., 2015). Briefly, after incubation of C. albicans with 10% human serum (normal serum-NS or heat inactivated serum-HIS), the cultures were centrifuged at 3500 rpm for 5 min. The cell pellet was resuspended in 1 ml of phosphate buffered saline (PBS) with 0.1% Tween 20, centrifuged again and washed 6 more times with PBS. The final pellet was resuspended in 400 µl of 25 mM ammonium bicarbonate (AMBI buffer; NH4HCO3), pH 8.0, and 7.5 µl of 1 M dithiothreitol (DTT) and 9 µg of recombinant sequencing grade trypsin (Roche) were added. DTT was added during the digestion as a reducing agent to allow thorough digestion of proteins associated non-covalently or through disulphide bridges. The samples were incubated at 37°C and 600 rpm for 30 min. After incubation, samples were centrifuged at 5000 rpm for 5 min and the supernatant was passed through a 0.22 µm filter unit. The cell pellet was resuspended in 400 µl of fresh AMBI buffer and 100 µl of 0.1% (v/v) trifluoroacetic acid (TFA) was added to stop the proteolytic reaction. The suspension was centrifuged and the supernatant was filtered again. Both peptide supernatants were pooled and processed for further proteomic analysis. Subsequently, the resulting peptides were cleaned up with a Poros R2 resin (ABSciex, Framingham, MA). Peptides were eluted with 80% acetonitrile in 0.1% TFA, dried in a Speed-Vac and resuspended in 0.1% formic acid. The samples were stored at −20°C before nano-LC-MS/MS analysis.
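
For readers who wish to reproduce the digestion conditions, the approximate final concentrations can be derived from the stated volumes with the usual dilution relation C1·V1 = C2·V2. This is only a rough sketch: the volume of the trypsin addition is not stated in the text and is ignored here, and the helper name is hypothetical.

```python
def final_concentration(stock_conc, stock_volume, diluent_volume):
    """Concentration after adding `stock_volume` of stock to `diluent_volume`.

    Volumes must share one unit; the result has the unit of `stock_conc`.
    """
    return stock_conc * stock_volume / (stock_volume + diluent_volume)


# 7.5 µl of 1 M (1000 mM) DTT added to 400 µl of AMBI buffer
# (assumption: the trypsin volume is negligible)
dtt_mM = final_concentration(1000.0, 7.5, 400.0)

# 100 µl of 0.1% (v/v) TFA added to 400 µl of resuspended pellet
tfa_percent = final_concentration(0.1, 100.0, 400.0)

print(f"DTT: {dtt_mM:.1f} mM")
print(f"TFA: {tfa_percent:.2f} % (v/v)")
```

Under these assumptions the digestion takes place in roughly 18 mM DTT, and the stop step brings TFA to about 0.02% (v/v).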

# LTQ-Orbitrap Velos Analysis, Protein Identification, and Bioinformatics Analysis

Peptides were analyzed by nano-LC-MS/MS. Peptide samples were concentrated and desalted using C18-A1 ASYcolumn 2 cm pre-column (Thermo Scientific) and then eluted onto a Biosphere C18 column (Nano-Separations). Peptides were separated with a 140 min gradient (110 min from 0 to 40% Buffer B; Buffer A: 0.1% formic acid/2% acetonitrile; Buffer B: 0.1% formic acid in acetonitrile) at a flow rate of 250 nl/min on a nano-Easy HPLC (Proxeon) coupled to a nano-electrospray ion source (Proxeon). Mass spectrometry experiments were performed using an LTQ-Orbitrap Velos (Thermo Scientific) in the positive ion mode. Full-scan MS spectra (m/z 400–1400) were acquired in the Orbitrap with a target value of 1,000,000 at a resolution of 60,000 at m/z 400, and the 15 most intense ions were selected for collision induced dissociation (CID) fragmentation in the LTQ with a target value of 10,000 and a normalized collision energy of 38%. Precursor ion charge state screening and monoisotopic precursor selection were enabled. Singly charged ions and unassigned charge states were rejected. Dynamic exclusion was enabled with a repeat count of 1 and an exclusion duration of 30 s.
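
The acquisition logic described above (top 15 precursors per full scan, rejection of singly charged and unassigned ions, and 30 s dynamic exclusion after one fragmentation) can be illustrated with a simplified simulation. This is a conceptual sketch and not the instrument firmware: the m/z matching tolerance `mz_tol`, the `Peak` class and the toy peak list are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float
    intensity: float
    charge: int  # 0 stands for an unassigned charge state


def select_precursors(peaks, scan_time_s, excluded,
                      top_n=15, exclusion_s=30.0, mz_tol=0.01):
    """Choose up to `top_n` precursors from one full scan.

    `excluded` maps an already fragmented m/z to the time (s) at which its
    exclusion expires; it is updated in place (repeat count of 1).
    """
    # release entries whose exclusion window has elapsed
    for mz in [m for m, expiry in excluded.items() if expiry <= scan_time_s]:
        del excluded[mz]

    candidates = [
        p for p in peaks
        if p.charge >= 2  # reject singly charged and unassigned ions
        and not any(abs(p.mz - m) <= mz_tol for m in excluded)
    ]
    chosen = sorted(candidates, key=lambda p: p.intensity, reverse=True)[:top_n]
    for p in chosen:
        excluded[p.mz] = scan_time_s + exclusion_s
    return chosen


# toy full scan (hypothetical values)
scan = [Peak(500.25, 1e6, 2), Peak(600.30, 5e5, 1),
        Peak(700.35, 8e5, 3), Peak(800.40, 2e5, 0)]
exclusion_list = {}
first = select_precursors(scan, 0.0, exclusion_list)
second = select_precursors(scan, 10.0, exclusion_list)
third = select_precursors(scan, 31.0, exclusion_list)
print([p.mz for p in first], [p.mz for p in second], [p.mz for p in third])
```

In the toy run, the two multiply charged ions are fragmented in the first scan, skipped while excluded, and become eligible again once 30 s have passed, which is how dynamic exclusion steers the instrument toward less abundant peptides.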

Protein identification was carried out from the mass spectra raw files using a licensed version of the search engine MASCOT 2.3.0 with Proteome Discoverer software version 1.4.1.14 (Thermo Scientific). Searches were made against the C. albicans SC5314 database (6221 sequences) and the human database (20233 sequences), available from the Candida Genome Database (Assembly 21 of CGD, http://www.candidagenome.org/) and Uniprot-SwissProt (http://www.uniprot.org), respectively, to identify peptides and proteins. Search parameters were oxidized methionine as a variable modification, a peptide mass tolerance of 10 ppm, up to 2 missed trypsin cleavage sites and an MS/MS fragment mass tolerance of 0.8 Da. In all protein identifications, the FDR was <1%, using Mascot Percolator with a q-value of 0.01. Gene Ontology (GO) terms at CGD (http://www.candidagenome.org/cgi-bin/GO/goTermFinder) were used to classify C. albicans proteins.
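
Two of the search parameters above, the allowance of 2 missed cleavages and the 10 ppm precursor tolerance, can be illustrated with a small stand-alone sketch. It is not the search engine's implementation: it assumes the common trypsin rule (cleavage after K or R, except before P), and the toy sequence and masses are hypothetical.

```python
def tryptic_pieces(sequence):
    """Split a protein after every K or R that is not followed by P."""
    pieces, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    return pieces


def peptides_with_missed_cleavages(sequence, max_missed=2):
    """All peptides that span at most `max_missed` uncleaved sites."""
    pieces = tryptic_pieces(sequence)
    peptides = []
    for first in range(len(pieces)):
        last_allowed = min(first + max_missed, len(pieces) - 1)
        for last in range(first, last_allowed + 1):
            peptides.append("".join(pieces[first:last + 1]))
    return peptides


def within_ppm(observed, theoretical, tolerance_ppm=10.0):
    """True if the observed mass lies within the ppm tolerance."""
    return abs(observed - theoretical) / theoretical * 1e6 <= tolerance_ppm


toy_protein = "MKRPAGKTTR"  # hypothetical sequence
print(peptides_with_missed_cleavages(toy_protein))
print(within_ppm(1000.008, 1000.0), within_ppm(1000.012, 1000.0))
```

The K in "MK" is cleaved while the R before P is not, and a mass error of 8 ppm is accepted whereas 12 ppm is rejected, which shows how these two parameters widen or narrow the set of candidate peptides considered for each spectrum.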

As an estimation of the relative protein abundances the normalized spectral abundance factor (NSAF) was used (Zybailov et al., 2007), and the average of the normalized values was calculated. Mass spectrometry proteomics data have been deposited in C. albicans PeptideAtlas (Vialas et al., 2014) with the data set identifier PASS00446.
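
For clarity, the NSAF of protein i is its spectral count divided by its length, normalized by the sum of that ratio over all proteins in the sample (Zybailov et al., 2007). The sketch below computes it for a single sample; the protein names, spectral counts and lengths are hypothetical and serve only to show the calculation.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for every protein in one sample.

    spectral_counts: protein -> number of spectra assigned to it
    lengths:         protein -> sequence length in residues
    """
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


# hypothetical sample with three proteins
counts = {"protA": 100, "protB": 50, "protC": 10}
sizes = {"protA": 1000, "protB": 250, "protC": 100}
values = nsaf(counts, sizes)
print(values)
```

Here protB obtains the highest NSAF although protA has twice as many spectra, because the spectral count is corrected for protein length; NSAF values within a sample sum to 1, so they can be averaged across replicates as described above.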

# Cell Permeability

Before and after trypsin treatment, C. albicans cell wall permeability was evaluated by staining with propidium iodide (PI). 10<sup>6</sup> cells/ml were incubated with 10 µl of 5 mM PI and the fluorescence-positive cells (red staining) were scored; at least 200 cells of each sample were counted under a fluorescence microscope. Cells treated with 70% ethanol-PBS (v/v) were used as a positive control.

# Epifluorescence and Confocal Fluorescence Microscopy

To detect complement proteins on the C. albicans surface after interaction with human serum, a suspension of 10<sup>6</sup> cells/ml in Lee medium at pH 6.7 with 10% human serum (NS or HIS) was prepared. 10<sup>5</sup> cells of this suspension were incubated for 30 min, 1.5 h or 5 h at 37°C on glass coverslips pre-coated with poly-L-lysine (1 mg/ml). Cells were fixed with 4% (w/v) formaldehyde in PBS for 20 min at room temperature (RT). Glass slides were washed twice with PBS. To stain the cell membrane, PKH26 dye was added following the manufacturer's instructions (Sigma Aldrich). Then, slides were blocked for 45 min at RT with buffer B (PBS plus bovine serum albumin (BSA) at 1 mg/ml). The slides were washed twice again with PBS and then incubated for 1.5 h at RT in the same buffer with an anti-C3 or anti-factor B polyclonal serum (diluted 1:75 and 1:50, respectively) or with buffer alone. The slides were washed three times with PBS and incubated for a further 1 h with an anti-rabbit IgG conjugated with Alexa-488 diluted 1:500 in buffer B. Nuclei were stained with DAPI dye (5 µg/ml; 5 min at RT). Fluoromount-G mounting medium (SouthernBiotech) was added to the preparations. Cells were then examined by epifluorescence and confocal microscopy; images were collected using an Olympus FV1200 microscope.

# Flow Cytometry Analysis

C. albicans cells were grown as described above for the immunofluorescence assay. In this case, the incubation with human serum (NS or HIS) lasted 30 min to avoid the formation of longer hyphae. After that, cells were washed with PBS and fixed with 4% formaldehyde in PBS for 20 min at RT. Then, three washes with PBS were performed and samples were blocked with 0.1% BSA in PBS at 4°C overnight with gentle shaking. Samples were incubated with primary antibody, or with buffer alone as a negative control, for 2 h at RT. The primary antibodies, anti-C3 and anti-FB, were prepared in PBS with 0.1% BSA at 1:75 and 1:50 dilutions, respectively. After four PBS washes, samples were incubated with the secondary antibody, an anti-rabbit antibody conjugated with Alexa-488 diluted 1:500, for 2 h at RT with gentle shaking. Then, samples were washed 4 more times with PBS. The cells were analyzed with a Guava easyCyte cytometer (Millipore).

# RESULTS

# Gel-free Proteomic Approach to Study Protein Interactions between *C. albicans* and Human Serum

To explore the broad spectrum of human serum proteins attached to C. albicans cells, together with the fungal surface proteins, we analyzed the C. albicans cell surface after 5 h of incubation with 10% human serum by a gel-free proteomic strategy. Normal serum (NS) was used to mimic the physiological conditions of C. albicans during invasive candidiasis, and heat inactivated serum (HIS) to determine which human proteins are deposited on the C. albicans surface without conventional complement cascade activation. As human serum is the most important inducer of the C. albicans dimorphic transition, after 5 h of incubation all C. albicans cells were in the hyphal form. To obtain the peptides, a shaving methodology was used, which consists of direct digestion of cells with trypsin followed by MS/MS identification. Before digestion, cells were washed with PBS plus Tween-20 and then several more times with PBS to remove human serum proteins weakly attached to C. albicans cells. We optimized parameters such as time and trypsin concentration during shaving of C. albicans cells, in order to preserve cell integrity and identify the highest number of proteins (from both C. albicans and human). With short trypsin treatments, between 5 and 10 min, only human proteins were identified (data not shown). In contrast, after 30 min, both human and C. albicans proteins were detected; thus, this duration of trypsin incubation was selected. Before and after trypsin treatment, cells were stained with propidium iodide (PI) and cell permeability was analyzed under the microscope. Less than 1% positive staining was obtained in all conditions. Cell integrity was assessed in order to rule out contamination with cytoplasmic proteins after trypsin treatment.

Peptides obtained by shaving after the interaction of C. albicans cells with 10% human serum (NS or HIS) were recovered and identified by LC-MS/MS on an LTQ Orbitrap. Four biological replicates were made using NS and three using HIS. Mass spectra raw files were searched against C. albicans SC5314 and human databases (CGD and Uniprot-SwissProt, respectively), and proteins identified in at least two biological replicates with two or more peptides in one of them were selected. C. albicans-identified proteins are listed in **Table 1** and Table S1, and human proteins in **Table 2** and Table S2.
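
The selection criterion can be written as a simple rule. The sketch below follows the wording of the table footnotes (at least two replicates of one condition, with at least 2 peptides in one replicate) and applies it per condition; this per-condition reading, the function names and the peptide counts are assumptions made for illustration.

```python
def passes_in_condition(peptide_counts, min_replicates=2, min_peptides=2):
    """Apply the replicate filter to one condition.

    peptide_counts: number of peptides found for the protein in each
    biological replicate of that condition (0 = not detected).
    """
    detected = [n for n in peptide_counts if n > 0]
    return len(detected) >= min_replicates and max(detected) >= min_peptides


def is_selected(ns_counts, his_counts):
    """A protein is kept if it passes the filter in NS or in HIS."""
    return passes_in_condition(ns_counts) or passes_in_condition(his_counts)


# hypothetical peptide counts: four NS replicates, three HIS replicates
print(is_selected([3, 1, 0, 0], [0, 0, 0]))  # two NS replicates, one with 3 peptides
print(is_selected([1, 1, 1, 1], [1, 0, 0]))  # never 2 peptides in a replicate
print(is_selected([5, 0, 0, 0], [0, 0, 0]))  # seen in a single replicate only
```

The first protein is retained and the other two are discarded, which shows that the rule requires both reproducibility across replicates and at least one identification supported by two peptides.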

This analysis identified 372 unique C. albicans proteins, 371 in NS and 134 in HIS (Table S1). There were 133 proteins identified in both conditions, accounting for 36% of the total C. albicans-identified proteins. Orf19.8 (an ortholog of C. dubliniensis CD36) was the only protein identified with HIS and not with NS. Notably, ∼33% of the C. albicans proteins were identified in at least six of the seven analyzed samples. Furthermore, of the 372 C. albicans-identified proteins, 60 presented a signal peptide (SP) at CGD. On the other hand, 214 human proteins were identified; of these, 128 were common to both conditions (60%), 43 were present only after incubation with NS and the other 43 only with HIS (**Table 2** and Table S2).
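
The counts reported in this paragraph are mutually consistent, as a short inclusion-exclusion check shows. The sketch uses only the numbers stated above; the variable names are illustrative.

```python
def union_size(in_a, in_b, in_both):
    """Number of distinct items found in condition A or B."""
    return in_a + in_b - in_both


# C. albicans proteins: 371 in NS, 134 in HIS, 133 in both
ca_total = union_size(371, 134, 133)
ca_shared_pct = 100.0 * 133 / ca_total
ca_his_only = 134 - 133

# human proteins: 128 shared, 43 only in NS, 43 only in HIS
human_total = 128 + 43 + 43
human_shared_pct = 100.0 * 128 / human_total

print(ca_total, round(ca_shared_pct), ca_his_only)
print(human_total, round(human_shared_pct))
```

The totals (372 and 214) and the shared fractions (about 36% and 60%) agree with the values in the text, and the single HIS-only C. albicans protein corresponds to Orf19.8.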

# Analysis of *C. albicans* Cell Surface Protein Pattern after Interaction with 10% human Serum

Only those identified C. albicans proteins with a Gene Ontology (GO) annotation at CGD pointing to surface localization or surface exposure, or with the ability to induce human serum antibodies, were included in **Table 1**. We focused our study on these 155 C. albicans proteins, classified into 5 categories: glycosylphosphatidylinositol (GPI)-anchored proteins; proteins involved in cell wall organization or biogenesis; cell surface proteins; plasma membrane proteins; and other immunogenic proteins not present in the previous groups. The groups are hierarchical and exclusive.

TABLE 1 | Selected *C. albicans* proteins identified after incubation with 10% NS or HIS related with cell surface and immunogenicity.

The classification is hierarchical and exclusive. Proteins were included if they were detected in at least two replicates of one condition with at least 2 peptides in one replicate. There are four replicates of Normal Serum (NS) and three replicates of Heat Inactivated Serum (HIS). Protein names from Candida Genome Database (CGD) (Inglis et al., 2012).

<sup>a</sup>GPI-cell wall proteins reviewed in De Groot et al. (2003), Pardo et al. (2004), Chaffin (2008), Klis et al. (2009).

<sup>b</sup>Gene Ontology (GO) terms (in parentheses the GO identifier) at CGD.

Proteins able to induce human serum antibodies in patients with invasive candidiasis are indicated in bold (Pitarch et al., 2004, 2006, 2008, 2011; Mochon et al., 2010).

The first category includes 23 proteins attached to the plasma membrane or cell wall by a GPI anchor; most are involved in biogenesis and maintenance of the C. albicans cell wall. All GPI-anchored proteins identified have an SP for entry into the secretory pathway annotated at CGD. In the second category, there are 26 other proteins involved in cell wall organization or biogenesis. The 62 proteins included in the cell surface category are located at the cell surface, although most of them lack an SP and so could not reach this location via the secretory pathway. Several of them have been described as "moonlighting" or multifunctional proteins. It is interesting that antibodies against a total of 52 C. albicans-identified proteins have been detected in human blood or serum from patients with invasive candidiasis (these immunogenic proteins are highlighted in **Table 1**). Many of these are abundant at the cell surface (as they were detected with more than 10 unique peptides) and are not secreted by the classical secretory pathway (they lack an SP), such as Eno1, Fba1, Hsp70, and 17 additional proteins.

It is also remarkable that two of the four C. albicans proteins described as able to bind the complement inhibitor factor H (FH) were identified. Pra1 was identified in both samples (NS and HIS), whereas Gpd2 was identified only in NS samples. Furthermore, we detected many C. albicans proteins described as able to bind plasminogen: Adh1, Eno1, Fba1, Gpd2, Pgk1, Pra1, Tdh3, Tef1, and Tsa1.

# Analysis of Human Serum Proteins that Interact with *C. albicans* Cell Surface Relevant to *C. albicans*-host Interaction

The 214 identified human proteins were classified into the following classes according to their function: complement pathway, coagulation pathway, metabolism, immunoglobulins, cytoskeleton, and others. Proteins belonging to complement and coagulation cascades are listed in **Table 2** and the rest of the proteins are in Table S2. The percentages of proteins identified in each category in the two conditions tested are shown in **Figure 1**. The complement and coagulation pathways represent higher percentages of the total identified proteins when HIS was used, whereas the immunoglobulin group represents a higher percentage when NS was used.

#### TABLE 2 | Human proteins of complement and coagulation pathways identified on the surface of *C. albicans* after incubation with human serum.

<sup>a</sup>Accession number, gene symbol and function from UniProtKB/Swiss-Prot database (Uniprot, 2014).

<sup>b</sup>Proteins were included if they were identified in at least two replicates of one condition with at least 2 peptides in one replicate.

Nearly all components of the three complement pathways (33 proteins) were identified, most of them in both conditions tested. However, some complement proteins were detected only in the NS sample (Collectin-11) or only in the HIS sample [complement C1q subcomponent subunit A (C1qA), complement C1r-like protein, complement factor D (FD), ficolin-2, phosphatidylinositol-glycan-specific phospholipase D (GPLD1) and plasma protease C1 inhibitor (SERPING1 or C1INH)] (**Table 3**). The identification of many complement inhibitors such as alpha-2-macroglobulin (A2M), C1INH, FH, and factor I (FI) is also interesting. In relation to the coagulation pathway, 31 proteins were identified, most of them also detected in both conditions; however, heparanase was identified only in the NS sample, whereas coagulation factor XIII B chain, hyaluronan-binding protein 2 and plasma serine protease inhibitor (SERPINA5) were identified only in the HIS sample. Interestingly, antithrombin-III (SERPINC1) and plasminogen (PLG), two coagulation proteins that are relevant complement inhibitors, were identified. Human immunoglobulins IgG and IgM were identified in NS and HIS samples with a similar number of peptides in both (Table S2). The category of metabolic proteins includes many apolipoproteins detected on the surface of C. albicans (Table S2).

Interestingly, 13 proteins belonging to the serpin family were identified. Serpins are a relevant group of structurally similar proteins that are able to inhibit proteases. Some of them were observed only in the HIS sample, such as SERPINA5, C1INH and cortisol-binding globulin (SERPINA6) (**Table 4**).

To estimate the relative abundance of human proteins identified on the C. albicans surface, the normalized spectral abundance factor (NSAF) (Zybailov et al., 2007) of each protein was calculated and the average NSAF values among samples were assessed. We compared these NSAF values with the ranking of the original protein abundance in human plasma using data on the dynamic range of the most abundant proteins adapted from Mitchell (2010). As shown in Figure S1, the relative abundance of proteins in the plasma differs from the relative abundance at the hypha surface after interaction with human serum. For example, the complement protein C3 was identified as one of the most abundant proteins on the surface of C. albicans in both conditions, although it is not among the most abundant proteins in plasma. Also, in the apolipoprotein family, the relative abundance of some proteins is different in the plasma vs. the fungal surface. ApoB100 and ApoE are less abundant in human plasma than ApoCIII, yet both proteins showed a higher NSAF than ApoCIII.

#### TABLE 3 | Comparison of complement proteins identified on the cell surface of *C. albicans* by shaving after incubation with NS vs. HIS.

Proteins indicated in bold are only identified in one condition.

Proteins marked with \* are identified only in one replicate of the indicated condition.

#### TABLE 4 | Identified proteins belonging to the SERPIN family.

The abundance ranking of the proteins involved in the complement and coagulation pathways identified on C. albicans surface is shown in **Figure 2**. C3, C4-B, and C4-A had the highest NSAF results among complement proteins in both serum conditions. Complement components of Membrane Attack Complex (MAC: C6, C7, C8-A, C8-B, C8-G, and C9) had higher NSAF in NS than in HIS samples and only C5 had higher NSAF in HIS than in NS samples. In the coagulation pathway, it is notable that SERPINA1 (alpha-1-antitrypsin) is much more abundant in the HIS sample than other proteins such as HRG (histidine-rich glycoprotein), HCF2 (SERPIND1) or F2 (prothrombin), which are more abundant in the NS sample than SERPINA1.

During complement activation, C3, C4, and C5 are cleaved at a specific site, producing small fragments; these fragments are called anaphylatoxins or complement peptides, and are indicated with the letter "a" (Walport, 2001a,b). The analysis of the identified peptides belonging to (i) anaphylatoxin regions (C3a, C4a, and C5a), (ii) domains released during proteolytic activation of complement pathways (C2b and FBa), and (iii) the MAC-interaction regions (MACPF) is shown in **Table 5**. In general, more peptides belonging to these regions were identified in HIS samples than in NS samples. Notably, peptides from C2b and C5a were not identified in NS samples.

FIGURE 2 | […] with 10% human serum at 37°C (normal serum, NS, or heat-inactivated serum, HIS) and NSAF was calculated for proteins belonging to the complement and coagulation pathways. The proteins were ordered according to their NSAF obtained in NS samples, from more to less abundant. Gradient color ranges from red (more abundant) to green (less abundant).

# Validation of Proteomic Results

To validate the proteomic results, the deposition of C3 and FB on the C. albicans surface was first analyzed by immunofluorescence microscopy and flow cytometry assays. C3 is a common component of the three complement cascades and, together with A2M, is one of the two most abundant complement proteins in human serum. Factor B (FB) is an abundant complement protein specific to the alternative pathway. C. albicans cells were incubated with NS and HIS and treated with anti-C3 and anti-FB antibodies (**Figure 3**). We observed the deposition of C3 and FB along the surface of all C. albicans cells when incubated with human serum (NS or HIS) and some areas with higher intensity in the case of C3 after 5 h of incubation with NS (**Figure 3A**, left panels). These differences between the fluorescence intensity of FB and C3 can be correlated with the number of peptides of these proteins in the samples at 5 h. When cells were incubated with HIS, the intensity of FB was slightly higher than in the case of NS incubation (this is clearly observed at short incubation times) (**Figure 3A**, right panels). To perform the flow cytometry assays, the incubation time of C. albicans cells with human serum was decreased to avoid the formation of large filaments. Within 30 min of incubation, the deposition of C3 was higher among the population in NS samples than in HIS (Figure S3). The intensity of FB was lower than that of C3, as expected, and less homogeneous in both samples (NS and HIS) (upper panels). Mean fluorescence intensity (MFI) values from two independent experiments were collected and showed that, in all cases, the MFI for C3 and FB was higher when cells were incubated with NS than with HIS at short times, although the differences were not significant in the case of FB (bottom panel).

#### TABLE 5 | Analysis of peptides corresponding to cleavage fragments produced during complement activation and to contact regions in MAC formation.

Identified peptides belong to the indicated released region or remain part of the full-length protein.

Gray background highlights identification in HIS condition.

Taken together, these data showed that C3 and FB deposition was clearly observed at short incubation times, and that C3 accumulation on the surface increased with longer incubation time, more markedly when the incubation was with NS. However, FB accumulation on the C. albicans surface did not appear to increase with time, perhaps because it is masked by the rest of the complement proteins (**Figure 3A**).

Furthermore, to address whether C3 and FB are close to the cytoplasmic membrane, fluorescence co-localization was evaluated in confocal microscopy images by analyzing transverse sections across cells (**Figure 3B**). As observed, both C3 and FB are localized around the plasma membrane when C. albicans cells were incubated with NS and also with HIS.

Afterwards, in order to evaluate the deposition of human IgGs on C. albicans surface after 5 h of incubation with 10% human serum, IgGs were detected in NS and HIS samples, with higher intensity on the yeast surface than on the hyphal surface in both samples (Figure S2).

# DISCUSSION

Our work shows, for the first time to our knowledge, the identification by a shaving approach of 147 C. albicans surface proteins and 214 human serum proteins attached to the surface of the fungus in the same report. The most relevant C. albicans proteins identified are GPI-anchored proteins and proteins involved in cell wall organization or biogenesis. The most relevant human proteins identified belong to the complement and coagulation pathways. Indeed, all main human complement proteins were identified on the surface of C. albicans. The identified proteins were reproducible among the experiments and would represent either the physiological mode of action of the immune system against C. albicans or components of the immune system used by C. albicans to evade, disturb, or exploit host defenses for its own purposes.

# Analysis of *C. albicans* Surface Proteins

With the objective of unraveling the protein surface pattern of C. albicans hyphal cells, we analyzed our data in comparison to the data obtained by our group using the same methodology of cell shaving over yeast and hyphal cells (growth also in Lee's medium, but without human serum) (Gil-Bona et al., 2015). This analysis rendered 304 proteins identified in all C. albicans assays (yeasts and hyphae with and without human serum) that we consider the C. albicans common core of cell surface proteins and not morphotype specific (Table S3). Also, a hyphal protein pattern of 55 proteins identified on C. albicans hypha forms, but not identified in C. albicans yeasts, was obtained. Among them, Als3, Hyr1, and Sod5 were associated with hyphae in different works (Heilmann et al., 2011; Sosinska et al., 2011; Sudbery, 2011). They are proteins that are expected to be detected, indicating that the other 52 proteins are at least more abundant or accessible in hypha than in yeast cells. Another example is Ece1, which is a hyphal-specific protein with unknown function (Fan et al., 2013) (Table S3).

Other proteins were associated with hyphae in some works and described as morphotype-independent in others, for example, Phr1 and Rbt1 (Heilmann et al., 2011; Ragni et al., 2011; Monniot et al., 2013). We identified both proteins in all shaving C. albicans experiments. Furthermore, Phr2 (a homolog of Phr1) was also identified in all conditions. The differences observed at the level of detection of Phr1 and Phr2 among the samples were possibly related to the optimum pH for expression of each one (Sosinska et al., 2011; Dühring et al., 2015) (Table S1). On the other hand, Rhd3 and Ywp1 were reported as yeast-specific and were found to be part of the common core pattern in our study (Heilmann et al., 2011). A relevant protein found in C. albicans common core pattern is Msb2; this stabilizes the fungal cell wall and inactivates human antimicrobial peptides (Dühring et al., 2015).

It is interesting to know that GPI-anchored proteins identified in this work are part of the common core of C. albicans surface proteins or of the hyphal-form specific ones. There are 17 proteins belonging to the common core in our analysis, and also Ece1 and Sod5, which are part of the hyphal-specific group, whose genes were up-regulated under blood growth (Fradin et al., 2003).

Remarkably, 12 proteins were only identified on the C. albicans hyphal surface when it was grown with 10% human serum and we highlighted Pra1, Sap5, Tef1, and 5 more Orfs with unknown function (Table S3). In this work, two members of the Sap family, Sap9 and Sap5, were identified. Sap9 is part of C. albicans common core while Sap5 shows up in hyphae induced with the human serum group. Both genes were up-regulated in C. albicans biofilm associated with bloodstream infections (Joo et al., 2013). Ten out of these twelve proteins are identified for the first time on the surface of C. albicans; Pra1 and Tef1 have been previously identified in the cell wall of C. albicans.

Many proteins described as "moonlighting" or multifunctional proteins in C. albicans were identified in this work (Nombela et al., 2006; Chaffin, 2008) (indicated with <sup>∗</sup> in Table S1). Among them, Gpd2 has been described as a virulence factor on microorganism surface because it mediates complement evasion; meanwhile, there is a glycerol 3-phosphate-dehydrogenase in the cytoplasm that is involved in the intracellular accumulation of glycerol to control osmotic pressure (Luo et al., 2013). Tsa1 is involved in oxidative stress response in the cytoplasm and in hyphal cell wall biogenesis (Urban et al., 2005). Tdh3 is a glycolytic enzyme (glyceraldehyde-3-phosphate dehydrogenase) in the cytoplasm and able to attach fibronectin and laminin at the cell surface (Gozalbo et al., 1998).

Regarding C. albicans proteins that interact with complement inhibitor proteins (FH and PLG), the expression of Gpd2 and Pra1 is more prominent at the hyphal tip (Zipfel et al., 2011; Luo et al., 2013). Pra1 is also able to bind C4BP, and the hyphal tip is known to be a prominent binding site for C4BP (Meri et al., 2004; Luo et al., 2011). For these reasons, and because it facilitates host tissue penetration, the hyphal tip is considered an important factor in pathogenesis. We identified these two proteins on the surface of C. albicans grown with 10% human serum, which activates hyphal growth. Focusing on the mechanism to evade complement activation mediated by plasminogen-coated surfaces, as plasminogen is an important complement cascade inhibitor in the fluid phase and on the human surface (Barthel et al., 2012), the identification of nine out of the eleven C. albicans proteins with the ability to bind PLG (Adh1, Eno1, Fba1, Gpd2, Pgk1, Pra1, Tdh3, Tef1, and Tsa1) is another highlight of this study (Crowe et al., 2003; Jong et al., 2003; Barthel et al., 2012).

# Human Serum Proteins with the Ability to Interact with *C. albicans* Surface

We detected human plasma proteins from different abundance strata on the C. albicans surface; most of them belong to the most abundant or the intermediate stratum (Anderson and Anderson, 2002; Mitchell, 2010). Due to the heat treatment of HIS, FB and C2 become non-functional; in this situation, other human serum proteins could interact further with the surface of C. albicans. Remarkably, the C. albicans surface could be less accessible to trypsin digestion when grown with 10% HIS, as the number of human proteins coating it is greater. This could explain the higher sum of peptide-spectrum matches (PSMs) of human serum proteins observed in the HIS condition compared with the NS condition. In any case, it cannot be ruled out that some proteins differentially observed in HIS result from changes in their binding properties due to protein denaturation.

Albumin is the most abundant protein in human plasma, exceeding the rest of the proteins by close to two orders of magnitude (Mitchell, 2010), but it was not detected as the most abundant protein on the C. albicans surface incubated with human serum (either NS or HIS) (Figure S1). Also, some proteins detected on the C. albicans surface only in HIS samples, such as C1INH, C1qA, and RBP4, have intermediate abundance in human plasma. Serotransferrin and fibrinogen (FGG, FGA and FGB) were more abundant in the NS condition, whereas apolipoprotein A-I and transthyretin were more abundant in the HIS condition. All of these data indicate that the interaction between human proteins and the C. albicans surface does not depend solely on their abundance in human serum. The study of C3 and FB deposition on the C. albicans surface after incubation with 10% human serum was carried out by immunofluorescence with specific antibodies. Clear deposition of C3 and FB was observed at short interaction times (**Figure 3A**).

A graphical summary of the proteins from the different complement pathways identified at the C. albicans surface in NS and HIS conditions is presented in **Figure 4**. The enrichment on the surface of C. albicans in proteins belonging to the lectin, classical and alternative complement pathways is clearly observed in normal conditions (NS); almost all of the main proteins of the complement pathways were identified (**Figures 4A,B**). Peptides belonging to anaphylatoxin regions are less frequently identified in the NS than in the HIS condition; this could be because the C2b, C3a, FBa, and C5a fragments were released to the medium under this condition (**Table 5**). Furthermore, two peptides (CCEDGMR and CCEDGMRENPMR), which are part of the LRK26 peptide derived from C3a and with antimicrobial effects on the C. albicans surface (Sonesson et al., 2007), were only identified in NS.

FIGURE 4 | […] Proteins mainly identified on heat-inactivated serum (HIS) in (C). Proteins identified in the samples are indicated with a gray background and those not identified in white. Complement fragments released during complement activation are indicated with triangles. Peptides belonging to complement fragments that can be released or be part of a full-length protein are indicated with \*. Numbers in (A,B) correspond to sequential steps during complement pathway activation. (A) Collectin-11 was only identified in NS samples. MASP2 was identified in NS and MASP1 was not identified in any sample. The Fc (constant region of Ig) of IgGs or IgMs starts the classical complement pathway. (B) Properdin was identified in NS and Factor D (FD) was not. (C) Ficolin-2, Ficolin-3, C1qA, C1INH, and FD were identified only in HIS samples. C1INH dissociates the C1qrs complex, blocks the alternative C3 convertase and inhibits C3 activation by kallikrein. C2 was identified in HIS, including 4 peptides of C2b. Five peptides belonging to FBa were identified in the HIS condition. Notably, peptides belonging to the C5a fragment were only identified in HIS samples.

Peptides belonging to the C5a region were identified only in HIS samples, and more peptides belonging to the MACPF regions of C6, C7, and C8-A were also identified in this condition. Furthermore, a lower NSAF was obtained with HIS for all components of MAC except C5 (**Figure 2**), which could be correlated with the enhanced identification of C5a peptides in this condition. In contrast, the absence of peptides belonging to the C5a region and the higher NSAF for the rest of the MAC components in NS samples may indicate MAC formation. MACPF domains would not be accessible to trypsin digestion because they are points of interaction for MAC formation and would be protected in this condition. Furthermore, we identified clusterin and vitronectin (VTN) with higher NSAF in the NS condition; both are able to interact with MAC in plasma, and thus prevent the cytolysis of host cells (Chauhan and Moore, 2006). Further experiments are required to determine the formation and the functionality of MAC on the C. albicans surface.

Along the same lines, in NS samples properdin (FP), but not factor D (FD), was identified, while the opposite happened with HIS samples (**Figure 4B** vs. **Figure 4C**). These results are in agreement with a functional alternative complement pathway on the surface of C. albicans in the NS condition. On the other hand, in HIS samples, FD could still be attached to the C3bB complex and we suppose that the C3 convertase is not formed. In this situation, FP could not stabilize the complex and would not be identified in HIS samples. C1INH is able to bind to immobilized C3b and block progression of the complement cascade (Jiang et al., 2001). The serpin activity of C1INH produces dissociation of the C1qrs complex, and this is possibly why the identification of these proteins was enhanced in the HIS condition. C3 and C4 (C4-A and C4-B) were identified with lower NSAF and C2 and FB with higher NSAF in HIS samples (**Figure 4C**). In a recent study of Streptococcus pyogenes interaction with human plasma, C2, C1qA, and FD were not detected on the surface of the microorganism (Sjöholm et al., 2014). In our experiments, when C. albicans cells were incubated with HIS, we detected not only these proteins, but also C1INH, and the enrichment of FB, C1qB, and C1qC. This is an interesting observation that may be due to the absence of functional convertases; FB and C2 might be more accessible to trypsin digestion. We assume that the failure to form complement convertases properly, or at least their less extensive formation, is responsible for the enhanced peptide identification of C2, FB, FD, C5, and the C1qrs complex in HIS samples.

Coagulation and complement pathways are highly connected, with extensive cross-talk (Kazatchkine and Jouvin, 1984; Amara et al., 2008, 2010; Peerschke et al., 2008; Diamond, 2013; Verschoor and Langer, 2013). Many complement inhibitors were identified on the surface of C. albicans and were in general more abundant in HIS samples. The innate immune system strives to respond, activating complement cascades by non-conventional mechanisms [thrombin (F2), PLG, Factor XII (F12), kallikrein (KLKB1), ficolin-2 and -3] and bypassing locking points in the absence of functional FB and C2 (Nielsen et al., 1996; Rittirsch et al., 2008) (**Figures 2**, **4C**). In HIS samples, KLKB1 inhibitors such as SERPINA5, SERPINC1, A2M, and C1INH were enriched (Gadjeva et al., 1998; Paréj et al., 2013).

A relevant group of proteins is the serpin family (serine protease inhibitors), which is broadly distributed among eukaryotic organisms. Serpins inhibit proteases by a specific mechanism in which covalent complexes are formed with the target proteases (Law et al., 2006). Since a serpin has to be cleaved to inhibit its target, inhibition also consumes the serpin; serpins are therefore irreversible enzyme inhibitors. For these reasons, the identification of many serpins on the C. albicans surface, particularly in HIS samples, is an interesting observation: they may be inactivating their substrates attached to the surface of C. albicans, or they may themselves be coating the C. albicans surface so as to prevent interactions between their substrates and C. albicans (**Table 4**).

Interestingly, the apolipoproteins classified in the metabolism group are involved in lipid transport and also play a relevant role in host defense as part of the innate immune system (Grunfeld and Feingold, 2008). Furthermore, C. albicans grows better in lipoprotein-depleted plasma due to two factors: greater availability of lipids and reduced neutralizing candidacidal factor (Vonk et al., 2004). ApoE has a relevant role in systemic candidiasis infection and showed higher NSAF in NS samples (Figure S1). ApoE-/- mice showed increased mortality in comparison with wild type (85 vs. 52%) (Vonk et al., 2004), and also an impaired immune response in Klebsiella pneumoniae infection (de Bont et al., 2000).

In summary, the mechanisms developed by C. albicans to evade, interfere with, or exploit host defenses are very diverse and are part of its pathogenic evolution. The analysis of the C. albicans surface after incubation with human serum, to obtain a partial breakdown of the immunological crosstalk between C. albicans and human serum proteins, is a relevant target of study. C. albicans mimics the human immune evasion model, using many surface proteins to bind plasminogen and complement inhibitors (FH and C4BP) to protect itself against complement activation. Furthermore, the identification of Pra1, Sap5, Tef1, and 9 more proteins found only on the C. albicans hyphal surface induced with 10% human serum, and not in other published conditions with an equivalent approach, is a relevant highlight of this analysis.

# AUTHOR CONTRIBUTIONS

EM designed and performed the experiments, analyzed the results, and wrote the manuscript. CP designed and performed the experiments. CH performed the experiments. MH performed the experiments and analyzed the results. CN provided research support. LM designed the experiments, analyzed the results, and critically revised the manuscript. CG designed the experiments, analyzed the results, and critically revised the manuscript.

# ACKNOWLEDGMENTS

This work was supported by BIO2012-31767 from the Ministerio de Economía y Competitividad, PROMPT (S2010/BMD-2414) from the Comunidad Autónoma de Madrid, REIPI-Spanish Network for the Research in Infectious Diseases (RD12/0015/0004) and PRB2 (PT13/0001/0038) funded by the ISCIII and FEDER. CP was the recipient of a fellowship from the program Continuous training for teachers of the Pontificia Javeriana University (Colombia) in conjunction with COLCIENCIAS-LASPAU University of Harvard. The proteomic analyses were carried out in the Proteomic Units of Complutense University of Madrid and of Centro de Investigaciones Biológicas (CSIC), members of ProteoRed network. The immunofluorescence microscopy images were acquired at the Centro Nacional de Microscopía Electrónica (ICTS)-UCM. The authors would like to thank Dr. Fernando Vivanco [Fundación Jimenez Diaz, Universidad Complutense, Madrid (Spain)] and Dr. Pilar Sánchez-Corral Gómez [Unidad de Investigación de Inmunopatología, Hospital Universitario La Paz, Madrid (Spain)] for the gift of C3 and FB antibodies, respectively. These results are aligned with the Spanish Initiative on the Infectious Diseases-Human Proteome Project (B/D-HPP).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2015.01343

# REFERENCES

information for Candida albicans and Candida glabrata. Nucleic Acids Res. 40, D667–D674. doi: 10.1093/nar/gkr945

clinical outcome and risk factors for death. Eur. J. Clin. Microbiol. Infect. Dis. 21, 767–774. doi: 10.1007/s10096-002-0822-1


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Marín, Parra-Giraldo, Hernández-Haro, Hernáez, Nombela, Monteoliva and Gil. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Structure and Interactions of a Dimeric Variant of sHIP, a Novel Virulence Determinant of Streptococcus pyogenes

Carl Diehl<sup>1,2</sup>, Magdalena Wisniewska<sup>1,3</sup>, Inga-Maria Frick<sup>4</sup>, Werner Streicher<sup>1,5</sup>, Lars Björck<sup>4</sup>, Johan Malmström<sup>4</sup> and Mats Wikström<sup>1\*†</sup>

<sup>1</sup> Protein Function and Interactions Group, Faculty of Health and Medical Sciences, The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Copenhagen, Denmark, <sup>2</sup> SARomics Biostructures, Lund, Sweden, <sup>3</sup> Malopolska Centre of Biotechnology, Krakow, Poland, <sup>4</sup> Division of Infection Medicine, Department of Clinical Sciences, Lund University, Lund, Sweden, <sup>5</sup> Novozymes A/S, Bagsvaerd, Denmark

#### Edited by:

Jonathan M. Blackburn, University of Cape Town, South Africa

#### Reviewed by:

Kai Papenfort, Ludwig-Maximilians-Universität München, Germany; Didier Soulat, Universitätsklinikum Erlangen, Germany

#### \*Correspondence:

Mats Wikström mats.wikstrom@amgen.com

†Present Address: Mats Wikström, Amgen Inc., Thousand Oaks, CA, USA

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 22 October 2015 Accepted: 18 January 2016 Published: 05 February 2016

#### Citation:

Diehl C, Wisniewska M, Frick I-M, Streicher W, Björck L, Malmström J and Wikström M (2016) Structure and Interactions of a Dimeric Variant of sHIP, a Novel Virulence Determinant of Streptococcus pyogenes. Front. Microbiol. 7:95. doi: 10.3389/fmicb.2016.00095

Streptococcus pyogenes is one of the most significant bacterial pathogens in the human population, mostly causing superficial and uncomplicated infections (pharyngitis and impetigo) but also invasive and life-threatening disease. We have previously identified a virulence determinant, protein sHIP, which is secreted at higher levels by an invasive compared to a non-invasive strain of S. pyogenes. The present work presents a further characterization of the structural and functional properties of this bacterial protein. Biophysical and structural studies have shown that protein sHIP forms stable tetramers both in the crystal and in solution. The tetramers are composed of four helix-loop-helix motifs with the loop regions connecting the helices displaying a high degree of flexibility. Owing to interactions at the tetramer interface, the observed tetramer can be described as a dimer of dimers. We identified three residues at the tetramer interface (Leu84, Leu88, Tyr95), which, due to their largely non-polar side-chains, could be important determinants for protein oligomerization. Based on these observations, we produced a sHIP variant in which these residues were mutated to alanines. Biophysical experiments clearly indicated that the sHIP mutant appears only as dimers in solution, confirming the importance of the interfacial residues for protein oligomerization. Furthermore, we could show that the sHIP mutant interacts with intact histidine-rich glycoprotein (HRG) and the histidine-rich repeats in HRG, and inhibits their antibacterial activity to the same or even higher extent as compared to the wild type protein sHIP. We determined the crystal structure of the sHIP mutant, which, as a result of the high quality of the data, allowed us to improve the existing structural model of the protein. Finally, by employing NMR spectroscopy in solution, we generated a model for the complex between the sHIP mutant and an HRG-derived heparin-binding peptide, providing further molecular details into the interactions involving protein sHIP.

Keywords: proteomics, structural biology, virulence factors, protein-protein interactions, host-pathogen relationship

# INTRODUCTION

Streptococcus pyogenes, also known as group A streptococci (GAS) is a significant human pathogen that infects and colonizes the skin and the upper respiratory tract where it causes relatively mild clinical conditions such as impetigo and pharyngitis. Some infections caused by invasive strains, such as the AP1 strain of the M1 serotype, can lead to severe and potentially life-threatening diseases such as necrotizing fasciitis and streptococcal toxic shock syndrome (STSS), whereas acute rheumatic fever and glomerulonephritis are sequelae to acute S. pyogenes infections. S. pyogenes causes an estimated 700 million cases of mild and non-invasive infections each year, of which ∼650,000 progress to severe invasive infections with a mortality of at least 25% (Carapetis et al., 2005; Ralph and Carapetis, 2013). S. pyogenes produces a number of proteins that enable the bacterium to attach to host tissues, evade the immune response, and spread by penetrating host tissue layers. These virulence factors are predominantly secreted or surface associated proteins, and they include the family of M proteins (Lancefield, 1962; Swanson et al., 1969; Phillips et al., 1981), fibronectin-binding proteins (Talay et al., 1994; Kreikemeyer et al., 1995; Jaffe et al., 1996; Courtney et al., 1999; Rocha and Fischetti, 1999; Terao et al., 2001), superantigenic exotoxins (Stevens et al., 1989; Abe et al., 1991; Tomai et al., 1992; Mollick et al., 1993; Norrby-Teglund et al., 1994), and the secreted streptococcal inhibitor of complement referred to as protein SIC (Åkesson et al., 1996; Fernie-King et al., 2002; Frick et al., 2003).

In a recent study, we quantitatively analyzed and compared S. pyogenes proteins in the growth medium of a strain that is virulent to mice (AP1), with a non-virulent strain (SF370). We found that one protein in particular was present at significantly higher levels in the stationary growth medium from the virulent strain. The amount of sHIP in the medium fraction is similar to the secreted mitogenic exotoxin SmeZ (Kamezawa et al., 1997), and shows an overall abundance profile resembling the profile observed for the surface associated proteins H and M1 (Lancefield, 1962; Åkesson et al., 1990; Wisniewska et al., 2014). Through the use of affinity pull-down mass spectrometry analysis of human plasma, we could demonstrate that the new bacterial protein interacts with the antimicrobial human protein histidine-rich glycoprotein (HRG), and the name sHIP (streptococcal Histidine-rich glycoprotein Interacting Protein) was therefore introduced (Wisniewska et al., 2014). HRG is an abundant plasma glycoprotein of approximately 60 kDa that interacts with several other protein ligands such as tropomyosin, heparin, plasminogen, plasmin, fibrinogen, and IgG (Jones et al., 2005). HRG has been shown to exhibit broad antimicrobial potency (Rydengård et al., 2007), including activity against S. pyogenes (Shannon et al., 2010). It was shown that sHIP binds both intact HRG and HRG-derived peptides (peptides containing consensus heparin-binding sequences) with high affinity. Moreover, the antibacterial activity of HRG is blocked by protein sHIP, which represents a new mechanism that can contribute to the virulence of AP1 bacteria. In addition, we could show that patients with severe S. pyogenes infection, in contrast to patients with superficial and uncomplicated infections, are more prone to develop antibodies against sHIP, which suggests that sHIP represents a novel virulence determinant (Wisniewska et al., 2014). 
Furthermore, the determination of the three-dimensional structure of sHIP showed that it has a tetrameric organization composed of four helix-loop-helix motifs. A similar structural unit can be found in the adhesion factor FadA from Fusobacterium nucleatum. However, the two proteins differ significantly in their respective oligomeric organization. In FadA, the helix-loop-helix motifs form elongated fibers whereas the sHIP monomers are organized into a compact tetrameric structure. In order to understand the molecular prerequisites for the observed interaction between sHIP and HRG, we have performed extensive crystallization trials but have not been able to obtain any diffracting crystals of complexes between sHIP and HRG or any HRG-derived peptides. Due to the size of the tetrameric sHIP, NMR experiments in solution have proven to be very challenging and would benefit from access to a sHIP variant with lower molecular weight and retained activity. We have previously made the observation that the tetrameric structure of sHIP can also be described as a dimer of dimers, and postulated that three residues in the dimer-dimer interface could represent important determinants for the stabilization of the tetrameric form of sHIP. In this study, we have mutated these three interfacial residues and have thereby been able to produce a stable dimeric variant of sHIP that was shown to be active both in biophysical binding experiments and in an antimicrobial assay. The generation of stable dimers enabled further characterization of the interaction between sHIP and a peptide from HRG through NMR experiments, providing the first molecular details of an interaction involving this novel virulence determinant.

# MATERIALS AND METHODS

# Mutagenesis

The DNA sequence corresponding to residues Lys3-Met98 in the S. pyogenes protein sHIP (UniProt ID: Q99XU0) was expressed as described previously and denoted sHIPwt (Wisniewska et al., 2014). A protein sHIP variant, a quadruple mutant named sHIPqp, in which Leu84, Leu88, and Tyr95 were mutated to alanine and Cys65 was mutated to serine, was constructed using the QuikChange XL site-directed mutagenesis kit (Agilent) following the manufacturer's instructions. We have previously observed that the wildtype protein can form larger aggregates through the exposed Cys65 residue in the absence of a reducing agent. The mutation Cys65Ser was introduced in order to avoid the need to use a reducing agent in subsequent experiments. As previously noted, this mutation has no effect on the structure and function of sHIP (Wisniewska et al., 2014). We therefore use the Cys65Ser mutant in all studies involving both the wildtype sHIP and the sHIP variant. All mutations were verified by DNA sequence analysis. The resulting expression construct contained a His-tag and a TEV protease cleavage site preceding the protein sequence of interest.

# Protein Expression and Purification

The protein sHIP variants sHIPwt and sHIPqp were expressed in E. coli strain Rosetta BL21 (DE3) and purified in two steps using standard immobilized metal ion affinity chromatography (IMAC) followed by proteolytic removal of the His-tag, and reverse phase chromatography (RPC), as described previously (Wisniewska et al., 2014). For the NMR studies, protein was produced using minimal medium for cell growth in order to introduce the stable isotopes <sup>15</sup>N and <sup>13</sup>C, enabling triple-resonance NMR experiments (Neidhardt et al., 1974). The His-tag was removed by treatment with tobacco etch virus (TEV) protease, giving the following amino acid sequences for the wildtype protein sHIP, sHIPwt:

SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEEYATLRNFYSSDEWFRLANQPWDDIPCGVLSEDLLFDMIGDHNQLLADILDLAPIMYKHM,

and the quadruple mutant protein sHIP, sHIPqp:

SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEEYATLRNFYSSDEWFRLANQPWDDIPSGVLSEDLLFDMIGDHNQLAADIADLAPIMAKHM, respectively.
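The substitutions that distinguish the two listed sequences can be checked with a short position-by-position comparison. The sketch below is only an illustration: the function name `list_substitutions` is hypothetical, and it assumes that residue numbering starts at 1 with the first listed residue (the Ser left after TEV cleavage), which is the numbering that reproduces the residue positions named in the Mutagenesis section.

```python
# Sequences copied from the text (line-wrap spaces removed).
SHIP_WT = (
    "SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEE"
    "YATLRNFYSSDEWFRLANQPWDDIPCGVLSEDLLFDMIG"
    "DHNQLLADILDLAPIMYKHM"
)
SHIP_QP = (
    "SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEE"
    "YATLRNFYSSDEWFRLANQPWDDIPSGVLSEDLLFDMIG"
    "DHNQLAADIADLAPIMAKHM"
)


def list_substitutions(reference, variant):
    """Return (reference_residue, position, variant_residue) for each mismatch.

    Positions are 1-based; both sequences must have the same length
    (no insertions or deletions are handled in this sketch).
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must have equal length")
    return [
        (ref, pos, var)
        for pos, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


substitutions = list_substitutions(SHIP_WT, SHIP_QP)
print(substitutions)
# [('C', 65, 'S'), ('L', 84, 'A'), ('L', 88, 'A'), ('Y', 95, 'A')]
```

The four differences correspond to Cys65Ser, Leu84Ala, Leu88Ala, and Tyr95Ala, in agreement with the description of sHIPqp.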

The purity and monodispersity of the recombinant proteins were verified by SDS-PAGE and mass spectrometry.

# Crystallization, X-Ray Data Collection, and Structure Determination

Crystallization of the protein sHIP variant was carried out using the sitting drop vapor diffusion method at 18 °C. The best crystals were obtained from 20% PEG 6000, 0.2 M calcium chloride, and 0.1 M Tris, pH 8.0. Prior to plunge freezing, the crystals were soaked for approximately 30 s in a drop of a reservoir solution containing 20% v/v ethylene glycol as cryo-protectant. The crystals belonged to the space group P22<sub>1</sub>2<sub>1</sub> and contained one monomer per asymmetric unit. The highest resolution data set of 180 frames (1° oscillation range) was collected from an orthorhombic crystal at the MAX II laboratory beamline I911-2, Lund, Sweden, using a MAR CCD detector. Data were indexed and integrated using iMOSFLM (Battye et al., 2011) and scaled using SCALA (Evans, 2006) from the CCP4 suite of programs. The structure was solved by molecular replacement using wild type sHIP as the search model (PDB ID: 4MER), and refined using REFMAC (Murshudov et al., 1997). Refinement rounds were complemented with manual rebuilding using COOT (Emsley et al., 2010). Water molecules were automatically inserted using ARP/wARP and visually inspected. For further details on data processing and refinement statistics, see **Table 1**. Geometry of the model was analyzed with MolProbity (Chen et al., 2010). Molecular graphics images were generated using the program PyMOL. The atomic coordinates and structure factors for the sHIP mutant (sHIPqp) crystal structure have been deposited with the Protein Data Bank (PDB ID: 4PZ1).

# Analytical Ultracentrifugation

Sedimentation velocity experiments were performed at 20 °C and 50000 rpm using a Beckman XL-I instrument. Samples containing either sHIPwt or sHIPqp were in 20 mM MES buffer (2-(N-morpholino)ethanesulfonic acid), pH 5.5. The data were analyzed with SEDFIT using a continuous c(s) distribution (Schuck, 2000). HYDROPRO (Ortega et al., 2011) was used to calculate the theoretical sedimentation coefficient using the PDB IDs 4MER (sHIPwt) and 4PZ1 (sHIPqp), respectively.

TABLE 1 | Crystallographic data collection and refinement statistics.

<sup>a</sup>Values for the highest resolution shell are indicated in parentheses.

<sup>b</sup>RMSD, root mean square deviation.

a, b, and c refer to the unit cell axis dimensions, while α, β, and γ refer to the inclination angles of the axes in the unit cell. The space group P22<sub>1</sub>2<sub>1</sub> describes the symmetry of the crystal.

# Synthetic Peptide

The HRG peptide used in the study, referred to as HRGsingle, corresponds to a single heparin-binding motif and has the amino acid sequence GHHPHG. The peptide was synthesized by Biosyntan GmbH (Berlin, Germany), and purity and molecular mass were confirmed by MALDI-TOF MS.

# Antimicrobial Assay

The S. pyogenes strain AP1 (40/58) was from the World Health Organization Collaborating Centre for Reference and Research on Streptococci, Prague, Czech Republic. The bacteria were cultivated in THY [Todd-Hewitt broth (Difco) supplemented with 0.2% yeast extract (Oxoid)] at 37 °C and 5% CO<sub>2</sub> until reaching mid-log phase (OD at 620 nm of approximately 0.4). The bacterial cells were washed and resuspended in 10 mM Tris-HCl, pH 7.5, containing 5 mM glucose, to a concentration of 2 × 10<sup>9</sup> cfu (colony forming units)/ml. Subsequently, the bacteria were diluted to a concentration of 2 × 10<sup>6</sup> cfu/ml in 10 mM MES buffer, pH 5.5, containing 5 mM glucose. Fifty microliters of the bacterial solution was incubated with recombinant His-tagged HRG (Creative Biomart) at a concentration of 0.45 µM together with various concentrations of protein sHIPwt or protein sHIPqp for 40 min at 37 °C. Serial dilutions of the incubation mixtures were plated on TH agar, the plates were incubated overnight at 37 °C, and the number of cfu was determined.
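The arithmetic behind the dilution and plating steps can be sketched as follows. This is a minimal illustration rather than the protocol actually used: the function names are hypothetical, and the plated volume and plate dilution used in the usage note are assumptions, since neither is stated in the text.

```python
def dilution_factor(stock_cfu_per_ml, target_cfu_per_ml):
    """Fold dilution needed to go from the stock to the target concentration."""
    return stock_cfu_per_ml / target_cfu_per_ml


def cfu_in_volume(cfu_per_ml, volume_ul):
    """Number of cfu contained in a given volume (in microliters)."""
    return cfu_per_ml * volume_ul / 1000.0


def cfu_per_ml_from_plate(colonies, plate_dilution, plated_volume_ml):
    """Back-calculate cfu/ml of the undiluted mixture from one plate count.

    plate_dilution is the fraction of the original mixture on the plate,
    e.g. 1e-2 for a 100-fold serial dilution.
    """
    return colonies / (plate_dilution * plated_volume_ml)


# Values taken from the text: 2e9 cfu/ml stock, 2e6 cfu/ml working suspension,
# 50 microliters of bacterial solution per incubation.
print(dilution_factor(2e9, 2e6))   # 1000.0
print(cfu_in_volume(2e6, 50))      # 100000.0
```

As a usage example under assumed plating conditions, 120 colonies on a plate that received 0.1 ml of a 100-fold dilution would be evaluated as `cfu_per_ml_from_plate(120, 1e-2, 0.1)`, and survival could then be expressed relative to a control incubated without HRG.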

# Isothermal Titration Calorimetry (ITC)

ITC experiments were performed using a VP-ITC200 instrument (GE Healthcare). The samples were extensively dialyzed against 20 mM MES buffer, pH 5.5. HRGsingle peptide at 0.5 mM was titrated into 0.05 mM of sHIPwt or sHIPqp, respectively. All experiments were performed at 25 °C and run until saturation was achieved. The data were fitted to a model describing one independent binding site using the software provided by the manufacturer (Wiseman et al., 1989).
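For orientation, the mass-action relation underlying a model with one independent binding site can be written down in a few lines. The sketch below is not the manufacturer's fitting software: the function name `bound_complex` and the dissociation constant of 10 µM are illustrative assumptions (the fitted values belong to Table 2), and dilution of the cell contents by each injection is ignored.

```python
import math


def bound_complex(protein_total, ligand_total, kd):
    """Concentration of the 1:1 complex at equilibrium.

    Solves Kd = [P][L]/[PL] with [P] = protein_total - [PL] and
    [L] = ligand_total - [PL]. All arguments share one concentration unit.
    """
    b = protein_total + ligand_total + kd
    return (b - math.sqrt(b * b - 4.0 * protein_total * ligand_total)) / 2.0


# Cell concentration from the text: 0.05 mM protein = 50 µM.
# Kd = 10 µM is an assumed value used only for illustration.
protein_um = 50.0
kd_um = 10.0
for ligand_um in (0.0, 25.0, 50.0, 100.0, 200.0):
    occupied = bound_complex(protein_um, ligand_um, kd_um) / protein_um
    print(f"total peptide {ligand_um:6.1f} µM -> fraction of sites occupied {occupied:.2f}")
```

In an actual ITC analysis the heat of each injection is proportional to the change in complex concentration between injections, and the stoichiometry, affinity, and enthalpy are fitted together.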

# NMR Experiments

Protein samples were prepared in 20 mM MES, pH 5.5, with 7% (v/v) D<sub>2</sub>O for the spectrometer lock and 0.02% (w/v) NaN<sub>3</sub> to prevent bacterial growth in the samples. All NMR experiments were performed at 25 °C. For the studies of the sHIPqp/HRGsingle peptide complex, the sample contained 1.0 mM of double-labeled (<sup>15</sup>N, <sup>13</sup>C) protein sHIPqp and 1.1 mM of the HRGsingle peptide. The backbone and side-chain resonance assignment experiments were performed at 800 MHz proton resonance frequency, whereas the experiments used to obtain Nuclear Overhauser Enhancement (NOE) distance restraints were acquired at 900 MHz proton resonance frequency. The assignment of the signals from sHIPqp and the HRGsingle peptide, and the collection of NOE restraints, were performed using a combination of standard 2D and 3D triple-resonance experiments including HNCA (Clore et al., 1990), HNCOCA (Bax and Ikura, 1991), HNCACB (Wittekind and Mueller, 1993), CBCACONH (Grzesiek and Bax, 1992), HCCONH (Montelione et al., 1992), CCONH (Montelione et al., 1992), <sup>15</sup>N-NOESY-HSQC (Marion et al., 1989), <sup>13</sup>C-NOESY-HSQC (separate for aliphatic and aromatic signals; Marion et al., 1989), 2D double-filtered NOESY (Ikura et al., 1992), and 2D single-filtered NOESY (Ikura et al., 1992). In addition, 2D NOESY and TOCSY experiments were acquired for the HRGsingle peptide. Spectra were processed with NMRPipe (Delaglio et al., 1995), applying zero-filling and linear prediction in the indirect dimensions and solvent filter and polynomial baseline correction in the direct dimension. CCPNMR (Vranken et al., 2005) was used for visualization of spectra, resonance connectivity analysis, and distance restraint (NOE) assignments. Structure calculations were performed using an iterative procedure within the ARIA/CNS suite of programs (Brunger et al., 1998). The quality of the NMR-generated structural models was assessed using the Protein Structure Validation Suite (PSVS, **Table 3**; Bhattacharya et al., 2007). Molecular graphics images were generated using the program PyMOL (**Figure 5**).

# RESULTS

# The sHIPqp Mutant Forms Stable Dimers of Two Helix-Loop-Helix Motifs

The wildtype sHIP is composed of tetramers that can be described as a dimer of two dimers, each harboring two helix-loop-helix motifs (**Figure 1A**). The hypothesis was that the three non-polar residues at the tetramer interface (Leu84, Leu88, Tyr95) represent important determinants for the protein oligomerization (**Figure 1B**), and these residues were therefore mutated to alanines in the sHIPqp mutant protein. The orthorhombic crystals of the sHIPqp mutant contain one monomer in the asymmetric unit, with a solvent content of 36%. The crystal packing shows the formation of a dimer with a total buried surface area of 6070 Å<sup>2</sup> as calculated by the PISA server (Krissinel and Henrick, 2007). The final model comprises residues 3-98, corresponding to the entire sHIPqp molecule (**Figure 1C**). The structure is very well defined, with an average B factor of 12.2 Å<sup>2</sup> (10.8 Å<sup>2</sup> and 13.6 Å<sup>2</sup> for main chain and side chains, respectively). Final electron density maps are of very high quality, including the loop region, which was partially not visible in sHIP wild type (**Figure 1D**). All non-glycine residues exhibit main-chain angles in the favored regions of the Ramachandran plot as defined by the program MolProbity (Chen et al., 2010; **Table 1**). To provide a quantitative analysis of the oligomeric states of the sHIP variants, analytical ultracentrifugation was employed on sHIPwt and the sHIPqp mutant. These experiments clearly show that the wildtype sHIP (sHIPwt) appears as tetramers in solution whereas the sHIP mutant (sHIPqp) exists as dimers (**Figure 2**). The largest differences in the monomeric helix-loop-helix motif between the two sHIP variants are observed in the loop region connecting the two helices. In the sHIPqp mutant, all residues, including side-chains, are very well defined, while in sHIPwt, some residues could not be defined due to the absence of interpretable electron density maps in this region (**Figure 1D**).

FIGURE 2 | Sedimentation velocity experiments for protein sHIP. The dotted lines indicate the theoretical sedimentation coefficients for a tetramer and a dimer, respectively, calculated using HYDROPRO and the structural coordinates for protein sHIPwt (PDB ID: 4MER) and protein sHIPqp (PDB ID: 4PZ1). The wild-type sHIPwt is shown in black and the sHIPqp mutant in red.

# The sHIPqp Mutant Inhibits the Antibacterial Activity of HRG

It is known that HRG has antibacterial activity, including activity against S. pyogenes (Shannon et al., 2010), and when challenged by HRG, sHIP has been shown to rescue S. pyogenes bacteria (Wisniewska et al., 2014). To examine whether the sHIP mutant has the same ability, we performed an antimicrobial assay comparing sHIPwt and sHIPqp. As shown in **Figure 3**, sHIPqp inhibits the bactericidal activity of HRG to the same or even higher extent than wildtype sHIP, indicating that the dimeric mutant sHIPqp retains its inhibitory capacity.

# sHIP Binds a Single Heparin-Binding Motif in HRG

In our previous studies we have shown that protein sHIP binds to both HRG and HRG-derived peptides with nanomolar to micromolar affinity, and that the histidine-rich region is important for the interaction (Wisniewska et al., 2014). To further dissect the molecular details of the interaction, we investigated whether the sHIP variants are also able to bind a single heparin-binding motif (GHHPHG) in HRG (denoted HRGsingle). Isothermal titration calorimetry (ITC) experiments showed that both sHIPwt and sHIPqp bind HRGsingle with similar micromolar affinities, clearly indicating that the heparin-binding motif represents an important recognition motif in HRG for the interaction with protein sHIP (**Figure 4**, **Table 2**).

# The HRG-Binding Site in sHIP Involves Both Helix-Loop-Helix Motifs of the Dimer

The wildtype protein sHIP is a tetrameric protein with a molecular weight of 46 kDa, which complicates NMR experiments in solution due to broad and overlapping resonances. Through the creation of the mutant dimeric form of sHIP, sHIPqp, and the verification of its functional properties, we now have access to a smaller, yet biologically relevant system with a molecular weight of 23 kDa, which enabled detailed NMR studies in solution.
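The quoted molecular weights can be cross-checked against the sequences given in Materials and Methods. The sketch below sums standard average residue masses and adds one water per chain; the function name `average_mass` is hypothetical, and the calculation assumes unlabeled, unmodified protein (isotope labeling for NMR would increase the mass slightly).

```python
# Average residue masses in Da (amino acid minus water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528

SHIP_WT = (
    "SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEE"
    "YATLRNFYSSDEWFRLANQPWDDIPCGVLSEDLLFDMIG"
    "DHNQLLADILDLAPIMYKHM"
)
SHIP_QP = (
    "SMKQDQLIVEKMEQTYEAFSPKLANLIEALDAFKEHYEE"
    "YATLRNFYSSDEWFRLANQPWDDIPSGVLSEDLLFDMIG"
    "DHNQLAADIADLAPIMAKHM"
)


def average_mass(sequence):
    """Average mass in Da of one unmodified polypeptide chain."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER


tetramer_kda = 4 * average_mass(SHIP_WT) / 1000.0
dimer_kda = 2 * average_mass(SHIP_QP) / 1000.0
print(f"sHIPwt tetramer: {tetramer_kda:.1f} kDa")
print(f"sHIPqp dimer:    {dimer_kda:.1f} kDa")
```

Rounded to whole kilodaltons, the computed values agree with the 46 kDa and 23 kDa stated for the tetramer and the dimer.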

A molecular model was created from a combination of the structure of sHIPqp (PDB ID: 4PZ1) and distance restraints (NOEs) extracted from the NMR experiments on the complex between sHIPqp and the HRGsingle peptide. A total of 1714 NOEs were used as distance restraints in the structure calculations, of which 321 are long-range restraints and 56 represent NOEs detected between sHIPqp and the HRGsingle peptide (**Table 3**). It should be noted that this model represents a preliminary description of the sHIPqp-HRGsingle complex that will need further refinement before being considered a finalized complex structure. However, the assessment of the generated structure shows that it is of sufficient quality to serve as a relevant descriptive model for the complex (**Table 3**). The model shows that the HRGsingle peptide binds in a pocket created by residues from both monomeric helix-loop-helix motifs, suggesting that the dimeric form represents the smallest sHIP variant attainable with retained HRG-binding capacity. Each sHIP dimer binds two HRGsingle peptides in two identical sites (**Figure 5A**). The binding site includes the N-terminal part of helix 1 of one monomer (residues Asp5-Met12) and the end region of helix 1 of the second monomer (residues Ala41-Ala56). For the first monomer, three residues are involved in NOE contacts with the HRGsingle peptide, with the primary contribution from Ile8. For the second monomer, thirteen residues give rise to detectable NOEs, dominated by contacts involving Phe46, Glu51, and Arg54. Moreover, the binding site does not involve any residues in the proximity of the dimer-dimer interface, in agreement with the observation that the mutant sHIPqp maintains its full capacity to inhibit HRG in antimicrobial assays (**Figure 3**). The two external glycine residues of the HRGsingle peptide, Gly-1 and Gly-6, are not involved in any interactions with sHIP. However, the core part of the peptide (HHPH) forms multiple contacts with a binding pocket created by residues from both chains of the structural framework created by the sHIP dimer. In particular, residues His-2 and His-5 are positioned deeply in the binding pocket, a conformation of the peptide backbone that is made possible by the unique backbone properties of the connecting Pro-4 residue (**Figure 5B**).

TABLE 2 | Summary of thermodynamic parameters from ITC experiments of the binding of the two sHIP variants, sHIPwt and sHIPqp, to the HRGsingle peptide.

# DISCUSSION

Bacterial infections represent a major challenge to human health worldwide, a challenge increasing at an elevated pace due to the alarming upsurge of antibiotic resistance. In order to treat bacterial infections in the future, new therapeutic strategies need to be developed that will require a better understanding of the complex molecular interplay between bacterial pathogens and their human host. S. pyogenes represents one of the most significant bacterial pathogens in the human population. Using mass spectrometry based proteomics, we previously identified a novel protein and virulence factor from this bacterium (Wisniewska et al., 2014). In this study we have extended the structural and functional characterization of this protein to further our understanding of the interactions between sHIP and the human host.

<sup>a</sup>Analyzed for residues 2 to 98 (sHIPqp chain A), 102-198 (sHIPqp chain B), 301-306 (HRGsingle peptide 1), and 401-406 (HRGsingle peptide 2).

<sup>b</sup>PROCHECK (Laskowski et al., 1993, 1996).

<sup>c</sup>Ramachandran statistics calculated by PROCHECK (Laskowski et al., 1993, 1996).

The structure determination of the sHIPqp mutant showed that the fold of the sHIP mutant and the sHIP wild type monomers is essentially identical, displaying an elongated helix-loop-helix motif (**Figure 1**). However, the most significant difference is related to the oligomeric states of the two protein variants. For wildtype sHIP, four monomers assemble into a compact homotetramer, which can be described as a dimer of dimers due to the interactions at the dimer-dimer interface. Our previous studies showed that this interface consists of amino acids with largely non-polar side-chains, such as Leu84, Leu88, Tyr95, and Met98, which can be responsible for protein oligomerization. The mutation of these residues to alanine resulted in a shift in quaternary structure from tetramers to dimers, further evidenced by sedimentation velocity experiments that clearly show that the sHIPqp mutant exists as dimers in solution (**Figure 2**).

To gain further understanding of the interaction between sHIP and HRG, a molecular model was created for the complex between sHIPqp and a single heparin-binding motif of HRG (HRGsingle), a peptide shown to be active in binding to both sHIPwt and sHIPqp (**Figure 4**). The model shows that the HRGsingle peptide interacts with both monomers of the helix-loop-helix dimers, and that a proline situated in the middle of the peptide sequence allows two histidines, His-2 and His-5, to enter deeply into the binding pocket created by the two sHIP monomers (**Figures 5A,B**). However, even if both monomers contribute to contacts with the HRGsingle peptide, one of the monomers clearly dominates the overall contribution to the binding epitope for the sHIP dimer. Future studies should involve mutagenesis experiments of both sHIP and HRG peptides to further define the critical determinants for the interaction.

# CONCLUDING REMARKS

Through structure-based protein engineering of a selected set of residues in the dimer-dimer interface of sHIP, we were able to generate a dimeric form of the protein with retained functional properties. The access to dimeric sHIP provides the opportunity to utilize NMR techniques in order to further elucidate interactions involving this protein. In conclusion, we have through a combination of protein engineering, biophysics, and structural biology methods, provided the first molecular details describing the interaction between the antimicrobial protein HRG and the novel virulence factor sHIP.

# AUTHOR CONTRIBUTIONS

CD and MagW performed the NMR and X-ray experiments, respectively. IF performed the antimicrobial assay, WS the AUC experiments, and JM the MS proteomics analysis. LB provided the clinical data assessment. MW (corresponding author) supervised the project, performed the ITC experiments, and wrote the manuscript with input from all authors.

# FUNDING

This work was supported by the Swedish Research Council (projects 7480, 2008:3356, and 621-2012-3559), the Swedish Foundation for Strategic Research (grant FFL4), the Crafoord Foundation (grant 20100892), the Wallenberg Academy Fellow KAW (2012.0178), the European Research Council starting grant (ERC-2012-StG-309831), the Novo Nordisk Foundation grant NNF14CC0001, the European Commission under the Seventh Framework Program (FP7) contract Bio-NMR 261863 providing access to the Swedish NMR Centre, Gothenburg, Sweden, and the MAX laboratory, Lund, Sweden enabling access to the MAX laboratory (ID 20120014 and 20130189).

# ACKNOWLEDGMENTS

We would like to thank the SyBIT project of SystemsX.ch for support and maintenance of the lab-internal computing infrastructure, and the ITS HPC team (Brutus).


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Diehl, Wisniewska, Frick, Streicher, Björck, Malmström and Wikström. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry

Matthys G. Potgieter<sup>1</sup>, Kehilwe C. Nakedi<sup>2</sup>, Jon M. Ambler<sup>1</sup>, Andrew J. M. Nel<sup>2</sup>, Shaun Garnett<sup>2</sup>, Nelson C. Soares<sup>2</sup>, Nicola Mulder<sup>1</sup>\* and Jonathan M. Blackburn<sup>2</sup>\*

<sup>1</sup> Computational Biology Division, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa, <sup>2</sup> Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, IDM, University of Cape Town, Cape Town, South Africa

Biochemical evidence is vital for accurate genome annotation. The integration of experimental data collected at the proteome level using high resolution mass spectrometry allows for improvements in genome annotation by providing evidence for novel gene models, while validating or modifying others. Here, we report the results of a proteogenomic analysis of a reference strain of Mycobacterium smegmatis (mc<sup>2</sup>155), a fast growing model organism for the pathogenic Mycobacterium tuberculosis, the causative agent of tuberculosis. By integrating high throughput LC/MS/MS proteomic data with genomic six frame translation and ab initio gene prediction databases, a total of 2887 ORFs were identified, including 2810 ORFs annotated to a Reference protein, and 63 ORFs not previously annotated to a Reference protein. Further, the translational start site (TSS) was validated for 558 Reference proteome gene models, while upstream translational evidence was identified for 81. In addition, N-terminus derived peptide identifications allowed for downstream TSS modification of a further 24 gene models. We validated the existence of six previously described interrupted coding sequences at the peptide level, and provide evidence for four novel frameshift positions. Analysis of peptide posterior error probability (PEP) scores indicates high-confidence novel peptide identifications and shows that the genome of M. smegmatis mc<sup>2</sup>155 is not yet fully annotated. Data are available via ProteomeXchange with identifier PXD003500.

Keywords: Mycobacterium smegmatis mc<sup>2</sup>155, mass spectrometry, proteogenomics, genome annotation, proteomics

# INTRODUCTION

Evidence for the existence of protein coding genes includes ab initio gene predictions, transcriptomic analysis, and comparative genomics information (Krug et al., 2011). Although gene annotation of model organisms often relies on transcript sequencing, it has become apparent that evidence of transcription may not equal evidence of translation (Castellana and Bafna, 2010). Proteomics data, on the other hand, give direct evidence of which genes are translated. Using proteomics data, true genes can be separated from pseudogenes, and the translational frame can be determined. The locations of translational start sites (TSSs), signal cleavage sites, and other posttranslational modifications (PTMs) can also be identified (Kucharova and Wiker, 2014). By mapping to uncharacterized genomic coordinates, novel genes can be identified (Borchert et al., 2010).

#### Edited by:

Biswarup Mukhopadhyay, Virginia Tech, USA

#### Reviewed by:

Leonard James Foster, University of British Columbia, Canada Julian Uszkoreit, Ruhr-Universität Bochum, Germany

#### \*Correspondence:

Nicola Mulder nicola.mulder@uct.ac.za; Jonathan M. Blackburn jonathan.blackburn@uct.ac.za

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

Received: 25 October 2015; Accepted: 16 March 2016; Published: 05 April 2016

#### Citation:

Potgieter MG, Nakedi KC, Ambler JM, Nel AJM, Garnett S, Soares NC, Mulder N and Blackburn JM (2016) Proteogenomic Analysis of Mycobacterium smegmatis Using High Resolution Mass Spectrometry. Front. Microbiol. 7:427. doi: 10.3389/fmicb.2016.00427

Genomic six frame translation allows for the creation of a database containing, in the ideal case, all possible putative proteins, but at the cost of including many spurious entries. This leads to decreased sensitivity of identifications at the same FDR as standard proteomic databases with a higher proportion of non-spurious entries (Castellana and Bafna, 2010). Due to the large database sizes involved in proteogenomics, and large numbers of spectra that need to be assigned, many false positive identifications are obtained even at a low error rate. To limit the number of spurious entries, thus increasing the sensitivity of identifications, proteogenomic databases can be compacted, by excluding entries below a minimum length cutoff or only focusing on genomic regions identified by ab initio gene prediction tools (Castellana and Bafna, 2010).
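The idea of a six frame translation database compacted by a minimum length cutoff can be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the standard genetic code is used, entries are defined as stop-to-stop segments, and the default cutoff of 20 residues is an arbitrary placeholder.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_entries(dna, min_len=20):
    """Return (strand, frame, peptide) for stop-to-stop segments >= min_len."""
    dna = dna.upper()
    entries = []
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for frame in range(3):
            for segment in translate(seq[frame:]).split("*"):
                if len(segment) >= min_len:
                    entries.append((strand, frame, segment))
    return entries


# Toy sequence; a real database would be built from the full genome.
for entry in six_frame_entries("ATGAAATAGATGCCC", min_len=4):
    print(entry)
```

Raising `min_len` removes short segments, most of which are spurious, at the cost of also discarding genuine short proteins, which is the trade-off described above.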

Automated gene prediction at the early stage of genome annotation is prone to errors, with rates of incorrect TSS prediction of up to 44% reported (Gallien et al., 2009), while short protein-coding genes are difficult to predict (Renuse et al., 2011). Accurately identifying the TSS of a gene is complicated by the existence of different possible start codons, with many nonstandard start codons identified in prokaryotes (Castellana and Bafna, 2010). Mycobacteria are known to use GTG and TTG as initiator Met (fMet) start codons, in addition to translation of these codons with Val and Leu, respectively (Kelkar et al., 2011).
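The ambiguity created by alternative start codons can be made concrete with a small sketch that lists every in-frame candidate start within an open reading frame. The function names are hypothetical, and only the three codons named above (ATG, GTG, TTG) are considered.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
# Residues these codons encode when they occur at internal positions.
INTERNAL_RESIDUE = {"ATG": "M", "GTG": "V", "TTG": "L"}


def candidate_starts(orf_dna):
    """Return 0-based nucleotide offsets of in-frame candidate start codons."""
    orf_dna = orf_dna.upper()
    return [
        i for i in range(0, len(orf_dna) - 2, 3) if orf_dna[i:i + 3] in START_CODONS
    ]


def encoded_residue(codon, is_initiator):
    """Residue for ATG/GTG/TTG depending on whether it initiates translation."""
    codon = codon.upper()
    if is_initiator and codon in START_CODONS:
        return "M"  # initiator Met regardless of which start codon is used
    return INTERNAL_RESIDUE[codon]


# Toy ORF: GTG AAA TTG CCC ATG TAA
print(candidate_starts("GTGAAATTGCCCATGTAA"))   # [0, 6, 12]
print(encoded_residue("GTG", is_initiator=True), encoded_residue("GTG", is_initiator=False))
```

Each candidate offset predicts a different N-terminal peptide, which is why experimentally observed N-terminal peptides are useful for choosing among them.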

Wang et al. (2005) used databases of gene predictions to identify mass spectra, identifying 901 proteins in Mycobacterium smegmatis using the partially sequenced genome available at the time and validating many predicted genes. Gallien et al. (2009) identified 946 proteins in M. smegmatis, characterizing 443 N-terminal peptides, and revealed an error rate of 19% in predicted TSSs. Kelkar et al. (2011) reported 41 novel protein-coding genes in Mycobacterium tuberculosis strain H37Rv. By identifying N-terminal peptides, the authors were able to correct the TSS of 33 proteins, and validate the TSS of 727 annotated proteins. Strikingly, the authors identified eight proteins with evidence for translation initiation at two different sites.

The occurrence of false positive peptide identifications from tandem mass spectrometry data implies the need for statistical concepts such as the false discovery rate (FDR) and posterior error probability (PEP) score to gauge the reliability of identifications. The FDR is controlled by scoring and ranking peptide spectral matches (PSMs), and identifying the score above which a maximum allowable proportion of PSMs from a decoy database are identified—indicating the expected maximum proportion of incorrect PSMs—while posterior error probability (PEP) indicates the probability of error of a single PSM (Käll et al., 2008). The MaxQuant algorithms calculate PEP scores based on search engine PSM scores and peptide length, and determine a cutoff PEP score at a specific FDR (Cox and Mann, 2008).
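A generic target-decoy estimate of the score cutoff at a chosen FDR can be sketched as below. This is not the MaxQuant implementation: the function name `score_cutoff_at_fdr` is hypothetical, higher scores are assumed to be better, the FDR is estimated simply as the ratio of decoy to target PSMs above the cutoff, and tied scores are not treated specially.

```python
def score_cutoff_at_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is <= max_fdr.

    psms: iterable of (score, is_decoy) pairs, one per PSM.
    Returns None if no cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least `score`.
        if targets and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Toy PSM list (made-up scores): one decoy hit ranked fourth.
toy_psms = [(10, False), (9, False), (8, False), (7, True), (6, False), (5, False)]
print(score_cutoff_at_fdr(toy_psms, max_fdr=0.25))
print(score_cutoff_at_fdr(toy_psms, max_fdr=0.0))
```

With the toy data, allowing 25% FDR accepts every PSM down to the lowest score, whereas requiring zero decoys stops just above the first decoy hit. The PEP, in contrast, is a property of each individual PSM and is not obtained from this ranking alone.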

Krug et al. (2013) compared the number and median PEP scores of novel peptide identifications with reverse sequence hit peptides in a proteogenomic analysis of Escherichia coli, and determined that the absolute number as well as the median PEP-values for both groups were very similar. The authors concluded that as a genome approaches complete annotation, the likelihood of any novel peptide identification being a false positive identification increases (Krug et al., 2013).
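The kind of comparison described here, counts and median PEP-values per peptide group, can be sketched in a few lines. The group names and PEP values below are made up purely for illustration, and `summarize_pep` is a hypothetical helper, not part of any published pipeline.

```python
from statistics import median


def summarize_pep(groups):
    """Map each non-empty group name to (number of peptides, median PEP)."""
    return {name: (len(peps), median(peps)) for name, peps in groups.items() if peps}


# Made-up PEP values; real values would come from the search engine output.
toy_groups = {
    "annotated": [0.001, 0.002, 0.0005, 0.004, 0.003],
    "novel": [0.02, 0.01, 0.03],
    "reverse": [0.03, 0.02, 0.04],
}
for name, (count, med) in summarize_pep(toy_groups).items():
    print(f"{name:10s} n={count} median PEP={med}")
```

If the novel group resembles the reverse hits in both size and median PEP, the novel identifications are difficult to distinguish from false positives; a clearly lower median PEP for the novel group points the other way.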

Annotation of coding regions for organisms is a continuously evolving process. The genome of M. tuberculosis was sequenced in 1998 (Cole et al., 1998), yet a few years later, the same authors re-annotated the genome and identified 71 more ORFs than before (Camus et al., 2002). To date, the M. tuberculosis genome is annotated with 4018 ORFs<sup>1</sup>. This highlights the importance of continuous review and re-annotation of genomes as technologies improve. The M. smegmatis mc<sup>2</sup>155 genome may not yet be fully annotated, and the genome sequence has been shown to contain multiple errors (Deshayes et al., 2007), one of the reasons being the high GC content and genome annotation shortfalls, such as short protein validation and incorrect TSS assignment of genes. This poses a problem by limiting our understanding of the many cellular processes coded by these genes. It is imperative that the genome of M. smegmatis mc<sup>2</sup>155 be fully annotated since this bacterium, due to its non-pathogenic and fast growing status, is used frequently as a model organism to study the biology of M. tuberculosis, the causative agent of tuberculosis. Tuberculosis continues to be a burden on the health system, with an estimated 9.6 million cases of infection and 1.5 million deaths in 2014, despite a globally decreasing incidence of ∼1.5% every year since 2000 (WHO, 2015). Limited understanding of the biology of M. tuberculosis is an obstacle to improving current treatment and eradication of the disease. Thus, it is hoped that refinements in proteogenomics pipelines and the study of model organisms of M. tuberculosis, such as M. smegmatis mc<sup>2</sup>155, may further our understanding of this pathogenic organism.

In this study, we developed and applied a compacted six frame genomic database to map data from high resolution, high accuracy tandem mass spectrometry coupled to liquid chromatography, in order to re-evaluate the genome annotation of M. smegmatis mc<sup>2</sup>155. We also used a database of ab initio gene predictions generated with GeneMarkS<sup>2</sup> in the proteogenomics pipeline to identify novel open reading frames, gene model validations, and gene model modifications. By analyzing the PEP score distribution of novel, annotated, and reverse sequence hit peptides, we explored the current annotation status of M. smegmatis mc<sup>2</sup>155.

# MATERIALS AND METHODS

# Bacterial Cultures

The wild type strain of M. smegmatis mc<sup>2</sup>155 was grown in 7H9 Middlebrook (BD, Maryland, USA) broth supplemented with 0.05% Tween 80, OADC (Becton Dickinson), and 0.2% glycerol (v/v). Cells were grown at 37 °C with continuous agitation (120 rpm).

# Protein Extraction

Cells were harvested from three biological replicates each during the exponential and early stationary phase (OD<sub>600</sub> ∼ 1.2 and 1.8, respectively) by centrifugation at 4000 g for 15 min at 4 °C. Cell pellets were washed with PBS (10 mM phosphate buffer, 2.7 mM potassium chloride, and 137 mM sodium chloride, pH 7.4), flash frozen in liquid nitrogen, and stored at −80 °C. Pellets were suspended in lysis buffer [500 mM Tris-HCl, 0.1% (w/v) SDS, 0.15% sodium deoxycholate], 1× protease inhibitor cocktail, 1× phosphatase inhibitor cocktail (Roche, Mannheim, Germany), and 50 µg/ml lysozyme (Repaske, 1956) and disrupted by sonication at maximum power for six cycles of 30 s, with 1 min cooling on ice between cycles (Rezwan et al., 2007). Lysates were further clarified using centrifugation at 4000 g for 5 min and filtering through a 20 µm pore size low-protein binding filter (Merck, NJ, USA). Proteins were precipitated using the chloroform–methanol precipitation method as previously described (Wessel and Flügge, 1984). Protein precipitate was suspended in denaturing buffer (10 mM Tris-HCl, 6 M urea, 2 M thiourea, pH 8). Protein concentration was determined using the modified Bradford assay as described by Ramagli (1999).

<sup>1</sup>http://tuberculist.epfl.ch/

<sup>2</sup>"GeneMark™ - Free Gene Prediction Software." http://exon.gatech.edu/GeneMark/ (Accessed September 21, 2015).

# In-Solution Trypsin Digestion

Fifty micrograms of precipitated protein was reduced with 1 mM DTT for 1 h followed by another hour of incubation in the presence of 5.5 mM IAA. Alkylated protein samples were predigested for 3 h at room temperature with lysyl endopeptidase LysC (Wako, Neuss, Germany). Pre-digested samples were diluted four-fold with 20 mM ammonium bicarbonate pH 8 prior to trypsination. Sequencing grade modified trypsin (Promega, Madison, USA) was used at a protease:protein ratio of 1:50 (w/w) for 14 h at room temperature with gentle agitation. Trypsination was terminated with 0.1% formic acid final concentration (Sigma Aldrich, St Louis, USA). Ten micrograms of peptides were desalted using a homemade stage tip containing Empore Octadecyl C18 solid-phase extraction disk (Supelco; Rappsilber et al., 2003). Activation, equilibration, and peptide wash and elution were all carried out using centrifugation at 5000 g for 5 min. Activation and equilibration of the C18 disk were carried out using three rinses with 80% acetonitrile (ACN), followed by three rinses with 2% ACN, respectively. Peptide rich solution was loaded onto the disk and centrifuged. Desalting was carried out using three washes of 2% ACN, followed by three washes of 2% ACN containing 0.1% formic acid (Sigma). Elution of desalted peptides into glass capillary tubes was carried out using three rounds of 100 µL of 60% ACN, 0.1% formic acid. Peptides were dried in a vacuum and resuspended in 2% ACN, 0.1% formic acid at 50 ng/µL.

# LC/MS/MS Analysis

Data acquisition was performed on the Orbitrap Q-Exactive mass spectrometer (Thermo Scientific) in a data-dependent manner, coupled to the Dionex Ultimate 3000 UHPLC (Thermo Scientific). One microgram of peptides was loaded onto an in-house packed pre-column (100 µm ID × 20 mm) connected to an in-house packed analytical column (75 µm × 400 mm), both packed with C18 Luna 5 µm 100 Å beads (04A-5452), for liquid chromatography separation. The flow rate was set to 300 nl/min with a gradient of 2% to 25% ACN over 125 min, then up to 35% in 5 min. To wash the column, ACN was increased to 80% for 20 min, followed by column equilibration at 2% ACN for 10 min. A top 10 method with 30 s dynamic exclusion was used to acquire mass spectra with automatic switching between MS and MS/MS scans. The LC/MS/MS methods used have been described previously (Nakedi et al., 2015).

# Proteogenomic Databases

The genome of M. smegmatis mc<sup>2</sup> 155 was accessed from the European Nucleotide Archive under the accession number CP000480, Genome Assembly GCA\_000015005.1 (Fleischmann et al., 2006). To facilitate genomic database compaction each open reading frame in the six frames was translated in silico starting at the most upstream start codon—using possible start codons ATG, GTG, and TTG—and sequences below a minimum translated length of 20 amino acids were excluded from the database. For sequences at the end of each of the genomic frames that did not end in a stop codon, the sequence up till the last in-frame codon was translated and included in the database. The genomic coordinates of the six frame translated sequences were included in the FASTA headers of the database. Translated sequences that occurred more than once in the database were combined into a single entry with multiple genomic coordinates. The Reference proteome for M. smegmatis mc<sup>2</sup> 155 was obtained from UniProt<sup>3</sup> (Proteome ID UP000000757), and any translated six frame sequence overlapping with or identical to a Reference protein sequence was mapped to that protein in the FASTA header. Where a Reference sequence was located downstream of a translated six frame TSS, the number of amino acids difference in the N-termini of the two sequences was recorded, and six frame sequences identical to a protein in the Reference proteome were labeled. Overlapping open reading frames were included in the database, leading to a final database size of 79,481 entries.
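The database compaction described above can be illustrated with a minimal Python sketch. It is not the in-house script used in this study: the function names, the choice to report 1-based inclusive forward-strand coordinates that exclude the stop codon, and the decision to write the initiating residue as Met for all three start codons are assumptions made for illustration only.

```python
from collections import defaultdict

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
START_CODONS = {"ATG", "GTG", "TTG"}
MIN_LENGTH = 20  # minimum translated length in amino acids


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def frame_orfs(seq, frame):
    """Yield (start, end, protein) for each ORF in one frame of one strand.

    start and end are 0-based, end-exclusive offsets into seq; the stop
    codon is not included in the reported span.
    """
    orf_start, residues = None, []
    for pos in range(frame, len(seq) - 2, 3):
        codon = seq[pos:pos + 3]
        aa = CODON_TABLE.get(codon, "X")  # X marks codons with ambiguous bases
        if aa == "*":
            if orf_start is not None and len(residues) >= MIN_LENGTH:
                yield orf_start, pos, "".join(residues)
            orf_start, residues = None, []
        elif orf_start is not None:
            residues.append(aa)
        elif codon in START_CODONS:
            # Most upstream start codon after the previous stop;
            # assumption: the initiating residue is written as Met.
            orf_start, residues = pos, ["M"]
    if orf_start is not None and len(residues) >= MIN_LENGTH:
        # Frame ended without a stop codon: keep up to the last full codon.
        end = frame + 3 * ((len(seq) - frame) // 3)
        yield orf_start, end, "".join(residues)


def six_frame_database(genome):
    """Map each distinct translated sequence to all of its genomic coordinates."""
    genome = genome.upper()
    length = len(genome)
    database = defaultdict(list)
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for start, end, protein in frame_orfs(seq, frame):
                if strand == "+":
                    coords = (start + 1, end, strand)
                else:
                    coords = (length - end + 1, length - start, strand)
                # Identical sequences collapse into one entry with several coordinates.
                database[protein].append(coords)
    return dict(database)
```

Keying the dictionary on the translated sequence is what merges repeated sequences into a single entry with multiple coordinates. Mapping entries to Reference proteins and writing FASTA headers are left out of the sketch.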

A second genomic database for targeted identification of TSSs was generated using ab initio prediction of protein-coding genes with the GeneMarkS software package (version 4.28)—at the same time providing supporting evidence for novel protein identifications and gene model modifications. This software uses a Hidden Markov Model algorithm, combined with models of protein coding and non-coding regions and gene regulation sites, to predict the occurrence of genes in a DNA sequence (Besemer et al., 2001). The predicted genes were translated and the final database contained 6655 sequences.

# Database Search Using MaxQuant

The MaxQuant software package (version 1.5.0.30) was used to search the raw MS spectra using the Andromeda search engine separately against the six frame database, UniProt Reference proteome and GeneMarkS database, with reverse decoy databases and a selection of known contaminants provided by MaxQuant. Trypsin and LysC were selected as enzymes, and a maximum of three missed cleavages was allowed. Specific enzyme mode was selected, which allowed for the detection of non-tryptic N-terminal peptides of database entries, peptides with N-Met cleavage, and fully tryptic peptides. Carbamidomethyl was set as a fixed modification and Acetyl (Protein N-term) and Oxidation (M) were set as variable modifications. A minimum peptide length of seven amino acids, and a minimum of one unique peptide identification per protein group, was required. The default MaxQuant false discovery rate (FDR) cutoff of 0.01 (1%) was used at the PSM, peptide and protein group levels. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (Vizcaíno et al., 2014) via the PRIDE partner repository (Vizcaíno et al., 2013) with the dataset identifier PXD003500 and the data are freely available<sup>4</sup>.

<sup>3</sup>UniProt. http://www.uniprot.org/ (Accessed September 21, 2015).

# Proteogenomic Analysis

We relied heavily on previously published proteogenomic protocols, in particular the methodology described by Kelkar et al. (2011)—differing in that we used a separate search database (GeneMarkS) for targeted TSS identification, and did not exclude all novel peptides mapping to multiple genomic locations, allowing for the identification of paralogous translated ORF sequences.

All MaxQuant results for the different databases were combined using an in-house Python script that made use of the Biopython software package<sup>5</sup> (Cock et al., 2009). Pettersen et al. (2015) only considered proteins identified in at least two replicates in their proteogenomic analysis of enterotoxigenic Escherichia coli; similarly, we only considered peptides identified in at least two replicates for further analysis. All peptides were searched against the genome dynamically translated in the six reading frames, and peptides unique to a single position in the genome were identified. Mycobacteria are known to have duplications in protein-coding genes due to the effect of transposable elements (Dale, 1995). To allow for the detection of paralogous sequences, peptides specific to a single repeating translated six frame ORF sequence in the genome were labeled as paralogous sequence peptides and retained in the analysis as belonging to paralogous translated ORF sequences.
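The replicate filter and the distinction between unique and paralogous peptides can be sketched as follows. This is an illustrative outline under stated assumptions, not the in-house script: the function names are hypothetical, the input is assumed to be (peptide, replicate) pairs, and `translated_orfs` is assumed to map each distinct translated ORF sequence to its list of genomic coordinates.

```python
from collections import defaultdict


def peptides_in_min_replicates(identifications, min_replicates=2):
    """Keep peptides seen in at least min_replicates distinct replicates.

    identifications: iterable of (peptide, replicate_label) pairs.
    """
    replicates_seen = defaultdict(set)
    for peptide, replicate in identifications:
        replicates_seen[peptide].add(replicate)
    return {p for p, reps in replicates_seen.items() if len(reps) >= min_replicates}


def classify_by_genome_hits(peptide, translated_orfs):
    """Classify a peptide by where it maps among translated ORF sequences.

    translated_orfs: dict of protein sequence -> list of genomic coordinates.
    Returns 'unmapped', 'ambiguous' (several distinct ORF sequences),
    'unique' (one sequence at one locus) or 'paralogous' (one sequence
    that repeats at several loci).
    """
    hits = [seq for seq in translated_orfs if peptide in seq]
    if not hits:
        return "unmapped"
    if len(hits) > 1:
        return "ambiguous"
    return "unique" if len(translated_orfs[hits[0]]) == 1 else "paralogous"
```

Exact substring matching is used for simplicity; it ignores the isoleucine/leucine ambiguity of mass spectrometry data and repeated occurrences of a peptide within one ORF.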

Peptides not found in the Reference proteome were identified as genome search specific peptides (GSSPs). GSSPs are peptides mapping to genomic regions not considered to be coding regions, or not included in overlapping or adjacent gene models (Kelkar et al., 2011). We used the UniProt Reference proteome as a benchmark for annotation status to discriminate annotated peptides from GSSPs. Only GSSPs were used for upstream gene model modifications and novel gene model identifications, while N-terminal peptides mapping downstream in the gene model of Reference proteins were used for downstream TSS assignment. N-terminal peptides mapping to the start of Reference protein gene models were used for TSS validation. All protein groups with no peptides unique to the protein sequence after performing the genome search, were excluded from further analysis. All results in the different databases mapping to the same ORF were identified for comparison, and the unique identified peptide set across all three databases for each ORF was obtained. All database entries were mapped to their respective ORFs in the genome using an in-house python script (see **Figure 1**).

TSS peptides were identified by the non-tryptic nature of their N-terminals, N-Met cleavage, or N-terminal acetylation. Reference protein start sites were validated by identifying TSS peptides mapping to the genomic coordinates of the start of a Reference protein sequence. All open reading frames mapping to Reference proteins, with peptides identified upstream in the ORF from the annotated TSS of the Reference protein, were identified for upstream gene model modification, while N-terminal peptides mapping downstream of a Reference protein start were used for downstream gene model modification. The coordinates of novel ORFs were compared to the coordinates of known cDNA features (sequenced transcripts obtained from Ensembl<sup>6</sup>) using a sense strand genome coordinate search (in-house Python script), and ORFs overlapping with cDNA evidence in the genome were identified. A nucleotide BLAST of novel ORFs was performed against the transcript sequences, requiring same strand alignment with an E-value cutoff of 0.0001. A protein BLAST of novel translated ORFs, as well as ORFs for which gene model modifications were identified, was performed against the NCBI non-redundant (nr) BLAST database, and the highest scoring alignment by E-value was obtained for each sequence, using an E-value cutoff of 0.0001. The leading gi IDs of protein BLAST results were converted to UniProtKB IDs using the UniProt Retrieve/ID Mapping tool. Novel ORFs were ranked by number of identified GSSPs, protein BLAST evidence, cDNA nucleotide BLAST evidence, and cDNA coordinate overlap.
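The decision logic for recognizing TSS peptides can be outlined in a few lines of Python. The sketch below is a simplification under stated assumptions: the function name, its arguments, and the returned labels are hypothetical, and the rule order is one plausible reading of the criteria above rather than the exact in-house implementation.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
TRYPTIC_RESIDUES = {"K", "R"}  # trypsin and LysC cleave C-terminal to K/R


def classify_n_terminus(peptide, first_codon, upstream_codon,
                        upstream_residue, acetylated=False):
    """Classify the N-terminus of an identified peptide.

    peptide:          identified peptide sequence
    first_codon:      genomic codon aligned to the peptide's first residue
    upstream_codon:   codon immediately upstream (None at the ORF edge)
    upstream_residue: amino acid encoded by upstream_codon (None at the edge)
    acetylated:       True if N-terminal acetylation was identified
    """
    if upstream_residue in TRYPTIC_RESIDUES and not acetylated:
        # Explained by enzymatic cleavage, so not evidence for a start site.
        return "tryptic"
    if first_codon in START_CODONS and peptide.startswith("M"):
        return "TSS (initiator Met retained)"
    if upstream_codon in START_CODONS:
        return "TSS (N-Met cleaved)"
    return "non-tryptic, no adjacent start codon"
```

Cases in which both the peptide's first codon and the upstream codon are possible start codons remain ambiguous under this scheme and would still need manual inspection.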

Interrupted CoDing Sequences (ICDSs) are erroneously shortened gene model predictions either due to the presence of unrecognized true genomic events (such as programmed frameshifts or in-frame stop codons), or artificially due to sequencing errors (Perrodou et al., 2006). We followed the methodology described by Perrodou et al. (2006) and Deshayes et al. (2007) to identify ICDSs based on shared homologous sequence evidence (protein BLAST) of adjacent or overlapping non-paralogous ORFs, but differ in that we only focused on ORFs identified at the peptide level.
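As a rough illustration of the ICDS screen, the sketch below pairs neighboring ORFs on the same strand that share the same best protein BLAST hit. The record fields, the function name, and the `max_gap` threshold are hypothetical choices for this example; the methodology of Perrodou et al. (2006) and Deshayes et al. (2007) involves further checks that are not reproduced here.

```python
def icds_candidates(orfs, max_gap=300):
    """Return pairs of neighboring ORFs that share a best BLAST hit.

    orfs: list of dicts with keys 'id', 'start', 'end', 'strand',
          'best_hit' (accession or None) and 'paralogous' (bool).
    max_gap: largest allowed distance in nucleotides between the two
             ORFs (an assumed value); a negative distance means overlap.
    """
    usable = sorted(
        (o for o in orfs if o["best_hit"] and not o["paralogous"]),
        key=lambda o: (o["strand"], o["start"]),
    )
    candidates = []
    for left, right in zip(usable, usable[1:]):
        same_strand = left["strand"] == right["strand"]
        close = right["start"] - left["end"] <= max_gap
        if same_strand and close and left["best_hit"] == right["best_hit"]:
            candidates.append((left["id"], right["id"], left["best_hit"]))
    return candidates
```

Each returned pair is only a candidate: two adjacent ORFs aligning to the same homolog may reflect a sequencing error, a genuine frameshift, or two separate genes.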

All peptides identified in the three databases in at least two replicates, as well as the GeneMarkS FASTA database, were processed into general feature format (GFF) files for visualization using the Ensembl genome browser for this strain<sup>7</sup>, allowing for manual examination of novel annotations, gene model modifications, and ICDSs (see **Supplementary Data Sheets 1**–**4**).
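Writing peptide features for a genome browser amounts to emitting nine tab-separated columns per feature. The following sketch assumes GFF3 output; the function names, the `source` label, and the `polypeptide_region` feature type are placeholders chosen for illustration and are not taken from the supplementary files.

```python
def peptide_gff3_line(seqid, start, end, strand, peptide,
                      source="proteogenomics", feature_type="polypeptide_region"):
    """Format one peptide as a GFF3 feature line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    attributes = f"ID={peptide}_{start}_{end};Name={peptide}"
    columns = [seqid, source, feature_type, str(start), str(end),
               ".", strand, ".", attributes]
    return "\t".join(columns)


def build_gff3(records):
    """Assemble a GFF3 document from (seqid, start, end, strand, peptide) tuples."""
    lines = ["##gff-version 3"]
    lines.extend(peptide_gff3_line(*record) for record in records)
    return "\n".join(lines) + "\n"
```

Score and phase are left as `.` because the sketch carries no scoring information; a PEP score could be placed in the sixth column if desired.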

The PEP score distributions of novel, annotated and reverse peptide identifications from each database were analyzed with in-house Python and R scripts. Use was made of the Python module matplotlib (Hunter, 2007) to produce boxplots of the PEP score distributions, while the R ggplot2 package (Wickham, 2009), as well as the density and qqnorm functions from the R stats package (R Core Team, 2015), were used to plot the PEP score distributions and investigate for normality. As non-normal PEP score distributions were found, the non-parametric Kruskal–Wallis analysis of variance test was chosen to investigate for differences between the groups, using the kruskal.test function from the R stats package, followed by post-hoc analysis with a two-sided Dunn test (Dunn, 1964) with Bonferroni correction using the dunnTest function from the R FSA package (Ogle, 2016).
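For readers unfamiliar with the test, the Kruskal–Wallis H statistic can be computed from pooled ranks in a few lines. The sketch below is a plain Python illustration and not the R code used in this study; the function names are hypothetical, the chi-square tail is implemented only for one or two degrees of freedom (two or three groups), and the Dunn post-hoc test itself is not reproduced, only the Bonferroni adjustment.

```python
import math
from collections import Counter
from itertools import chain


def average_ranks(values):
    """Assign 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis(*groups):
    """Return (H, degrees of freedom) with the standard tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, offset = 0.0, 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        total += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


def chi2_sf(h, df):
    """Upper tail of the chi-square distribution for df = 1 or 2 only."""
    h = max(h, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(h / 2))
    if df == 2:
        return math.exp(-h / 2)
    raise ValueError("this sketch supports only df = 1 or 2")


def bonferroni(p_value, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_comparisons)
```

With three groups (annotated, novel, and reverse peptides) the statistic has two degrees of freedom, which is why the closed-form tail probability suffices here.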

<sup>4</sup>http://www.ebi.ac.uk/pride/archive/

<sup>5</sup>Biopython-Biopython. Available online at: http://biopython.org/wiki/Main\_Page (Accessed September 21, 2015).

<sup>6</sup>ftp://ftp.ensemblgenomes.org/pub/bacteria/current/fasta/bacteria\_7\_collection/mycobacterium\_smegmatis\_str\_mc2\_155/cdna/

<sup>7</sup>http://bacteria.ensembl.org/Mycobacterium\_smegmatis\_str\_mc2\_155/Info/Index

Venn diagrams were produced using the online Venn diagram plotting tool Venny 2.1.0 (Oliveros, 2016).

# RESULTS

From the 276,472 MS/MS spectra submitted to MaxQuant at 1% FDR, 26,125 peptide sequences were identified from 172,570 spectra using the translated six frame database, 27,895 peptides from 176,518 spectra using the Reference proteome database, and 27,735 peptides from 176,301 spectra using the GeneMarkS database. Only protein groups were considered where the leading protein had at least one unique peptide identification. By mapping identified proteins from the different databases to their corresponding ORFs in the genome, 2887 ORFs were identified at the peptide level (identical translated ORF sequences with multiple occurrences in the genome having been combined into a single entry with multiple genomic coordinates at the database generation phase; see **Supplementary Table 1** and **Figure 2**).

# TSS Peptide Identifications

TSS peptide identifications can be divided into peptides with and without N-Met cleavage. We identified 137 TSS peptides with non-tryptic Met N-termini at a genomic start codon position (of which three were acetylated at the N-terminal), and 549 N-Met cleaved peptides (of which 127 were acetylated at the N-terminal). The distribution of penultimate amino acids of identified N-Met cleaved peptides, Thr (188), Ser (134), Ala (128), Pro (51), Gly (18), Val (15), Asn (13), Leu (1), and Arg (1), corresponds to the non-random nature of Met-AP cleavage (Link et al., 1997; Frottin et al., 2006). Further, the identified start codon distribution of all non-paralogous ORFs with an identified N-terminal (ATG 63.96%, TTG 1.14%, and GTG 34.90%) corresponds to the high percentage of GTG start codons reported previously in M. tuberculosis (Cole et al., 1998) and M. smegmatis (Gallien et al., 2009), but with a lower proportion of identified TTG start codons. We identified three ORFs with evidence for multiple initiation, corresponding to the observations of Kelkar et al. (2011; see **Supplementary Table 2**).
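The penultimate residue counts reported above can be tallied as a quick consistency check. The short Python sketch below uses only the numbers given in the text; the variable and function names, and the grouping of residues with small side chains, are illustrative assumptions.

```python
# Penultimate residue counts of N-Met cleaved peptides, as reported in the text.
penultimate_counts = {
    "Thr": 188, "Ser": 134, "Ala": 128, "Pro": 51, "Gly": 18,
    "Val": 15, "Asn": 13, "Leu": 1, "Arg": 1,
}


def percentages(counts):
    """Convert raw counts to percentages of the total."""
    total = sum(counts.values())
    return {residue: 100.0 * n / total for residue, n in counts.items()}


total_cleaved = sum(penultimate_counts.values())
shares = percentages(penultimate_counts)

# Residues with small side chains, which Met-AP is generally reported to prefer
# (grouping chosen here for illustration).
small_residues = ("Thr", "Ser", "Ala", "Pro", "Gly", "Val")
small_share = sum(shares[r] for r in small_residues)
```

The counts sum to the 549 N-Met cleaved peptides stated in the text, and the six small residues account for the large majority of them.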

Two ORFs were detected where the most upstream evidence was an N-Met cleaved peptide and the penultimate position corresponded to another possible start codon. For one of these proteins, the second amino acid in the peptide was also a Met located at an ATG codon, thus not allowing for discrimination between N-Met cleavage and downstream initiation. For the second protein, the N-terminal was identified by an N-Met cleaved peptide with V as the penultimate amino acid, mapping to a GTG codon at the annotated start site of a Reference proteome sequence. Due to the known initiation of translation with fMet, this was concluded to be an instance of N-Met cleavage, leading to modification of the gene model to include the adjacent upstream start codon.

identified in more than one replicate were considered. Two identified Reference proteins spanning multiple ORFs (A4ZHT6 and A4ZHR8), not mapping on a one to one basis to a genomic ORF, are included in the diagram as separate ORFs.

# Gene Model Validations

We identified 2810 genomic ORFs annotated to Reference proteins with at least one unique peptide—with a median of six unique peptides identified per ORF. Using N-terminal peptide evidence, we validated the TSS for 558 Reference protein gene models. The start codon distribution of validated Reference protein TSSs also reflected the higher proportion of GTG start codons in Mycobacteria—with 65.05% ATG, 0.72% TTG, and 34.23% GTG. Prominent Reference proteins identified include A0R1H7\_MYCS2—Fatty acid synthase with 168 unique peptides and 71.32% sequence coverage, RPOC\_MYCS2—DNA-directed RNA polymerase subunit beta—with 96 unique peptides and 79.5% sequence coverage, Q3L891\_MYCS2—Linear gramicidin synthetase subunit D, predicted protein—with 89 unique peptides and 54.98% sequence coverage, Q3L885\_MYCS2—Type I modular polyketide synthase, predicted protein—with 88 unique peptides and 38.44% sequence coverage, and A0R617\_MYCS2—Polyketide synthase, predicted protein—with 82 unique peptides and 62.14% sequence coverage. The predicted and unreviewed Reference proteome entry A0R0A1\_MYCS2 (Glyoxalase/bleomycin resistance protein/dioxygenase) was identified with 99.28% sequence coverage from 13 identified peptides and 152 MS/MS spectra (see **Figure 3** and **Supplementary Table 3**).

Gene model validation was also obtained for the Reference protein MSHD\_MYCS2 (Mycothiol acetyltransferase, inferred from homology). This is an interrupted coding sequence (ICDS) identified by Deshayes et al. (2007)—which they confirmed by resequencing to span a sequencing error in the M. smegmatis mc<sup>2</sup> 155 genome (GenBank accession DQ866865). We present first-time peptide evidence for this protein with eight peptides identified from 50 MS/MS scans using the Reference proteome database, including a peptide spanning the frameshift position (245) in the corrected sequence (see **Figure 4** and **Supplementary Table 4**).

# Upstream Gene Model Modifications

Upstream peptide evidence was identified for 81 Reference proteome gene models. Upstream peptide identifications can be grouped into N-terminal (TSS) or fully tryptic upstream peptides. For 39 of the upstream gene model modifications the TSS was detected exactly, while for the 42 sequences with only fully tryptic (non-TSS) upstream peptides, the next upstream start codon in the sequence was located. The putative new gene models thus obtained were searched against the NCBI nr database using protein BLAST, to identify orthologous gene models, or alternative gene models in the same strain. For 58 of the modified gene models, nr BLAST alignment yielded a sequence of the same length, supporting the gene model modification, while 14 nr BLAST alignments in the group without exact TSS identification indicated a TSS further upstream. Only one nr BLAST result in the group with an identified TSS indicated a further upstream TSS than the one identified. The start codon GTG was overrepresented in upstream TSS identifications, making up 38.46%, which supports the observation that ATG may be over-predicted as translational start codon (Gallien et al., 2009). The median length of upstream N-terminal extension was five amino acids in the group with upstream TSS identifications. The Reference protein A0R4J1\_MYCS2 (Phosphoribosylamine–glycine ligase, inferred from homology) was identified with the N-terminal of three identified peptides extending upstream of the predicted TSS, extending the N-terminal of the protein by 29 amino acids. The modified sequence is identical to the predicted protein I7GFT2\_MYCS2 of the same strain (not included in the Reference proteome; see **Figure 5** and **Supplementary Table 5**).
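Where only fully tryptic upstream peptides are available, locating the next upstream start codon is a simple in-frame scan. The sketch below illustrates the idea; the function name and the codon-index convention are assumptions, and the input is assumed to be the in-frame nucleotide sequence of the six frame ORF.

```python
START_CODONS = ("ATG", "GTG", "TTG")


def next_upstream_start(orf_dna, peptide_codon_index):
    """Return the codon index of the nearest start codon upstream of a peptide.

    orf_dna: in-frame nucleotide sequence of the ORF (codon 0 starts at base 0).
    peptide_codon_index: index of the codon encoding the peptide's first residue.
    Returns None if no start codon lies upstream within the ORF.
    """
    for index in range(peptide_codon_index - 1, -1, -1):
        if orf_dna[3 * index:3 * index + 3] in START_CODONS:
            return index
    return None
```

The N-terminal extension in amino acids is then the difference between the codon index of the annotated start and the index returned by the scan.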

# Downstream Gene Model Modifications

Downstream TSS evidence was identified for 24 Reference proteome gene models, with all cases corresponding to an alternative TSS prediction in the GeneMarkS database. The predicted gene model A0QWY3\_MYCS2 (Quinone oxidoreductase) was shortened by 16 amino acids, with the identification of a TSS peptide MHAIEVAETGGPEVLNYIERPEPSPGPGEVLIK with a non-tryptic N-terminal downstream of the annotated TSS. The downstream TSS peptide corresponded to the N-terminal of the GeneMarkS predicted sequence for this ORF, thus allowing this semi-tryptic TSS peptide to be included in the GeneMarkS database search space. The modified sequence is identical to I7G8G0\_MYCS2, predicted for the same strain but not included in the Reference proteome. The downstream TSS peptide was identified from 34 MS/MS scans (see **Figure 6** and **Supplementary Table 6**).

homology. (A) This sequence corresponds to an interrupted coding sequence (ICDS) identified by Deshayes et al. (2007), which they confirmed by resequencing to span a sequencing error in the M. smegmatis mc<sup>2</sup> 155 genome (GenBank accession DQ866865). We present peptide evidence for this protein with eight peptides identified from 50 MS/MS scans after searching the Reference proteome database, as well as identifying a peptide spanning the frameshift position (245) in the corrected sequence, highlighted in red. (B) A representative spectrum of the highlighted peptide, which was identified with four MS/MS scans, supporting the sequence correction at the peptide level.

# Novel ORF Identifications

Due to the high prevalence of gene prediction errors, inclusion in the UniProt Reference proteome was used as a benchmark for annotation status in this study. Thus, all identified ORFs not annotated to a Reference proteome entry were considered novel identifications. Peptide evidence mapping to ORFs not annotated to a Reference proteome entry was identified for 72 ORFs, of which 44 were identified with two or more peptides. ORFs with one or more identified peptides that occurred adjacent to another novel ORF identification, were examined as possible evidence for Interrupted CoDing Sequences (ICDSs). Nineteen novel ORFs that were identified with a single peptide, and were supported by either protein BLAST alignment or a previously identified transcript overlapping on the genome on the same strand, are presented as lower ranking evidence for genome annotation. Nine ORFs with only one identified peptide and no supporting evidence, were excluded from the further analysis due to the high likelihood of these identifications being erroneous—leading to a total of 63 novel ORF identifications (see **Supplementary Table 7**).
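The triage of novel ORFs described above reduces to a simple rule on peptide count and supporting evidence. The following sketch restates that rule in Python; the function name and category labels are hypothetical and serve only to make the bookkeeping explicit.

```python
def triage_novel_orf(n_peptides, has_blast_hit, has_transcript_overlap):
    """Assign a novel ORF to an evidence tier.

    n_peptides: number of peptides identified for the ORF.
    has_blast_hit: True if a protein BLAST alignment was obtained.
    has_transcript_overlap: True if a known transcript overlaps on the same strand.
    """
    if n_peptides >= 2:
        return "multi-peptide"
    if n_peptides == 1 and (has_blast_hit or has_transcript_overlap):
        return "single-peptide, supported"
    return "excluded"


# Tier sizes as reported in the text.
reported = {"multi-peptide": 44, "single-peptide, supported": 19, "excluded": 9}
retained = reported["multi-peptide"] + reported["single-peptide, supported"]
```

The three tiers account for all 72 ORFs with peptide evidence, and the two retained tiers give the 63 novel ORF identifications carried forward.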

### Validation of Previously Identified Interrupted Coding Sequences (ICDSs)

Twelve non-Reference proteome ORFs identified at the peptide level corresponded to six interrupted coding sequences previously reported by Deshayes et al. (2007), with GenBank accessions DQ866867, DQ866856, DQ866859, DQ866863, DQ866858, and DQ866873—see **Supplementary Figures 1A–E,G,** respectively, for Ensembl genome browser visualizations, and **Supplementary Table 8** rows 2–11 and 14–15 for detailed information. The above authors had shown by resequencing that these frameshifts corresponded to genome sequencing errors, and they also reported peptide-level evidence for two of these sequences using nano-LC/MS/MS analysis (DQ866873 and DQ866856). Thus, we were able to identify four of these ICDSs with first-time peptide evidence, and validate two ICDSs previously identified at the peptide level.

#### Novel ICDSs

Four likely novel ICDS sequences were identified with peptide evidence spanning either side of a possible genomic frameshift region from eight non-Reference proteome ORFs. In an interesting case, three novel peptides were identified from an ORF with the closest nr protein BLAST alignment to a predicted protein in Mycobacterium goodii, A0A0K0X632\_9MYCO (Peptidase M75). A non-Reference proteome predicted ORF, I7FS93\_MYCS2, was identified with peptide evidence partially overlapping and upstream of this ORF. This upstream ORF also aligned with high confidence to the same sequence in M. goodii. To our knowledge an ICDS has not previously been identified at this position, but peptide evidence from two overlapping reading frames and alignment to an orthologous sequence of a related species support the identification of a novel ICDS at this site, although the existence of separate adjacent protein coding genes cannot entirely be excluded (see **Supplementary Figure 1F** and **Supplementary Table 8** rows 12–13).

Another novel ORF with only one identified peptide (TAILDAAAQLIAER) was identified upstream of and partially overlapping the predicted ORF I7G4B8\_MYCS2, which was identified with three peptides (see **Supplementary Figure 1J** and **Supplementary Table 8** rows 20–21). Both ORFs aligned with high confidence to the predicted protein L8F9F0\_MYCSM of M. smegmatis MKD8, the genome sequence of which has recently been announced (Gray et al., 2013). Thus, orthologous sequence evidence combined with evidence at the peptide level strongly supports the existence of an ICDS in this position, and further investigation is needed to ascertain whether this is an occurrence of authentic mutation or sequencing error, or in fact two separate protein-coding genes. The identification of two more ICDS sequences with peptide evidence on either side of a possible frameshift position was also facilitated by alignment to orthologous sequences submitted by Gray et al. (2013) as a result of their genome sequencing efforts of M. smegmatis MKD8, emphasizing the iterative nature of genome annotation as new data become available (see **Supplementary Figures 1H,I**, and **Supplementary Table 8** rows 16–19).

### Novel ORFs Identified with Two or More Peptides

We identified 44 non-Reference proteome ORFs with two or more peptides (with a median of five identified unique peptides per ORF in this group). Protein BLAST alignments were obtained for 43 of these, and the gi accession numbers thus obtained were mapped to their corresponding entries in UniProt using the UniProt "Retrieve/ID mapping" tool<sup>8</sup>. Of the 39 sequences that were mapped with this tool, 31 were predicted and eight were inferred from homology. Three of these sequences were annotated to M. smegmatis MKD8, one to M. thermoresistibile strain ATCC 19527, and four to M. smegmatis non-specifically. One identified ORF alignment, discussed above as part of an identified ICDS, aligned to a sequence annotated to M. goodii. The remaining 30 identified ORFs aligned to non-Reference proteome sequences from M. smegmatis mc<sup>2</sup> 155, with 26 predicted and four inferred from homology. An interesting novel ORF was identified with two peptides from eight MS/MS spectra mapping to an intergenic region with a genomic position from 6,434,313 to 6,434,801 on the reverse strand. The sequence yielded a protein nr BLAST alignment to G7CE94\_MYCTH (Lipoprotein LppV), a predicted protein of 182 amino acids annotated to M. thermoresistibile, with an E-value of 2.13747e-44 (see **Figure 7** and **Supplementary Table 7**).

# Database Comparison

The PEP scores of the identified peptides were obtained from the MaxQuant peptides.txt file, which contains the identified peptides with the PEP score calculated using the peptide length and the Andromeda score for the best associated MS/MS spectrum. Density plots of the distribution of PEP scores for each database revealed non-Gaussian PEP score distributions. Analysis of variance of novel, annotated and reverse peptide PEP scores for each database was performed using the Kruskal–Wallis non-parametric analysis of variance test followed by post-hoc analysis with a two-sided Dunn's test and Bonferroni correction (see **Supplementary Data Sheet 5**). A statistically significant difference between group means (annotated, novel and reverse peptides) for each of the three databases was found (p-values < 2.2e-16). Post-hoc analysis indicated a significant difference between reverse and annotated peptide group PEP scores for all three sets of peptide results: GeneMarkS (adjusted p-value 6.68e-21), six frame database (adjusted p-value 3.67e-19), and Reference proteome (adjusted p-value 1.05e-31). The assignment of statistically significant differences between annotated and novel peptide group PEP scores varied between the six frame database (adjusted p-value 1.89e-06) and the GeneMarkS database (adjusted p-value 8.75e-01), although both values were much higher than those of the annotated-reverse comparisons. Further, the comparison between novel and reverse peptide group PEP scores indicated a significant difference for both the six frame database (adjusted p-value 9.84e-13) and GeneMarkS database (adjusted p-value 5.50e-17) comparisons, with much lower adjusted p-values than those of the annotated-novel group comparisons of both databases. PEP score analysis was also performed on the combined PSMs for each database obtained from the msms.txt file, supporting the analysis below (see **Supplementary Table 9**, **Supplementary Figure 2**, and **Supplementary Data Sheet 6**).

# Reference Proteome Database

Using the Reference proteome database, 27,720 annotated peptides were identified, with a median PEP score of 1.32E-04. Further, 72 reverse hit peptides were reported by MaxQuant, with a median PEP score of 7.46E-02. Kruskal–Wallis and Dunn's test post-hoc analysis showed a statistically significant difference between the two groups (adjusted p-value 1.05e-31 before excluding peptides seen in only one replicate). After selecting peptides identified in at least two replicates, 2788 database sequences were identified with at least one unique peptide. For the group of Reference proteins with TSS validation and single genomic coordinates, a start site distribution of ATG 353 (64.53%), TTG 4 (0.73%), and GTG 190 (34.73%) was determined (see **Figures 2**, **8A**, **Table 1**, and **Supplementary Table 10**).

# GeneMarkS Database

The GeneMarkS database allowed for the identification of 27,168 annotated and 407 novel peptides, with a median peptide PEP score of 1.16E-04 and 2.22E-04 respectively, and 59 reverse hit peptides with a median peptide PEP score of 6.24E-02. The PEP score distributions of annotated and novel peptides were not significantly different using Kruskal–Wallis and post-hoc tests (adjusted p-value 8.75e-01), and from the boxplot visualizations appear similar to the PEP distribution of annotated peptides obtained from the Reference proteome database. A significant difference was found between annotated and reverse peptide PEP score distributions (adjusted p-value 6.68e-21). A comparison between novel and reverse peptide PEP score distributions revealed a significant difference (adjusted p-value 5.50e-17), and the number of reverse hit peptides identified was markedly lower than the number of novel peptides identified. Thus, it is very likely that most novel identifications are true positive identifications (see **Figure 8B**). After excluding peptides only identified in a single sample, 2815 GeneMarkS sequences were identified with at least one unique peptide. Of these, 42 were not annotated to a Reference protein. Further, in the group of 558 Reference protein TSS validations, 442 sequences had a corresponding entry in the GeneMarkS database with the correct TSS prediction (of which 439 were identified) while only one did not have a corresponding entry, and 115 had a corresponding entry with an alternative TSS assignment. Thus, in the group of TSS validations with a corresponding GeneMarkS entry, GeneMarkS had correctly predicted the TSS in 79.35% of cases. Further in the set of all identified genomic ORFs annotated to a Reference protein (2810), GeneMarkS had

<sup>8</sup>Retrieve/ID Mapping (UniProt). http://www.uniprot.org/uploadlists/

#### TABLE 1 | Database identifications.

A table comparing Reference protein ORF identifications, TSS validations, upstream gene model modifications, upstream TSS identifications, downstream TSS identifications and novel ORF identifications using three different databases.

FIGURE 8 | Peptide PEP score distribution. The posterior error probability (PEP) score distributions of novel, annotated and reverse hit peptide identifications obtained using three different database searches are shown, using the PEP score of the best PSM obtained for every peptide, from the MaxQuant peptides.txt output file. (A) Reference proteome, (B) GeneMarkS database, (C) Six frame database. The whiskers represent 1.5 times the interquartile range (IQR) below and above the first and third quartile, respectively. Differences between the groups from each database search were evaluated using the non-parametric Kruskal–Wallis test, which indicated significant differences between novel and reverse peptide PEP scores in the GeneMarkS (adjusted p-value 5.50e-17) and six frame (adjusted p-value 9.84e-13) databases. The p-values assigned to the comparisons between annotated and novel groups for the GeneMarkS (adjusted p-value 8.75e-01) and six frame (adjusted p-value 1.89e-06) databases were much larger than the p-values assigned to the comparisons between novel and reverse groups. Significant differences were found in the annotated-reverse comparisons for all three databases: GeneMarkS (adjusted p-value 6.68e-21), six frame database (adjusted p-value 3.67e-19) and Reference proteome (adjusted p-value 1.050e-31). The effect of excluding peptides seen only in a single replicate on the PEP score distributions is also apparent. The adjusted p-value assigned to the comparison between all six frame novel and annotated peptides (adjusted p-value 1.89e-06) was much smaller than the adjusted p-value assigned to the same comparison after excluding peptides seen only in a single replicate (adjusted p-value 1.77e-02). In contrast, the comparison between six frame novel and reverse peptides seen in at least two replicates was assigned an adjusted p-value of 2.87e-06, which is closer to the p-value assigned to the comparison between annotated and reverse peptides seen in at least two replicates (adjusted p-value 6.09e-08; see Supplementary Table 10 and Supplementary Data Sheet 5 for the results of Kruskal–Wallis tests and post-hoc pairwise comparisons).

predicted the ORF as protein-coding in 99.68% of cases. Of the 81 upstream gene model modifications, 80 had a corresponding GeneMarkS sequence identified, of which 42 of the modifications were supported by upstream peptide identifications from the GeneMarkS database (with 28 TSS identifications). All of the 24 downstream gene model modifications were identified with TSS evidence from the GeneMarkS database, with a mean N-terminal shortening of ∼10 amino acids in this group (see **Figure 2**, **Table 1**, and **Supplementary Table 10**).
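The TSS agreement rate quoted above follows directly from the reported counts. As a quick arithmetic check, here is a minimal sketch; the variable names are illustrative and are not taken from the original analysis:

```python
# Counts reported in the text for the 558 Reference protein TSS validations.
correct_tss = 442      # GeneMarkS entry with the validated TSS
alternative_tss = 115  # GeneMarkS entry with a different TSS assignment
no_entry = 1           # no corresponding GeneMarkS entry

# Only validations that have a GeneMarkS counterpart enter the denominator.
with_entry = correct_tss + alternative_tss
assert with_entry + no_entry == 558

tss_accuracy = 100 * correct_tss / with_entry
print(f"GeneMarkS TSS agreement: {tss_accuracy:.2f}%")  # 79.35%
```

The single validation without a GeneMarkS entry is excluded from the denominator, which is why the rate is computed over 557 rather than 558 sequences.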

#### Six Frame Database

The translated six frame database allowed for the identification of 25,427 annotated and 553 novel peptides, with median peptide PEP scores of 1.07E-04 and 1.04E-03, respectively, and 48 reverse hit peptides with a median PEP score of 6.47E-02. Using Kruskal–Wallis and Dunn's test post-hoc analysis, significant differences were found between the group PEP scores of annotated and reverse (adjusted p-value 3.67e-19), annotated and novel (adjusted p-value 1.89e-06), and novel and reverse (adjusted p-value 9.84e-13) peptide identifications. After selecting peptides seen in at least two replicates, a much larger although still statistically significant adjusted p-value of 1.77e-02 was obtained for the comparison between novel and annotated groups, possibly indicating a relatively higher proportion of false positive identifications in the set of novel peptide identifications than in the annotated set. Post-hoc comparison between novel and reverse peptide PEP score distributions revealed adjusted p-values of 9.84e-13 and 2.87e-06 before and after excluding peptides seen in only one replicate, respectively. These are much lower than the p-values obtained from the comparison between novel and annotated peptide groups, indicating a much greater difference between the distributions of novel and reverse peptide PEP scores than between those of novel and annotated peptides (see **Figure 8C**). After selecting peptides seen in at least two replicates, 2825 ORFs were identified using the six frame database. Seventy-two ORFs identified using the six frame database mapped to non-Reference proteome ORFs. Of the 558 Reference protein sequences with TSS validations, a corresponding six frame sequence was identified in 544 cases. In this group, 299 (54.96%) of the six frame sequences (translated from the most upstream start codon in the ORF) correctly assigned the TSS. On average, the six frame database overestimated the correct TSS by ∼32 amino acids. Further, in the group of 81 upstream gene model modifications, 80 cases had a corresponding six frame sequence identified. In this group, 87.5% of the upstream gene model modifications were supported by the identification of upstream peptide evidence using the six frame database. Of the 38 sequences with identified TSSs in this group, 29 were correctly assigned in the six frame database (see **Figure 2**, **Table 1**, and **Supplementary Table 10**).
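The group comparisons reported above rest on a Kruskal–Wallis test followed by Dunn's pairwise post-hoc test. The sketch below illustrates that procedure in plain Python for exactly three groups. It is not the authors' analysis code: the PEP scores are invented for illustration, the function name `kruskal_dunn` is hypothetical, and the Bonferroni correction is an assumption, since the text does not state which p-value adjustment was used.

```python
import math
from collections import Counter
from itertools import combinations


def kruskal_dunn(groups):
    """Kruskal-Wallis H test plus Dunn's pairwise test for three groups.

    groups: dict mapping a group name to a list of scores.
    Returns (H, overall p-value, Bonferroni-adjusted pairwise p-values).
    """
    if len(groups) != 3:
        # The closed-form p-value below is only valid for df = 2.
        raise ValueError("this sketch handles exactly three groups")

    pooled = sorted(x for g in groups.values() for x in g)
    n = len(pooled)
    counts = Counter(pooled)
    first_pos = {}
    for pos, x in enumerate(pooled, start=1):
        first_pos.setdefault(x, pos)
    # Tied values share the mean of the ranks they occupy.
    rank = {x: first_pos[x] + (counts[x] - 1) / 2 for x in counts}
    tie_sum = sum(t**3 - t for t in counts.values())

    rank_sums = {k: sum(rank[x] for x in g) for k, g in groups.items()}
    h = 12 / (n * (n + 1)) * sum(
        rank_sums[k] ** 2 / len(g) for k, g in groups.items()
    ) - 3 * (n + 1)
    h /= 1 - tie_sum / (n**3 - n)   # tie correction
    p_overall = math.exp(-h / 2)    # chi-square survival function, df = 2

    # Dunn's test: z-score of the difference in mean ranks for each pair.
    variance = n * (n + 1) / 12 - tie_sum / (12 * (n - 1))
    pairs = list(combinations(groups, 2))
    adjusted = {}
    for a, b in pairs:
        diff = rank_sums[a] / len(groups[a]) - rank_sums[b] / len(groups[b])
        z = diff / math.sqrt(variance * (1 / len(groups[a]) + 1 / len(groups[b])))
        p = math.erfc(abs(z) / math.sqrt(2))          # two-sided normal p-value
        adjusted[(a, b)] = min(1.0, p * len(pairs))   # Bonferroni (assumed)
    return h, p_overall, adjusted


# Hypothetical PEP scores, chosen only to mimic the qualitative pattern.
scores = {
    "annotated": [1e-5, 5e-5, 1e-4, 2e-4, 4e-4],
    "novel": [2e-5, 8e-5, 3e-4, 6e-4, 1e-3],
    "reverse": [2e-2, 5e-2, 6e-2, 9e-2, 1e-1],
}
h, p_overall, adjusted = kruskal_dunn(scores)
print(f"H = {h:.2f}, p = {p_overall:.4f}")
for pair, p_adj in adjusted.items():
    print(pair, f"{p_adj:.4f}")
```

Because the test works on ranks rather than raw values, the very different magnitudes of annotated, novel and reverse PEP scores do not need to be transformed first. For real data with more than three groups, a statistics library would be the more appropriate choice.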

# DISCUSSION

Validation of genomic information by proteomic data has gained momentum over the past few years. The improvement in genome annotation can give rise to a better understanding of the biology of pathogenic organisms and ultimately, new strategies for disease prevention and treatment. Mass-spectrometry data has proven invaluable in terms of providing evidence of the translation of genes expressed under differing conditions.

By mapping identified proteins from the different databases to their corresponding ORFs in the genome, 2887 ORFs were identified at the peptide level. We showed that the identified start codon distribution of all identified ORFs with TSS identifications corresponded to previously reported findings in Mycobacteria, with a relatively higher percentage of GTG, although a relatively lower proportion of TTG start codons was identified. The identified penultimate amino acids of N-Met cleaved peptides corresponded to previously published findings, although the preponderance of Thr was notable.

We identified 2810 Reference proteins at the peptide level, and using N-terminal peptide evidence we validated the TSS for 558 of these. Further, 81 gene model modifications were indicated by the identification of peptides mapping upstream, and 24 by N-terminal peptides mapping downstream, of the annotated TSS of a Reference protein.

We provide experimental evidence for 63 novel ORFs not previously verified at the protein level. A total of 44 ORFs were identified with two or more peptides, of which 30 aligned to predicted or inferred non-Reference proteome sequences of M. smegmatis mc<sup>2</sup> 155. A single ORF was identified with two peptides but did not yield any protein BLAST alignments. Further, 19 novel ORFs were identified with a single peptide but were supported by either protein BLAST alignment or a previously identified transcript overlapping the ORF on the same strand of the genome, and are presented as lower-ranking evidence for annotation. We also identified six previously reported interrupted coding sequences (ICDSs) caused by sequencing errors, with peptides identified on either side of a frameshift position, four of which do not appear to have been previously identified at the peptide level. Further, we identified evidence for four novel ICDSs, three of which were supported by alignment to orthologous sequence evidence from a newly sequenced strain of M. smegmatis, emphasizing the importance of continuous review of genome annotation as new information becomes available.

In this study, we analyzed LC/MS/MS data with three different sequence databases to improve the genome annotation of M. smegmatis mc<sup>2</sup> 155: two genomic databases (a six frame translation and a GeneMarkS gene prediction database) and the UniProt Reference proteome<sup>9</sup>. In both genomic databases, the PEP score distribution of novel peptides was much closer to that of annotated peptides than to that of reverse sequence hits, and almost identical in the GeneMarkS database, indicating that most novel peptide identifications are likely to be true positive identifications. A higher number of total peptide identifications was attained using GeneMarkS than the six frame database, closely approaching the number attained using the Reference proteome. We report a high percentage of accurate TSS predictions using GeneMarkS: 79.35% in the group of Reference protein TSS validations where a corresponding GeneMarkS entry was predicted. This indicates that proteogenomic database generation with targeted ORF and TSS predictions using de novo gene prediction tools such as GeneMarkS can fruitfully decrease the database search space, increasing the sensitivity of peptide identifications. However, database compaction is a compromise between reducing spurious possibilities and minimizing the exclusion of non-spurious entries from the search space. ORF and TSS assignments will depend on the particular gene prediction tool used, and the correct ORFs and TSSs will not be detected if incorrectly assigned at the database generation phase. By including a six frame translation database in the pipeline, ORFs missed by the gene prediction algorithm, as well as upstream peptide evidence and TSS identifications not included in the gene prediction database, may be identified.
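To make the six frame approach concrete, the sketch below enumerates candidate ORFs in all six reading frames of a nucleotide sequence, starting each ORF at the most upstream start codon after the previous stop, as described for the six frame database. It is an illustration under stated assumptions, not the pipeline used in this study: the function names are hypothetical, the start codon set (ATG, GTG, TTG) follows the start codons discussed above, the initiator codon is read as Met, and only ORFs closed by an in-frame stop codon are reported.

```python
from itertools import product

# Standard codon assignments, listed in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
START_CODONS = {"ATG", "GTG", "TTG"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_frame(seq, frame):
    """Yield proteins from the most upstream start codon to the next stop."""
    current = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE.get(codon, "X")  # X marks ambiguous codons
        if aa == "*":
            if current:
                yield "".join(current)
            current = None
        elif current is not None:
            current.append(aa)
        elif codon in START_CODONS:
            current = ["M"]  # initiator codon is read as Met


def six_frame_orfs(genome, min_len=3):
    """Yield (strand, frame offset, protein) for both strands, three frames each."""
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            for protein in orfs_in_frame(seq, frame):
                if len(protein) >= min_len:
                    yield strand, frame, protein


example = list(six_frame_orfs("ATGAAATTTTAA"))
print(example)  # [('+', 0, 'MKF')]
```

Choosing the most upstream start codon maximizes the chance of capturing upstream peptide evidence, at the cost of overestimating the true TSS for many ORFs, which is consistent with the lower TSS agreement reported for the six frame database.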

Considering the PEP score distributions of novel and annotated peptides, and the much smaller number of reverse sequence hits than novel peptide identifications, it is evident that the proteome of M. smegmatis mc<sup>2</sup> 155 is not yet fully characterized and more work needs to be done to identify novel protein coding genes before a complete genome annotation status is attained. We hope that the evidence we present here will lead to the addition of new sequences to the Reference proteome of this strain, supporting the downstream functional characterization of these proteins, thus leading to a better understanding of the biology of an important model organism for the infectious M. tuberculosis. The gene model modifications we present for many Reference proteome sequences, many of which have not yet been identified at the protein level, facilitate improved functional and structural characterization of these proteins, allowing for accurate conclusions to be drawn by downstream comparative proteomics analyses.

# AUTHOR CONTRIBUTIONS

MP, MSc student, writing, Bioinformatics component; KN, writing, Proteomics component; JA, technical support on Bioinformatics component; AN, technical support on Proteomics component; SG, technical support on Proteomics component; NS, writing, technical support on Proteomics component; JB, corresponding author, technical supervision, and assistance, Proteomics component; NM, corresponding author, technical supervision and assistance, proof reading, Bioinformatics component.

# ACKNOWLEDGMENTS

We thank Dr. Karsten Krug for fruitful discussion and help in manuscript preparation. We thank the National Research Foundation (NRF) of South Africa for their financial support of this research and for a PhD bursary to KN. NS and JB thank the NRF for the South African Research Incentive Funding for Rated Research and Research chair grant, respectively. NS acknowledges support in part by the NRF (Grant Numbers 98963 and 95984). NS thanks the South African Medical Research Council for a fellowship. SG and KN each thank the CSIR for a PhD bursary. We thank the NRF for funding under the Bioinformatics and Functional Genomics programme, grant

<sup>9</sup>Mycobacterium smegmatis Reference Proteome. UniProt Proteomes. http://www.uniprot.org/proteomes/UP000000757 (Accessed August 9, 2015).

number 86934, including an MSc bursary to MP. Computations were performed using facilities provided by the University of Cape Town's ICTS High Performance Computing team: http://hpc.uct.ac.za.

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00427

Supplementary Figure 1 | Novel ORF ICDSs. Shows peptide evidence for six previously identified ICDSs—of which only two have previously been reported at the protein level—and four novel ICDS sequences. (A) DQ866867: A predicted protein spanning a sequencing error reported by Deshayes et al. (2007), showing peptides on either side of a frame shift position. (B) DQ866856: A protein spanning a sequencing error reported by Deshayes et al. (2007), which they also identified at the peptide level, showing peptides on either side of the frameshift position. (C) DQ866859: Peptide evidence for a predicted protein spanning a sequencing error reported by Deshayes et al. (2007). (D) DQ866863: Peptide evidence for a predicted protein spanning a sequencing error reported by Deshayes et al. (2007). (E) DQ866858: Peptide evidence for a predicted protein spanning a sequencing error reported by Deshayes et al. (2007), showing an upstream peptide identified in an overlapping ORF (highlighted in red). (F) Novel ICDS: An upstream ORF first predicted by Deshayes et al. in 2007 (I7FS93\_MYCS2), with downstream peptide evidence in an overlapping novel ORF (three peptides). Both ORFs align to A0A0K0X632\_9MYCO (peptidase M75 of M. goodii), with very low E-values (0.0 and 5.03e-09, respectively). The downstream peptides are highlighted in red. (G) DQ866873: an ICDS spanning a sequencing error, and also detected at the protein level, reported by Deshayes et al. (2007). Five upstream peptides in an overlapping novel ORF are highlighted in red. (H) Novel ICDS: Two adjacent ORFs in alternate reading frames, both aligned to the protein L8F5X0\_MYCSM—predicted by Gray et al. (2013) from the genome of M. smegmatis MKD8. (I) Novel ICDS: Two adjacent ORFs in alternate reading frames, both aligned to the protein L8FGK0\_MYCSM—predicted by Gray et al. (2013) from the genome of M. smegmatis MKD8. 
(J) Novel ICDS: Two adjacent ORFs in alternate reading frames, both aligned to the protein L8F9F0\_MYCSM—predicted by Gray et al. (2013) from the genome of M. smegmatis MKD8. A single upstream peptide in an alternate frame is highlighted in red.

Supplementary Figure 2 | PSM PEP score distribution. Shows the PEP score distributions of all annotated, novel and reverse sequence PSMs from the different databases. The whiskers represent 1.5 times the interquartile range (IQR) below and above the first and third quartile, respectively. In the GeneMarkS and six frame database groups the PEP score distribution of novel PSMs is noticeably closer to the group of annotated PSMs than reverse hits, indicating the high confidence of novel PSM identifications. PEP score distributions are further improved by excluding the PSMs of peptides that were only identified in a single replicate. Using the Kruskal–Wallis test followed by post-hoc analysis, significant differences were found for the comparisons between all novel and reverse PSM PEP scores for both the GeneMarkS (adjusted p-value 2.20e-42) and six frame (adjusted p-value 8.01e-36) databases. Similarly, significant differences were found between all annotated and all reverse group PSM PEP scores for the Reference proteome (adjusted p-value 1.71e-70), GeneMarkS (adjusted p-value 2.00e-48) and six frame database (adjusted p-value 1.83e-44). The p-values assigned to the comparisons between all annotated and all novel PSM PEP scores were much higher for both the GeneMarkS (adjusted p-value 4.26e-02) and six frame database (adjusted p-value 6.83e-09; see Supplementary Data Sheet 6 for the results of Kruskal–Wallis tests and post-hoc pairwise comparisons).

#### Supplementary Table 1 | Combined ORF identifications.

Supplementary Table 2 | Multiple initiation.

Supplementary Table 3 | Validated TSSs.

Supplementary Table 4 | Frameshift validation.

Supplementary Table 5 | Upstream start annotations.

Supplementary Table 6 | Downstream start annotations.

Supplementary Table 7 | Novel ORF annotations.

Supplementary Table 8 | Novel ORF ICDSs.

Supplementary Table 9 | PSM PEP scores.

Supplementary Table 10 | Peptide PEP scores.

Data Sheet 1 | GeneMarkS sequence database (GFF format).

Data Sheet 2 | Six frame database peptide identifications (GFF format).

Data Sheet 3 | GeneMarkS database peptide identifications (GFF format).

Data Sheet 4 | Reference proteome peptide identifications (GFF format).

Data Sheet 5 | Analysis of best peptide PSM PEP scores.

Data Sheet 6 | Analysis of all peptide PSM PEP scores.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Potgieter, Nakedi, Ambler, Nel, Garnett, Soares, Mulder and Blackburn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# **Proteomics of** *Neisseria gonorrhoeae***: the treasure hunt for countermeasures against an old disease**

*Benjamin I. Baarda and Aleksandra E. Sikora\**

*Department of Pharmaceutical Sciences, College of Pharmacy, Oregon State University, Corvallis, OR, USA*

*Neisseria gonorrhoeae* is an exquisitely adapted, strictly human pathogen and the causative agent of the sexually transmitted infection gonorrhea. This ancient human disease remains a serious problem, occurring at high incidence globally and having a major impact on reproductive and neonatal health. *N. gonorrhoeae* is rapidly evolving into a superbug and no effective vaccine exists to prevent gonococcal infections. Untreated or inadequately treated gonorrhea can lead to severe sequelae, including pelvic inflammatory disease and infertility in women, epididymitis in men, and sight-threatening conjunctivitis in infants born to infected mothers. Therefore, there is an immediate need for accelerated research toward the identification of molecular targets for development of drugs with new mechanisms of action and preventive vaccine(s). Global proteomic approaches are ideally suited to guide these studies. Recent quantitative proteomics (SILAC, iTRAQ, and ICAT) have illuminated the pathways utilized by *N. gonorrhoeae* to adapt to different lifestyles and micro-ecological niches within the host, while comparative 2D SDS-PAGE analysis has been used to elucidate spectinomycin resistance mechanisms. Further, high-throughput examinations of cell envelopes and naturally released membrane vesicles have unveiled the ubiquitous and differentially expressed proteins between temporally and geographically diverse *N. gonorrhoeae* isolates. This review will focus on these different approaches, emphasizing the role of proteomics in the search for vaccine candidates. Although our knowledge of *N. gonorrhoeae* has been expanded, still far less is known about this bacterium than the closely related *N. meningitidis*, where genomics- and proteomics-driven studies have led to the successful development of vaccines.

**Keywords:** *Neisseria gonorrhoeae***, gonorrhea, proteomics, molecular targets, vaccine, drugs, antibiotic resistance, surveillance**

# **INTRODUCTION**

Gonorrhea is an ancient human disease, with references to its symptoms found in the Old Testament of the Bible (Leviticus 15:1–3). For almost 700 years, it has been known as "the clap," a likely reference to the old Le Clapiers district of Paris where prostitutes were housed. This sexually transmitted disease remains a global scourge today, causing an estimated 106 million new cases worldwide each year (WHO, 2012). In the United States, gonorrhea is the second most common sexually transmitted disease, with over 300,000 new cases, primarily affecting those between

#### *Edited by:*

*Nelson C. Soares, University of Cape Town, South Africa*

#### *Reviewed by:*

*Monika J. Adamczyk-Poplawska, University of Warsaw, Poland Ana V. Coelho, Instituto de Tecnologia Química e Biológica-UNL, Portugal*

*\*Correspondence:*

*Aleksandra E. Sikora aleksandra.sikora@oregonstate.edu*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 01 September 2015 Accepted: 12 October 2015 Published: 26 October 2015*

#### *Citation:*

*Baarda BI and Sikora AE (2015) Proteomics of Neisseria gonorrhoeae: the treasure hunt for countermeasures against an old disease. Front. Microbiol. 6:1190. doi: 10.3389/fmicb.2015.01190*

20 and 24 years of age, reported to the Centers for Disease Control and Prevention (CDC) annually (CDC, 2013). The Gram-negative diplococcus *Neisseria gonorrhoeae*, the gonococcus (GC), is the sole cause of gonorrhea. In men, infections typically present as a profuse, localized inflammatory response of the urethra (i.e., urethritis). In contrast, gonorrhea remains asymptomatic in 50–80% of infected women (Farley et al., 2003; WHO, 2011). Untreated or inadequately treated gonococcal infections can have severe consequences including epididymitis in men and pelvic inflammatory disease and inflammation of the uterus, ovaries and fallopian tubes in women, which can lead to infertility. Neonatal health is also detrimentally affected by GC infection, as this pathogen can cause a sight-threatening conjunctivitis in infants born to infected mothers (Creighton, 2011). Additionally, infection with GC increases the risk of HIV transmission (Tapsall, 2005). Further compounding the difficulty in treating gonorrheal infections, GC has gained resistance to nearly all antibiotics currently in use through a number of point mutations as well as horizontally acquired genes (Tapsall et al., 2010; Unemo and Shafer, 2014). The CDC now recommends a combination of ceftriaxone with either doxycycline or azithromycin for empirical gonorrhea treatment (CDC, 2012b); however, treatment failures with ceftriaxone have been verified in Japan, Australia, Sweden, and Slovenia (reviewed in Unemo, 2015). Additionally, GC demonstrates remarkable heterogeneity and strain-to-strain variability, which represent a significant challenge in vaccine development (Zhu et al., 2011; Jerse et al., 2014).

Immediate action is critically needed against gonorrhea before it becomes completely untreatable. In response to this dire possibility, the World Health Organization (WHO) published the "Global Action Plan to Control the Spread and Impact of Antimicrobial Resistance in *Neisseria gonorrhoeae*" (WHO, 2012), and the CDC (CDC, 2012a), as well as the European Centre for Disease Prevention and Control (ECDC; ECDC, 2012) proposed region-specific response plans. Overall, these proposals stressed the importance of implementing holistic action against gonorrhea, which would encompass early prevention, diagnosis, contact tracing, treatment, and surveillance of antimicrobial resistance and treatment failures.

An ideal method of gonorrhea prevention would be the development of a protective vaccine(s). Indeed, according to model simulations, gonococcal prevalence could be reduced by at least 90% after 20 years if all 13-year-olds were given a non-waning vaccine with 50% efficacy or a vaccine with 100% efficacy that wanes after 7.5 years. Further, even with a non-waning vaccine of 20% efficacy, as much as a 40% decrease in prevalence could be anticipated (Craig et al., 2015). In line with the WHO, CDC, and ECDC call for action, it is imperative that anti-gonorrhea vaccine and new drug development be made a priority. Different proteomic approaches are exceedingly valuable in supporting the development of effective new therapeutic interventions by identifying vaccine and drug targets. Herein, we guide the reader through these different approaches in the treasure hunt for countermeasures against gonorrhea, emphasizing the role of proteomics in the search for GC vaccine candidates (**Figure 1**).

# **HOW DOES A HEALTH-OBSESSED RESEARCHER EVALUATE AN ENERGY BAR?**

# **WITH PROTEOMICS**

As genomic approaches, whole-genome sequencing in particular, have become relatively inexpensive and increasingly high-throughput in the past few years, with short turnaround times and great resolution, they have grown increasingly useful in basic research and in clinical diagnosis. The Broad Institute has recently released the whole genome sequences of 14 GC clinical isolates in collaboration with the *Neisseria gonorrhoeae* Group to facilitate research into pathogenesis and genetic determinants of disease states (*Neisseria gonorrhoeae* Group Sequencing Project, Broad Institute of Harvard and MIT<sup>1</sup>). Additionally, multilocus sequence typing (MLST), which characterizes isolates based on internal fragments of housekeeping gene alleles, has been used to cluster GC patient isolates based on phenotype (Ilina et al., 2010). A database of MLST data for *Neisseria* species has been established<sup>2</sup>, further facilitating MLST identification and genotypic grouping of GC isolates. Genomic-derived methodologies have identified GC iron-responsive genes (Ducey et al., 2005), the anaerobic stimulon (Isabella and Clark, 2011), as well as gene expression patterns during infection of the lower female genital tract (McClure et al., 2015). In clinical applications, the proliferation of genomic datasets has promoted the development of nucleic acid amplification tests (NAATs), which allow for rapid identification of GC in patient samples without the need for culture (Low et al., 2014).

Ultimately, however, genomics is unable to capture the complete biological complexity present. A combination of different proteomic approaches can greatly complement genomic-acquired data by examining the GC protein population in biofilms or upon exposure to relevant host stimuli (Wu et al., 2010; Phillips et al., 2012), proteins associated with drug resistance (Nabu et al., 2014), proteome expression patterns during infection, posttranslational modifications, or by providing information about proteins' subcellular location, structures and protein–protein interactions. The knowledge gained from proteomic studies can be useful for identifying GC in clinical samples (Gudlavalleti et al., 2008; Carannante et al., 2015), evaluating antibiotic resistance, and discovering potential vaccine and drug candidate proteins (Zielke et al., 2014, 2015).

# **WHEN DO YOU NOT WANT TO RECEIVE APPLAUSE?**

# **WHEN YOU IDENTIFY "THE CLAP"**

Rapidly identifying GC in clinical isolates is vital to initiate treatment as quickly as possible to prevent the severe consequences of untreated gonorrhea, as well as to limit the spread of antimicrobial resistant strains. NAATs, developed

<sup>1</sup>http://www.broadinstitute.org/

<sup>2</sup>http://pubmlst.org/neisseria/

using 2-D-SDS-PAGE. **(B)** Proteomic investigations (1D SDS-PAGE and ICAT) of manganese (Mn) regulation of virulence factors and oxidative stress. **(C)** Proteomic profiling of GC transition from planktonic to biofilm growth using SILAC. **(D)** Comparative, high-throughput proteomic analyses of cell envelope and MV fractions derived from GC strains FA1090, F62, MS11, and 1291.

with the use of genomic data, are more rapid and sensitive than culture, which has resulted in an increase in the number of infections detected (Unemo and Shafer, 2014; Wind et al., 2015). Current commercially available gonococcal NAATs are unable to satisfactorily measure antimicrobial resistance, undermining the surveillance of antimicrobial resistance trends. Because of this limitation, the ECDC has recommended that all specimens positive for GC by NAAT be subsequently cultured to monitor antimicrobial resistance trends (ECDC, 2012). However, laboratory-developed NAATs have been utilized to identify known genetic antimicrobial resistance determinants against several classes of antibiotics, as well as to detect the crucial mutations conferring resistance to extended-spectrum cephalosporins in superbug strains H041 and F89 (reviewed by Unemo, 2015).

As antibiotic resistance determinants continuously evolve, the development of new detection tests is required (Unemo, 2015). Proteomics can provide sensitive, accurate, rapid, and cost-effective methods of GC identification and determination of antimicrobial resistance patterns in clinical samples. Proteomic identification of bacteria is primarily performed with data generated by matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS). Unique and representative biomarker ions can be established from intact cell MALDI-TOF-MS analysis (van Baar, 2000; Amiri-Eliasi and Fenselau, 2001; Fenselau and Demirev, 2001; Fagerquist et al., 2010; Murray, 2010; Niyompanich et al., 2014). The most important advantages in direct bacterial profiling by means of MALDI-TOF-MS are: (1) the requirement for only a small amount of biological material, (2) the possibility of examining intact cells without preceding extraction and separation, (3) a fast and straightforward procedure, and (4) high specificity in species differentiation (Ilina et al., 2009).

One of the first examples of direct GC profiling with MALDI-TOF-MS used surface enhanced MALDI-TOF-MS to analyze over 350 GC strains and closely related species (Schmid et al., 2005). These comparisons enabled the design of multilayer artificial neural networks that revealed 20 ion peak descriptors of positive, negative and secondary nature that were optimal for GC identification (over 96% efficiency, a sensitivity of 95.7% and a specificity of 97.1%). Another study used atmospheric pressure MALDI-TOF to determine that a putative DNA binding protein from *N. meningitidis*; its homolog in GC, DbhA; and acyl carrier proteins in each species could be used as protein biomarkers for identifying pathogenic *Neisseria* (Gudlavalleti et al., 2008). Although this approach does not provide a method to distinguish between *N. meningitidis* and *N. gonorrhoeae*, it can act as a base upon which to build techniques for identifying pathogenic *Neisseria* in clinical samples. The successful application of MALDI-TOF-MS for GC identification has also been recently demonstrated on 92 out of 93 isolates of gonococci collected from 2007 to 2012 as part of the European Gonococcal Antimicrobial Surveillance Programme (Carannante et al., 2015).

Together, these studies highlight the potential of proteomic approaches to rapidly and correctly identify GC in various clinical isolates, which, if implemented on a larger scale, will promote rapid initiation of treatment while still allowing antimicrobial susceptibility testing to be performed.

# **WHAT DID THE BACTERIA CALL THEIR GUERILLA WARFARE UNIT?**

# **THE SPECTINOMYCIN RESISTANCE**

In a survey of antimicrobial resistance in Southeast Asia, the WHO Global Gonococcal Antimicrobial Surveillance Program found that between 0.6 and 10.5 percent of isolates demonstrated spectinomycin resistance (Bala et al., 2013). Spectinomycin directly interacts with 16S rRNA and inhibits the elongation factor G (EF-G)-catalyzed translocation of the peptidyl-tRNA from the A site to the P site during protein synthesis (Bilgin et al., 1990; Ramakrishnan and White, 1992). Not surprisingly, spectinomycin resistance determinants traditionally involve mutations in 16S rRNA (Maness et al., 1974; Galimand et al., 2000). However, a deletion of amino acid 25 and a K26E amino acid alteration (*E. coli* numbering) in the ribosomal protein S5 is a newly identified mechanism associated with high-level spectinomycin resistance in GC (Unemo et al., 2013). Overall, however, the *in vitro* susceptibility to this aminocyclitol compound is remarkably high worldwide and this antibiotic is an effective alternative for treatment of anogenital gonorrhea, in particular for multidrug resistant cases (Unemo, 2015).

The proteomic signatures of GC spectinomycin resistance as well as cellular responses to spectinomycin treatment were assessed through qualitative and relative quantitative proteomics using densitometry analysis of proteins separated by two-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (2D SDS-PAGE) and protein identification with MALDI-TOF-MS (Otto et al., 2012; Nabu et al., 2014). Comparative proteome maps have been constructed between a spectinomycin resistant (Spec<sup>R</sup>) clinical isolate and a spectinomycin sensitive (Spec<sup>S</sup>) reference strain (ATCC 49226; **Figure 1A**). When the two strains were not exposed to spectinomycin, their protein profiles were largely the same, with EF-Tu and EF-Ts, cysteine synthase, and the septum site-determining protein MinD upregulated in the Spec<sup>R</sup> isolate. Additionally, MinD, oxidoreductase, and hypothetical protein NGO1873 were shifted to a more acidic isoelectric point, and the ribosomal protein S6 was shifted to a more basic pI in the resistant isolate. Finally, ABC transporter substrate-binding protein showed decreased expression and alcohol dehydrogenase was not detected in the Spec<sup>R</sup> strain. Interestingly, in the presence of spectinomycin, although dihydrolipoamide dehydrogenase, peroxiredoxin, an outer membrane protein Rmp, and the 50S ribosomal protein L7/L12 were upregulated in both strains, none of the proteins were expressed as highly in the resistant strain as in the wild type. Overall, the spectinomycin treatment of the Spec<sup>S</sup> GC resulted in alterations in the abundance of proteins involved in energy metabolism and detoxification, as well as cell envelope proteins (Rmp and ABC transporter substrate-binding protein).

Based on these proteomic comparisons, a mode of spectinomycin action on GC has been proposed but not yet experimentally validated (Nabu et al., 2014). Briefly, in the presence of subminimal inhibitory concentrations, the drug destabilizes the outer membrane, recruiting Rmp to maintain the integrity of the cell envelope. Spectinomycin accumulates in the periplasm, generating a concentration imbalance between the inside and outside of the cell. In response, significant changes in energy metabolism occur, including an increase in NADH production and oxidation through the electron transport chain. At the same time, additional proton motive force allows more drug molecules to enter the cell. Increased NADH oxidation leads to an increase in reactive oxygen species, which are detoxified by the increase in peroxiredoxin and glutamate dehydrogenase production. Higher levels of L7/L12 allow the cell to overcome the inhibition of ribosomal translocation imposed by spectinomycin. In the Spec<sup>R</sup> strain, amino acid ATP binding cassette transporter substrate-binding protein levels are decreased even in the absence of spectinomycin, possibly affecting uptake of the drug. Expression of enolase is decreased, which may result in increased levels of 2P-D-glycerate and 3P-D-glycerate. Together with an increase in cysteine synthase expression, this may improve the cell's defense against reactive oxygen species. Finally, the already high levels of EF-Tu and EF-Ts, together with the upregulation of L7/L12, may assist with protein translation in the presence of spectinomycin (Nabu et al., 2014).

Future research may be able to exploit some of the suggested secondary pathways and increase the efficacy or reduce the resistance potential of future antimicrobial compounds. Also apparent in this study are some of the shortcomings of 2D SDS-PAGE, which has limited utility when dealing with low abundance and/or membrane proteins.

# **HOW DID THE HEALTH-OBSESSED RESEARCHER COMPARE TWO PROTEIN SHAKES?**

# **WITH QUANTITATIVE PROTEOMICS**

In contrast to qualitative proteomics, quantitative proteomic approaches allow for (absolute or relative) quantification of proteins on a global scale. The object of quantitative proteomics is to identify specific alterations between control samples and particular experimental conditions (e.g., healthy versus diseased state). In addition, quantitative proteomic profiling may focus on a specific subset of proteins (subproteome), where for instance bacterial whole cell lysates are subjected to fractionation to enrich for cell envelope proteins. Very often the proteins of interest are relatively low in abundance; therefore it is critical to utilize appropriate combinations of pre-fractionation techniques such as different kinds of liquid chromatography (affinity, reversed phase, size-exclusion, or ion exchange) to reduce the complexity of analyzed samples, in addition to employing sensitive mass spectrometry instruments. Different approaches have been developed to facilitate quantitative proteome profiling studies involving stable isotope labeling, such as ICAT, ICPL, SILAC, iTRAQ, TMT, and IPTL. In addition, label-free statistical methodologies (MRM, SWATH) and absolute quantification methods by mass spectrometry (AQUA strategy) have become available (reviewed in Chahrour et al., 2015).

Quantitative proteomic approaches, involving different stable isotope labels and summarized in **Table 1**, have been used to investigate how GC responds to oxidative stress (Wu et al., 2010), transitions from planktonic to biofilm growth (Phillips et al., 2012), and adapts the composition of the cell envelope in response to different environmental cues encountered in microecological niches within the host (Zielke et al., 2015). In addition, iTRAQ technology has also revealed the dynamic subproteomes of cell envelopes and naturally released membrane vesicles in four different GC isolates (Zielke et al., 2014).

# **WHAT HAPPENS WHEN TWO OXEN GO ON THEIR FIRST DATE?**

# **OXIDATIVE STRESS**

In a typical inflammatory response to GC, neutrophils are recruited to the site of infection by chemokines, such as IL-8, released by infected mucosal surfaces (Criss et al., 2009). When activated neutrophils phagocytize GC, the production of reactive oxygen species (ROS) is either stimulated or inhibited, depending on GC expression of opacity-associated proteins (Opa). Cells expressing Opa protein (Opa+) ligate to human carcinoembryonic antigen-related cell adhesion molecules (CEACAMs), specifically CEACAM3, on the surface of the neutrophil, triggering phagocytosis, NADPH oxidase subunit assembly, and degranulation (Schmitter et al., 2007; Sarantis and Gray-Owen, 2012). Additionally, Opa<sup>57</sup> protein ligated to CEACAM3 amplifies the inflammatory response by activating nuclear factor (NF)-κB and increasing phosphorylation of the p38 kinase (Sintsova et al., 2014).

This cascade of ROS production puts the GC cells under tremendous oxidative stress. However, viable GC has been shown to survive and replicate within neutrophils, even after NADPH oxidase activation (Simons et al., 2005), indicating that GC is able to protect itself against oxidative stress. This pathogen can defend against superoxide radicals (Tseng et al., 2001) and hydrogen peroxide (Seib et al., 2004) in a manganese (Mn) dependent manner. A study combining transcriptomic with qualitative and quantitative proteomic approaches examined the protective mechanism utilized by GC in the presence of Mn (Wu et al., 2010). One-dimensional sodium dodecyl sulfate-polyacrylamide gel electrophoresis (1D SDS-PAGE) paired with one-dimensional liquid chromatography-tandem mass spectrometry (1D LC-MS/MS) identified 46 proteins expressed only in the presence of Mn. Notably, bacterioferritin, azurin (Laz), and iron-superoxide dismutase (SodB), all proteins involved in defense against oxidative stress, were expressed in bacteria cultured in the presence of Mn, but were not detected in those cultured without Mn. When cultured in the absence of Mn, GC was found to express an outer membrane methionine sulfoxide reductase that is also involved in defense against superoxide and hydrogen peroxide.

Further, a quantitative proteomic investigation using Isotope-Coded Affinity Tag (ICAT) labeling was used to determine which proteins were differentially regulated by the presence or absence of Mn (Wu et al., 2010). For ICAT analysis, two protein samples are labeled with reactive groups (biotinylated iodoacetamide or acrylamide derivatives) that specifically react with the sulphydryl groups of denatured peptides' cysteine side chains (Gygi et al., 1999). One sample is labeled with a light isotope, while the other sample is labeled with a heavy isotope. The samples are then combined and analyzed by mass spectrometry (**Figure 1B**; **Table 1**). The protein populations can then be quantified by comparing the ratio of heavy to light proteins (Gygi et al., 1999; Colangelo and Williams, 2006; Chahrour et al., 2015).

In this study, ICAT labeling, coupled with MS/MS, revealed numerous proteins that were downregulated more than 1.5-fold in the presence of Mn, including PilT (an ATPase involved in pilus disassembly), OmpR, a 64 KDa outer membrane protein (OMP P64k), and peroxiredoxin, which reduces and detoxifies peroxides (Seib et al., 2006). Additionally, Mn affected the levels of pilin, superoxide dismutase, and pyrophosphatase without causing a corresponding change in the transcript level, indicating that these proteins are likely to be regulated post-transcriptionally by the presence of Mn (Wu et al., 2010).
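The heavy-to-light comparison and the 1.5-fold cutoff described above can be sketched as follows. This is an illustrative sketch only, not the pipeline used by Wu et al. (2010): the protein names and intensities are hypothetical, the heavy tag is assumed to mark the Mn-supplemented culture, and summarizing a protein by the median of its peptide ratios is one common choice among several.

```python
from statistics import median


def protein_ratio(peptide_pairs):
    """Median heavy/light intensity ratio over a protein's labeled peptides.

    peptide_pairs: list of (heavy_intensity, light_intensity) tuples.
    Returns None when no peptide pair is usable (e.g., no cysteine peptides).
    """
    ratios = [h / l for h, l in peptide_pairs if h > 0 and l > 0]
    return median(ratios) if ratios else None


def classify(ratio, fold_cutoff=1.5):
    """Label a protein by comparing its ratio with a symmetric fold cutoff."""
    if ratio is None:
        return "not quantified"
    if ratio >= fold_cutoff:
        return "up"
    if ratio <= 1 / fold_cutoff:
        return "down"
    return "unchanged"


# Hypothetical data: (heavy = +Mn, light = -Mn) peptide intensities.
peptides = {
    "protein_A": [(1000, 2000), (900, 2100), (1100, 2000)],
    "protein_B": [(1500, 1400), (1300, 1350)],
    "protein_C": [],  # no cysteine-containing peptides were recovered
}

calls = {name: classify(protein_ratio(pairs)) for name, pairs in peptides.items()}
print(calls)
```

The empty entry illustrates the first ICAT limitation: a protein without cysteine-containing peptides yields no ratio at all, so it is absent from the comparison rather than reported as unchanged.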

Taken together, the results of this study suggest that, in the presence of Mn, GC upregulates the expression of iron storage proteins that protect against oxidative damage. Concomitantly, the bacterium downregulates pyrophosphatase (Ppa) and polyphosphate kinase (Ppk). Ppa hydrolyzes inorganic pyrophosphate (PP<sub>i</sub>) into two molecules of orthophosphate (P<sub>i</sub>), while Ppk synthesizes polyP from P<sub>i</sub>. When these two proteins are downregulated, PP<sub>i</sub> accumulates in the cell and chelates Mn. In a Mn(II)-PP<sub>i</sub> complex, Mn is able to defend against ROS non-enzymatically (Wu et al., 2010).

The information gleaned from these proteomic experiments could suggest a method to combat gonorrhea whereby the downregulation of protective proteins required only in the absence of Mn is maintained, but the alteration of the levels of proteins required for protection from oxidative stress in the presence of Mn is blocked. Importantly, this study highlights the utility of proteomic approaches to investigate biological responses involving post-transcriptional regulation that genomic methods alone cannot discover. It is important to keep in mind, however, that ICAT has two major drawbacks: (1) proteins that do not have any cysteine residues will be eliminated from this analysis, and (2) the release of biotinylated peptides from the streptavidin column is not quantitative for low-abundance peptides (Chahrour et al., 2015).

# **WHAT IS A MICROBIOLOGIST'S FAVORITE KIND OF MOVIE?**

# **A BIOFILM**

Bacteria often shift from planktonic (free living bacteria) growth to a biofilm community, where bacteria grow in close proximity to each other, protected by an extracellular polymer composed of polysaccharides, proteins, nucleic acids, and lipids (Flemming and Wingender, 2010). GC has been shown to form biofilms *in vitro* (Greiner et al., 2005), and an examination of primary cervical epithelial cells from cervical biopsy samples revealed biofilm growth in culture positive gonorrhea cases (Steichen et al., 2008). Biofilms exacerbate antibiotic resistance by providing a protective barrier against antimicrobial action, and biofilms formed by GC are thought to contribute to asymptomatic infections (Steichen et al., 2008).

To understand the mechanisms underlying biofilm formation, a quantitative proteomic study examined the proteome changes GC undergoes in the transition from planktonic to biofilm growth using stable isotope labeling by amino acids in cell culture (SILAC; **Figure 1C**; **Table 1**; Phillips et al., 2012). One of the important advantages of SILAC over other stable isotope labeling methods is that the label is incorporated into the proteins at an early stage of the experiment, while the cells are still metabolically active. Because the samples can be combined before processing, variability due to sample preparation and purification losses is largely eliminated (Chahrour et al., 2015).

In this analysis, planktonic cells of GC strain 1291, which is an arginine auxotroph, were grown with labeled <sup>13</sup>C<sub>6</sub>-Arg, and biofilm cells were grown with unlabeled Arg in a continuous-flow apparatus. Extracted protein samples from each bacterial population were combined and subjected to MALDI-TOF mass spectrometry analysis. Overall, this global analysis identified 757 proteins, 152 of which were significantly differentially expressed. In particular, GC cultured in a biofilm exhibited 73 upregulated and 54 downregulated proteins when compared to planktonic growth. The results of this study indicated that the bacteria upregulate proteins to respond to an oxygen-limited environment, including cytochrome c oxidase subunit III CcoP and nitrite reductase AniA. To cope with restricted nutrient availability in the biofilm, bacterial metabolism is shifted to increase sugar fermentation and tricarboxylic acid (TCA) cycle enzymes. The composition of the outer membrane is also altered during growth in a biofilm, with increased levels of 9 proteins including OpaB and OpaD (Phillips et al., 2012), both of which have been shown to adhere to and damage fallopian tube mucosa (Dekker et al., 1990). Among the downregulated proteins in the biofilm were proteins involved in energy metabolism, protein fate and synthesis, and transport and binding proteins, specifically iron complex outer membrane receptor protein (FetA) as well as transferrin-binding protein B and A (TbpB and TbpA, respectively).
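To make the labeling scheme above concrete, the sketch below computes the m/z offset expected between the light (biofilm) and heavy (planktonic) forms of a peptide, and the corresponding abundance ratio. It is an illustration under stated assumptions, not the analysis performed by Phillips et al. (2012): the peptide sequences and intensities are hypothetical, and only arginine is assumed to carry the label.

```python
# Mass difference between one 13C and one 12C atom, in daltons.
C13_MINUS_C12 = 1.0033548
# 13C6-arginine replaces all six carbons of the residue.
HEAVY_ARG_SHIFT = 6 * C13_MINUS_C12


def heavy_mz_shift(peptide: str, charge: int = 1) -> float:
    """m/z offset of the heavy peptide relative to its light counterpart."""
    n_arg = peptide.count("R")  # each arginine contributes one labeled residue
    return n_arg * HEAVY_ARG_SHIFT / charge


def biofilm_to_planktonic(light_intensity: float, heavy_intensity: float) -> float:
    """Abundance ratio, with biofilm = light and planktonic = heavy."""
    return light_intensity / heavy_intensity


# Hypothetical peptides: one with a single arginine, one without any.
print(heavy_mz_shift("AVLGR", charge=1))   # about 6.02 for a singly charged ion
print(heavy_mz_shift("MKTLLE", charge=1))  # 0.0: no arginine, no heavy partner
print(biofilm_to_planktonic(light_intensity=3000.0, heavy_intensity=1500.0))
```

Peptides lacking arginine show no shift and therefore cannot be quantified with this label, which is one reason SILAC designs often pair labeled arginine with labeled lysine.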

A direct comparison of the transcriptome (Falsetta et al., 2009) and proteome expression profiles of GC biofilms showed a very poor correlation with only seven overlapping hits including AniA, OpaB, cytochrome C peroxidase CcpR, putative dihydrolipoamide dehydrogenase, putative cysteine synthase/ cystathionine beta-synthase, hypothetical protein NGO0905, and putative ABC transporter NGO1494 (Phillips et al., 2012).

This study gives insight into the adaptations necessary for GC to establish long-term infections and emphasizes the utility of proteomic approaches to examine these adaptations. In addition, the identified upregulated outer membrane proteins may be utilized as biomarkers for gonorrhea diagnostics.

# **WHAT BOOK-LIKE CANDIDATE WAS NOT AT THE LAST PRESIDENTIAL DEBATE?**

# **A NOVEL VACCINE CANDIDATE**

Perhaps one of the most exciting uses of proteomic approaches is in the search for new ways to combat multidrug resistant GC. We are applying a proteomics-driven reverse vaccinology approach to identify vaccine candidate proteins against gonorrhea (Zielke et al., 2014, 2015). Reverse vaccinology searches for possible vaccine candidate proteins using different genomics and proteomics methodologies and has already been successfully applied to different pathogenic bacteria including *N. meningitidis* serogroup B (Heckels and Williams, 2010; Adamczyk-Poplawska et al., 2011; Seib et al., 2012; Delany et al., 2013; Heinson et al., 2015).

No vaccine against GC currently exists, although research has been ongoing for decades. Two attempted vaccines, composed of killed whole cells and purified pilin protein, failed in clinical trials over 13 years ago (Zhu et al., 2011). Since that time, very little research into gonorrheal vaccines has occurred, mainly due to the highly variable targeted surface proteins. Because GC is a strict human pathogen, research was also hampered by the lack of a suitable small animal model for gonorrheal infection (Jerse et al., 2014). Fortunately, a mouse model of female infection was developed, in which female mice are treated with 17-β-estradiol when they are in the diestrus stage of the estrus cycle. The mice are also treated with an antibiotic cocktail of streptomycin sulfate, vancomycin HCl, and trimethoprim sulfate to prevent overgrowth of commensal vaginal bacteria while under the influence of estradiol. Two days after estradiol treatment, GC is introduced intravaginally. Using this model, GC can be recovered an average of 12.2 days post-inoculation with 10<sup>6</sup> colony forming units (Jerse, 1999). A further advancement in the mouse model has been the development of transgenic mice that express human CEACAM proteins, providing a closer reproduction of conditions encountered in the human host (Jerse et al., 2014). The availability of a mouse model has greatly facilitated vaccine research. The immune response to infection, as well as resistance to subsequent infections after inoculation with an experimental vaccine, can be closely monitored and investigated with the genetic tools available for studying mice (Zhu et al., 2011). However, to fully utilize this model for vaccine research, suitable candidate proteins must be identified, a goal for which proteomic approaches are ideally suited.

During the development of the MenB vaccine, out of nearly 600 candidates selected by reverse vaccinology, 350 recombinant proteins were successfully expressed in *Escherichia coli* and evaluated for their surface exposure. A total of 28 among them elicited bactericidal antibodies against Group B meningococci *in vitro*. Finally, the neisserial heparin-binding antigen NHBA, factor H-binding protein fHbp, as well as the neisserial adhesin NadA were chosen as part of the MenB vaccine (Seib et al., 2012; Delany et al., 2013; Jerse et al., 2014). In contrast, only 12 different candidates are being evaluated as potential gonorrhea vaccine antigens (Jerse et al., 2014). Therefore, a more far-reaching effort is required to make a gonorrhea vaccine a reality.

Of particular interest for vaccine development and identification of new drug targets are proteins localized to the bacterial cell envelope and membrane vesicles (MVs)—spherical outpouchings of the cell envelope—as they interact directly or indirectly with host tissues; play roles in pathogenesis, antibiotic resistance, and biofilm formation; and participate in general physiological processes. Surprisingly few studies addressed GC cell envelope composition (Yoo et al., 2007; Phillips et al., 2012; Zielke et al., 2014). Also, despite studies reporting the release of MVs and their different morphological forms (spherical, lobed, and tubular) in GC from the early 1970s, only a few reports focused on elucidating their components (Swanson et al., 1971; Dorward et al., 1989; Pettit and Judd, 1992a,b; Falsetta et al., 2011; Zielke et al., 2014).

To begin the systematic mining of GC cell envelope and MVs for the discovery of vaccine and drug candidates, we first used the PSORTb 3.0.2 (Gardy et al., 2005) bioinformatics predictions and analyzed the subcellular localization of all ORFs in the completed genome sequences of strains FA1090 (GenBank accession number AE004969) and NCCP11945 (GenBank accession number CP001050), as well as the draft genome sequences of 14 different GC strains (downloaded from the Broad Institute website<sup>3</sup>). These studies revealed that, on average, about 50 of the 2,000 ORFs present in the GC genome encode outer membrane proteins (Zielke and Sikora, 2014). However, the subcellular location could not be predicted for about 30% of all ORFs. This analysis demonstrated that there is still much to learn about GC cell envelope composition and opened up exciting prospects for applying proteomics for the discovery of vaccine targets.

<sup>3</sup>http://www.broadinstitute.org/annotation/genome/neisseria\_gonorrhoeae/MultiHome.html

As proteomic investigations of membrane proteins are technically challenging, we chose to apply gel-free quantitative proteomic approaches including isobaric tagging for relative and absolute quantification (iTRAQ, **Table 1**) combined with multidimensional liquid chromatography and tandem mass spectrometry (2D-LC/MS/MS) to examine cell envelopes and naturally released MVs (Zielke et al., 2014). Four GC strains: FA1090, F62, MS11, and 1291 were cultured in liquid media under standard growth conditions and their cell envelopes and MVs were harvested in mid-logarithmic phase of growth. iTRAQ quantification was performed by labeling proteins isolated from subproteome fractions of each strain of interest with one of four isobaric tags (N-hydroxysuccinimide ester-activated compounds) that react to free amine groups on the N-termini and lysine side chains of proteins with high efficiency (Ross et al., 2004). Each of the four tags contains a reporter ion of a unique mass (**Figure 1D**; **Table 1**). When the samples are combined and subjected to mass spectrometry, the reporter ions are released from the labeled peptide. After the release of the reporter ions, all of the identical peptides in a sample will result in identical mass spectra, and the abundance of the peptide in the four multiplexed samples can be quantified by the relative intensity of the corresponding reporter ion peak (Ross et al., 2004; Wiese et al., 2007). The advantage of using iTRAQ is that it can be easily multiplexed and up to 8 different samples can be simultaneously analyzed within a single experiment. Additionally, as the iTRAQ tags react with all primary amine functional groups of peptides, nearly all peptides are labeled and information about not only their abundance but also their modification(s) can be acquired (Chahrour et al., 2015).
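The reporter-ion quantification described above can be sketched as follows. This is a simplified illustration, not the workflow of Zielke et al. (2014): the intensities are hypothetical, the assignment of strains to channels is an assumption, and equalizing total channel intensity is only one of several possible normalization schemes. The nominal reporter masses of 114 to 117 are those of the standard 4-plex reagent.

```python
# Nominal m/z of the four iTRAQ 4-plex reporter ions.
REPORTER_MZ = (114, 115, 116, 117)
# Hypothetical channel assignment; the actual design may differ.
CHANNEL_STRAIN = dict(zip(REPORTER_MZ, ("FA1090", "F62", "MS11", "1291")))


def normalize_channels(spectra):
    """Scale channels so each has the same summed reporter intensity.

    spectra: list of 4-element intensity lists, one per peptide spectrum.
    This corrects for unequal amounts of total protein loaded per channel.
    """
    n = len(REPORTER_MZ)
    totals = [sum(s[i] for s in spectra) for i in range(n)]
    target = sum(totals) / n
    factors = [target / t for t in totals]
    return [[s[i] * factors[i] for i in range(n)] for s in spectra]


def relative_abundance(intensities, reference=0):
    """Express each channel relative to the chosen reference channel."""
    ref = intensities[reference]
    return [x / ref for x in intensities]


# Two hypothetical peptide spectra (reporter intensities for 114..117).
raw = [[100, 200, 100, 400], [300, 200, 300, 400]]
normalized = normalize_channels(raw)
ratios = [relative_abundance(s) for s in normalized]
for row in ratios:
    print({CHANNEL_STRAIN[mz]: round(r, 3) for mz, r in zip(REPORTER_MZ, row)})
```

Because all four channels are read from the same fragmentation spectrum, the ratios are insensitive to run-to-run variation in chromatography or ionization, which is the practical benefit of multiplexing.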

Our proteomic profiling of cell envelopes and native MVs revealed 533 and 168 common proteins, respectively, in analyzed GC strains. A total of 22 differentially abundant proteins were discovered including hitherto unknown proteins. Among those proteins that displayed similar abundance in four GC strains, we identified 305 and 46 cell envelope- and MV-associated proteins, respectively. In addition, 34 proteins were found in both cell envelopes and MVs with eleven of them differentially regulated (Zielke et al., 2014). A few of these differentially expressed proteins included cytoplasmic proteins, an observation that was confirmed by a subsequent, independent proteomic study of MVs (Perez-Cruz et al., 2015).

The ubiquitous outer membrane proteins identified included GC homologs of the outer membrane β-barrel assembly (Bam) protein complex (Ricci and Silhavy, 2012), including BamA, BamD, and BamE; lipopolysaccharide transport protein LptD; and TamA (NGO1956) and TamB (NGO1955), two proteins thought to work cooperatively to assist in the assembly of autotransporter proteins (Heinz et al., 2015). Numerous uncharacterized proteins were also ubiquitously expressed, including NGO1344, NGO1985, NGO2111, NGO2121, and NGO2139 (Zielke et al., 2014).

We further examined LptD, NGO1344, NGO1985, NGO2111, NGO2121, NGO2139, and TamB by constructing conditional- (LptD) or complete- knockout strains for each protein. These proteins were chosen because they contain domains predicted to function in maintaining cell envelope homeostasis. LptD expression in GC was placed under the control of an isopropyl β-D-1-thiogalactopyranoside (IPTG)-inducible promoter. This strain was unable to grow when streaked onto media lacking IPTG, while it grew robustly on plates supplemented with IPTG. Further, after 3 h of culturing in liquid media without IPTG, the bacteria ceased to grow. By 5 h, bacterial viability decreased dramatically (nearly 13-fold) compared to the LptD-expressing strain. These experiments indicated that LptD is likely essential for GC viability. To test whether NGO1344, NGO1985, NGO2111, NGO2121, NGO2139, and TamB play functions in the integrity of the GC cell envelope, the individual clean deletion mutants were constructed in strain FA1090 and spotted on plates supplemented with various compounds. Although the loss of these proteins had no effect on bacterial growth under permissive conditions, the loss of NGO1985 resulted in a severe growth defect in the presence of bile salts, polymyxin B, Tween 20, SDS, urea, and chloramphenicol. Further, these phenotypes could be completely reversed by complementation with an IPTG-inducible version of the *ngo1985* gene. Additionally, NGO2121 exhibited reduced growth in the presence of bile salts and polymyxin B. These proteins, identified by quantitative proteomic approaches, appear to provide an important function in maintaining cell membrane integrity and, as such, are promising targets for development of new therapeutic interventions against GC (Zielke et al., 2014).

To continue this research endeavor, we went on to determine the ubiquitously and specifically expressed cell envelope proteins of GC FA1090 challenged with host-relevant environmental stimuli: oxygen availability, iron deprivation, and the presence of human serum (Zielke et al., 2015). A myriad of novel proteins have been identified. Our initial characterization of five novel vaccine candidates that were ubiquitously expressed under these different growth conditions demonstrated that BamA, LptD, TamA, NGO2054, and NGO2139 were surface exposed and elicited bactericidal antibodies that cross-reacted with a panel of diverse GC isolates. These promising results strongly suggest that the proteomics-driven approach will provide a foundation for the development of anti-GC vaccine(s), which would be the ideal way to prevent gonorrhea. Finally, to promote the utilization of the newly identified proteins and the knowledge of the GC subproteome dynamics among the scientific community, our entire data sets from all these investigations were made publicly available via the ProteomeXchange Consortium<sup>4</sup> through the PRIDE partner repository, with the dataset identifiers PXD000549 and PXD001944.

# **FUTURE DIRECTIONS**

Although proteomic approaches have revealed a multitude of information on the physiology of GC, to formulate an effective vaccine, more information needs to be gathered about the way its surface proteins interact with host cells during an infection. Ideally, the proteome of clinical samples that have never been subcultured should be examined to determine which proteins are expressed during different stages of infection. This approach presents a significant technical challenge to overcome, as sufficient bacterial material must be collected for a quantitative proteomic analysis. In our investigation of the cell envelope and MV proteome, we collected material from 1 liter of culture (Zielke et al., 2014). Although other proteomic studies have used less material, including 5 milliliters of culture (Wu et al., 2010) or bacteria harvested from 20 plates (Anonsen et al., 2012; Perez-Cruz et al., 2015), collecting this amount of sample from a patient is not feasible; more sensitive MS analyzers or alternate methods of sample enrichment will be required. These studies will give insight into the proteomic adaptations the bacteria undergo to establish and maintain infection. Information collected from proteomic studies of clinical samples and host tissue culture can help further drive vaccine development and have the potential to aid in the discovery of stably expressed protein targets of antimicrobial agents with novel modes of action. Integration of multiple approaches, including public access to on-line raw data, is essential if there is to be a sense of participation across the biomedical research community.

<sup>4</sup>http://www.proteomexchange.org

An interesting analysis technique was recently pioneered by Altindis et al. (2015). Termed "protectome analysis," their technique searches for vaccine candidate proteins in proteomic datasets by identifying proteins with structural or functional features in common with proteins known to provide protection. This analysis tool could be used in combination with other proteomic studies to immediately identify proteins expressed during infection with the potential to provide protection against reinfection.

As more information is deposited into proteomic databases, the utility of proteomic approaches to identify GC in clinical samples will increase. For example, in cases where the molecular determinant of antimicrobial resistance is known, proteomic approaches have the potential to immediately recognize the protein modification(s) that result in antimicrobial resistance, a feat that is not possible with current NAAT identification techniques.

Finally, proteomic investigations of multidrug-resistant strains can reveal the mode of drug action, as well as the pathway(s) the bacterium uses to resist multiple antibiotics. One mystery that proteomic approaches may be able to solve is the identity of "Factor X," an unknown determinant of penicillin and cephalosporin resistance that is non-transformable and therefore difficult to study with typical genetic methods (Unemo and Shafer, 2014).

# **CONCLUSION**

Developing novel vaccines or antimicrobial agents is critical in the face of growing antibiotic resistance, and global and quantitative proteomic approaches have begun to reveal potential targets in the fight against GC. Proteomic approaches are ideal for the discovery of vaccine candidate proteins, as well as protein targets for the development of novel antimicrobial agents. Qualitative proteomic studies revealed the GC defense response to spectinomycin, while quantitative proteomics have demonstrated bacterial adaptations to conditions encountered in the host, including oxidative stress, anoxia, iron deprivation, and the presence of human serum. Proteomics have also been recently adapted to identify GC in clinical samples, which can expedite treatment. Importantly for vaccine development, stably expressed proteins have been identified through high-throughput examinations of cell envelopes and naturally released MVs. These proteomic studies will act as starting points for studies into structural vaccinology, protein–protein interactions, and GC physiology, and have already given new insights into ways to combat this important, difficult to treat pathogen.

# **ACKNOWLEDGMENTS**

Funding for this work was provided to AES by grant R01- AI117235 from the National Institute of Allergy and Infectious Diseases, National Institutes of Health.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Baarda and Sikora. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Proteomics As a Tool for Studying Bacterial Virulence and Antimicrobial Resistance

#### Francisco J. Pérez-Llarena and Germán Bou\*

Servicio de Microbiología-INIBIC, Complejo Hospitalario Universitario A Coruña, A Coruña, Spain

Proteomic studies have improved our understanding of the microbial world. The most recent advances in this field have helped us to explore aspects beyond genomics. For example, by studying proteins and their regulation, researchers now understand how some pathogenic bacteria have adapted to the lethal actions of antibiotics. Proteomics has also advanced our knowledge of mechanisms of bacterial virulence and some important aspects of how bacteria interact with human cells and, thus, of the pathogenesis of infectious diseases. This review article addresses these issues in some of the most important human pathogens. It also reports some applications of Matrix-Assisted Laser Desorption/Ionization-Time-Of-Flight (MALDI-TOF) mass spectrometry that may be important for the diagnosis of bacterial resistance in clinical laboratories in the future. The reported advances will enable new diagnostic and therapeutic strategies to be developed in the fight against some of the most lethal bacteria affecting humans.

#### Edited by:

Weiwen Zhang, Tianjin University, China

#### Reviewed by:

Blanca Barquera, Rensselaer Polytechnic Institute, USA

Biswapriya Biswavas Misra, University of Florida, USA

#### \*Correspondence:

Germán Bou german.bou.arevalo@sergas.es

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 23 October 2015 Accepted: 14 March 2016 Published: 31 March 2016

#### Citation:

Pérez-Llarena FJ and Bou G (2016) Proteomics As a Tool for Studying Bacterial Virulence and Antimicrobial Resistance. Front. Microbiol. 7:410. doi: 10.3389/fmicb.2016.00410

**Keywords:** resistance, diagnostic, antibiotic, virulence, proteomics, bacteria

# INTRODUCTION

Bacterial diseases continue to be a major cause of death throughout the world as a result of the emergence of new infectious agents, increased transmission due to human migration, and the development of antibiotic resistance (Beceiro et al., 2013). Novel antibiotics and therapies are urgently needed to control these infections, together with new rapid and reliable diagnostic techniques for characterizing resistant strains. Matrix-Assisted Laser Desorption/Ionization-Time-Of-Flight (MALDI-TOF) mass spectrometry techniques may be useful for this task. The present review considers some recent advances in the diagnosis of microbial resistance. Examination of protein profiles by proteomic analysis has become an essential tool for studying the basic mechanisms of bacterial resistance and virulence. This has led to a better understanding of the biology of pathogens that cannot be investigated by reductionist or even genomic studies (e.g., global post-translational protein modifications, subcellular protein location, and protein turnover rates, which are also considered in this review; Chao and Hansmeier, 2012).

**Abbreviations:** ATCC, American Type Culture Collection; c-di-GMP, cyclic diguanylate; DIGE, difference gel electrophoresis; ICAT, isotope-coded affinity tags; iTRAQ, isobaric tags for relative and absolute quantification; LC–MS, liquid chromatography–mass spectrometry; MIC, minimal inhibitory concentration; MRSA, methicillin-resistant Staphylococcus aureus; MS, mass spectrometry; MSSA, methicillin-sensitive Staphylococcus aureus; Omp, outer membrane proteins; SDS-PAGE, sodium dodecyl sulfate polyacrylamide gel electrophoresis; SILAC, stable isotope labeling by amino acids in cell culture; SRM, selected reaction monitoring; TMT, tandem mass tags.

Proteomic techniques are continuously being developed and a great diversity of methods and applications is now available (Van Oudenhove and Devreese, 2013; Otto et al., 2014). The improved sensitivity of mass spectrometers, together with upgraded sample preparation and protein fractionation technologies, has enabled a more complete study of proteomes. In the past, most quantitative proteomic studies were performed by 2D-PAGE, particularly after the incorporation of immobilized pH gradients and of fluorescent labeling in difference gel electrophoresis (DIGE). Gel-free methods then became more popular and metabolic labeling was included in quantitative analysis. In this labeling method, cells are grown under appropriate conditions in media supplied with either a light or heavy stable isotope of a nutrient, commonly a nitrogen source or an amino acid. The so-called stable isotope labeling by amino acids in cell culture (SILAC) strategy now prevails (Ong et al., 2002). As an alternative to metabolic labeling, differential analysis can be carried out by chemical labeling. Isotope-coded affinity tags (ICAT) covalently label cysteines in extracted proteins with either a light or heavy (deuterium-containing) form of the ICAT reagent (Gygi et al., 1999). Today, multiple isobaric tags prevail among chemical isotope labeling methods. Some products such as isobaric tags for relative and absolute quantification (iTRAQ) and tandem mass tags (TMT) are commercially available (Thompson et al., 2003; Ross et al., 2004). A similar strategy, known as isotope-coded protein label (ICPL), in which both N-termini and lysine side chains are labeled, has been developed for use at the protein level (Schmidt et al., 2005).

Recent progress in high-throughput, automated liquid chromatography–mass spectrometry (LC–MS) instruments, and in particular in novel algorithms for handling LC–MS data, is facilitating the application of quantitative proteomics using label-free strategies. Methods in which synthetic proteotypic peptides labeled with stable isotopes are used as internal standards are becoming particularly popular. These internal standard peptides are used to measure the absolute quantity of individual proteins of interest after digestion, by selected reaction monitoring (SRM)-MS measurements (Gerber et al., 2003). The various advantages and disadvantages of the different protocols are summarized in **Table 1** (Tiwari and Tiwari, 2014).
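The internal-standard principle described above can be illustrated with a minimal Python sketch. It assumes that a known amount of heavy-isotope-labeled peptide is spiked into the digest, that the light (endogenous) and heavy peptides respond identically in the mass spectrometer, and that each peptide is released stoichiometrically from its protein. The function names and peak areas are hypothetical and are not taken from any of the cited studies.

```python
from statistics import median


def peptide_amount_fmol(light_area, heavy_area, spiked_heavy_fmol):
    """Endogenous (light) peptide amount from the light/heavy peak-area ratio."""
    if heavy_area <= 0 or light_area < 0:
        raise ValueError("peak areas must be non-negative and the heavy area non-zero")
    return light_area / heavy_area * spiked_heavy_fmol


def protein_amount_fmol(peptide_measurements):
    """Median estimate across the proteotypic peptides measured for one protein."""
    return median(peptide_amount_fmol(*m) for m in peptide_measurements)


# Invented SRM peak areas (arbitrary units): (light, heavy, fmol of heavy standard).
measurements = [(2.0e6, 1.0e6, 50.0), (1.5e6, 1.0e6, 50.0), (3.0e6, 1.2e6, 50.0)]
estimate = protein_amount_fmol(measurements)
print(f"estimated protein amount: {estimate:.1f} fmol")
```

Taking the median over several peptides is one simple way of limiting the influence of a single poorly behaving peptide; other summaries could equally be used.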

The potential application of proteomics to bacterial pathogen research is huge. However, this review will only consider some of these applications and will specifically address bacteria (rather than fungi). The basic knowledge concerning resistance and virulence obtained recently through proteomic technology should be useful for developing new diagnostic and therapeutic applications for the treatment of infectious diseases.

# PROTEOMICS FOR STUDYING INTERACTIONS BETWEEN BACTERIA AND ANTIBIOTICS

## General Aspects

Antibiotic resistance has become a serious problem in the past two decades and an understanding of the mechanisms of the process is necessary to extend the life of current antibiotics and to enable discovery of novel targets (Lee C. R. et al., 2015). We will describe how antibiotic resistance in bacteria is studied by proteomic techniques. In proteomics, analysis of the total protein content of cells is usually carried out after sublethal contact between selected antibiotics and a pair of isogenic resistant and susceptible bacterial strains. In some cases, intermediate phenotypes are also studied. Resistant strains may be clinical strains or strains obtained in vitro (Park et al., 2016). The proteomic response is usually specific to each antibiotic, but proteins involved in energy and nitrogen metabolism, protein and nucleic acid synthesis, glucan biosynthesis, and stress response are often affected (Park et al., 2016). The findings of proteomic studies are commonly confirmed by genomic and/or transcriptomic analysis of the strains and in some cases also by study of the response of strains in which relevant genes are inactivated by gene replacement technology (Lima et al., 2013).

We will describe essential studies and novel findings as well as some proteomic studies involving the main bacterial antibiotic families.

# Antibiotics Targeting the Cell Wall

#### Beta-lactams

Resistance to beta-lactam antibiotics is one of the types of resistance most commonly studied by proteomic methods (Lima et al., 2013). The beta-lactam antibiotics (e.g., penicillins, cephalosporins, carbapenems, monobactams, and beta-lactamase inhibitors) may disturb the synthesis and/or stability of the cell envelope, thus disrupting cell-wall biogenesis and leading to loss of selective permeability and osmotic integrity, finally causing bacterial cell death (Waxman and Strominger, 1983). The main mechanism of resistance to beta-lactam antibiotics is the presence of antibiotic-hydrolyzing proteins, known as beta-lactamases (Pérez-Llarena and Bou, 2009). Other important mechanisms include imbalances in transport proteins such as efflux pumps and porins and alterations in the penicillin-binding protein targets (Poole, 2004). The increased use of antibiotics has generally led to the prevalence of some important resistant strains such as penicillin-resistant Streptococcus pneumoniae, methicillin-resistant Staphylococcus aureus, and extended-spectrum beta-lactamase (ESBL)- and carbapenemase-producing Enterobacteriaceae, Pseudomonas aeruginosa, and Acinetobacter baumannii (Boucher et al., 2009).

One of the earliest proteomic studies was an investigation of ampicillin resistant Pseudomonas aeruginosa, in which novel porins involved in resistance were discovered (Peng et al., 2005). Study of resistance to piperacillin/tazobactam in Escherichia coli has revealed reduced expression of porin OmpX and increased expression of TolC (Dos Santos et al., 2010). In the case of the penicillin-tolerant Gram-positive Streptococcus pyogenes, overexpression of murein metabolism proteins and general alteration of bacterial physiology were observed (Chaussee et al., 2006).

Alterations in cell physiology and overexpression of catalase and superoxide dismutase were detected in methicillin resistant S. aureus. The role of alanine dehydrogenase was indicated as being important in antibiotic resistance (Monteiro et al., 2012).

#### TABLE 1 | Different quantitative proteomic approaches with the associated advantages and disadvantages.


Obtained from Tiwari and Tiwari (2014) and reprinted with permission from the publisher.

Adaptation to oxacillin in S. aureus has recently been investigated (Solis et al., 2014). These authors concluded that proteins involved in capsule formation, peptidoglycan biosynthesis, and wall remodeling are regulated in response to antibiotics. Spectral counting-based label-free quantitative proteomics has been applied to study global responses in methicillin-resistant Staphylococcus aureus (MRSA) and methicillin-susceptible S. aureus (MSSA) treated with subinhibitory doses of oxacillin (Liu et al., 2014). Beta-lactamase and penicillin-binding protein 2a were uniquely upregulated in oxacillin-treated MRSA (**Table 2**). Analysis of the inner membrane fraction of carbapenem-resistant A. baumannii has shown an association with beta-lactamase AmpC and OXA-51 production as well as metabolic enzymes, elongation factor Tu, and ribosomal proteins (Tiwari et al., 2012; Tiwari and Tiwari, 2014).
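Spectral counting, as mentioned above, infers relative abundance from the number of MS/MS spectra assigned to each protein. The following Python sketch shows one widely used normalization, the normalized spectral abundance factor (NSAF), followed by a log2 fold change between two conditions. It is an illustration under stated assumptions only: the protein names, lengths, and counts are invented, and the normalization is not necessarily the one applied by Liu et al. (2014).

```python
import math


def nsaf(spectral_counts, lengths):
    """NSAF: spectral count divided by protein length, rescaled to sum to 1."""
    saf = {protein: spectral_counts[protein] / lengths[protein] for protein in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted")
    return {protein: value / total for protein, value in saf.items()}


def log2_fold_change(treated, control, pseudocount=1e-9):
    """log2(treated/control); the small pseudocount avoids division by zero."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Invented toy data: protein lengths (residues) and spectral counts per condition.
lengths = {"protein_A": 100, "protein_B": 400, "protein_C": 250}
control_counts = {"protein_A": 10, "protein_B": 40, "protein_C": 25}
treated_counts = {"protein_A": 40, "protein_B": 40, "protein_C": 25}

control = nsaf(control_counts, lengths)
treated = nsaf(treated_counts, lengths)
for protein in lengths:
    print(protein, round(log2_fold_change(treated[protein], control[protein]), 2))
```

Because NSAF values sum to one within a sample, a large increase in one protein lowers the values of the others even when their raw counts are unchanged, as happens for protein_B in this toy example. This is one reason why apparent downregulation in such data should be interpreted with care.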

#### Glycopeptides

The glycopeptide vancomycin acts by inhibiting peptidoglycan synthesis. It binds to the D-Ala-D-Ala terminus of the nascent peptidoglycan, thus blocking its correct synthesis. In Enterococcus spp., substitution of the terminal D-Ala residue of the peptidoglycan precursor by D-lactate or D-serine has been detected as the main mechanism of resistance to vancomycin. In S. aureus, a more complex scenario involving different enzymes and gene clusters implicated in vancomycin resistance has been proposed. Some resistant strains such as vancomycin-resistant Staphylococcus aureus (VRSA) and vancomycin-resistant enterococci (VRE) are of serious clinical concern (Lima et al., 2013).

In the first proteomic study of vancomycin resistant Enterococcus faecalis, Wang et al. (2010) examined a reference strain (V583) and a clinical isolate (V309) in the presence and absence of vancomycin. These authors found that proteins involved in vancomycin resistance functions, virulence factors, stress, metabolism, translation, and conjugation were regulated. Proteomic profiles of the vancomycin resistant E. faecium SU18 strain treated and not treated with vancomycin have recently been obtained (Ramos et al., 2015). Fourteen proteins were differentially expressed in SU18. Proteins involved in the vancomycin resistance mechanisms were upregulated in the presence of vancomycin, while metabolism-related proteins were downregulated, leading to compensatory effects. Differential expression of proteins has been observed in vancomycin resistant S. aureus, thus distinguishing vancomycin intermediate (VISA) type strain Mu50 and vancomycin resistant strains (Drummelsmith et al., 2007). More recently, the proteomic profile of a group of heterogeneous vancomycin-intermediate Staphylococcus aureus (hVISA) was compared with that of vancomycin susceptible S. aureus (Chen et al., 2013). The study initially detected five upregulated proteins in hVISA, although only one was definitely confirmed by real time quantitative reverse transcription PCR (qRT-PCR): the protein encoded by the isaA gene involved in cell wall biogenesis.

#### TABLE 2 | Pathway enrichment study by Database for Annotation, Visualization, and Integrated Discovery (DAVID) of the differentially expressed proteins in oxacillin-treated MRSA and MSSA compared with their untreated controls.

Obtained from Liu et al. (2014) and reprinted with permission from the publisher. (+) Upregulated; (–) Downregulated.
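Pathway enrichment of the kind summarized in Table 2 asks whether a pathway contains more differentially expressed proteins than would be expected by chance. The Python sketch below computes a plain one-sided hypergeometric tail probability for that question. It is a simplified illustration and not a reimplementation of DAVID, which reports its own, more conservative score; all numbers in the example are invented.

```python
from math import comb


def enrichment_p_value(hits_in_pathway, n_differential, pathway_size, proteome_size):
    """P(X >= hits_in_pathway) when n_differential proteins are drawn at random
    from a proteome in which pathway_size proteins belong to the pathway."""
    upper = min(n_differential, pathway_size)
    favourable = sum(
        comb(pathway_size, i) * comb(proteome_size - pathway_size, n_differential - i)
        for i in range(hits_in_pathway, upper + 1)
    )
    return favourable / comb(proteome_size, n_differential)


# Invented example: 2000 quantified proteins, 40 in the pathway of interest,
# 100 differentially expressed proteins, 8 of which fall in the pathway.
p = enrichment_p_value(8, 100, 40, 2000)
print(f"one-sided enrichment p-value: {p:.2e}")
```

In practice many pathways are tested at once, so the resulting p-values would normally also be corrected for multiple testing.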

# Antibiotics Targeting Protein Synthesis

#### Chloramphenicol

Chloramphenicol acts by binding to the 50S ribosomal subunit. Three mechanisms of resistance to chloramphenicol are known: reduced membrane permeability, mutation of the 50S ribosomal subunit, and production of chloramphenicol acetyltransferase (Civljak et al., 2014). Li et al. (2007) observed differential expression of 10 membrane proteins, including TolC, OmpC, OmpW, and OmpT, in chloramphenicol-resistant E. coli. A more recent study demonstrated overexpression of two different efflux pumps in chloramphenicol-resistant strains of Burkholderia thailandensis obtained in the laboratory (Biot et al., 2011).

#### Linezolid

In the case of linezolid, an oxazolidinone that binds to the 23S rRNA (ribosomal ribonucleic acid), different resistance mechanisms have emerged, including increased expression of ABC transporters, mutations in 23S rRNA, mutations in ribosomal proteins L3 and L4, and mutations in an RNA methyltransferase (Lee C. R. et al., 2015). Proteomic and transcriptomic screening of linezolid resistance revealed a possible increase in the metabolism and transport of carbohydrates in some linezolid-resistant S. pneumoniae mutants (Feng et al., 2011). Several glycolytic proteins were overexpressed in the resistant strains, along with other enzymes and transporters involved in sugar metabolism.

### Tetracyclines

The tetracycline family of antibiotics inhibits aminoacyl tRNA binding to the mRNA-ribosome complex. Cells become resistant to tetracycline via at least three mechanisms: enzymatic inactivation of tetracycline, efflux, and ribosomal protection (Falagas et al., 2015).

Yun et al. (2008) used 2-DE/MS-MS (two-dimensional gel electrophoresis with tandem mass spectrometry) and 1-DE/LC/MS-MS (one-dimensional gel electrophoresis with liquid chromatography and tandem mass spectrometry) techniques to study the surface proteome of A. baumannii DU202 (which is highly resistant to tetracycline) after treatment with subminimal inhibitory concentrations (subMIC) of tetracycline. These authors observed that OmpA38, CarO, OmpW, and other Omps were increasingly secreted on exposure to tetracycline. This indicates an important role for Omps in overcoming antibiotic-induced stress.

A parallel proteomic approach has been applied in the analysis of the sarcosine-insoluble outer membrane fraction of P. aeruginosa in the context of ampicillin, kanamycin, and tetracycline resistance. The effects on expression levels were variable: expression of OprG decreased, whereas expression of OprF, OprI, Omp, and MexA increased (Peng et al., 2005). In an interesting study, Lin et al. (2014) used isobaric tags for relative and absolute quantitation (iTRAQ) labeling and quantitative proteomics to characterize the differential proteome of E. coli K12 BW25113 in response to chlortetracycline stress. Crucial metabolic pathways such as the tricarboxylic acid cycle, pyruvate metabolism, and glycolysis/gluconeogenesis fluctuated greatly. The ribosomal protein complexes contributing to the translation process were generally elevated under conditions of chlortetracycline stress, which is known to be a compensatory mechanism caused by the action of chlortetracycline on the ribosome.

### Aminoglycosides

The aminoglycoside antibiotic family interrupts protein synthesis by binding to the 16S rRNA within the small (30S) subunit of the bacterial ribosome. Three mechanisms of aminoglycoside resistance are known: reduced uptake or decreased cell permeability, alterations at the ribosomal binding sites, and production of aminoglycoside-modifying enzymes (Jackson et al., 2013).

In an initial study, a subproteomic approach was used to characterize the outer membrane proteins of streptomycin-resistant E. coli. TolC, OmpT, and LamB were upregulated, and FadL, OmpW, and Dps (a protein of unknown location) were downregulated in the streptomycin-resistant E. coli strain (Li et al., 2008). Another study analyzed and compared the protein profiles of whole cell extracts from Mycobacterium tuberculosis clinical isolates susceptible and resistant to streptomycin. On comparing 2DE patterns, nine proteins that were consistently overexpressed in streptomycin-resistant isolates were identified. Moreover, in silico docking analysis revealed significant interactions between these proteins and streptomycin (Sharma et al., 2010). In a native/SDS-PAGE based proteomic study, low levels of NarG and NarH, two components of respiratory nitrate reductase (Nar), were observed in streptomycin, gentamicin, ceftazidime, tetracycline, and nalidixic acid-resistant E. coli strains (Ma et al., 2013).

Nabu et al. (2014) compared the protein expression profiles of a high-level spectinomycin-resistant (clinical isolate) and a susceptible (reference strain) Neisseria gonorrhoeae after treatment with subminimal inhibitory concentrations (subMICs) of spectinomycin. The 50S ribosomal protein L7/L12, an essential component for ribosomal translocation, was over-expressed in both strains, indicating that compensatory mechanisms may work in response to antibiotics that inhibit protein synthesis. Proteomics techniques have been used to establish the effects of gentamicin on the proteomes of aerobic and oxygen-limited E. coli (Al-Majdoub et al., 2013). Ribosomal proteins L1, L9, L10, and S2 were upregulated under both conditions, and the authors postulated that these are candidate drug targets for the development of synergistic combinations with gentamicin.

Kanamycin and amikacin resistant isolates of M. tuberculosis have been studied by proteomic analysis. Twelve proteins, two of which are of unknown function, were upregulated in both antibiotic-resistant isolates. All of the proteins were cellular proteins. Molecular docking studies indicated that kanamycin and amikacin interact with these proteins. Kumar et al. (2013) suggested that two of them were putative iron regulation/metabolism related proteins, thereby indicating a role for iron in conferring resistance to second-line drugs. In a recent proteomic and western blotting study of the E. coli K-12 outer membrane (OM) proteins involved in kanamycin resistance, Li H. et al. (2015) observed upregulation of some OM proteins such as TolC, Tsx, and OstA, and downregulation of the MipA, OmpA, FadL, and OmpW OM proteins in the kanamycin resistant E. coli K-12 strain. These authors concluded that MipA is a novel OM protein implicated in antibiotic resistance.

### Macrolides

Macrolide antibiotics act by binding reversibly to the 23S rRNA in the 50S subunit of the bacterial ribosome. The primary means of bacterial resistance to macrolides is post-transcriptional methylation of the 23S bacterial ribosomal RNA. Two other rarely observed types of acquired resistance include the production of drug-inactivating enzymes (esterases or kinases) and the production of active ATP (adenosine triphosphate)-dependent efflux proteins that transport the drug outside of the cell (Cornick and Bentley, 2012).

In an early study, Cash et al. (1999) examined the proteins synthesized by erythromycin-susceptible and erythromycin-resistant S. pneumoniae. These authors used peptide mass mapping to identify a 38,500 Dalton protein upregulated in resistant strains as glyceraldehyde-3-phosphate dehydrogenase (GAPDH). Considering this upregulation as a possible cause of erythromycin resistance, the authors proposed that it increases the energy produced for the efflux system.

In a comparative proteomic analysis of isolated sarcosine-insoluble outer membrane protein (OMP) fractions from clarithromycin-susceptible and clarithromycin-resistant Helicobacter pylori strains, Smiley et al. (2013) showed that iron-regulated membrane protein, UreaseB, EF-Tu, and a putative OMP were downregulated, whereas the HopT (BabB) transmembrane protein, HofC, and OMP31 were upregulated in clarithromycin-resistant H. pylori, revealing that alteration of the outer membrane protein profile may be a novel mechanism involved in clarithromycin resistance in H. pylori.

# Antibiotics Targeting DNA or RNA Synthesis

#### Metronidazole

Metronidazole is an antibiotic of the nitroimidazole class that inhibits nucleic acid synthesis by disrupting the DNA of microbial cells. Multiple mechanisms of resistance have been described in Bacteroides fragilis and H. pylori (Chong et al., 2014).

After analysis of the protein profiles of a derivative of H. pylori strain 26695, which is resistant to moderate levels of metronidazole, McAtee et al. (2001) proposed that the ability of the mutant strain to increase various isoforms of alkylhydroperoxide reductase during exposure to metronidazole is critically important in producing the resistant phenotype. In a metronidazole-resistant strain derived from B. fragilis ATCC 25285, the proteomic changes affected a wide range of metabolic proteins, including lactate dehydrogenase (upregulated) and flavodoxin (downregulated), which may be involved in electron transfer reactions. The enzymatic activity of the pyruvate-ferredoxin oxidoreductase (PorA) complex was also found to be impaired (Diniz et al., 2004).

Alterations in the metabolic pathway involving pyruvate-ferredoxin oxidoreductase were also found in a multidisciplinary analysis of a non-toxigenic Clostridium difficile strain with stable resistance to metronidazole (Moura et al., 2014).

Another recent proteomic analysis revealed regulation of DNA repair proteins, putative nitroreductases and the ferric uptake regulator (Fur) in a NAP1 C. difficile clinical isolate resistant to metronidazole (Chong et al., 2014).

### Rifampicin

Rifampicin inhibits bacterial DNA-dependent RNA synthesis by inhibiting bacterial DNA-dependent RNA polymerase. Resistance to rifampicin arises from mutations that alter residues of the rifampicin binding site on RNA polymerase, resulting in decreased affinity for rifampicin. Resistant mutations map to the rpoB gene, encoding RNA polymerase beta subunit (Goldstein, 2014).

Neri et al. (2010) found that 23 proteins were differentially expressed in two rifampicin resistant and one susceptible meningococcus. Increased expression of some of the proteins involved in the major metabolic pathways, mainly pyruvate catabolism and the tricarboxylic acid cycle, was observed. Decreased expression of proteins involved in gene regulation and in polypeptide folding was also observed. Sandalakis et al. (2012) analyzed rifampicin resistance in a rifampicin resistant strain of Brucella abortus 2308 developed in vitro. The resistant strain contained 39 differentially regulated proteins, most of which are involved in various metabolic pathways. The authors suggested that, apart from mutations in the rpoB gene, rifampicin resistance in Brucella mainly involves excitation of several metabolic processes and possible use of the already existing secretion mechanisms at a more efficient level.

#### Quinolones

The fluoroquinolones are the most commonly used family of quinolones in clinical settings. First and second generation fluoroquinolones selectively inhibit the topoisomerase II ligase domain. Third and fourth generation fluoroquinolones are more selective for the topoisomerase IV ligase domain, and thus have enhanced Gram-positive coverage. Three mechanisms of resistance to quinolones are known: efflux pumps; proteins that can bind to DNA gyrase, protecting it from the action of quinolones; and mutations at key sites in DNA gyrase or topoisomerase IV, which decrease their binding affinity and thus decrease the effectiveness of the drug (Redgrave et al., 2014). Two early proteomic studies considered the effects of fluoroquinolones on P. aeruginosa and Salmonella enterica. In the first of these, expression of a probable ATP-binding component of ATP binding cassette (ABC) transporter was observed in ciprofloxacin-intermediate and resistant strains, but not in the sensitive strain (Zhou et al., 2006). The authors of the second study, involving the S. enterica multiple antibiotic resistant strain, suggested that the increased expression of AcrAB/TolC was associated with resistance, while increases in e.g., F1F0-ATP synthase and Imp were a response to fluoroquinolone (Coldham et al., 2006). In a proteomic study of nalidixic acid (NA) resistance in E. coli, Lin et al. (2008) observed upregulation of TolC, OmpT, OmpC, and OmpW and downregulation of FadL in resistant strains. In a broader search for mechanisms at the protein level that confer resistance to fluoroquinolones, Vranakis et al. (2011) compared the proteomes of fluoroquinolone-susceptible Coxiella burnetii and fluoroquinolone-resistant samples of the bacterium (developed in vitro). The study revealed differential expression of 15 bacterial proteins involved in different cellular processes, suggesting that the mechanism of antibiotic resistance in the bacterium is a multifaceted process.

# Antibiotics Targeting Cell Membranes

#### Daptomycin

Daptomycin is a lipopeptide that interacts with phosphatidylglycerol in the bacterial membrane, leading to the formation of holes that leak intracellular ions. The specific genetic determinants of daptomycin resistance remain to be elucidated (Lee C. R. et al., 2015). Fischer et al. (2011) studied an isogenic daptomycin-susceptible and daptomycin-resistant pair of S. aureus strains (616 and 701). Comparative proteomics of 616 vs. 701 revealed different concentrations of proteins in various functional categories, including cell wall-associated targets and biofilm formation proteins.

#### Colistin

Colistin is an antimicrobial peptide that interacts with the bacterial outer membrane, by displacing bacterial counter ions in the lipopolysaccharide (LPS). Hydrophobic/hydrophilic regions interact with the cytoplasmic membrane just like a detergent, solubilizing the membrane. The most common mechanisms of resistance to colistin are modifications to LPS (Bialvaei and Samadi Kafil, 2015).

Fernández-Reyes et al. (2009) induced resistance in the susceptible A. baumannii ATCC 19606 by growing the strain under increasing colistin pressure. These researchers then carried out 2-D difference gel electrophoresis (DIGE) experiments and identified 35 proteins that were expressed differently in the two phenotypes. Most of these proteins were downregulated in the colistin-resistant strain. They include outer membrane (OM) proteins, chaperones, protein biosynthesis factors, and metabolic enzymes, indicating an important loss of biological fitness in the resistant phenotype. In a later study, Pournaras et al. (2014) examined two colistin-susceptible/colistin-resistant (Col(s)/Col(r)) pairs of A. baumannii strains that were collected sequentially after extended exposure to colistin and were assigned to international clone 2, which is prevalent worldwide. These researchers observed underexpression of the CsuA/B and CsuC proteins from the csu operon in Col(r) isolate Ab347, relative to its Col(s) counterpart Ab299.

The biofilm efficiency of the A. baumannii 19606 type strain depends on the formation of pili, cell-surface appendages assembled via the CsuAB-A-B-C-D-E chaperone–usher secretion system, according to Tomaras et al. (2008). Chaperone usher (CU) pili are linear polymers made of subunits capable of either self-polymerization or assembly with other subunits. The biogenesis of CU fibers needs a periplasmic chaperone and an outer membrane assembly platform named the usher. Among the three chaperone systems, the archaic systems are associated with bacteria that cause some of the most serious diseases in humans, animals, and plants. This pilus is formed from four subunits, namely CsuA/B, CsuA, CsuB, and CsuE, and is assembled using the CsuC-CsuD chaperone-usher secretion machinery (Pakharukova et al., 2015).

In addition, the Col(r) isolate Ab347 underexpressed aconitase B and some enzymes implicated in the oxidative stress response, such as KatE catalase, superoxide dismutase, and alkyl hydroperoxide reductase. This possibly suggests a limited response to reactive oxygen species (ROS) and, therefore, impaired colistin-mediated cell death by means of hydroxyl radical production (**Table 3**).

<sup>a</sup>Number of peptides for identification and quantification.

<sup>b</sup>Confidence score for identification by Mascot software.

<sup>c</sup>ANOVA, analysis of variance.

<sup>d</sup>Negative values indicate underexpression in Col<sup>r</sup> strain Ab347.

Obtained from Pournaras et al. (2014) and reprinted with permission from the publisher.

The low intracellular c-di-GMP level in dispersed cells of a P. aeruginosa strain coincided with increased expression of proteins required for the virulence and development of antimicrobial peptide resistance in P. aeruginosa (Chua et al., 2013), and P. aeruginosa cells with low c-di-GMP levels were consequently found to be more resistant to colistin than P. aeruginosa cells with high c-di-GMP levels.

Chopra et al. (2013) analyzed the proteomes of two strains of A. baumannii: the multidrug resistant strain BAA-1605 and the drug sensitive strain ATCC 17978. The analysis was performed by iTRAQ labeling and online 2D LC/MS/MS for peptide/protein identification. A significant number of proteins were overexpressed at least twofold in the multidrug resistant strain, including drug-, antibiotic-, and heavy metal-resistance proteins, porins, lipoproteins, stress-related proteins, membrane transporters, proteins important for acquisition of foreign DNA, cell-wall and exopolysaccharide-related proteins, biofilm-related proteins, metabolic proteins, and many with no annotated function. However, the porin OmpW, which is less abundant in carbapenem- and colistin-resistant A. baumannii strains, was overexpressed threefold in BAA-1605.

## Conclusions

Greater knowledge of the specific mechanisms involved in bacterial resistance is needed to improve the treatment of diseases caused by infectious bacteria and thus control the survival of recalcitrant populations. This interesting area of research is complicated by the fact that multiple, super-imposed and/or balancing resistance mechanisms coexist in the same bacterial species.

Most proteomic analyses of antibiotic resistance can be classified into two broad groups: comparison of resistant and susceptible bacteria, and bacterial responses to the presence of antibiotics (**Table 4**). In general, in the first type of study the least abundant proteins are related to secretion and metabolism, specifically OmpW, which is a known bacteriocin receptor. The most commonly overexpressed proteins are those involved in cell wall biogenesis, known resistance mechanisms, polysaccharide metabolism, and transport. The most frequently cited of these is TolC, an outer membrane channel that participates in several efflux systems. In the second case, analysis of the bacterial response to antibiotic challenge revealed that the proteins most commonly affected are chaperone proteins and proteins involved in stress response, amino acid metabolism, and energy metabolism. Some proteins involved in amino acid and energy metabolism are overexpressed while others are underexpressed, indicating both the problem of using broad functional classifications and the complexity of the antibiotic response (**Table 4**). Some of the most revealing analyses coupled both approaches to investigate the response of both susceptible and resistant strains to antibiotic exposure. Multivariable approaches, i.e., those combining several strains, growth conditions, concentrations and types of antibiotics, and time points, may contribute to a better understanding of bacterial pathways and systems relevant to resistance.

Some unique mechanisms or groups of proteins are expressed at low or indiscernible levels in the unchallenged resistant organisms, but are regulated in response to antibiotic exposure. These represent potentially useful targets in new therapies against resistance.

# ANALYSIS OF ANTIMICROBIAL RESISTANCE FOR DIAGNOSTIC PURPOSES

## General Aspects

The main purpose of a clinical microbiology laboratory is to provide reliable information as quickly as possible about the etiological agents responsible for infections and their sensitivity to antibiotics. The information obtained should enable selection of the most appropriate antimicrobial therapy to improve the care provided to the patient while at the same time contributing to better control of resistance and also to decreasing expenditure on antibiotics (DeMarco and Ford, 2013).

#### TABLE 4 | Representative proteomic techniques for studying antibiotic resistance.

Recent findings indicate the potential usefulness of mass spectrometry techniques, in particular MALDI-TOF MS, to identify specific mechanisms of resistance. These techniques are less expensive and produce results more quickly than the currently used methods. The applications of MALDI-TOF MS for diagnosis of antibiotic resistance have been discussed in two recent excellent reviews (Hrabák et al., 2013; Kostrzewa et al., 2013). These methods are at the initial stages of evaluation and some are only useful for centers of excellence and research laboratories.

Here, we summarize and classify the available methods, with some valuable examples and the latest contributions. MALDI-TOF MS applications in detection of antimicrobial resistance can be classified, depending on the type of target and the methodology, into the following approaches.

## Identification of the Entire Cell Profile

Identification of the entire cell profile involves establishing differences in the whole protein spectra profile of susceptible and resistant strains. This approach has been used to identify methicillin-resistant S. aureus (MRSA), with the first attempts being carried out by Edwards-Jones et al. (2000) at Manchester University. MRSA and MSSA strains were subsequently successfully distinguished using a protein Chip array (SELDI-TOF, Surface-enhanced laser desorption/ionization; Shah et al., 2011). Retrospective typing of an MRSA outbreak showed that it is possible to differentiate unrelated MSSA, MRSA and borderline resistant S. aureus (BORSA) strains.

MALDI-TOF MS has also been used to detect vancomycin-resistant enterococci. The method was used to detect vanB-positive Enterococcus faecium with high sensitivity and specificity (Griffin et al., 2012), and vanA-positive E. faecium has recently been detected (Nakano et al., 2014). Similar findings have recently been reported for an important Gram-negative anaerobic pathogen, B. fragilis. Its two groups (divisions I and II), of which division II harbors the cfiA gene encoding a potent metallo-beta-lactamase, have been differentiated by a specific peak in their MALDI-TOF spectral profiles (Nagy et al., 2011; Wybo et al., 2011).

MALDI-TOF MS techniques have also been used by different researchers to differentiate strains of Enterobacteriaceae and P. aeruginosa carrying beta-lactamase genes. Dubska et al. (2011) used SELDI-TOF MS, but were unable to clearly differentiate different beta-lactamases. However, Camara and Hays (2007) detected a peak corresponding to a beta-lactamase in E. coli strains. Overall, the results suggest that the routine detection of beta-lactamase-producing strains is not yet possible in clinical laboratories.

## Identification of an Antibiotic and Product of Hydrolysis

This promising approach has been reported for the clinically important carbapenemases and extended-spectrum beta-lactamases. The mass shift that results from the addition of a water molecule to the beta-lactam molecule during enzymatic hydrolysis can be detected by MALDI-TOF MS. Beta-lactamase-negative strains do not modify the molecular weight of the beta-lactam (Hrabák et al., 2013; Kostrzewa et al., 2013). Moreover, the procedure enables quantitative analysis that is useful for direct comparison with MIC values, and it also provides rapid results. Furthermore, the method can be improved by using beta-lactamase inhibitors to identify specific types of beta-lactamase, and it is applicable to positive blood cultures. The main limitation is interference from other resistance mechanisms such as porins and efflux pumps when intact cells are used; however, this can be overcome by using different lysis methods. Analysis of the raw spectra can also be difficult for inexperienced microbiologists.
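The mass shift described above can be made concrete with a short calculation. The following is a minimal sketch, assuming singly protonated ions and using meropenem (C17H25N3O5S) purely as an example substrate; the function names, the 0.2 Da tolerance, and the observed peak lists are illustrative assumptions, not part of any published protocol.

```python
# Monoisotopic masses (Da) of the elements needed for this example.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}
PROTON = 1.00727646  # mass of a proton, for [M+H]+ ions

def mono_mass(composition):
    """Monoisotopic mass of a molecule given {element: count}."""
    return sum(MONO[el] * n for el, n in composition.items())

WATER = mono_mass({"H": 2, "O": 1})  # about 18.0106 Da

def expected_peaks(composition):
    """Expected [M+H]+ m/z of the intact and hydrolyzed (M + H2O) forms."""
    m = mono_mass(composition)
    return {"intact": m + PROTON, "hydrolyzed": m + WATER + PROTON}

def classify(observed_mz, composition, tol=0.2):
    """Return which expected forms appear in a list of observed m/z values."""
    peaks = expected_peaks(composition)
    return {name for name, mz in peaks.items()
            if any(abs(obs - mz) <= tol for obs in observed_mz)}

meropenem = {"C": 17, "H": 25, "N": 3, "O": 5, "S": 1}

# Hypothetical peak lists: only the intact drug, then both forms.
no_hydrolysis = classify([384.16], meropenem)
with_hydrolysis = classify([384.16, 402.17], meropenem)
```

Real spectra may also contain salt adducts and further degradation products, which this sketch does not model.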

The first two studies involving direct carbapenemase detection were both published in 2011 (Burckhardt and Zimmermann, 2011; Hrabák et al., 2011). The prevailing approach consists of incubating a pellet from a bacterial culture, usually grown overnight, with the beta-lactam antibiotic. After incubation for 1–3 h at 35°C, the supernatant is analyzed by MALDI-TOF MS (**Figure 1**). The addition of NH<sub>4</sub>HCO<sub>3</sub> increased the sensitivity of detection of OXA-48 (Papagiannitsis et al., 2015; Studentova et al., 2015). The method has also been extended to OXA-51 and SIM-1 carbapenemases (Lee et al., 2013). MALDI-TOF MS can be used to detect OXA- and GES-carbapenemases (Chong et al., 2015). The method has been used to detect carbapenem resistance associated with the cfiA-encoded metallo-beta-lactamase in blood samples (Johansson et al., 2014). The capacity of these techniques for detecting carbapenemases has recently been enhanced by different research groups (Álvarez-Buylla et al., 2013; Peaper et al., 2013; Wang et al., 2013). The method has been shown to be useful in detecting ESBL-producing Enterobacteriaceae from positive blood cultures, with 99% sensitivity within a maximum of 150 min, using the antibiotics cefotaxime and ceftazidime (Oviaño et al., 2014).

Some MALDI-TOF MS-based methods can also be used to detect modifications that occur in resistance to aminoglycosides, such as inactivation by acetyltransferases (Kostrzewa et al., 2013), although with poor reproducibility, and the activity of rRNA methyltransferases (Hrabák et al., 2013). Overall, these data suggest that MALDI-TOF MS can be used to detect aminoglycoside resistance, although these methods are generally limited to research laboratories.

## Detection of Resistance Proteins within the Cell

In this case, MALDI-TOF MS can be used to help detect some microbial biomarkers (essentially proteins or their fragments, obtained after trypsin digestion) that confer resistance to the pathogen.

For example, methicillin-resistant S. aureus positive for agr (accessory gene regulator) and harboring the class A mec complex was identified by detection of the small peptide called PSM-mec in whole cells (Josten et al., 2014).

Detection of antibiotic resistance markers, such as peptide fragments, in clinical strains of E. coli associated with some beta-lactamases (CTX-M1, CMY-2, VIM, and TEM) together with kanR and an aminoglycoside-modifying enzyme was achieved by periplasm extraction followed by nano-LC separation (Hart et al., 2015). Capillary electrophoresis-mass spectrometry is useful for detecting OXA-48 and KPC carbapenemases in multidrug-resistant Gram-negative bacteria (Fleurbaaij et al., 2014; **Figure 2**), and detection of beta-lactamase proteins has been improved by the use of MALDI-TOF MS methods rather than PCR detection of the corresponding genes (Trip et al., 2015).

## Cell Wall Analysis

The cell wall is the target of most antibiotics and is a barrier to other antibiotics that act in the cytosol. Some components of the outer membrane of Gram-negative bacteria, such as porins, efflux pumps, and lipopolysaccharides, have been quantified by MALDI-TOF MS techniques to distinguish between resistant and sensitive strains (Peng et al., 2005; Imperi et al., 2009). For example, a MALDI-TOF MS-based method proved better than SDS-PAGE for characterizing the OmpK36 porin in K. pneumoniae (Cai et al., 2012). Changes in the structure of the lipopolysaccharide lipid A that occur during emergence of resistance to colistin can be detected by MALDI-TOF MS in A. baumannii (Beceiro et al., 2011).

## Discovery of Mutations within Resistance Genes through Mini-Sequencing

MALDI-TOF MS methods have been used for DNA sequence analysis (Pusch et al., 2002). The technique is based on a primer extension assay covering only a few bases, and the resulting molecular weight shift is then detected by MALDI-TOF MS. The technique is limited to DNA molecules smaller than 40 bp (**Figure 3**).
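The principle of reading a base from a primer-extension mass shift can be sketched as follows. This is a simplified illustration that assumes a single-base extension with dideoxynucleotide terminators and uses approximate average residue masses; the primer mass, the tolerance, and the function name `call_base` are hypothetical choices made for this example.

```python
# Approximate average mass (Da) added to a primer when one
# dideoxynucleotide monophosphate residue is incorporated.
DDNMP_MASS = {"A": 297.21, "C": 273.19, "G": 313.21, "T": 288.20}

def call_base(primer_mass, product_mass, tol=1.0):
    """Infer the incorporated base from the observed mass shift.

    Returns the base whose residue mass is closest to the shift,
    or None if no base matches within `tol` Da.
    """
    shift = product_mass - primer_mass
    best = min(DDNMP_MASS, key=lambda base: abs(DDNMP_MASS[base] - shift))
    if abs(DDNMP_MASS[best] - shift) > tol:
        return None
    return best

# Hypothetical primer of 6000.0 Da and three hypothetical product peaks.
wild_type_call = call_base(6000.0, 6313.2)
mutant_call = call_base(6000.0, 6288.3)
no_call = call_base(6000.0, 6350.0)
```

The smallest difference between two terminators in this table is about 9 Da (A versus T), which is why a tolerance of 1 Da is workable in this sketch.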

Although it has been used to detect mutations for a large and diverse number of resistance mechanisms (Hrabák et al., 2013), this approach is time-consuming and does not confer any advantages over standard sequencing protocols.

PCR/electrospray ionization mass spectrometry (PCR/ESI MS), which involves mini-sequencing of larger PCR products (110–450 bp), has been successfully used to identify pathogens and has also proved useful for detecting some blaKPC beta-lactamase-producing Enterobacteria (Endimiani et al., 2010). This technique has been commercialized via development of a fully automated system (PLEX-ID, Abbott Biosciences).

## Stable Isotope Labeling and Monitoring of Cell Growth

The technique has been used for monitoring resistance to fluconazole (Marinach et al., 2009) and caspofungin (De Carolis et al., 2012) in fungi. Cells harvested from solid cultures were incubated with different concentrations of the drug for 24 h and the total peak profile was determined; however, no improvement in time-to-result was achieved, despite the good correlation between MIC values and the different spectra. A simplified version of this approach, which facilitates discrimination of susceptible and resistant isolates of Candida albicans after incubation for 3 h in the presence of "breakpoint" level drug concentrations of caspofungin, has been described (Vella et al., 2013).

FIGURE 3 | Simplified procedure of minisequencing linked to MALDI-TOF MS detection. Obtained from Hrabák et al. (2013) and reprinted with permission from the publisher.

Some studies have attempted to detect bacterial resistance by incorporating stable isotope-labeled nutrients during growth in the presence of antibiotics (Demirev et al., 2013). The peak profiles were compared with those of intact microorganisms grown in unlabeled media without the drug. Some biomarkers were detected by characteristic mass shifts provided by a suitable isotope. A similar approach has recently been used with <sup>13</sup>C<sub>6</sub><sup>15</sup>N<sub>2</sub>-L-lysine as the isotope label for detection of methicillin-resistant S. aureus (Sparbier et al., 2013). Stable isotope labeling by amino acids in cell culture (SILAC) technology with the same isotope label has been used to distinguish strains of P. aeruginosa that are resistant or susceptible to meropenem, tobramycin, and ciprofloxacin (Jung et al., 2014).
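The size of the characteristic mass shift follows directly from the label composition. As a minimal sketch, assuming full incorporation of lysine carrying six <sup>13</sup>C and two <sup>15</sup>N atoms, the expected shift of a protein or peptide peak scales with its number of lysine residues; the sequence and unlabeled mass below are hypothetical.

```python
# Mass differences (Da) between heavy and light isotopes.
C13_MINUS_C12 = 13.00335484 - 12.0
N15_MINUS_N14 = 15.00010890 - 14.00307401

# Shift per lysine labeled with six 13C and two 15N atoms (about 8.014 Da).
HEAVY_LYS_SHIFT = 6 * C13_MINUS_C12 + 2 * N15_MINUS_N14

def labeled_mass(sequence, unlabeled_mass):
    """Expected mass after full incorporation of heavy lysine.

    `sequence` is a one-letter amino acid string; every 'K' adds one shift.
    """
    return unlabeled_mass + sequence.count("K") * HEAVY_LYS_SHIFT

# Hypothetical biomarker with three lysines and an unlabeled mass of 1000.0 Da.
shifted = labeled_mass("MKTAYKK", 1000.0)
```

A cell that keeps growing in the presence of the antibiotic would be expected to incorporate the label and show the shifted peak, whereas a growth-inhibited cell would not; partial incorporation is not modeled here.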

## Conclusions

Although MALDI-TOF MS has been successfully applied in clinical microbiology to identify bacteria, detection of antibiotic resistance, which is essential in bacterial diagnostics, is still at an early stage of development. Some promising achievements are the detection of proteins that hydrolyze antibiotics by means of direct detection of antibiotic modifications. The most remarkable example is the detection of beta-lactamases, particularly carbapenemases, by a method that has been validated and is routinely used in clinical and reference laboratories. However, detection of resistance mechanism elements (particularly Qnr proteins and mutant PBPs and beta-lactamases) for improved diagnosis of multiresistant bacteria must be improved and validated, despite some reports about the detection of vancomycin-resistant Enterococcus spp. and discrimination between MRSA and MSSA strains.

In summary, several of the cited methods require further validation, simplification, and automation. MALDI-TOF MS analysis may prove to be too labor intensive for routine use in clinical laboratories. Other challenges include better study of the relationship between proteomic results and MIC values for use in prescribing appropriate treatment and monitoring infection. Methods of testing multiple antibiotics are urgently required in order to provide a complete picture of antibiotic resistance. Standard susceptibility testing will probably coexist with new proteomic methods, as resistance to antibiotics is a complex process.

# PROTEOMICS FOR STUDYING BACTERIAL VIRULENCE

## General Aspects

Proteomic techniques are becoming important tools for investigating microbial pathogenesis. Applications include identification of virulence factors and study of the response of both host and pathogen to infection.

Research was initially limited to the classical proteomic analysis of protein contents of bacterial cultures, usually involving comparison of avirulent and virulent strains or conditions that simulate stresses that bacteria encounter during the infection process, such as acid stress, low oxygen content, high osmolarity, and other conditions of temperature, presence or limited availability of iron and presence of urine or plasma. Co-cultivation with host cell cultures and the direct isolation of bacteria from samples were not achieved until a later stage, because of the technical challenge that arose in measuring bacterial proteins against the overwhelming background of host proteins. Different approaches have been used to solve these problems, usually involving different separation techniques such as differential centrifugation, cell sorting, and immunomagnetic separation. Some review articles have already considered these early studies (Bhavsar et al., 2010; Cash, 2011; Yang et al., 2015), and we therefore focus on recent studies that we consider important in the study of bacterial pathogenesis by proteomics.

## Bacterial Pathogenesis

Wang et al. (2014) recently compared the proteome profiles of S. enterica subsp. enterica serovar Typhimurium and S. typhi, which are responsible for gastroenteritis and typhoid fever, respectively. These researchers first detected a group of proteins with serovar-specific expression, which can be used as new biomarkers for identifying clinical serotypes. They also found that expression of flagella and chemotaxis proteins was lower in S. typhi than in S. typhimurium. Finally, the expression of core genes involved in metabolism and transport of carbohydrates and amino acids differed between the two serovars.

Proteomics, microbial genetics, competitive infections, and computational approaches have previously been used to obtain a comprehensive overview of Salmonella nutrition and growth in a mouse typhoid fever model (Steeb et al., 2013). In a study of the virulence of H. pylori, Vitoriano et al. (2011) compared the proteomes of strains of the bacterium isolated from children with peptic ulcer disease and from children with non-ulcer dyspepsia. In addition to the presence of genes clearly encoding virulence factors, the pediatric ulcerogenic strains presented a proteome profile defined by changes in motility-associated proteins (involved in higher motility), antioxidant proteins (involved in better resistance to inflammation), and proteins implicated in key functions in the metabolism of glucose, amino acids, and urea (probably advantageous for confronting changes in nutrient availability).

More recently, Ansong et al. (2013a) performed a temporal multi-omic analysis, at physiologically relevant temperatures, of Yersinia pestis (YP), the causative agent of plague with a high mortality rate, and Yersinia pseudotuberculosis (YPT), an enteric pathogen with a modest mortality rate. Gene and protein expression levels of conserved major virulence factors were higher in YP than in YPT, including the Yop virulon and the pH6 antigen. The global transcriptome and proteome responses of YP and YPT revealed conserved post-transcriptional control of metabolism and the translational machinery including the modulation of glutamate levels in Yersiniae.

Madeira et al. (2013) compared the proteomic profiles of three clonal isolates of Burkholderia cenocepacia obtained from a cystic fibrosis patient between onset of infection and before death from cepacia syndrome 3.5 years later. They found that 52 proteins were similarly altered in both late-stage isolates, relative to the first isolate, which suggests an important role for metabolic reprogramming in the virulence potential and persistence of B. cenocepacia, particularly in regard to bacterial adaptation to microaerophilic conditions. The content of the virulence determinant AidA was also higher in the two late-stage isolates.

In an original approach, Provenzano et al. (2013) analyzed the metaproteome of microbial communities from endodontic infections associated with acute apical abscesses and asymptomatic apical periodontal lesions. They found several proteins related to pathogenicity and resistance/survival in endodontic samples, including proteins involved in adhesion, biofilm formation, and antibiotic resistance, stress proteins, exotoxins, invasins, proteases, and endopeptidases (mostly in abscesses), and an archaeal protein linked to methane production. Most of the human proteins detected were involved in cellular processes and metabolism, as well as immune defense.

In a study involving quantitative proteomic analysis, Liu et al. (2012) showed that on entry into host cells, Campylobacter jejuni undergoes a significant metabolic downshift. They also observed reprogramming of respiration in intracellular C. jejuni, favoring respiration of fumarate.

Pieper et al. (2013) performed global proteomic analysis of Shigella flexneri strain 2457T in association with three distinct growth environments: in broth (in vitro), in epithelial cell cytoplasm (intracellular), and in coculture with epithelial cells (extracellular). The intracellular bacteria showed elevated expression of invasion and cell-to-cell spread determinants, including Ipa, Mxi, and Ics proteins, compared with in vitro and extracellular bacteria. The intracellular environment was also characterized by changes in iron stress and carbon metabolism protein expression levels, which may indicate an important role for these processes in the transition of S. flexneri from the extracellular to the intracellular milieu.

Another elegant study identified the pathogenic mechanisms of Chlamydia trachomatis after entry into host cells and the formation of a membrane-bound compartment (the inclusion) accompanied by secretion of inclusion membrane proteins (Incs). Mirrashidi et al. (2015) used affinity purification-mass spectrometry (AP-MS) to detect Inc-human interactions for 38/58 Incs involved in host processes of intracellular life cycles, including retromer components such as sorting nexins. Overlap between Inc targets and those of viral proteins was detected, suggesting common pathogenic mechanisms among obligate intracellular microbes.

During a very recent study with the infectious agent A. baumannii, we obtained an ex vivo protein expression profile for pneumonia (Méndez et al., 2015). We characterized the proteome of A. baumannii in the presence of bronchoalveolar lavage fluid from infected rats (to simulate conditions in the respiratory tract) and in the presence of RAW 264.7 cells (control conditions). We observed alterations in cell wall synthesis and identified two upregulated virulence-associated proteins with >15 peptides/protein in both ex vivo models (OmpA and YjjK), which suggests that these proteins are fundamental for pathogenesis and virulence in the airways (**Figure 4A**).

FIGURE 4 | (B) Proteins from the Outer Membrane Vesicle subproteome and (C) Freely soluble extracellular proteins. The total numbers of proteins within the respective group are shown in brackets. Obtained and adapted from Mendez et al. (2012) and reprinted with permission from the publisher.

## Studies of Microbial Biofilm

At least 65% of all human infectious diseases are known to be associated with the biofilm form of microorganisms. The most notable and clinically relevant property of biofilms is the greater resistance to antimicrobials than shown by their planktonic counterparts (Seneviratne et al., 2012). Some proteomic studies have revealed an association between resistance mechanisms and biofilm state, the subject of a recent review article (Seneviratne et al., 2012). Our research group has been able to identify a putative resistance-nodulation-cell division type efflux pump (RND pump), which increased in A. baumannii biofilms. Overexpression of this type of efflux pump in A. baumannii confers resistance to aminoglycosides, as well as to other drugs, including fluoroquinolones, tetracyclines, chloramphenicol, erythromycin, trimethoprim, and ethidium bromide (Cabral et al., 2011). In another study recently carried out in our laboratory, A. baumannii clinical strain AbH12O-A2 was able to survive long periods of desiccation as a result of the presence of cells in a dormant state, via mechanisms affecting control of cell cycling, DNA coiling, transcriptional and translational regulation, protein stabilization, antimicrobial resistance, and toxin synthesis; a few surviving cells embedded in a biofilm matrix were able to resume growth and restore the original population in appropriate environmental conditions (Gayoso et al., 2014).

Relationships between biofilm state and virulence factors have also been discovered recently. High-resolution 2-dimensional gel electrophoresis was used to obtain the proteome profiles of spiral H. pylori and early biofilm (Shao et al., 2013). Differential protein spots were identified and associated with flagellar movement, bacterial virulence, signal transduction, and regulation.

MALDI-TOF MS/MS analysis of the secretome of single and mixed biofilms of P. aeruginosa and C. albicans carried out at different times (Purschke et al., 2012) identified 16 proteins overexpressed or exclusively expressed in P. aeruginosa when interacting with C. albicans including virulence factors such as exotoxin A and iron acquisition systems.

Another strategy was used to study the biofilm expression profile in a clinical strain of S. pneumoniae serotype 14. A decrease in the expression of enzymes related to the glycolytic pathway, as well as proteins involved in translation, transcription, and virulence was detected during biofilm growth by the proteomics method (Allan et al., 2014).

## Outer Membrane Vesicles and their Impact on Virulence

Outer membrane vesicles (OMVs) are nanoscale structures secreted by bacteria that can carry nucleic acids, proteins, and small metabolites. They mediate intercellular communication and play a role in virulence. One of the early proteomic studies of the protein fraction of OMVs was carried out in our laboratory. This study focused on two main protein fractions of the extracellular proteome: proteins exported by OMVs and freely soluble extracellular proteins (FSEPs) present in the culture medium of a highly invasive, multidrug-resistant strain of A. baumannii (clone AbH12O-A2; Mendez et al., 2012). Of the OMV proteins, 39 were associated with pathogenesis and virulence, including proteins associated with attachment to host cells (e.g., CsuE, CsuB, CsuA/B) and specialized secretion systems for delivery of virulence factors (e.g., P-pilus assembly and FilF), whereas the FSEP fraction possessed extracellular enzymes with degradative activity, such as alkaline metalloprotease. Among the FSEPs, we also detected at least 18 proteins with a known role in the oxidative stress response (e.g., catalase, thioredoxin, oxidoreductase, superoxide dismutase; **Figures 4B,C**). These findings are supported by those of a recent study of other clinical A. baumannii strains, in which it was discovered that A. baumannii strain A38 possesses an outer membrane vesicle containing virulence factors including Omp38, EpsA, Ptk, GroEL, hemagglutinin-like protein, and FilF (Li Z. T. et al., 2015). The high content of virulence factors in outer membrane vesicles has been demonstrated in other Gram-negative bacteria such as clinical strains of E. coli and Stenotrophomonas maltophilia (Devos et al., 2015; Wurpel et al., 2015). Indeed, this phenomenon is also conserved in some Gram-positive pathogenic strains such as M. tuberculosis and S. pneumoniae (Olaya-Abril et al., 2014; Lee J. et al., 2015).

However, as pointed out by several authors (Koning et al., 2013; Pérez-Cruz et al., 2013), OMVs may be composed of inner membrane fractions and can thus be considered outer-inner membrane vesicles. These structures may thus contain cytosolic fractions. OMV-related virulence can be considered in the context of this novel concept.

## Post-Translational Protein Modifications and their Impact on Virulence

Bacteria contain different types of post-translational modifications, many of which are commonly present in eukaryotes: phosphorylation, acetylation, glycosylation, ubiquitination, and glutathionylation (Soufi et al., 2012). Relationships between these types of post-translational modifications and virulence have been established in recent years. One of the first studies with Pseudomonas species revealed that phosphoproteins involved in motility, transport and pathogenicity pathways in P. aeruginosa are critical for survival and virulence (Ravichandran et al., 2009).

Around the same time, the phosphoproteome of K. pneumoniae NTUH-K2044 was analyzed by a shotgun approach, and 117 unique phosphopeptides were identified along with 93 in vivo phosphorylated sites corresponding to 81 proteins (Lin et al., 2009). Interestingly, three of these were found to be distributed in proteins of the cps locus and the authors of the study speculated that they were involved in the converging signal transduction of capsule biosynthesis, which has been related to virulence. Involvement of phosphorylated proteins in virulence was also hypothesized after proteomic analysis of S. pneumoniae and Mycoplasma pneumoniae (Schmidl et al., 2010; Sun et al., 2010).

Interestingly, qualitative comparison between the Ser/Thr/Tyr phosphoproteomes of two A. baumannii strains (a reference strain and a highly invasive multidrug-resistant clinical isolate) led to the identification of phosphoproteins with a role in pathogenicity and also involved in drug resistance (Soares et al., 2014).

Other types of post-translational modifications have recently been discovered by proteomic methods. In a well-designed study, Iwashkiw et al. (2012) showed that A. baumannii ATCC 17978 possesses an O-glycosylation system responsible for the glycosylation of multiple proteins. These authors identified seven A. baumannii glycoproteins, of as yet unknown function, by 2D-DIGE and mass spectrometry. A glycosylation-deficient strain was generated by homologous recombination. This strain did not show any growth defects, but exhibited a severely diminished capacity to generate biofilms. Disruption of the glycosylation machinery also resulted in reduced virulence in two infection models (Dictyostelium discoideum and Galleria mellonella) and reduced in vivo fitness in a mouse model of peritoneal sepsis. In a study involving proteomic and glycoproteomic analysis, Scott et al. (2014) found higher N-linked glycosylation in the Campylobacter jejuni NCTC11168 O clinical strain than in the laboratory-derived strain.

Another post-translational modification was described by Xie et al. (2015), who profiled proteome-wide lysine acetylation in M. tuberculosis by using a combination of anti-acetyl-lysine antibody-based immunoaffinity enrichment and high-resolution mass spectrometry. These authors identified 1128 acetylation sites on 658 acetylated M. tuberculosis proteins, several of which, e.g., isocitrate lyase, were involved in persistence, virulence, and antibiotic resistance. In an interesting study, Ansong et al. (2013b) identified different uses for the protein S-thiolation forms S-glutathionylation and S-cysteinylation in response to infection-like conditions and basal conditions in S. typhimurium; these findings were supported by analysis of protein structure and gene deletion.

## Conclusions

The use of proteomic analysis to study the interactions between bacterial pathogenesis and host is at a very early stage. Nevertheless, we have described the use of proteomic techniques to investigate many aspects of bacterial pathogenesis: assignment of a virulence factor reservoir of a single pathogen, analysis of the interaction between host and pathogen response during the infection, identification of interactions between several virulence factors and host cell components and even the identification of biochemical action of virulence factors (**Table 5**).

The limit of sensitivity has prevented a more detailed proteomic study of in vivo models of infection, particularly in humans, in comparison with tissue culture models. Although current methods of analysis are proving very useful, they must be improved further in order to advance our knowledge in this field. A multidisciplinary approach including methods such as immunoassays, histological methods, and electron microscopy should be undertaken for an integrated understanding of bacterial pathogenesis. Monitoring of the temporal and spatial action of virulence factors may also contribute to advances in this field.

The relationships between biofilm formation, outer membrane vesicles, and virulence have been highlighted during this review.

Identification of variations in post-translational changes in the bacteria and/or in the host proteomes after infection is also important for studying interactions between signaling cascades.

A detailed global picture of the interaction between host and pathogen may provide opportunities for identifying new targets for antimicrobial programs and vaccine development.

# PRACTICAL APPLICATIONS

We have reported the huge progress in diagnostics of microbial resistance and basic research in resistance and virulence achieved in recent years. Many of these advances have been transferred to protocols for practical diagnosis and will potentially lead to the development of new therapies against pathogenic bacteria. Examples include the discovery of 34 unknown proteins of similar abundance found in both cell envelope and membrane vesicle fractions in four N. gonorrhoeae strains (Zielke et al., 2014). Depletion of one of these, a homolog of the outer membrane protein LptD, was found to cause loss of viability in N. gonorrhoeae. These authors also identified another six predicted outer membrane proteins of unknown function. Loss of NGO1985, in particular, resulted in dramatically decreased viability of N. gonorrhoeae on treatment with detergents, polymyxin B, and chloramphenicol, suggesting that this protein functions in maintaining the cell envelope barrier to permeability and may represent a new therapeutic target (Zielke et al., 2014).

Proteomic studies involving metabolism of virulent or resistant strains have been a source of potential therapeutic targets. For example, in a study investigating determinants of the biofilm condition of A. baumannii, Cabral et al. (2011) found that histidine metabolism (e.g., urocanase) was implicated in biofilm formation, as confirmed by gene disruption experiments. The authors proposed a model in which novel proteins are suggested for the first time as targets for preventing the formation of A. baumannii biofilms.

Vaccination has been revealed as an important strategy for reducing the incidence of infectious diseases. Immunoproteomics permits the characterization of putative vaccine candidates by combining proteomic analysis during infection together with serum identification of immunoreactive antigens. A recent review described examples of vaccine candidates that have been identified by immunoproteomics and have successfully protected animals against challenge when tested in immunization studies (Dennehy and McClean, 2012).

An exopolysaccharide and a protein-based biofilm expressed by two clinical strains of S. aureus have recently been studied by proteomic methods. Moreover, the number of bacterial cells inside a biofilm and in the surrounding tissue decreased after immunization with the biofilm matrix exoproteome in an in vivo model with a biofilm linked to infection (Gil et al., 2014).

# CONCLUDING REMARKS

The method used for MS-based identification of proteins is constantly improving and has allowed progress from qualitative determination of proteins (i.e., presence or absence) toward quantitative studies of the proteome. However, the method still has many limitations. The inability to detect many proteins present at low concentrations, together with the small number of proteins in a dataset, may lead to errors in interpretation (e.g., associating complete biochemical pathways with virulence or resistance mechanisms). Comprehensive interpretation is also very difficult when the genome has not yet been obtained. These problems could be addressed by using the highest resolution bottom-up approach, by appropriate use of the knowledge base to improve the design of new studies, and by the inclusion of specific statistical strategies to identify significant proteins. Next-generation sequencing should be linked with proteomic techniques. The combined use of different methods will cut down on the number of experiments required and improve the chances of identifying effective targets.
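As one illustration of a statistical strategy for identifying significant proteins, the sketch below applies the Benjamini-Hochberg false discovery rate adjustment to per-protein p-values. The review does not name a specific procedure, so the choice of this method, the function name, and the example p-values are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Assumption: one p-value per protein, e.g. from a differential
    abundance test between two conditions (hypothetical input).
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that monotonicity is enforced.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four proteins.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
significant = [q <= 0.05 for q in adjusted]
```

With small datasets the adjustment can remove most candidates, which is consistent with the caution above about interpreting studies that contain few proteins.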

Basic aspects of bacterial growth and physiology are often ignored. Collaboration between a broad range of specialists including biochemists, biologists, and statisticians will be necessary, especially once metabolites can be synthesized. These metabolites must be assayed by metabolomic methods. Recent reviews of the relationship between metabolism and antibiotic resistance revealed global regulators that simultaneously alter virulence, metabolism and antibiotic resistance (Martínez and Rojo, 2011).

It can be difficult to coordinate and interpret the increasing volume of high resolution MS data obtained in the field of bacterial proteomics. Thus, an outstanding role for microbial bioinformatics is expected and more software packages for MS analysis must be shared. Attempts could also be made to merge several previous studies and select common patterns that ultimately enable better design of future experiments.

On the other hand, the multifactorial nature of bacterial infection must be investigated using several approaches, in a systems biology strategy. The coordinated use of different techniques including genomics, transcriptomics, and metabolomics, together with good standard proteomic methods will improve the capacity for detecting bacterial resistance, determination of resistance mechanisms and the understanding of virulence response in bacteria and host. This will hopefully help in the discovery of new biomarkers, vaccines, and drugs for combating bacterial infection worldwide.

In conclusion, in this review we summarize some of the applications of proteomic technology for diagnosing infections as well as performing basic studies for understanding antimicrobial resistance and virulence. This will enable discovery of new targets for novel and innovative antibiotics as well as the implementation of new therapies in the near future.

# AUTHOR CONTRIBUTIONS

FP, revised literature and wrote; GB, wrote and contributed with critical analysis.

# FUNDING

This work was funded by the European Community, FP7, ID:278232 (MagicBullet) and by the Plan Nacional de I+D+I 2008–2011 and Instituto de Salud Carlos III, Subdirección General de Redes y Centros de Investigación Cooperativa, Ministerio de Economía y Competitividad, Spanish Network for Research in Infectious Diseases (REIPI RD12/0015/0014) cofinanced by the European Development Regional Fund "A Way to Achieve Europe" ERDF. Additional funding was provided by the Fondo de Investigación Sanitaria (grants PI12/00552 and PI15/00860).

# REFERENCES


of Streptococcus pyogenes. J. Antimicrob. Chemother. 58, 752–759. doi: 10.1093/jac/dkl319


rapid identification of antibiotic resistance. Anaerobe 17, 444–447. doi: 10.1016/j.anaerobe.2011.05.008


in Acinetobacter baumannii and its role in virulence and biofilm formation. PLoS Pathog. 8:e1002758. doi: 10.1371/journal.ppat.1002758


role in evolutionary success. Trends Microbiol. 22, 438–445. doi: 10.1016/j.tim.2014.04.007


pathogenic bacterium Streptococcus pneumoniae. J. Proteome Res. 9, 275–282. doi: 10.1021/pr900612v


membrane vesicles for the discovery of potential therapeutic targets. Mol. Cell. Proteomics 13, 1299–1317. doi: 10.1074/mcp.M113.029538

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Pérez-Llarena and Bou. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mass Spectrometry-Based Bacterial Proteomics: Focus on Dermatologic Microbial Pathogens

Youcef Soufi<sup>1</sup> and Boumediene Soufi<sup>2</sup> \*

<sup>1</sup> College of Medicine, University of Manitoba, Winnipeg, MB, Canada, <sup>2</sup> Independent Academic Scholar, Magdeburg, Germany

The composition of human skin acts as a natural habitat for various bacterial species that function in a commensal and symbiotic fashion. In a healthy individual, bacterial flora serves to protect the host. Under certain conditions such as minor trauma, impaired host immunity, or environmental factors, the risk of developing skin infections is increased. Although a large majority of bacterial associated skin infections are common, a portion can potentially manifest into clinically significant morbidity. For example, Gram-positive species that typically reside on the skin such as Staphylococcus and Streptococcus can cause numerous epidermal (impetigo, ecthyma) and dermal (cellulitis, necrotizing fasciitis, erysipelas) skin infections. Moreover, the increasing incidence of bacterial antibiotic resistance represents a serious challenge to modern medicine and threatens the health care system. Therefore, it is critical to develop tools and strategies that can allow us to better elucidate the nature and mechanism of bacterial virulence. To this end, mass spectrometry (MS)-based proteomics has been revolutionizing biomedical research, and has positively impacted the microbiology field. Advances in MS technologies have paved the way for numerous bacterial proteomes and their respective post translational modifications (PTMs) to be accurately identified and quantified in a high throughput and robust fashion. This technological platform offers critical information with regards to signal transduction, adherence, and microbial–host interactions associated with bacterial pathogenesis. This mini-review serves to highlight the current progress proteomics has contributed toward the understanding of bacteria that are associated with skin related diseases, infections, and antibiotic resistance.

Keywords: mass spectrometry, proteomics, pathogenic bacteria, skin disease, dermatology, bacterial resistance

# INTRODUCTION

Skin has a primary role as a physical barrier that protects the body from temperature variations, microorganisms, or toxic substrates. While skin is sterile in utero, after birth it is rapidly colonized by numerous microorganisms, which aid in protecting the body (Capone et al., 2011). Bacterially derived dermatologic conditions can range from relatively benign conditions (assuming non-immunocompromised patients) such as cellulitis, erysipelas, folliculitis, or impetigo to serious clinical morbidity, such as necrotizing fasciitis (Stulberg et al., 2002). Underlying medical conditions such as diabetes mellitus or AIDS are examples where commensal bacteria can invade the skin, resulting in infections that range from mild to life threatening (Swartz, 2004).

#### Edited by:

German Bou, Hospital Universitario A Coruña, Spain

#### Reviewed by:

Catrin Ffion Williams, Cardiff University, UK Lucía Monteoliva, Universidad Complutense de Madrid, Spain

> \*Correspondence: Boumediene Soufi boumediene@gmail.com

#### Specialty section:

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> Received: 24 October 2015 Accepted: 02 February 2016 Published: 19 February 2016

#### Citation:

Soufi Y and Soufi B (2016) Mass Spectrometry-Based Bacterial Proteomics: Focus on Dermatologic Microbial Pathogens. Front. Microbiol. 7:181. doi: 10.3389/fmicb.2016.00181

In recent years there have been growing concerns with respect to the challenges faced in the emergent inability to provide effective therapeutic interventions for patients with pathologies of bacterial etiology, due to increasing rates of antibiotic resistance. Considering the ease of transmission with respect to resistance genes amongst mucosal and epidermal flora, special consideration should be given to dermatologic related bacterial conditions (Espersen, 1998).

Mass spectrometry (MS)-based proteomics has the capability to study proteins and their interactions in order to better understand dysregulations in infection disorders (List et al., 2008), reveal antibiotic resistance mechanisms (Lee et al., 2015) and significant new targets for future drug discovery. Current MS-based proteomics technologies have advanced to the point where they are amenable to any biological system. With regards to bacterial organisms, they are particularly attractive models to apply proteomics based approaches due to their smaller proteomes and modifications compared to eukaryotes allowing for comprehensive proteome coverage (Stekhoven et al., 2014).

# SKIN MICROBIOME

The skin is host to many different bacterial species, fungi, viruses and mites. Most of the bacteria fall into four phyla: Actinobacteria, Firmicutes, Bacteroidetes, and Proteobacteria, of which Propionibacterium, Corynebacterium, Staphylococcus, Micrococcus, Streptococcus, and Brevibacterium are highly abundant (van Rensburg et al., 2015). These genera are Gram-positive, since Gram-negative members typically cannot cope with the relatively dry environment of healthy human skin (Del Rosso and Leyden, 2007). In certain physiological conditions where the skin becomes moist, some Gram-negative bacteria such as Acinetobacter sp. can colonize it (Del Rosso and Leyden, 2007). The diverse community of the skin, known as the skin microbiome, is defined by host physiology, host genotype, immune system, environment, and lifestyle (Grice and Segre, 2011). Most of these microorganisms are harmless, and many bacterial species act in a commensal fashion in which each species benefits through the exchange of nutrients and helps protect the host from pathogens without negatively affecting the others (Mankowska-Wierzbicka et al., 2015). Knowledge of the bacterial microbiome is mainly derived from conventional culture-based approaches. In recent years DNA sequencing technologies combined with bioinformatic analysis and metagenomics approaches have allowed comprehensive examination of microbial communities (Kuczynski et al., 2012), yet uncertainties remain as to what defines a "normal" microbiome (Backhed et al., 2012).

The underlying contribution of the microbiome to the clinical picture of skin disorders is clear from the perspective of antibacterial treatment; however, the molecular dynamics between the microbiome and host remain largely unknown. Disturbances in homeostasis between microbiome and host can manifest as skin disorders such as atopic dermatitis (AD) or psoriasis (Kuo et al., 2013). Both disorders are connected with dysregulation of the skin immune response. However, while AD lesions are characterized by low levels of antimicrobial peptide production, psoriatic lesions are characterized by high levels (Nomura et al., 2003). Additionally, AD lesions are regularly infected with microbial pathogens.

The bacterial genus Staphylococcus is composed of approximately 40 different species, both commensal (unable to produce the virulence factor coagulase) and pathogenic, and plays an important role in skin health and pathology (Del Rosso and Leyden, 2007). For example, Staphylococcus aureus is a medically relevant bacterial pathogen that is associated with the increasing rate of antibiotic resistance mechanisms as well as systemic and cutaneous infections (Becker et al., 2007), including AD (Hanifin and Rogge, 1977). Although it was previously thought that S. aureus alone causes AD, a recent study showed that the composition of the skin microbiome changes over time and depends on disease flares and treatment. In the active state of the disease, the abundance of S. aureus and the skin commensal S. epidermidis was increased, while Streptococcus, Propionibacterium, and Corynebacterium were increased following therapy (Kong et al., 2012). Increased abundance of S. epidermidis could reflect a microbial response to overgrowth of S. aureus, since it can selectively inhibit S. aureus growth (Iwase et al., 2010; Zeeuwen et al., 2013). Moreover, a recent study identified the presence of a distinct microbiome capable of direct communication with the host at the sub-epidermal compartments of the skin, an area previously thought to be sterile (Nakatsuji et al., 2013). While these types of studies employing different techniques have characterized the bacterial species that make up the skin microbiome, proteomics is required in order to identify proteins and their respective pathways involved during these interactions, which could directly contribute to a better understanding of the complex interplay between microbiome and host.

# SHOTGUN MS-BASED PROTEOMICS

Due to the progress made toward state of the art next generation sequencing methodologies applied to generate fully annotated genomes, the number of fully sequenced microbial genomes has increased dramatically (Ribeiro et al., 2012). This has enabled state of the art MS-based proteomics technologies to rapidly advance in order to study microbial models and their communities in a robust and systematic fashion.

Two-dimensional polyacrylamide gel electrophoresis (2D-PAGE), followed by MS analysis has served as the main proteomics method of choice in the past and has been utilized in many skin associated bacterial pathogen studies (Kohler et al., 2005; Becher et al., 2009; Francois et al., 2014). In this technique, proteins are separated according to their molecular weight and isoelectric point, migrating and accumulating as protein spots on the gel matrix. Although this approach can resolve thousands of proteins, it does carry some disadvantages, especially with respect to the identification of low abundant proteins and proteins with extremely high or low molecular weights, and is rather impractical for large scale site specific analysis of post translational modifications (PTMs) such as phosphorylation.

The inherent limitations associated with 2D-PAGE MS led to the development of gel-free MS-based approaches better known as "shotgun" proteomics. Advances made in shotgun based proteomics approaches are largely due to technological improvements made in high performance mass spectrometers. Since most biological samples consist of very complex peptide mixtures, mass spectrometers must be capable of ensuring a deep analytical coverage while maintaining a high level of robustness, sensitivity, and measurement accuracy (Mann and Kelleher, 2008). Newer generations of hybrid mass spectrometers such as, but not limited to, the LTQ-Orbitrap (Hu et al., 2005), LTQ Orbitrap Velos (Olsen et al., 2009), Q-Exactive HF (Scheltema et al., 2014), and Orbitrap Fusion (Erickson et al., 2015) can achieve precise mass accuracy and resolution at very high acquisition speeds, which allows for a complete sampling of complex peptide extracts and provides comprehensive proteome coverage. Combined with high performance liquid chromatography (HPLC) technologies, these LC-MS workflows, also referred to as gel-free shotgun proteomics, allow for the quantification and identification of entire proteomes as well as their protein modifications across different biological samples. An overview of a typical shotgun MS-based experimental workflow is illustrated in **Figure 1**.
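Mass accuracy of the kind referred to above is commonly expressed as a relative error in parts per million (ppm). The following minimal sketch shows that calculation and a simple tolerance check. The function names, the 5 ppm default tolerance, and the example m/z values are illustrative assumptions and are not taken from any of the cited instruments or studies.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million (ppm)."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(measured_mz, theoretical_mz, tol_ppm=5.0):
    """True if the measured m/z lies within +/- tol_ppm of the theoretical m/z.

    The 5 ppm default is an assumed, illustrative tolerance.
    """
    return abs(ppm_error(measured_mz, theoretical_mz)) <= tol_ppm


# Hypothetical peptide ion: theoretical m/z 500.0, measured m/z 500.001,
# i.e. an error of about 2 ppm.
error = ppm_error(500.001, 500.0)
accepted = within_tolerance(500.001, 500.0)
```

A tighter tolerance reduces the number of candidate peptides that match a measured mass, which is one reason high mass accuracy helps when sampling complex peptide mixtures.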

For example, a similar approach as described above was applied to the bacterium Propionibacterium acnes (Bek-Thomsen et al., 2014). P. acnes is known to be associated with the inflammatory condition of the sebaceous human follicles known as acne vulgaris (Williams et al., 2012). The authors utilized proteomics in order to identify both human and P. acnes proteins using sebaceous follicular casts. To this end, many proteins involved in wound healing, inflammation, and tissue formation were identified. The most abundant P. acnes proteins were CAMP factors, and surface exposed dermatan sulfate adhesins.

This study is the first of its kind in terms of analyzing the proteomes of both humans and bacteria on sebaceous follicular casts and demonstrates the importance of employing shotgun based proteomics workflows (Bek-Thomsen et al., 2014).

A recent study employed a combinatorial approach utilizing 16S rRNA sequencing with ultra-performance liquid chromatography/quadrupole time of flight (UPLC-QTOF) MS toward the generation of a 3-dimensional topographical map of microbes and their molecules of origin (peptides, metabolites) distributed on the surface of the human skin (Bouslimani et al., 2015). This information was utilized to identify the microbial species present and correlate them with the chemical environment of the skin, and serves as a powerful approach toward elucidating how the microbiome interacts with and subsequently modifies different areas of the human skin in a species specific fashion.

# SHOTGUN MS-BASED QUANTITATIVE PROTEOMICS

The complete identification of a proteome and its respective PTMs provides a better understanding toward biological function and regulation. However, it is equally important to quantify dynamics of proteins relative to each other under different biological conditions, disease states, and perturbations. This information can be obtained through the use of quantitative proteomics of which numerous approaches have been developed that are compatible with shotgun MS-based proteomics strategies.

One common approach is to introduce a stable nonradioactive isotope label either chemically or metabolically on the peptide or protein level. Through this technique, the relative intensities of peptides are measured via MS, thus identifying and quantifying regulated proteins from different samples through the obtained mass spectra. Examples of such techniques include, but are not limited to, Stable Isotope Labeling by Amino acids in Cell culture (SILAC; Ong et al., 2002) or <sup>15</sup>N labeling (Hempel et al., 2010; Soufi et al., 2015), both of which have been successfully implemented in a variety of bacteria (Soufi et al., 2010, 2015; Hempel et al., 2011; Soares et al., 2013; Misra et al., 2014; Soufi and Macek, 2014; Boysen et al., 2015). Furthermore, an innovative MS-based technique known as cell type-specific labeling using amino acid precursors (CTAP) has the ability to continuously label the proteome of individual cell types actively growing in a co-culture environment, allowing for the elucidation of the "cell-of-origin" of proteins in multicellular environments (Gauthier et al., 2013). Although this method was demonstrated in eukaryotic models, its application to bacteria, especially those in multispecies environments such as human microbiomes and biofilms, could significantly contribute toward a better understanding of the dynamics between different bacterial species inhabiting the same environment.
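The relative quantification described above can be sketched as follows, assuming that each peptide of a protein has been measured as a pair of light (unlabeled) and heavy (isotope-labeled) intensities. Summarizing a protein by the median of its peptide log2 ratios is a common convention but is an assumption here, as are the function name and the example intensities.

```python
import math
from statistics import median


def protein_log2_ratio(peptide_pairs):
    """Summarize a protein's heavy/light ratio from its peptide pairs.

    peptide_pairs: list of (light_intensity, heavy_intensity) tuples
    (hypothetical input format). Pairs with a non-positive intensity
    are skipped because their log ratio is undefined.
    """
    log_ratios = [
        math.log2(heavy / light)
        for light, heavy in peptide_pairs
        if light > 0 and heavy > 0
    ]
    if not log_ratios:
        return None  # no usable peptide pair for this protein
    return median(log_ratios)


# Hypothetical protein with three quantified peptide pairs; the third
# pair is an outlier that the median is robust against.
ratio = protein_log2_ratio([(100.0, 200.0), (50.0, 100.0), (10.0, 80.0)])
```

A value of 1.0 on the log2 scale corresponds to a two-fold higher abundance in the heavy-labeled sample.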

Although metabolic labeling approaches such as SILAC are considered by many to be the most accurate method toward global protein quantitation (Ong et al., 2002), they may be difficult to implement due to technical challenges with incorporation of the chemical or metabolic labeling approach, or challenges posed in complex biological models such as those involving the skin. Therefore, due to rapid advances made in LC-MS, label free quantitation (LFQ) approaches are now possible and routinely performed (Neilson et al., 2011). In this approach, different peptide samples are measured via MS and compared either by the total number of sequenced (MS/MS) spectra or the total extracted ion currents under controlled conditions.
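The spectral counting variant of label free quantitation mentioned above can be sketched as below. The normalization by total counts per sample and the pseudocount of 1 are illustrative assumptions rather than the procedure of any cited study, and the protein names and counts are hypothetical.

```python
import math


def spectral_count_log2fc(sample_a, sample_b, pseudocount=1.0):
    """Log2 fold change (B over A) from MS/MS spectral counts.

    sample_a, sample_b: dicts mapping protein name -> number of
    sequenced MS/MS spectra (hypothetical input format).
    A pseudocount (assumed value 1.0) avoids division by zero for
    proteins missing in one sample; counts are then normalized to the
    sample total so that samples of different depth are comparable.
    """
    proteins = set(sample_a) | set(sample_b)
    total_a = sum(sample_a.get(p, 0) + pseudocount for p in proteins)
    total_b = sum(sample_b.get(p, 0) + pseudocount for p in proteins)
    fold_changes = {}
    for p in proteins:
        frac_a = (sample_a.get(p, 0) + pseudocount) / total_a
        frac_b = (sample_b.get(p, 0) + pseudocount) / total_b
        fold_changes[p] = math.log2(frac_b / frac_a)
    return fold_changes


# Hypothetical counts for two proteins in a control and a treated sample.
control = {"protein_1": 3, "protein_2": 3}
treated = {"protein_1": 7, "protein_2": 1}
fc = spectral_count_log2fc(control, treated)
```

The same comparison could be made with extracted ion currents in place of spectral counts by supplying summed peptide intensities as the dictionary values.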

The LFQ methodology was successfully applied to study the role of S. aureus in patients with ectodermal dysplasia and AD (Burian et al., 2015). AD is strongly associated with opportunistic S. aureus infections owing to a reduced immune response, and the sweat of AD patients has been shown to contain lower amounts of peptides originating from the natural antimicrobial dermcidin, leading to decreased antimicrobial activity (Schittek, 2011). Through proteomics analysis of the secretome of diseased vs. healthy patients, the authors showed that a similar mechanism of reduced sweat derived dermcidin also exists in patients with ectodermal dysplasia, making these individuals highly prone to acquiring S. aureus infections (Burian et al., 2015).

Quantitative proteomics techniques can be utilized to estimate the absolute molar amount or concentration of a particular protein per cell. This information is highly relevant especially within the clinical context (Rodriguez-Suarez and Whetton, 2013). Many variants of this technique exist and all involve spiking in a known concentration of an internal protein standard into the sample, followed by MS analysis, and comparing resulting sample peptide measurements to the internal standard. Techniques include FLEXIQuant (Singh et al., 2009), absolute quantification using protein epitope signature tags (PrEST; Zeiler et al., 2012), intensity based absolute quantification (iBAQ; Schwanhausser et al., 2011), and absolute quantification (AQUA; Gerber et al., 2003). These methods can be applied to bacteria and serve as an important tool toward understanding the underlying mechanisms during bacterial virulence and antibiotic resistance.
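The principle shared by the spike-in methods above can be sketched as a simple ratio calculation: the endogenous peptide signal is compared with that of a standard added at a known amount, and the result is converted to copies per cell. A second function illustrates the iBAQ idea of dividing summed peptide intensity by the number of theoretically observable peptides. All function names, units, and example numbers are assumptions for illustration, and the sketch ignores sample losses and digestion efficiency.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def copies_per_cell(endogenous_intensity, standard_intensity,
                    spiked_fmol, n_cells):
    """Estimate protein copies per cell from a spiked internal standard.

    Assumes the MS response is the same for the endogenous peptide and
    its isotope-labeled standard, so the intensity ratio equals the
    molar ratio.
    """
    analyte_fmol = spiked_fmol * endogenous_intensity / standard_intensity
    molecules = analyte_fmol * 1e-15 * AVOGADRO  # fmol -> mol -> molecules
    return molecules / n_cells


def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ-style value: summed intensity per theoretically observable peptide."""
    return sum(peptide_intensities) / n_theoretical_peptides


# Hypothetical experiment: 50 fmol of standard spiked into a lysate of
# 1e8 cells; the endogenous peptide signal is twice the standard signal.
copies = copies_per_cell(2e6, 1e6, spiked_fmol=50.0, n_cells=1e8)
abundance_index = ibaq([3e6, 1e6, 2e6], n_theoretical_peptides=4)
```

In this hypothetical case the estimate is roughly 600 copies per cell, which shows how a relative MS measurement becomes an absolute quantity once the amount of standard and the number of cells are known.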

# POST TRANSLATIONAL MODIFICATIONS (PTMS)

Various PTMs on bacterial proteins such as phosphorylation, acetylation, methylation, and deamidation serve as an efficient means of controlling signal transduction, virulence and regulatory processes. PTMs represent a significant process in the life cycle of bacteria and can modulate key virulence factors and are attractive targets for novel therapies.

Detecting these PTMs in bacteria poses a technical challenge due to the fact that they are difficult to discover as these modifications typically exist at low levels of abundance. To circumvent this issue, specific enrichment strategies targeting certain PTMs can be utilized in order to decrease peptide complexity thereby increasing the likeliness of detection and subsequent characterization. For example, immunoaffinity enrichment is typically employed to select for lysine acetylated peptides (Rardin et al., 2013). Moreover, similar enrichment strategies are employed to capture phosphorylation events on serine, threonine, and tyrosine (S/T/Y) amino acid residues.

These PTMs were once thought to exist solely in eukaryotes, however, MS-based proteomics combined with phosphopeptide enrichment strategies such as titanium dioxide chromatography (TiO2; Pinkse et al., 2004), immunoprecipitation (Rush et al., 2005) or immobilized metal ion affinity chromatography (IMAC; Villen and Gygi, 2008) have now established S/T/Y phosphorylation as a frequent and important PTM among different bacterial species (Macek et al., 2007, 2008; Soufi et al., 2008; Lin et al., 2009; Sun et al., 2010; Manteca et al., 2011; Misra et al., 2011; Cousin et al., 2013).

A recent shotgun based high resolution LC-MS/MS approach involving the enrichment of surface proteins using "trypsin shaving" was applied to the opportunistic pathogen S. aureus toward the identification of hydroxymethylation on asparagine and glutamine amino acid residues, in an attempt to establish the presence and potential regulatory importance of this PTM on surface proteins, which are known to assist in the colonization and invasion of the host cell (Waridel et al., 2012). The authors reported a total of 15 proteins (mostly surface proteins) that contained hydroxymethylation modifications and could play a role in virulence factor modulation.

Pseudomonas aeruginosa, an opportunistic bacterial pathogen, is associated with immunocompromised patients and nosocomial infections. P. aeruginosa can enter the body through the skin, and its infections can be life threatening due to its ability to develop antibiotic resistance (Ouidir et al., 2015). A study involving P. aeruginosa employed an effective combinatorial method of immunoaffinity assays, complex peptide fractionation, and shotgun proteomics toward the identification and characterization of lysine acetylation, a reversible PTM that has recently been implicated as an important regulatory mechanism in many bacteria (Yu et al., 2008; Kim et al., 2013; Zhang et al., 2013; Liao et al., 2014). This approach led to the identification of 320 acetylated proteins in a wide variety of functional classes. The study also identified novel lysine acetylation events in virulence factors known to assist in host immune response evasion, such as chitin binding protein, serine protease, exotoxin A, and hemolysin, which potentially implies that lysine acetylation events in P. aeruginosa play a role in mechanisms involving virulence (Ouidir et al., 2015).

Cysteine phosphorylation in S. aureus was shown to assist in the regulation of bacterial virulence and vancomycin resistance (Sun et al., 2012). Utilizing high resolution MS, the authors elucidated in a site specific fashion, that cysteine phosphorylation events occurred in various proteins many of which are global regulators that control important biological processes (Sun et al., 2012). Moreover, the eukaryotic-like Ser/Thr kinase and phosphatase pair Stk1/Stp1 was found to regulate cysteine phosphorylation in many Gram-positive bacteria providing an important piece of information toward the underlying regulatory mechanism of these events (Sun et al., 2012).

# CONCLUSION AND FUTURE OUTLOOK

A wide range of dermatological microbial associated diseases presents current and future challenges to health care providers. While in the last decade metagenomics data provided a higher level of understanding of the microbial skin environment, understanding signal transduction, virulence, regulatory processes, and dynamics between different bacterial species is essential in order to improve the overall standards and quality of patient care and treatment. MS-based proteomics has the capability to provide this knowledge. To this end, a groundbreaking endeavor with proteomics at the forefront is required in order to elucidate the specific mechanisms involved in skin infections, bacterial resistance as well as the complex microbiome and host relationship.

# AUTHOR CONTRIBUTIONS

YS wrote the manuscript. BS assisted with writing of the manuscript and generated the figure.

# ACKNOWLEDGMENT

We would like to thank Katarina Matic, M.D., Ph.D. for critical reading of the manuscript.

# REFERENCES

infundibula extracted from healthy and acne-affected skin. PLoS ONE 9:e107908. doi: 10.1371/journal.pone.0107908



revealed by proteome-wide profiling. J. Proteomics 106, 260–269. doi: 10.1016/j.jprot.2014.04.017



mediates bacterial virulence and antibiotic resistance. Proc. Natl. Acad. Sci. U.S.A. 109, 15461–15466. doi: 10.1073/pnas.1205952109


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Soufi and Soufi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Ionic Liquids as Unforeseen Assets to Fight Life-Threatening Mycotic Diseases

#### *Diego O. Hartmann, Marija Petkovic and Cristina Silva Pereira\**

*Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Oeiras, Portugal*

The discovery of ionic liquids has celebrated its 100th anniversary. They consist solely of ions, one of which is typically organic and asymmetrical. Remarkable physical and chemical properties spurred their use as alternative solvents in many chemical processes. The recent demonstration of their occurrence in nature might boost interest in them within the biological sciences. In the search for a mechanistic understanding of ionic liquids' ecotoxicological impacts in fungi, we have analyzed the proteome, transcriptome, and metabolome responses to this chemical stress. Data illuminated new hypotheses that altered our research path – exploit ionic liquids as tools for the discovery of pathways and metabolites that may impact fungal development and pathogenicity. As we get closer to solving the primary effects of each ionic liquid family and their specific gene targets, the vision of developing antifungal ionic liquids and/or materials, by taking advantage of elegant progress in this field, might become a reality. *Task-designed* formulations may improve the performance of conventional antifungal drugs, build functional coatings for reducing allergen production, or aid in the recovery of antifungal plant polymers. The frontier research in this cross-disciplinary field may provide us unforeseen means to address the global concern of mycotic diseases. Pathogenic and opportunistic fungi are responsible for numerous infections, killing annually nearly 1.5 million immunocompromised individuals worldwide, a similar rate to malaria or tuberculosis. This perspective will review our major findings and current hypotheses, contextualizing how they might bring us closer to efficient strategies to prevent and fight mycotic diseases.

#### *Edited by:*

*Jonathan M. Blackburn, University of Cape Town, South Africa*

#### *Reviewed by: Manuel Simões,*

*University of Porto, Portugal Jeffrey A. Lewis, University of Arkansas, USA*

#### *\*Correspondence: Cristina Silva Pereira spereira@itqb.unl.pt*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 20 October 2015 Accepted: 22 January 2016 Published: 08 February 2016*

#### *Citation:*

*Hartmann DO, Petkovic M and Silva Pereira C (2016) Ionic Liquids as Unforeseen Assets to Fight Life-Threatening Mycotic Diseases. Front. Microbiol. 7:111. doi: 10.3389/fmicb.2016.00111*

Keywords: ionic liquids, filamentous fungi, fungal infections, proteomics, stress response, natural compounds discovery, antifungal drugs

Ionic liquids, which consist entirely of ionic species, are conventionally defined as salts that are liquid below 100 °C. Their history started in 1914 when the physical properties of ethylammonium nitrate were first reported (Plechkova and Seddon, 2008). Nonetheless, only in the last decades did the term *ionic liquid* emerge and a new scientific area arise. Their generic – yet not universal – properties include features such as negligible vapor pressure, conventional non-flammability and excellent solvation potential (Endres and Zein El Abedin, 2006), which do not occur concurrently either in molecular compounds or in crystalline salts. These properties boosted the interest of chemists and chemical engineers, and were the basis for the classification of ionic liquids as *green* solvents (Earle and Seddon, 2000). Their potential was further emphasized with the insight that numerous structural variations can be obtained via relatively simple synthesis, categorizing ionic liquids as *designer solvents* (Ranke et al., 2007). Such subtle structural variations in the composing ions allow fine-tuning their physical and chemical properties to promote them as *task-designed solvents*.

These liquid salts have been widely investigated and several hundred are already chemically well characterized. By now, numerous applications of ionic liquids have been proposed, impacting diverse relevant areas, such as Catalysis, Separations, Materials, Sustainable Energies, Biorefineries, Renewable Fuels, and Chemicals, just to name a few. Some remarkable examples include ionic liquids in membranes for gas separation (Tomé et al., 2014), and for the extraction of disease biomarkers and antibodies (Taha et al., 2015). Unfortunately, only a few applications have successfully reached the industrial panorama (Plechkova and Seddon, 2008), e.g., BASIL™ and cellulose reshaping (BASF, Germany), dye-sensitized solar cells (G24i Power, UK) and – certainly one of the most fascinating – the use of mercury-grabbing ionic liquids to clean natural gas streams (Abai et al., 2015), commercialized as HycaPure Hg by Clariant (Switzerland).

As a scientific topic, ionic liquids are likely among the most intensely researched ones, especially in the Chemistry, Physics, and Materials disciplines. Over 10000 publications on ionic liquids can be found in the *Web of Knowledge* but little more than 10% intersect with the Life Sciences. So the question that arises from all the above is *why should any environmentalist, biologist or biochemist care about ionic liquids in the first place?* Perhaps the demonstration that ionic liquids might occur in nature sounds appealing. In a recent study, it was suggested that an ionic liquid is formed during the confrontation of two ant species, *Nylanderia fulva* and *Solenopsis invicta*, as a form of defense mechanism of the former against the venom of the latter ant species (Chen et al., 2014). In line with this idea, it was shown that certain metabolites abundant in plants become liquid when mixed together (Choi et al., 2011). These so-called *natural deep eutectic solvents*, with properties strongly resembling those of ionic liquids, would ensure cellular processes involving water-insoluble compounds. The likelihood of naturally occurring ionic liquids creates a new paradigm – they are not exclusively man-made chemicals – and fosters a new boost of interest in their research.

Ionic liquids and life sciences intertwined for the first time, however, to answer a very fundamental need: understanding ionic liquids' environmental impact. Our initial motivation, similar to that of tens of other research laboratories worldwide, was partially due to the fact that, despite being classified as *green* solvents, these organic salts comprise a disparate group of compounds that are not all intrinsically green. Many have been shown to be toxic and recalcitrant to biodegradation. Data collected so far have been compiled in a series of elegant and comprehensive reviews on their environmental impact (Petkovic et al., 2011) and biodegradability (Coleman and Gathergood, 2010). The large majority of early studies on ionic liquids' toxicity aimed at defining their inhibitory concentrations to very distinct organisms, essentially to guide chemical research efforts toward more sustainable formulations. These studies have ascertained that different testing models exhibit fairly diverse susceptibilities to ionic liquids, while often suggesting a similar mechanism of toxicity or cytotoxicity, i.e., plasma membrane permeabilization and oxidative stress (Yu et al., 2009; Petkovic et al., 2011). Unsurprisingly, the chemical nature of the ions rules their specific molecular and/or cellular mode of action. Most toxicity assays, if not all, were based on aqueous systems, in which the composing ions were fully solubilized in water. By accepting this principle, the biological effect of an ionic liquid should consider the individual contributions of its ions. Among the most common cations, the aromatic ones appear to be more toxic than the alicyclic or the quaternary ammonium ones (Stolte et al., 2007a,b). Nevertheless, the prevailing idea around ionic liquids' mechanism of toxicity is that, in either of the composing ions, the length of the alkyl chain is directly correlated with lipophilicity and permeabilization of biological membranes, leading to cell death (Zhao et al., 2007). This seems valid only for lipophilic cations, since our most recent data showed that permeabilization by long chain anions (i.e., alkanoates) is hindered by negative charges in the membrane outer surface (Hartmann et al., 2015). This was great news, since we have long been pursuing the use of cholinium alkanoates as novel biocompatible solvents for plant polyesters (Garcia et al., 2010). This idea has been nurturing our research in identifying *task-designed* ionic liquids for the hydrolysis of structural polymers in plant cell walls, i.e., cutin and suberin. Our goal was to preserve the native properties of the polyesters, particularly their function as barriers to microbial pathogens (Ranathunge et al., 2011). In some ways unpredictably, an ionic liquid – cholinium hexanoate – provided us with the right means for that. It plays the dual role of solvent and catalyst, promoting the specific cleavage of particular ester bonds of suberin (Ferreira et al., 2014). This ensures the partial preservation of its tridimensional structure, hence the spontaneous formation of films with potentially broad antimicrobial properties (Garcia et al., 2014).

Notwithstanding significant progress in the field of ionic liquids' toxicity, our curiosity did not allow us to stop there. We wanted to seek a better mechanistic understanding of how these allegedly man-made chemicals would impact living organisms at a cellular and molecular level. Our frontrunner candidates for study organisms were undoubtedly filamentous fungi. Fungi, which are unique and remarkable eukaryotic organisms, act as key colonizers of the soil and ensure major ecosystem functions, including the mitigation of hazardous chemicals (Harms et al., 2011; Varela et al., 2015). Moreover, these organisms are well known as proficient producers of enzymes and metabolites of great biotechnological and pharmacological interest. Several studies revealed that fungal strains commonly found in soil can resist high concentrations of ionic liquids (Petkovic et al., 2009; Singer et al., 2011; Simpson, 2012). In particular, our study also demonstrated that all the tested compounds could completely alter the fungal metabolic footprint, i.e., the diversity of diffusible small molecules (Petkovic et al., 2009). These promising and stimulating findings constituted a foundation for our subsequent research efforts.

Looking for a holistic view of the impact of these organic ions in fungal metabolism, we decided to perform a proteomic analysis of model filamentous fungi exposed to ionic liquids (Martins et al., 2013). We specifically selected cholinium chloride and 1-ethyl-3-methylimidazolium chloride, which carry cations currently attracting most academic and industrial interest. These compounds have been previously observed to display very distinct antifungal activities and biodegradability potential (Coleman and Gathergood, 2010; Petkovic et al., 2011). *Aspergillus nidulans* and *Neurospora crassa* – prime model fungal systems for genetic, cellular, and biochemical research – are very dissimilar when accounting for their halo-tolerance (Gunde-Cimerman et al., 2009) and secondary metabolite producing capacity (Khaldi et al., 2010; Inglis et al., 2013). The differential proteome showed that several critical biological processes and pathways were affected by either cation, reflected in the accumulation of numerous stress-responsive proteins and osmolytes, and in the alteration of developmental programs in both fungi. Encouragingly, in this study we observed the accumulation of proteins likely involved in the biosynthesis of non-proteinogenic amino acids in *N. crassa* in the presence of either cation. These rare amino acids are found in secondary metabolites with potent biological activity, e.g., neoefrapeptins and acretocins (Degenkolb et al., 2007; Degenkolb and Bruckner, 2008). Another promising example is the case of *A. nidulans*, whose genome has nearly 70 genes coding for multi-domain enzymes likely involved in secondary metabolite biosynthesis (Inglis et al., 2013). Through whole-genome profiling, our recent research efforts revealed that, upon exposure to certain organic ions, this fungus up-regulated a series of secondary metabolism backbone genes (Petkovic and Silva Pereira, 2012). This resulted in a differential metabolic profile that conceals small compounds with biological activities of high pharmacological value (unpublished data). These promising findings open new perspectives on ionic liquids' potential in the discovery of natural compounds.

The large amount of data that emerges from proteomics or transcriptomics analyses can provide fundamental information on very specific scientific questions. As an excellent example, the resistance of the bacterium *Enterobacter lignolyticus* to 1-ethyl-3-methylimidazolium chloride was in part unraveled using whole genome profiling (Khudyakov et al., 2012). This study arose from the question of how this solvent – able to efficiently dissolve cellulose from plant biomass – could impact biological and fermentation processes. The authors showed that bacteria partially circumvent the toxicity of the cation by increasing membrane transporters and the concentrations of osmolytes. These findings further inspired the design of biofuel cells where ionic liquids are employed for biopolymer dissolution (Ruegg et al., 2014).

As we gather more data from global analyses of the impact of these organic ions, we move deeper into exploring these compounds as tools to solve fundamental questions in fungal biology. Our group rapidly advanced from a rather simplistic view of morphological alterations perceived microscopically (Petkovic et al., 2012) to evaluating, at a gene expression level, membrane and cell wall damage induced by ionic liquids (Hartmann and Silva Pereira, 2013). Fungi can alter the composition of their membranes, regulating their fluidity to overcome adverse environments. The membrane fluidity, which is inversely related to its resistance to permeabilizing compounds, is essentially controlled by the levels of ergosterol and by the balance between saturated and unsaturated fatty acids. The fungal cell wall, in turn, is responsible for maintaining cell shape, counteracting the turgor pressure and protecting the plasma membrane. Upon damage to the cell wall, fungi respond by activating several genes involved in its biosynthesis, creating conditions that allow them to re-establish its integrity, through the so-called cell wall integrity pathway. This salvage mechanism, better understood in the yeast *Saccharomyces cerevisiae*, remains poorly characterized in filamentous fungi (Fujioka et al., 2007; Valiante et al., 2015). We have demonstrated that some ionic liquids can cause membrane and cell wall damage in *A. nidulans*, most likely activating an alternative cell wall integrity pathway, yet to be characterized (Hartmann and Silva Pereira, 2013). More intriguing is the fact that these organic ions can also activate sphingolipid biosynthesis, leading to the differential accumulation of intermediates, including unknown species (Hartmann and Silva Pereira, 2015). These molecules may participate in the stress response of *A. nidulans*, including the activation of the cell wall integrity pathway. These are noteworthy results, not only for the prospect of unraveling a cross-talk mechanism between the cell wall integrity pathway and sphingolipid biosynthesis, but also because both pathways have long been considered excellent candidate targets for the development of new antifungal agents.

Conventional antifungals, which target, directly or indirectly, the fungal plasma membrane or cell wall, are limited to just a few classes (*viz.* azoles, echinocandins and polyenes) (Odds et al., 2003). New generations of the classical antifungal drugs, as well as non-conventional agents and targets, are already available, such as flucytosine and sordarins, which act by inhibiting DNA and protein synthesis, respectively (Odds et al., 2003). However, clinical development and implementation of new drugs is notoriously long. Hence, the current challenge is to better understand the biology of filamentous fungi, aiming at the discovery of novel targets and the development of new effective drugs and antifungal strategies (Ostrosky-Zeichner et al., 2010; Denning and Bromley, 2015). We hope to make further evident how our cross-disciplinary research will provide means to address these global concerns. As a long-term perspective, we seek to deepen our knowledge of fungal biology by exploring organic ions as the right stimuli for deciphering key cellular and molecular processes. We now rely on proteomic tools and, more specifically, phosphoproteomics, to attain deeper insights into the potential elements of the cell wall integrity pathway, as a foundation to solve the puzzling roles of sphingolipids in filamentous fungi.

Although debatable, the intriguing application of ionic liquids in pharmaceuticals development – often mentioned as the *third evolution of ionic liquids* – has so far produced notable improvements in drug solubility, delivery and biological activity through conversion of drugs to a salt form (Hough et al., 2007). This seems a rather interesting prospect, especially when applied to the salt form of the antifungal drug amphotericin B to overcome its low solubility (Petkovic et al., 2015). Nevertheless, the fundamental question we are trying to address is how ionicity impacts the drug's primary mode of action. Another path being investigated by us is the use of ionic liquids to reduce the negative impact of pathogens such as *Aspergillus fumigatus* by targeting allergen production. There are nearly twenty fully described allergen peptides in this fungus and as many predicted ones (Kurup, 2005; Fedorova et al., 2008). Transcriptomic data suggested that exposure of fungi to certain ionic liquids can strongly reduce the expression of genes coding for putative allergenic peptides (unpublished data). The current challenge is to identify, supported by immunoproteomics, organic ions that strongly interfere with the biosynthesis of allergenic peptides in *A. fumigatus*. This constitutes another elegant example of cell biochemistry manipulation using chemical stimuli and may inspire the use of ionic liquids for developing novel antifungal materials/coatings.

Life-threatening fungal infections present a rising burden that affects millions of individuals, with more than 2 million invasive fungal infections reported every year worldwide (Brown et al., 2012). Fungi constitute a high risk to immunocompromised individuals of all ages, such as HIV/AIDS, cancer, transplant, and diabetes patients, who represent a significant percentage of the world population. The healthcare costs are enormous, estimated at billions of dollars per year on antifungal drugs alone. Mortality rates often exceed 50% even with the current treatment options. This reality is aggravated when considering that the available therapies are sometimes inadequate, as many resistant strains (Anderson, 2005) and emerging fungal pathogens (Fisher et al., 2012) are now being discovered on a regular basis. The identification of new potential risk groups, from asthma sufferers to gastric ulcer patients, further emphasizes the need for efficient antifungal drugs (van Woerden et al., 2013). Our vision is to produce valuable far-reaching insights to advance the identification and development of novel antifungal strategies to, ultimately, fight fungal pathogenicity.

# AUTHOR CONTRIBUTIONS

CP conceived and wrote the first draft; DH and MP contributed to the acquisition, analysis, and interpretation of the data included. All authors revised the manuscript critically for important intellectual content and gave final approval of the version to be submitted.

# FUNDING

We acknowledge funding from the European Research Council through grant ERC-2014-CoG-647928 and *Fundação para a Ciência e Tecnologia* through grant UID/Multi/04551/2013 (Research unit GREEN-it "Bioresources for Sustainability"). The authors would like to acknowledge the kind support in the framework of the COST Action EXIL – EXchange on Ionic Liquids (CM1206).

# ACKNOWLEDGMENT

We are grateful to all past and present team members and collaborators.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Hartmann, Petkovic and Silva Pereira. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# **Elucidating Host–Pathogen Interactions Based on Post-Translational Modifications Using Proteomics Approaches**

*Vaishnavi Ravikumar <sup>1</sup> , Carsten Jers <sup>2</sup> and Ivan Mijakovic <sup>1,3</sup> \**

*<sup>1</sup> Systems and Synthetic Biology Division, Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden, <sup>2</sup> Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark, <sup>3</sup> Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Hørsholm, Denmark*

#### *Edited by:*

*Nelson C. Soares, Institute of Infectious Disease and Molecular Medicine-University of Cape Town, South Africa*

#### *Reviewed by:*

*Céline Henry, Institut National de la Recherche Agronomique, France Lucía Monteoliva, Universidad Complutense de Madrid, Spain*

#### *\*Correspondence:*

*Ivan Mijakovic ivan.mijakovic@chalmers.se*

#### *Specialty section:*

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> *Received: 30 August 2015 Accepted: 09 November 2015 Published: 20 November 2015*

#### *Citation:*

*Ravikumar V, Jers C and Mijakovic I (2015) Elucidating Host–Pathogen Interactions Based on Post-Translational Modifications Using Proteomics Approaches. Front. Microbiol. 6:1312. doi: 10.3389/fmicb.2015.01312*

Microbes with the capability to survive in the host tissue and efficiently subvert its innate immune responses can cause various health hazards. There is an inherent need to understand microbial infection patterns and mechanisms in order to develop efficient therapeutics. Microbial pathogens display host specificity through a complex network of molecular interactions that aid their survival and propagation. Co-infection states further lead to complications by increasing the microbial burden and risk factors. Quantitative proteomics-based approaches and post-translational modification analysis can be efficiently applied to gain an insight into the molecular mechanisms involved. The measurement of the proteome and post-translationally modified proteome dynamics using mass spectrometry results in a wide array of information, such as significant changes in protein expression, protein abundance, the modification status, the site occupancy level, interactors, functional significance of key players, potential drug targets, etc. This mini review discusses the potential of proteomics to investigate the involvement of post-translational modifications in bacterial pathogenesis and host–pathogen interactions.

**Keywords: co-cultures, pathogenesis, bacteria, proteomics, post-translational modifications**

# **INTRODUCTION**

Phylogenetically diverse microorganisms such as bacteria, viruses or protozoans commonly act as "pathogens." They have the capacity to survive in the host tissue, adapt to and suppress the host innate immune responses, thus constantly challenging human health and welfare. The normal microbial flora that inhabits specific regions in the host organism is essential for the proper functioning of the host. In contrast, pathogens elicit specific unnatural responses in the host system. Over the years, the field of infection biology has thus generated much interest, especially with the current emergence of various multi-drug resistant strains. The infection cycle of a pathogen comprises three main stages, namely introduction of the pathogen into the host cell or tissue, establishment of the pathogen within the host and dissemination within the host. Despite continuous improvements in the techniques developed to identify and elucidate infection patterns, there is a pressing demand for more efficient approaches to interpret the molecular cues during infection. Mass spectrometry-based proteomics has, through the years, been instrumental in deciphering the molecular mechanisms underlying host–pathogen interactions (Yang et al., 2015). Development of high-resolution instruments and improvements in experimental techniques have helped expand the scope of infection biology.

The delicate balance that exists between the host immune responses and the pathogen can be tipped toward the latter by the co-existence of multiple microbial species capable of colonizing the same host niche. Bacteria have been identified in a wide range of symbiotic associations, whether pathogenic, mutualistic or commensal. The state of "natural selection" enforces interspecies specific interactions that differ significantly from the behavior of the individual species. Bacterial co-cultures or mixed populations offer numerous benefits: (1) they are commonly employed in the biotechnology industry, such as in waste water treatment (Tijhuis et al., 1994), biogas plants (Klocke et al., 2007) and production of vitamin C (Ma et al., 2014); (2) they form an important part of the human gut microbiota (Putignani et al., 2014). Conversely, symbiotic interactions can also be unfavorable, resulting in host invasion and damage (Chowdhury et al., 2010; Kluge et al., 2012). Thus systematic analysis of bacteria in co-culture conditions is essential to obtain an understanding of microbial behavior under conditions of infection. Routine strategies employed by pathogens to defeat host immune responses generally include adherence to host cell surfaces, followed by invasion and subsequent production and secretion of enterotoxins along with exhibiting host molecular mimicry mechanisms (Zhang et al., 2005). Protein secretion systems are thus pivotal in infection (Tseng et al., 2009). Targeting any one of these systems could help curb transmission of the disease or eliminate the pathogen. The following sections in this review describe the implications of various post-translational modifications (PTMs) on bacterial virulence and the proteomics technologies available to investigate such events along with the associated technical challenges.

# **POST-TRANSLATIONAL MODIFICATIONS**

Protein PTMs are known to play diverse roles in the cellular milieu. Bacterial virulence is orchestrated by a multitude of PTMs. Some of the most common PTMs critical for the infection process are reviewed here.

# **Phosphorylation**

Phosphorylation is a well-characterized and ubiquitous PTM that is crucial for most functions occurring in a bacterial cell. Apart from regulating metabolic pathways, phosphorylation has also been seen to be involved in various virulence mechanisms. The existence of Hanks-type Ser/Thr kinases in bacteria has generated a huge interest in the field of infection biology. Pathogenic strains such as *Streptococcus* spp., *Pseudomonas aeruginosa*, *Mycobacteria*, *Yersinia* spp., to name a few, employ Ser/Thr kinase-mediated host–pathogen interactions to mediate diverse cellular networks required for adhesion to and invasion of the host. While the exact mechanism of infection is not yet well understood, infection has been speculated to follow three basic modes: (1) phosphorylation of host proteins, (2) disruption of host defense mechanisms due to kinase activity, and lastly (3) essential role of Ser/Thr kinases by unrealized processes (Canova and Molle, 2014). The global phosphoproteomes of a number of bacteria, including pathogens, such as *Corynebacterium glutamicum* (Bendt et al., 2003), *Campylobacter jejuni* (Voisin et al., 2007), *Klebsiella pneumoniae* (Lin et al., 2009), *Mycobacterium tuberculosis* (Prisic et al., 2010) and *Streptomyces coelicolor* (Parker et al., 2010), have been analyzed. Changes that occur in the host phosphoproteome upon bacterial infection have also been investigated (Schmutz et al., 2013; Scholz et al., 2015). Dynamics of the bacterial phosphoproteome at the time of infection and post infection would be an interesting avenue to embark upon.

Secretion systems play a vital role during pathogenesis in a large number of bacteria. *Yersinia enterocolitica* and other species encode a protein kinase A, YopO, which is secreted into the host via the type III secretion system. This kinase helps resist phagocytosis by macrophages via disruption of host cytoskeletal elements (Juris et al., 2000; Grosdent et al., 2002). This kinase was also reported to phosphorylate actin and otubain, resulting in inhibition of phagocytosis (Juris et al., 2006). Mutation of the kinase domain has been shown to reduce lethality during infection (Galyov et al., 1993; Wiley et al., 2006). Similarly, Stk1 from *Staphylococcus aureus* has been shown to phosphorylate numerous host substrates involved in cell cycle signaling or apoptotic pathways (Miller et al., 2010). SteC of *Salmonella enterica* serovar Typhimurium, like YopO, induces reorganization of actin filaments in the host on infection (Odendall et al., 2012). Host immune cell responses depend on the proper functioning of the NF-κB pathway. *Legionella pneumophila* LegK acts as an inflammatory agent and interferes with the NF-κB pathway (Ge et al., 2009). Likewise, protein kinases NleH1 and NleH2 from enteropathogenic *Escherichia coli* work by inhibiting the transcription factor NF-κB (Royan et al., 2010). Phosphorylation of KasB, a core component of the type II fatty acid synthase (FAS-II) in *Mycobacterium tuberculosis*, governs the physiopathology of tuberculosis (Vilcheze et al., 2014). Apart from phosphorylation by Ser/Thr Hanks-type kinases, bacterial two-component systems (TCS) involving phosphorylation on histidine and aspartate residues form a major adaptive mechanism in pathogenic strains. For example, CovRS (the control of virulence regulator/sensor kinase) in the human pathogen group A *Streptococcus* is fundamental for virulence (Horstmann et al., 2015). Interestingly, cysteine protein phosphorylation events are also reported to mediate bacterial virulence. Members of the SarA (staphylococcal accessory regulator A)/MgrA family of global transcriptional regulators are phosphorylated/dephosphorylated by the *Staphylococcus aureus* kinase/phosphatase pair Stk1-Stp1 and speculated to play a crucial role in shifting the intracellular redox balance, contributing to virulence (Sun et al., 2012).

Cognate to kinase activity is the activity of phosphatases, making phosphorylation a reversible and tightly regulated PTM. In many organisms, protein phosphatases act as essential virulence determinants, thus playing a central role in infection and dissemination. YopH tyrosine phosphatase from *Yersinia* is involved in the dephosphorylation of the focal adhesion complexes and is essential for antiphagocytosis (Persson et al., 1999). SptP tyrosine phosphatase of *Salmonella typhimurium* was observed to be required for virulence in murine models (Kaniga et al., 1996). The phosphothreonine lyase protein, OspF, from *Shigella flexneri* irreversibly dephosphorylates members of the MAPK and ERK pathway, subsequently affecting the innate immune system (Reiterer et al., 2011). Dephosphorylation of tyrosine kinases such as the BY-kinase Wzc-ca from *E. coli* K12 by Wzb causes increased capsular polysaccharide formation which can further act as a poor immunogen (Whitmore and Lamont, 2012; Hansen et al., 2013).

# **Acylation**

Acetylation can be used as a mechanism to modulate phosphorylation-based signaling. *Yersinia* species use a serine/threonine acetyltransferase, YopJ, to interfere with host MAPK kinase signaling by acetylating serine and threonine residues in the activation loop, thereby preventing phosphorylation-dependent activation (Mukherjee et al., 2006). YopJ homologs are widely distributed in both mammal and plant pathogens, suggesting that inhibition of host kinases by serine/threonine acetylation could be a common strategy for bacterial pathogens (Lewis et al., 2011). Lysine acetylation and succinylation have in recent years been shown to be abundant modifications in bacteria, and there are some indications that they could be important for bacterial pathogenesis. In *E. coli*, the transcription factor RcsB, which controls colanic acid capsule synthesis, is acetylated on lysine, thereby reducing its DNA binding activity (Thao et al., 2010). A global study indicated that lysine acetylation is involved in regulation of cell wall fatty acid synthesis in *M. tuberculosis*, which in turn is implicated in pathogenicity (Liu et al., 2014). In addition, a lysine deacetylase (MRA\_1161) mutant exhibits a defect in biofilm formation (Liu et al., 2014). KasA, a protein involved in biofilm formation, is modified by another type of acylation, namely lysine succinylation. Further, a number of proteins involved in antibiotic resistance are succinylated (Xie et al., 2015). With only a few functional studies available, it is still unclear to what extent lysine acetylation and succinylation contribute to bacterial virulence.

# **Ubiquitination**

Ubiquitination is an important PTM in Eukarya and regulates several processes including key cell defense systems. Ubiquitin is a small polypeptide (76 amino acids) that can be covalently linked primarily to lysine residues. Ubiquitination requires the activities of an E1 activating enzyme, an E2 conjugating enzyme and an E3 ligase (Ashida et al., 2014). Ubiquitin contains seven lysines and can itself be ubiquitinated, leading to formation of polyubiquitin chains with various linkages that in turn dictate biological function (Ashida et al., 2014). Bacterial pathogens have developed several ways of targeting the host ubiquitin system, including the use of eukaryotic-like and novel E3 ligases as well as de-ubiquitinating enzymes. The *S. flexneri* E3 ligase IpaH9.8 reduces the NF-κB-mediated inflammatory response by polyubiquitination of NEMO, a protein involved in NF-κB activation, thereby targeting it for degradation (Ashida et al., 2010). Additionally, IpaH9.8 appears to modulate gene expression via ubiquitination of the splicing factor U2AF<sup>35</sup> (Okuda et al., 2005; Seyedarabi et al., 2010). In *S. enterica* infection, ubiquitinated protein aggregates are formed near the *Salmonella*-containing vacuole, which targets it for autophagic degradation. This is countered by SseL, a deubiquitinase that deubiquitinates the protein aggregates (Mesquita et al., 2012). The pathogens can also use the host ubiquitination system to modify their own proteins. To control the timing of effector protein-mediated functions, *S. enterica* SopE, which targets RHO-GTPases leading to membrane ruffling, is ubiquitinated and degraded earlier than the protein SptP, which prevents membrane ruffling after invasion (Kubori and Galan, 2003). Another *S. enterica* effector protein, the phosphatase SopB, can be mono-ubiquitinated at six positions and this in turn modulates its cellular location and enzyme activity (Knodler et al., 2009).
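The seven lysines mentioned above define the possible polyubiquitin linkage types. As a minimal sketch, the Python snippet below scans the mature human ubiquitin sequence and lists the lysine positions; the constant and function names (`UBIQUITIN`, `lysine_positions`) are illustrative choices for this example and do not come from any proteomics library.

```python
# Mature human ubiquitin, one-letter amino acid code (76 residues).
UBIQUITIN = (
    "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQ"
    "QRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
)


def lysine_positions(sequence: str) -> list[int]:
    """Return the 1-based positions of all lysine (K) residues."""
    return [i for i, residue in enumerate(sequence, start=1) if residue == "K"]


# Each lysine can accept another ubiquitin, giving one linkage type per site.
linkages = [f"K{pos}" for pos in lysine_positions(UBIQUITIN)]
print(len(UBIQUITIN), linkages)
```

Each listed site corresponds to one lysine-linked chain type; which linkage is formed is what determines the downstream fate of the modified protein, as described in the text.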

# **AMPylation**

AMPylation is the covalent attachment of AMP to a threonine or tyrosine residue. The *Vibrio parahaemolyticus* effector VopS uses ATP to covalently modify Rho-GTPases with AMP on threonine. This in turn affects their interaction with downstream signaling proteins, leading to inhibition of actin assembly (Yarbrough et al., 2009). To exploit host cell vesicle transport, *L. pneumophila* SidM activates the small GTPase Rab1 by AMPylation on a tyrosine; when Rab1 is no longer needed, it is de-AMPylated by SidD and subsequently targeted for degradation by polyubiquitination (Neunuebel et al., 2011). Host GTPases are also targeted by another novel, reversible PTM, namely phosphocholination. The *L. pneumophila* effectors AnkX and Lpg0696 phosphocholinate and de-phosphocholinate Rab1 and Rab35, respectively (Mukherjee et al., 2011; Tan et al., 2011).

# **Alkylation**

Protein alkylation is the addition of alkyl groups to specific amino acids, notably methyl groups on arginine and lysine (methylation) and the isoprenyl lipids farnesyl or geranylgeranyl on cysteine (prenylation). Histone proteins are regulated by lysine methylation, and this is also a target for bacterial pathogens. The *Bacillus anthracis* protein BaSET trimethylates histone H1 on lysine and thereby reduces activation of NF-κB response elements (Mujtaba et al., 2013). Prenylation confers hydrophobicity on its substrate proteins and targets them to membranes. This is employed by *L. pneumophila* to assure correct cellular localization of its effector AnkB. Host-mediated farnesylation of AnkB targets it to the cytosolic face of the *Legionella*-containing vacuole, and this in turn is essential for its function (Price et al., 2010).

# **Eliminylation**

*Shigella flexneri* effector protein OspF, a phosphothreonine lyase, interferes with host signaling by irreversibly dephosphorylating MAP kinases. In this PTM termed eliminylation, not only the phosphate but also the hydroxyl group of threonine is removed, thereby preventing any future phosphorylation. It irreversibly inactivates the kinase (Li et al., 2007).

# **Glycosylation**

Protein glycosylation is now a well-established modification in bacteria. O-linked glycosylation of flagellar proteins has been linked to virulence in pathogenic bacteria. In the opportunistic pathogen *Burkholderia cenocepacia*, glycosylation of the flagellar protein FliC reduces binding to the host TLR5 receptor, thereby weakening the immune response. Additionally, a non-glycosylated FliC mutant exhibits a defect in biofilm formation (Hanuszkiewicz et al., 2014). *Helicobacter pylori* flagellar proteins are also glycosylated, and this is essential for virulence (Schirm et al., 2003). In a systematic study of *H. pylori* glycoproteins, 26 proteins with diverse roles in pathogenesis were identified, arguably indicating a broader array of mechanisms for glycosylation-based virulence (Champasa et al., 2013).

# **POTENTIAL OF PROTEOMICS**

Assessment of host–pathogen interactions is beneficial from a surveillance and diagnostic perspective. Pursuing the proteomic signatures of pathogenic bacteria during infection could reveal dynamic changes in the proteome or PTM proteome, which are often induced by the selection pressure that arises during interaction with the host. High-resolution mass spectrometry-based proteomics has the ability to bridge the gap between genomics and the physiological mechanisms of behavior (Dove, 1999). Global and quantitative analysis of the pathogen proteome provides an in-depth understanding of the molecular events that are differentially regulated during the onset of infection. Monitoring dynamic changes at the proteome level and quantitative analysis of post-translational modifications are among the basic and widely used applications of shotgun proteomics technology. This high-throughput "omics" technology also allows for identification of macromolecular components important for specific cell-to-cell communications (Chait, 2006), for monitoring metabolic shifts, and for revealing metabolic pathways involved and/or activated during specific stages of infection. The conventional shotgun, or bottom-up, approach involves protein extraction from the organism of interest followed by digestion with a suitable endoprotease and subsequent analysis by LC-MS/MS. Entire bacterial proteomes and factors that play a key role in certain virulence pathways can be identified and analyzed conveniently by employing this approach (Tracz et al., 2013; Alvarez Hayes et al., 2015). Conversely, top-down approaches that involve the analysis of intact proteins have also been adopted to characterize bacterial proteomes (Ansong et al., 2013).
Additionally, while challenging, top-down proteomics can be employed to study PTMs such as phosphorylation, acetylation, etc, occurring in the bacterial proteome as well as the host proteome upon infection (Kelleher, 2004; Zhang and Ge, 2011).
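The digestion step of the bottom-up workflow described above can be illustrated with a minimal in silico sketch. It assumes trypsin as the endoprotease (the text only says "a suitable endoprotease") and uses the common simplified rule of cleaving after K or R unless the next residue is P. The function names and the example sequence are hypothetical and chosen only for illustration.

```python
def cleavage_fragments(sequence):
    """Split a protein sequence at assumed tryptic sites.

    Assumed rule for this sketch: cleave C-terminal to K or R,
    except when the following residue is P.
    """
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        # Trailing fragment that does not end in K or R
        fragments.append(sequence[start:])
    return fragments


def tryptic_peptides(sequence, missed_cleavages=0):
    """Return peptides allowing up to `missed_cleavages` skipped sites."""
    fragments = cleavage_fragments(sequence)
    peptides = []
    for i in range(len(fragments)):
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(fragments[i:j + 1]))
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence, not taken from any real protein
    print(tryptic_peptides("AKRPGKTLR", missed_cleavages=1))
```

In a real experiment the resulting peptide list would be matched against MS/MS spectra by a database search engine; that step is outside the scope of this sketch.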

Microbial proteomics additionally aids in the detection of biomarkers which could act as targets for drug-based therapy (Guest et al., 2013). Proteomics studies of the effects of drug dosage are also crucial for acquiring information about the real-time *in vivo* scenario, thus assisting in the development of antimicrobial drugs. Targeted proteomics is also a powerful method for focusing on a specific subset of proteins involved in infection and resistance mechanisms, allowing mechanistic questions to be answered.

# **CURRENT TECHNOLOGY**

A limited number of non-proteomics-based methodologies are currently employed to investigate the population dynamics of binary mixed cultures (Kluge et al., 2012), namely Fluorescence *In Situ* Hybridization (FISH; Rogers et al., 2000), Real-Time PCR (Higuchi et al., 1993), Flow Cytometry (Müller et al., 1995) and Terminal Restriction Fragment Length Polymorphism (T-RFLP; Schmidt et al., 2007). Metaproteome analysis of mixed cultures using 2-DE followed by LC-ESI-MS has been utilized to identify diverse microbial communities in the environment (Benndorf et al., 2007; Kluge et al., 2012). High-resolution mass spectrometry-based approaches are routinely employed to study bacterial infection models or bacterial co-cultures. Global subcellular protein profiling has been applied to reveal subcellular distributions of proteins, which were subsequently used to reconstruct functional networks and to identify new targets responsible for pathogenicity (Mawuenyega et al., 2005). *De novo* sequencing has been applied for peptide/protein identification when the DNA sequence coverage of a particular organism has been unavailable or incomplete (Wilmes and Bond, 2004). Centrifugation followed by detergent solubilization (Fernandez-Arenas et al., 2007), ImmunoMagnetic Separation (IMS) using anti-IgG-coated Dynabeads™ in combination with antisera (Twine et al., 2006), or Fluorescence-Activated Cell Sorting (FACS; Becker et al., 2006) are routinely used to separate bacterial cells from host cells. Pulse-chase experiments using <sup>35</sup>S-labeled methionine or cysteine can also be employed to monitor changes occurring at the level of protein synthesis or protein turnover (Schmidt and Volker, 2011). Quantification of proteome dynamics post infection can be done by label-free approaches; for example, Luo et al. (2014) identified 2125 phosphopeptides and quantified 253 and 344 up-regulated phosphorylation events from PRRSV-infected pulmonary alveolar macrophages at 12 and 36 h post-inoculation, respectively. Alternatively, label-based relative quantitation methods such as iTRAQ (isobaric tags for relative and absolute quantitation) or SILAC (stable isotope labeling with amino acids in cell culture) can be used (Shui et al., 2009). Shui et al. (2009) identified a total of 1286 proteins from murine macrophages during *M. tuberculosis* infection, of which 463 were identified by both the SILAC and iTRAQ labeling strategies. Targeted proteomics, which forms part of the newer generation of proteomics approaches, can also be applied to study specific changes occurring in the host or bacterial cell by multiple reaction monitoring mass spectrometry (MRM-MS). Lange et al. (2008) employed multiple reaction monitoring to understand the dynamics of virulence factors from the Gram-positive bacterium *Streptococcus pyogenes*. Similarly, Karlsson et al. (2012) used selected reaction monitoring mass spectrometry (SRM-MS) to decipher the *S. pyogenes* proteome. Mass spectrometry-based imaging techniques offer a fresh avenue for studying differences in microbial communities (Wilmes and Bond, 2009).
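As a rough illustration of relative quantitation, whether label-free or label-based, the sketch below computes log2 ratios between an infected and a control sample and flags proteins that pass a fold-change cut-off. The protein names, intensities, pseudocount and the twofold threshold are all hypothetical choices made for this example; none of them comes from the studies cited above.

```python
import math


def log2_ratios(infected, control, pseudocount=1.0):
    """Return log2(infected / control) for proteins quantified in both samples.

    A pseudocount (an assumption of this sketch) avoids division by zero.
    """
    shared = infected.keys() & control.keys()
    return {
        protein: math.log2(
            (infected[protein] + pseudocount) / (control[protein] + pseudocount)
        )
        for protein in sorted(shared)
    }


def regulated(ratios, threshold=1.0):
    """Split proteins into up- and down-regulated lists (|log2 ratio| >= threshold)."""
    up = sorted(p for p, r in ratios.items() if r >= threshold)
    down = sorted(p for p, r in ratios.items() if r <= -threshold)
    return up, down


if __name__ == "__main__":
    # Hypothetical intensities for illustration only
    infected = {"protA": 7.0, "protB": 1.0, "protC": 3.0}
    control = {"protA": 1.0, "protB": 7.0, "protD": 5.0}
    ratios = log2_ratios(infected, control)
    print(ratios)
    print(regulated(ratios))
```

Real workflows add normalization, replicate handling and statistical testing before calling a protein or phosphosite regulated; the cut-off here only demonstrates the arithmetic.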

# **TECHNICAL CHALLENGES**

In the field of infection biology, the characterization of microbes existing as interacting communities is extremely challenging but essential (Kluge et al., 2012). The genome is a static entity, while the proteome is dynamic, which makes whole-proteome analysis more complex than genome analysis. The presence of PTMs further increases the functional diversity of an organism's proteome. Furthermore, only a fraction of a given organism's proteome is modified at any given time point, making it harder to detect or quantify certain events. Limitations with respect to instrumentation can also decrease the mass range of detection, so that critical biological information is lost. Another major limiting factor is the absence of a complete genome sequence for many relevant bacterial species in complex environmental samples. As a consequence, the analysis of protein modifications can be limited by partial sequence coverage. When investigating bacterial proteomes under conditions of infection, extra care must be taken to ensure that the ratio of abundance of the bacterial to eukaryal proteins is not disproportionate, as the bacterial proteome can easily be drowned in the noise of the highly abundant eukaryal proteins. Furthermore, complete recovery of the low number of bacterial cells under such conditions is difficult. This is a challenge for PTM analysis, as a high amount of protein is generally required for enrichment of modified peptides. Additionally, mammalian-based infection studies are limited by ethical constraints, and a delicate balance must be struck to gain mechanistic understanding in a true *in vivo* setup.

# **PERSPECTIVES**

Developments in experimental methodologies and instrumentation have made it possible to efficiently investigate host–pathogen interactions. However, further advances in instrumentation and technology, and improvements in data processing, are still needed to (a) obtain a complete or sufficiently high coverage of the pathogen proteome starting with low input material; (b) conveniently identify and differentiate between organism-specific proteomes in order to monitor species-specific changes; and (c) routinely monitor alterations occurring at the post-translational level to reveal the pathways relevant for drug targeting. Mass spectrometry-based proteomics is a powerful tool that helps obtain a functional understanding of microbial co-operativity, interaction, evolution, physiological changes and cellular functioning. Quantitative proteomic profiling of host–pathogen interactions and discovery-mode proteomics, in combination with bioinformatics, have been shown to be an effective approach to illuminate key factors intrinsic to host invasion and disease propagation. Additionally, data-dependent analysis or targeted approaches show a high degree of promise and are now being applied for biomarker discovery and clinical applications. The potential for definite target selection and validation should be further exploited to reveal basic solutions to important biological issues. Furthermore, proteomic analysis of PTMs from the bacterial perspective would be an interesting avenue to pursue. Apart from a few global phosphoproteome analyses, there are currently not many reports on other post-translationally modified proteomes of pathogenic bacteria. Resolving some of the above-described technical challenges could aid in unraveling physiological pathways affected by PTMs that contribute to virulence.

# **REFERENCES**

pathogen *Helicobacter pylori* (Hp). *Mol. Cell. Proteomics* 12, 2568–2586. doi: 10.1074/mcp.M113.029561


human innate immune responses. *J. Biol. Chem.* 289, 19231–19244. doi: 10.1074/jbc.M114.562603


autophagy of cytosolic aggregates. *PLoS Pathog.* 8:e1002743. doi: 10.1371/journal.ppat.1002743


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2015 Ravikumar, Jers and Mijakovic. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Mass Spectrometry Offers Insight into the Role of Ser/Thr/Tyr Phosphorylation in the Mycobacteria

Bridget Calder, Claudia Albeldas, Jonathan M. Blackburn\* and Nelson C. Soares \*

Applied and Chemical Proteomics Group, Medical Biochemistry Division, Faculty of Health Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa

Phosphorylation is a post-translational modification which can rapidly regulate biochemical pathways by altering protein function, and has been associated with pathogenicity in bacteria. Once engulfed by host macrophages, pathogenic bacteria are exposed to harsh conditions and must respond rapidly in order to survive. The causative agent of TB, Mycobacterium tuberculosis, is unusual amongst the bacteria because it can survive within the host macrophage for decades in a latent state, demonstrating a remarkable capacity to evade the host immune response. This ability may be mediated in part by regulatory mechanisms such as Ser/Thr/Tyr phosphorylation. Mass spectrometry-based proteomics has afforded us the capacity to identify hundreds of phosphorylation sites in the bacterial proteome, allowing for comparative phosphoproteomic studies in the mycobacteria. There remains an urgent need to validate the reported phosphosites, and to elucidate their biological function in the context of pathogenicity. However, given the sheer number of putative phosphorylation events in the mycobacterial proteome, and the technical difficulty of assigning biological function to a phosphorylation event, it will not be trivial to do so. There are currently six published phosphoproteomic investigations of members of the mycobacteria. Here, we combine the datasets from these studies to identify commonly detected phosphopeptides and phosphosites, and present these as high confidence candidates for further validation. By applying modern mass spectrometry-based techniques to improve our understanding of phosphorylation and other PTMs in pathogenic bacteria, we may identify candidates for therapeutic intervention.

Keywords: post-translational modification, phosphorylation, mycobacteria, tuberculosis, proteomics, phosphoproteomics, mass spectrometry, virulence

#### *Edited by:*

Biswarup Mukhopadhyay, Virginia Tech, USA

#### *Reviewed by:*

Haike Antelmann, Freie Universität Berlin, Germany; Ivan Mijakovic, Chalmers University of Technology, Sweden

#### *\*Correspondence:*

Jonathan M. Blackburn jonathan.blackburn@uct.ac.za; Nelson C. Soares nelson.dacruzsoares@uct.ac.za

#### *Specialty section:*

This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology

> *Received:* 25 October 2015 *Accepted:* 25 January 2016 *Published:* 12 February 2016

#### *Citation:*

Calder B, Albeldas C, Blackburn JM and Soares NC (2016) Mass Spectrometry Offers Insight into the Role of Ser/Thr/Tyr Phosphorylation in the Mycobacteria. Front. Microbiol. 7:141. doi: 10.3389/fmicb.2016.00141

# INTRODUCTION

Proteins are the principal bioactive molecules in the cell, and contribute to survival, growth, and reproduction by interacting with each other and with metabolites, lipids, nucleic acids and carbohydrates, and by catalyzing biological reactions (Nørregaard Jensen, 2004). Protein biosynthesis and degradation are tightly regulated by complex biochemical systems in response to the changing needs of the cell. However, there is an additional mechanism which allows cells to respond rapidly and efficiently to external and internal conditions. Post-translational modification (PTM) by selective covalent processing of proteins, whether by proteolytic cleavage or by the addition of a modifying group, can drastically alter the properties of a protein (Mann and Jensen, 2003; Mijakovic, 2010; Stülke, 2010). Post-translational modifications add a layer of complexity to both bacterial and eukaryotic mechanisms of adaptation to the surrounding environment.

Modern mass spectrometry (MS) has enabled us to perform high-throughput analysis of PTMs. Since modified proteins usually occur at low abundance, an enrichment process is typically carried out for a specific PTM prior to MS analysis (Semanjski and Macek, 2016). This results in increased resolution, sensitivity and fragmentation (Cain et al., 2014) and has contributed to the identification and localization of phosphosites in many species, including pathogenic bacteria. Ser/Thr/Tyr phosphorylation was once thought to occur only in eukaryotes, but the discovery of the Hanks-type family of kinases in bacteria indicates that a complex bacterial phosphorylation-mediated signaling system exists (Bakal and Davies, 2000). These serine/threonine protein kinases (STPKs) add a phosphate group to serine or threonine residues, while BY-kinases add phosphate groups to tyrosine residues. PTMs in bacteria have been linked to pathogenicity, virulence, resistance and persistence, and are vital for survival (Ge and Shan, 2011; Van Els et al., 2014). MS-based phosphoproteomic analysis has been carried out in a number of bacterial species, including Bacillus subtilis (Macek et al., 2007), Escherichia coli (Macek et al., 2008; Soares et al., 2013), Streptococcus pneumoniae (Sun et al., 2009), Listeria monocytogenes (Misra et al., 2011), Acinetobacter baumannii (Soares et al., 2014), and Mycobacterium tuberculosis.
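To make concrete what MS detects when a kinase adds a phosphate group, the following sketch lists the candidate Ser/Thr/Tyr sites in a peptide and computes its monoisotopic mass with and without phosphorylation. Each phosphate adds HPO3, a shift of about 79.966 Da. The residue masses are standard monoisotopic values; the function names are hypothetical, and the example peptide is the TatA peptide discussed later in this article.

```python
# Monoisotopic residue masses in Da (standard values)
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O added for the free peptide termini
PHOSPHO = 79.96633  # HPO3 added per phosphorylated S/T/Y


def candidate_sites(sequence):
    """Return (position, residue) pairs for S/T/Y, using 1-based positions."""
    return [(i, aa) for i, aa in enumerate(sequence, start=1) if aa in "STY"]


def peptide_mass(sequence, n_phospho=0):
    """Monoisotopic mass of a neutral peptide carrying n_phospho phosphates."""
    if n_phospho > len(candidate_sites(sequence)):
        raise ValueError("more phosphates than S/T/Y residues")
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence)
    return backbone + WATER + n_phospho * PHOSPHO


if __name__ == "__main__":
    peptide = "AEASIETPTPVQSQR"
    print(candidate_sites(peptide))
    print(peptide_mass(peptide), peptide_mass(peptide, n_phospho=1))
```

Note that the mass shift alone shows that a peptide is phosphorylated, not which residue carries the phosphate; localizing the site depends on the fragment ions in the MS/MS spectrum.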

# The Role of Phosphorylation in *M. tuberculosis*

The bacillus M. tuberculosis is the causative agent of tuberculosis (TB), a leading global health crisis which has claimed millions of lives by continuing to evade clinical intervention. This is largely due to its ability to lie dormant for many years in the body, resurfacing if the host's immune system becomes compromised (Gengenbacher and Kaufmann, 2012). When the bacilli enter the human lung, they are ingested by alveolar macrophages of the human immune system. The macrophages respond by becoming acidic and exposing the pathogen to lytic enzymes, oxygenated lipids, fatty acids, and reactive oxygen and nitrogen intermediates (Schnappinger et al., 2003). In order to survive in the adverse environment of the macrophage, M. tuberculosis needs to react swiftly, which is possible through the regulatory mechanisms afforded by PTMs (Cain et al., 2014). Mycobacterial STPK phosphorylation has long been associated with pathogenicity (Sherman and Grundner, 2014), which has driven efforts to improve our understanding of the role of phosphorylation in M. tuberculosis (Prisic and Husson, 2014). Currently, there are six published manuscripts describing the phosphoproteome of a member of the mycobacteria: M. tuberculosis H37Rv (Prisic et al., 2010; Kusebauch et al., 2014), M. smegmatis and M. bovis BCG (Nakedi et al., 2015; Zheng et al., 2015), a clinical isolate of M. tuberculosis Beijing lineage (Fortuin et al., 2015) and a ΔpknE deletion mutant strain of M. tuberculosis (Parandhaman et al., 2014b). While these studies are discussed in more detail below, a summary of the methods used and relevant results for each of them is presented in **Table 1**. A major difficulty in comparing phosphoproteomic studies is that we cannot compare those phosphosites that were uniquely identified by each study, because it is impossible to determine whether that uniqueness is a result of biological differences or of the stochastic nature of discovery-driven MS-based proteomics.
The only available study in the mycobacteria where those uniquely detected phosphosites are comparable is Nakedi et al. (2015), because the study compared two strains under the same experimental conditions. In this case, a relevant conclusion drawn by the authors is that the phosphosite patterns detected in M. smegmatis and M. bovis BCG are often species specific and these phosphorylation events are commonly occurring on entirely different peptides.

The identification of PTMs which contribute to the pathogenicity of M. tuberculosis by enhancing its ability to survive in the macrophage is of great interest to the medical community, as these represent attractive candidates for therapeutic intervention. This review will focus specifically on the use of MS-based techniques which have been used to identify phosphorylated proteins in the mycobacteria, with particular focus on the identification of phosphorylation sites which may contribute to pathogenicity and are conserved across pathogenic strains of mycobacteria.

# COMPARATIVE PHOSPHOPROTEOMIC ANALYSIS OF MYCOBACTERIA

Prisic et al. (2010) were the first to present a global view of the phosphoproteome of M. tuberculosis H37Rv. They used in-gel tryptic digestion to prepare samples of H37Rv lysate from cultures grown under NO stress, oxidative stress or hypoxia, or with glucose or acetate as the carbon source. A total of 152 samples were analyzed on an LTQ mass spectrometer following enrichment for phosphopeptides using titanium dioxide beads. In this manner, the authors detected a total of 506 phosphosites on 301 proteins and identified a dominant motif for M. tuberculosis STPKs, which was validated using synthetic peptides. This initial investigation reported 40% Ser: 60% Thr phosphorylation, and no Tyr phosphosites; at the time of publication there was no conclusive molecular evidence for Tyr phosphorylation in M. tuberculosis, although there was a long-established association between Tyr phosphorylation and pathogenicity in other bacteria (Ilan et al., 1999). Kusebauch et al. (2014) reported Tyr phosphorylation in M. tuberculosis for the first time, after establishing that the known M. tuberculosis STPKs have the capacity to phosphorylate Tyr, and then carrying out LC-MS/MS analysis on M. tuberculosis culture lysate enriched for phosphopeptides. In this manner, they detected 30 high-confidence Tyr phosphosites on 17 M. tuberculosis proteins, contributing to a Ser: Thr: Tyr ratio of 34:62:4%. Intriguingly, these authors also found an additional 35 Tyr phosphorylation sites in publicly accessible MS data from previously published proteomic studies of M. tuberculosis, where the assumption that there was no Tyr phosphorylation in M. tuberculosis had led the authors to overlook them. Subsequent MS-based descriptions of the M. tuberculosis phosphoproteome have identified similar proportions of Tyr phosphorylation sites, some of which may be of particular importance in establishing virulence.
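Ser: Thr: Tyr ratios such as those quoted above are simple proportions over the list of localized phosphosites. A minimal sketch, assuming sites are written as a residue letter followed by a position (for example `T7`); the function name and the site list below are hypothetical and serve only to show the calculation.

```python
from collections import Counter


def residue_distribution(sites):
    """Return the percentage of Ser, Thr and Tyr among the given phosphosites.

    `sites` is an iterable of labels such as "S4" or "T7" (an assumed format).
    """
    counts = Counter(site[0] for site in sites)
    total = sum(counts[residue] for residue in "STY")
    if total == 0:
        raise ValueError("no S/T/Y phosphosites supplied")
    return {residue: 100.0 * counts[residue] / total for residue in "STY"}


if __name__ == "__main__":
    # Hypothetical site list for illustration only
    example_sites = ["S4", "T7", "T9", "T15", "Y3"]
    print(residue_distribution(example_sites))
```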

Recently, Nakedi et al. (2015) investigated the impact of protein phosphorylation on growth-related functions by measuring differential phosphorylation between two mycobacterial species during exponential growth phase: the fast-growing, non-pathogenic, soil-dwelling Mycobacterium smegmatis and the slow-growing, attenuated strain of Mycobacterium bovis (BCG). BCG had consistently higher phosphorylation levels, with 289 phosphosites on 203 proteins, compared to M. smegmatis with 106 phosphosites found on 76 proteins. The phosphoproteins which were uniquely found in BCG were generally involved in cell growth and stress response. Even under optimal growth conditions, BCG appears to have a high level of phosphorylated stress response proteins, which suggests the capability for quick on/off responses to stressors within the host, ultimately allowing the bacteria to respond rapidly and survive more effectively. This potential adaptive advantage for the pathogen may come at the fitness cost of slower growth. In a follow-up phosphoproteomic study, Zheng et al. (2015) found 659 phosphosites on 398 proteins in BCG harvested during stationary phase. The largest group (40.1%) of identified phosphoproteins in this case was involved in regulation of metabolism. Again, these findings indicate that phosphorylation plays an important role in the slower metabolism of BCG, which may ultimately increase the capability for persistence within the host, and may be beneficial to the bacteria when faced with drug treatment (Evangelopoulos and McHugh, 2015).
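One quick way to put the counts reported so far side by side is the average number of phosphosites per phosphoprotein. The sketch below uses only the figures quoted in the text above; the dictionary layout and function name are illustrative assumptions. As already noted, differences between studies may reflect methodology as much as biology, so these ratios are descriptive only.

```python
# (phosphosites, phosphoproteins) as quoted in the text above
REPORTED_COUNTS = {
    "Prisic et al. (2010), M. tuberculosis H37Rv": (506, 301),
    "Nakedi et al. (2015), M. bovis BCG": (289, 203),
    "Nakedi et al. (2015), M. smegmatis": (106, 76),
    "Zheng et al. (2015), M. bovis BCG": (659, 398),
}


def sites_per_protein(counts):
    """Return the mean number of phosphosites per phosphoprotein, rounded to 2 dp."""
    return {
        study: round(sites / proteins, 2)
        for study, (sites, proteins) in counts.items()
    }


if __name__ == "__main__":
    for study, ratio in sites_per_protein(REPORTED_COUNTS).items():
        print(f"{study}: {ratio} sites per protein")
```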

A phosphoproteomic investigation of a hypervirulent Beijing strain of M. tuberculosis by Fortuin et al. (2015) reported the identification of 414 phosphosites on 214 proteins. Of these, 252 were novel phosphosites which had not been identified in previous phosphoproteome research on the H37Rv strain. Since the capability for complex signaling is directly related to the number of phosphorylation events, it could be inferred that an increasing number of phosphorylation events may play a role in differentiating virulence in different M. tuberculosis strains. This is supported by findings that a highly virulent, drug resistant strain of Acinetobacter baumannii has almost double the number of phosphosites in comparison to the reference strain (Soares et al., 2014). It is also interesting to note that phosphorylation may be linked to drug resistance, which is of concern given the emergence of drug resistant strains of TB (Evangelopoulos and McHugh, 2015). Although the mechanism by which a bacterium might accumulate additional phosphosites is not fully understood, it is of interest that Fortuin et al. (2015) identified phosphorylated forms of 9 out of the 11 STPKs encoded by the M. tuberculosis genome (Prisic and Husson, 2014) in this hypervirulent Beijing strain, while Prisic et al. (2010) only identified four in H37Rv.

TABLE 2 | Previously reported virulence factors in the *M. tuberculosis* complex which have been identified in more than one phosphoproteomic dataset, and their putative role in virulence.

The commonly detected phosphopeptide is indicated in bold, along with phosphosites that have been confirmed in more than one study.

Although our understanding of the specific activity of mycobacterial STPKs is largely incomplete, some substrates and their downstream functions have been associated with specific kinases (Supplementary Table 1). While mass spectrometry is capable of identifying hundreds of phospho-substrates, more laborious methods are necessary to associate these substrates with their corresponding kinases. The challenges inherent in mass spectrometry-based kinase substrate identification are discussed in more detail by Sherman and Grundner (2014). Parandhaman et al. (2014b) made use of a ΔpknE deletion mutant of M. tuberculosis to identify PknE substrates during NO stress, and identified 68 phosphoproteins by combining 2D PAGE MS with phospho-serine and phospho-threonine specific antibodies. The candidate PknE substrates identified in this manner may play a role in dormancy within the macrophage, which may have implications for virulence, once again highlighting the importance of phosphorylation in allowing the bacterium to respond effectively to the host environment. Given this information, it is somewhat surprising that there is evidence that the pknE gene is non-essential for growth of M. tuberculosis in culture (Sassetti et al., 2003). Indeed, in culture-based models it seems that only three of the STPKs available to M. tuberculosis are individually essential: PknA and PknB for growth (Wehenkel et al., 2008), and PknG for survival of M. tuberculosis within the macrophage. Walburger et al. (2004) demonstrated that a strain of M. bovis BCG carrying an inactivated pknG gene showed no differences in growth or cell morphology in liquid medium compared to the wild type, whereas the mutant was unable to survive inside a macrophage because the bacteria were no longer able to prevent lysosomal fusion. In many pathogenic bacteria, but particularly in M. tuberculosis, we are increasingly aware of the complexity of the host/pathogen interaction during disease progression, while being limited to culture or animal based models in our attempts to understand it<sup>1</sup>. The fact that M. tuberculosis has evolved to exist in the intracellular space is highlighted by the metabolic changes which are observed in intracellular M. tuberculosis (Lee et al., 2013), but we have yet to explore the consequences of phosphorylation for M. tuberculosis in vivo. It is conceivable that STPKs have different activity in the intracellular environment than in culture-based systems, and high-throughput phosphoproteomic analysis of intracellular M. tuberculosis would allow us to better understand the role of phosphorylation in the host pathogen interaction.

# Identifying Functional Phosphosites for Further Characterization

The aim of discovery-based proteomics has been to catalog as many proteins as possible in a sample, with the intent being to better understand a biological condition or response. However, the volumes of data generated in this manner have not necessarily achieved a deeper understanding of the biological systems in question. This problem is equally confounding, if not more so, in phosphoproteomic investigations, since it is difficult to attribute biological significance to a discrete phosphorylation event. Many of the detected phosphorylation events reported by these large-scale studies require validation before we can begin to determine their function. To this end, meta-analysis of the available data may provide some insight for future investigations. Although the above-mentioned mycobacterial phosphoproteomic studies were conducted using different MS-based methodologies, and in different strains of mycobacteria, there are discrete phosphorylation events that were commonly observed in several datasets, and are therefore unlikely to be random. A phosphopeptide which is observed in more than one study, or across several mycobacterial strains, may therefore be the best starting point for further investigation in M. tuberculosis. Here we present a compilation of commonly detected phosphopeptides/sites which may be of use for future functional phosphoproteomic investigations in the mycobacteria (Supplementary Table 2).

<sup>1</sup>MISSING:pmid:22320122. MISSING:pmid:22320122. (2016).

A total of 194 phosphopeptides representing 148 proteins were found in two or more datasets according to the available published supplementary information. To facilitate comparison at the protein level, the peptide sequences were matched to H37Rv protein identifiers using the Protein Information Resource (PIR) batch peptide match tool against the UniProt H37Rv proteome (Chen et al., 2013). GO analysis of the proteins corresponding to these shared phosphopeptides using STRAP 1.5 (Bhatia et al., 2009) revealed that the associated GO terms relate to broad regulatory functions, such as the regulation of cell metabolism, cellular processes, and growth (**Figures 1A,B**). While this is expected given the established role of phosphorylation in metabolic regulation in bacteria (Kochanowski et al., 2015), it is also potentially noteworthy, as we are only beginning to unravel the importance of metabolic regulation for intracellular M. tuberculosis and how this relates to virulence (Eisenreich et al., 2010). Of particular interest are instances where the more virulent clinical strain is phosphorylated on a different residue compared to the other strains, as in the example of the AEASIETPTPVQSQR peptide of TatA, a Sec-independent protein translocase protein, which was phosphorylated on T7 and/or T9 in M. bovis BCG and M. tuberculosis H37Rv but on S4 in the clinical strain. The functional significance of this difference remains unknown and should be validated and investigated, particularly in light of the contribution of the Tat pathway to virulence in M. tuberculosis (Feltcher et al., 2010). It should be noted that the utility of GO analysis in mycobacteria is limited by the availability of GO annotations and other resources such as KEGG pathway representation, for which coverage is generally poor.
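The selection of phosphopeptides seen in two or more datasets, and the comparison of site positions between strains, can be sketched as follows. This is a hedged illustration rather than the procedure used to build Supplementary Table 2: the data structure, function names and the second peptide are hypothetical, and the "T7 and/or T9" observation for the TatA peptide is simplified here to both sites.

```python
from collections import defaultdict


def shared_phosphopeptides(datasets, min_studies=2):
    """Return peptides reported in at least `min_studies` datasets.

    `datasets` maps a study label to {peptide sequence: set of site labels}.
    """
    seen = defaultdict(dict)
    for study, peptides in datasets.items():
        for peptide, sites in peptides.items():
            seen[peptide][study] = set(sites)
    return {p: by_study for p, by_study in seen.items() if len(by_study) >= min_studies}


def site_matches_sequence(peptide, site):
    """Check that a label such as 'T7' agrees with the residue at that position."""
    residue, position = site[0], int(site[1:])
    return 0 < position <= len(peptide) and peptide[position - 1] == residue


def sites_differ(by_study):
    """True if the studies do not all report the same set of sites."""
    site_sets = list(by_study.values())
    return any(s != site_sets[0] for s in site_sets[1:])


if __name__ == "__main__":
    datasets = {
        "M. bovis BCG": {"AEASIETPTPVQSQR": {"T7", "T9"}},
        "M. tuberculosis H37Rv": {"AEASIETPTPVQSQR": {"T7", "T9"}},
        "clinical strain": {
            "AEASIETPTPVQSQR": {"S4"},
            "AAASAAAK": {"S4"},  # hypothetical peptide seen in one dataset only
        },
    }
    for peptide, by_study in shared_phosphopeptides(datasets).items():
        print(peptide, by_study, "sites differ:", sites_differ(by_study))
```

Matching on the exact peptide sequence is a deliberate simplification; peptides with missed cleavages or strain-specific substitutions would need to be mapped to protein coordinates before comparison.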

Differential phosphorylation of the STPKs themselves can alter their enzymatic activity, and is another possible mechanism for altered pathogenicity in M. tuberculosis (Chopra et al., 2003; Durán et al., 2005). Supplementary Table 2 highlights that phosphorylation is commonly detected in PknA, B, D, E, G, and H in these mycobacteria; however, the differences in localized phosphosites between the virulent and less virulent strains are more pronounced in peptides corresponding to PknA, D, and G. Given the established importance of PknA (Singh et al., 2006) and PknG (Walburger et al., 2004) for survival within the macrophage, the biological significance of these differences in phosphorylation and how they contribute to virulence should now be ascertained. Forrellad et al. (2013) published a summary of the known virulence factors in the M. tuberculosis complex and their putative contribution to virulence. Cross-referencing their table of results with ours identified known virulence factors that are also commonly detected in these phosphoproteomic datasets. These proteins and their putative role in virulence in the M. tuberculosis complex are presented in **Table 2**. Included among these are some of the previously mentioned STPKs, as well as the proteins KatG, EspR, and IdeR.

# CURRENT STATE OF PHOSPHOPROTEOMICS

To date, the mycobacterial phosphoproteome has been described qualitatively, and many high-confidence phosphosites have been identified. However, a complete description of the phosphoproteome should include quantitative information for the specific phosphosite in question as well as for the phosphorylated protein. In addition, most of the available phosphoproteomic data is for bacterial culture under a single condition, and yet we recognize that phosphorylation is dynamic and can change rapidly (Macek et al., 2009). To address this, it is important to incorporate multiple time points and experimental conditions in future phosphoproteomic investigations. These objectives are particularly challenging for mass spectrometry-based proteomics, and despite promising technological advances the mycobacterial phosphoproteome has not yet been quantitatively assessed (de la Fuente van Bentem et al., 2008). Many quantitative proteomic tools are applicable to bacterial phosphoproteomics and have been discussed in detail elsewhere (Jers et al., 2008; Macek et al., 2009); however, not all of these are suitable for the mycobacteria. SILAC, for example, has been used successfully to quantify proteins in the bacterium Bacillus subtilis (Ravikumar et al., 2014), but is not currently possible in M. tuberculosis because the available lysine-deficient mutants are not viable for SILAC. Alternative quantitative methods include label-free quantitation, whose cost effectiveness allows multiple time points/conditions to be explored, and dimethyl labeling, which is a promising alternative to iTRAQ and other label-based techniques (Lau et al., 2014). iTRAQ has been successfully used to quantify the phosphorylation of arginine in B. subtilis (Schmidt et al., 2014), and is therefore suitable for measuring S/T/Y phosphorylation in bacterial systems, but is limited by the number of samples that can be analyzed concurrently and by the high cost of the iTRAQ reagents. Validation of phosphoproteomic data using targeted proteomics along with specialized analysis software such as Skyline is an exciting prospect, as these powerful tools can validate specific phosphosites as well as provide quantitative information. The absolute quantitation of modified peptides is possible through a combination of heavy-labeled AQUA peptides and selected reaction monitoring (SRM) MS, although the successful use of this strategy has not yet been reported in bacteria (Kirkpatrick et al., 2005).

# RECOMMENDATIONS FOR FUTURE RESEARCH

The field of phosphoproteomics would benefit greatly from an integrated, easily accessible database containing all available information, which would facilitate meta-analysis of PTMs between multiple datasets. Furthermore, such a database for all known PTMs would allow for more in-depth analysis of the cross-talk between bacterial PTMs and how this may relate to pathogenicity, which has been discussed in a review by Soufi et al. (2012). Such a resource could provide a standardized format for reporting PTMs, as well as the opportunity to automatically update database accession numbers for modified proteins which would in turn facilitate more meaningful comparison between different species or strains. Significant challenges exist prior to the development of such a database, including the lack of standardized reporting in currently published studies. However, the capacity to detect PTMs using MS will only increase with improving technology and the interpretation and management of the data thus generated is therefore crucial if we are to translate this into meaningful clinical applications. Through better understanding of the function of regulatory PTMs in M. tuberculosis, we may reveal the means to control or cure it.

# AUTHOR CONTRIBUTIONS

CA wrote the original draft of this paper and read and edited the final draft. BC edited the draft submitted by CA, compiled data from phosphoproteomic reviews, performed the meta-data analysis, and put together the final draft of the paper. NS supervised revisions of the paper, provided editing support, and directed the topic to be covered by the review. JB provided supervision and guidance throughout and offered editing support.

# ACKNOWLEDGMENTS

BC and NS thank the South African Medical Research Council for Fellowships. JB acknowledges the South African National Research Foundation for the Research Chair grant. NS acknowledges support in part by the National Research Foundation of South Africa (Grant Numbers 98963 and 95984).

# SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00141

# REFERENCES

Bakal, C. J., and Davies, J. E. (2000). No longer an exclusive club: eukaryotic signalling domains in bacteria. Trends Cell Biol. 10, 32–38. doi: 10.1016/S0962-8924(99)01681-5

substrates of the kinase PrkC and phosphatase PrpC. Mol. Cell. Proteomics 13, 1965–1978. doi: 10.1074/mcp.M113.035949


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Calder, Albeldas, Blackburn and Soares. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Mass Spectrometry Targeted Assays as a Tool to Improve Our Understanding of Post-translational Modifications in Pathogenic Bacteria

Nelson C. Soares \* and Jonathan M. Blackburn

*Division of Chemical and Systems Biology, Department of Integrative Biomedical Sciences, Faculty of Health Sciences, Institute of Infectious Diseases and Molecular Medicine, University of Cape Town, Cape Town, South Africa*

Keywords: post-translational modification, protein phosphorylation, proteomics, phosphoproteomics, targeted proteomics, mass spectrometry, PRM, SRM

#### Edited by:

*Ivan Mijakovic, Chalmers University of Technology, Sweden*

#### Reviewed by:

*Dana Reichmann, Hebrew University of Jerusalem, Israel*

*Christophe Grangeasse, Centre National de la Recherche Scientifique, France*

\*Correspondence:

*Nelson C. Soares nelson.dacruzsoares@uct.ac.za*

#### Specialty section:

*This article was submitted to Microbial Physiology and Metabolism, a section of the journal Frontiers in Microbiology*

> Received: *09 June 2016* Accepted: *21 July 2016* Published: *04 August 2016*

#### Citation:

*Soares NC and Blackburn JM (2016) Mass Spectrometry Targeted Assays as a Tool to Improve Our Understanding of Post-translational Modifications in Pathogenic Bacteria. Front. Microbiol. 7:1216. doi: 10.3389/fmicb.2016.01216*

# ADVANCES IN MASS SPECTROMETRY-BASED PROTEOMICS IMPACT THE ANALYSIS OF BACTERIAL PTMS

Post-translational modifications (PTMs) play a vital role in maintaining protein function, regulation, signaling, and many other important cellular processes (Anonsen et al., 2012). For several years, protein PTMs were thought to be unique to eukaryotes; only recently has it been demonstrated that bacteria also contain a wide variety of PTMs, including those previously described in eukaryotic cells, for example Ser/Thr/Tyr phosphorylation, lysine acetylation, glycosylation, ubiquitination, and glutathionylation, as reviewed elsewhere (Macek and Mijakovic, 2011; Soufi et al., 2012; Michard and Doublet, 2015; Ravikumar et al., 2015). PTMs have been linked to numerous essential bacterial cellular events, including cell division and morphology, and more recently they have also been associated with pathogenicity (Michard and Doublet, 2015; Ravikumar et al., 2015); virulence (Lin et al., 2009; Calder et al., 2015; Fortuin et al., 2015); and drug resistance (Soares et al., 2014; Lai et al., 2016). The use of powerful new proteomics platforms has contributed significant insight into the regulatory functions of bacterial PTMs. More than a decade ago, the first attempts to catalog bacterial phosphoproteomes were performed using 2D gel electrophoresis-based (2-DE) approaches, first combined with <sup>33</sup>P radiolabelling and later with phosphoprotein-specific dyes (Cortay et al., 1986; Bendt et al., 2003; Lévine et al., 2006). However, such studies were limited by a number of technical challenges, including those inherent to gel-based approaches (Pietrogrande et al., 2003; Monteoliva and Albar, 2004). Consequently, few phosphoproteins and even fewer phosphorylation sites were reported at the time.

As the field of proteomics progressed from 2-DE gel-based methodologies to modern liquid chromatography tandem mass spectrometry (LC-MS/MS) based workflows, significant advances in the analysis of bacterial PTMs were achieved. The introduction of modern mass spectrometry (MS), specifically Orbitrap mass spectrometers, brought new momentum to the field. In 2006, Mann and colleagues proposed a gel-free approach to explore the phosphoproteome of eukaryotic cells in depth, reporting the identification of more than 6600 phosphorylation sites (Olsen et al., 2006). Their workflow involved the enzymatic digestion of complex protein extracts into peptides, followed by enrichment of phosphopeptides by a combination of strong cation exchange and TiO<sub>2</sub> chromatography. Macek et al. (2007) used similar technology to describe a site-specific, in vivo phosphoproteome of a Gram-positive model organism, Bacillus subtilis. The study represented a major breakthrough in the bacterial phosphoproteomic field: the reported B. subtilis phosphoproteome went from 16 phosphorylation sites on eight proteins to 103 unique phosphopeptides on 78 proteins (Macek et al., 2007). This order-of-magnitude jump then inspired many subsequent studies that successfully applied variants of the protocol described by Macek et al. (2007), in particular those that employed phosphopeptide enrichment via immobilized metal affinity chromatography (IMAC; Thingholm and Jensen, 2009) to perform high-throughput analysis of different bacterial phosphoproteomes (Ravichandran et al., 2009; Qu et al., 2013).

Today, several bacterial phosphoproteomes have been successfully characterized, including those of some important human pathogenic bacteria: Pseudomonas aeruginosa (Ravichandran et al., 2009); Klebsiella pneumoniae (Lin et al., 2009); Mycobacterium spp. (Prisic et al., 2010; Fortuin et al., 2015; Nakedi et al., 2015; Calder et al., 2016); Helicobacter pylori (Ge et al., 2011); Streptomyces spp. (Manteca et al., 2011); Acinetobacter baumannii (Soares et al., 2014; Lai et al., 2016); Staphylococcus aureus (Basell et al., 2014); and enteropathogenic Escherichia coli (Scholz et al., 2015). The development of new high-throughput mass spectrometric instrumentation (Michalski et al., 2011b, 2012; Scheltema et al., 2014; Erickson et al., 2015) together with continuous optimization of LC-MS/MS shotgun proteomics workflows (Michalski et al., 2011a; Kelstrup et al., 2012; Pirmoradian et al., 2013) has resulted in increased resolution and sensitivity and improved fragmentation, and has contributed to the successful identification and localisation of phosphosites. Such approaches have also now allowed for the identification of other bacterial PTMs such as lysine acetylation (Zhang et al., 2009; Weinert et al., 2013; Wu et al., 2013; Liu et al., 2014; Ouidir et al., 2015; Xie et al., 2015), ubiquitination (Valkevich et al., 2014), and glycosylation (Scott et al., 2011; Anonsen et al., 2012; Smith et al., 2014). It is therefore well established that large-scale LC-MS/MS shotgun proteomics is the technique of choice for identifying bacterial post-translationally modified peptides/proteins.

Quantifying the state of a modified peptide/protein under given experimental conditions and/or during different cellular processes is important for determining its regulatory function and ultimately linking it to a specific molecular mechanism. Recent advances in quantitative LC-MS/MS-based proteomics have made possible the reliable quantification of thousands of phosphorylated peptides in eukaryotic cells (Pan et al., 2008) and, likewise, a number of studies have successfully applied different quantification methods to describe the dynamics of bacterial phosphoproteomes under different cellular states or experimental conditions. For instance, stable isotope labeling by amino acids in cell culture (SILAC) has recently been applied to establish the dynamics of both the proteome and phosphoproteome of E. coli during five different growth phases in liquid cell culture (Soares et al., 2013). In that study, 76 Ser/Thr/Tyr phosphorylation events were quantified in all growth phases. Importantly, the use of SILAC enabled the measurement of median occupancies of phosphorylation sites, which, unlike in eukaryotic cells (Pan et al., 2008), were generally low (<12%) (Soares et al., 2013). SILAC has been used successfully on a number of occasions to determine the varying levels of Ser/Thr/Tyr phosphorylation events. However, the issues associated with SILAC when applied to bacteria are well documented (Soufi and Macek, 2014) and are particularly limiting in cases such as the mycobacteria, where no viable lysine-deficient mutant is available. Alternatively, chemical labeling, such as peptide dimethyl labeling (Boersema et al., 2009), has proven a convenient and efficient method to obtain differential quantification of bacterial PTMs (Spat et al., 2015; De Keijzer et al., 2016) and has gained increasing popularity. Additionally, label-free quantification (LFQ) remains an attractive quantitative approach and has been applied on a number of occasions to investigate PTM levels in bacteria. Within this context, although phospho-enrichment strategies combined with LFQ analyses look promising, it has been pointed out (Rosenberg et al., 2015) that low occupancy among bacterial phosphorylation events may compound possible inconsistencies among biological replicates.
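How SILAC ratios translate into site occupancy can be illustrated with a short worked example. The sketch below assumes, as a simplification not stated in the text, that a site exists in only two forms (phosphorylated and unmodified) and that three ratios between condition 2 and condition 1 are available: the phosphopeptide ratio `x`, the ratio of the unmodified counterpart peptide `y`, and the protein ratio `z`. Conservation of the total protein amount then gives the occupancy in condition 1 as (z - y)/(x - y). The function name and the example ratios are hypothetical.

```python
def silac_occupancy(x, y, z):
    """Return (occupancy in condition 1, occupancy in condition 2).

    x: phosphopeptide ratio, y: unmodified peptide ratio, z: protein ratio
    (all expressed as condition 2 / condition 1). Assumes a two-state site.
    """
    if x <= 0 or y <= 0 or z <= 0:
        raise ValueError("ratios must be positive")
    if x == y:
        raise ValueError("x equals y: occupancy cannot be determined")
    # From x*a + y*(1 - a) = z, where a is the phosphorylated fraction in condition 1.
    occ1 = (z - y) / (x - y)
    # The phosphorylated amount scales by x while the protein amount scales by z.
    occ2 = occ1 * x / z
    if not (0.0 <= occ1 <= 1.0 and 0.0 <= occ2 <= 1.0):
        raise ValueError("ratios are inconsistent with a two-state model")
    return occ1, occ2

# Illustrative ratios: phosphopeptide doubles, unmodified form drops slightly, protein unchanged.
occ1, occ2 = silac_occupancy(x=2.0, y=0.9, z=1.0)
print(f"occupancy condition 1: {occ1:.1%}, condition 2: {occ2:.1%}")
```

With low-occupancy sites, `y` and `z` are nearly equal, so the numerator is a small difference between two measured ratios; this is one way to see why low occupancy makes quantification sensitive to variation between replicates.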

MS-based detection of PTMs normally relies on the mass shift introduced by the site-specific modification. However, modified peptides are often sub-stoichiometric and usually occur at low abundance, so an enrichment step (such as those mentioned above) specific to the PTM of interest is typically carried out prior to MS analysis. The enrichment step is, however, a major drawback in many bacterial PTM analyses, since it typically requires large amounts of starting material, with protein amounts on the order of 5–9 mg (Soufi et al., 2008; Soares et al., 2013, 2014; Spat et al., 2015). While this can easily be achieved with liquid cell culture, it may represent a major technical challenge for other studies, such as those looking at host-pathogen interactions at the site of infection, where it is virtually impossible to isolate the number of bacterial cells or the amount of protein needed for those protocols. Additionally, multiple rounds of enrichment remain a source of technical variation that can bias downstream quantitative analyses. Ultimately, the efficiency and reproducibility of the enrichment protocol across multiple samples determines both coverage and quantitative accuracy.
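The mass shift on which detection relies can be made concrete with a small calculation. The following sketch uses standard monoisotopic residue masses and the phosphorylation shift of +79.96633 Da (HPO3) to compare the unmodified and singly phosphorylated forms of a peptide. The example sequence is arbitrary, and the function names are illustrative rather than taken from any particular software package.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565    # H2O added for the free peptide termini
PROTON = 1.007276    # mass of a proton
PHOSPHO = 79.966331  # HPO3 added per phosphorylated Ser/Thr/Tyr

def peptide_mass(sequence, n_phospho=0):
    """Neutral monoisotopic mass of a peptide carrying n_phospho phosphate groups."""
    sites = sum(sequence.count(aa) for aa in "STY")
    if not 0 <= n_phospho <= sites:
        raise ValueError("n_phospho must be between 0 and the number of S/T/Y residues")
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + n_phospho * PHOSPHO

def mz(mass, charge):
    """m/z of the protonated ion [M + zH]z+."""
    return (mass + charge * PROTON) / charge

seq = "AEASIETPTPVQSQR"  # arbitrary example sequence
for n in (0, 1):
    m = peptide_mass(seq, n)
    print(f"{n} phospho: M = {m:.4f} Da, [M+2H]2+ = {mz(m, 2):.4f}")
```

Note that the shift identifies the presence of a phosphate group but not its position: every singly phosphorylated form of the same peptide has the same precursor mass, so site localisation depends on the fragment ions.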

# MS BASED TARGETED METHODOLOGIES WILL ALLOW DETECTION AND QUANTIFICATION OF LOW ABUNDANT BACTERIAL PTMS

Emerging targeted MS methodologies, namely selected reaction monitoring (SRM) and parallel reaction monitoring (PRM), are perhaps the latest breakthrough within the field. Specifically, targeted mass spectrometry offers unequaled capability to characterize and quantify a specific set of proteins/peptides reproducibly in any biological sample. In this hypothesis-driven approach (Picotti et al., 2013), only a small number of peptides are used as surrogate markers for the protein of interest, and these are selectively measured in predefined m/z ranges and retention time windows. SRM (often called MRM, multiple reaction monitoring) is a targeted approach to quantitate pre-selected peptides by monitoring specific precursor-to-product ion transitions in a triple-quadrupole mass spectrometer (Picotti et al., 2010, 2013). While SRM measurements use a low-resolution MS instrument, PRM instead benefits from the capabilities of high-resolution hybrid quadrupole-Orbitrap instruments (Peterson et al., 2012). The increased quality of hybrid quadrupole-Orbitrap analyses has consequently improved the quality of the targeted methods, mainly in terms of mass accuracy at the fragment-ion level, which allows target compounds to be distinguished from undesired background ions and significantly lowers the detection limit.

In mammalian systems, SRM assays have been successfully applied in a large-scale targeted proteomics experiment of a phosphorylation network. In the first study of its kind, Wolf-Yadlin et al. (2007) applied SRM to monitor the dynamics over time of tyrosine phosphorylation events in EGFR signaling after EGF stimulation. The authors used a combination of iTRAQ peptide labeling and phosphotyrosine peptide immunopurification coupled with IMAC to enrich for phosphotyrosine-containing peptides (Wolf-Yadlin et al., 2007). More recently, Adachi et al. (2016) described the application of a large-scale phosphoproteome analysis and SRM-based quantification to develop a strategy for the systematic discovery and validation of biomarkers. Their two-step approach uses typical IMAC-based phosphoenrichment coupled to shotgun MS-based proteomic analysis to identify differentially modulated phosphopeptides; identified candidates are then validated by SRM analysis (Adachi et al., 2016). In another recent, well-designed application, Altelaar and co-workers suggested a strategy to monitor signal transduction pathways: by combining targeted quantitative proteomics with highly selective phosphopeptide enrichment, the authors monitored the phosphorylation dynamics of the PI3K-mTOR and MAPK signaling network (De Graaf et al., 2015). Interestingly, in order to increase the success rate of the phosphopeptide SRM assays, the authors focused on phosphopeptides reported previously by the group and in publicly available shotgun proteomics data. The rationale for this approach is that phosphorylation alters the local charge distribution, which may interfere with proteolytic cleavage; prior knowledge of the phosphosite would thus enable a more efficient SRM assay. The study used synthetic stable isotope standard phosphopeptides for increased accuracy in quantification (De Graaf et al., 2015).

Despite all this, measuring PTM events through SRM can be challenging and in most cases involves time-consuming assay optimization and reliance on synthetic peptide standards (Lawrence et al., 2016), which can be somewhat discouraging in practice. Recently, however, Lawrence et al. (2016) proposed the use of PRM as a straightforward and efficient means for large-scale targeted phosphoproteomic analysis. Here, instead of using databases of previously reported human phosphopeptide sequences, the authors report a highly comprehensive global shotgun phosphoproteomic analysis that resulted in the identification of more than 7.5 million phosphopeptide spectral matches corresponding to 109,611 phosphorylation sites (Lawrence et al., 2016). Subsequently, the authors used the generated information, including the respective phosphopeptide retention times, to develop a plug-and-play phosphoproteomic system based on sets of targeted, label-free phosphopeptide PRM assays in order to interrogate the dynamics of the IGF1/AKT signaling pathway. Importantly, the study demonstrated that, without sample fractionation and/or phosphopeptide enrichment, label-free PRM quantification offers a rapid assay for measuring virtually any known human phosphorylation event (Lawrence et al., 2016).
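The principle of measuring a target only within a predefined m/z range and retention time window, and the benefit of higher fragment mass accuracy, can be sketched as a simple filter. All values below are assumptions chosen for illustration: the target m/z and retention time are invented, a tolerance of ±0.35 Da stands in for unit-resolution quadrupole selection, and 10 ppm stands in for high-resolution fragment matching. None of these are parameters reported by the studies cited above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Target:
    name: str
    fragment_mz: float  # expected fragment ion m/z
    rt_min: float       # scheduled retention time (minutes)
    rt_window: float    # full width of the retention time window (minutes)

def within_rt(target, rt):
    """True if the observed retention time falls inside the scheduled window."""
    return abs(rt - target.rt_min) <= target.rt_window / 2

def match_da(target, mz, rt, tol_da=0.35):
    """Low-resolution style match: absolute m/z tolerance in Da."""
    return within_rt(target, rt) and abs(mz - target.fragment_mz) <= tol_da

def match_ppm(target, mz, rt, tol_ppm=10.0):
    """High-resolution style match: relative m/z tolerance in parts per million."""
    ppm_error = abs(mz - target.fragment_mz) / target.fragment_mz * 1e6
    return within_rt(target, rt) and ppm_error <= tol_ppm

# Hypothetical target and observed signals (label, m/z, retention time in minutes).
target = Target("hypothetical_phosphopeptide_fragment", fragment_mz=800.4000,
                rt_min=25.0, rt_window=4.0)
signals = [
    ("true fragment", 800.4030, 25.3),     # a few ppm from the expected m/z
    ("background ion", 800.6000, 24.8),    # within 0.35 Da but roughly 250 ppm away
    ("off-schedule ion", 800.4010, 40.0),  # correct m/z, outside the retention window
]
for label, mz, rt in signals:
    print(f"{label}: Da match={match_da(target, mz, rt)}, ppm match={match_ppm(target, mz, rt)}")
```

In this toy example the background ion passes the wide absolute tolerance but is rejected by the ppm tolerance, which mirrors the argument that high-resolution fragment measurement helps separate target signals from co-eluting interferences.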

Targeted methodologies have been successfully applied to investigate aspects of the proteome of pathogenic bacteria (Lange et al., 2008; Karlsson et al., 2012; Schubert and Aebersold, 2015; Schubert et al., 2015; Peters et al., 2016). A noteworthy highlight is that Ruedi Aebersold and colleagues have recently generated a library of targeted SRM assays for ∼97% of the 4012 annotated Mycobacterium tuberculosis proteins and were able to reproducibly quantify ∼72% of the theoretical M. tuberculosis proteome in single unfractionated runs on a triple quadrupole MS (Schubert et al., 2013). As such, the generated M. tuberculosis proteome library represents a valuable experimental resource that now, in theory, enables researchers to interrogate and quantify essentially the entire proteome of M. tuberculosis in a single experiment. This allows for greater understanding of, for example, differences in expressed pathogenicity and virulence between clinical isolates. However, in order to further investigate mycobacterial or any other bacterial PTMs in a large-scale, targeted, and quantitative manner, it will first be necessary to follow a workflow similar to that used in mammalian cells, as suggested by Lawrence et al. (2016). Such a strategy, based on prior knowledge of the post-translationally modified proteins/peptides, is sensitive enough to detect low-abundance modified peptides. To date, much of the work on pathogenic bacteria using MS-based proteomics has focused on cataloging, and in some cases quantifying, PTMs that occur under in vitro conditions. Although informative, such reports come with the caveat that the studied conditions are often detached from the unique microenvironment that pathogenic bacteria are exposed to during host infection and/or colonization (e.g., inside the macrophage). Thus, in order to gain meaningful insights into the dynamics and role of different bacterial PTMs during host-pathogen interactions, it is crucial that researchers in the field join efforts to once again take advantage of recent advances in mass spectrometry instrumentation and develop targeted, quantitative PTM workflows that enable detection and accurate quantification of low-abundance modified analytes, ideally circumventing the need for additional PTM enrichment steps in the process.

# AUTHOR CONTRIBUTIONS

NS wrote the original draft of this paper and read and edited the final draft. JB critically discussed the content of the manuscript and provided editing support.

# FUNDING

This work was supported by a research grant from the National Research Foundation of South Africa (NRF; grant numbers 98963 and 95984). The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflicts with subject matter or materials discussed in the manuscript.

# ACKNOWLEDGMENTS

NS thanks the South African Medical Research Council (MRC) for a fellowship and the NRF for funding support. JB thanks the NRF for a Research Chair grant (64760). The authors would like to thank Kim Gurwitz for help with proofreading this manuscript.

# REFERENCES


hydrophilic interaction chromatography and parallel fragmentation by CID, higher energy collisional dissociation, and electron transfer dissociation MS applied to the N-linked glycoproteome of Campylobacter jejuni. Mol. Cell. Proteomics 10, M000031-MCP201. doi: 10.1074/mcp.M000031-MCP201


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Soares and Blackburn. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.