EMERGING APPROACHES FOR TYPING, DETECTION, CHARACTERIZATION, AND TRACEBACK OF *ESCHERICHIA COLI,* 2nd EDITION

EDITED BY: Pina M. Fratamico, Chitrita DebRoy and David S. Needleman

PUBLISHED IN: Frontiers in Microbiology

ISSN 1664-8714 ISBN 978-2-88945-433-4 DOI 10.3389/978-2-88945-433-4

## **EMERGING APPROACHES FOR TYPING, DETECTION, CHARACTERIZATION, AND TRACEBACK OF ESCHERICHIA COLI, 2nd EDITION**

Topic Editors:

**Pina M. Fratamico,** United States Department of Agriculture, Eastern Regional Research Center, USA

**Chitrita DebRoy,** Pennsylvania State University, USA

**David S. Needleman,** United States Department of Agriculture, Eastern Regional Research Center, USA

Scanning electron micrograph of Shiga toxin-producing *E. coli* O157:H7 on a lettuce leaf. Image colorized by Susan Reed

Pathogenic *Escherichia coli* strains cause a large number of diseases in humans, including diarrhea, hemorrhagic colitis, hemolytic uremic syndrome, urinary tract infections, and neonatal meningitis, while in animals they cause diseases such as calf scours and mastitis in cattle, post-weaning diarrhea and edema disease in pigs, and peritonitis and airsacculitis in chickens. The different *E. coli* pathotypes are characterized by the presence of specific sets of virulence-related genes. Therefore, it is not surprising that pathogenic *E. coli* constitutes a genetically heterogeneous family of bacteria, and they are continuing to evolve. Rapid and accurate molecular methods are critically needed to detect and trace pathogenic *E. coli* in food and animals. They are also needed for epidemiological investigations to enhance food safety, as well as animal and human health, and to minimize the size and geographical extent of outbreaks.

The serotype of *E. coli* strains has traditionally been determined using antisera raised against the >180 different O- (somatic) and 53 H- (flagellar) antigens. However, serotyping has many drawbacks: it is labor-intensive and time-consuming; the antisera cross-react with different serogroups; antisera are available only in specialized laboratories; and many strains are non-typeable. Molecular serotyping targeting O-group-specific genes within the *E. coli* O-antigen gene clusters and genes that encode the different flagellar types offers an improved approach for determining the *E. coli* O- and H-groups. Furthermore, molecular serotyping can be coupled with determination of specific sets of virulence genes carried by the strain, offering the possibility to determine O-group, pathotype, and pathogenic potential simultaneously. Sequencing of the O-antigen gene clusters of all of the known O-groups of *E. coli* is now complete, and the sequences have been deposited in the GenBank database. The sequence information has revealed that some *E. coli* serogroups have identical sequences, while others have point mutations or insertion sequences and type as different serogroups in serological reactions. There are also a number of other ambiguities in serotyping that need to be resolved. Furthermore, new *E. coli* O-groups are being identified. Therefore, there is an essential need to resolve these issues and to revise the *E. coli* serotype nomenclature based on these findings.

There are emerging technologies that can potentially be applied for molecular serotyping and for detection and characterization of *E. coli*. On a related topic, the genome sequences of thousands of *E. coli* strains have been deposited in GenBank, and this information is revealing unique markers such as CRISPR (clustered regularly interspaced short palindromic repeats) and virulence gene markers that could be used to identify *E. coli* pathotypes. Whole genome sequencing now provides the opportunity to study the role of horizontal gene transfer in the evolution and emergence of pathogenic *E. coli* strains. Whole genome sequencing approaches are being investigated for genotyping and outbreak investigation for regulatory and public health needs; however, there is a need to establish bioinformatics pipelines able to handle large amounts of data as we move toward the use of genetic approaches for non-culture-based detection and characterization of *E. coli* and for outbreak investigations.

**Citation:** Fratamico, P. M., DebRoy, C., Needleman, D. S., eds. (2018). Emerging Approaches for Typing, Detection, Characterization, and Traceback of *Escherichia coli,* 2nd Edition. Lausanne: Frontiers Media. doi: 10.3389/978-2-88945-433-4

# Table of Contents

*Editorial: Emerging Approaches for Typing, Detection, Characterization, and Traceback of* **Escherichia coli**

Pina M. Fratamico, Chitrita DebRoy and David S. Needleman

#### **Molecular serotyping and subtyping of** *E. coli*

#### **Prevalence, detection, and characterization of pathogenic** *E. coli*

*Characterization of Shiga Toxin Subtypes and Virulence Genes in Porcine Shiga Toxin-Producing* **Escherichia coli**

Gian Marco Baranzoni, Pina M. Fratamico, Jayanthi Gangiredla, Isha Patel, Lori K. Bagi, Sabine Delannoy, Patrick Fach, Federica Boccia, Aniello Anastasio and Tiziana Pepe

*Comparison of Methods for the Enumeration of Enterohemorrhagic* **Escherichia coli** *from Veal Hides and Carcasses*

Brandon E. Luedtke and Joseph M. Bosilevac

Christopher H. Sommers, O. J. Scullen and Shiowshuh Sheen

*Surveillance of Extended-Spectrum Beta-Lactamase-Producing* **Escherichia coli** *in Dairy Cattle Farms in the Nile Delta, Egypt*

Sascha D. Braun, Marwa F. E. Ahmed, Hosny El-Adawy, Helmut Hotzel, Ines Engelmann, Daniel Weiß, Stefan Monecke and Ralf Ehricht

#### **Molecular characterization and targets for detection and genetic profiling of pathogenic** *E. coli*

*Development of a High Resolution Virulence Allelic Profiling (HReVAP) Approach Based on the Accessory Genome of* **Escherichia coli** *to Characterize Shiga-Toxin Producing* **E. coli** *(STEC)*

Valeria Michelacci, Massimiliano Orsini, Arnold Knijn, Sabine Delannoy, Patrick Fach, Alfredo Caprioli and Stefano Morabito

*Revisiting the STEC Testing Approach: Using* **espK** *and* **espV** *to Make Enterohemorrhagic* **Escherichia coli** *(EHEC) Detection More Reliable in Beef*

Sabine Delannoy, Byron D. Chaves, Sarah A. Ison, Hattie E. Webb, Lothar Beutin, José Delaval, Isabelle Billet and Patrick Fach

#### **Whole genome sequencing applications for** *E. coli*

*Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing* **Escherichia coli** *(STEC) in the United States*

Rebecca L. Lindsey, Hannes Pouseele, Jessica C. Chen, Nancy A. Strockbine and Heather A. Carleton

Brigida Rusconi, Fatemeh Sanjar, Sara S. K. Koenig, Mark K. Mammel, Phillip I. Tarr and Mark Eppinger

# Editorial: Emerging Approaches for Typing, Detection, Characterization, and Traceback of Escherichia coli

Pina M. Fratamico<sup>1</sup>\*, Chitrita DebRoy<sup>2</sup> and David S. Needleman<sup>1</sup>

*<sup>1</sup> Agricultural Research Service, United States Department of Agriculture, Eastern Regional Research Center, Wyndmoor, PA, USA, <sup>2</sup> Department of Veterinary and Biomedical Sciences, E. coli Reference Center, The Pennsylvania State University, University Park, PA, USA*

Keywords: Escherichia coli, virulence factors, whole genome sequencing, serotyping, detection, subtyping

#### **Editorial on the Research Topic**

#### **Emerging Approaches for Typing, Detection, Characterization, and Traceback of Escherichia coli**

#### Edited by:

*Ludmila Chistoserdova, University of Washington, USA*

#### Reviewed by:

*Scott H. Harrison, North Carolina Agricultural and Technical State University, USA*

\*Correspondence:

*Pina M. Fratamico pina.fratamico@ars.usda.gov*

#### Specialty section:

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

Received: *28 September 2016* Accepted: *09 December 2016* Published: *27 December 2016*

#### Citation:

*Fratamico PM, DebRoy C and Needleman DS (2016) Editorial: Emerging Approaches for Typing, Detection, Characterization, and Traceback of Escherichia coli. Front. Microbiol. 7:2089. doi: 10.3389/fmicb.2016.02089*

Commensal E. coli inhabit the large intestines of humans and animals and are important in maintaining normal intestinal homeostasis. There are also many groups of disease-causing E. coli, including diarrheagenic and extra-intestinal pathogenic E. coli (ExPEC). E. coli strains have been identified primarily based on their O- and H-antigens, which define the E. coli serotype. There are approximately 188 somatic O-antigens, 74 capsular K-antigens, 53 flagellar H-antigens, and greater than 60 fimbrial F-antigens in E. coli, identified based on antigens that produce an immune response in animals. This research topic consists of articles based on the subject of an international workshop on "Emerging approaches for typing, detection and characterization of Escherichia coli" held at the Pennsylvania State University in 2015. This workshop brought together well-known scientists from throughout the world to provide a forum for the exchange of ideas with regard to examining the current serotype classification and nomenclature for E. coli, emerging pathotypes, and new technologies and whole genome sequencing (WGS) for detection, characterization, and outbreak investigation. Scientists who presented papers at the conference were affiliated with public health laboratories, regulatory agencies, academic institutions, and industry groups from the U.S., United Kingdom, France, Italy, Canada, Denmark, Germany, and Japan, as well as industry groups working on technologies for the characterization, detection, identification, and subtyping of E. coli. This workshop provided a forum to discuss different concepts and practices for typing E. coli based on O- and H-antigens and for characterization of pathotypes. Furthermore, there were discussions on the progress made in the area of WGS as a tool for E. coli typing, subtyping, characterization, diagnostics, and outbreak investigation, as well as on the evolution and emergence of highly pathogenic strains (Franz et al., 2014).

A mini review providing an overview of this research topic is "Advances in molecular serotyping and subtyping of Escherichia coli" (Fratamico et al.). Phenotypic methods, such as serotyping for O- and H-antigen determination, biotyping based on biochemical characteristics, or bacteriophage typing, have been used for differentiating and characterizing E. coli for many years; however, these methods are labor-intensive, time-consuming, and not always accurate. Recent advances in molecular techniques and DNA sequencing technologies have led to the development of typing and subtyping methods for E. coli that are more accurate and have better discriminatory power than phenotypic typing methods. WGS of E. coli is allowing more rapid and accurate identification of pathogenic strains and is replacing pulsed-field gel electrophoresis (PFGE), a gold standard for investigating food-borne disease outbreaks (Franz et al., 2014; Joensen et al., 2014). The necessity to analyze large amounts of data generated from WGS has also led to the development of new and more efficient bioinformatics pipelines. Lindsey et al. described and validated the use of a genotyping plug-in within BioNumerics® v7.5 to provide an accurate and cost-effective single workflow to replace the complex suite of workflows currently used to perform the bioinformatics analyses of WGS data.

Parsons et al. presented a review of various methods and strategies for detection and characterization of Shiga toxin-producing E. coli (STEC), including the use of various chromogenic agars, enzyme immunoassays, and qPCR. Their group at the Provincial Laboratory of Public Health in Canada has also utilized WGS, single nucleotide polymorphism (SNP), and k-mer analysis for epidemiological investigations, and they compared these methods to pulsed-field gel electrophoresis and multiple-locus variable number tandem repeat analysis. In agreement with other studies, they found that the sequence-based methods offer higher resolution (Salipante et al., 2015). Use of WGS to distinguish E. coli O157:H7 outbreak strains was presented by Rusconi et al. Using customized high-resolution bioinformatics sequence typing approaches, they determined the core genomes, mobilome plasticity, and SNPs. In addition to providing higher strain discriminatory power compared to currently used methods such as PFGE, sequence-based strategies offer the advantages of higher throughput and improved cost-effectiveness. Whole genome sequencing was compared to phenotypic serotyping results on non-O157 STEC isolated from human fecal specimens (Chattaway et al.). They found that most isolates for which the O-group could not be identified by serotyping could be O-typed using WGS data. Furthermore, WGS data provided more accurate stx-subtyping results compared to PCR. Thus, WGS provided more reliable results for strain identification and characterization compared to traditional serotyping and PCR, enabling a higher level of strain discrimination and the ability to predict the pathogenic potential.
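To illustrate what SNP and k-mer comparisons measure, the following is a minimal sketch and not the pipeline used by any of the groups cited above. It assumes two already-aligned sequences of equal length for the SNP count, and it uses a plain Jaccard distance over k-mer sets as a stand-in for k-mer analysis. The function names and the default `k` are hypothetical choices made for this example.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of a DNA sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=4):
    """Alignment-free distance: 1 - |shared k-mers| / |all k-mers|."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    union = a | b
    if not union:
        return 0.0  # both sequences shorter than k: nothing to compare
    return 1.0 - len(a & b) / len(union)


def snp_distance(aln_a, aln_b):
    """Count differing positions between two aligned sequences.

    Positions where either sequence has a gap or an ambiguous base
    (anything other than A, C, G, T) are ignored.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    valid = set("ACGT")
    return sum(
        1
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if x in valid and y in valid and x != y
    )
```

On real data, both measures would be computed over whole genomes or core-genome alignments, and two isolates differing by only a few SNPs would be candidates for belonging to the same outbreak cluster.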

WGS data on enteric pathogens are shedding light on the relationship between Shigella species and E. coli. Shigella species and enteroinvasive E. coli (EIEC) are very similar genetically, and it has been proposed that Shigella and EIEC be classified as a single pathovar of E. coli. The work of Pettengill et al., based on WGS data and phylogenetic analyses of EIEC and Shigella, also did not support distinct genera designations for these organisms. They suggested that Shigella be classified as EIEC and that common O-antigen designations be used. They also identified a panel of 404 SNP markers that could be used to discriminate among different phylogenetic clades.

Targeted sequencing in combination with PCR has furthered the characterization and typing of E. coli. The sequences of the O-antigen gene clusters of six STEC O-groups that were found to be non-typeable using a previously reported E. coli O-genotyping PCR system were published by Iguchi et al. They then developed specific PCR assays to identify these novel O-groups, which they designated as OgN1, OgN8, OgN9, OgN10, OgN12, and OgN31. Therefore, many strains whose O-groups cannot be identified by traditional serotyping or other methods may represent new E. coli O-groups that will require new O-group designations. For distinguishing the H1 and H12 H-types of E. coli, which have 97.5% identity in their fliCH1 and fliCH12 genes, Beutin et al. analyzed the sequences of E. coli H1 and H12 strains and developed a two-step real-time PCR detection procedure in which the first-step PCR assay detects both H1 and H12 flagellar types and a second-step real-time PCR discriminates between H1 and H12. These technologies will ultimately lead to further clarification of E. coli O- and H-genotypes and of their phylogenetic relationships.
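The two-step logic described for H1 and H12 can be sketched as a simple decision function. This is only an illustration under stated assumptions: the cycle threshold (Ct) cutoff, the function names, and the result labels are hypothetical and are not taken from Beutin et al.

```python
from typing import Optional

# Hypothetical positivity cutoff; a real assay defines its own validated value.
CT_CUTOFF = 35.0


def is_positive(ct: Optional[float], cutoff: float = CT_CUTOFF) -> bool:
    """A reaction counts as positive if a Ct was recorded at or below the cutoff."""
    return ct is not None and ct <= cutoff


def call_h_type(ct_screen: Optional[float],
                ct_h1: Optional[float] = None,
                ct_h12: Optional[float] = None) -> str:
    """Step 1: shared H1/H12 screen. Step 2: type-specific discrimination."""
    if not is_positive(ct_screen):
        return "not H1/H12"  # step 2 is not interpreted without a positive screen
    h1, h12 = is_positive(ct_h1), is_positive(ct_h12)
    if h1 and not h12:
        return "H1"
    if h12 and not h1:
        return "H12"
    return "H1/H12 (unresolved)"  # both or neither step-2 reaction positive
```

Running the discriminating reactions only on screen-positive samples is what makes a two-step design economical when most isolates carry neither flagellar type.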

In an effort to develop more accurate and specific methods for STEC detection, Delannoy et al. tested beef enrichments for the presence of specific STEC genetic markers and found that stx, eae, espK, and espV, in combination with CRISPRO26:H11, served as a suitable set of markers for screening for STEC of concern in beef enrichments, thus lowering the number of samples that required additional testing. A High Resolution Virulence Allelic Profiling (HReVAP) approach to determine allelic variants in genes carried on STEC pathogenicity islands (LEE locus, OI-122, and OI-57) was developed by Michelacci et al. Following cluster analysis of the allelic forms of 91 virulence genes, representing allelic signatures based on HReVAP analyses, they investigated the phylogeny of STEC and identified subpopulations within groups. The approach provided evidence for co-evolution of LEE and OI-122, likely acquired through a single event, and it could be used as a tool for studying the evolution of STEC mobile genetic elements. Knowledge of allelic signatures in STEC isolates and of the presence of specific genes associated with strains that cause severe illness (hemorrhagic colitis and hemolytic uremic syndrome) improves the ability to test for and identify STEC strains of greater clinical significance.

Knowledge of the virulence genes of food-borne pathogens carried by animals provides information to assess their genetic diversity and zoonotic potential. Baranzoni et al. characterized STEC strains in pigs and found that strains that carried stx2e (81%) were the most predominant STEC, followed by strains that carried stx1a (14%), stx2d (3%), and stx1c (1%), as determined by PCR and an E. coli Identification (ECID) Array. The swine isolates also carried other virulence genes such as iha, lpfAO26, lpfAO157, fedA, orfA, and orfB. The work presented new insights into the virulence genes associated with porcine isolates that may potentially cause human illness. STEC prevalence throughout the pork production chain in Argentina was studied by Colello et al. by determining the O-groups and virulence gene content of strains collected from farms, during slaughter and processing, and at retail markets. The prevalence and characterization of STEC strains throughout the system reflected vertical transmission of the strains, emphasizing the importance of integrated STEC control systems from farm to table.

Overuse and misuse of antibiotics in humans and animals have led to an increase in antibiotic-resistant bacteria, which is a major public health concern worldwide. The prevalence of extended-spectrum beta-lactamase (ESBL)-producing E. coli in dairy cattle farms in Egypt was studied by Braun et al. using DNA microarray-based assays that detect resistance genes, and the isolated strains were also serotyped using a SeroGenoTyping array. The study showed a notable prevalence of ESBL-producing strains, and carbapenemase genes (blaOXA-48 and blaOXA-181), which encode enzymes in the beta-lactamase family, were also detected in isolates resistant to imipenem and meropenem. The high prevalence of antibiotic-resistant E. coli strains in cattle farms in Egypt points to the importance of increased surveillance efforts in developing countries. Another array-based approach for STEC characterization was reviewed by Carter et al. The "suspension array" technology, which can potentially analyze up to 500 targets in one reaction, was used in both multiplex PCR and immunoassay microbead-based formats to detect E. coli serogroup-specific markers, as well as virulence markers. In addition to virulence- and serotype-specific genes, high-throughput array-based systems can potentially detect antibiotic resistance determinants as well.

Cattle are an important reservoir of STEC, and STEC that cause serious human illness can be found at higher rates in veal calves than in beef cattle. Methods for determining the prevalence of STEC on veal hides and carcasses, as well as for enumeration of the STEC, were evaluated. The methods included a molecular-based most probable number (MPN) method, a qPCR assay, and a digital PCR assay (Luedtke and Bosilevac). Each molecular method had strengths and weaknesses related to the detection and enumeration rate and the dynamic range for enumeration. ExPEC cause the majority of urinary tract infections and are associated with many other types of extraintestinal infections. Food, particularly poultry, can carry ExPEC, particularly antibiotic-resistant strains that cause human illness (Manges, 2016). Sommers et al. presented information on interventions for control of uropathogenic E. coli (UPEC) in ground chicken meat, purge, and chicken meat surfaces. Using non-thermal processing technologies such as high pressure processing, gamma irradiation, and ultraviolet light (UV-C), they showed significant reductions in the levels of UPEC strains in poultry meat; therefore, these treatments may be useful for the production of safer food products, particularly for at-risk consumers.

In summary, many new technologies have become available to enhance the ability to detect, identify, and characterize E. coli. WGS in particular is providing a powerful and expanding range of information to identify targets for development of improved intervention strategies and is being implemented for source tracking and as part of routine surveillance systems (Bergholz et al., 2014). We expect that further developments in WGS and other genomic and molecular technologies will continue to contribute to a greater understanding of the pathogenesis of E. coli and ultimately provide better resources for improving public health.

### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### REFERENCES


Salipante, S. J., SenGupta, D. J., Cummings, L. A., Land, T. A., Hoogestraat, D. R., and Cookson, B. T. (2015). Application of whole-genome sequencing for bacterial strain typing in molecular epidemiology. J. Clin. Microbiol. 53, 1072–1079. doi: 10.1128/JCM.03385-14

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Fratamico, DebRoy and Needleman. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Advances in Molecular Serotyping and Subtyping of Escherichia coli†

Pina M. Fratamico<sup>1</sup>\*, Chitrita DebRoy<sup>2</sup>, Yanhong Liu<sup>1</sup>, David S. Needleman<sup>1</sup>, Gian Marco Baranzoni<sup>1</sup> and Peter Feng<sup>3</sup>

<sup>1</sup> Eastern Regional Research Center, Agricultural Research Service, United States Department of Agriculture, Wyndmoor, PA, USA, <sup>2</sup> Department of Veterinary and Biomedical Sciences, Pennsylvania State University, University Park, PA, USA, <sup>3</sup> Division of Microbiology, U.S. Food and Drug Administration, College Park, MD, USA

Escherichia coli plays an important role as a member of the gut microbiota; however, pathogenic strains also exist, including various diarrheagenic E. coli pathotypes and extraintestinal pathogenic E. coli that cause illness outside of the GI-tract. E. coli have traditionally been serotyped using antisera against the ca. 186 O-antigens and 53 H-flagellar antigens. Phenotypic methods, including bacteriophage typing and O- and H-serotyping, have been used for differentiating and characterizing E. coli for many years; however, these methods are generally time-consuming and not always accurate. Advances in next generation sequencing technologies have made it possible to develop genetic-based subtyping and molecular serotyping methods for E. coli, which are more discriminatory compared to phenotypic typing methods. Furthermore, whole genome sequencing (WGS) of E. coli is replacing established subtyping methods such as pulsed-field gel electrophoresis, providing a major advancement in the ability to investigate food-borne disease outbreaks and for trace-back to sources. A variety of sequence analysis tools and bioinformatic pipelines are being developed to analyze the vast amount of data generated by WGS and to obtain specific information such as O- and H-group determination and the presence of virulence genes and other genetic markers.

#### Edited by:

Feng Gao, Tianjin University, China

#### Reviewed by:

Andrea Isabel Moreno Switt, Universidad Andrés Bello, Chile

Séamus Fanning, University College Dublin, Ireland

#### \*Correspondence:

Pina M. Fratamico pina.fratamico@ars.usda.gov

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 26 February 2016 Accepted: 18 April 2016 Published: 03 May 2016

#### Citation:

Fratamico PM, DebRoy C, Liu Y, Needleman DS, Baranzoni GM and Feng P (2016) Advances in Molecular Serotyping and Subtyping of Escherichia coli. Front. Microbiol. 7:644. doi: 10.3389/fmicb.2016.00644

Keywords: Escherichia coli, molecular serotyping, subtyping, detection, identification, whole genome sequencing, O-group, H-type

### INTRODUCTION

Escherichia coli strains are commensal organisms that are part of the normal intestinal microflora of humans and other mammals. The traditional method for identifying E. coli uses antibodies to test for surface antigens: the O-polysaccharide antigens, flagellar H-antigens, and capsular K-antigens (described below). There are currently ∼186 different E. coli O-groups and 53 H-types, so serotyping is highly complex. There are also many pathogenic groups of E. coli that cause disease in humans and animals, including diarrheagenic E. coli and the extra-intestinal pathogenic E. coli (ExPEC) that cause illness outside of the GI-tract. Diarrheagenic E. coli that cause human illness have been classified based on the specific sets of virulence genes they carry and the characteristics of the disease they cause (Kaper et al., 2004). These pathotypes include the enteropathogenic E. coli (EPEC), enterotoxigenic E. coli (ETEC), enteroinvasive E. coli (EIEC), enteroaggregative E. coli (EAEC), Shiga toxin-producing E. coli (STEC), diffusely adherent E. coli (DAEC), and adherent invasive E. coli (AIEC) that have been associated with Crohn's disease. There are also hybrid pathotypes, including the enteroaggregative hemorrhagic E. coli (EAHEC) that carry STEC- and EAEC-associated virulence genes. As an example, EAHEC serotype O104:H4, an EAEC that acquired the phage that carried the Shiga toxin gene of STEC, caused a large outbreak in 2011 associated with illness in over 3800 individuals and 54 deaths (Frank et al., 2011). Certain E. coli serotypes are often associated with specific pathotypes, such as STEC O157:H7 and O103:H21 (Kaper et al., 2004) that are important STEC, often referred to as enterohemorrhagic E. coli (EHEC). Therefore, pathogenic E. coli constitutes a genetically heterogeneous family of bacteria, and they continue to evolve.

<sup>†</sup>Mention of trade names or commercial products is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.

Extra-intestinal pathogenic E. coli cause illness outside of the gastrointestinal tract, including urinary tract infections, meningitis, pneumonia, septicemia, and other types of infections (Russo and Johnson, 2003; Smith et al., 2007). ExPEC that cause illness in poultry are known as avian pathogenic E. coli (APEC). Avian colibacillosis caused by APEC is a major cause of morbidity and mortality associated with economic losses in the poultry industry throughout the world. The human gut is a reservoir for ExPEC that cause human illness. When ExPEC leave the GI tract and infect other parts of the body such as the urinary tract, the blood, or the lungs, illness results (Smith et al., 2007). Animals, particularly poultry and poultry products (eggs), pork/pigs, and beef/cattle, as well as companion animals, may carry ExPEC; thus, these pathogens may be acquired through the food supply, and zoonotic pathogens may also be acquired via contact with animals (Vincent et al., 2010; Nordstrom et al., 2013; Mitchell et al., 2015; Singer, 2015). Investigations of community-acquired UTI and outbreaks of UTI suggested common point sources, such as contaminated food products (Nordstrom et al., 2013). Indeed, high genetic similarity, including antibiotic resistance and virulence gene patterns, between APEC and ExPEC strains causing disease in poultry and humans, respectively, has been observed (Smith et al., 2007; Manges and Johnson, 2012). The ability to differentiate commensal E. coli from ExPEC and other pathotypes is important for risk assessment and epidemiological and ecological studies. However, a rapid and reliable typing/identification system or set of criteria that allows this type of discrimination and that also provides information on the organism's evolutionary history, fitness, and pathogenic potential has not yet been established. Determining whether an E. coli strain is an ExPEC and whether it is pathogenic is based on its source, O:K:H serotype, phylogenetic background, virulence factor profile, and experimental virulence in an animal model. ExPEC belong to specific phylogenetic groups (A, B1, B2, and D) determined based on multilocus enzyme electrophoresis, ribotyping, or triplex PCR targeting the genes chuA and yjaA and a particular DNA fragment known as TSPE4.C2. ExPEC strains belonging to phylogenetic groups B2 and D show higher virulence in humans (Clermont et al., 2000; Smith et al., 2007). It has become evident that certain ExPEC lineages or clonal groups are responsible for a large fraction of human extraintestinal E. coli infections, and these lineages are becoming increasingly multi-drug resistant (Smith et al., 2007; Manges and Johnson, 2012).
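The triplex PCR readout mentioned above can be turned into a phylogenetic group with a short dichotomous key. The sketch below follows the key as it is commonly summarized for the Clermont et al. (2000) scheme (chuA is read first, then yjaA or TSPE4.C2); the function and argument names are illustrative assumptions and are not taken from the text.

```python
def phylogroup(chuA: bool, yjaA: bool, tspE4C2: bool) -> str:
    """Assign a phylogenetic group from triplex PCR band presence/absence.

    Key assumed here (Clermont et al., 2000, as commonly summarized):
      chuA+ and yjaA+      -> B2
      chuA+ and yjaA-      -> D
      chuA- and TSPE4.C2+  -> B1
      chuA- and TSPE4.C2-  -> A
    """
    if chuA:
        return "B2" if yjaA else "D"
    return "B1" if tspE4C2 else "A"


# Example: a strain positive for chuA and yjaA falls into group B2.
print(phylogroup(chuA=True, yjaA=True, tspE4C2=False))
```

Groups B2 and D, which the text links to higher virulence in humans, are exactly the two outcomes of the chuA-positive branch in this key.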

Rapid and accurate molecular methods are critically needed to detect and trace pathogenic E. coli in food and animals and for epidemiological investigations to enhance food safety and animal and human health, as well as to minimize the size and geographical extent of outbreaks. As opposed to traditional serotyping using antisera raised against the different E. coli O- and H-types, molecular serotyping generally refers to genetic-based assays targeting O-group-specific genes found within the E. coli O-antigen gene clusters and the H-antigen genes that encode the different flagellar types. Although determining the E. coli serotype could be considered a component of subtyping (differentiation beyond the species level), methods used for molecular subtyping such as pulsed-field gel electrophoresis (PFGE), multilocus sequence typing (MLST), and whole genome sequencing (WGS) generate a unique "fingerprint" of the bacterium that can be used in outbreak investigations and to determine the source of illnesses. There are many problems associated with traditional serotyping for determining the E. coli O- and H-groups. It is costly, labor-intensive, and time-consuming; cross-reactivity of the antisera with different serogroups occurs; antisera are available only in specialized laboratories; batch-to-batch variations in antibodies can occur; and many E. coli strains isolated from various sources are nontypeable (Lacher et al., 2014). Thus, molecular serotyping offers alternative methods for E. coli serotyping, and furthermore, these methods can be coupled with assays for specific virulence genes, enabling the determination of O- and H-group, pathotype, and the strain's pathogenic potential simultaneously.

### E. coli O-, K-, AND H-ANTIGENS

The outer membrane of E. coli is composed of lipopolysaccharides (LPS) that include lipid A, core oligosaccharides, and a unique polysaccharide referred to as the O-antigen. Loss of the O-antigens results in attenuated virulence, suggesting their importance in host–pathogen interactions (Sarkar et al., 2014). Based on the antigenic diversity among the different O-antigens, they have been targeted as biomarkers for classification of E. coli since the 1940s (Kaufmann, 1943, 1944, 1947). Later, Ørskov et al. (1977) presented a comprehensive serotyping system for 164 E. coli O-groups and developed a typing scheme based on the presence of three principal surface antigens: O-antigens, flagellar H-antigens, and capsular K-antigens. Since few laboratories had capabilities to type the K antigen, serotyping based on O- and H-antigens became the gold standard for E. coli typing. Currently, O-groups numbered O1-O188 have been defined, except for O31, O47, O67, O72, O94, and O122, which have not been designated (Ørskov and Ørskov, 1984; Scheutz et al., 2004), and four groups have been divided into subtypes O18ab/ac, O28ab/ac, O112ab/ac, and O125ab/ac, giving a total of 186 O-groups.
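The total of 186 O-groups follows from the numbers given above, and the bookkeeping can be checked in a few lines. This is only an arithmetic sketch; the function name is an illustrative assumption.

```python
def count_o_groups() -> int:
    """Recompute the number of O-groups from the figures stated in the text."""
    numbered = set(range(1, 189))                 # O1 through O188
    not_designated = {31, 47, 67, 72, 94, 122}    # numbers that are not in use
    split_into_ab_ac = {18, 28, 112, 125}         # each counts as two groups (ab and ac)

    in_use = numbered - not_designated            # 188 - 6 = 182 numbers
    # Every split number contributes one additional group beyond its own entry.
    return len(in_use) + len(split_into_ab_ac)


print(count_o_groups())
```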

The conventional serotyping method is based on agglutination reactions of the O-antigen with antisera that are generated in rabbits against each of the O-groups (Ørskov and Ørskov, 1984). The method is easy to carry out; however, it is laborious and error-prone, and thus, molecular methods are better alternatives for O-typing (Ballmer et al., 2007; Lacher et al., 2014). The genes that encode O-antigens are located on the chromosome in a cluster designated as the O-antigen gene cluster (O-AGC). The cluster is flanked by two conserved sequences: JUMPstart, a 39-bp element at the 5′ end (Hobbs and Reeves, 1994) that lies downstream of galF (UTP-glucose-1-phosphate uridylyltransferase), and gnd (6-phosphogluconate dehydrogenase) at the 3′ end. Analysis of the O-AGCs of all E. coli O-groups (Iguchi et al., 2015a; DebRoy et al., 2016) showed that the sizes of the O-AGCs and their gene content vary considerably, which results in the variability of O-antigens. O-antigens are composed of 10–25 repeating units of two to seven sugar residues and are processed by three mechanisms, of which the most common is Wzy (O-antigen polymerase) dependent, followed by an ABC transporter-dependent system, and a third mechanism that involves a synthase-dependent pathway (Greenfield and Whitfield, 2012) by which the O-antigens are flipped across the outer membrane. The pathways for biosynthesis of the O-AGCs and assembly of O-antigens have been studied extensively (Samuel and Reeves, 2003). Each of the O-antigens that utilize the Wzy-dependent pathway carries two unique genes, wzx (O-antigen flippase) and wzy (O-antigen polymerase). Wzx proteins translocate the O-units across the inner membrane, and Wzy polymerizes the O-antigen (Samuel and Reeves, 2003). For the ABC transporter-dependent pathway, wzm (O-antigen ABC transporter permease gene) and wzt (ABC transporter ATP-binding gene) are involved in O-AGC synthesis. The O-AGCs are composed of nucleotide sugar biosynthesis genes that are involved in the synthesis of O-antigen nucleotide sugar precursors, the glycosyl transferases that transfer the various sugar precursors to form the oligosaccharide, and the O-antigen processing genes described above.
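As a rough illustration of how the processing genes distinguish the pathways described above, the sketch below labels an annotated O-AGC by the marker genes it contains. The function name and the two example gene lists are invented for illustration; clusters with neither gene pair are left unresolved rather than being assigned to the synthase-dependent pathway, since the text names no marker gene for it.

```python
from typing import Iterable


def processing_pathway(cluster_genes: Iterable[str]) -> str:
    """Guess the O-antigen processing pathway from the genes annotated in an O-AGC."""
    genes = {g.lower() for g in cluster_genes}
    if {"wzx", "wzy"} <= genes:
        return "Wzy-dependent"               # flippase + polymerase present
    if {"wzm", "wzt"} <= genes:
        return "ABC transporter-dependent"   # permease + ATP-binding component present
    return "unresolved (possibly synthase-dependent)"


# Hypothetical gene lists, not real cluster annotations.
cluster_a = ["rmlB", "rmlD", "wzx", "wzy"]
cluster_b = ["manC", "manB", "wzm", "wzt"]
print(processing_pathway(cluster_a))
print(processing_pathway(cluster_b))
```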

All of the O-AGCs have been sequenced, and sequence analyses revealed that some O-AGCs are 98–100% identical (Iguchi et al., 2015a; DebRoy et al., 2016), while others have point mutations or insertion sequences that cause them to type as different serogroups (Liu et al., 2008, 2015). Therefore, there is a need to resolve these discrepancies, merge or eliminate serogroups, and revise the E. coli serotype nomenclature (DebRoy et al., 2016). Furthermore, many of the E. coli O-AGCs have been found to be identical to those of other Enterobacteriaceae members such as Shigella and Salmonella (Wang et al., 2007). Out of 34 distinct Shigella O-antigens, 13 were unique to Shigella; however, the other 21 were also found in E. coli (Liu et al., 2008). Similarly, out of 46 O-AGCs of Salmonella, 24 were found to be identical or closely related to E. coli O-antigens (Liu et al., 2014).

Serology has defined 53 H-flagellar antigens (Ørskov and Ørskov, 1984; Ewing, 1986) that are numbered from H1 to H56, but H-types 13, 22, and 50 are not in use (Ørskov et al., 1975; Centers for Disease Control and Prevention [CDC], 1999). Molecular H-typing methods are based on the sequences of the fliC gene, which encodes FliC, the flagellar filament structural protein (Wang et al., 2003). The N- and C-termini of FliC are highly conserved, so different H-types are due to amino acid differences within the central region, which is the surface-exposed antigenic part of the flagellar filament (Namba et al., 1989). Thus, PCR methods developed to distinguish H-types target the variable region of the fliC gene (Machado et al., 2000); however, these regions are very similar in some H-types, such as H1 and H12, and H25 and H28, making them difficult to distinguish. Nevertheless, a two-step PCR method was developed that can distinguish between fliCH1 and fliCH12 (Beutin et al., 2015, 2016). Other methods, such as Matrix-Assisted Laser Desorption/Ionization Time-of-Flight (MALDI-TOF)-based peptide mass fingerprinting in conjunction with a custom E. coli H-antigen database (Cheng et al., 2014), have also been utilized to distinguish H-types (Chui et al., 2015).
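The idea that conserved termini flank a variable central region can be illustrated by scanning an alignment for columns that differ between sequences. The sketch below is a minimal example under stated assumptions: the three short peptide strings are invented toy sequences, not FliC, and the function names are illustrative.

```python
def variable_columns(aligned):
    """Return 0-based alignment columns where the sequences are not all identical."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({seq[i] for seq in aligned}) > 1]


def variable_span(aligned):
    """Return (first, last) variable column, or None if all sequences are identical."""
    cols = variable_columns(aligned)
    return (cols[0], cols[-1]) if cols else None


# Toy alignment (hypothetical): identical ends, differences only in the middle.
toy = [
    "MAQVIAAGGSLLQS",
    "MAQVIKDGSSLLQS",
    "MAQVITEGASLLQS",
]
print(variable_columns(toy))
print(variable_span(toy))
```

Primers or probes for H-typing would be placed inside such a span, which is also why nearly identical central regions (as for H1 and H12) are hard to separate.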

#### METHODS USED FOR SUBTYPING AND MOLECULAR SEROTYPING OF E. coli

Subtyping methods that allow for differentiation of E. coli beyond the species and subspecies level are critical for determining the source of outbreaks and establishing transmission pathways (Eppinger et al., 2011; Frank et al., 2011). Several phenotype-based and genotype-based methods for subtyping E. coli are listed in **Table 1**. Phenotypic culture methods, in conjunction with biochemical-based testing, serotyping, phage typing, and multilocus enzyme electrophoresis, have been used for many years and could be considered gold standard methods; however, they are time- and labor-intensive and may not be very discriminatory.

Compared to phenotypic methods, genetic subtyping methods that are based on bacterial DNA generally have better discriminatory ability. Of the various methods used for E. coli subtyping, PFGE is a reliable and highly discriminating method and has been considered to be the "gold standard" of typing methods. Through the establishment of PulseNet (Ribot et al., 2006), use of PFGE has had a major impact on pathogen subtyping and outbreak investigation.

In contrast to traditional serotyping, Luminex®-based suspension assays allow for simultaneous testing for multiple serogroups in a single assay. Lin et al. (2011) performed PCR assays targeting the wzx and wzy genes of 10 Shiga toxin-producing E. coli (STEC) serogroups and then used the Luminex® system to identify the 10 serogroups through binding of the PCR products to fluorescent microspheres conjugated to DNA probes specific for each serogroup. Clotilde et al. (2015) used both antibody-based and multiplex PCR-based Luminex® assays and compared them to traditional E. coli serotyping. The results of the two Luminex® assays were mostly consistent, and 11 STEC isolates that were previously untypeable by traditional serotyping could be typed.

#### TABLE 1 | Phenotype- and genotype-based methods for subtyping and molecular serotyping of E. coli.

<sup>a</sup>The O-somatic and H-flagellar antigens define the E. coli serotype. Agglutination assays using antibodies that react with the specific O- and H-antigens are the basis for traditional serotyping. Molecular serotyping methods are generally based on genetic targets specific to the O- and H-antigens.

Multiplex PCR-based assays targeting unique regions within the E. coli O-AGCs have been used to determine the O-groups. A review by DebRoy et al. (2011) describes many of these assays, most of which target the E. coli wzx and wzy genes. Based on O-AGC sequence data for all O-groups, Iguchi et al. (2015b) designed 162 PCR primer pairs for identification and classification of E. coli O-serogroups. The primer pairs were used in 20 separate multiplex PCR assays, with each assay containing 6–9 primer pairs that amplified products of different sizes so that they could be distinguished. A high-throughput PCR method based on the GeneDisc® array targeted virulence genes and O- and H-type-specific genes for identification of STEC associated with severe illness (Bugarel et al., 2010b). Another high-throughput method, known as the BioMark™ real-time PCR system (Fluidigm), used a panel of virulence genes as discriminative markers to differentiate EHEC O26 strains, EHEC-like O26 pathogenic strains, and avirulent O26 strains (Bugarel et al., 2011a).
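The requirement that products within one multiplex reaction differ in size can be sketched as a simple pooling problem. The example below is a hypothetical greedy grouping, not the design procedure of Iguchi et al. (2015b); the primer-pair names, amplicon sizes, and the 50-bp minimum size gap are all assumptions made for illustration. It also checks that 162 primer pairs fit into 20 assays of 6–9 pairs each.

```python
def assign_pools(amplicons, max_per_pool=9, min_gap=50):
    """Greedily group primer pairs so that product sizes within a pool are separable.

    amplicons: dict mapping primer-pair name -> expected product size in bp.
    A pair joins the first pool that has room and in which its product differs
    from every product already present by at least min_gap bp.
    """
    pools = []
    for name, size in sorted(amplicons.items(), key=lambda item: item[1]):
        for pool in pools:
            if len(pool) < max_per_pool and all(abs(size - s) >= min_gap for _, s in pool):
                pool.append((name, size))
                break
        else:
            pools.append([(name, size)])   # no compatible pool found, open a new one
    return pools


# 20 assays with 6-9 pairs each can hold between 120 and 180 pairs, so 162 fits.
assert 20 * 6 <= 162 <= 20 * 9

# Hypothetical primer pairs and product sizes.
toy = {"Og-x1": 200, "Og-x2": 220, "Og-x3": 300, "Og-x4": 310, "Og-x5": 450}
for pool in assign_pools(toy):
    print(pool)
```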

Clustered regularly interspaced short palindromic repeats (CRISPR) are short, highly conserved DNA repeats separated by unique sequences of similar length, and they have been used for subtyping, identification, and detection of bacteria (Shariat and Dudley, 2014). Based on spacer content or sequencing of CRISPR loci, CRISPR-based typing analyses can be used to differentiate strains for epidemiological investigations or for detection. Delannoy et al. (2012) utilized CRISPR loci of seven important EHEC serotypes to develop real-time PCR assays, generating results based on CRISPR polymorphisms that correlated with specific EHEC O:H serotypes and the presence of EHEC virulence genes.
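Typing by spacer content amounts to splitting a CRISPR array on its direct repeat and comparing the unique sequences in between. The sketch below shows this on invented data: the 5-nt repeat and the two loci are toy sequences chosen for readability (real repeats are considerably longer), and the function name is an illustrative assumption.

```python
def extract_spacers(locus: str, repeat: str) -> list:
    """Split a CRISPR array on its direct repeat and return the spacers in order."""
    parts = locus.split(repeat)
    # parts[0] is the leader before the first repeat and parts[-1] the trailer
    # after the last repeat; everything in between is spacer sequence.
    return [p for p in parts[1:-1] if p]


# Toy data (hypothetical): two strains sharing one spacer.
REPEAT = "GTTCC"
strain_1 = "AAA" + REPEAT + "ACGTAC" + REPEAT + "TTGACA" + REPEAT + "GG"
strain_2 = "AAA" + REPEAT + "ACGTAC" + REPEAT + "CCATGA" + REPEAT + "GG"

spacers_1 = extract_spacers(strain_1, REPEAT)
spacers_2 = extract_spacers(strain_2, REPEAT)
print(spacers_1, spacers_2)
print("shared:", set(spacers_1) & set(spacers_2))
```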

DNA microarrays have also been developed for molecular serotyping of E. coli (Liu and Fratamico, 2006; Ballmer et al., 2007; Geue et al., 2014; Lacher et al., 2014). One microarray method to identify E. coli serogroups involved spotting O-group-specific wzx or wzy gene oligonucleotides or PCR products onto the chip and hybridizing them with labeled PCR products of the entire O-AGCs (Liu and Fratamico, 2006). Lacher et al. (2014) reported on the use of an FDA-ECID (E. coli identification) microarray for O- and H-typing of E. coli. The ECID chip was designed based on >250 E. coli genomes and incorporates over 40,000 E. coli genes, including O- and H-group-specific genes, and approximately 9800 single nucleotide polymorphisms (SNPs). Antibody-based microarrays have also been developed to detect important non-O157 STEC serogroups (Gehring et al., 2013; Hegde et al., 2013). Although this method is rapid and has the potential to be used for high-throughput screening, its use depends on the availability of antibodies with good specificity.

The commercial introduction of next-generation sequencing technologies has made it possible to perform routine WGS of E. coli and other bacteria relatively rapidly and at affordable costs (Franz et al., 2014). Since WGS typing has discriminatory power superior to other typing methods, it has the potential to revolutionize bacterial subtyping. An MLST web server was designed to determine sequence types (STs) of bacteria using WGS data. STs were determined from uploaded preassembled complete or partial genome sequences or short sequence reads obtained from different sequencing platforms (Larsen et al., 2012). Based on SNPs observed from WGS data, Norman et al. (2015) identified unique STEC O26 genotypes in human and cattle strains. These isolates had similar virulence gene profiles and did not cluster in separate polymorphism-derived genotypes, and thus human and cattle strains could not be distinguished within the phylogenetic clusters. An approach based on targeted amplicon sequencing for SNP genotyping was used to determine the relationship of stx-positive and stx-negative E. coli O26:H11 strains from cattle compared to the genomes of human clinical isolates (Ison et al., 2016). Joensen et al. (2015) described SerotypeFinder, a publicly available web tool hosted by the Center for Genomic Epidemiology, Denmark, which enables WGS-based serotyping of E. coli. Typing is based on wzx, wzy, wzm, and wzt, as well as flagellin-associated genes. Similar to SerotypeFinder, the VirulenceFinder tool can be used to identify virulence genes in E. coli and thereby determine different pathogenic groups (Joensen et al., 2014).
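Conceptually, assigning a sequence type from WGS data ends in a table lookup: the allele number found at each MLST locus forms a profile, and the profile maps to an ST. The sketch below illustrates only that last step and is not the web server's implementation. The seven locus names are those of one widely used E. coli scheme and are an assumption here, and the profile table and ST labels are entirely hypothetical.

```python
# Loci of a seven-gene scheme (assumed; the text does not name the loci).
LOCI = ("adk", "fumC", "gyrB", "icd", "mdh", "purA", "recA")

# Hypothetical allele profiles and labels, for illustration only.
PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-hypothetical-1",
    (1, 2, 1, 1, 3, 1, 1): "ST-hypothetical-2",
}


def sequence_type(alleles: dict) -> str:
    """Look up the sequence type for a dict of locus -> allele number."""
    missing = [locus for locus in LOCI if locus not in alleles]
    if missing:
        return "incomplete profile (missing: " + ", ".join(missing) + ")"
    key = tuple(alleles[locus] for locus in LOCI)
    return PROFILES.get(key, "novel or unknown profile")


isolate = {"adk": 1, "fumC": 2, "gyrB": 1, "icd": 1, "mdh": 3, "purA": 1, "recA": 1}
print(sequence_type(isolate))
```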

Whole genome sequencing typing has the potential to be the new "gold standard" for pathogen subtyping. However, some challenges need to be addressed before standardization and full implementation of this technology. The bioinformatic analyses required to handle the enormous amounts of sequence data generated by WGS are necessitating the development of analysis pipelines to enhance the assembly, annotation, and interpretation of the data, which will require a coordinated international approach (Franz et al., 2014; Oulas et al., 2015). Currently, the following databases for WGS and advanced detection are available: the 100K Genome Project<sup>1</sup>, GenomeTrakr Network<sup>2</sup>, Global Microbial Identifier<sup>3</sup>, and Advanced Molecular Detection<sup>4</sup>. These databases are creating a vast resource of microbial genome information for WGS-based surveillance of microbial pathogens. Furthermore, detailed analysis of WGS data can determine the E. coli O- and H-type and provide information on the resistome (antibiotic resistance gene profile) of the isolate and the presence of specific virulence genes, prophages, and plasmids, as well as other genetic information that is important for identifying E. coli pathotypes and useful in evolutionary studies. The advantages of WGS approaches are being recognized by the academic, government, industry, and private sectors for addressing regulatory and public health needs. However, as we move toward the use of these genetic approaches for non-culture-based detection, characterization, subtyping, trace backs, and outbreak investigations, it will be critical to establish bioinformatics pipelines that are capable of analyzing and handling the large amounts of data that are generated.

#### AUTHOR CONTRIBUTIONS

PF, CD, YL, DN, GB, and PF have made a substantial, direct, and intellectual contribution to the work and approved it for publication.

## FUNDING


This work was supported in part by an appointment to the Agricultural Research Service (ARS) Research Participation Program, which is administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the USDA. ORISE is managed by ORAU under DOE contract number DE-AC05-06OR23100. All opinions expressed in this manuscript are the author's and do not necessarily reflect the policies and views of USDA, ARS, DOE, or ORAU/ORISE.

<sup>1</sup> http://100kgenome.vetmed.ucdavis.edu/

<sup>2</sup> http://www.fda.gov/Food/FoodScienceResearch/WholeGenomeSequencingProgramWGS/

<sup>3</sup> http://www.globalmicrobialidentifier.org/

<sup>4</sup> http://www.cdc.gov/amd/project-summaries/index.html

#### REFERENCES



laser desorption/ionization time-of-flight (MALDI-TOF)-based peptide mass fingerprinting. J. Clin. Microbiol. 53, 2480–2485. doi: 10.1128/JCM.00593-15



O103, O111, O121, and O145 with serogroups and genetic subtypes. Appl. Environ. Microbiol. 78, 6689–6703. doi: 10.1128/AEM.01259-12


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Fratamico, DebRoy, Liu, Needleman, Baranzoni and Feng. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Genetic Analysis and Detection of fliCH1 and fliCH12 Genes Coding for Serologically Closely Related Flagellar Antigens in Human and Animal Pathogenic Escherichia coli

#### Lothar Beutin<sup>1</sup>, Sabine Delannoy<sup>2</sup>\* and Patrick Fach<sup>2</sup>

<sup>1</sup> Department of Biology, Chemistry, Pharmacy, Institute for Biology - Microbiology, Freie Universität Berlin, Berlin, Germany, <sup>2</sup> Université Paris-Est, Anses, Food Safety Laboratory, IdentyPath, Maisons-Alfort, France

#### Edited by:

Chitrita Debroy, The Pennsylvania State University, USA

#### Reviewed by:

Sang Jun Lee, Korea Research Institute of Bioscience and Biotechnology, Korea; Timothy Read, Emory University, USA

> \*Correspondence: Sabine Delannoy sabine.delannoy@anses.fr

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 26 November 2015; Accepted: 25 January 2016; Published: 15 February 2016

#### Citation:

Beutin L, Delannoy S and Fach P (2016) Genetic Analysis and Detection of fliCH1 and fliCH12 Genes Coding for Serologically Closely Related Flagellar Antigens in Human and Animal Pathogenic Escherichia coli. Front. Microbiol. 7:135. doi: 10.3389/fmicb.2016.00135

The E. coli flagellar types H1 and H12 show a high serological cross-reactivity, and molecular serotyping appears to be an advantageous method to establish a clear discrimination between these flagellar types. Analysis of fliCH1 and fliCH12 gene sequences showed that they were 97.5% identical at the nucleotide level. Because of this high degree of homology, we developed a two-step real-time PCR detection procedure for reliable discrimination of H1 and H12 flagellar types in E. coli. In the first step, a real-time PCR assay for common detection of both fliCH1 and fliCH12 genes is used, followed in a second step by real-time PCR assays for specific detection of fliCH1 and fliCH12, respectively. The real-time PCR for common detection of fliCH1 and fliCH12 demonstrated 100% sensitivity and specificity, as it reacted with all tested E. coli H1 and H12 strains and not with any of the reference strains encoding all the other 51 flagellar antigens. The fliCH1 and fliCH12 gene-specific assays detected all E. coli H1 and all E. coli H12 strains, respectively (100% sensitivity). However, both assays showed cross-reactions with some flagellar type reference strains different from H1 and H12. The real-time PCR assays developed in this study can be used in combination for the detection and identification of E. coli H1 and H12 strains isolated from different sources.

#### Keywords: E. coli, molecular serotyping, fliC type H1 gene, fliC type H12 gene, STEC, ExPEC

### INTRODUCTION

Strains belonging to the species Escherichia coli are ubiquitous as commensals in the gut of humans and warm-blooded animals. Apart from their role as beneficial microbes, some E. coli strains are known to behave as human and animal pathogens, causing a wide spectrum of extraintestinal and enteric diseases, with urinary tract infection and diarrhea as the most frequent (Kaper et al., 2004; Stenutz et al., 2006). Pathogenic and apathogenic E. coli cannot be discerned from each other by their morphology, cultural properties, or fermentation reactions. As a consequence, serotyping has been used since the 1940s as a diagnostic tool for identification of animal and human pathogenic E. coli strains (Orskov and Orskov, 1984).

E. coli serogroups are commonly defined by the antigenic properties of the lipopolysaccharide which is part of the outer membrane (O-antigen) (Stenutz et al., 2006). Motile E. coli strains can be additionally typed for their flagellar filaments (H-antigen) (Orskov and Orskov, 1984). E. coli O- and H-antisera are usually produced by immunization of rabbits with respective reference strains (Orskov and Orskov, 1984; Edwards and Ewing, 1986). At present, 182 O-antigens and 53 H-antigens have been described (Scheutz et al., 2004; Scheutz and Strockbine, 2005). The resulting O:H serotype (for example O157:H7) is commonly used for describing E. coli isolates (Bettelheim, 1978; Orskov and Orskov, 1984).

Complete serotyping of E. coli is laborious and time-consuming and is performed only in a few specialized reference laboratories worldwide. Moreover, cross-reactivity, which is observed between some E. coli O-groups and H-types, can complicate the interpretation of serotyping results. Last but not least, serotyping fails if autoagglutinating (O-antigen or H-antigen rough) and non-motile (NM) E. coli strains have to be examined (Orskov and Orskov, 1984; Edwards and Ewing, 1986). For these reasons, attempts were made to substitute serotyping by molecular typing of O-antigen and H-antigen encoding genes.

In recent years, the nucleotide sequences of all known O- and H-antigen genes in E. coli have been elucidated (Wang et al., 2003; Iguchi et al., 2015a). Molecular methods such as PCR and nucleotide sequencing have been successfully employed for typing of O- and H-antigen genes in E. coli (Beutin and Fach, 2014; Joensen et al., 2015; Iguchi et al., 2015b). Molecular serotyping was shown to be specific and sensitive and can substitute for conventional serological detection of E. coli surface antigens (Bugarel et al., 2010; Fratamico et al., 2011; Clotilde et al., 2015; Iguchi et al., 2015b; Joensen et al., 2015). In contrast to serotyping, molecular detection of O- and H-antigen genes is easier and faster to perform, and O-rough and non-motile strains can be typed on the basis of their O- and H-antigen genes (Beutin and Fach, 2014; Joensen et al., 2015).

We have previously investigated the genetic variability of flagellar types H19, H25, and H28 in E. coli (Beutin et al., 2015a,b). These flagellar types are widespread in strains belonging to numerous O-serogroups but are also associated with enterohemorrhagic E. coli O145:H25, O145:H28, and O121:H19 strains. By nucleotide sequence analysis of fliC (flagellin) genes encoding H19, H25, and H28 flagella, we have observed a high genetic variability among fliCH19, fliCH25, and fliCH28 alleles, respectively. In part, these sequence alterations were associated with particular O-groups of strains, which allowed the development of real-time PCR protocols for specific typing of flagellar variants encoded by enterohemorrhagic E. coli O145:H25, O145:H28, and O121:H19 strains (Beutin et al., 2015a,b). Such real-time PCR protocols were found useful for improvement of horizontal real-time PCR detection methods for EHEC from food samples (Beutin et al., 2015a,b).

In this work, we compared E. coli fliC genes that encode flagellar types H1 and H12. These flagellar types show a high serological cross-reactivity and cross-absorbed H1 and H12 antisera are used for definite H-typing (Orskov and Orskov, 1984; Edwards and Ewing, 1986). Moreover, three subtypes of H1 were detected by serological typing using factor specific antisera (Ratiner et al., 1995). Serological cross reactions may cause confounding results in diagnostic laboratories where absorbed antisera are not available. The development of molecular typing procedures for reliable detection of H1 and H12 flagellar types could overcome this specific problem.

A clear discrimination between E. coli flagellar types H1 and H12 is valuable for clinical diagnostics and for epidemiological investigations. Some human isolates of Shiga toxin-producing E. coli (STEC) express H1 or H12 flagella (Scheutz and Strockbine, 2005). Moreover, flagellar type H1 is clinically significant as it is associated with extraintestinal pathogenic E. coli (ExPEC) strains that occur worldwide and carry capsular polysaccharides (O2:K2:H1, O4:K12:H1, O6:K2:H1, O6:K5:H1, O7:K1:H1, O15:K52:H1) and that cause cystitis, pyelonephritis, and urosepsis (Orskov and Orskov, 1985; Johnson et al., 1994, 2005, 2006; Olesen et al., 2009). Adherent-invasive E. coli (AIEC) O83:H1 strains were associated with Crohn's disease in human patients (Allen et al., 2008; Nash et al., 2010), and flagellar type H1 is associated with biofilm formation and invasive properties of AIEC strains (Eaves-Pyles et al., 2008; Martinez-Medina et al., 2009) as well as with intestinal colonization (Martinez-Medina and Garcia-Gil, 2014). Moreover, the H1-type flagellum is a characteristic trait of Shiga toxin 2e-producing E. coli O139:H1 strains, which are a major cause of edema disease in pigs (Tschape et al., 1992; Frydendahl, 2002; Fairbrother et al., 2005; Beutin et al., 2008). Conversely, the flagellar type H12 has not been associated with pathogenic E. coli, except for human enterotoxigenic O78:H12 and O128:H12 strains (Orskov and Orskov, 1977; Echeverria et al., 1982; Shaheen et al., 2004).

In this work we have analyzed the nucleotide sequences of E. coli H1 and H12 strains in order to detect characteristic fliC sequence alterations corresponding with these closely related H-types. Subsequently, we have developed a real-time PCR procedure for reliable discrimination of H1 and H12 flagellar types in E. coli. The protocol should be useful for diagnostic and epidemiological investigations of human and animal pathogenic strains of E. coli.
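The two-step procedure summarized in the abstract (a common fliCH1/fliCH12 assay first, then the two specific assays) can be written down as a small interpretation rule. The sketch below is one possible reading under stated assumptions: the function name is illustrative, and reporting "inconclusive" when both or neither specific assay reacts is an assumption, since the text does not say how such results are handled.

```python
def interpret_h1_h12(common: bool, h1_specific: bool, h12_specific: bool) -> str:
    """Combine the common and the two specific real-time PCR results.

    The specific assays may cross-react with some other H-types, so their
    results are only interpreted when the common assay is positive.
    """
    if not common:
        return "not H1/H12"
    if h1_specific and not h12_specific:
        return "H1"
    if h12_specific and not h1_specific:
        return "H12"
    return "inconclusive"   # assumption: both or neither specific assay positive


print(interpret_h1_h12(common=True, h1_specific=True, h12_specific=False))
print(interpret_h1_h12(common=False, h1_specific=True, h12_specific=False))
```

Gating on the common assay is the design choice that lets the combination stay reliable even though each specific assay alone is not fully specific.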

### MATERIALS AND METHODS

#### Bacteria

E. coli strains used in this study were derived from the collections of the National Reference Laboratory for E. coli (NRL E. coli) at the Federal Institute for Risk Assessment (BfR) in Berlin, Germany and from the French Agency for Food, Environmental and Occupational Health and Safety (Anses) in Maisons-Alfort, France. E. coli strains used for the specificity study included in particular the E. coli reference strains belonging to serogroups O1-O181 and H-types H1-H56 (Orskov and Orskov, 1984; Edwards and Ewing, 1986). All strains have been previously described for their serotypes and for virulence genes associated with STEC (Beutin et al., 2015a,b). All strains were grown overnight at 37°C in Luria broth, and DNA was extracted according to the manufacturer's instructions using InstaGene matrix (BioRad Laboratories, Marnes-La-Coquette, France).

Real-time PCR assays were performed with an ABI 7500 instrument (Applied Biosystems, Foster City, CA, USA) in 25-µl reaction volumes, a LightCycler Nano (Roche Diagnostics, Meylan, France) in 10-µl reaction volumes, or a LightCycler 1536 (Roche Diagnostics, Meylan, France) in 1.5-µl reaction volumes according to the recommendations of the suppliers. Primers and TaqMan probes were used at 300 nM final concentrations. The following thermal profile was applied to all instruments: enzyme activation at 95°C for 1–10 min as recommended, followed by 40 cycles of denaturation at 95°C and annealing at 60°C.
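For bench planning, the 300 nM final concentration translates into a stock volume per reaction through the usual dilution relation C1·V1 = C2·V2. The sketch below applies it to the three reaction volumes mentioned above; the 10 µM working stock is an assumption for illustration only, as the text does not state stock concentrations.

```python
def stock_volume_ul(final_nM: float, reaction_ul: float, stock_uM: float) -> float:
    """Volume of stock (µl) needed to reach final_nM in a reaction of reaction_ul."""
    stock_nM = stock_uM * 1000.0            # convert µM to nM
    return final_nM * reaction_ul / stock_nM


STOCK_UM = 10.0   # hypothetical working stock concentration
for volume in (25.0, 10.0, 1.5):            # reaction volumes stated in the text
    needed = stock_volume_ul(300.0, volume, STOCK_UM)
    print(f"{volume} µl reaction: {needed:.3f} µl of stock per primer or probe")
```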

### PCR Detection and Mapping of E. coli O-Antigen and H-Antigen Genes

Mapping of fliC gene variants to their respective H-types was performed as previously described (Beutin et al., 2015a,b). Nucleotide sequence data obtained from thirteen fliCH1 and eight fliCH12 genes were used for designing TaqMan® real-time PCR probes and XS probes (minor groove binder replacement, Biolegio, Nijmegen, The Netherlands) and primers for specific detection of all genetic variants of thirteen fliCH1 and eight fliCH12 genes (this work). Real-time PCR probes and primers used in this work were designed with the software Primer Express V3.0 (Applied Biosystems) and are described in **Table 1**.

#### Nucleotide Sequencing

The nucleotide sequences of the PCR products were determined as described (Beutin et al., 2015b) and analyzed with the Accelrys DS Gene software package (Accelrys Inc., USA). The nucleotide sequences of the respective products for fliC homologs were determined and have been submitted to the European Nucleotide Archive (ENA). The GenBank accession numbers are listed in **Table 2**.

#### RESULTS

#### Sources and Properties of E. coli H1 and H12 Strains

The E. coli H1 and H12 strains investigated in this study were from human, animal, food, and environmental sources (**Table 3**). The thirty-one flagellar type H1 strains were associated with 10 different E. coli O-serogroups as well as O-rough and O-untypable strains and originated from healthy and diseased humans and animals and from food. The thirty-eight H12 strains were distributed among thirteen different E. coli O-groups as well as O-untypable and O-rough strains. The H12 strains were from healthy and diseased humans and animals, from food, and from the environment. Production of Shiga toxins (Stx) was found in 16 (42.1%) of the H12 strains and was associated with five different O-groups. Fourteen (45.2%) of the E. coli H1 strains produced Stx; however, most of these were from pigs with edema disease (O139:H1, Or:H1) and harbored the stx2e gene. O:H types known to be associated with E. coli causing extraintestinal infections of humans (O2:H1, O4:H1, O6:H1, O25:H1) were detected among the investigated H1 strains. Interestingly, strains belonging to these serotypes originated not only from humans but also from animals and food. Certain strains belonged to serotypes which have not been previously associated with clinical disease, and their role as pathogens for humans and animals is not yet known.
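The proportions of Stx-producing strains quoted above can be reproduced directly from the strain counts. This is an arithmetic check only; the helper name is an illustrative assumption.

```python
def percent(part: int, whole: int) -> float:
    """Percentage of part in whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts taken from the text: 16 of 38 H12 strains and 14 of 31 H1 strains produced Stx.
print("H12:", percent(16, 38))
print("H1: ", percent(14, 31))
```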

TABLE 1 | Primers and probes for real-time PCR detection of E. coli flagellar types H1 and H12.


<sup>a</sup>XS probes (MGB-replacement) were used for fliCH1 and fliCH12 specific real-time PCRs. <sup>b</sup>Forward primer conserved in all analyzed fliCH1 and fliCH12 sequences.

<sup>c</sup>FliCH1 reverse primer: one mismatch at position 429: fliCH1 = G, fliCH12 = A (underlined). <sup>d</sup>FliCH1 probe: two mismatches, at position 381: fliCH1 = C, fliCH12 = T; and at position 384: fliCH1 = T, fliCH12 = C (underlined).

<sup>e</sup>FliCH12 reverse primer: one mismatch at position 402 with fliCH1 = T (4/13 strains), fliCH12 = C (underlined).

<sup>f</sup>FliCH12 probe: two mismatches, at position 381: fliCH12 = T, fliCH1 = C; and at position 384: fliCH12 = C, fliCH1 = T (underlined).

<sup>g</sup>Conserved in all 21 fliCH1 and fliCH12 sequences from Table 2.

### Nucleotide Analysis of E. coli fliCH1 and fliCH12 Genes

The nucleotide sequences of the reference strains (Orskov and Orskov, 1984) for E. coli flagellar antigens H1 (strain Su1242, GenBank accession AB028471.1) and H12 (strain Bi 316-42, GenBank accession AY249997) (Wang et al., 2003) have been published previously. The coding sequences of the fliCH1 and fliCH12 genes are each 1788 nucleotides long; the two sequences share 97.5% identity at the nucleotide level (44 nucleotide exchanges) and 98.98% identity and 99.16% similarity at the amino acid level (7 amino acid (aa) exchanges). Additional fliC nucleotide sequences from six E. coli H1 and five E. coli H12 strains were obtained in this work (**Table 2**). These sequences were compared with seven fliCH1 sequences and three fliCH12 sequences already available in GenBank (**Table 2**). All 21 H1 or H12 flagellin genes are 1788 nucleotides long and code for 595 amino acid residues.
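The identity figure quoted above follows directly from the sequence length and the number of exchanges. The following minimal Python sketch reproduces that arithmetic; the function names are illustrative only and are not part of any software used in this work, and the reading of the 596th codon as a stop codon is an assumption.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def identity_from_exchanges(length: int, exchanges: int) -> float:
    """Percent identity given an alignment length and a count of exchanged positions."""
    if length <= 0 or not 0 <= exchanges <= length:
        raise ValueError("invalid length or exchange count")
    return 100.0 * (length - exchanges) / length


# Figures quoted in the text for the fliCH1 and fliCH12 reference sequences.
CDS_LENGTH_NT = 1788
NT_EXCHANGES = 44

nt_identity = identity_from_exchanges(CDS_LENGTH_NT, NT_EXCHANGES)

# 1788 nt correspond to 596 codons: 595 residues plus, presumably, one stop codon.
codons = CDS_LENGTH_NT // 3

print(f"nucleotide identity: {nt_identity:.1f}%")
print(f"codons: {codons}")
```

With real data, `percent_identity` would be applied to the two aligned coding sequences retrieved under the accession numbers given above.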

A cluster analysis performed with thirteen fliCH1 and eight fliCH12 sequences is shown in **Figure 1**. Four different genotypes were detected among the thirteen fliCH1 strains. Uropathogenic E. coli O2:H1, O6:H1, O25:H1, and AIEC O83:H1 strains had identical fliCH1 sequences and were assigned to a large cluster composed of eight strains. A smaller cluster was formed by five fliCH1 strains; four of these were Stx2e-producing O139:H1 strains causing edema disease in pigs.

Six different genotypes were found among the eight fliCH12 strains. Identical fliCH12 sequences were found only between the two O9:K9:H12 strains and between one O55:H12 strain and one O153:H12 strain.



<sup>a</sup>The whole genome sequence of the E. coli strain ABU 83972 (GenBank: CP001671.1) is available (Zdziarski et al., 2010). The O-serogroup of this strain was not reported, but its wzx gene (position 2372093–2373353) is >99% similar to the wzx gene of E. coli O25 strain E47a (GenBank GU014554.1) (Wang et al., 2010). Therefore, we classified ABU 83972 here as an O25:H1 strain.

<sup>b</sup>The whole genome sequence of E. coli strain LF82 is available (GenBank: CU651637.1). The O-serogroup of this strain is not reported but its wzx gene (position 2127428–212804) is identical to the wzx gene of E. coli O83:H31 strain H17a GenBank: KJ778808.1 (unpublished) and of E. coli O83:H1 strain NRG857c (GenBank: CP001855.1) (Allen et al., 2008). Therefore, we suggest that LF82 is an O83:H1 strain.

<sup>c</sup>The fliC sequence deposited under GenBank JF308285 is derived from strain EC614 reported as O157:H1 (Goulter et al., 2010). By Blast search, it is 100% identical to the fliC sequence of the H1 reference strain Su1242 (GenBank accession AB028471.1). Therefore, the flagellar type of EC614 was classified as H1.

<sup>d</sup>Multiresistant, extended-spectrum β-lactamase (ESBL)-producing E. coli from healthy human carrier.

<sup>e</sup>The fliC sequence was determined in this study.

#### Amino acid Alterations between Flagellar Antigens H1 and H12 in E. coli Strains

An alignment of the amino acid sequences of thirteen fliCH1 and eight fliCH12 strains is shown in Table S1. All translation products had a length of 595 amino acids (aa). The eight fliCH12 strains showed only a few alterations, with one or more of the strains differing at aa positions 249, 258, 339, and 472 (99.2% similarity) (Table S1), generating six different protein sequences (**Figure 2**). The thirteen fliCH1 strains split into three protein sequences (**Figure 2**) differing at positions 258, 431, and 481 (99.5% similarity) (Table S1). The aa changes were all located in the variable region of fliC encoding flagellar antigen specificity (Wang et al., 2003). Differences in the aa sequence that distinguished all investigated fliCH1 strains from all investigated fliCH12 strains were found at positions 302 (Glu/Lys), 340 (Asn/Lys), 361 (Gly/Asp), 391 (Thr/Lys), 396 (Asn/Asp), and 430 (Asn/Lys). These six flagellar type specific aa sequence differences were all located in the variable region of the fliCH1 and fliCH12 genes.
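The distinction between strain-level alterations and type-specific differences can be illustrated with a short Python sketch. It assumes pre-aligned sequences of equal length and uses toy six-residue strings, not the real flagellin sequences; the function name is hypothetical. A position counts as type-specific only if it is invariant within each group and differs between the groups.

```python
from typing import Dict, List, Tuple


def type_specific_positions(
    group_a: List[str], group_b: List[str]
) -> Dict[int, Tuple[str, str]]:
    """Return 1-based positions that are invariant within each group
    but differ between the two groups, with the residue of each group."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one sequence")
    length = len(group_a[0])
    if any(len(seq) != length for seq in group_a + group_b):
        raise ValueError("all sequences must be aligned to the same length")

    specific: Dict[int, Tuple[str, str]] = {}
    for index in range(length):
        residues_a = {seq[index] for seq in group_a}
        residues_b = {seq[index] for seq in group_b}
        # Exclude positions that vary among strains of the same H-type.
        if len(residues_a) == 1 and len(residues_b) == 1 and residues_a != residues_b:
            specific[index + 1] = (next(iter(residues_a)), next(iter(residues_b)))
    return specific


# Toy alignment (illustrative only): position 2 varies within the first group,
# position 4 varies within the second group, positions 3 and 5 are type-specific.
h1_like = ["MAEKNT", "MAEKNT", "MGEKNT"]
h12_like = ["MAKKDT", "MAKRDT"]

specific = type_specific_positions(h1_like, h12_like)
print(specific)
```

Applied to the alignment in Table S1, a comparison of this kind would be expected to return the six positions listed above while leaving out positions such as 258, which varies within both H-types.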

### Development and Evaluation of Real-Time PCR Assays for Identification of E. coli fliCH1 and fliCH12 Strains

The close similarity between E. coli fliCH1 and fliCH12 translation products explains the serological cross-reactivity which was previously described for H1 and H12 antigens (Orskov and Orskov, 1984; Edwards and Ewing, 1986). As specific differences were found that distinguish between fliCH1 and fliCH12 sequences, molecular detection of the respective fliC genes could be more suitable than serotyping for clear identification of H1 and H12 strains of E. coli.

Based on the sequence data obtained for E. coli fliCH1 and fliCH12 genes we developed a TaqMan real-time PCR assay for common detection of fliCH1 and fliCH12 genes as well as real-time PCR assays for specific detection of fliCH1 and fliCH12, respectively (**Table 1**). Short-length XS-probes (minor groove binder replacement) had to be employed to develop real-time PCR assays specific for fliCH1 and fliCH12 sequences (**Table 1**). We used two nucleotide substitutions between the sequences of fliCH1 and fliCH12 to design specific probes (**Table 1**).

The assays were first tested for sensitivity and specificity on 31 E. coli H1 and 38 E. coli H12 strains (**Table 4**) as well as on the E. coli H-type reference strains (H1-H56) (Orskov and Orskov, 1984; Edwards and Ewing, 1986). The real-time PCR for common detection of fliCH1 and fliCH12 reacted with all tested E. coli H1 and H12 strains (**Table 4**) and did not react with any of the reference strains encoding flagellar antigens other than H1 and H12.

The fliCH1 and fliCH12 gene specific assays detected all E. coli H1 and all E. coli H12 strains, respectively (**Table 4**). However, both assays showed cross-reactions with some flagellar type reference strains other than H1 and H12. With the fliCH1 real-time PCR, cross-reactions were observed with H6, H7, H15, H20, H34, H37, H41, H45, H46, H49, and H52 strains. The fliCH12 specific PCR also reacted with H7, H28, H31, and H41 strains (**Table 5**). Although the overall sequences of the fliC genes of H-types cross-reacting with the fliCH1 and fliCH12 real-time PCRs differ widely from those of fliCH1 and fliCH12, they show high local similarity to the primer and probe sequences. In cases of cross-reactivity, no or only minor differences (0–3 mismatches) were found between the target sequences and the fliCH1 and fliCH12 primers and probes (**Table 5**). Three or more mismatches were found in cases of negative real-time PCR results.

<sup>a</sup>This list includes serotype reference strains: Nissle 1917 (O6:K5:H1) (Reister et al., 2014), EH250 (O118:H12) (Scheutz et al., 2012), Su 1242 (O2:K2:H1), E14a (O22:H1), CDC 63-57 (O139:H1), Bi316-42 (O9:K9:H12), U12-41 (O49:H1) (Orskov and Orskov, 1984). <sup>b</sup>Positive for stx2e.

<sup>c</sup>Positive for stx1d.

TABLE 4 | Detection of different E. coli H1 and H12 strains belonging to different O-serogroups by fliCH1, fliCH12 and fliCH1/H12 real-time PCR assays.

<sup>a</sup>Range of real-time PCR cycle thresholds. Negative reactions are indicated with the "–" sign.

<sup>b</sup>Orskov reference strain, non-motile; the fliC genotype was determined by nucleotide sequencing of fliC PCR products.

In light of these findings, the assays were then tested on a second panel of 78 strains comprising strains with H-types previously found to cross-react with the fliCH1 or fliCH12 PCR assays as well as strains from O-groups that can be found associated with H1 and H12, but with H-types different from H1 and H12 (**Table 6**).

None of the 78 strains with H-types different from H1 and H12 reacted with the common fliCH1/fliCH12 real-time PCR assay. Cross-reactions with the fliCH1 real-time PCR assay were observed with H6 (9/9), H49 (3/3), H31 (1/2), H34 (2/7), H41 (1/2), and H45 (3/4) strains, as well as with one O6:H4 strain. Weak cross-reactions were also observed with one O2:H25 strain and one O153:H25 strain. Cross-reactions with the fliCH12 real-time PCR assay were observed with H7 (9/9), H28 (6/6), H31 (1/2), H34 (2/7), and H41 (1/2) strains, as well as with one O20:H9, one O55:H19, and one O153:H14 strain. In contrast to the respective reference strains, no cross-reactions were observed with either real-time PCR assay for the two other H15 strains and the one H52 strain tested (**Tables 5, 6**). We do not know whether these three strains show further differences in the PCR target region that could explain these findings.

<sup>d</sup>Positive for stx2.

<sup>e</sup>Positive for stx1.

TABLE 5 | Cross-reactions of fliCH1 and fliCH12 real-time PCR assays with other flagellar types of E. coli.

<sup>a</sup>H-type reference strains (Orskov and Orskov, 1984).

<sup>b</sup>As listed in Table 1.

<sup>c</sup>Mean of real-time cycle threshold (CT-values) calculated from duplicate PCRs. Negative reactions are indicated with the "–" sign.

<sup>d</sup>Number of mismatches found between real-time detector sequence and target gene sequence. FP, forward primer; P, gene probe; RP, reverse primer.

Overall, molecular typing of E. coli H1 and H12 strains requires initial identification of H1/H12 strains with the common fliCH1/fliCH12 real-time PCR assay, followed by specific identification of fliCH1 and fliCH12 with their respective real-time PCR assays. The real-time PCR for common detection of fliCH1 and fliCH12 was found to be 100% sensitive and 100% specific. The fliCH1 and fliCH12 gene specific assays were found to be 100% sensitive, as they detected all E. coli H1 and all E. coli H12 strains, respectively. When used exclusively on H1 and H12 strains (as identified by the common primers/probe set in a first step), the fliCH1 and fliCH12 gene specific assays were found to be 100% specific. Thus, 100% of H1 and H12 strains would be accurately typed with this system.
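The two-step interpretation described above can be written down as a small decision rule. The Python sketch below is one possible reading of it, not a published algorithm: the function names and the "unresolved" outcome for discordant second-step results are assumptions, and the strain counts are those of the panels reported here (69 H1/H12 strains and 78 strains with other H-types).

```python
def call_h_type(common_positive: bool, h1_positive: bool, h12_positive: bool) -> str:
    """Interpret the common fliCH1/H12 assay first, then the two specific assays."""
    if not common_positive:
        # The specific assays cross-react with some other H-types, so their
        # results are disregarded unless the common assay is positive.
        return "not H1/H12"
    if h1_positive and not h12_positive:
        return "H1"
    if h12_positive and not h1_positive:
        return "H12"
    return "unresolved"  # assumption: discordant results would need sequencing


def sensitivity(true_positives: int, false_negatives: int) -> float:
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    return true_negatives / (true_negatives + false_positives)


# Common assay: all 31 H1 and 38 H12 strains positive, all 78 other strains negative.
common_sensitivity = sensitivity(true_positives=31 + 38, false_negatives=0)
common_specificity = specificity(true_negatives=78, false_positives=0)

# An H6 strain cross-reacting in the fliCH1 assay is still excluded by step one.
print(call_h_type(common_positive=False, h1_positive=True, h12_positive=False))
print(common_sensitivity, common_specificity)
```

The design choice is that specificity rests entirely on the first step; the specific assays only need to discriminate between H1 and H12 among strains that have already passed the common assay.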

#### DISCUSSION

The genetically and serologically closely related flagellar antigens H1 and H12 were found in heterogeneous types of E. coli strains belonging to 26 different O-serogroups, O-untypable and O-rough strains. With one exception (O79:H1 and O79:H12), H1 and H12 strains did not share common O-serogroups, which suggests that flagellar types H1 and H12 did not separate from each other very recently in evolution. They may have evolved independently following rearrangements in the O-group loci of ancestor strains carrying the closely related H1/H12 flagellar types and do not directly derive from a common O-group ancestor.

By comparing nucleotide sequences of fliC genes from thirteen H1 and eight H12 strains we identified six H-type specific aa changes at positions 302, 340, 361, 391, 396, and 430. All these are located in the variable part of flagellin determining antigen specificity (Wang et al., 2003). As these changes are characteristic for the respective flagellar antigen, we suppose them to determine the antigen specificities of H1 and H12. The few other aa changes detected in some H1 and H12 strains might thus not be significant as specific characteristics of H1 or H12 types. However, such aa-changes could explain the finding of serological subtypes of H1 which were detected using factor specific H-antisera (Ratiner et al., 1995).

Interestingly, the genetic distance between fliCH1 (Su1242, GenBank accession AB028471.1) and fliCH12 sequences (Bi 316-42, GenBank accession AY249997) (97.5% similarity) is less than that found between different alleles of fliCH28 (92.0% similarity) (Beutin et al., 2015b). It is slightly larger than the distance found among different alleles of fliCH19 (98.5% similarity) (Beutin et al., 2015a). Multiple allelic types of fliC were also detected in E. coli H6, H7, H8, H25, and H40 strains, respectively (Reid et al., 1999; Wang et al., 2000; Beutin and Strauch, 2007; Beutin et al., 2015b). A considerable number of serological cross-reactions had already been observed when flagellar types H1–H56 were compared (Orskov and Orskov, 1984; Edwards and Ewing, 1986). Some of these flagellar antigens (H1/H12, H8/H40, H11/H21, and H37/H41) are so closely related that the use of cross-absorbed antisera is needed to obtain unambiguous serotyping results (Edwards and Ewing, 1986).

The presence of allelic subtypes within fliC genes encoding different H-types of E. coli and the finding that different flagellar types cross-react serologically may complicate E. coli strain typing using H-antisera. The use of molecular typing procedures such as real-time PCR can solve the typing problem caused by serologically closely related H-antigens, as we have shown for H1 and H12 in this work. Using Primer Express V3.0 software, it was not possible to design real-time PCRs specific exclusively for fliCH1 and fliCH12, respectively. For this reason, we employed a two-step real-time detection procedure. The first step uses a real-time PCR highly specific for both H1 and H12 strains, followed by subtyping of H1/H12-positive strains with the respective fliCH1 and fliCH12 specific real-time PCRs. Short probe sequence lengths as obtained with minor groove binder (MGB) or MGB-replacements (XS-probe) are needed to ensure specificity between closely similar DNA targets, as previously shown for fliCH19 allelic discrimination (Beutin et al., 2015a). The PCRs could be used in parallel for examination of large numbers of isolates using high-throughput PCR platforms, as described previously for analysis of large numbers of Clostridia and E. coli strains (Delannoy et al., 2013; Woudstra et al., 2013).

#### TABLE 6 | Reaction of the fliCH1/H12, fliCH1 and fliCH12 real-time PCR assays with non-H1 and non-H12 strains.

<sup>a</sup>Range of real-time PCR cycle thresholds. Negative reactions are indicated with the "–" sign.

Unambiguous typing of fliCH1 and fliCH12 sequences is of interest for clinical and epidemiological investigations since some H1 and H12 strains were shown to play a role as pathogens in humans and animals.

More than one third of investigated H1 and H12 strains produced Shiga toxins. Strains showing O:H types characteristic for ExPEC associated with human diseases (O2:H1, O4:H1, O6:H1, O15:H1) were detected in this work. Interestingly, these were not only from humans but also found in animals and food. It was previously described that animals, food and water can be a source of pandemic ExPEC strains (Jakobsen et al., 2010; Riley, 2014; Gomi et al., 2015; Singer, 2015). Flagellar type H12 strains encompass mainly STEC (O20:H12, O55:H12, O118:H12, O136:H12, O153:H12, and Or:H12) and were isolated from diseased animals and humans, food and the environment (Scheutz and Strockbine, 2005).

The specific molecular detection of H1 and H12 flagellins as described in this study will be useful for diagnosis and for source attribution of human and animal pathogenic ExPEC and STEC strains in outbreaks and sporadic cases of infection.

#### AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: LB, SD, PF. Performed the experiments: LB, SD. Analyzed the data: LB, SD, PF. Contributed reagents/materials/analysis tools: LB, SD, PF. Wrote the paper: LB, SD, PF. Critical revision of the paper for important intellectual content: LB, SD, PF.

#### REFERENCES


#### FUNDING

The project was partially financed by the French "joint ministerial program of R&D against CBRNE risks" (Grant number C1 7609- 2).

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00135




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Beutin, Delannoy and Fach. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Rapid, Multiplexed Characterization of Shiga Toxin-Producing Escherichia coli (STEC) Isolates Using Suspension Array Technology

John M. Carter<sup>1</sup> \*, Andrew Lin<sup>2</sup> , Laurie Clotilde<sup>3</sup> and Matthew Lesho<sup>4</sup>

<sup>1</sup> Pacific West Area – Western Regional Research Center – Produce Safety and Microbiology Research, Agricultural Research Service, United States Department of Agriculture, Albany, CA, USA, <sup>2</sup> ORA/PA-FO/SAN-LB – Office of Global Regulatory Operations and Policy – Oceans, Reefs & Aquariums – Food and Drug Administration, United States Department of Health and Human Services, Alameda, CA, USA, <sup>3</sup> MagArray, Inc., Milpitas, CA, USA, <sup>4</sup> Luminex Corporation, Austin, TX, USA

#### Edited by:

Chitrita DebRoy, The Pennsylvania State University, USA

#### Reviewed by:

Arun K. Bhunia, Purdue University, USA Yanhong Liu, United States Department of Agriculture, USA

\*Correspondence:

John M. Carter mark@safetraces.com

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 15 January 2016 Accepted: 18 March 2016 Published: 20 May 2016

#### Citation:

Carter JM, Lin A, Clotilde L and Lesho M (2016) Rapid, Multiplexed Characterization of Shiga Toxin-Producing Escherichia coli (STEC) Isolates Using Suspension Array Technology. Front. Microbiol. 7:439. doi: 10.3389/fmicb.2016.00439

Molecular methods have emerged as the most reliable techniques to detect and characterize pathogenic Escherichia coli. These molecular techniques include conventional single analyte and multiplex PCR, PCR followed by microarray detection, pulsed-field gel electrophoresis (PFGE), and whole genome sequencing. The choice of methods used depends upon the specific needs of the particular study. One versatile method involves detecting serogroup-specific markers by hybridization or binding to encoded microbeads in a suspension array. This molecular serotyping method has been developed and adopted for investigating E. coli outbreaks. The major advantages of this technique are the ability to simultaneously serotype E. coli and detect the presence of virulence and pathogenicity markers. Here, we describe the development of a family of multiplex molecular serotyping methods for Shiga toxin-producing E. coli, compare their performance to traditional serotyping methods, and discuss the cost-benefit balance of these methods in the context of various food safety objectives.

Keywords: E. coli, Shiga toxin, immunoassay, PCR, microbead

#### INTRODUCTION/BACKGROUND

In the U.S., Shiga toxin-producing Escherichia coli (STEC) represent a significant public health concern causing approximately 175,000 illnesses annually (Scallan et al., 2011). The CDC estimates the annual U.S. costs for acute care of STEC patients to be \$1–2 billion (Paton and Paton, 2000 Science and Medicine p 28–37). Furthermore, STEC-related recalls are expensive for the food industry. For example, the outbreak of E. coli O157 associated with spinach from California's Salinas Valley in 2006 cost farmers up to \$200 million (Arnade et al., 2010). Although E. coli O157 remains the most common STEC serogroup in the U.S. (Centers for Disease Control and Prevention [CDC], 2011), over 200 additional E. coli serogroups exist (Hussein and Bollinger, 2005). Those non-O157 STEC are responsible for over 60% of the STEC infections (Scallan et al., 2011). The clinical manifestations of STEC infections range from mild watery diarrhea to severe complications of hemorrhagic colitis (HC), hemolytic uremic syndrome (HUS), and even death (Levine et al., 1987; Gyles, 2007). Since not all STEC appear to cause illness, distinguishing pathogenic STEC from the ones that do not pose a health risk remains a current challenge for regulatory agencies worldwide. Certain STEC serogroups are more highly correlated with severe illness than others. For example, STEC O26, O45, O103, O111, O121, and O145 are referred to as the "Big Six" and account for 75–80% of non-O157 STEC isolations in clinical samples in the U.S. (Gould et al., 2013). In addition, STEC O91, O113, and O128 have also been previously reported to cause HC and HUS (Johnson et al., 2006), and an STEC O104-caused outbreak in Germany in 2011 sickened 3816 individuals, making it one of the largest HUS outbreaks ever reported (Muniesa et al., 2012). Thus rapidly identifying STEC belonging to those serogroups is as important for protecting consumer health as early diagnosis of STEC infection for determining the proper treatment (Gould et al., 2009). Additionally, such rapid identification can alert public health officials that an outbreak has occurred and aid in matching clinical, food, and environmental isolates when attempting to trace back to the source of contamination.

Traditional serotyping employs O-specific antisera, usually in a slide agglutination format. These methods are quite simple, robust, and rapid. However, they can be both labor-intensive and time-consuming for large numbers of isolates, and may lead to ambiguous results because of variability in antisera production and a lack of standard methodology. Whereas slide agglutination is also relatively inexpensive when only a few of the most common antigens are tested, maintaining a complete set of hundreds of antisera reagents for O serotyping is expensive and most often left to central reference laboratories, adding to the time of analysis (Dijkshoorn et al., 2001). Currently, there is a need for faster tests. One solution is to allow a food sample to be probed simultaneously for specific STEC serogroups, which can then be extracted from that food sample, thus completing detection and isolation of STEC in foods within 24 h. Suspension array technology using microbeads has been useful in filling this need by enabling development of rapid, high-throughput, adaptable assays for identifying clinically relevant STEC O-serogroups. These microbead-based assays can theoretically analyze up to 500 targets in a single reaction, so targets can be added or removed as needed. For example, a 7-plex immunoassay was developed by Clotilde et al. (2013) to identify O serogroups O26, O45, O103, O111, O121, O145, and O157. The microbeads were also capable of binding to live cells, which aids in isolating pathogenic organisms for further characterization studies (Clotilde et al., 2013). Another example is a 13-plex PCR-based suspension assay to identify 11 O-serogroups: O26, O45, O91, O103, O104, O111, O113, O121, O128, O145, and O157; and two virulence factors: eae and aggR (Feng et al., 2015). These two assays are fast (<4 h), high throughput (96 well format), and were able to identify STEC serogroups more accurately than traditional serotyping (Clotilde et al., 2015).

#### SUSPENSION ARRAY TECHNOLOGY

Microbead-based suspension array technology has emerged as a standard method for simultaneously detecting multiple biological analytes from one sample and has enabled a wide variety of applications in life science research, clinical diagnostics, food safety, and biodefense (McBride et al., 2003; Toro et al., 2013; Sun et al., 2014; Di Cristanziano et al., 2015; Khalifian et al., 2015; Silbereisen et al., 2015; Zhang et al., 2015; Kong et al., 2016). Commercial assay kits are available for cytokine profiling, infectious disease diagnostics, genotyping for inherited diseases, food pathogen typing, organ transplant compatibility testing, and many more. Many of these kits are built on the Luminex® xMAP® multiplexing platform, which utilizes fluorescent dyes to create sets of microbeads with unique spectral identities. Unique capture molecules are coupled to each set of microbeads, which then capture different analytes of interest. After binding to a detector molecule, which subsequently binds to a fluorescent reporter, these microbeads are read in a flow cytometer such as the Luminex® 200™ or FLEXMAP 3D® instrument, or in a bead imager such as the Luminex MAGPIX® instrument to determine both the presence and quantity of analyte(s) in the sample. The Luminex systems are capable of detecting up to 50, 100, or 500 different analytes from one sample in the MAGPIX, Luminex 200, or FLEXMAP 3D instruments, respectively. Some advantages of suspension array platforms include fast kinetics due to mobile capture surfaces, broad dynamic range (>3 logs), and high precision due to multiple independent measurements on many microbeads for each analyte in the sample. Open platform suspension array systems also enable researchers and other biomedical professionals to design and build tests for their specific applications. In the area of food safety discussed here, an important aspect of suspension array technology is the ability to detect DNA, RNA, and protein targets with the same system.
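As a simple illustration of how the stated analyte limits constrain assay design, the following Python sketch checks which instruments could accommodate a panel of a given size. The dictionary only restates the capacities given above; the function name is hypothetical and not part of any Luminex software.

```python
from typing import Dict, List

# Maximum number of analytes per sample, as stated in the text.
MAX_ANALYTES: Dict[str, int] = {
    "MAGPIX": 50,
    "Luminex 200": 100,
    "FLEXMAP 3D": 500,
}


def instruments_for(plex: int) -> List[str]:
    """List the instruments whose stated capacity covers a panel of `plex` targets."""
    if plex < 1:
        raise ValueError("a panel needs at least one target")
    return [name for name, capacity in MAX_ANALYTES.items() if plex <= capacity]


# The 7-plex immunoassay and the 13-plex PCR-based assay fit on every instrument.
print(instruments_for(7))
print(instruments_for(13))
# A hypothetical 120-plex panel would exceed the two smaller capacities.
print(instruments_for(120))
```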

### IMMUNOASSAY/PROTEIN BASED SEROTYPING

Inoculation with E. coli produces a strong immune response, which targets immunodominant antigens, such as lipopolysaccharide (O-antigens) and flagellum (H-antigens). The best-known E. coli serotype is O157:H7. Strains sharing the O157 O-antigen (i.e., members of the O157 serogroup) also tend to share pathogenic phenotypes. For example, most strains belonging to the O157 serogroup carry prophages that code for Shiga toxin (Stx). Other serotypes may also carry Stx-encoding phage, and in the US seven STEC serogroups are considered adulterants in foods: O26, O111, O103, O121, O45, O145, and O157. Although there are particular strains within these serogroups that do not produce Stx, all are grounds for regulatory actions, including product recalls. Thus the food industries have considerable interest in E. coli serogroup testing.

Identification of specific E. coli serotypes is important for tracing infections to their environmental source. Outbreaks were formerly detected and defined solely by the serogroup of pathogens, and serotyping still provides a rapid and inexpensive means for preliminary characterization. Until recently the "gold standard" for serotyping was slide agglutination with O and H antigen specific antisera (Orskov et al., 1977; Machado et al., 2000). Now, alternative immunochemical assay formats are also available, including our Luminex-format microbead suspension array as a solid phase for multiplex typing of various STEC serogroups (see **Figure 1A**).

Our assay architecture is similar to a sandwich ELISA, starting with an antibody (Ab) that is bound to a solid phase support. In a typical ELISA, this "capture Ab" is non-covalently adsorbed to a microplate well, but we use covalent attachment. When the target is added, the capture antibody can bind and pull its antigen out of the solution phase. Next the detector Ab is added, which binds to the antigen to create the Ab sandwich configuration. Finally the detector Ab is detected by means of a probe. In ELISA, that probe is labeled with an enzyme. In our assay, the probe is labeled with phycoerythrin, a bright fluorophore. Because bacteria exhibit many copies of O-antigen, the same Ab may be used for both capture and detection.

To generate the assay reagents, magnetic microbeads were covalently coupled to capture Ab according to the instructions provided with the BioRad Amine Coupling kit (BioRad, Hercules, CA, USA), using an amount of Ab based on preliminary microplate ELISA data (1–10 µg/mL). This reaction is a common two-step carbodiimide protocol with N-hydroxysulfosuccinimide. Coupling was confirmed using phycoerythrin-labeled anti-species Ab. Detector Ab were biotinylated with the EZ-Link Sulfo-NHS-Long Chain-Biotin kit (Pierce, Rockford, IL, USA), according to the manufacturer's instructions. Coupled microbeads and detector Ab were stable at 2–8°C for up to 1 year.

Samples were generated by blending foods at a 1:10 dilution into Brain Heart Infusion broth. A 1 mL aliquot was then spiked with ∼10 CFU of E. coli and incubated overnight with shaking at 25 or 37°C. Appropriate BSL2 precautions were used when handling live pathogens. For example, samples were spiked after blending, and cultures were handled in screw-cap tubes, to reduce aerosolization.

For the assay, incubations were all 1 h at room temperature, in black microplates, swirling at 100 rpm. Washes were all threefold, using phosphate-buffered saline (PBS), pH 7.4, containing 0.05% Tween 20 (PBS-T) and a Bio-Plex Pro Wash Station (BioRad). For each sample, a 100 µL aliquot was combined with 5000 of each type of microbead, then incubated and washed. The microbeads were then resuspended in 100 µL of a cocktail containing 4 µg/mL of each biotinylated detector Ab, then incubated and washed. The microbeads were then resuspended in 100 µL of 4 µg/mL streptavidin labeled with R-phycoerythrin (SAPE), incubated and washed. The microbeads were finally resuspended in 100 µL PBS and then analyzed on Luminex 100 or MAGPIX instruments.

For these experiments, we used overnight cultures, and we were not concerned with assay sensitivity. We typically recovered 50–90% of microbeads from samples, depending on the matrix. We found that even 100 microbeads (instead of 5000) were sufficient for robust and reproducible signals. In validation experiments performed with three strains of each of the seven listed STEC serogroups, the assay gave 100% sensitivity and specificity (Clotilde et al., 2013). When we expanded the validation to include 161 environmental STEC strains our Luminex immunoassay missed only one strain of O157. In this latter experiment we compared performance of our assay to standard assays and a microbead-based PCR serotyping assay (Clotilde et al., 2015). This comparative study (**Table 1**) suggested that our Luminex immunoassay also missed 10 strains of O111 and O128. However, these targets were not included in our 7-plex immunoassay.

We have now further expanded our assay to include the 10 most common STEC in the US: O26, O45, O103, O111, O121, O145, and O157, plus O91, O113, and O128. We still observe excellent sensitivity and specificity. We believe that STEC will continue to evolve additional pathogenic serogroups in the future. Regulatory agencies currently base their actions on the serogroup of such emerging adulterants, but there is consensus toward surveillance of virulence factors, rather than O-antigens. We already have working immunoassays for Stx (Clotilde et al., 2011), and we plan to add intimin, a virulence factor involved in E. coli attachment.

### MOLECULAR SEROTYPING

The O antigen, a polymer of repeating oligosaccharides, is a component of the lipopolysaccharide of Gram-negative bacteria and is used for serotyping E. coli into O serogroups. The wzx flippase and wzy polymerase genes, which code for proteins involved in making the O antigen oligosaccharide, have proven to be O serogroup specific and are thus excellent targets for O serogroup specific PCR assays (Lin et al., 2011b). These genes were used to develop a multiplex suspension array that can identify 11 STEC O serogroups: O26, O45, O91, O103, O104, O111, O113, O121, O128, O145, and O157 (Lin et al., 2013) (**Figure 1B**).

#### TABLE 1 | List of disagreements between serotyping methods.

OUT, other untypeable.

Primer and probe sequences from each of the 11 O serogroup wzx or wzy genes were identified. Primer pairs were designed with one biotinylated primer and one unlabeled primer to yield a biotinylated amplicon. A single primer mix with primers for all targets was prepared. PCR reactions were carried out according to Lin et al. (2013). Signal to noise was improved by asymmetric PCR amplification, with biotinylated and unlabeled primers at a concentration ratio of 5:1. Probe sequences complementary to the biotinylated strand were conjugated to Luminex MagPlex® microbeads according to the manufacturer's instructions (Luminex Corporation, 2007). After multiplex PCR, amplicons were hybridized to labeled microbeads and incubated with SAPE. Reactions were then analyzed with a Luminex-compatible cytometry-type instrument to interrogate each uniquely colored microbead and detect the amount of reporter molecule. Median fluorescent intensities (MFIs) were calculated for each analyte, and a signal to background ratio was determined, where background is the MFI measured using one or more wells containing all ingredients except for template DNA. Signal to background ratio was calculated using Bio-Plex Manager software as (Sample MFI − Background MFI)/Background MFI. Samples were considered positive when the signal to background ratio was greater than 5.0.
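The positivity call described above can be sketched in a few lines of Python. This is only an illustration of the stated formula and the 5.0 cutoff; the function names and the MFI values are hypothetical and are not taken from Bio-Plex Manager or from the study data.

```python
POSITIVE_THRESHOLD = 5.0  # cutoff stated in the text


def signal_to_background(sample_mfi: float, background_mfi: float) -> float:
    """Return (Sample MFI - Background MFI) / Background MFI."""
    if background_mfi <= 0:
        raise ValueError("background MFI must be positive")
    return (sample_mfi - background_mfi) / background_mfi


def call_serogroups(sample_mfis, background_mfis, threshold=POSITIVE_THRESHOLD):
    """Map each analyte to (ratio, is_positive); positive means ratio > threshold."""
    calls = {}
    for analyte, mfi in sample_mfis.items():
        ratio = signal_to_background(mfi, background_mfis[analyte])
        calls[analyte] = (ratio, ratio > threshold)
    return calls


# Hypothetical MFI values, for illustration only.
background = {"O26": 50.0, "O157": 40.0, "O111": 60.0}
sample = {"O26": 55.0, "O157": 1240.0, "O111": 360.0}

calls = call_serogroups(sample, background)
for analyte, (ratio, positive) in calls.items():
    print(f"{analyte}: S/B = {ratio:.1f}, positive = {positive}")
```

In this made-up example the O111 analyte has a ratio of exactly 5.0 and is therefore not called positive, because the criterion is strictly "greater than 5.0".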

The PCR-based Luminex suspension array for O serotyping has proven to be accurate, robust, and adaptable. A panel of 114 STEC isolates were all correctly identified, while 46 non-STEC and non-E. coli isolates yielded no false positives (Lin et al., 2011a). In a multi-laboratory study of blind samples involving nine laboratories, 492 of the 495 isolates analyzed were identified correctly (99.4% accuracy) (Lin et al., 2013). Another advantage of the suspension array system, and of PCR-based suspension arrays in particular, is the flexibility to add or remove targets from the array as needed. For instance, the 11-plex suspension array to identify 11 STEC O serogroups was recently modified to include STEC attachment factors that may be important markers of pathogenic potential. Primers and probes for the eae intimin gene and for the aggR regulator of the enteroaggregative phenotype have been added to the 11-plex O serotyping array to yield a 13-plex STEC suspension array (Feng et al., 2015).

Suspension array technology allows for rapid, accurate, high throughput analysis of STEC isolates. In addition to identifying the most clinically relevant STEC O serogroups, the 13-plex suspension array is able to detect the presence of adherence factors that are associated with HUS, allowing regulatory and public health labs to determine pathogenic potential. The addition of the virulence factors to the suspension array also illustrates the flexibility of suspension array technology. While the big six STEC O serogroups and the additional O serogroups O91, O128, O113, and O104 are presently the most concerning, new emerging O serogroups could be added to the suspension array. Additional targets could be added to the array if other genes prove to be more highly correlated with illness. For instance, the H flagellar antigen is a useful phylogenetic marker (Ju et al., 2012) and could be added to the array to indicate the full O and H serotype. Other putative virulence factors that could be included in an STEC virulence profile include the plasmid-encoded enterohemolysin (ehxA), STEC autoagglutinating adhesin (Saa), extracellular serine protease (espP), long polar fimbria (lpf), and a bifunctional catalase-peroxidase (katP) (Law, 2000; Etcheverria and Padola, 2013). Another area of future study is to evaluate the effectiveness of the suspension array in screening food, environmental, and clinical samples. A preliminary study of artificially inoculated fresh produce and raw milk resulted in over 90% agreement between the suspension array and qPCR screening for stx1 and/or stx2 genes (Kase et al., 2015). Further improvements, such as including an internal amplification control, would be useful, especially when screening enrichments, to ensure that PCR inhibition does not cause false negative results.

#### DISCUSSION

Both of these suspension array assays represent improvements in speed and accuracy for detection and characterization of STEC pathogens in foods. A unique feature of the microbead-based immunoassay is that the magnetic microbeads pull down live pathogens, concentrating them. We have found that positive samples are easily cultured from the material remaining in the assay microplate (Clotilde et al., 2011). This facilitates further characterization, e.g., via molecular experiments. Other possibilities include use of an orthogonal detection scheme, other than SAPE. One group has combined immunomagnetic pulldown on Luminex microbeads with a modified fluorescence in situ hybridization (FISH) detection method (Stroot et al., 2012). This capture antibody-targeted FISH assay (CAT-FISH) method provides sensitivity equal to the Luminex immunoassay with the possibility of additional specificity from the nucleic acid-based FISH reporter. Among the many advantages of these suspension arrays is the ability to quickly upgrade and enhance assays as new genomic information becomes available for other pathotypes. Beyond the addition of more informative targets, other potential enhancements for these assays include workflow simplification, improved reagent stability through lyophilization or stable, engineered antibodies, and the ability to interrogate single cells with the assay, which would enable the characterization of mixed populations of STEC.

These assays fit on the spectrum of STEC characterization techniques between single-plex PCR assays for rapid E. coli screening of food samples and Next-Generation Sequencing (NGS) techniques. NGS is benefitting assay developers by providing information about novel molecular targets to more completely characterize STEC samples. While NGS is a powerful tool in microbiological research and outbreak investigations, there is still a need for low-complexity multiplex assays to make specific determinations about prioritized serogroups. One of the most significant drawbacks to fluorescent microbead-based methods is the cost of reagents, specifically the microbeads themselves. If we test 100 samples with a 10-plex assay, then 1000 assays are performed, and the cost per assay is acceptable. However, if half of those samples are identified as O157, then a more cost-effective workflow might begin with slide agglutination for all samples, using a single-plex O157 assay, followed by work-up of the non-O157 samples using a more expensive multiplex assay. The tradeoff of this workflow is that the time taken to do a full determination of the non-O157 samples is increased significantly. The suspension array technology still remains a versatile platform that benefits research, industrial, and regulatory labs by enabling a variety of protein and genetic analyses for many foodborne pathogens of interest (e.g., sets of assays to identify and characterize Salmonella, STEC, and foodborne toxins).

### AUTHOR CONTRIBUTIONS

JC drafted and edited the immunoassay/protein-based serotyping section and the discussion. He and his lab staff conceived, developed, built, and tested the assays that resulted in the data in **Table 1**. AL drafted and edited the Introduction and Molecular Serotyping sections. He conceived and developed the molecular serotyping assays described herein, including the enhanced assay with markers of pathogenic potential. LC provided significant editorial contribution to all sections and collaborated with JC and AL in building and testing the assays described. ML outlined the article, drafted the abstract and suspension array sections, and formulated key messages in the discussion section. Luminex Corporation currently manufactures E. coli Serogroup Identification assays.

### ACKNOWLEDGMENTS

U.S. Department of Agriculture work was funded under National Program 108, Food Safety, CRIS 5325-42000-043-00D. The authors would also like to thank the FDA San Francisco Laboratory, FDA Office of Regulatory Science, and CFSAN Office of Regulatory Science for their support of this research.

### REFERENCES

serotyping methods for Shiga toxin-producing Escherichia coli. Foodborne Path. Dis. 12, 118–121. doi: 10.1089/fpd.2014.1827


**Disclaimer:** This presentation is not intended to be or appear as an endorsement, either directly, or indirectly, by the U.S. Government of any product or service described in this presentation. The U.S. Government does not directly or indirectly endorse any product or service presented or described in this presentation. The U.S. Department of Agriculture (USDA) prohibits discrimination in all its programs and activities on the basis of race, color, national origin, age, disability, and where applicable, sex, marital status, familial status, parental status, religion, sexual orientation, genetic information, political beliefs, reprisal, or because all or part of an individual's income is derived from any public assistance program. (Not all prohibited bases apply to all programs.) Persons with disabilities who require alternative means for communication of program information (Braille, large print, audiotape, etc.) should contact USDA's TARGET Center at (202) 720-2600 (voice and TDD). To file a complaint of discrimination, write to USDA, Director, Office of Civil Rights, 1400 Independence Avenue, SW, Washington, DC 20250-9410, or call (800) 795-3272 (voice) or (202) 720-6382 (TDD). USDA funding was administrated through the Agricultural Research Service, National Program 108 Food Safety.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Carter, Lin, Clotilde and Lesho. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Six Novel O Genotypes from Shiga Toxin-Producing Escherichia coli

Atsushi Iguchi<sup>1</sup>\*, Sunao Iyoda<sup>2</sup>, Kazuko Seto<sup>3</sup>, Hironobu Nishii<sup>1</sup>, Makoto Ohnishi<sup>2</sup>, Hirohisa Mekata<sup>4,5</sup>, Yoshitoshi Ogura<sup>6</sup> and Tetsuya Hayashi<sup>6</sup>

<sup>1</sup> Department of Animal and Grassland Sciences, Faculty of Agriculture, University of Miyazaki, Miyazaki, Japan, <sup>2</sup> Department of Bacteriology I, National Institute of Infectious Diseases, Tokyo, Japan, <sup>3</sup> Division of Bacteriology, Osaka Prefectural Institute of Public Health, Osaka, Japan, <sup>4</sup> Organization for Promotion of Tenure Track, University of Miyazaki, Miyazaki, Japan, <sup>5</sup> Center for Animal Disease Control, University of Miyazaki, Miyazaki, Japan, <sup>6</sup> Department of Bacteriology, Faculty of Medical Sciences, Kyushu University, Fukuoka, Japan

Serotyping is one of the typing techniques used to classify strains within the same species. O-serogroup diversification shows a strong association with the genetic diversity of O-antigen biosynthesis genes. In a previous study, based on the O-antigen biosynthesis gene cluster (O-AGC) sequences of 184 known Escherichia coli O serogroups (from O1 to O187), we developed a comprehensive and practical molecular O serogrouping (O genotyping) platform using a polymerase chain reaction (PCR) method, named E. coli O-genotyping PCR. Although the validation assay using the PCR system showed that most of the tested strains were successfully classified into one of the O genotypes, 6.1% (35/575) of the strains could not be classified, suggesting the presence of novel O genotypes. In this study, we conducted sequence analysis of O-AGCs from O-genotype untypeable Shiga toxin-producing E. coli (STEC) strains and identified six novel O genotypes: OgN1, OgN8, OgN9, OgN10, OgN12 and OgN31, with unique wzx and/or wzy O-antigen processing gene sequences. Additionally, we designed specific PCR primers to identify these novel O genotypes. A screen of O-genotype untypeable strains showed that 13 STEC strains were classified into five of the novel O genotypes. O genotyping at the molecular level of the O-AGC would aid in the characterization of E. coli isolates and will assist future studies in STEC epidemiology and phylogeny.

#### Keywords: E. coli, O serogroup, genotyping techniques, PCR, STEC

## INTRODUCTION

Serotyping is a standard method for subtyping of Escherichia coli strains in taxonomical and epidemiological studies (Orskov and Orskov, 1984). In particular, the identification of strains of the same O serogroup is essential in outbreak investigations and surveillance for identifying the diffusion of a pathogenic clone (Frank et al., 2011; Luna-Gierke et al., 2014; Terajima et al., 2014; Heiman et al., 2015). Thus far, the World Health Organization Collaborating Centre for Reference and Research on Escherichia and Klebsiella, which is based at the Statens Serum Institut (SSI) in Denmark<sup>1</sup> , has recognized 185 E. coli O serogroups. These are designated O1 to O188 (publication of O182 to O188 is pending) and include three pairs of subgroups, O18ab/ac, O28ab/ac, and O112ab/ac; and six missing numbers, O31, O47, O67, O72, O93, and O122 (Orskov and Orskov, 1992; Scheutz et al., 2004).
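The serogroup count above can be checked with a short enumeration. The sketch below assumes that each ab/ac pair is counted as two separate serogroups, which is the reading under which the quoted numbers add up; the variable names are illustrative only.

```python
highest_number = 188
missing_numbers = {31, 47, 67, 72, 93, 122}  # numbers not in use
split_numbers = {18, 28, 112}  # each split into an ab and an ac subgroup

serogroups = []
for n in range(1, highest_number + 1):
    if n in missing_numbers:
        continue
    if n in split_numbers:
        # Assumption: the ab and ac subgroups count as two serogroups.
        serogroups.extend([f"O{n}ab", f"O{n}ac"])
    else:
        serogroups.append(f"O{n}")

total = len(serogroups)
print(total)  # 188 - 6 + 3 = 185
```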

<sup>1</sup>http://www.ssi.dk/English.aspx

#### Edited by:

Pina Fratamico, United States Department of Agriculture – Agricultural Research Service, USA

#### Reviewed by:

Beatriz Quiñones, United States Department of Agriculture – Agricultural Research Service, USA Edward G. Dudley, Pennsylvania State University, USA

#### \*Correspondence:

Atsushi Iguchi iguchi@med.miyazaki-u.ac.jp

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 18 February 2016 Accepted: 06 May 2016 Published: 20 May 2016

#### Citation:

Iguchi A, Iyoda S, Seto K, Nishii H, Ohnishi M, Mekata H, Ogura Y and Hayashi T (2016) Six Novel O Genotypes from Shiga Toxin-Producing Escherichia coli. Front. Microbiol. 7:765. doi: 10.3389/fmicb.2016.00765

O-serogroup diversification shows a strong association with the genetic diversity of O-antigen biosynthesis genes. In E. coli, the genes required for O-antigen biosynthesis are clustered at a chromosomal locus flanked by the colanic acid biosynthesis gene cluster (wca genes) and the histidine biosynthesis (his) operon. Sequence comparisons of O-antigen biosynthesis gene clusters (O-AGCs) indicate a variety of genetic structures (DebRoy et al., 2011a). In particular, sequences from O-antigen processing genes (wzx/wzy and wzm/wzt) located on the O-AGCs are highly variable and can be used as gene markers for the identification of O serogroups via molecular approaches. So far, several studies have reported genetic methodologies allowing rapid and low-cost O-typing of isolates (Coimbra et al., 2000; Beutin et al., 2009; Bugarel et al., 2010; Wang et al., 2010, 2014; DebRoy et al., 2011b; Fratamico and Bagi, 2012; Quiñones et al., 2012; Geue et al., 2014). In a previous study (Iguchi et al., 2015a), we analyzed the O-AGC sequences of 184 known E. coli O serogroups (from O1 to O187) and organized 162 DNA-based O serogroups (O-genotypes) on the basis of the wzx/wzy and wzm/wzt sequences. Subsequently, we presented a comprehensive molecular O-typing scheme: an E. coli O-genotyping polymerase chain reaction (ECOG-PCR) system using 20 multiplex PCR sets containing 162 O-genotype-specific PCR primers (Iguchi et al., 2015b).

The Shiga toxin-producing E. coli (STEC) constitute one of the most important groups of food-borne pathogens, as they can cause gastroenteritis that may be complicated by hemorrhagic colitis or hemolytic-uremic syndrome (HUS; Tarr et al., 2005). O157 is a leading STEC O serogroup associated with HUS (Terajima et al., 2014; Heiman et al., 2015), and other STEC O serogroups, including O26, O103, O111, O121 and O145, are also recognized as significant food-borne pathogens worldwide (Johnson et al., 2006). Additionally, unexpected STEC O serogroups have sometimes emerged to cause sporadic cases or outbreaks. For example, STEC O104:H4 was responsible for a large food-borne disease outbreak in Europe in 2011 (Buchholz et al., 2011). For such varied O serogroups, ECOG-PCR is an accurate and reliable approach for subtyping E. coli isolates from patients and contaminated foods (Iguchi et al., 2015b; Ombarak et al., 2016). However, as our previous studies indicated, some of the tested strains were not classified into any of the known O genotypes, suggesting the presence of novel O genotypes (Iguchi et al., 2015b).

Here, we analyzed the O-AGCs from STEC strains (including strains from patients with diarrhea and hemorrhagic colitis) that were untypeable by ECOG-PCR. By comparing sequences, we revealed six novel O-genotypes and developed a specific PCR for each novel O-genotype.

### MATERIALS AND METHODS

### O Serogrouping/O Genotyping

O serogroups were determined by agglutination tests in microtiter plates using commercially available pooled and single antisera against all recognized E. coli O antigens (O1 to O187; SSI Diagnostica, Hillerød, Denmark). O genotypes were determined by ECOG-PCR as described in our previous study (Iguchi et al., 2015b). Salmonella enterica O42 (SSI Diagnostica) and Shigella boydii type 13 (Denka Seiken Co. Ltd., Japan) single antisera were also used to test for the agglutination reaction.

### Source Sequences of Novel O-Genotypes

The O-AGC sequences were determined from six O-genotype untypeable (OgUT) STEC strains, of which four were serologically typeable (O1, O39, O40, and O141) and two others were untypeable (OUT) strains (**Table 1**). All strains were isolated from human feces (including patients with diarrhea and hemorrhagic colitis) in Japan from 2008 to 2012. The O-AGC sequences flanked by wcaM and hisI were extracted from draft genome sequences determined using an Illumina MiSeq sequencer (Illumina, San Diego, CA, USA), as previously described (Ogura et al., 2015). Identification and functional annotation of the coding sequences were performed based on the results of homology searches against the public non-redundant protein database using BLASTP. Six O-AGC sequences reported in this paper have been deposited in the GenBank/EMBL/DDBJ database (accession no. LC125927-LC125932).

#### Sequence Comparisons

The wzx/wzy sequences from O-serogroup strains (Iguchi et al., 2015a) and OX-group reference strains (DebRoy et al., 2016) were used. Additionally, O-AGC sequences from O116 (AB812051; Iguchi et al., 2015a), O1 (GU299791; Li et al., 2010), O39 (AB811616; Iguchi et al., 2015a), O141 (DQ868765; Han et al., 2007), O40 (EU296417; Liu et al., 2008), S. enterica O42 (JX975340; Liu et al., 2014), and S. boydii type 13 (AY369140; Feng et al., 2004) were also used. Multiple alignments of DNA and amino acid sequences were constructed by using the CLUSTAL W program (Thompson et al., 1994). Phylogenetic trees were constructed by using the neighbor-joining algorithm in MEGA5 software (Tamura et al., 2007). Homology comparisons of paired sequences were performed by using the In Silico Molecular Cloning Genomics Edition (In Silico Biology, Inc., Yokohama, Japan).

<sup>a</sup>AC, asymptomatic carrier; D, diarrhea; BD, bloody diarrhea.

#### PCR for Identifying Novel O-Genotypes

Polymerase chain reaction primers for specifically identifying novel O genotypes were designed (**Table 2**), and their specificities were evaluated by using 185 O-serogroup reference strains (O1–O188) from SSI under the following PCR conditions. Genomic DNA from E. coli strains was purified using the Wizard Genomic DNA purification kit (Promega, Madison, WI, USA) or the DNeasy Blood & Tissue Kit (QIAGEN, Hilden, Germany) according to the manufacturer's instructions. Template DNA was used at 10 ng/µl. Each 30-µl reaction mixture contained 2 µl of genomic DNA, 6 µl of 5× Kapa Taq buffer, dNTP mix (final concentration, 0.3 mM each), MgCl<sub>2</sub> (final concentration, 2.5 mM), primers (final concentration, 0.5 µM each), and 0.8 U of Kapa Taq DNA polymerase (Kapa Biosystems, Woburn, MA, USA). The thermocycling conditions were 25 cycles of 94°C for 30 s, 58°C for 30 s, and 72°C for 1 min. PCR products (2 µl) were electrophoresed in 1.5% agarose gels in 0.5× TBE (25 mM Tris borate, 0.5 mM EDTA) and photographed under UV light after the gel was stained with ethidium bromide (1 mg/ml).
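As a quick sanity check, a few quantities implied by the reaction recipe above can be derived directly from the stated volumes and concentrations. The following sketch uses only values given in the text; the variable names are illustrative, and the cycling time excludes ramping and any initial denaturation or final extension steps, which are not described.

```python
REACTION_VOLUME_UL = 30.0

template_volume_ul = 2.0
template_conc_ng_per_ul = 10.0
buffer_volume_ul = 6.0
buffer_stock_x = 5.0
primer_final_um = 0.5
cycles = 25
step_seconds = (30, 30, 60)  # 94 °C, 58 °C and 72 °C steps

# Template mass per reaction (ng).
template_ng = template_volume_ul * template_conc_ng_per_ul

# Final buffer strength: 6 µl of 5x stock diluted into 30 µl.
buffer_final_x = buffer_volume_ul * buffer_stock_x / REACTION_VOLUME_UL

# Amount of each primer per reaction: µM x µl gives pmol.
primer_pmol = primer_final_um * REACTION_VOLUME_UL

# Total hold time across all cycles, in minutes.
hold_minutes = cycles * sum(step_seconds) / 60

print(template_ng, buffer_final_x, primer_pmol, hold_minutes)
```

Under these figures each reaction receives 20 ng of template in 1× buffer with 15 pmol of each primer, and the 25 cycles account for 50 min of hold time.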

### Distributional Survey of Novel O-Genotypes

Thirty-five O-serogrouped E. coli strains from our previous study (Iguchi et al., 2015b), whose O genotypes were not identified by ECOG-PCR, were used for screening novel O-genotypes by the PCR method designed in this study. The prevalence of the stx1, stx2 (Cebula et al., 1995) and eae (Oswald et al., 2000) genes in the tested STEC strains was determined by PCR.

### RESULTS

### Novel O-Genotypes

Six types of O-AGC were identified from OgUT STEC strains (**Figure 1**). Four O-AGCs (named OgN10, OgN31, OgN1, and OgN8 genotypes) were obtained from strains that were serologically classified into O1, O39, O40, and O141, respectively (**Table 1**). Two others (named OgN9 and OgN12 genotypes) were obtained from strains that could not be classified into any group either serologically or genetically (**Table 1**). The OgN9 strain did not react with any particular antiserum, and OgN12 showed identical agglutination titers with O34 and O140 antisera, which resulted in OUT classification. OgN8, OgN10, and OgN12 carried rmlBDAC for the synthesis of deoxythymidine diphosphate (dTDP)-L-rhamnose, and OgN31 carried rmlBA-vioA for dTDP-viosamine synthesis (**Figure 1**). OgN9 carried fnlA-qnlBC for UDP-N-acetyl-L-quinovosamine (UDP-L-QuiNAc) synthesis (**Figure 1**). All novel O-AGCs carried the wzx/wzy O-antigen processing genes (**Figure 1**). The wzx/wzy sequences from OgN O-AGCs were compared with those from 171 O-serogroup strains and 11 OX-group reference strains, indicating that their sequences were unique compared to those from known O-AGCs (less than 70% DNA sequence identity of closest pairs), except for wzx of OgN31 (**Figure 2**). The sequence of OgN31\_wzx was 98.7% identical in DNA sequence (99.0% amino acid sequence identity) to that of O116. Sequence comparison of O-AGCs revealed that the left region, including wzx and genes for the dTDP-glucose pathway, was conserved between OgN31 and Og116, whereas the right region, including the wzy and glycosyltransferase genes, was unique (less than 40% DNA sequence identity) in each O-AGC (**Figure 3A**). O-AGC gene sets were compared for four pairs of strains that agglutinated with the same O antisera but belonged to different genotypes (**Figure 3B**). Between OgN10 and Og1 (from O1 strain), and between OgN8 and Og141 (from O141 strain), rmlBDAC genes were highly conserved in both O-AGCs, while other genes including wzx and wzy were diversified (less than 70% DNA sequence identity). Between OgN31 and Og39 (from O39 strain), different types of sugar biosynthesis genes were located on each O-AGC (rmlBA-vioA on OgN31, and rmlBDAC-vioAB and manCB on Og39). There was no genetic similarity between OgN1 and Og40 (from O40 strain). From these results, we concluded that these six were novel O-AGCs.
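The identity thresholds used above can be illustrated with a minimal pairwise identity calculation. This is a sketch only: it assumes the two sequences are already aligned, skips gapped columns (the convention used by the authors' software is not stated), and uses short made-up sequences rather than real wzx or wzy genes.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over ungapped columns of two pre-aligned sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def is_unique(identity: float, cutoff: float = 70.0) -> bool:
    """Treat a closest-pair identity below the cutoff as a unique sequence."""
    return identity < cutoff


# Toy aligned sequences, for illustration only.
close_pair = percent_identity("ATGGCTAAGC", "ATGGCTAAGT")
distant_pair = percent_identity("ATGGCTAAGC", "TTGACGATCC")

print(close_pair, is_unique(close_pair))
print(distant_pair, is_unique(distant_pair))
```

With the 70% cutoff, the first toy pair would be treated as matching a known sequence and the second as unique.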

A BLAST search of the NCBI database revealed that the OgN10 O-AGC was similar to that of S. enterica O42 and that the OgN9 O-AGC was almost identical to that of S. boydii type 13 (**Figure 3C**). OgN10 and OgN9 strains agglutinated with S. enterica O42 and S. boydii type 13 antisera, respectively (data not shown).

### Primers for Identifying Novel O Genotypes

Six PCR primer pairs were designed for identifying the novel O-genotypes (**Table 2** and **Figure 4**). All primer pairs targeted unique sequences of wzy, except for OgN31, for which the primers targeted a glycosyltransferase gene. Each PCR was evaluated by using all 185 O-serogroup reference strains from O1 to O188 and the six novel O-genotype strains (listed in **Table 1**). PCR products of the expected sizes on the agarose gel were obtained only with the corresponding strains, and no extra products were observed in the size range between 100 and 1,500 bp (data not shown).
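The logic of such a specificity check, where a product of the expected size appears only for the corresponding template, can be mimicked in silico. The sketch below is a simplified illustration and not the authors' procedure: it requires exact primer matches on a single strand orientation, and the primer and template sequences are invented for the example rather than taken from Table 2.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_size(template: str, forward: str, reverse: str):
    """Return the predicted product length in bp, or None if no product."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the opposite strand, so its binding
    # site appears on this strand as the primer's reverse complement.
    site = reverse_complement(reverse)
    end = template.find(site, start + len(forward))
    if end == -1:
        return None
    return end + len(site) - start


# Hypothetical primers and templates, for illustration only.
forward_primer = "ATGCCGTA"
reverse_primer = "ATGGATCC"
target = "CC" + "ATGCCGTA" + "TTTTGGGGCCCCAAAA" + "GGATCCAT" + "GG"
off_target = "CCGGTTAACCGGTTAACCGG"

target_product = amplicon_size(target, forward_primer, reverse_primer)
off_target_product = amplicon_size(off_target, forward_primer, reverse_primer)
print(target_product, off_target_product)
```

A practical tool would also need to tolerate mismatches and search both strands; those refinements are omitted here to keep the example short.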

### Distribution of Novel O-Genotypes

Among 35 O-serogrouped E. coli strains whose O genotypes were not identified by ECOG-PCR, five O141, three O1, three O39, one O40, and one O140 strain were classified by using the novel O-genotype PCR into OgN8, OgN10, OgN31, OgN1, and OgN12, respectively (**Table 3**). All 13 strains classified into five novel O-genotypes were eae-negative STEC, and OgN8, OgN10, and OgN31 had been isolated from patients with bloody diarrhea. OgN8 and OgN12 strains also cross-reacted with O41 and O34 antisera, respectively.

## DISCUSSION

In this study, six novel O genotypes were revealed from STEC strains isolated from human patients, and the prevalence of these O genotype strains was confirmed in STECs. The OgN10 strains were serologically classified into the O1 serogroup. O1-serogroup strains are often seen among extra-intestinal pathogenic E. coli from patients with urinary tract infections (Abe et al., 2008; Mora et al., 2009) and septicemic disease (Mora et al., 2009), and among avian pathogenic E. coli (Mora et al., 2009; Johnson et al., 2012). Our previous study showed that an O1 strain isolated from patient blood was classified into Og1 (Iguchi et al., 2015b), and sequence comparisons showed that both an APEC O1 strain from avian colibacillosis (Johnson et al., 2007) and the G1632 strain from a patient with a urinary tract infection (Li et al., 2010) carried the Og1-type O-AGC, whereas the three STEC O1 strains used in this study were all classified into OgN10. We also confirmed that five STEC O1 strains from cattle used in a previous study, described as O1B type (Mekata et al., 2014), were classified into OgN10 (data not shown). These observations suggest that E. coli O1 strains can generally be subtyped into two genotypes, Og1 and OgN10, which are linked to extra-intestinal/avian pathogenic E. coli and STEC, respectively. Among the O1 serogroup, three types of antigen structures have so far been reported (Baumann et al., 1991; Gupta et al., 1992), and the β-linked side-chain N-acetyl-D-mannosamine residue was suggested to be a common O1-specific epitope (Gupta et al., 1992). O antigens that are synthesized from different O-AGCs but share partial structural similarity may be serologically recognized as the same O serogroup; this may also be the case for the OgN1, OgN8, OgN12, and OgN31 strains, which are related to O40, O141, O140, and O39, respectively. The subdivision within each O-serogroup based on the O-AGC DNA sequences may be useful for obtaining more reliable information for epidemiological studies of pathogenic E. coli. Another advantage of DNA-based typing is that serologically untypeable and ambiguous strains can be clearly classified. At the present time, the PCR-based method reported here is the only way to distinguish OgN9. Although STEC OgN groups have not emerged as a major public health issue, these groups are believed to be a possible cause of diarrhea and bloody diarrhea. To gain more information about trends in STEC OgN epidemiology, further studies of global OgN isolates are needed. The PCR method described in this study may help the surveillance and monitoring of the OgN groups. Additionally, the published sequences of the OgN O-AGCs may be used for other DNA-based methodologies, such as in silico typing using whole genome sequencing data (Joensen et al., 2015).

(less than 70%) are indicated in black.

carried different types of O-AGCs. Lower genes show the O-AGCs from O-serogroup strains. (C) Similar O-AGCs in strains of other genera. Amino acid sequence identities (%) between homologs are shown in the middle.

TABLE 2 | Polymerase chain reaction (PCR) primer sequences for identification of six novel O genotypes.

<sup>a</sup>AC, asymptomatic carrier; D, diarrhea; BD, bloody diarrhea. <sup>b</sup>Strains used for sequencing of the O-AGC are indicated by asterisks. All strains were isolated in Japan.

#### AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: AI, MO, and TH. Performed the experiments: AI, SI, KS, and HN. Analyzed the data: AI and YO. Contributed reagents/materials/analysis tools: SI, KS, MO, and HM. Wrote the paper: AI. Critical revision of the paper for important intellectual content: AI, SI, and TH.

#### FUNDING

This research was partially supported by the Research Program on Emerging and Re-emerging Infectious Diseases from the Japan Agency for Medical Research and Development (AMED; 15fk0108008h0001), and by the Grants-in-Aid for Scientific Research (C) from the Ministry of Education, Culture, Sports, Science and Technology (MEXT) to AI (25350180) and SI (15K08486).

#### ACKNOWLEDGMENT

We thank Atsuko Akiyoshi and Yuiko Kato for technical assistance.

#### REFERENCES

Escherichia coli by a rapid and cost-effective DNA microarray colorimetric method. Front. Cell Infect. Microbiol. 2:61. doi: 10.3389/fcimb.2012.00061


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

The reviewer BQ and handling Editor declared their shared affiliation and the handling Editor states that the process nevertheless met the standards of a fair and objective review.

Copyright © 2016 Iguchi, Iyoda, Seto, Nishii, Ohnishi, Mekata, Ogura and Hayashi. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Characterization of Shiga Toxin Subtypes and Virulence Genes in Porcine Shiga Toxin-Producing Escherichia coli

Gian Marco Baranzoni<sup>1</sup> , Pina M. Fratamico<sup>1</sup> \*, Jayanthi Gangiredla<sup>2</sup> , Isha Patel<sup>2</sup> , Lori K. Bagi<sup>1</sup> , Sabine Delannoy<sup>3</sup> , Patrick Fach<sup>3</sup> , Federica Boccia<sup>4</sup> , Aniello Anastasio<sup>4</sup> and Tiziana Pepe<sup>4</sup>

<sup>1</sup> Eastern Regional Research Center, United States Department of Agriculture – Agricultural Research Service, Wyndmoor, PA, USA, <sup>2</sup> Center of Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA, <sup>3</sup> Food Safety Laboratory, University of Paris-Est, Anses, Maisons-Alfort, France, <sup>4</sup> Department of Veterinary Medicine and Animal Production, University of Naples Federico II, Naples, Italy

#### Edited by:

Dustin Brisson, University of Pennsylvania, USA

#### Reviewed by:

Jorge Blanco, University of Santiago de Compostela, Spain Séamus Fanning, University College Dublin, Ireland

\*Correspondence:

Pina M. Fratamico pina.fratamico@ars.usda.gov

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 26 February 2016 Accepted: 07 April 2016 Published: 21 April 2016

#### Citation:

Baranzoni GM, Fratamico PM, Gangiredla J, Patel I, Bagi LK, Delannoy S, Fach P, Boccia F, Anastasio A and Pepe T (2016) Characterization of Shiga Toxin Subtypes and Virulence Genes in Porcine Shiga Toxin-Producing Escherichia coli. Front. Microbiol. 7:574. doi: 10.3389/fmicb.2016.00574

Similar to ruminants, swine have been shown to be a reservoir for Shiga toxin-producing Escherichia coli (STEC), and pork products have been linked with outbreaks associated with STEC O157 and O111:H-. STEC strains, isolated in a previous study from fecal samples of late-finisher pigs, belonged to a total of 56 serotypes, including O15:H27, O91:H14, and other serogroups previously associated with human illness. The isolates were tested by polymerase chain reaction (PCR) and a high-throughput real-time PCR system to determine the Shiga toxin (Stx) subtype and virulence-associated and putative virulence-associated genes they carried. Select STEC strains were further analyzed using a Minimal Signature E. coli Array Strip. As expected, stx2e (81%) was the most common Stx variant, followed by stx1a (14%), stx2d (3%), and stx1c (1%). The STEC serotypes that carried stx2d were O15:H27, O159:H16 and O159:H-. Similar to stx2a and stx2c, the stx2d variant is associated with development of hemorrhagic colitis and hemolytic uremic syndrome, and reports on the presence of this variant in STEC strains isolated from swine are lacking. Moreover, the genes encoding heat stable toxin (estIa) and enteroaggregative E. coli heat stable enterotoxin-1 (astA) were found in 50 and 44% of isolates, respectively. The hemolysin genes, hlyA and ehxA, were each detected in 7% of the swine STEC strains. Although the eae gene was not found, other genes involved in host cell adhesion, including lpfAO113 and paa, were detected in more than 50% of swine STEC strains, and a number of strains also carried iha, lpfAO26, lpfAO157, fedA, orfA, and orfB. The present work provides new insights on the distribution of virulence factors among swine STEC strains and shows that swine may carry Stx1a-, Stx2e-, or Stx2d-producing E. coli with virulence gene profiles associated with human infections.

Keywords: Escherichia coli, STEC, swine, Shiga toxins variants, virulence genes

## INTRODUCTION


Shiga Toxin-producing Escherichia coli (STEC) are food-borne pathogens responsible for outbreaks and serious illness including hemorrhagic colitis (HC) and hemolytic uremic syndrome (HUS). STEC O157:H7 is the serotype that has most often been associated with outbreaks and severe forms of diarrhea; however, recently a number of non-O157 STEC serogroups that cause similar illnesses have emerged (Gould et al., 2013). Cattle and other ruminants are important reservoirs of STEC; infection is asymptomatic, and they can carry the pathogens for long periods of time. Similarly, healthy swine may shed STEC, as demonstrated by several studies in which STEC were detected and isolated from swine fecal samples (Tseng et al., 2014b). Many of the investigations focused on serotype O157:H7; however, some studies also tested for non-O157 STEC serogroups and identified serogroups previously associated with human cases of illness (Fratamico et al., 2004; Kaufmann et al., 2006; Tseng et al., 2014b). The possibility that swine can transmit pathogenic STEC to humans is supported by a few outbreaks linked to the consumption of pork products contaminated with STEC O157:H7, O157:NM, and O111:H- (Tseng et al., 2014b).

Shiga toxins (Stx) are divided into two major antigenic forms: Stx1 and Stx2. Variants of Stx1 and Stx2 are grouped into three (Stx1a, Stx1c, Stx1d) and seven (Stx2a, Stx2b, Stx2c, Stx2d, Stx2e, Stx2f, and Stx2g) subtypes, respectively (Scheutz et al., 2012). Although Stx1a has been linked to human illness, STEC that produce subtypes Stx2a, Stx2c, and Stx2d are more often associated with the development of HC and HUS (Friedrich et al., 2002; Melton-Celsa, 2014). In vitro studies in two different cell lines showed that Stx2a and Stx2d were more potent than Stx2b and Stx2c. These results were confirmed by experiments in mice showing a significantly higher potency of Stx2a and Stx2d than Stx1, Stx2b, and Stx2c (Fuller et al., 2011). Stx variants are not homogeneously distributed among the STEC population, and certain variants are frequently detected in association with different animals (Martin and Beutin, 2011; Hofer et al., 2012; Fuente et al., 2015). Swine STEC strains commonly produce Stx2e (Fratamico et al., 2004; Meng et al., 2014; Tseng et al., 2015), which may cause edema disease in weaned pigs, often leading to ataxia and death. Stx2e-producing Escherichia coli do not represent a particular threat for humans (Friedrich et al., 2002; Tseng et al., 2014b). Nevertheless, STEC carrying the stx2e gene have been isolated from human cases with mild diarrhea (Muniesa et al., 2000; Friedrich et al., 2002; Beutin et al., 2004; Sonntag et al., 2005) and from two patients with HUS (Thomas et al., 1994; Fasel et al., 2014). The severe outcome of the first HUS case was probably due to a co-infection with another STEC strain (Thomas et al., 1994), while the second patient with HUS was described as having a very weak immune system (Fasel et al., 2014). Besides Stx2e, there is a lack of information on the presence of other Stx subtypes in STEC strains isolated from swine.

The production of Stx is necessary to provoke HUS; however, other virulence factors are also important in causing illness. These include genes involved in cell adhesion, proteases, and toxins, as well as other putative virulence factors. The presence of specific combinations of virulence factors may determine the risk of developing severe symptoms. The eae gene, found on the locus of enterocyte effacement (LEE), encodes intimin, which is an adhesin involved in gut colonization. LEE-positive STEC are expected to provoke HUS or HC more frequently than LEE-negative STEC (Ethelberg et al., 2004; Toma et al., 2004; Luna-Gierke et al., 2014). Nevertheless, cases of HUS provoked by LEE-negative STEC have been reported (Karmali et al., 1985; Paton et al., 1999; Bielaszewska et al., 2009), including a large outbreak in 2011 in Europe caused by an enteroaggregative E. coli that had acquired the stx2a gene and possessed a combination of virulence genes that increased its virulence (Boisen et al., 2015). This suggests that LEE is not essential in the development of severe symptoms, and other genes involved in adherence may also be important. Many adherence gene candidates, including eibG, lpfA, saa, and sab, have been identified in STEC (Croxen et al., 2013). Nevertheless, mechanisms for attachment of LEE-negative STEC to the intestinal epithelium have not been studied as extensively as attachment of LEE-positive STEC.

In 2000, one objective of the U.S. Department of Agriculture's Animal and Plant Health Inspection Service National Animal Health Monitoring System (NAHMS) Swine 2000 study was to determine the prevalence of STEC in swine. Fecal samples were collected from states with the highest production of swine in the U.S. (U.S. Department of Agriculture, 2001). As a result of this work, 219 STEC isolates were recovered and characterized (Fratamico et al., 2004, 2008). Since this work was conducted, knowledge of the importance of non-O157 STEC in human illness has increased, and there is a need to develop a model for molecular risk assessment associated with STEC. The virulence gene combinations that distinguish highly pathogenic E. coli from less virulent strains remain unclear, particularly for LEE-negative STEC (Beutin and Fach, 2014). Additionally, new virulence-associated and putative virulence-associated factors are being identified (Coombes et al., 2008; Brandt et al., 2011; Bugarel et al., 2011). The aim of the present study was to characterize STEC recovered from swine and belonging to a variety of serotypes, determining their Stx subtype and virulence gene profiles in order to understand their virulence potential.

## MATERIALS AND METHODS

### Bacterial Strains

Swine STEC strains were isolated and serotyped during the NAHMS swine 2000 study (NAHMS 2000) as described by Fratamico et al. (2004). Briefly, fresh swine feces were recovered from the pen floor of swine operations from the main pork-producing states in the U.S. A total of 687 swine fecal samples were enriched using tryptic soy broth (TSB) and screened for the presence of stx1 and stx2 by polymerase chain reaction (PCR). Positive samples were plated onto Luria-Bertani agar, and stx1- and stx2-positive colonies were detected following DNA hybridization and confirmed by PCR. Two hundred and nineteen STEC strains were serotyped and frozen in TSB with 20% glycerol. From this collection, 181 STEC strains were used in this study and maintained on tryptic soy agar plates or TSB as working stock cultures.

Besides the NAHMS swine isolates, three STEC O91 strains from our collection were also used for comparison. STEC O91:H14 (strains 2.4111 and 2.4114) were isolated from ground beef while STEC O91:H21 (strain B2F1) was isolated from a case of HUS (Ito et al., 1990).

### Identification of Shiga-toxin Subtypes

DNA extraction and PCR assays to identify stx subtypes and stx partial sequences were performed according to Scheutz et al. (2012), with slight modifications, using a ProFlex PCR system (Thermo Fisher, Waltham, MA, USA). TaqMan Environmental Master Mix 2.0 (Thermo Fisher) was used, and the annealing temperature was raised to 65 °C when cross-reaction was observed, as suggested by the authors (Scheutz et al., 2012). One microliter of amplified DNA was analyzed by electrophoresis on a 1.5% UltraPure Agarose (Invitrogen, Carlsbad, CA, USA) gel with 0.5X GelRed (Phenix Research Products, Candler, NC, USA) in 1X Tris-acetate-EDTA buffer at 100 V for 1 h, and visualized using an AlphaImager gel documentation system (Alpha Innotech, San Leandro, CA, USA).

Polymerase chain reaction products for sequencing were cleaned with Agencourt AMPure XP (Beckman Coulter, Brea, CA, USA), and 1.2 µl were amplified in a reaction consisting of 7 µl of 2.5X buffer, 1 µl of 3.2 µM primer stx2-F4 or stx2-R1 (Scheutz et al., 2012), 1 µl of Big Dye Terminator (Applied Biosystems), and 10 µl of nuclease-free water. Thermocycling conditions consisted of 30 cycles of 95 °C for 10 s, 55 °C for 5 s, and 60 °C for 4 min. The sequencing reaction products were then purified and sequenced using Agencourt CleanSEQ (Beckman Coulter) and a 3730 DNA Analyzer (Applied Biosystems), respectively. The sequences were manually curated using Sequencher v5.2.3 (Gene Code Corporation, Ann Arbor, MI, USA), run in VirulenceFinder 1.5 (Joensen et al., 2014), and compared by BLAST against the NCBI database<sup>1</sup>. The nucleotide sequences were deposited in the GenBank nucleotide sequence database under the following accession numbers: strain 306, KU682619; strain 308, KU682620; strain 326, KU682621; strain 341, KU682622; strain 360, KU682623; and strain 500, KU682624.
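As a quick check on the cycle-sequencing setup described above, the following Python sketch adds up the stated component volumes and the programmed hold times. The volumes and cycling steps are taken from the text; the variable names are illustrative, and ramp times between temperatures are ignored, so the real run time on any instrument would be longer.

```python
# Sketch: tally the cycle-sequencing reaction described in the text.
# Volumes are in microliters; ramp times between steps are ignored (assumption).

reaction_ul = {
    "cleaned PCR product": 1.2,
    "2.5X buffer": 7.0,
    "primer (3.2 uM stock)": 1.0,
    "Big Dye Terminator": 1.0,
    "nuclease-free water": 10.0,
}

# One cycle as (temperature in degrees C, hold time in seconds).
cycle_steps = [(95, 10), (55, 5), (60, 240)]
n_cycles = 30

# Total reaction volume, rounded to the precision of the stated volumes.
total_volume_ul = round(sum(reaction_ul.values()), 1)

# Final primer concentration after dilution of the 3.2 uM stock.
primer_final_um = 3.2 * reaction_ul["primer (3.2 uM stock)"] / total_volume_ul

# Programmed hold time summed over all cycles.
hold_seconds = n_cycles * sum(seconds for _, seconds in cycle_steps)

print(f"total volume: {total_volume_ul} uL")
print(f"final primer concentration: {primer_final_um:.2f} uM")
print(f"programmed hold time: {hold_seconds / 60:.1f} min")
```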

### High-throughput Real-time PCR Assay and Testing for Hemolysis

DNA was extracted from the swine isolates using the PrepMan Ultra Sample Preparation Reagent (Thermo Fisher) according to the manufacturer's instructions. The high-throughput real-time PCR (hrPCR) assay was carried out using the BioMark real-time PCR system (Fluidigm, San Francisco, CA, USA), targeting 67 virulence-associated and putative virulence-associated genes, 14 O-group-associated genes (O26, O45, O55, O91, O103, O104, O111, O113, O118, O121, O128, O145, O146, and O157) and 11 H-group-associated genes (H2, H4, H7, H8, H11, H16, H19, H21, H25, H28, and H32). Primers were designed in several studies (Perelle et al., 2003; Fratamico et al., 2008; Bugarel et al., 2010, 2011; Delannoy et al., 2013) and summarized by Tseng et al. (2014a). Reagents for DNA amplification and thermal cycling conditions were previously reported (Tseng et al., 2014a). Swine STEC strains positive for the ehxA and hlyA genes were tested for hemolysis by plating onto SHIBAM agar (Hardy Diagnostic, Santa Maria, CA, USA).

### FDA Minimal Signature E. coli Array

Swine Stx2d-producing E. coli and non-Stx2e STEC belonging to a serotype associated with human disease were further analyzed using the Minimal Signature E. coli Array Strip (FDA-ECID; Affymetrix, Santa Clara, CA, USA). Genomic DNA was isolated and concentrated using the DNeasy Tissue Kit (QIAgen Inc., Valencia, CA, USA) and SC100 Speedvac Concentrator (Savant Instruments, Inc. Holbrook, NY, USA), respectively. Two micrograms of DNA were tested using the FDA-ECID array as described in detail by Lacher et al. (2014). Robust multiarray average summarized probe intensity data were analyzed using R-Bioconductor software v3.1.2 and the affy package with parameters defined by Lacher et al. (2014). Hierarchical clustering was performed using the overview function of the MADE4 package, which applies average linkage cluster analysis with a correlation distance metric (Culhane et al., 2005; Culhane and Thioulouse, 2006).
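To make the clustering step concrete, the sketch below implements average linkage clustering on a correlation distance in plain Python. It is not the MADE4 code used in the study: the distance is assumed here to be 1 minus Pearson's r, the function names are illustrative, and the four intensity profiles are made-up toy values rather than array data.

```python
from itertools import combinations
from math import sqrt


def correlation_distance(x, y):
    """Return 1 - Pearson's r for two equal-length intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, distance)."""
    # Distances between individual profiles are computed once.
    pair_dist = {
        frozenset((i, j)): correlation_distance(profiles[i], profiles[j])
        for i, j in combinations(profiles, 2)
    }
    clusters = [(name,) for name in profiles]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean distance over all cross-cluster member pairs.
        def linkage(pair):
            a, b = pair
            total = sum(pair_dist[frozenset((i, j))] for i in a for j in b)
            return total / (len(a) * len(b))

        # Merge the two closest clusters and record the merge height.
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical probe intensities for four made-up strains (not data from the study).
toy_profiles = {
    "s1": [1.0, 2.0, 3.0, 4.0],
    "s2": [2.0, 4.1, 6.0, 8.2],
    "s3": [4.0, 3.0, 2.0, 1.0],
    "s4": [8.0, 6.1, 4.0, 2.2],
}

merges = average_linkage(toy_profiles)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

Because the distance depends on the shape of a profile rather than its magnitude, s1 and s2 (and likewise s3 and s4) pair up first even though their absolute intensities differ by roughly a factor of two.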

## RESULTS

### Swine STEC Serotypes

All of the strains had been previously serotyped at the E. coli Reference Center at the Pennsylvania State University (University Park, PA, USA). In addition, many O-group- and H-group-specific targets were included in the hrPCR assay. Several discrepancies were found, and serotypes that did not match with the traditional serotyping are indicated in bold in **Figure 1**. Selected swine STEC strains were also analyzed using the FDA-ECID microarray, and the resulting serotypes were in agreement with the hrPCR. Moreover, the grouping within the phylogenetic tree was consistent with the serotypes proposed by the FDA-ECID microarray (**Table 1**).

### Shiga-toxin Subtype Characterization

The swine STEC strains were analyzed by singleplex and multiplex PCR assays to determine their Shiga-toxin subtype. Stx-encoding genes were carried by 177/181 (97.8%) of the tested isolates. Four strains previously identified as STEC were PCR-negative for all subtypes and likely lost their Stx genes through loss of Stx-encoding phages, as has been shown by other investigators (Joris et al., 2011). The stx1 or stx2 genes alone were carried by 25 and 151 strains, respectively. Stx subtype analysis revealed that the 25 stx1-positive strains carried the stx1a subtype. Among the 151 stx2-positive strains, 146 and 5 isolates carried stx2e and stx2d, respectively. STEC strain 308 was the only isolate that carried both stx1 and stx2, subtypes stx1c and stx2d, respectively. Stx subtypes divided by STEC serotype are reported in **Figure 1**. Strains carrying Stx subtypes stx1d, stx2a, stx2b, stx2c, stx2f, and stx2g were not identified. Selected swine strains were analyzed using the FDA-ECID microarray and results of Stx subtypes are reported in **Table 1**. Nucleotide sequencing of stx2 was carried out for the STEC strains that were stx2d positive by PCR. The STEC strain 308 stx2 sequence showed 100% identity to a portion of stx2d subunit B (AF479829) using VirulenceFinder. When compared by BLAST against the NCBI database, the STEC strain 308 stx2 partial sequence matched EF441621 (stx2d) with 100% identity and no gaps. The stx2 partial sequences of STEC strains 306, 326, 341, 360, and 500 were identical. VirulenceFinder results showed 100% identity with a portion of stx2d, subunit B (DQ059012). The most similar sequence present in the NCBI database was KC339670 (stx2e) with an identity of 99%.

<sup>1</sup>http://blast.ncbi.nlm.nih.gov/Blast.cgi
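The subtype counts reported in this section can be cross-checked with a short tally. The Python sketch below uses only counts stated in the text (25 stx1a-only, 146 stx2e-only, 5 stx2d-only, and strain 308 with both stx1c and stx2d, out of 181 isolates); the dictionary layout and helper names are illustrative assumptions.

```python
TOTAL_ISOLATES = 181

# Isolate counts per stx profile as stated in the text;
# strain 308 is the single isolate carrying both stx1c and stx2d.
profiles = {
    ("stx1a",): 25,
    ("stx2e",): 146,
    ("stx2d",): 5,
    ("stx1c", "stx2d"): 1,
}

stx_positive = sum(profiles.values())
stx_negative = TOTAL_ISOLATES - stx_positive


def carriers(subtype):
    """Number of isolates whose profile includes the given subtype."""
    return sum(n for combo, n in profiles.items() if subtype in combo)


def pct(count, total=TOTAL_ISOLATES):
    """Percentage of the collection, rounded to one decimal place."""
    return round(100 * count / total, 1)


print("stx-positive:", stx_positive, f"({pct(stx_positive)}%)")
print("stx-negative:", stx_negative)
for subtype in ("stx2e", "stx1a", "stx2d", "stx1c"):
    print(subtype, carriers(subtype), f"({pct(carriers(subtype))}%)")
```

Counting by profile rather than by gene keeps the doubly positive strain from being tallied twice in the stx-positive total while still letting it contribute to both the stx1c and stx2d carrier counts.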

### Distribution of Virulence-associated Genes among the Swine STEC Collection

Genomic DNA extracted from the swine STEC strains was analyzed using hrPCR. Genes encoding the enteroaggregative E. coli heat-stable enterotoxin 1 (astA) and the heat-stable enterotoxin (estIa) were detected in 79 (44%) and 91 (50%) of the isolates, respectively. Toxins and cytotoxic factors encoded by cdtI, cdtIII, elt, ent/espL2, cnf2, and subAB were not detected. Regarding cytolysins, the enterohemolysin (ehxA) and α-hemolysin (hlyA) encoding genes were each found in 13 (7%) of the swine STEC strains, but never in the same strain. These strains were also hemolytic when plated onto SHIBAM plates.

None of the isolated swine STEC strains carried the intimin-encoding gene, eae, effector genes involved in the type III secretion system (espK, espM1, espM2, espN, espO1-1, espV, espX7, nleA, nleB, nleE, nleF, nleG5, nleG6-2, and nleH1-2) or the type II secretion system effector (etpD). Other genes encoding factors involved in adhesion to and colonization of the host intestine were also investigated. Among these, the most prevalent were lpfAO113, paa, iha, and lpfAO26, present in 116 (64%), 98 (54%), 41 (23%), and 33 (18%) isolates, respectively. Genes orfA, orfB, and fedA were detected in fewer than 8% of the isolates. While bfp, efa1, fasA, fimF41a, saa, and toxB were not found, one swine STEC strain carried lpfAO157. Three autotransporter protein genes, ehaA, espP, and sab, were found in 60 (33%), 13 (7%), and 15 (8%) isolates, respectively.

Other gene targets were also included in the high-throughput real-time PCR assay. Positive results were obtained for the ecs1763, terE, katP, and ureD genes in 56 (31%), 31 (17%), 27 (15%), and 23 (13%) of isolates, respectively, while fewer than 8% were positive for pagC, eibG, irp2, fyuA, ecf1, ecf2, ecf3, ecf4, and Z2099. None of the swine STEC strains carried ecs1822, epeA, sfp, stcE, Z2096, or Z2098.

## DISCUSSION

It is well known that swine shed a variety of STEC serogroups, which may be carried along the food production chain. Most of the STEC isolated from these animals have adapted to the swine host and seem to have low potential to infect humans. Nevertheless, outbreaks associated with pork products have occurred (Meng et al., 2014; Tseng et al., 2014b). The sampling area of the NAHMS swine 2000 study was large, covering all the main pork-producing states (Fratamico et al., 2004). A subset of 181 STEC strains was analyzed, and their pathogenic potential was assessed by detection of virulence and putative virulence factors.

The stx subtypes carried by the swine STEC were identified, and the majority of the isolates carried stx2e (81%), which was consistent with the data reported by Fratamico et al. (2004). The second most prevalent subtype was stx1a (14%), followed by stx2d (3%), and stx1c (1%). Stx2d is a potent toxin, and infection with strains carrying this subtype can lead to severe symptoms such as HC and HUS in humans (Melton-Celsa, 2014). Besides the Stx genes, the thermostable enterotoxin genes astA and/or estIa were found in ∼71% of the isolates. Thermostable enterotoxins are usually carried by enterotoxigenic E. coli, which are the major pathogens responsible for traveler's diarrhea. Twenty-two percent of the swine STEC strains were positive for both genes. The exotoxins HlyA (α-hemolysin) and EhxA (enterohemolysin) produce pores in the cytoplasmic membranes of eukaryotic cells causing their death. Their role in STEC pathogenesis is still not clear; HlyA may increase the virulence of extraintestinal pathogenic E. coli and, in the case of EhxA, a correlation between ehxA-positive STEC and development of severe symptoms in humans has been observed (Karch and Bielaszewska, 2001; Mainil, 2013). Thirteen isolates carried the hlyA gene. Nine of them belonged to serotypes O121:H- or O121:H10, presenting a virulence gene profile typical of strains associated with edema disease in swine due to the presence of stx2e, hlyA and fedA (Tseng et al., 2014b). The ehxA gene is commonly found in STEC. From 40 to 77% of strains collected from patients, food, and cattle carry this gene (Karch and Bielaszewska, 2001; Slanec et al., 2009; Bosilevac and Koohmaraie, 2011; Feng, 2014). Swine isolates appear to carry ehxA less frequently (Meng et al., 2014; Tseng et al., 2014a), and this observation is in agreement with our study where only 7% of the isolates were ehxA positive.

A, autoagglutination; O-, O non-typeable; H-, H non-typeable; neg, negative results; NA, not available.

<sup>a</sup>The FDA-ECID does not include probes for O163 markers.

<sup>b</sup>Swine STEC isolates were analyzed together with two strains of E. coli O91:H14 and E. coli O91:H21 for comparison.
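The share of isolates carrying astA and/or estIa follows from the individual gene counts by inclusion-exclusion. The Python sketch below uses the reported counts (79 astA-positive and 91 estIa-positive isolates out of 181). The exact number of doubly positive isolates is given in the text only as a rounded percentage, so the loop simply lists a few candidate overlaps near 22% (the chosen range is an assumption) to show which are compatible with a combined prevalence of about 71%.

```python
TOTAL = 181
ASTA, ESTIA = 79, 91  # isolate counts reported in the Results


def union_count(both):
    """Isolates carrying astA and/or estIa, by inclusion-exclusion."""
    return ASTA + ESTIA - both


# Candidate overlap counts around the reported 22% (range chosen for illustration).
for both in range(38, 43):
    either = union_count(both)
    print(
        f"both={both} ({100 * both / TOTAL:.1f}%)  "
        f"either={either} ({100 * either / TOTAL:.1f}%)"
    )
```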

All of the swine STEC strains were LEE-negative. Although the adhesion mechanisms of LEE-negative STEC are not well characterized, several factors have been described to play an important role in adhesion to the intestinal epithelium. The long polar fimbriae gene lpfAO113 was identified in STEC O113:H21 (Doughty et al., 2002). These investigators demonstrated that the removal of lpfAO113 reduces the bacterial capacity to adhere to epithelial cells. Similar lpfA genes were found in E. coli O157 and O26 (Hayashi et al., 2001; Toma et al., 2004). Another bacterial adherence-conferring gene is iha, encoding the iron-regulated gene A homolog adhesin. Similarly to lpfAO113, the iha gene is commonly found in STEC strains associated with human cases of HUS (Newton et al., 2009; Galli et al., 2010). Nevertheless, non-pathogenic E. coli can also carry lpfAO113 and iha, suggesting that the presence of these genes is insufficient to establish an infection (Toma et al., 2004). Over 80% of the strains analyzed in this study carried lpfAO26, lpfAO113, or lpfAO157, while iha was found in almost one quarter of swine isolates. iha-positive STEC were also described in a longitudinal study of two Midwestern U.S. pork production sites (Tseng et al., 2014a, 2015). In contrast, none of the swine STEC strains collected in another study in China carried iha (Meng et al., 2014). The second most prevalent adhesion factor found in this dataset was the porcine attaching and effacing-associated adhesin, paa, which is associated with neonatal post-weaning diarrhea in pigs (An et al., 1999). In addition, a few strains carried orfA and orfB, which encode adhesins involved in diffuse adherence (Charbonneau et al., 2006).

Autotransporter proteins have a peculiar structure that allows them to move through the membrane system and execute their function outside the bacterial cell. The genes ehaA and sab were discovered in O157:H7 strain EDL933 and LEE-negative O113:H21, respectively. They encode two different autotransporter proteins that contribute to adhesion and biofilm formation (Wells et al., 2008; Herold et al., 2009). Together with LEE genes, iha and ehaA are highly expressed in the intestines of pigs presenting attaching and effacing lesions (Liu et al., 2015). While the ehaA gene was present in over 30% of the swine isolates, sab was carried by 13 STEC strains only, all of which belonged to O-group O91.

As stated above, 13 to 17% of the isolates were positive for katP, ureD, and terE. The genes katP and ureD encode a catalase/peroxidase and a urease transporter, respectively. Their role in E. coli pathogenesis is unclear; however, they appear to be prevalent in diarrheagenic E. coli (Dorothea et al., 2006; Delannoy et al., 2013). The gene terE is a component of the ter cluster, which confers tellurite resistance (Orth et al., 2007). The ecs1763 and ecs1822 genes have been proposed to be novel markers for enterohemorrhagic E. coli. Their function is unknown, and they were shown to be shared by a clonal group of enterohemorrhagic E. coli that includes O26, O111, and O118 (Abu-Ali et al., 2009). Tseng et al. (2014a) observed that ecs1763 is frequently found in swine STEC, which was confirmed by the present study where 31% of the isolates carried ecs1763. ecs1822 was absent in all the tested strains.

Traditional serotyping of E. coli is time consuming, and cross-reactions among antisera often occur. Based on the hrPCR and FDA-ECID results, 71 strains present in this collection belonged to serotypes O8:H9, O8:H19, O8:H-, O15:H27, O20:H-, O91:H14, O101:H-, O121:H-, O145:H25, O159:H21, and O163:H19 that were previously isolated from human patients (Blanco et al., 1992; Beutin and Fach, 2014). Serotypes O8:H19, O15:H27, O145:H25, and O163:H19 have also been associated with cases of HUS (Prager et al., 2005; Bielaszewska et al., 2006; Galli et al., 2010). All of the strains belonging to the serotypes O8:H9, O8:H19, O8:H-, O20:H-, O101:H-, O121:H-, O145:H25, and O159:H21 analyzed in this study carried stx2e, which is a subtype that is generally not associated with STEC that cause serious human illness. Human infections with Stx2e-producing E. coli are generally asymptomatic or cause mild diarrhea (Tseng et al., 2014b). Sonntag et al. (2005) reported that human Stx2e-producing E. coli carry different virulence factors compared to swine Stx2e-producing E. coli associated with edema disease. They also detected fyuA and irp2 genes in five strains isolated from humans. These genes are included in the high-pathogenicity island (HPI), which is involved in the iron metabolism of Yersinia. Mouse models showed that the HPI increases E. coli virulence in extraintestinal infections (Schubert et al., 2002). The hrPCR results revealed that some swine STEC strains belonged to the same serotypes as human Stx2e-producing E. coli (O8:H19 and O8:H-) reported by Sonntag et al. (2005). STEC O8:H19 and STEC O8:H- also carried markers for the HPI. Moreover, their virulence gene profiles included adhesins (lpfAO26, lpfAO113, paa) and enterotoxins (astA and estIa), which suggests that they can potentially provoke mild diarrhea in humans. The HPI genes fyuA and irp2 were also found in Stx2e-producing E. coli belonging to serotypes O5:H4 and O8:H4 (**Table 1**).

Shiga toxin-producing Escherichia coli strain 308 was re-typed as O15:H27 using the FDA-ECID array and was found to have the same stx2d sequence as E. coli O15:H27 (strain 88-1509) in the STEC isolate database at Michigan State University<sup>2</sup>. E. coli strain 88-1509 was collected in 1988 from a human case of HC and HUS in Canada. Other strains belonging to serotype O15:H27 have been isolated from human and cattle feces, and from meat sources (Piérard et al., 1997; Woodward et al., 2002; Bosilevac et al., 2007; Galli et al., 2010). The LEE-negative swine STEC O15:H27 has a virulence gene profile consisting of stx1c, stx2d, ehaA, espP, fyuA, iha, irp2, lpfAO113, and Z2099. The relevance of some of these genes was mentioned above. E. coli-secreted protein P (EspP) is an autotransporter protein with serine protease activity, and is used by the bacteria to impair the complement response of the host (Orth et al., 2010). Recently, In et al. (2013) reported that EspP boosts macropinocytosis in the intestinal epithelium, increasing Stx uptake. The open reading frame Z2099 is highly prevalent in typical and emerging enterohemorrhagic E. coli, while it is significantly less prevalent in non-pathogenic E. coli (Delannoy et al., 2013).

Six of the swine STEC strains carried stx2d according to PCR and VirulenceFinder results, and they belonged to serotypes O159:H-, O159:H4, and OX10:H-. DebRoy et al. (2016) reported that serological cross-reactions between the O159 and OX10 O-groups often occur and that the nucleotide sequences of O159 and OX10 O-antigen gene clusters are almost identical. Based on the FDA-ECID analysis, the strains 306, 360, 341, and 500 were re-typed as O159:H16, while the strain 326 was re-typed as O159:H- (**Table 1**). STEC belonging to O-group O159 rarely infect humans (Brooks et al., 2005; Gould et al., 2013). STEC O159:H16 and O159:H- have been isolated only from swine samples, such as feces and carcasses (DesRosiers et al., 2001; Kaufmann et al., 2006; Meng et al., 2014). Stx subtype analysis of these strains often gives ambiguous results (Kaufmann et al., 2006; Meng et al., 2014). In this work, STEC O159:H16 and O159:H- were positive for stx2d when tested by PCR; however, they were positive for stx2e or stx2i using the FDA-ECID array. Note that probes of the FDA-ECID array corresponding to stx2i were designed using the Stx sequences AM904726 and FN252457 (Patel et al., 2016) that belong to the stx2e subtype according to Scheutz et al. (2012). The product obtained from partial sequencing of stx2 was 99% identical to the sequence KC339670 when compared by BLAST against the NCBI database. KC339670 is a complete stx2 sequence belonging to a STEC O159:H16 strain isolated from swine in China. After a neighbor-joining cluster analysis of the sequence, Meng et al. (2014) concluded that KC339670 represented a new variant of stx2e. Further investigations using cell lines and animal models are needed to understand the virulence potential of this Stx2 variant. Another STEC O159 was detected in this collection. It belonged to the H21 H-group and was positioned distantly from the clade of O159:H16 and O159:H- (**Table 1**). This strain was positive for stx2e only by PCR. E. coli belonging to serotype O159:H21 was isolated in 1983 during a small outbreak of diarrhea involving newborn children in Spain (Blanco et al., 1992), and no other infections associated with serotype O159:H21 have been reported.

<sup>2</sup>http://shigatox.net/new/database.html

Locus of enterocyte effacement-negative STEC belonging to O-group O91 are frequently associated with adult human infections with symptoms ranging from mild diarrhea to HC and HUS. The main serotypes are O91:H14 and O91:H21, and the latter is usually linked with development of severe symptoms (Bielaszewska et al., 2009). Human STEC O91:H14 and O91:H21 isolates carried mainly stx1 and stx2d, respectively (Prager et al., 2005; Bielaszewska et al., 2009; Galli et al., 2010). These STEC have been isolated from food samples of bovine, swine, and ovine origin, and from both domestic and wild animals (Martin and Beutin, 2011; Ju et al., 2012). From the NAHMS swine 2000 study, 15 strains belonging to serotypes O91:H12, O91:H14, O91:H44, and O91:H- were isolated from fresh fecal samples collected from four different states (Fratamico et al., 2004). Eight of these strains were re-typed as O91:H14, while STEC O91:H44 strains 448 and 477 did not belong to the O91 O-group based on FDA-ECID and hrPCR results (**Figure 1**; **Table 1**). STEC strain 319, which carried a virulence gene profile identical to that of other O91:H14 strains, was also re-typed as O91:H14 by FDA-ECID array (**Table 1**). According to the phylogenetic tree in **Table 1**, the clade of STEC O91:H14 strains is well separated from the STEC O91:H21 strain B2F1 isolated from a human case of HUS. Interestingly, two O91:H14 strains were more closely related to two STEC O91:H14 strains isolated from ground beef samples than to the other swine STEC O91:H14 strains. Although one strain was katP-negative, all 13 STEC O91:H14 strains presented a conserved virulence gene profile (ehaA, ehxA, eibG, espP, iha, katP, lpfAO26, lpfAO113, pagC, sab, and stx1a), which is very similar to profiles of strains from human clinical samples (Prager et al., 2005; Bielaszewska et al., 2009). Similar to iha, lpfAO26, and lpfAO113, the proteins encoded by the genes eibG and sab are involved in host gut colonization. The E. coli immunoglobulin-binding protein encoded by eibG binds human immunoglobulin G and immunoglobulin A, and contributes to epithelial host cell adhesion (Lu et al., 2006), and sab is a gene encoding an autotransporter protein involved in biofilm formation and found in a pathogenic LEE-negative STEC (Herold et al., 2009). Lastly, the pagC gene encodes an outer membrane protein present in different Enterobacteriaceae that contributes to serum resistance (Nishio et al., 2005).

Human infections caused by STEC O163:H19 are rare (Brooks et al., 2005; Gould et al., 2013). However, Stx2-producing E. coli O163:H19 provoked sporadic cases of HUS (Prager et al., 2005) and have been found associated with cattle and produce (Woodward et al., 2002; Galli et al., 2010; Feng, 2014). In this work, five strains of STEC O163:H- or O163:H41/H51 were re-typed as O163:H19. They all carried stx1a, similar to the Stx1-producing E. coli O163:H19 strain isolated from swine by DesRosiers et al. (2001).

STEC O20:H19 is associated with human cases of HUS (Galli et al., 2010), and one strain belonging to this serotype was isolated in the NAHMS study (Fratamico et al., 2004). However, this same strain was re-analyzed using the FDA-ECID array, and it was retyped as O152:H19, which is not known to be a human pathogen.

#### CONCLUSION

Using state-of-the-art DNA-based techniques, this study provides new insights on the distribution of virulence factors in a heterogeneous collection of STEC isolated from the major pork-producing states of the U.S. Stx2e-producing E. coli known to provoke mild diarrhea in humans carried different virulence factors than Stx2e-producing E. coli associated with edema disease in pigs; this finding suggests that Stx2e-producing E. coli that cause human illnesses may not have a swine origin (Sonntag et al., 2005). In our work, STEC strains carrying stx2e belonging to the same serotype and having similar virulence gene profiles as Stx2e-producing E. coli isolated from humans were identified. Additionally, the majority of Stx2e-producing E. coli carried thermostable enterotoxin genes usually found in enterotoxigenic E. coli.

This work suggests that STEC, including serotypes O15:H27 and O91:H14 that have been associated with human illness and are found in multiple hosts or environments, could also be carried by swine. Interestingly, a strain of O15:H27 found to carry stx2d and other virulence genes may have the potential to produce severe symptoms in humans. Moreover, STEC O91:H14 strains presented a virulence gene profile very similar to profiles found in human isolates.

### AUTHOR CONTRIBUTIONS

GMB and PF designed research; GMB, LKB, SD, PF, FB, AA, and TP performed research; GMB, JG, and IP analyzed data; GMB and PF wrote the paper.

#### ACKNOWLEDGMENTS


This research was supported in part by an appointment to the Agricultural Research Service (ARS) Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the USDA. ORISE is managed by ORAU under DOE contract number DE-AC05-06OR23100. The hrPCR development was partially financed by the French joint ministerial program of R&D against CBRNE risks. All opinions expressed in this manuscript are the authors' and do not necessarily reflect the policies and views of USDA, ARS, DOE, or ORAU/ORISE.

#### REFERENCES

autotransporter adhesin involved in diffuse adherence (AIDA-I). J. Bacteriol. 188, 8504–8512. doi: 10.1128/JB.00864-06




**Disclaimer**: Mention of trade names or commercial products is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Baranzoni, Fratamico, Gangiredla, Patel, Bagi, Delannoy, Fach, Boccia, Anastasio and Pepe. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

#### Edited by:

Chitrita DebRoy, The Pennsylvania State University, USA

#### Reviewed by:

Amit Kumar, Kansas State University, USA

T. G. Nagaraja, Kansas State University, USA

#### \*Correspondence:

Joseph M. Bosilevac, U. S. Department of Agriculture, Agricultural Research Service, Roman L. Hruska U. S. Meat Animal Research Center, State Spur 18D, Clay Center, NE 68933-0166, USA

mick.bosilevac@ars.usda.gov

†USDA is an equal opportunity provider and employer. Trade names are necessary to report factually on available data; however, the USDA neither guarantees nor warrants the standard of the product, and the use of the name by USDA implies no approval of the product to the exclusion of others that may also be suitable.

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 13 July 2015 Accepted: 15 September 2015 Published: 29 September 2015

#### Citation:

Luedtke BE and Bosilevac JM (2015) Comparison of methods for the enumeration of enterohemorrhagic Escherichia coli from veal hides and carcasses. Front. Microbiol. 6:1062. doi: 10.3389/fmicb.2015.01062

# Comparison of methods for the enumeration of enterohemorrhagic Escherichia coli from veal hides and carcasses

#### Brandon E. Luedtke † and Joseph M. Bosilevac\* †

U. S. Department of Agriculture, Agricultural Research Service, Roman L. Hruska U. S. Meat Animal Research Center, Clay Center, NE, USA

The increased association of enterohemorrhagic Escherichia coli (EHEC) with veal calves has led the United States Department of Agriculture Food Safety and Inspection Service to report results of veal meat contaminated with the Top 7 serogroups separately from beef cattle. However, detection methods that can also provide concentration for determining the prevalence and abundance of EHEC associated with veal are lacking. Here we compared the ability of qPCR and a molecular based most probable number assay (MPN) to detect and enumerate EHEC from veal hides at the abattoir and the resulting pre-intervention carcasses. In addition, digital PCR (dPCR) was used to analyze select samples. The qPCR assay was able to enumerate total EHEC in 32% of the hide samples with a range of approximately 34 to 91,412 CFUs/100 cm<sup>2</sup> (95% CI 4–113,460 CFUs/100 cm<sup>2</sup>). Using the MPN assay, total EHEC was enumerable in 48% of the hide samples and ranged from approximately 1 to greater than 17,022 CFUs/100 cm<sup>2</sup> (95% CI 0.4–72,000 CFUs/100 cm<sup>2</sup>). The carcass samples had lower amounts of EHEC with a range of approximately 4–275 CFUs/100 cm<sup>2</sup> (95% CI 3–953 CFUs/100 cm<sup>2</sup>) from 17% of samples with an enumerable amount of EHEC by qPCR. For the MPN assay, the carcass samples ranged from 0.1 to 1 CFUs/100 cm<sup>2</sup> (95% CI 0.02–4 CFUs/100 cm<sup>2</sup>) from 29% of the samples. The correlation coefficient between the qPCR and MPN enumeration methods indicated a moderate relation (R<sup>2</sup> = 0.39) for the hide samples while the carcass samples had no relation (R<sup>2</sup> = 0.002), which was likely due to most samples having an amount of total EHEC below the reliable limit of quantification for qPCR. Interestingly, after enrichment, 81% of the hide samples and 94% of the carcass samples had a detectable amount of total EHEC by qPCR. From our analysis, the MPN assay provided a higher percentage of enumerable hide and carcass samples; however, determining an appropriate dilution range and the limited throughput offer additional challenges.

Keywords: veal, total EHEC, multiplex qPCR, MPN, dPCR

### Introduction

Shiga toxin-producing Escherichia coli (STEC) are an increasing concern in relation to food safety. The United States Department of Agriculture (USDA) Food Safety and Inspection Service (FSIS) has identified the pathogenic strains of the serogroups O26, O45, O103, O111, O121, and O145 (Top 6) in addition to O157 as being adulterants in non-intact beef (Almanza, 2011). However, emerging STEC serogroups pose a threat to human health with emphasis on the STEC subgroup that comprises the enterohemorrhagic E. coli (EHEC). Approximately 20% of human illnesses caused by a non-O157 EHEC were attributed to a serogroup not identified by FSIS (Brooks et al., 2005; Gould, 2009). The EHEC serogroups mostly cause the severest form of disease and can result in hemorrhagic colitis and/or hemolytic uremic syndrome primarily in children under 10 and the elderly (Goldwater and Bettelheim, 2012). In the environment, cattle act as the primary reservoir for EHEC and facilitate the transmission of the bacteria through the release of contaminated feces. Moreover, during the harvesting of cattle, EHEC can contaminate the carcass via the transfer of feces from the animal hide (Elder et al., 2000; Monaghan et al., 2012).

Recently, FSIS has placed interest in the increased association of the adulterant EHEC with veal products compared to beef (United States Department of Agriculture and Food Safety and Inspection Service, 2012) and has implicated a hide to carcass transmission as the primary mode of contamination (United States Department of Agriculture, and Food Safety and Inspection Service, 2013). Indeed, among weaned beef calves entering the feedlot environment the fecal prevalence of O157:H7 was found to be at 5% while 54% of tested hides were positive for O157:H7 (Arthur et al., 2009). However, a study investigating total STEC prevalence found 100% of 62 white veal calves were positive by ELISA for Shiga toxin 1 and/or Shiga toxin 2 (stx1/2) (Cristancho et al., 2008). Although isolation and genetic characterization of the stx1/2 strains was not conducted for these samples, it does suggest that veal calves have the potential to harbor EHEC amongst the total STEC and could lead to hide contamination prevalence greater than that of O157:H7. The limited studies involving non-O157:H7 have identified EHEC of the serogroups O26, O103, O111, O118, and O145 as being associated with calves (Wieler et al., 1998; Pearce et al., 2004; Wang et al., 2014). This is likely not an exhaustive list of EHEC serogroups and additional studies are required to elucidate other EHEC serogroups found in veal calves. In addition, the current method for enumerating EHEC from veal calf samples uses direct plate counts on selective media and is limited to O157:H7, hence molecular assays to detect and enumerate EHEC associated with veal calves are required (Wang et al., 2014).

Culture based enumeration strategies, such as most probable number (MPN) or direct plate counts, can be a subjective and time-consuming process. Moreover, these assays could be impacted by EHEC that are viable but not culturable. Although the contribution of viable but not culturable EHEC to human disease is not fully known (Ramamurthy et al., 2014), molecular based assays would detect and include the unculturable EHEC in the enumeration. Real time PCR (qPCR) based enumeration methods are commonly used for samples recovered from cattle. These assays primarily target a combination of the genes stx1/2, intimin (eae), uidA, rfbE, and fliC alleles for the detection and/or enumeration of O157:H7 and select Top 6 serogroups (Jacob et al., 2012; Wasilenko et al., 2012). However, these genes can be found separately in cells that are non-EHEC. Recently we used the E. coli attaching and effacing gene-positive conserved fragment 1 (ecf1), which is solely associated with EHEC (Boerlin et al., 1999; Becker and Groschel, 2014), as a gene target for the detection and enumeration of total EHEC directly from cattle feces using qPCR and reported a reliable limit of quantification of 1.25 × 10<sup>3</sup> CFUs/mL (Luedtke et al., 2014). To provide a lower limit of detection, the ecf1 target could be utilized in a molecular based modified MPN assay (Russo et al., 2014). In addition, the third generation of PCR, termed digital PCR (dPCR), may offer an advantage over qPCR for the enumeration of total EHEC. In dPCR, Poisson based statistics are used to quantify absolute amounts of target DNA from tens of thousands of sub-nanoliter sized endpoint PCR reactions per sample. This reaction partitioning limits the interference of PCR inhibitors, allows for the detection of rare targets, and is not prone to amplification variability like replicate qPCR Cq values in the 35–40 range (Huggett et al., 2013; Marx, 2014).

Here we are the first to utilize three distinct molecular based assays detecting ecf1 to enumerate total EHEC associated with veal hides at the abattoir and the resulting carcass. For the enumeration of total EHEC, identical pre-enrichment samples were run in parallel using our previously mentioned qPCR assay, a molecular based MPN assay, and by dPCR. Each assay had comparative strengths and weaknesses. The MPN assay provided the highest detection and enumeration rate while the qPCR assay allowed for the greatest dynamic range. Moreover, this is the first reported use of dPCR for the direct enumeration of bacteria from cattle samples. The application of a rapid and accurate assay for the enumeration of total EHEC associated with veal could provide a valuable tool, which is currently lacking.

#### Materials and Methods

#### Sample Collection

Paired hide and carcass samples were collected from 95 formula fed veal calves, 20 to 22 weeks old, at a veal processing plant in December 2013. Prior to any form of microbial intervention, hide samples were obtained after stunning using Speci-Sponges (Nasco, Fort Atkinson, WI), moistened with buffered peptone water (BPW) (BD, Sparks, MD), to swab an approximate 500 cm<sup>2</sup> area over the breast-plate region. The sponges were passed (back and forth counting as one pass) five times either vertically or horizontally within the sample area and the sponge was flipped and passed five times in the remaining direction. The sponges were placed in respective Whirl-Pak bags (Nasco) containing 20 mL BPW. After the hide microbial intervention and removal, the respective carcass samples were obtained before any additional intervention in a similar fashion as the hide samples. An approximate 6000 cm<sup>2</sup> area from the inside and outside round and the navel-plate-brisket-foreshank areas was swabbed and placed in Whirl-Pak bags with 20 mL BPW. All samples were secured in insulated coolers with ice packs for transport to the United States Meat Animal Research Center. All samples were processed the following day. Before removing aliquots for analysis, bacteria were dislodged from the sponges and suspended by thoroughly hand massaging the Whirl-Pak bags.

#### qPCR

Prior to the enrichment of the hide and carcass samples, 20 µL of the respective sample was added to 180 µL of the BAX® system lysis buffer containing BAX® system protease and then prepared using the manufacturer's guidelines (DuPont, Wilmington, DE). All DNA samples were stored at −20°C prior to processing. A standard curve was generated using the E. coli O157:H7 reference strain EDL 932 (ATCC 43894) and divided into single use aliquots that were stored at −20°C. The EDL 932 standards, hide and carcass samples, and no template controls were run in duplicate reactions using the duplex qPCR assay targeting eae and ecf1 as previously described (Luedtke et al., 2014). To normalize the quantification of the total EHEC across separate reaction plates, a pooled approach was used to develop the standard curve for enumeration. The pooled approach was reported to reduce uncertainty in the concentration of an unknown sample compared to a standard curve generated from a single instrument run since a similar mean is established from all of the instrument runs in the study analysis (Sivaganesan et al., 2010). The total EHEC was recorded as CFUs/mL and then converted to CFUs/100 cm<sup>2</sup> using the previously described equation (Bohaychuk et al., 2011). In addition, the theoretical reliable limit of enumeration was calculated as CFUs/100 cm<sup>2</sup> of swabbed hide and carcass using the previously described limit of enumeration of 1250 CFUs/mL for the ecf1 target (Luedtke et al., 2014).
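The equation of Bohaychuk et al. (2011) is not reproduced in this article. As a minimal sketch, assuming the conversion simply scales the concentration by the 20 mL of BPW in the bag and by the swabbed area (the function name and this form of the equation are illustrative assumptions, not the authors' stated formula), the reliable limits reported in the Results can be recovered from the 1250 CFUs/mL limit:

```python
def cfu_per_100_cm2(cfu_per_ml: float, diluent_ml: float, area_cm2: float) -> float:
    """Scale a concentration in the sponge diluent to CFUs per 100 cm^2 of swabbed surface.

    Assumed form of the conversion; the source only cites Bohaychuk et al. (2011).
    """
    if area_cm2 <= 0:
        raise ValueError("area_cm2 must be positive")
    total_cfu = cfu_per_ml * diluent_ml   # CFUs recovered in the whole bag
    return total_cfu / area_cm2 * 100.0   # normalize to 100 cm^2


# 1250 CFUs/mL reliable limit, 20 mL BPW, 500 cm^2 (hide) and 6000 cm^2 (carcass)
hide_limit = cfu_per_100_cm2(1250, diluent_ml=20, area_cm2=500)
carcass_limit = cfu_per_100_cm2(1250, diluent_ml=20, area_cm2=6000)
print(round(hide_limit), round(carcass_limit))
```

Under this assumption the hide and carcass limits come out at 5000 and approximately 417 CFUs/100 cm<sup>2</sup>, matching the values stated in the Results, which suggests the simple volume-over-area scaling is at least consistent with the equation that was used.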

#### Modified Most Probable Number

A modified most probable number assay (MPN) was developed to increase the sensitivity of detection for the enumeration of total EHEC. From the Whirl-Pak bags, a 1 mL aliquot was transferred to 3 mL of Tryptic Soy broth (TSB). For the hide samples, this initial dilution was used to create additional triplicate dilutions of 1:44, 1:484, and 1:5324 in BPW and incubated for 6 h at 42°C. Since the carcass samples likely had a lower starting amount of total EHEC, triplicate dilutions of 1:4, 1:44, and 1:484 were created and incubated as previously described. A 1 mL portion of each dilution for the hide and carcass samples was inoculated into a Roka G2 Sample Transfer Tube (Roka Bioscience, San Diego, CA). Samples were shipped on ice to the Roka Bioscience laboratory for analysis using the automated Atlas® system (Roka Bioscience), which targets ecf1 mRNA for subsequent transcription mediated amplification and a hybridization protection assay. The MPN was determined from the number of ecf1-positive replicates for each dilution and calculated using a freeware MPN calculator (Jarvis et al., 2010). The total EHEC was recorded as CFUs/mL and then converted to CFUs/100 cm<sup>2</sup> using the previously described equation (Bohaychuk et al., 2011). Four carcass samples were incorrectly loaded and were removed from the sample set and comparative analysis.
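To illustrate how an MPN is derived from the pattern of positive replicates, the following is a generic maximum-likelihood sketch. It is not the Jarvis et al. (2010) calculator used in the study, it returns no confidence interval, and the function name, arguments, and example tube pattern are hypothetical.

```python
import math


def mpn_mle(volumes, n_tubes, n_positive, lo=1e-9, hi=1e9, iters=200):
    """Maximum-likelihood MPN (organisms per unit volume) for a dilution series.

    volumes[i]    : amount of original sample in each tube of dilution i
    n_tubes[i]    : number of tubes inoculated at dilution i
    n_positive[i] : number of tubes scoring positive at dilution i
    """
    if sum(n_positive) == 0:
        return 0.0        # no positive tubes: below the detectable range
    if all(p == n for p, n in zip(n_positive, n_tubes)):
        return math.inf   # every tube positive: above the measurable range

    total = sum(n * v for n, v in zip(n_tubes, volumes))

    def score(lam):
        # Zero of this function is the MPN; it decreases as lam increases.
        return sum(p * v / -math.expm1(-lam * v)
                   for p, v in zip(n_positive, volumes) if p) - total

    # Bisection on a logarithmic scale between lo and hi
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


# Classic 3-tube series with 10, 1, and 0.1 mL of sample and a 3-1-0 pattern
print(mpn_mle([10, 1, 0.1], [3, 3, 3], [3, 1, 0]))
```

For the triplicate series described above, `volumes` would hold the fraction of the original sample represented by each dilution (for example 1/44, 1/484, and 1/5324 for hides), which is an assumption about how the dilutions map onto the calculation.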

#### dPCR

Since this is the first description of using dPCR to enumerate bacteria directly from environmental sources, select samples with enumerable total EHEC from the MPN and qPCR were utilized for absolute enumeration. The absolute enumeration of total EHEC from 26 hide and 16 carcass pre-enrichment samples was performed in 15 µL reactions. The reactions contained 7.5 µL of the QuantStudio™ 3D Digital PCR Master Mix (Applied Biosystems® by Life Technologies, Carlsbad, CA), 2 µL of DNA, 3.7 µL of PCR grade H2O, and the addition of the eae and ecf1 primers and probes at the previously described concentrations (Luedtke et al., 2014). A 14.5 µL aliquot of each reaction was loaded onto respective QuantStudio™ 3D Digital PCR 20K Chips (Applied Biosystems® by Life Technologies), which have 20,000 wells that can each accommodate an 865 pL reaction, using the automated QuantStudio™ 3D Digital PCR Chip Loader (Applied Biosystems® by Life Technologies). No template controls and a field sample that screened negative by qPCR for both targets were also included for each run. All loaded chips were assembled according to the manufacturer's recommendations (Applied Biosystems® by Life Technologies). For the amplification of the DNA targets, the loaded chips were placed on a flat block GeneAmp® 9700 thermocycler (Applied Biosystems® by Life Technologies) and run with the cycling conditions 96°C for 10 min, with 39 cycles of 59°C for 2 min and 98°C for 30 s, a hold at 59°C for 2 min, and a 4°C hold. After the thermocycling was completed, the chips were allowed to warm to room temperature and analyzed within 1 h after removal from the thermocycler using the QuantStudio™ 3D Digital PCR Instrument (Applied Biosystems® by Life Technologies). Samples where the chip leaked the QuantStudio™ 12K Flex OpenArray® Immersion Fluid (Applied Biosystems® by Life Technologies) during thermocycling were redone using a new chip. Further analysis of the data was performed using the QuantStudio™ 3D AnalysisSuite™ version 2.0.0 (Applied Biosystems® by Life Technologies). All chips were analyzed for the quality of the read and adjusted to a quality threshold of 0.6 for an increased stringency. The fluorescent threshold was adjusted according to the no template control and the sample that screened negative for both targets by qPCR. These controls served as a baseline for background fluorescence of the FAM and MAXN dyes. The fluorescent threshold was adjusted above the background and then universally for eae and ecf1 across all samples due to the mono-modal peak associated with samples that contain a limited amount of target DNA. The total EHEC was recorded as CFUs/mL and then converted to CFUs/100 cm<sup>2</sup> using the previously described equation (Bohaychuk et al., 2011).
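The Poisson-based quantification behind dPCR can be sketched as follows. This is a minimal illustration of the general principle, not the algorithm implemented in the AnalysisSuite software; the function name and the example well counts are hypothetical, and only the 865 pL well volume and 20,000-well chip size come from the text above.

```python
import math


def dpcr_copies_per_ul(positive_wells: int, total_wells: int,
                       well_volume_pl: float = 865.0) -> float:
    """Estimate target copies per microliter of reaction from well counts.

    Uses the Poisson correction lambda = -ln(1 - p), where p is the fraction
    of positive wells, so that wells holding more than one copy are accounted for.
    """
    if total_wells <= 0 or not 0 <= positive_wells < total_wells:
        raise ValueError("need 0 <= positive_wells < total_wells")
    fraction_positive = positive_wells / total_wells
    copies_per_well = -math.log1p(-fraction_positive)  # mean copies per well
    well_volume_ul = well_volume_pl * 1e-6             # 1 pL = 1e-6 uL
    return copies_per_well / well_volume_ul


# Hypothetical chip: 2,000 positive wells out of 20,000 accepted wells
print(round(dpcr_copies_per_ul(2000, 20000), 1))
```

In practice the number of accepted wells depends on the quality threshold applied to each chip, so `total_wells` would usually be somewhat lower than 20,000. The estimate also becomes undefined when every well is positive, which is one way to understand why the assay saturates at high target concentrations.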

#### Sample Enrichment


To determine the prevalence of total EHEC in the hide and carcass samples, the Whirl-Pak bags were supplemented with 80 mL of TSB and incubated for 6 h at 42°C. The enriched samples were processed for qPCR as previously described and a multiplex qPCR assay was used to detect the presence of eae, ecf1, and stx1/2 (Luedtke et al., 2014).

#### Statistics

GraphPad Prism 6 (GraphPad Software, La Jolla, CA) was used to determine the 95% confidence intervals for the qPCR assays, construct the Bland-Altman plots, calculate the Pearson correlation coefficients, and test for significant differences using the χ<sup>2</sup> test. P-values <0.05 were considered significant.

### Results

#### qPCR Standard Curve and Enumeration of total EHEC from Veal Hide and Carcass Samples

Using a pooled approach to develop the qPCR standard curve for enumerating total EHEC provided a reproducible curve over the five log dilution series (**Table 1**). Overall, the PCR efficiency for eae and ecf1 was 102 and 104%, respectively (**Figure 1**). In addition, the no template controls were consistently negative across all qPCR assays.
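For readers unfamiliar with how PCR efficiency is read off a standard curve, the sketch below fits Cq against log<sub>10</sub> CFUs/mL and applies the usual relation E = 10^(−1/slope) − 1. The Cq values are hypothetical and chosen only to give an efficiency near the reported range; they are not the study's data, and the helper names are illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope: float) -> float:
    """Efficiency as a fraction (1.0 = 100%) from the standard-curve slope."""
    return 10 ** (-1.0 / slope) - 1.0


def log10_cfu_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Invert the standard curve to estimate log10 CFUs/mL for an unknown."""
    return (cq - intercept) / slope


# Hypothetical five-log dilution series (log10 CFUs/mL) and mean Cq values
log10_cfu = [7.0, 6.0, 5.0, 4.0, 3.0]
cq_values = [15.0, 18.25, 21.5, 24.75, 28.0]

slope, intercept = fit_line(log10_cfu, cq_values)
print(slope, intercept, amplification_efficiency(slope))
print(log10_cfu_from_cq(30.0, slope, intercept))
```

A slope of about −3.32 corresponds to 100% efficiency, so efficiencies slightly above 100%, such as those reported here, imply slopes slightly shallower than that. The last line also shows how a Cq beyond the lowest standard is extrapolated from the curve.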

Total EHEC was enumerable in 30 (32%) of the 95 pre-enrichment hide samples using the duplex qPCR assay. Based on the ecf1 target, the amount of total EHEC ranged from approximately 34 to 91,412 CFUs/100 cm<sup>2</sup> (95% CI 4–113,460 CFUs/100 cm<sup>2</sup>). However, 26 (87%) of the enumerable samples were below the calculated reliable limit of enumeration of 5000 CFUs/100 cm<sup>2</sup> for the hide samples. Despite being below the reliable limit of enumeration, total EHEC could be enumerated based on extrapolation from the standard curve. However, some samples had a single Cq value from the duplicate reactions. This also occurred amongst the carcass samples. From the 95 pre-enrichment carcass samples, total EHEC was enumerable in 16 (17%) samples and all of the enumerable samples had a concentration below the calculated reliable limit of quantification of approximately 417 CFUs/100 cm<sup>2</sup> for the carcass samples. Using the ecf1 target, the amount of total EHEC in the carcass samples ranged from approximately 4–275 CFUs/100 cm<sup>2</sup> (95% CI 3–953 CFUs/100 cm<sup>2</sup>).

#### MPN Enumeration of Total EHEC from Veal Hide and Carcass Samples

The MPN assay was able to enumerate total EHEC in 46 (48%) of the 95 hide swab samples, and indicated a concentration of total EHEC ranging from approximately 1 to greater than 17,022 CFUs/100 cm<sup>2</sup> (95% CI 0.4–72,000 CFUs/100 cm<sup>2</sup>). For the MPN analysis of total EHEC on the carcasses, 91 samples were included. As observed with the qPCR assay, the MPN assay indicated that the carcass samples have a low concentration of total EHEC, which ranged from approximately 0.1–1 CFU/100 cm<sup>2</sup> (95% CI 0.02–4 CFUs/100 cm<sup>2</sup>) from 26 (29%) of the samples.

TABLE 1 | Average Cq values and coefficients of variability from pooled standard curves collected during the enumeration of total EHEC from veal hides and carcasses.

<sup>a</sup>The average Cq value ± the standard deviation.

<sup>b</sup>CV, coefficient of variability.

#### Comparison of the qPCR and MPN for the Enumeration of Total EHEC from Veal Hide and Carcass Samples

By comparing the qPCR assay to the MPN assay, the qPCR assay was able to enumerate total EHEC in 23 (50%) of the hide samples that were also enumerable with the MPN assay. In addition, the qPCR assay was able to enumerate total EHEC in 7 (7%) of the hide samples that were not enumerable by the MPN assay. Samples with an enumerable amount of total EHEC were within the same log<sub>10</sub> value for 10 (43%) samples while the remaining 13 (57%) samples differed by approximately one to two orders of magnitude between the qPCR and MPN assays (**Table 2**). Samples only enumerable by either qPCR or the MPN assay were below 3 log<sub>10</sub> CFUs/100 cm<sup>2</sup>. Regression analysis between the qPCR and MPN assays for the hide samples was significant (p < 0.00001) with a Pearson correlation coefficient of 0.63 (**Figure 2A**) and the Bland-Altman plot indicates that 92 (97%) of the hide samples were within the 95% confidence interval. This suggests that the two methods are interchangeable for the enumeration of total EHEC from hide samples (**Figure 3A**). Amongst the carcass samples, the qPCR assay was able to enumerate total EHEC in 10 (11%) samples not enumerable by the MPN assay, while the MPN assay was able to enumerate total EHEC in 21 (23%) samples not enumerable by the qPCR assay. In addition, 5 (5%) samples had a concentration of total EHEC that was enumerable by both assays. The Pearson correlation coefficient for the qPCR and MPN assays on the carcass samples was 0.04, which indicates that a relationship between the methods does not exist (**Figure 2B**). The Bland-Altman plot supports that the two methods are not interchangeable since less than 95% of the samples were within the confidence interval (**Figure 3B**). The differences in the ability to enumerate total EHEC between the qPCR and MPN assays could be explained by the methodology and limitations of each assay.
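The two agreement measures used above can be sketched in a few lines. The analysis in the study was done in GraphPad Prism; the code below is a generic illustration that assumes the Bland-Altman limits are the mean difference ± 1.96 standard deviations, and the paired log<sub>10</sub> values are hypothetical, not taken from Table 2.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bland_altman_limits(x, y):
    """Return (lower limit, mean difference, upper limit) for paired measurements."""
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.fmean(diffs)
    spread = 1.96 * statistics.stdev(diffs)   # assumed 95% limits of agreement
    return bias - spread, bias, bias + spread


# Hypothetical paired enumerations in log10 CFUs/100 cm^2
qpcr_log10 = [2.1, 3.4, 2.8, 4.0, 3.1]
mpn_log10 = [1.8, 3.0, 3.1, 3.6, 2.5]

r = pearson_r(qpcr_log10, mpn_log10)
print(r, r ** 2)
print(bland_altman_limits(qpcr_log10, mpn_log10))
```

Squaring the reported coefficients links them to the abstract: 0.63² is roughly 0.40 and 0.04² is roughly 0.002, in line, up to rounding, with the R<sup>2</sup> values of 0.39 and 0.002 given there.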

#### dPCR Analysis of Select Veal Hide and Carcass Samples for the Enumeration of Total EHEC

Additional analysis of the total EHEC enumeration observations was performed on select hide and carcass samples using dPCR. To determine the capabilities of dPCR, a separate eight log standard curve from approximately 8.19–1.19 log<sub>10</sub> CFUs/mL was created using the EDL 932 reference strain. From this standard curve, it was found, for both targets, that the dPCR assay was within the same log value as the expected inoculums for dilutions containing approximately 3–7 log<sub>10</sub> CFUs/mL (**Table 3**). At the dilution with an expected 8.19 log<sub>10</sub> CFUs/mL, the dPCR assay indicated approximately 7.26 log<sub>10</sub> CFUs/mL based on the eae target while the ecf1 target indicated 7.44 log<sub>10</sub> CFUs/mL. Thus, the upper limit of this dPCR assay is approximately within the 7 log<sub>10</sub> CFUs/mL range. Moreover, the lower limit was found to be approximately 3 log<sub>10</sub> CFUs/mL since at the expected dilution of 2.19 log<sub>10</sub> CFUs/mL the concentration of the eae and ecf1 targets was 3.52 log<sub>10</sub> and 3.15 log<sub>10</sub> CFUs/mL, respectively (**Figure 4**). Moreover, at the expected dilutions of 2.19 and 1.19 log<sub>10</sub> CFUs/mL, the precision for both targets was above 100% (**Table 3**). Analyzing the same standard curve template DNA by qPCR showed a similar trend, from dilutions containing approximately 7.19 to 3.19 log<sub>10</sub> CFUs/mL, as the dPCR. The respective efficiency for eae and ecf1 over the five log curve was 92% and 94%, with an R<sup>2</sup> of 0.999 for both targets (Supplementary Figure 1).

<sup>a</sup>n = 91.

<sup>b</sup>NE, No enumeration.

The dPCR assay tended to overestimate the concentration of total EHEC in the select hide and carcass samples that were indicated previously by the qPCR and MPN assays to have less than 3 log<sub>10</sub> CFUs/100 cm<sup>2</sup> (**Figures 5A,B**). However, for the hide samples with total EHEC over 3 log<sub>10</sub> CFUs/100 cm<sup>2</sup> the dPCR assay was in the same order of magnitude for 5 (50%) and 6 (60%) of the samples enumerated by either qPCR or the MPN assay, respectively. In three samples, 4, 49, and 62, the dPCR assay estimated the total EHEC concentration closer to the concentration enumerated by the MPN assay, while the qPCR estimate was more than approximately one order of magnitude lower (Supplementary Table 1). In addition, for samples 45, 51, and 53 the dPCR estimate was closer to the qPCR estimate than to the MPN estimate (Supplementary Table 1). The carcass samples selected for dPCR analysis all had total EHEC below 3 log<sub>10</sub> CFUs/100 cm<sup>2</sup> as determined by the qPCR and MPN assays. Despite the low level of total EHEC, the dPCR estimate was within the same log<sub>10</sub> as the qPCR estimate for 3 (19%) of the samples, one order of magnitude higher or lower for 10 (62%), and two orders of magnitude higher for 3 (19%) (Supplementary Table 2). The MPN assay estimated the total EHEC concentration on the carcasses at two to three orders of magnitude lower than the qPCR and dPCR assays (Supplementary Table 2).

#### Prevalence of Total EHEC in Veal Hide and Carcass Enrichments

To determine the prevalence of total EHEC in the hide and carcass samples, the samples were enriched and a multiplex qPCR assay targeting eae, ecf1, and stx1/2 was performed. From the hide samples, total EHEC, which are positive for all three targets, was detected in 72 (76%) of the samples while 5 (5%) samples were positive for only eae and ecf1. In addition, eae and stx1/2 were detected in 12 (13%) of the samples while eae alone was detected in 6 (6%) samples. Significantly more carcass samples (86; approximately 91%) than hide samples were positive for all three targets (χ<sup>2</sup> test, p < 0.05), while 3 (3%) carcass samples had detectable amounts of only eae and ecf1. Four (4%) of the carcass samples had detectable amounts of only eae and stx1/2 and 1 (1%) sample contained only eae while 1 (1%) sample was negative for all three targets. The average post-enrichment Cq values for the hide and carcass samples that were positive for all three targets were compared. The hide samples had an average Cq value ± the standard deviation for eae, ecf1, and stx1/2 of 27.4 ± 1.6, 30.8 ± 2.7, and 29.0 ± 3.0, respectively. For the carcass samples, the respective average Cq values were 33.4 ± 2.9, 34.7 ± 1.9, and 34.2 ± 3.1 for eae, ecf1, and stx1/2. A correlation between the pre-enrichment enumeration values and the post-enrichment Cq values for hide and carcass samples was not identified (data not shown).
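The comparison of hide and carcass prevalence can be checked with a 2 × 2 χ<sup>2</sup> test on the counts given above (72 of 95 hide samples and 86 of 95 carcass samples positive for all three targets). This is a minimal sketch that assumes no continuity correction; the text does not say which option was used in GraphPad Prism, and the function name is illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square statistic and p-value for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption). With 1 degree of
    freedom the p-value equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Rows: hide, carcass; columns: positive for all three targets, not positive
chi2, p_value = chi_square_2x2(72, 95 - 72, 86, 95 - 86)
print(round(chi2, 2), round(p_value, 4))
```

With these counts the statistic is about 7.4 and the p-value is below 0.01, which agrees with the significant difference reported.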

#### Discussion

Pathogenic E. coli remains a constant concern for food safety and human health with an emphasis on the most severe pathotype, EHEC (Palaniappan et al., 2006). A systematic review and meta-analysis covering 62 years of published reports indicates a stable and continued association of EHEC with calves (Kolenda et al., 2015), yet veal calves have received limited study toward EHEC detection and enumeration methods (Wang et al., 2014). To address these issues, we sought to investigate the use of qPCR, molecular MPN, and dPCR assays to detect and enumerate total EHEC from paired pre-intervention veal calf hides and carcasses.

… methods based on the average total EHEC enumerated (log<sub>10</sub> CFUs/100 cm<sup>2</sup>) for a respective sample using the qPCR and MPN assays and the difference in enumeration values (log<sub>10</sub> CFUs/100 cm<sup>2</sup>) between the assays. (B) Carcass samples were enumerated by the qPCR and MPN assays and compared for agreeability between the two methods based on the average total EHEC enumerated (log<sub>10</sub> CFUs/100 cm<sup>2</sup>) for a respective sample using the qPCR and MPN assays and the difference in enumeration values (log<sub>10</sub> CFUs/100 cm<sup>2</sup>) between the assays. UCL and LCL indicate the 95% upper confidence level and 95% lower confidence level, respectively. The mean is the average difference between the two methods.

Studies utilizing qPCR to detect and enumerate EHEC primarily focus on a single EHEC such as O157:H7 or target virulence genes that can be independently possessed by non-EHEC in a polymicrobial matrix. This use of potentially non-conjoined targets results in false positives and an overestimation of the true EHEC population (Jacob et al., 2012). Recently, Livezey et al. (2015) reported the use of ecf1 as a target for the detection of total EHEC in beef samples. That study detailed the specificity of ecf1 in E. coli possessing eae, stx1/2, and ehxA, which is applicable to determining the total EHEC load in a sample. In addition, qPCR has primarily been used to detect and enumerate specific EHEC from cattle feces, while the application of qPCR on direct hide and carcass samples is unreported. This is likely due to the low concentration of a specific EHEC serogroup within a defined area on the hide or carcass (Arthur et al., 2007), and the intrinsic limit of detection and enumeration for qPCR, which can range between 10<sup>3</sup> and 10<sup>4</sup> CFUs/mL. Indeed, based on the surface area sampled in this study, to reach the theoretical limit of detection and enumeration comparable to the previously determined reliable limit of 1250 CFUs/mL (Luedtke et al., 2014), concentrations of 5000 and 417 CFUs/100 cm<sup>2</sup> are required for the hide and carcass samples, respectively.
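The conversion from a per-mL limit to a per-area limit can be sketched as follows. The sample volume and swabbed areas used here are hypothetical placeholders, chosen only because they reproduce the 5000 and 417 CFUs/100 cm<sup>2</sup> figures quoted above; the actual sampling parameters are those given in the study's methods, which are not restated in this section.

```python
def cfu_per_100cm2(cfu_per_ml, sample_volume_ml, area_cm2):
    """Convert a concentration in the sample liquid to a surface concentration."""
    total_cfu = cfu_per_ml * sample_volume_ml  # CFUs recovered in the whole sample
    return total_cfu / area_cm2 * 100          # scale to a 100 cm^2 reference area

RELIABLE_LIMIT_CFU_PER_ML = 1250  # reliable limit cited from Luedtke et al. (2014)

# ASSUMED values for illustration only (not taken from the text).
SAMPLE_VOLUME_ML = 20
HIDE_AREA_CM2 = 500
CARCASS_AREA_CM2 = 6000

hide_limit = cfu_per_100cm2(RELIABLE_LIMIT_CFU_PER_ML, SAMPLE_VOLUME_ML, HIDE_AREA_CM2)
carcass_limit = cfu_per_100cm2(RELIABLE_LIMIT_CFU_PER_ML, SAMPLE_VOLUME_ML, CARCASS_AREA_CM2)
print(round(hide_limit), round(carcass_limit))
```

The point of the sketch is the scaling: for a fixed sample volume, a larger sampled area lowers the surface concentration needed to reach the same per-mL limit.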

While we could not find any published reports of using qPCR to determine the total EHEC concentration on cattle or veal hides or carcasses, a PCR-MPN based investigation of potential total EHEC from 11 head of cattle found the respective average concentration on hides and carcasses to be approximately 15662 and 123 CFUs/100 cm<sup>2</sup> when using the average of the targets eae and ehxA for the enumeration regardless of the cattle diet (Gilbert et al., 2008). Analyzing the data in this manner provides a better normalization to our enumeration targeting ecf1, as ecf1 is in a 1:1 relationship with E. coli possessing both eae and ehxA (Livezey et al., 2015). Using this methodology suggests that our use of qPCR to enumerate total EHEC from veal hides and carcasses may underestimate the total EHEC load (Gilbert et al., 2008). Moreover, determining the precise amount of total EHEC in low concentration samples is difficult due to the Monte Carlo effect (Bustin and Nolan, 2004), as 87 and 100% of the enumerable hide and carcass samples, respectively, were below the reliable limit of enumeration and some samples returned a single Cq value for the duplicate reactions.

Variations of the MPN assay have been described. These modifications incorporate a combination of immunomagnetic separation and PCR to increase sensitivity. However, these MPN assays primarily focus on the enumeration of O157 and O26 from feces (Widiasih et al., 2004; Stephens et al., 2007; Guy et al., 2014), while two studies have investigated the total potential EHEC in feces and on hides and carcasses (Gilbert et al., 2005, 2008). With our dilution range for the hide and carcass samples, the enumeration of total EHEC was mostly one to two logs below the estimates of average potential EHEC reported by Gilbert et al. (2008). However, Gilbert et al. (2008) analyzed cattle, and differences in cattle versus veal EHEC hide carriage and carcass processing techniques may exist in addition to physiological and environmental differences (Cristancho et al., 2008; Wang et al., 2014). In addition, the use of an a priori dilution scheme, like the one used here, for the MPN assays highlights the limitations in the ability to enumerate total EHEC at the upper and lower concentrations from a diversity of samples. Expanding the MPN dilution range to encompass a diverse sample set would reduce the throughput and increase the cost per sample of large analyses.
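To illustrate why a fixed dilution scheme bounds what an MPN assay can enumerate, the following sketch computes a maximum-likelihood MPN estimate from the number of positive replicates at each dilution. It is a generic textbook formulation, not the authors' molecular MPN procedure; the function name and the three-dilution, three-replicate scheme are hypothetical.

```python
import math

def mpn_estimate(volumes, tubes, positives, lo=1e-9, hi=1e9, iterations=200):
    """Maximum-likelihood MPN (organisms per unit of sample).

    volumes   -- amount of original sample in each replicate at each dilution
    tubes     -- number of replicates at each dilution
    positives -- number of positive replicates at each dilution
    """
    if sum(positives) == 0:
        return 0.0        # below the lower limit of the dilution scheme
    if all(g == n for g, n in zip(positives, tubes)):
        return math.inf   # above the upper limit of the dilution scheme
    total = sum(n * v for n, v in zip(tubes, volumes))

    def score(lam):
        # Likelihood equation: sum(g*v / (1 - exp(-lam*v))) - sum(n*v) = 0
        return sum(
            g * v / -math.expm1(-lam * v)
            for g, v in zip(positives, volumes) if g
        ) - total

    # score() decreases as lam grows, so bisect on a logarithmic scale.
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)

# Hypothetical scheme: 3 replicates each of 0.1, 0.01, and 0.001 units of sample.
mpn = mpn_estimate([0.1, 0.01, 0.001], [3, 3, 3], [3, 1, 0])
print(f"MPN = {mpn:.1f} per unit of sample")
```

The two early returns mirror the limitation described above: when no replicate or every replicate is positive, the chosen dilution range cannot yield a finite, non-zero estimate.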

Comparatively, the MPN assay was able to detect and enumerate total EHEC from 17 (17%) more hide and 11 (12%) more carcass samples than the qPCR assay. However, the qPCR assay was able to enumerate total EHEC in 7 hide and 11 carcass samples that were not enumerable by the MPN assay. The additional samples enumerable by qPCR could be due to differences in the detection methods. In the qPCR assay, all amplifiable DNA contributes to the enumeration, which would include free DNA and DNA from injured E. coli, while the MPN assay targets ecf1 mRNA transcripts and would likely only enumerate viable cells. In addition, the targeting of mRNA would improve the sensitivity of the assay since more template would be available for detection, which is likely why more hide and carcass samples were enumerable than with the qPCR assay. Despite the detection and enumeration advantage of the MPN assay, the Bland-Altman plot indicates that the MPN and qPCR assays are interchangeable for hide enumerations, so using the qPCR assay would save a considerable amount of time and resources. The reliable detection and enumeration of total EHEC from carcasses offers an additional challenge due to the inherently low concentrations of total EHEC, which resulted in a difference of 2 to 3 orders of magnitude between the qPCR and MPN assays.
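A Bland-Altman comparison of two enumeration methods reduces to the per-sample mean of the two values, their difference, and limits placed around the average difference. The sketch below uses the conventional mean ± 1.96 standard deviations for the limits, which is an assumption about how the upper and lower levels in the figure were derived. The paired log<sub>10</sub> values are invented for illustration and are not study data.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return per-sample means and differences plus bias and 95% limits."""
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)   # average difference between the methods
    sd = statistics.stdev(diffs)    # sample standard deviation of the differences
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lcl": bias - 1.96 * sd,
        "ucl": bias + 1.96 * sd,
    }

# Hypothetical paired enumerations in log10 CFUs/100 cm^2 (not from the study).
qpcr_log10 = [3.1, 2.8, 3.6, 2.2, 4.0]
mpn_log10 = [2.9, 3.0, 3.3, 2.5, 3.8]

result = bland_altman(qpcr_log10, mpn_log10)
print(f"bias = {result['bias']:.2f}, "
      f"limits = [{result['lcl']:.2f}, {result['ucl']:.2f}]")
```

Two methods are usually treated as interchangeable when the bias is close to zero and nearly all differences fall within limits that are narrow enough for the intended use.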

To overcome the low concentrations of total EHEC, we sought to investigate dPCR. Digital PCR has been previously shown to be insensitive to PCR inhibitors and to perform similarly to qPCR on environmental samples, but it does not require a standard curve to enumerate the target DNA concentration (Blaya et al., 2015; Kinz et al., 2015). Our analysis of a dilution curve using dPCR showed a similarity to qPCR in the enumeration of the gene targets, with the expected input of approximately 10<sup>7</sup> and 10<sup>3</sup> CFUs/mL being within the same order of magnitude for the two methods. However, based on precision, the dynamic range of the dPCR was limited compared to the qPCR assay. To increase the dynamic range toward lower target concentrations, additional dPCR replicates are required to lower the percent precision to within an acceptable range (Blaya et al., 2015; Majumdar et al., 2015). The requirement to perform additional dPCR reactions for a single sample with a high percent precision value could be cost-prohibitive for commercial application; hence, additional dPCR reactions were not performed in this study.
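The reason dPCR needs no standard curve, and why its precision degrades at low target concentrations, follows from Poisson statistics on the fraction of positive partitions. The sketch below is a generic illustration: the partition count, the partition volume, and the definition of percent precision (half-width of the 95% interval relative to the estimate) are all assumptions and may differ from the instrument software used in the study.

```python
import math

def dpcr_quantify(positive, total, partition_volume_ul, z=1.96):
    """Estimate target concentration (copies/uL) from partition counts."""
    if not 0 < positive < total:
        raise ValueError("need at least one positive and one negative partition")
    p = positive / total
    # Wald interval on the positive fraction, clipped to stay inside (0, 1).
    half_width = z * math.sqrt(p * (1 - p) / total)
    p_lo = max(p - half_width, 0.0)
    p_hi = min(p + half_width, 1 - 1e-12)

    def to_conc(fraction):
        # Poisson correction: mean copies per partition = -ln(1 - fraction)
        return -math.log(1 - fraction) / partition_volume_ul

    conc, lo, hi = to_conc(p), to_conc(p_lo), to_conc(p_hi)
    precision_pct = 100 * (hi - lo) / (2 * conc)
    return conc, lo, hi, precision_pct

# Hypothetical run: 20,000 partitions of 0.001 uL (1 nL) each.
high = dpcr_quantify(positive=5000, total=20000, partition_volume_ul=0.001)
low = dpcr_quantify(positive=5, total=20000, partition_volume_ul=0.001)
print(f"high target: {high[0]:.1f} copies/uL, precision {high[3]:.1f}%")
print(f"low target:  {low[0]:.3f} copies/uL, precision {low[3]:.1f}%")
```

With only a handful of positive partitions the interval becomes wide relative to the estimate, which is why pooling additional replicate reactions is the usual route to extending the dynamic range downward.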

False positive and negative reactions at lower target concentrations can impact the accurate enumeration of the target (Majumdar et al., 2015), as can incidental pipetting errors between dilutions that change the expected concentration, which the absolute enumeration of dPCR would detect, providing a true estimate (Kishida et al., 2014). At low target concentrations, dPCR is less prone than qPCR to error due to stochastic effects. This was evident in hide samples 4, 49, and 62, as the qPCR assay had a high standard deviation between duplicates while the dPCR was in agreement with the concentration estimated by the MPN assay. To our knowledge, this is the first report of using dPCR to enumerate total EHEC from an environmental source and without using previous DNA purification methods.

Our analysis of the 95 hide samples after enrichment indicated total EHEC prevalence at 76%, which is below the 94% average prevalence of the Top 6 EHEC on 132 veal hides as reported by Wang et al. (2013). This difference in prevalence could be due to the origin of the calves and/or the detection method utilized. The method used by Wang et al. (2013) identifies stx1/2 and single-nucleotide polymorphisms associated with the Top 6 serogroups and eae. However, the analysis of results from a polymicrobial sample like hides could result in false positives, as we observed 95 (100%) of the hide samples possessing eae while 18 (19%) lacked ecf1 and 12 (67%) of these samples also contained stx1/2. This would cause a misinterpretation of true positives if all targets were possessed by separate bacteria, and has been indicated to occur, using the same method, during the detection of the Top 7 on beef hides (Stromberg et al., 2015). Interestingly, the carcass samples had significantly more (P < 0.05) samples possessing all three gene targets. With these total EHEC detections coming from an enrichment of the sample, it suggests that the hide intervention utilized may not be effective and/or the equipment and procedures utilized for hide removal facilitate further total EHEC transmission (United States Department of Agriculture, and Food Safety and Inspection Service, 2013). However, the comparison of Cq values, although not a fully quantitative method, between the hide and carcass sample sets does provide a generalized estimation of the total EHEC population prior to enrichment with regard to the background microflora population and enrichment media (Vimont et al., 2007). With this methodology, the pre-enrichment hide samples likely started with a total EHEC concentration near the detection/enumeration limit while the carcass samples were below the limit (Luedtke et al., 2014), as we observed for the qPCR, MPN, and dPCR assays. Moreover, differences in the background microflora between samples would explain why a correlation between the enumeration value and the Cq value was not identified (Vimont et al., 2007). In addition, samples possessing eae and ecf1, which are classified as atypical EPEC (Livezey et al., 2015), remain a concern due to the potential for these cells to regain stx1/2 at a later point (Bielaszewska et al., 2007), but they would not be identified using the conventional detection and enumeration methods.

In conclusion, veal has received limited research pertaining to food safety despite being a significant source of total EHEC. This study is the first to enumerate total EHEC from paired pre-intervention veal hides and carcasses. Each of the methods utilized for detection and enumeration had benefits and drawbacks. The qPCR assay was easy to use and offered a high throughput, but the inherently low concentration of total EHEC on the hides and carcasses limited the accuracy, and the requirement of a standard curve limits consistency and the number of samples loaded. Our MPN assay allowed for the detection of viable cells and offers a lower limit of detection; however, this sample size inhibited the throughput and required a different range of dilutions. Digital PCR offers advantages of a medium throughput and no standard curve, although the dynamic range of dPCR is limited and improving precision would require additional analysis of samples with a total EHEC concentration below the reliable limit. When attempting to enumerate low concentrations of total EHEC, the MPN assay would provide the most accurate results, while the qPCR and dPCR assays would be effective in identifying veal with hide concentrations above approximately 5000 CFUs/100 cm<sup>2</sup> within 4 h. By rapidly identifying highly contaminated calves, these animals could be restricted to end-of-the-day production or receive an increased focus during intervention. Moreover, it was unexpected to detect a higher prevalence of total EHEC on the carcass samples, but the use of an additional carcass intervention may eliminate the risk imposed by the low concentration of total EHEC on the carcass. However, further research tracking the potential spread of total EHEC from a veal hide to the resulting carcasses in combination with our detection and enumeration assays would provide a better understanding of why and how veal trim has a higher association with total EHEC than beef trim.

## Author Contributions

BL and JB contributed equally to the designing of experiments, conducting assays, data analysis, and drafting the manuscript.

## Acknowledgments

The authors would like to thank the cooperating veal processing plant for access to sample collection and Roka Bioscience for providing the assay tubes and performing analyses. Additional recognition to Lawnie Luedtke and Greg Smith for technical support and Jody Gallagher for secretarial assistance.

### Supplementary Material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2015.01062

## References


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2015 Luedtke and Bosilevac. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Detection, Characterization, and Typing of Shiga Toxin-Producing Escherichia coli

#### Brendon D. Parsons<sup>1</sup>, Nathan Zelyas<sup>2</sup>, Byron M. Berenger<sup>2</sup> and Linda Chui<sup>1</sup>\*

<sup>1</sup> Laboratory Medicine and Pathology, University of Alberta, Edmonton, AB, Canada, <sup>2</sup> Medical Microbiology and Immunology, University of Alberta, Edmonton, AB, Canada

Shiga toxin-producing Escherichia coli (STEC) are responsible for gastrointestinal diseases reported in numerous outbreaks around the world. Given the public health importance of STEC, effective detection, characterization and typing are critical to any medical laboratory system. While non-O157 serotypes account for the majority of STEC infections, frontline microbiology laboratories may only screen for STEC using O157-specific agar-based methods. As a result, non-O157 STEC infections are significantly under-reported. This review discusses recent advances in the detection, characterization and typing of STEC with emphasis on work performed at the Alberta Provincial Laboratory for Public Health (ProvLab). Candidates for the detection of all STEC serotypes include chromogenic agars, enzyme immunoassays (EIA) and quantitative real time polymerase chain reaction (qPCR). Culture methods allow further characterization of isolates, whereas qPCR provides the greatest sensitivity and specificity, followed by EIA. The virulence gene profiles using PCR arrays and stx gene subtypes can subsequently be determined. Different non-O157 serotypes exhibit markedly different virulence gene profiles and a greater prevalence of stx1 than stx2 subtypes compared to O157:H7 isolates. Finally, recent innovations in whole genome sequencing (WGS) have allowed it to emerge as a candidate for the characterization and typing of STEC in diagnostic surveillance isolates. Methods of whole genome analysis such as single nucleotide polymorphisms and k-mer analysis are concordant with epidemiological data and standard typing methods, such as pulsed-field gel electrophoresis and multiple-locus variable number tandem repeat analysis, while offering additional strain differentiation. Together these findings highlight improved strategies for STEC detection using currently available systems and the development of novel approaches for future surveillance.

Keywords: STEC, detection, characterization, typing, O157, non-O157

## INTRODUCTION

Shiga toxin-producing Escherichia coli (STEC) encompass a heterogeneous group of enteric pathogens responsible for numerous sporadic infections and large outbreaks worldwide. Accurate and rapid diagnosis of STEC infections is important for the appropriate management of infected patients and for implementation of proper public health interventions. Specifically, patients infected with STEC should not be treated with antibiotics because of the risk of developing hemolytic uremic syndrome (HUS) (Wong et al., 2000, 2012; Smith et al., 2012). Also, once STEC is identified in a patient, the contacts and potential sources of infection must be identified to prevent further spread of the disease. Although laboratories have become proficient at detecting O157:H7 infections, they often do not screen stools for other STEC serotypes. This creates a gap in diagnostics; since 50% or more of STEC infections may be caused by non-O157 STEC, our surveillance and understanding of the epidemiology of STEC disease is incomplete (Fey et al., 2000; Jelacic et al., 2003; Thompson et al., 2005; Chui et al., 2011; Couturier et al., 2011; Scallan et al., 2011; Gould et al., 2013; STEC, National Surveillance Summary, 2014).

#### Edited by:

Pina Fratamico, Agricultural Research Service, USA

#### Reviewed by:

Nora Lía Padola, Universidad Nacional del Centro de la Provincia de Buenos Aires, Argentina

Glenn Edward Tillman, United States Department of Agriculture Food Safety and Inspection Service, USA

\*Correspondence:

Linda Chui linda.chui@albertahealthservices.ca

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 27 February 2016 Accepted: 22 March 2016 Published: 12 April 2016

#### Citation:

Parsons BD, Zelyas N, Berenger BM and Chui L (2016) Detection, Characterization, and Typing of Shiga Toxin-Producing Escherichia coli. Front. Microbiol. 7:478. doi: 10.3389/fmicb.2016.00478

Growing recognition of the shortfall in STEC detection has prompted a shift toward more comprehensive STEC identification methods. Bacteriological culture remains the gold standard test, given the importance of identifying viable bacterial isolates for typing. For this reason, there has been increased development and use of agars which also select for non-O157 STEC (Kase et al., 2015). However, as culture-based methods are laborious and exhibit clear limits in sensitivity for STEC detection, it is recommended that laboratories supplement culture-based approaches with other assay types (Gould et al., 2009).

Alongside culture-based STEC testing, many laboratories assay for the presence of Shiga toxins (Stx) or the stx genes. Shiga toxin was originally referred to as verotoxin for its cytotoxic effect on Vero cells (Konowalchuk et al., 1977). Once Stx was linked to hemorrhagic colitis and HUS, researchers developed cytotoxicity assays to detect Stx both from fecal specimens and from enriched stool cultures containing polymyxin B (Karmali et al., 1985). While such laborious cytotoxicity assays remain a method of diagnosis for some laboratories, detection of Stx or the presence of the stx genes is now primarily done in clinical laboratories by enzyme immunoassay (EIA) of some form, or by polymerase chain reaction (PCR)-based approaches, respectively. These methods can also determine if Stx1 or Stx2 is present, which adds prognostic value, since there is a well-documented correlation of Stx2 with the clinical severity of STEC infection and the risk of HUS (Schimmer et al., 2008; Soon et al., 2013; Chui et al., 2015a).

While advancements in the sensitivity and speed of STEC detection have direct implications for the diagnosis and treatment of diarrheal illnesses, characterization of STEC isolates beyond serotype or individual virulence factors is required for prevention, control and prediction of STEC infections on a public health scale. High resolution typing of STEC is also increasingly necessary given the observed emergence of diverse types of virulent strains (Soon et al., 2013). Current STEC fingerprinting techniques such as pulsed-field gel electrophoresis (PFGE) or multi-locus variable number tandem repeat analysis (MLVA) using the PulseNet International protocol allow comparison of strains from different countries and aid in the epidemiological tracking of STEC infections around the world, especially during outbreak settings (Sabat et al., 2013). While methods such as PFGE and MLVA play a crucial role in current outbreak investigations, the increasingly tractable use of whole genome sequencing (WGS) has garnered significant interest as a powerful method for typing bacterial pathogens (Chattaway et al., 2016). WGS technologies promise typing resolution orders of magnitude greater than existing methods. Yet, as technical capabilities improve for STEC typing, new challenges arise surrounding implementation, standardization and management of typing data (Köser et al., 2012; Franz et al., 2014).

Here we present an overview of recent advances and the experiences at the Provincial Laboratory of Public Health (ProvLab) in Alberta, Canada with various bacteriological, molecular and genomic strategies for detection and typing of STEC. We assess the benefits and shortcomings of various methods used in the detection and differentiation of STEC. Through evaluation of available systems and opportunities for novel approaches, this review aims to identify improved strategies for STEC identification and surveillance.

### DIFFERENTIAL AND SELECTIVE MEDIA

When E. coli O157:H7 was first identified as an etiologic agent of hemorrhagic colitis, it was discovered to be unlike most other strains of E. coli, because it could not ferment sorbitol (Wells et al., 1983; Pai et al., 1984). This biochemical peculiarity led to the use of sorbitol-MacConkey (SMAC) agar to identify non-sorbitol fermenting E. coli in stool of patients with bloody diarrhea (Remis, 1984). This agar differed from typical MacConkey agar by substituting lactose with sorbitol; non-sorbitol fermenting organisms produced white colonies on the medium (Remis, 1984). Early investigations found that SMAC agar displayed acceptable sensitivity, specificity, and negative predictive value (NPV) for E. coli O157:H7 detection, but a positive predictive value (PPV) of only 28% (**Table 1**; March and Ratnam, 1986). Besides its low PPV, other limitations of SMAC include its inability to detect non-O157 STEC as well as sorbitol-fermenting O157 STEC isolates, which can carry the toxigenic stx genes and cause outbreaks (Gunzer et al., 1992; Ammon et al., 1999). To improve the detection of STEC, a new chromogenic medium, CHROMagar™ O157, was developed by CHROMagar Microbiology (CHROMagar™ O157, CHROMagar Microbiology, Paris, France, 2013). Through the incorporation of proprietary chromogenic substrates in CHROMagar™ O157 agar, O157 STEC appear mauve while other E. coli are blue. Unfortunately, as with SMAC agar, CHROMagar™ O157 is not able to detect most non-O157 STEC (Bettelheim, 1998). The only study directly comparing SMAC agar to CHROMagar™ O157 showed that CHROMagar™ O157 had a lower false-positive rate for colony picks (20%) compared to that of SMAC agar (65%) as well as estimated annual cost-savings of approximately \$76,000 in a large Canadian clinical laboratory (Church et al., 2007). In the same study, CHROMagar™ O157 had a higher sensitivity (96.3% vs. 85.2%) for detecting O157 STEC (**Table 1**; Church et al., 2007). Other chromogenic media, such as Colorex™ O157 (Alere, Inc., Ottawa, ON, Canada), are also available to detect O157 STEC, but have not been clinically evaluated.

Many clinical microbiology laboratories continue to use either SMAC agar or CHROMagar™ O157 exclusively to detect STEC even though they are not appropriate for the detection of non-O157 STEC. Guidelines released by the Centers for Disease Control and Prevention (CDC) specify that laboratories are to simultaneously culture stool specimens for O157 STEC and test them with an assay that detects non-O157 STEC (Gould et al., 2009). Because of the shortcomings of SMAC and CHROMagar™ O157, there has been interest in creating a medium capable of detecting non-O157 serotypes of STEC.

#### TABLE 1 | Summary of clinically evaluated STEC detection methods.

PPV, positive predictive value; NPV, negative predictive value; for CHROMagar™ O104 STEC, diagnostic values are for O104:H4 STEC only; for CHROMagar™ O157 and sorbitol-MacConkey, diagnostic values are for O157:H7 STEC only; for all other assays, diagnostic values are for all STEC serotypes.

During the 2011 E. coli O104:H4 outbreak in Europe, a medium specifically designed to detect the outbreak strain, CHROMagar™ STEC O104, was developed (Gouali et al., 2013). While this agar is able to detect the O104:H4 strain expressing an extended-spectrum β-lactamase (ESBL) initially causing the outbreak, it is unable to detect other non-O157 serotypes and O104:H4 isolates that have lost the plasmid encoding the ESBL (Mariani-Kurkdjian et al., 2011; Grad et al., 2012). Its utility is also hampered by low sensitivity and PPV (**Table 1**).

Other chromogenic agars capable of detecting wider ranges of STEC serotypes have been described, including Rainbow® Agar O157 and CHROMagar™ STEC (Biolog, Hayward, CA, USA, 2008; CHROMagar™ STEC, CHROMagar Microbiology, Paris, France, 2014). CHROMagar™ STEC is meant to detect all STEC serotypes. Like other media developed by CHROMagar Microbiology, pathogen detection on CHROMagar™ STEC is based on the organism's utilization of proprietary chromogenic substrates. CHROMagar™ STEC is able to detect most of the STEC serotypes for which it has been assessed (Hirvonen et al., 2012; Wylie et al., 2013; Zelyas et al., 2016). Direct inoculation of stool onto the agar yields acceptable sensitivity, specificity, and NPV, but the PPV is quite low (Wylie et al., 2013; Zelyas et al., 2016). Two studies using different broth enrichment protocols prior to inoculation showed sensitivities varying from 50% (McCallum et al., 2013) to 91.4% (Gouali et al., 2013).

Rainbow® Agar O157 is purported to detect O157:H7, O26:H11, O48:H21, O111:H-, and O111:H8 serotypes based on their reduced or absent β-glucuronidase activity compared to non-toxigenic strains (Biolog, Hayward, CA, USA, 2008). Although Rainbow® Agar O157 has been evaluated in a number of studies for the detection of STEC in food and water (Radu et al., 2000; Tutenel, 2003; Tillman et al., 2012; Yoshitomi et al., 2012; Ngwa et al., 2013), its ability to identify STEC from human stool was first investigated at the Alberta ProvLab. A study by Zelyas et al. (2016) performed at the Alberta ProvLab compared four chromogenic agar media in their ability to detect non-O157 STEC. Isolates from a panel of 161 non-O157 STEC were inoculated directly onto CHROMagar™ STEC, Rainbow® Agar O157, CHROMagar™ O157, and Colorex® O157 to observe if the isolates would produce STEC-like colonies. Unsurprisingly, CHROMagar™ O157 and Colorex® O157 were unable to identify the majority of non-O157 isolates as STEC, while CHROMagar™ STEC and Rainbow® Agar O157 had detection rates of 90% and 70%, respectively. Using stool cultures spiked with non-O157 STEC isolates, it was found that CHROMagar™ STEC once again exhibited a superior detection rate of 72% (compared to 26% using Rainbow® Agar O157) in bloody stool. Similar to previous studies, CHROMagar™ STEC demonstrated a sensitivity, specificity, PPV, and NPV of 84.6, 87, 13.9, and 99.6%, respectively, when 536 clinical specimens were inoculated directly onto the medium (**Table 1**). Although studies demonstrate that CHROMagar™ STEC shows promise in its ability to rule out STEC when no suspect colonies grow, the high number of false-positive results seen on the medium would necessitate considerable additional laboratory testing to confirm or exclude the STEC status of mauve colonies. The use of CHROMagar™ STEC should perhaps be limited to the procurement of STEC isolates when a stool tests positive for STEC by a non-culture method, such as toxin or toxin gene detection. As discussed below, such non-culture methods often display sensitivities above the ∼85% seen with CHROMagar™ STEC.
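The combination of a high NPV with a very low PPV follows directly from the low prevalence of STEC among the specimens tested. The sketch below computes the four diagnostic values from a 2 × 2 table. The individual counts are not given in the text; they are back-calculated here as one set of integers that sums to 536 specimens and reproduces the reported percentages, so they should be read as an assumption rather than as the study's data.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard diagnostic accuracy measures from a 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all STEC-positive stools
        "specificity": tn / (tn + fp),  # true negatives among all STEC-negative stools
        "ppv": tp / (tp + fp),          # probability a positive plate is truly STEC
        "npv": tn / (tn + fn),          # probability a negative plate is truly negative
    }

# ASSUMED counts, back-calculated to match the reported values (total = 536).
counts = {"tp": 11, "fp": 68, "fn": 2, "tn": 455}

metrics = diagnostic_metrics(**counts)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.1f}%")
```

With only 13 true positives among 536 specimens under these assumed counts, even a specificity near 87% produces several false-positive plates for every true positive, which is the practical burden described above.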

### ENZYME IMMUNOASSAYS

As no culture medium is yet available for the practical detection of all STEC serotypes, identifying the Shiga toxin (Stx) in stools is an alternative method of diagnosing STEC-related disease. The first EIAs developed for Stx identified STEC colonies based on the binding of monoclonal antibodies to Stx1 and Stx2 immobilized on membranes (Perera et al., 1988; Milley and Sekla, 1993). Since the creation of these early EIAs that required the growth of isolated colonies on a solid medium, a number of other assays have been developed for the detection of Shiga toxin directly from stool or from enriched stool cultures.

One of the most evaluated and used EIAs is the Premier® EHEC microwell immunoassay (Meridian Bioscience Inc., Cincinnati, OH, USA). Multiple studies using overnight broth enrichment stool cultures found that Premier® EHEC demonstrates high sensitivity and specificity (**Table 1**). Premier® EHEC has also been used to detect Stx directly from clinical specimens without the use of an overnight enrichment step; one group found this approach had a sensitivity of 83.9% and specificity of 99.8% (Teel et al., 2007). Another microwell immunoassay that has undergone clinical evaluation, the ProSpect™ Shiga Toxin E. coli assay (Remel, Lenexa, KS, USA), demonstrated inferior sensitivity compared to Premier® EHEC (**Table 1**).

Besides microwell EIAs, other types of immunoassays have been developed to detect STEC. One such assay is the BioStar® SHIGATOX optical immunoassay (Inverness Medical Professional Diagnostics, Inc., San Diego, CA, USA), which detects Stx by its interaction with anti-Stx antibodies on the surface of a silicon wafer; this interaction causes an increase in the optical thickness of the thin film and results in a visible color change on the wafer. Similarly, the Duopath Verotoxin-test™ (Merck, Darmstadt, Germany) is an immunochromatographic assay that employs anti-Stx antibodies immobilized to a membrane to bind and detect Stx. In previous studies, the BioStar® SHIGATOX assay exhibited a superior performance to the Duopath Verotoxin-test™ (**Table 1**). However, the Duopath Verotoxin-test™ is advantageous because it differentiates between Stx1- and Stx2-producing STEC.

Studies performed at the Alberta ProvLab have evaluated two microwell immunoassays: the aforementioned Premier® EHEC and the Shiga Toxin Chek™ assay (TechLab, Inc., Blacksburg, VA, USA; Chui et al., 2011, 2015b). Premier® EHEC demonstrated a sensitivity of 90.5%, similar to that seen in previous studies (Grif et al., 2007; Teel et al., 2007; Hermos et al., 2011), and the Shiga Toxin Chek™ assay had a lower sensitivity of 80%, which decreased to 70% when unenriched specimens were used (**Table 1**; Chui et al., 2011, 2015b).

Additionally, two immunochromatographic assays have been assessed at the Alberta ProvLab: ImmunoCard STAT!® (Meridian Bioscience, Inc., Cincinnati, OH, USA) and Shiga Toxin Quik Chek™ (TechLab, Inc., Blacksburg, VA, USA; Chui et al., 2013, 2015b). Despite having a specificity >99%, ImmunoCard STAT!® had a low sensitivity of 35.5% even when using enrichment broths (**Table 1**). Shiga Toxin Quik Chek™ demonstrated sensitivities of 85 and 70% with and without enrichment, respectively (Chui et al., 2013, 2015b).

Some caution must be exercised when using EIAs alone to detect STEC. There have been two norovirus outbreaks in the United States in which EIAs yielded false-positive STEC results, highlighting the pitfall of depending on a single method to diagnose STEC-related disease (Centers for Disease Control and Prevention (CDC), 2001, 2006).

### MOLECULAR METHODS

While the detection of Stx is a direct way to determine if clinical specimens harbor STEC, there has been much interest in nucleic acid-based methods to detect the presence of the stx genes in stools. The earliest application of nucleic acid detection for STEC involved the use of cloned portions of stx1 and stx2 as <sup>35</sup>S-labeled DNA probes in colony hybridization assays (Willshaw et al., 1987; Scotland et al., 1988). Soon after DNA hybridization assays were developed, a conventional PCR targeting stx1 and stx2 in a single reaction was devised (Pollard et al., 1990); a similar assay was later described which could detect STEC from DNA isolated from stool (Brian et al., 1992).

A multitude of PCR assays have been developed since, and a number of them use real-time platforms. Advantages of real-time PCR assays include excellent sensitivity and specificity and the ability to devise multiplex assays that detect and differentiate between stx1 and stx2, other virulence genes such as the intimin gene, eae, and the hemolysin gene, ehxA, and even other gastrointestinal pathogens. The first reported stx-targeting real-time PCR assay used directly on naturally infected clinical stool had a sensitivity of 100% and a specificity of 92% (Bélanger et al., 2002). Numerous real-time PCR assays have been designed and generally demonstrate similarly high detection rates with few false-positive results (Grys et al., 2009; Gerritzen et al., 2011; Zhang et al., 2012). Commercial real-time PCR assays such as the GeneDisc® (GeneDisc® Technologies, Pall Corporation, NY, USA) and BAX® System (DuPont Nutrition and Health, Wilmington, DE, USA) include a panel for rapidly screening for STEC, targeting stx1, stx2, and eae or other genes, followed by panels that target serotype-specific genes of O157 STEC and the top six non-O157 STEC. These real-time PCR STEC panels exhibit high sensitivity and can be applied in two-step screening algorithms that first capture STEC and then detect the most frequently reported STEC serotypes (Fratamico et al., 2012; Wasilenko et al., 2014). Most real-time PCR assays use any one of a number of available detection systems, including SYBR green, TaqMan®, molecular beacon probes, fluorescence resonance energy transfer (FRET) probes, LUX™ (light upon extension) assays with singly-labeled primers without probes, as well as other methods. Further contributing to the heterogeneity of available methods, different real-time PCR assays often target different regions within stx1 and stx2 (Chui et al., 2010).

Numerous multiplex molecular assays for the detection of multiple gastrointestinal pathogens are also available. The xTag® Gastrointestinal Pathogen Panel (GPP) (Luminex Corporation, Austin, TX, USA) is FDA- and Health Canada-approved for the detection of multiple agents of gastroenteritis. The GPP employs a multiplex PCR with a reverse transcriptase step intended to amplify nucleic acid from nine bacterial pathogens, three parasites, and three viruses. The generated amplicons are then hybridized to oligonucleotides bound to microspheres, which are detected by the instrument. Included in the GPP are separate targets for the detection of E. coli O157 and non-O157 STEC. Multiple evaluations performed in different regions have demonstrated high sensitivities and specificities (**Table 1**). Some studies report the detection of STEC by the GPP in culture- or conventional PCR-negative specimens; the significance of these results, whether due to heightened sensitivity of the GPP or to false-positives, has not been determined (Mengelle et al., 2013; Vocale et al., 2015). The EntericBio real-time Gastro Panel I® (Serosep, Limerick, Ireland), the FilmArray® GI panel (BioFire, Inc., Salt Lake City, UT, USA), and the Seeplex® Diarrhea ACE Detection system (Seegene, Seoul, South Korea) demonstrate sensitivities and specificities for STEC similar to those of the GPP (**Table 1**).

A method of considerable interest that has yet to be clinically evaluated for the detection of STEC from human stool is loop-mediated isothermal amplification (LAMP). The assay uses a DNA polymerase with strand-displacement activity and four to six specially designed primers to generate high numbers of stem-loop amplicons in as little as 1 h at a stable temperature of 60–65°C; real-time visualization of positive reactions occurs with the production of insoluble magnesium pyrophosphate, thus obviating the need for fluorescent reporters (Notomi et al., 2000; Mori and Notomi, 2009). Advantages of LAMP include high sensitivity and specificity, a short turn-around time, isothermal conditions, and a simple detection method. Two studies have employed LAMP to detect STEC from human stool thus far, but neither determined sensitivities or specificities for the assays used (Wang et al., 2012; Teh et al., 2014). However, a LAMP assay developed by Hara-Kudo et al. (2007) to detect STEC had 100% sensitivity for stx1 and stx2, and a specificity of 98% for stx1 and 100% for stx2, in tests of stool samples at the Alberta ProvLab. As well, the PPV for this assay was 92% for stx1 and 100% for stx2, while both stx1 and stx2 had an NPV of 100%. Although a major disadvantage of LAMP is the difficulty in developing multiplex assays, one could envision how the advantages of the LAMP assay could be exploited in point-of-care testing. As yet, however, more clinical evaluations are needed.
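The performance figures quoted throughout this section (sensitivity, specificity, PPV, and NPV) all derive from a 2 × 2 comparison of an index assay against a reference method. The following minimal sketch shows the arithmetic; the class and function names are illustrative, and the counts are invented for the example (they are not the data from the evaluation described above, although they were chosen to give percentages of the same magnitude).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing an index assay with a reference method."""
    tp: int  # assay positive, reference positive
    fp: int  # assay positive, reference negative
    fn: int  # assay negative, reference positive
    tn: int  # assay negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> Dict[str, float]:
    """Return sensitivity, specificity, PPV and NPV as fractions (0 to 1)."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


# Hypothetical counts, for illustration only.
counts = ConfusionCounts(tp=23, fp=2, fn=0, tn=98)
metrics = diagnostic_metrics(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Because PPV and NPV depend on how many true positives are in the tested population, the same assay can show a very different PPV in a low-prevalence screening setting than in an outbreak setting, whereas sensitivity and specificity do not change with prevalence.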

The Alberta ProvLab compared the diagnostic characteristics and costs associated with five PCR assays (Chui et al., 2010). This analysis showed that an in-house assay using the TaqMan® platform with a rapid turn-around time costs the least among the real-time PCR assays, making it the most attractive test (**Table 1**; Chui et al., 2010). This assay has been used by ProvLab to determine the prevalence of STEC infections in various areas of Alberta during 2006–2012 as well as to act as a comparator for other methods of STEC detection (Chui et al., 2011, 2013, 2015b; Couturier et al., 2011; Zelyas et al., 2016). As well, the Alberta ProvLab is participating in the APPETITE (Alberta Provincial Pediatric EnTeric Infection TEam) study, which is comparing the GPP to routine detection methods in a large pediatric cohort from 2014 to 2019 to better define the epidemiology of gastrointestinal disease in Alberta (Freedman et al., 2015). Since routine STEC detection methods in Alberta currently involve only the identification of O157:H7 STEC through culture methods, the use of the GPP during the APPETITE study will greatly enhance STEC disease detection among children with acute gastroenteritis and serve to further evaluate the diagnostic value of the GPP.

## AN ALGORITHM TO MAXIMIZE STEC DETECTION

None of the aforementioned approaches is without drawbacks. Culture techniques lack either sensitivity or a robust PPV, toxin detection assays may yield false-positives and are often expensive, and molecular methods tend to be laborious and/or expensive. At the same time, each method has at least one advantage: culture allows the isolation of strains for typing; EIAs confirm the production of disease-causing toxin and permit non-O157 STEC to be detected; and current nucleic acid tests have high sensitivity and specificity for all STEC serotypes. As suggested by the CDC in 2009, STEC detection algorithms are of most utility if a combination of culture and non-culture methods is used (Gould et al., 2009). One approach would be to pool clinical specimens and test them initially using a non-culture method. This would have the benefit of keeping costs low while screening for a low-prevalence disease. Once a pool tests positive, the individual stools in that pool could be tested by the same non-culture method to identify the STEC-positive specimen (Chou et al., 2014). This would be followed by culture of the specimen on a chromogenic agar to obtain the isolate for typing.
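The cost argument for pooling can be made concrete with the classic two-stage (Dorfman) calculation: each pool is tested once, and only positive pools are retested specimen by specimen. The sketch below is illustrative only. It assumes a perfectly sensitive and specific screening test, independent specimens, and no loss of sensitivity from dilution; the function names and the 1% prevalence are hypothetical and are not taken from the cited studies.

```python
def expected_tests_per_specimen(prevalence: float, pool_size: int) -> float:
    """Expected number of tests per specimen under two-stage pooling.

    One test is spent on each pool; if the pool is positive, every
    specimen in it is retested individually.
    """
    if pool_size < 2:
        return 1.0  # no pooling: one test per specimen
    prob_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + prob_pool_positive


def best_pool_size(prevalence: float, max_pool_size: int = 20) -> int:
    """Pool size (1..max_pool_size) minimising the expected test count."""
    return min(
        range(1, max_pool_size + 1),
        key=lambda k: expected_tests_per_specimen(prevalence, k),
    )


# Hypothetical prevalence of 1% among submitted stools.
prevalence = 0.01
for k in (1, 5, 10, 20):
    cost = expected_tests_per_specimen(prevalence, k)
    print(f"pool size {k:2d}: {cost:.3f} tests per specimen")
print("best pool size:", best_pool_size(prevalence))
```

The saving shrinks as prevalence rises, because more pools turn positive and must be resolved individually; in practice the usable pool size is also limited by the analytical sensitivity of the screening assay on diluted specimens.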

While the design of detection and characterization algorithms may use various combinations of testing methods, the ultimate goal of public health investigations of STEC is to identify specifically the "pathogenic" strains of STEC. However, strategies to detect pathogenic STEC are hindered by the lack of a consistent association between any single marker or combination of markers and the severity of disease. Essentially, there exists no absolute characteristic of pathogenic STEC, and therefore testing algorithms need sufficient inclusivity to capture emerging strains. As outlined by the 2013 European Food Safety Authority (EFSA) criteria for assessing STEC pathogenicity, the O104 outbreak in 2011 revealed significant shortfalls in previous testing algorithms. Specifically, algorithms that focus on the identification of a narrow panel of serogroups or virulence genes, or that rely on seropathotypes, which define the reported frequencies of certain serotypes with human disease, are likely insufficient to detect "non-typical" emerging STEC strains (EFSA BIOHAZ Panel, 2013). For instance, the O104 outbreak strain was not included in any seropathotype category prior to 2011, and screening strategies at the time required detection of stx and eae before attempting to isolate the suspect STEC, therefore missing O104, which was eae negative. Modifications to STEC detection algorithms outlined by the EFSA include requirements to attempt isolation of STEC from all samples positive for stx genes. In addition, the EFSA panel recommended testing for the presence of the aaiC (encoding a secreted protein of EAEC) and aggR (encoding a plasmid-encoded regulator) genes associated with enteroaggregative adhesion, which along with stx and eae are associated with a higher risk of severe disease. The continued improvements in STEC identification and strategic testing algorithms will aid epidemiological investigations and provide early detection of future STEC outbreaks.
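The difference between the earlier screening logic and the revised approach described above can be expressed as two small decision rules. This is a simplified sketch, not the EFSA algorithm itself: the function names are hypothetical, and treating "eae, or aaiC together with aggR" as the higher-risk combination is an interpretive assumption made for the example.

```python
from typing import Dict, Set


def legacy_rule(markers: Set[str]) -> bool:
    """Earlier strategy: attempt isolation only if stx AND eae are detected."""
    has_stx = any(m.startswith("stx") for m in markers)
    return has_stx and "eae" in markers


def revised_rule(markers: Set[str]) -> Dict[str, bool]:
    """Revised strategy: attempt isolation from every stx-positive sample.

    The adhesion markers are used only to flag a higher-risk profile
    (assumed here to be eae, or aaiC together with aggR).
    """
    has_stx = any(m.startswith("stx") for m in markers)
    enteroaggregative = {"aaiC", "aggR"} <= markers
    return {
        "attempt_isolation": has_stx,
        "higher_risk_profile": has_stx and ("eae" in markers or enteroaggregative),
    }


# An eae-negative, stx-positive profile with enteroaggregative markers,
# of the kind the older rule would not have pursued.
sample = {"stx2", "aaiC", "aggR"}
print("legacy rule attempts isolation:", legacy_rule(sample))
print("revised rule:", revised_rule(sample))
```

The point of the contrast is inclusivity: the revised rule never uses an accessory marker as a gate for isolation, only as a way to prioritise follow-up.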

### GENOMICS AND GENOTYPING IN SURVEILLANCE

Once STEC is identified and isolated in culture, the next challenge is to identify the relatedness of isolates for the purpose of public health surveillance. As discussed, STEC isolates can be classified initially based on the serotype, but additional typing is required to determine if an isolate is related to another of the same serotype. PFGE has been used extensively in public health to determine the relatedness of isolates of many bacterial species including STEC. For STEC, enhanced resolution can be achieved by combining PFGE with MLVA or using PFGE alone. Networks of public health laboratories accredited to run PFGE and/or MLVA report to their regional PulseNet organization the profiles of organisms they type (i.e., PulseNet International). This facilitates identifying national or international outbreaks that would otherwise go unnoticed. In our laboratory, this process through PulseNet Canada and PulseNet USA has found cases linked to Albertan foodborne outbreaks associated with the cross-border trade of food products between different provinces of Canada and the two countries (internal communications).

PFGE and MLVA have great utility in outbreak investigations of STEC and are advantageous because they are amenable to intra-laboratory comparison. However, for many types of bacteria, these methods and others do not have adequate resolution to identify outbreaks. WGS, on the other hand, has been demonstrated in numerous cases to provide enhanced resolution compared to pre-WGS methods (Gilchrist et al., 2015) because the entire genome (or most of the genome) can be analyzed rather than just one or a few genetic elements. In comparison to the current STEC typing and epidemiological screening methods, WGS has superior discriminatory power for comparable cost and would dramatically streamline the detection and typing workflow by replacing the multiple tests required for current investigations (Joensen et al., 2014; Dallman et al., 2015a,b).

### WHOLE GENOME SEQUENCING

Before discussing the studies demonstrating the utility of WGS for STEC typing, one must be aware that multiple computing methods exist to assess relatedness between a set of isolates. These methods can be broadly categorized into those that analyze the difference in single nucleotide variants (SNV) between isolates (also referred to as single nucleotide polymorphisms [SNP]), nucleotide differences, gene presence or absence throughout the whole genome, gene allele differences, or overall genetic similarity (e.g., k-mers, average nucleotide identity, and multiple genome alignment; Konstantinidis et al., 2006; Sims et al., 2009; Nielsen et al., 2011; Maiden et al., 2013; Leekitcharoenphon et al., 2014). Once an isolate is sequenced using a next-generation sequencer such as the IonTorrent™ (Thermo Fisher Scientific, Waltham, USA)<sup>1</sup> or HiSeq™/MiSeq™ (Illumina Inc., San Diego, USA)<sup>2</sup> platforms, one or two files are generated depending on whether single-end or paired-end sequencing is performed, respectively. The file(s) contain all the "raw" sequence reads of the genomic fragments of ∼150–300 bp in length, which can then be analyzed directly or assembled into contigs to form a draft genome. The assembly can be done with the help of a reference genome (e.g., E. coli O157:H7 str. Sakai) or de novo. Most methods used to demonstrate the utility of WGS in public health investigation of bacteria have used assembly-based analysis. In general, assembly uses one of more than 10 assemblers available, but microbiological investigations, especially for Enterobacteriaceae, have generally used Velvet for de novo assembly or the Burrows-Wheeler Aligner for mapping reads to a reference (Zerbino and Birney, 2008; Li and Durbin, 2009). Once assembled, the genome can then be compared to other genomes to identify similarity. It is at this step that the diverse analysis methods mentioned above come into play, each with multiple software algorithms and parameter choices. These approaches are bundled into analysis "pipelines" into which a raw or assembled genome sequence file can be fed and subjected to multiple software algorithms with specified settings (Kisand and Lettieri, 2013).

<sup>1</sup>Thermo Fisher Scientific, Waltham, USA. IonTorrent™, Thermo Fisher Scientific. Available online at: https://www.thermofisher.com/ca/en/home/brands/ion-torrent.html (Accessed January 12, 2016).

<sup>2</sup>Illumina Inc., San Diego, USA. Illumina Sequencing Platform Systems Portfolio. Available online at: http://www.illumina.com/content/dam/illumina-marketing/npiassets/PDF/brochure-sequencing-systems-portfolio.pdf (Accessed January 12, 2016).

Most genome analysis pipelines require high performance computing. There are, however, commercial methods emerging that have optimized genome assembly and analysis to run on high-performance desktop computers or utilize external computing infrastructure to run the computationally intensive steps. These commercial methods include: BioNumerics (Applied Maths NV, Sint-Martens-Latem, Belgium)<sup>3</sup>, CLC Genomics Workbench (Qiagen, Redwood City, CA, USA) and Ridom™ SeqSphere+ (Ridom GmbH, Münster, Germany)<sup>4</sup>. For SNV analyses, there are also online web interfaces that allow the user to upload a set of sequence files and use a genomic center's computing infrastructure to run the software (e.g., SeqSero [http://www.denglab.info/SeqSero] and the Center for Genomic Epidemiology [http://www.genomicepidemiology.org]).

### WGS IN STEC SURVEILLANCE

SNV analysis has been the predominant method used to date to type isolates using WGS. Usually only the portion of the genome that is conserved amongst a species or a specific pathovar is used for determining strain relatedness (Tettelin et al., 2005; Maiden et al., 2013). This type of analysis is coined "core SNV" analysis to differentiate from SNV analysis of the entire genome. Four groups have applied SNV analysis for the typing of STEC for clinical public health purposes: ProvLab in collaboration with PulseNet Canada and the Public Health Agency of Canada National Microbiology Laboratory (PHAC-NML), the Danish Center for Genomic Epidemiology, Health Protection Scotland and Public Health England.
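The idea behind "core SNV" analysis can be illustrated with a toy calculation: only alignment positions that are called in every isolate are retained, and pairwise differences are counted over those positions. This is a minimal sketch that assumes the isolates have already been aligned to a common reference; the sequences and isolate names are invented, and real pipelines additionally apply quality, coverage, and recombination filters that are omitted here.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def core_positions(alignment: Dict[str, str]) -> List[int]:
    """Indices at which every isolate has an unambiguous base call."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    return [
        i for i in range(length)
        if all(seq[i] in "ACGT" for seq in alignment.values())
    ]


def pairwise_snv_distances(alignment: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Number of differing core positions for every pair of isolates."""
    core = core_positions(alignment)
    return {
        (a, b): sum(alignment[a][i] != alignment[b][i] for i in core)
        for a, b in combinations(sorted(alignment), 2)
    }


# Toy alignment; '-' marks a position missing from isolate_C.
toy_alignment = {
    "isolate_A": "ACGTACGTAC",
    "isolate_B": "ACGTACGTAT",
    "isolate_C": "ACTTAC-TGC",
}
snv_distances = pairwise_snv_distances(toy_alignment)
for pair, distance in snv_distances.items():
    print(pair, distance)
```

Note that the core shrinks as more diverse isolates are added, since a position missing from any one genome is dropped for all of them; this is one reason SNV counts from different studies are not directly comparable.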

The Statens Serum Institut in Copenhagen, Denmark sequenced 42 isolates received in a 7-week period and determined their relationship using their web-based tools SNPtree and NDtree (Joensen et al., 2014). During this study period they had an outbreak with 13 cases of E. coli O157:H7, six of which were included in their study. The NDtree method, an assembly-free approach that compares the test isolates to nucleotide segments of a reference genome and generates a score representing the differences in nucleotides found between the genomes, was able to distinguish the outbreak O157 isolates from the six non-outbreak O157 isolates and from the other STEC serotypes. The SNPtree method clustered all serotypes except for O117:K1:H7 together and found 29–65 SNV differences among the outbreak O157 isolates and 521–753 SNV differences between the outbreak and non-outbreak O157 isolates. This group also demonstrated the ability of the SNPtree and NDtree methods to differentiate Salmonella Typhimurium strains from each other (Leekitcharoenphon et al., 2014).

Public Health England has demonstrated that SNV phylogenetic methods can accurately identify outbreak isolates while adding increased sensitivity to current methods (MLVA and epidemiological investigations in this case; Dallman et al., 2015a). In one of their studies, 572 isolates received by the Gastrointestinal Bacterial Reference Unit for typing, including randomly selected isolates from 2012 (n = 334) and 2013 (n = 147), were sequenced. Based on temporal and epidemiological linkages, the maximum number of SNV differences for isolates to be part of the same cluster was found to be five. An intriguing part of this study was that SNV analysis identified two outbreaks that were not detected by their routine epidemiological investigations or MLVA typing, but were later found to have previously unrecognized epidemiological linkages. This group also demonstrated that SNV analysis was concordant with epidemiological investigations in its ability to identify two different outbreaks caused by watercress contaminated with E. coli O157 from two different retailers (one supplied by imports from North America and Europe and the other supplied from south England; Jenkins et al., 2015). SNV analysis may also be applicable to non-O157 STEC because, in one study of a nursery school-associated E. coli O26:H11 outbreak, ≤3 SNV differences were found among outbreak-associated isolates compared to ≥272 SNV differences between outbreak and non-outbreak isolates (Dallman et al., 2015b).
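A cluster threshold such as the five-SNV cut-off mentioned above can be applied to a table of pairwise distances by linking any two isolates that fall within the threshold. The sketch below uses single-linkage grouping, which is an assumption made for illustration rather than a description of the cited workflow; the isolate names and distances are invented.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_snv_threshold(
    isolates: Iterable[str],
    distances: Dict[Tuple[str, str], int],
    threshold: int = 5,
) -> List[List[str]]:
    """Single-linkage clusters: isolates joined by any chain of pairs
    whose SNV distance is at or below the threshold."""
    parent = {name: name for name in isolates}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), distance in distances.items():
        if distance <= threshold:
            parent[find(a)] = find(b)

    clusters: Dict[str, List[str]] = {}
    for name in parent:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise SNV distances.
example_distances = {
    ("iso1", "iso2"): 2,
    ("iso2", "iso3"): 4,
    ("iso1", "iso3"): 7,
    ("iso1", "iso4"): 300,
    ("iso2", "iso4"): 298,
    ("iso3", "iso4"): 301,
}
example_clusters = cluster_by_snv_threshold(
    ["iso1", "iso2", "iso3", "iso4"], example_distances, threshold=5
)
print(example_clusters)
```

In this example iso1 and iso3 differ by seven SNVs yet fall in the same cluster because each is within the threshold of iso2. This chaining behaviour suits outbreaks that accumulate variation over time, but it also means that a threshold should be validated against epidemiological data rather than applied mechanically.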

In Scotland, a 5-year retrospective review of 105 E. coli O157 isolates and 11 epidemiologically linked clusters using SNV methods found that WGS was generally concordant with MLVA (Holmes et al., 2015). In this study, epidemiologically linked cases exhibited SNV differences of ≤4, while unrelated cases had SNV differences between 9 and 1632. In two sets of isolates whose members differed at only one MLVA locus, the isolates were 32 and 126 SNVs different from the other isolate in their set, demonstrating the increased discriminatory power of WGS.

<sup>3</sup>Applied Maths NV, Sint-Martens-Latem, Belgium. BioNumerics Seven: a unique software platform | Applied Maths. Available online at: http://www.applied-maths.com/bionumerics (Accessed January 12, 2016).

<sup>4</sup>Ridom GmbH, Münster, Germany. Ridom SeqSphere+. Available online at: http://www.ridom.de/seqsphere/u/User\_Guide.html (Accessed January 12, 2016).

In 2014, Alberta had one of the largest outbreaks of E. coli O157:H7 since monitoring by PulseNet Canada began in 2000, with a final tally of 119 clinical cases; the outbreak was linked to pork consumption (ProMed-mail post 2759887, 2014). The PHAC-NML SNVPhyl pipeline was used to detect SNVs in isolates from 111 of these clinical cases and in 6 environmental/food isolates, and was compared to the current protocol for identifying outbreaks, which involves PFGE and MLVA profiling in combination with epidemiological investigations (Sabat et al., 2013). Clinical, food and environmental isolates from the pork-associated outbreak were found to have ≤23 SNVs different from each other and a minimum of 84 SNVs different from isolates not associated with the outbreak, which included sporadic isolates, a concurrent smaller outbreak associated with a summer fair and a 2012 beef outbreak. The intra-outbreak SNV differences in the two other outbreaks were 0–5 SNVs, and it should be noted that 109 of the 117 sequenced pork outbreak isolates were also 0–5 SNVs different from each other. The same isolates were also subjected to k-mer analysis, in which the genome is segmented into nucleotide sequences of a pre-determined length (in this case 25-mers) and the frequency of each k-mer in the entire genome of each isolate is compared to the frequency of k-mers in all other isolates to determine a k-mer phylogenetic tree. This analysis was able to cluster each outbreak into separate nodes. Interestingly, the k-mer method was also able to distinguish isolates from within the pork-associated outbreak that lacked some virulence genes present in all other outbreak isolates. The current surveillance methods and SNV analysis could not distinguish the isolates missing virulence genes from other isolates of the same outbreak. The difference between the two WGS methods is likely because SNV analysis compares conserved genomic regions, where virulence factors are rarely found, whereas k-mer analysis covers the entire genome.
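The k-mer comparison described above can be sketched in a few lines: each genome is reduced to a table of k-mer frequencies, and two tables are compared with a dissimilarity measure. The example below is illustrative only. The Bray-Curtis dissimilarity is one possible choice and is an assumption here, not the measure used in the study; the sequences are invented, and reverse-complement handling and sequencing-error filtering are omitted.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 25) -> Counter:
    """Frequency of every overlapping k-mer in a sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_dissimilarity(a: Counter, b: Counter) -> float:
    """Bray-Curtis dissimilarity between two k-mer frequency tables:
    0.0 for identical tables, 1.0 when no k-mer is shared."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum((a & b).values())  # '&' keeps the minimum count per k-mer
    return 1.0 - 2.0 * shared / total


# Toy sequences with a short k so the effect is visible by eye.
genome_with_gene = "ACGTACGTTTGACCAGGT"
genome_without_gene = "ACGTACGTAGGT"  # lacks the internal 'TTGACC' segment
profile_1 = kmer_counts(genome_with_gene, k=4)
profile_2 = kmer_counts(genome_without_gene, k=4)
print(f"dissimilarity: {kmer_dissimilarity(profile_1, profile_2):.3f}")
```

Because every part of the genome contributes k-mers, the loss or gain of an accessory element changes the profile even when the conserved core is identical, which is consistent with the explanation given above for why the k-mer method separated isolates that SNV analysis could not.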

#### WGS ANALYSIS APPROACHES

The ideal bacterial typing method should have the following characteristics: accuracy, inter- and intra-laboratory reproducibility, stability with multiple passaging of isolates, high discriminatory power, concordance with epidemiological data, speed, ease-of-use, cost effectiveness, and amenability to computerized analysis (Van Belkum et al., 2007). WGS fulfills many of these requirements while providing better accuracy and discriminatory power than PFGE, MLVA and/or epidemiological investigations combined. WGS is currently being used by Public Health England for routine pathogen surveillance (Ashton et al., 2015), and PulseNet Canada is currently setting up a similar infrastructure. Before WGS becomes the international standard, especially for networks such as PulseNet, many issues still need to be addressed.

First of all, a comparison of the different software platforms needs to be performed against a set of isolates with known epidemiological and typing data. Although the studies discussed herein demonstrate that different pipelines can cluster outbreak isolates together with similar intra-outbreak SNV differences, the performance of each pipeline should be tested using the same set of isolates. Also, the SNV and k-mer analysis methods are computationally intensive (k-mer more so than SNV) and are not amenable to use by labs without the appropriate computing infrastructure or expertise. If SNV and/or k-mer methods were used as a standard, a new isolate would need to be compared to a curated regional, national and/or international database of isolates to place it into phylogenetic clusters, similar to viral genotyping. Other options such as whole-genome or core genome multi-locus sequence typing (wgMLST or cgMLST), which look at most or all genes in an organism's genome, can create a barcode by assigning numbers to allelic variants and can be run on a desktop computer (Kohl et al., 2014; Leopold et al., 2014; Ruppitsch et al., 2015). Once a wgMLST database for E. coli is developed in PulseNet, it could become a strong contender for the standard WGS typing of STEC.
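The allele "barcode" idea can be illustrated as follows: each distinct sequence observed at a locus receives the next free allele number, an isolate's profile is the tuple of its allele numbers, and two isolates are compared by counting the loci at which their numbers differ. This is a toy sketch with invented locus names and sequences; a real scheme relies on a curated, shared nomenclature database rather than numbers assigned locally.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Tuple[Optional[int], ...]


class AlleleDatabase:
    """Assigns a stable number to each distinct sequence seen at a locus."""

    def __init__(self) -> None:
        self._alleles: Dict[str, Dict[str, int]] = {}

    def allele_number(self, locus: str, sequence: str) -> int:
        known = self._alleles.setdefault(locus, {})
        if sequence not in known:
            known[sequence] = len(known) + 1  # next free number for this locus
        return known[sequence]


def allele_profile(db: AlleleDatabase, genes: Dict[str, str], loci: Sequence[str]) -> Profile:
    """Allele numbers in scheme order; None where the locus was not found."""
    return tuple(
        db.allele_number(locus, genes[locus]) if locus in genes else None
        for locus in loci
    )


def allelic_distance(p: Profile, q: Profile) -> int:
    """Number of loci, called in both isolates, with different alleles."""
    return sum(
        1 for x, y in zip(p, q)
        if x is not None and y is not None and x != y
    )


scheme_loci: List[str] = ["locusA", "locusB", "locusC"]  # hypothetical scheme
db = AlleleDatabase()
profile_iso1 = allele_profile(
    db, {"locusA": "ATGAAA", "locusB": "ATGCCC", "locusC": "ATGGGG"}, scheme_loci
)
profile_iso2 = allele_profile(
    db, {"locusA": "ATGAAA", "locusB": "ATGCCT"}, scheme_loci
)
print(profile_iso1, profile_iso2, allelic_distance(profile_iso1, profile_iso2))
```

Comparing short integer profiles is computationally cheap, which is why this style of typing is feasible on a desktop computer; the cost is shifted to building and maintaining the shared allele database.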

Once standards and issues surrounding the computing power and expertise are resolved, WGS will supplant current typing methods for STEC and most other organisms. It can also replace serotyping of STEC and typing of stx genes (Joensen et al., 2014, 2015) while providing the additional benefits of using the sequence for detecting virulence genes other than stx and providing data for research into genetic elements that influence pathogenicity. Finally, if the methods to perform metagenomics on stool are refined, WGS may also replace PCR or selective agars as the initial screening mechanism for STEC and other enteric pathogens. However, there remain significant technical, economic and organizational hurdles to overcome before the practical use of genome-wide typing and analysis approaches in routine STEC investigations becomes a reality (Köser et al., 2012; Franz et al., 2014).

### EXPERIENCES AT THE ALBERTA PROVLAB

There have been continuous improvements in STEC detection through bacterial culturing, immunochemistry, and molecular and genomic methods. The advancements in each of these methods aim to increase assay sensitivity, specificity, speed, throughput and broad-ranging strain inclusivity. Despite the improvements in commercial assays and technologies, the adoption of detection methods that encompass non-O157 STEC serotypes by frontline laboratories has been slow. For this reason, there is likely to be continued significant underreporting of STEC infections in current surveillance data in many countries. At the Alberta ProvLab, multiple studies of regional STEC rates and serotypes revealed that a diverse range of serotypes exists among non-O157 STEC in the province; 99% of these were identified by culture method and included O157 (n = 99), top six (n = 45), and non-top six (n = 36) (Chui et al., 2011, 2015a,b; Couturier et al., 2011). Most of the non-O157 STEC identified would not have been detected by routine frontline testing, which is restricted to detecting only O157 STEC. Notably, between 8.3 and 52.9% of the non-O157 serotypes identified in these studies were positive for Stx2. Therefore, these findings reveal the underreporting of non-O157 STEC and the potentially pathogenic strains that risk being undetected in the population (Couturier et al., 2011; Chui et al., 2011, 2015a,b; Gould et al., 2013).

## CONCLUSION

There has been much dialog surrounding strategies for more comprehensive STEC identification by frontline laboratories. Primarily these approaches focus on the inclusion of Stx typing in routine testing. More recently, the development and incorporation of WGS methods in STEC surveillance have aimed to improve the epidemiological tracking of infections. A secondary benefit of adopting WGS will be an enhancement in our understanding of STEC biology through the vast collection of genome sequences. Although there is great promise in WGS for STEC characterization and surveillance, STEC detection will likely continue to rely on a combination of culture and non-culture methods. As such, regardless of the technologies that arise for STEC detection and characterization, for at least the immediate future, frontline laboratories will continue to need logical testing algorithms that incorporate a selection of the appropriate methods above.


#### AUTHOR CONTRIBUTIONS

All authors listed have made a substantial, direct and intellectual contribution to the work, and approved it for publication.

#### FUNDING

Salary support for BP was provided through Alberta Innovates - Health Solutions (AIHS) funding of The Alberta Provincial Pediatric EnTeric Infection Team.






**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Parsons, Zelyas, Berenger and Chui. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# From Farm to Table: Follow-Up of Shiga Toxin-Producing *Escherichia coli* Throughout the Pork Production Chain in Argentina

*Rocío Colello, María E. Cáceres, María J. Ruiz, Marcelo Sanz, Analía I. Etcheverría\* and Nora L. Padola*

#### *Edited by:*

*Pina Fratamico, United States Department of Agriculture – Agricultural Research Service, USA*

#### *Reviewed by:*

*James L. Smith, United States Department of Agriculture, USA; Yanhong Liu, Eastern Regional Research Center – United States Department of Agriculture – Agricultural Research Service, USA*

> *\*Correspondence: Analía I. Etcheverría analiain@vet.unicen.edu.ar*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

*Received: 03 November 2015 Accepted: 18 January 2016 Published: 08 February 2016*

#### *Citation:*

*Colello R, Cáceres ME, Ruiz MJ, Sanz M, Etcheverría AI and Padola NL (2016) From Farm to Table: Follow-Up of Shiga Toxin-Producing Escherichia coli Throughout the Pork Production Chain in Argentina. Front. Microbiol. 7:93. doi: 10.3389/fmicb.2016.00093*

*Laboratorio de Inmunoquímica y Biotecnología, Centro de Investigación Veterinaria de Tandil – Consejo Nacional de Investigaciones Científicas y Técnicas – Comisión de Investigaciones Científicas de la Provincia de Buenos Aires, Facultad de Ciencias Veterinarias, Universidad Nacional del Centro de la Provincia de Buenos Aires, Tandil, Argentina*

Pigs are important reservoirs of Shiga toxin-producing *Escherichia coli* (STEC). The entrance of these strains into the food chain implies a risk to consumers because of the severity of hemolytic uremic syndrome. This study reports the prevalence and characterization of STEC throughout the pork production chain. From 764 samples, 31 (4.05%) were *stx* positive by PCR screening. At farms, 2.86% of samples were *stx* positive; at slaughter, 4.08% of carcasses were *stx* positive; and at boning rooms, 6% of samples were *stx* positive. These percentages decreased in pork meat ready for sale at sales markets (4.59%). From positive samples, 50 isolates could be characterized. At farms, 37.5% of the isolates carried *stx1/stx2* genes, 37.5% possessed *stx2e*, and 25% carried only *stx2*. At slaughter, we detected 50% of isolates positive for *stx2*, 33% for *stx2e*, and 16% for *stx1/stx2*. At boning rooms, 59% of the isolates carried *stx1/stx2*, 14% *stx2e*, and 5% *stx1/stx2/stx2e*. At retail markets, 66% of isolates were positive for *stx2*, 17% for *stx2e*, and 17% for *stx1/stx2*. Among the other virulence factors, *ehxA* and *saa* were not detected, and the *eae* gene was detected in 12% of the isolates. Concerning putative adhesins, *agn43* was detected in 72%, *ehaA* in 26%, *aida* in 8%, and *iha* in 6% of isolates. The strains were typed into 14 *E. coli* O groups (O1, O2, O8, O15, O20, O35, O69, O78, O91, O121, O138, O142, O157, O180) and 10 H groups (H9, H10, H16, H21, H26, H29, H30, H32, H45, H46). This study reports the prevalence and characterization of STEC strains throughout the pork chain, suggesting vertical transmission. STEC contamination originates on the farms, is transferred from pigs to carcasses in the slaughter process, and increases in pork meat at boning rooms and sales markets. These results highlight the need to implement an integrated STEC control system based on good management practices on the farm and critical control point systems in the food chain.

Keywords: STEC, foodborne pathogens, pork production chain, prevalence, characterization

## INTRODUCTION

Shiga toxin-producing *Escherichia coli* (STEC) are important foodborne pathogens that can cause severe disease, including life-threatening complications such as bloody diarrhea and hemolytic uremic syndrome (HUS; Paton and Paton, 1998). HUS is one of the most common etiologies of acute kidney injury and an important cause of acquired chronic kidney disease in children (Grisaru, 2014). This damage is produced by the action of the cytotoxins Stx1 and Stx2, with Stx2 and its subtypes more frequently associated with HUS (Beutin et al., 2007). The ability to adhere to epithelial cells is an important virulence trait, because adherence presumably enables the bacteria to deliver toxins efficiently to host organs (Tarr et al., 2000). Intimin, encoded by the *eae* gene, is required for intimate bacterial adhesion to epithelial cells, inducing a characteristic histopathological lesion defined as "attaching and effacing" (A/E), and has been considered a risk factor for disease in humans (Ethelberg et al., 2004). However, the presence of *eae* would not be essential for pathogenesis, considering that some *eae*-negative STEC have been associated with severe disease in humans (Paton et al., 2001). Some studies have reported adherence factors other than intimin, such as Saa (Paton et al., 2001), AIDA and Agn43 (Restieri et al., 2007), EhaA (Wu et al., 2010), and Iha (Tarr et al., 2000; Szalo et al., 2002). AIDA was identified in a diffusely adhering diarrheagenic *E. coli* strain and is associated with edema disease and diarrhea in pigs (Niewerth et al., 2001), contributing to bacterial intercellular aggregation and biofilm formation (Restieri et al., 2007); *iha* encodes an outer membrane protein identified as a bacterial adherence-conferring, iron-regulated gene product (Tarr et al., 2000); and Agn43 and EhaA are autotransporter proteins of O157:H7 involved in adhesion and biofilm formation (Wells et al., 2008). Other factors are also involved in human pathogenicity, such as a plasmid-encoded enterohemolysin (EhxA), among others (Feng and Reddy, 2013).

Argentina, where HUS is endemic, has the highest recorded incidence of this syndrome worldwide, with 17 cases per 100,000 children under 5 years of age (Rivas et al., 2010). Although STEC O157:H7 is recognized as the most important serotype associated with human infection, more than 400 non-O157 serotypes have been involved in human disease and isolated from different reservoirs, including cattle, pigs, goats, sheep, cats, and dogs (Parma et al., 2000; Padola et al., 2004; Amézquita-López et al., 2014). STEC usually do not produce disease in animals; however, the Stx2e subtype is involved in edema disease in pigs, a peracute toxemia characterized by vascular necrosis, edema, and neurological signs that can be fatal in some cases (Niewerth et al., 2001). STEC strains, including strains harboring the *stx2e* subtype, have been isolated from pork products and have been associated with human infections such as diarrhea and HUS (Sonntag et al., 2005; Kaufmann et al., 2006; Trotz-Williams et al., 2012); however, it is unknown whether the contamination of pork-derived food occurs during processing or by cross contamination (Tseng et al., 2014).

There is increasing worldwide demand for fast-growing species with efficient feed conversion rates, such as pigs, because they represent a major share of the growth in the livestock subsector (Food and Agriculture Organization [FAO], 2014). Because of the limited epidemiologic data on STEC in pork and the increasing role of non-O157 STEC in human illnesses, it is very important to study the role of pigs as reservoirs of STEC and the transmission of STEC along the swine production chain (Ercoli et al., 2015). Taking into account the data mentioned above, the aim of this study was to determine the prevalence of STEC and to characterize STEC isolates throughout the pork production chain in Argentina.

## MATERIALS AND METHODS

#### Management of Farms and Animals

The study was conducted in two pig production farm systems. Both farms operate intensive, total-confinement systems. The production stages are gestation, farrowing, weaning, and growing/finishing (fattening), which are geographically separated from each other within the same farm. The usual group size varies between 10 and 30 pigs. Pigs and employees move between buildings through corridors that are isolated from external traffic.

#### Management of Carcasses Until Retail Markets

Pigs at the finishing stage are transported to the slaughterhouses. After slaughter, the pork carcasses are chilled for 24 to 48 h and sent to boning rooms in refrigerated trucks.

At the boning rooms, the carcasses are deboned and processed into products such as meat and minced meat. Finally, the products are transferred to the retail markets.

#### Collection of Samples

Seven hundred and sixty-four samples were collected from May 2012 to November 2014 from two pig production systems.

This study was carried out in accordance with the recommendations of the Animal Welfare Committee from the Veterinary Science Faculty, UNCPBA, Resolution 087/02.

#### Samples at Farms

Three hundred and forty-eight samples were taken at farms. Of these, 277 were rectal swabs and 71 were environmental samples obtained by swabbing drinking water, feed, and feces on the floor.

#### Samples at Slaughterhouses

One hundred and forty-seven samples were taken by swabbing. Of these, 22 were rectal swabs taken after slaughter, 85 were from carcasses, and 40 were from the slaughterhouse environment (pre-washing, scalding, dehairing, dressing, cooling, and knives).

Carcass swabs were taken in accordance with Circular No. 3496/02 of the Servicio Nacional de Sanidad y Calidad Agroalimentaria (SENASA, 2002). Five carcass areas (quarters) of 100 cm<sup>2</sup> each were sampled and processed separately: head (H), external rectum (ER), internal rectum (IR), external thoracic (ET), and internal thoracic (IT; **Figure 1**).

#### Samples at Boning Rooms

One hundred and eighty-one samples were taken. Of these, 94 came from carcasses, 24 from meat, 23 from minced meat, and 40 from the environment (refrigerated trucks and meat contact surfaces such as meat tables, knives, the meat mincing machine, and the vertical band saw machine).

#### Sampling at Retail Markets

Eighty-seven samples were taken from retail markets (43 from meat, 13 from minced meat, and 31 from the environment). The environmental samples were obtained from meat tables, knives, the vertical band saw machine, and refrigerators.

### Sample Preparation and Isolation of STEC

Swabs were processed according to Etcheverría et al. (2010). Briefly, the swabs were cultured in Luria Bertani broth (LB) with shaking at 37°C for 18 h, and then an aliquot was grown on MacConkey agar plates by incubating at 37°C for 24 h.

Ten to fifty individual colonies were processed for amplification of Shiga toxin genes (*stx1*, *stx2*, and *stx2e*; **Table 1**). Each colony positive for any *stx* gene was tested for the presence of *eae*, *ehxA*, and *saa* by multiplex polymerase chain reaction (PCR; Paton and Paton, 2002). Genes encoding adhesins (*ehaA*, *agn43*, *iha*, *aida*) were amplified using monoplex PCR. STEC strains used as positive controls were *E. coli* O157:H7 EDL933 (*stx1*, *stx2*, *eae*, *ehxA*, *ehaA*, *agn43*, *iha*), *E. coli* O8 (*stx2e*), *E. coli* O91:H21 (*stx1*, *stx2*, *ehxA*, *saa*), and *E. coli* O157:H19 (*aida*).

Amplification products were separated by electrophoresis on 2% agarose gel containing 0.8 µg/ml of ethidium bromide in running buffer and were visualized by UV transillumination.

### Determination of Serotype

O and H types were determined by the microagglutination technique in plates and tubes as described by Guinée et al. (1981) and modified by Blanco et al. (1996), using all available O (O1–O175) antisera plus six putative new O antigens (OX176 through OX181) and H (H1–H56) antisera (Pradel et al., 2000).

TABLE 1 | Genes, primers sequences, and size of amplified product of Shiga toxin-producing *Escherichia coli* (STEC).


## RESULTS

The results indicate that STEC occurrence is widespread throughout the pork production chain. Among the 764 samples, 31 (4.05%) were positive for *stx*. In rectal swabs from the different pig categories, 2.86% were STEC positive, distributed as 5.88% at fattening, 4.3% at growing, 2.38% at gestation, and 1.51% at farrowing. STEC were not detected in feed, water, or fecal samples taken from farms. At slaughter, 4.08% of the carcasses sampled were *stx* positive. The distribution among the different quarters of the carcasses was 50% from ER, 16.6% from ET, 16.6% from IT, and 16.6% from the head. At boning rooms, 6% of samples were STEC positive, of which 82% came from carcasses and 18% from pork meat. The distribution among the different quarters of the carcasses was 33.3% from ER, 22.2% from IR, 22.2% from IT, 11.2% from ET, and 11% from the head. At retail markets, 4.59% of samples were STEC positive; these were detected in pork meat ready for sale.
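The overall prevalence above is a simple proportion of *stx*-positive samples. As a minimal sketch that is not part of the original analysis, the following Python snippet recomputes that proportion from the reported counts and attaches a 95% Wilson score interval. The choice of the Wilson method and the function name `wilson_interval` are assumptions made here purely for illustration.

```python
import math

def wilson_interval(positives, total, z=1.96):
    """Return (proportion, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    p = positives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return p, centre - half, centre + half

# Counts reported in the text: 31 stx-positive samples out of 764.
p, low, high = wilson_interval(31, 764)
print(f"prevalence = {100 * p:.2f}% (95% CI {100 * low:.2f}-{100 * high:.2f}%)")
```

The same function could be applied to the stage-specific counts to see how wide the intervals become when the number of positive samples per stage is small.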

#### Characterization of STEC

From the positive samples, 50 isolates could be characterized by PCR. In samples from farms, 6/16 (37.5%) of the isolates carried *stx1*/*stx2*, 6/16 (37.5%) possessed *stx2e*, and 4/16 (25%) carried *stx2*. At slaughter, 3/6 (50%) of isolates were positive for *stx2*, 2/6 (33%) for *stx2e*, and 1/6 (17%) for *stx1*/*stx2*. At boning rooms, 13/22 (59%) of the isolates carried *stx1*/*stx2*, 3/22 (14%) *stx2e*, and 1/22 (5%) *stx1*/*stx2*/*stx2e*. At retail markets, 4/6 (67%) of isolates were positive for *stx2*, 1/6 (17%) for *stx2e*, and 1/6 (17%) for *stx1*/*stx2*. Other virulence factors such as *ehxA* and *saa* were not detected, and *eae* was detected in 6/50 (12%) of isolates. Concerning putative adhesins, *agn43* was detected in 36/50 (72%), *ehaA* in 13/50 (26%), *aida* in 4/50 (8%), and *iha* in 3/50 (6%) of isolates. The most frequent virulence profiles were *stx1*, *stx2*, or *stx2e* combined with *agn43*, found in 36 (72%) strains. The 50 isolates were typed into 14 *E. coli* O groups (O1, O2, O8, O15, O20, O35, O69, O78, O91, O121, O138, O142, O157, O180), and 15 were considered O non-typable (NT). Ten H antigens (H9, H10, H16, H21, H26, H29, H30, H32, H45, H46) were distributed among the 50 strains, while one isolate was non-motile (H–). **Table 2** indicates the relationships between virulence profiles, sampling sites, and serotypes in the isolated STEC strains.

## DISCUSSION

To our knowledge, this is the first study to report the prevalence and characterization of STEC strains throughout the pork production chain, suggesting vertical transmission of these pathogens. However, other studies have separately demonstrated the prevalence of STEC on farms, in finishing pigs, at slaughter, and in pork meat at retail markets. The prevalence of STEC in pigs, carcasses, and pork meat at different stages of production in other countries is variable, and caution is necessary when comparing prevalence, since the variation may be due to several factors, such as the sampling method, sample processing, and the season in which the study was performed. The prevalence of STEC in pigs reported previously ranged from 2 to 31%, in agreement with our results (Parma et al., 2000; Kaufmann et al., 2006; Meng et al., 2014). In carcasses at slaughter and boning rooms, the prevalence found agrees with other studies, in which prevalence ranged from 0.2 to 26% (Leung et al., 2001; Bouvet et al., 2002; Bohaychuk et al., 2011; Koláčková et al., 2014).

At slaughter, some operations such as skinning, evisceration, and handling are more likely than others to contaminate carcasses and meat (Koohmaraie et al., 2005; Etcheverría et al., 2010). For this reason, some areas of the carcass are more prone than others to exposure to contamination or cross contamination; hence the suggestion to sample three or four sites on the carcass, because contamination appears to vary considerably among sites (Roberts et al., 1984). In addition, the ER is the area that involves a particular risk of contamination during the early stages of dressing, as reflected in our result that the ER was the most contaminated area, in agreement with Bouvet et al. (2002). However, other sampled areas, such as IT and IR, showed more contamination, probably due to handling at the boning room.

TABLE 2 | Relationships between virulence profiles, sites of samples, and serotypes in STEC strains.

In our study, the prevalence at retail markets was lower than that reported by Magwedere et al. (2013) in the USA (50%), Martin and Beutin (2011) in Germany (14%), and Lee et al. (2009) in Korea (15%). This could be because those studies were performed in retail markets where meat from different origins was sold and cross contamination during handling can occur. In the present study, the samples were obtained from retail markets where only pork meat was sold.

Among the 50 STEC isolates, those carrying *stx1*/*stx2* or *stx2* were more frequent than those carrying *stx1*. Epidemiologically, Stx2-producing strains are more often associated with HUS than strains that produce Stx1 (Paton and Paton, 2002).

Regarding *stx2e*, its prevalence decreased from pigs at farms to pork meat. Although some authors have reported the presence of *stx2e* in STEC strains from human patients on a few occasions, STEC harboring *stx2e* are more likely to cause edema disease in pigs, causing economic losses in pig production (Kaufmann et al., 2006). The role of these strains in human infection remains to be determined.

The presence of *eae* in isolates from boning rooms and retail markets that also harbor *stx2*, *stx2e*, and *agn43* implies a high risk for human health (Tseng et al., 2014). The most prevalent adhesin identified among all isolates and involved in adhesion and biofilm formation was Agn43, followed by EhaA, in agreement with Biscola et al. (2011) and Tseng et al. (2014), who detected them in swine and in different sources, respectively. In this study, 8% of strains harbored *aida*, similar to the proportion found in South Africa (Mohlatlole et al., 2013) and China (Zhao et al., 2009). The *iha* gene was present in few isolates, but it has been detected in over 70% of the *eae*-negative STEC strains associated with human clinical cases examined in studies in Germany (Hauser et al., 2013) and Argentina (Galli et al., 2010). The high prevalence of LEE-negative STEC isolated from pigs in our study emphasizes the need for further work to better define the role that attachment proteins outside the LEE may play in adherence to both pork and human epithelial cells.

Although many serotypes isolated in this study have been detected with low incidence in human disease and are rarely associated with outbreaks (Friedrich et al., 2002), they have been isolated from pigs, sheep, cattle, and food in other countries (Beutin et al., 1993; Kaufmann et al., 2006; Beutin et al., 2007). At slaughter and boning rooms, serogroups associated with human illnesses, such as *E. coli* O91, O121, and O157, were detected, in agreement with other studies that recovered these serogroups from pig fecal samples (Desrosiers et al., 2001; Friedrich et al., 2002; Karmali et al., 2003; Bielaszewska et al., 2009; Trotz-Williams et al., 2012; Yoon et al., 2013; Tseng et al., 2014).

## CONCLUSION

The present study tracked STEC from farm to table and indicates that STEC strains are present throughout pork meat production. STEC contamination originating on farms is transferred from pigs to carcasses during the slaughter process and increases in pork meat at boning rooms and retail markets. Moreover, the entry of these strains into the food chain implies a risk to consumers because of the severity of the illness they can cause. If STEC is present in any food product, it has the potential to cause foodborne illness. In addition to the public health problem, the presence of strains carrying the *stx2e* gene is a problem for pig production because they can cause edema disease, leading to important economic losses. In spite of the wealth of data available on this important disease, it is necessary to effectively prevent this contamination by educating employees, retailers, and consumers on the appropriate handling and storage of meat. Further studies are needed to provide more systematic data in order to fuel the development of novel approaches for the control of STEC in foods, including pork meat.

## AUTHOR CONTRIBUTIONS

RC conceived, designed, and analyzed the experiments and wrote the manuscript. MR, MS, and MC performed some of the experiments. NP and AE designed some of the experiments, analyzed the data, and revised the manuscript.

## ACKNOWLEDGMENTS

The authors thank Guillermo Arroyo for collaboration during sample collection and María Rosa Ortiz for her technical assistance. This work was supported by PICT 2010-1655, CIC, and SECAT from Argentina.

## REFERENCES

*Escherichia coli* from food by a combination of serotyping and molecular typing of Shiga toxin genes. *Appl. Environ. Microbiol.* 73, 4769–4775. doi: 10.1128/AEM.00873-07


*Escherichia coli* of genes encoding for putative adhesins of human EHEC strains. *Res. Microbiol.* 153, 653–658. doi: 10.1016/S0923-2508(02)01379-7


Shiga toxin-producing *Escherichia coli* strains isolated from healthy cattle and diarrheal patients in Japan. *J. Vet. Med. Sci.* 72, 589–597. doi: 10.1292/jvms.09-0557


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Colello, Cáceres, Ruiz, Sanz, Etcheverría and Padola. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

fmicb-07-00413 April 12, 2016 Time: 15:33 # 1

# Inactivation of Uropathogenic Escherichia coli in Ground Chicken Meat Using High Pressure Processing and Gamma Radiation, and in Purge and Chicken Meat Surfaces by Ultraviolet Light

#### Christopher H. Sommers\*, O. J. Scullen and Shiowshuh Sheen

Eastern Regional Research Center, United States Department of Agriculture, Agricultural Research Service, Wyndmoor, PA, USA

#### Edited by:

Chitrita Debroy, The Pennsylvania State University, USA

#### Reviewed by:

Diego Garcia-Gonzalo, Universidad de Zaragoza, Spain Olin Silander, Massey University, New Zealand

#### \*Correspondence:

Christopher H. Sommers christopher.sommers@ars.usda.gov

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 25 November 2015 Accepted: 14 March 2016 Published: 14 April 2016

#### Citation:

Sommers CH, Scullen OJ and Sheen S (2016) Inactivation of Uropathogenic Escherichia coli in Ground Chicken Meat Using High Pressure Processing and Gamma Radiation, and in Purge and Chicken Meat Surfaces by Ultraviolet Light. Front. Microbiol. 7:413. doi: 10.3389/fmicb.2016.00413

Extraintestinal pathogenic Escherichia coli, including uropathogenic E. coli (UPEC), are common contaminants in poultry meat and may cause urinary tract infections after colonization of the gastrointestinal tract and transfer of contaminated feces to the urethra. Three non-thermal processing technologies used to improve the safety and shelf-life of both human and pet foods are high pressure processing (HPP), ionizing (gamma) radiation (GR), and ultraviolet light (UV-C). Multi-isolate cocktails of UPEC were inoculated into ground chicken, which was then treated with HPP (4°C, 0–25 min) at 300, 400, or 500 MPa. The HPP D<sub>10</sub>, the processing conditions needed to inactivate 1 log of UPEC, was 30.6, 8.37, and 4.43 min at 300, 400, and 500 MPa, respectively. When the UPEC was inoculated into ground chicken and gamma irradiated (4 and −20°C), the GR D<sub>10</sub> values were 0.28 and 0.36 kGy, respectively. The UV-C D<sub>10</sub> of UPEC suspended in chicken exudate and placed on stainless steel and plastic food contact surfaces ranged from 11.4 to 12.9 mJ/cm<sup>2</sup>. UV-C inactivated ca. 0.6 log of UPEC on chicken breast meat. These results indicate that existing non-thermal processing technologies such as HPP, GR, and UV-C can significantly reduce UPEC levels in poultry meat or exudate and provide safer poultry products for at-risk consumers.

Keywords: UPEC, high pressure processing, gamma radiation, ultraviolet light, chicken

## INTRODUCTION

Escherichia coli are classified as commensal (natural microflora) or as disease-causing variants such as intestinal pathogenic E. coli (iPEC) or extraintestinal pathogenic E. coli (ExPEC). Groups of ExPEC include Neonatal Meningitis E. coli (NMEC), Avian Pathogenic E. coli (APEC), Sepsis-associated Pathogenic E. coli (SEPEC), and Uropathogenic E. coli (UPEC) (Mitchell et al., 2015). E. coli such as ExPEC (UPEC) are responsible for 75–95% of urinary tract infections (UTI) and uncomplicated cystitis and pyelonephritis (Nordstom et al., 2013). Fifty percent of women will contract one UTI in their lifetime, and 25% will have a recurrent UTI (Minardi et al., 2011; Bao et al., 2014). The number of UTI in the US is ca. 6–8 million annually, with ca. 100,000 hospitalizations, ca. 23,000 deaths, and a health care burden of ca. \$3.5 billion (Nordstom et al., 2013). The mechanism for contraction of a UTI is transfer of contaminated feces from the gastrointestinal tract to the urethra, and isolates associated with UTI invariably match the individual's fecal microflora (Moreno et al., 2008).

The idea that extraintestinal foodborne pathogens such as the ExPEC might be responsible for UTI in humans is relatively new, although it has long been suspected that they may be associated with illness outbreaks (Markland et al., 2015). The presence of ExPEC in poultry meat has been firmly established (Johnson et al., 2005; Mitchell et al., 2015). Studies have compared ExPEC isolates from food animals and food with those from women with UTI, examined the incidence of ExPEC in poultry meat, and demonstrated both genetic similarity and identity between ExPEC from animals and food and those from humans with UTI (Cortes et al., 2010; Jakobsen et al., 2010a,b, 2012; Vincent et al., 2010; Bergeron et al., 2012; Mora et al., 2013). More importantly, ExPEC isolated from animals and food can cause UTI in mouse model systems (Jakobsen et al., 2012).

Three non-thermal intervention technologies of interest to the meat and poultry processing industry, which are used commercially to improve food safety and extend shelf life, are high pressure processing (HPP), ionizing (gamma) radiation (GR), and ultraviolet light (UV-C) (Salvage, 2014). HPP subjects food to an elevated pressure of 100–1000 MPa, typically at temperatures below 60°C. The mechanisms by which HPP inactivates foodborne pathogens include cell membrane and structure damage, ribosome dissociation, dissociation of DNA, and enzyme inactivation (Campus, 2010; Simonin et al., 2012). GR inactivates microorganisms by damaging their DNA, either indirectly through radiolysis of water and induction of oxidative damage or directly through breakage of the phosphodiester backbone, in addition to causing oxidative damage to proteins and cell membranes (Taub et al., 1979; Diehl, 1995). UV-C kills microorganisms through induction of cyclobutane pyrimidine dimers and 6-4 photoproducts, in addition to protein damage (Krisko and Radman, 2010; Rastogi et al., 2010).

The purpose of this study was to determine the HPP and GR inactivation kinetics for ExPEC (UPEC) inoculated in ground chicken, as well as the UV-C inactivation kinetics on poultry meat surfaces and in chicken purge on food contact surfaces. To the authors' knowledge, this is the first study to examine the inactivation kinetics of ExPEC in a food system.

## MATERIALS AND METHODS

#### Chicken

Ground chicken (92% lean) was freshly prepared and purchased at a local wholesaler (Lansdale, PA, USA), evenly portioned into 90 g aliquots in polynylon pouches (Uline, Inc., Philadelphia, PA, USA), vacuum sealed to 50 millibars using a Multi-Vac A300 packager (Multi-Vac Inc., Kansas City, MO, USA), and then frozen (−70°C). Multiple chicken lots were tested for the presence of E. coli as described below, and one with low E. coli levels (<1 CFU/g) was selected. Boneless skinless chicken breast and chicken skin were obtained fresh from a local butcher. Chicken purge was obtained from a local poultry processor and frozen (−70°C) until ready for use.

### E. coli Isolates

The E. coli isolates were obtained from the American Type Culture Collection (Manassas, VA, USA). These include 700414, 700415, 700416, 700417, 700336, and BAA-1161 (http://www.atcc.org), which were isolated from women with UTI. Multi-isolate cocktails of the pathogens were used as recommended for appropriate validation of non-thermal processing technologies (National Advisory Committee on Microbiological Criteria for Food [Nacmcf], 2006). The individual isolates were prescreened for resistance to HPP, GR, and UV prior to use, and the D<sub>10</sub> values were consistent with results from our previous studies with iPEC (Sheen et al., 2015; Sommers et al., 2015; Sommers et al., unpublished data).

#### E. coli Growth and Inoculation

The E. coli were cultured independently in 20 ml Tryptic Soy Broth (TSB) without dextrose to avoid development of acid resistance (BD-Difco, Sparks, MD, USA) using 50 ml sterile tubes at 37°C (150 rpm) for 18–24 h using a New Brunswick Model G34 Environmental Shaker (New Brunswick, Edison, NJ, USA). The bacteria were then sedimented by centrifugation (1,200 × g, Hermle Model Z206A, Hermle Labortechnik, Germany) and resuspended as a cocktail in 20 ml sterile 0.1% peptone water (SPW, BD-Difco).

Thawed ground chicken (10 g) was aliquoted into 2 oz. Nasco (Ft. Atkinson, WI, USA) Whirl-Pak bags, inoculated with 0.1 ml of UPEC, mixed manually for 1 min, and then sealed using the Multi-Vac A300 Packager. The final concentration of UPEC in the ground chicken was ca. 8–9 log CFU/g. The sample bags were then sealed in a second bag and stored at 4°C until HPP treatment or gamma radiation (ca. 2 h).

#### High Pressure Processing Treatment

High pressure processing was performed using a laboratory scale pressure unit (Mini Food lab FPG5620, Stansted Fluid Power Ltd., Essex, UK), comprising a double-jacketed thick-wall stainless steel cylinder (approximate volume of 0.3 L) with an internal stainless steel sample holder of 25.4 mm × 254 mm (diameter × length). The thick-wall cylinder was maintained at a set-point temperature by heat transfer fluid continuously circulated from a refrigerated liquid chiller (Proline RP 855, Lauda, Germany). The pressure come-up rate was 100 MPa per 15 s (or 6.7 MPa/s) and the release rate was 100 MPa per 9 s (or 11.1 MPa/s). Samples were pressure-treated at 500, 400, and 300 MPa (4°C) at 5 min intervals for up to 25 min. The initial temperature in the processing chamber was ca. 4°C and did not exceed a maximum of 35°C during the HPP treatment. Keeping the chamber temperature low (ca. 4°C) prevents thermal effects induced by compression heating from interfering with the determination of HPP inactivation kinetics (Sheen et al., 2015). The chamber temperature was monitored by the built-in sensor (a T-type thermocouple). The thermal sensor was immersed in the working chamber, which was filled with the recirculation fluid, near the food samples.
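To put the ramp rates in context, the short Python sketch below estimates how long the unit takes to reach and release each treatment pressure. It assumes the ramps are linear at the stated rates; whether the reported hold times include or exclude these ramps is not stated in the text, and the function name `ramp_times` is hypothetical.

```python
COME_UP_RATE = 100 / 15  # MPa/s (100 MPa per 15 s, from the text)
RELEASE_RATE = 100 / 9   # MPa/s (100 MPa per 9 s, from the text)

def ramp_times(target_mpa):
    """Return (come-up, release) times in seconds, assuming linear ramps."""
    return target_mpa / COME_UP_RATE, target_mpa / RELEASE_RATE

for pressure in (300, 400, 500):
    up, down = ramp_times(pressure)
    print(f"{pressure} MPa: come-up {up:.0f} s, release {down:.0f} s")
```

Under this assumption the ramps last on the order of one minute, which is short relative to the 5 min sampling intervals.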

#### Gamma Radiation

A Lockheed Georgia Company (Marietta, GA, USA) self-contained <sup>137</sup>Cs irradiator, with a dose rate of 0.065 kGy/min, was used for all exposures. The radiation source consisted of 23 individually sealed source pencils in an annular array. The 22.9 cm × 63.5 cm cylindrical sample chamber was located central to the array when placed in the operating position. Inoculated samples were placed vertically and centrally in the sample chamber, using a 4 mm thick polypropylene bucket, to ensure good dose uniformity (DUR < 1.1:1.0). The temperature during irradiation was monitored by thermocouple and maintained (4 or −20°C) by introduction of the gas phase from a liquid nitrogen source directly into the top of the sample chamber. The radiation doses were delivered in 0.3 and 0.6 kGy increments at 4 and −20°C, respectively. The absorbed dose was verified using temperature tempered 5 mm alanine pellets that were then measured using a Bruker eScan EPR Analyzer (Bruker, Billerica, MA, USA).
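Because the dose rate is fixed, the exposure time for each target dose follows directly from dose divided by dose rate. The sketch below illustrates this for the two dose increments given in the text; the number of dose steps (`N_STEPS`) is an assumption for illustration only, since the text does not state how many increments were used.

```python
DOSE_RATE_KGY_PER_MIN = 0.065  # dose rate reported for the 137Cs irradiator
N_STEPS = 5                    # assumed number of dose levels, including 0 kGy

def exposure_minutes(dose_kgy, dose_rate=DOSE_RATE_KGY_PER_MIN):
    """Time in minutes needed to deliver dose_kgy at a constant dose rate."""
    return dose_kgy / dose_rate

for label, increment in (("4 C", 0.3), ("-20 C", 0.6)):
    doses = [round(step * increment, 1) for step in range(N_STEPS)]
    times = [exposure_minutes(d) for d in doses]
    print(label, [(d, round(t, 1)) for d, t in zip(doses, times)])
```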

### Exposure to Ultraviolet Light

A custom built UV-C apparatus (2 mW/cm<sup>2</sup>) (Sommers et al., 2010) was used to treat chicken purge inoculated with UPEC on stainless steel (304 L), High Density Polypropylene (HDPP), and High Density Polyethylene (HDPE) coupons (5 × 10 cm), and the foods themselves. Chicken purge was thawed in a refrigerator overnight, and 0.5 ml of UPEC cocktail was inoculated into 4.5 ml of chicken exudate and mixed by vortexing for 30 s. One hundred microliters of inoculated purge was placed on each coupon and spread over a 4 cm × 4 cm area using an inoculating loop. The coupons were placed in a refrigerator for 30 min and then placed on a cold pack (4°C) for UV-C exposure. The UV-C exposure times were 0, 10, 20, 30, 40, 50, and 60 s, for UV-C doses of 0, 20, 40, 60, 80, 100, and 120 mJ/cm<sup>2</sup>.

For chicken meat and skin, 4 × 4 cm sections (ca. 1 mm thick) of boneless skinless chicken breast were placed in sterile petri dishes and inoculated with 0.1 ml of chicken purge, which was then spread onto the surface (4 cm × 4 cm) using an inoculating loop, and then incubated for 30 min in a refrigerator (4°C) prior to treatment with UV-C. The samples were placed on cold packs prior to UV-C treatment. The UV-C exposure times were 0, 10, 20, 30, 40, 50, and 60 s, for UV-C doses of 0, 20, 40, 60, 80, 100, and 120 mJ/cm<sup>2</sup>.
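The UV-C doses follow from the lamp intensity multiplied by the exposure time, since 1 mW/cm<sup>2</sup> × 1 s = 1 mJ/cm<sup>2</sup>. A minimal sketch of that conversion, using the intensity and exposure times given in the text (the function name `uv_dose` is hypothetical), shows that the unexposed control at 0 s corresponds to a dose of 0 mJ/cm<sup>2</sup>:

```python
UV_INTENSITY_MW_CM2 = 2                       # lamp intensity from the text
EXPOSURE_TIMES_S = [0, 10, 20, 30, 40, 50, 60]  # exposure times from the text

def uv_dose(intensity_mw_cm2, time_s):
    """Dose in mJ/cm^2 = intensity in mW/cm^2 x exposure time in s."""
    return intensity_mw_cm2 * time_s

doses = [uv_dose(UV_INTENSITY_MW_CM2, t) for t in EXPOSURE_TIMES_S]
print(dict(zip(EXPOSURE_TIMES_S, doses)))
```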

UV-C intensity was monitored using a calibrated UVX Radiometer (UVP Inc., Upland, CA, USA). The temperature of the room was approximately 20°C during the exposure to UV-C, and the food temperature did not increase to more than 30°C at the end of the process as measured using an infrared thermometer.

### Recovery of the Surviving E. coli

The individual ground chicken samples were added to 90 ml of SPW and then stomached for 2 min (Model Bag Mixer 100W, Interscience Co., France). The coupons with 0.1 ml exudate were placed in stomacher bags with 9.9 ml SPW and hand massaged for 1 min. For recovery of UPEC, 1.0 ml of the appropriate decimal dilution was placed on duplicate E. coli/coliform Petrifilm™ (3M Microbiology Products Co., St. Paul, MN, USA). The films were maintained at room temperature for 4 h to allow the injured cells to recover (Hsu et al., 2014) and then incubated at 37°C for 24 h. Colonies (CFU) were enumerated for determination of log reduction and D<sub>10</sub>. Incubation for longer periods did not result in changes to the colony counts, an indicator of injured cell recovery.
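The enumeration step can be illustrated with a short calculation. Placing 10 g of ground chicken in 90 ml of diluent gives an initial 10-fold dilution, and each further decimal dilution multiplies the total dilution factor by 10. The sketch below converts duplicate film counts into CFU/g and a log reduction; the colony counts and dilution factors are made-up example values, not data from the study, and the function name `cfu_per_gram` is hypothetical.

```python
import math

def cfu_per_gram(plate_counts, dilution_factor, plated_volume_ml=1.0):
    """Estimate CFU per gram (or per ml) of the original sample.

    plate_counts     -- colony counts from replicate films at one dilution
    dilution_factor  -- total dilution relative to the sample (e.g. 10**6)
    plated_volume_ml -- volume placed on each film
    """
    mean_count = sum(plate_counts) / len(plate_counts)
    return mean_count * dilution_factor / plated_volume_ml

# Hypothetical duplicate counts for an untreated control and a treated sample.
control = cfu_per_gram([52, 48], 10**6)
treated = cfu_per_gram([61, 59], 10**3)
log_reduction = math.log10(control) - math.log10(treated)
print(f"control {control:.1e} CFU/g, treated {treated:.1e} CFU/g, "
      f"log reduction {log_reduction:.2f}")
```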

### Statistical Analysis

The mean plate counts of the treated samples (N) were divided by the average control plate counts (N<sub>0</sub>) to give a survivor ratio (N/N<sub>0</sub>). The log<sub>10</sub>(N/N<sub>0</sub>) of the ratios was then used for determination of D<sub>10</sub> values and other statistical analyses. D<sub>10</sub> values were determined as the negative reciprocal of the slope following linear regression by least squares analysis (Diehl, 1995). Each experiment (D<sub>10</sub> determination) was conducted independently three times. A minimum of five time points were used for determination of D<sub>10</sub>. Statistical analysis functions of MS Excel (Microsoft Corp., Redmond, WA, USA) were used for routine calculations (D<sub>10</sub> determination), descriptive statistics, and analysis of variance (ANOVA, 95% confidence).
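The D<sub>10</sub> calculation described above can be sketched in a few lines of Python. The study used MS Excel; the code below is only an equivalent illustration of ordinary least squares on log<sub>10</sub>(N/N<sub>0</sub>) versus dose (or time), and the data points are hypothetical values chosen to resemble a first-order survival curve, not measurements from the study.

```python
def d10_from_survival(doses, log_ratios):
    """Return D10 as the negative reciprocal of the least-squares slope.

    doses      -- treatment levels (e.g. kGy, min, or mJ/cm^2)
    log_ratios -- log10(N/N0) measured at each treatment level
    """
    n = len(doses)
    mean_x = sum(doses) / n
    mean_y = sum(log_ratios) / n
    sxx = sum((x - mean_x) ** 2 for x in doses)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(doses, log_ratios))
    slope = sxy / sxx            # log10 units per unit of treatment (negative)
    return -1.0 / slope

# Hypothetical survival data: five dose levels (kGy) and log10(N/N0).
example_doses = [0.0, 0.3, 0.6, 0.9, 1.2]
example_logs = [0.0, -1.1, -2.0, -3.2, -4.1]
d10 = d10_from_survival(example_doses, example_logs)
print(f"D10 = {d10:.2f} kGy")
```

Because the survival curve slopes downward, the slope is negative and the sign is flipped so that D<sub>10</sub> is reported as a positive quantity.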

## RESULTS AND DISCUSSION

#### High Pressure Processing

The HPP inactivation kinetics for the UPEC multi-isolate cocktail are shown in **Table 1** and **Figure 1**. As we have found previously for STEC, the inactivation kinetics were first order in nature. The HPP D<sub>10</sub> of the UPEC in refrigerated (4°C) ground chicken was ca. 30.6, 8.37, and 4.43 min at 300, 400, and 500 MPa, respectively. HPP treatment at 300 MPa was ineffective, as only ca. 1 log was inactivated at that pressure. When we compare the results of this study with those from previous HPP studies for inactivation of STEC, the results are similar. Sheen et al. (2015) found the mean HPP D<sub>10</sub> (350 MPa) of 39 STEC isolates from illness outbreaks to be ca. 9.25 min, while that of isolates from animals and environmental sources was ca. 10.4 min when suspended in 80% lean ground beef (350 MPa, 4°C). Hsu et al. (2014) found that 450 MPa (15 min, 4°C) inactivated 5.5–6.9 log of STEC in 77% lean ground beef, while 350 MPa inactivated ca. 3.2–4.7 log. Jiang et al. (2015) were able to inactivate 3–4 log of STEC with HPP using multiple 1 min cycles at 400 MPa. Our results obtained using the UPEC were similar to those we and others have obtained for STEC suspended in ground beef.

TABLE 1 | D<sub>10</sub> values for uropathogenic Escherichia coli in ground chicken and chicken purge.

D<sub>10</sub> for HPP, GR, and ultraviolet light are shown with the standard error of the mean in parentheses. Each experiment was conducted independently three times (n = 3). Each HPP D<sub>10</sub> was significantly different from the others, as were the GR D<sub>10</sub> (ANOVA, p < 0.05). There was no difference (p > 0.05) between the UV-C D<sub>10</sub> for food contact surfaces.
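Under first-order kinetics, the expected log reduction is simply the hold time divided by D<sub>10</sub>. As an illustrative sketch using the HPP D<sub>10</sub> values reported above (the 5-log target and the function names are assumptions chosen for the example, not criteria from the study), the calculation below shows why 300 MPa yields less than 1 log over the longest hold time tested:

```python
HPP_D10_MIN = {300: 30.6, 400: 8.37, 500: 4.43}  # MPa -> D10 in min (from the text)
MAX_HOLD_MIN = 25                                 # longest hold time tested

def log_reduction(hold_time_min, d10_min):
    """Expected log10 reduction, assuming first-order inactivation."""
    return hold_time_min / d10_min

def time_for_target(target_log, d10_min):
    """Hold time in minutes needed for a given log10 reduction."""
    return target_log * d10_min

for pressure, d10 in HPP_D10_MIN.items():
    print(f"{pressure} MPa: {log_reduction(MAX_HOLD_MIN, d10):.2f} log in "
          f"{MAX_HOLD_MIN} min; 5 log would need {time_for_target(5, d10):.1f} min")
```

These figures are extrapolations from the fitted D<sub>10</sub> values and ignore any contribution from the pressure come-up and release phases.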

#### Gamma Radiation

When the UPEC cocktail was suspended in ground chicken and treated with gamma radiation, the GR D<sub>10</sub> was ca. 0.28 kGy at refrigeration (4°C) temperature (**Figure 1**, **Table 1**). These results are similar to those obtained by Sommers et al. (2015), who found the GR D<sub>10</sub> of STEC associated with illness outbreaks to be ca. 0.27 kGy when suspended in refrigerated 80% lean ground beef. Sommers and Fan (2012) reviewed the studies for inactivation of E. coli O157:H7 in refrigerated ground beef, in which the GR D<sub>10</sub> ranged from 0.013 to 0.37 kGy. GR D<sub>10</sub> for microorganisms irradiated in frozen foods are typically higher than those in refrigerated foods because indirect DNA damage is limited by the immobility of hydroxyl radicals produced by the radiolysis of water in the frozen state (Bruns and Maxcy, 1979; Taub et al., 1979). Lopez-Gonzalez et al. (1999) found the D<sub>10</sub> for E. coli O157:H7 suspended in frozen beef (−15°C) to be 0.62 kGy. Thayer and Boyd (2001) found the GR D<sub>10</sub> of E. coli O157:H7 in frozen ground beef (−20°C) to be 0.98 kGy. Black and Jaczynski (2006) obtained D<sub>10</sub> of 0.33 and 0.35 kGy for E. coli O157:H7 in frozen (−20°C) ground beef and chicken, respectively. It appears that the radiation doses needed to inactivate STEC in refrigerated and frozen meat and poultry products should also control the UPEC.

#### Ultraviolet Light

In this study our objective was to calculate a UV-C D<sub>10</sub> value for the UPEC suspended in chicken exudate on SS, HDPP, and HDPE surfaces. The UV-C D<sub>10</sub> values for UPEC are shown in **Table 1** and **Figure 1**. The D<sub>10</sub> was calculated from the linear portion of the survival curve (0–60 mJ/cm<sup>2</sup>) and ranged from 11.4 to 12.9 mJ/cm<sup>2</sup> (p > 0.05, ANOVA). As in previous studies, complete inactivation of microorganisms with UV-C was difficult because of shadowing by particulates in purge. The D<sub>10</sub> values obtained for UPEC in purge were very similar to those obtained with STEC suspended in veal purge (Sommers et al., unpublished data), as well as with other foodborne pathogens (Sommers et al., 2012; Sommers and Sheen, 2015). A relatively low UV-C dose of 100 mJ/cm<sup>2</sup> should be able to inactivate ≥5 log of UPEC in chicken purge on food contact surfaces.
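The relationship between a D<sub>10</sub> value and the dose needed for a target reduction can be illustrated with a short calculation. The sketch below is an idealization: it assumes strictly log-linear inactivation, which the text states holds only over the linear portion of the survival curve, and the function names are hypothetical.

```python
def predicted_log_reduction(dose, d10):
    """Log10 reduction expected under ideal log-linear kinetics."""
    return dose / d10


def dose_for_log_reduction(target_logs, d10):
    """Dose expected to deliver a target log10 reduction under the same assumption."""
    return target_logs * d10


# Reported range of UV-C D10 values on food contact surfaces (mJ/cm^2)
reported_d10_values = (11.4, 12.9)
doses_for_5_log = {
    d10: dose_for_log_reduction(5, d10) for d10 in reported_d10_values
}
```

Under this idealization a 5 log reduction needs roughly 57 to 65 mJ/cm<sup>2</sup>, so the 100 mJ/cm<sup>2</sup> suggested in the text leaves a margin for the tailing caused by shadowing in purge.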

When we inoculated the UPEC onto skinless chicken meat we obtained a reduction of ca. 0.6 (±0.19) log, which was significant relative to the untreated controls (p < 0.05) and is consistent with previous results from our group as well as other researchers (Stermer et al., 1987; Sumner et al., 1996; Sommers et al., 2010). When the UPEC were inoculated onto chicken skin we did not obtain a significant reduction, which is again consistent with results we have obtained using other foodborne pathogens on chicken skin (Stermer et al., 1987; Sumner et al., 1996; Sommers et al., 2010). The limited inactivation of the UPEC on skin and meat surfaces is expected because of the surface topology and shielding of the UPEC from UV-C (Gardner and Shama, 2000).

## CONCLUSION

Our results indicate the HPP, GR, and UV-C inactivation kinetics of the UPEC are similar to our historical results for the STEC in meat and meat purge. The processing conditions used to control STEC should have similar effects on the UPEC.

### DISCLAIMER


Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture. USDA is an equal opportunity provider and employer.

### AUTHOR CONTRIBUTIONS

SS contributed to collection of high pressure processing data. OS was responsible for collection of UV-C data. CS was study director, designed the study, and was responsible for data collection and analysis and for manuscript completion.

### FUNDING

This project was funded by USDA-ARS National Program 108 Food Safety Project No. 8072-42000-073-00D.

### ACKNOWLEDGMENT

We would like to thank Jennifer Cassidy for editing this manuscript.





**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Sommers, Scullen and Sheen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Surveillance of Extended-Spectrum Beta-Lactamase-Producing *Escherichia coli* in Dairy Cattle Farms in the Nile Delta, Egypt

Sascha D. Braun<sup>1,2</sup>\*, Marwa F. E. Ahmed<sup>3</sup>, Hosny El-Adawy<sup>4,5</sup>, Helmut Hotzel<sup>4</sup>, Ines Engelmann<sup>1,2</sup>, Daniel Weiß<sup>1,2</sup>, Stefan Monecke<sup>1,2,6</sup> and Ralf Ehricht<sup>1,2</sup>

*<sup>1</sup> Alere Technologies GmbH, Jena, Germany, <sup>2</sup> InfectoGnostics Research Campus, Jena, Germany, <sup>3</sup> Department of Animal Hygiene and Zoonoses, Faculty of Veterinary Medicine, Mansoura University, Mansoura, Egypt, <sup>4</sup> Institute of Bacterial Infections and Zoonoses, Friedrich-Loeffler-Institut, Jena, Germany, <sup>5</sup> Department of Poultry Disease, Faculty of Veterinary Medicine, Kafrelsheikh University, Kafr El-Sheikh, Egypt, <sup>6</sup> Institute for Medical Microbiology and Hygiene, Technical University of Dresden, Dresden, Germany*

#### *Edited by:*

*Chitrita DebRoy, The Pennsylvania State University, USA*

#### *Reviewed by:*

*Amy J. Mathers, University of Virginia, USA Brandon Luedtke, United States Department of Agriculture, USA*

*\*Correspondence:*

*Sascha D. Braun sascha.braun@clondiag.com*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

*Received: 25 February 2016 Accepted: 15 June 2016 Published: 04 July 2016*

#### *Citation:*

*Braun SD, Ahmed MFE, El-Adawy H, Hotzel H, Engelmann I, Weiß D, Monecke S and Ehricht R (2016) Surveillance of Extended-Spectrum Beta-Lactamase-Producing Escherichia coli in Dairy Cattle Farms in the Nile Delta, Egypt. Front. Microbiol. 7:1020. doi: 10.3389/fmicb.2016.01020*

Introduction: Industrial livestock farming is a possible source of multi-resistant Gram-negative bacteria, including producers of extended spectrum beta-lactamases (ESBLs) conferring resistance to 3rd generation cephalosporins. Limited information is currently available on the situation of ESBL producers in livestock farming outside of Western Europe. A surveillance study was conducted from January to May 2014 in four dairy cattle farms in different areas of the Nile Delta, Egypt.

Materials and Methods: In total, 266 samples were collected from 4 dairy farms including rectal swabs from clinically healthy cattle (*n* = 210), and environmental samples from the stalls (*n* = 56). After 24 h pre-enrichment in buffered peptone water, all samples were screened for 3rd generation cephalosporin-resistant *Escherichia coli* using Brilliance™ ESBL agar. Suspected colonies of putatively ESBL-producing *E. coli* were sub-cultured and subsequently genotypically and phenotypically characterized. Susceptibility testing using the VITEK-2 system was performed. All suspect isolates were genotypically analyzed using two DNA-microarray based assays: CarbDetect AS-1 and *E. coli* PanType AS-2 kit (ALERE). These tests allow detection of a multitude of genes and their alleles associated with resistance toward carbapenems, cephalosporins, and other frequently used antibiotics. Serotypes were determined using the *E. coli* SeroGenotyping AS-1 kit (ALERE).

Results: Out of 266 samples tested, 114 (42.8%) ESBL-producing *E. coli* were geno- and phenotypically identified. 113 of 114 phenotypically 3rd generation cephalosporin-resistant isolates harbored at least one of the ESBL resistance genes covered by the applied assays [*bla*CTX-M15 (*n* = 105), *bla*CTX-M9 (*n* = 1), *bla*TEM (*n* = 90), *bla*SHV (*n* = 1)]. Alarmingly, the carbapenemase genes *bla*OXA-48 (*n* = 5) and *bla*OXA-181 (*n* = 1) were found in isolates that also were phenotypically resistant to imipenem and meropenem. Using the array-based serogenotyping method, 63 of the 114 isolates (55%) could be genotypically assigned to O-types.

Conclusion: This study is considered to be a first report of the high prevalence of ESBL-producing *E. coli* in dairy farms in Egypt. ESBL-producing *E. coli* isolates with different underlying resistance mechanisms are common in the investigated dairy cattle farms in Egypt. The global rise of ESBL- and carbapenemase-producing Gram-negative bacteria is a major concern and demands intensified surveillance.

Keywords: ESBL, carbapenemases, *Escherichia coli*, Egypt, dairy cattle, microarray, genotype, CRE

#### INTRODUCTION

Extended-spectrum beta-lactamases (ESBLs) are mainly plasmid-encoded enzymes providing resistance to 3rd generation (3G) cephalosporins. These enzymes can be produced by a variety of different bacteria, such as Enterobacteriaceae or non-fermenting bacteria (Bradford, 2001; Giamarellou, 2005; Rawat and Nair, 2010; Shaikh et al., 2015). The most frequently found ESBL-producing species is Escherichia coli, which often causes urinary tract infections, pneumonia or even sepsis in humans (Abraham et al., 2012). ESBL-producing E. coli have been broadly recognized in veterinary medicine as causative agents of mastitis in dairy cattle since the 2000s (Brinas et al., 2003; Haftu et al., 2012), but only a few studies have investigated the prevalence of ESBL-producing bacteria in livestock, showing their existence in sick and/or healthy cattle (Valentin et al., 2014; Dahms et al., 2015).

Unfortunately, there is no legislation in Egypt regulating the use of antibiotics (Dahshan et al., 2015). Antimicrobials such as tetracycline, quinolones, and beta lactams are still used in Egypt for growth promotion in animal feed and by veterinarians for the treatment and prevention of zoonotic diseases (WHO, 2013).

The CTX-M beta-lactamases, named for their greater activity against cefotaxime, are the most frequently detected ESBLs in livestock, and have been reported from different food-producing animals (Schmid et al., 2013; Brolund, 2014; Hansen et al., 2014). These animals also represent a source and/or a reservoir for ESBL-producing E. coli (Carattoli, 2008). Several studies indicate that these resistance genes are disseminated through the food chain or via direct contact between humans and animals (Schmid et al., 2013; Dahms et al., 2015). Data on ESBL-producing bacteria in food animals from Egypt are very limited. Therefore, the current study was conducted on four dairy cattle farms in different districts of Northern Egypt to assess the prevalence of ESBL-producing E. coli in dairy cattle and their environment.

#### MATERIALS AND METHODS

#### Farm Description and Sampling

In 2014, four dairy farms, three in Gamasa (GF1, GF2, GF5), and one in Damietta (D), were investigated (**Figure 1**). These farms were located in the Nile Delta, Egypt, in two different governorates (Damietta: Latitude N. 31°19′, Longitude E. 31°81′; and Dakahlia: Latitude N. 31°25′, Longitude E. 31°32′). The herd sizes were 400, 600, 650, and 800 in GF1, GF2, GF5, and D, respectively. The cattle enrolled in this study were between 2 and 10 years old. Half of the farms housed dairy cattle in free-stall barns and half of them in tie-stall barns. In total, 266 samples were collected from these farms. This included rectal swabs and milk samples from apparently healthy dairy cattle (n = 210), and environmental swab samples from water troughs, feed and bedding (n = 56).

#### Bacterial Strains, Isolation, Identification, and Genomic DNA Extraction

All collected samples were enriched in buffered peptone water and cultivated on Brilliance™ ESBL agar (Oxoid, Wesel, Germany) for preliminary analysis for ESBL-producing E. coli. For further investigations, all suspected E. coli that grew on the selective medium were cultivated on tryptone yeast agar (Oxoid, Wesel, Germany). Presumptive characteristic E. coli isolates were identified by Gram staining and motility, and confirmed using a panel of biochemical tests (Triple Sugar Iron (TSI) agar, catalase, oxidase, H<sub>2</sub>S production and sugar fermentation) and API 20 E systems (bioMérieux, France; ISO, 2001). All confirmed isolates were subsequently re-tested using an automated microdilution technique (VITEK-2, bioMérieux, Nürtingen, Germany) that covered the following antibiotics: imipenem, meropenem, cefotaxime, ceftazidime, cefuroxime-axetil, cefuroxime, piperacillin/tazobactam, ampicillin/sulbactam, ampicillin, gentamicin, tobramycin, ciprofloxacin, moxifloxacin, tetracycline, tigecycline, co-trimoxazole, and fosfomycin (VITEK-2 test card: AST-N289). Susceptibility tests for chloramphenicol, kanamycin, streptomycin, erythromycin and colistin were not performed in this study.

Genomic DNA from clonal isolates was extracted using the DNeasy Blood & Tissue kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's instructions. When necessary, DNA was concentrated to at least 100 ng/µl using a SpeedVac centrifuge (Eppendorf, Hamburg, Germany) at room temperature and 1400 rpm for 30 min. Five microliters of recovered genomic DNA were used directly for biotin-labeling and subsequent hybridization.

#### GenoSeroTyping and Antimicrobial Resistance Genotype

For all ESBL-producing E. coli, the serotype was determined using the E. coli SeroGenoTyping AS-1 kit. The antimicrobial resistance (AMR) genotype was detected by the CarbDetect AS-1 kit and all other resistance genes were detected by the E. coli PanType AS-2 kit (Alere Technologies GmbH, Jena, Germany). The data were automatically summarized by the "result collector," a software tool provided by Alere Technologies. An antibiotic resistance genotype was defined as a group of genes which have been described to confer resistance to a family of antibiotics (e.g., the genotype "blaCTX-M1/15, blaTEM" confers resistance to 3G cephalosporins) (**Table 2**).

#### Multiplex Labeling, Hybridization, and Data Analysis

Extracted DNA was labeled by primer extension amplification using E. coli SeroGenoTyping AS-1, CarbDetect AS-1 or E. coli PanType AS-2 kits according to the manufacturer's instructions. The procedure for multiplex labeling, hybridization and data analysis was described in detail by Braun et al. (2014). Briefly, internal labeling of the synthesized single-stranded DNA resulted from the elongation of primers previously hybridized to the target genomic DNA, using dUTP-linked biotin as the deoxynucleotide triphosphate incorporated during synthesis. This procedure allowed site-specific internal labeling of the corresponding target region. The PCR protocol included 5 min of initial denaturation at 96°C, followed by 50 cycles with 20 s of annealing at 50°C, 40 s of elongation at 72°C, and 60 s of denaturation at 96°C (used device: Eppendorf Mastercycler gradient, Eppendorf, Hamburg, Germany). This reaction resulted in a multitude of specific linearly amplified, single-stranded, biotin-labeled DNA molecules for subsequent hybridization and detection using the DNA microarrays.
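To make the cycling parameters and the notion of linear amplification concrete, the sketch below totals the programmed hold times and contrasts idealized linear and exponential yields. The constant and function names are hypothetical, ramp times between temperatures are ignored, and perfect per-cycle efficiency is assumed purely for illustration.

```python
INITIAL_DENATURATION_S = 5 * 60
CYCLES = 50
# Per-cycle hold times in seconds, as given in the protocol
STEP_SECONDS = {"annealing": 20, "elongation": 40, "denaturation": 60}


def programmed_hold_minutes():
    """Total programmed hold time in minutes, excluding temperature ramps."""
    per_cycle = sum(STEP_SECONDS.values())
    return (INITIAL_DENATURATION_S + CYCLES * per_cycle) / 60


def linear_yield(templates, cycles, efficiency=1.0):
    """Primer extension with one primer per target: each cycle adds at most
    one labeled single strand per template, so yield grows linearly."""
    return templates * cycles * efficiency


def exponential_yield(templates, cycles, efficiency=1.0):
    """Conventional two-primer PCR, shown only for contrast."""
    return templates * (1 + efficiency) ** cycles


total_minutes = programmed_hold_minutes()
labeled_strands = linear_yield(templates=1, cycles=CYCLES)
```

The modest linear yield is consistent with the high cycle number and the comparatively large amount of input genomic DNA used in the protocol.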

For hybridization procedures, the CarbDetect AS-1 and the E. coli PanType AS-2 kit were used according to the manufacturer's instructions. CarbDetect ArrayStrips were placed in a thermomixer with an Alere ArrayStrip adapter (Quantifoil Instruments, Jena, Germany) and subsequently washed with 200 µl of deionized water at 50°C with 550 rpm for 5 min and with 100 µl hybridization buffer C1 at 50°C with 550 rpm for 5 min. Liquids were always completely removed using a soft plastic pipette (e.g., BRANDT, #612-2856) to avoid any scratching of the chip surface. In a separate tube, 10 µl of previously labeled, single-stranded DNA was dissolved in 90 µl hybridization buffer C1. The hybridization was carried out at 50°C and 550 rpm for 1 h. After hybridization, the ArrayStrips were washed twice using 200 µl washing buffer C2 at 45°C for 10 min, shaking at 550 rpm. Peroxidase-streptavidin conjugate C3 was diluted 1:100 in buffer C4. A total of 100 µl of this mixture was added to each well of the ArrayStrip and subsequently incubated at 30°C and 550 rpm for 10 min. Thereafter, two washing steps with 200 µl C5 washing buffer were carried out at 550 rpm at 30°C for 5 min. The visualization was achieved by adding 100 µl of staining substrate D1 to the ArrayStrips, and signals were detected using the ArrayMate device (Alere Technologies GmbH). Finally, an automatically generated HTML report was provided giving information on the presence or absence of antimicrobial resistance genes and the affiliation to one of the more common species.

#### Ethics Statement

An ethics statement was not required. The isolates were obtained by noninvasive rectal swabs and no animal experiments were carried out for this study.

#### RESULTS

#### Antimicrobial Resistance Genotype and Phenotype

Rectal swab samples (n = 210) yielded 98 (46.6%) cultures and environmental samples (n = 56) yielded 16 (28.6%) cultures of putatively ESBL-producing E. coli. All 114 isolates were Gram-negative, motile, catalase positive, oxidase negative and indole-producing bacteria. Additionally, all isolates caused a decrease of pH and a color change of the TSI agar indicator and gas formation in the bottom of the test tube, and were therefore assigned as E. coli. In total, 113 (99.1%) phenotypically 3G cephalosporin-resistant isolates harbored at least one of the ESBL genes covered by the microarray (blaCTX-M15, blaCTX-M9, blaTEM, blaSHV; **Figure 2**). The carbapenemase gene blaOXA-48 was detected in five isolates (4.4%) and the carbapenemase gene blaOXA-181 was detected in one isolate (0.8%) (**Table 1**, **Figure 2**). These isolates showed a phenotypic resistance to imipenem and meropenem (**Table 2**). The total number of detected resistance genes is listed in **Table 1** and an overview of each isolate is given in **Figure 2**. The ESBL gene blaCTX-M1/15 was found in 103 isolates whereas blaCTX-M9 was only found in 9 isolates. Consensus sequences for blaTEM and blaSHV were found in 89 and 1 isolate, respectively. For all detected beta-lactamase genotypes, the phenotype was analyzed using the VITEK-2 instrument. The results are shown in **Table 2**. The concordance for the carbapenem resistance and ESBL genotype was 100%. In one phenotypically ESBL-positive isolate, only the narrow spectrum beta-lactamase (NSBL) gene blaOXA-1 was found. Due to this unexpected phenotype the concordance for the NSBL genotype was only 60%.

Nine different aminoglycoside resistance genes were detected by the E. coli PanType AS-2 kit (**Table 1**). The combinations of these genes resulted in 14 different genotypes, of which the most prevalent was aac(6′)-Ib in combination with aadA4 (n = 25). All isolates harboring this genotype were resistant to tobramycin, but three of them were susceptible to gentamicin (**Table 2**). Therefore, the concordance between genotype and phenotype was 94%. Five isolates, which harbored only aac(6′)-Ib, were resistant to all tested aminoglycosides. The second most frequent genotype was aac(3)-IVa (n = 20). All isolates harboring this gene were resistant to gentamicin and tobramycin and corresponded to 100% of the expected phenotype (**Table 2**). Six isolates harboring the gene aphA were susceptible to tobramycin and gentamicin. The detected phenotype corresponded 100% with the genotype, as the enzyme AphA does not mediate resistance against either of the aminoglycoside antibiotics tested (Ramirez and Tolmasky, 2010).
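One plausible way to arrive at the 94% concordance reported for the aac(6′)-Ib plus aadA4 genotype is to count every isolate-by-antibiotic result and take the share that matches the phenotype expected from the genotype. The sketch below makes that assumption explicit; the function name is hypothetical and the authors' exact calculation is not described in the text.

```python
def genotype_phenotype_concordance(n_isolates, discordant_by_antibiotic):
    """Share of isolate-by-antibiotic results that match the phenotype
    expected from the genotype (assumed definition, see lead-in)."""
    total_results = n_isolates * len(discordant_by_antibiotic)
    discordant = sum(discordant_by_antibiotic.values())
    return (total_results - discordant) / total_results


# 25 isolates: all tobramycin resistant, three unexpectedly gentamicin susceptible
aac_aada_concordance = genotype_phenotype_concordance(
    25, {"tobramycin": 0, "gentamicin": 3}
)
aac_aada_percent = round(100 * aac_aada_concordance)
```

Under this definition 47 of 50 results agree, which reproduces the reported value; a per-isolate definition (22 of 25 fully concordant isolates) would give 88% instead.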

The most prevalent genotype for fluoroquinolone resistance was qnrA1 followed by qepA. One isolate harboring qnrA1 was sensitive to both quinolone antibiotics tested (97.0% concordance), but all isolates in which qepA was detected were resistant (100% concordance). Overall, 86 of 114 isolates were resistant to ciprofloxacin and 92 to moxifloxacin. A corresponding genotype was detected in only 56 of the ciprofloxacin-resistant isolates. Similar results were observed for the 92 moxifloxacin-resistant isolates, of which only 68 carried a corresponding genotype. The overall concordance of the detection of genes mediating fluoroquinolone resistance with phenotypic resistance was 79.0% (**Table 2**).

Resistance to co-trimoxazole is associated with sul and dfrA genes. All isolates with this gene combination were resistant to co-trimoxazole (**Table 2**). Of the 45 isolates without this gene combination, 34 were resistant. Therefore, the concordance of genotype and phenotype was 80.0%.

Of 114 isolates, one was resistant to fosfomycin. Such resistance is caused by mutations in ubiquitous genes (e.g., murA or glpT) or the loss of entire genes (e.g., uhpA) rather than by acquisition of distinct resistance markers and therefore the genotypes were not included into the test panel (Takahata et al., 2010; Li et al., 2015).

In summary, the overall concordance among all genotypes to expected phenotypes was 82.6% (**Table 2**).

#### Serotyping

For all 114 ESBL-producing E. coli isolates, the O- and H-types were identified using the E. coli SeroGenotyping AS-1 kit (**Figure 3**). For 63 (55.3%) isolates, genes encoding both O and H antigens were detected and for 51 isolates (44.7%) only the gene encoding the H antigen could be detected. The most prevalent serotype was O101:H10. This serotype was found in locations D, GF1, and GF2 and was isolated mainly from rectal swabs, although one isolate belonging to this group was found in a water trough. The AMR genotype for O101:H10 isolates was rather uniform (**Figure 3**). Isolates of serotype O53:H18 with the carbapenemase gene blaOXA-48 were only found in farm GF1 and were isolated from rectal swabs. All isolates belonging to this group were identical with regard to their phenotype and genotypes (**Table 2**, **Figures 2**, **3**). The serotype O8:H9 with blaOXA-181 was found only once in farm D from a rectal swab. For isolates where only the H-antigen H6 was found, very similar AMR genotypes were detected. Such isolates were found in all investigated farms and sample types.

#### DISCUSSION

In 2014, four dairy farms in northern Egypt were investigated for ESBL-producing E. coli. In total, 210 clinically healthy dairy cattle were sampled using rectal swabs. Additionally, 56 environmental swabs were taken from different stall objects and screened for multi-drug resistant bacteria. All swabs were precultured on Brilliance™ ESBL Agar, and in 114 of 266 samples (42.8%), ESBL-producing E. coli were detected. To analyze the underlying molecular AMR mechanism, all 114 isolates were genotyped using the multiplex microarray technique. The most frequently detected gene mediating resistance against 3G cephalosporins was blaCTX-M1/15 (90.4%). The genes blaCTX-M9 (5.3%) and blaTEM (78%), which also mediate resistance to 3G cephalosporins, were also detected. However, both were usually found in combination with blaCTX-M1/15. blaCTX-M9 was the sole gene related to 3G cephalosporin resistance in just one isolate, and blaTEM in four isolates. Comparison with similar data is difficult because reports from Egypt are lacking. Two recent reports from Germany describe the prevalence of ESBL-producing bacteria in healthy livestock (Schmid et al., 2013; Dahms et al., 2015). As in the present study, in both reports the most prevalent ESBL group was CTX-M. Schmid et al. (2013) collected a total of 598 samples that yielded 196 ESBL-producing E. coli (32.8%). The high percentage of ESBL-producing E. coli in healthy animals indicates a high zoonotic risk for people working in close contact with animals. Against this background, Dahms et al. (2015) investigated different farms for ESBL-producing bacteria in samples collected from livestock and also from farm workers. In total, 70.6% of the tested farms and 5.8% of the farm workers were positive for ESBL-producing bacteria. In contrast, a study from Burgundy in France in 2012 showed only a low prevalence, of about 5%, of ESBL-producing bacteria in feces samples from different farms (Hartmann et al., 2012).

FIGURE 2 | Overview of antimicrobial resistance pattern. Antimicrobial resistance genes of all *E. coli* isolates obtained from swab samples (healthy dairy cattle and/or environment). Also given are the farm IDs, sample sources and sampling dates. (Abbreviations: CS, rectal swab; WT, water trough; BM, bulk milk; S, soil; FM, feed mixer; FA, feed animal; BS, boot swab; B, bedding; TMR, total mixed ration).

TABLE 1 | Antimicrobial resistance genes and their frequency in *E. coli* isolates detected by microarray.

TABLE 2 | Continued

*<sup>b</sup>AK, amikacin; CN, gentamicin; TOB, tobramycin; FEP, cefepime; CTX, cefotaxime; CAZ, ceftazidime; MEM, meropenem; IMP, imipenem; TE, tetracycline; TGC, tigecycline; STX, co-trimoxazole; CIP, ciprofloxacin; MXF, moxifloxacin.*

*<sup>c</sup>3G cephalosporin, 3rd generation cephalosporin.*

FIGURE 3 | Overview of microarray-based serotyping. Serotype of all *E. coli* isolates obtained from swab samples (healthy dairy cattle and/or environment). Also given are the farm IDs, sample sources and sampling dates. (Abbreviations: CS, rectal swab; WT, water trough; BM, bulk milk; S, soil; FM, feed mixer; FA, feed animal; BS, boot swab; B, bedding; TMR, total mixed ration).

Due to the preselection of all samples on Brilliance™ ESBL agar, carbapenem-resistant isolates were also retrieved. In six isolates, carbapenemase genes were detected using the CarbDetect AS-1 kit. In five isolates from a stall on farm GF1, blaOXA-48 was found. These isolates were phenotypically and genotypically identical. Interestingly, another isolate was found containing blaOXA-181, a gene belonging to the blaOXA-48-like family. Both genes are also widely distributed in human pathogens that cause hospital-acquired infections (Poirel et al., 2012). Carbapenems are last-line antibiotics with a broad spectrum and a high efficacy, and are stable against ESBLs. As they can only be administered intravenously, carbapenems are used exclusively in the clinical environment, and are not known to be used in animal husbandry. While carbapenemase-producing Enterobacteriaceae (CPE) are mostly described in hospitalized humans (Nordmann et al., 2011; Abdallah et al., 2015b), the current study shows that such CPE are also found in farm animals, including apparently healthy dairy cattle. The detection of carbapenem-resistant isolates in such an environment, and the threat that such multi-drug resistant bacteria end up in consumer food (e.g., milk or dairy products), raises serious concerns about public health. Carbapenemase-producing isolates have been detected in poultry farms (Abdallah et al., 2015a), but there are to date no reports describing the finding of these multi-drug resistant bacteria in dairy cattle farms in Egypt.

Given that the majority of samples that contained ESBL/carbapenemase-producing bacteria originated from rectal swabs, this raises the question of how the potentially contaminated feces are disposed of. Normally, dung is used as fertilizer in agriculture. Via this route, multi-drug resistant pathogens might get into the food chain, either directly through consumption of meat, or indirectly from cattle grazing on fertilized pasture. Another major problem arises in this context: the resistance genes described in this paper are usually found on plasmids (Chanawong et al., 2001; Paterson and Bonomo, 2005; Duan et al., 2006; Wittum et al., 2010; Brolund, 2014; Hansen et al., 2014; Valentin et al., 2014) and such mobile elements can be easily transferred to and between environmental bacteria (Aminov, 2011; Berglund, 2015), as well as to other human pathogens (Pitout et al., 2015). This poses a high risk to the environment and the human population.

#### CONCLUSION

To the best of our knowledge, this study is the first report which analyses the prevalence of ESBL-producing E. coli in Egyptian dairy farms based on genotyping and phenotyping data. The high percentage of ESBL-positive isolates and even of carbapenemase-producing bacteria was alarming given the relevance of 3G cephalosporins and carbapenems in modern medicine. Additionally, the isolates had a highly diverse genetic background with regard to serotype, virulence and antimicrobial resistance markers (**Figures 2**, **3**). Experiments showed a high degree of concordance between genotype and phenotype.

Strict hygiene measures are mandatory to control the spread, the transmission dynamics and potential zoonotic risk factors of ESBL- and carbapenemase-producing bacteria in dairy farms.

#### AUTHOR CONTRIBUTIONS

SB, HE, and HH conceived of the study, and participated in its design and coordination. SB, IE, and DW carried out the genotyping and serotyping. SB and IE carried out the antimicrobial resistance pattern by VITEK-2. MA and HE participated in sampling. HE, HH, and MA participated in preliminary design as well as bacteriological analysis of part of the study. SB, SM, HE, HH, and RE drafted the manuscript. All authors read and approved the final manuscript.

#### ACKNOWLEDGMENTS

We thank Keri Clack (Alere Technologies, Jena, Germany) for proof reading the manuscript, Annett Reißig (Alere Technologies, Jena, Germany) and Elke Müller (Alere Technologies, Jena, Germany) for excellent technical support, Christina Braun for continuous support as well as for proof reading of the manuscript.



**Conflict of Interest Statement:** SB, DW, SM, IE, and RE are employees of Alere Technologies GmbH, the company that manufactures the microarrays also used in this study. This has no influence on study design, data collection and analysis, and this does not alter the authors' adherence to all the Frontiers policies on sharing data and materials.

The other authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Braun, Ahmed, El-Adawy, Hotzel, Engelmann, Weiß, Monecke and Ehricht. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Development of a High Resolution Virulence Allelic Profiling (HReVAP) Approach Based on the Accessory Genome of *Escherichia coli* to Characterize Shiga-Toxin Producing *E. coli* (STEC)

Valeria Michelacci<sup>1</sup>\*, Massimiliano Orsini<sup>2</sup>, Arnold Knijn<sup>3</sup>, Sabine Delannoy<sup>4</sup>, Patrick Fach<sup>4</sup>, Alfredo Caprioli<sup>1</sup> and Stefano Morabito<sup>1</sup>

#### *Edited by:*

Pina Fratamico, United States Department of Agriculture-Agricultural Research Service, USA

#### *Reviewed by:*

James L. Bono, United States Department of Agriculture-Agricultural Research Service, USA

Erin R. Reichenberger, United States Department of Agriculture, USA

> *\*Correspondence:* Valeria Michelacci valeria.michelacci@iss.it

#### *Specialty section:*

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

*Received:* 26 November 2015 *Accepted:* 05 February 2016 *Published:* 23 February 2016

#### *Citation:*

Michelacci V, Orsini M, Knijn A, Delannoy S, Fach P, Caprioli A and Morabito S (2016) Development of a High Resolution Virulence Allelic Profiling (HReVAP) Approach Based on the Accessory Genome of Escherichia coli to Characterize Shiga-Toxin Producing E. coli (STEC). Front. Microbiol. 7:202. doi: 10.3389/fmicb.2016.00202

<sup>1</sup> European Reference Laboratory for Escherichia coli, Dipartimento di Sanità Pubblica Veterinaria e Sicurezza Alimentare, Istituto Superiore di Sanità, Rome, Italy, <sup>2</sup> Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise G. Caporale, Teramo, Italy, <sup>3</sup> Servizio Informatico, Documentazione, Biblioteca e Attività Editoriali, Istituto Superiore di Sanità, Rome, Italy, <sup>4</sup> Platform IdentyPath, Food Safety Laboratory, ANSES, Université Paris-Est, Maisons-Alfort, France

Shiga-toxin producing Escherichia coli (STEC) strains possess a large accessory genome composed of virulence genes existing in multiple allelic variants, which sometimes segregate with specific STEC subpopulations. We analyzed the allelic variability of 91 virulence genes of STEC by Real Time PCR followed by melting curves analysis in 713 E. coli strains including 358 STEC. The 91 genes investigated were located on the locus of enterocyte effacement (LEE), OI-57, and OI-122 pathogenicity islands and displayed a total of 476 alleles in the study population. The combinations of the 91 alleles of each strain were termed allelic signatures and used to perform cluster analyses. We termed such an approach High Resolution Virulence Allelic Profiling (HReVAP) and used it to investigate the phylogeny of STEC of multiple serogroups. The dendrograms obtained identified groups of STEC segregating approximately with the serogroups and allowed the identification of subpopulations within the single groups. The study of the allelic signatures provided further evidence of the coevolution of the LEE and OI-122, reflecting the occurrence of their acquisition through a single event. The HReVAP analysis represents a sensitive tool for studying the evolution of LEE-positive STEC.

Keywords: accessory genome, allelic variants, STEC subtyping, phylogenesis, bioinformatics

## INTRODUCTION

Human infections with Shiga-toxin producing Escherichia coli (STEC) cause a wide range of symptoms including uncomplicated diarrhea, hemorrhagic colitis, and the life-threatening hemolytic uremic syndrome (HUS) (Caprioli et al., 2005). The main virulence feature of STEC is the ability to produce Shiga-toxins (Stx), which interfere with the protein synthesis in the target cells, eventually causing their death (O'Brien and Holmes, 1987). The capacity to produce Stx is acquired through infection with bacteriophages conveying the stx genes, which can remain stably integrated into the bacterial chromosome (O'Brien et al., 1984).

In spite of the striking biological effect exerted by the Stx, their sole production seems not to be sufficient for causing the disease, at least the most severe forms. As a matter of fact, only a few STEC serogroups are usually isolated from human cases of severe disease (Nataro and Kaper, 1998; Karmali et al., 2003), which share the presence in the genome of mobile genetic elements (MGEs) encoding robust machineries for the colonization of the host gut (McDaniel and Kaper, 1997; Paton et al., 2001; Morabito et al., 2003; Imamovic et al., 2010; Michelacci et al., 2013). Three Pathogenicity Islands (PAIs) have been described in the genome of such STEC serogroups: the locus of enterocyte effacement (LEE) (McDaniel and Kaper, 1997), the OI-122 (Karmali et al., 2003; Morabito et al., 2003), and the OI-57 (Imamovic et al., 2010).

The LEE locus governs the ability to induce the typical "attachment and effacement" (A/E) lesion on the enterocyte. It encodes a type three secretion system, effectors subverting the cell functions related with the cytoskeleton assembly and maintenance, and factors mediating the intimate adhesion of the bacterium to the enterocyte, including the adhesin intimin (McDaniel and Kaper, 1997). The other two PAIs carry genes whose products are also involved in the mechanism of colonization, such as Efa1/LifA, encoded by a gene present in the OI-122 (Morabito et al., 2003), and AdfO (Ho et al., 2008), whose genetic determinant is conveyed by the OI-57 (Imamovic et al., 2010).

During the last decades, different authors deployed schemes for the classification of the different STEC types (Griffin and Tauxe, 1991; Nataro and Kaper, 1998; Karmali et al., 2003). One of these schemes groups the STEC strains based on the serogroup, relative incidence of human infections, ability to cause severe diseases, association with outbreaks, and presence of virulence-associated MGEs in the genome (Karmali et al., 2003). According to this classification, STEC are divided into seropathotypes (SPTs), identified with letters from A to E in a decreasing rank of pathogenicity. SPT A comprises STEC O157, while SPT B includes the STEC belonging to serogroups other than O157 that cause both sporadic cases and outbreaks of HUS, namely O26, O103, O111, O145, and O121. SPTs A and B share the presence of the LEE, OI-57, and OI-122 PAIs in their genome. SPT C includes a number of STEC serogroups, including O113 and O91, which apparently do not harbor the LEE locus but are sporadically isolated from severe infections. Finally, STEC included in SPTs D and E have rarely or never been associated with human disease, respectively (Karmali et al., 2003). For the last three SPTs, information on the presence and integrity of the three PAIs is scanty.

The complexity of the STEC virulome is an important source of strain genomic variability, which is further augmented by the existence of multiple allelic variants of the virulence genes. Some of the subtypes of stx2 have been significantly associated with the most severe infection (Friedrich et al., 2002), while some other subtypes of both stx1 and stx2 seemed to be primarily associated with a milder course of the disease or confined to animal hosts (Friedrich et al., 2002; Bielaszewska et al., 2006; Persson et al., 2007; Scheutz et al., 2012). A considerable heterogeneity has also been identified in the DNA sequence of the intimin-coding gene eae, leading to the identification of at least 18 intimin types

In the present study we developed an approach to simultaneously identify the presence and the allelic types of a large panel of genes carried by the LEE locus, OI-122, and OI-57 PAIs and used it to study the phylogeny of STEC belonging to SPT A, B and C.

## MATERIALS AND METHODS

### Bacterial Strains

A total of 713 E. coli strains positive for at least one of the three pathogenicity islands LEE, OI-122, and OI-57 were selected among the isolates present in the culture collections of the Istituto Superiore di Sanità (ISS, Rome, Italy) and the Agence Nationale de Sécurité Sanitaire de l'Alimentation, de l'Environnement et du travail (ANSES, Maisons Alfort, France) and used to identify the alleles of the genes harbored by the three PAIs. The panel comprised 358 STEC strains belonging to serogroups O157 (n = 81), O26 (n = 32), O111 (n = 36), O103 (n = 8), O145 (n = 8), O121 (n = 3), and others (n = 190), isolated from unrelated cases of human infection and from food in Italy and France in the period 2008–2011. An additional 355 stx-negative E. coli strains of multiple serogroups, isolated from human, food, and animal sources in the same countries and period, were included in the study. The O157 STEC strain EDL933 was used as positive control (Supplementary Table 1, Sheet 1).

A population of 318 unrelated STEC strains, part of the panel of strains described above, was used to assess the performance of the HReVAP approach. These included 161 isolates of SPTs A and B and belonging to serogroups O157 (n = 81), O26 (n = 32), O111 (n = 33), O103 (n = 5), O145 (n = 7) and O121 (n = 3) and 157 eae-negative strains, of serogroups O91 (n = 14), O174 (n = 10), O113 (n = 9), O104 (n = 6), O101 (n = 3), O153 (n = 3), O21 (n = 3), and others (n = 109). The study population also included a panel of 36 stx-negative eae-positive E. coli, altogether referred to as EPEC, including the following serogroups: O26 (n = 12), O127 (n = 5), O55 (n = 4), O128 (n = 3), O125 (n = 3), O111 (n = 2), O86 (n = 2), and others (n = 5).

Finally, 39 out of the 161 SPTs A and B STEC were also subjected to whole genome sequencing, followed by Multi Locus Sequence Typing (MLST) and Whole genome SNP analysis with the aim of comparing HReVAP results with those obtained using these DNA-sequence-based methods. The latter isolates included strains belonging to serogroups O157 (n = 16), O26 (n = 12), O111 (n = 7), O103 (n = 2), O145 (n = 1), and O121 (n = 1).

### Real-Time PCR and Melting Curves Analysis

Ninety-one primer pairs were deployed to amplify 100–300 bp fragments from as many genes harbored on the three PAIs LEE, OI-122, and OI-57 (38, 12, and 41 genes from each island, respectively). The genomic sequence of the O157 STEC strain EDL933 (Acc. no. AE005174) was used to design the primer pairs using the Primer-BLAST web-tool available on the NCBI webserver. Some of the primers were degenerated to amplify the target genes in all the STEC strains for which a sequence was available in GenBank. The sequences of primers used in this study and their annealing position on the genomic sequence of EDL933 strain are reported in **Table 1**.
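To illustrate what a degenerate primer represents, the following minimal sketch expands a primer containing IUPAC ambiguity codes into the concrete sequences it covers. The example primer `ACRTGY` is hypothetical and is not one of the primers in **Table 1**; the function name is likewise an assumption made for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Hypothetical primer with two degenerate positions (R = A/G, Y = C/T).
variants = expand_degenerate("ACRTGY")
print(variants)  # ['ACATGC', 'ACATGT', 'ACGTGC', 'ACGTGT']
```

A primer without ambiguity codes, such as the wecA forward primer reported in this study, expands to itself only.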

Total DNA was extracted from overnight cultures of the strains with the Nucleospin Tissue extraction kit (Macherey-Nagel, Düren, DE).

The Real-Time PCR reactions were performed on the high throughput BioMark Real Time PCR system with 96.96 Genotyping Dynamic Array Chips (Fluidigm, San Francisco, CA), using the EvaGreen DNA binding dye (Biotium Inc., Hayward, CA). The thermal profile was 95 °C for 10 min (enzyme activation) followed by 35 cycles of 95 °C for 15 s and 60 °C for 1 min (amplification step). Finally, a denaturation step was performed and the melting curves of the amplified products were recorded. Eight array chips were used to perform the whole panel of reactions, each including a positive template control consisting of the DNA extracted from an overnight culture of the EDL933 strain. In addition to the 91 genes, each sample was also subjected to amplification of the stx genes (Perelle et al., 2004) and the wecA housekeeping gene, used as a marker for the E. coli species (forward primer: 5′-CTTTATCTCAGTAGCCTGGG-3′; reverse primer: 5′-AGGAAGTAACCAAACGGTCC-3′).

### High Resolution Virulence Allelic Profiling Analysis (HReVAP)

The melting temperatures (Tm) of the amplicons were normalized using the Tm of the PCR products amplified from the positive control strain EDL933 present in each array chip. Normalization values were obtained for each array chip minimizing the mean of the Tm standard deviations for the eight normalized Tm measurements over each gene as well as the overall maximum value of the normalized Tm ranges of all genes.
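The idea of control-based normalization can be sketched as follows. This is a simplified illustration, not the optimization described above: it assumes a single additive offset per chip, estimated as the mean deviation of the EDL933 control Tm on that chip from the across-chip mean for each gene. The chip names, gene names, and Tm values are hypothetical.

```python
from statistics import mean


def chip_offsets(control_tm: dict[str, dict[str, float]]) -> dict[str, float]:
    """Estimate one additive Tm offset per chip from the control strain.

    control_tm[chip][gene] is the Tm of the control amplicon on that chip.
    The offset is the mean, over genes, of the difference between the
    chip's control Tm and the across-chip mean control Tm (an assumption
    made for this sketch).
    """
    genes = sorted(set.intersection(*(set(g) for g in control_tm.values())))
    gene_mean = {g: mean(control_tm[c][g] for c in control_tm) for g in genes}
    return {
        chip: mean(tms[g] - gene_mean[g] for g in genes)
        for chip, tms in control_tm.items()
    }


def normalize(tm: float, chip: str, offsets: dict[str, float]) -> float:
    """Subtract the chip-specific offset from a measured Tm."""
    return tm - offsets[chip]


# Hypothetical control measurements on two chips.
control = {
    "chip1": {"geneA": 80.0, "geneB": 84.0},
    "chip2": {"geneA": 80.4, "geneB": 84.4},
}
offsets = chip_offsets(control)
print(round(offsets["chip1"], 3), round(offsets["chip2"], 3))
print(round(normalize(82.1, "chip2", offsets), 3))
```

In this toy case chip2 reads 0.2 °C higher than the across-chip mean, so 0.2 °C is subtracted from every Tm measured on it.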

The Tm frequency distributions for each gene were calculated by grouping values in 0.05 °C intervals. The frequency distributions were then analyzed with the "mix" function of the "mixdist" package in the R software (RTeam, 2014). This function finds Maximum Likelihood estimates for the proportions, means, and standard deviations of a mixture distribution by applying a Newton-type iterative method (RTeam, 2014). The number of Gaussians and the starting parameters were adjusted upon evaluation of several fitting results. In order to limit the degrees of freedom, prior knowledge was applied to the model: in each model fit, the standard deviations (σ) of the Gaussian curves were left variable but constrained to be equal. The following command was used for the analysis: mix(Tm\_dat, Tm\_par, dist = "norm", mixconstr(conmu = "NONE", consigma = "SEQ")), where Tm\_dat is the data frame of the grouped Tm data and Tm\_par a data frame of the starting values for the parameters of the distributions. Several normal distributions were observed for each gene at different temperature intervals. Before clustering, Tm values were aggregated into numbered classes, using the model fitting results to define temperature intervals for distinct alleles. The interval limits were calculated as the points of intersection between two adjoining Gaussian curves with the same standard deviation according to the following equation:

$$T_{\text{int}}\left(^{\circ}\text{C}\right) = \frac{\mu_1^2 - \mu_2^2 - 2\,\sigma^2 \ln\left(\frac{\pi_1}{\pi_2}\right)}{2\mu_1 - 2\,\mu_2} \tag{1}$$

T<sub>int</sub> = Temperature at the point of intersection between Gaussian 1 and 2

µ<sub>i</sub> = Mean temperature of the Gaussian curve i = 1, 2

σ = Standard deviation of the Gaussian curves

π<sub>i</sub> = Amplitude of the Gaussian curve i = 1, 2
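Equation (1) can be evaluated directly. The sketch below is a plain transcription of the formula; the function name and the numerical values are illustrative assumptions rather than fitted parameters from the study.

```python
import math


def gaussian_intersection(mu1: float, mu2: float, sigma: float,
                          pi1: float, pi2: float) -> float:
    """Temperature at which two Gaussians with equal sigma intersect (Eq. 1)."""
    if mu1 == mu2:
        raise ValueError("the two Gaussian curves must have different means")
    numerator = mu1 ** 2 - mu2 ** 2 - 2 * sigma ** 2 * math.log(pi1 / pi2)
    return numerator / (2 * mu1 - 2 * mu2)


# Equal amplitudes: the limit falls midway between the two means.
t_equal = gaussian_intersection(80.0, 81.0, 0.2, 1.0, 1.0)
print(t_equal)  # 80.5

# A larger first peak pushes the limit toward the second, smaller peak.
t_weighted = gaussian_intersection(80.0, 81.0, 0.2, 2.0, 1.0)
print(round(t_weighted, 3))  # 80.528
```

With equal amplitudes the logarithmic term vanishes and the limit reduces to (µ<sub>1</sub> + µ<sub>2</sub>)/2; unequal amplitudes shift it by σ² ln(π<sub>1</sub>/π<sub>2</sub>)/(µ<sub>2</sub> − µ<sub>1</sub>).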

The model fitting did not prove optimal for allele assignment because many Gaussian curves lay close together and therefore overlapped. All fits were consequently revised, keeping the maximum number of Gaussian curves in the model while aggregating those that overlapped heavily. With this procedure, the resolution of the method (capacity of allele distinction) was lowered in favor of its precision (convergence of assignments). The Tm intervals identified by peaks in the distributions obtained from the analysis of each gene were used to identify the alleles, which were labeled with numbers in ascending order according to the position of the peak in the temperature distribution (Supplementary Table 2). Each strain was given a numeric allelic signature comprising the alleles of all the genes analyzed.
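The assignment of alleles from Tm intervals can be sketched as a lookup against the sorted interval limits. In this illustration the gene names, limits, and Tm values are hypothetical, and coding a missing amplicon as allele 0 is an assumption of the sketch, not a convention stated in the text.

```python
from bisect import bisect_right
from typing import Optional


def assign_allele(tm: Optional[float], limits: list[float]) -> int:
    """Map a normalized Tm to a 1-based allele number.

    `limits` holds the ascending interval limits for one gene. A missing
    Tm (no amplicon) is coded as 0, which is an assumption of this sketch.
    """
    if tm is None:
        return 0
    return bisect_right(limits, tm) + 1


def allelic_signature(tms: dict[str, float],
                      limits_by_gene: dict[str, list[float]],
                      gene_order: list[str]) -> tuple[int, ...]:
    """Build the numeric signature of one strain over an ordered gene panel."""
    return tuple(assign_allele(tms.get(g), limits_by_gene[g]) for g in gene_order)


# Hypothetical interval limits (in °C) for two genes.
limits_by_gene = {"geneA": [80.5, 82.0], "geneB": [84.1]}
gene_order = ["geneA", "geneB"]

sig_full = allelic_signature({"geneA": 81.2, "geneB": 83.0}, limits_by_gene, gene_order)
sig_partial = allelic_signature({"geneA": 82.6}, limits_by_gene, gene_order)
print(sig_full, sig_partial)  # (2, 1) (3, 0)
```

Because alleles are numbered by ascending peak position, a gene with n interval limits yields n + 1 allele classes.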

The allelic signatures were compared, and distance matrices obtained, using the neighbor program of the EMBOSS package (Rice et al., 2000) for sample clustering with default parameters.
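A pairwise comparison of allelic signatures can be sketched as below. The distance measure is not specified in the text, so this example assumes the fraction of genes at which two signatures carry different allele numbers; the strain names and signatures are hypothetical. The helper writes a square matrix in the PHYLIP-style layout (taxon count on the first line, names padded to 10 characters) commonly accepted by neighbor-joining programs.

```python
def signature_distance(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Fraction of genes whose allele numbers differ (assumed measure)."""
    if len(a) != len(b):
        raise ValueError("signatures must cover the same gene panel")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(signatures: dict[str, tuple[int, ...]]):
    """Return strain names and the full symmetric distance matrix."""
    names = list(signatures)
    matrix = [[signature_distance(signatures[i], signatures[j]) for j in names]
              for i in names]
    return names, matrix


def to_phylip(names: list[str], matrix: list[list[float]]) -> str:
    """Format a square distance matrix in PHYLIP-style text."""
    lines = [f"{len(names):5d}"]
    for name, row in zip(names, matrix):
        lines.append(f"{name[:10]:<10}" + " ".join(f"{d:.4f}" for d in row))
    return "\n".join(lines)


# Hypothetical four-gene signatures for three strains.
signatures = {
    "strainA": (1, 2, 3, 0),
    "strainB": (1, 2, 4, 0),
    "strainC": (2, 1, 4, 1),
}
names, matrix = distance_matrix(signatures)
print(to_phylip(names, matrix))
```

Here strainA and strainB differ at one of four genes (distance 0.25), whereas strainA and strainC differ at all four (distance 1.0).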

Each step of the procedure described above has been implemented in dedicated Python classes, including the neighbor software, which was wrapped together with the TreeGraph software (Stover and Muller, 2010) for graphical tree representation.

The HReVAP software package was deployed and used on the public computational framework ARIES operating on the servers of the Istituto Superiore di Sanità and based on the Galaxy bioinformatics platform (Giardine et al., 2005; Blankenberg et al., 2010; Goecks et al., 2010) (https://w3.iss.it/site/aries/).

The tree files (.tre) produced with the HReVAP clustering algorithm were downloaded and visualized using the FigTree program version 1.4.0 (Drummond et al., 2012).

### Whole Genome Sequencing of STEC and Phylogenetic Analyses

Thirty-nine out of the 161 STEC strains used for the HReVAP typing were subjected to whole genome sequencing using the Library Preparation Kit by Kapa Biosystems (Wilmington, MA, USA) and a paired end 100 bp protocol on an Illumina HiSeq2500 instrument in fast run mode according to manufacturers' instructions. The sequencing reads have been uploaded in the EMBL-ENA sequence database (EMBL European Nucleotide Archive Study accession no. PRJEB11886). The raw reads were trimmed to remove the adaptors and to accept 27 as the lowest Phred value and assembled using the de novo assembly tool Edena v3 (Hernandez et al., 2008). The contigs were subjected to in silico Multi Locus Sequence Typing (MLST) with the protocol described by Wirth and colleagues (Wirth et al., 2006). Whole genome Single Nucleotide Polymorphism (WG-SNP) analysis was performed using the ksnp3 pipeline (Gardner et al., 2015), using 19 as kmer size. The optimum value for the kmer size was selected as that producing the highest number of unique kmers of the median length in all the genomes of the dataset and it was calculated by using the kchooser tool included in the ksnp3 pipeline. All the bioinformatics analyses were performed through the ARIES webserver (https://w3.iss.it/site/aries/).
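The quality-trimming step can be illustrated with a minimal sketch. The trimming tool and its exact algorithm are not named in the text, so the example assumes simple end trimming: bases are removed from both ends of a read until a base with Phred quality of at least 27 is reached. It also assumes Phred+33 quality encoding; the read shown is hypothetical.

```python
def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def trim_low_quality(seq: str, quality: str, min_q: int = 27) -> tuple[str, str]:
    """Trim both read ends until a base with quality >= min_q is reached.

    Only the ends are trimmed; internal low-quality bases are kept.
    """
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], quality[start:end]


# Hypothetical read: '#' = Q2, 'I' = Q40, '<' = Q27, ';' = Q26.
trimmed_seq, trimmed_qual = trim_low_quality("ACGTACGT", "#IIII<;#")
print(trimmed_seq, trimmed_qual)  # CGTAC IIII<
```

A stricter filter that rejects any read containing an internal base below the threshold would be an equally plausible reading of "accept 27 as the lowest Phred value".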

## RESULTS

### HReVAP: Identification of the Alleles

The analysis of the 91 genes conveyed by the three pathogenicity islands LEE, OI-122, and OI-57 allowed identifying a total of 476 alleles (Supplementary Table 1, Sheet 1 and Supplementary Table 3). Each gene displayed 2 to 10 different alleles in the study population.

The eae-positive strains included in the panel were positive for 32 out of the 38 LEE-harbored genes on average, while the strains positive for the marker of OI-122, efa1-lifA, were positive for the majority of the 12 genes selected on this PAI (10.8 genes on average). The OI-57 showed the widest variability. As a matter of fact, 10.4% of the isolates proved positive for 1 to 10 genes of PAI OI-57, 30.6% were positive for 11–20 genes, 26% fell in the range of 21–30 genes detected, and 33% gave positive results for more than 30 out of the 41 targets considered (Supplementary Table 1, Sheet 1).

The genes conveyed by the LEE and the OI-122 islands showed a mean number of alleles of 4.79 (range: 2–8 alleles; median = 5) and 4.17 (range: 3–8 alleles; median = 4) for each gene, respectively, while those that are part of the OI-57 were the most variable, displaying a mean value of 5.95 alleles each (range: 3–10 alleles; median = 6) (**Figure 1** and Supplementary Table 3). Interestingly, the LEE locus displayed a uniform allelic variation throughout its whole length, while the OI-122 and the OI-57 appeared to have a slightly higher number of alleles in the leftmost part (**Figure 1**).
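As a consistency check, the per-island means reported above can be combined with the number of genes per island (38, 12, and 41) to recover the total of 476 alleles. The per-island allele counts computed here are implied by rounding and are not stated in the text.

```python
# (number of genes, reported mean alleles per gene) for each island.
islands = {
    "LEE": (38, 4.79),
    "OI-122": (12, 4.17),
    "OI-57": (41, 5.95),
}

# The reported means are rounded to two decimals, so the products are
# rounded to the nearest integer to recover the implied allele counts.
alleles_per_island = {name: round(n * m) for name, (n, m) in islands.items()}
total_genes = sum(n for n, _ in islands.values())
total_alleles = sum(alleles_per_island.values())

print(alleles_per_island)  # {'LEE': 182, 'OI-122': 50, 'OI-57': 244}
print(total_genes, total_alleles)  # 91 476
```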

### HReVAP Performance: Amplification of the LEE, OI-122, and OI-57 Targets in STEC and EPEC

All the LEE genes could be amplified in the vast majority of the STEC strains belonging to SPTs A and B (**Figure 2A** and Supplementary Table 1, Sheet 2). In detail, no negative results were obtained for any of the O157 and O145 strains tested, with the only exception of one O145 strain, negative for eight targets. Nine LEE-borne genes, namely the open reading frames (ORFs) Z5101, Z5107, Z5111, Z5112, Z5114, Z5117, Z5121, Z5122, and Z5127, were more variable in the STEC serogroups other than O157 and O145, with more than 60% of the STEC O26 strains tested producing no amplicons. With a few exceptions, the same nine targets could not be amplified in the STEC O103 and O121 strains (**Figure 2A**). Four of these nine gene targets (Z5107, Z5112, Z5114, and Z5122) could not be amplified from the whole panel of the STEC O111, while ORF Z5101 gave a positive result in only two strains of this serogroup.

FIGURE 2 | [...] OI-122. (C) Results of the ORFs harbored by the OI-57.

As for the OI-122 PAI, all the 12 ORFs selected were detected in the whole panel of STEC O157, O111, and O121 strains, with the only exception of one O157 strain, which was negative for the three ORFs Z4331, Z4332, and Z4333 (**Figure 2B** and Supplementary Table 1, Sheet 2).

The STEC strains belonging to serogroups O145, O103, and O26 showed different amplification profiles, with the four ORFs Z4318, Z4320, Z4321, and Z4322 negative in 60% of the O103 strains and in the majority of O145 and O26 strains (**Figure 2B**).

The HReVAP typing of the STEC and EPEC confirmed the highest degree of variation of PAI OI-57. In particular, ORFs Z2112, Z2114, and Z2116 could not be amplified in many STEC O26, O157, and O145 strains, while ORFs Z2090 and Z2091 were not detected in any of the STEC O111 strains, in the majority of STEC O26 strains, and in some STEC O145 and O157 strains. Finally, the ORF Z2085 was frequently negative in STEC O111 and O26 (**Figure 2C** and Supplementary Table 1, Sheet 2).

Unexpectedly, the eae-negative STEC strains tested were also positive for many targets of the PAI OI-57 (**Figure 2** and Supplementary Table 1, Sheet 3). In particular, two ORFs, Z2054 and Z2101, were positive in more than 95% of the strains tested, while genes Z2037, Z2039, Z2056, Z2057, Z2060, Z2069, Z2071, Z2084, Z2086, Z2096, Z2118, Z2131, and Z2146 were positive in more than 50% of the population assayed.

As a whole, a mean of 15.9 targets out of the 41 selected for OI-57 were present in the panel of eae-negative STEC strains (range: 5–39; median = 15) (Supplementary Table 1, Sheet 3).

The EPEC strains assayed provided different amplification patterns (**Figure 2** and Supplementary Table 1, Sheet 4). As expected, the LEE-borne ORFs and the OI-122 targets followed a pattern of positivity to PCR similar to that displayed by the STEC belonging to SPTs A and B, although with some variation. The OI-57 was present in all the EPEC strains tested but showed two regions of major variability encompassing the ORFs Z2090-Z2093 (negativity range: 72.2–83.3%) and Z2112-Z2116 (negativity range: 75–77.78%) (**Figure 2** and Supplementary Table 1, Sheet 4).

### HReVAP Typing: Allelic Variability of the LEE, the OI-122, and the OI-57

The allelic variability of the genes harbored by the three pathogenicity islands LEE, OI-122, and OI-57 has been investigated in the same study population used to assess the HReVAP performance.

The clustering of the allelic signatures of the STEC strains of SPTs A and B produced a dendrogram whose branches segregated with the serogroups, with a few exceptions (**Figure 3A**). In particular, the cluster formed by STEC O157 strains appeared clearly distinct from the others and much more homogeneous, while the strains belonging to O111 and O26 serogroups were divided into two and three distinct clusters, respectively. Similar results were obtained when the cluster analysis was carried out separately using the allelic signatures produced with the ORFs of the LEE locus only (**Figure 3B**) and of the PAI OI-122 only (**Figure 3C**). Such dendrograms displayed the same topology as that produced when the alleles of the complete ORF panel were used but showed a lower intra-cluster resolution. More complex results were instead obtained from the HReVAP analysis of the genes conveyed by the OI-57, reflecting the highest variability observed in the ORFs of this PAI (**Figure 3D**). Even though the main groups corresponding to STEC serogroups O157, O111, and O26 could still be detected, the overall topology showed wider and less defined branches.

The topology of the dendrograms obtained with the EPEC isolates resembled that of those produced with the STEC of SPTs A and B allelic signatures. However, the higher variability of EPEC, together with the lower number of isolates tested, caused the output to be less definite (**Figure 4**).

As for the eae-negative STEC strains, the cluster analysis of the allelic signatures was carried out exploiting the observed positivity to many of the ORFs of the OI-57. Although based on a smaller number of targets, this analysis showed a massive variability in the allelic signatures, yet able to distinguish and group different populations of strains (**Figure 5**).

### Comparison between HReVAP and the DNA Sequence-Based MLST and Whole Genome-SNP Analyses

Thirty-nine STEC strains were used to compare the results of the HReVAP with those produced by DNA sequence-based typing techniques. The strains were analyzed with the HReVAP, and their genomes were subjected to in silico MLST and WG-SNP analysis. The comparison showed that the HReVAP (**Figure 6A**) had a much higher discriminatory power than the MLST (**Figure 6B**) and produced a dendrogram similar to that produced with the WG-SNP (**Figure 6C**). Apparently, the topology of the HReVAP dendrogram allowed identifying differences within the serogroups that were not visible in the WG-SNP-based dendrogram.

FIGURE 3 | [...] labeled according to the following color legend: dark blue for O157, red for O26, green for O111, pale blue for O103, purple for O145, and black for O121. (A) Dendrogram of the allelic signatures from all the 91 ORFs. (B) Dendrogram of the allelic signatures obtained with the ORFs of the LEE locus. (C) Dendrogram of the allelic signatures obtained with the ORFs of the OI-122. (D) Dendrogram of the allelic signatures obtained with the ORFs of the OI-57.

#### TABLE 1 | List of primers used in this study.

## DISCUSSION

The detection of the enzyme isoforms or of the polymorphisms in the genomes of pathogenic microorganisms has been for a long time the basis for the identification of molecular profiles of isolates. Bacterial subtyping has been largely used in research studies on the evolution of bacterial pathogens since the first development of typing methods such as the multi locus enzyme electrophoresis (Selander and Levin, 1980; Selander et al., 1986; Donkor, 2013) and the pulsed field gel electrophoresis (Arbeit et al., 1990), followed by the elaboration of schemes for the identification of the allelic forms of genes as in the multi-locus sequence typing (MLST) (Maiden et al., 1998). Moreover, molecular typing of microorganisms soon demonstrated its great potential in the control of infectious diseases through the implementation of surveillance programs aiming at limiting the burden of infections (Swaminathan et al., 2001). Nowadays, molecular subtyping of bacteria can benefit from cutting edge technologies such as the next generation sequencing (NGS) allowing the detection of whole genome single nucleotide polymorphism (SNP) (WG-SNP) (Kuroda et al., 2010; Vogler et al., 2011; Joensen et al., 2014; Dallman et al., 2015b).

WG-SNP has been successfully used to define a typing scheme for the surveillance of Listeria monocytogenes infections (Commission Decision, 2010). However, its application to bacterial pathogens such as E. coli, although advisable, is still under debate given the extensive genomic variability of this bacterial species. The development of WG-SNP-based typing concepts for E. coli has been attempted by several authors and proved successful for some STEC serogroups such as O157 and, to a lesser extent, O26 (Dallman et al., 2015a,b,c; Holmes et al., 2015; Jenkins et al., 2015). However, a single approach successfully applicable to all the STEC serogroups has not been developed yet.

We have deployed a typing scheme for STEC based on the evaluation of polymorphisms in the sequence of a large panel of virulence genes through the determination of the melting temperature of Real-Time PCR amplicons. Such an approach originated from multiple considerations. The virulence genes, part of the accessory genome, have a higher variability than the rest of the genome. Such an increased variability would reduce the number of targets needed for the phylogenetic analysis.

Additionally, comparing the allelic combinations of the fraction of genome shared by all the members of a pathotype (e.g., STEC) should overcome the need to find an appropriate reference or to set a threshold for the diversity, which would introduce a bias in the evaluation of clusters. Both aspects represent limitations of the currently described whole genome sequence-based methods for E. coli (Dallman et al., 2015a,b,c; Holmes et al., 2015; Jenkins et al., 2015). Finally, the use of the widely available Real Time PCR to obtain the strains' signatures makes it unnecessary to resort to NGS, which is only available in reference laboratories and requires skills and knowledge of the downstream bioinformatics applications that might be unavailable in most of the front-line laboratories.

The rationale behind the proposed concept resides in the many studies published on the allelic variants of known virulence genes of STEC, such as the Shiga-Toxin-coding genes (Friedrich et al., 2002; Bielaszewska et al., 2006; Persson et al., 2007; Scheutz et al., 2012) and the eae gene (Oswald et al., 2000; Tarr et al., 2002; Ito et al., 2007; Madic et al., 2010), as well as the more recently described subAB and toxB genes (Tozzoli et al., 2010; Michelacci et al., 2013, 2014). All the mentioned papers described the association between specific alleles and sub-populations of STEC strains. We investigated the allelic forms of 91 virulence genes conveyed by the three main MGEs associated with STEC pathogenicity, namely the LEE, the OI-122, and the OI-57 (McDaniel and Kaper, 1997; Karmali et al., 2003; Imamovic et al., 2010) and used the obtained allelic signatures to investigate the phylogenesis of STEC.

The whole process, termed High Resolution Virulence Allelic Profiling (HReVAP), allowed us to identify a range of 2–10 allelic forms for each of the 91 ORFs, resulting in the impressive number of 476 total alleles generating a high number of unique allelic signatures.

The HReVAP clustered the LEE-positive STEC strains into groups approximately segregating with the serogroup, providing an indication that the allelic signatures were not randomly assigned to the isolates. Additionally, the analysis identified different subpopulations within the serogroups and also showed variability within each of the populations identified (**Figure 3**). This finding was not unexpected, since all the strains used in the test panel were epidemiologically unrelated, and at the same time provided an indication that the HReVAP might also be successful in identifying clusters of related strains such as those derived from an outbreak.

The HReVAP also produced allelic signatures with eae-negative STEC. However, the dendrogram obtained with these strains had a less resolved topology (**Figure 5**). An explanation of this result resides either in the lower number of genes these isolates are positive for or in the low number of strains in each serogroup, which in some cases only included one isolate. Nevertheless, the finding that at least part of the OI-57 PAI was frequently present in eae-negative STEC is interesting and constitutes the first report of the presence of this PAI, or its remnants, in this group of STEC.

The HReVAP analysis also proved useful in following the evolution of the single MGEs considered for the typing scheme. As a matter of fact, we could visualize a similar pattern of variation in the allelic signatures obtained considering the ORFs of the LEE locus and the OI-122 (**Figures 3B,C**). This result indicates that the two MGEs underwent similar evolutionary pathways and supports the previous hypothesis about their common acquisition through a single event of horizontal gene transfer in certain STEC and EPEC strains (Morabito et al., 2003).

Our results showed that the OI-57 had the greatest genetic variability, displaying the highest number of alleles on average for all the ORFs considered (**Figure 1C** and Supplementary Table 3). Additionally, the analysis of the allelic signatures obtained considering the OI-57 ORFs produced dendrograms with the most dispersed topology (**Figures 3D**, **4D**). These observations suggest that this MGE could have been acquired at an early stage of the evolutionary pathway that led to the emergence of STEC. Additionally, since the LEE-negative STEC investigated were also positive for many of the OI-57-related ORFs considered in this study, it can be hypothesized that such an island could be a common heritage of STEC independently of the presence of the LEE locus.

Finally, the comparison of the performance of the HReVAP with that obtained with other comparative genomic tools such as the MLST and the WG-SNP analysis substantiates the robustness of the HReVAP in identifying LEE-positive STEC populations with a much higher resolution with respect to the MLST and a comparable level of discrimination to that of the WG-SNP.

In conclusion, the HReVAP approach demonstrated good sensitivity and high resolution in the molecular characterization of STEC, particularly for the LEE-positive strains. Moreover, the incredibly large virulome of pathogenic E. coli offers the opportunity to refine the HReVAP typing strategy for other STEC groups, such as the LEE-negative isolates, or even to extend it to other E. coli pathotypes by integrating the panel of targets.

Further work is in progress to assess the use of HReVAP as an effective tool for the surveillance of STEC infections and to obtain the allelic signatures from whole genome sequences in order to make this technique a cross-generational tool connecting the Real-Time PCR and the NGS-based applications.

#### AUTHOR CONTRIBUTIONS

VM conceived the experimental design and drafted the manuscript, MO developed the scripts for HReVAP clustering, and critically revised the manuscript, AK developed and applied the scripts for the extraction of the allelic signatures of the HReVAP and critically revised the manuscript, SD and PF designed the Real Time PCR primers, performed the amplifications and melting curve analyses and participated in the revision of the manuscript, AC contributed to the revision of the draft manuscript for important intellectual content, SM conceived the study and thoroughly revised the manuscript. Finally, all the authors approved the manuscript to be published.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00202

## REFERENCES


detection and epidemiological surveillance. J. Clin. Microbiol. 53, 3565–3573. doi: 10.1128/JCM.01066-15


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Michelacci, Orsini, Knijn, Delannoy, Fach, Caprioli and Morabito. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Revisiting the STEC Testing Approach: Using *espK* and *espV* to Make Enterohemorrhagic *Escherichia coli* (EHEC) Detection More Reliable in Beef

Sabine Delannoy<sup>1</sup>, Byron D. Chaves<sup>2</sup>, Sarah A. Ison<sup>2</sup>, Hattie E. Webb<sup>2</sup>, Lothar Beutin<sup>3</sup>, José Delaval<sup>4</sup>, Isabelle Billet<sup>5</sup> and Patrick Fach<sup>1</sup>\*

<sup>1</sup> Food Safety Laboratory, Université Paris-Est, Anses (French Agency for Food, Environmental and Occupational Health and Safety), Platform IdentyPath, Maisons-Alfort, France, <sup>2</sup> Department of Animal and Food Sciences, Texas Tech University, Lubbock, TX, USA, <sup>3</sup> Division of Microbial Toxins, National Reference Laboratory for Escherichia coli, Federal Institute for Risk Assessment, Berlin, Germany, <sup>4</sup> Laboratoire de Touraine, (LDA37) Conseil Départemental, Tours, France, <sup>5</sup> SAS Charal Groupe Bigard, Cholet, France

#### *Edited by:*

Pina Fratamico, United States Department of Agriculture, Agricultural Research Service, USA

#### *Reviewed by:*

James L. Smith, United States Department of Agriculture, USA George Carl Paoli, United States Department of Agriculture, USA

> *\*Correspondence:* Patrick Fach patrick.fach@anses.fr

#### *Specialty section:*

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

*Received:* 30 October 2015 *Accepted:* 05 January 2016 *Published:* 22 January 2016

#### *Citation:*

Delannoy S, Chaves BD, Ison SA, Webb HE, Beutin L, Delaval J, Billet I and Fach P (2016) Revisiting the STEC Testing Approach: Using espK and espV to Make Enterohemorrhagic Escherichia coli (EHEC) Detection More Reliable in Beef. Front. Microbiol. 7:1. doi: 10.3389/fmicb.2016.00001

Current methods for screening Enterohemorrhagic Escherichia coli (EHEC) O157 and non-O157 in beef enrichments typically rely on the molecular detection of stx, eae, and serogroup-specific wzx or wzy gene fragments. As these genetic markers can also be found in some non-EHEC strains, a number of "false positive" results are obtained. Here, we explore the suitability of five novel molecular markers, espK, espV, ureD, Z2098, and CRISPRO26:H11 as candidates for a more accurate screening of EHEC strains of greater clinical significance in industrialized countries. Of the 1739 beef enrichments tested, 180 were positive for both stx and eae genes. Ninety (50%) of these tested negative for espK, espV, ureD, and Z2098, but 12 out of these negative samples were positive for the CRISPRO26:H11 gene marker specific for a newly emerging virulent EHEC O26:H11 French clone. We show that screening for stx, eae, espK, and espV, in association with the CRISPRO26:H11 marker is a better approach to narrow down the EHEC screening step in beef enrichments. The number of potentially positive samples was reduced by 48.88% by means of this alternative strategy compared to the European and American reference methods, thus substantially improving the discriminatory power of EHEC screening systems. This approach is in line with the EFSA (European Food Safety Authority) opinion on pathogenic STEC published in 2013.

Keywords: STEC, O157, non-O157, *espK*, *espV*, *ureD*, *Z2098*, CRISPR

## INTRODUCTION

Shiga toxin-producing Escherichia coli (STEC) are important zoonotic pathogens comprising more than 400 serotypes (Beutin and Fach, 2014). A fraction of these serotypes can cause bloody diarrhea, which may progress to hemolytic uremic syndrome (HUS). This subset of STEC is termed Enterohemorrhagic E. coli (EHEC) (Beutin and Fach, 2014). STEC O157:H7 was the first pathogenic E. coli whose presence in foodstuffs was regulated. Today, non-O157 STEC infections have increased greatly, sometimes accounting for up to 70% of notified STEC infections (Brooks et al., 2005; Johnson et al., 2006; Gould et al., 2013; EFSA, 2014). Consequently, regulations in the US and in the EU have evolved to include some non-O157 STEC serogroups together with serotype O157:H7.

Successful implementation of these regulations in the food industry across the world requires effective detection methods that are both specific and sensitive. Detection of non-O157 STEC in foods is particularly challenging because these bacteria lack phenotypic characteristics that distinguish them invariably from the large number of non-STEC flora that share the same habitat. Additionally, they may be present in very low numbers, with a heterogeneous distribution in the food matrices. The environment of the food matrix may also trigger stress responses from the STEC strains and induce latent physiological states that further complicate detection (Wang et al., 2013).

In the absence of a clear definition of virulent STEC strains, the ISO/CEN 13136:2012 Technical Specification (ISO, 2012) and US MLG5B.05 (USDA-FSIS, 2014) use a stepwise approach, comprising an initial screening step for virulence genes (Shiga toxin genes, stx, and intimin gene, eae), followed by testing of O-serogroup-specific gene markers. Because the stx and eae genes can be independently present in a number of non-pathogenic strains of E. coli and other Enterobacteriaceae, the first screening step generates numerous signals from samples that do not necessarily contain a true EHEC strain. These stx/eae positive enrichments must then be subjected to a second screening targeting the O-group gene markers. As testing is completed on enrichment broths that contain a mixture of different cells, the different target gene signals (i.e., stx, eae, and the O-group markers) may arise from different individual non-pathogenic strains. Lastly, it is necessary to perform an isolation step to confirm by PCR the presence of the different markers in a single isolate.

We have previously shown that combining the detection of espK with either espV, ureD, or Z2098 is a highly sensitive and specific approach for identifying the top seven clinically important EHEC serotypes in industrialized countries (Delannoy et al., 2013a). These markers were shown to be preferentially associated with E. coli strains carrying stx and eae genes, known as typical EHEC, and could be used in conjunction with stx/eae screening to better identify samples that may be more likely to contain a true EHEC.

Recently, a new clone of STEC O26:H11 harboring stx2a and strongly associated with HUS has emerged (Bielaszewska et al., 2013). Characterization of O26:H11 stx2 strains circulating in France (Delannoy et al., 2015a,b) demonstrated that some stx2a or stx2d positive strains do not have any of the espK, espV, ureD, or Z2098 markers. Hence, such clones would evade a first detection step solely based on stx/eae and a combination of those genetic markers. Therefore, we identified a CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) sequence specific for this O26:H11 EHEC French clone (Delannoy et al., 2015a). Detection of this CRISPR sequence would advantageously be combined with stx and eae in the first screening step to identify this new O26:H11 clone. Furthermore, we have previously developed CRISPR-based real-time PCR assays able to detect the top seven EHEC serotypes and the German O104:H4 STEC clone responsible for a very large European STEC outbreak in 2011 (Delannoy et al., 2012a,b; Miko et al., 2013). These CRISPR PCR assays proved highly sensitive and specific when tested on a large collection of E. coli strains comprising various E. coli pathogroups (Delannoy et al., 2012a,b). Such CRISPR markers may substitute for O-group testing in a more targeted second step.

Different approaches have been tested to refine the detection systems for EHEC. Some involve detection of ecf1, a plasmid gene highly associated with E. coli strains that are positive for stx, eae, and ehxA (Livezey et al., 2015), while others involve detection of specific eae-subtypes (Bugarel et al., 2010). The top seven EHEC serotypes are exclusively associated with certain eae-subtypes (Oswald et al., 2000; Bugarel et al., 2010; Madic et al., 2010). Indeed, O157:H7 and O145:H28 serotypes are associated with the eae-gamma subtype; O26:H11 is associated with the eae-beta subtype; O103:H2, O121:H19, and O45:H2 are associated with the eae-epsilon subtype; and O111:H8 is associated with the eae-theta subtype. Real-time PCR assays targeting these eae-subtypes were previously developed and tested in raw milk cheese (Madic et al., 2011) and cattle feces (Bibbal et al., 2014). These could also be used as targets in a more targeted second step.
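The serotype-to-subtype associations above lend themselves to a simple lookup. The sketch below is illustrative only: the table restates the associations given in the text, while the names `EAE_SUBTYPE_BY_SEROTYPE`, `serogroup`, and `is_concordant` are hypothetical and not part of any published assay.

```python
# eae subtype associated with each of the top seven EHEC serotypes (from the text).
EAE_SUBTYPE_BY_SEROTYPE = {
    "O157:H7": "gamma",
    "O145:H28": "gamma",
    "O26:H11": "beta",
    "O103:H2": "epsilon",
    "O121:H19": "epsilon",
    "O45:H2": "epsilon",
    "O111:H8": "theta",
}


def serogroup(serotype):
    """Return the O-group part of a serotype, e.g. 'O26:H11' -> 'O26'."""
    return serotype.split(":")[0]


def is_concordant(o_group, detected_eae_subtypes):
    """True if the O-group is a top-seven group and its associated eae subtype was detected."""
    for serotype, subtype in EAE_SUBTYPE_BY_SEROTYPE.items():
        if serogroup(serotype) == o_group and subtype in detected_eae_subtypes:
            return True
    return False


# Hypothetical enrichment: O26 marker detected together with eae-beta and eae-theta.
print(is_concordant("O26", {"beta", "theta"}))
```

Because enrichments contain mixed populations, a concordant O-group/eae-subtype pair is at most presumptive evidence that both markers originate from the same strain.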

We attempted to develop an alternative real-time PCR-based approach to improve the detection of the clinically important EHEC by reducing the number of potential positive samples that require further confirmation of the O-antigen markers. We also aimed at reducing the number of samples for which isolation is attempted, as the isolation step is laborious, time-consuming, and not always successful (Wang et al., 2013). The objective of this project was to evaluate the discriminatory power of these various genetic markers to predict the presence of the top seven EHEC serogroups compared to that of the strategy proposed by the ISO/CEN TS13136:2012 (ISO, 2012) and MLG5B.05 (USDA-FSIS, 2014).

#### MATERIALS AND METHODS

### *E. coli* Control Strains

E. coli control strains used for this study (**Table 1**) comprised a panel of ATCC E. coli strains (n = 42) and E. coli reference strains derived from the BfR and Anses collections (n = 13). The origin and characteristics of the E. coli strains from BfR and Anses have been previously described (Delannoy et al., 2012a, 2013a). Cultivation of bacteria and preparation of DNA were performed as previously described (Delannoy et al., 2012a, 2013a).

### Beef Samples, Enrichment, and DNA Extraction

A set of 1739 beef samples composed of ground beef and carcasses was collected from routine screening using the GeneDisc array (Pall GeneDisc, Bruz, France) at the Veterinary Departmental Laboratory of Touraine, France during a 1-year period as well as in meat production plants. For this study, sampling was biased to get greater numbers of DNA samples positive for stx alone (n = 306), positive for stx and eae (n = 180), positive for eae alone (n = 200), and negative for both stx and eae (n = 1053). This sampling scheme does not represent the prevalence of STEC and EHEC in French beef. All samples were incubated in buffered peptone water (BioMerieux, Marcy l'Étoile, France) for 18–24 h at 37°C. After enrichment, DNA was extracted from 1 ml of enriched sample using the InstaGene matrix (Bio-Rad Laboratories, Marnes-La Coquette, France) following the manufacturer's instructions, and DNA was stored at −20°C until use. When samples were found positive for stx, eae, and rfbEO157, isolation of strains was attempted by local laboratories for confirmation of EHEC O157:H7. Following the recommendation of the French ministry of agriculture, the appropriate sanitary measures were taken in positive cases of EHEC O157:H7. Unfortunately, because this study was performed several months after the samples were collected, the original samples had not been conserved, so isolation could not be attempted from presumptive positives identified with the alternate methods described in this study.

#### TABLE 1 | PCR-screening of the genetic markers in STEC isolates used as reference.

#### High-Throughput Real-Time PCR

A LightCycler® 1536 (Roche, Meylan, France) was used to perform high-throughput real-time PCR amplifications as described previously (Delannoy et al., 2012a), except that 1 µl of sample DNA was used in each reaction for a final reaction volume of 2 µl. The thermal profile was modified as follows: 95°C for 1 min, followed by 45 cycles of 95°C for 0 s and 58°C for 30 s. All ramp rates were set to 2°C/s. E. coli gene targets used for the real-time PCR amplification and all primers and probes that have previously been described are reported in **Table 2**. An inhibition control (IC) was performed on each sample to check for potential inhibition of the PCR reaction due to intrinsic characteristics of the sample. The IC is a recombinant pBluescript II SK+ plasmid containing the dsb gene from Ehrlichia canis (Michelet et al., 2014). The plasmid was added to each sample at a concentration of approximately 0.3 pg/µl. Primers and probe specific for the E. canis dsb gene were used to detect the IC (Michelet et al., 2014).

#### RESULTS

#### Presence of *stx1*, *stx2*, *eae*, *espK*, *espV*, *Z2098*, and *ureD* in *E. coli* Strains

The presence of stx1, stx2, eae, espK, espV, Z2098, and ureD was tested by PCR in a panel of E. coli strains obtained from culture collections (**Table 1**). All E. coli strains used in this study were positive for the stx and eae genes and therefore can be considered as typical EHEC strains. Strains were associated with the following eae variants: eae-beta (O26:H11, O26:HND, O103:H11, O5, O118:H16, O123:H11), eae-gamma (O145:H28, O145:HND, O157:H7, O55:H7), eae-epsilon (O121:H19, O45:H2, O103:H2, O103:H11), and eae-theta (O111:H8, O111:HND, O103:H25). The eae subtype of strain CB10528 (O172:H25) could not be determined. Distribution of the genetic markers espK, espV, Z2098, and ureD in the 55 EHEC strains is shown in **Table 1**. Overall, the genetic markers investigated were detected in most of the EHEC strains examined. With the exception of the new O26:H11 stx2-positive strain (CB14699) and one strain of serotype O157:H7 (CDC-C984), all of the strains were positive for espK. The espV gene was not detected in two E. coli O26:H11 strains (ATCC2196 and CB14699), in O45:H2, O118:H16, O5, or in O123:H11 isolates. The Z2098 gene marker tested negative in only a few strains: the new O26:H11 stx2 positive clone (strain CB14699) and in the O55:H7 strains. The ureD gene was absent in the new O26:H11 stx2 positive clone (strain CB14699) and in the O55:H7 strains. It was also absent from a few O103:H2 isolates (strains PMK5, CB12062, CB12092). In summary, all of the strains were positive for one or more of the genes espK, espV, Z2098, and ureD, with the exception of the new O26:H11 stx2 positive clone (CB14699).

#### Screening Beef Enrichments for *stx1*, *stx2*, *eae*, *espK*, *espV*, *Z2098*, and *ureD*

A set of 1739 beef samples was screened for the presence of stx1, stx2, eae, espK, espV, Z2098, and ureD. The stx genes were detected in 27.95% of the samples (486/1739). The eae gene was detected in 21.85% of the samples (380/1739). The two genes were simultaneously present in 10.35% of the samples (180/1739). The genes espK and/or Z2098 were detected in 7.42% of the samples (129/1739) (**Figure 1A**), while espK and/or espV were found in 130 samples (7.48%) (**Figure 1B**) and espK and/or ureD was recorded in 145 samples (8.34%) (**Figure 1C**).
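As a cross-check, the percentages above can be recomputed from the sample counts given in Materials and Methods. This is a minimal Python sketch; the variable and function names are illustrative assumptions.

```python
# Sample counts per screening category, as reported in Materials and Methods.
STRATA = {"stx_only": 306, "stx_and_eae": 180, "eae_only": 200, "neither": 1053}
TOTAL = sum(STRATA.values())  # all beef enrichments tested

# Samples positive for espK and/or a second marker, as reported in the text.
MARKER_POSITIVE = {"espK_or_Z2098": 129, "espK_or_espV": 130, "espK_or_ureD": 145}


def pct(count, total=TOTAL):
    """Percentage of `total`, rounded to two decimals."""
    return round(100 * count / total, 2)


stx_total = STRATA["stx_only"] + STRATA["stx_and_eae"]  # every stx-positive sample
eae_total = STRATA["eae_only"] + STRATA["stx_and_eae"]  # every eae-positive sample

prevalence = {
    "stx": pct(stx_total),
    "eae": pct(eae_total),
    "stx_and_eae": pct(STRATA["stx_and_eae"]),
}
prevalence.update({name: pct(n) for name, n in MARKER_POSITIVE.items()})

for name, value in prevalence.items():
    print(f"{name}: {value}%")
```

The four categories sum to the 1739 samples tested, and the recomputed proportions agree with the figures quoted in the text.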

By using the stx and eae genes for screening beef samples, following the ISO/CEN TS13136 and MLG5B.05 methods, 180 samples (10.35%) were recorded as stx/eae positive and should, therefore, be subjected to a second screening step for EHEC serogroups. Pre-screening of stx/eae positive samples for espK with either Z2098 (alternate method A) or ureD (alternate method C) provided a 60% reduction in the number of samples that should be submitted to a second screening targeting the O-group gene markers (n = 71). Using the alternate method B (stx/eae/espK/espV), 80 of the 1739 samples (4.6%) needed to be submitted to a further screening for serogroup determination, which represents a 55% reduction in the number of samples subjected to a second screening.

**Figure 2** shows the comparison of the alternative methods A–C with the 180 beef samples that tested positive for both the stx and eae genes. A total of 90 stx and eae positive beef samples tested negative for espK, espV, Z2098, and ureD (sector 8, **Figure 2**). However, the inclusion of the CRISPRO26:H11 PCR revealed 12 espK, espV, Z2098, and ureD negative samples that were positive for both the new CRISPRO26:H11 clone and eae-beta, the variant of the intimin gene carried by EHEC O26:H11 (see below).

### Screening Beef Samples for *stx1*, *stx2*, *eae*, and CRISPRO26:H11

The 180 beef samples that are positive for stx and eae were also tested by the CRISPRO26:H11 PCR test (SP\_O26-E, as described in Delannoy et al., 2015a), that detects the new EHEC O26:H11 French clone (stx2 and eae positive, espK, espV, Z2098, ureD negative). Among the 180 samples tested, 20 stx/eae positive samples were also found to be positive for SP\_O26-E and should therefore be submitted to a further screening for serogroup determination. Interestingly, most of them (16/20) were found positive for stx2, 2 samples were positive for stx1 only and 2 had an unknown stx subtype. Twelve of these 20 were also negative for espK, espV, Z2098, and ureD (**Figure 2**, sector 8). Finally, when combining the use of CRISPRO26:H11 (SP\_O26-E PCR test) with alternate method A, 85 samples should be submitted to a second screening targeting the O-group gene markers. When using alternate methods B and C with the CRISPRO26:H11 PCR test, 92 and 86 samples should be submitted to the second step, respectively.
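The effect of each alternate method on the screening workload follows directly from the counts reported above. The sketch below recomputes the reductions relative to the 180 stx/eae positive samples; it assumes that the count of 71 applies to methods A and C individually, and the names used are illustrative.

```python
BASELINE = 180  # stx/eae positive enrichments under the reference methods

# Samples retained for O-group testing, as reported in the text.
# Assumption: n = 71 applies to method A and to method C separately.
RETAINED = {
    "A (espK/Z2098)": {"alone": 71, "with_crispr": 85},
    "B (espK/espV)": {"alone": 80, "with_crispr": 92},
    "C (espK/ureD)": {"alone": 71, "with_crispr": 86},
}


def reduction_pct(retained, baseline=BASELINE):
    """Percentage of baseline samples no longer needing a second screening."""
    return round(100 * (baseline - retained) / baseline, 2)


reductions = {
    method: {label: reduction_pct(n) for label, n in counts.items()}
    for method, counts in RETAINED.items()
}

for method, values in reductions.items():
    print(method, values)
```

Rounded to two decimals, the reductions without the CRISPR assay are 60.56% (A and C) and 55.56% (B), which the text reports as 60% and 55%. With the CRISPR assay they become 52.78% (A), 48.89% (B), and 52.22% (C); the abstract quotes 48.88% for method B, which appears to be the same value truncated rather than rounded.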

#### Screening Beef Samples for *stx1*, *stx2*, *eae*, and the Top Seven EHEC Serogroups

As recommended in the ISO/TS 13136 (EU) and MLG5B.05 (US) reference methods, the 180 stx and eae positive samples were tested for the top seven EHEC serogroups. Among these, 115 samples were positive for at least one of the top seven US regulated EHEC serogroups and 99 were positive for at least one of the top five EHEC serogroups screened by the European ISO/TS 13136 method (data not shown). The most frequently found serogroup was O103 (n = 71), followed by O26 (n = 45), O121 (n = 38), O157 (n = 18), O45 (n = 14), O145 (n = 6), and O111 (n = 1). Interestingly, 30 samples were positive for 2 serogroups, 11 were positive for 3 serogroups, and 7 for more than 4 serogroups. In total, 41.74% (48/115) of the beef samples tested positive for more than one O-group marker.
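The extent of multiple O-group signals can be tallied from the counts above. The following is a rough sketch that assumes all per-serogroup counts refer to the same 115 positive samples; the variable names are illustrative.

```python
# Number of samples positive for each serogroup marker, as reported in the text.
SEROGROUP_HITS = {"O103": 71, "O26": 45, "O121": 38, "O157": 18,
                  "O45": 14, "O145": 6, "O111": 1}
POSITIVE_SAMPLES = 115  # positive for at least one top-seven serogroup

# Samples positive for several serogroups, keyed by the category used in the text.
MULTI = {"two": 30, "three": 11, "highest_category": 7}

multi_total = sum(MULTI.values())
multi_share = round(100 * multi_total / POSITIVE_SAMPLES, 2)

total_hits = sum(SEROGROUP_HITS.values())
single_positive = POSITIVE_SAMPLES - multi_total
# Detections explained by single, double, and triple positives.
explained = single_positive + 2 * MULTI["two"] + 3 * MULTI["three"]
remaining_hits = total_hits - explained

print(multi_total, multi_share, total_hits, remaining_hits)
```

Under that assumption, the 193 marker detections leave 33 detections for the 7 samples in the highest category, about 4.7 per sample on average, which illustrates how heavily mixed some enrichments are.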

#### Screening Beef Samples for *stx1*, *stx2*, *eae*, and the *eae* Subtypes

The 180 stx and eae positive beef samples were further tested for the eae subtypes gamma, beta, epsilon and theta, which are associated with one or more of the top seven EHEC serogroups. Among these 180 stx and eae positive samples, 135 tested positive for at least one of the four eae-subtypes: gamma, beta, epsilon and theta (data not shown). The most frequently detected eae-subtypes were eae-beta (n = 94) and eae-theta (n = 65), followed by eae-epsilon (n = 15), and eae-gamma (n = 5). Trying to correlate the eae subtype with the serogroup, we identified 51 beef samples for which at least one serogroup was associated with the corresponding related eae-subtype. Among these 51 samples, 6 were positive with multiple serogroups.

### Comparison of Alternative Methods A–C for Screening Beef Samples

We identified 62 samples that were recorded positive with the three alternative methods A–C (sector 3, **Figure 2**) and therefore must be submitted to a second screening targeting the O-group gene markers. These samples are strongly suspected to contain typical EHEC and five of these are also suspected to contain the new CRISPRO26:H11 clone. In addition, we found 28 samples (sectors 1, 2, 4, 5, 6, 7 from **Figure 2**) that were positive by one or two alternative methods only. PCR results obtained for these 28 samples for eae subtypes and top seven O-groups were as follows: only one sample among the 5 samples of sector 1 (**Figure 2**) was positive for the association of O26 and eae-beta, but it tested negative for the different CRISPRO26:H11 assays targeting EHEC O26:H11 [SP\_O26-C, SP\_O26-D, and SP\_O26-E, as described in Delannoy et al., 2012a, 2015a (data not shown)]. The only sample from sector 4 (**Figure 2**) and each of the four samples from sectors 6 and 7 (**Figure 2**) were found negative by the association of the top seven serogroups and the corresponding eae-subtypes. From sector 2 (**Figure 2**), only one out of three samples was found positive for the new CRISPRO26:H11 clone (this sample tested positive by PCR for O26, eae-beta, SP\_O26-E; and was also stx2 positive which is consistent with the new clone). The two other samples were not suspected as "presumptive positive" based on the eae-subtypes and top seven EHEC serotypes determination. In sector 5 (**Figure 2**), three of the eleven samples were found "presumptive positive" for EHEC O26 (among them two were suspected to be positive for the new O26:H11 clone). Finally, "presumptive positive" samples were recorded in sectors 2, 3, and 5, which leads us to consider alternate method B the best of the alternate methods for screening EHEC strains. In order to complete the screening of EHEC and not exclude the new virulent O26 clone, alternate method B should be combined with the new CRISPRO26:H11 assay.
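The screening logic recommended above can be summarized as a decision rule. The sketch below is one possible reading, assuming a sample is retained for O-group testing when it is positive for stx and eae and additionally for espK and/or espV, or for the CRISPRO26:H11 marker; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichmentResult:
    """Real-time PCR signals for one beef enrichment (True = marker detected)."""
    stx: bool
    eae: bool
    espK: bool = False
    espV: bool = False
    crispr_o26_h11: bool = False


def needs_o_group_screening(result):
    """Apply the assumed reading of alternate method B plus the CRISPR marker."""
    # Reference-method criterion: both stx and eae must be detected.
    if not (result.stx and result.eae):
        return False
    # Additional filter: espK and/or espV, or the marker of the new O26:H11 clone.
    return result.espK or result.espV or result.crispr_o26_h11


# Hypothetical enrichments illustrating the three possible outcomes.
typical = EnrichmentResult(stx=True, eae=True, espK=True)
new_clone = EnrichmentResult(stx=True, eae=True, crispr_o26_h11=True)
mixed_flora = EnrichmentResult(stx=True, eae=True)

for sample in (typical, new_clone, mixed_flora):
    print(sample, needs_o_group_screening(sample))
```

Only the first two hypothetical samples would proceed to O-group testing; the third, positive for stx and eae alone, would be set aside at the first step.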

## DISCUSSION

A STEC seropathotype classification, based upon the association of serotypes with human epidemics, bloody diarrhea, and HUS, has been developed as a tool to assess the clinical and public health risks associated with non-O157 EHEC and STEC strains (Karmali et al., 2003). This approach has been of considerable value in defining pathogenic STEC serotypes of importance in cases of human infection (EFSA, 2007; Coombes et al., 2011); however, it does not resolve the underlying problem with strains that have not yet been fully serotyped. Furthermore, classification based solely on seropathotype is inadequate, as illnesses linked to STEC serotypes other than O157:H7 are on the rise worldwide, indicating that some of these organisms may be emerging pathogens. In 2013, the Panel on Biological Hazards (BIOHAZ) of the European Food Safety Authority (EFSA) published a Scientific Opinion on "VTEC-seropathotype and scientific criteria regarding pathogenicity assessment" (EFSA, 2013). This document has focused attention on the applicability of the Karmali seropathotype concept. The document does not provide a scientific definition of a pathogenic STEC but states that the seropathotype classification of Karmali et al. (2003) does not define pathogenic STEC nor does it provide an exhaustive list of pathogenic serotypes. It is not possible to fully define human pathogenic STEC or identify factors for STEC that absolutely predict the potential to cause human disease, but strains positive for Shiga-toxin (in particular the stx2 genes) and eae (intimin production) genes are associated with a higher risk of more severe illness than other virulence factor combinations (EFSA, 2013). Severe disease, and particularly HUS, is linked to certain serotypes and strains, and this link must be the result of particular genetic factors or combinations of factors that have yet to be determined. A new molecular classification scheme has been proposed by the EFSA Panel on Biological Hazards (BIOHAZ) that relies more on virulence factors than seropathotypes. It is proposed that STEC serogroups O157, O26, O103, O145, O111, and O104 in combination with stx and eae or stx and both aaiC (secreted protein of EAEC) and aggR (plasmid-encoded regulator) genes should be considered as presenting a potentially higher risk for bloody diarrhea and HUS; such strains are categorized in group I (EFSA, 2013). For any other serogroups in combination with the same genes, the potential risk is regarded as high for diarrhea, but currently unknown for HUS; such strains are categorized in group II. The inclusion of aaiC and aggR genes in the proposed molecular approach is due to the O104:H4 outbreak, which was caused by a highly virulent strain (Frank et al., 2011). This appears to be an exceptional event (Prager et al., 2014), and future surveillance will provide data that may be used to review the inclusion of these virulence factors; a recent study showed that French cattle are not a reservoir of the highly virulent enteroaggregative Shiga toxin-producing E. coli of serotype O104:H4 (Auvray et al., 2012).

#### TABLE 2 | Primer and probe sequences used in this study.

<sup>a</sup>F, forward primer; R, reverse primer; P, probe.

<sup>b</sup>All probes were labeled with 6-HEX or 6-FAM and BHQ1 (Black Hole Quencher).

<sup>c</sup>Oligonucleotide described by Perelle et al. (2004).

<sup>d</sup>Oligonucleotide described by Nielsen and Andersen (2003).

<sup>e</sup>Oligonucleotide described by Delannoy et al. (2013a).

<sup>f</sup>Oligonucleotide described by Delannoy et al. (2013b).

<sup>g</sup>Oligonucleotide described by Delannoy et al. (2012a).

<sup>h</sup>Oligonucleotide described by Delannoy et al. (2015a).

<sup>i</sup>Oligonucleotide described by Perelle et al. (2005).

<sup>j</sup>Oligonucleotide described by Bugarel et al. (2010).

<sup>k</sup>Oligonucleotide described by Fratamico et al. (2009).

Following the EFSA opinion, several laboratories have attempted to develop detection and identification methods for strains of groups I and II, and although substantial progress has been made, a practical method of pathogenic STEC detection has yet to be validated. Molecular methods for screening EHEC O157 and non-O157 in beef products currently rely on the molecular detection of stx, eae, and the top five or top seven EHEC serogroups in mixed bacterial enrichments as described in the ISO/TS 13136 (EU) and MLG5B.05 (US) reference methods (ISO, 2012; USDA-FSIS, 2014), followed by attempted isolation of the correct strain. These approaches bear the disadvantage that food samples that carry mixtures of stx-negative E. coli carrying eae genes together with eae-negative STEC falsely indicate the presence of EHEC when analyzed for the genes mentioned above (Beutin et al., 2009; Fratamico and Bagi, 2012; Wasilenko et al., 2014). Hence, screening food enrichment broths for stx and eae genes may cause needless disruption and costs for food producers when the risk of low-virulence STEC is overestimated. The erroneous identification of cucumbers as the source in the outbreak in Germany in 2011 cost European fruit and vegetable producers approximately €812 million (Commission of the European Communities, 2011). Thus, new DNA targets that unambiguously identify typical EHEC strains (stx-positive and eae-positive E. coli strains) in complex samples are a desirable goal. This new panel of genes might include novel genetic markers recently identified as highly associated with pathogenic STEC strains (Delannoy et al., 2013a,b). Specifically, the associations of espK with either espV, ureD, or Z2098 were found to be the best combinations for more specific and sensitive detection of the top seven EHEC strains, allowing detection of 99.3–100% of these strains. In addition, detection of 93.7% of the typical EHEC strains belonging to serotypes other than the top seven offered the possibility of identifying new emerging typical EHEC strains (Delannoy et al., 2013a,b). Conversely, these different combinations of genetic markers were very rarely associated with STEC (1.6–3.6%) and with non-pathogenic E. coli (1.1–3.4%). The objective of the present study was to refine the EHEC screening systems for testing beef samples via the incorporation of these additional gene targets espK, espV, ureD, and Z2098 in the detection scheme.
In addition to these four targets, we included a CRISPRO26:H11 target in the detection scheme that has been designed for detecting a new clone of STEC O26:H11 harboring stx2 only and strongly associated with HUS in France (Delannoy et al., 2015a,b). This new EHEC O26:H11 French clone is positive for eae (eae-beta subtype) but does not contain any of the above markers espK, espV, ureD, or Z2098. The fact that the new French clone is missing espK, espV, ureD, and Z2098 does not mean that these genes are not required EHEC virulence factors. We have previously demonstrated that they are significantly associated with EHEC strains and not the other pathotypes. It does suggest, however, that the new French clone may harbor a different set of virulence genes (just like atypical EHEC lack eae), which can be investigated through additional genomic studies. In the meantime, although CRISPR sequences are not virulence factors per se, we have demonstrated that certain specific spacers are associated with EHEC strains from the top seven serotypes and thus can provide specific and sensitive detection of the top seven EHEC strains (Delannoy et al., 2012a,b).

To validate the pertinence of this new approach, we screened 1739 beef samples and collected 180 samples that tested positive for both stx and eae and must therefore be subjected to a second screening step for serogroup determination according to the ISO/TS 13136 (EU) and MLG5B.05 (US) reference methods (ISO, 2012; USDA-FSIS, 2014). Among these 180 samples, 135 tested positive for at least one of the four eae-subtypes gamma, beta, epsilon, and theta, which are related to typical EHEC and in particular to those of the top seven serogroups (Bugarel et al., 2010). Introducing the eae-subtypes in the screening step reduced by 25% the number of stx/eae positive samples that should be subjected to further screening for serogroup determination. A more significant refinement of the first EHEC screening step was achieved by including the espK, espV, ureD, Z2098, and CRISPRO26:H11 targets in the detection scheme. Thus, the number of samples subjected to further screening for serogroup determination was reduced by 52.78% and 52.22% using, respectively, the alternate method A (stx/eae/espK/Z2098) or method C (stx/eae/espK/ureD) in combination with the CRISPRO26:H11 PCR assay detecting the new O26 clone. A reduction by 48.88% of the number of "presumptive positive" samples was obtained using the alternate method B (stx/eae/espK/espV) in association with the CRISPRO26:H11 PCR assay. Given the additional information on the association of the top seven serogroups and the eae-subtypes, we determined the last approach, i.e., method B (stx/eae/espK/espV) in association with the CRISPRO26:H11 PCR assay, to be the best approach to narrow down the EHEC screening step in beef samples. Using such an approach, 92 samples must be subjected to further screening for serogroup determination vs. 180 with the conventional stx/eae approach used in the ISO/TS 13136 (EU) and MLG5B.05 (US) reference methods (ISO, 2012; USDA-FSIS, 2014). This constitutes a significant reduction (almost 50%) of the number of samples subjected to a second screening targeting the O-group gene markers. Moreover, this approach is in line with the EFSA opinion that has identified STEC strains of groups I and II as presenting the potentially higher risk for diarrhea and HUS (EFSA, 2013). Identification of additional gene markers, i.e., espK, espV, and CRISPRO26:H11, to better distinguish typical EHEC from other E. coli pathogroups would substantially enhance the power of EHEC test systems by providing a significant reduction of "presumptive positive" results in beef samples. Such a new approach would provide the agroindustry with a novel method for tracking EHEC in food samples. This work should be considered with interest when drawing up the outline of a future standard that will follow the recommendations of EFSA.
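The reductions quoted above follow directly from the sample counts given in the text. As a minimal sketch (the function name is ours, not part of either reference method), the percentages can be recomputed from the 180 stx/eae-positive samples, the 135 eae-subtype positives, and the 92 samples retained by method B:

```python
def reduction_percent(baseline: int, remaining: int) -> float:
    """Percentage of baseline samples that no longer need second-step screening."""
    return 100.0 * (baseline - remaining) / baseline


baseline = 180         # stx/eae-positive samples from the conventional first screen
eae_subtype_pos = 135  # positive for eae-gamma, -beta, -epsilon, or -theta
method_b_pos = 92      # stx/eae/espK/espV plus the CRISPRO26:H11 assay

print(f"eae subtypes: {reduction_percent(baseline, eae_subtype_pos):.2f}%")
print(f"method B:     {reduction_percent(baseline, method_b_pos):.2f}%")
```

Method B works out to 88/180, i.e., just under 48.9%, which corresponds to the 48.88% reported above when the value is truncated rather than rounded.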

### AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: SD, PF. Performed the experiments: SD, BC, SI, HW, JD, IB. Analyzed the data: SD, BC, PF. Contributed reagents/materials/analysis tools: SD, LB, JD, IB, PF. Wrote the paper: SD, PF. Critical revision of the paper for important intellectual content: SD, BC, SI, HW, LB, JD, IB, PF.

### FUNDING

The project was partially financed by the French "joint ministerial program of R&D against CBRNE risks" (Grant number C17609-2).

### REFERENCES


Escherichia coli in food. Foodborne Pathog. Dis. 10, 665–677. doi: 10.1089/fpd.2012.1448

Wasilenko, J. L., Fratamico, P. M., Sommers, C., DeMarco, D. R., Varkey, S., Rhoden, K., et al. (2014). Detection of Shiga toxin-producing Escherichia coli (STEC) O157:H7, O26, O45, O103, O111, O121, and O145, and Salmonella in retail raw ground beef using the DuPont™ BAX® system. Front. Cell Infect. Microbiol. 4:81. doi: 10.3389/fcimb.2014.00081

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Delannoy, Chaves, Ison, Webb, Beutin, Delaval, Billet and Fach. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States

Rebecca L. Lindsey<sup>1</sup>, Hannes Pouseele<sup>2</sup>, Jessica C. Chen<sup>3</sup>, Nancy A. Strockbine<sup>1</sup> and Heather A. Carleton<sup>1</sup>\*

<sup>1</sup> Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA, <sup>2</sup> Applied Maths NV, Sint-Martens-Latem, Belgium, <sup>3</sup> IHRC Inc., Atlanta, GA, USA

#### Edited by:

Pina Fratamico, United States Department of Agriculture, Agricultural Research Service, USA

#### Reviewed by:

Michel Drancourt, Aix Marseille Université, France; Alan Leonard, Florida Institute of Technology, USA

> \*Correspondence: Heather A. Carleton hcarleton@cdc.gov

#### Specialty section:

This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology

Received: 02 March 2016; Accepted: 06 May 2016; Published: 23 May 2016

#### Citation:

Lindsey RL, Pouseele H, Chen JC, Strockbine NA and Carleton HA (2016) Implementation of Whole Genome Sequencing (WGS) for Identification and Characterization of Shiga Toxin-Producing Escherichia coli (STEC) in the United States. Front. Microbiol. 7:766. doi: 10.3389/fmicb.2016.00766

Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen capable of causing severe disease in humans. Rapid and accurate identification and characterization techniques are essential during outbreak investigations. Current methods for characterization of STEC are expensive and time-consuming. With the advent of rapid and cheap whole genome sequencing (WGS) benchtop sequencers, the potential exists to replace traditional workflows with WGS. The aim of this study was to validate tools to perform reference identification and characterization from WGS for STEC in a single workflow within an easy-to-use, commercially available software platform. Publicly available serotype, virulence, and antimicrobial resistance databases were downloaded from the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) and integrated into a genotyping plug-in with in silico PCR tools to confirm some of the virulence genes detected from WGS data. Additionally, down sampling experiments on the WGS sequence data were performed to determine a threshold for sequence coverage needed to accurately predict serotype and virulence genes using the established workflow. The serotype database was tested on a total of 228 genomes and correctly predicted from WGS 96.1% of O serogroups and 96.5% of H serogroups identified by conventional testing techniques. A total of 59 genomes were evaluated to determine the threshold of coverage to detect the different WGS targets: 40 were evaluated for serotype and virulence gene detection and 19 for the stx gene subtypes. For serotype, 95% of the O and 100% of the H serogroups were detected at > 40x and ≥ 30x coverage, respectively. For virulence targets and stx gene subtypes, nearly all genes were detected at > 40x, though some targets were 100% detectable from genomes with coverage ≥20x. The resistance detection tool was 97% concordant with phenotypic testing results. With isolates sequenced to > 40x coverage, the different databases accurately predicted serotype, virulence, and resistance from WGS data, providing a faster and cheaper alternative to conventional typing techniques.

Keywords: Escherichia coli, whole genome sequence, STEC, next generation sequencing, stx subtyping, Escherichia coli serotypes

## INTRODUCTION

Foodborne bacteria pose a major threat to public health. To prevent widespread infections due to these bacteria and to detect outbreaks, rapid and accurate identification and subtyping are key. Shiga toxin-producing Escherichia coli (STEC) is an important foodborne pathogen estimated to cause over 265,100 illnesses each year in the United States (Scallan et al., 2011). STEC infection may present as mild gastroenteritis, diarrhea, grossly bloody diarrhea, or hemolytic uremic syndrome (HUS), and may be fatal. In the United States an estimated 96,500 O157 STEC and 168,690 non-O157 STEC infections occur each year and result in over 3600 hospitalizations and 30 deaths annually (Scallan et al., 2011).

STEC is a nationally reportable disease in the U.S., and clinical laboratory requirements for forwarding the STEC-positive isolate or specimen to the public health laboratory vary by state. Once a STEC-positive isolate or specimen arrives at the local or state public health laboratory, it undergoes further characterization. These isolates are routinely subtyped using pulsed-field gel electrophoresis (PFGE) and submitted to the national surveillance network for foodborne bacteria, PulseNet, as well as characterized using conventional techniques for phenotype, serotype, and virulence. Workflows at public health laboratories for characterization of STEC can vary. Current methods for characterization of STEC in the Enteric Diseases Laboratory Branch at the Centers for Disease Control and Prevention include panels of 22–49 phenotypic tests for identification, agglutination assays with 270 pooled and individual O- and H-specific antisera for serotyping (determination of 188 O and 53 H antigens), panels of five to 10 PCR assays for virulence profiling, and broth microdilution assays for antimicrobial susceptibility testing. These methods require complex workflows, expensive reagents, labor-intensive quality control procedures, and specialized training, and typically take 1–3 weeks to complete. Therefore, a need exists to simplify workflows and reduce the costs and time associated with subtyping and characterization of STEC, possibly through whole genome sequencing.

Benchtop sequencing instruments make whole genome sequencing (WGS) possible in a public health laboratory setting. These machines are relatively easy to operate; the cost per isolate is low; and turnaround time for generating WGS data is within days rather than the 1–3 weeks required for current, conventional methods. Since the serotype, virulence, and antimicrobial resistance profile may be predicted from the genome sequence, WGS may replace almost all reference characterization of STEC in the public health laboratory. Additionally, the sequence data provide a level of strain discrimination and precision that is better than any subtyping method hitherto used for outbreak detection and investigation. Thus, almost all characterization of STEC in the public health laboratory can be replaced by WGS using one single efficient workflow. However, converting the WGS data into interpreted output that is useful for public health professionals is a real challenge.

To address this challenge, the Center for Genomic Epidemiology (CGE) (www.genomicepidemiology.org) has developed a suite of web-based tools for in silico analysis of bacterial whole genome sequences (Cosentino et al., 2013; Joensen et al., 2014). These tools include a serotype detection procedure (SerotypeFinder) and resistance and virulence prediction tools (ResFinder and VirulenceFinder) for analysis of E. coli and other bacterial WGS data (Zankari et al., 2013; Kleinheinz et al., 2014; Joensen et al., 2015). To characterize an isolate, WGS data are uploaded to the website, and depending on the analysis requested, a report of the isolate's serotype, virulence, and resistance gene content is returned within several minutes to hours. Since many WGS analysis tools will accept data of any quality, it is important to understand the data quality requirements for the information being sought in order to interpret negative results correctly.

Although the CGE tools are useful for analysis in a setting where the isolate throughput is low and data analysis is centralized, for WGS analysis tools to be effective in a public health laboratory setting that processes tens to hundreds of WGS isolate sets per week, all the tools need to be merged into a single platform that performs WGS quality assessment and can also be used with a database that includes sufficient patient and sample information about the isolates to be able to interpret them in the proper epidemiological context, e.g., the outbreak setting. Moreover, the platform needs to be simple and user-friendly so that it may be used by public health professionals with limited bioinformatics skills. While numerous commercial and public domain software packages are available to analyze WGS data, very few combine databasing, WGS, and other analytical capabilities, which are highly desirable in national and international laboratory surveillance networks. One exception is BioNumerics® v7.5 (Applied Maths, Austin, TX), which is a commercial, customizable WGS quality assessment, analysis, and database software package that may be used for all these purposes. The serotype, virulence, and resistance gene detection tools from CGE, together with in silico PCR tools to confirm results from the virulence prediction tools, can be integrated into a single push-button tool in BioNumerics® v7.5, the genotyping plug-in. This plug-in can be used in the same database that contains the sequence data for subtyping for surveillance purposes and only requires a de novo assembled genome. Therefore, within one software platform, reference characterization and WGS analysis for outbreak detection can be performed rapidly and with little bioinformatics training for the user.

In this study, we performed validation of a genotyping plug-in within BioNumerics for identification of O and H genes using a diversity set of nearly 200 genomes. We further demonstrated the utility of the detection tools within the genotyping plug-in on isolates that were sequenced in-house and tested by traditional methods for serotype, virulence gene content, and antimicrobial susceptibility. In addition, we down-sampled the sequence reads of the latter set to determine the minimum genome coverage needed for the program to detect the intended target genes, and we present in silico PCR tools to confirm selected results of the virulence detection tool.

### MATERIALS AND METHODS

### Whole Genome Sequencing and Analysis

Validation of the genotyping plug-in within BioNumerics version 7.5 was performed using whole genome sequences of nearly 200 in-house sequenced and publicly available STEC genomes for which conventional serotyping results were also available. The sequence data were generated on PacBio, Roche 454, and Illumina sequencing platforms (Table S1). An additional 17 genomes of Shigella and corresponding antimicrobial susceptibility testing data were included to evaluate the resistance finder tool. Moreover, a set of 59 isolates with traditional serotype and virulence PCR results performed in house were sequenced on the Illumina MiSeq or HiSeq sequencer platforms and selected for the down sampling experiment (Table S1). For in-house sequenced isolates, DNA was extracted using the Qiagen Blood and Tissue kit, libraries were prepared using NexteraXT (MiSeq) or NEB Next (HiSeq), and sequenced on the MiSeq or HiSeq using 2 × 250 bp chemistry. Sequence quality was evaluated on a per genome basis using BioNumerics® version 7.5. All genomes passed the basic quality metrics for raw sequence data from Illumina sequencers: an average Qscore >30 in both reads and at least 40x average coverage, assuming an expected genome size for E. coli of 5 Mb. Read files of in-house generated sequence data were uploaded to NCBI SRA using the BioNumerics v.7.5 NCBI uploader (see Table S1). Genomes were processed through the BioNumerics Calculation Engine for de novo assemblies using the wgMLST client plug-in. The assembly was done using SPAdes version 3.5.0 integrated into the wgMLST plug-in, and basic assembly metrics were calculated and used for quality assessment.
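The raw-read quality gate described above (average Q score above 30 in both reads, at least 40x average coverage of a 5 Mb genome) can be written down as a short check. The sketch below is illustrative only and is not the BioNumerics implementation; the function names and the example run (500,000 read pairs) are assumptions.

```python
GENOME_SIZE_BP = 5_000_000  # expected E. coli genome size used in the text
MIN_COVERAGE = 40.0         # minimum average coverage
MIN_MEAN_Q = 30.0           # average Q score must exceed this in each read direction


def average_coverage(total_bases: int, genome_size: int = GENOME_SIZE_BP) -> float:
    """Average depth = sequenced bases divided by expected genome size."""
    return total_bases / genome_size


def passes_qc(total_bases: int, mean_q_read1: float, mean_q_read2: float) -> bool:
    """Apply the coverage and Q-score criteria to one isolate."""
    return (
        average_coverage(total_bases) >= MIN_COVERAGE
        and mean_q_read1 > MIN_MEAN_Q
        and mean_q_read2 > MIN_MEAN_Q
    )


# Hypothetical isolate: 500,000 read pairs of 2 x 250 bp
total = 500_000 * 2 * 250
print(average_coverage(total))        # 50.0
print(passes_qc(total, 34.1, 31.5))   # True
print(passes_qc(total, 34.1, 29.0))   # False: read 2 fails the Q-score criterion
```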

### Analysis Using Genotyping Plug-in

Assembled sequence data were analyzed using the genotyping plug-in. The genotyping plug-in contains databases for serotype, virulence, and resistance prediction (consisting of annotated allelic variants for genes encoding serotype, virulence factors, and antimicrobial resistance), and for plasmid and prophage detection, obtained from the Center for Genomic Epidemiology (DTU, Lyngby, Denmark) (https://cge.cbs.dtu.dk/services/data.php). The genotyping plug-in also contains an in silico PCR tool for the detection of Shiga toxin gene subtypes and virulence genes using previously published primers (Paton and Paton, 1998; Scheutz et al., 2012). The various "finder" tools use a blast-based approach to detect the genes of interest in the de novo assembled genome and subsequently identify them against the appropriate reference database. Detection parameters were set to 90% sequence identity and 60% sequence coverage. As a quality metric and a guard against blindly extrapolating the serotype, virulence, or resistance prediction, a discrimination score is calculated for each similarity-based association, indicating how well the closest known allele in the respective database fits the sample data relative to the runner-up allele. The in silico PCR tools, mimicking the wet lab PCR process, excise a particular part of the genome, defined by forward and reverse primer pairs. In detecting a primer, at most 1 mismatch was allowed.
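The in silico PCR step described above can be illustrated with a small sketch: locate the forward primer and the reverse complement of the reverse primer in an assembled sequence, tolerating at most one mismatch per primer, and excise the region between them. This is not the plug-in's code; the function names, the toy template, and the toy primers are hypothetical, and for brevity the sketch searches only one strand of the template.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(template: str, primer: str, max_mismatch: int = 1) -> list[int]:
    """Start positions where the primer matches with <= max_mismatch substitutions."""
    k = len(primer)
    hits = []
    for i in range(len(template) - k + 1):
        mismatches = sum(a != b for a, b in zip(template[i:i + k], primer))
        if mismatches <= max_mismatch:
            hits.append(i)
    return hits


def in_silico_pcr(template: str, fwd: str, rev: str, max_mismatch: int = 1) -> list[str]:
    """Return amplicons (primer sites included) found on the given strand."""
    rev_rc = revcomp(rev)  # the reverse primer anneals to the opposite strand
    amplicons = []
    for start in find_sites(template, fwd, max_mismatch):
        for rstart in find_sites(template, rev_rc, max_mismatch):
            if rstart >= start + len(fwd):  # reverse site must lie downstream
                amplicons.append(template[start:rstart + len(rev_rc)])
    return amplicons


# Toy example (sequences are invented for illustration)
fwd = "ATGCGTACCA"
rev = "TTGACCGTAG"
template = "GGGG" + "ATGCGTACCA" + "TTTTTTTT" + "CTACGGTCAA" + "GGGG"
print(in_silico_pcr(template, fwd, rev))
```

A template whose reverse primer site is missing, for example because the gene is truncated, yields no amplicon even when the forward site is intact, which is one way a blast-based call and an in silico PCR call can disagree.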

#### Downsampling Analysis

A set of 59 genomes was downsampled. Of these, 40 genomes were used to validate the serotype and virulence gene finder, and the remaining 19 were used to validate the stx subtyper. The genomes were downsampled to 40x, 30x, 20x, and 10x coverage using the Computational Genomics Pipeline (CG-Pipeline; https://github.com/lskatz/CG-Pipeline; Katz et al., 2011). Downsampled genomes were assembled and analyzed using the genotyping plug-in in BioNumerics v.7.5 with the settings outlined above to determine the limit of detection for WGS-based identification tools.
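The principle behind downsampling to a target coverage can be sketched as follows: the fraction of reads to keep is the target coverage multiplied by the genome size, divided by the number of bases actually sequenced, and read pairs are then retained at random with that probability. This is a generic illustration and not the CG-Pipeline code; the function names, the fixed random seed, and the example isolate sequenced to 120x are assumptions.

```python
import random


def sampling_fraction(total_bases: int, target_coverage: int,
                      genome_size: int = 5_000_000) -> float:
    """Fraction of reads to keep so that the expected depth equals target_coverage."""
    return min(1.0, target_coverage * genome_size / total_bases)


def downsample_pairs(pairs: list, fraction: float, seed: int = 0) -> list:
    """Keep each read pair independently with probability `fraction`."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return [pair for pair in pairs if rng.random() < fraction]


# Hypothetical isolate sequenced to 120x of a 5 Mb genome (600 Mb of reads)
total_bases = 600_000_000
for target in (40, 30, 20, 10):
    print(target, round(sampling_fraction(total_bases, target), 4))
```

Because pairs are kept independently, the realized coverage of any one subsample varies slightly around the target; sampling a fixed number of pairs would remove that variation at the cost of holding all pairs in memory.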

### Conventional Testing Procedures

Conventional testing of isolates was completed in the Escherichia Shigella reference laboratory at the Centers for Disease Control and Prevention, USA. Serotyping was performed with O- and H-specific antisera from the Statens Serum Institut (Copenhagen, Denmark) by standard methods in a microtiter format (Ewing, 1986). For the virulence genes, real-time or conventional PCR for the presence of Shiga toxin 1 and 2 (stx1, stx2), stx subtyping (stx1a, stx1c, stx1d, stx2a, stx2c, stx2d, stx2e, stx2f, and stx2g), intimin (eae), and hemolysin (ehxA) genes was performed (Paton and Paton, 1998; Scheutz et al., 2012). Broth microdilution assays to determine antimicrobial susceptibility were done by the National Antimicrobial Resistance Monitoring (NARMS) laboratory using previously published techniques (CDC, 2013). Resistance data were interpreted using Clinical Laboratory Standards Institute criteria (Clinical Laboratory Standards Institute, 2012).

## RESULTS

### Validation of the Serotype Detection Tool in BioNumerics

The serotype detection tool within BioNumerics was validated on a total of 188 isolates for which WGS data and conventional serotype information were available. These publicly available genomes were sequenced by either PacBio, Illumina, or 454 technology (see Table S1). The genomes represent 30 O serogroups and 26 H serogroups for a total of 76 serotypes (**Table 1**). Several representatives of the top 20 serotypes as well as a diverse collection of other serotypes were selected for this set of genomes.

A total of 29 O and 25 H serogroups were identified from the WGS data of the 30 O and 26 H serogroups detected by conventional methods; one O118 serogroup isolate was not detected from WGS data, and an H47 isolate was typed as an H7 by WGS. Comparison of the traditional O serogroup results with the predictions from the WGS data showed that 96.3% (181/188) of the O serogroups were accurately predicted from the WGS data. For the H serogroup, 95.9% (164/171) of the H antigens were accurately predicted from the WGS data; H antigen detection for non-motile isolates was not counted since such isolates are non-typable by the phenotypic methods (**Table 1**). There were only 4 isolates that had a different O serogroup predicted and 6 isolates that had a different H serogroup predicted compared to traditional typing results (see Table S1).



TABLE 1 | Continued


Number of isolates given for positive by conventional and WGS serotype tests (isolate details listed in Table S1). 17 isolates were non-motile by traditional testing and not counted in the H WGS test results.

<sup>a</sup>Discrepant conventional and WGS serotyping results are noted by isolate in Table S1. <sup>b</sup>Antigen not predicted from WGS data.

The genomes with discrepant results were not sequenced by us but were downloaded from NCBI, and it is possible that the data on NCBI contain errors. However, since we do not have access to these isolates, we cannot confirm the WGS results and phenotype. Overall, the serotyper tool predicted the serotype correctly in 94.2% (161/171) of the tested genomes.
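The concordance figures reported for the serotype validation can be reproduced from the counts in the text. The short sketch below (helper name ours) also makes the denominators explicit: 188 genomes for the O serogroup, and 188 minus the 17 non-motile isolates, i.e., 171, for the H antigen and the full serotype.

```python
def concordance(correct: int, tested: int) -> float:
    """Percentage of tested genomes whose WGS prediction matched conventional typing."""
    return 100.0 * correct / tested


total_genomes = 188
non_motile = 17                       # excluded from H and full-serotype comparisons
motile = total_genomes - non_motile   # 171

results = {
    "O serogroup": (181, total_genomes),
    "H antigen": (164, motile),
    "Full serotype": (161, motile),
}
for name, (correct, tested) in results.items():
    print(f"{name}: {correct}/{tested} = {concordance(correct, tested):.1f}%")
```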

#### Robustness of Serotype and Virulence Gene Prediction in WGS Datasets

To determine the sensitivity of the serotype and virulence gene predictions by WGS, a set of 40 isolates was selected that had all been sequenced by Illumina MiSeq or HiSeq, serotyped, and characterized for virulence genes using PCR methods. These genomes ranged in coverage from 40x to 267x. Using the serotype detection tool in the genotyping plug-in, all but one of the O serogroups were predicted (95%; 38/40 isolates); both isolates belonging to the O153 serogroup were not predicted (**Table 2**). For those isolates where no O serogroup was predicted, the genomes ranged from 119x to 153x coverage, suggesting that sequence coverage was not a factor in being able to predict this particular serogroup. All of the H serotypes were predicted correctly when considering motile isolates, i.e., isolates that could be phenotypically verified by agglutination. The sequencing reads per isolate were then randomly down sampled to 40x, 30x, 20x, and 10x coverage and analyzed in BioNumerics. These genomes were assembled de novo and the serotype and virulence genes predicted. In the down-sampled datasets, at 40x coverage 77.5% of O and 100% of H serogroups were correctly identified. For the remaining 30x, 20x, and 10x coverage levels, O serogroups were predicted correctly in 77.5, 52.5, and 17.5% of isolates and H serogroups were predicted in 100, 95, and 70% of isolates, respectively. The best prediction of O and H serogroup was from genomes at greater than 40x coverage.

For the original sequence and 40x, 30x, and 20x down sampled genomes, there was 100% concordance between the virulence detection tool and in silico PCR and conventional real-time PCR assay for Shiga toxin 1 and 2 (stx1, stx2), intimin (eae), and hemolysin (ehxA) genes when a call was made (see **Table 3**). At 10x coverage, few virulence genes were identified from the WGS. The virulence detection databases did not identify stx2 in a STEC O76:H19 isolate which was detected by in silico PCR. Additionally, the in silico PCR did not identify stx2 in two isolates that were identified as stx2 positive by the virulence detection tool databases. By using both the virulence detection databases and in silico PCR, all stx2 positive isolates identified by conventional typing methods were also identified from the WGS at 20x coverage or higher. For the other virulence gene targets, stx1 was detected in all 21 of the isolates positive by conventional testing at ≥20x coverage, eae in all 26 at ≥30x coverage, and ehxA gene detection from WGS data was 100% concordant in the assemblies from the original sequence read set.

#### TABLE 2 | Limit of detection for O and H antigens in a downsampled WGS data set from 40 strains.

Strain identifiers listed in Table S1.

<sup>a</sup>Original coverage ranged from 40 to 267x.

<sup>b</sup>Six H serogroups were called from the WGS data that were typed as non-motile by conventional methods and not included here.

### Prediction of stx Gene Subtype Using Virulence Gene Database and In silico PCR

A total of 19 isolates were examined that had complete conventional stx gene subtype results and WGS results. These isolates represented the a, c, and d subtypes of stx1 and the a, b, c, d, e, f, and g subtypes of stx2. All the stx subtypes except stx2c were detected using the in silico PCR tool or the virulence detection database at original coverage levels and down to 10x coverage (see **Table 4**). For stx2c, only one of the two isolates identified as positive by conventional laboratory testing was detected as positive from the WGS using in silico PCR. One isolate was positive for stx1a using blast against the virulence gene database but was negative by both conventional PCR testing and the in silico PCR approach. On closer examination of this discrepancy, the blast-based approach showed that the gene was only an 82.3% length match to the reference allele, which may indicate that the gene was truncated so that the reverse primer used in the traditional or in silico PCR assay would not hybridize. Overall, stx gene subtype was correctly predicted in 89.5% of isolates at ≥10x coverage.

TABLE 3 | Limit of detection of virulence genes in a down sampled WGS data set from 40 STEC and one hybrid STEC/EAEC O104:H4 by both a blast and in silico PCR approach.

Strain identifiers listed in Table S1.

<sup>a</sup>Original coverage was 40x to 267x.

<sup>b</sup>For the original sequence files, stx2 was missed in an E. coli O76:H19 using the genotyping plug-in that was detected by in silico PCR.

<sup>c</sup>The in silico PCR did not detect stx2 in 2 isolates though it was detected by the genotyping plug-in.

TABLE 4 | Limit of detection of stx gene subtype in a downsampled WGS data set for 19 strains by both a blast and in silico PCR approach.

See Table S1 for isolate identification.
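The stx1a discrepancy reported in this section, where the gene was called by blast but not by PCR, is consistent with the two detection rules themselves. A hit covering 82.3% of the reference allele clears the 60% coverage threshold used by the finder tools, whereas PCR (wet lab or in silico) needs both primer sites. The sketch below illustrates this logic; the class and function names are ours, and the 99% identity and the presence or absence of each primer site are assumed values, since the text reports only the length match.

```python
from dataclasses import dataclass

MIN_IDENTITY = 90.0  # percent identity threshold stated in the Methods
MIN_COVERAGE = 60.0  # percent of the reference allele that must be covered


@dataclass
class BlastHit:
    allele: str
    identity: float  # percent identity to the reference allele
    coverage: float  # percent of the reference allele length aligned


def blast_call(hit: BlastHit) -> bool:
    """Gene is called present when both thresholds are met."""
    return hit.identity >= MIN_IDENTITY and hit.coverage >= MIN_COVERAGE


def pcr_call(forward_site_found: bool, reverse_site_found: bool) -> bool:
    """An amplicon requires both primer-binding sites."""
    return forward_site_found and reverse_site_found


# 82.3% length match is from the text; the identity value is hypothetical.
truncated = BlastHit("stx1a", identity=99.0, coverage=82.3)
print(blast_call(truncated))  # True: passes the similarity-based thresholds
print(pcr_call(True, False))  # False: assumed loss of the reverse primer site
```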

#### Validation of the Resistance Finder Tool in BioNumerics

The resistance finder tool in BioNumerics was evaluated against a set of 46 isolates for which WGS and traditional antimicrobial susceptibility results were available. Most of the isolates tested, a total of 28 out of the 46, were pansusceptible by antimicrobial susceptibility testing and did not contain resistance genes by WGS (see **Table 5**). Of the remaining STEC and Shigella that were found to be resistant by traditional antimicrobial susceptibility testing, the concordance for detecting genetic antimicrobial resistance determinants for ampicillin, azithromycin, chloramphenicol, sulfisoxazole, streptomycin, and trimethoprim/sulfamethoxazole was 100%. One isolate was found to contain tetracycline resistance genes but did not test as resistant by conventional testing. No genetic resistance determinants were detected for isolates resistant to nalidixic acid and ciprofloxacin using the ResFinder database. Through further genetic analysis, it was determined that these isolates were resistant via chromosomal mutations in the gyrA gene alone or in combination with mutations in the parC gene. These results are not unexpected, as gene detection schemes can identify nonfunctional genes and do not detect mutational events. Taking these issues into account, there was 97% concordance between phenotypic susceptibility and antimicrobial resistance detection by WGS.



Values indicate the number of strains identified with resistance to the indicated antimicrobial.

### DISCUSSION

In this study, we demonstrated the utility and accuracy of a single software platform for combining workflows for quality assessment and reference characterization of STEC through WGS data. A single software program that can be used by non-bioinformaticians is a requirement for public health professionals to be able to infer phenotypic results from WGS data. Using publicly available databases and in silico PCR tools developed as part of this study, we identified the same information (serotype, virulence genes, and resistance determinants) from the WGS data for 94.7% of E. coli and Shigella isolates as was identified previously by conventional methods. Additionally, the limit of detection for these determinants was established through down sampling experiments, allowing for better interpretation of negative results and understanding of the sequence data quality needed for reference characterization from WGS.

Although other recent studies have already shown the utility of reference characterization directly from sequence data generated from benchtop sequencers (Joensen et al., 2014, 2015; DebRoy et al., 2016), we present the combined quality assessment, serotyping, virulence profile, and resistance profile in one simple, high-throughput, and user-friendly analytical WGS workflow. These previous studies extensively validated their findings against those obtained by conventional methods, yet limited testing was done to identify the limit of detection for these tools and how best to interpret a negative result. Often in these studies, sequences were selected because they were of high quality and had high coverage, typically over 50x. While high sequence quality and coverage may be possible during routine testing periods, they are often difficult to achieve during outbreak response or when trying to reduce testing costs. When there is a need to increase isolate multiplexing per sequencing run to increase throughput and reduce costs, sequence coverage per isolate decreases. For this reason we attempted to determine the limits for coverage to help determine the maximum number of isolates that could be sequenced at the same time. In our study, sequence coverage of ≥30x was enough to predict 100% of the H serogroups; for the O serogroups, 95% were correctly predicted at a sequence coverage > 40x. One O antigen, O153 (2 isolates), was not detected in genomes sequenced to over 100x coverage. Since other groups have shown that the O153 genes wzx and wzy are 100% identical to the O178 genes, even though, surprisingly, these two serogroups are not cross-reactive using phenotypic testing, the current similarity-based WGS detection methods may not be able to distinguish these closely related serogroups (Joensen et al., 2014, 2015; DebRoy et al., 2016).
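The trade-off between multiplexing and coverage discussed above reduces to a simple bound: the number of isolates per run cannot exceed the run yield divided by the bases needed per isolate (minimum coverage times genome size). The sketch below illustrates this under stated assumptions: the 7.5 Gb run yield is a hypothetical figure chosen for the example, the 5 Mb genome size is the value used in the Methods, and perfectly even pooling is assumed, which real libraries will not achieve.

```python
def max_isolates_per_run(run_yield_bases: int, min_coverage: int,
                         genome_size: int = 5_000_000) -> int:
    """Upper bound on isolates per run, assuming perfectly even pooling."""
    return run_yield_bases // (min_coverage * genome_size)


run_yield = 7_500_000_000  # hypothetical usable yield of one run, in bases
for coverage in (40, 30, 20):
    print(f"{coverage}x: at most {max_isolates_per_run(run_yield, coverage)} isolates")
```

In practice a safety margin below this bound would be needed, since uneven library pooling leaves some isolates below the per-isolate coverage target.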

Virulence gene detection performed more robustly than O and H antigen gene detection. The majority of virulence genes were detected at ≥20x coverage in the WGS data using the genotyping plug-in. Since both serotype and virulence gene information is needed for STEC surveillance, >40x coverage is recommended to consistently identify serotype from WGS in all isolates. Preliminary data (not presented) indicate that this coverage will also suffice for subtyping for outbreak investigations using whole genome multilocus sequence typing analysis.

To improve confidence in negative WGS results, additional in silico PCR tools were developed to double-check negative results from whole genome sequence data. This provided further confidence in virulence typing results. By using both WGS typing tools, all virulence genes were detected; by relying on either tool alone (blast algorithm or in silico PCR), important virulence genes would have been missed. Being able to accurately identify virulence genes and stx gene subtypes is important because certain virulence gene combinations are associated with higher risks for adverse events, e.g., HUS (Scheutz et al., 2012). Other groups have also shown the robustness of determining stx gene subtypes from WGS for both O157 and non-O157 serogroups (Ashton et al., 2015; Chattaway et al., 2016).

For identification of resistance determinants, the ResFinder database produced results highly concordant with traditional phenotypic testing. Isolates possessing quinolone resistance mechanisms that were not identified by the ResFinder database underscore the limitations of a gene-based detection approach. Supplementary in silico PCRs using conventional primer sets (Conrad et al., 1996; Bhattacharya et al., 2013) and subsequent sequence analysis or reference-mapping tools can be used in conjunction with the resistance finder tool in order to detect mutational events conferring antimicrobial resistance. A recent study examining multi-drug resistant E. coli in the United States accurately predicted drug resistance with high specificity and sensitivity using a WGS approach that employed gene-based detection in conjunction with mutational analysis of the quinolone resistance-determining regions of the chromosome (Tyson et al., 2015). The present study confirms this high concordance between genetic and phenotypic testing for antimicrobial resistance, and also reveals the ability of this WGS-based approach to distinguish resistant and susceptible isolates for most drug classes.
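A mutational check of the kind described above can be sketched as a comparison of translated GyrA or ParC sequences against the wild-type residue at monitored positions. The positions below (GyrA Ser83 and Asp87, ParC Ser80 and Glu84) are commonly reported E. coli quinolone resistance hot spots and are our assumption; the text does not state which substitutions were found in these isolates. The function names and the toy proteins are hypothetical.

```python
# Monitored positions (1-based) and wild-type residues; an assumed, partial list.
QRDR_SITES = {
    "gyrA": {83: "S", 87: "D"},
    "parC": {80: "S", 84: "E"},
}


def qrdr_substitutions(gene: str, protein_seq: str) -> list[str]:
    """Return substitutions such as 'S83L' found at the monitored positions."""
    found = []
    for position, wild_type in QRDR_SITES[gene].items():
        residue = protein_seq[position - 1]
        if residue != wild_type:
            found.append(f"{wild_type}{position}{residue}")
    return found


def toy_protein(residues: dict[int, str], length: int = 100) -> str:
    """Build a placeholder protein (all alanine) with chosen residues set."""
    seq = ["A"] * length
    for position, amino_acid in residues.items():
        seq[position - 1] = amino_acid
    return "".join(seq)


wild_type_gyra = toy_protein({83: "S", 87: "D"})
mutant_gyra = toy_protein({83: "L", 87: "D"})
print(qrdr_substitutions("gyrA", wild_type_gyra))  # []
print(qrdr_substitutions("gyrA", mutant_gyra))     # ['S83L']
```

A real implementation would first locate and translate the gene from the assembly, for example by reference mapping or by sequencing an in silico PCR product, which this sketch leaves out.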

This study has shown that quality assessment, serotyping, and virulence and resistance profiling can be performed in one simple workflow. Additionally, the information extracted from WGS is more detailed than that provided by conventional methods, e.g., by conventional methods we routinely detect only 5 virulence targets and 9 antimicrobial susceptibilities, whereas over 100 virulence and resistance determinant genes are detected by WGS. Extracting this information from the whole genome sequence rather than using traditional identification techniques is highly cost-efficient: it is possible to save up to 180 US dollars on reagents alone per characterized isolate (assuming \$123 for WGS and \$304 for traditional typing workflow per isolate), which makes WGS both more rapid and less expensive for typing STEC. Additionally, other groups (Joensen et al., 2014) have shown that the turnaround time for WGS is faster compared to conventional reference identification and subtyping workflows. We have also seen that the turnaround time from receipt of isolate in the laboratory to WGS result can be 3–4 days, while conventional methods take 1–3 weeks. While there is a high initial overhead cost for the sequencing instrument, WGS has the potential to streamline laboratory work into a unified workflow, reducing the need for multiple specialized personnel and instruments for various genotypic and phenotypic testing. Furthermore, by understanding the limit of detection by WGS for these different targets, there is more confidence that a negative result is an accurate prediction, though further work needs to be done to fine-tune the serotype and resistance finder databases.

#### FUTURE WORK

For future work on the genotyping plug-in within BioNumerics, we plan to integrate more gene identification techniques from the whole genome sequence, such as reference mapping, to improve the ability to detect serogroups and resistance determinants. Lastly, although this validation was done for reference characterization and a minimum coverage requirement was identified, similar tests need to be performed in terms of WGS analysis for outbreak detection. Currently we are validating a whole genome multilocus sequence typing (wgMLST) database for STEC.

#### AUTHOR CONTRIBUTIONS

RL and HC designed this work; RL, HC, JC, and HP contributed to the analysis and interpretation of the work. RL and HC drafted the manuscript, and RL, HC, JC, HP, and NS reviewed the manuscript for intellectual content.

#### FUNDING

This work was made possible through support from the Advanced Molecular Detection (AMD) initiative at the Centers for Disease Control and Prevention.

### ACKNOWLEDGMENTS

We are very grateful to Devon Stripling, Haley Martin, and Lisley Garcia-Toledo for help with traditional reference characterization and preparing isolates for sequencing, as well as Ashley Sabol and Eija Trees in the PulseNet CDC team for providing some WGS data. We would also like to extend our thanks to Katrine Joensen and Flemming Schuetz for their excellent work on the SerotypeFinder and to DTU for hosting the Finder databases. We would also like to thank PulseNet database managers Morgan Schroeder and Sung Im for assistance using BioNumerics v7.5.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00766




**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

HP is affiliated as an employee (chief operations officer) with the following organization: Applied Maths NV, Keistraat 120, B-9830 Sint-Martens-Latem, Belgium.

Copyright © 2016 Lindsey, Pouseele, Chen, Strockbine and Carleton. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Phylogenetic Analyses of *Shigella* and Enteroinvasive *Escherichia coli* for the Identification of Molecular Epidemiological Markers: Whole-Genome Comparative Analysis Does Not Support Distinct Genera Designation

#### *Emily A. Pettengill<sup>1</sup>, James B. Pettengill<sup>2</sup> and Rachel Binet<sup>1</sup>\**

#### *Edited by:*

*Pina Fratamico, United States Department of Agriculture – Agricultural Research Service, USA*

#### *Reviewed by:*

*Edward G. Dudley, Penn State, USA; Adam Peritz, United States Department of Agriculture – Agricultural Research Service, USA*

> *\*Correspondence: Rachel Binet rachel.binet@fda.hhs.gov*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

*Received: 13 November 2015 Accepted: 28 December 2015 Published: 19 January 2016*

#### *Citation:*

*Pettengill EA, Pettengill JB and Binet R (2016) Phylogenetic Analyses of Shigella and Enteroinvasive Escherichia coli for the Identification of Molecular Epidemiological Markers: Whole-Genome Comparative Analysis Does Not Support Distinct Genera Designation. Front. Microbiol. 6:1573. doi: 10.3389/fmicb.2015.01573*

*<sup>1</sup> Division of Microbiology, Office of Regulatory Science, U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD, USA, <sup>2</sup> Division of Public Health Informatics and Analytics, Office of Analytics and Outreach, U.S. Food and Drug Administration, Center for Food Safety and Applied Nutrition, College Park, MD, USA*

As a leading cause of bacterial dysentery, *Shigella* represents a significant threat to public health and food safety. Related, but often overlooked, enteroinvasive *Escherichia coli* (EIEC) can also cause dysentery. Current typing methods have limited ability to identify and differentiate between these pathogens despite the need for rapid and accurate identification of pathogens for clinical treatment and outbreak response. We present a comprehensive phylogeny of *Shigella* and EIEC using whole genome sequencing of 169 samples, constituting unparalleled strain diversity, and observe a lack of monophyly between *Shigella* and EIEC and among *Shigella* taxonomic groups. The evolutionary relationships in the phylogeny are supported by analyses of population structure and hierarchical clustering patterns of translated gene homolog abundance. Lastly, we identified a panel of 254 single nucleotide polymorphism (SNP) markers specific to each phylogenetic cluster for more accurate identification of *Shigella* and EIEC. Our findings show that *Shigella* and EIEC are not distinct evolutionary groups within the *E. coli* genus and, thus, EIEC as a group is not the ancestor to *Shigella*. The multiple analyses presented provide evidence for reconsidering the taxonomic placement of *Shigella*. The SNP markers offer more discriminatory power to molecular epidemiological typing methods involving these bacterial pathogens.

Keywords: *Shigella*, enteroinvasive *E. coli* (EIEC), phylogeny, whole genome sequencing, classification, epidemiological markers

## INTRODUCTION

*Shigella* species are a leading cause of bacterial diarrhea (Walker et al., 2010). Worldwide, an estimated 164.7 million people are infected by *Shigella* annually (495,000 of them in the United States), often through contaminated food and water (Scallan et al., 2011). Enteroinvasive *Escherichia coli* (EIEC), like *Shigella*, can also cause dysentery-like symptoms (Taylor et al., 1988). *Shigella* and EIEC are, in essence, strict human pathogens sharing similar pathogenic mechanisms, but their evolutionary relationship at the genomic level has not been determined. Although the close relationship between *Shigella* and *E. coli* has been acknowledged since 1898 (reviewed by Lan and Reeves, 2002), in the 1940s Ewing proposed classifying the four species in the new genus *Shigella* (*S. dysenteriae*, *S. flexneri*, *S. boydii*, and *S. sonnei*) based on the antigen characteristics of those species (Edwards and Ewing, 1986). Since that time, numerous studies have indicated that the phylogenetic history does not support this current classification (Pupo et al., 2000; Lan and Reeves, 2002; Escobar-Páramo et al., 2003; Lan et al., 2004; Sahl et al., 2015).

Volunteer feeding studies have shown that whereas 10 to a few hundred *Shigella* cells were enough to cause illness in healthy adults, the infective dose for three different EIEC strains was in the 10<sup>8</sup> range, justifying the need for clinical medicine to maintain two separate genera (DuPont et al., 1971; Mathewson et al., 1985). However, because most governmental health agencies do not currently require reporting of EIEC infections, their impact on diarrheal disease and their genetic diversity are not well understood. The recent involvement of EIEC O96:H19 as the source of outbreaks severely affecting healthy individuals in Italy and Great Britain, and of a case reported in Spain, illustrates that EIEC can be a potential threat to public health and provides new motivation for improving our understanding of EIEC for rapid and accurate identification (Escher et al., 2014; Michelacci et al., 2015; Pettengill et al., 2015). This new motivation is reinforced by a long-established need to understand the evolutionary relationships between *Shigella*, EIEC and non-invasive *E. coli* for improved detection and surveillance.

Traditional microbiology differentiates *Shigella* from *E. coli* based on their physiological and biochemical characteristics, with EIEC being more metabolically active than *Shigella* (Edwards and Ewing, 1986). Sero-agglutination assays are then generally performed to differentiate members of the genus *Shigella*, but cross-reactivity with certain EIEC serotypes has been observed (Liu et al., 2008). Nucleic acid-based detection methods that combine higher discriminatory power with low limits of detection are ideal, but they rely on the availability of suitable markers based on a wide diversity of isolates for that organism (Zhao et al., 2014). Currently, most molecular assays for the diagnosis of *Shigella* rely mainly on targeting the large ∼220-kbp invasive plasmid that is also shared by EIEC and, hence, cannot differentiate between the pathogens (Binet et al., 2014). Although two recent studies proposed PCR assays to distinguish between *Shigella* species (Sahl et al., 2015) or between *Shigella* and EIEC (Pavlovic et al., 2011), the first study did not include any EIEC in its exclusivity panel and the second study included only 18 isolates of *Shigella* and 11 isolates of EIEC in its inclusivity panel.

In this study, we examined the evolutionary relationships among a wide diversity of strains that represent the *Shigella* genus and closely related EIEC. Comprehensive phylogenetic analyses were performed to determine if *Shigella* and EIEC are distinct evolutionary groups. Genome similarity was then investigated using a Bayesian clustering method that does not impose the bifurcating structure of phylogenetic analyses. Samples were then hierarchically clustered based on differences in abundance of predicted protein homologs to determine functional genomic differences. Lastly, we identified single nucleotide polymorphisms (SNPs) that were diagnostic of different phylogenetic clades and that could be used to type and/or discriminate among those lineages.

### MATERIALS AND METHODS

### Growth of Strains, DNA Isolation, and Genome Retrieval

Pure culture isolates for 33 *Shigella* and *E. coli* strains (**Supplementary Table S1**) were grown from frozen stocks on Trypticase Soy Agar plates and incubated overnight at 37°C. A minimum of three colonies were then inoculated into either *Shigella* Broth (if *Shigella* sp.; Center for Food Safety and Applied Nutrition, 2001) or Trypticase Soy Broth (if EIEC strains) for DNA extraction after overnight growth at 37°C. Genomic DNA was extracted using DNeasy<sup>®</sup> Blood and Tissue kits (QIAGEN, Valencia, CA, USA) according to the manufacturer's instructions. An additional 80 genomes (**Supplementary Table S1**) were retrieved in June 2014 from the NCBI SRA database using the SRA Toolkit v. 2.3.5-2 in fastq format<sup>1</sup>. Assembled genomes from Sahl et al. (2015) were retrieved from NCBI in February 2015.

### Library Construction, Genome Sequencing, and Sequence Data

DNA was quantified using the Qubit<sup>®</sup> 2.0 Fluorometer and the Qubit<sup>®</sup> HS Assay kit (Life Technologies, Foster City, CA, USA). Samples were diluted to 0.2 ng/μl and stored at −20°C until library preparation. Libraries were prepared using the Nextera XT DNA Sample Preparation Kit (Illumina<sup>®</sup>, San Diego, CA, USA). Sequencing reactions were performed with the MiSeq v2 chemistries with 250 bp paired-end read lengths and a 500 cycle cartridge and processed on a MiSeq platform (Illumina<sup>®</sup>, San Diego, CA, USA) to obtain data in fastq format. All the sequencing data generated for this project are available through bioproject accessions PRJNA273284 and PRJNA230969 at the National Center for Biotechnology Information (NCBI).

### Quality Control, Trimming, and Genome Assembly

Reads were trimmed and low quality bases (Q-scores < 20) filtered using the DynamicTrim program in SolexaQA v. 2.2 (Cox et al., 2010). Trimmed reads were then assembled using SPAdes v. 3.1.1 (Bankevich et al., 2012) with default settings. To ensure that assemblies were of high quality (e.g., low number of contigs and adequate total length), we obtained assembly statistics using the program Quast (Gurevich et al., 2013; **Supplementary Table S3**). Using the *de novo* assemblies from SPAdes, SNP matrices were produced using the reference-free approach implemented in kSNP v2.0 (Gardner and Hall, 2013). For the kSNP analyses we used a *k*-mer value of 21, which was identified as the best fitting value based on the auxiliary script kChooser provided with that software.

<sup>1</sup>http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?cmd=show&f=software&m=software&s=software

Although kSNP produces three matrices (composed of "all," "majority," and "core" SNPs), we focused on the core matrix as it is a more conservative method for identifying variant sites and is better suited to removing recombined or horizontally transferred genomic elements from the analysis. The core matrix contains no missing data, meaning there is a nucleotide state at each position in the alignment for all individuals. For kSNP analyses that included the *Salmonella* genomes, a total of 660,234 SNPs were identified and the number of core SNPs was 2,348. Analyses without *Salmonella* genomes had a total of 598,876 SNPs and 7,062 core SNPs. Of the core SNPs, 385 (16%) including *Salmonella* genomes and 1,556 (22%) excluding *Salmonella* genomes were homoplastic (non-informative) SNPs. These proportions of homoplastic SNPs are lower than those in other kSNP analyses of *E. coli* genomes (37.6%; Gardner and Hall, 2013).
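The definition of the core matrix (a called nucleotide at every position for every individual) can be illustrated with a minimal Python sketch. This is not kSNP itself: the function names, the set of symbols treated as missing, and the toy matrix are assumptions made purely for illustration. The last line simply recomputes the homoplastic proportions from the counts reported above.

```python
# Illustrative sketch only; this is not kSNP and the toy data are hypothetical.
MISSING = {"-", "N", "?"}  # assumed symbols for an uncalled base


def core_columns(matrix):
    """Return indices of SNP columns that have a called base in every sample."""
    seqs = list(matrix.values())
    n_sites = len(seqs[0])
    if any(len(seq) != n_sites for seq in seqs):
        raise ValueError("all rows must have the same length")
    return [i for i in range(n_sites)
            if all(seq[i].upper() not in MISSING for seq in seqs)]


def subset(matrix, columns):
    """Keep only the requested columns for every sample."""
    return {name: "".join(seq[i] for i in columns)
            for name, seq in matrix.items()}


def percent(part, whole):
    """Share of `part` in `whole`, expressed as a percentage."""
    return 100.0 * part / whole


# Hypothetical SNP matrix: sample name -> one character per SNP position.
toy = {"sample_A": "AGT-A", "sample_B": "ATTGA", "sample_C": "GGCGN"}
core = subset(toy, core_columns(toy))
print(core)

# Homoplastic shares from the counts in the text: 385 of 2,348 and 1,556 of 7,062.
print(round(percent(385, 2348)), round(percent(1556, 7062)))
```

In this toy case the last two columns are dropped because at least one sample lacks a called base there, which is the sense in which the core matrix is the more conservative choice.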

#### Serotyping

*Shigella* species are routinely serotyped with Statens Serum Institute species-specific pool antisera (Cedarlane, Burlington, NC, USA) upon reception and by an in-house multiplex PCR assay (Binet, personal communication). Although the serotype is also confirmed with serotype-specific Denka Seiken agglutinating sera (Thermo Fisher Scientific, Lenexa, KS, USA) on a case-by-case basis, we did not confirm the identity of the nine *Shigella* isolates we sequenced in this study at the serotype level since they came from reputable bacterial collections, i.e., ATCC and CDC (**Supplementary Table S1**). All EIEC strains we sequenced were, however, conventionally serotyped with polyclonal O antigens from Statens Serum Institute (Cedarlane, Burlington, NC, USA) using a boiling method detailed by the manufacturer. For the additional genomes added to the study, in the absence of isolates, pertinent information was obtained directly from NCBI as provided at the time of submission or from associated publications when available (Holt et al., 2012; Escher et al., 2014; Sahl et al., 2015).

### Phylogenetic Analysis and Sample Labeling Designations

Using the core matrix produced by kSNP, phylogenetic inference analysis was performed using GARLI (Genetic Algorithm for Rapid Likelihood Inference) v. 2.0.1019 under the GTR + I + Γ model and other default settings; trees were visualized with Figtree v. 1.3.1 (Zwickl, 2006; Rambaut and Drummond, 2009). To estimate the best topology based on the observed data, we ran 1000 replicate analyses and present the tree with the highest likelihood value. To estimate topological support for the different relationships, we ran 1000 bootstrap replicates that were then summarized using the SumTrees utility within the DendroPy package (Sukumaran and Holder, 2010). We chose not to remove homoplasious sites because bacterial phylogenetic topologies have been shown to be robust to the inclusion of such sites and removing them may in fact be detrimental to estimates of branch length (Hedge and Wilson, 2014).
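For readers less familiar with bootstrap support, the idea can be sketched in Python as follows. This is a generic illustration and not the GARLI or SumTrees implementation: each replicate resamples alignment columns with replacement, a tree would then be inferred from each replicate (not shown), and support for a clade is the fraction of replicate trees that contain it. All names and the toy data are hypothetical.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the same length."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


def clade_support(clade, replicate_clades):
    """Fraction of replicate trees (each given as a set of clades) with `clade`."""
    clade = frozenset(clade)
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)


# Hypothetical core SNP alignment: sample name -> sequence of SNP states.
toy_alignment = {"s1": "AGTC", "s2": "ATTC", "s3": "GGCA"}
rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_replicate(toy_alignment, rng) for _ in range(1000)]

# Hypothetical clade sets recovered from four replicate trees.
toy_trees = [
    {frozenset({"s1", "s2"})},
    {frozenset({"s1", "s2"})},
    {frozenset({"s1", "s3"})},
    {frozenset({"s1", "s2"})},
]
print(clade_support({"s1", "s2"}, toy_trees))
```

Support values reported as proportions, such as the 0.83 threshold mentioned in the Results, are fractions of this kind.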

*Escherichia coli* strains present in the phylogenetic tree are listed by the type of *E. coli*, the O antigen and H antigen (if known) followed by strain replicate number in parentheses. Other abbreviations found in the tree are: EIEC: enteroinvasive *E. coli*; EAEC: enteroaggregative *E. coli*; STEC: Shiga-toxin producing *E. coli*; ExPEC: extraintestinal pathogenic *E. coli*; EPEC: enteropathogenic *E. coli*; EHEC: enterohemorrhagic *E. coli* (Clements et al., 2012). *Shigella* strains are designated by genus and species, serotype (if known) followed by strain replicate number in parentheses. Abbreviations for *Shigella* species are as follows: SD: *S. dysenteriae*; SF: *S. flexneri*; SB: *S. boydii*; SS: *S. sonnei*.

### Diagnostic SNP Detection

A separate kSNP analysis was performed without the two *Salmonella* outgroup samples to obtain a core SNP matrix for only *Shigella* and EIEC samples (described above). A custom python script was used with the core matrix to identify those SNPs that were specific to the groups from the SNP-based phylogeny. We define a diagnostic SNP as a position in the core matrix where the nucleotide state is the same among all members of a group and that state differs from all non-members. For each diagnostic SNP (**Supplementary Table S2**), we report the SNP nucleotide region of 21 bp (or *k*-mer), the diagnostic SNP state of that cluster, the position in relation to a reference genome (SD serotype 1, NCBI: CP000034), the name of the gene (if applicable), the product (if applicable), the functional Clusters of Orthologous Groups of proteins (COG) category and the reference genome locus tag (if applicable).
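The definition of a diagnostic SNP given above translates directly into code. The following Python sketch is not the custom script used in the study, which is not reproduced here; it is a minimal illustration of the stated rule, with hypothetical function names, sample names, and cluster labels.

```python
def diagnostic_snps(matrix, groups):
    """Find positions where one group shares a single state absent elsewhere.

    matrix: sample name -> string of core SNP states (equal lengths).
    groups: sample name -> group label (e.g., a phylogenetic cluster).
    Returns: group label -> list of (position, nucleotide) pairs.
    """
    samples = list(matrix)
    n_sites = len(matrix[samples[0]])
    result = {group: [] for group in sorted(set(groups.values()))}
    for pos in range(n_sites):
        for group in result:
            inside = {matrix[s][pos] for s in samples if groups[s] == group}
            outside = {matrix[s][pos] for s in samples if groups[s] != group}
            # Same state in all members, and that state seen in no non-member.
            if len(inside) == 1 and not (inside & outside):
                result[group].append((pos, next(iter(inside))))
    return result


# Hypothetical core matrix and cluster assignments.
toy_matrix = {
    "s1": "AAGT",
    "s2": "AAGC",
    "s3": "CAGT",
    "s4": "CATT",
    "s5": "GCGT",
}
toy_groups = {"s1": "cluster_1", "s2": "cluster_1",
              "s3": "cluster_2", "s4": "cluster_2",
              "s5": "cluster_3"}
found = diagnostic_snps(toy_matrix, toy_groups)
print(found)
```

Positions are reported here as 0-based column indices in the matrix; mapping them to coordinates in a reference genome, as done for **Supplementary Table S2**, would be a separate step.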

### STRUCTURE Analyses

The STRUCTURE program (Pritchard et al., 2000; Falush et al., 2003) performed model-based Bayesian clustering of genomes using the core SNP matrix without *Salmonella*, *E. fergusonii* or SB serotype 13 (related to *E. albertii*). Default parameters, which allow for admixture, were used for values of *k* from 2 through 11. We report results for the best fitting value of *k*, identified by STRUCTURE HARVESTER based on changes in likelihood scores across the values of *k*, as well as for the value of *k* corresponding to the number of phylogenetic clusters (Evanno et al., 2005; Earl and vonHoldt, 2012). We ran ten replicate STRUCTURE runs for *k* = 2 to 11, each consisting of 6 × 10<sup>4</sup> generations, of which the first 10<sup>4</sup> served as the burn-in. Analyses were visualized using the DISTRUCT program (Rosenberg, 2004).
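The selection of *k* from changes in likelihood can be sketched with the ΔK statistic of Evanno et al. (2005). The Python below is a simplified illustration, not STRUCTURE HARVESTER: it assumes ΔK is the absolute second difference of the mean log-likelihoods divided by the standard deviation of the replicate log-likelihoods at that *k*, and the log-likelihood values are invented for the example.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """Compute Delta K for each interior k.

    lnp: dict mapping k -> list of replicate log-likelihoods L(k).
    Assumed form: |mean L(k+1) - 2 * mean L(k) + mean L(k-1)| / sd(L(k)).
    """
    means = {k: mean(values) for k, values in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            spread = stdev(lnp[k])
            if spread > 0:  # Delta K is undefined when replicates are identical
                delta[k] = second_diff / spread
    return delta


# Hypothetical replicate log-likelihoods (three replicates per k).
toy_lnp = {
    2: [-1000, -1002, -998],
    3: [-900, -901, -899],
    4: [-890, -892, -888],
    5: [-885, -886, -884],
}
delta_k = evanno_delta_k(toy_lnp)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)

# Generations retained after burn-in for the run length stated in the text.
retained = 6 * 10**4 - 10**4
print(retained)
```

Because ΔK needs values on both sides of each *k*, it cannot be computed for the smallest and largest *k* tested (here 2 and 5; in the study, 2 and 11).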

### Genome Annotation, Homology Prediction, and Similarity Matrix

[Figure caption fragment: … outgroup can be found in Supplementary Figure S1.]

Genome annotation was performed with RAST v. 2.0 (ClassicRAST; Overbeek et al., 2013). Annotated genomes were used to predict the homology of predicted proteins using the GET\_HOMOLOGUES (Contreras-Moreira and Vinuesa, 2013) program, which uses a BLASTP bidirectional best hit approach with the following parameters: 75% amino acid sequence coverage, 1e-05 E-value and 60% sequence identity. This produced an abundance matrix of 3,777 predicted protein homologs that were identified in at least two genomes. Manhattan distances were calculated from this matrix and clustered using the average linkage method with the hclust function in R (R Core Team, 2014). Hierarchical clusters are colored to match the phylogenetic clusters in **Figure 1** in a bar next to the heat map. To obtain bootstrap probabilities (BPs) for the dendrogram and assign approximately unbiased *p*-values (AU), the Pvclust program in R was used with 10,000 replicates; these values are shown next to a heat map generated with ggplot2 (Suzuki and Shimodaira, 2006; Wickham, 2009; R Core Team, 2014).
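The distance and clustering step can be illustrated with a small pure-Python sketch. The study used the hclust function in R; the code below is only a stand-in that applies the same two ideas (Manhattan distance between abundance profiles, then average linkage) to a hypothetical matrix of four genomes and three homolog families.

```python
from itertools import combinations


def manhattan(u, v):
    """Sum of absolute differences between two abundance profiles."""
    return sum(abs(a - b) for a, b in zip(u, v))


def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster, cluster, height)."""
    names = sorted(profiles)
    dist = {frozenset(pair): manhattan(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}

    def between(c1, c2):
        # Average linkage: mean of all pairwise distances between members.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: between(*pair))
        height = between(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(tuple(sorted(c1 + c2)))
        merges.append((c1, c2, height))
    return merges


# Hypothetical abundance matrix: genome -> copies of each homolog family.
toy_profiles = {
    "g1": [1, 0, 2],
    "g2": [1, 0, 3],
    "g3": [0, 4, 0],
    "g4": [0, 6, 0],
}
merges = average_linkage(toy_profiles)
for step in merges:
    print(step)
```

The merge heights play the role of the dendrogram heights; resampling the homolog columns and repeating the clustering is the general idea behind the bootstrap probabilities that Pvclust reports.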

### Antibiotic Resistance-Related Annotation and Hierarchical Clustering

Using all genomes except those from the Sahl et al. (2015) study, antibiotic resistance genes, antibiotic target genes, and antibiotic biosynthesis genes were identified from a local BLASTN search using files available from the Comprehensive Antibiotic Resistance Database (downloaded in January 2015) with parameters set to an E-value of 1e-06 and 75% identity (**Supplementary Figure S4**; McArthur et al., 2013). The data were filtered to include genes that were present in at least two genomes. Hierarchical clustering, bootstrap support and approximately unbiased *p*-values were determined as described above.

## Evaluation of Previously Described Molecular Assays for the Differentiation of *Shigella* and EIEC

Sahl et al. (2015) reported 11 primer pairs that were specific to their phylogenetic analysis of *Shigella*, but they did not include EIEC strains in their analysis. Similarly, Pavlovic et al. (2011) reported that primers targeting the β-glucuronidase gene (*uidA*) and the lactose permease gene (*lacY*) could differentiate 18 isolates of *Shigella* from 11 isolates of EIEC. Primer sequence identities from those two studies were examined *in silico* using local BLAST searches against the 169 genomes in our analyses. In **Supplementary Figures S5** and **S6**, genomes for which a particular primer pair exhibited 95% or greater, and 92% or greater, similarity, respectively, are shown in blue to indicate predicted PCR amplification. The figures were made using R (R Core Team, 2014).
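To make the identity thresholds concrete, the following Python sketch scores a primer against a genome by ungapped identity on both strands and calls a primer pair "predicted to amplify" when both primers reach the threshold. This is a simplification and not the BLAST-based procedure used in the study: it ignores gaps, primer orientation, and product size, and the primer and genome sequences are invented for the example.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def best_ungapped_identity(primer, genome):
    """Highest percent identity of the primer over all ungapped windows."""
    primer, genome = primer.upper(), genome.upper()
    n = len(primer)
    best = 0.0
    for start in range(len(genome) - n + 1):
        window = genome[start:start + n]
        matches = sum(p == g for p, g in zip(primer, window))
        best = max(best, 100.0 * matches / n)
    return best


def predicted_to_amplify(forward, reverse, genome, threshold):
    """True if both primers reach the identity threshold on either strand."""
    other_strand = reverse_complement(genome)

    def hits(primer):
        identity = max(best_ungapped_identity(primer, genome),
                       best_ungapped_identity(primer, other_strand))
        return identity >= threshold

    return hits(forward) and hits(reverse)


# Hypothetical 20-nt primers and a toy genome with one mismatch in the
# forward primer site and a perfect reverse primer site.
forward = "ATGCCGTTAGCTAGGCTTAA"
reverse = "CGATTGCAAGTCCGATAGGC"
forward_site = forward[:5] + "A" + forward[6:]  # single substitution
genome = "GGG" + forward_site + "CCCC" + reverse_complement(reverse) + "TTT"

print(best_ungapped_identity(forward, genome))
print(predicted_to_amplify(forward, reverse, genome, 95.0))
print(predicted_to_amplify(forward, reverse, genome, 100.0))
```

For a 20-nt primer, one mismatch corresponds to 19/20 matching bases, which is the 95% threshold described for the Sahl et al. (2015) primers.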

## RESULTS

## Phylogeny

One hundred and seventy-one genomes were selected to encompass a large selection of EIEC strains and represent the diversity of the *Shigella* genus. Genomes from 35 isolates were in-house sequenced draft genomes while 136 were available in public databases (**Supplementary Table S1**). We used 23 isolates of SD, including a minimum of 14 serotypes; 36 SF isolates, including at least six serotypes; 32 SB isolates, covering all 20 serotypes; 26 SS isolates; 32 EIEC isolates with 15 different serotypes; 18 isolates of non-invasive *E. coli* composed of 14 different serotypes; and two isolates of *E. fergusonii*. The genomes of two *Salmonella* isolates were used as an outgroup (**Table 1**).

Single nucleotide polymorphisms found in every genome, defined as core SNPs, were used to generate SNP matrices. The kSNP v. 2.0 program (Gardner and Hall, 2013), which uses a *k*-mer based approach to identify variant sites across a set of genomes, generated SNP matrices consisting of 7,062 or 2,348 core SNPs depending on whether the *Salmonella* outgroup was excluded (**Figure 1**) or included (**Supplementary Figure S1**). Subsequent phylogenetic reconstruction based on both SNP matrices resolved 11 groups that did not follow the taxonomic classification of the samples, thus implying that *Shigella*, EIEC, and non-invasive *E. coli* were polyphyletic (**Figure 1**; **Supplementary Figure S1**). With the exception of the EIEC large cluster, all clusters had adequate bootstrap support (greater than 0.83). The phylogeny shows that SD serotype 1, SD serotype 8, SD serotype 10, and SB serotype 13 do not cluster with any other *Shigella* serotypes (**Figure 1**). Clusters 1, 2, 3, 7, and 9 were composed of either EIEC or *Shigella* strains in combination with non-invasive *E. coli* strains, whereas clusters 4, 5, 6, 8, 10, and 11 contained only EIEC or *Shigella*. Clustering of SB and SD genomes suggests there are not distinct SB and SD lineages. Most SF genomes clustered together except those of SF serotype 6, which fall into cluster 11 with several serotypes of SB and SD. In the absence of actual isolates for SF(13) and SF(15) to conventionally determine their O-antigen type by sero-agglutination, we turned to molecular serotyping targeting the *wzx* and *wzy* genes involved in the assembly of the O-antigen. Alignments of the SF(13) and SF(15) *wzx* and *wzy* genes with those of *S. flexneri* serotype 6 showed 99% identity (data not shown), and both strains were identified as *E. coli* O147, which is nearly identical to *S. flexneri* type 6 (Liu et al., 2008), using the SerotypeFinder software (v. 1.1) accessible on the Center for Genomic Epidemiology server<sup>2</sup>. For perspective on how many SNP differences are represented by the branch lengths, histograms of the pairwise distances of total SNP number between pairs of genomes can be found in **Supplementary Figure S2**.

#### Population Structure of SNP Clustering

Genome similarity was then investigated using a Bayesian clustering method that does not impose the bifurcating structure of phylogenetic analyses. The population structure of the samples was therefore examined using the Bayesian model-based program STRUCTURE v. 2.3.4, with the core SNP matrix from the kSNP program without *Salmonella* as input. The program assigns individuals to a fixed number of clusters (*k*) allowing for admixture (e.g., recombination, ancestral polymorphism, horizontal gene transfer). The program STRUCTURE Harvester was used to infer the optimal value of *k* that best fits the data (Evanno et al., 2005; Earl and vonHoldt, 2012), which was determined to be 6 (**Figure 2A**). We also chose a *k* value of 11 to represent the number of clusters in the phylogenetic analyses (**Figures 1** and **2B**). Both cluster schemes were similar to the phylogeny, particularly for SS, SF, ExPEC, and EIEC lineages and the two distinct SB/SD lineages (**Figure 1**). Genomes in clusters that include SD serotype 1, SD serotype 10, and SD serotype 8 shared core SNPs with genomes in the EIEC and ExPEC clusters and, in very small proportions, with the SF and SS clusters (**Figures 2A,B**). When core SNPs from SF genomes were grouped into 11 genetic groups, the topology was similar to that of the six groups, with the exception of the SF genomes, which appear to have two genetic backgrounds that roughly correspond to the clustering observed in the phylogeny (**Figure 1**; **Supplementary Figure S3**).

#### Clusters of Predicted Protein Homologs

Differences in gene content among the genomes were then investigated based on the abundance of predicted protein homologs. After annotating all genomes with RAST (Overbeek et al., 2013), homologous translated genes were identified using the program GET\_HOMOLOGUES, which uses a BLASTP bidirectional best-hit approach (Contreras-Moreira and Vinuesa, 2013). By restricting our analyses to the genes that were shared between at least two individuals, we obtained a matrix composed of 3,777 genes and their abundances within each genome. The abundance matrix was hierarchically clustered with the average linkage method and Manhattan distances to identify differences in these profiles using the R package Pvclust (Suzuki and Shimodaira, 2006; R Core Team, 2014). Pvclust was also used to obtain statistical support for clusters based on both AU *p*-values and BP (Suzuki and Shimodaira, 2006). This showed that genomes from the phylogeny in the SS, SF, and SB/SD large clusters have significantly clustered translated gene abundance profiles, with BP and AU of 100/100, 100/100, and 93/97, respectively (**Figure 3**). Hierarchical clustering of antibiotic resistance-related genes shows patterns that are consistent with these studies and may indicate lineage-specific selection in SS and some SD serotypes (**Supplementary Figure S4**).

### Lineage-Specific SNP Identification and Evaluation of Previously Described Molecular Assays for the Differentiation of *Shigella* and EIEC

To identify lineage specific SNPs, we excluded the *Salmonella* outgroup to focus on differentiating among *Shigella* and EIEC lineages. From 7,062 core SNPs, we found 254 SNP positions that were diagnostic for each of the clusters (**Supplementary Table S2**). A description of the diagnostic SNPs by phylogenetic cluster is found in **Table 2**.

<sup>2</sup>https://cge.cbs.dtu.dk/services/

To illustrate the importance of using a genetically diverse set of genomes for the development of molecular epidemiological markers, we performed *in silico* analyses of primer sequence identities using BLAST searches for each primer against the full set of 169 genomes, requiring a sequence identity of 95% or higher (one base pair difference per primer) for primers from Sahl et al. (2015) and 92% or higher for primers from Pavlovic et al. (2011) (**Supplementary Figures S5** and **S6**). We predict that these primers would not accurately distinguish between the phylogenetic groups determined by Sahl et al. (2015) or between *Shigella* and EIEC genomes, as suggested by Pavlovic et al. (2011; **Supplementary Figures S5** and **S6**).

### DISCUSSION

To the best of our knowledge, this study represents the most comprehensive phylogeny of *Shigella* and EIEC to date. Unlike previous studies exploring the molecular relationships between *E. coli* and *Shigella* (Pupo et al., 2000; Lan and Reeves, 2002; Escobar-Páramo et al., 2003, 2004; Lan et al., 2004; Touchon et al., 2009; Sims and Kim, 2011; Zhang and Lin, 2012; Gardner and Hall, 2013; Zuo et al., 2013; Sahl et al., 2014, 2015), we used a large number and diversity of *Shigella* and EIEC genomes, including the recently discovered SB serotypes 19 and 20 and SD serotype 15, and performed genomic-scale phylogenetic analyses. The phylogeny together with the population structure analyses and the clustering of translated gene abundance profiles suggest that *Shigella* and EIEC evolved independently (**Figures 1–3**). Due to the polyphyly observed for EIEC, EIEC as a group cannot be considered as the ancestor to *Shigella* although some EIEC lineages may be the ancestor to *Shigella* (**Figure 1**). Interestingly, the phylogeny obtained is similar to the ones constructed using multi locus genotype data and other inference methods (i.e., neighbor-joining; Pupo et al., 2000; Lan and Reeves, 2002; Escobar-Páramo et al., 2003; Lan et al., 2004).

### Incongruence between Phylogeny and Taxonomy

A few studies have concluded that *Shigella* arose from a single common ancestor (or monophyletically; Escobar-Páramo et al., 2003; Zuo et al., 2013). This conclusion likely comes from phylogenetic analyses conducted with a limited diversity of *Shigella* strains and serotypes and EIEC isolates. Analyses that include a broader diversity of strains support a hypothesis of multiple origins (Pupo et al., 2000; Lan and Reeves, 2002; Lan et al., 2004; Sahl et al., 2015). Although many topological characteristics of our SNP-based phylogeny, such as the polyphyly of SB/SD, have been identified previously (Pupo et al., 2000; Lan and Reeves, 2002; Escobar-Páramo et al., 2003, 2004; Lan et al., 2004; Sahl et al., 2015), we clearly show that *Shigella* and EIEC genomes originated from multiple independent events. Similarly, the grouping of SF serotype 6 near SB serotypes 2, 4, and 14 indicates that, despite being called SF, they are part of the SB/SD large lineage (Pupo et al., 2000; Lan et al., 2004). As expected from previous studies that link SB serotype 13 to *E. albertii* (Pupo et al., 2000; Lan and Reeves, 2002; Hyma et al., 2005), our SB serotype 13 representative genome clusters outside of *E. coli*, EIEC and *Shigella* groups, where it appears at the base of the phylogeny on an exceptionally long branch.

When considering EIEC specifically, our results are in agreement with those of Lan et al. (2004), where O124, O152, and O135 serotypes cluster together and O136, O28ac, O164, and O29 cluster together. Similarly, we observed that EIEC serotype O112ac clustered near SB serotype 12 and SD serotype 2, and identified only five core SNP differences between EIEC serotype O112ac and SD serotype 2(2) (**Figures 1–3**).

One topological difference between our phylogeny and previous phylogenies is the clustering of SB serotype 12. In our phylogenetic analyses (**Figure 1**), SB serotype 12 clusters in the SB/SD small cluster as opposed to clustering with SF strains in trees constructed by Pupo et al. (2000) and Lan et al. (2004). Our kSNP analyses reveal that there are only eight core SNP differences between SB serotype 12 and SD serotype 2(1). However, clustering of the translated gene abundance matrix shows that SB serotype 12 clusters by itself, away from any isolates it clusters near in the phylogeny (**Figure 3**). This suggests that SB serotype 12 may have a unique genetic history requiring additional analyses.

Given that we did not remove homoplastic SNPs based on the phylogenetic results, we can infer the degree of admixture (perhaps due to recombination) among the samples based on the STRUCTURE results. In general, the clustering at both *k* = 6 and *k* = 11 shows only a few samples with SNP profiles that suggest admixture with other distinct groups. Also from the STRUCTURE analyses, we see that hybrid strains within SB, SD, and SS lineages may be rare. An exception is SB serotype 9 (**Figure 3**) and, similar to SB serotype 12 discussed above, the hierarchical clustering of the translated gene abundance matrix shows it clustering distantly from strains it clusters near in the phylogeny. It would be interesting to further investigate a range of SB serotype 9 isolates to determine if this pattern is common and represents a transitional strain.

TABLE 2 | Phylogenetic group name (from Figure 1), number of individuals within each group (*N*) and the number of diagnostic SNPs (*Dsnps*).

While we did not specifically investigate the evolutionary history of the invasion plasmid, our data do not support the hypothesis proposed by Escobar-Páramo et al. (2003) that the invasion plasmid was transferred before the evolution of *Shigella* and EIEC lineages. Our phylogeny and the DISTRUCT diagram (**Figures 1** and **2**) suggest that EIEC cluster with non-invasive *E. coli* genomes that do not possess the invasion plasmid, implying that the transfer of the invasion plasmid did not precede a monophyletic evolution of *Shigella* and EIEC.

### Importance of Sampling Diverse Genetic Lineages

Our study underscores the importance of including a diverse collection of *Shigella* and EIEC genomes into phylogenetic studies that examine *Shigella,* as we were able to make a number of novel findings with high confidence. For example, EIEC strains appear to have a greater genetic diversity than previously believed, with EIEC strains clustering near non-invasive *E. coli* strains (**Figure 1**). For this reason, the inclusion of a range of EIEC strains for developing diagnostic tools is essential for accurate and clinically relevant identification, as well as for outbreak detection. When genetic diversity is not a component of investigations for diagnostic purposes, markers may not be useful. One example is a recent study that presents diagnostic markers for PCR detection of *Shigella* (Sahl et al., 2015), yet the primers for these markers do not discriminate between *Shigella* and EIEC when a larger genetic diversity is considered (**Supplementary Figure S5**). While another study included 11 EIEC strains (Pavlovic et al., 2011), their primers and probes cannot accurately distinguish between *Shigella* and EIEC (**Supplementary Figure S6**).

Single nucleotide polymorphism markers are a useful genotyping/molecular epidemiological typing method because they are considered relatively genetically stable and not likely to change, to such a degree that classification tools are built based on SNPs (Larkeryd et al., 2014). Another asset is that a nucleotide should always be present at the SNP position, reducing the number of false negatives from presence/absence-type gene markers. SNP detection methods are also considered excellent for their discriminatory power, reproducibility and ability to be used in a high-throughput capacity (Hallin et al., 2012). With these advantages in mind, we identified multiple SNPs for the phylogenetic groups (except EIEC large), which offer researchers multiple opportunities for optimizing primer design and confirming positive results. Our inability to identify diagnostic SNP markers for the EIEC large cluster suggests that a greater diversity of EIEC isolates would be needed for markers (**Supplementary Table S3**). The lower bootstrap support (0.61) for the EIEC large cluster (**Figure 1**) is consistent with a need for additional genomes with greater genetic diversity.
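One way to picture what a diagnostic SNP is can be sketched in a few lines of Python. The sketch below assumes a working definition that the text does not spell out: a position is diagnostic for a group when every member shares a single allele that no genome outside the group carries. The function name, sample labels, and alleles are hypothetical, and this is not the procedure used to produce Table 2.

```python
from typing import Dict, List, Tuple


def diagnostic_snps(
    alleles: Dict[str, str], groups: Dict[str, str], target: str
) -> List[Tuple[int, str]]:
    """Return (position, allele) pairs fixed in `target` and absent elsewhere.

    Assumed definition: all members of the target group share one allele at
    the position, and no non-member carries that allele.
    """
    members = [s for s, g in groups.items() if g == target]
    others = [s for s, g in groups.items() if g != target]
    if not members or not others:
        raise ValueError("need at least one member and one non-member")
    length = len(alleles[members[0]])
    hits = []
    for pos in range(length):
        inside = {alleles[s][pos] for s in members}
        outside = {alleles[s][pos] for s in others}
        if len(inside) == 1 and not (inside & outside):
            hits.append((pos, next(iter(inside))))
    return hits


# Hypothetical toy core SNP matrix (rows = genomes, columns = SNP positions).
toy_alleles = {"s1": "AACG", "s2": "AACT", "s3": "GACT", "s4": "GATT"}
toy_groups = {"s1": "X", "s2": "X", "s3": "Y", "s4": "Y"}
print(diagnostic_snps(toy_alleles, toy_groups, "X"))
```

Under this definition, a group that is genetically heterogeneous or poorly sampled can easily yield no diagnostic positions, which is in line with the outcome reported for the EIEC large cluster.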

Analyses looking for the presence/absence of core genes that were specific to each cluster yielded no such genes. This is in agreement with another study that did not identify *Shigella*-specific genes that were distinct from *E. coli* using orthologous genes from pan-genomes (Gordienko et al., 2013). As the authors and our data suggest, phylogenetic evidence points toward *Shigella* belonging to the *E. coli* species and thus these groups are likely sharing the same pool of genes. In summary, the polyphyletic nature of the *Shigella* and *E. coli* groups and putative taxonomy makes the strategy of identifying genes specific to these groups difficult.

The clustering based on the abundance matrix of translated genes is not strictly congruent with the topology inferred from the phylogenetic analyses using the SNP data (**Figures 1** and **3**). However, most of the incongruence is among clades rather than the membership of individuals to specific clades. For example, all but one of the individuals belonging to the SF, SB/SD large, and SS clusters are found grouped together in the trees based on the SNP and gene abundance data, but the relationships among those clades do differ (**Figures 1** and **3**). Overall, we find support that gene content/abundance carries a similar evolutionary signal as that contained in SNPs. For example, there is an appreciable amount of resolution and fidelity to the relationships depicted in the phylogeny using the hierarchically clustered distance matrix of predicted protein homologs for clusters of genomes from SF, SS, SB/SD large clusters (**Figure 3**). These clusters have significant AU values and strong bootstrap support. However, differences do exist between the methods, which may be the result of unresolved basal relationships and/or unique isolate outliers (such as EIEC O96:H16, SB serotypes 9 and 12). It is also possible that the gene abundance analyses are capturing a stronger signal from recombination and mobile elements than would be present in the core SNP matrix. A similar incongruence was observed in a very limited number of *Shigella* and *E. coli* genomes between phylogenies based on core SNPs and using BLAST derived coding sequences (CDSs; Sahl et al., 2014). Some degree of incongruence is to be expected due to gene histories being linked but different from species histories (Szöllosi et al., 2015). For example, studies of SS and SD provide evidence that these lineages are undergoing selection for drug and multidrug resistance and we also observed a pattern of clustering of antibiotic resistance-related genes that are linked to phylogeny but also may have individual gene histories (**Supplementary Figure S4**; Holt et al., 2012; Rohmer et al., 2014).

#### CONCLUSION

There is a growing acknowledgment that microbial taxonomy should be based on a more comprehensive and exhaustive survey of genomes (Rosselló-Móra and Amann, 2015; Thompson et al., 2015). Current problematic taxonomic designations are common throughout microbial taxonomy (Rosselló-Móra and Amann, 2015; Thompson et al., 2015 and references within). In the case of *Shigella*, genomic evidence supporting the change of taxonomic designations is well-established (Pupo et al., 2000; Lan and Reeves, 2002; Escobar-Páramo et al., 2003, 2004; Lan et al., 2004; Gardner and Hall, 2013; Sahl et al., 2014, 2015). Based on these studies and the analyses conducted herein, there is a large body of evidence that the *Shigella* genus should be moved back within the species *E. coli*. Furthermore, we suggest that *Shigella* should be classified as EIEC and the serotypes renamed using the common O antigen naming. *Shigella* serotypes are based upon O antigens, many of which are identical or nearly identical to existing *E. coli* O antigens (with the exception of *S. sonnei*; Liu et al., 2008). The existence of two separate nomenclatures is redundant and confusing. We are repeating a long-established call to reduce confusion and promote the understanding of accurate evolutionary relationships of *Shigella* and *E. coli* (Lan and Reeves, 2002; Chaudhuri and Henderson, 2012). While we believe that taxonomic designations that more accurately reflect genetic relationships can improve outbreak characterization and communication in the long term, taxonomic revisions are difficult and some may consider that revisions pose risks for public health in the more immediate time frame. We support the growing recognition of the value behind systematic species or genome similarity assignments for all players involved in real-time epidemics (Marakeby et al., 2014; Varghese et al., 2015; Weisberg et al., 2015). In the absence of universal genome-based classification and naming systems, our results provide support for reconsidering the current taxonomic placement and naming of *Shigella* species.

### AUTHOR CONTRIBUTIONS

EP generated sequence data. EP and JP performed analyses. EP, JP, and RB interpreted results and wrote the manuscript. RB conceived the project. All authors read and approved the final manuscript.

#### FUNDING

EP was funded in part by an appointment with the Research Participation Program at the Center for Food Safety and Applied Nutrition administered by Oak Ridge Institute for Science and Education through an interagency agreement between the U.S. Department of Energy and the U.S. Food and Drug Administration.

#### ACKNOWLEDGMENTS

We would like to thank John Miller and Yan Luo for assistance with Python scripting and analyses, and Tom Hammack and Eric Brown for their support and critical reading of the manuscript. The authors would also like to thank Rosangela Tozzoli and Stefano Morabito for providing the raw data for EIEC O96:H19 before public release of the genome as well as Anthony Maurelli and the STEC center for providing isolates.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2015.01573

FIGURE S1 | A maximum-likelihood (ML) phylogeny of *Shigella*, enteroinvasive *E. coli* (EIEC), non-invasive *E. coli* strains and *Salmonella* outgroup based on 2,348 SNPs present in all genomes using the kSNP program (Gardner and Hall, 2013). The ML tree was generated using GARLI v. 2.0.1019 (Zwickl, 2006) under the GTR + I + Γ model and other default settings. Trees were visualized with Figtree v. 1.3 (Rambaut and Drummond, 2009). The best tree was chosen from 100 runs of the data set and bootstrap values (1,000 iterations) are reported above each node. Bootstrap values <80% were not shown.

FIGURE S2 | Histograms of the pairwise distances of core SNP differences between genome pairs for the SNP-based phylogenies (A) without the *Salmonella* outgroup and (B) with the *Salmonella* outgroup.

FIGURE S3 | Reordered STRUCTURE results for *S. flexneri* genomes from analyses performed with 11 SNP groups (right) corresponding to the phylogenetic cluster in Figure 1 (left).

FIGURE S4 | Hierarchical clustering of antibiotic resistance related genes. Red values on dendrogram represent unbiased *p*-values determined by Pvclust package in R. The dendrogram was generated using the correlation distance method and the average linkage method.

FIGURE S5 | BLAST alignment of primers, described by Sahl et al. as specific for Shigella phylogenetic groups (Sahl et al., 2015), with genomes used in this study. A blue cell for a particular genome indicates that both primers of the pair aligned to 95% or greater sequence identity and should therefore hybridize to yield a PCR product. The phylogenetic group designation assigned by Sahl et al. is noted next to the cluster designations we observed with these genomes.

FIGURE S6 | *In silico* alignment of primer-probe sets described by Pavlovic et al. (2011) with genomes used in this study using BLAST. The *lacY* set was supposed to differentiate between *Shigella* (absent) and EIEC (present), while the *uidA* set was intended to be a positive control (present in both). BLAST identities of 92% or higher are shown with blue cells. Although PCR products are expected from a particular genome if both cells corresponding to the forward and reverse primers are highlighted in blue, the real-time PCR assay (Pavlovic et al., 2011) also requires the respective probe to hybridize efficiently and therefore the respective cell to be highlighted in blue in the figure.

TABLE S1 | Strain information includes NCBI identifier (SRA#), Tree label/Strain designation, genus and species with serotype, O or H antigens, additional strain identifiers and reference for source of genomes.

TABLE S2 | Full list of diagnostic SNPs for *Shigella* and EIEC phylogenetic clusters. Includes phylogenetic cluster name, 21 bp sequence of region containing diagnostic SNP with ambiguous SNP state represented by ".", diagnostic SNP state of cluster, position in the NCBI annotated reference genome (SD serotype 1, CP000034), gene name ("NA" if intergenic), functional gene product ("NA" if intergenic), COG identifier and reference genome (CP000034) locus tag.


TABLE S3 | Assembly statistics and genome metrics calculated by the Quast program. Includes Tree label/Strain designation, NCBI SRA accession number, number of contigs greater or equal to 1,000 bp (# contigs (≥1,000 bp)), number of contigs greater or equal to 0 bp (# contigs (≥0 bp)), total length of contigs greater or equal to 1,000 bp (Total length (≥1,000 bp)), total length of contigs greater or equal to 0 bp (Total length (≥0 bp)), number of contigs, largest contig (bp), total length of all contigs, percent GC content and number of N's per 100 kbp.


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

*Copyright © 2016 Pettengill, Pettengill and Binet. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.*

# Whole Genome Sequencing for Public Health Surveillance of Shiga Toxin-Producing Escherichia coli Other than Serogroup O157

Marie A. Chattaway, Timothy J. Dallman, Amy Gentle, Michael J. Wright, Sophie E. Long, Philip M. Ashton, Neil T. Perry and Claire Jenkins \*

*Gastrointestinal Bacteria Reference Unit, Public Health England, London, UK*

Shiga toxin-producing *Escherichia coli* (STEC) are considered to be a significant threat to public health due to the severity of gastrointestinal symptoms associated with human infection. In England, STEC O157 is the most commonly detected STEC serogroup; however, the implementation of PCR at local hospital laboratories has resulted in an increase in the detection of non-O157 STEC. The aim of this study was to evaluate the use of whole genome sequencing (WGS) for routine public health surveillance of non-O157 STEC by comparing this approach to phenotypic serotyping and PCR for subtyping the stx-encoding genes. Of the 102 isolates where phenotypic and genotypic serotyping could be compared, 98 gave fully concordant results. The most common non-O157 STEC serogroups detected were O146 (22) and O26 (18). All but one of the 38 isolates that could not be phenotypically serotyped (designated O unidentifiable or O rough) were serotyped using the WGS data. Of the 73 isolates where a flagella type was available by traditional phenotypic typing, all results matched the H-type derived from the WGS data. Of the 140 sequenced non-O157 isolates, 52 (37.1%) harboured *stx1* only, 42 (30.0%) had *stx2* only, and 46 (32.9%) carried *stx1* and *stx2*. Of these, stx subtyping PCR results were available for 131 isolates and 121 of these had concordant results with the stx subtype derived from the WGS data. For the 10 discordant results, non-specific primer binding during PCR amplification, due to the similarity of the stx2 subtype gene sequences, was the most likely cause. The results of this study showed WGS provided a reliable and robust one-step process for characterization of STEC. Deriving the full serotype from WGS data in real time has enabled us to report a higher level of strain discrimination while stx subtyping provides data on the pathogenic potential of each isolate, enabling us to predict clinical outcome of each case and to monitor the emergence of hyper-virulent strains.

Keywords: whole genome sequencing, Shiga Toxin-producing Escherichia coli, serotyping, stx subtyping

#### Edited by:

*Chitrita Debroy, The Pennsylvania State University, USA*

#### Reviewed by:

*Remy Froissart, Centre National de la Recherche Scientifique, France; Seamus Fanning, University College Dublin, Ireland*

> \*Correspondence: *Claire Jenkins claire.jenkins@phe.gov.uk*

#### Specialty section:

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

Received: *16 January 2016* Accepted: *16 February 2016* Published: *03 March 2016*

#### Citation:

*Chattaway MA, Dallman TJ, Gentle A, Wright MJ, Long SE, Ashton PM, Perry NT and Jenkins C (2016) Whole Genome Sequencing for Public Health Surveillance of Shiga Toxin-Producing Escherichia coli Other than Serogroup O157. Front. Microbiol. 7:258. doi: 10.3389/fmicb.2016.00258*

## INTRODUCTION

Shiga toxin-producing Escherichia coli (STEC) are considered to be a significant threat to public health due to the severity of gastrointestinal symptoms associated with human infection and the risk of cases developing Haemolytic Uraemic Syndrome (HUS; Byrne et al., 2015). STEC are zoonotic; transmission occurs by direct contact with animals or their environment, or by consumption of contaminated food or water (Byrne et al., 2014). The infectious dose is low (<10 organisms) and person-to-person spread is common, particularly in nursery school settings and in households with young children (Byrne et al., 2015).

In England, the current Standards for Microbiology Investigations protocols are specific for the isolation of non-sorbitol fermenting colonies of E. coli serogroup O157 on cefixime tellurite sorbitol MacConkey (CT-SMAC) agar. STEC serogroups other than O157 (non-O157 STEC) are not detected using this method (Byrne et al., 2014). However, since 2012 the implementation of commercial PCR assays for the detection of STEC in faecal specimens from cases with symptoms of gastrointestinal infection, at twelve local hospital laboratories, has resulted in an increase in the detection of non-O157 STEC (Byrne et al., 2014).

Faecal specimens that are PCR positive for the Shiga Toxin (stx) genes at the local hospital laboratories in England are sent to the Gastrointestinal Bacterial Reference Unit (GBRU) at Public Health England (PHE) for isolation of STEC (Jenkins et al., 2012) and subsequent serotyping (Gross and Rowe, 1985). Recent advances in whole genome sequencing (WGS) have led to the development of a method for high throughput sequencing of bacterial genomes at low cost (Joensen et al., 2014). During 2014, we evaluated the use of WGS for routine public health surveillance of non-O157 STEC by comparing this approach to phenotypic serotyping and PCR for subtyping the stx-encoding genes (Persson et al., 2007).

### MATERIALS AND METHODS

All 167 strains of non-O157 STEC isolated during 2014 were phenotypically serotyped by the agglutination of antibodies raised in rabbits to the lipopolysaccharide O antigen and to the flagella H antigen (Gross and Rowe, 1985). Real-time PCR targeting stx1 and stx2 and the stx subtyping PCR was performed as previously described (Persson et al., 2007; Jenkins et al., 2012).

Genomic DNA extracted from 140 of the 167 strains of non-O157 STEC was fragmented and tagged for multiplexing with Nextera XT DNA Sample Preparation Kits (Illumina) and sequenced using the Illumina HiSeq 2500. A reference database, SerotypeFinder, containing the gene sequences encoding the 180 O antigen groups (wzx, wzy, wzm, and wzt) and the 53 H antigens (fliC, flkA, fllA, flmA, and flnA) was constructed and developed by Joensen et al. (2015). Using the GeneFinder tool (Doumith, unpublished), FASTQ reads were mapped to the genes in the SerotypeFinder database using Bowtie 2 (Langmead and Salzberg, 2012) and the best match to each of the O and H determinants was reported with metrics including coverage, depth, mixture and homology in an XML format for quality assessment. Only in silico predictions of serotype that matched to a gene determinant at >80% nucleotide identity over >80% length were accepted. Stx subtyping was performed as described by Ashton et al. (2015). FASTQ sequences were deposited in the National Center for Biotechnology Information Short Read Archive under the bioproject PRJNA248064.
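The acceptance rule described above (>80% nucleotide identity over >80% of the gene length) can be illustrated with a short filter. This is a hedged sketch only: the `Hit` record, its field names, and the tie-breaking rule for choosing the best match are assumptions made for illustration and do not reproduce the GeneFinder tool or its XML output.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    gene: str        # determinant, e.g. "wzx" or "fliC"
    allele: str      # predicted antigen, e.g. "O26" or "H11"
    identity: float  # percent nucleotide identity to the reference gene
    coverage: float  # percent of the reference gene length covered


def accept(hit: Hit, min_identity: float = 80.0, min_coverage: float = 80.0) -> bool:
    """Apply the strict >80% identity over >80% length rule."""
    return hit.identity > min_identity and hit.coverage > min_coverage


def best_hit(hits: Iterable[Hit]) -> Optional[Hit]:
    """Return the accepted hit with the highest identity, then coverage.

    The ranking rule is an assumption; None means no prediction is made.
    """
    passing = [h for h in hits if accept(h)]
    return max(passing, key=lambda h: (h.identity, h.coverage), default=None)


# Hypothetical hits for the O antigen of one isolate.
o_hits = [
    Hit("wzx", "O26", 99.5, 100.0),
    Hit("wzx", "O103", 85.0, 60.0),  # rejected: covers too little of the gene
]
print(best_hit(o_hits))
```

With a rule of this kind, an isolate whose reads cover too little of the O antigen genes yields no prediction, which matches the single typing failure attributed to low mapping coverage in the Results.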

#### RESULTS

Whole genome sequences were available for 140 of the 167 non-O157 STEC isolates reported in 2014 (Supplementary Table). Of these, 102 had a phenotypically derived serogroup, 25 did not agglutinate with the antisera in the serotyping scheme raised to the known E. coli serogroups and were designated "O unidentifiable," and 13 did not express the O antigen and were designated "O rough." Of the 102 isolates where phenotypic and genotypic serotyping could be compared, 98 gave fully concordant results (Supplementary Table). The most common non-O157 STEC serogroups detected were O146 (22) and O26 (18). There were 15 strains of STEC O55, all from cases linked to an outbreak in the South of England.

Of the four results that were not fully concordant, two isolates serogrouped as O186 phenotypically but were designated O123/O186 by in silico serotyping and one typed as O178 phenotypically and was designated O153/O178 using WGS data. There was one mismatch; STEC O74 was identified as STEC O187 when the serotype was derived from the genome. The in silico serotyping method failed to type one isolate, STEC O146:H21, due to the short read sequences having low mapping coverage of the O antigen encoding genes. All but one of the 38 isolates that could not be phenotypically serotyped (designated O unidentifiable or O rough) were serotyped using the WGS data (Supplementary Table). The most common WGS derived serotypes that were untypable using the phenotypic approach were O91:H14, O117:H7, and O80:H2.

There were 102 isolates that were processed for H-typing, of which 29 were found to be non-motile and could not be typed. Of the 73 isolates where a flagella type was available by traditional phenotypic typing, all results matched the H-type derived from the WGS data. All the non-motile isolates were typable by in silico serotyping (Supplementary Table).

Of the 140 sequenced non-O157 isolates, 52 (37.1%) harboured stx1 only, 42 (30.0%) had stx2 only, and 46 (32.9%) carried stx1 and stx2. Of these, stx subtyping PCR results were available for 131 isolates and 121 of these had concordant results with the stx subtype derived from the WGS data (Supplementary Table).
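The proportions reported here, and the concordance rates implied by the counts, can be checked with simple arithmetic. The snippet below is a minimal sketch that uses only counts stated in the text; the helper name `percent` is hypothetical, and the concordance percentages it prints are derived values rather than figures quoted by the authors.

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage rounded to one decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 1)


# stx profile counts among the sequenced non-O157 isolates (from the text).
stx_profile = {"stx1 only": 52, "stx2 only": 42, "stx1 and stx2": 46}
total = sum(stx_profile.values())  # 140 isolates

for profile, count in stx_profile.items():
    print(f"{profile}: {count}/{total} = {percent(count, total)}%")

# Concordance between the traditional methods and WGS (counts from the text).
print(f"O serogroup concordance: {percent(98, 102)}%")
print(f"stx subtype concordance: {percent(121, 131)}%")
```

On these counts, serogroup concordance works out to about 96.1% and stx subtype concordance to about 92.4%.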

The most frequently detected stx subtype profile was stx1a only (38), most commonly associated with serotypes O103:H2 (14), O26:H11 (8), and O117:H7 (8) (Supplementary Table). There were 14 isolates with stx1c only, including O146:H21 (5) and O76:H19 (3). All stx1a and stx2b strains belonged to STEC O91:H14 (9). There were 23 isolates that had stx1c and stx2b associated with serotypes O146:H21 (9), O128:H2 (8), O113:H4 (2), and O174:H8 (2). The combination of stx1a and stx2a was found almost exclusively in O26:H11 (7), the one exception being an isolate of STEC O71:H2. Of the 21 isolates harbouring stx2a only, 15 were STEC O55:H7 and two were O26:H11. Of the 10 isolates that had stx2d, three belonged to O146:H21 (and also had stx1c), six were O80:H2 and one was O26:H11. Other rare subtypes (one case of each) included stx2e (O8:H9) and stx2g (O187:H28) (Supplementary Table).

There were 10 cases of HUS in 2014 (including five cases belonging to an outbreak of STEC O55:H7): eight had STEC harbouring stx2a, one had STEC O80:H2 carrying stx2d, and O103:H2 stx1a was isolated from the tenth case.

## DISCUSSION

The results of this study showed WGS provided a reliable and robust one-step process for characterisation of STEC. Previous studies have shown an increasing number of strains of STEC reported as "O group unidentifiable" due to antisera failing quality control procedures, unresolvable cross reactions, lack of expression of O antigens (designated "rough") or novel serogroups (Jenkins et al., 2003; Byrne et al., 2014). In this study, all but one of the isolates that were previously phenotypically untypable were serotyped using data derived from the genome.

Of the 10 mismatched results identified in the comparison between the stx subtyping PCR (Scheutz et al., 2012) and the WGS approach (Ashton et al., 2015), all 10 had additional stx subtypes detected by PCR that were not identified in the WGS data (Supplementary Table). Non-specific primer binding during PCR amplification, due to the similarity of the stx2 subtype gene sequences, was the most likely cause.

Historically, the most common stx profile of STEC O26 was stx1a but over the last 10 years a more virulent STEC O26 variant harbouring stx2a has emerged across Europe (Bielaszewska et al., 2013). The enhanced strain characterisation data that WGS provides facilitated the surveillance of emerging strains of STEC associated with more severe disease (for example STEC O55:H7 stx2a, STEC O26:H11 stx2a, STEC O80:H2 stx2d) and with novel stx profiles (for example STEC O26:H11 stx2d) and enabled us to compare data with colleagues in the field (Mariani-Kurkdjian et al., 2014; Delannoy et al., 2015). Previous studies have also reported an association between stx2a and severe disease (Ethelberg et al., 2004; Byrne et al., 2014). The eae gene was detected in 62 (44%) of the 140 non-O157 STEC isolates. All 10 isolates from the HUS cases had eae. None of the STEC strains in this data set had aggR, previously detected in highly pathogenic STEC variants (Boisen et al., 2015).

Prior to the implementation of WGS, due to limited resources and time constraints, H-typing and stx subtyping of STEC were not routinely reported by GBRU. Deriving the full serotype from WGS data in real time has enabled us to report a higher level of strain discrimination while stx subtyping provides data on the pathogenic potential of each isolate, enabling us to predict clinical outcome of each case and to monitor the emergence of hyper-virulent strains.

### AUTHOR CONTRIBUTIONS

MW and NP isolated the STEC and performed the real-time stx PCR. AG and SL performed the phenotypic serotyping, stx subtyping PCR and extracted the DNA. CJ, MC, MW, and NP implemented the wet lab WGS pipelines and performed analysis. TD and PA implemented the bioinformatics pipelines and performed analysis. CJ, MC, and TD wrote the manuscript.

### FUNDING

This work was supported by the National Institute for Health Research and Health Protection Research Unit in Gastrointestinal Infections at the University of Liverpool.

### ACKNOWLEDGMENTS

We would like to acknowledge Yoshini Taylor and Dawn Hedges at GBRU, Rediat Twolde in the Infectious Disease Informatics Unit and Cath Arnold in the Genomics Service Unit at PHE. We would very much like to thank Flemming Scheutz at the Statens Serum Institute in Copenhagen and Katrine Joensen at the Danish Technical University for sharing the SerotypeFinder database with us.

#### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00258

#### REFERENCES

Scheutz, F., Teel, L. D., Beutin, L., Piérard, D., Buvens, G., Karch, H., et al. (2012). Multicenter evaluation of a sequence-based protocol for subtyping Shiga toxins and standardizing Stx nomenclature. J. Clin. Microbiol. 50, 2951–2963. doi: 10.1128/JCM.00860-12

**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Chattaway, Dallman, Gentle, Wright, Long, Ashton, Perry and Jenkins. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

# Whole Genome Sequencing for Genomics-Guided Investigations of *Escherichia coli* O157:H7 Outbreaks

Brigida Rusconi<sup>1,2</sup>, Fatemeh Sanjar<sup>1,2</sup>, Sara S. K. Koenig<sup>1,2</sup>, Mark K. Mammel<sup>3</sup>, Phillip I. Tarr<sup>4</sup> and Mark Eppinger<sup>1,2</sup>\*

*<sup>1</sup> South Texas Center for Emerging Infectious Diseases, University of Texas at San Antonio, San Antonio, TX, USA, <sup>2</sup> Department of Biology, University of Texas at San Antonio, San Antonio, TX, USA, <sup>3</sup> Center for Food Safety and Applied Nutrition, U.S. Food and Drug Administration, Laurel, MD, USA, <sup>4</sup> Department of Pediatrics, Washington University School of Medicine, St. Louis, MO, USA*

#### *Edited by:*

*Chitrita Debroy, The Pennsylvania State University, USA*

#### *Reviewed by:*

*Patrick Fach, ANSES, France Atsushi Iguchi, University of Miyazaki, Japan*

> *\*Correspondence: Mark Eppinger mark.eppinger@utsa.edu*

#### *Specialty section:*

*This article was submitted to Evolutionary and Genomic Microbiology, a section of the journal Frontiers in Microbiology*

*Received: 06 March 2016 Accepted: 08 June 2016 Published: 30 June 2016*

#### *Citation:*

*Rusconi B, Sanjar F, Koenig SSK, Mammel MK, Tarr PI and Eppinger M (2016) Whole Genome Sequencing for Genomics-Guided Investigations of Escherichia coli O157:H7 Outbreaks. Front. Microbiol. 7:985. doi: 10.3389/fmicb.2016.00985*

Multi-isolate whole genome sequencing (WGS) and typing for outbreak investigations has become a reality in the post-genomics era. We applied this technology to strains from *Escherichia coli* O157:H7 outbreaks. These include isolates from seven North American outbreaks, as well as multiple isolates from the same patient and from different infected individuals in the same household. Customized high-resolution bioinformatics sequence typing strategies were developed to assess the core genome and mobilome plasticity. Sequence typing was performed using an in-house single nucleotide polymorphism (SNP) discovery and validation pipeline. Discriminatory power becomes of particular importance for the investigation of isolates from outbreaks in which macrogenomic techniques such as pulsed-field gel electrophoresis or multiple locus variable number tandem repeat analysis do not differentiate closely related organisms. We also characterized differences in the phage inventory, allowing us to identify plasticity among outbreak strains that is not detectable at the core genome level. Our comprehensive analysis of the mobilome identified multiple plasmids that have not previously been associated with this lineage. Applied phylogenomics approaches provide strong molecular evidence for exceptionally little heterogeneity of strains within outbreaks and demonstrate the value of intra-cluster comparisons, rather than basing the analysis on archetypal reference strains. Next generation sequencing and whole genome typing strategies provide the technological foundation for genomic epidemiology outbreak investigation utilizing its significantly higher sample throughput, cost efficiency, and phylogenetic relatedness accuracy. These phylogenomics approaches have major public health relevance in translating information from the sequence-based survey to support timely and informed countermeasures. Polymorphisms identified in this work offer robust phylogenetic signals that index both short- and long-term evolution and can complement currently employed typing schemes for outbreak inclusion and exclusion, diagnostics, surveillance, and forensic studies.

Keywords: *Escherichia coli*, O157:H7, EHEC, phylogenomics, outbreaks, single nucleotide polymorphism, genomic epidemiology, whole genome sequence typing

## INTRODUCTION

Microbial pathogens with a foodborne etiology present major challenges to public health. Escherichia coli has been divided into different pathovars based on key virulence factors that define their pathogenicity (Sadiq et al., 2014). A particularly daunting group among the Shiga toxin-producing E. coli (STEC) comprises strains of the enterohemorrhagic O157:H7 serotype, which can be transmitted by a variety of vehicles and cause serious human disease (Tarr et al., 2005). Currently, there is no effective treatment or prophylaxis for hemolytic uremic syndrome (HUS) (Goldwater and Bettelheim, 2012), and use of antibiotics is not indicated (Freedman et al., 2016). Since its discovery in 1982, this lineage has rapidly evolved from a rare serotype into the now globally dominant enterohemorrhagic E. coli (EHEC) serotype. A remarkable feature is its low infectious dose; it is estimated that 10–100 colony-forming units (CFUs) are sufficient to cause disease (Tilden et al., 1996; Tuttle et al., 1999). For the above reasons, prevention of human infection is critical, and early identification of outbreaks is highly worthwhile. However, only rudimentary information exists regarding the genomic heterogeneity that can be expected within outbreaks (STEC Outbreaks). Moreover, current typing schemes, such as pulsed-field gel electrophoresis (PFGE) and multiple locus variable number of tandem repeats analysis (MLVA), often lack the resolution to differentiate organisms that form tightly clonal phylogenetic clusters within the O157:H7 clade (Eppinger et al., 2011b; Turabelidze et al., 2013; Underwood et al., 2013; Rusconi and Eppinger, 2016). Additionally, PFGE is subject to technological and interpretation challenges (Davis et al., 2003).

Increasing technologic economies offer new opportunities for sequence-based typing of microbial pathogens for public health purposes (den Bakker et al., 2014; Joensen et al., 2014; Leekitcharoenphon et al., 2014; Holmes et al., 2015). While it would be ideal to refer a clinical strain's sequence to a reference, of the 445 publicly available genomes of E. coli O157:H7 and its close relative O55:H7 (O157:H7 Genomes) (Kulasekara et al., 2009; Zhou et al., 2010; Eppinger et al., 2011a, 2013; Sanjar et al., 2014, 2015), to date only 11 have been closed (Hayashi et al., 2001; Perna et al., 2001; Kulasekara et al., 2009; Zhou et al., 2010; Eppinger et al., 2011b, 2013; Kyle et al., 2012; Xiong et al., 2012; Latif et al., 2014; Sanjar et al., 2014, 2015; Cote et al., 2015). Whole genome sequencing (WGS) can provide the necessary resolution power to investigate apparent single source outbreaks (Eppinger et al., 2011b; Hasan et al., 2012; Turabelidze et al., 2013) because the granularity of WGS data provides considerable confidence in assigning like vs. not-like status to two potentially linked pathogens (Gilchrist et al., 2015). Such data can also link pathogens to vehicles or environmental isolates most precisely (Bentley and Parkhill, 2015). WGS can offer additional advantages: serotypes and virulence loci within pathogens can be identified (Scheutz et al., 2012; Leekitcharoenphon et al., 2014; Lambert et al., 2015; Klemm and Dougan, 2016), and case management might theoretically be risk-optimized.

Optimization of E. coli O157:H7 sequence analysis methodologies depends on the scientific and epidemiologic questions being asked and on the data being analyzed. Pettengill et al. evaluated a number of single nucleotide polymorphism (SNP) prediction tools and phylogenetic methodologies in prokaryotes and concluded that a reference-based approach, which accommodates missing data and supports phylogenetic reconstruction, is the most appropriate (Pettengill et al., 2014). Such a reference-based approach was recently used by the Alberta Provincial Laboratory for Public Health to study E. coli O157:H7 outbreaks together with virulence profiling and other molecular methods (Berenger et al., 2015). No specific virulence pattern distinguished the outbreak strains from sporadic strains (Berenger et al., 2015). Recent studies have expanded WGS typing to globally distributed strains and identified geographical genomic structuring based on distribution of stx-converting phage integration sites and SNPs (Mellor et al., 2015; Strachan et al., 2015) and provided a more detailed subtyping of E. coli O157:H7 (Griffing et al., 2015). However, clarity can also be gained by comparing closely related isolates to each other, rather than to reference strains (Leopold et al., 2009; Turabelidze et al., 2013).

Here we adapt WGS to a specifically developed SNP-based pipeline for the high resolution typing of E. coli O157:H7 by identifying SNPs within the core genome. In addition to SNP analysis in the core genome, we assessed plasticity in the mobilome (phages and plasmids) by LS-BSR and plasmid comparison (Eppinger et al., 2011a,b, 2014; Hasan et al., 2012; Jenkins et al., 2015). We tested this pipeline on isolates from seven retrospectively analyzed EHEC O157:H7 outbreaks, six intra-household cases, and five clinical "plate-mate" pairs, i.e., colonies from the same primary isolation plate from the clinical laboratory.

## MATERIALS AND METHODS

### Strains in Study

We compared human isolates (Supplemental Table 1) of nine phylogenetic clades (Manning et al., 2008), so as to place the strains in the overall E. coli O157:H7 phylogenetic context. Strain-associated metadata of analyzed E. coli O157:H7 are provided in Supplemental Table 1. Outbreak strains were defined as a set of isolates from different cases of infection arising from a single point source, as determined by local health jurisdictions and/or the Centers for Disease Control and Prevention. Intra-household cluster strains were recovered from siblings within a household whose infections were not linked to a recognized outbreak. Because intra-household clusters could reflect co-primary infections rather than secondary transmission, we selected such pairings from among our strain set collection (Cornick et al., 2002; Besser et al., 2007) on the basis of prolonged intervals (4–6 days) between cases, so as to increase the likelihood that genomic diversity might emerge secondary to inter-host transmission. Plate mates are pairs of isolates from the same sorbitol-MacConkey agar plate used in clinical laboratories to diagnose the infection.

### Bioinformatic Analyses for Polymorphisms Discovery in Core Genome and Mobilome

The bioinformatics workflows, methods, and principles for SNP discovery and for core and accessory genome analyses used in this study are described in **Figure 1**, with external tools referenced in the legend. Multinucleotide insertions and deletions of polymorphic bases were not considered SNPs. To classify SNPs we mapped the annotation from the de novo annotated references with PROKKA and Prodigal ORF prediction (Hyatt et al., 2010), or the deposited annotation for EC4115 (Eppinger et al., 2011b). The core genome was defined as the set of genic and intergenic regions that were not repeated and did not contain phages, IS elements, or plasmid regions. Briefly, for SNP discovery, reads were aligned with Bowtie2 (Langmead and Salzberg, 2012) to designated reference genomes. Resulting alignments were processed with Freebayes (Garrison and Marth, 2012) with the following threshold settings: mapping quality 30, base quality 20, coverage 30, and allelic frequency 0.9. To account for false positive calls we used several SNP curation strategies: (i) Reference reads were mapped against the reference genome and false positives were identified by Freebayes with the settings described above; (ii) If reads were not available, the post-assembly workflow created a reference-based NUCmer alignment and extracted SNPs with delta-filter and show-snps distributed with the MUMmer package (Delcher et al., 2003). SNPs occurring in the excluded regions were removed. Cataloged SNPs from each genome were merged into a single SNP panel, and allelic status and chromosomal position were recorded. Curated SNPs were further processed by extracting the surrounding nucleotides (40 nt) and searching them with blastn against the query genomes (Altschul et al., 1990). 
Resulting alignments were parsed to remove SNP locations derived from ambiguous hits (≥2), non-uniformly distributed regions, and insertion or deletion events, as previously described (Myers et al., 2009; Morelli et al., 2010; Eppinger et al., 2011b, 2014; Vogler et al., 2011; Hasan et al., 2012).
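The thresholds quoted above can be illustrated with a small post hoc filter. This is a minimal sketch and not the pipeline used in the study: the `Call` record, its field names, and the idea of filtering on per-site summary values are assumptions made for illustration, and Freebayes applies its own thresholds internally in ways this sketch does not reproduce. The `flank` helper assumes that "40 nt" means 40 nt on each side of the SNP, which the text does not specify.

```python
from dataclasses import dataclass

# Threshold values quoted in the text; everything else here is hypothetical.
MIN_MAPQ = 30            # mapping quality
MIN_BASEQ = 20           # base quality
MIN_DEPTH = 30           # coverage
MIN_ALT_FRACTION = 0.9   # allelic frequency


@dataclass
class Call:
    """Hypothetical per-site summary of a variant call."""
    pos: int        # 1-based position on the reference
    ref: str        # reference allele
    alt: str        # alternate allele
    mapq: float     # summary mapping quality at the site
    baseq: float    # summary base quality of the alternate allele
    depth: int      # total read depth at the site
    alt_reads: int  # reads supporting the alternate allele


def passes(call: Call) -> bool:
    """Return True if the call is a single-base substitution meeting all thresholds."""
    # Insertions, deletions and multinucleotide events are not counted as SNPs.
    if len(call.ref) != 1 or len(call.alt) != 1:
        return False
    if call.depth < MIN_DEPTH:
        return False
    return (
        call.mapq >= MIN_MAPQ
        and call.baseq >= MIN_BASEQ
        and call.alt_reads / call.depth >= MIN_ALT_FRACTION
    )


def flank(genome: str, pos: int, width: int = 40) -> str:
    """Return the SNP base plus up to `width` nt on each side (pos is 1-based)."""
    start = max(0, pos - 1 - width)
    end = min(len(genome), pos + width)
    return genome[start:end]
```

The extracted flanks would then serve as blastn queries against each genome, with SNP positions that produce two or more hits discarded as ambiguous.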

### Optical Maps

Optical mapping facilitated accurate phage profiling (Kotewicz et al., 2008). In total 12 maps were generated (Supplemental Table 1), either prepared by OpGen or contributed by FDA (Eppinger et al., 2011b, 2014). After gentle lysis and dilution, the extracted genomic DNA molecules from each strain were spread and immobilized onto derivatized glass slides. The genomic DNA was then digested with BamHI restriction enzyme, maintaining the DNA fragment order. Using the Argus™ Instrument, the DNA fragments were stained with YOYO-1 fluorescent dye and photographed using a fluorescent microscope interfaced with a digital camera. The optical data were converted to digital data, which define single molecule restriction maps. Physical maps were complemented with in silico maps of other outbreak strains, and comparatively analyzed in MapSolver™ Optical Map Analysis software (Latreille et al., 2007; Zhou et al., 2007).
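An in silico restriction map of the kind mentioned above can be approximated by locating BamHI recognition sites in an assembled sequence and listing the ordered fragment sizes. The sketch below is illustrative only and does not reflect how MapSolver builds its maps; the function name and the `circular` option are assumptions. BamHI recognizes the palindromic site GGATCC and cuts after the first G, so scanning one strand is sufficient.

```python
BAMHI_SITE = "GGATCC"  # palindromic recognition sequence
CUT_OFFSET = 1         # BamHI cuts G^GATCC, i.e. one base into the site


def in_silico_fragments(seq, site=BAMHI_SITE, cut_offset=CUT_OFFSET, circular=False):
    """Return ordered fragment sizes after complete digestion of `seq`.

    For circular molecules the first and last pieces are joined. Sites that
    span the origin of a circular sequence are not detected in this sketch.
    """
    seq = seq.upper()
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    if not cuts:
        return [len(seq)]
    if circular:
        sizes = [b - a for a, b in zip(cuts, cuts[1:])]
        sizes.append(len(seq) - cuts[-1] + cuts[0])
        return sizes
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:])]
```

Comparing the ordered fragment sizes of two strains exposes insertions or deletions larger than the map resolution, which is why such maps are useful for phage profiling.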

### SNP PCR Validation

SNPs in four isolates from two outbreaks for which we possessed archived cultures were subjected to PCR confirmation. For strains from the Battle Ground Lake outbreak we used the primer pair 89750-F (5′-ACA ACG ATA TGA TCG ACC AGC) and 89750-R (5′-TTG TAC AGA AGA CCA TGC TCG). For strains from the Finley School District outbreak we used the primer pairs 27005-F (5′-AGA GTA CGG ATT CAC CTT GCC) and 27005-R (5′-AGT CAG GCA ATT CCT CGT GG), and 78298-F (5′-AGT CAT TAC CAG GAA CAG CAG) and 78298-R (5′-TGT TCG AGA TTC TGG TGA GTG). Resulting amplicons were Sanger sequenced.

#### Multi Drug Resistance (MDR) Profiling

Susceptibility to amikacin, ampicillin, amoxicillin-clavulanic acid, cefoxitin, ceftiofur, ceftriaxone, chloramphenicol, ciprofloxacin, gentamicin, kanamycin, nalidixic acid, streptomycin, sulfisoxazole, tetracycline, and trimethoprim-sulfamethoxazole was assessed at FDA according to the NARMS methodology and manufacturer's instructions with the Sensititre automated system (Trek Diagnostic Systems, Westlake, OH) (Zhao et al., 2008). Resistance was determined by comparing MICs to Clinical and Laboratory Standards Institute (CLSI) values (Institute, 2013).

## RESULTS AND DISCUSSION

### Epidemiology of Investigated Strains

We analyzed 36 strains from seven US outbreaks as recognized by the CDC that occurred between 1998 and 2009 (Supplemental Table 1): (1) 11 children were infected after consumption of contaminated ground beef tacos in the Finley School District (FS) in 1998; (2) 28 swimmers at Battle Ground (BL) Lake State Park, WA, and eight secondary cases were infected in 1999; (3) 81 cases were attributed to lettuce served at multiple outlets of a taco chain (Taco John) in 2006; (4) 71 people were infected in a multistate outbreak after eating at Taco Bell (TB) in 2006; though the vehicle was not identified (Taco Bell); (5) 21 infections were attributed to a prolonged multi-state outbreak linked to the consumption of Totino's or Jeno's contaminated pepperoni pizza (Totino's pizza) (TP) in 2007; (6) 76 cases were attributed to a nationwide outbreak of contaminated cookie dough (CD) (Cookie dough) in 2009; (7) 26 patients from eight states were infected by beef traced to Fairbank Farms (FF) in 2009 (Fairbank Farms).

We further studied 12 strains from six intrahousehold illnesses (IH), in which the pathogen probably spread between patients based on the long intervals between onset in the individual family members (Supplemental Table 1), though we cannot exclude the possibility of infection from the same source (co-primary). The median incubation period of E. coli O157:H7 infections is 3 days (Bell et al., 1994), and the intervals between onsets ranged from 4 to 6 days. We also studied pairs of isolates from the same primary plate in the clinical laboratory (plate-mates, PM) from six patients (**Figure 2**). The clinical strains were compared to strains representing the nine phylogenetic clades reported by Manning et al. (2008) (**Figure 2**, Supplemental Table 1).

### Core Genome Phylogeny

#### FIGURE 2 | Continued

Only nodes with bootstrap values below 100 are listed. Phylogenetic clade association is provided in circled numbers (Manning et al., 2008). Strains investigated comprise PM, plate mates; IH, intrahousehold infections; BL, Battle Ground Lake; CD, Cookie Dough; FS, Finley School District; FF, Fairbank Farms; TB, Taco Bell; TJ, Taco John; and TP, Totino's Pizza. Our plasmid survey confirmed that all strains carry the lineage-specific plasmid pO157. Further, we identified other plasmids with a size range from 34 to 78 kb. Plasmid type prevalence is represented in colored circles: p78 (yellow), p63 (orange), p55 (light brown), p39 (red), p36 (brown), p34 (dark brown), and a small 3.3 kb plasmid (black) (Makino et al., 1998). Plasmid p36 is homologous to pEC4115 (Eppinger et al., 2011b).

We applied WGS typing strategies to determine the phylogenetic relatedness of the individual outbreak strains in the context of outbreak etiology, and to place them into the larger phylogenomic framework of the E. coli O157:H7 lineage (Leopold et al., 2009; Eppinger et al., 2011b; Dallman et al., 2015; Holmes et al., 2015; Jenkins et al., 2015). Of the 3313 SNPs identified in these 70 genomes, 2797 were intragenic and 516 were intergenic (**Table 1**, Supplemental Dataset 1). We observed significantly more SNPs in intergenic regions (15.5% of the cataloged SNPs) than would be expected from the average intergenic fraction of 11.1% in EC4115 (chi-square test, p < 10<sup>−14</sup>). We note that, even though we excluded repeated regions and phages/mobilome during SNP discovery, thereby reducing the genome content by 20%, the coding to non-coding ratio of the remaining core genome remained stable. Homoplasy was negligible: only seven homoplastic SNPs were found dispersed throughout the chromosome (Supplemental Table 2), as evidenced by a consistency index of 0.998. Of the seven homoplastic SNPs, four are in rpoS, which is known to be highly polymorphic in E. coli O157:H7 (Uhlich et al., 2013). In line with our previous findings (Leopold et al., 2009; Eppinger et al., 2011a,b), SNPs were evenly distributed throughout the chromosome (**Figure 2**) without any mutational hot spots as found in other enteric pathogens (Hasan et al., 2012; Eppinger et al., 2014). From the cataloged SNP panel we delineated a total of 77 individual SNP genotypes. These genotypes represent only two-thirds of the 115 nodes in the tree (**Figure 2**, Supplemental Table 2), which can be attributed to the lack of terminal strain-specific SNPs (**Figure 2**). Among the cataloged 3313 SNPs, approximately one-third (1,266) are parsimony-informative. The SNPs in PA40 and PA48 are not strain-specific, but indicate the relative phylogenetic distance that separates these clade 7 and 9 strains from the other clades (Manning et al., 2008). As evidenced in the tree topology, approximately half of the parsimony non-informative SNPs (1,046) are introduced by reference strain PA48 from clade 9 (**Figure 2**, Supplemental Table 2). Among investigated strains PA48 is phylogenetically closest to the progenitor O55:H7 serotype (Feng et al., 1998; Manning et al., 2008; Zhou et al., 2010). 
This clade is also within the most ancient cluster of E. coli O157:H7 (Leopold et al., 2009) and higher SNP counts are indicative of more time to accrue mutations than in other phylogenetic groups that have emerged more recently.
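The intergenic enrichment reported above can be checked with a one-degree-of-freedom chi-square goodness-of-fit test on the stated counts (516 intergenic and 2797 intragenic SNPs against an expected intergenic fraction of 11.1%). The sketch below is one plausible formulation and is not necessarily the exact test the authors ran; the function name is hypothetical. It uses only the standard library, relying on the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def chi_square_gof_two_class(observed_a, observed_b, expected_fraction_a):
    """Goodness-of-fit test for two categories (1 degree of freedom).

    Returns the chi-square statistic and its upper-tail p-value.
    """
    total = observed_a + observed_b
    expected_a = total * expected_fraction_a
    expected_b = total - expected_a
    chi2 = ((observed_a - expected_a) ** 2 / expected_a
            + (observed_b - expected_b) ** 2 / expected_b)
    # Survival function of chi-square with df = 1.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Counts and expected fraction as stated in the text.
chi2, p_value = chi_square_gof_two_class(516, 2797, 0.111)
print(f"chi2 = {chi2:.1f}, p = {p_value:.1e}")
```

Under this formulation the p-value falls below the 10<sup>−14</sup> bound given in the text.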

#### Genomic Epidemiology of North American Outbreaks

Guided by the established phylogenomic framework (**Figure 2**), we analyzed outbreak-specific "genome" characteristics and polymorphic heterogeneity in seven different North American outbreaks using a common reference (EC4115), as well as outbreak-specific references. We specifically applied this dual reference genome approach to improve resolution power by enabling polymorphism discovery in parts of the core genome integral to the outbreak-associated strains, but not necessarily present in a more phylogenetically distant reference like EC4115 (**Figure 2**). We note here that our investigation of the 2006 Spinach (SP) outbreak revealed a number of subtle polymorphisms distinguishing all the recovered Maine isolates from the remainder of SP strains. Such subtle polymorphisms would have clearly evaded detection by using a reference from outside the SP outbreak (Eppinger et al., 2011b). In general we observed limited plasticity among related outbreak strains when compared to the closed reference genome EC4115 (**Tables 2**–**5**) (Eppinger et al., 2011b). Strains with increased SNP numbers were either from cases that were epidemiologically predicted to be outliers, or that could not be read-corrected. For example, the TB outbreak-associated strains included four strains classified by CDC as temporal outliers (Supplemental Table 1), two (EC4436, EC4437) separated by 285 SNPs and one (EC4439) by 27 SNPs from the core outbreak cluster (**Figure 2**, **Tables 2**, **3**). According to our SNP analysis, the remaining strain EC4448 should be considered to be derived from a single point source, even if this isolate is separated by a single homoplastic nonsynonymous (nsyn) SNP (**Figures 2**, **3**, **Tables 2**, **3**, Supplemental Table 2). Notably, this homoplastic stop codon mutation lies in rpoS, which is known to be highly polymorphic in E. coli O157:H7, particularly with regard to premature stop codons that affect curli expression and biofilm formation (Uhlich et al., 2013). The same SNP was identified with an outbreak-specific reference, EC4401, in addition to multiple (31) reference-specific alleles (**Figure 4A**, **Tables 2**, **3**, Supplemental Dataset 1). These SNPs were mainly located in intergenic regions (30) and were probably caused by over-prediction, because no reads were available in the genome repository and quality control could therefore not be performed. We observed the same phenomenon of over-prediction for the TJ outbreak strains, separated by 24 SNPs (**Tables 2**, **3**, Supplemental Dataset 1); again no read data were available to us. We found the majority of predicted SNPs clustered in close proximity either in intergenic regions or within the boundaries of the same gene, indicative of low quality sequence regions. Intragenic SNPs were identical to those found in EC4115, except for two additional SNPs in the lac repressor and in rpoS (Supplemental Dataset 1). The CD outbreak set underwent both contig and read-based discovery, which again over-predicted SNPs for EC1734, for which no reads were available for quality control (**Figures 2**, **4B**, **Tables 2**, **3**). Moreover, the production lot isolate EC1738 was placed on a distant branch (clade 6.26), separated from all human isolates tightly clustered in clade 8.30 (**Figure 1**, **Table 2**, Supplemental Table 1). Hence, we consider this strain an outlier that is phylogenetically unrelated to the case isolates. Among the outbreak-specific SNPs we detected one synonymous (syn) and two nsyn SNPs in EC1736, but the syn SNP was also detected using EC4115 as a reference (**Table 2**, Supplemental Dataset 1). Archived strains were not available for this outbreak and we could therefore not confirm whether EC1736 truly carries these three SNPs, which would call its inclusion in the outbreak into question.

parsimonious trees with a consistency index of 0.998. Trees were recovered using a heuristic search in Paup 4.0b10 (Wilgenbusch and Swofford, 2003).

For three outbreaks (FF, FS, and BL) we identified only a single or no SNP when referenced to EC4115 (**Figure 2**, **Tables 2**, **3**). A single intergenic and three nsyn mutations were identified when using an FF outbreak-specific reference strain EC1856 (**Tables 2**, **3**). The three nsyn SNPs did not affect domain prediction in Pfam (Finn et al., 2016). B112 of the BL outbreak had a syn SNP in a tRNA-histidine ligase (#3460738) not found in any other E. coli O157:H7 genome deposited (nr or WGS). This SNP was identified in both instances when using EC4115 or an outbreak-specific reference (Supplemental Dataset 1). This SNP was confirmed using PCR amplicon sequencing. Using the alternative FS outbreak-specific reference one intergenic SNP in B105 and one nsyn SNP affecting the Nitrogen regulation protein NR(I) (ECH74115\_RS26390) (B107 and B105) were identified (**Tables 2**, **3**). SNP discovery predicted an outbreak-specific allele in three strains. However, these SNPs are false positives, as they could not be confirmed by PCR sequencing. The SNP (#693920) in FS strain B103 was identified as false-positive homoplastic SNP also observed in plate mate and intrahousehold strains with an allelic frequency below 0.9 (Supplemental Table 2). During SNP prediction we identified 52 SNPs in strain B103 that were not found in the other FS outbreak strains. These 52 SNPs were all located in a phage region that corresponds to the tandem integrated SP1/2 phages (Hayashi et al., 2001). The SNPs were all false-positives due to the presence of an additional phage in B103 related to a prophage from organism pro483 (NC\_028943) (Supplemental Figure 3). The tail fiber proteins of these two phages were sufficiently similar to misalign reads for B103. 
This exemplifies the importance of SNP curation and assessment according to the genomic region in which they originate, as independent horizontal acquisition of segments can introduce epidemiologically misleading SNPs (Pettengill et al., 2014), also known as epidemiological type 2 errors of attribution.
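The B103 example shows why predicted SNPs are screened against excluded regions (phages, IS elements, repeats, plasmids) before they enter the SNP panel. A minimal sketch of such a positional filter follows; the function names and all coordinates used with it are hypothetical, and the study's own curation involved additional steps such as read-based verification and PCR confirmation.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent closed intervals given as (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]


def curate(snp_positions, excluded_regions):
    """Split SNP positions into (kept, dropped) by overlap with excluded regions.

    Positions and region bounds share one coordinate system; bounds are inclusive.
    """
    merged = merge_intervals(excluded_regions)
    starts = [s for s, _ in merged]
    kept, dropped = [], []
    for pos in snp_positions:
        # Index of the last region starting at or before pos, if any.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= merged[i][1]:
            dropped.append(pos)
        else:
            kept.append(pos)
    return kept, dropped
```

Applied to a case like B103, all SNPs falling inside a prophage interval would land in `dropped` and would not contribute to the inferred distance between outbreak strains.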

The TP outbreak strains revealed a highly distinct SNP pattern compared to the genomic plasticity reported for other outbreaks (**Figures 1**, **5**, **Tables 2**, **3**). Two distinct phylogenetic clusters separated by 16 SNPs were observed. Additionally, each strain carried 4–17 strain-specific SNPs. Comparison to outbreak-specific reference EC1863 confirmed the relatively high number of strain-specific SNPs (**Tables 2**, **3**, **Figure 5**). In contrast to our observations for strain-specific SNPs in the outbreaks discussed above, these SNPs are neither concentrated in specific regions nor more frequent in intergenic than in genic regions (**Tables 2**, **3**). The EC1869/EC1870 branch contributes roughly 60% of all SNPs (Supplemental Dataset 1). Based on the established phylogenetic topology we hypothesize that two closely related but different E. coli O157:H7 strains contaminated a common vehicle, if, indeed, all cases had the same exposure. Two-thirds of the SNPs were strain-specific, denoting a particularly high diversity within this outbreak (**Figures 1**, **5**, **Tables 2**, **3**). Such a degree of genomic plasticity among epidemiologically linked strains has rarely been observed in E. coli O157:H7. Several scenarios could have led to this radial expansion: (i) the epidemiology linked cases together that actually were from different simultaneous outbreaks, (ii) the SNPs identified in silico are false positives and only PCR confirmation could establish the true distance among the strains, (iii) the high rate of accumulated SNPs could be caused by a mutator genotype resulting in the accumulation of mutations in a short time span, (iv) the heterogeneity could be related to the protracted duration of the outbreak (3 months), vs. single, brief, single-source exposures as in the FS outbreak, or (v) the heterogeneity could be caused by increased strain mutation rates during outbreaks, as has been discussed for other enterics (Morelli et al., 2010). In support of our findings, Dallman et al. noted correlations between the length of the strain collection intervals and respective numbers of SNPs observed (Dallman et al., 2015).

#### TABLE 2 | Comparison of common vs. outbreak-specific reference genic SNPs.

*<sup>a</sup>This study/short reads in NCBI* = *A, WGS* = *B, assembled genomes in NCBI* = *C.*

*<sup>a</sup>This study/short reads in NCBI* = *A, WGS* = *B, assembled genomes in NCBI* = *C. <sup>b</sup>F* = *mixed analysis with reads missing for some strains.*

The clonal nature of E. coli O157:H7 outbreaks was confirmed in the majority of the outbreak strains analyzed here, consistent with prior findings from SNP typing in other O157:H7 outbreaks (Turabelidze et al., 2013; Dallman et al., 2015; Holmes et al., 2015; Jenkins et al., 2015; Munns et al., 2016). We found the number of SNPs to be inversely proportional to the availability of reads. This highlights the critical importance of quality control for accurate SNP discovery by accounting for both underlying sequence quality and evolutionary context of the SNP carrying loci to curate for false-positives. In this regard, the relevance of excluding mobile regions when inferring outbreak relatedness is evidenced in the loss of at least two thirds of predicted SNPs that if considered would impair phylogenetic accuracy.

### WGS Typing of Plate Mates Recovered from Human Infections

In clinical practice, typically a single colony is retrieved from a primary isolation plate and sent for further molecular analysis. It is therefore not clear how much genotypic diversity exists among infecting isolates of E. coli O157:H7 as shed from the same individual in a single stool. To answer this question, plate-mates (pairs of colonies) were separately saved from five patients (**Figure 2**, Supplemental Table 1) enrolled in a multistate study of E. coli O157:H7 infections (Wong et al., 2012). In the EC4115 reference-based discovery, two PM possessed the same homoplastic intergenic SNP (**Figure 2**, **Tables 4**, **5**), which was not confirmed after allelic verification. When using an internal reference these strains were indistinguishable. The results are in accordance with those of Dallman et al., who reported 0–2 SNPs among same patient isolates, with most (70%) having no SNP differences at all (Dallman et al., 2015). Our results from this limited study, therefore, point toward infection with a single E. coli O157:H7 clone as the underlying cause for the majority of infections. We previously reported that a single laboratory passage can produce SNPs in E. coli O157:H7, but SNPs arise only rarely (Eppinger et al., 2011b). Our data indicate that, in the course of naturally acquired human infections, SNPs in E. coli O157:H7 are exceptionally rare events.

#### TABLE 4 | Comparison of common vs. PM/IH-specific genic SNPs.

*<sup>a</sup>This study/short reads in NCBI* = *A, WGS* = *B, assembled genomes in NCBI* = *C.*

#### WGS Typing of Strains from Intrahousehold Infections

To determine if genomic changes in infecting E. coli O157:H7 occur during probable intrahousehold (IH) transmission, we analyzed a cohort of six pairs of isolates from IH infections where onset was quite delayed between cases (**Figure 2**). As with the PM pairs, EC4115-based SNP discovery resulted only in false positive homoplastic intergenic SNPs (**Figure 2**, **Tables 3**, **4**) that were absent in the pair-wise analysis. Dallman et al. observed similar SNP distributions in household transmission cases in the UK, with 40% having no such differences in the core genome (Dallman et al., 2015). Interestingly, two IH cases of clade type 3.15 clustered together (**Figure 2**). A single syn SNP was specific to the B83/B84 cluster. These cases were all from the same state and occurred in the same year, but epidemiological investigations suggest they are separate cases of IH transmission, with over 6 weeks between occurrences and 80 miles between the zip codes in which the cases resided. This application of WGS typing analysis can genomically link clusters that were not previously identified epidemiologically (Dallman et al., 2015).

#### TABLE 5 | Comparison of common vs. PM/IH-specific intergenic SNPs.

*<sup>a</sup>This study/short reads in NCBI* = *A, WGS* = *B, assembled genomes in NCBI* = *C. <sup>b</sup>False positive* = *FP.*

In general the frequencies of SNPs in intergenic and genic regions were similar, highlighting the random nature of the SNPs identified. While there is clearly no universally applicable gold standard or criterion for outbreak exclusion or inclusion with regard to SNP matrix distances, we note that a number of outbreak investigations have found between four and seven SNPs among strains with putative epidemiological links (Underwood et al., 2013; Joensen et al., 2014; Dallman et al., 2015; Holmes et al., 2015). However, these analyses are limited in that they included only the genic portions of the genomes and/or did not use an outbreak-specific reference for SNP discovery. This prevents identification of variations in parts of the core genome that are unique to outbreak-associated strains and not necessarily present in a distantly related closed reference strain. Moreover, only a few studies use confirmatory PCR or other resequencing to validate in silico delineated SNPs (Eppinger et al., 2011b; Underwood et al., 2013).
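The SNP matrix distances discussed above are pairwise counts of differing alleles across a shared SNP panel. The following sketch shows one way to compute them; the input layout (one allele string per strain, ordered identically across strains), the handling of `N` as a missing call, and the strain labels used with it are assumptions for illustration rather than the format used in the study.

```python
from itertools import combinations


def pairwise_snp_distances(alleles):
    """Count differing SNP alleles for every pair of strains.

    `alleles` maps strain name -> string of allele calls over one ordered SNP
    panel. Positions where either strain has 'N' (no call) are skipped.
    """
    lengths = {len(calls) for calls in alleles.values()}
    if len(lengths) > 1:
        raise ValueError("all strains must be typed on the same SNP panel")
    distances = {}
    for a, b in combinations(sorted(alleles), 2):
        distances[(a, b)] = sum(
            1
            for x, y in zip(alleles[a], alleles[b])
            if x != y and x != "N" and y != "N"
        )
    return distances
```

Any threshold applied to such a matrix, such as the four to seven SNPs reported elsewhere, depends on which regions and which reference were used for discovery, which is the limitation the paragraph above raises.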

#### Phage Profiles of Clinical U.S. Strains

The abundance of lambdoid phages in the EHEC O157:H7 genome hinders assembly of phage regions based on short reads alone (Eppinger et al., 2011b). Contig breaks often occur within the phage borders due to the conserved nature of structural and replication proteins and hinder individual phage-level comparisons in the fragmented phage assemblies. Therefore, we applied an alternative genome-scale strategy to comprehensively analyze stx allele status and losses or gains in the strain's phage ORF-omes.

Major virulence traits of E. coli O157:H7 are encoded on members of the mobilome that are usually stably integrated into the chromosome, such as the locus of enterocyte effacement (LEE) and stx-converting phages (Nataro and Kaper, 1998). Phages are key components of pathogenome evolution and their acquisitions are important events in the emergence of E. coli O157:H7 from an ancestral cell closely related to E. coli O55:H7 (Feng et al., 1998, 2007; Zhou et al., 2010). Moreover, analyses such as SNP typing that are limited to the core genome cannot provide information about the conferred pathogenic potential anchored in the mobilome. Our analysis of the 2006 SP outbreak exemplifies genomic heterogeneity that can be found in a single outbreak of O157:H7 in regards to mobilome (Eppinger et al., 2011b). Within the prophage pool (Hayashi et al., 2001) the stx-converting bacteriophages are of particular interest, as they encode a potent cytotoxin, Shiga toxin or Stx (Karmali et al., 2010) as direct mediator of EHEC O157:H7 disease (Krüger and Lucchesi, 2015). In E. coli O157:H7 the chromosomal backbone is highly conserved and genomic alterations chiefly relate to phage complement, plasticity, and respective integration sites (Shaikh and Tarr, 2003; Abu-Ali et al., 2009; Eppinger et al., 2011b, 2013; Smith et al., 2012; Yin et al., 2015). Three stx alleles, stx1a, stx2a, and stx2c, are found predominantly in this lineage (Scheutz et al., 2012). We used discontiguous megablast against the VirulenceFinder database to determine the toxin subtypes present in each outbreak (Joensen et al., 2014). All IH, PM, and outbreak strains carry the more potent allelic variant stx2a (Supplemental Table 1). In addition, all FS and TJ outbreak strains, PMs B40-1/2 and B26-1/2 and two separate IH transmission cases (B83/B84, B85/B86) carry an stx1-converting phage. 
Co-carriage of Stx2 and Stx1 can reduce Stx2a production (Serra-Moreno et al., 2008) and also attenuates end-organ toxicity of Stx2a (Donohue-Rolfe et al., 2000; Russo et al., 2016). Notably, the 2006 SP outbreak, which has been associated with hypervirulence (Kulasekara et al., 2009; Abu-Ali et al., 2010), features the Stx2a/2c toxin type, and an almost complete stx1-converting phage occupies yehV. However, this atypical phage lacks stx1 genes (Eppinger et al., 2011b). We also note that the TJ lettuce isolate TW14588 harbors two stx2a-converting phages integrated at argW and wrbA (Supplemental Table 1). We speculate that double stx2a-converting phage occupancy might also increase pathogenic potential, such as through phage dosage effects, particularly because stx2a is the most potent allelic subtype (Tesh et al., 1993; Tesh, 2010; Fogg et al., 2012). This information cannot be gathered by PCR-based Stx-subtyping (Scheutz et al., 2012), as that approach does not determine copy number, which highlights the increased resolution that WGS provides with regard to the pathogenic potential of an outbreak (Holmes et al., 2015). All other outbreaks except BL possess stx2c-converting bacteriophages. The interplay between these two stx2-converting phage types is not known, although both variants have been linked to HUS (Friedrich et al., 2002; Persson et al., 2007). We observed only two variations in the stx2 allelic profiles in the CD and TP outbreaks (Supplemental Table 1). The non-clinical CD outlier EC1738 collected in the production plant is distinguished by lack of the stx2a allele. TP strain EC1870 lacks the stx2c allele that is present in all other TP outbreak strains (Supplemental Table 1).
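The point about copy number can be illustrated with a minimal sketch that tallies distinct stx loci per strain from sequence-search hits. This is not the pipeline used in the study: the hit tuple layout, the function name `stx_profile`, and all coordinates are hypothetical, and overlapping hits on the same contig are assumed to represent a single locus.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical hit record: (strain, stx subtype, contig, start, end)
Hit = Tuple[str, str, str, int, int]


def stx_profile(hits: Iterable[Hit]) -> Dict[str, Dict[str, int]]:
    """Count distinct (non-overlapping) loci per stx subtype for each strain."""
    loci: Dict[Tuple[str, str, str], List[Tuple[int, int]]] = defaultdict(list)
    for strain, subtype, contig, start, end in hits:
        # Normalize coordinates so that start <= end regardless of strand.
        loci[(strain, subtype, contig)].append((min(start, end), max(start, end)))

    profile: Dict[str, Dict[str, int]] = defaultdict(dict)
    for (strain, subtype, _contig), spans in loci.items():
        count, last_end = 0, -1
        for start, end in sorted(spans):
            if start > last_end:  # hit does not overlap the previous locus
                count += 1
            last_end = max(last_end, end)
        profile[strain][subtype] = profile[strain].get(subtype, 0) + count
    return {strain: dict(subtypes) for strain, subtypes in profile.items()}


# Illustrative input only; contigs and coordinates are invented.
example_hits = [
    ("strainX", "stx2a", "contig1", 100, 1300),
    ("strainX", "stx2a", "contig1", 150, 1250),  # overlaps the hit above
    ("strainX", "stx2a", "contig7", 5000, 6200),
    ("strainX", "stx1a", "contig3", 10, 1200),
]
print(stx_profile(example_hits))
```

In fragmented short-read assemblies, identical toxin gene copies can collapse into a single contig, so a count obtained this way is best read as a lower bound.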

On average, the phage complement in the strains represents 14% of predicted coding regions in the genomes, in accordance with other studies (Asadulghani et al., 2009; Smith et al., 2012). As expected when considering the close relation between outbreak strains (**Figure 2**), the variability in phage-borne ORFs was low (5%) (Supplemental Figure 1; Supplemental Table 3). This small variability represents the noise caused by clustering of related proteins into centroids in the LS-BSR analysis rather than differences in phage regions. The TB-associated strains had more variability (11.2%) (Supplemental Table 3), due to inclusion of the temporal outliers EC4436, EC4437, and EC4439 (**Figure 6**). Variome analysis highlighted phage regions that were unique to each temporal outlier group, resulting in the same clustering as in the SNP-based analysis (**Figures 2**, **6**). The CD outbreak also had greater variability (16.5%) despite the exclusion of outlier EC1738 (Supplemental Table 3), likely attributable to differences in fragmentation of the analyzed genomes. The quality of PHAST predictions correlated with contig size and reduced genome fragmentation. Compared to more fragmented genomes, closed genomes and genomes with larger contigs had up to 20% more predicted phage regions, which also served to increase the noise (Supplemental Table 1).
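As a rough illustration of how variability in phage-borne ORFs can be expressed, the sketch below computes a BLAST score ratio (BSR) and the fraction of ORF centroids that are present in some genomes but absent from others. It is a simplified stand-in for the LS-BSR analysis, not a reimplementation: the presence and absence cutoffs (0.8 and 0.4) and the example matrix are assumptions chosen for illustration.

```python
from typing import Dict, List


def blast_score_ratio(query_bits: float, self_bits: float) -> float:
    """BSR: bit score of an ORF against a genome divided by its self-hit bit score."""
    if self_bits <= 0:
        raise ValueError("self bit score must be positive")
    return min(query_bits / self_bits, 1.0)


def variable_fraction(
    bsr: Dict[str, List[float]],
    present: float = 0.8,  # assumed cutoff for "present"
    absent: float = 0.4,   # assumed cutoff for "absent"
) -> float:
    """Fraction of centroids present in at least one genome and absent from another."""
    if not bsr:
        return 0.0
    variable = sum(
        1
        for values in bsr.values()
        if any(v >= present for v in values) and any(v < absent for v in values)
    )
    return variable / len(bsr)


# Hypothetical BSR matrix: one row per ORF centroid, one column per genome.
example = {
    "orf1": [1.00, 0.98, 0.95],
    "orf2": [1.00, 0.10, 0.97],
    "orf3": [0.90, 0.85, 0.82],
    "orf4": [1.00, 1.00, 0.00],
}
print(f"variable phage-borne ORFs: {variable_fraction(example):.1%}")
```

Intermediate BSR values, which often arise when related proteins are clustered into one centroid, fall between the two cutoffs and are not counted as variable here; how such values are treated directly affects the amount of noise reported.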

The identified phage complements of the FS isolates were highly similar. However, we found phage sequences unique to strain B103 (Supplemental Figure 2A). Discontiguous megablast of the phage region (Buhler, 2001; Ma et al., 2002) against closed bacteriophage genomes identified Escherichia phage pro483 (KR073661), originally isolated from the avian pathogenic E. coli DE048. This prophage was previously described in SP strains (Eppinger et al., 2011b) and in the supershedder strain SS17 (Cote et al., 2015). Unlike the yegQ insertion in SP outbreak strains (Eppinger et al., 2011b), this phage disrupts the colicin immunity protein (WP\_001303895) in strain B103. Using phage pro483 (KR073661) as a genomic anchor for the B103 draft contigs, we recovered 86% of the phage genome with 97% identity. We further identified a 12 bp repeat (ACCAATAACTGA) at both phage borders, indicative of the phage integration mechanism (Campbell, 1992). The SP outbreak strain EC4115, however, features an 18 bp repeat (Eppinger et al., 2011b). The genomic architecture is syntenic and largely conserved throughout the phage genome, except for an insertion or deletion involving an exonuclease (ECH74115\_RS15445) and a hypothetical protein (ECH74115\_RS15450) that are present only in EC4115 (Supplemental Figure 3).
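A direct repeat flanking an integrated prophage, such as the 12 bp repeat described above, can be searched for with a simple exact-match scan. The following is a minimal sketch under stated assumptions: the function names, the 30 bp search window, and the 8 bp minimum length are illustrative choices, and only exact repeats are detected.

```python
from typing import Optional


def longest_shared_repeat(left: str, right: str, min_len: int = 8) -> Optional[str]:
    """Return the longest exact substring shared by two flanking windows, if any."""
    left, right = left.upper(), right.upper()
    # Try the longest possible length first so the first match is the longest one.
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        kmers = {left[i:i + k] for i in range(len(left) - k + 1)}
        for j in range(len(right) - k + 1):
            if right[j:j + k] in kmers:
                return right[j:j + k]
    return None


def flanking_repeat(
    genome: str, start: int, end: int, window: int = 30, min_len: int = 8
) -> Optional[str]:
    """Search windows centred on the predicted prophage borders.

    `start` and `end` are 0-based coordinates of the predicted prophage
    (end exclusive); the prophage is assumed to be longer than 2 * window.
    """
    left = genome[max(0, start - window):start + window]
    right = genome[max(0, end - window):end + window]
    return longest_shared_repeat(left, right, min_len)
```

Because predicted prophage borders from draft assemblies are approximate, the windows extend to both sides of each border; a wider window tolerates less precise predictions at the cost of more chance matches.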

Acquisition or loss of phages secondary to recombination events during the course of an outbreak creates interstrain plasticity. Thus, analysis of a single archetypical outbreak strain might underestimate the mobilome and core chromosome plasticity (Eppinger and Cebula, 2015). Comprehensive analyses did not reveal significant differences in phage content of the BL and FF outbreak clusters (**Figure 2**) that would further distinguish these clonal outbreaks, which feature only one and four SNPs per outbreak cluster, respectively (Supplemental Figures 2B,C). In contrast, the TP outbreak strains displayed a higher degree of mobilome plasticity (Supplemental Figure 2D), in line with the higher number of predicted SNPs (98) for this outbreak (**Figure 2**, Supplemental Dataset 1).

#### Plasmid Prevalence in Clinical *E. coli* O157:H7 Strains

The E. coli O157:H7 lineage is distinguished from other serotypes by the presence of the large virulence plasmid pO157 (Burland et al., 1998). For this serotype, additional plasmids have been occasionally characterized at sequence level (Makino et al., 1998; Eppinger et al., 2011b) or by plasmid profiling (Ostroff et al., 1989; Meng et al., 1995). To facilitate plasmid discovery and survey, we reassembled the genomes with SPAdes (Bankevich et al., 2012). Even though deposited genomes from 454 and Illumina Celera hybrid assemblies (Denisov et al., 2008) had fewer contigs compared to SPAdes assemblies from Illumina reads only (Supplemental Table 1), reassembly typically produced longer contigs, in particular for plasmid-originating regions. When only Illumina reads were processed, the SPAdes assemblies clearly outperformed NCBI deposited Velvet assemblies with regard to sensitivity for plasmid prediction (Supplemental Table 1). We queried plasmid sequences against the NCBI nr plasmid database using discontiguous megablast (Buhler, 2001; Ma et al., 2002).
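Coverage and identity values for a plasmid match can be derived from tabular BLAST output. The sketch below assumes the default 12-column tabular format of BLAST+ (`-outfmt 6`) produced by a discontiguous megablast run (`blastn -task dc-megablast`); the file and database names in the comment, the function names, and the way identity is averaged are illustrative assumptions rather than the settings used in this study.

```python
from typing import Iterable, List, Tuple

# Command sketch (file and database names are placeholders):
#   blastn -task dc-megablast -query contigs.fasta -db plasmid_db -outfmt 6 > hits.tsv


def merged_length(intervals: Iterable[Tuple[int, int]]) -> int:
    """Number of positions covered by 1-based inclusive intervals, overlaps counted once."""
    total, current_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, current_end + 1)  # skip positions already counted
        if end >= start:
            total += end - start + 1
            current_end = end
    return total


def summarize_hits(lines: Iterable[str], query_len: int, subject: str) -> Tuple[float, float]:
    """Return (query coverage %, length-weighted identity %) for one subject sequence."""
    intervals: List[Tuple[int, int]] = []
    identical, aligned = 0.0, 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12 or fields[1] != subject:
            continue
        pident, length = float(fields[2]), int(fields[3])
        qstart, qend = sorted((int(fields[6]), int(fields[7])))
        intervals.append((qstart, qend))
        identical += pident * length / 100.0
        aligned += length
    if aligned == 0:
        return 0.0, 0.0
    coverage = 100.0 * merged_length(intervals) / query_len
    identity = 100.0 * identical / aligned
    return coverage, identity
```

Merging overlapping query intervals before computing coverage avoids counting the same region twice when a plasmid match is split across several high-scoring segment pairs.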

Using this approach, we discovered five plasmids at the sequence level (p78, p34, p55, p63, and p39) that have not been previously described in deposited E. coli O157:H7 genomes. We also found a homolog of the 37 kb conjugal transfer plasmid pEC4115, referred to as p36, which was originally described in the SP outbreak strains (Eppinger et al., 2011b). We found the TB and TP outbreaks to be most diverse with regard to plasmid carriage (**Figure 2**). The TB-associated strains contained three distinct plasmid profiles, which correlated with the clustering from the core genome SNP discovery (**Figure 2**).

Plasmid p78, the largest plasmid, shows homology to the conjugative IncI1 group E. coli plasmid pC49-108 and Salmonella enterica plasmids (Fricke et al., 2011; Kröger et al., 2012; Wang et al., 2014a). p78 varied in length, from 78 to 88 kb in clade 8 strains (**Figure 7**). The related plasmid pC49-108 carries multiple antibiotic resistance genes, including a beta-lactamase (blaCTX-M-1), dihydrofolate reductase (dfrA17), and aminoglycoside adenylyltransferase (aadA5), the latter two both found adjacent to a class 1 integron (Wang et al., 2014a). Similar to blaCTX-M-1, which is located next to a mobile element (ISEcp1), we found another class C beta-lactamase gene in S. enterica CVM 22462, again located next to a mobile transposase locus. We speculate that colocalization with mobile elements might affect locus stability and explain the scattered prevalence of these resistances in the plasmid homologs (Wang et al., 2014b) (**Figure 7**).

Resistance to antibiotics has been observed in E. coli O157:H7, but the genetic basis remains largely unknown (Meng et al., 1998). We previously linked multidrug resistance (MDR), a rare occurrence in E. coli O157:H7, to phage-borne antibiotic resistance loci (Eppinger et al., 2011a). Strain EC4402, part of the core TB outbreak cluster, was identified as an MDR isolate (**Figure 2**). This strain displays elevated MICs for several cephalosporins and aminoglycosides, sulfisoxazole, and nalidixic acid (quinolone). However, our in silico analysis with ResFinder (Kleinheinz et al., 2014) did not reveal any potential underlying resistance loci. Here we note that resistance phenotypes can be conferred by loci not previously linked to antibiotic resistance (Gibson et al., 2016). We speculate that the resistance loci might have been either lost from the original p78, or were an integral part of other MDR plasmids lost during laboratory cultivation prior to the sequencing of EC4402. Alternatively, the antibiotic resistance might be conferred by as yet unknown loci not represented in the queried resistance databases.

Plasmid p36 was highly homologous to other conjugal transfer plasmids, such as S. enterica plasmid pCFSAN000111\_01 (NZ\_CP007599) (Timme et al., 2012) and pEC4115 (Eppinger et al., 2011b) (Supplemental Figure 4). While p78 was found solely in clade 8, p36 seems to be more widespread (Cote et al., 2015) and is present in non-O157:H7 E. coli serotypes.

Co-carriage of a p78-p36 combination was also found in clade 8 strains K4405 and K4406 (**Figure 7**, Supplemental Figure 4). The TB outlier strain EC4439 lacks both p78 and p36, but carries a p55 plasmid with high homology to Klebsiella pneumoniae pDMC1097-77.775 kb (87% coverage, 99% identity) (Wright et al., 2014) (Supplemental Figure 5). This IncI2 group plasmid carries multiple resistances, which are absent in the E. coli plasmid homologs (Supplemental Figure 5). Interestingly, this plasmid is also present in IH strain B86, but absent from strain B85, either because of independent acquisition by B86 or secondary loss in B85 (**Figure 2**). Our findings on plasmid prevalence are in accordance with those of Dallman et al., who showed that epidemiologically linked strains can vary largely in their plasmid inventory (Holmes et al., 2015).

The IH strains B89 and B90 harbor p34 (Supplemental Figure 6), which is related to E. coli pVR50, an F-like conjugative MDR plasmid (Beatson et al., 2015). While the overall plasmid backbone is conserved, p34 lacks any resistance loci (Supplemental Figure 6). The TP strains also possess strain-specific plasmids: p36 in EC1870, p63 in EC1863, and p39 in EC1868 (**Figure 2**). Plasmid p63 has partial homology to pO26-Vir, an IncK group plasmid that is a mosaic of multiple plasmids (Fratamico et al., 2011). In EC1863 (p63) we found homologous loci for conjugal transfer and type IV pili (Supplemental Figure 7), which have been implicated in cell adherence and biofilm formation (Dudley et al., 2006); notably, these phenotypes are strain-dependent in E. coli O157:H7 (Vogeleer et al., 2014). A 39 kb plasmid fragment in EC1868 (p39) was found to be homologous to an 87 kb IncFII plasmid from E. coli (pGUE-NDM) (Bonnin et al., 2012) (**Figure 2**).

The observed variability in plasmid type and prevalence in the individual strains clearly highlights the genomic plasticity that exists even among closely related isolates of the same origin and that can be utilized for strain attribution (Eppinger et al., 2011b). The identified heterogeneity among the mobilomes of outbreak strains stresses the importance of studying a number of isolates from the same outbreak instead of using archetypal outbreak strains, which, as shown, might not fully reflect the plasticity in the outbreak population (Eppinger and Cebula, 2015). Interestingly, all the above described E. coli O157:H7 plasmids lack antibiotic resistance loci, even though our plasmid survey found widespread resistances among homologous plasmids in other serotypes and species.

### CONCLUSION

While some of these clinical isolates have been studied previously using molecular epidemiology techniques (Samadpour et al., 2002), we have for the first time applied whole genomics epidemiology approaches (Eppinger et al., 2011b). Through these high resolution methods we established a detailed understanding of the genomic heterogeneity found among the studied E. coli O157:H7 outbreak populations from the U.S. The gathered phylogenomic data were critical to define the genetic relatedness of individual strains in the context of outbreak etiology and phylogenetic positions in the broader model of E. coli O157:H7 evolution and epidemiology. In this study, we detected previously unnoted polymorphic genome features in the core and mobile genome, such as an array of new plasmids not previously associated with this lineage. The cataloged polymorphic signatures aided in strain attribution and allowed us to precisely define the outbreak boundaries. This allowed us to discern the distinct phylogenetic boundaries of the studied EHEC strains when placed into a larger phylogenomic framework of E. coli O157:H7 from North America (**Figure 2**) assessing both core and mobilome (**Figures 6**, **7**) (Eppinger and Cebula, 2015). The developed WGS typing approach provided us with the necessary resolution power to study the individual dynamics in highly clonal outbreaks (Morelli et al., 2010; Eppinger et al., 2011b, 2014; Hasan et al., 2012; Berenger et al., 2015; Holmes et al., 2015; Jenkins et al., 2015).

FIGURE 7 | Alignment of plasmid p78 and homologs. Plasmid architecture and gene inventories were compared by tblastx, and respective annotations were mapped in Geneious vR9 (CDS, green). The plasmid architecture was highly conserved with a high identity level throughout the entire length of the plasmid [100% identity yellow (inverted fragment, orange), 36% identity blue (inverted fragment light blue)]. Plasmid p78 homologs were widespread, such as in TB outbreak associated strains, B28 plate mates, other *E. coli* O157:H7, and *S. enterica*. We found a locus for a RelB/E tox/antitoxin system present in all plasmids, with the exception of PM strains B28 (blue box). Resistance loci are highlighted in red boxes.

#### TABLE 6 | Proposed criteria and practices for SNP-based epidemiological outbreak inclusion or exclusion.

While the majority of outbreaks were caused by pathogens that form tight clonal clusters, one outbreak ("Totino's Pizza") was associated with isolates showing considerable genomic heterogeneity (**Figures 2**, **5**). Apparent SNPs in other outbreaks are associated with a paucity of reads for quality control, falsely increasing the diversity among the outbreak isolates. Since outbreaks can have high economic impacts, such as nationwide recalls of contaminated product, multiple samples from the same outbreak should be concomitantly sequenced instead of using archetypal outbreak strains to provide strong evidence for inclusion or exclusion, strain and source attribution (Eppinger et al., 2010, 2014; Morelli et al., 2010; Hasan et al., 2012). Additionally, these high resolution approaches allow for the discovery of emerging pathotypes, and, potentially, to better assess the pathogenic potential of individual bacterial clones (Berenger et al., 2015; Klemm and Dougan, 2016). Expanding these sequence-based analyses to the publicly available EHEC sequence pool will improve public health response in the event of an outbreak allowing timely and informed countermeasures. Canonical SNPs can be implemented in efficient typing assays offering robust phylogenetic signals for outbreak exclusion/inclusion that surpass classical technologies (Riordan et al., 2008; Elhadidy et al., 2015; Rusconi and Eppinger, 2016).

Our study strongly endorses that quality of SNPs and choice of an appropriate reference strain in WGST approaches are equally critical to achieve phylogenetic resolution and accuracy (**Table 6**). Here we also demonstrate that in order to avoid type 2 error of attribution, the quality of SNP data obtained from WGS approaches is crucial (**Table 6**). For read-based discovery approaches we would like to emphasize the importance of SRA data availability, which is not only foundational to determine coverage and quality of detected SNP positions, but also to optimize assembly quality should assemblers with improved algorithms become available (Supplemental Table 1). SNP discovery with an appropriate outbreak-specific reference strain is critical for reference based WGS typing. To fully assess the genomic plasticity, the reference should be phylogenetically related and not too distant to the strains of interest, as evidenced by the resolution power gained using a within outbreak reference (**Figures 2**, **4**, **5**). By extending our analysis to the mobilome, we detected plasticity among clonal strains in phage and plasmid content describing novel plasmids not previously associated with E. coli O157:H7. We also would like to stress the importance of publicly available strain associated clinical, environmental, and epidemiological metadata concomitantly to the genomic data as prerequisite for informed source attribution (**Table 6**) (Eppinger and Cebula, 2015). We anticipate that NGS long-read technology, such as contemporary SMRT technology (English et al., 2012), or other platforms under development (Feng et al., 2015; Rhoads and Au, 2015) will tremendously benefit WGS typing strategies as it pertains to the highly homogenous E. coli O157:H7 lineage (Zhang et al., 2006, 2007; Eppinger et al., 2011b). 
In particular, long-read technologies will produce (near) closed genomes and thus allow accurate determination of the stx-virulence status by defining not only stx allele type, but also stx-converting phage combination, plasticity, and location, all factors that have been associated with alterations in Stx-production as direct mediator of EHEC disease (Ogura et al., 2015; Toro et al., 2015; Yin et al., 2015).

Our data provide insight into the maximum number of SNPs by which two strains can differ and still be designated as being of the same origin. In prior work, we found no SNPs between 24 isolates of the same point-source cluster, focusing on backbone ORFs (Turabelidze et al., 2013). Dallman et al. and others tolerated up to 4 SNPs in the core genome before assigning two isolates to different sources (Underwood et al., 2013; Joensen et al., 2014; Dallman et al., 2015; Holmes et al., 2015). We found one bona fide SNP in the course of a single point-source, short-term outbreak. Since no gold standards have yet been accepted for E. coli O157:H7 WGS typing, we propose the following criteria (**Table 6**) for inclusion (presumably of same source) vs. exclusion (presumably of different source) of investigated isolates: (i) high-quality whole genome sequence fortified with extensive epidemiological outbreak data, (ii) genome-scale SNP discovery based on high quality sequencing with reference, (iii) exclusion of mobilome and repeats (to reduce epidemiological noise), followed by (iv) PCR-confirmation of detected SNPs for definitive in-/exclusion, and (v) mobilome discovery, which can significantly contribute to the genomic plasticity. Moreover, for cases that are quite dispersed in time and space, there should be greater stringency in assigning "like" status to two strains that are differentiated by even a single SNP. When outbreaks occur, there are often large product liability issues at stake and a considerable obligation on disease control authorities to identify such clusters, and molecular typing serves an increasingly important role. Therefore, diligence should be exercised in the choice of sequence-based typing protocols and in their analysis.
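The SNP-based part of these criteria can be sketched as a pairwise comparison of aligned core genome sequences with mobilome and repeat positions masked. This is a simplified illustration rather than the discovery pipeline used here: the function names and strain labels are hypothetical, and the default cutoff of 4 SNPs mirrors the tolerance cited above from other studies, not a validated standard.

```python
from itertools import combinations
from typing import AbstractSet, Dict, Tuple


def snp_distance(a: str, b: str, masked: AbstractSet[int] = frozenset()) -> int:
    """Count differing positions between two aligned sequences.

    Positions listed in `masked` (e.g., mobilome or repeat regions) and
    positions with a gap or ambiguous base in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for i, (x, y) in enumerate(zip(a.upper(), b.upper()))
        if i not in masked and x != y and x not in skip and y not in skip
    )


def classify_pairs(
    core: Dict[str, str],
    masked: AbstractSet[int] = frozenset(),
    max_snps: int = 4,  # assumed cutoff; see the tolerance cited in the text
) -> Dict[Tuple[str, str], Tuple[int, str]]:
    """Label each strain pair as a candidate for outbreak inclusion or exclusion."""
    result = {}
    for s1, s2 in combinations(sorted(core), 2):
        d = snp_distance(core[s1], core[s2], masked)
        result[(s1, s2)] = (d, "include" if d <= max_snps else "exclude")
    return result
```

The labels are candidates only; under the proposed criteria, detected SNPs would still require PCR confirmation, and the call should be weighed against epidemiological metadata, with greater stringency for cases dispersed in time and space.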

Finally, while we eagerly anticipate the introduction of sequence-based pathogen typing as a public health and disease prevention tool (Sadiq et al., 2014; Eppinger and Cebula, 2015), we share the concern of Osterholm (2015), who stresses that this powerful technology be employed as an adjunct to, and not a replacement for, case interviewing (descriptive epidemiology) and environmental investigations. Also, we are entering an era of non-culture diagnosis of enteric infections, including those caused by E. coli O157:H7 (Schatz and Phillippy, 2012; Klemm and Dougan, 2016). The high resolution data presented in this article would not have been possible without classic diagnostic microbiology laboratory recovery of the pathogen of interest. We hope that resources will be devoted to recovering these agents from submitted specimens, so as to complement case investigation by local health jurisdictions.

## ACCESSION NUMBER

The sequence data sets analyzed in this study have been retrieved from the short read archive (SRA) and whole genome shotgun repository at NCBI. Accession numbers for the genomes are provided in Supplemental Table 1.

### AUTHOR CONTRIBUTIONS

Conceived and designed the experiments: ME. Analyzed the data: BR, FS, SK, MM, PT, ME. Contributed reagents/materials/analysis tools: MM, PT, ME. Wrote the paper: BR, SK, MM, PT, ME.

## FUNDING

The study was supported by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services under contract SC2AI120941, the US Department of Homeland Security under contract 2014-ST-062-000058, the Department of Biology, the South Texas Center for Emerging Infectious Diseases (STCEID) at the University of Texas at San Antonio, and the High Performance Computing Center (HPC) under contract 2G12RR013646-12. Strain collection and archiving was funded by R01DK52081 and P30DK052574 for the Biobank Core. BR was supported in part by the Swiss National Science Foundation Early Postdoc Mobility Fellowship (P2LAP3-151770). FS was supported in part by the South Texas Center for Emerging Infectious Diseases (STCEID) and through a UTSA Teaching Fellowship (UTF).

## ACKNOWLEDGMENTS

This work received computational support from Computational System Biology Core, funded by the National Institute on Minority Health and Health Disparities (G12MD007591) from the National Institutes of Health. We would like to thank Armando L. Rodriguez for technical support and maintenance of the Galaxy platform and Dr. Anna Allue-Guardia and Heidi Gildersleeve for critically reading the manuscript. Further, we are grateful to Dr. Nurmohammad Shaikh for assistance with SNP PCR confirmation. We are also grateful to the diagnostic microbiologists, whose diligence in identifying the isolates on the original agar plates was necessary to the subsequent sequences that underlie our analysis. We are further indebted to the various outbreak investigation teams that identified the outbreaks described in this manuscript.

### SUPPLEMENTARY MATERIAL

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fmicb.2016.00985

## REFERENCES


relatedness between strains of Escherichia coli O157:H7. J. Clin. Microbiol. 41, 1843–1849. doi: 10.1128/JCM.41.5.1843-1849.2003


platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649. doi: 10.1093/bioinformatics/bts199


with swimming in Battle Ground Lake, Vancouver, Washington. J. Environ. Health 64, 16–20. 26, 25.


syndrome outbreak in China. PLoS ONE 7:e36144. doi: 10.1371/journal.pone.0036144


**Conflict of Interest Statement:** The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2016 Rusconi, Sanjar, Koenig, Mammel, Tarr and Eppinger. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.