Introduction
The gut microbiota is a vast and diverse ensemble of bacteria and other microorganisms that work together to help digestion, produce vitamins, fatty acids, amino acids and other bioactive compounds, and participate in the regulation of our immune, metabolic, and neurological systems (Shapiro et al., 2014; Boulangé et al., 2016). The understanding of our microbiota, together with the determination of its composition when contrasting healthy vs. diseased states allows the identification of microorganism disturbances that are possibly related to disease development and, therefore, offers a new approach for diagnosis as well as preventive and therapeutic interventions.
Specific dietary components, tobacco and alcohol consumption, which have been linked to the development of a number of pathological states (such as obesity, allergy, diabetes, Crohn's disease, irritable colon syndrome, and cancer) are known to drive microbiome alterations and lead to dysbiosis (Turnbaugh et al., 2009; Leclercq et al., 2014; Allais et al., 2016). The direct action of these elements or of the dysbiosis they cause, appears to be instrumental in the pathogenesis of many diseases and, under certain circumstances, it is possible that dysbiosis may, per se, have a direct link with disease development (Duboc et al., 2013). In oncology, studies have been conducted in different neoplastic conditions, identifying roles for specific bacteria in carcinogenesis (Kostic et al., 2012; Riley et al., 2013; Rubinstein et al., 2013), immune evasion (Gur et al., 2015), modulation of the tumor microenvironment (Kostic et al., 2013), and interference with anti-cancer immune responses and immune-surveillance that facilitate chemotherapy activity (Zitvogel et al., 2013; Galluzzi et al., 2015; Vétizou et al., 2015). As a consequence, the emerging concept that cancer needs to be studied considering the complex tumor microenvironment, which includes components such as tumor cells, the surrounding microenviroment and the microbiome, may aid in the development and improvement of cancer treatment, including immunotherapy (Pitt et al., 2016).
Tumors of the lower digestive tract, which include colon and rectal cancer, are among the most prevalent neoplasias worldwide, as well as one of the most fatal. Colorectal cancer (CRC) is the third most commonly diagnosed cancer with 1.4 million people diagnosed annually (Torre et al., 2015). The World Health Organization estimates an increase of 77% in the number of newly diagnosed CRC cases and an increase of 80% in deaths from CRC by 2030 (Binefa et al., 2014). Whereas, colon and rectal cancers have been routinely studied together as CRC, evidences indicate these to be distinct nosological entities. Differences in embryological origin, anatomy, treatment, metastatic potential, and outcome between colon cancer and rectal cancer have led to discussions as to whether neoplastic lesions of these two anatomical sites should be considered as different diseases, with further dichotomization of colon cancers into distal and proximal (Tamas et al., 2015).
The mechanisms involved in sporadic CRC predisposition or development are still poorly understood and the long list of cancer risk factors is continuously expanding and includes age, tobacco, and alcohol consumption, lack of physical activity, increased body weight and, most importantly, diet (Moore and Moore, 1995; Bingham, 2000). Of particular importance is the fact that all these risk factors can directly or indirectly modify the microbiota, making the precise definition of their roles a very complex task. Fecal microbiota studies have contributed greatly in our understanding of the general gut microbiota composition and its dysbiosis in different scenarios (Wu et al., 2013; Sabino et al., 2016). However, possibly due to practical issues related to obtaining the required biopsy samples—from patients and controls—there are still very few available studies focused on the analysis of microbial community compositions of more specific regions of the lower digestive tract, such as the proximal and distal colon, and the rectal tissue. Furthermore, few studies contemplate the fact that fecal- and tissue-associated microbiota are significantly different (Durbán et al., 2011; Hong et al., 2011; Mira-Pascual et al., 2015; Flemer et al., 2016). This fact is particularly relevant for CRC as the intimate crosstalk between the host's epithelium layer and the gut microbial community is a key factor for cell proliferation and development, as well as the regulation of inflammation, a major driver of rectal carcinogenesis (Arthur et al., 2014). Such differences lead to a lack of representativeness with respect to the bacterial biofilm of the rectal mucosa (Durbán et al., 2011; Gevers et al., 2014) and may reflect the disease state but possibly not the tumor microenvironment, which is of great importance to study possible microbiota:disease links.
Here we have addressed such shortcomings by studying the tissue-associated microbiota of 36 subjects, 18 with and 18 without rectal adenocarcinoma. The use of 16S rDNA deep sequencing allowed us to compare non-cancer x cancer mucosa, pointing to specific OTUs and bacterial genera potentially associated with rectal adenocarcinoma.
Materials and methods
Cohort
A total of 36 subjects were included after approval by AC Camargo Cancer Center's review board (ACCCC - 1614/11, January 30th, 2012). Tissue biopsies were collected from subjects belonging to one of the following groups:
Non-cancer subjects
(Non-Cancer, NC, n = 18): All subjects had medical indication of exploratory colonoscopy due to complaints, such as bleeding, abdominal pain, constipation, and chronic diarrhea. No subjects had personal or familial history of colorectal cancer or colitis (either ulcerative, Crohn's, radiation or infectious colitis, chronic inflammatory illnesses), previous colonic or small bowel resection, nor previous colon adenomas or familial polyposis syndrome. Only individuals with complete colonoscopies that allowed the full visualization of the entire colon and showed no significant clinical alterations were included.
Colonoscopy and biopsy procedures for the NC subjects
All patients received standard instructions for preparation for colonoscopy that included consumption of 500 ml of mannitol for bowel cleansing, luftal, and bisacodyl. Eligible subjects gave written informed consent to provide colorectal biopsies, had their anthropometric measures taken and answered questions about diet, consumption of alcohol, and tobacco. Colonoscopy was performed using a Pentax videoscope model FC38LX. During biopsy procurement, the rectum was inflated with air and care was taken not to use any suction during advancement of the scope to 7–8 cm from the anal verge. Sterile biopsy forceps were not taken out of the channel of the scope until an area that was completely clear of stool was seen with clear pink mucosa. Biopsies were taken with 2.2 mm sterile standard forceps.
Patients diagnosed with rectal adenocarcinomas
(Rectal-Cancer, RC, n = 18): Tumor specimens, located in the higher (N = 15), mid (N = 2), and lower rectum (N = 1), were obtained from surgeries to remove the tumor mass. All subjects belonging to this group were recruited at AC Camargo Cancer Center's Pelvic Surgery Department, in São Paulo, Brazil. We included patients that were diagnosed with rectal adenocarcinoma (tumors of stage pT1 or pT2 low- or mid-straight, pT1 or pT2 or pT3 high-straight), that had not undergone any neoadjuvant therapy and had their tumors surgically resected at the Pelvic Surgery Department, AC Camargo Cancer Center, with diagnosis confirmed by the Pathology Department of the same institution. After the histopathologic confirmation of rectal adenocarcinoma diagnosis, surplus samples were macrodissected by an experienced pathologist and used for DNA extraction and bacterial community profiling. Exclusion criteria were: Patients subjected to neoadjuvant therapy prior to tissue collection; patients reporting inflammatory bowel diseases or with hereditary cancer syndromes. We also excluded all subjects (cases and controls) who reported the use of antibiotics for at least 4 weeks prior to sample-collection.
DNA extraction
DNA extraction started after incubating the samples for 18 h in 600 μl of a lysis buffer (Qiagen) and 15 μl of proteinase K (20 μg/μl) at 55°C. After this period, DNA samples were extracted using a standard phenol chloroform protocol, followed by ethanol precipitation, quantification using a spectrophotometer (Nanodrop—Thermo Scientific), and visualized on 2% agarose gels to inspect DNA integrity.
PCR amplification and sequencing of the V4–V5 region of 16S rRNA gene
Oligonucleotide primer selection and coverage analysis
The V4–V5 region was amplified using a primer set designed to generate amplicons compatible with the chemistry available for the Ion Torrent PGM platform, that allowed ~400 nt of high quality sequences (Ion PGM Sequencing 400 Kit). Coverage of the primer set was evaluated using the Ribosomal Database Project's (RDP—Release 11.2) ProbeMatch (Cole et al., 2014) and the ARB Silva's (Release 115) TestPrime (Klindworth et al., 2013). The forward primer (5′-AYTGGGYDTAAAGNG-3′) and reverse primer (5′-CCGTCAATTCNTTTRAGTTT-3′) corresponded to positions 562 and 906, respectively, of the Escherichia coli 16S rRNA gene.
PCR amplification and amplicon sequencing
Three 50 μl amplification replicate reactions were performed per sample, each containing: 2.5 μM of each primer; 25 μl of Kapa Hotstart High Fidelity Master Mix (Kapa Technologies) and 25 ng of genomic DNA (gDNA). Thermocycling conditions were: 95°C, 3 min; 98°C, 15 s, and 40°C, 30 s for 35 cycles; followed by a last extension step at 72°C for 5 min. Amplicons of the three reactions from each subject were pooled and purified using a MinElute PCR Purification Kit (Qiagen). The purified products were run on 1.5% agarose gels and gel bands within the expected amplicon range were excised using sterile and disposable scalpels and purified using the Qiaquick gel extraction kit (Qiagen) to remove artifacts, primer-dimers and non-specific bands. Amplicons were end-repaired and Ion Torrent adaptors with barcodes were ligated. Equimolar amounts of amplicons from each sample were pooled, using the Ion Torrent qPCR quantitation kit (Thermo Scientific, Carlsbad, USA), and used for emulsion PCR. All samples were sequenced on the Ion torrent PGM platform (Thermo Scientific, Carlsbad, USA) using two 318 v2 chips. Samples from both groups were processed simultaneously, to avoid possible batch effects.
Sequence analysis
Sequence filtering
Sequences processed by the latest version of the Ion Torrent server (v3.6.2) were used as input into the Qiime (Quantitative insights into microbial ecology) software package (Version 1.6.0) (Caporaso et al., 2010a). We first removed sequences with an average quality score <20 using a 50 nt sliding window. Then, we identified barcodes used for subject-assignment, allowing a maximum of 2 mismatches, and discarded sequences with no barcodes, and <200 nt or >500 nt after barcode removal. PCR primers identified at the start or at the end of the reads, allowing a maximum of 4 nt mismatches, were trimmed and sequences with no identifiable primers were discarded. After primer trimming we removed all sequences below 200 nt and the remaining sequences were used as input for downstream analysis.
Sequence clustering and OTU filtering
Filtered sequences were clustered with 97% identity using UPARSE (implemented in USEARCH v7) (Edgar, 2013) and the seed sequence of each cluster was picked as a representative. Chimeric sequences (and clusters) were identified using UCHIME (Edgar et al., 2011) and the Broad Institute's chimera slayer database (version microbiomeutil-r20110519) and excluded from further analysis. The RDP classifier (Wang et al., 2007), as implemented within the Qiime interface (default parameters), was used to assign taxonomic ranks using a minimum confidence value of 80% and, subsequently, to each operational taxonomic unit (OTU). Unless otherwise stated, OTUs that occurred in less than 25% of all samples and with less than 3 reads were not considered.
Alpha and beta diversity analysis
We rarefied the OTU table to 17,414 sequences per sample in order to calculate species diversity, using the Shannon-Weaver index (Shannon, 1948) and the Simpson index (Simpson, 1949), and richness (by using the observed species) implemented in the R Phyloseq package (McMurdie and Holmes, 2013).
For beta diversity analysis, OTU-representative sequences were aligned using PyNAST (Caporaso et al., 2010b) against the aligned green genes core set (DeSantis et al., 2006) with Qiime default parameters, and the alignments were lanemask-filtered (Lane, 1991). A phylogenetic tree was built using FastTree (Price et al., 2009), weighted and unweighted UniFrac (Lozupone and Knight, 2005) distances were calculated and a distance matrix was generated. Using the R phyloseq package, distance matrices were used to calculate coordinates for principal coordinate analysis (PCoA).
Enterotypes
Community types of each sample were analyzed by the Dirichlet multinomial mixture model-based method (Holmes et al., 2012) using rarefied genera level counts of 16S rRNA sequencing reads. Partioning around medoids (PAM) enterotyping was performed in R using genera level relative abundances and the “cluster” package (Maechler et al., 2015). We applied 4 distance metrics: Weighted UniFrac, Unweighted UniFrac, root Jensen-Shannon divergence, and Bray-Curtis and assessed the quality of the clusters using prediction strength (Tibshirani and Walther, 2005), silhouette index (Rousseeuw, 1987), and the Caliński-Harabasz statistic (Calinski and Harabasz, 1974) using the “fpc” R package (Hennig, 2015).
Differential abundance analysis
To investigate differences in OTU, phyla and genera abundances between both groups, raw counts were normalized then log transformed using the normalization method below, as performed by a previous study (Sanapareddy et al., 2012):
Normalized count= log10((raw countnumber of sequences in that sample) × average number of sequences per sample+1)
We also evaluated high-level phenotypical differences in microbial composition between both groups. Quality control passed sequences were closed-reference picked at 97% identity using UCLUST_Ref (Edgar, 2010) and the green genes core set (Version 13.5). The resulting OTU table was rarefied to 13,944 sequences and submitted to BugBase (http://github.com/danknights/bugbase) in order to calculate differences between both groups in terms of microbial phenotypes.
Data validation
Digital droplet PCR of bacteroides fragilis 16S rRNA
We detected and quantified the absolute number of 16S rRNA B. fragilis copies in our samples using the QX200™ Droplet Digital™ PCR System (Bio-Rad). The primers used to amplify the B. fragilis 16S rRNA gene were: BF-fwd 5′-TCRGGAAGAAAGCTTGCT-3′ and BF-rev 5′-CATCCTTTACCGGAATCCT-3′(Tong et al., 2011) and to ensure further specificity, a labeled probe BF-p 5′(FAM)-ACACGTATCCAACCTGCCCTTTACTCG-3′ (BHQ1) (Tong et al., 2011) was included in the reaction. We used a commercial RNAseP Copy Number Reference Assay (Thermo-Fisher) to detect and quantify human DNA. Microdroplets (≈20.000/reaction) were generated on the Bio-Rad QX-100 following the manufacturer's instructions. RNAse P and B. fragilis ddPCR were performed in 96 well-plates, in a final volume of 20 μl, containing: 15 ng of total DNA, 10 ul of ddPCR supermix for probes (Bio-Rad), 8 pmol of each PCR BF-primer and 2 pmol of the BF-probe, or 1 μl of RNAse P assay. PCR conditions were: 50°C- 2 min; 95°C- 10 min; 95°C- 15 s and 60°C- 1 min for 40 cycles. After cycling, the 96-well plate was immediately transferred on a QX200 Droplet Reader (Bio-Rad), where flow cytometric analysis determined the fraction of PCR-positive droplets vs. the number of PCR-negative droplets in the original sample. Data acquisition and quantification was carried out using QuantaSoft Software (Bio-Rad). To ensure the accuracy of the results, a minimum of 10,000 acceptable droplets per reaction were required for quantification using the QuantaSoft software. Samples yielding a minimum of 3 positive droplets from 10–15,000 droplets analyzed were scored as positive.
Immunohistochemistry
Immunohistochemistry was performed in an automated Benchmark platform (Ventana Medical Systems, USA) for Anti-B. fragilis LPS antibody (mouse monoclonal—Abcam 1265/30) in whole slide tissues. Alkaline phosphatase conjugated to secondary polymeric system was used for IHC visualization. The selection of positive and negative samples was guided by the high-throughput sequencing (HTS) data and used to confirm the presence of B. fragilis in the sample set. The primary antibody was omitted to evaluate background staining.
Statistical analysis
Wilcoxon tests were used to compare mean differences between tumor and biopsy samples for phyla, genera and OTU log-abundances. Considering t = total number of taxa tested, p = raw p-value and R = sorted rank of the taxon, P-values were corrected for multiple testing (Sanapareddy et al., 2012) using:
Adjusted p value = t × pR
Fold changes for each genera/OTU were calculated using:
Log2FC = log2(RC average + 1) - log2(NC average + 1)
Chi-Square tests were performed on subject's categorical data such as gender, alcohol and tobacco use and vital status. Student t-tests were performed to compare differences in the means between both groups for age, height, weight, BMI, and alpha diversity. We used ANOSIM and ADONIS (Oksanen et al., 2016) to compare differences in beta-diversity between groups using 3 distance metrics weighted UniFrac, unweighted UniFrac and Bray-Curtis for categorical, and numerical variables, respectively. Linear models were built using normalized counts at the genera and OTU level to investigate associations with clinical-pathological characteristics of rectal-cancer samples, such as lymph node and perineural neoplastic invasion status. Unless otherwise stated, values were reported as mean ± SD (standard deviation) and P-values <0.05 were considered statistically significant. All calculations were performed within the R statistical computing environment (R Foundation, 2011) unless otherwise stated.
Discussion
In face of the microbiota gradient found in the human digestive tract (Zhang et al., 2014; Gao et al., 2015; Flemer et al., 2016) and the possibility that tissue-associated microorganisms could play a more direct role in immunomodulation and cancer development, we investigated bacterial populations present in tissue biopsies, which may be relevant to pathological processes. Instead of studying colon and rectum samples together, our work is more specific as it is focused and contains only rectal tumors. Whereas, we achieved high 16S rRNA coverage from a large spectrum of bacteria from cancer samples, before any therapeutic intervention, we also see limitations, such as our relatively small sample size of 36 individuals. However, effect size analysis (Kelly et al., 2015) between both groups revealed an ω2 ranging from 0.13 to 0.26, depending on the metric of pairwise distance, with PERMANOVA p-values <0.001 and power of 1 (data not shown), a finding that indicates that this sample size allows the observation of significant microbial differences between our two sample groups. We also need to point out that, whereas the primer pair used here gives a good coverage of most phyla, it has a poor coverage of the two closely related bacteria phyla Lentisphaerae and Verrucomicrobia.
In our study, we observed increased species-diversity and -richness among rectal-cancer samples. Higher species-diversity and -richness were seen in rectal tissue samples from adenomas compared to normal samples (Sanapareddy et al., 2012) and CRCs vs. adenomas (Nakatsu et al., 2015) and increased richness was found in CRCs compared to both adenomas and controls (Mira-Pascual et al., 2015). However, when looking at fecal samples, studies have had conflicting results. One study found increased diversity of both genes and genera along the adenoma-carcinoma transition (Feng et al., 2015), whereas another found a decrease in diversity when comparing carcinoma samples and normal controls (Ahn et al., 2013) and a third found no differences between controls, adenomas and carcinomas (Zeller et al., 2014). It is noteworthy to state that these fecal studies grouped proximal and distal colon cancers together with rectal cancers, which could have led to differences in their results. We should note that the five cases of early-stage lesions (pT2) showed, on average, intermediate microbial richness, when compared to non-cancer biopsies and a more advanced neoplastic stage (pT3). This suggests that increased species richness of cancer lesions could have an early role in rectal carcinogenesis.
Inter-individual microbial community heterogeneity of the human gut is influenced by spatial distribution, micro-heterogeneity, host genetics, dietary preferences, and mucin content (Eckburg et al., 2005; Hong et al., 2011; Zhang et al., 2014), and has posed a long-standing challenge when investigating microbial signatures implicated in CRC tumorigenesis. However, our results show that despite the high inter-individual differences, a common microbial community pattern appears to emerge, as shown in the PCoA analysis that clustered non-cancer and rectal-cancer groups separately (Figure 1D), suggesting a common dysbiotic setting related to this neoplasia.
We performed a global analysis of high-level phenotypical differences for bacteria identified in both groups. We highlight the higher abundance of anaerobic bacteria in the RC group in agreement with a previous study (Warren et al., 2013) and the reduction of biofilm-forming bacteria. The latter is a finding that may point to barrier breakage that would contribute to rectal colonization by relevant bacteria (Reid et al., 2001) (Supplementary Figure 3).
The alterations we found at the phylum level include higher levels of Cyanobacteria (possibly Melainabacteria) (Soo et al., 2014), Actinobacteria, Bacteroidetes, OD1, Proteobacteria, and Planctomycetes in the RC-group. We should note an important abundance difference for bacteria of the candidate phylum OD1 (Parcubacteria). These highly adapted organisms have not been isolated in vitro yet; they have small genomes (<1 Mb) and reduced metabolic properties identified in a range of anoxic environments. The absence of biosynthetic capabilities and DNA repair enzymes, derived from the genomic analyses of some OD1 bacteria, suggests a role as ectosymbionts (Nelson and Stegen, 2015). However, the putative role of these microbes in rectal cancer remains to be determined. A second phylum, Planctomycetes, which are atypical bacteria (Fuerst and Sagulenko, 2011) relatively close to Verrucomicrobia (Hou et al., 2008) and more frequently observed in aquatic environments (such as saltwater, fresh water, and acidic mud), also showed potential as a biomarker for RC, with striking differences between the groups.
Interestingly, our study also indicated the differential abundance of more specific microbes after comparing NC and RC groups. B. fragilis, a symbiotic organism common to the human intestinal tract, was found to be more abundant in rectal-cancer samples seen by 16S rRNA HTS and confirmed by ddPCR. Other studies that investigated tissue-associated bacteria also found increased abundance of B. fragilis in tumor samples (Wang et al., 2012; Zeller et al., 2014; Nakatsu et al., 2015). B. fragilis has been identified as an important human intestinal symbiont and has been suggested to act as a “keystone pathogen” in the development of CRC (Hajishengallis et al., 2012). B. fragilis is an obligate anaerobe and is a minority member of the normal colonic microbiota with a propensity for mucosal adherence (Sears et al., 2014). Previous reports have linked enterotoxigenic B. fragilis (ETBF) to human diarrheal illnesses and increased tumorigenesis in an IL-23-dependent and STAT3-dependent manner (Wick et al., 2014). The toxin fragylisin, produced by ETBF, is a zinc-dependent metalloprotease that triggers NF-kB signaling and cleaves E-cadherin, and has been suggested to be oncogenic (Wu et al., 2009). Bacterial genera known for their role in butyrate production, such as Ruminococcus, Roseburia, and Butyricimonas were more abundant among rectal-cancers, differing from results reported so far. An explanation for this difference could involve the fact that most data has been derived from fecal samples and/or grouping different anatomical tumor sites (such as proximal, distal, and rectal). An OTU assigned to Bilophila, a bile-resistant, strictly anaerobic bacterial genus, was also more abundant among rectal-cancer samples, and evidence suggests that products of bacterial bile acid conjugation, secondary bile acids, are carcinogenic (McGarr et al., 2005; Ridlon et al., 2014). Desulfovibrio, a commensal sulfate-reducing bacterium, may contribute to mucosal inflammation through hydrogen sulfide production, a resulting by-product of sulfated mucin metabolism (Earley et al., 2015). Phascolarctobacterium, known to produce propionate via succinate fermentation, was also increased among cancer samples. On the other hand, we found that L. delbrueckii was more abundant in non-cancer samples. Probiotic Lactobacilli can modify the enteric flora and are thought to have a beneficial effect on enterocolitis. Treatment of IL-10-deficient mice with the probiotic Lactobacillus salivarius ssp. reduced the intensity of mucosal inflammation and the incidence of colon cancer from 50 to 10%. These effects were accompanied by significant reductions in fecal coliform, enterococci, and Clostridium perfringens levels (O'Mahony et al., 2001). This study exemplifies the effect of changes at the flora level on the development of inflammation, and supports the hypothesis that there are “protective” species and “harmful” species in the normal bacterial flora.
After identifying relevant cancer-related microorganisms, the next steps of microbiome studies will certainly involve microbial manipulations to reduce disease-associated agents, or increase the frequency of protective and health-associated microbes. This can be achieved through diet, exemplified by a previous study using animal models that showed taurine consumption lead to a reduction of Proteobacteria (especially Helicobacter), as well as an elevation in short-chain fatty acids (SCFA) and a reduction in fecal lipopolysaccharides (LPS) (Yu et al., 2016). Duque et al., recently demonstrated, using SHIME® (Simulator of the Human Microbial Ecosystem), that the consumption of non-pasteurized fresh orange juice was able to significantly increase levels of Lactobacillus spp., Enterococcus spp., Bifidobacterium spp., and Clostridium spp. and to reduce enterobacteria (Duque et al., 2016).
Long before associations between cancer and the microbial flora started to be uncovered, diet recommendations—including low consumption of red meat and fat, and high ingestion of fibers and vegetables—have been recognized as protective against the development of colorectal cancer. Current evidences suggest that diet recommendations may be effective, together with tissue environment and host-related factors, because they also help shape the gut microbiota (Sonnenburg and Bäckhed, 2016). Further research may show that treatment of rectal dysbiosis may contribute to the prevention of inflammation-induced rectal carcinoma development and aid in chemotherapy and overall treatment response (Yang and Pei, 2006).