Escherichia coli Strains from Patients with Inflammatory Bowel Diseases have Disease-specific Genomic Adaptations

Dubinsky, Vadim; Reshef, Leah; Rabinowitz, Keren; Wasserberg, Nir; Dotan, Iris; Gophna, Uri

doi:10.1093/ecco-jcc/jjac071

Abstract

Background and Aims

Escherichia coli is over-abundant in the gut microbiome of patients with inflammatory bowel disease [IBD]. Here, we aimed to identify IBD-specific genomic functions of diverse E. coli lineages.

Methods

We investigated E. coli genomes from patients with ulcerative colitis [UC], Crohn’s disease [CD] or a pouch, and healthy subjects. The majority of genomes were reconstructed from metagenomic samples, including newly sequenced faecal metagenomes. Clinical metadata were collected. Functional analysis at the gene and mutation level were performed and integrated with IBD phenotypes and biomarkers.

Results

Overall, 530 E. coli genomes were analysed. The E. coli B2 lineage was more prevalent in UC compared with other IBD phenotypes. Genomic metabolic capacities varied across E. coli lineages and IBD phenotypes. Host mucin utilisation enzymes were present in a single lineage and depleted in patients with a pouch, whereas those involved in inulin hydrolysis were enriched in patients with a pouch. E. coli strains from patients with UC were twice as likely to encode the genotoxic molecule colibactin than strains from patients with CD or a pouch. Strikingly, patients with a pouch showed the highest inferred E. coli growth rates, even in the presence of antibiotics. Faecal calprotectin did not correlate with the relative abundance of E. coli. Finally, we identified multiple IBD-specific non-synonymous mutations in E. coli genes encoding for bacterial cell envelope components.

Conclusions

Comparative genomics indicates that E. coli is a commensal species adapted to the overactive mucosal immune milieu in IBD, rather than causing it. Our results reveal mutations that may lead to attenuated antigenicity in some E. coli strains.

Escherichia coli, genomics, IBD

1. Introduction

Inflammatory bowel diseases [IBD], including Crohn’s disease [CD], ulcerative colitis [UC], and pouchitis, are multifactorial diseases characterised by chronic and relapsing intestinal inflammation. CD typically affects the small intestine and colon, UC is confined to the colon, and pouchitis is inflammation of the previously normal small intestine comprising the pouch in patients with UC after proctocolectomy and ileal pouch-anal anastomosis.¹ The aetiology of IBD is believed to involve an aberrant immune response towards the intestinal microbiota² in a genetically predisposed host,³ induced by environmental triggers.⁴

The gut microbiome in patients with IBD is distinguished by taxonomic^5,6 and functional^7,8 dysbiosis. One of its hallmarks is the expansion of Enterobacteriaceae, in particular of Escherichia coli in both faecal⁸ and mucosal^5,9 samples from patients with IBD. E. coli is a prevalent commensal in the human gut microbiota,¹⁰ but has high within-species variation in functional potential¹¹ and some strains are pathogens, causing intestinal or extra-intestinal diseases.¹² One E. coli strain [Nissle 1917] has been considered as a probiotic agent and reported to prolong remission in patients with UC.¹³

E. coli has been extensively studied in the context of IBD pathogenesis, yet the focus has been mostly on the adherent-invasive E. coli [AIEC] pathotype.^14,15 AIEC strains are characterised by their phenotype of adhesion and invasion into intestinal epithelial cells and survival and replication within macrophages without prompting cell death,⁹ and are more prevalent in mucosal biopsies of patients with CD compared with patients with UC or healthy controls.^9,16–18 Roughly 65% of the AIEC biopsy-derived isolates belong to phylogroup B2,^16,19 which contains both commensal and many extra-intestinal pathogenic E. coli [ExPEC] strains. Such ExPEC strains typically have pathogenicity islands encoding capsular antigens, iron acquisition systems, cytotoxins, and haemolysins.²⁰

Despite considerable efforts, comparative genomic analysis of AIEC strains failed to identify any discriminative genomic markers of this phenotype.²¹ Most studies investigating E. coli as a potential pathogen in IBD focused exclusively on AIEC, and the few previous comparative genomic studies of E. coli strains in IBD were limited by the number and diversity of the genomes available.^22–24 Thus it remains unknown if there are E. coli lineages, genes, or alleles that are better adapted to patients with IBD. Here, we performed a large-scale comparative genomic study of E. coli from patients with CD, UC, and an ileal pouch, aiming to reveal IBD-specific or phenotype-specific adaptations and also test to what degree those strains are expected to promote intestinal inflammation.

2. Materials and Methods

The study was approved by the local institutional review board [0298–17] and the National Institutes of Health [NCT01266538]. All patients signed informed consent before inclusion to the study.

2.1. E. coli genomes from patients with IBD and healthy subjects and their metadata

The genomes of E. coli from patients with a pouch, CD, or UC, and from healthy subjects, were obtained from metagenomic and isolate genome data from multiple published studies [Supplementary Table S1]. In addition, 58 newly sequenced metagenomic samples from 22 patients with a pouch [including two UC samples prior to proctocolectomy and four stoma samples prior to pouch surgery], nine isolate genomes from patients with a pouch, and three isolate genomes from patients with CD from Rabin Medical Center [RMC], Israel, were added to the analysis. Publicly available isolate genomes from patients with CD or UC and from healthy subjects, and related metadata, were downloaded from the PATRIC database.²⁵ Publicly available metagenomes used for reconstructing metagenome-assembled genomes [MAGs] were downloaded from the NCBI-SRA repository and included all phenotypes [Supplementary Table S1]. In total, the comparative genomic analysis included 530 [IBD- and healthy subject-derived] genomes and two versions of the E. coli Nissle-1917 genome.

Patients after pouch surgery were recruited at a dedicated pouch clinic in a tertiary IBD referral centre in Israel [RMC]. The study was approved by the local institutional review board [0298–17] and the National Institutes of Health [NCT01266538]. All patients signed informed consent before inclusion. Demographic and clinical data were obtained during clinic visits. Faecal sample collection, faecal calprotectin measurements, genomic DNA extraction and shotgun metagenomic sequencing, pouch disease behaviour [phenotype] definition, and treatment protocol were performed as described in Dubinsky et al.²⁶

2.2. Pipeline for the reconstruction of genomes from metagenomes

To reconstruct the genomes of E. coli directly from metagenome samples [E. coli MAGs], we applied the following single-sample metagenomic assembly pipeline to each sample. [i] Metagenome reads were quality-filtered with Trimmomatic v0.36²⁷ using default parameters, removing low-quality and short reads and low-quality bases, and clipping Illumina adaptors. Human-derived reads were removed by mapping them against the human genome [GRCh38] with Bowtie2 v2.2.9²⁸ in --very-sensitive local mode. [ii] The remaining reads were mapped against a reference database of the E. coli pangenome to retain only closely matching reads using MIDAS.²⁹ [iii] Samples with a sufficient amount of mapped reads to E. coli pangenome [depth of coverage >5x] were separately de novo assembled with Unicycler v0.4.4,³⁰ using default parameters. Contigs shorter than 1 Kb were discarded. [iv] To obtain better quality MAGs, the remaining contigs were screened with BLASTN against the NR database, and only contigs that aligned to E. coli genes as top hit with nucleotide identity of ≥90% were retained. [v] All the MAGs underwent the following quality control steps. Genome completeness and contamination measures were assessed using CheckM v1.0.7³¹ in taxonomy-specific mode [taxonomy_wf] for the species E. coli. MAGs with levels of completeness <75% and contamination >5% were discarded. The amount of strain-level heterogeneity was estimated with CMSeq [https://github.com/SegataLab/cmseq], which calculates the polymorphism at each position [with coverage of >5x] in the contigs [a position was considered as non-polymorphic if the dominant allele frequency was >80%]. [vi] The relative abundance of each MAG from the corresponding metagenome sample was calculated by mapping reads against the reconstructed MAG from the same sample using Bowtie2 [--very-sensitive mode], and to avoid counting reads from closely related strains, only aligned reads with an edit-distance [mismatches] of two or less bases to the contig were considered, divided by the total number of quality-filtered reads in the sample.

Metagenomes where the E. coli MAGs failed to pass our minimum accepted quality criteria [completeness ≥75% and contamination ≤5%] were de novo assembled with metaSPAdes v3.14³² using default parameters and ‘--meta’ option, including the read error corrector BayesHammer, instead of the Unicycler assembler. The contigs were then used for binning with MetaBAT2 v2.15³³ with ‘-m 1500’ option [minimal contig size for binning of ≥1500 bp] to obtain E. coli genome bins. The quality control steps with CheckM were repeated for the newly generated MAGs, and those that passed the accepted quality criteria were added into the analysis pipeline with the rest of the steps unchanged.

In total 412 E. coli MAGs were reconstructed and had genome completeness of >75% and contamination <5%, and 346 MAGs could be defined as high-quality according to current guidelines with completeness of >90%³⁴; see Supplementary Table S2 for assembly quality control statistics.

2.3. Whole-genome phylogeny and average nucleotide identity analysis

Phylogenetic analysis of the E. coli genomes [MAGs and isolates] was performed using PhyloPhlAn-3.³⁵ The phylogeny was built by annotating the input genomes with a set of species-specific marker genes identified from the UniRef90 database [2818 E. coli markers]. The following parameters were used: ‘--diversity low --accurate -min_num_entries 505 --force_nucleotides’, to generate high-resolution strain-level phylogeny [nucleotides level] with the genes defined as core if present in ≥95% of the genomes.

Whole-genome average nucleotide identity [ANI] between pairs of genomes was calculated with FastANI v1.32³⁶ using default parameters. ANI values of ≥95% are the most frequently used as a species-level taxonomic boundary threshold for prokaryotic genomes.³⁶

2.4. E. coli phylogroups typing

To perform in silico phylogroup typing for the E. coli genomes, the ClermonTyping tool [https://github.com/A-BN/ClermonTyping] was used, and the assembled contigs of each genome were used as input.³⁷ This tool mimics the in vitro PCR method using a set of 30 primers that are aligned against the genomes with BLASTN.

2.5. Core genes single nucleotide polymorphism analysis

Core genes single nucleotide polymorphism [SNP] analysis of the E. coli genomes was performed using Prokka v1.13.3³⁸ in order to obtain the coding genes, and then Roary v3.13³⁹ for pangenome building and core gene alignment. To convert the resulting core gene alignment file into an SNP distance matrix, the tool SNP-dists [https://github.com/tseemann/snp-dists] was used.

2.6. Functional annotation and analysis of the genomes

Functional annotation of the E. coli genomes was performed based on several tools and databases. EggNOG mapper v2.0.1⁴⁰ was used to generate the functional genes profile [pangenome] based on EggNOG orthology system with DIAMOND v0.9.24⁴¹ for the sequence similarity search. In addition, the genomes were annotated with KEGG Orthologs [KO] assignment based on hidden Markov model profiles using KofamScan v1.3.0.⁴² Carbohydrate-active enzymes [CAZy] families⁴³ were identified using ‘hmmscan’ from HMMER v3.2.1 [http://hmmer.org] against dbCAN HMMs database release-9, applying stringent filtering cutoff as suggested by the authors [http://bcb.unl.edu/dbCAN2]. To add virulence genes annotation to the genomes, protein search BLASTP [DIAMOND] against the core set of experimentally verified virulence factors database [VFDB] was performed.⁴⁴ Only sequences that had identity ≥90% to the proteins in VFDB were considered. Similarly, antibiotic resistance genes were identified by using the curated protein sequences [homologue model] in the comprehensive antibiotic resistance database [CARD].⁴⁵ Only sequences that had identity ≥95% to the proteins in CARD were considered. For all the functional annotations, protein-coding genes within the contigs were predicted with Prodigal v2.6.3 and used as input query for sequence similarity searches.

2.7. Bacterial genome-wide association study analysis incorporating phylogenetic information

The bacterial genome-wide association algorithm PySEER⁴⁶ was used to control for the phylogenetic structure of the E. coli strains. PySEER was run in linear mixed effects model mode, with a similarity matrix obtained from the previously computed whole-genome phylogenetic tree [see above]. Thus, population structure was used as a random effect and the phenotype of the subjects was used as a fixed effect. Matrices of genes presence/absence for each E. coli genome were used, and the analysis was performed based on all the functional gene groups as described above. The p-values from the mixed model associations were reported.

2.8. Genome replication rates of E. coli within subjects

Genome replication rate for the E. coli MAGs was calculated with the iRep [genome replication index] algorithm.⁴⁷ This tool is based on the principle that actively replicating cells have higher coverage near the origin of replication, compared with the terminus in their genomes.⁴⁸ Only MAGs that were obtained from faecal and aspirate metagenomes were used in this analysis [n = 391], since the mapping of metagenome reads to the assembled MAGs is required for this tool.

2.9. Identification of mutation in ciprofloxacin antibiotic target genes

The primary target genes for ciprofloxacin antibiotic, gyrA and parC, were extracted from the coding genes predicted by Prokka [see above]. Point mutations in these genes in positions known to confer resistance to ciprofloxacin were identified as described in Dubinsky et al.²⁶

2.10. Mutational analysis of E. coli functional genes

To identify IBD-specific adaptations at the mutational level, Snippy v4.6.0 variant calling tool [https://github.com/tseemann/snippy] was used. The analysis was performed in a within-clade approach with a different isolate reference genome [high-quality draft] for each E. coli clade as was determined by the whole-genome phylogeny. For clade-I, clade-III, clade-IV, and clade-V, the following genomes from healthy subjects were used as reference [PATRIC genome ID]: 749549.3, 749550.3, 749540.3, and 749545.3, respectively. Functional annotations for these genomes were obtained from NCBI-Assembly according to NCBI prokaryotic genome annotation pipeline in genomic GenBank format, as required by Snippy. The analysis was not performed for clade-II due to the small number of genomes. The metagenomic reads composing the MAGs from IBD patients were mapped against the reference genomes, with a minimum read depth at a position for variant calling set to 10 [default]. Mutation variants were identified in coding and non-coding regions in the genome, and were classified as synonymous, non-synonymous, insertions/deletions, and frameshifts. Synonymous mutations and mutations in intergenic regions were not analysed further. To further validate the genes with non-synonymous mutations that were potentially IBD-specific [not present in non-IBD MAGs] from clade-III, homology search with BLASTN was performed. These genes were used as a query against a set of 23 non-IBD E. coli isolate genomes, and only genes with mutations that were absent in >80% of these non-IBD genomes [present in <20%] were labelled as IBD-specific and retained.

2.11. Statistical analysis

All statistical analysis was performed in R v3.6.3. The Kruskal–Wallis H-test with Dunn’s test correction for multiple pairwise comparisons was performed using ‘kruskal.test’ and ‘dunn.test’ functions. Wilcoxon [Mann–Whitney] rank test was performed using the ‘wilcox.test’ function. Fisher’s exact test with false-discovery rate correction [Benjamini–Hochberg] for multiple hypothesis testing was performed with the ‘fisher.test’ and ‘p.adjust’ functions. Spearman’s rank correlation was performed with the ‘cor.test’ function. Permutational multivariate analysis of variance [PERMANOVA] test was performed using the adonis function. A linear mixed effects model of iRep [dependent variable] as a function of genome and patient-related predictor variables, with individual patients from whom a set of samples were derived being specified as a random effect, was performed with the ‘lmer’ function.

2.12. Data availability [data sharing statement]

All the metagenomic sequence data generated in this study [patients with a pouch, UC and a stoma] have been deposited in NCBI-SRA, and the isolate genomes of patients with a pouch and CD were deposited in NCBI-Genome, and are available under BioProject number PRJNA730677. The assembled MAGs are available for download from Figshare data repository with the following URL: [https://doi.org/10.6084/m9.figshare.14625537].

3. Results

We studied the genomes of 530 E. coli strains from faecal [n = 438], aspirate [n = 24], and biopsy [n = 68] samples of patients with IBD [n = 459] and healthy individuals [n = 71]. We established a dataset that encompassed 21 individual studies, both publicly available and newly sequenced samples from our own cohort of patients with a pouch [Supplementary Table S1]. We reconstructed the majority of the analysed genomes [77%] directly from metagenomic samples (metagenome-assembled genomes [MAGs]) and the rest of the genomes were obtained from cultured isolates. The patients with IBD in this study, from whom we obtained E. coli genomes, included the following clinical phenotypes: pouch [n = 268], CD [n = 124], UC [n = 63], and an additional four patients after UC-associated proctocolectomy and a stoma [prior to pouch surgery]. For a subset of patients from our dataset, we had faecal calprotectin and antibiotic usage data [Supplementary Table S1].

3.1. E. coli strains from patients with IBD and healthy subjects belong to distinct lineages that are differentially distributed across phenotypes

First, the phylogenetic structure of E. coli strains was investigated using whole-genome analysis of 2818 core marker genes. We identified five distinct clades [main branches] comprising the E. coli strains in our dataset [Figure 1A], which we named Clade-I to Clade-V. Each clade included both isolate genomes and MAGs, reassuring that our computational approach for MAGs reconstruction from metagenomes is robust and the MAGs are compatible with isolate genomes. Although the majority of the E. coli strains in our dataset were from faecal samples, the genomes obtained from biopsies and aspirates clustered with the faecal strains throughout the five clades, implying that there is no clade that is specifically mucosa-associated.

Figure 1.

Phylogenetic structure of the metagenome-assembled genomes [MAGs] and isolate genomes of E. coli from patients with IBD and healthy subjects. [A] A whole-genome phylogenetic tree based on core genes of 532 E. coli genomes. The inner ring indicates the clinical phenotype of the subject from whom the genome was derived; the middle ring corresponds to the E. coli phylogroup membership; the outer ring is the E. coli clade membership based on whole-genome phylogeny; circles/squares on tree nodes denote whether the genome is a MAG or isolate genome [WGS label], respectively; stars outside the rings denote the isolation source of the sample. The phylogenetic tree was built with RAxML [parameters ‘-p 1989 -m GTRCAT’] as part of PhyloPhlAn-3 tool,³⁵ based on 2818 species-specific marker genes defined as core, which were present in ≥95% of the genomes. The branch scale is nucleotide substitutions per site. [B] Whole-genome average nucleotide identity [ANI] between pairs of E. coli genomes within a clade and between the clades. [C] Jaccard similarity in gene content [non-core functional genes based on EggNOG] between pairs of E. coli genomes within a clade and between the clades. [D] Core genome single-nucleotide polymorphism [SNP] distance between pairs of E. coli genomes within a clade and between the clades. In panels B to D, coloured bars are within a clade and grey bars are between the clades comparison, respectively. IBD, inflammatory bowel disease; pos., position.

Open in new tab Download slide

Population genetic studies of E. coli isolates have traditionally relied on the division into eight deep branching lineages called phylogroups⁴⁹ based on a PCR-based typing method relying on four marker genes. An analysis of E. coli phylogroups [Figure 1A] indicated that the IBD strains originated from a diverse array of lineages [A, B1, B2, C, D, E, F, G]. The five clades we identified based on the whole-genome data largely corresponded to the phylogroups, with clade-I being consistent with D, clade-II with F, clade-III with B2, clade-IV with A and clade-V with B1 and C [Figure 1A; Supplementary Table S1]. As expected, strains within a clade had genomes that were more similar to one another in sequence [mean of 98.17% to 98.75% average nucleotide identity, ANI] compared with a mean ANI similarity of strains between clades (96.56% to 98.14% [Figure 1B]) and also had more similar gene content [Figure 1C and D].

We next analysed the distribution of the five E. coli clades among the clinical phenotypes. All the IBD phenotypes and healthy subjects had at least one strain from each clade, suggesting that many different E. coli lineages successfully colonise both patients with IBD and healthy subjects. Nonetheless, we observed an uneven distribution in clade-III [Fisher’s exact test, FDR p <0.05; Figure 2A], which was more prevalent in UC [49.2%] and non-IBD [40.8%] compared with CD [31.5%] and pouch [17.5%], whereas clade-I prevalence in UC was only 4.8%, the lowest among all the phenotypes.

Figure 2.

Distribution of the E. coli clades across clinical phenotypes and lack of association between intestinal inflammation and E. coli clade and relative abundance. [A] Prevalence of E. coli clades in the different clinical phenotypes of patients with IBD [pouch, CD and UC] and non-IBD subjects. *FDR p <0.05; Fisher’s exact test. The test was performed separately for each clade, between the phenotypes—i.e., testing for each of the clades, is it differently distributed between the different phenotypes compared with the other clades. [B] Spearman correlation between E. coli MAGs relative abundance and faecal calprotectin level [log-transformed]. [C] Fecal calprotectin levels [log-transformed] associated with each E. coli clade. Box plot whiskers mark observations within the 1.5 interquartile range of the upper and lower quartiles. The blue horizontal lines mark the mean value in each clade. IBD, inflammatory bowel disease; CD, Crohn’s disease; UC, ulcerative colitis; FDR. false-discovery rate.

Open in new tab Download slide

Finally, we tested to what degree can E. coli or specific clades of this species be associated with intestinal inflammation. Faecal calprotectin, used as a biomarker of intestinal inflammation, did not correlate with the relative abundance of the E. coli MAGs in the microbiomes from corresponding metagenomic samples [Spearman r = 0.0167, p = 0.77; Figure 2B]. Furthermore, the median calprotectin levels associated with each clade were similar [Kruskal–Wallis, p = 0.59; Figure 2C] although, in the group of patients with clade-III E. coli, the mean calprotectin was higher due to the inclusion of several patients with very high calprotectin values [>8000 μg.g^-1].

3.2. E. coli clades present in patients with IBD may have particular functional potential

We next explored the functional potential of the E. coli clades through genome annotation based on several databases [see Materials and Methods]. We identified a pangenome of 7462 genes in the genomes set [with either a known gene name or KO number, according to the EggNOG], of which 2121 were non-core [present in less than 90% of the genomes] and thus might drive differences in the functional potential between the clades. A substantial variation in the functional genes was observed between the E. coli genomes [Figure 3A], which was explained to a large degree by the clade membership [PERMANOVA test, R² = 0.437, p <0.001]. Clade-III was the most dissimilar from the other clades according to EggNOG, KO and CAZy gene annotations [Figure 3B and C] and to core-gene SNP alignment [Figure 3D]. Focusing on genes with functional annotations [KO], we detected 709 non-core genes that significantly differed in prevalence between the E. coli clades [Fisher’s exact test, FDR p <0.05; Supplementary Table S3].

Figure 3.

Variation in the functional gene content of the E. coli genomes explained by clade membership. [A] Principal coordinates analysis [PCoA] of Jaccard distances [overall dissimilarity in the presence or absence of genes] based on non-core genes annotated with EggNOG. [B] PCoA of Jaccard distances based on non-core genes annotated with KEGG Orthologs [KO]. [C] PCoA of Jaccard distances based on non-core carbohydrate-active enzymes [CAZy] families. [D] PCoA based on dissimilarity in core-gene single-nucleotide polymorphisms [SNPs]. The variation in the functional gene content explained by the clade membership of E. coli genomes in panels A to D [PERMANOVA, R²= 0.437, 0.401, 0.366, and 0.748, respectively; p <0.001].

Open in new tab Download slide

We also specifically analysed the potential differences in carbohydrate utilisation between the clades using CAZy protein families. Even though many of the identified 26 non-core CAZy families were broadly distributed, we nonetheless observed a distinct prevalence pattern between the clades for some of them [Supplementary Table S4]. The alpha-mannosidase family GH38 [involved in the oligosaccharide mannose degradation] was over 93% prevalent in clades IV and V, but less than 33% prevalent in the other three clades [Fisher’s exact test, FDR p <9.6 × 10^-63]. On the other hand, the GH127 family involved in the degradation of other oligosaccharides, including arabinofuranose, was 98.8% and 100% prevalent in clades I and II, respectively, whereas in the other clades it was only 29.5% to 67% prevalent. Unlike the other clades, clade-III almost exclusively possessed a high prevalence [98%] of GH33 sialidases [Fisher’s exact test, FDR p <3.4 × 10^-113], which are involved in the breakdown and utilisation of mucin-derived human glycans [by cleaving terminal sialic acid from sialylated mucins⁵⁰].

Clade-III [B2 phylogroup equivalent] genomes encoded a high number of potentially proinflammatory proteins and toxins [based on the virulence factor database VFDB, Figure 4A, and Supplementary Table S5]. Half of the genomes in clade-III had genes encoding for biosynthesis of colibactin [pks genomic island—clb], a genotoxic molecule that can induce double-strand DNA breaks and chromosome aberrations.⁵¹ Several serine-protease autotransporters that are harmful for eukaryotic cells⁵² and are considered as enterotoxins [pic, sat, vat], were almost exclusively present in clade-III genomes, and their prevalence ranged from 30% to 77%. Moreover ɑ-haemolysin [hly], which was shown to disrupt epithelial cell tight junctions in patients with UC and thus promote intestinal inflammation,⁵³ was present in 25% of clade-III genomes. Notably, only a small number of the genomes from the other clades possessed the above-mentioned virulence factors.

Figure 4.

Virulence factor and antibiotic resistance gene profiles in the E. coli genomes from patients with IBD and healthy subjects. [A] Heatmap of virulence factors [rows] in E. coli genomes [columns] showing the presence [blue] or absence [beige] of the genes. [B] Heatmap of antibiotic resistance genes [rows] in E. coli genomes [columns] showing the presence [blue] or absence [beige] of the specific genes. Colour bars at the top of the plots represent the clinical phenotype of the subject and the clade membership of the E. coli genomes. Colour bars at the left of the plots represent the classification category of the virulence genes in panel [A] and the antibiotic resistance mechanism in panel [B]. Genes were clustered using Euclidean distances by applying the complete linkage method. For the list of p-values obtained from Fisher’s exact test for genes prevalence, see Supplementary Tables S5 and S6.

Open in new tab Download slide

We also explored the antibiotic resistance genes in the E. coli strains from each clade based on the CARD database for resistance genes. Genes encoding various efflux pumps [against tetracycline, macrolide, aminocoumarin, and aminoglycoside antibiotics] showed a diverse prevalence pattern between the clades [Figure 4B; and Supplementary Table S6]. Aminoglycoside resistance by inactivation had a relatively low prevalence in all the clades [<37%], but AAC[3]-IId was significantly higher in clade-II than in the other clades [Fisher’s exact test, FDR p <2.2 × 10^-6]. We also detected beta-lactamase encoding genes, some of which were prevalent [ampC and TEM-1] whereas others were rare, specially those coding for the CTX-M family of extended-spectrum β-lactamases.⁵⁴ Interestingly, genomes from clade-III had on average the lowest number of antibiotic resistance genes [Figure 4B].

Due to the marked differences in E. coli strains which we detected between clades, we followed with a phylogenetically-aware analysis to identify specific metabolic functions and virulence factors associated with IBD phenotypes, regardless of the clades [see below].

3.3. Antibiotic treatment in patients with IBD increases mobile antibiotic resistance genes in E. coli strains

Next we wanted to determine whether antibiotic treatment in a subset of patients with IBD affected the presence of antibiotic resistance genes in the E. coli strains. For 408 E. coli genomes, we obtained information regarding the antibiotic usage of the patients from whom the samples were obtained. By dividing these genomes into two groups, from subjects who had received antibiotic treatment in the last month prior to sampling [Abx+] and those who were free of antibiotics for at least 1 month [Abx-], we could detect 14 antibiotic resistance genes that were significantly enriched in the genomes from the Abx + group [Figure 5A]. Importantly, aac[6’]-Ib-cr, oxa-1, and catB3 genes conferring resistance against fluoroquinolones, beta-lactams, and chloramphenicol, respectively, were 16% prevalent in the genomes of Abx+, compared with only 2.54% in the Abx- [Fisher’s exact test, FDR p <8.4 × 10^-5]. These genes are known to be encoded together on some plasmids of Enterobacteriaceae,⁵⁵ and accordingly we observed these genes co-located on the same contigs of E. coli in our dataset. Overall, genomes in the Abx + and Abx- groups had a median of 11 and nine resistance genes, respectively [Mann-Whitney, p <5.3 × 10; Figure 5B]. Taken together, these findings suggest that E. coli strains of patients with IBD tend to be antibiotic-resistant, and antibiotic treatment further increases mobile antibiotic resistance in these strains.

Figure 5.

Higher prevalence of antibiotic resistance genes of E. coli in subjects undergoing antibiotic treatment. [A] Heatmap showing the prevalence of antibiotic resistance genes in E. coli genomes from subjects who received antibiotic treatment in the past month [Abx+] compared with those that were free of antibiotics for at least 1 month [Abx-]. *FDR p <0.05; Fisher’s exact test. [B] The distribution of the number of antibiotic resistance genes [AMR] in E. coli genomes from Abx + and Abx- subjects as defined in panel A. *p <0.001; Mann–Whitney test. Box plot whiskers mark observations within the 1.5 interquartile range of the upper and lower quartiles. FDR, false-discovery rate.

Open in new tab Download slide

3.4. Genome replication rates of E. coli strains are associated with clinical phenotypes but not with intestinal inflammation or antibiotic usage

To test how well E. coli strains grow in the IBD gut, we analysed the growth dynamics of the E. coli genomes obtained from 375 faecal and 16 aspirate metagenomes of patients with IBD and of healthy subjects in vivo using iRep [genome replication index^47]. This method is based on the fact that in replicating fast-growing bacteria, such as E. coli, the regions of the genome closer to the origin of replication are over-represented, and thus more metagenomic reads map to them. This analysis showed that patients with a pouch had the fastest-growing E. coli strains in vivo [Kruskal–Wallis, p <0.05; Figure 6A], followed by patients with CD, and those with UC and non-IBD subjects had the slowest-growing strains. It is worth noting that iRep was not correlated with the relative abundance of E. coli [Spearman r = -0.018, p = 0.79; Figure 6B]. A linear mixed effects model of iRep taking into account strain and patient-related parameters indicated that only the clinical phenotype of the subject was significantly associated with iRep [mixed effects model, p <0.001]. In contrast, the phylogenetic clade, relative abundance of E. coli, antibiotic usage [antibiotics in the past month], and faecal calprotectin [intestinal inflammation marker] failed to show statistically significant associations with iRep.

Figure 6.

Inferred growth rates of E. coli strains from patients with IBD and healthy subjects. [A] Inferred growth rates [using iRep [genome replication index]] of E. coli strains from patients with a pouch, CD, UC and healthy subjects. *p <0.05; ***p <0.001; Kruskal–Wallis test. [B] Spearman correlation between iRep and the relative abundance of E. coli in the microbiomes from corresponding metagenomic samples. [C] iRep of E. coli strains from subjects who had used antibiotics in the past month [Abx+] compared with subjects who were antibiotic-free for at least 1 month [Abx-]. [D] iRep of E. coli strains from patients with a pouch who were treated with ciprofloxacin, grouped according to the number of resistant mutations in drug target genes [gyrA and parC]. Zero mutations represent ciprofloxacin-sensitive strains. ‘Abx+ during treatment’ and ‘Abx-’ division define samples from patients taken during antibiotic treatment or in the absence of antibiotics for at least 1 month, respectively. In panels A, C, and D, box plot whiskers mark observations within the 1.5 interquartile range of the upper and lower quartiles. IBD, inflammatory bowel disease; UC, ulcerative colitis; CD, Crohn’s disease.

Open in new tab Download slide

Curiously, E. coli strains that grew in the presence of antibiotics had similar iRep to those from non-antibiotic treated patients [Mann–Whitney, p = 0.454; Figure 6C]. For a subset of patients with pouch [n = 233], for whom we had more comprehensive antibiotic usage data who were treated with ciprofloxacin, we could match resistance mutations with the respective iRep from the same sample. As the main mechanism of resistance again ciprofloxacin [a fluoroquinolone] is point mutations in drug target genes, gyrA and parC,²⁶ we extracted these genes and analysed them for mutations in previously known positions conferring resistance. Even when taking into account E. coli from samples taken during active antibiotic treatment, highly resistant strains with three mutations had a similar iRep to sensitive strains [no mutations] [Figure 6D], suggesting that the resistant strains grow very well within hosts in the presence of ciprofloxacin, with rates comparable to sensitive strains growing in the absence of antibiotics.

3.5. Functional gene repertoire of IBD-associated E. coli strains

Our next goal was to determine if there is an association between specific functional genes of E. coli and patients with IBD or healthy subjects. As we have shown earlier, many functions [including metabolism-, virulence-, and resistance-related] exhibit strong clade-dependent distribution patterns. To overcome such biases, we analysed the E. coli genomes while phylogenetically controlling for their relatedness to one another [population structure] with PySEER [see Methods], and focused on the metabolic functions and virulence factors associated with specific IBD phenotypes or with healthy controls [Supplementary Tables S7 to S10]. A gene family involved in sucrose and inulin degradation [CAZy family GH32] was enriched in patients with a pouch, with a prevalence of 82.5% compared with 55.5% to 66.9% in the other phenotypes [PySEER analysis, p <0.05; Supplementary Table S7]. Furthermore, patients with a pouch also harboured E. coli strains enriched [23.5%] in alpha-galactosidase [GH36; involved in hydrolysis of alpha-galactosidic residues], which was very low in UC [8%] and CD [5.6%] and absent in healthy subjects. In contrast, patients with a pouch were depleted [1.87%] in endo-alpha-sialidase [GH58; involved in utilisation of sialylated mucins from human glycoproteins or animal-derived food], which was significantly more abundant in healthy subjects [15.5%], and patients with CD [8.06%] and UC [22.2%] [PySEER analysis, p <0.05; Supplementary Table S7]. Finally, we did not find significant differences in the prevalence of propanediol dehydratase [pduC] between patients with CD and healthy subjects [Supplementary Table S8], in agreement with previous reports.^56,57.

In terms of virulence factors, the colibactin gene cluster was significantly more abundant in patients with UC [33.8%] and healthy subjects [26.1%] than in patients with CD [10.3%] and those with a pouch [6.5%] [PySEER analysis, p <0.05; Supplementary Table S9]. Surprisingly, healthy subjects were significantly enriched in senB and cnf1, encoding for enterotoxic and cytotoxic proteins, respectively. Last, iron transport [iro] genes showed significantly lower presence in healthy subjects compared with all IBD phenotypes. In terms of antibiotic resistance genes, the gene tetM, conferring resistance against tetracyclines, was almost solely found in 32.5% of the genomes in patients with a pouch [PySEER analysis, p <0.05; Supplementary Table S10]. In addition, the TEM-1 and OXA-1 beta-lactamases, conferring resistance against beta-lactam antibiotics, were significantly more prevalent in E. coli strains from patients with a pouch as well [PySEER analysis, p <0.05; Supplementary Table S10].

3.6. Polymorphism analysis of functional genes reveals IBD-specific adaptations

To complement the analysis of the differential prevalence of functional genes in patients with IBD and healthy subjects, we tested whether there are IBD-specific point mutations that are potentially adaptive for E. coli. We mapped the metagenome reads composing the MAGs from IBD phenotypes against a reference isolate genome from the same clade from a healthy phenotype, richly annotated by the NCBI prokaryotic genome pipeline to obtain all potential polymorphic positions in our E. coli strains. If a polymorphism is beneficial for the bacterium when colonising the IBD gut, we can expect it to be present in multiple IBD-associated strains and nearly absent in strains that colonise individuals without IBD. We therefore defined IBD-associated polymorphisms as those within a clade that are present in over 30% of IBD-derived genomes and absent from >80% non-IBD strains [see Methods]. We identified several polymorphisms that are predicted to change the amino acid sequence [henceforth non-synonymous mutations] in genes with annotated functions of E. coli strains from clade-III [Table 1; Supplementary Table S11].

Table 1.

Open in new tab

Polymorphisms in genes encoding bacterial cell envelope and secretion components in IBD-derived genomes.

Protein function	Variant pos. [amino-acid]	Mutation type	[B]uried/ [E]xposed residue	Mutation prevalence
Capsular polysaccharide biosynthesis protein KpsC	148, 233	Non-synonymous	E, E	0.43, 0.48
Colanic acid biosynthesis glycosyltransferase WcaC	303, 357	Non-synonymous	E, E	0.37, 0.56
OmpA family protein	202, 237, 242, 257	Non-synonymous,	E, E, E, B,	0.38, 0.43, 0.41, 0.31,
Type VI secretion system ATPase TssH	261, 787, 797	Non-synonymous	B, E, E	0.44, 0.39, 0.35
Type VI secretion system baseplate subunit TssK	296	Non-synonymous	E	0.36

Protein function	Variant pos. [amino-acid]	Mutation type	[B]uried/ [E]xposed residue	Mutation prevalence
Capsular polysaccharide biosynthesis protein KpsC	148, 233	Non-synonymous	E, E	0.43, 0.48
Colanic acid biosynthesis glycosyltransferase WcaC	303, 357	Non-synonymous	E, E	0.37, 0.56
OmpA family protein	202, 237, 242, 257	Non-synonymous,	E, E, E, B,	0.38, 0.43, 0.41, 0.31,
Type VI secretion system ATPase TssH	261, 787, 797	Non-synonymous	B, E, E	0.44, 0.39, 0.35
Type VI secretion system baseplate subunit TssK	296	Non-synonymous	E	0.36

Prevalence of selected mutation variants in genes encoding for bacterial cell surface- or secretion-associated proteins discussed in the text. For the full list of mutation variants, see Supplementary Table S11.

IBD, inflammatory bowel disease.

Table 1.

Open in new tab

Polymorphisms in genes encoding bacterial cell envelope and secretion components in IBD-derived genomes.

Protein function	Variant pos. [amino-acid]	Mutation type	[B]uried/ [E]xposed residue	Mutation prevalence
Capsular polysaccharide biosynthesis protein KpsC	148, 233	Non-synonymous	E, E	0.43, 0.48
Colanic acid biosynthesis glycosyltransferase WcaC	303, 357	Non-synonymous	E, E	0.37, 0.56
OmpA family protein	202, 237, 242, 257	Non-synonymous,	E, E, E, B,	0.38, 0.43, 0.41, 0.31,
Type VI secretion system ATPase TssH	261, 787, 797	Non-synonymous	B, E, E	0.44, 0.39, 0.35
Type VI secretion system baseplate subunit TssK	296	Non-synonymous	E	0.36

Protein function	Variant pos. [amino-acid]	Mutation type	[B]uried/ [E]xposed residue	Mutation prevalence
Capsular polysaccharide biosynthesis protein KpsC	148, 233	Non-synonymous	E, E	0.43, 0.48
Colanic acid biosynthesis glycosyltransferase WcaC	303, 357	Non-synonymous	E, E	0.37, 0.56
OmpA family protein	202, 237, 242, 257	Non-synonymous,	E, E, E, B,	0.38, 0.43, 0.41, 0.31,
Type VI secretion system ATPase TssH	261, 787, 797	Non-synonymous	B, E, E	0.44, 0.39, 0.35
Type VI secretion system baseplate subunit TssK	296	Non-synonymous	E	0.36

Prevalence of selected mutation variants in genes encoding for bacterial cell surface- or secretion-associated proteins discussed in the text. For the full list of mutation variants, see Supplementary Table S11.

IBD, inflammatory bowel disease.

Remarkably, some genes showed multiple independent non-synonymous mutations in the same operon in different genomes [Table 1; Supplementary Table S11]. Such convergent mutational patterns were especially notable in genes encoding bacterial cell surface- or secretion-associated components, such as colanic acid biosynthesis proteins [wcaC], outer membrane proteins OmpA, and the type VI secretion system baseplate subunit TssK. Such proteins may have direct proinflammatory roles in the gut, and some of them are known to be important antigens^58–60 and thus may contribute to clearance of the bacteria. Interestingly, the gene encoding the lipoprotein metalloprotease SslE, which contributes to activation of macrophages via toll-like receptor 2,⁶¹ showed several different mutations in various strains and was also significantly more abundant among isolate genomes from non-IBD strains compared with IBD strains [10/23 vs 4/36, respectively; p <0.05; Fisher’s exact test]. To confirm that mutations could alter the antigenicity of the mutated proteins, we mapped the mutations in outer-membrane proteins of E. coli [likely to interact with host immune system] onto the three-dimensional structure of the respective proteins, using ConSurf.⁶² This analysis revealed that the polysaccharide export protein Wza, the OmpA family protein YiaD, and the ferrichrome porin FhuA indeed showed mutations in surface-exposed residues.

Last, several of those mutated genes showed polymorphisms in multiple other clades [Supplementary Table S12], although the mutated positions were different. One may speculate that this may represent a convergent evolution process potentially toward lower antigenicity, which could be beneficial to the bacteria in the intestines of patients with IBD; however this requires experimental validation.

4. Discussion

Comparative genomic studies of E. coli in IBD have focused mostly on AIEC strains isolated from mucosal samples of patients with CD, and were unable to define genomic markers associated with this phenotype.^19,21 Analyses of commensal E. coli from faecal and aspirate samples of patients with IBD could not identify IBD-specific strains, genes, or alleles.^22–24 Here we characterised large and diverse collections of E. coli genomes from different IBD phenotypes to obtain a broader understanding of this species in the context of IBD pathogenesis.

We could classify these diverse E. coli strains into five distinct clades. Although no clade was unique to IBD, clade-III was more prevalent in UC [49%] and healthy controls [41%] than in patients with CD [31.5%] or pouchitis [17.5%]. This finding corroborates reports observing a comparable prevalence of E. coli isolates of 47% to 55% from the B2 phylogroup in biopsy tissues of patients with UC.^63,64 Notably, we did not detect associations between the relative abundance or the clade affiliation of E. coli and intestinal inflammation, in contrast to Mirsepasi-Lauridsen et al.⁶⁵ who found that patients with active UC colonised with E. coli B2 had significantly increased levels of faecal calprotectin. However, clade-III strains did have a higher presence of intestine-relevant virulence factors, which implies that some of them can potentially contribute to the severity of IBD, especially UC.

The genomic metabolic capacities varied across the clades and IBD phenotypes. Of particular interest were the sialidase genes, that encode enzymes that cleave sialic acid from host sialylated mucins, and were exclusively present in clade-III. Strains from this clade may therefore be better adapted for using host-derived mucin glycans. Sialylated glycans increase from the ileum to the colon along the gut.⁶⁶ As patients after pouch surgery are missing a colon, and the pouch is made from ileal tissue, this might explain the observation of fewer E. coli strains capable of using sialylated mucins in these patients. Indeed, those patients instead had higher abundance of strains that can use food-derived sugars, such as sucrose or inulin [poly-fructose]. Inulin supplementation was shown to reduce endoscopic and histological inflammation of the mucosa, coupled with increased butyrate concentration, in patients with a pouch⁶⁷; thus having inulin-degrading E. coli strains may counter such approaches in which inulin is intended to stimulate butyrate-producing bacteria.

The pdu operon [propanediol] is part of a pathway involved in the metabolism of fucose. It was found that utilisation of 1,2-propanediol by AIEC derived from patients with CD is critical for promoting T cell-driven intestinal inflammation in a mouse colitis model⁶⁸ and is correlated with increased cellular invasion and persistence in vitro.⁵⁶ The ability to metabolise fucose together with sialic acid [two major components of intestinal mucus] is linked with the expansion of enteropathogenic species like Salmonella typhimurium and Clostridioides difficile, which exploit increased mucosal carbohydrate availability following disruption of commensal microbiota by antibiotics.⁶⁹ We observed comparable prevalence of pduC in all IBD phenotypes and in healthy subjects.

Here we go beyond preliminary reports of higher within-host growth rates of E. coli in CD⁴⁸ into a comprehensive analysis of the major three IBD phenotypes. E. coli often attains higher abundance in patients with IBD due to the fact that higher levels of nitrate, which accumulate in the inflamed gut, allow it to perform anaerobic respiration rather than fermentation and consequently grow faster than other bacteria.⁷⁰ Enigmatically, we show that patients with a pouch harbour E. coli with the highest growth rates, even in the presence of antibiotics. Given that patients with a pouch also tend to have the highest stool frequency, it is possible that E. coli growth rates reflect intestinal emptying: very short transit times mean that luminal bacteria do not have time to use up all available nutrients and at the same time must divide fast enough or be diluted out. Conversely, we did not observe an association between the iRep and the calprotectin levels of the patients.

The Clade-III strains encoded the highest number of putative virulence factors, including ɑ-haemolysin, serine-protease autotransporters, the yersiniabactin iron uptake system, and the island for biosynthesis of the genotoxic molecule colibactin, confirming previous reports on B2 phylogroup strains.^12,20 Recently, mutational signatures that are caused by colibactin exposure were identified in human organoids and matched the mutational signature of human colorectal cancer tumours, implying that colibactin exposure increases colorectal cancer risk.⁷¹ Conversely, the pks genomic island is also required for the anti-colitis properties of the probiotic E. coli strain Nissle 1917⁷² that has shown clinical benefit in maintaining remission in patients with UC.⁷³ Using publicly available metagenomic data, we previously showed that the prevalence of E. coli colibactin genes is similar between IBD and non-IBD controls, but the relative abundance of colibactin pks island is 3-fold higher in IBD.⁷⁴ Here we show that E. coli from patients with UC are twice more likely to encode the pks island than strains from patients with CD or pouch. Combined with the direct effect of years of chronic inflammation of the colon that characterises UC, these patients may thus be further exposed to a higher risk of developing colorectal cancer, and should undergo surveillance colonoscopy even during years of remission.

When non-synonymous mutations occur independently in the same gene in multiple strains, this suggests that the gene in question is involved in host adaptation.⁷⁵ Here, we have identified multiple IBD-specific non-synonymous mutations in E. coli genes related to bacterial cell envelope and secretion components. This could reflect adaptation to the overactive host immune response² that enables such E. coli strains to expand in the gut of patients with IBD.^5,8,9 One may speculate that a more intimate contact of E. coli with host tissue and host immunity can select for bacteria that are less ‘visible’ to the immune system, and thus more rarely attacked by neutrophils or cleared by antibodies; however, such a hypothesis requires support by further experiments. Moreover, some mutations could actually make IBD-associated strains less proinflammatory to the host.

Another example is the numerous mutations we detected in the genes encoding the metalloprotease SslE, which were also rarer in IBD strains. SslE is generally involved in E. coli degradation of mucin substrates and colonisation of the intestinal and urinary tract,⁷⁶ in addition to stimulating the production of reactive nitrogen and oxygen species and proinflammatory chemokines in mouse macrophages.⁶¹ Thus, a change in SslE function could reduce the pathogenicity of strains that encode it to their hosts. One may speculate that some IBD-associated mutations, especially in proteins that may come into contact with the immune system, such as outer membrane proteins, could also result in less inflammatory variants of these bacterial proteins.

Our study has several limitations. First, genomic data can provide the genotypic potential of E. coli strains, but IBD adaptation can also take the form of changes in gene expression or post-transcriptional regulation that cannot be inferred directly from sequence. Second, we had substantially fewer genomes from UC and from healthy subjects than from CD and patients with a pouch. This is to a large extent because we relied on faecal metagenomes with relatively high E. coli abundance that enabled our analyses. It is possible that E. coli strains that are a small minority in the gut will show different genomic signatures, but to obtain MAGs from such strains will require deeper sequencing of metagenomes than is currently practised.

Our results may have important clinical implications. E. coli is often considered as an opportunistic pathogen in IBD, but direct mechanistic links between its potential virulence and inflammation in vivo [in patients with IBD] are still missing. The current study presents E. coli not just as a species that is generally well-adapted to the IBD gut, but also one that shows some strain-specific adaptations. This could explain the lower proinflammatory potential previously observed in E. coli strains from patients with pouchitis.²⁶ Furthermore, although E. coli prospers in the IBD gut, perhaps partly because it is a fast-grower, its abundance is not associated with inflammation in our dataset. Eradicating E. coli is therefore unlikely to generally benefit patients with IBD, unless the strain in question has pro-carcinogenic potential, like colibactin-producing E. coli.

Funding

This work was supported by a generous grant from the Leona M. and Harry B. Helmsley Charitable Trust [grant number #2018PG-CD007]. VD was partially supported by a fellowship from the Edmond J. Safra Center for Bioinformatics at Tel-Aviv University. UG was also supported by the Israeli Ministry of Science and Technology and the Israeli Ministry of Health.

Conflict of Interest

ID: consultation/advisory boards for Pfizer, Janssen, Abbvie, Takeda, Roche/Genentech, Celltrion, Celgene, Medtronic/Given Imaging, Rafa Laboratories, Neopharm, Sublimity, Arena, Gilead; speaking/teaching: Pfizer, Janssen, Abbvie, Takeda, Roche/Genentech, Celltrion, Celgene, Falk Pharma, Ferring. Grant support: Pfizer, Altman Research. The remaining authors declare no competing interests.

Acknowledgements

The authors wish to thank the members of the RMC IBD Centre and the Comprehensive Pouch Clinic for their help and stimulating discussion of the project. The authors wish to thank Stefan Green of the University of Illinois at Chicago, for his continued expert help in the metagenomic sequencing.

Author Contributions

VD, UG, and ID conceived and designed the study; VD developed the bioinformatic analysis and the comparative genomic pipelines and analysed the metagenomic data; LR analysed the metagenomic data; KR and ID collected, analysed and provided the clinical data; NW and ID enrolled and examined the patients; VD, UG, and ID wrote the paper. All authors read, discussed, and approved the final manuscript.

References

1.

Tulchinsky

H

,

Dotan

I

,

Alper

A

, et al. .

Comprehensive pouch clinic concept for follow-up of patients after ileal pouch anal anastomosis: report of 3 yearsʼ experience in a tertiary referral center

.

Inflamm Bowel Dis

2008

;

14

:

1125

–

32

.

2.

Sartor

RB

,

Wu

GD.

Roles for intestinal bacteria, viruses, and fungi in pathogenesis of inflammatory bowel diseases and therapeutic approaches.

Gastroenterology

2017

;

152

:

327

–

39.e4

.

3.

Liu

JZ

,

van Sommeren

S

,

Huang

H

, et al. .;

International Multiple Sclerosis Genetics Consortium; International IBD Genetics Consortium

.

Association analyses identify 38 susceptibility loci for inflammatory bowel disease and highlight shared genetic risk across populations.

Nat Genet

2015

;

47

:

979

–

86

.

4.

Ananthakrishnan

AN

,

Bernstein

CN

,

Iliopoulos

D

, et al. .

Environmental triggers in IBD: a review of progress and evidence.

Nat Rev Gastroenterol Hepatol

2018

;

15

:

39

–

49

.

5.

Gevers

D

,

Kugathasan

S

,

Denson

LA

, et al. .

The treatment-naive microbiome in new-onset Crohn’s disease.

Cell Host Microbe

2014

;

15

:

382

–

92

.

6.

Reshef

L

,

Kovacs

A

,

Ofer

A

, et al. .

Pouch inflammation is associated with a decrease in specific bacterial taxa.

Gastroenterology

2015

;

149

:

718

–

27

.

7.

Lloyd-Price

J

,

Arze

C

,

Ananthakrishnan

AN

, et al. .

Multi-omics of the gut microbial ecosystem in inflammatory bowel diseases

.

Nature

2019

;

569

:

655

–

62

.

8.

Dubinsky

V

,

Reshef

L

,

Rabinowitz

K

, et al. .

Dysbiosis in metabolic genes of the gut microbiomes of patients with an ileo-anal pouch resembles that observed in Crohn’s disease

.

mSystems

2021

;

6

:

e00984

–

20

.

9.

Darfeuille-Michaud

A

,

Boudeau

J

,

Bulois

P

, et al. .

High prevalence of adherent-invasive Escherichia coli associated with ileal mucosa in Crohn’s disease.

Gastroenterology

2004

;

127

:

412

–

21

.

10.

Tenaillon

O

,

Skurnik

D

,

Picard

B

,

Denamur

E.

The population genetics of commensal Escherichia coli.

Nat Rev Microbiol

2010

;

8

:

207

–

17

.

11.

Horesh

G

,

Blackwell

GA

,

Tonkin-Hill

G

, et al. .

A comprehensive and high-quality collection of Escherichia coli genomes and their genes

.

Microb Genomics

2021

. Doi: 10.1099/mgen.0.000499.

Google Scholar

OpenURL Placeholder Text

WorldCat

12.

Denamur

E

,

Clermont

O

,

Bonacorsi

S

,

Gordon

D.

The population genetics of pathogenic Escherichia coli.

Nat Rev Microbiol

2021

;

19

:

37

–

54

.

13.

Kruis

W

,

Fric

P

,

Pokrotnieks

J

, et al. .

Maintaining remission of ulcerative colitis with the probiotic Escherichia coli Nissle 1917 is as effective as with standard mesalazine.

Gut

2004

;

53

:

1617

–

23

.

14.

Darfeuille-Michaud

A

,

Neut

C

,

Barnich

N

, et al. .

Presence of adherent Escherichia coli strains in ileal mucosa of patients with Crohn’s disease.

Gastroenterology

1998

;

115

:

1405

–

13

.

15.

Palmela

C

,

Chevarin

C

,

Xu

Z

, et al. .

Adherent-invasive Escherichia coli in inflammatory bowel disease.

Gut

2018

;

67

:

574

–

87

.

16.

Martinez-Medina

M

,

Aldeguer

X

,

Lopez-Siles

M

, et al. .

Molecular diversity of Escherichia coli in the human gut: new ecological evidence supporting the role of adherent-invasive E. coli [AIEC] in Crohnʼs disease

.

Inflamm Bowel Dis

2009

;

15

:

872

–

82

.

17.

Elliott

TR

,

Hudspith

BN

,

Wu

G

, et al. .

Quantification and characterization of mucosa-associated and intracellular Escherichia coli in inflammatory bowel disease

.

Inflamm Bowel Dis

2013

;

19

:

2326

–

38

.

18.

Conte

MP

,

Longhi

C

,

Marazzato

M

, et al. .

Adherent-invasive Escherichia coli [AIEC] in pediatric Crohn’s disease patients: phenotypic and genetic pathogenic features.

BMC Res Notes

2014

;

7

:

748

.

19.

Desilets

M

,

Deng

X

,

Rao

C

, et al. .

Genome-based definition of an inflammatory bowel disease-associated adherent-invasive Escherichia coli Pathovar

.

Inflamm Bowel Dis

2016

;

22

:

1

–

12

.

20.

Ostblom

A

,

Adlerberth

I

,

Wold

AE

,

Nowrouzian

FL.

Pathogenicity island markers, virulence determinants malX and usp, and the capacity of Escherichia coli to persist in infants’ commensal microbiotas.

Appl Environ Microbiol

2011

;

77

:

2303

–

8

.

21.

O’Brien

CL

,

Bringer

MA

,

Holt

KE

, et al. .

Comparative genomics of Crohn’s disease-associated adherent-invasive Escherichia coli

.

Gut

2017

;

66

:

1382

–

9

.

22.

Vejborg

RM

,

Hancock

V

,

Petersen

AM

,

Krogfelt

KA

,

Klemm

P.

Comparative genomics of Escherichia coli isolated from patients with inflammatory bowel disease.

BMC Genomics

2011

;

12

:

316

.

23.

Rakitina

DV

,

Manolov

AI

,

Kanygina

AV

, et al. .

Genome analysis of E. coli isolated from Crohn’s disease patients.

BMC Genomics

2017

;

18

:

544

.

24.

Tyakht

AV

,

Manolov

AI

,

Kanygina

AV

, et al. .

Genetic diversity of Escherichia coli in gut microbiota of patients with Crohn’s disease discovered using metagenomic and genomic analyses.

BMC Genomics

2018

;

19

:

968

.

25.

Wattam

AR

,

Davis

JJ

,

Assaf

R

, et al. .

Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center.

Nucleic Acids Res

2017

;

45

:

D535

–

42

.

26.

Dubinsky

V

,

Reshef

L

,

Bar

N

, et al. .

Predominantly antibiotic-resistant intestinal microbiome persists in patients with pouchitis who respond to antibiotic therapy.

Gastroenterology

2020

;

158

:

610

–

24.e13

.

27.

Bolger

AM

,

Lohse

M

,

Usadel

B.

Trimmomatic: a flexible trimmer for Illumina sequence data.

Bioinformatics

2014

;

30

:

2114

–

20

.

28.

Langmead

B

,

Salzberg

SL.

Fast gapped-read alignment with Bowtie 2.

Nat Methods

2012

;

9

:

357

–

9

.

29.

Nayfach

S

,

Rodriguez-Mueller

B

,

Garud

N

,

Pollard

KS.

An integrated metagenomics pipeline for strain profiling reveals novel patterns of bacterial transmission and biogeography.

Genome Res

2016

;

26

:

1612

–

25

.

30.

Wick

RR

,

Judd

LM

,

Gorrie

CL

,

Holt

KE.

Unicycler: resolving bacterial genome assemblies from short and long sequencing reads.

PLoS Comput Biol

2017

;

13

:

e1005595

.

31.

Parks

DH

,

Imelfort

M

,

Skennerton

CT

,

Hugenholtz

P

,

Tyson

GW.

CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes.

Genome Res

2015

;

25

:

1043

–

55

.

32.

Nurk

S

,

Meleshko

D

,

Korobeynikov

A

,

Pevzner

PA.

metaSPAdes: a new versatile metagenomic assembler.

Genome Res

2017

;

27

:

824

–

34

.

33.

Kang

DD

,

Li

F

,

Kirton

E

, et al. .

MetaBAT 2: an adaptive binning algorithm for robust and efficient genome reconstruction from metagenome assemblies.

PeerJ

2019

;

7

:

e7359

.

34.

Bowers

RM

,

Kyrpides

NC

,

Stepanauskas

R

, et al. .;

Genome Standards Consortium

.

Minimum information about a single amplified genome [MISAG] and a metagenome-assembled genome [MIMAG] of bacteria and archaea.

Nat Biotechnol

2017

;

35

:

725

–

31

.

35.

Asnicar

F

,

Thomas

AM

,

Beghini

F

, et al. .

Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0.

Nat Commun

2020

;

11

:

2500

.

36.

Jain

C

,

Rodriguez-R

LM

,

Phillippy

AM

,

Konstantinidis

KT

,

Aluru

S.

High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries.

Nat Commun

2018

;

9

:

5114

.

37.

Beghain

J

,

Bridier-Nahmias

A

,

Le Nagard

H

,

Denamur

E

,

Clermont

O.

ClermonTyping: an easy-to-use and accurate in silico method for Escherichia genus strain phylotyping

.

Microb Genomics

2018

;

4

:

e000192

.

Google Scholar

Crossref

WorldCat

38.

Seemann

T.

Prokka: rapid prokaryotic genome annotation.

Bioinformatics

2014

;

30

:

2068

–

9

.

39.

Page

AJ

,

Cummins

CA

,

Hunt

M

, et al. .

Roary: rapid large-scale prokaryote pan genome analysis.

Bioinformatics

2015

;

31

:

3691

–

3

.

40.

Huerta-Cepas

J

,

Forslund

K

,

Coelho

LP

, et al. .

Fast genome-wide functional annotation through orthology assignment by eggNOG-Mapper.

Mol Biol Evol

2017

;

34

:

2115

–

22

.

41.

Buchfink

B

,

Xie

C

,

Huson

DH.

Fast and sensitive protein alignment using DIAMOND.

Nat Methods

2015

;

12

:

59

–

60

.

42.

Aramaki

T

,

Blanc-Mathieu

R

,

Endo

H

, et al. .

KofamKOALA: KEGG Ortholog assignment based on profile HMM and adaptive score threshold.

Bioinformatics

2020

;

36

:

2251

–

2

.

43.

Lombard

V

,

Golaconda Ramulu

H

,

Drula

E

,

Coutinho

PM

,

Henrissat

B.

The carbohydrate-active enzymes database [CAZy] in 2013.

Nucleic Acids Res

2014

;

42

:

D490

–

5

.

44.

Chen

L

,

Zheng

D

,

Liu

B

,

Yang

J

,

Jin

Q.

VFDB 2016: hierarchical and refined dataset for big data analysis–10 years on.

Nucleic Acids Res

2016

;

44

:

D694

–

7

.

45.

Alcock

BP

,

Raphenya

AR

,

Lau

TT

, et al. .

CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database

.

Nucleic Acids Res

2019

. Doi: 10.1093/nar/gkz935.

Google Scholar

OpenURL Placeholder Text

WorldCat

46.

Lees

JA

,

Galardini

M

,

Bentley

SD

,

Weiser

JN

,

Corander

J.

pyseer: a comprehensive tool for microbial pangenome-wide association studies.

Bioinformatics

2018

;

34

:

4310

–

2

.

47.

Brown

CT

,

Olm

MR

,

Thomas

BC

,

Banfield

JF.

Measurement of bacterial replication rates in microbial communities.

Nat Biotechnol

2016

;

34

:

1256

–

63

.

48.

Korem

T

,

Zeevi

D

,

Suez

J

, et al. .

Growth dynamics of gut microbiota in health and disease inferred from single metagenomic samples.

Science

2015

;

349

:

1101

–

6

.

49.

Clermont

O

,

Christenson

JK

,

Denamur

E

,

Gordon

DM.

The Clermont Escherichia coli phylo-typing method revisited: improvement of specificity and detection of new phylo-groups: a new E. coli phylo-typing method

.

Environ Microbiol Rep

2013

;

5

:

58

–

65

.

50.

Tailford

LE

,

Crost

EH

,

Kavanaugh

D

,

Juge

N.

Mucin glycan foraging in the human gut microbiome.

Front Genet

2015

;

6

:

81

.

51.

Nougayrède

JP

,

Homburg

S

,

Taieb

F

, et al. .

Escherichia coli induces DNA double-strand breaks in eukaryotic cells.

Science

2006

;

313

:

848

–

51

.

52.

Ruiz-Perez

F

,

Nataro

JP.

Bacterial serine proteases secreted by the autotransporter pathway: classification, specificity, and role in virulence.

Cell Mol Life Sci

2014

;

71

:

745

–

70

.

53.

Mirsepasi-Lauridsen

HC

,

Du

Z

,

Struve

C

, et al. .

Secretion of alpha-hemolysin by escherichia coli disrupts tight junctions in ulcerative colitis patients

.

Clin Transl Gastroenterol

2016

;

7

:

e149

.

54.

Rossolini

GM

,

D’andrea

MM

,

Mugnaioli

C.

The spread of CTX-M-type extended-spectrum β-lactamases

.

Clin Microbiol Infect

2008

;

14

:

33

–

41

.

55.

Bunny

KL

,

Hall

RM

,

Stokes

HW.

New mobile gene cassettes containing an aminoglycoside resistance gene, aacA7, and a chloramphenicol resistance gene, catB3, in an integron in pBWH301.

Antimicrob Agents Chemother

1995

;

39

:

686

–

93

.

56.

Dogan

B

,

Suzuki

H

,

Herlekar

D

, et al. .

Inflammation-associated adherent-invasive Escherichia coli are enriched in pathways for use of propanediol and iron and M-cell translocation

.

Inflamm Bowel Dis

2014

;

20

:

1919

–

32

.

57.

Viladomiu

M

,

Kivolowitz

C

,

Abdulhamid

A

, et al. .

IgA-coated E. coli enriched in Crohn’s disease spondyloarthritis promote T _H 17-dependent inflammation

.

Sci Transl Med

2017

;

9

:

eaaf9655

.

58.

Chaban

B

,

Hughes

HV

,

Beeby

M.

The flagellum in bacterial pathogens: for motility and a whole lot more.

Semin Cell Dev Biol

2015

;

46

:

91

–

103

.

59.

Confer

AW

,

Ayalew

S.

The OmpA family of proteins: roles in bacterial pathogenesis and immunity.

Vet Microbiol

2013

;

163

:

207

–

22

.

60.

Navarro-Garcia

F

,

Ruiz-Perez

F

,

Cataldi

Á

,

Larzábal

M.

Type VI secretion system in pathogenic Escherichia coli: structure, role in virulence, and acquisition.

Front Microbiol

2019

;

10

:

1965

.

61.

Tapader

R

,

Bose

D

,

Dutta

P

,

Das

S

,

Pal

A.

SslE [YghJ], a cell-associated and secreted lipoprotein of neonatal septicemic Escherichia coli, induces toll-like receptor 2-dependent macrophage activation and proinflammation through NF-κB and MAP kinase signaling.

Infect Immun

2018

;

86

,

e00399

–

18

.

62.

Ashkenazy

H

,

Abadi

S

,

Martz

E

, et al. .

ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules.

Nucleic Acids Res

2016

;

44

:

W344

–

50

.

63.

Kotlowski

R

,

Bernstein

CN

,

Sepehri

S

,

Krause

DO.

High prevalence of Escherichia coli belonging to the B2+D phylogenetic group in inflammatory bowel disease.

Gut

2007

;

56

:

669

–

75

.

64.

Sepehri

S

,

Khafipour

E

,

Bernstein

CN

, et al. .

Characterization of Escherichia coli isolated from gut biopsies of newly diagnosed patients with inflammatory bowel disease

.

Inflamm Bowel Dis

2011

;

17

:

1451

–

63

.

65.

Mirsepasi-Lauridsen

HC

,

Halkjaer

SI

,

Mortensen

EM

, et al. .

Extraintestinal pathogenic Escherichia coli are associated with intestinal inflammation in patients with ulcerative colitis.

Sci Rep

2016

;

6

:

31152

.

66.

Robbe

C

,

Capon

C

,

Coddeville

B

,

Michalski

JC.

Structural diversity and specific distribution of O-glycans in normal human mucins along the intestinal tract.

Biochem J

2004

;

384

:

307

–

16

.

67.

Welters

CF

,

Heineman

E

,

Thunnissen

FB

,

van den Bogaard

AE

,

Soeters

PB

,

Baeten

CG.

Effect of dietary inulin supplementation on inflammation of pouch mucosa in patients with an ileal pouch-anal anastomosis.

Dis Colon Rectum

2002

;

45

:

621

–

7

.

68.

Viladomiu

M

,

Metz

ML

,

Lima

SF

, et al. .;

JRI Live Cell Bank

.

Adherent-invasive E. coli metabolism of propanediol in Crohn’s disease regulates phagocytes to drive intestinal inflammation.

Cell Host Microbe

2021

;

29

:

607

–

19.e8

.

69.

Ng

KM

,

Ferreyra

JA

,

Higginbottom

SK

, et al. .

Microbiota-liberated host sugars facilitate post-antibiotic expansion of enteric pathogens.

Nature

2013

;

502

:

96

–

9

.

70.

Winter

SE

,

Winter

MG

,

Xavier

MN

, et al. .

Host-derived nitrate boosts growth of E. coli in the inflamed gut.

Science

2013

;

339

:

708

–

11

.

71.

Pleguezuelos-Manzano

C

,

Puschhof

J

,

Rosendahl Huber

A

, et al. .;

Genomics England Research Consortium

.

Mutational signature in colorectal cancer caused by genotoxic pks+ E. coli.

Nature

2020

;

580

:

269

–

73

.

72.

Olier

M

,

Marcq

I

,

Salvador-Cartier

C

, et al. .

Genotoxicity of Escherichia coli Nissle 1917 strain cannot be dissociated from its probiotic activity.

Gut Microbes

2012

;

3

:

501

–

9

.

73.

Schultz

M.

Clinical use of E. coli Nissle 1917 in inflammatory bowel disease

.

Inflamm Bowel Dis

2008

;

14

:

1012

–

8

.

74.

Dubinsky

V

,

Dotan

I

,

Gophna

U.

Carriage of colibactin-producing bacteria and colorectal cancer risk.

Trends Microbiol

2020

;

28

:

874

–

6

.

75.

Zhao

S

,

Lieberman

TD

,

Poyet

M

, et al. .

Adaptive evolution within gut microbiomes of healthy people.

Cell Host Microbe

2019

;

25

:

656

–

67.e8

.

76.

Nesta

B

,

Spraggon

G

,

Alteri

C

, et al. .

FdeC, a novel broadly conserved Escherichia coli adhesin eliciting protection against urinary tract infections

.

mBio

2012

;

3

:

9

.

Google Scholar

Crossref

WorldCat

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)

Download all slides

Month:	Total Views:
May 2022	72
June 2022	34
July 2022	25
August 2022	10
September 2022	48
October 2022	30
November 2022	62
December 2022	17
January 2023	27
February 2023	38
March 2023	22
April 2023	28
May 2023	21
June 2023	20
July 2023	31
August 2023	44
September 2023	36
October 2023	31
November 2023	39
December 2023	46
January 2024	53
February 2024	57
March 2024	42
April 2024	44
May 2024	57
June 2024	47
July 2024	59
August 2024	64
September 2024	40
October 2024	79
November 2024	46
December 2024	15
January 2025	36
February 2025	49
March 2025	49
April 2025	52
May 2025	7

Article Contents

Escherichia coli Strains from Patients with Inflammatory Bowel Diseases have Disease-specific Genomic Adaptations

Abstract

1. Introduction

2. Materials and Methods

2.1. E. coli genomes from patients with IBD and healthy subjects and their metadata

2.2. Pipeline for the reconstruction of genomes from metagenomes

2.3. Whole-genome phylogeny and average nucleotide identity analysis

2.4. E. coli phylogroups typing

2.5. Core genes single nucleotide polymorphism analysis

2.6. Functional annotation and analysis of the genomes

2.7. Bacterial genome-wide association study analysis incorporating phylogenetic information

2.8. Genome replication rates of E. coli within subjects

2.9. Identification of mutation in ciprofloxacin antibiotic target genes

2.10. Mutational analysis of E. coli functional genes

2.11. Statistical analysis

2.12. Data availability [data sharing statement]

3. Results

3.1. E. coli strains from patients with IBD and healthy subjects belong to distinct lineages that are differentially distributed across phenotypes

3.2. E. coli clades present in patients with IBD may have particular functional potential

3.3. Antibiotic treatment in patients with IBD increases mobile antibiotic resistance genes in E. coli strains

3.4. Genome replication rates of E. coli strains are associated with clinical phenotypes but not with intestinal inflammation or antibiotic usage

3.5. Functional gene repertoire of IBD-associated E. coli strains

3.6. Polymorphism analysis of functional genes reveals IBD-specific adaptations

4. Discussion

Funding

Conflict of Interest

Acknowledgements

Author Contributions

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Escherichia coli Strains from Patients with Inflammatory Bowel Diseases have Disease-specific Genomic Adaptations

Abstract

1. Introduction

2. Materials and Methods

2.1. E. coli genomes from patients with IBD and healthy subjects and their metadata

2.2. Pipeline for the reconstruction of genomes from metagenomes

2.3. Whole-genome phylogeny and average nucleotide identity analysis

2.4. E. coli phylogroups typing

2.5. Core genes single nucleotide polymorphism analysis

2.6. Functional annotation and analysis of the genomes

2.7. Bacterial genome-wide association study analysis incorporating phylogenetic information

2.8. Genome replication rates of E. coli within subjects

2.9. Identification of mutation in ciprofloxacin antibiotic target genes

2.10. Mutational analysis of E. coli functional genes

2.11. Statistical analysis

2.12. Data availability [data sharing statement]

3. Results

3.1. E. coli strains from patients with IBD and healthy subjects belong to distinct lineages that are differentially distributed across phenotypes

3.2. E. coli clades present in patients with IBD may have particular functional potential

3.3. Antibiotic treatment in patients with IBD increases mobile antibiotic resistance genes in E. coli strains

3.4. Genome replication rates of E. coli strains are associated with clinical phenotypes but not with intestinal inflammation or antibiotic usage

3.5. Functional gene repertoire of IBD-associated E. coli strains

3.6. Polymorphism analysis of functional genes reveals IBD-specific adaptations

4. Discussion

Funding

Conflict of Interest

Acknowledgements

Author Contributions

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only