Abstract

Objectives

The objective of this study was to determine the population structure of Escherichia coli ST73 isolated from human bacteraemia and urinary tract infections.

Methods

The genomes of 22 E. coli ST73 isolates were sequenced using the Illumina HiSeq platform. High-resolution SNP typing was used to create a phylogenetic tree. Comparative genomic analysis was also performed using a pangenome approach. In silico and S1-PFGE plasmid profiling was conducted, and isolates were checked for their ability to survive exposure to human serum.

Results

E. coli ST73 isolates circulating in clinically unrelated episodes show a high degree of diversity at a whole-genome level, but exhibit conservation in gene content, particularly in virulence-associated gene carriage. The isolates also contain a highly diverse plasmid pool that confers MDR via carriage of CTX-M genes.

Conclusions

Our data show that a rise in incidence of MDR E. coli ST73 clinical isolates is not due to a circulating outbreak strain, as is the case for E. coli ST131. Rather, the circulating ST73 strains are distantly related and carry a diverse set of resistance plasmids. This suggests that the evolutionary events behind emergence of drug-resistant E. coli differ between lineages.

Introduction

Extra-intestinal pathogenic Escherichia coli (ExPEC) is the term used to describe strains of E. coli that can asymptomatically colonize the intestinal tract of humans and animals, but cause disease in non-intestinal sites.1 In humans, ExPEC most commonly cause urinary tract infections (UTIs), which are thought to affect as many as 70% of the global female population.1 ExPEC are also capable of causing bacteraemia, where large numbers of bacterial cells gain entry to the bloodstream, causing a potentially life-threatening infection. The incidence of bacteraemia caused by ExPEC has been increasing rapidly in the past 10 years, with ExPEC now the most common cause of bacteraemia in Europe, overtaking MRSA and Clostridium difficile bloodstream infections.2

The rise in cases of ExPEC bacteraemia is mirrored by a marked increase in the carriage of MDR plasmids in ExPEC. In particular, ExPEC are associated with the sustained carriage and dissemination of genes encoding ESBL, especially the CTX-M variant. In some countries as many as 50% of bacteraemia ExPEC isolates are ESBL-positive isolates.2 Numerous epidemiological studies have shown the E. coli ST131 clone to be the most commonly isolated MDR ExPEC strain type from human clinical cases.3,4 ST73 is another phylogroup B2 strain type that is also frequently isolated from human clinical cases.4 Unlike ST131, which has been extensively studied and characterized at population and genomic levels,5–7 very little is known about ST73 beyond the reference ExPEC strain CFT073.8

We recently conducted a molecular epidemiological survey of bacteraemia ExPEC isolates from the East Midlands area of the UK.9 Our study found that MDR ExPEC were significantly more abundant in bacteraemia samples than in clinical urine samples over a concomitant time frame. Perhaps more surprisingly, our study also showed that ST73, and not ST131 as observed in a previous study in the same region,4 had risen in prevalence to become the most commonly isolated MDR ExPEC strain type from bacteraemia samples. Given that the rapid increase in clinical cases of MDR E. coli ST131 is attributable to rapid global dissemination of a successful clone,6,7 we sought to determine whether the high incidence of MDR ST73 clinical isolates from our bacteraemia study was also due to the emergence of a successful dominant clone.

Methods

Bacterial isolates

An epidemiological investigation of bacteraemia and UTI E. coli isolates conducted by our group in 2013 identified an increase in the number of E. coli ST73 clinical isolates containing the CTX-M gene conferring MDR.9 Twenty-two isolates were selected for sequencing, incorporating 10 ESBL-positive blood isolates, 2 ESBL-negative blood isolates, 3 ESBL-positive UTI isolates and 7 ESBL-negative UTI isolates (Table 1). These were selected to represent the diversity in ESBL phenotype in the sample population.

Table 1.

List of isolates and genome assembly statistics used in this study

| Isolate | PCR ESBL type | Genome size (bp) | Number of contigs | N50 contig size | Percentage mapped reads | S1-PFGE plasmid profile | In silico Inc typing |
| --- | --- | --- | --- | --- | --- | --- | --- |
| B10 | CTX-M-15 | 5 173 276 | 106 | 108 731 | 94.5 | 112 kbp | FIB(AP001918), FII, Col156 |
| B14 | negative | 5 099 552 | 158 | 113 745 | 90.21 | negative | |
| B18 | CTX-M-15 | 5 120 683 | 125 | 122 417 | 91.93 | 33.5 kbp, 48.5 kbp | non-typeable |
| B29 | CTX-M-15 | 5 261 474 | 168 | 101 820 | 93.7 | 112 kbp | FIB(AP001918), FII, Col156 |
| B36 | CTX-M-15 | 5 191 523 | 152 | 125 321 | 92.26 | 145 kbp | FIB(pB171), FII, Col156 |
| B40 | CTX-M-15 | 5 257 611 | 165 | 103 459 | 91.43 | 140 kbp | FIA, FIB(AP001918) |
| B72 | CTX-M-15 | 5 158 804 | 110 | 134 654 | 84.53 | 33.5 kbp, 82 kbp | FII(pRSB107) |
| B73 | CTX-M-15 | 5 150 717 | 156 | 121 329 | 94.38 | 112 kbp | FIB(AP001918), FII, Col156 |
| B84 | CTX-M-15 | 5 182 704 | 137 | 134 972 | 93.42 | 112 kbp | FIB(AP001918), FII, Col156 |
| B91 | CTX-M-15 | 5 155 911 | 197 | 79 515 | 90.23 | 120 kbp | FIB(S), FII, Col156 |
| B102 | negative | 5 075 956 | 160 | 87 164 | 93.51 | negative | |
| B134 | OXA-1, CTX-M-15 | 5 230 535 | 154 | 116 039 | 93.61 | 82 kbp | FIB(AP001918), FII, FIA |
| U1 | negative | 5 243 352 | 151 | 123 112 | 86.52 | negative | |
| U7 | negative | 5 176 031 | 145 | 126 228 | 93.16 | negative | |
| U21 | negative | 5 145 668 | 162 | 113 459 | 91.81 | negative | |
| U24 | negative | 5 120 446 | 147 | 110 560 | 89.83 | negative | |
| U30 | negative | 5 287 542 | 160 | 139 416 | 87.12 | negative | |
| U36 | negative | 5 162 072 | 138 | 114 804 | 91.04 | negative | |
| U42 | CTX-M-15 | 5 188 710 | 155 | 106 920 | 93.92 | 112 kbp | FIB(AP001918), Col156, Col8282, Col(MG828) |
| U48 | negative | 5 080 928 | 112 | 113 440 | 87.44 | negative | |
| U50 | CTX-M-15 | 5 256 879 | 145 | 117 621 | 94.03 | 48.5 kbp | FII |
| U76 | CTX-M-15 | 5 179 037 | 140 | 133 761 | 94.11 | 112 kbp | FIB(AP001918), FII, Col156 |

Isolates with the prefix B were isolated from bacteraemia cases and isolates with the prefix U were isolated from UTI.

Percentage mapped reads equates to reads mapped against the CFT073 genome.

N50 is a weighted median statistic such that 50% of the entire assembly is contained in contigs or scaffolds equal to or larger than this value.
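The N50 definition above can be illustrated with a minimal Python sketch. The function name `n50` and the toy contig lengths are illustrative assumptions; they are not taken from the assemblies in Table 1 or from the assembly software used in the study.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly, given its contig lengths in bp."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(contig_lengths) / 2
    running = 0
    # Walk from the longest contig downwards until at least half of the
    # total assembly length has been accumulated.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not from the study).
toy_assembly = [100, 80, 60, 40, 20]
print(n50(toy_assembly))
```

For the toy assembly the total length is 300 bp, and the two longest contigs (100 and 80 bp) together cover at least half of it, so the N50 is the length of the second contig.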

Genome sequencing and analysis

Isolates were sequenced on the Illumina HiSeq2500 platform using 2 × 250 bp paired-end sequencing (Table 1). Genome assemblies were performed using Velvet and PAGIT,10 which reordered contigs based on the CFT073 reference genome.8 Assembled genomes were annotated using Prokka.11 ProgressiveMauve was used to create a whole-genome alignment of the assembled genomes.12 High-resolution SNP typing was performed by mapping FASTQ files against the reference ST73 genome CFT073 using SMALT (https://www.sanger.ac.uk/resources/software/smalt/#t_2) and Samtools. Resulting VCF files were filtered using vcftools13 to retain only SNPs meeting thresholds of MinQ 30, MinDP 10 and MinAF 0.8. The filtered VCF files were used to produce a consensus sequence for each strain relative to CFT073. The consensus sequences were aligned using Mugsy,14 and a maximum likelihood phylogeny was created from this alignment using RAxML implementing the GTR-Gamma model.15 All raw sequence data have been deposited in the European Nucleotide Archive under project accession number PRJEB9931.
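The SNP filtering step can be sketched in plain Python as follows. This is not the vcftools command used in the study; it is a hedged illustration of the three thresholds. It assumes that depth and alternate-allele fraction are derived from a `DP4` INFO tag (reference forward, reference reverse, alternate forward, alternate reverse read counts), which may differ from how the original pipeline computed them. The function names and the example records are hypothetical.

```python
def parse_info(info_field):
    """Turn an INFO string such as 'DP=30;DP4=1,1,14,14' into a dict."""
    parsed = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def keep_snp(vcf_line, min_qual=30.0, min_depth=10, min_alt_fraction=0.8):
    """Return True if a VCF data line passes the three SNP thresholds.

    Assumption: depth and allele fraction are computed from the DP4 tag.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt, qual, info_field = fields[3], fields[4], fields[5], fields[7]
    # Keep only simple single-base substitutions.
    if ref not in "ACGT" or alt not in "ACGT" or len(ref) != 1 or len(alt) != 1:
        return False
    if qual == "." or float(qual) < min_qual:
        return False
    info = parse_info(info_field)
    if "DP4" not in info:
        return False
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    depth = ref_fwd + ref_rev + alt_fwd + alt_rev
    if depth < min_depth:
        return False
    return (alt_fwd + alt_rev) / depth >= min_alt_fraction


# Hypothetical records: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
records = [
    "CFT073\t1000\t.\tA\tG\t50\t.\tDP=30;DP4=1,1,14,14",
    "CFT073\t2000\t.\tC\tT\t20\t.\tDP=30;DP4=0,0,15,15",
    "CFT073\t3000\t.\tG\tA\t60\t.\tDP=32;DP4=8,8,8,8",
]
print([keep_snp(r) for r in records])
```

Only the first record passes: the second fails the quality threshold and the third has an alternate-allele fraction of 0.5.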

Pangenome analysis

A pangenome of the 22 sequenced strains and CFT073 was made using Gegenees.16 To determine whether there were loci associated with bacteraemia in ST73, the genetic content of bacteraemia isolates was compared against UTI isolates using a cut-off of 80% identity across 80% of bacteraemia strains and 80% identity across 20% of UTI strains. An identical analysis was conducted for ESBL positive against ESBL negative to attempt to identify loci associated with ESBL carriage. Presence of virulence-associated genes17 was determined by BlastN analysis of gene sequences against the de novo-assembled genome of each strain.
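The group comparison can be sketched as a simple filter over a table of per-locus identity scores. The sketch below is not Gegenees itself. It assumes one reading of the cut-offs, namely that a locus is flagged when it reaches 80% identity in at least 80% of the target group and in no more than 20% of the comparison group. The function name, the data layout and the toy scores are all hypothetical.

```python
def group_associated_loci(identity, target, background,
                          min_identity=80.0,
                          min_target_fraction=0.8,
                          max_background_fraction=0.2):
    """Return loci over-represented in `target` relative to `background`.

    identity: {locus: {strain: percent identity of the best match}}
    target, background: lists of strain names.
    """
    def fraction_present(scores, strains):
        present = sum(1 for s in strains if scores.get(s, 0.0) >= min_identity)
        return present / len(strains)

    flagged = []
    for locus, scores in identity.items():
        in_target = fraction_present(scores, target)
        in_background = fraction_present(scores, background)
        if in_target >= min_target_fraction and in_background <= max_background_fraction:
            flagged.append(locus)
    return flagged


# Toy identity scores (hypothetical strains and loci).
blood = ["B1", "B2", "B3"]
urine = ["U1", "U2", "U3"]
toy_identity = {
    "locus_shared": {"B1": 99, "B2": 98, "B3": 97, "U1": 99, "U2": 96, "U3": 95},
    "locus_blood_only": {"B1": 99, "B2": 92, "B3": 88, "U1": 0, "U2": 12, "U3": 0},
}
print(group_associated_loci(toy_identity, blood, urine))
```

Swapping the `target` and `background` arguments gives the reverse comparison, and the same function applies to ESBL-positive against ESBL-negative strain lists.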

Plasmid typing

In silico plasmid typing was performed using a locally installed version of the PlasmidFinder database.18 Assembled genomes were compared with the database using BlastN to identify plasmid types present in each genome. Plasmid profiling was also performed using the S1-PFGE method.19
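A BlastN comparison of this kind is commonly summarized from tabular output (BLAST `-outfmt 6`, whose first four columns are query ID, subject ID, percent identity and alignment length). The sketch below assumes the replicon sequences were used as queries against each assembly. The identity and coverage thresholds are illustrative values only and are not stated in the study; the function name, replicon lengths and hit lines are hypothetical.

```python
def replicon_hits(blast_lines, replicon_lengths,
                  min_identity=95.0, min_coverage=0.6):
    """Return the sorted replicon names with a sufficiently good hit.

    blast_lines: lines of BLAST tabular output (outfmt 6), replicon as query.
    replicon_lengths: {replicon name: length in bp}, needed because the
    default tabular format does not report query length.
    """
    hits = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        replicon = cols[0]
        percent_identity = float(cols[2])
        alignment_length = int(cols[3])
        coverage = alignment_length / replicon_lengths[replicon]
        if percent_identity >= min_identity and coverage >= min_coverage:
            hits.add(replicon)
    return sorted(hits)


# Hypothetical replicon lengths and BLAST hits for one assembly.
lengths = {"IncFII": 261, "IncFIA": 388, "Col156": 154}
toy_output = [
    "IncFII\tcontig_12\t98.5\t250\t3\t0\t1\t250\t500\t749\t1e-100\t450",
    "IncFIA\tcontig_3\t90.0\t380\t38\t0\t1\t380\t10\t389\t1e-90\t400",
    "Col156\tcontig_40\t99.0\t50\t0\t0\t1\t50\t5\t54\t1e-20\t95",
]
print(replicon_hits(toy_output, lengths))
```

In the toy data only the IncFII hit passes both thresholds; the IncFIA hit fails on identity and the Col156 hit on coverage.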

Results

The observed increase in MDR E. coli ST73 clinical isolates is due to a highly diverse group of strains

Sequence data for all 22 isolates were mapped against the CFT073 reference genome and a high-resolution SNP phylogenetic tree was constructed (Figure 1). The phylogenetic tree shows that bacteraemia and UTI isolates are intermixed throughout the phylogeny, as are ESBL-positive and -negative isolates. Pairwise SNP distance calculations between isolates showed that the minimum SNP distance between any two isolates was 416 SNPs, and the maximum distance was 6026 SNPs (Figure S1A, available as Supplementary data at JAC Online).
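Pairwise SNP distances of the kind reported here can be computed from an alignment of consensus sequences. The following is a minimal sketch, assuming equal-length aligned sequences and assuming that positions with an ambiguous base or gap in either sequence are ignored; the study does not state how such positions were handled. The isolate names and sequences are toy values.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different bases.

    Positions where either sequence has a non-ACGT character are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in valid and b in valid and a != b
    )


def pairwise_distances(alignment):
    """Return {(name_x, name_y): SNP distance} for every pair of sequences."""
    return {
        (x, y): snp_distance(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical isolates and sequences).
toy_alignment = {
    "iso1": "ACGTACGT",
    "iso2": "ACGTTCGT",
    "iso3": "NCGTTCGA",
}
distances = pairwise_distances(toy_alignment)
print(min(distances.values()), max(distances.values()))
```

Taking the minimum and maximum over the resulting dictionary gives the closest and most distant pair, which is the form in which the distances are summarized in the text.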

Figure 1.

Maximum likelihood phylogenetic tree of clinical ST73 isolates, with S1-PFGE and in silico plasmid profiling superimposed. Plasmid sizes, as determined by S1-PFGE, and Inc types, as determined by in silico analysis, are indicated in the legend to the right. This figure appears in colour in the online version of JAC and in black and white in the print version of JAC.

Comparative genomic analysis indicates diversity between ST73 genomes occurs at single base pair mutation level and in plasmid repertoire

An alignment of all the ST73 genomes using progressiveMauve indicated genetic variation predominantly occurring in small contigs of the assemblies (Supplementary Data), suggesting that most gene-content variation occurs in plasmids and other mobile genetic elements. We created a pangenome of the ST73 genomes using Gegenees (Supplementary Data) showing a core genome of 3.81 Mbp, and 1201 conserved CDS from a total of 10 696 CDS, consistent with analyses performed on the E. coli species and on E. coli ST131.20,21 We performed in silico analysis to determine the presence of the major ExPEC virulence-associated genes in our dataset (Supplementary Data). This shows some differences in carriage of virulence genes but a relatively fixed virulence gene profile. The comparison of UTI and bacteraemia isolates for virulence gene carriage also showed identical profiles between the two groups. We sought to identify the presence of any loci over-represented in the UTI or bacteraemia group of strains, or in the ESBL-positive and ESBL-negative group of strains using Gegenees. This analysis failed to identify any loci associated with a propensity towards bacteraemia or ESBL carriage.

Highly diverse plasmid repertoire in circulating clinical E. coli ST73 isolates

Given the observations of our pangenome analysis, we sought to determine the extent of mobile genetic element diversity in our ST73 isolates, focusing primarily on plasmids. Using the PlasmidFinder database we performed in silico plasmid typing on our 22 isolates (Table 1). Our analysis showed that FII, FIA and FIB plasmid types were predominant. To further investigate this we performed S1-PFGE plasmid profiling of every isolate. No plasmids were detected in the CTX-M-negative isolates, but a large number of plasmid molecules were detected in the remaining isolates (Table 1). A 112 kbp plasmid was found in the six isolates that showed the most similar accessory gene content in the pangenome analysis. Superimposing the plasmid typing data on the phylogenetic tree showed that the 112 kbp plasmid is present in the six isolates that showed the lowest amount of core genome diversity (Figure 1). We compared the similarity of genomes at gene-content level using the fragmented all-against-all comparison in Gegenees to show that the six strains sharing the 112 kbp plasmid also showed gene content similarity >95% (Supplementary Data), suggesting that the plasmid pool in these six strains is highly similar if not identical.

Discussion

Epidemiological studies in the East Midlands area of the UK have highlighted an increase in incidence of E. coli ST73 MDR isolates over the past 5 years.4,9 In this study, we present the genomic analysis of 22 ST73 isolates from human clinical bloodstream and UTI cases, all isolated within a 3 month period from the same region of the UK. Our analysis shows levels of diversity in the hundreds or thousands of SNPs between isolates. This is in stark contrast to ST131, where isolates from the same UK region over a 6 month period showed diversity of <10 SNPs between strains isolated from unrelated clinical episodes, and a maximum diversity of dozens of SNPs.5

Analysis of our ST73 genomic dataset identified the presence of a limited number of plasmid types based on in silico rep typing; however, both genomic analysis and classical plasmid profiling showed plasmid diversity in the small ST73 population sampled here. The presence of a 112 kbp plasmid was inferred in six isolates, which were also the six most closely related isolates phylogenetically and at gene-content level. It is tempting to speculate there may be a circulating sub-clone of ST73, but such inference is hampered by our small and geographically restricted sample size.

The small population we have sequenced limits the inferences we can make from our dataset. However, there are several key points that our study highlights. The first is that the evolution and emergence of MDR lineages of ExPEC do not follow a one-size-fits-all model. E. coli ST131 became a predominant clinical ExPEC isolate by clonal expansion and rapid global dissemination of an MDR clone of the wider ST131 lineage.7 Our data on clinically unrelated ST73 isolates show a highly diverse population of circulating ST73 strains, with a diverse plasmid pool driving MDR in this lineage. To gain a more comprehensive understanding of the emergence and population structure of this important lineage of pathogenic E. coli, it is vitally important that larger global isolate collections are analysed. It is equally important that these collections include non-human reservoir isolates. By doing this we will acquire a far greater understanding of the ways in which ExPEC lineages can emerge as dominant MDR clinical isolates, and move our focus beyond just E. coli ST131.

Funding

This study was funded by an EMDA iNET award to A. M. and M. D., a Kuwaiti government studentship award to F. A. and a Royal Society grant award IE121459 to Z. Z. and A. M.

Transparency declarations

None to declare.

Acknowledgements

All sequencing was performed at the Exeter Sequencing Service at the University of Exeter.

References

1. Foxman B. The epidemiology of urinary tract infection. Nat Rev Urol 2010; 7: 653–60.

2. De Kraker MEA, Jarlier V, Monen JCM et al. The changing epidemiology of bacteraemias in Europe: trends from the European Antimicrobial Resistance Surveillance System. Clin Microbiol Infect 2013; 19: 860–8.

3. Johnson JR, Johnston B, Clabots C et al. Escherichia coli type ST131 as the major cause of serious multidrug-resistant E. coli infections in the United States. Clin Infect Dis 2010; 51: 286–94.

4. Croxall G, Hale J, Weston V et al. Molecular epidemiology of extraintestinal pathogenic Escherichia coli isolates from a regional cohort of elderly patients highlights the prevalence of ST131 strains with increased antimicrobial resistance in both community and hospital care settings. J Antimicrob Chemother 2011; 66: 2501–8.

5. Clark G, Paszkiewicz K, Hale J et al. Genomic and molecular epidemiology analysis of clinical Escherichia coli ST131 isolates suggests circulation of a genetically monomorphic but phenotypically heterogeneous ExPEC clone. J Antimicrob Chemother 2012; 67: 868–77.

6. Price LB, Johnson JR, Aziz M et al. The epidemic of extended-spectrum-β-lactamase-producing Escherichia coli ST131 is driven by a single highly pathogenic subclone, H30-Rx. MBio 2013; 4: e00377-13.

7. Petty NK, Ben Zakour NL, Stanton-Cook M et al. Global dissemination of a multidrug resistant Escherichia coli clone. Proc Natl Acad Sci USA 2014; 111: 5694–9.

8. Welch RA, Burland V, Plunkett G et al. Extensive mosaic structure revealed by the complete genome sequence of uropathogenic Escherichia coli. Proc Natl Acad Sci USA 2002; 99: 17020–4.

9. Alhashash F, Weston V, Diggle M et al. Multidrug-resistant Escherichia coli bacteremia. Emerg Infect Dis 2013; 19: 1699–701.

10. Swain MT, Tsai IJ, Assefa SA et al. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat Protoc 2012; 7: 1260–84.

11. Seemann T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 2014; 30: 2068–9.

12. Darling AE, Mau B, Perna NT. progressiveMauve: multiple genome alignment with gene gain, loss and rearrangement. PLoS One 2010; 5: e11147.

13. Danecek P, Auton A, Abecasis G et al. The variant call format and VCFtools. Bioinformatics 2011; 27: 2156–8.

14. Angiuoli SV, Salzberg SL. Mugsy: fast multiple alignment of closely related whole genomes. Bioinformatics 2011; 27: 334–42.

15. Stamatakis A, Ludwig T, Maier H. RAxML-III: a fast program for maximum likelihood-based inference of large phylogenetic trees. Bioinformatics 2005; 21: 456.

16. Agren J, Sundstrom A, Hafstrom T et al. Gegenees: fragmented alignment of multiple genomes for determining phylogenomic distances and genetic signatures unique for specified target groups. PLoS One 2012; 7: e39107.

17. Johnson JR, Stell AL. Extended virulence genotypes of Escherichia coli strains from patients with urosepsis in relation to phylogeny and host compromise. J Infect Dis 2000; 181: 261–72.

18. Carattoli A, Zankari E, Garcia-Fernandez A et al. In silico detection and typing of plasmids using PlasmidFinder and plasmid multilocus sequence typing. Antimicrob Agents Chemother 2014; 58: 3895–903.

19. Barton BM, Harding GP, Zuccarelli AJ. A general method for detecting and sizing large plasmids. Anal Biochem 1995; 226: 235–40.

20. Alqasim A, Scheutz F, Zong Z et al. Comparative genome analysis identifies few traits unique to the Escherichia coli ST131 H30Rx clade and extensive mosaicism at the capsule locus. BMC Genomics 2014; 15: 830.

21. McNally A, Cheng L, Harris SR et al. The evolutionary path to extraintestinal pathogenic, drug-resistant Escherichia coli is marked by drastic reduction in detectable recombination within the core genome. Genome Biol Evol 2013; 5: 699–710.

Supplementary data