Large-Scale Analysis of Plasmid Relationships through Gene-Sharing Networks

Abstract

Plasmids are vessels of genetic exchange in microbial communities. They are known to transfer between different host organisms and acquire diverse genetic elements from chromosomes and/or other plasmids. Therefore, they constitute an important element in microbial evolution by rapidly disseminating various genetic properties among different communities. A paradigmatic example of this is the dissemination of antibiotic resistance (AR) genes that has resulted in the emergence of multiresistant pathogenic bacterial strains. To globally analyze the evolutionary dynamics of plasmids, we built a large graph in which 2,343 plasmids (nodes) are connected according to the proteins shared by each other. The analysis of this gene-sharing network revealed an overall coherence between network clustering and the phylogenetic classes of the corresponding microorganisms, likely resulting from genetic barriers to horizontal gene transfer between distant phylogenetic groups. Habitat was not a crucial factor in clustering as plasmids from organisms inhabiting different environments were often found embedded in the same cluster. Analyses of network metrics revealed a statistically significant correlation between plasmid mobility and their centrality within the network, providing support to the observation that mobile plasmids are particularly important in spreading genes in microbial communities. Finally, our study reveals an extensive (and previously undescribed) sharing of AR genes between Actinobacteria and Gammaproteobacteria, suggesting that the former might represent an important reservoir of AR genes for the latter.

horizontal gene transfer, antibiotic resistance, plasmid, network

Issue Section:

Research article

Introduction

Plasmids are paradigmatic examples of the network-like structure of microbial evolution (Brilli et al. 2008). Indeed, they are among the most important players in the evolution of prokaryotes because they can be transferred between microorganisms, thus representing natural vectors for the transfer of genes and the functions they code for (Norman et al. 2009). Accordingly, they often provide a basis for genomic rearrangements via homologous recombination, facilitating the loss and/or acquisition of genes during these events, which may eventually lead to horizontal gene transfer (HGT). As a consequence, plasmids possess a mosaic structure with collections of functional genetic modules, each likely possessing an independent phylogenetic history, organized into a stable and self-replicating entity (Osborn et al. 2000; Toussaint and Merlin 2002; Bosi et al. 2011). Importantly, these functional blocks often embed genes that might have a great impact on the metabolic functions of the host cell, providing additional traits that can be accumulated without altering the gene content of the bacterial chromosome (Fondi et al. 2010). Plasmids are involved in many accessory functions and constitute, together with “not essential” chromosomal regions, what is referred to as the “dispensable genome” in the microbial pan-genome concept (Medini et al. 2005). This, in turn, can include genes for ecologically important traits, such as antibiotic resistance (AR) (Crosa et al. 1975), pathogen virulence (Hacker and Kaper 2000), symbiotic nitrogen fixation (van Rhijn and Vanderleyden 1995), and the production of allelopathic bacteriocins (Riley and Gordon 1999). Among these processes, pathogenesis and AR have been the most thoroughly explored so far. Indeed, it has been shown that the presence of plasmids can be strictly linked to the emergence of pathogenic lineages within a given taxonomic unit (Reynaud et al. 2008; Le Roux et al. 2010).
In parallel, in terms of AR, plasmids play a central role as the vehicles for resistance gene capture and subsequent spreading (Bennett 2008; Fondi and Fani 2010). Dissemination of these features represents one of the most important effects of “bacterial sex,” from both an evolutionary and an ecological viewpoint (Kohiyama et al. 2003). In this context, plasmid mobility represents an essential parameter of microorganisms' fitness, and it might also be a key element in understanding the epidemiology of these plasmid-carried traits (Smillie et al. 2010). However, despite their clear biological relevance, the pathways followed by plasmids during their evolutionary history remain largely obscure.

Nowadays, the use of massive plasmid sequencing as a routine laboratory technique (Schluter et al. 2008), together with the development of bioinformatics tools enabling the visualization of sequence homology relationships through similarity networks (Vlasblom et al. 2006; Brilli et al. 2008), can greatly speed up studies of gene mobility among plasmids. Furthermore, thanks to the expansion of network-oriented representations of sequence similarity relationships (Lima-Mendez et al. 2007; Brilli et al. 2008; Dagan and Martin 2009; Dagan et al. 2010; Fondi and Fani 2010; Fondi et al. 2010; Halary et al. 2010), graph theory measures have been applied to better describe gene flow across diverse microbial communities, paving the way to large-scale comparative analyses adopting bioinformatics strategies. In more detail, by adopting a gene-sharing network approach, Dagan et al. (2008) reported the construction and analysis of graphs capturing both vertical and lateral components of evolutionary history among 539,723 genes distributed across 181 sequenced prokaryotic genomes. The same authors estimated that an impressive amount (almost 80% on average) of the gene content of each analyzed genome was involved in lateral gene transfer at some point in evolution. More recently, Halary et al. (2010) applied mathematical studies of the centralities of a network embedding 119,381 homologous DNA families. They demonstrated that plasmids, and not viruses, are likely the key vectors of genetic exchange between bacterial chromosomes. Moreover, their results also supported a disconnected yet highly structured network of genetic diversity, revealing the existence of multiple “genetic worlds.” From the analysis of the same network, the same authors also inferred that DNA pools mostly circulate between vehicles (i.e., plasmids, phages, and chromosomes) of the same type. Finally, Lima-Mendez et al. (2008) represented relationships across the phage population as a weighted graph where nodes represented phages and edges represented phage–phage similarities in terms of gene content. Their approach succeeded in capturing the pervasive mosaicism of phage genomes, indicating the importance of horizontal gene exchange in their evolution and also proving to be a promising tool for predicting lifestyles of individual phages from sequence data.

By applying a computational network-oriented pipeline, we have analyzed the evolutionary relationships among 2,343 microbial plasmids in order to explore the role of each of them within the reticulate evolutionary dynamics of this class of mobile genetic elements. Moreover, we focused our attention on the proteins involved in two main biological processes, that is, AR and pathogenesis, as well as on plasmid features that might be involved in shaping the overall network of plasmid-mediated HGT (e.g., plasmid mobility). The data obtained provide interesting clues toward a systemic interpretation of the overall behavior of plasmids within bacterial evolution and in the spreading of some key biological features, such as AR and virulence.

Materials and Methods

Data Sets Assembly

All the available complete plasmid sequences (in GenBank format) were downloaded from NCBI using the EFetch interface (as of 24 July 2010). In total, 2,343 plasmids (102,772 open reading frames) were retrieved, and a complete table including their main features (size, taxonomy, accession codes, etc.) is available as supplementary material S1 (Supplementary Material online). Moreover, two different subsets of sequences were created starting from the whole plasmid sequence data set. First, we created a set of plasmid-encoded proteins involved in AR. This was done using each of the retrieved sequences as a seed in a basic local alignment search tool (BLAST) (Altschul et al. 1997) search against the Antibiotic Resistance DataBase (ARDB) (Liu and Pop 2009) with the following parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 amino acids (aa); that is, a degree of amino acid sequence identity sufficiently high to retrieve all the proteins that should perform a function related to AR (Friedberg 2006; Fondi and Fani 2010). In this way, a set of 2,678 sequences putatively associated with AR was retrieved (for the complete list of accession codes of the proteins used in this work, see supplementary material S2, Supplementary Material online). These sequences belonged to 501 different plasmids.
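The hit-selection step described above can be sketched as follows. This is only an illustration: it assumes the BLAST results are available in the standard 12-column tabular layout, which the text does not specify, and the function name `filter_hits` and the sample rows are hypothetical.

```python
import csv
import io

# Thresholds quoted in the text; the tabular input layout is an assumption.
MAX_EVALUE = 1e-20
MIN_ALIGN_LEN = 50  # amino acids


def filter_hits(handle, max_evalue=MAX_EVALUE, min_len=MIN_ALIGN_LEN):
    """Yield (query, subject) pairs that pass both thresholds.

    Expected columns: qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        length = int(row[3])
        evalue = float(row[10])
        if evalue <= max_evalue and length >= min_len:
            yield row[0], row[1]


# Made-up rows for illustration only (not data from the study).
example = io.StringIO(
    "protA\tardb_1\t88.0\t120\t14\t0\t1\t120\t1\t120\t1e-45\t210\n"
    "protB\tardb_2\t91.0\t30\t3\t0\t1\t30\t1\t30\t1e-25\t60\n"
    "protC\tardb_3\t40.0\t200\t120\t2\t1\t200\t1\t200\t1e-5\t45\n"
)
hits = list(filter_hits(example))
```

In this toy input only the first row survives: the second alignment is shorter than 50 aa and the third has an e value above the cutoff.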

The same strategy with the same parameters was applied when searching for virulence-related proteins (virulence factors, VFs) within the whole plasmid sequence data set. In this case, the probed database was the Virulence Factor DataBase (VFDB) (Chen et al. 2005; Yang et al. 2008), and a set of 7,840 sequences were retrieved from this BLAST search (belonging to 615 plasmids). Again, all the information about these sequences is available as supplementary material S3 (Supplementary Material online).

Network Construction

The network construction workflow described in this paragraph was applied to each of the three assembled data sets, that is, the one embedding all retrieved plasmid sequences (hereinafter referred to as the “all sequences network”), the one embedding the AR-related sequences (the “resistance network”), and the one embedding VF-related sequences (the “virulence network”).

In detail, each sequence data set was used in an all-against-all BLAST search (Altschul et al. 1997) using the Murska parallel computing cluster (Center for Scientific Computing, Espoo, Finland). The BLAST output was parsed at two different identity thresholds (70% and 95%) using ad hoc Python scripts. Two parsed files were obtained, one containing those sequences sharing at least 70% sequence identity and another containing sequences sharing at least 95% identity. Similarly to Dagan et al. (2008) and, later, Halary et al. (2010), this allows the resulting networks to be interpreted under a molecular clock–based assumption, that is, under the hypothesis that proteins with the highest percentages of identity are likely to have been shared more recently than those with lower identity. In the present context, proteins with 95% identity were considered more recently shared than those with 70%.

Subsequently, each of these parsed BLAST outputs was transformed into a gene-sharing network and visualized using the Gephi visualization program (Bastian et al. 2009). In this network, each node represents a single plasmid, and two different plasmids are linked on the basis of their shared protein content. In particular, sharing is defined by a BLAST match between two reading frames longer than 300 bp with at least 95% or 70% amino acid identity, respectively, and therefore represents an absolute measure. To investigate the dynamics of plasmids among bacterial cells, we applied a further filter to each of the obtained graphs, retaining only those edges that represent at least five shared proteins and discarding all connections between plasmids sharing fewer proteins. Similarly, to investigate the dynamics of individual genes or small gene clusters among the plasmid population, we applied a filter retaining only those edges that represent fewer than five shared genes. Altogether, we obtained eight different networks: at each of the 70% and 95% identity thresholds, one network of all sequences with edges of five or more shared genes, one of all sequences with edges of fewer than five shared genes, one of AR-related sequences, and one of VF-related sequences. The Gephi-formatted network files are available as supplementary material S4 (Supplementary Material online).
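A minimal sketch of this edge-building and filtering logic is given below. It is not the authors' script: the input tuple layout, the function name `build_networks`, and the choice to count distinct protein pairs as "shared proteins" are assumptions made for illustration, and the 300 bp length filter is assumed to have been applied upstream.

```python
from collections import defaultdict


def build_networks(hits, min_identity, min_shared=5):
    """Split plasmid pairs into '>= min_shared' and '< min_shared' edge sets.

    hits: iterable of (plasmid_a, protein_a, plasmid_b, protein_b, identity).
    Returns two dicts mapping frozenset({plasmid_a, plasmid_b}) -> number of
    distinct shared protein pairs.
    """
    shared = defaultdict(set)
    for pl_a, pr_a, pl_b, pr_b, identity in hits:
        if pl_a == pl_b or identity < min_identity:
            continue  # ignore within-plasmid matches and weak hits
        # Unordered keys so that A-vs-B and B-vs-A hits are counted once.
        shared[frozenset((pl_a, pl_b))].add(frozenset((pr_a, pr_b)))
    weights = {pair: len(prots) for pair, prots in shared.items()}
    high = {p: w for p, w in weights.items() if w >= min_shared}
    low = {p: w for p, w in weights.items() if w < min_shared}
    return high, low


# Made-up hits for illustration only.
hits = [("P1", f"a{i}", "P2", f"b{i}", 97.0) for i in range(5)]
hits += [("P2", "b0", "P1", "a0", 97.0)]  # reciprocal hit, counted once
hits += [("P1", "a5", "P3", "c0", 75.0), ("P1", "a6", "P3", "c1", 72.0)]

high70, low70 = build_networks(hits, min_identity=70.0)
high95, low95 = build_networks(hits, min_identity=95.0)
```

Here P1 and P2 remain linked in the "five or more" network at both thresholds, whereas the P1 and P3 link appears only in the "fewer than five" network at 70% identity.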

Permutation Tests

To evaluate the statistical significance of observed preferential gene flows (see below), we randomly permuted the phylogenetic affiliation of each node 10,000 times, while keeping the original degree of each node within the network intact (randomization with node degree conservation; see Brohee et al. 2008). A P value was then obtained by counting the number of times the randomly assembled networks returned a number of links greater (or lower) than the observed one and dividing this count by the total number of permutations.
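The label-permutation idea can be sketched as below, under stated assumptions: node labels are shuffled while the edge list is left untouched, so every node keeps its degree. The function names, the toy network, and the use of non-strict comparisons (a slightly more conservative choice than the strict "greater" or "lower" wording above) are illustrative assumptions.

```python
import random


def count_links(edges, labels, group_a, group_b):
    """Number of edges joining a node of group_a to a node of group_b."""
    wanted = {group_a, group_b}
    return sum(1 for u, v in edges if {labels[u], labels[v]} == wanted)


def permutation_p_value(edges, labels, group_a, group_b,
                        n_perm=10_000, tail="greater", seed=0):
    """Return (observed link count, empirical P value)."""
    rng = random.Random(seed)
    observed = count_links(edges, labels, group_a, group_b)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # permute affiliations, keep topology fixed
        shuffled = dict(zip(nodes, values))
        simulated = count_links(edges, shuffled, group_a, group_b)
        if tail == "greater" and simulated >= observed:
            extreme += 1
        elif tail == "lower" and simulated <= observed:
            extreme += 1
    return observed, extreme / n_perm


# Toy network for illustration only.
labels = {"a1": "Actino", "a2": "Actino", "a3": "Actino",
          "g1": "Gamma", "g2": "Gamma", "g3": "Gamma"}
edges = [("a1", "g1"), ("a2", "g2"), ("a3", "g3"), ("a1", "g2")]
observed, p_value = permutation_p_value(edges, labels, "Actino", "Gamma",
                                        n_perm=1000)
```

A small P value with `tail="greater"` would indicate more links between the two groups than expected when affiliations are assigned at random.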

Estimation of Plasmid Mobility

Genes related to plasmid mobility were identified by BLAST analysis (with the following parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 aa) of the plasmid-encoded amino acid sequences against a tra and mob gene data set retrieved from the ACLAME database (http://aclame.ulb.ac.be/; Leplae et al. 2004). Since tra and mob genes are generally associated with plasmid mobility and conjugation, we defined a plasmid as mobile if it contained one or more mob or tra genes (a similar approach was recently adopted by Smillie et al. 2010).

Network Centralities, Statistics, and Visualization

Network centrality values for network nodes were calculated using the igraph package in R (Csardi and Nepusz 2006). Network clustering was estimated using the Louvain algorithm implemented in Gephi (Blondel et al. 2008) by maximizing modularity and minimizing the number of clusters. All statistical tests to investigate the differences in degree and betweenness distributions and GC% content were performed using the base statistics tools in R (R Development Core Team 2010; http://www.r-project.org/). Data plotting was performed using the ggplot2 package of R (Wickham 2009). All other statistical analyses were performed using in-house developed Perl and Python scripts. Visualization of network clustering and gene sharing as an ideogram was performed using Circos (Krzywinski et al. 2009).
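For readers unfamiliar with the quantity that Louvain clustering maximizes, the sketch below evaluates the standard Newman-Girvan modularity Q of a given partition of an unweighted, undirected graph. It does not implement the Louvain search itself or Gephi's version of it; the function name and the toy graph are assumptions for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum over communities of L_c/m - (D_c/(2m))**2.

    L_c is the number of edges inside community c, D_c the summed degree
    of its nodes, and m the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # edges with both ends in community c
    degree_sum = defaultdict(int)  # summed node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Two triangles joined by a single edge (toy example).
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
split = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
single = dict.fromkeys(range(1, 7), "A")
q_split = modularity(edges, split)
q_single = modularity(edges, single)
```

Separating the two triangles yields a higher Q than placing all nodes in one cluster, which is the kind of partition a modularity-maximizing method favors.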

Estimation of the Phylogenetic Distances of Gene Sharing

The 16S rRNA sequences for plasmid hosts were downloaded from the Ribosomal Database Project (Cole et al. 2007, 2009). The 16S rRNA sequences were aligned using the Nearest Alignment Space Termination aligner provided by Greengenes (DeSantis et al. 2006). The matrix of phylogenetic distances was calculated using Phylip (Felsenstein 1989).

Estimation of Phylogenetic Coherence in Major Network Clusters

The Conclustador algorithm (Leigh et al. 2011) was applied to analyze the congruence of phylogenetic trees reconstructed from the sequences of the genes shared by plasmids belonging to the same cluster in a network. Gene families responsible for the connections among the different plasmids were extracted from the 70% and 95% networks and aligned using Muscle software (Edgar 2004). Then, for each plasmid cluster, the resulting multiple sequence alignments were used as input for phylogenetic coherence analysis with Conclustador. Finally, SplitsTree4 (Huson and Bryant 2006) was used to visualize the phylogenetic information both in each single group identified by Conclustador and in all the groups at once (which, together, are responsible for the plasmid interconnections shown in the networks of fig. 1). In both cases, supernetworks were inferred using data available from single-gene phylogenetic analyses performed with the RAxML tool with 1,000 bootstrap replications.

FIG. 1.

The gene sharing between plasmids presented as matrices (A) and networks (B) at both 70% and 95% criteria. In network figures, plasmids are represented by the nodes (node size is proportional to the plasmid size) and the shared genes by the links. At least five shared genes are required to establish a link.

Because Conclustador requires data sets that are not too fragmented (that is, about 80% of the overall taxa must be present in each multiple alignment), not all the identified plasmid clusters could be reliably analyzed. Accordingly, only the major clusters in the 70% and 95% networks were analyzed (namely clusters 961, 993, 1,144, and 1,238 for the 70% network and 961, 993, and 1,144 for the 95% network). Interestingly, the widespread fragmentation found for most of the clusters in the data set might be due to a high heterogeneity of these clusters that, in turn, might mirror a high level of horizontal transfer of their embedded genes.

Results and Discussion

Gene-Sharing Networks

Gene sharing between plasmids was visualized as a network where the plasmids are represented as vertices (or nodes) and gene sharing as edges (or links). Altogether eight networks were constructed based on 70% and 95% identity between the amino acid sequences and the different edge criteria, such as the amount of genes shared (more than or less than five) or sharing AR or virulence genes (supplementary material S6, Supplementary Material online). The identity-based criterion introduced for links setting allows interpreting the resulting networks under a molecular clock–based assumption, that is, under the hypothesis that sequences with the highest percentages of identity (e.g., 95%) were likely to be more recently exchanged than the ones with less identity (e.g., 70%) (see, e.g., Halary et al. 2010). Data for the networks accounting for the sharing of five or more genes are reported in figure 1A and B. Overall, the plasmid network of all sequences at 70% identity (fig. 1B) threshold exhibits one major connected component, some minor connected components and a large number of disconnected plasmids (see below). The main connected component of the network of all genes (the central one in fig. 1B) embeds plasmids mainly belonging to the Proteobacteria phylum (particularly from Gamma, Alpha, and Beta subdivisions). Interestingly, this component also contains plasmids from Actinobacteria. A similar trend is observed in the case of 95% identity threshold network (fig. 1A) although, as it might be expected, in this case, the main connected component of the network is smaller. The only phylogenetically uniform major component is represented by plasmids from Borrelia burgdorferi (Spirochaetes, yellow nodes of fig. 1A and B).

In order to investigate the relationships between the taxonomy of represented microorganisms and the evolutionary interconnections of their plasmids, we performed network clustering using the Louvain algorithm implemented in Gephi (see Materials and Methods; Blondel et al. 2008) and compared the obtained plasmids groups with the phylogenetic and habitat affiliations of their constituent cells. The network clusters embedding multiple phyla and/or habitats for the 70% and 95% networks of all sequences are presented in figure 2. According to the network clustering analysis, the network clusters more typically embed members from different habitats than from different phylogenetic orders. Hence, it appears that phylogenetic distance is a greater barrier to gene sharing than having a different habitat. This is likely due to limited HGT across phylogenetic classes that could result from, for example, restriction or incompatible replication systems (as reviewed in Thomas and Nielsen 2005). Moreover, these observations are consistent with findings from microbial ecology and previous in silico analyses (Baquero et al. 2008; Fondi and Fani 2010) and suggest that there is a (more or less) high degree of mixing of microbes between unrelated environments.

FIG. 2.

The major phylogenetic groups, their habitats, and their clustering in (A) the 70% and the 95% networks for ≥5 networks and in 70% and the 95% networks for <5 networks (B) and (C), respectively. The clusters that were subjected to Conclustador analysis have been indicated. In (D) the amount of interphylum and intraphylum and interclass and intraclass clustering in the networks is reported for both <5 (low) and ≥5 (high) networks. The clustering of the network has been determined using the Louvain algorithm implemented in Gephi (see Materials and Methods).

Gene sharing across phylogenetic classes implies at least one past HGT event and is therefore simple to detect. However, HGT could also be commonplace within phylogenetic classes. To investigate this, all the major network clusters (including those reported in fig. 2) were analyzed using the Conclustador package to infer phylogenetically congruent and incongruent gene families. Overall, the data obtained (provided as supplementary material S5, Supplementary Material online) revealed a high level of incongruence among the analyzed clusters. Indeed, Conclustador identified 8, 4, 2, and 3 different groups within the 961, 993, 1,144, and 1,238 major plasmid clusters, respectively. Similarly, in the 95% network, 6, 4, and 2 distinct phylogenetic groups were retrieved for clusters 961, 993, and 1,144, respectively. The construction of phylogenetic networks of the sequences embedded in the groups identified by Conclustador revealed, in most cases, high levels of interspecies reticulation. Overall, these data suggest the presence of potentially abundant HGT at lower taxonomic levels than those reported in figures 1 and 2.

Furthermore, in order to shed some light on the putative functions encoded by the shared genes, we performed a Clusters of Orthologous Groups (COG) of proteins-based functional annotation of the sequences embedded in each plasmid cluster. The data obtained (also reported in supplementary material S5, Supplementary Material online) revealed that most of the sequences responsible for the plasmid interconnections encode proteins involved in DNA transposition and recombination. This is not surprising since these functions are strongly linked to the process of HGT and, consequently, to plasmids. Nevertheless, as shown in supplementary material S5 (Supplementary Material online), other genes are shared among the different plasmids embedded in the same cluster and, importantly, their encoded functions are not directly related to the process of HGT itself. This suggests that other functions, probably related to more complex phenotypes, are shared by the different plasmids, including, for example, genes involved in transcription, inorganic ion transport and metabolism, and cell motility (the three most abundant functional categories of plasmid cluster 961, see supplementary material S5, Supplementary Material online).

To study the sharing of resistance and virulence genes, the same procedure of network construction was applied to the AR and VF sequence data sets. Results of these analyses for networks of 70% identity criterion are shown in supplementary material S6 (Supplementary Material online). Overall, the topology of both networks appeared to be similar to 70% and 95% networks of all sequences, although some differences can be identified. Indeed, concerning the AR network, the Proteobacterial plasmids do not form a single component, but two different major components can now be identified, one embedding Gammaproteobacterial and Actinobacterial plasmids and the other one embedding Beta and Alphaproteobacterial sequences. This suggests that plasmids belonging to these taxonomic units are not preferential transfer partners of AR genes for Gammaproteobacteria representatives. Conversely, in the virulence network, Proteobacterial plasmids form the major connected component of the graph (supplementary material S6, Supplementary Material online), revealing an intense sharing of virulence-related genes among microorganisms belonging to this taxonomic unit. Although some remarkable exceptions of plasmids acting as bridges in connecting otherwise separate groups do exist (see below), the other clusters of virulence network are overall coherent with the phylogenetic class affiliation (although intense gene sharing might be present within these groups of plasmids, as shown by previous phylogenetic coherence analysis).

Network Features and Taxonomy

In order to globally analyze the evolutionary relationships underlying the plasmid populations, we applied graph theory measures to the gene-sharing networks. In particular, the networks were analyzed for node degree and betweenness. Degree is defined as the number of connections a node has to other nodes. In the present context, a plasmid with a high degree is a plasmid that shares genes with a large number of other plasmids. Betweenness is a centrality measure defined as how frequently a node lies on the shortest paths between pairs of other network nodes. In this context, a plasmid with a high betweenness can transfer genes to many other plasmids in the network with a low number of gene transfer events and, in other words, can function as a bridge between otherwise disconnected regions of the network.
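The two measures can be illustrated with a short, self-contained sketch. It uses Brandes' accumulation scheme for betweenness on an unweighted, undirected graph; it is not the igraph-based procedure used in the study, and the function name and toy graph are assumptions.

```python
from collections import defaultdict, deque


def degree_and_betweenness(edges):
    """Return (degree, betweenness) dicts for an undirected, unweighted graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    degree = {n: len(adj[n]) for n in nodes}
    betweenness = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        sigma = dict.fromkeys(nodes, 0)
        sigma[s] = 1
        dist = {s: 0}
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                betweenness[w] += delta[w]
    # Each unordered pair was visited from both ends, so halve the totals.
    return degree, {n: b / 2 for n, b in betweenness.items()}


# Two triangles connected through a single bridging node "x" (toy example).
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"),
         ("x", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
degree, betweenness = degree_and_betweenness(edges)
```

In this toy graph the bridging node has a degree of only 2 but the highest betweenness, because every shortest path between the two triangles passes through it.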

Accordingly, we computed centrality measures along the network, for all the classes of prokaryotes present in the data set. Results are provided in figure 3, whose analysis revealed a positive correlation between degree and betweenness that has also been observed by Halary et al. (2010). However, in the network, some nodes showed a much higher betweenness than most nodes of the same degree (see below). Such outliers, characterized by a low degree but a high betweenness, are especially important in any given network, as they can be seen as bridges between smaller, more connected parts of the network (Halary et al. 2010).
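The degree-betweenness relationship is summarized in figure 3 with Pearson's product-moment correlation coefficient. A minimal sketch of that coefficient is shown below; the function name and the example values are hypothetical and are not taken from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up degree and betweenness values for five nodes.
degrees = [1, 2, 3, 4, 5]
betweenness_values = [2, 4, 6, 8, 10]
r = pearson_r(degrees, betweenness_values)
```

Nodes that fall far above the trend (low degree, high betweenness) are the bridge-like outliers discussed in the text.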

FIG. 3.

Dependence of plasmid betweenness on plasmid degree for different phylogenetic classes according to Pearson's product–moment correlation coefficient.

Tables 1 and 2 report the highest degree and betweenness values, respectively, for individual plasmids in the 70% and 95% identity networks of all sequences. Table 1 shows that all the plasmids with the highest degree values belong to the Gamma subdivision of Proteobacteria. This result can be easily explained by the oversampling of plasmids from this class of bacteria. Indeed, the plasmid data used in this study were gathered unsystematically from several unrelated sources and are highly biased toward human pathogenic organisms (mostly Gammaproteobacteria) (Wu et al. 2009). In this context, it is likely that more detailed studies of individual environments would reveal several gene-sharing events between various phylogenetic groups that are not represented in the current data set. Nevertheless, a detailed inspection of high-degree plasmids gave further support to previous observations based on single-plasmid sequence data. For example, plasmid pU302L (see table 1) from Salmonella enterica subsp. enterica serovar Typhimurium has already been described as possessing a mosaic pattern of sequence homology with other plasmids, suggesting, in turn, that this plasmid acquired resistance genes from a variety of enteric bacteria (Chen et al. 2007). Notably, the fact that this plasmid has the highest degree in the 95% network indicates that it acquired foreign genetic material from very closely related microorganisms and/or very recently. Similarly, most of the other plasmids listed in table 1 possess a well-documented history of HGT events (see, e.g., p1658/97 [Zienkiewicz et al. 2007; Yi et al. 2010] and pKF3-140 [Yi et al. 2010]).

Table 1.

Individual Plasmids with the Highest Degree Measures Observed in the Gene-Sharing Networks of All Genes.

Accession Number	Microorganism	Plasmid Name	Degree	No. of tra/mob Genes	Conjugative (c) or Mobilizable (m)
70% Network
NC_010119	Salmonella enterica subsp. enterica serovar Choleraesuis	pOU7519	268	17	c
NC_006856	S. enterica subsp. enterica serovar Choleraesuis str. SC-B67	pSC138	254	17	c
NC_011964	Escherichia coli	pAPEC-O103-ColBM	253	8	c
NC_013951	Klebsiella pneumoniae	pKF3-140	247	9	c
NC_013728	E. coli O26:H-	pO26-CRL	243	21	c
NC_010488	E. coli SMS-3-5	pSMS35_130	242	13	c
NC_011092	S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633	pCVM19633_110	241	17	c
NC_006816	S. enterica subsp. enterica serovar Typhimurium	pU302L	240	17	c
NC_013122	E. coli	pEK499	231	15	c
NC_013437	S. enterica subsp. enterica serovar Typhimurium	pSLT-BT	225	4	c
95% Network
NC_006816	S. enterica subsp. enterica serovar Typhimurium	pU302S	192	16	c
NC_010488	E. coli SMS-3-5	pSMS35_130	188	13	c
NC_006856	S. enterica subsp. enterica serovar Choleraesuis str. SC-B67	pSC138	187	17	c
NC_013951	K. pneumoniae	pKF3-140	186	9	c
NC_011964	E. coli	pAPEC-O103-ColBM	184	8	c
NC_010119	S. enterica subsp. enterica serovar Choleraesuis	pOU7519	171	17	c
NC_013728	E. coli O26:H-	pO26-CRL	168	21	c
NC_013122	E. coli	pEK499	166	15	c
NC_011092	S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633	pCVM19633_110	165	17	c
NC_004998	E. coli	p1658/97	157	11	c

Table 2.

Individual Plasmids with the Highest Betweenness Measures Observed in the Gene-Sharing Networks of All Genes.

Accession Number	Microorganism	Plasmid Name	Betweenness	No. of tra/mob Genes	Conjugative (c) or Mobilizable (m)
70% Network
NC_007635	Escherichia coli	pCoo	8050	10	c/m
NC_006663	Staphylococcus epidermidis RP62A	pSERP	6329	3	m
NC_007974	Cupriavidus metallidurans CH34	megaplasmid	6067	14	c
NC_011092	Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633	pCVM19633_110	5800	17	c
NC_010558	E. coli 1520	pIP1206	5750	16	c
NC_009651	Klebsiella pneumoniae subsp. pneumoniae MGH 78578	pKPN5	5641	11
NC_011339	Bacillus cereus H3081.97	pH308197_258	5507	2	m
NC_011655	B. cereus AH187	pAH187_270	5330	7	c/m
NC_012586	Rhizobium sp. NGR234	pNGR234b	5271	88	c
NC_010980	Enterococcus faecium	pVEF3	4700	4	m
95% Network
NC_011092	S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633	pCVM19633_110	38781	17	c
NC_005024	Staphylococcus aureus	pSK41	29020	7	c
NC_012547	S. aureus	pGO1	29020	9	c
NC_010378	E. coli	pOLA52	21221	3	c
NC_005054	S. aureus	pLW043	19209	6	c
NC_009435	Lactococcus lactis	pGdh442	18216	7	m
NC_004669	Enterococcus faecalis V583	pTEF1	15617	8
NC_008381	Rhizobium leguminosarum bv. viciae 3841	pRL10	15030	27	c
NC_013121	E. coli	pEK516	13724	11	c
NC_005327	E. coli	pC15-1a	13073	9	c
NC_011996	Macrococcus caseolyticus JCSC5402	pMCCL2	12981	4	m

High-betweenness nodes (plasmids) span a larger taxonomic spectrum, suggesting that this centrality measure is less affected by sampling biases. Indeed, the plasmids with the highest betweenness values belong to diverse phylogenetic classes, including Bacilli, Lactobacilli, and Gamma, Beta, and Alphaproteobacterial representatives. As in the case of high-degree plasmids, the mosaic-like structure of high-betweenness plasmids has been described before, for example, for pCoo from Escherichia coli (Froehlich et al. 2005) and pGO1 from Staphylococcus aureus (Caryl and O'Neill 2009). Hence, although the overall plasmid clustering seems to agree with the taxonomic classification of the source microorganisms, some plasmids compact the overall network, residing in the path between plasmids that would otherwise remain disconnected (Halary et al. 2010). Importantly, some of the plasmids found to possess high degree or betweenness values (tables 1 and 2) were also found to be central in the gene-sharing network analyses performed by Halary et al. (2010) (namely, plasmids pOU7519 and pU302L from Salmonella representatives, p1658/97 and pIP1206 from E. coli, pKPN5 from Klebsiella pneumoniae, pVEF3 from Enterococcus faecium, pSK41 from S. aureus, pGdh442 from Lactococcus lactis, and pTEF1 from Enterococcus faecalis V583), thus confirming the key role of these DNA molecules in the flow of genetic material among different microorganisms. In our opinion, these plasmids represent key players from an evolutionary viewpoint, contributing to the spread of potentially clinically relevant genetic determinants within the whole bacterial mobilome.

Many plasmids (1,159 for the 70% identity network of all genes and 1,369 for the 95% identity network) in the data set shared fewer than five genes with any other plasmid and therefore did not belong to any connected component. The taxonomic composition of this disconnected component of the network is presented in figure 4. Statistical randomization testing (as described in Materials and Methods) was performed to evaluate the effect of sampling bias on the frequency distribution. Most of the phylogenetic classes possessed between 2% and 5% of disconnected plasmids, the only exception being Gammaproteobacteria (almost 15% of disconnected plasmids). For most classes, the number of disconnected plasmids was higher than expected by random shuffling of the networks.

FIG. 4.

The phylogenetic class distribution of the disconnected plasmids in the data set. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).

Dynamics of Genes in the Plasmid Population

In the previous sections, we mainly analyzed networks in which two plasmids were connected if they shared at least five genes, which necessarily underestimates the real amount of gene transfer among plasmids. To examine the dynamics of gene transfer among plasmids in greater detail, we built gene-sharing networks that take into account the sharing of one to four genes between two given plasmids. These networks were constructed with the same computational strategy used for the ≥5 networks (see Materials and Methods) and are reported, together with the taxonomic distribution of singlets and the cross-taxa interconnections, in supplementary material S7 (Supplementary Material online). Overall, the <5 networks embedded almost as many links (11,458 and 5,136 for the 70% and 95% identity thresholds, respectively) as the ≥5 networks (12,444 and 6,777 for the 70% and 95% identity thresholds, respectively), suggesting extensive exchange of single genes (or of relatively small gene sets) among the different plasmids.
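The distinction between the ≥5 and <5 networks can be illustrated with a minimal Python sketch. It assumes that each plasmid has already been reduced to a set of gene-family identifiers; the mapping `toy`, the plasmid names, and the helper functions are hypothetical, and the identity-based grouping of proteins into families is not shown. It is an illustration of the thresholding idea only, not the pipeline described in Materials and Methods.

```python
from itertools import combinations

def shared_gene_counts(plasmid_families):
    """Return {(plasmid_a, plasmid_b): number of shared gene families}."""
    counts = {}
    for a, b in combinations(sorted(plasmid_families), 2):
        n_shared = len(plasmid_families[a] & plasmid_families[b])
        if n_shared:
            counts[(a, b)] = n_shared
    return counts

def split_networks(counts, threshold=5):
    """Split plasmid pairs into the >=threshold and <threshold networks."""
    high = {pair for pair, n in counts.items() if n >= threshold}
    low = {pair for pair, n in counts.items() if n < threshold}
    return high, low

def disconnected(plasmids, edges):
    """Plasmids that take part in no edge of the given network."""
    linked = {p for pair in edges for p in pair}
    return sorted(set(plasmids) - linked)

# Hypothetical toy data: plasmid name -> set of gene-family identifiers.
toy = {
    "pA": {1, 2, 3, 4, 5, 6},
    "pB": {1, 2, 3, 4, 5, 9},
    "pC": {1, 20},
    "pD": {30},
}
counts = shared_gene_counts(toy)
high_edges, low_edges = split_networks(counts)
singlets = disconnected(toy, high_edges)
```

In this toy example, only pA and pB are linked in the ≥5 network, whereas pC is connected solely through single-gene links and pD shares nothing, so both would be counted as disconnected at the five-gene threshold.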

Louvain clustering of the <5 networks, although producing a large fraction of taxonomically highly coherent groups, resulted in slightly more heterogeneous plasmid clustering than that obtained from the ≥5 networks (fig. 2B and C). This suggests that taxonomic barriers are bypassed more frequently in the transfer of single genes or small groups of genes than in the movement of larger sets of genes. In agreement with the previous congruency analysis, a deeper analysis of the phylogenetic coherence of the gene families within the major network clusters (adopting the coherence analysis pipeline described in Materials and Methods) revealed a high degree of incongruence (data not shown). Hence, according to the overall body of data presented here, the sharing of relatively small gene sets appears to be more abundant and to span larger phylogenetic distances than transfers of larger sets of genes, although most of this genetic exchange still happens within the boundaries of microbial phylogenetic classes.

Network Comparison

To explore the differences among the networks, we computed Pearson product–moment correlation coefficients between betweenness and degree values for each node (i.e., plasmid) (fig. 5). The data revealed a low positive correlation between betweenness and degree in each of the networks, independently of the nucleic acid identity thresholds and/or the functions shared among the different plasmids (virulence or AR genes). R² values range between 0.25 and 0.36 for the 70% networks and are slightly higher for the 95% networks (ranging from 0.29 to 0.44). Accordingly, node degree explains less than half of the variation in node betweenness, regardless of the timing of the gene transfer(s) (70% vs. 95% thresholds) and/or of the functions that are transferred (virulence vs. AR determinants); the values are most likely determined by the mobile nature of the plasmids themselves.
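As an illustration of this comparison, the following Python sketch computes node degree, unnormalized betweenness (using Brandes' algorithm for unweighted, undirected graphs), and the Pearson correlation between the two on a small made-up graph. The adjacency list `toy_adj` and all function names are hypothetical; the normalization and software actually used for the values in tables 1 and 2 may differ, so this should be read as a sketch rather than a reproduction of the analysis.

```python
from collections import deque
from math import sqrt

def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted undirected graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # every undirected path was counted once from each endpoint
    return {v: b / 2 for v, b in bc.items()}

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy graph: a short chain a-b-c-d with an extra leaf e on b.
toy_adj = {"a": ["b"], "b": ["a", "c", "e"], "c": ["b", "d"], "d": ["c"], "e": ["b"]}
deg = degree(toy_adj)
btw = betweenness(toy_adj)
nodes = sorted(toy_adj)
r = pearson_r([deg[n] for n in nodes], [btw[n] for n in nodes])
r_squared = r ** 2
```

On such a tiny graph the two measures are almost perfectly correlated; the much lower R² values reported above indicate that, in the real networks, many plasmids of modest degree nonetheless lie on a large number of shortest paths.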

FIG. 5.

Dependency of plasmid betweenness on plasmid degree for the major networks built in this work, according to Pearson's product–moment correlation coefficient. Networks of <5 and ≥5 connections are indicated as low and high, respectively.

Analysis of Mobilizable and Conjugative Plasmids

Conjugative plasmids have been defined as “vessels” of the communal gene pool (Norman et al. 2009). Indeed, this class of plasmids is able to “visit” different cells and, in principle, to undergo genetic rearrangements (such as homologous recombination) with other plasmids and/or other informative molecules (phage genomes and chromosomes). For this reason, conjugative plasmids might be expected to occupy a more central position within the overall plasmid gene-sharing network than plasmids that are not mobilizable. To test this hypothesis, all the tra- and mob-like sequences of the plasmids were eliminated from the networks, and the centrality measures of conjugative/mobilizable plasmids were evaluated. Plasmid mobility was estimated from the number of mob and tra genes that each plasmid harbors (an approach similar to that adopted in Smillie et al. 2010 and described in Materials and Methods). The relationship between mobility and the network measures was investigated by comparing the distributions of the centrality measures of mobile and nonmobile plasmids. These distributions are presented in figure 6; the centrality measures are significantly higher for mobile plasmids in the networks of all genes and of resistance genes (P values according to Mann–Whitney tests are presented in fig. 6). Therefore, the presence of mob or tra genes is associated with significantly higher centrality in the networks of all genes and AR genes. This suggests that plasmid mobility is an important mechanism in spreading various genetic traits, including AR genes, within the plasmid community. This fully agrees with the central role inferred for conjugative plasmids in the context of bacterial evolution (Norman et al. 2009) and gives further support to the idea that these particular plasmids act as vessels of the communal gene pool. This also indicates that the high incidence of high degree and betweenness values in certain phylogenetic classes (such as Gammaproteobacteria) does not result only from their overrepresentation in the current data set but is also affected by the genetic properties of their plasmids.
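The comparison of centrality distributions between mobile and nonmobile plasmids can be sketched in Python as follows. This is a minimal, assumption-laden version of a two-sided Mann–Whitney test that uses the normal approximation with a tie correction and no continuity correction; the function name and the toy centrality values are hypothetical, and the statistical software actually used for figure 6 may compute exact P values or apply different corrections.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P value from the normal approximation).

    Ties receive average ranks and the variance is corrected for ties.
    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i                           # size of this tie group
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    z = (u1 - mu) / sigma
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u1, p_two_sided

# Hypothetical toy centrality values (e.g., node degree) for two plasmid groups.
mobile_degree = [5, 6, 7, 8]        # plasmids carrying at least one tra/mob gene
nonmobile_degree = [1, 2, 3, 4]     # plasmids with no tra/mob gene
u_stat, p_value = mann_whitney_u(mobile_degree, nonmobile_degree)
```

With real data, the two input lists would hold the degree or betweenness values of the plasmids classified as mobile or nonmobile according to their tra/mob gene counts.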

FIG. 6.

The relationship between the network centrality measures and plasmid mobility. The mobile plasmids are significantly more central in the networks of all and resistance genes, as indicated by the P values (calculated with Mann–Whitney tests) embedded in the figure.

Gene Sharing over Phylogenetic Classes

The importance of plasmids within the complex microbial evolutionary network resides also in the capability to connect microbes separated by a (more or less) long phylogenetic distance and to overcome the various barriers to HGT (Thomas and Nielsen 2005). The occurrence of gene sharing over phylogenetic classes was enumerated and visualized in figure 7.

FIG. 7.

The frequency of interclass gene transfer events in the networks. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).

Interestingly, some connections in the network span very large phylogenetic distances. For example, we found connections linking Alphaproteobacteria and Cyanobacteria, in particular plasmid pCC7120beta from Nostoc sp. PCC 7120 with plasmid pBBta01 from Bradyrhizobium sp. BTAi1, and plasmid pCC7120gamma from Nostoc sp. PCC 7120 with plasmid pNGR234b from Sinorhizobium fredii NGR234. These connections suggest the occurrence of HGT among microorganisms inhabiting very different ecological niches (multiple and host associated for Cyanobacteria and Alphaproteobacteria, respectively), involving genes linked to important functions, such as copper transport and transcriptional regulation, respectively. Remarkably, interkingdom transfers (involving chemotaxis-related genes) were also observed, for example, the connections linking plasmid pH308197_258 from Bacillus cereus H3081.97 to plasmid pHmuk01 from Halomicrobium mukohataei DSM 12286. In this case too, the microorganisms belong to likely unrelated habitats (multiple and specialized, respectively).

However, because the number of interclass connections is likely to be strongly affected by sampling biases, we tested the significance of the observed interclass connections by random permutation of the original network, as described in Materials and Methods. In the 70% identity network, interclass links included connections between more closely related microorganisms (e.g., connections between Alpha, Beta, and Gammaproteobacteria and between Bacilli and Lactobacilli) as well as connections between more distantly related microorganisms (i.e., Actinobacteria and Betaproteobacteria, Actinobacteria and Gammaproteobacteria, and Alphaproteobacteria and Deinococci). However, some closely related microorganisms possessed fewer connections than expected by chance (e.g., Alphaproteobacteria and Gammaproteobacteria, P value < 1 × 10⁻⁴), possibly indicating a genetic incompatibility between these groups (Thomas and Nielsen 2005). As might be expected, in the 95% network the number of observed connections decreased, and mainly closely related taxonomic groups remained interconnected (Bacilli–Lactobacilli and Betaproteobacteria–Gammaproteobacteria [P value < 1 × 10⁻⁴] among the overrepresented connections and Alphaproteobacteria–Gammaproteobacteria and Bacilli–Gammaproteobacteria among the underrepresented ones [P value < 1 × 10⁻⁴]). Notably, the connection between the distantly related Gammaproteobacteria and Actinobacteria also remained strong.
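A permutation test of this kind can be sketched in Python as shown below. The sketch assumes one particular randomization scheme, namely shuffling the phylogenetic class labels among plasmids while keeping the network edges fixed; the randomization described in Materials and Methods may differ, and the toy network, class assignments, and function names are hypothetical.

```python
import random

def count_interclass(edges, cls, pair):
    """Number of edges joining the two classes named in pair."""
    wanted = set(pair)
    return sum(1 for u, v in edges if {cls[u], cls[v]} == wanted)

def permutation_test(edges, cls, pair, n_perm=10000, seed=0):
    """Empirical P values for over- and underrepresentation of interclass edges.

    Assumption: the null model shuffles class labels among plasmids and
    leaves the edges untouched.
    """
    rng = random.Random(seed)
    observed = count_interclass(edges, cls, pair)
    nodes = list(cls)
    labels = [cls[n] for n in nodes]
    at_least = at_most = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = dict(zip(nodes, labels))
        c = count_interclass(edges, shuffled, pair)
        at_least += c >= observed
        at_most += c <= observed
    # add-one correction keeps the empirical P values above zero
    p_over = (at_least + 1) / (n_perm + 1)
    p_under = (at_most + 1) / (n_perm + 1)
    return observed, p_over, p_under

# Hypothetical toy network: every Actinobacterial plasmid is linked to every
# Gammaproteobacterial plasmid, and there are no within-class links.
acti = ["a1", "a2", "a3", "a4"]
gamma = ["g1", "g2", "g3", "g4"]
toy_cls = {p: "Actinobacteria" for p in acti}
toy_cls.update({p: "Gammaproteobacteria" for p in gamma})
toy_edges = [(a, g) for a in acti for g in gamma]
observed, p_over, p_under = permutation_test(
    toy_edges, toy_cls, ("Actinobacteria", "Gammaproteobacteria"), n_perm=2000, seed=1
)
```

A connection would be marked with a plus sign when `p_over` falls below the chosen significance level and with a minus sign when `p_under` does; with 10,000 permutations, the smallest attainable empirical P value under this add-one scheme is approximately 1 × 10⁻⁴.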

As noted in the case of gene transfers among phylogenetically incoherent groups (supplementary material S7, Supplementary Material online), the majority of shared genes code for functions related to the process of HGT itself and generally belong to the L category of the COG annotation (fig. 8). Nevertheless, other functions are also exchanged, as indicated by gene sharing (fig. 8), underlining the key role of plasmids in spreading important biological traits throughout the whole microbial world.

FIG. 8.

COG functional annotation of the genes shared by the plasmids belonging to the different taxonomical classes of the data set.

Gene Transfer between Actinobacteridae and Gammaproteobacteria

According to the results presented in figure 7, the gene sharing between Actinobacteria and Gammaproteobacteria spans one of the longest phylogenetic distances within our networks (supplementary material S8, Supplementary Material online) and appears to be crucial in transferring AR genes. Furthermore, most of the shared genes are at least 95% similar; therefore, under the molecular clock hypothesis, the transfer between these classes is likely to have occurred recently. For this reason, we further analyzed this apparently preferential gene flow.

To better characterize the gene sharing between Actinobacteria and Gammaproteobacteria, we selected representative plasmids with a high number of genes shared between the two classes and visualized them as a circular ideogram showing resistance-, conjugation-, and transposition-related genes together with the gene-sharing events (fig. 9). The analysis of figure 9 suggested that AR genes are transferred between the plasmids by transposition, as most of the links connecting Actinobacteria and Gammaproteobacteria fall in plasmid regions embedding AR- and/or transposition-related genes. These results indicate the presence of a clinically important gene flow between representatives of these microbial groups but do not reveal the direction of the transfers (i.e., from Actinobacteria to Gammaproteobacteria or vice versa). To shed some light on this point, we investigated the composition of the plasmids involved under the assumption that, if the HGT events are recent (as suggested by the high amino acid identity), the transferred genes are expected to have a GC content closer to that of the donor plasmids than to that of the recipient plasmids (Karlin 2001). Hence, the GC content of the Actinobacterial and Gammaproteobacterial plasmids and genes was calculated and compared (supplementary material S9, Supplementary Material online). The Actinobacterial plasmid GC content (mean 0.56 from seven plasmids) was significantly higher (P value = 9.4 × 10⁻³ according to a Mann–Whitney test) than the Gammaproteobacterial GC content (mean 0.51 from 95 plasmids). Moreover, GC contents were calculated for the individual transferred genes and compared with those of the plasmids. According to a Mann–Whitney test, the GC content of the transferred genes differs significantly from that of the Gammaproteobacterial plasmids (P = 7.0 × 10⁻¹⁵) but not from that of the Actinobacterial plasmids (P = 0.42). Accordingly, the whole body of data presented in this section suggests that the direction of gene transfer is very likely from Actinobacteria to Gammaproteobacteria. This is consistent with the knowledge that some Actinobacteria are natural producers of antibiotic compounds and, therefore, a potential source of AR genes for human pathogens (Wright 2007; Miao and Davies 2010).
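The GC-content reasoning can be illustrated with a short Python sketch. It computes the GC fraction of a sequence and reports which group of plasmids has a mean GC content closest to that of a transferred gene. This nearest-mean comparison is a deliberate simplification of the Mann–Whitney comparison of distributions described above, and the sequence, the group GC values, and the function names are hypothetical toy inputs rather than data from supplementary material S9.

```python
def gc_fraction(seq):
    """GC fraction of a nucleotide sequence, ignoring symbols other than A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G, or T")
    return gc / (gc + at)

def mean(values):
    """Arithmetic mean of a non-empty collection."""
    values = list(values)
    return sum(values) / len(values)

def closest_group(gene_gc, group_gc):
    """Name of the group whose mean plasmid GC fraction is closest to gene_gc."""
    return min(group_gc, key=lambda g: abs(mean(group_gc[g]) - gene_gc))

# Hypothetical toy inputs: GC fractions of plasmids in each class and one gene.
toy_group_gc = {
    "Actinobacteria": [0.56, 0.58],
    "Gammaproteobacteria": [0.50, 0.52],
}
toy_gene = "GCGCGCATGC"
toy_gene_gc = gc_fraction(toy_gene)
likely_donor = closest_group(toy_gene_gc, toy_group_gc)
```

Under the assumption stated above (recently transferred genes retain a composition close to that of their donor), the group returned by `closest_group` would be the more plausible donor for a single gene; conclusions about a whole set of transferred genes require a test on the distributions, as was done here.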

FIG. 9.

An ideogram of gene transfers between Actinobacterial plasmids (accession nos. NC_004939, NC_004945, and NC_014167) and Gammaproteobacterial plasmids (accession nos. NC_006816, NC_009141, NC_009651, NC_010488, NC_010886, and NC_011092). Gene-sharing events are marked using the curves in the middle of the ideogram. GC content of the plasmids is plotted on the outer side of the plasmid molecules if it is above the average of the GC content of the corresponding plasmid. Genes related to resistance, conjugation, and transposition are marked as lines on outer, middle, and innermost rings, respectively, on the inner side of the plasmid ring.

Conclusions

The use of gene-sharing networks as a tool to investigate microbial evolutionary relationships is rapidly expanding, especially for studying the nontree-like structures that can arise in evolution (Dagan et al. 2008; Halary et al. 2010). The power of such an approach is demonstrated here by revealing the relationships between biological properties (e.g., plasmid mobility) and network properties (e.g., plasmid centrality) in the gene-sharing network. Moreover, the approach applied here also revealed extensive AR gene sharing between Actinobacterial and Gammaproteobacterial plasmids, suggesting a potential source of AR genes that might have led to the recent emergence of antibiotic multiresistance in pathogenic organisms.

The plasmid sequences analyzed in this study were gathered in a nonsystematic manner from different sequencing projects; their sampling is therefore arbitrary and likely biased toward human pathogenic organisms. The bioinformatic workflow described here would be best suited to single-genome sequence data sets obtained from specifically selected environments. We expect such data sets to become available as DNA sequencing costs decrease and genome sequencing from single cells becomes a routine approach (Stepanauskas and Sieracki 2007; Rodrigue et al. 2009). The proposed approach could then be used to investigate whether the functional categories of transferred genes reflect the different selective patterns present in the given environment(s). Obtaining single-genome data sets from multiple different environments would therefore permit the evaluation and comparison of gene-sharing patterns in response to different environmental conditions.

The study was financially supported by the Academy of Finland (grant number 129873) and the Finnish Graduate School in Environmental Science and Technology (EnSTe). M.F. is financed by a postdoctoral grant from “Fondazione Adriano Buzzati-Traverso.” The authors would like to thank Kimmo Mattila for his kind assistance in parallel BLAST analyses.

References

Altschul

Madden

Schaffer

Zhang

Miller

Lipman

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Nucleic Acids Res.

1997

, vol.

(pg.

3389

3402

)

Baquero

Martinez

Canton

Antibiotics and antibiotic resistance in water environments

Curr Opin Biotechnol.

2008

, vol.

(pg.

260

265

)

Bastian

Heymann

Jacomy

Gephi: an open source software for exploring and manipulating networks [Internet]

International AAAI Conference on Weblogs and Social Media

2009

Available from: https://gephi.org/users/publications/

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Bennett

Plasmid encoded antibiotic resistance: acquisition and transfer of antibiotic resistance genes in bacteria

Br J Pharmacol.

2008

, vol.

153

Suppl 1

(pg.

S347

S357

)

Blondel

Guillaume

Lambiotte

Lefebvre

Fast unfolding of communities in large networks

J Stat Mech.

2008

pg.

P10008

Google Scholar

OpenURL Placeholder Text

WorldCat

Bosi

Fani

Fondi

The mosaicism of plasmids revealed by atypical genes detection and analysis

BMC Genomics

2011

, vol.

pg.

403

Brilli

Mengoni

Fondi

Bazzicalupo

Lio

Fani

Analysis of plasmid genes by phylogenetic profiling and visualization of homology relationships using Blast2Network

BMC Bioinformatics

2008

, vol.

pg.

551

Brohee

Faust

Lima-Mendez

Vanderstocken

van Helden

Network Analysis Tools: from biological networks to clusters and pathways

Nat Protoc.

2008

, vol.

(pg.

1616

1629

)

Caryl

O'Neill

Complete nucleotide sequence of pGO1, the prototype conjugative plasmid from the Staphylococci

Plasmid

2009

, vol.

(pg.

)

Chen

Nace

Solow

Fratamico

Complete nucleotide sequences of 84.5- and 3.2-kb plasmids in the multi-antibiotic resistant Salmonella enterica serovar Typhimurium U302 strain G8430

Plasmid

2007

, vol.

(pg.

)

Chen

Yang

Yao

Sun

Shen

Jin

VFDB: a reference database for bacterial virulence factors

Nucleic Acids Res.

2005

, vol.

(pg.

D325

D328

)

Cole

Chai

Farris

Wang

Kulam-Syed-Mohideen

McGarrell

Bandela

Cardenas

Garrity

Tiedje

The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data

Nucleic Acids Res.

2007

, vol.

(pg.

D169

D172

)

Cole

Wang

Cardenas

et al.

(11 co-authors)

The Ribosomal Database Project: improved alignments and new tools for rRNA analysis

Nucleic Acids Res.

2009

, vol.

(pg.

D141

D145

)

Crosa

Luttropp

Falkow

Nature of R-factor replication in the presence of chloramphenicol

Proc Natl Acad Sci U S A.

1975

, vol.

(pg.

654

658

)

Csardi

Nepusz

The igraph software package for complex network research

InterJournal Complex Systems

2006

Google Scholar

OpenURL Placeholder Text

WorldCat

Dagan

Artzy-Randrup

Martin

Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution

Proc Natl Acad Sci U S A.

2008

, vol.

105

(pg.

10039

10044

)

Dagan

Martin

Getting a better picture of microbial evolution en route to a network of genomes

Philos Trans R Soc Lond B Biol Sci.

2009

, vol.

364

(pg.

2187

2196

)

Dagan

Roettger

Bryant

Martin

Genome networks root the tree of life between prokaryotic domains

Genome Biol Evol.

2010

, vol.

(pg.

379

392

)

DeSantis

Hugenholtz

Keller

Brodie

Larsen

Piceno

Phan

Andersen

NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes

Nucleic Acids Res.

2006

, vol.

(pg.

W394

W399

)

Edgar

MUSCLE: a multiple sequence alignment method with reduced time and space complexity

BMC Bioinformatics

2004

, vol.

pg.

113

Felsestein

PHYLIP—phylogenetic inference package (version 3.2)

Cladistics

1989

, vol.

(pg.

164

166

)

Google Scholar

OpenURL Placeholder Text

WorldCat

Fondi

Bacci

Brilli

Papaleo

Mengoni

Vaneechoutte

Dijkshoorn

Fani

Exploring the evolutionary dynamics of plasmids: the Acinetobacter pan-plasmidome

BMC Evol Biol.

2010

, vol.

pg.

Fondi

Fani

The horizontal flow of the plasmid resistome: clues from inter-generic similarity networks

Environ Microbiol.

2010

, vol.

(pg.

3228

3242

)

Friedberg

Automated protein function prediction—the genomic challenge

Brief Bioinformatics

2006

, vol.

(pg.

225

242

)

Froehlich

Parkhill

Sanders

Quail

Scott

The pCoo plasmid of enterotoxigenic Escherichia coli is a mosaic cointegrate

J Bacteriol.

2005

, vol.

187

(pg.

6509

6516

)

Hacker

Kaper

Pathogenicity islands and the evolution of microbes

Annu Rev Microbiol.

2000

, vol.

(pg.

641

679

)

Halary

Leigh

Cheaib

Lopez

Bapteste

Network analyses structure genetic diversity in independent genetic worlds

Proc Natl Acad Sci U S A.

2010

, vol.

107

(pg.

127

132

)

Huson

Bryant

Application of phylogenetic networks in evolutionary studies

Mol Biol Evol.

2006

, vol.

(pg.

254

267

)

Karlin

Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes

Trends Microbiol.

2001

, vol.

(pg.

335

343

)

Kohiyama

Hiraga

Matic

Radman

Bacterial sex: playing voyeurs 50 years later

Science

2003

, vol.

301

(pg.

802

803

)

Krzywinski

Schein

Birol

Connors

Gascoyne

Horsman

Jones

Marra

Circos: an information aesthetic for comparative genomics

Genome Res.

2009

, vol.

(pg.

1639

1645

)

Le Roux

Labreuche

Davis

Iqbal

Mangenot

Goarant

Mazel

Waldor

Virulence of an emerging pathogenic lineage of Vibrio nigripulchritudo is dependent on two plasmids

Environ Microbiol.

2010

, vol.

(pg.

296

306

)

Leigh

Schliep

Lopez

Bapteste

Let them fall where they may: congruence analysis in massive, phylogenetically messy datasets

Mol Biol Evol.

2011

, vol.

(pg.

2773

2785

)

Leplae

Hebrant

Wodak

Toussaint

ACLAME: a CLAssification of Mobile genetic Elements

Nucleic Acids Res.

2004

, vol.

(pg.

D45

D49

)

Lima-Mendez

Toussaint

Leplae

Analysis of the phage sequence space: the benefit of structured information

Virology

2007

, vol.

365

(pg.

241

249

)

Lima-Mendez

Van Helden

Toussaint

Leplae

Reticulate representation of evolutionary and functional relationships between phage genomes

Mol Biol Evol.

2008

, vol.

(pg.

762

777

)

Liu

Pop

ARDB—Antibiotic Resistance Genes Database

Nucleic Acids Res.

2009

, vol.

(pg.

D443

D447

)

Medini

Donati

Tettelin

Masignani

Rappuoli

The microbial pan-genome

Curr Opin Genet Dev.

2005

, vol.

(pg.

589

594

)

Miao

Davies

Actinobacteria: the good, the bad, and the ugly

Antonie Van Leeuwenhoek

2010

, vol.

(pg.

143

150

)

Norman, Hansen, Sorensen. Conjugative plasmids: vessels of the communal gene pool. Philos Trans R Soc Lond B Biol Sci. 2009, vol. 364 (pg. 2275-2289)

Osborn, da Silva Tatley, Steyn, Pickup, Saunders. Mosaic plasmids and mosaic replicons: evolutionary lessons from the analysis of genetic diversity in IncFII-related replicons. Microbiology. 2000, vol. 146 Pt 9 (pg. 2267-2275)

R Development Core Team. R: a language and environment for statistical computing. 2010 [Internet]. Available from: http://cran.r-project.org/doc/FAQ/R-FAQ.html#Citing-R


Reynaud, Saulnier, Mazel, Goarant, Le Roux. Correlation between detection of a plasmid and high-level virulence of Vibrio nigripulchritudo, a pathogen of the shrimp Litopenaeus stylirostris. Appl Environ Microbiol. 2008, vol. (pg. 3038-3047)

Riley, Gordon. The ecological role of bacteriocins in bacterial competition. Trends Microbiol. 1999, vol. (pg. 129-133)

Rodrigue, Malmstrom, Berlin, Birren, Henn, Chisholm. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One. 2009, vol. pg. e6864

Schluter, Krause, Szczepanowski, Goesmann, Puhler. Genetic diversity and composition of a plasmid metagenome from a wastewater treatment plant. J Biotechnol. 2008, vol. 136 (pg. )

Smillie, Garcillan-Barcia, Francia, Rocha, de la Cruz. Mobility of plasmids. Microbiol Mol Biol Rev. 2010, vol. (pg. 434-452)

Stepanauskas, Sieracki. Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time. Proc Natl Acad Sci U S A. 2007, vol. 104 (pg. 9052-9057)

Thomas, Nielsen. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 2005, vol. (pg. 711-721)

Toussaint, Merlin. Mobile elements as a combination of functional modules. Plasmid. 2002, vol. (pg. )

van Rhijn, Vanderleyden. The Rhizobium-plant symbiosis. Microbiol Rev. 1995, vol. (pg. 124-142)


Vlasblom, Superina, Liu, Orsi, Wodak. GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics. 2006, vol. (pg. 2178-2179)

Wickham. ggplot2: elegant graphics for data analysis. 2009. New York: Springer


Wright. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Microbiol. 2007, vol. (pg. 175-186)

Hugenholtz, Mavromatis, et al. (34 co-authors). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature. 2009, vol. 462 (pg. 1056-1060)

Yang, Chen, Sun, Jin. VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics. Nucleic Acids Res. 2008, vol. (pg. D539-D542)

Liu, et al. (15 co-authors). Sequence analysis of pKF3-70 in Klebsiella pneumoniae: probable origin from R100-like plasmid of Escherichia coli. PLoS One. 2010, vol. pg. e8601

Zienkiewicz, Kern-Zdanowicz, Golebiewski, Zylinska, Mieczkowski, Gniadkowski, Bardowski, Ceglowski. Mosaic structure of p1658/97, a 125-kilobase plasmid harboring an active amplicon with the extended-spectrum beta-lactamase gene blaSHV-5. Antimicrob Agents Chemother. 2007, vol. (pg. 1164-1171)

Author notes

Associate editor: James McInerney


