Abstract

Plasmids are vessels of genetic exchange in microbial communities. They are known to transfer between different host organisms and to acquire diverse genetic elements from chromosomes and/or other plasmids. Therefore, they constitute an important element in microbial evolution by rapidly disseminating various genetic properties among different communities. A paradigmatic example of this is the dissemination of antibiotic resistance (AR) genes, which has resulted in the emergence of multiresistant pathogenic bacterial strains. To globally analyze the evolutionary dynamics of plasmids, we built a large graph in which 2,343 plasmids (nodes) are connected according to the proteins they share. The analysis of this gene-sharing network revealed an overall coherence between network clustering and the phylogenetic classes of the corresponding microorganisms, likely resulting from genetic barriers to horizontal gene transfer between distant phylogenetic groups. Habitat was not a crucial factor in clustering, as plasmids from organisms inhabiting different environments were often found embedded in the same cluster. Analyses of network metrics revealed a statistically significant correlation between plasmid mobility and centrality within the network, supporting the observation that mobile plasmids are particularly important in spreading genes in microbial communities. Finally, our study reveals an extensive (and previously undescribed) sharing of AR genes between Actinobacteria and Gammaproteobacteria, suggesting that the former might represent an important reservoir of AR genes for the latter.

Introduction

Plasmids are paradigmatic examples of the network-like structure of microbial evolution (Brilli et al. 2008). Indeed, they are among the most important players in prokaryotic evolution because they can be transferred between microorganisms, thus acting as natural vectors for genes and the functions they encode (Norman et al. 2009). Accordingly, they often provide a basis for genomic rearrangements via homologous recombination, facilitating the loss and/or acquisition of genes during these events, which may eventually lead to horizontal gene transfer (HGT). As a consequence, plasmids possess a mosaic structure: collections of functional genetic modules, each likely with an independent phylogenetic history, organized into a stable, self-replicating entity (Osborn et al. 2000; Toussaint and Merlin 2002; Bosi et al. 2011). Importantly, these functional blocks often carry genes that can have a great impact on the metabolic functions of the host cell, providing additional traits that can be accumulated without altering the gene content of the bacterial chromosome (Fondi et al. 2010). Plasmids are indeed involved in many accessory functions and constitute, together with “nonessential” chromosomal regions, what is referred to as the “dispensable genome” in the microbial pan-genome concept (Medini et al. 2005). This dispensable genome, in turn, can include genes for ecologically important traits, such as antibiotic resistance (AR) (Crosa et al. 1975), pathogen virulence (Hacker and Kaper 2000), symbiotic nitrogen fixation (van Rhijn and Vanderleyden 1995), and the production of allelopathic bacteriocins (Riley and Gordon 1999). Among these processes, pathogenesis and AR have been the most extensively explored so far. Indeed, the presence of plasmids has been shown to be strictly linked to the emergence of pathogenic lineages within a given taxonomic unit (Reynaud et al. 2008; Le Roux et al. 2010).
Similarly, in terms of AR, plasmids play a central role as vehicles for the capture of resistance genes and their subsequent spread (Bennett 2008; Fondi and Fani 2010). The dissemination of these traits represents one of the most important effects of “bacterial sex,” from both an evolutionary and an ecological viewpoint (Kohiyama et al. 2003). In this context, plasmid mobility is an essential parameter of microbial fitness, and it might also be key to understanding the epidemiology of these plasmid-borne traits (Smillie et al. 2010). However, despite their clear biological relevance, the pathways followed by plasmids during their evolutionary history remain largely obscure.

Nowadays, the use of massive plasmid sequencing as a routine laboratory technique (Schluter et al. 2008), together with the development of bioinformatics tools that visualize sequence homology relationships as similarity networks (Vlasblom et al. 2006; Brilli et al. 2008), can greatly speed up studies of gene mobility among plasmids. Furthermore, with the spread of network-oriented representations of sequence similarity relationships (Lima-Mendez et al. 2007; Brilli et al. 2008; Dagan and Martin 2009; Dagan et al. 2010; Fondi and Fani 2010; Fondi et al. 2010; Halary et al. 2010), graph theory measures have been applied to better describe gene flow across diverse microbial communities, paving the way for large-scale comparative analyses based on bioinformatics strategies. In more detail, adopting a gene-sharing network approach, Dagan et al. (2008) constructed and analyzed graphs capturing both the vertical and lateral components of the evolutionary history of 539,723 genes distributed across 181 sequenced prokaryotic genomes. The same authors estimated that a large fraction (almost 80% on average) of the gene content of each analyzed genome had been involved in lateral gene transfer at some point in its evolution. More recently, Halary et al. (2010) studied the centralities of a network embedding 119,381 homologous DNA families and showed that plasmids, rather than viruses, are likely the key vectors of genetic exchange between bacterial chromosomes. Their results also supported a disconnected yet highly structured network of genetic diversity, revealing the existence of multiple “genetic worlds.” From the analysis of the same network, these authors also inferred that DNA pools mostly circulate between vehicles (i.e., plasmids, phages, and chromosomes) of the same type. Finally, Lima-Mendez et al. (2008) represented relationships across the phage population as a weighted graph in which nodes represented phages and edges represented phage–phage similarities in gene content. Their approach captured the pervasive mosaicism of phage genomes, highlighting the importance of horizontal gene exchange in their evolution, and also proved to be a promising tool for predicting the lifestyles of individual phages from sequence data.

By applying a computational network-oriented pipeline, we analyzed the evolutionary relationships among 2,343 microbial plasmids in order to explore the role of each of them in the reticulate evolutionary dynamics of this class of mobile genetic elements. Moreover, we focused on the proteins involved in two main biological processes, namely AR and pathogenesis, as well as on plasmid features that might govern the overall network of plasmid-mediated HGT (e.g., plasmid mobility). The data obtained provide clues toward a systemic interpretation of the overall behavior of plasmids in bacterial evolution and in the spread of key biological features such as AR and virulence.

Materials and Methods

Data Sets Assembly

All available complete plasmid sequences (in GenBank format) were downloaded from NCBI using the EFetch interface (as of 24 July 2010). In total, 2,343 plasmids (102,772 open reading frames) were retrieved; a complete table of their main features (size, taxonomy, accession codes, etc.) is available as supplementary material S1 (Supplementary Material online). Two subsets of sequences were then created from the whole plasmid sequence data set. First, we created a set of plasmid-encoded proteins involved in AR. This was done by using each of the retrieved sequences as a query in a basic local alignment search tool (BLAST) (Altschul et al. 1997) search against the Antibiotic Resistance DataBase (ARDB) (Liu and Pop 2009) with the following parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 amino acids (aa). These thresholds correspond to a degree of sequence similarity considered sufficiently high to retrieve proteins likely to perform an AR-related function (Friedberg 2006; Fondi and Fani 2010). In this way, 2,678 sequences putatively associated with AR were retrieved (for the complete list of accession codes of the proteins used in this work, see supplementary material S2, Supplementary Material online). These sequences belonged to 501 different plasmids.

The same strategy, with the same parameters, was applied to search for virulence-related proteins (virulence factors, VFs) in the whole plasmid sequence data set. In this case, the probed database was the Virulence Factor DataBase (VFDB) (Chen et al. 2005; Yang et al. 2008), and 7,840 sequences (belonging to 615 plasmids) were retrieved from this BLAST search. Again, all information about these sequences is available as supplementary material S3 (Supplementary Material online).
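Both database searches reduce to filtering BLAST hits by e value and alignment length. The minimal Python sketch below assumes tabular BLAST output (the 12 default columns of BLAST+ `-outfmt 6`, equivalent to legacy `-m 8`) and a hypothetical `protein_to_plasmid` mapping built from the GenBank records; it illustrates the filtering logic and is not the authors' own script. Treating both thresholds as inclusive bounds is also an assumption.

```python
from typing import Dict, Iterable, Set

# Column indices in the 12-column BLAST tabular format (-outfmt 6 / -m 8).
QSEQID, LENGTH, EVALUE = 0, 3, 10


def significant_queries(blast_lines: Iterable[str],
                        max_evalue: float = 1e-20,
                        min_aln_len: int = 50) -> Set[str]:
    """Return query protein IDs with at least one hit passing both thresholds."""
    hits: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        cols = line.rstrip("\n").split("\t")
        if float(cols[EVALUE]) <= max_evalue and int(cols[LENGTH]) >= min_aln_len:
            hits.add(cols[QSEQID])
    return hits


def plasmids_with_hits(queries: Set[str],
                       protein_to_plasmid: Dict[str, str]) -> Set[str]:
    """Map retained proteins back to the plasmids that encode them."""
    return {protein_to_plasmid[q] for q in queries if q in protein_to_plasmid}


if __name__ == "__main__":
    demo = [
        "pA_1\tardb_x\t85.0\t120\t10\t0\t1\t120\t1\t120\t1e-40\t250",
        "pA_2\tardb_y\t90.0\t40\t2\t0\t1\t40\t1\t40\t1e-25\t90",  # alignment too short
    ]
    kept = significant_queries(demo)
    print(kept, plasmids_with_hits(kept, {"pA_1": "pA", "pA_2": "pA"}))
```

The same two functions would serve for the ARDB and VFDB searches, changing only the BLAST output file.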

Network Construction

The network construction workflow described in this section was applied to each of the three assembled data sets, that is, the one embedding all retrieved plasmid sequences (hereinafter the “all sequences network”), the one embedding the AR-related sequences (the “resistance network”), and the one embedding the VF-related sequences (the “virulence network”).

In detail, each sequence data set was used in an all-against-all BLAST search (Altschul et al. 1997) on the Murska parallel computing cluster (Center for Scientific Computing, Espoo, Finland). The BLAST output was parsed with ad hoc Python scripts at two identity thresholds (70% and 95%), yielding two files: one containing the sequences sharing at least 70% identity and one containing those sharing at least 95% identity. Similarly to Dagan et al. (2008) and, later, Halary et al. (2010), this allows the resulting networks to be interpreted under a molecular clock–based assumption, that is, under the hypothesis that proteins with the highest percentages of identity were likely shared more recently than those with lower identity. In the present context, proteins with 95% identity were considered more recently shared than those with 70%.
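The two-threshold parsing step can be pictured with the Python sketch below. It assumes 12-column BLAST tabular output (BLAST+ `-outfmt 6`); the default `min_aln_len=100` aa is our assumed translation of the 300-bp length criterion described in the next paragraph, and discarding self-hits is likewise an assumption. Function and parameter names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def pairs_by_identity(blast_lines: Iterable[str],
                      thresholds: Tuple[float, ...] = (70.0, 95.0),
                      min_aln_len: int = 100) -> Dict[float, Set[Pair]]:
    """Group unordered protein pairs by the identity thresholds they satisfy.

    min_aln_len (in aa) is an assumed stand-in for the 300-bp criterion;
    self-hits (query == subject) are discarded.
    """
    result: Dict[float, Set[Pair]] = {t: set() for t in thresholds}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        query, subject = cols[0], cols[1]
        pident, aln_len = float(cols[2]), int(cols[3])
        if query == subject or aln_len < min_aln_len:
            continue
        # Store each pair in a canonical order so A-B and B-A collapse.
        pair = (query, subject) if query < subject else (subject, query)
        for t in thresholds:
            if pident >= t:  # a 95% pair is, by construction, also a 70% pair
                result[t].add(pair)
    return result
```

Because the 95% set is nested inside the 70% set, the two resulting networks can be compared directly, which is what the molecular clock–based reading relies on.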

Subsequently, each of these parsed BLAST outputs was transformed into a gene-sharing network and visualized using the Gephi visualization program (Bastian et al. 2009). In each network, a node represents a single plasmid, and two plasmids are linked on the basis of their shared protein content. In particular, sharing is defined by a BLAST match between two reading frames longer than 300 bp with 95% or 70% amino acid identity, respectively, and therefore represents an absolute measure. To investigate the dynamics of plasmids among bacterial cells, we applied a further filter to each of the obtained graphs, retaining only edges representing at least five shared proteins and discarding all connections between plasmids with fewer shared proteins. Conversely, to investigate the dynamics of individual genes or small gene clusters within the plasmid population, we retained only edges representing fewer than five shared genes. Altogether, we obtained eight different networks: at 70% and 95% identity, for all sequences with five or more shared genes, for all sequences with fewer than five shared genes, and for the AR- and VF-related sequences. The Gephi-formatted network files are available as supplementary material S4 (Supplementary Material online).
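Turning thresholded protein pairs into plasmid-level networks, and splitting edges at five shared proteins, might look like the Python sketch below. The `protein_to_plasmid` dictionary is a hypothetical input, counting each distinct protein pair once (without collapsing paralogous hits) is our simplification, and the export to Gephi is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def plasmid_edges(protein_pairs: Iterable[Tuple[str, str]],
                  protein_to_plasmid: Dict[str, str]) -> Counter:
    """Count shared proteins for every pair of distinct plasmids.

    Each distinct protein pair adds one to the count; paralogous hits are not
    collapsed, which is a simplifying assumption of this sketch.
    """
    counts: Counter = Counter()
    for a, b in protein_pairs:
        pa, pb = protein_to_plasmid.get(a), protein_to_plasmid.get(b)
        if pa is None or pb is None or pa == pb:
            continue  # unknown protein or match within a single plasmid
        counts[(pa, pb) if pa < pb else (pb, pa)] += 1
    return counts


def split_by_sharing(counts: Counter, cutoff: int = 5):
    """Return (edges with >= cutoff shared proteins, edges with < cutoff)."""
    high = {edge: n for edge, n in counts.items() if n >= cutoff}
    low = {edge: n for edge, n in counts.items() if n < cutoff}
    return high, low
```

Keeping the shared-protein count on each edge, rather than a bare link, makes both the ≥5 and the <5 networks available from a single pass.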

Permutation Tests

To evaluate the statistical significance of the observed preferential gene flows (see below), we randomly permuted the phylogenetic affiliation of the nodes 10,000 times while keeping the original degree of each node in the network intact (randomization with node degree conservation; see Brohee et al. 2008). A P value was then obtained by counting the number of randomized networks that returned a number of links greater (or lower) than the observed one and dividing this number by the total number of permutations performed.
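The label-shuffling test can be sketched in Python as follows. Because only node labels are permuted over a fixed edge list, every node keeps its original degree. Using the number of links between two named taxa as the test statistic, and the function names, are illustrative assumptions rather than a reconstruction of the authors' scripts.

```python
import random
from typing import Dict, Iterable, List, Tuple


def links_between(edges: Iterable[Tuple[str, str]], labels: Dict[str, str],
                  group_a: str, group_b: str) -> int:
    """Number of edges joining a node of group_a to a node of group_b."""
    target = {group_a, group_b}  # a one-element set when group_a == group_b
    return sum(1 for u, v in edges if {labels[u], labels[v]} == target)


def permutation_test(edges: List[Tuple[str, str]], labels: Dict[str, str],
                     group_a: str, group_b: str, n_perm: int = 10_000,
                     seed: int = 0) -> Tuple[int, float, float]:
    """Shuffle taxon labels over a fixed topology (so every node keeps its
    degree); return (observed, P(random > observed), P(random < observed))."""
    rng = random.Random(seed)
    observed = links_between(edges, labels, group_a, group_b)
    nodes = list(labels)
    shuffled = [labels[n] for n in nodes]
    greater = lower = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        count = links_between(edges, dict(zip(nodes, shuffled)), group_a, group_b)
        if count > observed:
            greater += 1
        elif count < observed:
            lower += 1
    return observed, greater / n_perm, lower / n_perm
```

A small P(random > observed) points to more links between the two groups than expected from their degrees alone; a small P(random < observed) points to fewer.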

Estimation of Plasmid Mobility

Genes related to plasmid mobility were identified by BLAST analysis (parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 aa) of the plasmid-encoded amino acid sequences against a data set of tra and mob genes retrieved from the ACLAME database (http://aclame.ulb.ac.be/; Leplae et al. 2004). Since tra and mob genes are generally associated with plasmid mobility and conjugation, we defined a plasmid as mobile if it contained one or more mob or tra genes (a similar approach was recently adopted by Smillie et al. 2010).
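The mobility rule reduces to a per-plasmid count. In this Python sketch, `tra_mob_hits` is assumed to be the set of protein IDs that passed the BLAST thresholds against the ACLAME tra/mob set, and `plasmid_proteins` is a hypothetical mapping from each plasmid to its encoded proteins. The sketch does not attempt the conjugative versus mobilizable distinction reported in the tables.

```python
from typing import Dict, Iterable, Set, Tuple


def classify_mobility(plasmid_proteins: Dict[str, Iterable[str]],
                      tra_mob_hits: Set[str]) -> Dict[str, Tuple[int, bool]]:
    """Return {plasmid: (number of tra/mob proteins, is_mobile)}.

    A plasmid is called mobile when it encodes at least one protein with a
    significant hit to the tra/mob reference set.
    """
    result = {}
    for plasmid, proteins in plasmid_proteins.items():
        n_hits = sum(1 for p in proteins if p in tra_mob_hits)
        result[plasmid] = (n_hits, n_hits >= 1)
    return result
```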

Network Centralities, Statistics, and Visualization

Network centrality values for nodes were calculated using the igraph package in R (Csardi and Nepusz 2006). Network clustering was estimated using the Louvain algorithm implemented in Gephi (Blondel et al. 2008), by maximizing modularity while minimizing the number of clusters. All statistical tests of differences in degree and betweenness distributions and in GC content were performed using the base statistical functions of R (R Development Core Team 2010; http://www.r-project.org/). Data were plotted using the ggplot2 package for R (Wickham 2009). All other statistical analyses were performed using in-house Perl and Python scripts. Network clustering and gene sharing were visualized as ideograms using Circos (Krzywinski et al. 2009).
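Louvain clustering greedily maximizes Newman's modularity, Q = Σ_c [L_c/m − (d_c/2m)²], where m is the number of edges, L_c the number of edges inside cluster c, and d_c the summed degree of its nodes. The Python sketch below only scores a given partition of an unweighted network; it is not the Louvain optimization itself, and the Gephi implementation may add options not shown here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


def modularity(edges: List[Tuple[Hashable, Hashable]],
               community: Dict[Hashable, Hashable]) -> float:
    """Newman modularity of an unweighted, undirected partition."""
    m = len(edges)
    intra = defaultdict(int)       # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)  # d_c: sum of node degrees in cluster c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For two disjoint triangles, splitting them into two clusters gives Q = 0.5, whereas placing all six nodes in one cluster gives Q = 0, which is why an optimizer such as Louvain prefers the split.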

Estimation of the Phylogenetic Distances of Gene Sharing

The 16S rRNA sequences of plasmid hosts were downloaded from the Ribosomal Database Project (Cole et al. 2007, 2009) and aligned using the Nearest Alignment Space Termination (NAST) aligner provided by Greengenes (DeSantis et al. 2006). The matrix of phylogenetic distances was calculated using Phylip (Felsenstein 1989).
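Phylip's distance programs offer several substitution models, and the model used here is not stated. Purely to illustrate how an aligned 16S rRNA set becomes a distance matrix, the Python sketch below uses the Jukes–Cantor (1969) correction, d = −(3/4) ln(1 − 4p/3), where p is the proportion of differing sites among columns without gaps or ambiguity codes. Skipping such columns pairwise is an assumption of the sketch.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

NUCLEOTIDES = set("ACGT")


def jc69_distance(seq1: str, seq2: str) -> float:
    """Jukes-Cantor distance between two aligned sequences.

    Columns with a gap or ambiguity code in either sequence are ignored;
    U is treated as T so RNA and DNA alignments are handled alike.
    """
    s1 = seq1.upper().replace("U", "T")
    s2 = seq2.upper().replace("U", "T")
    compared = diffs = 0
    for a, b in zip(s1, s2):
        if a in NUCLEOTIDES and b in NUCLEOTIDES:
            compared += 1
            diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    if diffs == 0:
        return 0.0
    p = diffs / compared
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise distance matrix keyed by (name, name)."""
    dist = {(name, name): 0.0 for name in alignment}
    for a, b in combinations(sorted(alignment), 2):
        dist[(a, b)] = dist[(b, a)] = jc69_distance(alignment[a], alignment[b])
    return dist
```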

Estimation of Phylogenetic Coherence in Major Network Clusters

The Conclustador algorithm (Leigh et al. 2011) was applied to assess the congruence of phylogenetic trees reconstructed from the sequences of genes shared by plasmids belonging to the same network cluster. Gene families responsible for the connections among plasmids were extracted from the 70% and 95% networks and aligned using MUSCLE (Edgar 2004). Then, for each plasmid cluster, the resulting multiple sequence alignments were used as input for the phylogenetic coherence analysis with Conclustador (Leigh et al. 2011). Finally, SplitsTree4 (Huson and Bryant 2006) was used to visualize the phylogenetic information both within each single group identified by Conclustador and across all groups together (which, collectively, account for the plasmid interconnections shown in the networks of fig. 1). In both cases, supernetworks were inferred from single-gene phylogenetic analyses performed with RAxML with 1,000 bootstrap replicates.

FIG. 1.

The gene sharing between plasmids presented as matrices (A) and networks (B) at both 70% and 95% criteria. In network figures, plasmids are represented by the nodes (node size is proportional to the plasmid size) and the shared genes by the links. At least five shared genes are required to establish a link.

Because Conclustador requires data sets that are not too fragmented (about 80% of the overall taxa must be present in each multiple alignment), not all identified plasmid clusters could be reliably analyzed. Accordingly, only the major clusters of the 70% and 95% networks were analyzed (namely clusters 961, 993, 1,144, and 1,238 for the 70% network and 961, 993, and 1,144 for the 95% network). Interestingly, the widespread fragmentation found for most clusters in the data set might reflect their high heterogeneity, which, in turn, might mirror a high level of horizontal transfer of the genes they contain.
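The roughly 80% completeness requirement can be expressed as a simple check on a cluster's gene-family alignments. In this Python sketch each alignment is represented only by the set of taxa it contains, and requiring every alignment to pass is our reading of "each multiple alignment"; the function names are hypothetical.

```python
from typing import Dict, Iterable, Set


def taxon_coverage(alignment_taxa: Iterable[str], cluster_taxa: Set[str]) -> float:
    """Fraction of the cluster's taxa represented in one alignment."""
    return len(set(alignment_taxa) & cluster_taxa) / len(cluster_taxa)


def cluster_is_analyzable(alignments: Dict[str, Set[str]], cluster_taxa: Set[str],
                          min_coverage: float = 0.8) -> bool:
    """True when every gene-family alignment covers at least min_coverage of the taxa."""
    return all(taxon_coverage(taxa, cluster_taxa) >= min_coverage
               for taxa in alignments.values())
```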

Results and Discussion

Gene-Sharing Networks

Gene sharing between plasmids was visualized as a network in which plasmids are represented as vertices (or nodes) and gene sharing as edges (or links). Altogether, eight networks were constructed based on 70% and 95% identity between amino acid sequences and on different edge criteria, namely the number of shared genes (five or more, or fewer than five) and the sharing of AR or virulence genes (supplementary material S6, Supplementary Material online). The identity-based criterion used to set links allows the resulting networks to be interpreted under a molecular clock–based assumption, that is, under the hypothesis that sequences with the highest percentages of identity (e.g., 95%) were likely exchanged more recently than those with lower identity (e.g., 70%) (see, e.g., Halary et al. 2010). Data for the networks accounting for the sharing of five or more genes are reported in figure 1A and B. Overall, the plasmid network of all sequences at the 70% identity threshold (fig. 1B) exhibits one major connected component, some minor connected components, and a large number of disconnected plasmids (see below). The main connected component of the network of all genes (the central one in fig. 1B) embeds plasmids mainly belonging to the Proteobacteria phylum (particularly from the Gamma, Alpha, and Beta subdivisions). Interestingly, this component also contains plasmids from Actinobacteria. A similar trend is observed for the 95% identity threshold network (fig. 1A), although, as might be expected, its main connected component is smaller. The only phylogenetically uniform major component is represented by plasmids from Borrelia burgdorferi (Spirochaetes, yellow nodes in fig. 1A and B).
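Separating a network into its major connected component, minor components, and disconnected plasmids is a connected-components computation. The union–find sketch below, in Python, is an illustrative stand-in rather than the Gephi routine used for the figures; isolated plasmids come out as single-node components, and every edge endpoint is assumed to appear in the node list.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def connected_components(nodes: Iterable[Hashable],
                         edges: Iterable[Tuple[Hashable, Hashable]]) -> List[Set[Hashable]]:
    """Components sorted from largest to smallest; isolated nodes are singletons."""
    parent: Dict[Hashable, Hashable] = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v  # merge the two components

    groups: Dict[Hashable, Set[Hashable]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    return sorted(groups.values(), key=len, reverse=True)
```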

To investigate the relationships between the taxonomy of the represented microorganisms and the evolutionary interconnections of their plasmids, we performed network clustering using the Louvain algorithm implemented in Gephi (see Materials and Methods; Blondel et al. 2008) and compared the resulting plasmid groups with the phylogenetic and habitat affiliations of their host cells. The network clusters embedding multiple phyla and/or habitats in the 70% and 95% networks of all sequences are presented in figure 2. According to this clustering analysis, network clusters more often embed members from different habitats than members from different phylogenetic orders. Hence, phylogenetic distance appears to be a greater barrier to gene sharing than a different habitat. This is likely due to limited HGT across phylogenetic classes, which could result from, for example, restriction systems or incompatible replication systems (as reviewed in Thomas and Nielsen 2005). Moreover, these observations are consistent with findings from microbial ecology and previous in silico analyses (Baquero et al. 2008; Fondi and Fani 2010) and suggest a relatively high degree of mixing of microbes between unrelated environments.
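The comparison behind this conclusion amounts to counting, for each Louvain cluster, how many distinct taxa and habitats its plasmids come from. The Python sketch assumes hypothetical per-plasmid annotation dictionaries; the taxonomic rank used (phylum, class, or order) is left to the caller.

```python
from collections import defaultdict
from typing import Dict, Tuple


def cluster_composition(cluster_of: Dict[str, int], taxon_of: Dict[str, str],
                        habitat_of: Dict[str, str]) -> Dict[int, Tuple[int, int]]:
    """Return {cluster: (number of distinct taxa, number of distinct habitats)}."""
    taxa, habitats = defaultdict(set), defaultdict(set)
    for plasmid, cluster in cluster_of.items():
        taxa[cluster].add(taxon_of[plasmid])
        habitats[cluster].add(habitat_of[plasmid])
    return {c: (len(taxa[c]), len(habitats[c])) for c in taxa}


def count_mixed_clusters(composition: Dict[int, Tuple[int, int]]) -> Tuple[int, int]:
    """Return (clusters spanning >1 taxon, clusters spanning >1 habitat)."""
    multi_taxon = sum(1 for n_tax, _ in composition.values() if n_tax > 1)
    multi_habitat = sum(1 for _, n_hab in composition.values() if n_hab > 1)
    return multi_taxon, multi_habitat
```

More habitat-mixed than taxon-mixed clusters is the pattern that the text interprets as phylogeny being the stronger barrier.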

FIG. 2.

The major phylogenetic groups, their habitats, and their clustering in (A) the 70% and 95% networks with ≥5 shared genes and in the 70% and 95% networks with <5 shared genes, (B) and (C), respectively. The clusters subjected to Conclustador analysis are indicated. (D) The amount of interphylum and intraphylum, and of interclass and intraclass, clustering for both the <5 (low) and ≥5 (high) networks. Network clustering was determined using the Louvain algorithm implemented in Gephi (see Materials and Methods).

Gene sharing across phylogenetic classes implies at least one past HGT event and is therefore simple to detect. However, HGT could also be commonplace within phylogenetic classes. To investigate this, the major network clusters (including those reported in fig. 2) were analyzed with Conclustador to infer phylogenetically congruent and incongruent gene families. Overall, the results (provided as supplementary material S5, Supplementary Material online) revealed a high level of incongruence within the analyzed clusters. Indeed, in the 70% network, Conclustador identified 8, 4, 2, and 3 distinct groups within major plasmid clusters 961, 993, 1,144, and 1,238, respectively. Similarly, in the 95% network, 6, 4, and 2 distinct phylogenetic groups were retrieved for clusters 961, 993, and 1,144. Phylogenetic networks built from the sequences in the groups identified by Conclustador revealed, in most cases, high levels of interspecies reticulation. Overall, these data suggest that HGT may be abundant at taxonomic levels lower than those shown in figures 1 and 2.

Furthermore, to shed light on the putative functions encoded by the shared genes, we performed a Clusters of Orthologous Groups (COG)-based functional annotation of the sequences embedded in each plasmid cluster. The results (also reported in supplementary material S5, Supplementary Material online) revealed that most of the sequences responsible for the plasmid interconnections encode proteins involved in DNA transposition and recombination. This is not surprising, since these functions are strongly linked to HGT and, consequently, to plasmids. Nevertheless, as shown in supplementary material S5 (Supplementary Material online), other genes are also shared among plasmids of the same cluster and, importantly, their encoded functions are not directly related to HGT itself. This suggests that other functions, probably related to more complex phenotypes, are shared by these plasmids, including, for example, genes involved in transcription, inorganic ion transport and metabolism, and cell motility (the three most abundant functional categories in plasmid cluster 961; see supplementary material S5, Supplementary Material online).
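Ranking functional categories within a cluster, as done for cluster 961, is a frequency count. In this Python sketch the COG assignments are a hypothetical input of one-letter categories (e.g. K for transcription, P for inorganic ion transport and metabolism, N for cell motility, L for replication, recombination and repair), and each protein is assumed to carry a single category.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def top_cog_categories(cog_of_protein: Dict[str, str],
                       shared_proteins: Iterable[str],
                       k: int = 3) -> List[Tuple[str, int]]:
    """Most frequent COG functional categories among a cluster's shared proteins.

    Proteins without a COG assignment are skipped.
    """
    counts = Counter(cog_of_protein[p] for p in shared_proteins if p in cog_of_protein)
    return counts.most_common(k)
```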

To study the sharing of resistance and virulence genes, the same network construction procedure was applied to the AR and VF sequence data sets. Results for the 70% identity networks are shown in supplementary material S6 (Supplementary Material online). Overall, the topology of both networks appeared similar to that of the 70% and 95% networks of all sequences, although some differences can be identified. In the AR network, the Proteobacterial plasmids do not form a single component; instead, two major components can be identified, one embedding Gammaproteobacterial and Actinobacterial plasmids and the other embedding Beta- and Alphaproteobacterial sequences. This suggests that Beta- and Alphaproteobacterial plasmids are not preferential partners of Gammaproteobacteria in the transfer of AR genes. Conversely, in the virulence network, Proteobacterial plasmids form the major connected component of the graph (supplementary material S6, Supplementary Material online), revealing an intense sharing of virulence-related genes among microorganisms of this taxonomic unit. Although some remarkable exceptions exist, with plasmids acting as bridges between otherwise separate groups (see below), the other clusters of the virulence network are overall coherent with phylogenetic class affiliation (although intense gene sharing might occur within these groups of plasmids, as shown by the phylogenetic coherence analysis).

Network Features and Taxonomy

To globally analyze the evolutionary relationships underlying the plasmid population, we applied graph theory measures to the gene-sharing networks, in particular node degree and betweenness. Degree is the number of connections a node has to other nodes; here, a plasmid with a high degree shares a large number of genes with other plasmids. Betweenness is a centrality measure defined by how often a node lies on the shortest paths between pairs of other network nodes. Here, a plasmid with a high betweenness can transfer genes to many other plasmids in the network through a small number of gene transfer events; in other words, it can act as a bridge between otherwise disconnected regions of the network.
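Both measures can be computed directly from an edge list. The Python sketch below uses Brandes' algorithm for unweighted, undirected graphs and halves the accumulated scores so that each unordered pair of endpoints is counted once. It is an illustration, not the igraph routine used in this study, and any normalization options of that package are not reproduced.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    """Number of distinct neighbors of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Brandes' algorithm: sum over pairs (s, t) of the fraction of shortest
    s-t paths passing through each node (unweighted, undirected graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order: List[str] = []               # nodes in nondecreasing distance from s
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)       # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)     # dependency of s on each node
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: score / 2.0 for v, score in bc.items()}  # each pair was seen twice
```

In a path a–b–c, node b has degree 2 and betweenness 1 (it lies on the single a–c shortest path), while the endpoints score 0.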

We then computed centrality measures across the network for all the prokaryotic classes present in the data set. The results (fig. 3) revealed a positive correlation between degree and betweenness, as also observed by Halary et al. (2010). However, some nodes showed a much higher betweenness than most nodes of the same degree (see below). Such outliers, characterized by a low degree but a high betweenness, are especially important in any network because they can act as bridges between smaller, more densely connected parts of the network (Halary et al. 2010).
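The degree–betweenness relationship and its outliers can be quantified as in the Python sketch below. Pearson's r matches the statistic named in the figure 3 legend; flagging outliers as nodes whose betweenness lies more than two residual standard deviations above a least-squares fit of betweenness on degree is our assumption, since the selection rule for the outliers is not stated.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def betweenness_outliers(deg: Dict[str, float], btw: Dict[str, float],
                         z_cut: float = 2.0) -> List[str]:
    """Nodes whose betweenness exceeds the least-squares fit on degree by more
    than z_cut residual standard deviations (requires at least 3 nodes)."""
    nodes = list(deg)
    xs = [deg[v] for v in nodes]
    ys = [btw[v] for v in nodes]
    n = len(nodes)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sd = math.sqrt(sum(r * r for r in resid) / (n - 2))  # residual standard deviation
    return [v for v, r in zip(nodes, resid) if r > z_cut * sd]
```

Only positive residuals are flagged, matching the interest in low-degree, high-betweenness bridge plasmids.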

FIG. 3.

Dependence of plasmid betweenness on plasmid degree for different phylogenetic classes according to Pearson's product–moment correlation coefficient.

Tables 1 and 2 report the highest degree and betweenness values, respectively, for individual plasmids in the 70% and 95% identity networks of all sequences. Table 1 shows that all plasmids with the highest degree values belong to the Gamma subdivision of Proteobacteria. This result is most likely explained by the oversampling of plasmids from this class of bacteria. Indeed, the plasmid data used in this study were unsystematically gathered from several unrelated sources and are highly biased toward human pathogens (mostly Gammaproteobacteria) (Wu et al. 2009). In this context, more detailed studies of individual environments would likely reveal several gene-sharing events between phylogenetic groups that are not represented in the current data set. Nevertheless, a detailed inspection of high-degree plasmids gave further support to previous observations based on single-plasmid sequence data. For example, plasmid pU302L (see table 1) from Salmonella enterica subsp. enterica serovar Typhimurium has already been described as having a mosaic pattern of sequence homology with other plasmids, suggesting that it acquired resistance genes from a variety of enteric bacteria (Chen et al. 2007). Notably, the fact that this plasmid has the highest degree in the 95% network indicates that it acquired foreign genetic material from very closely related microorganisms and/or very recently. Similarly, most of the other plasmids in table 1 have a well-documented history of HGT events (see, e.g., p1658/97 [Zienkiewicz et al. 2007; Yi et al. 2010] and pKF3-140 [Yi et al. 2010]).

Table 1.

Individual Plasmids with the Highest Degree Measures Observed in the Gene-Sharing Networks of All Genes.

Accession Number | Microorganism | Plasmid Name | Degree | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m)
70% Network
    NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 268 | 17 | c
    NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 254 | 17 | c
    NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 253 | 8 | c
    NC_013951 | Klebsiella pneumoniae | pKF3-140 | 247 | 9 | c
    NC_013728 | E. coli O26:H- | pO26-CRL | 243 | 21 | c
    NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 242 | 13 | c
    NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 241 | 17 | c
    NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302L | 240 | 17 | c
    NC_013122 | E. coli | pEK499 | 231 | 15 | c
    NC_013437 | S. enterica subsp. enterica serovar Typhimurium | pSLT-BT | 225 | 4 | c
95% Network
    NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302S | 192 | 16 | c
    NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 188 | 13 | c
    NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 187 | 17 | c
    NC_013951 | K. pneumoniae | pKF3-140 | 186 | 9 | c
    NC_011964 | E. coli | pAPEC-O103-ColBM | 184 | 8 | c
    NC_010119 | S. enterica subsp. enterica serovar Choleraesuis | pOU7519 | 171 | 17 |
    NC_013728 | E. coli O26:H- | pO26-CRL | 168 | 21 | c
    NC_013122 | E. coli | pEK499 | 166 | 15 | c
    NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 165 | 17 | c
    NC_004998 | E. coli | p1658/97 | 157 | 11 | c
Table 2.

Individual Plasmids with the Highest Betweenness Measures Observed in the Gene-Sharing Networks of All Genes.

Accession Number | Microorganism | Plasmid Name | Betweenness | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m)
70% Network
    NC_007635 | Escherichia coli | pCoo | 8050 | 10 | c/m
    NC_006663 | Staphylococcus epidermidis RP62A | pSERP | 6329 | 3 | m
    NC_007974 | Cupriavidus metallidurans CH34 | megaplasmid | 6067 | 14 | c
    NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 5800 | 17 | c
    NC_010558 | E. coli 1520 | pIP1206 | 5750 | 16 | c
    NC_009651 | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | pKPN5 | 5641 | 11 |
    NC_011339 | Bacillus cereus H3081.97 | pH308197_258 | 5507 | 2 | m
    NC_011655 | B. cereus AH187 | pAH187_270 | 5330 | 7 | c/m
    NC_012586 | Rhizobium sp. NGR234 | pNGR234b | 5271 | 8 | c
    NC_010980 | Enterococcus faecium | pVEF3 | 4700 | 4 | m
95% Network
    NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 38781 | 17 | c
    NC_005024 | Staphylococcus aureus | pSK41 | 29020 | 7 | c
    NC_012547 | S. aureus | pGO1 | 29020 | 9 | c
    NC_010378 | E. coli | pOLA52 | 21221 | 3 | c
    NC_005054 | S. aureus | pLW043 | 19209 | 6 | c
    NC_009435 | Lactococcus lactis | pGdh442 | 18216 | 7 | m
    NC_004669 | Enterococcus faecalis V583 | pTEF1 | 15617 | 8 |
    NC_008381 | Rhizobium leguminosarum bv. viciae 3841 | pRL10 | 15030 | 27 | c
    NC_013121 | E. coli | pEK516 | 13724 | 11 | c
    NC_005327 | E. coli | pC15-1a | 13073 | 9 | c
    NC_011996 | Macrococcus caseolyticus JCSC5402 | pMCCL2 | 12981 | 4 | m

High-betweenness nodes (plasmids) span a broader taxonomic spectrum, suggesting that this centrality measure is less affected by sampling biases. Indeed, the plasmids with the highest betweenness values belong to diverse phylogenetic classes, including Bacilli, Lactobacilli, and Gamma-, Beta-, and Alphaproteobacteria. As in the case of high-degree plasmids, a mosaic-like structure has previously been described for some high-betweenness plasmids, for example, pCoo from Escherichia coli (Froehlich et al. 2005) and pGO1 from Staphylococcus aureus (Caryl and O'Neill 2009). Hence, although the overall clustering of plasmids largely agrees with the taxonomic classification of their source microorganisms, some plasmids hold the network together by lying on the paths between plasmids that would otherwise remain disconnected (Halary et al. 2010). Importantly, some of the plasmids with high degree and/or betweenness values (tables 1 and 2) were also found to be central in the gene-sharing network analyses performed by Halary et al. (2010), namely, pOU7519 and pU302L from Salmonella, p1658/97 and pIP1206 from E. coli, pKPN5 from Klebsiella pneumoniae, pVEF3 from Enterococcus faecium, pSK41 from S. aureus, pGdh442 from Lactococcus lactis, and pTEF1 from Enterococcus faecalis V583. This confirms the key role of these DNA molecules in the flow of genetic material among different microorganisms. In our opinion, these plasmids are key players from an evolutionary viewpoint, contributing to the spread of potentially clinically relevant genetic determinants within the whole bacterial mobilome.
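Betweenness, as used above, counts how many shortest paths between other plasmids pass through a given plasmid, so a plasmid that bridges otherwise separate groups scores highly even with a modest degree. The sketch below is a minimal illustration, not the authors' pipeline. It assumes an unweighted, undirected graph stored as a Python adjacency dictionary and computes degree and betweenness with Brandes' algorithm. The toy plasmid names are hypothetical.

```python
from collections import deque

def degree(adj):
    """Number of neighbours (linked plasmids) of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    For each node, returns the summed fraction of shortest paths between
    other node pairs that pass through it (each unordered pair counted once).
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma) and
        # record each node's predecessors on those paths.
        stack, pred = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network: "hub" bridges two otherwise separate pairs.
adj = {
    "pA": {"pB", "hub"}, "pB": {"pA", "hub"},
    "hub": {"pA", "pB", "pC", "pD"},
    "pC": {"hub", "pD"}, "pD": {"hub", "pC"},
}
print(degree(adj)["hub"], betweenness(adj)["hub"])
```

In this toy graph, the four paths linking {pA, pB} to {pC, pD} all pass through "hub", which is what its betweenness of 4 reflects.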

A large number of plasmids (1,159 for the 70% identity network of all genes and 1,369 for the 95% identity network) shared fewer than five genes with any other plasmid in the data set and therefore did not belong to any connected component. The taxonomic composition of this disconnected fraction of the network is presented in figure 4. Statistical randomization testing (as described in Materials and Methods) was performed to evaluate the effect of sampling bias on this frequency distribution. Most of the phylogenetic classes accounted for between 2% and 5% of the disconnected plasmids, the only exception being Gammaproteobacteria (almost 15% of the disconnected plasmids). For most classes, the number of disconnected plasmids was higher than expected from random shuffling of the networks.
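The comparison of disconnected plasmids against randomized networks can be sketched as follows. The null model is an assumption made for illustration, since the randomization actually used is described in Materials and Methods. Here, the observed number of links is redistributed uniformly at random among plasmid pairs, and for each class the number of plasmids left without any link is compared with the observed count. Function names and toy data are hypothetical.

```python
import random
from collections import Counter

def isolated_by_class(nodes, edges, taxon):
    """Count plasmids without any link, grouped by phylogenetic class."""
    linked = {u for e in edges for u in e}
    return Counter(taxon[n] for n in nodes if n not in linked)

def randomization_test(nodes, edges, taxon, n_perm=1000, seed=0):
    """Compare observed isolated-plasmid counts per class with a null model.

    Assumed null model: the same number of distinct links is placed
    uniformly at random among plasmid pairs (requires
    len(edges) <= n*(n-1)/2). Returns {class: (observed, mean_expected,
    p_higher)}, where p_higher is the fraction of randomized networks with
    at least as many isolated plasmids of that class as observed.
    """
    rng = random.Random(seed)
    node_list = list(nodes)
    observed = isolated_by_class(node_list, edges, taxon)
    classes = {taxon[n] for n in node_list}
    totals, at_least = Counter(), Counter()
    for _ in range(n_perm):
        rand_edges = set()
        while len(rand_edges) < len(edges):
            u, v = rng.sample(node_list, 2)
            rand_edges.add(frozenset((u, v)))
        counts = isolated_by_class(node_list, rand_edges, taxon)
        for c in classes:
            totals[c] += counts[c]
            if counts[c] >= observed[c]:
                at_least[c] += 1
    return {c: (observed[c], totals[c] / n_perm, (at_least[c] + 1) / (n_perm + 1))
            for c in classes}

# Hypothetical toy data set: six plasmids, a single link.
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b")]
taxon = {"a": "Gammaproteobacteria", "b": "Gammaproteobacteria",
         "c": "Gammaproteobacteria", "d": "Bacilli", "e": "Bacilli", "f": "Bacilli"}
result = randomization_test(nodes, edges, taxon, n_perm=500)
for cls, (obs, exp, p) in sorted(result.items()):
    print(cls, obs, round(exp, 2), round(p, 3))
```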

The phylogenetic class distribution of the disconnected plasmids in the data set. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).
FIG. 4.

Dynamics of Genes in the Plasmid Population

In the previous sections, we mainly analyzed networks in which two plasmids were connected if they shared at least five genes, which necessarily underestimates the real number of gene transfer events among plasmids. To analyze the dynamics of gene transfer among plasmids in more detail, we also built gene-sharing networks that account for the sharing of one to four genes between two plasmids. These networks were constructed with the same computational strategy used for the ≥5 networks (see Materials and Methods) and are reported, together with the taxonomic distribution of singletons and the cross-taxa interconnections, in supplementary material S7 (Supplementary Material online). Overall, the <5 networks contained a number of links comparable to that of the ≥5 networks (11,458 vs. 12,444 links at the 70% identity threshold and 5,136 vs. 6,777 at the 95% threshold), suggesting extensive exchange of single genes (or of relatively small gene sets) among the different plasmids.
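The difference between the two families of networks can be illustrated with a minimal sketch. It assumes each plasmid has already been reduced to a set of gene-family identifiers defined at a given identity threshold. The family-assignment step is not shown, and the identifiers are hypothetical. Pairs sharing five or more families form the ≥5 network, and pairs sharing one to four form the <5 network.

```python
from itertools import combinations

def gene_sharing_edges(families, min_shared=1, max_shared=None):
    """Link two plasmids if the number of gene families they share lies in
    [min_shared, max_shared] (max_shared=None means no upper bound).

    `families` maps a plasmid ID to the set of gene-family IDs it carries.
    Returns {(plasmid1, plasmid2): number_of_shared_families}.
    """
    edges = {}
    for p, q in combinations(sorted(families), 2):
        shared = len(families[p] & families[q])
        if shared >= min_shared and (max_shared is None or shared <= max_shared):
            edges[(p, q)] = shared
    return edges

# Hypothetical gene-family content of three plasmids.
families = {
    "pX": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "pY": {"f1", "f2", "f3", "f4", "f5", "f9"},
    "pZ": {"f1", "f9"},
}
high = gene_sharing_edges(families, min_shared=5)               # ">=5" network
low = gene_sharing_edges(families, min_shared=1, max_shared=4)  # "<5" network
print(high)  # {('pX', 'pY'): 5}
print(low)   # {('pX', 'pZ'): 1, ('pY', 'pZ'): 2}
```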

Louvain clustering of the <5 networks, although producing a large fraction of taxonomically highly coherent groups, resulted in slightly more heterogeneous plasmid clusters than the clustering of the ≥5 networks (fig. 2B and C). This suggests that taxonomic barriers are bypassed more frequently in the transfer of single genes or small groups of genes than in the movement of larger gene sets. In agreement with the previous congruency analysis, a deeper analysis of the phylogenetic coherence of the gene families within the major network clusters (using the coherence analysis pipeline described in Materials and Methods) revealed a high degree of incongruence (data not shown). Hence, according to the overall body of data presented here, the sharing of relatively small gene sets appears to be more frequent and to span larger phylogenetic distances than the transfer of larger gene sets, although most of this genetic exchange still happens within the boundaries of microbial phylogenetic classes.
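The taxonomic coherence of a cluster can be quantified in several ways. One simple option is the fraction of a cluster's plasmids that belong to its most common phylogenetic class. This measure is assumed here for illustration and is not necessarily the one behind figure 2. The cluster assignments would come from Louvain clustering (Blondel et al. 2008), which is not reimplemented here. The plasmid IDs and assignments are hypothetical.

```python
from collections import Counter

def cluster_coherence(membership, taxon):
    """For each cluster, return the fraction of its plasmids that belong to the
    cluster's most common phylogenetic class (1.0 = taxonomically pure)."""
    clusters = {}
    for plasmid, cluster in membership.items():
        clusters.setdefault(cluster, []).append(taxon[plasmid])
    return {c: Counter(classes).most_common(1)[0][1] / len(classes)
            for c, classes in clusters.items()}

# Hypothetical cluster assignments (e.g., from a Louvain run) and classes.
membership = {"p1": 0, "p2": 0, "p3": 0, "p4": 1, "p5": 1}
taxon = {"p1": "Gammaproteobacteria", "p2": "Gammaproteobacteria",
         "p3": "Actinobacteria", "p4": "Bacilli", "p5": "Bacilli"}
coherence = cluster_coherence(membership, taxon)
print(coherence)  # cluster 0 is about 0.67, cluster 1 is 1.0
```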

Network Comparison

To explore the differences among the networks, we computed Pearson product–moment correlation coefficients between the betweenness and degree values of the nodes (i.e., plasmids) in each network (fig. 5). The results revealed a modest positive correlation between betweenness and degree in every network, independently of the nucleic acid identity threshold and of the functions shared among the plasmids (virulence or AR genes). R² values range from 0.25 to 0.36 for the 70% networks and are slightly higher for the 95% networks (from 0.29 to 0.44). Hence, node degree explains only part of the variation in node betweenness, regardless of the timing of the gene transfer(s) (70% vs. 95% thresholds) or of the functions transferred (virulence vs. AR determinants); the betweenness values are most likely determined by the mobile nature of the plasmids themselves.
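The degree-betweenness comparison reduces to a Pearson correlation computed across the plasmids of one network. The reported R² is simply the square of r. The following is a minimal sketch using hypothetical per-plasmid values.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant list")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical degree and betweenness values for five plasmids of one network.
degrees = [2, 4, 4, 6, 9]
betweenness_values = [0.0, 3.0, 1.0, 5.0, 12.0]
r = pearson_r(degrees, betweenness_values)
print(round(r, 3), round(r ** 2, 3))  # r and R^2
```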

Dependency of plasmid betweenness on plasmid degree for the major networks built in this work, according to Pearson's product–moment correlation coefficient. Networks of <5 and ≥5 connections are indicated as low and high, respectively.
FIG. 5.

Analysis of Mobilizable and Conjugative Plasmids

Conjugative plasmids have been defined as “vessels” of the communal gene pool (Norman et al. 2009). Indeed, these plasmids can “visit” different cells and, in principle, undergo genetic rearrangements (such as homologous recombination) with other plasmids and/or other DNA molecules (phage genomes and chromosomes). For this reason, conjugative plasmids might be expected to occupy a more central position within the overall plasmid gene-sharing network than nonmobilizable ones. To test this hypothesis, all tra- and mob-like sequences were removed from the networks, and the centrality measures of conjugative/mobilizable plasmids were evaluated. Plasmid mobility was estimated from the number of mob and tra genes each plasmid harbors (an approach similar to that adopted by Smillie et al. 2010 and described in Materials and Methods). The relationship between mobility and the network measures was investigated by comparing the distributions of the centrality measures of mobile and nonmobile plasmids. These distributions are presented in figure 6: centrality values are significantly higher for mobile plasmids in the networks of all genes and of resistance genes (P values from Mann–Whitney tests are shown in fig. 6). Therefore, the presence of mob or tra genes is associated with significantly higher centrality in the networks of all genes and of AR genes. This suggests that plasmid mobility is an important mechanism for spreading various genetic traits, including AR genes, within the plasmid community. This fully agrees with the central role inferred for conjugative plasmids in bacterial evolution (Norman et al. 2009) and further supports the idea that these plasmids act as vessels of the communal gene pool. It also indicates that the high incidence of high degree and betweenness values in certain phylogenetic classes (such as Gammaproteobacteria) does not result only from their overrepresentation in the current data set but is also affected by the genetic properties of their plasmids.
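The comparison of mobile and nonmobile plasmids can be sketched as shown below. Two assumptions are made for illustration. First, a plasmid is called mobile if it carries at least one tra or mob gene, whereas the actual criterion follows Smillie et al. (2010) and is given in Materials and Methods. Second, the two-sided Mann–Whitney P value is obtained from the normal approximation with a tie correction rather than from exact tables. All data are hypothetical.

```python
import math

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation
    (average ranks for ties, tie-corrected variance, no continuity correction).
    Returns (U statistic for sample x, P value)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks, tie_term, i = [0.0] * len(pooled), 0.0, 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: number of tra/mob genes and degree of six plasmids.
tra_mob = {"p1": 12, "p2": 6, "p3": 0, "p4": 3, "p5": 0, "p6": 0}
deg = {"p1": 150, "p2": 120, "p3": 10, "p4": 80, "p5": 25, "p6": 4}
mobile = [deg[p] for p in deg if tra_mob[p] > 0]      # assumed rule: >= 1 gene
nonmobile = [deg[p] for p in deg if tra_mob[p] == 0]
print(mann_whitney_u(mobile, nonmobile))
```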

The relationship between the network centrality measures and plasmid mobility. The mobile plasmids are significantly more central in the networks of all and resistance genes, as indicated by the P values (calculated with Mann–Whitney tests) embedded in the figure.
FIG. 6.

Gene Sharing over Phylogenetic Classes

The importance of plasmids within the complex microbial evolutionary network also lies in their capacity to connect microbes separated by (more or less) long phylogenetic distances and to overcome the various barriers to HGT (Thomas and Nielsen 2005). The occurrence of gene sharing between phylogenetic classes was enumerated and is visualized in figure 7.

The frequency of interclass gene transfer events in the networks. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).
FIG. 7.

Interestingly, some connections in the network span very large phylogenetic distances. For example, we found connections linking Alphaproteobacteria and Cyanobacteria, in particular plasmid pCC7120beta from Nostoc sp. PCC 7120 with plasmid pBBta01 from Bradyrhizobium sp. BTAi1, and pCC7120gamma from Nostoc sp. PCC 7120 with plasmid pNGR234b from Sinorhizobium fredii NGR234. These connections suggest HGT among microorganisms inhabiting very different ecological niches (multiple and host associated for Cyanobacteria and Alphaproteobacteria, respectively), involving genes linked to important functions such as copper transport and transcriptional regulation, respectively. Remarkably, interdomain transfers (involving chemotaxis-related genes) were also observed, for example, connections linking plasmid pH308197_258 from the bacterium Bacillus cereus H3081.97 to plasmid pHmuk01 from the archaeon Halomicrobium mukohataei DSM 12286. In this case too, the microorganisms likely occupy unrelated habitats (multiple and specialized, respectively).

However, because the number of interclass connections is likely strongly affected by sampling biases, we tested the significance of the observed interclass connections by random permutation of the original network, as described in Materials and Methods. In the 70% identity network, interclass links included connections between more closely related microorganisms (e.g., between Alpha-, Beta-, and Gammaproteobacteria and between Bacilli and Lactobacilli) as well as connections between more distantly related microorganisms (namely, Actinobacteria and Betaproteobacteria, Actinobacteria and Gammaproteobacteria, and Alphaproteobacteria and Deinococci). However, some closely related microorganisms shared fewer connections than expected by chance (e.g., Alphaproteobacteria and Gammaproteobacteria, P value < 1 × 10⁻⁴), possibly indicating genetic incompatibility between these groups (Thomas and Nielsen 2005). As might be expected, the number of observed connections decreased in the 95% network, and mainly closely related taxonomic groups remained interconnected (Bacilli–Lactobacilli and Betaproteobacteria–Gammaproteobacteria among the overrepresented pairs [P value < 1 × 10⁻⁴], and Alphaproteobacteria–Gammaproteobacteria and Bacilli–Gammaproteobacteria among the underrepresented pairs [P value < 1 × 10⁻⁴]). Notably, the connection between the distantly related Gammaproteobacteria and Actinobacteria also remained strong.
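The significance test for interclass links can be sketched under an assumed null model, since the procedure actually used is described in Materials and Methods. In this sketch, every link is reassigned to a pair of distinct plasmids chosen uniformly at random. The number of links between each pair of classes is then compared with the observed value in both directions. Names and toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement

def class_pair_counts(edges, taxon):
    """Count links between each unordered pair of phylogenetic classes."""
    return Counter(tuple(sorted((taxon[u], taxon[v]))) for u, v in edges)

def interclass_test(edges, taxon, n_perm=2000, seed=1):
    """Permutation test for over- and under-represented class pairs.

    Assumed null model: each link is reassigned to a pair of distinct
    plasmids drawn uniformly at random.
    Returns {class_pair: (observed, expected, p_over, p_under)}.
    """
    rng = random.Random(seed)
    plasmids = sorted(taxon)
    classes = sorted(set(taxon.values()))
    pairs = list(combinations_with_replacement(classes, 2))
    observed = class_pair_counts(edges, taxon)
    ge, le, total = Counter(), Counter(), Counter()
    for _ in range(n_perm):
        rand_edges = [rng.sample(plasmids, 2) for _ in edges]
        counts = class_pair_counts(rand_edges, taxon)
        for pair in pairs:
            total[pair] += counts[pair]
            if counts[pair] >= observed[pair]:
                ge[pair] += 1
            if counts[pair] <= observed[pair]:
                le[pair] += 1
    return {pair: (observed[pair], total[pair] / n_perm,
                   (ge[pair] + 1) / (n_perm + 1), (le[pair] + 1) / (n_perm + 1))
            for pair in pairs}

# Hypothetical toy network: all links fall within Gammaproteobacteria.
taxon = {f"g{i}": "Gammaproteobacteria" for i in range(1, 5)}
taxon.update({f"a{i}": "Actinobacteria" for i in range(1, 5)})
edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
         ("g2", "g3"), ("g2", "g4"), ("g3", "g4")]
res = interclass_test(edges, taxon)
for pair, values in res.items():
    print(pair, values)
```

With n_perm permutations, the smallest attainable P value is 1/(n_perm + 1), so a threshold of P < 1 × 10⁻⁴ requires at least 10,000 permutations.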

As noted for gene transfers among phylogenetically incoherent groups (supplementary material S7, Supplementary Material online), most shared genes code for functions related to the HGT process itself and generally belong to COG category L (replication, recombination, and repair) (fig. 8). Nevertheless, genes with other functions are also shared (fig. 8), underlining the key role of plasmids in spreading important biological traits throughout the microbial world.

COG functional annotation of the genes shared by the plasmids belonging to the different taxonomical classes of the data set.
FIG. 8.

Gene Transfer between Actinobacteridae and Gammaproteobacteria

According to the results presented in figure 7, the gene sharing between Actinobacteria and Gammaproteobacteria spans one of the longest phylogenetic distances within our networks (supplementary material S8, Supplementary Material online) and appears to be crucial in transferring AR genes. Furthermore, most of the shared genes are at least 95% similar; therefore, according to the molecular clock hypothesis, the transfer between these classes occurred recently. For this reason, we analyzed this apparently preferential gene flow further.

To better characterize the gene sharing between Actinobacteria and Gammaproteobacteria, we selected representative plasmids sharing a large number of genes between the two classes and visualized them as a circular ideogram showing resistance-, conjugation-, and transposition-related genes and gene-sharing events (fig. 9). Figure 9 suggests that the AR genes moved between these plasmids by transposition, as most of the links connecting Actinobacteria and Gammaproteobacteria fall in plasmid regions carrying AR- and/or transposition-related genes. These results indicate a clinically important gene flow between representatives of these microbial groups, although they do not reveal its direction (i.e., from Actinobacteria to Gammaproteobacteria or vice versa). To shed some light on this point, we investigated the composition of the plasmids involved, under the assumption that, if the HGT events are recent (as suggested by the high amino acid identity), the transferred genes should have a GC content closer to that of the donor plasmids than to that of the recipient ones (Karlin 2001). Hence, the GC contents of the Actinobacterial and Gammaproteobacterial plasmids and genes were calculated and compared (supplementary material S9, Supplementary Material online). The Actinobacterial plasmid GC content (mean 56%, seven plasmids) was significantly higher (P value = 9.4 × 10⁻³, Mann–Whitney test) than the Gammaproteobacterial GC content (mean 51%, 95 plasmids). Moreover, GC contents were calculated for the individual transferred genes and compared with those of the plasmids. According to Mann–Whitney tests, the GC content of the transferred genes differs significantly from that of the Gammaproteobacterial plasmids (P = 7.0 × 10⁻¹⁵) but not from that of the Actinobacterial plasmids (P = 0.42). Taken together, the data presented in this section suggest that gene transfer very likely proceeded from Actinobacteria to Gammaproteobacteria. This is consistent with the knowledge that some Actinobacteria are natural producers of antibiotic compounds and, therefore, a potential source of AR genes for human pathogens (Wright 2007; Miao and Davies 2010).
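The GC-content reasoning can be illustrated with a short sketch. GC content is expressed as a fraction (0.56 corresponds to 56%), and ambiguous bases are ignored. As a simplification of the Mann–Whitney comparisons reported above, a gene is attributed to whichever plasmid group has the nearer mean GC content. The helper names and values are hypothetical.

```python
def gc_content(seq):
    """Fraction of G and C among unambiguous nucleotides (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return (seq.count("G") + seq.count("C")) / acgt

def closer_group(gene_gc, group_gcs):
    """Name of the plasmid group whose mean GC content is nearest to gene_gc.

    A crude nearest-mean rule, used here only to illustrate the donor
    inference; it is not a statistical test.
    """
    means = {name: sum(vals) / len(vals) for name, vals in group_gcs.items()}
    return min(means, key=lambda name: abs(means[name] - gene_gc))

# Hypothetical per-plasmid GC contents for the two classes.
groups = {"Actinobacteria": [0.55, 0.57], "Gammaproteobacteria": [0.50, 0.52]}
print(gc_content("GGCCAT"), closer_group(0.555, groups))
```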

An ideogram of gene transfers between Actinobacterial plasmids (accession nos. NC_004939, NC_004945, and NC_014167) and Gammaproteobacterial plasmids (accession nos. NC_006816, NC_009141, NC_009651, NC_010488, NC_010886, and NC_011092). Gene-sharing events are marked by the curves in the middle of the ideogram. GC content is plotted on the outer side of each plasmid wherever it exceeds that plasmid's average GC content. Genes related to resistance, conjugation, and transposition are marked as lines on the outer, middle, and innermost rings, respectively, on the inner side of each plasmid ring.
FIG. 9.

Conclusions

The use of gene-sharing networks as a tool to investigate microbial evolutionary relationships is rapidly expanding, especially for studying the non-tree-like structures that can arise in evolution (Dagan et al. 2008; Halary et al. 2010). The power of this approach is demonstrated here by the relationships it revealed between biological properties (e.g., plasmid mobility) and network properties (e.g., plasmid centrality) in the gene-sharing network. Moreover, the approach also revealed extensive AR gene sharing between Actinobacterial and Gammaproteobacterial plasmids, suggesting a potential source of AR genes that might have contributed to the recent emergence of antibiotic multiresistance in pathogenic organisms.

The plasmid sequences analyzed in this study were gathered nonsystematically from different sequencing projects; their sampling is therefore haphazard and likely biased toward human pathogenic organisms. The bioinformatic workflow described here would be best suited to single-genome sequence data sets obtained from specifically selected environments. We expect such data sets to become available as DNA sequencing costs decrease and genome sequencing from single cells becomes a routine approach (Stepanauskas and Sieracki 2007; Rodrigue et al. 2009). The proposed approach could then be used to investigate whether the functional categories of transferred genes reflect the different selective pressures present in the given environment(s). Obtaining single-genome data sets from multiple environments would thus permit evaluation and comparison of gene-sharing patterns under different environmental conditions.

The study was financially supported by the Academy of Finland (grant number 129873) and the Finnish Graduate School in Environmental Science and Technology (EnSTe). M.F. is financed by a postdoctoral grant from “Fondazione Adriano Buzzati-Traverso.” The authors would like to thank Kimmo Mattila for his kind assistance in parallel BLAST analyses.

References

Altschul
SF
Madden
TL
Schaffer
AA
Zhang
J
Zhang
Z
Miller
W
Lipman
DJ
,
Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
Nucleic Acids Res.
,
1997
, vol.
25
(pg.
3389
-
3402
)
Baquero
F
Martinez
JL
Canton
R
,
Antibiotics and antibiotic resistance in water environments
Curr Opin Biotechnol.
,
2008
, vol.
19
(pg.
260
-
265
)
Bastian
M
Heymann
S
Jacomy
M
,
Gephi: an open source software for exploring and manipulating networks [Internet]
International AAAI Conference on Weblogs and Social Media
,
2009
 
Bennett
PM
,
Plasmid encoded antibiotic resistance: acquisition and transfer of antibiotic resistance genes in bacteria
Br J Pharmacol.
,
2008
, vol.
153
Suppl 1
(pg.
S347
-
S357
)
Blondel
VD
Guillaume
J
Lambiotte
R
Lefebvre
E
,
Fast unfolding of communities in large networks
J Stat Mech.
,
2008
pg.
P10008
Bosi
E
Fani
R
Fondi
M
,
The mosaicism of plasmids revealed by atypical genes detection and analysis
BMC Genomics
,
2011
, vol.
12
pg.
403
Brilli
M
Mengoni
A
Fondi
M
Bazzicalupo
M
Lio
P
Fani
R
,
Analysis of plasmid genes by phylogenetic profiling and visualization of homology relationships using Blast2Network
BMC Bioinformatics
,
2008
, vol.
9
pg.
551
Brohee
S
Faust
K
Lima-Mendez
G
Vanderstocken
G
van Helden
J
,
Network Analysis Tools: from biological networks to clusters and pathways
Nat Protoc.
,
2008
, vol.
3
(pg.
1616
-
1629
)
Caryl
JA
O'Neill
AJ
,
Complete nucleotide sequence of pGO1, the prototype conjugative plasmid from the Staphylococci
Plasmid
,
2009
, vol.
62
(pg.
35
-
38
)
Chen
CY
Nace
GW
Solow
B
Fratamico
P
,
Complete nucleotide sequences of 84.5- and 3.2-kb plasmids in the multi-antibiotic resistant Salmonella enterica serovar Typhimurium U302 strain G8430
Plasmid
,
2007
, vol.
57
(pg.
29
-
43
)
Chen
L
Yang
J
Yu
J
Yao
Z
Sun
L
Shen
Y
Jin
Q
,
VFDB: a reference database for bacterial virulence factors
Nucleic Acids Res.
,
2005
, vol.
33
(pg.
D325
-
D328
)
Cole
JR
Chai
B
Farris
RJ
Wang
Q
Kulam-Syed-Mohideen
AS
McGarrell
DM
Bandela
AM
Cardenas
E
Garrity
GM
Tiedje
JM
,
The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data
Nucleic Acids Res.
,
2007
, vol.
35
(pg.
D169
-
D172
)
Cole
JR
Wang
Q
Cardenas
E
et al.
(11 co-authors)
,
The Ribosomal Database Project: improved alignments and new tools for rRNA analysis
Nucleic Acids Res.
,
2009
, vol.
37
(pg.
D141
-
D145
)
Crosa
JH
Luttropp
LK
Falkow
S
,
Nature of R-factor replication in the presence of chloramphenicol
Proc Natl Acad Sci U S A.
,
1975
, vol.
72
(pg.
654
-
658
)
Csardi
G
Nepusz
T
,
The igraph software package for complex network research
InterJournal Complex Systems
,
2006
Dagan
T
Artzy-Randrup
Y
Martin
W
,
Modular networks and cumulative impact of lateral transfer in prokaryote genome evolution
Proc Natl Acad Sci U S A.
,
2008
, vol.
105
(pg.
10039
-
10044
)
Dagan
T
Martin
W
,
Getting a better picture of microbial evolution en route to a network of genomes
Philos Trans R Soc Lond B Biol Sci.
,
2009
, vol.
364
(pg.
2187
-
2196
)
Dagan
T
Roettger
M
Bryant
D
Martin
W
,
Genome networks root the tree of life between prokaryotic domains
Genome Biol Evol.
,
2010
, vol.
2
(pg.
379
-
392
)
DeSantis
TZ
Jr
Hugenholtz
P
Keller
K
Brodie
EL
Larsen
N
Piceno
YM
Phan
R
Andersen
GL
,
NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes
Nucleic Acids Res.
,
2006
, vol.
34
(pg.
W394
-
W399
)
Edgar
RC
,
MUSCLE: a multiple sequence alignment method with reduced time and space complexity
BMC Bioinformatics
,
2004
, vol.
5
pg.
113
Felsestein
J
,
PHYLIP—phylogenetic inference package (version 3.2)
Cladistics
,
1989
, vol.
5
(pg.
164
-
166
)
Fondi
M
Bacci
G
Brilli
M
Papaleo
MC
Mengoni
A
Vaneechoutte
M
Dijkshoorn
L
Fani
R
,
Exploring the evolutionary dynamics of plasmids: the Acinetobacter pan-plasmidome
BMC Evol Biol.
,
2010
, vol.
10
pg.
59
Fondi
M
Fani
R
,
The horizontal flow of the plasmid resistome: clues from inter-generic similarity networks
Environ Microbiol.
,
2010
, vol.
12
(pg.
3228
-
3242
)
Friedberg
I
,
Automated protein function prediction—the genomic challenge
Brief Bioinformatics
,
2006
, vol.
7
(pg.
225
-
242
)
Froehlich B, Parkhill J, Sanders M, Quail MA, Scott JR. The pCoo plasmid of enterotoxigenic Escherichia coli is a mosaic cointegrate. J Bacteriol. 2005;187:6509-6516.
Hacker J, Kaper JB. Pathogenicity islands and the evolution of microbes. Annu Rev Microbiol. 2000;54:641-679.
Halary S, Leigh JW, Cheaib B, Lopez P, Bapteste E. Network analyses structure genetic diversity in independent genetic worlds. Proc Natl Acad Sci U S A. 2010;107:127-132.
Huson DH, Bryant D. Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006;23:254-267.
Karlin S. Detecting anomalous gene clusters and pathogenicity islands in diverse bacterial genomes. Trends Microbiol. 2001;9:335-343.
Kohiyama M, Hiraga S, Matic I, Radman M. Bacterial sex: playing voyeurs 50 years later. Science. 2003;301:802-803.
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, Jones SJ, Marra MA. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639-1645.
Le Roux F, Labreuche Y, Davis BM, Iqbal N, Mangenot S, Goarant C, Mazel D, Waldor MK. Virulence of an emerging pathogenic lineage of Vibrio nigripulchritudo is dependent on two plasmids. Environ Microbiol. 2010;13:296-306.
Leigh JW, Schliep K, Lopez P, Bapteste E. Let them fall where they may: congruence analysis in massive, phylogenetically messy datasets. Mol Biol Evol. 2011;28:2773-2785.
Leplae R, Hebrant A, Wodak SJ, Toussaint A. ACLAME: a CLAssification of Mobile genetic Elements. Nucleic Acids Res. 2004;32:D45-D49.
Lima-Mendez G, Toussaint A, Leplae R. Analysis of the phage sequence space: the benefit of structured information. Virology. 2007;365:241-249.
Lima-Mendez G, Van Helden J, Toussaint A, Leplae R. Reticulate representation of evolutionary and functional relationships between phage genomes. Mol Biol Evol. 2008;25:762-777.
Liu B, Pop M. ARDB–Antibiotic Resistance Genes Database. Nucleic Acids Res. 2009;37:D443-D447.
Medini D, Donati C, Tettelin H, Masignani V, Rappuoli R. The microbial pan-genome. Curr Opin Genet Dev. 2005;15:589-594.
Miao V, Davies J. Actinobacteria: the good, the bad, and the ugly. Antonie Van Leeuwenhoek. 2010;98:143-150.
Norman A, Hansen LH, Sorensen SJ. Conjugative plasmids: vessels of the communal gene pool. Philos Trans R Soc Lond B Biol Sci. 2009;364:2275-2289.
Osborn AM, da Silva Tatley FM, Steyn LM, Pickup RW, Saunders JR. Mosaic plasmids and mosaic replicons: evolutionary lessons from the analysis of genetic diversity in IncFII-related replicons. Microbiology. 2000;146(Pt 9):2267-2275.
R Development Core Team. R: a language and environment for statistical computing. 2010.
Reynaud Y, Saulnier D, Mazel D, Goarant C, Le Roux F. Correlation between detection of a plasmid and high-level virulence of Vibrio nigripulchritudo, a pathogen of the shrimp Litopenaeus stylirostris. Appl Environ Microbiol. 2008;74:3038-3047.
Riley MA, Gordon DM. The ecological role of bacteriocins in bacterial competition. Trends Microbiol. 1999;7:129-133.
Rodrigue S, Malmstrom RR, Berlin AM, Birren BW, Henn MR, Chisholm SW. Whole genome amplification and de novo assembly of single bacterial cells. PLoS One. 2009;4:e6864.
Schluter A, Krause L, Szczepanowski R, Goesmann A, Puhler A. Genetic diversity and composition of a plasmid metagenome from a wastewater treatment plant. J Biotechnol. 2008;136:65-76.
Smillie C, Garcillan-Barcia MP, Francia MV, Rocha EP, de la Cruz F. Mobility of plasmids. Microbiol Mol Biol Rev. 2010;74:434-452.
Stepanauskas R, Sieracki ME. Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time. Proc Natl Acad Sci U S A. 2007;104:9052-9057.
Thomas CM, Nielsen KM. Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 2005;3:711-721.
Toussaint A, Merlin C. Mobile elements as a combination of functional modules. Plasmid. 2002;47:26-35.
van Rhijn P, Vanderleyden J. The Rhizobium-plant symbiosis. Microbiol Rev. 1995;59:124-142.
Vlasblom J, Wu S, Pu S, Superina M, Liu G, Orsi C, Wodak SJ. GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics. 2006;22:2178-2179.
Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer; 2009.
Wright GD. The antibiotic resistome: the nexus of chemical and genetic diversity. Nat Rev Microbiol. 2007;5:175-186.
Wu D, Hugenholtz P, Mavromatis K, et al. (34 co-authors). A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea. Nature. 2009;462:1056-1060.
Yang J, Chen L, Sun L, Yu J, Jin Q. VFDB 2008 release: an enhanced web-based resource for comparative pathogenomics. Nucleic Acids Res. 2008;36:D539-D542.
Yi H, Xi Y, Liu J, et al. (15 co-authors). Sequence analysis of pKF3-70 in Klebsiella pneumoniae: probable origin from R100-like plasmid of Escherichia coli. PLoS One. 2010;5:e8601.
Zienkiewicz M, Kern-Zdanowicz I, Golebiewski M, Zylinska J, Mieczkowski P, Gniadkowski M, Bardowski J, Ceglowski P. Mosaic structure of p1658/97, a 125-kilobase plasmid harboring an active amplicon with the extended-spectrum beta-lactamase gene blaSHV-5. Antimicrob Agents Chemother. 2007;51:1164-1171.

Author notes

Associate editor: James McInerney

Supplementary data