Manu Tamminen, Marko Virta, Renato Fani, Marco Fondi, Large-Scale Analysis of Plasmid Relationships through Gene-Sharing Networks, Molecular Biology and Evolution, Volume 29, Issue 4, April 2012, Pages 1225–1240, https://doi.org/10.1093/molbev/msr292
Abstract
Plasmids are vessels of genetic exchange in microbial communities. They are known to transfer between different host organisms and acquire diverse genetic elements from chromosomes and/or other plasmids. Therefore, they constitute an important element in microbial evolution by rapidly disseminating various genetic properties among different communities. A paradigmatic example of this is the dissemination of antibiotic resistance (AR) genes that has resulted in the emergence of multiresistant pathogenic bacterial strains. To globally analyze the evolutionary dynamics of plasmids, we built a large graph in which 2,343 plasmids (nodes) are connected according to the proteins shared by each other. The analysis of this gene-sharing network revealed an overall coherence between network clustering and the phylogenetic classes of the corresponding microorganisms, likely resulting from genetic barriers to horizontal gene transfer between distant phylogenetic groups. Habitat was not a crucial factor in clustering as plasmids from organisms inhabiting different environments were often found embedded in the same cluster. Analyses of network metrics revealed a statistically significant correlation between plasmid mobility and their centrality within the network, providing support to the observation that mobile plasmids are particularly important in spreading genes in microbial communities. Finally, our study reveals an extensive (and previously undescribed) sharing of AR genes between Actinobacteria and Gammaproteobacteria, suggesting that the former might represent an important reservoir of AR genes for the latter.
Introduction
Plasmids are paradigmatic examples of the network-like structure of microbial evolution (Brilli et al. 2008). Indeed, they are among the most important players in the evolution of prokaryotes because they can be transferred between microorganisms, thus representing natural vectors for the transfer of genes and the functions they code for (Norman et al. 2009). Accordingly, they often provide a basis for genomic rearrangements via homologous recombination, facilitating the loss and/or acquisition of genes during these events, which may eventually lead to horizontal gene transfer (HGT). As a consequence, plasmids possess a mosaic structure with collections of functional genetic modules, each of which likely possesses an independent phylogenetic history, organized into a stable and self-replicating entity (Osborn et al. 2000; Toussaint and Merlin 2002; Bosi et al. 2011). Importantly, these functional blocks often embed genes that might have a great impact on the metabolic functions of the host cell, providing additional traits that can be accumulated without altering the gene content of the bacterial chromosome (Fondi et al. 2010). Plasmids are involved in many accessory functions and constitute, together with “not essential” chromosomal regions, what is referred to as the “dispensable genome” in the microbial pan-genome concept (Medini et al. 2005). This, in turn, can include genes for ecologically important traits, such as antibiotic resistance (AR) (Crosa et al. 1975), pathogen virulence (Hacker and Kaper 2000), symbiotic nitrogen fixation (van Rhijn and Vanderleyden 1995), and the production of allelopathic bacteriocins (Riley and Gordon 1999). Among these processes, pathogenesis and AR are those that have been primarily explored up to now. Indeed, it has been shown that the presence of plasmids can be strictly linked to the emergence of pathogenic lineages within a given taxonomic unit (Reynaud et al. 2008; Le Roux et al. 2010).
In parallel, in terms of AR, plasmids play a central role as vehicles for the capture of resistance genes and their subsequent spread (Bennett 2008; Fondi and Fani 2010). Dissemination of these features represents one of the most important effects of “bacterial sex,” from both an evolutionary and an ecological viewpoint (Kohiyama et al. 2003). In this context, plasmid mobility represents an essential parameter of microorganisms' fitness, and it might also be a key element to an understanding of the epidemiology of these plasmid-carried traits (Smillie et al. 2010). However, despite their clear biological relevance, the pathways followed by plasmids during their evolutionary history remain largely obscure.
Nowadays, the use of massive plasmid sequencing as a routine laboratory technique (Schluter et al. 2008), together with the development of bioinformatics tools enabling the visualization of sequence homology relationships through similarity networks (Vlasblom et al. 2006; Brilli et al. 2008), can greatly speed up studies of gene mobility among plasmids. Furthermore, thanks to the expansion of network-oriented representation of sequence similarity relationships (Lima-Mendez et al. 2007; Brilli et al. 2008; Dagan and Martin 2009; Dagan et al. 2010; Fondi and Fani 2010; Fondi et al. 2010; Halary et al. 2010), graph theory measures have been applied to better describe gene flow across diverse microbial communities, paving the way to large-scale comparative analyses adopting bioinformatics strategies. In more detail, by adopting a gene-sharing network approach, Dagan et al. (2008) reported the construction and analysis of graphs capturing both vertical and lateral components of evolutionary history among 539,723 genes distributed across 181 sequenced prokaryotic genomes. The same authors estimated that an impressive amount (almost 80% on average) of the gene content of each analyzed genome was involved in lateral gene transfer at some point in evolution. More recently, Halary et al. (2010) applied mathematical studies of the centralities of a network embedding 119,381 homologous DNA families. They demonstrated that plasmids, and not viruses, are likely the key vectors of genetic exchange between bacterial chromosomes. Moreover, results also supported a disconnected yet highly structured network of genetic diversity, revealing the existence of multiple “genetic worlds.” From the analysis of the same network, the same authors also inferred that DNA pools mostly circulate between vehicles (i.e., plasmids, phages, and chromosomes) of the same type. Finally, Lima-Mendez et al. (2008) represented relationships across the phage population as a weighted graph where nodes represented phages and edges represented phage–phage similarities in terms of gene content. Their approach succeeded in capturing the pervasive mosaicism of phage genomes, indicating the importance of horizontal gene exchange in their evolution and also proving to be a promising tool for predicting lifestyles of individual phages from sequence data.
By applying a computational network-oriented pipeline, we have analyzed the evolutionary relationships among 2,343 microbial plasmids in order to explore the role of each of them within the reticulate evolutionary dynamics of this class of mobile genetic elements. Moreover, we focused on the proteins involved in two main biological processes, that is, AR and pathogenesis, as well as on plasmid features that might be involved in ruling the overall network of plasmid-mediated HGT (e.g., plasmid mobility). The data obtained provide clues toward a systemic interpretation of the overall behavior of plasmids within bacterial evolution and in the spreading of some key biological features, such as AR and virulence.
Materials and Methods
Data Sets Assembly
All the available complete plasmid sequences (in GenBank format) were downloaded from NCBI using the EFetch interface (as of 24 July 2010). In total, 2,343 plasmids (102,772 open reading frames) were retrieved, and a complete table including all their main features (their size, taxonomy, accession codes, etc.) is available as supplementary S1 (Supplementary Material online). Moreover, two different subsets of sequences were created starting from the whole plasmid sequence data set. First, we created a set of plasmid-encoded proteins involved in the process of AR. This was done using each of the retrieved sequences as a query in a basic local alignment search tool (BLAST) (Altschul et al. 1997) search against the Antibiotic Resistance DataBase (ARDB) (Liu and Pop 2009) using the following parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 amino acids (aa); that is, a degree of amino acid sequence identity sufficiently high to retrieve all the proteins that should perform a function related to AR (Friedberg 2006; Fondi and Fani 2010). In this way, a set of 2,678 sequences putatively associated with AR was retrieved (for the complete list of accession codes of the proteins used in this work, see supplementary material S2, Supplementary Material online). These sequences belonged to 501 different plasmids.
The same strategy with the same parameters was applied when searching for virulence-related proteins (virulence factors, VFs) within the whole plasmid sequence data set. In this case, the probed database was the Virulence Factor DataBase (VFDB) (Chen et al. 2005; Yang et al. 2008), and a set of 7,840 sequences were retrieved from this BLAST search (belonging to 615 plasmids). Again, all the information about these sequences is available as supplementary material S3 (Supplementary Material online).
Network Construction
The network construction workflow described in this paragraph has been applied to each of the three assembled data sets, that is, the one embedding all retrieved plasmids sequences (hereinafter referred to as “all sequences network”), the one embedding the AR-related sequences (the “resistance network”), and the one embedding VF-related sequences (the “virulence network”).
In detail, each of the sequence data sets was used in an all-against-all BLAST search (Altschul et al. 1997) using the Murska parallel computing cluster (Center for Scientific Computing, Espoo, Finland). The BLAST output was parsed to include matches at two different identity thresholds (70% and 95%) by using ad hoc-implemented Python scripts. Two parsed files were obtained, one embedding those sequences sharing at least 70% sequence identity and another one embedding sequences sharing at least 95% identity. Similarly to Dagan et al. (2008) and, later, to Halary et al. (2010), this allows the resulting networks to be interpreted under a molecular clock–based assumption, that is, under the hypothesis that proteins with the highest percentages of identity were likely to be more recently shared than the ones with less identity. In the present context, proteins with 95% identity were considered more recently shared than those with 70%.
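The threshold parsing described above can be sketched as follows. The authors' scripts are not reproduced in the text, so this is only an illustration under stated assumptions: the input is assumed to be BLAST tabular output (`-outfmt 6`, where the first three columns are query ID, subject ID, and percent identity), the function name `filter_hits` is hypothetical, and the toy hits are invented.

```python
import csv
import io


def filter_hits(lines, min_identity):
    """Yield (query, subject, identity) for non-self hits at or above min_identity.

    Assumes BLAST tabular output: qseqid, sseqid, pident, ... (tab separated).
    """
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, identity = row[0], row[1], float(row[2])
        if query != subject and identity >= min_identity:
            yield query, subject, identity


# Toy input, invented for illustration (not data from the study).
example = io.StringIO(
    "protA\tprotA\t100.0\t120\t0\t0\t1\t120\t1\t120\t1e-80\t250\n"
    "protA\tprotB\t96.5\t118\t4\t0\t1\t118\t1\t118\t1e-70\t230\n"
    "protA\tprotC\t72.0\t110\t30\t1\t1\t110\t3\t112\t1e-40\t150\n"
    "protB\tprotD\t55.0\t100\t45\t2\t1\t100\t1\t100\t1e-20\t90\n"
)
rows = list(example)

# One parsed set per identity threshold, mirroring the two parsed files.
hits_70 = list(filter_hits(rows, 70.0))
hits_95 = list(filter_hits(rows, 95.0))
```

Every hit in the 95% set is also in the 70% set, so the 95% network is by construction a subset of the 70% one.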
Subsequently, each of these parsed BLAST outputs was transformed into a gene-sharing network and visualized using the Gephi visualization program (Bastian et al. 2009). In this network, each node represents a single plasmid, and two different plasmids are linked on the basis of their shared protein content. In particular, sharing is defined by a BLAST match between two reading frames longer than 300 bp with at least 95% or 70% amino acid identity, respectively, and therefore represents an absolute measure. To investigate the dynamics of plasmids among bacterial cells, we applied a further filter to each of the obtained graphs, retaining only those edges that represent at least five shared proteins and discarding all connections between plasmids with fewer shared proteins. Similarly, to investigate the dynamics of individual genes or small gene clusters among the plasmid population, we applied a filter to retain only those edges that represent fewer than five shared genes. Altogether, we obtained eight different networks, that is, four at each of the 70% and 95% identity thresholds: all sequences with at least five shared genes, all sequences with fewer than five shared genes, sequences related to AR, and sequences related to VF. The Gephi-formatted network files are available as supplementary material S4 (Supplementary Material online).
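A minimal sketch of the edge construction and the five-gene filter is given below. It is not the authors' implementation: the names `build_network` and `split_by_weight` are hypothetical, the protein-to-plasmid mapping and hits are invented, and counting each matched protein pair once per plasmid pair is an assumption about how "shared proteins" are tallied.

```python
from collections import defaultdict


def build_network(hits, protein_to_plasmid):
    """Return {(plasmid_a, plasmid_b): number of shared protein pairs}."""
    shared = defaultdict(set)
    for query, subject in hits:
        p, q = protein_to_plasmid[query], protein_to_plasmid[subject]
        if p == q:
            continue  # matches within one plasmid do not create a link
        edge = tuple(sorted((p, q)))
        # Store the protein pair as a sorted tuple so that reciprocal
        # hits (A vs B and B vs A) are counted only once.
        shared[edge].add(tuple(sorted((query, subject))))
    return {edge: len(pairs) for edge, pairs in shared.items()}


def split_by_weight(weights, cutoff=5):
    """Split edges into those with >= cutoff and < cutoff shared proteins."""
    high = {e: w for e, w in weights.items() if w >= cutoff}
    low = {e: w for e, w in weights.items() if w < cutoff}
    return high, low


# Toy data, invented for illustration: P1 and P2 share six proteins,
# P1 and P3 share one.
protein_to_plasmid = {f"a{i}": "P1" for i in range(1, 7)}
protein_to_plasmid.update({f"b{i}": "P2" for i in range(1, 7)})
protein_to_plasmid["c1"] = "P3"

hits = [(f"a{i}", f"b{i}") for i in range(1, 7)]
hits += [("b1", "a1"), ("a1", "c1"), ("a1", "a2")]

weights = build_network(hits, protein_to_plasmid)
high, low = split_by_weight(weights)
```

In this toy case the P1–P2 edge lands in the "five or more" network and the P1–P3 edge in the "fewer than five" network, matching the two filters described in the text.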
Permutation Tests
To evaluate the statistical significance of observed preferential gene flows (see below), we randomly permuted the phylogenetic affiliations of the nodes 10,000 times, while keeping the original degree of each node within the network intact (randomization with node degree conservation, see Brohee et al. 2008). A P value was then obtained by counting the number of times the randomly assembled networks returned a number of links greater (or lower) than the observed one and dividing this number by the total number of permutations performed.
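One way to read this procedure is sketched below: the edges are left untouched and only the class labels are shuffled across nodes, which leaves every node's degree unchanged. This is an illustrative sketch, not the authors' code. The function names, the toy network, and the fixed seed are assumptions, and ties with the observed count are included in both tails, a conservative convention that the text does not specify.

```python
import random


def count_links(edges, labels, group_a, group_b):
    """Count edges joining a node labelled group_a to one labelled group_b."""
    wanted = {group_a, group_b}
    return sum(1 for u, v in edges if {labels[u], labels[v]} == wanted)


def permutation_p_values(edges, labels, group_a, group_b, n_perm=10_000, seed=0):
    """Return (observed, P[random >= observed], P[random <= observed])."""
    rng = random.Random(seed)
    observed = count_links(edges, labels, group_a, group_b)
    nodes = list(labels)
    values = [labels[n] for n in nodes]
    greater = lower = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # permute affiliations, topology stays fixed
        shuffled = dict(zip(nodes, values))
        c = count_links(edges, shuffled, group_a, group_b)
        if c >= observed:
            greater += 1
        if c <= observed:
            lower += 1
    return observed, greater / n_perm, lower / n_perm


# Toy network, invented for illustration: two triangles joined by one edge.
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("g1", "g2"), ("g2", "g3"), ("g1", "g3"),
         ("a3", "g1")]
labels = {"a1": "Actinobacteria", "a2": "Actinobacteria", "a3": "Actinobacteria",
          "g1": "Gammaproteobacteria", "g2": "Gammaproteobacteria",
          "g3": "Gammaproteobacteria"}

observed, p_greater, p_lower = permutation_p_values(
    edges, labels, "Actinobacteria", "Gammaproteobacteria", n_perm=2000)
```

In this toy network the single observed between-class link is the minimum any labelling can produce, so the upper-tail proportion is 1 and the lower-tail proportion is small.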
Estimation of Plasmid Mobility
Genes related to plasmid mobility were identified by BLAST analysis (with the following parameters: e value, 1 × 10⁻²⁰; minimum alignment length, 50 aa) of the plasmid-encoded amino acid sequences against a tra and mob gene data set retrieved from the ACLAME database (http://aclame.ulb.ac.be/; Leplae et al. 2004). Since tra and mob genes are generally associated with plasmid mobility and conjugation, we defined a plasmid as mobile if it contained one or more mob or tra genes (a similar approach was recently adopted by Smillie et al. 2010).
Network Centralities, Statistics, and Visualization
Network centrality values for network nodes were calculated using iGraph package in R (Csardi and Nepusz 2006). Network clustering was estimated using the Louvain algorithm implemented in Gephi (Blondel et al. 2008) by maximizing modularity and minimizing number of clusters. All statistical tests to investigate the differences in degree and betweenness distributions and GC% content were performed using the base statistics tools in R (R Development Core Team 2010; http://www.r-project.org/). Data plotting was performed using ggplot2 package of R (Wickham 2009). All other statistical analyses were performed using in-house developed Perl and Python scripts. Visualization of network clustering and gene sharing as an ideogram was performed using Circos (Krzywinski et al. 2009).
Estimation of the Phylogenetic Distances of Gene Sharing
The 16S rRNA sequences for plasmid hosts were downloaded from the Ribosomal Database Project (Cole et al. 2007, 2009). The 16S rRNA sequences were aligned using the Nearest Alignment Space Termination aligner provided by Greengenes (DeSantis et al. 2006). The matrix of phylogenetic distances was calculated using Phylip (Felsenstein 1989).
Estimation of Phylogenetic Coherence in Major Network Clusters
The Conclustador algorithm (Leigh et al. 2011) was applied to analyze the congruence of phylogenetic trees reconstructed from the sequences of the genes shared by plasmids belonging to the same cluster in a network. Gene families responsible for the connections among the different plasmids were extracted from the 70% and 95% networks and aligned using the Muscle software (Edgar 2004). Then, for each plasmid cluster, the resulting multiple sequence alignments were used as input for phylogenetic coherence analysis with Conclustador. Finally, SplitsTree4 (Huson and Bryant 2006) was used to visualize the phylogenetic information both in each single group identified by Conclustador and in all the groups at once (which, together, are responsible for the plasmid interconnections shown in the networks of fig. 1). In both cases, supernetworks were inferred using data available from single-gene phylogenetic analyses performed with the RAxML tool with 1,000 bootstrap replications.

The gene sharing between plasmids presented as matrices (A) and networks (B) at both 70% and 95% criteria. In network figures, plasmids are represented by the nodes (node size is proportional to the plasmid size) and the shared genes by the links. At least five shared genes are required to establish a link.
Because Conclustador requires that the analyzed data sets not be too fragmented (about 80% of the overall taxa data set must be present in each multiple alignment), not all the identified plasmid clusters could be reliably analyzed. Accordingly, only the major clusters in the 70% and 95% networks were analyzed (namely clusters 961, 993, 1,144, and 1,238 for the 70% network and 961, 993, and 1,144 for the 95% network). Interestingly, the widespread fragmentation found for most of the clusters in the data set might be due to a high heterogeneity of the same clusters that, in turn, might mirror a high level of horizontal transfer of their embedded genes.
Results and Discussion
Gene-Sharing Networks
Gene sharing between plasmids was visualized as a network where the plasmids are represented as vertices (or nodes) and gene sharing as edges (or links). Altogether eight networks were constructed based on 70% and 95% identity between the amino acid sequences and the different edge criteria, such as the number of genes shared (more than or less than five) or sharing AR or virulence genes (supplementary material S6, Supplementary Material online). The identity-based criterion introduced for setting links allows the resulting networks to be interpreted under a molecular clock–based assumption, that is, under the hypothesis that sequences with the highest percentages of identity (e.g., 95%) were likely to be more recently exchanged than the ones with less identity (e.g., 70%) (see, e.g., Halary et al. 2010). Data for the networks accounting for the sharing of five or more genes are reported in figure 1A and B. Overall, the plasmid network of all sequences at the 70% identity threshold (fig. 1B) exhibits one major connected component, some minor connected components, and a large number of disconnected plasmids (see below). The main connected component of the network of all genes (the central one in fig. 1B) embeds plasmids mainly belonging to the Proteobacteria phylum (particularly from Gamma, Alpha, and Beta subdivisions). Interestingly, this component also contains plasmids from Actinobacteria. A similar trend is observed in the case of the 95% identity threshold network (fig. 1A) although, as might be expected, in this case, the main connected component of the network is smaller. The only phylogenetically uniform major component is represented by plasmids from Borrelia burgdorferi (Spirochaetes, yellow nodes of fig. 1A and B).
In order to investigate the relationships between the taxonomy of represented microorganisms and the evolutionary interconnections of their plasmids, we performed network clustering using the Louvain algorithm implemented in Gephi (see Materials and Methods; Blondel et al. 2008) and compared the obtained plasmids groups with the phylogenetic and habitat affiliations of their constituent cells. The network clusters embedding multiple phyla and/or habitats for the 70% and 95% networks of all sequences are presented in figure 2. According to the network clustering analysis, the network clusters more typically embed members from different habitats than from different phylogenetic orders. Hence, it appears that phylogenetic distance is a greater barrier to gene sharing than having a different habitat. This is likely due to limited HGT across phylogenetic classes that could result from, for example, restriction or incompatible replication systems (as reviewed in Thomas and Nielsen 2005). Moreover, these observations are consistent with findings from microbial ecology and previous in silico analyses (Baquero et al. 2008; Fondi and Fani 2010) and suggest that there is a (more or less) high degree of mixing of microbes between unrelated environments.

The major phylogenetic groups, their habitats, and their clustering in (A) the 70% and the 95% networks for ≥5 networks and in 70% and the 95% networks for <5 networks (B) and (C), respectively. The clusters that were subjected to Conclustador analysis have been indicated. In (D) the amount of interphylum and intraphylum and interclass and intraclass clustering in the networks is reported for both <5 (low) and ≥5 (high) networks. The clustering of the network has been determined using the Louvain algorithm implemented in Gephi (see Materials and Methods).
Gene sharing across phylogenetic classes implies at least one past HGT event and is therefore simple to detect. However, HGT could also be commonplace within phylogenetic classes. To investigate this, all the major network clusters (including those reported in fig. 2) were analyzed using the Conclustador package to infer phylogenetically congruent and incongruent gene families. Overall, the data obtained (provided as supplementary material S5, Supplementary Material online) revealed a high level of incongruence among the analyzed clusters. Indeed, Conclustador identified 8, 4, 2, and 3 different groups within the 961, 993, 1,144, and 1,238 major plasmid clusters, respectively. Similarly, in the 95% network, 6, 4, and 2 distinct phylogenetic groups were retrieved for clusters 961, 993, and 1,144. The construction of phylogenetic networks of the sequences embedded in the groups identified by Conclustador revealed, in most cases, high levels of interspecies reticulation. Overall, these data suggest the presence of potentially abundant HGT at lower taxonomical levels than those reported in figures 1 and 2.
Furthermore, in order to shed some light on the putative functions encoded by the shared genes, we performed a Clusters of Orthologous Groups (COG) of proteins-based functional annotation of the sequences embedded in each plasmid cluster. The data obtained (also reported in supplementary material S5, Supplementary Material online) revealed that most of the sequences responsible for the plasmid interconnections encode proteins involved in DNA transposition and recombination. This is not surprising since these functions are strongly linked to the process of HGT and, consequently, to plasmids. Nevertheless, as shown in supplementary material S5 (Supplementary Material online), other genes are shared among the different plasmids embedded in the same cluster and, importantly, their encoded functions are not directly related to the process of HGT itself. This suggests that other functions, probably related to more complex phenotypes, are shared by the different plasmids, including, for example, genes involved in transcription, inorganic ion transport and metabolism, and cell motility (the three most abundant functional categories of plasmid cluster 961, see supplementary material S5, Supplementary Material online).
To study the sharing of resistance and virulence genes, the same procedure of network construction was applied to the AR and VF sequence data sets. Results of these analyses for networks of 70% identity criterion are shown in supplementary material S6 (Supplementary Material online). Overall, the topology of both networks appeared to be similar to 70% and 95% networks of all sequences, although some differences can be identified. Indeed, concerning the AR network, the Proteobacterial plasmids do not form a single component, but two different major components can now be identified, one embedding Gammaproteobacterial and Actinobacterial plasmids and the other one embedding Beta and Alphaproteobacterial sequences. This suggests that plasmids belonging to these taxonomic units are not preferential transfer partners of AR genes for Gammaproteobacteria representatives. Conversely, in the virulence network, Proteobacterial plasmids form the major connected component of the graph (supplementary material S6, Supplementary Material online), revealing an intense sharing of virulence-related genes among microorganisms belonging to this taxonomic unit. Although some remarkable exceptions of plasmids acting as bridges in connecting otherwise separate groups do exist (see below), the other clusters of virulence network are overall coherent with the phylogenetic class affiliation (although intense gene sharing might be present within these groups of plasmids, as shown by previous phylogenetic coherence analysis).
Network Features and Taxonomy
In order to globally analyze the evolutionary relationships underlying the plasmid populations, we applied graph theory measures to the gene-sharing networks. In particular, the networks were analyzed for node degree and betweenness. Degree is defined as the number of connections a node has to other nodes. In the present context, a plasmid with a high degree is a plasmid that shares genes with a large number of other plasmids. Betweenness is a centrality measure defined as the frequency with which a node lies on the shortest path between two other network nodes. In this context, a plasmid with a high betweenness can transfer genes to many other plasmids in the network with a low number of gene transfer events and, in other words, can function as a bridge between otherwise disconnected regions of the network.
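The two measures can be illustrated on a toy graph. The sketch below is a plain-Python stand-in for the igraph routines named in Materials and Methods, not the authors' code: betweenness is computed with Brandes' algorithm for unweighted, undirected graphs and is left unnormalized, and the node names and edges are invented.

```python
from collections import deque


def adjacency(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back to s.
        delta = dict.fromkeys(adj, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2 for v, b in score.items()}


# Toy graph, invented for illustration: two triangles bridged by node "x".
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "x"), ("x", "d")]
adj = adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
```

Node `x` has the lowest degree together with the peripheral triangle nodes, yet it has the highest betweenness because every shortest path between the two triangles passes through it, which is the bridge behaviour described above.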
Accordingly, we computed centrality measures along the network, for all the classes of prokaryotes present in the data set. Results are provided in figure 3, whose analysis revealed a positive correlation between degree and betweenness that has also been observed by Halary et al. (2010). However, in the network, some nodes showed a much higher betweenness than most nodes of the same degree (see below). Such outliers, characterized by a low degree but a high betweenness, are especially important in any given network, as they can be seen as bridges between smaller, more connected parts of the network (Halary et al. 2010).
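For reference, the Pearson product–moment coefficient used to relate degree and betweenness can be computed as in the sketch below. The function name `pearson_r` and the numbers are invented for illustration and are not values from the study; the second data set adds one hypothetical low-degree, high-betweenness node to show how such an outlier weakens the linear correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Toy values, invented for illustration.
degrees = [1, 2, 3, 4]
betweenness_values = [2.0, 4.0, 6.0, 8.0]
r_linear = pearson_r(degrees, betweenness_values)

# Same data plus one bridge-like node: low degree, high betweenness.
r_with_bridge = pearson_r(degrees + [2], betweenness_values + [20.0])
```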

Dependence of plasmid betweenness on plasmid degree for different phylogenetic classes according to Pearson's product–moment correlation coefficient.
Tables 1 and 2 report the highest degree and betweenness values, respectively, for individual plasmids in the 70% and 95% identity networks of all sequences. The analysis of table 1 reveals that all the plasmids possessing the highest values of degree belong to the Gamma subdivision of Proteobacteria. This result can be easily explained by the oversampling of plasmids from this class of bacteria. Indeed, the plasmid data used in this study are unsystematically gathered from several unrelated sources and are highly biased toward human pathogenic organisms (mostly Gammaproteobacteria) (Wu et al. 2009). In this context, it is likely that more detailed studies of individual environments would reveal several gene-sharing events between various phylogenetic groups that are not represented in the current data set. Nevertheless, a detailed inspection of high-degree plasmids gave further support to previous observations based on single-plasmid sequence data. For example, plasmid pU302L (see table 1) from Salmonella enterica subsp. enterica serovar Typhimurium has already been described as possessing a mosaic pattern of sequence homology with other plasmids, suggesting, in turn, that this plasmid acquired resistance genes from a variety of enteric bacteria (Chen et al. 2007). Notably, the fact that this plasmid has the highest degree in the 95% network indicates that it acquired foreign genetic material from very closely related microorganisms and/or very recently in time. Similarly, most of the other plasmids listed in table 1 possess a well-documented history of HGT events (see, e.g., p1658/97 [Zienkiewicz et al. 2007; Yi et al. 2010] and pKF3-140 [Yi et al. 2010]).
Individual Plasmids with the Highest Degree Measures Observed in the Gene-Sharing Networks of All Genes.
Accession Number | Microorganism | Plasmid Name | Degree | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m) |
70% Network | |||||
NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 268 | 17 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 254 | 17 | c |
NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 253 | 8 | c |
NC_013951 | Klebsiella pneumoniae | pKF3-140 | 247 | 9 | c |
NC_013728 | E. coli O26:H- | pO26-CRL | 243 | 21 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 242 | 13 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110. | 241 | 17 | c |
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302L | 240 | 17 | c |
NC_013122 | E. coli | pEK499 | 231 | 15 | c |
NC_013437 | S. enterica subsp. enterica serovar Typhimurium | pSLT-BT | 225 | 4 | c |
95% Network | |||||
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302S | 192 | 16 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 188 | 13 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 187 | 17 | c |
NC_013951 | K. pneumoniae | pKF3-140 | 186 | 9 | c |
NC_011964 | E. coli | pAPEC-O103-ColBM | 184 | 8 | c |
NC_010119 | S. enterica subsp. enterica serovar Choleraesuis | pOU7519 | 171 | 17 | |
NC_013728 | E. coli O26:H- | pO26-CRL | 168 | 21 | c |
NC_013122 | E. coli | pEK499 | 166 | 15 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110. | 165 | 17 | c |
NC_004998 | E. coli | p1658/97 | 157 | 11 | c |
Accession Number | Microorganism | Plasmid Name | Degree | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m) |
70% Network | |||||
NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 268 | 17 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 254 | 17 | c |
NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 253 | 8 | c |
NC_013951 | Klebsiella pneumoniae | pKF3-140 | 247 | 9 | c |
NC_013728 | E. coli O26:H- | pO26-CRL | 243 | 21 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 242 | 13 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110. | 241 | 17 | c |
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302L | 240 | 17 | c |
NC_013122 | E. coli | pEK499 | 231 | 15 | c |
NC_013437 | S. enterica subsp. enterica serovar Typhimurium | pSLT-BT | 225 | 4 | c |
95% Network | |||||
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302S | 192 | 16 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 188 | 13 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 187 | 17 | c |
NC_013951 | K. pneumoniae | pKF3-140 | 186 | 9 | c |
NC_011964 | E. coli | pAPEC-O103-ColBM | 184 | 8 | c |
NC_010119 | S. enterica subsp. enterica serovar Choleraesuis | pOU7519 | 171 | 17 | |
NC_013728 | E. coli O26:H- | pO26-CRL | 168 | 21 | c |
NC_013122 | E. coli | pEK499 | 166 | 15 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110. | 165 | 17 | c |
NC_004998 | E. coli | p1658/97 | 157 | 11 | c |
Individual Plasmids with the Highest Degree Measures Observed in the Gene-Sharing Networks of All Genes.
Accession Number | Microorganism | Plasmid Name | Degree | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m) |
70% Network | |||||
NC_010119 | Salmonella enterica subsp. enterica serovar Choleraesuis | pOU7519 | 268 | 17 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 254 | 17 | c |
NC_011964 | Escherichia coli | pAPEC-O103-ColBM | 253 | 8 | c |
NC_013951 | Klebsiella pneumoniae | pKF3-140 | 247 | 9 | c |
NC_013728 | E. coli O26:H- | pO26-CRL | 243 | 21 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 242 | 13 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110. | 241 | 17 | c |
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302L | 240 | 17 | c |
NC_013122 | E. coli | pEK499 | 231 | 15 | c |
NC_013437 | S. enterica subsp. enterica serovar Typhimurium | pSLT-BT | 225 | 4 | c |
95% Network | |||||
NC_006816 | S. enterica subsp. enterica serovar Typhimurium | pU302S | 192 | 16 | c |
NC_010488 | E. coli SMS-3-5 | pSMS35_130 | 188 | 13 | c |
NC_006856 | S. enterica subsp. enterica serovar Choleraesuis str. SC-B67 | pSC138 | 187 | 17 | c |
NC_013951 | K. pneumoniae | pKF3-140 | 186 | 9 | c |
NC_011964 | E. coli | pAPEC-O103-ColBM | 184 | 8 | c |
NC_010119 | S. enterica subsp. enterica serovar Choleraesuis | pOU7519 | 171 | 17 | |
NC_013728 | E. coli O26:H- | pO26-CRL | 168 | 21 | c |
NC_013122 | E. coli | pEK499 | 166 | 15 | c |
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 165 | 17 | c |
NC_004998 | E. coli | p1658/97 | 157 | 11 | c |
Individual Plasmids with the Highest Betweenness Measures Observed in the Gene-Sharing Networks of All Genes.
Accession Number | Microorganism | Plasmid Name | Betweenness | No. of tra/mob Genes | Conjugative (c) or Mobilizable (m) |
70% Network | |||||
NC_007635 | Escherichia coli | pCoo | 8050 | 10 | c/m |
NC_006663 | Staphylococcus epidermidis RP62A | pSERP | 6329 | 3 | m |
NC_007974 | Cupriavidus metallidurans CH34 | megaplasmid | 6067 | 14 | c |
NC_011092 | Salmonella enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 5800 | 17 | c |
NC_010558 | E. coli 1520 | pIP1206 | 5750 | 16 | c |
NC_009651 | Klebsiella pneumoniae subsp. pneumoniae MGH 78578 | pKPN5 | 5641 | 11 | |
NC_011339 | Bacillus cereus H3081.97 | pH308197_258 | 5507 | 2 | m |
NC_011655 | B. cereus AH187 | pAH187_270 | 5330 | 7 | c/m |
NC_012586 | Rhizobium sp. NGR234 | pNGR234b | 5271 | 88 | c |
NC_010980 | Enterococcus faecium | pVEF3 | 4700 | 4 | m |
95% Network | |||||
NC_011092 | S. enterica subsp. enterica serovar Schwarzengrund str. CVM19633 | pCVM19633_110 | 38781 | 17 | c |
NC_005024 | Staphylococcus aureus | pSK41 | 29020 | 7 | c |
NC_012547 | S. aureus | pGO1 | 29020 | 9 | c |
NC_010378 | E. coli | pOLA52 | 21221 | 3 | c |
NC_005054 | S. aureus | pLW043 | 19209 | 6 | c |
NC_009435 | Lactococcus lactis | pGdh442 | 18216 | 7 | m |
NC_004669 | Enterococcus faecalis V583 | pTEF1 | 15617 | 8 | |
NC_008381 | Rhizobium leguminosarum bv. viciae 3841 | pRL10 | 15030 | 27 | c |
NC_013121 | E. coli | pEK516 | 13724 | 11 | c |
NC_005327 | E. coli | pC15-1a | 13073 | 9 | c |
NC_011996 | Macrococcus caseolyticus JCSC5402 | pMCCL2 | 12981 | 4 | m |
High-betweenness nodes (plasmids) span a larger taxonomic spectrum, suggesting that this centrality measure is less affected by sampling biases. Indeed, the plasmids with the highest betweenness values belong to diverse phylogenetic classes, including Bacilli, Lactobacilli, and Gamma-, Beta-, and Alphaproteobacterial representatives. As in the case of high-degree plasmids, the mosaic-like structure of high-betweenness plasmids has been described before, for example, for pCoo from Escherichia coli (Froehlich et al. 2005) and pGO1 from Staphylococcus aureus (Caryl and O'Neill 2009). Hence, although the overall plasmid clustering seems to agree with the taxonomic classification of the source microorganisms, some plasmids compact the overall network, residing in the path between plasmids that would otherwise remain disconnected (Halary et al. 2010). Importantly, some of the plasmids that were found to possess high degree/betweenness values (tables 1 and 2) were also found to be central in other gene-sharing network analyses performed by Halary et al. (2010) (namely, plasmids pOU7519 and pU302L from Salmonella representatives, p1658/97 and pIP1206 from E. coli, pKPN5 from Klebsiella pneumoniae, pVEF3 from Enterococcus faecium, pSK41 from S. aureus, pGdh442 from Lactococcus lactis, and pTEF1 from Enterococcus faecalis V583), thus confirming the key role of these DNA molecules in the flow of genetic material among different microorganisms. In our opinion, these plasmids represent key players from an evolutionary viewpoint, contributing to the spread of potentially clinically relevant genetic determinants within the whole bacterial mobilome.
Many plasmids (1,159 for the 70% identity network of all genes and 1,369 for the 95% identity network) in the data set shared fewer than five genes with any other plasmid and therefore did not belong to any connected component. The taxonomic composition of this disconnected component of the network is presented in figure 4. Statistical randomization testing (as described in Materials and Methods) was performed to evaluate the effect of sampling bias on the frequency distribution. Most of the phylogenetic classes possessed between 2% and 5% of disconnected plasmids, the only exception being Gammaproteobacteria (almost 15% of disconnected plasmids). For most classes, the number of disconnected plasmids was higher than expected by random shuffling of the networks.

The phylogenetic class distribution of the disconnected plasmids in the data set. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).
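The thresholding rule described above (two plasmids are linked only if they share at least five genes, and plasmids without any such link are treated as disconnected) can be illustrated with a minimal sketch. It assumes that homologous genes have already been grouped into families at a given identity threshold; the family labels, the plasmid names, and the function names `gene_sharing_edges` and `disconnected` are hypothetical and are not taken from the original workflow.

```python
from collections import defaultdict
from itertools import combinations


def gene_sharing_edges(plasmid_families, min_shared=5):
    """Return {(a, b): n_shared} for plasmid pairs sharing >= min_shared families.

    plasmid_families maps a plasmid name to the set of gene-family labels it
    carries (assumed to come from an upstream homology search).
    """
    # Invert the mapping: gene family -> plasmids that carry it.
    carriers = defaultdict(set)
    for plasmid, families in plasmid_families.items():
        for family in families:
            carriers[family].add(plasmid)

    # Count shared families for every pair of plasmids that co-occur.
    shared = defaultdict(int)
    for plasmids in carriers.values():
        for a, b in combinations(sorted(plasmids), 2):
            shared[(a, b)] += 1

    # Keep only the pairs that reach the threshold.
    return {pair: n for pair, n in shared.items() if n >= min_shared}


def disconnected(plasmid_families, edges):
    """Plasmids that have no edge at the chosen threshold."""
    linked = {plasmid for pair in edges for plasmid in pair}
    return sorted(set(plasmid_families) - linked)


# Toy data (hypothetical): pA and pB share five families, pC shares two, pD none.
toy = {
    "pA": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "pB": {"f1", "f2", "f3", "f4", "f5"},
    "pC": {"f1", "f2", "f7"},
    "pD": {"f8"},
}
edges = gene_sharing_edges(toy, min_shared=5)
singletons = disconnected(toy, edges)
print(edges)       # only the pA-pB link survives the threshold
print(singletons)  # pC and pD are disconnected at this threshold
```

Lowering `min_shared` in this sketch recovers the weaker links (here, those involving pC), which is the kind of relaxation explored for the <5 networks in the next section.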
Dynamics of Genes in the Plasmid Population
In the previous sections, we mainly analyzed networks in which two plasmids were connected if they shared at least five genes, thus surely underestimating the real amount of gene transfer events among plasmids. To go into greater detail and to analyze the possible dynamics of gene transfer among plasmids, we built gene-sharing networks taking into account the sharing of one to four genes between two given plasmids. Such networks were constructed adopting the same computational strategy used for ≥5 networks (see Materials and Methods) and, together with the taxonomic distribution of singlets and the cross-taxa interconnections, are reported in supplementary material S7 (Supplementary Material online). Overall, <5 networks embedded almost the same number of links (11,458 and 5,136 for 70% and 95% identity thresholds, respectively) as ≥5 networks (12,444 and 6,777 for 70% and 95% identity thresholds, respectively), suggesting an extensive exchange of single genes (or of relatively small gene sets) among the different plasmids.
Louvain clustering of <5 networks, although producing a large fraction of taxonomically highly coherent groups, resulted in slightly more heterogeneous plasmid clustering compared with the clustering obtained from ≥5 networks (fig. 2B and C). This suggests that, when considering the transfer of single genes or small groups of genes, taxonomic barriers can be bypassed more frequently than in the movement of larger sets of genes. In agreement with the previous congruency analysis, a deeper analysis of the phylogenetic coherence (adopting the coherence analysis pipeline described in Materials and Methods) of the gene families within the major network clusters revealed a high amount of incongruency (data not shown). Hence, according to the overall body of data presented here, it appears that the sharing of relatively small gene sets is more abundant and spans a larger phylogenetic distance than transfers of larger sets of genes, although most of this genetic exchange still happens within the boundaries of microbial phylogenetic classes.
Network Comparison
To explore the differences among the networks, we computed Pearson product–moment correlation coefficients between betweenness and degree values for each node (i.e., plasmid) (fig. 5). The data revealed a low positive correlation between betweenness and degree in each of the networks, independently of the nucleic acid identity threshold and of the functions shared among the different plasmids (virulence or AR genes). R² values range between 0.25 and 0.36 for the 70% networks and are slightly higher for the 95% networks (ranging from 0.29 to 0.44). Accordingly, node degree does not explain all the variation in node betweenness, regardless of the timing of the gene transfer(s) (70% vs. 95% thresholds) and of the functions that are transferred (virulence vs. AR determinants); the values are most likely determined by the mobile nature of plasmids themselves.

Dependency of plasmid betweenness from plasmid degree for the major networks built in this work according to Pearson's product–moment correlation coefficient. Networks of <5 and ≥5 connections are indicated as low and high, respectively.
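As an illustration of the comparison summarized in this section, the following sketch computes node degree, unnormalized betweenness (using Brandes' algorithm for unweighted, undirected graphs), and the Pearson correlation between the two on a small toy graph. The graph and all function names are hypothetical; the sketch is not the software used in the study and only shows how the two centrality measures can be correlated.

```python
from collections import deque
from math import sqrt


def degree(adj):
    """Number of neighbours of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness (Brandes' algorithm, unweighted undirected graph)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0.0)
        sigma[s] = 1.0
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies, farthest nodes first.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was visited from both endpoints.
    return {v: b / 2.0 for v, b in bc.items()}


def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Toy graph (hypothetical): a triangle A-B-C attached through C-D to leaves E and F.
adj = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C", "E", "F"],
    "E": ["D"],
    "F": ["D"],
}
deg = degree(adj)
bc = betweenness(adj)
nodes = sorted(adj)
r = pearson_r([deg[v] for v in nodes], [bc[v] for v in nodes])
print(f"r = {r:.3f}, R^2 = {r * r:.3f}")
```

In this toy graph, C and D have the same degree but different betweenness, a small-scale example of why degree alone cannot account for all the variation in betweenness.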
Analysis of Mobilizable and Conjugative Plasmids
Conjugative plasmids have been defined as “vessels” of the communal gene pool (Norman et al. 2009). Indeed, this class of plasmids possesses the ability to “visit” different cells and, in principle, undergo genetic rearrangements (such as homologous recombination) with other plasmids and/or other informative molecules (phage genomes and chromosomes). For this reason, conjugative plasmids might be expected to possess a more central position within the overall plasmid gene-sharing network with respect to those that are not mobilizable. To test this hypothesis, all the tra- and mob-like sequences of the plasmids were eliminated from the networks, and the centrality measures of conjugative/mobilizable plasmids were evaluated. Plasmid mobility was estimated by identifying the number of mob and tra genes that they harbor (an approach similar to that adopted in Smillie et al. 2010 and described in Materials and Methods). The relationship between mobility and the network measures was investigated by studying the distribution of the centrality measures between the mobile and the nonmobile plasmids. The distributions of the centrality measures are presented in figure 6 and are significantly higher for mobilizable plasmids in the networks of all genes and resistance genes (P values according to Mann–Whitney tests are presented in fig. 6). Therefore, the presence of mob or tra genes significantly promotes the gene-sharing measures in the networks of all genes and AR genes. This suggests that plasmid mobility is an important mechanism in spreading various genetic traits within the plasmid community, including AR genes. This fully agrees with the central role inferred for conjugative plasmids in the context of bacterial evolution (Norman et al. 2009) and gives further support to the idea that these particular plasmids act as vessels of the communal gene pool.
This also indicates that the high incidence of high degree and betweenness values in certain phylogenetic classes (such as Gammaproteobacteria) results not only from their overrepresentation in the current data set but also from the genetic properties of their plasmids.

The relationship between the network centrality measures and plasmid mobility. The mobile plasmids are significantly more central in the networks of all and resistance genes, as indicated by the P values (calculated with Mann–Whitney tests) embedded in the figure.
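The comparison of centrality distributions between mobile and nonmobile plasmids relies on the Mann–Whitney test. A minimal sketch of that test is given below, using average ranks for ties and the normal approximation for the two-sided P value; the input values are invented for illustration, and the function name `mann_whitney_u` is hypothetical. For small samples an exact test (for example, the one provided by SciPy's `scipy.stats.mannwhitneyu`) would be preferable to this approximation.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U statistic of x, two-sided P value).

    Uses average ranks for tied values and the normal approximation with a
    tie correction (no continuity correction).
    """
    # Pool both samples, remembering which group each value came from.
    pooled = sorted((value, group) for group, sample in enumerate((x, y))
                    for value in sample)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the block of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / sqrt(var_u)
    p_two_sided = erfc(abs(z) / sqrt(2.0))
    return u_x, p_two_sided


# Hypothetical degree values for mobile and nonmobile plasmids.
mobile_degree = [12, 15, 18, 20, 25, 30]
nonmobile_degree = [1, 2, 3, 4, 5, 6]
u, p = mann_whitney_u(mobile_degree, nonmobile_degree)
print(f"U = {u}, P = {p:.4f}")
```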
Gene Sharing over Phylogenetic Classes
The importance of plasmids within the complex microbial evolutionary network resides also in the capability to connect microbes separated by a (more or less) long phylogenetic distance and to overcome the various barriers to HGT (Thomas and Nielsen 2005). The occurrence of gene sharing over phylogenetic classes was enumerated and visualized in figure 7.

The frequency of interclass gene transfer events in the networks. A plus sign (+) is used to mark the interclass transfers that were more abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴). A minus sign (−) is used to mark the interclass transfers that were less abundant than expected by random assignment of the transfer events between plasmids (permutation test, P value < 1 × 10⁻⁴).
Interestingly, some connections in the network span very large phylogenetic distances. For example, we found connections linking Alphaproteobacteria and Cyanobacteria and in particular plasmid pCC7120beta from Nostoc sp. PCC 7120 with plasmid pBBta01 from Bradyrhizobium sp. BTAi1 and pCC7120gamma from Nostoc sp. PCC 7120 with plasmid pNGR234b from Sinorhizobium fredii NGR234. These connections suggest the presence of HGT among microorganisms inhabiting very different ecological niches (multiple and host associated for Cyanobacteria and Alphaproteobacteria, respectively), involving genes linked to important functions, such as copper transport and transcriptional regulation, respectively. Remarkably, interkingdom transfers (involving chemotaxis-related genes) were also observed: this is the case, for example, of connections linking plasmid pH308197_258 from Bacillus cereus H3081.97 to plasmid pHmuk01 from Halomicrobium mukohataei DSM 12286. Also in this case, the microorganisms belong to likely unrelated habitats (multiple and specialized, respectively).
However, because the number of interclass connections is likely strongly affected by sampling biases, we investigated the significance of the observed interclass connections by performing random permutation of the original network, as described in Materials and Methods. In the 70% identity network, interclass links included connections between more closely related microorganisms (e.g., connections between Alpha-, Beta-, and Gammaproteobacteria and between Bacilli and Lactobacilli) as well as connections between more distantly related microorganisms (i.e., Actinobacteria and Betaproteobacteria, Actinobacteria and Gammaproteobacteria, and Alphaproteobacteria and Deinococci). However, some closely related microorganisms possessed fewer connections than expected by chance (e.g., between Alphaproteobacteria and Gammaproteobacteria, P value < 1 × 10⁻⁴), possibly indicating a genetic incompatibility between these groups (Thomas and Nielsen 2005). As might be expected, when analyzing the 95% network, the number of observed connections decreased and mainly closely related taxonomic groups were still interconnected (Bacilli–Lactobacilli and Betaproteobacteria–Gammaproteobacteria [P value < 1 × 10⁻⁴] among overrepresented and Alphaproteobacteria–Gammaproteobacteria and Bacilli–Gammaproteobacteria among underrepresented [P value < 1 × 10⁻⁴]). Notably, the connection between distantly related Gammaproteobacteria and Actinobacteria also remained strong.
As noted in the case of gene transfers among phylogenetically incoherent groups (supplementary material S7, Supplementary Material online), the majority of shared genes code for functions that are related to the process of HGT itself and generally belong to the L category in COG annotation (fig. 8). Nevertheless, other functions are also exchanged, as indicated by gene sharing (fig. 8), underlining the key role of plasmids in spreading important biological traits throughout the whole microbial world.
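The logic of such a permutation test can be sketched as follows. The exact randomization scheme is the one described in Materials and Methods and is not reproduced here; this sketch assumes, for illustration only, that the class labels are shuffled among plasmids while the edges are kept fixed. The plasmid names, the graph, and the function names are hypothetical.

```python
import random


def count_between(edges, labels, class_a, class_b):
    """Number of edges joining a class_a plasmid to a class_b plasmid."""
    wanted = {class_a, class_b}
    return sum(1 for u, v in edges
               if labels[u] != labels[v] and {labels[u], labels[v]} == wanted)


def permutation_test(edges, labels, class_a, class_b, n_perm=10000, seed=0):
    """Return (observed count, P for overrepresentation, P for underrepresentation).

    Assumption of this sketch: the null model shuffles class labels among
    nodes and leaves the network topology untouched.
    """
    rng = random.Random(seed)
    observed = count_between(edges, labels, class_a, class_b)
    nodes = list(labels)
    classes = [labels[node] for node in nodes]
    at_least = at_most = 0
    for _ in range(n_perm):
        rng.shuffle(classes)
        shuffled = dict(zip(nodes, classes))
        count = count_between(edges, shuffled, class_a, class_b)
        at_least += count >= observed
        at_most += count <= observed
    # Add-one correction so that an estimated P value is never exactly zero.
    return observed, (at_least + 1) / (n_perm + 1), (at_most + 1) / (n_perm + 1)


# Toy network (hypothetical): every Actinobacterial plasmid is linked to every
# Gammaproteobacterial plasmid, and there are no links within a class.
actino = ["a1", "a2", "a3", "a4"]
gamma = ["g1", "g2", "g3", "g4"]
labels = {p: "Actinobacteria" for p in actino}
labels.update({p: "Gammaproteobacteria" for p in gamma})
edges = [(a, g) for a in actino for g in gamma]

observed, p_over, p_under = permutation_test(
    edges, labels, "Actinobacteria", "Gammaproteobacteria", n_perm=2000)
print(observed, p_over, p_under)
```

With the add-one correction, a reported threshold such as P < 1 × 10⁻⁴ requires at least 10,000 permutations; in practice `n_perm` would be chosen accordingly.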

COG functional annotation of the genes shared by the plasmids belonging to the different taxonomical classes of the data set.
Gene Transfer between Actinobacteridae and Gammaproteobacteria
According to the results presented in figure 7, the gene sharing between Actinobacteria and Gammaproteobacteria spans one of the longest phylogenetic distances within our networks (supplementary material S8, Supplementary Material online) and appears to be crucial in transferring AR genes. Furthermore, most of the shared genes are (at least) 95% similar and therefore, according to the molecular clock hypothesis, the transfer between these classes has occurred recently. For this reason, we further analyzed this apparently preferential gene flow.
To better characterize the gene sharing between Actinobacteria and Gammaproteobacteria, we selected representative plasmids with a high number of shared genes between Gammaproteobacteria and Actinobacteria and visualized them as a circular ideogram with resistance-, conjugation-, and transposition-related genes and gene-sharing events (fig. 9). The analysis of figure 9 revealed that the AR genes transfer between the plasmids by transposition, as most of the links connecting Actinobacteria and Gammaproteobacteria fall in plasmid regions embedding AR- and/or transposition-related genes. These results indicate the presence of a clinically important gene flow between representatives of these microbial groups, although they do not indicate the direction of these gene transfers (i.e., from Actinobacteria to Gammaproteobacteria or vice versa). To shed some light on this point, we investigated the composition of the involved plasmids under the assumption that, if the HGT events are recent (as suggested by the high amino acid identity), the transferred genes are expected to have a GC content closer to that of the donor plasmids than to that of the recipient ones (Karlin 2001). Hence, the GC content of the Actinobacterial and Gammaproteobacterial plasmids and genes was calculated and compared (supplementary material S9, Supplementary Material online). The Actinobacterial plasmid GC content (mean 56% from seven plasmids) was significantly higher (P value = 9.4 × 10⁻³ according to a Mann–Whitney test) than the Gammaproteobacterial GC content (mean 51% from 95 plasmids). Moreover, GC contents were calculated for the individual transferred genes and compared with the plasmids. According to a Mann–Whitney test, the transferred genes have a significantly different GC content from the Gammaproteobacterial plasmids (P = 7.0 × 10⁻¹⁵) but are not significantly different from Actinobacterial plasmids (P = 0.42).
Accordingly, the whole body of data presented in this section suggests that the direction of gene transfer is very likely from Actinobacteria to Gammaproteobacteria. This is consistent with the knowledge that some Actinobacteria are natural producers of antibiotic compounds and, therefore, a potential source of AR genes to human pathogens (Wright 2007; Miao and Davies 2010).

An ideogram of gene transfers between Actinobacterial plasmids (accession nos. NC_004939, NC_004945, and NC_014167) and Gammaproteobacterial plasmids (accession nos. NC_006816, NC_009141, NC_009651, NC_010488, NC_010886, and NC_011092). Gene-sharing events are marked using the curves in the middle of the ideogram. GC content of the plasmids is plotted on the outer side of the plasmid molecules if it is above the average of the GC content of the corresponding plasmid. Genes related to resistance, conjugation, and transposition are marked as lines on outer, middle, and innermost rings, respectively, on the inner side of the plasmid ring.
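The reasoning behind the GC content comparison can be sketched in a few lines. The sequences below are invented toy strings, and the helper names `gc_content` and `nearest_group` are hypothetical. Note also that the sketch simply compares group means, a deliberate simplification, whereas the analysis in this section relies on Mann–Whitney tests.

```python
from statistics import mean


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def nearest_group(gene_gc, groups):
    """Name of the group whose mean GC content is closest to the genes' mean.

    groups maps a group name to a list of GC fractions. Comparing means is a
    simplification used here for illustration only.
    """
    target = mean(gene_gc)
    return min(groups, key=lambda name: abs(target - mean(groups[name])))


# Toy sequences (hypothetical), chosen so that the two groups differ clearly.
actino_plasmids = ["GCGCGGCCAT", "GGCCGCGATC"]
gamma_plasmids = ["ATATGCATTA", "TTAAGCATAT"]
transferred_genes = ["GCGCGCAT", "GGCCGGTA"]

groups = {
    "Actinobacteria": [gc_content(s) for s in actino_plasmids],
    "Gammaproteobacteria": [gc_content(s) for s in gamma_plasmids],
}
nearest = nearest_group([gc_content(s) for s in transferred_genes], groups)
print(nearest)
```

Under the assumption stated in the text (recently transferred genes retain a composition close to that of the donor), the group returned by `nearest_group` would be the candidate donor.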
Conclusions
The use of gene-sharing networks as a tool to investigate microbial evolutionary relationships is rapidly expanding, especially when studying the non-tree-like structures that sometimes arise in evolution (Dagan et al. 2008; Halary et al. 2010). The power of such an approach is demonstrated here by revealing the relationships between biological properties (e.g., plasmid mobility) and network properties (e.g., plasmid centrality) in the gene-sharing network. Moreover, the approach applied here also revealed an extensive AR gene sharing between Actinobacterial and Gammaproteobacterial plasmids, suggesting a potential source of AR genes that might have led to the recent emergence of antibiotic multiresistance in pathogenic organisms.
The plasmid sequences analyzed in this study were gathered in a nonsystematic manner from different sequencing projects; their sampling is therefore random and likely biased toward human pathogenic organisms. The bioinformatic workflow described here would be best suited for single genomic sequence data sets obtained from specifically selected environments. We expect such data sets to become available as the DNA sequencing costs decrease, and genome sequencing from single cells becomes a routine approach (Stepanauskas and Sieracki 2007; Rodrigue et al. 2009). The proposed approach could then be used to investigate whether the functional categories of transferred genes would reflect the different selective patterns present in the given environment(s). Therefore, obtaining single genome data sets from multiple different environments would permit evaluation and comparison of gene-sharing patterns in response to different environmental conditions.
The study was financially supported by the Academy of Finland (grant number 129873) and the Finnish Graduate School in Environmental Science and Technology (EnSTe). M.F. is financed by a postdoctoral grant from “Fondazione Adriano Buzzati-Traverso.” The authors would like to thank Kimmo Mattila for his kind assistance in parallel BLAST analyses.
Author notes
Associate editor: James McInerney