Juliette Luiselli, Jonathan Rouzaud-Cornabas, Nicolas Lartillot, Guillaume Beslon, Genome Streamlining: Effect of Mutation Rate and Population Size on Genome Size Reduction, Genome Biology and Evolution, Volume 16, Issue 12, December 2024, evae250, https://doi.org/10.1093/gbe/evae250
Abstract
Genome streamlining, i.e. genome size reduction, is observed in bacteria with very different life traits, including endosymbiotic bacteria and several marine bacteria, raising the question of its evolutionary origin. None of the hypotheses proposed in the literature is firmly established, mainly due to the many confounding factors related to the diverse habitats of species with streamlined genomes. Computational models may help overcome these difficulties and rigorously test hypotheses. In this work, we used Aevol, a platform designed to study the evolution of genome architecture, to test 2 main hypotheses: that an increase in population size (N) or mutation rate (μ) could cause genome reduction. In our experiments, both conditions lead to streamlining but have very different resulting genome structures. Under increased population sizes, genomes lose a significant fraction of noncoding sequences but maintain their coding size, resulting in densely packed genomes (akin to streamlined marine bacteria genomes). By contrast, under an increased mutation rate, genomes lose both coding and noncoding sequences (akin to endosymbiotic bacteria genomes). Hence, both factors lead to an overall reduction in genome size, but the coding density of the genome appears to be determined by
Many bacterial species show reduced genomes. However, the diversity of these species and their life traits makes it difficult to identify the mechanisms that led to this reduction. Indeed, no unifying hypothesis accounts for the whole diversity of genome size reduction. Here, we used simulations to systematically explore the effect of population size and mutation rate on genome size. Our results show that the interaction between these 2 factors tightly determines the size, but also the density of genomes, making it possible to account for the whole diversity of reduced genomes by acting on these 2 parameters only. Our results suggest a theoretical model in which genome reduction is driven by a robustness/fitness trade-off.
Introduction
Genome size was one of the first genome characteristics to be studied (Leth Bak et al. 1969; Bachmann 1972), yet its dynamics and causal factors are still poorly understood. Genome size is hugely variable across life: from less than
The observed range of genome sizes is more restricted when studying only bacterial organisms (Westoby et al. 2021), ranging from
Buchnera aphidicola, and endosymbionts more generally, are characterized by very small effective population sizes (
In sharp contrast, free-living marine bacteria such as Prochlorococcus marinus or Pelagibacter ubique also have reduced genomes (Giovannoni et al. 2005; Batut et al. 2014), but are believed to have very large effective population sizes (Marais et al. 2008; Flombaum et al. 2013; Giovannoni et al. 2014), although that is an ongoing debate (Chen et al. 2022; Filatov and Kirkpatrick 2024). Notably, in their case, genome size reduction is primarily driven by the loss of noncoding sequences rather than coding sequences (Giovannoni et al. 2005; Batut et al. 2014). This phenomenon is called streamlining and could indicate highly effective selection (Wolf and Koonin 2013; Giovannoni et al. 2014). Many hypotheses have been proposed to account for genome size reduction and the associated changes in genome architecture in such free-living organisms: adaptation to a nutrient-poor environment or to other abiotic factors, the Black Queen hypothesis, or high mutation rates (Koskiniemi et al. 2012; Morris et al. 2012; Batut et al. 2014; Ngugi et al. 2023).
Both endosymbionts and free-living marine bacteria thus show a marked reduction in genome size, linked to an increase in mutation rate (Bourguignon et al. 2020) but, strikingly, also linked to either an increase or a decrease in effective population size
In this study, we focus on determining the impact of both an increased mutation rate and a change in population size on genome size evolution. However, mutation rates and population sizes are difficult to estimate. The effective population size is also highly variable through time, such that it is not obvious which long-term average is relevant at the macroevolutionary scale (Brevet and Lartillot 2021; Müller et al. 2022). For that reason, many comparative analyses have relied on somewhat indirect proxies, such as life-history traits (Popadin et al. 2007; Romiguier et al. 2012; Figuet et al. 2016). However, the precise quantitative relation between these proxies and effective population size is difficult to assess. Moreover, the very different living conditions and potential mutational biases of the bacterial species that have undergone genome reduction introduce many confounding factors. To avoid these pitfalls, we turn to simulation, which allows us to control all the parameters (population size, mutation rate, and mutational biases) and the magnitude of their variation. It also ensures that no factors other than those investigated affect the phenomenon under study. Hence, we can gain a theoretical understanding of the relationship between the different factors at stake and genome size reduction.
In silico experimental evolution provides tools to study genomic architecture in detail (Adami 2006; Hindré et al. 2012; Batut et al. 2013). For our study, we need a framework that provides coding and noncoding genomic compartments which can vary independently, and with arbitrary underlying mutational biases for the deletion/insertion balance. Then, running simulations in a perfectly controlled environment covering a broad range of population sizes N and mutation rates μ makes it possible to investigate the conditions and mechanisms leading to genome size reduction. We will hence use Aevol, a simulation platform that provides an explicit genomic structure where both the coding and noncoding genome can evolve freely. Aevol emulates the evolution of bacteria and enables replicated and controlled in silico evolution experiments with known and fixed parameters (Knibbe et al. 2007; Banse et al. 2023). It provides an ideal tool to uncover links between genome size and either population size or mutation rate, as the experimenter perfectly controls these parameters. Throughout the experiments, fitness, genome size, and amounts of coding and noncoding bases are monitored to study the evolution of genome architecture and the response of genome size to changes in μ and N.
Our results show that an increase in either N or μ leads to genome size reduction, regardless of the underlying mutational bias. However, the two conditions lead to very different genome structures, as a high μ reduces both the coding and noncoding compartments while a high N reduces only the noncoding compartment. Surprisingly, they both lead to a similar coding proportion when increased by the same factor, such that
Results
We perform our experiments using Aevol, a forward-in-time evolutionary simulator (Knibbe et al. 2007; Banse et al. 2023). Aevol is an individual-based model which includes an explicit population and in which every organism owns a double-stranded genome. It uses an explicit genome decoding algorithm directly inspired by the central dogma of molecular biology to compute the phenotype, and thus the fitness, of each individual based on its genomic sequence. As Aevol also includes a large variety of mutational operators (including substitutions, InDels, and chromosomal rearrangements), this nonparametric genotype-to-phenotype map allows for changes in the genome architecture (genome size, coding density, overlapping genes or operons, etc.), without assuming a predefined distribution of fitness effects. Indeed, in the model, it is possible to reach similar fitnesses in many ways, by adjusting the number of genes, their loci, their lengths, or the intergenic distances, hence the total amount of noncoding DNA. In Aevol, genes are typically created by duplication-divergence (Kalhor et al. 2024), but they can also be deleted, and some may emerge de novo. Hence, the impact of a given mutation highly depends on the preexisting genome structure, which can in turn be indirectly selected (Knibbe et al. 2007). Aevol therefore allows studying changes in size and structure of genomes in response to changes in population size and mutation rates.
Our experiments start from 5 “Wild-Type” (WT) lines, each having evolved for 10 million generations within a population of

Total a), coding b) and non-coding c) genome size variation, and final coding fraction d), after 2 million generations. For each of the 5 WTs, 10 replicas were performed under a constant mutation rate (

Total a), coding b) and non-coding c) genome size variation, and final coding fraction d), after 2 million generations. For each of the 5 WTs, 10 replicas were performed under a constant population size (
WT id | Fitness (arbitrary unit) | Total genome size (bp) | Coding size (bp) | Non-coding size (bp) | Coding fraction |
---|---|---|---|---|---|
1 | 0.014903 | | | | |
2 | 0.103795 | | | | |
3 | 0.128472 | | | | |
4 | 0.035369 | | | | |
5 | 0.029588 | | | | |
Average | 0.0624254 | | | | |
Genome Size Evolution Following a Change in Population Size and Mutation Rate
Change in Population Size
In the absence of mutational bias, increasing the population size by a factor of 4 or 16 results in a reduction in the total genome size (see Fig. 1a). Yet, this change does not impact the coding and noncoding parts of the genome proportionally: while the size of the coding compartment is barely affected (see Fig. 1b), the noncoding genome size is greatly reduced (see Fig. 1c). As a result, the coding proportion of the genome increases (see Fig. 1d). Conversely, reducing the population size by a factor of 4 or 16 increases the total genome size (Fig. 1a) by greatly increasing the noncoding genome size (Fig. 1c). In the extreme condition
Change in Mutation Rate
In the absence of mutational bias, increasing the mutation rate drastically reduces the total genome size (see Fig. 2a). Thus, at first sight, population size and mutation rate seem to have a similar effect on genome evolution. However, in the details, the effects of these 2 variables on genome structure differ, as the reduction now occurs in both the coding and noncoding genomic compartments (see Fig. 2b and c). The two compartments are nevertheless not affected proportionally by the increase in mutation rate, which more strongly affects the noncoding part of the genome, such that the final coding fraction of the genome increases with μ (see Fig. 2d). Altogether, these results show that streamlined genomes, denser and shorter than their ancestors, can result from either an increase in population size or in mutation rate.
Notably, and despite the very different dynamics displayed in the 2 experiments, a 4-fold increase in N or in μ results in the same final coding proportion of approximately
Linked Effect of Population Sizes and Mutation Rates
Figure 3 shows the variation in the total amount of DNA, coding size, and noncoding size, as well as the variation in coding fraction for several combinations of changes in N and μ (note that, in the panels of Fig. 3, the bottom line and the central column, respectively, correspond to the values presented in Figs. 1 and 2).
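The compartment measures compared across Figs. 1 to 3 (total, coding and noncoding size, coding fraction) can be computed from a genome annotation. The sketch below is a hypothetical helper, not code from the Aevol platform: it assumes genes are given as (start, end) intervals on a circular genome, counts a base as coding if at least one gene covers it (so overlapping genes are not double-counted), and expresses variation as a final-over-initial ratio, as in the Fig. 4 caption.

```python
def coding_size(genome_length, genes):
    """Count bases covered by at least one gene on a circular genome.

    `genes` is a list of (start, end) positions in [0, genome_length) with
    `end` exclusive; a gene with end < start wraps around the origin, so a gene
    ending on the last base is written with end = 0. Overlapping genes are
    counted once, which is why a union is taken instead of summing lengths.
    """
    covered = [False] * genome_length
    for start, end in genes:
        if not (0 <= start < genome_length and 0 <= end < genome_length):
            raise ValueError("gene coordinates must lie in [0, genome_length)")
        pos = start
        while pos != end:  # walk the gene, wrapping past the origin if needed
            covered[pos] = True
            pos = (pos + 1) % genome_length
    return sum(covered)


def compartment_summary(genome_length, genes):
    """Total, coding and noncoding sizes plus the coding fraction."""
    coding = coding_size(genome_length, genes)
    return {
        "total": genome_length,
        "coding": coding,
        "noncoding": genome_length - coding,
        "coding_fraction": coding / genome_length,
    }


def variation(final, initial):
    """Final-over-initial ratio for each compartment size."""
    return {key: final[key] / initial[key] for key in ("total", "coding", "noncoding")}


# Hypothetical ancestor and descendant with the same coding content.
ancestor = compartment_summary(1000, [(100, 400), (350, 600)])
descendant = compartment_summary(700, [(50, 350), (300, 550)])
print(variation(descendant, ancestor))  # {'total': 0.7, 'coding': 1.0, 'noncoding': 0.4}
```

In this toy example the descendant keeps its coding size while losing 60% of its noncoding DNA, the signature the text associates with an increase in N.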

Amount of DNA a), coding size b), noncoding size c) and coding fraction d) for the different combinations of μ and N tested, after 2 million generations. For each of the 5 WTs, 10 replicas were performed for each tested set of conditions. Control conditions (
Overall, as N increases, the total amount of DNA decreases, whatever the value of μ (see Fig. 3a). A higher μ also leads to a reduction in the total genome size, whatever the value of N. However, the effects of population size and mutation rate differ when considering the coding size of the genome: specifically, the coding size increases with N but decreases with μ (see Fig. 3b). This is countered by the change in the noncoding size of the genomes (see Fig. 3c), which strongly decreases with both N and μ and drives the overall change in genome size.
The interplay between N and μ results in a surprisingly constant coding fraction across the different constant values of
However, strikingly, the total genome size as well as the coding and noncoding genome sizes vary greatly, even for similar coding densities (Fig. 3b, c, and d). For densities of
Mutational Biases Change the Equilibrium Genome Size, but not the Role of N and μ
As genome sizes are generally thought to be heavily impacted by mutational biases, we checked whether the effects of population size and mutation rate we observed are affected by either a deletion or an insertion bias. To this end, we evolved 5 WT organisms with either an insertion bias (twice as many duplications as large deletions), or a deletion bias (twice as many large deletions as duplications). The rates of all other types of mutations, as well as the sum of all mutation rates, are the same as in the previous experiments. As expected, the equilibrium genome sizes and coding proportions of these WTs are affected by the balance between large deletions and duplications, with an average genome size of
We then subjected the median (in terms of genome size) WT of each condition to changes in population size (multiplied or divided by 4) or mutation rate (multiplied by 4) for 10 replicas. As observed without bias, an increase in N reduces only the noncoding genome size, while an increase in μ reduces both the coding and noncoding genome (see Fig. 4). Notably, a decrease in N increases the noncoding genome size even in the case of a deletion bias, although an insertion bias greatly amplifies this effect. As a result, and despite the strong mutational biases, we observe that multiplying either the population size or the mutation rate by the same factor leads to a genome compaction in similar proportions (the final coding fraction being

Change in coding and noncoding genome sizes in reaction to changes in N or μ for the different mutational biases. Blue boxes (on the left of each condition) show the results with a mutational bias (left: insertion bias, right: deletion bias), and gray boxes (on the right of each condition) show the results without mutational bias. Depicted values are the ratio of the coding/noncoding sizes at the final generation over the value at generation 0.
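The bias scheme of this subsection tilts the balance between duplications and large deletions 2:1 while keeping the sum of all mutation rates fixed. A minimal sketch of one way to derive such rates, assuming (hypothetically) that without bias both types share the same per-base rate, and that the bias only redistributes their combined rate between them:

```python
def biased_rearrangement_rates(base_rate, bias="none"):
    """Return (duplication_rate, large_deletion_rate) per base pair.

    Hypothetical parameterization: without bias both types use `base_rate`;
    with a bias, their combined rate 2 * base_rate is kept constant and split
    2:1 in favour of the favoured type, so the total mutation rate is unchanged.
    """
    combined = 2.0 * base_rate
    if bias == "none":
        return base_rate, base_rate
    if bias == "insertion":  # twice as many duplications as large deletions
        return 2.0 * combined / 3.0, combined / 3.0
    if bias == "deletion":  # twice as many large deletions as duplications
        return combined / 3.0, 2.0 * combined / 3.0
    raise ValueError(f"unknown bias: {bias!r}")


print(biased_rearrangement_rates(1e-6, "insertion"))
```

With a hypothetical base rate of 1e-6 per type, the insertion bias gives about 1.33e-6 for duplications and 6.7e-7 for large deletions; all other mutation types keep their unbiased rates.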
Robustness Selection as the Explanatory Mechanism
We observed that 2 distinct processes, triggered by an increase in either population size or mutation rate, can lead to genome size reduction in our experiments. However, both have different effects on coding and noncoding sequences: while an increased μ reduces both the coding and noncoding genome sizes, increasing N reduces only the noncoding genome size.
We propose that these observations can be explained by an interplay between selection for phenotypic adaptation to the environment (hereafter called direct selection), and selection for replicative robustness (hereafter referred to as indirect selection). More specifically, we define the replicative robustness of an individual as its ability to transmit its fitness to its offspring. It hence corresponds to the proportion of offspring that did not acquire new deleterious mutations. This depends both on the number of mutations occurring at replication (which in turn depends on genome size) and on the probability for a given mutation to be deleterious (usually called mutational robustness; Wilke and Adami 2003), which depends on the interplay between the kind of mutation and the genomic architecture. In our case, WT organisms are very well adapted to their environment, thus most mutations will be deleterious if they affect the coding part of the genome. This is particularly true for chromosomal rearrangements, which can affect large genomic segments (Knibbe et al. 2007; Banse et al. 2023). Conversely, beneficial mutations are extremely rare. We therefore approximate the robustness of our organisms by measuring the proportion of their offspring that have the exact same fitness, i.e. that underwent no mutations or only neutral mutations.
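The robustness measure above (the proportion of offspring with exactly the parent's fitness) can be estimated by sampling offspring. The sketch below uses a deliberately simplified toy model that is not Aevol's mutation model: only point substitutions, a Poisson-distributed number per replication with mean μ times genome length, uniform positions, and a substitution is neutral if and only if it hits a noncoding base.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def estimate_robustness(genome_length, coding_length, mu, n_offspring, rng):
    """Fraction of offspring whose fitness equals the parent's, under the toy model.

    Toy assumptions: only point substitutions, Poisson(mu * genome_length) of
    them per replication at uniform positions, neutral if they hit a noncoding
    base and deleterious otherwise.
    """
    p_coding = coding_length / genome_length
    unchanged = 0
    for _ in range(n_offspring):
        n_mutations = poisson(mu * genome_length, rng)
        coding_hits = sum(rng.random() < p_coding for _ in range(n_mutations))
        if coding_hits == 0:  # no deleterious mutation: same fitness as the parent
            unchanged += 1
    return unchanged / n_offspring


rng = random.Random(42)
print(estimate_robustness(10_000, 5_000, 1e-4, 20_000, rng))  # close to exp(-0.5)
print(estimate_robustness(20_000, 5_000, 1e-4, 20_000, rng))  # still close to exp(-0.5)
```

Under point substitutions alone, adding noncoding DNA leaves the estimate unchanged (the expected number of coding hits is μ times the coding size in both runs). This is in line with the observation, reported below for Fig. 5c, that robustness changes are driven by chromosomal rearrangements rather than local mutations.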
A more robust individual has more chances to pass on its genomic information accurately than a less robust one, thus enabling its lineage to better maintain its fitness in the long term and to outcompete other lineages in which deleterious mutations would accumulate at a higher rate. This results in an indirect selection for replicative robustness. We recall that replicative robustness depends both on the probability for a given mutation to be neutral (hence on the fraction of noncoding sequences in the genome) and on the mean number of mutations undergone by the genome at each generation (hence on the genome-wide mutation rate). Here, while the per base mutation rate is constant within each experiment, the total amount of DNA, and hence the genome-wide mutation rate, varies and can thus be indirectly selected. By contrast, direct selection depends only on the content of the coding compartment, the size of which is likely to be positively correlated with the level of phenotypical adaptation (at least in our model). As a result, indirect selection for robustness favors shorter genomes with a lower coding fraction, while direct selection for phenotypical adaptation maintains or even increases the coding size of the genome.
The efficacy of both direct and indirect selection increases with population size, since some deleterious mutations that were quasi-neutral for a low N can become effectively counter-selected in the context of a high N, changing the balance of beneficial vs deleterious fixed mutations. To quantify this effect, we measured the robustness of the individuals at time

Fitness gain a) and Robustness (b: overall and c: by mutation type) at the end of the simulations, for different population sizes N and without mutational biases. Robustness is defined as the proportion of neutral offspring. The mutation rate is fixed to
In Aevol, genomes undergo different types of mutations that can be roughly grouped into local mutations (substitutions, InDels) and chromosomal rearrangements (duplications, deletions, inversions, translocations). The two kinds of events do not have the same effect on robustness. Figure 5c shows the change in robustness induced by the different types of events: the loss and gain in robustness are driven by chromosomal rearrangements, whereas local mutations (substitutions and InDels) do not have a significant effect on robustness.
In the case of an increased mutation rate, things are very different: a sudden increase in μ results in an immediate drop in robustness at the beginning of the experiments (Fig. 6a). As the proportion of offspring that bears mutations rises with μ, we go from an initial robustness of

Robustness, fitness, and genome architecture across generations for
Notably, robustness does not reach values as high as that observed before the increase in mutation rate and stays below
The interplay between direct and indirect selection can therefore explain both types of genome size reduction: affecting both coding and noncoding compartments (although not proportionally) when caused by an increased mutation rate, and restricted to the noncoding compartment when caused by an increased population size.
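Why a sudden rise in μ causes an immediate drop in robustness, and why shrinking the genome restores it, can be seen with a back-of-the-envelope calculation. Assuming (as a simplification) that deleterious mutations per replication are Poisson with mean μ·L·(1 − ν), where L is genome length and ν the fraction of neutral mutations, robustness is R = exp(−μ·L·(1 − ν)); multiplying μ by k with an unchanged genome turns R into R^k. All numbers below are hypothetical.

```python
import math


def robustness(mu, genome_length, neutral_fraction):
    """Probability that an offspring carries no non-neutral mutation.

    Toy assumption: the number of mutations per replication is Poisson with
    mean mu * genome_length, each independently neutral with probability
    neutral_fraction, so deleterious ones are Poisson too (thinning).
    """
    deleterious_per_replication = mu * genome_length * (1.0 - neutral_fraction)
    return math.exp(-deleterious_per_replication)


# Hypothetical values giving 0.1 deleterious mutations per replication.
r_before = robustness(1e-5, 50_000, 0.8)  # exp(-0.1), about 0.905
r_after = robustness(4e-5, 50_000, 0.8)   # exp(-0.4), about 0.670
assert math.isclose(r_after, r_before ** 4)  # a k-fold rise in mu gives R**k
# With the neutral fraction unchanged, the genome must shrink 4-fold to recover R.
assert math.isclose(robustness(4e-5, 12_500, 0.8), r_before)
```

This toy ignores the cost of losing coding sequences, which is one reason it cannot capture the observation that robustness stays below its pre-shift value in the simulations.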
Discussion
We found that, in our experiments, genome size reduction can be caused by an increase in population size, mutation rate, or both, even in case of mutational biases. These 2 factors can nevertheless be distinguished, as they have different effects on the coding and noncoding sequences of the genome. Their combination in various proportions can create a broad range of alternative patterns of genome size and coding density. In particular, by playing independently on mutation rate and population size, our model can reproduce the 2 extreme but different cases of genome size reduction that are seen in some endosymbionts and cyanobacteria. As an example, Prochlorococcus marinus is known to have lost some parts of both its coding and noncoding genome, although in different proportions such that its coding density has increased (Dufresne et al. 2005; Batut et al. 2014; Giovannoni et al. 2014). In our model, this would correspond to a population undergoing an increase in population size and a slight increase in mutation rate, which is coherent with the scientific literature on Prochlorococcus marinus (Hu and Blanchard 2008; Marais et al. 2008), although the large effective population size of this species has been recently debated (Chen et al. 2022; Filatov and Kirkpatrick 2024). On the other hand, Buchnera aphidicola has conserved its coding proportion but greatly reduced its total genome size (Moran and Mira 2001), which could be explained in our model by an increase in mutation rate and a decrease in population size, in similar proportions. This suggests that indirect selection for shorter genomes through robustness selection could be a key factor in genome evolution (Wilke et al. 2001; Gabzi et al. 2022), and especially in the evolution of genome size and structure.
Our observations confirm those made by Lynch and Conery (2003), namely that an increased genetic drift, here associated with a decreased population size, increases the genome size. Our results also point toward an equilibrium genome size: a sufficient number of genes makes it possible to fine-tune the phenotype to the environment, but the genome also has to be short enough to prevent the degeneration caused by an excess of chromosomal rearrangements (Knibbe et al. 2007; LaBar and Adami 2020). Increasing the mutation rate or the population size displaces this equilibrium toward shorter genomes, either through a more efficient genome purification of noncoding sequences (when increasing N) or a loss of both coding and noncoding sequences to recover a minimal level of robustness (when increasing μ). Of course, mutational biases (regarding the balance between insertions and duplications versus deletions) also play an important role in determining the equilibrium genome size. In particular, deletion biases have been suggested as one main reason explaining why bacterial genomes remain small (Mira et al. 2001). However, we show here that, because of the indirect selection for robustness, a deletion bias is not needed to prevent a runaway inflation in the size of genomes. Instead, selection for robustness provides a counteracting force that increases with genome size, eventually offsetting any underlying bias in favor of insertions or duplications. Importantly, this indirect selection was not postulated in the model but emerged spontaneously in the simulations.
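The claim that a larger N purges noncoding sequences more efficiently rests on the standard population-genetic result that slightly deleterious mutations behave as neutral when N·|s| is small. A minimal illustration uses Kimura's diffusion approximation for a single new mutant in a haploid, panmictic Wright–Fisher population. This is a textbook formula, not Aevol's spatialized model, whose local selection makes the effective population size differ from the census size; the selection coefficient and population sizes below are hypothetical.

```python
import math


def fixation_probability(s, n):
    """Kimura's diffusion approximation for one new mutant with selection
    coefficient s in a haploid, panmictic Wright-Fisher population of size n.

    u = (1 - exp(-2s)) / (1 - exp(-2ns)); expm1 keeps precision for small s.
    Very large n * |s| with s < 0 can overflow exp; this sketch does not guard it.
    """
    if s == 0:
        return 1.0 / n  # neutral case: the limit of the formula as s -> 0
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * n * s)


s = -1e-3  # hypothetical slightly deleterious effect
for n in (250, 1_000, 4_000):  # hypothetical sizes differing by factors of 4
    relative = fixation_probability(s, n) * n  # fixation relative to a neutral mutation
    print(n, round(relative, 4))
```

Relative to a neutral mutation, the fixation probability of this mutation falls from roughly 0.77 at the smallest size to roughly 0.003 at the largest: the same mutation goes from quasi-neutral to effectively counter-selected.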
We propose an evolutionary mechanism consisting of a trade-off between direct selection for phenotypical adaptation and indirect selection for replicative robustness. In this respect, mutations appear to be a weak selective force, as pointed out by Lynch and Walsh (2007). However, the emphasis was previously on the mutational targets contributed by genomic features, such as introns. Here, we emphasize another aspect, which seems to have been overlooked thus far: any nonfunctional DNA represents an additional target for initiating macroscopic mutational events that can eventually impact the coding genome. This mechanism requires no additional hypotheses and is very general. It should therefore be pervasive in the living world.
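The argument that nonfunctional DNA is an additional target for rearrangements reaching the coding genome can be made concrete with a toy geometry. The sketch assumes, hypothetically, a circular genome of equal-length genes separated by equal-length intergenic segments, deletions starting at a uniformly drawn base with a length drawn uniformly between 1 and the genome length, and each base initiating deletions at a rate mu_del. A deletion is neutral only if it stays inside a single intergenic segment. Only deletions are modeled; inversions and translocations would need their own geometry.

```python
def neutral_deletion_probability(n_genes, gene_length, intergenic_length):
    """Probability that a random deletion removes only noncoding bases (toy geometry)."""
    genome_length = n_genes * (gene_length + intergenic_length)
    # Starting at offset i of a segment of length s leaves s - i admissible
    # lengths, so each segment contributes s * (s + 1) / 2 neutral (start, length) pairs.
    neutral_pairs = n_genes * intergenic_length * (intergenic_length + 1) // 2
    return neutral_pairs / genome_length ** 2


def deleterious_deletions_per_replication(mu_del, n_genes, gene_length, intergenic_length):
    """Expected number of non-neutral deletions when each base initiates one at rate mu_del."""
    genome_length = n_genes * (gene_length + intergenic_length)
    p_neutral = neutral_deletion_probability(n_genes, gene_length, intergenic_length)
    return mu_del * genome_length * (1.0 - p_neutral)


# Same hypothetical coding content (50 genes of 1,000 bp), increasing noncoding DNA.
for intergenic in (0, 100, 1_000):
    print(intergenic, deleterious_deletions_per_replication(1e-6, 50, 1_000, intergenic))
```

The three values are roughly 0.05, 0.055 and 0.1: doubling the genome with noncoding DNA nearly doubles the number of deletions that hit genes, because almost every added breakpoint can start a deletion that runs into coding sequence.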
Sung et al. (2012) have observed that, in real populations, the mutation rate scales negatively with both the population size and the amount of coding DNA. They propose that this is a consequence of selection for lower per-base mutation rates induced by the amount of coding DNA. Here, thanks to the use of fixed mutation rates, we have shown that the mutation rate can drive selection on the amount of DNA, including both the coding and noncoding compartments. This points towards the per-genome mutation rate being the relevant value, which can evolve due to changes in genome size and per-base mutation rate. This calls for further experiments in which both the genome size and the per-base mutation rate would be allowed to evolve, to study their relative speed of adaptation and their contribution to the variation of the per-genome mutation rate.
Although our main focus was on the final equilibrium reached by the populations after a change in N or μ, our observations are broader than the end equilibrium as we can observe the temporal dynamics (Fig. 6 and supplementary S3–S15, Supplementary Material online). In particular, we observe that, when the mutation rate increases strongly, the fitness immediately drops drastically (Fig. 6b). This can be related to an error-threshold crossing mechanism (Eigen 1971; Takeuchi and Hogeweg 2007; de Boer and Hogeweg 2010): individuals can no longer pass on to their descendants all the information contained in their genome. They therefore lose fitness, and the lineage that survives in the long term is the one where genomes greatly reduced in size in the early phase of the experiment, thus reducing the number of mutations per replication event and finally reaching a point at which the information can be passed on reliably. The detailed aspects of these temporal dynamics could be the focus of future work. Indeed, it has been shown that genome reduction in endosymbionts occurred very quickly after the endosymbiosis became effective (Moran 2003; Wernegreen 2015), which is also what we observed in our data (Fig. 6).
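For intuition on the error-threshold argument, Eigen's classical quasispecies result gives the longest master sequence that can be maintained for a per-base error rate μ and a selective superiority σ of the master sequence: (1 − μ)^L · σ > 1, i.e. L < ln σ / −ln(1 − μ), approximately ln σ / μ. This formula assumes a single fitness peak and asexual replication, which Aevol does not impose, so it is only a back-of-the-envelope illustration; the superiority value below is hypothetical.

```python
import math


def max_sequence_length(mu, superiority):
    """Eigen's error threshold for a single-peak landscape.

    The master sequence persists while (1 - mu) ** L * superiority > 1,
    i.e. L < ln(superiority) / -ln(1 - mu), roughly ln(superiority) / mu.
    """
    return math.log(superiority) / -math.log1p(-mu)


# Hypothetical superiority of 10; a 4-fold rise in mu cuts the maximum length about 4-fold.
for mu in (1e-4, 4e-4):
    print(mu, round(max_sequence_length(mu, 10.0)))
```

Read in this way, a genome that was below the threshold before a 4-fold increase in μ can find itself above it afterwards, and only lineages whose genomes shrink enough can again transmit their information reliably.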
In our experiments,
In order to allow for a fair quantitative comparison between the effect of mutation rates and population size, the amplitudes of the variations applied to the 2 parameters were similar in our experiments. In biological species, the range of variation in mutation rates is much narrower than the range of variation in effective population size, as shown by Lynch et al. (2023). Hence, given our explanatory mechanism, the observed range of variations in genome size is likely to be driven mainly by changes in N. However, our results show that μ and N do not play an identical role. Indeed, variations in N change solely the noncoding size of the genome, while the variation in μ impacts both the coding and the noncoding sizes. Therefore, even a small variation in μ compared with a variation in N could be significant in determining genome architecture trajectories. This highlights that the correlation of N and genome size is not enough to understand genome evolution and that μ, as well as any underlying mutational bias, also needs to be taken into account as a determining factor.
In this paper, we specifically focused on the effect of the variation in population size and mutation rates on genome size. Of course, this does not imply that the mechanism we identified is the only one, and various additional ones can also impact genome size evolution. For instance, there can be a limitation in available resources for nucleotide production, constraining the total genome size (Ngugi et al. 2023). In the case of endosymbiosis, exchanges can also happen between the host and the endosymbiont genomes, hence contributing to its streamlining (Bock 2017). Recombination could also further complicate the picture by adding a new type of mutation with unexpected interactions. More importantly, mobile genetic elements, and transposable elements (TE) in particular, are often proposed as one of the main drivers of genome expansion (Marino et al. 2024), especially in populations with small effective population sizes that cannot eliminate them efficiently due to the low selective pressure (Lynch and Conery 2003). TE invasions have been shown to dramatically increase genome size in eukaryotes (Kidwell 2002; Oggenfuss et al. 2021), although van Dijk et al. (2022) have demonstrated that they can also lead to streamlining in prokaryotes because genome reduction prevents their invasion. We did not test their impact here, but our results show that the effect of the variations in population size and mutation rate is conserved, even in case of a strong insertion bias (Fig. 4 and supplementary figure S2, Supplementary Material online). This enables us to conjecture that mobile elements would change the equilibrium genome size (as observed in our simulations, Fig. 4 and supplementary figure S2, Supplementary Material online), and probably drastically increase the variance of observed sizes, but that they are unlikely to change the response of genome size evolution to changes in μ or N. This remains however to be tested.
To conclude, our experiments show that genome size reduction can occur in 2 very different conditions for bacteria. On the one hand, a very large population size promotes more efficient selection in the face of random drift, which in turn enhances the robustness of genomes by decreasing their noncoding load. This corresponds to streamlining and leads to genomes with a high coding density. On the other hand, a higher mutation rate results in an instantaneous decrease in the robustness of genomes in the entire population, making the selection for robustness transiently stronger than the selection for phenotypical adaptation. The genome then shrinks rapidly, with both coding and noncoding sequences being discarded until a new robustness equilibrium is reached, all this at a substantial initial cost in phenotypical adaptation. This corresponds to a decaying genome and is compatible with empirical observations in endosymbiotic bacteria (Moran 2003). Strikingly, this remains true even in the presence of a mutational bias. Although the model that we propose here, of a balance between selection for robustness and selection for phenotypical adaptation, can explain the tendencies we observe and the final genome structures in our populations, further work is needed to understand the transient regimes and the mechanisms behind the constant coding fraction along the
Materials and Methods
The Aevol Framework
Aevol (Knibbe et al. 2007; Banse et al. 2023) is an individual-based forward-in-time simulation software that has been specifically designed to study the evolution of genome structure. It emulates a population that is composed of a fixed number of individuals on a grid (Fig. 7a). Each individual owns a double-stranded circular genomic sequence, composed of 0s and 1s. To compute the phenotype, sequences on the genome are recognized as promoters and mark the start of transcription, which stops when a sequence able to form a hairpin structure is encountered. On RNAs, Shine-Dalgarno-like sequences followed by a START codon mark the beginning of translation. The RNA sequence is then read 3 bases at a time until a STOP codon is encountered on the same reading frame. An artificial genetic code allows for each sequence of codons to be converted into a mathematical function, and the sum of all functions encoded on the genome defines the phenotype of the individual (Fig. 7b). The distance between this function and a target function, which represents the ideal phenotype in the specified environment, gives the fitness of the individual with a scaling factor k that tunes the strength of the selection. A detailed explanation can be found on the dedicated website www.aevol.fr.
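The last steps of the decoding described above (summing gene-encoded functions into a phenotype, measuring its distance to a target function, and scaling that distance by k) can be sketched as follows. The details are assumptions modeled on published descriptions of Aevol, not its source code: each gene is taken to encode a symmetric triangle on [0, 1] given by a mean, a half-width and a height, the phenotype is the sum clipped to [0, 1], the distance is the area between phenotype and target, and fitness is exp(−k · gap). The decoding of binary sequences into triangle parameters is omitted.

```python
import math


def triangle(x, mean, half_width, height):
    """Value at x of a symmetric triangle centred on mean (zero outside mean +/- half_width)."""
    distance = abs(x - mean)
    if distance >= half_width:  # also covers half_width == 0
        return 0.0
    return height * (1.0 - distance / half_width)


def phenotype(x, genes):
    """Sum of all gene triangles at x, clipped to [0, 1]."""
    total = sum(triangle(x, m, w, h) for m, w, h in genes)
    return min(1.0, max(0.0, total))


def gap(genes, target, n_points=1000):
    """Midpoint-rule approximation of the area between phenotype and target on [0, 1]."""
    step = 1.0 / n_points
    xs = [(i + 0.5) * step for i in range(n_points)]
    return sum(abs(phenotype(x, genes) - target(x)) for x in xs) * step


def fitness(genes, target, k):
    """Assumed fitness: exp(-k * gap); a larger k makes selection stronger."""
    return math.exp(-k * gap(genes, target))


def target(x):
    """Hypothetical environment: a single triangular target centred on 0.5."""
    return triangle(x, 0.5, 0.2, 0.8)


print(fitness([], target, k=25.0))                 # no genes: exp(-25 * 0.16), about 0.018
print(fitness([(0.5, 0.2, 0.8)], target, k=25.0))  # perfect match: 1.0
```

Because many gene sets can produce similar phenotypes, this kind of map lets the same fitness be reached with different numbers, positions and lengths of genes, which is what leaves genome architecture free to evolve.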

The Aevol model. a) Individuals are distributed on a grid. At each generation, the whole population replicates according to a Wright–Fisher replication model, in which selection operates locally within a
All individuals are replaced at each generation following a spatialized Wright–Fisher model: the number of descendants of each individual depends on its fitness relative to that of its neighbors. At each reproduction event, point mutations or genomic rearrangements can occur (Fig. 7c). They create diversity in the genomes, hence in the phenotypes, and allow genome size and structure to change. These changes may or may not be neutral, depending on whether mutations alter coding and/or noncoding sequences. Mutations have no predefined effect on the fitness of the offspring: their effect only emerges when the offspring's genome is decoded. The model therefore does not impose an a priori genome structure, which allows us to study the evolution of genome architecture under various experimental conditions.
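One generation of a spatialized Wright–Fisher model can be sketched as follows. The sketch assumes a toroidal grid in which every cell is refilled by the offspring of one individual from its 3 × 3 neighborhood (itself included), chosen with probability proportional to fitness; this is one plausible reading of local selection, not necessarily Aevol's exact rule. Mutations are omitted, so the function only returns the parent chosen for each cell.

```python
import random


def next_generation(fitness_grid, rng):
    """Return, for each cell, the (row, col) of the parent chosen to fill it.

    Assumptions: toroidal grid, 3x3 neighbourhood including the cell itself,
    parent drawn with probability proportional to fitness, and all individuals
    replaced simultaneously (non-overlapping generations).
    """
    n_rows, n_cols = len(fitness_grid), len(fitness_grid[0])
    parents = []
    for r in range(n_rows):
        row_parents = []
        for c in range(n_cols):
            # Moore neighbourhood with wrap-around at the grid edges.
            neighbours = [((r + dr) % n_rows, (c + dc) % n_cols)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            weights = [fitness_grid[i][j] for (i, j) in neighbours]
            # Fitness-proportional draw among the neighbours (weights must not all be 0).
            row_parents.append(rng.choices(neighbours, weights=weights, k=1)[0])
        parents.append(row_parents)
    return parents
```

Because each draw is local, a fitter individual can only spread to adjacent cells in one generation, which is what distinguishes this scheme from a panmictic Wright–Fisher model.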
The mutation rate (in
Experimental Design
Wild Types
In order to observe changes in genome architecture induced by changes in population size and/or mutation rate, we begin our experiments from pre-evolved organisms, which we call wild types (WTs). Having already evolved for millions of generations under constant conditions, WTs have a very stable genome structure and are well adapted to their environment (although their fitness never stops increasing). Five different WTs were used for our experiments, all having evolved for 10 million generations at the basal conditions of
Experimental Conditions
We tested a range of increases and decreases in population size, increases in mutation rate, and several combinations of both. All conditions are listed in Table 2 below. For each condition, 10 replicates of each of the 5 WTs are run. Initial populations are always clonal: all individuals are identical to the specific WT used for the run.
Population size | Mutation rate (per base pair, per mutation type)
---|---
64 ( |
256 ( |
1024 ( |
529 ( |
256 |
64 ( |
The control condition is in bold. Note that, because the simulations take place on a square grid, population sizes could not be exactly halved or doubled.
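The square-grid constraint can be made concrete with a small helper: presumably, doubling 256 individuals would require 512, which is not a perfect square, and the closest square grid is 23 × 23 = 529, the value listed in the table. The helper below is a hypothetical illustration of that arithmetic, not part of Aevol.

```python
import math


def nearest_square_grid(target_size):
    """Return (side, side**2) for the square grid whose size is closest to target_size.

    Ties are broken toward the smaller grid.
    """
    lower = math.isqrt(target_size)  # largest side with side**2 <= target_size
    upper = lower + 1
    if target_size - lower ** 2 <= upper ** 2 - target_size:
        return lower, lower ** 2
    return upper, upper ** 2
```

For example, nearest_square_grid(512) returns (23, 529), since 529 is 17 individuals away from 512 whereas 22 × 22 = 484 is 28 away; sizes that are already perfect squares (64, 256, 1024) are returned unchanged.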
Data Analyses
To analyze the simulations, we reconstruct the ancestral lineages of the final populations. To this end, simulations are run for
On this lineage, we retrieve the fitness, coding, and noncoding genome size at each generation, as well as the replicative robustness every
To compare experimental conditions, we retrieve the individual at generation
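Reconstructing ancestral lineages amounts to following parent pointers backward from the final population. The sketch below uses a hypothetical in-memory layout (one list of parent indices per generation), not Aevol's own output format, and shows two steps such an analysis typically needs: finding the generation at which all final individuals coalesce into a single ancestor, and tracing one individual's line of descent back to generation 0. Up to the coalescence point, that line of descent is shared by the whole final population.

```python
def trace_lineage(parent_of, final_index):
    """Follow parent pointers back from one individual of the last generation.

    parent_of[g][i] is the index, in generation g - 1, of the parent of
    individual i of generation g (parent_of[0] is unused). Returns the
    ancestor's index in every generation, from generation 0 to the last one.
    """
    lineage = [final_index]
    for g in range(len(parent_of) - 1, 0, -1):
        lineage.append(parent_of[g][lineage[-1]])
    lineage.reverse()
    return lineage


def coalescence_generation(parent_of):
    """Latest generation in which the whole final population has a single ancestor.

    Returns None if the final population has not coalesced by generation 0.
    """
    g = len(parent_of) - 1
    ancestors = set(range(len(parent_of[g])))
    while len(ancestors) > 1 and g > 0:
        # Map the current set of ancestors one generation further back.
        ancestors = {parent_of[g][i] for i in ancestors}
        g -= 1
    return g if len(ancestors) == 1 else None
```

With a toy history of 4 individuals over 4 generations, parent_of = [[], [0, 0, 1, 2], [1, 1, 0, 3], [0, 1, 1, 0]], all final individuals descend from individual 1 of generation 1, and the lineage of final individual 3 is [0, 1, 0, 3].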
Effect of Mutational Biases
As mutational biases (toward deletions in bacteria and toward insertions in eukaryotes) are often assumed to be very important for genome size evolution (Petrov 2002), we also tested how mutational biases affect our results. We tested 4 mutational biases: twice as many large deletions as duplications, twice as many small deletions as small insertions, twice as many duplications as large deletions, and twice as many small insertions as small deletions. In all cases, the sum of all mutation rates is conserved, so that the overall mutational pressure is the same as in the previous experiments.
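One simple way to impose a 2:1 bias while conserving the total mutation rate is to redistribute the combined rate of the two paired mutation types. The sketch below assumes that all mutation types start at the same per-type rate (as suggested by the "per mutation type" units of Table 2) and that only the paired types are rebalanced; the rate value and the type names are hypothetical placeholders, not the values used in the experiments.

```python
# Hypothetical per-base-pair, per-type rates; 1e-6 is a placeholder value.
BASE_RATES = {
    t: 1e-6
    for t in ("point", "small_insertion", "small_deletion",
              "duplication", "large_deletion", "translocation", "inversion")
}


def apply_bias(rates, favoured, other, ratio=2.0):
    """Return a copy of rates where rates[favoured] == ratio * rates[other].

    The combined rate of the two types, and therefore the total over all
    types, is left unchanged; the remaining types are untouched.
    """
    pair_total = rates[favoured] + rates[other]
    biased = dict(rates)
    biased[other] = pair_total / (1.0 + ratio)
    biased[favoured] = ratio * biased[other]
    return biased
```

With a per-type rate of 1e-6, apply_bias(BASE_RATES, "large_deletion", "duplication") sets large deletions to about 1.33e-6 and duplications to about 6.67e-7 while the total stays at 7e-6; swapping the two type arguments gives the opposite bias, and the same call on the small insertion/deletion pair covers the other 2 conditions.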
For each mutational condition, 5 WT evolved for
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Acknowledgments
The authors would like to thank Laurent Duret and David P. Parsons for fruitful comments on the manuscript.
Funding
The authors acknowledge the support of the French Agence Nationale de la Recherche (ANR), under grant no. ANR-20-CE02-0008 (NeGA project). J.L., G.B., and N.L. would like to thank the Rhône-Alpes Institute for Complex Systems (IXXI) for funding. All authors thank the Grid’5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr), for computational support.
Data Availability
The code of Aevol is available on GitLab at https://gitlab.inria.fr/aevol/aevol. WT sequences to reproduce the experiments, as well as the full lineage data and robustness data, are available on Zenodo: https://doi.org/10.5281/zenodo.10669479.
References
Author notes
Conflict of interest The authors declare no competing interests.