Matthew J Lollar, Timothy J Biewer-Heisler, Clarice E Danen, John E Pool, Hybrid breakdown in male reproduction between recently diverged Drosophila melanogaster populations has a complex and variable genetic architecture, Evolution, Volume 77, Issue 7, July 2023, Pages 1550–1563, https://doi.org/10.1093/evolut/qpad060
Abstract
Secondary contact between formerly isolated populations may result in hybrid breakdown, in which untested allelic combinations in hybrids are maladaptive and limit genetic exchange. Studying early-stage reproductive isolation may yield key insights into the genetic architectures and evolutionary forces underlying the first steps toward speciation. Here, we leverage the recent worldwide expansion of Drosophila melanogaster to test for hybrid breakdown between populations that diverged within the last 13,000 years. We found clear evidence for hybrid breakdown in male reproduction, but not female reproduction or viability, supporting the prediction that hybrid breakdown affects the heterogametic sex first. The frequency of non-reproducing F2 males varied among different crosses involving the same southern African and European populations, as did the qualitative effect of cross direction, implying a genetically variable basis of hybrid breakdown and a role for uniparentally inherited factors. The levels of breakdown observed in F2 males were not recapitulated in backcrossed individuals, consistent with the existence of incompatibilities with at least three partners. Thus, some of the very first steps toward reproductive isolation could involve incompatibilities with complex and variable genetic architectures. Collectively, our findings emphasize this system’s potential for future studies on the genetic and organismal basis of early-stage reproductive isolation.
Introduction
Populations with limited genetic exchange experience distinct population genetic forces that may drive allele frequency change between groups through successive generations. As a consequence of this genetic divergence, reproductively isolated populations descending from a once freely interbreeding common ancestor may be partially or wholly reproductively isolated upon secondary contact (Mayr, 1942). Alleles that are at appreciable frequency in one but not both populations are tested against a novel genetic background upon hybridization. Neutral or beneficial alleles in one population may prove deleterious in a hybrid genome, and consequent reductions to fitness may prevent reproduction or survivability in hybrids (Coyne & Orr, 2004).
These interactions are “unseen” by selection until the point of hybridization, and thus populations need not ever pass through unfit allelic intermediates for alleles involved in incompatibilities to evolve between them. Rather, they may evolve simply as a pleiotropic byproduct of independent evolution (Presgraves, 2010). Collectively, these genetic incompatibilities were described by and attributed to Bateson (1909), Dobzhansky (1934), and Muller (1942) (BDM-incompatibilities or BDMIs). BDMIs are relevant to many species definitions, particularly those which generally define species boundaries by the presence of strong or complete reproductive isolation between groups (Coyne & Orr, 2004).
The frequent existence of BDMIs between species has been well documented by field and lab studies (Blackman, 2016), in systems that are partially isolated as well as those that are completely isolated. Analyses of studies characterizing reproductive isolation have revealed some general patterns. The primary and most consistent pattern to emerge within taxa such as Drosophila is that the strength of reproductive isolation is positively correlated with genetic distance (Coyne & Orr, 1989; Orr, 2005). As all genes act within the epistatic context of their genetic background, it is perhaps expected that the number of possible BDMIs between genomes increases with the number of genetic differences between them (Orr, 1995; Satokangas et al., 2020). Theoretically, incompatibilities are expected to accumulate at a rate proportional to the square of the number of differences between populations (Matute et al., 2010; Orr & Turelli, 2001).
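The quadratic expectation can be illustrated with a simple count of locus pairs. The sketch below is not taken from the cited models' code; it only assumes that every pair of diverged sites is a potential two-locus incompatibility and that each pair is incompatible with some small probability (the parameter p_pair is a hypothetical placeholder).

```python
from math import comb


def potential_pairwise_incompatibilities(k: int) -> int:
    """Number of distinct locus pairs among k diverged sites."""
    return comb(k, 2)


def expected_incompatibilities(k: int, p_pair: float) -> float:
    """Expected incompatibility count if each pair is incompatible
    with probability p_pair (a hypothetical, illustrative parameter)."""
    return p_pair * potential_pairwise_incompatibilities(k)


# Doubling the number of differences roughly quadruples the pair count.
for k in (10, 20, 40):
    print(k, potential_pairwise_incompatibilities(k))
```

Because the pair count is k(k - 1)/2, it grows approximately with the square of k, which is the sense in which incompatibilities are expected to "snowball" with divergence.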
While between-species models have been successful in identifying genomic regions and sometimes genes underlying hybrid dysfunction (Blackman, 2016), a remaining challenge in these studies is the determination of which incompatibilities have historical relevance to the speciation process. Inherent to the study of distantly related species is the assumption that detectable incompatibilities in the present were involved in the generation of initial barriers to gene flow. The accumulation of incompatibilities over time may strengthen initial barriers to hybridization or even “complete” the speciation process (Dobzhansky, 1937), but there is no certainty that currently known BDMIs were involved in the early stages of isolation (Coyne & Orr, 2004).
Many of the outstanding questions remaining in the field of speciation are difficult to answer in between-species studies. The determination of which evolutionary forces govern the fixation or rise in frequency of incompatibility variants may not always be possible for well-isolated species, as any population genetic signatures of positive selection may no longer be detectable. In addition, if there was a polymorphic basis to early-acting incompatibilities between diverging taxa (Cutter, 2012), this history may be obscured when incompatible variants eventually become fixed. Although polymorphisms contributing to incompatibilities may often be present in the early stages of reproductive isolation (Cutter, 2012; Laturney & Moehring, 2012; Reed et al., 2008; Sweigart et al., 2007; Zuellig & Sweigart, 2018), complete isolation inevitably involves the fixation of alleles between populations (Larson et al., 2018), and questions regarding the establishment of such alleles remain of primary interest. Furthermore, the genetic tools available to pursue the basis and mechanism of incompatibilities between completely isolated taxa are inherently limited, especially if such incompatibilities between species result in lethality of hybrids (Presgraves, 2007). Researchers have used incompletely isolated species models to overcome limitations to studies in distantly related species, and to determine whether the speciation process differs at distinct evolutionary stages (Kulmuni et al., 2020). Taking this approach to the extreme, if studies of very recently diverged populations within the same species can identify incompatibilities that are present at some polymorphic frequency, new insights into the biology of emergent reproductive isolation may be gained.
Although populations that have experienced a recent divergence may be separated genetically only by a small number of alleles, evidence for BDMI-like incompatibilities between incompletely isolated populations within species has been documented (Coughlan & Matute, 2020). In such cases alleles involved in an incompatibility may or may not be fixed within a population (Turelli & Orr, 2000), and if they are recessive, isolation will typically only manifest in second generation or later hybrids of sexually reproducing diploids. Reproductive isolation of this kind is often termed “hybrid breakdown” and is a specific category of intrinsic postzygotic reproductive isolation (Oka et al., 2003).
Investigation of hybrid breakdown and later generation BDMIs has received some attention in recent speciation literature, and is often pursued between closely related species (e.g., Breeuwer & Werren, 1995; Matsubara, 2020; Morgan et al., 2020; Oka et al., 2003; Renaut & Bernatchez, 2011). Examples of within-species models of reproductive isolation are more limited (Koski et al., 2022; Pritchard & Edmands, 2013; Stelkens et al., 2015; White et al., 2012), yet in these few studies the existence of moderately strong isolating barriers among populations has been evidenced. Still, there remain gaps in our understanding of the biology underlying the earliest incompatibilities segregating within species, the evolutionary forces that act upon causative agents, and whether hybrid breakdown contributes significantly to reproductive isolation in its nascent stages. Within-species allopatric populations are particularly fertile territory for the investigation of such questions during early-stage reproductive isolation, and one potential candidate for such studies is the vinegar fly Drosophila melanogaster, a species with many recently isolated populations and the most intensively studied species in a genus with broad importance in the speciation literature (Coyne & Orr, 1989).
D. melanogaster is thought to have originated in sub-Saharan Africa, where at some unknown time it formed a commensal relationship with humans (Lachaise & Silvain, 2004; Lachaise et al., 1988). It is thought that expansion out of a southern-central African ancestral range (Pool et al., 2012) first occurred approximately 13,000 years ago (Sprengelmeyer et al., 2020). While this period of population divergence encompasses nearly 200,000 fly generations, it represents only a brief interval of population genetic time (less than 0.1Ne generations, based on the estimated long-term ancestral population size; Sprengelmeyer et al., 2020). The species then expanded across much of sub-Saharan Africa, and soon thereafter crossed what is now the Sahara Desert, which involved a moderately strong population bottleneck leading to reduced genetic diversity outside of sub-Saharan Africa (Begun & Aquadro, 1993; Sprengelmeyer et al., 2020). However, the species may have only entered Europe roughly 1,800 years ago (Sprengelmeyer et al., 2020; Figure 1D), and it was not observed in northern Europe until the 1800s (Keller, 2007). Additionally, phenotypic differences have evolved that correlate with geography, providing evidence of local adaptation between populations within and beyond Africa (e.g., David & Capy, 1988; Fabian et al., 2015; Lack et al., 2016a; Pool & Aquadro, 2007). Populations of D. melanogaster are now found worldwide, providing a rich pool of genetic and phenotypic diversity that can be readily studied in the laboratory (e.g., Lack et al., 2016b).

Figure 1. Failure in male reproductive success is the predominant mode of hybrid breakdown among F2 hybrids between various Drosophila melanogaster populations. Left: F2 combined cross values and significance for average percent inviability (A), and for female (B) and male (C) reproductive failure rates. Line color denotes significance of differences in inviability/reproductive failure in between-population crosses relative to both within-population cross groups, where pink indicates p < .05, yellow indicates .05 < p < .1, and blue indicates p > .1. Within-population cross averages (gray) are listed adjacent to the population label. P-values were generated from bootstrapping (see Materials and methods section) and are listed underneath each between-population percentage. Individual cross averages are supplied in Supplementary Tables S1–S3. (D) Map of D. melanogaster population expansion and estimated divergence times from Sprengelmeyer et al. (2020). Divergence is given in thousands of years (kya) and each value represents the initial expansion estimate between nodes. Expansion out of ancestral ranges in southern-central Africa occurred roughly 12.8 kya. Populations rapidly colonized west Africa (12.6 kya), and later other regions of central Africa including our Ethiopian sample (9.5 kya). An additional estimated split (2.7 kya) between low- and high-altitude Ethiopian populations, the latter being used in the present study, is not depicted. Trans-Saharan migration occurred roughly 12.4 kya, during which populations experienced a moderately strong population bottleneck. Migration from the Middle East into cooler European regions occurred in the more recent past (1.8 kya). Coloration based on average annual temperature (scale bottom right) highlights the dramatic environmental differences between regions.
Evidence of partial reproductive isolation between populations within the D. melanogaster species has previously been suggested, particularly between tropical African and temperate non-African populations. Prezygotic barriers have been reported, evidenced primarily by the unidirectional mating preference of Zimbabwe-type females, who disfavor males of cosmopolitan origin in favor of other “Z-type” males (Grillet et al., 2012; Hollocher & Wu, 1996; Wu et al., 1995). A separate instance of unidirectional mating preference has also been reported between populations of western/central African origin vs. all others (Bontonou et al., 2013). Concordantly, a degree of sexual isolation has been reported between northern U.S. populations (with predominantly European ancestry) and those from the Caribbean populations (with greater proportions of African ancestry; Yukilevich & True, 2008a, 2008b). Postzygotic isolation after the F1 generation has also been reported between those same populations (Kao et al., 2015; Lachance & True, 2010). When X chromosomes from North American strains of high European ancestry were introgressed into autosomal backgrounds of Caribbean strains with greater African ancestry, lethality was observed in over half of all crosses, and 57% of the viable crosses showed male and/or female sterility (Lachance & True, 2010). The reciprocal introgression produced F2 lethality only in 18% of crosses, and sterility in around 30% of the viable crosses (Lachance & True, 2010). Reduced egg-to-adult viability has also been observed in naturally admixed populations with more even levels of African and European ancestry (Kao et al., 2015). Thus, there is good evidence that incompatibilities may be exposed in crosses between pure African and European strains.
Subsequent genomic inference of pervasive epistasis in admixed D. melanogaster (Corbett-Detig et al., 2013; Pool, 2015) further bolsters the case for pursuing the D. melanogaster system as a model for early-stage reproductive isolation. Pool (2015) assessed European/African ancestry across genomes of the D. melanogaster Genetic Reference Panel (Mackay et al., 2012) and found strong evidence for natural selection mediating the outcome of ancestry within strains. Particularly, there was a genome-wide signal for ancestry disequilibrium between unlinked loci in these highly inbred lines (Pool, 2015). Ancestry disequilibrium represents the deficiency of Africa–Europe allele combinations between locus pairs that is not predicted under neutrality when accounting for the number of admixed generations. These results suggest that natural selection, potentially involving negative epistasis, mediated the outcomes of natural admixture between European and African alleles and laboratory inbreeding. Additionally, a largely distinct set of epistatic incompatibilities was inferred to segregate between natural populations of D. melanogaster (Corbett-Detig et al., 2013), based on the genomes of laboratory-admixed Drosophila Synthetic Resource Panel strains (King et al., 2012). Taken together, these studies highlight the possibility of using genomic variation and laboratory experiments to detect signatures of isolation within the species.
Here, we sought to investigate the presence of hybrid breakdown among several natural populations of D. melanogaster. We find little evidence of breakdown in viability or hybrid female reproduction, but provide evidence for hybrid male reproductive failure in the F2 generation, most evident in crosses between France and Zambia populations. We find that both the frequency of non-reproductive males and the effect of cross direction are highly dependent on the specific strains crossed. Additionally, we find that backcrosses among crosses with male reproductive failure neither recapitulate nor enrich the phenotype of F2 crosses, suggesting that putative incompatibilities underlying reproductive failure do not conform to the simplest two-locus BDMI models. Our investigation suggests that early-stage reproductive isolation within the species has a variable and complex genetic architecture and that D. melanogaster represents a promising system for studying the genetic basis of early-stage reproductive isolation.
Materials and methods
Fly husbandry
All stocks and crosses were reared in bottles containing standard Drosophila cornmeal/molasses medium, prepared in batches consisting of 4.5 L water, 500 mL cornmeal (Quaker yellow), 500 mL molasses (Grandma’s unsulphured), 200 mL powdered yeast (MP Biochemical Brewer’s), 54 g agar (Genesee Drosophila Type II), 20 mL propionic acid, and 45 mL 10% Tegosept solution in 95% ethanol (by volume). Bottles were maintained at a constant 25 °C and on 12 hr light/dark cycles. Humidity was not strictly controlled, but typically ranged between 50% and 60%. Bottles used for virgin collections were moved to an 18 °C environment during hatching periods, and collections were maintained under the same conditions as stocks. Crosses were founded using approximately 10–20 flies per strain, and F1 interbreeding occurred using an estimated 50–100 F1 flies per bottle. Flies were transferred to fresh bottles every three days, and offspring were collected from crosses up to 16 days from the first transfer of flies to the bottle. F2 offspring in all assays were collected 24–48 hr post-eclosion and placed into vials containing no more than 20 flies per vial. Flies were aged for an additional 24–48 hr before use in assays. Virgin females used in male no-choice mating assays were collected into vials of no more than 20 females per vial and aged 2–5 days before use in assays.
Generation of experimental individuals
We first selected six unique wild-derived inbred strains from each of four natural populations, representing Cameroon, France, high-altitude Ethiopia, and Zambia sources (Pool et al., 2012). In all cases except for Cameroon, strains free from commonly known inversions were selected, while all available inbred strains from our Cameroon collection contain at least one known inversion. Six between-population crosses were performed for each population pair, for a total of six population pairings (Supplementary Tables S1–S3). Each of the six crosses between two populations involved different strains, and three were performed in each cross direction. In crosses involving Cameroon, one Cameroon strain was used for two crosses, as we possessed five inbred strains from this sample. Three within-population crosses for each of the four source populations were also performed and used as controls for statistical comparison. First generation (F1) hybrids of crosses were subsequently intercrossed to generate second generation hybrids (F2). Reproductive isolation in F2 hybrids was then measured by three metrics: egg-to-adult viability, and reproductive success in no-choice mating assays for both males and females.
For the larger-scale male reproduction screen between France and Zambia, we randomly selected five inversion-free strains from each population sample, and this time we performed all pairwise cross combinations among these strains. Each of the 25 cross pairs was sampled in both cross directions for a total of 50 between-population crosses. Within-population controls were likewise generated through pairwise crosses among each of the five strains within a population, crossed in both directions, yielding 40 within-population control crosses consisting of 20 crosses each within France and Zambia (Supplementary Tables S4 and S5). F2 males were generated and assayed identically to the methods described above for the four-population analysis.
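The cross counts in this design follow directly from enumerating ordered strain pairs. The following sketch reproduces the totals of 50 between-population and 40 within-population crosses; the strain labels are hypothetical placeholders, not the strains actually used.

```python
from itertools import permutations, product

# Hypothetical strain labels; the real strain names are in the supplement.
france = [f"FR_{i}" for i in range(1, 6)]
zambia = [f"ZI_{i}" for i in range(1, 6)]


def between_population_crosses(pop_a, pop_b):
    """All (maternal, paternal) strain pairs, in both cross directions."""
    forward = list(product(pop_a, pop_b))   # pop_a maternal
    reverse = list(product(pop_b, pop_a))   # pop_b maternal
    return forward + reverse


def within_population_crosses(pop):
    """All ordered pairs of distinct strains from one population."""
    return list(permutations(pop, 2))


between = between_population_crosses(france, zambia)
within = within_population_crosses(france) + within_population_crosses(zambia)
print(len(between), len(within))
```

Treating each cross as an ordered (maternal, paternal) pair keeps cross direction explicit, which matters later when direction-specific failure rates are compared.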
Backcrossed (BC) males were generated by two primary cross designs to test for effects of either the mitochondria or Y chromosome. For tests involving a focal Y chromosome, F1 males were crossed back to F0 females from the non-focal ancestral type (thus pairing the focal Y chromosome genotype with homozygous autosomal and/or X chromosomal genotypes in many backcross offspring). For tests involving a focal mitochondrial genotype, F1 females were crossed back to F0 males from the non-focal ancestral type (thus pairing the focal mitochondrial genotype with homozygous autosomal and/or X chromosomal genotypes in many backcross offspring). Male reproductive success in BC males was measured using the same procedure as for the F2 studies above.
No-choice mating reproductive success assay
F2 males were paired with two virgin females (one from each population involved in the cross) in a single vial, and allowed to mate for nine days. Females used in the assay were randomly chosen from all employed lines for a population, excluding those that founded the cross being assayed. After nine days, vials were checked for the presence of larvae or pupae. If no offspring were present, males were transferred to a new vial, paired with two new virgin females, and allowed an additional nine days for possible mating. Males that were successful in the first trial were not assayed in a second trial and were considered successful. Males that failed to produce offspring in the second 9-day trial were counted as non-reproducing, and males that were successful in either trial were counted as reproducing. If a male or both females had died by either checkpoint, results from that male were discarded. Assays were conducted in ambient lab conditions.
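The scoring rule for the two-trial assay can be written as a small decision function. This is a minimal sketch of the rule as described above, not the scoring code used in the study; the function and argument names are hypothetical.

```python
from typing import List, Optional


def score_male(trial1_offspring: bool,
               trial2_offspring: Optional[bool] = None,
               discard: bool = False) -> Optional[str]:
    """Score one male from the two-trial no-choice assay.

    discard=True marks a male whose result is dropped because he, or
    both females, had died by a checkpoint.
    """
    if discard:
        return None
    if trial1_offspring:
        return "reproducing"          # no second trial is needed
    if trial2_offspring is None:
        raise ValueError("a male failing trial 1 needs a second trial")
    return "reproducing" if trial2_offspring else "non-reproducing"


def failure_rate(scores: List[Optional[str]]) -> float:
    """Proportion of non-reproducing males among retained males."""
    kept = [s for s in scores if s is not None]
    if not kept:
        raise ValueError("no retained males to score")
    return sum(s == "non-reproducing" for s in kept) / len(kept)
```

For example, a male that fails both trials is "non-reproducing", while discarded males are excluded from both the numerator and denominator of the failure rate.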
Female reproductive performance assays were conducted identically to male assays, with the exception of pairing a single hybrid female to two male mates, again randomly selected from strains of each population represented in the cross but excluding the strain used to establish a cross. Reproductive success was counted as a binary trait (fecundity was not assayed quantitatively).
Egg-to-adult viability assays
F1 offspring were collected into vials of no more than 25 flies per vial at 0–48 hr post-eclosion. Flies were aged an additional 3–5 days in groups of vials containing both males and females. At the time of assay, flies were separated into groups of three males and three females, and placed into fresh vials to allow mating and egg laying for an additional 18–22 hr (density of eggs in vials was not strictly controlled). At this time, flies were discarded and all embryos were counted. Larvae were added to the egg count if embryos had hatched before counting occurred. Vials were maintained at 25 °C for an additional 15–17 days, at which point adults were counted from vials. Trials with fewer than five embryos counted were not considered. In instances where the adult count was higher than the egg count (Supplementary Table S1), we elected to include the overcount in the analysis. Given some unknown overall level of counting error, we suggest that an inferred adult:egg ratio greater than one should be considered to have a greater expected true viability proportion than a ratio of exactly one, and thus the overcount contains information. Unless otherwise noted in Supplementary Table S1, 10 trials were performed for each cross.
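A minimal sketch of how per-cross inviability could be tallied under these rules is given below. It assumes that egg and adult counts are pooled across retained trials before taking the ratio; the text does not specify whether pooling or averaging of per-trial ratios was used, so this choice, like the function name, is an assumption for illustration.

```python
from typing import List, Tuple


def cross_inviability(trials: List[Tuple[int, int]],
                      min_embryos: int = 5) -> float:
    """Inviability (1 - adults/eggs) for one cross.

    trials: (egg_count, adult_count) per vial.
    Trials with fewer than min_embryos eggs are dropped. Adult
    overcounts are retained, so the result can be slightly negative.
    Assumption: counts are pooled across trials before dividing.
    """
    kept = [(eggs, adults) for eggs, adults in trials if eggs >= min_embryos]
    if not kept:
        raise ValueError("no trials meet the minimum embryo count")
    total_eggs = sum(eggs for eggs, _ in kept)
    total_adults = sum(adults for _, adults in kept)
    return 1.0 - total_adults / total_eggs
```

Leaving overcounts unclipped mirrors the reasoning above that an adult:egg ratio above one still carries information about true viability.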
Testing for elevated rates of F2 defects using a bootstrap/resampling approach
We sought to test whether the levels of reproductive failure observed in each between-population test cross exceeded our null expectations based on the levels of reproductive failure observed in within-population control crosses. P-values for all reproductive failure analyses were generated using a nonparametric bootstrap resampling approach to compare rates of F2 breakdown among cross types (between or within population). We describe the resampling design and significance testing in greater detail below. Bootstraps were implemented manually in Python (version 3.6). Scripts used to conduct the tests described in the following section, including our bootstrap, are provided at www.github.com/mjlollar/signficance_testing.
The goal of each bootstrap was to compare rates of a breakdown trait, such as male reproductive failure, between two sample groups (most often intra- vs. inter-population crosses), and thereby to estimate a p-value: the probability of data at least as extreme as those observed under the null model that inter-population crosses have breakdown rates no higher than intra-population crosses. The bootstrap is a powerful method first introduced by Efron (1979), and allows us to investigate differences among samples without making assumptions about underlying distributions or parameters. Importantly, it is robust to a primary variance component of the data, sampling variation among replicates (e.g., compare the differences in sample sizes between Supplementary Tables S3 and S4).
Each bootstrap compares total reproductive failure rates between two groups: an “experimental” group, most often represented by between-population crosses, and a “control” group, most often within-population crosses. First, in each replicate, for each between-population cross, a random cross was selected from the control group. Second, this control cross was resampled with replacement, based on its own sample size, to account for sampling variance in the control cross data. Third, we again resampled individuals from the (resampled) control cross data, this time based on the between-population test cross’s sample size, to account for sampling variance in the test cross data. This process was repeated once for each between-population cross in the test data, each time sampling a random control cross with replacement. Fourth, if the number of reproductive failures after the latter resampling step was greater than or equal to the number observed in the test cross, then this replicate was added to the numerator of the p-value calculation (which had a denominator equal to the number of replicates, 1 million). P-values therefore represented the proportion of replicates in which our two-step resampling of control cross data yielded at least as many reproductive failures as observed for the test cross.
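The two-step resampling can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction from the description above, not the authors' script: it assumes that failures are summed across all test crosses within a replicate before comparison with the observed total, and the function names, the (failures, sample size) input format, and the small default replicate count are all hypothetical choices.

```python
import random
from typing import List, Tuple

Cross = Tuple[int, int]  # (number of failures, number of individuals assayed)


def resample_failures(control: Cross, test_n: int, rng: random.Random) -> int:
    """Two-step resampling of one control cross (requires total > 0)."""
    fails, total = control
    pool = [1] * fails + [0] * (total - fails)
    # Step 1: resample the control cross at its own sample size.
    step1 = rng.choices(pool, k=total)
    # Step 2: resample again at the test cross's sample size.
    step2 = rng.choices(step1, k=test_n)
    return sum(step2)


def bootstrap_p_value(test_crosses: List[Cross],
                      control_crosses: List[Cross],
                      n_reps: int = 10_000,
                      seed: int = 0) -> float:
    """Proportion of replicates with at least as many simulated failures
    as observed (assumption: failures are summed over test crosses)."""
    rng = random.Random(seed)
    observed = sum(fails for fails, _ in test_crosses)
    hits = 0
    for _ in range(n_reps):
        simulated = 0
        for _, test_n in test_crosses:
            control = rng.choice(control_crosses)  # random control cross
            simulated += resample_failures(control, test_n, rng)
        if simulated >= observed:
            hits += 1
    return hits / n_reps
```

Passing a single test cross gives the per-cross variant of the same test. With 1 million replicates, as used in the study, a pure-Python loop like this would be slow, so a vectorized implementation would likely be preferable in practice.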
We also calculated individual cross p-values using the same procedure as described above. Here in each replicate, we performed the above two-step resampling procedure for a single random control cross to compare against a given test cross in each bootstrap replicate. We report all raw individual cross p-values in Supplementary Tables for reference purposes. However, our conclusions about the presence of hybrid breakdown are instead based upon the above multi-cross testing procedure, and we only discuss cross-specific results if the population-comparison-level p-value is significant (i.e., France/Zambia male reproduction).
To follow up on significant evidence of hybrid breakdown in male reproduction between the France and Zambia populations, we performed the broader bidirectional 5-by-5 crossing scheme described above (Figure 2; Supplementary Tables S4 and S5). Here, we also tested the significance of individual crosses, and so a Benjamini-Hochberg multiple test correction was applied to these p-values using the p.adjust method “BH” (Benjamini & Hochberg, 1995) from the R stats package (version 3.6.2; R core, 2021). For this same France/Zambia comparison, we also tested whether individual between-population crosses showed significant differences in reproductive failure rate between the two cross directions. Here, the cross direction with the lower rate of reproductive failure was considered the control set, and tests were performed identically to the description above for testing a full set of crosses between two populations.
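For readers working in Python rather than R, the Benjamini-Hochberg adjustment can be written by hand. The sketch below follows the standard step-up procedure (scale each sorted p-value by n/rank, then enforce monotonicity from the largest p-value downward), which is what p.adjust computes for method "BH"; the function name is hypothetical and the example p-values are illustrative, not values from the study.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(p_values)
    # Visit p-values from largest to smallest.
    order = sorted(range(n), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for position, idx in enumerate(order):
        rank = n - position  # rank in ascending order (1 = smallest)
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Illustrative p-values only.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

The running minimum guarantees that a larger raw p-value never receives a smaller adjusted value than a smaller raw p-value.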

Figure 2. Male reproductive failure rates between French and Zambian Drosophila melanogaster crosses deviate significantly from within-population crosses, and have a variable and sometimes cross-direction-specific pattern. (A) Histogram depicting percentage of male reproductive failure for between-population (blue) and within-population (yellow) cross directions. Values on the x-axis are the percentage of males that failed in all four reproductive assays, each involving one female from each parental population (binned by rounding percentages up to nearest whole number). Within-population cross averages are displayed for France (purple line) and Zambia (orange line), as well as between-population average (light blue line). In total, 36 of the 50 between-population crosses have reproductive failure rates higher than both within-population averages (bins right of both lines), and the rate of reproductive failure between these groups is significantly different (bootstrap p < 1.0 × 10−6). Individual cross counts are supplied in Supplementary Tables S4 and S5. (B) Male F2 reproductive failure rates among all possible crosses between five France and five Zambia strains, for both cross directions. Each cross is represented by a separate male failure rate for each cross direction (FR maternal top-left, ZI maternal bottom right). Individual cross directions with significant failure rates compared to within-population controls are bolded. Crosses whose direction-combined cross failure rates have significant raw p-values (p < .05) are highlighted pink, and those with .05 < p < .1 are highlighted yellow. These highlighted crosses were tested for differences between cross directions in failure rate, and crosses with significant (p < .05) differences are denoted by a blue solid line.
An alternative test of F2 hybrid breakdown using generalized linear models
Regression modeling was performed as a supplemental analysis to the bootstrap method applied in the 5-by-5 strain France/Zambia male reproduction screen. Regression analysis was leveraged as a means to account for the possibility that specific parental strains could have generally elevated rates of male reproductive failure. Here, our primary hypothesis testing does not deviate from that of the bootstrap; that is, we wanted to model the effect of cross type (between or within population) on the response variable, reproductive failure or success.
In each logistic model, categorical predictor variables were used to model a binary response variable that represents either reproductive success (coded as 1) or reproductive failure (coded as 0). Individual F2s from between- and within-population crosses (Supplementary Tables S4 and S5) were combined into a single data set, and scored as successful or unsuccessful. Overdispersion was assumed and accounted for using the “family=quasibinomial” argument of the glm() function from R package “stats” (version 3.6.2).
Our first model used a single, binary categorical variable (between- vs. within-population cross origin) to predict reproductive success. Next, we modeled reproductive success using two additional categorical variables, maternal and paternal strain. For each category, an individual was labeled with a factor denoting a parental genotype. Both maternal and paternal categories each contained 10 factors (representing the five France and five Zambia strains). The strain giving the closest reproductive success rate to the overall mean was used as the reference category for maternal strain and separately for paternal strain—this was ZI366N for both variables.
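For the first model, which has a single binary predictor, the fitted coefficients have a closed form: the intercept is the log-odds of success in the reference group, and the cross-type coefficient is the difference in log-odds between groups. The sketch below computes these directly rather than calling a GLM routine. It assumes within-population crosses are the reference level (the text does not state which level was used), and the counts are made up for illustration. A quasibinomial fit gives the same point estimates and differs only in scaling the standard errors by an estimated dispersion, which is not computed here.

```python
from math import log
from typing import Tuple


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return log(p / (1.0 - p))


def fit_cross_type_model(within: Tuple[int, int],
                         between: Tuple[int, int]) -> Tuple[float, float]:
    """Closed-form logistic fit of success ~ cross_type.

    within, between: (successes, total) per group.
    Returns (intercept, between_effect) on the log-odds scale, with
    within-population crosses as the assumed reference level.
    """
    p_within = within[0] / within[1]
    p_between = between[0] / between[1]
    intercept = logit(p_within)
    between_effect = logit(p_between) - intercept
    return intercept, between_effect


# Hypothetical counts: 90/100 successes within, 80/100 between.
intercept, effect = fit_cross_type_model((90, 100), (80, 100))
print(intercept, effect)
```

A negative cross-type coefficient indicates lower odds of reproductive success in between-population crosses. The second model, with maternal and paternal strain factors, has no such closed form and requires an iterative GLM fit.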
Analysis of variance for viability data
One-way analysis of variance (ANOVA) was used as an alternative statistical approach to the bootstrap when we sought to summarize viability data (Figure 1A, Supplementary Table S1). ANOVA was preferred to the bootstrap primarily due to the difference in structure of this dataset compared to measured rates of reproductive failure. Whereas reproductive failure rate was measured at the level of individual, viability was measured in each cross using three females per replicate, and replicates were tallied together to generate means for each unique cross. Means from crosses were compared using the ANOVA() method in R stats package (version 3.6.2), using default model parameters (Type-II sum of squares). This analysis compared the set of six means from the relevant within-population crosses to the six means from between-population crosses, resulting in an overall p-value for each population comparison displayed in Figure 1A.
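The comparison of two sets of cross means reduces to a one-way ANOVA F statistic, which can be computed by hand as the ratio of between-group to within-group mean squares. The sketch below is a generic textbook calculation rather than the R call used in the study. It returns only the F statistic and its degrees of freedom, since converting F to a p-value needs an F-distribution function that the Python standard library does not provide.

```python
from typing import List, Tuple


def one_way_anova_f(groups: List[List[float]]) -> Tuple[float, int, int]:
    """One-way ANOVA F statistic with its degrees of freedom.

    groups: one list of values per group (here, per-cross means).
    Raises ZeroDivisionError if there is no within-group variation.
    """
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With six within-population means and six between-population means, as in each population comparison here, the degrees of freedom would be 1 and 10.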
Results
Testing for evidence of hybrid breakdown among four populations for three traits
We first searched for evidence of hybrid breakdown involving three traits (egg-to-adult viability, female reproductive ability, and male reproductive ability) among four natural populations of D. melanogaster (Cameroon, high-altitude Ethiopia, France, and Zambia). For each population pair, we performed six unidirectional crosses involving separate inbred strains—three with each population as the maternal parent. We applied assays described in the Materials and methods section to F2 individuals. We then tested for differences in survival/reproduction between the sets of between-population crosses and within-population crosses from a given population pair, in order to assess potential evidence for hybrid breakdown between D. melanogaster populations.
We did not find clear evidence of egg-to-adult viability decline between any population pair when compared to within-population crosses (Figure 1A). In general, embryo yield varied both between crosses and within crosses, and assay conditions may have impacted our ability to detect subtle inviability differences. This may be evidenced, for instance, by the Ethiopian population (Supplementary Table S1). Previous studies in our lab have found lower egg lay rates in this population (Lack et al., 2016c), and indeed the egg counts tended lower in our assay (Supplementary Table S1), while overall inviability within the Ethiopian population was much lower than other within-population samples. We note that alternative assays could reveal more subtle viability differences between populations, but our screens do not suggest that F2 viability under benign lab conditions is strongly impacted by hybrid breakdown between D. melanogaster populations.
Female reproductive ability also did not differ significantly in between-population crosses compared to within-population crosses (Figure 1B). In general, failures in the mating assays performed among all crosses were almost entirely absent (Supplementary Table S2). A nearly significant signal between Zambia and Ethiopia (bootstrap p-value = .0558) was driven primarily by a single between-population cross (out of six; Supplementary Table S2). We do not interpret this result as clear evidence of hybrid breakdown, although we can not rule out the existence of polymorphic incompatibilities between these populations affecting this trait. We also note that our study did not examine quantitative female fecundity, which might conceivably be affected by hybrid breakdown.
In contrast to the above traits, we found two signatures of increased F2 male reproductive failure between populations: a strong signal between France and Zambia (Figure 1C; bootstrap p-value = 3.76 × 10⁻⁴) and a weaker signal between France and Cameroon (Figure 1C; bootstrap p-value = .06117). Interestingly, both pairs demonstrated a cross-direction pattern of reproductive failure (Supplementary Table S3). Crosses involving France and Zambia were more often enriched for male failure in the Zambian maternal/France paternal direction of crosses (bootstrap p-value < 1.0 × 10⁻⁶; Supplementary Table S3), while crosses involving France and Cameroon failed more frequently in the paternal Cameroon/maternal France direction (bootstrap p-value < 1.0 × 10⁻⁶; Supplementary Table S3). Despite the high average reproductive failure rate between France and Cameroon, the statistical signal was diminished due to the presence of a single within-population control cross with a notably high failure rate (Supplementary Table S3) in the Cameroon population. While a larger sample may improve the statistical clarity when comparing this population pair, we again note the difficulty in the immediate interpretation of this result, in part due to the presence of inversions among our assayed Cameroon strains. Conversely, the strong signal of male reproductive failure between France and Zambia cross pairings represented a more promising suggestion of hybrid breakdown, and prompted us to investigate this trend of increased male reproductive failure in greater detail.
Expanded analysis of France–Zambia hybrid breakdown in male reproduction
We next sought to increase overall sampling among France and Zambia crosses to further assess evidence of hybrid breakdown between these populations and to examine its genetic architecture. We chose five inbred strains from each population and performed all possible between-population and within-population crosses among them (making 50 total cross directions for between-population crosses and 40 within). Results of these experiments qualitatively supported our preliminary findings of increased male reproductive failure from crosses between France and Zambia strains. Within-population crosses yielded average male reproductive failure rates of only 1.15% (France) and 2.09% (Zambia), and similarly we found that, among males sampled from within these 10 inbred strains, an average of just 1.4% failed to reproduce (Supplementary Table S4). In contrast, crosses between these populations produced a mean male failure rate of 4.94%, more than triple the total within-population cross average (Figure 2A; Supplementary Table S5). And yet, the latter average masks striking variation among between-population crosses. Averaging between cross directions, five of the 25 between-population crosses gave male failure rates no greater than the within-Zambia average cited above. In contrast, a cross between FR54N and ZI403N gave an average of 12.2% male failure between the two cross directions, and several other crosses showed notable elevations as well (Figure 2B).
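The cross counts quoted above follow directly from the design of five strains per population with both cross directions. The short Python enumeration below reproduces them; the strain labels are placeholders, not the strain names used in the study.

```python
from itertools import permutations, product

# Placeholder labels; the study used five named inbred strains per population.
france = [f"FR_{i}" for i in range(1, 6)]
zambia = [f"ZI_{i}" for i in range(1, 6)]

# Each ordered pair is (maternal strain, paternal strain).
between_directions = list(product(france, zambia)) + list(product(zambia, france))
within_directions = list(permutations(france, 2)) + list(permutations(zambia, 2))

# Ignoring direction, each France/Zambia strain pairing counts once.
between_pairs = {frozenset(pair) for pair in between_directions}

print(len(between_directions), len(within_directions), len(between_pairs))
```

This gives 50 between-population cross directions, 40 within-population cross directions, and 25 between-population strain pairings.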
Hypothesis testing was carried out on this data set to determine: (a) whether the total between-population cross group had a significantly increased rate of failure vs. the total within-population cross group, and if so, (b) whether each individual between-population cross had significantly or marginally increased rates of male reproductive failure, and if so, (c) whether the two directions of that between-population cross yielded significantly different male reproductive failure rates. For each test, we applied a bootstrap resampling approach that accounted for sample variance in both test and control crosses (Supplementary Tables S4 and S5) to determine significant differences in male reproductive rates between sample groups (see Materials and methods section).
First, we asked whether the mean failure rate observed in the between-population cross group differed from our within-population cross group. When all between-population crosses were tested jointly, no within-population resampling replicate out of 1,000,000 permutations matched or exceeded the total observed number of failed F2 males from our between-population crosses, suggesting an exceedingly significant difference between sample groups. Collectively, these data suggest that male reproductive failure is significantly increased in between- vs. within-population crosses between France and Zambia D. melanogaster.
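The logic of this joint test can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure in the authors' repository: it pools all within-population males into one list of 0/1 outcomes and ignores the per-cross variance structure that the actual method accounts for. The function name, arguments, and example counts are all hypothetical.

```python
import random

def resampling_p_value(within_outcomes, n_between, observed_failures,
                       n_replicates=10_000, seed=0):
    """One-sided p-value: share of replicates with >= observed failures.

    within_outcomes: list of 0/1 values (1 = failed male) from control crosses.
    Each replicate draws n_between individuals with replacement from the
    controls and counts how many of the drawn males failed.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        failures = sum(rng.choices(within_outcomes, k=n_between))
        if failures >= observed_failures:
            hits += 1
    return hits / n_replicates

# Hypothetical data: 16 failures among 1,000 control males,
# 25 failures observed among 500 between-population males.
controls = [1] * 16 + [0] * 984
p = resampling_p_value(controls, n_between=500, observed_failures=25,
                       n_replicates=2_000)
print(f"resampling p-value = {p}")
```

When no replicate reaches the observed count, the estimate is zero, and the result is best reported as smaller than one over the number of replicates.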
In our analysis to determine whether any individual between-population cross had increased rates of reproductive failure, nine of the 50 between-population cross directions were found to be significant at p < .05 (Figure 2B, bold values). When both cross directions were considered as a single sample, eight crosses out of 25 were significant at p < .05 (Figure 2B, pink boxes), while another three crosses had marginal cross direction effects with .05 < p < .1 (Figure 2B, yellow boxes). Due to the number of tests performed in this analysis, we applied a Benjamini-Hochberg correction to the resampling p-values (Benjamini & Hochberg, 1995), and found that no individual cross could be strictly considered individually significant (Supplementary Table S5). Despite this, 36 out of the 50 individual between-population cross directions had a percent failure rate higher than both within-population averages (Figure 2A), suggesting that many of them may indeed be influenced by hybrid breakdown.
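For readers unfamiliar with the correction, a minimal Python sketch of Benjamini-Hochberg adjusted p-values is given below. The input values are hypothetical and are not the p-values in Supplementary Table S5.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw p-values for four tests.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

Each raw p-value is scaled by the number of tests divided by its rank, so with 50 cross directions a raw value just under .05 can easily end up above .05 after adjustment.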
Finally, we tested the significance of cross direction in the rate of F2 male failure, as suggested from preliminary screening (Figure 1, Supplementary Table S5). We considered all crosses with combined-direction p-values below .1, a relaxed threshold allowing us to test a total of 11 crosses (Figure 2B, highlighted crosses). Roughly half of these crosses (6/11) showed significant differences between cross directions (Figure 2B, blue solid lines). Of these six, three crosses produced F2 failures more frequently in the France maternal cross direction, and three crosses more frequently in the Zambian maternal direction. The most extreme ratio between cross directions, from cross FR320N/ZI251N, involved 9.3% F2 male failure with a France mother, but only 1.0% with a Zambia mother. The greatest arithmetic difference in failure rates, for cross FR54N/ZI403N, involved 16.7% failure with a Zambia mother, vs. 7.8% with a France mother. These results do not support the consistent France paternal/Zambia maternal pattern of male failure suggested by the preliminary survey. Rather, they suggest a variable pattern to male reproductive failure that is sometimes influenced by parental sex in either direction.
As a supplementary analysis to our population-level testing (i.e., Step 1 above), we also considered these data within the framework of a logistic-binomial linear model, including all individuals from within- and between-population crosses, where reproductive success or failure represented our binary response variable. In our first generalized linear model (Supplementary Table S6, Model 1), we considered only cross type (between- vs. within-population) as a binary predictor of reproductive success. We found that cross type was a significant predictor of reproductive success (p = 3.65 × 10⁻¹⁰; Supplementary Table S6), with a logit−1 scale regression coefficient for cross type of −1.17. In other words, a unit change in cross type (modeled here as a change from “within” to “between” population) corresponded to a negative difference in the probability of reproductive success.
We next considered the possibility of strain-specific effects on male reproductive success by adding maternal and paternal strain categorical predictors to the base model. We found that there was only one significant regression coefficient for each of the maternal and paternal predictors, both indicating higher probabilities of reproductive success for crosses involving FR126N specifically (Supplementary Table S6). These results could reflect a general effect of the FR126N strain, or that it simply happens to have fewer incompatible variants with the Zambia strains tested. No maternal or paternal strain was associated with a significantly lower probability of reproductive success (Supplementary Table S6). Furthermore, in this model including strain effects from both parents, we found that cross type was still a strongly significant predictor of reproductive outcome (p = 5.05 × 10⁻¹⁰, Supplementary Table S6). Here, the logit−1 scale regression coefficient for cross type was −1.19, which suggests slightly stronger predictive power than in the base model, indicating that accounting for parental strain effects did not lessen the effect of between-population cross origin in reducing rates of reproductive success.
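To give a feel for the size of these coefficients, the Python sketch below converts them to odds ratios and applies them to a baseline success probability. It assumes the coefficients are on the log-odds scale, as in a standard logistic model. The baseline of 0.984 is an assumed within-population success rate chosen for illustration; in the model with strain effects the true baseline depends on the reference strains, so the implied probabilities are indicative only.

```python
import math

def shifted_probability(baseline_p, logit_coefficient):
    """Apply a logit-scale coefficient to a baseline success probability."""
    baseline_logit = math.log(baseline_p / (1 - baseline_p))
    return 1 / (1 + math.exp(-(baseline_logit + logit_coefficient)))

# Cross-type coefficients reported for the base and strain-effect models.
for coef in (-1.17, -1.19):
    odds_ratio = math.exp(coef)
    # 0.984 is an assumed baseline success rate, not a value from the paper.
    p_between = shifted_probability(0.984, coef)
    print(f"coef {coef}: odds ratio {odds_ratio:.2f}, "
          f"implied success probability {p_between:.3f}")
```

Both coefficients correspond to roughly a threefold reduction in the odds of reproductive success for between-population crosses.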
Backcross analysis to investigate cross direction asymmetry
The asymmetric cross direction pattern of F2 male reproductive failure observed in some crosses implies the involvement of uniparentally inherited chromosomes, i.e., Y chromosomes or mitochondria (unless other cytoplasmic factors are involved). One simple way to test for effects from these uniparental factors is to employ backcross designs aimed at exposing each uniparentally inherited unit to a homozygous (or hemizygous) genetic background of the alternate ancestry. If the potential underlying genetic basis of male reproductive failure follows a two-locus recessive BDMI model involving a uniparental chromosome, the expected failure rate among BC males should be higher in some backcrosses relative to F2 hybrids from an F1 intercross (Figure 3).

Figure 3. Male reproductive failure in backcrosses is reduced relative to F2 males, suggesting that a two-locus BDMI model is not appropriate for incompatibilities involving uniparental factors. Left: Backcross design to test for mitochondrial (top) or Y (bottom) by X/autosome two-locus recessive BDMIs between two populations (blue and orange). Above, an example cross design to test for BDMIs involving the blue mitochondrial haplotype. Below, an example design to test for BDMIs involving the blue Y haplotype. Right: Reproductive failure rates among BC males. Under the hypothesis of a single, fully penetrant two-locus recessive BDMI, these backcrosses increase the probability of uniparental by hemizygous X or recessive autosomal genotypes, and the expected rate of reproductive failure among backcrosses should increase (blue bars). Out of the six crosses in Figure 2B that had cross direction effects, three crosses each produced reproductive failure at higher rates in the France or Zambia maternal direction. This allowed us to test for BDMIs involving four uniparental elements (France or Zambia, mitochondria or Y), as indicated by labels above yellow bars. Backcrosses are written in the format of “(F1 Maternal Parent / F1 Paternal Parent) x F0 partner”, where the F0 partner is male for backcrosses targeting the mitochondria (top) and female for backcrosses targeting the Y chromosome (bottom). Individual cross counts are supplied in Supplementary Table S7. Male reproductive failure was virtually absent in all six focal backcrosses assayed (yellow bars), differing drastically from expected values under two-locus BDMI models (blue bars). These results suggest that pairing a uniparental chromosome with hemizygous/homozygous genotypes from the opposite population is not sufficient to unmask the incompatibilities involved in cross direction asymmetry of F2 hybrid breakdown, indicating that these do not reflect classic two-locus BDMIs.
We performed two backcrosses on each of the six France/Zambia cross directions identified as having significantly greater male failure rates than their reciprocal crosses (Figure 2B): one meant to expose a mitochondrial incompatibility, and the other meant to expose a Y chromosome incompatibility. We predicted that two-locus BDMIs involving a maternally or paternally inherited partner would yield 50% or more of males carrying incompatible genotypes in the relevant backcross, even if an autosomal or X-linked partner locus was recessive (Figure 3). In stark contrast to this prediction, all 12 backcrosses yielded male failure rates below 5% (Figure 3; Supplementary Table S7). Rather than being sharply increased, each of these male failure rates was lower than that observed for F2 males from the enriched cross direction. These findings suggest that potential male reproduction BDMIs associated with uniparental chromosomes are unlikely to involve only a single BDMI partner. Instead, the reduced male failure rates from backcrosses could indicate that BDMIs involving uniparental chromosomes require at least one more partner locus from the same parental strain as the uniparental factor (which can not be homozygous under our backcross design), in addition to one or more partner loci from the opposite strain.
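The "50% or more" expectation can be checked with a small Mendelian enumeration. The Python sketch below assumes a single, fully penetrant incompatibility between a uniparental factor from population A and a recessive partner allele from population B; the labels A and B and the helper functions are illustrative, not taken from the study.

```python
from fractions import Fraction
from itertools import product

def autosomal_homozygous_fraction(mother, father, allele):
    """Fraction of offspring homozygous for `allele` at one autosomal locus."""
    offspring = list(product(mother, father))
    hits = sum(1 for pair in offspring if pair == (allele, allele))
    return Fraction(hits, len(offspring))

def x_hemizygous_fraction(mother, allele):
    """Fraction of sons carrying `allele` on their single maternal X."""
    return Fraction(sum(1 for x in mother if x == allele), len(mother))

f1 = ("A", "B")    # F1 hybrid, heterozygous for ancestry at the partner locus
f0_b = ("B", "B")  # pure population-B backcross partner

# F2 males from an F1 x F1 intercross.
f2_autosomal = autosomal_homozygous_fraction(f1, f1, "B")
f2_x_linked = x_hemizygous_fraction(f1, "B")

# Backcross males: an F1 parent crossed to a population-B parent.
bc_autosomal = autosomal_homozygous_fraction(f1, f0_b, "B")
bc_x_mito_test = x_hemizygous_fraction(f1, "B")   # mother is the F1 female
bc_x_y_test = x_hemizygous_fraction(f0_b, "B")    # mother is the F0 B female

print(f2_autosomal, f2_x_linked, bc_autosomal, bc_x_mito_test, bc_x_y_test)
```

Under these assumptions, an autosomal recessive partner is exposed in 1/4 of F2 males but 1/2 of backcross males, and an X-linked partner is exposed in 1/2 to all backcross males, so observed backcross failure rates below 5% are far from the two-locus expectation.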
Discussion
The accumulation of reproductive isolation can occur as a consequence of restricted gene flow and independent evolution between populations. The strength of isolation has important consequences for the maintenance and origins of biological diversity (Rabosky, 2016), and is considered to be the determining factor for species boundaries by various species concepts (Coyne & Orr, 2004). Even if isolation is incomplete and reproductive barriers are weak, incompatible alleles may be present between populations. When such variants are recessive, barriers to reproduction manifest in F2 or later generation hybrids, in a process commonly referred to as hybrid breakdown. Here, we provide evidence for hybrid breakdown in male reproduction between various natural populations of D. melanogaster. This study highlights the potential of using D. melanogaster as a powerful within-species model of early-stage reproductive isolation, and provides preliminary evidence that such incompatibilities may be more genetically complex in nature than what might be expected under the simplest BDMI models.
Our analysis of four D. melanogaster populations revealed that hybrid breakdown between populations was most clearly reflected in a decline in male reproductive success in the F2 generation, whereas viability and female reproductive success occurred at relatively similar rates in between- and within-population crosses. In contrast to our largely negative results for the latter traits, Lachance and True (2010) did find evidence for inviability and female sterility, in addition to male sterility, when they created strains that were homozygous for X chromosomes and autosomes from contrasting geographic origins (e.g., crossing an X chromosome from a majority African population into an autosomal background of primarily European ancestry). The fully homozygous nature of those X-autosome combinations may have given that study a greater ability to detect some incompatibilities. However, the crossing scheme employed in that study also leaves the possibility that female sterility might have been caused by simple recessive deleterious X-linked variants (whereas recessive X-linked inviability and male sterility variants are less frequently encountered during lab inbreeding, due to more efficient natural selection against less-fit hemizygous males). Additional evidence for inviability incompatibilities comes from the studies of Alipaz et al. (2005), which also detects fertilization defects, and Kao et al. (2015). It is possible that further F2 studies like ours may yield further evidence for hybrid breakdown involving viability and female fertility, whether by expanding the experimental scale, controlling for population differences in egg production, quantifying female fecundity, or performing assays under more challenging environmental conditions. Still, on its own, our current study provides no clear support for hybrid breakdown between D. melanogaster populations leading to inviability or female reproductive failure.
Our deeper examination of hybrid breakdown in male reproduction between populations from France and Zambia revealed a strikingly variable pattern. Compared with average male failure rates from within-population crosses, the frequencies observed among individual France/Zambia cross directions ranged from no elevation at all (i.e., 0%–2%) to as much as an order of magnitude greater (16.7%; Supplementary Tables S4 and S5). Such divergent outcomes among crosses might not be expected under a model of numerous small effect BDMIs, in which case each cross might be expected to contain roughly comparable numbers of incompatibilities and rates of male failure. Instead, the striking variation in male failure rates among crosses may hint at the presence of large-effect interactions driving hybrid breakdown, with at least one interacting allele in each BDMI remaining polymorphic within its population. Further research will be needed to gain more detailed insights regarding the genetic architecture and molecular basis of these putative BDMIs.
Our finding that evidence of hybrid breakdown between recently diverged populations is strongest for male reproduction follows theoretical predictions and previous observations that emergent reproductive isolation disproportionately affects the heterogametic sex, evidenced particularly between many Drosophilae (known commonly as Haldane’s Rule) (Coyne & Orr, 1997; Haldane, 1922; Orr, 1997). Many common explanations of this trend invoke the X (or Z) chromosome, and a “large X effect” has been described and confirmed in many model systems (Presgraves, 2018). Our experimental design does not specifically address the role of the X chromosome in hybrid breakdown, since regardless of cross direction, F2 males receive their lone X chromosome from a mother who has one X from each parent. Hence, further genetic analysis will be needed to confirm the presence of X chromosome involvement in our observed male reproductive failure.
The variable effect of cross direction on male reproductive failure that we observed does imply a role for uniparentally inherited factors, i.e., Y chromosomes, mitochondria, or possibly other cytoplasmic factors such as maternally inherited endosymbionts. We can not say whether the variability in cross direction effect is driven by within-population variation in the uniparental interactors or their partner loci (or both). Since both directional biases were observed among our set of France/Zambia crosses, it follows that at least two such factors would need to be involved in incompatibilities, e.g., both populations’ Y chromosomes, both populations’ mitochondria, or one population’s Y and mitochondria. In D. melanogaster, polymorphism in both the mitochondria and the Y chromosome has been shown to contribute to differential fitness of male offspring (Chippindale & Rice, 2001), suggesting that possible factors contributing to reproductive failure here may indeed involve these genetic elements. In closely related species, Y chromosome variation has been suggested to contribute to hybrid sterility between D. mauritiana and D. simulans (Bayes & Malik, 2009). Alternatively, some models of early-stage reproductive isolation also implicate mito-nuclear interactions (Ellison & Burton, 2008; Rand et al., 2004). Finally, it is possible that cladistic variation in Wolbachia among D. melanogaster harboring unique mtDNA haplotypes contributes to the cross direction effects. This maternally inherited endosymbiont is generally present in all studied populations but was not specifically tested for here (Richardson et al., 2012). In theory, Wolbachia could manifest CI phenotypes concurrent with hybrid breakdown, although CI is thought to be partially to fully repressed by D. melanogaster (Merçot & Charlat, 2004), and effects specific to male reproduction beyond the F1 generation are not known.
We also can not formally exclude the possibility of multi-generational maternal effects on cross direction asymmetry in male reproductive success. In summary, backcross results in Supplementary Table S7 do not allow inferences as to which uniparental element is driving reproductive failure in the current study, and both maternally and paternally transmitted elements remain interesting candidates for BDMIs or other fitness interactions between these populations.
Following up on the implication of uniparental elements in hybrid breakdown of male reproduction, we conducted backcrosses that, under a two-locus BDMI model, were predicted to yield much higher rates of male failure than seen among F2s. Instead, we observed the opposite pattern of much lower male failure among backcross offspring than among F2s (Figure 3). These results suggest that putative BDMIs involving uniparental elements do not reflect simple two-locus interactions, and specifically that one or more additional partners may include a recessive allele from the same source population as the uniparental element. Such an interaction is not possible to generate in our backcross design, as there are no scenarios where a homozygous autosomal or X-linked allele is present together with the uniparental chromosome being tested. The interpretation of multigenic incompatibilities is consistent with the patterns of ancestry-mediated epistatic selection detected by an analysis of African and European ancestry in an admixed North American population, which found hub-like patterns of “ancestry disequilibrium” that could potentially be explained by complex incompatibilities (Pool, 2015).
This scenario is comparable to theoretical (Cutter, 2012; Fraïsse et al., 2014) and empirical studies (Phadnis, 2011; Phadnis et al., 2015; Turner & Harr, 2014) that suggest incompatibilities between isolated groups are likely to be polymorphic and to involve multiple partner loci. The maintenance and stability of multi-locus incompatibilities, especially those that may involve interactions with non-recombining elements of the genome such as the Y chromosome, is not easily predictable. Fraïsse et al. (2014) have considered a theoretical framework for the evolution of multigenic incompatibilities among allopatric populations. Under the simplest assumptions, complex BDMIs are expected to occur more frequently as the number of possible evolutionary “paths” not passing through unfit intermediate genotypes increases as a factor of the number of alleles involved in a BDMI, but this result is sensitive to assumptions regarding evolutionary divergence (Fraïsse et al., 2014). Other models with different assumptions about population parameters, such as the selection on new mutations, also make predictions favoring multigenic incompatibilities (Unckless & Orr, 2009). However, we lack theoretical predictions that address the maintenance and stability of multigenic incompatibilities involving uniparentally inherited factors.
The existence of hybrid breakdown between populations that diverged within the last 13,000 years (Sprengelmeyer et al., 2020) could indicate that incompatible variants were either already present at the time that populations split, or else they arose and increased in frequency rapidly thereafter. This early emergence of hybrid breakdown is intriguing in light of a study of isolation between multiple Drosophila species which found that postzygotic isolation such as hybrid sterility usually evolves more slowly than prezygotic forms of isolation (Turissini et al., 2018). We note that some level of prezygotic isolation, in the form of a polymorphic unidirectional mating preference, does also exist between these populations (see Introduction section). Hence, results from this and prior studies in D. melanogaster emphasize that both pre- and postzygotic isolation can begin to emerge in the earliest stages of population divergence.
Evidence for hybrid breakdown impacting male reproduction was strongest between populations from France and Zambia. An additional possible signature of increased male reproductive failure was also observed between France and Cameroon hybrids, but not explored further in the present study. The involvement of France in both of these pairings is interesting in that this population has the most distinct allele frequencies among the four studied, as a consequence of the out-of-Africa bottleneck (Lack et al., 2016b; Sprengelmeyer et al., 2020). However, no strong enrichment of hybrid male failure was observed between France and Ethiopia (both of which have adapted to colder environments; Pool et al., 2017), even though these populations have the highest FST of any population pair in our study (0.312; Lack et al., 2016b). Instead, France shows potential signals of hybrid breakdown with the two warm-adapted African populations in our study (Figure 1D). Our results thus far can not suggest to what degree incompatible variants have increased in frequency due to adaptive evolution (Schluter & Conte, 2009) vs. genetic drift (Coyne & Orr, 2004; Schiffman & Ralph, 2021). Still, it is worth noting that incomplete sweeps appear to be common in D. melanogaster (Garud & Petrov, 2016; da Silva Ribeiro et al., 2022; Vy et al., 2017) and to underlie cases of locally adaptive trait evolution (Bastide et al., 2016; Sprengelmeyer & Pool, 2021; Sprengelmeyer et al., 2022). Hence, the polymorphic nature of hybrid breakdown that we observe is consistent with adaptive as well as neutral hypotheses. If future genetic studies can identify causative loci, then patterns of genetic diversity at these loci may hint at the evolutionary forces responsible for this instance of hybrid breakdown.
The present study does not allow us to draw direct conclusions about the relevance of incomplete hybrid breakdown in the broader context of the speciation process. We find reproductive failure at a mean population rate of ~5% (Figure 2; Supplementary Table S4), with stronger rates (~9%–17%) among crosses with the most significant reproductive failure. Studies of reproductive isolation between closely related species have identified recessive BDMIs that affect second generation hybrids at a rate of about 1/16 (Zuellig & Sweigart, 2018), a comparable rate to results from our large screen and the expected Mendelian rate predicted for the occurrence of a two-locus recessive genotype in progeny from F1 hybrid intercrosses. However, our results are consistent with the presence of incompatibilities that are polymorphic within populations, involve recessive interactors, and in at least some cases involve more than two partner loci. A polymorphic, recessive, and complex incompatibility should only affect a small proportion of admixed/hybrid individuals, and should primarily manifest in matings between two hybrids, thus offering a modest barrier to gene flow between natural populations (although multiple such incompatibilities could exist). Indeed, later generation hybrids are easy to generate in the lab, and admixture between putatively African and European source populations has been observed in multiple geographic regions. Given that substantial recent introgression has continued between African and European-like populations of D. melanogaster (e.g., Pool et al., 2012), we do not suggest that these populations are currently on a trajectory toward speciation. 
Nevertheless, it is striking that hybrid breakdown can begin to emerge after only about 13,000 years (Sprengelmeyer et al., 2020), and we suggest that there is considerable value in examining the genetic properties of some of the earliest stages of reproductive isolation, whether or not they ultimately contribute to a speciation event.
Conclusions
Early stages of incomplete reproductive isolation may include hybrid breakdown, when recessive incompatibilities that are masked in the first generation of interbreeding are exposed in later generation hybrids. Here, we provide strong evidence for the existence of hybrid breakdown in male reproductive performance (but no clear evidence for female reproduction or viability) between recently diverged populations of D. melanogaster. We find that the rate of reproductive failure among males from crosses between France and Zambia populations is significantly greater than within-population crosses. The apparent cross and cross-direction variability that exists among these crosses is consistent with the presence of incompatibilities that are polymorphic and involve uniparental factors in some cases. Moreover, backcross results provide additional evidence of a complex basis of hybrid breakdown, which could be explained by the existence of multi-locus incompatibilities. Taken together, our results could indicate that incompatibilities have arisen rapidly after populations diverged, that they remain variable, and that at least some of them have a complex genetic architecture. In light of the experimental tractability of D. melanogaster, the genetic tools available, and our knowledge of this species’ genome and its diversity, the existence of hybrid breakdown between D. melanogaster populations appears to offer promising opportunities for future investigations of the molecular basis of hybrid breakdown in early-stage reproductive isolation.
Data availability
All data produced for this study is reflected in Supplementary Tables. Full code and documentation of novel analysis methods can be found at: http://www.github.com/mjlollar/significance_testing.
Author contributions
M.J.L. and J.E.P. designed the research, M.J.L., C.E.D., and T.J.B. performed the research, M.J.L. and J.E.P. analyzed the data and wrote the paper.
Funding
This research was supported by National Institutes of Health grants R35 GM13630 (to J.E.P.), T32 GM007133, and T32 HG002760. We thank members of the Pool lab for helpful comments on draft versions of this manuscript.
Conflict of interest: The authors declare that no conflicts of interest exist.