-
PDF
- Split View
-
Views
-
Cite
Cite
Uliana K Kolesnikova, Alison Dawn Scott, Jozefien D Van de Velde, Robin Burns, Nikita P Tikhomirov, Ursula Pfordt, Andrew C Clarke, Levi Yant, Alexey P Seregin, Xavier Vekemans, Stefan Laurent, Polina Yu Novikova, Transition to Self-compatibility Associated With Dominant S-allele in a Diploid Siberian Progenitor of Allotetraploid Arabidopsis kamchatica Revealed by Arabidopsis lyrata Genomes, Molecular Biology and Evolution, Volume 40, Issue 7, July 2023, msad122, https://doi.org/10.1093/molbev/msad122
- Share Icon Share
Abstract
A transition to selfing can be beneficial when mating partners are scarce, for example, due to ploidy changes or at species range edges. Here, we explain how self-compatibility evolved in diploid Siberian Arabidopsis lyrata, and how it contributed to the establishment of allotetraploid Arabidopsis kamchatica. First, we provide chromosome-level genome assemblies for two self-fertilizing diploid A. lyrata accessions, one from North America and one from Siberia, including a fully assembled S-locus for the latter. We then propose a sequence of events leading to the loss of self-incompatibility in Siberian A. lyrata, date this independent transition to ∼90 Kya, and infer evolutionary relationships between Siberian and North American A. lyrata, showing an independent transition to selfing in Siberia. Finally, we provide evidence that this selfing Siberian A. lyrata lineage contributed to the formation of the allotetraploid A. kamchatica and propose that the selfing of the latter is mediated by the loss-of-function mutation in a dominant S-allele inherited from A. lyrata.
Introduction
Most angiosperms are hermaphroditic, with bisexual flowers producing both female and male gametes, and can thus potentially self-fertilize. Diverse self-recognition systems based on pollen–pistil interactions evolved repeatedly (Charlesworth et al. 2005; Zhao et al. 2022), preventing inbreeding, and subsequently, several independent transitions from outcrossing to self-pollination have occurred through degradation of these recognition systems (Shimizu and Tsuchimatsu 2015). A transition to selfing provides an immediate advantage in the face of low population density, often at the edges of the species distribution (Levin 2012). Pinpointing the genetic changes undermining self-rejection in nature not only improves our understanding of self-incompatibility mechanisms but also provides a more complete evolutionary history of the self-compatible species, providing essential context to understand their genome evolution (Guo et al. 2009; Slotte et al. 2013; Vekemans et al. 2014; Durvasula et al. 2017; Mable et al. 2017; Fulgione et al. 2018; Mattila et al. 2020).
In Brassicaceae, the sporophytic self-incompatibility (SI) system involves a self-pollen recognition mechanism determined by the S-locus, where two main genes are linked: The male SCR gene is expressed in tapetum cells of anthers, the protein is embedded into the pollen coat and serves as a ligand for the receptor kinase coded by the female SRK gene, which is expressed on the surface of the stigma (Stein et al. 1991; Schopfer et al. 1999; Takayama et al. 2000, 2001; Takayama and Isogai 2005; Nasrallah 2019). A breakdown of SI and transition to self-compatibility occurs when recognition between SCR and SRK (or downstream signaling) leading to pollen rejection is impaired (Uyenoyama et al. 2001; Shimizu and Tsuchimatsu 2015; Mable et al. 2017). In outcrossing Arabidopsis species (e.g., Arabidopsis lyrata, Arabidopsis halleri, and Arabidopsis arenosa), more than ten different S-haplotypes can segregate in a population (Castric and Vekemans 2004; Castric et al. 2008). This haplotypic diversity is essential for an SI system to function and has been maintained by frequency-dependent balancing selection for over 8 My (Mable et al. 2003; Castric and Vekemans 2004; Mable et al. 2004; Castric et al. 2008; Llaurens et al. 2008; Le Veve et al. 2022). A diploid outcrossing individual can possess two different S-alleles but often only one of them is expressed due to dominance, thus increasing the chances of reproduction (Hatakeyama et al. 2001; Kusaba et al. 2002; Prigoda et al. 2005; Okamoto et al. 2007), although codominance has also been reported (Prigoda et al. 2005; Llaurens et al. 2008). The expression of only one S-allele increases the chances for successful mating in heterozygous outcrossers, however, which of the S-alleles will be expressed can differ in pollen and stigma (Bateman 1954). Pollen-driven dominance is more thoroughly described and is conditioned by different trans-acting microRNA precursors and their targets on recessive S-alleles. MicroRNAs produced by dominant S-alleles silence the expression of the SCR gene on recessive S-allele through methylation of a 5′ promoter of SCR (Kusaba et al. 2002; Shiba et al. 2006); (Tarutani et al. 2010; Durand et al. 2014; Fujii and Takayama 2018). As dominance is uncoupled from self-recognition in this system, a dominant loss-of-function mutation is possible and would yield a self-compatible phenotype in a heterozygous individual.
The ancestral state in the genus Arabidopsis is outcrossing due to self-incompatibility. However, self-compatible species have evolved multiple times: in the model species Arabidopsis thaliana, and allotetraploids Arabidopsis suecica and Arabidopsis kamchatica. One of the early challenges for a new polyploid is the scarcity of compatible karyotypes for mating, and competition with established nearby diploids (Levin 1975). Selfing alleviates such challenges. In A. suecica, the transition to self-compatibility was likely immediate following the cross between an A. thaliana with a nonfunctional dominant S-haplotype (Tsuchimatsu et al. 2010) and an outcrossing A. arenosa (Novikova et al. 2017). However, the origin of self-compatibility in A. kamchatica is less clear, as the species originated from multiple crosses between A. lyrata and A. halleri in East Asia (Shimizu et al. 2005; Shimizu-Inatsugi et al. 2009; Tsuchimatsu et al. 2012; Paape et al. 2018). Whereas A. halleri is an obligate outcrosser, A. lyrata is predominantly self-incompatible with described self-compatible populations restricted to the Great Lakes region of North America (Mable et al. 2005; Foxe et al. 2010; Willi and Määttänen 2010; Griffin and Willi 2014) from which subarctic and arctic selfing Arabidopsis arenicola in Canada and Greenland may have originated. (Willi et al. 2022). A selfing individual of A. lyrata collected in Yakutia has been reported as genetically closest to the A. lyrata subgenome of A. kamchatica (Shimizu-Inatsugi et al. 2009; Paape et al. 2018), but the evolutionary history of this selfing lineage and S-locus genotype has not been described.
Here, we ask 1) how and when self-compatibility evolved and spread in Siberian A. lyrata; 2) is it plausible that A. lyrata was already self-compatible when it contributed to allopolyploid A. kamchatica? and 3) could a loss of self-incompatibility in only one of the diploid ancestors (A. lyrata) be sufficient to render A. kamchatica self-compatible? Broad sampling combining live and herbarium collections allowed us to describe the selfing lineage of A. lyrata in Siberia ranging between Lake Taymyr and Chukotka, across north-central and eastern Russia. We first present chromosome-level assemblies of a Siberian selfing A. lyrata and the reference North American selfing accession (Hu et al. 2011), characterize the genomic and structural differences between them, and describe the S-locus structure and the likely mechanism of the failure of self-incompatibility in the Siberian selfing populations. Using demographic modeling, we date the transition to selfing in Siberian A. lyrata and suggest that it happened prior or concurrent with the formation of allopolyploid A. kamchatica. We confirm that the Siberian selfing A. lyrata was likely one of the progenitors of the allotetraploid A. kamchatica using overall genetic relatedness assessment and the phylogeny of the SRK gene in the S-locus. Together, our results suggest that one of the allopolyploid A. kamchatica origins and its transition to selfing was facilitated by the loss-of-function in the dominant S-allele inherited from Siberian A. lyrata.
Results
Genome Assembly of the Selfing Siberian NT1 Accession
We grew seeds of A. lyrata collected from three populations in Yakutia (supplementary table S1, Supplementary Material online and supplementary fig. S1A, Supplementary Material online) in the greenhouse (see Materials and Methods) and noticed that plants from NT1 population formed long fruits (supplementary fig. S1B, Supplementary Material online), suggesting self-compatibility. We observed that flowers of the selfing NT1 accession appeared to be smaller compared with flowers of outcrossing plants and another selfing accession MN47 from North America (supplementary fig. S1C–E, Supplementary Material online). We confirmed that self-pollen successfully germinated in a selfed NT1 accession and made pollen tubes, whereas self-pollination of an outcrossing plant from the NT8 population did not result in pollen tube growth (supplementary fig. S2, Supplementary Material online).
We extracted HMW DNA from NT1 leaf tissue and obtained 1,100,878 high-fidelity (HiFi) PacBio reads with N50 read length of 14,161 bp (total length of raw read sequences is ∼15,9 Gbp). We assembled those reads using the Hifiasm (Cheng et al. 2021) into 1,070 contigs with N50 of 5.508 Mb. We scaffolded these contigs further along the MN47 A. lyrata assembly (Hu et al. 2011) with RagTag (Alonge et al. 2019) reaching chromosome-level with a scaffold N50 of 24.641 Mb. We then assessed the completeness of the NT1 A. lyrata genome assembly using BUSCO and found 4,463 complete and single-copy (97.1%), 88 complete and duplicated (1.9%), 7 fragmented (0.2%), and 38 missing genes (0.8%) from the Brassicales_odb10 set. Repeated sequences composed about 49.9% of the assembly. We annotated 28,596 genes by transferring gene annotation from the reference A. lyrata genome (Rawat et al. 2015) using Liftoff (Shumate and Salzberg 2020).
Various papers (Long et al. 2013; Slotte et al. 2013; Henry et al. 2014; Burns et al. 2021; Dukić and Bomblies 2022) have reported potential artifacts in the reference A. lyrata MN47 (version 1 or v1) genome assembly (Hu et al. 2011). Our comparison of the Siberian NT1 with the MN47 v1 A. lyrata reference genome indicated multiple structural variants in the same genomic regions as those between the genomes of MN47 v1 and the A. arenosa subgenome of A. suecica (fig. 1A), MN47 v1 and Capsella rubella, and MN47 v1 and a diploid A. arenosa (supplementary table S2, Supplementary Material online) (Long et al. 2013; Slotte et al. 2013; Burns et al. 2021; Dukić and Bomblies 2022). We confirmed the existence of such artifacts and corrected them through long-read DNA sequencing (supplementary table S2, Supplementary Material online). Specifically, we obtained 868,563 HiFi reads of the MN47 accession with N50 length of 20,206 bp (total length of raw read sequences is ∼17,6 Gbp; ∼80× coverage). In total, we assembled ∼244 Mb in 820 contigs with an N50 of 23.506 Mb, indicating that full chromosome arms of MN47 were assembled as single contigs. Contigs were scaffolded into eight chromosomes using the genomes of MN47 v1 and NT1 as guides. The scaffolded contigs amount to ∼209 Mb. Completeness of the new MN47 v2 A. lyrata genome assembly by BUSCO was 4,544 complete and single-copy (97.1%), 83 complete and duplicated (1.8%), 8 fragmented (0.2%), and 44 missing genes (0.9%) from the Brassicales_odb10 set. The placement and orientation of contigs in the scaffolds were corrected using previously published Hi-C data (Zhu et al. 2017) and by manual examination of the long reads (see Materials and Methods, supplementary figs. S3–S7, Supplementary Material online).

Segregating large structural variants in Arabidopsis lyrata. (A) Eleven large inversions between North American MN47 v1 and both Siberian NT1 A. lyrata and the Arabidopsis arenosa subgenome of Arabidopsis suecica are not observed between NT1 and A. suecica (supplementary table S2, Supplementary Material online), suggesting these inversions are likely artifacts in the MN47 v1 assembly. (B) Segregating inversions in A. lyrata observed following reassembly by long reads and manual curation using Hi-C data of the North American MN47 genome and its alignment to the Siberian NT1 A. lyrata genome. Five inversions unique to MN47 (the longest being ∼2.4 Mb in size) are highlighted.
Our reassembled long-read–based MN47 v2 genome confirmed the existence of the expected false structural variants in the MN47 v1 genome (figs. 1 and S3–S7, Supplementary Material online), which we were able to fix in v2. The comparison of the MN47 v2 and NT1 genomes revealed several large inversions segregating in A. lyrata, the largest of which (∼2.4 Mb) is on chromosome 1. All the identified inversions between the genome comparisons are listed in supplementary table S2, Supplementary Material online. Inversions between A. lyrata MN47 and NT1 accessions are listed in supplementary table S1, Supplementary Material online. In each genome comparison, we identified multiple alleles of structural variants at the end of chromosome 3. This may be explained by the fact that one of the nucleolar organizer regions (NORs) of A. lyrata is located at the end of chromosome 3 (Lysak et al. 2006). We confirmed that chromosome 3 contains a partially assembled NOR using Basic Local Alignment Search Tool (Blast). Overall, we have assembled high-quality chromosome-level genomes for two A. lyrata accessions and through pairwise genome alignment, we identified several inversions up to 2.4 Mb long segregating in the species.
Breakdown of the SI System in Siberian A. lyrata NT1
Both genes flanking the S-locus (U-box and ARK3) were assembled in a single contig in the HiFi assembly before any scaffolding, indicating that the entire ∼44.5 kb S-locus of the NT1 accession was fully assembled. We further confirmed the completeness of the S-locus by mapping PacBio reads back to the assembly and found even coverage spanning the S-locus with no gaps (supplementary fig. S8, Supplementary Material online). Blast analysis of SRK and SCR sequences from the known S-haplotypes (supplementary Data 1 and 2, Supplementary Material online) (Boggs et al. 2009a, b; Tsuchimatsu et al. 2010; Guo et al. 2011; Goubet et al. 2012; Tsuchimatsu et al. 2012) revealed no hits for SRK, and one hit for SCR from the A. halleri S12 haplogroup (fig. 2B). Due to long-term frequency-dependent balancing selection on the S-locus in Brassicaceae, relatedness among S-haplotypes is not consistent with species relatedness, such that the closest sequences to A. halleri S12 (AhS12) are not other A. halleri S-haplotypes but rather specific S-haplotypes from A. lyrata S42 (AlS42) and A. kamchatica D (Ak-D) (Wright 1939; Vekemans and Slatkin 1994; Mable et al. 2003; Castric and Vekemans 2004; Kamau and Charlesworth 2005; Castric et al. 2008; Llaurens et al. 2008; Tsuchimatsu et al. 2012; Roux et al. 2013). We estimated a phylogeny of the known SCR protein sequences (Guo et al. 2011; Goubet et al. 2012) and the manually annotated NT1 A. lyrata SCR sequence from the Blast results (fig. 2A). As expected, the SCR phylogeny has a different topology than the species phylogeny, as S-haplotypes are trans-specifically shared across Arabidopsis. The SCR phylogeny confirms that the closest haplotype to the NT1 A. lyrata S-locus is the S12 haplotype from A. halleri (AhS12).

S-locus structure of the Siberian NT1 selfing Arabidopsis lyrata population. (A) Phylogenetic tree of SCR proteins reveals clustering of NT1 SCR (green) and AhS12. (B) Comparison of the S-locus region of the A. lyrata NT1 genome assembly with the Arabidopsis halleri S12 haplotype (Durand et al. 2014). Links between S-loci are colored according to the Blast scores from highest (blue) to lowest (gray). SCR, SRK, and flanking U-box and ARK3 genes have green, orange, and purple borders, respectively. SRK gene appears to be completely absent from the S-locus of the NT1 A. lyrata selfing accession. The only Blast hit to SRK is a spurious hit to ARK3 as they both encode receptor-like serine/threonine kinases. (C) Protein sequence alignment of S-locus SCR genes from A. halleri and A. lyrata, including NT1. One of the eight conserved cysteines important for structural integrity has been lost from the NT1 SCR protein.
We compared the structures of the AhS12 and NT1 S-loci (fig. 2B) and confirmed the absence of SRK (i.e., the female component of the self-incompatibility system), which is sufficient to explain the selfing nature of the NT1 accession. We also mapped short reads from NT1 to the NT1 genome assembly plus the intact AhS12 sequence from A. halleri containing SRK, and found no reads mapped to SRK (supplementary fig. S9C, Supplementary Material online). This provides additional confirmation of a complete loss of SRK from the NT1 S-locus. Analyzing the SCR protein sequences more closely, we also observed a loss of one of the eight conserved cysteines in the NT1 SCR sequence, which are important in protein-folding and the recognition of the SCR ligand by the SRK receptor (Kusaba et al. 2001; Mishima et al. 2003; Tsuchimatsu et al. 2010) (supplementary fig. S10A, Supplementary Material online). This suggests that the SCR protein is nonfunctional in the NT1 A. lyrata accession. We tested for expression of the SCR gene in the flowers of NT1 using RNAseq and did not detect any transcript of the AhS12 SCR (supplementary fig. S9A and B, Supplementary material online), though this may be due to the timing of floral development as expression of SCR is transient (Burghgraeve et al. 2020). Sequence comparison of the SCR region between AhS12 and NT1 showed high similarity in the promoter region (supplementary fig. S9D, Supplementary Material online) indicating that structural rearrangements did not cause loss of expression—but nucleotide substitutions at critical sites cannot be excluded. To verify whether SCR is indeed nonfunctional and/or not expressed in NT1, we performed controlled crosses, fertilizing an outcrossing A. lyrata accession (NT8.4-24, which has a functional AhS12 haplogroup) with NT1 pollen, resulting in successful pollen tube growth (supplementary fig. S11, Supplementary Material online). This outcome is possible if 1) the SCR protein from the NT1 accession could not be recognized by SRK receptors from the same AhS12 haplogroup or 2) the SCR gene was not expressed at all. Both of these scenarios lead to the conclusion that the SCR gene is nonfunctional in the NT1 selfing Siberian A. lyrata accession. There is, however unlikely, an additional possibility: 3) A self-compatible reaction could be possible with a functional SCR in NT1 if the SRK gene from haplogroup AhS12 was not expressed in the outcrossing maternal plant (NT8.4-24). We describe scenario 3 as improbable because outcrossing maternal plant NT8.4-24 is heterozygous at the S-locus, possessing two S-alleles: AhS12 and AlS25. The latter is known to be either codominant or recessive to AhS12 as it belongs to a lower dominance class (Llaurens et al. 2008; Durand et al. 2014), therefore, AhSRK12 is most likely expressed in NT8.4-24.
According to the classification of S-haplotypes, AhS12 belongs to dominance class IV (the most dominant class), and it is documented that it has an sRNA precursor, which can silence the expression of SCR genes from S-haplotypes belonging to classes I, II, and III (Durand et al. 2014; Burghgraeve et al. 2020). Indeed, by Blast analysis, we identified an sRNA precursor sequence in the NT1 S-locus assembly similar to the mirS3 precursor of A. halleri S12 haplotype (Durand et al. 2014), suggesting a conserved dominance mechanism of A. lyrata S12.
Population-level Re-sequencing Confirms that Selfing Siberian A. lyrata Contributed to A. kamchatica Origin
Sampling
We sequenced additional nine A. lyrata accessions collected during the same expedition (supplementary table S1, Supplementary Material online), ten herbarium samples of A. lyrata from Taymyr, Yakutia, Kamchatka, and Chukotka dating from 1958 to 2014, and 19 herbarium samples of A. kamchatica using the same Illumina NovaSeq platform (150 bp PE) (see Materials and Methods and supplementary table S1, Supplementary Material online). The herbarium samples were obtained from the Moscow University Herbarium (Seregin 2023). Our data set also included previously published whole genome resequencing data from the diploid A. lyrata collected in the same region, allotetraploid A. kamchatica samples (Shimizu-Inatsugi et al. 2009; Novikova et al. 2016; Paape et al. 2018) (supplementary table S1, Supplementary Material online and fig. 3A) and European A. lyrata samples (Takou et al. 2021) as outgroups (supplementary table S1, Supplementary Material online).

(A) Map of short-read sequenced Siberian Arabidopsis lyrata (circles) and Arabidopsis kamchatica (triangles). Live A. lyrata accessions names start with NT, herbarium sample names start with MW, a previously published sample of A. lyrata has been assigned the SRR and DRR prefix, and A. kamchatica samples start with SAMD. Colors indicate heterozygosity per sample, calculated by the percent of heterozygous sites. (B) Network depiction of Nei's D between individuals shows that selfing A. lyrata is genetically closer to A. kamchatica than the outcrossing populations. Whereas the network is drawn as unrooted, an outgroup accession provides context for interpretation. Individual genetic distances are also shown as heatmap in supplementary figure S14, Supplementary Material online. (C) Neighbor-joining tree of Siberian A. lyrata accessions with heterozygosity and genotyped SCR and SRK alleles. (D) Best-fit demographic model of divergence, a bottleneck in selfers, and asymmetric migration between selfing and outcrossing lineages, with parameter estimate for divergence time. TDIV, time of divergence between selfing and outcrossing lineage (origin of selfing); NeANC, effective population size of ancestor lineage; NePOP1, effective population size of selfing lineage; NePOP2, effective population size of outcrossing lineage; TBOT, time of bottleneck in selfing lineage; Nm12 and Nm21, the number of migrants between selfing and outcrossing lineages. Further values are reported in table 1. Point estimates and confidence intervals are reported in table 1; point estimates for all tested models are reported in supplementary table S4, Supplementary Material online.
Defining Selfing A. lyrata by Heterozygosity
To determine whether multiple selfing populations might exist in the examined geographic region, we first calculated the percent of heterozygous sites for each individual (supplementary table S1, Supplementary Material online and fig. 3A) mapped to NT1 reference. Two modes on the heterozygosity levels were apparent in our A. lyrata data set (supplementary fig. S12, Supplementary Material online), which we assign as selfing (0.012% on average with 0.046% maximum value in supplementary table S1, Supplementary Material online, indicated with yellow markers on fig. 3) and outcrossing (0.27% on average with 0.288% maximum value in supplementary table S1, Supplementary Material online within A. lyrata samples, indicated by green markers on fig. 3). This heterozygosity-based assignment is supported by our observations of individuals growing in the greenhouse: NT1 populations produced seeds without crosses, whereas NT8 and NT12 populations did not. Allotetraploid A. kamchatica co-occurring in the same geographical region is also self-compatible. To ensure that none of our A. lyrata samples were misclassified, we first mapped allotetraploid A. kamchatica samples in the same way to the NT1 A. lyrata reference without separating subgenomes. The majority of the single nucleotide polymorphisms (SNPs) in A. kamchatica represent divergent sites between the two subgenomes, which explains its high heterozygosity levels, clearly distinct from selfing A. lyrata samples (supplementary figs. S12, Supplementary Material online and 3).
Genotyping S-alleles in Outcrossers
We genotyped S-alleles of all the short-read sequenced accessions in our data set by using a genotyping pipeline for de novo discovery of divergent alleles (Genete et al. 2020) with both SCR and SRK sequences as the reference allele databases (Schierup et al. 2001; Mable et al. 2003; Bechsgaard et al. 2004; Castric and Vekemans 2007; Castric et al. 2008; Boggs et al. 2009a, b; Guo et al. 2009; Castric et al. 2010; Guo et al. 2011; Goubet et al. 2012; Dwyer et al. 2013; Durand et al. 2014; Mable et al. 2017; Tsuchimatsu et al. 2017; Mable et al. 2018; Takou et al. 2020; Kodera et al. 2021) (supplementary Data 1 and 2, Supplementary Material online and supplementary table S1, Supplementary Material online). For each outcrossing individual, we find two different SRK alleles and at most one SCR allele (fig. 3C). Identifying SCR alleles is more difficult than SRK, likely due to an incomplete SCR database rather than these genes being absent in outcrossing individuals.
Selfing Siberian A. lyrata is Fixed for AhS12
All the self-compatible (low-heterozygosity) A. lyrata samples shared the same S-haplogroup—AhS12 (fig. 3C): either by SCR or SRK genotype. Most of the self-compatible accessions, with exception of MW0079456, did not have SRK genotype. As the SRK database is robust and we confirmed the absence of SRK from the full-length NT1 assembly, the lack of SRK genotypes in the self-compatible Siberian A. lyrata accessions is likely due to gene loss.
S-allele of the Self-compatible Siberian A. lyrata Matches the Most Common A. lyrata-inherited S-allele in A. kamchatica
In the pool of A. kamchatica samples, we identified five SRK alleles (supplementary table S1, Supplementary Material online) using the same genotyping pipeline (Genete et al. 2020), consistent with Tsuchimatsu et al. 2012, where AhS12 (AkS-D) and AhS02 (AkS-E) are shown to be A. lyrata-inherited, whereas AhS26 (AkS-A), AhS47 (AkS-B), and AhS1 (AkS-C) are A. halleri-inherited (Tsuchimatsu et al. 2012). A. lyrata-inherited AhS12 (AkS-D) is the most common SRK allele (43.67%) on the A. lyrata subgenome of A. kamchatica and matches the S-haplotype of the Siberian self-compatible A. lyrata lineage. The full SRK gene sequence from Siberian self-compatible A. lyrata accession MW0079456 forms a monophyletic group with A. kamchatica SRK sequences from the S12-haplogroup whereas outcrossing A. lyrata SRK sequences from the S12-haplogroup are in a distinct clade separate from A. kamchatica (supplementary fig. S13A, Supplementary Material online and supplementary Data 3, Supplementary Material online).
Self-compatible Siberian A. lyrata Lineage Is Genetically Closest to A. kamchatica
To estimate the relatedness between A. lyrata and A. kamchatica, we analyzed only the A. lyrata subgenome of A. kamchatica. We split the subgenomes of A. kamchatica by mapping accessions simultaneously to NT1 A. lyrata and A. halleri ssp. gemmifera (Briskine et al. 2016) reference genomes, and used only the A. lyrata portion for further analysis. In addition to the SRK phylogeny, network analysis based on genetic distance (Nei's D) (Nei 1972) between individuals for 4,141 biallelic SNPs at four-fold degenerate sites suggests an overall closer genome relatedness between Siberian selfing A. lyrata and A. lyrata subgenome of A. kamchatica, compared with Siberian outcrossing A. lyrata and the A. lyrata subgenome of A. kamchatica (fig. 3B). These individual pairwise genetic distances are further represented as a heatmap (supplementary fig. S14, Supplementary Material online). The same relationships are also shown in the maximum likelihood (ML) phylogeny based on the same SNP data, where selfing Siberian A. lyrata populations form a clade with A. lyrata subgenome of A. kamchatica (supplementary fig. S13B, Supplementary Material online), whereas outcrossing Siberian A. lyrata is more distantly related. This is consistent with the previously published results showing that a selfing A. lyrata accession from Siberia (lyrpet4—DRR124344) is genetically closest to A. kamchatica (Shimizu-Inatsugi et al. 2009; Paape et al. 2018).
Demographic Modeling Suggests that Self-compatible Siberian A. lyrata Lineage Originated Around 90 Kya
The observation that all the Siberian selfing A. lyrata accessions share the same S-haplotype suggests that they may have originated from a single breakdown of self-incompatibility. The calculated total nucleotide diversity in 10 kb windows for selfing A. lyrata has a mean value of 0.11% (95% confidence interval [CI] [0.105–0.118]), which is about 7.5 times lower compared with 0.84% (95% CI [0.818–0.87]) in the outcrossing Siberian A. lyrata population. Though the selfing lineage in Siberia likely originated from a single founder, the joint allele frequency spectrum between selfing and outcrossing Siberian A. lyrata shows a considerable amount of shared polymorphism (genome-wide nongenic region and excluding pericentromeric and centromeric regions—54,772 SNPs shared versus 128,393 SNPs private to the selfer lineage; supplementary fig. S14, Supplementary Material online). This may be because the founder was a heterozygous outcrosser, and a certain amount of gene flow does occur between lineages, as self-compatibility does not prevent plants from mating with outcrossers.
To further investigate the relationships between selfing and outcrossing populations and to date the self-incompatibility breakdown, we implemented a series of demographic models in fastsimcoal26 (Excoffier et al. 2013). The best-fit model is shown (fig. 3D,table 1), which includes divergence between selfers and outcrossers with a subsequent bottleneck in the selfing lineage, with asymmetric introgression between populations. The estimate of divergence time (TDIV) in this model is ∼90 ka (87,756), though we suggest caution when interpreting such estimates. All tested models can be viewed in supplementary figure S15, Supplementary Material online, with corresponding parameters in supplementary table S4, Supplementary Material online and input files on GitHub (https://github.com/novikovalab/selfing_Alyrata).
S-allele Dominance is Retained in the Self-compatible Siberian A. lyrata Lineage
Above we show that the self-compatible Siberian A. lyrata lineage is fixed for AhS12, which belongs to a dominant class of S-alleles in Arabidopsis (Durand et al. 2014). To test whether dominance is retained in A. lyrata NT1 despite the loss of the self-recognition function in the AhS12 S-allele, we conducted two crosses with self-incompatible A. lyrata plants (TE10.3-2 and TE11) as maternal plants and NT1 as pollen donor. The resulting F1 plants had different combinations of S-alleles and were self-compatible in two combinations, where the maternally inherited S-allele from a self-incompatible plant was from a lower dominance class than AhS12 (table 2 and figs. 4A and Band S16, Supplementary Material online). F1 plants with AhS1/AhS12 and AhS63/AhS12 combinations of S-alleles are self-compatible, whereas F1 plants with AhSRK54/AhS12 combination are self-incompatible. AhS1 is recessive to AhS12 in A. halleri (Durand et al. 2014), and AhS63 belongs to class III of dominance (corresponding to AlS41 in Mable et al. 2018), which is expected to be recessive to the class IV AhS12 allele in A. lyrata (Prigoda et al. 2005).

(A) Self-pollinated F1 progeny (F1.1-1) resulting from a cross between a self-incompatible (shown in B) ♀ TE10.3-2 Arabidopsis lyrata accession and ♂ NT1 self-compatible A. lyrata accession shows pollen tube growth (yellow arrow) and dominance of self-compatibility in the F1 generation. (B) Self-pollinated self-incompatible A. lyrata accession TE10.3-2 (used as the maternal plant in A shows no pollen tube growth, demonstrating its self-incompatibility. (C) The geographical distribution of Arabidopsis kamchatica S-haplotypes shows a strong population structure across the species range. Circles are individual accessions, with S-haplogroups indicated by colors of pie slices. Arabidopsis halleri orthologous S-haplogroups are mentioned in the parenthesis next to the A. kamchatica S-haplogroups (AkS-A-E). Circle outline indicates either previously published data (grey) or newly reported accessions (black). A. kamchatica occurrences from the Global Biodiversity Information Facility (GBIF) are indicated by transparent grey dots.
Point Estimates and 95% Confidence Intervals for the Best-fit Demographic Model.
. | Parameter estimates . | ||||||
---|---|---|---|---|---|---|---|
. | Ne POP1 . | Ne POP2 . | Ne ANC . | TDIV . | Nm12 . | Nm21 . | Ne BOT . |
Best point estimate | 526,490 | 33,400 | 129,243 | 87,756 | 2.69E−01 | 3.79E+00 | 14 |
CI 2.5% | 5,602 | 20,224 | 1,418 | 36,002 | 7.61E−05 | 8.35E−06 | 5 |
CI 97.5% | 552,758 | 56,554 | 275,637 | 89,150 | 1.37E+00 | 1.11E+01 | 48 |
. | Parameter estimates . | ||||||
---|---|---|---|---|---|---|---|
. | Ne POP1 . | Ne POP2 . | Ne ANC . | TDIV . | Nm12 . | Nm21 . | Ne BOT . |
Best point estimate | 526,490 | 33,400 | 129,243 | 87,756 | 2.69E−01 | 3.79E+00 | 14 |
CI 2.5% | 5,602 | 20,224 | 1,418 | 36,002 | 7.61E−05 | 8.35E−06 | 5 |
CI 97.5% | 552,758 | 56,554 | 275,637 | 89,150 | 1.37E+00 | 1.11E+01 | 48 |
Parameter Estimates are Reported in Number of Individuals and Years.
Point Estimates and 95% Confidence Intervals for the Best-fit Demographic Model.
. | Parameter estimates . | ||||||
---|---|---|---|---|---|---|---|
. | Ne POP1 . | Ne POP2 . | Ne ANC . | TDIV . | Nm12 . | Nm21 . | Ne BOT . |
Best point estimate | 526,490 | 33,400 | 129,243 | 87,756 | 2.69E−01 | 3.79E+00 | 14 |
CI 2.5% | 5,602 | 20,224 | 1,418 | 36,002 | 7.61E−05 | 8.35E−06 | 5 |
CI 97.5% | 552,758 | 56,554 | 275,637 | 89,150 | 1.37E+00 | 1.11E+01 | 48 |
. | Parameter estimates . | ||||||
---|---|---|---|---|---|---|---|
. | Ne POP1 . | Ne POP2 . | Ne ANC . | TDIV . | Nm12 . | Nm21 . | Ne BOT . |
Best point estimate | 526,490 | 33,400 | 129,243 | 87,756 | 2.69E−01 | 3.79E+00 | 14 |
CI 2.5% | 5,602 | 20,224 | 1,418 | 36,002 | 7.61E−05 | 8.35E−06 | 5 |
CI 97.5% | 552,758 | 56,554 | 275,637 | 89,150 | 1.37E+00 | 1.11E+01 | 48 |
Parameter Estimates are Reported in Number of Individuals and Years.
The Genotypes and Phenotypes of Outcrossing Mother Plants (TE10.3-2 and TE11.1-2) and F1 Progeny From Their Pollination by NT1 Self-compatible A. lyrata Accession With AhS12 S-allele (SCR Present and SRK Lost—fig. 2B). Mating types are abbreviated with SC for self-compatibility and SI for self-incompatibility.
Accession . | SRK genotype . | SCR genotype . | Mating type . |
---|---|---|---|
TE10.3-2 | AhSRK01, AhSRK63 | AhSCR01, – | SI |
TE11.1-2 | AhSRK01, AhSRK54 | AhSCR01, – | SI |
NT1 | – | AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-1 | AhSRK01 | AhSCR01, AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-2 | AhSRK63 | AhSCR12, – | SC |
TE11.1-2♀ × NT1♂ F1.2-1 | AhSRK54 | AhSCR12, – | SI |
TE11.1-2♀ × NT1♂ F1.2-2 | AhSRK54 | AhSCR12, – | SI |
Accession . | SRK genotype . | SCR genotype . | Mating type . |
---|---|---|---|
TE10.3-2 | AhSRK01, AhSRK63 | AhSCR01, – | SI |
TE11.1-2 | AhSRK01, AhSRK54 | AhSCR01, – | SI |
NT1 | – | AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-1 | AhSRK01 | AhSCR01, AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-2 | AhSRK63 | AhSCR12, – | SC |
TE11.1-2♀ × NT1♂ F1.2-1 | AhSRK54 | AhSCR12, – | SI |
TE11.1-2♀ × NT1♂ F1.2-2 | AhSRK54 | AhSCR12, – | SI |
Some of the SCR genotypes have missing data “–” due to the incomplete SCR database. Note that in the F1s, AhSRK12 is missing due to gene loss in NT1 (fig. 2).
The Genotypes and Phenotypes of Outcrossing Mother Plants (TE10.3-2 and TE11.1-2) and F1 Progeny From Their Pollination by NT1 Self-compatible A. lyrata Accession With AhS12 S-allele (SCR Present and SRK Lost—fig. 2B). Mating types are abbreviated with SC for self-compatibility and SI for self-incompatibility.
Accession . | SRK genotype . | SCR genotype . | Mating type . |
---|---|---|---|
TE10.3-2 | AhSRK01, AhSRK63 | AhSCR01, – | SI |
TE11.1-2 | AhSRK01, AhSRK54 | AhSCR01, – | SI |
NT1 | – | AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-1 | AhSRK01 | AhSCR01, AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-2 | AhSRK63 | AhSCR12, – | SC |
TE11.1-2♀ × NT1♂ F1.2-1 | AhSRK54 | AhSCR12, – | SI |
TE11.1-2♀ × NT1♂ F1.2-2 | AhSRK54 | AhSCR12, – | SI |
Accession . | SRK genotype . | SCR genotype . | Mating type . |
---|---|---|---|
TE10.3-2 | AhSRK01, AhSRK63 | AhSCR01, – | SI |
TE11.1-2 | AhSRK01, AhSRK54 | AhSCR01, – | SI |
NT1 | – | AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-1 | AhSRK01 | AhSCR01, AhSCR12 | SC |
TE10.3-2♀ × NT1♂ F1.1-2 | AhSRK63 | AhSCR12, – | SC |
TE11.1-2♀ × NT1♂ F1.2-1 | AhSRK54 | AhSCR12, – | SI |
TE11.1-2♀ × NT1♂ F1.2-2 | AhSRK54 | AhSCR12, – | SI |
Some of the SCR genotypes have missing data “–” due to the incomplete SCR database. Note that in the F1s, AhSRK12 is missing due to gene loss in NT1 (fig. 2).
Ancestral Dominant S-allele AhS12 with Lost Self-recognition Function Could Promote A. kamchatica Establishment
Multiple crosses between different A. lyrata and A. halleri have contributed to allopolyploid A. kamchatica (Shimizu et al. 2005; Shimizu-Inatsugi et al. 2009; Tsuchimatsu et al. 2012; Paape et al. 2018). This is also apparent in the strong population structure of the S-allele combinations inherited from different parental lineages (fig. 4C). The most common S-allele in A. kamchatica on the A. lyrata subgenome is AhS12 (AkS-D), which is also fixed in the self-compatible Siberian A. lyrata lineage.
Moreover, F1 crosses (fig. 4A and B) show that the pollen-dominance mechanism is retained in self-compatible Siberian A. lyrata. The same combination of S-alleles AhS1 (AkS-C)/AhS12 (AkS-D) in the F1 self-compatible accession (F1.1-1 plant in table 2; fig. 4A) exists in A. kamchatica and is common in the eastern Siberian mountains bordering Okhotsk sea in Aldan–Amur interfluve (fig. 4C, yellow/blue pie charts). We, therefore, hypothesize that A. kamchatica with AhS1 (AkS-C)/AhS12 (AkS-D) combination of S-alleles was self-compatible in the first generation due to dominance of the AhS12 S-allele inherited from self-compatible Siberian A. lyrata over AhS1 inherited from A. halleri.
Discussion
Full A. lyrata Genomes
Selfing accessions can be considered natural inbred lines, which are especially useful in genomics, as the assembly of their genomes is not complicated by long heterozygous stretches. So far, only one selfing accession (MN47) of A. lyrata from North America has been fully assembled and serves as a reference for this species (Hu et al. 2011). An additional draft assembly of A. lyrata subsp. petraea has also been released (Paape et al. 2018), though its utility is hindered due to gaps in the assembly (12.75% missing) and lack of contiguity (scaffold N50 of 1.2 Mb). Furthermore, whereas a single reference genome provides a useful resource for short-read re-sequencing-based population genetic studies (Novikova et al. 2016; The 1001 Genomes Consortium 2016), reference bias is an increasingly recognized problem. Using long and proximity-ligation reads we assembled high-quality genomes of the Siberian selfing A. lyrata accession NT1 and reassembled North American A. lyrata MN47 accession. We found five inversions ranging from 0.3 to 2.4 Mb in length in between these independently evolved selfing accessions (fig. 1 and supplementary table S3, Supplementary Material online). Large genomic structural rearrangements, especially inversions, can prevent chromosomal pairing and drive reproductive isolation and speciation (Rieseberg 2001; Stevison et al. 2011; McGaugh and Noor 2012; Ayala et al. 2013; Jeffares et al. 2017). In these circumstances, selfing probably increases tolerance to such rearrangements and can even promote their fixation. For example, karyotypic changes from 8 to 5 chromosomes in A. thaliana are linked to a transition to self-compatibility at about 500 Kya (Durvasula et al. 2017). A. lyrata transitions to selfing are more recent but are consistent with this observation. Interestingly, the inversions found within A. thaliana (Jiao and Schneeberger 2020; Goel and Schneeberger 2022) and within A. lyrata (this study) are comparable in size: up to 2.5 Mb and 2.4 Mb, respectively. However, to corroborate that selfing genomes are more tolerant to large structural rearrangements, one must compare the results to outcrossing genomes, which are not yet available at comparable quality, as heterozygosity renders them harder to assemble.
Self-compatibility Evolved at Least Twice in A. lyrata
We described the evolutionary history and distribution of selfing A. lyrata populations in Siberia, which have an independent origin from North American selfing A. lyrata. Siberian selfing populations possess only a single S-haplotype, AhS12 (AlS42), whereas several different haplogroups (AhS1 [AlS1], AhS31 [AlS19], and AhS29 [AlS13]) are found in the North American selfing populations of A. lyrata (Hu et al. 2011; Mable et al. 2017). The differences in S-haplotype composition of selfing lineages in Siberia and North America support their independent origin, consistent with the phylogenetic relationships among accessions from these two regions (supplementary fig. S13B, Supplementary Material online). Our phylogenetic inference yields a well-supported clade of North American A. lyrata, comprised of both self-compatible and self-incompatible accessions, showing that the closest relatives to self-compatible North American A. lyrata are outcrossing North American A. lyrata, instead of self-compatible Siberian A. lyrata.
A transition to selfing is often associated with changes in flower morphology (Sicard and Lenhard 2011; Tsuchimatsu and Fujii 2022), which we observed in Siberian but not in North American selfing accessions (supplementary fig. S1C–E, Supplementary Material online). The lack of so-called “selfing syndrome” in the latter was described previously (Carleial et al. 2017). Similarly, in the outcrossing species Leavenworthia alabamica, two independent selfing lineages have been described, with the older (∼150 Kya) showing an obvious selfing syndrome whereas the younger selfing lineage (∼48 Kya) did not (Busch et al. 2011). Although further investigation is required to quantify this difference in flower size, such observations in A. lyrata may also be explained by differences in transition to selfing: The North American A. lyrata likely transitioned to selfing during or after colonization of the area, around ∼10 Kya (Carleial et al. 2017), which is much more recent than our estimates of the Siberian selfer originating ∼90 Kya.
Transition to Self-compatibility in Siberian A. lyrata Is Associated with S-locus
All selfing Siberian accessions spanning the massive geographical area between Lake Taymyr and Chukotka share the same S-haplotype (AhS12), suggesting the breakdown of self-incompatibility in Siberia is linked to the S-locus. This also suggests a single breakdown of self-incompatibility in the Siberian selfing lineage, as it is unlikely that this transition to self-compatibility occurred independently in multiple individuals with the same AhS12 allele. More than one origin of self-compatibility in the studied Siberian populations is improbable for two reasons: First, the S-locus is highly diverse, as tens of divergent S-alleles typically segregate within outcrossing populations to facilitate reproductive success (Schierup et al. 2001; Castric and Vekemans 2004; Castric and Vekemans 2007), so one would expect more diversity of S-alleles if there were multiple origins; and second, because dominant alleles, including AhS12, are more rare compared with recessive ones (Schierup et al. 1997; Billiard et al. 2007; Genete et al. 2020). The probability of independent loss-of-function on two of the same rare alleles is low (multiplied probabilities of drawing the same rare allele by chance).
However, the breakdown of self-incompatibility in North American A. lyrata is not associated with a specific S-allele or S-locus mutation, but rather with another genetic factor, likely in a downstream cascade of reactions preventing pollen-tube growth (Mable et al. 2005; Foxe et al. 2010; Mable et al. 2017). Therefore, despite strong evidence supporting an S-locus–driven loss of self-incompatibility in Siberian A. lyrata, it is possible that a mutation in a downstream cascade caused the initial mating system switch (Goring et al. 2014; Jany et al. 2019), followed by fixation of a single S-allele due to drift, and further degeneration of the S-allele sequence, reinforcing self-compatibility. Another scenario could involve a modifier mutation specific to AhS12 S-allele, which arose prior to loss-of-function in the S-locus. The existence of allele-specific modifiers has been proposed based on observed segregation patterns in offspring (Nasrallah et al. 2004; Sherman-Broyles et al. 2007; Mable et al. 2017; Li et al. 2019) and could also explain loss of self-incompatibility in Siberian A. lyrata lineage fixed for the AhS12 S-allele. Whereas these alternative explanations are plausible, based on the strong association between a specific S-haplotype (AhS12) and self-compatibility, we conclude that inactivation of AhS12 is the most likely scenario.
Self-compatibility in Siberian A. lyrata Is Likely Male-driven
Our long-read-based genome assembly of A. lyrata NT1 contains a fully-assembled S-locus (fig. 2), in which we manually annotated SCR by Blast analysis of all known SCR sequences in Arabidopsis. The SRK gene was absent from our assembly. Mapping of the short reads from the A. lyrata NT1 accession to A. halleri AhS12 sequence of the same haplotype also did not yield any coverage of the SRK gene, so we conclude that SRK was lost from the NT1 genome. However, this does not mean that the loss of SRK is the causal mutation leading to selfing, as the SCR protein of NT1 A. lyrata also appears to be nonfunctional: 1) it lacks one of the eight cysteine residues (fig. 2C) that were shown to be functionally important (Kusaba et al. 2001; Mishima et al. 2003; Tsuchimatsu et al. 2010) (fig. 2C), and 2) its expression was not detected in flowers (supplementary fig. S9, Supplementary Material online). Genotyping of the S-locus in other selfing A. lyrata accessions reveals that all of them share the same S-haplotype AhS12 (fig. 3C and supplementary table S1, Supplementary Material online), which suggests their shared origin. Moreover, one of the selfing A. lyrata accessions has SRK, but seems to lack SCR (accession number MW0079456, fig. 3). Different reciprocal gene loss mutations of SCR or SRK across accessions (fig. 3B) exclude the possibility of gene loss being a causal mutation and rather suggest that gene loss happened after a common causal mutation.
In controlled crossing experiments (Tsuchimatsu et al. 2012), haplogroup-D SRK in the A. lyrata subgenome of A. kamchatica (AkSRK-D, orthologous to AhSRK12) was shown to be functional. This suggests that SRK in the ancestors of both A. kamchatica and selfing A. lyrata was also functional. We discuss the role of selfing A. lyrata in the origin of A. kamchatica in the next section. If the breakdown of self-incompatibility is indeed S-locus driven (and not caused by an unlinked S-allele specific modifier), it most likely occurred on SCR rather than SRK in this lineage. Our results show that indeed, SCR from NT1 is not recognized by a functional SRK of the same haplogroup (from accession NT8.4-24; supplementary fig. S11, Supplementary Material online). Whether the initial loss-of-function in the SCR protein was due to a loss of a structurally important cysteine residue (fig. 2C) or a loss of expression (supplementary fig. S9A and B, Supplementary Material online) is unclear. Transitioning to selfing through degradation of male specificity gene would be consistent with the recurrent pattern in the evolution of self-compatibility (reviewed in Shimizu and Tsuchimatsu (2015)). According to Bateman's principle, an S-haplotype with nonfunctional SCR and functional SRK will produce pollen of higher fitness, as it will be compatible with all other S-haplotypes including itself. In contrast, an S-haplotype with a functional SCR and a nonfunctional SRK will produce pollen that will be self-compatible but incompatible with the fraction of the population carrying the same, albeit fully functional, S-haplotype. Pistils with a nonfunctional SRK do not have a higher fitness unless pollen availability is very limited, making fixation of the male-driven selfing more likely (Bateman 1954; Tsuchimatsu and Shimizu 2013). The most likely scenario suggested by our results, where self-compatibility in Siberian A. lyrata is SCR-driven, is therefore consistent with Bateman's principle.
Self-compatible Siberian A. lyrata Is Ancestral to A. kamchatica
A previous study showed a Siberian A. lyrata accession (lyrpet4) to be genetically closest to A. kamchatica, however this was limited to sampling in a single locality and did not include assessment of S-alleles (Shimizu-Inatsugi et al. 2009; Paape et al. 2018). In addition to the previously reported selfing individual, our field and herbarium collections yielded seven more self-compatible accessions, spanning a wide geographical range across Siberia (supplementary table S1, Supplementary Material online). We explored the relationships among all Siberian A. lyrata accessions with A. kamchatica using network analysis and hierarchical clustering. Genetic network of Nei's D (fig. 3B) shows that A. kamchatica clusters closely to self-compatible Siberian A. lyrata, which is consistent with the sister relationship between A. kamchatica and self-compatible A. lyrata in a well-supported ML phylogeny (supplementary fig. S13B, Supplementary Material online). Moreover, we identified a fixed S-allele (AhS12) associated with self-compatibility in Siberian A. lyrata.
Allopolyploid A. kamchatica has three S-alleles inherited from A. halleri—AhS26 (AkS-A), AhS47 (AkS-B), and AhS1 (AkS-C) and two S-alleles inherited from A. lyrata—AhS12 (AkS-D) and AhS02 (AkS-E) (Tsuchimatsu et al. 2012). The AhS12 S-allele is the most frequent in the A. lyrata subgenome of A. kamchatica and was inherited from a self-compatible Siberian A. lyrata lineage. A tree of A. lyrata and A. kamchatica accessions, which share the AhS12 haplotype (based on exon 1 of the SRK gene; supplementary fig. S13A, Supplementary Material online), shows that a self-compatible A. lyrata accession is nested within a clade of A. kamchatica accessions, providing further support for their shared origin.
Furthermore, our demographic modeling suggests the Siberian selfing lineage originated approximately 90 Kya. This is in line with estimates by Paape et al. (2018), who dated the divergence times of both A. kamchatica subgenomes. Their estimates for divergence time of the A. halleri subgenome range from ∼60 to 100 Kya and the A. lyrata subgenome between ∼70 Kya and 140 Kya. The authors recommend caution when interpreting these parameters, and we agree that: Mutation rates used in both studies are from A. thaliana rather than A. lyrata, and sample sizes are small in both cases. Still, given the overlap in divergence estimates from both our study and work by Paape et al. (2018), it is plausible that at least one of the multiple polyploid origins of A. kamchatica included this selfing Siberian A. lyrata lineage as a parental genome donor.
Combinations of A. kamchatica S-alleles show a strong population structure (fig. 4C) consistent with multiple origins of A. kamchatica in different geographical regions (Shimizu et al. 2005; Shimizu-Inatsugi et al. 2009; Tsuchimatsu et al. 2012; Paape et al. 2018). However, the current sampling of A. kamchatica is biased towards Japan and the Kamchatka Peninsula, and this uneven coverage of the species range means that observed frequencies of S-allele combinations may not represent their true distribution. That said, a combination of dominant nonfunctional AhS12 (A. lyrata-derived) and recessive AhS1 (A. halleri-derived) S-alleles is common in A. kamchatica in the eastern Siberian mountains bordering Okhotsk sea in Aldan–Amur interfluve (fig. 4C).
Interestingly, whereas both progenitors of A. kamchatica coexist in Europe, and interspecific crosses can be created ex situ (Sarret et al. 2009), A. lyrata and A. halleri do not form other allotetraploids (Clauss and Koch 2006; Schmickl et al. 2010). The variation (or lack thereof) of mating systems in A. lyrata and A. halleri can explain why allopolyploid establishment is limited to Asia: A. halleri is self-incompatible throughout its range (no known selfing accessions have been described to date), and selfing A. lyrata is found only in Siberia and North America. Previous work showed that self-compatibility in A. kamchatica was likely male (SCR)-driven in the more dominant S-haplotype inherited from A. lyrata (Ah12/Al42/Ak-D) (Tsuchimatsu et al. 2012). We argue that self-compatibility is ancestral to A. kamchatica, and inherited from Siberian A. lyrata. We also show that dominance between nonfunctional AhS12 and functional AhS01 is retained in self-compatible A. lyrata (fig. 4A and B) and therefore argue that the transition to selfing in A. kamchatica with this combination of S-alleles was likely immediate upon allopolyploid formation. Our results show that Siberian selfing diploid A. lyrata is ancestral to allotetraploid A. kamchatica, and contributed the most widely observed A. lyrata-derived S-allele (AhS12) in A. kamchatica. Furthermore, the nonfunctional AhS12 S-allele is still dominant over the recessive AhS01 S-allele in A. lyrata. This dominance of the nonfunctional S-allele likely explains the transition to self-compatibility in A. kamchatica with the same combination of S-alleles (AhS12/AkS-D and AhS01/AkS-C), rather than self-compatibility evolving de novo in A. kamchatica.
Similar examples where a loss-of-function mutation on a dominant S-haplotype in one progenitor facilitated transition to selfing in allotetraploids have been recently reviewed (Novikova et al. 2022) and include A. suecica (Novikova et al. 2017), Capsella bursa-pastoris (Bachmann et al. 2019, 2021; Duan et al. 2023), and Brassica napus (Okamoto et al. 2007; Kitashiba and Nasrallah 2014). Allopolyploid establishment may be facilitated by a transition to self-compatibility, ensuring reproductive success in the face of limited mating partners.
Materials and Methods
Plant Collection and Growth
We collected seeds from three A. lyrata populations (NT1, NT8, and NT12) during an expedition to the Yakutia region in Russia in the summer of 2019 (supplementary table S1, Supplementary Material online). Multiple individual plants were collected from those three populations: three individuals from NT1 (NT1_1, NT1_2, and NT1_3), four from NT8 (NT8_1, NT8_2, NT8_3, and NT8_4) and two from NT12 (NT12_1 and NT12_2). Collected seeds were grown in the greenhouse at 21 °C, under 16 h of light per day until a full rosette was formed, after which plants were moved to open frames outside on the grounds of the Max Planck Institute for Plant Breeding Research in Cologne, Germany. We grew several seeds per collected bag of seeds from individual plants, each was given an additional number extension (e.g., NT1_1_1, NT1_1_2, etc.). In this work, we only used the plants with last extension 1. All the individuals grown from NT1 population formed long fruits and appeared to be selfing. NT1 samples were collected on a sandy island in the course of the Lena river (GPS coordinates 66.80449, 123.46546; supplementary fig. S1A, Supplementary Material online shows a picture of the collection site).
Pollen Tube Staining to Characterize Mating Type
Almost mature flower buds were opened and after removing the anthers, they manually pollinated. Pistils were collected 2–3 h after pollination, fixed for 1.5 h in 10% acetic acid in ethanol, and softened in 1 M NaOH overnight. Before staining, the tissue was washed three times in KPO4 buffer (pH 7.5). For staining, we submerged the tissue in 0.01% aniline blue for 10–20 min. After that, pistils were transferred to slides into mounting media and observed under UV light (Lu 2011). A self-compatible reaction was called if we counted more than ten pollen tubes.
Long-read Sequencing for de novo Genome Assembly
DNA extraction, library preparation, and long-read sequencing of the NT1 and MN47 A. lyrata accessions were performed by the Max Planck-Genome-centre Cologne, Germany (https://mpgc.mpipz.mpg.de/home/). High molecular weight DNA was isolated from 1.5 g material with a NucleoBond HMW DNA kit (Macherey Nagel). Quality was assessed with a FEMTOpulse device (Agilent), and quantity was measured by a Quantus fluorometer (Promega). HiFi libraries were then prepared according to the manual “Procedure & Checklist—Preparing HiFi SMRTbell® Libraries using SMRTbell Express Template Prep Kit 2.0” with an initial DNA fragmentation by g-Tubes (Covaris) and final library size selection on BluePippin (Sage Science). Size distribution was again controlled by FEMTOpulse (Agilent). Size-selected libraries were then sequenced on a Sequel II device with Binding Kit 2.0 and Sequel II Sequencing Kit 2.0 for 30 h (Pacific Biosciences).
Short-read Sequencing for Population Analyses
Plant material was processed in two different ways, indicated by types I and II in supplementary table S1, Supplementary Material online.
Type I: Herbarium material was extracted in a dedicated clean-room facility (Ancient DNA Laboratory, Department of Archaeology, University of Cambridge). The lab has strict entry and surface decontamination protocols, and no nucleic acids are amplified in the lab. For each accession, leaf and/or stem tissue was placed in a 2 ml tube with two tungsten carbide beads and ground to a fine powder using a Qiagen Tissue Lyser. Each batch of extractions included a negative extraction control (identical but without tissue). DNA was extracted using the DNeasy Plant Mini Kit (Qiagen). Library preparation and sequencing were performed by Novogene Ltd (UK). Sequencing libraries were generated using NEBNext® DNA Library Prep Kit following manufacturer's recommendations, and indices were added to each sample. The genomic DNA is randomly fragmented to a size of 350 bp by shearing, then DNA fragments were end polished, A-tailed, and ligated with the NEBNext adapter for Illumina sequencing, and further enriched by polymerase chain reaction (PCR) on P5 and indexed P7 oligos. The PCR products were purified (AMPure XP system), and resulting libraries were analyzed for size distribution by Agilent 2100 Bioanalyzer and quantified using real-time PCR.
Type II: Genomic DNA was isolated with the “NucleoMag© Plant” kit from Macherey and Nagel (Düren, Germany) on the KingFisher 96Plex device (Thermo) with programs provided by Macherey and Nagel. Random samples were selected for a quality control to ensure intact DNA as a starting point for library preparation. TPase-based libraries were prepared as outlined by (Rowan et al. 2019) on a Sciclone (PerkinElmer) robotic device. Short-read (PE 150 bp) sequencing was performed by Novogene Ltd (UK), using a NovaSeq 6000 S4 flow cell Illumina system.
Transcriptome Sequencing for S-locus Gene Expression Assessment
We used three flash-frozen open flowers of the A. lyrata NT1 accession as input material for RNA sequencing, which we used to assess the expression of the S-locus genes. RNA was extracted by the RNeasy Plant Kit (Qiagen) including an on-column DNase I treatment. Quality was assessed by Agilent Bioanalyser and the amount was calculated by an RNA-specific kit for Quantus (Promega). An Illumina-compatible library was prepared with the NEBNext® Ultra™ II RNA Library Prep Kit for Illumina ® and finally sequenced on a HiSeq 3000 at the Max Planck-Genome-centre Cologne, Germany.
PacBio de novo Assembly and Annotation of NT1 and MN47 A. lyrata Accessions
Raw PacBio reads of NT1 were assembled using Hifiasm assembler (Cheng et al. 2021) in the default mode, choosing the primary contig graph as our resulting assembly. The completeness of our assembly was assessed using BUSCO (Seppey et al. 2019) with Brassicales_odb10 set. Repeated sequences were masked using RepeatMasker (Smit et al. 2013–2015) with the merged libraries of RepBase A. thaliana repeats and NT1 A. lyrata repeats, which we modeled with RepeatModeler (Smit and Hubley 2008–2015). Then, annotation from the reference MN47 genome (Rawat et al. 2015) was transferred to our NT1 repeat-masked assembly by using Liftoff (Shumate and Salzberg 2020). Contigs were reordered according to their alignment to the reference chromosomes and updated gene and repeat annotations using RagTag (Alonge et al. 2019) in the scaffolding mode without correction. Assembly of MN47 PacBio reads was done using the Hifiasm assembler with the same parameters.
Synteny Analysis of A. lyrata, A. suecica, and C. rubella Genomes
Synteny analysis was done by performing an all-against-all BlastP search using the coding sequences of both genomes. We used SynMap (Haug-Baltzell et al. 2017), a tool from the online platform CoGe, with the default parameters for DAGChainer. The Quota Align algorithm was used to decide on the syntenic depth, employing the default parameters. Syntenic blocks were not merged. The results were visualized using the R (version 4.1.2) library “circlize” (version 0.4.13), as well as using plotsr (version 0.5.3) (Goel and Schneeberger 2022) for the supplementary figures, Supplementary Material online.
HiC Sequencing of NT1 A. lyrata Accession to Validate Structural Variants
A chromatin-capture library of the NT1 A. lyrata accession was prepared by the Max Planck-Genome-centre Cologne, Germany and was used for validation of the large inversions in whole-genome comparisons. We followed the Dovetail® Omni-C® Kit starting with 0.5 g of fresh weight as input. Libraries were quantified and quality assessed by capillary electrophoresis (Agilent Tapestation) and then sequenced at the Novogene Ltd (UK), using a NovaSeq Illumina system.
Mapping of Hi-C Reads for the A. lyrata Accessions NT1 and MN47
To validate the assembled scaffolds of A. lyrata, we used proximity-ligation short read Hi-C data. For NT1, Hi-C reads were mapped to the repeat-masked NT1 genome assembly, using the mapping pipeline proposed by the manufacturer (https://omni-c.readthedocs.io/en/latest/index.html). The Dovetail Omni-C processing pipeline is based on BWA (Li and Durbin 2009), pairtools (https://github.com/mirnylab/pairtools), and Juicertools (Durand et al. 2016). We mapped the Hi-C reads for MN47 (released previously (Zhu et al. 2017)) to a repeat masked MN47 genome (Hu et al. 2011) and to a repeat masked version of the newly assembled MN47 genome (in this paper) using HiCUP (version 0.6.1) (Wingett et al. 2015). The assemblies were manually examined using Juicebox (Robinson et al. 2018). Plots of the HiC contact matrix were made using the function hicPlotMatrix from HiCExplorer (Wolff et al. 2020) (version 3.7.2).
Validation of Structural Variants Between NT1 and MN47 A. lyrata Accessions
To validate the inversions (supplementary table S2, Supplementary Material online), we used PacBio, Hi-C data, and synteny analysis results. Guided by synteny analyses, we first identified inversion breakpoints. Then, we investigated the long-read map at these regions and either confirmed their contiguity or manually flipped the genomic region, followed by another round or long-read map investigation (supplementary figs. S3–S8, Supplementary Material online). To map the PacBio HiFi reads we used Winnowmap (Jain et al. 2020). As the last step, we analyzed the Hi-C contact maps in the same regions to show that there is no evidence for alternative genome assembly configurations (supplementary figs. S3–S7, Supplementary Material online).
A. lyrata NT1 S-locus Genotyping and Manual Annotation
We manually annotated the S-locus in our initial assembly before the reference-guided reordering and scaffolding. In the transferred annotation resulting from Liftoff (Shumate and Salzberg 2020), we found both of the flanking genes (U-box and ARK3) in the same contig. The final coordinates of the S-locus in the NT1 assembly on scaffold 7 are 9,291,658 bp to 9,336,246 bp. The length of the assembled NT1 A. lyrata S-locus including both flanking genes is about 44.5 Kbp. We mapped PacBio long reads back to the assembled NT1 genome using minimap2 (Li 2018) with default parameters in order to make sure that there are no obvious gaps in coverage or breakpoints (supplementary fig. S8, Supplementary Material online). Similar to Zhang et al. (2019), we blasted the SRK and SCR sequences from all the known S-haplotypes across Arabidopsis and Capsella to the A. lyrata NT1 S-locus, finding a single hit at the SCR gene from the AhS12 haplogroup. We constructed a comparative structure plot of A. lyrata NT1 and A. halleri S12 (GenBank accession KJ772374) S-loci (fig. 2B) using the R library genoPlotR (Guy et al. 2010). We aligned SCR protein sequences using MAFFT with default parameters and estimated a phylogenetic tree with RaxML (Stamatakis 2014) using the BLOSUM62 substitution model and visualized the alignment (fig. 2C) using Jalview2 (Waterhouse et al. 2009). The phylogenetic tree was visualized using R package “ape” (Paradis et al. 2004).
A. lyrata S-allele Genotyping From Short-read Sequencing Data
The S-alleles from all the re-sequenced samples used in the population analysis (supplementary table S1, Supplementary Material online) and crosses (NT8.4-24, submitted to ENA under ERS12276051) were genotyped using the S-locus genotyping pipeline NGSgenotyp (Genete et al. 2020). The list of SRK and SCR alleles used as a reference data set is provided in the supplementary table S3, Supplementary Material online, and the corresponding sequences for SRK and SCR alleles are provided in the supplementary Data 1 and 2, Supplementary Material online. Using the NGSgenotyp pipeline, we could not identify any S-haplotypes for DRR124344 (lyrpet4), for either SRK or SCR databases. However, we found a partial SCR gene sequence matching the AhS12 haplotype by blasting the SCR database to the DRR124344 assembly. We translated the SCR nucleotide sequence and aligned the resulting protein sequence with SCR proteins from other accessions using MAFFT (Katoh and Standley 2013) using default parameters. The resulting alignment shows that SCR from DRR124344 is shorter compared with NT1 or AhS12. To confirm that SCR from DRR124344 belongs to the AhS12 haplotype, we estimated a maximum likelihood tree using IQ-tree web service (http://www.iqtree.org/) with default parameters (supplementary fig. S10B, Supplementary Material online).
Short Read Mapping and Variant Calling for Population Analysis
We first filtered the short paired-end reads (2 × 150 bp) for adapter contamination using bbduk.sh script from BBMap (38.20) (Bushnell 2014) with the following parameters settings: ktrim=r k=23 mink=11 hdist=1 tbo tpe qtrim=rl trimq=15 minlen=70. Then, we mapped the reads to the MN47 and NT1 A. lyrata genome with bwa mem (0.7.17) (Li and Durbin 2009), marking shorter split reads as secondary (-M parameter). We marked potentially PCR duplicated reads with picard MarkDuplicates (http://broadinstitute.github.io/picard/), sorted and, indexed the bam file with samtools (Li et al. 2009). To call variants, we used the HaplotypeCaller algorithm from GATK (McKenna et al. 2010) (3.8). We then ran GenotypeGVCF from GATK including non-variant sites on the entire sample set to generate a vcf. To estimate heterozygosity levels, we calculated the proportion of heterozygous sites within all the confidently called sites in mapping to MN47 and NT1 reference genomes (supplementary table S1, Supplementary Material online and supplementary fig. S12, Supplementary Material online).
Separation of Subgenomes From A. kamchatica Accessions
To isolate the A. lyrata subgenome of A. kamchatica, we used a combined reference, containing A. lyrata NT1 and A. halleri ssp. gemmifera reference genomes (Briskine et al. 2016). We mapped A. kamchatica short reads to the combined reference with bwa mem (0.7.17) (Li and Durbin 2009) and filtered for reads mapped uniquely to A. lyrata NT1 using samtools (Li et al. 2009). We then genotyped the resulting A. lyrata-subgenome bam files for each A. kamchatica accession as described above for diploid samples.
Tree and Network Estimation
Genome-wide SNP Tree
We filtered the vcf generated above to include only biallelic SNPs without missing data, which resulted in 2,261,679 SNPs. These data were read into R (version 4.1.1) and from them, we estimated a neighbor-joining tree using the nj function from package ape (Paradis and Schliep 2019). We then visualized the neighbor-joining tree as a cladogram using ggtree (Yu et al. 2017, 2018; Yu 2020) and annotated the tips with associated data (supplementary fig. S13B, Supplementary Material online). We then further filtered this data set to include only Siberian A. lyrata and an outgroup (excluding A. kamchatica from this portion) to generate the lyrata-only tree (fig. 3C).
Network Based on Nei's D and Phylogenetic Inference
We filtered a vcf of biallelic SNPs shared by the lyrata subgenome of A. kamchatica and all A. lyrata accessions down to just four-fold degenerate sites, with maximum 10% missing data across individuals, resulting in 4,141 SNPs. We read the vcf with both Siberian A. lyrata and A. kamchatica into R using vcfR (Knaus and Grünwald 2017), then calculated Nei's D (Nei 1972) between individuals using StAMPP (Pembleton et al. 2013). We visualized the resulting matrix in SplitsTree4 (Huson and Bryant 2006) and in R using the pheatmap package (Kolde 2019). To further explore the evolutionary relationships among accessions, we generated a nexus file from the vcf using vcf2phylip (Ortiz 2019), which served as input for phylogenetic inference with IQTree (http://www.iqtree.org/)
SRK Tree
We assembled partial SRK sequences from Siberian A. lyrata and A. kamchatica accessions based on short-read sequencing data using the assembly step of the S-locus genotyping pipeline NGSgenotyp (Genete et al. 2020) and aligned sequences with MAFFT (Katoh and Standley 2013). From this alignment estimated 1,000 bootstrap replicates of a ML phylogeny using RaXML (Stamatakis 2014) with substitution model GTR+Γ then visualized the best-scoring ML phylogeny using R package ape 5.0 (Paradis and Schliep 2019). The input alignment is available in supplementary Data 3, Supplementary Material online.
PCR Identification of AhS12 Haplotype
For DNA extraction, 1 cm of leaf material was frozen in liquid nitrogen and ground to a powder. We added 400 μl UltraFastPrep Buffer to the powdered tissue, then mixed, vortexed, and finally spun for 5 min at 5000 revolutions per minute (rpm). We then took 300 μl of the supernatant, added 300 μl isopropanol, and mixed by inversion. We again spun for 5 min at 5000 rpm, then discarded the supernatant and dried 10–30 min at 37 °C. The pellet was resuspended in 200 μl 1xTE and stored at 4 °C. We amplified the AhSRK12 allele by PCR using 1.5 μl of DNA solution and previously published primers (forward ATCATGGCAGTGGAACACAG, reverse CAAATCAGACAACCCGACCC) (Ruggiero et al. 2008). We ran 35 cycles consisting of 30 s at 94 °C, 30 s annealing at 56.8 °C, and 40 s extension at 72 °C. We visualized PCR products via gel electrophoresis using 1.5% agarose gel with GelGreen® nucleic acid stain (supplementary fig. S10A, Supplementary Material online). Accessions identified with SRK 12 (NT8.4–24) and without SRK 12 (NT8.4-20) were used in crosses (supplementary fig. S10B–D, Supplementary Material online).
Demographic Modeling of Divergence Between Selfing and Outcrossing Siberian A. lyrata Lineages
We calculated nucleotide diversity using all biallelic and non-variant sites in 10 kb windows with custom script uploaded to github (https://github.com/novikovalab/selfing_Alyrata). CIs for the median of the distribution were calculated using the basic bootstrap method in the R package “boot” (Davison and Hinkley 1997; Canty and Ripley 2022).
To prepare a joint allele frequency spectrum of the seven self-compatible accessions and the ten self-incompatible accessions, we first filtered the SNP-only vcf to remove centromeric, pericentromeric, and exonic regions. We subsequently filtered out sites with missing data to yield our final vcf for demographic inference. Following Nordborg and Donnelly (1997), we excluded sites heterozygous in the selfing population and treated selfers as haploid. We then generated the joint allele frequency spectrum using easySFS (https://github.com/isaacovercast/easySFS). EasySFS produces output ready for use in fastsimcoal2 (fsc26) (Excoffier et al. 2013, 2021), which we then used for demographic modeling. We tested five models for the origin of self-compatibility in Siberian A. lyrata as follows: 1) simple divergence, 2) divergence with symmetrical introgression (migration), 3) divergence with asymmetrical introgression, 4) simple divergence model as in Model 1 plus bottleneck in selfing population; and 5) Model 3 (asymmetric gene flow) plus bottleneck in selfing population.
For each model, we initiated 100 fastsimcoal2 runs. We then chose the best run for each model (the run with the best likelihood scores) and from that best run, we calculated the Aikake Information Criterion for the model. After selecting the model with the best AIC score, we used the maximum likelihood parameter file to generate 200 pseudo-observations of joint SFS for bootstrapping. For each of the 200 pseudo-observations, we initiated 100 fastsimcoal2 runs, then selected the best run for each model based on likelihood scores as above. The resulting parameter estimates from the 200 replicate pseudo-observations were used to calculate the 95% CIs in R. Site frequency spectra and other fastsimcoal2 input files (.tpl and .est) are on GitHub (https://github.com/novikovalab/selfing_Alyrata). Because fastsimcoal2 reports haploid effective population sizes, we divided them by two to report numbers of diploid individuals (table 1). These parameters can be interpreted as the inverse of the coalescent rate estimated from our accessions.
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Acknowledgments
We thank the Max Planck-Genome-centre Cologne, Germany (http://mpgc.mpipz.mpg.de/home/) for performing short- and long-read libraries and PacBio sequencing in this study. We thank Vincent Castric and Mathieu Genete for sharing the up-to-date SRK database for the S-locus genotyping pipeline. We thank Kathrin Wippel for providing seeds of North American MN47 A. lyrata. We thank Jo Osborn, University of Cambridge, for assisting with the herbarium extractions. The work of A.P.S. on the curation of dry plant material was supported by the Russian Science Foundation (project #21-77-20042). Field research was supported by the Austrian Science Fund (FWF grant P30208-B-29) and held within the state assignment of the Papanin Institute for Biology of Inland Waters Russian Academy of Sciences (theme 121051100099-5). The sequencing and data analysis were funded by the European Union (ERC, HOW2DOUBLE, 101041354) and by the Deutsche Forschungsgemeinschaft (DFG)—project number 462181533. The views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
Author Contributions
U.K.K., A.D.S., and P.Y.N. conceptualized the project. U.K.K., N.P.T., U.P., A.C.C., A.P.S., and L.Y. collected material and/or generated data. U.K.K., A.D.S., J.D.V.d.V., R.B., N.P.T., X.V., S.L., and P.Y.N. analyzed the data. U.K.K., A.D.S., J.D.V.d.V., R.B., and P.Y.N. wrote the manuscript. All authors read and approved the final manuscript.
Data Availability
The whole genome raw Illumina short reads for the samples used in this study were submitted to the ENA database under the project number PRJEB50329 (ERP134897). Individual accession names are listed in the supplementary table S1, Supplementary Material online. Raw PacBio HiFi reads of NT1 and MN47, Hi-C reads of NT1, RNAseq reads of NT1, and the genome assembly and annotation of A. lyrata NT1 (GCA_945152055) and MN47 (GCA_944990045) have been submitted to ENA database under the same project number PRJEB50329 (ERP134897) and to https://figshare.com/projects/Arabidopsis_lyrata_genome_assemblies/162343. Scripts associated with the project are at https://github.com/novikovalab/selfing_Alyrata.
References
Author notes
Uliana K Kolesnikova and Alison Dawn Scott co-first authors.