Abstract

The soybean tentiform leafminer, Macrosaccus morrisella (Fitch) (Lepidoptera: Gracillariidae), is native to North America, where it was known to feed on American hogpeanut and slickseed fuzzybean. However, it has recently expanded its host range to include soybean, an important agricultural crop. Here, we report a new, highly contiguous genome for this species with a length of 245 Mb, an N50 of 9 Mb, and 96.33% BUSCO completeness. The mitochondrial genome shares only 81% identity with its nearest relative in the NCBI nucleotide database, indicating long-standing divergence or sparse sequencing in this clade. To determine whether host plant choice is genetically driven, we sequenced 18 individuals collected from both American hogpeanut and soybean plants at 3 locations in Minnesota, USA. Genetic variation showed no population structure associated with either geography or host plant species (weighted FST estimate: 0.0058). As a secondary measure, we independently assembled complete mitochondrial genomes from all individuals and observed no delineation by host or location. The overall lack of detectable population structure at the nuclear and mitochondrial genome levels suggests a large population with flexible dietary preferences and provides no evidence of genetically driven host preference.

Introduction

Gracillariidae is a diverse family of about 2000 species whose larvae are plant miners, with most species being monophagous (eating a single host species) or oligophagous (eating only a few species) leaf miners (Davis 1987; van Nieukerken et al. 2011; Kawahara et al. 2017; De Prins and De Prins 2024). Gracillariid larvae are hypermetamorphic: instars at different developmental stages have distinct sap-feeding and tissue-feeding forms (Davis 1987; De Prins and De Prins 2024). Within this family, Macrosaccus is a relatively small genus of 5 New World species that feed on Fabaceae (Fabales) (Davis and De Prins 2011). The soybean tentiform leafminer, Macrosaccus morrisella (Fitch) (Lepidoptera: Gracillariidae), is native to North America, where it was known to feed on American hogpeanut, Amphicarpaea bracteata (L.) Fernald, and slickseed fuzzybean, Strophostyles leiosperma (Torr. and A. Gray) Piper (Fabales: Fabaceae) (Davis and De Prins 2011). These 2 vining legumes occur in forest understories in eastern North America (USDA 2024a). However, M. morrisella was recently discovered feeding on soybean in Quebec, Canada, in 2016 and in Minnesota, USA, in 2021 (Koch et al. 2021). Soybean is a major source of vegetable oil and protein and is grown on 34,913,000 and 2,300,000 ha in the USA and Canada, respectively (USDA 2024b).

Adults of M. morrisella oviposit on soybean leaves (Menger et al. 2024). As the larvae develop, the feeding injury (i.e. mines) they create progresses from serpentine mines to blotch mines to tentiform mines (Davis and De Prins 2011; Koch et al. 2021; Menger et al. 2024). These mine types are named for the appearance of the damage on the leaf as the larva feeds, culminating in the tentiform (tent-shaped) structure. The larvae pupate inside the mines (Davis and De Prins 2011; Koch et al. 2021). As with other leafminers (Kappel and Proctor 1986; Liu et al. 2015; Ullah et al. 2020), the feeding injury from M. morrisella is believed to reduce photosynthesis in infested soybean leaves (Koch et al. 2021). Development of M. morrisella from egg to adult requires an accumulation of 425 degree-days above a lower developmental threshold of 8.96°C, which could allow multiple generations of M. morrisella to attack soybean plants repeatedly within a single growing season (Ribeiro et al. 2024).
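The degree-day requirement can be made concrete with a simple accumulation. The sketch below assumes the simple daily-average method with no upper developmental threshold, which may differ from the method used by Ribeiro et al. (2024); the function names and the constant-temperature series are hypothetical.

```python
"""Degree-day accumulation for M. morrisella: a sketch using the simple
daily-average method (an assumption; not necessarily the published method)."""

from typing import Iterable, Optional, Tuple

LOWER_THRESHOLD_C = 8.96  # lower developmental threshold from the text (°C)
DD_REQUIRED = 425.0       # egg-to-adult requirement from the text (degree-days)


def daily_degree_days(t_min: float, t_max: float,
                      base: float = LOWER_THRESHOLD_C) -> float:
    """Degree-days for one day: mean temperature minus the base, floored at zero."""
    return max(0.0, (t_min + t_max) / 2.0 - base)


def days_to_complete(daily_temps: Iterable[Tuple[float, float]],
                     required: float = DD_REQUIRED) -> Optional[int]:
    """Return the 1-based day on which accumulated degree-days reach `required`,
    or None if the temperature series ends first."""
    total = 0.0
    for day, (t_min, t_max) in enumerate(daily_temps, start=1):
        total += daily_degree_days(t_min, t_max)
        if total >= required:
            return day
    return None


# Hypothetical series: a constant 25 °C, like the laboratory rearing conditions.
constant_25 = [(25.0, 25.0)] * 60
print(days_to_complete(constant_25))  # 27
```

At a constant 25°C each day contributes 16.04 degree-days, so under these assumptions about 27 days would be needed; fluctuating field temperatures would change this.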

Because M. morrisella is a new pest, relatively little is known about its biology. Kogan (1981) predicted that native oligophagous (i.e. legume-feeding) insects adapting to soybean would be an important source of new soybean pests in North America, a prediction that aligns well with the appearance of M. morrisella on soybean. Advances in genomics research are facilitating the understanding of insect adaptation to new host plants (Simon et al. 2015). However, despite the biological and economic significance of Gracillariidae, genomes have been sequenced for only 3 species in this family. Here, we provide the first genome sequence for M. morrisella to establish a foundation for examining its biology and developing more sustainable management programs. In addition, we characterize its genomic variability and the association of that variability with host plant use.

Materials and methods

Sample collection

To sequence the complete genome of M. morrisella, individuals were obtained from a laboratory colony established in 2022 and maintained over ∼22 generations according to Menger et al. (2024) at the University of Minnesota, Minnesota, USA. In short, this colony was initiated with adults of M. morrisella reared from soybean leaves infested with pupae of M. morrisella collected from a soybean field in Henderson, Minnesota, USA, in summer 2022. In the laboratory colony, emerged adults were allowed to oviposit for 48–72 h on potted soybean plants (variety: “Sheyenne”) at the V1–V2 developmental stage (Purcell et al. 2014) inside an oviposition cage. After oviposition, potted plants with M. morrisella eggs were transferred to a separate cage for development of immature M. morrisella. When M. morrisella reached the pupal stage, the pots were transferred to an adult emergence cage. After adult emergence, the process described above was repeated to maintain the colony. The colony was maintained in a walk-in environmental growth chamber at 25 ± 2°C and 16 h photophase.

From this colony, pupae were manually collected from mines of infested soybean plants with the help of an entomological pin (size #1) and a fine paintbrush to carefully open the mines and the silken cocoons. Collected pupae were placed inside individual 2-mL microcentrifuge tubes for adult emergence. A single pair of freshly emerged (i.e. <24 h old) adults (F0) was bred using the above-mentioned methods, their offspring were raised until pupation, and an F1 sibling pair was bred using these same methods. Finally, the resulting F2 offspring were used for sequencing the reference genome and are referred to as the “F2 isoline.”
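Sibling mating reduces heterozygosity only gradually. Under the classical recurrence for repeated full-sibling mating, the inbreeding coefficient of offspring in generation t is (1 + 2·F(t−1) + F(t−2))/4. The sketch below assumes the founding pair was unrelated and non-inbred (inbreeding coefficient 0), which the text does not state; under that assumption the single mating of F1 siblings gives the F2 isoline an expected inbreeding coefficient of 0.25, so on average about 75% of the founders' heterozygosity would remain. The function name is hypothetical.

```python
"""Expected inbreeding coefficient under repeated full-sibling mating.
Assumes the founding pair is unrelated and non-inbred (coefficient 0)."""

from typing import List


def full_sib_inbreeding(generations: int) -> List[float]:
    """Inbreeding coefficient of the offspring of each successive generation of
    full-sib mating, via F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4."""
    f_two_back, f_one_back = 0.0, 0.0
    values = []
    for _ in range(generations):
        f_t = 0.25 * (1.0 + 2.0 * f_one_back + f_two_back)
        values.append(f_t)
        f_two_back, f_one_back = f_one_back, f_t
    return values


f_values = full_sib_inbreeding(4)
print(f_values)  # [0.25, 0.375, 0.5, 0.59375]
# Expected fraction of founder heterozygosity retained after one sib mating:
print(1.0 - f_values[0])  # 0.75
```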

To examine the population genomics of M. morrisella, soybean fields infested with M. morrisella were sampled on 2023 July 20 and August 10 in Brooten, Minnesota; on 2023 June 22 and 23 in St. Paul, Minnesota; and on 2023 August 14 in Rochester, Minnesota, USA. For each field on each sample date, 20 soybean leaflets containing tentiform mines of M. morrisella were manually collected from field edges (first 3 rows) adjacent to wooded areas, placed inside an individually labeled 17 × 17-cm resealable plastic bag, and then placed in a refrigerated cooler for transportation to the laboratory at the University of Minnesota Saint Paul Campus. Samples were collected from edges of soybean fields near wooded areas because infestations are higher at these locations (Ribeiro and Koch 2024). Similarly, 20 leaflets of American hogpeanut containing tentiform mines of M. morrisella were collected from forested areas <2.5 km from the sampled soybean fields, on the same dates and using the same methods described above for soybean. These methods were used to minimize geographical and temporal variability between samples from the 2 hosts. In the laboratory, the collected leaflets were placed inside emergence cages similar to those of Melotto et al. (2023), with a moist cotton ball on a petri dish inside each cage to help maintain the relative humidity at ∼70%. Deionized water was added to the cotton balls every 2–3 days. Cages were maintained inside a rearing room at ∼25°C and 16 h photophase for adult emergence. Adult emergence was checked every 2–3 days, and adults were collected from emergence cages with a mechanical aspirator (Clarke Mosquito Control, #13500, St. Charles, IL, USA). On each evaluation day, collected adults were placed inside individual 2-mL microcentrifuge tubes containing 90% ethanol/deionized water (v/v) and stored in a −20°C freezer. The preserved adults of M. morrisella were used for the population genomics analysis described below.

DNA extraction and sequencing

Approximately 20 individuals from the F2 isoline underwent DNA extraction using a Qiagen MagAttract kit (cat. 67563) according to the manufacturer's instructions. The resulting DNA was pooled, and a library was prepared using the ligation sequencing kit LSK-114 from Oxford Nanopore Technologies (ONT). The library was run on a single PromethION flow cell using a P2 Solo instrument (ONT). The resulting squiggle data in pod5 format were basecalled using dorado v0.7.0 and model [email protected], including 5mC methylation calling at CpG sites.

Genome assembly

Reads longer than 5 kb were corrected using the HERRO algorithm as implemented in dorado v0.7.0. The genome was assembled using hifiasm v0.16.0 (Cheng et al. 2021). Completeness was assessed using Compleasm (Huang and Li 2023), an alternative to BUSCO that uses the same BUSCO databases; here, we used the lepidoptera_odb10 database of 5286 genes. To reduce the percentage of duplicated BUSCOs below 1%, we applied 3 successive rounds of purge_dups v1.2.6 (Guan et al. 2020). Residual adaptor and vector sequences were removed from the assembly using the NIH Foreign Contaminant Screening tool. Repeats were identified using RepeatModeler2 v2.0.5 (Flynn et al. 2020) to generate a species-specific repeat database, which was then run against the assembly using RepeatMasker v4.1.6 (Smit et al. 2013). Other genomes from Gracillariidae underwent the same analysis for comparison. Mitochondrial sequences were identified using MitoHiFi (Uliano-Silva et al. 2023) and initially aligned to the mitogenome of Euspilapteryx auroguttella Stephens (Lepidoptera: Gracillariidae). DNA methylation was quantified by pileup of modified cytosines into bedMethyl format using modkit v0.3.1. Protein prediction and annotation were performed using GeMoMa v1.9 with monarch butterfly genes as a guide. The pipeline for assembly and analysis is available in Supplementary File 1.

Population genomic analysis

DNA from the field-collected individuals was extracted and sequenced using the same parameters as above, except that each individual was barcoded separately using the native barcoding kit SQK-NBD-114 from ONT. Reads from each barcode were aligned separately to the reference isoline genome. SNVs were called using Clair3 v1.0.10 (Zheng et al. 2022), and the resulting VCF files were merged using bcftools (Danecek et al. 2021). The merged VCF was filtered to remove indels and any variants with quality below Q20, more than 10% of individuals missing a genotype, depth below 2X or above 100X, or a minor allele frequency below 8% (i.e. at least 3 individuals must carry the minor allele at each variant site). FST, principal component analysis (PCA), heterozygosity, and depth were calculated in plink 1.9 (Purcell et al. 2007). Variants in linkage disequilibrium were first pruned, and the filtered VCF was then used to create the PCA plot. Admixture plots were created using admixturePipeline (Mussmann et al. 2023) and visualized with Clumpak (Kopelman et al. 2015). The VCF file for all individuals is available as Supplementary File 2.
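The site filters above can be sketched for a single biallelic site. This is an illustration under stated assumptions, not the bcftools or plink commands actually used: genotypes are coded as alternate-allele counts (0, 1, 2) with None for a missing call, the 2X to 100X window is applied to the mean depth of called genotypes, and `passes_filters` and `minor_allele_frequency` are hypothetical helper names.

```python
"""Per-site variant filters from the text, applied to one biallelic site (a sketch).
Assumptions: genotypes are alternate-allele counts (0/1/2) or None when missing;
the depth window applies to the mean depth of the called genotypes."""

from typing import Optional, Sequence

MIN_QUAL = 20.0
MAX_MISSING = 0.10
MIN_MEAN_DP, MAX_MEAN_DP = 2.0, 100.0
MIN_MAF = 0.08


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """Frequency of the rarer allele among called diploid genotypes."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))  # 2 alleles per diploid individual
    return min(alt_freq, 1.0 - alt_freq)


def passes_filters(qual: float, ref: str, alt: str,
                   genotypes: Sequence[Optional[int]],
                   depths: Sequence[int]) -> bool:
    """True if the site survives the quality, indel, missingness, depth and MAF filters."""
    if not genotypes or qual < MIN_QUAL:
        return False
    if len(ref) != 1 or len(alt) != 1:  # drop indels, keep SNVs
        return False
    missing = sum(g is None for g in genotypes) / len(genotypes)
    if missing > MAX_MISSING:
        return False
    called_depths = [d for g, d in zip(genotypes, depths) if g is not None]
    mean_dp = sum(called_depths) / len(called_depths)
    if not MIN_MEAN_DP <= mean_dp <= MAX_MEAN_DP:
        return False
    return minor_allele_frequency(genotypes) >= MIN_MAF
```

With 18 diploid individuals and no missing calls, 3 heterozygotes give a minor allele frequency of 3/36 ≈ 0.083, just above the 8% cut-off, whereas 2 heterozygotes (2/36 ≈ 0.056) fail it, consistent with the requirement that at least 3 individuals carry the minor allele.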

Results and discussion

Assembly

Pairs of M. morrisella from a laboratory colony were selected for sibling mating to reduce heterozygosity before reference genome assembly. After 2 generations, larval specimens were collected for DNA extraction, sequencing, and assembly. Nanopore sequencing produced 79.6 Gb of reads with a read length N50 of 3.7 kb and an average Q-score of 16.9 (Supplementary Table 1). After read correction and filtering, 21 Gb of reads with a read N50 of 9.1 kb were used for assembly. The assembled F2 isoline reference genome (STLf2iso) is 245 Mb long, consists of 68 contigs with an N50 of 9 Mb, and is similar or superior in quality to assemblies of other members of the same family (Table 1). The genome contained 96.33% complete BUSCOs (lepidoptera_odb10) with a duplication rate below 1%, similar to 2 of the other high-quality Gracillariidae genomes available on NCBI and exceeding the assembly of the cocoa pod borer, Conopomorpha cramerella (Snellen) (Lepidoptera: Gracillariidae) (Table 2). The initial genome assembly was attempted using wild-caught specimens; however, difficulty in achieving high contiguity led us to develop the F2 isoline to reduce heterozygosity. Our assembly pipeline deviated from standard practice because the high heterozygosity caused assemblers to generate excessive haplotigs, manifested by extraordinarily high percentages of duplicated BUSCOs in draft assemblies (up to 95% in some) and abnormally large estimated genome sizes. Using HERRO-corrected reads to improve read quality, choosing hifiasm over the Flye assembler, and applying 3 rounds of haplotig purging with purge_dups resulted in a highly contiguous genome with <1% duplicated BUSCOs and the expected genome size for a leafminer moth of this family.
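The contig N50 and the "N50 num" column of Table 1 (often called L50) follow from sorting contigs by length. A minimal sketch; the function name `n50` is ours, and the contig lengths in the example are toy values.

```python
"""Contig N50 and 'N50 num' (L50) from a list of contig lengths."""

from typing import Sequence, Tuple


def n50(lengths: Sequence[int]) -> Tuple[int, int]:
    """Return (N50, N50 num): sorting contigs longest-first, the length of the contig
    at which the running total first reaches half the assembly size, and how many
    contigs that took."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("no contigs supplied")


print(n50([10, 8, 6, 4, 2]))  # (8, 2)
```

For STLf2iso, this means the 11 longest of the 68 contigs together span at least half of the 245,695,242 bp assembly, and the shortest of those 11 is 9,036,253 bp.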

Table 1.

Assembly comparison of M. morrisella and other gracillariid genomes available from NCBI.

Scientific name | Common name | Assembly | Num seqs | Sum length (bp) | Min length (bp) | Max length (bp) | N50 (bp) | N50 num | GC (%)
Macrosaccus morrisella | Soybean tentiform leafminer | STLf2iso | 68 | 245,695,242 | 475 | 16,440,956 | 9,036,253 | 11 | 36.44
Conopomorpha cramerella | Cocoa pod borer | ASM1293212v1 | 73,142 | 497,288,140 | 200 | 351,205 | 12,136 | 7,764 | 38.13
Euspilapteryx auroguttella | Leaf blotch miner | ilEusAuro2.1 | 103 | 331,914,766 | 9,628 | 22,118,606 | 11,650,979 | 12 | 37.39
Aspilapteryx tringipennella | Leaf blotch miner | ilAspTrin1.1 | 144 | 261,726,888 | 1,000 | 17,401,394 | 9,501,776 | 12 | 37.44

Table 2.

BUSCO completeness of M. morrisella and other gracillariid genomes available from NCBI.

Scientific name | Genome | Complete | Single | Duplicate | Fragment (1) | Fragment (2) | Missing
Macrosaccus morrisella | STLf2iso | 96.33% | 95.76% | 0.57% | 0.25% | 0.00% | 3.42%
Conopomorpha cramerella | ASM1293212v1 | 75.22% | 66.78% | 8.44% | 11.39% | 0.25% | 13.15%
Euspilapteryx auroguttella | ilEusAuro2.1 | 97.47% | 96.94% | 0.53% | 0.36% | 0.00% | 2.18%
Aspilapteryx tringipennella | ilAspTrin1.1 | 98.07% | 97.67% | 0.40% | 0.28% | 0.00% | 1.65%

Interspersed repeats make up 42.28% of the genome, far exceeding simple repeats (1.06%) and low-complexity sequence (0.19%) (Table 3). Most of these repeats appear to be novel to this genome: only 4.35% were detected when using the publicly available Dfam database. Although the overall repeat content is in line with that of other animal genomes, the elements that make up the bulk of the retroposed copies appear to be novel to this clade, as they are not represented in the public database of known repeat elements.
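The Genome (%) column of Table 3 is each class's combined masked length divided by the assembly length (245,695,242 bp; Table 1). The short sketch below reproduces several rows; the dictionary simply copies lengths from Table 3, and the helper name is ours.

```python
"""Reproduce the Genome (%) column of Table 3 from combined masked lengths."""

GENOME_BP = 245_695_242  # STLf2iso assembly length (Table 1)

# Combined lengths (bp) copied from Table 3.
REPEAT_BP = {
    "Retroelements": 15_218_873,
    "DNA transposons": 2_342_485,
    "Unclassified": 86_328_898,
    "Total interspersed repeats": 103_890_256,
    "Simple repeats": 2_606_605,
}


def percent_of_genome(length_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Share of the assembly covered by a repeat class, as a percentage (2 decimals)."""
    return round(100.0 * length_bp / genome_bp, 2)


for name, length_bp in REPEAT_BP.items():
    print(f"{name}: {percent_of_genome(length_bp):.2f}%")
```

The printed values (6.19, 0.95, 35.14, 42.28, and 1.06%) match the table, which also shows that unclassified elements alone account for most of the interspersed repeat content.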

Table 3.

Repeat content of M. morrisella genome.

Name | Number of elements | Combined length (bp) | Genome (%)
Retroelements | 65,049 | 15,218,873 | 6.19
L2/CR1/Rex | 11,341 | 3,125,896 | 1.27
R1/LOA/Jockey | 267 | 266,376 | 0.11
R2/R4/NeSL | 109 | 102,121 | 0.04
RTE/Bov-B | 40,620 | 6,683,247 | 2.72
LTR elements | 4,060 | 3,315,565 | 1.35
BEL/Pao | 1,809 | 1,259,292 | 0.51
Ty1/Copia | 238 | 240,322 | 0.1
Gypsy/DIRS1 | 2,013 | 1,815,951 | 0.74
DNA transposons | 11,208 | 2,342,485 | 0.95
hobo-Activator | 492 | 131,477 | 0.05
Tc1-IS630-Pogo | 1,098 | 482,457 | 0.2
PiggyBac | 85 | 89,644 | 0.04
Other | 4,572 | 508,478 | 0.21
Rolling-circles | 843 | 67,541 | 0.03
Unclassified | 538,871 | 86,328,898 | 35.14
Total interspersed repeats | - | 103,890,256 | 42.28
Satellites | 6 | 3,542 | 0
Simple repeats | 58,446 | 2,606,605 | 1.06
Low complexity | 9,588 | 468,330 | 0.19

The mitochondrial genome is 15,437 bp long and contains 37 genes, with no frameshifts in our assembly (Fig. 1). This is shorter than the 17,050-bp E. auroguttella mitogenome used as bait for assembly. In a BLAST search restricted to Lepidoptera, the closest match was the mitochondrion of Argyresthia albistria Haworth (Lepidoptera: Yponomeutidae), with 81.55% identity over 99% of the length. In a search restricted to Gracillariidae, the closest match was E. auroguttella, with 82.83% identity over 97% of the length. This high mitochondrial divergence, even from other Gracillariidae, suggests a long history of separation, sparse sequencing coverage of this clade, or both.
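The BLAST summaries above combine two different measures: identity within the aligned region and the fraction of the query length that is aligned. The sketch below computes both from a single gapped pairwise alignment. It is a simplified illustration (BLAST reports coverage across all high-scoring segment pairs), and the function name and toy sequences are hypothetical.

```python
"""Percent identity and query coverage from one gapped pairwise alignment."""

from typing import Tuple


def identity_and_coverage(aligned_query: str, aligned_subject: str,
                          query_length: int) -> Tuple[float, float]:
    """Identity: identical columns / alignment columns (gaps included).
    Coverage: query bases inside the alignment / full query length."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    matches = sum(q == s and q != "-" for q, s in zip(aligned_query, aligned_subject))
    query_bases = sum(q != "-" for q in aligned_query)
    identity = 100.0 * matches / len(aligned_query)
    coverage = 100.0 * query_bases / query_length
    return identity, coverage


# Toy alignment: 8 identical columns out of 10; 9 bases of a 10-base query aligned.
print(identity_and_coverage("ACGT-ACGTA", "ACGTTACCTA", 10))  # (80.0, 90.0)
```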

Fig. 1.

Mitogenome structure of the F2 isoline strain of M. morrisella. Coding regions are in green, tRNAs in red, and the small and large mitochondrial ribosomal subunits are in blue.

Protein prediction yielded 10,709 transcripts with a protein-level BUSCO completeness of 90.77% (84.26% single-copy and 6.51% duplicated), with 0.19% fragmented and 9.04% missing (Supplementary File 3) (Melotto et al. 2023). To assess the efficacy of the pipeline, we applied the same parameters to the previously published E. auroguttella genome (ilEusAuro2.1) and obtained very similar values: 90.11% complete (84.43% single-copy and 5.68% duplicated), 0.40% fragmented, and 9.50% missing.
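Because complete BUSCOs are the sum of single-copy and duplicated hits, and all categories together cover the whole lineage dataset, the reported percentages can be checked for internal consistency. The sketch below converts category counts into the simplified four-category percentages used in the text. The counts are hypothetical values back-calculated from the rounded protein-level percentages and the 5286 lepidoptera_odb10 genes, not numbers taken from the Compleasm output.

```python
"""Convert BUSCO-style category counts to percentages (simplified categories)."""

from typing import Dict


def busco_percentages(single: int, duplicated: int,
                      fragmented: int, missing: int) -> Dict[str, float]:
    """Percentages of the lineage dataset; 'complete' is single-copy plus duplicated."""
    total = single + duplicated + fragmented + missing

    def pct(count: int) -> float:
        return round(100.0 * count / total, 2)

    return {
        "complete": pct(single + duplicated),
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


# Illustrative counts back-calculated from the rounded protein-level percentages
# (5286 lepidoptera_odb10 genes); not the actual Compleasm output.
print(busco_percentages(4454, 344, 10, 478))
```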

Population genomics

We sequenced 18 field-collected individuals, representing 3 biological replicates per host plant (soybean or American hogpeanut) at each of 3 locations in Minnesota: St. Paul, Rochester, and Brooten. Read depth when mapped to our 245 Mb genome ranged from 6X to 45X, with a mean of 19X (Supplementary Table 2). Because nanopore sequencing detects DNA modifications along with sequence variants, we also examined methylation. None of the 18 individuals had appreciable DNA methylation (mean 1.01%) or DNA hydroxymethylation (mean 0.22%), suggesting that DNA methylation is not a major regulatory mechanism in this species (Supplementary Table 3).

We called variants using Clair3 and filtered them as described in Materials and methods. Mean variant depth per individual ranged from 4.62X to 24.95X (Table 4). The fixation index (FST) estimate was 0.0052 (weighted FST estimate: 0.0058), indicating very low genetic differentiation between groups. The populations are genetically very similar, consistent with interbreeding and high gene flow or with a very recent common ancestor.
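plink 1.9 reports both a mean and a weighted FST estimate. To our understanding, the mean averages the per-variant FST ratios, whereas the weighted estimate divides the summed numerators by the summed denominators. The sketch below illustrates that distinction using Hudson's estimator as a simpler stand-in for the Weir and Cockerham estimator that plink uses, so it will not reproduce plink's numbers; the function names and example sites are hypothetical.

```python
"""Mean vs weighted FST across SNVs, illustrated with Hudson's estimator.
A stand-in: plink 1.9 uses Weir and Cockerham's estimator, so these numbers
will not match plink output; only the averaging logic is analogous."""

from typing import Iterable, Tuple

Site = Tuple[float, float, int, int]  # (p1, p2, n1, n2): allele freqs, sampled allele counts


def hudson_components(p1: float, p2: float, n1: int, n2: int) -> Tuple[float, float]:
    """Numerator and denominator of Hudson's FST for one biallelic SNV."""
    numerator = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    return numerator, denominator


def mean_and_weighted_fst(sites: Iterable[Site]) -> Tuple[float, float]:
    """Mean of per-site ratios, and ratio of summed numerators to summed denominators."""
    parts = [hudson_components(*site) for site in sites]
    parts = [(num, den) for num, den in parts if den > 0]  # skip monomorphic sites
    mean_fst = sum(num / den for num, den in parts) / len(parts)
    weighted_fst = sum(num for num, _ in parts) / sum(den for _, den in parts)
    return mean_fst, weighted_fst


# Hypothetical example: one fixed difference and one site with identical frequencies,
# 18 sampled alleles (9 diploid individuals) per group.
print(mean_and_weighted_fst([(0.0, 1.0, 18, 18), (0.5, 0.5, 18, 18)]))
```

In this toy case the mean is about 0.47 and the weighted value about 0.65: in the weighted estimate, sites with larger denominators carry more influence, so the slightly negative per-site value at the uninformative site pulls the result down less.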

Table 4.

Population genomics of M. morrisella from soybean or American hogpeanut collected at 3 geographical locations in Minnesota, USA.

ID | Clair3 variants | Mean depth (X) | Observed (homozygous) | Estimated (homozygous) | Filtered sites | Fᵃ
St.Paul_Hogpeanut_1 | 7,683,179 | 5.70 | 96,055 | 101,804.3 | 138,751 | −0.15561
St.Paul_Hogpeanut_2 | 9,128,991 | 13.70 | 107,412 | 109,066.9 | 148,306 | −0.04218
St.Paul_Hogpeanut_3 | 8,989,974 | 8.27 | 104,614 | 108,517.2 | 147,600 | −0.09987
St.Paul_Soybean_1 | 9,081,916 | 12.08 | 107,633 | 109,077.2 | 148,328 | −0.03679
St.Paul_Soybean_2 | 9,215,482 | 13.09 | 107,833 | 108,825.9 | 147,972 | −0.02536
St.Paul_Soybean_3 | 9,331,466 | 16.66 | 108,591 | 108,923.1 | 148,099 | −0.00848
Rochester_Hogpeanut_1 | 9,145,820 | 10.09 | 106,582 | 108,638.5 | 147,735 | −0.0526
Rochester_Hogpeanut_2 | 9,369,275 | 19.50 | 107,009 | 108,828.1 | 147,987 | −0.04645
Rochester_Hogpeanut_3 | 9,202,282 | 11.01 | 107,356 | 108,773.8 | 147,904 | −0.03623
Rochester_Soybean_1 | 9,393,790 | 24.65 | 107,055 | 108,757.2 | 147,873 | −0.04352
Rochester_Soybean_2 | 9,515,691 | 24.95 | 107,623 | 108,876 | 148,017 | −0.03201
Rochester_Soybean_3 | 8,267,404 | 7.08 | 103,034 | 106,543.9 | 144,939 | −0.09141
Brooten_Hogpeanut_1 | 7,772,523 | 6.07 | 98,226 | 106,014.2 | 144,339 | −0.20322
Brooten_Hogpeanut_2 | 6,768,449 | 4.77 | 88,920 | 96,074.6 | 131,140 | −0.20404
Brooten_Hogpeanut_3 | 6,572,378 | 4.62 | 86,391 | 97,063.6 | 132,512 | −0.30107
Brooten_Soybean_1 | 8,234,980 | 7.22 | 101,119 | 108,277 | 147,304 | −0.18341
Brooten_Soybean_2 | 9,164,946 | 20.86 | 108,515 | 108,789.3 | 147,920 | −0.00701
Brooten_Soybean_3 | 9,103,057 | 12.69 | 108,255 | 108,728.5 | 147,829 | −0.01211
Isoline | 3,797,437 | 43.00 | - | - | - | -
Mean | 8,407,318 | 14.00 | 103,457 | 106,754 | 145,253 | −0.0878539

ᵃ Negative values indicate an excess of heterozygotes relative to Hardy–Weinberg equilibrium.

Heterozygosity was calculated on high-quality, filtered biallelic SNVs. The negative inbreeding coefficients (F values) of the field samples suggest that they are generally outbred, with a small excess of heterozygous sites relative to Hardy–Weinberg expectations. Half of the field samples (9 of 18) have F values between −0.1 and −0.03, indicating a modest excess of heterozygous sites. This pattern is the opposite of that expected from the Wahlund effect (Garnier-Géré and Chikhi 2013), which occurs when a population is structured into subpopulations with limited gene flow between them and different allele frequencies. When such subpopulations are pooled without accounting for their structure, the combined sample shows an apparent deficit of heterozygotes (positive F) compared with Hardy–Weinberg equilibrium. The heterozygote excess observed here therefore provides no evidence of hidden subpopulation structure among the sampled individuals.
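The F column of Table 4 can be reproduced from the other columns with the method-of-moments estimate F = (O − E)/(N − E), where O and E are the observed and expected numbers of homozygous genotypes and N is the number of filtered sites; as we understand it, this is the form reported by plink 1.9's heterozygosity output. Negative F means fewer homozygotes, and therefore more heterozygotes, than expected. The helper name is ours; the values are copied from Table 4.

```python
"""Method-of-moments inbreeding coefficient from homozygous genotype counts."""


def method_of_moments_f(observed_hom: float, expected_hom: float, n_sites: float) -> float:
    """F = (O - E) / (N - E); negative when heterozygotes exceed expectation."""
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# (O, E, N) copied from Table 4.
TABLE_4 = {
    "St.Paul_Hogpeanut_1": (96_055, 101_804.3, 138_751),
    "Brooten_Hogpeanut_3": (86_391, 97_063.6, 132_512),
    "Brooten_Soybean_2": (108_515, 108_789.3, 147_920),
}

for sample, (o, e, n) in TABLE_4.items():
    print(sample, round(method_of_moments_f(o, e, n), 5))
```

The printed values agree with the F column of Table 4 for these samples.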

The PCA revealed 2 distinct clusters, but these were not separated by either host species or geographic region of collection, consistent with high diversity and unrestricted gene flow in this population (Fig. 2). The first 2 components explain relatively little of the variation (7.57% for PC1 and 6.8% for PC2), suggesting that much of the genetic variation among these populations is not captured by these 2 components alone.
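As we understand it, plink 1.9 derives principal components from a variance-standardized relationship matrix; the numpy sketch below uses the closely related singular value decomposition of a centered and standardized genotype matrix, so it illustrates the idea rather than reproducing plink's exact output. The genotypes are simulated for 18 hypothetical individuals with no population structure. With 18 samples the centered matrix has at most 17 non-zero components, so each explains on average 1/17 (about 5.9%) of the variance.

```python
"""PCA of a genotype matrix via SVD: a sketch, not plink's exact implementation."""

import numpy as np


def genotype_pca(genotypes: np.ndarray, n_components: int = 2):
    """genotypes: individuals x SNVs matrix of alternate-allele counts (0, 1, 2),
    with no missing data. Returns PC scores and the fraction of variance explained."""
    p = genotypes.mean(axis=0) / 2.0                     # alternate-allele frequency per SNV
    keep = (p > 0) & (p < 1)                             # drop monomorphic SNVs
    g, p = genotypes[:, keep], p[keep]
    std = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))  # center and scale each SNV
    u, s, _ = np.linalg.svd(std, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    scores = u[:, :n_components] * s[:n_components]
    return scores, explained[:n_components]


# Simulated, unstructured genotypes for 18 hypothetical individuals.
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=2000)
geno = rng.binomial(2, freqs, size=(18, 2000))
scores, explained = genotype_pca(geno)
print(scores.shape, np.round(100 * explained, 2))
```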

Fig. 2.

PCA of genetic variation among M. morrisella populations. The plot shows 18 individuals grouped by sampling location (Brooten, St. Paul, Rochester) and host plant (American hogpeanut or soybean). PC1 and PC2 explain 7.57 and 6.8% of the variation, respectively.

We performed an admixture analysis to further resolve the population structure (Fig. 3). At k = 2, there was a clear division between the St. Paul and Brooten samples, whereas the Rochester samples fell on both sides of this division according to host plant. At k = 3, the isoline reference genome diverged from all other samples. Up to k = 4, the American hogpeanut samples from Brooten and the soybean samples from Rochester remained the most homogeneous, with all individuals assigned to the orange group, suggesting a common origin. At k = 4 and above, the population structure no longer diverged in a defined way according to sample origin. Even the most homogeneous cluster at k = 4 is derived from both host plants, indicating that genetic divergence is not driven by host preference.

Fig. 3.

Admixture plot showing genetic structure of M. morrisella populations across different locations (St. Paul, Rochester, Brooten) and host plants (American hogpeanut, soybean). Each vertical bar represents an individual, and different colors indicate proportions of ancestry from inferred genetic clusters. The number of clusters (K) ranges from 2 to 8, revealing increasing genetic complexity. Clear differentiation based on host plant is visible at lower K values, while location-specific variation and admixture become more apparent at higher K values.

Mitochondrial population structure

Mitochondrial sequence divergence was used as a secondary method to detect any population structure in M. morrisella. We compared complete mitogenomes independently assembled from each of the 3 biological replicates collected across host plants and geographic regions (Fig. 4). All field samples clustered separately from the F2 isoline reference mitogenome. This is consistent with the genetic divergence commonly seen between laboratory-reared and wild insect populations, which can arise from adaptation to laboratory conditions, inbreeding, or genetic drift (Hoffmann and Ross 2018). However, there was very little sequence divergence among the field samples, and no consistent monophyly was observed by host plant or geographic region. The short branch lengths reflect the small number of substitutions across the entire population.
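One simple way to quantify divergence among aligned mitogenomes is the uncorrected pairwise p-distance, which is the proportion of compared sites that differ. The Python sketch below is a minimal illustration that assumes the mitogenomes are already aligned to equal length. It skips gaps and ambiguous bases and applies no substitution model, so it is not the tree-building method behind Fig. 4. The sequences and names in the example are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped; no substitution model is applied.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differing += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named, aligned sequences."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}


# Hypothetical 10-bp aligned fragments, not real mitogenome sequence
seqs = {
    "isoline": "ACGTACGTAC",
    "field_1": "ACGTACGTTC",
    "field_2": "ACGTACGTTC",
}
for pair, d in distance_matrix(seqs).items():
    print(pair, d)
```

In this toy example, the isoline differs from each field sequence at 1 of 10 sites (p = 0.1), and the 2 field sequences are identical. Near-zero distances like these among field samples correspond to the short branches in a mitochondrial tree.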

Fig. 4.

Phylogenetic tree of 18 M. morrisella individuals based on mitochondrial genome sequences. Individuals are grouped by sampling location (St. Paul, Rochester, Brooten) and host plant (American hogpeanut, soybean). The tree is rooted with the reference isoline as an outgroup. Some clustering by location is observed (e.g., Brooten and Rochester), but there is no clear pattern by host plant. Bootstrap values indicate support for major branches; values over 90 indicate well-supported relationships. The scale bar represents genetic distance.

Conclusion

The new reference genome for M. morrisella provides insight into the species' place in the Gracillariidae family tree, showing high genomic heterozygosity and early divergence from related species. The lack of detectable population structure at the nuclear and mitochondrial genome levels suggests a large population with flexible dietary preferences and provides no evidence of genetically driven host preference. Overall, these findings advance knowledge of the biology of M. morrisella and will help establish a foundation for developing management strategies for this insect.

Data availability

Assembly and BioSample information are available under NCBI BioProject PRJNA1173748, BioSample SAMN44319150, and accession number JBIOAS000000000. Raw sequencing reads are available from the NCBI Sequence Read Archive (SRA) under accession number SRR32170798.

Supplemental material available at G3 online.

Acknowledgments

We thank Fabio Führ for assistance collecting and rearing the insects. We also thank Gloria Melotto for reviewing an early version of the manuscript.

Funding

This research was supported by the North Central Soybean Research Program and the Minnesota Rapid Agricultural Response Fund.

Author contributions

A.V.R. and R.L.K. provided the insect samples. C.W. performed the extractions, sequencing, and quality analyses. C.F. assembled the genome and performed population analyses. C.F., A.V.R., and R.L.K. wrote and edited the manuscript. R.L.K. and C.F. secured funding to support this work. All authors have read and approved the manuscript prior to submission.

Literature cited

Cheng H, Concepcion GT, Feng X, Zhang H, Li H. 2021. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat Methods. 18(2):170–175. doi:.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, et al. 2021. Twelve years of SAMtools and BCFtools. Gigascience. 10(2):giab008. doi:.

Davis DR. 1987. The Gracillariidae, blotch leaf miners. In: Stehr FW, editor. Immature Insects. Dubuque (IA): Kendall/Hunt Publishing Company. p. 372–376.

Davis DR, De Prins J. 2011. Systematics and biology of the new genus Macrosaccus with descriptions of two new species (Lepidoptera, Gracillariidae). ZooKeys. 98:29–82. doi:.

De Prins J, De Prins W. 2024. Global Taxonomic database of Gracillariidae (Lepidoptera). [Accessed 2024 Aug 25]. http://www.gracillariidae.net.

Flynn JM, Hubley R, Goubert C, Rosen J, Clark AG, Feschotte C, Smit AF. 2020. RepeatModeler2 for automated genomic discovery of transposable element families. Proc Natl Acad Sci U S A. 117(17):9451–9457. doi:.

Garnier-Géré P, Chikhi L. 2013. Population subdivision, Hardy–Weinberg equilibrium and the Wahlund effect. In: eLS. Chichester (UK): John Wiley and Sons, Ltd.

Guan D, McCarthy SA, Wood J, Howe K, Wang Y, Durbin R. 2020. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics. 36(9):2896–2898. doi:.

Hoffmann AA, Ross PA. 2018. Rates and patterns of laboratory adaptation in (mostly) insects. J Econ Entomol. 111(2):501–509. doi:.

Huang N, Li H. 2023. Compleasm: a faster and more accurate reimplementation of BUSCO. Bioinformatics. 39(10):btad595. doi:.

Kappel F, Proctor JT. 1986. Simulated spotted tentiform leafminer injury and its influence on growth and fruiting of apple trees. J Am Soc Hortic Sci. 111(1):64–69. doi:.

Kawahara AY, Plotkin D, Ohshima I, Lopez-Vaamonde CA, Houlihan PR, Breinholt JW, Kawakita A, Xiao LE, Regier JC, Davis DR, et al. 2017. A molecular phylogeny and revised higher-level classification for the leaf-mining moth family Gracillariidae and its implications for larval host-use evolution. Syst Entomol. 42(1):60–81. doi:.

Koch RL, Moisan-De Serres J, Ribeiro AV. 2021. First reports of Macrosaccus morrisella (Lepidoptera: Gracillariidae) feeding on soybean, Glycine max (Fabales: Fabaceae). J Integr Pest Manag. 12(1):44. doi:.

Kogan M. 1981. Dynamics of insect adaptations to soybean: impact of integrated pest management. Environ Entomol. 10(3):363–371. doi:.

Kopelman NM, Mayzel J, Jakobsson M, Rosenberg NA, Mayrose I. 2015. Clumpak: a program for identifying clustering modes and packaging population structure inferences across K. Mol Ecol Resour. 15(5):1179–1191. doi:.

Liu WH, Dai XH, Xu JS. 2015. Influences of leaf-mining insects on their host plants: a review. Collect Bot. 34:e005. doi:.

Melotto G, Potter BD, Koch RL, Lindsey AR. 2023. Spatial and temporal dynamics of soybean gall midge (Resseliella maxima) parasitism by Synopeas maximum. Pest Manag Sci. 79(12):5096–5105. doi:.

Menger J, Ribeiro A, Fuhr F, Koch RL. 2024. Laboratory rearing methods for the soybean tentiform leafminer (Lepidoptera: Gracillariidae), a new pest of soybean. Great Lakes Entomol. 57(1):63–68. doi:.

Mussmann SM, Douglas MR, Chafin TK, Douglas ME. 2023. AdmixPipe v3: facilitating population structure delimitation from SNP data. Bioinform Adv. 3(1):vbad168. doi:.

Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, De Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 81(3):559–575. doi:.

Purcell LC, Salmeron M, Ashlock L, editors. 2014. Soybean growth and development. In: Arkansas Soybean Prod Handbook (vol. 197). Fayetteville (AR): University of Arkansas. p. 1–8.

Ribeiro AV, Koch RL. 2024. The soybean tentiform leafminer has been found in 51 counties in Minnesota, North Dakota and South Dakota. Minnesota Crop News. [Accessed 2024 Nov 8]. https://blog-crop-news.extension.umn.edu/2024/01/the-soybean-tentiform-leafminer-has.html.

Ribeiro AV, Menger JP, Führ FM, Koch RL. 2024. Immature development and adult longevity of the soybean tentiform leafminer (Lepidoptera: Gracillariidae). Environ Entomol. 53(4):723–729. doi:.

Simon JC, d’Alencon E, Guy E, Jacquin-Joly E, Jaquiery J, Nouhaud P, Peccoud J, Sugio A, Streiff R. 2015. Genomics of adaptation to host-plants in herbivorous insects. Brief Funct Genomics. 14(6):413–423. doi:.

Smit AFA, Hubley R, Green P. 2013. RepeatMasker Open-4.0. 2013–2015. [Accessed 2024 Nov 8]. http://www.repeatmasker.org.

Uliano-Silva M, Ferreira JG, Krasheninnikova K, Formenti G, Abueg L, Torrance J, Myers EW, Durbin R, Blaxter M, McCarthy SA. 2023. MitoHiFi: a python pipeline for mitochondrial genome assembly from PacBio high fidelity reads. BMC Bioinformatics. 24(1):288. doi:.

Ullah MI, Arshad M, Ali S, Mehmood N, Khalid S, Afzal M. 2020. Physiological characteristics of Citrus plants infested with citrus leafminer, Phyllocnistis citrella (Lepidoptera: Gracillariidae). Int J Fruit Sci. 20(Suppl. 2):S871–S883. doi:.

USDA. 2024a. Plants database. [Accessed 2024 Nov 8]. https://plants.usda.gov/.

USDA. 2024b. Country summary. [Accessed 2024 Nov 8]. https://ipad.fas.usda.gov/countrysummary/.

Van Nieukerken EJ, Kaila L, Kitching IJ, Kristensen NP, Lees DC, Minet J, Mitter C, Mutanen M, Regier JC, Simonsen TJ, et al. 2011. Order Lepidoptera Linnaeus, 1758. In: Zhang Z-Q, editor. Animal Biodiversity: An Outline of Higher-level Classification and Survey of Taxonomic Richness. Auckland (New Zealand): Magnolia Press. p. 1–82.

Zheng Z, Li S, Su J, Leung AW, Lam TW, Luo R. 2022. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat Comput Sci. 2(12):797–803. doi:.

Author notes

Conflicts of interest: The author(s) declare no conflicts of interest.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Editor: D Bergstralh

Supplementary data