Abstract

A draft assembly of the rainforest tree Rhodamnia argentea Benth. (malletwood, Myrtaceae) revealed contaminating DNA sequences that most closely matched those of mites in the family Eriophyidae. Eriophyoid mites are plant parasites that often induce galls or other deformities on their host plants. They are notable for their small size (averaging 200 μm), distinctive four-legged body structure, and heavily streamlined genomes, which are among the smallest known of all arthropods. The contaminating mite sequences were assembled into a high-quality, gapless, telomere-to-telomere nuclear genome. The entire genome was assembled into two fully contiguous chromosomes, each capped with a novel TTTGG or TTTGGTGTTGG telomere sequence, and exhibited clear signs of genome reduction (34.3 Mbp total length, 68.6% arachnid Benchmarking Universal Single-Copy Ortholog [BUSCO] completeness). Phylogenomic analysis confirmed that this genome is that of a previously unsequenced eriophyoid mite. Although the species remains unidentified, this complete nuclear genome provides a valuable resource for investigating invertebrate genome reduction.

Significance

Eriophyoid mites are microscopic, four-legged, plant-infesting mites that exhibit heavily streamlined genomes, in which many of the normal genes have been lost. This study reports a complete (gapless) telomere-to-telomere assembly of the nuclear genome of an eriophyoid mite from contamination in host tree sequencing data. The assembly provides a powerful resource to investigate the evolutionary phenomenon of genome reduction.

Trees are hosts to a diverse range of pests and diseases (Gougherty and Davies 2022). Eriophyoid mites (Acari: Eriophyoidea) are a hyperdiverse lineage of over 5,000 species that feed on plants (Zhang 2017; Ueckermann 2010). These tiny four-legged mites, averaging 200 μm in length (Amrine et al. 2003), have the distinct ability to induce galls (de Lillo et al. 2018; Lindquist et al. 1996). As well as being physically small, eriophyoid mites have very small genomes. For example, the 32.5 Mbp genome of the tomato russet mite Aculops lycopersici (Tryon, 1917) (Greenhalgh et al. 2020) is the smallest yet reported for any arthropod. It has few transposable elements, tiny intergenic regions, and is remarkably intron-poor, as more than 80% of coding genes are intronless (Greenhalgh et al. 2020). Reminiscent of microbial eukaryotes, this reduction of nuclear genome content is a consequence of genome streamlining (Arkhipova 2018; Hessen et al. 2010).

Pests and pathogens can often contribute contaminating DNA to genome assembly projects. We recently sequenced and assembled the genome of Rhodamnia argentea Benth. (malletwood) to inform conservation management (Chen et al. 2024). During this process, arthropod sequences were detected as contaminants at surprisingly high read depths. While these were hard to separate using linked-read sequencing alone, the addition of Oxford Nanopore Technologies (ONT) long reads made it possible to isolate and independently assemble the tree and mite genomes. Here, we present a telomere-to-telomere reference genome of the contaminating eriophyoid mite, assembled from ONT long reads and 10x Genomics Chromium linked reads. Our efforts to isolate and identify the mite itself were not successful. Nevertheless, our complete and gapless nuclear genome of this unidentified species shows dramatic streamlining of content and will provide an invaluable resource for future studies of arthropod genome reduction.

Genome Sequencing and Assembly

The sampling and sequencing of the host tree R. argentea are described in Chen et al. (2024). Briefly, young leaves of R. argentea were sampled from a specimen at the Royal Botanic Garden Sydney (NCBI BioSample SAMN19698777, ToLID drRhoArge1) and high molecular weight gDNA was extracted for ONT and 10x Genomics Chromium (Illumina) sequencing, following size selection. Scaffolding of the tree genome to chromosome level was achieved through Hi-C sequencing (Phase Genomics Proximo Hi-C Plant kit). ONT and 10x reads were partitioned into three sets based on mapping to the contaminated R. argentea genome: (i) R. argentea reads, (ii) contamination reads mapping to removed scaffolds, and (iii) unmapped reads. Contamination and unmapped reads were then combined (“non-drRhoArge1 reads”). The 10x reads were trimmed with bbmap v38.51 (Bushnell 2014), removing 30 bp from the 5′ end of R1 and 10 bp from the 5′ end of R2 and quality trimming below Q20, and were then filtered to sequences over 100 bp long.
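The three-way read partition described above can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' pipeline: the function name `partition_reads`, the scaffold names, and the dictionary input (read ID mapped to the scaffold it aligned to, or `None` if unmapped) are all assumptions made for the example.

```python
from collections import defaultdict

def partition_reads(read_hits, contaminant_scaffolds):
    """Split read IDs into host, contamination, and unmapped sets.

    read_hits: dict of read ID -> scaffold name the read mapped to (None if unmapped).
    contaminant_scaffolds: names of scaffolds removed from the host assembly.
    """
    sets = defaultdict(list)
    for read_id, scaffold in read_hits.items():
        if scaffold is None:
            sets["unmapped"].append(read_id)
        elif scaffold in contaminant_scaffolds:
            sets["contamination"].append(read_id)
        else:
            sets["host"].append(read_id)
    return sets

# Hypothetical toy input: four reads, one contaminant scaffold.
read_hits = {"r1": "chr01", "r2": "contam_07", "r3": None, "r4": "chr05"}
parts = partition_reads(read_hits, {"contam_07"})

# Contamination and unmapped reads are pooled, as for the "non-drRhoArge1 reads".
non_host = sorted(parts["contamination"] + parts["unmapped"])
print(non_host)
```

Pooling the unmapped reads with the contamination reads keeps any contaminant sequence that was never assembled into the draft host genome.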

Non-drRhoArge1 ONT reads were assembled with Flye v2.9 (Kolmogorov et al. 2019) with a target genome size (based on the filtered scaffolds) of 35 Mbp. Scaffolding and gap-filling were done with LongStitch-ARKS v1.0.1 (Coombe et al. 2021) and low-quality scaffolds were removed with Diploidocus v1.1.1 (cycle mode) (Chen et al. 2022). The genome was then polished with Hypo v1.0.3 (Kundu et al. 2019). The assembly was subjected to a final contamination screen with FCS-GX v0.4.0 (Astashyn et al. 2024) and Tiara v1.0.3 (Karlicki et al. 2022). FCS-GX was run against the NCBI gxdb (build date 2023-01-24). DepthKopy v1.0.1 analysis was performed to check for low-quality/coverage sequences. Assembly completeness was evaluated with Benchmarking Universal Single-Copy Ortholog (BUSCO) v5.2.2 (Manni et al. 2021) against the arachnida_odb10 (n = 2,934), arthropoda_odb10 (n = 1,013), metazoa_odb10 (n = 954), and eukaryota_odb10 (n = 255) datasets. DepthSizer v1.6.2 (Chen et al. 2022) was used to investigate the read depth of the mite genome contamination.

The R. argentea reads had a mean read depth of 53× (Chen et al. 2024), while contaminants were at 35.7×. BUSCO completeness (arachnida_odb10, n = 2,934) was low, at 68.6%, including a high “Duplicated” rate of 4.5% (Table 1). DepthKopy predicted that 132/135 (97.8%) of the Duplicated genes are real duplications, including all 132 for which both copies were on a chromosome scaffold, consistent with the high genome quality. BUSCO completeness increased as the taxonomic specificity was reduced, with 87.4% of genes complete and only 6.7% missing for the eukaryote database (supplementary table S1, Supplementary Material online).
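The idea behind a depth-based duplication check can be illustrated with a short Python sketch. It is not DepthKopy's implementation: it simply divides a region's mean read depth by the single-copy depth (35.7, as reported here) and treats a pair of duplicated copies as genuine when each copy sits near copy number 1 rather than near 0.5. The 0.75 cutoff and the example depths are hypothetical.

```python
SINGLE_COPY_DEPTH = 35.7  # modal single-copy read depth reported for the mite

def copy_number(mean_depth, single_copy_depth=SINGLE_COPY_DEPTH):
    """Convert a region's mean read depth into an estimated copy number."""
    return mean_depth / single_copy_depth

def looks_like_real_duplication(copy_depths, min_cn=0.75):
    """Hypothetical rule: every assembled copy must have roughly full depth.

    If one genomic locus had been assembled twice (a false duplication),
    its reads would be shared between the copies and each would sit near 0.5.
    """
    return all(copy_number(depth) >= min_cn for depth in copy_depths)

# Illustrative depths only, not values from the study.
print(looks_like_real_duplication([36.1, 34.9]))  # both copies near CN 1
print(looks_like_real_duplication([18.2, 17.5]))  # both copies near CN 0.5
```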

Table 1

Genome assembly and annotation statistics

Statistic                     RARGMITEv1
Technology                    ONT, 10x
Total length (bp)             34,344,194
No. of scaffolds              2
  N50 (bp)                    17,366,404
  L50                         1
No. of contigs                2
  N50 (bp)                    17,366,404
  L50                         1
N bases                       0
GC (%)                        42.08
BUSCO complete (genome)       68.6% (2,012)
  Single-copy (genome)        64.1% (1,880)
  Duplicated (genome)         4.5% (132)
BUSCO fragmented (genome)     1.3% (38)
BUSCO missing (genome)        30.1% (883)
Protein-coding genes          8,050
mRNAs                         9,244
rRNAs                         18
tRNAs                         56
BUSCO complete (proteome)     70.3% (2,063)
  Single-copy (proteome)      65.1% (1,909)
  Duplicated (proteome)       5.2% (154)
BUSCO fragmented (proteome)   1.3% (38)
BUSCO missing (proteome)      28.4% (833)

GC, guanine-cytosine.
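As a quick check on the contiguity statistics in Table 1, the sketch below computes N50 and L50 from a list of sequence lengths. The length of the second chromosome is not reported directly; here it is derived by subtracting the N50 scaffold from the total length, which assumes the assembly consists of exactly the two scaffolds listed. The function name `n50_l50` is illustrative.

```python
def n50_l50(lengths):
    """Return (N50, L50): the length at which the running total of the
    longest sequences first reaches half the assembly, and how many
    sequences were needed to get there."""
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running * 2 >= total:
            return length, count
    raise ValueError("no sequence lengths supplied")

total_length = 34_344_194          # Table 1, total length (bp)
longer_chr = 17_366_404            # Table 1, scaffold N50 (bp)
shorter_chr = total_length - longer_chr  # derived, not reported in the table

n50, l50 = n50_l50([longer_chr, shorter_chr])
print(n50, l50)

# BUSCO percentages follow from the counts and n = 2,934 (arachnida_odb10).
complete_pct = round(100 * 2_012 / 2_934, 1)
print(complete_pct)
```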

A Gapless Telomere-to-telomere Reference Genome for an Unidentified Eriophyoid Mite

Of the six scaffolds assembled from contamination and unmapped reads, two were gapless chromosomes. These comprised 99.64% of the assembly and were capped at each end with a repeating telomere-like sequence. After quality and contamination assessment, the four unplaced scaffolds totaling 124,586 bp were removed as potential contamination (one scaffold, two contigs) or a false duplication (one contig). The final 34.3 Mbp mite genome consisted of two gapless telomere-to-telomere chromosomes (Table 1, Fig. 1). Although this genome was presumably assembled from DNA from multiple individuals, its high contiguity (Fig. 1) and consistent sequencing depth indicate that the presence of structural variants has not hampered the assembly of a representative reference genome. This might be aided by a lack of repetitive content (see supplementary table S2, Supplementary Material online). No mitogenome was identified, possibly due to size selection during sequencing (Chen et al. 2024).
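The share of the assembly held in the two chromosomes can be reproduced from the figures given in the text and Table 1. The short calculation below assumes that the Table 1 total length (34,344,194 bp) refers to the two chromosomes only, so that the pre-filter assembly is that total plus the 124,586 bp of removed scaffolds.

```python
chromosomes_bp = 34_344_194  # Table 1 total length (two chromosomes)
removed_bp = 124_586         # four unplaced scaffolds removed after screening

pre_filter_bp = chromosomes_bp + removed_bp
chromosome_share = 100 * chromosomes_bp / pre_filter_bp

print(f"{pre_filter_bp:,} bp before filtering")
print(f"chromosomes = {chromosome_share:.2f}% of the six-scaffold assembly")
```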

Fig. 1.

Telomere-to-telomere mite genome. a) Schematic of the two telomere-to-telomere chromosomes. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Triangles along chromosome borders represent Duplicated BUSCO genes (top, forward strand; bottom, reverse strand). Other symbols are rRNA gene predictions. The main rDNA repeat is marked with a “B”. Genes are filled according to DepthKopy predicted copy number (CN). Blue and red connections between chromosomes represent collinear and inverted pairs of Duplicated BUSCO genes on different chromosomes. b) Schematic of rDNA repeat region from chromosome 1, with six full repeats assembled. c) DepthKopy predicted copy number distribution for the assembled mite. Mapped ONT read depths have been converted into copy number distributions using the modal single-copy read depth from Complete BUSCO genes of 35.7×. Whiskers extend to the most extreme values no further than 1.5 times the interquartile range. BUSCO, single-copy complete BUSCO genes; Duplicated, duplicated complete BUSCO genes (all copies); GeMoMa, GeMoMa gene models; rDNA, rDNA gene models; tRNA, tRNA gene models; Repeats, complex repeats; STR, short tandem repeats; LowComp, low complexity regions; Windows, 10 kb tiled windows. d) Synteny between assembled mite chromosomes and A. lycopersici (tomato russet mite) (ACULYCv1). Green and yellow-colored boxes represent chromosomes/scaffolds containing complete BUSCO genes. Blue and red blocks represent blocks of collinear and inverted synteny based on 2+ adjacent BUSCO genes in the same relative orientation. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Yellow triangles mark Duplicated BUSCO genes. Red plus symbols represent assembly gaps.

Manual inspection of the four chromosome ends identified a putative 5 bp TTTGG telomere consensus sequence, which appears to be a novel variant of the more common 6 bp arthropod TTTGGG telomere sequence. It does not match a known consensus telomere repeat in the Telomerase database (Podlevsky et al. 2008) or TeloBase (Lyčka et al. 2024). Assembled telomere lengths were estimated by extracting the low-complexity A/C-rich 5′ ends and G/T-rich 3′ ends from each chromosome: 17,399 bp (5′) and 12,022 bp (3′) for chromosome 1; 10,729 bp (5′) and 6,824 bp (3′) for chromosome 2. In total, 0.14% of the assembly was telomeric, and 3.54% of all TTTGG 5-mers occurred in these regions (∼25× enrichment), rising to 53.23% for the TTTGGTTTGG tandem repeat (∼380× enrichment) (supplementary table S3, Supplementary Material online). Nevertheless, TTTGG motifs comprised only 53.36% of the telomeric sequence, so Tandem Repeats Finder (Benson 1999) was used to identify the dominant motifs. All four telomeres returned an 11-mer consensus, TTTGGTGTTGG. The additional TGTTGG 6-mer accounted for another 44.24% of the telomeric sequence (>50× enrichment vs. nontelomeric sequence). Exact copies of the full TTTGGTGTTGG 11-mer accounted for 78.15% of the telomeric sequence but only 0.04% of the nontelomeric sequence (>2,000× enrichment), with over 75% of all TTTGGTGTTGG copies found in the telomeres.
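The telomere fraction and the first two enrichment figures can be checked with simple arithmetic. The sketch below uses one reading of "enrichment" that appears to reproduce the reported values: the percentage of all motif copies that fall in telomeres, divided by the percentage of the assembly that is telomeric (rounded to 0.14%). That definition, and the use of the Table 1 total length as the denominator, are assumptions rather than a statement of the authors' method.

```python
telomere_lengths_bp = {
    "chr1_5prime": 17_399,
    "chr1_3prime": 12_022,
    "chr2_5prime": 10_729,
    "chr2_3prime": 6_824,
}
assembly_bp = 34_344_194  # Table 1 total length

telomere_bp = sum(telomere_lengths_bp.values())
telomere_pct = round(100 * telomere_bp / assembly_bp, 2)

def fold_enrichment(pct_of_motif_copies_in_telomeres, pct_of_assembly_in_telomeres):
    """Assumed definition: observed share in telomeres / share expected by length."""
    return pct_of_motif_copies_in_telomeres / pct_of_assembly_in_telomeres

tttgg_enrichment = fold_enrichment(3.54, telomere_pct)    # TTTGG 5-mers
tandem_enrichment = fold_enrichment(53.23, telomere_pct)  # TTTGGTTTGG tandem repeat

print(telomere_bp, telomere_pct)
print(round(tttgg_enrichment), round(tandem_enrichment))
```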

To identify the sequenced mite, phylogenomic analysis was performed with the arachnida_odb10 complete BUSCO genes from the assembled mite genome, along with the corresponding BUSCO genes from 36 mite genomes that were downloaded from NCBI (4 November 2021) (supplementary table S4, Supplementary Material online). HAQESAC v1.14.0 (Edwards et al. 2007) was used to generate protein multiple sequence alignments and phylogenetic trees for all species returning a single-copy ortholog for that gene. Briefly, HAQESAC compares all proteins to the query and removes sequences that are too distant or incomplete before aligning with Clustal Omega v1.2.4 (Sievers et al. 2011). Poorly aligned sequences were removed by HAQESAC (Edwards et al. 2007) and a phylogeny was inferred by FastTree v2.1.11 (Price et al. 2010) before being mid-point rooted. No additional filtering was performed. Trees were then combined into a consensus tree with ASTRAL v5.7.8 (Zhang et al. 2018). Synteny analysis based on BUSCO genes (arachnida_odb10 database) was used to compare the mite genome to ACULYCv1 (Greenhalgh et al. 2020) and visualized with ChromSyn (Edwards et al. 2022). Telomere prediction was performed using Telociraptor v0.9.0 (Edwards 2023) with forward and reverse regular expressions C{1,3}A{2,4} and T{2,4}G{1,3}, based on the simpler TTTGG telomere repeat identified, and using TIDK v0.2.31 (Brown et al. 2023) with the sequence CCAAA. Ribosomal RNA (rRNA) genes were predicted using barrnap v0.9.0 (Seemann 2018) in eukaryotic mode.
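To show how the two regular expressions behave on the 11-mer repeat, the sketch below measures what fraction of a terminal window is covered by pattern matches. It is a simplified illustration and not Telociraptor's algorithm: the function `motif_coverage`, the 55 bp window, and the toy sequence (five copies of the 11-mer at each end around an arbitrary internal stretch) are assumptions made for the example.

```python
import re

FORWARD = re.compile(r"C{1,3}A{2,4}")  # expected at the 5' end
REVERSE = re.compile(r"T{2,4}G{1,3}")  # expected at the 3' end

def motif_coverage(sequence, pattern, window, end):
    """Fraction of the terminal window covered by non-overlapping matches."""
    region = sequence[:window] if end == "5prime" else sequence[-window:]
    covered = sum(m.end() - m.start() for m in pattern.finditer(region))
    return covered / len(region)

unit = "TTTGGTGTTGG"     # 11-mer consensus reported for the 3' ends
unit_rc = "CCAACACCAAA"  # its reverse complement, as seen at the 5' ends
middle = "ACGTCTAGCATGCGATCGTACGCTAGCTAGCATGCTAGCGATCGATGCATGC"  # toy internal sequence
toy_chromosome = unit_rc * 5 + middle + unit * 5

five_prime = motif_coverage(toy_chromosome, FORWARD, 55, "5prime")
three_prime = motif_coverage(toy_chromosome, REVERSE, 55, "3prime")
internal = motif_coverage(middle, FORWARD, len(middle), "5prime")

print(f"{five_prime:.2f} {three_prime:.2f} {internal:.2f}")
```

In this toy case each 11-mer contributes two matches (5 bp and 4 bp), so the simple patterns still cover most of the terminal window even though the repeat is not a pure TTTGG array.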

The phylogenomic analysis provided strong support (98.6% supporting gene quartets) to position the new mite within the superfamily Eriophyoidea, a highly diverse lineage of mites with high host specificity in over 80% of the group (Skoracka et al. 2010) (Fig. 2). However, even the closest sequenced relative, tomato russet mite (A. lycopersici) (Greenhalgh et al. 2020), was distantly related, with little conserved synteny and only 71% protein sequence identity (Fig. 1). BLAST+ blastn searches of the 18S and 28S rRNA genes against NCBI core_nt (12 December 2024) (Camacho et al. 2009) confirmed this placement, with the top ten hits for both genes all to members of the Eriophyidae family (data not shown). Detailed phylogenomic analysis is beyond the scope of this study, but our phylogeny places the superfamily Eriophyoidea outside the suborder Trombidiformes, in accordance with other whole genome studies (Bolton et al. 2023).

Eriophyoid mites are microscopic (averaging 200 to 300 µm) and can reside within host tissue, presenting additional challenges for identification and removal (Amrine and Manson 1996). Their presence can be seasonal, which further complicates detection (Chetverikov et al. 2022). We examined multiple leaves from the tree at the Royal Botanic Garden Sydney using a dissecting microscope but were unable to detect any eriophyoid mites. That the contaminating mite was collected and sequenced on occasions years apart (first for 10x linked reads, then for ONT) suggests that the contaminant was persistent and abundant, and was not removed by rinsing the leaves in water. Additionally, eriophyoid mites were absent from two specimens of R. argentea from the Australian National Botanic Garden examined for comparison. Vagrant mites belonging to other groups were detected (unidentified Phytoseiidae and Oribatida), but these are presumably less host specific (Oldfield 1996). We have been unable to locate any published records of eriophyoid mites on trees in the genus Rhodamnia.

While this study provides no concrete data on the mite's taxonomy or host specificity, Scott (1979) observed fruits of R. argentea that were deformed by galls, which may indicate mite-induced damage. Eriophyoid mites can serve as vectors for viruses (de Lillo et al. 2018) and have been observed as vectors for rust fungi (Puccinia spp.) (Gamliel-Atinsky et al. 2010). Given the severity of the threat of myrtle rust to species of Rhodamnia, future work may investigate the role of these mites as vectors.

Fig. 2.

Mite phylogenomics and genome reduction. a) ASTRAL consensus tree for unknown mite (square) and 36 NCBI Acari genomes using 1,877 BUSCO complete gene trees (arachnida_odb10). Internal nodes are labeled with the percentage of quartets in the BUSCO gene trees that support the ancestral branch. Values below 50 are not shown. Nodes labeled with numbers and symbols correspond to taxonomic groups in b). b) BUSCO completeness and compiled BUSCOMP completeness for taxonomic groups labeled in a). For each taxonomic grouping in the phylogenomic tree, numbers report the best ratings for each BUSCO gene across all species in that clade.

Genome Annotation and Evidence for Genome Reduction

Despite the low BUSCO completeness, the gapless chromosomes and high (35.7×) mean sequencing depth give great confidence that this is a complete nuclear genome. The size of the assembled mite genome is comparable to those of two previously published eriophyoid mite genomes: A. lycopersici (Greenhalgh et al. 2020) and Fragariocoptes setiger (Klimov et al. 2022). All three Eriophyoidea species showed similarly low BUSCO completeness, with many of the same genes missing. Of the 882 BUSCO genes missing in our mite genome, 832 were also missing in A. lycopersici (Fig. 2b, Eriophyidae), and 716 of these were missing in F. setiger (Fig. 2b, Eriophyoidea). In contrast, only 20 genes (0.7%) were missing across Eriophyoidea and neighboring Trombidiformes, despite nontick mites typically having small genomes (Gregory and Young 2020). This pattern of loss in Eriophyoidea genomes is consistent with genome reduction (Hessen et al. 2010) occurring early in this lineage. However, continued lineage-specific losses suggest that genome reduction is an ongoing process. This is further supported by a surprisingly high number of Duplicated BUSCO genes, which appear to represent genuine duplications, consistent with ongoing adaptive evolution. It is also notable that the proportion of missing BUSCO genes increased considerably with increasingly taxonomically restricted BUSCO datasets, suggesting that it is more derived traits that are being altered. The reasons for the disparity in genome size between ticks and other mites remain unclear, but it is plausible that small mite body sizes are made possible by streamlined genomes and/or that the evolution of small bodies has constrained genome size compared with other arthropods (Gregory and Young 2020). Increased mite genomic resources will broaden our understanding of the evolutionary mechanisms responsible for reduced genome sizes.
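The nested counts of shared missing genes correspond to set intersections: a BUSCO gene counts as missing for a clade only if it is missing in every member. The Python sketch below illustrates this with invented gene identifiers and species keys; it does not use data from the study and is not BUSCOMP's implementation.

```python
def missing_in_all(missing_by_species, species):
    """BUSCO genes missing from every listed species (missing at clade level)."""
    return set.intersection(*(missing_by_species[name] for name in species))

# Hypothetical toy data: gene IDs are placeholders, not real BUSCO identifiers.
missing_by_species = {
    "new_mite": {"g1", "g2", "g3", "g4", "g5"},
    "A_lycopersici": {"g1", "g2", "g3", "g4", "g9"},
    "F_setiger": {"g1", "g2", "g3", "g8"},
}

eriophyidae = missing_in_all(missing_by_species, ["new_mite", "A_lycopersici"])
eriophyoidea = missing_in_all(
    missing_by_species, ["new_mite", "A_lycopersici", "F_setiger"]
)

print(sorted(eriophyidae))
print(sorted(eriophyoidea))
```

Genes that drop out of the intersection as more species are added are candidates for lineage-specific loss, while those that remain are consistent with loss early in the clade.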

The mite genome was annotated with GeMoMa v1.7.1 (Keilwagen et al. 2019) using 15 invertebrate genomes annotated by NCBI as references (supplementary table S4, Supplementary Material online). Predicted transcriptome statistics were generated with SAAGA v0.7.7 (Edwards et al. 2021). Annotation completeness was assessed using BUSCO in proteome mode. Ribosomal RNA (rRNA) genes were predicted with Barrnap v0.9 (Seemann 2018) and transfer RNAs (tRNAs) were predicted with tRNAscan-SE v2.05 (Lowe and Chan 2016), implementing Infernal v1.1.2 (Nawrocki and Eddy 2013). A custom repeat library was generated with RepeatModeler v2.0.1 (-engine ncbi) and the genome was masked with RepeatMasker v4.1.0 (Tarailo-Graovac and Chen 2009). Telomeric repeat sequences were identified by manually inspecting the ends of the two chromosome-scale scaffolds and creating a consensus repeat unit. The 36 mite genomes downloaded from NCBI (4 November 2021) (supplementary table S4, Supplementary Material online) were subjected to the same BUSCO, GeMoMa, SAAGA, and RepeatModeler/RepeatMasker annotation for a consistent comparison. BUSCO results for all the mite species and selected taxonomic levels were compiled using BUSCOMP v1.1.3 (Stuart et al. 2022).

GeMoMa genome annotation predicted 8,071 protein-coding genes (9,269 mRNAs), with 18 rRNAs and 56 tRNAs. The 18 rRNA genes comprised 6 tandem copies of the complete 18S-5.8S-28S rDNA repeat unit. DepthKopy analysis revealed that the rDNA cluster was collapsed in the assembly and the 6 copies of the repeat unit represented 36 copies in the actual genome (mean copy number per gene ∼12) (Fig. 1). The genome had a low repeat content at 11.1% (supplementary table S2, Supplementary Material online). There was no sign of widespread collapsed repeats in the assembly (Fig. 1c). Along with the two other eriophyoid mites, the annotation reveals a reduction in the number of introns and transposable elements, and an overall reduction in repetitive and intergenic content (Fig. 3). Despite a reduction in gene numbers, the percentage of the genome that corresponds to exons and genes is higher in these species (Fig. 3a). While detailed annotation and investigation of gene content is beyond the scope of this study, our gapless telomere-to-telomere nuclear genome provides an excellent resource to study genome reduction, even though the mite species remains unidentified.

Fig. 3.

Comparison of mite genome attributes. Genome statistics for 37 Acari genomes are shown as density plots, with individual statistics marked for the assembled eriophyoid mite genome (RARGMITEv1; square), A. lycopersici (tomato russet mite genome ACULYCv1; circle), and F. setiger (gall mite genome FRASETv1; triangle). a) Genomic features by percentage of genome. b) Introns and transposable elements by count. c) Genome features by length. LowComp, low complexity; SSRs, simple sequence repeats; TEs, transposable elements; SEGs, single-exon genes.

Identification and Removal of Assembly Contamination

Improvements in sequencing technology have massively increased the ease with which whole genomes can be sequenced and assembled (Gupta 2022). However, as the number of genomes rises, the relative resources available per genome for curation decrease (Howe et al. 2021). Genome contamination is a well-known problem and is prevalent among short-read and unpublished draft genomes (Kumar et al. 2013; Chrisman et al. 2022; Steinegger and Salzberg 2020). The mite contamination in the R. argentea draft genome was particularly difficult to identify in the initial draft assembly due to the large number of variable-quality contigs, high depth of contaminant sequencing, and lack of assemblies from related organisms. Variations in read depth and guanine-cytosine (GC) content were discounted as possible regional biases. Instead, contamination in this case was identified by targeted searches for arthropod sequences (pers. comm. W. Dermauw, J. Santolini, and S. Gerber).

Tools such as Blobtoolkit (Kumar et al. 2013), FCS-GX (Astashyn et al. 2024), Conterminator (Steinegger and Salzberg 2020), Tiara (Karlicki et al. 2022), and ContScout (Bálint et al. 2024) improve the ease and resolution of genomic DNA contamination screening. The approach used here, Taxolotl (Tobias et al. 2022), complements DNA-based methods by assigning taxonomy to protein sequences, which enables matches across greater evolutionary distances. The routine FCS-GX screening by NCBI upon submission should greatly reduce the number of genomes with significant contamination. Nevertheless, caution should be used when interpreting unusual findings (e.g. Saffar and Matin 2021). This is particularly true when using unpublished draft genomes, where it is always good practice to contact the submitting researchers. In the long term, the community needs improved mechanisms by which contamination in genomes can be removed from public repositories. Ideally, expert institutions (e.g. zoological/botanical gardens and museums) will be given the resources and power to leverage high-confidence data and taxonomic knowledge to identify and flag contamination for removal.

Conclusions

The power of long-read sequencing is illustrated by the ability to recover a gapless telomere-to-telomere mite genome from contaminating DNA. At 34.3 Mbp, our mite assembly is not much bigger than the (less intact) 32.5 Mbp A. lycopersici genome, and is notable in its completeness despite poor BUSCO results. While the identity of the contaminating mite remains a mystery, the completeness of its genome should provide a valuable resource for investigating genome reduction and other characteristic traits of eriophyoid mites. The novel TTTGG and TTTGGTGTTGG sequences identified could improve telomere identification in other mites if this is a shared trait.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Acknowledgments

We are grateful to Jérôme Santolini, Sophie Gerber, and Wannes Dermauw for contacting us about contamination in the draft genome. We thank Tamera Beath for assistance with sampling at the Australian National Botanic Garden. This research includes computations using the computational cluster Katana supported by Research Technology Services at UNSW Sydney (https://doi.org/10.26190/669x-a286) and the Pawsey Supercomputing Research Centre's Setonix Supercomputer (https://doi.org/10.48569/18sb-8s43), with funding from the Australian Government and the Government of Western Australia.

Author Contributions

Richard Edwards (Conceptualization, Investigation, Methodology, Software, Formal analysis, Writing—original draft, Writing—review and editing, Supervision), Stephanie Chen (Investigation, Formal analysis, Writing—original draft, Writing—review and editing), Bruce Halliday (Investigation, Writing—review and editing), and Jason Bragg (Writing—review and editing, Supervision).

Funding

This research was funded by the Australian Research Council (LP18010072) and the University of New South Wales. Stephanie Chen was supported by an Australian Government Research Training Program (RTP) Scholarship.

Data Availability

The mite genome (Rarg_mite_v1.1) was uploaded to BioProject PRJNA737568, which contains sequences and genomes of the Rhodamnia argentea host plant. The raw 10x linked reads were deposited under BioProject PRJEB30444, which was created for an earlier draft genome. Additional data supporting this work can be found on OSF, doi: 10.17605/OSF.IO/EV2NY (https://osf.io/ev2ny/).

Benefit Sharing

All raw sequencing data and assembled genomes have been shared with the broader public via appropriate biological databases.

Literature Cited

Amrine
 
JW
,
Manson
 
DCM
. 1.6.3 Preparation, mounting and descriptive study of eriophyoid mites. In:
Eriophyoid mites their biology, natural enemies and control
. World Crop Pests.
Lindquist
,
EE
,
Sabelis
,
MW
, &
Bruin
,
J
, editors. Vol.
6
 
Elsevier
;
1996
. pp.
383
396
. https://www.sciencedirect.com/science/article/abs/pii/S1572437996800236?via%3Dihub

Amrine
 
JW
,
Stasny
 
TA
,
Flechtmann
 
CH
.
Revised keys to world genera of eriophyoidea (acari: prostigmata)
.
Michigan, USA
:
Indira Publishing House
;
2003
.

Arkhipova
 
IR
.
Neutral theory, transposable elements, and eukaryotic genome evolution
.
Mol Biol Evol.
 
2018
:
35
(
6
):
1332
1337
. .

Astashyn
 
A
,
Tvedte
 
ES
,
Sweeney
 
D
,
Sapojnikov
 
V
,
Bouk
 
N
,
Joukov
 
V
,
Mozes
 
E
,
Strope
 
PK
,
Sylla
 
PM
,
Wagner
 
L
, et al.  
Rapid and sensitive detection of genome contamination at scale with FCS-GX
.
Genome Biol
.
2024
:
25
(
1
):
60
. .

Bálint
 
B
,
Merényi
 
Z
,
Hegedüs
 
B
,
Grigoriev
 
IV
,
Hou
 
Z
,
Földi
 
C
,
Nagy
 
LG
.
ContScout: sensitive detection and removal of contamination from annotated genomes
.
Nat Commun.
 
2024
:
15
(
1
):
936
. .

Benson
 
G
.
Tandem repeats finder: a program to analyze DNA sequences
.
Nucleic Acids Res
.
1999
:
27
(
2
):
573
580
. .

Bolton
 
SJ
,
Chetverikov
 
PE
,
Ochoa
 
R
,
Klimov
 
PB
.
Where eriophyoidea (acariformes) belong in the tree of life
.
Insects
.
2023
:
14
(
6
):
527
. .

Bushnell
 
B.
BBMap: a fast, accurate, splice-aware aligner. 2014. https://sourceforge.net/projects/bbmap/ (Accessed March 4, 2021).

Camacho
 
C
,
Coulouris
 
G
,
Avagyan
 
V
,
Ma
 
N
,
Papadopoulos
 
J
,
Bealer
 
K
,
Madden
 
TL
.
BLAST+: architecture and applications
.
BMC Bioinformatics
.
2009
:
10
(
1
):
421
. .

Chen SH, Jones A, Lu-Irving P, Yap J-YS, van der Merwe M, Bragg JG, Edwards RJ. Chromosome-level genome assembly of the Australian rainforest tree Rhodamnia argentea (malletwood). Genome Biol Evol. 2024:16(11):evae238.

Chen SH, Rossetto M, van der Merwe M, Lu-Irving P, Yap J-YS, Sauquet H, Bourke G, Amos TG, Bragg JG, Edwards RJ. Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C. Mol Ecol Resour. 2022:22(5):1836–1854.

Chetverikov PE, Klimov PB, He Q. Vertical transmission and seasonal dimorphism of eriophyoid mites (Acariformes, Eriophyoidea) parasitic on the Norway maple: a case study. R Soc Open Sci. 2022:9(9):220820.

Chrisman B, He C, Jung J-Y, Stockham N, Paskov K, Washington P, Wall DP. The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families. Sci Rep. 2022:12(1):9863.

Coombe L, Li JX, Lo T, Wong J, Nikolic V, Warren RL, Birol I. LongStitch: high-quality genome assembly correction and scaffolding using long reads. BMC Bioinformatics. 2021:22(1):534.

de Lillo E, Pozzebon A, Valenzano D, Duso C. An intimate relationship between eriophyoid mites and their host plants – a review. Front Plant Sci. 2018:9:1786.

Edwards RJ. Telociraptor: Telomere Prediction and Genome Assembly Editing Tool. 2023. https://github.com/slimsuite/telociraptor.

Edwards RJ, Field MA, Ferguson JM, Dudchenko O, Keilwagen J, Rosen BD, Johnson GS, Rice ES, Hillier LD, Hammond JM, et al. Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome. BMC Genomics. 2021:22(1):188.

Edwards RJ, Moran N, Devocelle M, Kiernan A, Meade G, Signac W, Foy M, Park SDE, Dunne E, Kenny D, et al. Bioinformatic discovery of novel bioactive peptides. Nat Chem Biol. 2007:3(2):108–112.

Gamliel-Atinsky E, Freeman S, Maymon M, Belausov E, Ochoa R, Bauchan G, Skoracka A, Peña J, Palevsky E. The role of eriophyoids in fungal pathogen epidemiology, mere association or true interaction? In: Ueckermann EA, editor. Eriophyoid mites: progress and prognoses. Dordrecht, Netherlands: Springer; 2010. p. 191–204.

Gougherty AV, Davies TJ. A global analysis of tree pests and emerging pest threats. Proc Natl Acad Sci U S A. 2022:119(13):e2113298119.

Greenhalgh R, Dermauw W, Glas JJ, Rombauts S, Wybouw N, Thomas J, Alba JM, Pritham EJ, Legarrea S, Feyereisen R, et al. Genome streamlining in a minute herbivore that manipulates its host plant. Weigel D, Heckel D, editors. eLife. 2020:9:e56689.

Gregory TR, Young MR. Small genomes in most mites (but not ticks). Int J Acarol. 2020:46(1):1–8.

Gupta PK. Earth biogenome project: present status and future plans. Trends Genet. 2022:38(8):811–820.

Hessen DO, Jeyasingh PD, Neiman M, Weider LJ. Genome streamlining and the elemental costs of growth. Trends Ecol Evol. 2010:25(2):75–80.

Howe K, Chow W, Collins J, Pelan S, Pointon D-L, Sims Y, Torrance J, Tracey A, Wood J. Significantly improving the quality of genome assemblies through curation. Gigascience. 2021:10(1):giaa153.

Brown M, González De la Rosa PM, Mark B. A Telomere Identification Toolkit. 2023.

Karlicki M, Antonowicz S, Karnkowska A. Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics. 2022:38(2):344–350.

Keilwagen J, Hartung F, Grau J. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods Mol Biol. 2019:1962:161–177.

Klimov PB, Chetverikov PE, Dodueva IE, Vishnyakov AE, Bolton SJ, Paponova SS, Lutova LA, Tolstikov AV. Symbiotic bacteria of the gall-inducing mite Fragariocoptes setiger (Eriophyoidea) and phylogenomic resolution of the eriophyoid position among Acari. Sci Rep. 2022:12(1):3811.

Kolmogorov M, Yuan J, Lin Y, Pevzner PA. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019:37(5):540–546.

Kumar S, Jones M, Koutsovoulos G, Clarke M, Blaxter M. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front Genet. 2013:4:237.

Kundu R, Casey J, Sung W-K. HyPo: super fast and accurate polisher for long read genome assemblies. bioRxiv 2019.12.19.882506, 20 December 2019, preprint: not peer-reviewed.

Lindquist EE, Bruin J, Sabelis MW. Eriophyoid mites: their biology, natural enemies and control. Elsevier; 1996.

Lowe TM, Chan PP. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016:44(W1):W54–W57.

Lyčka M, Bubeník M, Závodník M, Peska V, Fajkus P, Demko M, Fajkus J, Fojtová M. TeloBase: a community-curated database of telomere sequences across the tree of life. Nucleic Acids Res. 2024:52(D1):D311–D321.

Manni M, Berkeley MR, Seppey M, Simão FA, Zdobnov EM. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:38(10):4647–4654.

Nawrocki EP, Eddy SR. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013:29(22):2933–2935.

Oldfield GN. 1.4.3 Diversity and host plant specificity. In: Eriophyoid mites: their biology, natural enemies and control. World Crop Pests. Lindquist EE, Sabelis MW, Bruin J, editors. Vol. 6. Elsevier; 1996. pp. 199–216.

Podlevsky JD, Bley CJ, Omana RV, Qi X, Chen JJ-L. The telomerase database. Nucleic Acids Res. 2008:36(Database):D339–D343.

Price MN, Dehal PS, Arkin AP. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS One. 2010:5(3):e9490.

Saffar A, Matin MM. Tracing foreign sequences in plant transcriptomes and genomes using OCT4, a POU domain protein. Mol Genet Genomics. 2021:296(3):677–688.

Scott AJ. A revision of Rhodamnia (Myrtaceae). Kew Bull. 1979:33(3):429–459.

Seemann T. barrnap. 2018. https://github.com/tseemann/barrnap (Accessed December 11, 2021).

Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, Li W, Lopez R, McWilliam H, Remmert M, Söding J, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011:7(1):539.

Skoracka A, Smith L, Oldfield G, Cristofaro M, Amrine JW. Host-plant specificity and specialization in eriophyoid mites and their importance for the use of eriophyoid mites as biocontrol agents of weeds. Exp Appl Acarol. 2010:51(1-3):93–113.

Steinegger M, Salzberg SL. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 2020:21(1):115.

Stuart KC, Edwards RJ, Cheng Y, Warren WC, Burt DW, Sherwin WB, Hofmeister NR, Werner SJ, Ball GF, Bateson M, et al. Transcript- and annotation-guided genome assembly of the European starling. Mol Ecol Resour. 2022:22(8):3141–3160.

Tarailo-Graovac M, Chen N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinforma. 2009:25(1):4.10.1–4.10.14.

Tobias PA, Edwards RJ, Surana P, Mangelson H, Inácio V, do Céu Silva M, Várzea V, Park RF, Batista D. A chromosome-level genome resource for studying virulence mechanisms and evolution of the coffee rust pathogen Hemileia vastatrix. bioRxiv 502101, 1 August 2022, preprint: not peer reviewed.

Ueckermann EA. Eriophyoid mites: progress and prognoses. Dordrecht, Netherlands: Springer; 2010.

Zhang C, Rabiee M, Sayyari E, Mirarab S. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018:19(S6):153.

Zhang Z-Q. Eriophyoidea and allies: where do they belong? Syst Appl Acarol. 2017:22(8):1091–1095.

Author notes

Richard J Edwards and Stephanie H Chen contributed equally to the work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].
Associate Editor: Christopher Wheat

Supplementary data