Richard J Edwards, Stephanie H Chen, Bruce Halliday, Jason G Bragg, Small but Mitey: A Gapless Telomere-to-Telomere Assembly of an Unidentified Mite With a Streamlined Genome, Genome Biology and Evolution, Volume 17, Issue 2, February 2025, evaf023, https://doi.org/10.1093/gbe/evaf023
Abstract
A draft assembly of the rainforest tree Rhodamnia argentea Benth. (malletwood, Myrtaceae) revealed contaminating DNA sequences that most closely matched those from mites in the family Eriophyidae. Eriophyoid mites are plant parasites that often induce galls or other deformities on their host plants. They are notable for their small size (averaging 200 μm), distinctive four-legged body structure, and heavily streamlined genomes, which are among the smallest known of all arthropods. Contaminating mite sequences were assembled into a high-quality gapless telomere-to-telomere nuclear genome. The entire genome was assembled on two fully contiguous chromosomes, capped with a novel TTTGG or TTTGGTGTTGG telomere sequence, and exhibited clear signs of genome reduction (34.5 Mbp total length, 68.6% arachnid Benchmarking Universal Single-Copy Ortholog completeness). Phylogenomic analysis confirmed that this genome is that of a previously unsequenced eriophyoid mite. Despite its unknown identity, this complete nuclear genome provides a valuable resource to investigate invertebrate genome reduction.
Eriophyoid mites are microscopic, four-legged, plant-infesting mites that exhibit heavily streamlined genomes, in which many of the normal genes have been lost. This study reports a complete (gapless) telomere-to-telomere assembly of the nuclear genome of an eriophyoid mite from contamination in host tree sequencing data. The assembly provides a powerful resource to investigate the evolutionary phenomenon of genome reduction.
Trees are hosts to a diverse range of pests and diseases (Gougherty and Davies 2022). Eriophyoid mites (Acari: Eriophyoidea) are a hyperdiverse lineage of over 5,000 species that feed on plants (Zhang 2017; Ueckermann 2010). These tiny four-legged mites, averaging 200 μm in length (Amrine et al. 2003), have the distinct ability to induce galls (de Lillo et al. 2018; Lindquist et al. 1996). As well as being physically small, eriophyoid mites have very small genomes. For example, the 32.5 Mbp genome of the tomato russet mite Aculops lycopersici (Tryon, 1917) (Greenhalgh et al. 2020) is the smallest yet reported for any arthropod. It has few transposable elements, tiny intergenic regions, and is remarkably intron-poor, as more than 80% of coding genes are intronless (Greenhalgh et al. 2020). Reminiscent of microbial eukaryotes, this reduction of nuclear genome content is a consequence of genome streamlining (Arkhipova 2018; Hessen et al. 2010).
Pest and pathogen DNA often contaminates genome assembly projects. We recently sequenced and assembled the genome of Rhodamnia argentea Benth. (malletwood) to inform conservation management (Chen et al. 2024). During this process, arthropod sequences were detected as contaminants at surprisingly high read depths. While these were hard to separate using linked-read sequencing, the addition of Oxford Nanopore Technologies (ONT) long reads made it possible to isolate and independently assemble the tree and mite genomes. Here, we present a telomere-to-telomere reference genome of the contaminating eriophyoid mite, assembled from ONT long reads and 10x Genomics Chromium linked reads. Our efforts to isolate and identify the mite itself were not successful. Nevertheless, our complete and gapless nuclear genome of this unidentified species shows dramatic streamlining of content and will provide an invaluable resource for future studies of arthropod genome reduction.
Genome Sequencing and Assembly
The sampling and sequencing for the host tree R. argentea are described in Chen et al. (2024). Briefly, young leaves of R. argentea were sampled from a specimen at the Royal Botanic Garden Sydney (NCBI BioSample SAMN19698777, ToLID drRhoArge1) and high molecular weight gDNA was extracted for ONT and 10x Genomics Chromium (Illumina) sequencing, following size selection. Scaffolding of the tree genome to chromosome level was achieved through Hi-C sequencing (Phase Genomics Proximo Hi-C Plant kit). ONT and 10x reads were partitioned into three sets based on mapping to the contaminated R. argentea genome: (i) R. argentea reads, (ii) contamination reads mapping to removed scaffolds, and (iii) unmapped reads. Contamination and unmapped reads were then combined (“non-drRhoArge1 reads”). The 10x reads were trimmed (5′ 30 bp of R1, 5′ 10 bp of R2, quality trimming < Q20) with bbmap v38.51 (Bushnell 2014) and filtered to sequences over 100 bp long.
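The three-way read partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pipeline used in the study: the function name, the read-to-scaffold dictionary, and the scaffold names are all hypothetical, and each read is assumed to have already been assigned to at most one scaffold by a read mapper.

```python
from collections import defaultdict


def partition_reads(read_to_scaffold, removed_scaffolds):
    """Split read IDs into host, contamination, and unmapped sets.

    read_to_scaffold: dict of read ID -> scaffold name (None if unmapped).
    removed_scaffolds: set of scaffold names flagged as contamination.
    """
    bins = defaultdict(list)
    for read_id, scaffold in read_to_scaffold.items():
        if scaffold is None:
            bins["unmapped"].append(read_id)
        elif scaffold in removed_scaffolds:
            bins["contamination"].append(read_id)
        else:
            bins["host"].append(read_id)
    # Contamination and unmapped reads are pooled before assembly
    bins["non_host"] = bins["contamination"] + bins["unmapped"]
    return dict(bins)


# Hypothetical toy input: two host reads, one contaminant read, one unmapped read
mapping = {"r1": "chr01", "r2": "ctg_mite_7", "r3": None, "r4": "chr02"}
result = partition_reads(mapping, removed_scaffolds={"ctg_mite_7"})
print(result["host"])
print(result["non_host"])
```

Pooling the unmapped reads with the contamination reads matters here because reads from parts of the contaminant genome that were never assembled in the draft would otherwise be lost.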
Non-drRhoArge1 ONT reads were assembled with Flye v2.9 (Kolmogorov et al. 2019) with a target genome size (based on the filtered scaffolds) of 35 Mbp. Scaffolding and gap-filling were done with LongStitch-ARKS v1.0.1 (Coombe et al. 2021) and low-quality scaffolds were removed with Diploidocus v1.1.1 (cycle mode) (Chen et al. 2022). The genome was then polished with Hypo v1.0.3 (Kundu et al. 2019). The assembly was subjected to a final contamination screen by running FCS-GX v0.4.0 (Astashyn et al. 2024) and Tiara v1.0.3 (Karlicki et al. 2022). FCS-GX was run against the NCBI gxdb (build date 2023-01-24). DepthKopy v1.0.1 analysis was performed to check for low-quality/coverage sequences. Assembly completeness was evaluated with Benchmarking Universal Single-Copy Ortholog (BUSCO) v5.2.2 (Manni et al. 2021) against the arachnida_odb10 (n = 2,934), arthropoda_odb10 (n = 1,013), metazoa_odb10 (n = 954), and eukaryota_odb10 (n = 255) datasets. DepthSizer v1.6.2 (Chen et al. 2022) was used to investigate the read depth of the mite genome contamination. The R. argentea reads had a mean read depth of 53× (Chen et al. 2024) while contaminants were at 35.7×. BUSCO completeness (arachnida_odb10, n = 2,934) was low, at 68.6%, including a high “Duplicated” rate of 4.5% (Table 1). DepthKopy predicted that 132/135 (97.8%) of the Duplicated genes are real duplications, consistent with the high genome quality, including all 132 where both copies were on a chromosome scaffold. BUSCO completeness increased as the taxonomic specificity was reduced, with 87.4% of genes complete and only 6.7% missing for the eukaryote database (supplementary table S1, Supplementary Material online).
Table 1. Assembly and annotation statistics for the mite genome (RARGMITEv1).
Statistic | RARGMITEv1 |
---|---|
Technology | ONT, 10x |
Total length (bp) | 34,344,194 |
No. of scaffolds | 2 |
Scaffold N50 (bp) | 17,366,404 |
Scaffold L50 | 1 |
No. of contigs | 2 |
Contig N50 (bp) | 17,366,404 |
Contig L50 | 1 |
N bases | 0 |
GC (%) | 42.08 |
BUSCO complete (genome) | 68.6% (2,012) |
Single-copy (genome) | 64.1% (1,880) |
Duplicated (genome) | 4.5% (132) |
BUSCO fragmented (genome) | 1.3% (38) |
BUSCO missing (genome) | 30.1% (883) |
Protein-coding genes | 8,050 |
mRNAs | 9,244 |
rRNAs | 18 |
tRNAs | 56 |
BUSCO complete (proteome) | 70.3% (2,063) |
Single-copy (proteome) | 65.1% (1,909) |
Duplicated (proteome) | 5.2% (154) |
BUSCO fragmented (proteome) | 1.3% (38) |
BUSCO missing (proteome) | 28.4% (833) |
GC, guanine-cytosine.
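As a worked check of how the BUSCO percentages in Table 1 relate to the gene counts, the short Python sketch below recomputes the proteome-mode percentages from the counts in the table. The function name and dictionary keys are illustrative choices, not part of BUSCO itself; the only assumption is that each percentage is the category count divided by the dataset size and rounded to one decimal place.

```python
def busco_summary(single, duplicated, fragmented, missing):
    """Return the dataset size and BUSCO category percentages (one decimal place)."""
    n = single + duplicated + fragmented + missing

    def pct(count):
        return round(100 * count / n, 1)

    return {
        "n": n,
        "complete": pct(single + duplicated),  # Complete = Single-copy + Duplicated
        "single": pct(single),
        "duplicated": pct(duplicated),
        "fragmented": pct(fragmented),
        "missing": pct(missing),
    }


# Proteome-mode counts from Table 1 (arachnida_odb10)
proteome = busco_summary(single=1909, duplicated=154, fragmented=38, missing=833)
print(proteome)
```

The four categories sum to the arachnida_odb10 dataset size of 2,934, and the Complete score is the sum of the Single-copy and Duplicated categories.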
A Gapless Telomere-to-telomere Reference Genome for an Unidentified Eriophyoid Mite
Of the six scaffolds assembled from contamination and unmapped reads, two were gapless chromosomes. These comprised 99.64% of the assembly and were capped at each end with a repeating telomere-like sequence. After quality and contamination assessment, the four unplaced scaffolds totaling 124,586 bp were removed as potential contamination (one scaffold, two contigs) or a false duplication (one contig). The final 34.5 Mbp mite genome consisted of two gapless telomere-to-telomere chromosomes (Table 1, Fig. 1). Although this genome was presumably assembled from DNA from multiple individuals, its high contiguity (Fig. 1) and consistent sequencing depth indicate that the presence of structural variants has not hampered the assembly of a representative reference genome. This might be aided by a lack of repetitive content (see supplementary table S2, Supplementary Material online). No mitogenome was identified, possibly due to size selection during sequencing (Chen et al. 2024).

Fig. 1. Telomere-to-telomere mite genome. a) Schematic of the two telomere-to-telomere chromosomes. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Triangles along chromosome borders represent Duplicated BUSCO genes (top, forward strand; bottom, reverse strand). Other symbols are rRNA gene predictions. The main rDNA repeat is marked with a “B”. Genes are filled according to DepthKopy predicted copy number (CN). Blue and red connections between chromosomes represent collinear and inverted pairs of Duplicated BUSCO genes on different chromosomes. b) Schematic of rDNA repeat region from chromosome 1, with six full repeats assembled. c) DepthKopy predicted copy number distribution for the assembled mite. Mapped ONT read depths have been converted into copy number distributions using the modal single-copy read depth from Complete BUSCO genes of 35.7×. Whiskers extend to the most extreme values no further than 1.5 times the interquartile range. BUSCO, single-copy complete BUSCO genes; Duplicated, duplicated complete BUSCO genes (all copies); GeMoMa, GeMoMa gene models; rDNA, rDNA gene models; tRNA, tRNA gene models; Repeats, complex repeats; STR, short tandem repeats; LowComp, low complexity regions; Windows, 10 kb tiled windows. d) Synteny between assembled mite chromosomes and A. lycopersici (tomato russet mite) (ACULYCv1). Green and yellow-colored boxes represent chromosomes/scaffolds containing complete BUSCO genes. Blue and red blocks represent blocks of collinear and inverted synteny based on 2+ adjacent BUSCO genes in the same relative orientation. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Yellow triangles mark Duplicated BUSCO genes. Red plus symbols represent assembly gaps.
Manual inspection of the four chromosome ends identified a putative 5 bp TTTGG telomere consensus sequence, which appears to be a novel variant of the more common 5 bp arthropod TTAGG telomere sequence. It matches no known consensus telomere repeat in either the Telomerase database (Podlevsky et al. 2008) or TeloBase (Lyčka et al. 2024). Assembled telomere lengths were estimated by extracting the low-complexity A/C-rich 5′ ends and G/T-rich 3′ ends from each chromosome: 17,399 bp (5′) and 12,022 bp (3′) for chromosome 1; 10,729 bp (5′) and 6,824 bp (3′) for chromosome 2. In total, 0.14% of the assembly was telomere, and 3.54% of all TTTGG 5-mers occurred in these regions (∼25× enrichment), which rose to 53.23% for the TTTGGTTTGG tandem repeat (∼380× enrichment) (supplementary table S3, Supplementary Material online). Nevertheless, only 53.36% of the telomeric sequence comprised TTTGG motifs, so Tandem Repeats Finder (Benson 1999) was used to identify the dominant motifs. All four telomeres returned an 11-mer consensus, corresponding to TTTGGTGTTGG. The additional TGTTGG 6-mer accounted for another 44.24% of the telomeric sequence (>50× enrichment vs. nontelomeric sequence). Exact copies of the full TTTGGTGTTGG 11-mer accounted for 78.15% of telomeric sequence but only 0.04% of nontelomeric sequence (>2,000× enrichment), with over 75% of all TTTGGTGTTGG copies found in the telomeres.
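The enrichment figures above follow from simple ratios, which the following Python sketch reproduces. It assumes that fold enrichment is the percentage of a motif's copies that fall in telomeres divided by the percentage of the assembly that is telomere (the rounded 0.14% value); the function and constant names are illustrative only.

```python
# Assembled telomere lengths reported in the text (bp): chr1 5', chr1 3', chr2 5', chr2 3'
TELOMERE_LENGTHS_BP = [17_399, 12_022, 10_729, 6_824]
ASSEMBLY_LENGTH_BP = 34_344_194  # total length from Table 1


def fold_enrichment(pct_of_motif_in_region, pct_of_genome_in_region):
    """Ratio of observed to expected share of motif copies within a region."""
    return pct_of_motif_in_region / pct_of_genome_in_region


telomere_bp = sum(TELOMERE_LENGTHS_BP)
telomere_pct = round(100 * telomere_bp / ASSEMBLY_LENGTH_BP, 2)

print(telomere_bp, telomere_pct)
print(round(fold_enrichment(3.54, telomere_pct)))   # TTTGG 5-mer
print(round(fold_enrichment(53.23, telomere_pct)))  # TTTGGTTTGG tandem repeat
```

Because the telomeres make up such a small share of the assembly, even a motif with only a few percent of its copies in telomeres is strongly enriched there.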
To identify the sequenced mite, phylogenomic analysis was performed with the arachnida_odb10 complete BUSCO genes from the assembled mite genome, along with the corresponding BUSCO genes from 36 mite genomes that were downloaded from NCBI (2021-11-04) (supplementary table S4, Supplementary Material online). For each gene, HAQESAC v1.14.0 (Edwards et al. 2007) was used to generate a protein multiple sequence alignment and phylogenetic tree for all species returning a single-copy ortholog. Briefly, HAQESAC compares all proteins to the query and removes sequences too distant or incomplete before aligning with Clustal Omega v1.2.4 (Sievers et al. 2011). Poorly aligned sequences were removed by HAQESAC (Edwards et al. 2007) and a phylogeny was inferred by FastTree v2.1.11 (Price et al. 2010) before being mid-point rooted. No additional filtering was performed. Trees were then combined into a consensus tree with ASTRAL v5.7.8 (Zhang et al. 2018). Synteny analysis based on BUSCO genes (arachnida_odb10 database) was used to compare the mite genome to ACULYCv1 (Greenhalgh et al. 2020) and visualized with ChromSyn (Edwards et al. 2022). Telomere prediction was performed using Telociraptor v0.9.0 (Edwards 2023) with forward and reverse regular expressions C{1,3}A{2,4} and T{2,4}G{1,3}, based on the simpler TTTGG telomere repeat identified, and using TIDK v0.2.31 (Brown et al. 2023) with the sequence CCAAA. Ribosomal RNA genes were predicted using barrnap v0.9.0 (Seemann 2018) in eukaryotic mode.
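To illustrate what the two telomere regular expressions match, the sketch below applies them with Python's standard `re` module to idealized repeat arrays. It is not Telociraptor's implementation: the coverage function, the constant names, and the test sequences are hypothetical, and real chromosome ends contain sequencing errors and mixed motifs.

```python
import re

# Regular expressions quoted in the text; names reflect the A/C-rich 5' and G/T-rich 3' ends
FIVE_PRIME = re.compile(r"C{1,3}A{2,4}")
THREE_PRIME = re.compile(r"T{2,4}G{1,3}")


def repeat_coverage(sequence, pattern):
    """Fraction of bases in `sequence` covered by non-overlapping pattern matches."""
    if not sequence:
        return 0.0
    covered = sum(m.end() - m.start() for m in pattern.finditer(sequence))
    return covered / len(sequence)


# Idealized repeat arrays (hypothetical test sequences)
simple_3p = "TTTGG" * 20          # simple 5-mer repeat
compound_3p = "TTTGGTGTTGG" * 20  # 11-mer consensus
simple_5p = "CCAAA" * 20          # reverse complement of TTTGG

print(repeat_coverage(simple_3p, THREE_PRIME))
print(round(repeat_coverage(compound_3p, THREE_PRIME), 3))
print(repeat_coverage(simple_5p, FIVE_PRIME))
```

In this toy case the simple pattern covers a pure TTTGG array completely but only 9 of every 11 bases of the TTTGGTGTTGG array, because the single T in the TG dinucleotide cannot satisfy T{2,4}.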
The phylogenomic analysis provided strong support (98.6% supporting gene quartets) to position the new mite within the superfamily Eriophyoidea, a highly diverse lineage of mites with high host specificity in over 80% of the group (Skoracka et al. 2010) (Fig. 2). However, even the closest sequenced relative, tomato russet mite (A. lycopersici) (Greenhalgh et al. 2020), was distantly related, with little conserved synteny and only 71% protein sequence identity (Fig. 1). BLAST+ blastn searches of the 18S and 28S rRNA genes against NCBI core_nt (2024-12-12) (Camacho et al. 2009) confirmed this placement, with the top ten hits for both genes all to members of the Eriophyidae family (data not shown). Detailed phylogenomic analysis is beyond the scope of this study, but our phylogeny places the superfamily Eriophyoidea outside the order Trombidiformes, in accordance with other whole genome studies (Bolton et al. 2023). Eriophyoid mites are microscopic (averaging 200 to 300 µm) and can reside within host tissue, presenting additional challenges for identification and removal (Amrine and Manson 1996). Their presence can be seasonal, which further complicates detection (Chetverikov et al. 2022). We examined multiple leaves from the tree in the Royal Botanic Garden Sydney using a dissecting microscope but were unable to detect any eriophyoid mites. The fact that contaminating mite DNA was recovered from samples collected and sequenced years apart (first for 10x linked reads, then for ONT) suggests the contaminant was persistent and abundant and was not removed by rinsing the leaves in water. Additionally, eriophyoid mites were absent on two specimens of R. argentea from the Australian National Botanic Garden examined for comparison. Vagrant mites belonging to other groups were detected (unidentified Phytoseiidae and Oribatida), but these are presumably less host specific (Oldfield 1996). We have been unable to locate any published records of eriophyoid mites on trees in the genus Rhodamnia.
While this study provides no concrete data on the mite's taxonomy or host specificity, Scott (1979) observed fruits that were deformed by galls on R. argentea that may indicate mite-induced damage. Eriophyoid mites can serve as vectors for viruses (de Lillo et al. 2018) and have been observed as vectors for rust fungi (Puccinia spp.) (Gamliel-Atinsky et al. 2010). Given the severity of the threat of myrtle rust to species of Rhodamnia, future work may investigate the role of these mites as vectors.

Fig. 2. Mite phylogenomics and genome reduction. a) ASTRAL consensus tree for unknown mite (square) and 36 NCBI Acari genomes using 1,877 BUSCO complete gene trees (arachnida_odb10). Internal nodes are labeled with the percentage of quartets in the BUSCO gene trees that support the ancestral branch. Values below 50 are not shown. Nodes labeled with numbers and symbols correspond to taxonomic groups in b). b) BUSCO completeness and compiled BUSCOMP completeness for taxonomic groups labeled in a). For each taxonomic grouping in the phylogenomic tree, numbers report the best ratings for each BUSCO gene across all species in that clade.
Genome Annotation and Evidence for Genome Reduction
Despite the low BUSCO completeness, the gapless chromosomes and high (35.7×) mean sequencing depth give great confidence that this is a complete nuclear genome. The size of the assembled mite genome is comparable to two previously published eriophyoid mite genomes: A. lycopersici (Greenhalgh et al. 2020) and Fragariocoptes setiger (Klimov et al. 2022). All three Eriophyoidea species showed similarly low BUSCO completeness with many of the same genes missing. Of the 882 BUSCO genes missing in our mite genome, 832 were also missing in A. lycopersici (Fig. 2b, Eriophyidae), and 716 of these were missing in F. setiger (Fig. 2b, Eriophyoidea). In contrast, only 20 genes (0.7%) were missing across Eriophyoidea and neighboring Trombidiformes, despite nontick mites typically having small genomes (Gregory and Young 2020). This pattern of loss in Eriophyoidea genomes is consistent with genome reduction (Hessen et al. 2010) occurring early in this lineage. However, continued lineage-specific losses suggest that genome reduction is an ongoing process. This is further supported by a surprisingly high number of Duplicated BUSCO genes, which appear to represent genuine duplications, consistent with ongoing adaptive evolution. It is also notable that the proportion of missing BUSCO genes increased considerably with increasingly taxonomically restricted BUSCO datasets, suggesting that it is more derived traits that are being altered. The reasons for the disparity in genome size between ticks and other mites remain unclear, but it is plausible that small mite body sizes are made possible by streamlined genomes and/or that the evolution of small bodies has constrained genome size compared with other arthropods (Gregory and Young 2020). Increased mite genomic resources will broaden our understanding of the evolutionary mechanisms responsible for reduced genome sizes.
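The nested comparison of missing BUSCO genes amounts to intersecting per-genome sets of missing gene identifiers. The Python sketch below shows the idea on toy data: the BUSCO identifiers are invented for illustration, the function name is hypothetical, and only the assembly labels and the final percentage come from this article.

```python
def shared_missing(missing_by_genome):
    """Return the BUSCO IDs that are missing from every genome in the mapping."""
    sets = [set(ids) for ids in missing_by_genome.values()]
    return set.intersection(*sets) if sets else set()


# Toy example with invented BUSCO IDs (not real arachnida_odb10 identifiers)
missing = {
    "RARGMITEv1": {"b1", "b2", "b3", "b4"},
    "ACULYCv1": {"b1", "b2", "b3", "b9"},
    "FRASETv1": {"b1", "b2", "b7"},
}

eriophyidae_loss = shared_missing({k: missing[k] for k in ("RARGMITEv1", "ACULYCv1")})
eriophyoidea_loss = shared_missing(missing)
print(sorted(eriophyidae_loss), sorted(eriophyoidea_loss))

# Share of the 2,934 arachnida_odb10 genes represented by 20 genes
print(round(100 * 20 / 2934, 1))
```

Genes missing from every genome in a clade are candidates for loss in that clade's common ancestor, whereas genes missing from only one genome point to lineage-specific loss or an assembly artefact.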
The mite genome was annotated with GeMoMa v1.7.1 (Keilwagen et al. 2019) with 15 invertebrate genomes annotated by NCBI (supplementary table S4, Supplementary Material online). Predicted transcriptome statistics were generated with SAAGA v0.7.7 (Edwards et al. 2021). Annotation completeness was assessed using BUSCO in proteome mode. Ribosomal RNA (rRNA) genes were predicted with Barrnap v0.9 (Seemann 2018) and transfer RNAs (tRNAs) were predicted with tRNAscan-SE v2.0.5 (Lowe and Chan 2016), implementing Infernal v1.1.2 (Nawrocki and Eddy 2013). A custom repeat library was generated with RepeatModeler v2.0.1 (-engine ncbi) and the genome was masked with RepeatMasker v4.1.0 (Tarailo-Graovac and Chen 2009). Telomeric repeat sequences were identified by manually inspecting the ends of the two chromosome-scale scaffolds and creating a consensus repeat unit. The 36 mite genomes downloaded from NCBI (2021-11-04) (supplementary table S4, Supplementary Material online) were subjected to the same BUSCO, GeMoMa, SAAGA, and RepeatModeler/RepeatMasker annotation for a consistent comparison. BUSCO results for all the mite species and selected taxonomic levels were compiled using BUSCOMP v1.1.3 (Stuart et al. 2022).
GeMoMa genome annotation predicted 8,071 protein-coding genes (9,269 mRNAs), with 18 rRNAs and 56 tRNAs. The 18 rRNA genes comprised 6 tandem copies of the complete 18S-5.8S-28S rDNA repeat unit. DepthKopy analysis revealed that the rDNA cluster was collapsed in the assembly and the 6 copies of the repeat unit represented 36 copies in the actual genome (mean copy number per gene ∼6) (Fig. 1). The genome had a low repeat content at 11.1% (supplementary table S2, Supplementary Material online). There was no sign of widespread collapsed repeats in the assembly (Fig. 1c). Along with the two other eriophyoid mites, the annotation reveals a reduction in the number of introns and transposable elements, and an overall reduction in repetitive and intergenic content (Fig. 3). Despite a reduction in gene numbers, the percentage of the genome that corresponds to exons and genes is higher in these species (Fig. 3a). While detailed annotation and investigation of gene content is beyond the scope of this study, our gapless telomere-to-telomere nuclear genome provides an excellent resource to study genome reduction, even though the mite species remains unidentified.
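The logic behind detecting a collapsed repeat from read depth can be sketched as a simple ratio. The Python below is a simplified illustration rather than DepthKopy's actual method: it assumes copy number is the region's read depth divided by the single-copy depth, and the function names and example depths are hypothetical.

```python
SINGLE_COPY_DEPTH = 35.7  # modal single-copy ONT read depth reported for the mite


def copy_number(region_depth, single_copy_depth=SINGLE_COPY_DEPTH):
    """Estimated copy number of an assembled region from its mean read depth."""
    return region_depth / single_copy_depth


def genomic_copies(assembled_copies, region_depth, single_copy_depth=SINGLE_COPY_DEPTH):
    """Estimated true number of repeat units when each assembled unit is collapsed."""
    return assembled_copies * copy_number(region_depth, single_copy_depth)


# Hypothetical example: a repeat assembled as 2 units, each at 5x the single-copy depth
example_depth = 178.5
print(round(copy_number(example_depth), 2))
print(round(genomic_copies(2, example_depth)))
```

A region at the single-copy depth returns a copy number of 1, so values well above 1 across a tandem array indicate that the assembly holds fewer repeat units than the genome does.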

Fig. 3. Comparison of mite genome attributes. Genome statistics for 37 Acari genomes are shown as density plots, with individual statistics marked for the assembled eriophyoid mite genome (RARGMITEv1; square), A. lycopersici (tomato russet mite genome ACULYCv1; circle), and F. setiger (gall mite genome FRASETv1; triangle). a) Genomic features by percentage of genome. b) Introns and transposable elements by count. c) Genome features by length. LowComp, low complexity; SSRs, simple sequence repeats; TEs, transposable elements; SEGs, single-exon genes.
Identification and Removal of Assembly Contamination
Improvements in sequencing technology have massively increased the ease with which whole genomes can be sequenced and assembled (Gupta 2022). However, as the number of genomes rises, the relative resources available per genome for curation decrease (Howe et al. 2021). Genome contamination is a well-known problem and is prevalent among short-read and unpublished draft genomes (Kumar et al. 2013; Chrisman et al. 2022; Steinegger and Salzberg 2020). The mite contamination in the R. argentea draft genome was particularly difficult to identify in the initial draft assembly due to the large number of variable-quality contigs, high depth of contaminant sequencing, and lack of assemblies from related organisms. Variations in read depth and guanine-cytosine (GC) content were discounted as possible regional biases. Instead, contamination in this case was identified by targeted searches for arthropod sequences (pers. comm. W. Dermauw, J. Santolini, and S. Gerber).
Tools such as BlobToolKit (Kumar et al. 2013), FCS-GX (Astashyn et al. 2024), Conterminator (Steinegger and Salzberg 2020), Tiara (Karlicki et al. 2022), and ContScout (Bálint et al. 2024) improve the ease and resolution of genomic DNA contamination screening. The approach used here, Taxolotl (Tobias et al. 2022), complements DNA-based methods by assigning taxonomy to protein sequences, which enables matches across greater evolutionary distances. The routine FCS-GX screening by NCBI upon submission should greatly reduce the number of genomes with significant contamination. Nevertheless, caution should be used when interpreting unusual findings (e.g. Saffar and Matin 2021). This is particularly true when using unpublished draft genomes, where it is always good practice to contact the submitting researchers. In the long term, the community needs improved mechanisms by which contamination in genomes can be removed from public repositories. Ideally, expert institutions (e.g. zoological/botanical gardens and museums) will be given the resources and power to leverage high-confidence data and taxonomic knowledge to identify and flag contamination for removal.
Conclusions
The power of long-read sequencing is illustrated by the ability to recover a gapless telomere-to-telomere mite genome from contaminating DNA. At 34.3 Mbp, our mite assembly is not much bigger than the (less intact) 32.5 Mbp A. lycopersici genome, and is notable in its completeness despite poor BUSCO results. While the identity of the contaminating mite remains a mystery, the completeness of its genome should provide a valuable resource for investigating genome reduction and other characteristic traits of eriophyoid mites. The novel TTTGG and TTTGGTGTTGG sequences identified could improve telomere identification in other mites if this is a shared trait.
Supplementary Material
Supplementary material is available at Genome Biology and Evolution online.
Acknowledgments
We are grateful to Jérôme Santolini, Sophie Gerber, and Wannes Dermauw for contacting us about contamination in the draft genome. We thank Tamera Beath for assistance with sampling at the Australian National Botanic Garden. This research includes computations using the computational cluster Katana supported by Research Technology Services at UNSW Sydney (https://doi.org/10.26190/669x-a286) and the Pawsey Supercomputing Research Centre's Setonix Supercomputer (https://doi.org/10.48569/18sb-8s43), with funding from the Australian Government and the Government of Western Australia.
Author Contributions
Richard Edwards (Conceptualization, Investigation, Methodology, Software, Formal analysis, Writing—original draft, Writing—review and editing, Supervision), Stephanie Chen (Investigation, Formal analysis, Writing—original draft, Writing—review and editing), Bruce Halliday (Investigation, Writing—review and editing), and Jason Bragg (Writing—review and editing, Supervision).
Funding
This research was funded by the Australian Research Council (LP18010072) and the University of New South Wales. Stephanie Chen was supported by an Australian Government Research Training Program (RTP) Scholarship.
Data Availability
The mite genome (Rarg_mite_v1.1) was uploaded to BioProject PRJNA737568, which contains sequences and genomes of the Rhodamnia argentea host plant. The raw 10x linked reads were deposited under BioProject PRJEB30444, which was created for an earlier draft genome. Additional data supporting this work can be found on OSF, doi: 10.17605/OSF.IO/EV2NY (https://osf.io/ev2ny/).
Benefit Sharing
All raw sequencing data and assembled genomes have been shared with the broader public via appropriate biological databases.
Author notes
Richard J Edwards and Stephanie H Chen contributed equally to the work.