Small but Mitey: A Gapless Telomere-to-Telomere Assembly of an Unidentified Mite With a Streamlined Genome

Abstract

A draft assembly of the rainforest tree Rhodamnia argentea Benth. (malletwood, Myrtaceae) revealed contaminating DNA sequences that most closely matched those from mites in the family Eriophyidae. Eriophyoid mites are plant parasites that often induce galls or other deformities on their host plants. They are notable for their small size (averaging 200 μm), distinctive four-legged body structure, and heavily streamlined genomes, which are among the smallest known of all arthropods. Contaminating mite sequences were assembled into a high-quality gapless telomere-to-telomere nuclear genome. The entire genome was assembled into two fully contiguous chromosomes, capped with a novel TTTGG or TTTGGTGTTGG telomere sequence, and exhibited clear signs of genome reduction (34.3 Mbp total length, 68.6% arachnid Benchmarking Universal Single-Copy Ortholog completeness). Phylogenomic analysis confirmed that this genome is that of a previously unsequenced eriophyoid mite. Despite its unknown identity, this complete nuclear genome provides a valuable resource to investigate invertebrate genome reduction.

contamination, eriophyoidea, genome assembly, genome streamlining, genome reduction

Significance

Eriophyoid mites are microscopic, four-legged, plant-infesting mites that exhibit heavily streamlined genomes, in which many of the normal genes have been lost. This study reports a complete (gapless) telomere-to-telomere assembly of the nuclear genome of an eriophyoid mite from contamination in host tree sequencing data. The assembly provides a powerful resource to investigate the evolutionary phenomenon of genome reduction.

Trees are hosts to a diverse range of pests and diseases (Gougherty and Davies 2022). Eriophyoid mites (Acari: Eriophyoidea) are a hyperdiverse lineage of over 5,000 species that feed on plants (Zhang 2017; Ueckermann 2010). These tiny four-legged mites, averaging 200 μm in length (Amrine et al. 2003), have the distinct ability to induce galls (de Lillo et al. 2018; Lindquist et al. 1996). As well as being physically small, eriophyoid mites have very small genomes. For example, the 32.5 Mbp genome of the tomato russet mite Aculops lycopersici (Tryon, 1917) (Greenhalgh et al. 2020) is the smallest yet reported for any arthropod. It has few transposable elements, tiny intergenic regions, and is remarkably intron-poor, as more than 80% of coding genes are intronless (Greenhalgh et al. 2020). Reminiscent of microbial eukaryotes, this reduction of nuclear genome content is a consequence of genome streamlining (Arkhipova 2018; Hessen et al. 2010).

Pests and pathogens can contribute contaminating DNA to genome assembly projects. We recently sequenced and assembled the genome of Rhodamnia argentea Benth. (malletwood) to inform conservation management (Chen et al. 2024). During this process, arthropod sequences were detected as contaminants at surprisingly high read depths. Although these were hard to separate using linked-read sequencing alone, the addition of Oxford Nanopore Technologies (ONT) long reads made it possible to isolate and independently assemble the tree and mite genomes. Here, we present a telomere-to-telomere reference genome of the contaminating eriophyoid mite, assembled from ONT long reads and 10x Genomics Chromium linked reads. Our efforts to isolate and identify the mite itself were unsuccessful. Nevertheless, our complete and gapless nuclear genome of this unidentified species shows dramatic streamlining of content and will provide an invaluable resource for future studies of arthropod genome reduction.

Genome Sequencing and Assembly

The sampling and sequencing of the host tree R. argentea are described in Chen et al. (2024). Briefly, young leaves of R. argentea were sampled from a specimen at the Royal Botanic Garden Sydney (NCBI BioSample SAMN19698777, ToLID drRhoArge1), and high molecular weight gDNA was extracted for ONT and 10x Genomics Chromium (Illumina) sequencing, following size selection. The tree genome was scaffolded to chromosome level using Hi-C sequencing (Phase Genomics Proximo Hi-C Plant kit). ONT and 10x reads were partitioned into three sets based on mapping to the contaminated R. argentea genome: (i) R. argentea reads, (ii) contamination reads mapping to removed scaffolds, and (iii) unmapped reads. Contamination and unmapped reads were then combined (“non-drRhoArge1 reads”). The 10x reads were trimmed (5′ 30 bp of R1, 5′ 10 bp of R2, quality trimming < Q20) with BBMap v38.51 (Bushnell 2014) and filtered to retain sequences over 100 bp long.
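
As a minimal illustration of the three-way read partition described above, the sketch below assumes that each read's mapping to the contaminated R. argentea assembly has already been reduced to a single best-hit scaffold (or no hit). The function name partition_reads and the scaffold names are hypothetical; this is not the pipeline used by Chen et al. (2024).

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def partition_reads(
    read_ids: Iterable[str],
    best_hit: Dict[str, Optional[str]],
    removed_scaffolds: Set[str],
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split reads into (host, contamination, unmapped) sets.

    best_hit maps a read ID to the scaffold it mapped to, or None if unmapped;
    reads missing from best_hit are also treated as unmapped.
    removed_scaffolds holds scaffolds flagged as contamination.
    """
    host: Set[str] = set()
    contam: Set[str] = set()
    unmapped: Set[str] = set()
    for rid in read_ids:
        scaffold = best_hit.get(rid)
        if scaffold is None:
            unmapped.add(rid)
        elif scaffold in removed_scaffolds:
            contam.add(rid)
        else:
            host.add(rid)
    return host, contam, unmapped


if __name__ == "__main__":
    reads = ["r1", "r2", "r3", "r4"]
    hits = {"r1": "chr1", "r2": "ctg_mite_7", "r3": None, "r4": "chr5"}
    host, contam, unmapped = partition_reads(reads, hits, {"ctg_mite_7"})
    non_host = contam | unmapped  # the combined "non-drRhoArge1" read set
    print(sorted(host), sorted(non_host))  # ['r1', 'r4'] ['r2', 'r3']
```

Combining the contamination and unmapped sets, rather than keeping only reads that hit removed scaffolds, retains mite reads that never mapped to the tree assembly at all.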

Non-drRhoArge1 ONT reads were assembled with Flye v2.9 (Kolmogorov et al. 2019) using a target genome size of 35 Mbp (based on the filtered scaffolds). Scaffolding and gap-filling were done with LongStitch-ARKS v1.0.1 (Coombe et al. 2021), and low-quality scaffolds were removed with Diploidocus v1.1.1 (cycle mode) (Chen et al. 2022). The genome was then polished with HyPo v1.0.3 (Kundu et al. 2019). The assembly was subjected to a final contamination screen with FCS-GX v0.4.0 (Astashyn et al. 2024) and Tiara v1.0.3 (Karlicki et al. 2022). FCS-GX was run against the NCBI gxdb (build date 24 January 2023). DepthKopy v1.0.1 was used to check for low-quality or low-coverage sequences. Assembly completeness was evaluated with Benchmarking Universal Single-Copy Ortholog (BUSCO) v5.2.2 (Manni et al. 2021) against the arachnida_odb10 (n = 2,934), arthropoda_odb10 (n = 1,013), metazoa_odb10 (n = 954), and eukaryota_odb10 (n = 255) datasets. DepthSizer v1.6.2 (Chen et al. 2022) was used to investigate the read depth of the mite genome contamination. The R. argentea reads had a mean read depth of 53× (Chen et al. 2024), while the contaminant reads were at 35.7×. BUSCO completeness (arachnida_odb10, n = 2,934) was low, at 68.6%, with a high “Duplicated” rate of 4.5% (Table 1). DepthKopy predicted that 132/135 (97.8%) of the Duplicated genes are real duplications, including all 132 where both copies were on a chromosome scaffold, consistent with the high genome quality. BUSCO completeness increased as taxonomic specificity decreased, reaching 87.4% complete with only 6.7% missing for the eukaryote dataset (supplementary table S1, Supplementary Material online).
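
The read-depth reasoning used here can be sketched in a few lines under simplifying assumptions: the single-copy depth is taken as the modal per-base depth over complete BUSCO genes, and genome size is approximated as total sequenced bases divided by that depth. This mirrors the core idea behind depth-based tools such as DepthSizer, but the function names and toy numbers below are hypothetical, and the real tools may apply further corrections.

```python
from collections import Counter
from typing import Iterable


def modal_depth(depths: Iterable[int]) -> int:
    """Most frequent per-base depth (ties resolved by first occurrence)."""
    counts = Counter(depths)
    if not counts:
        raise ValueError("no depth values supplied")
    return counts.most_common(1)[0][0]


def estimate_genome_size(total_read_bases: int, single_copy_depth: float) -> float:
    """Genome size (bp) ~ total sequenced bases / single-copy read depth."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return total_read_bases / single_copy_depth


if __name__ == "__main__":
    # Hypothetical per-base depths sampled across complete single-copy BUSCO genes.
    busco_depths = [34, 36, 36, 35, 36, 37, 36, 38]
    sc = modal_depth(busco_depths)
    # Hypothetical total of 1.26 Gbp of mite read data.
    size = estimate_genome_size(1_260_000_000, sc)
    print(sc, round(size / 1e6, 1))  # 36 35.0
```

Using the mode rather than the mean keeps collapsed repeats and other high-depth outliers from inflating the single-copy depth estimate.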

Table 1

Genome assembly and annotation statistics

Statistic	RARGMITEv1
Technology	ONT, 10x
Total length (bp)	34,344,194
No. of scaffolds	2
Scaffold N50 (bp)	17,366,404
Scaffold L50	1
No. of contigs	2
Contig N50 (bp)	17,366,404
Contig L50	1
N bases	0
GC (%)	42.08
BUSCO complete (genome)	68.6% (2,012)
Single-copy (genome)	64.1% (1,880)
Duplicated (genome)	4.5% (132)
BUSCO fragmented (genome)	1.3% (38)
BUSCO missing (genome)	30.1% (883)
Protein-coding genes	8,050
mRNAs	9,244
rRNAs	18
tRNAs	56
BUSCO complete (proteome)	70.3% (2,063)
Single-copy (proteome)	65.1% (1,909)
Duplicated (proteome)	5.2% (154)
BUSCO fragmented (proteome)	1.3% (38)
BUSCO missing (proteome)	28.4% (833)

GC, guanine-cytosine.

A Gapless Telomere-to-telomere Reference Genome for an Unidentified Eriophyoid Mite

Of the six scaffolds assembled from contamination and unmapped reads, two were gapless chromosomes. These comprised 99.64% of the assembly and were capped at each end with a repeating telomere-like sequence. After quality and contamination assessment, the four unplaced scaffolds, totaling 124,586 bp, were removed as potential contamination (one scaffold and two contigs) or as a false duplication (one contig). The final 34.3 Mbp mite genome consisted of two gapless telomere-to-telomere chromosomes (Table 1, Fig. 1). Although this genome was presumably assembled from the DNA of multiple individuals, its high contiguity (Fig. 1) and consistent sequencing depth indicate that structural variants have not hampered the assembly of a representative reference genome. This may have been aided by the low repetitive content (see supplementary table S2, Supplementary Material online). No mitogenome was identified, possibly because of size selection during sequencing (Chen et al. 2024).
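
A quick arithmetic check relates the figures in this paragraph, assuming that the Table 1 total length (34,344,194 bp) is the combined length of the two chromosomes: adding the 124,586 bp of removed scaffolds gives the pre-filtering assembly size, and the chromosomes' share of that total reproduces the reported 99.64%. The helper name chromosome_share is illustrative only.

```python
def chromosome_share(chromosome_bp: int, removed_bp: int) -> float:
    """Percentage of the pre-filtering assembly contained in the chromosomes."""
    return 100 * chromosome_bp / (chromosome_bp + removed_bp)


if __name__ == "__main__":
    chromosomes = 34_344_194  # Table 1 total length (assumed = both chromosomes)
    removed = 124_586         # four unplaced scaffolds removed after screening
    print(chromosomes + removed)                             # 34468780
    print(round(chromosome_share(chromosomes, removed), 2))  # 99.64
```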

Fig. 1.

Telomere-to-telomere mite genome. a) Schematic of the two telomere-to-telomere chromosomes. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Triangles along chromosome borders represent Duplicated BUSCO genes (top, forward strand; bottom, reverse strand). Other symbols are rRNA gene predictions. The main rDNA repeat is marked with a “B”. Genes are filled according to DepthKopy-predicted copy number (CN). Blue and red connections between chromosomes represent collinear and inverted pairs of Duplicated BUSCO genes on different chromosomes. b) Schematic of the rDNA repeat region from chromosome 1, with six full repeats assembled. c) DepthKopy-predicted copy number distributions for the assembled mite. Mapped ONT read depths have been converted into copy number distributions using the modal single-copy read depth (35.7×) from Complete BUSCO genes. Whiskers extend to the most extreme values no further than 1.5 times the interquartile range. BUSCO, single-copy complete BUSCO genes; Duplicated, duplicated complete BUSCO genes (all copies); GeMoMa, GeMoMa gene models; rDNA, rDNA gene models; tRNA, tRNA gene models; Repeats, complex repeats; STR, short tandem repeats; LowComp, low complexity regions; Windows, 10 kb tiled windows. d) Synteny between the assembled mite chromosomes and A. lycopersici (tomato russet mite) (ACULYCv1). Green and yellow boxes represent chromosomes/scaffolds containing complete BUSCO genes. Blue and red blocks represent collinear and inverted synteny based on two or more adjacent BUSCO genes in the same relative orientation. Black circles represent Telociraptor telomere predictions and blue circles represent TIDK-predicted telomere repeats. Yellow triangles mark Duplicated BUSCO genes. Red plus symbols represent assembly gaps.

Manual inspection of the four chromosome ends identified a putative 5 bp TTTGG telomere consensus sequence, which appears to be a novel variant of the more common 5 bp arthropod TTAGG telomere sequence. It matches no known consensus telomere repeat in either the Telomerase database (Podlevsky et al. 2008) or TeloBase (Lyčka et al. 2024). Assembled telomere lengths were estimated by extracting the low-complexity A/C-rich 5′ ends and G/T-rich 3′ ends from each chromosome: 17,399 bp (5′) and 12,022 bp (3′) for chromosome 1, and 10,729 bp (5′) and 6,824 bp (3′) for chromosome 2. In total, 0.14% of the assembly was telomere, and 3.54% of all TTTGG 5-mers occurred in these regions (∼25× enrichment), rising to 53.23% for the TTTGGTTTGG tandem repeat (∼380× enrichment) (supplementary table S3, Supplementary Material online). Nevertheless, only 53.36% of the telomeric sequence comprised TTTGG motifs, so Tandem Repeats Finder (Benson 1999) was used to identify the dominant motifs. All four telomeres returned the same 11-mer consensus, TTTGGTGTTGG. The additional TGTTGG 6-mer accounted for another 44.24% of the telomeric sequence (>50× enrichment vs. nontelomeric sequence). Exact copies of the full TTTGGTGTTGG 11-mer accounted for 78.15% of telomeric sequence but only 0.04% of nontelomeric sequence (>2,000× enrichment), with over 75% of all TTTGGTGTTGG copies found in the telomeres.
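
The enrichment values above compare the share of motif copies that fall in telomeres with the share of the assembly that is telomeric; for example, 3.54% / 0.14% ≈ 25. The sketch below computes this ratio on a toy sequence, counting overlapping motif occurrences with str.find. The function names, the half-open interval convention and the toy data are assumptions for illustration, not the scripts behind supplementary table S3.

```python
from typing import List, Sequence, Tuple


def motif_positions(seq: str, motif: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of motif."""
    positions: List[int] = []
    start = seq.find(motif)
    while start != -1:
        positions.append(start)
        start = seq.find(motif, start + 1)
    return positions


def telomere_enrichment(genome: str, telomeres: Sequence[Tuple[int, int]], motif: str) -> float:
    """Fold enrichment of motif starts inside telomeric intervals.

    telomeres holds half-open (start, end) intervals. Enrichment is
    (fraction of motif copies in telomeres) / (fraction of genome that is telomeric).
    """
    hits = motif_positions(genome, motif)
    if not hits:
        raise ValueError(f"{motif} not found")
    in_tel = sum(1 for p in hits if any(s <= p < e for s, e in telomeres))
    tel_bp = sum(e - s for s, e in telomeres)
    return (in_tel / len(hits)) / (tel_bp / len(genome))


if __name__ == "__main__":
    # Toy 100 bp genome: 20 bp telomere of TTTGG repeats plus one stray interior copy.
    genome = "TTTGG" * 4 + "A" * 40 + "TTTGG" + "A" * 35
    print(round(telomere_enrichment(genome, [(0, 20)], "TTTGG"), 2))  # 4.0
    # Reported values: share of motif copies / share of assembly that is telomere.
    print(round(3.54 / 0.14, 1), round(53.23 / 0.14))  # 25.3 380
```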

To identify the sequenced mite, phylogenomic analysis was performed with the arachnida_odb10 complete BUSCO genes from the assembled mite genome, along with the corresponding BUSCO genes from 36 mite genomes downloaded from NCBI (4 November 2021) (supplementary table S4, Supplementary Material online). HAQESAC v1.14.0 (Edwards et al. 2007) was used to generate protein multiple sequence alignments and phylogenetic trees for each gene, using all species that returned a single-copy ortholog for that gene. Briefly, HAQESAC compares all proteins to the query and removes sequences that are too distant or incomplete before aligning with Clustal Omega v1.2.4 (Sievers et al. 2011). Poorly aligned sequences were then removed by HAQESAC (Edwards et al. 2007), and a phylogeny was inferred with FastTree v2.1.11 (Price et al. 2010) and mid-point rooted. No additional filtering was performed. The gene trees were then combined into a consensus tree with ASTRAL v5.7.8 (Zhang et al. 2018). Synteny analysis based on BUSCO genes (arachnida_odb10 database) was used to compare the mite genome to ACULYCv1 (Greenhalgh et al. 2020) and visualized with ChromSyn (Edwards et al. 2022). Telomeres were predicted with Telociraptor v0.9.0 (Edwards 2023), using the forward and reverse regular expressions C{1,3}A{2,4} and T{2,4}G{1,3} based on the simpler TTTGG telomere repeat identified, and with TIDK v0.2.31 (Brown et al. 2023) using the sequence CCAAA. Ribosomal DNA (rDNA) genes were predicted with barrnap v0.9.0 (Seemann 2018) in eukaryotic mode.
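
To show how regular expressions like those given to Telociraptor can flag telomeric ends, here is a simplified sketch that measures how much of the first and last window of a chromosome is covered by C{1,3}A{2,4} or T{2,4}G{1,3} matches. It is not Telociraptor's algorithm: the window size, coverage threshold and function names are hypothetical choices for illustration.

```python
import re
from typing import Tuple

FWD = re.compile(r"C{1,3}A{2,4}")  # 5' end: C/A-rich strand (e.g. CCAAA repeats)
REV = re.compile(r"T{2,4}G{1,3}")  # 3' end: T/G-rich strand (e.g. TTTGG repeats)


def repeat_coverage(seq: str, pattern: re.Pattern) -> float:
    """Fraction of seq covered by non-overlapping regex matches."""
    if not seq:
        return 0.0
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


def telomere_calls(chrom: str, window: int = 1000, min_cov: float = 0.5) -> Tuple[bool, bool]:
    """Return (has_5prime, has_3prime) calls using hypothetical window/threshold values."""
    five = chrom[:window].upper()
    three = chrom[-window:].upper()
    return (repeat_coverage(five, FWD) >= min_cov,
            repeat_coverage(three, REV) >= min_cov)


if __name__ == "__main__":
    toy = "CCAAA" * 200 + "ACGT" * 500 + "TTTGG" * 200
    print(telomere_calls(toy))             # (True, True)
    print(telomere_calls("ACGT" * 1000))   # (False, False)
```

Allowing runs of 1 to 3 C/G and 2 to 4 A/T tolerates small deviations from the exact repeat unit, which matters here because the telomeres mix TTTGG and TGTTGG motifs.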

The phylogenomic analysis provided strong support (98.6% of gene quartets) for placing the new mite within the superfamily Eriophyoidea, a highly diverse lineage of mites with high host specificity in over 80% of the group (Skoracka et al. 2010) (Fig. 2). However, even the closest sequenced relative, the tomato russet mite A. lycopersici (Greenhalgh et al. 2020), was distantly related, with little conserved synteny and only 71% protein sequence identity (Fig. 1). BLAST+ blastn searches (Camacho et al. 2009) of the 18S and 28S rRNA genes against NCBI core_nt (12 December 2024) confirmed this placement, with the top ten hits for both genes all to members of the family Eriophyidae (data not shown). Detailed phylogenomic analysis is beyond the scope of this study, but our phylogeny places the superfamily Eriophyoidea outside the suborder Trombidiformes, in accordance with other whole-genome studies (Bolton et al. 2023). Eriophyoid mites are microscopic (averaging 200 to 300 µm) and can reside within host tissue, presenting additional challenges for identification and removal (Amrine and Manson 1996). Their presence can be seasonal, which further complicates detection (Chetverikov et al. 2022). We examined multiple leaves from the sampled tree at the Royal Botanic Garden Sydney using a dissecting microscope but were unable to detect any eriophyoid mites. The contaminating mite was captured in samples collected and sequenced years apart (first for 10x linked reads, then for ONT), which suggests that it was persistent and abundant and was not removed by rinsing the leaves in water. Additionally, no eriophyoid mites were found on two specimens of R. argentea from the Australian National Botanic Garden examined for comparison. Vagrant mites belonging to other groups were detected (unidentified Phytoseiidae and Oribatida), but these are presumably less host specific (Oldfield 1996). We have been unable to locate any published records of eriophyoid mites on trees in the genus Rhodamnia.

Although this study provides no concrete data on the mite's taxonomy or host specificity, Scott (1979) observed gall-deformed fruits on R. argentea, which may indicate mite-induced damage. Eriophyoid mites can serve as vectors for viruses (de Lillo et al. 2018) and have been observed as vectors of rust fungi (Puccinia spp.) (Gamliel-Atinsky et al. 2010). Given the severity of the threat that myrtle rust poses to species of Rhodamnia, future work could investigate the role of these mites as vectors.

Fig. 2.

Mite phylogenomics and genome reduction. a) ASTRAL consensus tree for the unknown mite (square) and 36 NCBI Acari genomes using 1,877 BUSCO complete gene trees (arachnida_odb10). Internal nodes are labeled with the percentage of quartets in the BUSCO gene trees that support the ancestral branch; values below 50 are not shown. Nodes labeled with numbers and symbols correspond to taxonomic groups in b). b) BUSCO completeness and compiled BUSCOMP completeness for taxonomic groups labeled in a). For each taxonomic grouping in the phylogenomic tree, numbers report the best rating for each BUSCO gene across all species in that clade.

Genome Annotation and Evidence for Genome Reduction

Despite the low BUSCO completeness, the gapless chromosomes and high (35.7×) mean sequencing depth give high confidence that this is a complete nuclear genome. The size of the assembled mite genome is comparable to those of two previously published eriophyoid mite genomes, A. lycopersici (Greenhalgh et al. 2020) and Fragariocoptes setiger (Klimov et al. 2022). All three eriophyoid species showed similarly low BUSCO completeness, with many of the same genes missing. Of the 882 BUSCO genes missing in our mite genome, 832 were also missing in A. lycopersici (Fig. 2b, Eriophyidae), and 716 of these were missing in F. setiger (Fig. 2b, Eriophyoidea). In contrast, only 20 genes (0.7%) were missing across Eriophyoidea and neighboring Trombidiformes, despite nontick mites typically having small genomes (Gregory and Young 2020). This pattern of loss in Eriophyoidea genomes is consistent with genome reduction (Hessen et al. 2010) occurring early in this lineage. However, continued lineage-specific losses suggest that genome reduction is an ongoing process. This is further supported by a surprisingly high number of Duplicated BUSCO genes, which appear to represent genuine duplications, consistent with ongoing adaptive evolution. Notably, the proportion of missing BUSCO genes increased considerably with more taxonomically restricted BUSCO datasets, suggesting that losses mainly affect genes underlying more derived traits. The reasons for the disparity in genome size between ticks and other mites remain unclear, but it is plausible that small mite body sizes are made possible by streamlined genomes and/or that the evolution of small bodies has constrained genome size compared with other arthropods (Gregory and Young 2020). Increased mite genomic resources will broaden our understanding of the evolutionary mechanisms responsible for reduced genome sizes.
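
The nested counts in this paragraph are successive intersections of sets of missing BUSCO identifiers. A toy sketch with made-up BUSCO IDs (the function name and data are hypothetical):

```python
from typing import Set, Tuple


def nested_missing(query: Set[str], sister: Set[str], outgroup: Set[str]) -> Tuple[int, int, int]:
    """Count BUSCOs missing in query, also missing in sister, and also missing in outgroup."""
    shared_sister = query & sister
    shared_all = shared_sister & outgroup
    return len(query), len(shared_sister), len(shared_all)


if __name__ == "__main__":
    # Made-up missing-BUSCO sets standing in for the mite, A. lycopersici and F. setiger.
    mite = {"b1", "b2", "b3", "b4"}
    aculops = {"b1", "b2", "b3", "b9"}
    fragariocoptes = {"b1", "b2", "b7"}
    print(nested_missing(mite, aculops, fragariocoptes))  # (4, 3, 2)
```

Because each count is a subset of the previous one, a large overlap at every step points to losses shared by the whole lineage rather than independent losses in each species.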

The mite genome was annotated with GeMoMa v1.7.1 (Keilwagen et al. 2019) using 15 invertebrate genomes annotated by NCBI as references (supplementary table S4, Supplementary Material online). Predicted transcriptome statistics were generated with SAAGA v0.7.7 (Edwards et al. 2021). Annotation completeness was assessed using BUSCO in proteome mode. Ribosomal RNA (rRNA) genes were predicted with barrnap v0.9 (Seemann 2018), and transfer RNAs (tRNAs) were predicted with tRNAscan-SE v2.0.5 (Lowe and Chan 2016), implementing Infernal v1.1.2 (Nawrocki and Eddy 2013). A custom repeat library was generated with RepeatModeler v2.0.1 (-engine ncbi), and the genome was masked with RepeatMasker v4.1.0 (Tarailo-Graovac and Chen 2009). Telomeric repeat sequences were identified by manually inspecting the ends of the two chromosome-scale scaffolds and creating a consensus repeat unit. For a consistent comparison, the 36 mite genomes downloaded from NCBI (4 November 2021) (supplementary table S4, Supplementary Material online) were subjected to the same BUSCO, GeMoMa, SAAGA, and RepeatModeler/RepeatMasker annotation. BUSCO results for all the mite species and selected taxonomic levels were compiled using BUSCOMP v1.1.3 (Stuart et al. 2022).
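
Compiling BUSCO results for a clade, as reported in Fig. 2b, amounts to keeping the best rating of each BUSCO gene across the member species. The sketch below assumes a simplified ranking of Complete > Duplicated > Fragmented > Missing; BUSCOMP itself may use additional categories and its own ordering, and the names here are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed ranking, best first (a simplification, not BUSCOMP's exact scheme).
RANK = {"Complete": 0, "Duplicated": 1, "Fragmented": 2, "Missing": 3}


def best_ratings(per_species: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Best rating per BUSCO ID across species (IDs absent from a table are skipped)."""
    best: Dict[str, str] = {}
    for table in per_species:
        for busco_id, rating in table.items():
            if busco_id not in best or RANK[rating] < RANK[best[busco_id]]:
                best[busco_id] = rating
    return best


if __name__ == "__main__":
    sp1 = {"b1": "Missing", "b2": "Fragmented", "b3": "Complete"}
    sp2 = {"b1": "Duplicated", "b2": "Missing", "b3": "Missing"}
    clade = best_ratings([sp1, sp2])
    print(clade)  # {'b1': 'Duplicated', 'b2': 'Fragmented', 'b3': 'Complete'}
    print(Counter(clade.values()))  # summary counts per rating for the clade
```

Under this scheme a gene counts as Missing for a clade only if it is missing in every member species, which is why clade-level missing counts shrink as more species are added.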

GeMoMa genome annotation predicted 8,071 protein-coding genes (9,269 mRNAs), along with 18 rRNAs and 56 tRNAs. The 18 rRNA genes comprised six tandem copies of the complete 18S-5.8S-28S rDNA repeat unit. DepthKopy analysis revealed that the rDNA cluster was collapsed in the assembly, with the six assembled copies of the repeat unit representing 36 copies in the actual genome (mean copy number per gene ∼12) (Fig. 1). The genome had a low repeat content of 11.1% (supplementary table S2, Supplementary Material online), and there was no sign of widespread collapsed repeats in the assembly (Fig. 1c). As in the two other eriophyoid mites, the annotation reveals reduced numbers of introns and transposable elements, and an overall reduction in repetitive and intergenic content (Fig. 3). Despite a reduction in gene numbers, the percentage of the genome that corresponds to exons and genes is higher in these species (Fig. 3a). While detailed annotation and investigation of gene content are beyond the scope of this study, our gapless telomere-to-telomere nuclear genome provides an excellent resource for studying genome reduction, even though the mite species remains unidentified.
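
The collapsed-repeat inference rests on converting read depth into copy number by dividing by the single-copy depth (35.7× for the mite) and summing over the assembled copies. A minimal sketch with hypothetical depths; it is a simplification and not DepthKopy's implementation:

```python
from typing import Iterable


def copy_number(depth: float, single_copy_depth: float) -> float:
    """Estimated copy number = observed depth / single-copy depth."""
    if single_copy_depth <= 0:
        raise ValueError("single-copy depth must be positive")
    return depth / single_copy_depth


def implied_genomic_copies(copy_depths: Iterable[float], single_copy_depth: float) -> float:
    """Sum of per-copy estimates: how many genomic copies the assembled ones stand for."""
    return sum(copy_number(d, single_copy_depth) for d in copy_depths)


if __name__ == "__main__":
    sc_depth = 35.7  # single-copy depth reported for the mite reads
    # Hypothetical depths for three assembled copies of a collapsed repeat.
    depths = [107.1, 142.8, 178.5]
    print([round(copy_number(d, sc_depth), 1) for d in depths])  # [3.0, 4.0, 5.0]
    print(round(implied_genomic_copies(depths, sc_depth), 1))    # 12.0
```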

Fig. 3.

Comparison of mite genome attributes. Genome statistics for 37 Acari genomes are shown as density plots, with individual statistics marked for the assembled eriophyoid mite genome (RARGMITEv1; square), A. lycopersici (tomato russet mite genome ACULYCv1; circle), and F. setiger (gall mite genome FRASETv1; triangle). a) Genomic features by percentage of genome. b) Introns and transposable elements by count. c) Genome features by length. LowComp, low complexity; SSRs, simple sequence repeats; TEs, transposable elements; SEGs, single-exon genes.

Identification and Removal of Assembly Contamination

Improvements in sequencing technology have greatly increased the ease with which whole genomes can be sequenced and assembled (Gupta 2022). However, as the number of genomes rises, the resources available for curating each genome decrease (Howe et al. 2021). Genome contamination is a well-known problem and is prevalent among short-read and unpublished draft genomes (Kumar et al. 2013; Chrisman et al. 2022; Steinegger and Salzberg 2020). The mite contamination in the R. argentea draft genome was particularly difficult to identify in the initial draft assembly because of the large number of variable-quality contigs, the high sequencing depth of the contaminant, and the lack of assemblies from related organisms. Variations in read depth and guanine-cytosine (GC) content were dismissed as possible regional biases rather than recognized as signs of contamination. Instead, contamination in this case was identified by targeted searches for arthropod sequences (pers. comm. W. Dermauw, J. Santolini, and S. Gerber).

Tools such as BlobToolKit (Kumar et al. 2013), FCS-GX (Astashyn et al. 2024), Conterminator (Steinegger and Salzberg 2020), Tiara (Karlicki et al. 2022), and ContScout (Bálint et al. 2024) improve the ease and resolution of genomic DNA contamination screening. The approach used here, Taxolotl (Tobias et al. 2022), complements DNA-based methods by assigning taxonomy to protein sequences, which enables matches across greater evolutionary distances. Routine FCS-GX screening by NCBI upon submission should greatly reduce the number of genomes released with significant contamination. Nevertheless, caution should be used when interpreting unusual findings (e.g. Saffar and Matin 2021), particularly when using unpublished draft genomes, for which it is always good practice to contact the submitting researchers. In the long term, the community needs improved mechanisms for removing contamination from genomes in public repositories. Ideally, expert institutions (e.g. zoological/botanical gardens and museums) will be given the resources and authority to leverage high-confidence data and taxonomic knowledge to identify and flag contamination for removal.

Conclusions

The power of long-read sequencing is illustrated by the ability to recover a gapless telomere-to-telomere mite genome from contaminating DNA. At 34.3 Mbp, our mite assembly is not much bigger than the (less intact) 32.5 Mbp A. lycopersici genome, and is notable in its completeness despite poor BUSCO results. While the identity of the contaminating mite remains a mystery, the completeness of its genome should provide a valuable resource for investigating genome reduction and other characteristic traits of eriophyoid mites. The novel TTTGG and TTTGGTGTTGG sequences identified could improve telomere identification in other mites if this is a shared trait.

Supplementary Material

Supplementary material is available at Genome Biology and Evolution online.

Acknowledgments

We are grateful to Jérôme Santolini, Sophie Gerber, and Wannes Dermauw for contacting us about contamination in the draft genome. We thank Tamera Beath for assistance with sampling at the Australian National Botanic Garden. This research includes computations using the computational cluster Katana supported by Research Technology Services at UNSW Sydney (https://doi.org/10.26190/669x-a286) and the Pawsey Supercomputing Research Centre's Setonix Supercomputer (https://doi.org/10.48569/18sb-8s43), with funding from the Australian Government and the Government of Western Australia.

Author Contributions

Richard Edwards (Conceptualization, Investigation, Methodology, Software, Formal analysis, Writing—original draft, Writing—review and editing, Supervision), Stephanie Chen (Investigation, Formal analysis, Writing—original draft, Writing—review and editing), Bruce Halliday (Investigation, Writing—review and editing), and Jason Bragg (Writing—review and editing, Supervision).

Funding

This research was funded by the Australian Research Council (LP18010072) and the University of New South Wales. Stephanie Chen was supported by an Australian Government Research Training Program (RTP) Scholarship.

Data Availability

The mite genome (Rarg_mite_v1.1) was uploaded to BioProject PRJNA737568, which contains sequences and genomes of the Rhodamnia argentea host plant. The raw 10x linked reads were deposited under BioProject PRJEB30444, which was created for an earlier draft genome. Additional data supporting this work can be found on OSF, doi: 10.17605/OSF.IO/EV2NY (https://osf.io/ev2ny/).

Benefit Sharing

All raw sequencing data and assembled genomes have been shared with the broader public via appropriate biological databases.

Literature Cited

Amrine

Manson

DCM

. 1.6.3 Preparation, mounting and descriptive study of eriophyoid mites. In:

Eriophyoid mites their biology, natural enemies and control

. World Crop Pests.

Lindquist

Sabelis

, &

Bruin

, editors. Vol.

Elsevier

;

1996

. pp.

383

–

396

. https://www.sciencedirect.com/science/article/abs/pii/S1572437996800236?via%3Dihub

Amrine

Stasny

Flechtmann

Revised keys to world genera of eriophyoidea (acari: prostigmata)

Michigan, USA

Indira Publishing House

;

2003

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Arkhipova

Neutral theory, transposable elements, and eukaryotic genome evolution

Mol Biol Evol.

2018

(

1332

–

1337

10.1093/molbev/msy083

Astashyn

Tvedte

Sweeney

Sapojnikov

Bouk

Joukov

Mozes

Strope

Sylla

Wagner

, et al.

Rapid and sensitive detection of genome contamination at scale with FCS-GX

Genome Biol

2024

(

10.1186/s13059-024-03198-7

Bálint

Merényi

Hegedüs

Grigoriev

Hou

Földi

Nagy

ContScout: sensitive detection and removal of contamination from annotated genomes

Nat Commun.

2024

(

936

10.1038/s41467-024-45024-5

Benson

Tandem repeats finder: a program to analyze DNA sequences

Nucleic Acids Res

1999

(

573

–

580

Bolton

Chetverikov

Ochoa

Klimov

Where eriophyoidea (acariformes) belong in the tree of life

Insects

2023

(

527

10.3390/insects14060527

Bushnell

BBMap: a fast, accurate, splice-aware aligner. 2014. https://sourceforge.net/projects/bbmap/ (Accessed March 4, 2021).

Camacho

Coulouris

Avagyan

Papadopoulos

Bealer

Madden

BLAST+: architecture and applications

BMC Bioinformatics

2009

(

421

10.1186/1471-2105-10-421

Chen

Jones

Lu-Irving

Yap

J-YS

van der Merwe

Bragg

Edwards

Chromosome-Level genome assembly of the Australian rainforest tree Rhodamnia argentea (Malletwood)

Genome Biol Evol

2024

(

evae238

Chen

Rossetto

van der Merwe

Lu-Irving

Yap

J-YS

Sauquet

Bourke

Amos

Bragg

Edwards

Chromosome-level de novo genome assembly of Telopea speciosissima (New South Wales waratah) using long-reads, linked-reads and Hi-C

Mol Ecol Resour.

2022

(

1836

–

1854

10.1111/1755-0998.13574

Chetverikov

Klimov

Vertical transmission and seasonal dimorphism of eriophyoid mites (Acariformes, Eriophyoidea) parasitic on the Norway maple: a case study

R Soc Open Sci.

2022

(

220820

Chrisman

Jung

J-Y

Stockham

Paskov

Washington

Wall

The human “contaminome”: bacterial, viral, and computational contamination in whole genome sequences from 1000 families

Sci Rep.

2022

(

9863

10.1038/s41598-022-13269-z

Coombe

Wong

Nikolic

Warren

Birol

LongStitch: high-quality genome assembly correction and scaffolding using long reads

BMC Bioinformatics

2021

(

534

10.1186/s12859-021-04451-7

de Lillo

Pozzebon

Valenzano

Duso

An intimate relationship between eriophyoid mites and their host plants—a review

Front Plant Sci.

2018

1786

10.3389/fpls.2018.01786

Edwards

RJ.

Telociraptor: Telomere Prediction and Genome Assembly Editing Tool. 2023. https://github.com/slimsuite/telociraptor.

Edwards

Field

Ferguson

Dudchenko

Keilwagen

Rosen

Johnson

Rice

Hillier

Hammond

, et al.

Chromosome-length genome assembly and structural variations of the primal Basenji dog (Canis lupus familiaris) genome

BMC Genomics

2021

(

188

10.1186/s12864-021-07493-6

Edwards

Moran

Devocelle

Kiernan

Meade

Signac

Foy

Park

SDE

Dunne

Kenny

, et al.

Bioinformatic discovery of novel bioactive peptides

Nat Chem Biol.

2007

(

108

–

112

Gamliel-Atinsky

Freeman

Maymon

Belausov

Ochoa

Bauchan

Skoracka

Peña

Palevsky

. The role of eriophyoids in fungal pathogen epidemiology, mere association or true interaction? In:

Ueckermann

, editors.

Eriophyoid mites: progress and prognoses

Dordrecht, Netherlands

Springer

;

2010

. p.

191

–

204

Google Scholar

Google Preview

OpenURL Placeholder Text

WorldCat

Gougherty

Davies

A global analysis of tree pests and emerging pest threats

Proc Natl Acad Sci U S A.

2022

119

(

e2113298119

10.1073/pnas.2113298119

Greenhalgh

Dermauw

Glas

Rombauts

Wybouw

Thomas

Alba

Pritham

Legarrea

Feyereisen

, et al.

Genome streamlining in a minute herbivore that manipulates its host plant Weigel, D & Heckel, D, editors

eLife

2020

e56689

Gregory, Young. Small genomes in most mites (but not ticks). Int J Acarol. 2020. doi:10.1080/01647954.2019.1684561.

Gupta. Earth biogenome project: present status and future plans. Trends Genet. 2022:811–820. doi:10.1016/j.tig.2022.04.008.

Hessen, Jeyasingh, Neiman, Weider. Genome streamlining and the elemental costs of growth. Trends Ecol Evol. 2010. doi:10.1016/j.tree.2009.08.004.

Howe, Chow, Collins, Pelan, Pointon D-L, Sims, Torrance, Tracey, Wood. Significantly improving the quality of genome assemblies through curation. Gigascience. 2021:giaa153. doi:10.1093/gigascience/giaa153.

Brown, González De la Rosa, Mark. A Telomere Identification Toolkit. 2023. doi:10.5281/zenodo.10091385.

Karlicki, Antonowicz, Karnkowska. Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics. 2022:344–350. doi:10.1093/bioinformatics/btab672.

Keilwagen, Hartung, Grau. GeMoMa: homology-based gene prediction utilizing intron position conservation and RNA-seq data. Methods Mol Biol. 2019:1962:161–177. doi:10.1007/978-1-4939-9173-0_9.

Klimov, Chetverikov, Dodueva, Vishnyakov, Bolton, Paponova, Lutova, Tolstikov. Symbiotic bacteria of the gall-inducing mite Fragariocoptes setiger (Eriophyoidea) and phylogenomic resolution of the eriophyoid position among Acari. Sci Rep. 2022:3811. doi:10.1038/s41598-022-07535-3.

Kolmogorov, Yuan, Lin, Pevzner. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol. 2019:540–546. doi:10.1038/s41587-019-0072-8.

Kumar, Jones, Koutsovoulos, Clarke, Blaxter. Blobology: exploring raw genome data for contaminants, symbionts and parasites using taxon-annotated GC-coverage plots. Front Genet. 2013:237. doi:10.3389/fgene.2013.00237.

Kundu, Casey, Sung W-K. HyPo: super fast and accurate polisher for long read genome assemblies. bioRxiv 2019.12.19.882506. doi:10.1101/2019.12.19.882506. 20 December 2019, preprint: not peer reviewed.

Lindquist, Bruin, Sabelis. Eriophyoid mites: their biology, natural enemies and control. Elsevier; 1996.

Lowe, Chan. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 2016:W54–W57.

Lyčka, Bubeník, Závodník, Peska, Fajkus, Demko, Fajkus, Fojtová. TeloBase: a community-curated database of telomere sequences across the tree of life. Nucleic Acids Res. 2024:D311–D321.

Manni, Berkeley, Seppey, Simão, Zdobnov. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol Biol Evol. 2021:4647–4654. doi:10.1093/molbev/msab199.

Nawrocki, Eddy. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013:2933–2935. doi:10.1093/bioinformatics/btt509.

Oldfield. 1.4.3 Diversity and host plant specificity. In: Lindquist, Sabelis, Bruin, editors. Eriophyoid mites: their biology, natural enemies and control. World Crop Pests. Elsevier; 1996. p. 199–216.

Podlevsky, Bley, Omana, Chen JJ-L. The telomerase database. Nucleic Acids Res. 2008:(Database issue):D339–D343.

Price, Dehal, Arkin. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS One. 2010:e9490. doi:10.1371/journal.pone.0009490.

Saffar, Matin. Tracing foreign sequences in plant transcriptomes and genomes using OCT4, a POU domain protein. Mol Genet Genomics. 2021:296:677–688. doi:10.1007/s00438-021-01768-z.

Scott. A revision of Rhodamnia (Myrtaceae). Kew Bull. 1979:429–459.

Seemann. barrnap. 2018. https://github.com/tseemann/barrnap (Accessed December 11, 2021).

Sievers, Wilm, Dineen, Gibson, Karplus, Lopez, McWilliam, Remmert, Söding, et al. Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol. 2011:539.

Skoracka, Smith, Oldfield, Cristofaro, Amrine. Host-plant specificity and specialization in eriophyoid mites and their importance for the use of eriophyoid mites as biocontrol agents of weeds. Exp Appl Acarol. 2010:(1–3):–113. doi:10.1007/s10493-009-9323-6.

Steinegger, Salzberg. Terminating contamination: large-scale search identifies more than 2,000,000 contaminated entries in GenBank. Genome Biol. 2020:115. doi:10.1186/s13059-020-02023-1.

Stuart, Edwards, Cheng, Warren, Burt, Sherwin, Hofmeister, Werner, Ball, Bateson, et al. Transcript- and annotation-guided genome assembly of the European starling. Mol Ecol Resour. 2022:3141–3160. doi:10.1111/1755-0998.13679.

Tarailo-Graovac, Chen. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinformatics. 2009:4.10.1–4.10.14. doi:10.1002/0471250953.bi0410s25.

Tobias, Edwards, Surana, Mangelson, Inácio, do Céu Silva, Várzea, Park, Batista. A chromosome-level genome resource for studying virulence mechanisms and evolution of the coffee rust pathogen Hemileia vastatrix. bioRxiv 502101. doi:10.1101/2022.07.29.502101. 1 August 2022, preprint: not peer reviewed.

Ueckermann. Eriophyoid mites: progress and prognoses. Dordrecht, Netherlands: Springer; 2010.

Zhang, Rabiee, Sayyari, Mirarab. ASTRAL-III: polynomial time species tree reconstruction from partially resolved gene trees. BMC Bioinformatics. 2018:153. doi:10.1186/s12859-018-2129-y.

Zhang Z-Q. Eriophyoidea and allies: where do they belong? Syst Appl Acarol. 2017:1091–1095.

Author notes

Richard J Edwards and Stephanie H Chen contributed equally to the work.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site—for further information please contact [email protected].

