Abstract

The Apicomplexa are a phylum of single-celled eukaryotes that can infect humans and include the mosquito-borne parasite Plasmodium, the cause of malaria. Viruses that infect non-Plasmodium spp. disease-causing protozoa affect the pathogen life cycle and disease outcomes. However, only one RNA virus (Matryoshka RNA virus 1) has been identified in Plasmodium, and none have been identified in zoonotic Plasmodium species. The rapid expansion of the known RNA virosphere via metagenomic sequencing suggests that this dearth is due to the divergent nature of RNA viruses that infect protozoa. We leveraged newly uncovered data sets to explore the virome of human-infecting Plasmodium species collected in Sabah, east (Borneo) Malaysia. From this, we identified a highly divergent RNA virus in two human-infecting P. knowlesi isolates that is related to the unclassified group ‘ormycoviruses’. By characterizing 15 additional ormycoviruses identified in the transcriptomes of arthropods, we show that this group of viruses exhibits a complex ecology as noninfecting passengers at the arthropod–mammal interface. With the addition of viral diversity discovered using the artificial intelligence–based analysis of metagenomic data, we also demonstrate that the ormycoviruses are part of a diverse and unclassified viral taxon. This is the first observation of an RNA virus in a zoonotic Plasmodium species. By linking small-scale experimental data to advances in large-scale virus discovery, we characterize the diversity and confirm the putative genomic architecture of an unclassified viral taxon. This approach can be used to further explore the virome of disease-causing Apicomplexa and better understand how protozoa-infecting viruses may affect parasite fitness, pathobiology, and treatment outcomes.

Introduction

Parasitic protozoa are a highly diverse collection of single-celled eukaryotes that can cause disease in many vertebrates. Organisms belonging to the phylum Apicomplexa are associated with a range of human diseases including malaria (Plasmodium), inflammation of the brain (Toxoplasma) (Aguirre, 2019), diarrhoea (Cryptosporidium) (Ryan et al. 2014), and severe anaemia (Babesia) (Krause 2019). Plasmodium is the leading cause of death from the Apicomplexa in humans worldwide (Poespoprodjo et al. 2023). This mosquito-borne infection is estimated to have caused >240 million cases of malaria and to have killed >600 000 people in 2022 alone (WHO 2023).

Efforts to control and treat malaria are challenged by the complex ecology of this parasite (Lee et al. 2022) and mounting antimalarial drug resistance (Wellems and Plowe 2001, Rosenthal et al. 2024). Of the five human-only infecting species of Plasmodium (P. falciparum, P. vivax, P. malariae, P. ovale wallikeri, and P. ovale curtisii), P. falciparum and P. vivax cause the greatest morbidity and mortality, with P. falciparum accounting for >95% of malaria fatalities (Poespoprodjo et al. 2023). Partial resistance of P. falciparum to artemisinin is entrenched in the Greater Mekong subregion of Southeast Asia (Blasco et al. 2017, Ehrlich et al. 2020) and has now emerged independently in Africa (Conrad et al. 2023, WHO 2023, Rosenthal et al. 2024). Eight additional Plasmodium species can cause human malaria through zoonotic transmission via mosquito vectors (Fornace et al. 2023). Among these, P. knowlesi is the only species to cause severe disease and death in humans (Grigg et al. 2018, Rajahram et al. 2019, Anstey et al. 2021, Poespoprodjo et al. 2023). Although predominating in Malaysian Borneo (Cox-Singh et al. 2008, Cooper et al. 2020), P. knowlesi is now recognized as a significant cause of malaria across Southeast Asia (Tobin et al. 2024), in association with changing land use and deforestation (Grigg et al. 2017, Brock et al. 2019, Fornace et al. 2019, Tobin et al. 2024), and in areas with declining incidence of the cross-protective species, P. vivax (Anstey and Grigg 2019). Thus, innovative strategies are needed to combat and control Plasmodium as the efficacy of accessible treatments declines in the human-only species, and changes in land-use cause greater numbers of zoonotic malaria cases.

One potential approach for malaria control involves the use of viruses that infect disease-causing protozoa. In a similar manner to how bacteriophages have been leveraged to combat drug-resistant bacterial infections (Kortright et al. 2019, Hatfull et al. 2022, Strathdee et al. 2023), protozoa-infecting viruses have been proposed as a potential new avenue for therapeutics (Barrow et al. 2020, Zhao et al. 2023). These parasitic protozoan viruses (PPVs) (Zhao et al. 2023) have been identified in Giardia (Wang and Wang 1986), Leishmania (Grybchuk et al. 2018, Atayde et al. 2019), Cryptosporidum (Berber et al. 2021, Adjou et al. 2023), Eimeria (Revets et al. 1989, Ellis and Revets 1990, Roditi et al. 1994, Lee et al. 1996, Lee and Fernando 1998, 2000, Han et al. 2011, Wu et al. 2016, Xin et al. 2016), Toxoplasma (Gupta et al. 2024), P. vivax (Charon et al. 2019, Kim et al. 2019), and Babesia (Johnston et al. 1991, Hotzel et al. 1992). They are of particular interest because some impact the parasite life cycle and modulate disease outcomes in the parasite host. Notably, Leishmania species that harbour Leishmania RNA virus 1 have been associated with an increased risk of treatment failure in humans (Heeren et al. 2023) and more severe disease outcomes in mice (Atayde et al. 2019). Similarly, it has been proposed that infection of Toxoplasma with the recently characterized apocryptoviruses (Narnaviridae) may be associated with increased disease severity in humans (Gupta et al. 2024), although this has yet to be formally tested. Cryptosporidium parvum virus 1 modulates the interferon response in Cryptosporidium-infected mammals (Deng et al. 2023). To date, however, only one virus, Matryoshka RNA virus 1, has been identified in a Plasmodium species (P. vivax) (Charon et al. 2019, Kim et al. 2019), and it is not known whether this virus impacts Plasmodium fitness or disease pathogenesis in humans.

Extending the known diversity of PPVs requires innovative approaches to virus discovery because both protozoa and the viruses that infect them are likely ancient and often highly divergent. As a case in point, the ormycoviruses were first identified in parasitic single-celled eukaryotes and fungi using structure-based methods (Forgia et al. 2022) and have since been identified in kelp (Stramenopila) (Dekker et al. 2024), ticks (Martyn et al. 2024), palm (Niu et al. 2024), and additional fungal species (Pagnoni et al. 2023, Sahin et al. 2024). This group of bi-segmented RNA viruses shares no discernible phylogenetic relationship to known viral taxa, rendering it previously invisible to sequence-based discovery methods (Forgia et al. 2022). Little else is known about ormycoviruses including their complete host range or whether they encode positive- or negative-sense genomes. The application of artificial intelligence–based methods to metagenomic data (Hou et al. 2024), in addition to the large-scale sampling of aquatic environments (Zayed et al. 2022), has further uncovered previously inaccessible virus diversity, including entirely novel ‘supergroups’ of unclassified viral taxa (Hou et al. 2024). These tools and the data they have generated can be leveraged to explore the viromes of disease-causing protozoa including Plasmodium.

In this study, we combine these facets of virus discovery to characterize a divergent virus associated with human-infecting P. knowlesi isolates. We contextualize this virus within the vast viral diversity revealed through large-scale virus discovery studies. We also explore the complex ecology of viruses that infect parasites and can be transmitted as passengers to mammalian hosts (i.e., without directly infecting the mammal). Our findings extend the diversity of known Plasmodium-associated viruses and highlight the importance of integrating large- and small-scale virus discovery research to better understand viruses that infect these ancient, microscopic hosts.

Results

Identification of a divergent RNA virus associated with human-infecting Plasmodium knowlesi

To extend the known diversity of RNA viruses in disease-causing apicomplexans, we analysed the metatranscriptomes of 18 human blood samples with polymerase chain reaction (PCR)-confirmed Plasmodium infections and six uninfected human controls, collected in Sabah, east (Borneo) Malaysia between 2013 and 2014. These samples are the same as those previously described (Charon et al. 2019). Of the patients with malaria, seven were infected with P. vivax, six with P. knowlesi, and five with P. falciparum (Charon et al. 2019). Sequencing libraries were pooled according to Plasmodium species as were the negative controls, resulting in four libraries (SRR10448859–62, BioProject PRJNA589654). Matryoshka RNA virus 1 was previously found exclusively in all seven P. vivax isolates (SRR10448862) (Charon et al. 2019).

We searched each library for divergent viruses using the RNA-dependent RNA polymerase (RdRp)-scan bioinformatic pipeline (Charon et al. 2022). This revealed a putative, highly divergent RdRp that was 3177 nt in length with a complete open reading frame (ORF) and robust sequencing coverage (median coverage: 4004×) in the P. knowlesi library (SRR10448860) (Fig. 1a). No identical or related sequences were found in the other three libraries. The transcript was relatively abundant (1.4% of non-rRNA reads), and we confirmed the presence of this putative RdRp in two of the six isolates in the pool using reverse transcription PCR (RT-PCR) (Supplementary Table S1, Supplementary Fig. S1). Both patients with putative virus-infected P. knowlesi isolates were from Kota Marudu district residing in villages ∼30 km apart. There was a 3-month difference in the date of hospital presentation. Both had uncomplicated malaria with parasitaemia of 7177 and 41 882 parasites/µl, respectively, which were higher than the median parasitaemia found in the P. knowlesi infections that lacked the putative virus (4518/µl). Parasitaemia was correlated with the RdRp signals we observed with PCR (Supplementary Fig. S1).

A divergent RNA virus associated with human-infecting P. knowlesi is a member of the unclassified group ‘ormycovirus’. (a) Sequencing coverage of the RdRp of a P. knowlesi-associated viral contig. Trimmed reads were mapped to the assembled contig using BBMap (Bushnell et al. 2017) and visualized with Geneious Prime v2024.0.7. (b) The predicted structure of the putative hypothetical protein of Selindung RNA virus 1. (c) MAFFT alignment of motif C in the palm domain of the P. knowlesi- and Cystoisospora-associated viruses and representative ormycoviruses. (d) Phylogenetic inference of the ormycoviruses aligned with MAFFT. The positions of Erysiphe-associated viruses are denoted with black icons (source: phylopic.org). Red tip dots indicate viruses identified in this study. Tips with names in quotes were previously identified but not named (Forgia et al. 2022). Their corresponding NCBI or “RNA Viruses in Metatranscriptomes database” (RVMT) accession is shown in parentheses. The catalytic triad encoded in each palm domain is denoted in grey. Support values are shown at select nodes as sh-aLRT/UFBoot. Tree branches are scaled to amino acid substitutions.
Figure 1.

A divergent RNA virus associated with human-infecting P. knowlesi is a member of the unclassified group ‘ormycovirus’. (a) Sequencing coverage of the RdRp of a P. knowlesi-associated viral contig. Trimmed reads were mapped to the assembled contig using BBMap (Bushnell et al. 2017) and visualized with Geneious Prime v2024.0.7. (b) The predicted structure of the putative hypothetical protein of Selindung RNA virus 1. (c) MAFFT alignment of motif C in the palm domain of the P. knowlesi- and Cystoisospora-associated viruses and representative ormycoviruses. (d) Phylogenetic inference of the ormycoviruses aligned with MAFFT. The positions of Erysiphe-associated viruses are denoted with black icons (source: phylopic.org). Red tip dots indicate viruses identified in this study. Tips with names in quotes were previously identified but not named (Forgia et al. 2022). Their corresponding NCBI or “RNA Viruses in Metatranscriptomes database” (RVMT) accession is shown in parentheses. The catalytic triad encoded in each palm domain is denoted in grey. Support values are shown at select nodes as sh-aLRT/UFBoot. Tree branches are scaled to amino acid substitutions.

Further inspection indicated that this putative virus was a bi-segmented ormycovirus likely infecting the Plasmodium. The divergent RdRp shared low but detectable sequence similarity with that of seven previously identified viruses, of which six were ormycoviruses (Supplementary Table S2). We identified a putative second segment of unknown function, 1721 nt in length sharing 22.8% identity (e-value = 3.14 × 10−15) with the hypothetical protein of Erysiphe lesion-associated ormycovirus 1 (USW07196). The structures of the putative and known hypothetical proteins were significantly similar (P-value = 1.62 × 10−2) when predicted with AlphaFold2 (Jumper et al. 2021, Mirdita et al. 2022) and compared by pairwise alignment with FATCAT (Li et al. 2020) (Fig. 1b). Similar transcripts were not identified in the ormycovirus-negative libraries from the same BioProject. Analysis of the library composition with CCMetagen (Marcelino et al. 2020) and the KMA database (Clausen et al. 2018) did not reveal plausible host candidates aside from the Plasmodium, which comprised 24% of non-rRNA reads. The remainder aligned to the Hominidae, reflecting that the Plasmodium were themselves infecting humans. We assumed that the host range of the ormycoviruses likely did not extend to vertebrates, consistent with their absence in humans without Plasmodium infection. Unlike its closest relatives, the P. knowlesi-associated RdRp encoded GDD in motif C of its palm domain rather than NDD (Fig. 1c).

To assess the prevalence of this and other ormycoviruses in P. knowlesi, we screened 1470 P. knowlesi RNA Sequence Read Archive (SRA) libraries (Supplementary Data S1) with a custom ormycovirus database. This returned no additional ormycovirus candidates. However, all 1470 libraries were generated from only seven BioProjects, and only the library we generated was derived from human-host P. knowlesi infections. The majority (n = 1356) were generated from macaque-host P. knowlesi infections, and all of these were generated by a single contributor from a small set of laboratory-maintained Rhesus macaques (PRJNA508940, PRJNA526495, and PRJNA524357). Sixty-one libraries were derived from cell culture, and the source of 52 (BioProject PRJEB24220) could not be determined. Thus, an accurate prevalence estimate of the P. knowlesi-associated ormycovirus could not be obtained from this data set.

We next investigated the prevalence of ormycoviruses more broadly in disease-causing apicomplexan by screening 2898 RNA SRA libraries (Cryptosporidum, Coccidia, Toxoplasma, Babesia, and Theileria) (Supplementary Data S2). This yielded identical ormyco-like RdRp segments in the transcriptomes of 22 Coccidia (Cystoisospora suis) libraries, 21 of which belonged to the same BioProject (PRJEB52768) (Cruz-Bustos et al. 2022). The remaining library (SRR4213142) was published by the same authors, suggesting that all 22 libraries were generated from the same source (Palmieri et al. 2017). This bias precluded us from inferring the prevalence of ormycoviruses in Cystoisospora. The transcripts of the Cystoisospora-associated virus encoded complete ORFs with an NDD motif C and were ∼3.1 kb in length (range: 3009–3203) (Supplementary Table S3). This virus was highly divergent, sharing only 32.5% identity (e-value = 6 × 10−37) with its closest blast hit (Wildcat Canyon virus, WZL61396.1). It was also at low abundance across the 22 libraries (range: 0.01%–0.08% of non-rRNA reads). We could not conclude that C. suis was the host because fungi represented 4.6% of the non-rRNA reads in a representative library (ERR9846867). Regardless, the prevalence of ormycoviruses was 100% among Cystoisospora suis libraries but otherwise very low in this data set (0.76%).

Phylogenetic analysis placed both apicomplexan-associated viruses in the ‘Alpha’ clade of the ormycoviruses (Fig. 1d). The topology of the inferred phylogenies was stable across six combinations of alignment and trimming methods and recapitulated the three main ormycovirus clades ‘Alpha’, ‘Beta’, and ‘Gamma’ (Forgia et al. 2022) with strong support (Supplementary Fig. S2). Viruses did not cluster by host. For example, viruses associated with the fungal species Erysiphe fell across all three clades and encoded three different catalytic triads (Fig. 1d, ‘icons’), and the apicomplexan-associated viruses were not closely related within the Alpha clade.

We concluded that the P. knowlesi-associated virus represents the first evidence of an RNA virus associated with P. knowlesi and constitutes only the second instance of an RNA virus associated with any Plasmodium species. We have provisionally named it ‘Selindung RNA virus 1’ because it appeared to be concealed (‘terselindung’, Bahasa Malaysia) within the Plasmodium parasite, and we will use this name herein.

Ormycoviruses are associated with arthropod metatranscriptomes

In addition to expanding the diversity of Plasmodium-associated RNA viruses, Selindung RNA virus 1 was of particular interest because it had evidently been transmitted along with its Plasmodium host to a human via a mosquito vector. Taking this together with the detectable phylogenetic relationship of this virus and two viruses recovered from tick metagenomes (Wildcat Canyon virus and Kasler Point virus), we posited that ormycoviruses might exhibit a nested ecology at the arthropod–mammal interface. We therefore sought to further extend the known host range of ormycoviruses to the transcriptomes of the arthropods that indirectly transmit them.

We screened the 4864 arthropod libraries available on NCBI Transcriptome Shotgun Assemblies (TSAs) as of August 2024, initially using Kasler Point virus (WZL61394) as input and then following an iterative process (see Methods). In this way, we identified 15 putative viruses associated with three of the four extant subphyla of the Arthropoda: Chelicerata (n = 1), Crustacea (n = 1), and Hexapoda (n = 13) (Supplementary Table S4). All shared detectable but minimal sequence similarity with published ormycoviruses (range: 27.1%–41.0%, Supplementary Table S4). Two encoded GDD at motif C like Selindung RNA virus 1, while the remainder had NDD at this position.

Phylogenetic analysis again supported the conclusion that these viruses are part of the ormycovirus group (Fig. 2a). All viruses identified in this study fell in the Alpha clade. Selindung RNA virus 1 formed a group with the two other viruses that encode GDD at motif C in the palm domain of the RdRp (Beetle-associated ormycovirus 1 and Bristletail-associated ormycovirus 1). This placement was consistent across all six iterations of phylogenetic inference (Supplementary Fig. S3). However, aside from this instance and the Gamma clade (GDQ motif), minimal clustering of motifs was observed. In addition, although the host organisms had been collected from all six inhabited continents, there was no clustering of viruses by geographic region of sampling, perhaps a reflection of the unrealized diversity and ancient evolutionary history of this taxon (Fig. 2a).

Ormycoviruses are associated with arthropod transcriptomes. (a) Phylogenetic inference of the extended diversity of ormycoviruses. Viruses identified in this study are indicated by red circles. The arrow indicates the position of the virus that appears to use the ciliate genetic code. Clades are annotated according to designations established by Forgia et al. (2022). The catalytic triad encoded in each palm domain is denoted in grey. The tip labelling scheme for unnamed viruses (denoted by quotation marks) is the same as in Fig. 1. Support values are shown at select nodes as sh-aLRT/UFBoot. Tree branches are coloured by the location where each tip was sampled, and they are scaled by amino acid substitutions. (b) Library composition of select arthropod assemblies. The graph labels correspond to the TSA project ID.
Figure 2.

Ormycoviruses are associated with arthropod transcriptomes. (a) Phylogenetic inference of the extended diversity of ormycoviruses. Viruses identified in this study are indicated by red circles. The arrow indicates the position of the virus that appears to use the ciliate genetic code. Clades are annotated according to designations established by Forgia et al. (2022). The catalytic triad encoded in each palm domain is denoted in grey. The tip labelling scheme for unnamed viruses (denoted by quotation marks) is the same as in Fig. 1. Support values are shown at select nodes as sh-aLRT/UFBoot. Tree branches are coloured by the location where each tip was sampled, and they are scaled by amino acid substitutions. (b) Library composition of select arthropod assemblies. The graph labels correspond to the TSA project ID.

We concluded that these viruses were likely infecting fungi and single-celled organisms rather than the arthropods themselves for three reasons. First, the assessment of each library composition revealed instances of parasitic hosts. Contigs mapping to alveolates accounted for more than one-tenth of one Hexapoda (GDXN01) and the only crustacean (GFJG01) library (13% Gregarinidae and 12% Ciliophora, respectively) (Fig. 2b). Similarly, the mite assembly (GEYJ01) included 25% of contigs mapping to fungi, which are known hosts of ormycoviruses (Fig. 2b). Second, the virus identified in GBHO01 (Lygus hesperus) likely utilized the ciliate genetic code (i.e. only a truncated ORF could be recovered with the standard genetic code) yet fell within the diversity of the taxon (Fig. 2a, Supplementary Fig. S3, ‘arrow’). Thus, we assumed that any organism that utilizes the standard genetic code was unlikely to be the host of this putative virus. Identical amino acid translations of the crustacean-associated virus were produced when either the standard or the ciliate genetic code was used, rendering the ciliates detected in these libraries as plausible hosts. Third, the phylogenetic placement of arthropod-associated ormycoviruses was interspersed among known fungal and protozoan-infecting ormycoviruses, suggesting that animals were unlikely to be the host of the newly identified viruses (Fig. 2a). We therefore concluded that nonanimal eukaryotes are the most likely hosts of the ormycoviruses.

As with the P. knowlesi library, we searched these assemblies for hypothetical proteins. From this, we identified a putative second segment in the Machilis pallida (Hexapoda) assembly HBDP01 containing Bristletail-associated ormycovirus 1 that was 1619 bp in length and encoded a partial ORF (HBDP01002991.1). We could not recover candidates corresponding to the remaining libraries or assemblies.

The ormycoviruses are members of a diverse and unclassified viral taxon

The wide host range of the ormycoviruses, spanning Alveolata, Stramenopila, and Opisthokonta (Fungi), suggested that this unclassified group harboured unrealized viral diversity. We therefore aimed to contextualize the diversity of the ormycoviruses within unclassified taxa identified in virus discovery studies. To do this, we assembled a custom database of viruses that were identified with an artificial intelligence-based analysis of metagenomic data (Hou et al. 2024) and screened the ormycoviruses against it using DIAMOND Blastx (Buchfink et al. 2021). This approach placed ormycoviruses within an unclassified taxon referred to in the original study as the proposed ‘SuperGroup 024’ (Hou et al. 2024), a name which we will use herein.

Phylogenetic analysis illustrated that the current set of ormycoviruses represent only a fraction of the total diversity of this group as they fell throughout the phylogeny. Interestingly, the addition of the SuperGroup 024 viruses expanded the diversity of the Alpha group, scattering the original members across three sections of the tree (Fig. 3a, ‘blue branches’). The Beta and Gamma clades were unchanged and characterized by a long branch at their shared base (Fig. 3a, ‘green and yellow branches’).

Ormycoviruses are members of a diverse and unclassified viral taxon with a flexible motif C in its RdRp palm domain. (a) Phylogenetic inference of viruses in SuperGroup 024 (Hou et al. 2024). Branches are coloured by their placement in the ormycovirus-only phylogenetic tree (Figs 1 and 2). Grey tree branches indicate that those tips were not previously recognized as ormycoviruses. The icons show the proportion of individual amino acids at each position of the catalytic triad in motif C of the RdRp palm domain for the corresponding clades. The arrow indicates the topological position of Selindung RNA virus 1. Tree branches are scaled according to amino acid substitutions. (b) Distribution of catalytic triads encoded by members of SuperGroup 024. The x-axis shows the percentage that each triad comprises among all known SuperGroup 024 species.
Figure 3.

Ormycoviruses are members of a diverse and unclassified viral taxon with a flexible motif C in its RdRp palm domain. (a) Phylogenetic inference of viruses in SuperGroup 024 (Hou et al. 2024). Branches are coloured by their placement in the ormycovirus-only phylogenetic tree (Figs 1 and 2). Grey tree branches indicate that those tips were not previously recognized as ormycoviruses. The icons show the proportion of individual amino acids at each position of the catalytic triad in motif C of the RdRp palm domain for the corresponding clades. The arrow indicates the topological position of Selindung RNA virus 1. Tree branches are scaled according to amino acid substitutions. (b) Distribution of catalytic triads encoded by members of SuperGroup 024. The x-axis shows the percentage that each triad comprises among all known SuperGroup 024 species.

Members of SuperGroup 024 encoded a more diverse set of catalytic triads at the motif C palm domain compared to the original ormycovirus data set (Forgia et al. 2022) (Fig. 3b). However, their addition did not lead to observable clustering of discrete motif sequences, as flexibility was observed throughout the phylogeny. Selindung RNA virus 1 again fell in a section predominated by GDD at that position (Fig. 3a, ‘arrow’).

We searched the libraries containing SuperGroup 024 RdRp segments for ormycovirus hypothetical proteins. Of the 259 SRA libraries in which SuperGroup 024 RdRps were detected and assembled, we recovered hypothetical protein candidates at least 1000 bp in length in 190 (73.4%). It was not possible to assign hypothetical proteins to corresponding RdRps as many libraries contained multiple RdRp segments. Despite this, our finding supports the conclusion that bisegmentation is a characteristic of viruses in this taxon and that ormycoviruses and SuperGroup 024 are one and the same.

Discussion

This study expands the diversity of Plasmodium-associated RNA viruses and presents the first evidence of an RNA virus associated with zoonotic transmission of P. knowlesi. Previously, only Matryoshka RNA virus 1 (Narnaviridae) had been identified in a Plasmodium species (the human-only Plasmodium species, P. vivax) (Charon et al. 2019, Kim et al. 2019). Although it is not possible to conclusively establish that Selindung RNA virus 1 was infecting P. knowlesi from metatranscriptomic data alone, lines of indirect evidence suggest that it was. Most notably, no other probable hosts, including fungi, were identified in the library, and the RdRp contig was relatively abundant (1.4% of non-rRNA reads). Contamination was an unlikely source because neither the putative RdRp nor the second segment was detected in the other three libraries extracted and sequenced at the same time. In addition, we were able to confirm the presence of the RdRp segment in two of the six P. knowlesi isolates using PCR. We therefore concluded that Selindung RNA virus 1 most likely represents an RNA virus in a second Plasmodium species.

Robust sampling of natural P. knowlesi infections is needed to evaluate the prevalence and pathobiology of Selindung RNA virus 1. We observed one instance of Selindung RNA virus 1 among 1470 SRA libraries, which suggests that associations occur infrequently and contrasts with the identification of Matryoshka RNA virus 1 in 13 of 30 P. vivax SRA libraries (Charon et al. 2019). However, ours was the only library to have been generated from isolates collected from naturally infected humans, while most of the publicly available data were derived from laboratory experiments. The detection of the RdRp segment in two of six isolates in our library could indicate that associations are more frequent in natural infections in Sabah, but our study was not sufficiently powered to address this. Similarly, whether the observation that the presence of the virus was correlated with higher parasitaemia is meaningful requires further epidemiological investigation.

Arthropods are a powerful tool for measuring the prevalence of viruses in nature, particularly when sampling from humans or other vertebrates is not feasible. The identification of ormycoviruses in arthropod metatranscriptomes and in a human blood sample suggests that these viruses represent a unique type of arbovirus that can be transmitted as a passenger between arthropods and mammals. Mosquito-based surveillance methods have been proposed for tracking the incidence and spread of human pathogens (Grubaugh et al. 2015, Fauver et al. 2018). Unlike cell culture or primary samples, which rely on symptomatic individuals with access to diagnostic testing, arthropod-based surveillance would be relatively unbiased, enabling more accurate estimates of protozoan virus prevalence and diversity within communities. When combined with cell culture data, this approach could also be used to parse arthropod- and protozoan-infecting viruses. Because they can be indirectly transmitted by arthropods, it may be that other protozoan viruses have already been identified, but their relationship to their protozoan host was obscured because they were part of an arthropod metatranscriptome.

An incidental and surprising finding was the identification of an ormycovirus that appears to use a nonstandard genetic code (Plant bug-associated ormycovirus 1). Despite this difference, the virus fell within the diversity of the ormycoviruses and SuperGroup 024. As RNA viruses are reliant on host machinery for translation, it was previously proposed that the evolution of alternative genetic codes was an antiviral defence (Shackelton and Holmes 2008). Under this assumption, the use of host-specific genetic codes by RNA viruses would imply a long-term virus–host coevolutionary relationship, and we would not expect to find viral taxa in which members use different genetic codes. Genetic code switching has been observed infrequently in the Picornavirales and Lenarviricota (Chen et al. 2022). Whether these few instances are an aberration in an otherwise broadly held rule of virology requires further investigation. However, we posit that there may be many more instances of code switching within known viral taxa that have been overlooked as a consequence of inadequate bioinformatic workflows. For example, if we had used an automated pipeline that filtered out contigs that did not produce an ORF with the standard genetic code, Plant bug-associated ormycovirus 1 would have been removed from our data set. We therefore advocate for the inclusion of multiple genetic codes when searching for divergent RNA viruses.

That Selindung RNA virus 1 does not belong to a classified viral taxon is notable because it demonstrates that parasitic protozoa likely harbour currently unrealized diversity, and additional discoveries may be imminent as new bioinformatic tools are developed to explore the RNA virosphere. However, the discovery of the ormycoviruses highlights the importance of linking large-scale metatranscriptomic data to smaller-scale experimental work when searching for protozoan viruses. Large-scale virus discovery studies often prioritize environmental samples such as water (Zayed et al. 2022), sediment (Hou et al. 2024), and soil (Chen et al. 2022) because these biodiverse sources are rich with RNA viruses. Yet, this approach cannot distinguish between bacteria-, archaea-, and eukaryotic-infecting RNA viruses. Without the discovery of the ormycoviruses and the experimental validation by Forgia et al. (2022), SuperGroup 024 would have been overlooked as a potential source of protozoan virus candidates. Similarly, large-scale studies are not equipped to distinguish segmented from nonsegmented viruses because they necessarily focus on detecting RdRps, rendering them ‘blind’ to segmentation. The molecular characterization of the ormycoviruses again demonstrates this limitation because their hypothetical protein does not share detectable sequence or structural similarity with known viral proteins.

This, as with other metagenomic studies, primarily serves to generate hypotheses and raise questions about RNA virus evolution and biology that require additional experimental data to answer. It is not known whether the ormycoviruses are positive- or negative-sense viruses. Forgia et al. observed a higher proportion of negative-sense RNA in their samples but could not draw a definitive conclusion in the absence of a true virion (Forgia et al. 2022). The presence of both SDD and GDD catalytic triads in motif C of the palm domain counters the hypothesis that SDD is specific to segmented negative-sense RNA viruses (Venkataraman et al. 2018), although it is possible that ormycoviruses do indeed fall into this category. The flexibility of the catalytic triad also raises the question of whether individual triads have a detectable impact on the biology of the virus and why flexibility is permitted in an otherwise highly conserved region of the virus genome. From a global health perspective, the most important questions to address include how viral infection of Plasmodium affects onward Plasmodium transmission and the pathobiology of Plasmodium in humans. Additionally, which part of the parasite the virus infects and whether this could be used as a potential drug target remain unanswered. It has already been shown that viruses can serve as a weapon against drug-resistant bacterial infections (Kortright et al. 2019, Hatfull et al. 2022, Strathdee et al. 2023). Whether a similar approach could be deployed to combat malaria and other disease-causing apicomplexan should be a research priority.

Methods

Human malaria isolates

Plasmodium RNA was isolated from cryopreserved red cells collected from 18 patients with acute malaria, enrolled in Kudat Division, Sabah, Malaysia in 2013 and 2014 (Grigg et al. 2018). PCR was used to confirm Plasmodium species as P. knowlesi (n = 6), P. vivax (n = 7), and P. falciparum (n = 5). The methods for species typing and RNA extraction were reported previously (Charon et al. 2019).

SRA library data sets

BioProject PRJNA589654 libraries

Plasmodium SRA libraries in BioProject PRJNA589654 (n = 4) (i.e. the BioProject that contained Matryoshka RNA virus 1) were downloaded from NCBI. Nextera adapters were trimmed using Cutadapt v.1.8.3 (Martin 2011) with the parameters removing five bases from the beginning and end of each read, a quality cut-off of 24, and a minimum length threshold of 25. The quality of trimming was assessed using FastQC v0.11.8 (Andrews 2023). rRNA reads were removed using SortMeRNA v4.3.3 (Kopylova et al. 2012), and non-rRNA reads were assembled using MEGAHIT v1.2.9 (Li et al. 2015).

Disease-causing apicomplexan libraries

We downloaded all P. knowlesi RNA SRA libraries of at least 0.5 Gb in size available on NCBI as of August 2024 (n = 1470). We also downloaded all RNA SRA libraries for Cryptosporidium, Coccidia, Toxoplasmosis, Babesia, and Theileria available on NCBI as of March 2024 that are at least 0.5 Gb in size and generated on the Illumina platform (n = 3162).

SuperGroup 024 libraries

To analyse the libraries containing RdRp segments of so-called SuperGroup 024 (Hou et al. 2024), we first downloaded all of the contigs designated in this group by Hou et al. (2024) (http://47.93.21.181/). We then extracted the corresponding SRA libraries from each sequence header and removed duplicates (n = 273). All but one were downloaded from NCBI. The library SRR1027962 failed repeated attempts to download, likely due to its size (99.8 Gb).

Arthropod TSA screen

We began by screening all arthropod TSA (n = 4864) available in August 2024, using Kasler Point virus (a tick-associated ormycovirus) as input. This screen was performed with tBLASTn implemented in the NCBI Blast web interface (https://blast.ncbi.nlm.nih.gov/Blast.cgi). All hits were reviewed and filtered according to three criteria: (I) the contig was at least 800 bp in length, (II) the contig encoded an uninterrupted ORF, (III) the contig did not return any hits to cellular genes when screened against the NCBI nonredundant (nr) database. We then aligned our filtered data set using MAFFT (Katoh and Standley 2013) with default parameters and selected the most divergent virus according to the distance matrix. This virus was then used as input for an additional screen of the arthropod TSA. This process was repeated until no new contigs were identified.

Library processing

Contig assembly

For all data sets obtained from the SRA, Nextera adapters were trimmed using Cutadapt v.1.8.3 (Martin 2011) with the parameters described earlier. The efficacy of trimming was assessed using FastQC v0.11.8 (Andrews 2023). In total, 1470 P. knowlesi libraries, 2898 additional apicomplexan libraries, and 259 libraries included by Hou et al. (2024) were successfully assembled using MEGAHIT v1.2.974.

Abundance estimates

The expected count of putative viral transcripts was inferred using RSEM v1.3.0 (Li and Dewey 2011). For the P. knowlesi library containing the ormycovirus (SRR10448860), reverse-strandedness was specified to match the sequencing protocol. Default parameters were used for the remaining libraries. To infer the proportion of reads of each putative viral transcript, we calculated the total expected count for the isoforms in each library and used this value as the denominator to measure the percentage that putative viral reads comprised in the library. This analysis was performed using R v4.4.0.

Identification of divergent viruses

Polymerase segment identification

We identified Selindung RNA virus 1 using the RdRp-scan workflow (Charon et al. 2022). Briefly, we screened the protein sequence and Hidden Markov Models (HMM) profile of assembled contigs from each library against a viral RdRp database. To search for additional divergent viruses, we screened all SRA libraries against the RdRp-scan database (Charon et al. 2022) and a custom database containing known ormycoviruses using DIAMOND Blastx v2.0.9 (Buchfink et al. 2021) and the setting ‘ultra-sensitive’. This database included the 39 published ormycoviruses and the Selindung RNA virus 1 RdRp segment. Only hits with e-values <1e-07 were retained for further analysis. Contigs with hits to this database were then screened against the NCBI nr protein database to remove false positives, again using DIAMOND Blastx v2.0.9 (Buchfink et al. 2021) and an e-value threshold of 1e-07. The parameter ‘very-sensitive’ was specified. Contigs that shared detectable sequence similarity to cellular genes were excluded from further analysis. Nucleotide sequences were translated using Expasy (https://web.expasy.org/translate/). The standard genetic code was used by default. Contigs that did not return an ORF in any frame with this code were checked manually using all codes available in Expasy.

Second segment identification

We first used blastn to screen libraries for contigs sharing conserved 5ʹ and 3ʹ termini of the corresponding ormycovirus RdRp. When this did not reveal any candidates, we compiled a database of all known ormycovirus second segments and used this to screen all SRA libraries using DIAMOND Blastx v2.0.9 (Buchfink et al. 2021). Contigs that had statistically significant hits to this database were checked against the NCBI nr protein database to remove false positives (i.e. nonendogenous viral element cellular genes). Nucleotide sequences were either translated individually with Expasy (https://web.expasy.org/translate/) or with InterProScan v5.65-97.0. For sequences processed with the latter, the longest translated ORFs were used for downstream analysis. To tally the number of SuperGroup 024 libraries with detectable hypothetical proteins, we cross-checked the presence of RdRp segments and hypothetical protein segments in each library using R v4.4.0.

For the primary P. knowlesi library, we searched for similar sequences to those at the 5ʹ and 3ʹ termini of the RdRp segment in other contigs in the library. To do this, we extracted these regions from the RdRp segment and used each as input for tblastn against the assembled library (SRR10448860). To ensure that the putative Selindung RNA virus 1 hypothetical protein was not present in other libraries in the same BioProject, we used this sequence as input for tblastn against the three remaining libraries.

Both tblastn screens were implemented in Geneious Prime v2024.0.7 and default parameters were used.

PCR validation

We first generated cDNA from the isolates using the SuperScript IV reverse transcriptase (Invitrogen). These products were then used as templates for amplification with PCR. Reactions were carried out in a total volume of 50 µl, of which 25 µl was the SuperFi II (Invitrogen) master mix and 1 µl was the cDNA template. A volume of 2.5 µl of forward and reverse primers was used (Supplementary Table S1). Reactions were performed on a thermocycler with the following conditions: 98°C for 1 min followed by 35 cycles of 98°C for 10s, 60°C for 10s, 72°C for 1 min, and 72°C for 5 min. The PCR products were analysed on an agarose gel. We used Plasmodium LDHP primers as the positive control.

Library composition analysis

CCMetagen

The composition of individual sequencing libraries was assessed using ccmetagen v1.2.4 (Marcelino et al. 2020) and kma v1.3.9a (Clausen et al. 2018) with assembled contigs as input. The results presented in Fig. 2b were visualized with Prism v.10.3.0.

Protein structure inference

The structure of the putative hypothetical proteins of Selindung RNA virus 1 and Erysiphe lesion-associated ormycovirus 1 was predicted using AlphaFold2 (Jumper et al. 2021, Mirdita et al. 2022) implemented in the Google Colab cloud computing platform. The confidence (as measured by pLDDT) of the prediction was compared across five models, and the highest performing models (Selindung RNA virus 1: #2, Erysiphe lesion-associated ormycovirus 1: #4) were selected for downstream analysis (Supplementary Fig. S4). To assess structural similarity, we performed a pairwise alignment of the resulting PDB files of each predicted structure using FATCAT (Li et al. 2020). All PDB files were visualized in ChimeraX v1.7.1 (Meng et al. 2023).

Functional domain inference

Several approaches were used to infer functional domains in the hypothetical protein, although none were successful. We first performed a preliminary check with InterProScan (Jones et al. 2014), screening against the CDD, NCBIfam, and TMHMM databases. This approach was implemented in Geneious Prime v2024.0.7. We then employed Phyre2 (Kelley et al. 2015) and HHPred (Zimmermann et al. 2018) using PDB. Finally, we used the predicted structure of the hypothetical protein of Selindung RNA virus 1 as input for FoldSeek (van Kempen et al. 2023), implemented on the Foldseek Server.

Phylogenetic analysis

To assess the phylogenetic relationships of the ormycoviruses identified in this study with those documented previously, we compiled a data set of all known ormycoviruses. This comprised 36 ormycoviruses (Forgia et al. 2022, Pagnoni et al. 2023, Dekker et al. 2024, Martyn et al. 2024, Niu et al. 2024, Sahin et al. 2024) and unclassified or misclassified ormycoviruses that shared detectable sequence similarity with known ormycoviruses: Wildcat Canyon virus (WZL61396), Kasler Point virus (WZL61394), and a fungus-associated ‘Botourmiaviridae’ (UYL94578). For the SuperGroup 024 analysis, we utilized the data set featured in the phylogenetic analysis presented by Hou et al. (2024).

We first added the P. knowlesi-associated and Cystoisospora-associated viruses identified in this study to the ormycovirus data set and aligned with MAFFT v7.490 (Sievers et al. 2011) and MUSCLE v5.1 (Edgar 2022). Ambiguities in each alignment were considered in three ways using trimAl v1.4.1 (Capella-Gutiérrez et al. 2009): (i) no ambiguities were removed; (ii) ambiguities were removed using a gap threshold of 0.5 and a conservation percentage of 50; and (iii) ambiguities were removed using the parameter ‘gappyout’. Phylogenetic trees for these six alignments were inferred using ModelFinder and IQ-TREE v1.6.12 (Minh et al. 2020). To quantify support for the topology, we again used 1000 ultra-fast bootstraps and 1000 SH-aLRT bootstrap replicates.

To infer the pan-SuperGroup 024 phylogeny, all amino acid sequences were aligned using both MAFFT v7.490 (Sievers et al. 2011) and MUSCLE v5.1 (Edgar 2022). Ambiguities were removed using trimAl v1.4.1 (Capella-Gutiérrez et al. 2009) and the parameter -gappyout. The phylogenetic tree was inferred using IQ-TREE v1.6.12 (Minh et al. 2020) with ModelFinder limited to LG. Support values were measured with 1000 ultra-fast bootstraps (UFboot) and 1000 sh-aLRT bootstrap replicates.

All trees were visualized with ggtree (Yu et al. 2018, Yu 2020) (implemented in R v4.4.0) and Adobe Illustrator v26.4.1.

Motif C tally and visualization

The catalytic triad encoded by each virus in SuperGroup 024 was recorded and tabulated using R v4.4.0. The results were visualized using Prism v.10.3.0.

Sequences from individual clades were extracted from the SuperGroup 024 phylogeny by selecting individual nodes using the function ‘extract.clade()’ implemented in the R package ape. Sequences from each clade were then realigned using MAFFT and the motif C logos were generated according to the consensus sequence in Geneious Prime v2024.0.7.

Acknowledgements

We thank the Director General of Health, Malaysia for the permission to publish this article. Ethical approval for this study was obtained from the Medical Research and Ethics Committee, Ministry of Health Malaysia.

We also thank Jon Mifsud for his BatchArtemisSRAMiner pipeline (https://github.com/JonathonMifsud/BatchArtemisSRAMiner), which we used for all SRA screens, and Dr Alvin Kuo Jing Teo for suggesting the name Selindung RNA virus 1.

Author contributions

M.E.P., J.C., N.M.A., and E.C.H. designed the study. M.J.G, T.W., G.S.R., J.W., and K.A.P. designed the original malaria studies and collected the samples. M.E.P., J.C., and M.S. performed the experiments and analyses. M.E.P. wrote the initial manuscript draft. All authors reviewed and edited the manuscript.

Supplementary data

Supplementary data is available at VEVOLU Journal online.

Conflict of interest:

None declared.

Funding

This work was funded by a National Health and Medical Research Council Investigator award (M.J.G.), AIR@InnoHK administered by the Innovation and Technology Commission, Hong Kong Special Administrative Region, China (E.C.H.), a Sydney ID Seed Funding Award (M.E.P.), and the National Institutes of Health, USA R01 AI160457-01, and Malaysia Ministry of Health Grant BP00500/117/1002 (G.S.R.).

Data availability

All sequencing data analysed in this study are publicly available on NCBI (ormycoviruses) and an independent repository (http://47.93.21.181/, SuperGroup 024). Assembled contigs for the viruses identified in this study, the custom database used to screen libraries, alignments, and tree files are available on GitHub (https://github.com/mary-petrone/Plasmodium_ormyco).

References

Adjou
KT
,
Chevillot
A
,
Lucas
P
et al.
First identification of Cryptosporidium parvum virus 1 (CSpV1) in various subtypes of Cryptosporidium parvum from diarrheic calves, lambs and goat kids from France
.
Vet Res
2023
;
54
:66. doi:

Aguirre
AA
,
Longcore
T
,
Barbieri
M
et al.
The one health approach to toxoplasmosis: epidemiology, control, and prevention strategies
.
Ecohealth
2019
;
16
:
378
90
. doi:

Andrews
S
.
FastQC
.
2023
. https://github.com/s-andrews/FastQC

Anstey
NM
,
Grigg
MJ
.
Zoonotic malaria: the better you look, the more you find
.
J Infect Dis
2019
;
219
:
679
81
. doi:

Anstey
NM
,
Grigg
MJ
,
Rajahram
GS
et al.
Knowlesi malaria: human risk factors, clinical spectrum, and pathophysiology
.
Adv Parasitol
2021
;
113
:
1
43
. doi:

Atayde
VD
,
da Silva Lira Filho
A
,
Chaparro
V
et al.
Exploitation of the Leishmania exosomal pathway by Leishmania RNA virus 1
.
Nat Microbiol
2019
;
4
:
714
23
. doi:

Barrow
P
,
Dujardin
JC
,
Fasel
N
et al.
Viruses of protozoan parasites and viral therapy: is the time now right?
Virol J
2020
;
17
:142. doi:

Berber
E
,
Şimşek
E
,
Çanakoğlu
N
et al.
Newly identified Cryptosporidium parvum virus-1 from newborn calf diarrhoea in Turkey
.
Transbound Emerg Dis
2021
;
68
:
2571
80
. doi:

Blasco
B
,
Leroy
D
,
Fidock
DA
.
Antimalarial drug resistance: linking Plasmodium falciparum parasite biology to the clinic
.
Nat Med
2017
;
23
:
917
28
. doi:

Brock
PM
,
Fornace
KM
,
Grigg
MJ
et al.
Predictive analysis across spatial scales links zoonotic malaria to deforestation
.
Proc Biol Sci
2019
;
286
:20182351. doi:

Buchfink
B
,
Reuter
K
,
Drost
H-G
.
Sensitive protein alignments at tree-of-life scale using DIAMOND
.
Nat Methods
2021
;
18
:
366
68
. doi:

Bushnell
B
,
Rood
J
,
Singer
E
.
BBMerge—accurate paired shotgun read merging via overlap
.
PLoS One
2017
;
12
:e0185056. doi:

Capella-Gutiérrez
S
,
Silla-Martínez
JM
,
Gabaldón
T
.
trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses
.
Bioinformatics
2009
;
25
:
1972
73
. doi:

Charon
J
,
Buchmann
JP
,
Sadiq
S
et al.
RdRp-scan: a bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data
.
Virus Evol
2022
;
8
:veac082. doi:

Charon
J
,
Grigg
MJ
,
Eden
JS
et al.
Novel RNA viruses associated with Plasmodium vivax in human malaria and Leucocytozoon parasites in avian disease
.
PLoS Pathog
2019
;
15
:e1008216. doi:

Chen
YM
,
Sadiq
S
,
Tian
JH
et al.
RNA viromes from terrestrial sites across China expand environmental viral diversity
.
Nat Microbiol
2022
;
7
:
1312
23
. doi:

Clausen
PTLC
,
Aarestrup
FM
,
Lund
O
.
Rapid and precise alignment of raw reads against redundant databases with KMA
.
BMC Bioinf
2018
;
19
:307. doi:

Conrad
MD
,
Asua
V
,
Garg
S
et al.
Evolution of partial resistance to artemisinins in malaria parasites in Uganda
.
N Engl J Med
2023
;
389
:
722
32
. doi:

Cooper
DJ
,
Rajahram
GS
,
William
T
et al.
Plasmodium knowlesi malaria in Sabah, Malaysia, 2015-2017: ongoing increase in incidence despite near-elimination of the human-only plasmodium species
.
Clin Infect Dis
2020
;
70
:
361
67
. doi:

Cox-Singh
J
,
Davis
TM
,
Lee
KS
et al.
Plasmodium knowlesi malaria in humans is widely distributed and potentially life threatening
.
Clinl Infect Dis
2008
;
46
:
165
71
. doi:

Cruz-Bustos
T
,
Feix
AS
,
Lyrakis
M
et al.
The transcriptome from asexual to sexual in vitro development of Cystoisospora suis (Apicomplexa: Coccidia)
.
Sci Rep
2022
;
12
:5972. doi:

Dekker
RJ
,
de Leeuw
WC
,
van Olst
M
et al.
Discovery of novel RNA viruses in commercially relevant seaweeds Alaria esculenta and Saccharina latissima
.
bioRxiv
2024
. 2024.2005.2022.594653 doi:

Deng
S
,
He
W
,
Gong
AY
et al.
Cryptosporidium uses CSpV1 to activate host type I interferon and attenuate antiparasitic defenses
.
Nat Commun
2023
;
14
:1456. doi:

Edgar
RC
.
Muscle5: high-accuracy alignment ensembles enable unbiased assessments of sequence homology and phylogeny
.
Nat Commun
2022
;
13
:6968. doi:

Ehrlich
HY
,
Jones
J
,
Parikh
S
.
Molecular surveillance of antimalarial partner drug resistance in sub-Saharan Africa: a spatial-temporal evidence mapping study
.
Lancet Microbe
2020
;
1
:
e209
17
. doi:

Ellis
J
,
Revets
H
.
Eimeria species which infect the chicken contain virus-like RNA molecules
.
Parasitology
1990
;
101
:
163
69
. doi:

Fauver
JR
,
Weger-Lucarelli
J
,
Fakoli
LS
III
et al.
Xenosurveillance reflects traditional sampling techniques for the identification of human pathogens: a comparative study in West Africa
.
PLoS Negl Trop Dis
2018
;
12
:e0006348. doi:

Forgia
M
,
Chiapello
M
,
Daghino
S
et al.
Three new clades of putative viral RNA-dependent RNA polymerases with rare or unique catalytic triads discovered in libraries of ORFans from powdery mildews and the yeast of oenological interest Starmerella bacillaris
.
Virus Evol
2022
;
8
:veac038. doi:

Fornace
KM
,
Brock
PM
,
Abidin
TR
et al.
Environmental risk factors and exposure to the zoonotic malaria parasite Plasmodium knowlesi across northern Sabah, Malaysia: a population-based cross-sectional survey
.
Lancet Planet Health
2019
;
3
:
e179
86
. doi:

Fornace
KM
,
Topazian
HM
,
Routledge
I
et al.
No evidence of sustained nonzoonotic Plasmodium knowlesi transmission in Malaysia from modelling malaria case data
.
Nat Commun
2023
;
14
:2945. doi:

Grigg
MJ
,
Cox
J
,
William
T
et al.
Individual-level factors associated with the risk of acquiring human Plasmodium knowlesi malaria in Malaysia: a case-control study
.
Lancet Planet Health
2017
;
1
:
e97
e104
. doi:

Grigg
MJ
,
William
T
,
Barber
BE
et al.
Age-related clinical spectrum of Plasmodium knowlesi malaria and predictors of severity
.
Clin Infect Dis
2018
;
67
:
350
59
. doi:

Grubaugh
ND
,
Sharma
S
,
Krajacich
BJ
et al.
Xenosurveillance: a novel mosquito-based approach for examining the human-pathogen landscape
.
PLoS Negl Trop Dis
2015
;
9
:e0003628. doi:

Grybchuk
D
,
Kostygov
AY
,
Macedo
DH
et al.
RNA viruses in trypanosomatid parasites: a historical overview
.
Mem Inst Oswaldo Cruz
2018
;
113
:e170487. doi:

Gupta
P
,
Hiller
A
,
Chowdhury
J
et al.
A parasite odyssey: an RNA virus concealed in toxoplasma gondii
.
Virus Evol
2024
;
10
:veae040. doi:

Han
Q
,
Li
J
,
Gong
P
et al.
Virus-like particles in Eimeria tenella are associated with multiple RNA segments
.
Exp Parasitol
2011
;
127
:
646
50
. doi:

Hatfull
GF
,
Dedrick
RM
,
Schooley
RT
.
Phage therapy for antibiotic-resistant bacterial infections
.
Annu Rev Med
2022
;
73
:
197
211
. doi:

Heeren
S
,
Maes
I
,
Sanders
M
et al.
Diversity and dissemination of viruses in pathogenic protozoa
.
Nat Commun
2023
;
14
:8343. doi:

Hotzel
I
,
Johnston
RC
,
Taque
J
et al.
Extrachromosomal nucleic acids in bovine Babesia
.
Mem Inst Oswaldo Cruz
1992
;
87
:
101
02
. doi:

Hou
X
,
He
Y
,
Fang
P
et al.
Using artificial intelligence to document the hidden RNA virosphere
.
Cell
2024
;187. doi:

Johnston
RC
,
Farias
NA
,
Gonzales
JC
et al.
A putative RNA virus in Babesia bovis
.
Mol Biochem Parasitol
1991
;
45
:
155
58
. doi:

Jones
P
,
Binns
D
,
Chang
HY
et al.
InterProScan 5: genome-scale protein function classification
.
Bioinformatics
2014
;
30
:
1236
40
. doi:

Jumper
J
,
Evans
R
,
Pritzel
A
et al.
Highly accurate protein structure prediction with Alphafold
.
Nature
2021
;
596
:
583
89
. doi:

Katoh
K
,
Standley
DM
.
MAFFT multiple sequence alignment software version 7: improvements in performance and usability
.
Mol Biol Evol
2013
;
30
:
772
80
. doi:

Kelley
LA
,
Mezulis
S
,
Yates
CM
et al.
The phyre2 web portal for protein modeling, prediction and analysis
.
Nat Protoc
2015
;
10
:
845
58
. doi:

Kim
A
,
Popovici
J
,
Menard
D
et al.
Plasmodium vivax transcriptomes reveal stage-specific chloroquine response and differential regulation of male and female gametocytes
.
Nat Commun
2019
;
10
:371. doi:

Kopylova
E
,
Noé
L
,
Touzet
H
.
SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data
.
Bioinformatics
2012
;
28
:
3211
17
. doi:

Kortright
KE
,
Chan
BK
,
Koff
JL
et al.
Phage therapy: a renewed approach to combat antibiotic-resistant bacteria
.
Cell Host Microbe
2019
;
25
:
219
32
. doi:

Krause
PJ
.
Human babesiosis
.
Int J Parasitol
2019
;
49
:
165
74
. doi:

Lee
S
,
Fernando
MA
.
Intracellular localization of viral RNA in Eimeria necatrix of the domestic fowl
.
Parasitol Res
1998
;
84
:
601
06
. doi:

Lee
S
,
Fernando
MA
.
Viral double-stranded RNAs of Eimeria spp. of the domestic fowl: analysis of genetic relatedness and divergence among various strains
.
Parasitol Res
2000
;
86
:
733
37
. doi:

Lee
S
,
Fernando
MA
,
Nagy
E
.
dsRNA associated with virus-like particles in Eimeria spp. of the domestic fowl
.
Parasitol Res
1996
;
82
:
518
23
. doi:

Lee
WC
,
Cheong
FW
,
Amir
A
et al.
Plasmodium knowlesi: the game changer for malaria eradication
.
Malar J
2022
;
21
:140. doi:

Li
B
,
Dewey
CN
.
RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome
.
BMC Bioinf
2011
;
12
:323. doi:

Li
D
,
Liu
CM
,
Luo
R
et al.
MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph
.
Bioinformatics
2015
;
31
:
1674
76
. doi:

Li
Z
,
Jaroszewski
L
,
Iyer
M
et al.
FATCAT 2.0: towards a better understanding of the structural diversity of proteins
.
Nucleic Acids Res
2020
;
48
:
W60
4
. doi:

Marcelino
VR
,
Clausen
PT
,
Buchmann
JP
et al.
CCMetagen: comprehensive and accurate identification of eukaryotes and prokaryotes in metagenomic data
.
Genome Biol
2020
;
21
:103. doi:

Martin
M
.
Cutadapt removes adapter sequences from high-throughput sequencing reads
.
EMBnet. J
2011
;
17
:
10
12
. doi:

Martyn
C
,
Hayes
BM
,
Lauko
D
et al.
Metatranscriptomic investigation of single Ixodes pacificus ticks reveals diverse microbes, viruses, and novel mRNA-like endogenous viral elements
.
mSystems
2024
;
9
:
e00321
24
. doi:

Meng
EC
,
Goddard
TD
,
Pettersen
EF
et al.
UCSF chimerax: tools for structure building and analysis
.
Protein Sci
2023
;
32
:e4792. doi:

Minh
BQ
,
Schmidt
HA
,
Chernomor
O
et al.
IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era
.
Mol Biol Evol
2020
;
37
:
1530
34
. doi:

Mirdita
M
,
Schütze
K
,
Moriwaki
Y
et al.
ColabFold: making protein folding accessible to all
.
Nat Methods
2022
;
19
:
679
82
. doi:

Niu
X
,
Xu
Z
,
Tian
Y
et al.
A putative ormycovirus that possibly contributes to the yellow leaf disease of Areca palm
.
Forests
2024
;
15
:1025. doi:

Pagnoni
S
,
Oufensou
S
,
Balmas
V
et al.
A collection of Trichoderma isolates from natural environments in Sardinia reveals a complex virome that includes negative-sense fungal viruses with unprecedented genome organizations
.
Virus Evol
2023
;
9
:vead042. doi:

Palmieri
N
,
Shrestha
A
,
Ruttkowski
B
et al.
The genome of the protozoan parasite Cystoisospora suis and a reverse vaccinology approach to identify vaccine candidates
.
Int J Parasitol
2017
;
47
:
189
202
. doi:

Poespoprodjo
JR
,
Douglas
NM
,
Ansong
D
et al.
Malaria
.
Lancet
2023
;
402
:
2328
45
. doi:

Rajahram
GS
,
Cooper
DJ
,
William
T
et al.
Deaths from Plasmodium knowlesi malaria: case series and systematic review
.
Clin Infect Dis
2019
;
69
:
1703
11
. doi:

Revets
H
,
Dekegel
D
,
Deleersnijder
W
et al.
Identification of virus-like particles in Eimeria stiedae
.
Mol Biochem Parasitol
1989
;
36
:
209
15
. doi:

Roditi
I
,
Wyler
T
,
Smith
N
et al.
Virus-like particles in Eimeria nieschulzi are associated with multiple RNA segments
.
Mol Biochem Parasitol
1994
;
63
:
275
82
. doi:

Rosenthal
PJ
,
Asua
V
,
Bailey
JA
et al.
The emergence of artemisinin partial resistance in Africa: how do we respond?
Lancet Infect Dis
2024
;9. doi:

Ryan
U
,
Fayer
R
,
Xiao
L
.
Cryptosporidium species in humans and animals: current understanding and research needs
.
Parasitology
2014
;
141
:
1667
85
. doi:

Sahin
E
,
Edis
G
,
Keskin
E
et al.
Molecular characterization of the complete genome of a novel ormycovirus infecting the ectomycorrhizal fungus Hortiboletus rubellus
.
Arch Virol
2024
;
169
:110. doi:

Shackelton
LA
,
Holmes
EC
.
The role of alternative genetic codes in viral evolution and emergence
.
J Theor Biol
2008
;
254
:
128
34
. doi:

Sievers
F
,
Wilm
A
,
Dineen
D
et al.
Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega
.
Mol Syst Biol
2011
;
7
:539. doi:

Strathdee
SA
,
Hatfull
GF
,
Mutalik
VK
et al.
Phage therapy: from biological mechanisms to future directions
.
Cell
2023
;
186
:
17
31
. doi:

Tobin
RJ
,
Harrison
LE
,
Tully
MK
et al.
Updating estimates of Plasmodium knowlesi malaria risk in response to changing land use patterns across Southeast Asia
.
PLoS Negl Trop Dis
2024
;
18
:e0011570. doi:

van Kempen
M
,
Kim
SS
,
Tumescheit
C
et al.
Fast and accurate protein structure search with Foldseek
.
Nat Biotechnol
2023
;
42
:
243
6
. doi:

Venkataraman
S
,
Prasad
B
,
Selvarajan
R
.
RNA dependent RNA polymerases: insights from structure, function and evolution
.
Viruses
2018
;
10
:76. doi:

Wang
AL
,
Wang
CC
.
Discovery of a specific double-stranded RNA virus in Giardia lamblia
.
Mol Biochem Parasitol
1986
;
21
:
269
76
. doi:

Wellems
TE
,
Plowe
CV
.
Chloroquine-resistant malaria
.
J Infect Dis
2001
;
184
:
770
76
. doi:

Wu
B
,
Zhang
X
,
Gong
P
et al.
Eimeria tenella: a novel dsRNA virus in E. tenella and its complete genome sequence analysis
.
Virus Genes
2016
;
52
:
244
52
. doi:

Xin
C
,
Wu
B
,
Li
J
et al.
Complete genome sequence and evolution analysis of Eimeria stiedai RNA virus 1, a novel member of the family Totiviridae
.
Arch Virol
2016
;
161
:
3571
76
. doi:

Yu
G
Using ggtree to visualize data on tree-like structures
.
Curr Protoc Bioinformatics
2020
;
69
:e96. doi:

Yu
G
,
Lam
TT
,
Zhu
H
et al.
Two methods for mapping and visualizing associated data on phylogeny using ggtree
.
Mol Biol Evol
2018
;
35
:
3041
43
. doi:

Zayed
AA
,
Wainaina
JM
,
Dominguez-Huerta
G
et al.
Cryptic and abundant marine viruses at the evolutionary origins of Earth’s RNA virome
.
Science
2022
;
376
:
156
62
. doi:

Zhao
Z
,
Li
X
,
Zhang
N
et al.
Multiple regulations of parasitic protozoan viruses: a double-edged sword for protozoa
.
mBio
2023
;
14
:e0264222. doi:

Zimmermann
L
,
Stephens
A
,
Nam
SZ
et al.
A completely reimplemented MPI bioinformatics toolkit with a new HHpred server at its core
.
J Mol Biol
2018
;
430
:
2237
43
. doi:

Author notes

Mary E. Petrone and Justine Charon Contributed equally

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Supplementary data