Yulia Aleshina, Alexander Lukashev, Mamastrovirus species are shaped by recombination and can be reliably distinguished in ORF1b genome region, Virus Evolution, Volume 11, Issue 1, 2025, veaf006, https://doi.org/10.1093/ve/veaf006
Abstract
Astroviruses are a diverse group of small, non-enveloped, positive-sense, single-stranded RNA viruses that infect mammals and birds. More than half of all known genome sequences of mammalian astroviruses are not assigned to provisional species, and the biological mechanisms that could support the segregation of astroviruses into species are not well understood. We systematically analyzed recombination in Mamastrovirus genomes available in GenBank to identify mechanisms that maintain genetic distinction between astroviruses. Recombination breakpoints were present in all Mamastrovirus genome regions but occurred most commonly at the ORF1b/ORF2 junction. Recombination was ubiquitous within, but never between, established and putative new species, and may be suggested as an additional species criterion. The current species criterion for the genus Mamastrovirus, based on ORF2 amino acid sequence p-distances, did not reliably distinguish several established species and was of limited use for identifying distinct groups among recently isolated unclassified astroviruses, predominantly from cattle and pigs. A 17% nucleotide sequence distance cut-off in ORF1b distinguished the established species and several groups among the unclassified viruses reasonably well, providing better correspondence between phylogenetic grouping, reproductive isolation, and virus hosts. Sequence distance criteria (17% in the nucleotide sequence of ORF1b and 25% in the amino acid sequence of ORF2) and the recombination pattern corresponded fairly well as species criteria, but each had minor exceptions among mammalian astroviruses. A combination of these taxonomic criteria supported the established Mamastrovirus species and suggested redefining a few previously proposed provisional species and introducing at least six novel species among recently submitted rat and bovine astroviruses.
Introduction
Astroviruses are a group of small, nonenveloped, positive-sense, single-stranded RNA viruses that primarily infect mammals, including humans, and birds. They belong to the family Astroviridae and are named after the star-like appearance of a proportion of virions under electron microscopy (Méndez and Arias 2013). Mammalian astroviruses are primarily enteric viruses and are known to cause gastroenteritis. While astrovirus infection in humans is generally self-limiting and resolves within a few days, it can pose a significant health risk, particularly in young children, the elderly, and immunocompromised individuals (Bosch et al. 2014). Astroviruses are also relevant pathogens of livestock and domestic animals. Mink astroviruses are a risk factor for preweaning diarrhea in farm-raised mink kits (De Benedictis et al. 2011). Avian astroviruses cause enteric diseases in various bird species, including chickens, turkeys, and ducks, leading to economic losses due to reduced growth rates and increased mortality. Infections with avian astroviruses may involve organs outside the gut and cause severe kidney disease in young chickens and goslings and fatal hepatitis in ducklings (Pantin-Jackwood et al. 2012, Smyth 2017). There is also growing evidence that astroviruses are associated with neurological diseases in animals (mink, cattle, sheep, and pigs) and in humans, particularly in the young, the elderly, and the immunocompromised (Wildi and Seuberlich 2021).
The family Astroviridae is divided into two genera, Mamastrovirus (MAstV) and Avastrovirus, which infect mammals and birds, respectively. The classification of astroviruses within each genus has been revised several times since their discovery and is currently being redefined (Bosch et al. 2014, Donato and Vijaykrishna 2017). According to the ICTV report, mammalian astroviruses that share >75% amino acid identity in ORF2 and infect the same host species are recommended to be assigned to the same virus species (Bosch et al. 2012, 2014, Donato and Vijaykrishna 2017). It has also been suggested that variants within prospective species be distinguished by an amino acid p-distance of less than 0.05–0.07 (93–95% identity) to the prototype strains (Donato and Vijaykrishna 2017). Currently, 19 species are recognized within the genus Mamastrovirus, termed Mamastrovirus (MAstV) 1 to 19, of which four (MAstV1, MAstV6, MAstV8, and MAstV9) infect humans (Fig. 1). By 2012, an additional 14 species of mammalian astroviruses had been suggested, but they have not yet been officially assigned (Guix et al. 2012, Bosch et al. 2014). Since then, more astroviruses infecting a wide spectrum of mammalian hosts have been found. Astroviruses from fish, amphibians, and reptiles have also been discovered recently (Shi et al. 2018).
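As a minimal sketch, the two numeric rules quoted above can be written as simple predicates. The function names and the default variant cut-off of 0.05 are illustrative assumptions (the text gives a 0.05–0.07 range); this is not an ICTV-provided tool.

```python
# Hypothetical helpers illustrating the ORF2-based Mamastrovirus rules quoted above.
# Identity and p-distance are fractions in [0, 1]; identity = 1 - p-distance.


def meets_species_criterion(orf2_aa_identity: float, host_a: str, host_b: str) -> bool:
    """Same species if ORF2 amino acid identity exceeds 75% AND the hosts match."""
    return orf2_aa_identity > 0.75 and host_a == host_b


def within_variant(p_distance_to_prototype: float, cutoff: float = 0.05) -> bool:
    """Variant-level grouping: amino acid p-distance to the prototype below a cut-off.

    The suggested cut-off lies in the 0.05-0.07 range (93-95% identity);
    0.05 is only an illustrative default.
    """
    return p_distance_to_prototype < cutoff


# 80% identity between two human strains satisfies the species rule,
# whereas the same identity between a human and a feline strain does not.
print(meets_species_criterion(0.80, "human", "human"))  # True
print(meets_species_criterion(0.80, "human", "cat"))    # False
```

Keeping the host check separate from the identity check makes it easy to see which of the two conditions fails for a given pair.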

Figure 1. Maximum likelihood phylogenetic trees inferred by IQ-TREE 2 (Minh et al. 2020) for ORF2 (N = 894) (a) and ORF1b (N = 522) (b) MAstV sequences available in GenBank. The scale bars indicate nucleotide substitutions per site. Nodes with ultrafast bootstrap support values <80% are marked with red circles. Virus species according to ICTV and virus hosts are visualized as a color bar. Silhouettes of the major hosts are shown near the heatmap. Tree branches are colored according to the virus host. Rare hosts and species are indicated by arrows on the tree. The lower panel shows the proportions of virus species in the ORF2 and ORF1b datasets, respectively. Csl, California sea lion.
More than half of all known MAstV genome sequences are not assigned to species (Fig. 1). The biological events that could drive the segregation of astroviruses into sub-genus groups are also not well understood. One such mechanism could be natural homologous recombination, which is common in most RNA viruses. For example, in several picornavirus genera, recombination maintains the segregation of species through frequent genetic exchanges within, but never between, species (Lukashev 2010). There have been multiple reports of recombination in astroviruses. Table 1 summarizes these findings; however, the suggested “parents” should be interpreted with caution, because they reflect the nearest relatives of recombinant fragments, which are highly dependent on a sampling bias towards viruses of humans and domestic animals. As in the closely related and well-studied virus families Picornaviridae (Lukashev 2010) and Caliciviridae (Jeong et al. 2007, Oka et al. 2015, Begall et al. 2018, Vakulenko et al. 2023), recombination in astroviruses has most often been found at the junction of ORF1 and ORF2, which encode the nonstructural and structural (capsid) proteins, respectively. Most of the reported events involved distinct ORFs and usually occurred between viruses of the same host: classical human astroviruses (MAstV1) (Walter et al. 2001, Wolfaardt et al. 2011, De Grazia et al. 2012, Babkin et al. 2014, Medici et al. 2015, Ha et al. 2016, Zaraket et al. 2017), “novel” human astroviruses (MAstV6, MAstV8) (Ahmed et al. 2011, Hata et al. 2018), and swine (Ito et al. 2017, Zhao et al. 2019, Amimo et al. 2020) and canine (Li et al. 2018, Zhang et al. 2020) astroviruses. There have also been reports of recombinant genomes whose ORF1 and ORF2 apparently derive from astroviruses of distinct host species: feline MAstV2 (ORF1b donor) and porcine MAstV3 (ORF2 donor) (Wang et al. 2020); porcine MAstV3 (ORF1b donor) and deer astrovirus (Lan et al. 2011); bovine and deer astroviruses (Chen et al. 2015); and human and sea lion astroviruses (Rivera et al. 2010). Recombination within ORF2 has also been reported, mainly between viruses of the same host species, such as human (De Grazia et al. 2012, Martella et al. 2014), canine (Li et al. 2018, Zhang et al. 2020), swine (Ito et al. 2017, Amimo et al. 2020), and camel (Woo et al. 2015) astroviruses. A few studies found recombination within ORF2 between human and feline astroviruses (Hata et al. 2018) and between bovine and roe deer astroviruses (Tse et al. 2011). Reports of recombination within ORF1ab are even scarcer and include recombination between classical human astroviruses (Pativada et al. 2013, Babkin et al. 2014) and among canine (MAstV5) (Li et al. 2018), porcine (Ito et al. 2017), and goat astroviruses (Kauer et al. 2019). Despite multiple reports of recombinant astrovirus genomes, our understanding of the principles and limitations of recombination in astroviruses is far from comprehensive and lacks apparent “rules,” such as those known for two well-studied ssRNA virus families with related properties, Picornaviridae and Caliciviridae.
Table 1. Reported recombination events in mammalian astroviruses.

| No. | Location of recombination breakpoints | Recombinant virus (host) | Potential parents (host) | Reference |
|---|---|---|---|---|
| 1 | ORF1/ORF2 junction | MAstV1 (human) | MAstV1 (human) | (Babkin et al. 2014; De Grazia et al. 2012; Ha et al. 2016; Medici et al. 2015; Walter et al. 2001; Wolfaardt et al. 2011; Zaraket et al. 2017) |
| 2 | | MAstV6 (human) | MAstV6 (human) | (Hata et al. 2018) |
| 3 | | MAstV8 (human) | MAstV8 (human) | (Ahmed et al. 2011) |
| 4 | | Porcine astroviruses | Porcine astroviruses | (Amimo et al. 2020; Ito et al. 2017; Zhao et al. 2019) |
| 5 | | MAstV5 (dog) | MAstV5 (dog) | (Li et al. 2018) |
| 6 | | MAstV2 (cat) | MAstV2 (cat), MAstV3 (pig) | (Wang et al. 2020) |
| 7 | | MAstV3 (pig) | MAstV3 (pig), deer astrovirus (roe deer) | (Lan et al. 2011) |
| 8 | | Yak astrovirus (yak) | Bovine astrovirus (cow), deer astrovirus (roe deer) | (Chen et al. 2015) |
| 9 | | CslAstV (California sea lion) | MAstV1 (human) and CslAstV (California sea lion) | (Rivera et al. 2010) |
| 10 | | MAstV2 (cheetah) | MAstV1 (human), MAstV2 (cat) | (Atkins et al. 2009) |
| 11 | ORF1a/ORF1b | Porcine AstV4 (pig) | Porcine astrovirus (pig) | (Amimo et al. 2020) |
| 12 | Within ORF2 | MAstV1 (human) | MAstV1 (human) | (De Grazia et al. 2012; Martella et al. 2014) |
| 13 | | MAstV5 (dog) | MAstV5 (dog) | (Li et al. 2018; Zhang et al. 2020) |
| 14 | | Porcine astrovirus (pig) | Porcine astrovirus (pig) | (Amimo et al. 2020; Ito et al. 2017) |
| 15 | | Dromedary camel astrovirus (dromedary camel) | Dromedary camel AstV (dromedary camel), unknown | (Woo et al. 2015) |
| 16 | | MAstV1 (human) | MAstV1 (human), MAstV2 (cat) | (Hata et al. 2018) |
| 17 | | Deer astrovirus (roe deer) | Bovine astrovirus, unknown | (Tse et al. 2011) |
| 18 | Within ORF1a | Porcine astrovirus (pig) | Porcine astroviruses (pig) | (Ito et al. 2017) |
| 19 | | Caprine astrovirus (goat) | Bovine astroviruses (cow) | (Kauer et al. 2019) |
| 20 | | Ovine astrovirus (sheep) | Ovine astrovirus (sheep), caprine astrovirus (goat) | (Kauer et al. 2019) |
| 21 | Within ORF1b | MAstV1 (human) | MAstV1 (human) | (Babkin et al. 2014; Pativada et al. 2013) |
| 22 | | MAstV5 (dog) | MAstV5 (dog) | (Li et al. 2018) |
In this work, we sought to analyze descriptive criteria (sequence distance) and potential underlying mechanistic factors (recombination and host species) that affect segregation of MAstVs into putative species.
Materials and methods
Data collection
Alignments of full ORF1b and ORF2 sequences available in GenBank
Nucleotide sequences longer than 400 nt belonging to the genus Mamastrovirus (N = 3228) were downloaded from the GenBank database in GenBank format as of June 2023. ORF1b (N = =ORF) and ORF2 (N = 2423) sequences were excised according to the coordinates indicated in GenBank annotations. Sequences that did not cover full or nearly full ORF1b and ORF2 regions (shorter than 1500 nt and 2000 nt, respectively) were excluded from the ORF1b and ORF2 datasets. Sequences with more than two ambiguous nucleotides were also excluded, as were sequences of viruses isolated from the environment (N = 10 and N = 1 for the ORF2 and ORF1b datasets, respectively). The resulting ORF1b (N = 522) and ORF2 (N = 894) nucleotide sequences were aligned based on their corresponding amino acid translations using TranslatorX (Abascal et al. 2010). In the ORF1b alignment, four astrovirus sequences isolated from rats and dogs were omitted due to signs of extensive nonhomologous recombination compatible with a sequencing artifact.
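The per-sequence filters above can be sketched as follows. This assumes each ORF has already been excised from its GenBank record as a plain string; the helper names are hypothetical, and only the thresholds (1500 nt, 2000 nt, at most two ambiguous bases, no environmental isolates) come from the text.

```python
# Minimal sketch of the ORF1b/ORF2 dataset filters (hypothetical helper names).
# Assumes ORF sequences were already excised from GenBank records as plain strings.

MIN_LENGTH_NT = {"ORF1b": 1500, "ORF2": 2000}  # shorter sequences are not "nearly full"
MAX_AMBIGUOUS = 2                               # more than two ambiguous bases -> drop
UNAMBIGUOUS = set("ACGT")


def count_ambiguous(seq: str) -> int:
    """Count characters other than A, C, G, T (e.g. IUPAC codes N, R, Y)."""
    return sum(1 for base in seq.upper() if base not in UNAMBIGUOUS)


def keep_sequence(region: str, seq: str, environmental: bool) -> bool:
    """Apply the sample-source, length, and ambiguity filters to one ORF sequence."""
    if environmental:                      # environmental isolates were left out
        return False
    if len(seq) < MIN_LENGTH_NT[region]:   # not a full or nearly full ORF
        return False
    return count_ambiguous(seq) <= MAX_AMBIGUOUS
```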
Alignment of concatenated ORFs
Full-genome sequences available for the genus Mamastrovirus were downloaded from the GenBank database as of July 2023 (N = 756). Identical GenBank entries were removed from the dataset, as were sequences with >0.1% ambiguous nucleotides. To create the alignment of concatenated open reading frames (ORFs), the coordinates of ORF1a, ORF1b, and ORF2 were extracted from GenBank annotations, and the nucleotide sequences of the ORFs were excised from the full genomes. The accuracy of ORF annotations was checked manually by translating the excised sequences and corrected if needed. The nucleotide sequences of the ORFs were aligned based on their corresponding amino acid translations using TranslatorX (Abascal et al. 2010) and concatenated. Two sequences of astroviruses from rats were omitted because they had protracted regions with no homology to other astroviruses, likely due to nonhomologous recombination or a sequencing error. The resulting alignment included 478 sequences. Virus host information was retrieved from GenBank annotations and verified manually. Virus species were identified for 293 sequences according to the ICTV Virus Metadata Resource, https://ictv.global/news/vmr_release_0423 (accessed on 17 July 2023), and GenBank metadata, and were further verified by phylogenetic grouping in the ORF2 region. Viruses that were missing from the ICTV list and lacked species information in their metadata were assigned to unknown species if they were outgroups to viruses of established species (all of them also differed by >30% in ORF2 amino acid sequence). Viruses within a phylogenetic group corresponding to an established species were assigned the species of their phylogenetic neighbors.
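TranslatorX's own implementation is not reproduced here; the sketch below only illustrates the general idea behind a translation-guided (codon-aware) alignment. Once the protein translations have been aligned, each nucleotide sequence is threaded back onto its aligned protein so that every amino acid gap becomes a three-nucleotide gap and codons stay in frame. The function name is hypothetical.

```python
def thread_codons(aligned_protein: str, coding_nt: str, gap: str = "-") -> str:
    """Back-translate an aligned protein onto its own coding sequence, codon by codon."""
    if len(coding_nt) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [coding_nt[i:i + 3] for i in range(0, len(coding_nt), 3)]
    # every non-gap residue (including a terminal '*' if present) must own one codon
    if len(aligned_protein.replace(gap, "")) != len(codons):
        raise ValueError("protein and nucleotide sequences do not correspond")
    out, k = [], 0
    for residue in aligned_protein:
        if residue == gap:
            out.append(gap * 3)      # amino acid gap -> codon-sized gap
        else:
            out.append(codons[k])    # next codon of the original sequence
            k += 1
    return "".join(out)


# Two sequences whose translations were aligned as "MK-V" and "MKAV":
print(thread_codons("MK-V", "ATGAAAGTT"))     # ATGAAA---GTT
print(thread_codons("MKAV", "ATGAAGGCAGTC"))  # ATGAAGGCAGTC
```

Aligning at the protein level and then threading codons keeps indels in multiples of three, which matters for divergent coding regions where nucleotide-level aligners tend to introduce frameshifting gaps.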
Pairwise genetic distances
Pairwise nucleotide and amino acid p-distances (the proportion of aligned sites at which two sequences differ) were calculated using the ape v.5.7 R package (Paradis et al. 2019) and visualized using the ggplot2 v.3.4 R package (Wickham 2016).
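A p-distance is simply the fraction of compared alignment columns that differ. The sketch below computes it for one aligned pair. Skipping columns with a gap in either sequence (pairwise deletion) is an assumption made here; the gap treatment used in the original ape-based analysis may differ. In ape itself, `dist.dna(..., model = "raw")` returns this uncorrected proportion for nucleotide alignments.

```python
def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of compared sites at which two aligned sequences differ.

    Works for nucleotide or amino acid alignments. Columns with a gap in
    either sequence are skipped (pairwise deletion, an assumption here).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


print(p_distance("ACGT", "ACGA"))  # 0.25
```

Because identity equals 1 minus the p-distance, the ORF2 species criterion (>75% identity) corresponds to a p-distance below 0.25, and the 0.83 ORF1b identity threshold used for clustering corresponds to the 17% distance cut-off.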
Clustering of ORF2 and ORF1b sequences
Full ORF2 amino acid sequences available in GenBank (N = 894) were clustered based on sequence identity using USEARCH (Edgar 2010) and MMSeqs2 (Steinegger and Söding 2018b) with a 0.75 identity threshold, which corresponds to the species criterion. ORF2 amino acid sequences and ORF1b nucleotide sequences derived from the full-genome dataset (N = 478) were clustered with 0.75 and 0.83 identity thresholds, respectively; the latter threshold was suggested by the distribution of nucleotide p-distances in the ORF1b region. In USEARCH, clustering was performed with the cluster_fast command, which is based on the UCLUST algorithm. In MMSeqs2, the mmseqs cluster workflow was used.
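To illustrate why greedy centroid clustering depends on input order, here is a simplified scheme in the spirit of UCLUST. It does not reproduce USEARCH's cluster_fast or MMSeqs2: each sequence, taken in input order, joins the first existing centroid with identity at or above the threshold, and otherwise founds a new cluster. The sketch assumes pre-aligned, equal-length sequences, whereas the real tools compute identity from their own alignments.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in sites) / len(sites) if sites else 0.0


def greedy_cluster(seqs, threshold):
    """seqs: list of (name, aligned_sequence). Returns clusters as lists of names.

    The first member of each cluster is its centroid.
    """
    clusters = []  # (centroid_sequence, member_names)
    for name, seq in seqs:
        for centroid, members in clusters:
            if identity(seq, centroid) >= threshold:
                members.append(name)
                break
        else:  # no centroid was similar enough: start a new cluster
            clusters.append((seq, [name]))
    return [members for _, members in clusters]


# B is 75% identical to both A and C, but A and C are only 50% identical,
# so the result depends on which sequence is seen first.
seqs = [("A", "AAAA"), ("B", "AAAT"), ("C", "AATT")]
print(greedy_cluster(seqs, 0.75))                          # [['A', 'B'], ['C']]
print(greedy_cluster([seqs[1], seqs[0], seqs[2]], 0.75))   # [['B', 'A', 'C']]
```

The toy example shows how a sequence that is intermediate between two groups can merge or split them depending on which sequence becomes a centroid first.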
Recombination analysis
Recombination analysis of the alignment of concatenated ORFs from 478 complete MAstV genomes was performed with the RDP5 program (Martin et al. 2021) using a full exploratory automated scan with the following recombination detection methods: RDP (D. Martin and Rybicki 2000), GENECONV (Sawyer 1989), MaxChi (Smith 1992), Bootscan (D. P. Martin et al. 2005), Chimaera (Posada and Crandall 2001), SiScan (Gibbs et al. 2000), 3Seq (Boni et al. 2007), PhylPro (Weiller 1998), and LARD (Boni et al. 2007). Each unique recombination event detected by the exploratory automated scan was characterized by its 5ʹ and 3ʹ maximum likelihood breakpoint locations; a list of one or more sequences carrying the recombination signal (several recombinant sequences can result from a single ancestral recombination event); a list of “parental sequences” (sequences in the dataset that were closely related to the actual parents of the recombinants and were used to infer their existence); and the detection methods with their associated P-values. Recombination events supported by more than four of the nine detection methods were analyzed further. The breakpoint distribution plot for hot/coldspot detection was constructed from the 5ʹ and 3ʹ breakpoint positions of each detected recombination event: a 200-nt window was slid along the alignment in 1-nt steps, and the breakpoints falling within each window were counted. The permutation test implemented in RDP5 was used to test for clustering of breakpoint positions (Martin et al. 2021). Potential hot- or coldspots were identified as window positions where the breakpoint count was higher or lower than in >99% of the 1000 permuted breakpoint density plots at the same position.
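The sliding-window density and permutation comparison can be sketched as follows. This is not RDP5's implementation. The null model here is an assumption that simply scatters the same number of breakpoints uniformly along the alignment, whereas RDP5's permutation scheme may be more constrained. The window (200 nt), step (1 nt), permutation count (1000), and the >99% rule follow the text; the function names are hypothetical.

```python
import random


def breakpoint_density(breakpoints, length, window=200):
    """Breakpoint counts in windows [s, s + window) for s = 0, 1, ..., length - window."""
    per_site = [0] * length
    for bp in breakpoints:
        per_site[bp] += 1
    prefix = [0]                      # prefix sums give each window count in O(1)
    for count in per_site:
        prefix.append(prefix[-1] + count)
    return [prefix[s + window] - prefix[s] for s in range(length - window + 1)]


def hot_cold_windows(breakpoints, length, window=200, n_perm=1000, level=0.99, seed=1):
    """Windows whose observed count is above (hot) or below (cold) > level of permutations.

    Assumed null model: the same number of breakpoints placed uniformly at random.
    """
    observed = breakpoint_density(breakpoints, length, window)
    rng = random.Random(seed)
    above = [0] * len(observed)   # permutations with a lower count at this window
    below = [0] * len(observed)   # permutations with a higher count at this window
    for _ in range(n_perm):
        shuffled = [rng.randrange(length) for _ in breakpoints]
        for i, null in enumerate(breakpoint_density(shuffled, length, window)):
            if observed[i] > null:
                above[i] += 1
            elif observed[i] < null:
                below[i] += 1
    hot = [i for i, n in enumerate(above) if n > level * n_perm]
    cold = [i for i, n in enumerate(below) if n > level * n_perm]
    return hot, cold
```

Comparing each window only with permuted plots at the same position, rather than with a single genome-wide threshold, mirrors the per-position criterion described above.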
The overall recombination patterns in MAstVs were visualized using the recombination region count matrix, which indicates how often different parts of the analyzed sequences are separated from one another by recombination.
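A minimal sketch of such a matrix, assuming a simple definition: the alignment is divided into bins, and two bins count as separated by an event when exactly one of them lies inside that event's recombinant region. RDP5's exact construction may differ, and the bin size and function name are hypothetical.

```python
def region_count_matrix(events, length, bin_size=100):
    """Count, for every pair of alignment bins, how many events separate them.

    events: (start, end) coordinates of recombinant regions, end exclusive.
    Each bin is represented by its midpoint; two bins are separated by an event
    when exactly one of the midpoints lies inside the recombinant region.
    """
    n_bins = (length + bin_size - 1) // bin_size
    mids = [min(b * bin_size + bin_size // 2, length - 1) for b in range(n_bins)]
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for start, end in events:
        inside = [start <= m < end for m in mids]
        for i in range(n_bins):
            for j in range(n_bins):
                if inside[i] != inside[j]:
                    matrix[i][j] += 1
    return matrix


for row in region_count_matrix([(0, 200), (100, 300)], 400):
    print(row)
```

High values away from the diagonal mark pairs of genome regions that frequently have different evolutionary histories, for example the ORF1b and ORF2 regions when breakpoints cluster at their junction.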
The correspondence between pairwise nucleotide p-distances in different genome regions of MAstVs was visualized using recDplot R package (https://github.com/v-julia/recDplot, accessed on 5 December 2024).
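recDplot is an R package, and the sketch below does not reproduce its interface. It only illustrates the underlying idea: for every sequence pair, compute p-distances in two genome regions and flag pairs that are close in one region but distant in the other, a pattern expected after recombination. The region coordinates, the cut-off, and the function names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in sites) / len(sites)


def region_distances(aligned, region_a, region_b):
    """(name1, name2, d_a, d_b) for every pair; regions are (start, end) slices."""
    rows = []
    for (n1, s1), (n2, s2) in combinations(sorted(aligned.items()), 2):
        d_a = p_distance(s1[region_a[0]:region_a[1]], s2[region_a[0]:region_a[1]])
        d_b = p_distance(s1[region_b[0]:region_b[1]], s2[region_b[0]:region_b[1]])
        rows.append((n1, n2, d_a, d_b))
    return rows


def discordant(rows, cutoff):
    """Pairs that are close (< cutoff) in one region but distant in the other."""
    return [(n1, n2) for n1, n2, d_a, d_b in rows if (d_a < cutoff) != (d_b < cutoff)]


aln = {"x": "AAAAAAAA", "y": "AAAATTTT", "z": "TTTTTTTT"}
print(discordant(region_distances(aln, (0, 4), (4, 8)), 0.5))  # [('x', 'y'), ('y', 'z')]
```

In a plot of d_a against d_b, non-recombinant pairs fall near the diagonal, while such discordant pairs fall far from it.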
Phylogenetic analysis
Phylogenetic trees were built for ORF1b (N = 522) and ORF2 (N = 894) sequences available in GenBank and for ORF1a, ORF1b, and ORF2 sequences from the available full genomes (N = 478). All phylogenetic trees were inferred using IQ-TREE v2.2.0 (Minh et al. 2020) with 10,000 ultrafast bootstrap pseudo-replicates (Minh et al. 2013) under the best-fit nucleotide substitution model (Kalyaanamoorthy et al. 2017) and were midpoint-rooted. Taxon labels on the ORF1a, ORF1b, and ORF2 trees were colored automatically using custom Python and R scripts (https://github.com/v-julia/MAV_recombination). The trees were visualized using the ggtree R package (Yu 2020) and FigTree v1.4.4 (Rambaut, n.d.).
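Midpoint rooting places the root halfway along the longest leaf-to-leaf path. The sketch below is a generic version of that algorithm and not the code of any of the programs cited above. It assumes an unrooted tree stored as a hypothetical adjacency dictionary with positive branch lengths, and finds the longest path with two depth-first passes, which is valid for trees with non-negative branch lengths.

```python
def _farthest(tree, start):
    """Iterative DFS from start; returns (farthest node, its distance, parent map)."""
    dist, parent, stack = {start: 0.0}, {start: None}, [start]
    while stack:
        node = stack.pop()
        for nbr, length in tree[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + length
                parent[nbr] = node
                stack.append(nbr)
    far = max(dist, key=dist.get)
    return far, dist[far], parent


def midpoint(tree):
    """Return (node_a, node_b, offset): the root lies `offset` from node_a on edge a-b.

    tree: undirected adjacency {node: [(neighbour, branch_length), ...]};
    leaves are nodes with a single neighbour.
    """
    leaf = next(n for n, nbrs in tree.items() if len(nbrs) == 1)
    u, _, _ = _farthest(tree, leaf)            # one end of the longest path
    v, diameter, parent = _farthest(tree, u)   # the other end and the path length
    path = [v]
    while parent[path[-1]] is not None:        # walk back from v to u
        path.append(parent[path[-1]])
    path.reverse()                             # now ordered u ... v
    half, walked = diameter / 2, 0.0
    edge_len = {(a, b): length for a in tree for b, length in tree[a]}
    for a, b in zip(path, path[1:]):
        if walked + edge_len[(a, b)] >= half:
            return a, b, half - walked
        walked += edge_len[(a, b)]
    raise ValueError("tree has no edges")


tree = {
    "A": [("X", 1.0)], "B": [("X", 1.0)], "C": [("Y", 5.0)],
    "X": [("A", 1.0), ("B", 1.0), ("Y", 1.0)],
    "Y": [("X", 1.0), ("C", 5.0)],
}
print(midpoint(tree))  # ('C', 'Y', 3.5)
```

In the toy tree, the longest path runs from C to A (length 7), so the root falls 3.5 units from C on the C–Y branch.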
Alignments of full ORF1b and ORF2 regions, alignment of concatenated ORFs (full genome dataset), tables with sequences metadata, as well as raw trees in the Newick format are available at https://github.com/v-julia/MAV_recombination (accessed on 5 December 2024).
Results
Application of established Mamastrovirus species criterion to the available sequence data
The sample of known MAstV sequences is strongly biased toward viruses infecting humans, livestock, and domestic animals. Moreover, virus species have been assigned primarily among viruses infecting humans, cats, and dogs, while most viruses infecting livestock (e.g. pigs and cattle) and other host species remain unclassified. To illustrate the available dataset and the sampling bias, phylogenetic trees were inferred using all sequences of ORF1b (encoding the polymerase) and ORF2 (encoding the capsid) available in GenBank; these were the genome regions best represented in the GenBank database as of July 2023 (Fig. 1). The capsid-encoding ORF2 region is the most frequently sequenced region of astroviruses, because astrovirus species assignment is primarily based on the amino acid p-distance in this region, at a threshold of 0.25 (25%).
More than half of the available ORF1b and ORF2 sequences are currently not assigned to virus species (Fig. 1). To apply the established species criterion to unclassified astrovirus sequences, ORF2 sequences available in GenBank (N = 894) were clustered by amino acid sequence similarity using two programs widely used for fast sequence clustering, USEARCH (Edgar 2010) and MMSeqs2 (Steinegger and Söding 2018a) (Figure S1A, “USEARCH” and “MMSeqs2” panels). The number of inferred clusters, which corresponds to a putative number of MAstV species, differed between the two programs (N = 107 for USEARCH, N = 86 for MMSeqs2). The most prominent differences in clustering were observed for unclassified MAstVs, while cluster assignment was generally consistent with the established MAstV species. Notably, fast sequence clustering algorithms are greedy, so their results may depend on the order of sequences in the input dataset. The number of clusters did not change substantially when sequences were shuffled before clustering: ten independent runs of USEARCH with ORF2 sequences in shuffled input order produced 107 to 114 clusters, although the assignment of sequences to clusters differed slightly (data not shown). Specifically, in most of the shuffled datasets, several ORF2 sequences of MAstV2 (feline astroviruses) were assigned to the MAstV1 (human astroviruses) cluster. Thus, clustering based on the established similarity threshold in ORF2 was not robust (it was affected by the algorithm applied and by sequence input order), could not distinguish certain MAstV species, and was of limited use for distinguishing unclassified MAstVs because it produced an unmanageable number of clusters.
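One way to quantify the instability described above is to count how often each pair of sequences ends up in the same cluster across repeated runs on shuffled inputs. The sketch below is a generic co-clustering summary, not part of USEARCH or MMSeqs2. The input format (one dictionary per run) and the sequence names in the example are hypothetical.

```python
from itertools import combinations


def co_clustering(runs):
    """Fraction of runs in which each pair of sequences shares a cluster.

    runs: list of partitions, each a dict {sequence_name: cluster_label}; labels are
    only compared within a run, so they need not match between runs.
    """
    names = sorted(runs[0])
    return {
        (a, b): sum(run[a] == run[b] for run in runs) / len(runs)
        for a, b in combinations(names, 2)
    }


def unstable_pairs(freq):
    """Pairs grouped together in some runs but not in others."""
    return sorted(pair for pair, f in freq.items() if 0 < f < 1)


# Hypothetical runs: two human strains (h1, h2) and one feline strain (f1).
runs = [
    {"h1": 1, "h2": 1, "f1": 2},
    {"h1": 1, "h2": 1, "f1": 1},
    {"h1": "a", "h2": "a", "f1": "a"},
    {"h1": 1, "h2": 1, "f1": 2},
]
print(unstable_pairs(co_clustering(runs)))  # [('f1', 'h1'), ('f1', 'h2')]
```

Pairs with a co-clustering fraction strictly between 0 and 1 are exactly those whose grouping depends on input order, as observed here for some MAstV2 and MAstV1 sequences.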
The distribution of amino acid and nucleotide distances in ORF2 and ORF1b
Ideally, a classification should be based on criteria that robustly distinguish taxa. Overlapping intra- and interspecies pairwise genetic distances, for example, could explain the incorrect and inconsistent assignment of MAstV2 ORF2 sequences to the MAstV1 cluster. To examine whether reliable distance-based taxonomic criteria are feasible for astroviruses, the distribution of nucleotide and amino acid p-distances in ORF2 was plotted (Fig. 2a) for sequences with available complete genomes (N = 478). Indeed, the 25% amino acid p-distance threshold in ORF2 did not distinguish MAstV1 and MAstV2 (Fig. 2a, left panel, dashed line). All interspecies pairwise comparisons with distances <25% involved MAstV1 and MAstV2 sequences. Pairwise p-distances between MAstV1 and MAstV2 ranged from 0.20 to 0.28; thus, MAstV1 and MAstV2 were not reliably distinguishable by sequence distance in ORF2. Historically, the ORF2 region has been sequenced more often, and there were no complete genome sequences for some viruses previously classified as separate species, namely 10 astrovirus species from bats and marine mammals. To include all known genetic diversity of mammalian astroviruses, we reproduced the distribution of pairwise distances for all ORF2 sequences available in GenBank (Fig. 2b, left panel). The 25% amino acid sequence species criterion was even less robust upon inclusion of these additional sequences. Intraspecies amino acid p-distances reached 40% when all available ORF2 sequences were considered (691 virus pairs), compared to almost no intraspecies comparisons with amino acid p-distances >25% (6 virus pairs) in the full-genome dataset. In general, pairwise distances between MAstVs in ORF2 were distributed without clear gaps (distance ranges with few or no sequence pairs) that could offer an unambiguous taxon criterion, and there was substantial overlap between intra- and interspecies distances.
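The overlap between intra- and interspecies distances can be summarized by counting the pairs a threshold rule would misclassify. This is a minimal sketch assuming a rule of the form "same species if p-distance < threshold" and pair labels taken from existing species assignments. The function name and the toy distances in the example are hypothetical, not data from this study.

```python
def threshold_overlap(pairs, threshold):
    """Count pairs misclassified by the rule 'same species if distance < threshold'.

    pairs: iterable of (p_distance, same_species) tuples.
    Returns (interspecies pairs below threshold, intraspecies pairs at/above it).
    """
    pairs = list(pairs)
    inter_below = sum(1 for d, same in pairs if not same and d < threshold)
    intra_above = sum(1 for d, same in pairs if same and d >= threshold)
    return inter_below, intra_above


# Toy values: an interspecies pair at 0.22 and an intraspecies pair at 0.30
# both fall on the "wrong" side of a 0.25 threshold.
toy = [(0.10, True), (0.22, False), (0.27, False), (0.30, True), (0.40, False)]
print(threshold_overlap(toy, 0.25))  # (1, 1)
```

A threshold that separates the two groups cleanly returns (0, 0); for MAstV1 and MAstV2, whose ORF2 distances span 0.20 to 0.28, no such ORF2 threshold exists.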
To examine whether a distance criterion for MAstV can be inferred at all, we considered the distribution of nucleotide and amino acid p-distances for ORF1a, ORF1b, and the alignment of concatenated ORFs (Fig. 2, right panel; Figure S2). The distributions of pairwise nucleotide p-distances in ORF2 and of amino acid p-distances in the alignment of concatenated ORFs showed distinct peaks with almost no overlap between inter- and intraspecies distances. In the distribution of amino acid distances for concatenated ORFs, there were two distinct peaks, corresponding to intraspecies comparisons and to comparisons of unclassified viruses, separated by a gap around 16%. Although using complete genome sequences to assign new astrovirus species is desirable, shorter regions are commonly sequenced, and more archival data are available for them. Moreover, recombination is frequent in MAstVs, especially between ORF1 and ORF2 (see below); therefore, phylogenetic studies based on complete genomes may be subject to systematic error. In ORF1b (Fig. 2, right panel), in both the complete genome dataset and the extended dataset of all sequences available in GenBank, there was almost no overlap between intra- and interspecies pairwise distances, even for MAstV1 and MAstV2, and there was a clear gap between inter- and intraspecies distances at 17% nucleotide sequence distance. Therefore, pairwise genetic distances were generally suitable for distinguishing MAstV groups, but ORF1b was apparently better suited for this purpose than ORF2.
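A gap in a distance distribution can be located by histogramming the pairwise distances and finding the emptiest bin within a plausible search range. The sketch below is a simple illustration of that idea, not the procedure used to derive the 17% cut-off. The bin width, the search range, and the toy distances are hypothetical choices.

```python
def find_gap(distances, lo, hi, bin_width=0.01):
    """Return (bin centre, count) of the emptiest histogram bin in [lo, hi).

    Ties go to the lowest bin. Restricting the search to [lo, hi) avoids picking
    up the sparsely populated tails of the distribution.
    """
    n_bins = round((hi - lo) / bin_width)
    counts = [0] * n_bins
    for d in distances:
        if lo <= d < hi:
            idx = min(int((d - lo) / bin_width), n_bins - 1)  # guard float rounding
            counts[idx] += 1
    best = min(range(n_bins), key=counts.__getitem__)
    return lo + (best + 0.5) * bin_width, counts[best]


toy = [0.02, 0.06, 0.08, 0.11, 0.13, 0.21, 0.23, 0.27]
centre, count = find_gap(toy, 0.05, 0.30, bin_width=0.05)
print(round(centre, 3), count)  # 0.175 0
```

With real data, a single empty bin may be a sampling accident, so a gap is more convincing when it persists across bin widths and when additional sequences are added.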
![Distribution of pairwise nucleotide and amino acid p-distances in ORF2 (left panel) and ORF1b (right panel) extracted from complete genome sequences (a) and all available sequences (b) within the genus Mamastrovirus. The species thresholds of 25% amino acid difference in ORF2 and 17% nucleotide sequence difference in ORF1b are indicated by dashed lines. The information about virus species was retrieved from the ICTV Master Species List [https://ictv.global/msl] and Genbank metadata. This was further verified by phylogenetic grouping in the ORF2 region. Viruses that were missing in the ICTV list and did not have that information in metadata were assigned as unknown species if they were outgroups to viruses with established species (all of them also differed by >30% amino acid sequence in ORF2). Viruses that were within a phylogenetic group corresponding to an established species were assigned a species of their phylogenetic neighbors.](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/ve/11/1/10.1093_ve_veaf006/10/m_veaf006f2.jpeg?Expires=1747906988&Signature=TXZEDCq4hlHV5YryAD486bbWIb3W2ltBE4o3jxR2L5UZjL3v~CtR77Dj4gWTUl-oqM9cA4cdUJQhk5JS6j6xDLEx3bc3GvZWdvCIUDDdOEM6neaD36un98SdF~RVSgP5tA8fSZ1pVonp1zpALMHXX-VszYE8m-eIiJtVw~vj94yy1EVKHpypZHRwOXMeUkiQAXsL8mYHsPJyxlZGhis~tPfZ0tNZPiqOJ3svBkTKiOQ-m8BYfckcO7IRo8WMHbgShVY9QOPdwyqH3voVcxtwaK5Eo8mLhPwVIRmH~a7Bx1dYTI~2zs~8lMuM43-xEpZwMyohlkvSynUMwDBYUJ3jyA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Figure 2. Distribution of pairwise nucleotide and amino acid p-distances in ORF2 (left panel) and ORF1b (right panel) extracted from complete genome sequences (a) and all available sequences (b) within the genus Mamastrovirus. The species thresholds of 25% amino acid difference in ORF2 and 17% nucleotide sequence difference in ORF1b are indicated by dashed lines. Information about virus species was retrieved from the ICTV Master Species List [https://ictv.global/msl] and GenBank metadata and was further verified by phylogenetic grouping in the ORF2 region. Viruses that were missing from the ICTV list and lacked this information in their metadata were assigned as unknown species if they were outgroups to viruses with established species (all of them also differed by >30% in ORF2 amino acid sequence). Viruses that fell within a phylogenetic group corresponding to an established species were assigned the species of their phylogenetic neighbors.
Numeric sequence distance criteria might be a result of sampling bias and, in that case, would not be reproducible upon discovery of additional genomes. Robust taxonomic criteria may be expected to have an underlying mechanism (molecular or ecological) that produces a discontinuous distribution of pairwise sequence distances at the genus level. One mechanism known to affect the distribution of genetic distances in viruses is natural recombination. We studied the patterns of recombination in MAstVs in relation to the established species and to genetic distances among unclassified viruses.
Patterns of natural recombination in MAstV genomes
The established MAstV species were well distinguished on the phylogenetic trees for both the ORF1b and ORF2 regions and apparently were not affected by interspecies recombination, because all viruses of each species defined by ORF2 were also grouped together in ORF1b (Fig. 1). Since recombination has been commonly reported in MAstVs previously and could affect phylogenetic grouping, a genome-wide recombination analysis of full MAstV genome sequences (N = 478) was performed to further check the robustness of genomic regions for phylogenetic and taxonomic analysis.
To estimate recombination prevalence throughout the genome, unique recombination events were identified using the recombination detection methods implemented in RDP5. A total of 292 events supported by more than four detection methods were visualized using a recombination breakpoint density plot (Fig. 3a) and a recombination region count matrix (Fig. 3b). The breakpoint distribution plot uses a permutation test to compare the observed recombination incidence with that expected under random recombination, corrected for sequence similarity, to infer potential recombination hotspots and coldspots (Heath et al. 2006). Recombination breakpoints were present in all MAstV genome regions (Fig. 3a) but were most common within about 500 nt around the ORF1/ORF2 junction (Fig. 3a, areas filled in red). The breakpoints close to the genome termini were likely an artifact of the method, which obligatorily infers two breakpoints for each recombinant fragment. The detectable breakpoint densities were significantly higher in the terminal 5% of ORFs than in their middle 90% (P-value < .001, permutation test). There were also two potential recombination hotspots, at the 3ʹ end of ORF1a and at the 5ʹ end of ORF1b (Fig. 3a, areas marked in red). Within ORFs, there were several apparent recombination coldspots: in the 5ʹ part of ORF2, which corresponds to the major capsid protein VP34, in the 3ʹ half of ORF2, and within ORF1a (Fig. 3a, areas marked in blue). In ORF2, these coldspots could result from a higher expected random recombination incidence (Fig. 3a, gray area) due to lower sequence similarity (see the method description in Heath et al. 2006), while the absolute incidence of breakpoints there (Fig. 3a, solid line) was generally similar to that within other ORFs and higher than in ORF1b or in the coldspot within ORF1a. Thus, these coldspots in ORF2 should be interpreted with care.
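A rough sketch of the breakpoint-density logic, assuming breakpoint positions are already available as alignment coordinates: count breakpoints in sliding windows, build a null distribution by re-drawing the same number of breakpoints at random, and flag windows outside the central 95% of that null. The function names and defaults are hypothetical. Unlike the procedure described by Heath et al. (2006) and used in RDP5, this null is uniform along the alignment and is not corrected for sequence similarity, so its hotspot and coldspot calls can differ from those in Fig. 3a.

```python
import bisect
import random


def window_counts(breakpoints, genome_length, window=200, step=20):
    """Number of breakpoints in each sliding window [start, start + window)."""
    bps = sorted(breakpoints)
    counts = []
    for start in range(0, genome_length - window + 1, step):
        lo = bisect.bisect_left(bps, start)
        hi = bisect.bisect_left(bps, start + window)
        counts.append(hi - lo)
    return counts


def breakpoint_hotspots(breakpoints, genome_length, window=200, step=20,
                        n_perm=1000, seed=1):
    """Label each window 'hot', 'cold' or 'ns' against a uniform permutation null.

    Simplification (assumption): random breakpoints are drawn uniformly, with no
    correction for local sequence similarity.
    """
    rng = random.Random(seed)
    observed = window_counts(breakpoints, genome_length, window, step)
    simulated = [
        window_counts([rng.randrange(genome_length) for _ in breakpoints],
                      genome_length, window, step)
        for _ in range(n_perm)
    ]
    labels = []
    for k, obs in enumerate(observed):
        null = sorted(sim[k] for sim in simulated)
        lower = null[int(0.025 * n_perm)]      # approx. 2.5th percentile
        upper = null[int(0.975 * n_perm) - 1]  # approx. 97.5th percentile
        if obs > upper:
            labels.append("hot")
        elif obs < lower:
            labels.append("cold")
        else:
            labels.append("ns")
    return observed, labels
```

Sorting once and using `bisect` keeps each permutation cheap, which matters when 1,000 permutations are run over a full-length alignment.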
Recombination region count matrices indicated that ORF1b and ORF2 (especially its 5ʹ part) were more commonly transferred as a whole between MAstV genomes (Fig. 3b).

Figure 3. (a) Distribution of recombination breakpoints detected by RDP5 along the alignment of full MAstV genomes (N = 478), window size = 200 nt. The genome map is shown above the graph. The breakpoint positions are indicated below the genome map with vertical lines. The solid black line indicates the number of recombination breakpoints that fall into a 200 nt window. The gray area indicates the 95% boundaries of the expected number of breakpoints under random recombination, identified using a permutation test (1,000 permutations). The areas marked in red and blue correspond to potential recombination hotspots and coldspots, respectively. (b) Recombination region count matrices indicating the genome regions that were most and least commonly transferred in detectable recombination events in MAstVs. Colors indicate the number of times recombination events have separated each pair of nucleotide positions.
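The region count matrix in Fig. 3b can be read as a tally over detected events. The minimal sketch below assumes that each event is summarised by the start and end of the transferred fragment and that the genome is coarsened into fixed-size bins; both are simplifying assumptions, the function name is hypothetical, and RDP5's own computation may differ in detail.

```python
def region_count_matrix(events, genome_length, bin_size=100):
    """Count how often recombination events separated each pair of genome bins.

    events: (start, end) coordinates of each recombinant fragment, end exclusive.
    Two bins are 'separated' by an event when exactly one of them lies inside
    the transferred fragment; bins that are usually transferred together keep
    low counts.
    """
    n_bins = -(-genome_length // bin_size)  # ceiling division
    # represent each bin by its centre position
    centres = [min(k * bin_size + bin_size // 2, genome_length - 1)
               for k in range(n_bins)]
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for start, end in events:
        inside = [start <= c < end for c in centres]
        for i in range(n_bins):
            for j in range(n_bins):
                if inside[i] != inside[j]:
                    matrix[i][j] += 1
    return matrix
```

Blocks of low values along the diagonal then mark regions that tend to move as a unit, which is how the matrix supports the observation that ORF1b and the 5ʹ part of ORF2 were commonly transferred as a whole.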
To get an overview of the relative impact of recombination in different genome regions, pairwise distance comparison (PDC) plots were used to visualize the relationship between genetic distances in different pairs of genome regions (Figure S3). Generally, if recombinant sequences are absent from a dataset, genetic distances in different regions follow an almost linear relationship due to the proportional accumulation of substitutions. A recombination event causes the pairwise distances between a recombinant, its potential parents, and closely related sequences to deviate from this relationship. The distances between sequences involved in recombination might hint at the approximate time of the event. Ancestral events usually lead to the deviation of multiple points with relatively high genetic distances, because such events are reflected in several descendant sequences. PDC plots confirmed that recombination was least evident within ORF1a and ORF1b, concordant with the low incidence of recombination breakpoints detected by RDP5 (Figure S3 A, B). PDC plots also suggested more incongruence between ORF1a/ORF1b and ORF2 than within ORF1b (detailed in Figure S3), which corresponds to the major recombination hotspot found by RDP5 between ORF1b and ORF2. The least incongruence was observed between the two halves of ORF1b. Therefore, ORF1b appears to be the region least affected by recombination and thus the best suited for phylogenetic analysis.
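The PDC reasoning above can be sketched numerically: fit a line to the pairwise distances of two regions and flag pairs that fall far from it. The Python below is a hypothetical illustration that assumes distances have already been computed for every pair; the ordinary least-squares fit and the z-times-residual-SD rule are choices made for this sketch, not the authors' procedure.

```python
def discordant_pairs(pairs, dist_a, dist_b, z=2.0):
    """Flag virus pairs whose distances in two genome regions deviate from a linear fit.

    pairs: labels of virus pairs; dist_a / dist_b: their p-distances in regions A and B.
    Returns (pair, residual) for pairs whose |residual| exceeds z residual SDs.
    """
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three pairs to fit a line")
    mean_a = sum(dist_a) / n
    mean_b = sum(dist_b) / n
    sxx = sum((x - mean_a) ** 2 for x in dist_a)
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(dist_a, dist_b))
    slope = sxy / sxx  # ordinary least squares; fails if all dist_a are equal
    intercept = mean_b - slope * mean_a
    residuals = [y - (intercept + slope * x) for x, y in zip(dist_a, dist_b)]
    sd = (sum(r * r for r in residuals) / (n - 2)) ** 0.5
    return [(p, r) for p, r in zip(pairs, residuals) if abs(r) > z * sd]


# Hypothetical data: pair1 is close in region A but distant in region B
pairs = [f"pair{i}" for i in range(8)]
dist_a = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
dist_b = [0.1, 0.8, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print([p for p, _ in discordant_pairs(pairs, dist_a, dist_b, z=1.5)])  # ['pair1']
```

Because recombinant pairs pull the least-squares line toward themselves, a robust fit (for example, one based on medians) may be preferable when many pairs are affected.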
The established MAstV species and several phylogenetic groups are reproductively isolated
The phylogenetic grouping of unclassified MAstVs corresponded much better to the host species in the ORF1b region than in ORF2 (Fig. 1), and the pairwise distance distribution suggested a better distinction of taxa in ORF1b (Fig. 2). To study the correspondence between recombination (mixing of phylogenetic groups), taxonomy, and host species across the MAstV genome, the phylogenetic trees for ORF1a, ORF1b, and ORF2 were colored according to phylogenetic grouping (colors) and the order of tree tips (shades) in the ORF1b phylogenetic tree (Fig. 4, Figure S4).
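The shading scheme above is a visual check: within a group, shades follow ORF1b tip order, so shuffled shades in another tree suggest recombination. As a hypothetical way to put a number on this, the sketch below computes a Spearman rank correlation of tip order within each group between the ORF1b tree and another tree. Tip order depends on how nodes are rotated when a tree is drawn, so this is only a crude proxy and not a method used in the study.

```python
def spearman_rho(order_a, order_b):
    """Spearman rank correlation between two orderings of the same tips (no ties)."""
    rank_b = {tip: i for i, tip in enumerate(order_b)}
    n = len(order_a)
    d2 = sum((i - rank_b[tip]) ** 2 for i, tip in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


def within_group_concordance(reference_order, other_order, group_of, min_size=3):
    """Per-group rank correlation of tip order between a reference tree (e.g. ORF1b) and another tree.

    reference_order / other_order: tip names in the order they appear in each tree.
    group_of: dict tip name -> group label (e.g. clade or putative species).
    """
    result = {}
    for group in sorted(set(group_of.values())):
        ref = [t for t in reference_order if group_of[t] == group]
        other = [t for t in other_order if group_of[t] == group]
        if len(ref) >= min_size:
            result[group] = spearman_rho(ref, other)
    return result


# Hypothetical tip orders: group A keeps its order, group B is reversed
ref = ["a1", "a2", "a3", "b1", "b2", "b3"]
other = ["b3", "b2", "b1", "a1", "a2", "a3"]
groups = {t: t[0].upper() for t in ref}
print(within_group_concordance(ref, other, groups))  # {'A': 1.0, 'B': -1.0}
```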

Figure 4. Maximum likelihood phylogenetic trees of concatenated ORF1a (a), ORF1b (b), ORF2 (c), and full genomes (d) from the dataset of MAstV full genomes (N = 478), built with IQ-TREE 2 (Minh et al. 2020). Nodes with ultrafast bootstrap support values <80% are marked with red circles. The colors of taxa names match across the trees. ICTV species assignments and virus hosts are shown as color bars. The silhouettes of major hosts are shown next to the heatmap. The tree branches are colored according to the virus host. Rare hosts and species are indicated by arrows. ORF1b and ORF2 sequences were additionally clustered by nucleotide and amino acid sequence identity, respectively, using MMseqs2 software (Steinegger and Söding 2018a), and the assignment of sequences to clusters is shown as the color bars “ORF1b-nt17%” and “ORF2-aa25%.” The clade marked with a cross is discussed in the text.
In the ORF1a and ORF1b phylogenetic trees, the unclassified MAstVs formed several distinct phylogenetic groups that generally corresponded to the host species (Fig. 4a and b). There were no signs of taxon mixing either between the established species or between several groups of unclassified MAstVs, indicating a lack of recombination between them. Within these groups, there was evidence of ubiquitous recombination events between ORF1a and ORF1b, and especially between ORF1b and ORF2 (Fig. 4, shuffled shades of taxa names within groups in all trees except ORF1b, which was used to assign shades according to the order of tree tips). In particular, all but two clades of porcine and bovine MAstVs exhibited recombination within clades but never between them (Fig. 4, clades b1 and b2 of bovine MAstVs, clades p1 and p2 of porcine MAstVs). Two clades of viruses from pigs and cows identified in ORF1b were shuffled by recombination between them in the ORF2 phylogenetic tree relative to ORF1b and ORF1a (Fig. 4, clade p3 of porcine MAstVs and clade b3 of bovine MAstVs), which could hint at several ancestral recombination events.
It is important to note that although the established MAstV species and clades of unclassified MAstVs generally include viruses isolated from the same host species, there were several exceptions (Fig. 4). In particular, there was a virus isolated from a crab-eating fox within the species MAstV5, which infects dogs. Within the bovine MAstV clade b3, there were individual viruses isolated from other artiodactyl species (deer, water buffalo, yak). Apparently, the boundaries of reproductively isolated MAstV groups (established or putative species) correspond not to a single host species, but to larger taxonomic groups, such as the family Canidae or the order Artiodactyla in the examples above. Viruses isolated from less closely related host species could also be observed in the ‘bovine’ and ‘porcine’ clades. For example, viruses isolated from dogs could be found within the bovine MAstV clades b1–b3, and viruses from camels and rats within the porcine MAstV clades p3 and p1, respectively (Fig. 4). Most such viruses were represented by a single sequence. The many solitary findings of viruses in atypical hosts suggest that, despite the capacity for interspecies transmission, there is no evidence that most of these spillover events led to a MAstV becoming established in a new host. A possible exception was a subclade within clade p3 comprising nine camelid MAstVs that were isolated in three independent studies (Woo et al. 2015, Qureshi et al. 2023). This subclade apparently circulates in camels over large territories. Notably, viruses of this camelid subclade were not involved in recombination with closely related porcine MAstVs. This group may represent speciation in progress, in which viruses became established in a new host and stopped recombining with the parental porcine astrovirus gene pool but have not yet diverged from their ancestors sufficiently to fulfill a sequence distance criterion. However, more field samples are needed to fully support this speculation.
Observing recombination, which requires co-infection, could be informative about previous host switches and help distinguish a broad host range of a virus from a permanent host shift. Although there have been a number of reports of recombination between viruses infecting distinct hosts (Table 1), phylogenetic trees (Fig. 4) and PDC plots (Fig. 5) suggested that such recombination was relatively rare (Fig. 5, red dots). Most of the dots on PDC plots that deviated from a linear relationship between two genome regions, and thus suggested recombination, involved viruses from the same host species (Fig. 5, cyan dots). Many potential recombinant pairs involving viruses from distinct hosts were seen between distantly related viruses and likely represented ancestral recombination events, presumably occurring before these viruses became established in distinct hosts. One of the events reflected in many genomes was recombination between viruses from the porcine clade p3 and the bovine clade b3 discussed earlier (Fig. 4, marked with a cross; Fig. 5, purple circle). Other discordant sequence distance pairs (Fig. 5, black circles) involved viruses from closely related hosts, such as members of the order Artiodactyla, or single spillover isolates, such as the bovine MAstVs isolated from dogs, also discussed earlier. In addition, there were several recombination events involving viruses from related hosts, such as members of the family Canidae or Felidae, with p-distances below 10% in one of the genome regions (Fig. 5, blue circles).

Figure 5. Correspondence between p-distances (PDC plots) in different genome regions: within the nonstructural genome region, between two halves of ORF1a (a), between two halves of ORF1b (b), and between ORF1a and ORF1b (c); between two halves of ORF2, approximately between the regions encoding VP34 and VP27/25 (d); and between the nonstructural (ORF1ab) and structural (ORF2) genome regions (e), in the dataset of full MAstV genome sequences. The axes represent uncorrected p-distances in genomic regions. The dots correspond to virus pairs and are colored according to whether the viruses in a pair infect the same host. Circles indicate pairs of astrovirus genomes that were isolated from distinct hosts and bear signs of recombination (discussed in the text).
As findings suggestive of recombination between MAstVs infecting distinct hosts were represented by solitary sequences in the available dataset, it was difficult to distinguish true recombination between viruses from distinct host species from single spillover infections involving recombinant viruses. The evidence available so far suggests that host range was one of the major factors limiting recombination in MAstVs, but this host range barrier might apply to a group of related host species rather than to a single host species. Clearly, more sequence data are needed to establish the precise borders of many MAstV species and their gene pools across a range of hosts.
Combining genetic distances and recombination as potential species criteria for MAstVs
A sole ORF2 distance-based species criterion was applicable, although with exceptions, to the established MAstV species (Fig. 2). However, it fared poorly with many unassigned viruses. Only one (p2) of the six large clades of novel MAstVs (p1–p3, b1–b3) could be distinguished by the 25% amino acid sequence cut-off in ORF2 (Fig. 4c, “ORF2-aa25%” color bar). The 17% nucleotide p-distance threshold in the ORF1b region properly separated the established species and suggested distinct groups among the unclassified MAstVs that corresponded to clades (b1, b2, p1, p2) which, like the established species, were not involved in inter-clade recombination (Fig. 4b, “ORF1b-nt17%” color bar). The ORF1b criterion also corresponded well to reproductive isolation and host specificity (Fig. 4). Notably, the number of distinct groups suggested by the ORF1b-nt17% cut-off among full-genome MAstVs was 38, a more convenient number of species than the 103 groups suggested by the ORF2-aa25% cut-off in the same dataset. The reproductive isolation criterion (common recombination within, but never outside, a clade) generally corresponded to the groups suggested by the ORF1b-nt17% distance criterion, but not in the case of clades p3 and b3. Viruses of the p3 group formed two distinct clades that differed by >17% in ORF1b. However, viruses were commonly shuffled between these two groups in ORF1a and ORF2 relative to ORF1b; therefore, a higher nucleotide distance in ORF1b did not prevent frequent recombination in this case (Fig. 4). The opposite example is clade b3, whose members differed by <17% in ORF1b; in ORF2, however, this clade was not maintained, and its members fell into three distinct subclades.
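The group counts above (38 versus 103) come from threshold-based clustering with MMseqs2. The sketch below is not MMseqs2: it is a hypothetical greedy, representative-based clustering of a precomputed distance matrix, loosely in the spirit of greedy set cover, meant only to show how a single distance cut-off turns pairwise distances into groups. The function name and tie-breaking rule are assumptions.

```python
def greedy_threshold_clusters(dist, threshold):
    """Cluster items so that every member is within `threshold` of its cluster representative.

    dist: symmetric matrix (list of lists) of pairwise p-distances.
    Greedy rule (assumption): repeatedly pick the unassigned item with the most
    unassigned neighbours below the threshold and group it with those neighbours.
    """
    n = len(dist)
    neighbours = [{j for j in range(n) if j != i and dist[i][j] < threshold}
                  for i in range(n)]
    unassigned = set(range(n))
    clusters = []
    while unassigned:
        # max() returns the first maximum, so ties go to the lowest index
        rep = max(sorted(unassigned), key=lambda i: len(neighbours[i] & unassigned))
        members = {rep} | (neighbours[rep] & unassigned)
        clusters.append(sorted(members))
        unassigned -= members
    return clusters


# Hypothetical distances: two tight pairs that are 30% apart from each other
d = [[0.0, 0.05, 0.30, 0.30],
     [0.05, 0.0, 0.30, 0.30],
     [0.30, 0.30, 0.0, 0.08],
     [0.30, 0.30, 0.08, 0.0]]
print(greedy_threshold_clusters(d, 0.17))  # [[0, 1], [2, 3]]
```

Because membership is judged against a representative rather than against every other member, two sequences in one cluster can differ by more than the cut-off; lowering the cut-off, or applying it to a faster-evolving region, increases the number of clusters.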
To visualize the correspondence between clustering based on the ORF1b and ORF2 distance criteria, a matrix showing the grouping of astroviruses simultaneously in ORF1b and ORF2 was constructed (Fig. 6). For the 19 recognized MAstV species, the clustering results in ORF1b and ORF2 matched almost perfectly, although with several exceptions (Fig. 6, cells marked with red rectangles). In particular, the canine MAstV5 formed one group in ORF2 according to the 25% aa cut-off, but three groups in ORF1b according to the 17% nt cut-off, and one of these ORF1b groups included three viruses that were not within MAstV5 according to the ORF2 classification. Human MAstV6, as identified in ORF2, included one outgroup in ORF1b that differed by >17% in nucleotide sequence. As the breakaway MAstV5 and MAstV6 genomes are represented by only a few sequences in GenBank, it may be premature to further discuss their taxonomic assignment.
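The matrix in Fig. 6 is essentially a cross-tabulation of two cluster labels per genome. A minimal sketch, assuming each genome already carries an ORF1b-nt17% label and an ORF2-aa25% label (the labels and function names below are hypothetical):

```python
from collections import Counter


def concordance_matrix(orf1b_group, orf2_group):
    """Cross-tabulate ORF1b and ORF2 cluster labels of the same genomes.

    orf1b_group / orf2_group: dict genome id -> cluster label (same keys in both).
    Returns row labels, column labels and a matrix of genome counts.
    """
    counts = Counter((orf1b_group[g], orf2_group[g]) for g in orf1b_group)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    matrix = [[counts.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, matrix


def split_groups(rows, matrix):
    """Row groups whose genomes fall into more than one column group."""
    return [r for r, line in zip(rows, matrix) if sum(v > 0 for v in line) > 1]


orf1b = {"s1": "B1", "s2": "B1", "s3": "B2", "s4": "B2"}
orf2 = {"s1": "C1", "s2": "C2", "s3": "C3", "s4": "C3"}
rows, cols, m = concordance_matrix(orf1b, orf2)
print(m)                      # [[1, 1, 0], [0, 0, 2]]
print(split_groups(rows, m))  # ['B1']
```

A perfectly concordant species occupies a single non-empty cell. Swapping the two arguments lists ORF2 groups that split in ORF1b instead, the situation described above for MAstV5.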

Figure 6. Correspondence of MAstV groups determined using the ORF1b-nt17% and ORF2-aa25% distance criteria. Sequences from ORF1b and ORF2 clusters were collapsed on the corresponding ML trees (the same as in Fig. 4), so that each row and each column of the matrix represents an ORF1b-nt17% group and an ORF2-aa25% group, respectively. The colors of the ORF1b-nt17% and ORF2-aa25% clusters correspond to the color bars in Fig. 4. The collapsed clades are colored by the major virus host. The colors of matrix cells indicate the number of sequences assigned to each combination of ORF1b and ORF2 groups. The cells that correspond to the established MAstV species are marked with red rectangles. Groups that include sequences of MAstV species proposed by Guix et al. (2012) are marked with magenta rectangles, and these species names are designated with the “caret” symbol. The groups proposed as MAstV species in this study are marked with green rectangles, and provisional species names are marked with an asterisk.
Additional species MAstV20 to MAstV33 were suggested in 2012 based on ORF2 sequences (Guix et al. 2012) [reproduced later with a larger dataset of ORF2 sequences (Donato and Vijaykrishna 2017)]. Since only full-genome sequences could be used to match ORF1b and ORF2 groups, our dataset did not include a number of ORF2 sequences that were used previously (Guix et al. 2012, Donato and Vijaykrishna 2017), and it was not possible to reproduce the earlier study exactly (Fig. 6, cells marked with magenta rectangles). Several of the species proposed in that study were consistent with our findings, but a few could plausibly be combined into larger species based on the ORF1b-nt17% cut-off. For example, the putative species MAstV21 included a single mink astrovirus. In both the ORF1b and ORF2 phylogenetic trees, MAstV21 grouped with an established mink astrovirus species, MAstV10. According to the ORF1b 17% criterion, MAstV21 and MAstV10 are the same species; however, in ORF2 they differ by 33% in amino acid sequence. Sequences of the species MAstV24 formed a single group in the ORF2 tree but fell into three monophyletic ORF1b-nt17% clusters. As suggested for MAstV5 and MAstV6 above, more MAstV genomes may be required to refine their taxonomic assignment.
Even more incongruence with the species suggested by Guix et al. (2012) could be observed among the vast majority of unclassified MAstVs, which were isolated from livestock, predominantly pigs and cows. The unclassified porcine and bovine viruses were provisionally suggested to be termed species MAstV22, 24, 26, 27, 31, and 32, and MAstV28–30 and 33, respectively. Among the porcine and bovine astrovirus clades discussed earlier, only clade p2 clearly corresponded to a previously suggested species, MAstV22. For this clade, the results of ORF1b- and ORF2-based clustering almost coincided, except for one sequence that fell into a separate cluster based on its ORF2 sequence. Another clade with frequent recombination within, but never outside, the clade contained sequences previously designated as species MAstV26–27. This clade comprised sequences that fell into two ORF1b-nt17% clusters and 24 ORF2-aa25% clusters. The most parsimonious species assignment, considering the shared host, reproductive isolation, and a reasonably good correspondence to the ORF1b-nt17% criterion, would be to combine all members of this group, including provisional MAstV26 and 27, into one species, MAstV26 (Fig. 6, green rectangle).
One of the most diverse and complex MAstV clades was p3. It included two ORF1b groups and 24 ORF2 groups that were shuffled by recombination within the clade. The clade was monophyletic in both ORF1b and ORF2, with several exceptions caused by external recombination events with clade b3 in ORF2. Clade p3 included members of the provisional species MAstV32 (and possibly MAstV31, which was proposed based only on the ORF2 sequence) suggested earlier. Neither criterion alone appears to provide a sensible classification of viruses within clade p3. Reproductive isolation (common recombination within the clade, but not outside it) could be used as a functional criterion to assign all members of clade p3 to a single species, MAstV32 (Fig. 6, green rectangle).
Viruses of the ORF1b clade b3 were split in ORF2 into three clades and five ORF2-aa25% groups, and were recombinant with clade p3 in ORF2. This group included the provisional species MAstV28, 29, 30, and 33 (Fig. 6, magenta rectangles). The number of sequences in the subclades of clade b3 may not be sufficient to draw reliable conclusions about whether recombination occurs within, but not between, them. Therefore, it may be premature to suggest their taxonomic assignment.
There were a number of clades that were perfectly supported by both the ORF1b and ORF2 distance cut-offs and by the recombination criterion. They could be suggested as six provisional species: MAstV34 (rat), MAstV35 (rat), MAstV36 (bovine), MAstV37 (water buffalo), MAstV38 (bovine), and MAstV39 (bovine) (Fig. 6, green rectangles, species names marked with an asterisk). GenBank accession numbers of the sequences included in these provisional species are provided in the Supplementary data (Table S1).
Specific molecular signatures of MAstV genomes could complement the criteria for MAstV species designation. Experimental data for astroviruses are limited, and the exact coordinates of proteins or protein domains of divergent animal MAstVs are not always straightforward to determine. We analyzed the pattern of indels in conserved proteins that are robustly identifiable using Pfam domain search and SwissProt annotations: serine protease (ORF1a), RdRp (ORF1b) (Figure S5), and VP34 (ORF2, data not shown). The indel pattern was consistent with the phylogeny of the viruses. However, in the serine protease and in RdRp, distinct indel profiles generally corresponded to higher-level groups that combine several species (e.g. MAstV1, 2, 3, and 5). On the other hand, a single insertion was observed in just two of 57 MAstV26* sequences (Figure S5, marked with a cross). Even more chaotic indel patterns relative to the taxonomic groups were observed in VP34 (data not shown).
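Comparing indel patterns amounts to grouping aligned proteins by where they carry gaps. The sketch below is a hypothetical illustration, assuming the protein sequences are already aligned to a common length and that only columns where some but not all sequences have a gap are informative; it is not the procedure behind Figure S5.

```python
from collections import defaultdict


def indel_profile_groups(aligned):
    """Group aligned protein sequences that share the same pattern of gaps.

    aligned: dict name -> aligned sequence (all the same length, '-' = gap).
    Only columns where some, but not all, sequences have a gap are informative.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    informative = [k for k in range(length)
                   if 0 < sum(s[k] == "-" for s in seqs) < len(seqs)]
    groups = defaultdict(list)
    for name, seq in aligned.items():
        profile = tuple(seq[k] == "-" for k in informative)
        groups[profile].append(name)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical alignment: v3 carries a one-residue insertion absent from v1 and v2
aln = {"v1": "MKT-LV", "v2": "MKA-LV", "v3": "MKTGLV"}
print(indel_profile_groups(aln))  # [['v1', 'v2'], ['v3']]
```

Substitutions (T versus A in column 3) do not affect the grouping, so the profile captures only length variation, the kind of signature discussed above.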
Discussion
Mechanisms and patterns of recombination in astroviruses have not been studied as extensively as in other RNA viruses. While recombination has been observed in a significant number of astrovirus strains, the frequency, extent, and biological significance of recombination events in astroviruses remain areas of ongoing research. A well-known feature of many (+)RNA viruses is frequent recombination between genome regions encoding structural and nonstructural proteins (Simmonds 2006). This rule holds both for viruses with a single ORF, e.g. many picornaviruses (Lukashev 2010), and for viruses with distinct ORFs, such as coronaviruses (Vakulenko et al. 2021, de Klerk et al. 2022) and caliciviruses (Dos Anjos et al. 2011, Lopes et al. 2015, Begall et al. 2018, Mahar et al. 2019, Szillat et al. 2020, Tohma et al. 2020, Vakulenko et al. 2023). Astroviruses were suggested to follow this rule based on the analysis of the 10 genomes known in 2006 (Simmonds 2006), and many isolated reports since then have found recombination between ORF1 and ORF2 (Table 1). The systematic analysis performed here confirmed a recombination hotspot between ORF1 and ORF2 in MAstVs. In many picornaviruses, such as entero-, parecho-, aphtho-, and cardioviruses, most recombination events mapped to the junction between the structural VP1 and the nonstructural 2A genome regions, with hardly any recombination within VP1, but rather common events in 2A and throughout the nonstructural genome region (Lukashev 2010). In noroviruses, the recombination hotspot between ORF1 (encoding nonstructural proteins) and ORF2 (major capsid protein) was narrow and well defined, which led to the circulation of distinct recombinant forms (Begall et al. 2018, Tohma et al. 2021, Kendra et al. 2022). There was almost no recombination within ORF1, and there were isolated events within ORF2, within ORF3, and at the ORF2/ORF3 boundary (Rohayem et al. 2005, Lam et al. 2012, Eden et al. 2013, Vakulenko et al. 2023).
The hotspot between nonstructural and structural genome regions in MAstVs was similar to that in noroviruses, as there was little evidence of recombination in ORF1b and in the part of ORF2 encoding VP34, which is adjacent to the ORF1b/ORF2 junction. Another distinction of MAstVs compared with closely related virus families was the apparent lack of genome regions devoid of recombination. In many picornaviruses, natural recombination is hardly ever observed within the genome region encoding the major capsid proteins VP1–VP3 (VP4, which is present in some genera and is not exposed on the capsid surface, is an exception) (Lukashev 2010). In noroviruses, recombination within ORF1 is extremely rare, if it occurs at all (Begall et al. 2018, Tohma et al. 2021, Vakulenko et al. 2023). Thus, recombination incidence throughout the astrovirus genome was comparable to that in similar RNA viruses, but the pattern was less clear-cut.
Besides a recombination profile across the genome, RNA viruses also have distinct patterns of recombination in terms of the divergence of the putative recombination partners. In many RNA viruses, including enteroviruses and noroviruses, there is routine recombination among circulating viruses, usually within species. It may occur as often as every several years in enteroviruses (McWilliam Leitch et al. 2009, 2012, Lukashev et al. 2014) and somewhat less frequently, on a scale of about 10 years, in noroviruses (Tohma et al. 2021, Vakulenko et al. 2023), and it results in virtually unrelated phylogenetic profiles of genomic regions within a species. Routine recombination between more divergent viruses that differ beyond a certain threshold is restricted, and shuffling of genomes within, but not between, species was suggested as one of the mechanisms that keep virus species distinct (Lukashev 2010). This routine recombination should be distinguished from phylogenetic evidence of recombination, which can be found even between very distantly related viruses. However, such findings often correspond to extremely rare, unique events or experimental artifacts, or may be attributed to ancient recombination events that occurred when the ancestral viruses were much more similar than today, belonged to the same ancestral species, and recombined within that ancestral species.
Both routine and ancestral recombination events were evident in mammalian astroviruses. On phylogenetic trees, ancestral events were responsible for shifting the positions of whole groups in different genome regions, while routine recombination shuffled viruses within groups that were preserved across the genome (Fig. 4). Most bioinformatic methods, including those implemented in RDP5 (Fig. 3), do not distinguish “ancestral” from “routine” recombination events, and none can reliably quantify the relative impact of the two. Pairwise distance plots can show the relative impact of recent recombination events (genome pairs with a very low genetic distance in one genome region and a high distance in another) and ancient ones (genome pairs that deviate from the regression line but have relatively high genetic distances in both genome regions). Both phylogenetic trees (Fig. 4) and pairwise distance plots (Figure S3, Fig. 5) made it clear that there has been routine recombination between closely related viruses (within a species, for viruses of assigned species), and some of these routine events were very recent (involving viruses that differed by <2% in one genome region). There were also traces of ancient recombination between more distantly related viruses (above 20–40% nucleotide sequence distance, depending on the genome region) on pairwise distance plots, but no recent recombination events between distantly related MAstVs. The MAstV genomes were regularly shuffled within, but not between, virus species. Therefore, the same mechanism that defines species in some other RNA viruses, routine recombination within but not between species, apparently maintains MAstV species as well. When a lineage stops recombining with its gene pool (species), it would hypothetically start to diverge and eventually produce a novel species. This was probably the case with the camel astrovirus group, which emerged from porcine astroviruses and apparently stopped recombining with them (Woo et al. 2015, Qureshi et al. 2023), but has not yet diverged beyond a species threshold.
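The distinction drawn above between recent and ancient events on pairwise distance plots can be written as a simple rule for pairs that are already off the regression line. The sketch below is hypothetical: the default cut-offs (2% for "very low" and 20% for "relatively high") are taken from the values quoted in the text, but using them as fixed thresholds is an assumption of this illustration, not a rule stated by the study.

```python
from collections import Counter


def classify_discordant_pair(dist_a, dist_b, recent_max=0.02, ancient_min=0.20):
    """Rough label for a virus pair already flagged as discordant on a PDC plot.

    'recent':  very close in one region, distant in the other.
    'ancient': distant in both regions (the deviation must be old).
    'unclear': anything in between.
    """
    low, high = sorted((dist_a, dist_b))
    if low < recent_max and high >= ancient_min:
        return "recent"
    if low >= ancient_min:
        return "ancient"
    return "unclear"


def tally(flagged_pairs):
    """Count labels over (dist_a, dist_b) tuples of flagged pairs."""
    return Counter(classify_discordant_pair(a, b) for a, b in flagged_pairs)


print(tally([(0.01, 0.30), (0.25, 0.40), (0.30, 0.01)]))
```

The thresholds are region-dependent in practice; as the text notes, the distances marking ancient events range from about 20% to 40% depending on the genome region.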
There was a remarkable correspondence between routine recombination patterns and host species. MAstVs are generally regarded as switching hosts easily (Roach and Langlois 2021). However, recombination patterns suggest that species and their gene pools are confined to a single host or host group. Many of the reports of recombination between astroviruses from distinct hosts likely correspond to ancient events that predate a host switch (Fig. 5). Some may reflect spillover of recombinant viruses into hosts that are not typical for that virus group. For example, a few viruses of bovine cluster b3 were isolated from dogs and were recombinant relative to other bovine viruses (as all viruses within a species are recombinant relative to each other), but there is no evidence to regard this as recombination between canine and bovine MAstVs. Some recombination events between viruses of distinct host species involve closely related hosts, such as the dog and the crab-eating fox. Excluding these cases, there have been no reliable instances of recent recombination between astroviruses infecting distinct hosts, suggesting that species gene pools are host specific. If we regard the “host” not as a species but as a higher taxonomic unit (for example, the family Canidae or the order Artiodactyla), astroviruses may appear much more host specific than generally perceived, and recombination patterns provide even better evidence for this than virus isolation per se.
Reproductive isolation (lack of routine recombination) can be driven by incompatibility of genome fragments between virus species or by the absence of co-infection due to replication in distinct host species or cell types. Phylogenetic analysis suggests significant host specificity of MAstVs. Therefore, host specificity is an important factor shaping MAstV species, both through the founder effect and by limiting interspecies recombination. On the other hand, one host species (e.g. humans) can harbor several reproductively isolated MAstV groups (virus species). Several groups without evidence of routine recombination between them were also seen among porcine and bovine astroviruses. Therefore, distantly related MAstVs can co-infect a single host, but there are no reliable examples of recombination between established or provisional species infecting the same host. It may be concluded that intra-host barriers limiting the emergence or survival of recombinants of distantly related MAstVs exist, but their mechanisms cannot be inferred from phylogenetic data. It is conceivable that if a MAstV becomes established in a new host species, it will lose access to the gene pool of the original host species and, over time, form a distinct species that would become incompatible with the original species at the genomic, cellular receptor, or any other level. The host range and recombination profile of the established MAstV species are compatible with this hypothesis.
There have been several attempts to find a straightforward species criterion for MAstVs. Currently, viruses that share >75% amino acid sequence identity in ORF2 and infect the same host species are assigned to the same MAstV species (Bosch et al. 2012). Indeed, this criterion distinguished the established species well, but it did not yield comparably robust groups among MAstVs isolated recently, predominantly from cattle, and it suggested 103 distinct astrovirus groups in the current full-genome dataset, which may be overwhelming for taxonomy (Fig. 4). The distribution of pairwise distances in ORF2 (Fig. 2, left panel) did not suggest any single cut-off preferable to the 25% amino acid distance for distinguishing astroviruses that are currently not assigned to species. In ORF1b, however, several sequence distance values could unambiguously distinguish groups of MAstVs (Fig. 2, right panel). A cut-off at 17% nucleotide sequence distance separated the established species and defined several clear-cut groups among the unclassified MAstVs. These groups were generally distinct in all three genome regions and corresponded well to the host species, with one clear exception: groups b3 (bovine) and p3 (porcine), which appeared shuffled by recombination in ORF2 relative to ORF1b. As there were also fewer recombination events within ORF1b, this genome region appears superior to ORF2 for taxonomic and phylogenetic studies (Fig. 3, Figure S3). It is also noteworthy that the 17% nucleotide sequence cut-off in ORF1b identified 38 distinct groups (Fig. 4), which is more convenient for taxonomy than the 103 groups suggested by the 25% amino acid sequence cut-off in ORF2. This criterion works fairly well for the currently known astroviruses. However, considering the genetic and ecological diversity of astroviruses, it may be challenged by genomes discovered in the future.
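The ORF1b criterion amounts to a pairwise distance cut-off applied to aligned sequences. The sketch below shows one way such a cut-off could be turned into groups, assuming that a group is a connected component (single-linkage cluster) of the graph in which two sequences are linked when their uncorrected p-distance is below the threshold. The grouping rule, the strict "below" comparison, the pairwise deletion of gapped sites, the toy sequences and the function names `p_distance` and `group_by_cutoff` are illustrative assumptions, not the exact procedure used in this study. The same functions would apply to an amino acid alignment (e.g. ORF2 with a cut-off of 0.25).

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Assumption: positions with a gap ('-') in either sequence are skipped
    (pairwise deletion); works for nucleotide or amino acid alignments.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            diffs += 1
    if compared == 0:
        raise ValueError("no comparable (ungapped) sites")
    return diffs / compared


def group_by_cutoff(seqs: dict[str, str], cutoff: float) -> list[set[str]]:
    """Hypothetical single-linkage grouping: sequences joined by any chain of
    pairwise distances below `cutoff` end up in the same group."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


if __name__ == "__main__":
    # Toy alignment (not real astrovirus data): two pairs of near-identical sequences.
    toy = {
        "s1": "ACGTACGTACGTACGTACGT",
        "s2": "ACGTACGTACGTACGTACGA",
        "s3": "TTTTTTTTTTTTTTTTTTTT",
        "s4": "TTTTTTTTTTTTTTTTTTTA",
    }
    groups = group_by_cutoff(toy, 0.17)  # 17% cut-off, as for ORF1b nucleotides
    print(sorted(sorted(g) for g in groups))
```

Single linkage is chosen here only because it makes the cut-off rule transparent: a sequence joins a group if it is close enough to any member. Other clustering rules (e.g. average linkage) could split or merge borderline groups differently.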
Taxonomic proposal
Common recombination and dissimilar patterns of divergence in ORF1 and ORF2, as well as occasional dead-end spillover events, suggest that MAstV species should be designated only when a complete or near-complete genome is available for several members of a provisional species (ideally sampled in distinct locations), so that recombination patterns can be taken into account in addition to sequence distances. No single sequence distance threshold can be guaranteed to discriminate all known and future sequences; therefore, a combination of criteria (ORF1b-nt17%, ORF2-aa25%, and recombination) should be considered. If a surrogate species criterion is required, ORF1b provides more consistent results, but may not ensure reliable species identification in all cases. This combination of criteria suggests that MAstV26 and MAstV27 (proposed previously, but not officially accepted) and closely related viruses should form one species (MAstV26), while MAstV32 and the closely related viruses that fell into group p3 in this study should all form one species, MAstV32. In addition, six novel species (MAstV34-MAstV39) could be provisionally proposed based on the three above-mentioned criteria (Fig. 6, green rectangles). This leaves unassigned several more potential novel species represented by only one or two genome sequences in GenBank, as well as viruses of clade b3 and viruses related to MAstV5, which have complex recombination patterns and may be assigned to species in future studies.
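The proposal combines three criteria rather than relying on a single threshold. A hypothetical consistency check for one candidate species might look like the sketch below. It assumes that the ORF1b (17% nt) and ORF2 (25% aa) groupings have already been computed, that detected recombination events are summarised as pairs of sequence names involved in each event, and that "several" genomes is represented by a `min_genomes` parameter with an arbitrary default of three. The function name, input format and default value are illustrative assumptions, not part of the published method.

```python
def assess_candidate_species(
    candidate: set[str],
    orf1b_groups: list[set[str]],
    orf2_groups: list[set[str]],
    recombination_pairs: list[tuple[str, str]],
    min_genomes: int = 3,  # hypothetical stand-in for "several" genomes
) -> dict[str, bool]:
    """Report which criteria support treating `candidate` as one species.

    Illustrative sketch only: each flag is computed independently so that
    conflicting criteria remain visible instead of being averaged away.
    """
    return {
        # Enough complete or near-complete genomes to assess recombination at all.
        "enough_genomes": len(candidate) >= min_genomes,
        # The candidate coincides exactly with one group at the ORF1b 17% nt cut-off.
        "orf1b_nt17": candidate in orf1b_groups,
        # The candidate coincides exactly with one group at the ORF2 25% aa cut-off.
        "orf2_aa25": candidate in orf2_groups,
        # Reproductive isolation: no event links a member to a non-member.
        "no_cross_recombination": all(
            (a in candidate) == (b in candidate) for a, b in recombination_pairs
        ),
    }


if __name__ == "__main__":
    # Toy inputs (hypothetical sequence names, not real GenBank records).
    orf1b = [{"a", "b", "c"}, {"d", "e"}]
    orf2 = [{"a", "b", "c"}, {"d"}, {"e"}]
    events = [("a", "b"), ("b", "c"), ("d", "e")]
    for cand in ({"a", "b", "c"}, {"d", "e"}):
        report = assess_candidate_species(cand, orf1b, orf2, events)
        print(sorted(cand), report, "all criteria met:", all(report.values()))
```

A candidate for which every flag is true is consistent with all three criteria; mixed results would flag the group for closer inspection rather than settle its status.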
Conclusion
Recombination patterns in the genus Mamastrovirus are generally similar to those in other related RNA virus families, such as Picornaviridae and Caliciviridae: a recombination hotspot between structural and nonstructural genes, reproductively isolated gene pools that correspond to putative species, and routine recombination within these gene pools that augments their phylogenetic distinction. However, in mammalian astroviruses, recombination and evolution patterns are not as uniform and precise as in many other RNA viruses. There was not only a recombination hotspot between the genome regions encoding structural and nonstructural proteins, but also notable recombination throughout the genome. Virus species are generally host-specific, but a “host” should be defined more broadly than for other viruses, and there were many isolated spillover events. Nevertheless, routine recombination suggests a functional criterion to distinguish Mamastrovirus species, which are shaped by a combination of routine recombination, host specificity and, possibly, mechanistic incompatibility or ecological isolation. ORF1b offers a convenient species threshold of 17% nucleotide sequence distance, which can complement other criteria for species assignment.
Supplementary data
Supplementary data is available at VEVOLU online.
Conflict of interest:
None declared.
Funding
This study was funded by Russian Science Foundation grant number 22-15-00230.
Data availability
The data presented in this study (sequence names and metadata, alignments, phylogenetic trees) are openly available at https://github.com/v-julia/MAV_recombination/data (accessed on 5 December 2024).