-
PDF
- Split View
-
Views
-
Cite
Cite
Alina A Mikhailova, Elias Dohmen, Mark C Harrison, Major changes in domain arrangements are associated with the evolution of termites, Journal of Evolutionary Biology, Volume 37, Issue 7, July 2024, Pages 758–769, https://doi.org/10.1093/jeb/voae047
- Share Icon Share
Abstract
Domains as functional protein units and their rearrangements along the phylogeny can shed light on the functional changes of proteomes associated with the evolution of complex traits like eusociality. This complex trait is associated with sterile soldiers and workers, and long-lived, highly fecund reproductives. Unlike in Hymenoptera (ants, bees, and wasps), the evolution of eusociality within Blattodea, where termites evolved from within cockroaches, was accompanied by a reduction in proteome size, raising the question of whether functional novelty was achieved with existing rather than novel proteins. To address this, we investigated the role of domain rearrangements during the evolution of termite eusociality. Analysing domain rearrangements in the proteomes of three solitary cockroaches and five eusocial termites, we inferred more than 5,000 rearrangements over the phylogeny of Blattodea. The 90 novel domain arrangements that emerged at the origin of termites were enriched for several functions related to longevity, such as protein homeostasis, DNA repair, mitochondrial activity, and nutrient sensing. Many domain rearrangements were related to changes in developmental pathways, important for the emergence of novel castes. Along with the elaboration of social complexity, including permanently sterile workers and larger, foraging colonies, we found 110 further domain arrangements with functions related to protein glycosylation and ion transport. We found an enrichment of caste-biased expression and splicing within rearranged genes, highlighting their importance for the evolution of castes. Furthermore, we found increased levels of DNA methylation among rearranged compared to non-rearranged genes suggesting fundamental differences in their regulation. Our findings indicate the importance of domain rearrangements in the generation of functional novelty necessary for termite eusociality to evolve.

Introduction
Protein domains are the functional and structural units of proteins (Dohmen et al., 2020; Forslund & Sonnhammer, 2012). Domains can be gained by a protein, or the relative positions of existing domains to each other within a domain arrangement or domain architecture (DA) can be rearranged, allowing a protein to acquire novel functions (Moore et al., 2008). These rearrangements can occur via different mechanisms, including gene fusion, exon shuffling, recombination, and gene duplication, and are important for the emergence of various traits (Buljan et al., 2010; Forslund et al., 2019; Patthy, 2003; Moore et al., 2008). Particularly, the evolution of multicellularity, one of the major evolutionary transitions (Szathmáry & Smith, 1995), was accompanied by an increase in domain rearrangements (Ekman et al., 2007). This is demonstrated, e.g., in extracellular communication, extracellular matrix formation, and intracellular signalling pathways, in which novel multi-domain proteins played an important evolutionary role (Patthy, 1985, 2003; Pawson, 1995), as well as in the evolution of blood coagulation cascade proteins in vertebrates (Coban et al., 2022).
Not only the reuse of existing domains can be a source of evolutionary innovation, but also the emergence of novel domains can lead to the evolution of new traits. It was shown that the rapid emergence of animal-specific domains contributed to the functional diversification in animals as well as the evolution of tyrosine phosphorylation systems and coagulation cascades (Itoh et al., 2007). Also, a high rate of domain emergences was previously observed at the root of mammals and was potentially linked to the emergence of hair and the evolution of the mammalian immune system (Dohmen et al., 2020). Analysing types and rates of domain rearrangements and emergences along a phylogeny can help us to detect molecular events leading to evolutionary innovation. For example, in arthropods 40,000 domain rearrangements could be identified, contributing to the enormous diversity that exists especially within insects (Thomas et al., 2020).
The role of domain rearrangements in the next major transition, from multicellular individuals to eusocial colonies (Szathmáry & Smith, 1995), has so far received little attention. This complex trait, which is characterized by cooperative brood care, overlapping generations and division of labour (Wilson & Hölldobler, 2005), has emerged in such divergent orders as Coleoptera (ambrosia beetles), Hymenoptera (bees, wasps, and ants), and Blattodea, where termites have evolved within the cockroaches (Crespi, 1996; Inward et al., 2007; Kent & Simpson, 1992; Lo et al., 2007; Sherman et al., 2017). It has been hypothesized that, while mainly regulatory changes are expected during the early stages of social evolution, greater functional changes occur with the evolution of greater social complexity along with the emergence of distinct castes (Rehan & Toth, 2015). These expectations have been largely confirmed by comparative genomic analyses on the transition to eusociality in bees (Kapheim et al., 2015; Shell et al., 2021), ants (Simola et al., 2013), and termites (Harrison et al., 2018) such as increased transcriptional regulation and evolution of gene families involved in chemoperception. The potential role of domain evolution in this major transition has been highlighted by novel DAs being found at the node representing one origin of eusociality in bees, affecting proteins with functions related to embryonic morphogenesis, cell growth, and insulin signalling (Dohmen et al., 2020).
Termites represent an interesting clade for investigating the molecular evolution of eusociality, since extant species display varying levels of social complexity, allowing the inference and mapping of various social traits to the phylogeny (Korb & Thorne, 2017). Furthermore, as with most eusocial taxa, reproductive individuals display extreme lifespans despite high fertility. In the higher termites, of the genus Macrotermes, e.g., queens can live for over 20 years (Elsner et al., 2018) while producing thousands of eggs per day (Kaib et al., 2001). This convergent decoupling of the longevity/fecundity trade-off in eusocial organisms has been linked to an upregulation of members of the insulin/insulin-like growth factor (IGF-1) signalling (IIS) pathway, as well as increased expression of mitochondrial and DNA repair functions (Monroy Kuhn et al., 2019; Séité et al., 2022). Sterile soldiers emerged once early in the evolution of termites and are present in all extant lineages except for a few Termitidae genera where soldiers were secondarily lost (Noirot & Pasteels, 1987). The origin of the sterile worker caste that consists of morphologically specialized individuals, which diverge early from the imaginal (reproductive) line, is considered polyphyletic and correlates with nesting and feeding behaviour (Legendre et al., 2008; Noirot & Pasteels, 1987; Noirot, 1988; Roisin & Korb, 2011). Wood-dwelling termites live in small, non-foraging colonies with totipotent workers, which retain the potential to reach sexual maturity (Korb & Hartfelder, 2008). In the larger, foraging colonies of Mastotermitidae, Hodotermitidae, and Termitidae, as well as most Rhinotermitidae species, on the other hand, development follows an early bifurcation leading to the production of true workers without the ability to reach sexual maturation (Abe, 1990; Pull & McMahon, 2020; Roisin & Korb, 2011). The ability of workers to switch from the sterile, apterous to the nymphal-alate line to obtain a certain level of fertility remains to varying degrees in these groups but is generally rare (Roisin & Korb, 2011). In contrast to findings for the evolution of eusociality in Hymenoptera (Kapheim et al., 2015; Shell et al., 2021; Simola et al., 2013), a comparative genomic study into the origins of blattodean eusociality found the unexpected pattern of reductions in proteome size due to gene family contractions in termites compared to non-social outgroups (Harrison et al., 2018). We hypothesize, therefore, that changes in existing proteins, e.g., via domain rearrangements, may have played a role in the emergence and elaboration of eusociality in termites.
In this study, we tested this hypothesis by analysing the domain content of available genomes from eight blattodean species with varying levels of social complexity. Besides three non-social cockroaches (Blattella germanica, Periplaneta americana, and Diploptera punctata), we analysed two lower, wood-dwelling termites (Zootermopsis nevadensis and Cryptotermes secundus) that form small colonies with totipotent workers consisting of around 900 (Abe, 1990) and 300–400 (Korb & Lenz, 2004) individuals, respectively, in which reproductives live up to around 7 years (Monroy Kuhn et al., 2019; Thorne et al., 2002). To represent termites with higher social complexity, we included three foraging termite species with sterile (“true”) workers (Coptotermes formosanus, Reticulitermes speratus, and Macrotermes natalensis) that form large, complex colonies containing up to 50,000 (C. formosanus) and 200,000 individuals (Abe, 1990; Meyer et al., 2000), in which reproductives have even longer lifespans of over 10 years (Keller, 1998; Monroy Kuhn et al., 2019). We inferred domain rearrangements that occurred at the origin of termites and at the evolution of sterile workers to understand changes in protein content related to the emergence and elaboration of termite eusociality.
Materials and methods
Domain rearrangements calculation
We used previously published proteomes of three cockroaches (B. germanica, D. punctata, P. americana), five termites (Z. nevadensis, C. secundus, R. speratus, C. formosanus, M. natalensis), and two outgroup species (Eriosoma lanigerum, Frankliniella occidentalis) for this study (Biello et al., 2021; Fouks et al., 2022; Harrison et al., 2018; Itakura et al., 2020; Li et al., 2018; Poulsen et al., 2014; Rotenberg et al., 2020; Shigenobu et al., 2022; Terrapon et al., 2014). All proteomes were filtered for possible pseudogenes and only the longest isoforms were kept, using DW-Helper scripts (https://domain-world.zivgitlabpages.uni-muenster.de/dw-helper/index.html). Filtered proteomes were then annotated with Pfam domains in version 30.0 (Mistry et al., 2021) with PfamScan v1.6 (Finn et al., 2016). Completeness of the annotated proteomes was assessed with DOGMA (Dohmen et al., 2016; Kemena et al., 2019) (Supplementary Table 1). A species tree was constructed with OrthoFinder using single-copy orthologs (Emms & Kelly, 2019). DomRates was run on the species tree and annotated proteomes to reconstruct domain rearrangements along the phylogeny (Dohmen et al., 2020). Only exact solutions of DomRates were used in the downstream analyses.
Gene Ontology enrichment analysis
To infer functional adaptation along the evolution of termites, we performed Gene Ontology (GO)-term enrichment analyses of (a) DAs and (b) genes containing rearranged DAs. For DAs, we used pfam2go to map GO terms to the corresponding domains in the DAs (Mitchell et al., 2015). The GO universe for this analysis contained all domain rearrangements in Blattodea, and we took domain rearrangements on each inner node of the Blattodea phylogeny as the test set.
To analyse the functions of genes with rearranged domains, we extracted genes containing DAs from the DomRates output for the nodes of the origin of termites (Z. nevadensis, C. secundus, R. speratus, C. formosanus, and M. natalensis species) and the origin of true workers (R. speratus, C. formosanus, and M. natalensis species). For each species, we looked at the enriched GO terms in the genes with domains that were rearranged in one of the two origins. Additionally, we performed GO enrichment analyses for genes rearranged at cockroach nodes (nodes 3 and 6 from Supplementary Figure 1)
Both GO-term enrichment analyses were performed with the package topGO in R using the classic and weight algorithms (Alexa et al., 2010). Significantly enriched GO terms (p-value 0.05) were grouped based on the semantic similarity using Wang’s graph-based method from the R package GOSemSim (Yu et al., 2010) and visualized using the library ggplot2 for heatmaps (Wickham, 2016) and ggdendro for the dendrogram of GO terms (de Vries & Ripley, 2022).
Differential gene expression analyses
To look into the enrichment of caste-biased genes among genes rearranged at the origin of termites, the information on caste-biased gene and exon expression was extracted from the output produced in a previous study for Z. nevadensis, C. secundus, and M. natalensis (Harrison et al., 2018). For R. speratus, we used expression data from a previous study (Shigenobu et al., 2022) and ran differential gene expression analysis using the R package DESeq2 (Love et al., 2014). The proportions of caste-biased genes among non-rearranged and rearranged genes were compared using chisq.test function in R. Length and domain content of caste-biased and rearranged genes were visualized with the R package ggstatsplot (Patil, 2021).
Phylostratigraphy analysis
The gene age of non-rearranged and rearranged genes was assessed using phylostratigraphy analysis (Gabel, 2021). The results were split into the following categories based on the evolutionary origins of genes: Cellular organisms, Eukaryota, Arthropoda, Neoptera, Dictyoptera, Termitoidea (Isoptera), and Species-specific. The proportions of these categories within rearranged and non-rearranged genes were visualized with ggplot2 and ggsci R packages (Wickham, 2016; Xiao, 2023). We tested the effect of gene age (all categories together and each one separately; Additional File 1—Supplementary Table S1) on the rearrangement probability using logistic regression as described in the next section.
Rearrangement probability model
To calculate the probability of a gene to be rearranged predicted by gene age, caste-biased gene expression, gene length, and domain content, we performed a logistic regression using function glm(formula, family = “binomial”) in R. Since no interactions between the predictor variables were significant, only additive models were used (Additional File 1—Supplementary Table S2).
CpGo/e and methylation analyses
To estimate the methylation level of DAs, the CpGo/e was calculated by comparing observed to expected CpG counts per domain where the expected values are the product of cytosine and guanine fractions. We calculated the weighted mean of CpGo/e per DA where the weights were the fractions of CG pairs. To test whether CpGo/e of domains rearranged in the origin of termites is lower than CpGo/e of non-rearranged domains, we performed a one-sided Wilcoxon test in R.
To test whether lower CpGo/e is associated with rearrangements, we extracted the ancestral arrangements that underwent changes in the origin of termites from the DomRates output. We excluded fusions because some single domains that were fused in the origin of termites were over-represented, e.g., zinc finger domain (PF00096). Next, we calculated CpGo/e for all DAs as previously described and compared DAs that were changed in the origin of termites to the rest using one-sided Wilcoxon test in R.
Moreover, we used methylation data for M. natalensis from a previous study (Harrison et al., 2022) to test whether methylation is indeed higher in rearranged genes. We computed average methylation values over gene regions (flank, cds, intron) and replicates for each gene and compared between genes rearranged in both origins of termites and true workers and non-rearranged genes using one-sided Wilcoxon test in R.
Transposable elements in close proximity
Data on TE counts per gene were taken from a previous study for Z. nevadensis, C. secundus, M. natalensis, and B. germanica, which included proportions of 10kb flanking regions (3’- and 5’- of genes) covered by TEs (Harrison et al., 2018).
We compared the levels of TE content in flanking regions for each species between rearranged and non-rearranged genes. For M. natalensis, rearranged genes included genes rearranged at the origins of termites and true workers.
Results
Over 5,000 domain rearrangements in Blattodea
Using proteomes of three non-social cockroaches, two termites with totipotent workers, three termites with true workers, and two outgroup species (Eriosoma lanigerum, woolly apple aphid, and Frankliniella occidentalis, western flower thrips), we reconstructed protein domain rearrangement events along the Blattodea phylogeny. For this, we annotated protein sequences with PFAM domains, version 30.0 (Mistry et al., 2021), and used DomRates (Dohmen et al., 2020) to infer DAs of all ancestral nodes based on those of the analysed species, thus inferring all rearrangements at each node of the tree. Overall, we observed more than 5,000 rearrangements in Blattodea, most of which were fusions of two ancestral DAs (44%), while the fission of one ancestral DA into two new DAs accounts for 13% of rearrangements. Moreover, we observed 22% and 18% of single domain and terminal domain losses, respectively. The former indicates this domain no longer exists in the whole proteome, while the latter refers to the loss of a terminal domain from a multi-domain protein. For further details on DA types, we refer to the original DomRates publication (Dohmen et al., 2020). Novel domain emergences correspond to 3% of all events (Figure 1, Supplementary Table 2). As proteome completeness scores, based on the existence of conserved DAs measured with DOGMA (Dohmen et al., 2016), varied among the analysed species (Supplementary Figure S1), we did not consider domain losses in subsequent analyses as they could be due to incompleteness of the analysed genome assemblies or annotations, as previously reported (Thomas et al., 2020). To check for a systematic bias in DOGMA completeness scores across our groups of tested species (all termites, termites with true workers, and non-social outgroups), we performed an ANOVA. The test was non-significant, indicating no effect of variation in completeness scores (termites vs. the rest: p-value = 0.106; species with true workers vs. the rest: p-value = 0.397).

Domain rearrangements in Blattodea. For each node, the pie charts represent the fractions of different types of rearrangements shown above, and the grey numbers to the right of each pie chart are the total amount of rearrangements. The nodes of the origin of termites and the origin of true workers are indicated with the red (node 7) and blue (node 11) squares, respectively. Black circles contain node numbers.
Domains related to longevity, nutrition, and development rearranged at the root of termite eusociality
We were particularly interested in functional changes that occurred at two nodes: (a) the origin of termites where eusociality emerged (Figure 1, red box) and (b) the origin of true workers associated with various novel traits such as foraging behaviour, differences in development, larger colonies, and extended lifespan of reproductive castes (Figure 1, blue box). At the origin of termite eusociality, we found 90 novel domain arrangements, including 52 fusions, 10 fissions, and 4 and 3 single and terminal domain emergences, respectively. These seven “novel” domains exist elsewhere in the tree of life but were inferred here as “emerged” at the root of termites, since they were not found in any of the five analysed non-social species (three cockroaches, the aphid E. lanigerum, and the thrips F. occidentalis). These seven termite-specific domains are associated with functions related to protein homeostasis (PF05025, PF01814, and PF07297), mitochondrial functions (PF10642), DNA repair (PF06331), and insulin production (PF11548) (Table 1). A further domain (PF11600) is found in the chromatin assembly factor 1 complex. These domains differ considerably from the diverse domains that emerged at the internal cockroach nodes (Supplementary Table S3).
PFAM IDs and descriptions of domains classed as emergeda at the origin of termites and the origin of true workers in relation to other analysed species.
Node . | PFAM ID . | PFAM description . | DA and species . | Gene and function . |
---|---|---|---|---|
Termites | PF05025 | RbsD/FucU transport protein family | Single domain; Zn, Cs, Rs, Cf | Fucose mutarotase: synthesis of N-glycans |
Termites | PF06331 | Transcription factor TFIIH complex subunit Tfb5 | Single domain; Zn, Cs, Rs, Mn | GTF2H5; DNA repair |
Termites | PF07297 | Dolichol phosphate-mannose biosynthesis regulatory protein (DPM2) | Single domain; Zn, Csx2, Rs, Mn | DPM2: protein glycosylation |
Termites | PF11548 | Protein-tyrosine phosphatase receptor IA-2 | +PF00102 (Y_phosphatase); Zn, Cs, Rs | PTPR2: insulin-like peptide secretion |
Termites | PF10642 | TOM5—Mitochondrial import receptor subunit or translocase | Single domain; Zn, Cs | TOM5-like protein: Transport of mitochondrial proteins |
Termites | PF01814 | Hemerythrin HHE cation binding domain | +PF12937 (F-box-like); Zn, Cs, Rs | FBXL5: ubiquitination |
Termites | PF11600 | Chromatin assembly factor 1 complex p150 subunit, N-terminal | +PF12253 (CAF1A); Zn, Cf, Rs, Mn | CHAF1A: DNA replication & repair |
True workers | PF14990 | Domain of unknown function (DUF4516) | Single domain; Rs, Mn | BRAWNIN: mitochondrial respiratory complex III |
Node . | PFAM ID . | PFAM description . | DA and species . | Gene and function . |
---|---|---|---|---|
Termites | PF05025 | RbsD/FucU transport protein family | Single domain; Zn, Cs, Rs, Cf | Fucose mutarotase: synthesis of N-glycans |
Termites | PF06331 | Transcription factor TFIIH complex subunit Tfb5 | Single domain; Zn, Cs, Rs, Mn | GTF2H5; DNA repair |
Termites | PF07297 | Dolichol phosphate-mannose biosynthesis regulatory protein (DPM2) | Single domain; Zn, Csx2, Rs, Mn | DPM2: protein glycosylation |
Termites | PF11548 | Protein-tyrosine phosphatase receptor IA-2 | +PF00102 (Y_phosphatase); Zn, Cs, Rs | PTPR2: insulin-like peptide secretion |
Termites | PF10642 | TOM5—Mitochondrial import receptor subunit or translocase | Single domain; Zn, Cs | TOM5-like protein: Transport of mitochondrial proteins |
Termites | PF01814 | Hemerythrin HHE cation binding domain | +PF12937 (F-box-like); Zn, Cs, Rs | FBXL5: ubiquitination |
Termites | PF11600 | Chromatin assembly factor 1 complex p150 subunit, N-terminal | +PF12253 (CAF1A); Zn, Cf, Rs, Mn | CHAF1A: DNA replication & repair |
True workers | PF14990 | Domain of unknown function (DUF4516) | Single domain; Rs, Mn | BRAWNIN: mitochondrial respiratory complex III |
Note. DA = domain architecture.
aThese domains are found elsewhere in the tree of life but were not found among the analysed species outside the indicated node.
Gene names derived via blastp to nr. Functions inferred via uniprot.org.
PFAM IDs and descriptions of domains classed as emergeda at the origin of termites and the origin of true workers in relation to other analysed species.
Node . | PFAM ID . | PFAM description . | DA and species . | Gene and function . |
---|---|---|---|---|
Termites | PF05025 | RbsD/FucU transport protein family | Single domain; Zn, Cs, Rs, Cf | Fucose mutarotase: synthesis of N-glycans |
Termites | PF06331 | Transcription factor TFIIH complex subunit Tfb5 | Single domain; Zn, Cs, Rs, Mn | GTF2H5; DNA repair |
Termites | PF07297 | Dolichol phosphate-mannose biosynthesis regulatory protein (DPM2) | Single domain; Zn, Csx2, Rs, Mn | DPM2: protein glycosylation |
Termites | PF11548 | Protein-tyrosine phosphatase receptor IA-2 | +PF00102 (Y_phosphatase); Zn, Cs, Rs | PTPR2: insulin-like peptide secretion |
Termites | PF10642 | TOM5—Mitochondrial import receptor subunit or translocase | Single domain; Zn, Cs | TOM5-like protein: Transport of mitochondrial proteins |
Termites | PF01814 | Hemerythrin HHE cation binding domain | +PF12937 (F-box-like); Zn, Cs, Rs | FBXL5: ubiquitination |
Termites | PF11600 | Chromatin assembly factor 1 complex p150 subunit, N-terminal | +PF12253 (CAF1A); Zn, Cf, Rs, Mn | CHAF1A: DNA replication & repair |
True workers | PF14990 | Domain of unknown function (DUF4516) | Single domain; Rs, Mn | BRAWNIN: mitochondrial respiratory complex III |
Node . | PFAM ID . | PFAM description . | DA and species . | Gene and function . |
---|---|---|---|---|
Termites | PF05025 | RbsD/FucU transport protein family | Single domain; Zn, Cs, Rs, Cf | Fucose mutarotase: synthesis of N-glycans |
Termites | PF06331 | Transcription factor TFIIH complex subunit Tfb5 | Single domain; Zn, Cs, Rs, Mn | GTF2H5; DNA repair |
Termites | PF07297 | Dolichol phosphate-mannose biosynthesis regulatory protein (DPM2) | Single domain; Zn, Csx2, Rs, Mn | DPM2: protein glycosylation |
Termites | PF11548 | Protein-tyrosine phosphatase receptor IA-2 | +PF00102 (Y_phosphatase); Zn, Cs, Rs | PTPR2: insulin-like peptide secretion |
Termites | PF10642 | TOM5—Mitochondrial import receptor subunit or translocase | Single domain; Zn, Cs | TOM5-like protein: Transport of mitochondrial proteins |
Termites | PF01814 | Hemerythrin HHE cation binding domain | +PF12937 (F-box-like); Zn, Cs, Rs | FBXL5: ubiquitination |
Termites | PF11600 | Chromatin assembly factor 1 complex p150 subunit, N-terminal | +PF12253 (CAF1A); Zn, Cf, Rs, Mn | CHAF1A: DNA replication & repair |
True workers | PF14990 | Domain of unknown function (DUF4516) | Single domain; Rs, Mn | BRAWNIN: mitochondrial respiratory complex III |
Note. DA = domain architecture.
aThese domains are found elsewhere in the tree of life but were not found among the analysed species outside the indicated node.
Gene names derived via blastp to nr. Functions inferred via uniprot.org.
To understand the functions that were affected by domain rearrangements, we performed two types of GO term enrichment analyses. First, at all of the internal nodes of the phylogeny shown in Figure 1, we compared the domains, and their associated GO terms, involved in novel domain rearrangements to all DAs contained in the full species set. At the origin of termite eusociality (Figure 1, red box), we found an enrichment of various GO terms related to development and metamorphosis, such as the regulation of embryonic development, cell migration and cell adhesion, as well as cilium assembly and movement (Figure 2). Post-translational modifications of proteins also appear to be important for the evolution of termites, with protein phosphorylation and dolichol metabolism, involved in glycosylation (Carroll et al., 1992), both enriched at the root of termites. This is further supported by similar enriched GO terms at other internal termite nodes but a lack thereof in cockroach nodes (Figure 2). Finally, in the two lower termites, Z. nevadensis and C. secundus, we analysed the functional enrichment of genes containing DAs that were rearranged at the origin of termites. We found common GO terms related to intracellular protein transport and localization (Figure 3). Several GO terms involved in DNA repair mechanisms as well as stress response were found in C. secundus. In cockroaches, on the other hand, mainly GO terms related to transcription and translation were enriched among genes that were rearranged at internal cockroach nodes (Figure 3).

Enriched Gene Ontology (GO) terms of rearranged domains on different nodes (node numbers from Figure 1). Enriched terms are coloured by p-value and GO terms are clustered by semantic similarity using Wang’s graph-strategy (Yu et al., 2010).

Enriched Gene Ontology (GO) terms among rearranged genes in termites and cockroaches. For bger and dpun, genes were considered that were rearranged at the ancestral node of Blattella germanica and Diploptera punctata (node 3 in Figure 1), and for pame genes rearranged at the ancestor of Periplaneta americana and termites (node 6). In znev and csec, we show results for gene rearranged at the root of termites (red node, left dot), and in cfor, rspe, and mnat, genes rearranged at the emergence of true workers are shown (blue node, right dot). Enriched terms are coloured by p-value, and GO terms are clustered by semantic similarity using Wang’s graph-strategy (Yu et al., 2010).
Rearrangements enriched for protein glycosylation at the origin of true workers
At the origin of true workers (Figure 1, blue box), we inferred 110 new DAs, including 48 fusions, 18 fissions, and one single domain emergence. Within the 10 analysed species, this “novel” domain (PF14990) was found only in two of the three species with true workers and is described as a domain of unknown function (DUF4516; Table 1). A protein containing this domain was found both in M. natalensis (Mnat_01442) and R. speratus (RS000595), both of which showed, via Blastp (Camacho et al., 2009), sequence similarity to the protein BRAWNIN with up to 68.75% and 66.67% identity.
The domains that were rearranged at the root of true workers (Figure 1, blue box) were enriched for a single GO term—”vesicle fusion with Golgi apparatus” (Figure 2). The related domain, PF04869 (Uso1_p115_head), and the corresponding protein, Golgi matrix protein p115, lost a terminal domain (PF04871, Uso1_p115_C) in C. formosanus and M. natalensis compared to lower termites and cockroaches. We analysed the proteins that were rearranged at the origin of workers in R. speratus, C. formosanus, and M. natalensis (Figure 3). The majority of enriched GO terms were related to protein glycosylation, or protein and ion transport in the termites with true workers, while we also found an enrichment of transposition in M. natalensis. As with genes rearranged at the origin of termites, we found very little overlap with GO terms that were enriched among genes rearranged within cockroach nodes.
Genes with domain rearrangements are often caste-biased, alternatively spliced, and longer than non-rearranged genes
We hypothesized that genes, in which domains were rearranged at the root of termite eusociality or at the emergence of true workers, were important for the evolution of castes. An increased importance of these rearranged genes for particular castes should therefore be reflected in caste-biased expression. To test this hypothesis, we compared the proportions of caste-biased genes within rearranged and non-rearranged genes for the four species, for which caste-specific expression data were available. Confirming our expectations, we observed, with a -squared test, a significant enrichment of caste-biased genes within rearranged genes for C. secundus (44% of genes with rearrangements are caste biased vs. 26% of genes without rearrangements, p-value 0.0001) and M. natalensis (75% of genes with rearrangements vs. 39%, p-value 0.00001). However, in Z. nevadensis (60% vs. 56%, p-value = 0.519) and R. flavipes (11% vs. 10%, p-value = 0.998), we found no significant difference.
Alternative splicing offers a further mechanism for gaining caste-specific protein functions (Harrison et al., 2022; Price et al., 2018). To test whether genes with domain rearrangements also have an increased rate of caste-biased alternative splicing, we used data on differential exon expression for M. natalensis from a previous study (Harrison et al., 2022). We observed an enrichment of genes alternatively spliced in any caste within the subset of rearranged genes compared to non-rearranged genes (Figure 4). The same pattern was observed separately for genes alternatively spliced in kings, alate queens, and workers (Supplementary Figure 7a–c). In mature queens, the proportion was also higher in rearranged genes but not significantly, likely due to the overall low number of alternatively spliced genes in mature queens (Supplementary Figure 7d).

The proportions of alternatively spliced (AS) genes within rearranged and non-rearranged genes at the origins of termites and true workers in Macrotermes natalensis.
Rearranged genes tended to be older with a greater proportion found in Eukaryota, but the observed differences were not significant (Supplementary Figure 6, Additional File 1—Supplementary Table S1). We found rearranged genes to be significantly longer (Supplementary Figure 2) and contained significantly more domains than non-rearranged genes (Supplementary Figure 4). However, we also found caste-biased genes to be longer (Supplementary Figure 3) and to contain more domains (Supplementary Figure 5) than non-biased genes. To determine the relative importance of caste-biased expression, gene length, gene age, and number of domains on their propensity to be rearranged, we carried out a generalized linear model (GLM, Additional File 1—Supplementary Table S2). We ran the GLM on Z. nevadensis, C. secundus, R. speratus, and M. natalensis for genes rearranged at the origin of termites. For M. natalensis, we also included alternative splicing as an explanatory variable. We did not run a GLM for C. formosanus as we had no access to differential expression results. Gene length had a significant effect on the proportion of genes rearranged at the origin of termites in Z. nevadensis (p = 0.0498) and R. speratus (p = 0.0187). In M. natalensis, on the other hand, caste-biased expression was the only significant explanatory variable (p = 0.0243). All other variables had no significant effect. For R. speratus and M. natalensis, we repeated the GLM for genes rearranged at the origin of true workers. In this case, gene length had a significant positive effect on the proportion of rearranged genes in both species (p = 4.3 x 104 and 0.0180, respectively). In R. speratus, number of domains and caste-biased expression, on the other hand, both had a negative effects on gene rearrangements (p = 0.0499 and 0.0454, respectively). In M. natalensis, alternative splicing had an additional positive effect (p = 0.0461).
Genes with domain rearrangements have higher estimated methylation level
Because genes with domain rearrangements show different patterns of expression compared to non-rearranged genes, we expected to find differences in the regulation of their expression. To test this we compared levels of estimated methylation in genes with and without domain rearrangements in cockroaches and termites. For this, we compared observed vs. expected CpG counts, where depletion is known to correlate with higher levels of DNA methylation (Park et al., 2011). Confirming expectations, we observed lower CpGo/e values for rearranged compared to non-rearranged genes in all analysed blattodean species, significantly in all except R. speratus (Figure 5, Table 2). This difference in CpGo/e indicates higher DNA methylation in rearranged genes.

Distribution of CpGo/e per gene within rearranged and non-rearranged genes. Dashed lines represent means.
Species | Group | Number of RA | Number of non-RA | Mean CpG o/e for RA | Mean CpG o/e for non-RA | p-value |
Zootermopsis nevadensis | Termites | 105 | 9,751 | 0.508 | 0.579 | 0.014 |
Cryptotermes secundus | Termites | 99 | 12,311 | 0.448 | 0.535 | <0.001 |
Reticulitermes speratus | Termites w. TW | 77 | 10,439 | 0.512 | 0.538 | 0.250 |
Coptotermes formosanus | Termites w. TW | 54 | 9,779 | 0.446 | 0.536 | 0.011 |
Macrotermes natalensis | Termites w. TW | 52 | 9,295 | 0.440 | 0.568 | 0.002 |
Blattella germanica | Cockroaches | 301 | 11,126 | 0.538 | 0.604 | <0.001 |
Diploptera punctata | Cockroaches | 386 | 13,433 | 0.480 | 0.543 | <0.001 |
Periplaneta americana | Cockroaches | 152 | 14,693 | 0.553 | 0.656 | <0.001 |
Species | Group | Number of RA | Number of non-RA | Mean CpG o/e for RA | Mean CpG o/e for non-RA | p-value |
Zootermopsis nevadensis | Termites | 105 | 9,751 | 0.508 | 0.579 | 0.014 |
Cryptotermes secundus | Termites | 99 | 12,311 | 0.448 | 0.535 | <0.001 |
Reticulitermes speratus | Termites w. TW | 77 | 10,439 | 0.512 | 0.538 | 0.250 |
Coptotermes formosanus | Termites w. TW | 54 | 9,779 | 0.446 | 0.536 | 0.011 |
Macrotermes natalensis | Termites w. TW | 52 | 9,295 | 0.440 | 0.568 | 0.002 |
Blattella germanica | Cockroaches | 301 | 11,126 | 0.538 | 0.604 | <0.001 |
Diploptera punctata | Cockroaches | 386 | 13,433 | 0.480 | 0.543 | <0.001 |
Periplaneta americana | Cockroaches | 152 | 14,693 | 0.553 | 0.656 | <0.001 |
Note. w. TW = with true workers. p-Value for one-sided Wilcoxon test.
Species | Group | Number of RA | Number of non-RA | Mean CpG o/e for RA | Mean CpG o/e for non-RA | p-value |
Zootermopsis nevadensis | Termites | 105 | 9,751 | 0.508 | 0.579 | 0.014 |
Cryptotermes secundus | Termites | 99 | 12,311 | 0.448 | 0.535 | <0.001 |
Reticulitermes speratus | Termites w. TW | 77 | 10,439 | 0.512 | 0.538 | 0.250 |
Coptotermes formosanus | Termites w. TW | 54 | 9,779 | 0.446 | 0.536 | 0.011 |
Macrotermes natalensis | Termites w. TW | 52 | 9,295 | 0.440 | 0.568 | 0.002 |
Blattella germanica | Cockroaches | 301 | 11,126 | 0.538 | 0.604 | <0.001 |
Diploptera punctata | Cockroaches | 386 | 13,433 | 0.480 | 0.543 | <0.001 |
Periplaneta americana | Cockroaches | 152 | 14,693 | 0.553 | 0.656 | <0.001 |
Species | Group | Number of RA | Number of non-RA | Mean CpG o/e for RA | Mean CpG o/e for non-RA | p-value |
Zootermopsis nevadensis | Termites | 105 | 9,751 | 0.508 | 0.579 | 0.014 |
Cryptotermes secundus | Termites | 99 | 12,311 | 0.448 | 0.535 | <0.001 |
Reticulitermes speratus | Termites w. TW | 77 | 10,439 | 0.512 | 0.538 | 0.250 |
Coptotermes formosanus | Termites w. TW | 54 | 9,779 | 0.446 | 0.536 | 0.011 |
Macrotermes natalensis | Termites w. TW | 52 | 9,295 | 0.440 | 0.568 | 0.002 |
Blattella germanica | Cockroaches | 301 | 11,126 | 0.538 | 0.604 | <0.001 |
Diploptera punctata | Cockroaches | 386 | 13,433 | 0.480 | 0.543 | <0.001 |
Periplaneta americana | Cockroaches | 152 | 14,693 | 0.553 | 0.656 | <0.001 |
Note. w. TW = with true workers. p-Value for one-sided Wilcoxon test.
To further test the causal link between the lower CpGo/e levels and domain rearrangements, we analysed genes with the ancestral DAs in cockroaches that underwent evolutionary changes in termites. In contrast to rearranged genes in termites, cockroach genes carrying the ancestral DAs did not significantly differ from other genes in terms of CpGo/e for all three species of cockroach (Supplementary Table 13). To confirm the ability of CpGo/e to predict relative DNA methylation levels in rearranged genes, we analysed existing bisulfite sequencing data for M. natalensis (Harrison et al., 2022). Our estimations were confirmed. The rearranged genes showed significantly higher methylation levels compared to non-rearranged genes (p-value = 0.022 , one-sided Wilcoxon test; Supplementary Figure 8).
Depletion of transposable elements in close proximity to rearranged genes in termites
To understand whether domain rearrangements that occurred along the evolution of termites were due to TE activity, we analyzed TE counts in close proximity to the corresponding genes for the termites Z. nevadensis, C. secundus, and M. natalensis (data from Harrison et al., 2018, Supplementary Figure 9). We did not observe increased TE content in 10kb flanking regions of the genes rearranged at the origin of termites. In fact, TEs were depleted in rearranged genes compared to non-rearranged genes in all three termite species, significantly so in Z. nevadensis (median TE content: 0.23 vs. 0.27; p-value = 2.56 x 10−3) and M. natalensis for genes rearranged at the root of true workers (median TE content: 0.32 vs. 0.36; p-value = 6.91 x 10−4). In the cockroach B. germanica, on the other hand, TE content did not differ between rearranged and non-rearranged genes (median content: 0.46 vs. 0.46; Supplementary Figure 10).
Discussion
Blattodea genomes show a reduction in proteome size and gene family contractions along with the evolution of eusociality in termites (Harrison et al., 2018). Here, we tested whether the functional novelty involved in the evolution of termites was acquired by changes in existing proteins. For this, we analysed DAs in the proteomes of all available Blattodea genomes (three cockroaches and five termites) and two outgroup species. We observed more than 5,000 rearrangements in Blattodea with the proportions of different rearrangement events similar to previous studies (Dohmen et al., 2020; Thomas et al., 2020). At the origin of termites, when small wood-dwelling colonies emerged with totipotent workers and long-lived reproductives, we discovered 90 new domain arrangements that may be linked to the emergence of eusociality as well as a specialization on wood feeding. Along with the transition to superorganismality, which involved the evolution of sterile, “true” workers, larger, more complex, foraging colonies, and even longer-lived reproductives, we detected 110 further evolutionary changes in DAs. We uncovered seven novel domains at the origin of termites, i.e. not found in the analysed cockroaches and outgroups, but only one domain that, within our species set, was specific to termite species with true workers, indicating the transition to superorganismality mainly involved the rearrangement of existing DAs.
At the emergence of eusociality, novel domains affected proteins with functions that can be related to changes in energy consumption, diets, and lifespan among castes, such as DNA replication and mitochondrial functions. These along with proteins with functions in protein homeostasis (glycosylation and ubiquitination), DNA repair, and nutrient sensing (IIS) can be related to several important ageing mechanisms (López-Otín et al., 2013). These mechanisms have previously been linked to extreme longevity and highly maintained reproductive fitness in termite queens (Monroy Kuhn et al., 2019; Séité et al., 2022), but may also be related to changes in diet. Experimental testing of the rearranged genes may spread light on their involvement in wood-feeding, fertility, and longevity. A domain emergence in the chromatin assembly factor 1 complex at the root of termites may be important for remodelled developmental programmes necessary for the evolution of castes. Rearranged DAs at the origin of termites are enriched for several developmental GO terms, such as regulation of embryonic development, cell adhesion, and cell migration, which are likely linked to the evolution of novel caste phenotypes in the termites. As in the novel domains, we also found DA rearrangements in termites related to DNA repair and response to stress which may be related to the high longevity in termite queens. Within our species set, these functions were unique to termite lineages and distinct from the functional categories affected by domain rearrangements in cockroaches, highlighting their importance for the evolution of termites.
The domain that was unique to termites with true workers in our data set has a putative role in the mitochondrial respiratory chain within the protein BRAWNIN, which may be linked to increased caste-specific differences in energy consumption due to foraging of workers and increased longevity and reproductive output of kings and queens. A single GO term was enriched at the node where in our species set true workers emerged. The domain (F04869: Uso1_p115_head) that was associated with this enriched GO term, “vesicle fusion with Golgi apparatus,” is found in the gene p115, which is important for correct morphology and organization of the Golgi apparatus and transitional endoplasmic reticulum (Kondylis & Rabouille, 2003). Further domain rearrangements at the origin of true workers indicated the importance of protein glycosylation and protein transport in larger, more socially complex termite colonies. It is possible that post-translational modifications such as glycosylation, which occurs in the Golgi apparatus, add to the functional diversity of proteins necessary for the production of diverse castes from the same genome. However, experimental testing of the affected genes is essential for understanding their role in highly eusocial termite colonies. These functions were not enriched among genes rearranged at other Blattodea nodes, highlighting their potential specific importance for the emergence of sterile workers.
One of the characteristics of eusociality is the division of labour with genomes producing distinct castes that differ in morphology, behaviour, and physiology. These differences can be reflected in caste-specific patterns of gene expression and splicing (Harrison et al., 2018, 2022; Scharf et al., 2005; Terrapon et al., 2014). In two of four termites, we observed an enrichment of differential expression among rearranged genes. In the higher termite, M. natalensis, we also discovered an enrichment of caste-biased exon splicing among genes rearranged at the origin of true workers indicating the potential role of rearranged genes in the evolution of castes. However, in three termite species, caste-biased genes were not enriched among rearranged genes, when factoring in the effect of gene length. In fact, in R. speratus, caste-biased genes are under-represented among genes rearranged at the origin of termites. The inclusion of further genomes and information on caste-biased expression and splicing are necessary to better understand the importance of domain rearrangements for the evolution of castes.
Besides being longer and generally containing more domains, we found that rearranged genes are also predicted (with CpGo/e) to be more methylated than non-rearranged genes in all eight investigated Blattodea genomes (three cockroaches and five termites). This was confirmed with bisulfite sequencing data for M. natalensis. That we find no differences in CpGo/e between gene sets containing the ancestral DAs of these rearranged genes, indicating changes in methylation may have occurred after rearrangements took place. The observed differences in methylation for rearranged and non-rearranged genes indicate potential changes in the regulation of these sets of genes. However, methylation data are necessary to confirm these findings as previous studies have shown that CpGo/e is not always a reliable predictor of true methylation levels (Glastad et al., 2017). Overall, the results on differential patterns of expression and splicing of rearranged genes in termites suggest the importance of domain rearrangements in the evolution of termite castes and the potential properties of genes involved in the evolution of eusociality in termites. However, increased methylation might be a common feature of rearranged genes as we found this pattern in both termites and cockroaches.
Some genomic innovations, including domain rearrangements, are associated with the activity of transposable elements (TEs) (Bourque et al., 2018; Buljan et al., 2010). However, we observed a depletion of TEs in the close proximity of the genes with domain rearrangements that occurred in the origins of termites and true workers in Z. nevadensis and M. natalensis. This might indicate other potential mechanisms involved in the domain rearrangements in termites such as recombination, as well as possible selection against TE insertions as an indicator of the functional importance of these genes (Bartolomé et al., 2002; Kent et al., 2017).
Altogether, our findings support a role of domain rearrangements in the evolution of termite castes with rearranged genes playing an important role in caste-biased gene expression and splicing. Domain rearrangements affected functions related to queen longevity and major changes in development. Our analyses indicate the greatest functional novelty due to domain evolution already arose at the origin of eusociality, where simple societies emerged with totipotent workers. The emergence of true workers along with increasing social complexity, such as larger foraging colonies and even longer-lived reproductives, on the other hand, was accompanied by more domain rearrangements, within similar functional categories. These analyses incorporated all currently available blattodean genomes. Further genomes will allow us to better pinpoint domain novelties and relate these to the emergence and elaboration of specific social traits such as worker sterility, foraging, queen lifespan, and colony size, and differentiate these from confounding ecological traits, such as wood-feeding. Important lineages in this respect are the subsocial sister group of all termites, the wood-dwelling Cryptocercus, or the rather ancient lineage of Mastotermes, in which large foraging colonies with true workers evolved independently to the termites studied in the present study. Nevertheless, the observations reported here already suggest an important role for domain rearrangements in the evolution of termite eusociality with intriguing implications for other origins of eusociality, in which this source of protein novelty is so far under-appreciated.
Data availability
All analyses were carried out on publicly available data, using published tools that are cited within the manuscript. All codes that were used to carry out these analyses can be found on Zenodo (https://doi.org/10.5281/zenodo.10820940) paired with the GitHub repository (https://github.com/Aragret/DomainRearrTermites). Inquiries and requests can be directed to the corresponding author.
Author contributions
M.H. conceived the study. A.M. carried out all analyses and wrote the first manuscript draft. E.D. and M.H. assisted with analyses and interpreting results. All authors worked on revising the manuscript.
Funding
A.A.M. and M.H. were funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project number: 433073542.
Conflicts of interest
None declared.