Andrew J. O’Donnell, Ruiqi Huang, Jessica J. Barboline, Todd J. Barkman, Convergent Biochemical Pathways for Xanthine Alkaloid Production in Plants Evolved from Ancestral Enzymes with Different Catalytic Properties, Molecular Biology and Evolution, Volume 38, Issue 7, July 2021, Pages 2704–2714, https://doi.org/10.1093/molbev/msab059
Abstract
Convergent evolution is widespread but the extent to which common ancestral conditions are necessary to facilitate the independent acquisition of similar traits remains unclear. In order to better understand how ancestral biosynthetic catalytic capabilities might lead to convergent evolution of similar modern-day biochemical pathways, we resurrected ancient enzymes of the caffeine synthase (CS) methyltransferases that are responsible for theobromine and caffeine production in flowering plants. Ancestral CS enzymes of Theobroma, Paullinia, and Camellia exhibited similar substrate preferences but these resulted in the formation of different sets of products. From these ancestral enzymes, descendants with similar substrate preference and product formation independently evolved after gene duplication events in Theobroma and Paullinia. Thus, it appears that the convergent modern-day pathways likely originated from ancestral pathways with different inferred flux. Subsequently, the modern-day enzymes originated independently via gene duplication and their convergent catalytic characteristics evolved to partition the multiple ancestral activities by different mutations that occurred in homologous regions of the ancestral proteins. These results show that even when modern-day pathways and recruited genes are similar, the antecedent conditions may be distinctive such that different evolutionary steps are required to generate convergence.
Introduction
The ubiquity of convergent trait acquisition throughout the tree of life suggests that evolution may be, at least to some extent, repeatable due to some combination of constraint on one hand and strength of natural selection on the other (Storz 2016). One framework by which to conceptualize and study mechanisms of convergence emphasizes dissection of the biological hierarchy through study of the pathways (biochemical or developmental), underlying genes, encoded protein functions, and mutational paths involved (Des Marais and Rausher 2010; Manceau et al. 2010; Losos 2011). At one end of the spectrum, convergent traits may arise from different pathways, genes, and sets of mutations, whereas at the other, in principle, it is possible for the same pathway to generate a similar phenotype in independent lineages through orthologous genes that acquired their novel functions by identical mutations to the same ancestral nucleotides (Zhang 2006; Christin et al. 2010; Manceau et al. 2010; Storz 2016). Although convergent evolution has been studied in terms of the underlying modern-day pathways and genes recruited, the ancient underpinnings of the independent origins of traits are less well understood. For instance, in the case of convergently evolved metabolites generated by the same biosynthetic pathways (Pichersky and Lewinsohn 2011), ancestral biochemical flux may or may not have been the same as that exhibited by modern-day descendant species. Computational models indicate that evolutionary changes in pathway flux may be predictable and are influenced by biochemical network structure, gene expression levels, and kinetic properties of the enzymes involved (Wheeler and Smith 2019).
One way to investigate the historical sequence of independent pathway evolution involves ancestral sequence resurrection (ASR) (Thornton 2004; Dean and Thornton 2007), which provides an experimental means to directly determine the genetic and biochemical changes leading to convergence (Zhang 2006; Natarajan et al. 2016). This experimental method has provided novel insight into protein functional evolution in several different systems (Chang et al. 2002; Bridgham et al. 2006, 2009; Gaucher et al. 2008; Field and Matz 2010; Smith et al. 2013; Lichman et al. 2020). ASR was shown to be particularly valuable for revealing ancestral changes to enzymes involved in the corticosteroid pathway that allowed for biosynthetic elaboration in primates (Olson-Manning 2020).
One example of convergent trait evolution in angiosperms is the biosynthesis of xanthine alkaloids, like caffeine (CF), which have several ecological roles (Suzuki and Waller 1987; Baumann et al. 1995; Ashihara and Crozier 2001; Uefuji et al. 2005; Anaya et al. 2006; Wright et al. 2013). CF is produced in various plant tissues via the sequential methylation of xanthine alkaloid precursors by either caffeine synthase (CS) or xanthine methyltransferase (XMT) enzymes that are paralogs in the SABATH family of S-adenosyl-l-methionine (SAM)-dependent methyltransferases (Suzuki and Takahashi 1976; Mazzafera et al. 1994; Ashihara et al. 1996; Mizuno, Kato, et al. 2003; Kato and Mizuno 2004) (fig. 1 and supplementary fig. S1, Supplementary Material online). Coffea spp. (coffee) and Citrus sinensis (citrus) convergently utilize XMT enzymes to produce CF (Uefuji et al. 2003; McCarthy AA and McCarthy JG 2007; Denoeud et al. 2014; Huang et al. 2016; supplementary fig. S1, Supplementary Material online). Yet, these orthologous enzymes catalyze divergent pathways; primarily via xanthosine (XR), 7-methylxanthine (7X), and theobromine (TB) in Coffea, and xanthine (X), 1- and 3-methylxanthine (1X, 3X), and theophylline (TP) in Citrus (fig. 1). The pathway in Camellia sinensis (tea) is thought to be convergent with Coffea, even though it recruited CS enzymes to catalyze the same reactions (Kato et al. 2000) (fig. 1 and supplementary fig. S1, Supplementary Material online). Paullinia cupana (guarana) and Theobroma cacao (cocoa) convergently utilize CS-type enzymes to methylate xanthine alkaloids as in Camellia (Huang et al. 2016) (supplementary fig. S1, Supplementary Material online). However, the CS enzymes of Paullinia and Theobroma primarily convert X to 3X and 3X to TB (fig. 1). 
Surprisingly, the CS1 enzymes in Paullinia and Theobroma that convert X to 3X appear to have evolved independently after gene duplication, as have their CS2 enzymes that catalyze the conversion of 3X to TB (Huang et al. 2016). A CS-type enzyme has been reported to convert TB to CF in Paullinia but none is yet known for Theobroma (Schimpl et al. 2014; Huang et al. 2016) (fig. 1). The multiple modern-day CS enzymes in Paullinia, Theobroma, and Camellia are part of an ancient lineage but, like the XMT enzymes of Coffea and Citrus, arose by independent duplication in each caffeine-producing group more recently (Huang et al. 2016) (supplementary fig. S1, Supplementary Material online).

Fig. 1. The xanthine alkaloid biosynthetic network in plants potentially includes 12 unique paths leading from XR and xanthine to caffeine. Ring nitrogen atoms are numbered on X and XR to show that the order in which N1, N3, and N7 are methylated differs between each pathway. Several of these appear to be utilized across angiosperms. Dotted arrows show the presumed pathway through which the majority of flux is achieved in Coffea using XMT-type enzymes and Camellia using CS-type enzymes. Dashed arrows show the presumed network utilized by Citrus which uses XMT-type enzymes. Solid thick and thin lines show the common pathway hypothesized for Paullinia and Theobroma, respectively, although the final step to CF from TB is unclear for the latter. Both Paullinia and Theobroma utilize CS-type enzymes to catalyze the reactions shown. Enzyme names are provided next to reactions they catalyze. Distinct structure colors are kept consistent with subsequent figures.
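The branching of this network can be illustrated with a short sketch. It assumes only the standard trivial names of the methylxanthines (TP = 1,3-dimethylxanthine, TB = 3,7-dimethylxanthine, PX = 1,7-dimethylxanthine, CF = 1,3,7-trimethylxanthine) and enumerates the six possible orders in which N1, N3, and N7 of xanthine can be methylated. The routes that start from XR are not modeled here, and the function names are hypothetical.

```python
from itertools import permutations

# Trivial names of methylxanthines, keyed by the set of methylated ring nitrogens.
NAMES = {
    frozenset(): "X",
    frozenset({1}): "1X",
    frozenset({3}): "3X",
    frozenset({7}): "7X",
    frozenset({1, 3}): "TP",
    frozenset({3, 7}): "TB",
    frozenset({1, 7}): "PX",
    frozenset({1, 3, 7}): "CF",
}


def methylate(compound, position):
    """Return the product of methylating `compound` at ring nitrogen `position`."""
    current = next(key for key, name in NAMES.items() if name == compound)
    if position in current:
        raise ValueError(f"N{position} of {compound} is already methylated")
    return NAMES[current | {position}]


def routes_from_xanthine():
    """List every route from X to CF, one per methylation order of N1, N3, N7."""
    routes = []
    for order in permutations((1, 3, 7)):
        route = ["X"]
        for position in order:
            route.append(methylate(route[-1], position))
        routes.append(route)
    return routes


if __name__ == "__main__":
    for route in routes_from_xanthine():
        print(" -> ".join(route))
```

Among the routes listed, X -> 3X -> TB -> CF corresponds to the pathway hypothesized for Paullinia and Theobroma, and X -> 1X -> TP -> CF and X -> 3X -> TP -> CF pass through the intermediates named for Citrus.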
Individual CS or XMT enzymes may generate considerable biochemical complexity due to their potential to 1) methylate multiple substrates in the pathway and 2) produce multiple products from single substrates due to methylation of different ring N atoms (see N1, N3, or N7 in fig. 1) (Kato et al. 2000; McCarthy AA and McCarthy JG 2007; Huang et al. 2016). This enzymatic complexity, combined with the fact that intermediates in the pathway do not accumulate to appreciable levels, makes it difficult to determine for any one species if flux is linear or highly branched throughout the metabolic network shown in figure 1. Nonetheless, primary routes to CF biosynthesis have been hypothesized based on a combination of documented gene coexpression patterns, enzyme substrate preferences, and detection of intermediate metabolites in metabolomic and radioisotope tracer studies (Kato et al. 1996; Mizuno, Okuda, et al. 2003; Uefuji et al. 2003; Kato and Mizuno 2004; Ashihara et al. 2008; Huang et al. 2016; Deng et al. 2020). Angiosperm CS- and XMT-type enzymes have been characterized mainly from CF-producing species; yet, orthologs are known from other non-xanthine alkaloid-accumulating relatives (Huang et al. 2016). In these cases, it is unclear why the genes would have been maintained over long periods of time if they were not involved in CF biosynthesis. In Mangifera, the only non-CF-producing species studied, it was shown that benzoic acid is the preferred substrate for its XMT ortholog with no xanthine alkaloid activity detected. Benzoic and salicylic acid methylation was also shown for the ancestral angiosperm XMT protein which suggests that xanthine alkaloid methylation only recently evolved in descendant enzymes (Huang et al. 2016).
The convergent xanthine alkaloid pathways in Paullinia (Sapindales) and Theobroma (Malvales), catalyzed by recently and independently duplicated orthologous CS enzymes, make this a remarkable case of convergence because they are members of lineages that diverged at least 60 Ma (Huang et al. 2016; Zeng et al. 2017). As they have both converged on the use of orthologous proteins to catalyze the same pathway, we predict that each lineage would have possessed progenitor enzymes with similar catalytic properties that would allow for evolution of the modern-day pathway following the same biochemical steps from X to 3X to TB. Furthermore, it has been demonstrated that single amino acid substitutions within the predicted active site in XMT enzymes were sufficient for specialization on xanthine alkaloid substrates in the Citrus lineage (Huang et al. 2016). Therefore, we predict that mutations leading to changes in substrate preference of CS enzymes are also constrained to the same active site regions in spite of their ancient divergence. In this study, we resurrected ancestral enzymes at key branching points of the CS lineage to test these predictions.
Results and Discussion
Evolution of TB and CF Production in Paullinia, Theobroma, and Camellia Originated from Ancestral CS Enzymes with Similar Substrate Preferences to Form Distinct Sets of Products
Even though whole-genome sequences were queried, CS orthologs appear to be lacking from most asterid lineages like Lamiales and Solanales and seem to be encoded in only a scattered set of angiosperm lineages as shown in figure 2A (see also supplementary fig. S2, Supplementary Material online). The only functionally characterized CS enzymes are known from Sapindales, Malvales, and Ericales. Of the lineages shown in figure 2A, these are also the only ones from which xanthine alkaloids (CF or precursors) are reliably known (Ashihara and Suzuki 2004). Thus, it is unclear what the activity of CS orthologs from Myrtales, Geraniales, and Cornales may be because none of their proteins have been functionally characterized. One potential role for the orthologs from these lineages is trigonelline biosynthesis, which also requires ring nitrogen methylation and has been shown for an XMT-type enzyme in Coffea (Mizuno et al. 2014). Alternatively, it may be that CS enzymes in all of the lineages shown in figure 2A are involved in xanthine alkaloid biosynthesis to produce CF at low, difficult-to-detect levels. If so, the distribution of this important stimulant could be much more widespread in angiosperms than is currently appreciated.

Fig. 2. Substrate preferences of ancestral CS-type enzymes reveal the origins of modern-day enzyme activities. (A) Estimated CS gene tree (lnL = −19,220.31407) shows general relationships among sequences from divergent orders of angiosperms. Node labels are shown for reconstructed ancestral enzymes. (B) Xanthine alkaloid substrates tested with each CS enzyme are color-coded to represent structures shown in figure 1. X = xanthine, XR = xanthosine, 1X = 1-methylxanthine, 3X = 3-methylxanthine, 7X = 7-methylxanthine, TP = theophylline, TB = theobromine, PX = paraxanthine, CF = caffeine. (C) Ancestral Paullinia CS enzymes preferred to methylate 7X as shown by the pie charts at nodes M and N but ultimately evolved modern-day enzymes to sequentially methylate X and 3X and catalyze a complete pathway to TB. (D) Ancestral Theobroma CS enzyme at Node O preferred to methylate 7X. After duplication, preference for X evolved in the ancestral enzyme at Node P. From this enzyme, modern-day Theobroma CS enzymes evolved substrate preferences allowing for TB biosynthesis via sequential methylation of X and 3X. (E) Ancestral Camellia CS enzyme of Node Q also preferred to methylate 7X but later evolved divergent modern-day activities shown for C. sinensis TCS1 and TCS2. Inset boxes showing the CF pathway network are shaded for ancestral enzymes. The pie charts of figure 2C–E represent mean relative activity with each substrate for the combined ancestral alleles at each node. Enzymes marked with “*” are taken from published studies from other laboratories.
To investigate the characteristics of ancient CS-type enzymes in angiosperms, we experimentally resurrected and characterized at least two ancestral allelic variants for each of the progenitors predicted for Paullinia (PcAncCS1), Theobroma (TcAncCS1), and Camellia (CsAncCS) (fig. 2C–E nodes M, O, and Q; supplementary figs. S1, S3–S5, Supplementary Material online). Surprisingly, all three of these ancestral enzymes had highest relative activity with 7X even though preference for the substrate is not exhibited by the modern-day CS1 and CS2 enzymes derived from each of them (fig. 2C–E and supplementary figs. S3–S5, Supplementary Material online). This similar methylation activity was exhibited in spite of a substantial degree of sequence divergence among these ancestral CS enzymes that ranges from 51% to 63%. All three ancestral enzymes converted 7X to TB by methylation at N3, but in addition, TcAncCS1 and CsAncCS converted 7X to paraxanthine (PX) by methylation at the N1 position (fig. 2C–E and supplementary table S1, Supplementary Material online). Each ancestral enzyme also had lower relative activity with X: In PcAncCS1, X was converted to 3X, whereas TcAncCS1 and CsAncCS methylated X to produce both 1X and 3X (fig. 2C–E and supplementary table S1, Supplementary Material online). The product of the reaction with 3X was below our detection limit for PcAncCS1 and CsAncCS, but in TcAncCS1, 3X was converted to TP (fig. 2D and supplementary table S1, Supplementary Material online). Thus, N1 and N3 methylation were properties of these ancestral enzymes but none appear to methylate the N7 position of any substrate we tested.
Since each of the ancestral enzymes in the Paullinia, Theobroma, and Camellia lineages appears to have been capable of converting xanthine to a monomethylated product (1X and 3X) (fig. 2C–E), they may have provided the beginnings of the modern-day CF biosynthetic pathways. This is expected under the cumulative hypothesis for pathway evolution, in which earlier steps are predicted to evolve first (Granick 1957; Huang et al. 2016); in addition, TcAncCS1 could also have formed the dimethylated product, TP. If these molecules were to accumulate in ancestral plant tissues, they could have conferred a selective advantage which would likely result in retention of the ancient genes because 1X and 3X have been shown to bind to modern-day rat adenosine receptors (Daly et al. 1983) and TP can modulate adenylate cyclase in insects (Nathanson 1984). Their formation in tissues is tenable since PcAncCS1 and TcAncCS1 have apparent KM estimates for X that are comparable with modern-day CS enzymes (53–417 μM) (table 1) (Huang et al. 2016). Alternatively, if ancestral X methylation was not physiologically relevant, then their high 7X activity was probably fortuitous since ancestral plants would not likely accumulate 7X to react with, unless other enzymes were responsible for its production. Yet, only modern-day CS and XMT-type enzymes have been demonstrated to produce 7X, making it unlikely that the substrate was synthesized in ancient plant tissues (Mizuno, Okuda, et al. 2003; Uefuji et al. 2003; Huang et al. 2016). In this case, these enzymes would have been exapted for their modern-day roles in TB and CF biosynthesis because N3 methylation of 7X to form TB in Camellia CsAncCS appears to be an ancient characteristic that has been maintained over 100 My and is utilized as an enzymatic step for modern-day CF production (Kato et al. 2000) (figs. 1 and 2E).
Even if CS genes were maintained for some alternative function outside of xanthine alkaloid methylation that we have not assayed for, it is apparent that ancestral 7X preference is an enzymatic characteristic also maintained by modern-day CS-type enzymes in non-caffeine accumulating Camellia species (Ishida et al. 2009) and Theobroma (Yoneyama et al. 2006), which appears to contribute to TB formation.
Table 1. Enzyme Kinetic Parameter Estimates for Modern-Day and Ancestral Enzymes with Selected Substrates.
Enzyme (substrate) | KM (μM) | kcat (s−1) | kcat/KM (s−1 M−1) |
---|---|---|---|
Modern-day enzymes | | | |
TcCS1 (X) | 95.80 | 8.37E−05 | 0.87 |
TcCS2 (3X) | 49.10 | 9.81E−05 | 2.00 |
PcCS1 (X) | 95.38 | 1.52E−03 | 15.94 |
PcCS2 (3X) | 677.00 | 9.33E−04 | 1.38 |
Ancestral enzymes | | | |
TcAncCS1 (X) | 53.40 | 8.65E−05 | 1.62 |
TcAncCS1 (3X) | 138.60 | 1.28E−04 | 0.92 |
TcAncCS1 (7X) | 34.60 | 2.47E−04 | 7.20 |
TcAncCS2 (X) | 4.14 | 3.39E−04 | 82.06 |
TcAncCS2 (3X) | 154.00 | 3.32E−04 | 2.16 |
PcAncCS1 (X) | 417.50 | 3.76E−04 | 0.90 |
CsAncCS (7X) | 26.70 | 1.05E−03 | 39.33 |
Although each of the ancestral enzymes catalyzed a unique set of xanthine alkaloid products, Paullinia and Theobroma ultimately evolved convergent pathways to TB biosynthesis (figs. 1, 2C, and 2D) (Huang et al. 2016). At a minimum, this would have required independent acquisition of N7 methylation of 3X by Paullinia and Theobroma CS enzymes in order for them to produce TB, the second metabolite in the pathway toward CF (figs. 1, 2C, and 2D). To establish how this change to xanthine alkaloid metabolism occurred, we resurrected and experimentally characterized the younger ancestral enzymes that descended from PcAncCS1 and TcAncCS1 and ultimately gave rise to the specialized modern-day CS1 and CS2 paralogs independently in Paullinia and Theobroma.
Convergent Duplication Events of Paullinia and Theobroma Ancestral CS Enzymes Allowed for a Connected Pathway to TB Biosynthesis through Convergent Catalytic Changes
Gene duplication of PcAncCS1 in the Paullinia CS lineage (Node M) gave rise to PcAncCS2 (Node N) and a modern-day descendant, Paullinia “CS-like,” for which no enzymatic activity was detected (fig. 2C). In both allelic variants of PcAncCS2, as in the ancestor PcAncCS1, activity with 7X was still highest to form TB and PX, whereas methylation with X to form 3X remained relatively low; however, in this descendant, higher relative activity with 3X evolved to form TB by N7 methylation (fig. 2C, Node N; supplementary fig. S6 and table S1, Supplementary Material online). As a result, this change would have allowed for extension of the caffeine pathway by sequential methylation from the ancestral step of X to 3X to include 3X to TB in a manner consistent with the cumulative hypothesis which predicts that later evolved biosynthetic steps build upon earlier ones within a pathway (Granick 1957).
In Theobroma, duplication of TcAncCS1 at Node O led to the evolution of TcAncCS2 (Node P) as well as Theobroma BTS (fig. 2D). Whereas BTS retained ancestral preference to methylate 7X (Yoneyama et al. 2006), both allelic variants of TcAncCS2 suggest it lost ancestral preference for 7X and evolved highest relative activity with X, converting it solely to 3X (fig. 2D and supplementary fig. S7 and table S1, Supplementary Material online). This result represents specialization for N3 methylation due to loss of N1 activity with xanthine. TcAncCS2 also converted 3X to TB, which required acquisition of N7 methylation and loss of N1 methylation of the substrate (fig. 2D and supplementary table S1, Supplementary Material online). Thus, both Theobroma and Paullinia ancestral enzymes acquired N7 methylation of 3X following the earliest known duplications that occurred in each lineage (fig. 2C and D). As these independent gene duplication events gave rise to descendant enzymes capable of performing the sequential methylation steps required to convert X to TB, they could be viewed as convergent changes that facilitated the evolution of the same pathway steps in both lineages. In the case of Theobroma, it appears that the first two biochemical steps catalyzed by its modern-day enzymes evolved by a diversion away from ancestral flux to 1X from X and TP from 3X as seen in TcAncCS1 to primarily X to 3X to TB, rather than being gradually assembled one reaction at a time as in Paullinia (fig. 2C and D).
The restructuring of relative substrate preferences and methylation patterns observed in TcAncCS2 (fig. 2D) prompted further investigation into its kinetic properties as well as those of TcAncCS1 from which it descended; it was not clear if the relative preference shift toward X in TcAncCS2 was due to higher affinity for it or merely a relative loss in recognition of other substrates. The Michaelis–Menten kinetic parameter estimates summarized in table 1 indicate that TcAncCS2 had a kcat/KM of 82.1 s−1 M−1 with X which is 37-fold higher than 3X (and likely that of 7X for which activity was too low to determine kinetic parameters), whereas in TcAncCS1, the kcat/KM with X was comparable with that of 3X but four times lower than 7X (table 1). The apparent improvement of TcAncCS2 with xanthine relative to other substrates was concomitant with a switch away from N1 toward N3 methylation and appears to have set ancestral flux into the beginning of the pathway to TB and CF via 3X predicted for modern-day Theobroma (Huang et al. 2016). However, even though TcAncCS2 evolved to perform the two successive enzymatic steps to TB, kinetic estimates predict that flux may have been low for this single ancestral enzyme. The kcat/KM for TcAncCS2 with 3X, the product of the reaction with X and second substrate in the pathway to TB, was estimated to be 2.2 s−1 M−1 (table 1). Because the specificity constant of TcAncCS2 for 3X is nearly 40 times lower than that of xanthine, from which it is biochemically derived, the conversion of 3X to TB is predicted to have been low in the presence of this single ancestral enzyme, unless cellular concentrations of X were to become depleted.
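The comparison of specificity constants above can be reproduced with a minimal sketch. It assumes the standard result that, for two substrates competing for one Michaelis–Menten enzyme, the ratio of their rates equals the ratio of each kcat/KM multiplied by the substrate concentration. KM and kcat values are copied from table 1; the function names and the concentrations passed in are hypothetical, because cellular concentrations are not reported here. Values recomputed from the rounded table entries differ slightly from the tabulated kcat/KM.

```python
# KM (micromolar) and kcat (1/s) copied from table 1.
KINETICS = {
    ("TcAncCS1", "X"): (53.40, 8.65e-5),
    ("TcAncCS1", "3X"): (138.60, 1.28e-4),
    ("TcAncCS1", "7X"): (34.60, 2.47e-4),
    ("TcAncCS2", "X"): (4.14, 3.39e-4),
    ("TcAncCS2", "3X"): (154.00, 3.32e-4),
}


def specificity_constant(km_micromolar, kcat):
    """Return kcat/KM in 1/(s*M), converting KM from micromolar to molar."""
    return kcat / (km_micromolar * 1e-6)


def rate_ratio(enzyme, sub_a, sub_b, conc_a, conc_b):
    """Return v_a / v_b for two substrates competing for the same enzyme.

    Concentrations may be in any unit as long as both use the same one.
    """
    k_a = specificity_constant(*KINETICS[(enzyme, sub_a)])
    k_b = specificity_constant(*KINETICS[(enzyme, sub_b)])
    return (k_a * conc_a) / (k_b * conc_b)


if __name__ == "__main__":
    # Equal (hypothetical) concentrations of X and 3X.
    print(rate_ratio("TcAncCS2", "X", "3X", 1.0, 1.0))
    # Hypothetical depletion of X relative to 3X.
    print(rate_ratio("TcAncCS2", "X", "3X", 1.0, 40.0))
```

Under these assumptions, methylation of X by TcAncCS2 outpaces that of 3X at equal concentrations, and the two rates become comparable only when X is far less abundant than 3X.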
After the origin of PcAncCS2 by duplication from PcAncCS1, it was subsequently duplicated again resulting in the evolution of one descendant, PcCS1, that acquired near-complete preference for X to form 3X and a second descendant, PcCS2, that evolved specificity for 3X to form TB (Huang et al. 2016) (fig. 2C). Likewise, initial gene duplication of TcAncCS1 gave rise to TcAncCS2; this daughter gene was then duplicated later to result in modern-day TcCS1, which prefers to methylate X to form 3X, and TcCS2, which prefers to methylate 3X to form TB (but also has lower relative activity with X to form 7X) (Huang et al. 2016) (fig. 2D). These independent duplications in the two lineages therefore represent additional convergence between Paullinia and Theobroma at the level of genes involved; it was not until the duplications of PcAncCS2 and TcAncCS2 that paralogs with the same substrate preferences emerged in the two lineages. In both cases, this likely resulted in a more catalytically favorable pathway from X to 3X to TB. Although we were not able to estimate kinetic parameters for PcAncCS2 due to low protein yields after purification, kinetic parameters of TcAncCS2 and its descendant enzymes predict that specialists would be favored as 3X would not compete for active site binding; the same is perhaps true for Paullinia enzymes.
Divergent Mutations to Homologous Protein Regions Facilitated Convergent Shifts toward Substrate Specialization in Paullinia and Theobroma
In Paullinia and Theobroma, alignments show that Region I (fig. 3A) was mutated in both CS enzyme lineages and is predicted to interact with substrate molecules within the active site, as defined by the Coffea XMT and Clarkia SAMT crystal structures as well as mutagenesis of Citrus XMT (Zubieta et al. 2003; McCarthy AA and McCarthy JG 2007; Huang et al. 2016). Within Region I of the Paullinia CS lineage, Thr25 of PcAncCS2 (Node N) was replaced by Ser in modern-day PcCS2 (figs. 2C and 3A). Experimental replacement of Thr25 by Ser in PcAncCS2 largely recapitulated the enzymatic shift toward 3X preference to form TB as in its descendant, PcCS2, and caused loss of nearly all ancestral activity with other substrates (figs. 2C and 3B and supplementary table S1, Supplementary Material online). Within the Theobroma lineage leading from TcAncCS2 (Node P) to modern-day TcCS2, Region I was substituted such that SAG21-23 was replaced by AEA (figs. 2D and 3A). Experimental mutation of the three contiguous sites resulted in a convergent shift of relative preference for 3X to form TB, as was shown in the Paullinia lineage, as well as the ability to convert X to 7X, making the mutant very similar to modern-day TcCS2 (figs. 2D and 3B, supplementary table S1, Supplementary Material online). Thus, convergent improvement to 3X methylation of the N7 position occurred by divergent mutations to different codons of a broadly homologous region of CS-type enzymes. Support for the hypothesis that Region I is necessary for xanthine alkaloid methylation specificity is further strengthened by the fact that a convergent amino acid replacement to that of PcAncCS2 occurred in the XMT lineage of Citrus CF biosynthetic enzymes (Huang et al. 2016) (fig. 3A). In this case, instead of T25S as inferred for Paullinia PcAncCS2, Pro25 was replaced by Ser in CisAncXMT2 and this also resulted in improved activity with 3X (Huang et al. 2016).
However, although P25S increased activity with 3X in CisAncXMT2, this improved upon N1 methylation unlike the case for PcAncCS2 in which N7 activity increased when Ser replaced Thr in the presumed active site. Thus, this convergent amino acid replacement by Ser in these SABATH paralogs did not result in the evolution of convergent catalytic properties.

Fig. 3. Ancestral CS enzymes experienced mutations to homologous regions and exhibited convergent changes in substrate preferences. (A) Alignments in Regions I and III of CS enzymes show that both were mutated in ancestral Theobroma and Paullinia enzymes, whereas only ancestral Citrus XMT (Huang et al. 2016) experienced substitution in Region II. Ancestral/derived amino acid states are shown in blue/red, respectively. (B) Correspondence analysis shows that ancient CS enzymes were similar and associate due to 7X methylation preference (node labels and substrate colors in pie charts taken from fig. 2C–E). From these ancestral activities, convergent modern-day enzyme substrate preferences evolved largely by mutations to common protein regions. It appears that mutations to Region I in PcAncCS2 and TcAncCS2 resulted in the convergent evolution of similar enzymes, PcCS2 and TcCS2, which associate due to preference to methylate 3X to form TB. Mutations to Region III of PcAncCS2 and TcAncCS1 resulted in increased relative preference for X to form 3X and ultimately contributed to the convergent evolution of PcCS1 and TcCS1. Arrows between enzyme coordinates represent the enzyme lineages shown in figure 2.
Amino acid replacements in Region III in both the Paullinia and Theobroma ancestral CS lineages also produced similar convergent effects on substrate preference. In Paullinia, Asn307 of PcAncCS2 was mutated to Tyr in modern-day PcCS1 that shows near-complete preference for X to form 3X (figs. 2C and 3A). When we experimentally replaced Asn307 with Tyr in PcAncCS2, all activity with 3X and 7X was lost such that specialization for xanthine methylation at N3 resulted (fig. 3B and supplementary table S1, Supplementary Material online). Therefore, this single replacement largely recapitulated the evolution of enzyme substrate preference of PcCS1 (figs. 2C and 3B). TcAncCS2 descended from TcAncCS1 and switched substrate preference from 7X to X. TcAncCS2 differs from its ancestor by three amino acids in Region III (fig. 3A). When we experimentally replaced Asn307, Leu308, and Ser310 of TcAncCS1 by Gly307, His308, and Cys310 (NLRS307-310GHRC), the mutant showed higher activity with X to produce 3X, much like TcAncCS2 (although it also formed 1X) (fig. 3B and supplementary table S1, Supplementary Material online). Thus, convergence toward N3 methylation of X appears to have evolved in part by different substitutions to the same homologous protein Region III.
Collectively, our results show that the ancient pathways to methylate xanthine alkaloids in Paullinia and Theobroma ancestors likely differed from the similar ones used by species today. That the pathways later converged suggests that selection may be important for pathway flux changes over time. After subsequent gene duplication and divergence, single ancient enzymes alone could perform the pathway steps that the multiple modern-day descendants currently catalyze. That convergence subsequently resulted from independent gene duplication events that led to the partitioning of the same biochemical reactions in two enzymes in each of the two lineages studied is remarkable. These results suggest that two modern-day enzymes are better than one ancestral enzyme in terms of pathway flux and product accumulation, which is largely consistent with predictions under a model of escape from adaptive conflict (Des Marais and Rausher 2008), although a rigorous test of this model was beyond the scope of this study (Barkman and Zhang 2009). Similar to the CS enzymes we have studied, ancestral constraints for multistep enzymes in the corticosteroid pathway were shown to be at least partly alleviated after gene duplication in primates (Olson-Manning 2020). Although it appears that the modern-day duplicated CS enzymes are coexpressed in Paullinia and Theobroma (Huang et al. 2016), knowledge of ancestral tissue-specific expression patterns for these enzymes would further understanding of the mechanisms of convergence; however, such data are difficult to infer with precision due to our lack of knowledge of ancestral transcriptional regulators and cis-regulatory elements. Finally, because we show that different mutations to homologous sequence regions led to convergence, constraint on the mutational paths available to each lineage is implicated.
Future work aimed at testing the relative roles for selection and constraint for this and other cases of convergence may benefit from the use of ancestral sequence resurrection that can reveal evolutionary aspects of the process that may not have been predicted a priori.
Materials and Methods
Phylogenetic Analyses
In order to accurately determine the orthology of Paullinia (Sapindales), Theobroma (Malvales), and Camellia (Ericales) xanthine alkaloid-producing enzymes, amino acid sequences from all previously characterized SABATH gene family members and those from various land plant complete genomes were obtained from GenBank and the PlantTribes database (Wall et al. 2008). We also queried the OneKP database in order to provide more detailed branching relationships of the recently evolved CS enzymes of Paullinia, Theobroma, and Camellia as shown in supplementary figure S1, Supplementary Material online. Amino acid sequences were aligned with MAFFT version 7 (Katoh and Standley 2013) using the auto search strategy to maximize accuracy and speed. A maximum likelihood phylogenetic estimate for the SABATH family members was obtained using PhyML (Guindon et al. 2010) assuming the Jones, Taylor, and Thornton (JTT) matrix model for amino acid substitution with an invariant-sites parameter and a gamma parameter for among-site rate heterogeneity, as determined by ProtTest (Abascal et al. 2005). Bootstrap support was evaluated based on 100 pseudoreplicated data sets.
ASR and Mutagenesis
Ancestral enzyme sequences for nodes M–Q were estimated from the full CS lineage of the SABATH gene family shown in figure 2A using the JTT + Gamma model of amino acid substitution as implemented in Codeml of PAML 4.0 (Yang 2007). Importantly, where possible, we relied on a combination of genomic sequence as well as supporting transcriptomic data to ensure that high-quality sequence was analyzed in order to avoid introducing potential sequence artifacts into our ancestral state estimates. This was possible for Theobroma and Camellia (Argout et al. 2008, 2011; Taniguchi et al. 2012; Wei et al. 2018); however, no genome exists for Paullinia so in that case we relied on transcriptome data alone (Angelo et al. 2008; Figueiredo et al. 2011) as reported in Huang et al. (2016). Alignments of the ancestral proteins with their modern-day descendants are shown in supplementary figure S8, Supplementary Material online. In order to determine ancestral protein lengths in regions with alignment gaps, we coded each sequence for the number of amino acids possessed and used parsimony to determine ancestral residue numbers as in our previous studies (Huang et al. 2012, 2016). The estimated sequences were synthesized by Genscript Corp. and had codons chosen for optimal protein expression in Escherichia coli. For sites that had relatively low posterior probabilities or that differed when the gamma parameter was not assumed, we generated alternative ancestral alleles by site-directed mutagenesis using the QuikChange® Site-Directed Mutagenesis Kit (Agilent), following the manufacturer’s protocol. This allowed us to assess whether experimentally determined enzyme activities were dependent upon particular amino acid reconstructions. At least two ancestral enzymes were characterized for each node M–Q in figure 2 even though average posterior probabilities were high for most sites (see average site-specific posterior probabilities in supplementary figs. S3–S7, Supplementary Material online). Details of those alternative alleles are provided in supplementary figures S3–S7, Supplementary Material online, including which sites were mutated as well as individual enzyme activity of each allele. Mean relative activity with each substrate is shown in the pie charts of figure 2C–E.
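The rule used to choose sites for alternative alleles (low posterior probability, or a different inferred residue when the gamma parameter is dropped) can be sketched in Python as follows. This is an illustrative sketch only, not the procedure run in Codeml: the function name, the input layout, and the 0.8 posterior-probability cutoff are assumptions, since no numerical threshold is stated here.

```python
def sites_needing_alternatives(gamma_states, nogamma_states, posteriors,
                               threshold=0.8):
    """Flag alignment sites that warrant an alternative ancestral allele.

    gamma_states:   inferred ancestral residues under the model with gamma
    nogamma_states: inferred residues when the gamma parameter is not assumed
    posteriors:     posterior probability of each residue in gamma_states
    threshold:      hypothetical cutoff for a "relatively low" posterior
    Returns a list of (1-based position, residue, reasons).
    """
    if not (len(gamma_states) == len(nogamma_states) == len(posteriors)):
        raise ValueError("all three inputs must cover the same sites")
    flagged = []
    sites = zip(gamma_states, nogamma_states, posteriors)
    for position, (with_gamma, without_gamma, pp) in enumerate(sites, start=1):
        reasons = []
        if pp < threshold:
            reasons.append("low posterior")
        if with_gamma != without_gamma:
            reasons.append("model disagreement")
        if reasons:
            flagged.append((position, with_gamma, reasons))
    return flagged


# Hypothetical four-residue ancestor; values are made up for illustration.
flagged = sites_needing_alternatives("MKNL", "MKDL", [0.99, 0.60, 0.95, 0.99])
```

In this made-up example, site 2 is flagged for its low posterior probability and site 3 because the two models disagree; each flagged site would then be a candidate for site-directed mutagenesis.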
Cloning, Heterologous Expression, and Purification of Enzymes
Ancestral gene sequences were synthesized by Genscript and were subcloned from the pUC57 cloning vector into the pET-15b (Novagen) expression vector. Plasmid DNA was first digested at 37 °C for 6 h using 1.5 μg of DNA and NdeI and BamHI in 30 μl reactions. Linear fragments corresponding to the expected sizes were gel purified using the QIAEXII gel extraction kit (Qiagen Corp.) according to the manufacturer’s instructions. Purified DNA fragments were ligated into pET-15b using T4 DNA ligase from New England Biolabs. Reactions were incubated at 16 °C overnight. Ligation products were transformed into Top10 E. coli cells using 2 μl of ligation reaction. Minipreps of positive transformants were obtained using a Qiaprep spin miniprep kit (Qiagen Corp.). Ten nanograms of each plasmid were used to transform BL21 E. coli cells, which were grown using standard plating and incubation methods. Induction of His6-tagged protein was achieved in 50 ml BL21 (DE3) cell cultures with the addition of 1 mM IPTG at 23 °C for 6 h as described previously (Huang et al. 2016). Purification of the His6-tagged protein utilized TALON spin columns (Takara Bio) and followed the manufacturer’s instructions. Bradford assays were used to determine purified protein concentration and recombinant protein purity was evaluated on SDS-PAGE gels. The plasmids used to produce ancestral proteins are freely available upon request.
Enzyme Assays
All enzymes were tested for activity with the eight xanthine alkaloid substrates shown in figure 1. Radiochemical assays were performed in 50 µl reactions with 0.01 µCi (0.5 µl) 14C-labeled SAM, 100 µM methyl acceptor substrate dissolved in 0.5 M NaOH and 10–20 µl purified protein in 50 mM Tris–HCl buffer at 24 °C for 60 min. Negative controls were composed of the same reagents except that the methyl acceptor substrate was omitted and 1 µl of 0.5 M NaOH was added instead. Methylated products were extracted in 200 µl ethyl acetate and quantified using a liquid scintillation counter. The highest enzyme activity reached with a specific substrate was set to 1.0 and relative activities with remaining substrates were calculated. Each assay was run at least twice so that mean, plus SD, could be calculated as shown in supplementary figures S3–S7, Supplementary Material online. Although the enzymes in this study did not show high activity with some of the substrates (e.g., 1X, XR), we do not believe that this is due to any artifact or limitation of the assays since our previous studies using the same conditions did detect catalysis of those structures by different enzymes (Huang et al. 2016).
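The normalization of activities described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis script used for the study: the function name, the subtraction of the negative-control count, the clamping of negative values to zero, and all numerical values are assumptions introduced for the example.

```python
from statistics import mean, stdev


def relative_activities(counts, blank):
    """Scale mean activities so that the best substrate equals 1.0.

    counts: dict mapping substrate name -> list of replicate scintillation counts
    blank:  count from the negative control (assumed here to be subtracted)
    Returns a dict mapping substrate -> (relative mean, relative SD).
    """
    # Background-correct each replicate; negative values are clamped to zero.
    corrected = {s: [max(c - blank, 0.0) for c in reps]
                 for s, reps in counts.items()}
    means = {s: mean(reps) for s, reps in corrected.items()}
    top = max(means.values())
    if top <= 0:
        raise ValueError("no substrate shows activity above background")
    result = {}
    for substrate, reps in corrected.items():
        sd = stdev(reps) if len(reps) > 1 else 0.0
        result[substrate] = (means[substrate] / top, sd / top)
    return result


# Hypothetical duplicate assays (raw counts) for three substrates.
example = {"X": [2100, 2300], "7X": [700, 700], "3X": [200, 200]}
rel = relative_activities(example, blank=200)
```

With these made-up counts, X is the preferred substrate (relative activity 1.0), 7X reaches one quarter of that activity, and 3X does not rise above the negative control.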
Enzyme Kinetics
Kinetic parameters (kcat and KM) of the methyltransferases with a given substrate were determined using the 50 μl radioactive assay described above. However, appropriate enzyme concentration and incubation time were determined in time-course assays with low nonsaturating substrate concentrations to ensure that the reaction velocity was linear during the assay period. When varying xanthine alkaloid substrate concentration, the SAM concentration was held constant at a saturating concentration of 320 μM. Assays were run in duplicate and initial velocities versus substrate concentration were plotted using GraphPad Prism (GraphPad Software, La Jolla, CA) to fit the hyperbolic Michaelis–Menten equation and calculate Vmax and KM. Vmax was converted to kcat based on estimated protein concentrations and expressed in units of s⁻¹.
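For readers without access to GraphPad Prism, the fit of the hyperbolic Michaelis–Menten equation, v = Vmax·[S]/(KM + [S]), and the conversion kcat = Vmax/[E] can be sketched with the Python standard library alone. This is not the algorithm Prism uses; it is a simple least-squares sketch that scans KM on a logarithmic grid, solves for the best Vmax at each KM in closed form, and then refines KM by ternary search. All function names and numerical values below are hypothetical.

```python
import math


def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def _profile(km, s, v):
    """Sum of squared errors and best-fit Vmax for a fixed Km."""
    x = [si / (km + si) for si in s]
    vmax = sum(vi * xi for vi, xi in zip(v, x)) / sum(xi * xi for xi in x)
    sse = sum((vi - vmax * xi) ** 2 for vi, xi in zip(v, x))
    return sse, vmax


def fit_michaelis_menten(s, v, grid_points=200, refinements=100):
    """Least-squares estimates (Vmax, Km); substrate values must be > 0."""
    lo, hi = math.log(min(s) * 1e-3), math.log(max(s) * 1e3)
    grid = [lo + (hi - lo) * i / (grid_points - 1) for i in range(grid_points)]
    # Coarse scan over log(Km) to locate the neighborhood of the minimum.
    best = min(range(grid_points),
               key=lambda i: _profile(math.exp(grid[i]), s, v)[0])
    a = grid[max(best - 1, 0)]
    b = grid[min(best + 1, grid_points - 1)]
    # Ternary search refines log(Km) inside the bracketing interval.
    for _ in range(refinements):
        m1, m2 = a + (b - a) / 3, b - (b - a) / 3
        if _profile(math.exp(m1), s, v)[0] < _profile(math.exp(m2), s, v)[0]:
            b = m2
        else:
            a = m1
    km = math.exp((a + b) / 2)
    return _profile(km, s, v)[1], km


def kcat_from_vmax(vmax, enzyme_conc):
    """kcat in s^-1 when Vmax is in uM/s and enzyme concentration in uM."""
    return vmax / enzyme_conc


# Hypothetical noise-free data generated with Vmax = 10 uM/s and Km = 50 uM.
substrate = [5, 10, 25, 50, 100, 200, 400]
velocity = [michaelis_menten(x, 10.0, 50.0) for x in substrate]
vmax, km = fit_michaelis_menten(substrate, velocity)
kcat = kcat_from_vmax(vmax, enzyme_conc=2.0)
```

Fitting the untransformed hyperbola, as done here, avoids the distortion of error structure that linearized plots such as Lineweaver–Burk introduce. With real duplicate measurements the recovered parameters would carry uncertainty that this sketch does not estimate.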
Liquid Chromatography–Tandem Mass Spectrometry
Liquid chromatography–tandem mass spectrometry (LC-MS/MS) was used to confirm product identity from ancestral enzyme assays. Detection was optimized using pure standards for the expected products diluted to 1 µM in 0.1% formic acid/50% acetonitrile that were infused directly into a Waters Quattro Micro mass spectrometer via an electrospray ion source as described in Huang et al. (2016). The LC-MS/MS analysis was performed with an Agilent 1100 HPLC in-line to the Quattro Micro mass spectrometer using mobile phase A (0.1% formic acid/0.01% trifluoroacetic acid/water) and B (0.1% formic acid/0.01% trifluoroacetic acid/acetonitrile) with a flow rate of 0.5 ml/min. Compound elution was performed using a linear gradient of 0–16% B mobile phase over 16 min followed by 2 min of 95% B for a run time of 20 min. A postcolumn addition of 0.1% formic acid in acetonitrile was made via a PEEK tee at a flow rate of 100 µl/min. Scans for diagnostic fragment masses allowed for detection of each unique xanthine alkaloid as described previously in Huang et al. (2016).
Statistical Analysis
Correspondence analysis (Jackson 1997) was used to ordinate modern-day, ancestral, and mutated ancestral CS enzymes based upon relativized substrate preferences. Symmetric plots were used to visualize the results. Nonindependence of the enzymes and substrate preferences was determined (P < 0.05) and total inertia was 1.05 for the analysis described in figure 3. The first two factors of the analysis accounted for a total of 85% of the inertia. Positions of enzymes along the x-axis in the symmetric plot are due to variation in methylation preference for xanthine (coordinates < −0.5) or 3X (coordinates > 1). On the other hand, preference for 7X methylation is explained by position along the y-axis (coordinates < −0.5).
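The total inertia reported for a correspondence analysis is the sum, over all cells of the enzyme-by-substrate table, of the squared deviation from independence scaled by the expected proportion. A minimal Python sketch of that quantity is given below. It is not the software used for the analysis in figure 3, it does not compute the factor axes, and the function name and example tables are hypothetical.

```python
def total_inertia(table):
    """Total inertia of a two-way table (rows: enzymes, columns: substrates).

    Computed as the sum over cells of (p_ij - r_i * c_j)^2 / (r_i * c_j),
    where p_ij are cell proportions and r_i, c_j are row and column masses.
    """
    grand_total = sum(sum(row) for row in table)
    row_mass = [sum(row) / grand_total for row in table]
    col_mass = [sum(col) / grand_total for col in zip(*table)]
    if min(row_mass) <= 0 or min(col_mass) <= 0:
        raise ValueError("rows and columns must have positive totals")
    inertia = 0.0
    for i, row in enumerate(table):
        for j, value in enumerate(row):
            expected = row_mass[i] * col_mass[j]
            inertia += (value / grand_total - expected) ** 2 / expected
    return inertia


# Two hypothetical enzymes, each active with a different single substrate.
specialists = total_inertia([[10, 0], [0, 10]])
# Two hypothetical enzymes with identical relative substrate preferences.
identical = total_inertia([[1, 2], [2, 4]])
```

Inertia is zero when all enzymes share the same substrate profile and grows as profiles diverge. For tables of counts, the chi-square statistic equals the total inertia multiplied by the grand total of the table.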
Supplementary Material
Supplementary data are available at Molecular Biology and Evolution online.
Acknowledgments
This work was supported by the National Science Foundation (Grant No. MCB-1120624 to T.J.B.). Greg Cavey is thanked for assistance provided with LC-MS analyses. Ricky Stull and two anonymous reviewers provided helpful feedback on previous versions of the manuscript.
Data Availability
Individuals interested in the data matrices underlying this article may obtain them upon request of the corresponding author. The original data underlying this article are available in https://www.ncbi.nlm.nih.gov/genbank/ as well as https://db.cngb.org/onekp/ (both of which were last accessed on March 01, 2021).
Author notes
Present address: Max Planck Institute for Chemical Ecology, Jena, Germany
Present address: Applied Biomedical Science Institute, San Diego, CA, USA