Pengda Liu, Arno L. Greenleaf, John W. Stiller, The Essential Sequence Elements Required for RNAP II Carboxyl-terminal Domain Function in Yeast and Their Evolutionary Conservation, Molecular Biology and Evolution, Volume 25, Issue 4, April 2008, Pages 719–727, https://doi.org/10.1093/molbev/msn017
Abstract
The carboxyl-terminal domain (CTD) of eukaryotic RNA polymerase II is the staging platform for numerous proteins involved in transcription initiation, mRNA processing, and general coordination of nuclear events. Concordant with these central roles in cellular metabolism, the consensus sequence, tandemly repeated structure, and core functions of the CTD are conserved across diverse eukaryotic lineages; however, in other eukaryotes, the CTD has been allowed to degenerate completely. Even in groups where the CTD is strongly conserved, genetic analyses and comparative genomic investigations show that a variety of individual substitutions and insertions are permissible. Therefore, the specific functional constraints reflected by the CTD's conservation across much of eukaryotic evolution have remained somewhat puzzling. Here we propose a hypothesis to explain this strong conservation in budding yeast, based on both comparative and experimental evidence. Through genetic complementation for CTD function, we identify 2 sequence elements contained within pairs of heptapeptides, “Y1-Y8” and “S2-S5-S9,” which are required for all essential CTD functions in yeast. The dual requirements of these motifs can account for strong purifying selection on the canonical CTD heptapeptide. Further, in vitro analysis of GST–CTD fusion proteins as substrates for multiple CTD-directed kinases shows reduced phosphorylation efficiencies with increased distance between functional units. This indicates that requirements of the RNAP II phosphorylation cycle are most likely responsible for the strong purifying selection on tandemly repeated CTD structure.
Introduction
The expression of protein-encoding genes in eukaryotes requires synthesis of mRNA by the evolutionarily conserved, multisubunit RNA polymerase II (RNAP II) enzyme. Unique among RNA polymerases, the RNAP II largest subunit (RPB1) has an additional and essential carboxyl-terminal domain (CTD) (Corden 1990) comprising tandemly repeated heptapeptides with the consensus sequence Tyr1-Ser2-Pro3-Thr4-Ser5-Pro6-Ser7. Although relatively simple in structure, the CTD undergoes a variety of posttranslational modifications, including isomerization (Xu et al. 2003), glycosylation (Kelly et al. 1993) and, most importantly, a dynamic phosphorylation and dephosphorylation cycle that is critical for regulating RNAP II transcription (Dahmus 1996; Palancade and Bensaude 2003).
Numerous investigations, primarily in yeast and animal systems, have demonstrated that many varied functions are carried out by this simple set of tandem repeats. These studies show that the CTD serves as a scaffold for macromolecular assemblies involved in mRNA synthesis and processing, including regulation of transcription initiation through interactions with mediator complex (Thompson et al. 1993), promoter clearance (Dahmus 1996), elongation and termination (Corden 1993; Meininghaus et al. 2000), histone methylation (Hampsey and Reinberg 2003), organization of transcription centers within the nucleus (Misteli 2000), nascent mRNA 5′ capping (McCracken et al. 1997) and cap methylation (Pillutla et al. 1998), mRNA constitutive (Hirose et al. 1999) and alternative splicing (de la Mata and Kornblihtt 2006), and 3′ cleavage (Proudfoot et al. 2002) and polyadenylation (Hirose and Manley 1998). There is further evidence that the CTD participates in diverse processes not directly tied to mRNA synthesis and processing, including chromatin remodeling, DNA repair, the packaging, editing, and export of mRNAs from the nucleus (Phatnani and Greenleaf 2006). X-ray crystallography and circular dichroism indicate that the CTD forms an unordered tail-like extension of sufficient size and flexibility to interact with multiple components of pre-mRNA processing machinery and localize them close to the nascent mRNA exit channel of RNAP II (Bienkiewicz et al. 2000). Sequential changes in the phosphorylation state of the CTD order and orchestrate the roles of its various protein partners throughout the transcription cycle (Lin et al. 2003; Phatnani and Greenleaf 2006; Hirose and Ohkuma 2007; Buratowski 2003).
Given its centrality to metabolism in the nucleus, it comes as no surprise that the consensus sequence and tandemly repeated structure of the CTD are conserved throughout fungi, animals, green plants, and their respective protistan relatives (Stiller and Hall 2002; Stiller and Cook 2004). The specific elements within the CTD that are under strong purifying selection are not fully understood, however, even in well-characterized model organisms. Genetic investigations in yeast and animals have shown that Y1, S2, and S5 are essential for CTD function (West and Corden 1995; Pei et al. 2001). The strongly conserved T4 as well as S7 are not essential in yeast (Stiller et al. 2000); however, phosphorylation of S7 in mammalian cells has been shown to be critical for small nuclear (sn)RNA expression (Chapman et al. 2007; Egloff et al. 2007). A variety of partial substitution mutants are viable, including those with changes at essential positions (West and Corden 1995). Truncation studies show that 8 repeats are required for viability and 13 for apparent wild-type (WT) growth in yeast (West and Corden 1995); about 30 heptads are needed for viability in mouse (Meininghaus et al. 2000), whereas mutants with 31–39 heptads exhibit growth defects (Litingtung et al. 1999). Investigations of insertion mutations indicate that the CTD's tandem register also is not an absolute requirement. Although Ala insertions between adjacent repeats are lethal in yeast in any register, individual residues inserted between pairs of heptapeptides are well tolerated (Stiller and Cook 2004).
The cumulative data from genetic investigations of CTD mutants raise two key questions: 1) if individual substitutions and disruptions of the tandem register are both permissible, what is the irreducible unit of CTD function, and 2) which protein interactions are most responsible for the strong purifying selection on the global tandemly repeated CTD structure? Here we propose clear hypotheses answering both questions in the budding yeast model system, based on genetic analyses of CTD mutants and on the relative efficiencies of mutated sequences as phosphorylation substrates for CTD-directed kinases.
Materials and Methods
Construction of Artificial CTD Sequence and Yeast Transformation
Artificial oligonucleotides with various substitutions or insertions were used to construct mutated CTDs, subcloned into a yeast RNAP II shuttle vector, and transformed into yeast cells via the plasmid shuffle as described in detail previously (Stiller and Cook 2004).
Expression and Purification of GST-Mutated CTD Fusion Proteins
Mutated CTD sequences were amplified by polymerase chain reaction (PCR) and cloned into the pGEX-5X-1 expression vector. Two primers, pGEXF (5′-CGT GGG ATC CTT GGA GTC TCC TCC CCG AGT-3′) and pGEXR (5′-CCG CTC GAG GGG CAC ATC ATA GGG GTA GCT-3′), were synthesized to introduce BamHI and XhoI restriction sites into the sequences flanking the artificial CTDs. Transformants were screened by PCR, and the products were sequenced to verify their integrity. Vectors encoding viable CTD fusion proteins were transformed into Escherichia coli strain DH5α for expression of GST-tagged fusion proteins. A single colony was inoculated into 10 ml Luria-Bertani (LB) + ampicillin medium, shaken at 37 °C for 12 h, transferred into 1 l LB + ampicillin, and grown to an OD600 of 0.8; 30 mM isopropyl-β-D-thiogalactopyranoside was then added to induce expression. Cultures were incubated for an additional 3 h, and cells were centrifuged at 6.5 K in an SS34 rotor at 4 °C for 10 min. A total of 2.5 volumes of suspension buffer (50 ml phosphate-buffered saline [PBS] + 50 μl yeast protease inhibitor + 1 mM ethylenediaminetetraacetic acid) was added, and the pellet was sonicated 3 times at 50% duty cycle for 45 s each. Protein release was monitored with Bio-Rad fast protein assay reagent. Cell debris was removed by centrifugation at 25 K relative centrifugal force (r.c.f.) for 30 min at 4 °C. Glutathione-Sepharose 4B resin, preequilibrated with 1× PBS buffer, was added to the supernatant and incubated at 4 °C for 1 h with constant rotation. The resin was collected by centrifugation in an SH 3000 rotor at 1 K r.c.f. for 3 min at 4 °C, washed 3 times with 1× PBS buffer, transferred into a Bio-Rad Poly-Prep chromatography column, and washed with 1 column volume of 1× PBS. GST–CTD fusion proteins were eluted (50 mM Tris–HCl, 10 mM reduced glutathione, pH 8.0), and five 400-μl aliquots were resolved on 4–20% sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS–PAGE) gels. Concentrations of the GST-mutated CTD fusion proteins were determined with a biophotometer.
In Vitro Phosphorylation Assays for GST-Tagged CTD Mutants
CDK7/CycH/MAT1, CDK8/CycC, and CDK9/CycT1 were obtained as recombinant proteins from Invitrogen (Carlsbad, CA), and CTDK-I was produced by the Greenleaf laboratory. For analysis of the relative phosphorylation of CTD mutants, 20-μl reactions were assembled in each kinase's specific buffer (CDK7/CycH/MAT1: 12.5 mM Tris–HCl [pH 7.5], 10 mM MgCl2, 1 mM ethylene glycol tetraacetic acid [EGTA], 0.5 mM Na3VO4, 5 mM β-glycerophosphate, 2.5 mM dithiothreitol [DTT], 0.01% Triton X-100; CDK8/CycC: 25 mM N-2-hydroxyethylpiperazine-N′-2-ethanesulfonic acid [HEPES] [pH 7.5], 10 mM MgCl2, 0.5 mM EGTA, 0.5 mM Na3VO4, 5 mM β-glycerophosphate, 2.5 mM DTT, 0.01% Triton X-100; CDK9/CycT1: 25 mM 3-[N-morpholino]propanesulfonic acid [MOPS] [pH 7.2], 10 mM MgCl2, 0.5 mM EGTA, 0.5 mM Na3VO4, 5 mM β-glycerophosphate, 2.5 mM DTT, 0.01% Triton X-100; and CTDK-I: 25 mM HEPES [pH 7.6] and 10 mM MgCl2) containing the respective kinase (CDK7/CycH/MAT1: 3.9 pmol per reaction, [protein] = 0.42 μg/μl, purity ≥75%; CDK8/CycC: 6.1 pmol per reaction, [protein] = 0.35 μg/μl, purity ≥80%; CDK9/CycT1: 4.9 pmol per reaction, [protein] = 0.33 μg/μl, purity ≥75%; and CTDK-I: 4 μl per reaction), 1 μCi [γ-32P]adenosine triphosphate (ATP), 6 pmol unlabeled (cold) ATP, and 4 μg of each GST–CTD substrate. Reactions were incubated at 30 °C for 2 h, stopped by adding 5 μl 5× SDS sample buffer, heated at 94 °C for 10 min, and resolved on 4–20% SDS–PAGE gels. Gels were dried on filter paper with a Bio-Rad model 583 gel drier and exposed to Amersham (Piscataway, NJ) autoradiography film, and phosphorimages were saved on a Typhoon 9410 imaging system.
Quantification of Relative Levels of Phosphorylation
Relative phosphorylation was measured with ImageQuant TL software after scanning the exposed autoradiography film on the Typhoon 9410. The quantity of protein in each lane was determined accurately by also scanning the Coomassie-stained gels. Relative phosphorylation intensities were calculated from the number of potential phosphorylation sites available (the number of S5 residues for CDK7/CycH/MAT1 and CDK8/CycC reactions, and the total number of SP pairs for CDK9/CycT1 and CTDK-I reactions). Phosphate incorporation for each mutated CTD was normalized to incorporation by the GST–WT CTD protein. The formulas used are as follows:
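$$\mathrm{PI} = \frac{\text{phosphorylation intensity}}{N_{\text{sites}}}, \qquad N_{\text{sites}} = \begin{cases} N_{\text{SP}} & \text{for CDK9 and CTDK-I} \\ N_{\text{S5}} & \text{for CDK7 and CDK8} \end{cases}$$

$$N_{\text{SP}} = \frac{\text{protein loaded}}{\text{protein molecular weight}} \times \text{number of SP pairs per molecule}$$

$$N_{\text{S5}} = \frac{\text{protein loaded}}{\text{protein molecular weight}} \times \text{number of S5 residues per molecule}$$

$$\text{Relative PI} = \frac{\mathrm{PI}_{\text{mutated CTD}}}{\mathrm{PI}_{\text{WT CTD}}}$$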
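The normalization can be sketched in Python as follows. This is a minimal illustration of the formulas as stated, not the authors' analysis script; the function names, the choice of micrograms and g/mol as units, and every number in the usage block (4 μg loaded, a 50,000 g/mol fusion protein, the site counts, and the raw intensities) are hypothetical placeholders.

```python
"""Relative phosphorylation intensity (PI), following the formulas above.

Hypothetical helper names; units assumed to be micrograms and g/mol.
"""


def potential_sites(protein_loaded_ug: float, mw_g_per_mol: float, sites_per_molecule: int) -> float:
    """Micromoles of candidate phosphorylation sites in a lane.

    protein_loaded / molecular weight gives micromoles of fusion protein;
    multiplying by the per-molecule count of S5 residues (CDK7, CDK8) or
    SP pairs (CDK9, CTDK-I) gives the number of candidate sites.
    """
    return (protein_loaded_ug / mw_g_per_mol) * sites_per_molecule


def phosphorylation_intensity(signal: float, n_sites: float) -> float:
    """PI: raw phosphorimager signal divided by the number of potential sites."""
    return signal / n_sites


def relative_pi(pi_mutant: float, pi_wt: float) -> float:
    """Relative PI: mutant PI normalized to the GST-WT CTD control."""
    return pi_mutant / pi_wt


if __name__ == "__main__":
    # Hypothetical lanes: equal mass loaded, but the mutant has half as many SP pairs.
    wt_pi = phosphorylation_intensity(8000.0, potential_sites(4.0, 50_000.0, 20))
    mut_pi = phosphorylation_intensity(2000.0, potential_sites(4.0, 50_000.0, 10))
    print(round(relative_pi(mut_pi, wt_pi), 3))  # 0.5
```

In this invented example the raw signal drops fourfold, but because the mutant also carries half as many candidate sites, the per-site (relative) PI drops only twofold. That is why site counts enter the normalization.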
Quantification of the Absolute Amount of Phosphate Transferred from ATP to CTD Substrates
The absolute amount of 32P radioactivity associated with each Coomassie-stained GST fusion protein band was determined by phosphorimage analysis, using the same phosphorimager and software as above. The counts obtained were calibrated against a series of known amounts of [γ-32P]ATP (0, 0.25, 0.5, 0.75, 1, and 1.25 pmol) spotted onto filter paper.
$$\text{Absolute extent of phosphate transfer from ATP to CTD} = \frac{\text{mol phosphate incorporated into the substrate}}{\text{mol total substrate}}$$
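A minimal sketch of this calculation, assuming a linear relationship between phosphorimager counts and spotted [γ-32P]ATP over the standard range and assuming the standards share the specific activity of the reaction ATP (neither assumption is stated in the source). It fits the standard curve by ordinary least squares, converts a band's counts to pmol of phosphate, and divides by the pmol of substrate loaded. The helper names, the counts, and the 50,000 g/mol substrate mass are hypothetical.

```python
"""Counts -> pmol phosphate -> mol phosphate per mol substrate (illustrative)."""


def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def counts_to_pmol(counts: float, slope: float, intercept: float) -> float:
    """Invert the calibration line to get pmol of 32P-labeled phosphate."""
    return (counts - intercept) / slope


def phosphate_per_substrate(pmol_phosphate: float, substrate_ug: float, substrate_mw: float) -> float:
    """mol phosphate incorporated / mol substrate.

    substrate_ug / substrate_mw (g/mol) gives micromoles; * 1e6 converts to pmol.
    """
    substrate_pmol = substrate_ug / substrate_mw * 1e6
    return pmol_phosphate / substrate_pmol


if __name__ == "__main__":
    standards_pmol = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25]  # standard series from the text
    standards_counts = [1000.0 * p for p in standards_pmol]  # invented, perfectly linear
    slope, intercept = fit_line(standards_pmol, standards_counts)
    band_pmol = counts_to_pmol(500.0, slope, intercept)  # invented band signal
    # 4 ug of a hypothetical 50,000 g/mol GST-CTD substrate is 80 pmol.
    print(round(phosphate_per_substrate(band_pmol, 4.0, 50_000.0), 5))  # 0.00625
```

Fitting the full standard series, rather than scaling from a single spot, keeps any background offset in the intercept instead of folding it into the per-substrate ratio.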
Results
Two Sequence Motifs Are Required for Essential CTD Functions
Previous genetic analyses in yeast have shown that the irreducible unit of CTD function is contained within paired heptapeptides (Stiller and Cook 2004). We therefore used genetic complementation to determine comprehensively the essential elements within each diheptapeptide. First, we introduced Ala substitutions at the respective ends of each diheptad repeat; 2 mutated CTDs, “pYAR” (repeats of YSPTSPSYSPTAAA) and “pYAL” (repeats of ASPTSPSYSPTSPS), were constructed as described previously and introduced into yeast cells via the plasmid shuffle (Stiller and Cook 2004). Although only 13 canonical CTD heptads are required for apparent WT growth, initial analyses of pYAR mutants showed greater length-dependent effects (fig. 1A): growth rates continued to improve as the number of heptads increased to approximately that of the full-length yeast CTD. Therefore, putatively lethal mutations were verified with constructs containing close to the equivalent of 26 heptad repeats (table 1). Viability of pYAR mutants (fig. 1B) with as few as 12 heptads shows that the second S5 within each diheptad unit is nonessential. In contrast, lethality of all pYAL constructs demonstrates that adjacent Y1 residues must be maintained. Combined with prior observations that single Ala insertions within every repeat are lethal regardless of position, whereas comparable insertions between diheptads are tolerated (Stiller and Cook 2004), these results indicate that disrupting the heptad spacing between tyrosines is also impermissible.
| CTD Mutant | Repeated Sequence | Repeat Number | Cell Phenotype |
|---|---|---|---|
| pYAL | ASPTSPS YSPTSPS | 18, 24, 32, 39 | Lethal |
| pYAR | YSPTSPS YSPTAAA | 12, 18, 24, 32 | Viable |
| pYYAA | YAATSPS YSPTSPS | 16, 26 | Lethal |
| pYYATA | YAPTAPS YSPTSPS | 18, 26 | Lethal |
| pYYAS | YASPTSPS YSPTSPS | 17, 23, 26 | Viable |
| pYYAP | YAPTSPS YSPTSPS | 38 | Lethal |
| pYAPS | YSPTAPS YSPTSPS | 20 | Viable |

CTD mutant phenotypes. (A) Relative growth rates of pYAR mutants compared with WT. The upper panel shows a 10-fold serial titration of transformants grown for 48 h under 5-fluoroorotic acid (5-FOA) selection, starting with 10⁶ cells of inoculant. A quantitative comparison of doubling times relative to the WT control is shown below. Growth rates of pYAR mutants improve at least up to the normal CTD length in yeast (26 heptad equivalents). (B) Titration of cell cultures of all CTD mutants under 5-FOA selection. pYAR with 24 repeats and pYYAS with 23 repeats show nearly WT growth, and pYAPS, even with only 20 repeats, has a growth rate comparable with WT. All other mutants are inviable, demonstrating the dramatic effects of altering key sequence elements. (C) Titration of liquid cultures of CTD reversion mutants under 5-FOA selection. To verify that lethality was not caused by incidental mutations introduced elsewhere in RPB1 or the vector during cloning and transformation, lethal CTD inserts were removed and replaced by the WT CTD, and the constructs were retransformed into yeast.
Lethality of a third construct containing repeats of the sequence YSPTSPSYAPTAPS (pYYATA) demonstrates that 3 consecutive phosphorylatable serines spanning adjacent heptads are a further requirement. These consecutive potential phosphoserines could be arranged within diheptapeptides either in a “2-5-2” or a “5-2-5” orientation. The pYAR mutants show that the former arrangement supports viability. We tested the latter in pYYAP mutants (YAPTSPSYSPTSPS repeats), all of which were inviable. Thus, consecutive phosphoserines must occur in the 2-5-2 orientation.
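To make the 2-5-2 versus 5-2-5 distinction concrete, the sketch below lists serines immediately followed by proline in a tandemly repeated unit and reports which spacing pattern occurs among consecutive triplets. In a canonical diheptad the SP serines sit at S2, S5, and S9 (the next S2), giving gaps of 3 then 4 residues; a 5-2-5 run has the gaps reversed. Treating "S followed by P" as a potential phosphoserine is a simplifying assumption for illustration, and the function names are hypothetical.

```python
"""Classify consecutive SP-serine triplets in a tandemly repeated CTD unit."""


def sp_serine_positions(unit: str, copies: int = 3) -> list[int]:
    """Indices of serines immediately followed by proline in `copies` tandem repeats."""
    seq = unit * copies
    return [i for i in range(len(seq) - 1) if seq[i] == "S" and seq[i + 1] == "P"]


def triplet_orientations(unit: str) -> set[str]:
    """Return which consecutive-triplet arrangements occur in the repeat."""
    pos = sp_serine_positions(unit)
    found = set()
    for a, b, c in zip(pos, pos[1:], pos[2:]):
        gaps = (b - a, c - b)
        if gaps == (3, 4):
            found.add("2-5-2")  # e.g. S2, S5, then S2 of the next heptad
        elif gaps == (4, 3):
            found.add("5-2-5")  # e.g. S5, S2 of the next heptad, then its S5
    return found


if __name__ == "__main__":
    print(sorted(triplet_orientations("YSPTSPSYSPTAAA")))  # pYAR: ['2-5-2']
    print(sorted(triplet_orientations("YAPTSPSYSPTSPS")))  # pYYAP: ['5-2-5']
    print(sorted(triplet_orientations("YAPTAPSYSPTSPS")))  # pYYATA: []
    print(sorted(triplet_orientations("YSPTSPSYSPTSPS")))  # WT: ['2-5-2', '5-2-5']
```

Run on the table 1 repeat units, only the viable pYAR retains a 2-5-2 run, the lethal pYYAP retains only 5-2-5, and the lethal pYYATA has no run of three at all, matching the reasoning in the paragraph above.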
The 2 Essential Elements Are Not Absolutely Linked in Diheptad Repeats
Combined with previous results, these mutants define Y1-Y1 and S2-S5-S2 as the 2 sequence requirements for essential CTD function within any given pair of heptapeptides; however, they do not clarify whether the 2 motifs must overlap. Specifically, is Y1S2PTS5PSY8S9P the irreducible unit of CTD function? Because the CTD is repetitive, the only regular insertion within contiguous diheptads that always interrupts this specific motif, without also disrupting an SP kinase recognition site, is one placed immediately after alternating Y1 residues. The pYYAS mutants of various lengths (repeats of YASPTSPSYSPTSPS) all proved viable, showing that the 2 essential elements are not absolutely linked. We therefore conclude that all essential CTD functions require 1) paired tyrosines spaced 7 amino acids apart (Y1-Y8) and 2) 3 potential phosphoserines in a 2-5-9 orientation with respect to the Y8 residue of a given diheptapeptide. Because the 2 requirements are somewhat independent, the indivisible essential unit of the CTD appears to be “Y1S2P3X4S5P6X7Y1”; however, this must be accompanied either by a proximal “S2PXS5PX” or by a distal “S2P.”
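The two requirements can be written as a simple screen over a repeat unit. This is one illustrative reading of the rule, not the authors' method: element 1 is taken as two tyrosines 7 residues apart, element 2 as S-P dipeptides beginning at relative offsets 0, 3, and 7 (the S2-S5-S9 arrangement), and the unit is treated as circular because it is tandemly repeated. Helper names are hypothetical; the repeat units and phenotypes come from table 1.

```python
"""Screen tandem CTD repeat units for the two proposed essential elements."""


def _at(unit: str, i: int) -> str:
    """Residue at position i of an endlessly repeated unit."""
    return unit[i % len(unit)]


def has_tyrosine_pair(unit: str) -> bool:
    """Element 1: some Y has another Y exactly 7 residues downstream (Y1 ... Y8)."""
    return any(_at(unit, i) == "Y" and _at(unit, i + 7) == "Y" for i in range(len(unit)))


def has_serine_triplet(unit: str) -> bool:
    """Element 2: SP dipeptides starting at offsets 0, 3, and 7 (S2, S5, S9)."""
    return any(
        all(_at(unit, i + k) == "S" and _at(unit, i + k + 1) == "P" for k in (0, 3, 7))
        for i in range(len(unit))
    )


def predicted_viable(unit: str) -> bool:
    return has_tyrosine_pair(unit) and has_serine_triplet(unit)


# Repeat units and observed phenotypes from table 1 (True = viable).
TABLE_1 = {
    "pYAL": ("ASPTSPSYSPTSPS", False),
    "pYAR": ("YSPTSPSYSPTAAA", True),
    "pYYAA": ("YAATSPSYSPTSPS", False),
    "pYYATA": ("YAPTAPSYSPTSPS", False),
    "pYYAS": ("YASPTSPSYSPTSPS", True),
    "pYYAP": ("YAPTSPSYSPTSPS", False),
    "pYAPS": ("YSPTAPSYSPTSPS", True),
}

if __name__ == "__main__":
    for name, (unit, observed) in TABLE_1.items():
        print(name, "prediction matches phenotype:", predicted_viable(unit) == observed)
```

All 7 table 1 constructs agree with this screen. Note what it cannot capture: the 7A insertion unit (AAAAAAAYSPTSPSYSPTSPS) passes both tests although pY7A is lethal, consistent with the paper's argument that spacing between intact units, not motif content, limits the insertion mutants.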
To exclude the possibility that the lethality of key CTD mutations can be compensated by increasing the number of repeats, heptad equivalents were increased to 39 for pYAL and 34 for “pYAA.” The inviability of both mutants shows either that some set of essential CTD–protein interactions is completely disrupted, rather than merely reduced in efficiency, or that the cumulative reduction in efficiency is too severe to be overcome by adding more CTD-binding sites. Combined with the evolutionary conservation of the CTD across diverse groups of eukaryotes, this suggests that at least some key functional constraints underlie strong purifying selection on the CTD's canonical heptad sequence. Because the tandem structure of the CTD is also strongly conserved, we further investigated our mutated constructs through in vitro phosphorylation by 4 different CTD-directed cyclin-dependent kinases (CDKs).
Phosphorylation of Mutated Sequences by CTD-Directed Kinases
Many regular CTD substitutions or insertions produce quantitative effects on cell growth, resulting from reduced efficiency of at least some CTD functions. In contrast, any disruption of the core functional requirements we identified results in a clear and qualitative effect, that is, lethality. This could result from an inability of the altered CTD to bind essential transcription and processing protein partners and/or from a failure of modifying enzymes to recognize mutated sequences as appropriate substrates. Because specific CTD-binding domains vary substantially, do not generally require full diheptapeptide motifs (see Discussion), and are not strongly conserved through evolution (Guo and Stiller 2005), they seem less likely candidates to explain global conservation of the CTD over broad evolutionary distances. In contrast, cycling between hypo- and hyperphosphorylated CTDs is conserved from yeast to mammals and is a prerequisite for many other essential CTD–protein interactions (Phatnani and Greenleaf 2006). Moreover, compared with most other CTD-related proteins, strong conservation of kinases and phosphatases is reasonably well correlated with retention of a canonical CTD through eukaryotic evolution (Guo and Stiller 2004). Therefore, we explored interactions between CTD kinases and our mutated constructs as the most promising avenue for understanding the mechanistic bases for strong conservation of global CTD structure.
GST–CTD fusion proteins were constructed for each mutated CTD (table 2), and their efficiencies as in vitro phosphorylation substrates were assayed using 4 different CTD-directed kinases: human CDK7/CyclinH/MAT1, CDK9/CyclinT1, and CDK8/CyclinC, as well as the yeast CDK9 homolog CTDK-I. All mutated CTDs are phosphorylated less efficiently than WT heptads. This is true even in cases with little obvious phenotypic effect, for example, single Ala insertions between various diheptapeptide repeats (fig. 2). Relative levels of phosphate incorporation by constructs with Ala substitutions and single insertions show no correlation between phosphorylation efficiency and qualitative phenotype (viable or lethal) (fig. 2). For example, pYYATA is lethal, yet its level of phosphorylation is closest to that of the WT CTD sequence with all 4 kinases. Conversely, pYAPS and pYAR both yield relatively robust mutants but are far poorer kinase substrates. Thus, although reduced phosphorylation efficiency may explain the generally slower growth of CTD mutants and could contribute to the lethality of some kinds of mutations, CTD–kinase interactions do not appear to be directly responsible for defining the essential CTD functional elements described above. Absolute amounts of phosphate transferred from ATP to the different CTD substrates by each kinase were also calculated (supplementary table, Supplementary Material online); for the same CTD alteration, different kinases show different sensitivities.
Sequences and Number of Repeats for GST–CTD Fusion Proteins and Corresponding Phenotypes for Their pY Constructs
| GST–CTD Fusion Proteins | Repeated Sequence | Repeat Number | Cell Phenotype for Their pY Constructs |
|---|---|---|---|
| AT | YSPATSPS YSPATSPS | 12 | Lethal |
| TA | YSPTASPS YSPTASPS | 6 | Lethal |
| AT2 | YSPATSPS YSPTSPS | 16 | Viable |
| TA2 | YSPTASPS YSPTSPS | 11 | Viable |
| YA | YASPTSPS YASPTSPS | 9 | Lethal |
| AA | AA YSPTSPS YSPTSPS | 20 | Viable |
| 5A | AAAAA YSPTSPS YSPTSPS | 30 | Slow growth |
| 7A | AAAAAAA YSPTSPS YSPTSPS | 27 | Lethal |
| AL | ASPTSPS YSPTSPS | 18 | Lethal |
| AR | YSPTSPS YSPTAAA | 18 | Viable |
| YAA | YAATSPS YSPTSPS | 10 | Lethal |
| YATA | YAPTAPS YSPTSPS | 18 | Lethal |
| YAS | YASPTSPS YSPTSPS | 16 | Viable |
| YAP | YAPTSPS YSPTSPS | 11 | Lethal |
| APS | YSPTAPS YSPTSPS | 20 | Viable |

In vitro phosphorylation of GST–CTD fusion proteins by 4 different CTD-directed kinases. (A) Coomassie-stained gels of original GST fusion proteins and phosphorimages for each kinase. Coomassie gels show the amount of each fusion protein loaded. (Refer to table 2 for sequences and repeat numbers for each construct). Phosphorylation intensity for each GST-mutated CTD fusion protein is shown on the phosphorimages relative to a GST-WT control. (B) Quantification of the relative phosphorylation for each GST-mutated CTD fusion. Relative phosphorylations are based on the presumed number of potential kinase recognition sites for each CTD kinase. The absolute phosphorylation intensity for each CTD fusion protein was adjusted to the potential number of phosphorylation sites and normalized to the value obtained from the GST–WT CTD control. Yellow bars represent substitution and green bars insertion mutations. Asterisks indicate viability of the respective yeast mutant.
When intact functional units are retained but separated by increasing distance (pYAA, pY5A, and pY7A), however, relative kinase efficiency and growth phenotype are directly correlated; both decrease proportionally as diheptads are moved further apart (fig. 2). In fact, we detected no phosphate incorporation into the GST-7A fusion protein with any of the CDKs tested, which could directly explain why pY7A mutants are inviable. Thus, as long as basic CTD units are held intact, increasing the distance between them has progressively deleterious and correlated effects on CTD phosphorylation and cell function.
Discussion
Phosphorylation and Conservation of a Tandemly Repeated Structure
The decrease in CTD kinase efficiencies as essential CTD motifs are pushed further apart, and its correlation with declining fitness and eventual lethality of insertion mutants, suggests a unifying explanation for broad scale evolutionary conservation of a tandemly repeated CTD structure. Reduced phosphorylation with increased distance between functional units supports the hypothesis that all CTD-directed CDKs work through progressive and/or cooperative mechanisms. This does not imply that this processivity must extend across the full CTD for any given kinase.
In vitro experiments, including our own results (supplementary fig., Supplementary Material online), indicate that the CTD is hyperphosphorylated only after the appearance of a partially phosphorylated or hypophosphorylated form (Zhang and Corden 1991; Marshall et al. 1996); moreover, contiguous phosphorylation is not consistent with current models of how the CTD phosphorylation cycle proceeds in vivo (Phatnani and Greenleaf 2006). A partially phosphorylated CTD has been shown to be a better substrate for P-TEFb than the unphosphorylated form (Fong and Bentley 2001), and several studies (Marshall et al. 1996; Ramanathan et al. 2001) indicate that faster phosphate addition by P-TEFb follows an initial slower phase of phosphorylation, on both synthetic peptides and the complete Drosophila CTD. These data support some level of cooperative CTD phosphorylation across multiple kinase target sites. Moreover, circular dichroism spectra of single CTD repeats in different phosphorylation states show that phosphorylation does not induce marked conformational changes in an aqueous environment (Noble et al. 2005). With the full-length mouse CTD, however, phosphorylation leads to a more extended structure (Zhang and Corden 1991), further supporting the idea of some cooperativity among phosphorylated repeats. These observations are consistent with our results and with the hypothesis that phosphorylation requirements are primarily responsible for strong conservation of a global tandemly repeated CTD structure; however, the lack of a clear relationship between relative phosphorylation efficiencies and mutant phenotypes suggests that these requirements are not also directly responsible for conserving the canonical sequence of the 7-amino-acid repeat.
One additional interesting aspect of our kinase results involves the relative behaviors of yeast CTDK-I and its mammalian homolog CDK9. CTDK-I phosphorylates nearly all mutated CTD substrates more efficiently, indicating that CDK9–CTD interactions are more stringently adapted to the WT CTD sequence. This result is counterintuitive given that the human CTD contains more heptads deviating from the consensus than the yeast CTD does; one might therefore predict CDK9 to be better adapted than CTDK-I to recognize our various noncanonical CTD substrates. Our contrary results suggest that phosphorylation in mammals is under more precise control, possibly involving uncharacterized divisions of labor in which alternative kinases recognize different heptapeptide substrates. The observation that inactivation of CTDK-I in yeast and of CDK9 in Drosophila results in problems with pre-mRNA processing but not with transcription elongation (Hirose and Ohkuma 2007) supports this hypothesis. Comparative evolutionary analyses show that CDK9 and CTDK-I belong to a larger subfamily of related CDKs that are likely to have CTD-specific functions; however, human CDK9 is orthologous to yeast BUR1, whereas CTDK-I is more closely related to human CDC2L5 and CrkRS (Guo and Stiller 2004). These latter human CDKs have yet to be characterized thoroughly, but they have been implicated in regulation of alternative splicing of pre-mRNA and, in the case of CrkRS, phosphorylation of the RNAP II CTD (Chen et al. 2006).
Conservation of Canonical Heptad Sequence
If the sequence motifs required for CTD function are not dictated specifically by their role as a kinase substrate, requirements for binding any number of protein partners must be responsible. Some of these may still relate to the conserved RNAP II phosphorylation cycle; indeed, an “S2-S5-S2” configuration is one requirement of both plant and yeast CTD phosphatases (Hausmann et al. 2005). Although 3-dimensional structures have been solved for only a handful of CTD–protein interactions, binding characteristics appear to be highly flexible, with the CTD assuming specific structures via an “induced fit” mechanism only when associated with its protein partners (Meinhart et al. 2005). Nevertheless, the 2 essential elements we have identified in yeast appear to be key target motifs for CTD–protein binding interactions in diverse eukaryotes (fig. 3).
Fig. 3. CTD sequence requirements for binding by various protein partners from diverse eukaryotes. To date, 6 CTD-binding domains have been studied extensively: 3 by X-ray crystallography (A–C; see Meinhart et al. [2005] for review), 2 (D–H) through genetic mutational analyses (Hausmann, Erdjument-Bromage, and Shuman 2004; Hausmann, Schwer, and Shuman 2004; Hausmann et al. 2005), and 1 by nuclear magnetic resonance titration and Biacore binding assays plus genetic analyses (I and J). For the N-terminal, CTD-interacting, and WW domains, underlined regions represent the motifs that contact the binding partner, whereas for the C-terminal domain of a breast cancer susceptibility protein (BRCT domain), underlined residues were shown to be essential for binding. The full CTD peptide fragments pictured are those used in the corresponding studies.
A comparison of crystal structures of proteins complexed with the CTD suggests that Y1 residues generally participate in CTD–protein interactions (Meinhart et al. 2005), and it is reasonable to expect that some essential interactions involve more than one tyrosine. Indeed, paired tyrosines anchor binding of the CTD to several associated protein partners. Two consecutive Y1 residues contact the Cgt1 surface through van der Waals interactions, maintaining binding of capping enzyme to the S5-phosphorylated CTD in Schizosaccharomyces pombe and mammals (Pei et al. 2001). The crystal structure of Cgt1 from Candida albicans complexed with CTD repeats shows 2 discrete CTD-binding sites that interact with adjacent Y1 residues (Fabrega et al. 2003). Although the CTD apparently can loop between these binding sites, so that the tyrosines need not lie in adjacent heptads (Schwer et al. 2001), other CTD–protein interactions probably do not permit such flexibility. Binding of yeast Set2 to the CTD likewise involves Y residues in 2 contiguous repeats that contact the Set2 SRI domain (Vojnic et al. 2006). Plant and yeast CTD phosphatases (CPLs and Ssu72, respectively) rely on tyrosines both upstream and downstream of phospho-S5 for function (Hausmann et al. 2005), whereas 2 adjacent N-terminal flanking tyrosines are required for Encephalitozoon cuniculi Fcp1 to remove S2 phosphates (Hausmann, Schwer, and Shuman 2004). Notably, Tyr side chains are ordered in CTD peptides dissolved in the hydrogen bond–promoting solvent trifluoroethanol (Bienkiewicz et al. 2000), suggesting a mechanistic basis that could promote the evolution of cooperative binding to adjacent Y1 residues.
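To make the adjacent-tyrosine arrangement concrete, the short Python sketch below scans a CTD sequence for Tyr residues spaced exactly 7 residues apart, that is, Y1 of one heptad paired with Y1 (the “Y8” position) of the next. The function name and the assumption that the sequence begins at a Y1 residue are illustrative choices, not part of the original analysis. The check is purely sequence-based: it says nothing about binding affinity, nor about the looping that lets Cgt1 contact tyrosines in non-adjacent heptads.

```python
# Hypothetical helper for illustration; not taken from the original study.
# Indices are 0-based and the sequence is assumed to start at a Y1 residue.

CANONICAL_HEPTAD = "YSPTSPS"  # yeast consensus: Y1 S2 P3 T4 S5 P6 S7


def paired_tyrosines(ctd: str) -> list[int]:
    """Return indices i where ctd[i] and ctd[i + 7] are both Tyr.

    Each hit marks Y1 of one heptad paired with Y1 of the next heptad
    (the "Y8" position), i.e. the adjacent-tyrosine arrangement that
    anchors several CTD-protein interactions.
    """
    return [i for i in range(len(ctd) - 7) if ctd[i] == "Y" and ctd[i + 7] == "Y"]


if __name__ == "__main__":
    wild_type = CANONICAL_HEPTAD * 3
    # Y1F substitution in the middle heptad (hypothetical construct).
    middle_y_to_f = CANONICAL_HEPTAD + "F" + CANONICAL_HEPTAD[1:] + CANONICAL_HEPTAD
    print(paired_tyrosines(wild_type))      # [0, 7]
    print(paired_tyrosines(middle_y_to_f))  # []
```

In this toy example a single Y1F substitution in the middle heptad removes both tyrosine pairs that heptad would otherwise join, one way to see how an isolated substitution can affect interactions spanning 2 repeats.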
In addition to the evidence for a role of adjacent tyrosines, defined CTD-binding requirements indicate the importance of contiguous, properly spaced phosphoserines (Li et al. 2005). A comparison across phospho-CTD–associated proteins (PCAPs) shows that the specific CTD residues contacted are highly variable but tend to occur over relatively short stretches. Although most are accommodated by the S2-S5-S2 essential motif (fig. 3), there are exceptions even among the relatively small number of PCAPs investigated to date. The human Set2 hSRI domain requires 4 consecutive CTD phosphoserines for maximal binding, in either a 2-5-2-5 or a 5-2-5-2 configuration (Li et al. 2005). Set2 can bind triply phosphorylated CTD heptads, although less tightly, but does not favor an S2-S5-S2 over an S5-S2-S5 configuration. The absence of an obvious preference for an S2-S5-S2 configuration by CTD-binding domains raises a key question: why is S5-S2-S5 impermissible in yeast? Tighter constraints on the evolution of yeast PCAP-binding domains may preclude the latter orientation; however, the general sequence similarity among SRI domains (Li et al. 2005), as well as the requirement of doubly phosphorylated heptads for Saccharomyces cerevisiae Set2 CTD binding (Kizer et al. 2005), do not suggest major differences between yeast and humans. Set2 is nonessential in yeast, so the inviability of S5-S2-S5 CTD mutants could reflect disruption of a small subset of critical protein interactions that likewise involve more than 1 adjacent CTD heptad but require an S2-S5-S2 configuration. Alternatively, lethality might result from the cumulative effects of suboptimal CTD–protein interactions or from the breakdown of cooperative and/or sequential binding of proteins to adjacent CTD sites.
Genetic analyses demonstrate that Ser2 and Ser5 have different functions along the CTD repeats (West and Corden 1995) and over the course of transcription of individual genes (Pinhero et al. 2004). Thus, S5-S2-S5 mutants might cross a lethal threshold of additive disruption of RNAP II–protein interactions that is not reached in S2-S5-S2 mutants. It is also possible that the reverse holds, or that neither configuration is dispensable, in other CTD-dependent RNAP II transcriptional systems.
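The distinction between contiguous, properly spaced phosphoserines and scattered ones can be expressed as a small bookkeeping exercise. The hypothetical Python sketch below assumes a CTD built from canonical YSPTSPS heptads starting at Y1 and takes phosphoserine positions as 0-based indices. It groups them into runs in which S2 and S5 sites alternate at canonical spacing (3 residues from S2 to S5, 4 from S5 to the next S2), then asks whether a run contains a given configuration such as S2-S5-S2, S5-S2-S5, or the 2-5-2-5 block reported for hSRI. All names are illustrative, and positions other than 2 and 5 are simply ignored in this sketch.

```python
# Illustrative sketch only, not the study's method. Phosphoserines are 0-based
# indices into a CTD assumed to start at Y1 and to consist of YSPTSPS heptads.

# Spacing between consecutive S2/S5 sites in canonical tandem repeats:
# S2 -> S5 in the same heptad is 3 residues; S5 -> S2 of the next heptad is 4.
STEP = {(2, 5): 3, (5, 2): 4}


def heptad_position(index: int) -> int:
    """1-based position of a residue within its heptad."""
    return index % 7 + 1


def contiguous_phospho_runs(pser: list[int]) -> list[list[int]]:
    """Group phosphoserines into runs of properly spaced, alternating S2/S5
    sites; each run is returned as its sequence of heptad positions."""
    runs: list[list[int]] = []
    prev = None
    for idx in sorted(pser):
        pos = heptad_position(idx)
        if pos not in (2, 5):
            continue  # other positions (e.g. S7) are ignored in this sketch
        if prev is not None and STEP.get((heptad_position(prev), pos)) == idx - prev:
            runs[-1].append(pos)  # canonical spacing: extend the current run
        else:
            runs.append([pos])  # gap or wrong order: start a new run
        prev = idx
    return runs


def has_configuration(pser: list[int], config: tuple[int, ...]) -> bool:
    """True if some run contains config, e.g. (2, 5, 2), as a contiguous block."""
    k = len(config)
    return any(
        tuple(run[i:i + k]) == config
        for run in contiguous_phospho_runs(pser)
        for i in range(len(run) - k + 1)
    )


if __name__ == "__main__":
    print(has_configuration([1, 4, 8], (2, 5, 2)))   # True: pS2, pS5, then next pS2
    print(has_configuration([4, 8, 11], (2, 5, 2)))  # False
    print(has_configuration([4, 8, 11], (5, 2, 5)))  # True
    print(contiguous_phospho_runs([1, 4, 15]))       # [[2, 5], [2]]
```

The last call shows why spacing matters in this representation: an S2 site 2 heptads downstream does not extend the S2-S5 run, so no S2-S5-S2 block is formed.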
Our results also help to explain contradictory evidence regarding the relative importance of the consensus CTD sequence versus repeat number. Various studies have suggested that alternative splicing (de la Mata and Kornblihtt 2006), constitutive splicing (Rosonina and Blencowe 2004), and 3′ end cleavage (Ryan et al. 2002) do not depend on a specific CTD sequence but, rather, correlate with overall CTD length. All the different CTD sequences employed in these comparisons, however, contained the essential CTD elements we have identified. These prior observations are therefore consistent with our results showing that, as long as the essential sequence motifs are not altered, the efficiency of CTD-based functions correlates with the number of heptads present.
Implications for Comparative CTD Evolution and Function
The CTD has proven to be a remarkably flexible platform for coordinating complex protein–protein interactions associated with the RNAP II transcription cycle and a variety of other nuclear functions (Zhang and Corden 1991). The analysis presented here provides a working hypothesis for the mechanistic bases of the strong evolutionary conservation of CTD structure. The joint requirements of Y1-Y8 spacing and S2-S5-S9 arrangement are maintained most efficiently by direct, tandem repeats of the canonical CTD heptapeptide. Moreover, the marked decline in CTD kinase efficiency as individual heptad pairs are pushed apart is sufficient to explain strong selection on a globally repetitive structure.
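The point that tandem repeats satisfy both elements most efficiently can be checked at the level of sequence alone. The hypothetical Python sketch below counts di-heptad windows that carry the Y1-Y8 pair and the S2-S5-S9 arrangement together, then compares 4 directly repeated heptads with the same 4 heptads arranged as 2 pairs separated by an arbitrary 3-residue Ala spacer (an assumption chosen only for illustration). It counts motif-bearing registers and does not model the kinase efficiencies measured in this study.

```python
# Hypothetical illustration only: counts sequence windows satisfying both
# essential elements at once; it does not model kinase kinetics.

HEPTAD = "YSPTSPS"  # yeast consensus heptad, Y1 at index 0


def essential_units(ctd: str) -> list[int]:
    """Return 0-based start indices i of di-heptad windows carrying both the
    Y1-Y8 pair (Tyr at i and i + 7) and the S2-S5-S9 arrangement
    (Ser at i + 1, i + 4 and i + 8)."""
    return [
        i
        for i in range(len(ctd) - 8)
        if ctd[i] == ctd[i + 7] == "Y"
        and ctd[i + 1] == ctd[i + 4] == ctd[i + 8] == "S"
    ]


if __name__ == "__main__":
    tandem = HEPTAD * 4                  # 4 heptads, directly repeated
    pair = HEPTAD * 2
    spaced = pair + "AAA" + pair         # same 4 heptads, pairs pushed apart
    print(essential_units(tandem))       # [0, 7, 14]
    print(essential_units(spaced))       # [0, 17]
```

In this toy comparison the tandem array yields 3 qualifying windows and the spaced array only 2, because no window can span the spacer; in a direct tandem array every heptad junction contributes a complete unit.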
Although these explanations emerge from analysis of the simpler yeast model, the broad evolutionary conservation of both the canonical CTD sequence and many CTD–protein interactions (the phosphorylation cycle in particular) suggests that our results may apply across diverse eukaryotes. Given the demonstrated variation in CTD function among eukaryotes, however, our results should be extended to other organisms with caution. As comparative analyses of animal CDK9 and yeast CTDK-1 indicate (fig. 2), additional core requirements for CTD function likely exist in more complex transcriptional systems, such as those of animals and higher plants. For example, although Ser7 is dispensable in S. cerevisiae, it is known to be phosphorylated in mammalian cells (Chapman et al. 2007) and is implicated in snRNA expression (Egloff et al. 2007). Bioinformatic comparisons indicate that, in addition to core functions conserved among CTD-based RNAP II transcription systems, numerous lineage-specific protein–protein interactions have been added to the CTD's repertoire through the course of eukaryotic evolution (Guo and Stiller 2005). Some of these could alter the specific residues required for cell viability in organisms other than budding yeast. Nevertheless, our results provide a baseline for comparing CTD-coordinated RNAP II transcription across diverse eukaryotes, and particularly for understanding why the CTD has been conserved so strongly in some organisms but allowed to degenerate completely in others.
Supplementary Material
Supplementary table and figure are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
We thank Hemali Phatnani and Janice Jones for their help with kinase assays, Brett Keiper and Denise Mayer for technical support, A. Carlyle Rogers for constructing the pY7A mutated CTD, and the editor and anonymous reviewers for helpful comments. This work was supported by grant MCB 0133295 from the National Science Foundation. This research was done at East Carolina University.
References
Author notes
Martin Embley, Associate Editor