Abstract

DNA hypermodifications are effective weapons that phages use to cope with bacterial defense systems. The biogenesis of DNA hypermodification in phages involves multiple steps, from the modified deoxynucleotide monophosphates to the final hypermodification on the DNA chains. Pseudomonas PaMx11 gp46 and gp47 encode the enzymes that sequentially convert 5-phosphomethyl-2′-deoxyuridine to 5-Nα-glycinylthymidine and then to 5-aminomethyl-2′-deoxyuridine. Here, we determined the crystal structures of gp46 and gp47 in their apo and double-stranded DNA (dsDNA)-bound forms. We uncovered their dsDNA recognition properties and identified the residues critical for catalysis. Combining these structures with in vitro biochemical studies, we propose a plausible reaction scheme for gp46 and gp47 in converting these DNA hypermodifications. Our studies provide a structural basis for future bioengineering of the hypermodification biosynthetic pathway and for identifying new modifications in mammals by enzyme-assisted sequencing methods.

Introduction

DNA modifications exist in all kingdoms of life. To date, various types of modifications have been identified on DNA nucleobases, ranging from simple methylation to complex glycosylation. Like histone modifications, DNA modifications can function as epigenetic markers and regulate gene expression (1). To defend against phages and other intruders, bacteria have evolved many DNA modifications, such as N4-methylcytosine, N6-methyladenine (6mA) and 5-methylcytosine (5mC), as weapons in an ‘arms race’ (2). Over this long-term battle, phages have in turn evolved many DNA modifications to overcome bacterial defenses (3,4). The DNA of many phages has been shown to be partially or even entirely substituted with modified nucleotides, usually through a three-step mechanism: (i) the canonical deoxynucleotide monophosphates (dNMPs) are converted into modified dNMPs; (ii) the modified dNMPs are converted into modified deoxynucleotide triphosphates (dNTPs) by phage and host kinases; and (iii) the modified dNTPs are incorporated into DNA by phage polymerases.

In addition to 6mA, several other types of purine modifications exist in phage DNA, including N6-carbamoyl-methyladenine (5,6), 2-aminoadenine (2,6-diaminopurine) (7,8), 7-methylguanine (9,10) and deoxyarchaeosine (1,11–13). Pyrimidine modifications are more diverse than purine modifications. Many have been identified in phage DNA, such as deoxyuracil (14,15), 5-hydroxymethyldeoxyuracil (5hmdU) (1), 5-dihydroxypentauracil (16,17), 5mC (18–20), 5-hydroxymethylcytosine (5hmC) and its glycosylated product glucosyl-5hmC (21), and 5-hydroxycytosine (22). Consistent with the idea that DNA modifications derive from modified dNMPs, in many phages deoxycytidine monophosphate and deoxyuridine monophosphate are converted by thymidylate synthase homologs into methylated or hydroxymethylated pyrimidine products, including deoxythymidine monophosphate, 5-methyldeoxycytidine monophosphate, 5-hydroxymethyl deoxyuridine monophosphate and 5-hydroxymethyl deoxycytidine monophosphate. Phage dYMP (Y = C or T) kinases and host nucleoside diphosphate kinases sequentially convert these products into dYDPs and dYTPs for DNA synthesis (1). In some phages, the target thymidine can also be oxidized directly on the synthesized DNA strands by TET_JBP family proteins, forming 5hmdU (23,24).

In some cases, the modified nucleotides within DNA undergo further modification, resulting in a new form named hypermodification (4). Recently, many types of hypermodified thymidines have been discovered in phages (Supplementary Figure S1), for instance, α-putrescinylthymidine (Nα-putT) in Delftia phage ΦW-14, α-glutamylthymidine (Nα-gluT) in Bacillus phage SP10, 5-aminoethoxy-2′-deoxymethyluridine (5-NeOmdU) in Salmonella phage ViI and 5-aminoethyl-2′-deoxyuridine (5-NedU) in Pseudomonas phage M6 (23,24). Thymidine hypermodification commonly begins with the pivotal 5hmdU DNA kinases (5-HMUDKs), which add a phosphate group to the 5-hydroxymethyl group of 5hmdU; the resulting 5-phosphomethyl-2′-deoxyuridine (5-PmdU) is then converted into different hypermodified thymidines, depending on the enzymes that coexist with 5-HMUDKs (Supplementary Figure S1) (24). In the ViI phage, the enzyme gp247 recognizes 5-PmdU and replaces its phosphate group with serine, producing 5-O-serinylthymidine (O-SerT) (24). A pyridoxal phosphate (PLP)-dependent enzyme encoded by ViI gp226 then produces 5-NeOmdU from O-SerT through decarboxylation (23). In the M6 phage, 5-PmdU undergoes more complex modifications. The enzyme gp51 catalyzes the reaction between 5-PmdU and glycine, forming 5-Nα-glycinylthymidine (Nα-GlyT). Nα-GlyT is then converted into Cα-GlyT by gp53, the first validated radical S-adenosylmethionine (SAM) enzyme involved in DNA modification (24). Finally, Cα-GlyT-modified DNA is converted to 5-NedU by a PLP-dependent enzyme encoded by M6 gp52. In Pseudomonas PaMx11-like phages, the conversion of 5-PmdU to Nα-GlyT is catalyzed by gp46. However, instead of a radical SAM enzyme and a PLP-dependent enzyme, the PaMx11 phage processes Nα-GlyT with a flavin adenine dinucleotide (FAD)-dependent oxidoreductase, gp47, and an acetyltransferase, gp48, producing the 5-aminomethyl-2′-deoxyuridine (5-NmdU) and 5-acetylaminomethyl-2′-deoxyuridine (5-AcNmdU) modifications.

Although DNA hypermodifications play critical roles in the phage life cycle, no structure has been reported for any enzyme in their biosynthetic pathways. Here, we report the crystal structures of the gp46 and gp47 proteins from Pseudomonas phage PaMx11 in their apo and DNA-bound forms. Our structures reveal the detailed DNA binding modes of gp46 and gp47 and demonstrate a conserved base-flipping mechanism for DNA hypermodification. Based on structural analyses, we propose a plausible mechanism for the gp46-catalyzed replacement of the 5-PmdU phosphate group with glycine. The overall fold of gp46 is novel, and it is the first structure of a protein containing an amino acid:DNA transferase domain. In addition to DNA and FAD binding, the gp47 complex structure also reveals the mechanism for converting Nα-GlyT to 5-NmdU. Together, our studies shed light on the biosynthetic mechanisms of DNA hypermodifications and provide a solid basis for applying these enzymes to detect potential hypermodifications in mammals.

Materials and methods

Protein expression and purification

The full-length genes of Pseudomonas PaMx11 gp46/PUGT (UniProt ID: A0A0S0MVI5) and Pseudomonas PaMx11 gp47/NGTO (UniProt ID: A0A0S0N8M3) were obtained from Tsingke Biotechnology Co., Ltd and ligated between the BamHI and XhoI sites of a pSumo plasmid modified from pET-28a. A Ulp1 cleavage site was inserted between the Sumo tag and the multiple cloning site. Mutants were generated by overlap polymerase chain reaction and confirmed by DNA sequencing. The recombinant plasmids were transformed into Escherichia coli BL21(DE3) to produce the target proteins with an N-terminal His-Sumo fusion tag. Cells were cultured in Luria-Bertani (LB) medium containing 50 μg/ml kanamycin at 37°C; when the OD600 reached 0.6–0.8, expression was induced with 0.2 mM isopropyl-β-d-thiogalactoside at 18°C for 16 h. Cells were harvested by centrifugation, resuspended in lysis buffer [20 mM Tris–HCl (pH 8.0), 500 mM NaCl and 20 mM imidazole (pH 8.0)] and lysed under high pressure. The lysate was centrifuged at 17 000 rpm for 1 h at 4°C. The supernatant was loaded onto Ni-NTA resin, which was washed with lysis buffer, and the target protein was eluted with buffer containing 20 mM Tris–HCl (pH 8.0), 500 mM NaCl and 500 mM imidazole (pH 8.0). Ulp1 protease was added to remove the N-terminal His-Sumo tag, and the mixture was dialyzed against lysis buffer for 3 h. The mixture was then applied to a second Ni-NTA column to remove His-tagged proteins. The eluted protein was concentrated by centrifugal ultrafiltration, loaded onto a HiLoad 16/600 Superdex 75 column pre-equilibrated with 10 mM Tris–HCl (pH 8.0) and 100 mM NaCl, and eluted with the same buffer at a flow rate of 1 ml/min. Purified fractions were pooled and concentrated by centrifugal ultrafiltration. Protein concentration was determined from the absorbance at 280 nm (A280), and the protein was diluted to 10 mg/ml for crystallization trials.
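Setting the protein to 10 mg/ml from its A280 is Beer-Lambert arithmetic: molar concentration = A280 / (ε × path length), and mg/ml = molar concentration × molecular weight. A minimal Python sketch of that arithmetic and of the C1V1 = C2V2 dilution step follows. The extinction coefficient (ε), molecular weight, aliquot dilution and reading are hypothetical placeholders, since the Methods do not report them, and the helper names are illustrative only.

```python
# Sketch: protein concentration from A280 (Beer-Lambert law) and the dilution
# to a target concentration. All numeric inputs in the example are hypothetical
# placeholders, not values reported for gp46/PUGT or gp47/NGTO.


def concentration_mg_per_ml(a280, ext_coeff_m, mw_da, path_cm=1.0,
                            dilution_factor=1.0):
    """Return protein concentration in mg/ml.

    a280             blank-subtracted absorbance at 280 nm
    ext_coeff_m      molar extinction coefficient (M^-1 cm^-1)
    mw_da            molecular weight (Da, i.e. g/mol)
    path_cm          cuvette path length (cm)
    dilution_factor  fold dilution of the aliquot that was measured
    """
    molar = a280 * dilution_factor / (ext_coeff_m * path_cm)  # mol/L
    return molar * mw_da  # g/L, which equals mg/ml


def stock_volume_ul(stock_mg_ml, target_mg_ml, final_volume_ul):
    """Stock volume (ul) to reach target_mg_ml in final_volume_ul (C1V1 = C2V2)."""
    if target_mg_ml > stock_mg_ml:
        raise ValueError("stock is below the target; concentrate further first")
    return target_mg_ml * final_volume_ul / stock_mg_ml


if __name__ == "__main__":
    # Hypothetical reading: a 1:20 aliquot gives A280 = 0.5.
    stock = concentration_mg_per_ml(0.5, ext_coeff_m=30_000, mw_da=33_000,
                                    dilution_factor=20)
    print(f"stock: {stock:.2f} mg/ml")
    print(f"stock needed for 100 ul at 10 mg/ml: "
          f"{stock_volume_ul(stock, 10.0, 100.0):.1f} ul")
```

In practice ε and the molecular weight would be computed from the expressed construct sequence after tag removal.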

Electrophoretic mobility shift assay

For gp46/PUGT and its mutants, 10 nM double-stranded DNA (dsDNA; 5′-O-Phos-TAGTCGTCGACTT-3′ and 5′-Cy5.5-labeled-AAGTCGACGTCA-3′) was incubated with increasing concentrations of gp46/PUGT protein ranging from 1 to 500 μM. Each 10 μl reaction contained 2 μl of 5× reaction buffer [100 mM NaCl, 100 mM Tris (pH 8.0), 5 mM dithiothreitol (DTT), 5 mM CaCl2 and 25% glycerol]. The dsDNA and gp46/PUGT mixtures were incubated for 1 h on ice and then resolved on 6% native polyacrylamide gels (PAGE) for 1 h at 70 V. The running buffer was 0.5× TBE containing 45 mM Tris–borate (pH 8.0) and 1 mM ethylenediaminetetraacetic acid. For gp47/NGTO, 10 nM dsDNA (5′-Cy5.5-labeled-TAGTCATGACT-3′, 5′-Cy5.5-labeled-TAGTCATGACT-3′, containing a 1-nt 5′-overhang at each end) was incubated with increasing concentrations of gp47/NGTO protein ranging from 1 to 16 μM. Each 10 μl reaction contained 2 μl of 5× reaction buffer [100 mM NaCl, 100 mM Tris (pH 8.0), 5 mM DTT and 25% glycerol]. The dsDNA and gp47/NGTO mixtures were incubated for 1 h on ice and resolved by 6% native PAGE at 4°C in 0.5× TBE running buffer for 10 h at 70 V. Gels were imaged on a Sapphire Biomolecular Imager (Azure Biosystems) using the Cy5.5 setting (700 nm laser). Free and bound DNA were quantified using ImageJ. Binding curves were fitted individually in GraphPad Prism 10 using the ‘One site – Specific binding with Hill slope’ model. Curves were normalized as the percentage of bound oligonucleotide, and the interpolated dissociation constants (Kd) are reported as the mean ± standard error of the mean (SEM) from three independent experiments.
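The curve fitting described above can be sketched in plain Python. The model is the form GraphPad documents for ‘One site – Specific binding with Hill slope’, Y = Bmax·X^h / (Kd^h + X^h), with Y as percent DNA bound and X as protein concentration. This sketch assumes Bmax is fixed at 100% after normalization and swaps Prism's nonlinear least-squares routine for a simple grid search over Kd and the Hill slope h, so it illustrates the model rather than reproducing the reported fits; the titration values are synthetic.

```python
import math
import statistics


def percent_bound(bound_intensity, free_intensity):
    """Percentage of DNA shifted, from densitometry of bound and free bands."""
    return 100.0 * bound_intensity / (bound_intensity + free_intensity)


def hill(x, kd, h, bmax=100.0):
    """One site specific binding with Hill slope: Y = Bmax*X^h / (Kd^h + X^h)."""
    return bmax * x ** h / (kd ** h + x ** h)


def fit_hill(conc, pct, kd_bounds=(0.1, 1000.0), h_bounds=(0.5, 3.0), steps=200):
    """Grid-search (Kd, h) minimizing squared residuals, with Bmax fixed at 100 %.

    Kd is scanned on a log10 grid and h on a linear grid. This is a coarse
    illustration, not the nonlinear least-squares routine Prism uses.
    """
    log_lo, log_hi = math.log10(kd_bounds[0]), math.log10(kd_bounds[1])
    best_sse, best_kd, best_h = math.inf, None, None
    for i in range(steps + 1):
        kd = 10 ** (log_lo + (log_hi - log_lo) * i / steps)
        for j in range(steps + 1):
            h = h_bounds[0] + (h_bounds[1] - h_bounds[0]) * j / steps
            sse = sum((y - hill(x, kd, h)) ** 2 for x, y in zip(conc, pct))
            if sse < best_sse:
                best_sse, best_kd, best_h = sse, kd, h
    return best_kd, best_h


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Synthetic, noise-free data generated with Kd = 5 uM and h = 1.2 over a
    # 1-16 uM titration; these are not measured values.
    conc = [1.0, 2.0, 4.0, 8.0, 16.0]
    pct = [hill(x, 5.0, 1.2) for x in conc]
    kd, h = fit_hill(conc, pct)
    print(f"fitted Kd = {kd:.2f} uM, h = {h:.2f}")
```

A grid search is slow but transparent and needs only the standard library; a real analysis would use a proper nonlinear least-squares fitter, and the per-replicate Kd values would then be summarized with `mean_sem`.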

Crystallization

Crystals of apo and DNA-bound Pseudomonas phage PaMx11 gp46/PUGT were grown by the hanging-drop vapor diffusion method at 18°C. For the apo form, each drop contained 1 μl protein and 1 μl reservoir solution composed of 0.1 M sodium chloride, 0.1 M HEPES (4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid) (pH 7.5) and 1.6 M ammonium sulfate. To prepare the gp46/PUGT–DNA complex, gp46/PUGT and DNA were mixed at a molar ratio of 1:1.2 at room temperature. Each drop contained 1 μl protein–DNA complex and 1 μl reservoir solution composed of 20% (w/v) polyethylene glycol (PEG) 8,000, 0.1 M MES (2-(N-morpholino)ethanesulfonic acid)/sodium hydroxide (pH 6.0) and 0.2 M calcium acetate.

Apo and DNA-bound Pseudomonas phage PaMx11 gp47/NGTO were also crystallized by the hanging-drop vapor diffusion method at 18°C. For the apo form, 1 μl full-length gp47/NGTO protein was mixed with 1 μl reservoir solution composed of 1.0 M potassium/sodium tartrate, 0.1 M CHES (2-(N-cyclohexylamino)ethanesulfonic acid)/sodium hydroxide (pH 9.5) and 0.2 M lithium sulfate. To prepare the gp47/NGTO–dsDNA complex, NGTO and dsDNA were mixed at a molar ratio of 1:1.2 at room temperature. Each drop contained 1 μl NGTO–dsDNA complex and 1 μl reservoir solution composed of 20% (w/v) PEG 4,000, 0.1 M sodium citrate/citric acid (pH 5.5) and 10% (v/v) 2-propanol.
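Both complexes were assembled at a 1:1.2 protein:DNA molar ratio. The volume arithmetic is sketched below in Python, assuming hypothetical stock concentrations and a hypothetical protein molecular weight (the Methods give neither); it relies on the identity 1 μM = 1 pmol/μl.

```python
# Sketch: volumes for a 1:1.2 protein:dsDNA molar ratio. The stock
# concentrations and molecular weight in the example are hypothetical.


def mg_ml_to_um(conc_mg_ml, mw_da):
    """mg/ml equals g/L; dividing by g/mol gives mol/L, then scale to uM."""
    return conc_mg_ml / mw_da * 1e6


def dna_volume_ul(protein_um, protein_volume_ul, dna_stock_um, ratio=1.2):
    """Volume of duplex stock giving `ratio` mol dsDNA per mol protein.

    1 uM equals 1 pmol/ul, so uM * ul gives pmol.
    """
    protein_pmol = protein_um * protein_volume_ul
    return ratio * protein_pmol / dna_stock_um


if __name__ == "__main__":
    protein_um = mg_ml_to_um(10.0, 33_000)  # hypothetical 33 kDa protein at 10 mg/ml
    vol = dna_volume_ul(protein_um, protein_volume_ul=20.0, dna_stock_um=1000.0)
    print(f"protein {protein_um:.1f} uM; add {vol:.2f} ul of 1 mM duplex to 20 ul")
```

Adding the DNA also dilutes the protein slightly, which matters if the drop composition has to match a screened condition exactly.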

Data collection, structure determination and refinement

All crystals were cryoprotected in their mother liquor supplemented with 25% glycerol and flash-frozen in liquid nitrogen. Diffraction data were collected at beamlines BL17B, BL18U1 and BL19U1 of the National Facility for Protein Science in Shanghai at the Shanghai Synchrotron Radiation Facility. For Pseudomonas PaMx11 gp46/PUGT, the diffraction data set was processed and scaled using XDS (25), and phases were determined by molecular replacement (MR) with Phaser, using a model predicted by Rosetta as the search model (26,27). For Pseudomonas PaMx11 gp47/NGTO, the data set was likewise processed and scaled with XDS, and phases were determined by MR using a Rosetta-predicted model as the search model (26,27). For both proteins, iterative cycles of refinement and model building were carried out in PHENIX and COOT (28,29). The PUGT–Ca2+–DNA and NGTO–FAD–dsDNA complexes were also solved by MR, using the apo PUGT and apo NGTO structures as search models, respectively. Detailed statistics of data collection, processing and refinement are presented in Table 1. All structure figures were prepared with PyMOL (Schrödinger, Inc.).

Table 1.

Statistics of data collection and refinement

| | PaMx11 gp46 | PaMx11 gp46–DNA complex | PaMx11 gp47 | PaMx11 gp47–DNA complex |
|---|---|---|---|---|
| PDB code | 8Z2M | 8Z2N | 8Z2O | 8Z79 |
| Data collection | | | | |
| Wavelength (Å) | 0.97852 | 0.97915 | 0.97852 | 0.97852 |
| Space group | P64 | P212121 | P3221 | P21 |
| Cell dimensions a, b, c (Å) | 77.13, 77.13, 86.39 | 57.79, 92.03, 167.64 | 86.56, 86.56, 146.70 | 79.38, 102.77, 89.07 |
| Cell dimensions α, β, γ (°) | 90.00, 90.00, 120.00 | 90.00, 90.00, 90.00 | 90.00, 90.00, 120.00 | 90.00, 107.98, 90.00 |
| Resolution (Å) | 30–2.0 (2.11–2.00) | 30–2.30 (2.42–2.30) | 30–2.30 (2.42–2.30) | 30–2.65 (2.79–2.65) |
| Rmerge | 0.139 (1.218) | 0.154 (1.798) | 0.126 (1.201) | 0.118 (0.779) |
| CC1/2 | 0.999 (0.823) | 0.999 (0.827) | 0.999 (0.920) | 0.997 (0.785) |
| I/σI | 20.7 (3.0) | 12.7 (1.5) | 21.4 (3.0) | 9.7 (2.0) |
| Completeness (%) | 99.9 (99.9) | 99.9 (99.9) | 99.9 (100.0) | 99.8 (100.0) |
| Redundancy | 20.3 (19.9) | 13.5 (13.9) | 19.6 (18.1) | 4.7 (4.8) |
| Refinement | | | | |
| Resolution (Å) | 30.0–2.0 | 30.0–2.30 | 30.0–2.30 | 30.0–2.65 |
| No. of reflections | 19 754 | 40 471 | 28 848 | 39 501 |
| Rwork/Rfree | 0.171/0.197 | 0.226/0.275 | 0.188/0.211 | 0.209/0.246 |
| No. of atoms | 2461 | 5247 | 2319 | 9549 |
|   Protein | 2305 | 4684 | 2129 | 8449 |
|   DNA | – | 502 | – | 888 |
|   Ligand/water | 156 | 61 | 190 | 212 |
| B-factors (Å²) | 29.21 | 59.41 | 50.21 | 49.14 |
| Root-mean-square deviations (RMSDs) | | | | |
|   Bond lengths (Å) | 0.008 | 0.010 | 0.008 | 0.009 |
|   Bond angles (°) | 0.795 | 1.062 | 0.923 | 1.264 |
| Ramachandran plot | | | | |
|   Favored/allowed (%) | 98.95/0.7 | 98.43/1.57 | 96.81/2.48 | 98.40/1.42 |

The highest resolution shell is shown in parentheses.
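The agreement statistics in Table 1 follow standard definitions: Rmerge = Σhkl Σi |Ii − ⟨I⟩| / Σhkl Σi Ii, and R = Σ ||Fobs| − |Fcalc|| / Σ |Fobs|, where Rwork is computed over reflections used in refinement and Rfree over the test set withheld from it. The Python sketch below implements these definitions on toy numbers only. The reported values come from XDS and PHENIX, which also scale Fcalc to Fobs; this sketch assumes that scaling has already been applied, and its function names are illustrative.

```python
# Sketch of Rmerge, Rwork and Rfree from their textbook definitions.
# Inputs are toy numbers; scaling between F_obs and F_calc is assumed done.
from statistics import mean


def r_merge(observations):
    """observations maps an hkl index to the list of its measured intensities."""
    num = den = 0.0
    for intensities in observations.values():
        avg = mean(intensities)
        num += sum(abs(i - avg) for i in intensities)
        den += sum(intensities)
    return num / den


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|)."""
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return num / sum(abs(fo) for fo in f_obs)


def r_work_r_free(f_obs, f_calc, free_flags):
    """Split reflections by their free-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))


if __name__ == "__main__":
    print(r_merge({(1, 0, 0): [10.0, 12.0], (0, 1, 0): [5.0]}))
    print(r_work_r_free([100.0, 200.0, 50.0], [90.0, 220.0, 55.0],
                        [False, False, True]))
```

Because the free set is excluded from refinement, a large gap between Rfree and Rwork is the usual warning sign of overfitting.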

Results

Crystal structure of apo-gp46

Pseudomonas PaMx11 gp46 is 291 amino acids (aa) long and catalyzes the in situ synthesis of the Nα-GlyT hypermodification from 5-PmdU (Figure 1A) (24). To unravel the structural basis for its DNA recognition and catalytic mechanism, we purified full-length gp46 and performed extensive crystallization trials. The apo structure of gp46 was determined at 2.0 Å resolution (Table 1). As shown in Figure 1B, gp46 is composed of 15 α-helices, which assemble into two helix bundles, HB1 and HB2. HB1 is formed by α1 from the N-terminus and α10–α13 from the C-terminal half. HB2 is larger and is formed by α2–α9 and α14–α15. Between HB1 and HB2 lies a narrow, deep groove that is strongly negatively charged. In contrast, the surrounding regions are positively charged and form stable electrostatic interactions with four sulfate ions (Figure 1C). As demonstrated by many structures, such as the Tet-like dioxygenase complexed with 5mC DNA (30) and HhaI endonuclease (31), sulfate can mimic DNA phosphate groups in interacting with proteins.

Figure 1.

The structure of 5-PmdU glycinyltransferase PUGT. (A) The stepwise catalytic mechanism for the production of 5-NmdU hypermodification. (B) The overall structure of the apo-PUGT protein is shown as cartoon and colored in the spectrum. The two helix bundles are indicated by dashed boxes. (C) Electrostatic surface representation of the apo-PUGT protein. The potential substrate binding region and catalytic center are indicated by a dashed circle. (D) Sequence of the dsDNA used for crystallizing with the PUGT enzyme. The dashed circles indicate the missing nucleotides in the crystal structure of the PUGT–Ca2+–DNA complex. (E) The structure of the PUGT–Ca2+–DNA complex. The two protomers of the PUGT protein are colored green and wheat, and the two DNA strands are colored yellow and purple, respectively. The 2Fo–Fc electron density maps were contoured at 1.5σ. (F) The overall structure of the PUGT complexed with dsDNA. Loop56 is colored orange and indicated by the black arrow. (G) The surface representation of the PUGT–Ca2+–DNA complex. The target nucleotide inserted into the catalytic center is indicated by a yellow dashed circle.

After determining the gp46 structure, we searched for structural homologs of PaMx11 gp46 using the Dali server (32), but no close homolog was identified. The structures most similar to gp46 in overall fold were glycosidases and DNA lyases, with Z-scores of only ∼6, and sequence alignment showed that gp46 shares only ∼10% identity with these glycosidases or DNA lyases. These analyses suggest that gp46 is novel in both function and overall fold. To reflect its function, gp46 is hereafter renamed PUGT (phosphorylated-5hmdU glycinyltransferase).
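Percent identity values such as the ∼10% quoted above depend on how the denominator is defined. A minimal Python sketch follows, assuming identity is counted over aligned columns where neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, so values are not always comparable. The sequences in the test are toy strings, not PUGT or its Dali hits.

```python
# Sketch: percent identity over gap-free columns of a pairwise alignment.
# The convention (gap-free columns as denominator) is one of several in use.


def percent_identity(aln_a, aln_b):
    """Identity (%) between two gapped, equal-length aligned sequences."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    aligned = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not aligned:
        return 0.0
    identical = sum(1 for a, b in aligned if a == b)
    return 100.0 * identical / len(aligned)


if __name__ == "__main__":
    print(percent_identity("AC-GT", "ACTGA"))  # toy alignment
```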

The structure of PUGT in complex with dsDNA

Like the apo structure, the DNA-bound PUGT structure was determined at high resolution (2.3 Å; Table 1). The DNA is formed by two complementary strands (strand A: 5′-T1A2G3T4C5G6T7C8G9A10C11T12A13-3′; strand B: 5′-T1′A2′G3′T4′C5′G6′A7′C8′G9′A10′C11′T12′A13′-3′) (Figure 1D). The PUGT–DNA complex crystallized in the P212121 space group, with one DNA duplex and two PUGT molecules per asymmetric unit. Each PUGT molecule binds one end of the DNA duplex, forming a dumbbell-like, symmetric arrangement (Figure 1E).

In the PUGT–dsDNA structure, nucleotides 1–12 are well ordered, whereas the adenines at the 3′-ends of both strands are largely disordered (Figure 1E). The 5′-terminal thymidine (T1) of each strand is flipped out and inserted into the catalytic site of PUGT (Figure 1E). Although no divalent cation was added to the protein–DNA sample, strong electron densities were observed in the catalytic sites (Supplementary Figure S2A). The strength of these densities and their coordination suggest that they arise from divalent cations, most likely the Ca2+ present in the crystallization condition. We therefore named this structure the PUGT–Ca2+–dsDNA complex. Structural superposition showed that the two PUGT protomers are virtually identical (Supplementary Figure S2B), with an RMSD of only 0.163 Å. Apart from the central T7:A7′ pair, the base pairs at the two ends of the duplex are identical. Therefore, only one PUGT protomer is used below to analyze the detailed DNA binding mode (Figure 1F).
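The 0.163 Å value is a root-mean-square deviation over paired atoms after superposition. The Python sketch below computes only that final RMSD; it assumes the two coordinate sets are already superposed and matched atom by atom (for example, Cα atoms of the two protomers), omits the optimal-rotation (Kabsch) step, and does not model the outlier rejection that some alignment programs apply before reporting RMSD.

```python
# Sketch: RMSD between two already-superposed, atom-matched coordinate sets.
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as input, e.g. angstroms)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need equal, non-empty lists of paired atoms")
    sq = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
             for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(sq / len(coords_a))


if __name__ == "__main__":
    # Toy coordinates: the second set is the first shifted by 0.2 A along x.
    a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
    b = [(x + 0.2, y, z) for x, y, z in a]
    print(f"{rmsd(a, b):.3f} A")
```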

Detailed interactions between PUGT and dsDNA

The loop between α5 and α6 (hereafter named Loop56) blocks the duplex from extending beyond the A2:T12′ base pair (Figure 1F). The narrow, deep groove is formed by Loop56, α9, α11 and α13, and the 5′-terminal thymine (T1) is engulfed in the catalytic site within this groove (Figure 1G). The T1 nucleotide is inserted into a pocket surrounded by positively charged residues (Figure 1G). In the PUGT–Ca2+–dsDNA complex, the two PUGT protomers form identical interactions with the DNA (Figure 2A and B). Rather than making base-specific contacts, PUGT mainly recognizes the DNA backbone. The phosphate groups of G3 and T4 hydrogen bond to the main chains of Gly144 and Leu142, respectively, while the side chain of Arg141 interacts with the phosphate group of C5 (Figure 2A and B). Interestingly, Arg94 hydrogen bonds to the backbone phosphate group of A2 and, together with the stacking interaction of His95, blocks further extension of the duplex (Figure 2B). The residues lining the narrow groove admit only single-stranded DNA, so only the target nucleotide to be modified can enter. Through hydrophobic interactions with the deoxyribose of T1, Phe173 acts as a wedge and Trp146 forms the bottom of the pocket; together with Arg94 and His95, they force the nucleobase of T1 into the catalytic pocket (Figure 2C). This base-flipping mechanism has been observed in many other enzymes, such as DNA methyltransferases and DNA glycosylases (1).

Figure 2.

Detailed interactions between the PUGT protein and dsDNA. (A) The schematic of the interactions between the PUGT enzyme and dsDNA. (B) Detailed interactions between the PUGT protein and the dsDNA backbone. The residues involved in dsDNA binding are shown as sticks. (C) Electrostatic surface representation of the target nucleotide entry site. (D) The detailed conformation of the target nucleotide and the surrounding residues. Ca2+ is shown as a gray sphere, and the water molecules are shown as red spheres. The glycine residue is colored yellow. The 2Fo–Fc electron density maps were contoured at 1.5σ. (E) The schematic of the interactions between the target nucleotide and the residues within the catalytic center of PUGT. (F) The potential phosphate group release site is indicated by a yellow dashed circle. The residues that may play roles in releasing the leaving phosphate group are shown as sticks. (G) Conformational changes associated with dsDNA binding. (H) The rotamer changes of the residues involved in dsDNA binding are indicated by arrows and shown as sticks.

The active site of PUGT

In the PUGT–Ca2+–dsDNA complex structure, Ca2+ is hexacoordinated: in addition to the O4 atom of the thymine, it is coordinated by the side chains of Glu243 and Thr244 and by three water molecules (Figure 2D and E). The nucleobase of T1 is clamped by the side chains of Cys247 and Trp146 (Figure 2D). The C5 methyl group of T1 points toward a minor pocket formed by Glu92, Tyr51, Tyr47, Glu265, Asp262, Lys261 and Asp24 (Figure 2F), which could be the pocket for binding and releasing the leaving phosphate group. Glycine was also included in the crystallization sample, and one glycine molecule is captured in each catalytic site, as supported by clear electron density (Figure 2D). The glycine stacks with the nucleobase of T1 and is stabilized by hydrogen bonds with the side chains of Asp24, Lys150 and Trp146 (Figure 2D and E). Specifically, the carboxyl group of glycine hydrogen bonds to the NE1 atom of Trp146, and its amino group is bound by the side chain of Lys150 and by the side chain of Asp24, either directly or via a water molecule (Figure 2D and E). Because Trp146 and Lys150 play a central role in stabilizing glycine, they may also be pivotal for the catalytic reaction (Figure 2E).
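Assigning a coordination number such as the hexacoordination described above amounts to counting ligand atoms within a distance cutoff of the metal. A minimal Python sketch follows; the 2.8 Å cutoff and the idealized octahedral coordinates are illustrative assumptions, not values taken from the PUGT–Ca2+–dsDNA model (Ca2+–O distances are commonly around 2.3–2.5 Å), and the atom labels are hypothetical.

```python
# Sketch: list ligand atoms within a distance cutoff of a metal ion.
# Cutoff, coordinates and labels are illustrative assumptions.
import math


def coordination_shell(metal_xyz, ligand_atoms, cutoff=2.8):
    """Return labels of atoms within `cutoff` angstroms of the metal, nearest first."""
    hits = [(math.dist(metal_xyz, xyz), label) for label, xyz in ligand_atoms.items()]
    return [label for d, label in sorted(hits) if d <= cutoff]


if __name__ == "__main__":
    # Idealized octahedron at 2.4 A around a metal at the origin; the labels
    # echo the text, but the coordinates are invented for illustration.
    r = 2.4
    atoms = {
        "T1 O4": (r, 0.0, 0.0),
        "Glu243 side-chain O": (-r, 0.0, 0.0),
        "Thr244 side-chain O": (0.0, r, 0.0),
        "water 1": (0.0, -r, 0.0),
        "water 2": (0.0, 0.0, r),
        "water 3": (0.0, 0.0, -r),
        "distant atom": (5.0, 0.0, 0.0),
    }
    shell = coordination_shell((0.0, 0.0, 0.0), atoms)
    print(len(shell), "ligands:", shell)
```

A distance count alone does not confirm geometry; in practice the bond angles and the chemistry of each ligand would also be checked.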

DNA binding-induced conformational change of PUGT

The individual HB1 and HB2 subdomains have similar folds in the apo and complex structures of PUGT (Supplementary Figure S2C). However, structural superposition showed that HB1 moves toward HB2 in the complex structure, forming a more rigid interface for DNA binding (Figure 2G). A set of residues changes conformation upon dsDNA binding, and almost all of them are involved in substrate binding (Figure 2H). Arg97 rotates toward the minor groove of the dsDNA to avoid a clash with the DNA backbone (Figure 2H). Arg94 and His95 change their conformations to stack with the A2:T12′ base pair. Together with the rotation of the Phe173 and Trp146 side chains, they block further extension of the duplex and thereby kink the to-be-modified nucleotide into the catalytic pocket. Asp175 and Cys247 rotate their side chains to interact with the target nucleoside (Figure 2H).

Validation of the interactions between PUGT and dsDNA

To gain more insight into the catalytic mechanism of PUGT, we also carried out crystallization trials of PUGT with dsDNA containing the 5-PmdU modification, but no crystals were obtained. We then designed a series of PUGT mutants and measured their DNA binding by the electrophoretic mobility shift assay (EMSA) (Figure 3 and Supplementary Figure S3). Wild-type PUGT bound and shifted the Cy5.5-labeled dsDNA substrate with a Kd of ∼278 μM (Figure 3A and B, and Supplementary Figure S3A). In contrast, the R94A mutant showed no detectable DNA binding, and the H95A mutant shifted only a trace amount of DNA, indicating that both residues are critical for DNA binding (Figure 3C and D, and Supplementary Figure S3B and C). Because Leu142 interacts with the DNA substrate through its main-chain N atom, the L142A substitution had no obvious impact on DNA binding (Figure 3E and Supplementary Figure S3D). Phe173 acts as a ‘wedge’ that, together with Arg94 and His95, kinks the to-be-modified nucleoside into the binding pocket; accordingly, the F173A mutant showed dramatically reduced DNA binding (Figure 3F and Supplementary Figure S3E). Mutating other DNA-interacting residues also impaired DNA binding by PUGT to some extent. Compared with wild-type PUGT, the D175A, C247A and W146A mutants showed 1.7–2.4-fold lower DNA binding affinities, whereas the D24A, R97A, K150A and E243A mutants showed larger reductions (Figure 3A and G–M, and Supplementary Figure S3F–L). Consistent with the structural observations, these EMSA results confirm the functional roles of these DNA-interacting residues.

The EMSA experiments showing the impacts of key residue mutation on DNA binding by the PUGT protein. (A) Statistics of the DNA binding affinities of the wild-type or mutated PUGT proteins. W.B. represents the weak binding ability. Values are expressed as the mean ± SEM. The EMSA results of Cy5.5-labeled dsDNA substrate with (B) wild-type PUGT, (C) PUGT-R94A, (D) PUGT-H95A, (E) PUGT-L142A, (F) PUGT-F173A, (G) PUGT-W146A, (H) PUGT-D24A, (I) PUGT-E243A, (J) PUGT-K150A, (K) PUGT-C247A, (L) PUGT-R97A and (M) PUGT-D175A.
Figure 3.

To further dissect the potential interactions between PUGT and dsDNA, we performed molecular docking of PUGT with the dsDNA from the methyltransferase HhaI–dsDNA complex (PDB code: 1MHT) using the HDOCK online server (33). Like PUGT, HhaI methyltransferase uses a typical base-flipping mechanism to recognize the to-be-modified nucleotide (34). In the modeled complex (Supplementary Figure S4A), PUGT binds the 12-bp dsDNA with the flipped-out nucleobase embedded in the middle. The bottom half of the DNA adopts a conformation similar to that observed in the PUGT–Ca2+–DNA complex structure (Supplementary Figure S4B). The backbone of the upper half lies near Arg86 and Arg97 of Loop56 and His256 at the C-terminus of α13 (Supplementary Figure S4C and D).

A BLAST search identified 12 PUGT homologs in the UniProt database (Supplementary Figure S5). The residues critical for catalysis are not conserved in the homolog from the plant Momordica charantia, but they are well conserved in PUGT homologs from phages. This is especially true of the M6 gp51 enzyme, which has the same activity as PaMx11 PUGT (Supplementary Figure S1). Compared with other regions, the ‘Arg-rich’ Loop56 and the ‘FGPW’ motif-containing helix α9 show higher sequence conservation. As suggested by the PUGT–Ca2+–DNA complex and the model described above, these two regions are responsible for dsDNA binding and for kinking the target nucleotide into the catalytic pocket. The residues lining the phosphate-releasing pocket are also conserved in phages; although not strictly identical, most are hydrophilic and negatively charged (Supplementary Figure S5). Interestingly, Trp146 of PUGT is replaced by a tyrosine in Delftia phage ΦW-14, which may contribute to binding putrescine as the adduct (Supplementary Figure S1).
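
Statements such as "higher sequence conservation" for Loop56 and helix α9 are usually read off a multiple sequence alignment. The excerpt does not state which metric underlies Supplementary Figure S5. One minimal per-column score, assumed here for illustration, is the fraction of non-gap residues that match the column's most common residue. The function names and the toy alignment are hypothetical, loosely modeled on an ‘FGPW’-like segment that includes a W-to-Y substitution.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per-column identity: share of non-gap residues equal to the column's most common one."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        _, top_count = Counter(residues).most_common(1)[0]
        scores.append(top_count / len(residues))
    return scores


def conserved_positions(alignment: list[str], threshold: float = 0.8) -> list[int]:
    """1-based alignment columns whose conservation score meets the threshold."""
    return [i + 1 for i, s in enumerate(column_conservation(alignment)) if s >= threshold]


toy = ["FGPWR", "FGPWK", "FGPYR", "FG-WR"]  # hypothetical aligned segment
print(column_conservation(toy))  # [1.0, 1.0, 1.0, 0.75, 0.75]
print(conserved_positions(toy))  # [1, 2, 3]
```

Excluding gaps inflates the scores of gappy columns, such as column 3 here. Gap-penalizing or substitution-matrix-based scores are common alternatives.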

Crystal structure of PaMx11 gp47

The PUGT-catalyzed replacement of the 5-PmdU phosphate group with glycine produces the Nα-GlyT modification on DNA. This modification is then recognized and converted into the 5-NmdU modification by the Pseudomonas PaMx11 gp47 protein (Figure 1A and Supplementary Figure S1A) (24). To gain further insight into the biosynthetic mechanism of DNA hypermodification, we also performed extensive crystallization trials for PaMx11 gp47. One apo-form (DNA-free) PaMx11 gp47 structure was determined at 2.30 Å resolution in the P3221 space group (Table 1). Although no exogenous FAD was added to the crystallization sample, an endogenous FAD molecule was captured by gp47 in the structure (Figure 4A and B). The gp47 protein comprises two domains: an FAD-binding domain with a typical Rossmann fold and a substrate-binding domain. Based on its overall fold, especially the classic FAD-binding domain, gp47 belongs to the glutathione reductase 2 (GR2) family. Most GR2 family proteins are FAD-containing flavoproteins, including glycine oxidase (35). Based on these analyses, we renamed the PaMx11 gp47 protein Nα-GlyT oxidase (NGTO), and the gp47–FAD complex is referred to as the NGTO–FAD complex hereafter. Besides FAD, CHES and sulfate molecules from the crystallization conditions were also captured in the NGTO–FAD complex structure (Figure 4B). Interestingly, one CHES molecule occupies a prominent pocket above the isoalloxazine ring of FAD (Figure 4C). The region surrounding the FAD-binding pocket is positively charged and, as in PUGT, binds a sulfate ion, suggesting a potential surface for binding of the DNA backbone by NGTO (Figure 4C).

The overall structure of 5-NmdU flavin-dependent lyase NGTO. (A, B) The structure of NGTO complexed with cofactor FAD. The FAD is shown as sticks with the C atoms colored yellow. The sulfate group and CHES molecules are also shown as sticks. (C) Electrostatic surface representation of the NGTO–FAD complex. The red dashed box indicates the potential target nucleotide entry site. (D) The five loop regions involved in cofactor binding are colored pink. The 2Fo–Fc electron density map of the FAD was contoured at 1.5σ. (E) Detailed interactions between the cofactor FAD and the NGTO protein.
Figure 4.

FAD binding by NGTO

FAD adopts an extended conformation in the NGTO–FAD complex. As observed in other GR2 family members (35), the FAD molecule is buried inside NGTO. Its adenine ring extends opposite to the FAD-binding domain, while its isoalloxazine ring lies in the cleft between the two domains and points toward the substrate-binding domain (Figure 4A and D). FAD interacts with NGTO through five flexible loops: Loop1 (aa 7–13), Loop2 (aa 32–47), Loop3 (aa 105–109), Loop4 (aa 136–146) and Loop5 (aa 265–273) (Figure 4D). Residues in these loops contribute both hydrogen bonds and hydrophobic interactions that stabilize the FAD cofactor (Figure 4D and E).

Structure of NGTO–FAD–dsDNA complex

To elucidate the structural basis for the catalytic properties of NGTO, we performed co-crystallization trials for NGTO with a series of dsDNA substrates carrying a thymine overhang at their 5′-ends. One DNA-bound NGTO structure was determined at 2.65 Å resolution in the space group P21 (Table 1). The DNA (5′-T1A2G3T4C5A6T7G8A9C10T11-3′) comprises 11 nucleotides (Figure 5A). There are four NGTO molecules in the asymmetric unit. As indicated by the low RMSD values (0.2–0.3 Å), the overall folds of the four NGTO molecules are virtually identical (Supplementary Figure S6A). Protomers C and D are not bound to DNA, whereas protomers A and B each bind one dsDNA molecule, forming a continuous duplex in the structure (Supplementary Figure S6B). Interestingly, it is T11 at the 3′-end of the DNA, rather than T1 at the 5′-end, that is flipped out and inserted into the catalytic pocket (Figure 5A and B). The T1 nucleotides from the two adjacent duplexes form a T–T mismatch with each other, and A2 is unpaired in the structure (Supplementary Figure S6C and D). Consistent with the NGTO–FAD complex, the NGTO–FAD–dsDNA structure confirms that NGTO uses its positively charged surface for DNA binding (Figure 5C and Supplementary Figure S6D and E).
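
The RMSD values quoted for NGTO come from optimal rigid-body superposition. These include 0.2–0.3 Å among the four protomers and 0.504 Å between the DNA-bound and DNA-free structures. The excerpt does not say which program was used. The sketch below shows the standard Kabsch algorithm: an SVD of the cross-covariance matrix with a reflection correction, applied to matched coordinate arrays such as the Cα atoms of equivalent residues. It assumes NumPy and that atom correspondences have already been established. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(mobile: np.ndarray, target: np.ndarray) -> float:
    """RMSD (input units, e.g. Å) after optimal rigid superposition of mobile onto target.

    Both arrays are N x 3; row i of each must describe the same atom (e.g. matched CA atoms).
    """
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3:
        raise ValueError("expected two matched N x 3 coordinate arrays")
    p = mobile - mobile.mean(axis=0)  # remove translation
    q = target - target.mean(axis=0)
    h = p.T @ q                       # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Reflection guard: force det(R) = +1 so the fit is a proper rotation, not a mirror.
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rot.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic check: a rotated and translated copy superposes with RMSD ~ 0.
rng = np.random.default_rng(0)
ca = rng.normal(scale=10.0, size=(50, 3))
t = np.radians(40.0)
rz = np.array([[np.cos(t), -np.sin(t), 0.0], [np.sin(t), np.cos(t), 0.0], [0.0, 0.0, 1.0]])
moved = ca @ rz.T + np.array([5.0, -3.0, 12.0])
print(f"{kabsch_rmsd(moved, ca):.3f}")
```

Without the reflection guard, a mirror-image model could be reported with an artificially low RMSD.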

Structure of the NGTO–FAD–dsDNA complex. (A) Sequence of the dsDNA used for crystallizing with the NGTO enzyme. The T11 and T11′ are nucleotides that flip out from the dsDNA helix into the catalytic center. (B) The overall structure of the NGTO–FAD–dsDNA complex. The 2Fo–Fc electron density maps were contoured at 1.5σ. (C) Surface representation of the NGTO–FAD–dsDNA complex structure. The catalytic center is indicated by a yellow circle. (D) Structural comparison of the NGTO–FAD–dsDNA complex with the NGTO–FAD complex. (E) The structure elements involved in dsDNA substrate binding. Region I (aa 47–75), region II (aa 174–184) and region III (aa 200–210) are colored grey, purple and light blue, respectively. (F) The schematic of the interactions between the dsDNA substrate and the NGTO enzyme. (G) Detailed interactions between the NGTO protein and the dsDNA.
Figure 5.

The RMSD between the NGTO–FAD–dsDNA and NGTO–FAD structures is very low (0.504 Å), indicating no significant conformational change between them (Figure 5D). Careful analysis revealed three regions involved in dsDNA binding (Figure 5E):

- region I (aa 47–55);
- region II (aa 174–184), a small β-sheet inserted into the minor groove of the dsDNA;
- region III (aa 200–210).

NGTO mainly recognizes the DNA backbone. The phosphate groups of A6′ and T7′ are hydrogen bonded to the main chain and side chain of Lys202, respectively (Figure 5F and G). The phosphate group of A6′ is also hydrogen bonded to the side chain of Asn205 (Figure 5F and G). The side-chain guanidinium groups of Arg210 and Arg181 hydrogen bond to the phosphate group of C5′ (Figure 5F and G). The side chains of Trp51 and Lys48 contact the phosphate group of A2 (chain C) (Figure 5F and G). Pro179 and Tyr180 insert into the minor groove of the dsDNA, and Tyr180 acts as a building block that stabilizes the conformation of the dsDNA (Figure 5F and G).
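
Interactions such as the Lys202 and Asn205 contacts with the A6′ and T7′ phosphates are typically identified by donor–acceptor distance. The excerpt does not state the criterion used. This sketch assumes a 3.5 Å heavy-atom cutoff, which is a common convention, and ignores hydrogen positions and angular criteria. The atom records, donor list, function name and coordinates are all hypothetical illustrations.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    chain: str
    resname: str
    resnum: int
    name: str
    xyz: tuple[float, float, float]


# Non-bridging phosphate oxygens (PDB v3 atom names).
PHOSPHATE_O = {"OP1", "OP2"}
# Polar protein atoms that can donate to a phosphate: main-chain N plus common side-chain donors.
DONORS = {"N", "NZ", "NE", "NH1", "NH2", "ND2", "NE2", "NE1", "OG", "OG1", "OH"}


def phosphate_contacts(protein: list[Atom], dna: list[Atom], cutoff: float = 3.5):
    """Return (protein atom, DNA atom, distance in Å) for donor-phosphate pairs within cutoff."""
    hits = []
    for p in protein:
        if p.name not in DONORS:
            continue
        for d in dna:
            if d.name in PHOSPHATE_O:
                dist = math.dist(p.xyz, d.xyz)
                if dist <= cutoff:
                    hits.append((f"{p.resname}{p.resnum}:{p.name}",
                                 f"{d.chain}/{d.resname}{d.resnum}:{d.name}",
                                 round(dist, 2)))
    return hits


# Hypothetical atoms: a Lys NZ 2.9 Å from one phosphate oxygen and 5.0 Å from another.
protein_atoms = [Atom("A", "LYS", 202, "NZ", (0.0, 0.0, 0.0)),
                 Atom("A", "LYS", 202, "CE", (0.0, 1.5, 0.0))]
dna_atoms = [Atom("E", "DA", 6, "OP1", (2.9, 0.0, 0.0)),
             Atom("E", "DA", 6, "OP2", (5.0, 0.0, 0.0))]
print(phosphate_contacts(protein_atoms, dna_atoms))
```

The double loop is fine for a single interface. For whole structures, a spatial index such as a k-d tree is the usual way to avoid checking every atom pair.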

Implication for the modified nucleotide recognition by NGTO

The cofactor FAD and the side chains of Trp51, Tyr180, Gln182 and Lys268 form a quadrate, cage-like pocket in the NGTO–FAD–dsDNA structure, with the isoalloxazine ring of FAD forming the bottom of the cage (Figure 6A and B). The nucleobase of the flipped-out T11 is inserted into the pocket, with its plane perpendicular to the isoalloxazine ring of FAD. The T11 nucleobase is flanked by the side chain of Trp51 on one side and by Lys268 on the opposite side, and its O2 and N3 atoms are specifically recognized by the side chain of Gln182. The conformation of T11 is further stabilized by hydrogen bonds between its phosphate group and the side chains of Tyr244 and Lys268 (Figure 5F and G). Tyr180 stacks with the dsDNA; besides serving as a temporary building block of the DNA duplex, Tyr180 also functions as a wedge that kinks the target nucleotide into the pocket (Figure 6A).
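
The statement that the T11 base and the FAD isoalloxazine ring are perpendicular refers to the angle between the two ring planes. One way to quantify this angle, sketched here, is to fit a least-squares plane to each ring by SVD and take the angle between the plane normals. The sketch assumes NumPy and that ring-atom coordinates have already been extracted from the model. The angle is folded into 0–90° because a plane normal has no preferred sign. The function names are hypothetical and the coordinates are synthetic.

```python
import numpy as np


def plane_normal(coords: np.ndarray) -> np.ndarray:
    """Unit normal of the least-squares plane through N >= 3 ring atoms (N x 3 array)."""
    centered = coords - coords.mean(axis=0)
    # Rows of vt are principal axes; the last one (smallest singular value) is the normal.
    _, _, vt = np.linalg.svd(centered)
    return vt[-1] / np.linalg.norm(vt[-1])


def interplanar_angle(ring_a: np.ndarray, ring_b: np.ndarray) -> float:
    """Angle between two ring planes in degrees, folded into [0, 90]."""
    cos_angle = abs(float(np.dot(plane_normal(ring_a), plane_normal(ring_b))))
    return float(np.degrees(np.arccos(min(1.0, cos_angle))))


# Synthetic hexagons standing in for a base ring and part of the isoalloxazine ring.
theta = np.linspace(0.0, 2.0 * np.pi, 6, endpoint=False)
ring_xy = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(6)]) * 1.4
ring_xz = np.column_stack([np.cos(theta) + 4.0, np.zeros(6), np.sin(theta)]) * 1.4
print(round(interplanar_angle(ring_xy, ring_xz)))  # 90 for perpendicular planes
```

For the tricyclic isoalloxazine ring, fitting only the ring nearest the substrate avoids diluting the measurement with any butterfly bending of the flavin.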

Detailed interactions between the NGTO and target nucleotide. (A) The target nucleotide entry site. The Tyr180 residue that kinks the target nucleotide into the catalytic pocket is indicated by a yellow circle. (B) Detailed conformation and interactions of the target nucleotide. (C) The Nα-GlyT substrate modeled into the catalytic pocket. (D) Conformational changes of the residues involved in dsDNA binding. (E) Comparison of the DNA binding affinities of the wild-type or mutated NGTO proteins. W.B. represents the weak binding ability. Values are expressed as the mean ± SEM. The EMSA results of Cy5.5-labeled dsDNA substrate with (F) wild-type NGTO, (G) NGTO-W51A, (H) NGTO-F52A, (I) NGTO-Y180A, (J) NGTO-Q182A, (K) NGTO-K202A, (L) NGTO-R210A and (M) NGTO-R242A.
Figure 6.

The methyl group at the C5 position of T11 points toward the isoalloxazine ring of FAD (Figure 6B). Between T11 and FAD, we observed additional electron density, which might be attributable to a citrate molecule from the crystallization conditions. Based on these observations, a binding model for the NGTO substrate, Nα-GlyT, can be proposed (Figure 6C). In this model, the carboxyl group of the glycine adduct is likely recognized by the guanidinium group of Arg242. The glycine adduct lies parallel to the isoalloxazine ring of FAD, positioned for the subsequent catalytic reaction (Figure 6C). As in PUGT, the side chains of several NGTO residues undergo modest conformational changes upon DNA binding, including Tyr244, Lys268, Phe52 and Tyr180 (Figure 6D).

Validation of the DNA binding mode of NGTO

To validate the interactions observed in the NGTO–FAD–dsDNA structure, we performed in vitro EMSAs using wild-type or mutated NGTO proteins (Figure 6E–M and Supplementary Figure S7A–J). Wild-type NGTO bound the DNA substrate with a Kd of 6.6 μM (Figure 6E and F). Compared with wild-type NGTO, DNA binding affinity was 2-fold weaker for the W51A mutant, in which the T11-stacking residue Trp51 is replaced by Ala (Figure 6G). Ala substitution of Phe52, Tyr244 or Lys268 also weakened the DNA binding affinity of NGTO (Figure 6H and Supplementary Figure S7E and F). No detectable DNA binding was observed for the Y180A mutant (Figure 6I), further confirming the functional importance of Tyr180. The side chain of Gln182 forms base-specific hydrogen bonds with T11 in the NGTO–FAD–dsDNA structure. However, the Q182A mutation had only a very modest impact on DNA binding by NGTO (Figure 6J). This suggests that hydrophobic stacking interactions are more important for binding the flipped-out thymine. It will be worth investigating whether Gln182 plays a role in discriminating thymine from other nucleotides. Ala substitutions of the two backbone-interacting residues, Lys202 and Arg210, also decreased the DNA binding ability of NGTO (Figure 6K and L). The DNAs used in the EMSAs do not carry the Nα-GlyT modification. Even so, Ala substitution of Arg242, a residue predicted to interact with the modified nucleotide, weakened the binding affinity between DNA and NGTO (Figure 6M). This is likely due to a conformational change in the pocket that binds the flipped-out nucleotide.

As for PUGT, we also performed a docking study for NGTO using the DNA from the HhaI–dsDNA complex with the HDOCK online server (33) (Supplementary Figure S8A). Structural superposition showed that the DNA conformations are very similar in the docked NGTO–DNA complex and in our NGTO–FAD–DNA crystal structure (Supplementary Figure S8B and C). This further supports the protein–DNA interactions observed in our NGTO–FAD–DNA complex (Supplementary Figure S8D). An online Dali search revealed that the overall fold of NGTO is similar to that of many oxidases; however, these proteins share <20% sequence identity with NGTO. BLAST identified some NGTO homologs in the UniProt database (Supplementary Figure S9). These homologs show very low sequence identity with PaMx11 NGTO, including at the residues involved in FAD cofactor and dsDNA binding. These observations suggest that the PUGT and NGTO enzymes are unique and likely evolved for the specific DNA hypermodification in the PaMx11 phage.
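
Identity figures such as "<20%" depend on how identity is normalized. A common definition, assumed in this sketch, counts identical residues over aligned positions where neither sequence has a gap. Other tools normalize by alignment length or by the shorter sequence, which can give noticeably different numbers for divergent proteins. The function name, the `denominator` parameter and the aligned sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, denominator: str = "aligned") -> float:
    """Percent identity of two pre-aligned sequences ('-' marks gaps).

    denominator='aligned' -> columns where both sequences have a residue
    denominator='length'  -> full alignment length, gap columns included
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = list(zip(aligned_a, aligned_b))
    identical = sum(1 for a, b in pairs if a == b and a != "-")
    if denominator == "aligned":
        n = sum(1 for a, b in pairs if a != "-" and b != "-")
    elif denominator == "length":
        n = len(pairs)
    else:
        raise ValueError("denominator must be 'aligned' or 'length'")
    return 100.0 * identical / n if n else 0.0


a = "MKT-LLV"
b = "MRTALIV"
print(round(percent_identity(a, b), 1))            # 66.7 (4 identical / 6 gap-free columns)
print(round(percent_identity(a, b, "length"), 1))  # 57.1 (4 / 7 columns)
```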

Reaction scheme

Based on the above structural observations, we propose a potential reaction scheme for PUGT and NGTO in converting 5-PmdU to the Nα-GlyT hypermodification and finally to the 5-NmdU hypermodification (Figure 7A and B): (i) The pyrimidine ring of 5-PmdU is first activated by nucleophilic attack of the Cys247 sulfhydryl group on C6, so that PUGT forms a covalent intermediate with the target base. (ii) The O4 atom of the intermediate accepts a proton from the water molecule coordinated by the metal ion; rearrangement of the intermediate releases the phosphate group and forms a C5=C7 double bond. (iii) The amino group of glycine attacks the C5=C7 double bond, forming an unstable intermediate. (iv) Rearrangement of this intermediate releases the Cys247 sulfhydryl group and yields the Nα-GlyT modification. (v) In the subsequent NGTO reaction, the amine nucleophile of the glycine moiety in the Nα-GlyT substrate attacks the C4a atom of the FAD isoalloxazine ring. This facilitates transfer of the carboxyl proton of Nα-GlyT to the N5 atom, resulting in abstraction of the proton from the Cα of the glycine adduct. (vi) These reactions lead to re-protonation of the substrate and reduction of FAD to FADH2. (vii) Water or hydroxide then attacks the C=N double bond of the imine intermediate, hydrolyzing it into 5-NmdU and glyoxylate. (viii–ix) The resulting FADH2 is reoxidized by O2, generating H2O2.
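
A quick consistency check on the proposed scheme is that each overall reaction should balance in atoms. The sketch assumes neutral, fully protonated species. It replaces the invariant nucleotide with a placeholder R, so only the C5 substituent is counted. The two overall reactions are:

- PUGT: R-CH2-O-PO3H2 + glycine → R-CH2-NH-CH2-COOH + H3PO4
- NGTO, summing steps (v)–(ix): R-CH2-NH-CH2-COOH + O2 + H2O → R-CH2-NH2 + glyoxylic acid + H2O2

The FAD/FADH2 cycle cancels out, and protonation states at physiological pH are deliberately ignored. The formula strings and function names are illustrative assumptions.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'C2H5NO2' (no parentheses or charges)."""
    counts: Counter = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_total(species: list[str]) -> Counter:
    """Sum atom counts over all species on one side of a reaction."""
    total: Counter = Counter()
    for s in species:
        total.update(parse_formula(s))
    return total


def is_balanced(left: list[str], right: list[str]) -> bool:
    return side_total(left) == side_total(right)


# C5 substituents only; the rest of the nucleotide (R) is identical on both sides.
PMDU_SIDECHAIN = "CH4O4P"   # -CH2-O-PO3H2
GLYCINE = "C2H5NO2"
GLYT_SIDECHAIN = "C3H6NO2"  # -CH2-NH-CH2-COOH
PHOSPHORIC_ACID = "H3PO4"
NMDU_SIDECHAIN = "CH4N"     # -CH2-NH2
GLYOXYLIC_ACID = "C2H2O3"

pugt_ok = is_balanced([PMDU_SIDECHAIN, GLYCINE], [GLYT_SIDECHAIN, PHOSPHORIC_ACID])
ngto_ok = is_balanced([GLYT_SIDECHAIN, "O2", "H2O"], [NMDU_SIDECHAIN, GLYOXYLIC_ACID, "H2O2"])
print(pugt_ok, ngto_ok)  # True True
```

A balanced result shows only that the scheme is stoichiometrically consistent. It says nothing about whether the proposed intermediates actually form.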

Proposed reaction schemes of PUGT and NGTO enzymes for the stepwise production of the 5-NmdU hypermodification. (A) (i) The pyrimidine ring of 5-PmdU is first activated by nucleophilic attack of the Cys247 sulfhydryl group on C6, and PUGT forms a covalent intermediate with the target base. (ii) The O4 atom of the intermediate accepts a proton from the water molecule coordinated by the metal ion; rearrangement of the intermediate releases the phosphate group and forms a C5=C7 double bond. (iii) The amino group of glycine attacks the C5=C7 double bond, forming an unstable intermediate. (iv) Rearrangement of this intermediate releases the Cys247 sulfhydryl group and yields the Nα-GlyT modification. (B) (v) In the subsequent NGTO reaction, the amine nucleophile of the glycine moiety in the Nα-GlyT substrate attacks the C4a atom of the FAD isoalloxazine ring. This facilitates transfer of the carboxyl proton of Nα-GlyT to the N5 atom, resulting in abstraction of the proton from the Cα of the glycine adduct. (vi) These reactions lead to re-protonation of the substrate and reduction of FAD to FADH2. (vii) Water or hydroxide then attacks the C=N double bond of the imine intermediate, hydrolyzing it into 5-NmdU and glyoxylate. (viii–ix) The resulting FADH2 is reoxidized by O2, generating H2O2.
Figure 7.

Discussion

Various types of DNA hypermodifications have been found in phages, such as Nα-putT in Delftia phage ΦW-14, Nα-gluT in Bacillus phage SP10, 5-NeOmdU in Salmonella phage ViI, 5-NedU in Pseudomonas phage M6 and 5-AcNmdU in Pseudomonas phage PaMx11 (23,24). These DNA hypermodifications are thought to help phages cope with bacterial defenses and are produced in multiple steps beginning with the dNMPs. A pivotal enzyme in the production of pyrimidine hypermodifications is 5-HMUDK, which converts 5-hmdU to 5-PmdU (24,36). However, no structure is yet available to illustrate the catalytic mechanism of this type of DNA kinase. Solving the structure of 5-HMUDK would significantly enhance our understanding of DNA hypermodification. The phosphate group on the nucleotide base is an excellent leaving group and can be replaced by various nucleophilic groups, such as the amino groups of amino acids. In Pseudomonas phage PaMx11, transfer of glycine to the 5-PmdU modification by PUGT produces the Nα-GlyT modification, which is further converted to 5-NmdU by NGTO. This ultimately leads to the 5-AcNmdU hypermodification in the PaMx11 genome (24).

This study determined four high-resolution crystal structures, including PUGT and NGTO in complex with dsDNA substrates and cofactors. In addition to revealing DNA binding, these structures suggest a potential catalytic mechanism for PaMx11 PUGT and NGTO (Figure 7). The molecular docking results suggested that PUGT could form extensive interactions with DNA substrates (Supplementary Figure S4D). However, because of crystal packing, some potential protein–DNA interactions might not be observed in the PUGT–Ca2+–dsDNA complex, and further structural studies are needed to confirm them. The terminal thymine was inserted into the catalytic pocket (Figure 2D), allowing us to dissect how the target nucleotide is recognized before it is processed. The glycinyltransferase PUGT likely follows a mechanism similar to that of the 5mC methyltransferase DNMT1 (37). DNMT1 uses a cysteine to form a covalent intermediate with the pyrimidine ring, thereby activating its C5 position for the subsequent reaction.

Ca2+ was the only divalent cation present in the sample and crystallization condition for the PUGT–DNA complex (Figure 2D). In addition to stabilizing the target nucleotide, Ca2+ is proposed to help activate the catalytic water molecule (Figure 7A). Like Ca2+, other divalent cations, such as Mg2+ and Mn2+, can also adopt hexa-coordination. It will be worth investigating whether these cations can replace Ca2+ or even enhance the catalytic activity of PUGT. The EMSA experiments revealed that PUGT can bind dsDNA with a thymine overhang at the 5′-end, and they confirmed the functional importance of many residues in DNA binding (Figure 3). However, the DNA binding affinity of PUGT is relatively low, and replacing the 5′-end thymine with 5-PmdU did not significantly improve it. More suitable substrates are therefore needed to clarify the binding and/or catalytic roles of the critical residues. Because product standards were lacking, we did not perform enzyme activity assays.

In contrast to the PUGT–Ca2+–dsDNA complex, our NGTO–FAD and NGTO–FAD–dsDNA structures provide more direct insights into dsDNA binding and catalysis by NGTO. The extended dsDNA strands mimic a long dsDNA with the target nucleotide flipped out in the middle and inserted into the catalytic site of NGTO. Both molecular docking and in vitro EMSA experiments are consistent with our structural observations. Based on these results, we propose a catalytic mechanism for NGTO similar to that of flavin-dependent N-hydroxylating enzymes (38).

Because nucleotides carrying the 5-PmdU or Nα-GlyT modification are not commercially available, we could not obtain structures of PUGT or NGTO in complex with their natural DNA substrates, which contain these hypermodifications. Nevertheless, our studies characterize both enzymes in detail, defining their DNA-binding properties and proposing their catalytic mechanisms as far as is currently possible. Beyond the biosynthetic mechanism of DNA hypermodifications, our studies also provide a solid basis for developing hypermodified DNA-specific enzymes that could be used to detect DNA hypermodifications in other species.

Data availability

The coordinates that support the findings of this study have been deposited in the Protein Data Bank with accession code 8Z2M for the Pseudomonas phage PaMx11 apo-PUGT structure, 8Z2N for the PUGT–Ca2+–DNA structure, 8Z2O for Pseudomonas phage PaMx11 NGTO–FAD and 8Z79 for the NGTO–FAD–dsDNA structure. Other data in this study are available from the corresponding author upon reasonable request.

Supplementary data

Supplementary Data are available at NAR Online.

Acknowledgements

We thank the staff from BL17B/BL18U1/BL19U1 beamline of the National Facility for Protein Science in Shanghai at Shanghai Synchrotron Radiation Facility for assistance during data collection.

Author contributions: B.W., J.G. and H.C. conceived the project. Y.W. and C.M. expressed, purified and grew crystals of the complex. Y.W., W.G., J.Y. and S.X. performed the biochemical assays. Y.W. and W.G. collected X-ray diffraction data. B.W. and J.G. solved the complex structures. B.W., J.G. and H.C. supervised the structural and biochemical studies, wrote and revised the manuscript.

Funding

National Natural Science Foundation of China [31900435 to B.W., 82272312 to H.C., 32371252 to J.G., 82101750 to S.X.]; Guangdong Provincial Science and Technology Department [2023B1212060013 and 2020B1212030004 to B.W.]; Fundamental Research Funds for the Central Universities, Sun Yat-Sen University [2023KYPT11 to B.W.]; 100 Top Talents Program of Sun Yat-Sen University [58000-12230029 to H.C.]; Shenzhen–Hong Kong–Macao Science and Technology Project [SGDX20220530111403024 to H.C.]; Guangzhou Science and Technology Plan [2024A04J3943 to S.X.]. Funding for open access charge: Fundamental Research Funds for the Central Universities, Sun Yat-Sen University [2023KYPT11 to B.W.].

Conflict of interest statement. None declared.

References

1. Weigele P., Raleigh E.A. Biosynthesis and function of modified bases in bacteria and their viruses. Chem. Rev. 2016; 116:12655–12687.

2. Hong S., Cheng X. DNA base flipping: a general mechanism for writing, reading, and erasing DNA modifications. Adv. Exp. Med. Biol. 2016; 945:321–341.

3. Warren R.A. Modified bases in bacteriophage DNAs. Annu. Rev. Microbiol. 1980; 34:137–158.

4. Gommers-Ampt J.H., Borst P. Hypermodified bases in DNA. FASEB J. 1995; 9:1034–1042.

5. Swinton D., Hattman S., Crain P.F., Cheng C.S., Smith D.L., McCloskey J.A. Purification and characterization of the unusual deoxynucleoside, alpha-N-(9-beta-D-2′-deoxyribofuranosylpurin-6-yl)glycinamide, specified by the phage mu modification function. Proc. Natl Acad. Sci. U.S.A. 1983; 80:7400–7404.

6. Kaminska K.H., Bujnicki J.M. Bacteriophage Mu Mom protein responsible for DNA modification is a new member of the acyltransferase superfamily. Cell Cycle 2008; 7:120–121.

7. Khudyakov I.Y., Kirnos M.D., Alexandrushkina N.I., Vanyushin B.F. Cyanophage S-2L contains DNA with 2,6-diaminopurine substituted for adenine. Virology 1978; 88:8–18.

8. Kirnos M.D., Khudyakov I.Y., Alexandrushkina N.I., Vanyushin B.F. 2-Aminoadenine is an adenine substituting for a base in S-2L cyanophage DNA. Nature 1977; 270:369–370.

9. Nikolskaya I.I., Lopatina N.G., Debov S.S. Methylated guanine derivative as a minor base in the DNA of phage DDVI Shigella dysenteriae. Biochim. Biophys. Acta 1976; 435:206–210.

10. Nikolskaya I.I., Tediashvili M.I., Lopatina N.G., Chanishvili T.G., Debov S.S. Specificity and functions of guanine methylase of Shigella sonnei DDVI phage. Biochim. Biophys. Acta 1979; 561:232–239.

11. Thiaville J.J., Kellner S.M., Yuan Y., Hutinet G., Thiaville P.C., Jumpathong W., Mohapatra S., Brochier-Armanet C., Letarov A.V., Hillebrand R. et al. Novel genomic island modifies DNA with 7-deazaguanine derivatives. Proc. Natl Acad. Sci. U.S.A. 2016; 113:E1452–E1459.

12. Kulikov E.E., Golomidova A.K., Letarova M.A., Kostryukova E.S., Zelenin A.S., Prokhorov N.S., Letarov A.V. Genomic sequencing and biological characteristics of a novel Escherichia coli bacteriophage 9g, a putative representative of a new Siphoviridae genus. Viruses 2014; 6:5077–5092.

13. Hutinet G., Swarjo M.A., de Crecy-Lagard V. Deazaguanine derivatives, examples of crosstalk between RNA and DNA modification pathways. RNA Biol. 2017; 14:1175–1184.

14. Takahashi I., Marmur J. Replacement of thymidylic acid by deoxyuridylic acid in the deoxyribonucleic acid of a transducing phage for Bacillus subtilis. Nature 1963; 197:794–795.

15. Price A.R. Bacteriophage PBS2-induced deoxycytidine triphosphate deaminase in Bacillus subtilis. J. Virol. 1974; 14:1314–1317.

16. Casella E., Markewych O., Dosmar M., Witmer H. Production and expression of dTMP-enriched DNA of bacteriophage SP15. J. Virol. 1978; 28:753–766.

17. Ehrlich M., Ehrlich K.C. A novel, highly modified, bacteriophage DNA in which thymine is partly replaced by a phosphoglucuronate moiety covalently bound to 5-(4′,5′-dihydroxypentyl)uracil. J. Biol. Chem. 1981; 256:9966–9972.

18. Kuo T.T., Huang T.C., Teng M.H. 5-Methylcytosine replacing cytosine in the deoxyribonucleic acid of a bacteriophage for Xanthomonas oryzae. J. Mol. Biol. 1968; 34:373–375.

19. Kuo T.T., Tu J. Enzymatic synthesis of deoxy-5-methyl-cytidylic acid replacing deoxycytidylic acid in Xanthomonas oryzae phage Xp12DNA. Nature 1976; 263:615.

20. Kuo T.T., Chow T.Y., Lin Y.T. A new thymidylate biosynthesis in Xanthomonas oryzae infected by phage Xp12. Virology 1982; 118:293–300.

21. Mathews C.K., Wheeler L.J., Ungermann C., Young J.P., Ray N.B. Enzyme interactions involving T4 phage-coded thymidylate synthase and deoxycytidylate hydroxymethylase. Adv. Exp. Med. Biol. 1993; 338:563–570.

22. Swinton D., Hattman S., Benzinger R., Buchanan-Wollaston V., Beringer J. Replacement of the deoxycytidine residues in Rhizobium bacteriophage RL38JI DNA. FEBS Lett. 1985; 184:294–298.

23. Lee Y.J., Dai N., Walsh S.E., Muller S., Fraser M.E., Kauffman K.M., Guan C., Correa I.R. Jr, Weigele P.R. Identification and biosynthesis of thymidine hypermodifications in the genomic DNA of widespread bacterial viruses. Proc. Natl Acad. Sci. U.S.A. 2018; 115:E3116–E3125.

24. Lee Y.J., Dai N., Muller S.I., Guan C., Parker M.J., Fraser M.E., Walsh S.E., Sridar J., Mulholland A., Nayak K. et al. Pathways of thymidine hypermodification. Nucleic Acids Res. 2022; 50:3001–3017.

25. Kabsch W. XDS. Acta Crystallogr. D Biol. Crystallogr. 2010; 66:125–132.

26. McCoy A.J., Grosse-Kunstleve R.W., Adams P.D., Winn M.D., Storoni L.C., Read R.J. Phaser crystallographic software. J. Appl. Crystallogr. 2007; 40:658–674.

27. Baek M., DiMaio F., Anishchenko I., Dauparas J., Ovchinnikov S., Lee G.R., Wang J., Cong Q., Kinch L.N., Schaeffer R.D. et al. Accurate prediction of protein structures and interactions using a three-track neural network. Science. 2021; 373:871–876.

28. Emsley P., Cowtan K. Coot: model-building tools for molecular graphics. Acta Crystallogr. D Biol. Crystallogr. 2004; 60:2126–2132.

29. Liebschner D., Afonine P.V., Baker M.L., Bunkoczi G., Chen V.B., Croll T.I., Hintze B., Hung L.W., Jain S., McCoy A.J. et al. Macromolecular structure determination using X-rays, neutrons and electrons: recent developments in Phenix. Acta Crystallogr. D Struct. Biol. 2019; 75:861–877.

30. Hashimoto H., Pais J.E., Zhang X., Saleh L., Fu Z.Q., Dai N., Correa I.R. Jr, Zheng Y., Cheng X. Structure of a Naegleria Tet-like dioxygenase in complex with 5-methylcytosine DNA. Nature. 2014; 506:391–395.

31. Horton J.R., Yang J., Zhang X., Petronzio T., Fomenkov A., Wilson G.G., Roberts R.J., Cheng X. Structure of HhaI endonuclease with cognate DNA at an atomic resolution of 1.0 Å. Nucleic Acids Res. 2020; 48:1466–1478.

32. Holm L. Using Dali for protein structure comparison. Methods Mol. Biol. 2020; 2112:29–42.

33. Yan Y., Tao H., He J., Huang S.-Y. The HDOCK server for integrated protein–protein docking. Nat. Protoc. 2020; 15:1829–1852.

34. Klimasauskas S., Kumar S., Roberts R.J., Cheng X. HhaI methyltransferase flips its target base out of the DNA helix. Cell. 1994; 76:357–369.

35. Pedotti M., Rosini E., Molla G., Moschetti T., Savino C., Vallone B., Pollegioni L. Glyphosate resistance by engineering the flavoenzyme glycine oxidase. J. Biol. Chem. 2009; 284:36415–36423.

36. Iyer L.M., Zhang D., Burroughs A.M., Aravind L. Computational identification of novel biochemical systems involved in oxidation, glycosylation and other complex modifications of bases in DNA. Nucleic Acids Res. 2013; 41:7635–7655.

37. Miletić V., Odorčić I., Nikolić P., Svedružić Ž.M. In silico design of the first DNA-independent mechanism-based inhibitor of mammalian DNA methyltransferase Dnmt1. PLoS One. 2017; 12:e0174410.

38. Mügge C., Heine T., Baraibar A.G., van Berkel W.J.H., Paul C.E., Tischler D. Flavin-dependent N-hydroxylating enzymes: distribution and application. Appl. Microbiol. Biotechnol. 2020; 104:6481–6499.

Author notes

The first two authors should be regarded as Joint First Authors.

This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected] for reprints and translation rights for reprints. All other permissions can be obtained through our RightsLink service via the Permissions link on the article page on our site. For further information, please contact [email protected]

Supplementary data
