Abstract

Motivation

The two strands of the DNA double helix locally and spontaneously separate and recombine in living cells due to the inherent thermal DNA motion. This dynamics results in transient openings in the double helix and is referred to as “DNA breathing” or “DNA bubbles.” The propensity to form local transient openings is important in a wide range of biological processes, such as transcription, replication, and transcription factors binding. However, the modeling and computer simulation of these phenomena, have remained a challenge due to the complex interplay of numerous factors, such as, temperature, salt content, DNA sequence, hydrogen bonding, base stacking, and others.

Results

We present pyDNA-EPBD, a parallel software implementation of the Extended Peyrard-Bishop-Dauxois (EPBD) nonlinear DNA model that allows us to describe some features of DNA dynamics in detail. The pyDNA-EPBD generates genomic scale profiles of average base-pair openings, base flipping probability, DNA bubble probability, and calculations of the characteristically dynamic length indicating the number of base pairs statistically significantly affected by a single point mutation using the Markov Chain Monte Carlo algorithm.

Availability and implementation

pyDNA-EPBD is supported across most operating systems and is freely available at https://github.com/lanl/pyDNA_EPBD. Extensive documentation can be found at https://lanl.github.io/pyDNA_EPBD/.

1 Introduction

The structural integrity of biological macromolecules is primarily governed by hydrogen bonds (H-bonds) (Fig. 1a), which have natural vibration frequencies in the terahertz range (Turton et al. 2014). H-bonds are much weaker (∼few kBTs) than covalent bonds, causing the macromolecules to experience slow conformational motion resulting from the inherent thermal fluctuations at physiological temperatures.

DNA breathing dynamics and analysis. (a) The primary governance of macromolecules through hydrogen bonds (H-bonds). (b) Representation of a single base “flipping out of the stack,” showcasing a phenomenon known as DNA breathing. (c) Illustration of consecutive base-pairs breaking the H-bonds and opening simultaneously, referred to as DNA bubbles. (d and e) 3D surface plots highlighting the change in bubble intensity across varied lengths and base pairs (bps) for threshold value 1.5 under two conditions: P5 wild (d) and P5 mutant (e). (f and g) Average Coordinates profiles for AAV P5 wild (f) and mutant-promoter (g) sequences at individual base pairs, with the orange vertical block indicating nucleotide substitutions from AG to TC at the 50 and 51 positions (zero-indexed). (h) Average flipping profiles alongside the (i) corresponding q-factor for AAV P5 wild and mutant-sequence. For all the experiments, we set a minimum of 100 MCMC simulations using various initial conditions, setting the temperature to 310 K and employing 50 000 preheating steps followed by 80 000 post-preheating steps.
Figure 1.

DNA breathing dynamics and analysis. (a) The primary governance of macromolecules through hydrogen bonds (H-bonds). (b) Representation of a single base “flipping out of the stack,” showcasing a phenomenon known as DNA breathing. (c) Illustration of consecutive base-pairs breaking the H-bonds and opening simultaneously, referred to as DNA bubbles. (d and e) 3D surface plots highlighting the change in bubble intensity across varied lengths and base pairs (bps) for threshold value 1.5 under two conditions: P5 wild (d) and P5 mutant (e). (f and g) Average Coordinates profiles for AAV P5 wild (f) and mutant-promoter (g) sequences at individual base pairs, with the orange vertical block indicating nucleotide substitutions from AG to TC at the 50 and 51 positions (zero-indexed). (h) Average flipping profiles alongside the (i) corresponding q-factor for AAV P5 wild and mutant-sequence. For all the experiments, we set a minimum of 100 MCMC simulations using various initial conditions, setting the temperature to 310 K and employing 50 000 preheating steps followed by 80 000 post-preheating steps.

In living cells, the thermal fluctuations spontaneously induce local opening and closing of the DNA helix, referred to as DNA breathing, which can occur as a single base flipping out of the stack (base flipping) (Fig. 1b), or as a few consecutive base-pairs breaking the H-bonds and opening simultaneously (DNA bubbles) (Fig. 1c). Large amplitude bubbles describe local regions of DNA melting (denaturation).

Single base-pair opening/flipping has been studied by tracking the exchange of protons from imino groups with water (Maurice Guéron and Leroy 1987). DNA breathing is essential for DNA transcription, transcription factor (TF) binding, replication, enzymatic repair, and base pair methylation (Klimašauskas and Roberts 1995, Roberts and Cheng 1998, Choi et al. 2004, Dellarole et al. 2010, Fenwick et al. 2011, Alexandrov et al. 2012, Phelps et al. 2013). It was also observed that single base pair genetic variants, e.g. in the flanks of TF motifs, may influence DNA breathing dynamics and TF binding (Alexandrov et al. 2011b, Alexandrov et al. 2012, Jablensky et al. 2012, Duan et al. 2014).

The thermo-mechanical characteristics of DNA have been precisely described using a variety of models. Here we employ the mesoscopic Extended-Peyrard-Bishop-Dauxois (EPBD) model, an expansion of the Peyrard-Bishop-Dauxois (PBD) nonlinear model of DNA to examine DNA breathing dynamics through sequence-specific base-pair stacking potentials.

The probabilities for local DNA openings obtained from the EPBD model are equilibrium properties of the underlying free energy landscape, and similar information can be obtained from various thermodynamical models, such as the Poland-Sheraga model (Poland and Scheraga 1970). The parameters of the EPBD model have been derived from DNA melting experiments (Ares et al. 2005, Alexandrov et al. 2009a). (The parameterization is performed for a specific buffer. The DNA was dissolved to 200 mM in 30 mM Na phosphate buffer pH 7.5, 100 mM NaCl, 1 mM EDTA.)

The trajectories derived from EPBD offer unique insights into the relative lifetimes of DNA bubbles, insights that are unattainable through thermodynamical calculations alone (Hillebrand et al. 2020, 2021). Thermodynamics primarily focuses on a system’s overall energetics and equilibrium state, not offering detailed insights into the real-time kinetics leading to equilibrium. The determination of bubble lifetimes holds significant importance in understanding detailed DNA dynamics. Research on bubble lifetimes has revealed that, in several thoroughly studied promoters, long-lived bubbles frequently form, particularly at the transcription start sites (TSS), marking these regions as critical for transcription processes (Alexandrov et al. 2006, 2009b). The EPBD model also boasts single nucleotide resolution, allowing it to discern even minute variations without the need for window averaging, unlike thermodynamical models. This precision enables EPBD to readily identify the effects of single mutations on property profiles.

By using EPBD simulations to locate DNA breathing hotspots, we were able to: (i) design (computationally) single-point mutations that silence breathing dynamics of TSS, and to demonstrate (experimentally) that such mutations also alter transcription, without affecting the TF binding (Alexandrov et al. 2010); (ii) design (computationally) mismatches, which enhance bubble formation, that can lead (experimentally) to bidirectional transcription initiation in the absence of basal TFs (Alexandrov et al. 2009b); and (iii) design (computationally) base-pair substitutions or methylations, that change local DNA breathing and show (experimentally) that TF-DNA binding changes accordingly (Nowak-Lovato et al. 2013). Further, EPBD simulations have indicated significant changes in the spatio-temporal characteristics of double-stranded DNA in the presence of a UV-dimer (Blagoev et al. 2006) (which can be interpreted as an effective increase of the local temperature).

The EPBD modeling framework has been used to simulate the influence of non-ionizing terahertz (THz) radiation on DNA breathing and demonstrated (Alexandrov et al. 2010) that, at sufficient exposure, DNA bubbles can appear through a nonlinear resonance mechanism. Such resonance bubble formations may have a direct effect on transcription, replication, and DNA–protein binding, thus providing/suggesting a connection between THz radiation and biological function. Experimentally, it have been demonstrated that intense ultra-short THz pulses can indeed lead to non-thermal effects, including, modified gene expression profiles, gene-specific activation/repression, and changes in stem-cell differentiation (Bock et al. 2010, Alexandrov et al. 2011a, 2013, Titova et al. 2013, Perera et al. 2019, Shang et al. 2021). Importantly, based on the optical Kerr effect, it was shown that the interstrand H-bond modes (bubbles) of DNA are coherent delocalized/nonlinear phonon modes in the THz range at physiological conditions (González-Jiménez et al. 2016).

Experimental studies of looping of ultra-short DNA sequences revealed a discrepancy of up to six orders of magnitude between experimentally measured and theoretically predicted Jacobson-Stockmayer’s J-factors (Vafabakhsh and Ha 2012, Alexandrov et al. 2016), suggesting that, in addition to the elastic moduli of DNA, the presence of local single-stranded “flexible hinges” in DNA (i.e. DNA bubbles) can assist the loop formation (Yan and Marko 2004). Combining the Czapla-Swigon-Olson structural model of DNA with the EPBD model (Alexandrov et al. 2017), (without changing any of the parameters of the two models), it was shown that the calculated J-factors of calculated J-factors of ultra-short DNA sequences are within an order of magnitude of the experimental measurements.

2 Materials and methods

Our pyDNA-EPBD model utilizes Markov Chain Monte Carlo (MCMC) protocol. The specific algorithm is summarized in Supplementary Algorithm S1 and Fig. S1. We use the standard Metropolis algorithm to produce an equilibrium state of the system: First, a base pair, n0, is selected at random, then a strand (left or right) is selected randomly and a new value of the variable un0 or vn0 is proposed according to a thermal (at the given temperature) Gaussian distribution at this base pair. The proposed value is accepted according to the Metropolis probability, P: (a) P =1 if the energy, Enew, of the new configuration is lower than the energy, Eold, of the old configuration; and, (b) P=exp[(EoldEnew)/kBT], if Enew > Eold. The process is continued after thermal equilibrium is reached in the measurement phase. For each temperature, we performed several simulations (runs). In each of these runs, we compute the transfers opening profile yn. Subsequently, the displacements yn of each base pair is recorded at every Δt (intervals of selected MCMC steps). By conducting numerous runs, each with different initial conditions, we derived the average displacement/opening profile, denoted as yn. One of the main characteristics of DNA breathing and DNA bubbles is the displacement, yn (see Fig. 1e) from equilibrium position for each base pair in the sequence. This displacement signifies the transverse stretching of the H-bonds between the complementary nucleotides. An advantage of the yn profile is that it eliminates the need for window averaging typically required in thermodynamic calculations, thus making the average displacement/opening profiles sensitive to single base pair substitutions. The yn profile can be efficiently calculated by MCMC simulations, yielding results equivalent to those obtained by averaging over Langevin dynamics trajectories (Choi et al. 2004). Below we introduce the main DNA breathing characteristics that our tool pyDNA-EPBD calculates:

  • Average coordinates: Refers to the averaged transverse displacements, yn, i.e. the displacements, yn, averaged over the thermal fluctuations. The displacement profile, yn, is one of the distinct properties of the DNA breathing characteristics, which quantifies the extent to which each of the base pairs of the DNA sequence is “open” in equilibrium, i.e. the extent to which the H-bonds between the bps are stretched due to the thermal fluctuations.

  • Base flipping probability: Describes a particular kind of base movement in which at least one of the bases, in a base pair, flips out of the stack. The flipping exposes the base to the surrounding environment, which can be important for a variety of biological processes such as DNA repair, replication, and TF binding.

    The propensity of flipping characterizes this transition, by determining the fraction of disrupted H-bonds (openings) between complimentary nucleotides, i.e. the fraction of bps for which yn exceeds a certain threshold distance, as a function of temperature.

  • DNA bubble probability: Refers to extended DNA regions where the double helix unwinds and the two strands temporarily separate, due to the thermal motion. These transient opening or denaturation bubbles are part of the natural dynamics of DNA and can play a role in processes such as the initiation of transcription, replication and TF binding propensity. The probability for bubble existence, of a bubble beginning at bp n-th, of length l bps and with a displacement yn exceeding a given threshold (thr) (in Å) is a three dimensional (3D) tensor, P(n,l,thr), which can be computed as Alexandrov et al. (2006),
    (1)

    Here, ·M denotes averaging over M runs, ts is the total number of MCMC simulation steps. The bubble duration Δt (in MCMC steps) is a nonlinear function of its initial position (the beginning of the bubble), n, on its length, l, and amplitude threshold, thr.

  • Dynamic length: Since the average flipping profile represents the breathing propensity of each of the base pairs in a DNA region, we need a way to compare breathing profiles of two alleles that differ, e.g. due one or two SNPs. The dynamic length quantifies the statistical significance of the differences between two such flipping profiles. Specifically, the dynamic length, quantifies how many and precisely which base-pairs experience statistically significant changes caused by the introduced SNPs. We quantified the dynamic length by the concept of a “q-factor” as a function of the position of the base pair, which has been used to estimates the false discovery rate in multiple stochastic simulations/experiments (Lai 2017).

The average coordinates profile, (Fig. 1f), the flipping probability profile (Fig. 1g), the 3D bubble probability tensor (Fig. 1d and e), and the dynamic length profiles (Fig. 1h and i), are key factors for understanding the DNA breathing. The corresponding details are provided in the Supplementary Material.

3 Usage

The pyDNA-EPBD tool is distributed under the 3-Clause BSD License, and it is compatible with Windows, MacOS, and Linux-based systems. pyDNA-EPBD is designed to work seamlessly in large-scale deployments, whether on high-performance computing clusters or cloud infrastructures like Amazon Web Services. Users can provide the input DNA sequence in common formats, such as FASTA or text file. Upon running simulations with specific parameters, pyDNA-EPBD generates the following characteristic outputs: the average coordinates profile, the flipping probability profile, the 3D probability bubble tensor, and the dynamic length profile. The details on hyperparameters, experimental configuration for reproducibility and analysis is provided in the Supplementary Material.

Authors’ contributions

A.K. and M.B. developed and implemented pyDNA-EPBD, and A.K. performed all validation experiments. B.A., K.O.R., A.R.B., and A.U. developed and validated the EPBD algorithm in ANSI C, which served as the base of pyDNA-EPBD implementation. M.B., A.S., and B.A. consulted A.K. simulations. All authors wrote and edit the article.

Supplementary data

Supplementary data are available at Bioinformatics online.

Conflict of interest

None declared.

Funding

This work was supported by National Institute of Health [5R01MH116281-03 to B.A. and R01 HL128831 to A.U.].

Data availability

pyDNA-EPBD is Open Source Software published under the 3-Clause BSD License and can be found at https://github.com/lanl/pyDNA_EPBD.

References

Alexandrov
BS
,
Fukuyo
Y
,
Lange
M
et al.
DNA breathing dynamics distinguish binding from nonbinding consensus sites for transcription factor YY1 in cells
.
Nucleic Acids Res
2012
;
40
:
10116
23
.

Alexandrov
BS
,
Gelev
V
,
Monisova
Y
et al.
A nonlinear dynamic model of DNA with a sequence-dependent stacking term
.
Nucleic Acids Res
2009a
;
37
:
2405
10
.

Alexandrov
BS
,
Gelev
V
,
Yoo
SW
et al.
DNA dynamics play a role as a basal transcription factor in the positioning and regulation of gene transcription initiation
.
Nucleic Acids Res
2010
;
38
:
1790
5
.

Alexandrov
BS
,
Gelev
V
,
Yoo
SW
et al.
Toward a detailed description of the thermally induced dynamics of the core promoter
.
PLoS Comput Biol
2009b
;
5
:
e1000313
.

Alexandrov
BS
,
Phipps
ML
,
Alexandrov
LB
et al.
Specificity and heterogeneity of terahertz radiation effect on gene expression in mouse mesenchymal stem cells
.
Sci Rep
2013
;
3
:
1184
.

Alexandrov
BS
,
Rasmussen
,
Bishop
AR
et al.
Non-thermal effects of terahertz radiation on gene expression in mouse stem cells
.
Biomed Opt Express
2011a
;
2
:
2679
89
.

Alexandrov
BS
,
Valtchinov
VI
,
Alexandrov
LB
et al.
DNA dynamics is likely to be a factor in the genomic nucleotide repeats expansions related to diseases
.
PLoS One
2011b
;
6
:
e19800
.

Alexandrov
BS
,
Wille
L
,
Rasmussen
et al.
Bubble statistics and dynamics in double-stranded dna
.
Phys Rev E Stat Nonlin Soft Matter Phys
2006
;
74
:
050901
.

Alexandrov
LB
,
Bishop
AR
,
Rasmussen
K
et al.
The role of structural parameters in dna cyclization
.
BMC Bioinformatics
2016
;
17
:
68
10
.

Alexandrov
LB
,
Rasmussen
K
,
Bishop
AR
et al.
Evaluating the role of coherent delocalized phonon-like modes in dna cyclization
.
Sci Rep
2017
;
7
:
9731
.

Ares
S
,
Voulgarakis
NK
,
Rasmussen
et al.
Bubble nucleation and cooperativity in dna melting
.
Phys Rev Lett
2005
;
94
:
035504
.

Blagoev
K
,
Alexandrov
B
,
Goodwin
E
et al.
Ultra-violet light induced changes in dna dynamics may enhance TT-dimer recognition
.
DNA Repair (Amst)
2006
;
5
:
863
7
.

Bock
J
,
Fukuyo
Y
,
Kang
S
et al.
Mammalian stem cells reprogramming in response to terahertz radiation
.
PLoS One
2010
;
5
:
e15806
.

Choi
CH
,
Kalosakas
G
,
Rasmussen
et al.
DNA dynamically directs its own transcription initiation
.
Nucleic Acids Res
2004
;
32
:
1584
90
.

Dellarole
M
,
Sá Nchez
IE
,
de Prat Gay
G.
Thermodynamics of cooperative dna recognition at a replication origin and transcription regulatory site
.
Biochemistry
2010
;
49
:
10277
86
.

Duan
J
,
Shi
J
,
Fiorentino
A
et al. ;
Genomic Psychiatric Cohort consortium
.
A rare functional noncoding variant at the GWAS-implicated mir137/mir2682 locus might confer risk to schizophrenia and bipolar disorder
.
Am J Hum Genet
2014
;
95
:
744
53
.

Fenwick
RB
,
Esteban-Martín
S
,
Salvatella
X.
Understanding biomolecular motion, recognition, and allostery by use of conformational ensembles
.
Eur Biophys J
2011
;
40
:
1339
55
.

González-Jiménez
M
,
Ramakrishnan
G
,
Harwood
T
et al.
Observation of coherent delocalized phonon-like modes in DNA under physiological conditions
.
Nat Commun
2016
;
7
:
11799
.

Hillebrand
M
,
Kalosakas
G
,
Bishop
AR
et al.
Bubble lifetimes in dna gene promoters and their mutations affecting transcription
.
J Chem Phys
2021
;
155
:
095101
.

Hillebrand
M
,
Kalosakas
G
,
Skokos
C
et al.
Distributions of bubble lifetimes and bubble lengths in DNA
.
Phys Rev E
2020
;
102
:
062114
.

Jablensky
A
,
Angelicheva
D
,
Donohoe
G
et al.
Promoter polymorphisms in two overlapping 6p25 genes implicate mitochondrial proteins in cognitive deficit in schizophrenia
.
Mol Psychiatry
2012
;
17
:
1328
39
.

Klimašauskas
S
,
Roberts
RJ.
MHhal binds tightly to substrates containing mismatches at the target base
.
Nucleic Acids Res
1995
;
23
:
1388
95
.

Lai
Y.
A statistical method for the conservative adjustment of false discovery rate (q-value)
.
BMC Bioinformatics
2017
;
18
:
69
163
.

Maurice Guéron
MK
,
Leroy
JL.
A single mode of DNA base-pair opening drives imino proton exchange
.
Nature
1987
;
328
:
89
92
.

Nowak-Lovato
K
,
Alexandrov
LB
,
Banisadr
A
et al.
Binding of nucleoid-associated protein Fis to dna is regulated by dna breathing dynamics
.
PLoS Comput Biol
2013
;
9
:
e1002881
.

Perera
PGT
,
Appadoo
DR
,
Cheeseman
S
et al.
PC 12 pheochromocytoma cell response to super high frequency terahertz radiation from synchrotron source
.
Cancers (Basel)
2019
;
11
:
162
.

Phelps
C
,
Lee
W
,
Jose
D
et al.
Single-molecule FRET and linear dichroism studies of DNA breathing and helicase binding at replication fork junctions
.
Proc Natl Acad Sci USA
2013
;
110
:
17320
5
.

Poland
D
,
Scheraga
HA.
Theory of Helix-Coil Transitions in Biopolymers: Statistical Mechanical Theory of Order-disorder Transitions in Biological Macromolecules, Volume 11 of Molecular biology, Molecular biology; an international series of monographs and textbooks.
Academic Press, 1970,
797.

Roberts
RJ
,
Cheng
X.
Base flipping
.
Annu Rev Biochem
1998
;
67
:
181
98
.

Shang
S
,
Wu
X
,
Zhang
Q
et al.
0.1 Thz exposure affects primary hippocampus neuron gene expression via alternating transcription factor binding
.
Biomed Opt Express
2021
;
12
:
3729
42
.

Titova
LV
,
Ayesheshim
AK
,
Golubov
A
et al. Intense picosecond thz pulses alter gene expression in human skin tissue in vivo. In: Wilmink GJ, Ibey BL (eds.),
Terahertz and Ultrashort Electromagnetic Pulses for Biomedical Applications
, Vol.
8585
.
San Francisco, CA
: SPIE BiOS,
2013
,
117
126
.

Turton
DA
,
Senn
HM
,
Harwood
T
et al.
Terahertz underdamped vibrational motion governs protein-ligand binding in solution
.
Nat Commun
2014
;
5
:
3999
.

Vafabakhsh
R
,
Ha
T.
Extreme bendability of dna less than 100 base pairs long revealed by single-molecule cyclization
.
Science
2012
;
337
:
1097
101
.

Yan
J
,
Marko
JF.
Localized single-stranded bubble mechanism for cyclization of short double helix dna
.
Phys Rev Lett
2004
;
93
:
108108
.

This work is written by (a) US Government employee(s) and is in the public domain in the US.
Associate Editor: Arne Elofsson
Arne Elofsson
Associate Editor
Search for other works by this author on:

Supplementary data