Guanhua Zhu, Peiyong Jiang, Xingqian Li, Wenlei Peng, L Y Lois Choy, Stephanie C Y Yu, Qing Zhou, Mary-Jane L Ma, Guannan Kang, Jinyue Bai, Rong Qiao, Chian Xi Shirley Deng, Spencer C Ding, Wai Kei Jacky Lam, Stephen L Chan, So Ling Lau, Tak Y Leung, John Wong, K C Allen Chan, Y M Dennis Lo, Methylation-Associated Nucleosomal Patterns of Cell-Free DNA in Cancer Patients and Pregnant Women, Clinical Chemistry, Volume 70, Issue 11, November 2024, Pages 1355–1365, https://doi.org/10.1093/clinchem/hvae118
Abstract
Cell-free DNA (cfDNA) analysis offers an attractive noninvasive means of detecting and monitoring diseases. cfDNA cleavage patterns within a short range (e.g., 11 nucleotides) have been reported to correlate with cytosine-phosphate-guanine (CpG) methylation, allowing fragmentomics-based methylation analysis (FRAGMA). Here, we extended FRAGMA to a wider region harboring multiple nucleosomes, an approach termed FRAGMAXR.
We profiled cfDNA nucleosomal patterns over the genomic regions from −800 to 800 bp surrounding differentially methylated CpG sites, harboring approximately 8 nucleosomes, referred to as CpG-associated cfDNA nucleosomal patterns. Such nucleosomal patterns were analyzed by FRAGMAXR in cancer patients and pregnant women.
We identified distinct cfDNA nucleosomal patterns around differentially methylated CpG sites. Compared with subjects without cancer, patients with hepatocellular carcinoma (HCC) showed reduced amplitude of nucleosomal patterns, with a gradual decrease over tumor stages. Nucleosomal patterns associated with differentially methylated CpG sites could be used to train a machine learning model, resulting in the detection of HCC patients with an area under the receiver operating characteristic curve of 0.93. We further demonstrated the feasibility of multicancer detection using a dataset comprising lung, breast, and ovarian cancers. The tissue-of-origin analysis of plasma cfDNA from pregnant women and cancer patients revealed that the placental DNA and tumoral DNA contributions deduced by FRAGMAXR correlated well with values measured using genetic variants (Pearson r: 0.85 and 0.94, respectively).
CpG-associated cfDNA nucleosomal patterns are influenced by DNA methylation and might be useful for biomarker development in cancer liquid biopsy and noninvasive prenatal testing.
Introduction
There is increasing interest in studying fragmentation patterns of cell-free DNA (cfDNA) molecules (1). The fragmentation of cfDNA is nonrandom and is related to the tissues of origin. For instance, size profiles of DNA molecules derived from placental or tumoral tissues are associated with a series of enhanced 10-bp periodic peaks below 146 bp, compared with background DNA mainly of hematopoietic origin (2, 3). The studies of cfDNA fragmentation patterns have shed light on a wide range of biological and clinical implications. For example, DNA nucleases such as deoxyribonuclease 1 like 3 (DNASE1L3) play an important role in generating characteristic end motifs of cfDNA (4–7). Studies of nuclease knockout mouse models have demonstrated the effect of deoxyribonuclease 1 (DNASE1) and DNASE1L3 on the generation of jaggedness of short and long cfDNA molecules, respectively (8–10). It has been demonstrated that end motifs and jagged ends were potentially useful biomarkers for various types of cancer and autoimmune diseases (7, 9, 11). Making use of fragment size information of cfDNA molecules, including long cfDNA molecules (12–14), would enhance the performance of noninvasive prenatal testing (15) and cancer detection (16–19).
A number of characteristic DNA methylation signatures present in different tissues or cell types have been demonstrated to be useful to trace the tissue of origin of plasma cfDNA (20). Recently, Zhou et al. demonstrated the feasibility of deducing cfDNA methylation by taking advantage of short-range cfDNA cleavage patterns (21), termed fragmentomics-based methylation analysis (FRAGMA). It revealed that the cfDNA cleavage patterns proximal to cytosine-phosphate-guanine (CpG) sites exhibited a correlation with their methylation states. These results suggested a close link between DNA methylation and fragmentomics of cfDNA molecules, providing new possibilities to analyze the DNA methylation status using cfDNA fragmentomics in the context of pregnancy and oncology (21). However, in contrast to the reported “CpG-adjacent cleavage patterns” (e.g., 11 nucleotides surrounding a CpG site) (21), the relationship between the CpG methylation states and fragmentomic patterns over a much longer distance, e.g., along multiple nucleosomes, has not been fully elucidated. Here we refer to such longer-range interactions as “CpG-associated cfDNA nucleosomal patterns.”
In this study, we developed an approach to depict the fragmentation of cfDNA over distances encompassing several nucleosomes away from a CpG site, termed FRAGMAXR. We have previously reported that CpG-adjacent cleavage patterns at tissue-specific differentially methylated CpG sites (DMSs) enabled the detection of cancer (21). Here, we first explored whether those DMSs are correlated with differential fragmentation signals in terms of genomic distances spanning several nucleosomes deduced from cfDNA molecules. We further assess the diagnostic performance of distinguishing between subjects with and without hepatocellular carcinoma (HCC), based on CpG-associated nucleosomal patterns surrounding the DMSs. Similarly, the power of FRAGMAXR for multicancer detection was also examined on the basis of lung, breast, and ovarian cancers. In addition, the feasibility of tracing the tissues-of-origin of cfDNA based on nucleosomal patterns was investigated using circulating fetal DNA and tumoral DNA in pregnant women and cancer patients, respectively (Fig. 1).

Fig. 1. Schematic illustration for CpG-associated nucleosomal pattern analysis of cfDNA molecules. cfDNA molecules were aligned to the human reference genome and analyzed according to the genomic positions relative to the CpG sites, spanning multinucleosomal distance. To dissect the relationship between the nucleosomal patterns and DNA methylation, we determined DMSs between blood cells that are major contributors to plasma DNA and a targeted tissue of interest and deduced the nucleosomal patterns associated with genomic positions surrounding DMSs. Two types of DMSs are involved in this study. Type-A DMSs are CpG sites that exhibit hypomethylation in blood cells but are hypermethylated in a specific tissue of interest. Conversely, type-B DMSs are hypermethylated in blood cells and hypomethylated in the specific tissue. The nucleosomal pattern is defined as the proportion of cfDNA molecules fully spanning a window (e.g., 140 bp) centered at each queried genomic position. Making use of such nucleosomal patterns allows cancer detection and tissue-of-origin analysis for different pathophysiological states (e.g., pregnancy and cancer).
Materials and Methods
HCC Datasets
Two datasets generated by genome-wide sequencing (75 bp × 2 paired-end reads, Illumina) were obtained from previously published studies (2, 11). Dataset A comprised 89 human subjects, including 38 healthy controls, 17 subjects with chronic hepatitis B infection (HBV), and 34 patients with HCC (11). Dataset B comprised 225 human subjects, including 32 healthy controls, 36 subjects with cirrhosis, 67 subjects with HBV, and 90 patients with HCC. Of note, datasets A and B were processed by different experimental procedures in terms of DNA extractions and library preparations. The details regarding sample processing, library preparation, and sequencing alignment are described in Supplemental Methods.
Pregnant Subjects
This study was approved by the Joint Chinese University of Hong Kong-New Territories East Cluster Clinical Research Ethics Committee. Pregnant women attending the antenatal clinic at the Department of Obstetrics and Gynaecology, Prince of Wales Hospital, Hong Kong, China, were recruited with written informed consent. Peripheral blood samples were collected from 40 women in their first trimester of pregnancy.
Sample Processing and Sequencing
DNA was extracted from 1.6 mL of plasma from each sample using the QIAamp Circulating Nucleic Acid Kit (QIAGEN) according to the manufacturer's protocol. Indexed plasma DNA libraries were constructed using TruSeq Nano DNA Library Prep kits (Illumina) and xGen UDI-UMI adaptors (Integrated DNA Technologies). Multiplexed DNA libraries were sequenced in paired-end mode (75 bp × 2) on a NovaSeq 6000 platform (Illumina). Sequence read alignment was performed using SOAP2 (22). Only uniquely aligned and nonduplicated paired-end reads, spanning an insert size of ≤600 bp, were used for downstream analysis. Genomic DNA was extracted from 200 to 400 µL of maternal buffy coat samples using the QIAamp DNA Blood Mini Kit (QIAGEN). Maternal genotype information was obtained from microarray analysis using the Infinium Omni2.5 BeadChip (Illumina). Fetal DNA fraction was determined using FetalQuantSD (23).
Nucleosomal Score of cfDNA
CpG-associated nucleosomal patterns at a genomic position were deduced by analyzing cfDNA molecules sized 120 to 180 bp falling within a genomic window centered at that position. As shown in Supplemental Fig. 1, we first performed 2 analyses related to a genomic position: (a) the number of cfDNA molecules that fully spanned a 140-bp window centered on the position was denoted as F and (b) the number of partially spanned cfDNA molecules whose end-points were located within the 140-bp window was denoted as P. The nucleosomal footprint signal of a genomic position was calculated by the following formula:
$$\text{Nucleosomal footprint signal} = \frac{F}{F + P}$$
Nucleosome-protected genomic regions would be associated with more fully spanned cfDNA molecules and thus had higher observed nucleosomal footprint signals. The nucleosomal footprint signal value of each position in a target region (i.e., −800 to 800 bp relative to a CpG site) was normalized by subtracting the mean nucleosomal footprint signal of the target region. For simplicity, the subtracted nucleosomal footprint signal was referred to as a nucleosomal score that could quantitatively reflect the CpG-associated nucleosomal patterns. To make nucleosomal score analysis applicable to samples with shallow sequencing depths, the sequenced fragments derived from the regions associated with DMSs were pooled together to form an aggregate distribution of nucleosomal scores centered on the CpG sites.
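The counting and normalization steps described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline. It assumes that the footprint signal is the fraction F / (F + P), in line with the description of the pattern as the proportion of fully spanning molecules, that fragments are hypothetical (start, end) pairs with inclusive coordinates, and that pooling across DMSs means summing F and P counts before taking the ratio. All function names are illustrative.

```python
WINDOW = 140                 # window width in bp, as stated in the text
MIN_LEN, MAX_LEN = 120, 180  # fragment sizes analysed, as stated in the text


def count_f_and_p(fragments, pos, window=WINDOW):
    """Return (F, P) for one genomic position.

    fragments: iterable of (start, end), inclusive coordinates (assumed).
    F counts fragments covering the whole window; P counts fragments that
    do not cover it but have at least one end-point inside it.
    """
    half = window // 2
    # An even-width window cannot be perfectly centred; this convention
    # (pos - 70 .. pos + 69) is an assumption.
    w_start, w_end = pos - half, pos + half - 1
    f = p = 0
    for start, end in fragments:
        length = end - start + 1
        if not MIN_LEN <= length <= MAX_LEN:
            continue
        if start <= w_start and end >= w_end:
            f += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            p += 1
    return f, p


def footprint_signal(f, p):
    """Assumed form of the signal: fraction of fully spanning fragments."""
    total = f + p
    return f / total if total else 0.0


def nucleosomal_scores(signals):
    """Subtract the regional mean from each position's signal."""
    mean = sum(signals) / len(signals)
    return [s - mean for s in signals]


def aggregate_profile(fragments, cpg_positions, flank=800):
    """Pool counts over all CpG sites for each offset in [-flank, flank]."""
    signals = []
    for offset in range(-flank, flank + 1):
        f_total = p_total = 0
        for cpg in cpg_positions:
            f, p = count_f_and_p(fragments, cpg + offset)
            f_total += f
            p_total += p
        signals.append(footprint_signal(f_total, p_total))
    return nucleosomal_scores(signals)
```

The nested loops are deliberately naive for readability. A practical implementation would index fragments by position before scanning millions of sites.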
Nucleosomal scores associated with DMSs were normalized by background nucleosomal scores derived from 1 000 000 random CpGs, which were subsequently translated into a nucleosomal z score by:
$$z = \frac{x - \mu}{\delta}$$
where x denotes the background-normalized nucleosomal score. Each dataset had its own mean (μ) and SD (δ), calculated based on the data from half of the healthy subjects included in that dataset. The details are described in Supplemental Methods.
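As an illustration of the per-dataset z score step, the sketch below estimates μ and δ from a reference subset of healthy subjects and applies them to a sample. It assumes that μ and δ are computed separately for each position of the score vector and that δ is the sample SD. Both choices are assumptions, since the exact procedure is given only in the Supplemental Methods. The function names are hypothetical.

```python
from statistics import mean, stdev


def fit_reference(healthy_scores):
    """Estimate per-position mean and SD from reference healthy subjects.

    healthy_scores: list of equal-length score vectors, one per subject,
    e.g. the half of the healthy controls set aside for normalization.
    """
    columns = list(zip(*healthy_scores))
    mu = [mean(col) for col in columns]
    sd = [stdev(col) for col in columns]
    return mu, sd


def to_z_scores(scores, mu, sd):
    """Convert one sample's score vector into z scores.

    Positions with zero SD in the reference are set to 0.0 (a choice made
    here to avoid division by zero).
    """
    return [(s - m) / d if d > 0 else 0.0 for s, m, d in zip(scores, mu, sd)]
```

Fitting μ and δ within each dataset, rather than once globally, is what lets scores from differently processed cohorts be placed on a comparable scale.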
Cancer Detection
We performed a leave-one-out analysis to examine the diagnostic performance for cancer based on CpG-associated nucleosomal pattern analysis. The input feature vector for a sample contained nucleosomal scores of −800 to 800 bp relative to DMSs. The details are described in Supplemental Methods.
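The leave-one-out scheme can be sketched as follows: each sample is held out in turn, a model is trained on the remaining samples, and the held-out sample is then predicted. To keep the example dependency-free, a nearest-centroid classifier is used here as a stand-in and is not the classifier used in the study. The helper names and the toy feature vectors are hypothetical.

```python
def leave_one_out(features, labels, fit, predict):
    """Predict every sample with a model trained on all other samples."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, features[i]))
    return predictions


def fit_centroids(xs, ys):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(ys):
        rows = [x for x, y in zip(xs, ys) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, x):
    """Assign x to the class whose centroid is closest (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: dist2(centroids[label]))
```

Because `fit` and `predict` are passed in as arguments, the same loop would work with any other classifier that exposes a training step and a prediction step.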
Results
Nucleosomal Patterns Deduced From cfDNA Molecules Across Differentially Methylated CpG Sites
The use of cfDNA cleavage patterns in close proximity to tissue-specific methylated CpG sites allowed cancer detection and tissue-of-origin analysis of cfDNA molecules (21). In this study, we attempted to analyze a longer-range association, termed CpG-associated nucleosomal patterns of cfDNA molecules. We first analyzed the correlation between the nucleosomal fragmentation patterns and the tissue-specific hypomethylated and hypermethylated CpG sites. By comparing the methylation densities of white blood cells and HCC tumor tissues, we obtained 2 types of DMSs: 118 544 type-A DMSs with methylation level <30% in buffy coat and >70% in tumor and 842 892 type-B DMSs with methylation level <30% in tumor and >70% in buffy coat. Using cfDNA nonbisulfite sequencing data of healthy controls in dataset A (11), we observed that cfDNA fragmentation near both type-A and type-B DMSs displayed prominent wave-like nucleosomal patterns. However, the nucleosomal phases in the 2 DMS types appeared to be opposite (Fig. 2A). Moreover, the nucleosomal patterns around DMSs remained generally consistent with more stringent methylation thresholds (Supplemental Fig. 2). To further explore how the 2 types of DMSs were associated with the chromatin organization of the genome, we overlapped the DMSs with compartment A or B. These compartments were identified through Hi-C experiments that capture chromatin conformation. Compartment A is enriched for open chromatin, which tends to be more accessible for transcription factors and gene expression. In contrast, compartment B, which is associated with closed chromatin, often corresponds to inactive genomic regions (24). As shown in Fig. 2B, type-A DMSs were about 4-fold more enriched in compartment A than in compartment B (78% vs 20%). In contrast, type-B DMSs showed a reverse trend of lower proportion in compartment A compared with B (35% vs 60%). The genomic regions without sufficient mapped reads were categorized as “other regions” generally exhibiting low mappability (25).
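The two DMS definitions reduce to a pair of threshold tests on the methylation level of a CpG site in buffy coat and in the tissue of interest. The sketch below encodes the <30% and >70% cut-offs given in the text. The function name and the use of fractions between 0 and 1 are illustrative choices.

```python
def classify_dms(blood_meth, tissue_meth, low=0.30, high=0.70):
    """Classify a CpG site as 'type-A', 'type-B', or None.

    blood_meth, tissue_meth: methylation levels as fractions in [0, 1].
    Type-A: hypomethylated in blood cells, hypermethylated in the tissue.
    Type-B: hypermethylated in blood cells, hypomethylated in the tissue.
    """
    if blood_meth < low and tissue_meth > high:
        return "type-A"
    if tissue_meth < low and blood_meth > high:
        return "type-B"
    return None
```

Passing stricter values for `low` and `high` corresponds to the more stringent threshold settings mentioned for the supplemental analyses.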

Fig. 2. Comparison of nucleosomal patterns around type-A and type-B DMSs. (A), Mean nucleosomal scores (−800 to 800 bp relative to a CpG site) around type-A and type-B DMSs; (B), Percentages of the 2 types of DMSs located in compartments A and B of the genome.
Analysis of CpG-associated Nucleosomal Patterns in Cancer Patients
Unlike cfDNA from noncancerous individuals, cfDNA from cancer patients comprises both normal cfDNA molecules of primarily hematopoietic origin and circulating tumor DNA (ctDNA) released from tumor cells, the latter usually being a minority. As shown in Fig. 3A, compared with healthy controls, patients with advanced-stage HCC showed a reduced amplitude of nucleosomal patterns surrounding both types of DMSs. We measured the nucleosomal amplitude between each peak and its paired trough (denoted as A1 to A17, shown in Fig. 3B) and observed that all 17 pairs showed smaller peak-trough amplitudes in advanced-stage HCC patients than in healthy subjects (median: 0.49 vs 0.74; Fig. 3C). We compared amplitudes of nucleosomal patterns in individual subjects without HCC and patients with different stages of HCC tumors. There was a significant decrease of nucleosomal amplitude in HCC patients compared to non-HCC subjects (P < 0.001, Wilcoxon rank-sum test), with a gradual decrease of nucleosomal amplitude over tumor stages (median: 1.07, 1.04, and 0.88 for early-, intermediate-, and advanced-stage HCC, respectively; Fig. 3D). These results supported the hypothesis that the contribution of cancer-related DNA might reduce the amplitude of nucleosomal patterns in cfDNA.
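One way to compute peak-trough amplitudes from a nucleosomal score profile is to locate successive local maxima and minima and take the vertical distance between each adjacent pair. The sketch below does this for an already smoothed profile. The pairing rule (each extremum paired with the next one of the opposite kind) and the function names are assumptions, since the text defines the amplitudes only graphically.

```python
def local_extrema(values):
    """Return (index, kind) for strict local peaks and troughs."""
    extrema = []
    for i in range(1, len(values) - 1):
        if values[i] > values[i - 1] and values[i] > values[i + 1]:
            extrema.append((i, "peak"))
        elif values[i] < values[i - 1] and values[i] < values[i + 1]:
            extrema.append((i, "trough"))
    return extrema


def peak_trough_amplitudes(values):
    """Vertical distance between each pair of adjacent opposite extrema."""
    extrema = local_extrema(values)
    amplitudes = []
    for (i, kind_i), (j, kind_j) in zip(extrema, extrema[1:]):
        if kind_i != kind_j:
            amplitudes.append(abs(values[i] - values[j]))
    return amplitudes


def max_amplitude(values):
    """Largest peak-trough amplitude, or 0 if the profile has none."""
    amplitudes = peak_trough_amplitudes(values)
    return max(amplitudes) if amplitudes else 0
```

On noisy per-sample profiles, smoothing before extremum detection would be needed, since every small fluctuation otherwise registers as a peak or trough.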

Fig. 3. Analysis of CpG-associated nucleosomal patterns in cancer patients. (A), Mean nucleosomal scores (−800 to 800 bp relative to a CpG site) around DMSs using cfDNA data of healthy controls and patients with advanced-stage HCC in dataset A. Thick and thin lines denote type-A (methylation level <30% in buffy coat and >70% in HCC tumor) and type-B (methylation level <30% in HCC tumor and >70% in buffy coat) DMSs, respectively; (B), Definition of peak-trough amplitudes. The vertical height between the peak and trough of the nucleosomal score waves of type-A and type-B DMSs is referred to as the peak-trough amplitude, denoted as A1 to A17; (C), Peak-trough amplitudes in healthy controls and advanced-stage HCC patients; (D), Maximum peak-trough amplitudes of individual healthy controls, HBV subjects, patients with early (eHCC), intermediate (iHCC), and advanced-stage HCC (aHCC), respectively; (E), ROC analysis using individual peak-trough amplitudes (A1 to A17) to distinguish between HCC and non-HCC subjects. The top 6 peak-trough amplitudes with optimal diagnostics are shown; (F), HCC probability deduced with a machine learning model by using nucleosomal scores around DMSs (−800 to +800 bp relative to CpG site); (G), ROC analysis using predicted probabilities of HCC to distinguish between HCC and non-HCC subjects. Abbreviations: ROC, receiver operating characteristic curve.
Next, we used nucleosomal patterns around DMSs for cancer detection. A receiver operating characteristic curve analysis using the amplitude of individual peak-trough pairs (A1 to A17) enabled classification of HCC and non-HCC subjects with an area under the curve (AUC) of up to 0.86 (Fig. 3E). We further utilized a broad range of nucleosomal patterns (−800 to 800 bp relative to a CpG site; details in Methods) as input features, and a support vector machine (SVM) model was used to determine the probability of having HCC for each sample. Leave-one-out analysis showed that the probability of having HCC was significantly elevated in patients with HCC, compared with subjects without HCC (median: 0.81 vs 0.09; P < 0.001; Fig. 3F). Additionally, patients with advanced HCC stages tended to have higher probabilities (median: 0.93) of having HCC compared with intermediate and early stages (median: 0.80 and 0.72, respectively). The SVM model exhibited the ability to differentiate between patients with early-stage HCC and HBV subjects (P < 0.001; Fig. 3F), while these 2 groups were not statistically distinguishable when only using maximum amplitude (P = 0.60; Fig. 3D). Such a result suggested that by incorporating a variety of features derived from nucleosomal patterns, we may enhance the sensitivity of detecting early-stage cancers. A receiver operating characteristic curve analysis demonstrated that FRAGMAXR, based on nucleosomal patterns of cfDNA, could differentiate between subjects with and without HCC, achieving an AUC of 0.93 (Fig. 3G). At a specificity of 99%, FRAGMAXR could achieve a sensitivity of 58%, 89%, and 95% for early-, intermediate-, and advanced-stage HCC patients, respectively. Nucleosomal patterns were also capable of distinguishing subjects with and without HCC when DMSs were defined with different cut-offs of methylation levels (Supplemental Fig. 3). Interestingly, FRAGMAXR appeared to outperform fragment size (AUC = 0.82), motif diversity score (AUC = 0.86) reflecting the end motif diversity as previously reported (11), and ichorCNA based on copy number aberrations (26) (AUC = 0.70) for cancer detection (Supplemental Fig. 4A).
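The two performance measures quoted above, AUC and sensitivity at a fixed specificity, can be computed directly from predicted probabilities. The sketch below uses the rank-based definition of AUC (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half) and derives the cut-off from the control scores. The function names and the tie-handling convention at the threshold are assumptions, not details taken from the study.

```python
import math


def auc(pos_scores, neg_scores):
    """Rank-based AUC: P(case score > control score), ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sensitivity_at_specificity(pos_scores, neg_scores, specificity=0.99):
    """Fraction of cases above the cut-off that keeps the target specificity.

    The cut-off is the smallest control score such that at least the
    requested fraction of controls lie at or below it. Samples scoring
    strictly above the cut-off are called positive.
    """
    ordered = sorted(neg_scores)
    needed = math.ceil(specificity * len(ordered))
    threshold = ordered[needed - 1]
    called = sum(1 for p in pos_scores if p > threshold)
    return called / len(pos_scores)
```

With small control groups, a 99% specificity target forces the cut-off to the highest control score, so the reported sensitivity depends heavily on that single sample.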
Downsampling analysis was performed to determine the minimal number of fragments that FRAGMAXR required for a robust classification between subjects with and without HCC. As a result, relative to the use of all fragments (median: 38 million), the use of 5 million fragments could achieve a comparable accuracy of classification (AUC: 0.93 vs 0.92; Supplemental Fig. 5). If we adopted FRAGMA combined with FRAGMAXR using 5 million fragments, AUC could be improved to 0.98, as compared with an AUC of 0.96 for FRAGMA (Supplemental Fig. 6A). At a specificity of 99%, when applying FRAGMAXR to analyze the 13 false negatives classified by FRAGMA, we reidentified 4 additional cancer cases. Combining FRAGMAXR and FRAGMA showed a relatively greater improvement of AUC than integrating FRAGMAXR with ichorCNA, motif diversity score, and size features (Supplemental Fig. 4B).
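Downsampling of this kind amounts to drawing a fixed number of fragments at random, without replacement, from each sample before recomputing the features. A minimal sketch is shown below. The function name and the seeded generator are illustrative choices made so that a run can be repeated exactly.

```python
import random


def downsample(fragments, n, seed=0):
    """Return n fragments drawn without replacement.

    If the sample holds n fragments or fewer, all of them are returned.
    A dedicated, seeded generator keeps the draw reproducible.
    """
    if n >= len(fragments):
        return list(fragments)
    rng = random.Random(seed)
    return rng.sample(fragments, n)
```

For example, `downsample(fragments, 5_000_000)` would correspond to the 5 million fragment setting evaluated in the text.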
We further illustrated the feasibility of FRAGMAXR in detecting various types of cancer. We identified DMSs associated with lung, breast, and ovarian cancers, respectively, by comparing methylation levels between buffy coat cells and the respective tumor tissues (details in Supplemental Methods). We used nucleosomal patterns around DMSs to detect cancers in plasma cfDNA samples from the previously published DELFI cohort (17), including patients with lung (n = 12), breast (n = 54), and ovarian (n = 28) cancers, as well as healthy individuals (n = 245). We demonstrated that FRAGMAXR had a great power of multicancer detection (Fig. 4A), achieving AUC values of 0.99, 0.81, and 0.93 for lung, breast, and ovarian cancers, respectively (Fig. 4B). Furthermore, a higher predicted probability of cancer was generally observed in a patient with a higher ctDNA fraction (Supplemental Fig. 7).

Fig. 4. Multicancer detection for lung, breast, and ovarian cancers using FRAGMAXR. (A), Probabilistic scores of having cancer in datasets with different cancer types; (B), Receiver operating characteristic curve analysis using predicted probabilistic scores of having cancer.
Generalizability of Cancer Detection Approach Across Different Datasets
To examine the generalizability of the analytic framework in this study, we tested our HCC detection approach on another cohort (i.e., dataset B). Using the aforementioned analysis on nucleosomal fragmentation patterns around DMSs, we first observed a decreased nucleosomal amplitude in HCC patients compared to non-HCC subjects (including healthy controls, patients with cirrhosis, and HBV carriers) (P < 0.001; Supplemental Fig. 8). Next, an SVM model achieved a good separation between HCC patients and non-HCC subjects using the probability of having HCC (median: 0.93 vs 0.10; P < 0.001; Fig. 5A), with a receiver operating characteristic curve AUC of 0.93 for differentiating HCC patients from all non-HCC subjects (Fig. 5B). If we downsampled to 5 million fragments, we could still achieve a good separation between HCC patients and non-HCC subjects with an AUC of 0.90, and the combined analysis with FRAGMA (AUC: 0.93) could further enhance the AUC to 0.95 (Supplemental Fig. 6B). At a specificity of 99%, FRAGMAXR could reidentify an additional 26 cancer cases out of 51 false negatives that were initially determined by FRAGMA. Furthermore, as the vast majority of patients (94%) in this cohort were diagnosed at Barcelona Clinic Liver Cancer stage A, the patients were grouped into 3 levels of tumor burden based on ctDNA fractions instead of tumor stages. We found that the predicted probability of HCC increased with tumor burden (median HCC probability: 0.89, 0.95, and 0.96 for patients with a ctDNA fraction of <3%, 3%–10%, and >10%, respectively; Fig. 5A).

Fig. 5. Cancer detection across datasets. (A), Cancer detection model for dataset B. Leave-one-out analysis was applied to dataset B to calculate HCC probability by using nucleosomal scores. Patients were grouped into 3 subgroups based on ctDNA fractions in plasma, which were estimated from copy number aberrations according to ichorCNA; (B), ROC analysis using predicted probabilities of HCC in dataset B; (C), ROC analysis of predicted probabilities of HCC in an independent test set based on a machine learning model trained from a different dataset. Abbreviations: ROC, receiver operating characteristic curve.
We further investigated the robustness of the nucleosomal patterns-based diagnostics presented in this study. Specifically, we tested whether the cancer detection model trained from one study cohort could be generalized to another cohort even with different sequencing protocols. We attempted to differentiate between patients with and without HCC in dataset A (11) by using the SVM model that was trained based on dataset B (2). Nucleosomal scores were normalized with background nucleosomal signal and z score statistic in each dataset to minimize interdataset biases (details in Methods). As a result, we could indeed observe significantly higher probabilities of having HCC in patients with HCC, compared with non-HCC subjects in the independent test dataset (P < 0.001; Supplemental Fig. 9). We could achieve an AUC of 0.93 for differentiating HCC patients from healthy controls and an AUC of 0.86 for differentiating from all non-HCC subjects in which HBV carriers were also included (Fig. 5C), demonstrating that the nucleosomal patterns-based approach for cancer detection could indeed be generalized across different datasets. Of note, the AUC would decrease to 0.62 if the z score normalization was removed, demonstrating that the z score normalization was crucial for reducing preanalytical discrepancies between datasets (Supplemental Fig. 10).
Tissue-of-origin Analysis for Nucleosomal Patterns
We demonstrated that the use of FRAGMAXR could detect patients with cancers. One possible biological basis was that the signals deduced by FRAGMAXR might carry tumor-specific characteristics. To gain more evidence concerning the tissue specificity of such signals, we explored whether FRAGMAXR could reflect the DNA contribution into blood plasma from a specific tissue type. As the placental tissues harbored a large number of unique alleles that were present in placental tissues but absent in background maternal genomes, the placental contribution could be directly deduced using genotype information between the fetal and maternal genomes (3), providing a gold standard for assessing the nucleosomal pattern-based approach for deducing placental contribution.
We analyzed nucleosomal patterns from regions surrounding the CpG sites that show differential methylation levels in buffy coat and placenta (i.e., methylation level <30% in buffy coat and >70% in placenta and vice versa) using nonbisulfite cfDNA data from pregnant women. Pregnant women with higher fetal DNA fractions (≥10%) generally showed decreased nucleosomal amplitude compared to those with lower fractions (<10%) (P < 0.001; Fig. 6A). Such patterns were in line with the finding in HCC patients (Fig. 3D). Moreover, we used nucleosomal scores to construct a regression function with a Ridge model (27) for predicting fetal DNA fraction. Fetal DNA fractions predicted by nucleosomal patterns were well correlated with the actual fetal DNA fraction deduced by a single nucleotide polymorphism-based approach (23), with a Pearson correlation of 0.91 and 0.85 for training and test sets, respectively (Fig. 6B). On the other hand, the use of nucleosomal scores also enabled the prediction of ctDNA fractions in plasma samples from HCC patients. The predicted ctDNA fractions strongly correlated with those deduced by ichorCNA (Pearson r: 0.94; Supplemental Fig. 11). Additionally, using nucleosomal z scores around DMSs as input features, we could develop an SVM-based model for multicancer classification using the plasma samples shown in Figs. 3 and 4, which included liver, lung, breast, and ovarian cancers. Leave-one-out analysis showed that we could correctly identify 79% (27/34) of liver cancer, 83% (10/12) of lung cancer, 70% (38/54) of breast cancer, and 71% (20/28) of ovarian cancer samples. These results suggested that nucleosomal pattern analysis could be used for deducing the tissue contribution into the plasma DNA pool.
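The regression-and-correlation step can be illustrated with a deliberately reduced example: ridge regression with a single input feature, followed by a Pearson correlation between predicted and reference fractions. The study used the full vector of nucleosomal scores as input, so the one-feature closed form below is a simplification chosen for clarity. The function names and the penalty parameter `alpha` are hypothetical.

```python
import math


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Single-feature ridge regression with an unpenalized intercept.

    Minimizes sum((y - a - b*x)^2) + alpha * b^2, which gives
    b = Sxy / (Sxx + alpha) and a = mean(y) - b * mean(x).
    """
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + alpha)
    intercept = my - slope * mx
    return slope, intercept


def predict_1d(model, x):
    """Apply a fitted (slope, intercept) pair to one value."""
    slope, intercept = model
    return intercept + slope * x


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

A larger `alpha` shrinks the slope toward zero, which trades some fit on the training set for stability on held-out samples.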

Fig. 6. Analysis of CpG-associated nucleosomal patterns in pregnant women. (A), Comparison of maximum peak-trough amplitudes between the individuals with lower (<10%) and higher fetal DNA fraction (≥10%); (B), Comparison between actual and predicted fetal DNA fractions in the training and test sets based on a machine learning regression model. The actual fractions were deduced by a single nucleotide polymorphism-based approach.
Discussion
In this study, we demonstrated that CpG-associated nucleosomal patterns of cfDNA molecules are associated with DNA methylation. The use of nucleosomal signals of cfDNA associated with DMSs could enable the detection of patients with cancer, suggesting that the nucleosomal signals contain tissue-of-origin information. The relationship between the nucleosomal signals and tissue-of-origin was further evidenced by the strong correlation between placental DNA contributions deduced by FRAGMAXR and those deduced from single nucleotide polymorphisms (23), as well as between tumor DNA contributions determined by FRAGMAXR and those determined from copy number aberrations (26).
Recently, Zhou et al. demonstrated that the use of the cleavage profile of a CpG site could enable the deduction of its methylation status (21), termed FRAGMA technology. The original demonstration of the FRAGMA concept was based on a cleavage profile depicted by the frequency of cfDNA fragment ends at each position within an 11-bp range relative to the CpG of interest. In contrast, the nucleosomal signals defined in this study referred to an extended genomic region, covering multiple nucleosomal units. We therefore named the new approach FRAGMAXR. We conjectured that these 2 approaches might capture synergistic information on cfDNA fragmentation patterns. FRAGMA captured molecular information related to local cutting preference at a CpG site, while FRAGMAXR captured a wide-ranging chromatin structure. We showed that the combined use of FRAGMA and FRAGMAXR enhanced the detection of cancer compared with using either method solely (Supplemental Fig. 6), demonstrating the additional diagnostic power contributed by FRAGMAXR.
Previous studies on nucleosomal patterns of cfDNA mainly focused on its association with transcriptional activity. Snyder et al. attempted to correlate the spacing distance between 2 nucleosomes with RNA expression datasets to inform tissues-of-origin of cfDNA (28). Additionally, nucleosome occupancies at the transcription start site (29) and transcription factor binding sites (30) have been used for informing transcriptional activity. Nucleosomal patterns at the regulatory regions have been reported to be useful for cancer detection and tumor subtyping (19, 30–34). In this study, we analyzed fragmentation patterns at the genomic regions with differential methylation statuses. The majority of those DMSs are not located in the transcription start site and transcription factor binding site regions (Supplemental Fig. 12), suggesting that many other genomic regions harbor tissue-specific nucleosomal fragmentation patterns.
The biological mechanism as to why the amplitude of nucleosomal patterns appeared to decline in patients with HCC remains elusive in this study. One possible explanation might be that the nucleosomal phases around CpG sites across tumoral cells might be more heterogeneous compared with nontumoral cells (35). Hence, the nuclease-mediated cleavage patterns depending on chromatin structures during cfDNA generation would be relatively “blurred,” resulting in the decreased amplitude of nucleosomal patterns. However, in the realm of cancer diagnosis, it is worth noting that the early detection of cancer is often challenging, and further studies are needed to investigate whether our method could shift the tumor stages on diagnosis toward the earlier stages. Multicenter trials with large-scale sample sizes encompassing multiple cancer types are necessary for further validation of potential clinical applications of FRAGMAXR. Future research into the intricate impact of chromatin structures and epigenetics on cfDNA fragmentomics would be an exciting avenue.
In summary, we developed a cell-free DNA-based analytical approach utilizing the methylation-associated nucleosomal signals across multiple nucleosomes. The analysis of nucleosomal signals associated with the predefined tissue-specific methylation status would enable the analysis of disease states, such as cancer, or physiological states, such as pregnancy, opening many new possibilities for the use of plasma DNA sequencing for molecular diagnostics.
Data Availability
Sequencing raw data were obtained from previously published studies (2, 11, 17). The 2 HCC datasets were obtained from the European Genome-Phenome Archive (EGA) (https://www.ebi.ac.uk/ega/), and the corresponding data accession numbers are EGAS00001001024 (2) and EGAS00001003409 (11). The DELFI dataset was obtained from EGA under accession number EGAS00001003611 (17).
Supplemental Material
Supplemental material is available at Clinical Chemistry online.
Nonstandard Abbreviations
cfDNA, cell-free DNA; FRAGMA, fragmentomics-based methylation analysis; CpG, cytosine-phosphate-guanine; DMS, differentially methylated CpG site; HCC, hepatocellular carcinoma; HBV, hepatitis B virus; ctDNA, circulating tumor DNA; AUC, area under the receiver operating characteristic curve.
Human Genes
DNASE1L3, deoxyribonuclease 1 like 3; DNASE1, deoxyribonuclease 1.
Author Contributions
The corresponding author takes full responsibility that all authors on this publication have met the following required criteria of eligibility for authorship: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; (c) final approval of the published article; and (d) agreement to be accountable for all aspects of the article thus ensuring that questions related to the accuracy or integrity of any part of the article are appropriately investigated and resolved. Nobody who qualifies for authorship has been omitted from the list.
Guanhua Zhu (Conceptualization-Lead, Formal analysis-Lead, Investigation-Lead, Methodology-Lead, Writing—original draft-Lead, Writing—review & editing-Lead), Peiyong Jiang (Formal analysis-Lead, Investigation-Lead, Methodology-Lead, Supervision-Lead, Writing—original draft-Lead, Writing—review & editing-Lead), Xingqian Li (Formal analysis-Lead, Investigation-Lead), Wenlei Peng (Formal analysis-Lead, Investigation-Lead), L.Y. Lois Choy (Formal analysis-Equal, Investigation-Equal), Stephanie C.Y. Yu (Data curation-Equal, Writing—original draft-Supporting), Qing Zhou (Formal analysis-Supporting, Investigation-Equal), Mary-Jane L. Ma (Investigation-Equal, Writing—original draft-Supporting), Guannan Kang (Formal analysis-Supporting, Investigation-Supporting), Jinyue Bai (Formal analysis-Supporting, Investigation-Supporting), Rong Qiao (Formal analysis-Supporting, Investigation-Supporting), Chian Xi Shirley Deng (Investigation-Supporting), Spencer C. Ding (Formal analysis-Supporting), Wai Kei Jacky Lam (Formal analysis-Supporting), Stephen L. Chan (Investigation-Supporting), So Ling Lau (Data curation-Equal), Tak Y. Leung (Data curation-Equal), John Wong (Investigation-Supporting), K.C. Allen Chan (Formal analysis-Equal, Investigation-Lead, Methodology-Lead, Project administration-Equal, Resources-Equal, Supervision-Equal), and Y.M. Lo (Conceptualization-Lead, Funding acquisition-Lead, Investigation-Lead, Project administration-Lead, Resources-Lead, Supervision-Lead, Writing—original draft-Lead, Writing—review & editing-Lead)
Authors’ Disclosures or Potential Conflicts of Interest
Upon manuscript submission, all authors completed the author disclosure form.
Research Funding
This work was supported by the Innovation and Technology Commission of the Hong Kong SAR Government (InnoHK Initiative). Y.M.D. Lo is supported by an endowed chair from the Li Ka Shing Foundation and the Research Grants Council of the Hong Kong Special Administrative Region (SAR) Government under the theme-based research scheme (T12-401/16-W).
Disclosures
Y.M.D. Lo, K.C.A. Chan, P. Jiang, and G. Zhu have filed a patent application related to this work. P. Jiang is a consultant to KingMed Future; has royalties or licenses with Illumina, Grail, Xcelom, DRA, and Take2; has filed patent applications related to cell-free DNA; has stock or stock options with Grail/Illumina, DRA, and Take2; and is a director of DRA, Take2, KingMed Future, and Insighta. L.Y.L. Choy has filed patents on cell-free DNA analysis. S.C.Y. Yu has received patent royalties from Illumina, Xcelom, Take2, and DRA; has received financial support related to conference and travel from Oxford Nanopore Technologies; and has been granted or filed patents on cell-free nucleic acid analysis. W.K.J. Lam is a director of DRA and holds equities in Grail/Illumina. S.L. Chan has received research grants from MSD and Eisai; honoraria from MSC, AZ, Eisai, Roche, and Bayer; and consulting fees from MSC, AZ, Eisai, BMS, and Roche. K.C.A. Chan has royalties or licenses with Illumina, Grail, DRA, and Take2; has received travel support from BioRad; has filed patent applications related to cell-free DNA; holds equities in DRA, Take2, Insighta, and Illumina; and is a director of DRA, Take2, KingMed Future, Centre for Novostics, and Insighta. Y.M.D. Lo holds equities in Grail/Illumina, DRA, Take2, and Insighta; has filed patent applications related to cell-free DNA; has royalties or licenses from Illumina, Grail, Xcelom, DRA, Take2, and Insighta; and has received consulting fees from Grail and Decheng Capital.
Role of Sponsor
The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, preparation of the manuscript, or final approval of the manuscript.
References
Author notes
Guanhua Zhu and Peiyong Jiang contributed equally to this work.