COCOA: A Framework for Fine-scale Mapping of Cell-type-specific Chromatin Compartments Using Epigenomic Information


Abstract

Chromatin compartmentalization and epigenomic modifications play crucial roles in cell differentiation and disease development. However, precise mapping of chromatin compartment patterns requires Hi-C or Micro-C data at high sequencing depth, and the systematic relationship between epigenomic modifications and compartment patterns remains difficult to explore. To address these issues, we present COCOA, a deep neural network framework that uses convolution and attention mechanisms to infer fine-scale chromatin compartment patterns from six histone modification signals. COCOA extracts 1D track features through bidirectional feature reconstruction after resolution-specific binning of epigenomic signals. These track features are then cross-fused with contact features using an attention mechanism and transformed into chromatin compartment patterns through residual feature reduction. COCOA accurately infers chromatin compartmentalization at fine-scale resolution and performs stably on test sets. Additionally, we explored the impact of histone modifications on chromatin compartmentalization prediction through in silico epigenomic perturbation experiments. Unlike the indistinct compartments observed in high-depth experimental data at 1-kb resolution, COCOA generates clear and detailed compartment patterns, highlighting its superior performance. Finally, we demonstrate that COCOA enables cell-type-specific prediction of previously uncharacterized chromatin compartment patterns in various biological processes, making it an effective tool for gaining insights into chromatin compartmentalization from epigenomics in diverse biological scenarios. The COCOA Python code is publicly available at https://github.com/onlybugs/COCOA and https://ngdc.cncb.ac.cn/biocode/tools/BT007498.

Keywords: Three-dimensional genome, Deep learning, Chromatin compartment, Histone modification, Cell-type specificity

Introduction

The three-dimensional (3D) architecture of chromatin is essential for gene expression regulation during cell differentiation and disease development [1,2]. Recent advances in next-generation sequencing have led to the development of several chromosome conformation capture techniques, such as Hi-C, Micro-C, and Pore-C [3–5], which enable the exploration of multiscale chromatin structural elements, including chromatin compartments [3,6], topological associating domains (TADs) [7,8], loops [6], stripes [9], and microcompartments [10]. These techniques have revealed that chromatin can be segregated into A and B compartments [3,11]. A compartments generally correspond to active chromatin, whereas B compartments are mostly transcriptionally repressed. These chromatin compartments are closely linked to the mechanisms underlying various key biological processes [12,13].

To identify chromatin compartments, sequencing data are usually processed into contact maps, and distance effects are removed using normalization methods. The normalized contact map is then used to calculate the correlation matrix (CM), which is subjected to principal component analysis (PCA). The sign of the first principal component (PC1) indicates the compartment state [3]. Most analyses of chromatin compartments rely on the CM and PC1 [14–16]. Although the CM is readily obtained and of high quality at megabase scale, it becomes noisy at resolutions finer than 25 kb and fails to show clear plaid patterns because the underlying contact maps are sparse. Recent studies have suggested that fine-scale chromatin compartments are associated with other structural elements [17,18], histone modifications, and chromatin accessibility [19]. However, the resolution of the available chromatin compartment data does not match that of the epigenomic data, which makes it difficult to connect epigenomics with chromatin compartmentalization. Furthermore, because of technical limitations and sequencing costs [20], experimentally mapping high-resolution chromatin compartments is expensive and labor-intensive. Therefore, there is an unmet need for a computational method that obtains fine-scale CMs across multiple cell lines.

In the past decade, deep learning [21] has become a widely used tool in computational 3D genomics, with applications including TAD boundary recognition [22,23], chromatin loop detection [24,25], chromatin interaction data enhancement [26,27], interaction matrix generation [28–30], and single-cell Hi-C imputation [31,32]. Although several methods address contact map generation and enhancement, most of them lack cell-type specificity. For example, HiC-Reg [33] uses fourteen epigenomic signals from five cell lines to predict short-range chromatin interactions with random forests, and Akita [29] and Orca [34] use convolutional neural networks to predict contact maps from DNA sequences. However, these methods cannot directly infer contact maps across different cell types. Two recently proposed methods, C.Origami [35] and Epiphany [30], address this limitation by using histone modification and chromatin accessibility data. C.Origami predicts short-range interactions by integrating CCCTC-binding factor (CTCF) binding, chromatin accessibility, and DNA sequence information through a neural network containing attention and convolutional modules. Epiphany uses multiple epigenomic signals to generate short-range chromatin contact maps. However, these methods have limitations with respect to chromatin compartmentalization and generalization. First, they concentrate on predicting short-range interactions (TADs and loops) while ignoring long-range interactions (compartments). Second, the relationship between compartmentalization and histone modifications remains unresolved. Third, these models require inputs with fixed bin sizes, which limits scalability and prevents cross-resolution prediction.

To resolve these limitations, we introduce COCOA, a method that predicts the cell-type-specific CM from six types of readily available epigenomic modification signals. COCOA uses bidirectional feature reconstruction and cross-attention fusion to reconstruct and fuse epigenomic features, and then applies residual feature reduction to map the fused features to the CM. COCOA is specifically designed to generate chromatin compartmentalization patterns, and the predicted CM can be used directly to determine compartment states. We evaluated the performance of COCOA using multiple metrics, including mean square error (MSE), mean absolute error (MAE), GenomeDISCO score, Pearson correlation coefficient (PCC), structural similarity (SSIM) index, and peak signal-to-noise ratio (PSNR). The results demonstrate that COCOA accurately generates significant and biologically meaningful CMs. Furthermore, we conducted in silico perturbation experiments to investigate the influence of histone modifications on compartment prediction. We also tested the generalization of COCOA by making predictions with resolution-specific and cell-type-specific data. The results show that COCOA performs robustly at various resolutions across diverse cell lines, providing insights into chromatin compartment patterns in immune and disease tissues.

Method

Hi-C and Micro-C data sources and preprocessing

We collected publicly available processed Hi-C and Micro-C data of different cell lines from the 4DN database [36]. Intra-chromosomal contact maps were computed from these data for model training and testing (Table S1). Depending on the specific task, the intra-chromosomal contact maps were computed at different resolutions using the cooler package [37]. To eliminate the distance effect in the contact maps, we applied the observed-expected (OE) normalization method [3]. Finally, these normalized contact maps were converted into CMs, which clearly depict the plaid pattern of chromatin compartmentalization.
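The OE normalization and correlation steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the COCOA preprocessing code: we assume a dense, symmetric intra-chromosomal matrix (for example, one returned by `cooler.Cooler(...).matrix(balance=True).fetch(chrom)`), define the expected count at each genomic distance as the mean of the corresponding diagonal, and compute the CM as the Pearson correlation between rows of the OE map. The function names are hypothetical.

```python
import numpy as np


def observed_expected(contacts: np.ndarray) -> np.ndarray:
    """OE-normalize a dense symmetric contact map.

    Assumption: the expected count at genomic distance d is the mean of the
    d-th diagonal (other expected models exist).
    """
    n = contacts.shape[0]
    oe = np.zeros((n, n), dtype=float)
    for d in range(n):
        observed = np.diagonal(contacts, offset=d).astype(float)
        expected = observed.mean()
        if expected <= 0:
            continue  # leave empty diagonals at zero
        rows = np.arange(n - d)
        oe[rows, rows + d] = observed / expected
        oe[rows + d, rows] = observed / expected  # keep the matrix symmetric
    return oe


def correlation_matrix(oe: np.ndarray) -> np.ndarray:
    """Pearson correlation between rows of the OE map (the CM).

    Rows with zero variance (e.g., unmappable bins) yield NaN, which is set to 0.
    """
    with np.errstate(invalid="ignore", divide="ignore"):
        cm = np.corrcoef(oe)
    return np.nan_to_num(cm, nan=0.0)
```

A matrix that follows pure distance decay becomes all ones after OE normalization, which is the intended effect: only deviations from the distance trend, such as the plaid compartment signal, remain for the correlation step.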

ChIP-seq data sources and preprocessing

Histone modification signals (H3K9me3, H3K27ac, H3K4me1, H3K27me3, H3K4me3, and H3K36me3) from ChIP-seq [38] data for all cell types were retrieved from the ENCODE project [39] (Table S2). The ChIP-seq data were binned at the target resolution using the pyBigWig package (Figure 1A). After binning, a $\log(x + 1)$ transformation and min-max normalization were applied. Finally, the processed tracks were combined into an epigenomic matrix (EM).
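The binning and scaling described above can be sketched with NumPy as follows. We assume each track is already available as a per-base signal array for one chromosome (with pyBigWig, per-bin means could instead be obtained directly via `bw.stats(chrom, start, end, type="mean", nBins=n_bins)`), that missing values count as zero, that a trailing partial bin is kept, and that min-max scaling is applied per track. The source does not specify these details, and the helper names are hypothetical.

```python
import numpy as np

# Track order used to stack the epigenomic matrix (EM); the order is an assumption.
TRACKS = ["H3K9me3", "H3K27ac", "H3K4me1", "H3K27me3", "H3K4me3", "H3K36me3"]


def bin_signal(per_base: np.ndarray, resolution: int) -> np.ndarray:
    """Average a per-base signal into bins of `resolution` bp (last bin may be shorter)."""
    signal = np.nan_to_num(np.asarray(per_base, dtype=float))  # missing values -> 0
    starts = np.arange(0, len(signal), resolution)
    sums = np.add.reduceat(signal, starts)
    lengths = np.diff(np.append(starts, len(signal)))
    return sums / lengths


def scale_track(binned: np.ndarray) -> np.ndarray:
    """log(x + 1) transform followed by min-max normalization to [0, 1]."""
    logged = np.log1p(binned)
    lo, hi = logged.min(), logged.max()
    if hi == lo:
        return np.zeros_like(logged)  # a flat track carries no information
    return (logged - lo) / (hi - lo)


def build_em(per_base_tracks: list, resolution: int) -> np.ndarray:
    """Stack scaled, binned tracks into an m x n epigenomic matrix."""
    return np.vstack([scale_track(bin_signal(t, resolution)) for t in per_base_tracks])
```

Because the bin size equals the CM resolution, each column of the resulting EM lines up with one row (and column) of the CM.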

Figure 1

COCOA pipeline and architecture

A. COCOA pipeline: six epigenomic signals, binned at the resolution of the target CM (resolution-specific binning), serve as inputs to predict the CM. B. COCOA architecture: COCOA extracts 1D track features from each input (the bidirectional feature reconstruction module) and then combines these features with spatial contact features (the cross-attention fusion module). The contact features are further processed by the residual feature reduction module to obtain the final prediction. Parameters are updated by backpropagation with a composite loss function. Refer to the Method section for details. CM, correlation matrix; AFF, attention feature fusion.


Dividing matrices

The preprocessing step generates two matrices: a symmetric CM with dimensions $n \times n$ ($CM_{n \times n}$) and an EM with dimensions $m \times n$ ($EM_{m \times n}$). Each entry $CM_{ij}$ of $CM_{n \times n}$ represents the correlation strength between genomic segments $i$ and $j$. Positive values indicate that the two genomic segments share the same interaction pattern, while negative values indicate opposite interaction patterns, providing information about the status of chromatin compartments. Each entry $EM_{ij}$ of $EM_{m \times n}$ represents the signal strength of genomic segment $j$ on epigenomic track $i$.

To better preserve the plaid pattern of chromatin compartmentalization and adapt the data to the inputs of the neural network, we implemented the following processing scheme. First, $CM_{n \times n}$ was divided into sub-matrices of size $k \times k$ ($SCM_{k \times k}$), and $EM_{m \times n}$ was divided into pairs of sub-matrices of size $m \times k$ ($SLEM_{m \times k}$ and $SREM_{m \times k}$). Starting at the diagonal position in the top-left corner of $CM_{n \times n}$, we moved horizontally, dividing the matrix into $SCM_{k \times k}$ blocks. Simultaneously, the two groups of genomic loci corresponding to the rows and columns of each block were extracted from EM as $SLEM_{m \times k}$ and $SREM_{m \times k}$. After completing a horizontal pass, we moved the current position diagonally by $k$ bins. This process was repeated until $CM_{n \times n}$ could no longer be divided. Because of computational resource constraints, we subsampled the $SCM_{k \times k}$ blocks in groups to reduce the size of the training dataset, subsampling the corresponding $SLEM_{m \times k}$ and $SREM_{m \times k}$ in sync. Finally, these data were saved separately for further modeling.
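The division scheme above can be sketched as follows. This is an illustration under stated assumptions: CM and EM are NumPy arrays for one chromosome, the window size `k` is a free parameter, a trailing remainder smaller than `k` is discarded, and the group-wise subsampling used to shrink the training set is omitted. The function name `divide_matrices` is hypothetical.

```python
import numpy as np


def divide_matrices(cm: np.ndarray, em: np.ndarray, k: int):
    """Tile the upper triangle of CM (n x n) into k x k blocks with matching EM slices.

    Returns a list of (origin, SLEM, SREM, SCM) tuples, where origin = (row0, col0)
    records where the block sits in CM so predictions can be placed back later.
    """
    n_blocks = cm.shape[0] // k  # rows/columns left over after the last full block are dropped
    samples = []
    for r in range(n_blocks):                 # step down the diagonal by k bins
        rows = slice(r * k, (r + 1) * k)
        slem = em[:, rows]                    # m x k: loci indexing the block's rows
        for c in range(r, n_blocks):          # move horizontally from the diagonal block
            cols = slice(c * k, (c + 1) * k)
            srem = em[:, cols]                # m x k: loci indexing the block's columns
            samples.append(((r * k, c * k), slem, srem, cm[rows, cols]))
    return samples
```

Only the upper triangle (including diagonal blocks) is visited, which is sufficient because the CM is symmetric; a chromosome with `n // k` full blocks yields `n_blocks * (n_blocks + 1) / 2` samples.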

Combining predicted sub-matrices

The COCOA model takes $SLEM_{m \times k}$ and $SREM_{m \times k}$ from each chromosome division as inputs and outputs a series of predicted correlation sub-matrices. These sub-matrices sequentially tile a square matrix ($PCM_{n \times n}$) with the same number of columns as $EM_{m \times n}$. The coordinates at which each predicted sub-matrix is placed are determined by its corresponding inputs ($SLEM_{m \times k}$ and $SREM_{m \times k}$). Finally, the complete $PCM_{n \times n}$ is assembled and saved for further biological analyses.
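Assembling the full PCM can be sketched as below. We assume the blocks come from an upper-triangle tiling like the one described under Dividing matrices, each tagged with its top-left coordinate, and that the lower triangle is filled by mirroring, since a CM is symmetric; neither detail is spelled out in the text, and `combine_blocks` is a hypothetical name.

```python
import numpy as np


def combine_blocks(blocks, n: int) -> np.ndarray:
    """Place predicted k x k blocks into an n x n PCM.

    blocks: iterable of ((row0, col0), block) pairs from an upper-triangle tiling.
    Each block is also written at the mirrored position so that PCM stays symmetric.
    Bins not covered by any block stay at zero.
    """
    pcm = np.zeros((n, n))
    for (r0, c0), block in blocks:
        h, w = block.shape
        pcm[r0:r0 + h, c0:c0 + w] = block
        pcm[c0:c0 + w, r0:r0 + h] = block.T  # mirror into the lower triangle
    return pcm
```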

COCOA architecture

The COCOA model consists of three main components: bidirectional feature reconstruction, cross-attention fusion, and residual feature reduction (Figure 1B), which are described in the following sections.

Bidirectional feature reconstruction

The bidirectional feature reconstruction module consists of two matrix reconstruction (MR) layers. The construction of these MR layers is inspired by our previous work on chromatin interaction data enhancement [40]. Each MR layer consists of two parts: an aggregation convolution layer with a filter size of $N \times 1$ and a linear reconstruction layer. The output of each MR layer is computed by Equations 1 and 2:

\begin{matrix} v(SEM) = \tanh((SEM \otimes K_{N \times 1}) \times W_{i}) \end{matrix}

(1)

\begin{matrix} MR(SEM) = v(SEM) \cdot {v(SEM)}^{T} \times W_{j} \end{matrix}

(2)

where $\otimes$ denotes the convolution operation, $K_{N \times 1}$ represents the convolution kernel of size $N \times 1$, $\tanh$ is the hyperbolic tangent activation function [41], $\times$ denotes the Hadamard product, and $\cdot$ denotes matrix multiplication. $SEM$ represents either $SLEM_{m \times k}$ or $SREM_{m \times k}$ generated through preprocessing, and $v^{T}$ represents the transpose of the vector $v$. $W_{i}$ and $W_{j}$ are learnable weight matrices. The MR layer aggregates multiple ChIP-seq track signals from different genomic loci into a 1D vector. This vector is in turn reconstructed into a low-rank epigenomic track feature using the learnable weight matrices. In summary, this module obtains bidirectional epigenomic track features by reconstructing $SLEM_{m \times k}$ and $SREM_{m \times k}$.
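The sketch below evaluates Equations 1 and 2 for one MR layer in NumPy (forward pass only). We assume the kernel height $N$ equals the number of tracks $m$, so the $N \times 1$ convolution collapses the $m$ signals at each locus into one value; that $W_i$ is a length-$k$ vector and $W_j$ a $k \times k$ matrix, both applied as Hadamard products; and that the convolution has no bias term. These shapes are our reading of the equations, not the published implementation, and `mr_layer` is a hypothetical name.

```python
import numpy as np


def mr_layer(sem: np.ndarray, kernel: np.ndarray, w_i: np.ndarray, w_j: np.ndarray) -> np.ndarray:
    """One matrix reconstruction (MR) layer (Equations 1 and 2).

    sem:    (m, k) epigenomic sub-matrix, i.e., SLEM or SREM
    kernel: (m,) weights of the N x 1 kernel, assuming N = m
    w_i:    (k,) weights applied element-wise to the aggregated vector
    w_j:    (k, k) weights applied element-wise to the reconstructed matrix
    """
    aggregated = kernel @ sem        # 'valid' N x 1 convolution over the track axis -> (k,)
    v = np.tanh(aggregated * w_i)    # Equation 1
    return np.outer(v, v) * w_j      # Equation 2: v . v^T, then Hadamard product with W_j


rng = np.random.default_rng(0)
m, k = 6, 8
slem, srem = rng.random((m, k)), rng.random((m, k))
# Separate (hypothetical, randomly initialized) parameters for the two directions.
p = mr_layer(slem, rng.normal(size=m), rng.normal(size=k), rng.normal(size=(k, k)))
q = mr_layer(srem, rng.normal(size=m), rng.normal(size=k), rng.normal(size=(k, k)))
```

With $W_j$ set to all ones, the output is exactly the rank-one matrix $v v^{T}$, which illustrates why the reconstructed track feature is described as low-rank.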

Cross-attention fusion

Next, the COCOA model employs the cross-attention fusion module to fuse bidirectional epigenomic track features. This module mainly contains two attention feature fusion (AFF) layers [42]. Each AFF layer has three parts: global feature extraction, local feature extraction, and attention fusion. The results of cross-attention fusion are defined by Equation 3:

\begin{matrix} \mathrm{CAF}(P, Q) = \mathrm{concat}(\mathrm{AFF}_{1}(P, Q), \mathrm{AFF}_{2}(Q, P)) \end{matrix}

(3)

where $P$ and $Q$ represent the epigenomic track features from the two directions, $\mathrm{concat}$ denotes stacking the two outputs along the same dimension, and $\mathrm{AFF}$ refers to the uniform and general attention-based feature fusion layer proposed by Dai et al. [42]. The cross-attention fusion module transforms the epigenomic track features from one direction into attention weights that reinforce the epigenomic track features in the other direction. By interleaving attention fusion and concatenation, a set of fused contact feature maps is obtained as inputs for the next module.
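To make Equation 3 concrete, the sketch below implements a simplified attentional feature fusion in NumPy. It keeps the three parts described above (a global branch built from channel-wise average pooling, a local point-wise branch, and a sigmoid-gated fusion $M \odot X + (1 - M) \odot Y$ in the style of Dai et al.), but replaces each branch's stack of point-wise convolutions and BN layers with a single channel-mixing matrix. All names and weight shapes are assumptions for illustration.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def aff(x: np.ndarray, y: np.ndarray, w_global: np.ndarray, w_local: np.ndarray) -> np.ndarray:
    """Simplified attentional feature fusion of two (C, H, W) feature maps."""
    s = x + y                                      # initial integration of the two inputs
    g = w_global @ s.mean(axis=(1, 2))             # global branch: pooled channel context, (C,)
    loc = np.einsum("dc,chw->dhw", w_local, s)     # local branch: point-wise channel mixing
    m = sigmoid(loc + g[:, None, None])            # attention weights in (0, 1)
    return m * x + (1.0 - m) * y                   # soft selection between x and y


def cross_attention_fusion(p, q, params_1, params_2):
    """Equation 3: CAF(P, Q) = concat(AFF_1(P, Q), AFF_2(Q, P)) along the channel axis."""
    return np.concatenate([aff(p, q, *params_1), aff(q, p, *params_2)], axis=0)


rng = np.random.default_rng(0)
c, k = 4, 8
p, q = rng.normal(size=(c, k, k)), rng.normal(size=(c, k, k))
params = [(rng.normal(size=(c, c)), rng.normal(size=(c, c))) for _ in range(2)]
fused = cross_attention_fusion(p, q, *params)  # shape (2 * c, k, k)
```

Because the gate mixes the two inputs convexly, each fused value lies between the corresponding entries of the two directions; the attention decides, per position and channel, which direction dominates.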

Residual feature reduction

The residual feature reduction module consists of a series of residual blocks, each containing several residual layers. Following the approach described in previous work [43], each residual layer is composed of convolutional layers with different convolution kernels, batch normalization (BN) layers [44], and activation functions. The computation of each layer is defined by Equations 4 and 5:

\begin{matrix} F(X) = \tanh(BN_{1}(X \otimes K_{n \times n})) \end{matrix}

(4)

\begin{matrix} Res(X) = \tanh(BN_{2}(F(X) \otimes K_{m \times m}) + X) \end{matrix}

(5)

where $K$ denotes convolution kernels of different sizes, $\tanh$ is the activation function, and $BN$ represents the BN layer. $X$ represents the fused contact feature maps for the first layer, and the output of each layer serves as the input for the next layer. The residual feature reduction module reduces the number of channels of the contact features from the previous module level by level. Throughout this process, each residual layer filters and retains important information from the previous layer and aggregates it with the output of the current layer. Finally, the predicted correlation sub-matrix is obtained from the last layer of the residual feature reduction module.
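A NumPy sketch of one residual layer (Equations 4 and 5) follows. It assumes odd, square kernels with zero "same" padding so the skip connection can add $X$ back, and it models BN as per-channel standardization of a single sample without learnable scale and shift. These simplifications and the function names are ours, not details of COCOA.

```python
import numpy as np


def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """'Same'-padded 2D convolution (cross-correlation, as in deep learning libraries).

    x: (C_in, H, W) feature maps; w: (C_out, C_in, s, s) kernel with odd size s.
    """
    c_out, _, s, _ = w.shape
    pad = s // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zero padding keeps H x W
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for a in range(s):
        for b in range(s):
            out += np.einsum("oc,chw->ohw", w[:, :, a, b], xp[:, a:a + h, b:b + wd])
    return out


def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Per-channel standardization: BN in training mode with a batch of one, no affine terms."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def residual_layer(x: np.ndarray, k1: np.ndarray, k2: np.ndarray) -> np.ndarray:
    """Equations 4 and 5; k2 must map back to x's channel count for the skip connection."""
    f = np.tanh(batch_norm(conv2d_same(x, k1)))         # Equation 4
    return np.tanh(batch_norm(conv2d_same(f, k2)) + x)   # Equation 5
```

Because Equation 5 adds $X$ back, the second convolution must restore the channel count of $X$; how the channels are reduced between residual blocks is not detailed in the text, so the sketch does not model it.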

Loss function

The COCOA model can be viewed as a function $F$ with a parameter set $\theta$, which maps each input pair $SLEM_{i, m \times k}$ and $SREM_{i, m \times k}$ to the predicted correlation sub-matrix $PSCM_{i, k \times k}$ [i.e., $PSCM_{i, k \times k} = F(SLEM_{i, m \times k}, SREM_{i, m \times k}; \theta)$]. The training objective is to find a parameter set $\theta^{*}$ that makes $PSCM_{i, k \times k}$ similar to the ground truth $SCM_{i, k \times k}$. To this end, COCOA first uses the MSE loss to minimize the element-wise error over the $k \times k$ genomic range between $PSCM$ and $SCM$. This loss is given by Equation 6:

\begin{matrix} L_{MSE}(PSCM_{i, k \times k}, SCM_{i, k \times k}) = \frac{1}{m} \sum_{i = 1}^{m} {(PSCM_{i, k \times k} - SCM_{i, k \times k})}^{2} \end{matrix}

(6)

Subsequently, COCOA incorporates a perceptual loss based on the Visual Geometry Group (VGG) network [45] to restore structural information in the CM. Furthermore, a total variation (TV) loss [46], which is widely used in computer vision to smooth noise, is added as a regularization term to suppress noise in $PSCM_{k \times k}$. These losses are given by Equations 7 and 8:

H_{VGG}(PSCM_{i, k \times k}, SCM_{i, k \times k}) = \frac{1}{N} \sum_{k = 1}^{N} {(VGG(PSCM_{i, k \times k}) - VGG(SCM_{i, k \times k}))}^{2}

(7)

F_{TV}(PSCM_{k \times k}) = \sum_{i, j} {(PSCM_{i, j - 1} - PSCM_{i, j})}^{2} + \sum_{i, j} {(PSCM_{i + 1, j} - PSCM_{i, j})}^{2}

(8)

Finally, the training objective is represented by Equation 9:

\theta^{*} = \arg\min_{\theta} [L_{MSE}(PSCM_{i, k \times k}, SCM_{i, k \times k}) + \alpha F_{TV}(PSCM_{i, k \times k}) + \beta H_{VGG}(PSCM_{i, k \times k}, SCM_{i, k \times k})]

(9)

where $\alpha$ and $\beta$ are scaling weights ranging from 0 to 1.
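A NumPy sketch of the composite objective in Equation 9 for a single predicted sub-matrix follows. A pretrained VGG network is outside the scope of a self-contained example, so the perceptual term takes an arbitrary feature extractor `features` as a parameter (identity by default, in which case it reduces to an MSE); in a PyTorch implementation this would typically be a truncated pretrained VGG. The default `alpha` and `beta` are placeholders, not the values used for COCOA, and the function names are hypothetical.

```python
from typing import Callable

import numpy as np


def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Equation 6: mean squared element-wise error."""
    return float(np.mean((pred - target) ** 2))


def tv_loss(pred: np.ndarray) -> float:
    """Equation 8: squared differences between horizontally and vertically adjacent entries."""
    horizontal = np.sum(np.diff(pred, axis=1) ** 2)
    vertical = np.sum(np.diff(pred, axis=0) ** 2)
    return float(horizontal + vertical)


def perceptual_loss(pred, target, features: Callable[[np.ndarray], np.ndarray]) -> float:
    """Equation 7 with a pluggable feature extractor standing in for VGG."""
    return float(np.mean((features(pred) - features(target)) ** 2))


def total_loss(pred, target, alpha=0.5, beta=0.5, features=lambda z: z) -> float:
    """Equation 9: L_MSE + alpha * F_TV + beta * H_VGG, with alpha and beta in [0, 1]."""
    return (mse_loss(pred, target)
            + alpha * tv_loss(pred)
            + beta * perceptual_loss(pred, target, features))
```

The TV term is computed on the prediction alone, so even a perfect prediction of a plaid-patterned CM incurs some penalty; this is why $\alpha$ is kept small relative to the data terms.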

COCOA training and hyperparameter exploration

Before model training, we preprocessed each chromosome of the HFFc6 Micro-C data [47] and the corresponding ChIP-seq data. Chromosomes 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19 were used for training, while chromosomes 18, 20, 21, and 22 were used for hyperparameter tuning. The remaining chromosomes were reserved for performance evaluation.

The COCOA model was implemented in Python 3.7 with PyTorch 1.12 [48]. We trained the model with a batch size of 16 for 120 epochs using the Adam optimizer [49] with an initial learning rate of 5E-4 ($lr_{init} = 5 \times 10^{-4}$). All training and testing were conducted on an Intel(R) Xeon(R) CPU E5-2696 v4 with 503 GB of memory. During training, a single epoch took ∼13 h 28 min on average. For testing, taking chromosome 16 as an example, prediction at 25-kb, 10-kb, and 1-kb resolution took 56 min 8 s, 4 h 18 min 44 s, and 52 h 34 min 41 s, respectively. Additional details on model training and hyperparameters are provided in File S1.
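The chromosome split and optimizer settings above can be summarized as a plain Python configuration. The dictionary layout and key names are hypothetical, and we assume that the "remaining" chromosomes are the other autosomes, since the text does not say whether sex chromosomes were used.

```python
TRAIN = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
VALIDATION = [18, 20, 21, 22]
# Assumption: "remaining" means the other autosomes; sex chromosomes are not considered.
TEST = [c for c in range(1, 23) if c not in TRAIN + VALIDATION]

CONFIG = {
    "train_chroms": TRAIN,
    "val_chroms": VALIDATION,
    "test_chroms": TEST,
    "batch_size": 16,
    "epochs": 120,
    "optimizer": "Adam",
    "initial_lr": 5e-4,
}
```

Under this assumption the test chromosomes are 2, 4, 6, 8, 10, 12, 14, and 16, which includes the Chr12, Chr14, and Chr16 test sets used in the Results. In PyTorch, the optimizer entry would correspond to `torch.optim.Adam(model.parameters(), lr=CONFIG["initial_lr"])`.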

Model evaluation

We began the evaluation by making predictions on the independent test sets using the best-trained model. The predicted correlation sub-matrices were then combined to form the intra-chromosomal CM. The experimental CMs at 25-kb resolution served as the ground truth. During evaluation, we used the PCA implementation in the sklearn package [50] to calculate the PC1 values of the two CMs. PC1 is generally considered to represent A/B compartment information. Additionally, we discretized PC1 to obtain the chromatin compartment state, which was saved separately.
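The PC1 calculation and discretization can be sketched as follows. We compute PC1 with a NumPy SVD of the column-centered CM, which matches scikit-learn's `PCA(n_components=1).fit_transform(cm)` up to sign. Because the sign of PC1 is arbitrary, we orient it with a reference track that is higher in active chromatin (for example, an active histone mark or gene density); this orientation step and the function names are our assumptions, as the text does not describe how the sign was fixed.

```python
import numpy as np


def first_pc(cm: np.ndarray) -> np.ndarray:
    """PC1 score of each bin, treating rows of the CM as samples."""
    centered = cm - cm.mean(axis=0)                      # PCA centers each feature (column)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, 0] * s[0]


def call_compartments(cm: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Orient PC1 so it correlates positively with `reference`, then label bins A or B."""
    pc1 = first_pc(cm)
    if np.corrcoef(pc1, reference)[0, 1] < 0:
        pc1 = -pc1
    return np.where(pc1 > 0, "A", "B")
```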

To assess model performance, we used several metrics, including MSE (Equation 6), MAE, SSIM (which assesses the structural similarity of two CMs), and PSNR (which measures the signal quality of the CMs) [51]. MAE, SSIM, and PSNR are defined by Equations 10–12:

\begin{matrix} MAE(\hat{Y}, Y) = \frac{1}{n} \sum_{i = 1}^{n} |\hat{y}_{i} - y_{i}| \end{matrix}

(10)

\begin{matrix} SSIM(\hat{Y}, Y) = \frac{(2 \mu_{\hat{Y}} \mu_{Y} + C_{1})(2 \sigma_{\hat{Y} Y} + C_{2})}{(\mu_{\hat{Y}}^{2} + \mu_{Y}^{2} + C_{1})(\sigma_{\hat{Y}}^{2} + \sigma_{Y}^{2} + C_{2})} \end{matrix}

(11)

\begin{matrix} PSNR(\hat{Y}, Y) = 10 \log_{10} \left(\frac{N}{MSE(\hat{Y}, Y)}\right) \end{matrix}

(12)

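where $\hat{Y}$ denotes the predicted CM and $Y$ the real (experimental) CM; $\hat{y}_{i}$ and $y_{i}$ are their $i$-th entries; $\mu$, $\sigma^{2}$, and $\sigma_{\hat{Y} Y}$ denote means, variances, and the covariance; and $C_{1}$ and $C_{2}$ are small constants that stabilize the division. Furthermore, given the chromatin compartmentalization information in the CMs, we evaluated their reproducibility using several PCC-based measures and the GenomeDISCO score [52].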
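The metrics in Equations 10–12 can be computed as below. This sketch evaluates Equation 11 once over the whole matrix (library implementations such as scikit-image compute SSIM in local windows and average), uses the conventional constants $C_1 = (0.01L)^2$ and $C_2 = (0.03L)^2$ for a data range $L$, and treats $N$ in Equation 12 as the squared data range; these choices are assumptions, since the text does not define them. Function names are hypothetical.

```python
import numpy as np


def mae(pred: np.ndarray, target: np.ndarray) -> float:
    """Equation 10: mean absolute error over all entries."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred: np.ndarray, target: np.ndarray, data_range: float = 2.0) -> float:
    """Equation 11 over the full matrix (CM values lie in [-1, 1], so L = 2 by default)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    numerator = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    denominator = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(numerator / denominator)


def psnr(pred: np.ndarray, target: np.ndarray, data_range: float = 2.0) -> float:
    """Equation 12, taking N as the squared data range (the usual PSNR convention)."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")  # identical matrices
    return float(10 * np.log10(data_range ** 2 / mse))
```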

Results

Overview of COCOA

In this study, we proposed COCOA, a method for accurately predicting cell-type-specific chromatin compartment patterns at fine-scale resolution by integrating epigenomic modification signals. COCOA requires only six epigenomic track signals as inputs, which are available for most tissues and cell lines in the ENCODE database [53]. The targets of COCOA are the CMs of OE-normalized contact maps, which retain the maximum amount of chromatin compartment pattern information. The COCOA framework connects these inputs and targets through binning, prediction, and combination operations (Figure 1A). Notably, in the binning process, we used a resolution-specific binning approach (i.e., ${Bin}_{epi} = {Bin}_{corr}$) instead of a single-bin-per-genomic-site approach (i.e., ${Bin}_{epi} = {Bin}_{corr} \times resolution$). Because each epigenomic bin then corresponds to exactly one CM bin, the input size does not depend on the resolution, which greatly improves the practicality of COCOA and allows predictions across resolutions.

We trained COCOA on HFFc6 Micro-C data along with the corresponding ChIP-seq data (Tables S1 and S2) using the backpropagation algorithm. Specifically, COCOA first uses the bidirectional feature reconstruction module to compute 1D track features separately from the two inputs. This step captures the intrinsic associations present in the original epigenomic data in each direction (Figure 1B; see Method). Subsequently, the cross-attention fusion module integrates these 1D track feature maps with spatial contact features through a cross-attention mechanism (see Method). Lastly, the residual feature reduction module decodes these contact features to generate predicted sub-matrices, which are then combined into a complete CM (see Method). A composite loss function is used to minimize the distance between the predictions and the ground truth.

COCOA accurately predicts chromatin compartmentalization pattern

To assess the performance of COCOA, we applied the trained model to randomly selected epigenomic data from the test sets (Chr12, Chr14, and Chr16) to generate predicted CMs. The CM calculated from the Micro-C data and its PC1 served as the experimental CM, which we regarded as the ground truth. Heatmaps in Figure 2A and Figure S1A compare predicted and experimental CMs in representative genomic regions. The predicted CM generally exhibits the correct chromatin compartmentalization pattern, and COCOA also captures subtle chromatin compartments. Notably, compared with the experimental CM, the predicted CM shows more pronounced interactions between blocks of different compartment types (blue blocks), while partially over-reinforcing interactions between blocks of the same compartment type (red blocks). We also computed the PC1 values of the predicted and experimental CMs using the sklearn package [50] and sorted each CM by its own PC1 values (Figure 2B, Figure S1B). The modular structure of the predicted CM resembles that of the experimental CM. Similar results are obtained when both CMs are sorted by the PC1 values of the predicted CM (Figure S2A). In addition, the predicted CM captures the white band regions present in the experimental CM (Figure S1B).
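Sorting a CM by PC1, as in Figure 2B, permutes rows and columns with the same order so that bins of the same compartment type become contiguous. A minimal NumPy sketch, assuming PC1 has already been computed for every bin (`sort_by_pc1` is a hypothetical helper); sorting both CMs by the predicted PC1, as in Figure S2A, just means passing the same `pc1` array to both calls.

```python
import numpy as np


def sort_by_pc1(cm: np.ndarray, pc1: np.ndarray) -> np.ndarray:
    """Permute rows and columns of `cm` in ascending order of PC1."""
    order = np.argsort(pc1, kind="stable")
    return cm[np.ix_(order, order)]
```

After sorting, a clean two-by-two block structure (strong within-type correlation, anticorrelation between types) indicates strong compartment modularity.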

Figure 2

COCOA accurately predicts significant compartment patterns from epigenomic data

A. A representative region illustrating the experimental and predicted CMs on test chromosomes. B. Heatmaps showing the experimental and predicted CMs, each sorted by its own PC1 values. The predicted CM is consistent with the compartment patterns observed in the experimental CM. C. The patterns in the predicted CM closely follow the profiles of histone modification signals. Within the region marked by the black dashed lines, COCOA corrects a pattern that is misclassified in the experimental data. D. Shifts of epigenomic modification signals within 375-kb neighborhoods surrounding compartment boundaries in the experimental and predicted CMs. PC1, the first principal component; CTCF, CCCTC-binding factor.


To establish the biological significance of COCOA predictions, we plotted the predicted CMs alongside the epigenomic signal tracks and the PC1 tracks. Figure S1C shows that the predicted CM accurately reproduces the plaid patterns of chromatin compartments, with each block of the plaid corresponding to a signal peak in the epigenomic tracks. The PC1 tracks of the experimental and predicted CMs agree closely with these patterns. Importantly, COCOA can infer chromatin compartments that are consistent with the underlying epigenomic data but are not captured in the experimental CM (indicated by the black dotted lines in Figure 2C and Figure S2B). Moreover, following previous studies [54], we analyzed shifts of the six epigenomic modification signals at compartment boundaries and at randomly selected genomic loci. We observed consistent, significant shifts of epigenomic modification signals within 375-kb neighborhoods around A/B compartment boundaries in both predicted and experimental CMs. These shifts clearly differed from those at randomly selected genomic loci (Figure 2D, Figure S1D). For some signals, the shifts at compartment boundaries identified from the predicted CM were more pronounced than those at boundaries identified from the experimental CM (e.g., H3K4me1 at A-to-B boundaries; Figure 2D).
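The boundary analysis can be sketched as an aggregate profile of each signal around PC1 sign changes. This is our illustration, not the procedure of [54]: boundaries are positions where the sign of PC1 flips between adjacent bins, only boundaries with a full window on both sides are used, and the window half-width `w` is given in bins (375 kb corresponds to 15 bins at 25-kb resolution). The toy PC1 and signal below are synthetic, and the function names are hypothetical.

```python
import numpy as np


def compartment_boundaries(pc1: np.ndarray):
    """Return indices of the first bin after each A-to-B and B-to-A transition."""
    sign = np.sign(pc1)
    a2b = np.where((sign[:-1] > 0) & (sign[1:] < 0))[0] + 1
    b2a = np.where((sign[:-1] < 0) & (sign[1:] > 0))[0] + 1
    return a2b, b2a


def aggregate_profile(signal: np.ndarray, positions: np.ndarray, w: int) -> np.ndarray:
    """Mean signal in bins [p - w, p + w) around each position that has a full window."""
    windows = [signal[p - w:p + w] for p in positions if p - w >= 0 and p + w <= len(signal)]
    return np.mean(windows, axis=0)


rng = np.random.default_rng(0)
pc1 = np.repeat([1.0, -1.0, 1.0], 40)                        # toy A | B | A layout
toy_signal = np.where(pc1 > 0, 1.0, 0.2) + rng.normal(0, 0.05, pc1.size)
a2b, b2a = compartment_boundaries(pc1)
boundary_profile = aggregate_profile(toy_signal, a2b, w=15)
random_loci = rng.integers(15, pc1.size - 15, size=50)       # control positions
control_profile = aggregate_profile(toy_signal, random_loci, w=15)
```

A step in `boundary_profile` that is absent from `control_profile` is the kind of boundary-associated shift summarized in Figure 2D.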

Genome-wide performance evaluation of COCOA

We quantitatively analyzed the performance of COCOA on the genome-wide test sets by calculating MSE, MAE, PSNR, and SSIM scores, which measure prediction error, signal-to-noise ratio, and structural similarity. Compared with the scores between two biological replicates, COCOA achieved competitive error and similarity scores on the test sets (Figure 3A, left panel; Table S3), with only minimal fluctuation across variations in input data quality and chromosome size. This stability indicates that COCOA performs consistently across prediction scenarios (Figure 3A, left panel; Figure S3A; Table S3). In addition, we used GenomeDISCO scores, which are designed to assess the reproducibility of contact maps, to validate the biological significance of the predicted CM. As shown in Figure 3A (right panel) and Figure S3B, COCOA achieved high reproducibility between the predicted and experimental CMs.

Figure 3

Genome-wide performance evaluation of COCOA in multiple metrics

A. The MSE and GenomeDISCO scores for COCOA on the test chromosomes. “Ctrl” represents the MSE and GenomeDISCO scores between the two biological replicates. B. Scatter plot showing the high correlation between the PC1 values of the predicted and experimental CMs across the test sets. C. Proportion of compartment pattern matching between the predicted and experimental CMs. The red and blue bars represent the proportion of compartments that overlap between the predicted CM and experimental CM. The green and purple bars indicate the proportion of compartments that differ between the predicted and experimental CMs. D. Correlation analyses between the predicted and experimental CMs. The “Ctrl” column represents the PCCs between the two biological replicates. The “PC1” row represents the PCC of the PC1 values between the two CMs, and the “A/B” row represents the PCC of the binarized PC1 values between the two CMs. MSE, mean square error; PCC, Pearson correlation coefficient.


Because the CM contains abundant information about chromatin compartmentalization, we performed correlation evaluations at the CM, PC1-value, and compartment-state levels. Figure 3B shows a scatter plot of the PC1 values of the predicted versus experimental CMs across the test sets ($R^{2} = 0.941$, $P < 2.2 \times 10^{-16}$). In Figure 3C, the misclassification rates of A/B compartments were independent of chromosome length and all remained below 10%. Furthermore, the PCCs of the PC1 values for paired predicted and experimental CMs were higher than 0.9, and the PCCs of A/B compartment states showed the same trend (Figure 3D, Figure S3C). These PCCs are consistent with those between the two biological replicates. To evaluate how well COCOA captures finer-grained compartmentalization information, we calculated the mean of the column-wise PCCs between the predicted CM and the ground truth. The predicted CM achieved a high mean PCC with the experimental CM (Figure S4).
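The agreement statistics above can be sketched as follows: the four categories of Figure 3C from binarized PC1 (assuming positive PC1 denotes A), the PCC of PC1 values and of binarized states as in Figure 3D, and the mean of column-wise PCCs between two CMs. Function names are hypothetical, and the sketch omits the handling of unmappable bins.

```python
import numpy as np


def compartment_agreement(pc1_pred: np.ndarray, pc1_exp: np.ndarray) -> dict:
    """Fractions of bins in each (experimental, predicted) compartment combination."""
    exp_a, pred_a = pc1_exp > 0, pc1_pred > 0
    return {
        "A-A": float(np.mean(exp_a & pred_a)),
        "B-B": float(np.mean(~exp_a & ~pred_a)),
        "A-to-B": float(np.mean(exp_a & ~pred_a)),   # A in experiment, B in prediction
        "B-to-A": float(np.mean(~exp_a & pred_a)),
        "misclassified": float(np.mean(exp_a != pred_a)),
    }


def pc1_correlations(pc1_pred: np.ndarray, pc1_exp: np.ndarray):
    """PCC of PC1 values and PCC of binarized A/B states."""
    values = np.corrcoef(pc1_pred, pc1_exp)[0, 1]
    states = np.corrcoef((pc1_pred > 0).astype(float), (pc1_exp > 0).astype(float))[0, 1]
    return float(values), float(states)


def mean_column_pcc(cm_pred: np.ndarray, cm_exp: np.ndarray) -> float:
    """Average PCC between matching columns of two CMs, skipping constant columns."""
    pccs = [np.corrcoef(cm_pred[:, j], cm_exp[:, j])[0, 1]
            for j in range(cm_pred.shape[1])
            if cm_pred[:, j].std() > 0 and cm_exp[:, j].std() > 0]
    return float(np.mean(pccs))
```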

COCOA predicts chromatin compartmentalization changes to epigenomic perturbations

After confirming the accuracy of COCOA in inferring chromatin compartmentalization from epigenomic data, we used COCOA to perform in silico epigenomic perturbation experiments and assessed the impact of epigenomic signals on chromatin compartment pattern prediction.

In the single epigenomic signal perturbation (one-perturbation) experiments, we generated perturbed epigenomic data by setting one selected epigenomic signal to its minimum value while keeping the other signals unchanged. We then predicted CMs from the perturbed data and compared them with the experimental CMs of the unperturbed data. The one-perturbation experiments indicated that altering the H3K9me3 signal substantially changed the predicted chromatin architecture, causing a large number of B-to-A compartment switches (Figure 4A). In contrast, perturbing the H3K4me1 signal led to a small proportion of A-to-B compartment switches. Perturbing the other epigenomic signals (H3K27ac, H3K27me3, H3K36me3, and H3K4me3) had no significant effect on chromatin compartment patterns (Figure 4A, Figure S5A).
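The perturbation schemes can be sketched as transformations of the EM, with rows as tracks in the order listed under ChIP-seq data sources and preprocessing. Setting a track to its minimum follows the text; the trained COCOA model is not included, so predicted compartment labels would come from running the model on each perturbed EM. Function names are hypothetical.

```python
from itertools import combinations

import numpy as np

TRACKS = ["H3K9me3", "H3K27ac", "H3K4me1", "H3K27me3", "H3K4me3", "H3K36me3"]


def perturb(em: np.ndarray, drop: tuple) -> np.ndarray:
    """Set each track in `drop` (row indices) to its own minimum value; others unchanged."""
    out = em.copy()
    for t in drop:
        out[t, :] = em[t].min()
    return out


def keep_one(em: np.ndarray, keep: int) -> np.ndarray:
    """Keep one track unchanged and set every other track to its minimum."""
    return perturb(em, tuple(t for t in range(em.shape[0]) if t != keep))


def switch_fractions(ref_labels: np.ndarray, new_labels: np.ndarray) -> dict:
    """Fractions of bins switching B-to-A and A-to-B relative to a reference labeling."""
    return {
        "B-to-A": float(np.mean((ref_labels == "B") & (new_labels == "A"))),
        "A-to-B": float(np.mean((ref_labels == "A") & (new_labels == "B"))),
    }


one_perturbations = [(t,) for t in range(len(TRACKS))]
two_perturbations = list(combinations(range(len(TRACKS)), 2))  # 15 track pairs
```

Each perturbed EM from `one_perturbations`, `two_perturbations`, or `keep_one` would be passed through the trained model, and `switch_fractions` would then compare the resulting compartment labels with the reference labels.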

Figure 4

COCOA predicts compartment patterns for epigenomic perturbation experiments

The row labeled “No perturbation” in green (reference) compares the predicted CM with the experimental CM (ground truth) for the unperturbed data. Red, black, and blue labels on the vertical axis indicate high, medium, and low impact, respectively, of the perturbed epigenomic signals on chromatin compartment patterns. A. Proportion of compartment pattern matching between the predicted CM and the experimental CM from the one-perturbation epigenomic combinations. B. Proportion of compartment pattern matching between the predicted CM and the experimental CM from the two-perturbation epigenomic combinations.


To further analyze the contribution of individual epigenomic signals to chromatin compartment pattern prediction, we conducted keep-one epigenomic signal perturbation (keep-one perturbation) experiments, in which the selected epigenomic signal was kept unchanged while all other signals were set to their minimum values. We then used COCOA to predict chromatin compartment patterns from the perturbed data. The CMs predicted from the H3K9me3 or H3K4me1 signal alone partially overlapped with the experimental CMs of the unperturbed data, whereas the CMs predicted from each of the other signals differed markedly from the experimental CMs (Figure S6A and B). This result reinforced the importance of H3K9me3 and H3K4me1 for predicting chromatin compartment states.

We next investigated the joint effects of two epigenomic signals on chromatin compartmentalization through two epigenomic signal perturbation (two-perturbation) experiments. H3K9me3 and H3K4me1 signals played dominant roles in determining the predicted A/B compartment patterns. When the H3K9me3 or H3K4me1 signal was perturbed together with the H3K27ac, H3K27me3, H3K36me3, or H3K4me3 signal, the predicted chromatin compartment patterns changed significantly compared with those for the unperturbed data (Figure S5B). Notably, perturbing the H3K9me3 signal gave rise to B-to-A compartment switches, whereas perturbing the H3K4me1 signal resulted in A-to-B compartment switches (Figure 4B). Simultaneous perturbation of the H3K9me3 and H3K4me1 signals had the greatest impact on chromatin compartment patterns among all two-perturbation combinations.
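With six marks there are C(6, 2) = 15 two-perturbation combinations. The sketch below enumerates them with `itertools.combinations` and builds one perturbed input per pair. The channel order in `MARKS` and the use of a per-mark minimum are assumptions, and `perturb_marks` is a hypothetical helper.

```python
from itertools import combinations

import numpy as np

# Hypothetical channel order of the six marks.
MARKS = ["H3K27ac", "H3K27me3", "H3K36me3", "H3K4me1", "H3K4me3", "H3K9me3"]


def perturb_marks(tracks: np.ndarray, marks) -> np.ndarray:
    """Set each mark in `marks` to its own minimum; leave the others unchanged."""
    perturbed = tracks.copy()
    for mark in marks:
        idx = MARKS.index(mark)
        perturbed[idx, :] = tracks[idx].min()
    return perturbed


# All unordered pairs of the six marks.
pairs = list(combinations(MARKS, 2))

rng = np.random.default_rng(2)
tracks = rng.random((len(MARKS), 80))  # toy values, not real data
two_perturbed = {pair: perturb_marks(tracks, pair) for pair in pairs}
print(len(pairs))  # 15
```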

Taken together, COCOA enables in silico epigenomic perturbation experiments that probe how individual epigenomic signals contribute to the prediction of chromatin compartmentalization. Our results suggest that H3K9me3 and H3K4me1 signals are crucial for predicting chromatin compartment patterns.

COCOA shows robust performance of model predictions at different resolutions

To evaluate the performance of COCOA at different resolutions, we used the model trained at 25-kb resolution to predict fine-scale CMs from resolution-specific inputs. We first used the trained model to predict 10-kb-resolution CMs for the Chr16, Chr17, and Chr18 datasets and evaluated the predictions. COCOA achieved consistent and competitive scores on all three test sets (Table S4). We further evaluated the correlations between the predicted and ground-truth compartment patterns. The predicted 10-kb-resolution CM was highly correlated with the experimental CM (Figure 5A). A similarly high correlation of A/B compartment states was observed between the predicted and experimental CMs. The compartment misclassification rates were all below 0.2, indicating that COCOA generates CMs containing reliable chromatin compartment information (Figure S7A). The predicted CM exhibited plaid patterns similar to those of the experimental CM, and these patterns corresponded well with the epigenomic signals (Figure 5B and C). In addition, the modularity of the predicted CM was broadly in line with that of the experimental CM (Figure S7C). We also observed consistent and significant shifts of epigenomic modification signals within 150-kb neighborhoods around A/B compartment boundaries in both the predicted and experimental CMs. These shifts were distinguishable from those at randomly selected genomic loci (Figure 5D).
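The boundary analysis can be sketched as follows. The sketch assumes one compartment label per bin and one histone-mark track on the same bins, with boundaries defined as positions where the label changes. At 10-kb resolution, a 150-kb neighborhood spans 15 bins on each side. Windows at B-to-A boundaries are reversed so that A-to-B and B-to-A transitions do not cancel when averaged. This orientation step, the helper names, and the toy data are assumptions for illustration.

```python
import numpy as np


def find_boundaries(labels) -> np.ndarray:
    """Bin indices i where the A/B label changes between bin i-1 and bin i."""
    labels = np.asarray(labels)
    return np.flatnonzero(labels[1:] != labels[:-1]) + 1


def boundary_profile(signal, labels, centers, flank_bins) -> np.ndarray:
    """Mean signal over [c - flank_bins, c + flank_bins) around each center c.

    Windows whose center bin is A (B-to-A boundaries) are reversed so that every
    window reads in the A-to-B direction; windows running off the ends are skipped.
    """
    signal = np.asarray(signal, dtype=float)
    labels = np.asarray(labels)
    windows = []
    for c in centers:
        if c - flank_bins < 0 or c + flank_bins > signal.size:
            continue
        window = signal[c - flank_bins:c + flank_bins]
        windows.append(window[::-1] if labels[c] == "A" else window)
    return np.mean(windows, axis=0)


resolution = 10_000                    # 10-kb bins
flank_bins = 150_000 // resolution     # 150-kb neighborhood -> 15 bins per side

# Toy chromosome: 50 A bins followed by 50 B bins, with an active-mark-like
# signal that is high in A and low in B (illustrative values only).
labels = np.array(["A"] * 50 + ["B"] * 50)
signal = np.where(labels == "A", 1.0, 0.0)

boundaries = find_boundaries(labels)
observed = boundary_profile(signal, labels, boundaries, flank_bins)

# Background: the same profile around randomly chosen loci.
rng = np.random.default_rng(0)
random_loci = rng.integers(flank_bins, labels.size - flank_bins, size=20)
background = boundary_profile(signal, labels, random_loci, flank_bins)
```

The same code applies at 1-kb resolution, where a 15-kb neighborhood again spans 15 bins per side. One way to quantify whether boundary shifts differ from random loci is to compare `observed` with `background`, for example with a permutation test.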

Figure 5

Prediction of resolution-specific compartment patterns by COCOA

A. Correlation analyses between the predicted and experimental CMs at 10-kb and 1-kb resolutions, respectively. The “PC1” row represents the PCC of the PC1 values between the two CMs; the “Corr Mat” row represents the mean of the column-wise PCCs between the two CMs; and the “A/B” row represents the PCC of the binarized PC1 values between the two CMs. B. The predicted CM accurately captures the histone modification signal waveforms on Chr16 at 10-kb resolution. C. A typical region illustrating the predicted and experimental CMs at 10-kb and 1-kb resolutions, respectively. At 1-kb resolution, the experimental CM exhibits high noise levels and lacks recognizable plaid patterns, whereas the predicted CM shows clear plaid patterns. D. Shifts of histone modification signals within 150-kb and 15-kb neighborhoods surrounding compartment boundaries in the predicted and experimental CMs at 10-kb and 1-kb resolutions, respectively. At 10-kb resolution, both CMs display meaningful shifts. At 1-kb resolution, the shifts in the experimental CM approach those at random loci, whereas the predicted CM still shows significant biological shifts.
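The three rows of panel A can be sketched as follows. PC1 is computed here with a widely used approach:

- distance (observed/expected) normalization,
- a Pearson correlation matrix, and
- its leading eigenvector, signed so that it correlates positively with an active-mark track.

The source does not specify COCOA's exact PC1 procedure. This pipeline, the toy matrix, and all helper names are therefore assumptions.

```python
import numpy as np


def observed_over_expected(cm) -> np.ndarray:
    """Divide each diagonal of a symmetric contact map by its mean."""
    cm = np.asarray(cm, dtype=float)
    n = cm.shape[0]
    oe = np.zeros_like(cm)
    for d in range(n):
        diag = np.diagonal(cm, offset=d)
        mean = diag.mean()
        if mean > 0:
            idx = np.arange(n - d)
            oe[idx, idx + d] = diag / mean
            oe[idx + d, idx] = diag / mean
    return oe


def pc1(cm, active_track) -> np.ndarray:
    """Leading eigenvector of the O/E correlation matrix, signed to correlate
    positively with an active-mark track (so positive values denote A)."""
    corr = np.nan_to_num(np.corrcoef(observed_over_expected(cm)))
    _, vecs = np.linalg.eigh(corr)
    vec = vecs[:, -1]  # eigh returns eigenvalues in ascending order
    r = np.corrcoef(vec, active_track)[0, 1]
    return -vec if r < 0 else vec


def pcc(x, y) -> float:
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


def correlation_rows(cm_pred, cm_exp, pc1_pred, pc1_exp) -> dict:
    """PC1, Corr Mat (mean column-wise PCC), and A/B (PCC of binarized PC1)."""
    cm_pred, cm_exp = np.asarray(cm_pred, float), np.asarray(cm_exp, float)
    cols = [pcc(cm_pred[:, j], cm_exp[:, j]) for j in range(cm_exp.shape[1])]
    ab_pred = (np.asarray(pc1_pred) > 0).astype(float)
    ab_exp = (np.asarray(pc1_exp) > 0).astype(float)
    return {
        "PC1": pcc(pc1_pred, pc1_exp),
        "Corr Mat": float(np.nanmean(cols)),
        "A/B": pcc(ab_pred, ab_exp),
    }


# Toy plaid matrix with distance decay (illustrative only, not real data).
n = 40
labels = np.repeat([1.0, -1.0, 1.0, -1.0], 10)  # +1 = A, -1 = B
i, j = np.indices((n, n))
cm_exp = (1.0 + 0.5 * (labels[:, None] == labels[None, :])) / (1.0 + np.abs(i - j))
cm_pred = cm_exp + 0.01 * np.random.default_rng(0).random((n, n))
active = (labels > 0).astype(float)  # stand-in for an active-mark track
rows = correlation_rows(cm_pred, cm_exp, pc1(cm_pred, active), pc1(cm_exp, active))
```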


To evaluate COCOA’s performance at ultra-high resolution, we used the model trained at 25-kb resolution to predict 1-kb-resolution CMs for the Chr16, Chr17, and Chr18 datasets. As at 10-kb resolution, we assessed the performance metrics and correlations. COCOA achieved robust performance across a wide range of scores but obtained PCCs close to 0 at the CM level (Figure 5A; Table S4). This may be attributed to the sparsity of the experimental CM at ultra-high resolution, even with deep sequencing (i.e., ∼ 2.6–4.5 billion uniquely mapped reads with ∼ 150× coverage per nucleosome) [47], which makes it difficult to treat as ground truth. Moreover, as the CM size increases, error metrics averaged over all entries become less discriminative and yield similar scores. PCA-based correlations and compartment misclassification rates, however, partially mitigate the sparsity issue and yield reliable scores (Figure S7B). We therefore also visualized the experimental and predicted CMs as heatmaps (Figure 5C). The experimental CM showed only vaguely visible plaid patterns and was filled with noise-induced thin lines. In contrast, the predicted CM remained consistent with these fuzzy patterns but displayed clearer compartmentalization patterns (Figure 5C, Figure S7D). We also investigated the shifts of epigenomic modification signals within 15-kb neighborhoods around A/B compartment boundaries in both the predicted and experimental CMs. Surprisingly, the shifts observed in the experimental CM were similar to those at randomly selected genomic loci, whereas the predicted CM showed significant and biologically meaningful shifts (Figure 5D).
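A compartment misclassification rate compares only the A/B label of each bin. This is one plausible reason it is less affected than element-wise PCC by sparse, noisy entries. The minimal sketch below assumes consistently oriented PC1 vectors (positive = A). It also assumes that bins with PC1 exactly zero are excluded. The exclusion rule and the function name are hypothetical.

```python
import numpy as np


def misclassification_rate(pc1_pred, pc1_exp) -> float:
    """Fraction of bins whose A/B label (sign of PC1) differs between two CMs.

    Both PC1 vectors are assumed to be oriented consistently (positive = A).
    Bins with PC1 exactly 0 in either map (e.g., empty rows in a sparse
    matrix) are excluded rather than counted as errors.
    """
    pred = np.sign(np.asarray(pc1_pred, dtype=float))
    exp = np.sign(np.asarray(pc1_exp, dtype=float))
    informative = (pred != 0) & (exp != 0)
    return float(np.mean(pred[informative] != exp[informative]))


# Toy example: the last bin is empty in the prediction and is excluded,
# and one of the remaining four bins disagrees.
rate = misclassification_rate([0.4, 0.2, -0.1, -0.3, 0.0], [0.5, -0.2, -0.4, -0.1, 0.3])
print(rate)  # 0.25
```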

COCOA accurately predicts cell-type-specific chromatin compartment patterns

Because epigenomic data are cell-type-specific, we tested whether COCOA can accurately predict chromatin compartment patterns in different cell types. We first applied COCOA to the GM12878 dataset and generated predicted CMs for multiple chromosomes. The experimental CMs obtained from the Hi-C data of GM12878 [6] served as the ground truth. High correlations were observed between the predicted and experimental CMs and between the PC1 values of the two CMs (Figure 6A, Figure S8A). The compartment misclassification rates were all below 20% (Figure 6B). Furthermore, the predicted CM presented plaid patterns consistent with those of the experimental CM and achieved stable and competitive scores for both error and image similarity (Figure 6C; Table S5). Taken together, these results suggest that COCOA reliably predicts cell-type-specific chromatin compartment patterns.
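The error and image-similarity scores can be illustrated with mean squared error and SSIM [51]. The sketch below uses a single global window for SSIM, which simplifies the standard locally windowed version. The constants k1 = 0.01 and k2 = 0.03 are the usual SSIM defaults. Whether COCOA uses exactly these metrics and settings is an assumption.

```python
import numpy as np


def mse(x, y) -> float:
    """Mean squared error between two contact maps."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.mean((x - y) ** 2))


def global_ssim(x, y, data_range=None, k1=0.01, k2=0.03) -> float:
    """Single-window (global) SSIM using the Wang et al. formula.

    Standard SSIM averages this quantity over local, usually Gaussian-weighted,
    windows; this simplified version treats the whole map as one window.
    """
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1, c2 = (k1 * data_range) ** 2, (k2 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2)
    return float(num / den)


a = np.arange(16, dtype=float).reshape(4, 4)  # toy "contact maps"
score_same = global_ssim(a, a)
score_transposed = global_ssim(a, a.T)
```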

Figure 6

Prediction of cell-type-specific compartment patterns by COCOA

A. Correlation analyses between the predicted and experimental CMs on the GM12878 dataset. The “PC1” row represents the PCC of the PC1 values between the two CMs, and the “A/B” row represents the PCC of the binarized PC1 values between the two CMs. B. Proportion of compartment pattern matching between the predicted and experimental CMs on the GM12878 dataset. C. A typical region illustrating the predicted and experimental CMs on the GM12878 dataset. D. Example regions illustrating the predicted CMs on the datasets of activated B cells and SJSA1 cells. E. Precise matching of the predicted CMs with the waveforms of histone modification signals on Chr15 in the GM12878 dataset. F. Systematic comparison of chromatin compartment statuses on Chr15 across the datasets of GM12878, activated B cells, activated T cells, chorionic villus cells, SJSA1 cells, and NCI-H929 cells.


Having established COCOA’s ability to predict cell-type-specific CMs, we proceeded to predict CMs in five additional cell types (i.e., SJSA1 cells, NCI-H929 cells, activated B cells, activated T cells, and chorionic villus cells). These represent tissues, diseases, and primary cells for which chromatin conformation had not been sequenced. The predicted CMs for all five datasets displayed obvious plaid patterns (Figure 6D, Figure S8B). The predicted A/B compartments in SJSA1, NCI-H929, and activated B cells were clearly defined and noise-free, whereas the predicted CMs of activated T cells and chorionic villus cells were of slightly lower quality. To gain insight into the role of chromatin compartmentalization in disease and differentiation, we systematically compared the patterns across cell types. Using the predicted CM from GM12878 data as a benchmark, we examined commonalities and differences in patterns across chromatin regions based on histone modification track information (Figure 6E). Chromatin compartment patterns in GM12878 were similar to those in most other cell types, albeit with variations in certain regions (Figure 6F). Notably, activated B cells, which are immune cells like GM12878, exhibited a comparable compartment pattern, and activated T cells showed an analogous pattern. In contrast, SJSA1, NCI-H929, and chorionic villus cells exhibited distinct compartment patterns.
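A cross-cell-type comparison of A/B status against a benchmark such as GM12878 can be summarized as per-bin agreement. The sketch assumes that all PC1 vectors cover the same bins and are oriented so that positive values denote A. The helper names and the cell labels `cell_X` and `cell_Y` are hypothetical, and the values are toy numbers rather than results.

```python
import numpy as np


def agreement_with_reference(pc1_by_cell: dict, reference: str) -> dict:
    """Fraction of bins whose A/B status matches the reference cell type.

    Bins with PC1 == 0 in either vector are skipped.
    """
    ref = np.sign(np.asarray(pc1_by_cell[reference], dtype=float))
    agreement = {}
    for cell, pc in pc1_by_cell.items():
        if cell == reference:
            continue
        status = np.sign(np.asarray(pc, dtype=float))
        keep = (ref != 0) & (status != 0)
        agreement[cell] = float(np.mean(status[keep] == ref[keep]))
    return agreement


def switched_bins(pc1_by_cell: dict, reference: str, cell: str) -> np.ndarray:
    """Indices of bins whose A/B status differs from the reference."""
    ref = np.sign(np.asarray(pc1_by_cell[reference], dtype=float))
    status = np.sign(np.asarray(pc1_by_cell[cell], dtype=float))
    return np.flatnonzero((ref != 0) & (status != 0) & (ref != status))


# Toy PC1 vectors on four shared bins (illustrative values, not results).
toy = {
    "GM12878": [0.6, 0.4, -0.2, -0.5],
    "cell_X": [0.5, 0.3, -0.1, 0.2],
    "cell_Y": [0.7, 0.5, -0.3, -0.4],
}
print(agreement_with_reference(toy, "GM12878"))  # {'cell_X': 0.75, 'cell_Y': 1.0}
print(switched_bins(toy, "GM12878", "cell_X"))    # [3]
```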

Discussion

In this study, we developed a deep neural network framework, COCOA, which integrates six types of epigenomic modification signals to accurately predict chromatin compartment patterns at fine-scale resolution. These epigenomic signal data are readily accessible in the ENCODE database [53] for various cell lines, in vitro differentiated cells, primary cells, and tissues. To process the raw epigenomic data, we employed resolution-specific preprocessing to bin the data into mated inputs from different genomic positions. COCOA then uses the bidirectional feature reconstruction module to extract track features from these mated inputs. The cross-attention fusion module fuses these track features into contact features. Finally, the residual feature reduction module converts these contact features into chromatin compartment patterns. COCOA directly predicts long-range chromatin compartment patterns without considering short-range interactions [28,29,35]. Our results demonstrate that COCOA accurately reproduces the chromatin compartment patterns of the experimental CM, with consistent epigenomic signal shifts at these patterns (Figure 2D). During model evaluation, COCOA achieves excellent performance with robust reproducibility scores on the test sets. Furthermore, the predicted CM and its PC1 values are highly correlated with the experimental CM and its PC1 values. The compartment misclassification rates of the predicted CM remain below 10% and are independent of chromosome length.
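The cross-attention fusion step can be illustrated with generic scaled dot-product cross-attention. The sketch makes these assumptions:

- Contact-feature positions of a sub-matrix act as queries.
- Track-feature tokens from its row and column windows (the mated inputs) act as keys and values.
- The shapes, random projection matrices, and residual connection are hypothetical.

This is not COCOA's actual module.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(contact, track, w_q, w_k, w_v):
    """Scaled dot-product cross-attention.

    contact: (n_queries, d) contact features, e.g. one token per sub-matrix cell
    track:   (n_tokens, d) track features, e.g. row-window plus column-window bins
    Returns the fused contact features (with a residual connection) and weights.
    """
    q = contact @ w_q                         # (n_queries, d_k)
    k = track @ w_k                           # (n_tokens, d_k)
    v = track @ w_v                           # (n_tokens, d)
    weights = softmax(q @ k.T / np.sqrt(k.shape[1]), axis=-1)
    return contact + weights @ v, weights


rng = np.random.default_rng(0)
size, d, d_k = 8, 16, 8                      # toy window of 8 bins, 16 feature channels
contact = rng.normal(size=(size * size, d))  # flattened 8 x 8 grid of contact features
track = rng.normal(size=(2 * size, d))       # row-window and column-window track tokens
w_q, w_k = rng.normal(size=(d, d_k)), rng.normal(size=(d, d_k))
w_v = rng.normal(size=(d, d))
fused, weights = cross_attention(contact, track, w_q, w_k, w_v)
fused_map = fused.reshape(size, size, d)     # back to a 2D grid of features
```

In a trained network the projections would be learned parameters. Random matrices are used here only to show the data flow and tensor shapes.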

With COCOA’s accurate prediction of chromatin compartmentalization, it becomes possible to perform in silico epigenomic perturbation to study how histone modification signals influence chromatin compartmentalization prediction. By generating predicted CMs from different perturbed epigenomic data, we found that H3K9me3 has a strong impact on chromatin compartment pattern prediction, followed by H3K4me1. In contrast, H3K27me3 and H3K36me3 have a moderate impact, and H3K27ac and H3K4me3 have a low impact. Interestingly, COCOA predicted that perturbation of the H3K9me3 signal led to B-to-A compartment switches, whereas perturbation of the H3K4me1 signal resulted in A-to-B compartment switches. Additionally, H3K9me3 and H3K4me1 signals play dominant roles in determining chromatin compartment patterns when they are perturbed together with other epigenomic signals in two-perturbation experiments. These findings contrast with a previous study by Zheng et al. [55], which identified H3K27ac and H3K36me3 as the top two predictive histone marks for CoRNN. This inconsistency arises from substantial differences in the prediction target (global contact patterns for COCOA versus A/B compartments for CoRNN). It also reflects differences in problem formulation, such as regression (COCOA) versus classification (CoRNN).

Furthermore, we explored the performance of COCOA’s predictions across different resolutions and cell types. At 10-kb resolution, COCOA performed as well as it did at the 25-kb resolution used for training. Recognizing the importance of high resolution in chromatin interaction data analysis, we investigated whether COCOA can make accurate predictions at 1-kb resolution. Unfortunately, even at high sequencing depth [47], the experimental CM at 1-kb resolution contains excessive noise lines and barely discernible plaid patterns. Therefore, we analyzed histone modification shifts at compartment boundaries and plotted heatmaps of the predicted CM over different genomic ranges. Surprisingly, the predicted CM displayed clearer plaid patterns than the experimental CM. Its boundary shifts were also more biologically meaningful than those of the experimental CM and of randomly selected loci. We then evaluated the performance of COCOA in predicting cell-type-specific compartment patterns. Using validated Hi-C data of GM12878, our results demonstrated that COCOA can correctly infer chromatin compartment patterns from epigenomic data in unseen cell lines.

While this work presents promising results, there are several potential areas for improvement. First, as a data-driven approach, COCOA relies on training sets of reasonably good quality to achieve high performance by incorporating potential information from bidirectional epigenomic data. In addition, we observed that the transferability of COCOA in cross-cell-line experiments is affected by epigenomic data quality. Developing new data processing schemes may help solve this issue. Second, in challenging tasks such as high-volume fine-scale-resolution CM prediction and in silico epigenomic signal perturbation experiments, COCOA requires substantial run time and computational resources. To alleviate this computational burden, parallel CM generation and distributed implementations could be explored [56]. Third, our exploration of how histone modification signals influence A/B chromatin compartmentalization in HFF datasets, through in silico epigenomic perturbation experiments, was only preliminary. A more systematic study of the combined impacts of epigenomic modifications on complex chromatin compartmentalization across different cell lines would benefit from further experimental evidence. Lastly, COCOA’s predictions of fine-scale chromatin compartmentalization in diseases, tissues, and primary cells have not been thoroughly explored. In the future, it would be interesting to explore the impact of chromatin compartment alterations on cell differentiation and disease occurrence by integrating epigenomic data with other omics and phenotypic data.
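The sketch below shows one way parallel CM generation might look. Overlapping sub-matrices are predicted concurrently with `concurrent.futures.ThreadPoolExecutor`, overlapping predictions are averaged, and the upper triangle is mirrored to make the CM symmetric. `predict_submatrix` is a hypothetical stand-in (an outer product) for a model forward pass. The window size, step, and averaging rule are also assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def predict_submatrix(track, row_start, col_start, size):
    """Hypothetical stand-in for one forward pass of a trained model."""
    rows = track[row_start:row_start + size]
    cols = track[col_start:col_start + size]
    return np.outer(rows, cols)


def tile_starts(n_bins, size, step):
    """Window start positions that cover every bin (requires step <= size)."""
    starts = list(range(0, max(n_bins - size, 0) + 1, step))
    if starts[-1] + size < n_bins:
        starts.append(n_bins - size)  # make sure the last bins are covered
    return starts


def predict_full_cm(track, size=8, step=4, workers=4):
    """Predict overlapping upper-triangle tiles in parallel, average overlaps,
    and mirror the upper triangle to obtain a symmetric CM."""
    n = track.shape[0]
    starts = tile_starts(n, size, step)
    jobs = [(r, c) for r in starts for c in starts if c >= r]
    total = np.zeros((n, n))
    count = np.zeros((n, n))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda rc: predict_submatrix(track, rc[0], rc[1], size), jobs)
        for (r, c), sub in zip(jobs, results):
            total[r:r + size, c:c + size] += sub
            count[r:r + size, c:c + size] += 1
    cm = np.divide(total, count, out=np.zeros_like(total), where=count > 0)
    return np.triu(cm) + np.triu(cm, k=1).T


track = np.linspace(0.1, 2.0, 21)  # toy 1D input (illustrative only)
cm = predict_full_cm(track)
```

For a GPU-based model, batching tiles or distributing them across devices may be more effective than threads. The tiling and averaging logic would stay the same.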

Code availability

The source code has been implemented in Python and can be freely accessed on GitHub (https://github.com/onlybugs/COCOA). The source code has also been submitted to BioCode at the National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation (BioCode: BT007498), which is publicly accessible at https://ngdc.cncb.ac.cn/biocode/tools/BT007498.

CRediT author statement

Kai Li: Conceptualization, Methodology, Software, Data curation, Visualization, Formal analysis, Writing – original draft, Writing – review & editing. Ping Zhang: Conceptualization, Writing – review & editing. Jinsheng Xu: Data curation. Zi Wen: Data curation. Junying Zhang: Data curation. Zhike Zi: Writing – review & editing, Resources, Conceptualization. Li Li: Writing – review & editing, Project administration, Resources. All authors have read and approved the final manuscript.

Supplementary material

Supplementary material is available at Genomics, Proteomics & Bioinformatics online (https://doi.org/10.1093/gpbjnl/qzae091).

Competing interests

The authors have declared no competing interests.

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China (Grant No. 32470661 to LL), the Huazhong Agricultural University Scientific and Technological Self-innovation Foundation (to LL), and the Guangdong Provincial Key Laboratory of Synthetic Genomics and Shenzhen Key Laboratory of Synthetic Genomics (Grant No. ZDSYS201802061806209 to ZZ). The numerical computations were performed on the Hefei Advanced Computing Center, China. We thank Li lab members for providing feedback on the earlier version of the manuscript.

ORCID

0009-0009-6928-4914 (Kai Li)

0000-0001-6831-1807 (Ping Zhang)

0000-0002-5083-6354 (Jinsheng Xu)

0000-0001-7600-4399 (Zi Wen)

0009-0007-7108-4984 (Junying Zhang)

0000-0002-7601-915X (Zhike Zi)

0000-0002-9594-8341 (Li Li)

References

[1] Chen, Feng, Liu, Sun, et al. 3D chromatin structures of mature gametes and structural reprogramming during mammalian embryogenesis. Cell 2017;170:367–81.e20.

[2] Dong, Shi, Lin, Peng, et al. Integrative analysis of genome, 3D genome, and transcriptome alterations of clinical lung cancer samples. Genomics Proteomics Bioinformatics 2021;741–

[3] Lieberman-Aiden, van Berkum, Williams, Imakaev, Ragoczy, Telling, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 2009;326:289–

[4] Hsieh THS, Weiner, Lajoie, Dekker, Friedman, Rando OJ. Mapping nucleosome resolution chromosome folding in yeast by Micro-C. Cell 2015;162:108–

[5] Deshpande, Ulahannan, Pendleton, Dai, Behr, et al. Identifying synergistic high-order 3D chromatin conformations from genome-scale nanopore concatemer sequencing. Nat Biotechnol 2022;1488–

[6] Rao, Huntley, Durand, Stamenova, Bochkov, Robinson, et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 2014;159:1665–

[7] Nora, Lajoie, Schulz, Giorgetti, Okamoto, Servant, et al. Spatial partitioning of the regulatory landscape of the X-inactivation centre. Nature 2012;485:381–

[8] Dixon, Selvaraj, Yue, Kim, Shen, et al. Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 2012;485:376–

[9] Vian, Pękowska, Rao SSP, Kieffer-Kwon, Jung, Baranello, et al. The energetics and physiological impact of cohesin extrusion. Cell 2018;175:292–

[10] Goel, Huseyin, Hansen AS. Region capture Micro-C reveals coalescence of enhancers and promoters into nested microcompartments. Nat Genet 2023;1048–

[11] Simonis, Klous, Splinter, Moshkin, Willemsen, de Wit, et al. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet 2006;1348–

[12] Spracklin, Abdennur, Imakaev, Chowdhury, Pradhan, Mirny, et al. Diverse silent chromatin states modulate genome compartmentalization and loop extrusion barriers. Nat Struct Mol Biol 2023.

[13] Haws, Simandi, Barnett, Phillips-Cremins JE. 3D genome, on repeat: higher-order folding principles of the heterochromatinized repetitive genome. Cell 2022;185:2690–707.

[14] Feng, Wang, Yang, Naseri, et al. Simultaneous epigenetic perturbation and genome imaging reveal distinct roles of H3K9me3 in chromatin architecture and transcription. Genome Biol 2020;296.

[15] Nichols, Corces VG. Principles of 3D compartmentalization of the human genome. Cell Rep 2021;109330.

[16] Wen, Zhang, Zhong, Hou, Qin, et al. Extensive chromatin structure–function associations revealed by accurate 3D compartmentalization characterization. Front Cell Dev Biol 2022;845118.

[17] Harris, Olshansky, Wang, Farabella, Eliaz, et al. Chromatin alternates between A and B compartments at kilobase scale for subgenic organization. Nat Commun 2023;3303.

[18] Rao SSP, Huang, Glenn St Hilaire, Engreitz, Perez, Kieffer-Kwon, et al. Cohesin loss eliminates all loop domains. Cell 2017;171:305–20.e24.

[19] Fortin, Hansen KD. Reconstructing A/B compartments as revealed by Hi-C using long-range correlations in epigenetic data. Genome Biol 2015;180.

[20] Schmitt, Ren. Genome-wide mapping and analysis of chromosome architecture. Nat Rev Mol Cell Biol 2016;743–

[21] LeCun, Bengio, Hinton. Deep learning. Nature 2015;521:436–

[22] Yang, Chang JM. Pattern recognition of topologically associating domains using deep learning. BMC Bioinformatics 2022;634.

[23] Soler-Vila, Cuscó, Farabella, Di Stefano, Marti-Renom MA. Hierarchical chromatin organization detected by TADpole. Nucleic Acids Res 2020;e39.

[24] Zhang, Plummer, Cui, Wang, et al. DeepLoop robustly maps chromatin interactions from sparse allele-resolved or single-cell Hi-C data at kilobase resolution. Nat Genet 2022;1013–

[25] Zhang, Blanchette. Reference panel guided topological structure annotation of Hi-C data. Nat Commun 2022;7426.

[26] Wang, Liu, Wang. DFHiC: a dilated full convolution model to enhance the resolution of Hi-C data. Bioinformatics 2023;btad211.

[27] Zhang, Zheng, et al. Enhancing Hi-C data resolution with deep convolutional neural network HiCPlus. Nat Commun 2018;750.

[28] Schwessinger, Gosden, Downes, Brown, Oudelaar, Telenius, et al. DeepC: predicting 3D genome folding using megabase-scale transfer learning. Nat Methods 2020;1118–

[29] Fudenberg, Kelley, Pollard KS. Predicting 3D genome folding from DNA sequence with Akita. Nat Methods 2020;1111–

[30] Yang, Das, Gao, Karbalayghareh, Noble, Bilmes, et al. Epiphany: predicting Hi-C contact maps from 1D epigenomic signals. Genome Biol 2023;134.

[31] Zhang, Zhou. Multiscale and integrative single-cell Hi-C analysis with Higashi. Nat Biotechnol 2022;254–

[32] Xiong, Zhang. scGHOST: identifying single-cell 3D genome subcompartments. Nat Methods 2024;814–

[33] Zhang, Chasman, Knaack, Roy. In silico prediction of high-resolution Hi-C interaction matrices. Nat Commun 2019;5449.

[34] Zhou. Sequence-based modeling of three-dimensional genome architecture from kilobase to chromosome scale. Nat Genet 2022;725–

[35] Tan, Shenker-Tauris, Rodriguez-Hernaez, Wang, Sakellaropoulos, Boccalatte, et al. Cell-type-specific prediction of 3D chromatin organization enables high-throughput in silico genetic screening. Nat Biotechnol 2023;1140–

[36] Reiff, Schroeder, Kırlı, Cosolo, Bakker, Mercado, et al. The 4D nucleome data portal as a resource for searching and visualizing curated nucleomics data. Nat Commun 2022;2365.

[37] Abdennur, Mirny LA. Cooler: scalable storage for Hi-C data and other genomically labelled arrays. Bioinformatics 2020;311–

[38] Kim, Dekker. ChIP-seq. Cold Spring Harb Protoc 2018;2018:363–

[39] Colwell. Expanding the scope of ENCODE. Cancer Discov 2016;OF4.

[40] Zhang, Wang, Shen, Sun, et al. iEnhance: a multi-scale spatial projection encoding network for enhancing chromatin interaction data resolution. Brief Bioinform 2023;bbad245.

[41] Wang, Qin, Wang, Xiang, Chen. ReLTanh: an activation function with vanishing gradient resistance for SAE-based DNNs and its application to rotating machinery fault diagnosis. Neurocomputing 2019;363–

[42] Dai, Gieseke, Oehmcke, Barnard. Attentional feature fusion. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2021;3560–

[43] Zhang, Ren, Sun. Deep residual learning for image recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition 2016;770–

[44] Ioffe, Szegedy. Batch Normalization: accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd International Conference on International Conference on Machine Learning 2015;448–

[45] Johnson, Alahi, FF. Perceptual losses for real-time style transfer and super-resolution. In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision – ECCV 2016. Cham: Springer; 2016, p.694–711.

[46] Gatys, Ecker, Bethge. A neural algorithm of artistic style. arXiv 2015;1508.06576.

[47] Krietenstein, Abraham, Venev, Abdennur, Gibcus, Hsieh, et al. Ultrastructural details of mammalian chromosome architecture. Mol Cell 2020;554–65.e7.

[48] Paszke, Gross, Massa, Lerer, Bradbury, Chanan, et al. PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems 2019.

[49] Kingma. Adam: a method for stochastic optimization. 3rd International Conference for Learning Representations 2014.

[50] Pedregosa, Varoquaux, Gramfort, Michel, Thirion, Grisel, et al. Scikit-learn: machine learning in Python. J Mach Learn Res 2011;2825–

[51] Wang, Bovik, Sheikh, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;600–

[52] Yang. Methods to assess the reproducibility and similarity of Hi-C data. Methods Mol Biol 2022;2301–

[53] Luo, Hitz, Gabdank, Hilton, Kagda, Lam, et al. New developments on the encyclopedia of DNA elements (ENCODE) data portal. Nucleic Acids Res 2020;D882–

[54] Xiong. Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions. Nat Commun 2019;5069.

[55] Zheng, Thakkar, Harris, Liu, Zhang, Gerstein, et al. Predicting A/B compartments from histone modifications using deep learning. iScience 2024;109570.

[56] EnHiC: learning fine-resolution Hi-C contact maps using a generative adversarial framework. Bioinformatics 2021;i272–

Author notes

Kai Li and Ping Zhang contributed equally.

Current address for Ping Zhang: School of Computer, BaoJi University of Arts and Sciences, Baoji 721016, China.

© The Author(s) 2024. Published by Oxford University Press and Science Press on behalf of the Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

