-
PDF
- Split View
-
Views
-
Cite
Cite
Shrabanti Chowdhury, Sammy Ferri-Borgogno, Peng Yang, Wenyi Wang, Jie Peng, Samuel C Mok, Pei Wang, Learning directed acyclic graphs for ligands and receptors based on spatially resolved transcriptomic data of ovarian cancer, Briefings in Bioinformatics, Volume 26, Issue 2, March 2025, bbaf085, https://doi.org/10.1093/bib/bbaf085
- Share Icon Share
Abstract
To unravel the mechanism of immune activation and suppression within tumors, a critical step is to identify transcriptional signals governing cell–cell communication between tumor and immune/stromal cells in the tumor microenvironment. Central to this communication are interactions between secreted ligands and cell-surface receptors, creating a highly connected signaling network among cells. Recent advancements in in situ-omics profiling, particularly spatial transcriptomic (ST) technology, provide unique opportunities to directly characterize ligand–receptor signaling networks that power cell–cell communication. In this paper, we propose a novel statistical method, LRnetST, to characterize the ligand–receptor interaction networks between adjacent tumor and immune/stroma cells based on ST data. LRnetST utilizes a directed acyclic graph model with a novel approach to handle the zero-inflated distributions of ST data. It also leverages existing ligand–receptor regulation databases as prior information, and employs a bootstrap aggregation strategy to achieve robust network estimation. Application of LRnetST to ST data of high-grade serous ovarian tumor samples revealed both common and distinct ligand–receptor regulations across different tumors. Some of these interactions were validated through both a MERFISH dataset and a CosMx SMI dataset of independent ovarian tumor samples. These results cast light on biological processes relating to the communication between tumor and immune/stromal cells in ovarian tumors. An open-source R package of LRnetST is available on GitHub at https://github.com/jie108/LRnetST.
Introduction
High grade serous ovarian cancer (HGSC) is the most lethal gynecologic malignancy, with its daunting overall survival rate showing limited improvement over decades [1–4]. A major obstacle in fully understanding the mechanisms of tumor progression and chemo-resistance in HGSC is its high intra-tumor heterogeneity, comprising both tumor clonal heterogeneity and tissue architecture heterogeneity [5]. The latter is reflected by the heterogeneous stromal and immune cell population in the ovarian tumor microenvironment (TME) [5, 6]. The recent advances in in situ omics analysis have suggested an important link between cell–cell interactions among tumor/immune/stromal cells in TME and tumor progression as well as therapeutic resistance [7]. However, the molecular mechanisms that shape these cell–cell interactions in HGSC are still largely unexplored.
A predominant form of cell–cell signaling is powered through interactions between ligands from one cell and cognate receptors on neighboring cells. Considerable effort has been dedicated to develop tools for exploring these interactions using single-cell RNA-seq (scRNA-seq) data, including CellphoneDB [8], Single-CellSignalR [9], Cellchat [10], ICELLNET [11], CrosstalkeR [12], Nichenet [13], scMLnet [14], and CytoTalk [15]. Despite these endeavors to characterizing intercellular communications with scRNAseq data, the absence of spatial information in scRNAseq data significantly hampers the precision in dissecting ligand–receptor (LR) interaction network, given that these interactions occur locally between neighboring cells within the tissue.
The latest development of spatial transcriptomic (ST) profiling technology enables mapping messenger RNA molecules to a specific location (spot or grid) of a tissue slice in a high-throughput manner [16–18]. These platforms thus provide an unprecedented opportunity to comprehensively characterize the LR interactions among neighboring cells (e.g. those from the adjacent grids on ST slices), which is not feasible based on either bulk or single-cell RNA profiles. In this paper, we aim to characterize the LR regulatory network in HGSC using ST data.
Pioneering efforts have been made for inferring cell–cell communication networks while considering spatial information, including Giotto [19], stLearn [20], and spaCI [21]. Within Giotto, the Spatially Informed Cell-to-Cell Communication (spatCellCellcom) module calculates cell–cell communication scores for each LR pair between proximal cell types according to the spatial network. Permutation-based P-values and multiple hypotheses adjustment are subsequently computed. stLearn focuses on the proportion of neighboring spots with upregulated expressions for both the ligand and receptor genes of a given LR pair. The significant LR pairs are then obtained by integrating the signals across all spots. spaCI uses the gene-based and spatially guided embeddings to convert the gene expressions into latent representation via a standard multilayer perceptron and then applies a triplet loss function to predict the cosine similarity of all possible LR pairs. Although all three methods incorporate spatial information, none directly addresses the issue of zero-inflation that is pervasive in ST data. Furthermore, assessing co-expression based on either thresholding, marginal correlation, or similarity measure is susceptible to considerable variability present in ST data, leading to a lack of reproducibility (see Results section).
To address these challenges, we developed LRnetST—Ligand-Receptor Network learning based on Spatial Transcriptomics data, a novel tool to construct LR networks between adjacent cells of different types based on either multicellular or single-cell ST data. The LRnetST pipeline begins by introducing the Neighbor Integrated Matrix (NIM), which integrates the spatial information and the molecular information within the ST data. Subsequently, LRnetST utilizes a binary variable alongside a continuous variable for every ligand/receptor in the node space. This coding strategy not only addresses zero inflation in ST data, but also enhances the power to detect interactions predominantly signaled through active/inactive statuses of ligands/receptors. To account for variation in library sizes across grids/cells, LRnetST employs an aggregation framework that combines bootstrap resample based directed acyclic graph (DAG) learning with downsampling based normalization. This framework provides false edge control and is adaptable to incorporate prior information, e.g. data from existing LR databases.
We applied LRnetST to both multicellular and single-cellular ST datasets of ovarian cancer. In the multicellular dataset, our focus was on detecting LR interactions between neighboring spot-pairs enriched with tumor and immune/stroma cells, respectively. In the single-cellular ST data, we examined interactions between tumor cells and other immune or stroma cell types. To guide the construction of DAGs, we incorporate known LR regulation information from relevant databases [22] as prior information to constrain the DAG space. Based on the multicellular analysis, LRnetST revealed a substantial number of shared LR interactions between tumor and stroma cells across four different HGSC samples. Further, applying LRnetST to the single-cell MERFISH ST data as well as CosMx SMI ST data of independent ovarian tumors, we were able to confirm several LR interactions between tumor and nearby stroma cells. These findings offer fresh insights into the roles of these tumor-relevant genes/proteins in facilitating cell–cell communication within HGSC.
Materials and methods
Notation and background
A DAG |$\mathcal{G}(\mathbb{N},\mathbb{E})$| contains a node set |$\mathbb{N}$| and an edge set |$\mathbb{E}$|, where the edges are directed from parent nodes to children nodes, without any cycle in the graph. In a DAG model, the node set |$\mathbb{N}$| represents a set of random variables, while the edge set |$\mathbb{E}$| represents the conditional dependence relationships among these random variables. Structure learning of a DAG means identifying the parent set (often known as neighborhood) of each node in the graph.
LRnetST pipeline
To effectively characterize the complicated LR regulatory relationship in TME, we proposed to use DAG models, which has emerged as a valuable tool for inferring gene–gene interactions [22–28]. However, multiple challenges impede the application of existing DAG models to ST data, including their inability to integrate both spatial information and -omics profiles concurrently into network learning, statistical and computational challenges due to high drop-out rates in the ST data, and read count variation across different ST spots (see Methods).
To address these challenges, we developed a new DAG model coupled with customized fitting pipeline, called LRnetST, for constructing LR networks based on ST data. Below, we first provide a concise overview of the pipeline, followed by a detailed explanation of the key steps.
As illustrated in Fig. 1A, the LRnetST pipeline comprises of four major steps: (1) normalization through downsampling, (2) construction of the NIMs with an additional round of bootstrap resample, (3) estimation of DAGs based on bootstrapped samples, and (4) aggregation of the DAGs into a final network.

LRnetST pipeline. (A) Given the ST data matrix and the spatial information, LRnetST generates multiple downsampled data matrices together with a list of matching Index Spot and Neighbor Spot pairs, constructs NIMs with bootstrap resamples, learns DAGs from resampled NIMs, and aggregates the inferred DAGs using SHD. (B) To construct one DAG, starting with the initial NIM by stacking the gene expression profiles of the matching Index and Neighbor Spots, LRnetST converts the downsampled UMI counts Z into activation status Y and expression level X given Y, generates a bootstrap resample without replacement, and learns a DAG based on the resampled matrix.
Within the LRnetST pipeline, the second step helps to combine the spatial location information with the molecular data information. It contains three key components as illustrated in Fig. 1B: (1) integration of the spatial and molecular information through introducing the Initial Neighbor Integrated Matrices (Initial-NIMs), (2) forming the NIM by encoding each gene expression using one binary indicator and a continuous variable, and (3) bootstrap resampling for the subsequent DAG construction. Notably, this framework, which integrates downsampling-based normalization, NIM construction, and the bootstrap-aggregation based network inference, not only effectively accounts for the varying unique molecular identifier (UMI) count levels across different ST spots/cells, but also enhances the robustness of the DAG structure learning.
The third step of LRnetST contains a novel DAG model customized to NIM. Specifically, the DAGs are constructed by maximizing a Bayesian information criterion (BIC) penalized joint log-likelihood of all the nodes in the graph. Moreover, LRnetST incorporates existing LR databases as prior information to constrain the search space during DAG learning.
In the last step, the derivation of the final aggregated DAG is achieved by searching for a DAG that minimizes an average structural Hamming distance (SHD) to the DAGs in the ensemble built from the previous steps.
Note that LRnetST is a versatile tool and can be applied to both multicellular and single-cellular ST datasets. In the real data application, we demonstrated LRnetST’s utility by analyzing both a multicellular and a single-cell ST datasets of ovarian cancer. Primarily the multicellular data were used to identify LR interactions between neighboring spot-pairs enriched with tumor and stroma cells, respectively, while two independent single-cell ST datasets were used to validate the findings.
Below, we provide details of the key steps in the LRnetST pipeline using multicellular ST data. The parallel strategy for handling single-cell ST data is similar, with “spots” replaced by “cells.”
Neighbor integrated matrix
In the LRnetST pipeline, we first derive an Initial-NIM to integrate the spatial and the molecular information in ST data. We choose to focus on pairs of adjacent ST spots enriched of tumor and stromal/immune cells, respectively. Specifically, for each tumor sample, ST spots are first classified into two classes: enriched or not enriched of tumor cells. We then identify the subset of tumor-cell-enriched spots sitting on the boundary of the tumor regions (i.e. being adjacent to spots enriched of non-tumor cells on the ST slice). We refer to these spots as the Index Spots and refer to their closest non-tumor-cell-enriched spots as their Neighbor Spots. We denote the total number of Index Spot and Neighbor Spot pairs as |$n$|, and the number of genes under consideration as |$p$|.
In the next step, we derive an Initial-NIM by stacking the rows of the gene expression matrix of the Index Spots, which has a dimension of |$p \times n$|, with the gene expression matrix of the corresponding Neighbor Spots, which also has a dimension of |$p \times n$|. The resulting Initial-NIM has an expanded feature space of |$2p$| rows. Finally, we convert the Initial-NIM into NIM by replacing the expression |$Z$| of a gene with a pair of features, a continuous variable |$X$| and a binary variable |$Y$|, to facilitate the modeling of zero-inflation in the ST data (see the next subsection). The resulting NIM has a dimension |$4p \times n$|. See Fig. 1B and the pseudo algorithm outlined in section A.2 of the Supplementary Material for more details.
Accounting for zero inflation in ST data
Similar to scRNA-seq data, ST data have a high dropout rate, i.e. only a fraction of the transcriptome detected at each ST spot. To facilitate the modeling of such zero-inflated data, LRnetST uses two nodes (variables) to represent each gene: a binary node and a continuous node.
First, we use |$Z_{ij}$| to denote the normalized expression measures (see the next subsection) of the |$i_{th}$| gene in the |$j_{th}$| ST spot. Note that |$Z_{ij}=0$| corresponds to a situation where either the |$i_{th}$| gene is not expressed, or its expression level is low in the |$j_{th}$| spot and thus is not detected in the sequencing experiments. Therefore, we view the event of |$Z_{ij}=0$| as a “less-active” status of the gene and uses a binary node, denoted as |$Y_{ij}$|, to represent the “detection status”:
Given the gene is detected. (i.e. |$Y_{ij}=1$|), the continuous node, denoted as |$X_{ij}$|, represents the normalized gene expression/abundance:
By including both nodes, |$X_{ij}$| and |$Y_{ij}$|, LRnetST not only accounts for the zero inflation in the data, but also achieves better power in detecting those interactions that are largely signaled through the active/less-active statuses of genes.
Score function and optimization
Based on NIM, we learn an LR DAG by minimizing a BIC-type score function through a greedy search algorithm—hill-climbing (HC) [25, 26, 28, 29]. Note that the likelihood of the continuous nodes is calculated only using data from spots on which their binary nodes take the value |$1$| (i.e. |$X$| is not |$NA$|). This makes the binary nodes natural parents of the corresponding continuous nodes of the same gene. At each current graph |$\mathcal{G}$|, for a continuous node, |$X$|, the residual sum of squares (|$RSS^{\mathcal{G}}_{X}$|) from regressing |$X$| onto its current parent set, using only the samples where |$X$| is nonzero, is calculated. Specifically, we have |$loglik(X|pa_{X}^{\mathcal{G}}) \sim -n_{X}/2 \log (RSS^{\mathcal{G}}_{X}/n_{X}) + Constant$|, where, |$RSS^{\mathcal{G}}_{X}$| is the residual sum of squares, |$pa_{X}^{\mathcal{G}}$| denotes the parent set of the node in graph |$\mathcal{G}$|, and |$n_{X}$| is the number of samples where |$X$| is nonzero. The log-likelihood of a binary node |$Y$| is obtained by regressing |$Y$| onto its current parent set |$pa_{Y}^{\mathcal{G}}$| through logistic regression [29].
where |$ \hat{p}_{s} = \widehat{P}\left (Y_{s}=1| pa_{Y,s}^{\mathcal{G}}\right ) =\frac{\exp (\hat{\alpha }_{0} + \hat{\boldsymbol{\alpha }}^{T}pa_{Y,s}^{\mathcal{G}})}{1 + \exp (\hat{\alpha }_{0} + \hat{\boldsymbol{\alpha }}^{T} pa_{Y,s}^{\mathcal{G}})}$|, |$\hat{\alpha }_{0}$| is the fitted intercept and |$\hat{\boldsymbol{\alpha }}$| is the vector of the fitted coefficients, and |$n$| is the sample size. We then consider BIC scores that penalize for model complexity: |$score_{BIC_{X}} = -2loglik(X|pa_{X}^{\mathcal{G}}) + |pa_{X}^{\mathcal{G}}| \log (n_{X})$| for a continuous node |$X$| and |$score_{BIC_{Y}} = -2loglik(Y|pa_{Y}^{\mathcal{G}}) + |pa_{Y}^{\mathcal{G}}| \log (n)$| for a binary node |$Y$|, where |$|pa_{X}^{\mathcal{G}}|$| and |$|pa_{Y}^{\mathcal{G}}|$| denote the size of the parent set of a continuous and a binary node, respectively, in graph |$\mathcal{G}$|.
The final score of a graph |$\mathcal{G}$| is the summation of the BIC scores of the individual nodes. We then search for the DAG that minimizes this score function using an efficient implementation of the HC algorithm that is modified from our previous work DAGBagM [29].
Downsampling normalization and bootstrap model aggregation
LRnetST employs an aggregating framework that couples a bootstrap aggregation (bagging) procedure of DAG learning with the downsampling based normalization to achieve better control of false edge detection and at the same time to account for the varying library sizes across different ST spots.
Specifically, we apply downsampling normalization on the original gene expression data by fixing the total UMI at the median level across all ST spots. We then generate |$B=100$| downsampled gene expression matrices (|$[Z]$|), and construct |$B=100$| NIMs (see Fig. 1B) accordingly. Next we obtain one bootstrap resample on each NIM through sampling with replacement of the Index/Neighbor Spots pairs. Finally, we learn one DAG based on each bootstrap resample using the method described in the previous subsection.
The resulting ensemble of DAGs are then aggregated following the aggregation procedure implemented in DAGBagM [29], where the HC algorithm is (again) used to search for a DAG that minimizes an aggregation score based on the SHD while maintaining acyclicity.
For more details of the LRnetST pipeline, please see the pseudo code in Algorithm in Section A.2 of the Supplementary Material.
Simulation studies
We utilized synthetic data sets to evaluate the performance of the DAG learning step in LRnetST. We consider simulations with different sample sizes (i.e. numbers of Index-Neighbor Spot pairs: |$n=200$| and |$400$|) and different DAG topology with varying number of genes (|$p=$| 100 to 600). The true DAGs are shown in Figure S1A-F. For the detailed data generating steps please refer to Section A.3 of the Supplementary Material. We compared the performance of DAG learning based on initial-NIM and based on NIM to assess the advantage of using the paired binary and continuous nodes to facilitate the modeling of zero-inflation in ST data. Moreover, we compared the performance of LRnetST with three alternative methods: DAGBagM, DAGBagM_C [29], and bnlearn [30]. DAGBagM is a method for learning DAGs with mixed binary and continuous nodes. When applying DAGBagM to NIM, |$X$| is replaced with |$Z$| (i.e. |$NA$| in |$X$| is replaced with |$0$|) in NIM, and all samples are used to calculate the scores for the continuous nodes, whereas LRnetST uses the converted |$X$| so that only samples with |$Y=1$| are used to calculate the scores for the continuous nodes. Note that we performed bootstrap aggregation for all methods except bnlearn due to its prohibitive computational cost. To evaluate the performance of LRnetST and other methods for detecting the skeleton (i.e. edges without direction) as well as the directed edges, we assessed the power (or true positive rates), false discovery rates, and the F1 scores defined as |$F1=2\times \mbox{precision} \times \mbox{recall}/\mbox{(precision+recall)}$|.
ST data description and processing
10x ST data: to characterize tumor and its microenvironment, ST analysis was performed on four fresh frozen treatment-naive advanced stage HGSC samples using the 10x ST platform as described in our previous work [31]. Among these samples, two (A4 and A5) were derived from chemo-sensitive patients with extended progression free survival time (|$>10$| years), and the remaining two (A10 and A12) originated from chemo-resistant patients with short progression free survival time (|$<6$| months). In the ST experiment, each ST spot contained 20–50 cells, and 930–1007 spots were profiled per tumor slice.
Expression levels of genes in each spot were measured using counts of UMI. Since each spot is a mixture of several cell types, in order to identify tumor or immune/stromal cell-enriched spots, we estimated the cell-type proportions in each grid (spot) by performing deconvolution analysis. We then integrated the estimated cell-type proportions with pathologist-annotated tumor/stroma regions in each tumor slice to identify the tumor-enriched spots, as well as spots enriched with immune/stromal cells (see Section A.4 in the Supplementary Material). For each tumor spot, we further identified the adjacent or neighboring immune/stromal spots that are at most 2-grid distance apart from the tumor spot.
MERFISH and CosMx SMI ST data: to validate the LR edges from 10x ST data, we obtained publicly available MERFISH data sets for four ovarian tumor samples [32] and a NanoString CosMx Spatial Molecular Imaging (SMI) dataset of one HGSC tumor [33].
After preprocessing, for each tumor, we performed clustering and cell-type annotation analysis (see Supplementary Materials). For MERFISH, we identified five major cell groups: tumor cells, macrophages, T-cells, fibroblast, and endothelial, while for CosMx SMI, we identified tumor, macrophages, and T-cells as the major cell groups (Supplementary Table S2). Then, for each tumor cell, we identified its adjacent/neighboring cell that has minimum distance from the tumor cell, as obtained from the Delaunay network (implemented under Giotto).
Normalization: for all the 10x, MERFISH, and CosMx SMI data, the median of the total gene counts per spot (or cell) in each sample was used for the downsampling normalization in LRnetST. Tables A.3 and A.4 in the Supplementary Material give the detailed numbers of the tumor and neighboring cells, the numbers of adjacent index-neighbor-cell pairs, along with the number of LR genes (documented in database [34]) used for LR network learning in each tumor tissue for 10x ST and MERFISH data, respectively.
Comparing LRnetST with stLearn, spatCellCellcom, and spaCI
To compare the performance of LRnetST with other LR network learning methods, we applied spatCellCellcom, stLearn, and spaCI on the same data sets. For preprocessing including normalization, we followed their respective pipelines implemented in the corresponding R (spatCellCellcom implemented under Giotto) and python (stLearn and spaCI) packages. Similar to LRnetST, we only considered the genes documented in the LR database. For spatCellCellcom and stLearn, we obtained the LR regulation edges (Table A.5 in the Supplementary material, Supplementary Table S1) by thresholding the adjusted P-values, while for spaCI we used a threshold on the predicted cosine similarity from the output of all possible LR interactions to derive networks of comparative number of edges.
Results
Evaluation of DAG learning based on synthetic data sets
We evaluated and compared the performance of the DAG learning step in LRnetST with alternatives including DAGBagM [29], DAGBagM_C [29], and bnlearn, based on a collection of simulated NIM matrices (see details in Methods). The results in Table A.1 in the Supplementary Material and Figure S1G-H indicate superior performance of LRnetST compared with the other methods across all considered scenarios, demonstrating the results of including the binary nodes in NIM in the LRnetST pipeline. Furthermore, the results of LRnetST and DAGBagM on NIM show the advantages derived from the customized treatment of likelihood terms associated with paired binary and continuous nodes in LRnetST. In the end, as expected, for all four methods, we observed improved performances as the sample size (n) increases, while declined performance with the increase of the number of genes (p).
Revealing cell–cell interaction in ovarian cancer with LRnetST
Application to the 10|$\times $| ST ovarian cancer data
For each tumor sample in the ovarian cancer 10x ST data set, we applied LRnetST, stLearn spatCellCellcom, and spaCI to construct patient-specific LR networks (Figure S2A–D). Table A.5 in the Supplementary Material summarizes the numbers of LR regulation edges.
Reproducibility of the edges across four tumor samples
We first evaluated how ”reproducible” the inferred LR edges are across different tumor samples. Intuitively, methods detecting more shared (reproducible) edges across tumors shall enjoy better power for detecting meaningful biological interactions. As summarized in Fig. 2A,C–E, LRnetST inferred many more shared (reproducible) edges across multiple (at least two) samples than the other three methods. Specifically, LRnetST inferred 185 reproducible edges, while spatCellCellcom inferred 67, stLearn inferred 48, and spaCI inferred only 21 reproducible edges. Note that, for counting the reproducible edges, we considered undirected edges and further collapsed the binary and continuous components of each gene to one node. Further, assessing whether the overlap between LR edges for a pair of tumors can significantly surpass that by random chance (Fig. 2B), we found the results of LRnetST are significant for all six pairs of tumors, whereas the results from spatCellCellcom, stLearn, and spaCI demonstrated significant overlap for no more than half of the pairs. Finally, LRnetST identified four edges shared by all four tumors, whereas the other methods failed to identify any such edges. These results suggest a higher level of reproducibility of LRnetST compared with the alternatives.

Reproducible LR edges and LR network modules resulted from the 10x ST data. (A) Common edges inferred by LRnetST across the 4 10x ST data. (B) P-values for the pairwise correlation of the common edges between the samples by LRnetST, spatCellCellcom (Giotto), stLearn, and spaCI. (C) Common edges inferred by stLearn across the 4 10x ST data. (D) Common edges inferred by spatCellCellcom (Giotto) across the 4 10x ST data. (E) Common edges inferred by spaCI across the 4 10x ST data. (F)One ITGB1 module by LRnetST in 10x ST data sample 10, where ITGB1 in the tumor spot is causally associated with several genes from the neighbor spot. (G) The otherITGB1 module by LRnetST in 10x ST data sample 10, where ITGB1 in the neighboring stroma spot is causally associated with several genes from the tumor spot. (H) One LRP1 module by LRnetST in 10x ST data sample 10, where LRP1 in the tumor spot is causally associated with several genes from the neighbor spot. (I) The other LRP1 module by LRnetST in 10x ST data sample 10, where LRP1 in the neighboring stroma spot is causally associated with several genes from the tumor spot.
We then investigated the hub nodes in the LR networks by the four methods (Figs S3, S4). Hub nodes are those with higher connections in a network, and thus are often more likely to play an essential role [35]. We identified three common hub nodes, LRP1, ITGB1, and ITGAV, across all four tumors by LRnetST. On the other hand, only ITGB1 was detected as a common hub node by spatCellCellcom, stLearn, and spaCI.
Interestingly, by LRnetST (Fig. S3A), LRP1 in both tumor cell-enriched spots (LRP1_tumor) and stroma cell-enriched spots (LRP1_stroma) appeared to be a hub node in all LR networks. Of all the edges associated with LRP1, the interaction between LRP1_stroma and MMP9_tumor is the only common edge identified in all four tumor samples (Fig. 2H and Fig. S7B, D, F). Previous studies have highlighted the active involvement of the LRP1–MMP9 interaction in many biological processes [36] (see Discussion).
Moreover, a few interactions involving LRP1 were patient-specific, such as LRP1_Tumor–C1QB_Stroma interaction was detected only in the two chemo-sensitive patients, whereas, LRP1_Tumor–SERPINA1_Stroma was detected only in the two chemo-refractory patients (Fig. 2G-H, and Fig. S7A–F). The latter implies a potential connection between LRP1 and the protease [37].
Apart from LRP1, ITGB1 is another hub node inferred to interact with multiple genes in all four tumors by LRnetST (Fig. 2F and G, and Fig. S8A–D). Specifically, for both chemo-refractory patients, ITGB1_tumor was detected to interact with VCAM1 and VEGFA from the nearby stroma spots. At the same time, ITGB1_stroma was connected with multiple genes, including LAMC2 and VEGFA, from the nearby tumor spots. These interactions have been previously reported in multiple literatures [38, 39] (see Discussion).
To further assess the performance of various methods, we then applied LRnetST and the other three methods to two independent cell-level ST data of ovarian cancer, one from the MERFISH platform and the other from the NanoString CosMx SMI platform, to validate some of the detected LR interactions.
Application to MERFISH and CosMx SMI data for validation
We studied four MERFISH data sets [32] derived from four tumor slices of two ovarian cancer patients (Methods). The MERFISH datasets contain gene expression measurements of 550 genes in over 100 000 single cells. The CosMx SMI data contain gene expression measurements for a panel of 960 genes in 21 651 single cells from a HGSC patient.
Both the MERFISH and the CosMx SMI data offer cell-level resolution, enabling detailed monitoring of LR interactions between individual cells. This granularity is particularly advantageous for overcoming the challenge of cell type mixtures within spots encountered in 10x ST data. However, owing to technology limitations inherent to MERFISH and CosMx SMI, only a limited set of transcripts (few hundreds) can be detected and quantified. This significantly constrains the ability to systematically characterize LR interactions, underscoring our use of MERFISH and CosMx SMI data for validation rather than exploration purposes.
For preprocessing and clustering of the validation datasets we used the same pipeline implemented under R Giotto (see section A.4 of Supplementary material).
Figures 3A and 4A illustrate the spatial distribution of tumor cells and macrophage cells in one tumor slice (see Fig. S9A–C for other cell-types) for MERFISH and CosMx SMI data, respectively. While there are thousands of epithelial (pink) and macrophage or T-cells (light blue) cells in the tumor slice, we focused on the neighboring tumor-macrophage (or tumor-T-cells) pairs (red epithelial cells and dark blue macrophage cells). With LRnetST, we built a collection of LR networks, one for each cell type pair and each tumor slice for both validations datasets (Supplementary Tables S3 and S4). In the result of the MERFISH data, there were 53 nodes involved for LR interactions that were inferred in at least two tumor slices. Figure 3B–E illustrates subsets of inferred networks considering different neighboring cell type pairs for one tumor slice. Similarly, a subset of the inferred LR networks from the CosMx SMI data are shown in Fig. 4B and C, considering neighboring tumor-macrophage and tumor-T-cells pairs.

Inferred LR network based on the MERFISH data (Patient-2, slice-2) by LRnetST. (A) Spatial plots showing the index epithelial (tumor) cells in red, the neighboring macrophage cells close to the tumor cells in blue, the non-index epithelial (tumor) cells in pink, and the non-adjacent macrophage cells in light blue. (B)–(E) LR network topology for LR interactions inferred in at least two MERFISH data sets, with red edges indicating those detected also in the 10x ST data.

Inferred LR network based on Nanostring CosMx SMI data by LRnetST. (A) Spatial plots showing the index epithelial (tumor) cells in red, the neighboring T-cells (left) or macrophage (right) cells close to the tumor cells in blue, the non-index epithelial (tumor) cells in pink, and the non-adjacent macrophage cells in light blue. (B) and (C) LR network topology for modules with at least one LR interaction appearing in two or more LR networks from the 10x ST data, with red edges indicating those detected also in the 10x ST data.
We then assessed the validation of the inferred LR interactions from the 10x ST data. Specifically, we focused on the 185 reproducible edges identified in at least two out of the four 10x ST data sets by LRnetST. Among these, only 70 edges had both nodes measured in the MERFISH and/or CosMx SMI datasets. Of these, 30 edges were detected in the LR network constructed from either the MERFISH or CosMx SMI data (Fig. 5A, Table A.6 in the Supplementary Material). Additionally, 10 edges were confirmed based on both datasets (bold red and bold green edges in Fig. 5A). For example, the interaction between ITGB1_tumor and VCAM1_stroma as well as VEGFA_stroma were inferred in multiple tumor samples from all three data sets (Supplementary Tables S3 and S4), suggesting the robust signal of these LR interactions in ovarian tumor tissues.

LR network topology showing edges inferred in two or more tumors based on the 10x ST data, with illustration for validation results based on MERFISH and Nanosting CosMx data. (A) Network module topology of LR interactions detected by LRnetST in at least two tumors from the 10x ST data, highlighting modules with at least one validated edge in MERFISH and/or CosMx SMI data; red edges represents interactions inferred only in the two short-term survivor patients, while green edges indicate those identified in both short- and long-term survivors in the 10x ST data. (B) Similar to A but for the results of stLearn. (C) Similar to A but for the results of spatCellCellcom (Giotto).
In parallel, we performed the same analysis using spatCellCellcom, stLearn, and spaCI (Fig. 5B and C, Supplementary Tables S3 and S4). As illustrated in Fig. 5B and C, only three and one edges were validated in the MERFISH data, while two and three other edges were validated in CosMx SMI data, by stLearn and SpatCellCellcom, respectively. Note that no edge was inferred in all three data sets by either spatCellCellcom and stLearn (Fig. 5B and C). For spaCI, we obtained sparse LR networks and only the interaction between ITGB1 and VEGFA was validated (Supplementary Table S4; no Figure provided)). Note that one interaction between ITGB1 and VEGFA was detected and validated by all four methods.
Discussions
To gain insights on the molecular basis of TME, in this paper, we introduce a new method—LRnetST—to infer LR interaction networks among adjacent cells using ST data. LRnetST employs novel strategies to address the spatial structure and high dropout rates that are uniquely present in the ST data. We demonstrated benefits of LRnetST for modeling the zero-inflated ST data over alternative approaches on synthetic data.
Furthermore, we compared performance of LRnetST and alternative tools on the ovarian cancer ST datasets. The results highlight LRnetST’s superior capability in detecting reproducible signals that hold greater biological significance. However, similar to other high-dimensional network construction methods, the performance of LRnetST may be impacted by the dimension of the gene space, as demonstrated in our simulation study. This underscores the significance of leveraging the existing LR interaction databases as prior domain knowledge to constrain the network space, as exemplified in our ovarian cancer ST data analysis.
By LRnetST, LRP1 is identified as a hub node in LR networks. LRP1 is reported to make critical contribution to many processes that drive tumorigenesis and tumor progression [40, 41], and has been proposed to be an important diagnostic and prognostic biomarker of epithelial ovarian cancer [36]. Besides the previously reported role of LRP1 in ERK signaling pathway through interaction with MMP9 [36], which was also supported in our results (Fig. 2G), we further revealed strong interactions between LRP1 in tumor cells and SERPINA1 (AAT) (Fig. 3B and C, 5A, and S7A, S7C, S7E) in the two short-term survivor samples in the 10x ST data sets, which were consistently validated across multiple tumor samples in the MERFISH data (Fig. 3B and C, 5A). Unfortunately, LRP1 was not measured in the CosMx SMI data. While the role of LRP1 and SERPINA1 interaction in the context of cancer remains largely unexplored, existing research in cardiovascular studies has highlighted its significance in an anti-inflammatory mechanism [42]. By analogy, one may hypothesize that the interaction between LRP1 in tumor cells and SERPINA1 in stroma cells could contribute to an immunosuppressive effect within the TME.
Interaction between LRP1_Tumor and C1QB_Stroma (pattern recognition molecule of innate immunity) [34] were detected only in the two chemo-sensitive patients, suggesting a potential role of LRP1-C1QB crosstalk network in modulating immune response, which may contribute to the chemo-treatment effectiveness in HGSC patients. Unfortunately, C1QB was not measured in the MERFISH data sets, so validation of this interaction using additional resources is warranted as future studies.
Another hub node detected in all four tumors based on 10x ST data sets was ITGB1, which has been inferred to interact with VEGFA and VCAM1 in the two chemo-refractory patients. Interactions between ITGB1 and VEGFA/VCAM1 was validated in either one or both single-cell level data. Intercellular communication mediated via interaction between VEGFA and ITGB1 has been reported to contribute to development, differentiation, and inflammation [38]. This interaction underlies the crosstalk between epithelial cells and fibroblasts relating to tumor inflammation in Pancreatic ductal cancer [43]. While VCAM1 mediates intercellular adhesion by specific binding to the integrin ITGB1 on leukocytes [39], the interaction between ITGB1 and VCAM1 has also been reported in the context of cell migration between brain endothelial cells and central memory T cells [44]. However, our analysis, for the first time, suggests the potential roles of the cross-talk between ITGB1, VEGFA, and/or VCAM1 among ovarian tumors.
This study focuses on analyzing static ST data from single time points. In the future, when temporal ST data are more accessible, causal relationships might be inferred with greater confidence by utilizing dynamic network-building techniques such as temporal link prediction [45, 46].
LR interaction-driven cell–cell communication is a key component of spatial domain communication. Further evaluation of how LR interactions vary across different spatial regions would be valuable and warrants future studies.
Conclusion
ST data offer an unprecedented opportunity to unravel the molecular mechanisms underlying cell–cell interactions within the TME. LRnetST proves to be a valuable tool for constructing LR interaction networks based on ST data. The outcomes of these enhanced analyses promise to unveil crucial contributors driving cell–cell interactions, potentially leading to the identification of new predictive biomarkers or therapeutic targets.
We propose a new method, LRnetST, to characterize the LR interactions among neighboring cells using multicellular or single-cellular ST data.
LRnetST utilizes directed acyclic graph models with customized treatment to handle the zero-inflated distribution of the ST data and employs an aggregation framework to enhance network inference.
LRnetST is adaptable to incorporate prior information such as those from existing LR databases.
Applying LRnetST to both multicellular 10x ST data and independent single-cell MERFISH data and CosMx SMI data of ovarian tumors, we identified several common LR interactions, and demonstrated that LRnetST leads to more reproducible results compared with alternative methods.
Author contributions
Shrabanti Chowdhury (Writingoriginal draft, Formal Analysis, Visualization, Investigation), Sammy Ferri-Borgogno (Formal Analysis, Visualization, Investigation), Peng Yang (Writingoriginal draft, Formal Analysis, Visualization, Investigation), Wenyi Wang (Writingoriginal draft, Formal Analysis, Visualization, Investigation, Supervision, Funding acquisition), Jie Peng (Writingoriginal draft, Formal Analysis, Visualization, Investigation, Supervision, Funding acquisition), Samuel C. Mok (Formal Analysis, Visualization, Investigation, Supervision, Funding acquisition), and Pei Wang (Writingoriginal draft, Formal Analysis, Visualization, Investigation, Supervision, Funding acquisition).
Conflict of interest: No competing interest is declared.
Funding
This work is supported by grants U24CA271114, U01CA271407, U24CA210993, U01CA214172, U01CA294459, and R01CA268380 from National Institute of Health and National Science Foundation (Division of Mathematical Sciences) grant 1915894. We also thank Dr. Guocheng Yuan and his team members for their instruction on the use of Giotto.
Data availability
An R package of LRnetST is made available as a github repository https://github.com/jie108/LRnetST. All the codes along with the data will be made available through this github repository.
References
Author notes
Shrabanti Chowdhury and Sammy Ferri-Borgogno Co-first author.