-
PDF
- Split View
-
Views
-
Cite
Cite
Zizhan Gao, Kai Cao, Lin Wan, Graspot: a graph attention network for spatial transcriptomics data integration with optimal transport, Bioinformatics, Volume 40, Issue Supplement_2, September 2024, Pages ii137–ii145, https://doi.org/10.1093/bioinformatics/btae394
- Share Icon Share
Abstract
Spatial transcriptomics (ST) technologies enable the measurement of mRNA expression while simultaneously capturing spot locations. By integrating ST data, the 3D structure of a tissue can be reconstructed, yielding a comprehensive understanding of the tissue’s intricacies. Nevertheless, a computational challenge persists: how to remove batch effects while preserving genuine biological structure variations across ST data. To address this, we introduce Graspot, a graph attention network designed for spatial transcriptomics data integration with unbalanced optimal transport. Graspot adeptly harnesses both gene expression and spatial information to align common structures across multiple ST datasets. It embeds multiple ST datasets into a unified latent space, facilitating the partial alignment of spots from different slices. Demonstrating superior performance compared to existing methods on four real ST datasets, Graspot excels in ST data integration, including tasks that require partial alignment. In particular, Graspot efficiently integrates multiple ST slices and guides coordinate alignment. In addition, Graspot accurately aligns the spatio-temporal transcriptomics data to reconstruct human heart developmental processes.
Graspot software is available at https://github.com/zhan009/Graspot.
1 Introduction
Recently, spatial transcriptomics (ST) technologies have enabled the simultaneous measurement of mRNA expression while retaining spatial information within a tissue slice. The integration of multiple slices provides a comprehensive understanding of complex biological processes in entire tissues, fostering innovative approaches to downstream tasks such as analyzing spatial expression across slices and exploring 3D cell–cell communications (Ståhl et al. 2016, Baccin et al. 2020, Ji et al. 2020).
Unlike single-cell batch effect correction, there is no correspondence between spots from different ST slices. Therefore, how to identify and preserve unique biological structural variations during integration remains challenging.
Several methods have been used for alignment and integration of multiple ST slices. For example, PASTE (Zeira et al. 2022), which is based on fused Gromov–Wasserstein optimal transport, can align slices in a global manner. However, PASTE assumes that slices have global similarity and fails to consider the biological structure variations across slices. To solve the problem, PASTE2 (Liu et al. 2023) extends the single-cell data integration method Pamona (Cao et al. 2022b) and introduces a partial fused Gromov–Wasserstein method to select and align only a subset of spots. Meanwhile, both PASTE and PASTE2 work on original space or linear embedding, and cannot provide a common integrated space across ST slices. Moreover, nonlinear embedding based methods are developed, such as GPSA (Jones et al. 2023) and STAligner (Zhou et al. 2023). GPSA (Jones et al. 2023) integrates multiple ST slices into a common coordinate system using deep Gaussian processes, while STAligner (Zhou et al. 2023) embeds gene expression and spatial neighbor network of spots with a graph attention network (GAT) and aligns slices using the mutual nearest neighbor (MNN) originally developed for single-cell data integration. However, none of them are capable of handling the partial alignment of slices or providing the probabilistic mapping of spots in adjacent slides for downstream analysis, as achieved by PASTE and PASTE2. There are also some generic methods, such as Tangram (Biancalani et al. 2021) and uniPort (Cao et al. 2022a), are designed to integrate and align single-cell and ST data. Tangram (Biancalani et al. 2021) is a deep learning method that aligns single-cell data onto ST data, while uniPort (Cao et al. 2022a) is a unified single-cell data integration framework that combines a coupled variational autoencoder and mini-batch unbalanced optimal transport (UOT). The UOT module allows uniPort more suitable for heterogeneous data integration. However, both Tangram and uniPort fail to utilize the spatial information provided by ST data.
Several existing methods, including STAligner, stack ST slices in a coordinate framework but are limited by requiring the guidance of manual landmarks.
Here we present Graspot, a GAT with UOT that aligns and integrates multiple ST data. Graspot extends uniPort by leveraging both GAT and UOT with the aim to efficiently utilize both gene expression and spatial information provided by ST data for the integration. Graspot inherits the GAT architecture of STAGATE (Dong and Zhang 2022) which nonlinearly embeds both gene expression and spatial information of each slice into a common latent space. The UOT module of Graspot enables it to reconstruct the probabilistic mapping of spots in the latent space across slices (Peyré and Cuturi 2019). Based on the partial probabilistic mapping, Graspot not only efficiently removes the batch effect, but also retains the biological structure variations across slices. We iteratively optimize the GAT and UOT distance to generate both the integrated embedding and probabilistic alignment. Probabilistic alignment result of Graspot can guide the alignment of slices with rotation and translation through weighted iterative closest point (ICP) algorithm in a 3D coordinate framework.
We verified Graspot on human dorsolateral prefrontal cortex (DLPFC) ST data, anterior sagittal mouse brain ST data, HER2 breast cancer ST data, and ST datas of human heart development. Our results indicate that Graspot attains highest alignment accuracy compared to competing methods both on global and partial alignment. Graspot can also integrates multiple ST slices via the pretraining of model. In addition, Graspot automatically align ST slices in a 3D coordinate framework. For instance, Graspot accurately aligns spatio-temporal ST slices to reveal the human heart developmental processes.
2 Materials and methods
2.1 Overview of Graspot
Graspot is a GAT for ST data integration with UOT (Fig. 1). The main inputs of Graspot are multiple ST slices which contain both gene expression profiles and two-dimensional spot coordinates. There are three main outputs of Graspot: (i) a probabilistic alignment of spots across multiple ST slices; (ii) a common low-dimensional space that recovers and aligns multiple ST slices; (iii) 3D coordinate alignment.

Overview of Graspot algorithm. The inputs of Graspot are multiple spatial transcriptomics slices with gene expression matrices and spatial coordinates. The outputs of Graspot are probabilistic alignment T and common integrated space aligns intrinsic structures of multiple ST slices. Probabilistic alignment matrix T guides the coordinate alignment using weighted ICP algorithm.
2.2 Mathematical formulation of Graspot
Suppose that two ST slices and are inputs of Graspot, where is the normalized gene expression of spot i, is the normalized gene expression of spot j and is the coordinate of spot i(j). g represents for the number of highly variable common genes. Graspot first builds an undirected spatial neighbor network and computes an adjacency matrix A based on spot locations in respective slices. Here, if and only if the Euclidean distance between spot i and spot j in same slice is less than a predefined radius r. Afterwards, Graspot inputs two ST slices through GAT framework and learns its k-dimensional embeddings. The latent vector z generated by the graph attention auto-encoder which achieves nonlinear dimensionality reduction while collectively aggregating information from neighbors of spots.
The neural network module in Graspot consists of three parts: an encoder, a decoder, and the graph attention layers. The inputs of encoder are initialing as . The encoder in Graspot generates the embedding of spot i in kth layer as follows:
where σ is the nonlinear activation function, Si is the neighbor set of spot i according to spatial neighbor network, and is weight matrix for training. Besides, Graspot employs a self-attention mechanism, which is an added single-layer neural network in kth layer in the encoder, to adaptively learn information from neighbors. The edge weight between spot i and spot j from same slice is described as follows:
where and are trainable weight vectors, is the represent of spot j in k−1 layer, and attij is normalized form of eij via a softmax function as follows:
After obtaining the latent representations from the encoder, Graspot applies a decoder to reverse the final spot embedding back to the original normalized gene expression. The last layer of the decoder is formulated as follows:
It has the same number of layers as encoder and we set and , respectively, to avoid overfitting. One of the objectives minimized by Graspot is the reconstruction loss of normalized expressions as follows:
To better integrate heterogeneous ST slices in a common low-dimensional space, we further design an alignment term for different ST slices using UOT. Suppose we have two latent vectors and generated by encoder ψ, respectively. Due to the favorable properties of the common latent space that recover intrinsic structures of ST slice, the UOT cost between spot i and spot j in different slices is defined as Euclidean distance of latent vectors and as follows:
We then compute the following unbalanced entropic optimal transport plan (Benamou 2003):
where , entropy regularization term , and . ϵ is an entropy parameter. ρ1 and ρ2 are two regularization parameters. As mentioned above, multiple ST slices may only partially overlap, and as a result, there may be a portion of spots in both slices that are seldom aligned. It is worth noting that the UOT module in Graspot allows only a fraction of the total mass to be transported between two distributions. Equation (7) combines entropic regularization (Cuturi 2013) with a more general formulation for UOT. Unbalanced entropic optimal transport allows to slightly diverge from initial mass during transportation and can be optimized using variations of the Sinkhorn algorithm (Janati et al. 2020). This is a strictly convex optimization problem and can be solved via an efficient inexact proximal point method (Xie et al. 2020) as:
The initial and , and the final UOT plan is calculated as . The second objective of our method is defined as the UOT loss:
Therefore, the total objective minimized by Graspot is formulated as:
Here, γ is the trade-off of reconstruction loss and UOT loss. We set in the following experiments.
We perform UOT and graph attention auto-encoder training in an iterative manner to generate a probabilistic optimal transport plan and a common aligned low-dimensional space. Graspot provides users an option to initialize in the optimization of Equation (7) to enhance the participation of spatial information in alignment module. Finally, the optimal transport plan T produced as the output represents the probabilistic correspondence between spots. The encoder’s output is a commonly aligned low-dimensional space for multiple ST slices. Details of Graspot algorithm are shown in Supplementary Note S1.
Once probabilistic alignment T is obtained, the source slice would be transformed to the target slice according to the optimal rotation and translation solution from SVD-based weighted ICP optimization in coordinate framework (Supplementary Note S3).
3 Methods evaluation
We evaluate our method both from the spots mapping T for alignment and nonlinear integrated embedding z. We mainly employ the metrics of Alignment accuracy and Label Transfer ARI to assess the performance of spots mapping T for alignment. Given spots mapping , we compute the maximum probability matching and one-to-one spot correspondence. Alignment accuracy measures the fraction of matched spots belonging to the same cell types across slices. Label Transfer Adjusted Rand Index (LTARI) indicates the overlap of annotated cell types and corresponding cell types. To evaluate nonlinear integrated embedding z, we employ the metrics of two categories including removal of batch effects and conservation of biological structure variations. Here the former category, including Batch Entropy score and Batch Average Silhouette Width (Batch ASW), refers to measure batch mixing. The latter category, including Silhouette (Cell-type ASW) score, is used to determine the separation of cell types. Both metrics work on the basis of the common embedded space of the integrated datasets (See Supplementary Note S2).
4 Results
4.1 Graspot outperforms the state-of-the-art methods on global alignment with highest accuracies
We applied Graspot to analyze 10× Genomics Visium ST data from human DLPFC. This dataset consists of 12 slices of DLPFC tissue from three samples, denoted as I, II, and III. Each sample consists of four slices labeled from A to D along the z-axis. Within each sample, the first adjacent pair AB and the last adjacent pair CD are 10 µm apart, whereas the middle pair BC of slices is located 300 µm apart (Maynard et al. 2021). The slices were manually annotated into six neocortical layers plus a white matter by Maynard et al. (2021). We used the annotation for method evaluations.
We applied Graspot to compute a pairwise slice alignment of each pair of consecutive slices from the same sample. The visualization of parallel spot matching results in Sample III using Graspot is shown in Fig. 2a. We compared the pairwise alignments obtained by Graspot with PASTE (Zeira et al. 2022), STAligner (Dong and Zhang 2022), Tangram (Biancalani et al. 2021), and the results are shown in the Fig. 2b and c.

(a) Spot matching results in Sample III using Graspot. Blue lines indicate correct matches, whereas red lines denote incorrect matches. (b) Accuracy of pairwise alignment of consecutive DLPFC slices (labeled AB, BC, and CD) for Tangram, STAligner, PASTE, and Graspot, which is computed based on published annotations of spots. (c) Label Transfer ARI, Batch entropy, Batch ASW, and Silhouette scores of pairwise slices alignment.
Graspot achieved the highest alignment accuracy in 6 of 9 pairwise alignment which shows the efficiency of our method in the alignment tasks in ST of human DLPFC data. As Fig. 2b shows, Graspot obtained a higher accuracy mostly above 0.8 in the close pairs (AB and CD) but lower accuracy in the far pairs (BC). Tangram algorithm achieved lowest accuracy from 0.26 to 0.48 on all slice pairs. STAligner algorithm achieved lower accuracy from 0.77 to 0.88 on all slice pairs, outperforming Graspot only on one middle BC slice pair in Sample I and II. Graspot achieved higher Label Transfer ARI scores with less variation than other methods in nine pairwise alignment tasks. The mean Label Transfer ARI score using Graspot is 0.67, surpassing other compared methods. The results of Batch entropy, Batch ASW, and Silhouette score are shown in Fig. 2c. Graspot is more competitive than STAligner both on removing batch effects and conserving biological structure variations in latent embeddings. Since STAligner cannot provide the probabilistic alignment across slices, we performed a matching strategy in its embedding space for the following evaluation. Due to the lack of a common integrated space in PASTE and Tangram, we only compared our method with STAligner.
These results demonstrate that Graspot, which combines transcriptional and spatial information to align ST data, frequently outperforms algorithms that use only transcriptional information. Additionally, our method, which iteratively learns the embedding and mapping using UOT, exhibits better performance compared to the fused Gromov–Wasserstein optimal transport strategy employed in PASTE and spot triplet learning strategy in STAligner. Figure 2 shows the superiority of Graspot in evaluation of pairwise alignment of consecutive DLPFC slices (labeled AB, BC, and CD) over Tangram, STAligner, and PASTE. It is evident that Graspot improves alignment performance between adjacent slices. For instance, this enhancement is noticeable for all pairs in Sample III and for the BC pairs in both Sample I and Sample II.
4.2 Graspot outperforms the state-of-the-art methods on partial alignment with highest accuracies
To evaluate Graspot’s ability to handle unbalanced datasets using UOT module, we tested Graspot on vertical partial subslices of human DLPFC tissues. The slices that we selected are derived from four DLPFC slices in Sample III. For each slice, we generated two partially overlapping subslices in the following manner: selecting two vertical lines on a slice to generate two subslices which encompass 70% of the total spots, as illustrated in Fig. 3a.

(a) Two vertical subslices, one on the left and the other on the right, exhibit a 70% overlap of spots across four slices from Sample III. (b) Label transfer ARI scores of pairwise alignment of subslices in (a) for PASTE, PASTE2 (s = 0.8), PASTE2 (s = 0.9), and Graspot.
Following the strategy of matching to the closest in De Plaen et al. (2023), we set and for partial alignment. We applied our method to find alignments within the two generated subslices for each of the slices A, B, C, and D in Sample III, separately. We follow PASTE2 (Liu et al. 2023) to use the Label Transfer ARI to measure the accuracy of partial alignment: Ground Truth clustering in slice with fewer spots and reordered Ground Truth clustering induced from spot correspondence in another slice. We compared our results with the other two methods (Fig. 3b). Consequently, Graspot demonstrated superior performance with results ranging between 0.68 and 0.71. In contrast, PASTE exhibited lower alignment accuracy ranging from 0.14 to 0.24, attributable to its inability to handle partially overlapping slices. PASTE2 is designed for partial alignment, allowing only a fraction s of the probability mass to be transported. Here we set s in PASTE2 equal to 0.8. PASTE2 achieved a lower alignment result, ranging from 0.62 to 0.66 compared to Graspot. PASTE2 yielded inferior outcomes with s = 0.9, varying from 0.50 to 0.55. This suggests that PASTE2, when set to a larger s value, intends to align the entirety of spots between slices.
4.3 Graspot efficiently integrates multiple ST slices via pretraining
The trained graph attention neural network is considered as a reference model which endow Graspot with the capability to iteratively integrate multiple slices into an aligned embedding within a common low-dimensional space. To verify the dominance of handing multiple ST datas, we applied Graspot in four slices of human DLPFC Sample III as follows.
We trained the neural network thoroughly within our model to align the pairwise slices (Slice B and Slice C) and then input the new slices from the same sample (Slice A and Slice D). As shown in Fig. 4a, the newly inputted slices, including Slice A and Slice D, align well with the existing slice embeddings in the common low-dimensional space. Additionally, the embeddings display clear patterns of cluster results that are in accordance with manually annotated layers. We also integrated the four slices in a common space using STAligner for comparison (Fig. 4b). The embeddings produced by Graspot demonstrate superior alignment and distinct clustering when compared to STAligner, particularly in cases involving the integration of multiple ST slices.

(a) Left: Embedding of four slices in Sample III using Graspot. Right: Clustering results of Graspot. (b) Left: Embedding of four slices in Sample III using STAligner. Right: Clustering results of STAligner.
We also applied Graspot to analyze three ST slices from HER2 breast cancer of patient G (Andersson et al. 2021). The experiment demonstrated that Graspot is capable of effectively performing tasks related to alignment, integration, and identification of distinct spatial structures in biological data. The three ST slices from HER2 breast cancer of patient G are adjacent and part of slices are partial overlap, as depicted in Fig. 5a. Following the alignment of the pairwise slices G1 and G2 using Graspot, the subsequent slice G3 aligns effectively with the existing latent embedding in the shared low-dimensional space. This process results in an integrated embedding, as depicted in Fig. 5b. Afterwards, we employed the Louvain clustering algorithm on integrated spots within the common low-dimensional space. The clustering outcomes for the template slice G2 are displayed in Fig. 5c. Graspot successfully identifies two regions of spots, aligning with the in situ cancer regions identified by pathological annotations, as illustrated in Fig. 5d. These results affirm that the integration of multiple ST slices using Graspot effectively preserves and unveils intricate structures within the tumor microenvironment.

Graspot integrated multiple human breast cancer spatial transcriptomics data. (a) Three spatial transcriptomics slices from HER2 breast cancer patient G with sections of adjacent slices overlapping along the z-axis. (b) Integrated embedding of three ST slices in common space. (c) Clusters in template slice G2. (d) Pathological annotations of G2 slice.
4.4 Graspot efficiently guides coordinate alignment of ST slices
Without the prior information as anchor pairs, Graspot is able to automatically registrate ST slices in coordinate framework with the guidance of probabilistic alignment T. We applied Graspot to analyze one pair of 10× Genomics Visium ST datas from sagittal mouse brain with complex structures.
Figure 6a depicts the 3D landscape of spot matching results between two replicates of sagittal mouse brain data using Graspot, PASTE, and STAligner, respectively. We benchmarked the performances of methods using the ground truth of annotations which contain common hierarchical layers including “L5,” “L2/3,” “Astrocyte,” and “CA.” We conducted alignment accuracy evaluation in probabilistic spots mapping T compared with PASTE and STAligner. Graspot achieved a highest score of 0.31, while STAligner achieved the second highest score of 0.30, and PASTE achieved a lowest score of 0.29 (left panel of Fig. 6b). To assess and demonstrate the integration performance within the common space, we used metrics such as Batch entropy and silhouette score for comparing two methods, namely STAligner and Graspot. Given that PASTE does not offer a common embedding space, it was excluded from these subsequent comparisons. We found that Graspot obtain a higher Batch entropy score than that of STAligner (middle panel of Fig. 6b), while its silhouette score is slightly lower than STAligner (right panel of Fig. 6b).

(a) Spot matching results in Sample III using Graspot, PASTE, and STAligner. Blue lines indicate correct matches, whereas red lines denote incorrect matches. (b) Label Transfer ARI, Batch entropy, and Silhouette scores of pairwise slices alignment. (c) Alignment of slices in coordinate with rotation.
Given probabilistic spots mapping T, Graspot solved weighted ICP algorithm to compute the optimal rotation and translation solution. As shown in Fig. 6c, slice “anterior 2” can be transformed through a 90-degree rotation stitching to slice “anterior 1” which indicates the capability of Graspot in coordinate alignment as one part of downstream analysis.
4.5 Graspot partially aligns spatio-temporal ST data of human heart development
Graspot can provide more intuitive insights of spot relationships such as cell growth and differentiation. We applied Graspot to spatio-temporal ST data of human heart development. This dataset contains three slices collected from human heart in early embryos, ranging from 4.5 to 5 post-conception weeks (PCW), 6.5–9 PCW (Asp et al. 2019) in Fig. 7a. It is clear that the number of cell types within the ST sections increases as embryonic development progresses (Fig. 7a). For example, the main myocardial region (e.g. cell types 0, 1, 2, 3, 4) grew dramatically between 4.5–5 PCW and 9 PCW, reflecting the heart’s growth processes.

(a) Distribution of spot types in ST slices sampled from three different developmental stage. (b) Coordinate alignment result of ST slices from 4.5 to 5 PCW and 6.5 PCW. (c) Coordinate alignment result of ST slices from 6.5 PCW to 9 PCW. (d) The alignment of three ST slices (4.5–5 PCW, 6.5 PCW, 9 PCW) in a 3D coordinate framework.
We used the probabilistic alignment matrix T output from Graspot to align the coordinates of ST slices at the first two time points and the last two time points, respectively, using the weighted ICP algorithm (Fig. 7b and c). The results show that the aligned overlapping regions of different slices share common cell types, whereas unaligned regions retain slice-specific cell types. This result indicates that Graspot can preserve the common structure of slices during alignment, and provides a basis for further downstream analysis of cell growth and differentiation mechanisms.
We also aligned the three ST slices of human heart development into a 3D coordinate framework. For multiple ST slices alignment, Graspot selects one slice as the reference template, and aligns the other slices with respect to the reference. Here we set the slice of 6.5 PCW as the reference, and aligned the slices of 4.5–5 PCW and 9 PCW to the reference slice. The resulting alignment of ST slices in a 3D coordinate framework is shown in Fig. 7d. It is also clear that the invariant cell types among the three slices (e.g. cell types 2 and 6) are well aligned.
5 Discussion
In this study, we introduce Graspot, a GAT designed for the integration of ST data using optimal transport. It generates probabilistic alignments for downstream analysis and aligns the intrinsic structures of multiple slices within a common low-dimensional space. Additionally, Graspot adeptly handles unbalanced alignment by allowing for partial overlap between slices. Our findings demonstrate Graspot’s capability to effectively perform tasks related to alignment, integration, and identification of unique spatial structures. Specifically, Graspot achieves alignments with higher accuracy compared to existing methods on ST data.
It is worth noting that, in the global alignment task of DLPFC dataset, Graspot achieved the second highest alignment accuracy scores in BC pairs of samples I and II (Fig. 2b), surpassed by STAligner. We took a close look at the alignment of BC pair in sample II (Fig. S1 in Supplementary Note S4). We found that Layer4 and Layer5 are invariant across slice B and slice C, whereas Layer3 are expanded from slice B to slice C, Layer6 and WM are contracted from slice B to slice C. The mismatched spots between BC pair (red lines in Fig. S1 in Supplementary Note S4) by Graspot mainly occurs between Layer3 in slice B and WM in slice C. The cross-layer mismatches may be due to the discontinues changes in slice B and slice C. To solve the problem of cross-layer mismatches, we plan to further improve Graspot by incorporating feature information (e.g. fused OT) to penalize the matching across different cell types.
Graspot is a computationally efficient algorithm (Fig. S2 in Supplementary Note S5). However, it requires memory consumption in the storage of optimal transport plan matrices T. Therefore, it may not perform well when sample size is in large-scale ST data. Designed for scalability, Graspot can efficiently handle large-scale ST data by employing subgraph sampling and a mini-batch scheme. We plan to explore these topics in our future work.
Supplementary data
Supplementary data are available at Bioinformatics online.
Conflict of interest
None declared.
Funding
This work was supported by the National Key Research and Development Program of China [grant number 2022YFA1004801], National Natural Science Foundation of China [grant number 12071466], and Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard. This paper was published as part of a supplement financially supported by ECCB2024.
Data availability
Graspot software is available at https://github.com/zhan009/Graspot. The datasets in this study are all from publicly available datasets. DLPFC dataset is available online at http://research.libd.org/spatialLIBD/. The sagittal mouse brain ST dataset is available online at https://zenodo.org/record/6925603#.YuM5WXZBwuU. HER2 breast cancer ST dataset is available online at https://doi.org/10.5281/zenodo.6334774. And spatio-temporal ST dataset of human heart development is available online at https://www.spatialresearch.org.