Qiheng Qian, Ruikun Xue, Chenle Xu, Fengyu Wang, Jingyao Zeng, Jingfa Xiao, CVD Atlas: a multi-omics database of cardiovascular disease, Nucleic Acids Research, Volume 53, Issue D1, 6 January 2025, Pages D1348–D1355, https://doi.org/10.1093/nar/gkae848
Abstract
Cardiovascular disease (CVD) is the leading cause of illness and death worldwide. Numerous studies have been conducted into the underlying mechanisms and molecular characteristics of CVD using various omics approaches. However, there is still a need for comprehensive resources on CVD. To fill this gap, we present the CVD Atlas, accessed at https://ngdc.cncb.ac.cn/cvd. This database compiles knowledge and information from manual curation, large-scale data analysis, and existing databases, utilizing multi-omics data to understand CVDs comprehensively. The current version of CVD Atlas contains 215,333 associations gathered from 308 publications, 652 datasets and 7 databases. It covers 190 diseases and 44 traits across multiple omics levels. Additionally, it provides an interactive knowledge graph that integrates disease-gene associations and two types of analysis tools, offering an engaging way to query and display relationships. CVD Atlas also features a user-friendly web interface that allows users to easily browse, search, and download all association information, research metadata, and annotation details. In conclusion, CVD Atlas is a valuable resource that enhances the accessibility and utility of knowledge and information related to CVD, benefiting human health and CVD research communities.

Introduction
Cardiovascular disease (CVD) is the primary cause of illness and death globally, resulting in 17.8 million deaths in 2017, a 21.1% increase from 2007 to 2017 (1). The number reached 20.5 million in 2021 (2). CVD includes a range of disorders affecting the heart and blood vessels, such as coronary artery disease (CAD) and stroke. Various omics studies on CVD have revealed some important genes associated with the disease. For example, a comprehensive gene-centric association study identified four genetic variants linked to myocardial infarction (3). RNA-seq analysis and network construction pinpointed lncRNA SNHG8 as a risk factor for acute myocardial infarction, supported by RT-PCR for its diagnostic value (4). In addition, methylation of the PTEN promoter was found to be involved in cerebral cavernous malformation (CCM), a type of vascular lesion in the central nervous system (5). A proteomic study also suggested that LRP2 and SZT2 may play a role in CVD (6). An untargeted metabolomic study on coronary heart disease (CHD) identified four lipid-related metabolites with a causal role in CHD (7). These research findings, encompassing genomics, transcriptomics, epigenomics, proteomics, and metabolomics, offer a comprehensive understanding of the molecular mechanisms underlying CVD. Furthermore, significant research efforts have been made to identify CVD-related traits, such as blood lipid levels, blood pressure, and inflammatory markers, through genome-wide association studies (GWAS) to elucidate the genetic loci associated with these risk factors for CVD (8–10).
The advancement of sequencing technology has provided valuable insights into the mechanisms and molecular signatures of cardiovascular disease (CVD). As a result, there is now an abundance of CVD-related data and knowledge available. To better understand the biological characteristics of CVD, it is crucial to develop integrated and organized multi-omics databases. Several databases have been designed to provide a view of the complex biological processes underlying CVD, including CaGE (11), CADgene (12), CardioGenBase (13) and C/VDdb (14). CaGE is a knowledgebase of cardiac gene expression, presenting 7,188 unique Human LocusLink identifiers expressed in human normal or dilated cardiomyopathic failing cardiac tissue. CADgene provides evidence related to coronary artery disease and myocardial infarction, with information on more than 300 candidate genes from over 1,300 publications of genetic studies. CardioGenBase uses a text mining approach to collect gene-disease associations, containing about 1,500 CVD genes and 24,000 research articles on six major CVDs. C/VDdb is an integrated multi-omics information resource of CVD, consisting of 4,353 unique molecule entries collected from nearly 100 manually curated studies. However, these existing CVD databases have some limitations. Firstly, they often focus on a narrow range of CVD disorders, neglecting others that require comprehensive characterization. Secondly, the abundant omics data call for unified analysis, improved integration, and visualization to unlock their full potential. Thirdly, prior knowledge from other published databases must be utilized more effectively. Lastly, these databases need to be continuously updated and kept readily available.
To tackle these issues, we have developed CVD Atlas, a comprehensive database for cardiovascular disease. CVD Atlas includes manually curated gene–disease associations, analysis results from human multi-omics datasets and prior knowledge from existing databases. Furthermore, it boasts an interactive, information-rich knowledge graph and provides two types of analysis tools. CVD Atlas offers the latest integrated multi-omics resources for cardiovascular disease research. Consequently, we believe that CVD Atlas will significantly support cardiovascular disease research and clinical practice.
Materials and methods
Data curation and integration
Our process began with a comprehensive literature search in NCBI PubMed (https://pubmed.ncbi.nlm.nih.gov/) using predefined keywords such as ‘(Cardiovascular disease) AND (GWAS)’, ‘(Cardiovascular disease) AND (Transcriptome)’ and ‘(Cardiovascular disease) AND (Methylation)’. We then reviewed the articles and retained only relevant, high-quality publications. Subsequently, we manually curated the study information from each retained publication. We selected only associations reported at significant levels (P-value < 0.05 or adjusted P-value < 0.05) for inclusion in CVD Atlas. To create a more comprehensive database, we also reviewed and integrated prior knowledge from existing databases such as PedAM (15), TWAS Atlas (16), HMDD v4.0 (17), miRNASNP-v3 (18), CTD (19), PharmGKB (20) and RNADisease v4.0 (21).
Multi-omics data collection and processing
We obtained GWAS summary statistics for cardiovascular disease (CVD) from the GWAS Catalog (https://www.ebi.ac.uk/gwas/) (22). We processed the raw data using MungeSumstats v1.10.1 (23) to ensure consistency with the reference genome GRCh38 (hg38). We conducted various quality control steps such as filling in missing SNP IDs, chromosome (CHR), base pair position (BP), and statistical values like effect size (Beta), standard error (SE), odds ratio (OR) and P-value using available data. We then used a P-value threshold of 5.0E-8 to identify significant SNPs. If no SNPs reached the genome-wide significance level in a GWAS dataset, we defined a genetic locus using the suggestive P-value threshold of 1.0E-4. For fine-mapping, we performed colocalization analysis using the R package coloc v5.2.2 (24), integrating GWAS and eQTL data from 49 tissues in GTEx v8 (https://gtexportal.org/) (25). We defined genetic loci as 1 MB windows around lead SNPs with the smallest P-value and used recommended thresholds (PP4 ≥ 0.75 and PP4/PP3 ≥ 3) to establish causal gene associations.
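The threshold logic described above can be sketched in a few lines of Python. This is only an illustration of the stated cut-offs, not the pipeline used for CVD Atlas (which relies on MungeSumstats and coloc in R). The names `Snp`, `select_significant`, `locus_window` and `is_colocalized` are hypothetical, the strict `<` comparison is an assumption, and the 1 Mb window is assumed to be centred on the lead SNP (500 kb on each side).

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5.0e-8   # genome-wide significance threshold (from the text)
SUGGESTIVE_P = 1.0e-4    # suggestive threshold used as a fallback (from the text)
WINDOW_BP = 1_000_000    # 1 Mb locus window; centring on the lead SNP is an assumption


@dataclass(frozen=True)
class Snp:
    """Hypothetical minimal record for one row of GWAS summary statistics."""
    rsid: str
    chrom: str
    pos: int
    p_value: float


def select_significant(snps):
    """Return (threshold_used, passing_snps).

    Falls back to the suggestive threshold only when no SNP in the
    dataset reaches genome-wide significance.
    """
    hits = [s for s in snps if s.p_value < GENOME_WIDE_P]
    if hits:
        return GENOME_WIDE_P, hits
    return SUGGESTIVE_P, [s for s in snps if s.p_value < SUGGESTIVE_P]


def locus_window(lead, window_bp=WINDOW_BP):
    """Return (chrom, start, end) of the window around a lead SNP."""
    half = window_bp // 2
    return lead.chrom, max(1, lead.pos - half), lead.pos + half


def is_colocalized(pp3, pp4):
    """Apply the stated rule: PP4 >= 0.75 and PP4/PP3 >= 3.

    The ratio is tested as pp4 >= 3 * pp3 to avoid dividing by zero.
    """
    return pp4 >= 0.75 and pp4 >= 3 * pp3
```

For a dataset with at least one genome-wide significant SNP, `select_significant` returns only those SNPs; otherwise it returns the SNPs passing the suggestive threshold, and the returned threshold records which rule was applied.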
For transcriptomics, we collected microarray and RNA-sequencing (RNA-seq) data from the GEO database (https://www.ncbi.nlm.nih.gov/geo/) (26,27). The gene expression profiles of the microarray data were obtained using the R package GEOquery (28). We performed normalization and differential expression analysis using the R package limma (29). For the bulk RNA-seq data, we used fastp v0.20.0 (30), STAR v2.7.6a (31) and RSEM v1.3.1 (32) for quality control, read alignment, and estimation of gene expression levels, respectively. We utilized DESeq2 v1.38.3 (33) for differential expression analysis, identifying differentially expressed genes based on |log2FC| ≥ 1 and adjusted P-value < 0.05. GO enrichment and co-expression analyses were performed using the R packages clusterProfiler v3.18.0 (34) and WGCNA v1.70-3 (35), respectively.
We obtained beta-value matrices for methylation data from the Illumina HumanMethylation450 BeadChip (450K) from GEO. Then, we used the R package ChAMP v2.32.0 (36–38) to identify differentially methylated positions (DMPs). DMPs with |delta beta| ≥ 0.2 and adjusted P-value < 0.05 were considered significant.
For proteomics, we downloaded protein expression profiles from PRIDE (https://www.ebi.ac.uk/pride/) (39) and conducted the differential expression analysis using the R package DEP v1.12.0 (40). We considered proteins with |average log2FC| ≥ 1 and P-value < 0.05 as significantly differentially expressed proteins.
For metabolomics, we selected metabolomic datasets from the Metabolomics Workbench (https://www.metabolomicsworkbench.org/) (41) and performed the differential analysis using the R package MetaboAnalystR v4.0.0 (42). We identified differential metabolites with a variable importance in projection (VIP) score ≥ 1 using Partial Least Squares Discriminant Analysis (PLS-DA).
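For reference, the significance cut-offs stated in this section for the transcriptomic, methylation, proteomic and metabolomic layers can be written as simple predicates. The sketch below restates those thresholds only; the function names are hypothetical, and the statistics themselves would come from DESeq2 or limma, ChAMP, DEP and MetaboAnalystR rather than from this code.

```python
def is_differential_gene(log2_fc, adj_p):
    """Transcriptomics: |log2FC| >= 1 and adjusted P-value < 0.05."""
    return abs(log2_fc) >= 1 and adj_p < 0.05


def is_differential_methylation(delta_beta, adj_p):
    """Methylation (450K): |delta beta| >= 0.2 and adjusted P-value < 0.05."""
    return abs(delta_beta) >= 0.2 and adj_p < 0.05


def is_differential_protein(avg_log2_fc, p_value):
    """Proteomics: |average log2FC| >= 1 and (unadjusted) P-value < 0.05."""
    return abs(avg_log2_fc) >= 1 and p_value < 0.05


def is_differential_metabolite(vip):
    """Metabolomics: PLS-DA VIP score >= 1."""
    return vip >= 1
```

Note that, as described in the text, the proteomic filter uses the raw P-value whereas the transcriptomic and methylation filters use adjusted P-values.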
Ontology mapping and standardizing
To standardize disease and trait names, we employed disease-related terms and identifiers from various ontologies, such as the Disease Ontology (DO, https://disease-ontology.org/), Experimental Factor Ontology (EFO, https://www.ebi.ac.uk/efo/), Medical Subject Headings (MeSH, https://www.nlm.nih.gov/mesh/meshhome.html), and MONDO (https://mondo.monarchinitiative.org/). Utilizing these ontologies, we organized diseases and traits into eight and five distinct subcategories, respectively.
Knowledge graph construction
We built a knowledge graph to improve the visualization and understanding of connections among entities. The graph defines three types of entities: disease, trait and gene. It consists of three sections, ‘Disease network’, ‘Trait network’ and ‘Gene network’, each centered on its respective core entity. We calculated a confidence score for each disease-gene or trait-gene association based on multi-omics evidence. This score is included in the graph as an additional attribute, indicating the strength and reliability of each connection.
Enrichment analysis and signature comparison
The current version of CVD Atlas includes two tools: ‘Enrichment analysis’ and ‘Signature comparison’. The ‘Enrichment analysis’ tool determines gene function based on the gene–disease or gene–trait perspective, using all genes in the repository as a reference set. Users can submit a list of gene symbols, set conditions, and calculate the significance P-value using Fisher's exact test. The ‘Signature comparison’ tool allows users to compare user-submitted signatures with differentially expressed genes from each dataset in CVD Atlas.
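As an illustration of the enrichment calculation, the sketch below computes a one-sided Fisher's exact test (equivalently, the upper tail of the hypergeometric distribution) for a submitted gene list against the genes associated with one disease or trait. Whether the CVD Atlas tool uses a one-sided or two-sided test is not stated, so the one-sided form is an assumption, and the function name and arguments are hypothetical.

```python
from math import comb


def enrichment_p_value(query_genes, term_genes, background_genes):
    """One-sided Fisher's exact test for over-representation.

    query_genes:      genes submitted by the user
    term_genes:       genes associated with one disease or trait
    background_genes: reference set (all genes in the repository)
    Genes outside the background are ignored.
    """
    background = set(background_genes)
    query = set(query_genes) & background
    term = set(term_genes) & background

    n_background = len(background)
    n_term = len(term)
    n_query = len(query)
    n_overlap = len(query & term)

    # P(X >= n_overlap) when drawing n_query genes without replacement
    # from a background containing n_term associated genes.
    total = comb(n_background, n_query)
    tail = sum(
        comb(n_term, i) * comb(n_background - n_term, n_query - i)
        for i in range(n_overlap, min(n_term, n_query) + 1)
    )
    return tail / total
```

In practice the P-values for many diseases or traits would usually be corrected for multiple testing; the text does not say which correction, if any, the tool applies.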
Database implementation
CVD Atlas was developed using SpringBoot (https://spring.io/projects/spring-boot) and MyBatis (https://mybatis.org/mybatis-3) as the back-end framework. The web interface was built using HTML (HyperText Markup Language), CSS (Cascading Style Sheets), jQuery (https://jquery.com), AJAX (Asynchronous JavaScript and XML) and Thymeleaf (https://www.thymeleaf.org). Front-end frameworks such as Semantic UI (https://semantic-ui.com) and Bootstrap (https://getbootstrap.com) were also employed. Data were stored and queried using MySQL (https://www.mysql.com), and visualization was achieved through Apache ECharts (http://echarts.apache.org).
Database contents and usage
CVD Atlas is a comprehensive database created through meticulous manual curation, analysis of multi-omics datasets and integration of knowledge from existing databases. Additionally, the website includes a visual and interactive knowledge graph and two types of tools to facilitate a better understanding of the intricate relationships between genes and CVD (Figure 1A).

Figure 1. (A) An overview of CVD Atlas. (B) The distribution of 190 diseases among the 8 categories. (C) The distribution of 44 traits among the 5 categories. (D) The top 20 genes with the highest number of associated diseases (top) and traits (bottom). (E) The top 20 diseases (left) and traits (right) with the highest number of associated genes.
Integration of multi-omics association knowledge for cardiovascular disease
In the current version, CVD Atlas houses 215,333 associations from 308 publications, 652 datasets and 7 existing databases. These encompass 35,829 genes, 190 diseases grouped into 8 categories, and 44 traits classified into 5 categories (Figure 1B, C). Among these genes, the most prevalent are protein-coding genes, followed by lncRNA (Supplementary Figure S1). Most genes are linked to multiple diseases or traits, with a median of four associated diseases or traits per gene. The top 20 genes with the highest number of associated diseases or traits exhibit various functions across different conditions (Figure 1D). hsa-mir-146a is associated with the largest number of diseases (55), while BRAP, HECTD4 and NAA25 are linked to the largest number of traits (19). Conversely, the median number of related genes for 190 diseases and 44 traits is 86 and 234, respectively. The top 10 diseases and traits with the most associated genes are shown in Figure 1E. Hypertension is associated with 836 high-confidence genes (confidence score greater than 0.7), which reflects its complex pathological mechanisms and molecular functions. AGT, NOS3, POMC, ADRB1 and NPPA are the top genes associated with hypertension, ranked by their confidence scores. The AGT gene encodes angiotensinogen, the sole precursor of all angiotensin peptides, which plays a vital role in regulating blood pressure (43). The associations between AGT polymorphisms and hypertension have been well documented (44–46). NOS3 encodes one of the three isoforms of nitric oxide synthase, primarily found in endothelial cells, and is essential for vascular health. NOS3 is also recognized as a genetic risk factor for hypertension (47,48). Additionally, CVD Atlas contains 457,286 SNPs, 8,436 differentially methylated positions, 453 differentially expressed proteins and 148 differentially expressed metabolites.
User-friendly modules for accessing data
The ‘Browse’ page of CVD Atlas contains seven interactive tables, indexed by disease, trait, dataset, gene, SNP, association and publication, respectively. The browse tables provide basic information and brief summary statistics. For example, the ‘Disease browse’ table displays all the diseases included in CVD Atlas, along with their names, CVD Atlas IDs and statistics such as significant SNPs, colocalized genes, differentially expressed genes, proteins, metabolites, differentially methylated positions, associated chemicals, other associations and datasets. Users can access detailed information on each item by visiting its dedicated page, which provides comprehensive information on all associations across different omics levels (Figure 2A).

Figure 2. Illustration of module interfaces in CVD Atlas. (A) Screenshot of the browse table with fundamental information and summary statistics (top), using disease browse as an example, and detailed disease information, using coronary artery disease as an example, including basic information, genomics, transcriptomics, epigenomics, proteomics, metabolomics, chemical, and other information (bottom). (B) Screenshot of the advanced search module. (C) Screenshot of the knowledge graph, using coronary artery disease as an example. (D, E) Screenshots of analysis tools for enrichment analysis (D) and signature comparison (E), respectively.
To facilitate efficient querying, CVD Atlas offers several search channels: (i) a quick search box on the home page for real-time queries by specifying disease name, trait name, gene name, SNP, or CVD Atlas ID; (ii) an advanced search function on the ‘Search’ page that allows users to access CVD Atlas using specific terms related to genes (e.g. gene symbol), diseases or traits (e.g. disease or trait name), SNPs (e.g. dbSNP ID (49)) and datasets (e.g. GEO ID). The advanced search also enables users to search for genes, SNPs, or datasets associated with a specific disease or trait and to query diseases related to a specific gene (Figure 2B). Furthermore, CVD Atlas features an auto-suggestion function that provides candidate query terms for users based on even short inputs.
Highly integrated knowledge graph with interactive visualization
To improve the integration and visualization of information, CVD Atlas creates a comprehensive and interactive knowledge graph by systematically combining disease-gene and trait–gene associations (Figure 2C). The graph consists of three panels: ‘Disease network’, ‘Trait network’ and ‘Gene network’. It defines three types of entities: diseases, traits and genes. Each panel highlights the respective entity as core nodes. The connections are quantitatively characterized by a confidence scoring system based on multi-omics associations and represented by the distance between two nodes. By default, only the top 20 associated entities are displayed and sorted in descending order by confidence score. Moreover, all nodes in the graph can be moved, allowing users to customize the display and export the high-resolution graph. For example, in the ‘Disease network’ for coronary artery disease, the graph displays the top 20 genes related to CAD by default, such as PHACTR1 and NOS3. Notably, PHACTR1 is the most related gene, as it is the nearest node to CAD, with a confidence score of 0.965. The graph lets users access detailed information about each node by clicking on them.
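The default display rule described above (the top 20 neighbours of a core node, sorted by descending confidence score, with more confident associations drawn closer) can be sketched as follows. The mapping from score to distance is not specified in the text; `1 - score` is used here purely as an assumption, and the function name and tuple layout are hypothetical.

```python
def top_neighbours(associations, core, limit=20):
    """Select the neighbours to display for one core node.

    associations: iterable of (core_entity, neighbour, confidence_score)
    Returns up to `limit` neighbours sorted by descending confidence.
    The `distance` field (1 - score) is an assumed layout rule in which
    a higher confidence places the neighbour nearer the core node.
    """
    edges = [(nb, score) for entity, nb, score in associations if entity == core]
    edges.sort(key=lambda edge: edge[1], reverse=True)
    return [
        {"node": nb, "score": score, "distance": round(1.0 - score, 3)}
        for nb, score in edges[:limit]
    ]
```

Under this assumption, PHACTR1, with its confidence score of 0.965 for CAD, would be placed at a distance of 0.035 from the CAD node, closer than any neighbour with a lower score.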
Analysis tools for efficient data exploration
We have implemented two tools, ‘Enrichment analysis’ and ‘Signature comparison’, in CVD Atlas to make it easy and efficient to explore the database. The first tool helps users determine if specific genes play a role in certain diseases or traits by examining which diseases or traits are enriched by the genes of interest. Users can submit a list of genes under specified conditions to calculate the P-value using Fisher's exact test. The ‘Enrichment analysis’ output includes a table for genes matched in CVD Atlas, a dot plot, and a detailed table of enrichment results (Figure 2D). The second tool helps users identify the relevant diseases for user-submitted gene signatures across datasets by assessing the overlap between the submitted gene list and the differentially expressed genes for diseases in each dataset. The ‘Signature comparison’ output consists of a network graph showing the relationships between genes, diseases, and datasets, along with two tables detailing the number of gene hits per disease and the differential analysis results (Figure 2E).
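The overlap step behind ‘Signature comparison’ can be illustrated with a minimal sketch: intersect the submitted signature with the differentially expressed genes of each dataset, then count distinct gene hits per disease. The data layout, the exact-match comparison of gene symbols, and the function names are assumptions; the actual tool may score or filter overlaps differently.

```python
def compare_signature(signature, dataset_degs):
    """Find datasets whose differentially expressed genes overlap a signature.

    signature:    iterable of gene symbols submitted by the user
    dataset_degs: {dataset_id: (disease_name, set_of_DEG_symbols)}
    Returns {dataset_id: {"disease": ..., "genes": sorted shared symbols}}
    for datasets with at least one shared gene.
    """
    submitted = set(signature)
    hits = {}
    for dataset_id, (disease, degs) in dataset_degs.items():
        shared = sorted(submitted & set(degs))
        if shared:
            hits[dataset_id] = {"disease": disease, "genes": shared}
    return hits


def hits_per_disease(hits):
    """Count distinct signature genes matched for each disease."""
    genes_by_disease = {}
    for hit in hits.values():
        genes_by_disease.setdefault(hit["disease"], set()).update(hit["genes"])
    return {disease: len(genes) for disease, genes in genes_by_disease.items()}
```

The per-dataset result corresponds to the gene, disease and dataset relationships shown in the network graph, and the per-disease counts correspond to the table of gene hits per disease.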
Case study: coronary artery disease
To illustrate the content and functional modules of CVD Atlas, we use coronary artery disease (CAD) as a case study. CAD is a cardiovascular disorder characterized by the narrowing or blockage of the coronary arteries. There are 79,724 entries related to CAD within CVD Atlas across all six omics levels. CAD is linked with 3,279 genes, with 379 genes associated with CAD in at least two omics levels. The detailed information about CAD is divided into six panels: ‘Basic information’, ‘Genomics’, ‘Transcriptomics’, ‘Epigenomics’, ‘Proteomics’ and ‘Chemical’. The ‘Basic information’ panel provides fundamental details about CAD, such as the disease name, CVD Atlas ID, MeSH ID, ontology ID and a description. A confidence score presents an overview of the associations between genes and CAD. The top five genes associated with CAD are PHACTR1, NOS3, LPA, PCSK9 and SORT1 (Supplementary Figure S2). Taking NOS3 as an example, it is linked to CAD across three omics levels: genomics, transcriptomics, and chemicals. This association is supported by 598 entries, covering 43 publications and two datasets. This example shows that the integrated data in CVD Atlas offer users multidimensional insights into the connection between specific genes and diseases.
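Counts such as "genes associated with CAD in at least two omics levels" amount to grouping association records by gene and counting distinct levels. The sketch below shows that grouping on a hypothetical record layout of (disease, gene, omics level); it does not reflect the actual CVD Atlas schema.

```python
from collections import defaultdict


def genes_with_multi_omics_support(records, disease, min_levels=2):
    """Return genes linked to `disease` in at least `min_levels` omics levels.

    records: iterable of (disease, gene, omics_level) tuples
    Returns {gene: sorted list of supporting omics levels}.
    """
    levels_by_gene = defaultdict(set)
    for record_disease, gene, level in records:
        if record_disease == disease:
            levels_by_gene[gene].add(level)
    return {
        gene: sorted(levels)
        for gene, levels in levels_by_gene.items()
        if len(levels) >= min_levels
    }
```

Applied to records like those for NOS3 above, the function would report NOS3 with three supporting levels, while a gene seen only in GWAS data would be excluded.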
CVD Atlas provides six other panels to offer a comprehensive view. In the ‘Genomics’ panel, five GWAS datasets (four European and one mixed ancestry) are included. Among them, rs7412, a missense variant of APOE, is significant in the largest number of datasets in the European population (Supplementary Figure S3), suggesting its potential importance for CAD. Colocalization analysis of GWAS and coronary artery eQTL data indicates that PHACTR1, MRAS and LINC00881 might be strong causal candidates for CAD in the European population (Supplementary Figure S4). These genes are closely associated with CAD and other cardiovascular diseases (50–54). The ‘Transcriptomics’ panel displays differentially expressed genes collected from dataset analysis and manual curation. For example, JUN, NR4A1 and TNFAIP3 are significantly differentially expressed in three of ten datasets (Supplementary Figure S5). These genes have close relationships with human malignancies (55–59). Recent studies reported that CAD and cancer often co-occur via common biological pathways and shared risk factors. Various chemotherapeutic agents and radiotherapy can impact the development and progression of CAD (60). In the ‘Epigenomics’ panel, no differentially methylated positions are recorded in the current version of CVD Atlas. Moreover, associations at different omics levels collected from existing databases and curated results are provided in these three panels. In the ‘Proteomics’ panel, IGFBP2, THY1 and APOH are the top positively related proteins, whereas S100A8, TGM3, and APOB are the top negatively associated proteins in CAD (Supplementary Table S1). In the ‘Metabolomics’ panel, 49 differentially expressed metabolites are displayed. Among them, adenylosuccinic acid, 4a-carbinolamine tetrahydrobiopterin, and succinyladenosine are the top metabolites associated with CAD (Supplementary Table S2). The associations of chemicals with diseases are presented in the ‘Chemical’ panel. For example, cocaine is the most closely related chemical, directly supported by 17 publications as the ‘Marker/Mechanism’ type in CTD. In comparison, aspirin is directly reported by eight publications as the ‘Therapeutic’ type (Supplementary Figure S6). Overall, CVD Atlas contains comprehensive information about cardiovascular diseases and offers valuable insights for researchers on cardiovascular health.
Discussion and future developments
Cardiovascular disease (CVD) is a significant health concern that has been extensively researched. Numerous studies have been conducted on CVD across different fields, including genomics, transcriptomics, epigenomics, proteomics and metabolomics (61,62). Existing CVD multi-omics resources offer a comprehensive perspective and valuable insights into CVD. However, integrating a large number of associations and datasets remains a challenge. To address this issue, we introduce the CVD Atlas. CVD Atlas is an available database that systematically curates published disease-gene associations, analyzes multi-omics datasets, combines knowledge from existing databases, and implements a visualized and interactive knowledge graph and two analysis tools. Compared with the current CVD multi-omics databases, CVD Atlas offers the following features: (i) comprehensive coverage of a wide range of cardiovascular diseases grouped by an ontology classification system and characterized by multi-omics annotations; (ii) integration of analysis results from diverse omics datasets processed through a unified pipeline, providing valuable insights into the underlying mechanisms of diseases and (iii) aggregation of scattered knowledge from multiple existing databases to provide a more comprehensive understanding of CVD.
As a resource within the National Genomics Data Center (NGDC, https://ngdc.cncb.ac.cn) (63), CVD Atlas will be regularly updated to include the latest association discoveries and CVD datasets from various omics levels. The updates will expand the types of knowledge or datasets generated by different methods, including additional levels of epigenomic knowledge such as histone modification and chromatin accessibility (64,65). Moreover, future updates will incorporate single-cell RNA sequencing (scRNA-seq) and spatial transcriptomics data to analyze cellular transcriptomic variability under different conditions of CVD (66,67). The current version of CVD Atlas already includes one expression quantitative trait loci (eQTL) dataset from GTEx, and future updates will integrate more types of QTLs while considering the impact of ancestry. Additionally, regulatory information on protein–protein interaction, which is valuable for understanding the complex biological pathways and molecular mechanisms underlying CVD, will also be included in future updates. The aim is for CVD Atlas to become an essential resource, providing comprehensive and up-to-date knowledge on cardiovascular disease.
Data availability
CVD Atlas is available at https://ngdc.cncb.ac.cn/cvd/.
Supplementary data
Supplementary Data are available at NAR Online.
Acknowledgements
We thank the users for reporting bugs and providing suggestions.
Funding
Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030400 to J.X.]; National Natural Science Foundation of China [32170669 to J.X., 32300542 to J.Z.]; National Key Research Program of China [2020YFA0907001 to J.X.]; Youth Innovation Promotion Association of the Chinese Academy of Sciences [2022098 to J.Z.]. Funding for open access charge: Strategic Priority Research Program of the Chinese Academy of Sciences [XDB38030400 to J.X.]; National Natural Science Foundation of China [32170669 to J.X., 32300542 to J.Z.]; National Key Research Program of China [2020YFA0907001 to J.X.]; Youth Innovation Promotion Association of the Chinese Academy of Sciences [2022098 to J.Z.].
Conflict of interest statement. None declared.
References
Author notes
The first two authors should be regarded as Joint First Authors.