CHDbase: A Comprehensive Knowledgebase for Congenital Heart Disease-Related Genes and Clinical Manifestations

Author Notes

Abstract

Congenital heart disease (CHD) is one of the most common causes of major birth defects, with a prevalence of 1%. Although an increasing number of studies have reported the etiology of CHD, the findings scattered throughout the literature are difficult to retrieve and utilize in research and clinical practice. We therefore developed CHDbase, an evidence-based knowledgebase of CHD-related genes and clinical manifestations manually curated from 1114 publications, linking 1124 susceptibility genes and 3591 variations to more than 300 CHD types and related syndromes. Metadata such as the information of each publication and the selected population and samples, the strategy of studies, and the major findings of studies were integrated with each item of the research record. We also integrated functional annotations through parsing ∼ 50 databases/tools to facilitate the interpretation of these genes and variations in disease pathogenicity. We further prioritized the significance of these CHD-related genes with a gene interaction network approach and extracted a core CHD sub-network with 163 genes. The clear genetic landscape of CHD enables the phenotype classification based on the shared genetic origin. Overall, CHDbase provides a comprehensive and freely available resource to study CHD susceptibilities, supporting a wide range of users in the scientific and medical communities. CHDbase is accessible at http://chddb.fwgenetics.org.

Congenital heart disease, Congenital heart defect, Database, Genetics, Classification

Introduction

Congenital heart disease (CHD) refers to abnormalities in cardiocirculatory structure or function that arise during embryonic development. With a prevalence as high as ∼ 1% of live births [1], CHD is one of the most common causes of major birth defects [2] and imposes enormous health and economic burdens. Specifically, 3%–5% of CHD patients have a genetic syndrome manifesting not only as CHD but also as extracardiac defects, which is termed syndromic CHD; others present abnormalities restricted to the heart, which is termed nonsyndromic CHD. Family studies have shown the strong genetic susceptibilities underlying the pathogenesis of CHD [3,4], so it is essential to clarify the CHD genetic etiology in the diagnosis and treatment of CHD. Recently, numerous genetics studies have addressed the genetic susceptibilities underlying CHD. Briefly, it has been estimated that the chromosomal aberrations and aneuploidies account for 8%–10% of CHD, the copy number variations (CNVs) for 3%–10% of CHD [5], de novo and rare inherited variants for 8% and 2% of CHD, respectively [6].

However, as the findings of these genetic studies are largely scattered in literature, it is difficult to incorporate them into in-depth mechanism studies and clinical practice. The holistic picture of CHD susceptibilities, such as the number and features of CHD-related genes, the strength of each item of evidence, and the relationship between genetic elements and cardiac malformations, is still to be addressed. Notably, the adequate classification of CHD types could substantially facilitate the investigation of its pathogenesis, accurate diagnosis, and effective treatments, while the current practice largely depends on its clinical manifestations with high diversity [7–9]. Efficient molecular classification could certainly supplement the current practices, while it is still to be addressed due to the lack of well-organized genotype-phenotype correlations. Notably, one pilot study recently reported a CHD-RF-KB database [10] to curate genetic risk factors associated with nonsyndromic CHD from approximately 300 studies, while it is far from complete to provide a clear genetic landscape of CHD for basic and translational studies.

Here, we present a comprehensive knowledgebase of CHD with extensive research evidence and functional annotations. We also prioritized these CHD-related genes with a gene interaction network approach. We then classified the CHD types based on their shared genetic origin, facilitating a molecular diagnosis and treatment of CHD which supplements the traditional practices with clinical manifestations.

Database implementation

Data collection and processing

To integrate current knowledge of CHD, we searched the PubMed database to retrieve CHD-related publications, using keywords including “congenital heart disease”, “heart defects”, “gene”, “proteomics”, “expression”, “copy number variation”, “linkage”, “association”, and “sequencing” (File S1). Among more than 2700 search results, we confirmed the relevance of each publication by examining its title and abstract. We excluded publications about congenital cardiomyopathies or congenital heart rhythm disorders due to their distinct clinical presentation and underlying mechanisms. After this process, a total of 1114 research articles published between February 1992 and January 2020 were collected for further curation.

For each article, we reviewed the main text and additional materials to extract evidence of an association between genes/variations and CHD. If multiple experimental approaches or different datasets were used in one study, the evidence of this study was divided into multiple independent records. In total, 1353 items of evidence were extracted, which could be classified into six types as follows: (1) 159 items of “Genetic association” evidence, (2) 452 items of “SNV/Indel” evidence reporting single-nucleotide variants/insertion-deletion variants (SNVs/Indels) in CHD patients, (3) 63 items of “Expression” evidence linking differentially expressed genes to CHD, (4) 240 items of “CNV” evidence reporting CNVs detected in patients, (5) 15 items of “Linkage” evidence reporting CHD-related genomic regions by linkage studies, and (6) 424 items of “Other” evidence, such as the evidence from transgenic animal models and cell lines (Figure 1). For each item of evidence, comprehensive metadata, such as the information of each publication, the selected population and samples, the strategy of studies, and the major findings of association, were extracted manually and double-checked by two researchers (Figure 1). A detailed description of the collected information is listed in Table S1.

Figure 1

Overview of the framework of CHDbase CHD, congenital heart disease; SNV, single nucleotide variant; Indel, insertion-deletion variant; CNV, copy number variation; ID, identity document; HGVSc, the Human Genome Variation Society expressions at the cDNA level; HGVSp, the Human Genome Variation Society expressions at the protein level; PTM, posttranslational modification; GO, Gene Ontology; MAF, minor allele frequency.

Open in new tab Download slide

The disease name was standardized according to the 11th revision of the International Classification of Diseases (ICD-11) congenital cardiology terms [11]. Transcript-based SNVs/Indels were annotated by Ensembl Variant Effect Predictor (VEP, https://grch37.ensembl.org/Tools/VEP) [12] and confirmed by Mutalyzer [13] following the nomenclature recommendations of the Human Genome Variation Society (HGVS). Genomic coordinates are presented according to the human reference genome (GRCh37/hg19). Overall, CHDbase integrated 3813 gene–CHD associations and 3957 variation–CHD associations, which contains the association information for 1124 susceptibility genes, 1006 structural variations, and 2585 SNVs/Indels.

Prioritization of CHD-related genes

For gene prioritization, the traditional evidence-based prioritization approach typically assigns a gene to a weighted sum based on the number and type of supporting evidence. As the weight for each type of evidence is debatable, and the items of evidence from the literature may simply reflect hot research topics rather than the significance of genes, here we introduced a gene interaction network approach to prioritize the significance of CHD-related genes.

In CHDbase, 1124 genes have been linked to CHD susceptibility, including 115 reported only in syndromic CHD (syndromic genes), 597 reported only in nonsyndromic CHD (nonsyndromic genes), and 412 reported in both syndromic and nonsyndromic CHD (Figure S1). We further prioritized these susceptibility genes with a gene interaction network approach and extracted a core CHD gene set (File S1). Briefly, on the basis of the 1124 susceptibility genes and protein–protein interaction annotations from the STRING database [14], we first constructed an unweighted gene interaction network with 952 nodes and 30,777 edges. Then, we used three centrality scores, namely, degree, betweenness centrality, and eigenvector centrality, to measure the significance of each gene in the network. Specifically, a gene with a high degree could be considered a “hub” in the CHD network with more interactions with other genes. A gene with large betweenness centrality could be considered as a “bridge” in the network and the removal of such a gene could lead to a disconnected network. A gene with large eigenvector centrality has more interactions with high-degree genes. A gene with a high degree, large betweenness centrality, and large eigenvector centrality could be more important within the network.

To obtain the core gene set, we extracted a k-core with 163 genes from the CHD network using k-core decomposition (see supplementary methods in File S1). As we retained the nodes with a maximum value of k (k = 70), these 163 genes in the k-core were considered the most centrally located nodes in the original network. The core gene set contains six syndromic genes, 83 nonsyndromic genes, and 74 genes reported in both syndromic and nonsyndromic CHD, indicating that the syndromic and nonsyndromic CHD share a common gene network (Figure 2A). These core genes are supported by a significantly higher number of supporting evidence than that of all CHD-related genes (P = 4.55E−10, Mann–Whitney U test). Notably, for 38 high-confidence CHD genes recurrently reported in recent reviews [5,15–18], 18 are present in this core gene set (Figure 2B). To facilitate users to rank genes based on their own experiences and preferences, we provide the CHD susceptibility gene list with the metadata, such as the information indicating whether it belongs to the core gene set, its centrality scores, and the number of evidence items supporting its association with CHD, as shown in Table S2.

Figure 2

Network-based prioritization of CHD-related genes A. The k-core of CHD-related genes (n = 163). The node size is proportional to the number of supporting evidence items for each gene. B. For 18 core genes overlapping with the high-confidence CHD genes recurrently reported in recent reviews, the degree, betweenness centrality, eigenvector centrality, and the number of supporting evidence items for each gene are listed.

Open in new tab Download slide

Functional annotations of CHD-related genes

To better understand the function of CHD-related genes, we further annotated them using public databases and data as follows: (1) HUGO Gene Nomenclature Committee (HGNC, https://www.genenames.org), National Center for Biotechnology Information (NCBI) Entrez gene (https://www.ncbi.nlm.nih.gov/gene) [19], Ensembl (https://www.ensembl.org), GeneCards (https://www.genecards.org) [20], Online Mendelian Inheritance in Man (OMIM, https://www.omim.org) [21], UniProt Knowledgebase (UniProtKB, https://www.uniprot.org), Mouse Genome Informatics (MGI, https://www.informatics.jax.org) [22], and Zebrafish Model Organism Database (ZFIN, https://www.zfin.org) [23] for basic information of the gene and its orthologs; (2) possible associated phenotypes and diseases from Human Phenotype Ontology (HPO, https://hpo.jax.org/app) [24], GWAS Catalog (https://www.ebi.ac.uk/gwas) [25], and DisGeNET (https://www.disgenet.org) [26]; (3) spatiotemporal expression profiles integrated from the Genotype-Tissue Expression (GTEx) project [27], RNA-seq data for seven human organs across multiple developmental time points [28], and spatial subcellular expression data of the developing human heart at three developmental phases [29]; (4) posttranslational modifications retrieved from UniProtKB and dbPTM (https://awi.cuhk.edu.cn/dbPTM) [30]; (5) biological pathways and protein–protein interactions annotated by Gene Ontology (GO, https://www.geneontology.org), Reactome (https://reactome.org) [31], BioCyc (https://www.biocyc.org) [32], Kyoto Encyclopedia of Genes and Genomes (KEGG, https://www.genome.jp/kegg) [33], PANTHER (https://panther.celera.com) [34], STRING (https://cn.string-db.org) [14], OmniPath (https://omnipathdb.org) [35], BioGRID (https://thebiogrid.org) [36], and the Human Reference Interactome (HuRI, http://www.interactome-atlas.org) [37]; and (6) drug–gene interactions retrieved from the Drug Gene Interaction Database (DGIdb, https://dgidb.org) [38], DrugCentral (https://drugcentral.org) [39], and Pharmacogenetics and Pharmacogenomics Knowledge Base (PharmGKB, https://www.pharmgkb.org) [40] (Figure 1). Additionally, to better understand the effects of variations, we classified the variations into five categories according to the conclusions of the original publications: “disease-causing”, “likely disease-causing”, “uncertain”, “likely benign”, and “benign”. For SNVs/Indels, we further integrated genomic information annotated by VEP [12], including variant consequences, minor allele frequency in different populations of the Genome Aggregation Database (gnomAD) [41], conservation scores calculated with phastCons, phylogenetic P-values (phyloP), and Genomic Evolutionary Rate Profiling (GERP++), and variant deleteriousness predicted by Sorting Intolerant from Tolerant (SIFT), Polymorphism Phenotyping v2 (PolyPhen2), Combined Annotation Dependent Depletion (CADD), Deleterious Annotation of genetic variants using Neural Networks (DANN), Likelihood Ratio Test (LRT), Mendelian Clinically Applicable Pathogenicity (M-CAP), MetaLR, PrimateAI, Functional Analysis Through Hidden Markov (FATHMM) using a Multiple Kernel Learning algorithm (FATHMM-MKL) and FATHMM with an eXtended Feature set (FATHMM-XF) (Figure 1).

Database interface

We constructed a MySQL database to store and manage the data. A user-friendly web interface was further implemented in Java, HTML, CSS, and JavaScript, as powered by Spring Boot, Thymeleaf, and Bootstrap framework.

User interface and functions

CHDbase provides two search modes. The basic search mode on the Home page and the top navigation bar allows users to precisely or fuzzily search for genes, variations, CHD/syndrome, and literature of interest (Figure 3 A and B). The basic search engine can recognize query terms such as the gene symbols, Entrez IDs, Ensembl IDs, the range of genomic coordinates, HGVS expressions at the cDNA level (HGVSc), HGVS expressions at the protein level (HGVSp), IDs in the database of single nucleotide polymorphisms (dbSNP), the full name or abbreviation of diseases, and PubMed IDs. When the user clicks on the button of “Exact Search”, a result page fully matching the search term displays detailed evidence of gene–CHD or variation–CHD association. If the user clicks on the “Fuzzy Search” button, the search engine will attempt to find partial matches for the search term, and return the results on the Browse page. The Browse page not only enables users to flexibly browse associations at the gene level or variation level, but enables the users to filter data with an advanced search mode (Figure 3C). Specifically, the user can enter multiple search terms in corresponding search boxes under column names to search for information of interest. Search results can be sorted by multiple columns in a customized order, and are downloadable in the format of JavaScript Object Notation (JSON) or Comma Separated Values (CSV). The user could further click on the gene or variation links on the Browse page to view the detailed information of research evidence and annotations as described below.

Figure 3

The web interface of CHDbase The major components of the web interface of CHDbase are shown. A. The navigation bar. B. The Home Page. C. The Browse Page. D. The Evidence and Annotation Page for each CHD-related gene or variation. E. The Statistics Page for the diseases, genes, variations, and studies. F. The Help Page.

Open in new tab Download slide

Detailed information about gene–CHD or variation–CHD associations is provided in a similar manner on the Gene Evidence Page or Variation Evidence Page (Figure 3D). Taking Gene Evidence Page as an example (Figure S2), in the upper panel, the list of research evidence for a specific gene relevant to CHD is summarized, including evidence ID, PubMed ID, title, abstract, study type, species, associated CHD types, associated syndromes, and conclusions. Furthermore, the user can click on the “Detail” label on the left of the “Evidence ID” column to view details in the lower panel. For detailed information on each item of evidence, we first provide population information, study design, and a summary of results, followed by the details of the results. According to different categories of evidence, the detailed information is summarized in varied formats (Figure S3). For example, for the evidence of “SNV/Indel” or “CNV”, variation, variation type, associated CHD types, associated syndromes, the number of variations in affected individuals, the number of variations in unaffected individuals, and variation pathogenicity are provided. If the sample information is available, the user can click on the plus sign at the end of the row to view the information of individuals carrying this variation, including sample ID, family name, ethnicity, sex, age, diagnosis, other phenotypes, transmission (de novo, paternal, maternal, and familial), and zygosity (homozygous, heterozygous, and mosaic). For the evidence of “Genetic association”, “Expression”, and “Linkage”, besides the associated CHD types and syndromes, statistical methods and results are also integrated. For further functional information about the gene or variation, the user can click on the label “Gene Annotation” or “Variation Annotation” in the top left to view annotations from external databases and the data mentioned above (Figures S2 and S4).

Finally, to display a comprehensive genetic landscape of CHD, the database also shows the statistics at disease, gene, variation, and study levels, in charts or forms with embedded hyperlinks (Figures 1 and 3E). A Help Page with detailed introductions to the database is also provided (Figures 1 and 3F).

Features of CHD-related genes

To clarify the molecular mechanisms of CHD and facilitate the prioritization of novel susceptibility genes, we systematically characterized the features of CHD-related genes. As syndromic CHD is usually accompanied by abnormal brain development, we explored the expression profiles of three CHD gene categories in the brain and heart, across different time points of human development [28]. Compared with a random gene set unrelated to CHD as a background, all of the three CHD categories showed higher expression levels in the brain and heart, especially in fetal organs, indicating their roles in embryonic development. Notably, at varied developmental stages, the syndromic genes showed higher expression levels than nonsyndromic genes in the brain (P = 0.13, one-sided Mann–Whitney U test) (Figure 4A); however, in the heart, nonsyndromic genes expressed at a significantly higher level than syndromic genes (P < 2.20E−16, one-sided Mann–Whitney U test), suggesting a more specific role of nonsyndromic genes in heart development (Figure 4B). Furthermore, CHD-related genes tend to broadly express across tissues and developmental stages compared with the background, according to tissue- and time-specificity indexes as defined by Cardoso-Moreira et al., indicating CHD-related genes are more intolerant to functional variations [28] (Figure 4C and D).

Figure 4

The expression profiles of CHD-related genes A. and B. Boxplots showing the expression levels for three categories of CHD-related genes in the brain (A) and heart (B) across different developmental time points. C. and D. Violin plot showing the difference between CHD-related genes and background genes in tissue-specificity (C) and time-specificity (D). The P values were obtained with Mann–Whitney U test. In all panels, CHD-related genes are classified into three categories: syndromic genes (red, n = 115), nonsyndromic genes (blue, n = 597), and genes reported in both syndromic and nonsyndromic CHD (orange, n = 414). A non-CHD gene set (n = 1124) was randomly selected as the background (gray). TPM, transcripts per million; PCW, post-conception week.

Open in new tab Download slide

We then identified the enriched GO terms and pathways for CHD-related genes with ClueGO [42] (Figure 5; Table S3). Briefly, the GO terms of heart development, embryonic morphogenesis, pattern specification process, cell differentiation, muscle structure development, regulation of cellular component movement, and cell surface receptor signaling pathway are significantly enriched. Consistently, on the pathway level, the cell surface receptor signaling pathways involved in heart development, such as the signaling pathways of wingless-type MMTV integration site family (Wnt), Notch, and transforming growth factor -beta (TGF-beta), are significantly enriched. The enrichment of CHD-related genes in terms of heart development is expected since the initial keywords for retrieving CHD-related genes were chosen to find these genes. While the enriched terms and pathways identified here provide a comprehensive molecular picture to understand CHD. Moreover, several cancer-related signaling pathways are also identified, providing a perspective to understand the previous epidemiological data that CHD patients typically show higher cancer prevalence [43,44]. Interestingly, the generic transcription pathway is also enriched, with 112 of the 1124 CHD-related genes cataloged as transcription factors in hTFtarget [45] (P < 2.20E−16, Fisher’s exact test). Notably, among the top ten genes with the strongest supporting evidence, six are transcription factor genes (GATA4, NKX2-5, TBX5, GATA6, CHD7, and NOTCH1). In addition, our data also recapitulated several pathways previously linked to CHD etiology [18], such as muscle contraction, extracellular matrix organization, and cilium assembly.

Figure 5

Functional enrichments of CHD-related genes A. The top 15 significantly enriched GO terms for CHD-related genes (n = 1124) are shown with significance levels. B. The top 15 significantly enriched KEGG and Reactome pathways for CHD-related genes (n =1124) are shown with significance levels. The adjusted P values corrected with Bonferroni step down were – log₁₀ transformed to denote the significance levels. GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; TCF, T-cell factor; Wnt, wingless-type MMTV integration site family; GFR, growth factor receptor.

Open in new tab Download slide

Molecular classification of CHD

CHD is a highly heterogeneous disease with a large set of cardiac malformations. Adequately clustering these malformations could classify this complex disease into several groups with shared pathophysiological mechanisms, facilitating in-depth mechanism study and clinical management. As CHDbase integrates comprehensive phenotype–genotype correlation data, we then attempted to classify CHD based on the genetic background (File S1). We first used Jaccard coefficient [26] to measure pairwise genetic similarity between all pairs of 27 CHD types. In total, 196 pairs of CHD types showed significantly more shared genes than expected (adjusted P < 0.05) (Figure 6A; Table S4). Based on the matrix of pairwise Jaccard distances, hierarchical clustering was further performed to reveal genetic patterns shared by CHD types. Seven distinct groups were identified (Figure 6B), which are largely consistent with known developmental processes and previous CHD classifications, such as Botto’s taxonomy [7] and Ellesøe’s classification based on familial co-occurrence [9]. To control for the effects of false positives in identifying disease-causing variants, we repeated the classification after excluding 21 variants that were reclassified as “benign” or “likely benign” by InterVar [46] according to the American College of Medical Genetics and Genomics/Association for Molecular Pathology (ACMG/AMP) guidelines [47]. The new results are consistent, indicating the reliability of this classification.

Figure 6

CHD classification based on genotype – phenotype correlations A. Jaccard coefficient, Jaccard distance, statistical significance for Jaccard coefficient, and gene distribution are shown for 20 pairs of CHD types with the strongest correlations. The centered Jaccard coefficient equals to the Jaccard coefficient minus the unbiased estimation of expectation. P and Adjusted P were calculated with the Jaccard test and corrected for multiple testing using the Benjamini–Hochberg false discovery rate. B. Classification of 27 CHD types in CHDbase. Only CHD types associated with at least ten genes were included in this analysis. Jaccard distances for all pairs of CHD types are shown in a heatmap, which were further used to classify these CHD types into seven major groups through hierarchical cluster analysis. For each group, the anatomical term is defined and shown. POF, patent oval foramen; PDA, patent arterial duct; PS/PVS, pulmonary stenosis/pulmonary valve stenosis; BAV, bicuspid aortic valve; AS/AVS, aortic stenosis/aortic valvar stenosis; AR, aortic regurgitation; MR, mitral regurgitation; SV, single ventricle; PA, pulmonary atresia; Dxc, dextrocardia; APVC, anomalous pulmonary venous connection; CoA, coarctation of the aorta; VCAb, vena cava abnormality; TA, tricuspid atresia; TR, tricuspid regurgitation; EA, Ebstein malformation of the tricuspid valve; TOF, tetralogy of Fallot; OA, overriding aorta; cAVJ, common atrioventricular junction; AVSD, atrioventricular septal defect; VSD, ventricular septal defect; ASD, atrial septal defect; DORV, double-outlet right ventricle; TGA, transposition of the great arteries; PTA, persistent truncus arteriosus; RAA, right aortic arch; IAA, interrupted aortic arch; RVOTO, right ventricular outflow tract obstruction; LVOTO, left ventricular outflow tract obstruction.

Open in new tab Download slide

As depicted in Figure 6B, cardiac malformations of abnormal rotation and septation of the outflow tract, such as transposition of the great arteries (TGA), double-outlet right ventricle (DORV), persistent truncus arteriosus (PTA), interrupted aortic arch (IAA), and right aortic arch (RAA), were grouped into “Outflow tract malformations”, which represents the spectrum of conotruncal defects. This group highly correlates with “Atrioventricular canal and septal abnormality” because it is usually accompanied by septal abnormalities. In addition to “Outflow tract malformations”, ventricular outflow tract obstruction (VOTO) clustered strongly and was divided into the left (LVOTO) and right (RVOTO) groups. VOTO refers to any anatomic or functional obstruction of flow out of the ventricle into the aorta or the pulmonary artery, usually caused by valvular stenosis or atresia. Our LVOTO and RVOTO groups largely overlap with Botto’s and Ellesøe’s classifications. Exceptions are pulmonary atresia (PA) and coarctation of the aorta (CoA). In our classification, PA and CoA share the most genes with dextrocardia (Jaccard coefficient = 0.20, adjusted P = 0) and vena cava abnormalities (Jaccard coefficient = 0.18, adjusted P = 0), respectively (Figure 6A; Table S4); both classified as “left–right abnormality”. As previously reported, PA manifests in 23.2% of patients with heterotaxy, an abnormality of formation of the left–right axis of the body [48], and CoA has been associated with an isolated persistent left superior vena cava, a common vena cava abnormality in 21.3% of patients [49]. These findings support the correlation between the development of these two cardiac malformations and abnormal left–right arrangement. The other four CHD types in our “left–right anomaly” group have been recognized to be related to the left–right arrangement [7,9]. Taken together, these major groups we defined are highly related to anatomy, embryology, and especially genetic origin. This molecular classification would thus serve as a supplement to the traditional CHD classifications and clinical practice.

Comparison with existing databases

To evaluate the utility and uniqueness of CHDbase, we compared it with CHD-RF-KB [10], a recently published database to compile risk factors associated with nonsyndromic CHD but not syndromic CHD, and two widely-used variant databases, ClinVar [50] and the Human Gene Mutation Database (HGMD) [51], that focus on a wide range of inherited diseases. As shown in Table 1, CHDbase compiles a larger set of CHD-related genes and variations, from a wider scope and a larger number of publications, compared with the other three databases. In comparison to the three databases which only reported information at the variation level, CHDbase provides 60 types of metadata, such as the study design, major associations, and sample information (Table S1). Moreover, CHDbase provides comprehensive functional annotations and analyses of the genes, showing a comprehensive landscape of CHD susceptibilities and facilitating a “one-stop” solution to clarify the gene–CHD associations.

Table 1

Open in new tab

Comparison with existing databases

	CHDbase	CHD-RF-KB (up to 10 January 2020)	ClinVar (up to 10 January 2020)	HGMD (up to 10 January 2020)
Data source	Manually curated	Manually curated	User submitted	Manually curated
Target disease	Syndromic CHD, nonsyndromic CHD	Nonsyndromic CHD	A wide range of diseases	A wide range of diseases
No. of publications	1114	305	163	445
Evidence type	Genetic association, SNV/Indel, Expression, CNV, Linkage, and Other	Genetic association, SNV/Indel, CNV, and Methylation	Genetic association, SNV/Indel, and CNV	Genetic association and SNV/Indel
No. of genes	1145 (1124 CHD susceptibility genes and 21 negative genes)	303	53	874
No. of variations	2585 SNVs/Indels, 1006 CNVs	1194 SNVs/Indels, 567 CNVs	760 SNVs/Indels, 14 CNVs	2152 SNVs/Indels
Collected information	Publication, population, study design, result, and sample information	Publication, result, and sample information	Publication and result information	Publication and result information
Annotation	Gene annotation and variation annotation	Variation annotation	Variation annotation	None
Data analysis	Expression profile, functional enrichment, and CHD classification	Functional enrichment	None	None

	CHDbase	CHD-RF-KB (up to 10 January 2020)	ClinVar (up to 10 January 2020)	HGMD (up to 10 January 2020)
Data source	Manually curated	Manually curated	User submitted	Manually curated
Target disease	Syndromic CHD, nonsyndromic CHD	Nonsyndromic CHD	A wide range of diseases	A wide range of diseases
No. of publications	1114	305	163	445
Evidence type	Genetic association, SNV/Indel, Expression, CNV, Linkage, and Other	Genetic association, SNV/Indel, CNV, and Methylation	Genetic association, SNV/Indel, and CNV	Genetic association and SNV/Indel
No. of genes	1145 (1124 CHD susceptibility genes and 21 negative genes)	303	53	874
No. of variations	2585 SNVs/Indels, 1006 CNVs	1194 SNVs/Indels, 567 CNVs	760 SNVs/Indels, 14 CNVs	2152 SNVs/Indels
Collected information	Publication, population, study design, result, and sample information	Publication, result, and sample information	Publication and result information	Publication and result information
Annotation	Gene annotation and variation annotation	Variation annotation	Variation annotation	None
Data analysis	Expression profile, functional enrichment, and CHD classification	Functional enrichment	None	None

Note: CHD, congenital heart disease; SNV, single nucleotide variant; CNV, copy number variation; Indels, insertion-deletion variants; HGMD, the Human Gene Mutation Database.

Table 1

Open in new tab

Comparison with existing databases

	CHDbase	CHD-RF-KB (up to 10 January 2020)	ClinVar (up to 10 January 2020)	HGMD (up to 10 January 2020)
Data source	Manually curated	Manually curated	User submitted	Manually curated
Target disease	Syndromic CHD, nonsyndromic CHD	Nonsyndromic CHD	A wide range of diseases	A wide range of diseases
No. of publications	1114	305	163	445
Evidence type	Genetic association, SNV/Indel, Expression, CNV, Linkage, and Other	Genetic association, SNV/Indel, CNV, and Methylation	Genetic association, SNV/Indel, and CNV	Genetic association and SNV/Indel
No. of genes	1145 (1124 CHD susceptibility genes and 21 negative genes)	303	53	874
No. of variations	2585 SNVs/Indels, 1006 CNVs	1194 SNVs/Indels, 567 CNVs	760 SNVs/Indels, 14 CNVs	2152 SNVs/Indels
Collected information	Publication, population, study design, result, and sample information	Publication, result, and sample information	Publication and result information	Publication and result information
Annotation	Gene annotation and variation annotation	Variation annotation	Variation annotation	None
Data analysis	Expression profile, functional enrichment, and CHD classification	Functional enrichment	None	None

	CHDbase	CHD-RF-KB (up to 10 January 2020)	ClinVar (up to 10 January 2020)	HGMD (up to 10 January 2020)
Data source	Manually curated	Manually curated	User submitted	Manually curated
Target disease	Syndromic CHD, nonsyndromic CHD	Nonsyndromic CHD	A wide range of diseases	A wide range of diseases
No. of publications	1114	305	163	445
Evidence type	Genetic association, SNV/Indel, Expression, CNV, Linkage, and Other	Genetic association, SNV/Indel, CNV, and Methylation	Genetic association, SNV/Indel, and CNV	Genetic association and SNV/Indel
No. of genes	1145 (1124 CHD susceptibility genes and 21 negative genes)	303	53	874
No. of variations	2585 SNVs/Indels, 1006 CNVs	1194 SNVs/Indels, 567 CNVs	760 SNVs/Indels, 14 CNVs	2152 SNVs/Indels
Collected information	Publication, population, study design, result, and sample information	Publication, result, and sample information	Publication and result information	Publication and result information
Annotation	Gene annotation and variation annotation	Variation annotation	Variation annotation	None
Data analysis	Expression profile, functional enrichment, and CHD classification	Functional enrichment	None	None

Note: CHD, congenital heart disease; SNV, single nucleotide variant; CNV, copy number variation; Indels, insertion-deletion variants; HGMD, the Human Gene Mutation Database.

Discussion

The genetic heterogeneity of CHD has been increasingly recognized in recent decades. However, several fundamental questions, such as the number of genes associated with CHD, and the reliability and reproducibility of each association, are still to be addressed, hindering the applications of current knowledge to etiological studies and clinical practice. In this study, we developed an evidence-based, manually curated knowledgebase for CHD, the CHDbase, linking 1124 susceptibility genes and 3591 variations with ∼ 310 CHD types and related syndromes. We also propose a new network-based approach to prioritize the contribution of each gene to CHD pathogenesis. The users could thus simply select their candidate genes for in-depth mechanism study through our user-friendly data retrieval interfaces. CHDbase thus provides a “one-stop” solution from a comprehensive landscape of CHD susceptibilities to promising candidate genes or even drug targets.

Furthermore, we demonstrated the utility of CHDbase through pattern analyses of genetic and phenotypic diversity in CHD. The expression and functional features of CHD-related genes were systemically analyzed to describe the holistic picture of CHD genetic factors, which should facilitate the study of CHD pathogenesis and the identification of novel susceptibility genes. Based on the most comprehensive genotype–phenotype correlations integrated into CHDbase, we further identified CHD groups with similar genetic origins. Although the classification is based on the genetic background, it recapitulates the traditional classification in that similar malformations could be detected in the same cluster, providing new insight into the genetic backgrounds underlying the pathogenesis and recurrence patterns of CHD. Notably, for this approach of classification, the integrity and reliability of genotype–phenotype correlations are crucial. Currently, although hundreds of genetic loci involved in CHD have been reported and integrated, the list of genes associated with CHD is still not complete and conclusive. Thus, the classification in this study is limited to the 27 most frequent CHD types. The molecular classification of CHD could thus be improved along with more accurate gene–CHD associations identified in the future.

Taken together, CHDbase can not only improve research on the etiology and pathogenesis of CHD but also aid in clinical practice for CHD, including diagnosis, genetic counseling, and treatment. The database will routinely be updated to keep pace with the research progress of CHD and built into a “one-stop” knowledgebase for exploring CHD susceptibility factors.

Data availability

CHDbase is publicly available at http://chddb.fwgenetics.org.

Competing interests

The authors have declared no competing interests.

CRediT authorship contribution statement

Wei-Zhen Zhou: Conceptualization, Data curation, Formal analysis, Methodology, Visualization, Writing - original draft, Writing - review & editing, Project administration, Funding acquisition. Wenke Li: Software, Methodology, Visualization. Huayan Shen: Data curation, Supervision, Writing - original draft. Ruby W. Wang: Formal analysis, Methodology, Writing - original draft. Wen Chen: Conceptualization, Data curation, Supervision. Yujing Zhang: Data curation, Visualization. Qingyi Zeng: Data curation. Hao Wang: Data curation. Meng Yuan: Data curation. Ziyi Zeng: Data curation. Jinhui Cui: Data curation. Chuan-Yun Li: Resources, Writing - review & editing. Fred Y. Ye: Supervision, Methodology, Writing - original draft. Zhou Zhou: Conceptualization, Project administration, Funding acquisition. All authors have read and approved the final manuscript.

Supplementary material

Supplementary data to this article can be found online at https://doi.org/10.1016/j.gpb.2022.08.001.

Acknowledgments

This work was supported by the Young Scientists Fund of the National Natural Science Foundation of China (Grant No. 31801103) and the Chinese Academy of Medical Sciences Initiative for Innovative Medicine (Grant No. 2016-I2M-1-016). We acknowledge Dr. Adam Yongxin Ye at Harvard Medical School, Boston Children’s Hospital for his support and contribution to the project, and Dr. Ge Gao at Peking University for assistance with database comparison.

References

Bernier

P.L.

Stefanescu

Samoukovic

Tchervenkov

C.I.

The challenge of congenital heart disease worldwide: epidemiologic and demographic facts

Semin Thorac Cardiovasc Surg Pediatr Card Surg Annu

2010

;

–

van der Linde

Konings

E.E.

Slager

M.A.

Witsenburg

Helbing

W.A.

Takkenberg

J.J.M.

et al.

Birth prevalence of congenital heart disease worldwide: a systematic review and meta-analysis

J Am Coll Cardiol

2011

;

2241

–

2247

Oyen

Poulsen

Boyd

H.A.

Wohlfahrt

Jensen

P.K.

Melbye

Recurrence of congenital heart defects in families

Circulation

2009

;

120

295

–

301

Nogee

J.M.

Jay

P.Y.

The heritable basis of congenital heart disease: past, present, and future

Circ Cardiovasc Genet

2016

;

315

–

317

Pierpont

M.E.

Brueckner

Chung

W.K.

Garg

Lacro

R.V.

McGuire

A.L.

et al.

Genetic basis for congenital heart disease: revisited: a scientific statement from the American heart association

Circulation

2018

;

138

e653

–

e711

Jin

S.C.

Homsy

Zaidi

Morton

DePalma

S.R.

et al.

Contribution of rare inherited and de novo variants in 2871 congenital heart disease probands

Nat Genet

2017

;

1593

–

1601

Botto

L.D.

Lin

A.E.

Riehle-Colarusso

Malik

Correa

National Birth Defects Prevention Study. Seeking causes: classifying and evaluating congenital heart defects in etiologic studies

Birth Defects Res A Clin Mol Teratol

2007

;

714

–

727

Houyel

Khoshnood

Anderson

R.H.

Lelong

Thieulin

A.C.

Goffinet

et al.

Population-based evaluation of a suggested anatomic and clinical classification of congenital heart defects based on the International Paediatric and Congenital Cardiac Code

Orphanet J Rare Dis

2011

;

Ellesoe

S.G.

Workman

C.T.

Bouvagnet

Loffredo

C.A.

McBride

K.L.

Hinton

R.B.

et al.

Familial co-occurrence of congenital heart defects follows distinct patterns

Eur Heart J

2018

;

1015

–

1022

Yang

Liu

Chen

Shen

An update on the CHDGKB for the systematic understanding of risk factors associated with non-syndromic congenital heart disease

Comput Struct Biotechnol J

2021

;

5741

–

5751

Franklin

R.C.G.

Beland

M.J.

Colan

S.D.

Walters

H.L.

Aiello

V.D.

Anderson

R.H.

et al.

Nomenclature for congenital and paediatric cardiac disease: the International Paediatric and Congenital Cardiac Code (IPCCC) and the Eleventh Iteration of the International Classification of Diseases (ICD-11)

Cardiol Young

2017

;

1872

–

1938

McLaren

Gil

Hunt

S.E.

Riat

H.S.

Ritchie

G.R.

Thormann

et al.

The ensembl variant effect predictor

Genome Biol

2016

;

122

Wildeman

van Ophuizen

den Dunnen

J.T.

Taschner

P.E.M.

Improving sequence variant descriptions in mutation databases and literature using the Mutalyzer sequence variation nomenclature checker

Hum Mutat

2008

;

–

Szklarczyk

Gable

A.L.

Lyon

Junge

Wyder

Huerta-Cepas

et al.

STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets

Nucleic Acids Res

2019

;

D607

–

D613

Fahed

A.C.

Gelb

B.D.

Seidman

J.G.

Seidman

C.E.

Genetics of congenital heart disease: the glass half empty

Circ Res

2013

;

112

707

–

720

Andersen

T.A.

Linde

Lind

Troelsen

Larsen

L.A.

Of mice and men: molecular genetics of congenital heart disease

Cell Mol Life Sci

2014

;

1327

–

1352

Nees

S.N.

Chung

W.K.

Genetic basis of human congenital heart disease

Cold Spring Harb Perspect Biol

2020

;

a036749

Google Scholar

OpenURL Placeholder Text

WorldCat

Williams

Carson

Genetics of congenital heart disease

Biomolecules

2019

;

879

Sayers

E.W.

Beck

Brister

J.R.

Bolton

E.E.

Canese

Comeau

D.C.

et al.

Database resources of the National Center for Biotechnology Information

Nucleic Acids Res

2020

;

–

Safran

Dalah

Alexander

Rosen

Iny

Stein T

Shmoish

, et al.

GeneCards Version 3: the human gene integrator

Database (Oxford)

2010

;

2010

baq020

Google Scholar

OpenURL Placeholder Text

WorldCat

Amberger

J.S.

Bocchini

C.A.

Schiettecatte

Scott

A.F.

Hamosh

OMIM.org: Online Mendelian Inheritance in Man (OMIM^R), an online catalog of human genes and genetic disorders

Nucleic Acids Res

2015

;

D789

–

D798

Bult

Blake

Smith

Kadin

Richardson

Mouse Genome Database Group

Mouse Genome Database (MGD) 2019

Nucleic Acids Res

2019

;

D801

–

Bradford

Conlin

Dunn

Fashena

Frazer

Howe

D.G.

et al.

ZFIN: enhancements and updates to the Zebrafish Model Organism Database

Nucleic Acids Res

2011

;

D822

–

D829

Kohler

Carmody

Vasilevsky

Jacobsen

J.O.B.

Danis

Gourdine

J.P.

et al.

Expansion of the Human Phenotype Ontology (HPO) knowledge base and resources

Nucleic Acids Res

2019

;

D1018

–

D1027

Hindorff

L.A.

Sethupathy

Junkins

H.A.

Ramos

E.M.

Mehta

J.P.

Collins

F.S.

et al.

Potential etiologic and functional implications of genome-wide association loci for human diseases and traits

Proc Natl Acad Sci U S A

2009

;

106

9362

–

9367

Pinero

Ramirez-Anguita

J.M.

Sauch-Pitarch

Ronzano

Centeno

Sanz

et al.

The DisGeNET knowledge platform for disease genomics: 2019 update

Nucleic Acids Res

2020

;

D845

–

D855

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

GTEx Consortium

The Genotype-Tissue Expression (GTEx) project

Nat Genet

2013

;

580

–

Crossref

PubMed

WorldCat

Cardoso-Moreira

Halbert

Valloton

Velten

Chen

Shao

et al.

Gene expression across mammalian organ development

Nature

2019

;

571

505

–

509

Asp

Giacomello

Larsson

Furth

Qian

et al.

A spatiotemporal organ-wide gene expression and cell atlas of the developing human heart

Cell

2019

;

179

1647

–

1660

Lee

T.Y.

Hsu

J.B.K.

Chang

W.C.

Wang

T.Y.

Hsu

P.C.

Huang

H.D.

A comprehensive resource for integrating and displaying protein post-translational modifications

BMC Res Notes

2009

;

111

Croft

O’Kelly

Haw

Gillespie

Matthews

et al.

Reactome: a database of reactions, pathways and biological processes

Nucleic Acids Res

2011

;

D691

–

D697

Karp

P.D.

Ouzounis

C.A.

Moore-Kochlacs

Goldovsky

Kaipa

Ahren

et al.

Expansion of the BioCyc collection of pathway/genome databases to 160 genomes

Nucleic Acids Res

2005

;

6083

–

6089

Kanehisa

Araki

Goto

Hattori

Hirakawa

Itoh

et al.

KEGG for linking genomes to life and the environment

Nucleic Acids Res

2008

;

D480

–

D484

Thomas

P.D.

Campbell

M.J.

Kejariwal

Karlak

Daverman

et al.

PANTHER: a library of protein families and subfamilies indexed by function

Genome Res

2003

;

2129

–

2141

Turei

Korcsmaros

Saez-Rodriguez

OmniPath: guidelines and gateway for literature-curated signaling pathway resources

Nat Methods

2016

;

966

–

967

Oughtred

Stark

Breitkreutz

B.-J.

Rust

Boucher

Chang

et al.

The BioGRID interaction database: 2019 update

Nucleic Acids Res

2019

;

D529

–

D541

Luck

Kim

D.K.

Lambourne

Spirohn

Begg

B.E.

Bian

et al.

A reference map of the human binary protein interactome

Nature

2020

;

580

402

–

408

Cotto

K.C.

Wagner

A.H.

Feng

Y.Y.

Kiwala

Coffman

A.C.

Spies

et al.

DGIdb 3.0: a redesign and expansion of the drug-gene interaction database

Nucleic Acids Res

2018

;

D1068

–

D1073

Avram

Bologa

C.G.

Holmes

Bocci

Wilson

T.B.

Nguyen

D.T.

et al.

DrugCentral 2021 supports drug discovery and repositioning

Nucleic Acids Res

2021

;

D1160

–

D1169

Whirl-Carrillo

McDonagh

E.M.

Hebert

J.M.

Gong

Sangkuhl

Thorn

C.F.

et al.

Pharmacogenomics knowledge for personalized medicine

Clin Pharmacol Ther

2012

;

414

–

417

Karczewski

K.J.

Francioli

L.C.

Tiao

Cummings

B.B.

Alfoldi

Wang

et al.

The mutational constraint spectrum quantified from variation in 141,456 humans

Nature

2020

;

581

434

–

443

Bindea

Mlecnik

Hackl

Charoentong

Tosolini

Kirilovsky

et al.

ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks

Bioinformatics

2009

;

1091

–

1093

Gurvitz

Ionescu-Ittu

Guo

Eisenberg

M.J.

Abrahamowicz

Pilote

et al.

Prevalence of cancer in adults with congenital heart disease compared with the general population

Am J Cardiol

2016

;

118

1742

–

1750

Mandalenakis

Karazisi

Skoglund

Rosengren

Lappas

Eriksson

et al.

Risk of cancer among children and young adults with congenital heart disease compared with healthy controls

JAMA Netw Open

2019

;

e196762

Zhang

Liu

Zhang

H.M.

Xie

G.Y.

Miao

Y.R.

Xia

et al.

hTFtarget: a comprehensive database for regulations of human transcription factors and their targets

Genomics Proteomics Bioinformatics

2020

;

120

–

128

Wang

InterVar: clinical interpretation of genetic variants by the 2015 ACMG-AMP guidelines

Am J Hum Genet

2017

;

100

267

–

280

Richards

Aziz

Bale

Bick

Das

Gastier-Foster

et al.

Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology

Genet Med

2015

;

405

–

424

Kiran

V.S.

Choudhary

Shaik

Gadabanahalli

Raj

Bhat

The spectrum of cardiac anomalies associated with heterotaxy: a single-center study of a large series based on computed tomography

Pediatr Cardiol

2020

;

1414

–

1424

Gustapane

Leombroni

Khalil

Giacci

Marrone

Bascietto

et al.

Systematic review and meta-analysis of persistent left superior vena cava on prenatal ultrasound: associated anomalies, diagnostic accuracy and postnatal outcome

Ultrasound Obstet Gynecol

2016

;

701

–

708

Landrum

M.J.

Chitipiralla

Brown

G.R.

Chen

Hart

et al.

ClinVar: improvements to accessing data

Nucleic Acids Res

2020

;

D835

–

D844

Stenson

P.D.

Mort

Ball

E.V.

Evans

Hayden

Heywood

et al.

The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies

Hum Genet

2017

;

136

665

–

677

Author notes

Equal contribution.

Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.

© 2023 The Authors. Published by Elsevier B.V. and Science Press on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences / China National Center for Bioinformation and Genetics Society of China.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. You are not required to obtain permission to reuse this article.

Download all slides

Month:	Total Views:
September 2023	8
October 2023	8
November 2023	15
December 2023	18
January 2024	45
February 2024	33
March 2024	19
April 2024	14
May 2024	18
June 2024	44
July 2024	38
August 2024	57
September 2024	42
October 2024	56
November 2024	44
December 2024	30
January 2025	43
February 2025	42
March 2025	63
April 2025	55
May 2025	3

Article Contents

CHDbase: A Comprehensive Knowledgebase for Congenital Heart Disease-Related Genes and Clinical Manifestations

Abstract

Introduction

Database implementation

Data collection and processing

Prioritization of CHD-related genes

Functional annotations of CHD-related genes

Database interface

User interface and functions

Features of CHD-related genes

Molecular classification of CHD

Comparison with existing databases

Discussion

Data availability

Competing interests

CRediT authorship contribution statement

Supplementary material

Acknowledgments

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

CHDbase: A Comprehensive Knowledgebase for Congenital Heart Disease-Related Genes and Clinical Manifestations

Abstract

Introduction

Database implementation

Data collection and processing

Prioritization of CHD-related genes

Functional annotations of CHD-related genes

Database interface

User interface and functions

Features of CHD-related genes

Molecular classification of CHD

Comparison with existing databases

Discussion

Data availability

Competing interests

CRediT authorship contribution statement

Supplementary material

Acknowledgments

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only