-
PDF
- Split View
-
Views
-
Cite
Cite
Chung-Feng Kao, Peilin Jia, Zhongming Zhao, Po-Hsiu Kuo, Enriched pathways for major depressive disorder identified from a genome-wide association study, International Journal of Neuropsychopharmacology, Volume 15, Issue 10, November 2012, Pages 1401–1411, https://doi.org/10.1017/S1461145711001891
- Share Icon Share
Abstract
Major depressive disorder (MDD) has caused a substantial burden of disease worldwide with moderate heritability. Despite efforts through conducting numerous association studies and now, genome-wide association (GWA) studies, the success of identifying susceptibility loci for MDD has been limited, which is partially attributed to the complex nature of depression pathogenesis. A pathway-based analytic strategy to investigate the joint effects of various genes within specific biological pathways has emerged as a powerful tool for complex traits. The present study aimed to identify enriched pathways for depression using a GWA dataset for MDD. For each gene, we estimated its gene-wise p value using combined and minimum p value, separately. Canonical pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCarta were used. We employed four pathway-based analytic approaches (gene set enrichment analysis, hypergeometric test, sum-square statistic, sum-statistic). We adjusted for multiple testing using Benjamini & Hochberg's method to report significant pathways. We found 17 significantly enriched pathways for depression, which presented low-to-intermediate crosstalk. The top four pathways were long-term depression (p⩽1×10−5), calcium signalling (p⩽6×10−5), arrhythmogenic right ventricular cardiomyopathy (p⩽1.6×10−4) and cell adhesion molecules (p⩽2.2×10−4). In conclusion, our comprehensive pathway analyses identified promising pathways for depression that are related to neurotransmitter and neuronal systems, immune system and inflammatory response, which may be involved in the pathophysiological mechanisms underlying depression. We demonstrated that pathway enrichment analysis is promising to facilitate our understanding of complex traits through a deeper interpretation of GWA data. Application of this comprehensive analytic strategy in upcoming GWA data for depression could validate the findings reported in this study.
Introduction
Major depressive disorder (MDD) is a complex and multifactorial disorder that causes a substantial burden of disease worldwide. Benefit from advances in genotyping technology, genome-wide association (GWA) studies have been frequently conducted to search for susceptibility genes for traits of interest during the past few years. Typical GWA studies examine half or a few million markers in hundreds or thousands of subjects; thus, such studies are expected to provide greater power in detecting common genetic variants for complex traits. Specifically in MDD, Sullivan et al. (2009) conducted a GWA study in >3000 subjects and found signals in the piccolo (PCLO) gene region (minimal p=7.7×10−7). However, in independent replication samples, association of this gene did not reach a genome-wide significance level at p<5×10−8. Furthermore, a more narrowly defined phenotype, such as recurrent MDD or recurrent early onset MDD, was thought to be more aetiologically homogeneous and might have a higher chance of revealing disease-associated loci. A GWA study for recurrent early onset MDD using sequenced treatment alternatives to relieve depression samples reported significant findings for the GRM7 gene (Shyn et al.2011). However, other GWA studies using the same recurrent early onset MDD phenotype did not find any significant loci reaching genome-wide significance when using >1000 individuals in each study (Muglia et al.2010; Shi et al.2011; Wray et al.2010). Additionally, among the four studies, each of which combined several GWA datasets to perform meta-analyses, only one reported significant associations for MDD in ATP6V1B2 (p=6.8×10−7), SP4 (p=7.7×10−7) and GRM7 (p=1.1×10−6) and for recurrent early onset MDD in GRM7 (p=5.6×10−6; Muglia et al.2010; Shi et al.2011; Shyn et al.2011; Wray et al.2010).
In addition to the results obtained from GWA stud-ies, genetic data for a large number of candidate genes have accumulated during the past decade for their associations with MDD (Kao et al.2011). Bosker et al. (2010) identified 57 depression candidate genes (in total, 92 markers) from literature, which showed significant associations with MDD. They attempted to replicate these previously reported associations using a GWA dataset of MDD. Only two markers, rs12520799 (C5orf20, p=0.038) and rs16139 (NPY, p=0.034), and only one gene, TNF (p=0.0034), were replicated with weak association evidence. Such poor replication indicated the predicament of genetic association studies at the single gene or single marker level. There are many possible reasons for poor replication related to MDD, including the heterogeneous nature of depression patient recruitment and ascertainment bias in different studies, potential publication bias that favours the presentation of positive results, false-positive findings in previous studies and variations in the contributions of genetic and environmental risk factors for depression (Bosker et al.2010; Wang et al.2010). Most importantly, polygenes with a small effect size are a significant issue in conducting genetic association studies for complex disorders, which resulted in concerns of under-power in most of the previous studies (Wray et al.2010). These issues potentially impede success in uncovering the underlying biological mechanisms of depression.
Investigating the join effects of multiple functionally related markers or genes for complex traits has recently been employed. In principle, a set-based analysis at the gene-level is utilized to extract information of all single nucleotide polymorphisms (SNPs) in a gene region, which can be done two ways. A simple approach used in many studies is to select the most significant SNP with the smallest p value to represent the gene (denoted as ‘pmin’). However, this method may underestimate the importance of genes with several moderately associated markers or overestimate the significance of genes (e.g. if the very significant signal results from genotyping errors). Hence, combining p values of all SNPs in a gene to represent the gene (denoted as ‘pcomb’) is a useful alternative to maximize the utilization of SNP association information. Haplotype analysis is another way to consider joint effects of multiple SNPs within genes. Haplotype blocks can be viewed as subsets of a gene, while the pcomb method treated the whole gene as one block. To pre-define haplotype blocks for each gene based on its real linkage disequilibrium structure in a genome-wide level is not very straightforward. Thus, in the current study, we focused on the pmin and the pcomb methods to collect gene-level information.
Furthermore, complex traits may result from defects in a number of genes that disrupt one or more biological pathways. Hence, a pathway-based analysis that incorporates prior biological knowledge of gene functions is necessary to provide a more comprehensive view than a single gene-level analysis (Wang et al.2010, 2011) so as to aggregately analyse enriched gene sets using biological pathway information (Kanehisa et al.2010) or functional categories (Consortium, 2007). Examples of pathway-based analysis methods include, but are not limited to: (i) the competitive method that compares association evidence of genes in a pathway to those not in a pathway, such as gene set enrichment analysis (GSEA) (Wang et al.2007) and a hypergeometric test (Jia et al.2010a); (ii) the self-contained method that accounts for genes within the pathway only, such as sum-statistic and sum-square-statistic (Tintle et al.2009). The competitive method may be less powerful when many genes outside of a specific pathway are also associated with the disease trait. A detailed review of the pathway-based analysis methods is available in Wang et al. (2010) . There have been several successful examples of pathway-based analysis as applied to reveal possible biological mechanisms for the risk of developing schizophrenia (Jia et al.2010a), Parkinson's disease (Wang et al.2007) and heart disease (Tintle et al.2009). It is thus our intention to examine enriched pathways for MDD using a large-scale GWA dataset. We performed pathway-based analyses using four competitive and self-contained methods with two approaches to gather SNP information at the gene level. Of significant importance, the strategies used to conduct pathway-based analyses in the current study for depression potentially boosted our power to obtain meaningful results in studying the molecular mechanisms of depression.
Method and materials
GWA dataset
The MDD GWA dataset was accessed through dbGaP (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000020.v2.p1). A total of 1821 depression cases and 1822 controls were included in this dataset (Boomsma et al.2008). We followed the quality control processes performed by dbGaP to remove subjects that have a missing phenotype, low call rate (<0.95), gender prediction ambiguity or outliers, as well as to remove markers via criteria such as minor allele frequency <0.05, Mendelian errors ⩾3 and the Hardy–Weinberg Equilibrium test p value ⩽1e−5 in the downloaded GWA dataset. In total, we included genotyping data of 424 861 markers in 3394 individuals (1673 cases and 1721 controls) to perform pathway analysis. A basic allelic test was used to calculate the genomic inflation factor for this GWA dataset, which was 1.028. The quantile–quantile plot for all analysed SNPs can be found in Supplementary Fig. S1, indicating good quality in this GWA dataset.
Computing gene-wise statistic values
To obtain gene-level significance, we first mapped SNPs to a gene (using NCBI build 36) if SNPs were located within the gene region or 20 kb upstream or downstream of the gene, which was suggested as a good gene boundary (Jia et al.2010b). We used two approaches to extract information for SNPs at the gene level. A commonly adopted method is to select the most significant SNP of a gene region using the smallest p value (pmin) in association tests to represent the significance level of a gene. Alternatively, we combined all information of SNPs in a gene region to represent a gene (pcomb) using the inverse γ method (Zaykin et al.2007). While applying the inverse γ method, we specified a shape parameter (α) of 0.1 for γ distribution, as suggested in Biernacka et al. (unpublished observations) for the best performance in their study of the alcohol dependence trait. Note that alcohol dependence might exhibit complexity and heterogeneity that is similar to MDD. In the current study, both pmin and pcomb gene level statistics were calculated for gene-level association signals.
Pathway annotations
To map genes into biological pathways, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG; ftp://ftp.genome.jp/pub/kegg/pathway/; December 2010 version) and the BioCarta (ftp://ftp.ncbi.nih.gov/gene/DATA/GENE_INFO/Mammalia/Homo_sapiens.gene_info.gz; February 2011 version) annotations. Pathways with extreme numbers of genes (i.e. 2.5th percentile of pathway size distribution, <4 or >250 genes) were removed from analysis to avoid overly limited information or excessively large pathways. This procedure resulted in 207 pathways in the KEGG and 300 pathways in the BioCarta datasets for the following analysis. There were 5443 human protein coding genes mapped to 507 pathways in this pathway annotation procedure and the average gene number per pathway was 36.
Statistical methods for pathway enrichment analysis
We applied four statistical methods to evaluate the enrichment of significant pathways for MDD; these methods are GSEA, hypergeometric test, sum-square-statistic and sum-statistic. The first two are competitive methods and the latter two are self-contained methods, according to two recent review articles (Wang et al.2010, 2011). The GSEA was first developed to analyse microarray gene expression data (Subra-manian et al.2005) and was later modified to expand to GWA studies (Wang et al.2007). In brief, a set of genes was first ordered according to gene-wise statistic values so that genes with higher significance (or small p values) are ranked on the top. The gene-wise statistic values (tj) in pmin gene level were defined as the χ2 statistic of the corresponding most significant SNP and in pcomb gene level were defined as the inverse γ statistic of the combined information of all SNPs in a gene region. For each examined pathway, an enrichment score (ES) was calculated, based on p values of a set of genes in each pathway. The ES was used to evaluate association signals for each annotated pathway. At the second step, for each pathway, the ES was normalized to compute NES by subtracting the mean of the ES in the permutated datasets, ES (Sperm) and divided by the standard deviation of ES (Sperm). We calculated empirical p values for all pathways using 5000 permutations to compare the original ES score from the GWA dataset and the permutation datasets (denoted as Sperm) by computing the fraction of the numbers of [ES (Sperm)>ES (S)] divided by the total number of permutations.
In the hypergeometric test, we used a cut-off p value of 0.05 to define significant genes using their gene-wise p values. A p value based on hypergeometric distribution for each pathway was computed. Further details are described in Jia et al. (2010a) . We performed the hypergeometric test for all annotated pathways using the GWA dataset for MDD.
Both the sum-square-statistic and sum-statistic (Tintle et al.2009) are self-contained methods for gene-set analysis, which only take genes in a specific pathway into account. Let ti (i=1, …, S) be the χ2 test statistics for each of the S genes in a pathway. The sum-square-statistic method utilizes the sum of the squares of all gene-wise statistic values (ti2) over the set of genes (∑i=1Sti2), and the sum-statistic method calculates the sum of the gene-wise statistic values (ti) over the set of genes (∑i=1Sti).
To explore the crosstalk among pathways, we used an overlap coefficient (OC) to calculate the fraction of genes overlapped across pathways; that is, the number of genes overlapped among two pathways divided by the minimal gene number among two pathways. Large OCs represent high similarity in gene information between two pathways. We defined the degree of overlap as follows: complete overlap (OC=1); high overlap (OC⩾50%); moderate overlap (20%⩽OC<50%); low overlap (OC<20%); no overlap (OC=0). To account for multiple testing problems in the pathway-based analyses, we used the method proposed by Benjamini & Hochberg (1995) to control for the false discovery rate (FDR). We ordered all the p values of pathways and compared each p value p(i) with a threshold of (i/m)q*, where m represents the total number of pathways and q* represents the significance level. Thus, the procedure controls the FDR at q*=0.05 level in the current study, assuming p values are independently distributed under null hypotheses. Our analyses were based on two gene-level indexes and four pathway-based methods. The resulting significant pathways may not be the same under different statistical scenarios. We adjusted the FDR on the level of individual statistical method and pathways with p<0.05 after Benjamini and Hochberg's adjustment was reported separately by using pmin or pcomb statistic, as shown in the tables.
Results
A total of 249 270 SNPs were mapped to 16 758 protein-coding genes in the GWA dataset of MDD, which were then mapped to 507 annotated pathways (207 from KEGG and 300 from BioCarta) in the gene-pathway mapping process.
Using the competitive methods, we initially found >60 pathways with nominal p's=<0.05 via the hypergeometric test using either the pmin or pcomb gene-level statistic. Among them, 13 pathways reached statistical significance after controlling the FDR at the 0.05 level. No significant pathways were found after FDR correction using the GSEA method. Significant pathways over-represented in the GWA dataset are listed in Table 1. Using the self-contained methods, five out of 56 significant pathways passed the FDR correction (q value <0.05) by the sum-statistic method under the pmin gene-level statistic and three out of 27 significant pathways were similarly identified by the sum-square-statistic method under the pcomb gene level statistic (see Supplementary Table S1). Of note, there were four overlapping pathways identified simultaneously by the hypergeometric test and the sum-statistic method under the pmin gene level statistic (see Table 1), including long-term depression (LTD), calcium signalling, arrhythmogenic right ventricular cardiomyopathy (ARVC) and cell adhesion molecule (CAM) pathways.
Significantly enriched pathways in the genome-wide association data for major depressive disorder using the ‘minimal p’ gene statistic

KEGG, Kyoto Encyclopedia of Genes and Genomes; ECM, extracellular matrix; PKC, protein kinase C.
The number of genes in each pathway.
The number of genes having a significant p value <0.05 in each pathway.
The empirical p values were calculated from 5000 permutations.
The false discovery rate (FDR) results were based on Benjamini & Hochberg's (1995) multiple testing correction.
Indicates a false discovery rate <0.05 for reported enriched pathways.
Significantly enriched pathways in the genome-wide association data for major depressive disorder using the ‘minimal p’ gene statistic

KEGG, Kyoto Encyclopedia of Genes and Genomes; ECM, extracellular matrix; PKC, protein kinase C.
The number of genes in each pathway.
The number of genes having a significant p value <0.05 in each pathway.
The empirical p values were calculated from 5000 permutations.
The false discovery rate (FDR) results were based on Benjamini & Hochberg's (1995) multiple testing correction.
Indicates a false discovery rate <0.05 for reported enriched pathways.
In total, we identified 17 significant pathways for their biological relevance in MDD that were over-represented in the GWA dataset. The biological functions of 17 enriched pathways (13 from KEGG and four from BioCarta) are listed in both Table 1 (pmin statistic) and Table 2 (pcomb statistic). Those pathways were structurally mapped to organismal systems (i.e. LTD, axon guidance, vascular smooth muscle contraction and protein kinase C-catalysed phosphorylation of inhibitory pathways), environmental info-rmation processing (i.e. calcium signalling, CAMs, phosphatidylinositol signalling system and the extracellular matrix-receptor interaction pathways), human disease-related (i.e. ARVC, type I diabetes mellitus and graft-versus-host disease pathways), cellular processes (i.e. focal adhesion and regulation of actin cytoskeleton pathway) and immune-related system (O-Glycan biosynthesis, T cell receptor signalling, nuclear factor κB (NF-κB) activation by non-typable Haemophilus influenzae and cell cycle pathway).
Significantly enriched pathways in the genome-wide association data for major depressive disorder using the ‘pcombined’ gene statistic

NF-κB, Nuclear factor κB.
Results were based on the sum-square-statistic method.
The number of genes in each pathway.
The empirical p values were calculated from 5000 permutations.
The false discovery rate (FDR) results were based on Benjamini & Hochberg's (1995) multiple testing correction.
Indicates a false discovery rate <0.05 for reported enriched pathways.
Significantly enriched pathways in the genome-wide association data for major depressive disorder using the ‘pcombined’ gene statistic

NF-κB, Nuclear factor κB.
Results were based on the sum-square-statistic method.
The number of genes in each pathway.
The empirical p values were calculated from 5000 permutations.
The false discovery rate (FDR) results were based on Benjamini & Hochberg's (1995) multiple testing correction.
Indicates a false discovery rate <0.05 for reported enriched pathways.
Among the 17 enriched pathways, we evaluated their crosstalk by checking whether there was a high degree of overlap for significant genes (p<0.05) in these pathways. The resulting number and proportion of overlapping genes are shown in Supplementary Table S2. The magnitude of overlap across pathways ranged from a low to intermediate level, which indicated some crosstalk for molecules in enriched pathways. Among all possible pair-wise pathway comparisons, 56.6% did not have any significant genes overlapping, 27.2% had a low degree of overlap (<20%), 13.2% had a moderate degree of overlap (20–50%) and only 3.0% pathways had a high degree of overlap (>50%). The fact that only a few genes were commonly identified in significant pathways for MDD further reflects the difficulty that we faced in identifying ‘the genes’ for complex diseases. Rather, those disease traits are usually caused by the dysfunction of several susceptible gene loci with small main and interaction effects.
We further examined genes that, in the 17 significantly enriched pathways, were involved in at least three pathways and had at least one SNP with a p value <0.05. These genes are listed in Table 3. They are mainly related to cell transformation and cell adhesion (e.g. PRKCA, ITGA, IL-1B), immune system and inflammation response (e.g. IL-1B, HLA genes, MAPK1, PLCB1), signal transduction (e.g. PRKCG, GNA) and calcium-dependent processes (e.g. CACNA genes).

Genes that were involved in at least three pathways with >1 single nucleotide polymorphism (SNP) having p value <0.05 were listed. Proportion of significant SNPs were defined as the number of significant SNPs (p<0.05) divided by total number of SNPs in that gene. The counts were provided for better information in this table.

Genes that were involved in at least three pathways with >1 single nucleotide polymorphism (SNP) having p value <0.05 were listed. Proportion of significant SNPs were defined as the number of significant SNPs (p<0.05) divided by total number of SNPs in that gene. The counts were provided for better information in this table.
Discussion
A wealth of large-scale GWA data have been produced during the past few years for complex traits, such as depression. The pathway-based analytic strategy provides the opportunity to uncover enriched pathways that are involved in the aetiology of depression, based on prior knowledge of gene functions and molecular mechanisms. In this study, we reported 17 over-represented pathways using a major GWA dataset of MDD, where a group of genes in the same pathway jointly showed associations with depression. Some genes were involved in multiple pathways to increase the risk of depression. It is worth noting that many of these genes did not reach genome-wide significance for their associations with depression in the analysis of single-gene level but reveal their potential roles in pathway-based analyses. In the original analysis of this GWA MDD dataset, only one associated loci, PCLO (rs2715148, p=7.7×10−7), was reported, although not reaching genome-wide significance level at 5×10−8 (Sullivan et al.2009). Bosker et al. (2010) attempted to replicate prior associations of candidate genes for MDD using this same GWA dataset; only one gene, TNF, exhibited significant associations (p=0.0034) after multiple testing correction. Concordantly, this TNF gene was included in three significant pathways that we identified (i.e. type I diabetes mellitus, graft-vs.-host disease and NF-κB activation by non-typable Haemophilus influenza).
We found 14 enriched pathways using the pmin gene-level statistic (Table 1) and three pathways using the pcomb gene-level statistic (Table 2). None of the pathways overlapped across the two gene-level calculation methods. Note that we performed multiple testing corrections on the level of individual statistical method, but not strictly applied across multiple pathway-based statistical methods. Nevertheless, four pathways were consistently identified by both the hypergeometric test and the sum-statistic method using the pmin. Even when a more stringent multiple testing adjustment was applied, the four pathways remained significant (none of the 5000 permutation had a statistic score greater than the original score obtained for these pathways, p=0). Similarly, under the pcomb approach, one pathway has empirical significance level at p=0.
The pmin is a commonly used approach to assess association evidence at gene level in previous studies that employed pathway-based analysis (Jia et al.2010a; Tintle et al.2009; Torkamani et al.2008; Wang et al.2007). However, there are limitations of using the pmin statistic to represent the significance of a gene. For instance, if there is a set of moderately associated markers in genes, the importance of such genes may be down-weighted by not having ‘one’ particular significant signal. Highly significant results are prone to the risk of being affected by genotyping errors. In addition, larger genes may be more likely to have smaller p values. In the GWA data for depression, we noticed that larger genes (e.g. >10 000 kb) were positively related to smaller p values (Kao et al.2011; Supplementary Fig. S2). Nevertheless, there is no difference between the proportion of larger gene size in the 17 enriched gene sets compared to genes not in these significant pathways (odds ratio 0.86, p=0.62). Thus, our findings were unlikely affected by such bias. In our analyses, we further combined information of all SNPs in a gene region to represent a gene. Benefits from the pcomb method include: (1) the overall evidence of gene-set association for depression can be assessed; (2) SNPs with moderate effects can be included. We found that the different strategies of defining gene-level significance seem to have substantial influences on results and completed different pathways were identified from these two statistics. We noticed that pathways related to neurotransmitters, neuronal activity, the immune system and inflammation response tended to be selected using the pmin approach; while immune-related pathways tended to be identified using the pcomb approach.
Using the pmin based approach, we identified four significant pathways by both the hypergeometric test and sum-statistics methods, including LTD, calcium signalling, ARVC and CAMs. These are important pathways related to neuronal activity, the immune system and inflammation response. More specifically, the LTD pathway is responsible for the mechanism of long-term, activity-dependent changes in the efficacy of neuronal synapses in some brain areas (e.g. hippocampus and cerebellum) with influence on the central nervous system to release various neurotransmitters during the developmental process (du Lac et al.1995; Massey & Bashir, 2007). Evidence shows that cerebellar LTD affects motor learning, and hippocampal LTD may contribute to memory traces (Malleret et al.2010; Nicholls et al.2008). Thus, the dysregulation of the stress response due to hippocampal dysfunction was suggested to be linked to depression (Pittenger & Duman, 2008).
The network of the calcium signalling pathway is complex and sophisticated and it is involved in neuronal activity and the immune system (Dodd et al.2010; Feske, 2007; Ghosh & Greenberg, 1995; Popoli et al.2002; Vig & Kine, 2009). Genes in this pathway regulate brain function, especially in the hippocampus, to cope with stressful events, which links to anxiety and depression as well as the effects of antidepressants (D'Sa & Duman, 2002; Popoli et al.2002). The ARVC pathway is named by a disease of the heart muscle that involves fibro-fatty replacement of the right ventricle, which is related to immune-mediated damage and inflammation response. CAMs play critical roles in regulating cell–cell and cell–matrix adhesion, which are involved with immune response, inflammation and development of neuronal tissue (Elangbam et al.1997). Changes in CAM expression were suggested to link with depression. For example, a reduction in glia in both the dorsolateral prefrontal cortex and anterior cingulated cortex were indicated to increase neuronal vulnerability in young depressed subjects (Greenwald et al.1998; Parker & Auld, 2004; Thomas et al.2002). Prior evidence also suggested that ischaemia-induced inflammatory response in the prefrontal cortex may cause neuronal damage and subsequent glial malfunction in elderly depressed subjects (Thomas et al.2002). Multiple lines of evidence have supported the implication that development of depression is related to complex biological pathways and networks with the involvement of neuronal activity, the immune system and inflammation response.
Furthermore, the four aforementioned pathways share several genes, reflecting the combinatory nature of cellular components and functional molecules and their potential involvement in several biological pathways. Some example genes in these pathways are GRM1, GRM5 and CDH2. The metabotropic glutamate group 1 receptors (e.g. GRM1, GRM5) are mediated by a G-protein that activates a phosphatidylinositol-calcium second messenger system, which regulates brain function through neurotransmitter system. N-cadherin (e.g. CDH2) is a calcium-dependent CAM essential for normal neuronal development in synaptic contact. It affects synapse formation by interacting with potential signalling pathways and/or molecules such as Wnt signalling (Lamora & Voigt, 2009) and p38 mitogen-activated protein kinase signalling (Ando et al.2011), which is important in terms of the neuronal recognition mechanism. Evidence from these prior findings supports the critical roles of our identified pathways and genes in the pathophysiology of depression. Of note, we previously published a DEP genes set with 151 prioritized candidate genes for depression (Kao et al.2011). Seventeen genes in these four pathways were previously discussed and, in total, 41 genes in all 17 enriched pathways appeared in our top list of DEP genes with high scores (Supplementary Table S3), consistently suggesting the importance of these enriched pathways in relation to depression.
There is increasing evidence showing that mental disorders may result from abnormality in the immune and inflammation systems (Dantzer et al.2008; Pardo et al.2005). Our results support this proposition, especially for the four significantly identified immune-related pathways. Two pathways, T cell receptor signalling and NF-κB activation, are relevant biological processes in immune response and in regulating the inflammatory system (Burbach et al.2007; Shuto et al.2001). In addition, NF-κB activation and cell cycle pathways were reported to be involved in the processes of synaptic plasticity, memory and neuronal differentiation (Merlo et al.2002; Nagy et al.1998). Another significant immune-related pathway is O-glycans synthesis (Brockhausen, 1999; Piller et al.1988; Tsuboi & Fukuda, 1997). Although the causal mechanisms of inflammation response in depression are still unknown, our findings further suggest the potential to investigate roles of neuronal chronic inflammation for depression.
There are some limitations in the current study. First, our pathway analysis relied on the accuracy and completeness of pathway annotation databases and we only used two commonly adopted ones, KEGG and BioCarta. Some genes that may have potential impacts on depression but are not annotated in pathway databases were excluded from our analyses. Other datasets, such as Gene Ontology and Reactome, may be helpful and can be considered in future analysis, although their annotations need to be carefully selected. Second, our findings have shown that different pathways were identified by combining all markers' information in defining gene-level significance. We used the inverse γ method to extract SNPs' information for a gene in the pcomb statistic. Other gene-level statistics, such as the random effects model or Bayesian statistical methods, may provide different gene-level evidence (Stephens & Balding, 2009) in pathway analysis. Third, the current study focused on the signals in genetic association tests, while other genomic information (such as gene expression, gene regulation, etc.) has not yet been used. Concerning other useful genomic datasets, a possible utilization approach is to first integrate gene information from different platforms or data sources to construct a combined score for each gene, followed by typical pathway analysis to obtain more value-added pathway results using all existing genomic evidence and knowledge for depression. Fourth, in conducting pathway analysis for depression, we only used one major GWA dataset. There are several large-scale GWA studies for depression, newly completed while this analysis was in the final stage, and those datasets are not yet easily accessible to the general public. The association results from meta-analysis (or mega-analysis) can be used in the near future to increase our power to uncover the underlying biological mechanisms for depression.
In conclusion, our results suggest several enriched pathways for their biological functions to be involved in molecular mechanisms of depression, including pathways related to neurotransmitter and neuronal systems, the immune system and inflammatory response. A number of novel genes that did not show significant associations with depression in the original single marker/gene analysis of the GWA dataset were found to participate in several pathways, which, jointly with other genes, play roles in the pathogenesis of depression. Although it remains largely unclear how the defect of pathways is specifically linked to the development of depression, our identified pathways provide important biological insights into the interpretation of GWA data for depression. These findings are anticipated to facilitate future follow-up and functional studies for depression.
Note
Supplementary material accompanies this paper on the Journal's website.
Acknowledgements
This research was supported by National Science Council (NSC 97-2314-B-002-184-MY2, NSC 99-2314-B-002-140-MY3) and National Health Research Institute (NHRI-EX99-9918NC) grants (to P-H.K.) and partially supported by NIH grants, 2009 NARSAD Maltz Investigator Award (to Z.Z.) and 2010 NARSAD Young Investigator Award (to P.J.). We thank P.C. Hsiao for his IT assistance. The GWA dataset was accessed through the Genetic Association Inform-ation Network (GAIN), database of Genotypes and Phenotypes (dbGaP) accession number phs000020.v1.p1 (http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000020.v2.p1).
Statement of Interest
None.
References
Author notes
These authors contributed equally to this work.
These authors contributed to this work as joint senior authors.