Abstract

Ischemic stroke (IS) is a leading cause of adult disability that can severely compromise patients’ quality of life. Accurately predicting IS functional outcome is crucial for precise risk stratification and effective therapeutic interventions. We developed a predictive model integrating genetic, environmental, and clinical factors using data from 7819 IS patients in the Third China National Stroke Registry. Using an 80:20 split, we randomly divided the dataset into development and internal validation cohorts. Model performance was evaluated in the internal validation cohort, with discrimination measured by the area under the receiver operating characteristic curve (AUC) and calibration assessed by the Brier score and calibration curves. We conducted genome-wide association studies (GWAS) in the development cohort, identifying rs11109607 (ANKS1B) as the most significant variant associated with IS functional outcome. We applied principal component analysis to reduce the dimensionality of the top 100 variants identified by the GWAS and incorporated the resulting components as genetic factors in the predictive model. We used machine learning algorithms capable of capturing nonlinear relationships to establish predictive models for IS functional outcome. The optimal model was the XGBoost model, which outperformed the logistic regression model (AUC 0.818 versus 0.756, P < .05) and significantly improved reclassification efficiency. Our study incorporated genetic, environmental, and clinical factors for predicting IS functional outcome in East Asian populations, thereby offering novel insights into IS functional outcome.

Introduction

Ischemic stroke (IS) is one of the leading causes of disability and death worldwide [1]. The occurrence and progression of this complex disease are influenced by both genetic and environmental factors, including social and natural components [2]. The functional outcome following IS exhibits extensive variability among individuals, ranging from complete recovery to persistent severe disability. Accurate prediction of IS functional outcome is crucial for precise risk stratification and effective therapeutic interventions.

Current predictive models for functional outcome in IS primarily rely on demographic characteristics, comorbidities, biomarkers, and neurological functional scales such as the Acute Stroke Registry and Analysis of Lausanne (ASTRAL) and the ischemic stroke predictive risk score scale [3–5]. Genetic factors are often neglected in these models due to the unavailability or lack of routine collection of genetic information in clinical practice [6], as well as potential additional costs associated with integrating genetic data into predictive models [7]. Consequently, clinicians tend to prioritize well-established risk factors when constructing IS functional outcome models [7]. However, several studies have demonstrated that the incorporation of genetic factors identified through genome-wide association studies (GWAS) can enhance the performance of prediction models [8].

In order to better predict the functional outcome of Chinese patients with IS, it is necessary to clarify the influence of genes on functional outcome. However, previous genetic analyses have predominantly focused on European populations and their findings may not be directly applicable to East Asian populations [9]. Research within East Asian populations has primarily concentrated on exploring the genetic contribution to IS onset rather than its functional outcome [10]. Therefore, it is crucial to conduct a GWAS analysis specifically targeting IS functional outcome within East Asian populations in order to develop a comprehensive predictive model that incorporates genetic factors.

In this study, we first used GWAS to identify the contribution of genetic variants to IS functional outcome in a Chinese population. We then took the most strongly associated variants as genetic susceptibility factors and developed prediction models for IS functional outcome by integrating genetic, environmental, and clinical factors. We assessed the performance of prediction models that incorporated genetic factors alone and of those that integrated genetic, environmental, and clinical factors. Finally, machine learning (ML) techniques were employed to enhance the predictive capability of these models, and their performance was compared with that of existing statistical models.

Materials and methods

Data source

The study population was derived from the Third China National Stroke Registry (CNSR-III) [11], a prospective, nationwide registry that enrolled IS patients in China from August 2015 to March 2018. The CNSR-III involved 201 hospitals covering 31 of the 34 provincial administrative divisions in China. This study is based on the Stroke Omics Atlas (STROMICS) study, which has been described in detail previously [12]. The STROMICS website is available at http://www.stromics.org.cn. Exclusion criteria for our study were: (i) history of stroke, and (ii) missing 6-month modified Rankin Scale (mRS) data. The CNSR-III cohort comprised 15 166 patients, of whom 10 914 in the pre-specified genetic sub-study underwent whole-genome sequencing (WGS) [13]. WGS was performed using 100 bp paired-end reads at high sequencing depth (41.17× on average). Among these, WGS data from 10 241 genetically independent individuals passed the rigorous individual quality control (QC) implemented in the STROMICS study [12]. The patient flow chart is shown in Supplementary Fig. 1. No additional cohorts were available for replication at the time of this study.

The baseline data included age, natural environmental data, family income, neurological severity, smoking history, drinking history, and physical activity. Neurological severity was measured by the admission National Institutes of Health Stroke Scale (NIHSS) score [14]. The natural environmental data included the latitude and longitude of each patient’s birthplace, as well as post-onset residence data on annual average temperature, humidity, carbon monoxide (CO), particulate matter 2.5 (PM2.5), particulate matter 10 (PM10), and nitrogen dioxide (NO2). Air pollutant data were sourced from monitoring data in China (National Real-time Air Quality Monitoring Platform for Cities, https://air.cnemc.cn:18007/), whereas temperature and humidity data were obtained from the Fifth Generation of European Reanalysis. The environmental exposure for each patient was taken as the annual average for the year of stroke onset. An overview of the study design is shown in Fig. 1. All participants provided written informed consent, and local research ethics committees and institutional review boards approved the individual studies (no. KY2015-001-01).

Overview of the study design. The study flowchart includes the development cohort (674 patients with poor outcome and 5581 patients with favorable outcome), an internal validation cohort (153 patients with poor outcome and 1411 patients with favorable outcome), and subsequent downstream analyses.
Figure 1


Outcome

The primary endpoint of this study was the functional outcome at 6 months, measured by the mRS. In accordance with our GWAS analysis plan, we analyzed the mRS both as a binary variable (mRS 0–2 versus 3–6) and as the full ordinal scale [15]. For the prediction models, the binary classification was used: an mRS score ≥ 3 indicated a poor functional outcome, and a score of 0–2 indicated a favorable functional outcome.
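As a minimal illustration of the binary coding described above, the dichotomization of the 6-month mRS can be sketched as follows. The function name `is_poor_outcome` is hypothetical and is not taken from the study's code.

```python
def is_poor_outcome(mrs: int) -> bool:
    """Return True for a poor outcome (mRS 3-6) and False for a favorable one (mRS 0-2)."""
    if not isinstance(mrs, int) or not 0 <= mrs <= 6:
        raise ValueError("mRS must be an integer from 0 to 6")
    return mrs >= 3


# Binary labels for one hypothetical patient at each mRS level, 0 through 6.
labels = [int(is_poor_outcome(score)) for score in range(7)]
```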

Genotyping and quality control for whole-genome sequencing data

We randomly divided the overall population into the development and internal validation cohorts in an 80:20 ratio. The genetic data had already undergone individual-level and variant-level QC before this study [12]. In the development cohort, we performed a further series of variant QC steps (Supplementary Fig. 2), excluding: (i) variants on sex chromosomes (n = 3 796 567); (ii) variants with a missing rate > 20% (n = 5 106 618); (iii) variants failing the Hardy–Weinberg equilibrium test at a threshold of 1.00E-06 (n = 9773); and (iv) variants with a minor allele frequency < 1% (n = 120 233 897). Individual-level QC also included a check for discordant sex information. The genotyping rate was 99.8%. In total, 6 442 355 genetic variants passed QC and were included in this analysis.
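The variant-level filters can be sketched as a single pass over per-variant summary statistics. This is only an illustration: the record fields (`chrom`, `missing_rate`, `hwe_p`, `maf`), the function names, and the toy records are hypothetical. The direction of the Hardy–Weinberg filter (excluding variants whose test P value falls below the threshold) follows common GWAS practice and is an assumption, not something stated in the text.

```python
from typing import Dict, Iterable, List

SEX_CHROMOSOMES = {"X", "Y"}


def passes_variant_qc(variant: Dict, max_missing: float = 0.20,
                      min_hwe_p: float = 1e-6, min_maf: float = 0.01) -> bool:
    """Apply the four variant-level exclusion rules in order."""
    if variant["chrom"] in SEX_CHROMOSOMES:
        return False  # (i) sex-chromosome variants
    if variant["missing_rate"] > max_missing:
        return False  # (ii) missing rate > 20%
    if variant["hwe_p"] < min_hwe_p:
        return False  # (iii) HWE test failure (assumed direction)
    if variant["maf"] < min_maf:
        return False  # (iv) minor allele frequency < 1%
    return True


def filter_variants(variants: Iterable[Dict]) -> List[Dict]:
    """Keep only the variants that pass every filter."""
    return [v for v in variants if passes_variant_qc(v)]


# Toy records with made-up values, one per exclusion rule plus one that passes.
toy_variants = [
    {"id": "v1", "chrom": "12", "missing_rate": 0.01, "hwe_p": 0.40, "maf": 0.25},
    {"id": "v2", "chrom": "X", "missing_rate": 0.01, "hwe_p": 0.40, "maf": 0.25},
    {"id": "v3", "chrom": "3", "missing_rate": 0.35, "hwe_p": 0.40, "maf": 0.25},
    {"id": "v4", "chrom": "8", "missing_rate": 0.00, "hwe_p": 1e-9, "maf": 0.10},
    {"id": "v5", "chrom": "4", "missing_rate": 0.00, "hwe_p": 0.90, "maf": 0.002},
]
kept_ids = [v["id"] for v in filter_variants(toy_variants)]
```

Tools such as PLINK expose equivalent filters (`--geno`, `--hwe`, `--maf`); whether they were used for this particular step is not stated here.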

Genome-wide association studies with ischemic stroke functional outcome

We aimed to explore genetic variants related to the functional outcome. The GWAS of 6-month functional outcome (mRS 0–2 versus 3–6) after IS was performed in the development cohort of 6255 patients using PLINK (v1.90b6.21). For the ordinal-scale mRS, we used OrdinalGWAS, which is implemented as a Julia package [15]. Age, sex, and three principal components from principal component analysis (PCA) were included as covariates in the GWAS. Variants with P < 5.00E-08 were considered significantly associated with functional outcome, while variants with P < 1.00E-05 were considered suggestive (Fig. 2 and Supplementary Fig. 3). The odds ratio (OR) was calculated from the β coefficient of the logistic regression (LR) as $\mathrm{OR} = e^{\beta}$. The genomic inflation factor (λ) showed no sign of population stratification. We took the top 100 variants, coded them under an additive genetic model (0, 1, 2), and used them for subsequent construction of the prediction models. An r² threshold of 0.2 was set for clumping the top 100 loci from the GWAS. The genetic variants retained after clumping are presented in Supplementary Table 1.
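A small numerical sketch of two quantities used in this section, relying only on the Python standard library. The first function converts a logistic regression coefficient into an odds ratio. The second estimates the genomic inflation factor as the median 1-degree-of-freedom chi-square statistic divided by its expected median under the null (about 0.4549); this is the usual definition, but the text does not spell out how λ was computed, so treat it as an assumption. Function names are illustrative.

```python
import math
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.4549).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def odds_ratio(beta: float) -> float:
    """OR = exp(beta) for a logistic regression coefficient."""
    return math.exp(beta)


def genomic_inflation(p_values) -> float:
    """Lambda = median observed chi-square (1 df) / expected median under the null."""
    norm = NormalDist()
    # A two-sided P value p corresponds to z = inv_cdf(p / 2); chi-square = z ** 2.
    chi2 = [norm.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Beta of 0.452 is the value reported for rs11109607 in the Discussion.
or_rs11109607 = odds_ratio(0.452)

# Evenly spaced P values mimic a null GWAS, so lambda should be close to 1.
null_p = [(i + 0.5) / 1000 for i in range(1000)]
lambda_null = genomic_inflation(null_p)
```

With β = 0.452 the function returns about 1.57, in line with the OR of 1.573 reported for rs11109607.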

The Manhattan plot of analysis for associations with functional outcome at 6 months. The outcome was measured as mRS at 6 months after IS onset. The dotted lines show genome-wide significance (P < 5.00E-08) and suggestive association level (P < 1.00E-05). Results were adjusted for age, sex, and principal components.
Figure 2


Annotation of genetic variants

To identify variants with protein-altering effects and annotate the function of the top 100 genetic variants, we used ANNOVAR [16] and SNPnexus [17] for comprehensive variant annotation (Supplementary Tables 2 and 3). To investigate whether IS outcome risk variants influence the cis-regulation of nearby genes, we examined expression quantitative trait loci (eQTLs) using Genotype-Tissue Expression (GTEx) V6 [18]. Additionally, we plotted the regional association with IS functional outcome within a ±400 kb window (Fig. 3 and Supplementary Fig. 4).

Regional association plot for functional outcome at 6 months after IS. The LocusZoom plot displays the GWAS associations around rs11109607, the most significant variant.
Figure 3


Feature selection of natural environmental factors

To address data complexity, we applied feature selection to reduce the dimensionality of environmental factors. We measured eight variables (latitude, longitude, temperature, humidity, CO, PM2.5, PM10, and NO2) to predict the risk of poor functional outcome in IS. We employed ML methods (SelectFromModel-XGBoost and SFS-XGB) to perform feature selection, utilizing a step size of 1. The evaluation metric “scoring” was set as “roc_auc”, with the area under the receiver operating characteristic curve (AUC) utilized for assessing the significance of each variable in predicting poor functional outcome in IS (Supplementary Fig. 5). To ensure robust model performance, we implemented a 10-fold cross-validation approach (“cv” set to 10). The objective of this approach was to identify the minimum number of predictors required while maximizing the AUC.
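The selection logic can be sketched independently of any particular library. The following is a generic greedy forward selection with a step size of 1 that keeps the smallest feature subset reaching the best score along the greedy path. Here `score_fn` stands in for a cross-validated AUC (for example from an XGBoost classifier) and is a hypothetical placeholder; the feature names and the toy scoring table are made up for the demonstration and are not study results.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def forward_select(features: Sequence[str],
                   score_fn: Callable[[FrozenSet[str]], float]) -> Tuple[List[str], float]:
    """Greedy forward selection, adding one feature per step.

    Returns the smallest subset along the greedy path that reaches the
    highest score, together with that score.
    """
    selected: List[str] = []
    remaining = list(features)
    best_subset: List[str] = []
    best_score = float("-inf")
    while remaining:
        # Try each remaining feature and keep the one that scores best.
        scored = [(score_fn(frozenset(selected + [f])), f) for f in remaining]
        step_score, step_feature = max(scored, key=lambda pair: pair[0])
        selected.append(step_feature)
        remaining.remove(step_feature)
        # Strict ">" means ties favour the earlier (smaller) subset.
        if step_score > best_score:
            best_score = step_score
            best_subset = list(selected)
    return best_subset, best_score


# Toy scorer: made-up contributions in hundredths of an AUC point.
TOY_GAIN = {"f1": 5, "f2": 2, "f3": 0, "f4": -1}


def toy_score(subset: FrozenSet[str]) -> float:
    return (50 + sum(TOY_GAIN[f] for f in subset)) / 100


chosen, chosen_score = forward_select(["f1", "f2", "f3", "f4"], toy_score)
```

In the toy run, adding `f3` ties the score and adding `f4` lowers it, so the two-feature subset is returned.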

Extreme gradient boosting algorithm

Extreme Gradient Boosting (XGBoost), developed at the University of Washington and published in 2016, is a boosting library that provides both linear model solvers and tree learning algorithms. XGBoost uses a second-order Taylor expansion of the loss function and adds a regularization term to the objective function. This approach seeks the optimal solution by balancing the reduction of the loss against the complexity of the model, which helps prevent overfitting and improves the efficiency of model fitting [19].
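For reference, the second-order approximation and the regularization term described above take the following standard form in the XGBoost formulation. Here $l$ is the loss, $f_t$ the tree added at iteration $t$, $g_i$ and $h_i$ the first and second derivatives of the loss with respect to the previous prediction $\hat{y}_i^{(t-1)}$, $T$ the number of leaves, $w_j$ the weight of leaf $j$, $I_j$ the set of samples assigned to leaf $j$, and $\gamma$ and $\lambda$ regularization hyperparameters (this $\lambda$ is unrelated to the genomic inflation factor used in the GWAS).

```latex
\begin{aligned}
\mathcal{L}^{(t)} &\simeq \sum_{i=1}^{n} \left[ l\!\left(y_i, \hat{y}_i^{(t-1)}\right)
  + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^{2}(x_i) \right] + \Omega(f_t), \\
\Omega(f) &= \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^{2}, \\
w_j^{*} &= -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.
\end{aligned}
```

The last line gives the leaf weight that minimizes the approximated objective for a fixed tree structure, which shows how a larger $\lambda$ shrinks leaf weights and a larger $\gamma$ penalizes additional leaves.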

Statistical and machine learning analysis

Comparisons between groups were conducted using the t-test or Mann–Whitney U-test for continuous variables and the chi-square test or Fisher exact test for categorical variables. Continuous variables were expressed as the mean ± standard deviation or median (interquartile range [IQR]), and categorical variables were expressed as percentages. We used ML algorithms [XGBoost [19], Light Gradient Boosting Machine (LightGBM) [20], Categorical Boosting (CatBoost) [21], and Multilayer Perceptron (MLP) [22]] and LR to predict poor functional outcome at 6 months. Hyperparameters of the XGBoost, LightGBM, CatBoost, and MLP algorithms were tuned with GridSearchCV using 10-fold cv to maximize the AUC in the development cohort.

To reduce the complexity of gene data, we employed PCA before constructing the prediction model. In this study, PCA was utilized to decrease the dimensionality of 100 genetic factors. This approach enhances model generalizability, interpretability, and computational efficiency by addressing the curse of dimensionality and revealing underlying data patterns. During PCA, it was observed that the sixth principal component exhibited overlap with the preceding components. As the first five principal components sufficiently captured information from all 100 genetic variants, they were selected as genetic factors for the prediction model and labeled as PC1–5.

Three primary prediction models were denoted as Model 1, Model 2, and Model 3. Model 1 included genetic information only (top 100 variants as PC1–5). Model 2 added age. Model 3 further added environmental and clinical factors: PM2.5, PM10, CO, temperature, income, smoking history, drinking history, physical activity, and NIHSS score. To measure the bias and variance of each model, we conducted a 10-fold cv in the internal validation cohort to evaluate model performance. Improvement in reclassification between prediction models was assessed using net reclassification improvement (NRI) and integrated discrimination improvement (IDI) [23, 24]. We visualized the importance ranking of each feature using the “feature_importances_” attribute. In line with the PROBAST guidelines, predictive models were compared in terms of both discrimination and calibration [25]. Discrimination was measured using the AUC, where a higher value indicates better discriminatory capability. Calibration was assessed through the Brier score and calibration curve plots, where a lower Brier score indicates better calibration. Data analysis and visualization were performed using R (v4.2.2) and Python (v3.9.7). A two-sided P < .05 was considered statistically significant.
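To make the two headline metrics concrete, the sketch below computes them from scratch with the Python standard library. The AUC is computed as the proportion of positive-negative pairs in which the positive case receives the higher predicted score (ties count as one half), and the Brier score as the mean squared difference between predicted probability and observed outcome. The function names and the four-patient toy data are illustrative only; the pairwise AUC loop is quadratic and intended for clarity rather than speed.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data: 1 = poor outcome, 0 = favorable outcome (made-up predictions).
y = [0, 0, 1, 1]
prob = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(y, prob)
toy_brier = brier_score(y, prob)
```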

Results

Baseline description and genome-wide association studies analysis

An overview of the study design was shown in Fig. 1. The average age of the included 7819 IS patients was 61.6 ± 11.4 years, and 2468 (31.6%) were women. Among them, 827 (10.6%) patients experienced poor functional outcome (mRS score of ≥3) within 6 months, with an average age of 67.1 ± 11.7 years (Supplementary Table 4). We randomly assigned 7819 patients into development and internal validation cohorts using an 80:20 ratio. Baseline characteristics of participants in both cohorts were presented in Table 1.

Table 1

Baseline characteristics in development and internal validation cohorts

| Characteristic | Total (n = 7819) | Development cohort (n = 6255) | Internal validation cohort (n = 1564) |
|---|---|---|---|
| Demographics | | | |
| Age, mean ± SD | 61.6 ± 11.4 | 61.5 ± 11.4 | 61.8 ± 11.4 |
| Sex, female, n (%) | 2468 (31.6) | 1953 (31.2) | 515 (32.9) |
| Family income/monthly, n (%) | | | |
| <700 Yuan | 401 (5.1) | 322 (5.1) | 79 (5.1) |
| 700–1500 Yuan | 1058 (13.5) | 855 (13.7) | 203 (13.0) |
| 1501–2300 Yuan | 1599 (20.5) | 1285 (20.5) | 314 (20.1) |
| >2300 Yuan | 2801 (35.8) | 2234 (35.7) | 567 (36.3) |
| Unknown | 1960 (25.1) | 1559 (24.9) | 401 (25.6) |
| Smoking history, n (%) | 2613 (33.4) | 2107 (33.7) | 506 (32.4) |
| Drinking history (≥20 g/day), n (%) | 1231 (15.7) | 988 (15.8) | 243 (15.5) |
| Physical activity, n (%) | 4532 (58.0) | 3616 (57.8) | 916 (58.6) |
| Natural environment | | | |
| Longitude, median (IQR) | 116.1 (112.9–120.1) | 116.1 (112.9–120.1) | 115.8 (112.9–120.2) |
| Latitude, median (IQR) | 36.2 (31.2–39.1) | 36.3 (31.3–39.1) | 35.9 (31.2–38.9) |
| Air pollutant, mean ± SD | | | |
| Temperature (°C) | 13.7 ± 4.4 | 13.7 ± 4.4 | 13.9 ± 4.3 |
| Humidity (%) | 65.0 ± 9.8 | 65.0 ± 9.8 | 65.4 ± 9.9 |
| CO (mg/m³) | 1.2 ± 0.4 | 1.2 ± 0.4 | 1.1 ± 0.4 |
| NO2 (µg/m³) | 38.5 ± 11.2 | 38.6 ± 11.2 | 38.2 ± 11.1 |
| PM10 (µg/m³) | 98.7 ± 31.1 | 99.0 ± 31.2 | 97.5 ± 30.5 |
| PM2.5 (µg/m³) | 55.9 ± 17.7 | 56.1 ± 17.8 | 55.2 ± 17.5 |
| Neurological severity | | | |
| Admission NIHSS score, median (IQR) | 3.0 (1.0–6.0) | 3.0 (1.0–6.0) | 3.0 (1.0–6.0) |

In the development cohort of 6255 IS patients, a GWAS was performed to identify genetic variants associated with the functional outcome of IS, analyzing a total of 6 442 355 variants. Among these patients, 674 had a poor functional outcome (mRS ≥ 3) and 5581 had a favorable functional outcome (mRS < 3). Although no variant reached genome-wide significance for IS 6-month functional outcome (P < 5.00E-08), 145 loci reached the suggestive level (P < 1.00E-05). We therefore selected the top 100 loci (P < 6.43E-06) as candidate genetic factors for our predictive models. These 100 loci were located in 11 genes, namely ANKS1B, LIMCH1, PDZRN4, CFAP77, CHMP4B, KAZN, MARK3, PROM1, TMEFF2, TTLL11, and ZNF536 (Fig. 2 and Supplementary Tables 2 and 3). As shown in Supplementary Fig. 6, these genes are extensively expressed in the brain [18]. The Manhattan plot showed that rs11109607 (ANKS1B, P = 1.48E-07) was the most significant locus. After clumping (Supplementary Table 1), two independent loci in the ANKS1B gene were retained (rs11109607 and rs141876128). Supplementary Fig. 7 illustrates the protein–protein interaction network of ANKS1B, showing its interaction with APP, the gene encoding amyloid precursor protein, which is implicated in cell binding and neuron development [26].

Additionally, through OrdinalGWAS analysis, we found rs520046 (SERINC1, P = 1.00E-07) as the most significant locus (Supplementary Fig. 8 and Supplementary Table 5). SERINC1 is a protein coding gene involved in key pathways such as the Sphingolipid pathway and Metabolism. Gene ontology annotations associated with this gene include L-serine transmembrane transporter activity [27–29]. Furthermore, expression of the SERINC1 gene has been shown in brain tissues based on the GTEx database.

Model construction and evaluation

We employed dimensionality reduction as a foundational step in developing the prediction models. As shown in Supplementary Fig. 9, we used PCA to reduce the dimensionality of the genetic data, and the first five principal components were chosen as the genetic factors for constructing the prediction model. PC1 corresponded to the PDZRN4 gene (lead variant rs12314242, P = 1.69E-06), PC2 to ANKS1B, PC3 to a linkage disequilibrium (LD) block on chromosome 8, PC4 to an LD block on chromosome 3, and PC5 to the LIMCH1 gene (lead variant rs145664751, P = 2.36E-07).

We employed SFS-XGB to perform feature selection on environmental factors, setting the step size to 1. The smallest number of features with the highest AUC was selected through iteration. As illustrated in Supplementary Fig. 5, PM2.5, PM10, CO, and temperature were identified as the optimal combination of environmental factors.

To evaluate prediction models based solely on genetic factors and those incorporating a combination of genetic, environmental, and clinical factors, we established Models 1–3. In Model 1, which incorporated only the genetic factors PC1–5, the XGBoost algorithm achieved an AUC of 0.542, while LR achieved 0.519. In Model 2, adding age improved the predictive performance of both the LR (an increase of 0.113) and XGBoost (an increase of 0.09) models. The best performance was observed in Model 3, which included genetic, environmental, and clinical factors together. In this comprehensive model, XGBoost significantly outperformed LR, with an AUC of 0.756 (95% CI: 0.648, 0.835) for LR and 0.818 (95% CI: 0.753, 0.893) for XGBoost (P < .05; Supplementary Table 6).

After confirming that Model 3 (XGBoost) was the optimal model for predicting the 6-month IS functional outcome, we also evaluated several other ML algorithms, including MLP, LightGBM, and CatBoost. In addition to discrimination and calibration, we evaluated accuracy, positive predictive value (PPV), and negative predictive value (NPV). As shown in Table 2, all ML models achieved a higher AUC than the LR model, but only the difference for XGBoost was statistically significant (AUC 0.818 versus 0.756; P = .030; Fig. 4 and Supplementary Fig. 10a). The calibration curve plots indicated an excellent fit for each prediction model we developed (Supplementary Fig. 10b). Although favorable outcomes outnumbered poor outcomes by roughly 8:1, this class imbalance did not significantly affect the predictive ability of LR and XGBoost (data not shown).
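Because accuracy, PPV, and NPV all depend on the classification threshold and on class prevalence, it can help to see them computed from a confusion matrix. The sketch below defines the three metrics and, as a reference point, evaluates a trivial classifier that labels every patient in the internal validation cohort (153 poor and 1411 favorable outcomes) as favorable. The function name and the trivial classifier are illustrative only and are not part of the study's analysis.

```python
from typing import Optional, Tuple


def confusion_metrics(tp: int, fp: int, tn: int, fn: int
                      ) -> Tuple[float, Optional[float], Optional[float]]:
    """Return (accuracy, PPV, NPV); PPV or NPV is None when undefined."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    ppv = tp / (tp + fp) if (tp + fp) else None  # no positive predictions
    npv = tn / (tn + fn) if (tn + fn) else None  # no negative predictions
    return accuracy, ppv, npv


# Trivial "always favorable" classifier on the internal validation cohort:
# all 1411 favorable outcomes are true negatives, all 153 poor outcomes are missed.
baseline_acc, baseline_ppv, baseline_npv = confusion_metrics(tp=0, fp=0, tn=1411, fn=153)
```

With about 90% of patients having a favorable outcome, this baseline already reaches an accuracy of roughly 0.902, so the accuracies in Table 2 are best read alongside the AUC, PPV, and Brier score.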

Table 2

Results of internal validation cohort for predicting 6-month functional outcome in IS

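| Model | AUC (95% CI) | Accuracy | PPV | NPV | Brier score | P (versus LR) |
|---|---|---|---|---|---|---|
| LR | 0.756 (0.648, 0.835) | 0.902 | 0.487 | 0.912 | 0.079 | – |
| MLP | 0.793 (0.700, 0.866) | 0.901 | 0.481 | 0.915 | 0.076 | 0.293 |
| LightGBM | 0.797 (0.668, 0.899) | 0.910 | 0.636 | 0.918 | 0.071 | 0.192 |
| CatBoost | 0.803 (0.700, 0.897) | 0.907 | 0.556 | 0.921 | 0.073 | 0.105 |
| XGBoost | 0.818 (0.753, 0.893) | 0.909 | 0.619 | 0.917 | 0.072 | 0.030 |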
Receiver operating characteristic (ROC) and calibration curves of LR and XGBoost for predicting IS outcome at 6 months on internal validation cohort. (a) ROC curves. (b) Calibration curves.
Figure 4


Furthermore, we used the 18 independent loci retained after clumping as genetic factors for constructing prediction models (Model 1, Model 2, and Model 3). With the best-performing XGBoost algorithm, the prediction models incorporating PC1–5 outperformed the clumping-based models (Supplementary Table 7). We also performed Gene Set Enrichment Analysis for the loci that reached the suggestive level of P < 1.00E-05 (Supplementary Table 8), and built prediction models using the top 10, top 50, top 100, top 500, and top 1000 loci. The model using the top 100 loci had the best performance and interpretability (Supplementary Table 9).

Reclassification performance and feature importance analysis

To comprehensively assess the enhanced predictive performance of the optimal model (XGBoost) in comparison to the LR model, we conducted an evaluation based on NRI and IDI. Results showed that the XGBoost model outperformed the LR model with NRI and IDI values of 0.77% and 1.53%, respectively.
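The two reclassification measures can be sketched as follows. The text does not state which NRI variant was used, so the sketch assumes the category-free (continuous) NRI, in which any increase in predicted risk for a patient with a poor outcome, or any decrease for a patient with a favorable outcome, counts as a correct reclassification. The IDI is the change in mean predicted risk among events minus the change among non-events. Function names and the toy predictions are illustrative.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (old model probability, new model probability)


def _net_upward(pairs: List[Pair]) -> float:
    """Proportion moved up minus proportion moved down."""
    up = sum(1 for old, new in pairs if new > old)
    down = sum(1 for old, new in pairs if new < old)
    return (up - down) / len(pairs)


def _mean_gain(pairs: List[Pair]) -> float:
    """Mean change in predicted probability from the old to the new model."""
    return sum(new - old for old, new in pairs) / len(pairs)


def nri_idi(y_true: Sequence[int], p_old: Sequence[float],
            p_new: Sequence[float]) -> Tuple[float, float]:
    """Category-free NRI and IDI of a new model relative to an old one."""
    events = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 1]
    nonevents = [(o, n) for y, o, n in zip(y_true, p_old, p_new) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome classes must be present")
    nri = _net_upward(events) - _net_upward(nonevents)
    idi = _mean_gain(events) - _mean_gain(nonevents)
    return nri, idi


# Toy example: two events followed by two non-events (made-up probabilities).
y = [1, 1, 0, 0]
old_model = [0.5, 0.4, 0.5, 0.3]
new_model = [0.7, 0.3, 0.4, 0.3]
toy_nri, toy_idi = nri_idi(y, old_model, new_model)
```

In the toy data the events contribute a net change of zero and one of the two non-events moves down, giving an NRI of 0.5 and an IDI of 0.10.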

The feature importance of the XGBoost model is illustrated in Fig. 5, with admission NIHSS score, age, and PC2 emerging as the three most important features. Notably, PC2, which represents ANKS1B, was the most important genetic factor in the best model.

Feature importance plot of the XGBoost model.
Figure 5


Discussion

This study employed ML algorithms capable of handling nonlinear relationships to establish a predictive model for the 6-month functional outcome in patients with IS, integrating genetic, environmental, and clinical factors. We observed that incorporating environmental and clinical factors significantly enhanced the predictive performance compared to models solely focusing on genetic factors. The XGBoost model outperformed the LR model (AUC 0.818 versus 0.756, P < .05) and substantially improved reclassification efficiency. Age, PM2.5, PM10, CO, temperature, smoking, drinking, physical activity, family income, NIHSS score, and genetic factors were predictors for functional outcome in IS patients at 6 months. Moreover, this study represents an innovative effort by utilizing the GWAS to identify a potential association between the ANKS1B gene (rs11109607, C > T, beta = 0.452, OR = 1.573, P = 1.48E-07) and IS 6-month functional outcome within the Chinese IS population.

We designed an integrated predictive model for IS functional outcome by incorporating genetic, environmental, and clinical factors, inspired by the disease triangle theory in public health. According to this theory, health issues arise from the interaction of three primary factors: host factors, environmental influences, and clinical features (including neurological function). In the context of IS, host factors refer to genetic variants; environmental factors include lifestyle choices, socioeconomic status, and exposure to different environments; neurological function is commonly assessed using the NIHSS score. Therefore, our research distinguishes itself by emphasizing that the functional outcome of IS patients is determined not only by singular factors but also by the interplay among these three elements. Moreover, we have selected easily accessible environmental and clinical variables for future external validation purposes. For example, publicly available data repositories can provide information on environmental variables while clinical physicians can directly assess the NIHSS score.

Notably, the models based solely on genetic factors did not exhibit high predictive performance (Model 1); the incorporation of age led to some improvement (Model 2). Further inclusion of environmental factors, NIHSS score, and other variables resulted in significantly improved predictive performance (Model 3). While genetic factors contribute to complex diseases to some extent, their influence is limited because complex diseases do not adhere strictly to Mendelian patterns of inheritance [30, 31]. Although individuals may inherit variants associated with these diseases, this represents only part of their risk of a poor outcome. Genetic predisposition implies susceptibility but does not guarantee a poor outcome, as the actual functional outcome largely depends on a patient’s environment, lifestyle, and clinical status [30, 31]. This suggests that comprehensive prediction models incorporating genetic, environmental, and clinical factors are necessary for complex diseases such as IS, distinguishing them from single-gene diseases and supporting our initial research design.

In this study, we used a GWAS to identify genetic factors associated with the 6-month functional outcome of IS for use in prediction models. Previous GWAS in the field of IS have mainly focused on disease onset rather than functional outcome. Apart from the Genetics of Ischemic Stroke Functional Outcome study, which involved individuals of European ancestry, GWAS of IS functional outcome have been lacking, particularly in East Asian populations [9]. Within the development cohort, we conducted a GWAS involving 6255 patients and identified the top 100 variants that potentially influence the functional outcome of IS. However, the individual effects of these genetic variants were generally small [32], so predicting poor outcomes at 6 months in IS patients required integrating contributions from multiple loci. Therefore, we performed PCA on the top 100 loci and used five principal components to construct our prediction model. Gene annotation revealed that the top 100 loci obtained from the GWAS were associated with eleven brain-expressed genes [18]. The most significant locus genome-wide (rs11109607) is located within the ANKS1B gene.
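The dimensionality-reduction step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the genotypes are randomly simulated allele dosages, the cohort sizes and variable names are hypothetical, and scikit-learn's `PCA` is assumed as the implementation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Simulated allele dosages (0, 1, or 2 copies of an allele) for 100 loci.
# These are random placeholders, not data from the study.
n_dev, n_val, n_loci = 800, 200, 100
maf = rng.uniform(0.05, 0.5, size=n_loci)  # hypothetical allele frequencies
dev_genotypes = rng.binomial(2, maf, size=(n_dev, n_loci)).astype(float)
val_genotypes = rng.binomial(2, maf, size=(n_val, n_loci)).astype(float)

# Fit PCA on the development cohort only, then project the validation cohort
# with the same loadings so that no validation information enters the fit.
pca = PCA(n_components=5)
dev_pcs = pca.fit_transform(dev_genotypes)
val_pcs = pca.transform(val_genotypes)

print(dev_pcs.shape, val_pcs.shape)
print(pca.explained_variance_ratio_.round(3))
```

The five columns of `dev_pcs` would then serve as the genetic features of a downstream model. Fitting on the development cohort and only transforming the validation cohort is a design choice assumed here to keep the two cohorts separate.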

The ANKS1B gene, located on human chromosome 12q23.1, is widely expressed throughout the brain [33], with significantly higher expression levels than in other tissues. It ranks second in terms of brain-specific expression among all human genes [34, 35]. Moreover, ANKS1B transcripts are found in developing Purkinje cells of the cerebellum, suggesting potential associations with motor and language disorders [36]. Disruptions in ANKS1B expression may impact the functionality of brain regions crucial for motor coordination and social communication [37]. The protein AIDA-1, encoded by the ANKS1B gene, is a crucial regulatory factor distributed throughout various regions of the brain, particularly the hippocampus and cerebellum [38]. It interacts with N-methyl-D-aspartate (NMDA) receptors (NMDARs) and modulates synaptic NMDAR subunit composition, which is essential for synaptic plasticity [39, 40]. Dysregulation of NMDARs is linked to various neuropsychiatric and neurodegenerative disorders. AIDA-1 promotes the transport of GluN2B subunits, which are essential for NMDAR plasticity, thereby affecting hippocampal plasticity [40, 41]. ANKS1B knockout mice exhibit altered behavior and NMDAR function, potentially relevant to the cognitive and motor disorders seen in IS patients [42–44]. We posit that loss of function of the ANKS1B gene (rs11109607, C > T, OR = 1.573) may affect NMDA receptors by influencing the encoding of the AIDA-1 protein. This could affect neural function, ultimately leading to poor functional outcome in IS patients, manifested as limb impairment or disorders of consciousness.

Having identified a genetic predictor relevant to IS functional outcome, we next identified natural environmental predictors associated with the functional outcome of IS at 6 months. Feature selection was employed to reduce the number of natural environmental predictors, improving model performance and reducing computational cost. We used the SelectFromModel method with an embedded, strongly regularized model, and iteratively searched for the combination of natural environmental predictors that yielded the highest AUC with the fewest variables. This procedure identified PM2.5, PM10, CO, and temperature as predictors. Previous studies have reported associations of air pollutants such as PM2.5, PM10, and CO with the risk of stroke onset and with the functional outcome of IS patients [45–47]. Additionally, the incidence of IS and the risk of poor outcomes have been reported to be elevated under extreme temperatures [45–47].
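For readers unfamiliar with embedded feature selection, the following minimal sketch shows how scikit-learn's `SelectFromModel` can wrap a strongly regularized model. Everything specific here is an assumption for illustration: the data are simulated, the `noise_*` columns are hypothetical, and the L1-penalized logistic regression with `C=0.02` is one possible choice of regularized estimator, not necessarily the one used in the study. The iterative AUC-based search over variable combinations is not reproduced.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Simulated environmental variables; the last four are pure noise (hypothetical).
names = ["PM2.5", "PM10", "CO", "temperature",
         "noise_1", "noise_2", "noise_3", "noise_4"]
X = rng.normal(size=(n, len(names)))

# Simulated binary outcome that depends only on the first four variables.
logit = 1.0 * X[:, 0] + 0.8 * X[:, 1] + 0.8 * X[:, 2] - 0.8 * X[:, 3]
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

# Standardize so the L1 penalty treats all variables on the same scale.
X_std = StandardScaler().fit_transform(X)

# A small C means strong regularization; weak predictors get zero coefficients.
estimator = LogisticRegression(penalty="l1", C=0.02, solver="liblinear")
selector = SelectFromModel(estimator, threshold=1e-5)
selector.fit(X_std, y)

selected = [name for name, keep in zip(names, selector.get_support()) if keep]
print(selected)
```

Variables whose absolute coefficient stays above the threshold are retained; in a real analysis the regularization strength would itself be tuned, for example against validation AUC.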

The impact of environmental factors on the functional outcome of patients with IS extends beyond natural elements. Previous research consistently identifies smoking and drinking as risk factors for IS, while regular physical activity is recognized as protective [2]. Furthermore, socioeconomic disparities contribute to diverse outcomes among patients with IS [48, 49]. Therefore, we included adverse lifestyle habits (smoking and drinking), physical activity, and economic factors (family income) as predictors.

The NIHSS score, which is widely applied in clinical settings and easily obtainable, plays a pivotal role in assessing the neurological function of IS patients [14]. In clinical practice, it is derived from clinicians’ assessments of a patient’s motor function, sensation, speech, eye movement, and other aspects, and thus represents the patient’s neurological status.

XGBoost has repeatedly demonstrated strong performance on tabular data, and its built-in regularization helps limit overfitting. In this study, we explored multiple algorithms, including MLP and other ML algorithms, but ultimately found that the XGBoost model yielded the best results. The XGBoost model also improved on the LR model, as evidenced by the NRI and IDI values, indicating a significant enhancement in predictive performance that was particularly notable in the distribution of predicted probabilities. This suggests that our trained XGBoost model was more stable and better suited to disease prediction. Our findings suggest that XGBoost could be used by biologists and clinicians to identify disease-associated variants and to develop predictive models, as it can help uncover intricate patterns within biological data that might be challenging to detect using traditional statistical methods alone.
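To make the two reclassification measures concrete, the sketch below computes the IDI and a category-free (continuous) NRI from the predicted probabilities of a reference model and a new model. The function names and the six-patient toy data are hypothetical, and the category-free form of the NRI is an assumption; a categorical NRI with risk thresholds would be computed differently.

```python
import numpy as np

def idi(y, p_old, p_new):
    """Integrated discrimination improvement of p_new over p_old."""
    y = np.asarray(y, dtype=bool)
    p_old, p_new = np.asarray(p_old), np.asarray(p_new)
    # Mean predicted risk should rise among events and fall among non-events.
    gain_events = p_new[y].mean() - p_old[y].mean()
    gain_nonevents = p_new[~y].mean() - p_old[~y].mean()
    return gain_events - gain_nonevents

def continuous_nri(y, p_old, p_new):
    """Category-free net reclassification improvement."""
    y = np.asarray(y, dtype=bool)
    diff = np.asarray(p_new) - np.asarray(p_old)
    # Events: proportion moved up minus proportion moved down.
    nri_events = np.mean(diff[y] > 0) - np.mean(diff[y] < 0)
    # Non-events: proportion moved down minus proportion moved up.
    nri_nonevents = np.mean(diff[~y] < 0) - np.mean(diff[~y] > 0)
    return nri_events + nri_nonevents

# Toy data: 1 = poor outcome, 0 = good outcome.
y = [1, 1, 1, 0, 0, 0]
p_lr = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]    # reference model probabilities
p_xgb = [0.8, 0.7, 0.3, 0.3, 0.2, 0.4]   # new model probabilities

idi_value = idi(y, p_lr, p_xgb)
nri_value = continuous_nri(y, p_lr, p_xgb)
print(round(idi_value, 3), round(nri_value, 3))  # 0.2 0.667
```

In the toy data, the new model raises the mean predicted risk of events from 0.5 to 0.6 and lowers that of non-events from 0.4 to 0.3, giving an IDI of 0.2; two of three patients in each group move in the correct direction and one in the wrong direction, giving an NRI of 1/3 + 1/3.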

An intriguing observation arose when we used PCA to select the top five principal components: the components differentiated between genes and reflected LD blocks. Even without gene annotation, this unsupervised technique separated two genes on the same chromosome (chromosome 12), ANKS1B and PDZRN4, into two distinct principal components. This approach might offer insights for future unsupervised identification of essential genes using ML algorithms. The performance of the XGBoost model was better with PC1–5 than with the 18 independent loci retained after clumping. This improvement may be attributed to the principal components summarizing the correlated signal across all 100 loci, rather than representing only the contribution of the 18 independent loci to the prediction model.
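Why PCA tends to align components with LD blocks can be illustrated with simulated data. The sketch below is an assumption-laden toy, not a reanalysis: it uses continuous standardized values rather than real genotypes, two hypothetical blocks of 12 and 6 mutually correlated loci (pairwise correlation 0.8), and 10 unlinked loci, and it performs PCA through a singular value decomposition in NumPy.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000  # simulated individuals

def ld_block(n_loci, r):
    """Simulate n_loci standardized variables with pairwise correlation r."""
    shared = rng.normal(size=(n, 1))
    noise = rng.normal(size=(n, n_loci))
    return np.sqrt(r) * shared + np.sqrt(1 - r) * noise

block_a = ld_block(12, 0.8)           # hypothetical LD block (columns 0-11)
block_b = ld_block(6, 0.8)            # second hypothetical LD block (columns 12-17)
unlinked = rng.normal(size=(n, 10))   # loci in no LD with either block
X = np.hstack([block_a, block_b, unlinked])

# PCA via SVD of the column-centered matrix; rows of vt are unit-norm loadings.
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)

# Share of each component's squared loadings that falls inside each block.
pc1_in_a = np.sum(vt[0, :12] ** 2)
pc2_in_b = np.sum(vt[1, 12:18] ** 2)
print(round(pc1_in_a, 3), round(pc2_in_b, 3))
```

Because loci within a block vary together, each block contributes one dominant direction of variance, so the first component loads almost entirely on the larger block and the second on the smaller one. Inspecting loadings in this way is one plausible route to the gene-level separation described above.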

The present study has several limitations. First, the significant loci identified through the GWAS were based on Chinese IS patients, potentially limiting the applicability of outcome predictions to non-Asian populations. Second, because cohorts containing genetic, environmental, and clinical data are currently limited to the CNSR-III, the results relied solely on internal validation and lacked external validation. Subsequent research should therefore undertake external validation in independent prospective cohorts. Third, this study focused on common variants related to IS outcomes and excluded rare variants from the analysis; future research could explore rare variants further. Finally, although the selected predictive factors effectively predicted functional outcome at 6 months in IS patients, further research is needed to establish whether these factors are causally related to functional outcome.

In conclusion, this study established a 6-month functional outcome prediction model for IS patients by integrating genetic, environmental, and clinical factors through ML algorithms. Our results demonstrated that ML models outperformed the LR model in predictive power, with the XGBoost model exhibiting the highest performance. Moreover, our analysis indicated that the ANKS1B gene is a potentially crucial genetic factor in IS functional outcome.

Key Points
  • A comprehensive predictive model incorporating genetic, environmental, and clinical factors for assessing functional outcomes in patients with IS.

  • Revealing novel important loci associated with the 6-month functional outcome of IS in the Chinese population through the GWAS.

  • The model is constructed using ML and dimensionality reduction algorithms.

  • Compared to other models, XGBoost showed significant improvements in both discrimination and reclassification.

  • A useful attempt for biologists and scientists who would like to perform research on IS functional outcomes in East Asian populations.

Acknowledgements

We would like to express our gratitude to the Changping Laboratory for their support. We also extend our thanks to all the participating hospitals, doctors, and nurses, as well as the members of the CNSR-III Steering Committee, particularly Dr Yongjun Wang and Dr Yong Jiang, for their valuable support and assistance. We are also grateful to the editors and reviewers for their thorough review and constructive comments. Our thanks also go to the Tian Qi High Performance Computing Center and the Biomedical High Performance Computing Platform for their support. Finally, we sincerely thank the dedicated staff for their significant contributions during the initial phases of the STROMICS study.

Author contributions

S.D.C. performed the data analysis and wrote the manuscript; Y.J. and Y.J.W. reviewed the manuscript and provided insightful suggestions; Z.X., H.Q.G., Y.F.S., and J.F.Y. contributed to the methodology; and C.G., X.M., H.L., and X.Y.H. supervised the study. All authors approved the final manuscript.

Conflict of interest: None declared.

Funding

This work was supported by grants from the Chinese Academy of Medical Sciences Innovation Fund for Medical Sciences (2019-I2M-5-029), the National Natural Science Foundation of China (81870905, U20A20358), and the Beijing Hospitals Authority Clinical Medicine Development of Special Funding Support (ZLRK202312).

Data availability

The datasets used in the current study are available from the corresponding authors upon request and can also be accessed via a web portal at http://www.stromics.org.cn/.

Code availability

The complete code used in the current study is available from the corresponding authors upon request.

References

1. GBD 2013 Mortality and Causes of Death Collaborators. Global, regional, and national age–sex specific all-cause and cause-specific mortality for 240 causes of death, 1990–2013: a systematic analysis for the global burden of disease study 2013. The Lancet 2015;385:117–71.

2. Boehme AK, Esenwa C, Elkind MSV. Stroke risk factors, genetics, and prevention. Circ Res 2017;120:472–95.

3. Cooray C, Mazya M, Bottai M, et al. External validation of the ASTRAL and DRAGON scores for prediction of functional outcome in stroke. Stroke 2016;47:1493–9.

4. Liu G, Ntaios G, Zheng H, et al. External validation of the ASTRAL score to predict 3- and 12-month functional outcome in the China National Stroke Registry. Stroke 2013;44:1443–5.

5. Park TH, Saposnik G, Bae H-J, et al. The iScore predicts functional outcome in Korean patients with ischemic stroke. Stroke 2013;44:1440–2.

6. Manolio TA. Genomewide association studies and assessment of the risk of disease. N Engl J Med 2010;363:166–76.

7. Visscher PM, Brown MA, McCarthy MI, et al. Five years of GWAS discovery. Am J Hum Genet 2012;90:7–24.

8. Hahn S-J, Kim S, Choi YS, et al. Prediction of type 2 diabetes using genome-wide polygenic risk score and metabolic profiles: a machine learning analysis of population-based 10-year prospective cohort study. EBioMedicine 2022;86:104383.

9. Söderholm M, Pedersen A, Lorentzen E, et al. Genome-wide association meta-analysis of functional outcome after ischemic stroke. Neurology 2019;92:e1271–e1283.

10. Lu X, Niu X, Shen C, et al. Development and validation of a polygenic risk score for stroke in the Chinese population. Neurology 2021;97:e619–e628.

11. Wang Y, Jing J, Meng X, et al. The third China National Stroke Registry (CNSR-III) for patients with acute ischaemic stroke or transient ischaemic attack: design, rationale and baseline patient characteristics. Stroke Vasc Neurol 2019;4:158–64.

12. Cheng S, Xu Z, Bian S, et al. The STROMICS genome study: deep whole-genome sequencing and analysis of 10K Chinese patients with ischemic stroke reveal complex genetic and phenotypic interplay. Cell Discov 2023;9:75.

13. Cheng S, Xu Z, Liu Y, et al. Whole genome sequencing of 10K patients with acute ischaemic stroke or transient ischaemic attack: design, methods and baseline patient characteristics. Stroke Vasc Neurol 2021;6:291–7.

14. Kwah LK, Diong J. National Institutes of Health stroke scale (NIHSS). J Physiother 2014;60:61.

15. German CA, Sinsheimer JS, Klimentidis YC, et al. Ordered multinomial regression for genetic association analysis of ordinal phenotypes at biobank scale. Genet Epidemiol 2020;44:248–60.

16. Yang H, Wang K. Genomic variant annotation and prioritization with ANNOVAR and wANNOVAR. Nat Protoc 2015;10:1556–66.

17. Oscanoa J, Sivapalan L, Gadaleta E, et al. SNPnexus: a web server for functional annotation of human genome sequence variation (2020 update). Nucleic Acids Res 2020;48:W185–92.

18. GTEx Consortium. Human genomics. The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 2015;348:648–60.

19. Chen T, Guestrin C. Xgboost: A scalable tree boosting system. In: Krishnapuram B, Shah M, Smola AJ, et al. (eds). Proceedings of the 22nd ACM Sigkdd International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, 2016. pp. 785–94. Association for Computing Machinery, New York, NY, USA.

20. Ke G, Meng Q, Finley T, et al. LightGBM: a highly efficient gradient boosting decision tree. In: 31st Conference on Neural Information Processing Systems 2017, pp. 3146–54.

21. Dorogush AV, Ershov V, Gulin A. CatBoost: gradient boosting with categorical features support. 2018. Preprint arXiv:1810.11363.

22. Taud H, Mas JF. Multilayer perceptron (MLP). Geomatic Approaches for Modeling Land Change Scenarios 2018;451–455.

23. Leening MJG, Vedder MM, Witteman JCM, et al. Net reclassification improvement: computation, interpretation, and controversies: a literature review and clinician’s guide. Ann Intern Med 2014;160:122–31.

24. Pencina MJ, D’Agostino RB, Demler OV. Novel metrics for evaluating improvement in discrimination: net reclassification and integrated discrimination improvement for normal variables and nested models. Stat Med 2012;31:101–13.

25. Wolff RF, Moons KGM, Riley RD, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med 2019;170:51.

26. Vu Nguyen K. β-amyloid precursor protein (APP) and the human diseases. AIMS Neurosci 2019;6:273–81.

27. Grossman TR, Luque JM, Nelson N. Identification of a ubiquitous family of membrane proteins and their expression in mouse brain. J Exp Biol 2000;203:447–57.

28. Vieira AR, McHenry TG, Daack-Hirsch S, et al. Candidate gene/loci studies in cleft lip/palate and dental anomalies finds novel susceptibility genes for clefts. Genet Med 2008;10:668–74.

29. Hu RM, Han ZG, Song HD, et al. Gene expression profiling in the human hypothalamus-pituitary-adrenal axis and full-length cDNA cloning. Proc Natl Acad Sci U S A 2000;97:9543–8.

30. Dempfle A, Scherag A, Hein R, et al. Gene–environment interactions for complex traits: definitions, methodological requirements and challenges. Eur J Hum Genet 2008;16:1164–72.

31. Hunter DJ. Gene–environment interactions in human diseases. Nat Rev Genet 2005;6:287–98.

32. Yang J, Benyamin B, McEvoy BP, et al. Common SNPs explain a large proportion of the heritability for human height. Nat Genet 2010;42:565–9.

33. Ghersi E, Noviello C, D’Adamio L. Amyloid-β protein precursor (AβPP) intracellular domain-associated protein-1 proteins bind to AβPP and modulate its processing in an isoform-specific manner. J Biol Chem 2004;279:49105–12.

34. Teng S, Yang JY, Wang L. Genome-wide prediction and analysis of human tissue-selective genes using microarray expression data. BMC Med Genomics 2013;6:S10.

35. Ghersi E, Vito P, Lopez P, et al. The intracellular localization of amyloid β protein precursor (AβPP) intracellular domain associated protein-1 (AIDA-1) is regulated by AβPP and alternative splicing. JAD 2004;6:67–78.

36. Paul A, Cai Y, Atwal GS, et al. Developmental coordination of gene expression between synaptic partners during GABAergic circuit assembly in cerebellar cortex. Front Neural Circuits 2012;6:37.

37. Carbonell AU, Cho CH, Tindi JO, et al. Haploinsufficiency in the ANKS1B gene encoding AIDA-1 leads to a neurodevelopmental syndrome. Nat Commun 2019;10:3529.

38. Jacob AL, Jordan BA, Weinberg RJ. Organization of amyloid-β protein precursor intracellular domain-associated protein-1 in the rat brain. J Comp Neurol 2010;518:3221–36.

39. Zamzow DR, Elias V, Shumaker M, et al. An increase in the association of GluN2B containing NMDA receptors with membrane scaffolding proteins was related to memory declines during aging. J Neurosci 2013;33:12300–5.

40. Tindi JO, Chavez AE, Cvejic S, et al. ANKS1B gene product AIDA-1 controls hippocampal synaptic transmission by regulating GluN2B subunit localization. J Neurosci 2015;35:8986–96.

41. Mota SI, Ferreira IL, Rego AC. Dysfunctional synapse in Alzheimer’s disease - a focus on NMDA receptors. Neuropharmacology 2014;76:16–26.

42. Enga RM, Rice AC, Weller P, et al. Initial characterization of behavior and ketamine response in a mouse knockout of the post-synaptic effector gene Anks1b. Neurosci Lett 2017;641:26–32.

43. Braff DL, Grillon C, Geyer MA. Gating and habituation of the startle reflex in schizophrenic patients. Arch Gen Psychiatry 1992;49:206–15.

44. Braff D, Stone C, Callaway E, et al. Prestimulus effects on human startle reflex in normals and schizophrenics. Psychophysiology 1978;15:339–43.

45. Czernych R, Badyda A, Kozera G, et al. Assessment of low-level air pollution and cardiovascular incidence in Gdansk, Poland: time-series cross-sectional analysis. JCM 2023;12:2206.

46. Wang Y, Eliot MN, Wellenius GA. Short-term changes in ambient particulate matter and risk of stroke: a systematic review and meta-analysis. JAHA 2014;3:e000983.

47. Lasek-Bal A, Rybicki W, Student S, et al. Direct exposure to outdoor air pollution worsens the functional status of stroke patients treated with mechanical thrombectomy. JCM 2024;13:746.

48. Stulberg EL, Twardzik E, Kim S, et al. Association of neighborhood socioeconomic status with outcomes in patients surviving stroke. Neurology 2021;96:e2599–610.

49. Taylor RL, Cooper SR, Jackson JJ, et al. Assessment of neighborhood poverty, cognitive function, and prefrontal and hippocampal volumes in children. JAMA Netw Open 2020;3:e2023774.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]