Abstract

Studies have highlighted the potential importance of modeling interactions for suicide attempt prediction. This case-cohort study identified risk factors for suicide attempts among persons with depression in Denmark using statistical approaches that do (random forests) or do not (least absolute shrinkage and selection operator regression [LASSO]) model interactions. Cases made a nonfatal suicide attempt (n = 6032) between 1995 and 2015. The comparison subcohort was a 5% random sample of all persons in Denmark on January 1, 1995 (n = 11 963). We used random forests and LASSO for sex-stratified prediction of suicide attempts from demographic variables, psychiatric and somatic diagnoses, and treatments. Poisonings, psychiatric disorders, and medications were important predictors for both sexes. Area under the receiver-operating characteristic curve (AUC) values were higher in LASSO models (in men, 0.85, 95% CI, 0.84-0.86; in women, 0.89, 95% CI, 0.88-0.90) than random forests (in men, 0.76, 95% CI, 0.74-0.78; in women, 0.79, 95% CI, 0.78-0.81). Automatic detection of interactions via random forests did not result in better model performance than LASSO models that did not model interactions. Due to the complex nature of psychiatric comorbidity and suicide, modeling interactions may not always be the optimal statistical approach to enhancing suicide attempt prediction in high-risk samples.

This article is part of a Special Collection on Mental Health.

Introduction

Depression is the most common psychiatric disorder globally, afflicting approximately 280 million people worldwide.1 It is an important risk factor for suicide attempts.2 The lifetime prevalence of suicide attempts among persons with major depressive disorder is 31% (95% CI, 27-34), which is over 6 times the lifetime prevalence of suicide attempts in the general population.3,4 Suicide attempts are associated with injury, psychological distress, and increased risk of dying by suicide in a future attempt.5 A key challenge is our limited ability to predict who is at high-risk of suicide attempts and should be connected with preventive interventions. Traditional regression approaches that focus on isolating bivariate associations between single risk factors and suicide attempts have not been able to accurately predict suicide attempts.6 To address this challenge, researchers have increasingly used novel machine learning approaches for prediction. Machine learning methods permit examination of large sets of risk factors simultaneously to develop composite prediction functions and identify the most important predictors of suicide attempts.

Recent studies suggest that machine learning methods may be better able to predict suicide attempt risk compared with traditional methods.7,8 For example, Walsh et al.7 used random forests and traditional logistic regression models to predict suicide attempts using electronic health record data. They found that the random forests area under the receiver operating characteristic curve (AUC) values ranged from 0.80 to 0.84 whereas the logistic regression AUC values ranged from 0.66 to 0.68.7 The random forests results also showed that the variables that contributed the most to prediction accuracy for suicide attempts were poisonings, psychiatric disorders, substance use, and psychiatric medications.7 Another study predicting suicide attempts after outpatient clinic visits8 using electronic health records across 7 health systems found that the strongest predictors of suicide attempt were a prior suicide attempt, mental health and substance use diagnoses, and inpatient or emergency mental health care. Similarly, a study predicting suicide attempts using Danish registry data found that substance use disorders/treatment, psychiatric medications, previous poisoning diagnoses, and stress disorders were important predictors of suicide attempts.9 Clearly, there is an increasing focus on modeling complex combinations of factors for suicide attempt prediction.2,6,9-11 At the same time, there is some question in the literature as to whether machine learning approaches, including those that automatically detect interactions, are truly necessary and appropriate for all research questions.12 Accordingly, it is becoming commonplace to compare machine learning approaches to more standard approaches as an evaluation of necessity, and this may be especially important for questions with complex relationships among the predictors and outcome, as has been shown with psychiatric comorbidity and suicide.13

Although prior studies have used machine learning methods to predict suicide attempts among the general population,9 few have focused on suicide attempt prediction among persons living with depression, an important subgroup among which the majority of suicide attempts are concentrated. The development of prediction models in persons with depression may yield important differences in risk factors and interactions in this high-risk subgroup compared with models developed in the general population and may be of specific interest to clinicians working with depressed patients. Using prospective Danish registry data, we developed prediction models for suicide attempts among persons diagnosed with depression using both random forests analyses, to model main effects (including nonlinear effects) and interactions, and least absolute shrinkage and selection operator regression (LASSO) models that did not model interactions between variables. The goal of this work was not to develop clinically deployable predictive models but rather to use these statistical approaches to begin to gain a fuller understanding of the variables that might be important to understanding the etiology of suicide attempts, using a data-driven approach to identify risk factors of importance. Comparing and contrasting results from these models can inform us about the extent to which modeling interactions may affect model performance. We conducted sex-stratified analyses to examine potential sex differences in patient characteristics associated with suicide attempts.

Methods

Study design and sample

We conducted a case-cohort study using national medical and administrative registries in Denmark. Denmark provides all residents with a universal health care system.14,15 Cases were persons who made an incident suicide attempt between January 1, 1995, and December 31, 2015, restricted to persons with a depression diagnosis before the attempt (n = 6032). The comparison subcohort was a 5% random sample of all individuals born or residing in Denmark on January 1, 1995, restricted to persons who had an incident inpatient or outpatient depression diagnosis between January 1, 1995, and December 31, 2015 (n = 11 963). Incident suicide attempts were obtained from the Danish National Patient Registry14 using International Classification of Diseases, Tenth Revision (ICD-10), codes X60 to X84 without a death recorded in the Danish Register of Causes of Death16 in the subsequent 30 days. A validation study of the ICD-10 codes for suicide attempt (X60 to X84) in Denmark found that the positive predictive value for suicide attempts was approximately 73% (27% of the classified cases were reclassified as nonsuicidal events).17 Depression diagnoses were obtained using the inpatient and outpatient ICD-10 diagnoses F32 to F39, identified from the Danish National Patient Registry and the Danish Psychiatric Central Research Register. We did not match cases and controls on any factors to enable maximum variability in the predictors included in the machine learning analyses.

Predictors

We included the following predictors in our prediction models for suicide attempts: age, marital status, immigrant status, citizenship, family suicide death history (parent or spouse), employment, income, mental disorders, somatic disorders, surgeries, prescription drugs, and psychotherapy (any encounter for psychological services). Data on age, marital status, immigrant status, citizenship, and family suicide death history were obtained from the Danish Civil Registration System and the Danish Cause of Death Registry.16,18 Baseline data on employment and income were obtained from the Integrated Database for Labor Market Research and Income Statistics Register.19,20 Psychiatric disorder diagnoses were ascertained using 2-digit ICD-10 codes from the Danish Psychiatric Central Research Register and Danish National Patient Registry.14,21 The Danish Psychiatric Central Research Register includes the recorded dates of inpatient psychiatric stays and outpatient psychiatric visits since 1995 and includes up to 20 diagnoses per psychiatric treatment episode.21,22 The Danish National Patient Registry includes data on all inpatient hospitalizations in nonpsychiatric hospitals and hospital outpatient and emergency room visits since 1995.14 We also used the Danish National Patient Registry to obtain second-level ICD-10 codes for inpatient and outpatient somatic diagnoses, surgery procedure codes (according to body system), and any encounters for psychological services. The Danish National Prescription Registry,23,24 which has recorded data on all prescriptions sold in Danish pharmacies since 1995, was used to obtain data on prescription drugs recorded according to Level 3 Anatomical Therapeutic Chemical classification codes. A list of all predictor variables is provided in Table S1.

Demographic variables were defined at a single time point and diagnostic and treatment variables were treated as time-varying. Natal sex, age, employment, income, and immigrant status were defined at baseline. We examined diagnostic and treatment variables in a time-varying fashion to reflect changes in variables over time. Consistent with prior suicide-related machine learning studies,25-27 we dummy-coded variables to create time-varying predictors with intervals of 0 to 6, 0 to 12, 0 to 24, and 0 to 48 months before the date of suicide attempt. To estimate the prevalence of each predictor in the person-time of the source population that gave rise to the cases, we randomly selected a date between the depression diagnosis date and the end of follow-up for comparison subcohort members and computed the prevalence of each predictor 0 to 6, 0 to 12, 0 to 24, and 0 to 48 months before the selected date.
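The windowing described above can be illustrated with a minimal Python sketch. It is not the study's code: the function names, the 30.4375-day month, the choice to count only events strictly before the index date, and the uniform draw of the comparison date are all assumptions made for illustration.

```python
import random
from datetime import date, timedelta

# Look-back windows in months, as described in the text.
WINDOWS_MONTHS = (6, 12, 24, 48)


def months_between(earlier: date, later: date) -> float:
    """Approximate months from `earlier` to `later` (assumes 30.4375-day months)."""
    return (later - earlier).days / 30.4375


def window_indicators(event_dates, index_date, name):
    """Dummy-code one predictor into overlapping 0-to-w-month windows.

    Assumption: an event counts only if it falls strictly before the index date.
    """
    flags = {}
    for w in WINDOWS_MONTHS:
        flags[f"{name}_0_{w}m"] = int(any(
            d < index_date and months_between(d, index_date) <= w
            for d in event_dates
        ))
    return flags


def random_index_date(diagnosis_date, end_of_follow_up, rng):
    """Hypothetical helper: draw a date uniformly between diagnosis and end of follow-up."""
    span_days = (end_of_follow_up - diagnosis_date).days
    return diagnosis_date + timedelta(days=rng.randint(0, span_days))


if __name__ == "__main__":
    # A poisoning diagnosis about 7 months before the index date.
    print(window_indicators([date(2009, 11, 1)], date(2010, 6, 1), "poisoning"))
    print(random_index_date(date(2000, 1, 1), date(2015, 12, 31), random.Random(1)))
```

Because the windows overlap, an event that sets the 0-to-6-month flag also sets every longer window, which is one reason the same diagnosis can appear at several time intervals among the top predictors.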

Statistical analyses

To reduce the risk of overfitting in the machine learning models, we removed rare predictors that had 10 or fewer observations in any cell of a 2 × 2 contingency table of the predictor and suicide attempt for men and women separately. All predictors were binary. The initial analytical dataset contained 2543 predictors. After removing rare predictors, the final number of included predictors was 841 for men and 1079 for women. Retained predictors are shown in Table S1.
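As an illustration of the rare-predictor rule, the following Python sketch drops any binary predictor whose 2 × 2 table with the outcome has a cell of 10 or fewer observations. The function names and the dictionary-of-lists data layout are hypothetical; in the study the filter was applied separately for men and women.

```python
def cell_counts(x, y):
    """Counts for the 2 x 2 table of binary predictor x by binary outcome y."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    for xi, yi in zip(x, y):
        counts[(xi, yi)] += 1
    return counts


def is_rare(x, y, min_cell=10):
    """True if any cell of the 2 x 2 table has `min_cell` or fewer observations."""
    return min(cell_counts(x, y).values()) <= min_cell


def drop_rare(predictors, y, min_cell=10):
    """Keep only predictors that are not rare (predictors: name -> list of 0/1)."""
    return {name: x for name, x in predictors.items() if not is_rare(x, y, min_cell)}


if __name__ == "__main__":
    y = [1] * 50 + [0] * 50
    predictors = {
        "common": [1] * 25 + [0] * 25 + [1] * 25 + [0] * 25,  # every cell has 25
        "rare": [1] * 5 + [0] * 45 + [1] * 20 + [0] * 30,     # exposed cases: 5
    }
    print(sorted(drop_rare(predictors, y)))
```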

To assess the extent to which interactions may contribute to accurate suicide attempt prediction, we compared the results of sex-stratified random forests with LASSO models. We used the same set of predictors for both machine learning approaches. The random forests were constructed using 1000 trees with a minimum of 10 observations needed to make a split in each tree. To decorrelate the trees, the numbers of variables sampled as split candidates at each node were 29 for men and 33 for women, corresponding with the square root of the total number of predictors for men and women.28 Each individual tree in the random forests used equal proportions of suicide attempt observations and non–suicide attempt observations to address class imbalance. We performed 10-fold cross-validation (internal) of the random forests to generate predicted values for each individual and to calculate the mean decrease in accuracy (MDA) of each variable. We evaluated random forests’ AUC using 10-fold cross-validation and calculated the corresponding 95% CI using bootstrapping in 1000 replicates.29
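Two of the quantities above lend themselves to a short Python sketch: the square-root rule for the number of split candidates, and an AUC with a bootstrap confidence interval. The AUC is computed here through its rank (Mann-Whitney) formulation, and the interval uses the percentile method. The text specifies 1000 bootstrap replicates but not the interval method, so the percentile choice, the function names, and the seed are assumptions.

```python
import math
import random


def auc(scores, labels):
    """AUC as the probability that a random case outranks a random non-case.

    Ties in the scores receive average ranks, so a tied pair counts 1/2.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(lab for _, lab in pairs[i:j])
        i = j
    n_pos = sum(labels)
    n_neg = n - n_pos
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method; resamples individuals)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        lab = [labels[k] for k in idx]
        if 0 < sum(lab) < n:  # a resample needs both classes to define an AUC
            stats.append(auc([scores[k] for k in idx], lab))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Square-root rule for split candidates: 841 predictors (men), 1079 (women).
    print(round(math.sqrt(841)), round(math.sqrt(1079)))
    scores = [0.1, 0.4, 0.35, 0.8]
    labels = [0, 0, 1, 1]
    print(auc(scores, labels), bootstrap_auc_ci(scores, labels, n_boot=200))
```

In practice the scores passed to these functions would be the out-of-fold predicted values from the 10-fold cross-validation, so that each individual's prediction comes from a model that did not see that individual.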

Table 1

Baseline characteristics of the suicide attempt cases and the comparison subcohort, Denmark, January 1, 1995.

| Variable | Men: suicide attempt cases (n = 2093), No. (%) | Men: comparison subcohort (n = 4281), No. (%) | Women: suicide attempt cases (n = 3939), No. (%) | Women: comparison subcohort (n = 7682), No. (%) |
| --- | --- | --- | --- | --- |
| Age, years^a | 32 (17) | 42 (21) | 29 (19) | 43 (24) |
| Marital status | | | | |
| Single | 1267 (61%) | 1843 (43%) | 2292 (58%) | 2974 (39%) |
| Married or registered partnership | 586 (28%) | 1834 (43%) | 1087 (28%) | 2833 (37%) |
| Divorced, separated, or widowed | 240 (11%) | 604 (14%) | 560 (14%) | 1875 (24%) |
| Immigrant | 93 (4.4%) | 206 (4.8%) | 149 (3.8%) | 329 (4.3%) |
| Employment status | | | | |
| Employed | 1077 (52%) | 1994 (47%) | 1332 (34%) | 2465 (32%) |
| Unemployed | 252 (12%) | 451 (11%) | 539 (14%) | 777 (10%) |
| Early retirement | 179 (9%) | 365 (8.5%) | 491 (12%) | 769 (10%) |
| State pension | 239 (11%) | 923 (22%) | 532 (14%) | 2448 (32%) |
| Age ≤ 14 years | 316 (15%) | 507 (12%) | 1014 (26%) | 1150 (15%) |
| Missing | 30 (1.4%) | 41 (0.96%) | 31 (0.79%) | 73 (0.95%) |

^a Values are expressed as mean (standard deviation).

We then fitted main effects LASSO models that did not model interactions among variables. LASSO shrinks coefficient estimates towards zero and can force some of the coefficient estimates to be exactly equal to zero. We considered a variable to be selected by LASSO if the estimated coefficient in the model was nonzero. We fitted the LASSO regression models using 10-fold cross-validation with lambda equal to the minimum mean cross-validated error. We evaluated the AUC values of the LASSO regression models using 10-fold cross-validation.
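For reference, the main effects LASSO described above can be written as a penalized likelihood problem. The formulation below assumes a logistic (binomial) outcome model, which the text does not state explicitly. Here $y_i$ is the suicide attempt indicator, $x_i$ is the vector of $p$ binary predictors for person $i$, and $\lambda$ is the penalty weight chosen by cross-validation.

```latex
(\hat{\beta}_0, \hat{\beta}) =
\arg\min_{\beta_0,\,\beta}
\left\{
  -\frac{1}{n} \sum_{i=1}^{n}
  \left[
    y_i \left( \beta_0 + x_i^{\top} \beta \right)
    - \log\!\left( 1 + e^{\beta_0 + x_i^{\top} \beta} \right)
  \right]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

The linear predictor $\beta_0 + x_i^{\top}\beta$ contains no product terms, which is what makes this a main effects model. The absolute-value penalty can set some $\hat{\beta}_j$ exactly to zero, and predictor $j$ counts as selected when $\hat{\beta}_j \neq 0$.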

For both random forests and LASSO, we calculated the proportion of all suicide attempt cases that occurred in the top 5%, 10%, and 20% of the predicted probabilities for suicide attempts. We then calculated the proportion of all non–suicide attempters that occurred in the bottom 95%, 90%, and 80% of predicted probabilities.
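The risk-concentration summaries described above can be sketched in Python as follows. The function name is hypothetical, and the handling of tied probabilities at the cutoff (arbitrary, by sort order) and the rounding of the stratum size are assumptions.

```python
def risk_concentration(probs, labels, top_fraction):
    """Summarize how cases and non-cases split across a top-risk stratum.

    Returns (share of all cases found in the top `top_fraction` of predicted
    probabilities, share of all non-cases found in the remaining bottom stratum).
    """
    n = len(probs)
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    k = max(1, round(top_fraction * n))  # size of the top stratum
    top = set(order[:k])
    n_cases = sum(labels)
    n_noncases = n - n_cases
    cases_in_top = sum(labels[i] for i in top)
    noncases_in_bottom = sum(1 - labels[i] for i in range(n) if i not in top)
    return cases_in_top / n_cases, noncases_in_bottom / n_noncases


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
    # Top 20% = 2 people, both cases: 2 of 4 cases, and all 6 non-cases lie below.
    print(risk_concentration(probs, labels, 0.20))
```

The first returned value is the sensitivity of flagging the top stratum and the second is the corresponding specificity, so the pair can be read alongside the AUC.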

Analyses were conducted in SAS, version 9.4, and R, version 3.5.2.30,31 We used the R packages glmnet32 and ranger.33 This study was determined to be exempt from review by the Boston University institutional review board and approved by the Danish Data Protection Agency (record number 2015-57-0002).

Results

Descriptive results

There were 2093 men who made a suicide attempt and had a prior depression diagnosis and 4281 men in the corresponding comparison subcohort. Among women, there were 3939 suicide attempt cases with a prior depression diagnosis and there were 7682 women in the corresponding comparison subcohort. Table 1 displays the descriptive characteristics of the study sample. For both men and women, cases were on average younger than the comparison subcohorts and a greater proportion of cases were single and unemployed.

Random forests

In the random forests, poisoning in any time interval had the greatest impact on model accuracy for men diagnosed with depression. Other important predictors included alcohol-related disorders, reaction to severe stress and adjustment disorders, and drugs used to treat psychiatric disorders (eg, drugs used to treat addictive disorders, anxiolytics, and hypnotics and sedatives). Antithrombotic agents prescribed at several time intervals and toxic effects of substances chiefly nonmedicinal as to source emerged as important predictors. Social variables such as receipt of state pension and being single were also important to accurate prediction of suicide attempts among men diagnosed with depression. Figure 1 displays the variable importance rankings of the top 30 predictors (median MDA value of all predictors with a nonzero MDA value, 0.29 [interquartile range, 0.06]). The cross-validated AUC of the random forest was 0.76 (95% CI, 0.74-0.78).

Figure 1

Variable importance of suicide attempt predictors in men with depression in Denmark from 10-fold cross-validated random forests, 1995-2015. RSS and AD refer to reaction to severe stress and adjustment disorders. Poisoning refers to poisoning by, adverse effect of, and underdosing of drugs, medicaments, and biological substances. Toxic effects of substances refer to those chiefly nonmedicinal as to source. The black dots represent the mean decrease in accuracy (MDA) value from 10-fold cross-validation. The vertical line represents the median of the MDA values of all predictors with a nonzero MDA value (median, 0.29; interquartile range, 0.06).

Similar to the random forests results in men, poisoning in any time interval was the most important variable for accurate prediction of suicide attempts among women diagnosed with depression (Figure 2). Psychiatric disorder diagnoses (eg, specific personality disorders, alcohol-related disorders, reaction to severe stress, and adjustment disorders) and psychiatric medication prescriptions (eg, antipsychotics, hypnotics and sedatives, and anxiolytics) at all time intervals were important predictors of suicide attempts in women. Also similar to the random forests findings in men, social variables, including receipt of state pension and remaining single at all time points, were among the top 30 most important predictors of suicide attempts. Prescription of other analgesics and antipyretics in the preceding 48 months emerged as an important predictor of suicide attempts among women. Figure 2 displays the random forest’s MDA values (median MDA value of predictors with a non-zero MDA value, 0.08 [IQR, 0.02]). The cross-validated AUC of the random forests was 0.79 (95% CI, 0.78-0.81).

Figure 2

Variable importance of suicide attempt predictors in women with depression in Denmark from 10-fold cross-validated random forests, 1995-2015. RSS and AD refer to reaction to severe stress and adjustment disorders. Poisoning refers to poisoning by, adverse effect of, and underdosing of drugs, medicaments, and biological substances. The black dots represent the mean decrease in accuracy (MDA) value from 10-fold cross-validation. The vertical line represents the median of the MDA values of all predictors with a nonzero MDA value (median, 0.08; interquartile range, 0.02).

In the random forests, men in the top 5%, 10%, and 20% of predicted risk accounted for 13%, 27%, and 49% of all suicide attempts among men diagnosed with depression, respectively. Men in the bottom 95%, 90%, and 80% of predicted suicide attempt risk accounted for 99%, 98%, and 94% of all men diagnosed with depression who did not make a suicide attempt, respectively. Women in the top 5%, 10%, and 20% of predicted risk accounted for 14%, 28%, and 53% of all suicide attempts among women with a depression diagnosis, respectively. Women in the bottom 95%, 90%, and 80% of predicted suicide attempt risk accounted for 100%, 99%, and 97% of all women diagnosed with depression who did not make a suicide attempt, respectively.

LASSO models

Figures 3 and 4 display the top 30 predictors of suicide attempts in the LASSO models for men and women. Although there was overlap in the top 30 predictors identified across the LASSO and random forests models for men and women (eg, poisoning, alcohol-related disorders, reaction to severe stress and adjustment disorders, anxiolytics, hypnotics and sedatives, specific personality disorders), injuries were more prominent in the LASSO results than in the random forests. The top predictors of suicide attempt in men according to the LASSO model were poisoning, toxic effects of substances, injuries (to the elbow and forearm, neck, thorax, abdomen, lower back, lumbar spine, pelvis, and external genitals, and to the wrist, hand, and fingers), and reaction to severe stress and adjustment disorders (Figure 3). Out of 841 predictors included in the LASSO model for men, 127 predictors had nonzero coefficients (15%).

Figure 3

Variable importance of suicide attempt predictors in men with depression in Denmark from 10-fold cross-validated least absolute shrinkage and selection operator regression (LASSO) model, 1995-2015. RSS and AD refer to reaction to severe stress and adjustment disorders. Poisoning refers to poisoning by, adverse effect of, and underdosing of drugs, medicaments, and biological substances. Various injuries refers to injuries to the abdomen, lower back, lumbar spine, pelvis, and external genitals.

Figure 4

Variable importance of suicide attempt predictors in women with depression in Denmark from 10-fold cross-validated least absolute shrinkage and selection operator regression (LASSO) model, 1995-2015. RSS and AD refer to reaction to severe stress and adjustment disorders. Poisoning refers to poisoning by, adverse effect of, and underdosing of drugs, medicaments, and biological substances. Disorders of ocular muscles, etc., refers to disorders of ocular muscles, binocular movement, accommodation, and refraction. Disorders of glucose regulation, etc., refers to other disorders of glucose regulation and pancreatic internal secretion.

In women, the top predictors were poisoning, toxic effects of substances, schizoaffective disorders, reaction to severe stress and adjustment disorders, and injuries to the wrist, hand, and fingers (Figure 4). Out of 1079 predictors included in the LASSO model for women, 130 predictors had nonzero coefficients (12%). The LASSO regression coefficients are shown in Tables S2 and S3.

The cross-validated AUC of the LASSO was 0.85 (95% CI, 0.84-0.86) in men and 0.89 (95% CI, 0.88-0.90) in women. Men in the top 5%, 10%, and 20% of predicted risk in the LASSO model accounted for 14%, 28%, and 50%, respectively, of all suicide attempts among men diagnosed with depression. Men in the bottom 95%, 90%, and 80% of predicted suicide attempt risk accounted for 99.5%, 99%, and 95%, respectively, of all men diagnosed with depression who did not make a suicide attempt. Women in the top 5%, 10%, and 20% of predicted risk accounted for 14%, 28%, and 53%, respectively, of all suicide attempts among women with a depression diagnosis. Women in the bottom 95%, 90%, and 80% of predicted suicide attempt risk accounted for 99.7%, 99%, and 97%, respectively, of all women diagnosed with depression who did not make a suicide attempt.

Discussion

We used random forests and LASSO to predict suicide attempts among men and women diagnosed with depression. Both random forests and LASSO models identified poisonings, psychiatric disorders, and psychiatric medications as important predictors of suicide attempts. However, injuries emerged as an important predictor in the LASSO models but not in the random forests. Moreover, the LASSO regression models had higher AUC values than the random forests. These results suggest that modeling interactions between study variables using random forests did not result in more accurate prediction of suicide attempts relative to a main effects LASSO model. Prior work has shown that interactions between variables are important for suicidal behavior.6 One potential explanation for why interactions did not play an important role in our study is that depression is such a strong risk factor for suicide attempts that it masks the effects of any weaker modifying variables. These results underscore the complexity of the relationships between depression and other risk factors (eg, psychiatric comorbidity) for suicide attempts that may lead to unexpected findings.13 Another possible explanation is that we lacked data on precipitating stressors that may have had important interactions with diagnostic variables associated with suicide attempt risk (ie, vulnerability-stress interactions). Such stressors may be strong predictors in the days and weeks preceding suicide,10 and lack of data on these variables may have hindered our ability to detect interactions between acute stressors and diagnostic variables necessary for accurate prediction of suicide attempts.

Similar findings emerged for men and women; specific personality disorders, reaction to severe stress and adjustment disorders, prior poisonings, prescriptions for drugs used in addictive disorders, anxiolytics, hypnotics and sedatives, antipsychotics, prescriptions for antiinflammatory medications (eg, antithrombotic agents and other analgesics and antipyretics), and being single were important factors in predicting suicide attempts in men and women. However, injuries of different types were more prominent in the LASSO results for men than women. One potential explanation for these results is that prior suicide attempts may be more likely to be recorded as injuries (eg, to the hand, wrist, neck) for men than women. It is not always possible to differentiate acts of self-harm with and without intention to die. Records that do not have evidence of intention or a clear indication of intention to die are considered nonsuicidal events and may be coded as injuries.17 Men may be more likely to have a suicide attempt incorrectly classified as a nonsuicidal injury than women. As such, prior suicide attempts recorded as injuries may be predictive of a future suicide attempt in men with depression.

We did not find any particular time interval of predictors to be strongly predictive of suicide attempts in the random forests. Poisonings, psychiatric disorders, and medications were identified as top predictors across all time periods in the random forests. However, in the LASSO models, there was greater variability in the top predictors in terms of their timing. This finding suggests that using a main effects LASSO model may allow specific time points of predictors to stand out more than others compared with random forests which model interactions. For example, a poisoning diagnosis in the 6 months prior to a suicide attempt was the most important predictor in the LASSO models for both men and women, whereas poisoning diagnoses at all time points were the top predictors in the random forests. Knowledge of the importance of the recency of a poisoning diagnosis to suicide attempt prediction may be informative for prevention. Importantly, we used overlapping time intervals for predictors so that our work was similar to previous research. Future work should be aimed at building separate models that predict short- and long-term risk. The nature of the Danish registry data allows for the thorough examination of timing of incident disorders, but not full disorder duration. Another important avenue for future research in other data sources would be a more thorough characterization of full disorder duration in prediction models. In addition, in the random forests and LASSO models, men and women in the top 20% of predicted risk accounted for approximately 50% of suicide attempts. This finding indicates that targeting prevention programs to those at highest risk could lead to a meaningful reduction in the overall burden of suicide attempts, depending on the effectiveness (and costs) of the program.

A limitation of this study is that diagnoses of poisoning and toxic effects of substances may capture a mix of previous suicide attempts, nonsuicidal self-injury, and drug overdoses and adverse reactions. Any misclassification of poisonings and toxic effects of substances is expected to be nondifferential with respect to outcome classification in the current study because these predictors were measured prior to outcome occurrence. Despite this misclassification, these variables still appeared as the top predictors of suicide attempts in the machine learning analyses, suggesting that the inclusion of broader variables for which the intent of the poisoning is unclear may be useful in prediction models for suicide attempts. Another limitation is that we did not include psychiatric and somatic diagnoses prior to 1995 given that our study period was from 1995 to 2015. We chose 1995 as the beginning of the study period because it coincided with the switch from ICD-8 to ICD-10 in Denmark and the inclusion of outpatient visits to the somatic and psychiatric registries.14 The lack of psychiatric and somatic diagnosis data before 1995 may have had a limited impact on our results given that the largest time interval used for the time-varying predictors was 0 to 48 months prior to suicide or the randomly chosen date during the study period for the comparison subcohort. Thus, missing data prior to 1995 would only affect the small proportion of suicide cases who died within the first 4 years of the study period and subcohort members who had a random date that fell within the first 4 years of the study period. Moreover, we did not model income and employment as time-varying variables, which may have limited our ability to detect their importance to suicide attempt prediction. However, given the social safety nets in Denmark that relate to employment and income, we expect this to have little impact on our findings in this particular population. 
Finally, given our goal of using data-driven methods to identify novel risk factors and interactions that are important to suicide attempt prediction, it is crucial that our findings be replicated in other data sources and populations outside of Denmark to add further context and deepen knowledge on risk factors that are important to the etiology of suicidal behavior.
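The lookback limitation noted above reduces to simple date arithmetic, sketched below in Python. This is an illustration under stated assumptions rather than the study's code: the study start is assumed to be January 1, 1995 (the text gives only the year), the 48-month window is taken from the largest predictor interval described above, and the names `add_months` and `has_full_lookback` are hypothetical.

```python
from datetime import date

# Assumption: the study period is taken to begin on January 1, 1995.
STUDY_START = date(1995, 1, 1)
# Largest predictor interval described in the text: 0 to 48 months before the index date.
MAX_LOOKBACK_MONTHS = 48


def add_months(d, months):
    """Shift a date forward by whole calendar months.

    Assumes the day of month exists in the target month (true for day 1).
    """
    total = d.year * 12 + (d.month - 1) + months
    year, month_index = divmod(total, 12)
    return date(year, month_index + 1, d.day)


def has_full_lookback(index_date):
    """True if the full 48-month predictor window lies within the study period."""
    return index_date >= add_months(STUDY_START, MAX_LOOKBACK_MONTHS)


# The first index date with a complete 48-month window.
first_complete_date = add_months(STUDY_START, MAX_LOOKBACK_MONTHS)
print(first_complete_date)
print(has_full_lookback(date(1997, 6, 15)))
print(has_full_lookback(date(2005, 3, 1)))
```

Under these assumptions, only index dates (suicide attempt dates or random subcohort dates) in the first 4 years of the study period have a truncated predictor window.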

There is increasing focus on modeling complex combinations of variables for suicide attempt prediction. However, using data from Danish registries, we found that modeling interactions may not be necessary for prediction of suicide attempts specifically among high-risk subgroups, such as persons with depression. These results call for careful thought about which statistical approaches to modeling suicidal behavior are appropriate, because the most suitable approach may vary by patient population.

Supplementary material

Supplementary material is available at the American Journal of Epidemiology online.

Funding

This work was supported by National Institute of Mental Health grants R01MH109507 (PI: J.L.G.) and R01MH110453 (PI: J.L.G.).

Conflict of interest

The authors declare no conflicts of interest.

Disclaimer

The National Institute of Mental Health had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; and decision to submit the manuscript for publication.

Data availability

This population study used individual-level data from Danish medical and social registries. The ethical approval of this project does not include permission to publicly share raw data. For access to data, please contact the Department of Clinical Epidemiology at Aarhus University Hospital.

References

1. Institute of Health Metrics and Evaluation. Global Health Data Exchange (GHDx). Accessed December 1, 2023. http://ghdx.healthdata.org/gbd-results-tool?params=gbd-api-2019-permalink/d780dffbe8a381b25e1416884959e88b

2. Ribeiro JD, Huang X, Fox KR, et al. Depression and hopelessness as risk factors for suicide ideation, attempts and death: meta-analysis of longitudinal studies. Br J Psychiatry. 2018;212(5):279-286. https://doi.org/10.1192/bjp.2018.27

3. Dong M, Zeng L-N, Lu L, et al. Prevalence of suicide attempt in individuals with major depressive disorder: a meta-analysis of observational surveys. Psychol Med. 2019;49(10):1691-1704. https://doi.org/10.1017/S0033291718002301

4. Kessler RC, Borges G, Walters EE. Prevalence of and risk factors for lifetime suicide attempts in the National Comorbidity Survey. Arch Gen Psychiatry. 1999;56(7):617-626. https://doi.org/10.1001/archpsyc.56.7.617

5. Han B, Kott PS, Hughes A, et al. Estimating the rates of deaths by suicide among adults who attempt suicide in the United States. J Psychiatr Res. 2016;77:125-133. https://doi.org/10.1016/j.jpsychires.2016.03.002

6. Franklin JC, Ribeiro JD, Fox KR, et al. Risk factors for suicidal thoughts and behaviors: a meta-analysis of 50 years of research. Psychol Bull. 2017;143(2):187-232. https://doi.org/10.1037/bul0000084

7. Walsh CG, Ribeiro JD, Franklin JC. Predicting risk of suicide attempts over time through machine learning. Clin Psychol Sci. 2017;5(3):457-469. https://doi.org/10.1177/2167702617691560

8. Simon GE, Johnson E, Lawrence JM, et al. Predicting suicide attempts and suicide deaths following outpatient visits using electronic health records. Am J Psychiatry. 2018;175(10):951-960. https://doi.org/10.1176/appi.ajp.2018.17101167

9. Gradus JL, Rosellini AJ, Horváth-Puhó E, et al. Predicting sex-specific nonfatal suicide attempt risk using machine learning and data from Danish national registries. Am J Epidemiol. 2021;190(12):2517-2527. https://doi.org/10.1093/aje/kwab112

10. Ribeiro JD, Huang X, Fox KR, et al. Predicting imminent suicidal thoughts and nonfatal attempts: the role of complexity. Clin Psychol Sci. 2019;7(5):941-957. https://doi.org/10.1177/2167702619838464

11. Gradus JL, Rosellini AJ, Horváth-Puhó E, et al. Prediction of sex-specific suicide risk using machine learning and single-payer health care registry data from Denmark. JAMA Psychiatry. 2020;77(1):25-34. https://doi.org/10.1001/jamapsychiatry.2019.2905

12. Wilkinson J, Arnold KF, Murray EJ, et al. Time to reality check the promises of machine learning-powered precision medicine. Lancet Digit Health. 2020;2(12):e677-e680. https://doi.org/10.1016/S2589-7500(20)30200-4

13. Jiang T, Smith ML, Street AE, et al. A comorbid mental disorder paradox: using causal diagrams to understand associations between posttraumatic stress disorder and suicide. Psychol Trauma. 2021;13(7):725-729. https://doi.org/10.1037/tra0001033

14. Schmidt M, Schmidt SAJ, Sandegaard JL, et al. The Danish National Patient Registry: a review of content, data quality, and research potential. Clin Epidemiol. 2015;7:449. https://doi.org/10.2147/CLEP.S91125

15. Laugesen K, Ludvigsson JF, Schmidt M, et al. Nordic health registry-based research: a review of health care systems and key registries. Clin Epidemiol. 2021;13:533-554. https://doi.org/10.2147/CLEP.S314959

16. Helweg-Larsen K. The Danish register of causes of death. Scand J Public Health. 2011;39(7 suppl):26-29. https://doi.org/10.1177/1403494811399958

17. Gasse C, Danielsen AA, Pedersen MG, et al. Positive predictive value of a register-based algorithm using the Danish National Registries to identify suicidal events. Pharmacoepidemiol Drug Saf. 2018;27(10):1131-1138. https://doi.org/10.1002/pds.4433

18. Pedersen CB. The Danish civil registration system. Scand J Public Health. 2011;39(7 suppl):22-25. https://doi.org/10.1177/1403494810387965

19. Baadsgaard M, Quitzau J. Danish registers on personal income and transfer payments. Scand J Public Health. 2011;39(7 suppl):103-105. https://doi.org/10.1177/1403494811405098

20. Timmermans B. The Danish Integrated Database for Labor Market Research: Towards Demystification for the English Speaking Audience: Danish Research Unit for Industrial Dynamics. Department of Business Studies, DRUID Working Papers. Copenhagen, Denmark: Copenhagen Business School: Department of Industrial Economics and Strategy/Aalborg University; 2010.

21. Mors O, Perto GP, Mortensen PB. The Danish psychiatric central research register. Scand J Public Health. 2011;39(7 suppl):54-57. https://doi.org/10.1177/1403494810395825

22. Munk-Jørgensen P, Mortensen PB. The Danish psychiatric central register. Dan Med Bull. 1997;44(1):82-84.

23. Wallach Kildemoes H, Toft Sørensen H, Hallas J. The Danish national prescription registry. Scand J Public Health. 2011;39(7 suppl):38-41. https://doi.org/10.1177/1403494810394717

24. Pottegård A, Schmidt SAJ, Wallach-Kildemoes H, et al. Data resource profile: the Danish national prescription registry. Int J Epidemiol. 2017;46(3):798-798f. https://doi.org/10.1093/ije/dyw213

25. McCarthy JF, Bossarte RM, Katz IR, et al. Predictive modeling and concentration of the risk of suicide: implications for preventive interventions in the US Department of Veterans Affairs. Am J Public Health. 2015;105(9):1935-1942. https://doi.org/10.2105/AJPH.2015.302737

26. Kessler RC, Hwang I, Hoffmire CA, et al. Developing a practical suicide risk prediction model for targeting high-risk patients in the Veterans Health Administration. Int J Methods Psychiatr Res. 2017;26(3):e1575. https://doi.org/10.1002/mpr.1575

27. Kessler RC, Warner CH, Ivany C, et al. Predicting suicides after psychiatric hospitalization in US Army soldiers: the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). JAMA Psychiatry. 2015;72(1):49-57. https://doi.org/10.1001/jamapsychiatry.2014.1754

28. Gareth J, Daniela W, Trevor H, et al. An Introduction to Statistical Learning: With Applications in R. New York, NY: Springer; 2013.

29. Robin X, Turck N, Hainard A, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics. 2011;12(1):1-8. https://doi.org/10.1186/1471-2105-12-77

30. Inc SI. SAS/GRAPH 9.4. Cary, NC: SAS Institute, Inc; 2013.

31. Development Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2017.

32. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc B Methodol. 1996;58(1):267-288.

33. Wright MN, Ziegler A. ranger: A fast implementation of random forests for high dimensional data in C++ and R. J Stat Softw. 2017;77(1):1-17. https://doi.org/10.18637/jss.v077.i01

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)
