Andrew B Seidenberg, Richard P Moser, Brady T West, Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA), Journal of Survey Statistics and Methodology, Volume 11, Issue 4, September 2023, Pages 743–757, https://doi.org/10.1093/jssam/smac040
Abstract
Methodological issues pertaining to transparency and analytic error have been widely documented for publications featuring analysis of complex sample survey data. The availability of numerous public use datasets to researchers without adequate training in using these data likely contributes to these problems. In an effort to introduce standards for reporting analyses of survey data and promote replication, we propose the Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA), an itemized checklist to guide researchers publishing analyses using complex sample survey data. PRICSSA is modeled after other checklists (e.g., PRISMA, CONSORT) that have been widely adopted for other research designs. The PRICSSA items include a variety of survey characteristics, such as data collection dates, mode(s), response rate, and sample selection process. In addition, essential analytic information—such as sample sizes for all estimates, missing data rates and imputation methods (if applicable), disclosing if any data were deleted, specifying what survey weight and sample design variables were used along with method of variance estimation, and reporting design-adjusted standard errors/confidence intervals for all estimates—are also included. PRICSSA also recommends that authors make all corresponding software code available. Widespread adoption of PRICSSA will help improve the quality of secondary analyses of complex sample survey data through transparency and promote scientific rigor and reproducibility.
Statement of Significance
Tremendous resources have been invested into complex sample survey design and data collection to produce accurate population estimates. However, reviews of the peer-reviewed literature have identified analytic and reporting errors within papers analyzing complex sample survey data. Publications containing these incorrect analyses could yield results that misinform well-intended policymakers, researchers, and practitioners. To help introduce standards for reporting analyses of survey data, we propose the Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA), an itemized checklist to guide researchers publishing analyses using complex sample survey data. Widespread adoption of PRICSSA will help improve the quality of secondary analyses of complex sample survey data and promote scientific rigor and reproducibility.
1. INTRODUCTION
Population-level survey research has become an important tool in many disciplines for conducting surveillance and polling, evaluating policies and interventions, and identifying disparate outcomes and behaviors among subpopulations of interest. For instance, the National Health Interview Survey (NHIS) and National Health and Nutrition Examination Survey (NHANES) are the main sources of data for monitoring trends in health behaviors and outcomes among the US population (CDC 2022a, 2022b). These and other surveys employ sampling design features that can produce unbiased estimates describing a target population (e.g., US adults). Furthermore, these surveys routinely oversample subpopulations of interest to ensure that underrepresented individuals (e.g., racial/ethnic minorities, rural populations) are adequately represented in the final data produced.
Complex samples are probability-based samples that employ design features such as stratification, cluster sampling, and unequal probabilities of selection (Heeringa et al. 2017). Population samples routinely use complex design features to improve statistical efficiency, reduce costs, and to increase sample sizes of underrepresented individuals (Heeringa et al. 2017). However, complex samples deviate from simple random samples (SRS) in ways that have important analytic implications for both point estimates and variance estimation. By default, most statistical software programs assume that data arise from SRS. Therefore, it is incumbent upon the data analyst to use the correct software procedures to account for complex sample design features when analyzing these types of data. Failing to account for complex design features (hereafter referred to as analytic error) can yield biased estimates and incorrect standard errors and can increase the likelihood of committing a type I error (Ward 2018; West and Sakshaug 2018).
Unfortunately, analytic errors have commonly appeared in complex sample analyses reported in the peer-reviewed literature. For example, West, Sakshaug, and Aurelien (2016) examined 145 research products analyzing data from the Scientists and Engineers Statistical Data System and found that only 7.6 percent correctly accounted for complex sampling in variance estimation. West, Sakshaug, and Aurelien (2016) also found that a little more than half (54.5 percent) of papers correctly accounted for the sampling weights in analyses and only 10.7 percent of papers used appropriate subpopulation estimation. A separate review of publications analyzing data from the National Inpatient Sample found that 80 percent (106/133) of papers did not account for the sample’s clustering and stratification (Teng et al. 2020). Publications containing these incorrect analyses could yield results that misinform well-intended policymakers, researchers, and practitioners.
Reporting errors (i.e., failing to adequately describe data sources and relevant analytic plans for reproducibility by others) have also been documented in research analyzing complex sample survey data. For instance, a review by Briesacher et al. (2012) found that less than half of papers analyzing data from the Medicare Current Beneficiary Survey described appropriate weighting or variance estimation. Given the known problems with analyzing these data incorrectly, reporting errors should raise concerns that published analyses of complex sample survey data may be inaccurate.
To help eliminate analytic and reporting errors and increase transparency and reproducibility, we propose the Preferred Reporting Items for Complex Sample Survey Analysis (PRICSSA). PRICSSA is an itemized checklist to guide researchers publishing analyses of complex sample survey data. PRICSSA recommends reporting information on a variety of survey characteristics, such as data collection dates, data collection mode(s), survey response rate, and sample design. In addition, PRICSSA recommends that essential analytic information be reported, such as sample sizes for all estimates, missing data rates and imputation methods (if applicable), the survey weight and sample design variables used along with the method of variance estimation, and design-adjusted standard errors/confidence intervals for all estimates. PRICSSA also recommends that authors make all corresponding software code available to reviewers and readers.
PRICSSA is modeled after checklists that have been widely adopted for other research designs, including systematic reviews/meta-analyses (PRISMA) (Moher et al. 2015), randomized trials (CONSORT) (Schulz et al. 2010), observational studies (STROBE) (von Elm et al. 2007), and case reports (CARE) (Gagnier et al. 2013). Details about these and other guidelines can be found on the Enhancing the QUAlity and Transparency Of health Research (EQUATOR) Network website (EQUATOR Network 2022). A checklist for open data archives has also been created (Consortium of European Social Science Data Archives 2022). PRICSSA has some items in common with the compliance checklist designed by the American Association for Public Opinion Research’s (AAPOR) Transparency Initiative (TI), which lists required elements for release in the methodology statements of research publications (AAPOR 2021). However, PRICSSA is unique compared to the TI checklist in that it places much more of a focus on transparency in the specific analytic techniques and software used, in particular for correct variance estimation given the features of a complex sample. Moreover, while the TI is designed for survey research organizations, PRICSSA is designed for survey data analysts and researchers. Therefore, PRICSSA can serve as a complement to AAPOR’s TI.
Below, we provide a detailed explanation of each PRICSSA item along with its justification. In addition, table 1 includes example text showing how each PRICSSA item could be reported in a manuscript (examples are either quoted from publications or, where unquoted, created by the authors).
Table 1. Itemized List of Each PRICSSA Item, a Detailed Description of Each Item, and Example Text of Each Item That Could Be Used in a Manuscript

PRICSSA item | Description | Example text
---|---|---
1.1 Data collection dates | Describe the survey’s data collection dates (e.g., range) to provide historical context that could affect survey responses and nonresponse. | “The survey was administered February 27, 2020–June 15, 2020” (Seidenberg et al. 2022).
1.2 Data collection mode(s) | Describe the survey’s data collection mode(s). Data collection mode can affect survey responses (e.g., to sensitive questions), including nonresponse, and a survey’s data collection mode may change over time (e.g., during the COVID-19 pandemic). | “Participants completed a self-administered mailed questionnaire in English or Spanish” (Wheldon et al. 2019).
1.3 Target population | State the target population the survey was designed to represent and describe all weighted estimates with respect to this target population. | “The target population for the NHIS is the civilian noninstitutionalized population residing within the 50 states and the District of Columbia at the time of the interview” (Centers for Disease Control and Prevention 2022).
1.4 Sample design | Describe the survey’s sample design, including information about stratification, cluster sampling, and unequal probabilities of selection. | “NHANES uses a complex, stratified, multistage probability cluster sampling design to select participants and collect data in 3-year cycles” (Shokeen and Sokal-Gutierrez 2022).
1.5 Survey response rate(s) | State the survey’s response rate and how it was calculated. | “Response rates were calculated using the RR4 formula of the American Association of Public Opinion Research. The weighted response rate for HINTS 5 Cycle 4 (2020) was 36.7% overall, with variation by sampling strata (27.3% for the high minority stratum and 40.3% for the low minority stratum)” (Blake et al. 2022).
2.1 Missingness rates | Report rates of missingness for variables of interest and models, and describe any methods (if any) for dealing with missing data (e.g., multiple imputation). | “An additional 3,046 observations (9.4%) were missing covariate data. Given systematic differences between observations with and without missing data (that is, participants with missing tobacco-use data were more likely to be older, male, nonwhite, unemployed, low-income, less educated, not have private-payer health insurance, been diagnosed with diabetes, and not visited the dentist in the past 12 months), we conducted multiple imputation analysis (15 imputations) using STATA's multiple imputation suite of commands” (Vora and Chaffee 2019).
2.2 Observation deletion | State whether any observations were deleted from the dataset. If observations were deleted, provide a justification. Note: it is best practice to avoid deleting cases and to use available subpopulation analysis commands no matter what variance estimation method is used. |
2.3 Sample sizes | Include unweighted sample sizes for all weighted estimates. |
2.4 Confidence intervals/standard errors | Include confidence intervals or standard errors when reporting all estimates to inform the reliability/precision of each estimate. | “Table 1 includes weighted point estimates and 95% CIs for support for each policy overall and by sociodemographic characteristics” (Seidenberg et al. 2022).
2.5 Weighting | State which analyses were weighted and specify which weight variables were used in analysis. | All analyses were weighted with sample weights (person_finwt0).
2.6 Variance estimation | Describe the variance estimation method used in the analysis and specify which design variables (e.g., PSU/stratum, replicate weights) were used. |
2.7 Subpopulation analysis | Describe the procedures used for conducting subpopulation analyses (e.g., Stata’s “subpop” command, SAS’s “domain” command). | “To stratify by sex in the analysis, we used the subpop() option in the svy commands that enables appropriate analysis of subpopulations in complex samples” (McCabe et al. 2022).
2.8 Suppression rules | State whether or not a suppression rule was followed (e.g., minimum sample size or relative standard error). | In accordance with the survey’s recommendations, we suppressed estimates for which the unweighted sample size was less than 30 or the estimate’s relative standard error exceeded 30%.
2.9 Software and code | Report which statistical software was used, comprehensively describe data management and analysis in the manuscript, and provide all statistical software code. | All design-based analyses were performed using Stata’s svy: commands (in StataSE, Version 17). These included svy: tab for cross-tabulation and svy: logit for estimation of logistic regression models. Code enabling replication of the analyses can be found in the following online repository…
2.10 Singleton problem (as needed) | Taylor series linearization requires at least two PSUs per stratum for variance estimation. Sometimes only a single PSU is available within a stratum for a given analysis. There are several possible fixes to this problem, which should be detailed if the singleton problem is encountered. | Stata’s “singleunit(centered)” option was used, which specifies that strata with a single PSU be centered at the grand mean instead of the stratum mean.
2.11 Public/restricted data (as needed) | If applicable, state whether the public use or restricted version of the dataset was analyzed. | “The analyses were conducted using the Restricted Use Data File (RUF)” (Elton-Marshall et al. 2020).
2.12 Embedded experiments (as needed) | If applicable, provide information about split sample embedded experiments (e.g., mode of data collection or varying participant incentives) and detail whether experimental factors were accounted for in the analyses. | Because half of the sample completed the survey online and half completed via telephone, survey mode was controlled for in all analyses.
PRICSSA also has implications for survey data providers as researchers will be unable to comply with PRICSSA if survey data providers fail to provide adequate supporting information about the survey or fail to provide survey design variables in a dataset (Kolenikov et al. 2020; Jabkowski et al. 2021). Compliance with PRICSSA may also be higher if survey data providers make identifying PRICSSA-related information easy and fast. Therefore, we encourage survey data providers to create and share a 1–2 page document that lists all of the survey’s PRICSSA-related content. Table 2 provides an example of such a document.
Table 2. Example Document Listing a Survey’s PRICSSA-Related Content

Name and wave of survey: Health Survey Wave 3
Data collection mode: in-home, computer-assisted personal interviewing (CAPI)
Dates of data collection: May 1, 2019–December 22, 2019
Target population: civilian noninstitutionalized adults (18 years of age or older) living in the 50 US states and Washington, DC
Populations excluded: active-duty military; persons in supervised care or custody in institutional settings
Design: multistage, stratified cluster sample
Variance estimation: Taylor series linearization
Weight and design variables: weight (finalweight), PSU (psu), stratum (stratum)
Unweighted total sample size: 30,000
Weighted total sample size: 270,000,000
Response rate: 65 percent (AAPOR RR4)
Location of example code: see “Health Survey Analytic Documentation”
While we feel that all PRICSSA items are important and should be included in papers reporting an analysis of complex sample survey data, journal word counts may be viewed as a barrier. We encourage authors and journals to utilize supplementary files where word counts prohibit authors from including all PRICSSA items in the manuscript’s text. Journals may create their own preferences for whether and how PRICSSA-related information should be presented as a supplementary file. In addition, there are situations where PRICSSA may not be appropriate to follow; for instance, when researchers develop statistical methodology using simulations of complex sample survey data or when users do not have access to microdata. Lastly, we welcome feedback from readers (which can be emailed to the corresponding author) and plan to revise and update PRICSSA in the future as needed.
2. PRICSSA ITEMS
1 PRICSSA Survey Characteristics
1.1 Data collection dates
Authors should describe the survey’s data collection dates (e.g., range). Detailing the data collection dates helps inform the historical context (e.g., COVID-19 pandemic) surrounding data collection, which could have impacted survey responses and survey nonresponse. In addition, survey year (e.g., 2019 survey) does not always correspond to dates of data collection (e.g., November–December 2018).
1.2 Data collection mode(s)
The survey’s data collection mode(s) should be clearly detailed, as data collection mode can affect the accuracy of survey responses (e.g., underreporting of risky behaviors) as well as nonresponse. Moreover, some surveys have changed their data collection modes over time, which has implications for trend analyses (Olson et al. 2021). For instance, many in-person surveys switched to telephone administration due to the COVID-19 pandemic.
1.3 Target population
Authors should clearly state the target population the survey was designed to represent (e.g., noninstitutionalized adults, 18 years of age or older) and describe all weighted estimates with respect to this target population. Previous reviews have identified numerous papers that fail to describe weighted results with respect to the target population and instead improperly interpret findings with respect to the sample (West et al. 2016).
1.4 Sample design
Authors should completely describe the survey’s sample design, including information about stratification, cluster sampling, and unequal probability of selection methods as appropriate. This information is critical to understanding how the data were collected and helps inform how the data should be analyzed.
1.5 Survey response rate
Authors should state the survey’s response rate and how it was calculated (e.g., AAPOR RR4; AAPOR 2022). Survey nonresponse is inherent to all complex sample surveys and has the potential to introduce bias (Groves and Peytcheva 2008).
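For reference, the AAPOR RR4 formula cited in table 1 counts both complete interviews (I) and partial interviews (P) as respondents, and weights cases of unknown eligibility (UH, unknown if household; UO, unknown other) by e, the estimated proportion of unknown-eligibility cases that are eligible:

$$\mathrm{RR4} = \frac{I + P}{(I + P) + (R + NC + O) + e\,(UH + UO)}$$

where R denotes refusals and break-offs, NC non-contacts, and O other non-interviews.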
2 PRICSSA Analytic Characteristics
2.1 Missingness rates
All complex sample survey datasets have missing data and missing data can introduce bias in estimates. Moreover, the inclusion of many variables with low rates of missingness in a model can result in considerable overall missingness due to listwise deletion. A variety of strategies have been developed for dealing with missing data (Allison 2001; Kalpourtzi et al. 2023). Authors should report rates of missingness for variables of interest and models and describe any methods (if any) for dealing with missing data (e.g., fractional imputation using proc surveyimpute in SAS) (Yang and Kim 2016).
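As one illustration, below is a minimal Stata sketch of multiple imputation followed by design-based estimation of the imputed data. The variable names (y, x1, x2, psu, stratum, finalweight), the imputation models, and the number of imputations are hypothetical placeholders, not recommendations.

```stata
* Minimal sketch: multiple imputation, then design-based analysis.
* All variable names here are hypothetical.
mi set flong
mi register imputed x1 x2                       // covariates with missing values
mi impute chained (regress) x1 (logit) x2 = y [pw=finalweight], ///
    add(15) rseed(12345)                        // x2 assumed binary
mi svyset psu [pweight=finalweight], strata(stratum)
mi estimate: svy: logit y x1 x2                 // combines results across imputations
```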
2.2 Observation deletion
Deleting observations drawn from complex sample survey data can lead to incorrect variance estimation when Taylor Series Linearization is used. For instance, observation deletion could reduce the number of strata or clusters. Deleting observations can sometimes also cause singleton problems (i.e., single PSU within a stratum). We note that while analysts would technically be performing a correct analysis when deleting ineligible cases and using replicate weights for variance estimation, it is best practice to avoid deleting cases and use available subpopulation analysis commands no matter what variance estimation method is used. Authors should state whether any observations were deleted from the dataset. If observations were deleted, authors should provide a justification.
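The contrast can be made concrete with a short Stata sketch (all variable names hypothetical): rather than dropping cases outside the group of interest, the full file is retained and estimation is restricted with subpop().

```stata
* Minimal sketch: avoid deleting cases; restrict estimation instead.
* Dropping observations (e.g., drop if smoker != 1) can remove stratum/PSU
* information needed for Taylor series linearization variance estimation.
svyset psu [pweight=finalweight], strata(stratum)
gen byte insubpop = (age >= 18 & smoker == 1)   // hypothetical subpopulation
svy, subpop(insubpop): mean outcome
```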
2.3 Sample sizes
The precision of an estimate is influenced by its unweighted sample size. Estimates for subpopulations of interest may be based on small sample sizes, which can produce imprecise estimates (evidenced by extremely wide confidence intervals) and can raise confidentiality concerns. Missing data can further reduce unweighted sample sizes. In some situations, authors may choose to suppress estimates based on sample sizes that are too small (see section 2.8). Authors should report unweighted sample sizes for all weighted estimates.
2.4 Confidence intervals or standard errors
Weighted complex sample survey data can produce unbiased estimates describing a population; however, these estimates may still be imprecise. When reporting findings, authors should include confidence intervals or standard errors with all estimates to convey the reliability/precision of each estimate.
2.5 Weighting and weight variables
Surveys frequently provide adjusted sample weights to account for unequal probabilities of selection, nonresponse, and poststratification and to produce nationally representative estimates. Authors should state which analyses were weighted and specify the type and name of the weight variables that were used in the analysis. Specifying which weight variables were used is needed because some surveys provide multiple weight variables in a dataset. For instance, surveys may provide cross-sectional weights, longitudinal weights, or weights intended for a subset of participants and measures (e.g., biomarker data were collected from a subset of participants). In addition, some surveys provide replicate weights for variance estimation. Specifying the types and names of weight variables used will help increase reproducibility and identify analytic mistakes. Moreover, analysts are encouraged to examine the sum of the sample weights as a data-quality check; the weights should sum to either the sample size or the population size. In the latter case, population totals may be estimated.
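A minimal Stata sketch of this check, assuming a single weight variable named finalweight:

```stata
* Data-quality check: the sample weights should sum to either the sample
* size or the (estimated) population size, depending on how they are scaled.
quietly summarize finalweight
display "Sum of weights: " %15.0fc r(sum)
display "Sample size:    " %15.0fc r(N)
```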
2.6 Variance estimation method and variables
Analysts must account for design features when analyzing data from complex samples. Common methods of variance estimation for complex sample survey data include Taylor series linearization, balanced repeated replication, jackknife repeated replication, and bootstrapping. Authors should describe the variance estimation method used in the analysis and specify which design variables (e.g., PSU/stratum, replicate weights) were used.
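In Stata, for example, both families of methods are declared up front with svyset; the sketch below assumes hypothetical design variables (psu, stratum, finalweight, and replicate weights repwt1-repwt80).

```stata
* Taylor series linearization using stratum and PSU identifiers:
svyset psu [pweight=finalweight], strata(stratum)

* Alternatively, balanced repeated replication using replicate weights
* supplied with the dataset:
svyset [pweight=finalweight], brrweight(repwt1-repwt80) vce(brr)
```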
2.7 Subpopulation analysis
Data analysts frequently produce estimates for subpopulations of interest, requiring the use of correct software procedures when analyzing complex sample survey data. For instance, Stata’s “subpop” option (available for all of Stata’s svy commands) and SAS’s “domain” command (available for nearly all of SAS’s SURVEY commands) can be used for appropriate subpopulation analysis (West et al. 2008; Heeringa et al. 2017). Incorrect procedures used for subpopulation estimates may effectively delete observations, potentially resulting in inaccurate variance estimation. Therefore, authors should describe the procedures used for conducting subpopulation analyses.
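A minimal Stata sketch (hypothetical variable names):

```stata
* Design-correct subpopulation estimation: the full design is declared once,
* and estimation is restricted without deleting any observations.
svyset psu [pweight=finalweight], strata(stratum)
svy, subpop(if sex == 2): mean outcome   // estimate for one subpopulation
svy: mean outcome, over(sex)             // estimates for each level of sex
```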
2.8 Suppression rule
Some surveys and statistical organizations recommend data analysts suppress results when a subpopulation sample size is too small (e.g., less than 30), a relative standard error is too large (e.g., greater than 30 percent), or a confidence interval is too wide (e.g., an absolute Korn–Graubard CI width ≥0.30). Such guidance is usually driven by concerns about confidentiality or the reliability and accuracy of an estimate. For instance, the National Center for Health Statistics has established suppression recommendations for proportions and created a SAS macro and Stata program to assist researchers in flagging estimates that should be considered for suppression (Bush and Elgaddal 2019; Ward 2019). While PRICSSA does not require authors to follow any suppression rule, PRICSSA recommends that authors state whether or not a suppression rule was followed.
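As an illustration, thresholds like those above can be checked directly after estimation; this Stata sketch assumes a hypothetical variable named outcome.

```stata
* Flag an estimate against common suppression thresholds
* (unweighted n < 30 or relative standard error > 30 percent).
svy: mean outcome
matrix t = r(table)                  // row 1 = estimate, row 2 = standard error
scalar rse = 100 * t[2,1] / t[1,1]
display "Unweighted n = " e(N) ", RSE = " %4.1f rse "%"
if e(N) < 30 | rse > 30 display "Estimate flagged for suppression"
```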
2.9 Software and code
While many common statistical software packages include procedures that can be used to analyze complex sample survey data (e.g., SAS, Stata, R, SUDAAN, SPSS), there are important software-specific differences. For instance, the default in Stata is to produce logit-transformed 95 percent confidence intervals for a weighted proportion estimate, while the default in SAS is to construct Wald confidence intervals (SAS 2021; Stata 2021). In addition, although Stata, SAS, R, and SUDAAN can estimate variance using Taylor series linearization or replicate weights, SPSS is limited to Taylor series linearization. Therefore, it is important that authors state which software and procedures were used in their analysis, including specific versions of software packages. Furthermore, to increase transparency and reproducibility, authors should accurately and comprehensively describe data management and analysis in their manuscript and provide all statistical software code. This is especially important when analysts use approximations when forming stratum and cluster codes and when analytic files collapse strata and PSUs for disclosure protection. A variety of free online repositories are currently available for sharing code, including GitHub, Open Science Framework, and Research Box (Center for Open Science 2022; GitHub 2022; Wharton Credibility Lab 2022).
2.10 (As needed): Singleton problems
Taylor series linearization requires at least two PSUs per stratum for variance estimation, but sometimes only a single PSU is available within a stratum for a given analysis. This can occur when subpopulation analyses are performed incorrectly (see section 2.7). Analysts should consider contacting the survey data providers if the singleton problem is encountered (and subpopulation analyses are being performed correctly). Analysts may also consider ad hoc fixes to this problem, such as specifying the singleunit option with Stata’s svyset command (Stata 2021). Analysts should disclose if they encounter the singleton problem and explain how it was handled.
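For example, in Stata the chosen fix is declared when the design is set (design variable names hypothetical):

```stata
* Handling strata that contain a single PSU at the design-declaration step.
* Available choices are singleunit(missing | certainty | scaled | centered).
svyset psu [pweight=finalweight], strata(stratum) singleunit(centered)
```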
2.11 (As needed): Public use versus restricted dataset
Some complex sample survey datasets (e.g., Population Assessment of Tobacco and Health) are made available in two forms: public use and restricted use (Hyland et al. 2017). Typically, public use datasets can be accessed by anyone without many constraints. In contrast, restricted use datasets provide additional and potentially identifiable or sensitive data and can only be accessed by adhering to strict research data access and analysis protocols. If applicable, authors should specify whether they are analyzing the public use or restricted version of a dataset.
2.12 (As needed): Embedded experiments
Some surveys include split sample experiments; for example, participants may respond to the survey using different modes or be offered different incentives. Analysts should provide information about these (and any other embedded experiments) and detail whether experimental factors were accounted for in the analyses.
3. CONCLUSIONS
Tremendous resources have been invested into survey design and data collection to produce accurate population estimates. It is critical to ensure that complex sample survey data are correctly analyzed by incorporating design features and that results reported in peer-reviewed publications do not misinform well-intended policymakers, practitioners, and researchers.
The PRICSSA checklist has the potential to increase the rigor and reproducibility of survey research by improving the quality of analysis and increasing transparency.
Opinions expressed by the authors are their own and this material should not be interpreted as representing the official viewpoint of the US Department of Health and Human Services, the National Institutes of Health, or the National Cancer Institute.