Michelle Jeong, Dongyu Zhang, Jennifer C Morgan, Jennifer Cornacchione Ross, Amira Osman, Marcella H Boynton, Jennifer R Mendel, Noel T Brewer, Similarities and Differences in Tobacco Control Research Findings From Convenience and Probability Samples, Annals of Behavioral Medicine, Volume 53, Issue 5, May 2019, Pages 476–485, https://doi.org/10.1093/abm/kay059
Abstract
Online convenience samples are a quick and low-cost way to study health behavior, but the comparability to findings from probability samples is not yet well understood.
We sought to compare convenience and probability samples’ findings for experiments, correlates, and prevalence in the context of tobacco control research.
Participants were a probability sample of 5,014 U.S. adults recruited by phone from September 2014 through May 2015 (cost ~U.S.$620,000) and an online convenience sample of 4,137 U.S. adults recruited through Amazon Mechanical Turk (MTurk) in December 2014 (cost ~U.S.$17,000). Participants completed a survey with experiments, measures of tobacco product use and demographic characteristics.
MTurk convenience and probability samples showed the same pattern of statistical significance and direction in almost all experiments (21 of 24 analyses did not differ) and observational studies (19 of 25 associations did not differ). Demographic characteristics of the samples differed substantially (1 of 17 estimates did not differ), with the convenience sample being younger, having more years of education, and including more Whites and Asians. Tobacco product use also differed substantially (1 of 22 prevalence estimates did not differ), with the convenience sample reporting more cigarette and e-cigarette use (median error 19%).
Using MTurk convenience samples can yield generalizable findings for experiments and observational studies. Prevalence estimates from MTurk convenience samples are likely to be over- or underestimates.
Introduction
Population-based sampling is a gold standard in behavioral research for assessing how common risk factors and health behaviors are [1, 2]. These samples are generated through various methods, including random-digit dialing and address-based sampling, which can be time- and cost-intensive to implement [1, 3, 4]. In contrast, convenience samples can save time, keep costs low, and thus permit behavioral scientists to study rare or hard-to-reach populations [2]. Underfunded behavioral research studies have recruited college students as research subjects for many decades, arguing that convenience samples are appropriate for testing theories of general mental processes that guide human behavior [5, 6]. Researchers have intensively debated the generalizability of findings from convenience samples [2], and especially college student samples [7]. A source of adult convenience samples that is increasingly popular as an alternative to college students is Amazon Mechanical Turk (MTurk), an online crowdsourcing platform that provides access to participants who can complete online tasks in a short amount of time at a low cost to the researcher [8, 9]. In recent years, the number of social science studies recruiting convenience samples via MTurk has risen [10], including within health behavior, leading to predictable questions about the generalizability of the findings.
To address this question of generalizability, we examined several outcomes of interest to health behavior researchers. We aimed to examine whether probability-based samples and MTurk convenience samples yield similar findings for experiments, a topic that has thus far received little attention. We also aimed to examine whether probability and MTurk convenience samples yield similar estimates for associations between demographics and health outcomes, as well as prevalence of demographics and health behaviors, given their utility in health behavior research. We addressed these research topics in our study using the example of tobacco control and prevention.
Materials and Methods
Participants
From September 2014 to May 2015, the Carolina Survey Research Laboratory (CSRL) at the University of North Carolina recruited a national U.S. probability sample of 5,014 English- or Spanish-speaking adults (aged 18 and older). Two independent random-digit dialing frames provided 98% coverage of U.S. adult households. To ensure inclusion of smokers, CSRL stratified both frames by household income and smoking rates at the county level, where the poorest counties with the highest smoking rates were oversampled. CSRL also oversampled cell phone numbers to ensure inclusion of young adults. Within the landline frame, if more than one eligible adult resided in the household, young adults and smokers were sampled at a higher rate than older adult nonsmokers. CSRL provided survey weights to account for the sampling design including stratification. Further details on the sampling approach and sample demographic characteristics are available elsewhere [11].
In December 2014, we recruited a convenience sample of 4,137 U.S. adults through MTurk. To oversample smokers, the recruitment advertisement encouraged participation of current users of cigarettes, little cigars/cigarillos, hookah, or e-cigarette/vaping devices, but it also welcomed nonusers. As is typical to ensure data quality in MTurk studies, we restricted participation to MTurk workers who had approval ratings of at least 90% and had previously completed at least 100 tasks.
Procedures
For the probability sample, phone interviewers used a computer-assisted telephone interviewing (CATI) system to administer the survey and record participants’ responses. Participants received $40 for completing the phone survey. The estimated cost of conducting the phone survey was $619,877. For the MTurk convenience sample, participants accessed and completed the survey online using Qualtrics software and received $3.60 upon completion. The cost of conducting the MTurk survey was $16,949. Informed consent was obtained from all individual participants included in both samples.
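To put the two budgets on a common scale, the reported totals can be divided by the number of completed surveys. The short sketch below does only that arithmetic, using the totals and sample sizes stated above. The per-participant figures and the cost ratio are derived here for illustration and are not values reported by the authors.

```python
# Reported totals (U.S. dollars) and completed sample sizes from the text.
samples = {
    "probability (phone)": {"total_cost": 619_877, "n": 5_014},
    "MTurk convenience": {"total_cost": 16_949, "n": 4_137},
}

# Derived quantity (not reported in the article): total cost per completed survey.
cost_per_participant = {
    name: s["total_cost"] / s["n"] for name, s in samples.items()
}

# Derived quantity: how many times more the phone survey cost overall.
cost_ratio = (
    samples["probability (phone)"]["total_cost"]
    / samples["MTurk convenience"]["total_cost"]
)

for name, cost in cost_per_participant.items():
    print(f"{name}: ${cost:.2f} per completed survey")
print(f"Total cost ratio (phone / MTurk): {cost_ratio:.1f}")
```

These totals include more than participant payments ($40 and $3.60, respectively), so the per-survey figures reflect overall study cost rather than incentives alone.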
Measures and Experiments
The probability and convenience samples completed a survey that was largely identical, with small changes to adapt the phone script to an online format. The survey administered several experiments that were embedded throughout. Several of these experiments examined whether learning about the presence or amount of chemicals in cigarette smoke changed people’s interest in natural or organic cigarettes, switching cigarette brands, or dual use of cigarettes and e-cigarettes. Other experiments examined whether messages about chemicals’ health effects or where chemicals can be found discouraged people from smoking, and whether targeting vulnerable demographic groups (teens, Blacks, Hispanics, or sexual minorities) or being a member of one of these vulnerable groups increased support for antismoking advertisements. An additional quasi-experiment examined the effect of features of chemical names (e.g., ending in some version of “ine,” such as “nicotine”) on discouragement from smoking. Further details are available in the articles already published [12–16] or in preparation (S. A. Baig et al., unpublished data, 2018).
The survey assessed key demographic variables, including age, race, ethnicity, sex, sexual orientation, education, numeracy, and household income, as well as general and mental health status. Most of these measures were adapted from large national surveys, including the 2013 Behavioral Risk Factor Surveillance System survey [17] and the 2011 Population Assessment of Tobacco and Health survey [18]. We defined current cigarette use as having smoked at least 100 cigarettes in one’s lifetime and currently smoking some days or every day. We defined use of e-cigarettes and other vaping devices, little cigars/cigarillos, hookah, and other tobacco products as having used the product in the past 30 days. Definitions of these products were provided before asking questions about use to ensure participant understanding of the products being discussed. Among current cigarette users, those who reported planning to quit smoking within the next 6 months were classified as having intentions to quit.
The survey assessed information seeking: (a) whether participants had looked for information on chemicals in cigarettes and cigarette smoke, and (b) where they would most like to see such information (on cigarette packs, in stores, or online). The survey also assessed participants’ beliefs regarding where most of the harmful chemicals in cigarettes and cigarette smoke come from (tobacco before it is made into cigarettes, tobacco additives, or burning the cigarette), and the amount of chemicals trapped by cigarette filters.
Survey software randomly assigned participants to one of six panels of questions about four chemicals in cigarette smoke (24 chemicals in total across the panels). For each of a participant’s four assigned chemicals, the survey assessed whether participants had heard that the chemical is in cigarette smoke and, if so, their beliefs about the harmfulness of the chemical. All participants received a question about how much the chemical discouraged them from wanting to smoke.
Finally, the survey assessed respondents’ opinion of a typical smoker their age and a typical e-cigarette user their age. We categorized responses of somewhat positive and very positive opinions as having positive attitudes toward smokers or e-cigarette users.
Data Analysis
To examine whether experimental findings were consistent across the two samples, we took a descriptive approach. As mentioned above, the findings of the experiments that reported results from both the probability and MTurk convenience samples were previously published [12–16] or are being prepared for publication (S. A. Baig et al., unpublished data, 2018). Thus, we extracted the results from the papers and noted whether for each paper, the two samples provided the same pattern of direction and statistical significance of influence of the experimental manipulations. Analyses of experiments for both samples were unweighted to preserve the randomization [19].
We conducted unadjusted logistic regressions to test whether associations among measured variables were consistent across the two samples. The outcomes were current cigarette use and current e-cigarette use. The predictors were demographic characteristics: age, sex, sexual orientation, ethnicity, race, education, and numeracy; as well as cigarette smoking status in analyses of current e-cigarette use. For the probability sample, analyses of associations among measured variables were weighted (see Boynton et al. [11] for detailed weighting procedures). Corresponding analyses for the MTurk convenience sample were unweighted because survey weights are not available for convenience samples.
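For a single binary predictor, an unadjusted, unweighted logistic regression gives an odds ratio equal to the cross-product ratio of the corresponding 2 × 2 table. The sketch below illustrates that case with a Wald test. The function name and the counts are hypothetical, chosen only for illustration, and the weighted probability-sample analyses would need survey-weighted estimation that this sketch does not attempt.

```python
import math

def unadjusted_odds_ratio(exposed_yes, exposed_no, ref_yes, ref_no):
    """Odds ratio, 95% Wald CI, and two-sided p-value from a 2 x 2 table.

    For one binary predictor this matches the estimate from an unweighted
    logistic regression of the outcome on that predictor.
    """
    odds_ratio = (exposed_yes * ref_no) / (exposed_no * ref_yes)
    log_or = math.log(odds_ratio)
    # Standard error of the log odds ratio (Wald approximation).
    se = math.sqrt(1 / exposed_yes + 1 / exposed_no + 1 / ref_yes + 1 / ref_no)
    z = log_or / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return odds_ratio, ci, p_value

# Hypothetical counts (NOT from the study): current smokers vs. nonsmokers
# among people who did not attend college and among the reference group.
odds_ratio, ci, p_value = unadjusted_odds_ratio(
    exposed_yes=300, exposed_no=700, ref_yes=600, ref_no=2400
)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p_value:.3g}")
```

A "+" entry in a comparison of this kind would correspond to an odds ratio above 1 with p < .05, and "n.s." to p ≥ .05.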
To examine whether prevalence estimates were consistent across the two samples, we calculated unweighted estimates for the MTurk convenience sample and weighted estimates for the probability sample. We calculated means for continuous variables (e.g., cigarettes smoked per day) and frequencies for categorical variables (e.g., Hispanic ethnicity) or levels of categorical variables (e.g., age 18–24). We subtracted the point estimate for the probability sample from the corresponding point estimate for the MTurk convenience sample. Finally, we examined whether the resulting statistic was different from zero using one-sample t-tests (for continuous variables) and one-sample proportion tests (for categorical variables). One-sample tests were ideal for our study because they do not require the statistics being compared to have equal variance. This approach presumes that weighted data obtained from the probability sample are a normative standard, which is appropriate given that the weighted demographic estimates were comparable to those from other nationally representative surveys [11]. We conducted these analyses for the overall sample, as well as stratified on smoking status (i.e., separately for current smokers and nonsmokers). Statistical analyses used two-tailed tests in Stata 13.0 [20], with a critical alpha of .05.
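The one-sample proportion test described above can be sketched as a z-test that treats the weighted probability-sample estimate as a fixed reference value. This is a minimal illustration under that assumption and is not the authors' Stata code. The function name, the count of 1,200, and the reference prevalence of 0.20 are hypothetical. Only the convenience sample size (4,137) comes from the text.

```python
import math

def one_sample_proportion_test(successes, n, p0):
    """Two-sided z-test of a sample proportion against a fixed reference p0."""
    p_hat = successes / n
    # Standard error under the null hypothesis that the true proportion is p0.
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_value

n_convenience = 4137          # MTurk sample size reported in the text
hypothetical_users = 1200     # hypothetical count, for illustration only
reference_prevalence = 0.20   # hypothetical weighted probability-sample estimate

p_hat, z, p_value = one_sample_proportion_test(
    hypothetical_users, n_convenience, reference_prevalence
)
difference = p_hat - reference_prevalence  # convenience minus probability
print(f"difference = {difference:.3f}, z = {z:.2f}, p = {p_value:.3g}")
```

Because the reference value is treated as known, the test uses only the convenience sample's sampling variability, which is why no equal-variance assumption between the two samples is involved.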
Results
Experiments
The experiments yielded strikingly similar results in the probability and convenience samples: 21 of 24 experimental findings showed the same pattern of statistical significance and direction (Table 1). In both samples, learning that chemicals are in cigarette smoke increased interest in “natural,” “organic,” and “additive-free” cigarettes (compared with “ultra-light” cigarettes), and learning that cigarettes have more harmful chemicals than e-cigarettes led to higher interest in dual use and higher perceived harm of cigarettes (compared with believing there are similar amounts) [13]. Also in both samples, support was highest for antismoking ads targeting teens (compared with ads targeting other vulnerable groups), and in-group members expressed greater support for ads than out-group members [14].
Table 1. Comparison between probability and convenience samples: Experiments and quasi-experiments
| Experiment | Description of experiment findings | Prob. | Conv. |
|---|---|---|---|
| The effect of health effect and found-in messages on discouragement from wanting to smoke | Message with health effect led to higher discouragement than message without | ✓ | ✓ |
| | Message with found-in led to higher discouragement than message without | ✓ | ✓ |
| | Message with both health effect and found-in led to higher discouragement than message with only found-in | | ✓ |
| The effect of chemical presence on interest in natural cigarettes | Learning that chemicals are in cigarette smoke increased interest in “organic” compared with “ultra-light” cigarettes | ✓ | ✓ |
| | Learning that chemicals are in cigarette smoke increased interest in “additive-free” compared with “ultra-light” cigarettes | ✓ | ✓ |
| | Learning that chemicals are in cigarette smoke increased interest in “natural” compared with “ultra-light” cigarettes | ✓ | ✓ |
| The effect of chemical amount on interest in switching brands or styles | Learning that current cigarettes had a lot more lead than other cigarettes led to increased interest in brand switching compared with learning about nicotine | ✓ | ✓ |
| | Learning that current cigarettes had a lot more formaldehyde than other cigarettes led to increased interest in brand switching compared with learning about nicotine | ✓ | ✓ |
| | Learning that current cigarettes had a lot more arsenic than other cigarettes led to increased interest in brand switching compared with learning about nicotine | ✓ | ✓ |
| | Learning that current cigarettes had a lot more carbon monoxide than other cigarettes led to increased interest in brand switching compared with learning about nicotine | | ✓ |
| | Learning that current cigarettes had a lot more ammonia than other cigarettes led to increased interest in brand switching compared with learning about nicotine | ✓ | ✓ |
| The effect of chemical amount on interest in dual use of cigarettes and e-cigarettes | Learning there were harmful chemicals in cigarettes but not in e-cigarettes led to higher interest in dual use, compared with “same amount” | ✓ | ✓ |
| | Learning there were 10 times more chemicals in cigarettes than e-cigarettes led to higher interest in dual use, compared with “same amount” | ✓ | ✓ |
| | Learning there were 100 times more chemicals in cigarettes than e-cigarettes led to higher interest in dual use, compared with “same amount” | ✓ | ✓ |
| | Learning there were harmful chemicals in cigarettes but not in e-cigarettes led to higher perceived harm of cigarettes, compared with “same amount” | ✓ | ✓ |
| | Learning there were 10 times more chemicals in cigarettes than e-cigarettes led to higher perceived harm of cigarettes, compared with “same amount” | ✓ | ✓ |
| | Learning there were 100 times more chemicals in cigarettes than e-cigarettes led to higher perceived harm of cigarettes, compared with “same amount” | ✓ | ✓ |
| The effect of targeting and group membership on support for antismoking ads | Ads targeting teens led to greater support than ads targeting Latinos, African Americans, or GLBs | ✓ | ✓ |
| | In-group members expressed more support for ads than out-group members | ✓ | ✓ |
| | Description of different industry marketing practices did not lead to different levels of support | ✓ | ✓ |
| Correlates of discouragement from smoking | Chemicals that people had heard are in cigarette smoke led to greater discouragement than ones they had not | ✓ | ✓ |
| | Chemicals with names starting with a number led to less discouragement than ones that did not | ✓ | ✓ |
| | Chemicals with names ending in “ene” or “ine” led to less discouragement than ones ending in “ide” or “yde” or another ending | ✓ | ✓ |
| | Chemicals with longer names led to greater discouragement than ones with shorter names | ✓ | |
Shaded cells indicate the same pattern of statistically significant findings between the probability (Prob.) and MTurk convenience (Conv.) samples. All data are unweighted.
✓ = finding was present and statistically significant (p < .05).
A message about chemicals in cigarette smoke led to higher discouragement from wanting to smoke in both samples if it presented additional information about the chemical’s health effect or about other toxic products where it is found; and a message with both types of additional information led to higher discouragement levels (compared to a message with only one type of additional information) in only the MTurk convenience sample [12]. Learning that their current cigarette brand had more lead, formaldehyde, arsenic, and ammonia than other brands led participants to report increased interest in brand switching (compared to learning about nicotine) in both samples; however, learning about carbon monoxide increased interest in brand switching only for the MTurk sample [16]. Finally, participants from both samples reported being less discouraged from smoking by chemicals that they had not previously heard are in cigarette smoke, chemicals with names starting with a number, or chemicals with names ending in “ene” or “ine”; however, only participants from the probability sample reported being more discouraged by chemicals with longer names [15].
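The descriptive comparison amounts to counting the findings for which both samples show the same pattern. The sketch below encodes each Table 1 row as a pair of flags (finding present and significant in the probability sample, and in the convenience sample) and tallies agreement. The short row labels are paraphrases introduced here for illustration. The flags follow the table and the paragraph above.

```python
# Each entry: (short label, present in probability sample, present in convenience sample).
# Labels are paraphrased for brevity; flags follow Table 1.
findings = [
    ("messages: health effect > none", True, True),
    ("messages: found-in > none", True, True),
    ("messages: both > found-in only", False, True),
    ("natural cigarettes: organic", True, True),
    ("natural cigarettes: additive-free", True, True),
    ("natural cigarettes: natural", True, True),
    ("brand switching: lead", True, True),
    ("brand switching: formaldehyde", True, True),
    ("brand switching: arsenic", True, True),
    ("brand switching: carbon monoxide", False, True),
    ("brand switching: ammonia", True, True),
    ("dual use interest: none in e-cigarettes", True, True),
    ("dual use interest: 10 times more", True, True),
    ("dual use interest: 100 times more", True, True),
    ("perceived harm: none in e-cigarettes", True, True),
    ("perceived harm: 10 times more", True, True),
    ("perceived harm: 100 times more", True, True),
    ("ad support: teens > other groups", True, True),
    ("ad support: in-group > out-group", True, True),
    ("ad support: marketing description, no difference", True, True),
    ("chemical names: heard of > not heard of", True, True),
    ("chemical names: starts with a number", True, True),
    ("chemical names: ends in ene/ine", True, True),
    ("chemical names: longer names", True, False),
]

# A finding is concordant when both samples show the same pattern.
concordant = [label for label, prob, conv in findings if prob == conv]
discordant = [label for label, prob, conv in findings if prob != conv]

print(f"{len(concordant)} of {len(findings)} findings did not differ")
for label in discordant:
    print("differs:", label)
```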
Associations Among Measured Variables
Analyses of associations among measured variables yielded similar findings for the two samples: 19 of 25 associations were comparable (Table 2). Whites (compared to Asians and American Indians or Alaska Natives), people who did not attend college, and people with low numeracy were more likely to be both current cigarette users and e-cigarette users. Black or African Americans (compared to Asians and American Indians or Alaska Natives), and American Indians or Alaska Natives (compared to Asians) were more likely to be cigarette users. Males, Whites (compared to Black or African Americans), those who identified as gay, lesbian, or bisexual (GLB), and current cigarette smokers were more likely to be e-cigarette users. Some associations found for the MTurk convenience sample deviated from those for the probability sample. Older adults (compared to young adults between the ages of 18 and 24) were more likely to be cigarette users (only in the MTurk convenience sample) and less likely to be e-cigarette users (only in the probability sample). Non-Hispanics were more likely to be cigarette users in the probability sample, while GLBs were more likely to be cigarette users in the MTurk convenience sample.
Comparison between probability and convenience samples: Associations among measured variables
Predictor . | Outcome . | |||
---|---|---|---|---|
Current cigarette use . | E-cigarette use . | |||
Probability . | Convenience . | Probability . | Convenience . | |
Older age, 25+ (ref: young age, 18–24) | n.s. | + | − | n.s. |
Male (ref: female) | n.s. | n.s. | + | + |
Non-Hispanic (ref: Hispanic) | + | n.s. | n.s. | n.s. |
GLB (ref: heterosexual) | n.s. | + | + | + |
Race | ||||
White (ref: Black or African American) | n.s. | + | + | + |
White (ref: Asian) | + | + | + | + |
White (ref: American Indian or Alaska Native) | + | + | + | + |
Black or African American (ref: Asian) | + | + | n.s. | n.s. |
Black or African American (ref: American Indian or Alaska Native) | + | + | n.s. | n.s. |
American Indian or Alaska Native (ref: Asian) | + | + | n.s. | + |
Did not attend college (ref: attended college) | + | + | + | + |
Low numeracy (ref: high numeracy) | + | + | + | + |
Smoker (ref: nonsmoker) | n/a | n/a | + | + |
Shaded cells indicate the same pattern of statistically significant findings between the probability and MTurk convenience samples. Data from the probability sample were weighted; data from the MTurk convenience sample were unweighted.
ref = reference group; GLB = gay, lesbian, or bisexual; + = higher level of product use compared to reference group (p < .05); − = lower level of product use compared to reference group (p < .05); n.s. = nonsignificant difference in level of product use compared to reference group; n/a = we did not assess the association because, by definition, all smokers reported current cigarette use.
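The count of comparable associations can be re-derived from Table 2. The following is a minimal sketch rather than the analysis code used in the study: the significance patterns are transcribed by hand from the table, and the names `TABLE_2` and `tally_agreement` are hypothetical. It treats an association as comparable when both samples show the same symbol.

```python
# Significance patterns transcribed by hand from Table 2 (illustrative only).
# Each row: (predictor, cigarette_prob, cigarette_conv, ecig_prob, ecig_conv)
# "+" / "-" = significantly higher / lower use than the reference group,
# "ns" = nonsignificant, None = association not assessed (n/a).
TABLE_2 = [
    ("Older age, 25+ (ref: 18-24)", "ns", "+", "-", "ns"),
    ("Male (ref: female)", "ns", "ns", "+", "+"),
    ("Non-Hispanic (ref: Hispanic)", "+", "ns", "ns", "ns"),
    ("GLB (ref: heterosexual)", "ns", "+", "+", "+"),
    ("White (ref: Black or African American)", "ns", "+", "+", "+"),
    ("White (ref: Asian)", "+", "+", "+", "+"),
    ("White (ref: American Indian or Alaska Native)", "+", "+", "+", "+"),
    ("Black or African American (ref: Asian)", "+", "+", "ns", "ns"),
    ("Black or African American (ref: American Indian or Alaska Native)", "+", "+", "ns", "ns"),
    ("American Indian or Alaska Native (ref: Asian)", "+", "+", "ns", "+"),
    ("Did not attend college (ref: attended college)", "+", "+", "+", "+"),
    ("Low numeracy (ref: high numeracy)", "+", "+", "+", "+"),
    ("Smoker (ref: nonsmoker)", None, None, "+", "+"),
]


def tally_agreement(rows):
    """Split assessed associations into those that match and those that differ."""
    same, differ = [], []
    for predictor, cig_prob, cig_conv, ecig_prob, ecig_conv in rows:
        pairs = (
            ("cigarette", cig_prob, cig_conv),
            ("e-cigarette", ecig_prob, ecig_conv),
        )
        for outcome, prob, conv in pairs:
            if prob is None or conv is None:
                continue  # association not assessed
            (same if prob == conv else differ).append((predictor, outcome))
    return same, differ


same, differ = tally_agreement(TABLE_2)
print(f"{len(same)} of {len(same) + len(differ)} associations matched")
for predictor, outcome in differ:
    print(f"differs: {predictor} -> {outcome} use")
```

Under this reading, the tally reproduces the 19 of 25 comparable associations reported in the text and lists the six that differ.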
Prevalence
As expected, the samples differed on most prevalence estimates for demographic characteristics: Only 1 of 17 estimates did not differ, with a median difference of 19% (Table 3). The MTurk convenience sample was considerably younger, with a mean age of 35 (compared to a mean age of 47 in the probability sample) and a higher percentage of 18- to 44-year-olds. The MTurk convenience sample was also notably more educated: 87% had attended college and 92% had high numeracy, compared with 58% and 68%, respectively, of the probability sample. Although both samples were predominantly White, the racial categories were present in different proportions such that the probability sample was generally more diverse (with more individuals identifying as Black or African American, American Indian or Alaska Native, and “Other” racial identity). The two samples were consistent in terms of sex (about 49% of both samples were male).
Table 3. Comparison between probability and convenience samples: Demographic and tobacco-related prevalence estimates
Characteristic | Overall: Probability (n = 5,014), n | Overall: Probability, % [95% CI] | Overall: Convenience (n = 4,137), n | Overall: Convenience, % difference from prob. sample | Smokers: Probability (n = 1,151), % [95% CI] | Smokers: Convenience (n = 1,441), % difference from prob. sample | Nonsmokers: Probability (n = 3,856), % [95% CI] | Nonsmokers: Convenience (n = 2,687), % difference from prob. sample |
---|---|---|---|---|---|---|---|---|
Age | ||||||||
18–24 | 711 | 12.7 [11.3, 14.2] | 778 | +6.1** | 10.9 [8.6, 13.8] | +1.2 | 13.0 [11.5, 14.8] | +9.5** |
25–44 | 1,612 | 33.2 [30.8, 35.6] | 2,668 | +31.2** | 41.5 [36.0, 47.2] | +27.0** | 31.4 [28.9, 34.1] | +31.1** |
45–64 | 1,883 | 36.7 [34.3, 39.1] | 641 | −21.2** | 40.0 [34.7, 45.5] | −21.7** | 36.0 [33.3, 38.7] | −21.9** |
65+ | 789 | 17.5 [15.3, 19.9] | 41 | −16.5** | 7.6 [5.4, 10.7] | −6.6** | 19.6 [17.1, 22.4] | −18.6** |
Race | ||||||||
White | 3,473 | 68.3 [65.9, 70.6] | 3,433 | +14.7** | 66.7 [61.3, 71.7] | +20.2** | 68.6 [65.9, 71.2] | +12.5** |
Black/African American | 978 | 18.3 [16.4, 20.4] | 313 | −10.7** | 21.9 [17.5, 27.1] | −15.5** | 17.6 [15.5, 19.8] | −9.5** |
American Indian/ Alaska Native | 135 | 1.9 [1.4, 2.7] | 39 | −1.0** | 2.9 [1.6, 5.0] | −1.9** | 1.7 [1.1, 2.6] | −0.8** |
Asian | 104 | 2.4 [1.9, 3.1] | 207 | +2.6** | 0.9 [0.4, 1.7] | +1.7** | 2.8 [2.1, 3.6] | +3.5** |
Other | 291 | 9.0 [7.6, 10.6] | 142 | −5.6** | 7.6 [5.3, 10.8] | −4.5** | 9.3 [7.7, 11.2] | −5.7** |
Hispanic | 432 | 14.2 [12.5, 16.1] | 340 | −6.0** | 10.1 [7.4, 13.7] | −2.4** | 15.0 [13.1, 17.2] | −6.5** |
Male | 2,372 | 48.5 [46.0, 51.0] | 2,042 | +0.9 | 50.7 [45.2, 56.3] | −2.3 | 48.0 [45.1, 50.8] | +1.9* |
Gay, lesbian, or bisexual | 192 | 3.2 [2.6, 4.0] | 478 | +8.4** | 4.5 [3.1, 6.4] | +8.7** | 3.0 [2.3, 3.8] | +7.7** |
Did not attend college | 1,749 | 42.4 [39.8, 45.1] | 559 | −28.9** | 54.5 [49.0, 60.0] | −35.7** | 39.8 [36.8, 42.9] | −29.1** |
Low numeracy | 1,599 | 31.9 [29.6, 34.3] | 344 | −23.6** | 38.3 [33.2, 43.7] | −28.4** | 30.5 [27.9, 33.2] | −23.1** |
Below federal poverty level | 868 | 17.5 [15.4, 19.7] | 589 | −3.2** | 28.1 [23.8, 32.8] | −8.0** | 15.3 [13.0, 17.9] | −2.0** |
Fair/poor general health | 787 | 14.9 [13.0, 17.1] | 469 | −3.5** | 24.0 [19.5, 29.1] | −8.1** | 13.0 [10.9, 15.5] | −4.0** |
Fair/poor mental health | 424 | 8.7 [7.3, 10.3] | 415 | +1.3** | 21.3 [16.8, 26.7] | −10.9** | 5.9 [4.7, 7.5] | +4.0** |
Uses | ||||||||
Cigarettes | 1,151 | 17.8 [16.0, 19.7] | 1,441 | +17.1** | 100.0 | 0.0 | n/a | |
E-cigarettes | 532 | 8.9 [7.7, 10.3] | 1,221 | +20.7** | 29.2 [24.7, 34.2] | +22.3** | 4.5 [3.5, 5.7] | +13.3** |
Little cigars/ cigarillos | 378 | 7.4 [6.2, 8.9] | 643 | +8.2** | 21.5 [17.4, 26.2] | +6.9** | 4.4 [3.3, 5.9] | +4.2** |
Hookah | 139 | 2.9 [2.3, 3.8] | 384 | +6.4** | 6.6 [4.3, 9.8] | +8.8** | 2.2 [1.5, 3.0] | +3.8** |
Other tobacco products | 323 | 5.9 [4.9, 7.0] | 299 | +1.5** | 9.1 [6.7, 12.3] | +0.5 | 5.2 [4.1, 6.5] | +0.8 |
Intends to quit smoking | 550 | 47.1 [41.6, 52.7] | 706 | −1.9 | 48.2 [42.6, 53.8] | −3.2* | n/a | |
n | Mean [95% CI] | n | Mean difference from prob. sample | Mean [95% CI] | Mean difference from prob. sample | Mean [95% CI] | Mean difference from prob. sample | |
Cigarettes smoked per day | 809 | 16.0 [14.8, 17.1] | 923 | −1.0** | 16.0 [14.9, 17.2] | −0.9** | n/a |
Shaded cells indicate nonsignificant differences between the probability and MTurk convenience samples. Data from the probability sample are weighted; data from the MTurk convenience sample are unweighted.
CI = confidence interval.
*p < .05; **p < .01.
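The education and numeracy percentages quoted in the text can be recovered from Table 3 with simple arithmetic. The sketch below assumes that the "% difference from prob. sample" column is the convenience-sample percentage minus the probability-sample percentage, in percentage points; that is an interpretation of the table, and the names `TABLE_3_OVERALL`, `convenience_pct`, and `complement` are hypothetical.

```python
# Overall-column values transcribed by hand from Table 3 (illustrative only):
# label -> (probability sample %, difference for the convenience sample).
# Assumption: the difference is convenience minus probability, in percentage points.
TABLE_3_OVERALL = {
    "Did not attend college": (42.4, -28.9),
    "Low numeracy": (31.9, -23.6),
    "Male": (48.5, 0.9),
}


def convenience_pct(prob_pct, diff):
    """Convenience-sample percentage implied by a Table 3 row."""
    return round(prob_pct + diff, 1)


def complement(pct):
    """Percentage in the opposite category (e.g., attended college)."""
    return round(100.0 - pct, 1)


for label, (prob, diff) in TABLE_3_OVERALL.items():
    conv = convenience_pct(prob, diff)
    print(f"{label}: probability {prob}%, convenience {conv}%")

# Complements give the shares that attended college and had high numeracy.
attended_college_prob = complement(42.4)
attended_college_conv = complement(convenience_pct(42.4, -28.9))
high_numeracy_prob = complement(31.9)
high_numeracy_conv = complement(convenience_pct(31.9, -23.6))
print(f"Attended college: {attended_college_prob}% vs {attended_college_conv}%")
print(f"High numeracy: {high_numeracy_prob}% vs {high_numeracy_conv}%")
```

Under that assumption, the implied values are about 57.6% versus 86.5% for college attendance and 68.1% versus 91.7% for high numeracy, which agree within rounding with the 58%, 87%, 68%, and 92% reported in the text.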
Tobacco use and other tobacco-related constructs also differed substantially between the samples: Only 1 of 22 prevalence estimates did not differ (Tables 3 and 4). The MTurk convenience sample had many more cigarette users and more users of e-cigarettes and other tobacco products than the probability sample. Intentions to quit smoking were consistent across the samples (present among 47% of the probability sample and 49% of the MTurk convenience sample). For awareness of cigarette chemicals, perceived harm of chemicals, and discouragement from smoking due to the presence of chemicals, the differences between the samples were statistically significant but very small in magnitude. Most notably, more participants in the MTurk convenience sample reported having looked for information about chemicals in cigarettes (39%) and wanting to see such information online (45%), compared to the probability sample (28% and 29%, respectively). More people in the MTurk convenience sample also believed that the source of chemicals was tobacco additives and that cigarette filters trapped some of the chemicals (rather than all, a lot, or none).
Table 4. Comparison between probability and convenience samples: Tobacco-related prevalence estimates
Measure | Overall: Probability, n | Overall: Probability, mean [95% CI] | Overall: Convenience, n | Overall: Convenience, mean difference from prob. sample | Smokers: Probability, mean [95% CI] | Smokers: Convenience, mean difference from prob. sample | Nonsmokers: Probability, mean [95% CI] | Nonsmokers: Convenience, mean difference from prob. sample |
---|---|---|---|---|---|---|---|---|
Chemical awareness | 4,555 | 0.43 [0.41, 0.45] | 4,053 | −0.06** | 0.43 [0.39, 0.47] | −0.05** | 0.43 [0.40, 0.45] | −0.06** |
Chemical harm | 3,071 | 3.22 [3.17, 3.27] | 2,867 | −0.36** | 2.99 [2.86, 3.12] | −0.31** | 3.27 [3.22, 3.32] | −0.31** |
Chemical discouragement | 5,009 | 3.44 [3.40, 3.49] | 4,137 | −0.35** | 2.88 [2.78, 2.98] | −0.25** | 3.57 [3.52, 3.62] | −0.26** |
n | % [95% CI] | n | % difference from prob. sample | % [95% CI] | % difference from prob. sample | % [95% CI] | % difference from prob. sample | |
Positive attitude toward smokers | 404 | 8.0 [6.3, 10.1] | 219 | −2.7** | 14.4 [11.3, 18.3] | −3.8** | 6.6 [4.8, 9.2] | −4.2** |
Positive attitude toward e-cigarette users | 833 | 16.7 [14.6, 18.9] | 848 | +3.8** | 32.2 [26.8, 38.1] | −3.7** | 13.2 [11.2, 15.7] | +3.0** |
Belief about source of chemicals | ||||||||
Tobacco before it is made into a cigarette | 409 | 7.7 [6.5, 9.1] | 157 | −3.9** | 6.5 [4.1, 10.0] | −3.7** | 8.0 [6.7, 9.6] | −3.7** |
Tobacco additives | 3,074 | 59.9 [57.3, 62.4] | 2,987 | +12.3** | 74.1 [68.9, 78.7] | +2.7* | 56.8 [53.9, 59.7] | +13.0** |
Burning the cigarette | 1,417 | 29.9 [27.5, 32.4] | 986 | −6.1** | 18.3 [14.3, 23.0] | +1.8 | 32.4 [29.7, 35.3] | −6.6** |
Belief about amount of chemicals trapped by cigarette filtersᵃ | ||||||||
All or a lot | 467 | 9.3 [8.0, 10.7] | 215 | −4.1** | 7.5 [5.2, 10.9] | −1.3 | 9.7 [8.3, 11.3] | −5.1** |
Some | 2,715 | 53.7 [51.1, 56.2] | 2,911 | +16.7** | 61.9 [56.7, 66.9] | +10.6** | 51.8 [49.0, 54.7] | +17.5** |
None | 1,611 | 32.7 [30.2, 35.2] | 897 | −11.0** | 25.8 [21.8, 30.3] | −6.5** | 34.2 [31.4, 37.1] | −11.2** |
Looked for chemical information | 1,468 | 27.5 [25.4, 29.7] | 1,606 | +11.3** | 34.3 [29.0, 40.0] | +4.6** | 26.0 [23.8, 28.5] | +12.8** |
Would most like to see chemical information | ||||||||
On cigarette packs | 2,667 | 54.8 [52.3, 57.3] | 2,014 | −6.1** | 57.2 [51.8, 62.5] | −15.2** | 54.3 [51.5, 57.1] | −2.0* |
In store | 774 | 15.0 [13.3, 16.7] | 251 | −8.9** | 11.6 [8.4, 15.9] | −5.7** | 15.7 [13.9, 17.7] | −9.5** |
Online | 1,503 | 28.7 [26.6, 31.0] | 1,870 | +16.5** | 28.8 [24.4, 33.6] | +23.2** | 28.7 [26.3, 31.3] | +12.9** |
Shaded cells indicate nonsignificant differences between the probability and MTurk convenience samples. Data from the probability sample were weighted; data from the MTurk convenience sample were unweighted.
CI = confidence interval.
ᵃ The four-point response scale ranged from “all of the harmful chemicals” (coded as 1) to “none of them” (4).
*p < .05; **p < .01.
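The "1 of 22" figure for tobacco-related estimates can be checked against the significance markers in the tables. The sketch below assumes that the 22 estimates are the overall-column entries of the tobacco-use rows of Table 3 plus all rows of Table 4; that is one reading of the text, not something it states. The entries are transcribed by hand, with an ASCII minus sign, and the names `TOBACCO_DIFFS` and `is_significant` are hypothetical.

```python
# Overall "difference from prob. sample" entries, transcribed by hand
# (illustrative only). Asterisks mark p < .05 (*) or p < .01 (**).
TOBACCO_DIFFS = {
    # Table 3, tobacco-use rows
    "Cigarettes": "+17.1**",
    "E-cigarettes": "+20.7**",
    "Little cigars/cigarillos": "+8.2**",
    "Hookah": "+6.4**",
    "Other tobacco products": "+1.5**",
    "Intends to quit smoking": "-1.9",
    "Cigarettes smoked per day": "-1.0**",
    # Table 4
    "Chemical awareness": "-0.06**",
    "Chemical harm": "-0.36**",
    "Chemical discouragement": "-0.35**",
    "Positive attitude toward smokers": "-2.7**",
    "Positive attitude toward e-cigarette users": "+3.8**",
    "Source: tobacco before it is made into a cigarette": "-3.9**",
    "Source: tobacco additives": "+12.3**",
    "Source: burning the cigarette": "-6.1**",
    "Filters trap: all or a lot": "-4.1**",
    "Filters trap: some": "+16.7**",
    "Filters trap: none": "-11.0**",
    "Looked for chemical information": "+11.3**",
    "Want information: on cigarette packs": "-6.1**",
    "Want information: in store": "-8.9**",
    "Want information: online": "+16.5**",
}


def is_significant(entry):
    """An entry is flagged significant when it carries at least one asterisk."""
    return entry.endswith("*")


nonsignificant = [
    name for name, entry in TOBACCO_DIFFS.items() if not is_significant(entry)
]
print(f"{len(nonsignificant)} of {len(TOBACCO_DIFFS)} estimates did not differ")
print(nonsignificant)
```

Under this reading, intention to quit smoking is the only overall estimate without a significance marker, which matches the count reported in the text.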
When stratified by smoking status, the probability and MTurk convenience samples continued to have different point estimates on demographic and tobacco-related variables, with a few exceptions. Among smokers, the two samples had comparable proportions of people who were 18–24 years old, were male, used other tobacco products, believed that the source of cigarette chemicals was burning the cigarette, and believed that all or a lot of chemicals were trapped by cigarette filters. Among nonsmokers, the two samples had similar proportions of users of other tobacco products.
Discussion
Online convenience samples are an increasingly popular tool for recruiting participants in health behavior research, but surprisingly little is known about how they differ from probability samples designed to represent the general public. Using the example of tobacco product use, our study found that an online MTurk convenience sample provided experimental results and associations among measured variables that closely mirrored those obtained with a probability sample. Prevalence estimates showed the opposite pattern: the samples differed on almost every variable we assessed.
Experiments and Associations Among Measured Variables
Using MTurk convenience samples is especially promising for experiments and for assessing associations among measured variables. Our study found that an MTurk sample provided findings comparable to those from a probability sample across six experiments and quasi-experiments. Although effect sizes sometimes varied between the two samples (Supplementary Table S1), the substantive conclusions drawn from the experiments were the same in both samples. Furthermore, our findings paralleled those of studies from political science [21, 22], sociology [23], and evolutionary game theory [24], which also reported equivalence in experimental findings between MTurk convenience and national probability samples.
Few studies have compared associations between measured variables as estimated for MTurk convenience and probability samples [25]. In our study, the same demographic characteristics and tobacco use behaviors were generally associated with current cigarette and e-cigarette use in the two samples, with only a few exceptions. Overall, the effect sizes produced by the probability sample tended to be slightly larger than those produced by the convenience sample (Supplementary Table S2), but the results pointed to the same substantive findings. Our study contributes to the literature by demonstrating the potential for using MTurk convenience samples to examine associations with different outcomes of interest, particularly in tobacco control research and potentially in studies of other health behaviors.
The many equivalent findings offer strong support for use of convenience samples in early stages of development of stimuli for interventions, health promotion campaigns, and other larger, more costly research projects. The MTurk survey of the national convenience sample took about a month to complete (about 2 weeks for programming, 3 days for data collection in MTurk, and a week to prepare a usable dataset). In comparison, the phone survey of the national probability sample took about a year to complete (about 2 months for programming, 9 months for data collection, and 2 months to prepare a usable dataset with survey weights). The MTurk survey was also far cheaper than the phone survey (<3% of the cost). Overall, it is fair to suggest that MTurk is an efficient and low-cost way to examine both observational and causal relationships in early stages of research.
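The cost and time comparison above reduces to a few lines of arithmetic. The sketch below uses the approximate costs reported in the abstract (about U.S.$620,000 for the phone survey and about U.S.$17,000 for the MTurk survey) and the stage durations given in this paragraph. The conversions of 7 days per week and 30 days per month are simplifying assumptions, and the variable names are illustrative.

```python
# Approximate figures as reported in the article; all values are rounded.
PHONE_COST_USD = 620_000   # probability (phone) sample
MTURK_COST_USD = 17_000    # MTurk convenience sample

cost_ratio = MTURK_COST_USD / PHONE_COST_USD
print(f"MTurk cost as a share of phone cost: {cost_ratio:.1%}")

# Stage durations, converted to days (assumed: 7 days/week, 30 days/month).
DAYS_PER_WEEK = 7
DAYS_PER_MONTH = 30

# MTurk: 2 weeks programming + 3 days data collection + 1 week data preparation
mturk_days = 2 * DAYS_PER_WEEK + 3 + 1 * DAYS_PER_WEEK

# Phone: 2 months programming + 9 months data collection + 2 months data preparation
phone_days = (2 + 9 + 2) * DAYS_PER_MONTH

print(f"MTurk survey: about {mturk_days} days")
print(f"Phone survey: about {phone_days} days")
```

With these inputs the MTurk cost is a little under 3% of the phone survey cost, consistent with the "<3%" figure in the text.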
Prevalence Estimates
The MTurk convenience and probability samples yielded different point estimates for almost every variable we examined. The MTurk convenience sample was younger, more educated, and included a greater representation of Whites and Asians. These demographic differences were similar to those found in previous studies that examined MTurk workers’ characteristics in other contexts [10, 22, 26, 27]. The MTurk convenience sample also had a higher representation of cigarette and other tobacco product users. This may be because tobacco users were specifically encouraged to participate in the study, but the result also mirrors findings from previous research that tobacco users tend to be overrepresented in MTurk samples [10, 25]. Our findings build on the results of a recent study that found crowdsourcing yielded prevalence estimates for various health behaviors that were very different from those for the general public [28]. Such differences raise concerns about the external validity of inferences about prevalence based on MTurk convenience samples. Researchers hoping to use MTurk solely to collect population prevalence data should do so with caution, ideally confirming the data with representative samples acquired through probability sampling.
Our findings support the use of MTurk for behavioral research in specific applications. While MTurk convenience samples may not yield prevalence estimates that are representative of the population, they are more representative than restricted in-person convenience samples such as college students [22, 29]. As long as researchers recognize that prevalence data for MTurk participants are likely biased, MTurk may, when an online study is suitable, be preferable to a student convenience sample. Moreover, in a world of limited resources, researchers need to make tradeoffs among validity types in any single study [30], and external validity is of lower priority to researchers in many instances, especially in studies aiming to develop and test measures or early mechanistic studies.
Limitations and Conclusions
Study surveys were administered using different interview modes, online and by phone. Although mode differences may account for the differences that we attribute to sampling method, this seems unlikely given the observed pattern of findings. Bias attributable to self-deception or impression management triggered by talking on the phone would have yielded larger and more consistent differences in tobacco-related beliefs and behaviors than we found, with much lower estimates in the phone survey across all pro-tobacco beliefs and behaviors. Furthermore, the two modes provided results that were highly comparable for both experimental and observational findings, minimizing the likelihood of self-presentation bias. Another caveat is that our study focused only on adult samples in the United States and examined findings within the specific context of tobacco use. Future research may extend the generalizability of our study’s findings to children, non-U.S. populations, and health behaviors beyond tobacco product use.
Convenience sampling, including via online crowdsourcing platforms like MTurk, has many benefits. It can be a low-cost tool to quickly test ideas and refine different aspects of the research before investing substantial time and money in larger-scale studies. MTurk is also a useful platform for conducting large population-based experiments that allow researchers to minimize threats to internal validity and, for estimates of association, external validity. Admittedly, convenience samples have several limitations that warrant extra caution as researchers implement their studies and interpret their findings. Nonetheless, our study supports previous findings that suggest that MTurk is a valid recruitment tool for pilot or experimental studies, extends those findings to tobacco control research, and provides a foundation for use of online convenience samples in the domain of health behavior.
Acknowledgements
Research reported in this publication was supported by grant number P50CA180907 from the National Cancer Institute and FDA Center for Tobacco Products (CTP). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or the Food and Drug Administration. The authors are grateful to Robert Agans, Kristen Jarmen, Seth Noar, Kurt Ribisl, and Quirina Vallejos for their contributions in the early stages of this study.
Compliance with Ethical Standards
Authors’ Statement of Conflict of Interest and Adherence to Ethical Standards: Dr. Brewer has served as a paid expert consultant in litigation against tobacco companies. The other authors declare no conflicts of interest.
Ethical Approval: The institutional review board at the University of North Carolina approved all study procedures.
Informed Consent: Informed consent was obtained from all individual participants included in the study.