Online research offers many benefits to nicotine and tobacco research, including the ability to conveniently and cheaply recruit large numbers of participants over a wide geographic area. However, online research also introduces an increased risk of participant deception and fraud.1–3 Heffner et al. introduced valuable techniques for detecting fraud that can be applied to a wide variety of study protocols.3 We offer an example of detecting and responding to participant deception and adapting to changing patterns of fraud in an online cohort study of adolescents and young adults.

The goal of the Policy and Communication Evaluation (PACE) Vermont Study is to understand the impact of state-level policies and communication campaigns on substance use beliefs and behaviors in young Vermonters.4 Study measures and findings are aimed at informing state-level prevention programming.5,6 Using a cohort design, participants aged 12–25 years are recruited via multiple methods (including online advertisements) to visit a study website and complete online eligibility screening and informed consent.4 Between 2020 and 2021, three enrollment periods were open during which participants were screened through Qualtrics surveys and then manually screened by a member of our study team. Automated survey features were employed within Qualtrics to prevent multiple screener attempts, identify potential duplicates, and detect bots7; screeners also included CAPTCHA verification and one of three simple math problems, which served as an attention check and were randomized so that the same response could not be copied across surveys. In consultation with our IRB, we included language in our online informed consent about discontinuing participants who provided information in the surveys that we believed to be intentionally falsified. The systematic method of screening participants for deception developed in our pilot study4 was revised and refined in subsequent waves based on changes in participant response patterns.
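As an illustration of how a randomized attention check of this kind can be validated after export, the following is a minimal sketch, not the study's actual code; the column names, answer key, and file name are hypothetical placeholders.

```python
# Minimal sketch, assuming a screener CSV export with hypothetical columns
# "math_item" (which of the three randomized problems was shown) and
# "math_answer" (the respondent's answer).
import pandas as pd

# Hypothetical answer key for the three randomized attention-check items.
ANSWER_KEY = {"item_a": 7, "item_b": 12, "item_c": 9}

def failed_attention_check(row: pd.Series) -> bool:
    """Return True if the answer does not match the item the respondent saw."""
    expected = ANSWER_KEY.get(row["math_item"])
    try:
        return int(row["math_answer"]) != expected
    except (TypeError, ValueError):
        return True  # missing or non-numeric answers count as failures

screener = pd.read_csv("screener_export.csv")
screener["flag_attention"] = screener.apply(failed_attention_check, axis=1)
```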

Beyond the multiple checks built into the screening surveys themselves, we employ a multiphase screening process to reduce participant deception in our PACE Vermont cohort: First, participants complete a brief interest survey to determine initial eligibility and obtain contact information. Second, participants meeting the initial eligibility criteria receive a screener survey via text message or email to ensure valid contact information; youth participants require an additional parent survey, linked to their response by a common identifier, that provides consent for them to participate in the study. Third, eligible participants complete a payment form that includes their mailing address. Finally, the research staff creates a unified dataset across these forms to conduct a manual assessment of eligibility; coding within this dataset verifies age based on the date of birth provided and location based on IP address. Manual checks confirm eligibility, consent, and completion of the payment form and flag participants based on the following criteria: (1) age (eg, providing a birth date that differs from self-reported age), (2) location outside Vermont (eg, IP address, mailing address), (3) suspicious contact information (eg, email addresses that include only numbers or names that do not correspond to contact information), and (4) inconsistency of name, location, or contact information across forms. Participants are informed about study eligibility via email after all checks are complete.
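To make the four manual criteria concrete, the sketch below shows how flags of this kind might be computed over the unified dataset. It is illustrative only, not the study's code; all column names, the file name, and the reference date are assumptions.

```python
# Minimal sketch of the four manual flag criteria applied to the unified dataset.
# All column names, the file name, and the reference date are hypothetical.
import re
from datetime import date

import pandas as pd

def age_from_dob(dob: str, ref: date) -> int:
    """Age in whole years on the reference date, from an ISO date of birth."""
    born = date.fromisoformat(dob)
    return ref.year - born.year - ((ref.month, ref.day) < (born.month, born.day))

def flag_row(row: pd.Series, screened_on: date) -> dict:
    return {
        # (1) reported age should match the age implied by the date of birth
        "flag_age": age_from_dob(row["dob"], screened_on) != int(row["reported_age"]),
        # (2) location outside Vermont (IP-derived state or mailing address)
        "flag_location": row["ip_state"] != "VT" or row["mail_state"] != "VT",
        # (3) suspicious contact information, eg, an all-numeric email local part
        "flag_email": bool(re.fullmatch(r"\d+", row["email"].split("@")[0])),
        # (4) inconsistent names across the interest, screener, and payment forms
        "flag_name": len({row["name_interest"].strip().lower(),
                          row["name_screener"].strip().lower(),
                          row["name_payment"].strip().lower()}) > 1,
    }

unified = pd.read_csv("unified_screening.csv")
flags = unified.apply(lambda r: flag_row(r, date(2020, 8, 1)), axis=1, result_type="expand")
unified = pd.concat([unified, flags], axis=1)
manual_review = unified[flags.any(axis=1)]  # cases needing team discussion
```

Flags computed this way support, rather than replace, the team's case-by-case review.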

Additional manual checks were introduced in later waves. In Fall 2020, we noticed a number of respondents reporting mailing addresses that corresponded to homes listed for sale on Zillow. Hypothesizing that “bots” may be pulling valid Vermont addresses from for-sale listings, we flagged these participants in our Fall 2020 and Spring 2021 screenings. Manual checks were reviewed regularly by the study team, and specific cases were resolved by discussion. After all checks were completed, potentially deceptive participants were flagged and received an email from the study team offering the opportunity to confirm their contact information and remain in the study. Respondents who did not respond via email, whose email responses employed strange formatting or language, or who did not confirm the correct contact information were deemed ineligible and removed from the study. In some cases, discrepancies in responses for youth participants were related to their lack of knowledge (eg, county of residence), and we received messages from a parent about their eligibility; we used the parent screening survey to verify the parent’s contact information and connection to the participant before deeming the participant valid and eligible.
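One way to operationalize the for-sale address check is sketched below, assuming the study team keeps a manually compiled list of addresses seen in current for-sale listings (no Zillow API is used or implied); the file and column names are hypothetical.

```python
# Minimal sketch, assuming a hand-maintained list of Vermont addresses found in
# current for-sale listings. File and column names are hypothetical.
import pandas as pd

def normalize_address(addr: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(addr.lower().split())

for_sale = {normalize_address(a) for a in pd.read_csv("for_sale_addresses.csv")["address"]}

unified = pd.read_csv("unified_screening.csv")
unified["flag_for_sale"] = unified["mail_address"].map(
    lambda a: normalize_address(a) in for_sale
)
```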

During each enrollment period, a high proportion of our total screened respondents were determined to be ineligible (49%–67%; Table 1). However, only a fraction of likely deceptive respondents were screened out automatically through Qualtrics, either because of responses to screening questions or Qualtrics’ Expert Review Fraud Detection features (10%–46%).7 Most ineligible, likely deceptive respondents were screened out manually using a process that evolved at each recruitment period (52%–90%).

Table 1.

Screening Eligibility by Automated and Manual Checks, PACE Vermont Study (2020–2021)

|  | Summer 2020 | Fall 2020 | Spring 2021 |
| --- | --- | --- | --- |
| Total screened | 3,052 | 292 | 305 |
| Total screened eligible | 1,405 (46%) | 95 (33%) | 154 (51%) |
| Total screened ineligible | 1,647 (54%) | 197 (67%) | 151 (49%) |
| Screened out by Qualtrics | 259/1,647 (16%) | 19/197 (10%) | 70/151 (46%) |
| Screened out manually | 1,388/1,647 (84%) | 178/197 (90%) | 81/151 (52%) |
| Ineligible but not fraudulent¹ | 0 | 10/178 (6%) | 9/81 (11%) |
| Location | 1,262/1,388 (91%) | 27/178 (15%) | 26/81 (32%) |
| Suspicious address | 22/1,388 (2%) | 10/178 (6%) | 0 |
| Suspicious email address | 65/1,388 (5%) | 27/178 (15%) | 34/81 (42%) |
| Suspicious form completion² | 5/1,388 (<1%) | 0 | 0 |
| Mismatch between names given | 30/1,388 (2%) | 1/178 (<1%) | 0 |
| Suspicious on multiple counts | 4/1,388 (<1%) | 29/178 (16%) | 0 |
| Town mismatch | 0 | 2/178 (1%) | 5/81 (6%) |
| Address listed for sale | 0 | 72/178 (40%) | 7/81 (9%) |

¹This group includes individuals who indicated they were within our population of interest, but whose birth date placed them outside this range (eg, an individual who states they are 12 years old, but whose birth date indicates they are 11 at the time of recruitment), and individuals who mistakenly went through the screening process but were already members of the cohort.

²This group includes individuals who incorrectly filled out fields in the interest form, screening questionnaire, or payment form (eg, by typing “I agree” when asked to provide their initials).

Our experience highlights the importance of flexible, dynamic methods of fraud detection that respond to “bot learning” and adaptation by ineligible participants. The rates of specific types of suspicious responses differed across waves of recruitment for the same cohort study, and new methods of screening were added in response to emerging patterns in the later waves. For example, most respondents screened out manually in Summer 2020 were flagged on IP location (91%), which indicated that they were not located within the geographic bounds of the state. In later waves, this type of fraud was less common than it had been in Summer 2020 (Fall 2020: 15%, Spring 2021: 32%), while suspicious patterns in reported email addresses increased (Fall 2020: 15%, Spring 2021: 42%). In Fall 2020, 40% of participants screened out manually had addresses listed for sale on Zillow, compared with 9% in Spring 2021. No single indicator of fraud was consistently useful. Instead, it was more valuable to organize screening data by response time and look for groups of responses that were unusual in similar ways and completed at roughly the same time.
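The response-time grouping can be approximated with a simple heuristic like the one below: sort flagged responses by submission timestamp and surface bursts submitted close together. This is a minimal sketch under assumed column names (flag_* columns coded 0/1, a submitted_at timestamp) and an arbitrary 10-minute window, not the study's code.

```python
# Minimal sketch of grouping flagged responses into submission-time "bursts".
# Assumes hypothetical flag_* columns coded 0/1 and a submitted_at timestamp.
import pandas as pd

unified = pd.read_csv("unified_screening.csv", parse_dates=["submitted_at"])
flag_cols = [c for c in unified.columns if c.startswith("flag_")]

suspicious = unified[unified[flag_cols].any(axis=1)].sort_values("submitted_at").copy()

# start a new burst whenever more than 10 minutes separate consecutive responses
gap = suspicious["submitted_at"].diff()
suspicious["burst_id"] = (gap > pd.Timedelta(minutes=10)).cumsum()

# bursts of two or more similarly flagged responses merit a closer manual look
burst_sizes = suspicious.groupby("burst_id").size()
clusters = suspicious[suspicious["burst_id"].isin(burst_sizes[burst_sizes >= 2].index)]
```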

Given the likelihood of both false positives and false negatives in the detection process, we conducted an ongoing assessment of the validity of survey responses and removed participants in the midst of data collection if we identified them as suspicious or deceptive. In the process of analyzing open-ended responses from our Winter 2020 survey, for example, the study team noticed patterns of unusual, nearly identical responses submitted at the same time by different respondents, some of whom had participated in multiple waves of the study. As with our recruitment approach, we contacted participants with this pattern of responses to offer them a chance to remain enrolled in the study. Few of these participants replied to the study team, and those who did used similar, unusual email structure and phrasing. As a result of these additional checks, we withdrew 75 participants from the cohort (8% of the sample) prior to the spring survey. We retain the information for these participants within our full contact list (flagged as withdrawn) to allow us to identify potential duplicates or deceptive participants in future waves of recruitment.
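A lightweight way to surface such near-identical open-ended responses is pairwise string similarity within a narrow submission window, as in the sketch below; it is illustrative only, with hypothetical file and column names, and the one-hour window and 0.9 similarity threshold are arbitrary assumptions.

```python
# Minimal sketch of flagging near-identical open-ended responses submitted close
# together, using simple string similarity. File/column names, the one-hour window,
# and the 0.9 threshold are assumptions for illustration.
from difflib import SequenceMatcher
from itertools import combinations

import pandas as pd

survey = pd.read_csv("winter_2020_survey.csv", parse_dates=["submitted_at"])

def similar(a: str, b: str, threshold: float = 0.9) -> bool:
    """True when two responses are near-identical after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio() >= threshold

rows = survey[["response_id", "open_ended", "submitted_at"]].dropna().to_dict("records")
flagged_pairs = []
for r1, r2 in combinations(rows, 2):
    within_hour = abs((r1["submitted_at"] - r2["submitted_at"]).total_seconds()) <= 3600
    if within_hour and similar(r1["open_ended"], r2["open_ended"]):
        flagged_pairs.append((r1["response_id"], r2["response_id"]))
# flagged pairs then go to the study team for manual review and outreach
```

Pairwise comparison scales quadratically, which is workable at cohort sizes like these; anything flagged would still be resolved by manual review rather than automatic removal.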

Our experience indicates that deception in online tobacco research is pervasive, nearer to the high estimates of previous studies,2,3 difficult to detect with any single method, and not adequately captured by automated processes built into survey platforms. Our PACE Vermont methodology and study infrastructure have been replicated in a cohort of New Jersey young adults,8 with consistent findings across states: 71% (4,242/5,955) of our PACE New Jersey cohort screenings were deemed ineligible, and about half (52%) of these ineligible respondents were identified manually. Screening for participant deception is a time-intensive process that extends beyond initial screening and requires vigilant data checking at other time points. Moreover, no single method will filter out all deceptive respondents, and the methods that are used need to be responsive to the study design, participant experience, and changes in patterns of respondent behavior. Researchers designing online studies should take care to develop in-depth methods of manual fraud detection and regularly revise and evolve these methods as the study progresses to ensure data integrity. Publishing the methods used to address participant deception in online studies runs the risk of stimulating new forms of deception used to gain entry into research studies; however, this risk is balanced by the need in our field to ensure transparency and to continue to evolve and innovate our own best-practice anti-deception methods.3

Implications

Deception in online nicotine and tobacco research is pervasive, difficult to detect with any single method, and not adequately captured by automated processes built into survey platforms. Manual screening of data at recruitment and throughout data collection, the continual evolution of data quality checks, and attention to patterns of response require significant time and staffing to ensure data quality in online studies.

Funding

Research reported in this publication was supported by the National Institute on Drug Abuse (NIDA) of the National Institutes of Health under Award Number R21DA051943, the National Cancer Institute (NCI), and Food and Drug Administration (FDA) Center for Tobacco Products (CTP) under U54CA229973, and a contract from the New Jersey Department of Health (NJDOH).

Acknowledgments

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health or the State of New Jersey.

Conflict of Interest

The authors have no conflicts to disclose.

References

1. Bell AM, Gift T. Fraud in online surveys: evidence from a nonprobability, subpopulation sample. J Exp Political Sci. 2022:1–6.

2. Godinho A, Schell C, Cunningham JA. Out damn bot, out: recruiting real people into substance use studies on the internet. Subst Abus. 2020;41(1):3–5.

3. Heffner JL, Watson NL, Dahne J, et al. Recognizing and preventing participant deception in online nicotine and tobacco research studies: suggested tactics and a call to action. Nicotine Tob Res. 2021;23(10):1810–1812.

4. Villanti AC, Vallencourt CP, West JC, et al. Recruiting and retaining youth and young adults in the Policy and Communication Evaluation (PACE) Vermont Study: randomized controlled trial of participant compensation. J Med Internet Res. 2020;22(7):e18446.

5. West JC, Peasley-Miklus C, Klemperer EM, et al. Young adults’ knowledge of state cannabis policy: implications for studying the effects of legalization in Vermont. Cannabis. 2022. https://publications.sciences.ucf.edu/cannabis/index.php/Cannabis/article/view/116.

6. Villanti AC, LePine SE, Peasley-Miklus C, et al. COVID-related distress, mental health, and substance use in adolescents and young adults. Child Adolesc Ment Health. 2022;27(2):138–145.

8. Young WJ, Bover Manderski MT, Ganz O, Delnevo CD, Hrywna M. Examining the impact of question construction on reporting of sexual identity: survey experiment among young adults. JMIR Public Health Surveill. 2021;7(12):e32294.
