Abstract

Objective

The All of Us Research Program makes individual-level data available to researchers while protecting the participants’ privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels.

Methods

At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model specifically assuming that it is known that someone is a participant in the program. We confirmed the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics.

Results

The results indicated that 95th percentile of the re-identification risk of all the participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnic, and genders.

Conclusions

While the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://dbpia.nl.go.kr/pages/standard-publication-reuse-rights)
You do not currently have access to this article.