-
Views
-
Cite
Cite
Weiyi Xia, Melissa Basford, Robert Carroll, Ellen Wright Clayton, Paul Harris, Murat Kantacioglu, Yongtai Liu, Steve Nyemba, Yevgeniy Vorobeychik, Zhiyu Wan, Bradley A Malin, Managing re-identification risks while providing access to the All of Us research program, Journal of the American Medical Informatics Association, Volume 30, Issue 5, May 2023, Pages 907–914, https://doi.org/10.1093/jamia/ocad021
- Share Icon Share
Abstract
The All of Us Research Program makes individual-level data available to researchers while protecting the participants’ privacy. This article describes the protections embedded in the multistep access process, with a particular focus on how the data was transformed to meet generally accepted re-identification risk levels.
At the time of the study, the resource consisted of 329 084 participants. Systematic amendments were applied to the data to mitigate re-identification risk (eg, generalization of geographic regions, suppression of public events, and randomization of dates). We computed the re-identification risk for each participant using a state-of-the-art adversarial model specifically assuming that it is known that someone is a participant in the program. We confirmed the expected risk is no greater than 0.09, a threshold that is consistent with guidelines from various US state and federal agencies. We further investigated how risk varied as a function of participant demographics.
The results indicated that 95th percentile of the re-identification risk of all the participants is below current thresholds. At the same time, we observed that risk levels were higher for certain race, ethnic, and genders.
While the re-identification risk was sufficiently low, this does not imply that the system is devoid of risk. Rather, All of Us uses a multipronged data protection strategy that includes strong authentication practices, active monitoring of data misuse, and penalization mechanisms for users who violate terms of service.