A systematic review of database validation studies among fertility populations

Table I

Descriptive characteristics of included studies.

Authors	Year	Country	Data source being validated	Reference standard	Population	Sample size
Buck Louis GM and Druschel C	2015	USA	Questionnaire (Upstate New York Infant Development Screening Program Study)	IVF registry (SART CORS)	Mothers who had live births in Upstate New York between July 2008 and May 2010 in whom ‘Infertility treatment’ was checked on the birth certificate and multiple births matched to singleton infants whose treatment box was not checked	5034
Buck Louis GM and Hediger ML	2014	USA	Administrative database (Perinatal Data System)	Questionnaire	Mothers who had live births in Upstate New York between July 2008 and May 2010 in whom ‘Infertility treatment’ was checked on the birth certificate and multiple births matched to singleton infants whose treatment box was not checked	4989
Centers for Disease Control and Prevention	2016	USA	Fertility database (SART)	Medical record	ART cycle data from 458 fertility clinics in the US during the 2014 cycle year. A random sample of 34 clinics was selected	1996
Cohen B	2014	USA	Administrative database (birth certificates)	IVF registry (NASS)	Live births to Florida or Massachusetts resident mothers that occurred in state from March 2004 to December 2006	856 165
Gissler M	2004	Finland	Administrative database (medical birth record)	NA (compared ad hoc IVF research and IVF statistics, no reference standard)	Newborns from fertility treatments from 1996 to 1998	176 698
Hemminki E	2003	Finland	Administrative database (Drug Reimbursement Register)	Internal examination of data and linkage to Birth Register	Women exposed to ART between 1996 and 1998	24 318
Hvidtjørn D	2009	Denmark	Administrative database	IVF Registry	Women who participated in the first Danish National Birth Cohort (study) interview with a pregnancy resulting in a live born child between October 2007 and June 2003	88 151
Kotelchuck M	2014	USA	IVF registry (SART)	Administrative database (PELL)	Children born to Massachusetts resident women in MA hospitals from July 2004 to December 2008 conceived by ART	10 138
Liberman RF	2014	USA	Questionnaire (National Birth Defects Prevention Study)	IVF registry	Women who completed the NBDPS with in-state deliveries between September 2004 and December 2008	77
Luke B	2016	USA	Administrative database (birth certificates)	IVF registry	Live births in Florida, Massachusetts, New York, Pennsylvania, Texas, California, Ohio, and Colorado between 2004 and 2009. IVF cycles from SART CORS were linked to birth certificates.	716 103
Molinaro TA	2009	USA	IVF registry	Medical records	IVF patients enrolled for other studies at the University of Pennsylvania between December 2003 and June 2006	590
Overbeek A	2013	Netherlands	Questionnaire (DCOG LATER-VEVO Study—nationwide cohort study)	Administrative database (Netherlands Perinatal Registry)	Childhood cancer survivors who achieved pregnancy and their sibling controls	524
Rosenfeld Y	2009a	Israel	IVF reporting system	Medical record	Women who received fertility treatment in the District of Haifa and Western Galilee of the General Health Services	108
Stern JE and Gopal D	2016a	USA	IVF registry (SART)	Administrative database (Massachusetts BDMP Registry)	ART deliveries from 1 July 2004 to 31 December 2008 in Massachusetts	9092
Stern JE and McLain AC	2016b	USA	Questionnaire (Upstate New York Infant Development Screening Program Study)	SART database for current cycle; Questionnaire for prior treatment information	Mothers who participated in Upstate KIDS Study linked with SART CORS	617
Sunderam S	2006	USA	Administrative database	IVF registry	Infants born in 1997 and 1998 in MA, RI, NH, CT to MA-resident mothers who used ART clinics in MA or RI	2703
Williams CL	2013	UK	Administrative database (National Registry of Childhood Tumours)	IVF registry (HFEA)	Children born between 1 January 1992 and 31 December 2008	106 013
Zhang Y	2012	USA	Administrative database	IVF registry (NASS)	Live births to MA-resident mothers that occurred in MA during 1997–2000	6139
Zhang Z	2010	USA	Administrative database (Massachusetts Registry of Vital Records and Statistics-MBC)	IVF registry (NASS)	Live births to MA-resident mothers that occurred in MA during 1997–2000	5190

BDMP, Birth Defects Monitoring Program; NASS, National ART Surveillance System; NBDPS, National Birth Defects Prevention Study; PELL, Pregnancy to Early Life Longitudinal data system; SART CORS, Society for Assisted Reproductive Technology Clinical Outcomes Reporting System.

Four studies validated the method of conception recorded in birth registries (Gissler et al., 2004; Zhang et al., 2010; Cohen et al., 2014; Luke et al., 2016), two validated diagnosis or treatment variables within a fertility database (Molinaro et al., 2009; Centers for Disease Control and Prevention et al., 2016), one created an algorithm to identify a patient population (Hemminki et al., 2003), and four validated linkage algorithms between a fertility database and a second administrative database (Sunderam et al., 2006; Zhang et al., 2012; Williams et al., 2013; Kotelchuck et al., 2014).

Sensitivity was the most commonly reported validation measure. Twelve studies reported sensitivity (Hvidtjørn et al., 2009; Zhang et al., 2010, 2012; Overbeek et al., 2013; Buck Louis et al., 2014, 2015; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014; Luke et al., 2016; Stern et al., 2016a, 2016b), nine reported specificity (Hvidtjørn et al., 2009; Zhang et al., 2010; Overbeek et al., 2013; Buck Louis et al., 2014, 2015; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014; Luke et al., 2016), six reported PPV (Hvidtjørn et al., 2009; Zhang et al., 2010; Overbeek et al., 2013; Cohen et al., 2014; Kotelchuck et al., 2014; Buck Louis et al., 2015), one reported NPV (Buck Louis et al., 2015), five reported the kappa coefficient (Gissler et al., 2004; Overbeek et al., 2013; Buck Louis et al., 2014; Kotelchuck et al., 2014; Stern et al., 2016a), and seven reported percentage agreement (Gissler et al., 2004; Hvidtjørn et al., 2009; Zhang et al., 2012; Overbeek et al., 2013; Buck Louis et al., 2014; Stern et al., 2016a, 2016b) (Table II). The data quality measures are presented in Supplementary Data 3. Only three studies reported four or more measures of validation (Hvidtjørn et al., 2009; Buck Louis et al., 2014, 2015). Nine studies presented 95% CIs with their estimates (Gissler et al., 2004; Zhang et al., 2010, 2012; Overbeek et al., 2013; Cohen et al., 2014; Liberman et al., 2014; Buck Louis et al., 2015; Centers for Disease Control and Prevention et al., 2016; Stern et al., 2016a), of which five reported CIs for all estimates (Zhang et al., 2012; Cohen et al., 2014; Liberman et al., 2014; Buck Louis et al., 2015; Centers for Disease Control and Prevention et al., 2016).

Table II

Summary of reported validity measures.

Study	Sensitivity	Specificity	PPV	NPV	Kappa	% Agreement	ICC	AUC/c-statistic	Likelihood ratios	Four or more measures of validity	Number of measures	95% CI
Buck Louis et al. (2015)	10/10	10/10	10/10	10/10	No	No	No	No	No	10/10	4	10/10
Buck Louis et al. (2014)	1/4	1/4	No	No	1/4	4/4	No	No	No	1/4	4	0/4
Centers for Disease Control and Prevention et al. (2016)	No	No	No	No	No	No	No	No	No	No	1	18/18
Cohen et al. (2014)	2/2	2/2	2/2	No	No	No	No	No	No	No	3	2/2
Gissler et al. (2004)	No	No	No	No	1/2	2/2	No	No	No	No	2	1/2
Hemminki et al. (2003)	No	No	No	No	No	No	No	No	No	No	0	NA
Hvidtjørn et al. (2009)	3/3	3/3	3/3	No	No	3/3	No	No	No	3/3	4	0/3
Kotelchuck et al. (2014)	3/3	1/3	3/3	No	3/3	No	No	No	No	No	4	0/3
Liberman et al. (2014)	5/5	5/5	No	No	No	No	No	No	No	No	2	5/5
Luke et al. (2016)	1/1	1/1	No	No	No	No	No	No	No	No	2	No
Molinaro et al. (2009)	No	No	No	No	No	No	No	No	No	No	0	NA
Overbeek et al. (2013)	10/26	10/26	10/26	No	16/26	16/26	No	No	No	No	4	16/26
Rosenfeld and Strulov (2009a)	No	No	No	No	No	No	No	No	No	No	2	No
Stern et al. (2016a)	6/11	No	No	No	2/11	2/11	No	No	No	No	3	6/11
Stern et al. (2016b)	13/13	No	No	No	No	3/13	No	No	No	No	5	No
Sunderam et al. (2006)	No	No	No	No	No	No	No	No	No	No	0	NA
Williams et al. (2013)	No	No	No	No	No	No	No	No	No	No	1	No
Zhang et al. (2012)	1/1	No	No	No	No	No	No	No	No	No	2	1/2
Zhang et al. (2010)	1/1	1/1	1/1	No	No	No	No	No	No	No	3	3/3

AUC, area under the curve; ICC, intraclass correlation coefficient; NPV, negative predictive value; PPV, positive predictive value.

Numerator: number of validated variables; denominator: total number of variables considered for validation in each study.
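
For readers less familiar with the measures tallied in Table II, the following is a minimal sketch of how they are conventionally computed from a 2x2 cross-classification of a database variable against a reference standard. It is not taken from any of the included studies: the function names `validity_measures` and `wilson_ci`, and the counts in the usage lines, are hypothetical. The Wilson score interval is shown as one common way to attach a 95% CI to a proportion; the included studies may have used other methods.

```python
from math import sqrt


def validity_measures(tp, fp, fn, tn):
    """Common validity measures from a 2x2 table.

    tp, fp, fn, tn are counts of true positives, false positives,
    false negatives and true negatives, with the reference standard
    treated as the truth. Assumes no row or column total is zero.
    """
    n = tp + fp + fn + tn
    agreement = (tp + tn) / n  # proportion of records on which sources agree

    # Agreement expected by chance, from the marginal totals.
    p_both_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_both_no = ((fn + tn) / n) * ((fp + tn) / n)
    expected = p_both_yes + p_both_no

    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "agreement": agreement,
        # Cohen's kappa: observed agreement corrected for chance agreement.
        "kappa": (agreement - expected) / (1 - expected),
    }


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Hypothetical counts, for illustration only.
measures = validity_measures(tp=90, fp=5, fn=10, tn=895)
sens_low, sens_high = wilson_ci(successes=90, total=100)
```

Here sensitivity is 90/(90 + 10), so its interval is computed on the 100 reference-positive records rather than on all 1000. Reporting the interval alongside the point estimate is what the "95% CI" column of Table II records.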

The elements of data quality are summarized in Tables III and IV. Sixteen studies (84.2%) adequately described their data source, and all but one described the type of patient records from which data were extracted (Rosenfeld and Strulov, 2009a). Most studies described their inclusion and exclusion criteria and their methods for determining the validity of the data. Fifteen studies adequately described their method of patient sampling: 14 sampled the entire population in the database (Hemminki et al., 2003; Sunderam et al., 2006; Hvidtjørn et al., 2009; Zhang et al., 2010, 2012; Overbeek et al., 2013; Williams et al., 2013; Buck Louis et al., 2014, 2015; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014; Luke et al., 2016; Stern et al., 2016a) and one used random sampling (Centers for Disease Control and Prevention et al., 2016). Only one group reported an a priori sample size (Centers for Disease Control and Prevention et al., 2016), and none provided statistical justification for their sample size.

Table III

Reporting quality of methodology of included studies.

Methods	Frequency	%
Describes the data source
Yes	16/19	84.2
Incomplete	2/19	10.5
Unclear	1/19	5.3
Describes type of records (inpatient, outpatient, linked records)
Yes	18/19	94.7
Unclear	1/19	5.3
Describes setting and locations where data were collected
Yes	18/19	94.7
Incomplete	1/19	5.3
Reports a priori sample size
Yes	1/19	5.3
Provides statistical justification for the sample size
Yes	0/19	0.0
Describes recruitment procedure of validation cohort (from a database, based on diagnostic codes)
Yes	17/19	89.5
Unclear	2/19	10.5
Describes patient sampling (random, consecutive, all)
Random sampling	1/19	5.3
All	14/19	73.7
Unclear	2/19	10.5
Incomplete	2/19	10.5
Describes how participants were chosen for data collection and analysis
Yes	15/19	78.9
Unclear	2/19	10.5
Describes inclusion/exclusion criteria
Yes	14/19	73.7
Incomplete	1/19	5.3
Describes who identified patients (for patients identified from medical records)
Yes	1/19	5.3
Incomplete	1/19	5.3
Describes who collected data
Yes	3/19	15.8
Describes use of a priori data collection form
Yes	13/19	68.4
Unclear	1/19	5.3
Use of a split sample or an independent sample (revalidation using a separate cohort)
Yes	1/19	5.3
Describes the reference standard
Yes	13/17	76.5
Reports the number of persons reading the reference standard
Yes	2/17	11.8
Describes the training or expertise of persons reading reference standard
Yes	1/17	5.9
Readers of the reference standard were blinded to the results of the classification by routinely collected data for that patient (reference standard: medical records)
Yes	1/17	5.9
Reports a measure of concordance if >1 persons reading the reference standard
Yes	0/17	0.0
Describes the linkage procedure, if done (probabilistic/deterministic)
Yes	8/15	50.0
Incomplete	6/15	37.5
Describes the methods of linkage quality evaluation
Yes	7/15	46.7
Incomplete	2/15	13.3
Describes explicit methods for calculating or comparing measures of accuracy and statistical methods used to quantify uncertainty
Yes	13/19	68.4

Methods	Frequency	%
Describes the data source
Yes	16/19	84.2
Incomplete	2/19	10.5
Unclear	1/19	5.3
Describes type of records (inpatient, outpatient, linked records)
Yes	18/19	94.7
Unclear	1/19	5.3
Describes setting and locations where data were collected
Yes	18/19	94.7
Incomplete	1/19	5.3
Reports a priori sample size
Yes	1/19	5.3
Provides statistical justification for the sample size
Yes	0/19	0.0
Describe recruitment procedure of validation cohort (from a database, based on diagnostic codes)
Yes	17/19	89.5
Unclear	2/19	10.5
Describe patient sampling (Random, consecutive, all)
Random sampling	1/19	5.3
All	14/19	73.7
Unclear	2/19	10.5
Incomplete	2/19	10.5
Describe how participants were chosen for data collection and analysis
Yes	15/19	78.9
Unclear	2/19	10.5
Describes inclusion/exclusion criteria
Yes	14/19	73.7
Incomplete	1/19	5.3
Describes who identified patients (for patients identified from medical records)
Yes	1/19	5.3
Incomplete	1/19	5.3
Describes who collected data
Yes	3/19	15.8
Describes use of a priori data collection form
Yes	13/19	68.4
Unclear	1/19	5.3
Use of a split sample or an independent sample (revalidation using a separate cohort)
Yes	1/19	5.3
Describes the reference standard
Yes	13/17	76.5
Reports the number of persons reading the reference standard
Yes	2/17	11.8
Describes the training or expertise of persons reading reference standard
Yes	1/17	5.9
Readers of the reference standard were blinded to the results of the classification by routinely collected data for that patient (reference standard: medical records)
Yes	1/17	5.9
Reports a measure of concordance if >1 persons reading the reference standard
Yes	0/17	0.0
Describes the linkage procedure, if done (probabilistic/deterministic)
Yes	8/15	50.0
Incomplete	6/15	37.5
Describes the methods of linkage quality evaluation
Yes	7/15	46.7
Incomplete	2/15	13.3
Describes explicit methods for calculating or comparing measures of accuracy and statistical methods used to quantify uncertainty
Yes	13/19	68.4

Table III

Reporting quality of methodology of included studies.

Methods	Frequency	%
Describes the data source
Yes	16/19	84.2
Incomplete	2/19	10.5
Unclear	1/19	5.3
Describes type of records (inpatient, outpatient, linked records)
Yes	18/19	94.7
Unclear	1/19	5.3
Describes setting and locations where data were collected
Yes	18/19	94.7
Incomplete	1/19	5.3
Reports a priori sample size
Yes	1/19	5.3
Provides statistical justification for the sample size
Yes	0/19	0.0
Describe recruitment procedure of validation cohort (from a database, based on diagnostic codes)
Yes	17/19	89.5
Unclear	2/19	10.5
Describe patient sampling (Random, consecutive, all)
Random sampling	1/19	5.3
All	14/19	73.7
Unclear	2/19	10.5
Incomplete	2/19	10.5
Describe how participants were chosen for data collection and analysis
Yes	15/19	78.9
Unclear	2/19	10.5
Describes inclusion/exclusion criteria
Yes	14/19	73.7
Incomplete	1/19	5.3
Describes who identified patients (for patients identified from medical records)
Yes	1/19	5.3
Incomplete	1/19	5.3
Describes who collected data
Yes	3/19	15.8
Describes use of a priori data collection form
Yes	13/19	68.4
Unclear	1/19	5.3
Use of a split sample or an independent sample (revalidation using a separate cohort)
Yes	1/19	5.3
Describes the reference standard
Yes	13/17	76.5
Reports the number of persons reading the reference standard
Yes	2/17	11.8
Describes the training or expertise of persons reading reference standard
Yes	1/17	5.9
Readers of the reference standard were blinded to the results of the classification by routinely collected data for that patient (reference standard: medical records)
Yes	1/17	5.9
Reports a measure of concordance if >1 persons reading the reference standard
Yes	0/17	0.0
Describes the linkage procedure, if done (probabilistic/deterministic)
Yes	8/15	53.3
Incomplete	6/15	40.0
Describes the methods of linkage quality evaluation
Yes	7/15	46.7
Incomplete	2/15	13.3
Describes explicit methods for calculating or comparing measures of accuracy and statistical methods used to quantify uncertainty
Yes	13/19	68.4

Table IV

Reporting quality of the results of included studies.

	Frequency	%
Reports the number of participants satisfying the inclusion/exclusion criteria
Yes	13/18	72.2
Incomplete	1/18	5.6
Describes the characteristics of misclassified patients (false positives and/or false negatives)
Yes	13/18	72.2
Unclear	2/18	11.1
Provides a study flow diagram
Yes	4/19	21.1
Reports the number of records unable to link
Yes	11/12	91.7
Incomplete	1/12	8.3
Reports missing medical records or reports the number of patients unwilling to participate
Yes	10/19	52.6
Reports incomplete records
Yes	13/19	68.4
Presents a cross tabulation of results of the validated source to the reference standard
Yes	11/19	57.9
Incomplete	1/19	5.3
Reports the pretest prevalence in the study sample
Yes	5/19	26.3
Incomplete	2/19	10.5
Tests and reports results of multiple algorithms
Yes	6/15	40.0
Reports estimates of test reproducibility of the split or independent sample if done
Yes	0/19	0.0

Where multiple databases were linked using a common patient identifier, the linkage procedure was adequately described in eight of the 15 studies (53.3%) (Sunderam et al., 2006; Zhang et al., 2010, 2012; Williams et al., 2013; Cohen et al., 2014; Kotelchuck et al., 2014; Stern et al., 2016a, 2016b). The methods used to evaluate linkage quality were described in only seven studies (46.7%) (Hemminki et al., 2003; Sunderam et al., 2006; Zhang et al., 2010, 2012; Williams et al., 2013; Kotelchuck et al., 2014; Stern et al., 2016a).

The pre-test prevalence of the validated variables was provided in seven studies (Sunderam et al., 2006; Zhang et al., 2010; Buck Louis et al., 2014; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014; Luke et al., 2016) (Table V). The post-test prevalence was within 2 percentage points of the pre-test value in four of these studies (Zhang et al., 2010; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014); in two studies, however, it differed substantially from the pre-test value (Buck Louis et al., 2014; Luke et al., 2016). One study reported no post-test prevalence (Sunderam et al., 2006).

Table V

Description of the pre- and post-test prevalence of measured estimates of validity in included studies.

Study	Prevalence estimate reported	Pre-test prevalence (%)	Post-test prevalence* (%)
Buck Louis et al. (2015)	No	—	—
Buck Louis et al. (2014)	ART conceived infant	1.40	14.0
Centers for Disease Control and Prevention (2016)	No	—	—
Cohen et al. (2014)	ART conceived infant	1.40	0.45
Gissler et al. (2004)	No	—	—
Hemminki et al. (2003)	No	—	—
Hvidtjørn et al. (2009)	No	—	—
Kotelchuck et al. (2014)	ART conceived infant	1.60	2.72
Liberman et al. (2014)	ART conceived infant in MA	4.30	5.30
Luke et al. (2016)	ART conceived infant	1.70	9.80
Molinaro et al. (2009)	No	—	—
Overbeek et al. (2013)	No	—	—
Rosenfeld and Strulov (2009)	No	—	—
Stern et al. (2016a)	Incomplete	—	—
Stern et al. (2016b)	No	—	—
Sunderam et al. (2006)	Yes	3.00	—
Williams et al. (2013)	No	—	—
Zhang et al. (2012)	No	—	—
Zhang et al. (2010)	ART Live birth deliveries	3.00	1.70

*Based on reference standard.
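
The comparison of pre- and post-test prevalence reported in the text can be reproduced from the values in Table V. The following is a minimal sketch in Python, assuming that "within a 2% range" means an absolute difference of at most 2 percentage points (the threshold is not defined further in the text); the names `TABLE_V` and `within_range` are illustrative only.

```python
# Pre- and post-test prevalence (%) copied from Table V for the six studies
# that report both values.
TABLE_V = {
    "Buck Louis et al. (2014)": (1.40, 14.0),
    "Cohen et al. (2014)": (1.40, 0.45),
    "Kotelchuck et al. (2014)": (1.60, 2.72),
    "Liberman et al. (2014)": (4.30, 5.30),
    "Luke et al. (2016)": (1.70, 9.80),
    "Zhang et al. (2010)": (3.00, 1.70),
}


def within_range(pre: float, post: float, tolerance: float = 2.0) -> bool:
    """Return True if post-test prevalence lies within `tolerance`
    percentage points of the pre-test prevalence (assumed criterion)."""
    return abs(post - pre) <= tolerance


concordant = sorted(s for s, (pre, post) in TABLE_V.items() if within_range(pre, post))
discrepant = sorted(s for s, (pre, post) in TABLE_V.items() if not within_range(pre, post))

for study in concordant:
    print("concordant:", study)
for study in discrepant:
    print("discrepant:", study)
```

Under this assumed criterion, four studies are classified as concordant and two (Buck Louis et al., 2014; Luke et al., 2016) as discrepant, matching the grouping given in the text.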

Discussion

This study demonstrates that there is a paucity of literature on the validation of data elements within fertility databases and registries. Numerous studies validated ART information derived from maternal report or from birth and death certificates by comparing those data with a fertility registry as the reference standard; however, we identified only one study that assessed the validity of a fertility registry itself by comparing data elements from the database with the patient record as the reference standard (Centers for Disease Control and Prevention et al., 2016). Furthermore, only seven studies published the baseline prevalence of the data element being validated (Sunderam et al., 2006; Zhang et al., 2010; Buck Louis et al., 2014; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014; Luke et al., 2016), and in only four of these did the sample prevalence approximate that of the population (Zhang et al., 2010; Cohen et al., 2014; Kotelchuck et al., 2014; Liberman et al., 2014).

There are three commonly cited validation study designs: ecological studies, reabstraction studies, and gold standard studies (Van Walraven and Austin, 2012). Ecological studies compare measures of disease prevalence in the database to those obtained from more reliable methods, like those published elsewhere. Reabstraction studies compare the database variable or element to the medical record. Finally, gold standard studies compare the database variable to a case definition, either based on clinical or laboratory values or clinical consensus (Van Walraven and Austin, 2012).

Hemminki et al. (2003) and Gissler et al. (2004) both performed ecological studies using national statistics. Hemminki et al. (2003) created a case-finding algorithm using data from a drug reimbursement register and a physician examination and intervention register to identify an infertility population in Finland, and subsequently compared these data with national statistics to validate the algorithm. Gissler et al. (2004) compared prevalence estimates from a birth registry and from aggregate IVF statistics with estimates generated from Hemminki’s study to assess the completeness and validity of these routinely collected data sources. This design has two limitations. Firstly, the reference standard relies on the accuracy of the national statistics, which was not established and should not be implicitly assumed. Secondly, as the comparison is based on aggregate data rather than patient-level data, identifying specific differences and agreements is impossible.

Of the 19 studies included in our review, only two used the medical record as the reference standard (Molinaro et al., 2009; Centers for Disease Control and Prevention et al., 2016), and only one presented measures of validity (Centers for Disease Control and Prevention et al., 2016). The others used either another database or patient report as the reference. Molinaro et al. (2009) attempted to validate diagnosis variables in the Society for Assisted Reproductive Technology (SART) registry using case definitions based on clinical values in the patients’ charts rather than relying on the expertise of clinicians. They did not report their measures of validity, making it challenging to determine whether this method is superior. Validation against objective measures, such as laboratory tests and strict diagnostic criteria, may be more reliable than validation against clinical documentation, though such approaches were not identified by our review of ART validation studies.

The study performed by SART assessed multiple patient variables at one time, comparing SART data with patient charts (Centers for Disease Control and Prevention et al., 2016). However, because discrepancy rates were presented without other important measures of validity, such as sensitivity, kappa coefficients, or PPVs, it is difficult to determine how reliable these data are. A subgroup evaluation by clinic size or geography would be useful to investigate whether specific variables are problematic across clinics or whether the issue lies with specific clinics.
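
To illustrate the measures of validity named above, the following minimal Python sketch computes sensitivity, specificity, PPV, NPV, and Cohen's kappa from a 2×2 cross-tabulation of the validated source against the reference standard. The counts are hypothetical and are not taken from any included study; `CrossTab` and `validity_measures` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrossTab:
    """2x2 table: validated source (rows) against reference standard (columns)."""
    tp: int  # source positive, reference positive
    fp: int  # source positive, reference negative
    fn: int  # source negative, reference positive
    tn: int  # source negative, reference negative


def validity_measures(t: CrossTab) -> dict:
    """Compute common validity measures. Assumes every margin is non-zero."""
    n = t.tp + t.fp + t.fn + t.tn
    observed = (t.tp + t.tn) / n  # observed agreement
    # Agreement expected by chance, from the row and column margins.
    expected = ((t.tp + t.fp) * (t.tp + t.fn) + (t.fn + t.tn) * (t.fp + t.tn)) / n**2
    return {
        "sensitivity": t.tp / (t.tp + t.fn),
        "specificity": t.tn / (t.tn + t.fp),
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
        "kappa": (observed - expected) / (1 - expected),
    }


# Hypothetical counts for illustration only.
measures = validity_measures(CrossTab(tp=90, fp=10, fn=30, tn=870))
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

A discrepancy rate alone corresponds only to one minus the observed agreement in this sketch; it does not distinguish false positives from false negatives, which is why the additional measures are informative.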

A Canadian study investigating the validity of diagnostic codes in 10 major hospitals found that sensitivity and specificity were highly dependent on the hospital: some hospitals had high accuracy, whereas others demonstrated poor sensitivity (Juurlink et al., 2006). Clinics may have specific expertise with respect to their patient populations, and the prevalence of certain conditions or treatments may vary by health care provider. Predictive values such as the PPV are highly dependent on the baseline prevalence of the specific treatment or disease (Altman and Bland, 1994). Furthermore, in certain cases, sensitivity, specificity, and likelihood ratios may also vary with prevalence (Brenner and Gefeller, 1997). While the accuracy of the records themselves would not necessarily be influenced, metrics such as PPV, NPV, and sensitivity will be affected. Only four of the included studies presented post-test prevalence estimates that approximated the reported pre-test prevalence, which calls into question the degree of bias in the estimates presented. As such, it is essential to describe both the source of the data and the prevalence of the variable of interest to adequately interpret the results.
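
The dependence of predictive values on prevalence follows from Bayes' theorem. The sketch below, in Python, holds sensitivity and specificity fixed at hypothetical values (0.90 and 0.99, chosen for illustration only and not reported by any included study) and evaluates the PPV and NPV at the pre- and post-test prevalences listed in Table V for Buck Louis et al. (2014), 1.40% and 14.0%.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from Bayes' theorem."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value from Bayes' theorem."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, held constant across both prevalences.
SENS, SPEC = 0.90, 0.99

for prevalence in (0.014, 0.14):  # 1.40% and 14.0%, from Table V
    print(
        f"prevalence={prevalence:.3f} "
        f"PPV={ppv(SENS, SPEC, prevalence):.3f} "
        f"NPV={npv(SENS, SPEC, prevalence):.3f}"
    )
```

With these assumed inputs, the PPV rises from roughly 0.56 to roughly 0.94 as prevalence increases tenfold, even though sensitivity and specificity are unchanged. This is why a validation sample whose prevalence differs from that of the source population can give predictive values that do not transfer to that population.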

There is insufficient documentation in the literature with respect to how national fertility registries are validating their databases. SART publishes a publicly available report on an annual basis indicating which variables are discrepant between the medical chart and the database (Centers for Disease Control and Prevention et al., 2016). According to ICMART’s world report, there were 61 countries that submitted nationwide ART data for surveillance (Dyer et al., 2016). Unfortunately, none of the other national databases have generated such reports or have made them easily accessible. The Human Fertilisation and Embryology Authority in the UK, Australian & New Zealand Assisted Reproduction Database (ANZARD), and the Belgian Register for Assisted Procreation endorse strict adherence to quality assurance practices; however, no reports were available describing their data-validation processes (written communication with Belgium and ANZARD). As all stakeholders, including patients, health care practitioners, researchers, and policy makers, rely on these data to understand the implications of fertility treatments, including the prevalence of disease, practice patterns, and complications and outcomes of ART, it is essential that these reports are made publicly available (Butler, 2003; Chambers et al., 2009; Canadian Fertility Andrology Society, 2014; Harris et al., 2016; Human Fertilisation and Embryology Authority, 2016).

It is clear from this review that databases are audited, but tracking that process and determining which data elements are reliable are challenging. Therefore, a gold standard from this source should not be implicitly accepted. More studies investigating the accuracy of routinely collected data in local or national registries need to be performed and published, with adherence to reporting guidelines. Upon demonstration of data validity, research can be performed utilizing these databases with measures to reduce bias. Finally, patient report is subject to recall bias, particularly as increasing time has passed from the event to the survey (Leong et al., 2013).

Our review has several limitations. We restricted our inclusion criteria to reports published in English. As many internal processes are likely to be documented in the primary language of the registry or organization, it is possible that we were unable to capture some registries’ validation processes. However, a comprehensive internet search did not yield any results, even in other languages. Moreover, only four studies were excluded from our database search due to language restriction (Lidegaard and Hammerum, 2002; Rosenfeld and Strulov, 2009b; Ameri and Alizadeh, 2014; Pierron et al., 2015). Our study was also limited by the search strategy developed for Medline, Embase, and CINAHL. While the strategy was quite general for routinely collected databases, the list was not exhaustive for specific diagnoses relevant to infertility. Consequently, it is probable that other published studies were not captured in our review.

In spite of these limitations, our study is strengthened by the systematic and comprehensive approach to searching the articles and analyzing the measures of validity. This is the first study to our knowledge to assess the utility of validation tools for fertility registries. Although many of these reports were not published in indexed bibliographic databases, numerous attempts were made to contact ART surveillance database managers in the UK, Denmark, Belgium, Australia, New Zealand, and the USA to obtain unpublished or ad hoc reports on data maintenance and quality assurance.

This review highlights an important gap in the field of fertility research: the validation of widely utilized databases has not been well described. Big data are increasingly used for research, quality assurance, and policy; therefore, the accuracy of these data is essential. Furthermore, during the validation process, the prevalence of the variables and the statistical estimates need to be adequately measured and compared with the prevalence in the population from which the study sample was drawn. This would allow the reader to assess how well the study sample generalizes to that population. As the prevalence of the condition varies by health care provider or geographic location, so will these measures. Future studies need to be conducted and published using rigorous methodology that will allow for greater transparency and accuracy of research within this rapidly evolving field of medicine.

Acknowledgements

We would like to thank Risa Shorr who helped in the development of the search strategy.

Authors’ roles

V.B. developed the protocol, served as a reviewer and data abstractor in the data acquisition phase, performed the analysis, and drafted the article. M.R. served as a second reviewer and data abstractor in the data acquisition phase. He participated in critically revising the manuscript and approved the final version for publication. D.B.F. helped to develop the protocol, provided guidance in the data analysis, participated in critically revising the manuscript, and approved the final version for publication. H.S. helped to develop the protocol, participated in critically revising the manuscript, and approved the final version for publication. M.W. helped to develop the protocol, provided guidance in the data analysis, participated in critically revising the manuscript, and approved the final version for publication. L.M.G. helped to develop the protocol, provided guidance in the data analysis, participated in critically revising the manuscript, and approved the final version for publication.

Funding

Canadian Institutes of Health Research (CIHR) (FDN-148438).

Conflict of interest

None of the authors have any conflicts of interest to declare.

References

AbdelHafez, Desai, Abou-Setta, Falcone, Goldfarb. Slow freezing, vitrification and ultra-rapid freezing of human embryos: a systematic review and meta-analysis. Reprod Biomed Online 2010;209–222.

Altman, Bland. Diagnostic tests 2: predictive values. BMJ 1994;309:102.

Ameri, Alizadeh. Assessing the effects of infertility treatment drugs using clustering algorithms and data mining techniques. J Maz Univ Med Sci 2014.

Benchimol, Manuel, Griffiths, Rabeneck, Guttmann. Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data. J Clin Epidemiol 2011;821–829.

Benchimol, Smeeth, Guttmann, Harron, Moher, Petersen, Sørensen, von Elm, Langan SM. The reporting of studies conducted using observational routinely-collected health data (RECORD) statement. PLoS Med 2015;e1001885.

Bossuyt, Reitsma, Bruns, Gatsonis, Glasziou, Irwig, Lijmer, Moher, Rennie, de Vet HCW, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Standards for reporting of diagnostic accuracy. Clin Chem 2003.

Brenner, Gefeller. Variation of sensitivity, specificity, likelihood ratios and predictive values with disease prevalence. Stat Med 1997;981–991.

Buck Louis, Druschel, Bell, Stern, Luke, McLain, Sundaram, Yeung. Use of assisted reproductive technology treatment as reported by mothers in comparison with registry data: the upstate KIDS study. Fertil Steril 2015;103:1461–1468.

Buck Louis, Hediger, Bell, Kus, Sundaram, McLain, Yeung, Hills, Thoma, Druschel. Methodology for establishing a population-based birth cohort focusing on couple fertility and children’s development, the upstate KIDS study. Paediatr Perinat Epidemiol 2014;191–202.

Butler. Assisted reproduction in developing countries-facing up to the issues. Prog Hum Reprod Res 2003. http://www.who.int/reproductivehealth/publications/infertility/progress63.pdf (12 August 2017, date last accessed).

Canadian Fertility Andrology Society. CARTR Annual Reports. 2014. https://cfas.ca/cartr-annual-reports/ (12 August 2017, date last accessed).

Centers for Disease Control and Prevention, American Society for Reproductive Medicine, Society for Assisted Reproductive Technology. 2014 Assisted Reproductive Technology Fertility Clinic Success Rates Report. Atlanta: Centers for Disease Control and Prevention, 2016.

Chambers, Sullivan, Ishihara, Chapman, Adamson. The economic impact of assisted reproductive technology: a review of selected developed countries. Fertil Steril 2009;2281–2294.

Cohen, Bernson, Sappenfield, Kirby, Kissin, Zhang, Copeland, Zhang, Macaluso. Accuracy of assisted reproductive technology information on birth certificates: Florida and Massachusetts, 2004–06. Paediatr Perinat Epidemiol 2014;181–190.

Davies, Moore, Willson, Van Essen, Priest, Scott, Haan, Chan. Reproductive technologies and the risk of birth defects. N Engl J Med 2012;366:1803–1813.

De Geyter, Fehr, Moffat, Gruber, Von. Twenty years’ experience with the Swiss data registry for assisted reproductive medicine: outcomes, key trends and recommendations for improved practice. Swiss Med Wkly 2015;145:w14087. http://doi.emh.ch/smw.2015.14087 (29 July 2016, date last accessed).

Dyer, Chambers, de Mouzon, Nygren, Zegers-Hochschild, Mansour, Ishihara, Banker, Adamson. International Committee for Monitoring Assisted Reproductive Technologies world report: assisted reproductive technology 2008, 2009 and 2010. Hum Reprod 2016;1588–1609.

Fedder, Loft, Parner, Rasmussen. Neonatal outcome and congenital malformations in children born after ICSI with testicular or epididymal sperm: a controlled national cohort study. Hum Reprod 2013;230–240.

Frosst, Hutcheon, Joseph, Kinniburgh, Johnson, Lee. Validating the British Columbia perinatal data registry: a chart re-abstraction study. BMC Pregnancy Childbirth 2015;123.

Gissler, Klemetti, Sevón, Hemminki. Monitoring of IVF birth outcomes in Finland: a data quality study. BMC Med Inform Decis Mak 2004.

Grams, Plantinga, Hedgeman, Saran, Myers, Williams, Powe. Validation of CKD and related conditions in existing data sets: a systematic review. Am J Kidney Dis 2011.

Harris, Fitzgerald, Macaldowie, Lee, Chambers. Assisted reproductive technology in Australia and New Zealand 2014. Sydney: National Perinatal Epidemiology and Statistics Unit, the University of New South Wales, 2016. https://npesu.unsw.edu.au/sites/default/files/npesu/surveillances/Assisted reproductive technology in Australia and New Zealand 2014.pdf (12 August 2017, date last accessed).

Harton, Braude, Lashwood, Schmutzler, Traeger-Synodinos, Wilton, Harper. European Society for Human Reproduction and Embryology (ESHRE) PGD Consortium. ESHRE PGD consortium best practice guidelines for organization of a PGD centre for PGD/preimplantation genetic screening. Hum Reprod 2011.

Hemminki, Klemetti, Rinta-Paavola, Martikainen. Identifying exposures of in vitro fertilization from drug reimbursement files: a case study from Finland. Med Inform Internet Med 2003;279–289.

Herrett, Thomas, Schoonen, Smeeth, Hall AJ. Validation and validity of diagnoses in the General Practice Research Database: a systematic review. Br J Clin Pharmacol 2010.

Hierholzer Jr. Health care data, the epidemiologist’s sand: comments on the quantity and quality of data. Am J Med 1991;21S–26S.

Human Fertilisation and Embryology Authority. Annual report and accounts 2015/16. 2016. http://ifqtesting.blob.core.windows.net/umbraco-website/1183/56071_hc_380_web_v02.pdf (12 August 2017, date last accessed).

Hvidtjørn, Grove, Schendel, Schieve, Ernst, Olsen, Thorsen. Validation of self-reported data on assisted conception in the Danish National Birth Cohort. Hum Reprod 2009;2332–2340.

Juurlink, Preyra, Croxford, Chong, Austin, Laupacis. Canadian Institute for Health Information discharge abstract database: a validation study. Toronto Inst Clin Eval Sci 2006.

Kotelchuck, Hoang, Stern, Diop, Belanoff, Declercq. The MOSART Database: linking the SART CORS clinical database to the population-based Massachusetts PELL Reproductive Public Health Data System. Matern Child Health J 2014;2167–2178.

Lain, Hadfield, Raynes-Greenow, Ford, Mealing, Algert, Roberts. Quality of data in perinatal population health databases: a systematic review. Med Care 2012;–e20.

Leong, Dasgupta, Bernatsky, Lacaille, Avina-Zubieta, Rahme. Systematic review and meta-analysis of validation studies on a diabetes case definition from health administrative records. PLoS One 2013;e75256.

Liberman, Stern, Luke, Reefhuis, Anderka. Validating assisted reproductive technology self-report. Epidemiology 2014;773–775.

Lidegaard, Hammerum. The National Patient Registry as a tool for continuous production and quality control. Ugeskr Laeger 2002;164:4420–4423.

Loutradi, Kolibianakis, Venetis, Papanikolaou, Pados, Bontis, Tarlatzis. Cryopreservation of human embryos by vitrification or slow freezing: a systematic review and meta-analysis. Fertil Steril 2008;186–193.

Luke, Brown, Spector. Validation of infertility treatment and assisted reproductive technology use on the birth certificate in eight states. Am J Obstet Gynecol 2016;215:126–127.

Mascarenhas, Flaxman, Boerma, Vanderpoel, Stevens GA. National, regional, and global trends in infertility prevalence since 1990: a systematic analysis of 277 health surveys. PLoS Med 2012;e1001356.

McGovern, Llorens, Skurnick, Weiss, Goldsmith LT. Increased risk of preterm birth in singleton pregnancies resulting from in vitro fertilization–embryo transfer or gamete intrafallopian transfer: a meta-analysis. Fertil Steril 2004;1514–1520.

Moher, Liberati, Tetzlaff, Altman, Grp. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement (reprinted from Annals of Internal Medicine). Phys Ther 2009;873–880.

Molinaro, Shaunik, Lin, Sammel, Barnhart KT. A strict infertility diagnosis has poor agreement with the clinical diagnosis entered into the Society for Assisted Reproductive Technology registry. Fertil Steril 2009;2088–2090.

National Institute for Health and Care Excellence. Fertility Problems: Assessment and Treatment. National Institute for Health and Care Excellence, 2013. https://www.nice.org.uk/guidance/cg156/resources/fertility-problems-assessment-and-treatment-35109634660549 (27 July 2016, date last accessed).

Overbeek, van den Berg, Hukkelhoven CWPM, Kremer, van den Heuvel-Eibrink, Tissing WJE, Loonen, Versluys, Bresters, Kaspers GJL, et al. Validity of self-reported data on pregnancies for childhood cancer survivors: a comparison with data from a nationwide population-based registry. Hum Reprod 2013;819–827.

Perkins, Boulet, Kissin, Jamieson DJ. Risk of ectopic pregnancy associated with assisted reproductive technology in the United States, 2001-2011. Obstet Gynecol 2015;125.

Pierron, Revert, Goueslard, Vuagnat, Cottenet, Benzenine, Fresson. Evaluation of the metrological quality of the medico-administrative data for perinatal indicators: a pilot study in 3 university hospitals. Rev Epidemiol Sante Publique 2015;237–246.

Practice Committee of the Society for Assisted Reproductive Technology, Practice Committee of the American Society for Reproductive Medicine. Elective single-embryo transfer. Fertil Steril 2012;835–842.

Qin, Liu, Sheng, Wang. Assisted reproductive technology and the risk of pregnancy-related complications and adverse pregnancy outcomes in singleton pregnancies: a meta-analysis of cohort studies. Fertil Steril 2016;105:–85e6.

Romundstad, Sunde, von Düring, Skjaerven, Vatten LJ. Increased risk of placenta previa in pregnancies following IVF/ICSI; a comparison of ART and non-ART pregnancies in the same mother. Hum Reprod 2006;2353–2358.

Rosenfeld, Strulov. Improvement of accuracy of clinical reports--the case of IVF cycle rank. J Assist Reprod Genet 2009a;–103.

Rosenfeld, Strulov. Clinical reports on IVF cycle rank--reliability and validity. Harefuah 2009b;148.

Santos-Ribeiro, Tournaye, Polyzos. Trends in ectopic pregnancy rates following assisted reproductive technologies in the UK: a 12-year nationwide analysis including 160 000 pregnancies. Hum Reprod 2016;393–402.

Sazonova, Källen, Thurin-Kjellberg, Wennerholm U–B, Bergh. Obstetric outcome after in vitro fertilization with single or double embryo transfer. Hum Reprod 2011;442–450.

Shiff, Jama, Boden, Lix, Virnig, McBean, van Walvaren, Austin, Bennett, et al. Validation of administrative health data for the pediatric population: a scoping review. BMC Health Serv Res 2014;236.

Sørensen, Sabroe, Olsen. A framework for evaluation of secondary data sources for epidemiological research. Int J Epidemiol 1996;435–442.

Stern, Gopal, Liberman, Anderka, Kotelchuck, Luke. Validation of birth outcomes from the Society for Assisted Reproductive Technology Clinic Outcome Reporting System (SART CORS): population-based analysis from the Massachusetts Outcome Study of Assisted Reproductive Technology (MOSART). Fertil Steril 2016a;106:717–722.

Stern

McLain

Buck Louis

Luke

Yeung

EH.

Accuracy of self-reported survey data on assisted reproductive technology treatment parameters and reproductive history

Am J Obstet Gynecol

2016b

;

215

219.e1

–

219.e6

Sullivan

Zegers-Hochschild

Mansour

Ishihara

De Mouzon

Nygren

Adamson

GD.

International Committee for Monitoring Assisted Reproductive Technologies (ICMART) world report: assisted reproductive technology 2004

Hum Reprod

2013

;

1375

–

1390

Sunderam

Kissin

Crawford

Folger

Jamieson

Warner

Barfield

WD.

Assisted reproductive technology surveillance — United States, 2014

MMWR Surveill Summ

2017

;

–

Sunderam

Schieve

Cohen

Zhang

Jeng

Reynolds

Wright

Johnson

Macaluso

Linking birth and infant death records with assisted reproductive technology data: Massachusetts, 1997–1998

Matern Child Health J

2006

;

115

–

125

Traeger-Synodinos

Coonen

Goossens

Data from the ESHRE PGD consortium

Hum Reprod

2013

;

i18