Andrew D. Revell, Dechao Wang, Robin Wood, Carl Morrow, Hugo Tempelman, Raph L. Hamers, Peter Reiss, Ard I. van Sighem, Mark Nelson, Julio S. G. Montaner, H. Clifford Lane, Brendan A. Larder, on behalf of the RDI Data and Study Group, Peter Reiss, Ard van Sighem, Julio Montaner, Richard Harrigan, Tobias Rinke de Wit, Raph Hamers, Kim Sigaloff, Brian Agan, Vincent Marconi, Scott Wegner, Wataru Sugiura, Maurizio Zazzi, Rolf Kaiser, Eugen Schuelter, Adrian Streinu-Cercel, Gerardo Alvarez-Uria, Jose Gatell, Elisa Lazzari, Brian Gazzard, Mark Nelson, Anton Pozniak, Sundhiya Mandalia, Daniel Webster, Colette Smith, Lidia Ruiz, Bonaventura Clotet, Schlomo Staszewski, Carlo Torti, Cliff Lane, Julie Metcalf, Maria-Jesus Perez-Elias, Stefano Vella, Gabrielle Dettorre, Andrew Carr, Richard Norris, Karl Hesse, Emanuel Vlahakis, Hugo Tempelman, Roos Barth, Carl Morrow, Robin Wood, Chris Hoffmann, Luminita Ene, Gordana Dragovic, Ricardo Diaz, Cecilia Sucupira, Omar Sued, Carina Cesar, Juan Sierra Madero, Sean Emery, David Cooper, Carlo Torti, John Baxter, Laura Monno, Carlo Torti, Jose Gatell, Bonventura Clotet, Gaston Picchio, Marie-Pierre deBethune, Maria-Jesus Perez-Elias, Sean Emery, Paul Khabo, Lotty Ledwaba, on behalf of the RDI Data and Study Group, An update to the HIV-TRePS system: the development and evaluation of new global and local computational models to predict HIV treatment outcomes, with or without a genotype, Journal of Antimicrobial Chemotherapy, Volume 71, Issue 10, October 2016, Pages 2928–2937, https://doi.org/10.1093/jac/dkw217
Abstract
Optimizing antiretroviral drug combination on an individual basis in resource-limited settings is challenging because of the limited availability of drugs and genotypic resistance testing. Here, we describe our latest computational models to predict treatment responses, with or without a genotype, and compare the potential utility of global and local models as a treatment tool for South Africa.
Global random forest models were trained to predict the probability of virological response to therapy following virological failure using 29 574 treatment change episodes (TCEs) without a genotype, 3179 of which were from South Africa and were used to develop local models. In addition, 15 130 TCEs including genotypes were used to develop another set of models. The ‘no-genotype’ models were tested with an independent global test set (n = 1700) plus a subset from South Africa (n = 222). The genotype models were tested with 750 independent cases.
The global no-genotype models achieved area under the receiver-operating characteristic curve (AUC) values of 0.82 and 0.79 with the global and South African test sets, respectively, and the South African models achieved AUCs of 0.70 and 0.79. The genotype models achieved an AUC of 0.84. The global no-genotype models identified more alternative, locally available regimens that were predicted to be effective for cases that failed their new regimen in the South African clinics than the local models. Both sets of models were significantly more accurate predictors of outcomes than genotyping with rules-based interpretation.
These latest global models predict treatment responses accurately even without a genotype, out-performed the local South African models and have the potential to help optimize therapy, particularly in resource-limited settings.
Introduction
The story of ART for HIV infection is one of success and continuing progress, in terms of the number and nature of the inhibitors available, the long-term suppression of the virus they offer in combination and the roll-out of therapy to low- and middle-income countries (LMICs).
Nearly 30 antiretroviral drugs are now licensed for use in combination therapy to fully suppress virus replication and prevent HIV disease progression.1–3 However, HIV mutates readily, particularly when patient adherence to drug therapy is poor. Poor adherence permits increased viral replication, and treatment failure is often associated with the development of viral drug resistance. The selection of the next combination of drugs is then critical to overcome any resistance that has emerged and to re-establish long-lasting viral suppression.
In well-resourced countries, an individualized decision is made for each patient based on a range of individual patient data, usually including an expert analysis of the treatment history and the results of one or more genotypic resistance tests.1–3 However, the resistance tests in common use have a number of shortcomings and are only moderately (typically 60%–65%) predictive of response to treatment.4
The challenge of optimizing drug selection in LMICs is even greater as resistance tests are typically unavailable or unaffordable and there is a more limited range of drug options.5 In the absence of routine viral load monitoring in many settings, therapy failure is often detected late and regimen switch decisions are based on standard protocols rather than optimized for individuals. The result can be suboptimal regimen selection, failure to achieve resuppression of the virus and selection of additional drug resistance, which may limit future therapeutic options and can be transmitted to others.6
Over the past 10 years, the HIV Resistance Response Database Initiative (RDI) has collected biological, clinical and treatment outcome data for ∼190 000 HIV-1 patients from multiple sources around the world. From these data, we have used machine learning to develop models to predict HIV-1 treatment outcomes and to identify optimal, individualized therapies.7–11 The data used include a range of immunological and virological measurements, the treatment history, the drugs in the new regimen and the output measure of treatment response: the amount of virus in the bloodstream (viral load) under the new treatment. We have developed models that use information from genotypic resistance tests in their predictions and others that do not, for use in LMICs where such tests are not generally available. Previous models tested with independent test sets predicted virological response with an overall accuracy of 80% with a genotype and 74% without and the area under the receiver-operating characteristic (ROC) curve (AUC) was 0.83 and 0.80, respectively (A. D. Revell, D. Wang and B. A. Larder, unpublished results).11
The models are used to power an online treatment decision support tool, the HIV Treatment Response Prediction System (HIV-TRePS). In order to keep this system as current as possible in terms of the inclusion of new drugs and reflection of current clinical practice, it is essential that new models are developed, including the latest data available, on a fairly frequent basis. Here, we report on the development of three new sets of random forest (RF) models that estimate the probability of combinations of antiretroviral drugs reducing the plasma viral load to undetectable (<50 copies HIV RNA/mL): (i) global models that do not require a genotype for their predictions of treatment response (global no-genotype models), trained using a large global dataset and intended for use in LMICs without access to genotyping; (ii) local South African models that do not require a genotype (South African no-genotype models) trained using a subset of the above data from patients treated in South Africa; and (iii) global models that use a viral genotype in their predictions (global genotype models), trained using a large global dataset including this information.
The global and South African no-genotype models (i and ii above) enabled us to make a comparison as to whether a global or local modelling solution might be more effective for an individual, resource-constrained country with a substantial HIV epidemic. The accuracy of all the models was ascertained and they were evaluated as potential tools to support optimized, individualized treatment decision-making. This paper represents the update alluded to in our previous publication of modelling.11
Methods
Clinical data
The package of data used for training the models (collected from episodes when ART is changed, for whatever reason) is termed a treatment change episode (TCE).7 For development of the no-genotype models, TCEs from cases of virological failure were extracted from the RDI database if all of the following data were available (with no missing values): on-treatment baseline plasma viral load (sample taken ≤8 weeks prior to treatment change); on-treatment baseline CD4 cell count (≤12 weeks prior to treatment change); baseline regimen (the drugs the patient was taking prior to the change); ART history; drugs in the new regimen; a follow-up plasma viral load determination taken between 4 and 52 weeks following introduction of the new regimen; and the time to that follow-up viral load (so that the models can be trained to predict responses at different times). A similar extraction was subsequently performed for TCEs that also had an on-treatment genotype [protease and reverse transcriptase (RT) sequence obtained ≤12 weeks prior to treatment change] for the development of the genotype models.
The TCEs were censored using rules established in previous studies and published in detail elsewhere.12 For example, TCEs involving drugs not adequately represented in the database (<500 cases) were excluded, in this case tipranavir, maraviroc and rilpivirine.
Data partition
The qualifying TCEs were partitioned at random using methods described elsewhere.8,12 The following datasets resulted (Figure 1). For the global no-genotype models, a training set of 29 574 TCEs and an independent test set of 1700 TCEs from 1700 patients not represented in the training set were obtained. Of these, 3179 training cases and 222 test cases were from South Africa and were used to train and test the ‘local’ South African no-genotype models.
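A test set whose patients are absent from the training set can be obtained by splitting on patient rather than on TCE. The following Python sketch illustrates that idea only; it is not the RDI partition method, which is described in the cited references. The record layout (`patient_id`, `tce_id`), the function name and the choice to keep one randomly selected TCE per test patient are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def partition_by_patient(tces, n_test, seed=0):
    """Split TCEs so that no test patient appears in the training set.

    tces: list of dicts with a 'patient_id' key (hypothetical layout).
    Returns (train, test); test holds one TCE per sampled patient.
    """
    rng = random.Random(seed)
    by_patient = defaultdict(list)
    for tce in tces:
        by_patient[tce["patient_id"]].append(tce)

    patients = sorted(by_patient)  # fixed order so the seed is reproducible
    test_patients = set(rng.sample(patients, n_test))

    train, test = [], []
    for pid in patients:
        if pid in test_patients:
            # Assumption: keep one randomly chosen TCE per test patient
            # and discard that patient's other TCEs.
            test.append(rng.choice(by_patient[pid]))
        else:
            train.extend(by_patient[pid])
    return train, test


# Toy data: 100 TCEs spread over 40 patients.
tces = [{"patient_id": i % 40, "tce_id": i} for i in range(100)]
train, test = partition_by_patient(tces, n_test=5)
```

Splitting on patient prevents the same individual from contributing to both training and evaluation, which would otherwise inflate the apparent accuracy.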

Figure 1. Schematic of data extraction, partition, model training and testing. SA, South African.
The data that included baseline genotypes were partitioned into a training set of 15 130 TCEs and 750 independent test cases for the development and testing of the global genotype models. Partitioning was constrained in order to give a minimum number of 50 test cases including the latest inhibitor to be modelled (elvitegravir).
Computational model development
The three training sets of TCEs were used to train three committees, each of 10 RF models (Figure 1). Each was trained to predict the probability of the follow-up viral load being <50 copies/mL, using methodology described in detail elsewhere.3,6 The following 43 input variables were used for the development of the global and South African no-genotype models: baseline viral load (log10 copies HIV RNA/mL); baseline CD4 count (cells/mm3); treatment history comprising 20 binary variables coding for any experience of zidovudine, didanosine, stavudine, abacavir, lamivudine, emtricitabine, tenofovir disoproxil fumarate, efavirenz, nevirapine, etravirine, indinavir, nelfinavir, saquinavir, amprenavir, fosamprenavir, lopinavir, atazanavir, darunavir, enfuvirtide and raltegravir; antiretroviral drugs in the new regimen, covering the same 20 drugs; and time from treatment change to the follow-up viral load (number of days). The output variable was the follow-up viral load coded as a binary variable: ≤1.7 log or 50 copies/mL = 1 (response) and >1.7 log = 0 (failure).
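The 43 input variables and the binary output described above can be laid out as a flat feature vector. The Python sketch below shows one possible encoding; the function names and argument layout are assumptions, and only the drug list, the variable order and the 50 copies/mL cut-off are taken from the text.

```python
import math

# The 20 drugs named in the text, in the order listed there.
DRUGS = [
    "zidovudine", "didanosine", "stavudine", "abacavir", "lamivudine",
    "emtricitabine", "tenofovir disoproxil fumarate", "efavirenz",
    "nevirapine", "etravirine", "indinavir", "nelfinavir", "saquinavir",
    "amprenavir", "fosamprenavir", "lopinavir", "atazanavir", "darunavir",
    "enfuvirtide", "raltegravir",
]


def encode_tce(baseline_vl_log10, baseline_cd4, history, new_regimen, days_to_followup):
    """Return the 43 inputs: 2 baseline values, 20 + 20 drug flags, 1 time value."""
    history_flags = [1 if drug in history else 0 for drug in DRUGS]
    regimen_flags = [1 if drug in new_regimen else 0 for drug in DRUGS]
    return [baseline_vl_log10, baseline_cd4, *history_flags, *regimen_flags, days_to_followup]


def encode_response(followup_copies_per_ml):
    """Binary output: 1 (response) at or below 50 copies/mL, else 0 (failure)."""
    return 1 if followup_copies_per_ml <= 50 else 0


# 50 copies/mL corresponds to about 1.7 on the log10 scale.
threshold_log10 = round(math.log10(50), 1)

features = encode_tce(
    baseline_vl_log10=4.2,
    baseline_cd4=180,
    history={"zidovudine", "lamivudine", "efavirenz"},
    new_regimen={"tenofovir disoproxil fumarate", "emtricitabine", "lopinavir"},
    days_to_followup=168,
)
```

The example values are invented; they only show that two baseline measurements, two blocks of 20 drug flags and the follow-up time add up to 43 inputs.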
The genotype models used the above variables plus, for the first time, tipranavir as a historical drug and elvitegravir in the new regimen (because these models were developed later and sufficient data were available to include these drugs). In addition, the following 62 mutations, detected in a genotype obtained while on the previous therapy no more than 84 days before the initiation of the new regimen, were used: HIV RT mutations (33)—M41L, E44D, A62V, K65R, D67N, 69 insert, T69D/N, K70R, L74V, V75I, F77L, V90I, A98G, L100I, L101I/E/P, K103N, V106A/M, V106I, V108I, Y115F, F116Y, V118I, E138A/G/K, Q151M, V179D/F/T, Y181C/I/V, M184V/I, Y188C/L/H, G190S/A, L210W, T215F/Y, K219Q/E and P236L; and HIV protease mutations (29)—L10F/I/R/V, V11I, K20M/R, L24I, D30N, V32I, L33F, M36I, M36L/V, M46I/L, I47V, G48V, I50V, I50L, F53L, I54 (any change), Q58E, L63P, A71 (any change), G73 (any change), T74P, L76V, V77I, V82A/F/S, V82T, I84V/A/C, N88D/S, L89V and L90M. The mutations were selected on the basis of the IAS-USA mutation list and previous modelling studies.13
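Each listed mutation can be treated as one binary input that is set when the genotype carries any of the named substitutions at that position. The Python sketch below assumes a hypothetical genotype representation (a mapping from codon position to the observed amino acid) and handles only labels of the form `M184V/I`; entries such as "69 insert" or "I54 (any change)" would need separate handling that is not shown.

```python
import re

# Matches labels such as "K103N" or "M184V/I": wild type, position, variants.
_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z](?:/[A-Z])*)$")


def parse_mutation(label):
    """Split a mutation label into (wild_type, position, set_of_variants)."""
    match = _PATTERN.match(label)
    if match is None:
        raise ValueError(f"unsupported mutation label: {label}")
    wild_type, position, variants = match.groups()
    return wild_type, int(position), set(variants.split("/"))


def encode_genotype(observed, labels):
    """Return one 0/1 flag per label.

    observed: dict mapping position -> amino acid found in the sequence
    (hypothetical representation; positions absent from the dict are
    treated as not mutated).
    """
    flags = []
    for label in labels:
        _, position, variants = parse_mutation(label)
        flags.append(1 if observed.get(position) in variants else 0)
    return flags


labels = ["K103N", "V106A/M", "V106I", "M184V/I"]
observed = {103: "N", 106: "I", 184: "V"}
flags = encode_genotype(observed, labels)
```

Because V106A/M and V106I appear as separate entries in the list, a V106I substitution sets only the second of the two flags.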
Validation and independent testing
Each of the three committees of 10 RF models (global no-genotype, South African no-genotype and global genotype) was developed using a 10× cross-validation scheme.8,12 For each partition, the model's estimates of the probability of response for the validation cases were compared with the actual responses observed in the clinic (binary response variable: response versus failure).
The performance of the three sets of models as predictors of virological response was then evaluated using the independent test cases partitioned from the pool of available TCEs at random prior to training. The models' estimates of the probability of response and the responses observed in the clinics for the independent test cases were used to plot ROC curves and assess the AUC. In addition, the optimum operating point (OOP) for the models was derived during cross-validation and used to obtain the overall accuracy, sensitivity and specificity of the models.
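The evaluation measures named above can be computed directly from predicted probabilities and observed binary outcomes. The Python sketch below uses the pairwise (Mann-Whitney) definition of the AUC and takes the operating point as a given threshold; how the OOP itself was derived is not specified here, so it is not reproduced. The function names and the toy data are assumptions.

```python
def roc_auc(y_true, y_prob):
    """AUC as the fraction of (responder, failure) pairs ranked correctly.

    Ties between a responder and a failure count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at(y_true, y_prob, threshold):
    """Sensitivity, specificity and overall accuracy at a fixed threshold.

    A case is predicted to respond when its probability is above the threshold.
    """
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        predicted = 1 if p > threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(y_true),
    }


# Invented outcomes (1 = response) and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.6, 0.1]

auc = roc_auc(y_true, y_prob)
metrics = metrics_at(y_true, y_prob, threshold=0.44)
```

The pairwise form is quadratic in the number of cases, which is acceptable for test sets of a few hundred to a few thousand TCEs; a rank-based implementation would be preferable for larger sets.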
Comparison of the accuracy of the models versus rules-based interpretation of the genotype
Genotypic sensitivity scores (GSS) were obtained for those test cases with genotypes that could be fully interpreted (i.e. all the drugs in the regimen were covered) by three rules-based genotype interpretation systems in common use: ANRS, REGA and Stanford HIVdb. The three systems were accessed online on 28 October 2015 and the GSS calculated by adding the score for each drug in the regimen, as described previously.12 The GSS were then used as a predictor of response to the regimens and ROC curves calculated, as described for the models, and the performance characteristics compared with those of the global models, both genotype and no-genotype.
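The GSS calculation amounts to summing a per-drug score over the regimen and discarding cases where any drug lacks an interpretation. In the Python sketch below, the three-level scoring (1, 0.5, 0) is a commonly used convention adopted here as an assumption; the scoring actually applied is described in the cited reference, and the drug names and susceptibility calls are invented.

```python
# Assumed scoring convention; not taken from the text.
SCORE = {"susceptible": 1.0, "intermediate": 0.5, "resistant": 0.0}


def genotypic_sensitivity_score(regimen, interpretation):
    """Sum the per-drug scores for a regimen.

    interpretation: dict mapping drug -> susceptibility call from one
    interpretation system (hypothetical layout).
    Returns None when any drug is not covered, mirroring the exclusion of
    cases that could not be fully interpreted.
    """
    if any(drug not in interpretation for drug in regimen):
        return None
    return sum(SCORE[interpretation[drug]] for drug in regimen)


interpretation = {
    "tenofovir": "susceptible",
    "lamivudine": "resistant",
    "lopinavir": "intermediate",
}
gss = genotypic_sensitivity_score(["tenofovir", "lamivudine", "lopinavir"], interpretation)
missing = genotypic_sensitivity_score(["tenofovir", "raltegravir"], interpretation)
```

The resulting score can then be used as a ranking variable for an ROC curve in the same way as a model's predicted probability.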
In silico analysis to evaluate the potential of the models to help avoid treatment failure
In order to evaluate further the potential clinical utility of the models, we assessed their ability to identify alternative, practical regimens that were predicted to be effective (probability of virological response above the OOP) or more likely to be effective than the regimens actually introduced in the clinic. For the no-genotype models, intended for use in LMICs, a list of three-drug regimens comprising only those drugs represented in the South African data on or before the start date of the new regimen in the clinic was identified from the RDI database. We ran each of the South African test cases through both the global and local no-genotype models and obtained the models' predictions of response for each regimen on the list, as well as for the new regimen used in the clinic.
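The in silico search described above can be sketched as scoring every candidate regimen for a case and keeping those above the OOP or above the prediction for the clinic's regimen. In the Python sketch below, `toy_predict` is a hypothetical stand-in for a trained model and has no clinical meaning; the function names, case layout and regimens are likewise assumptions.

```python
def rank_alternatives(case, candidate_regimens, clinic_regimen, predict, oop):
    """Score candidate regimens for one case.

    Returns (p_clinic, effective, better), where 'effective' holds regimens
    with predicted probability above the OOP and 'better' holds regimens
    predicted to be more likely to succeed than the clinic's regimen.
    """
    p_clinic = predict(case, clinic_regimen)
    scored = [(regimen, predict(case, regimen)) for regimen in candidate_regimens]
    effective = [(r, p) for r, p in scored if p > oop]
    better = [(r, p) for r, p in scored if p > p_clinic]
    return p_clinic, effective, better


def toy_predict(case, regimen):
    """Hypothetical stand-in for a model: rewards drugs new to the patient."""
    novel = sum(1 for drug in regimen if drug not in case["history"])
    return novel / 4


case = {"history": {"zidovudine", "lamivudine", "efavirenz"}}
clinic_regimen = ("zidovudine", "lamivudine", "lopinavir")
candidates = [
    ("tenofovir", "emtricitabine", "lopinavir"),
    ("tenofovir", "lamivudine", "lopinavir"),
    ("zidovudine", "lamivudine", "nevirapine"),
]

p_clinic, effective, better = rank_alternatives(
    case, candidates, clinic_regimen, toy_predict, oop=0.44
)
```

Passing the candidate list in as an argument reflects the restriction to regimens that were actually available locally, rather than enumerating every possible drug combination.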
For the global genotype models, a list of three-, four- and five-drug regimens was identified from the RDI database. A global set of independent test cases where the new regimen comprised three or more drugs was entered into the models and predictions obtained for the regimens from the drug list that had no more drugs than the regimen used in the clinic.
Results
Characteristics of the datasets
The baseline, treatment and response characteristics of the datasets are summarized in Tables 1 and 2. For the no-genotype datasets, as would be expected, the gender balance is very different in the global datasets and the South African datasets, with the latter having a fairly even split between males and females. The global datasets were drawn from a wide range of institutions and countries in Europe, North America, South America, sub-Saharan Africa and South and South-East Asia.
Table 1. Characteristics of the TCEs in the training and test sets for the models that do not require a genotype
Characteristic | Global training set | South African training set | Global test set | South African test set |
---|---|---|---|---|
TCEs | 29 574 | 3179 | 1700 | 222 |
male, n | 19 112 | 1314 | 1031 | 96 |
female, n | 6340 | 1574 | 403 | 99 |
gender not known, n | 4122 | 291 | 266 | 27 |
age (years), median | 41 | 36 | 40 | 37 |
Geographical sources of TCEs, n | ||||
Argentina | 91 | | 13 | |
Australia | 298 | | 11 | |
Brazil | 5 | | 0 | |
Canada | 1760 | | 106 | |
Germany | 3050 | | 174 | |
India | 285 | | 33 | |
Italy | 1114 | | 63 | |
Japan | 69 | | 3 | |
Mexico | 183 | | 14 | |
Netherlands | 3376 | | 209 | |
Romania | 354 | | 40 | |
Serbia | 18 | | 5 | |
South Africa | 3179 | 3179 | 222 | 222 |
Spain | 3192 | | 131 | |
UK | 7218 | | 347 | |
USA | 1309 | | 68 | |
sub-Saharan Africa (country unknown) | 58 | | 6 | |
not known (multinational cohorts) | 4015 | | 255 | |
Baseline data | ||||
baseline viral load (log10 copies/mL), median (IQR) | 3.8 (2.69–4.7) | 3.89 (2.93–4.6) | 3.74 (2.68–4.65) | 3.9 (2.9–4.56) |
baseline CD4 (cells/mm3), median (IQR) | 264 (140–420) | 209 (122–322) | 262 (136–417) | 176 (105–323.5) |
Treatment history | ||||
no. of previous drugs, median (IQR) | 4 (3–6) | 3 (3–3) | 4 (3–6) | 3 (3–3) |
NRTI experience, n (%) | 29 484 (99.7) | 3179 (100) | 1692 (99.5) | 222 (100) |
NNRTI experience, n (%) | 19 551 (66) | 3042 (95) | 1100 (65) | 214 (96) |
PI experience, n (%) | 18 382 (62) | 356 (11) | 970 (57) | 18 (8) |
no. of previous regimens, median (IQR) | 4 (2–7) | 2 (2–3) | 3 (2–5) | 2 (2–2) |
New regimens, n (%) | ||||
2 NRTI + 1 PI | 10 346 (35) | 1547 (49) | 581 (34) | 105 (47) |
2 NRTI + 1 NNRTI | 6398 (22) | 1396 (44) | 431 (25) | 101 (45) |
3 NRTIs + 1 PI | 2035 (7) | 30 (1) | 118 (7) | 0 (0) |
3 NRTIs | 1344 (5) | 2 (0) | 73 (4) | 0 (0) |
3 NRTIs + 1 NNRTI | 1037 (4) | 17 (1) | 66 (4) | 3 (1) |
2 NRTIs | 896 (3) | 37 (1) | 51 (3) | 2 (1) |
2 NRTIs + 1 NNRTI + 1 PI | 742 (3) | 5 (0) | 36 (2) | 0 (0) |
1 PI + 1 integrase inhibitor | 734 (2) | 0 (0) | 41 (2) | 0 (0) |
4 NRTIs | 547 (2) | 0 (0) | 28 (2) | 0 (0) |
1 NRTI + 1 NNRTI + 1 PI | 534 (2) | 1 (0) | 27 (2) | 0 (0) |
1 NRTI + 1 PI | 440 (1) | 59 (2) | 30 (2) | 5 (2) |
other | 4521 (15) | 85 (3) | 218 (13) | 6 (3) |
Table 2. Characteristics of the TCEs in the training and test sets for the models that use a genotype
Characteristic | Global training set | Global test set |
---|---|---|
TCEs | 15 130 | 750 |
male, n | 9301 | 457 |
female, n | 1950 | 94 |
gender not known, n | 3879 | 199 |
age (years), median | 44 | 43 |
Geographical sources of TCEs, n | ||
Australia | 292 | 19 |
Canada | 1329 | 68 |
Germany | 1290 | 45 |
India | 76 | 8 |
Italy | 707 | 52 |
Japan | 116 | 7 |
Netherlands | 814 | 40 |
Romania | 20 | 1 |
South Africa | 60 | 7 |
Spain | 1271 | 63 |
UK | 2556 | 94 |
USA | 438 | 23 |
sub-Saharan Africa (country unknown) | 34 | 3 |
unknown (from multinational cohorts/trials) | 6127 | 320 |
Baseline data | ||
baseline viral load (log10 copies/mL), median (IQR) | 4.3 (3.55–4.9) | 4.3 (3.6–4.9) |
baseline CD4 (cells/mm3), median (IQR) | 218 (105–360) | 225 (106.5–354) |
Treatment history | ||
no. of previous drugs, median (IQR) | 5 (3–7) | 5 (3–7) |
NRTI experience, n (%) | 15 057 (99.5) | 745 (99) |
NNRTI experience, n (%) | 9570 (63) | 469 (63) |
PI experience, n (%) | 11 045 (73) | 530 (71) |
no. of previous regimens, median (IQR) | 4 (3–7) | 3 (2–6) |
New regimens, n (%) | ||
2 NRTI + 1 PI | 4307 (28) | 223 (30) |
2 NRTI + 1 NNRTI | 1498 (10) | 66 (9) |
3 NRTIs + 1 PI | 1343 (9) | 66 (9) |
1 PI + 1 II | 655 (4) | 36 (5) |
3 NRTIs | 648 (4) | 26 (3) |
3 NRTIs + 1 NNRTI | 615 (4) | 24 (3) |
2 NRTIs + 1 PI + 1 II | 593 (4) | 23 (3) |
1 NRTI + 1 PI + 1 II | 591 (4) | 43 (6) |
2 NRTIs | 407 (3) | 19 (3) |
2 NRTIs + 1 NNRTI + 1 PI | 445 (3) | 22 (3) |
4 NRTIs | 348 (2) | 14 (2) |
1 NRTI + 1 NNRTI + 1 PI | 382 (3) | 21 (3) |
2 NRTIs + 1 NNRTI + 1 II | 285 (2) | 7 (1) |
1 NRTI + 1 PI | 272 (2) | 18 (2) |
2 NRTIs + 1 II | 249 (2) | 17 (2) |
other | 2492 (16) | 251 (33) |
II, integrase inhibitor.
The datasets have similar baseline viral loads, but the South African patients have substantially lower CD4 counts at baseline than the global set (median of 209 versus 264 cells/mm3 for the training datasets). The South African cases were mostly patients moving from first- to second-line therapy with three drugs in their history, whereas the median number of previous drugs for the global data was four. Thirty-five percent of the global training cases were switching to two NRTIs and a PI compared with 49% of the South African cases. Twenty-two percent of global cases were switching to two NRTIs and an NNRTI compared with 44% of the South African cases.
The baseline characteristics of the global genotype model data are summarized in Table 2. They resembled those of the global no-genotype data, with the exceptions that the median number of previous drugs was higher, at five versus four, the median baseline viral load was a little higher at 4.13 versus 3.8 log10 copies/mL HIV RNA and median CD4 counts were lower at 218 versus 264 cells/mm3.
Results of the modelling without a genotype
The performance characteristics from the ROC curves of the two sets of 10 individual models during cross-validation and independent testing are summarized in Table 3. The 10 global no-genotype models achieved AUC values during cross-validation ranging from 0.81 to 0.85, with a mean of 0.83. The overall accuracy ranged from 75% to 78% (mean = 76%), the sensitivity ranged from 70% to 72% (mean = 71%) and the specificity from 78% to 83% (mean = 80%). The OOP was 0.44.
Model | Global models: AUC | Global models: sensitivity (%) | Global models: specificity (%) | Global models: OA (%) | Global models: OOP | Local models: AUC | Local models: sensitivity (%) | Local models: specificity (%) | Local models: OA (%) | Local models: OOP |
---|---|---|---|---|---|---|---|---|---|---|
Cross-validation during model development | ||||||||||
1 | 0.85 | 72 | 81 | 78 | 0.45 | 0.85 | 74 | 82 | 79 | 0.43 |
2 | 0.83 | 70 | 81 | 77 | 0.43 | 0.73 | 62 | 77 | 71 | 0.39 |
3 | 0.83 | 70 | 80 | 76 | 0.44 | 0.90 | 86 | 84 | 85 | 0.43 |
4 | 0.83 | 70 | 80 | 76 | 0.43 | 0.81 | 71 | 78 | 75 | 0.40 |
5 | 0.83 | 70 | 80 | 77 | 0.44 | 0.81 | 71 | 80 | 77 | 0.41 |
6 | 0.81 | 70 | 78 | 75 | 0.42 | 0.81 | 71 | 75 | 74 | 0.38 |
7 | 0.83 | 72 | 78 | 76 | 0.44 | 0.74 | 63 | 77 | 72 | 0.39 |
8 | 0.85 | 70 | 83 | 78 | 0.46 | 0.80 | 67 | 78 | 74 | 0.42 |
9 | 0.84 | 71 | 80 | 77 | 0.44 | 0.77 | 70 | 73 | 72 | 0.39 |
10 | 0.82 | 70 | 78 | 75 | 0.43 | 0.82 | 68 | 83 | 78 | 0.43 |
Mean | 0.83 | 71 | 80 | 76 | 0.44 | 0.80 | 70 | 79 | 76 | 0.41 |
Min | 0.81 | 70 | 78 | 75 | 0.42 | 0.73 | 62 | 73 | 71 | 0.38 |
Max | 0.85 | 72 | 83 | 78 | 0.46 | 0.90 | 86 | 84 | 85 | 0.43 |
Independent testing, by test set | ||||||||||
global test set (n = 1700) | 0.82 | 69 | 77 | 74 | 0.44 | 0.70 | 48 | 78 | 66 | 0.41 |
local test set (n = 222) | 0.79 | 68 | 76 | 73 | 0.44 | 0.79 | 64 | 79 | 74 | 0.41 |
OA, overall accuracy.
The local South African no-genotype models achieved AUC values during cross-validation ranging from 0.73 to 0.90 with a mean of 0.80. The overall accuracy ranged from 71% to 85% (mean = 76%), the sensitivity ranged from 62% to 86% (mean = 70%) and the specificity ranged from 73% to 84% (mean = 79%).
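As an illustration of how the figures in Table 3 relate to one another, the following minimal Python sketch computes an AUC together with the sensitivity, specificity and overall accuracy at a probability cut-off. It assumes that the OOP is the cut-off at or above which a prediction is classed as a response; the function names and the eight toy cases are hypothetical and are not study data.

```python
from typing import Dict, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve by pairwise comparison.

    Equals the fraction of (responder, non-responder) pairs in which the
    responder received the higher score; ties count as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def metrics_at_cutoff(
    labels: Sequence[int], scores: Sequence[float], cutoff: float
) -> Dict[str, float]:
    """Sensitivity, specificity and overall accuracy at a given cut-off.

    Assumption: a score at or above the cut-off is a predicted response.
    """
    predicted = [1 if s >= cutoff else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one responder and one non-responder")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(labels),
    }


# Toy data (hypothetical): 1 = virological response, 0 = failure.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.7, 0.4, 0.3, 0.5, 0.2, 0.6, 0.45]

toy_auc = roc_auc(toy_labels, toy_scores)
toy_metrics = metrics_at_cutoff(toy_labels, toy_scores, cutoff=0.44)
print(toy_auc, toy_metrics)
```

The AUC does not depend on the cut-off, whereas sensitivity, specificity and overall accuracy do, which is why each model in Table 3 is reported with its own OOP.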
Independent testing of global models
The committee of 10 global no-genotype models achieved an AUC of 0.82 in independent testing with the global test set of 1700 TCEs. The overall accuracy was 74%, the sensitivity 69% and the specificity 77%. The results are listed in Table 3 and the ROC curve for the committee is presented in Figure 2. When tested with the independent set of 222 local South African TCEs, the global models achieved an AUC of 0.79. The overall accuracy was 73%, the sensitivity 68% and the specificity 76%.

ROC curves for the global and local (South African) models in independent testing with global and local (South African) test data.
Independent testing of the South African no-genotype models
The committee of 10 local South African no-genotype models achieved an AUC of 0.79 when tested with the local South African test cases, identical to that of the global models. The overall accuracy was 74%, the sensitivity 64% and the specificity 79%. The results are listed in Table 3 and the ROC curve for the committee is presented in Figure 2. When tested with the global independent set of 1700 TCEs, the local models achieved an AUC of 0.70. The overall accuracy was 66%, the sensitivity 48% and the specificity 78%. The accuracy of the two sets of models was not significantly different for the South African test cases, but the global models were significantly more accurate for the global test set (P < 0.0001).
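The committee results above combine the outputs of 10 models. One way this could be done is sketched below, assuming the committee prediction is the simple mean of the member predictions; the combination rule is not stated in this section. The `Model` type, `committee_probability` and the three toy members are hypothetical.

```python
from statistics import fmean
from typing import Callable, Sequence

# A model maps a vector of input variables to a probability of response.
Model = Callable[[Sequence[float]], float]


def committee_probability(models: Sequence[Model], features: Sequence[float]) -> float:
    """Mean of the member models' predicted probabilities of response.

    Assumption: unweighted averaging; other combination rules are possible.
    """
    if not models:
        raise ValueError("committee must contain at least one model")
    return fmean(m(features) for m in models)


# Toy committee (hypothetical): each member ignores the inputs and
# returns a fixed probability, purely to show the averaging step.
toy_committee = [lambda x: 0.2, lambda x: 0.4, lambda x: 0.6]
toy_prediction = committee_probability(toy_committee, features=[4.3, 218.0])
print(toy_prediction)
```

Averaging over several models trained on different partitions of the data tends to smooth out the variation between individual members seen in the cross-validation rows of Table 3.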
Comparing the predictive accuracy of the global no-genotype models versus genotyping
Of the 1700 TCEs in the global no-genotype test set, 680 had genotypes available that were suitable for full interpretation using the three rules-based genotype interpretation systems. The AUC values for the GSS obtained using the three systems were 0.57 (ANRS), 0.57 (Rega) and 0.58 (Stanford HIVdb) (Table 4). All were significantly less accurate predictors of virological response than the global no-genotype models, which achieved an AUC of 0.83 for these cases (P < 0.0001).
Comparison of model predictions versus GSS for 601 test TCEs with genotypes
Prediction system | AUC | Sensitivity (%) | Specificity (%) | Overall accuracy (%) | P (GSS versus models) |
---|---|---|---|---|---|
No-genotype models | |||||
Total ANRS score | 0.57 | 48 | 61 | 57 | <0.0001 |
Total HIVdb score | 0.58 | 43 | 68 | 60 | <0.0001 |
Total REGA score | 0.57 | 49 | 60 | 57 | <0.0001 |
Models | 0.83 | 58 | 88 | 78 | |
Genotype models | |||||
Total ANRS score | 0.57 | 48 | 60 | 57 | <0.0001 |
Total HIVdb score | 0.58 | 39 | 67 | 59 | <0.0001 |
Total REGA score | 0.55 | 46 | 60 | 56 | <0.0001 |
Models | 0.82 | 74 | 77 | 76 | |
GSS, genotype sensitivity score.
Results of the modelling with a genotype
The performance characteristics from the ROC curves of the 10 individual models during cross-validation and independent testing are summarized in Table 5. The 10 global genotype models achieved AUC values during cross-validation ranging from 0.85 to 0.89, with a mean of 0.87. The overall accuracy ranged from 76% to 80% (mean = 79%), the sensitivity ranged from 78% to 84% (mean = 79%) and the specificity from 75% to 82% (mean = 79%). The OOP was 0.38.
Model | AUC | Sensitivity (%) | Specificity (%) | Overall accuracy (%) | OOP |
---|---|---|---|---|---|
Cross-validation during model development | |||||
1 | 0.88 | 78 | 80 | 80 | 0.36 |
2 | 0.88 | 78 | 80 | 80 | 0.38 |
3 | 0.87 | 78 | 81 | 80 | 0.41 |
4 | 0.87 | 82 | 79 | 80 | 0.35 |
5 | 0.88 | 80 | 79 | 79 | 0.39 |
6 | 0.85 | 78 | 75 | 76 | 0.40 |
7 | 0.87 | 79 | 78 | 78 | 0.39 |
8 | 0.89 | 84 | 79 | 80 | 0.35 |
9 | 0.87 | 78 | 82 | 80 | 0.39 |
10 | 0.88 | 79 | 81 | 80 | 0.39 |
Mean | 0.87 | 79 | 79 | 79 | 0.38 |
Min | 0.85 | 78 | 75 | 76 | 0.35 |
Max | 0.89 | 84 | 82 | 80 | 0.41 |
Independent testing (n = 750) | 0.84 | 79 | 74 | 76 | |
Independent testing
The committee of 10 global genotype models achieved an AUC of 0.84 in independent testing with a global test set of 750 TCEs. The overall accuracy was 76%, the sensitivity 79% and the specificity 74%. The results are listed in Table 5. When tested with the 50 independent test cases involving elvitegravir in the new regimen, the models achieved an AUC of 0.82 and overall accuracy of 76%.
Comparing the predictive accuracy of the global genotype models versus genotyping
Of the 750 TCEs in the global test set, 601 had genotypes available that were suitable for full interpretation using the three rules-based genotype interpretation systems. The AUC values for the GSS obtained using the three systems were 0.57 (ANRS), 0.55 (REGA) and 0.58 (Stanford HIVdb) (Table 4). All were significantly less accurate predictors of virological response than the global genotype models, which achieved an AUC of 0.82 for these cases (P < 0.0001).
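For orientation, a total GSS of the kind compared above can be sketched as follows. The sketch assumes the commonly used convention of scoring each drug in the regimen as 1 (susceptible), 0.5 (intermediate) or 0 (resistant) and summing the scores; the scoring actually applied in this study is the one described in its Methods. The function name, the drug labels and the susceptibility calls are invented for illustration.

```python
from typing import Dict

# Assumed convention, not taken from this section:
# one point per fully active drug, half a point for intermediate activity.
CALL_SCORES: Dict[str, float] = {
    "susceptible": 1.0,
    "intermediate": 0.5,
    "resistant": 0.0,
}


def total_gss(calls_by_drug: Dict[str, str]) -> float:
    """Sum the per-drug scores for every drug in the new regimen."""
    try:
        return sum(CALL_SCORES[call] for call in calls_by_drug.values())
    except KeyError as exc:
        raise ValueError(f"unknown susceptibility call: {exc.args[0]!r}") from exc


# Hypothetical interpretation output for a three-drug regimen.
toy_calls = {
    "drug_a": "susceptible",
    "drug_b": "resistant",
    "drug_c": "intermediate",
}
toy_gss = total_gss(toy_calls)
print(toy_gss)
```

Because a score of this kind takes only a handful of discrete values and ignores viral load, CD4 count and treatment history, it would be expected to discriminate less finely than a model that outputs a continuous probability.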
In silico analysis
The global and South African no-genotype models were then used to identify alternative regimens to the new regimens introduced in the South African clinics, comprising locally available drugs, with the highest estimated probability of virological response. The results are presented in Table 6.
Results of in silico analysis for the global and local South African no-genotype models tested with the 222 test cases from South Africa
Test set | Measure | Global models | Local models | P (global versus local models) |
---|---|---|---|---|
All 222 South African TCEs | alternatives predicted to be effective (%) | 74 | 64 | 0.03 |
 | alternatives with higher probability of response than regimen used in clinic (%) | 100 | 99 | NS |
 | difference in probability of response between best alternative versus regimen used in clinic | 0.18 | 0.14 | <0.001 |
141 South African failures | alternatives predicted to be effective (%) | 62 | 50 | 0.04 |
 | alternatives with higher probability of response than regimen used in clinic (%) | 100 | 99 | NS |
 | difference in probability of response between best alternative versus regimen used in clinic | 0.20 | 0.15 | 0.007 |
NS, not significant.
The global models were able to identify alternative regimens that were predicted to be effective for 74% of the South African cases. This was significantly higher than the 64% for the local South African models (P = 0.03). The global models were able to identify alternative regimens with a higher probability of response for all 222 South African test cases and the local South African models for 220 (99%). The difference in probability of response between the alternative regimen with the highest probability of response and the regimen used in the clinic was significantly higher for the global models at 0.18 than the local models at 0.14 (P < 0.001).
Turning to the 141 South African cases that experienced virological failure following their change of regimen in the clinic, the global no-genotype models were able to identify alternative regimens that were predicted to be effective for 62% and the local South African no-genotype models for 50% (P = 0.04). The global models were able to identify alternative regimens with a higher probability of response for all 141 failures and the local South African models for 140 (99%). The difference in probability of response between the alternative regimen with the highest probability of response and the regimen used in the clinic was significantly higher for the global models at 0.20 than the local models at 0.15 (P = 0.007).
The global genotype models identified alternative regimens with no more drugs than the new regimen used in the clinic that were predicted to give a response for 636 (96%) of the 664 global test cases that were used in the analysis. They identified regimens with a higher probability of response for 662 cases (99.7%). For the 424 patients who experienced virological failure following the change to a new regimen in the clinic, the models identified alternative regimens that were predicted to give a response for 401 (95%) and with a higher probability of response than the regimen used in the clinic for all 424 cases.
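The in silico analysis described above amounts to a search over candidate regimens. A minimal sketch, assuming that candidates are all combinations of the locally available drugs containing no more drugs than the regimen used in the clinic and that the best candidate is the one with the highest predicted probability of response, is given below. The predictor is a hypothetical stand-in for the trained models, and the drug labels, effect values and cut-off are invented.

```python
from itertools import combinations
from math import prod
from typing import Callable, FrozenSet, Iterable, Tuple

Regimen = FrozenSet[str]


def best_alternative(
    available: Iterable[str],
    predict: Callable[[Regimen], float],
    max_drugs: int,
) -> Tuple[Regimen, float]:
    """Return the candidate regimen with the highest predicted probability.

    Candidates are every combination of 1..max_drugs available drugs.
    """
    drugs = sorted(set(available))
    candidates = [
        frozenset(combo)
        for size in range(1, max_drugs + 1)
        for combo in combinations(drugs, size)
    ]
    if not candidates:
        raise ValueError("no candidate regimens to evaluate")
    best = max(candidates, key=predict)
    return best, predict(best)


# Hypothetical stand-in for a model: each drug has an invented, independent
# chance of suppressing the virus. This is NOT how the study's models work.
TOY_EFFECT = {"drug_a": 0.3, "drug_b": 0.5, "drug_c": 0.2}


def toy_predict(regimen: Regimen) -> float:
    return 1.0 - prod(1.0 - TOY_EFFECT[d] for d in regimen)


TOY_CUTOFF = 0.44  # illustrative cut-off for "predicted to be effective"

clinic_regimen = frozenset({"drug_a", "drug_c"})
best, p_best = best_alternative(TOY_EFFECT, toy_predict, max_drugs=len(clinic_regimen))
gain = p_best - toy_predict(clinic_regimen)
predicted_effective = p_best >= TOY_CUTOFF
print(sorted(best), p_best, gain, predicted_effective)
```

The three quantities printed at the end correspond to the three measures reported in Table 6: whether an alternative is predicted to be effective, whether it has a higher probability of response than the regimen used in the clinic, and the size of that difference.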
Discussion
These latest computational models are the most accurate predictors of virological response to combination ART developed to date. Both sets of global models achieved AUC values well in excess of 0.80 in cross-validation and independent testing and there was no significant difference in accuracy between those models that use a genotype and those that do not. The results replicated and reinforced previous findings that computational models, even those that do not use a genotype in their predictions, are substantially more accurate predictors of virological response to combination ART than viral genotyping with rules-based interpretation.11
The results indicate that models developed using large global datasets can perform as accurately as models developed using more limited data from a single setting in predicting responses to HIV therapy for patients from that setting. Moreover, the global models are significantly better able to model the responses to alternative combinations of drugs and identify options with a higher probability of virological response than the regimens used in the clinic: they identified viable alternatives for more cases and with greater improvements in the probability of response than the local models.
The broad range of settings and countries represented in the data used in this study suggests that these findings and the potential utility of these models are generalizable and applicable to all settings. These results, taken together, suggest that a strategy of developing models using the largest, most heterogeneous dataset possible for global use would be superior to one of developing national models using smaller, less heterogeneous datasets.
It is important to note that one of the input variables for these models was the plasma viral load, which previous studies have shown to be very important to the predictive accuracy of the models.14 Although viral load monitoring is not routine in most resource-limited settings, it is now recommended as the preferred approach to monitoring ART success and diagnosing treatment failure in the latest WHO guidelines.15 As technological advances enable lower test costs and simpler equipment requiring less infrastructure, maintenance and technical expertise, so the use of viral load in clinical practice is likely to increase.16 A corollary of this is that use of CD4 counts may become less widespread in such settings, preventing the use of the current models. If this transpires, models could be developed that do not require CD4 counts for their predictions, although results of preliminary studies conducted by the RDI in 2003–04 indicated that the inclusion of CD4 counts contributed to model accuracy (B. A. Larder, D. Wang and A. D. Revell, unpublished results).
Another potential avenue for future research is the development of models that use the duration of previous drug exposures as input variables, instead of the current binary variables. Only a subset of the RDI database currently includes complete information on the duration of previous therapies, but as the database expands this pool may become sufficient for the development and evaluation of such models.
The study has some limitations. First, it was retrospective and, as such, no firm claims can be made for the clinical benefit that use of the system as a treatment support tool could provide. Another possible shortcoming, inherent in such studies and discussed in previous publications, is that the cases used are, by definition, those with complete data around a change of therapy and, therefore, may not be truly representative of the general patient population and clinical practice as a whole. Nevertheless, the size of the training dataset, the variety of sources and settings from which the data were collected and the range of clinical practice represented are positive factors in considering the potential robustness and generalizability of the models' performance.
Conclusions
Computational models developed using large global datasets can be as accurate for a patient in a given setting as models developed using data only from that setting and are better able to identify alternatives with a higher probability of virological response than the regimens otherwise used in the clinic.
Global models developed using large heterogeneous clinical datasets are significantly better predictors of response to HIV therapy than genotyping with rules-based interpretation, even when those models do not use a genotype for their predictions. Since use of these models is free of charge, this suggests that scarce funds in LMICs would be better spent on antiretroviral drugs and viral load testing than on genotyping. This would enable a greater range of treatments to be offered, failure to be detected early and optimal individualized treatment change decisions to be made using the models.
Full validation of this approach as a clinical tool would require a prospective, controlled clinical trial. Nevertheless, the results suggest that the global models have the potential to reduce virological failure and improve patient outcomes in all parts of the world, with particular utility in resource-limited settings. The models can provide clinicians with a practical tool to support optimized treatment decision-making in the absence of resistance tests and where expertise may be lacking in the context of a public health approach to antiretroviral roll-out and management.
The global models described in this paper are freely available to use online through the HIV-TRePS system at http://www.hivrdi.org/treps.
Funding
This project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. This research was supported by the National Institute of Allergy and Infectious Diseases.
Transparency declarations
None to declare.
Disclaimer
The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, and mention of trade names, commercial products or organizations does not imply endorsement by the US Government.
Acknowledgements
RDI Data and Study Group
The RDI wishes to thank all the following individuals and institutions for providing the data used in training and testing its models.
Cohorts
Peter Reiss and Ard van Sighem (ATHENA, the Netherlands); Julio Montaner and Richard Harrigan (BC Center for Excellence in HIV & AIDS, Canada); Tobias Rinke de Wit, Raph Hamers and Kim Sigaloff (PASER-M cohort, the Netherlands); Brian Agan, Vincent Marconi and Scott Wegner (US Department of Defense); Wataru Sugiura (National Institute of Health, Japan); Maurizio Zazzi (MASTER, Italy); Rolf Kaiser and Eugen Schuelter (Arevir Cohort, Köln, Germany); Adrian Streinu-Cercel (National Institute of Infectious Diseases ‘Prof. Dr. Matei Balş’, Bucharest, Romania); and Gerardo Alvarez-Uria (VFHCS, India).
Clinics
Jose Gatell and Elisa Lazzari (University Hospital, Barcelona, Spain); Brian Gazzard, Mark Nelson, Anton Pozniak and Sundhiya Mandalia (Chelsea and Westminster Hospital, London, UK); Daniel Webster and Colette Smith (Royal Free Hospital, London, UK); Lidia Ruiz and Bonaventura Clotet (Fundacion Irsi Caixa, Badalona, Spain); Schlomo Staszewski (Hospital of the Johann Wolfgang Goethe-University, Frankfurt, Germany); Carlo Torti (University of Brescia); Cliff Lane and Julie Metcalf (NIH Clinic, Rockville, MD, USA); Maria-Jesus Perez-Elias (Instituto Ramón y Cajal de Investigación Sanitaria, Madrid, Spain); Stefano Vella and Gabrielle Dettorre (Sapienza University, Rome, Italy); Andrew Carr, Richard Norris and Karl Hesse (Immunology B Ambulatory Care Service, St Vincent's Hospital, Sydney, NSW, Australia); Dr Emanuel Vlahakis (Taylor's Square Private Clinic, Darlinghurst, NSW, Australia); Hugo Tempelman and Roos Barth (Ndlovu Care Group, Elandsdoorn, South Africa); Carl Morrow and Robin Wood (Desmond Tutu HIV Centre, University of Cape Town, South Africa); Chris Hoffmann (Aurum Institute, Johannesburg, South Africa and Johns Hopkins University, Baltimore, MD, USA); Luminita Ene (‘Dr. Victor Babes’ Hospital for Infectious and Tropical Diseases, Bucharest, Romania); Gordana Dragovic (University of Belgrade, Belgrade, Serbia); Ricardo Diaz and Cecilia Sucupira (Federal University of Sao Paulo, Sao Paulo, Brazil); Omar Sued and Carina Cesar (Fundación Huésped, Buenos Aires, Argentina); and Juan Sierra Madero (Instituto Nacional de Ciencias Medicas y Nutricion Salvador Zubiran, Mexico City, Mexico).
Clinical trials
Sean Emery and David Cooper (CREST); Carlo Torti (GenPherex); John Baxter (GART, MDR); Laura Monno and Carlo Torti (PhenGen); Jose Gatell and Bonaventura Clotet (HAVANA); Gaston Picchio and Marie-Pierre deBethune (DUET 1 & 2 and POWER 3); Maria-Jesus Perez-Elias (RealVirfen); and Sean Emery, Paul Khabo and Lotty Ledwaba (PHIDISA).
References
Author notes
Members are listed in the Acknowledgements section.