Chapter 15 Understanding the statistical analysis of resistance data

Cozzi-Lepri A.

Introduction: nature of the data

Phenotypic resistance data are continuous variables that indicate the concentration of drug needed to inhibit the replication of a patient's virus. Typically, this is expressed as the concentration of drug needed to inhibit virus replication by 50% or 90% (IC50 or IC90, respectively), or as the fold change in the drug concentration required to inhibit the patient's virus relative to a drug-sensitive reference isolate. In contrast, genotypic resistance data are typically a series of binary variables documenting the presence of specific mutations in the amino acid sequences of the reverse transcriptase (RT) or protease (PR) regions of the HIV-1 genome (e.g. yes/no variables for resistance mutations 41L, 65R, 67N and 184V). This chapter focuses on the statistical analysis of genotypic resistance data generated in routine clinical care.
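The two data types described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the numerical values are assumptions introduced here, not part of any assay or database.

```python
def fold_change(ic50_patient, ic50_reference):
    """Fold change in IC50 relative to a drug-sensitive reference virus."""
    if ic50_reference <= 0:
        raise ValueError("reference IC50 must be positive")
    return ic50_patient / ic50_reference


def encode_genotype(detected, mutations_of_interest):
    """Turn a set of detected mutations into yes/no (1/0) indicator variables."""
    return {m: int(m in detected) for m in mutations_of_interest}


# Hypothetical values, for illustration only.
fc = fold_change(ic50_patient=2.0, ic50_reference=0.5)
indicators = encode_genotype({"41L", "184V"}, ["41L", "65R", "67N", "184V"])
print(fc)          # continuous phenotypic measure
print(indicators)  # one binary variable per mutation
```

The phenotypic result is a single continuous number per drug, whereas the genotypic result is a vector of binary indicators, which is why the two call for different statistical treatment.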

Data management: the choice of reference HIV-1 strain

The raw sequencing data generated by genotypic resistance testing are typically a string of nucleotides, such as CCTATAGTGCAGAACATCCAGGGC…, often stored in FASTA format. By aligning the sequence with that of a reference HIV-1 virus, it is possible to identify which nucleotide changes have occurred. Although HXB2 is the most commonly used reference sequence, other references are also used, such as p_NL43 (a molecularly cloned strain of HIV-1 [1]) and the Consensus B reference sequence (derived from an alignment of subtype B sequences maintained at the Los Alamos HIV-1 Sequence Database [2]). These reference strains are similar to each other but can differ at certain codons (see Table 1).
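The comparison with a reference can be sketched as follows. This is a simplified illustration that assumes the two sequences are already aligned, in frame and of equal length; real pipelines must also handle alignment, insertions, deletions and nucleotide mixtures. The function names are assumptions, the nucleotide fragment quoted above is used as a toy reference, and the patient sequence is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides):
    """Translate an in-frame nucleotide string; unrecognised codons become 'X'."""
    codons = [nucleotides[i:i + 3] for i in range(0, len(nucleotides) - 2, 3)]
    return "".join(CODON_TABLE.get(c.upper(), "X") for c in codons)


def amino_acid_changes(reference_nt, sample_nt, first_codon=1):
    """List changes as codon number plus the amino acid seen in the sample (e.g. '184V')."""
    ref_aa, sample_aa = translate(reference_nt), translate(sample_nt)
    return [
        f"{first_codon + i}{s}"
        for i, (r, s) in enumerate(zip(ref_aa, sample_aa))
        if r != s
    ]


reference = "CCTATAGTGCAGAACATCCAGGGC"  # fragment quoted in the text, used as a toy reference
sample = "CCTATAATACAGAACATCCAGGGC"     # hypothetical patient sequence (codon 3: GTG -> ATA)

print(translate(reference))                   # PIVQNIQG
print(amino_acid_changes(reference, sample))  # ['3I']
```

The `first_codon` argument is there because a sequenced fragment rarely starts at codon 1 of the gene; it is an illustrative parameter, not a convention taken from any particular laboratory.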

Table 1. Amino acid alterations between the reverse transcriptase of HXB2 reference sequence and p_NL43.

A large number of sequences are commonly required to answer clinical questions related to the predictive value of genotypic resistance [3–5], and it is possible that different references have been used in different laboratories over time. It is crucial, therefore, that these differences are recognised when merging resistance databases coming from different studies or even from different laboratories within the same study. It is also important to state which reference was used in order to avoid confusion. One way to overcome this problem would be to obtain from the different sources the exact nucleotide triplet and the corresponding amino acid observed at each codon of the RT and PR regions, rather than only the differences from a reference strain.
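The following sketch illustrates why the reference matters when databases are merged. It assumes each laboratory reports only a list of differences from its own reference; the two five-codon references are toy sequences invented for the example (they are not HXB2 or p_NL43), and the function names are hypothetical.

```python
def observed_amino_acids(reference_aa, reported_changes):
    """Rebuild the amino acid observed at every codon from a reference
    and a list of reported changes such as '3I'."""
    observed = dict(enumerate(reference_aa, start=1))
    for change in reported_changes:
        codon, amino_acid = int(change[:-1]), change[-1]
        observed[codon] = amino_acid
    return observed


def changes_versus(reference_aa, observed):
    """Re-express the observed amino acids as changes from a chosen reference."""
    return [
        f"{codon}{aa}"
        for codon, aa in sorted(observed.items())
        if reference_aa[codon - 1] != aa
    ]


# Toy references that differ at codon 4 (illustrative only).
REF_A = "PIVQN"
REF_B = "PIVKN"

lab1 = observed_amino_acids(REF_A, ["3I"])  # laboratory 1 reported against REF_A
lab2 = observed_amino_acids(REF_B, ["3I"])  # laboratory 2 reported against REF_B

print(changes_versus(REF_A, lab1))  # ['3I']
print(changes_versus(REF_A, lab2))  # ['3I', '4K']
```

Both laboratories reported the same list, yet the underlying viruses differ at codon 4. Storing the observed amino acid at every codon, as suggested above, makes it possible to re-derive the changes against whichever common reference is chosen for the merged analysis.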

Common assumptions: the importance of basic knowledge of HIV-1 virology

Once resistance data are merged and checked for consistency in the use of reference sequences, the statistician is likely to face other challenges. Data from observational databases are often a mixture of sequences obtained from stored, retrospectively analysed plasma samples and resistance results routinely collected in the clinics where patients receive their treatment. As a result, more than one sequence may be available for the same patient or, conversely, a plasma sample may not have been collected at a crucial time point. In order to extract the relevant information from such diverse resistance data, relatively complex statistical analyses and, more importantly, a good basic understanding of HIV-1 virology are required. Such understanding may allow assumptions that help to overcome a specific shortfall in the data. For example, the aim of the analysis may be to evaluate the role of a certain mutation in predicting the virological response to antiretroviral therapy. Normally, a genotypic sequence obtained from a plasma sample collected within the 3 months preceding the initiation of the new drug or regimen would be selected. However, in highly treatment-experienced patients, it may be reasonable to assume that resistance mutations that emerged during any previous treatment persist indefinitely and affect the subsequent virological response [6–11]. It is recommended that sensitivity analyses are performed under these different assumptions.
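The two assumptions can be compared by deriving the baseline genotype in two ways, as in the sketch below. It is illustrative only: the record layout and function names are assumptions, "3 months" is approximated as 90 days, and a test on the day therapy starts is counted as eligible.

```python
from datetime import date, timedelta


def baseline_mutations(tests, start_date, window_days=90):
    """Mutations from the most recent test in the window before therapy start,
    or None if no test falls in the window."""
    window_start = start_date - timedelta(days=window_days)
    eligible = [t for t in tests if window_start <= t["date"] <= start_date]
    if not eligible:
        return None
    latest = max(eligible, key=lambda t: t["date"])
    return set(latest["mutations"])


def cumulative_mutations(tests, start_date):
    """Union of mutations from every test up to therapy start, i.e. assuming
    that a mutation, once detected, persists indefinitely."""
    found = set()
    for t in tests:
        if t["date"] <= start_date:
            found |= set(t["mutations"])
    return found


# Hypothetical patient with two genotypic tests before a regimen change.
tests = [
    {"date": date(2003, 5, 1), "mutations": {"184V", "41L"}},
    {"date": date(2005, 2, 10), "mutations": {"41L"}},
]
start = date(2005, 3, 1)

print(baseline_mutations(tests, start))    # most recent test only
print(cumulative_mutations(tests, start))  # historical genotype
```

Running the same regression model once with each definition is one simple form of the sensitivity analysis recommended above.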

It has been shown that only a proportion of patients with documented virological failure enrolled in observational studies have a resistance test performed near the time of virological failure [5]. This proportion varies from study to study according to design and availability of resources and funding [5,12]. If this proportion is low and the aim of the analysis is to estimate the incidence of accumulation of drug resistance after a certain duration of exposure to combination antiretroviral therapy, then making some assumptions may help with the interpretation of the data. In particular, it may be reasonable to assume that patients who, in the past, were exposed for more than a few weeks to lamivudine as part of a regimen containing fewer than three drugs accumulated the 184V mutation [13,14]. Again, it is recommended that the results of analyses with and without this assumption are compared before drawing final conclusions. Of note, this assumption would define a patient as carrying mutation 184V even if this mutation was not actually detected by a genotypic test; thus, if the aim is to assess the predictive value of resistance detected by a genotypic test alone, it would be best to avoid such an assumption altogether.
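A minimal sketch of how such an assumption could be switched on and off in an analysis is given below. The record layout and function names are hypothetical, and the 4-week threshold is an arbitrary placeholder for "more than a few weeks", not a value taken from the cited studies.

```python
def assume_184v(regimens, min_weeks=4):
    """True if any past regimen contained lamivudine, had fewer than three
    drugs and lasted longer than min_weeks (placeholder threshold)."""
    return any(
        "lamivudine" in r["drugs"] and len(r["drugs"]) < 3 and r["weeks"] > min_weeks
        for r in regimens
    )


def mutations_for_analysis(detected, regimens, use_assumption):
    """Detected mutations, optionally augmented with an assumed 184V."""
    mutations = set(detected)
    if use_assumption and assume_184v(regimens):
        mutations.add("184V")
    return mutations


# Hypothetical treatment history: dual therapy followed by triple therapy.
history = [
    {"drugs": {"zidovudine", "lamivudine"}, "weeks": 30},
    {"drugs": {"stavudine", "lamivudine", "nelfinavir"}, "weeks": 52},
]
detected = {"41L"}

print(mutations_for_analysis(detected, history, use_assumption=True))
print(mutations_for_analysis(detected, history, use_assumption=False))
```

Keeping the assumption behind an explicit flag makes it straightforward to report results both with and without it.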

Further work is needed to achieve consensus on the standard definition of drug-class resistance. Definitions vary from study to study: some studies consider any mutation affecting any drug within a class as indicating drug-class resistance [15], whereas others use selected mutations [16]. Although it seems reasonable to assume that a single non-nucleoside reverse transcriptase inhibitor (NNRTI)-associated mutation would confer resistance to the whole class, this is not the case for nucleoside reverse transcriptase inhibitor (NRTI)- and protease inhibitor (PI)-associated mutations.

Programming interpretation rules and choice of a suitable virological endpoint

The success of genotypic resistance testing depends upon appropriate interpretation of genotypic data. Several interpretation systems (ISs) have been developed by incorporating in vitro and clinical data, following review by independent panels of scientific experts [17–19]. The interpretation of a sequence according to three of the most popular ISs [Stanford HIV-1 Drug Resistance Database (HIVDB), Agence Nationale de Recherches sur le Sida (ANRS) and Rega Institute] can be freely derived using web services [17]. Although the programs used to obtain this interpretation are not freely available, the drug-specific rules are provided on the websites (at least for these three major ISs). It is relatively easy to translate these rules into lines of commands in any statistical package. The Stanford web service provides a gold standard against which to cross-validate such a programming process [17]. Having a team of statisticians develop their programs independently, possibly using different statistical packages, and then compare results has also proven useful.
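As a rough sketch of what translating rules into commands can look like, the example below scores a genotype against a small rule table and compares the result with an externally obtained interpretation. The drugs, penalty scores and cut-offs are entirely made up for illustration; they are not the rules of HIVDB, ANRS or Rega, whose actual rule formats differ and must be taken from the respective websites.

```python
# Made-up penalty scores per mutation (NOT real interpretation rules).
HYPOTHETICAL_RULES = {
    "drug_A": {"184V": 60, "65R": 30},
    "drug_B": {"41L": 15, "67N": 15, "215Y": 40},
}
# Made-up cut-offs, checked from the highest threshold downwards.
HYPOTHETICAL_CUTOFFS = [(60, "resistant"), (30, "intermediate"), (0, "susceptible")]


def interpret(mutations, rules=HYPOTHETICAL_RULES, cutoffs=HYPOTHETICAL_CUTOFFS):
    """Sum the penalties of the mutations present and map each total to a level."""
    result = {}
    for drug, penalties in rules.items():
        score = sum(p for m, p in penalties.items() if m in mutations)
        result[drug] = next(label for threshold, label in cutoffs if score >= threshold)
    return result


def discordances(own, reference):
    """Drugs for which the locally programmed result differs from a reference
    interpretation (e.g. one obtained from a web service)."""
    return {d: (own[d], reference.get(d)) for d in own if own[d] != reference.get(d)}


own = interpret({"184V", "41L", "67N"})
# Hypothetical output copied from an external service for the same sequence.
external = {"drug_A": "resistant", "drug_B": "susceptible"}

print(own)
print(discordances(own, external))
```

Any drug listed by `discordances` points either to a programming error or to a rule that was transcribed from a different version of the IS, and is worth checking by hand.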

Discrepancies in the results obtained from the different ISs have been observed, and may be more pronounced for some drugs (e.g. didanosine and abacavir) than for others (e.g. lamivudine). A major challenge in recent years has been to derive a standardised interpretation of resistance test results [3,20,21]. An international data analysis plan has been developed by the Forum for Collaborative HIV Research [22] and a preliminary analysis of the week-8 virological response has been presented recently [3]. The choice of the week-8 virological endpoint was based on valid scientific arguments. One important consideration was that the virological response is more likely to be influenced by pre-therapy drug susceptibility than by newly selected resistance if the analysis is restricted to the first 2 months of therapy. However, controversy regarding the choice of the optimal virological outcome remains, and it may therefore be more appropriate to compare genotype ISs for their ability to predict longer-term outcomes (e.g. the 24-week virological response, which will be used in future analyses).

Further analyses of the Forum for Collaborative HIV Research data are ongoing. Two different statistical approaches have been adopted: the first, developed in Europe, employs standard parametric models and bootstrap sampling; the second, developed in the USA, uses non-parametric resampling-based tests [23,24]. Despite all these efforts, none of these automatic interpretations can yet replace expert opinion when it comes to deriving optimal information from genotypic data.
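To give a flavour of what bootstrap sampling involves, the sketch below computes a percentile bootstrap confidence interval for the difference in response proportion between patients whose virus an IS classifies as susceptible and those classified as resistant. It is a generic illustration with invented data and hypothetical function names; it is not the analysis specified in the Forum data analysis plan.

```python
import random


def response_difference(records):
    """Response proportion in predicted-susceptible minus predicted-resistant patients."""
    susceptible = [r["responded"] for r in records if not r["resistant"]]
    resistant = [r["responded"] for r in records if r["resistant"]]
    return sum(susceptible) / len(susceptible) - sum(resistant) / len(resistant)


def bootstrap_ci(records, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval, resampling patients with replacement."""
    rng = random.Random(seed)
    estimates = []
    while len(estimates) < n_resamples:
        resample = rng.choices(records, k=len(records))
        try:
            estimates.append(response_difference(resample))
        except ZeroDivisionError:
            continue  # resample contained only one group; draw again
    estimates.sort()
    tail = (1 - level) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[int((1 - tail) * n_resamples) - 1]
    return lower, upper


# Invented data: 30/40 responders if predicted susceptible, 14/40 if predicted resistant.
records = (
    [{"resistant": False, "responded": 1}] * 30
    + [{"resistant": False, "responded": 0}] * 10
    + [{"resistant": True, "responded": 1}] * 14
    + [{"resistant": True, "responded": 0}] * 26
)

print(response_difference(records))
print(bootstrap_ci(records))
```

The fixed seed is only there to make the illustration reproducible; in a real analysis the number of resamples and the type of interval would be specified in advance in the analysis plan.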

Possibility of spurious associations

In a recent editorial, it was pointed out that research findings are less likely to be true in certain conditions: (1) with small studies demonstrating small effects; (2) when there is a greater number and less pre-selection of tested relationships; (3) where there is greater flexibility in designs, definitions, outcomes and analytical modes; and (4) when the same issues have been investigated by several teams worldwide, so that it is more likely that one of these investigations will yield a significant result just by chance [25]. Most of these issues are relevant to the field of HIV-1 drug resistance. One example is that of exploratory analyses of the association between the presence of 'novel', previously unrecognised mutations and virological response. Some of these analyses are, indeed, likely to be characterised by a small magnitude of the estimated effect. Furthermore, these estimates are often obtained after performing a large number of analyses based, for example, on various definitions of virological outcomes [26–28]. Also, the mutation that shows a significant association with the outcome typically belongs to a much larger list of mutations that were tested with little pre-selection, leading to a substantial increase in the probability of making a type I error (i.e. concluding that there is a significant association when in reality there is not). With as few as 15 independent, non-pre-selected statistical tests and no true association, the chance that at least one test gives a P-value below the standard threshold of 0.05 already exceeds 50% (1 − 0.95^15 ≈ 0.54), and it approaches 100% as the number of tests grows [29]. Ioannidis showed that even in an adequately powered exploratory epidemiological study in which the pre-study odds of a true relationship are 1:10, the probability that a claimed positive finding is false can be about 80% [25]. Thus, even if type I error corrections are used [26,29,30], a great deal of caution is required in the interpretation of such analyses.
In contrast, it is not infrequent that a mutation, which has been associated with reduced response to a certain antiretroviral agent in a single study (albeit with P=0.05), has been included in a list of drug-resistance mutations associated with that drug by a panel of experts on the basis of that finding alone [31].
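The effect of multiple testing can be made concrete with a short calculation. Under the simplifying assumptions that the tests are independent and that no true association exists, the chance of at least one P-value below α among m tests is 1 − (1 − α)^m. The sketch below computes this and applies Holm's step-down procedure, used here only as one common example of a type I error correction; the text does not specify which corrections the cited studies applied, and the P-values are invented.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false-positive result among n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def holm_rejections(p_values, alpha=0.05):
    """Holm step-down procedure: True where the null hypothesis is rejected,
    in the original order of p_values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] > alpha / (m - rank):
            break  # this and all larger P-values are not rejected
        rejected[i] = True
    return rejected


for m in (1, 15, 50, 100):
    print(m, round(familywise_error_rate(m), 3))

# Invented P-values for four candidate mutations.
p_values = [0.001, 0.04, 0.03, 0.2]
print([p < 0.05 for p in p_values])  # unadjusted: three 'significant' results
print(holm_rejections(p_values))     # after Holm correction: only the first
```

With the invented P-values, three of the four mutations would be declared significant without adjustment, but only one survives the correction, which illustrates why a single P-value close to 0.05 from a large screen deserves little weight.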

Recommendations for clinical practice

  • The interpretation of genotypic results by different ISs may be discordant. While efforts are being made to develop a standardised IS, expert opinion is still required to derive optimal information from genotypic resistance reports. Resistance data must be complemented by a detailed treatment history.
  • Statisticians using resistance data need to make sure that they understand how the data have been derived and stored in the database; a basic knowledge of HIV-1 virology is crucial in order to design the statistical analysis, make reasonable assumptions and correctly interpret the results. Standardisation of definitions is also important in order to compare the results coming from different studies.
  • A proposed association between a given drug-resistance mutation and virological outcome should not be believed on the basis of a P-value of 0.05 observed in a single analysis performed without taking into account the risk of type I error derived from multiple testing. Exploratory analyses should be done first, with the intent of identifying a specific a priori hypothesis to be tested.


Adachi A, Gendelman HE, Koenig S. et al. Production of acquired immunodeficiency syndrome-associated retrovirus in human and non-human cells transfected with an infectious molecular clone. J Virol. 1986;59:284–291. [PMC free article: PMC253077] [PubMed: 3016298]
Los Alamos National Laboratory HIV Databases. http://hiv-web​ (accessed on 22 November 2005).
Costagliola D, Cozzi-Lepri A, Dalban C, Chang B. on behalf of the Standardisation and Clinical Relevance of HIV Drug Resistance. Project from the Forum for Collaborative HIV Research. Antivir Ther. 2005;10:S11.
HIV Resistance Response Database Initiative. http://www​ (accessed on 22 November 2005).
Phillips AN, Dunn D, Sabin C. UK Collaborative Group on HIV Drug Resistance; UK CHIC Study Group. et al. Long term probability of detection of HIV-1 drug resistance after starting antiretroviral therapy in routine clinical practice. AIDS. 2005;19:487–494. [PubMed: 15764854]
Hance AJ, Lemiale V, Izopet J. et al. Changes in human immunodeficiency virus type 1 populations after treatment interruption in patients failing antiretroviral therapy. J Virol. 2001;75:6410–6417. [PMC free article: PMC114364] [PubMed: 11413308]
Izopet J, Souyris C, Hance A. et al. Evolution of human immunodeficiency virus type 1 populations after resumption of therapy following treatment interruption and shift in resistance genotype. J Infect Dis. 2002;185:1506–1510. [PubMed: 11992288]
Devereux HL, Youle M, Johnson MA, Loveday C. Rapid decline in detectability of HIV-1 drug resistance mutations after stopping therapy. AIDS. 1999;13:F123–127. [PubMed: 10630517]
Verhofstede C, Noe A, Demecheleer E. et al. Drug-resistant variants that evolve during nonsuppressive therapy persist in HIV-1-infected peripheral blood mononuclear cells after long-term highly active antiretroviral therapy. J Acquir Immune Defic Syndr. 2004;35:473–483. [PubMed: 15021312]
Verhofstede C, Wanzeele FV, Van Der Gucht B. et al. Interruption of reverse transcriptase inhibitors or a switch from reverse transcriptase to protease inhibitors resulted in a fast reappearance of virus strains with a reverse transcriptase inhibitor-sensitive genotype. AIDS. 1999;13:2541–2546. [PubMed: 10630523]
Harrigan PR, Wynhoven B, Brumme ZL. et al. HIV-1 drug resistance: degree of underestimation by a cross-sectional versus a longitudinal testing approach. J Infect Dis. 2005;191:1325–1330. [PubMed: 15776380]
Harrigan PR, Hogg RS, Dong WW. et al. Predictors of HIV drug-resistance mutations in a large antiretroviral-naive cohort initiating triple antiretroviral therapy. J Infect Dis. 2005;191:339–347. [PubMed: 15633092]
Wainberg MA, Hsu M, Gu Z. et al. Effectiveness of 3TC in HIV clinical trials may be due in part to the M184V substitution in 3TC-resistant HIV-1 reverse transcriptase. AIDS. 1996;10(suppl 5):S3–10. [PubMed: 9030390]
Zaccarelli M, Perno CF, Forbici F. et al. Using a database of HIV patients undergoing genotypic resistance test after HAART failure to understand the dynamics of M184V mutation. Antivir Ther. 2003;8:51–56. [PubMed: 12713064]
Di Giambenedetto S, Calafigli M, Pinnetti C. et al. HIV drug resistance predictors of clinical disease progression in patients undergoing resistance testing in clinical practice. XIV International HIV Drug Resistance Workshop, Quebec, 2005, Abstr. 32.
Zaccarelli M, Tozzi, Lorenzini P. et al. Collaborative Group for Clinical Use of HIV Genotype Resistance Test (GRT) at National Institute for Infectious Diseases Lazzaro Spallanzani. Multiple drug class-wide resistance associated with poorer survival after treatment failure in a cohort of HIV-infected patients. AIDS. 2005;19:1081–1089. [PubMed: 15958840]
Stanford University HIV Drug Resistance Database. http://hivdb​ (accessed on 9 December 2005).
Agence Nationale de Recherches sur le Sida AC11 Resistance Group. http:​// (accessed on 22 November 2005).
Katholieke Universiteit Leuven. Laboratory for Clinical and Evolutionary Virology. http://www​​.htm (accessed on 22 November 2005).
De Luca A, Perno CF. Impact of different HIV resistance interpretation by distinct systems on clinical utility of resistance testing. Curr Opin Infect Dis. 2003;16:573–580. [PubMed: 14624108]
Kijak GH, Rubio AE, Pampuro SE. et al. Discrepant results in the interpretation of HIV-1 drug-resistance genotypic data among widely used algorithms. HIV Med. 2003;4:72–78. [PubMed: 12534963]
Forum for Collaborative HIV Research. Standardization and Clinical Relevance of HIV Drug Resistance Testing. http://www​​/uploads/Resistance​/DataAnalysisPlanRev1.pdf (accessed on 22 November 2005).
Dudoit S, Van Der Laan MJ, Pollard KS. Multiple Testing. Part I. Single-step procedures for control of general type I error rates. Stat Appl Genet Mol Biol. 2004;3:Article 13. [PubMed: 16646791]
DiRienzo AG, De Gruttola V, Larder B, Hertogs K. Nonparametric methods to predict HIV drug susceptibility phenotype from genotype. Stat Med. 2003;22:2785–2798. [PubMed: 12939786]
Ioannidis JP. Why most published research findings are false. PLoS Med. 2005;2:e124. [PMC free article: PMC1182327] [PubMed: 16060722]
Perno CF, Cozzi-Lepri A, Balotta C. Italian Cohort Naive Antiretroviral (ICONA) Study Group. et al. Secondary mutations in the protease region of human immunodeficiency virus and virologic failure in drug-naive patients treated with protease inhibitor-based therapy. J Infect Dis. 2001;184:983–991. [PubMed: 11574912]
Alexander CS, Dong W, Chan K. et al. HIV protease and reverse transcriptase variation and therapy outcome in antiretroviral-naive individuals from a large North American cohort. AIDS. 2001;15:601–607. [PubMed: 11316997]
Brumme CJ, Harrigan PR. No inherent association between minor mutations in HIV protease at baseline and selection of the L90M mutation at the time of the first virological failure. J Infect Dis. 2005;191:1778–1779. author reply 1779–1780. [PubMed: 15838807]
Hsueh HM, Chen JJ, Kodell RL. Comparison of methods for estimating the number of true null hypotheses in multiplicity testing. J Biopharm Stat. 2003;13:675–689. [PubMed: 14584715]
Keselman HJ, Cribbie R, Holland B. Controlling the rate of Type I error over a large set of statistical tests. Br J Math Stat Psychol. 2002;55:27–39. [PubMed: 12034010]
Johnson VA, Brun-Vezinet F, Clotet B. et al. Update of the drug resistance mutations in HIV-1: 2005. Top HIV Med. 2005;13:51–57. [PubMed: 15849371]