NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Stevenson M, Lloyd-Jones M, Morgan MY, et al. Non-Invasive Diagnostic Assessment Tools for the Detection of Liver Fibrosis in Patients with Suspected Alcohol-Related Liver Disease: A Systematic Review and Economic Evaluation. Southampton (UK): NIHR Journals Library; 2012 Feb. (Health Technology Assessment, No. 16.4.)


Non-Invasive Diagnostic Assessment Tools for the Detection of Liver Fibrosis in Patients with Suspected Alcohol-Related Liver Disease: A Systematic Review and Economic Evaluation.


4 Assessment of clinical effectiveness

Methods for reviewing effectiveness

A systematic review was undertaken according to the general principles recommended in the quality of reporting of meta-analyses (QUOROM)110 and Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)111 statements.

Identification of studies

Extensive searches were undertaken for the comprehensive retrieval of studies of clinical effectiveness and cost-effectiveness relating to the research question. The concepts in the search strategies reflected the population and intervention categories of the Population, Intervention, Comparator, Outcome (PICO) model, namely patients with suspected liver fibrosis related to alcohol consumption and the specified non-invasive tests for the identification of fibrosis, respectively.

The search strategy comprised the following main elements:

  • searching of electronic databases
  • scrutiny of bibliographies of retrieved papers and previous systematic reviews
  • contact with experts in the field.

Sources searched

The electronic databases searched included MEDLINE, EMBASE, The Cochrane Library, Cumulative Index to Nursing and Allied Health Literature (CINAHL) and Web of Knowledge (for details, see Appendix 5).

Search strategies

The MEDLINE search strategies are presented in Appendix 5. Search strategies for the other databases are available on request.

Search restrictions

Searches were not restricted by publication type, date of publication, or language.

Inclusion and exclusion criteria

Inclusion criteria

  • Participants: Patients with suspected liver fibrosis related to alcohol consumption. Studies that included patients with suspected liver fibrosis of other aetiologies were included if data relating to patients with suspected alcohol-related disease could be extracted separately.
  • Intervention: One of the specified non-invasive tests for liver fibrosis, namely:

    the ELF test

    FibroTest

    FibroMAX

    FibroScan.

  • Comparators: The primary comparator, or reference standard, was liver biopsy for the identification of liver fibrosis. Secondary reference standards were tests used to identify conditions associated with liver fibrosis, namely HVPG measurement for PHt, and upper gastrointestinal endoscopy for the identification of oesophageal varices.
  • Outcome measures: The primary outcome measure was the diagnostic accuracy of the index test compared with the reference standard in distinguishing patients with significant fibrosis, defined as METAVIR stages F2–F4, from patients without significant fibrosis, defined as METAVIR stages F0–F1. Other outcome measures were:

    the diagnostic accuracy of the index test compared with the reference standard in distinguishing patients with cirrhosis (METAVIR stage F4) from patients without cirrhosis (METAVIR stages F0–F3)

    the diagnostic accuracy of the index test compared with the reference standard in distinguishing patients with moderate-to-severe fibrosis (METAVIR stages F3–F4) from patients without moderate-to-severe fibrosis (METAVIR stages F0–F2)

    the diagnostic accuracy of the index test compared with the reference standard in distinguishing patients with fibrosis (METAVIR stages F1–F4) from patients without fibrosis (METAVIR stage F0)

    the diagnostic accuracy of the index test compared with the reference standards in distinguishing patients with and without the complications of fibrosis (PHt and oesophageal varices)

    the number of patients requiring referral to secondary care

    the number of patients requiring liver biopsy

    the number of patients giving up alcohol, or significantly reducing alcohol consumption, following receipt of a test result

    long-term patient outcomes (disease progression, complications related to liver disease, need for liver transplantation, mortality)

    adverse effects of testing

    health-related quality of life.

Only studies of the index tests that reported data relating to one of the outcome measures in relation to the population of interest were included in the review of clinical effectiveness. However, this criterion was relaxed for consideration of adverse events, where wider searches were undertaken to allow the inclusion of data relating to studies of the adverse effects of diagnostic venepuncture or transient elastography (see Appendix 5).

  • Study design: The best available level of evidence, with priority given to controlled studies, if available.
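The four diagnostic accuracy outcomes listed above differ only in where the METAVIR scale is cut to define a "reference standard positive" result. The minimal Python sketch below assumes that biopsy results are recorded as integer METAVIR stages 0–4 and that index-test results have already been dichotomised into positive or negative calls. The names `DICHOTOMIES`, `reference_positive` and `tabulate` are hypothetical and were not used by the review.

```python
# Hypothetical mapping of each accuracy outcome to the METAVIR stages that
# count as "reference standard positive"; stages not listed count as negative.
DICHOTOMIES: dict[str, set[int]] = {
    "significant fibrosis": {2, 3, 4},      # F2-F4 vs F0-F1 (primary outcome)
    "moderate-to-severe fibrosis": {3, 4},  # F3-F4 vs F0-F2
    "cirrhosis": {4},                       # F4 vs F0-F3
    "any fibrosis": {1, 2, 3, 4},           # F1-F4 vs F0
}


def reference_positive(metavir_stage: int, outcome: str) -> bool:
    """True if a biopsy stage counts as positive for the given outcome."""
    if metavir_stage not in range(5):
        raise ValueError(f"METAVIR stage must be 0-4, got {metavir_stage}")
    return metavir_stage in DICHOTOMIES[outcome]


def tabulate(index_positive: list[bool], stages: list[int], outcome: str) -> dict[str, int]:
    """Cross-tabulate index-test calls against the dichotomised biopsy result."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for test_pos, stage in zip(index_positive, stages):
        ref_pos = reference_positive(stage, outcome)
        # First letter: does the test agree with biopsy? Second: what did the test say?
        key = ("T" if test_pos == ref_pos else "F") + ("P" if test_pos else "N")
        counts[key] += 1
    return counts


calls = [True, True, False, False]
stages = [3, 1, 0, 4]
print(tabulate(calls, stages, "significant fibrosis"))  # {'TP': 1, 'FP': 1, 'TN': 1, 'FN': 1}
print(tabulate(calls, stages, "cirrhosis"))             # {'TP': 0, 'FP': 2, 'TN': 1, 'FN': 1}
```

The same four patients therefore give different 2 × 2 tables depending on which cut-point is being assessed, which is one reason results for different fibrosis ranges cannot be pooled directly.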

Exclusion criteria

The following publication types were excluded from the review:

  • animal models
  • preclinical and biological studies
  • narrative reviews, editorials, and opinions.

Systematic reviews of primary studies were excluded from the review of clinical effectiveness, but were scanned for potential additional relevant studies.

In addition, studies were excluded if:

  • they were considered methodologically unsound (specifically, if the reference standard was used in only a subset of study participants and the selection criteria used to identify that subset were not clear)
  • they were published as meeting abstracts only, and insufficient methodological details were reported to allow critical appraisal of study quality
  • they were meeting abstracts that had been superseded by later publications and did not contain any additional data.

Sifting

The references identified by the literature searches were sifted in three stages by a single reviewer. They were screened for relevance first by title and then by abstract. Those papers that seemed, from their abstracts, to be relevant were then read in full, as were all potentially relevant papers for which abstracts were not available. At each step, studies that did not satisfy the inclusion criteria were excluded.

Data extraction strategy

Data were extracted by one reviewer to customised data extraction forms. Where multiple publications of the same study were identified, data were extracted and reported as a single study.

Critical appraisal strategy

Study quality was assessed using a modified version of the quality assessment of diagnostic accuracy studies (QUADAS) checklist,112 a validated tool designed to assess the internal and external validity of studies of diagnostic accuracy. Definitions of some scoring criteria were adapted from the systematic review by Friedrich-Rust et al.113 (for details, see Appendix 6). Where a study was reported in more than one publication, its quality was assessed on the basis of the combined data from all relevant publications.

The quality assessment of studies included in the review of clinical effectiveness was carried out by one researcher. Blinding of the quality assessor to author, institution, or journal was not considered necessary.114,115

Methods of data synthesis

Studies that met the review's entry criteria were eligible for inclusion in meta-analyses if this was appropriate in terms of comparability of the study populations, outcomes, and diagnostic thresholds, if the studies were unlikely to be biased,116 and if the numbers of TP, FP, TN, and FN results for each study were reported or could be obtained from the study authors. However, because of the degree of heterogeneity and the unavailability of full data from some studies, meta-analysis was not in fact considered appropriate. The presentation of results is therefore limited to a narrative review.

Where they were not reported by the original investigators, but data were available for the numbers of TPs, TNs, FPs and FNs, the reviewers calculated sensitivity, specificity and positive and negative predictive values themselves, with CIs. This was done using beta distributions (alpha = TPs and beta = FNs for sensitivity; alpha = TNs and beta = FPs for specificity). If any number was < 5, a non-informative prior of 0.5 (equivalent to Jeffreys' prior) was added to both the alpha and beta parameters.
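The beta-distribution approach described above can be sketched as follows. This is an illustration under stated assumptions, not the reviewers' actual code: the interval is taken to be equal-tailed, the point estimate is taken to be the mean of the beta distribution, and quantiles are approximated by Monte Carlo sampling with Python's `random.betavariate` rather than computed exactly (an exact inverse CDF such as `scipy.stats.beta.ppf` could be substituted). The text gives parameters only for sensitivity and specificity; the analogous Beta(TP, FP) and Beta(TN, FN) used here for predictive values are an assumption. The function names and the example counts are hypothetical.

```python
import random


def beta_summary(alpha: float, beta: float, level: float = 0.95,
                 draws: int = 20_000, seed: int = 2012) -> tuple[float, float, float]:
    """Mean and equal-tailed interval of Beta(alpha, beta).

    Quantiles are approximated by Monte Carlo sampling with random.betavariate;
    an exact inverse CDF could be used instead.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    samples = sorted(rng.betavariate(alpha, beta) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    return alpha / (alpha + beta), lower, upper


def accuracy_with_cis(tp: int, fp: int, tn: int, fn: int) -> dict[str, tuple[float, float, float]]:
    """Sensitivity, specificity, PPV and NPV, each as (mean, lower, upper)."""
    # Add 0.5 (Jeffreys' prior) to both parameters if any cell count is below 5.
    # In either branch every parameter stays > 0, as betavariate requires.
    k = 0.5 if min(tp, fp, tn, fn) < 5 else 0.0
    return {
        "sensitivity": beta_summary(tp + k, fn + k),
        "specificity": beta_summary(tn + k, fp + k),
        "ppv": beta_summary(tp + k, fp + k),  # assumption: analogous parameters
        "npv": beta_summary(tn + k, fn + k),  # assumption: analogous parameters
    }


# Hypothetical 2 x 2 counts, for illustration only.
for name, (mean, lo, hi) in accuracy_with_cis(tp=45, fp=6, tn=40, fn=9).items():
    print(f"{name}: {mean:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

Because a zero count always triggers the 0.5 adjustment, the sketch never asks for a beta distribution with a zero parameter, which would otherwise be undefined.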

Results

Quantity and quality of research available

The electronic literature searches identified 4039 potentially relevant citations. Of these, 3829 were excluded at the title or abstract stage, leaving 210 that were obtained for examination of the full text, together with five additional relevant articles that had been identified from other sources. Two of these five articles, those by Janssens et al.117 and Mueller et al.,118 had been published too recently to be identified by the electronic searches; they superseded abstracts119,120 that had been identified by those searches. A further two articles, by Rosenberg et al.121 and Melin et al.,122 were not identified by the electronic searches because they were not appropriately indexed in the electronic databases; they were supplied by the relevant manufacturers together with an unpublished paper by Parkes et al.57

One hundred and ninety-eight citations were excluded at the full-text stage, leaving 17 articles that were included in the review (Figure 4). These 17 articles related to 14 studies: one study of the ELF test,57,121 four studies of FibroTest,13,123–126 eight studies of FibroScan97,117,118,122,127–131 and one study which used both FibroTest and FibroScan.132 No studies of FibroMAX were identified.
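The selection counts reported above can be checked for internal consistency with simple arithmetic. The short Python sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Counts reported in the text; the arithmetic checks that they are consistent.
electronic_hits = 4039
excluded_on_title_or_abstract = 3829
other_sources = 5          # Janssens, Mueller, Rosenberg, Melin, Parkes
excluded_on_full_text = 198

full_text_reviewed = electronic_hits - excluded_on_title_or_abstract + other_sources
included_articles = full_text_reviewed - excluded_on_full_text

# Several studies were reported in more than one article, so studies < articles.
studies_by_test = {"ELF test": 1, "FibroTest": 4, "FibroScan": 8, "FibroTest and FibroScan": 1}

print(full_text_reviewed, included_articles, sum(studies_by_test.values()))  # 215 17 14
```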

FIGURE 4. Clinical effectiveness: summary of study selection and exclusion.


Number and type of studies included

The majority of articles that met the review inclusion criteria reported cross-sectional studies intended to confirm the performance of one of the non-invasive tests against a reference standard (liver biopsy, HVPG measurement, or endoscopic identification of oesophageal varices). A minority had a cohort design, following patients over time to assess how well the non-invasive test predicted adverse clinical outcomes. There were no randomised controlled trials (RCTs).

Number and type of studies excluded, with reasons

As may be seen from Quantity and quality of research available, a substantial number of the citations identified by the electronic searches related to studies that were excluded as part of the sifting process because they did not meet the inclusion criteria. Details are therefore given only of those citations that were excluded after a full reading, and then only if they were excluded for a reason other than a simple failure to meet the inclusion and exclusion criteria. Such citations are listed in Appendix 7, together with the reasons for their exclusion.

Study characteristics

Enhanced Liver Fibrosis Test

No studies were identified that assessed the ELF test as such. However, one study57 evaluated the European Liver Fibrosis Test, which was said to be essentially identical to the ELF test except for the inclusion of age in the algorithm. This study was, therefore, deemed to meet the inclusion criteria. Data from this study have been published in two articles.57,121 The first article, by Rosenberg et al.,121 assessed diagnostic test accuracy compared with liver biopsy in patients with chronic liver disease, some of whom had ALD. The second article, by Parkes et al.,57 evaluated the ability of the test to predict survival and relevant adverse events in the cohort of patients with chronic liver disease of various aetiologies enrolled in the original study in English hepatology centres. The primary outcome measure used by Parkes et al.57 was the first post-recruitment liver-related clinical event, defined as liver-related death, ascites, encephalopathy, oesophageal variceal haemorrhage confirmed by endoscopy, liver transplantation, or HCC. The presence of varices without haemorrhage was not included as an outcome because of the possibility that differences in the practice of endoscopy in the different centres may lead to ascertainment bias. Table 11 provides further details of study design.

TABLE 11. Characteristics of included analyses of the ELF test study.


At first sight, the data relating to the number of patients with ALD included in the analyses by Rosenberg et al.121 and Parkes et al.57 are confusing. One thousand and twenty-one patients with chronic liver disease of any aetiology were eligible for inclusion in the original study, and 921 were recruited. Of these, 621 formed the training or derivation cohort used to identify the optimum combination of markers and algorithm (the European Liver Fibrosis Test), which was then assessed in the remaining 300 patients (the validation cohort). All 64 patients with ALD were included in the validation cohort (Professor William Rosenberg, University College London, 2010, personal communication). The follow-up study by Parkes et al.57 states that 85 of the patients enrolled in the study in English centres alone had ALD (i.e. more patients with ALD than Rosenberg et al. reported as having been included in the original study); this apparent discrepancy is attributed to the fact that some patients originally believed to have liver disease of a different aetiology were later found to have ALD (Professor William Rosenberg, University College London, 2010, personal communication).

FibroTest

Two studies of FibroTest were identified that specifically recruited patients with known or suspected ALD. Nguyen-Khac et al.132 evaluated diagnostic test accuracy compared with liver biopsy, but did not include the full spectrum of ALD: patients with known or decompensated cirrhosis were excluded on the basis that they did not require further investigation. Naveau et al.123 compared FibroTest and liver biopsy results in patients hospitalised either for complications of cirrhosis or for alcoholism. The study also assessed the ability of FibroTest to predict 5- and 10-year survival in 218 of the 292 patients (75%) enrolled in the study of test accuracy and followed up for a median period of 8.2 years (range 5 days to 11.8 years).124

A further three studies,13,125,126 all by Thabut et al., evaluated FibroTest in patients with liver disease of mixed aetiology, including ALD. To avoid any risk of double-counting, clarification was obtained from the author that no patient was included in more than one of these studies (Dr Dominique Thabut, Hôpital Pitié-Salpêtrière, Paris, 2010, personal communication). One study13 assessed the ability of FibroTest to identify PHt in patients undergoing transjugular liver biopsy for clinical reasons and also compared FibroTest and liver biopsy results in these patients. A second study125 assessed its ability to predict the presence of oesophageal varices in patients with chronic liver disease. The third study126 assessed FibroTest's predictive value in relation to survival at 2 months and 6 months in patients with severe cirrhosis.

Table 12 provides further details of study design.

TABLE 12. Characteristics of included analyses of FibroTest studies.


FibroMAX

No relevant studies of FibroMAX were identified.

FibroScan

Six studies117,118,122,127,128,132 were identified that specifically recruited patients with known or suspected ALD and assessed the diagnostic test accuracy of FibroScan relative to liver biopsy in these patients. In the studies by Kim et al.,127 Mueller et al.,118 Nahon et al.128 and Nguyen-Khac et al.,132 all patients who underwent FibroScan were also biopsied. However, in the studies by Janssens et al.117 and Melin et al.,122 biopsy was undertaken only in a subset of patients who underwent FibroScan. Janssens et al.117 used FibroScan to identify, among patients requiring alcohol detoxification or rehabilitation, those with a score of ≥ 9.5 kPa; this threshold was chosen as it was thought to be indicative of severe fibrosis (F3–F4). Data relating to the test accuracy of FibroScan compared with biopsy and HVPG were therefore available only for patients with a FibroScan score of ≥ 9.5 kPa who then consented to liver biopsy and in whom both tests were conducted successfully. Similarly, Melin et al.122 sought to compare the accuracy of FibroScan with biopsy in patients being treated for alcohol withdrawal who had a FibroScan score higher than 13 kPa; this threshold was apparently chosen because it was considered to be the appropriate threshold for the diagnosis of cirrhosis in patients with hepatitis C. Only 41 patients met this criterion; three of these refused biopsy and a further three had contraindications to biopsy.

A further three studies97,129,130 evaluated the diagnostic test accuracy of FibroScan in patients with liver disease of mixed aetiology, including ALD. Bureau et al.129 studied the ability of FibroScan to predict significant PHt in patients undergoing transjugular liver biopsy for clinical reasons. Lemoine et al.97 and Nguyen-Khac et al.130 both specifically recruited patients with cirrhosis. Lemoine et al.97 assessed the ability of FibroScan relative to HVPG measurement to predict significant PHt in patients with compensated cirrhosis, whereas Nguyen-Khac et al.130 assessed its ability relative to upper gastrointestinal endoscopy to predict the presence of large oesophageal varices in patients with cirrhosis of unspecified severity.

Table 13 provides further details of study design.

TABLE 13. Characteristics of included analyses of FibroScan studies.


Study quality

Figures 5–9 provide an overview of the methodological quality of the included studies.

FIGURE 5. The ELF test: methodological quality summary. Review authors' judgements about each methodological quality item. −, no; +, yes; ?, unclear.

FIGURE 6. FibroTest: methodological quality graph. Review authors' judgements about each methodological quality item presented as percentages across all included studies.

FIGURE 7. FibroTest: methodological quality summary. Review authors' judgements about each methodological quality item for each included study. −, no; +, yes; ?, unclear.

FIGURE 8. FibroScan: methodological quality graph. Review authors' judgements about each methodological quality item presented as percentages across all included studies.

FIGURE 9. FibroScan: methodological quality summary. Review authors' judgements about each methodological quality item for each included study. −, no; +, yes; ?, unclear.

As may be seen, few studies presented results relating to a representative spectrum of patients suspected of having ALD. The majority recruited patients with relatively severe disease. Bureau et al.,129 Rosenberg et al.121 and Thabut et al. 2007a13 recruited patients who were due to undergo liver biopsy for clinical reasons, whereas Lemoine et al.,97 Nguyen-Khac et al.130 and Thabut et al. 2007b126 recruited patients known to have cirrhosis, Naveau et al.123 recruited patients hospitalised for complications of cirrhosis or alcoholism, and Mueller et al.118 and Thabut et al. 2003125 recruited those known to have, rather than suspected of having, chronic liver disease. Two studies, those by Janssens et al.117 and Melin et al.,122 recruited more representative populations, but displayed partial verification bias, using the reference standard only in patients scoring above a specific threshold on the index test (FibroScan in both cases).
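Why partial verification matters can be shown with a small hypothetical example. The Python sketch below invents liver stiffness values and biopsy outcomes for ten patients (these are not data from Janssens et al. or Melin et al.) and compares a 2 × 2 table built from the whole cohort with one built only from patients biopsied because they scored at or above the verification threshold. In the verified subset, every patient is index-test positive at that threshold, so the FN and TN cells are empty by construction: sensitivity at the verification threshold is trivially 100%, and the patient missed by FibroScan never appears.

```python
def two_by_two(scores: list[float], diseased: list[bool], threshold: float) -> dict[str, int]:
    """Cross-tabulate 'score >= threshold' against the reference standard."""
    cells = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for score, truth in zip(scores, diseased):
        if score >= threshold:
            cells["TP" if truth else "FP"] += 1
        else:
            cells["FN" if truth else "TN"] += 1
    return cells


def sens_spec(cells: dict[str, int]) -> tuple[float, float]:
    """Sensitivity and specificity; NaN when a denominator is zero."""
    nan = float("nan")
    pos, neg = cells["TP"] + cells["FN"], cells["TN"] + cells["FP"]
    return (cells["TP"] / pos if pos else nan, cells["TN"] / neg if neg else nan)


# Hypothetical cohort: liver stiffness (kPa) and whether biopsy would show F3-F4.
kpa = [4.1, 5.6, 7.2, 8.9, 10.3, 12.5, 15.0, 22.4, 6.3, 11.0]
f3_f4 = [False, False, True, False, True, False, True, True, False, True]
THRESHOLD = 9.5

full = two_by_two(kpa, f3_f4, THRESHOLD)
# Partial verification: only patients scoring >= threshold are biopsied.
verified = [(s, d) for s, d in zip(kpa, f3_f4) if s >= THRESHOLD]
partial = two_by_two([s for s, _ in verified], [d for _, d in verified], THRESHOLD)

print(full, sens_spec(full))        # {'TP': 4, 'FP': 1, 'TN': 4, 'FN': 1} (0.8, 0.8)
print(partial, sens_spec(partial))  # {'TP': 4, 'FP': 1, 'TN': 0, 'FN': 0} (1.0, 0.0)
```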

In studies of test accuracy, it is clearly important that the interval between the performance of the index and reference tests should be as short as possible, to minimise the possibility of the patient's condition altering significantly between the tests. However, several studies allowed a delay of > 2 weeks between the index test and reference standard. Naveau et al.123 allowed an interval of up to 1 month, whereas Kim et al.127 allowed the interval between transient elastography and liver biopsy to be as much as 92 days, and Thabut et al. 2003125 included patients in whom endoscopy was performed up to 6 months before or after FibroTest, although in this case the mean interval was only 5 days.

In approximately half of the included studies, it was not clear whether the reference standard results were interpreted without knowledge of the results of the index test (‘reference standard results blinded’) and vice versa (‘index test results blinded’). The remaining studies stated that blinding was used for either one or both tests. The index test was usually well described, but in many cases the execution of the reference standard was not described in sufficient detail to permit replication of the test precisely as performed by the study investigators.

Uninterpretable/intermediate results were generally poorly reported. Two studies, Rosenberg et al.'s121 study of the ELF test and Lemoine et al.'s97 FibroScan study, stated that patients were recruited prospectively and consecutively, and reported no uninterpretable/intermediate results, implying that there were none. Bureau et al.'s129 FibroScan study reported the overall number of uninterpretable results, but did not specify how many related to patients with ALD; it is not clear whether or not they were included in the analyses. Janssens et al.,117 Kim et al.,127 Melin et al.,122 Mueller et al.,118 Nahon et al.128 and Nguyen-Khac et al.132 all reported the number of instances of FibroScan failure (i.e. no or uninterpretable results), but did not include them in their analyses. As none stated how many of these failures occurred in patients who tested positive and how many in patients who tested negative by the reference standard, their impact on test sensitivity and specificity could not be calculated.

The three tests for which evidence has been identified vary in the extent to which that evidence is independent of the test manufacturer. There is no wholly independent evidence relating to the ELF test: one of the investigators, Professor William Rosenberg, is the founder of, and holds stocks in, iQur Ltd, which holds a limited licence to conduct ELF assays on behalf of Siemens Healthcare Diagnostics.57 Only one of the studies of FibroTest, that by Nguyen-Khac et al.,132 appears to be independent. The remaining studies include in their authorship Thierry Poynard, a major stockholder in, and Mona Munteanu, an employee of, the manufacturer, BioPredictive.125 However, six of the nine studies that provided data relating to FibroScan97,117,128–130,132 stated that the authors had no conflicts of interest in relation to the work; of the studies that did not include such a declaration, that by Mueller et al.118 stated that it was funded from independent sources, and only the studies by Kim et al.127 and Melin et al.122 contained no relevant information on this point.

Two further indices of methodological quality proposed by Tsochatzis et al.138 were not included in the QUADAS checklist as they were not applicable to all of the included studies. These were:

  • whether studies that used liver biopsy as the reference standard reported that it was performed to an acceptable standard (i.e. the specimen was at least 15 mm long and included at least six portal tracts)
  • whether studies of FibroScan reported that it was performed in accordance with the manufacturer's instructions, i.e. using at least 10 valid shots, with a success rate (ratio of valid shots to total number of shots) of at least 60%, and an interquartile range < 30% of the median value.79
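The two additional quality standards listed above are simple numeric rules, and a minimal Python sketch of them follows. The thresholds (specimen ≥ 15 mm with ≥ 6 portal tracts; ≥ 10 valid shots, success rate ≥ 60%, IQR < 30% of the median) are taken from the text. The record types and field names (`BiopsyReport`, `ElastographyExam`, and so on) are hypothetical and do not correspond to any manufacturer's data format.

```python
from dataclasses import dataclass


@dataclass
class BiopsyReport:
    length_mm: float      # specimen length
    portal_tracts: int    # number of portal tracts in the specimen


@dataclass
class ElastographyExam:
    valid_shots: int
    total_shots: int
    median_kpa: float     # median liver stiffness
    iqr_kpa: float        # interquartile range of the valid measurements


def biopsy_adequate(b: BiopsyReport) -> bool:
    """Specimen at least 15 mm long and containing at least six portal tracts."""
    return b.length_mm >= 15 and b.portal_tracts >= 6


def fibroscan_reliable(e: ElastographyExam) -> bool:
    """At least 10 valid shots, success rate >= 60% and IQR < 30% of the median."""
    success_rate = e.valid_shots / e.total_shots if e.total_shots else 0.0
    return (e.valid_shots >= 10
            and success_rate >= 0.60
            and e.iqr_kpa < 0.30 * e.median_kpa)


print(biopsy_adequate(BiopsyReport(length_mm=18, portal_tracts=7)))             # True
print(fibroscan_reliable(ElastographyExam(10, 20, median_kpa=8.0, iqr_kpa=2.0)))  # False: 50% success rate
```

Applying such rules requires each study to report the underlying values, which, as noted below, most did not.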

Only one study that used liver biopsy as a reference standard reported using adequate criteria; this was the study by Janssens et al.117 This was also one of only two studies which clearly stated that FibroScan was performed either in accordance with the manufacturer's instructions or, in the case of the study by Lemoine et al.,97 using more stringent criteria. The remaining studies failed either to meet or, more frequently, to report one or more of the standards (for details, see Table 14).

TABLE 14. FibroScan: reported compliance with manufacturer's instructions.

TABLE 14

FibroScan: reported compliance with manufacturer's instructions.

Assessment of diagnostic and prognostic accuracy

Enhanced Liver Fibrosis Test: diagnostic and prognostic accuracy results

The evidence base for the ELF test is small, resting on a single study of the European Liver Fibrosis Test carried out in a population with chronic liver disease that included < 100 patients diagnosed with ALD. Moreover, the quality of that evidence is not ideal as liver biopsy was not performed to an acceptable standard (see Study quality).

This limited evidence suggests that the ELF test can generally distinguish patients with moderate-to-severe fibrosis (Scheuer stages 3–4) from those with milder or no fibrosis (Scheuer stages 0–2) in patients with ALD. Using a low threshold score of 0.087, the test showed 100% sensitivity, but only 16.7% specificity. Ninety-three per cent sensitivity and 100% specificity were achieved using a threshold score of 0.431 (Table 15). These threshold scores appear to have been derived from the AUROCs after data collection, rather than prospectively selected and validated. As the investigators note, because the results rest on data from so few patients, the resulting positive and negative predictive values should be interpreted with caution.121
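The trade-off between sensitivity and specificity as the threshold moves can be illustrated with a few hypothetical scores. The Python sketch below is not based on the ELF study data: the ten scores, their disease labels and the three thresholds are invented for illustration. Lowering the threshold catches every diseased patient at the cost of many false positives; raising it does the reverse. Choosing the threshold that looks best in the same small dataset used to report accuracy tends to overstate how well the test will perform in new patients.

```python
def sens_spec_at(scores: list[float], diseased: list[bool], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when 'score >= threshold' is called positive."""
    pos = [s for s, d in zip(scores, diseased) if d]
    neg = [s for s, d in zip(scores, diseased) if not d]
    sensitivity = sum(s >= threshold for s in pos) / len(pos)
    specificity = sum(s < threshold for s in neg) / len(neg)
    return sensitivity, specificity


# Hypothetical scores; True = Scheuer stage 3-4 on biopsy.
scores = [0.05, 0.12, 0.20, 0.35, 0.50, 0.15, 0.45, 0.60, 0.80, 0.95]
advanced = [False] * 5 + [True] * 5

for t in (0.10, 0.40, 0.55):
    print(t, sens_spec_at(scores, advanced, t))
# 0.1 (1.0, 0.2)
# 0.4 (0.8, 0.8)
# 0.55 (0.6, 1.0)
```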

TABLE 15. Diagnostic and prognostic accuracy of the ELF test in patients with known or suspected ALD.

TABLE 15

Diagnostic and prognostic accuracy of the ELF test in patients with known or suspected ALD.

The ELF test appears to be less successful in distinguishing patients with probable or definite cirrhosis (Scheuer stage 4) from those with milder or no fibrosis (Scheuer stages 0–3) (see Table 15). As this result was only presented in the form of an AUROC, the sensitivity and specificity associated with a specific threshold score or scores could not be calculated. Discordant results were not discussed for either fibrosis range.
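The reason an AUROC alone does not yield sensitivity and specificity is that it summarises performance across every possible threshold at once. One standard way to compute it is the Mann–Whitney formulation: the probability that a randomly chosen diseased patient scores higher than a randomly chosen non-diseased patient, with ties counted as one half. The Python sketch below uses invented scores, not study data. It returns a single number, and recovering a sensitivity/specificity pair at any given threshold would require the underlying score distributions, which were not reported.

```python
def auroc(diseased_scores: list[float], healthy_scores: list[float]) -> float:
    """AUROC as the Mann-Whitney probability P(diseased score > healthy score), ties = 1/2."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            wins += 1.0 if d > h else 0.5 if d == h else 0.0
    return wins / (len(diseased_scores) * len(healthy_scores))


# Hypothetical scores for patients with and without Scheuer stage 4 on biopsy.
diseased = [0.15, 0.45, 0.60, 0.80, 0.95]
healthy = [0.05, 0.12, 0.20, 0.35, 0.50]
print(auroc(diseased, healthy))  # 0.84
```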

The evidence relating to the prognostic accuracy of the ELF test derives from data from 85 patients with ALD who were enrolled in the European Liver Fibrosis Test study in English centres and were followed up over a median period of 6.86 years (range 0–9 years). Thus, as for test accuracy, the evidence base is very small. During the follow-up period, 27 patients (32%) died of liver-related causes, a further seven (8%) suffered non-fatal liver-related clinical events, and seven (8%) died of non-liver-related causes.57 Again, results are presented only in the form of an AUROC. Although this suggests that the ELF test is predictive both of liver-related clinical outcomes and of all-cause mortality (for details, see Table 15), the sensitivity and specificity associated with a specific threshold score or scores could not be calculated and, more importantly, no information was presented on post-test alcohol consumption, although this is likely to have been a substantial confounding factor.

FibroTest: diagnostic and prognostic accuracy results

The evidence base for FibroTest, although more substantial than that for the ELF test, derives from a total of only 622 patients enrolled in five small to medium-sized studies. Moreover, the evidence for test accuracy relative to liver biopsy derives from only 390 patients enrolled in three of those studies, none of which stated that it had stipulated a minimum biopsy length of 15 mm.

The available evidence suggests that FibroTest can distinguish between patients with cirrhosis and those with METAVIR stage F0–F3 fibrosis, and, with lesser accuracy, between those with stage F2–F4 and stage F0–F1 fibrosis, and between those with stage F3–F4 and stage F0–F2 fibrosis (Table 16). However, not only are these conclusions based on data from only three relatively small studies,13,123,132 in which some biopsy samples may not have met the recommended minimum standards, as noted above, but the prevalence of the condition of interest was high in each of the three studies, ranging from 63% to 98% for METAVIR stage F2–F4 fibrosis, 51% for METAVIR stage F3–F4 fibrosis, and from 31% to 92% for cirrhosis.
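High prevalence matters because predictive values, unlike sensitivity and specificity, depend directly on the pre-test probability of disease, and because a population in which almost everyone has the condition contains very few reference-negative patients from whom specificity can be estimated. The Python sketch below applies Bayes' theorem to a hypothetical test with 80% sensitivity and 80% specificity; these values are not taken from Table 16, and the prevalences 63% and 98% simply echo the range reported above.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from Bayes' theorem for a given pre-test prevalence."""
    tp = sensitivity * prevalence               # true positives per patient tested
    fn = (1 - sensitivity) * prevalence         # false negatives
    tn = specificity * (1 - prevalence)         # true negatives
    fp = (1 - specificity) * (1 - prevalence)   # false positives
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical test with 80% sensitivity and 80% specificity.
for prevalence in (0.20, 0.63, 0.98):
    ppv, npv = predictive_values(0.80, 0.80, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

With unchanged sensitivity and specificity, the PPV rises and the NPV collapses as prevalence increases, so predictive values from these high-prevalence studies cannot be transferred to a population with suspected, rather than established, ALD.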

TABLE 16. Diagnostic and prognostic accuracy of FibroTest in patients with known or suspected ALD.

TABLE 16

Diagnostic and prognostic accuracy of FibroTest in patients with known or suspected ALD.

The largest study of test accuracy relative to liver biopsy, that by Naveau et al.,123 also had the most representative population in that it included the lowest proportion of patients with F2–F4 fibrosis. It explored the impact of different threshold scores on sensitivity and specificity. At a threshold score of 0.30, FibroTest had reasonable sensitivity but rather disappointing specificity in relation to significant (F2–F4) fibrosis; this situation was reversed when the threshold score was raised to 0.70. For cirrhosis, a threshold score of 0.30 produced 100% sensitivity but only 50% specificity, and the balance was improved using a threshold score of 0.70 (for details, see Table 16). Thabut et al. 2007a13 found that, at a threshold of 0.48, the specificity of FibroTest in relation to significant (F2–F4) fibrosis was 0%; this figure is of little value because the prevalence of that condition in the study population, as indicated by liver biopsy, was 98%, leaving very few patients without F2–F4 fibrosis from whom specificity could be estimated. Similarly, although FibroTest displayed a sensitivity of 95% for the diagnosis of F2–F4 fibrosis, this result is not robust because 92% of the study population had biopsy results indicative of cirrhosis. The third study of test accuracy relative to liver biopsy, that by Nguyen-Khac et al.,132 did not indicate what diagnostic thresholds were used; it did not report sensitivity and specificity, and the underlying data that would have allowed them to be calculated could not be obtained.

Both Naveau et al.123 and Thabut et al. 2007a13 provided some discussion of discordant cases. Naveau et al.123 reported a discordance of two or more fibrosis stages in 19% of assessed patients (42/221). On the basis of independent clinical, ultrasonographical, and endoscopic signs of cirrhosis, they attributed the error to the biopsy in 26 cases (14 FNs and 12 FPs), and to FibroTest in 13 cases (six FNs and seven FPs); three cases were unattributable.123 Eighteen of the 42 discordant cases involved diagnoses of cirrhosis: three FNs and three possible FPs of FibroTest, three FPs and eight possible FNs of biopsy (the eight FNs of biopsy were all in poor-quality samples), and one unattributable case diagnosed as cirrhosis by FibroTest but not by biopsy.123
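The discordance figures reported by Naveau et al. can be cross-checked for internal consistency. The Python sketch below uses only the counts given above; the dictionary keys are descriptive labels chosen for illustration.

```python
assessed, discordant = 221, 42

# Attribution of the 42 discordant cases, as reported by Naveau et al.
attributed = {"biopsy": 14 + 12, "FibroTest": 6 + 7, "unattributable": 3}

# Breakdown of the 18 discordant cases involving a diagnosis of cirrhosis.
cirrhosis_cases = {
    "FibroTest FN": 3,
    "FibroTest possible FP": 3,
    "biopsy FP": 3,
    "biopsy possible FN": 8,
    "unattributable": 1,
}

assert sum(attributed.values()) == discordant
print(f"discordance rate {discordant / assessed:.0%}")      # discordance rate 19%
print(sum(cirrhosis_cases.values()), "cirrhosis-related")  # 18 cirrhosis-related
```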

Thabut et al. 2007a13 attributed discordant cases to failure of biopsy or FibroTest on the basis of clinical events (haemorrhage, ascites) and risk factors for FibroTest failure. Four of the 61 patients with cirrhosis (7%) had FN results on FibroTest; all had large ascites and low alpha-2-macroglobulin. No other discordant results were reported.13

Two small studies by Thabut et al. (2007a and 2003),13,125 which included only 66 and 58 patients with ALD, respectively, suggest that FibroTest can also distinguish between patients with and without PHt and, with less accuracy, between those with and without oesophageal varices. However, these studies were also carried out in populations with a high prevalence of those conditions, and indeed the investigators noted that, as 86% of the population of Thabut et al.'s 2007a study13 had HVPG results indicating clinically significant PHt (HVPG > 12 mmHg), the study findings should not be used as a basis for recommending the use of FibroTest alone to predict severe PHt in cirrhotic patients.

The study by Naveau et al.124 and, to a lesser extent, that by Thabut et al. 2007b126 suggest that FibroTest may also be able, with relatively low accuracy, to predict liver-related mortality and all-cause mortality (see Table 16). In Naveau et al.'s124 cohort study, 85 patients (39%) died during the follow-up period: 42 (19%) of liver-related causes (haemorrhage, HCC, and decompensation) and 43 (20%) of non-liver-related causes. FibroTest was predictive of liver-related mortality and, to a lesser degree, of overall mortality (for details, see Table 16). Details of 5- and 10-year survival according to baseline FibroTest values are presented in Tables 17 and 18. The baseline FibroTest and biopsy results were concordant for 38 (90%) of the 42 liver-related deaths (29 with cirrhosis, nine without cirrhosis) and discordant for only four (10%; two FPs of FibroTest, and one FP and one FN of biopsy).124

TABLE 17. Five-year survival in patients with ALD, by baseline FibroTest value (after Naveau S, Gaudé G, Asnacios A, Agostini H, Abella A, Barri-Ova N, et al. Diagnostic and prognostic values of noninvasive biomarkers of fibrosis in patients with alcoholic liver disease. Hepatology 2009;49:97–105).

TABLE 18. Ten-year survival in patients with ALD, by baseline FibroTest value (after Naveau S, Gaudé G, Asnacios A, Agostini H, Abella A, Barri-Ova N, et al. Diagnostic and prognostic values of noninvasive biomarkers of fibrosis in patients with alcoholic liver disease. Hepatology 2009;49:97–105).

Naveau et al.'s124 cohort study also provided information on subsequent alcohol consumption in patients enrolled in their 2005 study of test accuracy.124 Only 21% (46/218) were known to be abstinent during the follow-up period; 50% (108/218) were not abstinent, and the status of the remaining 29% (64/218) was not known.124 Unfortunately, the authors did not link these data with test results, and thus it was not possible to determine whether or not the test results had an impact on subsequent alcohol consumption, or whether or not alcohol consumption affected survival.

FibroScan: diagnostic accuracy results

The evidence base for FibroScan is slightly larger than that for FibroTest, deriving from a total of approximately 868 patients enrolled in nine small to medium-sized studies (the total number of participants is approximate because, as indicated in Table 13, it is not always clear how many of the eligible patients were in fact assessed). The evidence for test accuracy relative to liver biopsy is derived from a total of only 480 patients enrolled in the studies by Janssens et al.,117 Kim et al.,127 Melin et al.,122 Mueller et al.,118 Nahon et al.,128 and Nguyen-Khac et al.132 In only one of these, that by Janssens et al.,117 are the liver biopsy samples known to have met the recommended minimum standards.

The evidence that FibroScan can distinguish patients with METAVIR stage F1–F4 fibrosis from those without fibrosis (F0), and those with F2–F4 fibrosis from those with stage F0–F1, is not robust, being based on only one fairly small study by Nguyen-Khac et al.132 in a population with a high prevalence of stage F2–F4 fibrosis. However, there is more substantial evidence that FibroScan can distinguish patients with stage F3–F4 from those with stage F0–F2 fibrosis, and those with cirrhosis from those with stage F0–F3 fibrosis (Table 19).

TABLE 19. Diagnostic accuracy of FibroScan in patients with known or suspected ALD.

The data presented in Table 19 illustrate the impact of different threshold scores on the sensitivity and the specificity of FibroScan. Castéra et al.81 originally suggested that, in patients with chronic hepatitis C, the optimal FibroScan threshold values for the identification of significant (F2–F4) fibrosis, advanced (F3–F4) fibrosis, and cirrhosis (F4) were 7.1, 9.5, and 12.5 kPa, respectively (Table 20). These values were used by some of the studies included in this review and were found to be less appropriate for use in patients with ALD. Melin et al.122 achieved 100% sensitivity using a threshold score of 13 kPa to identify cirrhosis in patients with ALD, but could not estimate the specificity associated with this threshold because patients with a score < 13 kPa did not undergo liver biopsy. Other investigators specifically sought to identify the optimal threshold scores for use in patients with ALD (see Table 19). Janssens et al.117 noted that, in such patients, the threshold score of 9.5 kPa proposed for hepatitis C had 100% sensitivity for identifying severe (F3–F4) fibrosis, but a PPV of only 65%; it overestimated the degree of fibrosis in 17 of the 49 patients (35%) who underwent liver biopsy, and in all but one of them did so by two or more stages. Janssens et al.,117 therefore, suggested that more appropriate thresholds for the identification of severe (F3–F4) fibrosis in ALD would lie between 15.8 and 17.3 kPa. They did not report the sensitivity and the specificity of a threshold score of 12.5 kPa for identifying cirrhosis (F4) in patients with ALD, but suggested that a more appropriate threshold would lie between 19.6 and 23.5 kPa, the exact choice depending on the preferred balance between sensitivity and specificity.117 This study was not ideally designed to establish specificity because biopsy was offered only to patients with a FibroScan score of ≥ 9.5 kPa.
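A 2 × 2 table shows why Janssens et al.'s design could yield a PPV but not a specificity. In the Python sketch below, the verified counts follow from the text: 49 patients were biopsied and FibroScan overestimated fibrosis in 17 of them, leaving 32 true positives. The "full cohort" counts for index-negative patients are invented purely for illustration, and the class and function names are hypothetical. When only index-positive patients are biopsied, no false negatives or true negatives can be observed. Apparent sensitivity is therefore 100% and apparent specificity 0% by construction, and the NPV cannot be estimated at all.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TwoByTwo:
    """Index test versus biopsy (reference standard) counts."""
    tp: int  # index positive, biopsy positive
    fp: int  # index positive, biopsy negative
    fn: int  # index negative, biopsy positive
    tn: int  # index negative, biopsy negative

    @staticmethod
    def _ratio(num: int, den: int) -> Optional[float]:
        # None signals "not estimable" (empty denominator)
        return num / den if den else None

    def sensitivity(self) -> Optional[float]:
        return self._ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> Optional[float]:
        return self._ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> Optional[float]:
        return self._ratio(self.tp, self.tp + self.fp)

    def npv(self) -> Optional[float]:
        return self._ratio(self.tn, self.tn + self.fn)


def biopsy_positives_only(full: TwoByTwo) -> TwoByTwo:
    """What a study observes if only index-positive patients are biopsied."""
    return TwoByTwo(tp=full.tp, fp=full.fp, fn=0, tn=0)


# Janssens et al. at 9.5 kPa, as described in the text: 49 biopsied, 17 overestimated
verified = TwoByTwo(tp=49 - 17, fp=17, fn=0, tn=0)
print(f"PPV {verified.ppv():.0%}")  # 32/49, i.e. 65%

# Hypothetical full cohort: fn and tn are invented for illustration only
full = TwoByTwo(tp=32, fp=17, fn=5, tn=150)
seen = biopsy_positives_only(full)
print("true sensitivity:", round(full.sensitivity(), 2))
print("apparent sensitivity:", seen.sensitivity())  # 1.0 by construction
print("apparent specificity:", seen.specificity())  # 0.0 by construction
print("NPV:", seen.npv())                           # None: not estimable
```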

TABLE 20. Optimal liver stiffness cut-off values for the diagnosis of fibrosis in patients with chronic hepatitis C (from Castéra et al.).

Three studies discussed discordant results.97,117,128 Nahon et al.128 found that, relative to biopsy, FibroScan underestimated and overestimated the degree of fibrosis in approximately equal proportions of patients. Fourteen per cent (11/79) of those with histologically proven cirrhosis had FibroScan scores < 22.6 kPa, whereas 16% (11/68) of those with FibroScan scores > 22.6 kPa had biopsy results that did not indicate cirrhosis, although the majority (10/11) displayed extensive fibrosis. Janssens et al.117 found that FibroScan overestimated the degree of fibrosis in 7 of the 11 patients with severe steatosis (by two stages in five patients and by one stage in two patients). Of the six patients in the study with alcoholic hepatitis, FibroScan classified three as having cirrhosis, although their biopsy results indicated F3 fibrosis. In two of the remaining three, both FibroScan and biopsy results indicated cirrhosis, thus removing the potential for overestimation by FibroScan. Finally, Lemoine et al.97 noted one discordant case in a 70-year-old patient who had been totally abstinent from alcohol for 12 months and had no histological indication of alcoholic hepatitis. The patient's FibroScan score was 38.10 ± 10 kPa, suggestive of PHt, although his HVPG measurement was only 8 mmHg.

Like Janssens et al.,117 Mueller et al.118 found that, in patients with inflammatory hepatitis, liver stiffness was increased independently of the degree of fibrosis. Accordingly, the diagnostic accuracy of FibroScan improved when patients with laboratory signs of alcoholic steatohepatitis [i.e. serum glutamic oxaloacetic transaminase (SGOT) levels above 100 units/litre (U/L)] were excluded. When patients with only mildly elevated SGOT (> 50 U/L) were excluded, diagnostic accuracy improved in relation to F3–F4 fibrosis, but not in relation to F4 (cirrhosis) (Table 21).

TABLE 21. Diagnostic accuracy of FibroScan in patients with suspected ALD, with and without those with elevated SGOT (from Mueller et al.).

Three small studies97,117,129 suggest that FibroScan can generally distinguish between patients with and without PHt, whereas one small study130 suggests that it is less successful at distinguishing between patients with and without large oesophageal varices (see Table 19).

There are no long-term data relating FibroScan results to survival or other clinical outcomes.

Adverse events and failure rates

The Enhanced Liver Fibrosis Test and FibroTest: failure rates and adverse events

Both the ELF test and FibroTest use blood samples obtained by standard venepuncture. None of the included studies reported adverse events relating to this process. However, a systematic review of studies of adverse events in adults undergoing simple venepuncture for diagnostic or screening purposes indicates that between 14% and 45% of patients undergoing such venepuncture suffer pain and bruising, whereas between 0.9% and 3.4% suffer vasovagal reactions. Potentially disabling nerve injuries occur but, fortunately, appear to be very rare (for full details, see Appendix 8).

There are no data relating to test failure rates for the ELF test or FibroTest specifically in patients with ALD. The only relevant data come from Naveau et al.'s study123 of FibroTest in which, for unspecified reasons, serum samples were unavailable for 17% (50/292) of the enrolled patients. In Rosenberg et al.'s study121 of the ELF test, 4.4% (45/1021) of patients overall had incomplete clinical details or biochemical samples, compared with 5.6% (55/976) whose biopsy samples were considered inadequate; figures relating specifically to patients with ALD were not presented.

FibroScan: failure rates and adverse events

Only two of the included studies commented on the acceptability of FibroScan to patients.117,132 Janssens et al.117 found that only 2% (5/255) of patients entering hospital for alcohol detoxification and rehabilitation refused FibroScan, whereas 29% (21/72) of those with FibroScan results indicative of severe fibrosis or cirrhosis refused liver biopsy. However, Nguyen-Khac et al.132 found that 34% (55/160) of patients refused to participate in their study, which involved both transient elastography and venepuncture; it is not clear whether this reluctance to participate related specifically to one or other of those interventions, or to the perceived inconvenience of undergoing both.

Some studies reported the proportion of patients with ALD in whom FibroScan was unsuccessful: this ranged from 4.4% to 8.6% (Table 22). Other studies reported the number of potential participants who were excluded because of either failure to obtain a valid result or the presence of obesity or other factors likely to affect FibroScan performance. In the study by Kim et al.,127 11.8% of patients were excluded because of factors likely to affect test performance. Although obesity is the most frequently reported reason for FibroScan failure, Bureau et al.129 found that one-third of test failures (2/6) in patients with liver disease of varied aetiology could not be attributed to obesity; their cause remained unclear.

TABLE 22. Proportion of patients with ALD in whom FibroScan was either thought inappropriate or was unsuccessful.

Two studies that sought specifically to identify the features associated with successful FibroScan use did not meet the review's inclusion criteria, but nonetheless provide useful information in this context.79,139 The largest prospective study of this nature, by Castéra et al.,79 assessed 13,369 examinations performed by seven operators over a 5-year period in 7261 adult patients with chronic liver disease of varied aetiology. It recorded the prevalence of failure of LSM (defined, in accordance with the manufacturer's recommendation, as failure to obtain any value after at least 10 shots) and unreliable results (defined, again in accordance with the manufacturer's recommendations, as < 10 successful shots, a success rate < 60%, or an interquartile range > 30% of the median value). In 18.4% of examinations (2466/13,369), valid results could not be obtained. LSM failed in 3.1% (420/13,369), whereas in 15.8% (2046/12,949) of the remaining examinations the results were deemed unreliable. Although the number of patients in whom FibroScan could be used successfully could be increased with repeated examinations, there remained some for whom it was impossible to obtain either any result or a reliable result after five attempts (Table 23).79
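The reliability rules quoted above can be expressed as a small decision function. The Python sketch below assumes that each examination yields a list of valid stiffness readings (kPa) and a count of shots fired. It computes the IQR with Python's `statistics.quantiles` ("inclusive" method), which may not match how the device itself calculates it. The function name is hypothetical and this is not the manufacturer's software. The final lines recheck the reported rates and show that the 15.8% figure uses the 12,949 examinations that did not fail outright as its denominator.

```python
from statistics import median, quantiles


def classify_lsm(valid_kpa: list[float], shots_fired: int) -> str:
    """Return 'failure', 'unreliable' or 'reliable' for one examination.

    Criteria as quoted in the text: failure = no valid value obtained;
    unreliable = fewer than 10 valid shots, success rate < 60%,
    or IQR > 30% of the median.
    """
    if shots_fired < len(valid_kpa):
        raise ValueError("cannot have more valid shots than shots fired")
    if not valid_kpa:
        return "failure"
    success_rate = len(valid_kpa) / shots_fired
    if len(valid_kpa) < 10 or success_rate < 0.60:
        return "unreliable"
    # At least 10 readings here, so quantiles() has enough data
    q1, _, q3 = quantiles(valid_kpa, n=4, method="inclusive")
    if (q3 - q1) > 0.30 * median(valid_kpa):
        return "unreliable"
    return "reliable"


# Reported counts from Castéra et al.
exams, failed, unreliable = 13_369, 420, 2_046
print(f"failure {failed / exams:.1%}")                         # 3.1%
print(f"unreliable {unreliable / (exams - failed):.1%}")        # 15.8%
print(f"no valid result {(failed + unreliable) / exams:.1%}")  # 18.4%
```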

TABLE 23. Effect of repeated examinations on rate of failure of LSM or unreliable results (data from Castéra et al.).

Castéra et al.79 found that the factor most strongly associated with both test failure and unreliable results at the first FibroScan examination was a BMI > 30 kg/m² (Table 24); the rates of both failure and unreliable results increased in parallel with the BMI (Table 25). The rates of both failure and unreliable results were also substantially raised when the operator had performed < 500 examinations (Table 26). Such a high threshold was chosen to define operator experience because all the operators who participated in the study had already performed at least 100 FibroScan examinations. The failure rate ranged from 0.2% in lean young non-diabetic patients to 20.9% in elderly obese diabetic patients, while the rate of unreliable results ranged from 7.2% in lean young men without diabetes or hypertension to 60.4% in elderly obese women with diabetes and hypertension.

TABLE 24. Factors independently associated with failure of LSM or unreliable results at first FibroScan examination (data from Castéra et al.).

TABLE 25. Rates of LSM failure and unreliable results at first FibroScan examination, by BMI (data from Castéra et al.).

TABLE 26. Rates of LSM failure and unreliable results, by operator experience (data from Castéra et al.).

In a smaller study, Kettaneh et al.139 failed to achieve the manufacturer's recommendation of at least 10 successful LSMs in 8.4% (79/935) of patients with chronic hepatitis C. They found that success was directly related to increased operator experience, and inversely related to both patient age and patient BMI. However, with hindsight, they suggested that the limiting factor was not so much BMI per se as the presence of a fatty thoracic belt that made it technically impossible to obtain accurate results.

A pilot study conducted specifically in patients with a BMI of ≥ 30 kg/m² found that the use of the XL specialised probe, which can measure to a depth of 35–75 mm below the skin surface, reduced rates of failure and unreliable results. However, even with this probe, no value could be obtained in 12% of patients and the recommended standard of at least 10 valid measurements could be achieved in only 76%.140 These findings are particularly important in the light of the high proportion of the UK population with a BMI of ≥ 30 kg/m² or a raised waist circumference (see Chapter 1, Incidence and prevalence).

Discussion of clinical effectiveness

Diagnostic and prognostic accuracy

Summary of diagnostic and prognostic accuracy of the Enhanced Liver Fibrosis Test

No studies were identified that specifically assessed the ELF test. One study121 (n = 64 patients) evaluated the diagnostic accuracy of the European Liver Fibrosis Test (essentially the ELF test with the addition of age to the algorithm) relative to liver biopsy, in identifying moderate-to-severe fibrosis and cirrhosis in patients with known or suspected ALD. The study found that, at a threshold of 0.431, the European Liver Fibrosis Test identified moderate-to-severe fibrosis with a sensitivity of 93% (95% CI 87% to 97%) and a specificity of 100% (95% CI 93% to 100%). The sensitivity and specificity for cirrhosis were not reported, but were presumably lower: the point estimate of the AUROC for cirrhosis was lower than that for moderate-to-severe fibrosis (0.83 vs 0.94), although the CIs overlapped. As the evidence base is very small, and acceptable minimum standards were not used for the biopsy sample, these findings are not robust.

A follow-up study57 (n = 85 patients) suggested that the test had a predictive value in relation to both liver-related clinical outcomes and all-cause mortality, but did not report sensitivity and specificity. Again, the findings are not robust because the evidence base is so small.

Summary of diagnostic and prognostic accuracy of FibroTest

Three studies13,123,132 assessed the diagnostic and prognostic accuracy of FibroTest in identifying moderate-to-severe fibrosis and cirrhosis in patients with known or suspected ALD (n = 390 patients). The largest of these studies, that by Naveau et al.123 (n = 221 patients), also had the most representative population. The study found that, using a threshold score of 0.30, FibroTest could identify moderate-to-severe (F2–F4) fibrosis with a sensitivity of 84% (95% CI 78% to 90%) and a specificity of 66% (95% CI 55% to 76%), whereas, using a threshold score of 0.70, it could identify cirrhosis with a sensitivity of 91% (95% CI 83% to 97%) and a specificity of 87% (95% CI 81% to 92%). Evidence for FibroTest's ability to distinguish between patients with and without fibrosis (F1–F4 vs F0) is not robust, being based on only one fairly small study by Nguyen-Khac et al.132 (n = 103 patients) in a population in whom the prevalence of fibrosis or cirrhosis was 92%.

A small study by Thabut et al. 2007a13 (n = 66 patients) found that FibroTest could identify clinically significant PHt (HVPG ≥ 12 mmHg) with a sensitivity of 93% (95% CI 84% to 98%) and specificity of 87% (95% CI 55% to 99%). However, because of the high prevalence of the condition in the study population, the investigators felt that this finding should not be used to support the use of FibroTest alone to predict severe PHt in cirrhotic patients.

A second small study by Thabut et al. 2003125 (n = 58 patients) found that, using a threshold of 0.85, FibroTest could predict the presence of grade 2 oesophageal varices with a sensitivity of 89% (95% CI 73% to 96%) and a specificity of 50% (95% CI 25% to 77%).

Finally, a study by Thabut et al. 2007b126 (n = 189 patients) suggested that FibroTest has a modest predictive value in relation to all-cause mortality at 2 and 6 months (AUROCs 0.64 ± 0.05 and 0.58 ± 0.05, respectively); sensitivity and specificity were not reported. A study by Naveau et al.124 (n = 218 patients) suggested that FibroTest had a somewhat better predictive value in relation to both liver-related and all-cause mortality at 5 years [AUROCs 0.79 (95% CI 0.68 to 0.86) and 0.69 (95% CI 0.61 to 0.76), respectively], but, again, sensitivity and specificity were not reported.

Summary of diagnostic and prognostic accuracy of FibroMAX

No evidence of the diagnostic and prognostic accuracy of FibroMAX was identified.

Summary of diagnostic and prognostic accuracy of FibroScan

Six studies117,118,122,127,128,132 assessed the diagnostic and prognostic accuracy of FibroScan in identifying moderate-to-severe fibrosis and cirrhosis in patients with known or suspected ALD (n = 480 patients). The study with the most representative population, that by Nahon et al.,128 found that, using a threshold score of 11.6 kPa, FibroScan could identify severe (F3–F4) fibrosis with a sensitivity of 87% (95% CI 80% to 93%) and a specificity of 89% (95% CI 76% to 96%), whereas, using a threshold score of 22.7 kPa, it could identify cirrhosis with a sensitivity of 84% (95% CI 75% to 91%) and a specificity of 83% (95% CI 74% to 91%). As with FibroTest, evidence for FibroScan's ability to distinguish between patients with and without fibrosis (F1–F4 vs F0) is not robust, being based only on one fairly small study by Nguyen-Khac et al.132 (n = 103 patients) in a population with a 92% prevalence of fibrosis or cirrhosis.

Two of the included studies, those by Janssens et al.117 and Mueller et al.,118 indicate that FibroScan may overestimate the degree of fibrosis in patients with inflammatory hepatitis. This is consistent with Sagir et al.'s141 finding that 15 out of 20 patients with acute hepatitis of varying causes had FibroScan results indicative of cirrhosis although they had no other signs of cirrhosis, and with Arena et al.'s142 finding that, in patients with chronic hepatitis C, necroinflammatory activity identified at biopsy was associated with increased liver stiffness at each fibrosis stage except cirrhosis. However, unlike Janssens et al.,117 Arena et al.142 found that the degree of steatosis did not influence FibroScan results.

Three studies97,117,129 (n ≤ 148 patients) assessed FibroScan's ability to identify PHt. Only one of these, that by Lemoine et al.,97 reported sensitivity and specificity. The study found that FibroScan could identify clinically significant PHt (HVPG ≥ 10 mmHg) with a sensitivity of 90% (95% CI 78% to 97%) and a specificity of 88% (95% CI 55% to 99%).97

A small study by Nguyen-Khac et al.130 (n = 103 patients) found that, using a threshold of 47.2 kPa, FibroScan could predict the presence of large oesophageal varices with a sensitivity of 85% (95% CI 67% to 95%) and a specificity of 64% (95% CI 53% to 74%).

Discussion of diagnostic and prognostic accuracy of the Enhanced Liver Fibrosis Test, FibroTest and FibroScan

The evidence relating to the diagnostic accuracy of the ELF test, FibroTest, and FibroScan in relation to liver fibrosis and cirrhosis is not robust, and does not support any attempt to differentiate between their performances in this respect. As Naveau et al.124 note, indirect comparisons between the results of different studies of test accuracy are particularly hazardous, not least because of interstudy variability both in the prevalence of different stages of fibrosis and in biopsy lengths. Only one study was identified that compared two different non-invasive tests with liver biopsy in the same patients: this was the relatively small study by Nguyen-Khac et al.,132 which presented data relating to both FibroTest and FibroScan. In this study, although the point estimates of the AUROCs were higher for FibroScan than for FibroTest, the CIs overlap and, therefore, it is not possible to conclude that FibroScan has better diagnostic accuracy than FibroTest (Table 27).

TABLE 27. Comparison of FibroTest and FibroScan with liver biopsy in the same population.

All studies that compare non-invasive tests with liver biopsy in patients with ALD and present information on the interquartile ranges around the median test scores for the different METAVIR stages (i.e. the studies of FibroTest by Nahon et al.128 and Naveau et al.,123 and the studies of FibroScan by Janssens et al.,117 Kim et al.,127 Mueller et al.118 and Nguyen-Khac et al.132), display a substantial degree of overlap between those interquartile ranges. Thus, for any individual patient, whatever the non-invasive test score, there will be substantial uncertainty regarding their true fibrosis stage. Although this uncertainty may perhaps be due less to deficiencies in the non-invasive tests themselves than to issues related to liver biopsy (e.g. the use of inadequate samples) or differences between patients in the degree of necroinflammation and steatosis,143 until well-designed studies are conducted that take these factors into account, the clinical utility of the tests is not apparent.

The evidence relating to the diagnostic accuracy of the ELF test, FibroTest, and FibroScan in relation to PHt and oesophageal varices is weaker than that relating to fibrosis and cirrhosis, as it rests on even smaller patient numbers. Moreover, the use of FibroScan to identify cirrhotic patients at high risk of oesophageal varices is said to be inappropriate because, although varices form only when PHt is present, neither the presence of varices nor their size is directly correlated with the degree of portal pressure elevation.144

Patient management and clinical outcomes

No studies were identified that reported data relating to the effect of the use of any of the four tests on patient management or clinical outcomes.

Adverse effects and contraindications

The non-invasive tests included in this review appear to be safe. No adverse effects were reported in any of the included studies and no additional evidence has been identified that indicates that transient elastography is specifically associated with any adverse effects. As noted in The Enhanced Liver Fibrosis Test and FibroTest: failure rates and adverse events, the ELF test, FibroTest, and FibroMAX, which utilise blood tests, will be associated with the same adverse effects as diagnostic venepuncture generally – primarily pain and bruising, with occasional vasovagal reactions, and very rarely potentially disabling nerve injuries. By contrast, liver biopsy is associated with a high level of morbidity and occasional mortality (see Chapter 1, Liver biopsy).

No contraindications have been specified for the ELF test. The contraindications specified for FibroTest, FibroMAX, and FibroScan all relate to the mode of operation of the test, and do not relate to any potential for harm in patients with the relevant characteristics. Moreover, there is evidence to suggest that FibroScan is generally acceptable to patients with ALD. As noted in FibroScan: failure rates and adverse events, Janssens et al.117 found that only 2% of patients entering hospital for alcohol detoxification and rehabilitation refused FibroScan, although 34% refused to participate in the study by Nguyen-Khac et al.132 which required them to undergo both FibroScan and blood tests. Finally, in a study of acceptability, Melin et al.145 found that all 380 patients seen for alcohol problems during the course of a year agreed to undergo FibroScan; only 5% (2/44) of those who were offered liver biopsy because their FibroScan result indicated severe fibrosis or cirrhosis refused it, compared with 29% in the study by Janssens et al.117

Internal and external validity

The results of the included studies summarised above suggest that the ELF test, FibroTest, and FibroScan can be used to identify patients with ALD who have fibrosis or cirrhosis. However, these results should be viewed with caution for a number of reasons. The most obvious reason is that they are not robust because they rest on data from relatively few patients with ALD; this is especially true of the ELF test.

Internal validity

As noted in Study quality above, study quality, as assessed using a modified version of the QUADAS checklist,112 is generally not high.

Most of the studies display spectrum bias because they recruited patients believed or known to have severe fibrosis or cirrhosis, rather than those representative of the whole spectrum of patients with suspected ALD. Such spectrum bias favours the index test: because the positive and negative predictive values of diagnostic tests depend critically on the prevalence of the condition being tested for in the population being tested,146 if the prevalence is considerably higher than would be expected in normal clinical practice, then the positive predictive value of the test will also be higher than it would be in normal clinical practice. Consequently, even if the studies indicate that the tests have high sensitivity and specificity, in normal use many of the positive results will be FPs.146 Moreover, two of the studies that recruited a more representative patient sample, those by Janssens et al.117 and Melin et al.,122 used the reference standard only in those patients whose index test result was above a specific threshold. This use of the reference standard only in patients testing positive using the index test (verification bias) will result in overestimation of its sensitivity because the number of FN results is too low.147 In the context of liver fibrosis, both spectrum bias and verification bias are probably due to valid ethical issues surrounding the use of biopsy in patients in whom it is not considered clinically necessary; however, they distort study results in such a way as to favour the index tests.
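Predictive values depend on prevalence through Bayes' theorem. The Python sketch below applies it to the FibroTest cirrhosis figures reported by Naveau et al. (sensitivity 91%, specificity 87%). The prevalences of 50%, 20% and 10% are illustrative assumptions, not estimates for any UK population, and the function name is hypothetical. At 10% prevalence the PPV falls to roughly 44%, so more than half of the positive results would be FPs, while the NPV stays above 98%.

```python
def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' theorem for a given pre-test prevalence."""
    tp = sens * prevalence              # proportion of those tested who are true positives
    fp = (1 - spec) * (1 - prevalence)  # ... false positives
    tn = spec * (1 - prevalence)        # ... true negatives
    fn = (1 - sens) * prevalence        # ... false negatives
    return tp / (tp + fp), tn / (tn + fn)


# FibroTest for cirrhosis (Naveau et al.): sensitivity 91%, specificity 87%.
# The prevalences below are illustrative assumptions.
for prev in (0.50, 0.20, 0.10):
    ppv, npv = predictive_values(0.91, 0.87, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.0%}, NPV {npv:.0%}")
```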

Conversely, however, studies that compare a non-invasive test with liver biopsy are disadvantaged by the fact that it is an imperfect reference standard; thus, discordance between the degree of fibrosis indicated by biopsy and by non-invasive testing may be because of an error in either test. Mehta et al.148 noted that liver biopsy is associated with such a degree of potential error that its use as the reference standard may make it impossible to differentiate between a perfect and an inadequate surrogate test. They calculated that, assuming that liver biopsy has a sensitivity and a specificity of 90% for the identification of significant liver fibrosis and that the prevalence of that condition in the population being tested is 40%, a perfect non-invasive test with an AUROC of 0.99 versus true disease can only achieve an AUROC of 0.90 versus liver biopsy. Indeed, Afdhal et al.19 suggest that liver biopsy has a diagnostic accuracy of 80–90% and, in that case, any tests that are compared with liver biopsy cannot achieve an AUROC better than 0.9, and the results are likely to lie in the range 0.75–0.88, with a most likely value of 0.85. Thus, even if a non-invasive test is in fact a perfect non-invasive surrogate for liver biopsy, it may be impossible to prove this.19
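A simple model can approximate the ceiling described by Mehta et al. The Python sketch below assumes that the index test and biopsy err independently given the patient's true fibrosis status. It also assumes that two patients with the same true status are ranked correctly half the time. These are simplifying assumptions, and Mehta et al.'s own method may differ; the function name is hypothetical. Under this model, a test with a true AUROC of 1.0 or 0.99 scores about 0.89 against a biopsy with 90% sensitivity and specificity at 40% prevalence (roughly 0.894 and 0.886), close to the 0.90 they report.

```python
def auroc_vs_imperfect_reference(true_auroc: float, ref_sens: float,
                                 ref_spec: float, prevalence: float) -> float:
    """Expected AUROC of an index test scored against an imperfect reference.

    Assumes conditional independence of index-test and reference errors
    given true disease status, so same-status pairs are concordant 50% of the time.
    """
    # Joint probabilities of (reference result, true status)
    ref_pos_dis = ref_sens * prevalence
    ref_pos_healthy = (1 - ref_spec) * (1 - prevalence)
    ref_neg_dis = (1 - ref_sens) * prevalence
    ref_neg_healthy = ref_spec * (1 - prevalence)

    # P(score of a reference-positive patient > score of a reference-negative one)
    concordant = (
        ref_pos_dis * ref_neg_healthy * true_auroc
        + ref_pos_dis * ref_neg_dis * 0.5
        + ref_pos_healthy * ref_neg_healthy * 0.5
        + ref_pos_healthy * ref_neg_dis * (1 - true_auroc)
    )
    pairs = (ref_pos_dis + ref_pos_healthy) * (ref_neg_dis + ref_neg_healthy)
    return concordant / pairs


# Mehta et al.'s scenario: biopsy sensitivity and specificity 90%, prevalence 40%
for auc in (1.0, 0.99):
    print(auc, round(auroc_vs_imperfect_reference(auc, 0.90, 0.90, 0.40), 3))
```

With a perfect reference (sensitivity and specificity of 1.0) the function returns the true AUROC unchanged. This is a quick check that the model reduces to the obvious case.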

The use of liver biopsy as the reference standard is associated with a second problem. The non-invasive tests reviewed in this report all present a numeric result relating to a continuous measurement that is held to reflect, directly or indirectly, the degree of fibrosis in the liver. However, this result is then compared with a liver biopsy result expressed in terms of an ordinal scoring system: i.e. biopsy results are classified into a number of groups that have a natural ordering, in that they indicate progressively more severe liver damage, but do not represent a direct arithmetical progression. For example, the degree of fibrosis seen in METAVIR stage F4 is not necessarily twice that seen in METAVIR stage F2; instead, the different stages describe the pattern of deposition of fibrous tissue, as well as its extent.149 Consequently, to permit comparison with liver biopsy results, a threshold value corresponding to each biopsy stage must be identified for each non-invasive test. In most of the included studies, the threshold values recommended as appropriate for the identification of the different stages of fibrosis and cirrhosis in patients with ALD have been derived statistically from the receiver operating characteristic curve after data collection. They have not been validated prospectively and, therefore, do not fulfil the standard criteria for the general use of a diagnostic test.138
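The text does not say which statistical criterion each study used to derive its thresholds from the receiver operating characteristic curve. One common choice is Youden's J (sensitivity + specificity − 1). The Python sketch below picks the cut-off that maximises J on a small, invented set of stiffness scores and biopsy results; the data and the function name are hypothetical. Because the cut-off is selected and then evaluated on the same patients, the resulting sensitivity and specificity tend to be optimistic. This is why prospective validation in an independent sample is needed.

```python
def youden_threshold(scores: list[float], biopsy_positive: list[bool]) -> tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximising Youden's J.

    A score >= cut-off counts as test-positive. Every observed score is tried
    as a candidate cut-off; on ties, the lowest cut-off is kept.
    """
    if len(scores) != len(biopsy_positive):
        raise ValueError("scores and biopsy results must be the same length")
    n_pos = sum(biopsy_positive)
    n_neg = len(biopsy_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both biopsy-positive and biopsy-negative patients")

    best = None
    for cut in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, biopsy_positive) if s >= cut and d)
        tn = sum(1 for s, d in zip(scores, biopsy_positive) if s < cut and not d)
        sens, spec = tp / n_pos, tn / n_neg
        # Maximising sens + spec is equivalent to maximising J
        if best is None or sens + spec > best[1] + best[2]:
            best = (cut, sens, spec)
    return best


# Invented stiffness values (kPa) and biopsy results (True = F3-F4), for illustration only
scores = [4.0, 6.0, 7.0, 9.0, 11.0, 13.0, 16.0, 20.0]
biopsy = [False, False, False, True, True, False, True, True]
print(youden_threshold(scores, biopsy))  # (9.0, 1.0, 0.75)
```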

External validity

It is difficult to comment on the external validity of the included studies – i.e. the extent to which their populations and methods are generalisable to clinical practice in the UK – not least because of the lack of clarity surrounding the potential role of NILTs in clinical practice in the UK. The issues relate to the population in whom, and the purpose for which, such tests may be used; they are to some extent related.

In the original scope of this assessment, it was envisaged that NILTs would be used in primary care to enable more appropriate selection of patients with abnormal liver function tests and risk factors for chronic liver disease for referral to specialist care. By contrast, the included studies were conducted in secondary or tertiary care settings. Subsequently, clinical experts in the UK have suggested that it is unlikely, and possibly undesirable, that non-invasive tests will be used in primary care, and that most patients thought to need further investigation for suspected ALD should be referred to specialist care, where non-invasive tests will be performed if considered appropriate. Even in this scenario, however, the range of disease severity is likely to be wider than that seen in the included studies, many of which were limited to patients believed to have relatively severe disease. Indeed, a number of studies recruited patients who not only required liver biopsy for clinical reasons but also underwent that biopsy transjugularly rather than percutaneously,13 which suggests the presence of decompensated cirrhosis.

In ALD, NILTs may be used for one of two main diagnostic purposes:

  • to identify patients with fibrosis, so that efforts may be made to prevent the development of cirrhosis
  • to identify patients with cirrhosis, enabling them to be monitored for the development of conditions such as oesophageal varices and HCC.

If non-invasive test results indicative of fibrosis were effective in persuading patients with ALD to abstain from alcohol (and no evidence for this has been identified), the former use would be of potentially greater clinical value, as it would permit the identification of patients with ALD while the disease was still reversible; identifying patients with cirrhosis, by contrast, would only permit the initiation of monitoring to enable prompt treatment of symptoms of an incurable disease. However, only one study, that by Nguyen-Khac et al.,132 reported on the ability of a non-invasive test to identify mild (METAVIR F1) as well as more severe (F2–F4) fibrosis in patients with suspected ALD, and in most studies the tests performed better when identifying cirrhosis (F4) than when identifying mild (F1), moderate (F2) or severe (F3) fibrosis. This clearly limits the clinical utility of the tests.

For tests whose results lie on a continuous scale, the intended purpose of the test will affect the choice of threshold score. Thus, if the NILTs reviewed in this report are intended to identify patients with cirrhosis for further tests and monitoring, a threshold score should be chosen that maximises sensitivity (i.e. the proportion of patients who genuinely have the condition of interest who are correctly identified by the non-invasive test), as this will minimise the risk of patients with cirrhosis being mistakenly classified as not having the condition and therefore not receiving further tests, monitoring and treatment, as appropriate. However, if the tests are intended to exclude patients without fibrosis, the threshold score should be chosen to maximise specificity (i.e. the proportion of people who genuinely do not have the condition of interest who are correctly identified by the non-invasive test), to reduce the risk of patients who do not have fibrosis undergoing costly and potentially invasive tests.
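The two threshold-setting strategies can be sketched as follows, assuming that a score at or above the cut-off counts as positive. For a sensitivity target, the highest qualifying cut-off is chosen, giving the best specificity subject to the target; for a specificity target, the lowest qualifying cut-off is chosen, giving the best sensitivity. The function names and the data are hypothetical, and for simplicity both strategies are applied to the same dichotomy (METAVIR F2 or above) rather than to cirrhosis and fibrosis separately.

```python
from typing import Optional, Sequence


def sens_spec(scores: Sequence[float], has_condition: Sequence[bool],
              threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when score >= threshold counts as positive."""
    tp = sum(1 for x, d in zip(scores, has_condition) if d and x >= threshold)
    tn = sum(1 for x, d in zip(scores, has_condition) if not d and x < threshold)
    n_pos = sum(has_condition)
    return tp / n_pos, tn / (len(has_condition) - n_pos)


def threshold_for_purpose(scores: Sequence[float], has_condition: Sequence[bool],
                          purpose: str, target: float) -> Optional[float]:
    """Pick a cut-off meeting a minimum sensitivity or specificity target."""
    candidates = sorted(set(scores))
    if purpose == "sensitivity":
        # Sensitivity can only stay the same or fall as the cut-off rises, so the
        # highest qualifying cut-off gives the best specificity meeting the target.
        ok = [t for t in candidates if sens_spec(scores, has_condition, t)[0] >= target]
        return max(ok) if ok else None
    if purpose == "specificity":
        # Specificity can only stay the same or rise as the cut-off rises, so the
        # lowest qualifying cut-off gives the best sensitivity meeting the target.
        ok = [t for t in candidates if sens_spec(scores, has_condition, t)[1] >= target]
        return min(ok) if ok else None
    raise ValueError("purpose must be 'sensitivity' or 'specificity'")


# Hypothetical scores and METAVIR stages; condition = F2 or above
scores = [4.1, 5.0, 6.2, 7.5, 8.0, 9.6, 11.0, 12.5, 15.0, 22.0]
stages = [0, 1, 1, 2, 1, 2, 3, 2, 4, 4]
fibrosis = [stage >= 2 for stage in stages]
for purpose in ("sensitivity", "specificity"):
    cut = threshold_for_purpose(scores, fibrosis, purpose, target=1.0)
    sens, spec = sens_spec(scores, fibrosis, cut)
    print(f"{purpose} first: cut-off {cut}, sensitivity {sens:.2f}, specificity {spec:.2f}")
```

On these hypothetical data, the sensitivity-first cut-off (7.5) misses no one with F2 or above but labels one in four patients without it as positive, whereas the specificity-first cut-off (9.6) produces no false positives but misses one in six patients with F2 or above.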

Test results may be influenced by factors other than the degree of fibrosis present in the liver. The included studies have shown that, for FibroScan, the optimum threshold values for fibrosis and cirrhosis are higher in patients with ALD than in patients with hepatitis C (and possibly other liver diseases); it is therefore crucial that the aetiology of suspected liver disease is securely established before the test result is interpreted.18 In addition, as noted under FibroScan: diagnostic accuracy results, FibroScan may overestimate the degree of fibrosis in patients with a secure diagnosis of ALD if either steatosis or alcoholic hepatitis is present. Current drinking status is also relevant: Mueller et al.118 have shown that, in patients with ALD, liver stiffness as measured by FibroScan decreases during alcohol detoxification independently of fibrosis stage. Thus, consideration must also be given to the optimum timing of the tests.

Finally, it should be noted that, unlike liver biopsy, the non-invasive tests assessed in this report seek only to identify the degree of liver fibrosis. They cannot provide additional useful information, for example by indicating the presence of another liver disease in addition to ALD, or by evaluating necroinflammation to assess whether the fibrosis is an ongoing process that may continue to develop or results from a past event that has stabilised or even regressed.104

Conclusions for clinical effectiveness

There is some evidence to suggest that, in patients with known or suspected ALD, the ELF test, FibroTest, and FibroScan can identify fibrosis with varying degrees of diagnostic accuracy; no evidence has been identified relating to FibroMAX, although the manufacturers recommend it in preference to FibroTest in patients with ALD. FibroTest and FibroScan appear to be more accurate in identifying cirrhosis than lesser degrees of fibrosis, whereas the ELF test appears to perform less well in specifically identifying cirrhosis than in identifying the presence of moderate-to-severe fibrosis; however, as the evidence base is very small and acceptable minimum standards were not used for the biopsy samples, this finding is not robust. Evidence for the ability of FibroTest and FibroScan to identify clinically significant PHt and oesophageal varices rests on extremely small studies and, again, is not robust.

Moreover, confidence in the study results is reduced because most of the studies display spectrum bias, and the two studies that recruited a more representative sample display verification bias; both of these biases will favour the index test. In addition, the degree of error associated with liver biopsy is such that its use as the reference standard may make it impossible to judge accurately the adequacy of the surrogate test. Finally, the overlap between the interquartile ranges around the median test values for each METAVIR stage means that, for any individual patient, whatever their non-invasive test score, there will be considerable uncertainty about their true fibrosis stage, which will substantially limit the clinical utility of the non-invasive tests.
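The practical consequence of overlapping interquartile ranges can be shown by asking which stages' interquartile ranges contain a given score. The sketch below uses Python's `statistics.quantiles` with the inclusive method; the per-stage scores are invented for illustration and are not taken from the included studies, and the helper names are hypothetical.

```python
import statistics
from typing import Sequence


def stage_iqrs(scores_by_stage: dict[int, Sequence[float]]) -> dict[int, tuple[float, float]]:
    """First and third quartiles of the test scores observed at each biopsy stage."""
    iqrs = {}
    for stage, scores in scores_by_stage.items():
        q1, _median, q3 = statistics.quantiles(scores, n=4, method="inclusive")
        iqrs[stage] = (q1, q3)
    return iqrs


def stages_consistent_with(score: float, iqrs: dict[int, tuple[float, float]]) -> list[int]:
    """Biopsy stages whose interquartile range contains the given score."""
    return [stage for stage, (q1, q3) in sorted(iqrs.items()) if q1 <= score <= q3]


# Invented scores (arbitrary units) for five patients at each METAVIR stage F0-F4
hypothetical_scores = {
    0: [3.5, 4.2, 5.0, 5.8, 7.0],
    1: [4.5, 5.5, 6.5, 7.5, 9.0],
    2: [5.0, 6.5, 8.0, 10.0, 13.0],
    3: [7.0, 9.0, 11.5, 15.0, 20.0],
    4: [10.0, 14.0, 20.0, 30.0, 45.0],
}
iqrs = stage_iqrs(hypothetical_scores)
for score in (6.0, 9.5, 14.5):
    print(f"score {score}: consistent with stages {stages_consistent_with(score, iqrs)}")
```

With these invented values, a score of 9.5 falls within the interquartile ranges of both F2 and F3, and a score of 14.5 within those of both F3 and F4, so the score alone cannot place the patient in a single stage.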

© 2012, Crown Copyright.

Included under terms of UK Non-commercial Government License.

Bookshelf ID: NBK97523