NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Maglione MA, Okunogbe A, Ewing B, et al. Diagnosis of Celiac Disease [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2016 Jan. (Comparative Effectiveness Reviews, No. 162.)

Discussion

Key Findings and Strength of Evidence

The key findings and strength of evidence are summarized in Table 22. Additional details on strength of evidence ratings are provided as Appendix F.

Table 22. Summary of findings and strength of evidence.

Findings in Relationship to What Is Already Known

Table 22 displays findings from prior SRs along with the findings from the newly identified studies that met our inclusion criteria. We identified enough studies of the accuracy of tTG IgA tests and EmA IgA tests to conduct new meta-analyses. Our findings confirm the excellent specificity of both tests, the excellent sensitivity of tTG IgA, and the good sensitivity of EmA IgA reported in prior SRs. A prior SR reported promising accuracy results for DGP tests; we found only one new study.

Several studies on whether adding an EmA or DGP test to a tTG test increases accuracy have recently been published. Results are insufficient to determine whether such increases are clinically meaningful.

No SRs have been conducted on the association between setting (academic vs community) and provider performance in CD diagnosis. We identified three retrospective studies evaluating inter-observer variability in histological diagnosis of CD between different pathologists and clinical settings. Results indicate that CD-related histological findings are underdiagnosed in community-based hospital and practice settings when compared to academic settings.

No SRs on how method of diagnosis affects patient adherence or clinical decisionmaking have been published. Very few studies have addressed these issues; we found insufficient evidence to answer Key Questions on this topic.

Applicability

Several factors affect the applicability of this review.

To increase generalizability, this report limited accuracy studies to those that included consecutive patients or a random sample. Several studies were excluded because we could not determine enrollment based on the information available.

Only one general population screening study met the criterion that all subjects, regardless of serology results, undergo biopsy. The cost of performing biopsies in all subjects and the low rate of acceptance of biopsy in seronegative, asymptomatic individuals make the conduct of such studies challenging. Thus, the evidence on accuracy of diagnostic screening in the general asymptomatic population with no risk factors for CD is categorized as low strength.

Although this report is limited to diagnostic methods currently used in the U.S., study location was not a basis for exclusion. Many studies were conducted in Europe, the Middle East, and South Asia. Due to differences in genetics and disease prevalence, the applicability of these studies to the U.S. population is uncertain.

No studies stratified accuracy results by racial or ethnic group. Few studies focused on populations of special interest.

Most studies were conducted by gastroenterologists in academic settings. This report found a significant difference in interpretation of biopsy results between academic and nonacademic physicians. The majority of accuracy studies included in this report used the Marsh classification to categorize biopsy results (Marsh III or higher is classified as celiac disease). In contrast, many community physicians base their diagnosis on a simple qualitative assessment of villous atrophy or elevation of intraepithelial lymphocytes.

Accuracy of serology assays may vary by both laboratory and manufacturer. For example, Li and colleagues (2009)100 used 150 samples from participants of known CD status to compare accuracy of tTG tests at 20 laboratories in the U.S. and Europe. Sensitivity was less than 75% at four laboratories. Using a similar research design, Rozenberg and colleagues (2011)101 found differences in performance of tTG across various manufacturers.

Finally, VCE is not a first-line diagnostic method; it is indicated for adults who refuse biopsy. A 2012 SR of six studies reported very good sensitivity and excellent specificity. However, patient characteristics may differ between those who refuse a biopsy and those who accept. For example, those with more severe symptoms are hypothesized to be more likely to accept a biopsy.

Implications for Clinical and Policy Decisionmaking

The findings of this review support those of previous systematic reviews on the accuracy of individual diagnostic tests using immunoglobulin A (IgA). All IgA tests for celiac disease have excellent specificity; DGP IgA has slightly lower specificity than tTG IgA and EmA IgA. tTG IgA testing has a high positive predictive value for most clinical populations with a modest prevalence of CD. EmA IgA has good sensitivity, DGP IgA has very good sensitivity, and tTG IgA has excellent sensitivity. DGP IgG tests have very good sensitivity and excellent specificity, even in non-IgA deficient individuals.

Unfortunately, because few studies met our inclusion criteria, we were unable to determine which tests, if any, are more accurate in patients with specific symptoms or risk factors. The presence of symptoms associated with celiac disease raises the pretest probability and, as a result, the likelihood of disease given a positive result. No studies of test accuracy in patients with trisomy 21, Turner syndrome, or Williams syndrome were identified; the few studies of patients with Type 1 diabetes included small samples and were conducted in non-Western countries. Thus, no clinical implications for testing individuals with specific risks can be stated at this time. New research has found DGP tests more accurate than tTG in young children; strength of evidence is low but could increase if findings are replicated. tTG IgA had greater sensitivity than EmA IgA in the one general (asymptomatic) population study that met our inclusion criterion that all participants undergo biopsy regardless of serology results. The quality of this general population study was high, the sample size was large (over 1,000), and it was conducted in a Western country (Sweden) with an estimated celiac disease prevalence similar to that of the U.S.
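The link between pretest probability and the meaning of a positive result can be illustrated with Bayes' theorem. The sketch below is only an illustration: the sensitivity, specificity, and prevalence values are hypothetical placeholders, not the pooled estimates from this review, and the function name is our own.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given pretest probability."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical accuracy values, chosen for illustration only.
SENS, SPEC = 0.93, 0.98

# Compare a screening-like prevalence with higher pretest probabilities.
for prevalence in (0.01, 0.05, 0.20):
    ppv, npv = predictive_values(SENS, SPEC, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value rises as prevalence rises, which is why the same test result carries different weight in symptomatic and asymptomatic populations.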

This review found insufficient evidence to determine which populations would most benefit from diagnostic algorithms that combine a tTG test with an EmA or DGP test. A combination of positive serological testing with a threshold level at or several fold above the upper limit of normal for specific celiac tests may be accurate for diagnosing celiac disease without requiring histopathology specimens; however, the currently available evidence on comparative accuracy of algorithms is inconclusive, due to the wide range of results, heterogeneity of populations studied, and the lack of clinically significant increases in accuracy compared to individual tests. Future studies aimed at the diagnostic accuracy of multiple-test strategies would strengthen the evidence for this approach.

Finally, regarding biopsy, there is high strength evidence that multiple duodenal specimens should be taken from the duodenal bulb and the distal duodenum for optimal diagnostic yield in both the adult and pediatric population. There is moderate strength evidence that celiac disease is underdiagnosed by pathologists in community settings compared to academic settings; continued education on diagnostic protocols may be warranted for community physicians.

Limitations of the Comparative Effectiveness Review Process

At the request of AHRQ, we conducted an assessment of the evidence on comparative effectiveness of various diagnostic methods currently used in the U.S. to diagnose celiac disease. We conducted an extensive literature search; however, our consideration of unpublished literature was limited. Although a Scientific Resource Center (SRC) funded by AHRQ requested information from test manufacturers and major laboratories, no information was provided; we did not search FDA databases for such information ourselves.

In addition, this project was funded as a “small” systematic review and budgeted to include abstraction and analysis of fewer than 50 studies. Thus, the project protocol was to assess evidence from recent applicable systematic reviews and to abstract studies published thereafter. Data were not abstracted from individual studies included in prior SRs; we assumed the data presented in the SRs were abstracted accurately.

Limitations of the Evidence Base

The literature that addresses the diagnosis of celiac disease has numerous limitations that make it difficult to draw firm conclusions. These limitations can be divided into three categories: study volume, design, and reporting quality.

Volume

We identified many studies on the accuracy of tTG and EmA screens in symptomatic adults and children, including several recent systematic reviews. There were fewer studies of DGP antibody tests, as this diagnostic method is relatively new. There were also few studies assessing the accuracy of using algorithms such as those suggested by the most recent NICE and ESPGHAN guidelines.

No studies stratified accuracy results by race, ethnicity, or SES. Several studies in non-Caucasian populations were identified; however, these were not U.S. studies, and results may not be generalizable to populations in the U.S. We identified no studies of diagnostic accuracy in persons with Turner syndrome or trisomy 21. Literature was sparse on other populations of interest; several studies of accuracy in patients with Type 1 diabetes, iron deficiency anemia, or IgA deficiency were identified.

Almost no studies examined the impact of diagnostic method on decisionmaking or clinical or patient-centered outcomes. Although the impact of living with undiagnosed celiac disease is well documented,102,103 very few studies report outcomes of individuals who initially receive false positive or false negative results.

Design

Diagnostic accuracy is generally assessed through case-control and cohort studies; we included both designs. In studies employing a case-control design, a group of patients with known disease and a different group known not to have the disease undergo both the “index” test and the reference standard. Researchers are blinded to initial disease status. In a cohort design, a group of patients suspected of having the disease (but without a confirmed diagnosis) undergo both diagnostic methods. In a cohort design, the group is defined based on symptoms, while in a case-control design, the group is based on disease status. The latter design is more subject to bias.

We used the QUADAS-2 instrument to assess the quality of studies of diagnostic accuracy. The ratings for each QUADAS-2 item for each study are presented in the Evidence Tables (Appendix C); case-control studies are identified. Strengths and weaknesses of individual studies are discussed in the Results section of this report and taken into consideration in rating the strength of the evidence.

To lessen bias, the decision to perform the reference standard should ideally be independent of the results of the test being studied. Thus, we included only studies where all patients underwent both tests. Many studies were identified where patients first underwent serological testing and only those who tested positive underwent biopsy; although these studies provide data on false positives, they were excluded. In addition, to increase generalizability, we included only studies that enrolled a random or consecutive sample.
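The reason for excluding studies that biopsied only seropositive patients can be shown with a 2x2 table. The sketch below assumes hypothetical counts and a helper name of our own choosing. When only index-test positives receive the reference standard, the false negative and true negative cells are never observed, so positive predictive value can be computed but sensitivity and specificity cannot.

```python
def estimable_measures(tp, fp, fn=None, tn=None):
    """Compute accuracy measures from whichever 2x2 cells were observed.

    fn and tn are None when index-test negatives never received the
    reference standard (biopsy).
    """
    measures = {"ppv": tp / (tp + fp)}
    measures["sensitivity"] = tp / (tp + fn) if fn is not None else None
    measures["specificity"] = tn / (tn + fp) if tn is not None else None
    return measures


# Hypothetical counts: every patient received serology and biopsy.
full = estimable_measures(tp=90, fp=20, fn=10, tn=880)

# Same hypothetical cohort, but only seropositive patients were biopsied.
partial = estimable_measures(tp=90, fp=20)

print(full)
print(partial)
```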

The use of biopsy results as the reference standard also presents concerns. As discussed in the results for Key Question 2, inter-rater reliability of interpretation is higher at academic centers than community settings. Most of the published accuracy studies included in this review took place in an academic setting.

Regarding comparative accuracy, conclusions are based primarily on indirect evidence, i.e., pooled results on accuracy of individual tests rather than head-to-head studies comparing accuracy of different tests in the same samples. However, strength of evidence is high, given the large numbers of studies, the consistency of results, and the precision of the confidence intervals.

Finally, most of the prior SRs described in this report were of moderate quality. Strength of evidence (SOE) was not rated by the authors; we took the strengths and weaknesses of these SRs (as we assessed using AMSTAR) into consideration when we graded the SOE of the body of evidence. An additional item we considered regarding prior SRs was the method of pooling sensitivity and specificity; pooling both jointly in a bivariate model is recommended.

Reporting Quality

Failure to report important study design details in publications is a further limitation. Some accuracy studies were vague regarding blinding of assessors and the time lapse between implementation of the index test and reference standard. Data on these items were abstracted as part of QUADAS-2 and are displayed in the Evidence Tables. Such weaknesses are discussed in the Results section and were taken into consideration in rating the strength of evidence.

Research Gaps

Although there is high strength of evidence for the accuracy of various serologic tests for celiac disease in symptomatic individuals, strength of evidence on the accuracy of algorithms such as those recommended by organizations like ESPGHAN is insufficient due to the small number of studies and inconsistent results. Appendix F contains details on the test combinations, populations, and the strength of evidence domains for each algorithm studied. Further studies should be conducted.

There is also insufficient evidence to recommend specific tests for particular at-risk populations. Patient-level factors that have been hypothesized to affect test accuracy include race and ethnicity, but no studies stratified results by these characteristics.

Because biopsy is inherently invasive, the vast majority of studies of serologic test accuracy using biopsy as the reference standard have been conducted in patients presenting for testing due to symptoms. The most common symptoms are gastrointestinal (diarrhea, constipation, pain, etc.) as well as malnutrition in children. High accuracy was found in the only general population screening study; however, despite the high scientific quality of this study, the strength of evidence for accuracy in the asymptomatic general population is low because the study has never been replicated. This does not mean the tests are inaccurate in asymptomatic individuals; lack of evidence does not equal evidence of inaccuracy.

No studies addressing the key subquestion “What impact does the method of initial diagnosis have on how a physician follows up with a patient?” were identified. Retrospective analyses of existing databases may shed light in this area.

Finally, studies may be needed to investigate the long-term impact of misdiagnosis. False positives and false negatives may be important “harms” due to a) the major lifestyle changes that follow a positive diagnosis and b) the potential health harms (malabsorption, intestinal damage) from undiagnosed CD.

Conclusions

New evidence on accuracy of tests used to diagnose celiac disease supports the high sensitivity of IgA tTG tests and high specificity of both IgA tTG and IgA EmA tests reported in prior SRs. Regarding comparative accuracy, IgA EmA tests have lower sensitivity but equal specificity to IgA tTG tests. IgA DGP and IgG DGP tests are not as sensitive as IgA tTG tests in non-IgA-deficient adults. These conclusions are based primarily on indirect evidence; however, strength of evidence is high, given the large number of studies, the consistency of results, and the precision of the confidence intervals.

High strength of evidence of accuracy, particularly in children, was found for DGP tests in recent SRs. Algorithms combining tTG with either EmA or DGP tests appear to be accurate in both children and adults. Adding an EmA test to a tTG test resulted in increased specificity, with either no change or a slight decrease in sensitivity. In contrast, adding a DGP test to a tTG test resulted in increased sensitivity but decreased specificity. However, strength of evidence is insufficient given the low number of studies relative to single tests, heterogeneity of populations, and wide range of results. The increase in accuracy over individual tests is not consistently clinically significant. Additional studies of algorithms are needed.
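One way opposite shifts in sensitivity and specificity can arise is through the rule used to combine two tests. The sketch below is not the method used in the included studies. It assumes the two tests err independently given disease status, which is a strong simplification because serologic tests are usually correlated, and it uses hypothetical accuracy values and function names of our own.

```python
def require_both_positive(test_a, test_b):
    """Combination is positive only if both tests are positive."""
    (sens_a, spec_a), (sens_b, spec_b) = test_a, test_b
    sensitivity = sens_a * sens_b
    specificity = 1 - (1 - spec_a) * (1 - spec_b)
    return sensitivity, specificity


def require_either_positive(test_a, test_b):
    """Combination is positive if at least one test is positive."""
    (sens_a, spec_a), (sens_b, spec_b) = test_a, test_b
    sensitivity = 1 - (1 - sens_a) * (1 - sens_b)
    specificity = spec_a * spec_b
    return sensitivity, specificity


# Hypothetical (sensitivity, specificity) pairs, for illustration only.
TEST_A = (0.93, 0.96)
TEST_B = (0.85, 0.98)

print("both positive:  ", require_both_positive(TEST_A, TEST_B))
print("either positive:", require_either_positive(TEST_A, TEST_B))
```

Requiring both tests to be positive trades sensitivity for specificity, and accepting either positive does the reverse. When the tests are correlated, the changes in either direction are smaller than this independence model suggests.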

Notably, current ESPGHAN guidelines state that if a patient has a tTG result greater than 10 times the upper limit of normal, the patient should then undergo an EmA test and HLA typing; if the patient tests positive and then responds to a gluten-exclusion diet, a diagnosis of celiac disease can be made without biopsy. These guidelines have not been adopted by societies in the U.S. Evidence seems to support that a multiple-testing strategy without biopsy is accurate; however, additional studies are needed to confirm the test threshold levels that would optimize accuracy for general and specific populations.
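The sequence of conditions summarized in the preceding paragraph can be written as a short decision function. This is a sketch of the logic as paraphrased in this report, not of the ESPGHAN guideline text itself. The function and parameter names are hypothetical, and reading "tests positive" as requiring both a positive EmA result and a positive HLA result is our assumption.

```python
def meets_no_biopsy_criteria(ttg, ttg_upper_limit, ema_positive,
                             hla_positive, responds_to_diet,
                             fold_threshold=10.0):
    """Return True when every step of the summarized pathway is satisfied."""
    if ttg_upper_limit <= 0:
        raise ValueError("upper limit of normal must be positive")
    # Step 1: tTG result above the multiple of the upper limit of normal.
    high_titer = ttg > fold_threshold * ttg_upper_limit
    # Step 2 (assumption): both EmA and HLA typing must be positive.
    confirmed = ema_positive and hla_positive
    # Step 3: response to a gluten-exclusion diet.
    return bool(high_titer and confirmed and responds_to_diet)


# Hypothetical patient: tTG of 250 units with an upper limit of normal of 20.
print(meets_no_biopsy_criteria(250, 20, True, True, True))
```

Exposing the fold threshold as a parameter reflects the point made above that the threshold levels that optimize accuracy remain to be confirmed.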

VCE is a safe and fairly accurate means of diagnosing celiac disease in adults who wish to avoid biopsy; risk of retaining the capsule is approximately 4.6%. However, our pooled results reveal that serological tests have higher sensitivity and specificity. No data are available on how VCE accuracy varies by population characteristics or setting. Endoscopy with biopsy has a very low risk of adverse events; accuracy appears to be greater in academic settings.

Importantly, few applicable studies on the sequelae of false positive or false negative diagnoses were identified. Long-term follow-up of patients, regardless of diagnostic outcomes, should be encouraged.
