Chapter 104: Celiac Disease

Prepared for:

Agency for Healthcare Research and Quality

U.S. Department of Health and Human Services

540 Gaither Road

Rockville, MD 20850

www.ahrq.gov

Contract No. 290-02-0021

Prepared by:

University of Ottawa Evidence-based Practice Center, University of Ottawa, Ottawa, Canada

Co-directors: David Moher, PhD and Howard M. Schachter, PhD

Investigators

Alaa Rostom,* MD, MSc, FRCPC

Catherine Dubé,* MD, MSc, FRCPC

Ann Cranney,* MD, MSc, FRCPC

Navaaz Saloojee,* MD, FRCPC

Richmond Sy,* MD, FRCPC

Chantelle Garritty, BA, DCS

Margaret Sampson, MLIS

Li Zhang, MLIS

Fatemeh Yazdi, MSc

Vasil Mamaladze, MD, PhD

Irene Pan, MSc

Joanne McNeil,* RN

David Moher, PhD

David Mack,* MD, FRCPC

Dilip Patel,* MD, FRCPC

Chalmers Research Group; *Gastrointestinal Clinical Research Unit; Division of Rheumatology

AHRQ Publication No. 04-E029-2

September 2004

ISBN: 1-58763-159-8

ISSN: 1530-4396

This document is in the public domain and may be used and reprinted without permission except those copyrighted materials noted for which further reproduction is prohibited without the specific permission of copyright holders.

This report may be used, in whole or in part, as the basis for development of clinical practice guidelines and other quality enhancement tools, or a basis for reimbursement and coverage policies. AHRQ or U.S. Department of Health and Human Services endorsement of such derivative products may not be stated or implied.

AHRQ is the lead Federal agency charged with supporting research designed to improve the quality of health care, reduce its cost, address patient safety and medical errors, and broaden access to essential services. AHRQ sponsors and conducts research that provides evidence-based information on health care outcomes; quality; and cost, use, and access. The information helps health care decisionmakers—patients and clinicians, health system leaders, and policymakers—make more informed decisions and improve the quality of health care services.

Suggested Citation:

Rostom A, Dubé C, Cranney A, Saloojee N, Sy R, Garritty C, Sampson M, Zhang L, Yazdi F, Mamaladze V, Pan I, McNeil J, Moher D, Mack D, Patel D. Celiac Disease. Evidence Report/Technology Assessment No. 104. (Prepared by the University of Ottawa Evidence-based Practice Center, under Contract No. 290-02-0021.) AHRQ Publication No. 04-E029-2. Rockville, MD: Agency for Healthcare Research and Quality. September 2004.

Preface

The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. This report on Celiac Disease was requested and funded by the Office of Medical Applications of Research, National Institutes of Health (NIH), for the Consensus Development Conference on Celiac Disease, as well as by the National Institute of Diabetes and Digestive and Kidney Diseases, NIH. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.

To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.

AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the healthcare system as a whole by providing important information to help improve health care quality.

We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to epc@ahrq.gov.

Carolyn M. Clancy, M.D.

Director

Agency for Healthcare Research and Quality

Barnett S. Kramer, M.D., M.P.H.

Director

Office of Medical Applications of Research, NIH

Allen M. Spiegel, M.D.

Director

National Institute of Diabetes and Digestive and Kidney Diseases, NIH

Jean Slutsky, P.A., M.S.P.H.

Director, Center for Outcomes and Evidence

Agency for Healthcare Research and Quality

Kenneth S. Fink, M.D., M.G.A., M.P.H.

Director, EPC Program

Agency for Healthcare Research and Quality

Marian James, Ph.D.

EPC Program Task Order Officer

Agency for Healthcare Research and Quality

The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.

Acknowledgments

The authors would like to thank several individuals for their support of the present project: Keith O'Rourke, who helped with the statistical analysis; Karen Patrias, who helped with conducting the literature search; Gabriela Lewin, who assisted with the quality assessment; and Christine Murray and Isabella Steffensen, who assisted in the editing of the report and the generation of evidence tables.

Author Contribution

Dr. Alaa Rostom was the lead investigator. He was involved in all aspects of the study design, management, planning, analysis and write-up, including article screening, data extraction, quality assessment, statistical analysis, and report write-up. Drs. Catherine Dubé and Ann Cranney were the second investigators. Dr. Catherine Dubé was involved in all aspects of study design, planning, and organization, including task management, article screening, data extraction, and quality assessment. She was the lead writer of Celiac 2 and 3. Dr. Ann Cranney was involved in all aspects of study design and planning, including article screening, data extraction, and quality assessment. She was the lead writer of Celiac 4, and oversaw the screening and data extraction for Celiac 3, 4, and 5. Dr. Navaaz Saloojee was involved in study planning, article screening, data extraction, and quality assessment, and was the lead writer of Celiac 5. Dr. Richmond Sy was involved in study planning, article screening, and data extraction, and contributed to the writing of Celiac 4. Drs. David Mack and Dilip Patel were involved in study planning and article screening, in addition to being content experts in pediatric and adult celiac disease, respectively. They also reviewed and advised on the report write-up. Joanne McNeil was involved in article screening and data extraction for Celiac 1 (serology) and Celiac 2 (prevalence).

Dr. David Moher was involved in all aspects of study design, management, planning, analysis, and write-up. He was the methodological content expert and reviewed and advised on all report conduct and documents. Chantelle Garritty was involved in all aspects of project planning and management, including liaison with all key partners. She oversaw the screening progress and document retrieval, and assisted in report management, review and write-up. Margaret Sampson was the lead information specialist and was involved in all aspects of the search strategy/key word design and refinement, in association with the information specialists at the NLM. She was involved in all aspects of article database management, including article retrieval, set-up of the online computerized SRS article screening and extraction system, and the development and write-up of the QUOROM Flow. Li Zhang was the information specialist involved in all aspects of SRS system management, article retrieval, and implementation of the SRS system. Dr. Vasil Mamaladze and Fatemeh Yazdi performed the data extraction of Celiac 1-serology and Celiac 2-prevalence. Irene Pan performed data extraction and quality assessment of Celiac 2-prevalence.

Structured Abstract

Context. Celiac disease (CD) is a disorder of small bowel malabsorption. It is characterized by mucosal inflammation, villous atrophy and crypt hyperplasia that occur upon exposure to gluten, and clinical and histological improvement with withdrawal of gluten from the diet. The classical presentation of CD has now been shown to be less common than silent or atypical presentation, in which patients do not have intestinal symptoms. Untreated CD is associated with multiple important short- and long-term complications including nutritional derangements, anemia, reduced bone density, as well as intestinal lymphoma. In the vast majority of patients, CD is effectively treated with dietary modifications that eliminate gluten. Mounting evidence suggests that CD is actually considerably more common than previously believed and, therefore, this disorder warrants consideration for screening of at-risk patients, as well as possibly the general population.

Objectives. To conduct a comprehensive systematic review on five areas of CD: (1) sensitivity and specificity of serological tests; (2) prevalence and incidence of CD; (3) CD-associated lymphoma; (4) consequences of testing for CD; and (5) interventions for the promotion and monitoring of adherence to a gluten-free diet (GFD).

Data Sources. Staff of the National Library of Medicine performed a series of searches in support of the literature review of CD. Searches were run in the MEDLINE® (1966 to Oct 2003) and EMBASE (1974 to Dec 2003) databases for each of the five objectives and their respective sub-objectives separately.

Study Selection. Study selection for each objective was performed using three levels of screening with predetermined, increasingly strict criteria to ensure that all relevant articles were captured. Following a calibration exercise, two reviewers independently screened all studies using a web-based system that allowed automatic identification of review disagreements. These disagreements were resolved by consensus.

Data Extraction. For each CD objective, a detailed and standardized data abstraction form was developed. For each objective, data abstraction was conducted by one reviewer and verified by another. The extracted data was further verified by one of the principal investigators. Quality assessments were performed using specific instruments for each of the included study types.

Data Synthesis. The data obtained from this review fell into several broad categories, which correspond in large part to the individual study objectives. Data for the sensitivity and specificity of each serological marker was considered separately, and studies were further divided according to the age group of the study population. Attempts were made to identify, explain, and minimize clinical and statistical heterogeneity in the included studies. A Pearson's Chi Square with n-1 degrees of freedom, where n represents the number of included studies in an analysis, was calculated to assess statistical heterogeneity. Pooled estimates were only calculated if clinically and statistically appropriate. In situations where pooling was not performed, a qualitative systematic review was conducted.
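The heterogeneity check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the report does not give the exact table layout, so the sketch assumes a studies-by-outcome table (true positives and false negatives per study, i.e., heterogeneity of sensitivity), for which a Pearson chi-square has n-1 degrees of freedom. The function name and the counts are hypothetical and are not taken from the included studies.

```python
def pearson_heterogeneity(true_pos, false_neg):
    """Pearson chi-square for homogeneity of sensitivity across studies.

    Assumed layout (not specified in the report): one row per study,
    two columns (true positives, false negatives).
    Returns (chi_square_statistic, degrees_of_freedom).
    """
    if len(true_pos) != len(false_neg) or len(true_pos) < 2:
        raise ValueError("need matching counts from at least two studies")

    row_totals = [tp + fn for tp, fn in zip(true_pos, false_neg)]
    col_totals = (sum(true_pos), sum(false_neg))
    grand_total = sum(row_totals)
    if min(col_totals) == 0 or min(row_totals) == 0:
        raise ValueError("every row and column needs a non-zero total")

    statistic = 0.0
    for tp, fn, n_row in zip(true_pos, false_neg, row_totals):
        for observed, n_col in zip((tp, fn), col_totals):
            # Expected count if all studies shared one common sensitivity.
            expected = n_row * n_col / grand_total
            statistic += (observed - expected) ** 2 / expected

    # n studies give n-1 degrees of freedom, as stated in the text.
    return statistic, len(true_pos) - 1


if __name__ == "__main__":
    # Hypothetical counts for three studies (illustration only).
    stat, df = pearson_heterogeneity([90, 45, 60], [10, 5, 20])
    print(f"chi-square = {stat:.2f} on {df} degrees of freedom")
```

A large statistic relative to the chi-square distribution with n-1 degrees of freedom would argue against pooling; the p-value lookup is left out to keep the sketch free of external libraries.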

To produce clinically useful pooled statistics, a weighted mean of the overall sensitivity and specificity from the included studies was calculated, along with 95% confidence intervals (CIs). The pooled estimates for the sensitivity and specificity were compared with a summary receiver operating characteristic (ROC) curve, calculated for the same group of studies as a second check of the estimates.
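A minimal sketch of such a pooled estimate follows. The report does not state the weighting scheme or the interval method, so this example assumes weighting by study size (which is equivalent to dividing the summed events by the summed totals) and a normal-approximation 95% CI. The function name and the counts are hypothetical.

```python
import math


def pooled_proportion(events, totals, z=1.96):
    """Size-weighted mean proportion with a normal-approximation CI.

    For sensitivity: events = true positives, totals = biopsy-positive subjects.
    For specificity: events = true negatives, totals = biopsy-negative subjects.
    The weighting and the CI method are assumptions, not the report's stated method.
    Returns (estimate, lower_bound, upper_bound).
    """
    if len(events) != len(totals) or not events:
        raise ValueError("events and totals must be non-empty and the same length")
    n = sum(totals)
    if n <= 0:
        raise ValueError("total sample size must be positive")

    # Weighting each study's proportion by its size reduces to this ratio.
    estimate = sum(events) / n
    half_width = z * math.sqrt(estimate * (1.0 - estimate) / n)
    return estimate, max(0.0, estimate - half_width), min(1.0, estimate + half_width)


if __name__ == "__main__":
    # Hypothetical true-positive counts and diseased-subject totals.
    est, low, high = pooled_proportion([90, 45, 60], [100, 50, 80])
    print(f"pooled sensitivity = {est:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The normal approximation behaves poorly when the pooled proportion is close to 0 or 1, which is common for specificity; an exact or score-based interval may be preferable in that setting.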

Results/Conclusions. This report has provided a systematic review of five broad areas (and corresponding sub-areas) of CD. Perhaps one of the most important findings of this report is the significance of how one chooses to define CD in the era of serological testing, and how this apparently clear-cut task has profound implications for all the results presented in this report. Specifically, can CD be diagnosed solely on the basis of serology? Is some degree of villous atrophy necessary for a diagnosis of CD? These questions have important implications downstream of the diagnosis as well. For example, do CD patients without symptoms or villous atrophy have the same risk of complications as those with villous atrophy? Is serological improvement on a GFD sufficient to reduce CD complications, or must there be documented histological improvement, and what degree of histological improvement is necessary?

The results of the Celiac 1 objective suggest that in the era of EMA and tTG antibody testing, AGA antibody testing in both children and adults has a limited role. The sensitivity and specificity of EMA and tTG are quite high (over 95% for sensitivity, and close to 100% for specificity), as are their positive and negative predictive values; however, one has to be aware that the reported diagnostic parameters are taken from studies in which the prevalence of CD was, for the most part, much higher than that seen in usual clinical practice. When these tests are used to screen the general population, their positive predictive values will certainly be lower than those reported. The bulk of the evidence on the diagnostic characteristics of these tests was derived from studies that defined CD as having at least some degree of villous atrophy (VA).
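The dependence of predictive values on prevalence follows directly from Bayes' rule, and a short Python sketch makes the size of the effect visible. The sensitivity (0.95) and specificity (0.99) below are illustrative values chosen to sit within the ranges quoted above, and the prevalence levels are hypothetical scenarios: 50% and 10% stand in for enriched study populations, and 1% stands in for general-population screening. The function name is an assumption of this example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values from Bayes' rule.

    All three arguments are proportions between 0 and 1.
    Returns (ppv, npv).
    """
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    # Expected fractions of the tested population in each cell of the 2x2 table.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)

    ppv = true_pos / (true_pos + false_pos) if true_pos + false_pos > 0 else float("nan")
    npv = true_neg / (true_neg + false_neg) if true_neg + false_neg > 0 else float("nan")
    return ppv, npv


if __name__ == "__main__":
    # Illustrative test characteristics; prevalence values are hypothetical.
    for prevalence in (0.50, 0.10, 0.01):
        ppv, npv = predictive_values(0.95, 0.99, prevalence)
        print(f"prevalence {prevalence:.2f}: PPV {ppv:.3f}, NPV {npv:.3f}")
```

Under these assumed values, the positive predictive value falls from about 99% at 50% prevalence to just under 50% at 1% prevalence, even though the test itself is unchanged.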

HLA DQ2/DQ8 testing appears to be a useful adjunct in the diagnosis of CD. The test has high sensitivity (in excess of 90%–95%); however, because approximately 30% of the general population, and an even higher proportion of “high-risk” subjects (e.g., diabetics and family members), also carry these markers, the specificity of this test is not ideal. The greatest diagnostic utility of this test appears to be its negative predictive value.
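The arithmetic behind this point can be sketched as follows. The example assumes a sensitivity of 0.95, a carrier frequency of 30% in the tested population, and a CD prevalence of 1%; these are round figures in line with the text, not results extracted from the included studies, and the function name is hypothetical.

```python
def hla_predictive_values(sensitivity, carrier_fraction, prevalence):
    """Specificity, PPV and NPV of HLA DQ2/DQ8 typing from population figures.

    sensitivity      : fraction of CD patients carrying DQ2 and/or DQ8
    carrier_fraction : fraction of the whole tested population carrying them
    prevalence       : fraction of the tested population with CD
    Returns (specificity, ppv, npv).
    """
    if not (0.0 < carrier_fraction <= 1.0 and 0.0 < prevalence < 1.0):
        raise ValueError("carrier_fraction and prevalence must be valid proportions")
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be between 0 and 1")

    true_pos = sensitivity * prevalence            # carriers with CD
    false_neg = (1.0 - sensitivity) * prevalence   # non-carriers with CD
    false_pos = carrier_fraction - true_pos        # carriers without CD
    if false_pos < 0.0:
        raise ValueError("carrier_fraction is too small for the given inputs")
    true_neg = (1.0 - prevalence) - false_pos      # non-carriers without CD

    specificity = true_neg / (1.0 - prevalence)
    ppv = true_pos / carrier_fraction
    npv = true_neg / (true_neg + false_neg)
    return specificity, ppv, npv


if __name__ == "__main__":
    # Assumed round figures: 95% sensitivity, 30% carriers, 1% prevalence.
    spec, ppv, npv = hla_predictive_values(0.95, 0.30, 0.01)
    print(f"specificity {spec:.3f}, PPV {ppv:.3f}, NPV {npv:.4f}")
```

Under these assumptions a positive result is only weakly informative, whereas a negative result makes CD very unlikely, which is consistent with the conclusion that the main value of the test lies in its negative predictive value.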

Biopsy itself, when used with a strict cut-off requiring villous atrophy, appears to have high specificity, but poor sensitivity. Using a lower grade cut-off clearly improves sensitivity, but because of the wide differential of causes of histological lesions similar to Marsh I to IIIa, the specificity suffers. The use of histomorphometric measures such as quantification of gamma delta positive intraepithelial lymphocytes (γδ+ IELs) is likely to allow for the use of lower grade cut-offs, while maintaining reasonable specificity. Ultimately, a trial utilizing multiple diagnostic tests in an attempt to capture as many CD patients in a clinically-relevant population as possible, along with a time dimension such as a response to a GFD or gluten challenge, is required to fully assess the diagnostic characteristics of biopsy alone. This type of study would be able to characterize the false-positive and false-negative rates, provided that all studied patients are followed forward in time.

The included prevalence studies differed in important respects, including execution, the tests used to assess prevalence, and patient sampling. Thus, results have to be interpreted in the light of some of the limitations that have been identified regarding the diagnostic performance of the tests for CD. Nonetheless, the results of this report suggest that CD is a very common disorder with a prevalence in the general population that is likely close to 1:100 (1%). Several high-risk groups with a prevalence of CD greater than that of the general population have been identified and include: those suspected of having CD; family members of CD patients; type I diabetics; and those with iron-deficiency anemia (IDA) or low bone mineral density (BMD). Additionally, the review identified many other high-risk groups, including those with Down Syndrome, short stature, and infertility, to name a few. Their inclusion was, however, beyond the scope of this report.

The results of this report confirm that, apart from a few limitations, there is a strong association between CD and GI lymphoma. The report identified standardized incidence ratios (SIR) for lymphoma that ranged from 4 to 40, and standardized mortality ratios (SMR) that ranged from 11 to 70. A diagnostic delay, in particular a diagnosis of CD in adulthood as opposed to childhood, is associated with poorer outcomes. Fortunately, several studies suggest that adherence to a GFD reduces the risk of lymphoma in CD patients.

The consequences of testing for CD in at-risk and symptomatic patients appear to be more straightforward, since these patients appear to be more compliant with a GFD and would be expected to benefit from this intervention. The data are less clear for asymptomatic screen-identified patients, particularly those who have truly silent CD and/or do not have fully developed villous atrophy. On the one hand, the outcome of such patients has not been extensively studied, and on the other hand, compliance with a GFD appears problematic, particularly for those diagnosed in adulthood.

Finally, no specific interventions have been identified that promote adherence to a GFD, but education of patients and family members about CD and about the intricacies of a GFD, and participation in local celiac societies, have been shown to improve compliance. Although somewhat controversial, biopsy monitoring of adherence to a GFD appears to be important, since improvement in histological grade has been associated with improved BMD, IDA, and nutritional status. The serological markers appear to be adequate for detecting gross dietary indiscretion, and respond to a gluten challenge, but appear to have poor sensitivity for detecting lesser degrees of dietary indiscretion, and to correlate inadequately with histological improvement, at least in the short term. It should be noted, however, that we could not identify a controlled study that objectively determined the level of histological improvement that would be associated with improved outcomes, and this is an area for future study. Nonetheless, based on this report it would appear that follow-up biopsy, at least 1 year after starting a GFD in adults to document improvement of the histological grade, would be valuable.

Chapter 1. Introduction

Overview

Celiac disease (CD) is a disorder of small bowel malabsorption. It is characterized by mucosal inflammation, villous atrophy and crypt hyperplasia, which occur upon exposure to gluten, and clinical and histological improvement with withdrawal of gluten from the diet.1–4 CD (also referred to as celiac sprue, gluten-sensitive enteropathy, and non-tropical sprue, in addition to a host of other names) is thought to result from the activation of both a cell-mediated (T-cell) and humoral (B-cell) immune response upon exposure to the glutens (prolamins and glutenins) of wheat, barley, rye, and oats, in a genetically susceptible person.5, 6 Genetic susceptibility is suggested by a high concordance among monozygotic twins of close to 70 percent,7 and an association with certain class II human leukocyte antigens (HLA).8, 9 HLA DQ2 is found in up to 95 percent of CD patients, while most of the remaining patients have HLA DQ8.8–10 However, there is only a 30 percent HLA concordance among siblings, suggesting that other genetic factors are also at play.11 More recent evidence suggests that the presence of autoantibodies to a connective tissue element surrounding smooth muscle called endomysium is highly specific for CD. The target of this autoantibody is now known to be an enzyme called tissue transglutaminase (tTG). This enzyme may play a prominent role in the pathogenesis of CD by modifying gliadin, resulting in a greater proliferative response of gliadin-specific T-cells, which contributes to mucosal inflammation and further B-cell activation.5, 6, 12, 13

CD appears to represent a spectrum of clinical features and presentations. Although “classical” CD (i.e., fully developed gluten-induced villous atrophy and classical features of intestinal malabsorption) is most commonly described, it appears that most patients have atypical CD (i.e., fully developed gluten-induced villous atrophy found in the setting of another presentation such as iron deficiency, osteoporosis, short stature, or infertility) or silent CD (i.e., fully developed gluten-induced villous atrophy discovered in an asymptomatic patient by serologic screening or perhaps an endoscopy for another reason). Other authors describe a latent form of CD that is characterized by a previous diagnosis that responded to a gluten-free diet (GFD) and retained a normal mucosal histology upon later introduction of gluten. Latent CD can also represent patients with currently normal intestinal mucosa who will subsequently develop gluten-sensitive enteropathy.13, 14

The true prevalence of CD is difficult to estimate because of the variable presentation of the disease, particularly since many patients can have few or no symptoms. With this limitation in mind, the prevalence of the disease is highest in Celtic populations, where estimates of 1:300 to 1:122 have been described. The prevalence of CD in North America has been estimated to be 1:3000, but a recent American study found the prevalence among the general not-at-risk population to be 1:105, while the prevalence in at-risk groups such as first-degree relatives of CD patients was 1:22, suggesting that CD is greatly under-diagnosed. CD can affect persons of many ethnic backgrounds, but appears to rarely affect persons of purely Chinese, Japanese, or Afro-Caribbean descent.13

The diagnosis of CD in adults is classically made on the basis of clinical suspicion (which includes recognizing atypical presentations such as isolated iron deficiency, combined iron and folate deficiency, and osteoporosis), a compatible duodenal biopsy taken while the patient is on a gluten-containing diet, and clinical and histological improvement following commencement of a GFD.2, 4 However, several serologic markers have become available which have altered the classic diagnostic pathway. The sensitivity of IgA anti-gliadin antibodies (AGA) is reported to range from 70 to 85 percent, whereas the specificity ranges from 70 to 90 percent. IgA anti-endomysial (EMA) and anti-tissue transglutaminase (tTG) antibodies have sensitivities in excess of 90 percent and specificities of over 95 percent.14 Significant variability seems to exist in the reported values among the different studies, and these IgA-based tests can be negative in IgA-deficient patients, who account for about 3 percent of CD cases.

The sensitivity and specificity of the anti-EMA and anti-tTG antibodies, along with the perceived underdiagnosis of CD, have led to suggestions of using these tests for population screening. Aside from the recognized influence of CD prevalence on the predictive value of a serologic test result, little consensus exists regarding the value of population screening. Furthermore, specific questions regarding clinically important outcomes resulting from screening remain unanswered. In particular, few data are available on adherence to a GFD in asymptomatic CD patients detected by screening.

The major complications of CD include intestinal and extraintestinal malignancies, ulcerative jejunoileitis, and collagenous sprue. Unlike most gastrointestinal (GI) lymphomas, which are typically of B-cell origin, lymphomas associated with CD appear to be most commonly of T-cell origin. Unfortunately, the prognoses for patients with CD-associated T-cell lymphomas, ulcerative jejunoileitis, and collagenous sprue appear grim. It is widely believed that strict adherence to a GFD reduces the risk of these complications. It is suggested that by 5 years of dietary adherence the risk of lymphoma in CD patients approaches that of the general population.14

The challenge of CD remains to determine which patient populations should be screened, the best means of screening, and whether early detection of patients with CD leads to improved patient outcomes. For patient outcomes to improve as a result of screening, the degree to which “positively” screened individuals, particularly those who were asymptomatic, adhere to the stringent GFD, needs to be determined.

Definition of CD

As briefly described in the Overview, CD can take on a variety of forms. Paramount to the conduct of this review and subsequent interpretation of the literature is the identification of clear definitions of the many faces of CD. Implicit in a definition of CD (with a few exceptions that are detailed below) is the concept that the clinical and the small intestinal pathological features are present in patients who consume a gluten-containing diet, normalize with the introduction of a GFD, and recur with the re-introduction of dietary gluten.2, 4 The historical tendency to rely on biopsy features as part of the definition of CD creates difficulties (as discussed below) in accurately addressing the sensitivity and specificity of biopsy for the diagnosis of CD, and in assessing the sensitivity and specificity of the serologic markers, if different studies use different criteria to define CD. For the purpose of this review, the following definitions have been used.

General Definitions

  1. Classical CD. The most commonly described form. It describes patients with the classical features of intestinal malabsorption who have fully developed gluten-induced villous atrophy and the other classic histological features. These patients present because of GI symptoms, and are identified as CD sufferers through the investigation of these symptoms. This group can also be said to have symptomatic CD.

  2. Atypical CD. Appears to be one of the most common forms. These patients generally have few or no GI symptoms, but seek medical attention for another reason such as iron deficiency, osteoporosis, short stature, or infertility. These patients generally have fully developed gluten-induced villous atrophy. Because these patients are “asymptomatic” from the GI perspective, if their atypical CD feature is not recognized, they may be difficult or impossible to distinguish from “true” silent (asymptomatic) CD patients.

  3. Silent CD. A very common form of CD. Refers to patients who are asymptomatic but are discovered to have fully developed gluten-induced villous atrophy after having undergone serologic screening or perhaps an endoscopy and biopsy for another reason. These patients are clinically silent, in that they do not manifest any clear GI symptoms or associated atypical features of CD such as iron deficiency or osteoporosis. These patients can be confused with atypical CD if their atypical features are not recognized in an early stage. As well, Fasano et al.15 have shown that many of these patients do not manifest fully developed villous atrophy.

  4. Latent CD. Represents patients with a previous diagnosis of CD that responded to a GFD and who retain a normal mucosal histology upon later re-introduction of gluten. Latent CD can also represent patients with currently normal intestinal mucosa who will subsequently develop gluten-sensitive enteropathy.

  5. Refractory CD. For the purpose of this review, patients with refractory CD are patients with true CD and villous atrophy (i.e., not a misdiagnosis) who do not, or no longer, respond to a GFD. Although the most common reason for failure to respond to a GFD is dietary indiscretion or unknown exposure to gluten, refractory CD also occurs in patients on a GFD who have developed a complication such as ulcerative jejunoileitis or enteropathy-associated lymphoma. Patients with refractory CD do not necessarily have positive serology for CD. Refractory CD was reviewed in the context of the requested objectives.

In order to utilize the above definitions, there need to be clear and valid histological criteria for the diagnosis of CD. The histological patterns, particularly the milder lesions, are not specific for CD and can be seen in a variety of other disorders (Table 1, Appendix A). To help standardize the histological criteria for the diagnosis of CD, several scoring systems have been developed. The classic Marsh criteria,1 and their modification by Rostami,16 are presented in Table 2 (Appendix A). The revised ESPGAN criteria4 use histological, serological and clinical criteria (Table 3, Appendix A).

Report Purpose and Target Population

The purpose of this report is to systematically review the available CD literature in order to provide organized evidence relating to a number of objectives put forth by the AHRQ. The findings of the report are intended to assist an assembled group of American and world experts in the field of CD in the development of National Institutes of Health (NIH) Consensus Development Conference guidelines sponsored by AHRQ and OMAR.

Methodological Considerations

At first glance, the determination of the sensitivity and specificity of the various diagnostic modalities for CD seems straightforward. There are a multitude of studies that have assessed the diagnostic characteristics of each of the serological markers using a variety of different laboratory methods. However, these studies are remarkably heterogeneous on a number of levels.

For example, there appears to be notable heterogeneity in the actual definition of CD, an issue that has important consequences for all of the task order objectives. Central to the classic definition of CD is the recognition that biopsy is the gold standard for diagnosis. However, it has become clear over the years that the majority of patients with CD do not have the classically described features of intestinal malabsorption, and that a large proportion of patients do not have the classic flat mucosa (sub-total or total villous atrophy). To further aid in the diagnosis of CD, multiple authors have devised and modified histological criteria to grade the mucosal lesions of patients with CD. But still at issue is the broad differential of disorders that can cause villous atrophy, particularly the milder histological grades. To help resolve this, others have examined specific features of the biopsy, such as the number of intraepithelial lymphocytes (IELs), the number of gamma delta positive (γδ+) IELs and other lymphocyte subtypes, and the localization of IELs towards the villous tip, to name a few.

The serological screening studies, together with the recognition that a low-grade histological lesion can be consistent with CD, have helped bring to light the concept of a spectrum of CD and the so-called “celiac iceberg.” In brief, it is recognized that classic CD with the typical symptoms of malabsorption and a fully developed mucosal lesion represents a small proportion of patients. The majority of patients are asymptomatic and are classified as having either atypical CD, silent CD, or less commonly latent CD. Some authors question whether most, if not all cases of silent CD, are in fact atypical CD, although the associated consequence of this has not been recognized. To further complicate the issue, Fasano15 has clearly characterized patients with silent CD without fully developed mucosal lesions, and found that only 34 percent of the patients had subtotal or total villous atrophy.

It should be recognized that the majority of studies assessing the diagnostic characteristics of the serological markers have defined CD by a biopsy with Marsh III or modified IIIa lesions or greater. These studies have reported a high sensitivity and specificity for these tests, particularly for the anti-EMA and anti-tTG antibody tests. However, some studies have looked at the characteristics of these tests in lower-grade lesions, and have found that while 100 percent of patients with Marsh IIIc histology show antibodies to endomysium, only 60 percent of patients with Marsh IIIa histology have anti-EMA antibodies.17, 18 Furthermore, it is apparent that serological markers can be used to monitor adherence to a GFD; for example, EMA and tTG antibodies fall to normal or non-diagnostic levels on a GFD, but the correlation with improvement of villous height is not as clear-cut. Finally, with the discovery by Sollid et al.8 and others that over 95 percent of patients with CD have HLA DQ2, and that most of the remainder have HLA DQ8, it was hoped that a reliable confirmatory test based on HLA typing would become available. Unfortunately, up to 40 percent of the general population, and a much higher proportion of those with autoimmune disorders such as type 1 diabetes, also have HLA DQ2 and/or HLA DQ8. Therefore, the specificity of this test can be quite low, making its positive predictive value relatively low. It is also becoming apparent that HLA DQ2/8 may not be the true risk genes, and researchers are actively studying other candidate genes that may be associated with DQ2/8 or, in patients without DQ2/8, other genes altogether.
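
The effect of low specificity on positive predictive value can be made concrete with a short calculation. The sketch below is illustrative only: the sensitivity of 0.95 and specificity of 0.60 are rough readings of the figures quoted above (over 95 percent of CD patients carry DQ2; up to 40 percent of the general population carry DQ2 and/or DQ8), and the 1 percent prevalence is a hypothetical value chosen for the example, not a finding of this report.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed inputs (illustrative, see lead-in): HLA DQ2/DQ8 typing
# in a general population with a hypothetical 1% prevalence of CD.
ppv, npv = predictive_values(sensitivity=0.95, specificity=0.60, prevalence=0.01)
print(f"PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

Under these assumed inputs the positive predictive value is only a few percent while the negative predictive value is very close to 1, which is consistent with the observation that a negative HLA result is more informative than a positive one.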

The preceding overview was presented simply to illustrate the complexity involved in separately assessing the sensitivity and specificity of the serological markers, HLA typing, and biopsy itself, in the diagnosis of CD. Over time, the status of the biopsy as the gold standard for the diagnosis of CD has been eroded. Yet at the same time, most of what we know about the sensitivity and specificity of serological markers and HLA typing relies on biopsy as the gold standard. Therefore, one is locked in a circular argument of how best to choose the gold standard test(s), when each has important shortcomings and is dependent on another to define its own diagnostic characteristics. The major problem in accurately evaluating the diagnostic characteristics of these tests is the issue of identifying all possible CD patients in a general screened population to use as a benchmark. Serology would be the most convenient strategy, but appears to lose sensitivity in patients with low-grade lesions. Screening a general population with biopsy has significant practical/cost issues, as well as potential ethical problems; however, if such a study were performed along with measuring the serological and HLA status of patients, this would allow for identification of Marsh I or II lesions that would need to be characterized further. HLA DQ2/8-negative patients could likely be excluded from having CD. But those patients with Marsh I–II lesions would have to be followed, whether or not they were serology positive or HLA DQ2/8 positive, to see if CD develops; alternatively, they could be tested with a GFD and subsequently rechallenged to see whether they truly have CD. Only in this way can the true sensitivity of biopsy be determined. Using this multi-test gold standard with follow-up of equivocal cases would also be the best way of assessing the sensitivity and specificity of serology markers and HLA DQ2/DQ8 typing.

Finally, a question which needs to be addressed is: “What are the implications of identifying a truly asymptomatic individual, for example with serological screening, who has no other obvious complications such as iron deficiency or osteoporosis, and is then found to have a Marsh I or II lesion?” This returns the circular argument back to “What is truly CD?”—a question that is beyond the scope of this review.

Chapter 2. Methods

Overview

The UO-EPC's evidence report on CD is based on a systematic review of the scientific-medical literature to identify and synthesize the results from studies addressing the key questions put forth by the AHRQ. The Celiac Review Team, together with content experts, identified specific issues integral to the review. A Technical Expert Panel (TEP) refined the research questions, as well as highlighted key variables requiring consideration in the evidence synthesis. Evidence tables presenting the key study characteristics and results were developed. Summary tables were derived from the evidence tables. The methodological quality of reports of the included studies was appraised, and individual study results were summarized. For some objectives a narrative interpretation of the literature was provided.

Key Questions Addressed in This Report

The AHRQ task order requested answers to the questions outlined below:

  1. Objective 1 - Sensitivity and specificity of tests for CD (Celiac 1)

    1. What is the sensitivity and specificity of the following tests for CD:

      1. AGA;

      2. EMA;

      3. human tTG IgA antibodies;

      4. HLA (DQ2/DQ8);

      5. duodenal/jejunal biopsy (see section below on celiac definition)

    2. Do sensitivity and specificity vary in different target populations (e.g., symptomatic vs. asymptomatic; geographic populations)?

  2. Objective 2 - Prevalence and incidence of CD (Celiac 2)

    1. What is the prevalence and incidence of symptomatic and “clinically silent” CD in:

      1. the general population;

      2. high-risk populations:

        1. family member of patient with CD;

        2. type 1 diabetes mellitus;

        3. iron deficiency anemia (IDA);

        4. osteoporosis?

    2. How does prevalence and incidence in the general population vary in different geographic and racial/ethnic populations?

  3. Objective 3 - Celiac associated lymphoma (Celiac 3)

    1. What is the association between CD and GI lymphoma?

      1. What is the cumulative risk of developing GI lymphoma in patients with CD?

      2. Does the cumulative risk vary with clinical presentation?

  4. Objective 4 - Expected consequences of testing for CD (Celiac 4)

    1. What are the expected consequences of testing for CD in the following populations:

      1. patients with symptoms suggestive of CD;

      2. asymptomatic, at-risk populations (affected family members, patients with type 1 diabetes);

      3. the general population?

    2. “Consequences” include:

      1. false-positive results;

      2. follow-up testing;

      3. invasive procedures (biopsies);

      4. cases diagnosed;

      5. patients complying with treatment; and

      6. response to treatment.

  5. Objective 5 - Promoting or monitoring adherence to a GFD (Celiac 5)

    1. What interventions are effective for promoting or monitoring adherence to a GFD?

Study Criteria Used in this Review

Histological

From the preceding discussion in the methodological considerations section it is clear that current histological criteria using a cut-off grade to define CD have important shortcomings. We therefore adopted an open histological definition of CD when selecting a study for inclusion, as long as the authors explicitly stated or described the criteria used to define CD (see inclusion criteria below). However, with the help of the TEP, we defined a “standard” histological definition of CD as a biopsy grade showing a modified Marsh IIIa or greater. This definition was NOT used as an inclusion/exclusion criterion, but simply to frame our results and to allow for the evaluation of the effect of different histological criteria on the performance of the various CD tests.

The choice of biopsy criteria and/or histological grade “cut-off” used to define CD has important implications for the interpretation of the studies of serology, HLA, and biopsy. It is recognized that some patients with CD may have Marsh I or II lesions, and by definition patients with latent CD have Marsh 0 lesions. However, as emphasized by Marsh,1 and as is discussed further below, in order to correctly interpret these early lesions, prospective follow-up studies are required, and an individual patient follow-up and documented response to gluten withdrawal would be required to firmly establish the diagnosis of CD.

The practical importance of the histological definition is evident from our preliminary review of articles that demonstrated considerable heterogeneity in the histological criteria used within the studies to define CD. Some used strict definitions, whereas, others accepted milder grade lesions. Furthermore, since the existence of latent CD and some silent CD without fully developed histology is now recognized, a study that aims to assess the sensitivity and specificity of biopsy itself in CD needs to use a design that incorporates the most sensitive and specific serologic and HLA tests available. The biopsy and serology should be performed simultaneously, with patients having discordant test results being further evaluated. Those with normal biopsy and positive serology would have to be followed over time to see if they have a latent form of CD. Conversely, patients with positive biopsies and normal serology would have to demonstrate improvement in histology on a GFD, and ideally, certification of relapse by biopsy with reintroduction of gluten. This type of study design was sought in order to address the objective of the sensitivity and specificity of biopsy.

Populations

  1. Unselected general population. The unselected general population implies a representative sample of a given population, such as a random sample of healthy blood donors or healthy school children. Some unselected populations are better than others for determining the true prevalence or incidence of CD. For example, blood donors are required to have normal hemoglobin and no iron deficiency, and therefore may underestimate the true numbers of patients with CD.

  2. Suspected CD. Patients with suspected CD include patients with GI symptoms, such as diarrhea or symptomatic malabsorption, who are being investigated for the possibility of CD. These patients are typically undergoing other investigations in addition to being worked-up for CD.

  3. High-risk populations. High-risk populations include populations with an expectedly higher prevalence of CD. Such populations include asymptomatic family members of patients with CD, patients with type I diabetes where identified CD would likely be silent or latent, and populations such as those with iron deficiency or osteoporosis where identified CD would be in the atypical CD classification.

HLA DQ2/DQ8

The HLA DQ2 haplotype represents the occurrence of HLA class II heterodimer alleles DQA1*0501 and DQB1*0201. These typically occur in a cis position as HLA DR3-DQ2 or in a trans position as HLA DR5/DR7-DQ2. The HLA DQ8 haplotype DQA1*0301/DQB1*0302 typically occurs in association with DR4.

Analytical Framework

   Figure 1. Analytic framework

The analytical framework is presented in Figure 1. In this framework, we wanted to represent the diagnostic pathways and the potential outcomes of testing various populations for CD. Each step of the pathway represents a portion of this systematic review, starting with the identification of the populations of interest, their diagnostic pathways, and ultimately the clinical outcomes, as well as consequences of testing.

Study Identification

Although the objectives of this task order are contained within a request for a single evidence report, we conducted five separate reviews, from the literature search onwards, as the objectives of this mandate were more orthogonal than overlapping.

Search Strategy

A series of searches were performed by National Library of Medicine staff in support of the literature review for CD. Strategies were developed using the guidelines supplied by the UO-EPC, and were divided into the five questions posed by AHRQ. All searches were limited to human studies published in English language journal articles. The specific strategies used for each search are located in Appendix B.

  1. What is the sensitivity and specificity of the following tests for CD:

    1. EMA

    2. human tTG IgA antibodies

    3. AGA

    4. HLA DQ2/DQ8

    5. small bowel biopsy

    Searches were run in the MEDLINE® and EMBASE databases for each of the five tests. With the exception of the search for small bowel biopsy, a reference to CD or its synonyms was not a requirement for retrieval in order to obtain the widest possible information on these tests. Because of their complexity, a separate search was run for each test, then the results combined into one Pro-Cite file and duplicates eliminated. Individual case reports and letters to the editor were also removed.

    The MEDLINE® searches were run in October 2003 for the year 1966 forward and yielded a total of 2,885 citations, with a follow-up search for HLA DQ2 and DQ8 performed in November 2003 that yielded an additional 390 citations. The EMBASE searches were run in December 2003 for the year 1974 forward and yielded a total of 1,046 citations after duplicates to MEDLINE® were removed.

  2. What is the prevalence and incidence of symptomatic and clinically silent CD in the general population and in the following identified high-risk populations:

    1. patients with an affected family member

    2. type 1 diabetes mellitus

    3. IDA

    4. osteoporosis

    Searches were run in the MEDLINE® and EMBASE databases. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 1,584 citations. The EMBASE search was run in December 2003 for the year 1974 forward and yielded 467 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.

  3. What is the association between CD and GI lymphoma?

    Searches were run in the MEDLINE® and EMBASE databases. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 230 citations. The EMBASE search was run in December 2003 for the year 1974 forward and yielded 97 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.

  4. What are the expected consequences of testing for CD in the following populations:

    1. patients with symptoms suggestive of CD

    2. asymptomatic, at-risk populations

    3. general population

    Searches were run in the MEDLINE®, EMBASE, PsycINFO, AGRICOLA, CAB, and Sociological Abstracts databases. In order to obtain the widest possible retrieval, all articles on screening for celiac and its synonyms were included, not just those discussing consequences.

    The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 917 citations. The EMBASE (1974 forward), PsycINFO (1840 forward), AGRICOLA (1970 forward), CAB (1972 forward), and Sociological Abstracts (1963 forward) database searches were run in December 2003 and yielded a combined total of 204 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.

  5. What interventions are effective for promoting or monitoring adherence to a GFD?

    Searches were run in the MEDLINE®, EMBASE, PsycINFO, AGRICOLA, CAB, and Sociological Abstracts databases. Because of the small number of citations retrieved, a few selected articles discussing adherence to dietary limitations for other conditions were included. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 152 citations. The EMBASE (1974 forward), PsycINFO (1840 forward), AGRICOLA (1970 forward), CAB (1972 forward), and Sociological Abstracts (1963 forward) database searches were run in December 2003 and yielded a combined total of 168 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.

Some citations fulfilled the criteria of more than one celiac objective. Duplicates within each celiac objective were electronically removed. The obtained citations were uploaded into an internal web-based review system (SRS) for online collaborative citation screening and abstraction. Articles passing the first level screen were retrieved in full for further screening (see below).

Reference lists of included studies, book chapters, and narrative or systematic reviews retrieved after having passed the first level of relevance screening, were manually searched to identify additional unique references. Through contact with content experts, and the TEP, attempts were made to identify other studies not identified by the search.

Study Selection and Eligibility Criteria

Table 1. Inclusion/exclusion criteria by level of screening

Celiac 1
  Level 1
    Inclusion: Any article reporting sensitivity/specificity of AGA, EMA, tTG, HLA DQ2/DQ8, or biopsy.
    Exclusion: Clearly unrelated citation.
  Level 2
    Inclusion: For serology and HLA, articles where sensitivity and specificity could be extracted. For biopsy, articles were included if some measure of diagnostic utility could be obtained.
    Exclusion: (none stated)
  Level 3
    Inclusion: Articles that allowed determination of sensitivity or specificity for all tests were included.
    Exclusion:
      • Articles with major methodological flaws excluded
      • Control group did not have gold standard test (biopsy) applied
      • No description of biopsy criteria given
      • Celiac group known to be positive for test under evaluation
      • Control group known to be negative for the test under evaluation
      • Control groups included patients with Marsh I or II biopsy lesions
      • AGA test performed without commercial ELISA kit or before 1990

Celiac 2
  Level 1
    Inclusion: Any potential citation of prevalence or incidence of CD in general and high-risk populations or association of CD with other disorders.
    Exclusion: Clearly unrelated citation.
  Level 2
    Inclusion: Citations limited to those that gave evidence of the prevalence or incidence of CD in the general population or the AHRQ identified high-risk populations (e.g., diabetes, relatives, iron deficiency, osteoporosis). Countries: North America, western Europe, Australia, New Zealand.
    Exclusion: Any studies of other CD-associated disorders not identified by the task order. Citations of the prevalence of specific disorders in patients with celiac (i.e., reverse of the inclusion). Any other country.
  Level 3
    Inclusion: Incidence and/or prevalence could be extracted from the article.
    Exclusion: Serious methodological flaws:
      • patients identified by surveys, through solicitation of celiac societies
      • incidence studies without a population density denominator

Celiac 3
  Level 1
    Inclusion: Any potential citation of the association, prevalence or risk of lymphoma in CD, including articles on outcome of refractory sprue and ulcerative jejunoileitis.
    Exclusion: Clearly unrelated citation.
  Level 2
    Inclusion: Measure of risk or prevalence/incidence of lymphoma in a population with CD.
    Exclusion: Prevalence of CD in a population of lymphoma. Case reports and non-comparative case series.
  Level 3
    Inclusion: Extractable prevalence, incidence, or cumulative risk of lymphoma in CD.
    Exclusion: Clonality of lymphocytes in ulcerative jejunoileitis-ileitis not determined or stated (as per TEP). Serious methodological flaw.

Celiac 4
  Level 1
    Inclusion: Any potential citation of possible consequences of testing for CD.
    Exclusion: Clearly unrelated citation.
  Level 2
    Inclusion: Consequences extractable from article.
    Exclusion: (none stated)
  Level 3
    Inclusion: Consequences limited to the AHRQ list.
    Exclusion: Consequences obtainable from the other celiac objective sub-review - i.e., false positive and negative results, etc.

Celiac 5
  Level 1
    Inclusion: Any potential citation of interventions for the monitoring or promotion of adherence.
    Exclusion: Clearly unrelated citation.
  Level 2
    Inclusion: Studies of monitoring adherence were included if they assessed monitoring by biopsy, serology (AGA publication date 1990 or later, EMA, tTG), or both. Any promotion intervention.
    Exclusion: Serology prior to 1990.
  Level 3
    Inclusion: Data from article could be extracted. Data included follow-up by biopsy alone or serology with biopsy confirmation.
    Exclusion: Articles assessing adherence through the measures of intestinal permeability. Studies that reported changes in mean serological titers with a GFD or gluten challenge, but did not address the potential usefulness of a serologic test to assess compliance.

Study selection was performed using three levels of screening with increasingly more strict criteria to ensure that all relevant articles were captured (Table 1). Each celiac objective had its own selection criteria for each level of screening and, as discussed previously, each celiac objective was treated as a separate sub-review. Following a calibration exercise, two reviewers independently screened all studies using the SRS web-based system. This system allows automatic identification of review disagreements. Any disagreements were resolved by the two reviewers by consensus; rarely, a third reviewer was used to break an impasse. The specific screening questions for each screen level are included in Appendix C.

Level 1 broad screening. Level 1 screening was used to identify any potentially relevant citation, based on review of the title, abstract and key words. For each objective, the SRS system displayed the corresponding task order questions alongside the citation details. Reviewers answered a broad question of whether the citation potentially related to the current objective. Furthermore, the SRS system was set up so that articles identified in one celiac objective silo that could also be relevant to another objective could be flagged and moved or copied to the other silo. The review team was divided up so that two members could be simultaneously reviewing each objective.

Level 2 refined screening. Potentially relevant articles identified at level 1 were obtained in full for level 2 screening. Again, using the SRS system with the actual articles on hand, reviewers selected articles that related to each of the specific objectives. The reviewers were asked to err on the side of inclusion for this level, and to classify articles as “original” or “review”. Original articles meeting level 2 inclusion also had basic demographic data—such as screening test used, celiac definition, and study population identified—recorded into the SRS system.

Level 3 final screening. Level 3 screening identified articles that specifically allowed for the answering of the task order questions. These articles fulfilled the final inclusion/exclusion criteria, allowed actual extraction of the required data, and did not have fatal methodological flaws.

Important articles answering a stated objective but not meeting inclusion criteria (i.e., containing potential threats to internal validity), were presented and discussed in the discussion section.

Data Abstraction

For each objective, a detailed and standardized data abstraction form was developed with the assistance of content experts and the TEP. The data abstraction forms included baseline study characteristics as well as questions allowing for the abstraction of all relevant study results and characteristics. The electronic data extraction forms began with basic study and patient demographic questions that were common across the five sub-review forms. These included reviewer name, author name, publication year, publication type, study design type, and basic study population demographics such as race, age, gender, and type of CD population. The extraction forms then moved to specific questions geared at extracting data to answer the respective objective's questions. The individual data abstraction forms are included in Appendix C.

Celiac 1 (sensitivity and specificity) data abstraction form. Separate data abstraction forms were developed for serology, HLA, and the biopsy sub-questions. Two-by-two tables were used to abstract data on sensitivity and specificity, and to determine positive and negative predictive values and the prevalence of CD in the tested population. The biopsy studies were quite heterogeneous, and did not allow for direct numeric extraction of data.
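
As an illustration of how the measures named above follow from a two-by-two table, the following minimal sketch computes them from the four cell counts, with biopsy taken as the reference standard. The class name `TwoByTwo` and the example counts are hypothetical and are not drawn from any included study or from the actual abstraction forms.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts of a diagnostic two-by-two table (reference standard: biopsy)."""
    tp: int  # test positive, biopsy positive
    fp: int  # test positive, biopsy negative
    fn: int  # test negative, biopsy positive
    tn: int  # test negative, biopsy negative

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppv(self):
        return self.tp / (self.tp + self.fp)

    def npv(self):
        return self.tn / (self.tn + self.fn)

    def prevalence(self):
        # Proportion of the tested population with biopsy-defined CD
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Hypothetical counts, for illustration only
example = TwoByTwo(tp=90, fp=5, fn=10, tn=95)
print(example.sensitivity(), example.specificity(), example.prevalence())
```

Because the predictive values depend on the prevalence in the tested population, values computed from a case-control table such as this hypothetical one would not transfer directly to a screening population.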

Celiac 2 (prevalence and incidence) data abstraction form. For this objective, the data extraction form included questions for detailing the screened study population, the number of individuals screened, the number of CD cases identified and how CD was confirmed. For incidence studies, the comparison population and time period were recorded.

Celiac 3 (lymphoma) data abstraction form. In addition to the basic demographic, and study design data, the extraction form contained fields for the extraction of risk data linking GI lymphoma to CD. Types of data sought were prevalence and incidence of lymphoma in CD in the setting of comparison data from a control population. Fields for extracting standardized incidence, morbidity, and mortality ratios were included.

Celiac 4 (consequences of screening) data abstraction form. The extraction forms for this objective included text fields to detail the consequences of testing for CD. The form contained fields that identified the specific consequence of testing which was addressed by the study, as well as a data field to report the study findings. The general field approach was chosen to allow extraction of the expected varied data for this objective.

Celiac 5 (monitoring and promoting adherence) data abstraction form. For this objective, standard demographic data was collected, as well as the methods used to monitor adherence to a GFD, the response of those measures to the diet, and the correlation of serological methods with biopsy findings. Space was provided to detail the sensitivity and specificity of the monitoring method when that data was available. For the objective of promoting adherence to a GFD, a text-based form was used to allow the extractor to describe the intervention and the results of its use.

Electronic forms. The abstraction forms were developed in Microsoft Excel to allow for electronic data entry and recording, and to allow exporting the evidence table data into Microsoft Word. For each celiac objective, data abstraction was conducted by one reviewer and verified by another. The extracted data was further verified by one of the principal investigators.

Quality Assessment

The quality of reporting of diagnostic test studies was assessed using the QUADAS tool.19 This tool is the first to be published that allows for the assessment of the quality of studies of diagnostic tests. The instrument was developed using a Delphi procedure. The Delphi panel consisted of nine experts in diagnostic research who refined an initial list of items in four rounds, after which agreement was reached on the items to be included in the tool. The QUADAS tool consists of 14 questions that are answered “yes,” “no,” or “unsure.” The tool addresses the items individually and does not incorporate an overall quality score (Appendix D).

Cohort and case-control study reports were assessed using the Newcastle-Ottawa scale (NOS; Appendix D). The NOS is an ongoing collaboration between the Universities of Newcastle, Australia and Ottawa, Canada. It was developed to assess the quality of non-randomized studies with its design, content and ease-of-use directed to the task of incorporating the quality assessments in the interpretation of meta-analytic results. A “star system” has been developed in which a study is judged on three broad perspectives: the selection of the study groups; the comparability of the groups; and the ascertainment of either the exposure or outcome of interest for case-control or cohort studies, respectively. The goal of this project is to develop an instrument that provides an easy and convenient tool for quality assessment of non-randomized studies for use in a systematic review.

The inter- and intra-rater reliability of the NOS have been established. The face content validity of the NOS has been reviewed based on a critical review of the items by several experts in the field, who evaluated its clarity and completeness for the specific task of assessing the quality of studies to be used in a meta-analysis. Furthermore, the validity of the NOS criteria has been established by comparisons to more comprehensive but cumbersome scales. An assessment plan is being formulated for evaluating its construct validity, with consideration of the theoretical relationship of the NOS to external criteria and the internal structure of the NOS components.20

The quality of cross-sectional reports was assessed using a 19-item instrument adapted from Ophthalmology (Appendix D).21

We did not conduct any sensitivity analysis based on the quality assessments of the observational studies, as there is little guidance on what score would indicate a poor-quality study for these assessment instruments.

One reviewer assessed the quality of an entire celiac objective to maintain internal consistency. Quality assessment was not performed under masked conditions.

Data Synthesis and Analysis

The data obtained from this review fell into several broad categories, which correspond in large part to the individual study objectives. These will be addressed in turn.

Data for the sensitivity and specificity of each serological marker was considered separately. In addition, studies were subdivided by the population age group (adults, children, mixed population), and by study design (case control, relevant clinical population/cohort).

Attempts were made to identify, explain, and minimize clinical and statistical heterogeneity in the included studies. Heterogeneity was assessed graphically by plotting receiver operating characteristic (ROC) curves for each of the included studies in a given analysis. A Pearson's chi-square statistic with n-1 degrees of freedom, where n represents the number of included studies in an analysis, was calculated to assess statistical heterogeneity.
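
One common way to obtain a Pearson chi-square statistic with n-1 degrees of freedom is a test of homogeneity of a proportion (for example, sensitivity) across n studies. The sketch below takes that approach as an assumption; the report does not give the exact computation, and the function name and counts are hypothetical.

```python
def chi_square_homogeneity(events, totals):
    """Pearson chi-square for equality of a proportion across studies.

    events[i] / totals[i] is the proportion in study i (e.g. TP / (TP + FN)).
    Returns (statistic, degrees_of_freedom), with df = number of studies - 1.
    """
    k = len(events)
    pooled = sum(events) / sum(totals)
    if pooled in (0.0, 1.0):
        # Every study has the same degenerate proportion: no heterogeneity
        return 0.0, k - 1
    stat = 0.0
    for x, n in zip(events, totals):
        expected_pos = n * pooled
        expected_neg = n * (1 - pooled)
        stat += (x - expected_pos) ** 2 / expected_pos
        stat += ((n - x) - expected_neg) ** 2 / expected_neg
    return stat, k - 1


# Hypothetical sensitivities: 45/50 in one study, 60/100 in another
stat, df = chi_square_homogeneity([45, 60], [50, 100])
print(f"chi-square = {stat:.2f} on {df} degree(s) of freedom")
```

A p-value could then be read from the chi-square distribution with `df` degrees of freedom, for instance with `scipy.stats.chi2.sf(stat, df)` if SciPy is available.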

Pooled estimates were only calculated if clinically and statistically appropriate. In situations where pooling was not performed, a narrative systematic review was conducted.

There are several potential ways to pool the results of studies of diagnostic tests, each with advantages and disadvantages. The simplest and most intuitive is to calculate a weighted mean of the sensitivity and specificity for the studies in question. This method provides a pooled estimate that is easy for clinicians to interpret. Several other techniques involve the pooling of diagnostic odds ratios or likelihood ratios. These methods have the distinct disadvantages of being difficult to interpret and of not allowing a pooled sensitivity or specificity to be derived from the resulting estimates. Lastly, one can use one of several methods to produce a summary ROC curve. The method described by Littenberg and Moses22, 23 has the advantage of being able to produce a summary curve while taking into account a threshold effect. Such an effect can occur when different studies use different thresholds to define a positive test, or even from differences between labs using the same cut-off. To interpret summary ROC curves it is necessary to know the sensitivity or specificity of the test in question in the population in which it will be applied. Since neither of these values is estimable without conducting yet another diagnostic accuracy study for the given population, the clinical usefulness of using this method alone is limited.24, 25
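The summary ROC approach can be sketched as follows, assuming the usual unweighted least-squares form of the Littenberg and Moses method: each study's difference of logits, D, is regressed on the sum of logits, S, and the fitted line is transformed back into ROC space. The function names and the example points are hypothetical. Studies with a rate of exactly 0 or 1 would need a continuity correction before the logit is taken, which is not shown here.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def fit_sroc(points):
    """Fit D = a + b*S by ordinary least squares.

    points: list of (tpr, fpr) pairs, each strictly between 0 and 1,
    with at least two distinct values of S.
    D = logit(TPR) - logit(FPR) is the log diagnostic odds ratio;
    S = logit(TPR) + logit(FPR) is a proxy for the positivity threshold.
    """
    d = [logit(t) - logit(f) for t, f in points]
    s = [logit(t) + logit(f) for t, f in points]
    s_mean = sum(s) / len(s)
    d_mean = sum(d) / len(d)
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (di - d_mean) for si, di in zip(s, d))
    b = sxy / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_tpr(fpr, a, b):
    """Expected sensitivity on the summary curve at a given false-positive rate."""
    # Rearranging D = a + b*S gives logit(TPR) as a linear function of logit(FPR).
    z = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-z))


# hypothetical (sensitivity, 1 - specificity) pairs, for illustration only
a, b = fit_sroc([(0.95, 0.10), (0.90, 0.05), (0.85, 0.02)])
print(sroc_tpr(0.05, a, b))
```

A slope b near zero suggests that the diagnostic odds ratio does not depend on the threshold, in which case the summary curve is symmetric.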

In order to produce clinically useful pooled statistics, we calculated a weighted mean of the sensitivity and specificity from those of the included studies. For both sensitivity and specificity, this pooling relies on the assumption that the test statistic is the same in all of the included studies. For each pooled estimate, a 95% confidence interval (CI) was calculated using both a fixed and a random effects model, and the results were compared as a further test for heterogeneity. The pooled estimates for the sensitivity and specificity were also compared with a summary ROC curve calculated for the same group of studies as a second check of the estimates (summary ROC curves are included in Appendix E).
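A weighted mean of this kind can be sketched as below. The report does not state the weights or the interval method used, so this sketch assumes weighting by sample size (equivalent to dividing total events by total subjects) and a Wilson score interval; it corresponds to a simple fixed-effect calculation only, and the random-effects calculation is not shown. The function name and example counts are hypothetical.

```python
import math


def pooled_proportion(studies, z=1.96):
    """Sample-size-weighted mean of a proportion with a Wilson 95% interval.

    studies: list of (events, total) pairs, e.g. (true positives, biopsy-proven CD).
    Returns (pooled proportion, lower limit, upper limit).
    """
    events = sum(x for x, _ in studies)
    n = sum(t for _, t in studies)
    p = events / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half


# hypothetical counts, for illustration only
print(pooled_proportion([(90, 100), (45, 50)]))
```

The same function applies to specificity when each pair holds true negatives and the number of biopsy-negative controls.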

The prevalence and incidence data from the Celiac 2 objective, and the CD-lymphoma data from the Celiac 3 objective, were anticipated to be quite heterogeneous considering the different countries, age groups, and risk characteristics of the studied patients. Attempts were made to group studies of prevalence by age group, study population, and serological screening method. If the grouped studies did not show evidence of heterogeneity, pooled estimates of the prevalence were produced for that group of studies; otherwise, a descriptive presentation of the data with a qualitative systematic review was conducted. Likewise, the outcome measures of the Celiac objectives 4 and 5 were presented in a qualitative systematic review, except in cases where it was possible to pool the sensitivity and specificity data as measures of monitoring of patients at various stages of recovery on a GFD.

Chapter 3. Results

Celiac 1: Sensitivity and Specificity of Tests for CD

Serology

Out of 3,982 citations identified by the search strategy for the Celiac 1 objective, 907 met level 2 screening criteria. Of these, 204 diagnostic test studies of one or more of the serological markers of interest (AGA, EMA, tTG) were identified. Sixty studies fulfilled the level 3 inclusion criteria (Appendix F; Evidence Table 1, Appendix I).26–85 The most common reasons for failing level 3 inclusions were AGA studies conducted before 1990, studies utilizing an improper or an unbiopsied control group, or studies that did not give any description of the biopsy criteria defining CD. Five pairs of duplicate publications were identified.27, 28, 45, 46, 58, 65, 73, 74, 84, 86 Out of each duplicate pair, the study with the most complete data was abstracted,27, 45, 46, 58, 74 bringing the total of included unique studies to 55. The majority of these studies assessed more than one serological marker, and some studied more than one age group. Of the included articles, 20 were conducted in or included an adult population, 33 were conducted in a population of children, and eight in a mixed population of adults and children of varying proportions. The statements in this section that relate to mixed studies or studies in children and adults refer to these eight studies, and not to a sample that we pooled from different studies.

To minimize clinical and statistical heterogeneity, the included articles of a particular antibody test were divided into groups by age of the included population (adults, children, mixed), the study design (case control, or relevant clinical population/cohort), by antibody type (IgA or IgG), and by test methodology (e.g., monkey esophagus [ME] or human umbilical cord [HUC]). Within these groups, further differences in study population, country of origin, and biopsy definitions (especially whether or not mild grades without villous atrophy were included) were assessed systematically. Studies that reported using the ESPGAN criteria for the diagnosis of CD were categorized as including patients with some degree of villous atrophy. Other potential causes of heterogeneity such as the cut-offs used to define a positive test were assessed.

Two articles were identified that assessed the diagnostic value of various antibodies in children64 and in mixed-age populations40 with IgA deficiency. As well, one study enrolled biopsy-proven CD patients who were known to be EMA negative.66 These studies were considered separately from the others. Studies using antibodies in combination were also assessed separately.

Pooled statistical estimates (with 95% CIs) are provided for studies without clinical and statistical heterogeneity, and summary ROC curves for the studied antibodies are provided in Appendix E. Sensitivity analyses by study design did not show a significant difference except for the analysis of IgA-tTG-guinea pig (GP) in adults. Therefore, apart from studies of IgA-tTG-GP in adults, pooled estimates, when available, included data from both study designs.

AGA. The diagnostic characteristics of IgA-AGA were assessed in 35 studies and the diagnostic characteristics of IgG-AGA were assessed in 30 studies. Of the 35 IgA-AGA studies, 11 were conducted in an adult population,30, 33, 45, 50, 54, 61–63, 71, 77, 80 21 in a population of children,26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 56, 59, 60, 64, 67, 68, 83, 85, 87, 88 and five in a mixed population.27, 37, 40, 74, 75

Of the 30 IgG-AGA studies, seven were conducted in an adult population,30, 33, 54, 62, 63, 71, 80 19 were conducted in a population of children,26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 58, 59, 64, 66, 68, 69, 83, 85 and five in a mixed population.27, 37, 40, 74, 75 Some studies provided data for more than one age group.

Some studies only provided summary statistics without the raw two-by-two table results;33, 34, 54, 58, 59, 69 however, the raw data were calculated from the presented sensitivity and specificity and from the group sizes.
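The back-calculation described above can be sketched as follows, assuming that the reported sensitivity and specificity are simple proportions of the biopsy-positive and biopsy-negative groups, so that cell counts are recovered by multiplication and rounding. The function name and the example values are hypothetical.

```python
def two_by_two(sensitivity, specificity, n_cd, n_controls):
    """Rebuild 2x2 cell counts from summary statistics and group sizes.

    sensitivity and specificity are proportions (0-1); n_cd and n_controls are
    the numbers of biopsy-positive and biopsy-negative subjects.
    """
    tp = round(sensitivity * n_cd)
    tn = round(specificity * n_controls)
    return {"TP": tp, "FN": n_cd - tp, "FP": n_controls - tn, "TN": tn}


# hypothetical study: 92% sensitivity, 68% specificity, 50 CD patients, 25 controls
print(two_by_two(0.92, 0.68, 50, 25))
```

Because published percentages are rounded, the reconstructed counts may differ by one subject from the original data in small studies.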

One study66 was conducted in CD patients who were known to be IgA-EMA negative, and was not included in the main analysis. In this study of children, the sensitivity for IgA-AGA was 22% and the sensitivity for IgG-AGA was 33%, whereas the specificity for IgA-AGA was 67% and the specificity for IgG-AGA was 58%; these values are considerably lower than those reported in other studies. Another two studies were conducted in patients with IgA deficiency.40, 64 The first demonstrated a sensitivity of 0% using IgA-AGA, but a sensitivity and specificity of 100% using IgG-AGA,40 whereas the second showed a sensitivity of 0% with IgA-AGA, but a sensitivity of 100% and a specificity of 80% using IgG-AGA.

Table 2. Included studies for IgA-AGA in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Picarelli, 2000; Italy | Case-control | ESPGAN | 22.2* | 66.7* | 50* | 36.3* | 0.60*
Gaetano, 1997; Italy | Case-control | ESPGAN | 92 | 68 | 85.2 | 80.9 | 0.67
Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not reported) | 68 | 91.7 | 86.1 | 79.7 | 0.43
Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 73.9 | 77.8 | 94.4 | 0.49
Berger, 1996; Switzerland | Case-control | ESPGAN revised with complete villous atrophy | 76 | 67 | 74 | 59 | 0.55
Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 52 | 94 | 87 | 74 | 0.52
Bahia, 2001; Brazil | Relevant clinical population | Severe villous atrophy | 95.5 | 95.6 | 91.3 | 97.9 | 0.31
Russo, 1999; Canada | Relevant clinical population | ESPGAN | 83.3 | 84.5 | 64.5 | 93.8 | 0.25
Bode, 1993; Denmark | Relevant clinical population | ESPGAN | 64 | 99 | 90 | 97 | 0.07
Poddar, 2002; India | Relevant clinical population | ESPGAN (villous atrophy and unequivocal response to GFD) | 94 | 91.5 | 92 | 93.5 | 0.52
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 100 | 94.4 | 95.7 | 100 | 0.55
Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN; Alexander grading | 88 | 88 |  |  | 0.31
Altuntas, 1998; Turkey | Relevant clinical population | Subtotal or total villous atrophy, crypt hyperplasia, increased IEL | 23 | 90 | 75 | 48 | 0.55
Artan, 1998; Turkey | Relevant clinical population | ESPGAN | 58 | 51 | 42.4 | 66.7 | 0.38
Rich, 1990; USA | Relevant clinical population | Not recorded - state “severe” lesion | 53 | 93 | 72.7 | 85.7 | 0.25
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 95 | 92.4 | 76 | 98.6 | 0.20
Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 83 | 86 | 81 | 81 | 0.51
Lindquist, 1993; Sweden | Relevant clinical population (suspected celiac) | ESPGAN; subtotal or partial villous atrophy | 86.5 | 92.7 | 93.7 | 85 | 0.55
Chirdo, 1999; Argentina | Relevant clinical trial | Total or subtotal villous atrophy | 75 | 87.1 | 84 | 80 | 0.47
Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | 80 | 92 | 67 | 96 | 0.17
Meini, 1996; Italy | Relevant clinical population | Partial villous atrophy or total villous atrophy | 0 | 100 | 0 | 91.7 | 0.08

* 30 IgA-EMA-negative patients suspected of CD; 9 of 18 CD patients IgA deficient
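The relationship between the Sens, Spec, PPV, NPV, and Prev columns of these tables can be illustrated with Bayes' theorem. The sketch below assumes that the Prev column is the proportion of biopsy-confirmed CD in each study sample; the function name is hypothetical, and the example uses the Altuntas, 1998 row of Table 2.

```python
def predictive_values(sens, spec, prev):
    """PPV and NPV from sensitivity, specificity and prevalence (all proportions, 0-1)."""
    tp = sens * prev
    fp = (1 - spec) * (1 - prev)
    fn = (1 - sens) * prev
    tn = spec * (1 - prev)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Altuntas, 1998 row of Table 2: Sens 23%, Spec 90%, Prev 0.55
ppv, npv = predictive_values(0.23, 0.90, 0.55)
print(round(ppv * 100, 1), round(npv * 100, 1))
```

This gives roughly 74% and 49%, close to the tabulated 75% and 48%; small differences are expected because the published sensitivity, specificity, and prevalence are themselves rounded.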

Table 7. Included studies for IgG-AGA in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Cataldo, 2000; Italy | Case-control | Original and revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 100 | 100 | 100 | 100 | 0.7
Sulkanen, 1998; Finland | Case-control | ESPGAN |  | 69 | 73.4 | 63 | 78.3 | 0.4
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN |  | 96.4 | 69.2 | 72.6 | 95.7 | 0.5
Carroccio, 2002; Italy | Relevant clinical population | Marsh - broke down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy - with normalization on GFD |  | 76 | 75 | 73.4 | 77.3 | 0.5
Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation |  | 84 | 86 | 89 | 79 | 0.6

   Figure 2. IgA-AGA in children with CD


   Figure 4. IgA-AGA in adults and children with CD

Despite clinical subdivision of the identified studies, significant heterogeneity was identified for each of the pooled AGA subgroup results (Tables 2 to 7). Heterogeneity can be visualized graphically in the ROC curves (Figures 2 to 4), which suggest that the heterogeneity is in part related to a serological test cut-off threshold effect. As well, two studies included CD patients with less than a Marsh IIIa grade;37, 45 these studies had lower sensitivities (61% and 67% for IgA-AGA) than those reported in other studies. The remaining heterogeneity likely represents a combination of the effects of different test kits, inter-lab variability, and differences in the study groups. For example, within the child population, two of the outlier studies were conducted in Turkey,26, 85 although apparently using standard methodology. Therefore, overall pooled estimates do not represent true summary statistics in these situations.

IgA-AGA. Despite the apparent heterogeneity, one can make some broad statements regarding the diagnostic value of AGA antibodies. IgA-AGA appears to offer fair to good performance in children (Table 2; Figure 2).26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 58, 59, 64, 66, 68, 69, 83, 85 Ten of the 19 studies demonstrated an IgA-AGA sensitivity of greater than 80%, and six of the studies demonstrated a sensitivity of greater than 90%. However, nine studies demonstrated sensitivities of less than 80%. The specificity was greater than 80% in 15 of the 19 studies, and greater than 90% in 11 studies. Only four studies showed a specificity of less than 80%.

Table 3. Included studies for IgA-AGA in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 55 | 100 | 100 | 55.9 | 35.0
Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 61 | 86 | 88.5 | 42.7 | 43.6
Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increase inflammatory cells | 46 | 98 | 75 | 92 | 25.7
Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 83 | 45 | 75 | 92 | 57.0
Maki, 1991; Finland | Relevant clinical population | Severe pathology with crypt hyperplasia to total villous atrophy; mild changes considered normal | 30.8 | 87.2 | 22.2 | 91.3 | 14.8
McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 100 | 100 | 100 | 100 | 31.5
Bardella, 2001; Italy | Relevant clinical population | Marsh; no grade reported | 95 | 89 | 76 | 98 | 33.3
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 92 | 88.2 | 85.2 | 93.8 | 45.8
Valdimarsson, 1996; Sweden | Relevant clinical population + a few dyspeptic controls | Alexander's classification; partial or subtotal villous atrophy | 79 | 70 | 28 | 96 | 36.8
Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 81.6 | 83 | 81.6 | 83 | 48.0

   Figure 3. IgA-AGA in adults with CD

Ten studies assessed IgA-AGA in adults (Table 3; Figure 3).30, 33, 54, 62, 63, 71, 80 Five of the ten studies demonstrated sensitivities greater than 80%, and three of the studies demonstrated sensitivities of greater than 90%. However, four studies demonstrated sensitivities of less than 65%. The specificity was greater than 80% in eight studies and greater than 90% in three. Five studies had specificities between 80% and 90%, and only two studies had specificities less than 80%.

Table 4. Included studies for IgA-AGA in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7
Sulkanen, 1998; Finland | Case-control | ESPGAN |  | 84.5 | 81.6 | 75.2 | 89 | 0.4
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN |  | 90.9 | 98.5 | 98 | 92.7 | 0.5
Carroccio, 2002; Italy | Relevant clinical population | Marsh, broken down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy - with normalization on GFD |  | 67 | 90 | 86 | 75 | 0.5
Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation |  | 64 | 92 | 92 | 64 | 0.6
Among the studies that assessed IgA-AGA in a mixed population of adults and children,27, 37, 74, 75 two demonstrated poor sensitivities of less than 70% but with specificities between 90% and 92%, one demonstrated a sensitivity of 85% and a specificity of 85%, and the last demonstrated a sensitivity of 91% and a specificity of 98% (Table 4; Figure 4).

Table 5. Included studies for IgG-AGA in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 78 | 80.7 | 87.6 | 67.6 | 56.7
Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increase inflammatory cells | 62 | 97 | 73 | 94 | 34.8
Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 17 | 86 | 14 | 93.5 | 15.1
Maki, 1991; Finland | Relevant clinical population | Severe pathology with crypt hyperplasia to total villous atrophy; mild changes considered normal | 46.2 | 89 | 33.3 | 93.3 | 14.8
McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 57 | 85 | 64 | 81 | 28.1
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 100 | 69.7 | 69.4 | 100 | 61.0
Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 73.5 | 73.6 | 72 | 75 | 49.0

   Figure 5. IgG-AGA in adults with CD

IgG-AGA. The seven studies of IgG-AGA in adults demonstrated considerably greater heterogeneity (Table 5; Figure 5).30, 33, 54, 62, 63, 71, 80 The sensitivity ranged from 17% to 100%, with little study grouping. However, there was less variation in the reported specificities. Five of the seven studies demonstrated specificities greater than 80%, whereas the remaining two studies had specificities of greater than 70%.

Table 6. Included studies for IgG-AGA in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Picarelli, 2000; Italy | Case-control | ESPGAN | 33.3 | 58.3 | 54.5 | 36.8 | 0.60
Gaetano, 1997; Italy | Case-control | ESPGAN | 100 | 36 | 75.7 | 100 | 0.67
Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not recorded) | 88.9 | 46.7 | 55.6 | 84.8 | 0.43
Hansson, 2000; Sweden | Case-control | ESPGAN | 81.8 | 82.6 | 81.8 | 82.6 | 0.49
Berger, 1996; Switzerland | Case-control | ESPGAN revised with complete villous atrophy | 69 | 59 | 68 | 53 | 0.55
Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 88 | 92 | 88 | 92 | 0.52
Bahia, 2001; Brazil | Relevant clinical population | Severe villous atrophy | 90.9 | 97.8 | 95.2 | 95.7 | 0.32
Russo, 1999; Canada | Relevant clinical population | ESPGAN | 83.3 | 85.9 | 66.7 | 93.8 | 0.25
Bode, 1993; Denmark | Relevant clinical population | ESPGAN | 71 | 99 | 100 | 98 | 0.07
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 100 | 66.7 | 75.6 | 100 | 0.55
Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN; Alexander or Perea et al. | 93 | 89 | 93.1 | 88.6 | 0.31
Altuntas, 1998; Turkey | Relevant clinical population | Subtotal or total villous atrophy, crypt hyperplasia, increased IEL | 100 | 0 | 55 | 0 | 0.55
Artan, 1998; Turkey | Relevant clinical population | ESPGAN | 83 | 59 | 55.6 | 85.2 | 0.38
Rich, 1990; USA | Relevant clinical population | Not reported - state “severe” lesion | 100 | 58 | 44 | 100 | 0.25
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 100 | 92.4 | 76.9 | 100 | 0.20
Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 83 | 80 | 86 | 82 | 0.51
Chirdo, 1999; Argentina | Relevant clinical trial | Total or subtotal villous atrophy | 85.7 | 80.6 | 80 | 86 | 0.47
Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | 83 | 79 | 45 | 96 | 0.17
Meini, 1996; Italy | Relevant clinical population | Partial villous atrophy or total villous atrophy | 100 | 80 | 31.2 | 100 | 0.08

   Figure 6. IgG-AGA in children with CD

In contrast, among the 17 analyzed studies (non-IgA deficient) of IgG-AGA conducted in children,26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 58, 59, 68, 69, 83, 85 there seemed to be greater variability in the specificity than in the sensitivity (Table 6; Figure 6). Fifteen of the 17 studies demonstrated sensitivities that were greater than 80%, and six demonstrated sensitivities greater than 90%. Only two studies showed a sensitivity of less than 80%. With regard to specificity, however, two groupings of studies become apparent. The first group consists of 11 studies, all of which had specificities greater than 79% and, except for one study, sensitivities greater than 80%. The second group of six studies all had specificities below 70% and, with the exception of one study, sensitivities greater than 80%.


   Figure 7. IgG-AGA in children and adults with CD

Four studies looked at IgG-AGA in a non-IgA-deficient mixed population of adults and children.27, 37, 74, 75 Two of these demonstrated sensitivities greater than 80%: one showed a sensitivity of 84%, and the second a sensitivity of 96%. However, only the first of these two studies had a specificity greater than 80%. In total, three of the four studies had specificities less than 80% (Table 7; Figure 7).

EMA

EMA—ME. The diagnostic characteristics of IgA-EMA-ME were assessed in 35 studies, and the diagnostic characteristics of IgG-EMA-ME were assessed in three studies. Of these included studies, 11 IgA-EMA-ME studies were conducted in adults,30, 32, 39, 51, 57, 63, 71, 77, 78, 80, 81 17 in children,27, 35, 36, 38, 41, 44, 46, 51, 52, 55, 56, 58, 60, 69, 79, 82, 83 and five in a mixed population.27, 37, 40, 47, 75 Some studies provided data for more than one age group. One study in children provided data on two different populations (including different control groups).55 IgG-EMA-ME was assessed in one adult population63 and one child population,66 but not in any of the mixed-population studies.

One study was conducted in a population of known CD patients who had previously tested negative for EMA. In this study, the sensitivity and specificity of IgG-EMA-ME were both 100%;66 the performance of IgA-EMA was not reported. Another study, which included CD patients with less than a Marsh IIIa grade,37 demonstrated a sensitivity of 88%. Some studies only provided summary statistics without the raw two-by-two table results;46, 58, 69 however, the raw data were abstracted based on the reported sensitivity and specificity and the group sizes.

Table 8. Included studies for IgA-EMA-ME in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Hallstrom, 1989; Finland | Case-control | Flat mucosa | 90.6 | 100 | 100 | 88.9 | 51.8
Biagi, 2001; Italy | Case-control | Partial villous atrophy or greater | 94.6 | 100 | 100 | 94.5 | 49.1
Ladinser, 1994; Italy | Case-control | Revised ESPGAN | 100 | 100.0 | 100 | 100 | 21.1
Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 100 | 100 | 100 | 100 | 63.7
Valentini, 1994; Italy | Case-control | Partial villous atrophy or greater | 99 | 100 | 100 | 96.7 | 76.2
Volta, 1995; Italy | Case-control | Roy-Choudhury criteria | 95 | 100 | 100 | 97.1 | 35.6
Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 100 | 100 | 100 | 11.6
McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 89.2 | 100 | 100 | 95.3 | 28.1
Bardella, 2001; Italy | Relevant clinical population | Marsh | 100 | 97.2 | 93 | 100 | 28.7
Valdimarsson, 1996; Sweden | Relevant clinical population + a few dyspeptic controls | Alexander's classification; partial or subtotal villous atrophy | 74 | 100 | 100 | 96 | 9.7
Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 100 | 100 | 100 | 100 | 48.0

   Figure 8. IgA-EMA-ME in adults with CD

IgA-EMA-ME. Among the 11 studies of IgA-EMA-ME conducted in adults,30, 32, 39, 51, 57, 63, 71, 77, 78, 80, 81 the specificity of the test was 100% in all except one, which showed a specificity of 97.2% (Table 8; Figure 8). The sensitivity of the test showed some slight variation among the studies. One outlier study demonstrated a sensitivity of only 74%;77 however, the authors found that, of the five of 19 CD patients who tested negative for EMA, three were IgA deficient. If these patients were excluded, the sensitivity rose to 88%. The authors also noted that their referral base appeared to include a high proportion of IgA-deficient subjects. The remaining ten studies showed sensitivities of 89% or greater. In fact, five studies showed a sensitivity of 100%, one a sensitivity of 99%, and another a sensitivity of 97%. In all, eight out of the 11 showed a sensitivity of 95% or greater, matching the very high specificity of this test. There was no statistical heterogeneity for this analysis. The pooled estimates for the sensitivity and specificity along with their 95% CI values were 97% (95% CI: 95.7–98.5) and 99.6% (95% CI: 98.8–99.9), respectively.
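The effect of excluding the IgA-deficient patients in the outlier study can be checked with simple arithmetic. The sketch below uses only the counts quoted in the paragraph above (19 CD patients, five EMA negative, three of those IgA deficient); the function name is hypothetical.

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity as a percentage of biopsy-proven cases that test positive."""
    return 100 * true_positives / (true_positives + false_negatives)


# 19 CD patients, 5 EMA negative, 3 of those 5 IgA deficient (counts from the text)
all_patients = sensitivity(19 - 5, 5)         # 14 of 19
without_iga_def = sensitivity(19 - 5, 5 - 3)  # 14 of 16
print(f"{all_patients:.1f}% {without_iga_def:.1f}%")
```

The two values, about 73.7% and 87.5%, agree with the reported 74% and 88% after rounding.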

Table 9. Included studies for IgA-EMA-ME in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Chirdo, 2000; Argentina | Case-control | ESPGAN | 92.4 | 100 | 100 | 85.2 | 0.7
Kolho, 1997; Finland | Case-control | Revised ESPGAN | 95 | 100 | 100 | 97 | 0.3
Kolho, 1997; Finland | Case-control | Revised ESPGAN | 100 | 100 | 100 | 100 | 0.5
Whelan, 1996; Ireland | Case-control | Subtotal villous atrophy | 100 | 100 | 100 | 100 | 0.4
Bonamico, 2001; Italy | Case-control | ESPGAN | 95.1 | 98.2 | 90 | 44.3 | 0.5
Gaetano, 1997; Italy | Case-control | ESPGAN | 96 | 96 | 97.9 | 92.3 | 0.7
Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not reported) | 100 | 96.7 | 95.7 | 100 | 0.4
Di Leo, 2003; Italy | Case-control | ESPGAN | 100 | 96.5 | 93.5 | 100 | 0.4
Vitoria, 2001; Italy | Case-control | Subtotal villous atrophy | 100 | 100 | 100 | 100 | 0.6
Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 100 | 100 | 95.8 | 0.5
Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 97 | 98 | 97 | 98 | 0.5
Hallstrom, 1989; Finland | Case-control | Flat mucosa | 100 | 100 | 100 | 100 | 0.4
Chan, 2001; Canada | Relevant clinical population | Villous atrophy, crypt hyperplasia, increased lymphocytes | 89 | 97 | 80 | 98 | 0.1
Russo, 1999; Canada | Relevant clinical population | ESPGAN | 75 | 88.7 | 69.2 | 91.3 | 0.3
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 95.4 | 100 | 100 | 94.7 | 0.6
Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 92 | 90 | 90.5 | 92 | 0.5
Lindquist, 1993; Sweden | Relevant clinical population (suspected CD) | ESPGAN; subtotal or partial villous atrophy | 98.1 | 92.7 | 94.4 | 97.5 | 0.6
Kumar, 1989; USA, Israel | Relevant clinical population and control cases | ESPGAN + Townley | 96.0 | 89.0 | 87.0 | 96.7 | 0.2

   Figure 9. IgA-EMA-ME in children with CD

Among the 18 studies that assessed IgA-EMA-ME in children,27, 35, 36, 38, 41, 44, 46, 51, 52, 55, 56, 58, 60, 69, 79, 82, 83 all but one outlier69 were grouped together, and the sensitivities and specificities were both greater than 89% (Table 9; Figure 9). The outlier study demonstrated a sensitivity of only 74%, and also demonstrated low sensitivity for IgA-EMA-HU (see below).69 The authors comment on the difficulties of interpreting immunofluorescence data as a likely explanation. Ten studies showed sensitivities greater than 95%, and except for one study with a sensitivity of 89%, the remaining seven studies had sensitivities between 90% and 95%. All these studies demonstrated specificities of 89% or greater, 16 had specificities greater than 90%, and 14 had specificities greater than 96%. There was no evidence of statistical heterogeneity in this analysis. The pooled sensitivity and specificity were 96.1% (95% CI: 94.4–97.3) and 97.4% (95% CI: 96.3–98.2), respectively.

Table 10. Included studies for IgA-EMA-ME in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7
Dickey, 2001; Northern Ireland | Case-control | Villous atrophy |  | 75.3 | 98.3 | 98.2 | 76 | 0.6
Ascher, 1996; Sweden | Relevant clinical population | ESPGAN |  | 98.2 | 100 | 100 | 98.5 | 0.5
Carroccio, 2002; Italy | Relevant clinical population | Marsh - broke down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy - with normalization on a GFD |  | 88 | 99 | 98.7 | 90 | 0.5
Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation |  | 86 | 100 | 100 | 83 | 0.6

   Figure 10. IgA-EMA-ME in adults and children with CD

Among the four studies in a mixed-age population that assessed IgA-EMA-ME,27, 37, 47, 75 all showed specificities of greater than 98% (Table 10; Figure 10). However, these studies showed some variation in the reported sensitivities. One study reported a very low sensitivity of 75%.47 Two other studies showed a sensitivity of 86% and 88%, respectively, whereas the last showed a sensitivity of 98%.

Table 11. Included studies for IgG-EMA-ME in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 39 | 98.3 | 92 | 78 | 13.5

Table 12. Included studies for IgG-EMA-ME in children
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Picarelli, 2000; Italy | Case-control | ESPGAN | 30 IgA-EMA neg. pts suspected of CD; 9/18 CD patients IgA deficient | 100 | 100 | 100 | 100 | 0.1
IgG-EMA-ME. Only two studies meeting our inclusion criteria assessed IgG-EMA-ME, one in adults (Table 11),63 and one in children (Table 12).66 In the single adult study,63 the sensitivity of the test was found to be 39%, whereas the specificity was 98%. In a case-control study design, Picarelli et al. studied 30 IgA-EMA-negative children suspected of having CD.66 Of these 30 children, 18 were subsequently found to have CD by duodenal biopsy and nine of the 18 were found to be IgA deficient. In this highly selected population, the reported sensitivity and specificity of IgG-EMA-ME were both 100%.

EMA—HU. IgA-EMA-HU was assessed in 13 studies. Six of these studies were conducted in adults,45, 49, 54, 57, 61, 70, 89 five in children,36, 53, 55, 69, 70 and two in a mixed population.72, 74 One study provided summary statistics without the raw two-by-two table results;69 however, the raw data were calculated from the reported sensitivity and specificity and the group numbers. One study provided data on two different populations (including different control groups).55

IgG-EMA-HU was not assessed in any of the studies meeting our inclusion criteria.

Two studies included CD patients (both adults and children) with less than a Marsh IIIa grade, and reported IgA-EMA-HU sensitivities of 87% and 100%.45, 70

Table 13. Included studies for IgA-EMA-HU in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Gillbert, 2000; Canada | Case-control | Mild, moderate, severe villous atrophy | 100 | 100 | 100 | 100 | 33.3
Ladinser, 1994; Italy | Case-control | Revised ESPGAN | 90 | 100 | 100 | 98 | 18.9
Salmaso, 2001; Italy | Case-control | Grades I–IV Marsh with response to a GFD | 87 | 100 | 100 | 95.1 | 24.7
Volta, 1995; Italy | Case-control | Roy-Choudhury criteria | 95 | 100 | 100 | 97.1 | 35.6
Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 87 | 100 | 100 | 81.3 | 55.3
Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 88.9 | 100 | 100 | 98.9 | 7.6

   Figure 11. IgA-EMA-HU in adults with CD

IgA-EMA-HU. Six studies in adults assessed IgA-EMA-HU (Table 13; Figure 11).45, 49, 54, 57, 61, 70, 89 In all six, the specificity was reported to be 100%. There was, however, variability in the reported sensitivities, which ranged from 87% to 100%. Three studies demonstrated sensitivities between 87% and 89%, two between 90% and 95%, and one a sensitivity of 100%. There was no observed statistical heterogeneity for this analysis. The pooled sensitivity and specificity were found to be 90.2% (95% CI: 85.9–93.4) and 100% (95% CI: 99.1–100), respectively.

Table 14. Included studies for IgA-EMA-HU in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Kolho, 1997; Finland | Case-control | Revised ESPGAN | 95 | 100 | 100 | 97 | 0.3
Kolho, 1997; Finland | Case-control | Revised ESPGAN | 100 | 100 | 100 | 100 | 0.5
Gaetano, 1997; Italy | Case-control | ESPGAN | 94 | 100 | 100 | 89.2 | 0.7
Salmaso, 2001; Italy | Case-control | Grades I–IV Marsh with response to GFD | 100 | 100 | 100 | 100 | 0.6
Russo, 1999; Canada | Relevant clinical population | ESPGAN | 45.8 | 95.8 | 78.6 | 84 | 0.3
Iltanen, 1999; Finland | Relevant clinical population | ESPGAN - CD confirmed at follow-up | 100 | 77.1 | 60.1 | 100 | 0.3

   Figure 12. IgA-EMA-HU in children with CD

Five studies with six separate child populations assessed IgA-EMA-HU (Table 14; Figure 12).36, 53, 55, 69, 70 Four of the six studies were grouped together and revealed sensitivities between 94% and 100%, and specificities of 100%. Of the two outliers,90 one showed a sensitivity of 100% and a specificity of 77%. The other study69 was also an outlier in other analyses, and demonstrated a sensitivity of 46% and a specificity of 96%. The authors cite difficulties in interpreting the immunofluorescence as a likely explanation. After accounting for this study, there was no statistical heterogeneity documented for sensitivity. The pooled sensitivity for this analysis was 96.9% (95% CI: 93.5–98.6). A pooled specificity for this analysis was not calculated, but is likely close to 100% given that four of the five grouped studies demonstrated a specificity of 100%.

Table 15. Included studies for IgA-EMA-HU in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Sblaterro, 2000; Italy | Case-control | ESPGAN | 93 | 100 | 100 | 80 | 0.8
Sulkanen, 1998; Finland | Case-control | ESPGAN | 92.6 | 99.5 | 99.2 | 94.9 | 0.4

   Figure 13. IgA-EMA-HU in adults and children with CD

Two studies assessed IgA-EMA-HU in a mixed-age population (Table 15; Figure 13).72, 74 In both these studies, the specificity was 100% (95% CI: 97.5–100) and the sensitivity 93% (95% CI: 88.1–95.4).

tTG antibodies

tTG—GP liver. The diagnostic characteristics of IgA-tTG-GP were assessed by ELISA in nine studies, and the diagnostic characteristics of IgG-tTG-GP were assessed by ELISA in three studies. Of the IgA-tTG-GP studies, five were conducted in adults,30, 32, 39, 45, 70 five in children,35, 41, 52, 70, 83 and four in a mixed population.47, 72, 74, 76 One study provided separate data for more than one age group.70

Of the IgG-tTG-GP studies that met the inclusion criteria, none were in adults or children, although two studies were in a mixed population.72, 76

Two studies included CD patients with less than a Marsh IIIa grade.45, 70 These studies demonstrated sensitivities of 81% and 95% for IgA-tTG-GP.

Table 16. Included studies for IgA-tTG-GP in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Biagi, 2001; Italy | Case-control | Partial villous atrophy or greater | 87.5 | 98.1 | 98 | 87.1 | 46.3
Salmaso, 2001; Italy | Case-control | Grades I-IV Marsh with response to a GFD | 87 | 97 | 90.9 | 94.9 | 27.2
Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 81 | 97 | 97.9 | 74.1 | 52.5
Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 92 | 60 | 100 | 18.8
Bardella, 2001; Italy | Relevant clinical population | Marsh | 100 | 98.2 | 83.3 | 100 | 10.0

   Figure 14. IgA-tTG-GP in adults with CD

IgA-tTG-GP. In the analysis of IgA-tTG-GP in adults, the five studies clustered by study design.30, 32, 39, 45, 70 The two cohort studies (relevant clinical population)30, 39 both showed sensitivities of 100%, and specificities of 92% and 98%, respectively. On the other hand, the three case-control studies32, 45, 70 demonstrated high specificities (97% to 98%), but sensitivities of only 81% to 88% (Table 16; Figure 14). This analysis did not show statistical heterogeneity, but the differences by study design were striking, so a pooled estimate for sensitivity was not calculated. The pooled specificity was 95.3% (95% CI: 92.5–98.1).

Table 17. Included studies for IgA-tTG-GP in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Bonamico, 2001; Italy | Case-control | ESPGAN | 90.3 | 100 | 100 | 30.3 | 0.5
Salmaso, 2001; Italy | Case-control | Grades I-IV Marsh with response to a GFD | 95 | 100 | 100 | 94.1 | 0.6
Hansson, 2000; Sweden | Case-control | ESPGAN | 90.9 | 95.7 | 95.2 | 91.7 | 0.5
Chan, 2001; Canada | Relevant clinical population | Villous atrophy, crypt hyperplasia, increase lymphocytes | 89 | 94 | 67 | 98 | 0.1
Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 96 | 92 | 92.6 | 95.7 | 0.5

   Figure 15. IgA-tTG-GP in children with CD

The analysis of IgA-tTG-GP in children showed very little variability in either the sensitivity or the specificity (Table 17; Figure 15). Among these five studies,35, 41, 52, 70, 83 the sensitivities ranged from 89% to 96%. The specificities were all 92% or greater, with three studies showing specificities between 92% and 96%,41, 52, 83 and two studies having a specificity of 100%.35, 70 The pooled estimates of the sensitivity and specificity were 93.1% (95% CI: 88.8–95.9) and 96.3% (95% CI: 93.1–98.0), respectively.

Table 18. Included studies for IgA-tTG-GP in studies including both adults and children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Dickey, 2001; Northern Ireland | Case-control | Villous atrophy | 93.2 | 96.6 | 97.1 | 91.8 | 0.6
Sblaterro, 2000; Italy | Case-control | ESPGAN | 84 | 100 | 100 | 62.5 | 0.8
Sulkanen, 1998; Finland | Case-control | ESPGAN | 95 | 93.7 | 90.8 | 96.5 | 0.4
Troncone, 1999; Italy | Relevant clinical population | ESPGAN | 91.7 | 98 | 98 | 94 | 0.4

   Figure 16. IgA-tTG-GP in adults and children with CD

Among the studies of mixed-age groups,47, 72, 74, 76 there was one outlier study with a sensitivity of only 84% but a specificity of 100% (Table 18).72 The specificities of the remaining studies were all greater than 94% (Table 18; Figure 16), and the sensitivities were between 92% and 95%. Heterogeneity was detected in the estimates of sensitivity, but not for specificity. The pooled specificity was 95.4% (95% CI: 92.7–97.2).

Table 19. Included studies for IgG-tTG-GP in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Sblaterro, 2000; Italy | Case-control | ESPGAN | 61.5 | 100 | 100 | 44.4 | 0.8
Troncone, 1999; Italy | Relevant clinical population | ESPGAN | 23 | 98 | 92 | 63 | 0.4

   Figure 17. IgG-tTG-GP in adults with CD

IgG-tTG-GP. Two studies in a mixed-age population assessed IgG-tTG-GP (Table 19; Figure 17).72, 76 The specificities in both studies were 98% or greater, but the sensitivities were only 62% and 23%, respectively.

tTG - human recombinant (HR)

IgG-tTG-HR. The diagnostic characteristics of IgA-tTG-HR were assessed by ELISA in ten studies, and the diagnostic characteristics of IgG-tTG-HR were assessed by ELISA in two studies. Of the IgA-tTG-HR studies, three were conducted in adults,39, 49, 54 three in children,52, 79, 83 and three in a mixed population.40, 72, 75

Table 20. Included studies for IgG-tTG-HR in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 100 | 80 | 90.1 | 100 | 0.7
Sblaterro, 2000; Italy | Case-control | ESPGAN | | 67.6 | 100 | 100 | 48.7 | 0.8

Of the IgG-tTG-HR studies, two were conducted in a mixed population (Table 20),40, 72 but none were conducted in adults or children. One study was conducted in IgA-deficient patients and is described below.40

One study was conducted in a mixed-age population of patients with known IgA deficiency.40 In this study, the sensitivity of IgA-tTG-HR was 0%, whereas the sensitivity and specificity of IgG-tTG-HR were 100% and 80%, respectively.

Table 21. Included studies for IgA-tTG-HR in adults
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%)
Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 97 | 80 | 100 | 14.5
Gillbert, 2000; Italy | Case-control | Mild, moderate, severe villous atrophy | 95.2 | 100 | 95.2 | 100 | 31.7
Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 100 | 100 | 100 | 100 | 8.7

   Figure 18. IgA-tTG-HR in adults with CD

IgA-tTG-HR. Three studies assessed IgA-tTG-HR in an adult population (Table 21; Figure 18).39, 49, 54 There was very little variability in the reported values for the sensitivities and specificities. The sensitivities were 100% in two studies, and 95% in the other. The specificities were 100% in two studies, and 97% in another. The pooled estimates of the sensitivity and specificity were 98.1% (95% CI: 90.1%–99.7%) and 98.0% (95% CI: 95.8–99.1), respectively.

Table 22. Included studies for IgA-tTG-HR in children
Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev
Vitoria, 2001; Italy | Case-control | Subtotal villous atrophy | 95 | 100 | 100 | 93 | 0.6
Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 95.7 | 95.5 | 95.7 | 0.5
Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 96 | 100 | 100 | 96 | 0.5

   Figure 19. IgA-tTG-HR in children with CD

Among the three studies in children (Table 22; Figure 19),52, 79, 83 the sensitivities were 96% in two studies and 95% in one. The specificities were 100% in two studies, and 96% in one. The pooled estimates of the sensitivity and specificity were 95.7% (95% CI: 90.3–98.1) and 99.0% (95% CI: 94.6–99.8), respectively.

Table 23. Included studies for IgA-tTG-HR in studies including both children and adults
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7
Sblaterro, 2000; Italy | Case-control | ESPGAN | | 91.5 | 100 | 100 | 76.9 | 0.8
Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation | | 91 | 96 | 97 | 87 | 0.6

   Figure 20. IgA-tTG-HR in adults and children with CD

Only two studies assessed the IgA-tTG-HR in a mixed-age population without IgA deficiency (Table 23; Figure 20).72, 75 The sensitivities and specificities were 92% and 100%, respectively, for the first study, and 91% and 96%, respectively, for the second. The pooled estimates of the sensitivity and specificity were 90.2% (95% CI: 86.4–93.0) and 95.4% (95% CI: 91.5–97.6), respectively.

Overall, these studies demonstrated a specificity of close to 100% and sensitivity in the range of 90% to 96%.

IgG-tTG-HR, IgA deficient. Only one study of IgG-tTG-HR, conducted in an IgA-deficient population, was identified.72 In this study, the sensitivity and specificity of IgG-tTG-HR were 68% and 100%, respectively.

Table 24. Included studies for combination IgA and IgG AGA, when either test is positive
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Valentini, 1994; Italy | Case-control | Partial villous atrophy or greater | Adults | 92 | 90 | 96.8 | 77.1 | 0.76
Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increase inflammatory cells | Adults | 77 | 95 | 71 | 97 | 0.41
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected celiac) | ESPGAN no details on biopsy findings | Adults | 100 | 97.1 | 96.2 | 100 | 0.44
Bode, 1993; Denmark | Relevant clinical population | ESPGAN | Children | 86 | 99 | 92 | 99 | 0.1
Falth-Magnusson, 1994; Sweden | Relevant clinical population | ESPGAN + Alexander grading IV, grade III to IV challenge | Children | 88.5 | 93.7 | 88.8 | 93.5 | 0.4
Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN, Alexander grading | Children | 97 | 83 | 41.8 | 98.2 | 0.3
Artan, 1998; Turkey | Relevant clinical population | ESPGAN | Children: IgA AGA or IgG AGA | 83 | 36 | 44 | 77.8 | 0.3
Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | Children | 100 | 98.7 | 95.2 | 98.7 | 0.2
Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | Children | 93 | 71 | 43 | 98 | 0.2

Mixed-antibody combinations. Several studies were identified that tested different antibodies in combination. Six studies in children assessed the use of IgA- and IgG-AGA (Table 24).34, 42, 48, 50, 59, 85 When either of these tests was positive, the resulting sensitivities ranged from 83% to 100%, and the specificities ranged from 71% to 99%. One study that apparently used similar methodologies had the lowest sensitivity (83%) and specificity (36%) of the group.85 When the same authors tested the antibodies under the requirement of both tests being concordant, the sensitivity fell, as would be expected, to 50%, and the specificity rose to 67%.85 Three adult studies were identified that used IgA- and IgG-AGA in an either/or protocol (Table 24).33, 50, 78 As was observed in the studies of children, significant between-study differences existed, making pooled estimates inappropriate. Nonetheless, in these studies the sensitivity ranged from 77% to 100%, while the specificity ranged from 90% to 97%.

Table 25. Included studies for combination IgA and IgG tTG-HR, when either test is positive
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Sblaterro, 2000; Italy | Case-control | ESPGAN | Adults and children | 98.5 | 100 | 100 | 95.2 | 0.8

Table 26. Included studies for combination IgA-AGA and IgG-EMA-HU, when either test is positive
Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev
Russo, 1999; Canada | Relevant clinical population | ESPGAN | Children | 100 | 73 | 57 | 82 | 0.3

One study in a mixed-age population assessed the use of a combination of IgA- and IgG-tTG-HR antibodies (Table 25).72 In this study, the sensitivity when either test was positive was 98.5%, while the specificity remained high at 100%. Another study in children assessed the combination of IgA-AGA and IgA-EMA-HU when either test was positive, and found a sensitivity of 100% and a specificity of 73% (Table 26).69 This same study assessed the same antibodies under the situation where both tests needed to be concordant. In this circumstance, the sensitivity remained 100% and the specificity rose to 93%.

In general, combining tests when either test is positive tended to improve sensitivity at the cost of specificity, while a requirement for the tests to be concordant tended to improve specificity.
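
The direction of this trade-off can be illustrated with a short calculation. The sketch below assumes that the two tests err independently of each other given true disease status; this conditional-independence assumption is ours, is not tested in the included studies, and often does not hold for related antibody tests. The input values are illustrative and are not drawn from any study.

```python
def combine_either(sens_a, spec_a, sens_b, spec_b):
    """Combined test is positive if EITHER test is positive.
    Assumes conditional independence of the two tests."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)  # missed only if both miss
    spec = spec_a * spec_b                  # negative only if both negative
    return sens, spec


def combine_both(sens_a, spec_a, sens_b, spec_b):
    """Combined test is positive only if BOTH tests are positive.
    Assumes conditional independence of the two tests."""
    sens = sens_a * sens_b                  # detected only if both detect
    spec = 1 - (1 - spec_a) * (1 - spec_b)  # false positive only if both are
    return sens, spec


# Illustrative values only
print(combine_either(0.90, 0.95, 0.80, 0.90))
print(combine_both(0.90, 0.95, 0.80, 0.90))
```

Under this assumption, the either/or rule can never lower sensitivity or raise specificity relative to the individual tests, and the concordance rule does the opposite.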

Table 27. Weighted pooled estimates with 95% CIs and heterogeneity identified
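Analysis | Sens | L 95% CI | U 95% CI | Spec | L 95% CI | U 95% CI | Prev | L 95% CI | U 95% CI | PPV | L 95% CI | U 95% CI | NPV | L 95% CI | U 95% CI
IgA-AGA-ADULT | H | H | H | H | H | H | 0.358 | 0.332 | 0.385 | H | H | H | H | H | H
IgG-AGA-ADULT | H | H | H | H | H | H | 0.367 | 0.335 | 0.401 | H | H | H | H | H | H
IgA-EMA-ME-ADULT | 0.974 | 0.957 | 0.985 | 0.996 | 0.988 | 0.999 | 0.398 | 0.371 | 0.425 | 0.974 | 0.957 | 0.985 | 0.996 | 0.988 | 0.999
IgG-EMA-ME-ADULT (one study) | 0.393 | 0.236 | 0.576 | 0.984 | 0.913 | 0.997 | 0.135 | 0.079 | 0.221 | 0.393 | 0.236 | 0.576 | 0.984 | 0.913 | 0.997
IgA-EMA-HU-ADULT | 0.902 | 0.859 | 0.934 | 1.000 | 0.991 | 1.000 | 0.331 | 0.297 | 0.368 | 0.902 | 0.859 | 0.934 | 1.000 | 0.991 | 1.000
IgA-tTG-GP-ADULT | 0.859 | 0.808 | 0.898 | 0.953 | 0.930 | 0.969 | 0.312 | 0.279 | 0.348 | 0.859 | 0.808 | 0.898 | 0.953 | 0.930 | 0.969
IgA-tTG-HR-ADULT | 0.981 | 0.901 | 0.997 | 0.981 | 0.958 | 0.991 | 0.160 | 0.126 | 0.202 | 0.981 | 0.901 | 0.997 | 0.981 | 0.958 | 0.991
IgA-AGA-CHILD | H | H | H | H | H | H | 0.363 | 0.341 | 0.385 | H | H | H | H | H | H
IgG-AGA-CHILD | H | H | H | H | H | H | 0.437 | 0.413 | 0.462 | H | H | H | H | H | H
IgA-EMA-ME-CHILD | 0.961 | 0.945 | 0.973 | 0.974 | 0.963 | 0.982 | 0.400 | 0.378 | 0.423 | 0.961 | 0.945 | 0.973 | 0.974 | 0.963 | 0.982
IgA-EMA-HU-CHILD | 0.969 | 0.935 | 0.986 | H | H | H | 0.447 | 0.402 | 0.493 | 0.969 | 0.935 | 0.986 | 0.949 | 0.915 | 0.970
IgA-tTG-GP-CHILD | 0.931 | 0.888 | 0.959 | 0.963 | 0.931 | 0.980 | 0.446 | 0.401 | 0.493 | 0.931 | 0.888 | 0.959 | 0.963 | 0.931 | 0.980
IgA-tTG-HR-CHILD | 0.957 | 0.903 | 0.981 | 0.990 | 0.946 | 0.998 | 0.519 | 0.452 | 0.584 | 0.957 | 0.903 | 0.981 | 0.990 | 0.946 | 0.998
IgA-AGA-MIXED | H | H | H | H | H | H | 0.415 | 0.386 | 0.444 | H | H | H | H | H | H
IgG-AGA-MIXED | H | H | H | H | H | H | 0.510 | 0.480 | 0.540 | H | H | H | H | H | H
IgA-EMA-ME-MIXED | H | H | H | 0.995 | 0.982 | 0.999 | 0.467 | 0.434 | 0.500 | 0.859 | 0.825 | 0.888 | 0.995 | 0.982 | 0.999
IgA-EMA-HU-MIXED | 0.925 | 0.881 | 0.954 | 0.996 | 0.975 | 0.999 | 0.437 | 0.391 | 0.484 | 0.925 | 0.881 | 0.954 | 0.996 | 0.975 | 0.999
IgA-tTG-GP-MIXED | H | H | H | 0.954 | 0.927 | 0.972 | 0.463 | 0.425 | 0.501 | 0.913 | 0.877 | 0.939 | 0.954 | 0.927 | 0.972
IgG-tTG-GP-MIXED | 0.451 | 0.363 | 0.543 | 0.988 | 0.935 | 0.998 | 0.265 | 0.208 | 0.331 | 0.451 | 0.363 | 0.543 | 0.988 | 0.935 | 0.998
IgA-tTG-HR-MIXED | 0.902 | 0.864 | 0.930 | 0.954 | 0.915 | 0.976 | 0.573 | 0.530 | 0.616 | 0.902 | 0.864 | 0.930 | 0.954 | 0.915 | 0.976
IgG-tTG-HR-MIXED (one study) | 0.677 | 0.556 | 0.778 | 1.000 | 0.839 | 1.000 | 0.518 | 0.413 | 0.621 | 0.677 | 0.556 | 0.778 | 1.000 | 0.839 | 1.000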

H = significant heterogeneity by Pearson's Chi square

Note: see Appendix G for raw pooled data by antibody test

Prevalence of CD and the positive predictive value (PPV) and negative predictive value (NPV) of serology. The prevalence of CD in the tested populations is presented in Tables 2 to 26 for the individual studies, and in Table 27 for the pooled estimate for the analysis groups.

The minimum prevalence of CD in individual study populations was greater than 25% in most of the studied analysis groups (i.e., IgA-AGA, IgG-AGA, etc), except for ten analysis groups where the minimum prevalence was between 9% and 12%. In all the analysis groups, the maximum prevalence ranged from 30% to as high as 70%. The pooled prevalence for the analysis groups was predominantly between 30% and 45%.


   Figure 21. PPV and prevalence from individual studies


   Figure 22. NPV and prevalence from individual studies

In assessing the IgA-EMA and IgA-tTG analysis groups, the pooled prevalence ranged from 33% to 46% except for the analysis of IgA-tTG-HR in adults, which showed a pooled prevalence of 16%. Figure 21 is a plot of the individual study prevalence versus the study's PPV, and suggests that below a CD prevalence of about 35% to 40%, the PPV of these IgA-based tests tends to drop from about 90% to 100%, to about 80% or less. As expected, Figure 22 demonstrates the reverse relationship, with the NPV being between 95% and 100% up to a CD prevalence of about 45%, and then dropping off.
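
The dependence of the predictive values on prevalence follows directly from Bayes' theorem. The sketch below holds sensitivity and specificity fixed and varies only the prevalence; the values of 0.95 and 0.98 are illustrative choices rather than pooled estimates from this report, and the function name is our own.

```python
def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence.
    Undefined (division by zero) if no subject can test positive or negative."""
    tp = sens * prevalence                 # true-positive fraction
    fp = (1 - spec) * (1 - prevalence)     # false-positive fraction
    fn = (1 - sens) * prevalence           # false-negative fraction
    tn = spec * (1 - prevalence)           # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Illustrative sensitivity and specificity, held constant across prevalences
for prev in (0.01, 0.05, 0.16, 0.40):
    ppv, npv = predictive_values(sens=0.95, spec=0.98, prevalence=prev)
    print(f"prevalence={prev:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With test characteristics held constant, the PPV falls and the NPV rises as prevalence decreases, which is why predictive values observed in study populations with a CD prevalence of 30% or more cannot be applied directly to low-prevalence screening populations.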

HLA DQ2/DQ8

We identified 99 potentially relevant HLA articles that appeared to address HLA DQ2/DQ8 in a CD population (Appendix F).8–11, 15, 53, 54, 62, 91–100, 100–177 These studies were not designed to determine the diagnostic utility of DQ2 or DQ8 per se.

Of the identified studies, 54 allowed estimation of the prevalence, sensitivity or specificity of HLA DQ2/DQ8 in the studied population.8–10, 15, 53, 54, 93, 100, 109, 120, 134–177 In one study, DQ2 data could not be reliably extracted.169 The authors of one study9 explicitly stated that the patients used were the same as in two of their other publications.8, 93 In two other publications by the same authors,9, 10 the patients appear to be different and the authors do not indicate that they used patients from a previous study. However, the possibility that these two studies9, 10 share a subset of patients cannot be excluded. Another two studies addressing different topics but with extractable HLA data, also appeared to have used the same patients.53, 136 In cases of duplicate publications, the studies with the greater number of patients were used.9, 136

The study designs and strictness of CD diagnosis in these articles varied, as did the inclusion of a control group. Most of the CD cases were diagnosed based on the ESPGAN criteria, although in some studies CD was diagnosed based on serology and then in most cases later confirmed by biopsy.15, 109, 120, 160, 161, 164, 168, 170, 172, 177 Nine of the studies were classified as cross-sectional studies;169–177 32 were case-control studies;8–10, 53, 100, 120, 134–159 and 12 were mixed cross-sectional/case-control studies or could be considered diagnostic cohort studies.15, 54, 109, 160–168 Four of the mixed-design studies109, 164, 166, 178 used screen-negative patients as the control group, whereas the rest used a control group that was separate from the screened population. The study populations were also variable. The case-control studies used known CD cases compared with variously defined CD-negative controls.

Seven studies used relatives of CD patients,158, 161, 164, 166, 169, 172, 177 four used a population with Down Syndrome,109, 134, 160, 170 two used a population with type I diabetes,165, 173 and one used a mixed group of patients with CD including some with Down's and others with diabetes.173 The mixed-design/cohort studies used patients suspected of CD on clinical grounds or subjects who belonged to a high-risk group, such as type 1 diabetics or first-degree relatives of patients with CD. The remaining articles used a screened healthy population or another specific group.

The articles with extractable data stated the frequency of HLA DQ2, and to a lesser extent the frequency of HLA DQ8, in their CD group. The cross-sectional studies did not include a control group, so only the frequency, as a surrogate of sensitivity, was available. None of the case-control or mixed-design studies calculated the sensitivity or specificity of HLA DQ2 or DQ8. However, these studies allowed us to derive estimates of these statistics from their results or tables. The considerable degree of clinical and methodological heterogeneity between the identified studies did not allow for statistical pooling of the results.

Table 28. HLA studies with biopsied cases and controls
Author, year; country | Prev of CD | DQ2 in CD | DQ2 in controls | Sensitivity | Specificity | PPV | NPV | CD population
Iltanen, 1999; Finland | 0.24 | 90.48 | 29.85 | 90% | 70% | 49% | 96% | Known CD versus biopsied controls
Sacchetti, 1998; Italy | 0.79 | 86.89 | 18.75 | 87% | 81% | 95% | 62% | Known CD versus biopsied controls
 | 0.51 | 86.89 | 26.72 | 87% | 73% | 77% | 84% | Versus unbiopsied healthy controls

Two studies fulfilled our inclusion requirement of both cases and control groups undergoing intestinal biopsy (Evidence Table 2, Appendix I; Table 28).136, 152 The remaining studies had various control group types: unbiopsied, healthy controls, disease controls, or serology-negative controls. These studies provide useful information and are presented at the end of the HLA results section for reference.

The study by Iltanen et al.136 was conducted in a group of Finnish children to assess the density of gamma delta positive intraepithelial lymphocytes (γδ+ IELs) in patients with CD by biopsy (ESPGAN), in patients with suspected CD where the diagnosis was excluded by biopsy, and in a group of biopsy-negative patients who underwent endoscopy for dyspepsia. The biopsy aspect of this study is presented in its respective section. In this study, HLA DQ2 was found in 19 of 21 (90.5%) patients with CD, as opposed to 20 of 67 (29.9%) control patients. Elevated γδ+ IEL density was significantly associated with DQ2 positivity. The calculated diagnostic measures for this study are presented in Table 28. In this population, DQ2 demonstrated a high sensitivity of 90.5% but a relatively modest specificity of only 70%, which is understandable given that the control population had a fairly high frequency of DQ2 positivity. The prevalence of CD in the study population was 1:4.2 (or 24%). The PPV was 49% and the NPV was 96%, suggesting that a negative DQ2 test result provides the greatest diagnostic information.
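
The diagnostic measures for this study can be reproduced from the counts. The sketch below takes the DQ2-positive controls as 20 of 67, the count consistent with the 29.85% reported in Table 28; the function name is our own.

```python
def diagnostic_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, NPV and prevalence from 2x2 counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "prevalence": (tp + fn) / (tp + fn + fp + tn),
    }


# 19 of 21 CD patients and 20 of 67 biopsied controls were DQ2 positive
measures = diagnostic_measures(tp=19, fn=2, fp=20, tn=47)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

The low PPV despite the high sensitivity reflects the 20 DQ2-positive controls, who outnumber the 19 DQ2-positive CD patients.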

Sacchetti et al.152 studied a group of Italian children suspected of having CD. Patients fulfilling the ESPGAN criteria were classified as having CD (n = 48 of 80), whereas the remainder (n = 32) were considered disease controls. The authors also used a second retrospectively defined group of known CD patients by ESPGAN criteria (n = 74), and a second control group of 180 unbiopsied healthy subjects. HLA DQ2 was determined in the CD group as a whole and in the two control groups, with the results presented in Table 28. In this study, the sensitivity of HLA DQ2 was 86.9% and the specificity was 81% for the comparison with the biopsied controls; the sensitivity of HLA DQ2 was 86.9% and the specificity was 73% for the comparison with the unbiopsied controls. Interestingly, in this study only 18.8% of the biopsy-negative controls were positive for HLA DQ2, whereas 26.7% of the unbiopsied controls were HLA DQ2 positive. This difference accounts for the higher specificity seen for HLA DQ2 in the comparison with the biopsy-negative control group than in the comparison with the healthy controls. The prevalence of CD in the studied population was also quite high in both portions of this study (79% for the comparison with biopsied controls and 51% for the comparison with unbiopsied controls). As such, the PPV and the NPV of HLA DQ2 in the comparison with biopsied controls were 95% and 62%, respectively. The difference in prevalence between this and the Iltanen study accounts for the differences seen in the PPVs and NPVs.

HLA all study data. The following section presents the data of the HLA studies that failed to be included on the basis that the control groups were not assessed with the gold standard test for CD (biopsy). These studies collectively provide useful information on the diagnostic value of HLA testing, but have to be interpreted with caution.

Table 29. Prevalence/frequency of HLA DQ2 and HLA DQ8 in prevalence and mixed-design studies, and in case-control studies with HLA DQ8 data
Author | Year | Country | # of CD | % DQ2 | % DQ8 | % DQ2/8 | Population with CD
Lewis | 2000 | USA | 101 | 90.10 | n/a | n/a | Confirmed cases among CD relatives
Book | 2001 | USA | 8 | 87.50 | 12.50 | 100 | Down Syndrome
Book | 2003 | USA | 34 | n/a | n/a | 97.06 | Affected 1st-degree relatives of CD sib. pairs
Csizmadia | 2000 | Netherlands | 10 | 100 | 20 | n/a | Down Syndrome
Fasano | 2003 | USA | 98 | 83.67 | 22.45 | 100 | Screened large population only subset tested for HLA
Iltanen | 1999 | Finland | 5 | 100 | n/a | n/a | Sjogren's syndrome
Kaukinen | 2000 | Finland | 6 | 100 | n/a | n/a | Known CD
Maki | 2003 | Finland | 56 | 85.71 | n/a | n/a | Screen of school-age children
Mustalahti | 2002 | Finland | 29 | 100 | n/a | n/a | Relatives of CD or DH
Catassi | 2001 | Algeria | 79 | 91.3 | n/a | 95.6 | Saharawi Arabs
Lui | 2002 | Finland | 260 | 96.92 | 2.69 | 99.62 | Family members of celiacs
Polvi | 1996 | Finland | 45 | 100 | n/a | n/a | Known CD
Ploski / Sollid | 1996 | Sweden | 135 | 91.85 | 4.44 | 96.30 | Known CD
Popat | 2002 | Sweden | 62 | 93.55 | n/a | n/a | Known CD
Larizza | 2001 | Italy | 7 | 100 | n/a | n/a | Children with autoimmune thyroid disease, EMA+biopsy
Failla | 1996 | Italy | 7 | 14.29 | n/a | n/a | Down Syndrome (only 7 CD cases)
Farre | 1999 | Spain | 60 | 93.33 | n/a | n/a | 1st-degree relatives of celiacs
Balas | 1997 | Spain | 212 | 94.81 | 4.25 | 99.06 | Known CD
Zubillaga | 2002 | Spain | 135 | 92.59 | 3.70 | 96.0 (calc) | Mostly CDs, some CD in subjects with Down Syndrome and subjects with diabetes
Karell | 2003 | France | 92 | 86.96 | 6.52 | 93.48 | Known CD
 | | Italy | 302 | 93.71 | 5.63 | 89.40 |
 | | Finland | 100 | 91 | 5.00 | 96.00 |
 | | Norway/Sweden | 326 | 91.41 | 5.21 | 96.63 |
 | | UK | 188 | 87.77 | 7.98 | 95.74 |
 | | Total | 1008 | 93.71 | 5.95 | 93.95 |
Kaur | 2002 | India | 35 | 97.14 | n/a | n/a | Known CD
Neuhausen | 2002 | Israel | 23 | 82.61 | 56.52 | 100 | Bedouin Arabs
Tuysuz | 2001 | Turkey | 55 | 83.64 | 16.36 | 90.91 | Children with known CD
Bouguerra | 1996 | Tunisia | 94 | 84.04 | n/a | n/a | Known CD
Sumnik | 2000 | Czech | 15 | 80 | 66.67 | 100 | Diabetics
Perez-Bravo | 1999 | Chile | 62 | 11.29 | 25.81 | 37.10 | Chileans

DH = dermatitis herpetiformis

Table 30. Sensitivity/specificity (calculated) for HLA DQ2 in case-control studies
Author, year; country | Prev of CD | % DQ2 in CD | % DQ2 in controls | Sens | Spec | PPV | NPV | CD population
Fine, 2000; USA | 0.06 | 88 (22/25) | 31.24 (134/429) | 0.88 | 0.69 | 0.14 | 0.99 | Known CD
Howell, 1995; UK | 0.38 | 91.21 (83/91) | 23.18 (35/151) | 0.91 | 0.77 | 0.7 | 0.94 | Known CD
Michalski, 1995; Ireland | 0.62 | 96.67 (87/90) | 39.29 (22/56) | 0.97 | 0.61 | 0.8 | 0.92 | Known CD
Colonna, 1990; Italy | 0.36 | 94.59 (140/148) | 40.82 (109/267) | 0.95 | 0.59 | 0.56 | 0.95 | Known CD
Catassi, 2001; Algeria | 0.37 | 91.1 (72/79) | 38.9 (53/136) | 0.91 | 0.61 | 0.58 | 0.92 | Saharawi Arabs
Congia, 1991; Italy | 0.2 | 96 (24/25) | 34 (34/100) | 0.96 | 0.66 | 0.41 | 0.99 | Known CD
Ferrante, 1992; Italy | 0.48 | 88 (44/50) | 16.36 (9/55) | 0.88 | 0.84 | 0.83 | 0.88 | Known CD
Mazzilli, 1992; Italy | 0.5 | 92 (46/50) | 18 (9/50) | 0.92 | 0.82 | 0.84 | 0.91 | Known CD
Tighe, 1992; Italy | 0.51 | 90.7 (39/43) | 12.2 (5/41) | 0.91 | 0.88 | 0.89 | 0.9 | Known CD
Castro, 1993; Italy | 0.38 | 80 (4/5) | 37.5 (3/8) | 0.8 | 0.63 | 0.57 | 0.83 | Down Syndrome
Lio, 1997; Italy | 0.45 | 100 (18/18) | 63.64 (14/22) | 1 | 0.36 | 0.56 | 1 | Known CD
Sacchetti, 1998; Italy | 0.79 | 86.89 (106/122) | 18.75 (6/32) | 0.87 | 0.81 | 0.95 | 0.62 | Known CD and biopsied controls
Sacchetti, 1998; Italy | 0.51 | 86.89 (106/122) | 26.72 (31/116) | 0.87 | 0.73 | 0.77 | 0.84 | Healthy controls
Iltanen, 1999; Finland | 0.24 | 90.48 (19/21) | 29.85 (20/67) | 0.9 | 0.7 | 0.49 | 0.96 | Known CD
Ploski/Sollid, 1993; Sweden | 0.34 | 94.68 (89/94) | 25.97 (47/181) | 0.95 | 0.74 | 0.65 | 0.96 | Known CD
Pattersson, 1933; Sweden | 0.4 | 92.31 (60/65) | 43.75 (42/96) | 0.92 | 0.56 | 0.59 | 0.92 | Known CD
Ploski/Sollid, 1996; Sweden | 0.43 | 91.85 (124/135) | 22.35 (40/179) | 0.92 | 0.78 | 0.76 | 0.93 | CD vs blood donors
Fernandez-Arquero, 1995; Spain | 0.36 | 92 (92/100) | 25.56 (46/180) | 0.92 | 0.74 | 0.67 | 0.94 | Known CD
Arranz, 1997; Spain | 0.5 | 92 (46/50) | 24 (12/50) | 0.92 | 0.76 | 0.79 | 0.9 | Known CD
Balas, 1997; Spain | 0.22 | 94.81 (201/212) | 29.25 (217/742) | 0.95 | 0.71 | 0.48 | 0.98 | Known CD
Ruiz Del Prado, 2001; Spain | 0.04 | 94.74 (36/38) | 39.22 (351/895) | 0.95 | 0.61 | 0.09 | 1 | Known CD
Dijilali-Saiah, 1994; France | 0.27 | 88.75 (71/80) | 21.13 (45/213) | 0.89 | 0.79 | 0.61 | 0.95 | Known CD
Dijilali-Saiah, 1998; France | 0.44 | 83.17 (84/101) | 20 (26/130) | 0.83 | 0.8 | 0.76 | 0.86 | Known CD
Tighe, 1993; Israel | 0.49 | 70.59 (24/34) | 8.33 (3/36) | 0.71 | 0.92 | 0.89 | 0.77 | Ashkenazi Jews, known CD
Arnason, 1994; Iceland | 0.13 | 84 (21/25) | 36.36 (60/165) | 0.84 | 0.64 | 0.26 | 0.96 | Known CD
Boy, 1994; Sardinia | 0.5 | 96 (48/50) | 32 (16/50) | 0.96 | 0.68 | 0.75 | 0.94 | Known CD
Congia, 1994; Sardinia | 0.42 | 90.77 (59/65) | 39.33 (35/89) | 0.91 | 0.61 | 0.63 | 0.9 | Known CD
Erkan, 1999; Turkey | 0.5 | 40 (12/30) | 6.67 (2/30) | 0.4 | 0.93 | 0.86 | 0.61 | Known CD
Tumer, 2000; Turkey | 0.3 | 51.52 (17/33) | 25.97 (20/77) | 0.52 | 0.74 | 0.46 | 0.78 | Turkish, known CD
Tuysuz, 2001; Turkey | 0.52 | 83.64 (46/55) | 24 (12/50) | 0.84 | 0.76 | 0.79 | 0.81 | Turkish, known CD
Perez-Bravo, 1999; Chile | 0.33 | 11.29 (7/62) | 2.42 (3/124) | 0.11 | 0.98 | 0.7 | 0.69 | Chilean

Table 31. Sensitivity/specificity (calculated) for HLA DQ2 in mixed-design studies
Author, year; country | Prev of CD | % DQ2 in CD | % DQ2 in controls | Sens | Spec | PPV | NPV | CD population
Book, 2001; USA | 0.09 | 87.50 (7/8) | 15.58 (12/77) | 0.88 | 0.84 | 0.37 | 0.98 | Down Syndrome
Csizmadia, 2000; Netherlands | 0.11 | 100 (10/10) | 28 (25/90) | 1.00 | 0.72 | 0.29 | 1.00 | Down Syndrome
Fasano, 2003; USA | 0.52 | 83.67 (82/98) | 42.39 (39/92) | 0.84 | 0.58 | 0.68 | 0.77 | 9019 at risk, 4126 not at risk
Larizza, 2001; Italy | 0.08 | 100 (7/7) | 34.62 (27/78) | 1 | 0.65 | 0.21 | 1 | Children with autoimmune thyroid disease, EMA+biopsy
Polvi, 1996; Finland | 0.58 | 100 (45/45) | 28.13 (9/32) | 1 | 0.72 | 0.83 | 1 | CD vs various controls
Iltanen, 1999; Finland | 0.15 | 100 (5/5) | n/a | 1 | n/a | n/a | n/a | Sjogren's syndrome
Kaukinen, 2000; Finland | 0.17 | 100 (6/6) | n/a | 1 | n/a | n/a | n/a | CD vs disease controls
Lui, 2002; Finland | 0.52 | 96.92 (252/260) | 57.38 (136/237) | 0.97 | 0.43 | 0.65 | 0.93 | Family members of celiacs (controls=unaffected family members)
Farre, 1999; Spain | 0.55 | 93.33 (56/60) | 18 (9/50) | 0.93 | 0.82 | 0.86 | 0.91 | CD vs healthy controls
 | 0.26 | 93.33 (56/60) | 63.91 (108/169) | 0.93 | 0.36 | 0.34 | 0.94 | CD vs relatives of CD
Sumnik, 2000; Czech | 0.07 | 80 (12/15) | 49.46 (92/186) | 0.8 | 0.51 | 0.12 | 0.97 | Diabetes (control=EMA neg.)
Kaur, 2002; India | 0.11 | 97.14 (34/35) | 4.64 (13/280) | 0.97 | 0.95 | 0.72 | 1 | CD vs healthy controls
Neuhausen, 2002; Israel | 0.31 | 82.61 (19/23) | 61.54 (32/52) | 0.83 | 0.38 | 0.37 | 0.83 | Bedouin Arabs (some cases and controls not biopsied)

Table 32. Sensitivity/specificity (calculated) for HLA DQ8
Author, year; country | Prev of CD | % DQ8 in CD | % DQ8 in controls | Sens | Spec | PPV | NPV | CD population
Csizmadia, 2000; Netherlands | 0.11 | 20 (2/10) | 20 (18/90) | 0.20 | 0.80 | 0.10 | 0.90 | Down Syndrome
Fasano, 2003; USA | 0.52 | 22.45 (22/98) | 20.65 (19/92) | 0.22 | 0.79 | 0.54 | 0.49 | Screened at-risk and not-at-risk populations
Lui, 2002; Finland | 0.52 | 2.69 (7/260) | 10.55 (25/237) | 0.03 | 0.89 | 0.22 | 0.46 | Family members of CD patients (controls=unaffected family members)
Ploski/Sollid, 1996; Sweden | 0.43 | 4.44 (6/135) | 25.14 (45/179) | 0.04 | 0.75 | 0.12 | 0.51 | Known CD
Balas, 1997; Spain | 0.22 | 4.25 (9/212) | 16.85 (125/742) | 0.04 | 0.83 | 0.07 | 0.75 | Known CD
Sumnik, 2000; Czech | 0.07 | 66.67 (10/15) | 65.59 (122/186) | 0.67 | 0.34 | 0.08 | 0.93 | Diabetes
Neuhausen, 2002; Israel | 0.31 | 56.52 (13/23) | 25 (13/52) | 0.57 | 0.75 | 0.5 | 0.8 | Bedouin Arabs
Tuysuz, 2001; Turkey | 0.52 | 16.36 (9/55) | 8 (4/50) | 0.16 | 0.92 | 0.69 | 0.5 | Turkish known CD
Perez-Bravo, 1999; Chile | 0.33 | 25.81 (16/62) | 12.9 (16/124) | 0.26 | 0.87 | 0.5 | 0.7 | Chileans

   Figure 24. HLA DQ2

   Figure 25. HLA DQ2 and DQ8

The prevalence of DQ2 and DQ8 in these studies is presented in Table 29, while the diagnostic characteristics of HLA DQ2 and HLA DQ8 are presented in Tables 30 to 32. Unfortunately, none of these studies was designed to assess the diagnostic value of HLA DQ2 or HLA DQ8 for the diagnosis or screening of CD. However, as presented in the tables, the crude data were abstracted and the diagnostic characteristics were calculated. Significant clinical and statistical heterogeneity existed between these studies, making arithmetic pooling of the studies unjustified. Figure 24 and Figure 25 plot each study's sensitivity (true-positive rate) against 1-specificity (false-positive rate) to create a ROC presentation. The value of these figures lies in the global picture they give of the results of each of the studies. Figure 24 demonstrates that the vast majority of the studies cluster together in a region where the sensitivity of HLA DQ2 is greater than 80%, with most studies lying above the 90% sensitivity mark. In contrast, these same studies have specificities in the range of 55% to 80%. Outlier studies are identified by author name. The best sensitivities and specificities were seen in two studies. The first, by Kaur et al.,163 was a study from India where only 4.6% of the control population was positive for HLA DQ2. The second study, by Tighe et al.,149 was conducted in a group of patients with CD and ethnically-matched control subjects from Rome, Italy. The prevalence of CD was quite high in the studied group (51%), and the frequency of HLA DQ2 in the control population of 12.2% was much lower than that observed in other Italian studies.
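
The calculation of diagnostic characteristics from crude case and control counts can be sketched as follows. This is a minimal illustration, not the report's own software: the function name and dictionary keys are assumptions made for this example, and the counts are those listed for Kaur, 2002 in Table 31. The PPV and NPV computed this way reflect the case-to-control ratio of the study sample rather than any screening population.

```python
def diagnostic_characteristics(pos_cases, n_cases, pos_controls, n_controls):
    """Sensitivity, specificity, PPV and NPV from case-control counts.

    pos_cases / n_cases: marker-positive CD cases out of all CD cases.
    pos_controls / n_controls: marker-positive controls out of all controls.
    (Hypothetical helper for illustration only.)
    """
    tp = pos_cases                   # true positives
    fn = n_cases - pos_cases         # false negatives
    fp = pos_controls                # false positives
    tn = n_controls - pos_controls   # true negatives
    return {
        "sens": tp / (tp + fn),
        "spec": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # One point on the ROC plot: (1 - specificity, sensitivity)
        "roc_point": (fp / (tn + fp), tp / (tp + fn)),
    }


# HLA DQ2 counts for Kaur, 2002 (Table 31): 34/35 cases, 13/280 controls
kaur = diagnostic_characteristics(34, 35, 13, 280)
for key in ("sens", "spec", "ppv", "npv"):
    print(f"{key}: {kaur[key]:.2f}")
```

Rounded to two decimals, these values agree with the Kaur, 2002 row of Table 31.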

The remaining outlier studies were divided into a low-sensitivity/high-specificity group (Group 1), and a high-sensitivity/low-specificity group (Group 2). All the Group 1 studies were conducted in non-Western European populations. In particular, the worst performance of HLA DQ2 occurred in a study from Chile,137 where the frequency of HLA DQ2 was very low in both the patients with CD and the control subjects. It is important to note, however, that not all non-Western populations deviated from the main cluster of studies. For example, Catassi et al.120 found that 91% of Saharawi Arabs (Algeria) with CD carried HLA DQ2 compared with 38.9% of Saharawi controls. These values are similar to those seen in most Western populations. The Group 2 studies all showed relatively poor specificity, although sensitivity was preserved. As would be expected, the control groups of these studies were either at high risk of having CD (relatives of patients with CD158, 164, 166) or drawn from a population with a known higher frequency of HLA DQ2 (individuals with diabetes161, 165). The high frequency of HLA DQ2 in these control populations therefore makes the specificity of HLA DQ2 rather poor.
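
The dependence of specificity on the control group can be made concrete: in a case-control comparison, the calculated specificity is simply the complement of the marker frequency among controls. The sketch below uses an illustrative function name and takes the control frequencies for HLA DQ2 from Table 31.

```python
def specificity_from_control_frequency(percent_positive_controls):
    """Specificity = proportion of controls that are marker-negative."""
    return 1.0 - percent_positive_controls / 100.0


# % HLA DQ2-positive controls, from Table 31
control_frequency = {
    "Kaur, 2002 (healthy controls)": 4.64,
    "Farre, 1999 (healthy controls)": 18.0,
    "Sumnik, 2000 (diabetes, EMA negative)": 49.46,
    "Farre, 1999 (relatives of CD)": 63.91,
}

for label, pct in control_frequency.items():
    spec = specificity_from_control_frequency(pct)
    print(f"{label}: specificity = {spec:.2f}")
```

The same cases compared against healthy controls or against relatives (Farre, 1999) yield very different specificities, which is the pattern described for Group 2.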

The frequency of HLA DQ8 in Western European populations with CD varies from approximately 2.7% to 6% (Table 29). The frequency is slightly higher in studies from Italy, the UK, and France (5.6% to 8% of CD patients). The frequency of HLA DQ8 in a subset of patients who had HLA testing in a large American serology screening study for CD was 22%,15 which is considerably higher than that reported in the European studies.

Table 33. Sensitivity/specificity (calculated) for HLA DQ2 or DQ8
Author, year; country | Prev of CD | % DQ2 or DQ8 in CD | % DQ2 or DQ8 in controls | Sens | Spec | PPV | NPV | Notes
Fasano, 2003; USA | 0.52 | 100 (98/98) | 59.78 (55/92) | 1 | 0.4 | 0.64 | 1 | Screened at-risk and not-at-risk populations
Catassi, 2001; Algeria | 0.37 | 96.2 (76/79) | 41.9 (57/136) | 0.96 | 0.58 | 0.57 | 0.96 | Saharawi Arabs
Lui, 2002; Finland | 0.52 | 99.62 (259/260) | 67.93 (161/237) | 1 | 0.32 | 0.62 | 0.99 | Family members of CD (controls=unaffected family members)
Balas, 1997; Spain | 0.22 | 99.06 (210/212) | 46.09 (342/742) | 0.99 | 0.54 | 0.38 | 1 | Known CD
Sumnik, 2000; Czech | 0.07 | 100 (15/15) | 87.63 (163/186) | 1 | 0.12 | 0.08 | 1 | Diabetes
Tuysuz, 2001; Turkey | 0.52 | 90.91 (50/55) | 32 (16/50) | 0.91 | 0.68 | 0.76 | 0.87 | Turkish known CD
Neuhausen, 2002; Israel | 0.31 | 100 (23/23) | 86.54 (45/52) | 1 | 0.13 | 0.34 | 1 | Bedouin Arabs
Perez-Bravo, 1999; Chile | 0.33 | 37.1 (23/62) | 15.32 (19/124) | 0.37 | 0.85 | 0.55 | 0.73 | Chileans

A small group of studies allowed the estimation of the sensitivity and specificity of having either HLA DQ2 or DQ8. The results of these studies are presented in Table 33 and Figure 25. As can be seen in the figure, these studies show wide variation. Clearly, the sensitivity of this strategy is quite high and is likely close to 100% in Western populations. The study by Balas et al.155 likely comes closest to the truth, as it used a typical case-control design in patients with known CD compared with healthy controls. The Fasano et al. study15 is the largest study and gives results similar to those obtained by Balas et al.; however, the higher frequency of HLA DQ8 in its control group compared with other studies is of concern. Once again, the remaining studies can be grouped into high-specificity/low-sensitivity (Group 1) and high-sensitivity/low-specificity (Group 2). As was the case for HLA DQ2, Group 1 consists of two studies of non-Western populations, whereas Group 2 comprises studies with first-degree relatives and a study that used patients with diabetes as the control group.

Biopsy

Using epidemiologically appropriate eligibility criteria, our comprehensive literature search did not identify any studies that specifically addressed the question of the sensitivity or specificity of biopsy for the diagnosis of CD.

However, we sought to obtain indirect evidence regarding the diagnostic performance of biopsy as a test for CD. Some data were available from studies identified for other review objectives, such as the cross-sectional screening studies, the HLA DQ2/8 studies, and studies of IELs. We also sought studies of follow-up of biopsy-negative patients suspected of CD, and studies of silent and latent CD. The findings from these studies are presented in the Discussion and in Appendix H.

Quality Assessment

Overall, the quality of the diagnostic studies assessed in the Celiac 1 objective was quite good (Appendix J, Table 1). However, 59% of the studies reported using a selected patient population that may not be representative of a clinically relevant population. This is likely related to study design. Only 11% of the studies reported whether the reference test was interpreted without knowledge of the index test. We felt that this was not a major threat to the validity of the studies.

Celiac 2: Incidence and Prevalence of CD

The literature search yielded 2,116 references (Appendix F). A first-level screen of the titles, abstracts and keywords, for articles that related to the incidence or prevalence of CD, excluded 1,506 references. Full-text versions of each of the 610 retained references were obtained and used for a second-level screen for articles, with a focus on the incidence and/or prevalence of CD. Review articles were also identified and kept for reference (n = 71). Three hundred and forty-eight out of the 610 references were excluded. The remaining 262 references were screened at a third level (Appendix F). Studies were included if they reported the prevalence and/or incidence of CD in the following groups: (1) general populations from North America or Western Europe; (2) first-degree relatives of patients with CD; (3) patients with type 1 diabetes; (4) patients being investigated for anemia; (5) patients with osteoporosis or osteopenia; (6) patients with suspected CD on the basis of their clinical presentations. We did not use any geographic restriction for the studies of populations at risk (first-degree relatives and type 1 diabetics) or of associated clinical presentations (suspected CD, anemia, or metabolic bone disease). Studies of prevalence or incidence that used AGA tests conducted prior to 1990 were excluded after discussion with the AHRQ because of potential problems with the reliability of older AGA assays. Reports which were not sufficiently explicit for data extraction also had to be excluded.179–181

We defined incidence studies as those studies that reported the total number of new cases of CD for a given territory and period, over a unit of population density. Therefore, studies of incidence where there was no population denominator were excluded. When multiple studies of incidence of CD were available for a similar country or geographic area, the most recent and/or most encompassing was selected. In general, we excluded the studies whose observation periods pertained exclusively to a period prior to 1990.

A total of 133 publications were selected. Of these, 14 publications were identified as duplicates on the basis that the same study population was reported on elsewhere, or as part of a larger cohort.122, 182–194 The remaining 119 original studies on prevalence and/or incidence of CD in the populations of interest were included and their data abstracted. Of these included studies, 42 assessed the prevalence and/or incidence of CD in a general population. Twelve of the 42 reported on the incidence of CD,128, 195–205 and 30 reported on the prevalence, either in the US (three studies206–208), Scandinavia (11 studies209–219), Italy and San Marino (seven studies126, 220–225), the UK (four studies226–229), or other countries (Spain,230 the Netherlands,231, 232 Switzerland,233 and Germany234).

Studies of the prevalence of CD in populations at risk were divided as follows: 18 studies of the first-degree relatives of CD patients,129, 167, 206, 235–249 and 34 studies in patients with type 1 diabetes.234, 250–282

Studies of the prevalence of CD in patients with associated clinical presentations were divided as follows: 12 studies in anemia and/or iron deficiency,283–294 four studies in metabolic bone disease,295–298 and 13 studies of patients with suspected CD on the basis of their clinical presentation.206, 238, 299–309 The clinical manifestations that were included in the “suspected CD category” were: chronic diarrhea, weight loss, malabsorption or abdominal pain in adults and failure to thrive, short stature, malabsorption, chronic diarrhea, and abdominal pain in children. Four studies included groups at multiple-risk levels.206, 234, 238, 272

Incidence of CD in the General Population

Table 34. Included studies of incidence of CD in the general population
Study | Country, period | Group at risk | Period related to results | Crude incidence (# cases/100,000 patient years) | Cumulative incidence (# cases/1,000 births)
Ivarsson, 2003 (duplicate: Ivarsson, 2000, ref. 193) | Sweden, 1973-97 | Children | 1997 (0–2 y) | 51 (95% CI: 36–70) | Age 2 (1995): 1.7 (95% CI: 1.3–2.1)
 | | | 1996 (2–5 y) | 33 (95% CI: 24–44) |
 | | | 1996 (5–15 y) | 10 (95% CI: 7–13) |
Weile, 1993 (duplicate: Weile, 1993, ref. 196) | Denmark, 1960-88 | Children | 1960-88 | | Age 5 (1988): 0.118
Maki, 1990 (duplicate: ref. 194) | Finland, 1960-84 | Children | 1974-83 | 3.46 (95% CI: n/r) |
Hawkes, 2000 | England, 1981-95 | Children | 1991-95 | 2.15 (95% CI: n/r) |
Magazzu, 1994 | Sicily, 1975-89 | Children | 1989 birth cohort | | Age 5 (1989): 1.16 (95% CI: 0.92–1.42)
Lopez-Rodriguez, 2003 | Spain, 1981-99 | Children 0–14 y | 1981-90 | 6.87 (95% CI: 5.26–8.83) |
 | | | 1991-99 | 16.04 (95% CI: 12.99–19.59) |
 | | Children 0–4 y | 1991-99 | 42.04 (95% CI: n/r) |
Hoffenberg, 2003 | US (Denver, Colorado), 1993-99 | Children | 1993-99 | | Age 5 (1999): 9 (95% CI: 4–20)
Jansen, 1993 | Netherlands, 1990-92 | All ages | 1991-92 | 1.0 (95% CI: n/r) |
Corrao, 1995 | Italy, 1990-91 | All ages | 1990-91 | 2.13 (95% CI: n/r) | Age 5 (1991): 0.81
Talley, 1994 | US (Olmsted County), 1960-90 | All ages | 1960-90 | 1.2 (95% CI: 0.7–1.6) |
 | | | 1980-90 | 1.7 (95% CI: n/r) |
Bodé, 1996 | Denmark, 1976-91 | Adults | 1976-91 | 1.27 (95% CI: n/r) |
Collin, 1997 | Finland, 1975-94 | Adults | 1990-94 | 17.2 (95% CI: n/r) |
Hawkes, 2000 | England, 1981-95 | Adults | 1991-95 | 3.08 (95% CI: n/r) |

n/r=not reported

The incidence of CD in North America and Western Europe was derived from studies from the following countries: US,128, 205 England,201 Italy,202 Sicily,203 Spain,204 Netherlands,200 Sweden,195 Denmark,196, 197 and Finland (Evidence Table 3, Appendix I; Table 34).198, 199 In this report, crude incidence is defined as the number of new cases per 100,000 population-at-risk per year, and cumulative incidence as the number of new cases per 1,000 live births; cumulative incidence is age-specific and its denominator reflects the total number of individuals from the same year of birth (i.e., birth cohort).
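
The two incidence measures defined above can be expressed as simple ratios. The following sketch uses hypothetical function names and invented counts chosen only to show the arithmetic; none of the numbers comes from an included study.

```python
def crude_incidence_per_100k(new_cases, person_years_at_risk):
    """New cases per 100,000 population-at-risk per year."""
    return new_cases / person_years_at_risk * 100_000


def cumulative_incidence_per_1000(cases_in_birth_cohort, live_births):
    """New cases by a given age per 1,000 live births in the same birth cohort."""
    return cases_in_birth_cohort / live_births * 1_000


# Hypothetical example: 24 new cases over 1,200,000 person-years of observation
crude = crude_incidence_per_100k(24, 1_200_000)

# Hypothetical example: 45 cases diagnosed by age 5 in a cohort of 50,000 births
cumulative = cumulative_incidence_per_1000(45, 50_000)

print(f"Crude incidence: {crude:.2f} per 100,000 person-years")
print(f"Cumulative incidence at age 5: {cumulative:.2f} per 1,000 live births")
```

Because the denominators differ (person-time versus a birth cohort), the two measures are not interchangeable and are reported in separate columns of Table 34.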

Incidence in children: The crude incidence of CD in children aged 0 to 15 years varied from 2.15 to 51 cases per 100,000 patient years.193–195, 198, 201, 204 When reported, the relative risk (RR) of CD was greatest for the 0- to 2-year age group (RR from 32.26 to 42.4)193, 195, 204 and for females (RR from 1.9 to 3.34).128, 193, 195 The cumulative incidence at age 5, when reported, varied between 0.089 and 9 cases per 1,000 live births (see Table 34).128, 196, 202, 203

The incidence of CD has been most studied in the Scandinavian countries, particularly Sweden,193, 195, 310–313 Denmark,196, 197, 313, 314 and Finland,194, 198, 199 where important disparities have been observed over time and between countries. Reports from these countries have the advantage of being derived from comprehensive prospective databases and from populations which are genetically fairly stable, shedding light on potential environmental causal exposures,195, 196 or on variations in practice patterns.

In Scandinavia, the highest incidences of CD in children were found in Sweden for the 0- to 2-year age group from 1987 to 1997, where an average of 198 new cases per 100,000 patient years (95% CI: 186–210) was observed.193, 195 This peak in incidence was followed by a rapid decline, observed during 1995-97, when incidence dropped to an average of 51/100,000 patient years (95% CI: 36–70). In contrast, the incidence of CD in children aged 2 to 4.9 years and 5 to 15 years increased only slightly over the 1973-97 period, with a peak in 1996 of 33 cases (95% CI: 24–44) per 100,000 patient years and 10 cases (95% CI: 7–13) per 100,000 patient years for these respective age groups. A cohort effect was noted in that the cumulative incidence at 2 years of age rose gradually for children belonging to birth cohorts from 1984 to 1994 (up to 4.4 cases/1,000 births [95% CI: 3.8–5.0] for the 1993 cohort), while a progressive decline was observed for birth cohorts from 1994 to 1996 (down to 1.7 cases [95% CI: 1.3–2.1] per 1,000 births for the 1995 cohort). Most of these cases were symptomatic, so these observations are unlikely to be due to changes in screening practices. Interestingly, these changes mirrored changes in the composition of infant formulas, with the highest values of a wheat/rye/barley exposure index during the years 1982-1994.

In contrast, the incidence of CD in Denmark, a neighbouring country, was significantly lower and very stable from 1960 to 1988,196 with an average incidence of 0.089/1,000 live births for that period.313 A comparison of dietary exposures between Swedish and Danish children diagnosed with CD between 1972 and 1989 showed that by the age of 8 months, the Swedish diet contained more than 40 times more gliadin than the Danish diet.313 In Finland, incidences have also been fairly stable, and have in fact decreased among infants but increased among older children.198 However, these observations date back to 1984 and therefore cannot be compared with the Swedish epidemic.

Spain has also seen an increased incidence of CD over the past 25 years, from 6.87 (95% CI: 5.26–8.83) cases/100,000/year in 1981-90 to 16.04 cases/100,000/year (95% CI: 12.99–19.59) in 1991-99,204 an observation that was correlated with an increased proportion of silent or atypical presentations at diagnosis (i.e., inferring a role for changes in clinical practice). The age at diagnosis also correlated positively with the age at which gluten was introduced in the diet.

The role of dietary exposure during infancy is also highlighted in studies from the UK, where recommendations on infant feeding, promoting breastfeeding and later introduction of starches, were published in 1974. Subsequent to these recommendations, there was a fall in the incidence of childhood CD;315, 316 however, these data are not presented in detail because we focused on reports from the past 15 years.

In contrast to incidences derived from reported cases, the incidence observed in a prospective screening protocol is not subject to variations related to practice patterns and is likely to be more comprehensive and accurate. Hoffenberg et al., from the US, conducted the only prospective CD screening study available to date.128 Between December 1993 and September 1999, a total of 22,346 newborns in Denver, Colorado were screened for HLA genotypes associated with CD and type 1 diabetes. A representative sample of at-risk HLA DRB1*03-positive infants (n=987) was followed prospectively for as long as the first seven years of life. Serological screening was performed at nine, 15 and 24 months of age, then yearly. Small bowel biopsies were recommended if the serology (tTG in most cases) was positive on two separate occasions, or in the presence of clinical suspicion. Between 1993 and 1999, 19 children were found to have evidence of CD: ten children had biopsy-confirmed CD, whereas nine children had a positive tTG result at least twice. The mean age at presentation of evidence of CD was 4.6 years (range 2.6–6.5). Compared with HLA-DR3-negative children, the RR for evidence of CD was 5.6 (1.5–21, p=0.009) and 9.1 (1.7–48, p=0.003) for those expressing one and two HLA-DR3 alleles, respectively. The RR of CD in females was 3.34 (1–10.9, p=0.048) times that of males. Taking into account the prevalence of HLA-DR mono- and heterozygotes in the same birth cohort, the authors calculated that by the age of 5, the estimated cumulative incidence of CD in the general population (defined as either biopsy-proven CD or persistently elevated tTG) was 9/1000 births (95% CI: 4–20), or 1:104 (1:49 to 1:221).
This remarkably high cumulative incidence (i.e., twice that of the highest value among Swedish children at 4 years of age, 5.0 [95% CI: 4.4–5.7]193) has to be interpreted in light of the fact that only ten of the 19 cases had been biopsied; the remaining nine cases were diagnosed on the basis of a persistently elevated tTG titre, the PPV of which the same authors reported to be only 70% to 83%.317 However, as mentioned above, these results are derived from an actual prospective and systematic screening intervention for CD, in which asymptomatic cases would be detected. In all likelihood, therefore, an important proportion of CD cases remain undiagnosed during early childhood.

Incidence in adults: The crude incidence of CD in adults varied from lows of 1.27 in Denmark197 and 3.08 in England,201 to a high of 17.2 cases per 100,000 patient years in Finland,199 where specific efforts had been undertaken to encourage screening for CD (see Table 34).

As has been observed for children, the incidence of CD in adults seems to have increased over the past 20 years.199, 201 This is largely explained by a change in practice patterns: physicians are more aware of the condition, its atypical manifestations and associated conditions, while at the same time, serological testing has become widely available. More diagnoses are therefore made on the basis of case-finding. This is reflected by the fact that the proportion of patients being diagnosed with CD in the absence of symptoms, or as a result of serological testing, has also increased.199, 201, 318–320

In Finland over the period 1975-94, Collin et al.199 observed a ten-fold rise in the incidence of CD. The authors attributed this to the use of serologic screening (physicians were actively told to screen patients with type 1 insulin-dependent diabetes (IDDM), autoimmune thyroid disease, or connective tissue diseases, women with infertility, patients with neurologic symptoms, and first-degree relatives of CD patients), to the routine performance of intestinal biopsies on all patients undergoing gastroscopy, and to the opening of open-access endoscopy clinics, which allowed all general practitioners to refer patients for gastroscopy.

In Italy, a gradual increase in the number of annual new CD diagnoses was observed between 1968 and 1992;318, 320 this increase correlated with an increased proportion of patients with subclinical presentations being identified.318, 320 Interestingly, despite the changing clinical presentation, there was no statistical difference between the histological grades at diagnosis.320

The incidence of CD in individuals of all ages varies from 1.0 in the Netherlands200 to 2.13 in Italy.202 In Italy, the RR of CD in adults ranged from 0.11 in the >60 year group to 0.33 in the 16–39 year group, compared with children.202 The RR of CD for females was 1.90 (95% CI: 1.48–2.45).202

In the US, the 30-year incidence (1960-90) for Olmsted County was 1.2 (95% CI: 0.7–1.6), and the incidence for 1980-90 was slightly higher at 1.7 (95% CI: not reported).205 This observation contrasts with the cumulative incidence of 9/1000 by age 5 reported by Hoffenberg from Denver, Colorado;128 clearly, further knowledge of the epidemiology of CD in the US is required.

The point prevalence of CD can be calculated from registers of CD cases and the size of the population at risk; we found reports of such an observation in three of the included incidence studies.199, 200, 205 The point prevalence of CD was 21.8/100,000 in Olmsted County in 1991,205 2.7/100,000 (95% CI: 11.0–14.5) in the Netherlands in 1992,200 and 204/100,000 (95% CI: 181–231) in Finland in 1994.199 Of note, the latter prevalence from Finland was observed in a community where intense efforts had been made to screen the population at risk for CD.

Prevalence of CD in the General Population—Different Geographic and Racial/Ethnic Populations

Table 35. Prevalence of CD by country
Author, yearCountryAge groupTestTotal patientsPrevalence by serologyPrevalence by biopsyNotes
Fasano, 2003USAAdultsEMA - ME; all positive EMA tested with tTG-HU2,8450.00949116/350 biopsied
Green, 2000USAAdultsEGD/biopsy1,7490.00515Not all systematically biopsied; only those with suggestive endoscopic features
Not, 1998USAAdultsIgG- and IgA-AGA - ELISA; confirmed with IgA-EMA ME or HU2,0000.00400
Fasano, 2003USAChildren1,2810.00312
Johnston, 1998UKAdultsIgA-AGA, IgA-EMA1,8230.00823
Sanders, 2003UKAdultsIgG- and IgA - ELISA; EMA-ME1,2000.019170.0100022/23 biopsied
West, 2003U.K.AdultsIgA EMA-ME, IgA-tTGA7,5270.01156
Rutz, 2002SwitzerlandChildrenIgA-EMA-ME, IgA-tTG, IgG-AGA and IgA-AGA1,4500.007590.0069010/11 biopsied
Borch, 2001SwedenAdultsBiopsy, IgA- and IgG-AGA; IgA-EMA-ME4820.014520.01867
Grodzinsky, 1996SwedenAdultsIgA-AGA; IgA-EMA1,8660.005890.00375Prevalence by IgA-EMA not reported
Ivarsson, 1999SwedenAdultsIgA- and IgG-AGA - ELISA, cut-off not recorded; IgA-EMA -ME; serum IgA level1,8940.004750.00475
Sjoberg, 1994SwedenAdultsIgG- and IgA-AGA1,5370.014310.0006513/22 biopsied
Sjoberg, 1999SwedenAdultsIgA-AGA, IgA confirmed with EMA-ME19700.001520.00152
Carlsson, 2001SwedenChildrenAGA, EMA, biopsy using Watson capsule6900.018840.01594
Riestra, 2000SpainAdultsIgG/IgA-AGA, IgA-EMA; the study was conducted as a 1) two-step protocol (determination of IgA/IgG-AGA, if positive measuring IgA-EMA); and a 2) one-step protocol (measuring IgA-EMA)1,1700.001710.002561 CD picked up when AGA and EMA was neg.
Corazza, 1997Republic of San MarinoAdultsIgA-EMA; biopsy5590.001790.00179
Hovdenak, 1999NorwayAdultsIgA- and IgG-AGA; IgA-EMA2,0690.003870.00338
Rostami, 1999NetherlandsAdultsIgA-EMA1,0000.003000.00300
Csizmadia, 1999NetherlandsChildrenIgA-EMA6,1270.012240.0050657/75 biopsied
Pittschieler, 1996ItalyAdultsIgA- and IgG-AGA; IgA-EMA; biopsy4,6150.001950.0019538 of 140 biopsied
Trevisiol, 1999ItalyAdultsIgA-EMA; biopsy4,0000.002500.00250
Volta, 2001ItalyAdults (mostly)IgA-EMA-HU; biopsy3,4830.005740.00488Prevalence of 0.57% (20/3483) if included 3 patients with normal villous but with increased IELs
Catassi, 2000ItalyChildrenIgG-AGA (7 AU); IgA-AGA (15 AU); IgA-EMA indirect IF (1:5 dilution); biopsy2,0960.00859
Catassi, 1996ItalyChildrenIgA- or IgG-AGA; confirmed with EMA and biopsy17,2010.006450.00477
Di Pietralata, 1992ItalyChildrenIgA-AGA; biopsy3,0220.006290.00596
Dickey, 1992IrelandAdultsIgA AGA4430.01129
Jager, 2001GermanyMixed - mostly adultsIgA-AGA, IgG-AGA, IgA-tTG -1500.02667Mixed group of at-risk populations, healthy group used
Kolho, 1998FinlandAdultsEMA -HU1,0700.010280.00748
Maki, 2004FinlandChildrenIgA and IgG tTG; IgA and IgG EMA - IF; total serum IgA; HLA DR, DQ2 and DQ83,6540.012590.00739
Collin, 2002FinlandMixed - mostly adultsBiopsy2,9740.00605
Weile, 2001Denmark and SwedenAdultsSerum IgA: IgG-AGA; IgA-AGA, cut-off >40 units; EMA; in cases of IgA <0.07g/L, IgG-AGA was analyzed1,5730.00254

EGD=esophagogastroduodenoscopy; IF=immunofluorescence; prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)

Table 36. Prevalence of CD by serological screening test
Screening test | Age group | Number of studies (refs.) | Total patients | Prevalence range
Primary biopsy | Adults | 2 (207, 210) | 4,723 | 0.00515 – 0.00605
IgA AGA | Overall | 2 (223, 229) | 3,465 | 0.00629 – 0.01129
 | Adults | 1 (229) | 443 | 0.01129
 | Children | 1 (223) | 3,022 | 0.00629
IgA / IgG AGA | Adults | 1 (216) | 1,537 | 0.01431
IgA AGA - IgA EMA | Overall | 6 (208, 209, 211, 217, 219, 226) | 8,831 | 0.00152 – 0.01884
 | Adults | 5 (208, 209, 211, 217, 219) | 6,999 | 0.00152 – 0.01884
 | Children | 1 (321) | 1,823 | 0.00823
IgA/IgG AGA - IgA EMA | Overall | 7 (212, 213, 218, 220, 221, 224, 227) | 30,648 | 0.00195 – 0.01917
 | Adults | 5 (212, 213, 218, 224, 227) | 11,351 | 0.00195 – 0.01917
 | Children (Italy) | 2 (220, 221) | 19,297 | 0.00645 – 0.00859
IgA/IgG AGA - IgA tTG | Mostly adults (Germany) | 1 (234) | 150 | 0.02667
IgA EMA | Overall | 7 (126, 214, 222, 225, 230–232) | 17,409 | 0.00171 – 0.01224
 | Adults | 7 (126, 214, 222, 225, 230, 231) | | 0.00171 – 0.01028
 | Children (Netherlands) | 1 (232) | 6,127 | 0.01224
IgA EMA - IgG tTG | Overall | 4 (206, 215, 228, 233) | 16,757 | 0.00312 – 0.01259
 | Adults (USA, UK) | 2 (206, 228) | 10,372 | 0.00949 – 0.01156
 | Children | 3, includes Fasano Child Group (206, 215, 233) | 6,385 | 0.00312 – 0.01259

Country of study was indicated when possible; prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)

Thirty-seven studies reported on the prevalence of CD in a general population (Evidence Table 4, Appendix I; Table 35). Three of these were conducted in the US,206–208 16 in the Scandinavian countries,184–187, 209–219, 232 eight in Italy,126, 182, 183, 220, 221, 223–225 five in the UK,188, 226–229 and five in other countries (Spain,230 Republic of San Marino,222 the Netherlands,231 Switzerland233 and Germany234). Several pairs of duplicate publications were identified, including two triplets,182–188, 211, 213, 218, 220, 321 which brought the total number of included unique articles down to 30. The articles with the most complete data were used for the report.126, 206–234 Only seven studies were conducted in a child population,206, 209, 215, 220, 221, 223, 232, 233 but one large American study included separate data for both adults and children.206 All the included studies were conducted between 1992 and 2003. A summary of the included study characteristics is presented in Table 35. A breakdown of the included studies by screening test and age group is provided in Table 36.

The prevalence of CD by serology in the general unselected populations of North America and Western Europe, ranged widely from 152 per 100,000 (0.152% or 1:658) to 2,670 per 100,000 (2.67% or 1:37). The prevalence by biopsy ranged from 152 per 100,000 (0.152% or 1:658) to 1,870 per 100,000 (1.87% or 1:53). In four of the studies, a large proportion of the serology-positive subjects did not undergo biopsy.206, 216, 224, 232
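
The equivalent ways of expressing a prevalence used in this section (proportion, percent, per 100,000, and 1:N) are related by simple arithmetic. The sketch below uses a hypothetical helper name; rounding 1/p to the nearest integer is an assumption of this example and may differ by one from ratios quoted in the text.

```python
def describe_prevalence(proportion):
    """Return (cases per 100,000, percent, N in '1:N') for a prevalence proportion."""
    per_100k = round(proportion * 100_000)
    percent = round(proportion * 100, 3)
    one_in = round(1 / proportion)  # assumption: nearest-integer rounding
    return per_100k, percent, one_in


for p in (0.00152, 0.0075, 0.0267):
    per_100k, percent, one_in = describe_prevalence(p)
    print(f"{p}: {per_100k:,} per 100,000 ({percent}% or 1:{one_in})")
```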

   Figure 26. Frequency distribution of prevalence of CD by serology among included studies

   Figure 27. Frequency distribution of prevalence of CD by biopsy among included studies

Table 37. Prevalence of CD by statistical percentiles
Percentile | Serology | Biopsy
5 | 0.0016255 | 0.0007378
10 | 0.0018050 | 0.0015761
25 | 0.0030919 | 0.0025321
50 | 0.0063702 | 0.0047672
60 | 0.0084439 | 0.0050768
75 | 0.0117290 | 0.0071429
80 | 0.0125193 | 0.0074416
90 | 0.0184088 | 0.0147536
95 | 0.0225417 | 0.0183992
100 | 0.0266667 | 0.0186722
Minimum | 0.00152 | 0.00065
Maximum | 0.02667 | 0.01867

Prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)

Among the included studies, there was no clear pattern relating prevalence to study age group or, in a consistent way, to country, with large numbers of studies clustering around a prevalence range of 0.0025 to 0.014 by serology and 0.0025 to 0.010 by biopsy (Table 35; Figures 26 and 27). In fact, for prevalence by serology, the 50th, 75th, and 80th percentiles occurred at a prevalence of 0.00637 (0.64%), 0.0117 (1.2%), and 0.0125 (1.3%), respectively, while by biopsy the 80th percentile was at a prevalence of 0.0074 (0.74%) (Table 37; Figures 26 and 27). Categorizing the studies by screening test and age group reduced the variability somewhat, but significant between-study variation persisted. There were not enough studies to divide an analysis by screening test, age group, and country simultaneously.
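
Percentiles of a set of study-level prevalences can be obtained with the Python standard library, as in the sketch below. The five serology values are a small subset taken from Table 35 for illustration, and the interpolation method ("inclusive") is an assumption of this example; the output will therefore not reproduce Table 37, whose method is not stated.

```python
from statistics import quantiles

# Illustrative subset of prevalence-by-serology values from Table 35
# (Sjoberg 1999, Ivarsson 1999, Rutz 2002, Borch 2001, Sanders 2003)
serology_prevalence = [0.00152, 0.00475, 0.00759, 0.01452, 0.01917]


def quartiles(values):
    """25th, 50th and 75th percentiles, treating the values as the full population."""
    return quantiles(sorted(values), n=4, method="inclusive")


q1, median, q3 = quartiles(serology_prevalence)
print(f"25th percentile: {q1:.5f}")
print(f"50th percentile: {median:.5f}")
print(f"75th percentile: {q3:.5f}")
```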

Among the studies conducted in the US,206–208 the prevalence ranged from 0.00312 (0.312% or 1:320; the only child population in this group) to 0.00949 (0.949% or 1:105). The largest of these, by Fasano et al.,322 found the prevalence of CD in “not at risk” populations to be 0.95% in adults, 0.31% in children, and 0.75% overall (0.0075 or 1:133). This study included a predominantly Caucasian population, although other ethnic groups were included (94% White; 3% Black; 1.5% Hispanic; 1% Asian; 0.5% other). Not et al.208 found the prevalence by EMA confirmation of initial AGA testing to be 0.004 (0.4% or 1:250) in another predominantly Caucasian population that also included other ethnic backgrounds (Caucasian [87%], African-American [11.5%], and Asian [1.5%]). Finally, Green et al.207 found a prevalence of 0.005 (0.5% or 1:200) in 1,749 patients undergoing upper endoscopy. The reason for the initial endoscopy in this study was not clearly described, and only those patients with endoscopic features suggestive of CD were biopsied, which may have led to underestimation of the true prevalence of CD. The prevalence of CD among the six Italian studies was similar to that seen in the American studies, ranging from 0.2% to 0.86%.126, 221, 223–225, 230 The prevalence of CD in other countries is presented in Table 35.

   Figure 28. Prevalence of CD by country

Only four studies demonstrated a prevalence of CD of greater than 0.015 (1.5%) (UK,323 Sweden,209, 219 Germany234), and an additional six showed a prevalence of between 0.010 (1.0%) and 0.015 (1.5%) (UK,228 Sweden,216 Netherlands,232 Ireland,229 Finland214, 215). These studies would suggest a potentially higher prevalence of CD in these countries, though it should be kept in mind that other studies from these same countries showed a prevalence of less than 1.0%, including four studies from Sweden211, 213, 216, 217 (Figure 28). Only three of the eight studies conducted in a child population demonstrated a prevalence of CD of greater than 1.0% (Finland,215 Sweden,209 Netherlands232).

   Figure 29. General population prevalence in relation to sample size

Among the 30 included studies, the point estimates for the prevalence of CD varied considerably, both by serology and by biopsy, because of differences in serological test strategies, biopsy definitions, and patient sampling, making pooled estimates unreliable. To further explore the potential sources of variability in the observed prevalence of CD, we plotted each study's prevalence against its sample size (Figure 29). This scatter diagram illustrates the distribution of the prevalence of CD among the included studies. The study with the highest reported prevalence of CD (2.67%) was also the one with the smallest sample size (150 healthy patients); it also included several other at-risk groups, which were the primary focus of that study.234 Overall, studies with the smallest sample sizes tended to produce both the highest and lowest prevalence of CD. Using an arbitrary cut-off of 1,600 patients to divide “small” and “large” sample size studies, the prevalence by serology ranged fairly evenly from 0.17% to 2.67% for the 13 small studies, while 12 of the 18 large studies were located within a range of 0.5% to 1.26% (one study did not provide prevalence by serology).

Prevalence of CD in Patients with Suspected CD

Table 38. Included studies for prevalence of CD in patients with suspected CD
Study, year; country | Clinical setting | Age group | Dx criteria | N tested | Prevalence (%)
Bardella, 1991; Italy | Referral center | Adults | Biopsy | 60 | 43.3
Bardella, 2001; Italy | Referral center | Adults | Biopsy | 80 | 50.0
Carrocio, 2002; Italy | Referral center | Adults | Biopsy | 207 | 11.6
Fasano, 2003; USA | Not reported | Adults | EMA | 1,910 | 1.5
Bode, 1993; Denmark | Referral center | Children | Biopsy | 191 | 7.3
Day, 2000; New Zealand | Referral center | Children | Biopsy | 153 | 4.6
Thomas, 1992; England | Referral center | Children | Biopsy | 381 | 7.9
Chan, 2001; Canada | Referral center | Children | Biopsy | 77 | 13.0
Chartrand, 1997; Canada | Referral center | Children | Biopsy | 176 | 17.0
Ventura, 2001; Italy | Community pediatricians | Children | Biopsy | 240 | 7.5
Fitzpatrick, 2001; Canada | Community pediatricians | Children | EMA | 92 | 1.1
Fasano, 2003; USA | Not reported | Children | EMA | 1,326 | 4.0
Hill, 2000; USA | Referral center | Children | EMA | 1,008 | 2.5
Hin, 1999; England | Community practice | All ages | Biopsy | 1,000 | 3.0

Adults: The prevalence of CD in adults suspected of the diagnosis was reported in four studies (Evidence Table 5, Appendix I; Table 38); three from Italy,300, 301, 303 and one from the US.206 The following reasons for suspecting a diagnosis of CD were documented: anemia, persistent iron deficiency, bowel disturbances, chronic intermittent diarrhea, abdominal pain, constipation, dyspepsia, severe malabsorption, tiredness and weight loss, mineral metabolism deficiencies, osteoporosis, arthralgias, arthritis, dermatitis, hypertransaminasemia, type I diabetes mellitus, infertility, and gluten intolerance in childhood not further investigated.

All three Italian studies were from referral centers, and intestinal biopsies were performed on all suspected cases, which totaled 347. The prevalence of CD was very high in these series: 43%,300 50%,301 and 12%.303

In a large study of prevalence of CD in at-risk and not-at-risk individuals in the US, a total of 1,910 adults with CD-associated symptoms or disorders underwent serological testing with EMA. Fifteen of the 28 EMA-positive subjects (53.6%) consented to a biopsy, which was confirmatory in all cases.206 The source of these patients and their mode of recruitment/referral were not reported. Based on the EMA result, the prevalence of CD in these adults with suspected CD was 1.5%.

Children: The prevalence of CD in children suspected of the diagnosis was reported in nine studies (Table 38); three from Canada,304, 305, 307 two from the US,206, 324 and one each from Denmark,302 England,308 Italy,309 and New Zealand.306 The following reasons for suspecting a diagnosis of CD were documented: abdominal pain,238, 304, 305, 307, 309 diarrhea,238, 304, 305, 308 failure to thrive/short stature,206, 238, 304–306, 309 weight loss,305 vomiting,304, 305 abdominal distension,304, 305 chronic GI symptoms,306 inflammatory bowel disease,304 family history of CD,238, 304, 306, 309 type I diabetes mellitus,206, 238, 306 iron deficiency anemia (IDA),309 thyroid disease,304 trisomy 21,206, 304, 309 as well as enamel hypoplasia, recurrent aphthous stomatitis, autoimmune diseases, IgA deficiency, and occult hypertransaminasemia.309

Five of the nine studies came from referral centers where all suspected cases (978 in total) were biopsied.302, 304–306, 308 The prevalence of CD in these children ranged from 4.6%306 to 17%.305

In a case-finding study among 26 family pediatricians in Italy, 240 children were screened with EMA based on the presence of risk factors, and 18 diagnoses of biopsy-proven CD were made, resulting in a prevalence of 7.5%.309

Three studies, two American206, 238 and one Canadian,307 reported the prevalence of CD in children with related symptoms or conditions based on EMA testing. The cumulative number of children was 2,426, and the prevalence ranged from 1.1% in the Canadian study of children with chronic abdominal pain,307 to 4.0% in the large American study of CD prevalence in at-risk and not-at-risk populations.206

All ages: Hin et al. performed a case-finding study through nine primary care clinics of central England that served a total population of 70,000 (Table 38).299 A thousand patients were enrolled for serological screening, satisfying the following entry criteria: irritable bowel syndrome, anemia, family history of CD, malabsorption symptoms, diarrhea, fatigue, thyroid disease, diabetes mellitus, weight loss, short stature, failure to thrive, epilepsy, infertility, arthralgia, or eczema. The mean age of the screened subjects was 42.8 years; 5.3% were aged under 10, and 3.1% were aged 80 to 90 years. Thirty patients were EMA-positive, all of whom were confirmed by biopsy to have some enteropathy (90% had subtotal or total villous atrophy); only one of the 30 patients had IELs alone in the absence of villous atrophy. The mean age of the 30 cases with CD was 42.8 years, and there was only one child diagnosed with CD. The prevalence of CD was 3.0%.

Prevalence of CD in Patients with Type I Diabetes

Table 39. Included studies of prevalence of CD in type I diabetes
Author, year; country | Total patients | Age group | Screening test(s) | First serology | Confirmatory serology | Biopsy proven | Biopsy criteria & description | Prevalence by serology | Prevalence by biopsy | Notes
Li Voon Chong, 2002; UK | 509 | Adults | EMA | 7 | None | n/a | None done | 0.0138 | n/a |
Talal, 1997; USA | 185 | Adults | EMA | 9 | None | 4 | ESPGAN | 0.0486 | 0.0216 | Only 5/9 biopsied
Rossi, 1993 | 211 | Children, some adults | EMA | 10 | None | 3 | ESPGAN | 0.0474 | 0.0142 | Only 3/10 biopsied
Kaukinen, 1999; Finland | 62 | Adults | EMA | | None | 7 | ESPGAN | 0.0000 | 0.1129 |
Sjoberg, 1998; Germany | 848 | Adults | AGA - IgG or IgA; EMA | 258 | 22 | 7 | Marsh | 0.0259 | 0.0083 | Only 14/22 biopsied
Sategna-Guidetti, 1994; Italy | 383 | Adults | EMA | 12 | None | 10 | Roy-Choudhury | 0.0313 | 0.0261 | 10/12 biopsied
Rensch, 1996; USA | 47 | Adults | EMA | 3 | None | 3 | Loss of villous architecture, crypt hyperplasia, and increased IELs | 0.0638 | 0.0638 |
Frazer-Reynolds, 1998; Canada | 263 | Children | EMA | 17 | None | 12 | Carey capsule; Marsh criteria | 0.0646 | 0.0456 | 17/19 biopsied
Gillett, 2001; Canada | 233 | Children | EMA or AGA | 19 | None | 14 | Not reported | 0.0815 | 0.0601 | 18/19 biopsied
Hansen, 2001; Denmark | 104 | Children | EMA or tTG | 10 | None | 9 | Partial or total villous atrophy, crypt hyperplasia and IEL infiltration | 0.0962 | 0.0865 | 9/10 biopsied
Saukkonen, 1996; Finland | 776 | Children | AGA or ARA | 76 | None | 19 | Not reported | 0.0979 | 0.0245 | Only 35/76 biopsied
Spiekerkoetter, 2002; Germany | 205 | Children | tTG IgA or IgG | 13 | None | 6 | Marsh | 0.0634 | 0.0293 | Only 8/13 biopsied
Arato, 2003; Hungary | 205 | Children | EMA | 24 | None | 17 | n/r | 0.1171 | 0.0829 |
Barera, 1991; Italy | 498 | Children | AGA IgA then if neg IgG AGA | 30 | None | 16 | Subtotal villous atrophy | 0.0602 | 0.0321 | 22/30 biopsied
Barera, 2002; Italy | 273 | Children | EMA, second EMA | 15 | 10 | 9 | Marsh; type II or III lesion | 0.0549 | 0.0330 |
Valerio, 2002; Italy | 383 | Children | EMA or IgG AGA | n/r | None | 32 | ESPGAN | n/r | 0.0836 |
Carelo, 1996; Spain | 141 | Children | IgA AGA if positive on two occasions | 12 | None | 4 | Subtotal villous atrophy | 0.0851 | 0.0284 |
Roldan, 1998; Spain | 177 | Children | IgA, IgG AGA (and known cases, and some tested with EMA) | 19 | None | 7 | ESPGAN | 0.1073 | 0.0395 | Mixed group diagnosed by different means
Juan, 1998; Spain | 93 | Children | EMA | 7 | None | 6 | ESPGAN | 0.0753 | 0.0645 |
Sigurs, 1993; Sweden | 459 | Children | AGA | 19 | None | 21 | Watson Capsule | 0.0414 | 0.0458 | 18/19 biopsied; included known CD
Agardh, 2001; Sweden | 162 | Children | AGA, EMA, or tTG IgG or IgA | 8 | 8 | 6 | As described by Carlsson et al. 1999, Pediatrics 103:1248 | 0.0494 | 0.0370 | Only 6 of 8 biopsied
Acerini, 1998; UK | 167 | Children | EMA or AGA | 11 | None | 8 | ESPGAN | 0.0659 | 0.0479 | 9/11 biopsied
De Block, 2001; Belgium | 399 | Mixed | EMA | 9 | None | 3 | No biopsy performed | 0.0226 | 0.0075 | Unclear how the 3 cases confirmed
Jager, 2001 | 197 | Mixed | tTG | 19 | None | n/r | | 0.0964 | |
De Vitis, 1996; Italy | 1114 | Mixed | IgA, IgG then IgA EMA | 121 | 55 | 63 | Marsh - “villous atrophy” | 0.1086 | 0.0566 | 78/121 biopsied
Not, 2001; Italy | 491 | Mixed | EMA | 28 | None | 28 | Intestinal biopsy; Marsh's modified classification | 0.0570 | 0.0570 |
Bao, 1999; USA | 847 | Mixed | tTG | 98 | None | 15 | n/r | 0.1157 | 0.0177 | Only 20/98 biopsied
Kordonouri, 2000; Germany | 520 | Mixed - mostly children | tTG | 23 | None | 9 | Marsh criteria | 0.0442 | 0.0173 | 10/23 not biopsied
Aktay, 2001; USA | 218 | Mixed - mostly children | EMA | 17 | None | 10 | Partial or total villous atrophy, inflammation in lamina propria with increased IELs, and hyperplasia of crypts; classified as partial or total villous atrophy | 0.0780 | 0.0459 | 14/17 biopsied
Cronin, 1997; Ireland | 101 | Mixed - mostly adults | EMA | 8 | None | 5 | n/r | 0.0792 | 0.0495 |
Schober, 2000; Austria | 403 | Mixed - mostly children | EMA | 12 | None | 6 | Modified Marsh and Crowe; Watson-type capsule | 0.0298 | 0.0149 | 11/12 biopsied
Lampasona, 1999; Italy | 287 | Mixed - mostly children | tTG IgA or IgG | 24 | None | n/a | No biopsy | 0.0836 | n/a |
Lorini, 1996; Italy | 133 | Mixed - mostly children | AGA IgA or IgG | 5 | None | n/a | No biopsy | 0.0376 | n/a |
Page, 1994; Mixed | 1785 | N/a | AGA | 73 | None | 13 | n/a | 0.0409 | 0.0073 | Only 49/73 biopsied

The literature search identified 36 studies that assessed the prevalence of CD in patients with type I diabetes (insulin-dependent diabetes mellitus [IDDM]).191, 192, 234, 250–282 Two sets of duplicate publications were identified.191, 192, 277, 282 The publications with the most complete data sets were used.277, 282 Of the 34 unique studies (Evidence Table 6, Appendix I; Table 39), seven were conducted in an adult population,257, 263, 266, 270, 273, 277, 279 21 in a child population,250–252, 254–256, 260–262, 264, 265, 267, 271, 272, 274–276, 278, 280–282 and six were conducted in a mixed population of adults and children.234, 253, 258, 259, 268, 269

All the included studies initially screened the study population with one or more antibodies. Three studies did not confirm positive serology with biopsy,265–267 whereas in nine studies confirmatory biopsies were performed in less than 75% of the screened-positive patients.253, 259, 264, 269, 272, 274, 277–279 These studies were not included in the pooled estimates of the prevalence of CD by biopsy. All the studies that reported biopsy criteria used partial villous atrophy or greater to define CD.
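The exclusion rule described above (studies in which fewer than 75% of screen-positive patients were biopsied are left out of the biopsy-based pooled estimates) can be sketched in Python as follows. The counts are taken from the Notes column of Table 39; the function name `eligible_for_biopsy_pooling` and the constant name are hypothetical, and treating exactly 75% as eligible is an assumption.

```python
MIN_BIOPSY_COVERAGE = 0.75  # threshold stated in the text

# (study, screen-positive patients, screen-positive patients biopsied)
studies = [
    ("Talal, 1997", 9, 5),
    ("Sategna-Guidetti, 1994", 12, 10),
    ("Saukkonen, 1996", 76, 35),
    ("Hansen, 2001", 10, 9),
    ("Bao, 1999", 98, 20),
]


def eligible_for_biopsy_pooling(positives, biopsied, threshold=MIN_BIOPSY_COVERAGE):
    """True when enough screen-positive patients were biopsied to pool the study."""
    if positives <= 0:
        return False
    return biopsied / positives >= threshold


for name, positives, biopsied in studies:
    coverage = biopsied / positives
    status = "pool" if eligible_for_biopsy_pooling(positives, biopsied) else "exclude"
    print(f"{name}: {coverage:.0%} biopsied -> {status}")
```

Applying a fixed coverage threshold keeps studies with largely unverified serology from pulling the biopsy-based estimate downward.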

Table 40. Summary of prevalence of CD in type I diabetes by age groups and screening test
Number of studies | References | Total patients | Age group | Screening test(s) | Prevalence by serology | Prevalence by biopsy
1 | 277 | 848 | Adults | AGA - IgG or IgA; then EMA | 0.0259 | 0.0083*
1 | 266 | 509 | Adults | EMA | 0.0138 | n/a
1 | 279 | 185 | Adults | EMA | 0.0486 | 0.0216*
1 | 263 | 62 | Adults | EMA | n/a | 0.1129
3 | 257, 270, 273 | 531 | Adults | EMA | 0.0433 | 0.0339
1 | 274 | 776 | Children | AGA or ARA | 0.0979 | 0.0245*
1 | 276 | 459 | Children | AGA | 0.0414 | 0.0458
4 | 254, 256, 267, 271 | 949 | Children | AGA - various combinations | 0.0695 | 0.0331
1 | 252 | 205 | Children | EMA | 0.1171 | 0.0829
1 | 275 | 403 | Children | EMA | 0.0298 | 0.0149
5 | 251, 255, 260, 272, 281 | 1058 | Children | EMA | 0.0624 | 0.0437
4 | 251, 255, 260, 281 | 847 | Children | EMA | 0.0661 | 0.0437
5 | 250, 261, 262, 280, 282 | 1049 | Children | EMA - combinations | 0.0721 | 0.0658
1 | 265 | 287 | Children | tTG IgA with IgG | 0.0836 | n/a
1 | 278 | 205 | Children | tTG IgA with IgG | 0.0634 | 0.0293*
1 | 264 | 520 | Children | tTG | 0.0442 | 0.0173*
1 | 269 | 1785 | Mixed | AGA | 0.0409 | 0.0073*
1 | 259 | 1114 | Mixed | IgA, IgG-AGA then IgA-EMA | 0.0494 | 0.0566*
1 | 268 | 491 | Mixed | EMA | 0.0570 | 0.0570
1 | 258 | 399 | Mixed | EMA | 0.0226 | 0.0075
1 | 234 | 197 | Mixed | tTG | 0.0964 | n/a
1 | 253 | 847 | Mixed | tTG | 0.1157 | 0.0177*

* Large proportion of serology-positive patients not biopsied;253, 259, 264, 269, 272, 274, 277–279 these were not included in the pooled analysis of prevalence by biopsy.

** No description of how diagnosis made; result not pooled.

For all the included studies, the minimum prevalence of CD in IDDM by serology was 1% and the maximum was 12%. By biopsy, the minimum and maximum prevalence was 1% and 11%, respectively. Within a given study, the prevalence by serology was almost uniformly greater than the prevalence by biopsy, as would be expected. Table 39 (individual studies) and Table 40 (pooled summaries) list the study details, the individual study estimates of CD prevalence and the pooled estimates of prevalence when appropriate.

The prevalence of CD in adults was assessed in seven studies.257, 263, 266, 270, 273, 277, 279 Six of these studies used IgA EMA as the screening test,257, 263, 266, 270, 273, 279 whereas the largest study used IgA- and IgG-AGA, followed by EMA for confirmation.277 In this last study, EMA confirmation was positive in 22 of the initially screened sample of 848 patients (2.6%), but biopsy confirmation was only performed in 14 of these patients, making the estimate of 0.83% prevalence by biopsy unreliable. The second largest study (n=509) did not confirm the EMA-positive patients with biopsy, and demonstrated the lowest prevalence of CD by EMA (1.4%) of all of the studies.266 In another study of 185 patients,279 the prevalence of CD by EMA was 4.9%, but only five of nine screen-positive patients were biopsied, making the prevalence of 2.2% (4/185) by biopsy a likely underestimation since four of the five biopsied EMA-positive patients were diagnosed with CD. A small study of 62 patients used biopsy as the screening test and found the prevalence of CD to be 11.3%, which is the highest prevalence of the group.263 The remaining studies had uniform biopsy confirmation.257, 270, 273 In these studies the prevalence of CD by EMA ranged from 3.1% to 7.9%, and the prevalence of CD by biopsy ranged from 2.6% to 6.4%.
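To make the "likely underestimation" argument concrete, the sketch below computes the two reported prevalences for the 185-patient study (9 EMA-positive, 5 biopsied, 4 confirmed) and adds an illustrative upper bound. The upper bound is not taken from the report: it assumes, purely for illustration, that every EMA-positive patient who was not biopsied actually had CD. The function name `prevalence_estimates` is hypothetical.

```python
def prevalence_estimates(n_screened, n_seropositive, n_biopsied, n_confirmed):
    """Return (by_serology, by_biopsy, upper_bound) as proportions."""
    by_serology = n_seropositive / n_screened
    by_biopsy = n_confirmed / n_screened
    # Illustrative assumption: all unbiopsied seropositive patients have CD.
    unbiopsied = n_seropositive - n_biopsied
    upper_bound = (n_confirmed + unbiopsied) / n_screened
    return by_serology, by_biopsy, upper_bound


# 185 screened, 9 EMA-positive, 5 of those biopsied, 4 confirmed
sero, biopsy, upper = prevalence_estimates(185, 9, 5, 4)
print(f"serology {sero:.4f}, biopsy {biopsy:.4f}, upper bound {upper:.4f}")
```

Under that assumption the true biopsy prevalence would lie somewhere between the reported 2.2% and roughly 4.3%, which is why incomplete biopsy confirmation makes the lower figure unreliable.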

Twenty-one studies assessed the prevalence of CD in children with IDDM.250–252, 254–256, 260–262, 264, 265, 267, 271, 272, 274–276, 278, 280–282 Six of these studies used IgA-AGA alone or in combination with either IgG-AGA or other antibody tests.254, 256, 267, 271, 274, 276 The largest study tested 776 children with AGA and ARA (reticulin antibodies), and found a prevalence of CD by serology of 9.8%.274 However, only 35 of 76 serology-positive patients were biopsied, making the reported prevalence by biopsy of 2.5% a likely underestimation. A single study of 459 patients that used IgA-AGA as the screening test found the prevalence of CD by serology to be 4.1%, and the prevalence of CD by uniform biopsy confirmation to be 4.6%.276 The second largest study (n=498) used a combination of IgA- and IgG-AGA, and found a prevalence of CD by serology of 6.0% and a prevalence of CD by biopsy of 3.2%.254 Two other studies that used IgA- and IgG-AGA271 or paired IgA-AGA measurements256 found a similar prevalence by serology of 10.7% and 8.5%, respectively, and a prevalence by biopsy of 3.95% and 2.8%, respectively. The last study in this group did not perform biopsy confirmation of the IgA- and IgG-AGA derived prevalence of 3.76%.267

Seven studies used IgA-EMA to screen for CD in children with IDDM.251, 252, 255, 260, 272, 275, 281 One Hungarian study of 205 children demonstrated a relatively high prevalence by serology and biopsy of 11.7% and 8.3%, respectively,252 whereas an Austrian study of 403 children demonstrated a relatively low prevalence by serology and biopsy of 3.0% and 1.5%, respectively.275 A study by Rossi et al.272 from the US demonstrated a prevalence of CD of 4.7%. The remaining studies demonstrated fairly consistent results, with the prevalence of CD by serology ranging from 5.5% to 7.8%, and the prevalence by biopsy ranging from 3.3% to 6.5%.251, 255, 260, 281

Three studies used IgA-tTG either alone264 or in combination with IgG-tTG.265, 278 IgA-tTG was used alone in a study of 520 children, which demonstrated a prevalence by serology of 4.4%. Ten of the 23 serology-positive patients did not undergo biopsy confirmation, making the reported prevalence of 1.7% a likely underestimation. Of the two studies that used IgA- and IgG-tTG, the first did not perform biopsy confirmation and reported a prevalence of CD by serology of 8.4%,265 whereas the other found a prevalence of CD by serology of 6.3%, and by biopsy of 2.9%, although only eight of 13 serology-positive patients underwent biopsy.278

Five studies used a combination of IgA-EMA and one or more other antibodies to assess the prevalence of CD in children with IDDM.250, 261, 262, 280, 282 In three studies, EMA was combined with AGA,250, 261, 280 in one it was combined with tTG,262 and in one it was combined with AGA and tTG.282 In one study, only the confirmed biopsy prevalence of 8.3% was reported.280 Overall, this group reported prevalences by serology ranging from 5.0% to 9.6%, and by biopsy ranging from 3.7% to 8.6%.

The remaining six studies assessed the prevalence of CD in a mixed-age population of patients with IDDM.234, 253, 258, 259, 268, 269 One study of 1,785 patients found the prevalence of CD by IgA AGA to be 4.1%. In this study, only 49 of 73 screen-positive patients underwent biopsy confirmation, making the reported prevalence by biopsy of 0.73% an underestimation.269 Another large study of 1,114 patients used IgA and IgG AGA as an initial screen, and then performed a second-level screen of the screen-positive patients with IgA EMA before moving on to biopsy.259 The EMA-confirmed prevalence of CD was 4.9%, whereas the reported biopsy-confirmed prevalence was a relatively high 5.7%. In this study, 78 of 121 initial AGA-positive patients underwent biopsy, suggesting that most of the EMA-positive patients were biopsied.

Among the two studies that used IgA EMA as the screening test in a mixed-age population, the prevalence of CD by serology was 2.3%258 and 5.7%.268 It was unclear in the first study how the final confirmed prevalence of CD of 0.75% was arrived at,258 whereas, in the other study the uniformly confirmed biopsy prevalence was 5.7%.268

The final two studies assessed the prevalence of CD in a mixed-age population of diabetics using IgA-tTG.234, 253 The prevalence of CD by serology was fairly high in both of these studies: 9.6%234 and 11.5%.253 The first study did not perform biopsy confirmation, whereas in the second study only 20 of 98 screen-positive patients were biopsied, making the reported prevalence of CD by biopsy of 1.8% a likely underestimation.

   Figure 30. Prevalence of CD in diabetes by study size

Clinical heterogeneity existed for some subgroups of this analysis making an overall pooled estimate of the prevalence of CD in children and adults with IDDM not entirely possible. However, a summary table (Table 40) is provided which presents the data grouped by age group and screening test, and Figure 30 presents the prevalence of CD in diabetes by study size. For similar studies a weighted pooled prevalence is provided, and individual study data with annotation is presented for studies that could not be pooled.
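As an illustration of the weighted pooling described above, the sketch below pools three adult EMA studies from Table 39 by summing cases and dividing by the total number screened. The report does not spell out its pooling formula, so simple sample-size weighting is an assumption here, as is the mapping of these three studies to the pooled adult EMA row of Table 40 (inferred from the matching total of 531 patients). The function name `pooled_prevalence` is hypothetical.

```python
# (study, patients screened, EMA-positive, biopsy-proven), from Table 39
adult_ema_studies = [
    ("Sategna-Guidetti, 1994", 383, 12, 10),
    ("Rensch, 1996", 47, 3, 3),
    ("Cronin, 1997", 101, 8, 5),
]


def pooled_prevalence(rows):
    """Sample-size weighted pooling: total cases divided by total screened."""
    total = sum(n for _, n, _, _ in rows)
    seropositive = sum(s for _, _, s, _ in rows)
    biopsy_proven = sum(b for _, _, _, b in rows)
    return total, seropositive / total, biopsy_proven / total


total, by_serology, by_biopsy = pooled_prevalence(adult_ema_studies)
print(f"n={total}, serology {by_serology:.4f}, biopsy {by_biopsy:.4f}")
```

Summing cases before dividing gives larger studies proportionally more influence than averaging the three study-level percentages would.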

Prevalence of CD in Relatives of Patients with CD

Table 41. Prevalence of CD in relatives of CD patients
Study, year; country | Relative Type | Index case | Screening | Dx criteria | N tested | Prevalence (%)
Polvi, 1996; Finland | 1st degree | CD in family | Biopsy | ESPGAN | 90 | 20
Holm, 1993; Finland | 1st degree | CD in family | Biopsy | Some VA | 121 | 10.7
Robinson, 1971; England | 1st degree | CD child in family | Biopsy | Some VA | 29 | 10.3
Rolles, 1974; England | 1st degree | CD child in family | Biopsy | Not reported | 72 | 5.6
Stokes, 1976; England | 1st degree | CD in family | Biopsy | Some VA | 182 | 22.5
Tursi, 2003; Italy | 1st degree | CD in family | Biopsy | Marsh I-IV | 111 | 44.1
Corazza, 1992; Italy | 1st degree | CD adult in family | AGA | Some VA | 328 | 4.0
Pittschieler, 2003; Italy | 1st degree | CD in family | EMA, TTG | Some VA | 92 | 12.0
Rostami, 2000; Netherlands | 1st degree | CD in family | AGA, EMA, Hx | ESPGAN | 338 | 10.9
Hogberg, 2003; Sweden | 1st degree | CD in family | AGA, EMA, TTG | Some VA | 120 | 8.3
Korponay-Szabo, 1998; Hungary | 1st degree | CD in family | EMA | Some VA | 943 | 9.1
Farre, 1999; Spain | 1st degree | CD in family | AGA, EMA | Some VA | 675 | 5.6
Kotze, 2001; Brazil | 1st degree | CD in family | EMA | +ve serology* | 115 | 3.5
Fasano, 2003; US | 1st degree | CD in family | EMA | +ve serology | 4,508 | 4.5
Vitoria, 1994; Spain | 1st degree | CD in family | AGA, EMA | +ve serology | 642 | 2.8
Mustalahti, 2002; Finland | 1st degree | >1 DH or CD sib | AGA, EMA | +ve serology | 466 | 9.4
Book, 2003; US | 1st degree | CD sib pairs | EMA, TTG | +ve serology | 163 | 17.2
Hill, 2000; US | 1st & 2nd degree | CD in family | EMA | +ve serology | 192 | 4.7
Fasano, 2003; US | 2nd degree | CD in family | EMA | +ve serology | 1,275 | 2.6
Korponay-Szabo, 1998; Hungary | 2nd degree | CD in family | EMA | +ve serology | 54 | 5.6
Book, 2003; US | 2nd degree | CD sib pairs | EMA, TTG | +ve serology | 82 | 19.5
Book, 2003; US | 1st cousins | CD sib pairs | EMA, TTG | +ve serology | 47 | 17.0

* EMA titre ≥ 1/5

VA = villous atrophy; DH = dermatitis herpetiformis

There were 18 studies on the risk of CD in first-degree relatives of patients with biopsy-proven CD,129, 167, 206, 235–249, four of which also provided data on the risk of CD in second-degree relatives (Evidence Table 7, Appendix I; Table 41).206, 235, 238, 239

First-degree relatives: First-degree relatives were directly evaluated with small bowel biopsy in five studies; three were performed in England in the 1970s,242, 243, 245 and two in Finland during the 1990s.129, 167 The biopsy criteria for a diagnosis of CD were not reported in one study,243 and implied at least some degree of villous atrophy in the other four.129, 167, 242, 325 The percentage of all at-risk family members studied varied from 34%245 to 100%.243 The study size varied between 29242 and 182,245 and the cumulative number of patients tested was 494. The prevalence of CD among first-degree relatives undergoing intestinal biopsy varied from 5.5%243 to 22.5%;245 the pooled prevalence was 16%.

Serological screening of the first-degree relatives of patients with biopsy-proven CD was performed in 12 studies.206, 235–237, 239–241, 244, 246–249 In seven of those studies, intestinal biopsy was performed on at least 80% of the subjects who tested positive serologically, i.e., in 84% of subjects in one study,237 and in 100% of subjects in the other six studies.236, 239, 244, 247–249 Serological screening was performed with AGA alone in one study,236 whereas the other six studies used EMA, either alone239 or in combination.237, 244, 247–249 Six studies used criteria implying some degree of villous atrophy,236, 237, 239, 244, 247, 248 whereas one study included cases with Marsh I changes.249 The study size varied from 92248 to 943239 subjects, for a cumulative number of 2,607 subjects. For the studies that required some degree of villous atrophy for diagnosis, the prevalence varied from 4%236 to 12%,248 and the mean prevalence was 7.6%. However, when Marsh I lesions were also considered diagnostic, the prevalence of CD among first-degree relatives was reported at 44.1%.249

In five other studies of first-degree relatives,206, 235, 240, 241, 246 confirmatory biopsy was not routinely performed (available in 9%246 to 58%241 of the cases), and the reported prevalence of CD was based on the serology results. EMA was used for serological screening in all of these studies, either alone,206, 240 or in combination with AGA241, 246 or tTG.235

Two of these studies were performed in families where at least two index cases prevailed and are, therefore, reviewed separately.235, 241 Ninety percent of the at-risk populations from these two studies were tested, which represents a cumulative number of 629 subjects. The prevalence of CD among these first-degree relatives from families where there are at least two index cases of known CD or dermatitis herpetiformis (DH) was 9.4%241 and 17.2%.235

The study size of the other three studies varied from 115240 to 4,508,206 and the cumulative number of first-degree relatives tested was 5,265. The prevalence of CD among these serology-tested first-degree relatives varied between 2.8%246 and 4.5%206 (mean prevalence 4.3%).
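The "mean prevalence" of 4.3% quoted above is consistent with a sample-size weighted mean rather than a simple average of the three study percentages. The sketch below shows the difference using the N tested and prevalence values from Table 41; that the report used exactly this weighting is an assumption, and the function names are hypothetical.

```python
# (study, N tested, prevalence in %), from Table 41
serology_only_studies = [
    ("Kotze, 2001", 115, 3.5),
    ("Fasano, 2003", 4508, 4.5),
    ("Vitoria, 1994", 642, 2.8),
]


def simple_mean(rows):
    """Unweighted average of the study-level percentages."""
    return sum(p for _, _, p in rows) / len(rows)


def weighted_mean(rows):
    """Average of the percentages weighted by the number of subjects tested."""
    total = sum(n for _, n, _ in rows)
    return sum(n * p for _, n, p in rows) / total


print(f"simple mean:   {simple_mean(serology_only_studies):.1f}%")
print(f"weighted mean: {weighted_mean(serology_only_studies):.1f}%")
```

Because the largest study contributes 4,508 of the 5,265 subjects, the weighted mean sits close to its 4.5% estimate, whereas the simple average is pulled down by the two smaller studies.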

Other relatives: One study from the US238 reported an EMA-based prevalence of 4.7% in 192 first- and second-degree relatives; the prevalence from each of the groups of relatives was not reported separately.

An American study by Book et al.235 studied the prevalence of CD in second-degree relatives and first cousins of CD sibling pairs (i.e., families with two affected index cases). Eighty-two second-degree relatives and 47 first cousins were tested with EMA and tTG, and the diagnosis was biopsy confirmed in 40% of the cases. The serology-based prevalence was 19.5% in second-degree relatives and 17.0% in first cousins.

Two other studies, one large (n=1,275) American study of prevalence of CD in at-risk and not-at-risk subjects,206 and one Hungarian study,239 provided data on the prevalence of CD in second-degree relatives. The EMA-based prevalence of CD in those groups was 2.6% and 5.5%, respectively (mean prevalence 2.7% on a cumulative number of 1,329 second-degree relatives).

Prevalence of CD in Patients with IDA

Table 42. Included studies of CD in adult patients with anemia
Author, year; country | No. of pts | Age group | Population | Anemia type | Screening test | First serology | Confirmatory serology | Biopsy proven | Biopsy criteria | Prevalence by serology | Prevalence by biopsy
Akerman, 1996; Israel | 93 | Adult - some teens | Out-patients with IDA (50% symptomatic) | IDA | EGD/biopsy | | | 13 | Subtotal or greater villous atrophy | n/a | 0.139785
Annibale, 2001; Italy | 71 | Adults | Asymptomatic | IDA | EGD/biopsy | | | 4 | Marsh | n/a | 0.056338
Corazza, 1995; Italy | 200 | Adults | Referred to hematology | IDA | IgA/IgG-AGA then EMA then biopsy | 16 | 10 | 10 | Not mentioned | 0.05 | 0.05
Dickey, 1997; UK | 10 | Adults | Asymptomatic, previously investigated, no gross GI cause found | IDA | IgA AGA then EMA | 4 | 3 | | Endoscopic biopsy; criteria n/r; finding of villous atrophy and IELs in duodenal biopsy | 0.3 | n/a
Howard, 2002; UK | 258 | Adults | IDA identified through lab | IDA, folate | IgA/IgG-AGA and EMA then biopsy | 28 | | 12 | Not applicable | 0.10852713 | 0.046512*
Kepczyk, 1995; USA | 39 | Adults | Mostly symptomatic out-patients with IDA | IDA | EGD/biopsy | | | 4 | Villous atrophy, crypt hyperplasia, inflammatory infiltrate | n/a | 0.102564
McIntyre, 1993; UK | 50 | Adults | Out-patients with IDA | IDA | EGD/biopsy | | | 3 | Not reported | n/a | 0.06
Oxentenko, 2002; USA | 113 | Adults | Undergoing EGD for IDA | IDA | EGD/biopsy | | | 17 | CD was defined as total or partial villous atrophy with IELs | Not applicable | 0.150442
Ransford, 2002; UK | 484 | Adults | Referred to hematology | IDA | EMA then EGD/biopsy | 17 | | 11 | Revised ESPGAN; duodenal histologic changes were graded according to Marsh I–III | 0.03512397 | 0.022727
Unsworth, 2000; UK | 483 | Adults | Blood donors | Anemia unspecified | IgA-EMA then biopsy | 32 | | 22 | n/r | 0.06625259 | 0.045549
Annibale, 2003; Italy | 59 | Adult | Pre-menopausal women with IDA | IDA | IgA tTG then biopsy | 7 | | 5 | Marsh | 0.11864407 | 0.084746**
Van Mook, 2001; The Netherlands | 35 | Adult | Asymptomatic | IDA | EGD/biopsy | | | 1 | Marsh I | Not applicable | 0.028571

*24/28 biopsied

†5 Marsh I identified by CD3

‡25/32 biopsied

**5/7 biopsied; 30 had heavy periods; CD in 1/22 with heavy periods, and 4/18 with normal periods

Table 43. Summary of prevalence of CD in adult patients with anemia by population and screening test
No. of studies | References | Total patients | Population | Screening test(s) | Prevalence by serology | Prevalence by biopsy
3 | 283, 288, 290 | 245 | Symptomatic IDA | Biopsy | n/a | 0.139
1 | 286 | 10 | Asymptomatic, previously investigated, no gross GI cause found | IgA-AGA then EMA | 0.3 | n/a
1 | 293 | 59 | Pre-menopausal women with IDA | IgA-tTG then biopsy | 0.119 | 0.085
4 | 285, 287, 291, 292 | 1,425 | Asymptomatic, serology screened | IgA-EMA, or AGA followed by EMA; all biopsy confirmed | 0.061 | 0.039
3 | 284, 289, 294 | 156 | Asymptomatic, biopsy screened | Biopsy | n/a | 0.051

Twelve studies were identified that allowed for the extraction of the prevalence of CD among patients who were evaluated for anemia (Evidence Table 8, Appendix I; Table 42).283–294 In all of these, IDA was the primary focus of the study or made up the cause of anemia in the majority of the study patients. Tables 42 and 43 summarize the characteristics of the included studies.

Three studies assessed the prevalence of CD in IDA patients with GI symptoms.283, 288, 290 The prevalence of CD in these studies ranged from 10.3% to 15% of the studied group. One small study assessed the prevalence of CD in a group of patients who had IDA but no identified GI source.286 In this study, the prevalence of CD by AGA and confirmed by EMA was 30%.

In another study, the authors assessed the prevalence of CD in pre-menopausal women with IDA.293 The overall prevalence of CD in this population was found to be 11.9% by tTG (7 of 59), and 8.5% after biopsy confirmation. CD was found in 1 of 22 (4.5%) women with heavy periods, and in 4 of 18 (22%) women with normal menstrual flow.

Four studies assessed the prevalence of CD in asymptomatic IDA patients by serology.285, 287, 291, 292 Two of these used EMA screening,291, 292 whereas the other two initially screened with AGA and then confirmed with EMA.285, 287 The prevalence of CD in this group ranged from 2.3% to 5.0%. Another three studies assessed the prevalence of CD by biopsy in asymptomatic IDA patients, finding it to be between 2.9% and 6%.284, 289, 294

Prevalence of CD in Patients with Low Bone Mineral Density (BMD)

Table 44. Prevalence of CD in patients with low BMD
Author, year; country | Population | BMD definition | Test | Prevalence
Lindh, 1992; Sweden | 92 consecutive patients with idiopathic osteoporosis screened for CD; 91% F (mean age 66 ± 12 Y); and 9 M (mean age 50 ± 12 Y) | Bone mineral content by photon absorptiometry (SPA) of non-dominant forearm; criteria n/r; mean proximal SPA 0.97 g/cm²; mean distal SPA 0.67 g/cm² | IgA-AGA ELISA; cut-off was 2 SD above the mean of blood donors; confirmatory biopsy in 6 - criteria n/r | 11/92 (12.0%) AGA +ve; 3% (3/92) biopsy confirmed
Gonzalez, 2002; Argentina | 127 postmenopausal women with osteoporosis; age (Y): mean 68, range 50–82; 747 controls; age (Y): mean 29, range 16–79 | History of non-traumatic fractures and lumbar spine and/or femoral neck BMD below T-score -2.5 (DXA) | IgA and IgG-AGA ELISA; cut-off levels: for IgA - 15 AU/mL; for IgG - 20 AU/mL; positives confirmed with IgA-EMA-ME positive at 1:5 dilution; EMA positives confirmed with biopsy showing villous atrophy, crypt hyperplasia and IEL >30% | 1/127, or 7.9 per 1,000 (95% CI: 0.2–43.1); test positivity: AGA found in 8 of 127 (6.3%) pts on level 1; 1 of these 8 pts was EMA positive on the 2nd level and eligible for biopsy, which established a diagnosis of CD in 1 (0.9%)
Mather, 2001; Canada | Idiopathic low BMD; mean age 57 Y; range 18–86 Y; 81.3% (78) F; 18.7% (18) M; all osteopenic; 45/78 F and 13/18 M osteoporotic | DXA; osteopenia: BMD <1 SD of mean sex-matched peak BMD; osteoporosis: BMD <2.5 SD of mean sex-matched peak BMD | IgA-EMA-ME titers of ≥1:10; and biopsy confirmation based on subtotal or greater villous atrophy | 7 (7.3%) of 96 pts were EMA +ve; all biopsies were negative based on subtotal or greater villous atrophy; prevalence of 0%
Nuti, 2001; Italy | 255 females with osteoporosis; mean age 66.6 Y, range 36–65 Y | DXA; BMD below T-score -2.5 | IgA-AGA ELISA, cut-off level of 10 AU/mL; IgA-tTG cut-off >22 AU; confirmatory biopsy criteria n/r | 53/255 (20.8%) +ve IgG-AGA; 24/53 +ve for tTG antibody (9.4%); intestinal biopsy in 10/24 resulted in 6 (2.4%) with confirmed CD

F=female; M=male; DXA=dual X-ray absorptiometry; Y=years; n/r=not recorded

Four articles were identified that assessed the prevalence of CD in patients with low BMD (Evidence Table 9, Appendix I).295–297, 326 The study characteristics and the definitions used for low BMD, osteopenia, and CD are presented in Table 44. Three of these studies determined BMD using dual energy X-ray absorptiometry (DXA) and defined osteoporosis as a BMD more than 2.5 standard deviations below the peak bone mass of sex-matched controls,295, 297, 326 whereas the other used single photon absorptiometry (SPA).296 One study included patients with non-traumatic fractures,295 whereas in the others idiopathic osteoporosis was sufficient for inclusion. All four studies used serology screening with biopsy confirmation of screen-positive patients. Three studies relied on AGA testing as the initial screen,295, 296, 326 followed either by biopsy296 or by further confirmatory serology testing with EMA295 or tTG326 prior to biopsy. The final study screened with EMA-ME, with positive screens moving on to biopsy.297 Two studies defined the biopsy criteria for CD and used a fairly standard but rigid requirement of subtotal or greater villous atrophy.295, 297
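The DXA-based definitions above can be illustrated with a minimal sketch. It assumes the T-score is the patient's BMD minus the sex-matched peak (young adult) mean, divided by the peak standard deviation, and applies the cut-offs quoted in Table 44 (below -1 SD for osteopenia, below -2.5 SD for osteoporosis). The reference mean and SD, the function names, and the handling of values exactly on a cut-off are hypothetical and are not taken from any of the included studies.

```python
def t_score(bmd, peak_mean, peak_sd):
    """Number of SDs by which a BMD value lies above or below the sex-matched peak mean."""
    return (bmd - peak_mean) / peak_sd


def classify_bmd(t):
    """Apply the cut-offs quoted in Table 44 (strict inequalities are an assumption)."""
    if t < -2.5:
        return "osteoporosis"
    if t < -1.0:
        return "osteopenia"
    return "normal"


# Hypothetical reference values (g/cm2), for illustration only.
PEAK_MEAN, PEAK_SD = 1.00, 0.10

for bmd in (0.72, 0.85, 0.95):
    t = t_score(bmd, PEAK_MEAN, PEAK_SD)
    print(f"BMD {bmd:.2f} g/cm2 -> T-score {t:.1f} -> {classify_bmd(t)}")
```

Note that this sketch treats the categories as mutually exclusive, whereas one of the included studies described all of its patients as osteopenic and a subset as osteoporotic.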

In the studies that used AGA as the initial screen, it was positive in 6% to 21% of the patients with osteoporosis. However, in these studies CD was confirmed by biopsy in only 0.9% to 3% of patients.295, 296, 326 The study that used EMA-ME as a screening test identified potential CD cases in 7.3% of patients, but none of these met the authors' biopsy criteria for CD.297
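The gap between screen positivity and biopsy-confirmed prevalence can be recomputed from the counts in Table 44. The sketch below also attaches an exact binomial (Clopper-Pearson) interval to each confirmed prevalence. The choice of interval method is an assumption, since the report does not state how the one interval it quotes (0.2 to 43.1 per 1,000 for 1 of 127) was obtained; the function names are illustrative only.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root_increasing(f, lo=0.0, hi=1.0, iters=100):
    """Bisection for a function that rises from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x events among n patients."""
    lower = 0.0 if x == 0 else _root_increasing(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    upper = 1.0 if x == n else _root_increasing(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


# (biopsy-confirmed CD, patients screened), as listed in Table 44.
confirmed = {"Lindh": (3, 92), "Gonzalez": (1, 127), "Mather": (0, 96), "Nuti": (6, 255)}

for study, (x, n) in confirmed.items():
    lo, hi = exact_binomial_ci(x, n)
    print(f"{study}: {1000 * x / n:.1f} per 1,000 "
          f"(95% CI: {1000 * lo:.1f} to {1000 * hi:.1f})")
```

For the study with 1 confirmed case among 127 women, this method gives limits close to the interval quoted in Table 44, which suggests, but does not prove, that an exact binomial method was used there.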

Quality Assessment

Using the cross-sectional checklist, the overall quality of reports of the included studies for the Celiac 2 objective was marginal to fair (Appendix J, Table 2). For example, most of the studies did not report whether the patients were consecutively enrolled, an omission that leaves open the possibility of selection bias.

Celiac 3: Risk of Lymphoma in CD

Literature Search

Out of 379 references resulting from the literature search on CD and lymphoma, 150 were initially excluded because they did not directly address this topic (Appendix F). Of the 229 studies that were screened using full reports of the studies, 211 were excluded for the following reasons: review articles (n=73; 19.3% of level 2 articles); did not address the topic (n=33); assessed the risk of CD in lymphoma (n=28); were uncontrolled studies, including surveys (n=53); or, studied the basic mechanisms and the pathogenesis of lymphoma in CD (n=24).

The following eight exclusions were made from the 18 publications that reached level 3 (i.e., eligibility criteria): duplicate publications (n=7)127, 327–332 (for two of these reports,328, 332 patients originated from the same center [i.e., General Hospital, Birmingham] and were studied during the same periods as in other reports,329, 330, 333 and we could not rule out that they belonged to the same series); and data that were not extractable (n=1).334

Table 45. Included studies for risk of lymphoma in CD
Study, year; country, period | Study type | Participants | Risk of lymphoma | Mortality | Other observations
Cottone, 1999; Sicily, 1980-97 | Retrospective cohort | 228 CD patients; 76% females; mean age at Dx 34.7; 98% adult Dx; 100% on strict GFD | Incidence NHL 3.1%; SIR NHL 3.75, p<0.01 | SMR all causes 3.8 (1.9–6.7) |
Holmes, 1989; England, 1941-85 | Prospective cohort | 210 CD patients; 55% females; 51% on strict GFD | Incidence NHL 4.3%; SIR NHL 42.7 (19.6–81.4) | SMR not reported | SIR NHL vs GFD compliance: strict GFD 44.4; gluten diet 100
Logan, 1989; Scotland, 1979-1986 | Prospective cohort | 653 CD patients; 60% females | | Mortality from NHL 2.6%; SMR from lymphoma 31, p<0.001; SMR all causes 1.9 (1.5–2.2) | SMR childhood Dx 1.4 (0.4–3.7); SMR adult Dx 1.9 (1.5–2.3)
Askling, 2002; Sweden, 1964-94 | Retrospective cohort | 11,019 CD patients; 59% females; mean age at Dx 17.4 (range 0–>70) | Incidence NHL 0.34%; SIR NHL 6.3 (4.2–125) | SMR from NHL 11.4 (7.8–16); SMR all causes 2 (1.8–2.1) | SIR NHL childhood Dx 1.9 (0.4–5.5); SIR NHL adult Dx 7.0 (5.0–9.5)
Collin, 1996; Finland, 1970-93 | Prospective cohort | 383 CD patients; 73% females; mean age at Dx 41.8 (range 16–78); 75% on strict GFD | Incidence NHL 0.26%; SIR NHL 2.66 (0.07–14.8) | |
Corrao, 2001; Italy, 1962-94 | Prospective cohort | 1,072 CD patients; 76% females; mean age at Dx 35.7 (range 18–>50); 59% on strict GFD | | SMR from NHL: 69.3 (40.7–112.6); SMR all causes: 2.0 (1.5–2.7) | SMR age 18–29 at Dx: 2.5 (0.5–7.3); SMR age 30–49 at Dx: 2.4 (1.3–4.0); SMR age >50 at Dx: 1.9 (1.3–2.6); SMR strict GFD: 0.5 (0.2–1.1); SMR unlikely GFD: 6.0 (4.0–8.8)
Green, 2003; USA, 1981-2000 | Prospective cohort | 381 CD patients; 64% females; mean age at Dx 44 ± 18 | Incidence NHL 1.3%; SIR NHL 6.2 (2.9–14) | |
Selby, 1979; Australia, 1959-78 | Retrospective cohort | 93 CD patients; 67% females; mean age at Dx 40 (range 14–70) | Incidence NHL 4.3%; SIR NHL 4.94, p<0.0005 | |
Delco, 1999; USA, 1986-95 | Case-control | 458 CD patients; 4% females | OR NHL 4.53 (2.01–10.23) | |

Dx=diagnosis; SIR=standardized incidence ratio; NHL=non-Hodgkin's lymphoma; SMR=standardized mortality ratio

The nine controlled studies selected for data extraction were grouped as follows: eight cohort studies,333, 335–341 and one case-control study342 (Evidence Table 10, Appendix I; Table 45). Mortality data from one controlled study in refractory CD is presented at the end of this section for reference.343

Measures of Risk

Eight out of nine studies were cohort studies, either prospective or retrospective. The standardized incidence ratio (SIR) was the most commonly reported measure of association; it was calculated as the incidence observed in the patient cohort divided by the expected incidence from the control population, along with a measure of precision (i.e., its 95% CI). The results were expressed either as SIRs of lymphoma or as the standardized mortality ratio (SMR) from lymphoma (SMR-NHL). The all-cause mortality was also reported in some studies.
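As a worked illustration of this definition, the following sketch divides an observed count by an expected count and attaches an exact Poisson interval to the ratio. The interval method is an assumption, because the report does not say how each study derived its limits, and the function names are illustrative. The inputs are the 9 observed and 0.21 expected NHL cases given for the Birmingham series in this section.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu)."""
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _root_increasing(f, lo, hi, iters=200):
    """Bisection for a function that rises from negative to positive on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_poisson_limits(observed, alpha=0.05):
    """Exact two-sided limits for the mean of a Poisson count."""
    bracket = observed + 10 * math.sqrt(observed + 1) + 10  # generous upper search bound
    lower = 0.0 if observed == 0 else _root_increasing(
        lambda mu: (1 - poisson_cdf(observed - 1, mu)) - alpha / 2, 0.0, bracket)
    upper = _root_increasing(
        lambda mu: alpha / 2 - poisson_cdf(observed, mu), 0.0, bracket)
    return lower, upper


def standardized_ratio(observed, expected, alpha=0.05):
    """SIR or SMR (observed / expected) with limits scaled by the expected count."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected


sir, lo, hi = standardized_ratio(9, 0.21)
print(f"SIR {sir:.1f} (95% CI: {lo:.1f} to {hi:.1f})")
```

With these inputs the ratio is about 42.9 and the limits fall close to the published 19.6 to 81.4. The small difference from the published point estimate of 42.7 is consistent with rounding of the expected count.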

It was not possible to pool these measures of risk, since SIRs by definition incorporate variables inherent to each population. The attributable risk (AR) was calculated whenever the incidence rates of NHL in CD patients and in the age-adjusted general population were available.
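The AR calculation amounts to a risk difference. The sketch below uses the Sicilian cohort figures given later in this section (7 observed and 1.824 expected NHL cases among 228 patients). Expressing the expected incidence as expected cases divided by cohort size is an assumption made here for illustration, as is the function name.

```python
def attributable_risk(observed_cases, expected_cases, cohort_size):
    """Return observed incidence, expected incidence, and their difference (AR)."""
    observed_incidence = observed_cases / cohort_size
    expected_incidence = expected_cases / cohort_size
    return observed_incidence, expected_incidence, observed_incidence - expected_incidence


obs, exp_, ar = attributable_risk(7, 1.824, 228)
print(f"observed {obs:.1%}, expected {exp_:.1%}, AR {ar:.1%}")
```

Computed without intermediate rounding, the difference is about 2.3 percentage points; the 2.2% quoted in this section corresponds to subtracting the rounded values (3% and 0.8%).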

Study Characteristics

There were eight cohort studies (five prospective333, 336, 338–340 and three retrospective335, 337, 341) and one case-control study.342 Two studies were from Italy,335, 339 two from the UK,333, 336 two from Scandinavia,337, 338 two from the US,340, 342 and one from Australia.341 The observation periods varied from 7 years336 to 44 years (1941-85),333 and the mean duration of patient follow-up varied from 6 years335, 339–341 to 18.6 years.333 Patients were either selected from a national patient register,336 from hospital discharge databases,337, 342 or represented all consecutive cases from a single333, 335, 338, 340, 341 or multiple339 institution(s). The cohort sizes varied from 93 patients341 to 11,019 patients;337 55% to 76% of patients with CD were female, except for the study by Delco et al.,342 which used discharge diagnoses databases from the US Veterans Affairs hospitals (4% female CD patients). The mean age at diagnosis of CD was reported in six studies; in four studies, the diagnosis of CD was made almost exclusively in adulthood.335, 338, 339, 341 The mode of presentation was reported in four studies.335, 338, 339, 341 Adherence to a GFD was reported in five studies,333, 335, 338, 339, 341 and could be used in the analysis in three of them.333, 338, 339 Control data for the cohort studies were derived from local and national mortality data and cancer registers.

Types of Lymphomas

The total number of lymphomas diagnosed in each study and their histological type were not uniformly reported. Of the 84 lymphomas that were mentioned within these nine studies, 64 were referred to as “non-Hodgkin lymphoma (NHL)” not otherwise specified, one as “lymphoma,” nine as “enteropathy-associated T-cell lymphoma (ETCL),” five as “B cell lymphoma,” two as “large cell lymphoma,” and one each as a “T-cell other than ETCL,” “lymphosarcoma” (currently classified as small cell lymphoma), and “histiocytic medullary reticulosis” (currently termed hairy-cell leukemia). Logan et al.336 reported that they found “mostly lymphosarcomas (i.e., small-cell lymphomas) or reticulum-cell sarcomas (i.e., large-cell lymphomas) as well as two Hodgkin's lymphomas,” whereas the remaining authors systematically excluded Hodgkin's lymphomas from their respective analyses.

Incidence of Lymphoma and Related Mortality Data

The case definition of CD differed between the reports of institutional series and those derived from database analysis. The results will therefore be presented differently according to each of these two study designs.

Institutional series: By institutional studies, we mean reports on the evolution of cases consecutively diagnosed with CD and followed in one or several selected institution(s) over a specific period. Six out of the nine controlled studies were performed in that setting; in five out of six studies, the data originated from a single referral center.333, 335, 338, 340, 341 The sixth study is the product of a collaborative effort between nine Italian centers.339 In these studies, all cases were biopsy-proven CD.

Holmes et al., from Birmingham, England, reported on a series of 210 biopsy-proven CD patients diagnosed and followed between 1941 and 1985.333 This series was originally reported by Harris in 1967,329 and reviewed in 1976330 and in 1989333 by Holmes. By this third publication, the authors had excluded all non-biopsy-proven cases of CD, as well as the cases of cancer that arose either prior to or within 12 months of diagnosis of CD. The length of follow-up was a minimum of 13 years, with 17.4 patient-years for men and 19.4 patient-years for women. Based on the original publication by Harris, we can assume that a large proportion of these patients (80% in Harris' series) were diagnosed with CD in adulthood. There were nine cases of NHL, compared with an expected 0.21, resulting in a SIR-NHL of 42.7 (95% CI: 19.6–81.4), which was the highest degree of risk for lymphoma reported among the controlled studies we identified.

Green et al.340 prospectively followed 381 patients from New York City with biopsy-proven CD diagnosed between 1981 and 2000, most of whom were of European descent. The mean age at CD diagnosis was 44 ± 18 years, and the duration of CD-related symptoms prior to diagnosis was 5 ± 8 years. The mean follow-up was 6 ± 11 years, for a total of 1,977 patient-years following the diagnosis of CD. There were a total of nine cases of NHL, occurring any time before or after the diagnosis of CD, leading to an attributable risk of NHL from CD of 120.2 cases per 100,000 patient-years. The SIR-NHL for lymphoma diagnosed at any time was 9.1 (95% CI: 4.7–13), and the SIR-NHL for lymphoma diagnosed at least one month after the diagnosis of CD was 6.2 (95% CI: 2.9–14).

Cottone et al.335 reported on 228 patients with biopsy-proven CD who were followed from 1980 to 1997 at a large referral center in Sicily. Ninety-eight percent of the patients had been diagnosed with CD during adulthood, and the mean age at diagnosis was 34.7 years. The mean duration of follow-up was 6 years (range: 1 month to 17 years). No case of refractory CD was mentioned. There were seven cases of NHL, compared with an expected number of 1.824 (SIR-NHL of 3.75 [p<0.01]). The cumulative incidence of NHL was 3%, compared with an expected 0.8%, leading to a risk difference or AR of 2.2%. The mean age at diagnosis of lymphoma was 59.4 years, and the mean time from the diagnosis of CD was 6.5 years. Lymphomas occurring prior to or within 6 months of CD diagnosis were excluded.

A large Italian multicenter study by Corrao et al.339 prospectively followed 1,072 patients with CD from 1962 to 1994, totaling 6,444 patient-years. The mean follow-up was 6 years, and all patients were diagnosed with CD during adulthood (mean age at diagnosis of CD 35.7 years). The outcomes were measured strictly in terms of mortality, i.e., mortality from NHL and from all causes. Events occurring at the time of CD diagnosis were included. There were 16 deaths from NHL. The SMR-NHL was 69.3 (95% CI: 40.7–112.6), whereas the SMR for death from all causes (SMR-all causes) was 2.0 (95% CI: 1.5–2.7), showing that the risk of death from NHL in CD is disproportionately elevated.

Selby et al.341 reported on a series of 93 patients with CD who were followed at a single institution in Australia between 1959 and 1978, for a mean duration of 6 years. Patients presented either during their teenage years or in adulthood, all were symptomatic at the time of diagnosis, and there were no refractory cases. There were four patients with NHL (simultaneous CD and lymphoma diagnosis included), compared with an expected 0.081 (SIR-NHL 4.94, p<0.0005).

Collin et al.338 reported on a prospective cohort of 383 patients with CD, diagnosed and followed at a single institution over the 1970-93 period, for a mean follow-up of 8.1 years (3,107 patient years in total). The mean age at diagnosis was advanced: 41.8 years, with a range of 16 to 78 years. Seventy-five percent of the patients adhered to a strict GFD and 82% of patients were symptomatic at the time of CD diagnosis. Simultaneous lymphoma and CD diagnoses were not excluded. There was a single case of lymphoma, compared with an expected 0.4 (SIR-NHL 2.66 [95% CI: 0.07–14.8]). As well, the 10- and 15-year survival of CD patients did not differ significantly from those of the general population.

Large database and register series: Logan et al.336 reviewed the death certificates of CD patients belonging to a comprehensive register of CD patients that has existed in Scotland since 1979, constituting a cohort of 653 CD patients gathered from 1979 to 1986. There were 17 deaths attributed to lymphoma, compared with an expected 0.55. Both Hodgkin's lymphoma and NHL were included, as were lymphomas diagnosed simultaneously with CD. The SMR-lymphoma was 31 (p<0.001), which was disproportionately increased compared with the SMR-all causes, which was 1.9 (95% CI: 1.5–2.2).

Askling et al.337 reported on the largest CD patient cohort (n=11,019), gathered from a comprehensive Swedish database of hospital discharge diagnoses from 1964 to 1994. It was not possible to ascertain how the diagnosis of CD was made or confirmed. The mean age at diagnosis of CD was 17.4 years (range 0 to >70), and the mean follow-up was 9.8 years (range 0–32), for a total of 97,236 patient-years. The ascertainment of outcome was achieved through the Swedish cancer register, as well as the register of causes of death. Lymphomas arising prior to or within 12 months of CD diagnosis were excluded, as were the incident lymphomas found at autopsy. There were 38 cases of NHL, and a SIR-NHL of 6.3 (95% CI: 4.2–125) was calculated. The SMR-NHL was 11.4 (95% CI: 7.8–16), which was disproportionately elevated compared with the SMR-all causes (2.0 [95% CI: 1.8–2.1]).

Delco et al.342 used the database of discharge diagnoses from all US Veterans Affairs hospitals to gather a total of 458 CD patients hospitalized between 1986 and 1995. The concomitant diagnoses received by those patients were compared with those of five controls per CD patient, randomly selected from the same year's discharge database (total 2,692 controls). The mean age of the CD group was 63.8 ± 12.4 years and the mean age of the control group was 59.7 ± 14.8 years (p<0.001). Ninety-three percent of the patients with CD were white, compared with 74% of the control subjects (p<0.0001). The odds ratio (OR) of NHL (OR-NHL) in CD was 4.53 (2.01–10.23).
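Because the underlying case and control counts are not given here, the odds ratio itself cannot be recomputed. What can be checked is the internal consistency of the reported interval. The sketch below assumes that 2.01 to 10.23 is a 95% interval built symmetrically on the log-odds scale, as is usual for odds ratios; under that assumption the geometric mean of the limits should equal the point estimate, and the width of the interval gives the standard error of the log odds ratio. The function name and the 95% level are assumptions, not statements from the source.

```python
import math


def log_scale_summary(lower, upper, z=1.959964):
    """Recover the implied point estimate and log-scale SE from a ratio's interval.

    Assumes the interval is exp(log(estimate) +/- z * SE).
    """
    log_mid = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return math.exp(log_mid), se


implied_or, se_log_or = log_scale_summary(2.01, 10.23)
print(f"implied OR {implied_or:.2f}, SE of log(OR) {se_log_or:.3f}")
```

The implied estimate agrees with the reported OR of 4.53 to two decimal places, which is consistent with a log-scale interval. The same check is not appropriate for the SIRs and SMRs in Table 45 if those limits are exact Poisson limits, since such limits are not symmetric on the log scale.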

Role of a GFD

The impact of GFD compliance was analyzed and reported in only two of the nine studies. Holmes et al.333 reported the SIR of NHL in patients on a strict GFD (SIR 44.4) versus those who did not adhere to a GFD (SIR 100). Corrao et al.339 observed that the mortality from all causes was lower in patients on a strict GFD than in those who were unlikely to be GFD-compliant (SMR 0.5 [95% CI: 0.2–1.1] and 6.0 [95% CI: 4.0–8.8], respectively). Although compliance could not be directly ascertained in the study by Askling,337 the SIR of lymphoma 1 to 4 years after diagnosis was 9.7 (95% CI: 6.3–14), whereas it dropped to 3.8 (95% CI: 2.2–6) five or more years after diagnosis, suggesting that the risk of lymphoma decreases over time on a GFD.

Risk of Lymphoma Versus Symptoms

The mode of presentation leading to the diagnosis of CD was not commonly reported. The reports from Italy335, 339 were unique in that they both detailed the circumstances in which CD was diagnosed, portraying their cohorts as largely asymptomatic, since 45%335 and 70%339 of their patients had subclinical presentations, i.e., mild symptoms, anemia, or detection through screening. Conversely, it is reasonable to suggest that the studies that used hospital discharge diagnoses of CD as entry criteria would be largely made up of symptomatic CD patients. Unfortunately, it is not possible to compare the measured risk of lymphoma in the Italian studies with that in the other reports, because of the great disparities among them in populations, data collection, and analyses.

The presence or absence of symptoms at the time of CD diagnosis was not evaluated as a risk factor for lymphoma per se. Corrao et al.339 did, however, analyze the impact of the mode of presentation on the mortality from all causes in CD. They observed that patients diagnosed with mild symptoms or by antibody screening did not show any relevant excess mortality, compared with the symptomatic group (SMR 1.2 [95% CI: 0.1–7.0] and 2.5 [95% CI: 1.8–3.4], respectively).339

Impact of the Age at Diagnosis of CD

Several studies analyzed the risk of lymphoma with respect to the age at diagnosis of CD. Patients who were diagnosed with CD during adulthood were either 1) asymptomatic during childhood or 2) symptomatic but eluded the diagnosis. For the latter circumstance, authors have referred to “diagnostic delay” as a symptomatic period in the absence of diagnosis or treatment. The impact of the diagnostic delay was analyzed in two studies.336, 339 Corrao et al.339 compared the mortality from all causes in patients who had suffered a diagnostic delay of more than 10 years, one to 10 years, or less than 1 year (no diagnostic delay), and found that the longer the untreated symptomatic period, the greater the mortality from all causes (SMR 3.8 [95% CI: 2.2–6.4], 2.6 [95% CI: 1.6–4.1], and 1.5 [95% CI: 0.9–2.3], respectively). Logan et al.,336 on the other hand, reported opposite results: while the SMR-all causes was significantly greater than 1 for their entire cohort (1.9 [95% CI: 1.5–2.2]), all-cause mortality for those CD patients diagnosed only in adult life despite an obvious childhood illness typical of CD was similar to that of other CD patients diagnosed in adult life. A difference in methodology might explain this discrepancy, since the ascertainment of outcomes was derived from registers in Logan's study and was probably not as accurate and reliable for outcomes such as the presence or absence of symptoms during childhood.

Logan et al.336 also reported that the all-cause mortality was increased in the patients diagnosed as adults, but not those who were diagnosed as children (SMR 1.9 [95% CI: 1.5–2.3] and 1.4 [95% CI: 0.4–3.7], respectively).

The patients from Corrao's cohort were exclusively diagnosed with CD as adults. The SMR-all causes for patients diagnosed between 18 and 29 years was not significantly different from 1.0, in contrast with that for patients diagnosed later in life: 2.5 (95% CI: 0.5–7.3) for those diagnosed at ages 18 to 29 years, versus 2.4 (95% CI: 1.3–4.0) for those diagnosed at ages 30 to 49 years and 1.9 (95% CI: 1.3–2.6) for those diagnosed at age >50 years.

Askling et al.345 reported on 11,019 patients with CD, diagnosed at all ages, and found that the SIR-NHL was not significantly greater than one in CD patients who were diagnosed during childhood, in contrast with those who were diagnosed as adults (SIR-NHL 1.9 [95% CI: 0.4–5.5] for diagnoses made at ages 0 to 19 years compared with 7.7 [95% CI: 4.9–12] for those diagnosed between 20 and 59 years). Part of the increased risk in adults may be explained by the fact that in some of these cases the diagnosis of lymphoma is made simultaneously with or soon after that of CD. However, cases of lymphoma diagnosed within 12 months of CD diagnosis were excluded from Askling's study, so the risk of lymphoma after an adult CD diagnosis remains elevated independently of cases with simultaneous presentation.

Risk of Lymphoma in Refractory CD

We were unable to identify a single source of controlled data on the risk of lymphoma in refractory CD. There was one indirect source of controlled evidence on the mortality in CD. Nielsen et al.,343 from Denmark, published the mortality data from 98 patients with CD diagnosed between 1964 and 1982, 24% of whom were treated with prednisone because they did not respond to a GFD, i.e., probable refractory CD. The mortality in CD exceeded that of the general population (controlled for age and sex) by a factor of 3.4 (p<0.025); in GFD-responders, this factor was 2.2 (p<0.025), whereas it was 5.8 (p<0.005) in the non-responders. The causes of death were poorly documented and therefore will not be described here.

Quality Assessment

The overall quality of the included studies was good (Appendix J, Tables 35). For example, the assessment of outcomes was complete in the included studies.

Celiac 4: Consequences of Testing for CD

Out of 1,199 citations that were identified by the search strategy for the Celiac 4 objective, 140 met the level 1 screening criteria (excluded 1059) (Appendix E). Of these, 126 met the level 2 screening criteria (excluded 14). At level 3, 35 articles satisfied the screening criteria (Evidence Table 11, Appendix H)346–380 (excluded 72 articles at level 3). Eleven relevant articles were identified in other celiac objectives: five from Celiac 2;381–385 four from Celiac 3;331, 335, 336, 343 and two from Celiac 5.386, 387

The search strategy did not identify any studies that would allow us to address the specific benefits and harms of testing with different strategies for CD. The consequences such as false-positive results were dealt with in Celiac 1. We address the response to treatment in the sections that follow.

For the consequence of osteoporosis/fracture, an additional search was conducted with the search terms osteoporosis and CD, and five additional relevant studies were identified.388–392

The consequences that were included in this review were: 1) costs, 2) patients complying with treatment, 3) response to treatment in terms of symptoms, and 4) clinical outcomes such as reduced risk of complications (osteoporosis, mortality, anemia).

Given the recent recognition that the number of subclinical and silent CD cases may be eight times that of classically symptomatic cases, it is important to determine if the clinical outcomes vary according to type of clinical presentation. Where possible, results of the analysis according to type of clinical presentation are presented.

Part A

Most papers included in the consequences of testing for CD dealt with patients (who were newly diagnosed) after they initiated a GFD. Most studies evaluating the consequences of nutritional status were before/after studies. In total, 15 studies dealing with either nutritional status, weight, body mass index (BMI) and body composition, were identified.346–350, 352, 357, 359, 361, 363–365, 369–371

Seven studies were case control,347–349, 352, 357, 364, 369 one a cohort study,346 and in seven studies, the patients acted as their own control group.350, 359, 361, 363, 365, 370, 371

Eight studies were based on children with CD,347, 349, 350, 352, 357, 359, 369, 370 three studies were based on adolescents with CD,363–365 and four studies were based on adults with CD.346, 348, 361, 371

There were five studies that evaluated costs of screening as a consequence.360, 366, 379, 380, 382

Type 1 diabetes and CD. Four studies evaluated diabetes and CD in children.347, 357, 359, 370 Three studies were from Europe (UK,347 Hungary,359 and Finland370) and one was from Australia.357 Two were case-control studies347, 357 and two studies had patients with CD act as their own controls.359, 370 All the studies assessed the effect of a GFD (range 3–12 months) on the control of type 1 diabetes.

The UK study347 evaluated 230 children with type 1 diabetes who were screened for CD with serology. Those children with positive serology were biopsied. Eleven children were diagnosed with CD and followed longitudinally. The control subjects were children diagnosed with type 1 diabetes who had negative serology. The controls were matched for age, sex and duration of diabetes in a 2:1 ratio (22 controls:11 cases). At baseline, the weight (standard deviation score; SDS), BMI SDS and HbA1c of the cases were statistically lower than those of the controls. No statistical difference was noted for height SDS, C-peptide level and insulin requirements. Also, the cases (type 1 diabetes with positive CD serology) received significantly less intensive insulin regimens compared with controls. Six type 1 diabetic children with CD participated in the GFD. After 12 months of a GFD, the difference seen in the BMI SDS between the cases and controls was reversed. HbA1c levels did not improve significantly on a GFD. Insulin dose requirements increased for both cases and controls, but still did not significantly differ from each other. Insulin regimens were not statistically different between cases and controls after a GFD.

The Australian study357 included children and adolescents with coexisting type 1 diabetes and CD, who were identified from a database of the Diabetes Center at the Royal Alexandra Hospital for Children. CD had to be biopsy-proven. Twenty patients (5M:15F) were enrolled out of 36 patients identified on the database. Forty control patients from the same database were matched for age, sex and duration of IDDM. No explicit criteria for screening from the database were given in the study. At baseline, the current height SDS, current weight SDS, BMI SDS and HbA1c were not significantly different from those of controls. Compliance with a GFD was based on dietary records, which classified patients' diets as: no detectable gluten; trace of gluten; or gluten containing. For compliance, 30% of patients were classified as adhering to a strict GFD, 30% consumed trace amounts of gluten, and 40% had a significant amount of gluten in their diet. No differences were detected in growth parameters or HbA1c according to compliance with a GFD.

The Hungarian study359 included 205 children with type 1 diabetes who were randomly selected for CD screening. None of these patients was suspected of having CD. Twenty-four children were positive for EMA and 17 (7 boys and 10 girls) had subtotal villous atrophy. At baseline, the height of the children with CD and type 1 diabetes was normal compared with that of children with only type 1 diabetes, but the BMI of the 17 children was significantly lower (14.2 vs 16.3 kg/m2) than that of controls. After three months of a GFD, BMI significantly increased (14.2 vs 16.8 kg/m2). Furthermore, significant increases in insulin requirements (0.64 U/kg vs 0.48 U/kg) occurred after a GFD. The percentage of HbA1c did not change on a GFD compared with baseline (7.82% versus 7.67%).

The study from Finland by Saukkonen et al.370 retrospectively screened 776 children with type 1 diabetes over a 2.7 year period with serology and, if positive, jejunal biopsy. Eighteen children (2.3%) had confirmed CD. HbA1c levels did not change after introduction of a GFD. Correlation of height SDS and mean weight for height were not compared post-GFD.

Body composition and anthropometrics. Six studies specifically detailed body composition after a GFD.348–350, 352, 369, 371 Of these studies, four examined children,349, 350, 352, 369 and two included adults.348, 371

Of the studies conducted in adult patients with CD, one was from Italy348 and the other from Argentina.371 In the Italian case-control study, 212 treated patients with histologically-confirmed CD were assessed. Of these, 71 (33.4%) (51 women and 20 men) were asymptomatic, had maintained a constant body weight during the previous 6 months, and were on a strict GFD. Forty-three of the patients were diagnosed as children (28 women and 15 men; average age 5.2 years) and 28 were diagnosed as adults (23 women and 5 men; average age 28 years). The average duration of the GFD was ≥2 years. For each patient, there were two sex- and age-matched healthy controls (142 controls). Body composition was calculated by means of DEXA. The weight and BMI of female CD patients were lower than those of the controls (55.5 kg vs 58.7 kg, p=0.004 and 20.9 kg/m2 vs 22.4 kg/m2, p=0.03). The height and BMD were not significantly different, although BMD for those diagnosed as adults was lower than in controls. Fat mass (22.9% vs 27.5%, p<0.05) and lean mass (38.8% vs 40.5%, p<0.03) were also significantly lower in cases versus controls. The weight (69.2 kg vs 73.3 kg, p=0.03), height (175 cm vs 178 cm, p=0.05) and BMI (21.9 kg/m2 vs 23.5 kg/m2, p=0.05) of male patients were significantly lower than in controls. Fat mass (13.9% versus 16.8%, p<0.05) and lean mass (55.5% versus 56.7%, p<0.03) were also significantly lower than in controls.

The study from Argentina by Smecuol et al.371 enrolled 47 (41 females, 6 males) unselected, consecutive patients with newly diagnosed CD (diagnosed between Sept 1991 and Oct 1993). Twenty-five patients were re-evaluated in 1995 (24 females and 1 male). The diagnosis of CD was based on clinical features of classic and atypical symptoms, with positive small bowel biopsy and positive serology. Three patients were asymptomatic; the rest had classical features of CD. After 12 months, all patients on an initial GFD had improved. In the study, the patients acted as their own controls; 15 patients adhered strictly to the GFD, while 10 were on a partial GFD. Patients on a strict GFD consumed fewer calories than patients who were poor compliers (p<0.05). After treatment, fat mass (18.2 kg, p<0.0001) and bone mass (2 kg/m2, p<0.002) increased significantly. Lean tissue mass did not increase. Body weight (55.7 kg, p<0.0001), BMI (22.2 kg/m2, p<0.001) and triceps skinfold thickness (15.8, p<0.0001) increased significantly; mid-arm muscle circumference and muscle mass did not change. Patients who adhered more strictly to the GFD tended to demonstrate greater increases, although the trend was not significant.

Of the four studies that evaluated children, two were from Italy,349, 352 one was from the Netherlands,350 and one was from India.369 Both Italian studies were case-control studies, whereas in the Netherlands study the patients acted as their own controls. In one of the Italian studies, by Barera et al.,349 29 consecutive children (14 boys and 15 girls) with a diagnosis of CD were enrolled (mean age 9.54 ± 3.42 yr). Diagnosis was according to ESPGAN criteria. Four patients had classic symptoms, while the rest had atypical CD. The patients were studied over 1.02 ± 0.15 years of GFD. Each patient was age- and sex-matched to a healthy control (n=29). At baseline, children with CD weighed less than the controls (28.3 ± 11 kg vs 34.5 ± 14.1 kg, p=0.04), and had lower lean mass of limbs (8.4 ± 4.8 kg vs 10.8 ± 4.7 kg, p=0.0013), less fat mass (4.6 ± 3.5 kg vs 7.5 ± 4.9 kg, p=0.006), a lower percentage of fat mass (17.4 ± 8.3% vs 23.7 ± 8.4%, p=0.002) and lower bone mineral content (1067.2 ± 451.3 g vs 1317 ± 553.8 g, p=0.006). Height, BMI, lean mass, and ratio of lean mass to height did not differ from controls at baseline. After an average of 1 year on a GFD in 23 children, no significant differences were found in weight, height, BMI, lean mass, lean mass to height, lean mass of limbs, fat mass, percentage of fat mass or bone mineral content (BMC), compared with controls. Compliance was good in all patients as assessed by EMA (only three subjects were still positive).

The second Italian study by Rea et al.,352 enrolled 23 children (8 boys and 15 girls, mean age 4.7 ± 0.76 yr) from Jan 1992 to Dec 1994, according to ESPGAN criteria. They were sex- and age-matched to healthy controls from the ambulatory clinic. At baseline, the height, BMC, arm muscle area (AMA), triceps skinfold (TSF), subscapular skinfold (SSF), and fat area index (FAI) were significantly lower than in controls. The BMI and weight for height index (WHI) were not different. After a GFD, all the parameters improved compared with the patients' own values before the GFD: height, BMC, AMA, BMI, TSF, SSF, FAI and WHI all improved significantly. When patients post-GFD were compared with controls, height was still significantly lower (p=0.01), but the remaining values did not differ significantly. The blood chemistry of these patients was also assessed after a GFD. The hemoglobin, iron, protein, albumin, triglycerides, calcium, and zinc levels were significantly different from the baseline values; however, transferrin, cholesterol, phosphorus and alkaline phosphatase levels were not different.

The study from the Netherlands by Boersma et al.,350 enrolled 28 children (9 boys and 19 girls) with newly diagnosed CD (between Jan 94 to Jan 95). All children had classic symptoms and had positive small bowel biopsies. After 3 years of a GFD, the BMI SDS and height SDS improved significantly (p<0.0001 for both). The initial improvement of BMI SDS was seen in the initial 6 months with subsequent gradual improvement. The height SDS improved continuously over the 3 year period, and the improvement was significant.

In a study from India by Poddar et al.,369 104 children evaluated for CD between Sept 1997 and Dec 1998 were included. All children had diarrhea, failure to thrive or pallor as a clinical presentation. Fifty-seven were diagnosed as having CD (by modified ESPGAN score) and the remaining 47 were controls. Seven children did not respond to a GFD, were diagnosed with other diseases, and were excluded. The mean follow-up of patients after starting a GFD was 19.6 ± 8 months (range 4–36 months). The remaining 50 children had a dramatic response to the GFD. Symptoms subsided in 16 ± 9.8 days (range 4–30) and all showed significant weight gain (66% ± 14% vs 86% ± 11% of expected, p<0.001). Height improved, but the change was not significant (88 ± 5% vs 94 ± 5% of expected, p=not significant). Seventeen percent of the children had poor compliance with the GFD. No attempt was made to subdivide patients into poor versus good compliance.

Nutritional status. Two studies looked at nutritional status with biochemical markers.

In the study from Finland by Kemppainen,346 the nutritional status of newly diagnosed patients with CD before and after a GFD was reported. Forty patients with CD diagnosed between Nov 1988 and Dec 1990 were included. All had abdominal symptoms. Diagnosis was made on the presence of partial villous atrophy (eight patients), subtotal villous atrophy (17 patients) or total villous atrophy (15 patients). On mean histomorphometric index, there was a statistically significant trend (p=0.004) comparing partial villous atrophy (0.018 ± 0.003), subtotal villous atrophy (0.0015 ± 0.002) and total villous atrophy (0.013 ± 0.002). When biochemical measurements were examined according to grade of villous atrophy, significant differences were seen for ferritin (p<0.01) and transferrin (p<0.05). Serum ferritin was still significantly lower in total villous atrophy, as were erythrocyte folate levels, when sex was standardized in an analysis of variance. Severity of villous atrophy also correlated with ferritin, erythrocyte folate, and serum vitamin B12. Abnormal values of serum protein, vitamin A, and vitamin B12, were low. There were no abnormal vitamin E levels. Villous atrophy improved in all patients within 12 months of a GFD. Two patients had subtotal villous atrophy, 29 had partial villous atrophy and three had normal villi after a GFD. Six patients withdrew from the study. BMI increased after a GFD, as did most of the biochemical measurements. One patient with subtotal villous atrophy still had a low hemoglobin value. Of the 29 patients with partial villous atrophy, three had low folate levels, seven had low hemoglobin, one had low vitamin B12, one had low protein, five had low vitamin A, five were low in ferritin, five had low iron, and ten patients had low zinc levels. Only one patient (out of three) who had normal villi also had low hemoglobin levels.

In the study from Italy, by Bardella et al.,361 26 adults (five male and 21 female, mean age 42.2, range 22–81) with malabsorption and biopsy-confirmed CD were enrolled. They were followed for a mean of 55.4 months (range 13–137 months) on a GFD. Eight patients remained in good health with normal blood tests. The remaining 18 patients had abnormalities despite GFD. No correlation was noted with severity of symptoms of malabsorption and biochemical abnormalities. Iron deficiency was found in five patients. Abnormal calcium, phosphorus, alkaline phosphatase and/or bone density was found in seven patients. Macrocytic anemia was found in four patients. Clinical symptoms were seen in 11 patients. No correlations between abnormal values and grade of histology on biopsy were found.

Compliance. Three studies were identified that looked at compliance.363–365 All studies were conducted in Italy and assessed an adolescent population.

In the first study of adolescents that looked at dietary compliance, Fabiani et al.363 evaluated 28 biopsy-proven CD patients (17 females and 11 males). These 28 adolescents were selected from a group of 6,315 students, age 11 to 14 years, who had previously been screened for CD. All were advised to start a GFD. Twenty-three of the 28 patients participated in this study. The mean follow-up duration was 23 ± 7 months (range 9–3 months). Fifty-two percent (12/23) were on a strict GFD and 47% (11/23) partially adhered to the diet. Improvement in most patients was seen after starting a GFD. Weight gain was reported in 12 patients (52%)—11 had increased height velocity and appetite, eight had disappearance of symptoms of abdominal pain, six had resolution of diarrhea, five had disappearance of anemia and three had disappearance of recurrent aphthous stomatitis. Three patients did not demonstrate any change.

The second study, also by Fabiani,364 was a 5-year case-control study that enrolled two groups of patients. The first group (group A) included subjects between the ages of 11 and 14 years, who were diagnosed as a result of a mass screening program. The second group (group B) were patients diagnosed due to typical symptoms of CD between 1985 to 1986. All patients had biopsy-proven CD according to ESPGAN criteria. All patients were followed for 5 years and advised to start a GFD. Twenty-seven patients were in group A and 22 agreed to participate; 24 patients were in group B and 22 agreed to participate. There were no differences between the patients in group A and group B in terms of BMI and height SDS. No difference was found between the two groups in terms of symptoms. Adherence to the treatment was significantly lower in patients from group A compared with group B. There were a significantly greater proportion of patients in group B that demonstrated strict adherence to a GFD (15/22; 68%) compared with patients in group A (5/22; 23%).
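The adherence comparison above (15/22 strict in group B versus 5/22 in group A) can be checked with a simple 2x2 test. The sketch below uses an uncorrected Pearson chi-square; the report does not say which test the authors used, so the choice of test is an assumption made only for illustration.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Counts taken from the text above.
# Group B (diagnosed from symptoms): 15 strict, 7 not strict.
# Group A (diagnosed by screening):   5 strict, 17 not strict.
stat = chi_square_2x2(15, 7, 5, 17)

# Standard chi-square critical value for 1 degree of freedom at alpha = 0.05.
CRITICAL_005_DF1 = 3.841

print(f"chi-square = {stat:.2f}")
print("exceeds 0.05 critical value:", stat > CRITICAL_005_DF1)
```

With these counts the statistic is well above the 0.05 critical value, which is consistent with the report's statement that the difference in adherence was significant.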

The third study to look at compliance evaluated 306 teenage patients with CD (mean age 15.9 yr; range 10–27 yr) recruited consecutively from a CD clinic.365 Of the patients, 186 (60%) were female and 120 were male. Diagnosis of CD was biopsy confirmed. A recall questionnaire was used to evaluate diet and compliance. Compliance was recorded in three categories: 1) strict gluten-free diet (n=223 [73%]); 2) occasional relapse (n=46 [15%]); and 3) gluten-containing diet (n=37 [12%]). Eighty percent of the female patients, compared with 64.2% of the male patients, adhered to a strict diet (p=0.012). Compliance also varied with age, with older age associated with less compliance (p=0.05). Growth status was analyzed according to compliance with a GFD: the mean standardized height, the relative weight for age, and the relative weight for height did not differ significantly between the compliance groups. Symptom scores were relatively good among all groups, with no statistically significant differences. School performance was not significantly different between good and poor compliers.
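As a quick consistency check on the three compliance categories reported for this study, the counts can be summed and converted to percentages. This is only an arithmetic sketch using the counts quoted above; the dictionary keys are labels chosen for illustration.

```python
# Counts per compliance category as quoted in the text.
counts = {
    "strict gluten-free diet": 223,
    "occasional relapse": 46,
    "gluten-containing diet": 37,
}

total = sum(counts.values())

# Percentage of the cohort in each category, rounded to whole percents.
shares = {label: round(100 * n / total) for label, n in counts.items()}

print("total patients:", total)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

The categories add up to the 306 enrolled patients, and the rounded shares match the 73%, 15% and 12% given in the text.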

Costs. Five studies included an assessment of costs involved in different screening strategies.360, 366, 379, 380, 382

Harewood et al.366 performed a decision analysis to compare costs of serological testing versus small bowel biopsy (AGA vs EMA versus small bowel biopsy) for diagnosis of CD. The analytic technique used was a cost minimization and the viewpoint was third-party payer. A sensitivity analysis was conducted. The authors demonstrated that initial screening with EMA is the least costly strategy for diagnosis in a low to medium risk population.

Gomez et al.382 evaluated a screening algorithm for CD in 1,000 consecutive subjects who were screened while attending a central laboratory. Gomez and colleagues compared two screening protocols: (1) a three-level screen with IgG/IgA-AGA antibodies at the first level, then IgA-EMA, and finally intestinal biopsy; and (2) tTG-GP and total IgA as the first-line screen, with EMA for positive patients followed by intestinal biopsy. The analytic framework and viewpoint were not stated. In this study, a comparative cost analysis was performed. They found that the combination of a highly sensitive test at the first step with a highly specific test at the second step appears to be a more reliable screening mechanism.

Zaccari et al.,379 in an Italian model, proposed a four-level screening protocol for children at least 15 months of age, including: 1) AGA, 2) EMA, 3) intestinal permeability, and 4) small bowel biopsy. In this study, they evaluated only the total costs at each level of screening.

Atkinson et al.,360 in a Canadian study, evaluated the operating costs of EMA in the diagnosis of CD using a cost-minimization model with a decision analytic approach with three strategies. The analytic perspective used was the societal viewpoint, and costs were discounted at 5% per annum. A one-way sensitivity analysis of all probability and cost estimates was performed. Incremental costs of the GFD were estimated from a survey of 25 patients which resulted in a lifetime incremental cost of $44,000. If a small bowel biopsy was performed initially, the cost was $997; for EMA followed by small bowel biopsy, the cost was $866. The total cost was $3,714, which resulted in an incremental cost savings of $2,177 if small bowel biopsy had been performed first. In the sensitivity analysis, the specificity of EMA would have to be greater than 95% to make EMA least expensive.
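Atkinson et al. discounted costs at 5% per annum. The sketch below shows what that kind of discounting does to a stream of yearly costs. The annual cost, the 40-year horizon, and the end-of-year payment timing are hypothetical assumptions for illustration only and are not taken from the study.

```python
def present_value(annual_cost, years, rate=0.05):
    """Present value of a constant annual cost paid at the end of each year."""
    return sum(annual_cost / (1 + rate) ** t for t in range(1, years + 1))

# Hypothetical inputs (NOT values from Atkinson et al.).
hypothetical_annual_cost = 1000.0
hypothetical_years = 40

undiscounted = hypothetical_annual_cost * hypothetical_years
discounted = present_value(hypothetical_annual_cost, hypothetical_years)

print(f"undiscounted total: {undiscounted:,.0f}")
print(f"discounted at 5%:   {discounted:,.0f}")
```

Because later years are weighted less, a lifetime cost such as the incremental cost of a GFD is considerably smaller after discounting than the simple sum of yearly costs.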

Part B

There were 27 studies that examined the response of various endpoints to a GFD.

One Italian study,354 used a case-control design to evaluate the effect of a GFD on thyroid status. The study by Annibale et al.,358 evaluated the impact of a GFD on anemia and iron deficiency in newly diagnosed CD cases identified from screening of adults with IDA in Italy. In a case-control study, Ciacci et al.351 investigated the impact of a GFD on pregnancy outcomes, and Addolorato et al.374 evaluated the impact of a GFD on anxiety and depression in a population of CD patients in Italy. Mortality was evaluated in seven cohort studies.331, 335, 336, 343, 362, 367, 368 Seventeen studies assessed either change in BMD or fracture as an endpoint in individuals with CD.

Thyroid study. In the Italian study,354 241 consecutive adults with biopsy-confirmed CD were enrolled between Jan 1996 and July 1998 (177 women and 64 men). Forty percent of patients had classical symptoms, 44% had atypical symptoms and 16% had silent CD. Two hundred and twelve patients, matched for age, sex and ethnic origin, were used as controls. All newly-diagnosed CD patients were started on a GFD and patients with hypo- or hyperthyroidism were started on appropriate medical therapy. Thyroid dysfunction was found in 73 (61 women and 12 men) of 241 patients with CD, and in 24 (19 women and 5 men) of the 212 patients in the control group (p<0.0005). The difference was statistically significant for women when divided by sex (p<0.0005). Hypothyroidism was diagnosed in 31 patients (12.9%) and nine controls (4.2%) (p<0.003); it was subclinical in 29 CD patients and eight controls and overt in the remaining patients. The difference was only significant for women (p=0.0045). Twenty-one patients and four controls had non-autoimmune hypothyroidism. Ten patients and five controls had autoimmune hypothyroidism. Hyperthyroidism was diagnosed in three patients and seven controls; it was subclinical in two patients and five controls. Autoimmune thyroid disease with euthyroidism was present in 39 patients and eight controls. The difference was only statistically significant in women (p<0.0005). At diagnosis, the BMI, hemoglobin, iron, and albumin levels were similar between patients with thyroid disease and those without. After 1 year of a GFD, 128 patients were reassessed. Ninety-one patients had normal thyroid function, whereas 37 had some impairment. Compliance with the diet was not different between the two groups. Subclinical hypothyroidism improved in 10/14 patients with non-autoimmune hypothyroidism. 
Three of five patients with autoimmune hypothyroidism shifted to autoimmune thyroid disease with euthyroidism; four out of five patients with no improvement in thyroid function had poor compliance with diet. Significant improvement in nutritional indices was also seen with BMI in females, HBG in both sexes, and serum albumin and serum iron in both sexes.
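The thyroid dysfunction counts quoted above (73 of 241 CD patients versus 24 of 212 controls) can be turned into proportions and a crude odds ratio. The sketch below uses the Woolf (log-based) confidence interval; the study itself reports only p-values, so the odds ratio and its interval are an illustrative calculation, not a result from the paper.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Woolf 95% CI for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts taken from the text above.
cd_total, cd_dysfunction = 241, 73
ctrl_total, ctrl_dysfunction = 212, 24

cd_pct = 100 * cd_dysfunction / cd_total
ctrl_pct = 100 * ctrl_dysfunction / ctrl_total

odds_ratio, lower, upper = odds_ratio_with_ci(
    cd_dysfunction, cd_total - cd_dysfunction,
    ctrl_dysfunction, ctrl_total - ctrl_dysfunction,
)

print(f"CD patients with thyroid dysfunction: {cd_pct:.1f}%")
print(f"controls with thyroid dysfunction:    {ctrl_pct:.1f}%")
print(f"crude odds ratio: {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that lies entirely above 1 points in the same direction as the significant difference reported by the authors.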

Iron deficiency. In this Italian prospective study,358 190 consecutive patients (160 women and 30 men) who were referred to the GI department from the hematology department for IDA between Jan 1994 and May 1997 were examined. Twenty-six patients were diagnosed with CD (24 women and 2 men); average age 31.3 years (range 20–72). Seventy-seven percent of patients had total villous atrophy and 23% had subtotal atrophy; repeat endoscopy with biopsy was performed after 6 months. After starting a GFD, 20 patients (18 women and 2 men) were followed for 24 months. After 6 months, 14 of the 18 female patients (77%) recovered from IDA. Only 5/18 reversed from iron deficiency as defined by normal ferritin levels. At 12 months, 17/18 recovered from IDA. Nine patients reversed from iron deficiency. After 24 months, the same patient still had not recovered from IDA. Ten patients (55%) reversed their iron deficiency. Of the two males, at 6 months of a GFD, only one recovered from anemia but not from iron deficiency (low ferritin). At 12 months, both patients reversed their anemia and iron deficiency. At 24 months, further increases in ferritin were observed. In a subgroup of patients that had repeat small bowel biopsies at 6 and 12 months, there was a significant inverse correlation between increases in Hb concentrations and decreases in histological scores of duodenitis. This study demonstrated that recovery from IDA occurs within the first 6 to 12 months, but reversal from iron deficiency occurs in 50% of cases (predominantly premenopausal women). Long-term follow-up of ferritin results and small bowel biopsies in subjects with CD would be helpful to determine if iron deficiency resolves completely.
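The time course in the 18 women followed on a GFD can be tabulated from the counts quoted above. This is an arithmetic sketch only. The variable names are illustrative, and the 24-month anemia count of 17 is inferred from the statement that the same single patient had still not recovered.

```python
N_WOMEN = 18

# Months on GFD -> number of women meeting each endpoint (from the text).
recovered_from_anemia = {6: 14, 12: 17, 24: 17}  # 24-month value inferred
normal_ferritin = {6: 5, 12: 9, 24: 10}

def pct(count, total=N_WOMEN):
    """Percentage of the cohort, to one decimal place."""
    return round(100 * count / total, 1)

for month in (6, 12, 24):
    print(
        f"{month:>2} months: anemia resolved {pct(recovered_from_anemia[month])}%, "
        f"ferritin normal {pct(normal_ferritin[month])}%"
    )
```

The computed values (for example 77.8% and 55.6%) are slightly higher than the 77% and 55% in the text, which suggests the report truncated rather than rounded. The gap between the two columns shows that anemia resolved well before iron stores did.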

Pregnancy outcomes. In this case-control study from Italy by Ciacci et al.,351 297 women with CD were enrolled. Three types of analyses were used. Analysis A was a case-control study between untreated women (n=94; at least one pregnancy when symptoms of CD were present and led to eventual diagnosis) and treated CD women (n=31; at least one pregnancy after 1 year of a GFD). At baseline, weight, height and body mass index were similar between the two groups. However, the treated group was significantly younger than the untreated group (37.3 ± 12 yrs vs 22.4 ± 1.6 yrs, p<0.01), which may have biased the results. The number of pregnancies per woman was also lower for the treated group (2.72 ± 0.16 vs 1.6 ± 0.11, p<0.0001). The number of abortions per woman (0.489 ± 0.085 vs 0.032 ± 0.032, p<0.0001), as well as the abortion to pregnancy ratio, was much lower for the treated group compared with the untreated group (0.153 ± 0.027 vs 0.024 ± 0.024, p<0.005). Subgroup analysis taking into account the age at diagnosis demonstrated that for those women diagnosed at age 30 years or less (n=27), the number of abortions per woman was 0.556 ± 0.156 and the abortion to pregnancy ratio was 0.234 ± 0.066. The prevalence of abortion in pregnancies was 17.8% in untreated CD patients, compared with 2.4% in treated patients (p<0.001). The RR of abortion was 8.9. Low-birth-weight baby to pregnancy ratio (0.126 ± 0.037 vs 0.024 ± 0.024, p<0.03) was significantly lower in the treated group. The duration of breast feeding was significantly longer for the treated group (2.77 ± 0.52 vs 7.03 ± 1.17, p<0.0003). The threatened abortion to pregnancy ratio and premature delivery to pregnancy ratio were not significantly different between untreated and treated CD women. For the subgroup of women <30 years (n=27), birth weight, baby to pregnancy ratio, and duration of breast feeding, did not alter the statistical significance. 
The prevalence of low birth weight babies in nonabortive pregnancies was 12.7% for untreated patients and 2.4% for treated patients (p<0.05). The RR of low birth weight babies was 5.84 times greater in the untreated group compared with the treated group.

In Analysis B, women with CD were all untreated and then analyzed depending on whether diarrhea was present or not. The authors found that the abortion to pregnancy ratio and the premature delivery ratio were found to be lower in CD women without diarrhea compared with those women with diarrhea, although the difference was not statistically significant.

In Analysis C, the effect of a GFD on pregnancy outcome was analyzed. The study examined 12 women with CD after 1 year of a GFD (own control); there was at least one pregnancy without treatment. All outcomes were better in the group of women on the GFD: number of pregnancies 2.5 ± 1.24 versus 1.08 ± 0.29 (p<0.003); number of abortions per woman 1.08 ± 1.16 versus 0.08 ± 0.28 (p<0.02); abortion to pregnancy ratio 0.405 ± 0.140 versus 0.074 ± 0.280 (p<0.02); and low birth weight baby to pregnancy ratio 0.292 ± 0.129 versus 0 (p=0.05). The threatened abortion to pregnancy ratio, premature delivery to pregnancy ratio, and duration of breast feeding were not significantly different between the two groups. The prevalence of abortion was 43.3% for the untreated group, compared with 7.7% for the treated group of CD women (p<0.01). The RR of abortion was 9.18. There were no low birthweight babies born to women in the GFD group, whereas the prevalence of low weight babies was 29.4% in the untreated group (RR=11).

One of the limitations of the Ciacci et al. study was that it did not include an external control group or control for confounders. A historical cohort population-based study of the Danish Medical Birth Registry by Norgard, 1999393 evaluated birth outcomes in women with CD. This study included 211 newborns born to 127 mothers with CD from 1977-1992 and compared them with 1,260 control deliveries. Women with CD were identified from hospital discharge diagnoses. Discharge records were linked to the Medical Birth Registry, which contained information on relevant outcomes. Outcomes included birthweight, low birthweight (<2500 g), pre-term birth (<37 wk), intrauterine growth retardation (birthweight <2500 g and gestational age ≥37 wk of pregnancy), and perinatal mortality. Potential confounders, including maternal age, infant's gender, parity, and gestational age, were adjusted for in the analyses. The investigators could not control for other confounders such as smoking. Another potential limitation is that the date of diagnosis of CD was taken as the time of first hospital discharge with CD, whereas some women may have been initially diagnosed in the ambulatory care clinic. Details about the clinical presentation of the women with CD and biopsy findings were not available. The mean age at time of delivery was 27.5 years for women with CD and 26.3 years for control women.

Norgard et al.,393 found that before women were hospitalized for CD, they were at an increased risk of low birthweight babies (adjusted OR=2.6 [95% CI: 1.3–5.5]), and intrauterine growth retardation (12.3% vs 4.8% of controls; adjusted OR=3.4 [95% CI: 1.6–7.2]). After women with CD were first hospitalized, there was no increased risk of low birthweight babies (6% post diagnosis) or intrauterine growth retardation, when compared with controls. The results of this study have implications for women with undiagnosed (atypical or silent) CD.

Anxiety and depression. The study from Italy by Addolorato et al.,374 enrolled 43 newly-diagnosed adult patients affected with classic CD, selected from 234 adult CD patients from an outpatient clinic between June 1995 and Oct 1998. No psychiatric disorders other than anxiety and/or depression were allowed. The diagnosis of CD was based on positive serology and biopsy. Of the 43 enrolled patients, eight dropped-out leaving 35 (14 males and 21 females, mean age 29.8 ± 7.4 yr) patients for analysis. After a period of 12 months of GFD treatment, the patients were analyzed. The adherence to a GFD was evaluated based on patient self-report and family member interview. A group of 59 healthy asymptomatic controls (27 males and 32 females, age 31.7 ± 6.9 yr) were matched for gender, age, residence, employment, socioeconomic and marital status. The psychological assessment was performed using a self-rating psychometric test for anxiety (State and Trait Anxiety Inventory test) and another for depression (SDS Zung self rating depression scale). Both tests were administered before and after GFD. Of the 59 controls, 23.7% showed high levels of anxiety, 15.2% showed trait anxiety, and 9.5% were positive for depression. Of the 35 untreated CD patients, 71.4% had high levels of anxiety, 25.7% showed trait anxiety and 57.1% were positive for depression. After 1-year of GFD, 25.7% had high levels of anxiety, 17.1% had trait anxiety, and 45.7% were still depressed. The levels of high anxiety (71.4% vs 23.7%, p<0.0001) and levels for depression (57.1% vs 9.6%, p<0.0001) were significantly higher in the CD patients than in the controls. The proportion of untreated CD patients with trait anxiety did not differ from controls. After a 1-year GFD, a significant decrease in high-state anxiety (71.4% vs 25.7%, p<0.001) was found when treated patients were compared with the untreated group. No significant differences were found for trait anxiety or depression.

Fractures. We identified six controlled studies that addressed the outcome of fractures in a CD population385, 388–390, 394 and two reviews.381, 391 The study by Cook et al.395 was not included since it did not have a comparison or control group. The study characteristics and methods for each study are summarized in Evidence Table 12 (Appendix H).

All six studies were retrospective and there were two cohort studies385, 388. Two studies included individuals that had biopsy-confirmed CD. All studies included controls as a comparator, and in three studies the controls appeared to be population-based.385, 388, 394 With regards to the ascertainment of the outcome of fracture, data was obtained from self-report data from administrative databases,394 patient register,385, 388, 394 or from interview/case reports.389, 390, 392 Only two studies mentioned inclusion of asymptomatic subjects.389, 392 Bone histology was mentioned as an outcome in a subset of patients in one study.390

The case-control study by Fickling and colleagues,390 compared individuals with CD attending a GI outpatient department and/or members of local celiac societies. The authors found a higher prevalence of past history of fractures in the CD patients (21%[16/765]) compared with a control group (3% [2/75]; RR 7.0). There was no difference in BMD T-score results between those with and without a history of fracture, although those patients with a fracture history were older (p<0.02). Limitations of this study include the fact that the authors did not identify whether CD was biopsy-confirmed, and a potential for selection bias.

Thomason et al.,373 in a case-control study, used self-report data for 244 patients with biopsy-proven CD and found that fractures were not significantly increased in those with CD compared with controls (OR 1.05, 95% CI: 0.68–1.02), although there did seem to be a trend to increased wrist fractures (OR 1.21, 95% CI: 0.66–2.25). The mean age of these patients was older (60.2) and the mean BMI was higher (23.9) than that reported in other studies. However, this study may have been limited by potentially not having adequate power to detect fractures. In addition, all the fracture data was self-reported.

Vasquez et al.,389 in a retrospective case-control study, found that 25% (41/165) of CD patients had one to four fractures, compared with 8% in age- and sex-matched controls. The majority of fractures occurred prior to diagnosis of CD and the most common fracture site was the wrist (OR 3.5, 95% CI: 1.8–7.2). Potential sources of bias for this study include the fact that the cases were from a malabsorption clinic and may therefore represent patients with more severe disease (mean BMI=21.4). The OR for vertebral fractures was 2.8 (95% CI: 0.7–1.15), although there was incomplete ascertainment of X-rays, since not all X-rays were of adequate quality. This was the only study to include an assessment of the proportion of patients on a strict versus a reduced GFD.

Two studies were population-based.385, 388 Vestergaard et al.,388 evaluated all individuals with CD in Denmark captured from hospital discharge data, and did not find an increase in fractures requiring hospitalization in patients with CD (n=1,021; 7,774 patient years) relative to controls (n=23; 316 patient years) with an independent relative risk (IRR) at pre-diagnosis of 0.70 (95% CI: 0.45–1.09) for all fractures. For spine, the IRR pre-diagnosis was 2.14 (95% CI: 0.70–6.57) and 1.07 (95% CI: 0.39–2.95) for rib and pelvis. There are significant limitations to this study since the diagnosis of fractures was hospital-based and therefore, fractures that did not require hospitalization would be missed and could lead to under-reporting. In addition, the diagnosis of CD was only validated in a sample of nine cases (with a validity of 78%), and all cases of CD had to be hospitalized to be included.

West et al.,385 in the largest analysis of fractures in CD patients identified from the UK GPRD primary care database, found an increase in fractures in CD patients relative to controls. The mean age at diagnosis was 43.5 years, and the ascertainment of fractures was from an administrative database. For any fracture, the hazard ratio was 1.3 (95% CI: 1.16–1.46; 137.9/10,000 patient years vs 105.9/10,000 patient years in controls). The hazard ratio for hip fracture was 1.9 (95% CI: 1.2–3.02) and the hazard ratio for wrist fracture was 1.77 (95% CI: 1.35–2.34). The absolute difference in the overall fracture rate was 3.2/1,000 person years and 0.97/1,000 for hip fractures in those older than age 45. In contrast to earlier studies, the authors did not find a difference in the risk of fracture after CD diagnosis compared with before diagnosis.
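The relative and absolute figures reported by West et al. for any fracture can be related to each other with simple arithmetic on the two quoted rates. The sketch below computes a crude rate ratio and a rate difference. The crude ratio is not the same quantity as the hazard ratio reported by the authors, although here the two are close.

```python
# Fracture rates per 10,000 person-years, as quoted in the text.
cd_rate_per_10k = 137.9
control_rate_per_10k = 105.9

# Crude rate ratio (relative effect).
rate_ratio = cd_rate_per_10k / control_rate_per_10k

# Rate difference (absolute effect), converted to per 1,000 person-years.
rate_difference_per_1k = (cd_rate_per_10k - control_rate_per_10k) / 10

print(f"crude rate ratio: {rate_ratio:.2f}")
print(f"excess fractures per 1,000 person-years: {rate_difference_per_1k:.1f}")
```

The rate difference reproduces the 3.2 per 1,000 person-years quoted in the text, which illustrates that a 30% relative increase corresponds to a small absolute excess.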

A recent case-control cross-sectional study by Moreno et al.,392 compared fractures in 148 CD patients (53% classically symptomatic, 36% subclinical CD, and 11% silent CD detected by screening) to 296 controls (functional GI disorders). The fracture data were self-reported, obtained by interview and a pre-designed questionnaire. Moreno et al. found an increased number of fractures in the peripheral skeleton for classically symptomatic subjects compared with controls, but did not find an increased number of fractures in the subjects with subclinical or silent CD.

BMD. BMD is a surrogate outcome for fracture, and it is easier to evaluate in short-term studies. Previous studies of osteoporosis therapies in postmenopausal osteoporosis have shown that there may not, however, be a direct correlation between fracture reduction and increases in BMD. Osteoporosis/osteopenia may be a sign of subclinical CD and persisting osteopenia/osteoporosis in a patient with known CD may be a sign that the mucosa has not normalized.

BMD is an areal two-dimensional measure of bone mass and does not give a true volumetric measure and, therefore, may not be an accurate reflection of bone mass in children.

We found 11 articles that addressed the outcome of BMD/BMC in newly diagnosed subjects with CD.348, 352, 353, 355, 356, 375–378, 386, 387 The study characteristics are summarized in the Evidence Tables (see Appendix H).

The majority of these studies assessed BMD at baseline and the percentage change after a variable follow-up period (1 to 5 years in duration). Two studies evaluated the BMD of children with CD,352, 377 one study evaluated a mixed population,348 and the remaining studies evaluated adults. All studies included individuals with biopsy-proven CD and in most of the studies BMD was compared with a control population. Only two studies had patients with CD act as their own controls.353, 376 The female to male prevalence ratio in CD is 2:1, and in these studies the proportion of females varied from 50% to 80%.

Five studies included assessments of dietary compliance to a GFD and three studies included data on whether subjects were on co-interventions (e.g., vitamin D or calcium), which may have impacted the BMD results. Only two studies356, 376 looked at the potential relationship between the change in histological grade on small bowel biopsy and change in BMD.

Prevalence of osteoporosis/osteopenia. The studies consistently found that BMD results were lower in untreated subjects with CD compared with controls. Regarding the prevalence of osteopenia/osteoporosis in newly diagnosed patients with CD, the estimates varied. Sategna-Guidetti et al.353 noted a mean Z-score of -1.5 at lumbar spine, and -1.8 at the femoral neck, with 34% of subjects having normal BMD, 40% having osteopenia and 26% osteoporosis. Valdimarsson et al.355 found the prevalence of severe osteopenia, as defined by a Z-score less than -2, to be 15% at the spine, 9% at the femoral neck, and 22% at the forearm. The prevalence of mild osteopenia (defined as -2 ≤ Z < -1) was 23% at the lumbar spine and 24% at the forearm. There was no difference in lumbar spine BMD between those patients who presented with malabsorption and those without malabsorption. Valdimarsson et al. found that 27% of subjects had secondary hyperparathyroidism. After 1 year on a GFD, the prevalence of those with severe osteopenia decreased from 23% to 14%.
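The Z-score cut-offs attributed to Valdimarsson et al. above can be written as a small classification rule. This is a sketch of those stated thresholds only. The function name and the label for values of -1 or higher are illustrative assumptions, and other studies in this section may have used different definitions.

```python
def classify_z_score(z):
    """Classify a BMD Z-score using the cut-offs quoted in the text:
    severe osteopenia if Z < -2, mild osteopenia if -2 <= Z < -1."""
    if z < -2:
        return "severe osteopenia"
    if z < -1:
        return "mild osteopenia"
    # Label chosen for illustration; the text does not name this category.
    return "not osteopenic by this definition"

for z in (-2.5, -2.0, -1.5, -1.0, 0.3):
    print(f"Z = {z:+.1f}: {classify_z_score(z)}")
```

Note that a Z-score of exactly -2 falls in the mild category and a Z-score of exactly -1 falls outside both categories, following the inequalities as stated.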

In a recent review the authors pooled prevalence results and found that patients with untreated CD had a mean Z-score of -1.42, and a hip Z-score of -1.14.381

Valdimarsson et al.,356 in a prospective study of 105 newly-diagnosed CD patients, performed follow-up small bowel biopsies. Of the 105 subjects, 28 had secondary hyperparathyroidism. They found a greater reduction in BMD in individuals who had secondary hyperparathyroidism (PTH>65). In this group, the BMD increased significantly, but did not completely normalize after 3 years of a GFD. In contrast, in those with normal PTH at diagnosis, the baseline BMD was not as low and there was a 2.5% increase after 1 year with the BMD normalizing after 2 years of a GFD. Valdimarsson also noted that 22 patients with stage III-IV had lower median Z-scores than 76 patients with mucosal changes grade I-II. In this study, compliance with the GFD was 100% in those with high PTH, and lower at 87% in those with normal PTH levels.

Kemppainen et al.,376 in a 5-year cohort study of 28 patients in which the cases served as own controls, found that BMD increased or remained stable in 69% of patients at the lumbar spine and in 67% of patients at the femoral neck. In this study, the authors did not notice an effect of the grade of villous atrophy on the mean BMD values or percentage change in BMD. They also did not observe any correlation between adherence to the GFD and the change in BMD.

Bai,375 in a small cohort of 45 (25 completed) newly-diagnosed CD patients, assessed compliance with the GFD and found that 84% of patients increased their lumbar spine BMD (mean increase of 12%) and total body BMD (mean increase of 7.3%), compared with 151 control subjects. The greatest increase in BMD was noted within the first year. Bai375 documented prior fractures in two patients, but did not report any fractures during the 4-year follow-up period.

Sategna-Guidetti et al.,353 in a longitudinal study of 86 CD patients, noted a similar proportion of patients (83.7%) increased their spine BMD after 1 year, with an increase of 5.3% in LS BMD after 1 year (change in Z-score of 0.5 at the spine).

Ciacci et al.,386 in a retrospective cohort of 41 consecutively diagnosed patients with CD, noted a significant increase in BMD (14% lumbar spine, and 10.4% femoral neck), after 1 year on a GFD. The authors also found that pretreatment BMD predicted response to treatment.

Mustalahti et al.378 noted a significant increase in lumbar spine and femoral neck BMD after 1 year of treatment compared with controls, and noted that the BMD was lower in symptom-free patients (n=15), suggesting that patients with silent CD may have mucosal lesions for longer periods of time.

Bardella,348 in a case-control study of 71 CD patients (43 who had started a GFD in childhood and 28 who were diagnosed as adults and were on a GFD and in remission), found that the BMD of the adult CD patients was significantly lower than the control value (0.9 g/cm2 vs 1.1 g/cm2, p<0.01).

McFarlane et al.,387 in a case-control study of 21 biopsy-confirmed subjects with CD, documented that the baseline lumbar spine BMD was 85% of that seen in controls, and that the increase in BMD over the first year was 6.6% (95% CI: 3.1–10.1) in the lumbar spine and 5.5% in the femoral neck.

Children/adolescents. Mora et al.,377 in a study of 19 patients (211 controls), noted a lower BMD in CD patients versus controls at baseline, and an increase in total body BMD (using DXA) during the first year when compared with controls (15.2%).

Rea et al.,352 noted an improvement in forearm Z-score after 1 year on a GFD in 23 newly diagnosed children with CD.

Mortality. Seven cohort studies addressed mortality in CD. Two were Italian studies,335, 362 one was from Denmark,343 one from Sweden,331 and three were from the UK.336, 367, 368

Corrao et al.362 identified 1,072 biopsy-proven CD subjects from the records of 11 GI units between January 1962 and December 1994. The inclusion criteria were complete records and a reliable diagnosis of CD. The ratio of men to women was 1 to 3, the mean age at diagnosis was 35.7 years, mean follow-up was 6.0 years and median diagnostic delay was 17 months. Forty-five percent of the population had mild (39%) or asymptomatic disease, and 50 patients were lost to follow-up. Data were collected over an accumulated 6,444 patient-years of follow-up. Adherence to a GFD was assessed. Fifty-three CD patients died compared with 25.9 expected deaths. An increase in mortality was noted in the entire cohort (SMR 2.0 [95% CI: 1.5–2.7]). The overall SMR did not differ by sex, age at diagnosis, or year of presentation. A diagnostic delay of more than 1 year significantly increased the SMR (2.6 [95% CI: 1.6–4.1]). There was significant excess mortality among patients presenting with malabsorption (SMR 2.5 [95% CI: 1.8–3.4]). No excess mortality was seen in patients with mild or asymptomatic CD. Significant excess mortality was also seen when patients did not adhere to a GFD, whether adherence was assessed from clinical records (SMR 10.7 [95% CI: 6.0–17.1]) or by patient interview (SMR 6.1 [95% CI: 4.2–8.6]). The causes of death showed an excess of deaths from malignancy (24 observed cases, SMR 2.6 [95% CI: 1.7–3.9]) and from diseases of the respiratory (SMR 3.6 [95% CI: 1.1–8.4]) and digestive tracts (SMR 6.1 [95% CI: 3.0–10.9]). NHL accounted for two-thirds of the malignant cases (n=16). The other malignancies included gastric (n=2), small intestinal (n=1), liver (n=2), pancreatic (n=1), pleura (n=1), and leukemia (n=1). (Table 45)

Cottone et al.335 evaluated mortality in a prospective cohort study of 228 biopsy-proven CD subjects in Sicily. Mortality was ascertained by reviewing hospital medical records and pathology specimens. Records were incomplete for 5% of patients. The mean age at diagnosis was 34.7 years and 100% of patients were on a GFD. Seventy-six percent were females. The clinical presentation was anemia in 60% of cases, malabsorption in 20% of cases, and asymptomatic in another 10% of cases. The mean follow-up was 73 months. Twelve deaths were observed, with 3.12 deaths expected and the SMR from all causes was 3.8 (95% CI: 1.9–6.7). The mortality rate was increased within the initial 4 years from diagnosis, giving an SMR of 5.8 (95% CI: 2.5–11.5).

Nielsen et al.343 from Denmark, conducted a retrospective cohort study of 98 CD patients between 1964-1982. Sixty-one percent of patients were females and the median age at diagnosis was 41 years (range 2 to 74 yrs). Twenty-four percent of patients had unclassified CD and were treated with prednisone, since they did not respond to a GFD and had probable refractory CD. Twenty-three deaths occurred during the study (four due to malignancy). Nielsen et al. found that the 5-year survival rate was 88%, the 10-year survival rate 68.5%, and that mortality exceeded that of age- and sex-matched controls in the general population by a factor of 3.4 (p<0.025). There was no difference in mortality between males and females (2.7 and 2.3, respectively). Subjects who responded to a GFD had an extra mortality factor of 2.2 (p<0.025), and those who did not respond to a GFD had an extra mortality factor of 5.8 (p<0.005). Causes of death were poorly documented.

Peters et al.,331 in a retrospective cohort study, compared 10,032 symptomatic subjects with CD who had been discharged at least once from hospital, to controls who were age/sex-matched for the calendar period cancer incidence rate. Fifty-nine percent were females. Mean follow-up was 9.8 years. Mortality was ascertained from a national death register. There were 828 deaths, with 419.3 expected, resulting in an SMR of 2 (95% CI: 1.8–2.1). Mortality risk decreased slightly with increasing number of years of follow-up (p for trend, 0.004). Mortality risks were increased for patients with NHL, cancer of the small intestine, autoimmune diseases (RA), allergic disorders, inflammatory bowel disorders, diabetes, and tuberculosis.

The first UK study was conducted in Birmingham, by Holmes et al.367 Series I included 202 patients with idiopathic steatorrhea or CD, followed from 1965-1975. Ten patients had a positive biopsy for CD. Eleven patients could not be traced. In the 10-year period, 20 deaths were seen, with ten due to malignancy. Series II (1989) had 210 patients (94 males and 116 females) with biopsy-proven CD. Seventy patients were on a normal diet and 134 were on a GFD for more than 12 months at the end of the survey. Forty-three patients had died from all causes (expected was 20.82 deaths, p<0.001); 21 deaths were due to malignancy—13 reticulum cell sarcomas, six GI tract cancers and two other malignancies. Of the 21, 13 had a GFD for a mean of 41 months. Deaths from all malignancies, irrespective of diet, were statistically increased as a whole (expected 5.048 vs observed 21, p<0.001) and divided by sex (men expected 2.878 vs observed 12, p<0.001 and women expected 2.170 vs observed 9, p<0.001). Patients taking a normal diet were at increased risk of developing a malignant tumor (p<0.05). Clinical response did not predict the risk of developing malignancy.
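
The observed and expected death counts reported for Series II can be compared under a simple Poisson model. The sketch below is an illustration only: the source does not state which test Holmes et al. used, so the one-sided Poisson tail probability and the function name are assumptions. It checks whether each observed count is compatible with the reported p<0.001.

```python
import math


def poisson_upper_tail(observed: int, expected: float, extra_terms: int = 200) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summed directly over the tail.

    The sum is truncated after `extra_terms` terms, which is ample for the
    small expected counts used here.
    """
    total = 0.0
    for k in range(observed, observed + extra_terms):
        # Log of the Poisson probability mass at k, to avoid overflow in k!
        log_pmf = -expected + k * math.log(expected) - math.lgamma(k + 1)
        total += math.exp(log_pmf)
    return total


# (observed deaths, expected deaths) as reported for Series II
holmes_series_ii = {
    "all causes": (43, 20.82),
    "all malignancies": (21, 5.048),
    "malignancies, men": (12, 2.878),
    "malignancies, women": (9, 2.170),
}

for label, (obs, exp) in holmes_series_ii.items():
    p = poisson_upper_tail(obs, exp)
    print(f"{label}: observed/expected = {obs / exp:.2f}, one-sided p = {p:.2g}")
```

Under this assumed model, all four comparisons give tail probabilities below 0.001, in line with the significance levels quoted above.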

Johnston et al.368 examined CD in subjects from Northern Ireland using the Belfast MONICA project. MONICA I was the first survey, and began in October 1983 with 1,204 subjects. Of the subjects, 102 (52 males and 50 females, mean age 58.1 years) had positive serology, 72 consented to follow-up (34 males and 38 females) for 11.6 years (range 11.3–11.9 years), and 20 of the 72 gave consent to biopsy. Three subjects had villous atrophy. Thirteen subjects in MONICA I (seven males and six females) died (mean age at death 67.3 years; range 56–75 years). Cause of death was obtained from death certificates from the General Register Office or General Practitioner records. Four patients died with malignant disease: pancreas, stomach, bile duct lymphoma and metastatic melanoma. None of the patients had CD, but all had positive serology. The number of cancer-related deaths and all-cause mortality in the MONICA I follow-up study did not show an excess number of deaths compared with the general population of Northern Ireland.

Logan et al.336 followed a prospective cohort of 653 patients with CD in Edinburgh between 1979 and 1981. All patients had biopsy-proven CD and mortality was ascertained from death certificates. Sixty percent of the patients were females and the mean follow-up was 13.5 years. Six percent of subjects were lost to follow-up. Clinical presentation was not reported. The subjects with CD were compared with age/sex-matched controls. There were 115 deaths from all causes; the expected number was 61.8, for an SMR of 1.9 (95% CI: 1.5–2.2). The increased mortality was greatest during the initial year after diagnosis and declined over time. The mortality rate for those diagnosed during childhood was similar to that of the general population.
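
The standardized mortality ratios quoted in this section are the ratio of observed to expected deaths. As a rough check, the sketch below recomputes the SMR for the four cohorts that report both counts and adds an approximate 95% CI using Byar's approximation to the Poisson limits. The choice of Byar's method is an assumption made for illustration; the source does not state how the original authors derived their intervals, and the function name is hypothetical.

```python
import math


def smr_with_byar_ci(observed: int, expected: float, z: float = 1.96):
    """Return (SMR, lower, upper) using Byar's approximation for the CI."""
    smr = observed / expected
    lower = (
        observed
        * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
        / expected
    )
    upper_obs = observed + 1
    upper = (
        upper_obs
        * (1 - 1 / (9 * upper_obs) + z / (3 * math.sqrt(upper_obs))) ** 3
        / expected
    )
    return smr, lower, upper


# (observed deaths, expected deaths) as reported in the text
cohorts = {
    "Italy, ref. 362": (53, 25.9),
    "Cottone et al. (335)": (12, 3.12),
    "Peters et al. (331)": (828, 419.3),
    "Logan et al. (336)": (115, 61.8),
}

for study, (obs, exp) in cohorts.items():
    smr, lo, hi = smr_with_byar_ci(obs, exp)
    print(f"{study}: SMR {smr:.1f} (approx. 95% CI {lo:.1f}-{hi:.1f})")
```

For the three larger cohorts this approximation agrees with the reported intervals to one decimal place. For Cottone et al., the lower bound comes out slightly above the published 1.9, which is the kind of small difference to be expected when the original method is unknown.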

Quality Assessment

The majority of studies included in this objective were single group “before-after” studies, although some studies also included a comparative healthy control group. We could not identify any quality instruments for this type of study design and in general, this type of study is considered weak, particularly in the absence of a control group. Overall, however, the strength of the evidence for this objective was fair to good (Appendix J, Tables 68).

Celiac 5: Promoting or Monitoring Adherence to a GFD

Out of 502 citations identified by the search strategy for the Celiac 5 objective, 189 met level 1 screening criteria (Appendix F). Of these, 86 met level 2 screening criteria and 20 studies met level 3 inclusion criteria.396–415

Of the included studies, eight studies offered correlation between serology and mucosal histological grade,397, 398, 403, 404, 407, 409, 413, 415 and eight reported on serology only.396, 399–402, 408, 410, 412 Four studies focused on histologic changes without serology.405, 406, 411, 414 Nine of the included studies were conducted in an adult population, six in a pediatric or adolescent population, and five studies in mixed populations consisting of adults and children.

Included articles were divided by study population (adult/children/mixed), antibody type (IgG or IgA), and by antibody methodology (e.g., ME or HU).

None of the identified studies directly assessed the efficacy of a specific intervention on the promotion of adherence to a GFD. Six studies hint at interventions that could potentially be effective.416–421 Four of these studies were applicable to a pediatric population and two studies were applicable to adults.

Monitoring Adherence to a GFD

Biopsy. To evaluate serology in assessing adherence, some information regarding mucosal recovery on GFD must first be known. Although mucosal recovery is generally assumed to occur within 6 to 12 months after starting GFD, there is evidence that recovery may be slower and more incomplete than previously assumed.

In a mixed population, Wahab et al.405 followed the histologic profiles of 158 patients after institution of a GFD. Histological recovery, defined as the absence of villous atrophy (Marsh 0–II), was seen in only 65% of the patients within 2 years. Within 5 years, 85.3% of patients showed recovery, and an incremental improvement to 89.9% occurred after 5 years. Of the 10.1% of patients not achieving histological recovery during the follow-up period, 11 had symptoms of CD and were therefore considered to have refractory CD (7% of all patients). Patients with Marsh IIIb and IIIc histology initially had lower rates of recovery, compared with those with Marsh IIIa histology. In a subgroup analysis of 25 children, recovery seemed to occur faster: 96% showed histological recovery within 2 years (p<0.01 vs adults) and 100% recovered in long-term follow-up. It is important to point out that the validity of defining a Marsh II lesion as histological recovery is uncertain. If these patients were not counted as recovered, rates of histological recovery would be even lower. Nonetheless, clinical improvement was seen despite the slow histological improvement.

An early study by McNicholl et al.406 is consistent with the finding of more complete mucosal recovery in children. Thirty-six children on a GFD for a mean of 5.8 years underwent duodenal biopsy. Mucosal morphology was normal in 16 (44%) patients, while the remainder of the patients had minimal changes. Villous atrophy was not seen. IEL counts were normal in 30 (83%) patients. A subsequent gluten challenge confirmed the diagnosis in all 36 children.

Lee et al.,411 in a retrospective cohort of 39 adult patients, also found incomplete mucosal recovery. After a mean duration of a GFD for 8.5 years (range 1 to 14 years), histology was normal in only 21% of patients, and partial and total villous atrophy was seen in 69% and 10% of patients, respectively. These patients were felt not to have refractory CD since they had a good clinical response to the GFD. Also of concern were the results of serologic testing at the time of follow-up biopsy in 31 patients. Despite the relatively high number of patients with some degree of villous atrophy, IgG-AGA, IgA-AGA and IgA-EMA were negative in the majority of patients. In fact, 77% of the 31 patients having serologic tests were negative for all the listed serological tests. The exact number of these 31 patients who had some degree of villous atrophy was not reported, but would be expected to be similar to the overall numbers listed above.

Selby et al.414 investigated whether the failure of mucosal recovery was due to noncompliance with a GFD. Eighty-nine adult patients with CD on a GFD for a mean in excess of 8 years underwent dietary assessment by a dietician, questionnaire and food diary. They were then classified as either Codex GFD, which allows up to 0.03% of protein from a gluten source, or no-detectable gluten GFD (NDG-GFD). Villous atrophy persisted at high rates in both groups, with 46% of those on Codex GFD and 40% of those on NDG-GFD having persistent villous atrophy. The patients in this study did not have clinical features of refractory sprue. Based on the fact that there were similar histologic profiles in both groups, the authors postulate that persisting mucosal abnormalities may be unrelated to gluten non-compliance. Of course, gluten intake in the NDG-GFD group undetected by study protocols cannot be ruled out.

Serology. The studies assessing the utility of serology in monitoring adherence can be divided into those with,397, 398, 403, 404, 407, 409, 413, 415 and those without396, 399–402, 408, 410, 412 biopsy correlation. The studies without biopsy correlation are reviewed first. They establish an association between serologic positivity and patient compliance.

Bartholomeusz et al.396 demonstrated higher rates of IgA-AGA positivity in non-compliant as compared with compliant CD patients in a mixed population. How compliance was ascertained is not described. Three of the 17 (17.6%) patients compliant with a GFD for greater than 6 months were IgA-AGA positive as compared with 11 of 12 (91.6%) non-compliant patients. The PPV for non-compliance was calculated to be 78.5%.
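
The PPV quoted above follows directly from the reported counts: of the 14 patients with a positive IgA-AGA, 11 were non-compliant. A minimal sketch of the arithmetic (the function and variable names are illustrative, not taken from the study):

```python
def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Proportion of test-positive patients who truly have the condition."""
    return true_pos / (true_pos + false_pos)


noncompliant_positive = 11  # IgA-AGA positive among 12 non-compliant patients
compliant_positive = 3      # IgA-AGA positive among 17 compliant patients

ppv = positive_predictive_value(noncompliant_positive, compliant_positive)
print(f"PPV of a positive IgA-AGA for non-compliance: {ppv:.1%}")
```

The computed value is 11/14; the reported 78.5% appears to be the same figure truncated rather than rounded.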

Burgin-Wolff et al.400 showed that, as expected, serology becomes positive with gluten challenge. One hundred and thirty-four children with CD underwent gluten challenge and were assessed for IgA-AGA and IgA-EMA-ME. At baseline, the rate of serologic positivity was 23% for AGA and 13% for EMA. Within 3 months of gluten challenge, 97% of children were positive for AGA and 65% positive for EMA. Between 3 months and 1 year, 85% of children were positive for AGA and 84% positive for EMA.

In a mixed population, Fabiani et al.408 demonstrated significantly higher IgA-tTG-GP values in patients deemed to be non-compliant with a GFD as compared with compliant patients.

Bardella et al.399 demonstrated that the positivity of various serologic markers falls in adults with duration on a GFD (Evidence Tables, Appendix I). The five groups in this study were untreated CD, poor GFD compliance, GFD less than 2 years, GFD greater than 2 years, and a control group. As expected, IgA-AGA, IgA-EMA-ME and IgA-tTG-GP were positive in virtually all untreated CD patients. Also, as expected, there was a low rate of positive serology in the control group, with a higher percentage being IgA-AGA positive than either IgA-EMA-ME or IgA-tTG-GP positive. In the poorly-compliant CD group, all were positive for all three serologic tests. In patients on a GFD less than 2 years, the rates of positive AGA, EMA and tTG were 40.9%, 54.5%, and 63.6%, respectively. In patients on a GFD for more than 2 years, the rates were 16.2%, 9.5% and 14.2%, respectively. The overlap of the CIs was such that no differences between the serologic tests could be determined.

Vahedi et al.402 studied IgA-EMA and IgA-tTG in adult CD patients. Based on dietary inquiry, patients were divided into those on a strict GFD, those with minor transgressions and those with major transgressions. It was not reported whether the EMA was ME or HU, nor was it reported whether tTG was GP or HR. The median duration of GFD was 75 months. Among those on a strict GFD, 2.5% and 3% were IgA-EMA and IgA-tTG positive, respectively. Among those with minor transgressions, positivity was only 37% and 31%, respectively. Among those with major transgressions, positivity was 86% and 77%, respectively. The sensitivity of IgA-EMA for any dietary transgression was 66%, and for minor transgression it was 37%. For IgA-tTG, the sensitivities were 52% and 31%, respectively. No statistically significant differences were detected between the two serologic tests.

In a mixed population, Scalaci et al.401 showed a low reliability for IgA-EMA in picking up dietary transgressions reported at interview. It is not reported whether ME or HU was used. In patients on a GFD for at least 6 months, only 11.1% of those patients reporting one dietary transgression per month were positive, and only 19% of those reporting one dietary transgression per week were positive.

Fabiani et al.410 showed a similarly low rate of serologic detection of non-compliance in screen-detected adolescents. Of 6,315 screened students, 28 biopsy-proven CD patients were found. Of these, 23 agreed to participate in a follow-up study. The mean duration of GFD was 23 months. IgG-AGA, IgA-AGA and IgA-EMA were measured. Whether EMA was ME or HU was not reported. Of the 11 patients reporting any dietary transgression, only two patients (19%) were positive for any of the serologic tests.

Pacht et al.,412 in a similar study, showed different results. Seventeen children deemed compliant with GFD for at least 1 year were all IgA-EMA-ME-negative, whereas 22 children deemed non-compliant were IgA-EMA-ME-positive. This study suggests a much higher sensitivity for EMA than in other studies.

A number of further studies include serology and biopsy correlation. These are reviewed below.

Sategna-Guidetti et al.413 looked at 47 adults with CD. All were IgA-EMA-ME positive at diagnosis. After 8 to 30 months of GFD, a second biopsy was taken and IgA-EMA-ME was remeasured. Total AGA was also measured in 39 patients. No patient in whom the mucosa recovered to normal had a positive EMA. Only one patient with normal histology had a positive AGA (2.6%). EMA was positive in only five of 23 patients with partial villous atrophy, three of 13 patients with subtotal villous atrophy, and one of two patients with total villous atrophy. AGA was positive in only seven of 20 patients with partial villous atrophy, five of ten patients with subtotal villous atrophy, and two of two patients with total villous atrophy. The PPV of EMA for abnormal histology was 100%, but the NPV was only 23%. The PPV of total AGA for abnormal histology was 93.8%, whereas the NPV was only 25%. Serology clearly failed to reflect the mucosal state adequately in this study, and was negative in a significant number of patients with villous atrophy.
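
The EMA predictive values above can be reconstructed from the per-stratum counts. The sketch below does so under one stated assumption: that all 47 patients had a follow-up biopsy, so that the number with normal histology is 47 minus the 38 with some degree of villous atrophy. Variable names are illustrative.

```python
# (EMA-positive, total) per histology stratum at follow-up, as reported
ema_by_histology = {
    "partial villous atrophy": (5, 23),
    "subtotal villous atrophy": (3, 13),
    "total villous atrophy": (1, 2),
}
total_patients = 47
ema_positive_normal = 0  # no patient with recovered mucosa was EMA positive

abnormal_total = sum(total for _, total in ema_by_histology.values())
true_pos = sum(pos for pos, _ in ema_by_histology.values())
false_neg = abnormal_total - true_pos

# Assumption: every patient was re-biopsied, so the rest had normal histology.
normal_total = total_patients - abnormal_total
true_neg = normal_total - ema_positive_normal

ppv = true_pos / (true_pos + ema_positive_normal)
npv = true_neg / (true_neg + false_neg)

print(f"Abnormal histology: {abnormal_total}, EMA positive among them: {true_pos}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.1%}")
```

With these counts, every positive EMA corresponds to abnormal histology, while most negative results (29 of 38) also occur in patients with villous atrophy, which is why the NPV is so low.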

Valentini et al.407 also found a significant rate of negative serology despite the presence of villous atrophy. In an adult population on a GFD for a mean of 9.9 months (range 6–12 months), 24 patients were IgA-EMA-ME negative on a GFD. Seventeen of these 24 patients (71%) had varying degrees of villous atrophy on biopsy (14 had partial villous atrophy and three had subtotal villous atrophy).

Dickey et al.409 also showed that disappearance of IgA-EMA-ME did not necessarily indicate mucosal recovery. In adults on GFD for 1 year, IgA-EMA-ME was positive in only two of 22 (9%) with partial villous atrophy, and three of ten (30%) with subtotal/total villous atrophy.

Mengozzi et al.403 investigated adult CD patients on a GFD for 1 year. Most (95%) had a Marsh III histology at diagnosis. In general agreement with the prior studies, only 12% had normal histology at follow-up biopsy 1 year later. Fifty percent were Marsh I and 38% were Marsh II or III (individual results for Marsh II and III were not reported). IgA-EMA-ME, IgA-tTG-HR (four different assays: DRG Diagnostics, Eurospital, Immunodiagnostik, and Celikey), and IgA-tTG-GP were measured. Taking complete mucosal recovery as a negative biopsy and all other biopsies as positive, the authors looked at concordance of serology with biopsy results. Concordance for EMA, tTG1, tTG2, tTG3, tTG4 and tTG5-GP was 48%, 29%, 65%, 14%, 16%, and 19%, respectively. The validity of a Marsh I or perhaps Marsh II histology being classified as positive is unclear, and it would have been interesting to know the corresponding concordance rates if Marsh 0–I and Marsh 0–II were considered normal.

Kaukinen et al.398 similarly found a lack of correlation between IgA-EMA-HU, IgA-tTG-GP and histologic state. Of 87 adult patients on a GFD for a median of 1 year, 27 still had a Marsh III villous atrophy. Among those with Marsh III villous atrophy, EMA was negative in 74% and tTG was negative in 59% of patients. Furthermore, of 11 patients admitting regular dietary lapses, 55% were EMA and tTG negative. The sensitivity, specificity, PPV, and NPV of EMA for Marsh III villous atrophy was 26%, 93%, 63%, and 74%, respectively. The values for tTG were 41%, 88%, 61% and 77%, respectively.
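
The predictive values reported by Kaukinen et al. are consistent with the stated sensitivity and specificity once the proportion of patients with persistent Marsh III atrophy (27 of 87) is taken into account. The sketch below applies the standard Bayes relationship; the function name is illustrative, and the inputs are the rounded percentages quoted in the text rather than the study's raw counts.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) from sensitivity, specificity and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Proportion of patients with Marsh III villous atrophy at follow-up
prevalence = 27 / 87

# (sensitivity, specificity) for Marsh III villous atrophy, as reported
reported = {
    "IgA-EMA-HU": (0.26, 0.93),
    "IgA-tTG-GP": (0.41, 0.88),
}

for test_name, (sens, spec) in reported.items():
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"{test_name}: PPV {ppv:.0%}, NPV {npv:.0%}")
```

Because PPV and NPV depend on prevalence, the same tests would yield different predictive values in a population where persistent villous atrophy is more or less common.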

The issue arises as to whether serology might more accurately reflect mucosal state in long-term follow-up. In patients on GFD over 5 years,398 two of four patients with Marsh III villous atrophy were EMA and tTG negative, and five of nine patients (56%) admitting dietary transgressions were EMA and tTG negative. In this study, there was no clear advantage of tTG over EMA.

One study by Fotoulaki et al.397 did show a good correlation between serology and mucosal state. In a mixed population of 30 patients, IgG-AGA, IgA-AGA and IgA-EMA-ME were measured after 12 months of GFD. Contrary to the preceding studies, all patients had either a Marsh I or II biopsy on a GFD, and all were IgA-AGA and IgA-EMA negative, while 40% were still IgG-AGA positive. The patients in this study were much younger (age range 1 to 24 years).

Troncone et al.415 demonstrated that serology could miss dietary transgressions in children. Twenty-three adolescents were divided into four groups, depending on assessment of gluten intake. IgA-EMA-ME was present in seven of seven patients assessed to be taking >2 g/day of gluten. All seven also had villous atrophy. Conversely, four patients on a strict GFD had normal histology and negative EMA. For patients with intermediate levels of gluten intake, one of six patients with a gluten intake of less than 0.5 g/day had a positive EMA. This patient also had partial villous atrophy. Three patients in this group had lesser mucosal abnormalities (increased IELs) and negative serology. For patients ingesting 0.5 to 2 g/day of gluten, three had a positive EMA; two of these had villous atrophy. Five patients had increased numbers of IELs.

Interventions to Promote Adherence to a GFD

Anson et al.416 investigated 43 Jewish Israeli children with CD, and their parents. Thirty-one of the children (70%) were judged compliant based on a combination of clinical symptoms, biopsy and AGA. It is unclear if serology and biopsy were performed in all children to assess compliance. Parental knowledge was studied using a structured questionnaire. A significant positive correlation between the father being a professional and compliance was found (p<.01). Parental level of education was also significantly correlated with compliance. Significant differences in parental ability to choose GFD items from a specific menu were found. Ninety-three percent of parents of compliant children were able to pick all five GFD items out of an eight-item menu, compared with only 67% of parents of non-compliant children (p<.05).

In another parental questionnaire, Jackson et al.418 found that 30 of 50 (60%) parents reported their children to be on a strict GFD. Dietary compliance correlated with membership in the Celiac Society (p<0.0001). It also correlated with parental score on an eight-question test related to knowledge of CD (p<0.001).

Ljungman et al.420 found self-reported GFD compliance in children to be positively associated with knowledge of CD. In this study of 47 Swedish children, those deemed compliant scored 14.03 out of 15 on a knowledge test related to CD. This compared with an average score of 12.44 in the non-compliant group.

Lamontagne et al.419 surveyed 617 past and present members of the Quebec Celiac Foundation. A final sample size of 234 was obtained. Self-reported compliance difficulty with a GFD was inversely correlated with a high level of confidence in treatment information from gastroenterologists and dieticians (p<.005).

Hogberg et al.421 looked at the effect that age at diagnosis might have on compliance. In a study population of 29 adults with CD, 15 were deemed compliant with a GFD on the basis of a questionnaire and serology (IgA EMA, IgG EMA and IgA tTG). Eighty percent of patients diagnosed prior to age 4 were GFD compliant compared with 36% of patients diagnosed after age 4 (p<.05). A drawback of this study is that serologic markers were collected about 3 years prior to the dietary questionnaire. This risks misclassification of patients if their compliance varied over time.

In an important study with relevance to outcomes of population screening, Fabiani et al.417 showed lower compliance in 22 adolescents identified by a mass screening program as compared with 22 age-matched controls with CD identified on the basis of symptoms. All patients had been prescribed a GFD for more than 5 years. Twenty-three percent of screen-detected patients reported being on a strict GFD as compared with 68% of those diagnosed with CD on the basis of symptoms. Patients in the screen-detected group were diagnosed at a later age (mean 14.0 years) than patients identified on the basis of symptoms (mean 4.3 years).

A colouring book intervention has been developed to promote GFD compliance,422 but the effectiveness of this intervention has not been assessed in children with CD.

Quality Assessment

The majority of studies in this objective were of a “before-after” design. In this setting, this design may not pose a major limitation for monitoring studies, since the purpose of the study was to assess the change in serology and histology after introduction of a GFD. In this regard, the strength of the evidence for monitoring adherence to a GFD was fairly good. However, there is almost a complete absence of studies of interventions for the promotion of adherence to a GFD.

Chapter 4. Discussion

Celiac 1: Sensitivity and Specificity of Tests for CD

Serology

Systematic reviews of studies of diagnostic accuracy are similar in many ways to reviews of other study types, such as randomized controlled trials. However, important differences exist, in large part because of the weaknesses inherent in the diagnostic-accuracy study design and its potential sources of bias.24 In addition to these considerations, the topic of CD introduces further difficulties and bias because of how the disease itself is defined and the methods of patient selection for inclusion in the study. Ideally, a diagnostic-accuracy study should include a consecutive or randomly selected sample of patients from a clinically relevant patient population, that is, a study population whose characteristics match those of the population in which the test will ultimately be used, with both patients and controls selected from this population. Unfortunately, selection spectrum bias is common in studies of diagnostic tests in general, and in practice it is easier for investigators to select cases and controls as separate groups in a case-control design. The practice of choosing cases that have previously been identified as having the disease, especially if more severe, introduces bias in the estimates of sensitivity (artificially raising it), while choosing completely healthy individuals as controls introduces bias in the estimates of specificity, artificially raising it as well.24 The importance of these biases comes back to the issue of the relevant clinical population. If the test is to be used in screening healthy individuals, then the reported estimate of sensitivity is higher than it should be, but the specificity estimate is likely valid. On the other hand, if the test is to be applied to suspected cases of the disease, then the reported estimate of sensitivity may not be that far off, but the specificity estimate would be higher than it should be.
Other important sources of bias also exist in relation to the study population, such as the mix of other diseases present in the population with similar features as the disease in question, and ensuring an appropriate mix of disease severity in the tested population. This last point regarding disease severity is especially important for this report, and is discussed at length below.

Lijmer et al.423 reviewed 11 meta-analyses of diagnostic tests, and assessed the characteristics of the included studies using multivariate regression analysis. The authors identified several threats to the validity of a diagnostic study's results. Case-control designs overestimated diagnostic odds ratios (DORs) by three-fold compared with studies using a clinical cohort (relevant clinical population). As well, studies that applied different reference tests to those with and without disease (in case-control designs) or to those testing positive or negative (in relevant clinical populations) overestimated the DOR by 2.2-fold. Interpreting the reference test with knowledge of the results of the test under study overestimated the DOR by 1.3-fold. DORs from studies without adequate descriptions of the test or study population were 70% and 40% higher, respectively, than in studies reporting these details. Inadequate descriptions of the reference test were also identified as sources of bias.
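
To give a sense of scale for these figures, the DOR can be expressed in terms of sensitivity and specificity as the odds of a positive test in diseased subjects divided by the odds of a positive test in non-diseased subjects. The sketch below uses hypothetical values (sensitivity and specificity both 90%) that are not taken from any included study; only the three-fold inflation factor comes from the text.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """DOR = [sens / (1 - sens)] / [(1 - spec) / spec]."""
    odds_positive_in_diseased = sensitivity / (1 - sensitivity)
    odds_positive_in_healthy = (1 - specificity) / specificity
    return odds_positive_in_diseased / odds_positive_in_healthy


# Hypothetical test characteristics, chosen only for illustration.
cohort_dor = diagnostic_odds_ratio(0.90, 0.90)

# Three-fold overestimation reported for case-control designs.
case_control_dor = 3.0 * cohort_dor

print(f"DOR in a clinical cohort (hypothetical): {cohort_dor:.0f}")
print(f"Same test, DOR inflated three-fold:      {case_control_dor:.0f}")
```

Because the DOR combines sensitivity and specificity into a single number, a three-fold inflation can arise from modest overestimates of either or both, and it is undefined when specificity or sensitivity is reported as exactly 100%.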

With this information at hand we tried to minimize bias in this report, by using what some may consider fairly strict inclusion criteria which also eliminated many poor quality studies. We included both case-control studies and cohort (relevant clinical population) designs but grouped them separately. Studies were only included if an adequate description of the test under study and the reference test (biopsy, and a statement of the criteria defining CD) were provided, and both the cases and controls had to have had the same reference test (i.e., biopsy) applied at the same definition or level (i.e., biopsy grade).

The results of the systematic review demonstrate that in the studied populations IgA-EMA and IgA-tTG have sensitivities and specificities each in excess of 90% in both children and adults. In fact, the pooled specificity of EMA was 100% in adults using either EMA-ME or EMA-HU. In studies of children, the specificity of EMA using these two substrates was 97% and 95%, respectively, with overlapping 95% CIs, suggesting no statistical difference between these values. In adults, the pooled specificities of tTG-GP and tTG-HR were 95% and 98%, respectively, with overlapping CIs. Similarly, in children the specificities were 96% and 99%, again with overlapping CIs. Among the three studies in adults,32, 45, 70 and four studies in children35, 52, 70, 79 that assessed both EMA and tTG, the specificities were nearly identical. Overall, these results suggest that EMA and tTG antibodies demonstrate extremely high specificities in both adults and children.

We identified a tendency towards greater variability in sensitivity between studies and between antibodies, compared with specificity. IgA-EMA-ME demonstrated sensitivities of 97% and 96% in adults and children, respectively. EMA-HU demonstrated a similar sensitivity of 97% in children, although the pooled estimate in adults was somewhat lower at 90%. Among two studies that assessed both EMA-ME and EMA-HU in adults, one demonstrated identical sensitivities of 95%,81 whereas, the other57 showed a lower sensitivity of HU compared with ME (90% vs 100%). This last study only included 20 untreated patients with CD, all of whom were ME positive, but two of whom were HU negative. None of the included mixed-age studies assessed both of these antibodies. Heterogeneity existed in the analyses of sensitivity of tTG-GP in the adult, but it is likely close to 90%. In children, the pooled estimate was 93%. The sensitivity of tTG-HR was 98% in adults and 96% in children, although in both cases the CIs included a low of 90%. In studies of mixed-age populations the sensitivity was 90%.

Estimates of the sensitivity of the IgG class antibodies of EMA and tTG suggest that these tests have poor sensitivities of around 40%, although the specificities were quite high at around 98%. These findings suggest that this class of antibody would be inappropriate as a single test for CD, but may be useful in IgA-deficient patients, or in combination with an IgA class antibody. One study that assessed the use of IgA-tTG-HR with IgG-tTG-HR found a sensitivity of 99% and a specificity of 100% for the combination.72

The analyses of all the AGA subgroups demonstrated significant heterogeneity, making pooled estimates impossible. Be that as it may, the sensitivity of IgA-AGA in adults is likely not much higher than 80%, but seems somewhat higher in children. The specificity likely lies between 80% and 90%, in adults and children, although the studies of serial testing of AGA followed by EMA or tTG in the prevalence section of this report suggest that the specificity is low as well. Even if one considers an optimistic range, the performance of IgA-AGA in both adults and children is inferior to that of the other antibodies discussed above.

The analyses of IgG-AGA suffered from significant clinical and statistical heterogeneity, making even general summary statements difficult. With this in mind, the typical sensitivity of this test likely lies below 80% in adults, and between 80% and 90% in children. The specificities are likely close to 80% in adults and between 80% and 90% in children with the same warning coming from the prevalence studies, suggesting that in the era of EMA and tTG, testing for CD with AGA has a limited role.

   Figure 23. PPV based on the pooled estimates of sensitivity and specificity

In assessing the PPV and NPV of these tests it is important to keep in mind the prevalence of CD in the tested population. In all the included studies, the prevalence of CD would be considered quite high: the minimum study prevalence was 9%, and many studies demonstrated prevalences in excess of 40%. In comparison, Fasano et al.15 found the prevalence of CD in at-risk first-degree relatives of CD patients to be 4.55%. In general, based on our report, the prevalence of CD in high-risk groups such as suspected CD patients and first-degree relatives was less than 20% (in non-tertiary centers), and the prevalence in patients with anemia and diabetes was generally less than 10% (Celiac 2 section). As expected, overall the included studies demonstrated the classic relationship between prevalence and the PPV and NPV. At the relatively high prevalence of CD in these studies, the PPV (the chance that a positive test represents a true positive test) was quite high (>90%), but started dropping at a prevalence below 35% to values generally below 80%. Figures 21 and 22 represent the actual unweighted individual study data. It is therefore not surprising that the studies maintaining a high PPV at a low prevalence were all studies of small sample sizes. In the expected reverse relationship, at a prevalence above 45% the included studies showed a drop in the NPVs. However, in contrast to the situation with the PPV, the NPV would be expected to be between 95% and 100%, if not actually close to 100%, at the expected prevalence of CD in most clinical situations. The same relationship was seen when the pooled estimates of the sensitivity and specificity for each analysis group were used to calculate the PPV over a range of prevalences (Figure 23). Therefore, the potential problem with EMA and tTG serological testing lies in their performance in situations of “low” prevalence of CD (i.e., less than 20%, a value that is still higher than the prevalence of CD in most at-risk groups). 
Unfortunately, it was difficult to directly estimate the PPV of EMA and tTG based on the prevalence studies, such as the one by Fasano et al., since many of the studies only performed serology testing, or there was incomplete biopsy confirmation. However, in studies where it could be estimated using the best performing EMA or tTG serological test, the PPV ranged from 66.7% to 95.0%,209, 211, 212, 214, 215, 220, 223, 323 with all but one study having a PPV of less than 88.9%. Most of the studies had PPVs in the range of 70% to 80%. In this same group of studies that assessed the prevalence of CD in a general population, five studies showed 100% PPV; however, in all these studies there were fewer than ten confirmed CD cases,213, 217, 222, 225, 231, 269 and in three studies there were three or fewer confirmed cases.217, 222, 231 The PPV of IgA/IgG AGA screening alone was considerably worse, and it was not uncommon in serial testing studies to see a ten-fold drop in potential cases when moving from AGA to subsequent EMA and tTG confirmation.
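The dependence of PPV and NPV on prevalence follows directly from Bayes' theorem. The sketch below shows the calculation in Python. The sensitivity of 95% and specificity of 98% are assumed round values chosen only to resemble the ranges discussed in this section; they are not the pooled estimates used for Figure 23, and the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics (illustrative only).
SENSITIVITY = 0.95
SPECIFICITY = 0.98

for prevalence in (0.01, 0.05, 0.10, 0.20, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:5.0%}: PPV {ppv:6.1%}, NPV {npv:6.1%}")
```

Under these assumptions the PPV falls steeply as prevalence drops below about 10%, while the NPV stays close to 100% until prevalence becomes high, which is the pattern described above.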

From the preceding discussion it is clear that in the diagnostic studies of the serological tests, the sensitivities of EMA and tTG antibodies for the detection of CD are quite high. Furthermore, the specificities and NPVs are nearly perfect, making these antibodies appealing candidates for screening, as well as for the diagnosis of suspected CD patients. However, the pressing question is whether the reported high sensitivities and PPVs in these studies, and the enthusiasm surrounding these antibody tests, will hold true when these tests are applied to different clinically relevant populations. Of concern is the true PPV of these tests when they are applied in populations with a relatively “low” prevalence (<10%–20%) of CD. This is an important issue, since the proportion of patients who would undergo unnecessary further testing will rise as the PPV falls. For example, if the PPV falls to a value of 80% (based on the examination of Figure 21), then 20% of screen-positive individuals would undergo unnecessary testing and/or treatments. From the estimates discussed above derived from the population screening studies, and from the plots of PPV versus prevalence, it would appear that the PPV of these tests is potentially lower than the diagnostic test studies suggest it is.
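The link between PPV and unnecessary follow-up can also be expressed as head counts in a screened cohort. The following Python sketch uses a hypothetical cohort of 10,000 people, an assumed prevalence of 1%, and assumed sensitivity and specificity of 95% and 98%; none of these numbers is taken from the included studies.

```python
def screening_counts(cohort_size: int, prevalence: float,
                     sensitivity: float, specificity: float) -> dict:
    """Expected true/false positive and negative counts for one screening round."""
    diseased = round(cohort_size * prevalence)
    non_diseased = cohort_size - diseased
    true_pos = round(diseased * sensitivity)
    true_neg = round(non_diseased * specificity)
    return {
        "tp": true_pos,
        "fn": diseased - true_pos,
        "tn": true_neg,
        "fp": non_diseased - true_neg,
    }


# Hypothetical general-population screen (all inputs are assumptions).
counts = screening_counts(10_000, prevalence=0.01, sensitivity=0.95, specificity=0.98)

screen_positive = counts["tp"] + counts["fp"]
ppv = counts["tp"] / screen_positive
unnecessary_fraction = counts["fp"] / screen_positive  # equals 1 - PPV

print(counts)
print(f"PPV {ppv:.1%}; screen-positives without CD {unnecessary_fraction:.1%}")
```

The fraction of screen-positive individuals who would be sent for unnecessary further testing is simply one minus the PPV, which is how the 80% PPV example in the text yields 20%.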

The vast majority of studies, as well as our own TEP, required that the small intestinal mucosa show at least partial villous atrophy histologically for the diagnosis of CD to be made. In fact, most of the studies used patients with subtotal or total villous atrophy. Furthermore, inherent to the clinical definitions of classic, atypical, and silent CD described in the methods is the requirement of having a “fully developed” villous atrophy. However, Fasano et al.,15 in a large American prevalence study, found that only 34% of biopsied EMA-positive subjects had subtotal or total villous atrophy (modified Marsh IIIb or IIIc). In this study, no EMA-positive patient had a Marsh I lesion, 26% had a Marsh II lesion and 40% had a Marsh IIIa lesion. It is clear from this study, and from the discussion about biopsy later in this section, that true CD exists in patients with histologic grades less severe than classic Marsh III lesions, and that patients with silent CD do not have to have fully developed villous atrophy. The problem that then arises is whether the reported sensitivities of these antibodies hold in the majority of patients who have CD, yet with less severe histology. As well, if the sensitivity is not as high as reported then, by definition, the nearly perfect NPV of IgA EMA and tTG would also be expected to suffer.

This question has been answered in several studies that have correlated histology with the sensitivity of these serological markers, and also mirrors to some extent the antibody response that occurs once patients with CD are placed on a GFD. A description of results of these studies follows below, while a full narrative with tables is located in the Appendix H.

Rostami et al.16 evaluated the diagnostic value of IgA EMA and AGA in 101 untreated patients with CD. The combination of the two tests showed an overall sensitivity of 76%. But, alarmingly, the sensitivity of EMA in these patients dropped precipitously with milder histological grades. EMA demonstrated a sensitivity of 100% in Marsh IIIc, 70% in Marsh IIIb and only 30% in Marsh IIIa. The authors did not consider patients with Marsh I or II lesions as having CD.

Tursi et al.424 assessed the relationship of the histologic grade to tTG positivity in 119 consecutive adult CD patients defined by characteristic duodenal biopsy and “permanent gluten sensitive enteropathy.” In this study, the frequency of tTG positivity (sensitivity) and mean tTG levels were greatest with the highest modified Marsh grade, and dropped steadily with milder histologic grades, reaching a low of only 8% positivity in CD patients with Marsh I lesions. The sensitivities of tTG in Marsh IIIc, IIIb, IIIa, and II were 96%, 84%, 56%, and 33%, respectively. In another publication, likely using the same population of “permanent gluten-sensitive enteropathy,” Tursi et al.425 demonstrated similar results with AGA and EMA in a population of atypical CD (defined in methods). The sensitivities of EMA in Marsh IIIc, IIIb, IIIa, II, and I were 97%, 92%, 89%, 40%, and 0%, respectively. The results with AGA showed a similar pattern, with the sensitivity dropping from 90% in Marsh IIIc to 30% in Marsh II.
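The practical consequence of grade-dependent sensitivity is that the overall sensitivity observed in a study depends on the histologic case mix. The Python sketch below weights the tTG sensitivities reported by Tursi et al. for each Marsh grade by two case mixes. Both case mixes are hypothetical and chosen only for illustration; they do not describe any study population in this report.

```python
# tTG sensitivity by modified Marsh grade, as reported by Tursi et al. in the text.
TTG_SENSITIVITY_BY_GRADE = {"IIIc": 0.96, "IIIb": 0.84, "IIIa": 0.56, "II": 0.33, "I": 0.08}


def overall_sensitivity(case_mix: dict, sensitivity_by_grade: dict) -> float:
    """Case-mix-weighted sensitivity; case_mix maps grade -> proportion of CD cases."""
    total = sum(case_mix.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("case-mix proportions must sum to 1")
    return sum(share * sensitivity_by_grade[grade] for grade, share in case_mix.items())


# Hypothetical case mixes (assumptions, not study data).
severe_mix = {"IIIc": 0.6, "IIIb": 0.3, "IIIa": 0.1, "II": 0.0, "I": 0.0}
broad_mix = {"IIIc": 0.1, "IIIb": 0.1, "IIIa": 0.3, "II": 0.3, "I": 0.2}

severe_sens = overall_sensitivity(severe_mix, TTG_SENSITIVITY_BY_GRADE)
broad_sens = overall_sensitivity(broad_mix, TTG_SENSITIVITY_BY_GRADE)

print(f"mostly severe atrophy: {severe_sens:.1%}")
print(f"full histologic spectrum: {broad_sens:.1%}")
```

With identical per-grade performance, a study enrolling mostly patients with subtotal or total villous atrophy would report a much higher overall sensitivity than one sampling the full histologic spectrum.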

Furthermore, in likely the same population of “permanent gluten-sensitive enteropathy,” Tursi et al.426 found a relationship between clinical manifestation of CD and EMA sensitivity. EMA was positive in 77 of 96 (80.8%) patients with atypical CD and in 17 of 27 (63.0%) patients with silent CD. EMA was negative in patients with Marsh I lesions. Once again, assuming that all these patients with “permanent gluten-sensitive enteropathy” are truly CD patients, then EMA would miss 19% of atypical CD, and 37% of silent CD that were picked up on the basis of biopsy.

Demir et al.427 studied the presentation and clinical features of 104 newly diagnosed Turkish children. EMA and biopsy correlation was available for 72 children. Similar to what was described above, EMA was positive in 92% of patients with Marsh III lesions versus 66.6% of patients with Marsh I-II lesions. Kotze et al.428 assessed 47 symptomatic subjects with CD with intestinal biopsy, tTG and EMA antibodies. The authors found a statistically significant correlation between antibody titres of EMA and tTG, and histologic grades.

Hoffenberg et al.317 studied a group of children at risk of CD who were part of a large prospective study of the genetic and environmental factors associated with autoimmune diseases. No relationship was found between Marsh grade and the genetic risk factor leading to screening, but a significant correlation was found between Marsh grade and tTG (r=0.57, p<0.01).

In a small case-control study assessing the diagnostic value of EMA, Sategna-Guidetti et al. also found that in patients with documented CD, EMA positivity correlated with the severity of the histologic grade.429 In this study, EMA was falsely negative in 50% of CD patients without villous atrophy.

The findings of the large prevalence study by Fasano et al.,15 however, require further discussion within this context. This study demonstrated a very high prevalence of CD of 0.95% (1:105) in asymptomatic not-at-risk adults using IgA-EMA. Additionally, 34% of biopsied EMA-positive subjects had subtotal or total villous atrophy (modified Marsh IIIb or IIIc), 40% had a Marsh IIIa lesion, and 26% had a Marsh II lesion. No CD patient in this study had a Marsh I lesion, although this is in part likely due to how they defined CD. In any case, there are at least two ways to interpret these results. The first is that EMA testing does pick up the mild Marsh grades, given the high prevalence of CD in this study. The second is that, based on the preceding discussion and the serology monitoring data, this study has missed an unknown number of CD patients with milder histological grades. Unfortunately, since we do not have follow-up data on the screen-negative patients in this study, this question will be difficult to answer and arguments can be made on both sides.

The question that remains, however, is whether subjects with low-grade histologic lesions are at the same risk of long-term complications as those with more advanced histologic grades. On the one hand, it is apparent that symptoms may not correlate with histologic grade but rather with the length of affected small bowel. When the distribution of histological grades is compared among patients with CD who are clinically asymptomatic versus symptomatic, the same distribution of grades is seen. For practical reasons, few of the studies we identified assessed length of small bowel involvement with CD. But another question arises: are patients with early Marsh lesions who test positive for serology the ones who have more extensive small bowel disease?430 These questions add to the uncertainty regarding the true performance of serological testing, and whether missing early grade histologic lesions is important. Although we could not find direct evidence comparing outcomes in patients based on their histologic grades, it is not unreasonable to think that a patient with Marsh I-II lesions would still have an increased risk of CD complications (see Celiac 4 and 5 for some data regarding this point).

In summary, it is clear from our pooled estimates of the included studies that IgA-EMA and IgA-tTG antibodies provide excellent specificity for the diagnosis of CD. However, the high reported sensitivities may only apply to the selected group of patients with villous atrophy. Furthermore, if the sensitivity is in fact lower when the entire biopsy spectrum of CD is considered, then the nearly perfect NPV of these tests, particularly in low-prevalence populations, would also be expected to suffer. Finally, the PPV of these tests may not be as high as suggested when the tests are applied in low-prevalence populations, as demonstrated by our estimates of PPV from the population screening studies. These potential limitations of serological testing can have profound implications for population screening initiatives, and verification of the sensitivity of these antibodies in a large population of CD patients showing the full histological spectrum is urgently required.

HLA DQ2/DQ8

The HLA DQ2 haplotype represents the occurrence of the HLA class II heterodimer alleles DQA1*0501 and DQB1*0201. These typically occur in a cis position as HLA DR3-DQ2 or in a trans position as HLA DR5/DR7-DQ2. The HLA DQ8 haplotype DQA1*0301/DQB1*0302 typically occurs in association with DR4. HLA DQ2 occurs in about 20% to 40% of the general population,9, 10, 15, 100, 135, 136, 138–141, 143, 146, 147, 150–157, 159, 167 48% to 65% of healthy relatives of patients with CD,158, 161, 164, 166, 167, 169, 172, 177 and in up to 73% of non-CD patients with type I diabetes.97, 165 In one study, 100% of patients with enteropathy associated T-cell lymphoma (EATCL) were HLA DQ2 positive.151 Non-CD patients with Down Syndrome appeared to have the same frequency of HLA DQ2 as the general population.109, 134, 160

Populations of non-Western European descent demonstrated very wide variations in the frequencies of HLA DQ2 both in CD patients and controls.120, 137, 142, 148, 159, 163

Overall, it can be seen that HLA DQ2 alone offers a sensitivity in excess of 90%, which can be improved to close to 100% if a strategy of testing for both HLA DQ2 and HLA DQ8 is utilized (either test being positive). The specificity of both tests together, or either test alone, is not as good as the sensitivity, falling in the range of 55% to 80%. The specificity becomes considerably worse if a population with a higher expected frequency of HLA DQ2 or HLA DQ8, such as first-degree relatives of patients with CD or patients with type 1 diabetes, is tested. The PPV (the probability that a positive test represents a true positive result) of testing for HLA DQ2/8 in an average population is generally low. One, however, needs to keep in mind the dependence of predictive values on the prevalence of CD in the population to be tested. Therefore, in high-risk groups, such as first-degree relatives or patients with type I diabetes, the PPV tends to be higher. Conversely, it appears that the value of testing for HLA DQ2/8 is highest when a negative test is found. Given the high NPV of this test, average-risk patients can have the diagnosis of CD excluded based on a negative test. The situation is more complex in high-risk groups, since the NPV decreases with increasing prevalence, and with the recognition that there are HLA DQ2/DQ8-negative patients with CD. These findings, along with the cost of HLA testing, make routine use of this modality for screening or diagnosis inappropriate. However, this test is most useful in cases of diagnostic uncertainty or as part of a multi-test gold standard in clinical studies.
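The value of a negative HLA result can be illustrated with a negative likelihood ratio, which converts a pre-test probability of CD into a post-test probability. The Python sketch below assumes a sensitivity of 98% and a specificity of 60% for combined DQ2/DQ8 testing. These are round values picked from within the ranges quoted above, not estimates from this report, and the function names are hypothetical.

```python
def negative_likelihood_ratio(sensitivity: float, specificity: float) -> float:
    """LR- = P(negative test | disease) / P(negative test | no disease)."""
    return (1.0 - sensitivity) / specificity


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pre-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Assumed characteristics of combined HLA DQ2/DQ8 testing (illustrative only).
lr_negative = negative_likelihood_ratio(sensitivity=0.98, specificity=0.60)

for label, pre_test in (("average risk", 0.01), ("high risk", 0.10)):
    post_test = post_test_probability(pre_test, lr_negative)
    print(f"{label}: pre-test {pre_test:.1%} -> post-test after negative HLA {post_test:.2%}")
```

Under these assumptions a negative result lowers the probability of CD by roughly thirty-fold in both groups, but the residual probability remains higher in the high-risk group, consistent with the lower NPV described for those populations.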

Biopsy

Unfortunately, we could not identify any studies that assessed the sensitivity or specificity of biopsy for the diagnosis of CD. This is perhaps not surprising considering that CD has historically been, and for the most part continues to be, diagnosed based on characteristic histological features. These histologic features have been classified and categorized by Marsh and others,1, 16 and criteria for the diagnosis of CD have been proposed,2 and modified4 (Appendix A). A biopsy showing characteristic features that improves with a GFD and recurs with gluten challenge is by definition the gold standard for the diagnosis of CD and therefore would be expected to be highly specific (some patients such as those with refractory sprue will not improve on a GFD but are still considered to have CD, so the specificity of this definition is not absolute nor perhaps completely valid). Although we do not have actual numbers, it would appear from the qualitative assessment of the identified articles that a biopsy classified as a Marsh IIIa or higher is likely to have a high specificity for the diagnosis of CD. However, as seen in the study by Fasano et al.,15 such criteria would be expected to have a low sensitivity. Alternatively, one would expect that biopsy could have a very high sensitivity if a Marsh I lesion was used to define CD, though clearly given the wide differential of mild histologic changes (Table 1, Appendix A), the specificity would be expected to drop. Therefore, to try to estimate the sensitivity and specificity of biopsy, and particularly the lower histology grades, we have compiled some articles below that provide “uncontrolled indirect information” on this subject.

Inter-observer agreement in the histologic assessment of small bowel pathology. As previously described, there are several potential criteria for the diagnosis of CD. The original and modified ESPGAN criteria2, 4 appear direct. Most of these criteria, as well as the assembled TEP, felt that some degree of villous abnormality is required for the diagnosis of CD. In practical terms, even distinguishing between a Marsh II (no villous abnormality) and a Marsh IIIa (minimal villous changes) can be difficult.431 This concern is further confounded by potential problems with the biopsy specimens themselves such as size, orientation, quality, and proper biopsy sampling. Hence, agreement between different pathologists and between the same pathologist at different times becomes important. The biopsy literature search identified a few articles that addressed pathologist agreement.

Weile et al.432 assessed inter- and intra-observer agreement among three experienced Swedish and Danish pathologists reading the small bowel histology of patients suspected of having CD. Ninety small-bowel biopsies taken by capsule near the ligament of Treitz from 73 children were selected at random from a larger sample taken from 1987 to 1994. The final diagnosis was made on the basis of evaluation of specimens by dissecting microscopy, formalin-fixed H&E-stained slides, intestinal disaccharidases, serology and clinical presentation. The initial biopsy reports from patient files were sorted into normal (66; normal or minor nonspecific abnormalities; 85% were on a gluten-containing diet [GCD]), pathological (17; total and severe villous atrophy, all on GCD), and inconclusive (seven; because of poor orientation, small sample, or autolysis). Several years later (1997) the same three pathologists who read the initial biopsies performed a second reading of the slides given to them in random order. In comparison with the first reading, the number of inconclusive readings rose from seven to 22, and there was a corresponding fall in the number of biopsies read as normal and pathological. Considering the overall biopsy reading and diagnosis, the kappa statistics (a statistical measure of agreement “correcting” for chance433) were 0.57, 0.63, and 0.75 for the three pair-wise comparisons of the three pathologists. These kappa values were reported to be “moderate” (for two out of the three agreement kappa scores) to “substantial” in terms of agreement, and suggest that agreement is far from perfect even when the same pathologist reads the same slide twice.
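For readers unfamiliar with the kappa statistic, the Python sketch below computes Cohen's kappa for two readers from a square agreement table. The 3-by-3 table of 90 biopsies is entirely hypothetical; it is not the data of Weile et al., whose cross-tabulations are not reproduced in this report.

```python
def cohens_kappa(table) -> float:
    """Cohen's kappa for a square table where table[i][j] counts cases
    that reader A put in category i and reader B put in category j."""
    size = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(size)) / total
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(size)) for j in range(size)]
    # Agreement expected by chance if the two readers were independent.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1.0 - expected)


# Hypothetical readings of 90 biopsies (rows: reader A, columns: reader B);
# categories are normal, pathological, inconclusive.
hypothetical_table = [
    [50, 5, 5],
    [3, 12, 2],
    [4, 2, 7],
]

kappa = cohens_kappa(hypothetical_table)
print(f"kappa = {kappa:.2f}")
```

In this made-up example the two readers agree on 69 of 90 biopsies, yet kappa is much lower than the raw agreement because a large share of that agreement would be expected by chance when most biopsies are read as normal.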

Vilela et al.431 also assessed inter-observer agreement among Brazilian pathologists in the diagnosis of CD. Three experienced masked pathologists independently read the slides of 34 patients with CD based on ESPGAN criteria. Agreement differed among the three possible pair-wise comparisons, with the best agreement occurring between pathologists A and C. Good to excellent agreement (kappa 0.61–0.85) was obtained for the assessment of villous structure. Reasonable to good agreement was observed for increased number of crypt mitosis (kappa 0.63), and decrease in the overall number of villi (kappa 0.47–0.53). However, agreement about the number of IELs using standard staining was weak (kappa 0.39). Interestingly, the agreement regarding overall histologic grade was also weak between two pathologist pairs, and reasonable to good for the last pair. As with the above study, it is difficult to comment on the generalizability of these results. The authors suggest that the number of CD cases seen was fewer than expected, and qualitative rather than quantitative measures of such parameters as villous height and IELs were used. Still, the findings suggest that agreement regarding the histologic grades should not be taken for granted.

Several authors have suggested that quantitating various histologic features, such as the number of IELs per 100 or more enterocytes, results in greater reproducibility of biopsy readings.434 Authors who used quantitative criteria during studies of inter-observer agreement likewise showed better agreement than reported above.435–437 These studies suggest that the use of quantitative methods in the reading and reporting of small bowel histology, by pathologists experienced in the reading of CD biopsy specimens, leads to greater agreement among pathologists and presumably more uniform and standardized reporting.

Latent CD. The presence of latent CD is a threat to the diagnostic accuracy of biopsy, since these patients truly have normal intestinal histology.

Stenhammar et al.438 conducted an initial study of 100 first-degree relatives of 32 patients with CD. All 100 relatives were biopsied and two cases of CD were identified. In a 20-year follow-up study, Hogberg and Stenhammar247 performed serological evaluation (AGA, EMA, tTG) on these same 100 relatives and their offspring, with positive results prompting intestinal biopsy. All relatives with initial “mild or moderate mucosal” abnormalities remained unchanged and were not considered to have CD. Eight new CD cases were identified; two of these were relatives of the two cases diagnosed in the first study. One of these, a parent of an affected child, had a grade II–III lesion in the first study that normalized on a GFD, and remained normal after 3 years of a GCD; she was not classified as CD, though in retrospect she likely represents a late relapser rather than transient gluten intolerance or a true latent CD. The other patient had a grade II lesion, but initially was not regarded as having CD because of the absence of symptoms. She was also found to be DQ2 positive. The remaining six newly diagnosed subjects were offspring of index CD cases and were not part of the initial cohort. In all, only two subjects of the initial biopsied cohort were “missed” in the first study. In retrospect, these subjects should have been included. This suggests that biopsy has the potential of high sensitivity and specificity for CD. Unfortunately, in the follow-up study, the number and HLA status of those with mild-to-moderate mucosal abnormalities (serology negative) was not reported, and since not all subjects were rebiopsied it is also unclear if there is a group of serology-negative, initially normal biopsy relatives that have developed higher grade histology at follow-up, suggesting latent CD.

Maki et al.62 likewise after an initial biopsy screen of 113 first-degree relatives of CD patients, discovered 13 relatives with villous atrophy and crypt hyperplasia. During a 3-year follow-up period another three relatives, with previously “normal biopsies” who were AGA positive, were found to have CD. Unfortunately, the authors do not report on the number of relatives with low-grade histologic lesions, and whether the new cases were in patients with completely normal (Marsh 0) lesions or normal in terms of absence of villous atrophy.

Troncone et al.439 searched the medical records of 25 centres in Italy over a 10-year period to identify children with latent CD defined as either individuals with initial normal biopsies who later developed villous atrophy and responded to a GFD (Group 1), or people who were previously diagnosed with CD by ESPGAN criteria and who were subsequently found to have normal histology on a GCD for 2 years (Group 2). Nineteen such cases were found. All these patients had normal morphometric analysis and IEL counts on the initial biopsy. Four of the 14 GFD responders were considered at risk of CD (first degree, diabetes). The authors suggested that the five Group 2 patients could either represent true transient gluten-intolerance, or, in their opinion, more likely be late relapsers. These results of apparent post-pubertal recovery from CD are similar to those reported by Maki et al.440 and by Schmitz.441 Although the authors do not report on the number of charts or children screened, the findings of this study suggest that latent CD is very rare and unlikely to impact on the diagnostic accuracy of biopsy. It, however, underscores the importance of a time dimension in studies of CD, to accurately assess the true false positive and negative rates of diagnostic tests for CD.

IELs with normal villous structure. CD exists in patients with normal villous structure. The biopsy can pick up these patients on the basis of crypt changes and/or changes in the number and type of IELs.

Ferguson et al.442 assessed the relationship of raised levels of IELs to the final diagnosis among children with diarrhea. The authors found a lack of correlation between IEL counts and morphologic grading of the biopsy. However, among seven children ultimately found to have no organic disease, all had normal IEL counts in the range of 14–25/100 epithelial cells (ECs). Two of three children with CD on a GFD also had normal IEL counts. In contrast, the values were elevated to greater than 38 IEL/100 ECs in untreated CD patients. High counts were also found in three children with failure to thrive or diarrhea of unknown etiology, and in three of nine children with giardiasis, although in these cases the mean values were lower than in the untreated CD cases. Interestingly, among 14 children with gastroenteritis, ten had abnormalities of the villi, crypts or lamina propria, but all but one had IEL counts within the normal range. Although the differential of mild mucosal changes is large, this study suggests that one of the histologic features of CD can distinguish between CD and other mild enteropathies, and could potentially allow for a relatively high sensitivity by allowing CD to be defined by a low-grade Marsh lesion, while maintaining some of the specificity. This theme will be revisited in studies that follow.

Table 46. Results of study assessing γδ+ IELs in patients with and without CD136
Test | Celiac (n=27) | CD excluded on biopsy (n=79) | Biopsy-negative controls (n=28)
Mean # of γδ+ IELs | 40.4 (95% CI: 32.7–48.2) | 6.7 (95% CI: 4.8–8.5) | 1.6 (95% CI: 1.1–2.1)
Elevated γδ+ IELs (> 4.4 cells/mm) | 27 (100%) | 39 (49%) | n/a
AGA positive | 21/26 (81%) | 33/66 (50%) | n/a
Reticulin antibodies | 27/27 (100%) | 18/78 (23%) | n/a
HLA DQ2 | 19/21 (90%) | 20/67 (30%) |
Iltanen et al.136 assessed the γδ+ IELs in patients with and without CD. One hundred and seven patients were evaluated for possible CD. Twenty seven were found to have CD (25%) on the basis of ESPGAN criteria. As well, 28 biopsy-negative adults who underwent endoscopy for dyspepsia were used as controls. Table 46 details the main study findings.

The mean density of γδ+ IELs was significantly greater in CD patients compared with those patients where CD was excluded on biopsy, and compared with biopsy-negative controls. The density of these IELs was also significantly higher in patients with CD excluded on biopsy compared with controls. Because the authors used the ESPGAN criteria, which require some degree of villous atrophy, the finding that 50% of subjects with CD excluded on these criteria were AGA positive raises the question of how many of these were actually CD patients. However, based on the reported data, elevated γδ+ IELs were calculated to have a sensitivity of 100%, but a specificity of only 50.6%, although the true specificity is likely higher. In the biopsy-negative suspected CD group, 66 out of the 79 underwent testing for HLA DQ2. Out of these patients, 46 tested negative for HLA DQ2. Given the high NPV of this test, it is likely that most of those patients do not have CD. Recalculating the specificity based on this assumption would raise its value, but unfortunately a breakdown of the number of patients with normal and elevated IEL in relation to HLA DQ2 was not reported. In any case, a better comparison would have been with the biopsy-negative control subjects, but the number of control subjects with raised IELs is not reported. Based on the mean density of IELs in this group, the number of patients with elevated IELs is likely to be low. During follow-up of the children suspected of having CD, but with normal mucosal biopsy and positive serology, four patients developed CD and responded to a GFD, further suggesting that this “control” group of patients with CD “excluded” on biopsy likely contained true CD patients who did not have villous atrophy. 
The results also suggest that the measurement of γδ+ IELs can be valuable in the diagnosis of CD, and hint at the fact that the requirement of villous atrophy on biopsy may miss some subjects with CD, particularly if they have raised IEL levels, positive serology and are HLA DQ2 positive.

Table 47. Results of study assessing density of γδ+ IELs in patients with untreated CD, treated CD and control patients443
Parameter | Sub-total/total villous atrophy (n=18) | Moderate villous atrophy (n=7) | Normal mucosa (n=9) | Pediatric controls (n=15) | Adult controls (n=15)
Diet | normal | GFD | n/a
γδ+ IELs/100 ECs | 14.8 | 17.5 | 14.5 | 3.1 | 3.6
Kutlu et al.443 also studied the density of γδ+ IELs in untreated CD, treated CD and control patients (Table 47). The study population was made up of five children with classic CD with total villous atrophy and improvement on a GFD (Group A), seven patients studied after 1 to 11 years of a GFD with mucosal recovery (Group B), and 22 patients with CD by ESPGAN criteria who were left on a normal diet for 1 month to 10 years (Group C). The control group consisted of 15 children with various GI disorders other than CD, and 15 adults undergoing intestinal surgery for gastric and pancreatic disorders. The report aggregated data from groups A and C.

The density of γδ+ IELs/100 enterocytes was significantly higher in CD patients (15.4, n=34) compared with pediatric and adult control patients (3.1 and 3.6, respectively). However, the density did not correlate with histologic grade or with a GFD. Unfortunately, this study has several methodological flaws, and estimates of the sensitivity and/or specificity of IELs in CD could not be derived. However, the study does indicate the potential usefulness of measuring γδ+ IELs in the overall evaluation of biopsy specimens for possible CD, and again demonstrates that CD patients can have a biopsy with normal villous structure that can be distinguished from that of normal subjects by assessing the number of IELs.

In an interesting comparative study of the correlation of IELs with AGA positivity by ELISA, O'Farrelly et al.444 studied 25 patients who had typical histologic features of CD and who were subsequently placed on a GFD. Ten of these were AGA positive, whereas 15 were negative. The second group consisted of 28 subjects suspected of CD but with “normal” small bowel histology. Twelve were AGA positive and 16 were negative. Increased levels of IELs were seen in both AGA-positive (82.5) and AGA-negative (74.3) CD patients (difference not significant). On the other hand, among those with “normal” histology, AGA-positive subjects had a significantly higher density of IELs than those who were AGA negative (42.4 vs 17, p<0.001). These data suggest that subjects suspected of CD with normal villous structure who have raised IEL densities should be further evaluated for CD, especially if serology is positive. These are also the types of patients in whom response to a GFD may be invaluable to firmly establish the diagnosis and help clarify the diagnostic value of low-grade histologic lesions.

Table 48. Results of study comparing density of γδ+ IELs in patients with confirmed CD, those undergoing investigation for CD, and control subjects445
Parameter | Confirmed CD (n=9) | CD under investigation (n=40) | Controls (n=143)
IELs/50 ECs | 68.55 | 51.21 | 11.14
# with raised IELs (estimated from figure) | 9 | 40 | 2
Saputo et al.445 compared the density of IELs between patients with confirmed CD, those undergoing investigation for CD, and control subjects (Table 48). The normal IEL range was determined to be between 4.68 and 17.60 based on the control group mean +/- 2 SD.
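
The normal range quoted above can be checked with a short calculation. The following is a minimal sketch, assuming the range was computed as mean ± 2 SD exactly as described; the control SD is not reported in the text and is back-calculated here from the stated limits, so `implied_sd` is an inferred value, not a published one.

```python
# Reference range for IEL counts defined as control mean +/- 2 SD.
# The control mean (11.14 IELs/50 ECs) and the limits (4.68 to 17.60) come
# from the text; the SD is NOT reported and is inferred (an assumption).

def reference_range(mean, sd, k=2.0):
    """Return (lower, upper) limits as mean -/+ k standard deviations."""
    return mean - k * sd, mean + k * sd

control_mean = 11.14
lower_reported, upper_reported = 4.68, 17.60

# The full width of a mean +/- 2 SD interval is 4 SD.
implied_sd = (upper_reported - lower_reported) / 4.0

lower, upper = reference_range(control_mean, implied_sd)
print(f"implied SD = {implied_sd:.2f} IELs/50 ECs")
print(f"reference range = {lower:.2f} to {upper:.2f} IELs/50 ECs")
```

The reported limits are symmetric about the control mean, which is consistent with the stated mean ± 2 SD definition.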

These results again suggest the usefulness of IELs in the evaluation of histology of patients being assessed for CD, and suggest a sensitivity of raised IELs of 100%, and a specificity of 98.6%. Unfortunately, the authors do not report the number of individuals under investigation for CD who actually ended up having CD, so as to estimate the diagnostic parameters in this group.
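
The sensitivity and specificity quoted above follow directly from the counts in Table 48. The sketch below is illustrative only: it treats the confirmed CD group as the diseased group and the controls as the non-diseased group, and it relies on counts that were estimated from a figure in the original study.

```python
# Sensitivity and specificity of raised IELs from the Table 48 counts.
# Counts of subjects with raised IELs were estimated from a figure, so
# these values are approximate.

def sensitivity(true_pos, false_neg):
    """Proportion of diseased subjects who test positive."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg, false_pos):
    """Proportion of non-diseased subjects who test negative."""
    return true_neg / (true_neg + false_pos)

confirmed_cd = 9             # confirmed CD patients
confirmed_cd_raised = 9      # of these, number with raised IELs
controls = 143               # control subjects
controls_raised = 2          # of these, number with raised IELs

sens = sensitivity(confirmed_cd_raised, confirmed_cd - confirmed_cd_raised)
spec = specificity(controls - controls_raised, controls_raised)

print(f"sensitivity = {sens:.1%}")
print(f"specificity = {spec:.1%}")
```

The 40 patients still under investigation are left out because their final diagnoses were not reported.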

Table 49. Results of study comparing IEL density and villous/crypt ratio in patients with a suspicion of CD, and 59 biopsy-negative controls with dyspepsia436
Parameter | Untreated CD (n=138) | Treated CD (n=198) | Suspicion of CD with normal villi (n=545) | Controls (n=59)
CD3+ IELs | 68* | 40* | 26 | 30
γδ+ IELs | 19.8* | 12* | 3.2 | 2.3
Villous/crypt ratio | 0.6* | 1.9* | 2.8 | 3.0
* statistically different from control

Similarly, Jarvinen436 studied IEL density and villous/crypt ratio in 928 Finnish patients with a suspicion of CD, and 59 biopsy-negative controls with dyspepsia (Table 49). CD was diagnosed on the basis of a suggestive small intestinal biopsy showing some degree of villous atrophy with subsequent improvement on a GFD. The main results, excluding DH patients, are presented in Table 49.

The authors noted that using a cut-off of 37 cells/mm for CD3+ and 4.3 cells/mm for γδ+ IELs, the sensitivities and specificities were 93% and 73% for CD3+, and 93% and 88% for raised γδ+ IELs, respectively. The PPVs and NPVs for raised γδ+ IELs were 95% and 85%, respectively, in this population. However, these results are based on the well-documented clear-cut CD group, and did not take into consideration the CD patients who might be in the suspicious but normal villi group. Among the patients with a suspicion of CD but normal villi and high γδ+ IELs (>4.3), 28% were EMA positive compared with only 8% of those with normal γδ+ IELs (<4.3). Unfortunately, the outcomes of these patients are not reported, so one cannot comment further based on this study about the usefulness of IELs in Marsh I or II patients.

Table 50. Results of study assessing IEL density in routinely stained specimens compared with specimens stained with the CD3 antibody446
Parameter | CD (n=8) | Treated CD (n=4) | Non-CD (n=16) | Controls (n=11)
Mean age | 33.5 | 46.3 | 46.4 | 39.1
IELs/100 ECs by H&E staining | 42.1 | 29.2 | 36.8 | Not increased
IEL/100 ECs in villous tip by CD3 staining | 47.5 | 29.4 | 33.2 | 8.2
Mino et al.446 assessed the density of IELs in routinely stained specimens compared with specimens stained with the readily available CD3 antibody. Twenty-eight subjects with architecturally normal duodenal biopsies, which were well-oriented and demonstrated greater than 20 IELs/100 ECs, were included in the study. AGA, EMA and tTG antibodies were measured. Subjects were divided into the groups listed in Table 50. Controls consisted of seven normal individuals, two patients with reflux, and two patients with irritable bowel syndrome.

There were no statistically significant differences between any of the groups when IELs were measured with H&E staining. However, all pair-wise comparisons were statistically different, except between the treated CD group and the non-CD group, when villous-tip IELs were counted with CD3 staining. The authors conclude that villous tip IELs are more specific indicators of CD, particularly with CD3 staining (which is more readily available than staining for γδ+ IELs), and suggest that the specificity of low grade Marsh lesions could be improved by these techniques.

Table 51. Results of study comparing IEL density and villous distribution among patients suspected of CD447
Parameter | CD (n=12) | Non-CD (n=66) | Controls (n=24)
Mean age | 35.2 | 36.1 | 34.5
IgA EMA | 8 | 3 (no response to GFD) | n/a
IgA AGA | 5 | 13 (all EMA neg.) | n/a
Villous tip IELs | 11.6 | 4.3 | 2.2
IELs distributed evenly along the villi | 9/12 (75%) | 3/68 (4%) | 0

n/a = not applicable

In a similar study, Goldstein et al.447 compared IEL density and villous distribution among patients suspected of CD. Twelve patients were diagnosed with CD based on histologic features and response to a GFD, whereas in 66 patients the diagnosis of CD was excluded based on biopsy, and supported by negative serology (and in some cases a lack of response to a GFD). Control cases consisted of patients with dyspepsia who underwent endoscopy and biopsy. The main results are summarized in Table 51.

The authors found that the mean villous tip IEL density was significantly greater in the CD group than in the non-CD and control groups. A more even distribution of IELs along the villi was also found to be significantly more common in the CD group compared with the other groups. However, this last point is controversial. Unfortunately, because this was a small study, the authors did not look at differences in these characteristics among CD patients with different histologic grades.

Kuitumen et al.448 compared the histologic features of children with untreated CD, treated CD, other GI disorders (cow's milk allergy, DH, congenital lactase deficiency, acrodermatitis enteropathica, and giardiasis) and a group of control subjects without GI pathology. Of the 52 children with CD in this group, all had severe villous atrophy. CD patients had the lowest enterocyte height and the most intense IEL infiltration of the studied groups. The authors found no overlap between CD patients and controls for the density of IELs, villous height, crypt depth, and villous height to crypt depth ratio; all these parameters were statistically different between the CD patients and controls.

Kaukinen et al.449 studied 96 consecutive adults found to be ARA or AGA positive and compared them with 27 ARA- and AGA-negative patients with dyspepsia. All patients underwent duodenal biopsy, and CD was diagnosed on the basis of a villous height to crypt depth ratio of less than two and crypt hyperplasia. Twenty-nine patients met the biopsy criteria for CD (18 ARA- and AGA-positive patients, nine ARA-positive patients, and two AGA-positive patients). The 29 CD patients were placed on a GFD, and of the 21 who were rebiopsied at 6 to 12 months, all showed unequivocal histologic improvement. The mean densities of IELs in CD patients, serology-positive biopsy-negative patients, and control patients were 87, 38, and 25 cells/mm, respectively. These numbers were statistically different. The mean density of γδ+ IELs among the CD patients was 16.6. Eleven serology-positive patients with normal villous structure (presumably Marsh I and II) expressed HLA DR and had higher levels of γδ+ IELs (mean of 13.4 cells/mm) than the non-CD controls. A repeat biopsy (time unspecified) was performed in 12 serology-positive patients with normal villous structure at the time of the first biopsy. Ten of these had raised γδ+ IEL density on biopsy (Marsh I or greater). Five of these 12 were found to have villous atrophy (Marsh IIIa or greater). This study further illustrates the later development of CD in subjects with mild histologic changes, and suggests that although the specificity of villous atrophy may be high (all patients responded to a GFD), the sensitivity of villous atrophy (Marsh IIIa or higher) is lower than that of the serological test used in this study. This suggests that using a lower biopsy cut-off grade could improve sensitivity, albeit at the cost of specificity.

Using another approach, Wahab450, 451 identified 38 patients with symptoms of malabsorption who only demonstrated raised epithelial lymphocytes on duodenal biopsy (Marsh I). These patients were given a gluten challenge of 30g/day for 2 months, while maintaining their normal GFC. Twelve of 38 patients developed worsening mucosal lesions of crypt hyperplasia and partial or subtotal villous atrophy. After institution of a GFD all 12 patients showed improvement of their malabsorption, and improvement of their histology, suggesting that they truly had CD.

The same authors451 similarly studied 27 patients referred for malabsorption who were found to have a Marsh II lesion. HLA DQ2 or DQ8 was found in 21 of 27 patients (78%). The authors motivated 25 patients to follow a GFD, and all showed symptomatic improvement. The two patients who refused the GFD progressed to a Marsh IIIa lesion at follow-up. Although these data provide evidence of the true existence of CD in patients with Marsh II lesions, the frequency is unlikely to be as high as reported here. The high NPV of HLA DQ2/DQ8 suggests that at least some of the six patients testing negative likely do not have CD. In any case, this study adds further evidence to the notion that a Marsh III cut-off will miss some patients with CD.

In a very interesting study, Mahadeva et al.452 identified all duodenal biopsies performed over a 1-year period with increased levels of IELs, yet normal villous structure. Biopsies were formalin fixed and stained with H&E. Other biopsies showing at least subtotal villous atrophy and increased IELs were considered as “suggestive of CD.” Two normal control duodenal biopsies for every case of increased IELs with normal villous structure were also obtained. The upper limit of normal for IEL levels in this study was 22 IELs/100 ECs. Out of 626 biopsies assessed, 14 (2.2%) were found to have increased IELs and normal villous structure, whereas 15 (2.4%) cases of CD were identified. Normal histology was found in 502 (80.2%) of the biopsies. The biopsies with raised IELs had a mean of 38 IELs/100 ECs (range of 27–46). Control biopsies, on the other hand, had a mean of 12.4 IELs/100 ECs (range of 2–20). The presence of GI symptoms did not differentiate those with raised IELs from controls or CD patients in this cohort. Six of the 14 patients with raised IELs had positive EMA and/or unexplained anemia and were suggested as having “latent” CD by the authors. Unfortunately, follow-up in this group was incomplete, with only three of these patients undergoing repeat biopsy. As with the previously described studies, the group of patients evaluated for possible CD who have isolated increased IELs may contain a subset of true CD patients. In fact, if one assumes that the six EMA-positive subjects with raised IELs do have CD, then one can estimate that using a lower histologic grade to define CD in this population would have resulted in a sensitivity of biopsy of 100% and a specificity of 98%, since only eight patients out of the studied sample of 531 would have been misclassified as having CD when in fact they did not.
Of course, the expected specificity would not be as high as the one produced in this exercise since the authors do not tell us the histologic features or the diagnoses of the remaining 95 patients (626 biopsied, minus 502 normal, minus 15 CD, minus 14 raised IEL and normal villous structure = 95). However, taking this exercise further, if we assume that all of the other 95 patients were misclassified as having CD, then the specificity would drop to a still respectable 83%. Clearly, this type of study is the starting point in assessing the diagnostic parameters of the biopsy itself as a test. However, what is needed to fully assess biopsy as a test is a clearer measure of the false positive and negative rates. This can only be accomplished by using a battery of tests (biopsy, serology, HLA) to act as a gold standard to initially identify all potential cases, and then a follow-up period (response to GFD or gluten challenge) to assess the permanence of the diagnosis and the utility of biopsy at various cut-offs when used alone.
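
The arithmetic behind this exercise can be laid out explicitly. The following is a minimal sketch under the same assumptions stated in the text (the six EMA-positive subjects with raised IELs truly have CD; in the worst case, all 95 unaccounted biopsies are false positives). The variable names are illustrative and do not come from the original study.

```python
# Specificity of a lowered histologic cut-off in the Mahadeva et al. series,
# reproducing the two scenarios discussed in the text.

def specificity(true_neg, false_pos):
    """Proportion of non-diseased subjects correctly classified as negative."""
    return true_neg / (true_neg + false_pos)

total_biopsies = 626
normal = 502                     # normal histology (true negatives)
cd_villous_atrophy = 15          # CD with villous atrophy
raised_iel_normal_villi = 14     # raised IELs, normal villous structure
assumed_cd_among_raised = 6      # ASSUMPTION from the text: these six have CD

studied = normal + cd_villous_atrophy + raised_iel_normal_villi   # 531
false_pos = raised_iel_normal_villi - assumed_cd_among_raised     # 8

# Scenario 1: only the 531 fully described biopsies are considered.
spec_studied = specificity(normal, false_pos)                     # 502 / 510

# Scenario 2 (worst case): the remaining biopsies are all false positives.
unaccounted = total_biopsies - studied                            # 95
spec_worst = specificity(normal, false_pos + unaccounted)         # 502 / 605

print(f"specificity, described biopsies only: {spec_studied:.1%}")
print(f"specificity, worst case:              {spec_worst:.1%}")
```

Both scenarios keep the 502 normal biopsies as true negatives; only the number of presumed false positives changes.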

Table 52. Results of study assessing patients with suspected CD and Marsh I or II, before and after a GFD453
Time point | Histology | EMA+ | TTG+ | HLA DQ2 | γδ+ IELs
Initially | Marsh III - 2 (patchy); Marsh II - 7; Marsh I - 1 | 8/10 | 9/10 | 9/9 | Marsh III - 25 cells/mm; Marsh I–II - 13; Controls - 1.4
After GFD | All Marsh II re-biopsied; Marsh I - 2; Marsh 0 - 5 | 0/10 | 1/10 (slightly elevated) | Same | Reported as decreased; values not reported
Kaukinen et al.453 performed a study partially fulfilling the above requirements. Ten patients with suspected CD but only Marsh I or II lesions were compared with 27 biopsy-normal controls. The suspected cases were assessed before and after a GFD. The main results are presented in Table 52.

Although this is a small study with possible selection bias, the authors demonstrate that in a subset of patients suspected of having CD but without villous abnormalities, CD was diagnosed in all on the basis of a response to a GFD. Raised levels of γδ+ IELs, positive serology, and HLA DQ2 positivity supported the diagnosis of CD. Patients with CD and Marsh I-II lesions had significantly higher levels of IELs than controls. Unfortunately, this study did not include a larger sample of patients with Marsh I-II histology that included serology-negative subjects. Although it is clear based on this study that CD can exist in patients with Marsh I-II lesions with raised γδ+ IELs, it is difficult to generalize these results to an unselected sample of suspected CD patients.

In a somewhat complicated but important study, Kaukinen et al.98 assessed 271 patients with suspected CD by biopsy. Forty-five patients were classified as having definite CD on the basis of a Marsh III lesion, while in 136 patients CD was excluded on the basis of a Marsh 0 lesion and normal levels of γδ+ IELs. The remaining 76 patients had an uncertain diagnosis of CD based on biopsy (absence of villous atrophy) and underwent HLA DQ2 and DQ8 testing. In 59 of these patients, there were minor mucosal lesions or positive serological markers, while 17 were already on a GFD prior to biopsy. CD was excluded in 11 of these 17 patients on a GFD. Of the remaining 59 patients, CD was excluded in 22 because of a negative HLA DQ2/8, given the high NPV of this test, whereas 37 were DQ2/8 positive and remained with the suspicion of CD. Overall, CD was excluded in 33 of 76 patients. Among patients suspected of CD, but without villous atrophy, Marsh I-II lesions were found in 20 DQ2/8-positive patients versus five DQ2/8-negative patients. Elevated levels of γδ+ IELs were found in 20 patients who were DQ2/8 positive compared with seven patients who were DQ2/8 negative, and IgA-EMA was found in 16 patients who were DQ2/8 positive compared with 0 patients who were DQ2/8 negative. Although data are not provided for some patients, one can estimate the sensitivity of using a Marsh III cut-off. We know that CD was diagnosed outright in 45 out of 271 patients, but with subsequent testing a further 37 patients were found to be positive for HLA DQ2 or DQ8. At least 16 (EMA positive) and possibly 20 (increased IEL counts) of these patients likely have CD. Based on these assumptions, the sensitivity of a Marsh III cut-off is between 69% (20 DQ2/8 patients with increased IELs have CD) and 74% (16 EMA and DQ2/8-positive patients have CD). The sensitivity would be lower if more of the DQ2/8 positive patients turned out to have CD.
The specificity of that cut-off would appear to be 100%, although we are not told if the Marsh III patients all improved on a GFD. Clearly, using a biopsy cut-off lower than Marsh III would have increased the sensitivity, but unfortunately we are not given enough information to estimate this reliably.
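
The sensitivity range quoted for the Marsh III cut-off can be reproduced from the reported counts. The sketch below is illustrative: each scenario assumes a different number of DQ2/8-positive patients without villous atrophy who truly have CD. The third scenario (all 37 DQ2/8-positive patients have CD) is added here only to illustrate how far the sensitivity could fall; it is not an estimate made in the report.

```python
# Sensitivity of a Marsh III cut-off under different assumptions about how
# many biopsy-"uncertain" patients truly have CD (missed by the cut-off).

def sensitivity(true_pos, false_neg):
    """Proportion of true CD patients detected by the Marsh III cut-off."""
    return true_pos / (true_pos + false_neg)

marsh_iii_cd = 45        # definite CD on biopsy (Marsh III)

scenarios = {
    "EMA positive and DQ2/8 positive (16 missed)": 16,
    "raised IELs and DQ2/8 positive (20 missed)": 20,
    "all DQ2/8 positive (37 missed, illustrative extreme)": 37,
}

results = {
    label: sensitivity(marsh_iii_cd, missed)
    for label, missed in scenarios.items()
}

for label, sens in results.items():
    print(f"{label}: sensitivity = {sens:.0%}")
```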

This study, with its battery of tests, comes closer to the ideal design to estimate the diagnostic characteristics of biopsy, but unfortunately it has significant shortcomings. To be fair, the intent of the study was not to determine the sensitivity of a Marsh III cut-off. However, for the sake of future studies in this area, several design changes could have allowed this estimation. This study had two important positive aspects: it used a clinically relevant population of patients suspected of having CD, and all the subjects underwent biopsy. However, it would have been ideal if all the subjects had also undergone HLA testing and serology. Furthermore, a follow-up of positive and negative patients, and/or the assessment of the response to a GFD or the use of a gluten challenge in difficult-to-diagnose patients, would have allowed for the estimation of false positive and negative cases.

Relationship of serology to histology. As the data from the previous discussion suggest, CD clearly exists in patients with histological grades milder than Marsh IIIa. The fact that the sensitivity of biopsy is improved by using a lower grade as a cut-off brings up an important question. If the preceding statement is true, then what test is most sensitive for detecting CD with mild histologic changes, biopsy or serology? The issues surrounding this discussion have been addressed in the later portion of the serology discussion section, and a detailed narrative summary of the studies of the relationship of serology to histology can be found in Appendix H. To summarize, data from these studies as well as some data from Celiac 5 suggest that the sensitivity of serology drops with milder histologic grades, and that serology alone would miss CD patients with mild histologic grades.

In summary, CD exists in patients with histology grades less than Marsh IIIa. The sensitivity of biopsy at a Marsh IIIa or higher cut-off is likely less than that of serology with EMA or tTG. If lower Marsh grades are used, the sensitivity of biopsy increases, and it is possible that if morphometric techniques including assessing IEL densities are used, the specificity may not suffer greatly. Ultimately, the question of the true sensitivity of biopsy can only be answered with a well-conducted study that attempts to identify all possible CD patients in a given clinically relevant population using multiple simultaneous tests (e.g., serology, HLA) in addition to biopsy. All patients, those who clearly have CD, those in whom CD seems excluded, as well as equivocal cases, need to be followed to assess the permanence of their “diagnoses.” Equivocal cases could also be considered for further testing, either by assessing response to a GFD or by gluten challenge, to help clarify their diagnosis. Although there are other potential variables to consider, with these measures the false positive and false negative rates of biopsy, and hence a clearer estimate of the sensitivity and specificity, can be determined.

Celiac 2: Incidence and Prevalence of CD

Incidence in the General Population—Different Geographic and Racial/Ethnic Populations

The crude incidence of CD among western European and North American countries over the past 25 years has varied between 1 and 51 per 100,000, and the cumulative incidence by age 5 between 0. 118 and 9 per 1,000 livebirths. Variations in CD incidence have been striking not only between neighbouring countries, as is the case for Sweden and Denmark, but also between time periods for the same region, as was noted in the UK between the 70's and 80's as well as in Sweden over the 90's.

It is important to note that there were important methodological differences among the studies, from using patient registers200 to actively screening at-risk patients.128 Clinical practice also varied between time periods and regions. The advent of serological testing in the early 90's changed attitudes towards screening and identifying populations at risk with resulting higher detected incidences of CD. In some studies, active efforts were made to detect CD among asymptomatic subjects, such as the case in Finland where all subjects referred for endoscopy underwent small intestinal biopsy, independent of the cause for referral.199 The incidence of CD is also expected to vary according to the genetic make-up of the studied population, although the prevalence of at-risk HLA haplotypes was only noted in one study.128 These observations also highlighted the importance of dietary factors in triggering so-called CD epidemics among genetically predisposed populations. It would appear that breastfeeding bears a protective role, while early introduction of gluten, as well as the amount of gluten content in the diet may promote the early serological and pathological manifestations of CD. It is unknown whether these factors trigger an earlier expression of a disease which would become manifest anyway, or whether they trigger the appearance of a disease which may not otherwise occur, even later on in life.

In conclusion, caution should be exercised when extrapolating the noted incidence for one given region to a whole country, in particular in countries such as the US where there are differing population ethnicities among regions, between rural and urban areas, as well as between small and large cities. However, the true incidence and prevalence of CD are, if anything, greater than reported in clinical settings, since observations derived from screening and case-finding efforts were consistently greater than those relying on the diagnosis of clinically suspected cases. Lastly, it is important to bear in mind that, considering the large proportion of subjects with silent CD (the so-called celiac iceberg), observed incidences will depend upon the efforts spent screening cases. This is well illustrated by the difference between the relatively low incidence observed over 30 years in Olmsted County, where the majority of cases had clinically overt disease, and the very high incidence noted in Denver, Colorado, which resulted from a systematic and prospective screening of newborns and children at risk.

Prevalence in the General Population—Different Geographic and Racial/Ethnic Populations

The included prevalence studies demonstrated important differences in execution, tests for prevalence assessment, and in patient sampling, making pooled estimates of prevalence unreliable. Furthermore, the discussions regarding the operational characteristics of the serological tests themselves, the influence of disease prevalence on the PPVs and NPVs of these tests, and the criteria by which clinical and histological CD is defined, have to be kept in mind when considering the results of this section. The last point regarding the histologic definition of CD is particularly important in this setting, since one-third of the included studies did not seek histologic confirmation of serology diagnosed CD, and in another four studies, a large proportion of the serology-diagnosed patients did not undergo histologic confirmation. Finally, because of the previously discussed concerns regarding the sensitivity of serological tests in lower grade histological lesions, and the potential for missing true CD patients based on histologic criteria that require villous atrophy, the true prevalence of CD in the general population may still have been underestimated in these studies.

With these points in mind, the results of this report suggest that the prevalence of CD in the general unselected populations of North America and Western Europe is quite high and likely falls within the range of 0.5% to 1.26% (1:200 to 1:79). Smaller sample-size studies tended to give wider estimates ranging from 0.17% to 2.67%. Among the studies from the US, the range of prevalence was 0.4% to 0.95% in adults, and 0.31% in children. In Italy, the range of prevalence was between 0.2% and 0.8%, whereas the Scandinavian countries, Ireland and the UK, tended to show a higher prevalence of CD of approximately 1.0% to 1.5%, although there were also studies from those same countries that showed a lower prevalence.
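
The ratio notation used above (for example, 1:200) is simply the reciprocal of the prevalence. A minimal sketch of the conversion follows; the helper name `one_in_n` is hypothetical and the rounding to the nearest whole number is an assumption about how the report's ratios were derived.

```python
# Convert a prevalence given in percent to the "1:N" notation used in the text.

def one_in_n(prevalence_percent):
    """Return N such that the prevalence is approximately 1 in N."""
    return round(100.0 / prevalence_percent)

for prevalence in (0.5, 1.0, 1.26):
    print(f"{prevalence}% is about 1:{one_in_n(prevalence)}")
```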

In summary, the prevalence of CD in Western populations is likely close to 1% (1:100) and may be higher in Northern European countries. A firm estimate of the prevalence is impeded by between-study differences, and uncertainties regarding the performance of serological tests at these relatively “low” prevalences, compared with the 40% to 60% prevalences in the studies of the diagnostic characteristics of these same tests (Celiac 1).
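
The concern about test performance at low prevalence can be illustrated with Bayes' theorem. The sketch below is not drawn from the report's data: the sensitivity and specificity of 95% are hypothetical round numbers chosen only to show how the PPV of the same test changes between a prevalence near 1% (general population) and 40% to 60% (diagnostic accuracy studies).

```python
# Dependence of predictive values on prevalence for a fixed test.
# ASSUMPTION: sensitivity and specificity of 0.95 are hypothetical values
# used for illustration; they are not estimates from this report.

def ppv(sens, spec, prev):
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sens * prev
    false_pos = (1.0 - spec) * (1.0 - prev)
    return true_pos / (true_pos + false_pos)

def npv(sens, spec, prev):
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = spec * (1.0 - prev)
    false_neg = (1.0 - sens) * prev
    return true_neg / (true_neg + false_neg)

SENS, SPEC = 0.95, 0.95   # hypothetical test characteristics

for prev in (0.01, 0.05, 0.40, 0.60):
    print(f"prevalence {prev:>4.0%}: "
          f"PPV = {ppv(SENS, SPEC, prev):.1%}, "
          f"NPV = {npv(SENS, SPEC, prev):.1%}")
```

Under these assumed characteristics most positive results at a 1% prevalence would be false positives, whereas the same test would appear highly predictive in a study population where half the subjects have CD.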

Prevalence of CD in Patients with Suspected CD

The prevalence of CD is greatly affected by the study population. In populations where the diagnosis of CD is clinically suspected, either because of the presenting symptoms or the presence of associated conditions, its prevalence varied between 1.1%307 and 50%.301 This illustrates well how the patient selection process influences the prevalence of the condition: studies reporting very high prevalence had populations that originated from tertiary referral centers, while studies reporting low prevalence had populations that tended to originate from general practice. Although the report of the large American study of CD prevalence in at-risk and not-at-risk individuals did not specify how their subjects had been gathered,206 we can assume that these were derived from community practices, considering their large number.

Altogether the variations between the study populations, the diagnostic criteria and the study design were such that it was inappropriate to statistically combine the observed prevalence to obtain a summary measure. Nonetheless, considering studies with subjects who were not originating from a specialized referral centre, the observed prevalence of CD in subjects with symptoms or conditions associated with CD ranged between 1% and 4%.

Prevalence of CD in Patients with Type I Diabetes

The findings of this report suggest that the prevalence of CD in patients with type I diabetes is higher than the prevalence in the general not-at-risk population. These findings appear to be consistent across the studied age groups and screening methods. Although the magnitude of the risk of CD among patients with diabetes varied to some degree from study to study, many of these differences can be explained by issues of study design. An overall pooled estimate of the prevalence of CD in diabetes was not calculated because of these study differences.

Almost uniformly, the prevalence of CD by biopsy was to some degree lower than the prevalence by serology. This may reflect some false-positive serology results in these studies. Additionally, all these studies used some degree of villous atrophy to make a diagnosis of CD, which may underestimate the true biopsy prevalence of CD, since CD patients with Marsh I or II lesions were not considered. The prevalence by biopsy seemed to be lower still in studies that required subtotal or greater villous atrophy to make a diagnosis of CD. Furthermore, the prevalence by biopsy was uniformly low, as would be expected, in studies in which a large proportion of the screen-positive patients did not undergo biopsy. In these studies, the prevalence by biopsy was typically less than two percent, which likely represents an underestimation of the true prevalence of CD in this population.

The prevalence of CD by serology varied greatly with lows near 1% and highs close to 12%. However, the majority of studies, and particularly those using EMA or tTG, demonstrated prevalences in the range of 4% to 6%. Although the prevalence by biopsy also varied, the typical study with complete biopsy confirmation of serology-positive patients demonstrated prevalences in the range of 3% to 6%.

This evidence report has gathered the reported studies examining the relationship between diabetes and CD. Bearing in mind the limitations noted above, we believe there is sufficient evidence to show that individuals with type I diabetes are at higher risk of CD. The prevalence of CD in this population is likely between 3% and 6%.

Prevalence of CD in Relatives of Patients with CD

The prevalence of CD in relatives of patients with CD is elevated, both in first-degree and second-degree relatives. That prevalence varied between 2.8%246 and 17.2%235 in first-degree relatives and between 2.6%206 and 19.5%235 in second-degree relatives. The prevalence remains elevated among first cousins, and was 17% in the only study of these subjects.235

We have identified several factors that may be responsible for the variation in the observed prevalence, in particular the selection of the families, the degree of relation to the index case, the diagnostic criteria, and the choice of study design.

The prevalence of CD appears to be generally higher in families with multiple known cases, such as reported by Book et al.235 and Mustalahti et al.241 Most other studies referred to their subjects as originating from a “CD family,” without systematically documenting the proportion of families with multiple known cases of either CD or DH.

As expected, in studies that looked at various degrees of relation, the risk was greatest in the first-degree relatives.206, 235, 239 However, Book et al.235 found no difference in prevalence between second-degree relatives and first cousins, i.e., 19.5% (95% CI: 15.1–23.9) and 17.0% (95% CI: 6.4–27.7), respectively.

Also, the age of the screened population might be a factor even beyond infancy, since it has been observed by prospective serological248 and histological237 follow-up studies that the serological and histological markers of CD can develop after an initial negative screen in a genetically predisposed individual. Therefore, a one-time assessment or screen in these individuals may be insufficient.

The serological diagnosis of CD will be affected by the diagnostic accuracy of the test. Fortunately, 11 out of 12 studies that used serological screening were EMA-based, a test with good diagnostic accuracy in populations with relatively high prevalence, such as relatives of CD patients. The single non-EMA study236 used AGA, a test with a lower sensitivity and specificity than EMA, but all seropositive subjects underwent a confirmatory intestinal biopsy.

The histologic diagnostic criteria also affect the reported prevalence, as was well illustrated by the study by Tursi et al.,249 where Marsh grades of I and II were also considered diagnostic, resulting in a prevalence of 44.1%.

The study design, especially whether all at-risk individuals are biopsied as opposed to solely those who satisfy a non-invasive criterion, is also to be considered. The EMA-based serological tests can miss milder forms of enteropathy, as has been discussed, and this may explain why the prevalence of CD was generally higher in studies where all identified relatives were biopsied.

Prevalence of CD in Patients with Anemia

The results of this report demonstrate an increased prevalence of CD in patients with IDA. The prevalence is highest (between 10% and 30%) in studies of patients with GI symptoms, or in patients who have no gross lesions seen at initial investigation. CD appears to also be common in premenopausal women, both with (4.5%) and without (33%) heavy periods. Overall, in asymptomatic IDA patients assessed by serology or biopsy, the prevalence of CD was between 2.3% and 6%. Therefore, patients with IDA, particularly those without a clearly identifiable cause, should be evaluated for CD as part of their investigation.

Prevalence of CD in Patients with Low BMD

The studies of the prevalence of CD in patients with low BMD suggest that between 0.9% and 3% of patients with osteoporosis have CD. As a comparison, Fasano et al.15 found that in the United States 0.75% of the general not-at-risk population and 4.55% of first-degree relatives of CD patients had CD.

The results from these studies should be interpreted within the context of some methodological limitations. Three of them used AGA as the initial screening test to prompt further investigation, and we have shown that the sensitivity of this test is not high. Furthermore, the biopsy criteria used to define CD were either not reported or required the presence of subtotal or greater villous atrophy (Marsh IIIb or greater). We have also shown that CD exists in patients with lower grade histological lesions. Furthermore, the study results are contradictory. Two showed a risk of CD higher than the general population,296, 298 while the other two did not. In particular, the study by Mather et al.297 found that seven out of the 96 screened patients were positive for EMA-ME, but none of these were positive on biopsy. From what we have seen regarding the specificity of this test being close to 100% (and therefore the PPV would be expected to be high as well), it is unlikely that there would be so many false positives even if the prevalence of CD were low, and this raises the question of whether early grade CD patients remained undiagnosed. As such, it is difficult to draw any firm conclusions about the true prevalence of CD in this population, given the contradictory results, the fact that lower grade lesions were not considered, and that no follow-up data were provided on the patients who screened positive on serology but did not meet the biopsy criteria. Taking into account these limitations, it is likely that the prevalence of CD in patients with osteoporosis is higher than that in the general population.

Celiac 3: Risk of Lymphoma in CD

The association between malabsorption and lymphoma is a concept that has evolved over the past century. The observation that a significant proportion of patients with intestinal lymphoma also had villous atrophy at a distance from the malignancy, or had previously been diagnosed with CD, led to the publication of several series on the topic.

Although the objective of the task order was not to determine the risk of CD in lymphoma per se, the broad coverage of our search strategy also allowed us to systematically appraise the literature on this question. We were able to identify only two controlled studies on this association, which we describe here.454, 455

Johnson et al.455 performed a retrospective search of the five main pathology laboratories serving Northern Ireland to identify all the incident cases of small bowel lymphomas (SBL) and small bowel adenocarcinoma from 1987 to 1996. The clinical presentation of the cases, as well as the presence or absence of villous atrophy at a distance, were noted. The prevalence of CD in this group of SBLs was compared with that of the general population in Northern Ireland, as observed from serological screening of the population at large.188 There were 13 cases of CD (gender not reported) out of 69 cases of SBL, all of which were ETCLs. Only one out of the 13 CD cases was known to have CD prior to the diagnosis of SBL. The OR of CD in SBL was 27.98 (95% CI: 11.88–65.81) compared with the general population. The OR of unrecognized CD in SBL was 15.72 (95% CI: 9.71–25.45) compared with the general population.

In a prospective multicenter Italian study conducted between 1996 and 1999, Catassi et al.454 screened newly diagnosed adult patients with NHL for CD using EMA and AGA testing; EMA-positive or IgA-deficient patients underwent small bowel biopsy. There were six cases of CD out of 653 patients with NHL (prevalence 0.92%). Three had B-cell and three had T-cell lymphomas. Four out of six cases had lymphoma primarily located in the gut. Two patients were known to have CD for more than 1 year, one of whom was poorly adhering to a GFD. Two cases had been diagnosed with CD within 1 year of the diagnosis of NHL, whereas two other cases had no prior CD diagnosis. The prevalence of CD among these NHL patients was compared with that observed in two Italian studies that performed large-scale screening for CD.126, 222 The OR of CD in NHL was 3.1 (95% CI: 1.3–7.6) compared with an age- and sex-matched population.

These observations point to a clear association between CD and lymphoma. To determine the degree of association, or to quantify the risk of lymphoma in CD, we searched the literature for controlled studies of the incidence of lymphoma in CD. Unfortunately, the majority of publications on lymphoma in CD were uncontrolled. Typically, patients diagnosed with CD in a single institution were followed over time and the incident cases of lymphoma were described, along with characteristics of the affected patients, the course of their CD and the histological type of lymphoma. Unfortunately, such studies provide little confidence to estimate the true risk of lymphoma in CD, since lymphoma per se will occur in the general population. The incidence of lymphoma has to be compared with “controls,” matched on various characteristics such as age, sex, period and population. Any study that did not adjust the observed incidence to the expected incidence for age- and sex-matched individuals of the same population was deemed uncontrolled and excluded.

Cohort studies, either prospective or retrospective, constituted the majority of controlled studies. The incidence of lymphoma in a cohort of biopsy-proven CD patients, calculated as the number of lymphomas divided by the number of patient-years of follow up, was compared with that of an age- and sex-matched population from the same geographic area and time-period.

The standardized incidence ratio (SIR), the ratio of the observed to the expected incidence, therefore represents the likelihood of lymphoma in CD patients relative to those who do not have CD in the same population. The value of the denominator reflects the incidence of lymphoma in a given population, so that it is not possible to pool SIRs from different populations.
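
The ratio described above can be written out as a sketch using the conventional indirect-standardization form. The notation is ours and is not taken from the included studies: O is the number of lymphomas observed in the CD cohort, E is the number expected, r_k is the reference-population incidence rate in age and sex stratum k, and T_k is the patient-years of follow-up contributed by the cohort in that stratum.

```latex
\mathrm{SIR} = \frac{O}{E}, \qquad E = \sum_{k} r_{k}\, T_{k}
```

Because E is built from the reference rates r_k of one particular population, two studies with the same observed count O can yield different SIRs, which is why the ratios are not directly poolable across populations.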

The AR, however, is a measure of association that provides information about the absolute excess risk of disease in CD patients compared with “non-afflicted” individuals. This measure is defined as the difference between the incidence rates in the CD patients and normal population and, in a cohort study, can be calculated as the difference of cumulative incidence (risk difference) or incidence densities (rate difference) depending on the study design. The AR is a measure of risk which can be pooled; however, since incidence rates were reported in only two studies, we had insufficient data to generate a representative summary statistic.
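
As a sketch of the rate-difference form of the AR described above, with notation that is ours rather than the report's: I_CD is the incidence density of lymphoma in the CD cohort and I_0 is the incidence density in the comparison population.

```latex
\mathrm{AR} = I_{\mathrm{CD}} - I_{0}, \qquad
I_{\mathrm{CD}} = \frac{\text{incident lymphomas in the CD cohort}}{\text{patient-years of follow-up}}
```

Unlike a ratio measure, this difference is expressed in absolute units (cases per patient-year), so it can only be computed when a study reports the underlying incidence rates.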

Furthermore, studies varied greatly at several levels, in particular with respect to the definition of an incident case of lymphoma, the reported outcome measure, and the CD population selection.

Studies differed in their definition of observed cases of lymphoma in the following ways:

  1. Inclusion of malignancies that antedated the diagnosis of CD. In one American study, the number of at-risk years was calculated both from the time of CD diagnosis and from the time of onset of symptoms that could be attributed to CD.340 In a prior national survey of patients with CD,456 these authors had collected evidence that there is usually a long duration of symptoms before a diagnosis of CD is made in the United States, so they considered this approach justifiable. However, authors from other countries specifically excluded the malignancies that were diagnosed prior to CD, on the grounds that it was unknown whether these were truly “at-risk” periods and that counting them could falsely inflate the incidence of lymphoma in CD.333 Considering that publications uniformly calculated and reported the incidence ratio based on the time period from the CD diagnosis, this is the measure of risk that we selected.

  2. Inclusion of malignancies that were recognized simultaneously with the diagnosis of CD (i.e., within 1 to 12 months of diagnosis). In some cases, the diagnosis of CD can be unknown until the presentation of lymphoma. This fact highlights the possibility that lymphoma can occur in asymptomatic patients with CD. Although the importance of such cases is undeniable, counting them can introduce bias and inflate the incidence of lymphoma in CD. In other words, the simultaneous diagnosis of CD and lymphoma is similar to an incident case in a patient with a “zero” duration of follow-up, i.e., is closer to a measure of prevalence than incidence. To give an accurate estimate of incidence, cases of lymphoma occurring in patients with previously undiagnosed CD should theoretically be related to all cases of CD, diagnosed and undiagnosed, which is obviously impossible. However, some studies chose to include such cases, while others excluded them from the incidence calculation. This distinction was noted in the results presentation.

  3. Exclusion of malignancies that were diagnosed incidentally at autopsy. In their large Swedish cohort of individuals hospitalized with CD, Askling et al.337 also excluded unsuspected autopsy diagnoses of lymphoma, assuming that such entities would have been silent during life, and that they therefore could not be controlled for in the comparator group.

  4. Case definition of lymphoma. Lymphomas are broadly categorized as Hodgkin's lymphomas and NHLs. The lymphomas that have been associated with CD have typically been of the NHL type, and so the majority of studies sought cases of NHL, with the exception of the Scottish study from Logan,336 where both Hodgkin's and NHLs were reported.

The reported outcome measures also varied and impaired our ability to combine observations. Some studies reported the incidence of lymphoma, while others, relying on death certificates for ascertainment of outcomes, reported on the mortality from lymphoma.

Finally, the patient selection also varied, along with the reporting of the circumstances that led to the diagnosis of CD. These factors limited our ability to draw conclusions on the risk of lymphoma in symptomatic versus asymptomatic patients with CD.

We were also unable to find controlled data on the risk of lymphoma in refractory CD, an objective which had been suggested by the TEP. We did find, however, two prospective studies and one retrospective study that could lend support to the notion that the risk of lymphoma in refractory CD is greater than that of responsive CD.457–459

In the Netherlands, Wahab et al.457 prospectively followed 158 biopsy-proven CD patients to assess the recovery of histological changes with a GFD over time. There were 11 incident cases of refractory CD with more than 5 years of follow-up, five of whom developed ETCL, in contrast to none of the remaining GFD-responding CD patients.

Goerres et al.458 reported on 18 patients diagnosed with refractory CD between 1998 and 2000, gathered from all over the Netherlands, whom they treated with azathioprine and prednisone. There were three men and 15 women, with a mean age of 58 years (range 39–82). Subtypes of IEL populations were analyzed by flow cytometry, allowing for the classification of refractory CD patients into two types: type I refractory CD (n=10), in which a normal IEL population is seen, and type II refractory CD (n=8), in which an aberrant IEL population is present. All of the patients with type I refractory CD responded to combined azathioprine-prednisone therapy, whereas none of the patients with type II refractory CD showed a response. In fact, six of the eight patients with type II refractory CD developed EATL within a 3-year period, and a seventh patient died with blastic T-cell-like cells in the small bowel and the liver, and myeloproliferative changes in the bone marrow. The authors concluded that type II refractory CD is a premalignant condition with a very poor prognosis.

In a French national cooperative study, the clinical information and tissue specimens necessary for IEL subpopulation analysis were gathered from 21 patients diagnosed with refractory CD between 1974 and 1998.459 There were five men and 16 women, with a mean age of 51 years (range 29–73 years). Nine of the 21 patients (43%) died from severe malnutrition and/or lymphoma (three patients) after a mean of 6.7 (range 1–14) years after the onset of symptoms of refractory CD. A phenotypically abnormal IEL population associated with evidence of clonality was found in eight of the nine patients who could be tested. The authors suggested that refractory CD may be the missing link between CD and ETCL.

This systematic review identified nine controlled studies that met inclusion criteria. The major observation of our review is that the risk of lymphoma in CD was significantly increased compared with an age-matched population from the same region and period in eight out of nine studies. The SIR (NHL) varied from 2.66338 to 42.7,333 whereas the SMR from NHL or lymphoma in CD varied from 11.4337 to 69.3.339 This increased risk persists even when the cases that are diagnosed with lymphoma simultaneously or within 1 year of the diagnosis of CD are excluded from the calculation.

Some observational studies suggest that the risk of lymphoma, relative to patients of the same age without CD, may be highest in individuals who were diagnosed during adulthood,336, 337 and appears to decrease with adherence to a GFD, as shown by several authors.333, 336–339 It is also interesting to note that the only study that did not report a significantly increased risk of lymphoma was one where 75% of patients were on a strict GFD.338

The differential risk of lymphoma among patients diagnosed with CD in adulthood versus childhood may indicate that early diagnosis and treatment with a GFD is protective. The possibility that a GFD may be protective is also supported by Askling et al.,337 who found that the risk of lymphoma dropped to unity after 15 years of follow-up. Limitations in the designs of these studies, however, prevent firm conclusions. These studies have followed relatively few patients diagnosed as children through middle age, when the risk of lymphoma rises, and they may not have accounted for other factors (severity of symptoms, or other markers of disease activity) that might affect risk. The distinction between childhood and adult diagnosis of CD in the published cohorts relies on the presence or absence of CD-related symptoms during childhood, which has historically been a key factor in CD diagnosis. Based on the observations from these groups of patients, it would seem that continuous gluten exposure and ongoing mucosal damage set the stage for malignancy later in life. It remains unclear, however, why some individuals would have persistent mucosal damage in the absence of symptoms. Would these individuals also carry other characteristics that modulate their risk of malignancy? As we tap into the base of the “celiac iceberg” through systematic screening, we will hopefully be able to observe the incidence of lymphoma in child and adult CD populations who were identified through population screening and placed on a GFD despite being asymptomatic during that period of their lives. The notion that lymphoma arises from prolonged antigenic stimulation would be supported if the risk of lymphoma in those individuals is, as expected, lower than in historical CD cohorts.

Celiac 4: Consequences of Testing for CD

The search strategy did not identify any studies that would allow us to address the specific benefits and harms of testing with different strategies for CD. At present, there is inadequate information from the published literature on the benefits and harms of screening and the potential risks of undetected CD. Prospective trials of screening would be helpful to provide the data necessary to construct the tables that depict the consequences of screening specific populations. Information on the consequences of screening will come from the currently ongoing large population based prevalence studies.

   Figure 31. PPV based on pooled estimates of sensitivity and specificity

The consequences of such issues as false-positive results were dealt with in the Celiac 1 Discussion. As discussed in that section, the definition of CD used and the prevalence of CD in the test populations have a great impact on the diagnostic parameters of the available tests. We have presented data that show that the sensitivity of the available tests declines considerably when applied to patients with low-grade histological lesions. Unfortunately, there are insufficient data to address the question of what the consequences are of missing patients with low-grade histological lesions if serological screening alone is used. As described in Celiac 1, all the diagnostic test studies of the various serological markers were undertaken in study populations in which the prevalence of CD exceeded that observed in most clinical situations. We have shown that the positive predictive value, which is predominantly influenced by the test specificity and the prevalence of CD in the test population, drops from the reported values to much lower values when the test is applied in typical clinical populations. To illustrate this point, Figure 31 highlights the expected PPV when the tests are applied to different test populations.

As can be seen from Figure 31, the PPV (the probability that a positive test result actually represents true CD) falls as the prevalence of CD in the tested population falls. This relationship holds true for all the summary curves, but differs in degree. It is important to note that the PPV is predominantly influenced by the specificity of the test and the prevalence. Since we have identified that the specificity of EMA and tTG is quite high, the major influence on the PPV in these analyses is the prevalence of CD in the population being tested. The practical importance of this discussion is that, despite their very high specificity, the use of these serological markers in low-prevalence populations would be expected to yield a high proportion of false positives among positive results. Below a prevalence of 5%, as many as 30% to 50% of positive results may be false positives based on our estimates. This may seem counterintuitive, given that the specificity is greater than 95% and close to 100% in some cases. One must keep in mind that unless the specificity actually equals 100%, the prevalence of CD will influence the PPV. As the specificity approaches 100%, the influence of the prevalence decreases. The same interplay occurs between the negative predictive value (the probability that a person with a negative test does not have CD) and the sensitivity of the test. However, in this case, the NPV rises as the prevalence of the disease falls (see Celiac 1 Figures). Given that we have identified that EMA and tTG have a sensitivity in the range of 95%, the NPV would be expected to be very high (>96%), particularly in low-prevalence populations. This would mean that fewer than 1% to 4% of negative results with these tests are false negatives. These data would then suggest that a negative test result would have a high probability of being a true negative result, but that a positive test would have to be considered in light of the expected prevalence of CD in the tested population.
If the expected prevalence is in the range of 10% or lower, then the possibility that the result represents a false positive should be considered. Lastly, one must not forget the discussion regarding the true sensitivity of these serological markers when lower grade CD lesions are considered. The studies by Rostami et al.16 and others suggest that the sensitivity can be lower than 80%. In fact, both Rostami et al.16 and Tursi et al.424 suggest that the sensitivity for grades less than Marsh IIIa is in the range of 30% to 40%. If this is the case, then the nearly perfect NPV discussed above would be expected to fall, particularly in groups with a higher prevalence of CD. For example, if the sensitivity were really 75%, then the NPV would drop to 88% (12% false negatives) if a population of patients with suspected CD were tested. However, because of the strong influence of a low prevalence (<15%) on the NPV, the NPV will remain higher than 90% as long as the sensitivity of the test is greater than 50%.
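
The dependence of the predictive values on prevalence follows directly from Bayes' rule, and can be sketched in a few lines of Python. This is an illustration only: the function names are ours, and the sensitivity and specificity used in the loop (95% and 98%) are assumed round numbers chosen for demonstration, not the pooled estimates underlying Figure 31.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv(sensitivity, specificity, prevalence):
    """Negative predictive value: P(no disease | negative test)."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


if __name__ == "__main__":
    # Assumed, illustrative test characteristics (not the report's estimates).
    sens, spec = 0.95, 0.98
    for prev in (0.01, 0.05, 0.10, 0.35):
        print(f"prevalence={prev:.2f}  "
              f"PPV={ppv(sens, spec, prev):.3f}  "
              f"NPV={npv(sens, spec, prev):.3f}")
```

Running the loop shows the pattern described in the text: with fixed test characteristics, the PPV falls and the NPV rises as the prevalence falls. Setting the specificity to exactly 1.0 makes the PPV equal to 1 at any nonzero prevalence, which is the limiting case noted above.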

Expected Outcomes of Treatment of CD

The four studies of diabetes and CD in children/adolescents that evaluated the impact of a GFD found that body composition parameters improved on the GFD, but HbA1c levels did not improve. Some studies observed an increase in the insulin requirements after introduction of a GFD, which could be explained by improved absorption of nutrients.

The results of studies on anthropometrics and body composition in CD patients are variable due to differences in populations and in the methods used to evaluate body composition. Overall, weight and BMI improve after starting a GFD. Individuals with CD may have a lower BMI when compared with controls because of lower daily energy intakes, particularly in those who strictly follow a GFD.

A few small studies have evaluated the impact of the diet on nutritional parameters in newly diagnosed symptomatic CD patients. These studies found that nutritional status does improve in the majority of subjects with CD on a GFD. Certain biochemical parameters such as ferritin may take longer to normalize. There is evidence that the recovery of nutritional status is linked to improvement of villous atrophy. Larger studies of nutritional status in patients with classical and silent CD, and of the relationship of biochemical values to changes in histological grade on small bowel biopsy and to compliance with the GFD, would be helpful.

Compliance with the GFD was assessed in adolescent populations in three studies and the results varied. Compliance with a strict GFD was greater in those who were symptomatic, compared with those who were diagnosed via a screening program. Another study in adults by Ciacci et al.460 looked at the correlation between intestinal biopsy and compliance (assessed by dietary interview) and found that intestinal damage was significantly associated with dietary compliance. Low or very low compliance with a GFD had a PPV of 92.8%, and good compliance had a negative predictive value of 96.8%. This study also suggested that those with more severe symptoms at diagnosis were more likely to have better compliance. Given the poorer compliance in those without symptoms, different strategies to promote adherence to the GFD may need to be developed if screening for CD is promoted.

The justification for screening the general population for CD would be strengthened by well-conducted comprehensive cost-effectiveness analyses. Only one study360 appeared to include the majority of the components that have been recommended for the reporting of cost-effectiveness analyses (CCOHTA, Guidelines for Economic Evaluation of Pharmaceuticals: Canada, 1997). None of the analyses incorporated the use of health-related quality of life or utility assessments.

Fractures/BMD/Osteoporosis/Osteopenia

There were a number of methodological limitations in the studies that examined bone-related consequences of CD. Limitations included: selection of representative cases and controls, ascertainment of the outcome and failure to identify and control for relevant co-interventions such as calcium and vitamin D.

The issue of whether fractures are increased in individuals with CD appears to be somewhat controversial based on results of the included studies. Neither Thomason et al.394 nor Vestergaard et al.388 found increased fracture rates for CD subjects, whereas the recent population-based study by West et al.385 did find an increased rate of fractures. This is an important issue to clarify since osteoporotic fractures are one of the key reasons for promoting strict adherence to the GFD and for making decisions about screening. In some studies, the sample sizes were small and may not have been large enough to detect an increased risk of fractures in subjects with CD relative to controls. In addition, methodologies and study populations varied, and not all studies controlled for duration of CD. Moreno et al.392 found that the risk of fracture in subclinical and silent cases of CD was not significantly different from that of controls. Overall, the risk of fracture seemed to increase with age, as one would anticipate, and may be greater in those patients who were clinically symptomatic. Based on results of current studies, the risk of fracture appears to be highest prior to diagnosis of CD and diminishes once individuals are on a GFD. This latter finding would be consistent with the increase in BMD that is seen after 1 year on a GFD. Additional population-based fracture studies would be useful to clarify the relative and absolute risk of fracture in CD and to determine if it differs in asymptomatic cases.

Overall, the studies consistently documented an increased prevalence of osteoporosis/osteopenia in newly diagnosed patients relative to controls. There was a significant increase in BMD, especially within the first year of being on a GFD. Some of the variability in the results could be attributed to the proportion of patients who were compliant with the diet and to the use of co-interventions such as calcium and vitamin D. Moreno et al.392 found that the lumbar spine BMD did not differ in groups according to clinical presentation, but they did find a significantly lower T score of the femoral neck BMD in classically symptomatic cases versus subclinical or silent cases. Mustalahti et al.,378 however, found that BMD in the spine was lower in asymptomatic cases.

Based on the two studies in children,352, 377 BMD appears to normalize in children after treatment with a GFD. The normalization of BMD in children would support the need for early diagnosis of CD and treatment. However, in children skeletal growth may affect BMD, so that some of the change may reflect growth rather than treatment. Most studies of BMD in adults on a GFD have found that the BMD is still reduced at all sites when compared with normal controls. One study suggested that those without secondary hyperparathyroidism at time of diagnosis may normalize their BMD, but this finding was not replicated. A large BMD study with baseline and follow-up small bowel biopsy data, documentation of clinical presentation and percent compliance with the GFD, and adjustment for co-interventions is recommended to give us accurate information on bone-related consequences of CD.

Mortality

The majority of observational studies have demonstrated an increase in overall mortality rate (SMR of 2 or greater) in subjects with CD when compared with the general population. The increase in mortality can be attributed to deaths from malignant, respiratory, and digestive diseases. The increase in mortality appears to be greatest within the first 3 years after diagnosis and declines over time. The mortality rate seems to increase with longer delays in diagnosis and poor adherence to the GFD. Perhaps one of the most important points from the Corrao study362 is that the mortality rate was not increased compared with the general population for those individuals who had mild symptoms or were asymptomatic. This latter result has potential implications for population screening for CD.

Celiac 5: Promoting or Monitoring Adherence to a GFD

Monitoring Adherence to a GFD

Some of the same concerns expressed in the other celiac objectives, regarding clinical definitions, histological criteria, and the performance of the serological tests, are repeated when the results of the studies on monitoring adherence to a GFD are considered. Foremost in facilitating the interpretation of these studies is the question of what to consider as the histological criteria to define recovery on a GFD. Certainly normalization to Marsh 0 would constitute recovery, but what about improvement to Marsh I or II, or even accepting Marsh IIIa? The distinction has important implications for assessing the strength of the correlation between histological and serological improvement, and in this regard, different studies have adopted different cut-offs.

It is clear from the presented studies that improvement of symptoms does not offer an accurate assessment of adherence to a GFD as judged by interview or by biopsy. This point is illustrated in the study by Kluge et al.461 In follow-up of 18 adult patients with CD, all patients felt well and appeared to be clinically in remission. Nonetheless, only 17% of the patients reported being on a strict GFD. Biopsy assessment of eight patients showed six with total villous atrophy, including one patient who reported strict adherence to a GFD. The remaining two patients did not have villous atrophy, but the mucosa was not normal and included an excess of IELs. Thus, small amounts of gluten may provoke a histologic change without clinical symptoms, which may be an important reason why adherence to a GFD may be less than perfect. In other words, non-compliance does not necessarily translate into noticeable consequences for the patient. Furthermore, it is increasingly recognized that most CD patients do not have symptoms, so reliance on symptomatic improvement is clearly not adequate.

There is good evidence that mucosal recovery following institution of GFD is slower and more incomplete than previously assumed, especially in adults.405, 411, 414 Whether this slow recovery is due to dietary transgression, inadvertent gluten intake or whether this is simply the natural history of the disease is less clear. This has definite implications for the interpretation of both biopsy and serology results in monitoring adherence to GFD, particularly in the short run.

With the advent of the newer and more sensitive serologic tests for CD (EMA, tTG), the possibility of a reduction in the need for follow-up biopsies and a move towards non-invasive serological monitoring has been proposed. The question arises as to whether serology can detect dietary transgressions and reasonably mirror histological improvement on a GFD.

A number of studies show that values of serologic markers fall with increasing duration of GFD, whether one looks at IgA-AGA, IgA-EMA, or IgA-tTG. As well, several studies suggest that in both adults and children, increasing degrees of non-compliance with a GFD are more likely to be associated with positive serologic tests.396, 402, 408 The question, however, is not whether serology can pick up major transgressions such as a gluten challenge, which it is clearly capable of detecting,400, 404 but rather whether serology can pick up milder degrees of dietary non-compliance and reasonably reflect histological status. A high rate of false-negative serology with lesser degrees of dietary transgression would diminish the value of serology as a means of accurately monitoring adherence.

In both adults and children, the sensitivity of serology for picking up dietary transgressions based on interview or self-reporting is disappointing.401, 402, 410, 415 One conflicting study412 showed a good correlation between serology and adherence. This likely reflects the way patients were categorized; in this study, patients with lesser degrees of dietary transgression were probably categorized as compliant. In general, there is a significant rate of normal serology in patients identified as not adhering to a GFD. Furthermore, evidence from several studies suggests that serology, regardless of the actual test used, does not adequately reflect the mucosal state in adults.398, 403, 407, 409, 413 Surprisingly, it seems that serology may be normal, not only in Marsh I or II lesions, but also when there is villous atrophy present.398, 407, 409, 413 Although the specificity of various serologic markers for villous atrophy seems better than sensitivity,398 the NPV of serology would suggest that a negative test does not offer high assurance of the absence of villous atrophy.

As discussed earlier, mucosal recovery can be a slow process. It may be that serologic markers better reflect histology in long-term follow-up. Certainly, in the range of follow-up of these studies (6–30 months), serology may be negative despite villous atrophy. There is also evidence that even in longer follow-up, serology does not accurately reflect adherence.398, 402, 410

In younger patients, IgA-AGA and IgA-EMA-ME may better represent the mucosal state.397, 415 These studies are in keeping with the impression that in children and adolescents, mucosal recovery is faster and more complete. In children, serology seems to be a better marker of the absence of villous atrophy. Still, serology may be negative in the face of lesser degrees of histologic abnormality without villous atrophy.397 The significance of such lower-grade biopsy abnormalities, however, is unclear.

It is possible that IgA-AGA may rise faster with non-compliance to GFD than other markers.396, 400 However, there is little direct evidence to show superiority of one serologic test over another in monitoring adherence.

Perhaps an important question that arises from this discussion, with particular relevance to symptomatic CD patients, is: “is it good enough for CD patients to show symptomatic improvement and a corresponding fall in, or normalization of, a sensitive serological marker without need for ‘normalization’ of the intestinal mucosa?” Unfortunately, this question is not an easy one to answer since many of the outcome studies in CD, particularly for lymphoma and mortality, did not specifically address differences in histologic grade. Furthermore, we identified no clear evidence suggesting that refractory sprue was the result of dietary indiscretion as opposed to a different spectrum of CD. Nonetheless, histological improvement appears to be important. For example, one study356 demonstrated that osteoporotic patients with CD on a GFD who had Marsh III lesions had lower median Z-scores than those with grades less than Marsh III, while another study demonstrated a significant correlation of nutritional status, measured by histomorphometric index, with the severity of the histological biopsy grade.346 In the former study as well as one other study,358 histologic grade correlated with degree of IDA, all suggesting that the goal of monitoring should be to assess degree of histological improvement.

It can be concluded that the return of serologic markers to normal is associated with duration of GFD and degree of patient compliance. Unfortunately, the correlation remains imperfect, especially in adults, and seems to reflect gross rather than minor degrees of dietary transgressions. Serological tests seem to have a higher specificity than sensitivity for dietary transgressions. It is recognized that this area is controversial and that clinicians are moving away from routine follow-up biopsy as a means to assess dietary compliance. It seems reasonable to suggest that improvement in clinical parameters, and disappearance of serological markers would be an adequate measure of response to a gluten free diet. In children, because of their faster and more complete mucosal recovery, this strategy of using serology may be an appropriate means to monitor adherence. In adults, however, the situation is somewhat more complex. Therefore, while serology certainly can be an adjunct means to monitor adherence to a GFD, consideration should be given to assessing histological improvement since some evidence exists to suggest that mucosal improvement to at least below a Marsh III appears to be important from an outcomes perspective. If biopsy is to be utilized as a means of assessing adherence to a GFD in adults, the timing of the biopsy needs to take into consideration the slower mucosal healing in adults, and should therefore be performed after 1 year to 1.5 years of a GFD.

Interventions to Promote Adherence to a GFD

Changes in dietary habits are difficult to attain and maintain. The barriers to compliance are many. No interventions to promote compliance with GFD have been studied and found to be effective. Adding to the difficulty of assessing any proposed intervention is the lack of certainty as to how best to measure GFD compliance.

The existing evidence suggests a positive correlation between parental socioeconomic status, education, knowledge of CD, and the compliance of their children.416, 418 Compliant children may also have a better knowledge of CD420 than those children who are non-compliant. Improved knowledge in adults also appears to correlate with compliance.419 It is, therefore, not unreasonable to suggest that interventions designed to improve knowledge about CD in general, and about GFD, and specifically how to identify gluten-containing products, would likely improve compliance with a GFD. Improving knowledge regarding gluten-containing food products and additives would also likely improve self-confidence in choosing gluten-free foods as suggested by Lamontagne et al.419 Improved knowledge of outcomes of untreated CD may also improve compliance. Such information interventions, however, would need to be prospectively evaluated to ensure that they perform as expected.

Membership in a local celiac society appears to be an effective means of promoting compliance with a GFD. This is not surprising since such organizations provide CD patients with not only improved knowledge regarding their disease, and the intricacies of the GFD, but also provide emotional and social support.

It is interesting that one study417 has demonstrated lower rates of compliance in children detected by screening as compared with those diagnosed on the basis of symptoms. It seems logical that if there are no obvious detrimental symptoms from a gluten-containing diet, children, and likely adults, will be less likely to be compliant. The authors speculate that since screen-detected patients had a higher mean age of diagnosis, compliance might be promoted by earlier identification, which would avoid the difficulty of changing formed eating habits.

Is early detection of CD an effective intervention to promote compliance? It appears rational that it would be easier to follow a GFD if it were introduced at an earlier age. There are some interesting observations417 that suggest that diagnosis in early childhood is associated with improved compliance.421 Unfortunately, the issue of compliance in asymptomatic screen-positive individuals casts doubt on the positive downstream effects of screening asymptomatic populations for CD, particularly if the low compliance rates in asymptomatic individuals can be reproduced in other studies.

In summary, the results of this report suggest that a multidisciplinary approach to patient and parent education and support by physicians, dieticians, and celiac societies, possibly employing formal knowledge and decision support interventions that involve the patient (and parent) directly, is likely to improve compliance in individuals diagnosed with CD. Formal testing of interventions and programs would be valuable.

Strength of the Body of Evidence

Celiac 1

Overall, the quality of the diagnostic studies assessed in the Celiac 1 objective was quite good, due largely to our stringent inclusion criteria. However, 59% of the included studies reported using a selected patient population that may not be representative of a clinically-relevant population. This is likely related to study design. In addition, only 11% of the studies reported on whether the reference test was interpreted without knowledge of the index test. However, we felt that this was not a major threat to the validity of the studies.

Two other factors that affect the interpretation of these results, yet were not captured in the quality assessments, are the threshold effects for determining the positivity of a serological test, and the high prevalence of CD in these studies (see above). With these considerations in mind, the overall strength of the evidence is quite good.

Celiac 2

The overall quality of reports of the included studies in the Celiac 2 objective was found to be marginal to fair. For example, most of the studies did not report on whether the patients were consecutively enrolled, a factor that could contribute to selection bias. However, setting aside the quality of individual studies, from a policy perspective, the strength of the evidence is fairly good in that the study populations were selected to be of North American/Western European descent, which should reflect the demographics of the US population.

Celiac 3

The studies included in the Celiac 3 objective were found, overall, to be of good quality. Again, the overall strength of the evidence is due largely to the stringent inclusion criteria, such as the requirement for the reporting of standardised rates for the outcomes based on rates from the local general population, and the overall good quality of the included studies.

Celiac 4

The majority of studies included in this objective were single group “before-after” studies, although some had in addition a comparative healthy control group. We could not identify any quality instruments for this type of study design and in general, this type of study is considered weak, particularly in the absence of a control group. Overall, however, the strength of the evidence for this objective is fair to good and suggests that the results can be used for policy decisions with the understanding that this area of CD research is still relatively new and requires further high quality studies.

Celiac 5

The majority of studies in this objective were also of a “before-after” design. However, in this setting, this design may not pose a major limitation, since the purpose of the study is to assess the change in serology and histology after introduction of a GFD. In this regard, the strength of the evidence for monitoring adherence to a GFD is fairly good. However, there is almost a complete absence of studies of interventions for the promotion of adherence to a GFD.

Future Research

This review has allowed us to identify several areas in need of future research. Perhaps the most important of these is a need for the development of a consensus on the definition of CD in the era of advanced serological testing. As discussed in the report, this distinction of what one calls CD has profound implications for each of the requested task order objectives. Do screen-positive patients without villous atrophy have CD? Certainly the preliminary evidence suggests that this is the situation in many cases. However, what is required is a new definition of a gold standard for the diagnosis of CD. This new gold standard may include a combination of serology, biopsy and HLA testing. Such a gold standard, when used in studies with a time dimension (e.g., response to a GFD or gluten challenge; extended follow-up), would help answer some of the uncertainties identified in this report including: the real performance of the serological tests when low-grade lesions are considered CD; the diagnostic performance of biopsy alone; the outcomes of patients with these low-grade lesions; and, those that would be “missed” using current screening strategies. Even in the absence of a new gold standard, we could not identify a well-conducted study of the diagnostic performance of the various serological markers when applied to an average population (i.e., one with a prevalence of CD in keeping with the range identified for average risk), with the entire cohort being investigated equally (i.e., all are biopsied). Such a study would at least be able to shed light on the performance of these tests in average-risk patients, and since all patients are biopsied, the relationship of histology to serology could be further assessed.

On a similar theme, we have identified multiple studies that suggest the importance of histological improvement on a GFD. This is a controversial area since in common clinical practice, clinicians are moving away from routine follow-up biopsy. It seems reasonable to believe that improvement in clinical parameters with loss of serological markers is adequate evidence of response to a GFD. In children, this issue may be less important since histological improvement is much more rapid and complete than in adults, and correlation with serology seems better. However, we have identified multiple studies in adults that suggest poor correlation between serology and improvement of histology on a GFD, and other studies that suggest that serology is useful for detecting gross dietary indiscretion, but not minor occurrences. Therefore, the question that arises is what constitutes adequate improvement on a GFD, and what are the criteria to define this improvement. Based on the lymphoma literature that suggests that this malignancy may arise from chronic antigenic stimulation and immune activation, what are the outcomes of adults with clinical improvement, yet persistent histological abnormalities? Are some histological features, such as reduction of mucosal lymphocytes, more important markers of improvement and possibly prognosis than other features such as villous height?

We feel that clarification of these fundamental questions is necessary for the conduct of future studies in all areas of CD, and in particular studies of the diagnostic tests and the outcomes in CD, since these are so dependent on the definitions discussed above.

Conclusion

This report has provided a systematic review on five broad areas of CD, with each of these areas including important sub-components. Perhaps one of the most important findings of this report is the understanding of the importance of how one chooses to define CD in the era of serological testing, and how this apparently clear-cut task has profound implications on all the results presented in this report. Specifically, can CD be diagnosed solely on the basis of serology? Is some degree of villous atrophy necessary for the diagnosis of CD? These questions have important implications downstream of the diagnosis as well. Do CD patients without symptoms or villous atrophy have the same risk of complications as those with villous atrophy? Is serological improvement on a GFD sufficient to reduce CD complications or must there be documented histological improvement, and what degree of histological improvement is necessary?

The results of the Celiac 1 objective suggest that in the era of EMA and tTG antibody testing, AGA testing in both children and adults has a limited role. The sensitivity and specificity of EMA and tTG are quite high (over 95% for sensitivity, and close to 100% for specificity), as are their PPVs and NPVs, but as previously discussed, one has to be aware that the reported diagnostic parameters are taken from studies in which the prevalence of CD was, for the most part, much higher than that seen in usual clinical practice and certainly the PPV of these tests may not be as high as reported when these tests are applied in general population screening. The bulk of the evidence on the diagnostic characteristics of these tests was derived from studies that defined CD as having at least some degree of villous atrophy. We have identified studies that suggest that the sensitivity of these tests drops, at times significantly, when applied to populations with CD with lower-grade histological lesions. This not only has implications regarding those patients with “mild” CD who were missed during screening efforts, but also puts into question the nearly perfect NPV of these tests.
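
The dependence of PPV on prevalence noted above follows directly from Bayes' theorem, and can be illustrated with a short calculation. The following is a minimal sketch in Python and is not part of the analyses in this report. The sensitivity of 0.95 and specificity of 0.99 are illustrative assumptions chosen to be in keeping with the ranges quoted above, the prevalence values are arbitrary examples, and the function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' theorem.

    All three inputs are proportions between 0 and 1.
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Assumed test characteristics (illustrative only, not pooled estimates).
SENSITIVITY = 0.95
SPECIFICITY = 0.99

# Compare a high-prevalence study setting with lower-prevalence settings.
for prevalence in (0.30, 0.10, 0.01):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

Under these assumed values, the PPV is close to 98% at a prevalence of 30% but falls to roughly 49% at a prevalence of 1%, while the NPV remains above 99.9% at a prevalence of 1%. The NPV figure holds only if the assumed sensitivity applies, which may not be the case when lower-grade histological lesions are counted as CD.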

HLA DQ2/DQ8 testing appears to be a useful adjunct in the diagnosis of CD. The test has high sensitivity, in excess of 90% to 95%, but because around 30% of the general population and an even higher proportion of “high-risk” subjects including diabetics and family members also carry these markers, the specificity of this test is not ideal. The greatest diagnostic utility of this test appears to be its NPV.
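
The reason the NPV is the most useful property of HLA testing can be seen from a simple worked calculation. The sketch below, in Python, is illustrative only. It assumes a sensitivity of 0.95, an overall DQ2/DQ8 carrier frequency of 0.30, and a CD prevalence of 0.01 (in keeping with the figures quoted in this report), and the function name `hla_test_characteristics` is hypothetical.

```python
def hla_test_characteristics(sensitivity, carrier_frequency, prevalence):
    """Derive specificity, PPV and NPV from the overall carrier frequency.

    carrier_frequency is the proportion of the whole population that tests
    positive (CD patients and non-CD subjects combined).
    """
    true_pos = sensitivity * prevalence          # carriers with CD
    false_pos = carrier_frequency - true_pos     # carriers without CD
    false_neg = (1 - sensitivity) * prevalence   # CD without the markers
    true_neg = (1 - prevalence) - false_pos      # non-carriers without CD
    specificity = true_neg / (1 - prevalence)
    ppv = true_pos / carrier_frequency
    npv = true_neg / (true_neg + false_neg)
    return specificity, ppv, npv


# Assumed inputs (illustrative only).
spec, ppv, npv = hla_test_characteristics(
    sensitivity=0.95, carrier_frequency=0.30, prevalence=0.01
)
print(f"specificity {spec:.1%}, PPV {ppv:.1%}, NPV {npv:.2%}")
```

With these assumed inputs, the specificity is about 71% and the PPV only about 3%, whereas the NPV exceeds 99.9%. In other words, a positive result says little, but a negative result makes CD very unlikely.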

Biopsy itself, when used with a strict cut-off requiring villous atrophy, appears to have high specificity, but poor sensitivity. Using lower grade cut-offs clearly improves sensitivity, but because of the wide differential of causes of histological lesions similar to Marsh I to IIIa, the specificity suffers. The use of histomorphometric measures, such as quantification of γδ+ IELs, is likely to allow for the use of lower-grade cut-offs while maintaining reasonable specificity. Ultimately, a trial utilizing multiple diagnostic tests in an attempt to capture as many CD patients in a clinically-relevant population as possible, with a time dimension including a response to a GFD or gluten challenge, is required to fully assess the diagnostic characteristics of biopsy alone. This type of study would be able to characterize the false-positive and false-negative rates if all studied patients are followed forward in time.

The included prevalence studies demonstrated important differences in execution, tests for prevalence assessment, and in patient sampling, and their results also have to be interpreted in the light of some of the limitations that have been identified regarding the diagnostic performance of the tests for CD. Nonetheless, the results of this report suggest that CD is a very common disorder with a prevalence in the general population that is likely close to 1:100 (1%). Several high-risk groups with a prevalence of CD greater than that of the general population have been identified including those suspected of having CD, family members of CD patients, type I diabetics, and those with IDA or low BMD. Additionally, the review identified multiple other high-risk groups such as those with Down Syndrome, short stature, and infertility, to name a few, though their inclusion was beyond the scope of this report. These results would suggest that at the very least, high-risk groups should be screened for CD. If the performance of the noninvasive serological tests can be verified in the relatively “low prevalence” situations in general unselected populations, then population screening may also be advisable, particularly if a greater understanding of the consequences of missing early low-grade CD can be obtained, and the issues of low-compliance with a GFD of asymptomatic screen identified patients can be addressed.

CD is known to be associated with GI lymphoma. The results of this report confirm this strong association, with the limitations indicated in the text. Nonetheless, the report identified SIR for lymphoma that ranged from 4 to 40, and SMR that ranged from 11 to 70. GI lymphoma is believed to arise as a result of chronic antigenic stimulation, which leads to the development of a clonal T-cell population with usually a refractory intermediate stage. We have identified epidemiologic data that support this notion, and suggest that a diagnostic delay, and in particular diagnosis of CD in adulthood, as opposed to in childhood, is associated with poorer outcomes. Fortunately, several studies suggest that adherence to a GFD reduces the risk of lymphoma in CD patients. These findings underscore the importance of early diagnosis and treatment of CD.
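
For readers less familiar with standardized ratios, an SIR (or SMR) is the number of observed events in a cohort divided by the number expected if the cohort had experienced the rates of a reference population. The following Python sketch shows the arithmetic under stated assumptions. All person-years, rates, and counts are hypothetical and are not drawn from any included study, and the function name `standardized_ratio` is hypothetical.

```python
def standardized_ratio(observed, person_years, reference_rates):
    """Return observed / expected events (an SIR or SMR).

    person_years and reference_rates are parallel sequences, one entry per
    stratum (e.g., age-sex group); rates are events per person-year in the
    reference population.
    """
    expected = sum(py * rate for py, rate in zip(person_years, reference_rates))
    return observed / expected


# Hypothetical cohort with three age strata (illustrative numbers only).
person_years = [1000, 2000, 1500]
reference_rates = [0.0001, 0.0002, 0.0004]  # events per person-year
observed_cases = 11

sir = standardized_ratio(observed_cases, person_years, reference_rates)
print(f"SIR = {sir:.1f}")
```

In this hypothetical example, about 1.1 cases would be expected and 11 were observed, giving an SIR of about 10. A ratio of 1 would indicate no excess over the reference population.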

Testing for, and identifying, CD patients is expected to have a positive impact on patient outcomes, be it from a reduced risk of lymphoma with early diagnosis and treatment of CD or from improvements in nutritional status, BMI, and BMD. The consequences of testing in at-risk and symptomatic patients appear to be more straightforward since these patients appear to be more compliant with a GFD and would be expected to benefit from this intervention. The data are less clear for asymptomatic screen-identified patients, particularly those who are truly silent and/or do not have fully developed villous atrophy since, on the one hand, the outcome of such patients has not been extensively studied, and on the other hand, compliance with a GFD appears problematic, particularly for those diagnosed in adulthood.

Finally, no specific interventions have been identified that promote adherence to a GFD, but education of patients and family members about CD and about the intricacies of the GFD through multidisciplinary teams, and participation in local CD societies, has been shown to improve compliance. Therefore, the development and evaluation of formal educational interventions in collaboration between healthcare professionals and CD societies would appear to be a means to build on the methods that appear to already improve patient compliance. Monitoring of adherence to a GFD appears to be important, since improvement in histologic grade has been associated with improved BMD, IDA, and nutritional status. The serological markers appear to be adequate for detecting gross dietary indiscretion, and responding to gluten challenge, but unfortunately, they have poor sensitivity for detecting lesser degrees of dietary indiscretion, and have inadequate correlation with histological improvement at least in the short-term. It is true that histological improvement tends to lag behind clinical and serological improvement, especially in adults in whom improvement may never be complete, but even considering this, a negative serological test has been shown to miss patients with persistent villous atrophy. The recognition of persistent villous atrophy appears to be important since improvement beyond this level is associated with the improved outcomes listed above. It should be noted, however, that we could not identify a controlled study that objectively determined the level of histological improvement that would be associated with improved outcomes, and this is an area for future study. Although somewhat controversial, nonetheless, based on this report it would appear that follow-up biopsy, at least 1 year after GFD in adults to document improvement of the histological grade, would be valuable.

Abbreviations and Acronyms

95% CI-Ninety-five percent confidence interval
AGA-Antigliadin antibody
AR-Attributable risk
BMD-Bone mineral density
CD-Celiac disease
DXA-Dual energy X-ray absorptiometry
EGD-Esophagogastroduodenoscopy
ELISA-Enzyme-linked immunosorbent assay
EMA-Endomysial antibody
ESPGAN-European Society of Pediatric Gastroenterology and Nutrition
ETCL-Enteropathy-associated T-cell lymphoma
GFD-Gluten-free diet
GP-Guinea pig
HLA-Human leukocyte antigen
HR-Human recombinant
HU-Human umbilical cord
IDA-Iron deficiency anemia
IDDM-Type I diabetes (insulin dependent)
IEL-Intraepithelial lymphocytes
IF-Immunofluorescence
IgA-Immunoglobulin A
IgG-Immunoglobulin G
ME-Monkey esophagus
NHL-Non-Hodgkin's lymphoma
NPV-Negative predictive value
OR-Odds ratio
PPV-Positive predictive value
Prev-Prevalence
PVA-Partial villous atrophy
RR-Relative risk
SD-Standard deviation
Sens-Sensitivity
SIR-Standardized incidence ratio
SMR-Standardized mortality ratio
SPA-Single photon absorptiometry
Spec-Specificity
SVA-Subtotal villous atrophy
tTG-Tissue transglutaminase
VA-Villous atrophy

Appendix A

Table 1

Various causes of villous atrophy (VA; Farrell and Kelly, Am J Gastro 2001;96:3237)
Celiac disease
Dermatitis herpetiformis
Cow's milk protein intolerance (children)
Post-gastroenteritis
Giardiasis
Peptic duodenitis
Crohn's disease
Small intestinal bacterial overgrowth
Eosinophilic gastroenteritis
Radiation or chemotherapy
Tropical sprue
Severe malnutrition
Diffuse small intestinal lymphoma
Graft versus host disease
Hypogammaglobulinemia
Alpha chain disease

Table 2

Marsh (Gastroenterology 1992;102:330) and Rostami (Am J Gastroenterol 1999;94:888) modified histological criteria for CD
Criteria | Rostami modification (1999) | Original Marsh (1992)
Marsh 0 | Same as original | Pre-infiltrative: normal mucosal and villous architecture
Marsh I | Same as original | Infiltrative: normal mucosal and villous architecture; increased numbers of IELs
Marsh II | Same as original | Hyperplastic: similar to above but with enlarged crypts, with increased crypt cell division
Marsh III a | Partial VA: shortened blunt villi; mild lymphocyte infiltration; enlarged hyperplastic crypts | Destructive lesion: flat mucosa - complete loss of villi; lymphocyte infiltration; enlarged hyperplastic crypts
Marsh III b | Sub-total VA: clearly atrophic villi - but still recognizable; enlarged crypts whose immature epithelial cells are generated at an increased rate; influx of inflammatory cells |
Marsh III c | Total VA: nearly total VA; severe Marsh atrophic, hyperplastic and infiltrative lesions |
Marsh IV | Same as original | Hypoplastic: total VA; normal crypt height but hypoplasia; normal IEL count; many feel this doesn't exist and represents severe malnutrition

VA=villous atrophy

IEL=intraepithelial lymphocytes

Table 3

Revised ESPGAN criteria
Criteria | ESPGAN* 1979 | ESPGAN revised 1990
Initial histology | - Absent or nearly absent villi | - Biopsy must remain the initial step in the diagnosis (mandatory)
 | - Recognized existence of less severe lesion | - Recommend capsule over endoscopic biopsy
 | - No consensus on verification of less severe lesions but recommended if possible continuing gluten diet and assess histology, or re-challenge after GFD, given the large differential of milder histologic lesions | - Large well oriented biopsy
 | | - Histology: hyperplastic VA with hyperplasia of the crypts and an abnormal surface epithelium. The IEL count is raised
 | | - Morphometry and histochemistry are important aids to diagnosis.
 | | - Monoclonal antibodies to IEL may be a future aid
Antibody studies | - n/a | - Recognize that IgA AGA, and EMA have a high degree of sensitivity and specificity for the diagnosis of CD
 | | - When such antibodies are present at the time of diagnosis in a child with a typical small intestinal mucosa, and when they disappear in parallel to a clinical response to a GFD, weight is added to the diagnosis of CD that may now be said to have been finally established
 | | - When biopsy is unavailable in communities where other causes of enteropathy are rare, the presence of abnormal concentrations of two antibodies strongly suggests that CD is a diagnostic possibility
 | | - Antibodies can be a marker of response to a GFD and a guide to dietary compliance
Improvement on GFD | - Recognized as central to the definition | - Second mandatory requirement remains a reasonably rapid (weeks rather than many months) clinical remission on a strict GFD
 | - Recognized that improvement need not be complete | - Control biopsy is always a suitable way of verifying the effect of GFD, and is required in asymptomatic pts
Gluten Challenge | - Importance of gluten challenge and re-biopsy emphasized to document “permanence” of gluten intolerance | - No longer a requirement
 | - However, the panel recognized that challenge was not being performed in routine practice (only 652 were performed among several thousand children with gluten intolerance) | - Should be used in equivocal cases such as when no initial biopsy was done, biopsy was inadequate or atypical, in communities with high rates of other enteropathies, or in situations when pts plan to abandon the GFD in an uncontrolled way
 | | - Challenge should be performed after obtaining a control biopsy on a GFD
 | | - Re-biopsy is performed 3–6 months later with the recognition that relapse can take 5–7 years or more to occur.
2-year rule | - To address the issue of transient gluten intolerance, the panel emphasized the usefulness of the 2-year rule after stopping a GFD | - The 2-year rule is practical in most cases, but several reports of relapse occurring 5–7 years after gluten rechallenge
- 619 of 652 gluten challenges redeveloped histology compatible with CD by 2 years

Walker-Smith et al., Arch Dis Child 1990:65:99

CD=celiac disease; n/a=not applicable; GFD=gluten-free diet

Appendix B. Search Strategies

Search Strategy 1

Celiac 1 - Diagnostic Tests

Test 1. EMA

MEDLINE on DIALOG

  1. s anti(w)endomysial(w)antibod? OR antiendomysial(w)antibod?

  2. s anti(w)endomysium(w)antibod? OR antiendomysium(w)antibod?

  3. s endomysial(w)antibod? OR endomysium(w)antibod? OR endomysial(w)autoantibod? OR endomysium(w)autoantibod?

  4. s endomysial(n3)iga OR antiendomysial(n3)iga OR iga(n)ema

  5. s endomysium(n3)iga OR antiendomysium(n3)iga OR igg(n)ema

  6. s immunoglobulin?(n3)endomysial OR immunoglobulin?(n3)antiendomysial

  7. s immunoglobulin?(n3)endomysium OR immunoglobulin?(n3)antiendomysium

  8. s ema(n3)antibod? OR ema(n3)autoantibod? OR anti(w)ema OR ema(n3)positiv?

  9. s aea AND (endomysial OR endomysium OR antiendomys?) OR aea(n3)positiv? OR aea(n2)igg OR aea(n2)iga

  10. c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9

  11. s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (ema OR aea)

  12. s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND autoantibod?(n2)positiv?

  13. c 10 OR 11 OR 12

  14. s epithelial(w)membrane(w)antigen

  15. c 13 NOT 14

  16. s s15/human

  17. s s16/eng

EMBASE on DIALOG

  1. s anti(w)endomysial(w)antibod? OR antiendomysial(w)antibod?

  2. s anti(w)endomysium(w)antibod? OR antiendomysium(w)antibod?

  3. s endomysial(w)antibod? OR endomysium(w)antibod? OR endomysial(w)autoantibod? OR endomysium(w)autoantibod? OR endomysium antibody/de

  4. s endomysial(n3)iga OR antiendomysial(n3)iga OR iga(n)ema

  5. s endomysium(n3)iga OR antiendomysium(n3)iga OR igg(n)ema

  6. s immunoglobulin?(n3)endomysial OR immunoglobulin?(n3)antiendomysial

  7. s immunoglobulin?(n3)endomysium OR immunoglobulin?(n3)antiendomysium

  8. s ema(n3)antibod? OR ema(n3)autoantibod? OR anti(w)ema OR ema(n3)positiv?

  9. s aea AND (endomysial OR endomysium OR antiendomys?) OR aea(n3)positiv? OR aea(n2)igg OR aea(n2)iga

  10. c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9

  11. s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (ema OR aea)

  12. s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND autoantibod?(n2)positiv?

  13. c 10 OR 11 OR 12

  14. s epithelial(w)membrane(w)antigen

  15. c 13 NOT 14

  16. s s15/human

  17. s s16/eng
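
The EMA strategies above rely on DIALOG proximity operators and truncation. As a rough illustration only, the Python sketch below approximates three of them: (w) as adjacent terms in the stated order, (nN) such as (n3) as terms within N intervening words in either order, and a trailing ? as open truncation. The function names, the tokenizer, and the sample phrases are hypothetical, and the reading of "within N words" as "at most N intervening words" is an assumption. It does not reproduce DIALOG's actual indexing.

```python
import re


def tokens(text):
    """Split text into lowercase words (hyphens and punctuation act as separators)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_matches(term, word):
    """A trailing '?' is treated as open truncation: the word must start with the stem."""
    if term.endswith("?"):
        return word.startswith(term[:-1])
    return word == term


def near(text, left, right, gap=0, ordered=False):
    """Return True if `left` and `right` occur with at most `gap` words between them.

    ordered=True approximates (w): `left` must come before `right`.
    ordered=False approximates (n) and (nN): either order is accepted.
    """
    words = tokens(text)
    for i, w in enumerate(words):
        if not term_matches(left, w):
            continue
        for j, v in enumerate(words):
            if i == j or not term_matches(right, v):
                continue
            between = abs(j - i) - 1
            if between <= gap and (j > i or not ordered):
                return True
    return False


# endomysial(w)antibod? : adjacent, in order
print(near("IgA anti-endomysial antibodies", "endomysial", "antibod?", ordered=True))
# endomysial(n3)iga : up to three intervening words, either order
print(near("endomysial antibodies of the IgA class", "endomysial", "iga", gap=3))
```

Under these assumptions the first call corresponds to a term such as endomysial(w)antibod? and the second to endomysial(n3)iga.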

Test 2. tTG

MEDLINE on DIALOG

  1. s tissue(w)transglutaminase?? OR tissue(w)trans(w)glutaminase??

  2. s antitissue(w)transglutaminase?? OR anti(w)transglutaminase??

  3. s human(w)transglutaminase?? OR antitransglutaminase??(n3)antibod?

  4. s (immunoglobulin? OR immunoglobulin a/de OR immunoglobulin g/de) AND (transglutaminase OR transglutaminases)

  5. s ttg(n3)antibod? OR ttg(n3)autoantibod? OR ttg(w)(kit OR kits) OR ttga OR httg OR anti(w2)ttg OR human(w)ttg OR elisa(n)ttg OR attga

  6. s (transglutaminase?? AND antibod?) OR (transglutaminase?? AND autoantibod?)

  7. s transglutaminase??(n3)iga OR transglutaminase??(n3)igg OR tg2(n5)transglutaminase?? OR human(w)recombinant(w)tg2

  8. s anti(w)gamma(w)glutamyltransferase AND (antibod? OR autoantibod?)

  9. c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8

  10. s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (transglutaminase OR transglutaminases OR ttg OR tg2)

  11. c 9 OR 10

  12. s s11/human

  13. s s12/eng

EMBASE on DIALOG

  1. s tissue(w)transglutaminase?? OR tissue(w)trans(w)glutaminase??

  2. s antitissue(w)transglutaminase?? OR anti(w)transglutaminase??

  3. s human(w)transglutaminase?? OR antitransglutaminase??(n3)antibod?

  4. s immunoglobulin OR immunoglobulin a/de OR immunoglobulin a1/de OR immunoglobulin a2/de

  5. s immunoglobulin g/de OR immunoglobulin g1/de OR immunoglobulin g2/de OR immunoglobulin g2a/de OR immunoglobulin g2b/de OR immunoglobulin g3/de OR immunoglobulin g4/de

  6. s transglutaminase OR transglutaminases

  7. c 4 OR 5

  8. c 7 AND 6

  9. s ttg(n3)antibod? OR ttg(n3)autoantibod? OR ttg(w)(kit OR kits OR assay) OR ttga OR httg OR anti(w2)ttg OR human(w)ttg OR elisa(n)ttg OR attga

  10. s (transglutaminase?? AND antibod?) OR (transglutaminase?? AND autoantibod?)

  11. s transglutaminase??(n3)iga OR transglutaminase??(n3)igg OR tg2(n5)transglutaminase?? OR human(w)recombinant(w)tg2

  12. s anti(w)gam