The Agency for Healthcare Research and Quality (AHRQ), through its Evidence-Based Practice Centers (EPCs), sponsors the development of evidence reports and technology assessments to assist public- and private-sector organizations in their efforts to improve the quality of health care in the United States. This report on Celiac Disease was requested and funded by the Office of Medical Applications of Research, National Institutes of Health (NIH), for the Consensus Development Conference on Celiac Disease, as well as by the National Institute of Diabetes and Digestive and Kidney Diseases, NIH. The reports and assessments provide organizations with comprehensive, science-based information on common, costly medical conditions and new health care technologies. The EPCs systematically review the relevant scientific literature on topics assigned to them by AHRQ and conduct additional analyses when appropriate prior to developing their reports and assessments.
To bring the broadest range of experts into the development of evidence reports and health technology assessments, AHRQ encourages the EPCs to form partnerships and enter into collaborations with other medical and research organizations. The EPCs work with these partner organizations to ensure that the evidence reports and technology assessments they produce will become building blocks for health care quality improvement projects throughout the Nation. The reports undergo peer review prior to their release.
AHRQ expects that the EPC evidence reports and technology assessments will inform individual health plans, providers, and purchasers as well as the healthcare system as a whole by providing important information to help improve health care quality.
We welcome comments on this evidence report. They may be sent by mail to the Task Order Officer named below at: Agency for Healthcare Research and Quality, 540 Gaither Road, Rockville, MD 20850, or by e-mail to epc@ahrq.gov.
Carolyn M. Clancy, M.D.
Director
Agency for Healthcare Research and Quality
Barnett S. Kramer, M.D., M.P.H.
Director
Office of Medical Applications of Research, NIH
Allen M. Spiegel, M.D.
Director
National Institute of Diabetes and Digestive and Kidney Diseases, NIH
Jean Slutsky, P.A., M.S.P.H.
Director, Center for Outcomes and Evidence
Agency for Healthcare Research and Quality
Kenneth S. Fink, M.D., M.G.A., M.P.H.
Director, EPC Program
Agency for Healthcare Research and Quality
Marian James, Ph.D.
EPC Program Task Order Officer
Agency for Healthcare Research and Quality
The authors of this report are responsible for its content. Statements in the report should not be construed as endorsement by the Agency for Healthcare Research and Quality or the U.S. Department of Health and Human Services of a particular drug, device, test, treatment, or other clinical service.
The authors would like to thank several individuals for their support of the present project: Keith O'Rourke, who helped with the statistical analysis; Karen Patrias, who helped with conducting the literature search; Gabriela Lewin, who assisted with the quality assessment; and Christine Murray and Isabella Steffensen, who assisted in the editing of the report and the generation of evidence tables.
Dr. Alaa Rostom was the lead investigator. He was involved in all aspects of the study design, management, planning, analysis and write-up, including article screening, data extraction, quality assessment, statistical analysis, and report write-up. Drs. Catherine Dubé and Ann Cranney were the second investigators. Dr. Catherine Dubé was involved in all aspects of study design, and planning and organization, including task management, article screening, data extraction, and quality assessment. She was the lead writer of Celiac 2 and 3. Dr. Ann Cranney was involved in all aspects of study design and planning, including article screening, data extraction, and quality assessment. She was the lead writer of Celiac 4, and oversaw the screening and data extraction for Celiac 3, 4, and 5. Dr. Navaaz Saloojee was involved in study planning, article screening, data extraction, and quality assessment, and was the lead writer of Celiac 5. Dr. Richmond Sy was involved in study planning, article screening, data extraction, and contributed to the writing of Celiac 4. Drs. David Mack and Dilip Patel were involved in study planning and article screening, in addition to being content experts in pediatric and adult celiac disease, respectively. They also reviewed and advised on the report write-up. JoAnne McNeal was involved in article screening and data extraction for Celiac 1 (serology) and Celiac 2 (prevalence).
Dr. David Moher was involved in all aspects of study design, management, planning, analysis, and write-up. He was the methodological content expert and reviewed and advised on all report conduct and documents. Chantelle Garrity was involved in all aspects of project planning and management, including liaison with all key partners. She oversaw the screening progress, document retrieval, and assisted in report management, review and write-up. Margaret Sampson was the lead information specialist and was involved in all aspects of the search strategy/key word design and refinement, in association with the information specialists at the NLM. She was involved in all aspects of article database management, including article retrieval, set-up of the online computerized SRS article screening and extraction system, and the development and write-up of the QUOROM Flow. Li Zhang was the information specialist involved in all aspects of SRS system management, article retrieval, and implementation of the SRS system. Dr. Vasil Mamaladze and Fatemeh Yazdi performed the data extraction of Celiac 1-serology and Celiac 2-prevalence. Irene Pan performed data extraction and quality assessment of Celiac 2-prevalence.
Context. Celiac disease (CD) is a disorder of small bowel malabsorption. It is characterized by mucosal inflammation, villous atrophy and crypt hyperplasia that occur upon exposure to gluten, and clinical and histological improvement with withdrawal of gluten from the diet. The classical presentation of CD has now been shown to be less common than silent or atypical presentation, in which patients do not have intestinal symptoms. Untreated CD is associated with multiple important short- and long-term complications including nutritional derangements, anemia, reduced bone density, as well as intestinal lymphoma. In the vast majority of patients, CD is effectively treated with dietary modifications that eliminate gluten. Mounting evidence suggests that CD is actually considerably more common than previously believed and, therefore, this disorder warrants consideration for screening of at-risk patients, as well as possibly the general population.
Objectives. To conduct a comprehensive systematic review on five areas of CD: (1) sensitivity and specificity of serological tests; (2) prevalence and incidence of CD; (3) CD associated lymphoma; (4) consequences of testing for CD; and, (5) interventions for the promotion and monitoring of adherence to a gluten-free diet (GFD).
Data Sources. Staff of the National Library of Medicine performed a series of searches in support of the literature review of CD. Searches were run in the MEDLINE® (1966 to Oct 2003) and EMBASE (1974 to Dec 2003) databases for each of the five objectives and their respective sub-objectives separately.
Study Selection. Study selection for each objective was performed using three levels of screening with predetermined, increasingly strict criteria to ensure that all relevant articles were captured. Following a calibration exercise, two reviewers independently screened all studies using a web-based system that allowed automatic identification of review disagreements. These disagreements were resolved by consensus.
Data Extraction. For each CD objective, a detailed and standardized data abstraction form was developed. For each objective, data abstraction was conducted by one reviewer and verified by another. The extracted data was further verified by one of the principal investigators. Quality assessments were performed using specific instruments for each of the included study types.
Data Synthesis. The data obtained from this review fell into several broad categories, which correspond in large part to the individual study objectives. Data for the sensitivity and specificity of each serological marker was considered separately, and studies were further divided according to the age group of the study population. Attempts were made to identify, explain, and minimize clinical and statistical heterogeneity in the included studies. A Pearson's Chi Square with n-1 degrees of freedom, where n represents the number of included studies in an analysis, was calculated to assess statistical heterogeneity. Pooled estimates were only calculated if clinically and statistically appropriate. In situations where pooling was not performed, a qualitative systematic review was conducted.
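As a rough illustration of the heterogeneity check described above, the following sketch computes a Pearson chi-square statistic for a set of study-level proportions (for example, true positives out of biopsy-confirmed cases) against their pooled proportion, with n-1 degrees of freedom. The function name, the input layout, and the sample counts are hypothetical and are not taken from the report; the authors' exact computation is not specified in this section.

```python
def pearson_heterogeneity(studies):
    """Pearson chi-square for homogeneity of proportions across studies.

    studies: list of (events, total) pairs, e.g. (true positives, diseased),
             each with total > 0.
    Returns (chi_square, degrees_of_freedom) with df = n - 1.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies")
    total_events = sum(e for e, _ in studies)
    total_n = sum(n for _, n in studies)
    pooled = total_events / total_n
    # If every study reports 0% or 100%, there is no variation to test.
    if pooled in (0.0, 1.0):
        return 0.0, len(studies) - 1
    chi_square = 0.0
    for events, n in studies:
        expected_events = n * pooled
        expected_non_events = n * (1.0 - pooled)
        chi_square += (events - expected_events) ** 2 / expected_events
        chi_square += ((n - events) - expected_non_events) ** 2 / expected_non_events
    return chi_square, len(studies) - 1


# Hypothetical counts: (true positives, biopsy-confirmed cases) per study.
example = [(90, 100), (70, 100)]
chi_square, df = pearson_heterogeneity(example)
print(f"chi-square = {chi_square:.2f} on {df} degree(s) of freedom")
```

A statistic that is large relative to the chi-square distribution with the stated degrees of freedom would argue against pooling the studies without first explaining the differences between them.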
To produce clinically useful pooled statistics, a weighted mean of the overall sensitivity and specificity from the included studies was calculated, along with 95% confidence intervals (CIs). The pooled estimates for the sensitivity and specificity were compared with a summary receiver operating characteristic (ROC) curve, calculated for the same group of studies as a second check of the estimates.
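The pooling step can be sketched as follows. This is a minimal example that assumes sample-size weighting (which is equivalent to summing the counts across studies) and a normal-approximation 95% confidence interval; the report does not state the weighting scheme or interval method here, so both are assumptions, and the counts are invented for illustration.

```python
import math


def pooled_proportion(studies, z=1.96):
    """Sample-size-weighted mean proportion with a normal-approximation CI.

    studies: list of (events, total) pairs, e.g. (true positives, diseased)
             for sensitivity or (true negatives, non-diseased) for specificity.
    Returns (estimate, lower, upper), with the bounds clipped to [0, 1].
    """
    events = sum(e for e, _ in studies)
    total = sum(n for _, n in studies)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    estimate = events / total
    standard_error = math.sqrt(estimate * (1.0 - estimate) / total)
    lower = max(0.0, estimate - z * standard_error)
    upper = min(1.0, estimate + z * standard_error)
    return estimate, lower, upper


# Hypothetical counts: (true positives, biopsy-confirmed cases) per study.
sensitivity, low, high = pooled_proportion([(90, 100), (70, 100)])
print(f"pooled sensitivity = {sensitivity:.3f} (95% CI {low:.3f} to {high:.3f})")
```

The normal approximation behaves poorly when the pooled proportion is close to 0 or 1, which is common for highly specific tests; an exact or score-based interval would be a reasonable substitute in that setting.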
Results/Conclusions. This report has provided a systematic review of five broad areas (and corresponding sub-areas) of CD. Perhaps one of the most important findings of this report is the significance of how one chooses to define CD in the era of serological testing, and how this apparently clear-cut task has profound implications on all the results presented in this report. Specifically, can CD be diagnosed solely on the basis of serology? Is some degree of villous atrophy necessary for a diagnosis of CD? These questions have important implications downstream of the diagnosis as well. For example, do CD patients without symptoms or villous atrophy have the same risk of complications as those with villous atrophy? Is serological improvement on a GFD sufficient to reduce CD complications, or must there be documented histological improvement, and what degree of histological improvement is necessary?
The results of the Celiac 1 objective suggest that in the era of EMA and tTG antibody testing, AGA antibody testing in both children and adults has a limited role. The sensitivity and specificity of EMA and tTG are quite high (over 95% for sensitivity, and close to 100% for specificity), as are their positive and negative predictive values; however, one has to be aware that the reported diagnostic parameters are taken from studies in which the prevalence of CD was, for the most part, much higher than that seen in usual clinical practice. When these tests are used to screen the general population, their positive predictive values will certainly not be as high as those reported in these studies. The bulk of the evidence on the diagnostic characteristics of these tests was derived from studies that defined CD as having at least some degree of villous atrophy (VA).
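The dependence of predictive values on prevalence follows directly from Bayes' theorem. The sketch below uses round illustrative figures (sensitivity 0.95, specificity 0.99) rather than pooled estimates from this report, and compares a high-prevalence study setting with a general-population prevalence of about 1%; the function name and inputs are assumptions made for the example.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative values only: the same test in two different populations.
for label, prevalence in [("study setting (50%)", 0.50),
                          ("general population (1%)", 0.01)]:
    ppv, npv = predictive_values(0.95, 0.99, prevalence)
    print(f"{label}: PPV = {ppv:.3f}, NPV = {npv:.4f}")
```

With these inputs the positive predictive value falls from roughly 99% in the high-prevalence setting to just under 50% at 1% prevalence, while the negative predictive value remains very high.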
HLA DQ2/DQ8 testing appears to be a useful adjunct in the diagnosis of CD. The test has high sensitivity (in excess of 90%–95%); however, since approximately 30% of the general population, and an even higher proportion of “high-risk” subjects (e.g., diabetics and family members), also carry these markers, the specificity of this test is not ideal. The greatest diagnostic utility of this test appears to be its negative predictive value.
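To see why a common marker can still have a very high negative predictive value, the following sketch derives specificity and predictive values from the carrier frequency. It assumes a carrier frequency of 30%, a sensitivity of 0.95, and a CD prevalence of 1% in the general population; these are rounded figures chosen for illustration, and the function and key names are hypothetical.

```python
def hla_test_characteristics(carrier_fraction, sensitivity, prevalence):
    """Derive specificity, PPV and NPV of a marker from its carrier frequency.

    carrier_fraction: share of the whole population carrying the marker.
    sensitivity:      share of CD patients carrying the marker.
    prevalence:       share of the population with CD.
    """
    carriers_with_cd = sensitivity * prevalence
    carriers_without_cd = carrier_fraction - carriers_with_cd
    if carriers_without_cd < 0:
        raise ValueError("carrier fraction is too small for these inputs")
    non_carriers_with_cd = (1.0 - sensitivity) * prevalence
    non_carriers_without_cd = (1.0 - prevalence) - carriers_without_cd
    return {
        "specificity": non_carriers_without_cd / (1.0 - prevalence),
        "ppv": carriers_with_cd / carrier_fraction,
        "npv": non_carriers_without_cd / (1.0 - carrier_fraction),
    }


# Assumed inputs: 30% carriers, sensitivity 0.95, prevalence 1%.
result = hla_test_characteristics(0.30, 0.95, 0.01)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions a positive result carries little weight, because most carriers do not have CD, whereas a negative result makes CD very unlikely.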
Biopsy itself, when used with a strict cut-off requiring villous atrophy, appears to have high specificity, but poor sensitivity. Using a lower grade cut-off clearly improves sensitivity, but because of the wide differential of causes of histological lesions similar to Marsh I to IIIa, the specificity suffers. The use of histomorphometric measures such as quantification of gamma delta positive intraepithelial lymphocytes (γδ+ IELs) is likely to allow for the use of lower grade cut-offs, while maintaining reasonable specificity. Ultimately, a trial utilizing multiple diagnostic tests in an attempt to capture as many CD patients in a clinically-relevant population as possible, along with a time dimension such as a response to a GFD or gluten challenge, is required to fully assess the diagnostic characteristics of biopsy alone. This type of study would be able to characterize the false-positive and false-negative rates, provided that all studied patients are followed forward in time.
The included prevalence studies differed in important ways, including execution, tests for prevalence assessment, and patient sampling. Thus, results have to be interpreted in the light of some of the limitations that have been identified regarding the diagnostic performance of the tests for CD. Nonetheless, the results of this report suggest that CD is a very common disorder with a prevalence in the general population that is likely close to 1:100 (1%). Several high-risk groups with a prevalence of CD greater than that of the general population have been identified and include: those suspected of having CD; family members of CD patients; type I diabetics; and those with iron-deficiency anemia (IDA) or low bone mineral density (BMD). Additionally, the review identified many other high-risk groups, including those with Down Syndrome, short stature, and infertility, to name a few. Their inclusion was, however, beyond the scope of this report.
The results of this report confirm that, apart from a few limitations, there is a strong association between CD and GI lymphoma. The report identified standardized incidence ratios (SIR) for lymphoma that ranged from 4 to 40, and standardized mortality ratios (SMR) that ranged from 11 to 70. A diagnostic delay, in particular a diagnosis of CD in adulthood as opposed to in childhood, is associated with poorer outcomes. Fortunately, several studies suggest that adherence to a GFD reduces the risk of lymphoma in CD patients.
The consequences of testing for CD in at-risk and symptomatic patients appear to be more straightforward, since these patients appear to be more compliant with a GFD and would be expected to benefit from this intervention. The data is less clear for asymptomatic screen-identified patients, particularly those who have truly silent CD and/or do not have fully-developed villous atrophy. On the one hand the outcome of such patients has not been extensively studied, and on the other hand compliance with a GFD appears problematic, particularly for those diagnosed in adulthood.
Finally, no specific interventions have been identified that promote adherence to a GFD, but education of patients and family members about CD and about the intricacies of a GFD, and participation in local celiac societies, has been shown to improve compliance. Although somewhat controversial, biopsy monitoring of adherence to a GFD appears to be important, since improvement in histological grade has been associated with improved BMD, IDA, and nutritional status. The serological markers appear to be adequate for detecting gross dietary indiscretion, and respond to a gluten challenge, but appear to have poor sensitivity for detecting lesser degrees of dietary indiscretion and to correlate inadequately with histological improvement, at least in the short term. It should, however, be noted that we could not identify a controlled study that objectively determined the level of histological improvement that would be associated with improved outcomes, and this is an area for future study. Nonetheless, based on this report it would appear that follow-up biopsy, at least 1 year after a GFD in adults to document improvement of the histological grade, would be valuable.
Celiac disease (CD) is a disorder of small bowel malabsorption. It is characterized by mucosal inflammation, villous atrophy and crypt hyperplasia, which occur upon exposure to gluten, and clinical and histological improvement with withdrawal of gluten from the diet.1–4 CD—also referred to as celiac sprue, gluten-sensitive enteropathy, non-tropical sprue, in addition to a host of other names—is thought to result from the activation of both a cell-mediated (T-cell) and humoral (B-cell) immune response upon exposure to the glutens (prolamins and glutenins) of wheat, barley, rye, and oats, in a genetically susceptible person.5, 6 Genetic susceptibility is suggested by a high concordance among monozygotic twins of close to 70 percent,7 and an association with certain type II human leukocyte antigens (HLA).8, 9 HLA DQ2 is found in up to 95 percent of CD patients, while most of the remaining patients have HLA DQ8.8–10 However, there is only a 30 percent HLA concordance among siblings, suggesting that other genetic factors are also at play.11 More recent evidence suggests that the presence of auto-antibodies to a connective tissue element surrounding smooth muscle called endomysium is highly specific for CD. The target of this autoantibody is now known to be an enzyme called tissue transglutaminase (tTG). This enzyme may play a prominent role in the pathogenesis of CD by modifying gliadin, resulting in a greater proliferative response of gliadin specific T-cells, which contributes to mucosal inflammation and further B-cell activation.5, 6, 12, 13
CD appears to represent a spectrum of clinical features and presentations. Although “classical” CD (i.e., fully developed gluten-induced villous atrophy and classical features of intestinal malabsorption) is most commonly described, it appears that most patients have atypical CD (i.e., fully developed gluten-induced villous atrophy found in the setting of another presentation such as iron deficiency, osteoporosis, short stature, or infertility) or silent CD (i.e., fully developed gluten-induced villous atrophy discovered in an asymptomatic patient by serologic screening or perhaps an endoscopy for another reason). Other authors describe a latent form of CD that is characterized by a previous diagnosis that responded to a gluten-free diet (GFD) and retained a normal mucosal histology upon later introduction of gluten. Latent CD can also represent patients with currently normal intestinal mucosa who will subsequently develop gluten-sensitive enteropathy.13, 14
The true prevalence of CD is difficult to estimate because of the variable presentation of the disease, particularly since many patients can have few or no symptoms. With this limitation in mind, the prevalence of the disease is highest in Celtic populations where estimates of 1:300 to 1:122 have been described. The prevalence of CD in North America has been estimated to be 1:3000, but a recent American study found the prevalence among the general not-at-risk population to be 1:105, while the prevalence in at-risk groups such as first-degree relatives of CD patients was 1:22, suggesting that CD is greatly under-diagnosed. CD can affect persons of many ethnic backgrounds, but appears to rarely affect persons of purely Chinese, Japanese, or Afro-Caribbean descent.13
The diagnosis of CD in adults is classically made on the basis of clinical suspicion (that is, recognizing atypical presentations such as isolated iron deficiency, combined iron and folate deficiency, and osteoporosis) together with a compatible duodenal biopsy taken while the patient is on a gluten-containing diet, followed by clinical and histological improvement after commencement of a GFD.2, 4 However, several serologic markers have become available which have altered the classic diagnostic pathway. The sensitivity of IgA anti-gliadin antibodies (AGA) is reported to range from 70 to 85 percent, whereas the specificity ranges from 70 to 90 percent. IgA anti-endomysial (EMA) and anti-tissue transglutaminase (tTG) antibodies have sensitivities in excess of 90 percent and specificities of over 95 percent.14 Significant variability seems to exist in the reported values among the different studies, and these IgA-based tests can be negative in IgA-deficient patients, who account for about 3 percent of CD cases.
The sensitivity and specificity of the anti-EMA and anti-tTG antibodies, along with the perceived underdiagnosis of CD, have led to suggestions of using these tests for population screening. Aside from the recognized influence of CD prevalence on the predictive value of a serologic test result, little consensus exists regarding the value of population screening. Furthermore, specific questions regarding clinically important outcomes resulting from screening remain unclear. In particular, little data is available on adherence to a GFD in asymptomatic CD patients detected by screening.
The major complications of CD include intestinal and extraintestinal malignancies, ulcerative jejunoileitis, and collagenous sprue. Unlike most gastrointestinal (GI) lymphomas that are typically of B-cell origin, lymphomas associated with CD appear to be most commonly of T-cell origin. Unfortunately, the prognoses for patients with CD-associated T-cell lymphomas, ulcerative jejunoileitis and collagenous sprue, appear grim. It is widely believed that strict adherence to a GFD reduces the risk of these complications. It is suggested that by 5 years of dietary adherence the risk of lymphoma in CD patients approaches that of the general population.14
The challenge of CD remains to determine which patient populations should be screened, the best means of screening, and whether early detection of patients with CD leads to improved patient outcomes. For patient outcomes to improve as a result of screening, the degree to which “positively” screened individuals, particularly those who were asymptomatic, adhere to the stringent GFD, needs to be determined.
As briefly described in the Overview, CD can take on a variety of forms. Paramount to the conduct of this review and subsequent interpretation of the literature is the identification of clear definitions of the many faces of CD. Implicit in a definition of CD (with a few exceptions that are detailed below) is the concept that the clinical and the small intestinal pathological features are present in patients who consume a gluten-containing diet, normalize with the introduction of a GFD, and recur with the re-introduction of dietary gluten.2, 4 The historical tendency to rely on biopsy features as part of the definition of CD creates difficulties (as discussed below) in accurately addressing the sensitivity and specificity of biopsy for the diagnosis of CD, and in assessing the sensitivity and specificity of the serologic markers, if different studies use different criteria to define CD. For the purpose of this review, the following definitions have been used.
Classical CD. The most commonly described form. It describes patients with the classical features of intestinal malabsorption who have fully developed gluten-induced villous atrophy and the other classic histological features. These patients present because of GI symptoms, and are identified as CD sufferers through the investigation of these symptoms. This group can also be said to have symptomatic CD.
Atypical CD. Appears to be one of the most common forms. These patients generally have few or no GI symptoms, but seek medical attention for another reason such as iron deficiency, osteoporosis, short stature, or infertility. These patients generally have fully developed gluten-induced villous atrophy. Because these patients are “asymptomatic” from the GI perspective, if their atypical CD feature is not recognized, they may be difficult or impossible to distinguish from “true” silent (asymptomatic) CD patients.
Silent CD. A very common form of CD. Refers to patients who are asymptomatic but are discovered to have fully developed gluten-induced villous atrophy after having undergone serologic screening or perhaps an endoscopy and biopsy for another reason. These patients are clinically silent, in that they do not manifest any clear GI symptoms or associated atypical features of CD such as iron deficiency or osteoporosis. These patients can be confused with atypical CD if their atypical features are not recognized in an early stage. As well, Fasano et al.15 have shown that many of these patients do not manifest fully developed villous atrophy.
Latent CD. Represents patients with a previous diagnosis of CD that responded to a GFD and who retain a normal mucosal histology upon later re-introduction of gluten. Latent CD can also represent patients with currently normal intestinal mucosa who will subsequently develop gluten-sensitive enteropathy.
Refractory CD. For the purpose of this review, patients with refractory CD are patients with true CD and villous atrophy (i.e., not a misdiagnosis) who do not, or no longer, respond to a GFD. Although the most common reason for failure to respond to a GFD is dietary indiscretion or unknown exposure to gluten, refractory CD also occurs in patients on a GFD who have developed a complication such as ulcerative jejunoileitis or enteropathy-associated lymphoma. Patients with refractory CD do not necessarily have positive serology for CD. Refractory CD was reviewed in the context of the requested objectives.
The purpose of this report is to systematically review the available CD literature in order to provide organized evidence relating to a number of objectives put forth by the AHRQ. The findings of the report are intended to assist an assembled group of American and world experts in the field of CD in the development of National Institutes of Health (NIH) Consensus Development Conference guidelines sponsored by AHRQ and OMAR.
At first glance, the determination of the sensitivity and specificity of the various diagnostic modalities for CD seems straightforward. There are a multitude of studies that have assessed the diagnostic characteristics of each of the serological markers using a variety of different laboratory methods. However, these studies are remarkably heterogeneous on a number of levels.
For example, there appears to be notable heterogeneity in the actual definition of CD, an issue that has important consequences on all of the task order objectives. Central to the classic definition of CD is the recognition that biopsy is the gold standard for diagnosis. However, it has become clear over the years that the majority of patients with CD do not have the classically described features of intestinal malabsorption, and that a large proportion of patients do not have the classic flat mucosa (sub-total or total villous atrophy). To further aid in the diagnosis of CD, multiple authors have devised and modified histological criteria to grade the mucosal lesions of patients with CD. But still at issue is the broad differential of disorders that can cause villous atrophy, particularly the milder histological grades. To help address this issue, others have attempted to address specific features of the biopsy, such as the number of intraepithelial lymphocytes (IELs), the number of gamma delta positive (γδ+) IELs and other lymphocyte subtypes, as well as the localization of IELs towards the villous tip, just to name a few.
The serological screening studies, together with the recognition that a low-grade histological lesion can be consistent with CD, have helped bring to light the concept of a spectrum of CD and the so-called “celiac iceberg.” In brief, it is recognized that classic CD with the typical symptoms of malabsorption and a fully developed mucosal lesion represents a small proportion of patients. The majority of patients are asymptomatic and are classified as having either atypical CD, silent CD, or less commonly latent CD. Some authors question whether most, if not all, cases of silent CD are in fact atypical CD, although the associated consequence of this has not been recognized. To further complicate the issue, Fasano et al.15 have clearly characterized patients with silent CD without fully developed mucosal lesions, and found that only 34 percent of the patients had subtotal or total villous atrophy.
It should be recognized that the majority of studies assessing the diagnostic characteristics of the serological markers have defined CD by a biopsy with Marsh III or modified IIIa lesions or greater. These studies have reported a high sensitivity and specificity for these tests, particularly for the anti-EMA and anti-tTG antibody tests. However, some studies have looked at the characteristics of these tests in lower-grade lesions, and have found that while 100 percent of patients with Marsh IIIc histology show antibodies to endomysium, only 60 percent of patients with Marsh IIIa histology have anti-EMA antibodies.17, 18 Furthermore, it is apparent that serological markers can be used to monitor adherence to a GFD; for example, EMA and tTG antibodies fall to normal or non-diagnostic levels on a GFD, but the correlation with improvement of villous height is not as clear-cut. Finally, with the discovery by Sollid et al.8 and others that over 95 percent of patients with CD have HLA DQ2, and that most of the remainder have HLA DQ8, it was hoped that a reliable confirmatory test based on HLA typing would become available. Unfortunately, up to 40 percent of the general population and a much higher proportion of those with autoimmune disorders such as type I diabetes also have HLA DQ2 and/or HLA DQ8. Therefore, the specificity of this test can be quite low, making its positive predictive value relatively low. It is also becoming apparent that HLA DQ2/8 may not be the true risk-genes, and researchers are actively studying other candidate genes that may be associated with DQ2/8, or in patients without DQ2/8, other genes altogether.
The preceding overview was presented simply to illustrate the complexity involved in separately assessing the sensitivity and specificity of the serological markers, HLA typing, and biopsy itself, in the diagnosis of CD. Over time, the status of the biopsy as the gold standard for the diagnosis of CD has been eroded. Yet at the same time, most of what we know about the sensitivity and specificity of serological markers and HLA typing relies on biopsy as the gold standard. Therefore, one is locked in a circular argument of how best to choose the gold standard test(s), when each has important shortcomings and is dependent on another to define its own diagnostic characteristics. The major problem in accurately evaluating the diagnostic characteristics of these tests is the issue of identifying all possible CD patients in a general screened population to use as a benchmark. Serology would be the most convenient strategy, but appears to lose sensitivity in patients with low-grade lesions. Screening a general population with biopsy has significant practical/cost issues, as well as potential ethical problems; however, if such a study were performed along with measuring the serological and HLA status of patients, this would allow for identification of Marsh I or II lesions that would need to be characterized further. HLA DQ2/8-negative patients could likely be excluded from having CD. But those patients with Marsh I–II lesions would have to be followed, whether or not they were serology positive or HLA DQ2/8 positive, to see if CD develops; alternatively, they could be tested with a GFD and subsequently rechallenged to see whether they truly have CD. Only in this way can the true sensitivity of biopsy be determined. Using this multi-test gold standard with follow-up of equivocal cases would also be the best way of assessing the sensitivity and specificity of serology markers and HLA DQ2/DQ8 typing.
Finally, a question which needs to be addressed is: “What are the implications of identifying a truly asymptomatic individual, for example with serological screening, who has no other obvious complications such as iron deficiency or osteoporosis, and is then found to have a Marsh I or II lesion?” This returns the circular argument back to “What is truly CD?”—a question that is beyond the scope of this review.
The UO-EPC's evidence report on CD is based on a systematic review of the scientific-medical literature to identify and synthesize the results from studies addressing the key questions put forth by the AHRQ. The Celiac Review Team, together with content experts, identified specific issues integral to the review. A Technical Expert Panel (TEP) refined the research questions and highlighted key variables requiring consideration in the evidence synthesis. Evidence tables presenting the key study characteristics and results were developed. Summary tables were derived from the evidence tables. The methodological quality of reports of the included studies was appraised, and individual study results were summarized. For some objectives a narrative interpretation of the literature was provided.
The AHRQ task order requested answers to the questions outlined below:
Objective 1 - Sensitivity and specificity of tests for CD (Celiac 1)
What is the sensitivity and specificity of the following tests for CD:
AGA;
EMA;
human tTG IgA antibodies;
HLA (DQ2/DQ8);
duodenal/jejunal biopsy (see section below on celiac definition)
Do sensitivity and specificity vary in different target populations (e.g., symptomatic vs. asymptomatic; geographic populations)?
Objective 2 - Prevalence and incidence of CD (Celiac 2)
What is the prevalence and incidence of symptomatic and “clinically silent” CD in:
the general population;
high-risk populations:
family member of patient with CD;
type 1 diabetes mellitus;
iron deficiency anemia (IDA);
osteoporosis?
How does prevalence and incidence in the general population vary in different geographic and racial/ethnic populations?
Objective 3 - Celiac associated lymphoma (Celiac 3)
What is the association between CD and GI lymphoma?
What is the cumulative risk of developing GI lymphoma in patients with CD?
Does the cumulative risk vary with clinical presentation?
Objective 4 - Expected consequences of testing for CD (Celiac 4)
What are the expected consequences of testing for CD in the following populations:
patients with symptoms suggestive of CD;
asymptomatic, at-risk populations (affected family members, patients with type 1 diabetes);
the general population?
“Consequences” include:
false-positive results;
follow-up testing;
invasive procedures (biopsies);
cases diagnosed;
patients complying with treatment; and
response to treatment.
Objective 5 - Promoting or monitoring adherence to a GFD (Celiac 5)
What interventions are effective for promoting or monitoring adherence to a GFD?
From the preceding discussion in the methodological consideration section, it is clear that current histological criteria using a cut-off grade to define CD have important shortcomings. We therefore adopted an open histological definition of CD when selecting a study for inclusion, as long as the authors explicitly stated or described the criteria used to define CD (see inclusion criteria below). However, with the help of the TEP, we defined a “standard” histological definition of CD as a biopsy grade showing a modified Marsh IIIa or greater. This definition was NOT used as an inclusion/exclusion criterion, but simply to frame our results and to allow for the evaluation of the effect of different histological criteria on the performance of the various CD tests.
The choice of biopsy criteria and/or histological grade “cut-off” used to define CD has important implications for the interpretation of the studies of serology, HLA, and biopsy. It is recognized that some patients with CD may have Marsh I or II lesions, and by definition patients with latent CD have Marsh 0 lesions. However, as emphasized by Marsh,1 and as is discussed further below, in order to correctly interpret these early lesions, prospective follow-up studies are required, and an individual patient follow-up and documented response to gluten withdrawal would be required to firmly establish the diagnosis of CD.
The practical importance of the histological definition is evident from our preliminary review of articles, which demonstrated considerable heterogeneity in the histological criteria used within the studies to define CD. Some used strict definitions, whereas others accepted milder grade lesions. Furthermore, since the existence of latent CD and some silent CD without fully developed histology is now recognized, a study that aims to assess the sensitivity and specificity of biopsy itself in CD needs to use a design that incorporates the most sensitive and specific serologic and HLA tests available. The biopsy and serology should be performed simultaneously, with patients having discordant test results being further evaluated. Those with normal biopsy and positive serology would have to be followed over time to see if they have a latent form of CD. Conversely, patients with positive biopsies and normal serology would have to demonstrate improvement in histology on a GFD and, ideally, confirmation of relapse by biopsy with reintroduction of gluten. This type of study design was sought in order to address the objective of the sensitivity and specificity of biopsy.
Unselected general population. The unselected general population implies a representative sample of a given population, such as a random sample of healthy blood donors or healthy school children. Some unselected populations are better than others for determining the true prevalence or incidence of CD. For example, blood donors are required to have normal hemoglobin and no iron deficiency, and therefore may underestimate the true numbers of patients with CD.
Suspected CD. Patients with suspected CD include patients with GI symptoms, such as diarrhea or symptomatic malabsorption, who are being investigated for the possibility of CD. These patients are typically undergoing other investigations in addition to being worked-up for CD.
High-risk populations. High-risk populations include populations with an expectedly higher prevalence of CD. Such populations include asymptomatic family members of patients with CD, patients with type I diabetes where identified CD would likely be silent or latent, and populations such as those with iron deficiency or osteoporosis where identified CD would be in the atypical CD classification.
The HLA DQ2 haplotype represents the occurrence of HLA class II heterodimer alleles DQA1*0501 and DQB1*0201. These typically occur in a cis position as HLA DR3-DQ2 or in a trans position as HLA DR5/DR7-DQ2. The HLA DQ8 haplotype DQA1*0301/DQB1*0302 typically occurs in association with DR4.
The analytical framework is presented in Figure 1.
Although the objectives of this task order are contained within a request for a single evidence report, we conducted five separate reviews, from the literature search onwards, as the objectives of this mandate were more orthogonal than overlapping.
A series of searches were performed by National Library of Medicine staff in support of the literature review for CD. Strategies were developed using the guidelines supplied by the UO-EPC, and were divided into the five questions posed by AHRQ. All searches were limited to human studies published in English language journal articles. The specific strategies used for each search are located in Appendix B.
What is the sensitivity and specificity of the following tests for CD:
EMA
human tTG IgA antibodies
AGA
HLA DQ2/DQ8
small bowel biopsy
Searches were run in the MEDLINE® and EMBASE databases for each of the five tests. With the exception of the search for small bowel biopsy, a reference to CD or its synonyms was not a requirement for retrieval in order to obtain the widest possible information on these tests. Because of the complexity of the searches, a separate search was run for each test; the results were then combined into one Pro-Cite file and duplicates were eliminated. Individual case reports and letters to the editor were also removed.
The MEDLINE® searches were run in October 2003 for the year 1966 forward and yielded a total of 2,885 citations, with a follow-up search for HLA DQ2 and DQ8 performed in November 2003 that yielded an additional 390 citations. The EMBASE searches were run in December 2003 for the year 1974 forward and yielded a total of 1,046 citations after duplicates to MEDLINE® were removed.
What is the prevalence and incidence of symptomatic and clinically silent CD in the general population and in the following identified high-risk populations:
patients with an affected family member
type 1 diabetes mellitus
IDA
osteoporosis
Searches were run in the MEDLINE® and EMBASE databases. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 1,584 citations. The EMBASE search was run in December 2003 for the year 1974 forward and yielded 467 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.
What is the association between CD and GI lymphoma?
Searches were run in the MEDLINE® and EMBASE databases. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 230 citations. The EMBASE search was run in December 2003 for the year 1974 forward and yielded 97 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.
What are the expected consequences of testing for CD in the following populations:
patients with symptoms suggestive of CD
asymptomatic, at-risk populations
general population
Searches were run in the MEDLINE®, EMBASE, PsycINFO, AGRICOLA, CAB, and Sociological Abstracts databases. In order to obtain the widest possible retrieval, all articles on screening for celiac and its synonyms were included, not just those discussing consequences.
The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 917 citations. The EMBASE (1974 forward), PsycINFO (1840 forward), AGRICOLA (1970 forward), CAB (1972 forward), and Sociological Abstracts (1963 forward) database searches were run in December 2003 and yielded a combined total of 204 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.
What interventions are effective for promoting or monitoring adherence to a GFD?
Searches were run in the MEDLINE®, EMBASE, PsycINFO, AGRICOLA, CAB, and Sociological Abstracts databases. Because of the small number of citations retrieved, a few selected articles discussing adherence to dietary limitations for other conditions were included. The MEDLINE® search was performed in October 2003 for the year 1966 forward and retrieved a total of 152 citations. The EMBASE (1974 forward), PsycINFO (1840 forward), AGRICOLA (1970 forward), CAB (1972 forward), and Sociological Abstracts (1963 forward) database searches were run in December 2003 and yielded a combined total of 168 citations after duplicates to the MEDLINE® retrieval were removed. Individual case reports and letters to the editor were also removed from both searches.
Some citations fulfilled the criteria of more than one celiac objective. Duplicates within each celiac objective were electronically removed. The obtained citations were uploaded into an internal web-based review system (SRS) for online collaborative citation screening and abstraction. Articles passing the first level screen were retrieved in full for further screening (see below).
Reference lists of included studies, book chapters, and narrative or systematic reviews that had passed the first level of relevance screening were manually searched to identify additional unique references. Through contact with content experts and the TEP, attempts were made to identify other studies not identified by the search.
| Objective | Level | Inclusion | Exclusion |
|---|---|---|---|
| Celiac 1 | 1 | Any article reporting sensitivity/specificity of AGA, EMA, tTG, HLA DQ2/DQ8, or biopsy. | Clearly unrelated citation. |
| | 2 | For serology and HLA - articles where sensitivity and specificity could be extracted. | |
| | | For biopsy - articles were included if some measure of diagnostic utility could be obtained. | |
| | 3 | Articles that allowed determination of sensitivity or specificity for all tests were included. | • Articles with major methodological flaws excluded |
| | | | • Control group did not have gold standard test (biopsy) applied |
| | | | • No description of biopsy criteria given |
| | | | • Celiac group known to be positive for test under evaluation |
| | | | • Control group known to be negative for the test under evaluation |
| | | | • Control groups included patients with Marsh I or II biopsy lesions |
| | | | • AGA test performed without commercial ELISA kit or before 1990 |
| Celiac 2 | 1 | Any potential citation of prevalence or incidence of CD in general and high-risk populations or association of CD with other disorders | Clearly unrelated citation. |
| | 2 | Citations limited to those that gave evidence of the prevalence or incidence of CD in the general population or the AHRQ identified high-risk populations (e.g., diabetes, relatives, iron deficiency, osteoporosis). | Any studies of other CD-associated disorders not identified by the task order. |
| | | Countries: North America, western Europe, Australia, New Zealand. | Citations of the prevalence of specific disorders in patients with celiac (i.e., reverse of the inclusion). |
| | | | Any other country. |
| | 3 | Incidence and/or prevalence could be extracted from the article. | Serious methodological flaws: |
| | | | • patients identified by surveys, through solicitation of celiac societies |
| | | | • incidence studies without a population density denominator |
| Celiac 3 | 1 | Any potential citation of the association, prevalence or risk of lymphoma in CD, including articles on outcome of refractory sprue and ulcerative jejunoileitis. | Clearly unrelated citation. |
| | 2 | Measure of risk or prevalence/incidence of lymphoma in a population with CD. | Prevalence of CD in a population of lymphoma |
| | | | Case reports and non-comparative case series. |
| | 3 | Extractable prevalence, incidence, or cumulative risk of lymphoma in CD. | Clonality of lymphocytes in ulcerative jejunoileitis-ileitis not determined or stated (as per TEP). |
| | | | Serious methodological flaw. |
| Celiac 4 | 1 | Any potential citation of possible consequences of testing for CD. | Clearly unrelated citation. |
| | 2 | Consequences extractable from article. | |
| | 3 | Consequences limited to the AHRQ list. | Consequences obtainable from the other celiac objective sub-review - i.e., false positive and negative results, etc. |
| Celiac 5 | 1 | Any potential citation of interventions for the monitoring or promotion of adherence. | Clearly unrelated citation. |
| | 2 | Studies of monitoring adherence were included if they assessed monitoring, by biopsy, serology (AGA publication date 1990 or later, EMA, tTG), or both. | Serology prior to 1990. |
| | | Any promotion intervention | |
| | 3 | Data from article could be extracted. Data included follow-up by biopsy alone or serology with biopsy confirmation. | Articles assessing adherence through the measures of intestinal permeability. |
| | | | Studies that reported changes in mean serological titers with a GFD or gluten challenge, but did not address the potential usefulness of a serologic test to assess compliance. |
Level 1 broad screening. Level 1 screening was used to identify any potentially relevant citation, based on review of the title, abstract and key words. For each objective, the SRS system displayed the corresponding task order questions alongside the citation details. Reviewers answered a broad question of whether the citation potentially related to the current objective. Furthermore, the SRS system was set up so that articles identified in one celiac objective silo that could also be relevant to another objective could be flagged and moved or copied to the other silo. The review team was divided up so that two members could be simultaneously reviewing each objective.
Level 2 refined screening. Potentially relevant articles identified at level 1 were obtained in full for level 2 screening. Again, using the SRS system with the actual articles on hand, reviewers selected articles that related to each of the specific objectives. The reviewers were asked to err on the side of inclusion for this level, and to classify articles as “original” or “review”. Original articles meeting level 2 inclusion also had basic demographic data—such as screening test used, celiac definition, and study population identified—recorded into the SRS system.
Level 3 final screening. Level 3 screening identified articles that specifically allowed for the answering of the task order questions. These articles fulfilled the final inclusion/exclusion criteria, allowed actual extraction of the required data, and did not have fatal methodological flaws.
Important articles answering a stated objective but not meeting inclusion criteria (i.e., containing potential threats to internal validity), were presented and discussed in the discussion section.
For each objective, a detailed and standardized data abstraction form was developed with the assistance of content experts and the TEP. The data abstraction forms included baseline study characteristics as well as questions allowing for the abstraction of all relevant study results and characteristics. The electronic data extraction forms began with basic study and patient demographic questions that were common across the five sub-review forms. These included reviewer name, author name, publication year, publication type, study design type, and basic study population demographics such as race, age, gender, and type of CD population. The extraction forms then moved to specific questions geared at extracting data to answer the respective objective's questions. The individual data abstraction forms are included in Appendix C.
Celiac 1 (sensitivity and specificity) data abstraction form. Separate data abstraction forms were developed for serology, HLA, and the biopsy sub-questions. Two-by-two tables were used to abstract data on sensitivity and specificity, and to determine positive and negative predictive values and the prevalence of CD in the tested population. The biopsy studies were quite heterogeneous, and did not allow for direct numeric extraction of data.
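The quantities derived from an abstracted two-by-two table can be illustrated with a minimal sketch. The class name `TwoByTwo` and the counts below are hypothetical and chosen only for illustration; they are not taken from any included study or from the abstraction forms in Appendix C.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic two-by-two table (index test vs. reference standard)."""
    tp: int  # test positive, CD present on the reference standard
    fp: int  # test positive, CD absent
    fn: int  # test negative, CD present
    tn: int  # test negative, CD absent

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        # Positive predictive value: share of positive tests that are true CD.
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        # Negative predictive value: share of negative tests that are truly non-CD.
        return self.tn / (self.tn + self.fn)

    @property
    def prevalence(self) -> float:
        # Prevalence of CD in the tested population.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.fn) / total


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=45, fp=5, fn=5, tn=95)
print(f"Sens {example.sensitivity:.1%}, Spec {example.specificity:.1%}, "
      f"PPV {example.ppv:.1%}, NPV {example.npv:.1%}, "
      f"Prev {example.prevalence:.2f}")
```

Because PPV and NPV depend on the prevalence in the tested population, the same test can show very different predictive values in case-control and clinical-population designs.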
Celiac 2 (prevalence and incidence) data abstraction form. For this objective, the data extraction form included questions for detailing the screened study population, the number of individuals screened, the number of CD cases identified and how CD was confirmed. For incidence studies, the comparison population and time period were recorded.
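As a hedged illustration of how the abstracted counts translate into a prevalence estimate, the sketch below divides the number of CD cases by the number screened and attaches a Wilson score interval. The interval method and the function name `prevalence_with_ci` are assumptions made for this example; the report does not state which interval, if any, was applied to individual studies.

```python
import math


def prevalence_with_ci(cases: int, screened: int, z: float = 1.96):
    """Return (prevalence, lower, upper) using the Wilson score interval.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if screened <= 0 or not 0 <= cases <= screened:
        raise ValueError("need 0 <= cases <= screened and screened > 0")
    p = cases / screened
    denom = 1 + z ** 2 / screened
    centre = (p + z ** 2 / (2 * screened)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / screened + z ** 2 / (4 * screened ** 2)
    )
    return p, centre - half_width, centre + half_width


# Hypothetical screening study: 10 confirmed cases among 1,000 screened.
prev, low, high = prevalence_with_ci(10, 1000)
print(f"prevalence {prev:.3f} (95% CI {low:.3f} to {high:.3f})")
```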
Celiac 3 (lymphoma) data abstraction form. In addition to the basic demographic, and study design data, the extraction form contained fields for the extraction of risk data linking GI lymphoma to CD. Types of data sought were prevalence and incidence of lymphoma in CD in the setting of comparison data from a control population. Fields for extracting standardized incidence, morbidity, and mortality ratios were included.
Celiac 4 (consequences of screening) data abstraction form. The extraction forms for this objective included text fields to detail the consequences of testing for CD. The form contained fields that identified the specific consequence of testing which was addressed by the study, as well as a data field to report the study findings. The general field approach was chosen to allow extraction of the expected varied data for this objective.
Celiac 5 (monitoring and promoting adherence) data abstraction form. For this objective, standard demographic data was collected, as well as the methods used to monitor adherence to a GFD, the response of those measures to the diet, and the correlation of serological methods with biopsy findings. Space was provided to detail the sensitivity and specificity of the monitoring method when that data was available. For the objective of promoting adherence to a GFD, a text-based form was used to allow the extractor to describe the intervention and the results of its use.
Electronic forms. The abstraction forms were developed in Microsoft Excel to allow for electronic data entry and recording, and to allow exporting the evidence table data into Microsoft Word. For each celiac objective, data abstraction was conducted by one reviewer and verified by another. The extracted data was further verified by one of the principal investigators.
The quality of reporting of diagnostic test studies was assessed using the QUADAS tool.19 This tool is the first to be published that allows for the assessment of the quality of studies of diagnostic tests. The instrument was developed using a Delphi procedure. The Delphi panel consisted of nine experts in diagnostic research who refined an initial list of items in four rounds, after which agreement was reached on the items to be included in the tool. The QUADAS tool consists of 14 questions that are answered “yes,” “no,” or “unsure.” The tool addresses the items individually and does not incorporate an overall quality score (Appendix D).
Cohort and case-control study reports were assessed using the Newcastle-Ottawa scale (NOS; Appendix D). The NOS is an ongoing collaboration between the Universities of Newcastle, Australia and Ottawa, Canada. It was developed to assess the quality of non-randomized studies with its design, content and ease-of-use directed to the task of incorporating the quality assessments in the interpretation of meta-analytic results. A “star system” has been developed in which a study is judged on three broad perspectives: the selection of the study groups; the comparability of the groups; and the ascertainment of either the exposure or outcome of interest for case-control or cohort studies, respectively. The goal of this project is to develop an instrument that provides an easy and convenient tool for quality assessment of non-randomized studies for use in a systematic review.
The inter- and intra-rater reliability of the NOS have been established. The face content validity of the NOS has been reviewed based on a critical review of the items by several experts in the field, who evaluated its clarity and completeness for the specific task of assessing the quality of studies to be used in a meta-analysis. Furthermore, the validity of the NOS criteria has been established by comparisons to more comprehensive but cumbersome scales. An assessment plan is being formulated for evaluating its construct validity, with consideration of the theoretical relationship of the NOS to external criteria and the internal structure of the NOS components.20
The quality of cross-sectional reports was assessed using a 19-item instrument adapted from Ophthalmology (Appendix D).21
We did not conduct any sensitivity analysis of quality assessments on the observational studies, as there is little guidance on what score would indicate a poor-quality study for these assessment instruments.
One reviewer assessed the quality of an entire celiac objective to maintain internal consistency. Quality assessment was not performed under masked conditions.
The data obtained from this review fell into several broad categories, which correspond in large part to the individual study objectives. These will be addressed in turn.
Data for the sensitivity and specificity of each serological marker was considered separately. In addition, studies were subdivided by the population age group (adults, children, mixed population), and by study design (case control, relevant clinical population/cohort).
Attempts were made to identify, explain, and minimize clinical and statistical heterogeneity in the included studies. Heterogeneity was assessed graphically by plotting receiver operating characteristic (ROC) curves for each of the included studies in a given analysis. To assess statistical heterogeneity, a Pearson's chi-square statistic was calculated with n-1 degrees of freedom, where n represents the number of included studies in an analysis.
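One way to read the chi-square heterogeneity check is as a test of homogeneity of a proportion (for example, sensitivity) across n studies. The sketch below is written under that assumption; the function name `pearson_chi_square` and the counts are hypothetical, and the report does not spell out the exact computation it used.

```python
def pearson_chi_square(events, totals):
    """Pearson chi-square for homogeneity of a proportion across studies.

    events[i] is the number of 'successes' in study i (e.g., true positives),
    totals[i] is the number of subjects in that group (e.g., CD patients).
    Returns (statistic, degrees_of_freedom) with n - 1 degrees of freedom.
    """
    if len(events) != len(totals) or len(events) < 2:
        raise ValueError("need matching lists with at least two studies")
    pooled = sum(events) / sum(totals)
    df = len(events) - 1
    if pooled in (0.0, 1.0):
        # Every study has the same extreme proportion: no heterogeneity.
        return 0.0, df
    statistic = 0.0
    for e, n in zip(events, totals):
        expected_pos = n * pooled
        expected_neg = n * (1 - pooled)
        statistic += (e - expected_pos) ** 2 / expected_pos
        statistic += ((n - e) - expected_neg) ** 2 / expected_neg
    return statistic, df


# Hypothetical example: sensitivity of 40/50 in one study and 20/50 in another.
stat, df = pearson_chi_square([40, 20], [50, 50])
print(f"chi-square = {stat:.2f} on {df} degree(s) of freedom")
```

The statistic is then compared with the chi-square distribution on n-1 degrees of freedom; for one degree of freedom the 5% critical value is about 3.84.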
Pooled estimates were only calculated if clinically and statistically appropriate. In situations where pooling was not performed, a narrative systematic review was conducted.
There are several potential ways to pool the results of studies of diagnostic tests, each having both advantages and disadvantages. The simplest and most intuitive is to calculate a weighted mean of the sensitivity and specificity for the studies in question. This method provides a pooled estimate that is easy for clinicians to interpret. Several other techniques involve the pooling of diagnostic odds ratios or likelihood ratios. These methods have the distinct disadvantage of being difficult to interpret, and a pooled sensitivity or specificity cannot be derived from the resulting estimates. Lastly, one can use one of several methods to produce a summary ROC curve. The method described by Littenberg and Moses22, 23 has the advantage of being able to produce a summary curve while taking into account a threshold effect. A threshold effect can occur when different studies use different thresholds to define a positive test, or even when labs using the same cut-off differ. To interpret summary ROC curves it is necessary to know the sensitivity or specificity of the test in question in the population in which it will be applied. Since neither of these values is estimable without conducting yet another diagnostic accuracy study for the given population, the clinical usefulness of using this method alone is limited.24, 25
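The summary ROC approach can be sketched as follows. In the commonly described form of the Littenberg and Moses method, each study's true-positive rate (TPR) and false-positive rate (FPR) are logit-transformed, D = logit(TPR) - logit(FPR) is regressed on S = logit(TPR) + logit(FPR), and the fitted line D = a + bS is back-transformed into a curve. The unweighted least-squares fit, the 0.5 continuity correction, the function names, and the counts below are assumptions of this sketch, not details taken from the report.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def fit_sroc(tables, correction: float = 0.5):
    """Fit D = a + b*S by unweighted least squares.

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns the intercept a and slope b.
    """
    d_vals, s_vals = [], []
    for tp, fp, fn, tn in tables:
        # The continuity correction keeps the logits finite when a cell is zero.
        tpr = (tp + correction) / (tp + fn + 2 * correction)
        fpr = (fp + correction) / (fp + tn + 2 * correction)
        d_vals.append(logit(tpr) - logit(fpr))
        s_vals.append(logit(tpr) + logit(fpr))
    n = len(d_vals)
    s_mean = sum(s_vals) / n
    d_mean = sum(d_vals) / n
    sxx = sum((s - s_mean) ** 2 for s in s_vals)
    if sxx == 0:
        raise ValueError("all studies share the same S; slope is undefined")
    b = sum((s - s_mean) * (d - d_mean) for s, d in zip(s_vals, d_vals)) / sxx
    a = d_mean - b * s_mean
    return a, b


def sroc_sensitivity(fpr: float, a: float, b: float) -> float:
    """Expected sensitivity on the fitted summary curve at a given FPR."""
    # Solving D = a + b*S for logit(TPR) gives:
    # logit(TPR) = a / (1 - b) + ((1 + b) / (1 - b)) * logit(FPR)
    logit_tpr = a / (1 - b) + (1 + b) / (1 - b) * logit(fpr)
    return 1 / (1 + math.exp(-logit_tpr))


# Hypothetical studies, given as (tp, fp, fn, tn).
studies = [(90, 10, 10, 90), (80, 10, 20, 90), (95, 20, 5, 80)]
a, b = fit_sroc(studies)
print(f"a = {a:.3f}, b = {b:.3f}, "
      f"sensitivity at FPR 0.10 = {sroc_sensitivity(0.10, a, b):.3f}")
```

A slope b near zero suggests that the diagnostic odds ratio is roughly constant across thresholds; a non-zero slope indicates that accuracy varies with the threshold.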
In order to produce clinically useful pooled statistics, we calculated a weighted mean of the sensitivity and specificity from those of the included studies. For both sensitivity and specificity, this pooling relies on the assumption that the test statistic is the same in all of the included studies. For each pooled estimate, a 95% confidence interval (CI) was calculated using both a fixed and a random effects model, and the results were compared as a further test for heterogeneity. The pooled estimates for the sensitivity and specificity were also compared with a summary ROC curve calculated for the same group of studies as a second check of the estimates (summary ROC curves are included in Appendix E).
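A minimal sketch of a weighted-mean pooled estimate is given below. It assumes that each study is weighted by its group size and that the fixed-effect 95% CI is a simple Wald interval; the report does not specify its weights or interval formula, and the random effects calculation is not shown. The function name `pooled_proportion` and the counts are hypothetical.

```python
import math


def pooled_proportion(events, totals, z: float = 1.96):
    """Sample-size-weighted mean of per-study proportions with a Wald CI.

    events[i] / totals[i] is the study-level sensitivity (or specificity).
    Weighting each proportion by totals[i] is algebraically the same as
    sum(events) / sum(totals).
    """
    if len(events) != len(totals) or not events:
        raise ValueError("need matching, non-empty lists")
    total_n = sum(totals)
    pooled = sum(n * (e / n) for e, n in zip(events, totals)) / total_n
    se = math.sqrt(pooled * (1 - pooled) / total_n)
    lower = max(0.0, pooled - z * se)
    upper = min(1.0, pooled + z * se)
    return pooled, lower, upper


# Hypothetical sensitivities: 45/50 and 80/100 CD patients testing positive.
est, low, high = pooled_proportion([45, 80], [50, 100])
print(f"pooled sensitivity {est:.3f} (95% CI {low:.3f} to {high:.3f})")
```

This calculation treats all studies as estimating one common value, which is the assumption stated above; when studies are heterogeneous, the interval will be too narrow.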
The prevalence and incidence data from the Celiac 2 objective, and the CD-lymphoma data from the Celiac 3 objective, were anticipated to be quite heterogeneous considering the different countries, age groups, and risk characteristics of the studied patients. Attempts were made to group studies of prevalence by age group, study population, and serological screening method. If the grouped studies did not show evidence of heterogeneity, pooled estimates of the prevalence were produced for that group of studies; otherwise, a descriptive presentation of the data with a qualitative systematic review was conducted. Likewise, the outcome measures of the Celiac objectives 4 and 5 were presented in a qualitative systematic review, except in cases where it was possible to pool the sensitivity and specificity data as measures of monitoring of patients at various stages of recovery on a GFD.
To minimize clinical and statistical heterogeneity, the included articles of a particular antibody test were divided into groups by age of the included population (adults, children, mixed), the study design (case control, or relevant clinical population/cohort), by antibody type (IgA or IgG), and by test methodology (e.g., monkey esophagus [ME] or human umbilical cord [HUC]). Within these groups, further differences in study population, country of origin, and biopsy definitions (especially whether or not mild grades without villous atrophy were included) were assessed systematically. Studies that reported using the ESPGAN criteria for the diagnosis of CD were categorized as including patients with some degree of villous atrophy. Other potential causes of heterogeneity such as the cut-offs used to define a positive test were assessed.
Two articles were identified that assessed the diagnostic value of various antibodies in children64 and in mixed-age populations40 with IgA deficiency. As well, one study enrolled biopsy-proven CD patients who were known to be EMA negative.66 These studies were considered separately from the others. Studies of using antibodies in combination were also assessed separately.
Pooled statistical estimates (with 95% CIs) are provided for studies without clinical and statistical heterogeneity, and summary ROC curves for the studied antibodies are provided in Appendix E. Sensitivity analyses by study design did not show a significant difference except for the analysis of IgA-tTG-guinea pig (GP) in adults. Therefore, apart from studies of IgA-tTG-GP in adults, pooled estimates, when available, included data from both study designs.
AGA. The diagnostic characteristics of IgA-AGA were assessed in 35 studies and the diagnostic characteristics of IgG-AGA were assessed in 30 studies. Of the 35 IgA-AGA studies, 11 were conducted in an adult population,30, 33, 45, 50, 54, 61–63, 71, 77, 80 21 in a population of children,26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 56, 59, 60, 64, 67, 68, 83, 85, 87, 88 and five in a mixed population.27, 37, 40, 74, 75
Of the 30 IgG-AGA studies, seven were conducted in an adult population,30, 33, 54, 62, 63, 71, 80 19 were conducted in a population of children,26, 27, 29, 31, 34, 36, 38, 42, 43, 50, 52, 58, 59, 64, 66, 68, 69, 83, 85 and five in a mixed population.27, 37, 40, 74, 75 Some studies provided data for more than one age group.
Some studies only provided summary statistics without the raw two-by-two table results;33, 34, 54, 58, 59, 69 for these, the raw data were calculated from the presented sensitivity and specificity and from the group sizes.
One study66 was conducted in CD patients who were known to be IgA-EMA negative, and was not included in the main analysis. In this study of children, the sensitivity for IgA-AGA was 22% and the sensitivity for IgG-AGA was 33%, whereas the specificity for IgA-AGA was 67% and the specificity for IgG-AGA was 58%; these values are considerably lower than those reported in other studies. Another two studies were conducted in patients with IgA deficiency.40, 64 The first demonstrated a sensitivity of 0% using IgA-AGA, but a sensitivity and specificity of 100% using IgG-AGA,40 whereas the second showed a sensitivity of 0% with IgA-AGA, but a sensitivity of 100% and a specificity of 80% using IgG-AGA.
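Back-calculating a two-by-two table from reported summary statistics can be sketched as follows. The function name `reconstruct_two_by_two` and the numbers are hypothetical, and rounding to the nearest whole patient is an assumption of this sketch; reported percentages that were themselves rounded may not reproduce the original counts exactly.

```python
def reconstruct_two_by_two(sensitivity: float, specificity: float,
                           n_cd: int, n_controls: int):
    """Back-calculate (tp, fp, fn, tn) from summary statistics and group sizes.

    sensitivity and specificity are fractions between 0 and 1;
    n_cd and n_controls are the numbers of subjects with and without CD.
    """
    if not (0 <= sensitivity <= 1 and 0 <= specificity <= 1):
        raise ValueError("sensitivity and specificity must be fractions")
    tp = round(sensitivity * n_cd)        # CD patients testing positive
    tn = round(specificity * n_controls)  # controls testing negative
    fn = n_cd - tp
    fp = n_controls - tn
    return tp, fp, fn, tn


# Hypothetical report: sensitivity 92%, specificity 68%, 50 CD, 25 controls.
print(reconstruct_two_by_two(0.92, 0.68, 50, 25))
```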
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Picarelli, 2000; Italy | Case-control | ESPGAN | 22.2* | 66.7* | 50* | 36.3* | 0.60* |
| Gaetano, 1997; Italy | Case-control | ESPGAN | 92 | 68 | 85.2 | 80.9 | 0.67 |
| Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not reported) | 68 | 91.7 | 86.1 | 79.7 | 0.43 |
| Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 73.9 | 77.8 | 94.4 | 0.49 |
| Berger, 1996; Switzerland | Case-control | ESPGAN revised with complete villous atrophy | 76 | 67 | 74 | 59 | 0.55 |
| Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 52 | 94 | 87 | 74 | 0.52 |
| Bahia, 2001; Brazil | Relevant clinical population | Severe villous atrophy | 95.5 | 95.6 | 91.3 | 97.9 | 0.31 |
| Russo, 1999; Canada | Relevant clinical population | ESPGAN | 83.3 | 84.5 | 64.5 | 93.8 | 0.25 |
| Bode, 1993; Denmark | Relevant clinical population | ESPGAN | 64 | 99 | 90 | 97 | 0.07 |
| Poddar, 2002; India | Relevant clinical population | ESPGAN (villous atrophy and unequivocal response to GFD) | 94 | 91.5 | 92 | 93.5 | 0.52 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 100 | 94.4 | 95.7 | 100 | 0.55 |
| Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN; Alexander grading | 88 | 88 | | | 0.31 |
| Altuntas, 1998; Turkey | Relevant clinical population | Subtotal or total villous atrophy, crypt hyperplasia, increased IEL | 23 | 90 | 75 | 48 | 0.55 |
| Artan, 1998; Turkey | Relevant clinical population | ESPGAN | 58 | 51 | 42.4 | 66.7 | 0.38 |
| Rich, 1990; USA | Relevant clinical population | Not recorded - state “severe” lesion | 53 | 93 | 72.7 | 85.7 | 0.25 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 95 | 92.4 | 76 | 98.6 | 0.20 |
| Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 83 | 86 | 81 | 81 | 0.51 |
| Lindquist, 1993; Sweden | Relevant clinical population (suspected celiac) | ESPGAN; subtotal or partial villous atrophy | 86.5 | 92.7 | 93.7 | 85 | 0.55 |
| Chirdo, 1999; Argentina | Relevant clinical trial | Total or subtotal villous atrophy | 75 | 87.1 | 84 | 80 | 0.47 |
| Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | 80 | 92 | 67 | 96 | 0.17 |
| Meini, 1996; Italy | Relevant clinical population | Partial villous atrophy or total villous atrophy | 0 | 100 | 0 | 91.7 | 0.08 |
\* Study of 30 IgA-EMA-negative patients suspected of CD; 9 of 18 CD patients were IgA deficient.
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Cataldo, 2000; Italy | Case-control | Original and revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 100 | 100 | 100 | 100 | 0.7 |
| Sulkanen, 1998; Finland | Case-control | ESPGAN | | 69 | 73.4 | 63 | 78.3 | 0.4 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | | 96.4 | 69.2 | 72.6 | 95.7 | 0.5 |
| Carroccio, 2002; Italy | Relevant clinical population | Marsh, broken down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy, with normalization on GFD | | 76 | 75 | 73.4 | 77.3 | 0.5 |
| Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation | | 84 | 86 | 89 | 79 | 0.6 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 55 | 100 | 100 | 55.9 | 35.0 |
| Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 61 | 86 | 88.5 | 42.7 | 43.6 |
| Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increased inflammatory cells | 46 | 98 | 75 | 92 | 25.7 |
| Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 83 | 45 | 75 | 92 | 57.0 |
| Maki, 1991; Finland | Relevant clinical population | Severe pathology with crypt hyperplasia to total villous atrophy; mild changes considered normal | 30.8 | 87.2 | 22.2 | 91.3 | 14.8 |
| McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 100 | 100 | 100 | 100 | 31.5 |
| Bardella, 2001; Italy | Relevant clinical population | Marsh; no grade reported | 95 | 89 | 76 | 98 | 33.3 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 92 | 88.2 | 85.2 | 93.8 | 45.8 |
| Valdimarsson, 1996; Sweden | Relevant clinical population + a few dyspeptic controls | Alexander's classification; partial or subtotal villous atrophy | 79 | 70 | 28 | 96 | 36.8 |
| Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 81.6 | 83 | 81.6 | 83 | 48.0 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7 |
| Sulkanen, 1998; Finland | Case-control | ESPGAN | | 84.5 | 81.6 | 75.2 | 89 | 0.4 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | | 90.9 | 98.5 | 98 | 92.7 | 0.5 |
| Carroccio, 2002; Italy | Relevant clinical population | Marsh, broken down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy, with normalization on GFD | | 67 | 90 | 86 | 75 | 0.5 |
| Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation | | 64 | 92 | 92 | 64 | 0.6 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 78 | 80.7 | 87.6 | 67.6 | 56.7 |
| Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increased inflammatory cells | 62 | 97 | 73 | 94 | 34.8 |
| Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 17 | 86 | 14 | 93.5 | 15.1 |
| Maki, 1991; Finland | Relevant clinical population | Severe pathology with crypt hyperplasia to total villous atrophy; mild changes considered normal | 46.2 | 89 | 33.3 | 93.3 | 14.8 |
| McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 57 | 85 | 64 | 81 | 28.1 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 100 | 69.7 | 69.4 | 100 | 61.0 |
| Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 73.5 | 73.6 | 72 | 75 | 49.0 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Picarelli, 2000; Italy | Case-control | ESPGAN | 33.3 | 58.3 | 54.5 | 36.8 | 0.60 |
| Gaetano, 1997; Italy | Case-control | ESPGAN | 100 | 36 | 75.7 | 100 | 0.67 |
| Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not recorded) | 88.9 | 46.7 | 55.6 | 84.8 | 0.43 |
| Hansson, 2000; Sweden | Case-control | ESPGAN | 81.8 | 82.6 | 81.8 | 82.6 | 0.49 |
| Berger, 1996; Switzerland | Case-control | ESPGAN revised with complete villous atrophy | 69 | 59 | 68 | 53 | 0.55 |
| Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 88 | 92 | 88 | 92 | 0.52 |
| Bahia, 2001; Brazil | Relevant clinical population | Severe villous atrophy | 90.9 | 97.8 | 95.2 | 95.7 | 0.32 |
| Russo, 1999; Canada | Relevant clinical population | ESPGAN | 83.3 | 85.9 | 66.7 | 93.8 | 0.25 |
| Bode, 1993; Denmark | Relevant clinical population | ESPGAN | 71 | 99 | 100 | 98 | 0.07 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 100 | 66.7 | 75.6 | 100 | 0.55 |
| Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN; Alexander or Perea et al. | 93 | 89 | 93.1 | 88.6 | 0.31 |
| Altuntas, 1998; Turkey | Relevant clinical population | Subtotal or total villous atrophy, crypt hyperplasia, increased IEL | 100 | 0 | 55 | 0 | 0.55 |
| Artan, 1998; Turkey | Relevant clinical population | ESPGAN | 83 | 59 | 55.6 | 85.2 | 0.38 |
| Rich, 1990; USA | Relevant clinical population | Not reported - state “severe” lesion | 100 | 58 | 44 | 100 | 0.25 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | 100 | 92.4 | 76.9 | 100 | 0.20 |
| Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 83 | 80 | 86 | 82 | 0.51 |
| Chirdo, 1999; Argentina | Relevant clinical trial | Total or subtotal villous atrophy | 85.7 | 80.6 | 80 | 86 | 0.47 |
| Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | 83 | 79 | 45 | 96 | 0.17 |
| Meini, 1996; Italy | Relevant clinical population | Partial villous atrophy or total villous atrophy | 100 | 80 | 31.2 | 100 | 0.08 |
Four studies looked at IgG-AGA in a non-IgA-deficient mixed population of adults and children.27, 37, 74, 75 Two of these demonstrated sensitivities greater than 80%: one showed a sensitivity of 84% and the other a sensitivity of 96%. However, only the first of these had a specificity greater than 80%; in total, three of the four studies had specificities less than 80% (Table 7; Figure 7).
EMA
EMA-ME. The diagnostic characteristics of IgA-EMA-ME were assessed in 35 studies, and those of IgG-EMA-ME were assessed in three studies. Of these included studies, 11 IgA-EMA-ME studies were conducted in adults,30, 32, 39, 51, 57, 63, 71, 77, 78, 80, 81 17 in children,27, 35, 36, 38, 41, 44, 46, 51, 52, 55, 56, 58, 60, 69, 79, 82, 83 and five in a mixed population.27, 37, 40, 47, 75 Some studies provided data for more than one age group. One study in children provided data on two different populations (including different control groups).55 IgG-EMA-ME was assessed in one adult population63 and one child population,66 but not in any of the mixed-population studies.
One study was conducted in a population of known CD patients who had previously tested negative for EMA. In this study, the sensitivity and specificity of IgG-EMA-ME were both 100%;66 the performance of IgA-EMA was not reported. Another study, which included CD patients with less than a Marsh IIIa grade,37 demonstrated a sensitivity of 88%. Some studies provided only summary statistics without the raw two-by-two table results;46, 58, 69 for these, the raw data were abstracted from the reported sensitivity and specificity and the group sizes.
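The back-calculation of two-by-two counts from summary statistics can be sketched as follows. This is only a minimal illustration, not the abstraction procedure actually used for this report: the function name `reconstruct_two_by_two`, the rule of rounding to whole patients, and the example numbers are all assumptions.

```python
def reconstruct_two_by_two(sens, spec, n_cd, n_controls):
    """Rebuild approximate 2x2 counts from reported summary statistics.

    sens, spec  -- reported sensitivity and specificity as fractions (0-1)
    n_cd        -- number of biopsy-confirmed CD patients
    n_controls  -- number of biopsy-negative controls
    Rounding to whole patients is an assumption of this sketch.
    """
    tp = round(sens * n_cd)        # CD patients with a positive test
    fn = n_cd - tp                 # CD patients with a negative test
    tn = round(spec * n_controls)  # controls with a negative test
    fp = n_controls - tn           # controls with a positive test
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


# Hypothetical study: 40 CD patients and 60 controls,
# reported sensitivity 95% and specificity 90%.
table = reconstruct_two_by_two(0.95, 0.90, 40, 60)
print(table)
```

Because published percentages are themselves rounded, counts recovered this way may differ from the true counts by one patient in either direction.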
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Hallstrom, 1989; Finland | Case-control | Flat mucosa | 90.6 | 100 | 100 | 88.9 | 51.8 |
| Biagi, 2001; Italy | Case-control | Partial villous atrophy or greater | 94.6 | 100 | 100 | 94.5 | 49.1 |
| Ladinser, 1994; Italy | Case-control | Revised ESPGAN | 100 | 100 | 100 | 100 | 21.1 |
| Sategana-Guidetti, 1995; Italy | Case-control | Roy-Choudhury criteria; partial or total villous atrophy | 100 | 100 | 100 | 100 | 63.7 |
| Valentini, 1994; Italy | Case-control | Partial villous atrophy or greater | 99 | 100 | 100 | 96.7 | 76.2 |
| Volta, 1995; Italy | Case-control | Roy-Choudhury criteria | 95 | 100 | 100 | 97.1 | 35.6 |
| Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 100 | 100 | 100 | 11.6 |
| McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 89.2 | 100 | 100 | 95.3 | 28.1 |
| Bardella, 2001; Italy | Relevant clinical population | Marsh | 100 | 97.2 | 93 | 100 | 28.7 |
| Valdimarsson, 1996; Sweden | Relevant clinical population + a few dyspeptic controls | Alexander's classification; partial or subtotal villous atrophy | 74 | 100 | 100 | 96 | 9.7 |
| Vogelsang, 1995; Austria | Relevant study population | Modified ESPGAN; flat mucosa; crypt hyperplasia raised IELs | 100 | 100 | 100 | 100 | 48.0 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Chirdo, 2000; Argentina | Case-control | ESPGAN | 92.4 | 100 | 100 | 85.2 | 0.7 |
| Kolho, 1997; Finland | Case-control | Revised ESPGAN | 95 | 100 | 100 | 97 | 0.3 |
| Kolho, 1997; Finland | Case-control | Revised ESPGAN | 100 | 100 | 100 | 100 | 0.5 |
| Whelan, 1996; Ireland | Case-control | Subtotal villous atrophy | 100 | 100 | 100 | 100 | 0.4 |
| Bonamico, 2001; Italy | Case-control | ESPGAN | 95.1 | 98.2 | 90 | 44.3 | 0.5 |
| Gaetano, 1997; Italy | Case-control | ESPGAN | 96 | 96 | 97.9 | 92.3 | 0.7 |
| Carroccio, 1993; Italy | Case-control | Biopsies confirmed at diagnosis, on GFD, and rechallenge (severity grade - not reported) | 100 | 96.7 | 95.7 | 100 | 0.4 |
| Di Leo, 2003; Italy | Case-control | ESPGAN | 100 | 96.5 | 93.5 | 100 | 0.4 |
| Vitoria, 2001; Italy | Case-control | Subtotal villous atrophy | 100 | 100 | 100 | 100 | 0.6 |
| Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 100 | 100 | 95.8 | 0.5 |
| Lerner, 1994; USA, Israel | Case-control | Criteria of Townley modified by Ingkaran | 97 | 98 | 97 | 98 | 0.5 |
| Hallstrom, 1989; Finland | Case-control | Flat mucosa | 100 | 100 | 100 | 100 | 0.4 |
| Chan, 2001; Canada | Relevant clinical population | Villous atrophy, crypt hyperplasia, increased lymphocytes | 89 | 97 | 80 | 98 | 0.1 |
| Russo, 1999; Canada | Relevant clinical population | ESPGAN | 75 | 88.7 | 69.2 | 91.3 | 0.3 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | 95.4 | 100 | 100 | 94.7 | 0.6 |
| Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 92 | 90 | 90.5 | 92 | 0.5 |
| Lindquist, 1993; Sweden | Relevant clinical population (suspected CD) | ESPGAN; subtotal or partial villous atrophy | 98.1 | 92.7 | 94.4 | 97.5 | 0.6 |
| Kumar, 1989; USA, Israel | Relevant clinical population and control cases | ESPGAN + Townley | 96.0 | 89.0 | 87.0 | 96.7 | 0.2 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7 |
| Dickey, 2001; Northern Ireland | Case-control | Villous atrophy | | 75.3 | 98.3 | 98.2 | 76 | 0.6 |
| Ascher, 1996; Sweden | Relevant clinical population | ESPGAN | | 98.2 | 100 | 100 | 98.5 | 0.5 |
| Carroccio, 2002; Italy | Relevant clinical population | Marsh, broken down by criteria; CD was diagnosed as enlarged crypts and/or villous atrophy, with normalization on a GFD | | 88 | 99 | 98.7 | 90 | 0.5 |
| Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation | | 86 | 100 | 100 | 83 | 0.6 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| McMillan, 1991; Ireland | Relevant clinical population | Revised ESPGAN | 39 | 98.3 | 92 | 78 | 13.5 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Picarelli, 2000; Italy | Case-control | ESPGAN | 30 IgA-EMA neg. pts suspected of CD; 9/18 CD patients IgA deficient | 100 | 100 | 100 | 100 | 0.1 |
EMA-HU. IgA-EMA-HU was assessed in 13 studies. Six of these studies were conducted in adults,45, 49, 54, 57, 61, 70, 89 five in children,36, 53, 55, 69, 70 and two in a mixed population.72, 74 One study provided summary statistics without the raw two-by-two table results;69 for this study, the raw data were calculated from the reported sensitivity and specificity and the group sizes. One study provided data on two different populations (including different control groups).55
IgG-EMA-HU was not assessed in any of the studies meeting our inclusion criteria.
Two studies included CD patients (both adults and children) with less than a Marsh IIIa grade and reported IgA-EMA-HU sensitivities of 87% and 100%.45
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Gillbert, 2000; Canada | Case-control | Mild, moderate, severe villous atrophy | 100 | 100 | 100 | 100 | 33.3 |
| Ladinser, 1994; Italy | Case-control | Revised ESPGAN | 90 | 100 | 100 | 98 | 18.9 |
| Salmaso, 2001; Italy | Case-control | Grades I–IV Marsh with response to a GFD | 87 | 100 | 100 | 95.1 | 24.7 |
| Volta, 1995; Italy | Case-control | Roy-Choudhury criteria | 95 | 100 | 100 | 97.1 | 35.6 |
| Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 87 | 100 | 100 | 81.3 | 55.3 |
| Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 88.9 | 100 | 100 | 98.9 | 7.6 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Kolho, 1997; Finland | Case-control | Revised ESPGAN | 95 | 100 | 100 | 97 | 0.3 |
| Kolho, 1997; Finland | Case-control | Revised ESPGAN | 100 | 100 | 100 | 100 | 0.5 |
| Gaetano, 1997; Italy | Case-control | ESPGAN | 94 | 100 | 100 | 89.2 | 0.7 |
| Salmaso, 2001; Italy | Case-control | Grades I–IV Marsh with response to GFD | 100 | 100 | 100 | 100 | 0.6 |
| Russo, 1999; Canada | Relevant clinical population | ESPGAN | 45.8 | 95.8 | 78.6 | 84 | 0.3 |
| Iltanen, 1999; Finland | Relevant clinical population | ESPGAN - CD confirmed at follow-up | 100 | 77.1 | 60.1 | 100 | 0.3 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Sblaterro, 2000; Italy | Case-control | ESPGAN | 93 | 100 | 100 | 80 | 0.8 |
| Sulkanen, 1998; Finland | Case-control | ESPGAN | 92.6 | 99.5 | 99.2 | 94.9 | 0.4 |
tTG antibodies
tTG-GP liver. The diagnostic characteristics of IgA-tTG-GP were assessed by ELISA in nine studies, and those of IgG-tTG-GP were assessed by ELISA in three studies. Of the IgA-tTG-GP studies, five were conducted in adults,30, 32, 39, 45, 70 five in children,35, 41, 52, 70, 83 and four in a mixed population.47, 72, 74, 76 One study provided separate data for more than one age group.70
Of the IgG-tTG-GP studies that met the inclusion criteria, none were in adults or children, although two studies were in a mixed population.72, 76
Two studies included CD patients with less than a Marsh IIIa grade.45, 70 These studies demonstrated sensitivities of 81% and 95% for IgA-tTG-GP.
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Biagi, 2001; Italy | Case-control | Partial villous atrophy or greater | 87.5 | 98.1 | 98 | 87.1 | 46.3 |
| Salmaso, 2001; Italy | Case-control | Grades I-IV Marsh with response to a GFD | 87 | 97 | 90.9 | 94.9 | 27.2 |
| Dahele, 2001; Scotland | Case-control | Included 6 with IEL, rest partial villous atrophy or greater | 81 | 97 | 97.9 | 74.1 | 52.5 |
| Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 92 | 60 | 100 | 18.8 |
| Bardella, 2001; Italy | Relevant clinical population | Marsh | 100 | 98.2 | 83.3 | 100 | 10.0 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Bonamico, 2001; Italy | Case-control | ESPGAN | 90.3 | 100 | 100 | 30.3 | 0.5 |
| Salmaso, 2001; Italy | Case-control | Grades I-IV Marsh with response to a GFD | 95 | 100 | 100 | 94.1 | 0.6 |
| Hansson, 2000; Sweden | Case-control | ESPGAN | 90.9 | 95.7 | 95.2 | 91.7 | 0.5 |
| Chan, 2001; Canada | Relevant clinical population | Villous atrophy, crypt hyperplasia, increased lymphocytes | 89 | 94 | 67 | 98 | 0.1 |
| Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 96 | 92 | 92.6 | 95.7 | 0.5 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Dickey, 2001; Northern Ireland | Case-control | Villous atrophy | 93.2 | 96.6 | 97.1 | 91.8 | 0.6 |
| Sblaterro, 2000; Italy | Case-control | ESPGAN | 84 | 100 | 100 | 62.5 | 0.8 |
| Sulkanen, 1998; Finland | Case-control | ESPGAN | 95 | 93.7 | 90.8 | 96.5 | 0.4 |
| Troncone, 1999; Italy | Relevant clinical population | ESPGAN | 91.7 | 98 | 98 | 94 | 0.4 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Sblaterro, 2000; Italy | Case-control | ESPGAN | 61.5 | 100 | 100 | 44.4 | 0.8 |
| Troncone, 1999; Italy | Relevant clinical population | ESPGAN | 23 | 98 | 92 | 63 | 0.4 |
tTG - human recombinant (HR)
IgG-tTG-HR. The diagnostic characteristics of IgA-tTG-HR were assessed by ELISA in ten studies, and those of IgG-tTG-HR were assessed by ELISA in two studies. Of the IgA-tTG-HR studies, three were conducted in adults,39, 49, 54 three in children,52, 79, 83 and three in a mixed population.40, 72, 75
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 100 | 80 | 90.1 | 100 | 0.7 |
| Sblaterro, 2000; Italy | Case-control | ESPGAN | | 67.6 | 100 | 100 | 48.7 | 0.8 |
One study was conducted in a mixed-age population of patients with known IgA deficiency.40 In this study, the sensitivity of IgA-tTG-HR was 0%, whereas the sensitivity and specificity of IgG-tTG-HR were 100% and 80%, respectively.
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev (%) |
|---|---|---|---|---|---|---|---|
| Carroccio, 2002; Italy | Relevant clinical population | Ferguson and Murray; partial or total villous atrophy | 100 | 97 | 80 | 100 | 14.5 |
| Gillbert, 2000; Italy | Case-control | Mild, moderate, severe villous atrophy | 95.2 | 100 | 95.2 | 100 | 31.7 |
| Kaukinen, 2000; Finland | Relevant clinical population | Villous height to crypt ratio <2.0; IEL and HLA also tested | 100 | 100 | 100 | 100 | 8.7 |
| Author, year; country | Study type | Biopsy criteria | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|
| Vitoria, 2001; Italy | Case-control | Subtotal villous atrophy | 95 | 100 | 100 | 93 | 0.6 |
| Hansson, 2000; Sweden | Case-control | ESPGAN | 95.5 | 95.7 | 95.5 | 95.7 | 0.5 |
| Wolters, 2002; Netherlands | Relevant clinical population (identified retrospectively) | Subtotal villous atrophy with crypt hyperplasia | 96 | 100 | 100 | 96 | 0.5 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Cataldo, 2000; Italy | Case-control | Original & revised criteria? | 20 IgA-deficient CD vs healthy IgA-deficient non-CD | 0 | 100 | 0 | 33.3 | 0.7 |
| Sblaterro, 2000; Italy | Case-control | ESPGAN | | 91.5 | 100 | 100 | 76.9 | 0.8 |
| Tesei, 2003; Argentina | Relevant clinical population | Marsh II to IV - with confirmation | | 91 | 96 | 97 | 87 | 0.6 |
Overall, these studies demonstrated a specificity of close to 100% and sensitivity in the range of 90% to 96%.
IgG-tTG-HR, IgA deficient. Only one study of IgG-tTG-HR, conducted in an IgA-deficient population, was identified.72 In this study, the sensitivity and specificity of IgG-tTG-HR were 68% and 100%, respectively.
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Valentini, 1994; Italy | Case-control | Partial villous atrophy or greater | Adults | 92 | 90 | 96.8 | 77.1 | 0.76 |
| Bode, 1994; Denmark | Relevant clinical population | Crypt hyperplasia, villous atrophy and increase inflammatory cells | Adults | 77 | 95 | 71 | 97 | 0.41 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected celiac) | ESPGAN no details on biopsy findings | Adults | 100 | 97.1 | 96.2 | 100 | 0.44 |
| Bode, 1993; Denmark | Relevant clinical population | ESPGAN | Children | 86 | 99 | 92 | 99 | 0.1 |
| Falth-Magnusson, 1994; Sweden | Relevant clinical population | ESPGAN + Alexander grading IV, grade III to IV challenge | Children | 88.5 | 93.7 | 88.8 | 93.5 | 0.4 |
| Lindberg, 1985; Sweden | Relevant clinical population | ESPGAN, Alexander grading | Children | 97 | 83 | 41.8 | 98.2 | 0.3 |
| Artan, 1998; Turkey | Relevant clinical population | ESPGAN | Children: IgA AGA or IgG AGA | 83 | 36 | 44 | 77.8 | 0.3 |
| Gonczi, 1991; Australia | Relevant clinical population (184 children with suspected CD) | ESPGAN no details on biopsy findings | Children | 100 | 98.7 | 95.2 | 98.7 | 0.2 |
| Chartrand, 1997; Canada | Relevant clinical population | ESPGAN - with flat mucosal biopsy | Children | 93 | 71 | 43 | 98 | 0.2 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Sblaterro, 2000; Italy | Case-control | ESPGAN | Adults and children | 98.5 | 100 | 100 | 95.2 | 0.8 |
| Author, year; country | Study type | Biopsy criteria | Notes | Sens | Spec | PPV | NPV | Prev |
|---|---|---|---|---|---|---|---|---|
| Russo, 1999; Canada | Relevant clinical population | ESPGAN | Children | 100 | 73 | 57 | 82 | 0.3 |
In general, combining tests when either test is positive tended to improve sensitivity at the cost of specificity, while a requirement for the tests to be concordant tended to improve specificity.
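The direction of this trade-off can be illustrated with a short sketch. It assumes that the two tests err independently of each other given true disease status, which is a simplifying assumption and not a finding of this review; the function names and the example accuracies are hypothetical.

```python
def combine_either_positive(sens_a, spec_a, sens_b, spec_b):
    """Combined result is positive when test A OR test B is positive."""
    sens = 1 - (1 - sens_a) * (1 - sens_b)  # a case is missed only if both miss it
    spec = spec_a * spec_b                  # negative only if both are negative
    return sens, spec


def combine_both_positive(sens_a, spec_a, sens_b, spec_b):
    """Combined result is positive only when test A AND test B are positive."""
    sens = sens_a * sens_b                  # a case is detected only if both detect it
    spec = 1 - (1 - spec_a) * (1 - spec_b)  # a false positive needs both to err
    return sens, spec


# Hypothetical single-test accuracies (sensitivity, specificity)
test_a = (0.90, 0.95)
test_b = (0.85, 0.98)

either = combine_either_positive(*test_a, *test_b)
both = combine_both_positive(*test_a, *test_b)
print("either positive: sens=%.3f spec=%.3f" % either)
print("both positive:   sens=%.3f spec=%.3f" % both)
```

Under the independence assumption the "either positive" rule raises sensitivity above that of each single test and lowers specificity, while the concordance rule does the opposite. Antibody tests for the same disease are likely to be correlated, so real gains would be expected to be smaller than this sketch suggests.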
| Analysis | Sens | Lower 95% CI | Upper 95% CI | Spec | Lower 95% CI | Upper 95% CI | Prev | Lower 95% CI | Upper 95% CI | PPV | Lower 95% CI | Upper 95% CI | NPV | Lower 95% CI | Upper 95% CI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| IgA-AGA-ADULT | H | H | H | H | H | H | 0.358 | 0.332 | 0.385 | H | H | H | H | H | H |
| IgG-AGA-ADULT | H | H | H | H | H | H | 0.367 | 0.335 | 0.401 | H | H | H | H | H | H |
| IgA-EMA-ME-ADULT | 0.974 | 0.957 | 0.985 | 0.996 | 0.988 | 0.999 | 0.398 | 0.371 | 0.425 | 0.974 | 0.957 | 0.985 | 0.996 | 0.988 | 0.999 |
| IgG-EMA-ME-ADULT (one study) | 0.393 | 0.236 | 0.576 | 0.984 | 0.913 | 0.997 | 0.135 | 0.079 | 0.221 | 0.393 | 0.236 | 0.576 | 0.984 | 0.913 | 0.997 |
| IgA-EMA-HU-ADULT | 0.902 | 0.859 | 0.934 | 1.000 | 0.991 | 1.000 | 0.331 | 0.297 | 0.368 | 0.902 | 0.859 | 0.934 | 1.000 | 0.991 | 1.000 |
| IgA-tTG-GP-ADULT | 0.859 | 0.808 | 0.898 | 0.953 | 0.930 | 0.969 | 0.312 | 0.279 | 0.348 | 0.859 | 0.808 | 0.898 | 0.953 | 0.930 | 0.969 |
| IgA-tTG-HR-ADULT | 0.981 | 0.901 | 0.997 | 0.981 | 0.958 | 0.991 | 0.160 | 0.126 | 0.202 | 0.981 | 0.901 | 0.997 | 0.981 | 0.958 | 0.991 |
| IgA-AGA-CHILD | H | H | H | H | H | H | 0.363 | 0.341 | 0.385 | H | H | H | H | H | H |
| IgG-AGA-CHILD | H | H | H | H | H | H | 0.437 | 0.413 | 0.462 | H | H | H | H | H | H |
| IgA-EMA-ME-CHILD | 0.961 | 0.945 | 0.973 | 0.974 | 0.963 | 0.982 | 0.400 | 0.378 | 0.423 | 0.961 | 0.945 | 0.973 | 0.974 | 0.963 | 0.982 |
| IgA-EMA-HU-CHILD | 0.969 | 0.935 | 0.986 | H | H | H | 0.447 | 0.402 | 0.493 | 0.969 | 0.935 | 0.986 | 0.949 | 0.915 | 0.970 |
| IgA-tTG-GP-CHILD | 0.931 | 0.888 | 0.959 | 0.963 | 0.931 | 0.980 | 0.446 | 0.401 | 0.493 | 0.931 | 0.888 | 0.959 | 0.963 | 0.931 | 0.980 |
| IgA-tTG-HR-CHILD | 0.957 | 0.903 | 0.981 | 0.990 | 0.946 | 0.998 | 0.519 | 0.452 | 0.584 | 0.957 | 0.903 | 0.981 | 0.990 | 0.946 | 0.998 |
| IgA-AGA-MIXED | H | H | H | H | H | H | 0.415 | 0.386 | 0.444 | H | H | H | H | H | H |
| IgG-AGA-MIXED | H | H | H | H | H | H | 0.510 | 0.480 | 0.540 | H | H | H | H | H | H |
| IgA-EMA-ME-MIXED | H | H | H | 0.995 | 0.982 | 0.999 | 0.467 | 0.434 | 0.500 | 0.859 | 0.825 | 0.888 | 0.995 | 0.982 | 0.999 |
| IgA-EMA-HU-MIXED | 0.925 | 0.881 | 0.954 | 0.996 | 0.975 | 0.999 | 0.437 | 0.391 | 0.484 | 0.925 | 0.881 | 0.954 | 0.996 | 0.975 | 0.999 |
| IgA-tTG-GP-MIXED | H | H | H | 0.954 | 0.927 | 0.972 | 0.463 | 0.425 | 0.501 | 0.913 | 0.877 | 0.939 | 0.954 | 0.927 | 0.972 |
| IgG-tTG-GP-MIXED | 0.451 | 0.363 | 0.543 | 0.988 | 0.935 | 0.998 | 0.265 | 0.208 | 0.331 | 0.451 | 0.363 | 0.543 | 0.988 | 0.935 | 0.998 |
| IgA-tTG-HR-MIXED | 0.902 | 0.864 | 0.930 | 0.954 | 0.915 | 0.976 | 0.573 | 0.530 | 0.616 | 0.902 | 0.864 | 0.930 | 0.954 | 0.915 | 0.976 |
| IgG-tTG-HR-MIXED (one study) | 0.677 | 0.556 | 0.778 | 1.000 | 0.839 | 1.000 | 0.518 | 0.413 | 0.621 | 0.677 | 0.556 | 0.778 | 1.000 | 0.839 | 1.000 |
H = significant heterogeneity by Pearson's chi-square test
Note: see Appendix G for raw pooled data by antibody test
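As a rough illustration of how a pooled proportion, its 95% confidence interval, and a Pearson chi-square heterogeneity check might be computed from per-study counts, consider the sketch below. The report does not state here which interval method was used, so the Wilson score interval is an assumption of this example, as are the function names and the study counts, which are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval

# Upper 5% critical values of the chi-square distribution, by degrees of freedom
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def wilson_interval(events, total, z=Z_95):
    """Wilson score confidence interval for a proportion events/total."""
    p = events / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


def pearson_heterogeneity(studies):
    """Pearson chi-square statistic for equal proportions across studies.

    studies -- list of (events, total) pairs, one per study.
    Assumes the pooled proportion lies strictly between 0 and 1.
    Returns (statistic, degrees_of_freedom).
    """
    pooled = sum(e for e, _ in studies) / sum(n for _, n in studies)
    stat = 0.0
    for events, total in studies:
        expected_pos = total * pooled
        expected_neg = total * (1 - pooled)
        stat += (events - expected_pos) ** 2 / expected_pos
        stat += ((total - events) - expected_neg) ** 2 / expected_neg
    return stat, len(studies) - 1


# Hypothetical data: test-positive CD patients / all CD patients in each study
studies = [(45, 50), (30, 50), (48, 50)]
stat, df = pearson_heterogeneity(studies)
heterogeneous = stat > CHI2_CRIT_05[df]

pooled_events = sum(e for e, _ in studies)
pooled_total = sum(n for _, n in studies)
low, high = wilson_interval(pooled_events, pooled_total)
print(f"chi-square={stat:.2f}, df={df}, heterogeneous={heterogeneous}")
print(f"pooled sensitivity={pooled_events / pooled_total:.3f} ({low:.3f}-{high:.3f})")
```

When the statistic exceeds the critical value, a single pooled estimate is a poor summary of the studies, which is the situation the table marks with H.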
The minimum prevalence of CD in individual study populations was greater than 25% in most of the analysis groups (e.g., IgA-AGA, IgG-AGA), except for ten analysis groups in which the minimum prevalence was between 9% and 12%. In all the analysis groups, the maximum prevalence ranged from 30% to as high as 70%. The pooled prevalence for the analysis groups was predominantly between 30% and 45%.
We identified 99 potentially relevant HLA articles that appeared to address HLA DQ2/DQ8 in a CD population (Appendix F).8–11, 15, 53, 54, 62, 91–100, 100–177 These studies were not designed to determine the diagnostic utility of DQ2 or DQ8 per se.
Of the identified studies, 54 allowed estimation of the prevalence, sensitivity or specificity of HLA DQ2/DQ8 in the studied population.8–10, 15, 53, 54, 93, 100, 109, 120, 134–177 In one study, DQ2 data could not be reliably extracted.169 The authors of one study9 explicitly stated that the patients used were the same as in two of their other publications.8, 93 In two other publications by the same authors,9, 10 the patients appear to be different and the authors do not indicate that they used patients from a previous study. However, the possibility that these two studies9, 10 share a subset of patients cannot be excluded. Another two studies addressing different topics but with extractable HLA data, also appeared to have used the same patients.53, 136 In cases of duplicate publications, the studies with the greater number of patients were used.9, 136
The study designs and strictness of CD diagnosis in these articles varied, as did the inclusion of a control group. Most of the CD cases were diagnosed based on the ESPGAN criteria, although in some studies CD was diagnosed based on serology and then, in most cases, later confirmed by biopsy.15, 109, 120, 160, 161, 164, 168, 170, 172, 177 Nine of the studies were classified as cross-sectional studies;169–177 32 were case-control studies;8–10, 53, 100, 120, 134–159 and 12 were mixed cross-sectional/case-control studies or could be considered diagnostic cohort studies.15, 54, 109, 160–168 Four of the mixed-design studies109, 164, 166, 178 used screen-negative patients as the control group, whereas the rest used a control group that was separate from the screened population. The study populations were also variable. The case-control studies used known CD cases compared with variously defined CD-negative controls.
Seven studies used relatives of CD patients,158, 161, 164, 166, 169, 172, 177 four used a population with Down Syndrome,109, 134, 160, 170 two used a population with type 1 diabetes,165, 173 and one used a mixed group of patients with CD, including some with Down Syndrome and others with diabetes.173 The mixed-design/cohort studies used patients suspected of CD on clinical grounds or subjects who belonged to a high-risk group, such as type 1 diabetics or first-degree relatives of patients with CD. The remaining articles used a screened healthy population or another specific group.
The articles with extractable data stated the frequency of HLA DQ2, and to a lesser extent the frequency of HLA DQ8, in their CD group. The cross-sectional studies did not include a control group, so only the frequency in CD cases, a surrogate for sensitivity, was available. None of the case-control or mixed-design studies calculated the sensitivity or specificity of HLA DQ2 or DQ8; however, these studies allowed us to derive estimates of these statistics from their results or tables. The considerable degree of clinical and methodological heterogeneity between the identified studies did not allow for statistical pooling of the results.
| Author, year; country | Prev of CD | DQ2 in CD | DQ2 in controls | Sensitivity | Specificity | PPV | NPV | CD population |
|---|---|---|---|---|---|---|---|---|
| Iltanen, 1999; Finland | 0.24 | 90.48 | 29.85 | 90% | 70% | 49% | 96% | Known CD versus biopsied controls |
| Sacchetti, 1998; Italy | 0.79 | 86.89 | 18.75 | 87% | 81% | 95% | 62% | Known CD versus biopsied controls |
| Sacchetti, 1998; Italy | 0.51 | 86.89 | 26.72 | 87% | 73% | 77% | 84% | Versus unbiopsied healthy controls |
HLA all study data. The following section presents the data of the HLA studies that failed to be included on the basis that the control groups were not assessed with the gold standard test for CD (biopsy). These studies collectively provide useful information on the diagnostic value of HLA testing, but have to be interpreted with caution.
| Author | Year | Country | # of CD | % DQ2 | % DQ8 | % DQ2/8 | Population with CD |
|---|---|---|---|---|---|---|---|
| Lewis | 2000 | USA | 101 | 90.10 | n/a | n/a | Confirmed cases among CD relatives |
| Book | 2001 | USA | 8 | 87.50 | 12.50 | 100 | Down Syndrome |
| Book | 2003 | USA | 34 | n/a | n/a | 97.06 | Affected 1st-degree relatives of CD sib. pairs |
| Csizmadia | 2000 | Netherlands | 10 | 100 | 20 | n/a | Down Syndrome |
| Fasano | 2003 | USA | 98 | 83.67 | 22.45 | 100 | Screened large population only subset tested for HLA |
| Iltanen | 1999 | Finland | 5 | 100 | n/a | n/a | Sjogren's syndrome |
| Kaukinen | 2000 | Finland | 6 | 100 | n/a | n/a | Known CD |
| Maki | 2003 | Finland | 56 | 85.71 | n/a | n/a | Screen of school-age children |
| Mustalahti | 2002 | Finland | 29 | 100 | n/a | n/a | Relatives of CD or DH |
| Catassi | 2001 | Algeria | 79 | 91.3 | n/a | 95.6 | Saharawi Arabs |
| Lui | 2002 | Finland | 260 | 96.92 | 2.69 | 99.62 | Family members of celiacs |
| Polvi | 1996 | Finland | 45 | 100 | n/a | n/a | Known CD |
| Ploski / Sollid | 1996 | Sweden | 135 | 91.85 | 4.44 | 96.30 | Known CD |
| Popat | 2002 | Sweden | 62 | 93.55 | n/a | n/a | Known CD |
| Larizza | 2001 | Italy | 7 | 100 | n/a | n/a | Children with autoimmune thyroid disease, EMA+biopsy |
| Failla | 1996 | Italy | 7 | 14.29 | n/a | n/a | Down Syndrome (only 7 CD cases) |
| Farre | 1999 | Spain | 60 | 93.33 | n/a | n/a | 1st-degree relatives of celiacs |
| Balas | 1997 | Spain | 212 | 94.81 | 4.25 | 99.06 | Known CD |
| Zubillaga | 2002 | Spain | 135 | 92.59 | 3.70 | 96.0 (calc) | Mostly CDs, some CD in subjects with Down Syndrome and subjects with diabetes |
| Karell | 2003 | France | 92 | 86.96 | 6.52 | 93.48 | Known CD |
| | | Italy | 302 | 93.71 | 5.63 | 89.40 | |
| | | Finland | 100 | 91 | 5.00 | 96.00 | |
| | | Norway/Sweden | 326 | 91.41 | 5.21 | 96.63 | |
| | | UK | 188 | 87.77 | 7.98 | 95.74 | |
| | | Total | 1008 | 93.71 | 5.95 | 93.95 | |
| Kaur | 2002 | India | 35 | 97.14 | n/a | n/a | Known CD |
| Neuhausen | 2002 | Israel | 23 | 82.61 | 56.52 | 100 | Bedouin Arabs |
| Tuysuz | 2001 | Turkey | 55 | 83.64 | 16.36 | 90.91 | Children with known CD |
| Bouguerra | 1996 | Tunisia | 94 | 84.04 | n/a | n/a | Known CD |
| Sumnik | 2000 | Czech | 15 | 80 | 66.67 | 100 | Diabetics |
| Perez-Bravo | 1999 | Chile | 62 | 11.29 | 25.81 | 37.10 | Chileans |
DH = dermatitis herpetiformis
| Author, year; country | Prev of CD | % DQ2 in CD | % DQ2 in Controls | Sens | Spec | PPV | NPV | CD population |
|---|---|---|---|---|---|---|---|---|
| Fine, 2000 ; USA | 0.06 | 88 (22/25) | 31.24 (134/429) | 0.88 | 0.69 | 0.14 | 0.99 | Known CD |
| Howell, 1995 ; UK | 0.38 | 91.21 (83/91) | 23.18 (35/151) | 0.91 | 0.77 | 0.7 | 0.94 | Known CD |
| Michalski, 1995; Ireland | 0.62 | 96.67 (87/90) | 39.29 (22/56) | 0.97 | 0.61 | 0.8 | 0.92 | Known CD |
| Colonna, 1990 ; Italy | 0.36 | 94.59 (140/148) | 40.82 (109/267) | 0.95 | 0.59 | 0.56 | 0.95 | Known CD |
| Catassi, 2001 ; Algeria | 0.37 | 91.1 (72/79) | 38.9 (53/136) | 0.91 | 0.61 | 0.58 | 0.92 | Saharawi Arabs |
| Congia, 1991; Italy | 0.2 | 96 (24/25) | 34 (34/100) | 0.96 | 0.66 | 0.41 | 0.99 | Known CD |
| Ferrante, 1992 ; Italy | 0.48 | 88 (44/50) | 16.36 (9/55) | 0.88 | 0.84 | 0.83 | 0.88 | Known CD |
| Mazzilli, 1992 ; Italy | 0.5 | 92 (46/50) | 18 (9/50) | 0.92 | 0.82 | 0.84 | 0.91 | Known CD |
| Tighe, 1992 ; Italy | 0.51 | 90.70 (39/43) | 12.20 (5/41) | 0.91 | 0.88 | 0.89 | 0.9 | Known CD |
| Castro, 1993 ; Italy | 0.38 | 80 (4/5) | 37.5 (3/8) | 0.8 | 0.63 | 0.57 | 0.83 | Down Syndrome |
| Lio, 1997; Italy | 0.45 | 100 (18/18) | 63.64 (14/22) | 1 | 0.36 | 0.56 | 1 | Known CD |
| Sacchetti, 1998 ; Italy | 0.79 | 86.89 (106/122) | 18.75 (6/32) | 0.87 | 0.81 | 0.95 | 0.62 | Known CD and biopsied controls |
| Sacchetti, 1998 ; Italy | 0.51 | 86.89 (106/122) | 26.72 (31/116) | 0.87 | 0.73 | 0.77 | 0.84 | Healthy controls |
| Iltanen, 1999; Finland | 0.24 | 90.48 (19/21) | 29.85 (20/67) | 0.9 | 0.7 | 0.49 | 0.96 | Known CD |
| Ploski/Sollid, 1993 ; Sweden | 0.34 | 94.68 (89/94) | 25.97 (47/181) | 0.95 | 0.74 | 0.65 | 0.96 | Known CD |
| Pattersson, 1993; Sweden | 0.4 | 92.31 (60/65) | 43.75 (42/96) | 0.92 | 0.56 | 0.59 | 0.92 | Known CD |
| Ploski/Sollid, 1996 ; Sweden | 0.43 | 91.85 (124/135) | 22.35 (40/179) | 0.92 | 0.78 | 0.76 | 0.93 | CD vs blood donors |
| Fernandez-Arquero, 1995 ; Spain | 0.36 | 92 (92/100) | 25.56 (46/180) | 0.92 | 0.74 | 0.67 | 0.94 | Known CD |
| Arranz, 1997 ; Spain | 0.5 | 92 (46/50) | 24 (12/50) | 0.92 | 0.76 | 0.79 | 0.9 | Known CD |
| Balas, 1997 ; Spain | 0.22 | 94.81 (201/212) | 29.25 (217/742) | 0.95 | 0.71 | 0.48 | 0.98 | Known CD |
| Ruiz Del Prado, 2001 ; Spain | 0.04 | 94.74 (36/38) | 39.22 (351/895) | 0.95 | 0.61 | 0.09 | 1 | Known CD |
| Dijilali-Saiah, 1994 ; France | 0.27 | 88.75 (71/80) | 21.13 (45/213) | 0.89 | 0.79 | 0.61 | 0.95 | Known CD |
| Dijilali-Saiah, 1998 ; France | 0.44 | 83.17 (84/101) | 20 (26/130) | 0.83 | 0.8 | 0.76 | 0.86 | Known CD |
| Tighe, 1993 ; Israel | 0.49 | 70.59 (24/34) | 8.33 (3/36) | 0.71 | 0.92 | 0.89 | 0.77 | Ashkenazi Jews, known CD |
| Arnason, 1994 ; Iceland | 0.13 | 84 (21/25) | 36.36 (60/165) | 0.84 | 0.64 | 0.26 | 0.96 | Known CD |
| Boy, 1994; Sardinia | 0.5 | 96 (48/50) | 32 (16/50) | 0.96 | 0.68 | 0.75 | 0.94 | Known CD |
| Congia, 1994 ; Sardinia | 0.42 | 90.77 (59/65) | 39.33 (35/89) | 0.91 | 0.61 | 0.63 | 0.9 | Known CD |
| Erkan, 1999 ; Turkey | 0.5 | 40 (12/30) | 6.67 (2/30) | 0.4 | 0.93 | 0.86 | 0.61 | Known CD |
| Tumer, 2000 ; Turkey | 0.3 | 51.52 (17/33) | 25.97 (20/77) | 0.52 | 0.74 | 0.46 | 0.78 | Turkish, known CD |
| Tuysuz, 2001 ; Turkey | 0.52 | 83.64 (46/55) | 24 (12/50) | 0.84 | 0.76 | 0.79 | 0.81 | Turkish, known CD |
| Perez-Bravo, 1999 ; Chile | 0.33 | 11.29 (7/62) | 2.42 (3/124) | 0.11 | 0.98 | 0.7 | 0.69 | Chilean |
| Author, year; country | Prev of CD | % DQ2 in CD | % DQ2 in controls | Sens | Spec | PPV | NPV | CD population |
|---|---|---|---|---|---|---|---|---|
| Book, 2001 ; USA | 0.09 | 87.50 (7/8) | 15.58 (12/77) | 0.88 | 0.84 | 0.37 | 0.98 | Down Syndrome |
| Csizmadia, 2000 ; Netherlands | 0.11 | 100 (10/10) | 28 (25/90) | 1.00 | 0.72 | 0.29 | 1.00 | Down Syndrome |
| Fasano, 2003 ; USA | 0.52 | 83.67 (82/98) | 42.39 (39/92) | 0.84 | 0.58 | 0.68 | 0.77 | 9019 at risk, 4126 not at risk |
| Larizza, 2001 ; Italy | 0.08 | 100 (7/7) | 34.62 (27/78) | 1 | 0.65 | 0.21 | 1 | Children with autoimmune thyroid disease, EMA+biopsy |
| Polvi, 1996 ; Finland | 0.58 | 100 (45/45) | 28.13 (9/32) | 1 | 0.72 | 0.83 | 1 | CD vs various controls |
| Iltanen, 1999; Finland | 0.15 | 100 (5/5) | n/a | 1 | n/a | n/a | n/a | Sjogren's syndrome |
| Kaukinen, 2000 ; Finland | 0.17 | 100 (6/6) | n/a | 1 | n/a | n/a | n/a | CD vs disease controls |
| Lui, 2002; Finland | 0.52 | 96.92 (252/260) | 57.38 (136/237) | 0.97 | 0.43 | 0.65 | 0.93 | Family members of celiacs (controls=unaffected family members) |
| Farre, 1999 ; Spain | 0.55 | 93.33 (56/60) | 18 (9/50) | 0.93 | 0.82 | 0.86 | 0.91 | CD vs healthy controls |
| Farre, 1999 ; Spain | 0.26 | 93.33 (56/60) | 63.91 (108/169) | 0.93 | 0.36 | 0.34 | 0.94 | CD vs relatives of CD |
| Sumnik, 2000 ; Czech | 0.07 | 80 (12/15) | 49.46 (92/186) | 0.8 | 0.51 | 0.12 | 0.97 | Diabetes (control=EMA neg.) |
| Kaur, 2002 ; India | 0.11 | 97.14 (34/35) | 4.64 (13/280) | 0.97 | 0.95 | 0.72 | 1 | CD vs healthy controls |
| Neuhausen, 2002 ; Israel | 0.31 | 82.61 (19/23) | 61.54 (32/52) | 0.83 | 0.38 | 0.37 | 0.83 | Bedouin Arabs (some cases and controls not biopsied) |
| Author, year; country | Prev of CD | DQ8 in CD | DQ8 in controls | Sens | Spec | PPV | NPV | CD population |
|---|---|---|---|---|---|---|---|---|
| Csizmadia, 2000 ; Netherlands | 0.11 | 20 (2/10) | 20 (18/90) | 0.20 | 0.80 | 0.10 | 0.90 | Down Syndrome |
| Fasano, 2003 ; USA | 0.52 | 22.45 (22/98) | 20.65 (19/92) | 0.22 | 0.79 | 0.54 | 0.49 | Screened at-risk and not-at-risk populations |
| Lui, 2002; Finland | 0.52 | 2.69 (7/260) | 10.55 (25/237) | 0.03 | 0.89 | 0.22 | 0.46 | Family members of CD patients (controls=unaffected family members) |
| Ploski/Sollid, 1996 ; Sweden | 0.43 | 4.44 (6/135) | 25.14 (45/179) | 0.04 | 0.75 | 0.12 | 0.51 | Known CD |
| Balas, 1997 ; Spain | 0.22 | 4.25 (9/212) | 16.85 (125/742) | 0.04 | 0.83 | 0.07 | 0.75 | Known CD |
| Sumnik, 2000 ; Czech | 0.07 | 66.67 (10/15) | 65.59 (122/186) | 0.67 | 0.34 | 0.08 | 0.93 | Diabetes |
| Neuhausen, 2002 ; Israel | 0.31 | 56.52 (13/23) | 25 (13/52) | 0.57 | 0.75 | 0.5 | 0.8 | Bedouin Arabs |
| Tuysuz, 2001 ; Turkey | 0.52 | 16.36 (9/55) | 8 (4/50) | 0.16 | 0.92 | 0.69 | 0.5 | Turkish known CD |
| Perez-Bravo, 1999 ; Chile | 0.33 | 25.81 (16/62) | 12.9 (16/124) | 0.26 | 0.87 | 0.5 | 0.7 | Chileans |
The remaining outlier studies were divided into a low-sensitivity/high-specificity group (Group 1), and a high-sensitivity/low-specificity group (Group 2). In the first case, all the studies were conducted in a non-Western European population. In particular, the worst performance of HLA DQ2 occurred in a study from Chile,137 where the frequency of HLA DQ2 was very low in both the patients with CD and the control subjects. It is important to note, however, that not all non-Western populations deviated from the main cluster of studies. For example, Catassi et al.120 found that 91% of Saharawi Arabs (Algeria) with CD carried HLA DQ2 compared with 38.9% of Saharawi controls. These values are similar to those seen in most Western populations. The second group all showed relatively poor specificity, although the sensitivity was preserved. As would be expected, the control groups of these studies were at high risk of having CD (relatives of CD158, 164, 166), or were a population with a known higher frequency of HLA DQ2 (individuals with diabetes161, 165). As such, the high frequency of HLA DQ2 in these control populations makes the specificity of HLA DQ2 rather poor.
| Author; year; country | Prev of CD | DQ2 or DQ8 in CD | DQ2 or DQ8 in controls | Sens | Spec | PPV | NPV | Notes |
|---|---|---|---|---|---|---|---|---|
| Fasano, 2003 ; USA | 0.52 | 100 (98/98) | 59.78 (55/92) | 1 | 0.4 | 0.64 | 1 | Screened at-risk and not-at-risk populations |
| Catassi, 2001 ; Algeria | 0.37 | 96.2 (76/79) | 41.9 (57/136) | 0.96 | 0.58 | 0.57 | 0.96 | Saharawi Arabs |
| Lui, 2002; Finland | 0.52 | 99.62 (259/260) | 67.93 (161/237) | 1 | 0.32 | 0.62 | 0.99 | Family members of CD (controls=unaffected family members) |
| Balas, 1997 ; Spain | 0.22 | 99.06 (210/212) | 46.09 (342/742) | 0.99 | 0.54 | 0.38 | 1 | Known CD |
| Sumnik, 2000 ; Czech | 0.07 | 100 (15/15) | 87.63 (163/186) | 1 | 0.12 | 0.08 | 1 | Diabetes |
| Tuysuz, 2001 ; Turkey | 0.52 | 90.91 (50/55) | 32 (16/50) | 0.91 | 0.68 | 0.76 | 0.87 | Turkish Known CD |
| Neuhausen, 2002 ; Israel | 0.31 | 100 (23/23) | 86.54 (45/52) | 1 | 0.13 | 0.34 | 1 | Bedouin Arabs |
| Perez-Bravo, 1999 ; Chile | 0.33 | 37.1 (23/62) | 15.32 (19/124) | 0.37 | 0.85 | 0.55 | 0.73 | Chileans |
Using epidemiologically appropriate eligibility criteria, our comprehensive literature search did not identify any studies that specifically addressed the question of the sensitivity or specificity of biopsy for the diagnosis of CD.
However, we sought to obtain indirect evidence regarding the diagnostic performance of biopsy as a test for CD. Some data were available from studies identified for other review objectives, such as the cross-sectional screening studies, the HLA DQ2/8 studies, and studies of IELs. We also sought follow-up studies of biopsy-negative patients suspected of CD, and studies of silent and latent CD. The findings from these studies are presented in the Discussion and in Appendix H.
The literature search yielded 2,116 references (Appendix F). A first-level screen of the titles, abstracts and keywords, for articles that related to the incidence or prevalence of CD, excluded 1,506 references. Full-text versions of each of the 610 retained references were obtained and used for a second-level screen for articles, with a focus on the incidence and/or prevalence of CD. Review articles were also identified and kept for reference (n = 71). Three hundred and forty-eight out of the 610 references were excluded. The remaining 262 references were screened at a third level (Appendix F). Studies were included if they reported the prevalence and/or incidence of CD in the following groups: (1) general populations from North America or Western Europe; (2) first-degree relatives of patients with CD; (3) patients with type 1 diabetes; (4) patients being investigated for anemia; (5) patients with osteoporosis or osteopenia; (6) patients with suspected CD on the basis of their clinical presentations. We did not use any geographic restriction for the studies of populations at risk (first-degree relatives and type 1 diabetics) or of associated clinical presentations (suspected CD, anemia, or metabolic bone disease). Studies of prevalence or incidence that used AGA tests conducted prior to 1990 were excluded after discussion with the AHRQ because of potential problems with the reliability of older AGA assays. Reports which were not sufficiently explicit for data extraction also had to be excluded.179–181
We defined incidence studies as those studies that reported the total number of new cases of CD for a given territory and period, over a unit of population density. Therefore, studies of incidence where there was no population denominator were excluded. When multiple studies of incidence of CD were available for a similar country or geographic area, the most recent and/or most encompassing was selected. In general, we excluded the studies whose observation periods pertained exclusively to a period prior to 1990.
A total of 133 publications were selected. Of these, 14 publications were identified as duplicates on the basis that the same study population was reported on elsewhere, or as part of a larger cohort.122, 182–194 The remaining 119 original studies on prevalence and/or incidence of CD in the populations of interest were included and their data abstracted. Of these included studies, 42 assessed the prevalence and/or incidence of CD in a general population. Twelve of the 42 reported on the incidence of CD,128, 195–205 and 30 reported on the prevalence, either in the US (three studies206–208), Scandinavia (11 studies209–219), Italy and San Marino (seven studies126, 220–225), UK (four studies226–229), or other countries (Spain230, the Netherlands,231, 232 Switzerland,233 and Germany234).
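The screening flow and study counts described above can be checked with simple arithmetic. The short Python sketch below does only that; all numbers are taken from the text, and the variable names are illustrative rather than part of the review's methods.

```python
# Literature screening flow, using the counts stated in the text.
retrieved = 2116
excluded_first_level = 1506
excluded_second_level = 348

retained_for_full_text = retrieved - excluded_first_level          # 610
screened_third_level = retained_for_full_text - excluded_second_level  # 262

# Selection of original studies.
selected_publications = 133
duplicate_publications = 14
included_studies = selected_publications - duplicate_publications  # 119

# General-population studies: incidence plus prevalence by region.
incidence_studies = 12
prevalence_by_region = {
    "US": 3,
    "Scandinavia": 11,
    "Italy and San Marino": 7,
    "UK": 4,
    "Spain": 1,
    "Netherlands": 2,
    "Switzerland": 1,
    "Germany": 1,
}
prevalence_studies = sum(prevalence_by_region.values())            # 30
general_population_studies = incidence_studies + prevalence_studies  # 42

print(retained_for_full_text, screened_third_level, included_studies)
print(prevalence_studies, general_population_studies)
```

The totals reproduce the figures given in the text: 610 full-text references, 262 references at the third-level screen, 119 included studies, and 42 general-population studies (12 incidence, 30 prevalence).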
Studies of the prevalence of CD in populations at risk were divided as follows: 18 studies of the first-degree relatives of CD patients,129, 167, 206, 235–249 and 34 studies in patients with type 1 diabetes.234, 250–282
Studies of the prevalence of CD in patients with associated clinical presentations were divided as follows: 12 studies in anemia and/or iron deficiency,283–294 four studies in metabolic bone disease,295–298 and 13 studies of patients with suspected CD on the basis of their clinical presentation.206, 238, 299–309 The clinical manifestations included in the “suspected CD” category were chronic diarrhea, weight loss, malabsorption, or abdominal pain in adults, and failure to thrive, short stature, malabsorption, chronic diarrhea, and abdominal pain in children. Four studies included groups at multiple risk levels.206, 234, 238, 272
| Study | Country, period | Group at risk | Period related to results | Crude incidence (# cases/100,000 patient years) | Cumulative incidence (# cases/1,000 births) |
|---|---|---|---|---|---|
| Ivarsson, 2003 | Sweden, 1973-97 | Children | 1997 (0–2 y) | 51 (95% CI: 36–70) | Age 2 (1995): 1.7 (95% CI: 1.3–2.1) |
| Duplicate: Ivarsson, 2000 (ref 193) | | | 1996 (2–5 y) | 33 (95% CI: 24–44) | |
| | | | 1996 (5–15 y) | 10 (95% CI: 7–13) | |
| Weile, 1993 | Denmark, 1960-88 | Children | 1960-88 | | Age 5 (1988): 0.118 |
| Duplicate: Weile, 1993 (ref 196) | | | | | |
| Maki, 1990 | Finland, 1960-84 | Children | 1974-83 | 3.46 (95% CI: n/r) | |
| Duplicate: ref 194 | | | | | |
| Hawkes, 2000 | England, 1981-95 | Children | 1991-95 | 2.15 (95% CI: n/r) | |
| Magazzu, 1994 | Sicily, 1975-89 | Children | 1989 birth cohort | | Age 5 (1989): 1.16 (95% CI: 0.92–1.42) |
| Lopez-Rodriguez, 2003 | Spain, 1981-99 | Children 0–14 y | 1981-90 | 6.87 (95% CI: 5.26–8.83) | |
| | | | 1991-99 | 16.04 (95% CI: 12.99–19.59) | |
| | | Children 0–4 y | 1991-99 | 42.04 (95% CI: n/r) | |
| Hoffenberg, 2003 | US (Denver, Colorado), 1993-99 | Children | 1993-99 | | Age 5 (1999): 9 (95% CI: 4–20) |
| Jansen, 1993 | Netherlands, 1990-92 | All ages | 1991-92 | 1.0 (95% CI: n/r) | |
| Corrao, 1995 | Italy, 1990-91 | All ages | 1990-91 | 2.13 (95% CI: n/r) | Age 5 (1991): 0.81 |
| Talley, 1994 | US (Olmsted County), 1960-90 | All ages | 1960-90 | 1.2 (95% CI: 0.7–1.6) | |
| | | | 1980-90 | 1.7 (95% CI: n/r) | |
| Bodé, 1996 | Denmark, 1976-91 | Adults | 1976-91 | 1.27 (95% CI: n/r) | |
| Collin, 1997 | Finland, 1975-94 | Adults | 1990-94 | 17.2 (95% CI: n/r) | |
| Hawkes, 2000 | England, 1981-95 | Adults | 1991-95 | 3.08 (95% CI: n/r) | |
The incidence of CD has been most studied in the Scandinavian countries, particularly Sweden,193, 195, 310–313 Denmark,196, 197, 313, 314 and Finland,194, 198, 199 where important disparities have been observed over time and between countries. Reports from these countries have the advantage of being derived from comprehensive prospective databases and from populations which are genetically fairly stable, shedding light on potential environmental causal exposures,195, 196 or on variations in practice patterns.
In Scandinavia, the highest incidences of CD in children were found in Sweden for the 0- to 2-year age group from 1987 to 1997, where an average of 198 new cases per 100,000 patient years (95% CI: 186–210) were observed.193, 195 This peak in incidence was followed by a rapid decline, observed during 1995-97, where incidences dropped to an average of 51/100,000 patient years (95% CI: 36–70). In contrast, the incidence of CD in children aged 2 to 4.9 years and 5 to 15 years was only slightly increased over the 1973-97 period, with a peak in 1996 of 33 cases (95% CI: 24–44) per 100,000 patient years and 10 cases (95% CI: 7–13) per 100,000 patient years for these respective age groups. A cohort effect was noted in that the cumulative incidences at 2 years of age for the children belonging to birth cohorts from 1984 to 1994 were on the gradual rise (up to 4.4 cases/1,000 births [95% CI: 3.8–5.0] for the 1993 cohort), while a progressive decline was observed for birth cohorts from 1994 to 1996 (down to 1.7 cases [95% CI: 1.3–2.1] per 1,000 births for the 1995 cohort). Most of these cases were symptomatic, so that these observations are unlikely to be due to changes in screening practices. Interestingly, these changes mirrored changes in the composition of infant formulas, with the highest values of a wheat/rye/barley exposure index during the years 1982-1994.
In contrast, the incidence of CD in Denmark, a neighbouring country, has been significantly lower and very stable from 1960 to 1988,196 with an average incidence of 0.089/1,000 live births for that period.313 A comparison of dietary exposures between Swedish and Danish children diagnosed with CD between 1972 and 1989 showed that by the age of 8 months, the Swedish diet contained more than 40 times more gliadin than the Danish diet.313 In Finland, incidences have also been fairly stable, and have in fact decreased among infants but increased among older children.198 However, these observations date back to 1984 and can therefore not be compared with the Swedish epidemics.
Spain has also seen an increased incidence of CD over the past 25 years, from 6.87 (95% CI: 5.26–8.83) cases/100,000/year in 1981-90 to 16.04 cases/100,000/year (95% CI: 12.99–19.59) in 1991-99,204 an observation that was correlated with an increased proportion of silent or atypical presentations at diagnosis (i.e., inferring a role for changes in clinical practice). The age at diagnosis also correlated positively with the age at which gluten was introduced in the diet.
The role of dietary exposure during infancy is also highlighted in studies from the UK, where recommendations on infant feeding, promoting breastfeeding and later introduction of starches, were published in 1974. Subsequent to these recommendations, there was a fall in the incidence of childhood CD;315, 316 however, these data are not presented in detail because we focused on reports from the past 15 years.
Unlike incidences derived from reported cases, incidences observed in a prospective screening protocol are not subject to variations related to practice patterns and are therefore more comprehensive and accurate. Hoffenberg et al., from the US, conducted the only prospective CD screening study available to date.128 Between December 1993 and September 1999, a total of 22,346 newborns in Denver, Colorado were screened for HLA genotypes associated with CD and type 1 diabetes. A representative sample of at-risk HLA DRB1*03-positive infants (n=987) was followed prospectively for as long as the first seven years of life. Serological screening was performed at nine, 15 and 24 months of age, then yearly. Small bowel biopsies were recommended if the serology (tTG in most cases) was positive on two separate occasions, or in the presence of clinical suspicion. Between 1993 and 1999, 19 children were found to have evidence of CD: ten had biopsy-confirmed CD, and nine had a positive tTG result on at least two occasions. The mean age at presentation of evidence of CD was 4.6 years (range 2.6–6.5). Compared with HLA-DR3-negative children, the RR for evidence of CD was 5.6 (1.5–21, p=0.009) and 9.1 (1.7–48, p=0.003) for those expressing one and two HLA-DR3 alleles, respectively. The RR of CD in females relative to males was 3.34 (1–10.9, p=0.048). Cognisant of the prevalence of HLA-DR mono- and heterozygotes among the same birth cohort, the authors calculated that by the age of 5, the estimated cumulative incidence of CD in the general population (defined as either biopsy-proven CD or persistently elevated tTG) was 9/1000 births (95% CI: 4–20), or 1:104 (1:49 to 1:221).
This remarkably high cumulative incidence (i.e., twice that of the highest value among Swedish children at 4 years of age - 5.0 [95% CI: 4.4–5.7]193) has to be interpreted in light of the fact that only ten out of the 19 cases had been biopsied; the remaining nine cases were diagnosed on the basis of a persistently elevated tTG titre, the PPV of which the same authors reported to be only 70% to 83%.317 However, as mentioned above, these results are derived from an actual prospective and systematic screening intervention for CD, where asymptomatic cases would be detected. In all likelihood, there is therefore an important proportion of CD cases who remain undiagnosed during early childhood.
As has been observed for children, the incidence of CD in adults seems to have increased over the past 20 years.199, 201 This is largely explained by a change in practice patterns: physicians are more aware of the condition, its atypical manifestations and associated conditions, while at the same time, serological testing has become widely available. More diagnoses are therefore made on the basis of case-finding. This is reflected by the fact that the proportion of patients being diagnosed with CD in the absence of symptoms, or as a result of serological testing, has also increased.199, 201, 318–320
In Finland over the period 1975-94, Collin et al.199 observed a ten-fold rise in the incidence of CD. The authors attributed this to the use of serologic screening (physicians were actively told to screen patients with type 1 insulin-dependent diabetes (IDDM), autoimmune thyroid disease, connective tissue diseases, women with infertility, patients with neurologic symptoms and first-degree relatives of CD patients), the routine performance of intestinal biopsies on all patients undergoing gastroscopy, and the opening of open-access endoscopy clinics, which allowed all general practitioners to refer patients for gastroscopy.
In Italy, a gradual increase in the number of annual new CD diagnoses was observed between 1968 and 1992;318, 320 this increase correlated with an increased proportion of patients with subclinical presentations being identified.318, 320 Interestingly, despite the changing clinical presentation, there was no statistical difference between the histological grades at diagnosis.320
The incidence of CD in individuals of all ages varies from 1.0 per 100,000 patient years in the Netherlands200 to 2.13 per 100,000 patient years in Italy.202 In Italy, the RR of CD in adults ranged from 0.11 in the >60 year group to 0.33 in the 16–39 year group, compared with children.202 The RR of CD for females was 1.90 (95% CI: 1.48–2.45).202
In the US, the 30-year incidence (1960-90) for Olmsted County was 1.2 per 100,000 patient years (95% CI: 0.7–1.6), and the incidence for 1980-90 was slightly higher at 1.7 (95% CI: not reported).205 This observation contrasts with the cumulative incidence of 9/1000 by age 5 reported by Hoffenberg from Denver, Colorado;128 clearly, further knowledge of the epidemiology of CD in the US is required.
The point prevalence of CD can be calculated from registers of CD cases and the size of the population at risk; three of the included incidence studies reported such a figure.199, 200, 205 The point prevalence of CD was 21.8/100,000 in Olmsted County in 1991,205 2.7/100,000 (95% CI: 11.0–14.5) in the Netherlands in 1992,200 and 204/100,000 (95% CI: 181–231) in Finland in 1994.199 Of note, the latter prevalence from Finland was observed in a community where intense efforts had been made to screen the population at risk for CD.
| Author, year | Country | Age group | Test | Total patients | Prevalence by serology | Prevalence by biopsy | Notes |
|---|---|---|---|---|---|---|---|
| Fasano, 2003 | USA | Adults | EMA - ME; all positive EMA tested with tTG-HU | 2,845 | 0.00949 | | 116/350 biopsied |
| Green, 2000 | USA | Adults | EGD/biopsy | 1,749 | | 0.00515 | Not all systematically biopsied; only those with suggestive endoscopic features |
| Not, 1998 | USA | Adults | IgG- and IgA-AGA - ELISA; confirmed with IgA-EMA ME or HU | 2,000 | 0.00400 | ||
| Fasano, 2003 | USA | Children | | 1,281 | 0.00312 | | |
| Johnston, 1998 | UK | Adults | IgA-AGA, IgA-EMA | 1,823 | 0.00823 | ||
| Sanders, 2003 | UK | Adults | IgG- and IgA - ELISA; EMA-ME | 1,200 | 0.01917 | 0.01000 | 22/23 biopsied |
| West, 2003 | UK | Adults | IgA EMA-ME, IgA-tTGA | 7,527 | 0.01156 | | |
| Rutz, 2002 | Switzerland | Children | IgA-EMA-ME, IgA-tTG, IgG-AGA and IgA-AGA | 1,450 | 0.00759 | 0.00690 | 10/11 biopsied |
| Borch, 2001 | Sweden | Adults | Biopsy, IgA- and IgG-AGA; IgA-EMA-ME | 482 | 0.01452 | 0.01867 | |
| Grodzinsky, 1996 | Sweden | Adults | IgA-AGA; IgA-EMA | 1,866 | 0.00589 | 0.00375 | Prevalence by IgA-EMA not reported |
| Ivarsson, 1999 | Sweden | Adults | IgA- and IgG-AGA - ELISA, cut-off not recorded; IgA-EMA -ME; serum IgA level | 1,894 | 0.00475 | 0.00475 | |
| Sjoberg, 1994 | Sweden | Adults | IgG- and IgA-AGA | 1,537 | 0.01431 | 0.00065 | 13/22 biopsied |
| Sjoberg, 1999 | Sweden | Adults | IgA-AGA, IgA confirmed with EMA-ME | 1,970 | 0.00152 | 0.00152 | |
| Carlsson, 2001 | Sweden | Children | AGA, EMA, biopsy using Watson capsule | 690 | 0.01884 | 0.01594 | |
| Riestra, 2000 | Spain | Adults | IgG/IgA-AGA, IgA-EMA; the study was conducted as a 1) two-step protocol (determination of IgA/IgG-AGA, if positive measuring IgA-EMA); and a 2) one-step protocol (measuring IgA-EMA) | 1,170 | 0.00171 | 0.00256 | 1 CD picked up when AGA and EMA was neg. |
| Corazza, 1997 | Republic of San Marino | Adults | IgA-EMA; biopsy | 559 | 0.00179 | 0.00179 | |
| Hovdenak, 1999 | Norway | Adults | IgA- and IgG-AGA; IgA-EMA | 2,069 | 0.00387 | 0.00338 | |
| Rostami, 1999 | Netherlands | Adults | IgA-EMA | 1,000 | 0.00300 | 0.00300 | |
| Csizmadia, 1999 | Netherlands | Children | IgA-EMA | 6,127 | 0.01224 | 0.00506 | 57/75 biopsied |
| Pittschieler, 1996 | Italy | Adults | IgA- and IgG-AGA; IgA-EMA; biopsy | 4,615 | 0.00195 | 0.00195 | 38 of 140 biopsied |
| Trevisiol, 1999 | Italy | Adults | IgA-EMA; biopsy | 4,000 | 0.00250 | 0.00250 | |
| Volta, 2001 | Italy | Adults (mostly) | IgA-EMA-HU; biopsy | 3,483 | 0.00574 | 0.00488 | Prevalence of 0.57% (20/3,483) if the 3 patients with normal villi but increased IELs are included |
| Catassi, 2000 | Italy | Children | IgG-AGA (7 AU); IgA-AGA (15 AU); IgA-EMA indirect IF (1:5 dilution); biopsy | 2,096 | 0.00859 | ||
| Catassi, 1996 | Italy | Children | IgA- or IgG-AGA; confirmed with EMA and biopsy | 17,201 | 0.00645 | 0.00477 | |
| Di Pietralata, 1992 | Italy | Children | IgA-AGA; biopsy | 3,022 | 0.00629 | 0.00596 | |
| Dickey, 1992 | Ireland | Adults | IgA AGA | 443 | 0.01129 | ||
| Jager, 2001 | Germany | Mixed - mostly adults | IgA-AGA, IgG-AGA, IgA-tTG | 150 | 0.02667 | | Mixed group of at-risk populations, healthy group used |
| Kolho, 1998 | Finland | Adults | EMA -HU | 1,070 | 0.01028 | 0.00748 | |
| Maki, 2004 | Finland | Children | IgA and IgG tTG; IgA and IgG EMA - IF; total serum IgA; HLA DR, DQ2 and DQ8 | 3,654 | 0.01259 | 0.00739 | |
| Collin, 2002 | Finland | Mixed - mostly adults | Biopsy | 2,974 | | 0.00605 | |
| Weile, 2001 | Denmark and Sweden | Adults | Serum IgA: IgG-AGA; IgA-AGA, cut-off >40 units; EMA; in cases of IgA <0.07g/L, IgG-AGA was analyzed | 1,573 | 0.00254 | ||
EGD=esophagogastroduodenoscopy; IF=immunofluorescence; prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)
| Screening test | Age group | Number of studies (refs) | Total patients | Prevalence range |
|---|---|---|---|---|
| Primary biopsy | Adults | 2 (refs 207, 210) | 4,723 | 0.00515 – 0.00605 |
| IgA AGA | Overall | 2 (refs 223, 229) | 3,465 | 0.00629 – 0.01129 |
| | Adults | 1 (ref 229) | 443 | 0.01129 |
| | Children | 1 (ref 223) | 3,022 | 0.00629 |
| IgA / IgG AGA | Adults | 1 (ref 216) | 1,537 | 0.01431 |
| IgA AGA - IgA EMA | Overall | 6 (refs 208, 209, 211, 217, 219, 226) | 8,831 | 0.00152 – 0.01884 |
| | Adults | 5 (refs 208, 209, 211, 217, 219) | 6,999 | 0.00152 – 0.01884 |
| | Children | 1 (ref 321) | 1,823 | 0.00823 |
| IgA/IgG AGA - IgA EMA | Overall | 7 (refs 212, 213, 218, 220, 221, 224, 227) | 30,648 | 0.00195 – 0.01917 |
| | Adults | 5 (refs 212, 213, 218, 224, 227) | 11,351 | 0.00195 – 0.01917 |
| | Children (Italy) | 2 (refs 220, 221) | 19,297 | 0.00645 – 0.00859 |
| IgA/IgG AGA - IgA tTG | Mostly adults (Germany) | 1 (ref 234) | 150 | 0.02667 |
| IgA EMA | Overall | 7 (refs 126, 214, 222, 225, 230–232) | 17,409 | 0.00171 – 0.01224 |
| | Adults | 7 (refs 126, 214, 222, 225, 230, 231) | | 0.00171 – 0.01028 |
| | Children (Netherlands) | 1 (ref 232) | 6,127 | 0.01224 |
| IgA EMA - IgG tTG | Overall | 4 (refs 206, 215, 228, 233) | 16,757 | 0.00312 – 0.01259 |
| | Adults (USA, UK) | 2 (refs 206, 228) | 10,372 | 0.00949 – 0.01156 |
| | Children | 3, includes Fasano Child Group (refs 206, 215, 233) | 6,385 | 0.00312 – 0.01259 |
Country of study was indicated when possible; prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)
The prevalence of CD by serology in the general unselected populations of North America and Western Europe ranged widely, from 152 per 100,000 (0.152% or 1:658) to 2,670 per 100,000 (2.67% or 1:37). The prevalence by biopsy ranged from 152 per 100,000 (0.152% or 1:658) to 1,870 per 100,000 (1.87% or 1:53). In four of the studies, a large proportion of the serology-positive subjects did not undergo biopsy.206, 216, 224, 232
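The prevalence tables in this section report proportions, whereas the text quotes rates per 100,000, percentages, and "1 in N" ratios. The following small Python helper is a sketch of that conversion; the function name and the rounding choices are assumptions made for this illustration and are not taken from the report.

```python
def describe_prevalence(proportion: float) -> dict:
    """Express a prevalence proportion in the three forms used in the text."""
    return {
        "per_100000": round(proportion * 100_000),  # cases per 100,000
        "percent": round(proportion * 100, 3),      # percentage
        "one_in": round(1 / proportion),            # "1 in N" ratio
    }


# Lowest prevalence by serology among the included studies (0.00152).
lowest = describe_prevalence(0.00152)
print(lowest)
```

For the lowest reported proportion, 0.00152, this gives 152 per 100,000, 0.152%, and 1 in 658, matching the figures quoted above. The text rounds some of the other values slightly differently, so exact agreement should not be expected for every entry.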
| Percentiles | Serology | Biopsy |
|---|---|---|
| 5 | .0016255 | .0007378 |
| 10 | .0018050 | .0015761 |
| 25 | .0030919 | .0025321 |
| 50 | .0063702 | .0047672 |
| 60 | .0084439 | .0050768 |
| 75 | .0117290 | .0071429 |
| 80 | .0125193 | .0074416 |
| 90 | .0184088 | .0147536 |
| 95 | .0225417 | .0183992 |
| 100 | .0266667 | .0186722 |
| Minimum | .00152 | .00065 |
| Maximum | .02667 | .01867 |
Prevalence expressed as proportion (multiply by 100 for percent, or 100,000 for per 100,000 value)
Only four studies demonstrated a prevalence of CD greater than 0.015 (1.5%) (UK,323 Sweden,209, 219 Germany234), and an additional six showed a prevalence of between 0.010 (1.0%) and 0.015 (1.5%) (UK,228 Sweden,216 Netherlands,232 Ireland,229 Finland214, 215). These studies would suggest a potentially higher prevalence of CD in these countries, though it should be kept in mind that other studies from these same countries showed a prevalence of less than 1.0%, including four studies from Sweden211, 213, 216, 217 (Figure 28).
Among the 30 included studies, the point estimates for the prevalence of CD varied considerably, both by serology and by biopsy, owing to differences in serological test strategies, biopsy definitions, and patient sampling, making pooled estimates unreliable. To further explore the potential sources of variability in the observed prevalence of CD, we plotted each study's prevalence against its sample size (Figure 29).
| Study, year; country | Clinical setting | Age group | Dx criteria | N tested | Prevalence (%) |
|---|---|---|---|---|---|
| Bardella, 1991; Italy | Referral center | Adults | Biopsy | 60 | 43.3 |
| Bardella, 2001; Italy | Referral center | Adults | Biopsy | 80 | 50.0 |
| Carrocio, 2002; Italy | Referral center | Adults | Biopsy | 207 | 11.6 |
| Fasano, 2003; USA | Not reported | Adults | EMA | 1,910 | 1.5 |
| Bode, 1993; Denmark | Referral center | Children | Biopsy | 191 | 7.3 |
| Day, 2000; New Zealand | Referral center | Children | Biopsy | 153 | 4.6 |
| Thomas, 1992; England | Referral center | Children | Biopsy | 381 | 7.9 |
| Chan, 2001; Canada | Referral center | Children | Biopsy | 77 | 13.0 |
| Chartrand, 1997; Canada | Referral center | Children | Biopsy | 176 | 17.0 |
| Ventura, 2001; Italy | Community pediatricians | Children | Biopsy | 240 | 7.5 |
| Fitzpatrick, 2001; Canada | Community pediatricians | Children | EMA | 92 | 1.1 |
| Fasano, 2003; USA | Not reported | Children | EMA | 1,326 | 4.0 |
| Hill, 2000; USA | Referral center | Children | EMA | 1,008 | 2.5 |
| Hin, 1999; England | Community practice | All ages | Biopsy | 1,000 | 3.0 |
All three Italian studies were from referral centers, and intestinal biopsies were performed on all suspected cases, a cumulative total of 347. The prevalence of CD was very high in these series, i.e., 43%,300 50%,301 and 12%.303
In a large study of prevalence of CD in at-risk and not-at-risk individuals in the US, a total of 1,910 adults with CD-associated symptoms or disorders underwent serological testing with EMA. Fifteen of the 28 EMA-positive subjects (53.6%) consented to a biopsy, which was confirmatory in all cases.206 The source of these patients and their mode of recruitment/referral were not reported. Based on the EMA result, the prevalence of CD in these adults with suspected CD was 1.5%.
Five of the eight studies came out of referral centers where all suspected cases (cumulating to 978) were biopsied.302, 304–306, 308 The prevalence of CD in these children ranged from 4.6%306 to 17%.305
In a case-finding study among 26 family pediatricians in Italy, 240 children were screened with EMA based on the presence of risk factors, and 18 diagnoses of biopsy-proven CD were made, resulting in a prevalence of 7.5%.309
Three studies, two American206, 238 and one Canadian,307 reported the prevalence of CD in children with related symptoms or conditions based on EMA testing. The cumulative number of children was 2,426, and the prevalence ranged from 1.1% in the Canadian study of children with chronic abdominal pain,307 to 4.0% in the large American study of CD prevalence in at-risk and not-at-risk populations.206
| Author, year; country | Total patients | Age group | Screening test(s) | First serology | Confirmatory serology | Biopsy proven | Biopsy criteria & description | Prevalence by serology | Prevalence by biopsy | Notes |
|---|---|---|---|---|---|---|---|---|---|---|
| Li Voon Chong, 2002; UK | 509 | Adults | EMA | 7 | None | n/a | None done | 0.0138 | n/a | |
| Talal, 1997; USA | 185 | Adults | EMA | 9 | None | 4 | ESPGAN | 0.0486 | 0.0216 | Only 5/9 biopsied |
| Rossi, 1993 | 211 | Children, some adults | EMA | 10 | None | 3 | ESPGAN | 0.0474 | 0.0142 | Only 3/10 biopsied |
| Kaukinen, 1999; Finland | 62 | Adults | EMA | | None | 7 | ESPGAN | 0.0000 | 0.1129 | |
| Sjoberg, 1998; Germany | 848 | Adults | AGA - IgG or IgA; EMA | 258 | 22 | 7 | Marsh | 0.0259 | 0.0083 | Only 14/22 biopsied |
| Sategna-Guidetti, 1994; Italy | 383 | Adults | EMA | 12 | None | 10 | Roy-Choudhury | 0.0313 | 0.0261 | 10/12 biopsied |
| Rensch, 1996; USA | 47 | Adults | EMA | 3 | None | 3 | Loss of villous architecture, crypt hyperplasia, and increased IELs | 0.0638 | 0.0638 | |
| Frazer-Reynolds, 1998; Canada | 263 | Children | EMA | 17 | None | 12 | Carey capsule; Marsh criteria; | 0.0646 | 0.0456 | 17/19 biopsied |
| Gillett, 2001; Canada | 233 | Children | EMA or AGA | 19 | None | 14 | Not reported | 0.0815 | 0.0601 | 18/19 biopsied |
| Hansen, 2001; Denmark | 104 | Children | EMA or tTG | 10 | None | 9 | Partial or total villous atrophy, crypt hyperplasia and IEL infiltration | 0.0962 | 0.0865 | 9/10 biopsied |
| Saukkonen, 1996; Finland | 776 | Children | AGA or ARA | 76 | None | 19 | Not reported | 0.0979 | 0.0245 | Only 35/76 biopsied |
| Spiekerkoetter, 2002; Germany | 205 | Children | tTG IgA or IgG | 13 | None | 6 | Marsh | 0.0634 | 0.0293 | Only 8/13 biopsied |
| Arato, 2003; Hungary | 205 | Children | EMA | 24 | None | 17 | n/r | 0.1171 | 0.0829 | |
| Barera, 1991; Italy | 498 | Children | AGA IgA then if neg IgG AGA | 30 | None | 16 | Subtotal villous atrophy | 0.0602 | 0.0321 | 22/30 biopsied |
| Barera, 2002; Italy | 273 | Children | EMA, second EMA | 15 | 10 | 9 | Marsh; type II or III lesion | 0.0549 | 0.0330 | |
| Valerio, 2002; Italy | 383 | Children | EMA or IgG AGA | n/r | None | 32 | ESPGAN | n/r | 0.0836 | |
| Carelo, 1996; Spain | 141 | Children | IgA AGA if positive on two occasions | 12 | None | 4 | Subtotal villous atrophy | 0.0851 | 0.0284 | |
| Roldan, 1998; Spain | 177 | Children | IgA, IgG AGA, (and known cases, and some tested with EMA) | 19 | None | 7 | ESPGAN | 0.1073 | 0.0395 | Mixed group diagnosed by different means |
| Juan, 1998; Spain | 93 | Children | EMA | 7 | None | 6 | ESPGAN | 0.0753 | 0.0645 | |
| Sigurs, 1993; Sweden | 459 | Children | AGA | 19 | None | 21 | Watson Capsule | 0.0414 | 0.0458 | 18/19 biopsied included known CD |
| Agardh, 2001; Sweden | 162 | Children | AGA, EMA, or tTG IgG or IgA | 8 | 8 | 6 | As described by Carlsson et al. 1999, Pediatrics 103:1248 | 0.0494 | 0.0370 | Only 6 of 8 biopsied |
| Acerini, 1998; UK | 167 | Children | EMA or AGA | 11 | None | 8 | ESPGAN | 0.0659 | 0.0479 | 9/11 biopsied |
| De Block, 2001; Belgium | 399 | Mixed | EMA | 9 | None | 3 | No biopsy performed | 0.0226 | 0.0075 | Unclear how the 3 cases confirmed |
| Jager, 2001 | 197 | Mixed | tTG | 19 | None | n/r | | 0.0964 | | |
| De Vitis, 1996; Italy | 1114 | Mixed | IgA, IgG then IgA EMA | 121 | 55 | 63 | Marsh - “villous atrophy” | 0.1086 | 0.0566 | 78/121 biopsied |
| Not, 2001; Italy | 491 | Mixed | EMA | 28 | None | 28 | Intestinal biopsy; Marsh's modified classification | 0.0570 | 0.0570 | |
| Bao, 1999; USA | 847 | Mixed | tTG | 98 | None | 15 | n/r | 0.1157 | 0.0177 | Only 20/98 biopsied |
| Kordonouri, 2000; Germany | 520 | Mixed - mostly children | tTG | 23 | None | 9 | Marsh criteria | 0.0442 | 0.0173 | 10/23 not biopsied |
| Aktay, 2001; USA | 218 | Mixed - mostly children | EMA | 17 | None | 10 | Partial or total villous atrophy, inflammation in lamina propria with increased IELs, and hyperplasia of crypts; classified as partial or total villous atrophy | 0.0780 | 0.0459 | 14/17 biopsied |
| Cronin, 1997; Ireland | 101 | Mixed - mostly adults | EMA | 8 | None | 5 | n/r | 0.0792 | 0.0495 | |
| Schober, 2000; Austria | 403 | Mixed - mostly children | EMA | 12 | None | 6 | Modified Marsh and Crowe; Watson-type capsule | 0.0298 | 0.0149 | 11/12 biopsied |
| Lampasona, 1999; Italy | 287 | Mixed - mostly children | tTG IgA or IgG | 24 | None | n/a | No biopsy | 0.0836 | n/a | |
| Lorini, 1996; Italy | 133 | Mixed - mostly children | AGA IgA or IgG | 5 | None | n/a | No biopsy | 0.0376 | n/a | |
| Page, 1994; Mixed | 1785 | N/a | AGA | 73 | None | 13 | n/a | 0.0409 | 0.0073 | Only 49/73 biopsied |
All the included studies initially screened the study population with one or more antibodies. Three studies did not confirm positive serology with biopsy,265–267 whereas in nine studies confirmatory biopsies were performed in less than 75% of the screened-positive patients.253, 259, 264, 269, 272, 274, 277–279 These studies were not included in the pooled estimates of the prevalence of CD by biopsy. All the studies that reported biopsy criteria used partial villous atrophy or greater to define CD.
| Number of studies | Total patients | Age group | Screening test(s) | Prevalence by serology | Prevalence by biopsy |
|---|---|---|---|---|---|
| 1277 | 848 | Adults | AGA - IgG or IgA; then EMA | 0.0259 | 0.0083* |
| 1266 | 509 | Adults | EMA | 0.0138 | n/a |
| 1279 | 185 | Adults | EMA | 0.0486 | 0.0216* |
| 1263 | 62 | Adults | EMA | n/a | 0.1129 |
| 3257,270,273 | 531 | Adults | EMA | 0.0433 | 0.0339 |
| 1274 | 776 | Children | AGA or ARA | 0.0979 | 0.0245* |
| 1276 | 459 | Children | AGA | 0.0414 | 0.0458 |
| 4254,256,267,271 | 949 | Children | AGA - various combinations | 0.0695 | 0.0331 |
| 1252 | 205 | Children | EMA | 0.1171 | 0.0829 |
| 1275 | 403 | Children | EMA | 0.0298 | 0.0149 |
| 5251,255,260,272,281 | 1058 | Children | EMA | 0.0624 | 0.0437 |
| 4251,255,260,281 | 847 | Children | EMA | 0.0661 | 0.0437 |
| 5250,261,262,280,282 | 1049 | Children | EMA - combinations | 0.0721 | 0.0658 |
| 1265 | 287 | Children | tTG IgA with IgG | 0.0836 | n/a |
| 1278 | 205 | Children | tTG IgA with IgG | 0.0634 | 0.0293* |
| 1264 | 520 | Children | tTG | 0.0442 | 0.0173* |
| 1269 | 1785 | Mixed | AGA | 0.0409 | 0.0073* |
| 1259 | 1114 | Mixed | IgA, IgG-AGA then IgA-EMA | 0.0494 | 0.0566* |
| 1268 | 491 | Mixed | EMA | 0.0570 | 0.0570 |
| 1258 | 399 | Mixed | EMA | 0.0226 | 0.0075† |
| 1234 | 197 | Mixed | tTG | 0.0964 | n/a |
| 1253 | 847 | Mixed | tTG | 0.1157 | 0.0177* |
*Large proportion of serology-positive patients not biopsied;253,259,264,269,272,274,277–279 these were not included in the pooled analysis of prevalence by biopsy
†No description of how diagnosis was made; result not pooled
The prevalence of CD in adults was assessed in seven studies.257, 263, 266, 270, 273, 277, 279 Six of these studies used IgA EMA as the screening test,257, 263, 266, 270, 273, 279 whereas the largest study used IgA- and IgG-AGA, followed by EMA for confirmation.277 In this last study, EMA confirmation was positive in 22 of the initially screened sample of 848 patients (2.6%), but biopsy confirmation was only performed in 14 of these patients, making the estimate of 0.83% prevalence by biopsy unreliable. The second largest study (n=509) did not confirm the EMA-positive patients with biopsy, and demonstrated the lowest prevalence of CD by EMA (1.4%) of all of the studies.266 In another study of 185 patients,279 the prevalence of CD by EMA was 4.9%, but only five of nine screen-positive patients were biopsied, making the prevalence of 2.2% (4/185) by biopsy a likely underestimation since four of the five biopsied EMA-positive patients were diagnosed with CD. A small study of 62 patients used biopsy as the screening test and found the prevalence of CD to be 11.3%, which is the highest prevalence of the group.263 The remaining studies had uniform biopsy confirmation.257, 270, 273 In these studies the prevalence of CD by EMA ranged from 3.1% to 7.9%, and the prevalence of CD by biopsy ranged from 2.6% to 6.4%.
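The reasoning about underestimation can be made concrete with simple bounds. The sketch below assumes, purely for illustration, that unbiopsied screen-positive patients could be anywhere from all negative to all positive for CD; the report itself does not compute such bounds, and the function name `biopsy_prevalence_bounds` is hypothetical. The counts are those quoted above for the 185-patient study (nine EMA-positive, five biopsied, four confirmed).

```python
def biopsy_prevalence_bounds(n_screened, n_positive, n_biopsied, n_confirmed):
    """Return (lower, upper) bounds on biopsy-defined prevalence.

    Assumption for illustration: screen-negative patients do not have CD,
    and each unbiopsied screen-positive patient may or may not have CD.
    """
    if not (0 <= n_confirmed <= n_biopsied <= n_positive <= n_screened):
        raise ValueError("counts must satisfy confirmed <= biopsied <= positive <= screened")
    unbiopsied = n_positive - n_biopsied
    lower = n_confirmed / n_screened                  # no unbiopsied patient has CD
    upper = (n_confirmed + unbiopsied) / n_screened   # every unbiopsied patient has CD
    return lower, upper


lower, upper = biopsy_prevalence_bounds(
    n_screened=185, n_positive=9, n_biopsied=5, n_confirmed=4
)
print(f"{lower:.1%} to {upper:.1%}")
```

Under these assumptions the reported 2.2% is the floor of the plausible range, which is why it is described as a likely underestimation.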
Twenty-one studies assessed the prevalence of CD in children with IDDM.250–252, 254–256, 260–262, 264, 265, 267, 271, 272, 274–276, 278, 280–282 Six of these studies used IgA-AGA, alone or in combination with either IgG-AGA or other antibody tests.254, 256, 267, 271, 274, 276 The largest study tested 776 children with AGA and ARA (reticulin antibodies), and found a prevalence of CD by serology of 9.8%.274 However, only 35 of 76 serology-positive patients were biopsied, making the reported prevalence by biopsy of 2.5% a likely underestimation. A single study of 459 patients that used IgA-AGA as the screening test found the prevalence of CD by serology to be 4.1%, and the prevalence of CD by uniform biopsy confirmation to be 4.6%.276 The second largest study (n=498) used a combination of IgA- and IgG-AGA, and found a prevalence of CD by serology of 6.0% and a prevalence of CD by biopsy of 3.2%.254 Two other studies that used IgA- and IgG-AGA271 or paired IgA-AGA measurements256 found a similar prevalence by serology of 10.7% and 8.5%, respectively, and a prevalence by biopsy of 3.95% and 2.8%, respectively. The last study in this group did not perform biopsy confirmation of the IgA- and IgG-AGA derived prevalence of 3.76%.267
Seven studies used IgA-EMA to screen for CD in children with IDDM.251, 252, 255, 260, 272, 275, 281 One Hungarian study of 205 children demonstrated a relatively high prevalence by serology and biopsy of 11.7% and 8.3%, respectively,252 whereas an Austrian study of 403 children demonstrated a relatively low prevalence by serology and biopsy of 3.0% and 1.5%, respectively.275 A study by Rossi et al.272 from the US demonstrated a prevalence of CD of 4.7%. The remaining studies demonstrated fairly consistent results, with the prevalence of CD by serology ranging from 5.5% to 7.8%, and the prevalence by biopsy ranging from 3.3% to 6.5%.251, 255, 260, 281
Three studies used IgA-tTG either alone264 or in combination with IgG-tTG.265, 278 IgA-tTG was used alone in a study of 520 children, which demonstrated a prevalence by serology of 4.4%. Ten of the 23 serology-positive patients did not undergo biopsy confirmation, making the reported prevalence of 1.7% a likely underestimation. Of the two studies that used IgA- and IgG-tTG, the first did not perform biopsy confirmation and reported a prevalence of CD by serology of 8.4%,265 whereas the other found a prevalence of CD by serology of 6.3%, and by biopsy of 2.9%, although only eight of 13 serology-positive patients underwent biopsy.278
Five studies used a combination of IgA-EMA and one or more other antibodies to assess the prevalence of CD in children with IDDM.250, 261, 262, 280, 282 In three studies, EMA was combined with AGA,250, 261, 280 in one it was combined with tTG,262 and in one it was combined with AGA and tTG.282 In one study, only the confirmed biopsy prevalence of 8.3% was reported.280 Overall, this group reported prevalences by serology ranging from 5.0% to 9.6%, and by biopsy ranging from 3.7% to 8.6%.
The remaining six studies assessed the prevalence of CD in a mixed-age population of patients with IDDM.234, 253, 258, 259, 268, 269 One study of 1,785 patients found the prevalence of CD by IgA AGA to be 4.1%. In this study, only 49 of 73 screen-positive patients underwent biopsy confirmation, making the reported prevalence by biopsy of 0.73% an underestimation.269 Another large study of 1,114 patients used IgA and IgG AGA as an initial screen, and then performed a second-level screen of the screen-positive patients with IgA EMA before moving on to biopsy.259 The EMA-confirmed prevalence of CD was 4.9%, whereas the reported biopsy-confirmed prevalence was a relatively high 5.7%. In this study, 78 of 121 initial AGA-positive patients underwent biopsy, suggesting that most of the EMA-positive patients were biopsied.
In the two studies that used IgA EMA as the screening test in a mixed-age population, the prevalence of CD by serology was 2.3%258 and 5.7%.268 It was unclear in the first study how the final confirmed prevalence of CD of 0.75% was arrived at,258 whereas in the other study the uniformly confirmed biopsy prevalence was 5.7%.268
The final two studies assessed the prevalence of CD in a mixed-age population of diabetics using IgA-tTG.234, 253 The prevalence of CD by serology was fairly high in both these studies: 9.6%234 and 11.5%.253 The first study did not perform biopsy confirmation, whereas in the second study only 20 of 98 screen-positive patients were biopsied, making the reported prevalence of CD by biopsy of 1.8% a likely underestimation.
Clinical heterogeneity existed for some subgroups of this analysis, making an overall pooled estimate of the prevalence of CD in children and adults with IDDM not entirely possible. However, a summary table (Table 40) presents the data grouped by age group and screening test (see also Figure 30).
| Study, year; country | Relative Type | Index case | Screening | Dx criteria | N tested | Prevalence (%) |
|---|---|---|---|---|---|---|
| Polvi, 1996; Finland | 1st degree | CD in family | Biopsy | ESPGAN | 90 | 20 |
| Holm, 1993; Finland | 1st degree | CD in family | Biopsy | Some VA | 121 | 10.7 |
| Robinson, 1971; England | 1st degree | CD child in family | Biopsy | Some VA | 29 | 10.3 |
| Rolles, 1974; England | 1st degree | CD child in family | Biopsy | Not reported | 72 | 5.6 |
| Stokes, 1976; England | 1st degree | CD in family | Biopsy | Some VA | 182 | 22.5 |
| Tursi, 2003; Italy | 1st degree | CD in family | Biopsy | Marsh I-IV | 111 | 44.1 |
| Corazza, 1992; Italy | 1st degree | CD adult in family | AGA | Some VA | 328 | 4.0 |
| Pittschieler, 2003; Italy | 1st degree | CD in family | EMA, TTG | Some VA | 92 | 12.0 |
| Rostami, 2000; Netherlands | 1st degree | CD in family | AGA, EMA, Hx | ESPGAN | 338 | 10.9 |
| Hogberg, 2003; Sweden | 1st degree | CD in family | AGA, EMA, TTG | Some VA | 120 | 8.3 |
| Korponay-Szabo, 1998; Hungary | 1st degree | CD in family | EMA | Some VA | 943 | 9.1 |
| Farre, 1999; Spain | 1st degree | CD in family | AGA, EMA | Some VA | 675 | 5.6 |
| Kotze, 2001; Brazil | 1st degree | CD in family | EMA | +ve serology* | 115 | 3.5 |
| Fasano, 2003; US | 1st degree | CD in family | EMA | +ve serology | 4,508 | 4.5 |
| Vitoria, 1994; Spain | 1st degree | CD in family | AGA, EMA | +ve serology | 642 | 2.8 |
| Mustalahti, 2002; Finland | 1st degree | >1 DH or CD sib | AGA, EMA | +ve serology | 466 | 9.4 |
| Book, 2003; US | 1st degree | CD sib pairs | EMA, TTG | +ve serology | 163 | 17.2 |
| Hill, 2000; US | 1st & 2nd degree | CD in family | EMA | +ve serology | 192 | 4.7 |
| Fasano, 2003; US | 2nd degree | CD in family | EMA | +ve serology | 1,275 | 2.6 |
| Korponay-Szabo, 1998; Hungary | 2nd degree | CD in family | EMA | +ve serology | 54 | 5.6 |
| Book, 2003; US | 2nd degree | CD sib pairs | EMA, TTG | +ve serology | 82 | 19.5 |
| Book, 2003; US | 1st cousins | CD sib pairs | EMA, TTG | +ve serology | 47 | 17.0 |
*EMA titre ≥ 1/5
VA = villous atrophy; DH = dermatitis herpetiformis
First-degree relatives: First-degree relatives were directly evaluated with small bowel biopsy in five studies; three were performed in England in the 1970s,242, 243, 245 and two in Finland during the 1990s.129, 167 The biopsy criteria for a diagnosis of CD were not reported in one study,243 and implied at least some degree of villous atrophy in the other four.129, 167, 242, 325 The percentage of all at-risk family members that were studied varied from 34%245 to 100%.243 The study size varied between 29242 and 182,245 and the cumulative number of patients tested was 494. The prevalence of CD among first-degree relatives undergoing intestinal biopsy varied from 5.6%243 to 22.5%;245 the pooled prevalence was 16%.
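The pooled figure can be checked by summing cases and subjects across the five biopsy-screened studies. The sketch below is illustrative only: the numbers tested and the percentages are taken from the table of relative studies above, but the case counts are back-calculated from the rounded percentages, which is an assumption rather than data reported here.

```python
# (study, number tested, reported prevalence in %) from the table of relatives
biopsy_studies = [
    ("Polvi 1996", 90, 20.0),
    ("Holm 1993", 121, 10.7),
    ("Robinson 1971", 29, 10.3),
    ("Rolles 1974", 72, 5.6),
    ("Stokes 1976", 182, 22.5),
]

# Assumption: recover whole-number case counts from the rounded percentages
cases = [round(n * pct / 100) for _, n, pct in biopsy_studies]
total_tested = sum(n for _, n, _ in biopsy_studies)

# Pooled prevalence = total cases / total subjects tested
pooled = sum(cases) / total_tested
print(total_tested, sum(cases), f"{pooled:.1%}")
```

Pooling in this way weights each study by its size, so the two largest series (Holm and Stokes) contribute most to the estimate.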
Serological screening of the first-degree relatives of patients with biopsy-proven CD was performed in 12 studies.206, 235–237, 239–241, 244, 246–249 In seven of those studies, intestinal biopsy was performed on at least 80% of the subjects who tested positive serologically, i.e., in 84% of subjects in one study,237 and in 100% of subjects in the other six studies.236, 239, 244, 247–249 Serological screening was performed with AGA alone in one study,236 whereas the other six studies used EMA, either alone239 or in combination.237, 244, 247–249 Six studies used criteria implying some degree of villous atrophy,236, 237, 239, 244, 247, 248 whereas one study included cases with Marsh I changes.249 The study size varied from 92248 to 943239 subjects, for a cumulative number of 2,607 subjects. For the studies that required some degree of villous atrophy for diagnosis, the prevalence varied from 4%236 to 12%,248 and the mean prevalence was 7.6%. However, when Marsh I lesions were also considered diagnostic, the prevalence of CD among first-degree relatives was reported at 44.1%.249
In five other studies of first-degree relatives,206, 235, 240, 241, 246 confirmatory biopsy was not routinely performed (available in 9%246 to 58%241 of the cases), and the reported prevalence of CD was based on the serology results. EMA was used for serological screening in all of these studies, either alone,206, 240 or in combination with AGA241, 246 or tTG.235
Two of these studies were performed in families with at least two index cases and are, therefore, reviewed separately.235, 241 Ninety percent of the at-risk populations from these two studies were tested, which represents a cumulative number of 629 subjects. The prevalence of CD among these first-degree relatives from families with at least two index cases of known CD or dermatitis herpetiformis (DH) was 9.4%241 and 17.2%.235
The study size of the other three studies varied from 115240 to 4,508,206 and the cumulative number of first-degree relatives tested was 5,265. The prevalence of CD among these serology-tested first-degree relatives varied between 2.8%246 and 4.5%206 (mean prevalence 4.3%).
Other relatives: One study from the US238 reported an EMA-based prevalence of 4.7% in 192 first- and second-degree relatives; the prevalence from each of the groups of relatives was not reported separately.
An American study by Book et al.235 studied the prevalence of CD in second-degree relatives and first cousins of CD sibling pairs (i.e., families with two affected index cases). Eighty-two second-degree relatives and 47 first cousins were tested with EMA and tTG, and the diagnosis was biopsy confirmed in 40% of the cases. The serology-based prevalence was 19.5% in second-degree relatives and 17.0% in first cousins.
Two other studies, one large (n=1,275) American study of prevalence of CD in at-risk and not-at-risk subjects,206 and one Hungarian study,239 provided data on the prevalence of CD in second-degree relatives. The EMA-based prevalence of CD in those groups was 2.6% and 5.6%, respectively (mean prevalence 2.7% on a cumulative number of 1,329 second-degree relatives).
| Author, year; country | No. of pts | Age group | Population | Anemia type | Screening test | First serology | Confirmatory serology | Biopsy proven | Biopsy criteria | Prevalence by serology | Prevalence by biopsy |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Akerman, 1996; Israel | 93 | Adult - some teens | Out-patients with IDA (50% symptomatic) | IDA | EGD/ biopsy | | | 13 | Subtotal or greater villous atrophy | n/a | 0.139785 |
| Annibale, 2001; Italy | 71 | Adults | Asymptomatic | IDA | EGD/ biopsy | | | 4 | Marsh | n/a | 0.056338 |
| Corazza, 1995; Italy | 200 | Adults | Referred to hematology | IDA | IgA/IgG-AGA then EMA then biopsy | 16 | 10 | 10 | Not mentioned | 0.05 | 0.05 |
| Dickey, 1997; UK | 10 | Adults | Asymptomatic, previously investigated no gross GI cause found | IDA | IgA AGA then EMA | 4 | 3 | | Endoscopic biopsy; criteria n/r; finding of villous atrophy and IELs in duodenal biopsy | 0.3 | n/a |
| Howard, 2002; UK | 258 | Adults | IDA identified through lab | IDA, folate | IgA/IgG-AGA and EMA then biopsy | 28 | | 12 | Not applicable | 0.10852713 | 0.046512* |
| Kepczyk, 1995; USA | 39 | Adults | Mostly symptomatic out-patients with IDA | IDA | EGD/ biopsy | | | 4 | Villous atrophy, crypt hyperplasia, inflammatory infiltrate | n/a | 0.102564 |
| McIntyre, 1993; UK | 50 | Adults | Out-patients with IDA | IDA | EGD/ biopsy | | | 3 | Not reported | n/a | 0.06 |
| Oxentenko, 2002; USA | 113 | Adults | Undergoing EGD for IDA | IDA | EGD/ biopsy | | | 17 | CD was defined as total or partial villous atrophy with IELs | Not applicable | 0.150442 |
| Ransford, 2002; UK | 484 | Adults | Referred to hematology | IDA | EMA then EGD/ biopsy | 17 | | 11 | Revised ESPGAN; duodenal histologic changes were graded according to Marsh I–III | 0.03512397 | 0.022727† |
| Unsworth, 2000; UK | 483 | Adults | Blood donors | Anemia unspecified | IgA-EMA then biopsy | 32 | | 22 | n/r | 0.06625259 | 0.045549‡ |
| Annibale, 2003; Italy | 59 | Adult | Pre-menopausal women with IDA | IDA | IgA tTG then biopsy | 7 | | 5 | Marsh | 0.11864407 | 0.084746** |
| Van Mook, 2001; The Netherlands | 35 | Adult | Asymptomatic | IDA | EGD / biopsy | | | 1 | Marsh I | Not applicable | 0.028571 |
*24/28 biopsied
†5 Marsh I identified by CD3
‡25/32 biopsied
**5/7 biopsied; 30 had heavy periods; CD in 1/22 with heavy periods, and 4/18 with normal periods
| No. of studies | Total patients | Population | Screening test(s) | Prevalence by serology | Prevalence by biopsy |
|---|---|---|---|---|---|
| 3283,288,290 | 245 | Symptomatic IDA | Biopsy | n/a | 0.139 |
| 1286 | 10 | Asymptomatic, previously investigated, no gross GI cause found | IgA-AGA then EMA | 0.3 | n/a |
| 1293 | 59 | Pre-menopausal women with IDA | IgA-tTG then Biopsy | 0.119 | 0.085 |
| 4285,287,291,292 | 1,425 | Asymptomatic serology screened | IgA-EMA, or-AGA followed by EMA; all biopsy confirmed | 0.061 | 0.039 |
| 3284,289,294 | 156 | Asymptomatic biopsy screened | Biopsy | n/a | 0.051 |
Three studies assessed the prevalence of CD in IDA patients with GI symptoms.283, 288, 290 The prevalence of CD in these studies ranged from 10.3% to 15% of the studied group. One small study assessed the prevalence of CD in a group of patients who had IDA but no identified GI source.286 In this study, the prevalence of CD by AGA and confirmed by EMA was 30%.
In another study, the authors assessed the prevalence of CD in pre-menopausal women with IDA.293 The overall prevalence of CD in this population was found to be 11.9% by tTG, and 8.5% after biopsy confirmation. CD was found in 1 of 22 (4.5%) women with heavy periods, and 4 of 18 (22%) women with normal menstrual flow.
Four studies assessed the prevalence of CD in asymptomatic IDA patients by serology.285, 287, 291, 292 Two of these used EMA screening,291, 292 whereas the other two initially screened with AGA and then confirmed with EMA.285, 287 The prevalence of CD in this group ranged from 2.3% to 5.0%. Another three studies assessed the prevalence of CD by biopsy in asymptomatic IDA patients, finding it to be between 2.9% and 6%.284, 289, 294
| Author, year; country | Population | BMD definition | Test | Prevalence |
|---|---|---|---|---|
| Lindh, 1992; Sweden | 92 consecutive patients with idiopathic osteoporosis screened for CD; 91% F (mean age 66 ± 12 Y); and 9 M (mean age 50 ± 12 Y) | Bone mineral content by photon absorptiometry (SPA) of non-dominant forearm; criteria n/r; mean proximal SPA 0.97 g/cm2; mean distal SPA 0.67 g/cm2 | IgA-AGA ELISA; cut-off was 2 SD above the mean of blood donors; confirmatory biopsy in 6 - criteria n/r | 11/92 (12.0%) AGA +ve; 3% (3/92) biopsy confirmed |
| Gonzalez, 2002; Argentina | 127 postmenopausal women with osteoporosis; age (Y): mean 68, range 50–82; 747 controls; age (Y): mean 29, range 16–79 | History of non-traumatic fractures and lumbar spine and/or femoral neck BMD below T-score -2.5 DXA | IgA and IgG-AGA ELISA; cut-off levels: for IgA - 15 AU/mL; for IgG - 20 AU/mL; positives confirmed with IgA-EMA-ME positive at 1:5 dilution; positives confirmed with biopsy in EMA positives; showing villous atrophy, crypt hyperplasia and IEL >30% | 1/127, or 7.9 per 1,000 (95% CI: 0.2–43.1); test positivity: AGA found in 8 of 127 (6.3%) pts on level 1; 1 of these 8 pts was EMA positive on the 2nd level and eligible for biopsy which established a diagnosis of CD in 1 (0.9%) |
| Mather, 2001; Canada | Idiopathic low BMD; mean age 57 Y; range 18–86 Y; 81.3% (78) F; 18.7% M (18); all osteopenic; 45/78 F and 13/18 M osteoporotic | DXA; osteopenia: BMD <1 SD of mean sex-matched peak BMD; osteoporosis: BMD <2.5 SD of mean sex-matched peak BMD | IgA-EMA-ME titers of ≥1:10; and biopsy confirmation based on subtotal or greater villous atrophy | 7 (7.3%) of 96 pts were EMA +ve; all biopsies were negative based on subtotal or greater villous atrophy; prevalence of 0% |
| Nuti, 2001; Italy | 255 females with osteoporosis; mean age 66.6 Y range 36–65 Y | DXA; BMD below T-score -2.5 | IgA-AGA ELISA-cut-off level of 10 AU/mL-1; IgA-tTg cut-off >22 AU; confirmatory biopsy criteria n/r | 53/255 (20.8%) +ve IgG-AGA; 24/53 +ve for tTG antibody (9.4%); intestinal biopsy in 10/24 resulted in 6 (2.4%) with confirmed CD |
F=female; M=male; DXA=dual X-ray absorptiometry; Y=years; n/r=not recorded
In the studies that used AGA as the initial screen, AGA was positive in 6% to 21% of the patients with osteoporosis. However, in these studies CD was confirmed by biopsy in only 0.9% to 3% of patients.295, 296, 326 The study that used EMA-ME as a screening test identified potential CD cases in 7.3% of patients, but none of these met the authors' biopsy criteria for CD.297
Out of 379 references resulting from the literature search on CD and lymphoma, 150 were initially excluded because they did not directly address this topic (Appendix F). Of the 229 studies that were screened using full reports of the studies, 211 were excluded for the following reasons: review articles (n=73; 19.3% of level 2 articles); did not address the topic (n=33); assessed the risk of CD in lymphoma (n=28); were uncontrolled studies, including surveys (n=53); or, studied the basic mechanisms and the pathogenesis of lymphoma in CD (n=24).
The following eight exclusions were made from the 18 publications that reached level 3 (i.e., eligibility criteria): duplicate publications (n=7)127, 327–332 (for two of these reports,328, 332 patients originated from the same center [i.e., General Hospital, Birmingham] and the reports covered the same periods as other reports,329, 330, 333 so we could not rule out that they described overlapping series); and data that were not extractable (n=1).334
| Study, year; country, period | Study type | Participants | Risk of lymphoma | Mortality | Other observations |
|---|---|---|---|---|---|
| Cottone, 1999; Sicily, 1980-97 | Retrospective cohort | • 228 CD patients | • Incidence NHL 3.1% | SMR all causes 3.8 (1.9–6.7) | |
| | | • 76% females | • SIR NHL 3.75, p <0.01 | | |
| | | • mean age at Dx 34.7 | | | |
| | | • 98% adult Dx | | | |
| | | • 100% on strict GFD | | | |
| Holmes, 1989; England, 1941-85 | Prospective cohort | • 210 CD patients | • Incidence NHL 4.3% | SMR not reported | SIR NHL vs GFD compliance: |
| | | • 55% females | • SIR NHL 42.7 (19.6–81.4) | | • Strict GFD 44.4 |
| | | • 51% on strict GFD | | | • Gluten diet 100 |
| Logan, 1989; Scotland, 1979-1986 | Prospective cohort | • 653 CD patients | | Mortality from NHL 2.6% | SMR childhood Dx 1.4 (0.4–3.7) |
| | | • 60% females | | SMR from lymphoma 31 p<0.001 | SMR adult Dx 1.9 (1.5–2.3) |
| | | | | SMR all causes 1.9 (1.5–2.2) | |
| Askling, 2002; Sweden, 1964-94 | Retrospective cohort | • 11,019 CD patients | • Incidence NHL 0.34% | SMR from NHL 11.4 (7.8–16) | SIR NHL childhood Dx 1.9 (0.4–5.5) |
| | | • 59% females | • SIR NHL 6.3 (4.2–125) | SMR all causes 2 (1.8–2.1) | SIR NHL adult Dx 7.0 (5.0–9.5) |
| | | • mean age at Dx 17.4 (range 0–>70) | | | |
| Collin, 1996; Finland, 1970-93 | Prospective cohort | • 383 CD patients | • Incidence NHL 0.26% | | |
| | | • 73% females | • SIR NHL 2.66 (0.07–14.8) | | |
| | | • mean age at Dx 41.8 (range 16–78) | | | |
| | | • 75% on strict GFD | | | |
| Corrao, 2001; Italy, 1962-94 | Prospective cohort | • 1,072 CD patients | | SMR from NHL: 69.3 (40.7–112.6) | SMR age 18–29 at Dx: 2.5 (0.5–7.3) |
| | | • 76% females | | SMR all causes: 2.0 (1.5–2.7) | SMR age 30–49 at Dx: 2.4 (1.3–4.0) |
| | | • mean age at Dx 35.7 (range 18–>50) | | | SMR age >50 at Dx: 1.9 (1.3–2.6) |
| | | • 59% on strict GFD | | | SMR strict GFD: 0.5 (0.2–1.1) |
| | | | | | SMR unlikely GFD: 6.0 (4.0–8.8) |
| Green, 2003; USA, 1981-2000 | Prospective cohort | • 381 CD patients | • Incidence NHL 1.3% | | |
| | | • 64% females | • SIR NHL 6.2 (2.9–14) | | |
| | | • mean age at Dx 44 +/- 18 | | | |
| Selby, 1979; Australia, 1959-78 | Retrospective cohort | • 93 CD patients | • Incidence NHL 4.3% | | |
| | | • 67% females | • SIR NHL 4.94, p<.0005 | | |
| | | • mean age at Dx 40 (range 14–70) | | | |
| Delco, 1999; USA, 1986-95 | Case-control | • 458 CD patients | • OR NHL 4.53 (2.01–10.23) | | |
| | | • 4% females | | | |
Dx=diagnosis; SIR=standardized incidence ratio; NHL=non-Hodgkin's lymphoma; SMR=standardized mortality ratio
Eight out of nine studies were cohort studies, either prospective or retrospective. The standardized incidence ratio (SIR) was the most commonly reported measure of association; it was calculated as the incidence observed in the patient cohort divided by the expected incidence from the control population, along with a measure of precision (i.e., its 95% CI). The results were expressed either as SIRs of lymphoma or as the standardized mortality ratio (SMR) from lymphoma (SMR-NHL). The all-cause mortality was also reported in some studies.
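As a rough illustration of the SIR definition above, the following Python sketch divides observed by expected cases and attaches an approximate 95% CI. The function name `sir_with_ci` and the use of Byar's approximation for the Poisson limits are assumptions made for illustration only; the reviewed studies do not state which interval method they used. The inputs (nine observed NHL cases against 0.21 expected) are the figures reported for the Holmes series.

```python
import math


def sir_with_ci(observed, expected, z=1.96):
    """Return (SIR, lower, upper) for observed vs expected case counts.

    The limits use Byar's approximation to the Poisson distribution
    (an assumed method, chosen here because it needs only the stdlib).
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    sir = observed / expected
    if observed > 0:
        lower_count = observed * (
            1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
        ) ** 3
    else:
        lower_count = 0.0  # no lower bound below zero when nothing was observed
    shifted = observed + 1
    upper_count = shifted * (
        1 - 1 / (9 * shifted) + z / (3 * math.sqrt(shifted))
    ) ** 3
    return sir, lower_count / expected, upper_count / expected


# Holmes series: 9 observed NHL cases, 0.21 expected.
sir, lower, upper = sir_with_ci(observed=9, expected=0.21)
print(f"SIR = {sir:.1f} (95% CI: {lower:.1f}-{upper:.1f})")
```

With these inputs the sketch gives an SIR of about 42.9 with limits near 19.6 and 81.4, close to the reported 42.7 (95% CI: 19.6–81.4); the small difference in the point estimate presumably reflects rounding of the published expected count.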
It was not possible to pool these measures of risk, since SIRs by definition incorporate variables inherent to each population. The attributable risk (AR) was calculated whenever the incidence rates of NHL in CD patients and in the age-adjusted general population were available.
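The AR used here is a simple risk difference, which can be sketched in Python as follows. The function name `attributable_risk` and the input validation are hypothetical and introduced only for illustration; the example values are the cumulative incidence of NHL (3%) and the expected incidence (0.8%) reported for the Sicilian cohort of Cottone et al.

```python
def attributable_risk(incidence_cd, incidence_reference):
    """Risk difference: incidence in CD patients minus incidence in the reference population.

    Both arguments are proportions between 0 and 1 (hypothetical helper, for illustration).
    """
    for value in (incidence_cd, incidence_reference):
        if not 0.0 <= value <= 1.0:
            raise ValueError("incidences must be proportions between 0 and 1")
    return incidence_cd - incidence_reference


# Cumulative incidence of NHL in the cohort (3%) versus the expected incidence (0.8%).
ar = attributable_risk(0.03, 0.008)
print(f"AR = {ar:.1%}")
```

This reproduces the 2.2% risk difference quoted for that cohort. Unlike the SIR, the AR is an absolute measure, so it depends directly on the background incidence of the reference population.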
There were eight cohort studies (five prospective333, 336, 338–340 and three retrospective335, 337, 341) and one case-control study.342 Two studies were from Italy,335, 339 two from the UK,333, 336 two from Scandinavia,337, 338 two from the US,340, 344 and one from Australia.341 The observation periods varied from 7 years336 to 44 years (1941-85),333 and the mean duration of patient follow-up varied from 6 years335, 339–341 to 18.6 years.333 Patients were selected from a national patient register,336 selected from hospital discharge databases,337, 342 or represented all consecutive cases from a single333, 335, 338, 340, 341 or multiple339 institution(s). The cohort sizes varied from 93 patients341 to 11,019 patients;337 55% to 76% of patients with CD were female, except for the study by Delco et al.,342 which used discharge diagnoses databases from the US Veterans Affairs hospitals (4% female CD patients). The mean age at diagnosis of CD was reported in six studies; in four studies, the diagnosis of CD was made almost exclusively in adulthood.335, 338, 339, 341 The mode of presentation was reported in four studies.335, 338, 339, 341 Adherence to a GFD was reported in five studies,333, 335, 338, 339, 341 and could be used in the analysis in three of them.333, 338, 339 Control data for the cohort studies were derived from local and national mortality data and cancer registers.
The total number of lymphomas diagnosed in each study and their histological type were not uniformly reported. Of the 84 lymphomas that were mentioned within these nine studies, 64 were referred to as “non-Hodgkin lymphoma (NHL)” not otherwise specified, one as “lymphoma,” nine as “enteropathy-associated T-cell lymphoma (ETCL),” five as “B cell lymphoma,” two as “large cell lymphoma,” and one each as a “T-cell other than ETCL,” “lymphosarcoma” (currently classified as small cell lymphoma), and “histiocytic medullary reticulosis” (currently termed hairy-cell leukemia). Logan et al.336 reported that they found “mostly lymphosarcomas (i.e., small-cell lymphomas) or reticulum-cell sarcomas (i.e., large-cell lymphomas) as well as two Hodgkin's lymphomas,” whereas the remaining authors systematically excluded Hodgkin's lymphomas from their respective analyses.
The case definition of CD differed between the reports of institutional series and those derived from database analysis. The results will therefore be presented differently according to each of these two study designs.
Institutional series: By institutional studies, we mean reports on the evolution of cases consecutively diagnosed with CD and followed in one or several selected institution(s) over a specific period. Six out of the nine controlled studies were performed in that setting; in five out of six studies, the data originated from a single referral center.333, 335, 338, 340, 341 The sixth study is the product of a collaborative effort between nine Italian centers.339 In these studies, all cases were biopsy-proven CD.
Holmes et al., from Birmingham, England, reported on a series of 210 biopsy-proven CD patients diagnosed and followed between 1941 and 1985.333 This series was originally reported by Harris in 1967,329 and reviewed in 1976330 and in 1989333 by Holmes. By this third publication, the authors had excluded all non-biopsy-proven cases of CD, as well as the cases of cancer that arose either prior to or within 12 months of diagnosis of CD. The follow-up was a minimum of 13 years, with 17.4 patient-years for men and 19.4 patient-years for women. Based on the original publication by Harris, we can assume that a large proportion of these patients (80% in Harris' series) were diagnosed with CD in adulthood. There were nine cases of NHL, compared with an expected 0.21, resulting in a SIR-NHL of 42.7 (95% CI: 19.6–81.4), which was the highest reported degree of risk for lymphoma among the controlled studies we identified.
Green et al.340 prospectively followed 381 patients with biopsy-proven CD from New York City, most of whom were of European descent, and diagnosed between 1981 and 2000. The mean age at CD diagnosis was 44 +/- 18 years, and the duration of CD-related symptoms prior to diagnosis was 5 +/- 8 years. The mean follow-up was 6 +/- 11 years, for a total of 1,977 patient-years following the diagnosis of CD. There were a total of nine cases of NHL, occurring any time before or after the diagnosis of CD, leading to an attributable risk of NHL from CD of 120.2 cases per 100,000 patient years. The SIR-NHL, diagnosed at any time, was 9.1 (95% CI: 4.7–13), and the SIR-NHL for any lymphoma diagnosed at least one month after the diagnosis of CD was 6.2 (95% CI: 2.9–14).
Cottone et al.335 reported on 228 patients with biopsy-proven CD who were followed from 1980 to 1997 at a large referral center in Sicily. Ninety-eight percent of the patients had been diagnosed with CD during adulthood, and the mean age at diagnosis was 34.7 years. The mean duration of follow-up was 6 years (range: 1 month to 17 years). No case of refractory CD was mentioned. There were seven cases of NHL, compared with an expected number of 1.824 (SIR-NHL 3.75, p<0.01). The cumulative incidence of NHL was 3%, compared with an expected 0.8%, leading to a risk difference or AR of 2.2%. The mean age at diagnosis of lymphoma was 59.4 years, and the mean time from the diagnosis of CD was 6.5 years. Lymphomas occurring prior to or within 6 months of CD diagnosis were excluded.
A large Italian multicenter study by Corrao et al.339 prospectively followed 1,072 patients with CD from 1962 to 1994, totaling 6,444 patient-years. The mean follow-up was 6 years, and all patients were diagnosed with CD during adulthood (mean age at diagnosis of CD 35.7 years). The outcomes were measured strictly in terms of mortality data, i.e., mortality from NHL and from all causes. Events occurring at the time of CD diagnosis were included. There were 16 deaths from NHL. The SMR-NHL was 69.3 (95% CI: 40.7–112.6), whereas the SMR for death from all causes (SMR-all causes) was 2.0 (95% CI: 1.5–2.7), showing that the risk of death from NHL in CD is disproportionately elevated.
Selby et al.341 reported on a series of 93 patients with CD who were followed at a single institution in Australia between 1959 and 1978, for a mean duration of 6 years. Patients presented during either their teenage years or adulthood, all were symptomatic at the time of diagnosis, and there were no refractory cases. There were four patients with NHL (simultaneous CD and lymphoma diagnosis included), compared with an expected 0.081 (SIR-NHL 4.94, p<0.0005).
Collin et al.338 reported on a prospective cohort of 383 patients with CD, diagnosed and followed at a single institution over the 1970-93 period, for a mean follow-up of 8.1 years (3,107 patient-years in total). The mean age at diagnosis was relatively advanced: 41.8 years, with a range of 16 to 78 years. Seventy-five percent of the patients adhered to a strict GFD and 82% of patients were symptomatic at the time of CD diagnosis. Simultaneous lymphoma and CD diagnoses were not excluded. There was a single case of lymphoma, compared with an expected 0.4 (SIR-NHL 2.66 [95% CI: 0.07–14.8]). In addition, the 10- and 15-year survival of CD patients did not differ significantly from that of the general population.
Large database and register series: Logan et al.336 reviewed the death certificates of CD patients belonging to a comprehensive register of CD patients that has existed in Scotland since 1979, constituting a cohort of 653 CD patients gathered from 1979 to 1986. There were 17 deaths attributed to lymphoma, compared with an expected 0.55. Both Hodgkin's lymphoma and NHL were included, as were lymphomas diagnosed simultaneously with CD. The SMR-lymphoma was 31 (p<0.001), which was disproportionately increased compared with the SMR-all causes, which was 1.9 (95% CI: 1.5–2.2).
Askling et al.337 reported on the largest CD patient cohort (n=11,019), gathered from a comprehensive Swedish database of hospital discharge diagnoses from 1964 to 1994. It was not possible to ascertain how the diagnosis of CD was made or confirmed. The mean age at diagnosis of CD was 17.4 years (range 0 to >70), and the mean follow-up was 9.8 years (range 0–32), for a total of 97,236 patient-years. Outcomes were ascertained through the Swedish cancer register, as well as the register of causes of death. Lymphomas arising prior to or within 12 months of CD diagnosis were excluded, as were incident lymphomas found at autopsy. There were 38 cases of NHL, and a SIR-NHL of 6.3 (95% CI: 4.2–125) was calculated. The SMR-NHL was 11.4 (95% CI: 7.8–16), which was disproportionately elevated compared with the SMR-all causes (2.0 [95% CI: 1.8–2.1]).
Delco et al.342 used the database of discharge diagnoses from all US Veterans Affairs hospitals to gather a total of 458 CD patients, hospitalized between 1986 and 1995. The concomitant diagnoses received by those patients were compared with those of five controls per CD patient, randomly selected from the same year's discharge database (total 2,692 controls). The mean age of the CD group was 63.8 +/- 12.4 years and the mean age of the control group was 59.7 +/- 14.8 years (p<0.001). Ninety-three percent of the patients with CD were white, compared with 74% of the control subjects (p<0.0001). The odds ratio (OR) of NHL (OR-NHL) in CD was 4.53 (2.01–10.23).
The impact of GFD compliance was analyzed and reported in only two of the nine studies. Holmes et al.333 reported the SIR of NHL in patients on a strict GFD (SIR 44.4) versus those who did not adhere to a GFD (SIR 100). Corrao et al.339 observed that the mortality from all causes was lower in patients on a strict GFD than in those who were unlikely to be GFD-compliant (SMR 0.5 [95% CI: 0.2–1.1] and 6.0 [95% CI: 4.0–8.8], respectively). Although compliance could not be directly ascertained in the study by Askling,337 the SIR of lymphoma 1 to 4 years after diagnosis was 9.7 (95% CI: 6.3–14), whereas it dropped to 3.8 (95% CI: 2.2–6) five or more years after diagnosis, suggesting that the risk of lymphoma decreases over time on a GFD.
The mode of presentation leading to the diagnosis of CD was not commonly reported. The reports from Italy335, 339 were unique in that they both detailed the circumstances in which CD was diagnosed, portraying their cohorts as largely asymptomatic, since 45%335 and 70%339 of their patients had subclinical presentations, i.e., either mild symptoms or anemia, or were detected through screening. Conversely, it is reasonable to suggest that the studies that used hospital discharge diagnoses of CD as entry criteria would be largely made up of symptomatic CD patients. Unfortunately, it is not possible to compare the measured risk of lymphoma in the Italian studies with that in the other reports, because of the great disparities in populations, data collection and analyses among them.
The presence or absence of symptoms at the time of CD diagnosis was not evaluated as a risk factor for lymphoma per se. Corrao et al.339 did, however, analyze the impact of the mode of presentation on the mortality from all causes in CD. They observed that patients diagnosed with mild symptoms or by antibody screening did not show any relevant excess mortality, compared with the symptomatic group (SMR 1.2 [95% CI: 0.1–7.0] and 2.5 [95% CI: 1.8–3.4], respectively).339
Several studies analyzed the risk of lymphoma with respect to the age at diagnosis of CD. Patients who were diagnosed with CD during adulthood were either 1) asymptomatic during childhood or 2) symptomatic but eluded the diagnosis. For the latter circumstance, authors have referred to “diagnostic delay” as a symptomatic period in the absence of diagnosis or treatment. The impact of the diagnostic delay was analyzed in two studies.336, 339 Corrao et al.339 compared the mortality from all causes in patients who had suffered a diagnostic delay of more than 10 years, one to 10 years, or less than 1 year (no diagnostic delay), and found that the longer the untreated symptomatic period, the greater the mortality from all causes (SMR 3.8 [95% CI: 2.2–6.4], 2.6 [95% CI: 1.6–4.1], and 1.5 [95% CI: 0.9–2.3], respectively). Logan et al.,336 on the other hand, reported opposite results: while the SMR-all causes was significantly greater than 1 for their entire cohort (1.9 [95% CI: 1.5–2.2]), for those CD patients diagnosed only in adult life despite an obvious childhood illness typical of CD, all-cause mortality was similar to that of other CD patients diagnosed in adult life. A difference in methodology might explain this discrepancy, since the ascertainment of outcomes was derived from registers in Logan's study and was probably not as accurate and reliable for outcomes such as the presence or absence of symptoms during childhood.
Logan et al.336 also reported that the all-cause mortality was increased in the patients diagnosed as adults, but not those who were diagnosed as children (SMR 1.9 [95% CI: 1.5–2.3] and 1.4 [95% CI: 0.4–3.7], respectively).
The patients from Corrao's cohort were exclusively diagnosed with CD as adults. The SMR-all causes for patients diagnosed between 18 and 29 years was not significantly different from 1.0, in contrast with those who were diagnosed later in life, i.e., 2.5 (95% CI: 0.5–7.3) for those diagnosed at age 18 to 29 years versus 2.4 (95% CI: 1.3–4.0) for those diagnosed at age 30 to 49 years and 1.9 (95% CI: 1.3–2.6) for those diagnosed at age >50 years.
Askling et al.345 reported on 11,019 patients with CD, diagnosed at all ages, and found that the SIR-NHL was not significantly greater than one in CD patients who were diagnosed during childhood, in contrast with those who were diagnosed as adults (SIR-NHL 1.9 [95% CI: 0.4–5.5] for diagnoses made at ages 0 to 19 years compared with 7.7 [95% CI: 4.9–12] for those diagnosed between 20 and 59 years). Part of the increased risk in adults may be explained by the fact that in some of these cases the diagnosis of lymphoma can be made simultaneously with or soon after that of CD. However, cases of lymphoma diagnosed within 12 months of CD diagnosis were excluded from Askling's study, so that the risk of lymphoma in adult CD diagnosis remains elevated independently of cases with simultaneous presentation.
We were unable to identify a single source of controlled data on the risk of lymphoma in refractory CD. There was one indirect source of controlled evidence on the mortality in CD. Nielsen et al.,343 from Denmark, published the mortality data from 98 patients with CD diagnosed between 1964 and 1982, 24% of whom were treated with prednisone because they did not respond to a GFD, i.e., probable refractory CD. The mortality in CD exceeded that of the general population (controlled for age and sex) by a factor of 3.4 (p<0.025); in GFD-responders, this factor was 2.2 (p<0.025), whereas it was 5.8 (p<0.005) in the non-responders. The causes of death were poorly documented and therefore will not be described here.
The search strategy did not identify any studies that would allow us to address the specific benefits and harms of testing with different strategies for CD. The consequences such as false-positive results were dealt with in Celiac 1. We address the response to treatment in the sections that follow.
For the consequence of osteoporosis/fracture, an additional search was conducted with the search terms osteoporosis and CD, and five additional relevant studies were identified.388–392
The consequences that were included in this review were: 1) costs, 2) patients complying with treatment, 3) response to treatment in terms of symptoms, and 4) clinical outcomes such as reduced risk of complications (osteoporosis, mortality, anemia).
Given the recent recognition that the number of subclinical and silent CD cases may be eight times that of classically symptomatic cases, it is important to determine if the clinical outcomes vary according to type of clinical presentation. Where possible, results of the analysis according to type of clinical presentation are presented.
Most papers included in the consequences of testing for CD dealt with patients (who were newly diagnosed) after they initiated a GFD. Most studies evaluating the consequences of nutritional status were before/after studies. In total, 15 studies dealing with either nutritional status, weight, body mass index (BMI) and body composition, were identified.346–350, 352, 357, 359, 361, 363–365, 369–371
Seven studies were case control,347–349, 352, 357, 364, 369 one a cohort study,346 and in seven studies, the patients acted as their own control group.350, 359, 361, 363, 365, 370, 371
Eight studies were based on children with CD,347, 349, 350, 352, 357, 359, 369, 370 three studies were based on adolescents with CD,363–365 and four studies were based on adults with CD.346, 348, 361, 371
There were five studies that evaluated costs of screening as a consequence.360, 366, 379, 380, 382
Type 1 diabetes and CD. Four studies evaluated diabetes and CD in children.347, 357, 359, 370 Three studies were from Europe (UK,347 Hungary,359 and Finland370) and one was from Australia.357 Two were case-control studies347, 357 and two studies had patients with CD act as their own controls.359, 370 All the studies assessed the effect of a GFD (range 3–12 months) on the control of type 1 diabetes.
The UK study347 evaluated 230 children with type 1 diabetes who were screened for CD with serology. Those children with positive serology were biopsied. Eleven children were diagnosed with CD and followed longitudinally. The control subjects were the children diagnosed with type 1 diabetes with negative serology. The controls were matched for age, sex and duration of diabetes in a 2:1 ratio (22 controls:11 cases). At baseline, the weight (standard deviation score; SDS), BMI SDS and HbA1c of the cases were statistically lower than those of the controls. No statistical difference was noted for height SDS, C-peptide level and insulin requirements. Also, the cases (type 1 diabetes with positive CD serology) received significantly less intensive insulin regimens compared with controls. Six type 1 diabetic children with CD participated in the GFD. After 12 months of a GFD, the difference seen in the BMI SDS between the cases and controls was reversed. HbA1c levels did not improve significantly on a GFD. Insulin dose requirements increased for both cases and controls, but still did not significantly differ from each other. Insulin regimens were not statistically different between cases and controls after a GFD.
The Australian study357 included children and adolescents with coexisting type 1 diabetes and CD, who were identified from a database of the Diabetes Center at the Royal Alexandra Hospital for Children. CD had to be biopsy-proven. Twenty patients (5M:15F) were enrolled out of 36 patients identified in the database. Forty control patients from the same database were matched for age, sex and duration of IDDM. No explicit criteria for screening the database were given in the study. At baseline, the current height SDS, current weight SDS, BMI SDS and HbA1c were not significantly different from controls. Compliance with a GFD was based on dietary records classifying patients' diets as having no detectable gluten, a trace of gluten, or being gluten-containing. For compliance, 30% of patients were classified as adhering to a strict GFD, 30% consumed trace amounts of gluten, and 40% had a significant amount of gluten in their diet. No differences were detected in growth parameters or HbA1c according to compliance with a GFD.
The Hungarian study359 included 205 children with type 1 diabetes who were randomly selected from screening for CD. None of these patients was suspected of having CD. Twenty-four children were positive for EMA, and 17 (7 boys and 10 girls) had subtotal villous atrophy. At baseline, the height of the children with CD and type 1 diabetes was normal compared with that of children with only type 1 diabetes, but the BMI of the 17 children was significantly lower (14.2 vs 16.3 kg/m2) than that of controls. After three months of a GFD, BMI significantly increased (14.2 vs 16.8 kg/m2). Furthermore, significant increases in insulin requirements (0.64 U/kg vs 0.48 U/kg) occurred after a GFD. The percentage of HbA1c did not change on a GFD compared with baseline (7.82% versus 7.67%).
The study from Finland by Saukkonen et al.370 retrospectively screened 776 children with type 1 diabetes over a 2.7-year period with serology and, if positive, jejunal biopsy. Eighteen children (2.3%) had confirmed CD. HbA1c levels did not change after introduction of a GFD. Correlation of height SDS and mean weight for height were not compared post-GFD.
Body composition and anthropometrics. Six studies specifically detailed body composition after a GFD.348–350, 352, 369, 371 Of these studies, four examined children,349, 350, 352, 369 and two included adults.348, 371
Of the studies conducted in adult patients with CD, one was from Italy348 and the other from Argentina.371 In the Italian case-control study, 212 treated patients with histologically-confirmed CD were assessed. Of these, 71 (33.4%) (51 women and 20 men) were asymptomatic, had maintained a constant body weight during the previous 6 months, and were on a strict GFD. Forty-three of the patients were diagnosed as children (28 women and 15 men; average age 5.2 years) and 28 were diagnosed as adults (23 women and 5 men; average age 28 years). The average duration of GFD consumption was ≥ 2 years. For each patient, there were two sex- and age-matched healthy controls (142 controls). Body composition was calculated by means of DEXA. The weight and BMI of female CD patients were lower than those of the controls (55.5 kg vs 58.7 kg, p=0.004 and 20.9 kg/m2 vs 22.4 kg/m2, p=0.03). The height and BMD were not significantly different, although BMD for those diagnosed as adults was lower than in controls. Fat mass (22.9% vs 27.5%, p<0.05) and lean mass (38.8% vs 40.5%, p<0.03) were also significantly lower in cases versus controls. The weight (69.2 kg vs 73.3 kg, p=0.03), height (175 cm vs 178 cm, p=0.05) and BMI (21.9 kg/m2 vs 23.5 kg/m2, p=0.05) of male patients were significantly lower than in controls. Fat mass (13.9% versus 16.8%, p<0.05) and lean mass (55.5% versus 56.7%, p<0.03) were also significantly lower than in controls.
The study from Argentina by Smecuol et al.371 enrolled 47 (41 females, 6 males) unselected, consecutive patients with newly diagnosed CD (diagnosed between Sept 1991 and Oct 1993). Twenty-five patients were re-evaluated in 1995 (24 females and 1 male). The diagnosis of CD was based on clinical features of classic and atypical symptoms, with positive small bowel biopsy and positive serology. Three patients were asymptomatic; the rest had classical features of CD. After 12 months, all patients on an initial GFD improved. In the study, the patients acted as their own control: 15 patients adhered strictly to the GFD, while ten were on a partial GFD. Patients on a strict GFD consumed fewer calories than patients who were poor compliers (p<0.05). After treatment, fat mass (18.2 kg, p<0.0001) and bone mass (2 kg/m2, p<0.002) increased significantly. Lean tissue mass did not increase. Body weight (55.7 kg, p<0.0001), BMI (22.2 kg/m2, p<0.001) and triceps skinfold thickness (15.8, p<0.0001) increased significantly; mid-arm muscle circumference and muscle mass did not change. Patients who more strictly adhered to the GFD tended to demonstrate greater increases, although the trend was not significant.
Of the four studies that evaluated children, two were from Italy349, 352, one was from the Netherlands,350 and one was from India.369 Both Italian studies were case-control studies, whereas, in the Netherlands study, the patients acted as their own control. In one of the Italian studies by Barera et al.,349 29 consecutive children (14 boys and 15 girls) with a diagnosis of CD were enrolled (mean age 9.54 ± 3.42 yr). Diagnosis was according to ESPGAN criteria. Four patients had classic symptoms, while the rest had atypical CD. The patients were studied over 1.02 ± 0.15 years of GFD. Each patient was age- and sex-matched to a healthy control patient (n=29). At baseline, children with CD weighed less than the controls (28.3 ± 11 kg vs 34.5 ± 14.1 kg, p=0.04), had lower lean mass of limbs (8.4 ± 4.8 kg vs 10.8 ± 4.7 kg, p=0.0013), less fat mass (4.6 ± 3.5 kg vs 7.5 ± 4.9 kg, p=0.006), less percentage of fat mass (17.4 ± 8.3% vs 23.7 ± 8.4%, p=0.002) and lower bone mineral content (1067.2 ± 451.3 g vs 1317 ± 553.8 g, p=0.006). Height, BMI, lean mass, and ratio of lean mass to height, did not differ from controls at baseline. After an average of 1 year on a GFD in 23 children, no significant differences were found in weight, height, BMI, lean mass, lean mass to height, lean mass of limbs, fat mass, percentage of fat mass or bone mineral content (BMC), compared with controls. Compliance was good in all patients as assessed by EMA (only three subjects were still positive).
The second Italian study by Rea et al.352 enrolled 23 children (8 boys and 15 girls, mean age 4.7 ± 0.76 yr) from Jan 1992 to Dec 1994, according to ESPGAN criteria. They were sex- and age-matched to healthy controls from the ambulatory clinic. At baseline, the height, BMC, arm muscle area (AMA), triceps skinfold (TSF), subscapular skinfold (SSF), and fat area index (FAI) were significantly lower than in controls. The BMI and weight for height index (WHI) were not different. After GFD, all the parameters improved compared with the patients' values before GFD. Height, BMC, AMA, BMI, TSF, SSF, FAI and WHI all significantly improved. When patients post-GFD were compared with controls, the height was still significantly lower (p=0.01) but the differences in the rest of the values were not significant. After a GFD, the blood chemistry of these patients was assessed. The hemoglobin, iron, protein, albumin, triglycerides, calcium, and zinc levels were significantly different from the baseline value; however, transferrin, cholesterol, phosphorus and alkaline phosphatase levels were not different.
The study from the Netherlands by Boersma et al.350 enrolled 28 children (9 boys and 19 girls) with newly diagnosed CD (between Jan 1994 and Jan 1995). All children had classic symptoms and had positive small bowel biopsies. After 3 years of a GFD, the BMI SDS and height SDS improved significantly (p<0.0001 for both). Most of the improvement in BMI SDS was seen in the first 6 months, with subsequent gradual improvement. The height SDS improved continuously over the 3-year period.
In a study from India by Poddar et al.,369 104 children evaluated for CD between Sept 1997 and Dec 1998 were included. All children had diarrhea, failure to thrive or pallor as a clinical presentation. Fifty-seven were diagnosed as having CD (by modified ESPGAN score) and the remaining 47 were controls. Seven children who did not respond to a GFD were diagnosed with other diseases and were excluded. The mean follow-up of patients after starting a GFD was 19.6 ± 8 months (range 4–36 months). The remaining 50 children had a dramatic response to the GFD. Symptoms subsided in 16 ± 9.8 days (range 4–30) and all showed significant weight gain (66% ± 14% vs 86% ± 11% of expected, p<0.001). Height improved, but not significantly (88 ± 5% vs 94 ± 5% of expected, p=not significant). Seventeen percent of the children had poor compliance with the GFD. No attempt at subdividing patients into poor versus good compliance was made.
Nutritional status. Two studies looked at nutritional status with biochemical markers.
In the study from Finland by Kemppainen,346 the nutritional status of newly diagnosed patients with CD before and after GFD was reported. Forty patients with CD diagnosed between Nov 1988 and Dec 1990 were included. All had abdominal symptoms. Diagnosis was made on presence of partial villous atrophy (eight patients), subtotal villous atrophy (17 patients) or total villous atrophy (15 patients). On mean histomorphometric index, there was a statistically significant trend (p=0.004) comparing partial villous atrophy (0.018 ± 0.003), subtotal villous atrophy (0.0015 ± 0.002) and total villous atrophy (0.013 ± 0.002). When biochemical measurements were examined according to grade of villous atrophy, significant differences were seen for ferritin (p<0.01) and transferrin (p<0.05). Serum ferritin was still significantly lower in total villous atrophy, as were erythrocyte folate levels, if sex was standardized in an analysis of variance. Severity of villous atrophy also correlated with ferritin, erythrocyte folate, and serum vitamin B12. Abnormal values of serum protein, vitamin A, and vitamin B12 were low. There were no abnormal vitamin E levels. Villous atrophy improved in all patients within 12 months of a GFD. Two patients had subtotal villous atrophy, 29 had partial villous atrophy and three had normal villi after a GFD. Six patients withdrew from the study. BMI increased after a GFD, as did most of the biochemical measurements. One patient with subtotal villous atrophy still had a low hemoglobin value. Of the 29 patients with partial villous atrophy, three had low folate levels, seven had low hemoglobin, one had low vitamin B12, one had low protein, five had low vitamin A, five were low in ferritin, five had low iron, and ten patients had low zinc levels. Only one patient (out of three) who had normal villi also had low hemoglobin levels.
In the study from Italy, by Bardella et al.,361 26 adults (five male and 21 female, mean age 42.2, range 22–81) with malabsorption and biopsy-confirmed CD were enrolled. They were followed for a mean of 55.4 months (range 13–137 months) on a GFD. Eight patients remained in good health with normal blood tests. The remaining 18 patients had abnormalities despite GFD. No correlation was noted with severity of symptoms of malabsorption and biochemical abnormalities. Iron deficiency was found in five patients. Abnormal calcium, phosphorus, alkaline phosphatase and/or bone density was found in seven patients. Macrocytic anemia was found in four patients. Clinical symptoms were seen in 11 patients. No correlations between abnormal values and grade of histology on biopsy were found.
Compliance. Three studies were identified that looked at compliance.363–365 All studies were conducted in Italy and assessed an adolescent population.
In the first study of adolescents that looked at dietary compliance, Fabiani et al.363 evaluated 28 biopsy-proven CD patients (17 females and 11 males). These 28 adolescents were selected from a group of 6,315 students, age 11 to 14 years, who had previously been screened for CD. All were advised to start a GFD. Twenty-three of the 28 patients participated in this study. The mean follow-up duration was 23 ± 7 months (range 9–3 months). Fifty-two percent (12/23) were on a strict GFD and 47% (11/23) partially adhered to the diet. Improvement in most patients was seen after starting a GFD. Weight gain was reported in 12 patients (52%)—11 had increased height velocity and appetite, eight had disappearance of symptoms of abdominal pain, six had resolution of diarrhea, five had disappearance of anemia and three had disappearance of recurrent aphthous stomatitis. Three patients did not demonstrate any change.
The second study, also by Fabiani,364 was a 5-year case-control study that enrolled two groups of patients. The first group (group A) included subjects between the ages of 11 and 14 years who were diagnosed as a result of a mass screening program. The second group (group B) comprised patients diagnosed due to typical symptoms of CD between 1985 and 1986. All patients had biopsy-proven CD according to ESPGAN criteria. All patients were followed for 5 years and advised to start a GFD. Twenty-seven patients were in group A and 22 agreed to participate; 24 patients were in group B and 22 agreed to participate. There were no differences between the patients in group A and group B in terms of BMI and height SDS. No difference was found between the two groups in terms of symptoms. Adherence to the treatment was significantly lower in patients from group A compared with group B. A significantly greater proportion of patients in group B demonstrated strict adherence to a GFD (15/22; 68%) compared with patients in group A (5/22; 23%).
The third study to look at compliance included 306 teenage patients with CD (mean age 15.9 yr; range 10–27 yr) recruited consecutively from a CD clinic.365 Of the patients, 186 (60%) were female and 120 were male. Diagnosis of CD was biopsy confirmed. A recall questionnaire was used to evaluate diet and compliance. Compliance was recorded in three categories: 1) strict gluten-free diet (n=223 [73%]); 2) occasional relapse (n=46 [15%]); and 3) gluten-containing diet (n=37 [12%]). Eighty percent of the female patients, compared with 64.2% of the male patients, adhered to a strict diet (p=0.012). Compliance also varied with age, with older age associated with less compliance (p=0.05). Growth status was grouped according to compliance with a GFD: the mean standardized height, the relative weight for age, and the relative weight for height did not differ significantly between the compliance groups. Symptom scores were relatively good among all groups. No statistically significant differences were noted. School performance was not significantly different between good versus poor compliers.
Costs. Five studies included an assessment of costs involved in different screening strategies.360, 366, 379, 380, 382
Harewood et al.366 performed a decision analysis to compare costs of serological testing versus small bowel biopsy (AGA vs EMA versus small bowel biopsy) for diagnosis of CD. The analytic technique used was a cost minimization and the viewpoint was third-party payer. A sensitivity analysis was conducted. The authors demonstrated that initial screening with EMA is the least costly strategy for diagnosis in a low to medium risk population.
Gomez et al.382 evaluated a screening algorithm for CD in 1,000 consecutive subjects who were screened while attending a central laboratory. Gomez and colleagues compared two screening protocols: (1) a three-level screen, with IgG/IgA-AGA antibodies at the first level, then IgA-EMA, and finally intestinal biopsy; versus (2) tTG-GP and total IgA as the first-line screen, with EMA for positive patients followed by intestinal biopsy. The analytic framework and viewpoint were not stated. In this study, a comparative cost analysis was performed. They found that the combination of a highly sensitive test at the first step with a highly specific test at the second step appears to be a more reliable screening mechanism.
Zaccari et al.,379 in an Italian model, proposed a four-level screening protocol for children at least 15 months of age, including: 1) AGA, 2) EMA, 3) intestinal permeability, and 4) small bowel biopsy. In this study, they evaluated only the total costs at each level of screening.
Atkinson et al.,360 in a Canadian study, evaluated the operating costs of EMA in the diagnosis of CD using a cost-minimization model with a decision analytic approach with three strategies. The analytic perspective used was the societal viewpoint, and costs were discounted at 5% per annum. A one-way sensitivity analysis of all probability and cost estimates was performed. Incremental costs of the GFD were estimated from a survey of 25 patients which resulted in a lifetime incremental cost of $44,000. If a small bowel biopsy was performed initially, the cost was $997; for EMA followed by small bowel biopsy, the cost was $866. The total cost was $3,714, which resulted in an incremental cost savings of $2,177 if small bowel biopsy had been performed first. In the sensitivity analysis, the specificity of EMA would have to be greater than 95% to make EMA least expensive.
There were 27 studies that examined the response of various endpoints to a GFD.
One Italian study,354 used a case-control design to evaluate the effect of a GFD on thyroid status. The study by Annibale et al.,358 evaluated the impact of a GFD on anemia and iron deficiency in newly diagnosed CD cases identified from screening of adults with IDA in Italy. In a case-control study, Ciacci et al.351 investigated the impact of a GFD on pregnancy outcomes, and Addolorato et al.374 evaluated the impact of a GFD on anxiety and depression in a population of CD patients in Italy. Mortality was evaluated in seven cohort studies.331, 335, 336, 343, 362, 367, 368 Seventeen studies assessed either change in BMD or fracture as an endpoint in individuals with CD.
Thyroid study. In the Italian study,354 241 consecutive adults with biopsy-confirmed CD were enrolled between Jan 1996 and July 1998 (177 women and 64 men). Forty percent of patients had classical symptoms, 44% had atypical symptoms and 16% had silent CD. Two hundred and twelve patients, matched for age, sex and ethnic origin, were used as controls. All newly-diagnosed CD patients were started on a GFD and patients with hypo- or hyperthyroidism were started on appropriate medical therapy. Thyroid dysfunction was found in 73 (61 women and 12 men) of 241 patients with CD, and in 24 (19 women and 5 men) of the 212 patients in the control group (p<0.0005). The difference was statistically significant for women when divided by sex (p<0.0005). Hypothyroidism was diagnosed in 31 patients (12.9%) and nine controls (4.2%) (p<0.003); it was subclinical in 29 CD patients and eight controls and overt in the remaining patients. The difference was only significant for women (p=0.0045). Twenty-one patients and four controls had non-autoimmune hypothyroidism. Ten patients and five controls had autoimmune hypothyroidism. Hyperthyroidism was diagnosed in three patients and seven controls; it was subclinical in two patients and five controls. Autoimmune thyroid disease with euthyroidism was present in 39 patients and eight controls. The difference was only statistically significant in women (p<0.0005). At diagnosis, the BMI, hemoglobin, iron, and albumin levels were similar between patients with thyroid disease and those without. After 1 year of a GFD, 128 patients were reassessed. Ninety-one patients had normal thyroid function, whereas 37 had some impairment. Compliance with the diet was not different between the two groups. Subclinical hypothyroidism improved in 10/14 patients with non-autoimmune hypothyroidism.
Three of five patients with autoimmune hypothyroidism shifted to autoimmune thyroid disease with euthyroidism; four out of five patients with no improvement in thyroid function had poor compliance with diet. Significant improvement in nutritional indices was also seen with BMI in females, HBG in both sexes, and serum albumin and serum iron in both sexes.
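As an arithmetic check on the thyroid results, the reported percentages can be recomputed from the counts given in the text. The sketch below is illustrative only: the function names are hypothetical, and the crude risk ratio it prints is an unadjusted figure derived here from the published counts, not a statistic reported by the study.

```python
def percent(events: int, total: int) -> float:
    """Return events as a percentage of total."""
    return 100.0 * events / total


def crude_risk_ratio(events_a: int, total_a: int,
                     events_b: int, total_b: int) -> float:
    """Unadjusted ratio of the risk in group A to the risk in group B."""
    return (events_a / total_a) / (events_b / total_b)


# Counts as stated in the text: 241 CD patients and 212 controls.
CD_TOTAL, CONTROL_TOTAL = 241, 212

# Hypothyroidism: 31 patients vs 9 controls (reported as 12.9% vs 4.2%).
hypo_cd = percent(31, CD_TOTAL)
hypo_controls = percent(9, CONTROL_TOTAL)

# Any thyroid dysfunction: 73 patients vs 24 controls.
# The ratio below is derived here for illustration, not taken from the report.
dysfunction_rr = crude_risk_ratio(73, CD_TOTAL, 24, CONTROL_TOTAL)

print(f"Hypothyroidism: {hypo_cd:.1f}% of CD patients, {hypo_controls:.1f}% of controls")
print(f"Crude risk ratio for any thyroid dysfunction: {dysfunction_rr:.2f}")
```

The recomputed hypothyroidism percentages agree with the figures quoted in the text. A crude ratio of this kind ignores the matching on age, sex and ethnic origin and should be read only as a rough summary.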
Iron deficiency. In this Italian prospective study,358 190 consecutive patients (160 women and 30 men) who were referred to the GI department from hematology for IDA between Jan 1994 and May 1997 were examined. Twenty-six patients were diagnosed with CD (24 women and 2 men); average age 31.3 years (range 20–72). Seventy-seven percent of patients had total villous atrophy and 23% had subtotal atrophy; repeat endoscopy with biopsy specimens was performed after 6 months. After GFD, 20 patients (18 women and 2 men) were followed for 24 months. After 6 months, 14 of the 18 female patients (77%) recovered from IDA. Only 5/18 reversed from iron deficiency as defined by normal ferritin levels. At 12 months, 17/18 recovered from IDA. Nine patients reversed from iron deficiency. After 24 months, the same patient still did not reverse from IDA. Ten patients (55%) reversed their iron deficiency. Of the two males, at 6 months of a GFD, only one recovered from anemia but not from iron deficiency (low ferritin). At 12 months, both patients reversed their anemia and iron deficiency. At 24 months, further increases in ferritin were observed. In a subgroup of patients that had repeat small bowel biopsies at 6 and 12 months, there was a significant inverse correlation between increases in Hb concentrations and decreases in histological scores of duodenitis. This study demonstrated that recovery from IDA occurs within the first 6 to 12 months, but reversal from iron deficiency occurs in 50% of cases (predominantly premenopausal women). Long-term follow-up of ferritin results and small bowel biopsies in subjects with CD would be helpful to determine if iron deficiency resolves completely.
Pregnancy outcomes. In this case-control study from Italy by Ciacci et al.,351 297 women with CD were enrolled. Three types of analyses were used. Analysis A was a case-control study between untreated women (n=94; at least one pregnancy when symptoms of CD were present and led to eventual diagnosis) and treated CD women (n=31; at least one pregnancy after 1 year of a GFD). At baseline, weight, height and body mass index were similar between the two groups. However, the treated group was significantly younger than the untreated group (37.3 ± 12 yrs vs 22.4 ± 1.6 yrs, p<0.01), which may have biased the results. The number of pregnancies per woman was also lower for the treated group (2.72 ± 0.16 vs 1.6 ± 0.11, p<0.0001). The number of abortions per woman (0.489 ± 0.085 vs 0.032 ± 0.032, p<0.0001), as well as the abortion to pregnancy ratio, was much lower for the treated group compared with the untreated group (0.153 ± 0.027 vs 0.024 ± 0.024, p<0.005). Subgroup analysis taking into account the age at diagnosis demonstrated that for those women diagnosed at age 30 years or less (n=27), the number of abortions per woman was 0.556 ± 0.156 and the abortion to pregnancy ratio was 0.234 ± 0.066. The prevalence of abortion in pregnancies was 17.8% in untreated CD patients, compared with 2.4% in treated patients (p<0.001). The RR of abortion was 8.9. Low-birth-weight baby to pregnancy ratio (0.126 ± 0.037 vs 0.024 ± 0.024, p<0.03) was significantly lower in the treated group. The duration of breast feeding was significantly longer for the treated group (2.77 ± 0.52 vs 7.03 ± 1.17, p<0.0003). The threatened abortion to pregnancy ratio and premature delivery to pregnancy ratio were not significantly different between untreated and treated CD women. For the subgroup of women <30 years (n=27), birth weight, baby to pregnancy ratio, and duration of breast feeding, did not alter the statistical significance.
The prevalence of low birth weight babies in nonabortive pregnancies was 12.7% for untreated patients and 2.4% for treated patients (p<0.05). The RR of low birth weight babies was 5.84 times greater in the untreated group compared with the treated group.
In Analysis B, women with CD were all untreated and were analyzed according to whether diarrhea was present or not. The authors found that the abortion to pregnancy ratio and the premature delivery ratio were lower in CD women without diarrhea compared with those women with diarrhea, although the difference was not statistically significant.
In Analysis C, the effect of a GFD on pregnancy outcome was analyzed. The study examined 12 women with CD after 1 year of a GFD (own control); there was at least one pregnancy without treatment. All outcomes were better in the group of women on the GFD: number of pregnancies 2.5 ± 1.24 versus 1.08 ± 0.29 (p<0.003); number of abortions per woman 1.08 ± 1.16 versus 0.08 ± 0.28 (p<0.02); abortion to pregnancy ratio 0.405 ± 0.140 versus 0.074 ± 0.280 (p<0.02); and low birth weight baby to pregnancy ratio 0.292 ± 0.129 versus 0 (p=0.05). The threatened abortion to pregnancy ratio, premature delivery to pregnancy ratio, and duration of breast feeding were not significantly different between the two groups. The prevalence of abortion was 43.3% for the untreated group, compared with 7.7% for the treated group of CD women (p<0.01). The RR of abortion was 9.18. There were no low birthweight babies born to women in the GFD group, whereas the prevalence of low weight babies was 29.4% in the untreated group (RR=11).
One of the limitations of the Ciacci et al. study was that it did not include an external control group or control for confounders. A historical cohort population-based study of the Danish Medical Birth Registry by Norgard et al. (1999)393 evaluated birth outcomes in women with CD. This study included 211 newborns born to 127 mothers with CD from 1977-1992 and compared them with 1,260 control deliveries. Women with CD were identified from hospital discharge diagnoses. Discharge records were linked to the Medical Birth Registry, which contained information on relevant outcomes. Outcomes included birthweight, low birthweight (<2500 g), pre-term birth (<37 wk), intrauterine growth retardation (birthweight <2500 g and gestational age ≥37 wk of pregnancy), and perinatal mortality. Potential confounders, including maternal age, infant's gender, parity, and gestational age, were adjusted for in the analyses. The investigators could not control for other confounders such as smoking. Another potential limitation is that the date of diagnosis of CD was taken as the initial time of discharge from hospital with CD. It is possible that women may have been initially diagnosed in the ambulatory care clinic. Details about the clinical presentation of the women with CD and biopsy findings were not available. The mean age at time of delivery was 27.5 years for women with CD and 26.3 years for control women.
Norgard et al.,393 found that before women were hospitalized for CD, they were at an increased risk of low birthweight babies (adjusted OR=2.6 [95% CI: 1.3–5.5]), and intrauterine growth retardation (12.3% vs 4.8% of controls; adjusted OR=3.4 [95% CI: 1.6–7.2]). After women with CD were first hospitalized, there was no increased risk of low birthweight babies (6% post diagnosis) or intrauterine growth retardation, when compared with controls. The results of this study have implications for women with undiagnosed (atypical or silent) CD.
Anxiety and depression. The study from Italy by Addolorato et al.374 enrolled 43 newly-diagnosed adult patients affected with classic CD, selected from 234 adult CD patients from an outpatient clinic between June 1995 and Oct 1998. No psychiatric disorders other than anxiety and/or depression were allowed. The diagnosis of CD was based on positive serology and biopsy. Of the 43 enrolled patients, eight dropped out, leaving 35 (14 males and 21 females, mean age 29.8 ± 7.4 yr) patients for analysis. After a period of 12 months of GFD treatment, the patients were analyzed. The adherence to a GFD was evaluated based on patient self-report and family member interview. A group of 59 healthy asymptomatic controls (27 males and 32 females, age 31.7 ± 6.9 yr) were matched for gender, age, residence, employment, socioeconomic and marital status. The psychological assessment was performed using a self-rating psychometric test for anxiety (State and Trait Anxiety Inventory test) and another for depression (Zung Self-Rating Depression Scale, SDS). Both tests were administered before and after GFD. Of the 59 controls, 23.7% showed high levels of anxiety, 15.2% showed trait anxiety, and 9.5% were positive for depression. Of the 35 untreated CD patients, 71.4% had high levels of anxiety, 25.7% showed trait anxiety and 57.1% were positive for depression. After 1 year of GFD, 25.7% had high levels of anxiety, 17.1% had trait anxiety, and 45.7% were still depressed. The levels of high anxiety (71.4% vs 23.7%, p<0.0001) and levels for depression (57.1% vs 9.6%, p<0.0001) were significantly higher in the CD patients than in the controls. The proportion of untreated CD patients with trait anxiety did not differ from controls. After a 1-year GFD, a significant decrease in high-state anxiety (71.4% vs 25.7%, p<0.001) was found when treated patients were compared with the untreated group. No significant differences were found for trait anxiety or depression.
All six studies were retrospective, and two were cohort studies.385, 388 Two studies included individuals that had biopsy-confirmed CD. All studies included controls as a comparator, and in three studies the controls appeared to be population-based.385, 388, 394 With regard to the ascertainment of the outcome of fracture, data was obtained from self-report data from administrative databases,394 patient register,385, 388, 394 or from interview/case reports.389, 390, 392 Only two studies mentioned inclusion of asymptomatic subjects.389, 392 Bone histology was mentioned as an outcome in a subset of patients in one study.390
The case-control study by Fickling and colleagues390 compared individuals with CD attending a GI outpatient department and/or members of local celiac societies. The authors found a higher prevalence of past history of fractures in the CD patients (21% [16/765]) compared with a control group (3% [2/75]; RR 7.0). There was no difference in BMD T-score results between those with and without a history of fracture, although those patients with a fracture history were older (p<0.02). Limitations of this study include the fact that the authors did not identify whether CD was biopsy-confirmed, and a potential for selection bias.
Thomason et al.,373 in a case-control study, used self-report data for 244 patients with biopsy-proven CD and found that fractures were not significantly increased in those with CD compared with controls (OR 1.05, 95% CI: 0.68–1.02), although there did seem to be a trend to increased wrist fractures (OR 1.21, 95% CI: 0.66–2.25). The mean age of these patients was older (60.2) and the mean BMI was higher (23.9) than that reported in other studies. However, this study may have been limited by potentially not having adequate power to detect fractures. In addition, all the fracture data was self-reported.
Vasquez et al.,389 in a retrospective case-control study, found that 25% (41/165) of CD patients had one to four fractures, compared with 8% in age- and sex-matched controls. The majority of fractures occurred prior to diagnosis of CD and the most common fracture site was the wrist (OR 3.5, 95% CI: 1.8–7.2). Potential sources of bias for this study include the fact that the cases were from a malabsorption clinic and may therefore represent patients with more severe disease (mean BMI=21.4). The OR for vertebral fractures was 2.8 (95% CI: 0.7–1.15), although there was incomplete ascertainment of X-rays, since not all X-rays were of adequate quality. This was the only study to include an assessment of the proportion of patients on a strict versus a reduced GFD.
Two studies were population-based.385, 388 Vestergaard et al.388 evaluated all individuals with CD in Denmark captured from hospital discharge data, and did not find an increase in fractures requiring hospitalization in patients with CD (n=1,021; 7,774 patient years) relative to controls (n=23; 316 patient years), with an incidence rate ratio (IRR) at pre-diagnosis of 0.70 (95% CI: 0.45–1.09) for all fractures. For spine, the IRR pre-diagnosis was 2.14 (95% CI: 0.70–6.57), and for rib and pelvis it was 1.07 (95% CI: 0.39–2.95). There are significant limitations to this study, since the diagnosis of fractures was hospital-based; fractures that did not require hospitalization would therefore be missed, which could lead to under-reporting. In addition, the diagnosis of CD was only validated in a sample of nine cases (with a validity of 78%), and all cases of CD had to be hospitalized to be included.
West et al.,385 in the largest analysis of fractures in CD patients identified from the UK GPRD primary care database, found an increase in fractures in CD patients relative to controls. The mean age at diagnosis was 43.5 years, and the ascertainment of fractures was from an administrative database. For any fracture, the hazard ratio was 1.3 (95% CI: 1.16–1.46; 137.9/10,000 patient years vs 105.9/10,000 patient years in controls). The hazard ratio for hip fracture was 1.9 (95% CI: 1.2–3.02) and the hazard ratio for wrist fracture was 1.77 (95% CI: 1.35–2.34). The absolute difference in the overall fracture rate was 3.2/1,000 person years and 0.97/1,000 for hip fractures in those older than age 45. In contrast to earlier studies, the authors did not find a difference in the risk of fracture after CD diagnosis compared with before diagnosis.
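The absolute difference quoted for West et al. follows directly from the two overall fracture rates. The sketch below reproduces that arithmetic; the variable and function names are hypothetical. The crude rate ratio it computes is shown only for comparison: the published hazard ratio comes from the authors' own model, which this sketch does not attempt to replicate.

```python
def rate_difference_per_1000(rate_a_per_10000: float,
                             rate_b_per_10000: float) -> float:
    """Absolute rate difference, converted from per 10,000 to per 1,000 person years."""
    return (rate_a_per_10000 - rate_b_per_10000) / 10.0


def crude_rate_ratio(rate_a: float, rate_b: float) -> float:
    """Unadjusted ratio of two rates expressed in the same units."""
    return rate_a / rate_b


# Overall fracture rates as stated in the text (per 10,000 patient years).
CD_RATE = 137.9
CONTROL_RATE = 105.9

difference = rate_difference_per_1000(CD_RATE, CONTROL_RATE)
ratio = crude_rate_ratio(CD_RATE, CONTROL_RATE)

print(f"Absolute difference: {difference:.1f} per 1,000 person years")
print(f"Crude rate ratio: {ratio:.2f}")
```

The difference of 32 per 10,000 corresponds to the 3.2 per 1,000 person years given in the text, and the crude ratio is close to the reported hazard ratio for any fracture.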
A recent case-control cross-sectional study by Moreno et al.392 compared fractures in 148 CD patients (53% classically symptomatic, 36% subclinical CD, and 11% silent CD detected by screening) to 296 controls (functional GI disorders). The fracture data was self-reported, obtained by interview and pre-designed questionnaire. Moreno et al. found an increased number of fractures in the peripheral skeleton for classically symptomatic subjects compared with controls, but did not find an increased number of fractures in the subjects with subclinical or silent CD.
BMD. BMD is a surrogate outcome for fracture, and it is easier to evaluate in short-term studies. Previous studies of osteoporosis therapies in postmenopausal osteoporosis have shown that there may not, however, be a direct correlation between fracture reduction and increases in BMD. Osteoporosis/osteopenia may be a sign of subclinical CD and persisting osteopenia/osteoporosis in a patient with known CD may be a sign that the mucosa has not normalized.
BMD is an areal two-dimensional measure of bone mass and does not give a true volumetric measure and, therefore, may not be an accurate reflection of bone mass in children.
We found 11 articles that addressed the outcome of BMD/BMC in newly diagnosed subjects with CD.348, 352, 353, 355, 356, 375–378, 386, 387 The study characteristics are summarized in the Evidence Tables (see Appendix H).
The majority of these studies assessed BMD at baseline and the percentage change after a variable follow-up period (1 to 5 years in duration). Two studies evaluated the BMD of children with CD,352, 377 one study evaluated a mixed population,348 and the remaining studies evaluated adults. All studies included individuals with biopsy-proven CD and in most of the studies BMD was compared with a control population. Only two studies had patients with CD act as their own controls.353, 376 The female to male prevalence ratio in CD is 2:1, and in these studies the proportion of females varied from 50% to 80%.
Five studies included assessments of dietary compliance to a GFD and three studies included data on whether subjects were on co-interventions (e.g., vitamin D or calcium), which may have impacted the BMD results. Only two studies356, 376 looked at the potential relationship between the change in histological grade on small bowel biopsy and change in BMD.
Prevalence of osteoporosis/osteopenia. The studies consistently found that BMD results were lower in untreated subjects with CD compared with controls. Regarding the prevalence of osteopenia/osteoporosis in newly diagnosed patients with CD, the estimates varied. Sategna-Guidetti et al.353 noted a mean Z-score of -1.5 at the lumbar spine and -1.8 at the femoral neck, with 34% of subjects having normal BMD, 40% having osteopenia and 26% osteoporosis. Valdimarsson et al.355 found the prevalence of severe osteopenia, as defined by a Z-score less than -2, to be 15% at the spine, 9% at the femoral neck, and 22% at the forearm. The prevalence of mild osteopenia (defined as -2 ≤ Z < -1) was 23% at the lumbar spine and 24% at the forearm. There was no difference in lumbar spine BMD between those patients who presented with malabsorption and those patients without malabsorption. Valdimarsson et al. found that 27% of subjects had secondary hyperparathyroidism. After 1 year on a GFD, the prevalence of those with severe osteopenia decreased from 23% to 14%.
In a recent review the authors pooled prevalence results and found that patients with untreated CD had a mean Z-score of -1.42, and a hip Z-score of -1.14.381
Valdimarsson et al.,356 in a prospective study of 105 newly-diagnosed CD patients, performed follow-up small bowel biopsies. Of the 105 subjects, 28 had secondary hyperparathyroidism. They found a greater reduction in BMD in individuals who had secondary hyperparathyroidism (PTH>65). In this group, the BMD increased significantly, but did not completely normalize after 3 years of a GFD. In contrast, in those with normal PTH at diagnosis, the baseline BMD was not as low and there was a 2.5% increase after 1 year with the BMD normalizing after 2 years of a GFD. Valdimarsson also noted that 22 patients with stage III-IV had lower median Z-scores than 76 patients with mucosal changes grade I-II. In this study, compliance with the GFD was 100% in those with high PTH, and lower at 87% in those with normal PTH levels.
Kemppainen et al.,376 in a 5-year cohort study of 28 patients in which the cases served as own controls, found that BMD increased or remained stable in 69% of patients at the lumbar spine and in 67% of patients at the femoral neck. In this study, the authors did not notice an effect of the grade of villous atrophy on the mean BMD values or percentage change in BMD. They also did not observe any correlation between adherence to the GFD and the change in BMD.
Bai,375 in a small cohort of 45 (25 completed) newly-diagnosed CD patients, assessed compliance with the GFD and found that 84% of patients increased their lumbar spine BMD (mean increase of 12%) and total body BMD (mean increase of 7.3%), compared with 151 control subjects. The greatest increase in BMD was noted within the first year. Bai375 documented prior fractures in two patients, but did not report any fractures during the 4-year follow-up period.
Sategna-Guidetti et al.,353 in a longitudinal study of 86 CD patients, noted a similar proportion of patients (83.7%) increased their spine BMD after 1 year, with an increase of 5.3% in LS BMD after 1 year (change in Z-score of 0.5 at the spine).
Ciacci et al.,386 in a retrospective cohort of 41 consecutively diagnosed patients with CD, noted a significant increase in BMD (14% lumbar spine, and 10.4% femoral neck), after 1 year on a GFD. The authors also found that pretreatment BMD predicted response to treatment.
Mustalahti et al.378 noted a significant increase in lumbar spine and femoral neck BMD with treatment after 1 year compared with controls, and noted that the BMD was lower in symptom-free patients (n=15), suggesting patients with silent CD may have mucosal lesions for longer periods of time.
Bardella,348 in a case-control study of 71 CD patients (43 who had started a GFD in childhood and 28 who were diagnosed as adults and were on a GFD and in remission), found that the BMD of the adult CD patients was significantly lower than the control value (0.9 g/cm2 vs 1.1 g/cm2, p<0.01).
McFarlane et al.,387 in a case-control study of 21 biopsy-confirmed subjects with CD, documented that the baseline lumbar spine BMD was 85% of that seen in controls, and the increase in BMD over the first year was 6.6% (95% CI: 3.1–10.1) in the lumbar spine and 5.5% in the femoral neck.
Children/adolescents. Mora et al.,377 in a study of 19 patients (211 controls), noted a lower BMD in CD patients versus controls at baseline, and an increase in total body BMD (using DXA) during the first year when compared with controls (15.2%).
Rea et al.,352 noted an improvement in forearm Z-score after 1 year on a GFD in 23 newly diagnosed children with CD.
Mortality. Seven cohort studies addressed mortality in CD. Two were Italian studies,335, 362 one was from Denmark,343 one was from Sweden,331 and three were from the UK.336, 367, 368
Cottone et al.335 evaluated mortality in a prospective cohort study of 228 biopsy-proven CD subjects in Sicily. Mortality was ascertained by reviewing hospital medical records and pathology specimens. Records were incomplete for 5% of patients. The mean age at diagnosis was 34.7 years and 100% of patients were on a GFD. Seventy-six percent were females. The clinical presentation was anemia in 60% of cases, malabsorption in 20% of cases, and asymptomatic in another 10% of cases. The mean follow-up was 73 months. Twelve deaths were observed, with 3.12 deaths expected and the SMR from all causes was 3.8 (95% CI: 1.9–6.7). The mortality rate was increased within the initial 4 years from diagnosis, giving an SMR of 5.8 (95% CI: 2.5–11.5).
Nielsen et al.343 from Denmark, conducted a retrospective cohort study of 98 CD patients between 1964-1982. Sixty-one percent of patients were females and the median age at diagnosis was 41 years (range 2 to 74 yrs). Twenty-four percent of patients had unclassified CD and were treated with prednisone, since they did not respond to a GFD and had probable refractory CD. Twenty-three deaths occurred during the study (four due to malignancy). Nielsen et al. found that the 5-year survival rate was 88%, the 10-year survival rate 68.5%, and that mortality exceeded that of age- and sex-matched controls in the general population by a factor of 3.4 (p<0.025). There was no difference in mortality between males and females (2.7 and 2.3, respectively). Subjects who responded to a GFD had an extra mortality factor of 2.2 (p<0.025), and those who did not respond to a GFD had an extra mortality factor of 5.8 (p<0.005). Causes of death were poorly documented.
Peters et al.,331 in a retrospective cohort study, compared 10,032 symptomatic subjects with CD who had been discharged at least once from hospital, to controls who were age/sex-matched for the calendar period cancer incidence rate. Fifty-nine percent were females. Mean follow-up was 9.8 years. Mortality was ascertained from a national death register. There were 828 deaths, with 419.3 expected, resulting in a SMR of 2 (95% CI: 1.8–2.1). Mortality risk decreased slightly with increasing number of years of follow-up (p for trend, 0.004). Mortality risks were increased for patients with NHL, cancer of the small intestine, autoimmune diseases (RA), allergic disorders, inflammatory bowel disorders, diabetes, and tuberculosis.
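The standardized mortality ratios (SMRs) quoted for Cottone et al. and Peters et al. are the ratio of observed to expected deaths. The sketch below recomputes them from the counts in the text. The confidence interval uses Byar's approximation to the Poisson distribution, which is an assumption made here for illustration; the cited studies may have used a different method, so the interval need not match the published one exactly. Function names are hypothetical.

```python
import math


def smr(observed: float, expected: float) -> float:
    """Standardized mortality ratio: observed deaths divided by expected deaths."""
    return observed / expected


def byar_ci(observed: int, expected: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for an SMR using Byar's approximation.

    Assumes the observed count is Poisson distributed and greater than zero.
    """
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))
    ) ** 3
    upper_obs = observed + 1
    upper_count = upper_obs * (
        1 - 1 / (9 * upper_obs) + z / (3 * math.sqrt(upper_obs))
    ) ** 3
    return lower_count / expected, upper_count / expected


# (observed deaths, expected deaths) as stated in the text.
studies = {
    "Cottone et al.": (12, 3.12),
    "Peters et al.": (828, 419.3),
}

for name, (obs, exp) in studies.items():
    low, high = byar_ci(obs, exp)
    print(f"{name}: SMR {smr(obs, exp):.1f} (approx. 95% CI {low:.1f} to {high:.1f})")
```

Both point estimates agree with the reported SMRs of 3.8 and 2. With only 12 observed deaths, the interval for the Sicilian cohort is wide, which is worth keeping in mind when comparing it with the much larger Swedish cohort.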
The first UK study was conducted in Birmingham, by Holmes et al.367 Series I included 202 patients with idiopathic steatorrhea or CD, followed from 1965-1975. Ten patients had a positive biopsy for CD. Eleven patients could not be traced. In the 10-year period, 20 deaths were seen, with ten due to malignancy. Series II (1989) had 210 patients (94 males and 116 females) with biopsy-proven CD. Seventy patients were on a normal diet and 134 were on a GFD for more than 12 months at the end of the survey. Forty-three patients had died from all causes (expected was 20.82 deaths, p<0.001); 21 deaths were due to malignancy—13 reticulum cell sarcomas, six GI tract cancers and two other malignancies. Of the 21, 13 had a GFD for a mean of 41 months. Deaths from all malignancies, irrespective of diet, were statistically increased as a whole (expected 5.048 vs observed 21, p<0.001) and divided by sex (men expected 2.878 vs observed 12, p<0.001 and women expected 2.170 vs observed 9, p<0.001). Patients taking a normal diet were at increased risk of developing a malignant tumor (p<0.05). Clinical response did not predict the risk of developing malignancy.
Johnston et al.368 examined CD in subjects from Northern Ireland using the Belfast MONICA project. MONICA I was the first survey, and began in Oct 1983 with 1,204 subjects. Of the subjects, 102 (52 males and 50 females, mean age 58.1 years) had positive serology, 72 consented to follow-up (34 males and 38 females) for 11.6 years (range 11.3–11.9 years), and 20 of the 72 gave consent to biopsy. Three subjects had villous atrophy. Thirteen subjects in MONICA I (seven males and six females) died (mean age at death 67.3 years; range 56–75 years). Cause of death was obtained from death certificates from the General Register Office or General Practitioner records. Four patients died with malignant disease: pancreas, stomach, bile duct lymphoma and metastatic melanoma. None of the patients had CD, but all had positive serology. The number of cancer-related deaths and all-cause mortality in the MONICA I follow-up study did not show an excess number of deaths compared with the general population of Northern Ireland.
Logan et al.336 followed a prospective cohort of 653 patients with CD in Edinburgh between 1979 and 1981. All patients had biopsy-proven CD and mortality was ascertained from death certificates. Sixty percent of the patients were females and the mean follow-up was 13.5 years. Six percent of subjects were lost to follow-up. Clinical presentation was not reported. The subjects with CD were compared with age/sex-matched controls. There were 115 deaths from all causes; the expected number was 61.8 for an SMR of 1.9 (95% CI: 1.5–2.2). The increased mortality was greatest during the initial year after diagnosis and declined over time. The mortality rate for those diagnosed during childhood was similar to that of the general population.
Out of 502 citations identified by the search strategy for the Celiac 5 objective, 189 met level 1 screening criteria (Appendix F). Of these, 86 met level 2 screening criteria and 20 studies met level 3 inclusion criteria.396–415
Of the included studies, eight studies offered correlation between serology and mucosal histological grade,397, 398, 403, 404, 407, 409, 413, 415 and eight reported on serology only.396, 399–402, 408, 410, 412 Four studies focused on histologic changes without serology.405, 406, 411, 414 Nine of the included studies were conducted in an adult population, six in a pediatric or adolescent population, and five studies in mixed populations consisting of adults and children.
Included articles were divided by study population (adult/children/mixed), antibody type (IgG or IgA), and by antibody methodology (e.g., ME or HU).
None of the identified studies directly assessed the efficacy of a specific intervention on the promotion of adherence to a GFD. Six studies hint at interventions that could potentially be effective.416–421 Four of these studies were applicable to a pediatric population and two studies were applicable to adults.
Biopsy. To evaluate serology in assessing adherence, some information regarding mucosal recovery on GFD must first be known. Although mucosal recovery is generally assumed to occur within 6 to 12 months after starting GFD, there is evidence that recovery may be slower and more incomplete than previously assumed.
In a mixed population, Wahab et al.405 followed the histologic profiles of 158 patients after institution of a GFD. Histological recovery, defined as the absence of villous atrophy (Marsh 0–II), was seen in only 65% of the patients within 2 years. Within 5 years, 85.3% of patients showed recovery, and an incremental improvement to 89.9% occurred after 5 years. Of the 10.1% of patients not achieving histological recovery during the follow-up period, 11 had symptoms of CD and were therefore considered to have refractory CD (7% of all patients). Patients with Marsh IIIb and IIIc histology initially had lower rates of recovery, compared with those with Marsh IIIa histology. In a subgroup analysis of 25 children, recovery seemed to occur faster: 96% showed histological recovery within 2 years (p<0.01 vs adults) and 100% recovered in long-term follow-up. It is important to point out that the validity of defining a Marsh II lesion as histological recovery is uncertain. If these patients were not included, rates of histological recovery would be even slower. Nonetheless, clinical improvement was seen despite the slow histological improvement.
An early study by McNicholl et al.406 is consistent with the finding of more complete mucosal recovery in children. Thirty-six children on a GFD for a mean of 5.8 years underwent duodenal biopsy. Mucosal morphology was normal in 16 (44%) patients, while the remainder of the patients had minimal changes. Villous atrophy was not seen. IEL counts were normal in 30 (83%) patients. A subsequent gluten challenge confirmed the diagnosis in all 36 children.
Lee et al.,411 in a retrospective cohort of 39 adult patients, also found incomplete mucosal recovery. After a mean duration of a GFD for 8.5 years (range 1 to 14 years), histology was normal in only 21% of patients, and partial and total villous atrophy was seen in 69% and 10% of patients, respectively. These patients were felt not to have refractory CD since they had a good clinical response to the GFD. Also of concern were the results of serologic testing at the time of follow-up biopsy in 31 patients. Despite the relatively high number of patients with some degree of villous atrophy, IgG-AGA, IgA-AGA and IgA-EMA were negative in the majority of patients. In fact, 77% of the 31 patients having serologic tests were negative for all the listed serological tests. The exact number of these 31 patients who had some degree of villous atrophy was not reported, but would be expected to be similar to the overall numbers listed above.
Selby et al.414 investigated whether the failure of mucosal recovery was due to noncompliance with a GFD. Eighty-nine adult patients with CD on a GFD for a mean in excess of 8 years underwent dietary assessment by a dietician, questionnaire and food diary. They were then classified as either Codex GFD, which allows up to 0.03% of protein from a gluten source, or no-detectable gluten GFD (NDG-GFD). Villous atrophy persisted at high rates in both groups, with 46% of those on Codex GFD and 40% of those on NDG-GFD having persistent villous atrophy. The patients in this study did not have clinical features of refractory sprue. Based on the fact that there were similar histologic profiles in both groups, the authors postulate that persisting mucosal abnormalities may be unrelated to gluten non-compliance. Of course, gluten intake in the NDG-GFD group undetected by study protocols cannot be ruled out.
Serology. The studies assessing the utility of serology in monitoring adherence can be divided into those with,397, 398, 403, 404, 407, 409, 413, 415 and those without396, 399–402, 408, 410, 412 biopsy correlation. The studies without biopsy correlation are reviewed first. They establish an association between serologic positivity and patient compliance.
Bartholomeusz et al.396 demonstrated higher rates of IgA-AGA positivity in non-compliant as compared with compliant CD patients in a mixed population. How compliance was ascertained is not described. Three of the 17 (17.6%) patients compliant with a GFD for more than 6 months were IgA-AGA positive as compared with 11 of 12 (91.6%) non-compliant patients. The PPV for non-compliance was calculated to be 78.5%.
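The PPV quoted above follows directly from the counts given: 11 non-compliant and 3 compliant patients were IgA-AGA positive. A minimal sketch of the calculation is shown below, treating non-compliance as the condition being detected. The function name `two_by_two` and the additional measures it returns are illustrative additions, not values reported by the study.

```python
def two_by_two(tp, fp, fn, tn):
    """Return accuracy measures for a 2x2 table of test result vs true status."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Counts taken from the text, with non-compliance as the target condition:
# 11 of 12 non-compliant patients were IgA-AGA positive (1 negative),
# 3 of 17 compliant patients were IgA-AGA positive (14 negative).
measures = two_by_two(tp=11, fp=3, fn=1, tn=14)
for name, value in measures.items():
    print(f"{name}: {value:.1%}")
```

The PPV works out to 11/14, which matches the reported 78.5% up to rounding.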
Burgin-Wolff et al.400 showed that, as expected, serology becomes positive with gluten challenge. One hundred and thirty-four children with CD underwent gluten challenge and were assessed for IgA-AGA and IgA-EMA-ME. At baseline, the rate of serologic positivity was 23% for AGA and 13% for EMA. Within 3 months of gluten challenge, 97% of children were positive for AGA and 65% positive for EMA. Between 3 months and 1 year, 85% of children were positive for AGA and 84% positive for EMA.
In a mixed population, Fabiani et al.408 demonstrated significantly higher IgA-tTG-GP values in patients deemed to be non-compliant with a GFD as compared with compliant patients.
Bardella et al.399 demonstrated that the positivity of various serologic markers falls in adults with duration on a GFD (Evidence Tables, Appendix I). The five groups in this study were untreated CD, poor GFD compliance, GFD less than 2 years, GFD greater than 2 years, and a control group. As expected, IgA-AGA, IgA-EMA-ME and IgA-tTG-GP were positive in virtually all untreated CD patients. Also, as expected, there was a low rate of positive serology in the control group, with a higher percentage being IgA-AGA positive than either IgA-EMA-ME or IgA-tTG-GP. In the poorly-compliant CD group, all were positive for all three serologic tests. In patients on a GFD less than 2 years, the rates of positive AGA, EMA and tTG were 40.9%, 54.5%, and 63.6%, respectively. In patients on a GFD for more than 2 years, the rates were 16.2%, 9.5% and 14.2%, respectively. The overlap of the CIs was such that no differences between the serologic tests could be determined.
Vahedi et al.402 studied IgA-EMA and IgA-tTG in adult CD patients. Based on dietary inquiry, patients were divided into those on a strict GFD, those with minor transgressions and those with major transgressions. It was not reported whether the EMA was ME or HU, nor was it reported whether tTG was GP or HR. The median duration of GFD was 75 months. Among those on a strict GFD, 2.5% and 3% were IgA-EMA and IgA-tTG positive, respectively. Among those with minor transgressions, positivity was only 37% and 31%, respectively. Among those with major transgressions, positivity was 86% and 77%, respectively. The sensitivity of IgA-EMA for any dietary transgression was 66%, and for minor transgression it was 37%. For IgA-tTG, the sensitivities were 52% and 31%, respectively. No statistically significant differences were detected between the two serologic tests.
In a mixed population, Scalaci et al.401 showed a low reliability for IgA-EMA in picking up dietary transgressions reported at interview. It is not reported whether ME or HU was used. In patients on a GFD for at least 6 months, only 11.1% of those reporting one dietary transgression per month were positive, and only 19% of those reporting one dietary transgression per week were positive.
Fabiani et al.410 showed a similarly low rate of serologic detection of non-compliance in screen-detected adolescents. Of 6,315 screened students, 28 biopsy-proven CD patients were found. Of these, 23 agreed to participate in a follow-up study. The mean duration of GFD was 23 months. IgG-AGA, IgA-AGA and IgA-EMA were measured. Whether EMA was ME or HU was not reported. Of the 11 patients reporting any dietary transgression, only two patients (19%) were positive for any of the serologic tests.
Pacht et al.,412 in a similar study, showed different results. Seventeen children deemed compliant with GFD for at least 1 year were all IgA-EMA-ME-negative, whereas 22 children deemed non-compliant were IgA-EMA-ME-positive. This study suggests a much higher sensitivity for EMA than in other studies.
A number of further studies include serology and biopsy correlation. These are reviewed below.
Sategna-Guidetti et al.413 looked at 47 adults with CD. All were IgA-EMA-ME positive at diagnosis. After 8 to 30 months of GFD, a second biopsy was taken and IgA-EMA-ME was remeasured. Total AGA was also measured in 39 patients. No patient in whom the mucosa recovered to normal had a positive EMA. Only one patient with normal histology had a positive AGA (2.6%). EMA was positive in only five of 23 patients with partial villous atrophy, three of 13 patients with subtotal villous atrophy, and one of two patients with total villous atrophy. AGA was positive in only seven of 20 patients with partial villous atrophy, five of ten patients with subtotal villous atrophy, and two of two patients with total villous atrophy. The PPV of EMA for abnormal histology was 100%, but the NPV was only 23%. The PPV of total AGA for abnormal histology was 93.8%, whereas the NPV was only 25%. There was a clear inability of serology to adequately reflect the mucosal state in this study, and serology was negative in a significant number of patients with villous atrophy.
Valentini et al.407 also found a significant rate of negative serology despite the presence of villous atrophy. In an adult population on a GFD for a mean of 9.9 months (range 6–12 months), 24 patients were IgA-EMA-ME negative on a GFD. Seventeen of these 24 patients (71%) had varying degrees of villous atrophy on biopsy (14 had partial villous atrophy and three had subtotal villous atrophy).
Dickey et al.409 also showed that disappearance of IgA-EMA-ME did not necessarily indicate mucosal recovery. In adults on GFD for 1 year, IgA-EMA-ME was positive in only two of 22 (9%) with partial villous atrophy, and three of ten (30%) with subtotal/total villous atrophy.
Mengozzi et al.403 investigated adult CD patients on a GFD for 1 year. Most (95%) had a Marsh III histology at diagnosis. In general agreement with the prior studies, only 12% had normal histology at follow-up biopsy 1 year later. Fifty percent were Marsh I and 38% were Marsh II or III (individual results for Marsh II and III were not reported). IgA-EMA-ME, IgA-tTG-HR (four different assays: DRG Diagnostics, Eurospital, Immunodiagnostik, and Celikey), and IgA-tTG-GP were measured. Taking complete mucosal recovery as a negative biopsy and all other biopsies as positive, the authors looked at concordance of serology to biopsy results. Concordance rates for EMA, tTG1, tTG2, tTG3, tTG4 and tTG5-GP were 48%, 29%, 65%, 14%, 16%, 19%, respectively. The validity of a Marsh I or perhaps Marsh II histology being classified as positive is unclear, and it would have been interesting to know the corresponding concordance rates if Marsh 0–I and Marsh 0–II were considered normal.
Kaukinen et al.398 similarly found a lack of correlation between IgA-EMA-HU, IgA-tTG-GP and histologic state. Of 87 adult patients on a GFD for a median of 1 year, 27 still had a Marsh III villous atrophy. Among those with Marsh III villous atrophy, EMA was negative in 74% and tTG was negative in 59% of patients. Furthermore, of 11 patients admitting regular dietary lapses, 55% were EMA and tTG negative. The sensitivity, specificity, PPV, and NPV of EMA for Marsh III villous atrophy was 26%, 93%, 63%, and 74%, respectively. The values for tTG were 41%, 88%, 61% and 77%, respectively.
The issue arises as to whether serology might more accurately reflect mucosal state in long-term follow-up. In patients on GFD over 5 years,398 two of four patients with Marsh III villous atrophy were EMA and tTG negative, and five of nine patients (56%) admitting dietary transgressions were EMA and tTG negative. In this study, there was no clear advantage of tTG over EMA.
One study by Fotoulaki et al.397 did show a good correlation between serology and mucosal state. In a mixed population of 30 patients, IgG-AGA, IgA-AGA and IgA-EMA-ME were measured after 12 months of GFD. Contrary to the preceding studies, all patients had either a Marsh I or II biopsy on a GFD, and all were IgA-AGA and IgA-EMA negative, while 40% were still IgG-AGA positive. The age range of patients in this study was much younger (1 to 24 years).
Troncone et al.415 demonstrated that serology could miss dietary transgressions in children. Twenty-three adolescents were divided into four groups, depending on assessment of gluten intake. IgA-EMA-ME was present in seven of seven patients assessed to be taking >2 g/day of gluten. All seven also had villous atrophy. Conversely, four patients on a strict GFD had normal histology and negative EMA. For patients with intermediate levels of gluten intake, one of six patients with a gluten intake of less than 0.5 g/d had a positive EMA. This patient also had partial villous atrophy. Three patients in this group had lesser mucosal abnormalities (increased IELs) and negative serology. For patients ingesting 0.5 to 2 g/d of gluten, three had a positive EMA; two of these had villous atrophy. Five patients had increased numbers of IELs.
Anson et al.416 investigated 43 Jewish Israeli children with CD, and their parents. Thirty-one of the children (70%) were judged compliant based on a combination of clinical symptoms, biopsy and AGA. It is unclear if serology and biopsy were performed in all children to assess compliance. Parental knowledge was studied using a structured questionnaire. A significant positive correlation between the father being a professional and compliance was found (p<.01). Parental level of education was also significantly correlated with compliance. Significant differences in parental ability to choose GFD items from a specific menu were found. Ninety-three percent of parents of compliant children were able to pick all five GFD items out of an eight-item menu. This compared with only 67% of parents of non-compliant children (p<.05).
In another parental questionnaire, Jackson et al.418 found that 30 of 50 (60%) parents reported their children to be on a strict GFD. Dietary compliance correlated with membership in the Celiac Society (p<0.0001). It also correlated with parental score on an eight-question test related to knowledge of CD (p<0.001).
Ljungman et al.420 found self-reported GFD compliance in children to be positively associated with knowledge of CD. In this study of 47 Swedish children, those deemed compliant scored 14.03 out of 15 on a knowledge test related to CD. This compared with an average score of 12.44 in the non-compliant group.
Lamontagne et al.419 surveyed 617 past and present members of the Quebec Celiac Foundation. A final sample size of 234 was obtained. Self-reported compliance difficulty with a GFD was inversely correlated with a high level of confidence in treatment information from gastroenterologists and dieticians (p<.005).
Hogberg et al.421 looked at the effect age of diagnosis might have on compliance. In a study population of 29 adults with CD, 15 were deemed compliant with a GFD on the basis of a questionnaire and serology (IgA EMA, IgG EMA and IgA tTG). Eighty percent of patients diagnosed prior to age 4 were GFD compliant compared with 36% of patients diagnosed after age 4 (p<.05). A drawback of this study is that serologic markers were collected about 3 years prior to the dietary questionnaire. This risks misclassification of patients if their compliance varied over time.
In an important study with relevance to outcomes of population screening, Fabiani et al.417 showed a lower compliance in 22 adolescents identified by a mass screening program as compared with 22 age-matched controls with identified CD on the basis of symptoms. All patients had been prescribed a GFD for more than 5 years. Twenty-three percent of screen-detected patients reported being on a strict GFD as compared with 68% of those diagnosed with CD on the basis of symptoms. Patients in the screen-detected group were diagnosed at a later age (mean 14.0 yrs) versus patients identified on the basis of symptoms (mean 4.3 yrs).
A colouring book intervention has been developed to promote GFD compliance,422 but the effectiveness of this intervention has not been assessed in children with CD.
The majority of studies in this objective were of a “before-after” design. In this setting, this design may not pose a major limitation for monitoring studies, since the purpose of the study was to assess the change in serology and histology after introduction of a GFD. In this regard, the strength of the evidence for monitoring adherence to a GFD was fairly good. However, there is almost a complete absence of studies of interventions for the promotion of adherence to a GFD.
Systematic reviews of studies of diagnostic accuracy are similar in many ways to reviews of other study types, such as randomized controlled trials. However, important differences exist, in large part because of the weaknesses inherent to the diagnostic-accuracy study design and its potential sources of bias.24 In addition to these considerations, the topic of CD introduces further difficulties and bias because of how the disease itself is defined and how patients are selected for inclusion in a study. Ideally, a diagnostic-accuracy study should include a consecutive or randomly selected sample of patients from a clinically relevant patient population, that is to say, a study population whose characteristics match those of the population in which the test will ultimately be used, with both patients and controls selected from this population. Unfortunately, selection spectrum bias is common in studies of diagnostic tests in general, and in practice it is easier for investigators to select cases and controls as separate groups in a case-control design. The practice of choosing cases that have previously been identified as having the disease, especially if more severe, introduces bias in the estimates of sensitivity (artificially raising it), while choosing completely healthy individuals as controls introduces bias in the estimates of specificity (artificially raising it as well).24 The importance of these biases comes back to the issue of the relevant clinical population. If the test is to be used in screening healthy individuals, then the reported sensitivity estimate is higher than it should be, but the specificity estimate is likely valid. On the other hand, if the test is to be applied to suspected cases of the disease, then the reported estimate of sensitivity may not be that far off, but the specificity estimate would be higher than it should be.
Other important sources of bias also exist in relation to the study population, such as the mix of other diseases present in the population with similar features as the disease in question, and ensuring an appropriate mix of disease severity in the tested population. This last point regarding disease severity is especially important for this report, and is discussed at length below.
Lijmer et al.423 reviewed 11 meta-analyses of diagnostic tests, and assessed the characteristics of the included studies using multivariate regression analysis. The authors identified several threats to the validity of a diagnostic study's results. Case-control designs overestimated diagnostic odds ratios (DORs) by three-fold compared with studies using a clinical cohort (relevant clinical population). As well, studies that applied different reference tests to those with and without disease (in case-control designs) or to those testing positive or negative (in relevant clinical populations) overestimated the DOR by 2.2-fold. Interpreting the reference test with knowledge of the results of the test under study overestimated the DOR by 1.3-fold. DORs from studies without adequate descriptions of the test or study population were 70% and 40% higher, respectively, than in studies reporting these details. Inadequate descriptions of the reference test were also identified as sources of bias.
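For readers less familiar with the DOR, the sketch below computes it from a sensitivity and specificity pair and shows what a three-fold design-related overestimate would mean numerically. The sensitivity and specificity values are hypothetical and chosen only for illustration. The inflation factors are average effects estimated across studies, so dividing a single study's DOR by such a factor is a rough illustration of scale and not a validated correction.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """DOR = odds of a positive test in diseased / odds of a positive test in non-diseased."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1 - sensitivity)) * (specificity / (1 - specificity))


# Hypothetical accuracy estimates from an imagined case-control study.
observed_dor = diagnostic_odds_ratio(sensitivity=0.90, specificity=0.95)

# Average overestimate reported for case-control designs in the text above.
CASE_CONTROL_INFLATION = 3.0

# Illustration of scale only: the DOR implied if this study were inflated
# by exactly the average factor.
deflated_dor = observed_dor / CASE_CONTROL_INFLATION

print(f"observed DOR: {observed_dor:.0f}")
print(f"DOR after dividing by the average inflation factor: {deflated_dor:.0f}")
```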
With this information at hand we tried to minimize bias in this report, by using what some may consider fairly strict inclusion criteria which also eliminated many poor quality studies. We included both case-control studies and cohort (relevant clinical population) designs but grouped them separately. Studies were only included if an adequate description of the test under study and the reference test (biopsy, and a statement of the criteria defining CD) were provided, and both the cases and controls had to have had the same reference test (i.e., biopsy) applied at the same definition or level (i.e., biopsy grade).
The results of the systematic review demonstrate that in the studied populations IgA-EMA and IgA-tTG have sensitivities and specificities each in excess of 90% in both children and adults. In fact, the pooled specificity of EMA was 100% in adults using either EMA-ME or EMA-HU. In studies of children, the specificity of EMA using these two substrates was 97% and 95%, respectively, with overlapping 95% CIs, suggesting no statistical difference between these values. In adults, the pooled specificity of tTG-GP and tTG-HR were 95% and 98%, respectively, with overlapping CIs. Similarly, in children the specificities were 96% and 99%, again with overlapping CIs. Among the three studies in adults,32, 45, 70 and four studies in children35, 52, 70, 79 that assessed both EMA and tTG, the specificities were nearly identical. Overall, these results suggest that EMA and tTG antibodies demonstrate extremely high specificities in both adults and children.
We identified a tendency towards greater variability in sensitivity between studies and between antibodies, compared with specificity. IgA-EMA-ME demonstrated sensitivities of 97% and 96% in adults and children, respectively. EMA-HU demonstrated a similar sensitivity of 97% in children, although the pooled estimate in adults was somewhat lower at 90%. Among two studies that assessed both EMA-ME and EMA-HU in adults, one demonstrated identical sensitivities of 95%,81 whereas the other57 showed a lower sensitivity of HU compared with ME (90% vs 100%). This last study only included 20 untreated patients with CD, all of whom were ME positive, but two of whom were HU negative. None of the included mixed-age studies assessed both of these antibodies. Heterogeneity existed in the analyses of sensitivity of tTG-GP in adults, but it is likely close to 90%. In children, the pooled estimate was 93%. The sensitivity of tTG-HR was 98% in adults and 96% in children, although in both cases the CIs included a low of 90%. In studies of mixed-age populations the sensitivity was 90%.
Estimates of the sensitivity of the IgG class antibodies of EMA and tTG suggest that these tests have poor sensitivities around 40%, although the specificities were quite high at around 98%. These findings suggest that this class of antibody would be inappropriate as a single test for CD, but may be useful in IgA-deficient patients, or in combination with an IgA class antibody. One study that assessed the use of IgA-tTG-HR with IgG-tTG-HR found a sensitivity of 99% and a specificity of 100% for the combination.72
The analyses of all the AGA subgroups demonstrated significant heterogeneity, making pooled estimates impossible. Be that as it may, the sensitivity of IgA-AGA in adults is likely not much higher than 80%, but seems somewhat higher in children. The specificity likely lies between 80% and 90%, in adults and children, although the studies of serial testing of AGA followed by EMA or tTG in the prevalence section of this report suggest that the specificity is low as well. Even if one considers an optimistic range, the performance of IgA-AGA in both adults and children is inferior to that of the other antibodies discussed above.
The analyses of IgG-AGA suffered from significant clinical and statistical heterogeneity, making even general summary statements difficult. With this in mind, the typical sensitivity of this test likely lies below 80% in adults, and between 80% and 90% in children. The specificities are likely close to 80% in adults and between 80% and 90% in children with the same warning coming from the prevalence studies, suggesting that in the era of EMA and tTG, testing for CD with AGA has a limited role.
In assessing the PPV and NPV of these tests it is important to keep in mind the prevalence of CD in the tested population. In all the included studies, the prevalence of CD would be considered quite high; the minimum study prevalence was 9%, and many studies demonstrated prevalences in excess of 40%. In comparison, Fasano et al.15 found the prevalence of CD in at-risk first-degree relatives of CD patients to be 4.55%. In general, based on our report, the prevalence of CD in high-risk groups such as suspected CD patients and first-degree relatives was less than 20% (in non-tertiary centers), and the prevalence in patients with anemia and diabetes was generally less than 10% (Celiac 2 section). As expected, overall the included studies demonstrated the classic relationship between prevalence and the PPV and NPV. At the relatively high prevalence of CD in these studies, the PPV (the chance that a positive test represents a true positive test) was quite high (>90%), but started dropping at a prevalence below 35% to values generally below 80%. Figures 21
The vast majority of studies, as well as our own TEP, required that the small intestinal mucosa show at least partial villous atrophy histologically for the diagnosis of CD to be made. In fact, most of the studies used patients with subtotal or total villous atrophy. Furthermore, inherent to the clinical definitions of classic, atypical, and silent CD described in the methods, is the requirement of having a “fully developed” villous atrophy. However, Fasano et al.,15 in a large American prevalence study, found that only 34% of biopsied EMA-positive subjects had subtotal or total villous atrophy (modified Marsh IIIb or IIIc). In this study, no EMA-positive patient had a Marsh I lesion, 26% had a Marsh II lesion and 40% had a Marsh IIIa lesion. It is clear from this study, and from the discussion about biopsy later in this section, that true CD exists in patients with histologic grades less severe than classic Marsh III lesions, and that patients with silent CD do not have to have fully developed villous atrophy. The problem that then arises is whether the reported sensitivities of these antibodies hold in the majority of patients who have CD, yet with less severe histology. As well, if the sensitivity is not as high as reported then, by definition, the nearly perfect NPV of IgA-EMA and tTG would also be expected to suffer.
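The dependence of PPV and NPV on prevalence described above follows from Bayes' theorem. The sketch below evaluates both at three prevalences mentioned in the text (40%, 9%, and 4.55%). The sensitivity and specificity of 95% are hypothetical round figures chosen for illustration; they are not pooled estimates from this report, and the output is not intended to reproduce the values seen in the included studies.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test via Bayes' theorem."""
    for name, value in (
        ("sensitivity", sensitivity),
        ("specificity", specificity),
        ("prevalence", prevalence),
    ):
        if not 0 < value < 1:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Hypothetical test characteristics, chosen only for illustration.
SENSITIVITY = 0.95
SPECIFICITY = 0.95

# Prevalences quoted in the text: 40% and 9% (included studies),
# 4.55% (first-degree relatives).
for prevalence in (0.40, 0.09, 0.0455):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:6.2%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

With these assumed inputs, the PPV falls from above 90% at a prevalence of 40% to below 50% at 4.55%, while the NPV rises slightly, which is the general pattern the paragraph describes.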
This question has been answered in several studies that have correlated histology with the sensitivity of these serological markers, and also mirrors to some extent the antibody response that occurs once patients with CD are placed on a GFD. A description of results of these studies follows below, while a full narrative with tables is located in the Appendix H.
Rostami et al.16 evaluated the diagnostic value of IgA EMA and AGA in 101 untreated patients with CD. The combination of the two tests showed an overall sensitivity of 76%. But, alarmingly, the sensitivity of EMA in these patients dropped precipitously with milder histological grades. EMA demonstrated a sensitivity of 100% in Marsh IIIc, 70% in Marsh IIIb and only 30% in Marsh IIIa. The authors did not consider patients with Marsh I or II lesions as having CD.
Tursi et al.424 assessed the relationship of the histologic grade to tTG positivity in 119 consecutive adult CD patients defined by characteristic duodenal biopsy and “permanent gluten sensitive enteropathy.” In this study, the frequency of tTG-positivity (sensitivity) and mean tTG levels were greatest with the highest modified Marsh grade, and dropped steadily with milder histologic grades, reaching a low of only 8% positivity in CD patients with Marsh I lesions. The sensitivities of tTG in Marsh IIIc, IIIb, IIIa, and II were 96%, 84%, 56%, and 33%, respectively. In another publication, likely using the same population of “permanent gluten-sensitive enteropathy,” Tursi et al.425 demonstrated similar results with AGA and EMA in a population of atypical CD (defined in methods). The sensitivities of EMA in Marsh IIIc, IIIb, IIIa, II, and I, were 97%, 92%, 89%, 40%, and 0%, respectively. The results with AGA showed a similar pattern, with the sensitivity dropping from 90% to 30% from Marsh IIIc to Marsh II.
Furthermore, in likely the same population of “permanent gluten-sensitive enteropathy,” Tursi et al.426 found a relationship between clinical manifestation of CD and EMA sensitivity. EMA was positive in 77 of 96 (80.8%) patients with atypical CD and in 17 of 27 (63.0%) patients with silent CD. EMA was negative in patients with Marsh I lesions. Once again, assuming that all these patients with “permanent gluten-sensitive enteropathy” are truly CD patients, then EMA would miss 19% of atypical CD, and 37% of silent CD that were picked up on the basis of biopsy.
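Grade-specific sensitivities such as these imply that the overall sensitivity observed in any study depends on the mix of histologic grades among its cases. The sketch below illustrates this with the tTG sensitivities quoted above for Marsh IIIc through Marsh I. Both case-mix distributions are hypothetical, invented purely to contrast a study population weighted towards severe histology with one weighted towards milder grades; neither is taken from any included study.

```python
import math

# tTG sensitivity by modified Marsh grade, as quoted in the text (Tursi et al.).
TTG_SENSITIVITY = {"IIIc": 0.96, "IIIb": 0.84, "IIIa": 0.56, "II": 0.33, "I": 0.08}


def overall_sensitivity(grade_sensitivity, case_mix):
    """Weighted average of grade-specific sensitivities over a case-mix distribution."""
    if not math.isclose(sum(case_mix.values()), 1.0, abs_tol=1e-9):
        raise ValueError("case-mix proportions must sum to 1")
    return sum(grade_sensitivity[grade] * share for grade, share in case_mix.items())


# Hypothetical case mixes (assumptions for illustration only).
severe_mix = {"IIIc": 0.50, "IIIb": 0.30, "IIIa": 0.15, "II": 0.05, "I": 0.00}
mild_mix = {"IIIc": 0.10, "IIIb": 0.15, "IIIa": 0.35, "II": 0.25, "I": 0.15}

print(f"severe-weighted mix: {overall_sensitivity(TTG_SENSITIVITY, severe_mix):.1%}")
print(f"mild-weighted mix:   {overall_sensitivity(TTG_SENSITIVITY, mild_mix):.1%}")
```

The same test therefore appears far more sensitive in a population dominated by subtotal or total villous atrophy than in one dominated by milder lesions, even though its grade-specific performance is unchanged.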
Demir et al.427 studied the presentation and clinical features of 104 newly diagnosed Turkish children. EMA and biopsy correlation was available for 72 children. Similar to what was described above, EMA was positive in 92% of patients with Marsh III lesions versus 66.6% of patients with Marsh I-II lesions. Kotze et al.428 assessed 47 symptomatic subjects with CD with intestinal biopsy, tTG and EMA antibodies. The authors found a statistically significant correlation between antibody titres of EMA and tTG, and histologic grades.
Hoffenberg et al.317 studied a group of children at risk of CD who were part of a large prospective study of the genetic and environmental factors associated with autoimmune diseases. No relationship was found between Marsh grade and the genetic risk factor leading to screening, but a significant correlation was found between Marsh grade and tTG (r=0.57, p<0.01).
In a small case-control study assessing the diagnostic value of EMA, Sategna-Guidetti et al. also found that in patients with documented CD, EMA positivity correlated with the severity of the histologic grade.429 In this study, EMA was falsely negative in 50% of CD patients without villous atrophy.
The findings of the large prevalence study by Fasano et al.,15 however, require further discussion within this context. This study demonstrated a very high prevalence of CD of 0.95% (1:105) in asymptomatic not-at-risk adults using IgA-EMA. Additionally, 34% of biopsied EMA-positive subjects had subtotal or total villous atrophy (modified Marsh IIIb or IIIc), 40% had a Marsh IIIa lesion, and 26% had a Marsh II lesion. No CD patient in this study had a Marsh I lesion, although this is likely due in part to how CD was defined. In any case, there are at least two ways to interpret these results. The first is that EMA testing does pick up the mild Marsh grades, given the high prevalence of CD in this study. The second is that, based on the preceding discussion and the serology monitoring data, this study missed an unknown number of CD patients with milder histological grades. Unfortunately, since we do not have follow-up data on the screen-negative patients in this study, this question will be difficult to answer and arguments can be made on both sides.
The question that remains, however, is whether subjects with low grade histologic lesions are at the same risk of long-term complications as those with more advanced histologic grades. On the one hand, it is apparent that symptoms may not correlate with histologic grade but rather with the length of affected small bowel. When the distribution of histological grades is compared among patients with CD who are clinically asymptomatic versus symptomatic, the same distribution of grades is seen. For practical reasons, few of the studies we identified assessed length of small bowel involvement with CD. But another question arises: are patients with early Marsh lesions who test positive for serology the ones who have more extensive small bowel disease?430 These questions add to the uncertainty regarding the true performance of serological testing, and whether missing early grade histologic lesions is important. Although we could not find direct evidence comparing outcomes in patients based on their histologic grades, it is not unreasonable to think that a patient with Marsh I-II lesions would still have an increased risk of CD complications (see Celiac 4 and 5 for some data regarding this point).
In summary, it is clear from our pooled estimates of the included studies that IgA-EMA and IgA-tTG antibodies provide excellent specificity for the diagnosis of CD. However, the high reported sensitivities may only apply to the selected group of patients with villous atrophy. Furthermore, if the sensitivity is in fact lower when the entire biopsy spectrum of CD is considered, then the nearly perfect NPV of these tests, particularly in low prevalence populations, would also be expected to suffer. Finally, the PPV of these tests may not be as high as suggested when the tests are applied in low-prevalence populations, as demonstrated by our estimates of PPV from the population screening studies. These potential limitations of serological testing can have profound implications for population screening initiatives, and verification of the sensitivity of these antibodies in a large population of CD patients showing the full histological spectrum is urgently required.
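The relationship between sensitivity and NPV described above follows directly from Bayes' rule. The Python sketch below is illustrative only: the 0.95% prevalence is the screening figure from Fasano et al. quoted earlier, while the specificity (0.99) and the two sensitivities (0.95 and 0.60) are assumed values chosen for the example, not estimates from this report.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


PREVALENCE = 0.0095          # 0.95%, the screening prevalence cited in the text
ASSUMED_SPECIFICITY = 0.99   # assumption for illustration only

# Assumed sensitivities: one "as reported" value and one lower "full-spectrum" value.
for assumed_sensitivity in (0.95, 0.60):
    ppv, npv = predictive_values(assumed_sensitivity, ASSUMED_SPECIFICITY, PREVALENCE)
    missed_per_10000 = (1 - assumed_sensitivity) * PREVALENCE * 10_000
    print(f"sensitivity={assumed_sensitivity:.2f}  PPV={ppv:.3f}  NPV={npv:.4f}  "
          f"missed cases per 10,000 screened={missed_per_10000:.1f}")
```

Under these assumptions the NPV falls as sensitivity falls, and the number of missed cases per 10,000 people screened rises in proportion to (1 - sensitivity). The PPV stays well below 1 at this prevalence even with an assumed specificity of 0.99.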
The HLA DQ2 haplotype represents the occurrence of the HLA class II heterodimer alleles DQA1*0501 and DQB1*0201. These typically occur in a cis position as HLA DR3-DQ2 or in a trans position as HLA DR5/DR7-DQ2. The HLA DQ8 haplotype DQA1*0301/DQB1*0302 typically occurs in association with DR4. HLA DQ2 occurs in about 20% to 40% of the general population,9, 10, 15, 100, 135, 136, 138–141, 143, 146, 147, 150–157, 159, 167 48% to 65% of healthy relatives of patients with CD,158, 161, 164, 166, 167, 169, 172, 177 and in up to 73% of non-CD patients with type I diabetes.97, 165 In one study, 100% of patients with enteropathy associated T-cell lymphoma (EATCL) were HLA DQ2 positive.151 Non-CD patients with Down Syndrome appeared to have the same frequency of HLA DQ2 as the general population.109, 134, 160
Populations of non-Western European descent demonstrated very wide variations in the frequencies of HLA DQ2 both in CD patients and controls.120, 137, 142, 148, 159, 163
Overall, it can be seen that HLA DQ2 alone offers a sensitivity in excess of 90%, which can be improved to close to 100% if a strategy of testing for both HLA DQ2 and HLA DQ8 is utilized (either test being positive). The specificity of both tests together, or either test alone, is not as good as the sensitivity, falling in the range of 55% to 80%. The specificity becomes considerably worse if a population with a higher expected frequency of HLA DQ2 or HLA DQ8, such as first-degree relatives of patients with CD or patients with type 1 diabetes, is tested. The PPV (the probability that a positive test represents a true positive result) of testing for HLA DQ2/8 in an average population is generally low. One needs to keep in mind, however, the dependence of predictive values on the prevalence of CD in the population to be tested. Therefore, in high-risk groups, such as first-degree relatives or patients with type I diabetes, the PPV tends to be higher. Conversely, it appears that the value of testing for HLA DQ2/8 is highest when a negative test is found. Given the high NPV of this test, average-risk patients can have the diagnosis of CD excluded based on a negative test. The situation is more complex in high-risk groups, since the NPV decreases with increasing prevalence, and with the recognition that there are HLA DQ2/DQ8-negative patients with CD. These findings, along with the cost of HLA testing, make routine use of this modality for screening or diagnosis inappropriate. However, this test is most useful in cases of diagnostic uncertainty or as part of a multi-test gold standard in clinical studies.
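The prevalence dependence described above can be illustrated with likelihood ratios. The sketch below is hypothetical: the sensitivity (0.98) and specificity (0.70) are assumed values picked from within the ranges quoted in the text for combined DQ2/DQ8 testing, and the pre-test probabilities of 1% (average risk) and 10% (a high-risk group) are assumptions made for the example.

```python
def post_test_probability(pre_test_probability, sensitivity, specificity, test_positive):
    """Update a pre-test probability using the likelihood ratio of the test result."""
    if test_positive:
        likelihood_ratio = sensitivity / (1 - specificity)      # LR+
    else:
        likelihood_ratio = (1 - sensitivity) / specificity      # LR-
    pre_test_odds = pre_test_probability / (1 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


ASSUMED_SENSITIVITY = 0.98   # assumption: "close to 100%" for DQ2 or DQ8 positive
ASSUMED_SPECIFICITY = 0.70   # assumption: within the 55% to 80% range in the text

# Assumed pre-test probabilities for an average-risk and a high-risk group.
for label, pre_test in (("average risk", 0.01), ("high risk", 0.10)):
    after_positive = post_test_probability(
        pre_test, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, test_positive=True)
    after_negative = post_test_probability(
        pre_test, ASSUMED_SENSITIVITY, ASSUMED_SPECIFICITY, test_positive=False)
    print(f"{label}: P(CD | HLA positive)={after_positive:.3f}  "
          f"P(CD | HLA negative)={after_negative:.5f}")
```

With these assumed inputs, a positive result raises the probability of CD only modestly in the average-risk setting, whereas a negative result leaves a larger residual probability of CD in the high-risk setting than in the average-risk one, which is the pattern the text describes.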
Inter-observer agreement in the histologic assessment of small bowel pathology. As previously described, there are several potential criteria for the diagnosis of CD. The original and modified ESPGAN criteria2, 4 appear direct. Most of these criteria require some degree of villous abnormality for the diagnosis of CD, a view shared by the assembled TEP. In practical terms, even distinguishing between a Marsh II (no villous abnormality) and a Marsh IIIa (minimal villous changes) can be difficult.431 This concern is further confounded by potential problems with the biopsy specimens themselves, such as size, orientation, quality, and proper biopsy sampling. Hence, agreement between different pathologists and between the same pathologist at different times becomes important. The biopsy literature search identified a few articles that addressed pathologist agreement.
Weile et al.432 assessed inter- and intra-observer agreement among three experienced Swedish and Danish pathologists reading the small bowel histology of patients suspected of having CD. Ninety small-bowel biopsies taken by capsule near the ligament of Treitz from 73 children were selected at random from a larger sample taken from 1987 to 1994. The final diagnosis was made on the basis of evaluation of specimens by dissecting microscopy, formalin-fixed H&E-stained slides, intestinal disaccharidases, serology and clinical presentation. The initial biopsy reports from patient files were sorted into normal (66; normal or minor nonspecific abnormalities; 85% were on a gluten-containing diet [GCD]), pathological (17; total and severe villous atrophy, all on GCD), and inconclusive (seven; because of poor orientation, small sample, or autolysis). Several years later (1997), the same three pathologists who read the initial biopsies performed a second reading of the slides, given to them in random order. In comparison with the first reading, the number of inconclusive readings rose from seven to 22, with a corresponding fall in the number of biopsies read as normal and pathological. Considering the overall biopsy reading and diagnosis, the kappa statistics (a statistical measure of agreement “correcting” for chance433) were 0.57, 0.63, and 0.75 for the three pair-wise comparisons of the three pathologists. These kappa values were reported to be “moderate” (for two out of the three agreement kappa scores) to “substantial” in terms of agreement, and suggest that agreement is far from perfect even when the same pathologist reads the same slide twice.
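For readers unfamiliar with the kappa statistic mentioned above, the following minimal Python sketch shows how Cohen's kappa is computed from a two-reader agreement table. The counts are hypothetical and were invented for illustration (they merely use the same three reading categories and a total of 90 biopsies); they are not data from Weile et al.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: reader A, columns: reader B)."""
    total = sum(sum(row) for row in table)
    # Observed agreement: proportion of biopsies on the diagonal.
    observed = sum(table[i][i] for i in range(len(table))) / total
    # Chance-expected agreement from the marginal totals of each reader.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL counts (not from the study). Category order in rows and columns:
# normal, pathological, inconclusive.
hypothetical_table = [
    [55, 2, 6],   # reader A: normal
    [1, 14, 2],   # reader A: pathological
    [4, 1, 5],    # reader A: inconclusive
]

print(f"kappa = {cohens_kappa(hypothetical_table):.2f}")
```

In this made-up table the two readers agree on 74 of 90 biopsies (82%), yet kappa is considerably lower than 0.82 because much of that raw agreement is expected by chance when most biopsies are read as normal.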
Vilela et al.431 also assessed inter-observer agreement among Brazilian pathologists in the diagnosis of CD. Three experienced masked pathologists independently read the slides of 34 patients with CD based on ESPGAN criteria. Agreement differed among the three possible pair-wise comparisons, with the best agreement occurring between pathologists A and C. Good to excellent agreement (kappa 0.61–0.85) was obtained for the assessment of villous structure. Reasonable to good agreement was observed for increased number of crypt mitosis (kappa 0.63), and decrease in the overall number of villi (kappa 0.47–0.53). However, agreement about the number of IELs using standard staining was weak (kappa 0.39). Interestingly, the agreement regarding overall histologic grade was also weak between two pathologist pairs, and reasonable to good for the last pair. As with the above study, it is difficult to comment on the generalizability of these results. The authors suggest that the number of CD cases seen was fewer than expected, and qualitative rather than quantitative measures of such parameters as villous height and IELs were used. Still, the findings suggest that agreement regarding the histologic grades should not be taken for granted.
Several authors have suggested that quantitating various histologic features, such as the number of IELs per 100 or more enterocytes, results in greater reproducibility of biopsy readings.434 Authors who used quantitative criteria during studies of inter-observer agreement likewise showed better agreement than reported above.435–437 These studies suggest that the use of quantitative methods in the reading and reporting of small bowel histology, by pathologists experienced in the reading of CD biopsy specimens, leads to greater agreement among pathologists and presumably more uniform and standardized reporting.
Latent CD. The presence of latent CD is a threat to the diagnostic accuracy of biopsy, since these patients truly have normal intestinal histology.
Stenhammar et al.438 conducted an initial study of 100 first-degree relatives of 32 patients with CD. All 100 relatives were biopsied and two cases of CD were identified. In a 20-year follow-up study, Hogberg and Stenhammar247 performed serological evaluation (AGA, EMA, tTG) on these same 100 relatives and their offspring, with positive results prompting intestinal biopsy. All relatives with initial “mild or moderate mucosal” abnormalities remained unchanged and were not considered to have CD. Eight new CD cases were identified; two of these were relatives of the two cases diagnosed in the first study. One of these, a parent of an affected child, had a grade II–III lesion in the first study that normalized on a GFD, and remained normal after 3 years of a GCD; she was not classified as CD, though in retrospect she likely represents a late relapser rather than transient gluten intolerance or a true latent CD. The other patient had a grade II lesion, but initially was not regarded as having CD because of the absence of symptoms. She was also found to be DQ2 positive. The remaining six newly diagnosed subjects were offspring of index CD cases and were not part of the initial cohort. In all, only two subjects of the initial biopsied cohort were “missed” in the first study. In retrospect, these subjects should have been included. This suggests that biopsy has the potential of high sensitivity and specificity for CD. Unfortunately, in the follow-up study, the number and HLA status of those with mild-to-moderate mucosal abnormalities (serology negative) was not reported, and since not all subjects were rebiopsied it is also unclear whether there is a group of serology-negative relatives with initially normal biopsies who have developed higher grade histology at follow-up, suggesting latent CD.
Likewise, Maki et al.,62 after an initial biopsy screen of 113 first-degree relatives of CD patients, discovered 13 relatives with villous atrophy and crypt hyperplasia. During a 3-year follow-up period another three relatives, with previously “normal biopsies” who were AGA positive, were found to have CD. Unfortunately, the authors do not report on the number of relatives with low-grade histologic lesions, or whether the new cases were in patients with completely normal (Marsh 0) biopsies or with biopsies that were normal only in terms of absence of villous atrophy.
Troncone et al.439 searched the medical records of 25 centres in Italy over a 10-year period to identify children with latent CD defined as either individuals with initial normal biopsies who later developed villous atrophy and responded to a GFD (Group 1), or people who were previously diagnosed with CD by ESPGAN criteria and who were subsequently found to have normal histology on a GCD for 2 years (Group 2). Nineteen such cases were found. All these patients had normal morphometric analysis and IEL counts on the initial biopsy. Four of the 14 GFD responders were considered at risk of CD (first degree, diabetes). The authors suggested that the five Group 2 patients could either represent true transient gluten-intolerance, or, in their opinion, more likely be late relapsers. These results of apparent post-pubertal recovery from CD are similar to those reported by Maki et al.440 and by Schmitz.441 Although the authors do not report on the number of charts or children screened, the findings of this study suggest that latent CD is very rare and unlikely to impact on the diagnostic accuracy of biopsy. It, however, underscores the importance of a time dimension in studies of CD, to accurately assess the true false positive and negative rates of diagnostic tests for CD.
IELs with normal villous structure. CD exists in patients with normal villous structure. The biopsy can pick up these patients on the basis of crypt changes and/or changes in the number and type of IELs.
Ferguson et al.442 assessed the relationship of raised levels of IELs to the final diagnosis among children with diarrhea. The authors found a lack of correlation between IEL counts and morphologic grading of the biopsy. However, among seven children ultimately found to have no organic disease, all had normal IEL counts in the range of 14–25/100 epithelial cells (ECs). Two of three children with CD on a GFD also had normal IEL counts. In contrast, the values were elevated to greater than 38 IEL/100 ECs in untreated CD patients. High counts were also found in three children with failure to thrive or diarrhea of unknown etiology, and in three of nine children with giardiasis, though in these cases the mean values were lower than in the untreated CD cases. Interestingly, among 14 children with gastroenteritis, ten had abnormalities of the villi, crypts or lamina propria, but all but one had IEL counts within the normal range. Although the differential of mild mucosal changes is large, this study suggests that one of the histologic features of CD can distinguish between CD and other mild enteropathies, and could potentially allow for a relatively high sensitivity by allowing CD to be defined by a low-grade Marsh lesion, while maintaining some of the specificity. This theme will be revisited in studies that follow.
| Test | Celiac (n=27) | CD excluded on biopsy (n=79) | Biopsy-negative controls (n=28) |
|---|---|---|---|
| Mean # of γδ+ IELs | 40.4 (95%CI: 32.7–48.2) | 6.7 (95%CI: 4.8–8.5) | 1.6 (95% CI: 1.1–2.1) |
| Elevated γδ+ IELs (> 4.4 cells/mm) | 27 (100%) | 39 (49%) | n/a |
| AGA positive | 21/26 (81%) | 33/66 (50%) | n/a |
| Reticulin antibodies | 27/27 (100%) | 18/78 (23%) | n/a |
| HLA DQ2 | 19/21 (90%) | 20/67 (30%) | |
The mean density of γδ+ IELs was significantly greater in CD patients compared with those patients where CD was excluded on biopsy, and compared with biopsy-negative controls. The density of these IELs was also significantly higher in patients with CD excluded on biopsy compared with controls. Because the authors used the ESPGAN criteria, which require some degree of villous atrophy, the fact that 50% of subjects with CD excluded based on these criteria were AGA positive raises the question of how many of these were actually CD patients. However, based on the reported data, elevated γδ+ IELs were calculated to have a sensitivity of 100%, but a specificity of only 50.6%, although the true specificity is likely higher. In the biopsy-negative suspected CD group, 66 out of the 79 underwent testing for HLA DQ2. Out of these patients, 46 tested negative for HLA DQ2. Given the high NPV of this test, it is likely that most of those patients do not have CD. Recalculating the specificity based on this assumption would raise its value, but unfortunately a breakdown of the number of patients with normal and elevated IEL in relation to HLA DQ2 was not reported. In any case, a better comparison would have been with the biopsy-negative control subjects, but the number of control subjects with raised IELs is not reported. Based on the mean density of IELs in this group, the number of patients with elevated IELs is likely to be low. During follow-up of the children suspected of having CD, but with normal mucosal biopsy and positive serology, four patients developed CD and responded to a GFD, further suggesting that this “control” group of patients with CD “excluded” on biopsy likely contained true CD patients who did not have villous atrophy.
The results also suggest that the measurement of γδ+ IELs can be valuable in the diagnosis of CD, and hint at the fact that the requirement of villous atrophy on biopsy may miss some subjects with CD, particularly if they have raised IEL levels, positive serology and are HLA DQ2 positive.
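The sensitivity and specificity quoted for elevated γδ+ IELs can be reproduced from the counts in the table above, as in the short Python sketch below. The second calculation is a hypothetical "what if": it assumes, purely for illustration, that the four patients who later developed CD all had elevated γδ+ IELs and are reclassified as true CD, which the source does not report.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) from the four cells of a 2x2 table."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts taken from the table: elevated γδ+ IELs in 27/27 CD patients
# and in 39/79 patients with CD excluded on biopsy.
cd_total, cd_elevated = 27, 27
excluded_total, excluded_elevated = 79, 39

sens, spec = sensitivity_specificity(
    true_pos=cd_elevated,
    false_neg=cd_total - cd_elevated,
    false_pos=excluded_elevated,
    true_neg=excluded_total - excluded_elevated,
)
print(f"As reported: sensitivity={sens:.1%}, specificity={spec:.1%}")

# HYPOTHETICAL reclassification (assumption, not reported in the source):
# the four patients who developed CD at follow-up all had elevated IELs.
reclassified = 4
sens_adj, spec_adj = sensitivity_specificity(
    true_pos=cd_elevated + reclassified,
    false_neg=cd_total - cd_elevated,
    false_pos=excluded_elevated - reclassified,
    true_neg=excluded_total - excluded_elevated,
)
print(f"After hypothetical reclassification: specificity={spec_adj:.1%}")
```

The sketch shows the direction of the effect discussed in the text: every patient moved from the "CD excluded" group to the CD group removes a false positive and raises the calculated specificity.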
| | Sub-total/total villous atrophy (n=18) | Moderate villous atrophy (n=7) | Normal mucosa (n=9) | Pediatric controls (n=15) | Adult controls (n=15) |
|---|---|---|---|---|---|
| Diet | normal | normal | GFD | n/a | n/a |
| γδ+ IELs/100 ECs | 14.8 | 17.5 | 14.5 | 3.1 | 3.6 |
The density of γδ+ IELs/100 enterocytes was significantly higher in CD patients (15.4, n=34) compared with pediatric and adult control patients (3.1 and 3.6, respectively). However, the density did not correlate with histologic grade or with a GFD. Unfortunately, this study has several methodological flaws, and estimates of the sensitivity and/or specificity of IELs in CD could not be derived. However, the study does indicate the potential usefulness of measuring γδ+ IELs in the overall evaluation of biopsy specimens for possible CD, and again demonstrates that CD patients can have a biopsy with normal villous structure that can be distinguished from normal controls by assessing the number of IELs.
In an interesting comparative study of the correlation of IELs with AGA positivity by ELISA, O'Farrelly et al.444 studied 25 patients who had typical histologic features of CD and who were subsequently placed on a GFD. Ten of these were AGA positive, whereas 15 were negative. The second group consisted of 28 subjects suspected of CD but with “normal” small bowel histology. Twelve were AGA positive and 16 were negative. Increased levels of IELs were seen in both AGA-positive (82.5) and AGA-negative (74.3) CD patients (difference not significant). On the other hand, among those with “normal” histology, AGA-positive subjects had a significantly higher density of IELs than those who were AGA negative (42.4 vs 17, p<0.001). These data suggest that subjects suspected of CD with normal villous structure who have raised IEL densities should be further evaluated for CD, especially if serology is positive. These are also the types of patients in whom response to a GFD may be invaluable to firmly establish the diagnosis and help clarify the diagnostic value of low-grade histologic lesions.
| | Confirmed CD (n=9) | CD under investigation (n=40) | Controls (n=143) |
|---|---|---|---|
| IELs/50 ECs | 68.55 | 51.21 | 11.14 |
| # with raised IELs (estimated from figure) | 9 | 40 | 2 |
These results again suggest the usefulness of IELs in the evaluation of histology of patients being assessed for CD, and suggest a sensitivity of raised IELs of 100%, and a specificity of 98.6%. Unfortunately, the authors do not report the number of individuals under investigation for CD who actually ended up having CD, so as to estimate the diagnostic parameters in this group.
| | Untreated CD (n=138) | Treated CD (n=198) | Suspicion of CD with normal villi (n=545) | Controls (n=59) |
|---|---|---|---|---|
| CD3 + IELs | 68* | 40* | 26 | 30 |
| γδ+ IELs | 19.8* | 12* | 3.2 | 2.3 |
| Villous/crypt ratio | 0.6* | 1.9* | 2.8 | 3.0 |
*Statistically different from control
The authors noted that using a cut off of 37 cells/mm for CD3+ and 4.3 cells/mm for γδ+ IELs, the sensitivities and specificities were 93% and 73% for CD3+, and 93% and 88% for raised γδ+ IELs, respectively. The PPVs and NPVs for raised γδ+ IELs were 95% and 85%, respectively, in this population. However, these results are based on the well-documented clear-cut CD group, and did not take into consideration the CD patients that might be in the suspicious but normal villi group. Among the patients with a suspicion of CD but normal villi and high γδ+ IELs (>4.3), 28% were EMA positive compared with only 8% with normal γδ+ IELs (<4.3). Unfortunately, the outcomes of these patients are not reported, so one cannot comment further based on this study about the usefulness of IELs in Marsh I or II patients.
| | CD (n=8) | Treated CD (n=4) | Non-CD (n=16) | Controls (n=11) |
|---|---|---|---|---|
| Mean age | 33.5 | 46.3 | 46.4 | 39.1 |
| IELs/100 ECs by H&E staining | 42.1 | 29.2 | 36.8 | Not increased |
| IEL/100 ECs in villous tip by CD 3 staining | 47.5 | 29.4 | 33.2 | 8.2 |
There were no statistically significant differences between any of the groups when IELs were measured with H&E staining. However, all pair-wise comparisons were statistically different, except between the treated CD group and the non-CD group, when villous-tip IELs were counted with CD3 staining. The authors conclude that villous tip IELs are more specific indicators of CD, particularly with CD3 staining (which is more readily available than staining for γδ+ IELs), and suggest that the specificity of low grade Marsh lesions could be improved by these techniques.
| | CD (n=12) | Non-CD (n=66) | Controls (n=24) |
|---|---|---|---|
| Mean age | 35.2 | 36.1 | 34.5 |
| IgA EMA | 8 | 3 (no response to GFD) | n/a |
| IgA AGA | 5 | 13 (all EMA neg.) | n/a |
| Villous tip IELs | 11.6 | 4.3 | 2.2 |
| IELs distributed evenly along the villi | 9/12 (75%) | 3/68 (4%) | 0 |
n/a = not applicable
The authors found that the mean villous tip IEL density was significantly greater in the CD group than in the non-CD and control group. A more even distribution of IEL along the villi was also found to be significantly more common in the CD group compared with the other groups. However, this last point is controversial. Unfortunately, given that this is a small study, the authors did not look at differences in these characteristics among CD patients with different histologic grades.
Kuitunen et al.448 compared the histologic features of children with untreated CD, treated CD, other GI disorders (cow's milk allergy, DH, congenital lactase deficiency, acrodermatitis enteropathica, and giardiasis) and a group of control subjects without GI pathology. Of the 52 children with CD in this group, all had severe villous atrophy. CD patients had the lowest enterocyte height, and the most intense IEL infiltration of the studied groups. The authors found no overlap between CD patients and controls for the density of IELs, villous height, crypt depth, and villous height to crypt depth ratio; all these parameters were statistically different between the CD patients and controls.
Kaukinen et al.449 studied 96 consecutive adults found to be ARA or AGA positive and compared them with 27 ARA- and AGA-negative patients with dyspepsia. All patients underwent duodenal biopsy and CD was diagnosed on the basis of a villous height to crypt depth ratio of less than two and crypt hyperplasia. Twenty-nine patients met their biopsy criteria of CD (18 ARA- and AGA-positive patients, nine ARA-positive patients, and two AGA-positive patients). The 29 CD patients were placed on a GFD and of the 21 who were rebiopsied at 6 to 12 months, all showed unequivocal histologic improvement. The mean densities of IELs in CD patients, serology-positive biopsy-negative patients, and control patients were 87, 38, and 25 cells/mm, respectively. These numbers were statistically different. The mean density of γδ+ IELs among the CD patients was 16.6. Eleven serology-positive patients with normal villous structure (presumably Marsh I and II) expressed HLA DR and had higher levels of γδ+ IELs (mean of 13.4 cells/mm) than the non-CD controls. A repeat biopsy (time unspecified) was performed in 12 serology-positive patients with normal villous structure at the time of the first biopsy. Ten of these had raised γδ+ IEL density on biopsy (Marsh I or greater). Five of these 12 were found to have villous atrophy (Marsh IIIa or greater). This study further illustrates the later development of CD in subjects with mild histologic changes, and suggests that although the specificity of villous atrophy may be high (all patients responded to a GFD), the sensitivity of villous atrophy (Marsh IIIa or higher) is lower than that of the serological test used in this study. This suggests that using a lower biopsy cut-off grade could improve sensitivity, albeit at the cost of specificity.
Using another approach, Wahab et al.450, 451 identified 38 patients with symptoms of malabsorption who only demonstrated raised epithelial lymphocytes on duodenal biopsy (Marsh I). These patients were given a gluten challenge of 30 g/day for 2 months, while maintaining their normal GCD. Twelve of 38 patients developed worsening mucosal lesions of crypt hyperplasia and partial or subtotal villous atrophy. After institution of a GFD all 12 patients showed improvement of their malabsorption, and improvement of their histology, suggesting that they truly had CD.
The same authors451 similarly studied 27 patients referred for malabsorption who were found to have a Marsh II lesion. HLA DQ2 or DQ8 was found in 21 of 27 patients (78%). The authors motivated 25 patients to follow a GFD, and all showed symptomatic improvement. The two patients who refused the GFD progressed to a Marsh IIIa lesion at follow-up. Although these data provide evidence of the true existence of CD in patients with Marsh II lesions, the frequency is unlikely to be as high as reported here. The high NPV of HLA DQ2/DQ8 suggests that at least some of the six testing negative likely do not have CD. In any case, this study adds further evidence to the notion that a Marsh III cut-off will miss some patients with CD.
In a very interesting study, Mahadeva et al.452 identified all duodenal biopsies performed over a 1-year period with increased levels of IELs, yet normal villous structure. Biopsies were formalin fixed and stained with H&E. Other biopsies showing at least subtotal villous atrophy and increased IELs were considered as “suggestive of CD.” Two normal control duodenal biopsies for every case of increased IELs with normal villous structure were also obtained. The upper limit of normal for IEL levels in this study was 22 IELs/100 ECs. Out of 626 biopsies assessed, 14 (2.2%) were found to have increased IELs and normal villous structure, whereas 15 (2.4%) cases of CD were identified. Normal histology was found in 502 (80.2%) of the biopsies. The biopsies with raised IELs had a mean of 38 IELs/100 ECs (range of 27–46). Control biopsies, on the other hand, had a mean of 12.4 IELs/100 ECs (range of 2–20). The presence of GI symptoms did not differentiate those with raised IELs from controls or CD patients in this cohort. Six of the 14 patients with raised IELs had positive EMA and/or unexplained anemia and were suggested as having “latent” CD by the authors. Unfortunately, follow-up in this group was incomplete, with only three of these patients undergoing repeat biopsy. As with the previously described studies, the group of patients evaluated for possible CD who have isolated increased IELs may contain a subset of true CD patients. In fact, if one assumes that the six EMA-positive subjects with raised IELs do in fact have CD, then one can estimate that using a lower histologic grade to define CD in this population would have resulted in a sensitivity of biopsy of 100% and a specificity of 98%, since only eight patients out of the studied sample of 531 would have been misclassified as having CD when in fact they did not.
Of course, the expected specificity would not be as high as the one produced in this exercise since the authors do not tell us the histologic features or the diagnoses of the remaining 95 patients (626 biopsied, minus 502 normal, minus 15 CD, minus 14 raised IEL and normal villous structure = 95). However, taking this exercise further, if we assume that all of the other 95 patients were misclassified as having CD, then the specificity would drop to a still respectable 83%. Clearly, this type of study is the starting point in assessing the diagnostic parameters of the biopsy itself as a test. However, what is needed to fully assess biopsy as a test is a clearer measure of the false positive and negative rates. This can only be accomplished by using a battery of tests (biopsy, serology, HLA) to act as a gold standard to initially identify all potential cases, and then a follow-up period (response to GFD or gluten challenge) to assess the permanence of the diagnosis and the utility of biopsy at various cut-offs when used alone.
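The arithmetic behind the two specificity estimates in this exercise can be laid out explicitly. The Python sketch below uses only the counts quoted in the text; as in the text, it rests on the assumption that six of the 14 patients with raised IELs and normal villous structure truly have CD, and the variable names are ours.

```python
# Counts quoted in the text for the Mahadeva et al. biopsy series.
total_biopsies = 626
normal_histology = 502
cd_with_villous_atrophy = 15
raised_iel_normal_villi = 14
assumed_true_cd_among_raised_iel = 6   # assumption made in the text

unaccounted = (total_biopsies - normal_histology
               - cd_with_villous_atrophy - raised_iel_normal_villi)

# Lowering the histologic cut-off labels all 14 raised-IEL biopsies as CD,
# so the raised-IEL patients without CD become false positives.
false_positives = raised_iel_normal_villi - assumed_true_cd_among_raised_iel
true_negatives = normal_histology

# Scenario 1: only the 531 classified biopsies are considered.
specificity_classified = true_negatives / (true_negatives + false_positives)

# Scenario 2 (worst case): all unaccounted biopsies are non-CD patients
# who would have been misclassified as CD.
specificity_worst_case = true_negatives / (
    true_negatives + false_positives + unaccounted)

print(f"Unaccounted biopsies: {unaccounted}")
print(f"Specificity, classified biopsies only: {specificity_classified:.1%}")
print(f"Specificity, worst case: {specificity_worst_case:.1%}")
```

Both results round to the figures given in the text (98% and 83%); the sensitivity of 100% follows because, under the stated assumption, all 21 presumed CD patients would be labeled as CD by the lower cut-off.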
| | Histology | EMA+ | tTG+ | HLA DQ2 | γδ+ IELs |
|---|---|---|---|---|---|
| Initially | Marsh III: 2 (patchy); Marsh II: 7; Marsh I: 1 | 8/10 | 9/10 | 9/9 | Marsh III: 25 cells/mm; Marsh I–II: 13; controls: 1.4 |
| After GFD | All Marsh II re-biopsied: Marsh I: 2; Marsh 0: 5 | 0/10 | 1/10 (slightly elevated) | Same | Reported as decreased; values not reported |
Although this is a small study with possible selection bias, the authors demonstrate that in a subset of patients suspected of having CD but without villous abnormalities, CD was diagnosed in all on the basis of a response to a GFD. Raised levels of γδ+ IELs, positive serology, and HLA DQ2 positivity supported the diagnosis of CD. Patients with CD and Marsh I-II lesions had significantly higher levels of IELs than controls. Unfortunately, this study did not include a larger sample of patients with Marsh I-II histology that included serology-negative subjects. Although it is clear from this study that CD can exist in patients with Marsh I-II lesions and raised γδ+ IELs, it is difficult to generalize these results to an unselected sample of suspected CD patients.
In a somewhat complicated but important study, Kaukinen et al.98 assessed 271 patients with suspected CD by biopsy. Forty-five patients were classified as having definite CD on the basis of a Marsh III lesion, while in 136 patients CD was excluded on the basis of a Marsh 0 lesion and normal levels of γδ+ IELs. The remaining 76 patients had an uncertain diagnosis of CD based on biopsy (absence of villous atrophy) and underwent HLA DQ2 and DQ8 testing. In 59 of these patients, there were minor mucosal lesions or positive serological markers, while 17 were already on a GFD prior to biopsy. CD was excluded in 11 of these 17 patients on a GFD. Of the remaining 59 patients, CD was excluded in 22 because of a negative HLA DQ2/8 result, given the high NPV of this test, whereas 37 were DQ2/8 positive and remained under suspicion of CD. Overall, CD was excluded in 33 of 76 patients. Among patients suspected of CD, but without villous atrophy, Marsh I-II lesions were found in 20 DQ2/8-positive patients versus five DQ2/8-negative patients. Elevated levels of γδ+ IELs were found in 20 patients who were DQ2/8 positive compared with seven patients who were DQ2/8 negative, and IgA-EMA was found in 16 patients who were DQ2/8 positive compared with 0 patients who were DQ2/8 negative. Although data are not provided for some patients, one can estimate the sensitivity of using a Marsh III cut-off. We know that CD was diagnosed outright in 45 out of 271 patients, but with subsequent testing a further 37 patients were found to be positive for HLA DQ2 or DQ8. At least 16 (EMA positive) and possibly 20 (increased IEL counts) of these patients likely have CD. Based on these assumptions, the sensitivity of a Marsh III cut-off is between 69% (if the 20 DQ2/8-positive patients with increased IELs have CD) and 74% (if the 16 EMA- and DQ2/8-positive patients have CD). The sensitivity would be lower if more of the DQ2/8-positive patients turned out to have CD.
The specificity of that cut-off would appear to be 100%, although we are not told if the Marsh III patients all improved on a GFD. Clearly, using a biopsy cut-off lower than Marsh III would have increased the sensitivity, but unfortunately we are not given enough information to estimate this reliably.
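The sensitivity range quoted for a Marsh III cut-off follows directly from the reported counts, as the short Python sketch below shows. The 45 Marsh III patients are treated as detected cases and the DQ2/8-positive patients without villous atrophy as missed cases. The third scenario, in which all 37 DQ2/8-positive patients have CD, is an illustrative lower bound added here and is not an estimate made by the study authors; the names are illustrative.

```python
MARSH_III = 45            # CD diagnosed outright on biopsy (detected cases)
EMA_AND_DQ_POS = 16       # DQ2/8-positive and IgA-EMA-positive, no villous atrophy
RAISED_IEL_DQ_POS = 20    # DQ2/8-positive with elevated gamma-delta IELs
ALL_DQ_POS = 37           # every DQ2/8-positive patient without villous atrophy


def marsh_iii_sensitivity(missed_cases: int) -> float:
    """Sensitivity of a Marsh III cut-off if `missed_cases` milder lesions are true CD."""
    return MARSH_III / (MARSH_III + missed_cases)


upper = marsh_iii_sensitivity(EMA_AND_DQ_POS)       # 45 / 61
lower = marsh_iii_sensitivity(RAISED_IEL_DQ_POS)    # 45 / 65
floor = marsh_iii_sensitivity(ALL_DQ_POS)           # 45 / 82, hypothetical extreme

for label, value in [("16 missed", upper), ("20 missed", lower), ("37 missed", floor)]:
    print(f"{label}: sensitivity {value:.0%}")
```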
This study, with its battery of tests, comes closer to the ideal design for estimating the diagnostic characteristics of biopsy, but unfortunately it has significant shortcomings. To be fair, the intent of the study was not to determine the sensitivity of a Marsh III cut-off. However, for the sake of future studies in this area, several design changes could have allowed this estimation. This study had two important positive aspects: it used a clinically relevant population of patients suspected of having CD, and all the subjects underwent biopsy. However, it would have been ideal if all the subjects had also undergone HLA testing and serology. Furthermore, follow-up of positive and negative patients, and/or assessment of the response to a GFD or the use of a gluten challenge in difficult-to-diagnose patients, would have allowed for the estimation of false-positive and false-negative cases.
Relationship of serology to histology. As the data from the previous discussion suggest, CD clearly exists in patients with histological grades milder than Marsh IIIa. The fact that the sensitivity of biopsy is improved by using a lower grade as a cut-off raises an important question: which test is most sensitive for detecting CD with mild histologic changes, biopsy or serology? The issues surrounding this question are addressed in the later portion of the serology discussion section, and a detailed narrative summary of the studies of the relationship of serology to histology can be found in Appendix H. To summarize, data from these studies, as well as some data from Celiac 5, suggest that the sensitivity of serology drops with milder histologic grades and that serology alone would miss CD patients with mild histologic grades.
In summary, CD exists in patients with histology grades less than Marsh IIIa. The sensitivity of biopsy at a Marsh IIIa or higher cut-off is likely less than that of serology with EMA or tTG. If lower Marsh grades are used, the sensitivity of biopsy increases, and it is possible that if morphometric techniques including assessing IEL densities are used, the specificity may not suffer greatly. Ultimately, the question of the true sensitivity of biopsy can only be answered with a well-conducted study that attempts to identify all possible CD patients in a given clinically relevant population using multiple simultaneous tests (e.g., serology, HLA) in addition to biopsy. All patients, those who clearly have CD, those in whom CD seems excluded, as well as equivocal cases, need to be followed for the assessment of the permanence of their “diagnoses.” Equivocal cases could also be considered for further testing, either by assessing response to a GFD or by gluten challenge, to help clarify their diagnosis. Although there are other potential variables to consider, with these measures the false-positive and false-negative rates of biopsy, and hence a clearer estimate of the sensitivity and specificity, can be determined.
The crude incidence of CD among western European and North American countries over the past 25 years has varied between 1 and 51 per 100,000, and the cumulative incidence by age 5 between 0. 118 and 9 per 1,000 live births. Variations in CD incidence have been striking not only between neighbouring countries, as is the case for Sweden and Denmark, but also between time periods for the same region, as was noted in the UK between the 70's and 80's as well as in Sweden over the 90's.
It is important to note that there were important methodological differences among the studies, from using patient registers200 to actively screening at-risk patients.128 Clinical practice also varied between time periods and regions. The advent of serological testing in the early 90's changed attitudes towards screening and identifying populations at risk with resulting higher detected incidences of CD. In some studies, active efforts were made to detect CD among asymptomatic subjects, such as the case in Finland where all subjects referred for endoscopy underwent small intestinal biopsy, independent of the cause for referral.199 The incidence of CD is also expected to vary according to the genetic make-up of the studied population, although the prevalence of at-risk HLA haplotypes was only noted in one study.128 These observations also highlighted the importance of dietary factors in triggering so-called CD epidemics among genetically predisposed populations. It would appear that breastfeeding bears a protective role, while early introduction of gluten, as well as the amount of gluten content in the diet may promote the early serological and pathological manifestations of CD. It is unknown whether these factors trigger an earlier expression of a disease which would become manifest anyway, or whether they trigger the appearance of a disease which may not otherwise occur, even later on in life.
In conclusion, caution should be exercised when extrapolating the noted incidence for one given region to a whole country, in particular in countries such as the US where there are differing population ethnicities among regions, between rural and urban areas, as well as between small and large cities. However, it remains that the true incidence and prevalence of CD are, if anything, greater than reported in clinical settings, since estimates derived from screening and case-finding efforts were consistently greater than those relying on the diagnosis of clinically suspected cases. Lastly, it is important to bear in mind that, considering the large proportion of subjects with silent CD (the so-called celiac iceberg), observed incidences will depend upon the effort spent on screening. This is well illustrated by the difference between the relatively low incidence observed over 30 years in Olmsted County, where the majority of cases had clinically overt disease, and the very high incidence noted in Denver, Colorado, which resulted from a systematic and prospective screening of newborns and children at risk.
The included prevalence studies demonstrated important differences in execution, tests for prevalence assessment, and in patient sampling, making pooled estimates of prevalence unreliable. Furthermore, the discussions regarding the operational characteristics of the serological tests themselves, the influence of disease prevalence on the PPVs and NPVs of these tests, and the criteria by which clinical and histological CD is defined, have to be kept in mind when considering the results of this section. The last point regarding the histologic definition of CD is particularly important in this setting, since one-third of the included studies did not seek histologic confirmation of serology diagnosed CD, and in another four studies, a large proportion of the serology-diagnosed patients did not undergo histologic confirmation. Finally, because of the previously discussed concerns regarding the sensitivity of serological tests in lower grade histological lesions, and the potential for missing true CD patients based on histologic criteria that require villous atrophy, the true prevalence of CD in the general population may still have been underestimated in these studies.
With these points in mind, the results of this report suggest that the prevalence of CD in the general unselected populations of North America and Western Europe is quite high and likely falls within the range of 0.5% to 1.26% (1:200 to 1:79). Smaller sample-size studies tended to give wider estimates ranging from 0.17% to 2.67%. Among the studies from the US, the range of prevalence was 0.4% to 0.95% in adults, and 0.31% in children. In Italy, the range of prevalence was between 0.2% and 0.8%, whereas the Scandinavian countries, Ireland and the UK, tended to show a higher prevalence of CD of approximately 1.0% to 1.5%, although there were also studies from those same countries that showed a lower prevalence.
In summary, the prevalence of CD in Western populations is likely close to 1% (1:100) and may be higher in Northern European countries. A firm estimate of the prevalence is impeded by between-study differences, and uncertainties regarding the performance of serological tests at these relatively “low” prevalences, compared with the 40% to 60% prevalences in the studies of the diagnostic characteristics of these same tests (Celiac 1).
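The concern about test performance at low prevalence can be illustrated with Bayes' rule. The sketch below computes the positive and negative predictive values of a test across a range of prevalences. The sensitivity and specificity used (95% and 98%) are hypothetical placeholders chosen only for illustration; they are not estimates from this report, and the function name is likewise illustrative.

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    false_neg = (1 - sens) * prev
    true_neg = spec * (1 - prev)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# HYPOTHETICAL operating characteristics, for illustration only.
SENS, SPEC = 0.95, 0.98

# General-population prevalences versus those typical of diagnostic accuracy studies.
for prev in (0.005, 0.01, 0.05, 0.40, 0.60):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:6.1%}: PPV {ppv:6.1%}, NPV {npv:6.1%}")
```

Under these assumed values, the same test yields a far lower PPV at a prevalence near 1% than at 40% to 60%, which is the reason a screen-positive result in the general population carries more uncertainty than the same result in a referral population.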
The prevalence of CD is greatly affected by the study population. In populations where the diagnosis of CD is clinically suspected, either because of the presenting symptoms or the presence of associated conditions, its prevalence varied between 1.1%307 and 50%.301 This illustrates well how the patient selection process will influence the prevalence of the condition—studies reporting very high prevalence had populations that originated from tertiary, referral centers, while studies reporting low prevalence had populations that tended to originate from general practice. Although the report of the large American study of CD prevalence in at-risk and not-at-risk individuals did not specify how their subjects had been gathered,206 we can assume that these were derived from community practices, considering their large number.
Altogether the variations between the study populations, the diagnostic criteria and the study design were such that it was inappropriate to statistically combine the observed prevalence to obtain a summary measure. Nonetheless, considering studies with subjects who were not originating from a specialized referral centre, the observed prevalence of CD in subjects with symptoms or conditions associated with CD ranged between 1% and 4%.
The findings of this report suggest that the prevalence of CD in patients with type I diabetes is higher than the prevalence in the general not-at-risk population. These findings appear to be consistent across the studied age groups and screening methods. Although the magnitude of the risk of CD among patients with diabetes varied to some degree from study to study, many of these differences can be explained by issues of study design. An overall pooled estimate of the prevalence of CD in diabetes was not calculated due to these study differences.
Almost uniformly, the prevalence of CD by biopsy was to some degree lower than the prevalence by serology. This may reflect some false-positive serology results in these studies. Additionally, all these studies used some degree of villous atrophy to make a diagnosis of CD, which may underestimate the true biopsy prevalence of CD, since CD patients with Marsh I or II lesions were not considered. The prevalence by biopsy seemed to be lower still in studies that required subtotal or greater villous atrophy to make a diagnosis of CD. Furthermore, the prevalence by biopsy was uniformly low, as would be expected, in studies in which a large proportion of the screen-positive patients did not undergo biopsy. In these studies, the prevalence by biopsy was typically less than 2%, which likely represents an underestimation of the true prevalence of CD in this population.
The prevalence of CD by serology varied greatly with lows near 1% and highs close to 12%. However, the majority of studies, and particularly those using EMA or tTG, demonstrated prevalences in the range of 4% to 6%. Although the prevalence by biopsy also varied, the typical study with complete biopsy confirmation of serology-positive patients demonstrated prevalences in the range of 3% to 6%.
This evidence report has gathered the reported studies examining the relationship between diabetes and CD. Bearing in mind the limitations noted above, we believe there is sufficient evidence to show that individuals with type I diabetes are at higher risk of CD. The prevalence of CD in this population is likely between 3% and 6%.
The prevalence of CD in relatives of patients with CD is elevated, both in first-degree and second-degree relatives. That prevalence varied between 2.8%246 and 17.2%235 in first-degree relatives and between 2.6%206 and 19.5%235 in second-degree relatives. The prevalence remains elevated among first cousins, and was 17% in the only study of these subjects.235
We have identified several factors that can be responsible for the variation in the observed prevalence, in particular the selection of the families, the relation to the index case, the diagnostic criteria, and the choice of study design.
The prevalence of CD appears to be generally higher in families with multiple known cases, such as reported by Book et al.235 and Mustalahti et al.241 Most other studies referred to their subjects as originating from a “CD family,” without systematically documenting the proportion of families with multiple known cases of either CD or DH.
As expected, in studies that looked at various degrees of relation, the risk was greatest in the first-degree relatives.206, 235, 239 However, Book et al.235 found no difference in prevalence between second-degree relatives and first cousins, i.e., 19.5% (95% CI: 15.1–23.9) and 17.0% (95% CI: 6.4–27.7), respectively.
Also, the age of the screened population might be a factor even beyond infancy, since it has been observed by prospective serological248 and histological237 follow-up studies that the serological and histological markers of CD can develop after an initial negative screen in a genetically predisposed individual. Therefore, a one-time assessment or screen in these individuals may be insufficient.
The serological diagnosis of CD will be affected by the diagnostic accuracy of the test. Fortunately, 11 out of 12 studies that used serological screening were EMA-based, a test with good diagnostic accuracy in populations with relatively high prevalence, such as relatives of CD patients. The single non-EMA study236 used AGA, a test with a lower sensitivity and specificity than EMA, but all seropositive subjects underwent a confirmatory intestinal biopsy.
The histologic diagnostic criteria also affect the reported prevalence, as was well illustrated by the study by Tursi et al.,249 where Marsh grades of I and II were also considered diagnostic, resulting in a prevalence of 44.1%.
The study design, especially whether all at-risk individuals are biopsied as opposed to solely those who satisfy non-invasive criteria, is also to be considered. The EMA-based serological tests can miss milder forms of enteropathy, as has been discussed, and this may explain why the prevalence of CD was generally higher in studies where all identified relatives were biopsied.
The results of this report demonstrate an increased prevalence of CD in patients with IDA. The prevalence is highest (between 10% and 30%) in studies of patients with GI symptoms, or in patients who have no gross lesions seen at initial investigation. CD appears to also be common in premenopausal women, both with (4.5%) and without (33%) heavy periods. Overall, in asymptomatic IDA patients assessed by serology or biopsy, the prevalence of CD was between 2.3% and 6%. Therefore, patients with IDA, particularly those without a clearly identifiable cause, should be evaluated for CD as part of their investigation.
The studies of the prevalence of CD in patients with low BMD suggest that between 0.9% and 3% of patients with osteoporosis have CD. As a comparison, Fasano et al.15 found that in the United States 0.75% of the general not-at-risk population, and 4.55% of first degree relatives of CD patients were found to have CD.
The results from these studies should be interpreted within the context of some methodological limitations. Three of them used AGA as the initial screening test to prompt further investigation, and we have shown that the sensitivity of this test is not high. Furthermore, the biopsy criteria used to define CD were either not reported or required the presence of subtotal or greater villous atrophy (Marsh IIIb or greater). We have also shown that CD exists in patients with lower grade histological lesions. Furthermore, the study results are contradictory. Two showed a risk of CD higher than the general population,296, 298 while the other two did not. In particular, the study by Mather et al.297 found that seven out of the 96 screened patients were positive for EMA-ME, but none of these were positive on biopsy. Given what we have seen regarding the specificity of this test being close to 100% (and therefore the PPV would be expected to be high as well), it is unlikely that there would be so many false positives even if the prevalence of CD were low, which raises the question of whether early grade CD patients remained undiagnosed. As such, it is difficult to draw any firm conclusions about the true prevalence of CD in this population, given the contradictory results, the fact that lower grade lesions were not considered, and that no follow-up data were provided on the patients who screened positive for serology but did not meet the biopsy criteria. Taking into account these limitations, it is likely that the prevalence of CD in patients with osteoporosis is higher than that in the general population.
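The argument that seven EMA-positive results out of 96 are unlikely to all be false positives can be made concrete with a binomial tail probability. The sketch below assumes, purely for illustration, a specificity of 99% and takes the extreme case in which none of the 96 screened patients has CD. The 99% figure is a hypothetical stand-in for "close to 100%" and is not a value reported by Mather et al.; the result is sensitive to this choice.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


N_SCREENED = 96               # patients screened with EMA-ME (from the text)
EMA_POSITIVE = 7              # seropositive, all biopsy negative (from the text)
ASSUMED_SPECIFICITY = 0.99    # HYPOTHETICAL value standing in for "close to 100%"

# Extreme case: no patient has CD, so every positive result is a false positive.
false_positive_rate = 1 - ASSUMED_SPECIFICITY
tail = prob_at_least(EMA_POSITIVE, N_SCREENED, false_positive_rate)

print(f"expected false positives: {N_SCREENED * false_positive_rate:.2f}")
print(f"P(at least {EMA_POSITIVE} false positives): {tail:.2e}")
```

Under this assumed specificity, roughly one false positive would be expected, so seven would be a rare event. A lower assumed specificity would make the observation less surprising.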
The association between malabsorption and lymphoma is a concept that has evolved over the past century. The observation that a significant proportion of patients with intestinal lymphoma also had villous atrophy at a distance from the malignancy, or had previously been diagnosed with CD, led to the publication of several series on the topic.
Although the objective of the task order was not to determine the risk of CD in lymphoma per se, the broad coverage of our search strategy also allowed us to systematically appraise the literature on this question, and we were able to identify only two controlled studies on this association, which we describe here.454, 455
Johnson et al.455 performed a retrospective search of the five main pathology laboratories serving Northern Ireland to identify all the incident cases of small bowel lymphomas (SBL) and small bowel adenocarcinoma from 1987 to 1996. The clinical presentation of the cases, as well as the presence or absence of villous atrophy at a distance, were noted. The prevalence of CD in this group of SBLs was compared with that of the general population in Northern Ireland, as observed from serological screening of the population at large.188 There were 13 cases of CD (gender not reported) out of 69 cases of SBL, all of which were ETCLs. Only one out of the 13 CD cases was known to have CD prior to the diagnosis of SBL. The OR of CD in SBL was 27.98 (95% CI: 11.88–65.81) compared with the general population. The OR of unrecognized CD in SBL was 15.72 (95% CI: 9.71–25.45) compared with the general population.
In a prospective multicenter Italian study conducted between 1996 and 1999, Catassi et al.454 screened newly diagnosed adult patients with NHL for CD using EMA and AGA testing; EMA-positive or IgA-deficient patients underwent small bowel biopsy. There were six cases of CD out of 653 patients with NHL (prevalence 0.92%). Three had B-cell and three had T-cell lymphomas. Four out of six cases had lymphoma primarily located in the gut. Two patients were known to have CD for more than 1 year, one of whom was poorly adhering to a GFD. Two cases had been diagnosed with CD within 1 year of the diagnosis of NHL, whereas two other cases had no prior CD diagnosis. The prevalence of CD among these NHL patients was compared with that observed in two Italian studies that performed large-scale screening for CD.126, 222 The OR of CD in NHL was 3.1 (95% CI: 1.3–7.6) compared with an age- and sex-matched population.
These observations point to a clear association between CD and lymphoma. To determine the degree of association, or to quantify the risk of lymphoma in CD, we searched the literature for controlled studies of the incidence of lymphoma in CD. Unfortunately, the majority of publications on lymphoma in CD were uncontrolled. Typically, patients diagnosed with CD in a single institution were followed over time and the incident cases of lymphoma were described, along with characteristics of the affected patients, the course of their CD and the histological type of lymphoma. Unfortunately, such studies provide little confidence to estimate the true risk of lymphoma in CD, since lymphoma per se will occur in the general population. The incidence of lymphoma has to be compared with “controls,” matched on various characteristics such as age, sex, period and population. Any study that did not adjust the observed incidence to the expected incidence for age- and sex-matched individuals of the same population was deemed uncontrolled and excluded.
Cohort studies, either prospective or retrospective, constituted the majority of controlled studies. The incidence of lymphoma in a cohort of biopsy-proven CD patients, calculated as the number of lymphomas divided by the number of patient-years of follow up, was compared with that of an age- and sex-matched population from the same geographic area and time-period.
The SIR therefore represents the likelihood of lymphoma in CD patients relative to those who do not have CD in the same population. The value of the denominator reflects the incidence of lymphoma in a given population, so that it is not possible to pool SIRs from different populations.
The AR, however, is a measure of association that provides information about the absolute excess risk of disease in CD patients compared with “non-afflicted” individuals. This measure is defined as the difference between the incidence rates in the CD patients and normal population and, in a cohort study, can be calculated as the difference of cumulative incidence (risk difference) or incidence densities (rate difference) depending on the study design. The AR is a measure of risk which can be pooled; however, since incidence rates were reported in only two studies, we had insufficient data to generate a representative summary statistic.
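As a rough illustration of the two measures described above, the following Python sketch computes an SIR as observed cases divided by the cases expected from age- and sex-specific reference rates, and an AR as an incidence-density (rate) difference. All numbers, the stratum structure, and the names are hypothetical and chosen only to show the calculation; none are taken from the included studies, and individual studies may have standardized differently.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One age/sex stratum of a CD cohort (all values HYPOTHETICAL)."""
    person_years: float      # follow-up accumulated by CD patients in the stratum
    reference_rate: float    # lymphoma cases per person-year in the matched population


def expected_cases(strata: list[Stratum]) -> float:
    """Cases expected if the cohort had the reference population's rates."""
    return sum(s.person_years * s.reference_rate for s in strata)


def sir(observed: int, strata: list[Stratum]) -> float:
    """Standardized incidence ratio: observed / expected."""
    return observed / expected_cases(strata)


def rate_difference(observed: int, strata: list[Stratum]) -> float:
    """Attributable risk as a rate difference, per person-year."""
    total_py = sum(s.person_years for s in strata)
    return observed / total_py - expected_cases(strata) / total_py


# Hypothetical cohort: 10,000 person-years split across two strata.
cohort = [Stratum(4000, 0.0001), Stratum(6000, 0.0003)]
observed_lymphomas = 11

print(f"expected cases: {expected_cases(cohort):.1f}")
print(f"SIR: {sir(observed_lymphomas, cohort):.1f}")
print(f"AR: {rate_difference(observed_lymphomas, cohort) * 100_000:.0f} per 100,000 person-years")
```

The sketch also shows why the two measures behave differently across populations: the SIR scales with the reference rates in its denominator, whereas the rate difference is expressed in absolute units.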
Furthermore, studies varied greatly at several levels, in particular with respect to the definition of an incident case of lymphoma, the reported outcome measure, and the CD population selection.
Studies differed in their definition of observed cases of lymphoma, in the following manners:
Inclusion of malignancies that antedated the diagnosis of CD. In one American study, the number of at-risk years was calculated both from the time of CD diagnosis and from the time of onset of symptoms that could be attributed to CD.340 In a prior national survey of patients with CD,456 these authors had collected evidence that there is usually a long duration of symptoms before a diagnosis of CD is made in the United States, so they considered this accounting justifiable. However, authors from other countries specifically excluded the malignancies that were diagnosed prior to CD, reasoning that it was unknown whether these were truly “at-risk” periods and that this accounting could falsely inflate the incidence of lymphoma in CD.333 Considering that publications uniformly calculated and reported the incidence ratio based on the time period from the CD diagnosis, this is the measure of risk that we selected.
Inclusion of malignancies that were recognized simultaneously to the diagnosis of CD (i.e., within 1 to 12 months of diagnosis). In some cases, the diagnosis of CD can be unknown until the presentation of lymphoma. This fact highlights the possibility that lymphoma can occur in asymptomatic patients with CD. Although the importance of such cases is undeniable, the account of such cases can introduce bias and inflate the incidence of lymphoma in CD. In other words, the simultaneous diagnosis of CD and lymphoma is similar to an incident case in a patient with a “zero” duration of follow-up, i.e., is closer to a measure of prevalence than incidence. The inclusion of cases of lymphoma occurring in patients with previously undiagnosed CD should theoretically be related to all cases of CD, diagnosed and undiagnosed, in order to give an accurate estimate of incidence, which is obviously impossible. However, some studies chose to include such cases, while others excluded them from the incidence calculation. This distinction was noted in the results presentation.
Exclusion of malignancies that were diagnosed incidentally at autopsy. In their large Swedish cohort of individuals hospitalized with CD, Askling et al.337 also excluded unsuspected autopsy diagnoses of lymphoma, assuming that such entities would have been silent during life, and that they therefore could not be controlled for in the comparator group.
Case definition of lymphoma. Lymphomas are broadly categorized as Hodgkin's lymphomas and NHLs. The lymphomas that have been associated with CD have typically been of the NHL type, and so the majority of studies sought cases of NHL, with the exception of the Scottish study from Logan,336 where both Hodgkin's and NHLs were reported.
The reported outcome measures also varied and impaired our ability to combine observations. Some studies reported the incidence of lymphoma, while others, relying on death certificates for ascertainment of outcomes, reported on the mortality from lymphoma.
Finally, the patient selection also varied, along with the reporting of the circumstances that led to the diagnosis of CD. These factors limited our ability to draw conclusions on the risk of lymphoma in symptomatic versus asymptomatic patients with CD.
We were also unable to find controlled data on the risk of lymphoma in refractory CD, an objective which had been suggested by the TEP. We did find, however, two prospective studies and one retrospective study that could lend support to the notion that the risk of lymphoma in refractory CD is greater than that of responsive CD.457–459
In the Netherlands, Wahab et al.457 prospectively followed 158 biopsy-proven CD patients to assess the recovery of histological changes with a GFD over time. There were 11 incident cases of refractory CD with more than 5 years of follow-up, five of whom developed ETCL, in contrast to none of the remaining GFD-responding CD patients.
Goerres et al.458 reported on 18 patients diagnosed with refractory CD between 1998 and 2000, gathered from all over the Netherlands, whom they treated with azathioprine and prednisone. There were three men and 15 women, with a mean age of 58 years (range 39–82). Subtypes of IEL populations were analyzed by flow cytometry, allowing for the classification of refractory CD patients into two types: type I refractory CD (n=10), in which a normal IEL population is seen, and type II refractory CD (n=8), in which an aberrant IEL population is present. All of the patients with type I refractory CD responded to combined azathioprine-prednisone therapy, whereas none of the patients with type II refractory CD showed a response. In fact, six of the eight patients with type II refractory CD developed EATL within a 3-year period, and a seventh patient died with blastic T-cell-like cells in the small bowel and the liver, and myeloproliferative changes in the bone marrow. The authors concluded that type II refractory CD is a premalignant condition with a very poor prognosis.
In a French national cooperative study, the clinical information and tissue specimens necessary for IEL subpopulation analysis were gathered from 21 patients diagnosed with refractory CD between 1974 and 1998.459 There were five men and 16 women, with a mean age of 51 years (range 29–73 years). Nine of the 21 patients (43%) died from severe malnutrition and/or lymphoma (three patients) a mean of 6.7 (range 1–14) years after the onset of symptoms of refractory CD. A phenotypically abnormal IEL population associated with evidence of clonality was found in eight of the nine patients who could be tested. The authors suggested that refractory CD may be the missing link between CD and ETCL.
This systematic review identified nine controlled studies that met inclusion criteria. The major observation of our review is that the risk of lymphoma in CD was significantly increased compared with an age-matched population from the same region and period in eight out of nine studies. The SIR (NHL) varied from 2.66338 to 42.7,333 whereas the SMR from NHL or lymphoma in CD varied from 11.4337 to 69.3.339 This increased risk persists even when the cases that are diagnosed with lymphoma simultaneously or within 1 year of the diagnosis of CD are excluded from the calculation.
Some observational studies suggest that the risk of lymphoma, relative to patients of the same age without CD, may be highest in individuals who were diagnosed during adulthood,336, 337 and appears to decrease with adherence to a GFD, as shown by several authors.333, 336–339 It is also interesting to note that the only study that did not report a significant increased risk of lymphoma was one where 75% of patients were on a strict GFD.338
The differential risk of lymphoma among patients diagnosed with CD in adulthood versus childhood may indicate that early diagnosis and treatment with a GFD are protective. The possibility that a GFD may be protective is also supported by Askling et al.337 who found that the risk of lymphoma dropped to unity after 15 years of follow up. Limitations in the designs of these studies, however, prevent firm conclusions. These studies have followed relatively few patients diagnosed as children through middle age when the risk of lymphoma rises, and they may not have accounted for other factors (severity of symptoms, or other markers of disease activity) which might affect risk. The distinction between childhood and adult diagnosis of CD in the published cohorts relies on the presence or absence of CD-related symptoms during childhood, which has historically been a key factor in CD diagnosis. Based on the observations from these groups of patients, it would seem that continuous gluten exposure and ongoing mucosal damage set the stage for malignancy later on in life. It remains unclear, however, why some individuals would have persistent mucosal damage in the absence of symptoms. Would these individuals also carry other characteristics that modulate their risk of malignancy? As we tap into the base of the “celiac iceberg” through systematic screening, we will hopefully in the future be able to observe the incidence of lymphoma in child and adult CD populations who were identified through population screening, and placed on a GFD despite being asymptomatic during that period of their lives. The notion that lymphoma arises from prolonged antigenic stimulation should be confirmed if the risk of lymphoma in those individuals is, as expected, lower than in historical CD cohorts.
The search strategy did not identify any studies that would allow us to address the specific benefits and harms of testing with different strategies for CD. At present, there is inadequate information from the published literature on the benefits and harms of screening and the potential risks of undetected CD. Prospective trials of screening would be helpful to provide the data necessary to construct the tables that depict the consequences of screening specific populations. Information on the consequences of screening will come from the currently ongoing large population based prevalence studies.
The consequences of such issues as false-positive results were dealt with in the Celiac 1 Discussion. As discussed in that section, the definition of CD used and the prevalence of CD in the test populations have a great impact on the diagnostic parameters of the available tests. We have presented data that show that the sensitivity of the available tests declines considerably when applied to patients with low-grade histological lesions. Unfortunately, there are insufficient data to address the question of what is the consequence of missing patients with low-grade histological lesions if serological screening alone is used. As described in Celiac 1, all the diagnostic test studies of the various serological markers were undertaken in study populations in which the prevalence of CD exceeded that observed in most clinical situations. We have shown that the positive predictive value, which is predominantly influenced by the test specificity and the prevalence of CD in the test population, drops from the reported values to much lower values when the test is applied in typical clinical populations. To illustrate this point, Figure 31
The four studies of diabetes and CD in children/adolescents that evaluated the impact of a GFD found that body composition parameters improved on the GFD, but HbA1c levels did not improve. Some studies observed an increase in the insulin requirements after introduction of a GFD, which could be explained by improved absorption of nutrients.
The results of studies on anthropometrics and body composition in CD patients are variable due to differences in populations and in the methods used to evaluate body composition. Overall, weight and BMI improve after starting a GFD. Individuals with CD may have a lower BMI when compared with controls because of lower daily energy intakes, particularly in those who strictly follow a GFD.
A few small studies have evaluated the impact of the diet on nutritional parameters in newly diagnosed symptomatic CD patients. These studies found that nutritional status does improve in the majority of subjects with CD on a GFD. Certain biochemical parameters such as ferritin may take longer to normalize. There is evidence that the recovery of nutritional status is linked to improvement of villous atrophy. Larger studies of nutritional status in patients with classical and silent CD, and of the relationship of biochemical values to changes in histological grade on small bowel biopsy and compliance with the GFD, would be helpful.
Compliance with the GFD was assessed in adolescent populations in three studies and the results varied. Compliance with a strict GFD was greater in those who were symptomatic, compared with those who were diagnosed via a screening program. Another study in adults by Ciacci et al.460 looked at the correlation between intestinal biopsy and compliance (assessed by dietary interview) and found that intestinal damage was significantly associated with dietary compliance. Low or very low compliance with a GFD had a PPV of 92.8%, and good compliance had an NPV of 96.8%. This study also suggested that those with more severe symptoms at diagnosis were more likely to have better compliance. Given the poorer compliance in those without symptoms, different strategies to promote adherence with the GFD may need to be developed if screening for CD is promoted.
The justification for screening the general population for CD would be strengthened by well-conducted comprehensive cost-effectiveness analyses. Only one study360 appeared to include the majority of the components that have been recommended for the reporting of cost-effectiveness analyses (CCOHTA, Guidelines for Economic Evaluation of Pharmaceuticals: Canada, 1997). None of the analyses incorporated the use of health-related quality of life or utility assessments.
There were a number of methodological limitations in the studies that examined bone-related consequences of CD. Limitations included: selection of representative cases and controls, ascertainment of the outcome and failure to identify and control for relevant co-interventions such as calcium and vitamin D.
The issue of whether fractures are increased in individuals with CD appears to be somewhat controversial based on results of the included studies. Neither Thomason et al.394 nor Vestergaard et al.388 found increased fracture rates for CD subjects, whereas the recent population-based study by West et al.385 did find an increased rate of fractures. This is an important issue to clarify since osteoporotic fractures are one of the key reasons for promoting strict adherence to the GFD and for making decisions about screening. In some studies, the sample sizes were small and may not have been large enough to detect an increased risk in fractures in subjects with CD relative to controls. In addition, methodologies and study populations varied, and not all studies controlled for duration of CD. Moreno et al.392 found that the risk of fracture in subclinical and silent cases of CD was not significantly different from that of controls. Overall, the risk of fracture seemed to increase with age as one would anticipate and may be greater in those patients who were clinically symptomatic. Based on results of current studies, the risk of fracture appears to be highest prior to diagnosis of CD and diminishes once individuals are on a GFD. This latter finding would be consistent with the increase in BMD that is seen after 1 year on a GFD. Additional population-based fracture studies would be useful to clarify the relative and absolute risk of fracture in CD and to determine if it differs in asymptomatic cases.
Overall, the studies consistently documented an increased prevalence of osteoporosis/osteopenia in newly diagnosed patients relative to controls. There was a significant increase in BMD, especially within the first year of being on a GFD. Some of the variability in the results could be attributed to the proportion of patients who were compliant with the diet and to the use of co-interventions such as calcium and vitamin D. Moreno et al.392 found that the lumbar spine BMD did not differ in groups according to clinical presentation, but they did find a significantly lower T score of the femoral neck BMD in classically symptomatic cases versus subclinical or silent cases. Mustalahti et al.,378 however, found that BMD in the spine was lower in asymptomatic cases.
Based on the two studies in children,352, 377 BMD appears to normalize in children after treatment with a GFD. The normalization of BMD in children would support the need for early diagnosis of CD and treatment. However, in children skeletal growth may affect BMD, with some of the change relating to changes in growth. Most studies of BMD in adults on a GFD have found that the BMD is still reduced at all sites when compared to normal controls. One study suggested that those without secondary hyperparathyroidism at time of diagnosis may normalize their BMD, but this finding was not replicated. A large BMD study with baseline and follow-up small bowel biopsy data, and documentation of clinical presentation, percent compliance with the GFD and adjustment of co-interventions is recommended to give us accurate information on bone-related consequences of CD.
The majority of observational studies have demonstrated an increase in overall mortality rate (SMR of 2 or greater) in subjects with CD when compared with the general population. The increase in mortality can be attributed to deaths from malignant, respiratory, and digestive diseases. The increase in mortality appears to be greatest within the first 3 years after diagnosis and declines over time. The mortality rate seems to increase with longer delays in diagnosis and poor adherence to the GFD. Perhaps one of the most important points from the Corrao study362 is that the mortality rate was not increased compared to the general population for those individuals who had mild symptoms or were asymptomatic. This latter result has potential implications for population screening for CD.
Some of the same concerns expressed in the other celiac objectives, regarding clinical definitions, histological criteria, and the performance of the serological tests, are repeated when the results of the studies on monitoring adherence to a GFD are considered. Foremost in facilitating the interpretation of these studies is the question of what to consider as the histological criteria to define recovery on a GFD. Certainly normalization to Marsh 0 would constitute recovery, but what about improvement to Marsh I or II, or even accepting Marsh IIIa? The distinction has important implications for assessing the strength of the correlation between histological and serological improvement, and in this regard, different studies have adopted different cut-offs.
It is clear from the presented studies that improvement of symptoms does not offer an accurate assessment of adherence to a GFD as judged by interview or by biopsy. This point is illustrated in the study by Kluge et al.461 In follow-up of 18 adult patients with CD, all patients felt well and appeared to be clinically in remission. Nonetheless, only 17% of the patients reported being on a strict GFD. Biopsy assessment of eight patients showed six with total villous atrophy, including one patient who reported strict adherence to a GFD. The remaining two patients did not have villous atrophy but the mucosa was not normal, including an excess of IELs. Thus, small amounts of gluten may provoke a histologic change without clinical symptoms, which may be an important reason why adherence to a GFD may be less than perfect. In other words, non-compliance does not necessarily translate into noticeable consequences for the patient. Furthermore, it is increasingly recognized that most CD patients do not have symptoms, so reliance on symptomatic improvement is clearly not adequate.
There is good evidence that mucosal recovery following institution of GFD is slower and more incomplete than previously assumed, especially in adults.405, 411, 414 Whether this slow recovery is due to dietary transgression, inadvertent gluten intake or whether this is simply the natural history of the disease is less clear. This has definite implications for the interpretation of both biopsy and serology results in monitoring adherence to GFD, particularly in the short run.
With the advent of the newer and more sensitive serologic tests for CD (EMA, tTG), the possibility of a reduction in the need for follow-up biopsies and a move towards non-invasive serological monitoring has been proposed. The question arises as to whether serology can detect dietary transgressions and reasonably mirror histological improvement on a GFD.
A number of studies show that values of serologic markers will fall with increasing duration of GFD, whether one looks at IgA-AGA, IgA-EMA, or IgA-tTG. As well, several studies suggest that in both adults and children, increasing degrees of non-compliance with a GFD are more likely to be associated with positive serologic tests.396, 402, 408 The question, however, is not whether serology can pick up major transgressions such as with a gluten challenge, which it is clearly capable of assessing,400, 404 but rather whether serology can pick up milder degrees of dietary non-compliance and reasonably reflect histological status. A high rate of falsely-negative serology with lesser degrees of dietary transgression would diminish serology as a means of accurately monitoring adherence.
In both adults and children, the sensitivity of serology for picking up dietary transgressions based on interview or self-reporting is disappointing.401, 402, 410, 415 One conflicting study412 showed a good correlation between serology and adherence. This likely reflects the way patients were categorized, and it is likely that in this study, patients with lesser degrees of dietary transgression were categorized as compliant. In general, there is a significant rate of normal serology in patients identified as not adhering to a GFD. Furthermore, evidence from several studies suggests that serology, regardless of the actual test used, does not adequately reflect the mucosal state in adults.398, 403, 407, 409, 413 Surprisingly, it seems that serology may be normal, not only in Marsh I or II lesions, but also when there is villous atrophy present.398, 407, 409, 413 Although the specificity of various serologic markers for villous atrophy seems better than sensitivity,398 the NPV of serology would suggest that a negative test does not offer high assurance of the absence of villous atrophy.
As discussed earlier, mucosal recovery can be a slow process. It may be that serologic markers better reflect histology in long-term follow-up. Certainly, in the range of follow-up of these studies (6–30 months), serology may be negative despite villous atrophy. There is evidence that even in longer follow-up, serology does not accurately reflect adherence.398, 402, 410
In younger patients, IgA-AGA and IgA-EMA-ME may better represent the mucosal state.397, 415 These studies are in keeping with the impression that in children and adolescents, mucosal recovery is faster and more complete. In children, serology seems to be a better marker of the absence of villous atrophy. Still, serology may be negative in the face of lesser degrees of histologic abnormality without villous atrophy.397 The significance of such lower-grade biopsy abnormalities, however, is unclear.
It is possible that IgA-AGA may rise faster with non-compliance to GFD than other markers.396, 400 However, there is little direct evidence to show superiority of one serologic test over another in monitoring adherence.
Perhaps an important question that arises from this discussion, with particular relevance to symptomatic CD patients, is: “is it good enough for CD patients to show symptomatic improvement and a corresponding fall in, or normalization of, a sensitive serological marker without need for ‘normalization’ of the intestinal mucosa?” Unfortunately, this question is not an easy one to answer since many of the outcome studies in CD, particularly for lymphoma and mortality, did not specifically address differences in histologic grade. Furthermore, we identified no clear evidence suggesting that refractory sprue was the result of dietary indiscretion as opposed to a different spectrum of CD. Nonetheless, histological improvement appears to be important. For example, one study356 demonstrated that osteoporotic patients with CD on a GFD who had Marsh III lesions had lower median Z-scores than those with grades less than Marsh III, while another study demonstrated a significant correlation of nutritional status measured by histomorphometric index, with the severity of the histological biopsy grade.346 In the former study as well as one other study,358 histologic grade correlated with degree of IDA, all suggesting that the goal of monitoring should be to assess degree of histological improvement.
It can be concluded that the return of serologic markers to normal is associated with duration of GFD and degree of patient compliance. Unfortunately, the correlation remains imperfect, especially in adults, and seems to reflect gross rather than minor degrees of dietary transgressions. Serological tests seem to have a higher specificity than sensitivity for dietary transgressions. It is recognized that this area is controversial and that clinicians are moving away from routine follow-up biopsy as a means to assess dietary compliance. It seems reasonable to suggest that improvement in clinical parameters, and disappearance of serological markers would be an adequate measure of response to a gluten-free diet. In children, because of their faster and more complete mucosal recovery, this strategy of using serology may be an appropriate means to monitor adherence. In adults, however, the situation is somewhat more complex. Therefore, while serology certainly can be an adjunct means to monitor adherence to a GFD, consideration should be given to assessing histological improvement since some evidence exists to suggest that mucosal improvement to at least below a Marsh III appears to be important from an outcomes perspective. If biopsy is to be utilized as a means of assessing adherence to a GFD in adults, the timing of the biopsy needs to take into consideration the slower mucosal healing in adults, and should therefore be performed after 1 year to 1.5 years of a GFD.
Changes in dietary habits are difficult to attain and maintain. The barriers to compliance are many. No interventions to promote compliance with GFD have been studied and found to be effective. Adding to the difficulty of assessing any proposed intervention is the lack of certainty as to how best to measure GFD compliance.
The existing evidence suggests a positive correlation between parental socioeconomic status, education, knowledge of CD, and the compliance of their children.416, 418 Compliant children may also have a better knowledge of CD420 than those children who are non-compliant. Improved knowledge in adults also appears to correlate with compliance.419 It is, therefore, not unreasonable to suggest that interventions designed to improve knowledge about CD in general, and about GFD, and specifically how to identify gluten-containing products, would likely improve compliance with a GFD. Improving knowledge regarding gluten-containing food products and additives would also likely improve self-confidence in choosing gluten-free foods as suggested by Lamontagne et al.419 Improved knowledge of outcomes of untreated CD may also improve compliance. Such information interventions, however, would need to be prospectively evaluated to ensure that they perform as expected.
Membership in a local celiac society appears to be an effective means of promoting compliance with a GFD. This is not surprising since such organizations provide CD patients not only with improved knowledge regarding their disease and the intricacies of the GFD, but also with emotional and social support.
It is interesting that one study417 has demonstrated lower rates of compliance in children detected by screening as compared with those diagnosed on the basis of symptoms. It seems logical that if there are no obvious detrimental symptoms from a gluten-containing diet, children and likely adults will be less likely to be compliant. The authors speculate that since screen-detected patients had a higher mean age of diagnosis, compliance might be promoted by earlier identification. They speculate that earlier detection would avoid the difficulty of changing formed eating habits.
Is early detection of CD an effective intervention to promote compliance? It appears rational that it would be easier to follow a GFD if it were introduced at an earlier age. There are some interesting observations417 that suggest that diagnosis in early childhood is associated with improved compliance.421 Unfortunately, the issue of compliance in asymptomatic screen-positive individuals casts doubt on the positive downstream effects of screening asymptomatic populations for CD, particularly if the low-compliance rates in asymptomatic individuals can be reproduced in other studies.
In summary, the results of this report suggest that a multidisciplinary approach to patient and parent education and support by physicians, dieticians, and celiac societies, possibly employing formal knowledge and decision support interventions that involve the patient (and parent) directly, is likely to improve compliance in individuals diagnosed with CD. Formal testing of interventions and programs would be valuable.
Overall, the quality of the diagnostic studies assessed in the Celiac 1 objective was quite good, due largely to our stringent inclusion criteria. However, 59% of the included studies reported using a selected patient population that may not be representative of a clinically-relevant population. This is likely related to study design. In addition, only 11% of the studies reported on whether the reference test was reported without knowledge of the index test. However, we felt that this was not a major threat to the validity of the studies.
Two other factors that affect the interpretation of these results, yet were not captured in the quality assessments, are the threshold effects for determining the positivity of a serological test, and the high prevalence of CD in these studies (see above). With these considerations in mind, the overall strength of the evidence is quite good.
The overall quality of reports of the included studies in the Celiac 2 objective was found to be marginal to fair. For example, most of the studies did not report on whether the patients were consecutively enrolled, a factor that could contribute to selection bias. However, setting aside the quality of individual studies, from a policy perspective, the strength of the evidence is fairly good in that the study populations were selected to be of North American/Western European descent, which should reflect the demographics of the US population.
The studies included in the Celiac 3 objective were found, overall, to be of good quality. Again, the overall strength of the evidence is due largely to the stringent inclusion criteria, such as the requirement for the reporting of standardised rates for the outcomes based on rates from the local general population, and the overall good quality of the included studies.
The majority of studies included in this objective were single group “before-after” studies, although some had in addition a comparative healthy control group. We could not identify any quality instruments for this type of study design and in general, this type of study is considered weak, particularly in the absence of a control group. Overall, however, the strength of the evidence for this objective is fair to good and suggests that the results can be used for policy decisions with the understanding that this area of CD research is still relatively new and requires further high quality studies.
The majority of studies in this objective were also of a “before-after” design. However, in this setting, this design may not pose a major limitation, since the purpose of the study is to assess the change in serology and histology after introduction of a GFD. In this regard, the strength of the evidence for monitoring adherence to a GFD is fairly good. However, there is almost a complete absence of studies of interventions for the promotion of adherence to a GFD.
This review has allowed us to identify several areas in need of future research. Perhaps the most important of these is a need for the development of a consensus on the definition of CD in the era of advanced serological testing. As discussed in the report, this distinction of what one calls CD has profound implications for each of the requested task order objectives. Do screen-positive patients without villous atrophy have CD? Certainly the preliminary evidence suggests that this is the situation in many cases. However, what is required is a new definition of a gold standard for the diagnosis of CD. This new gold standard may include a combination of serology, biopsy and HLA testing. Such a gold standard, when used in studies with a time dimension (e.g., response to a GFD or gluten challenge; extended follow-up), would help answer some of the uncertainties identified in this report including: the real performance of the serological tests when low-grade lesions are considered CD; the diagnostic performance of biopsy alone; the outcomes of patients with these low-grade lesions; and, those that would be “missed” using current screening strategies. Even in the absence of a new gold standard, we could not identify a well-conducted study of the diagnostic performance of the various serological markers when applied to an average population (i.e., one with a prevalence of CD in keeping with the range identified for average risk), with the entire cohort being investigated equally (i.e., all are biopsied). Such a study would at least be able to shed light on the performance of these tests in average-risk patients, and since all patients are biopsied, the relationship of histology to serology could be further assessed.
On a similar theme, we have identified multiple studies that suggest the importance of histological improvement on a GFD. This is a controversial area since in common clinical practice, clinicians are moving away from routine follow-up biopsy. It seems reasonable to believe that improvement in clinical parameters with loss of serological markers is adequate evidence of response to a GFD. In children, this issue may be less important since histological improvement is much more rapid and complete than in adults, and correlation with serology seems better. However, we have identified multiple studies in adults that suggest poor correlation between serology and improvement of histology on a GFD, and other studies that suggest that serology is useful for detecting gross dietary indiscretion, but not minor occurrences. Therefore, the question that arises is what constitutes adequate improvement on a GFD, and what are the criteria to define this improvement. Based on the lymphoma literature that suggests that this malignancy may arise from chronic antigenic stimulation and immune activation, what are the outcomes of adults with clinical improvement, yet persistent histological abnormalities? Are some histological features, such as reduction of mucosal lymphocytes, more important markers of improvement and possibly prognosis than other features such as villous height?
We feel that clarification of these fundamental questions is necessary for the conduct of future studies in all areas of CD, and in particular studies of the diagnostic tests and the outcomes in CD, since these are so dependent on the definitions discussed above.
This report has provided a systematic review on five broad areas of CD, with each of these areas including important sub-components. Perhaps one of the most important findings of this report is the understanding of the importance of how one chooses to define CD in the era of serological testing, and how this apparently clear-cut task has profound implications on all the results presented in this report. Specifically, can CD be diagnosed solely on the basis of serology? Is some degree of villous atrophy necessary for the diagnosis of CD? These questions have important implications downstream of the diagnosis as well. Do CD patients without symptoms or villous atrophy have the same risk of complications as those with villous atrophy? Is serological improvement on a GFD sufficient to reduce CD complications or must there be documented histological improvement, and what degree of histological improvement is necessary?
The results of the Celiac 1 objective suggest that in the era of EMA and tTG antibody testing, AGA testing in both children and adults has a limited role. The sensitivity and specificity of EMA and tTG are quite high (over 95% for sensitivity, and close to 100% for specificity), as are their PPVs and NPVs, but as previously discussed, one has to be aware that the reported diagnostic parameters are taken from studies in which the prevalence of CD was, for the most part, much higher than that seen in usual clinical practice and certainly the PPV of these tests may not be as high as reported when these tests are applied in general population screening. The bulk of the evidence on the diagnostic characteristics of these tests was derived from studies that defined CD as having at least some degree of villous atrophy. We have identified studies that suggest that the sensitivity of these tests drops, at times significantly, when applied to populations with CD with lower-grade histological lesions. This not only has implications regarding those patients with “mild” CD who were missed during screening efforts, but also puts into question the nearly perfect NPV of these tests.
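The dependence of predictive values on prevalence described above follows directly from Bayes' rule, and a short sketch may make the size of the effect concrete. The sensitivity (95%), specificity (99%), and the three prevalence values below are illustrative assumptions chosen to be in the range discussed in the text; they are not pooled estimates from this report, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a binary test using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (assumptions, not values taken from the report):
# a high-prevalence study population, an at-risk group, and general screening.
for prevalence in (0.40, 0.10, 0.01):
    ppv, npv = predictive_values(0.95, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumed inputs, the same test that yields a PPV above 98% at 40% prevalence yields a PPV of roughly 49% at 1% prevalence, while the NPV stays very high. This is the pattern the text warns about when values from high-prevalence study populations are carried over to general population screening.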
HLA DQ2/DQ8 testing appears to be a useful adjunct in the diagnosis of CD. The test has high sensitivity, in excess of 90% to 95%, but because around 30% of the general population and an even higher proportion of “high-risk” subjects including diabetics and family members also carry these markers, the specificity of this test is not ideal. The greatest diagnostic utility of this test appears to be its NPV.
Biopsy itself, when used with a strict cut-off requiring villous atrophy, appears to have high specificity, but poor sensitivity. Using lower grade cut-offs clearly improves sensitivity, but because of the wide differential of causes of histological lesions similar to Marsh I to IIIa, the specificity suffers. The use of histomorphometric measures, such as quantification of γδ+ IELs, are likely to allow for the use of lower-grade cut-offs while maintaining reasonable specificity. Ultimately, a trial utilizing multiple diagnostic tests in an attempt to capture as many CD patients in a clinically-relevant population as possible, with a time dimension including a response to a GFD or gluten challenge, is required to fully assess the diagnostic characteristics of biopsy alone. This type of study would be able to characterize the false-positive and false-negative rates if all studied patients are followed forward in time.
The included prevalence studies demonstrated important differences in execution, tests for prevalence assessment, and in patient sampling, and their results also have to be interpreted in the light of some of the limitations that have been identified regarding the diagnostic performance of the tests for CD. Nonetheless, the results of this report suggest that CD is a very common disorder with a prevalence in the general population that is likely close to 1:100 (1%). Several high-risk groups with a prevalence of CD greater than that of the general population have been identified including those suspected of having CD, family members of CD patients, type I diabetics, and those with IDA or low BMD. Additionally, the review identified multiple other high-risk groups such as those with Down Syndrome, short stature, and infertility, to name a few, though their inclusion was beyond the scope of this report. These results would suggest that at the very least, high-risk groups should be screened for CD. If the performance of the noninvasive serological tests can be verified in the relatively “low prevalence” situations in general unselected populations, then population screening may also be advisable, particularly if a greater understanding of the consequences of missing early low-grade CD can be obtained, and the issues of low-compliance with a GFD of asymptomatic screen identified patients can be addressed.
CD is known to be associated with GI lymphoma. The results of this report confirm this strong association, with the limitations indicated in the text. The report identified SIRs for lymphoma that ranged from 4 to 40, and SMRs that ranged from 11 to 70. GI lymphoma is believed to arise as a result of chronic antigenic stimulation, which leads to the development of a clonal T-cell population, usually with a refractory intermediate stage. We have identified epidemiologic data that support this notion and suggest that a diagnostic delay, and in particular diagnosis of CD in adulthood as opposed to in childhood, is associated with poorer outcomes. Fortunately, several studies suggest that adherence to a GFD reduces the risk of lymphoma in CD patients. These findings underscore the importance of early diagnosis and treatment of CD.
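For readers less familiar with these measures, the SIR and SMR are ratios of observed to expected events (incident cases or deaths, respectively), where the expected count is obtained by applying reference-population rates to the person-years accumulated by the cohort. The sketch below shows the arithmetic only; the strata, person-years, rates, and observed count are hypothetical and are not drawn from any included study.

```python
def expected_events(person_years, reference_rates):
    """Expected events: sum over strata of cohort person-years times the reference rate."""
    if len(person_years) != len(reference_rates):
        raise ValueError("person_years and reference_rates must have the same length")
    return sum(py * rate for py, rate in zip(person_years, reference_rates))


def standardized_ratio(observed, expected):
    """Observed / expected; an SIR for incident cases, an SMR for deaths."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return observed / expected


if __name__ == "__main__":
    # Hypothetical cohort with two age strata (assumed numbers for illustration).
    cohort_person_years = [2000.0, 3000.0]
    reference_rates = [0.0001, 0.0002]  # events per person-year in the reference population
    observed_cases = 8

    expected = expected_events(cohort_person_years, reference_rates)
    sir = standardized_ratio(observed_cases, expected)
    print(f"expected={expected:.2f} SIR={sir:.1f}")
```

A ratio of 1 indicates that the cohort experienced the number of events expected from the reference rates; the ranges reported above indicate a several-fold excess.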
Testing for and identifying CD patients is expected to have a positive impact on patient outcomes, whether through a reduced risk of lymphoma with early diagnosis and treatment of CD or through improvements in nutritional status, BMI, and BMD. The consequences of testing in at-risk and symptomatic patients appear to be more straightforward, since these patients appear to be more compliant with a GFD and would be expected to benefit from this intervention. The data are less clear for asymptomatic screen-identified patients, particularly those who are truly silent and/or do not have fully developed villous atrophy: on the one hand, the outcome of such patients has not been extensively studied, and on the other hand, compliance with a GFD appears problematic, particularly for those diagnosed in adulthood.
Finally, no specific interventions have been identified that promote adherence to a GFD, but education of patients and family members about CD and about the intricacies of the GFD through multidisciplinary teams, and participation in local CD societies, have been shown to improve compliance. Therefore, the development and evaluation of formal educational interventions, undertaken in collaboration between healthcare professionals and CD societies, would appear to be a means of building on the methods that already appear to improve patient compliance. Monitoring of adherence to a GFD appears to be important, since improvement in histologic grade has been associated with improved BMD, IDA, and nutritional status. The serological markers appear to be adequate for detecting gross dietary indiscretion and the response to gluten challenge, but unfortunately they have poor sensitivity for detecting lesser degrees of dietary indiscretion and correlate inadequately with histological improvement, at least in the short term. It is true that histological improvement tends to lag behind clinical and serological improvement, especially in adults, in whom improvement may never be complete, but even considering this, a negative serological test has been shown to miss patients with persistent villous atrophy. The recognition of persistent villous atrophy appears to be important, since improvement beyond this level is associated with the improved outcomes listed above. It should be noted, however, that we could not identify a controlled study that objectively determined the level of histological improvement that would be associated with improved outcomes, and this is an area for future study. Although this is somewhat controversial, based on this report it would appear that a follow-up biopsy at least 1 year after starting a GFD in adults, to document improvement of the histological grade, would be valuable.
| Abbreviation | Definition |
|---|---|
| 95% CI | Ninety-five percent confidence interval |
| AGA | Antigliadin antibody |
| AR | Attributable risk |
| BMD | Bone mineral density |
| CD | Celiac disease |
| DXA | Dual energy X-ray absorptiometry |
| EGD | Esophagogastroduodenoscopy |
| ELISA | Enzyme-linked immunosorbent assay |
| EMA | Endomysial antibody |
| ESPGAN | European Society of Pediatric Gastroenterology and Nutrition |
| ETCL | Enteropathy-associated T-cell lymphoma |
| GFD | Gluten-free diet |
| GP | Guinea pig |
| HLA | Human leukocyte antigen |
| HR | Human recombinant |
| HU | Human umbilical cord |
| IDA | Iron deficiency anemia |
| IDDM | Type I diabetes (insulin dependent) |
| IEL | Intraepithelial lymphocytes |
| IF | Immunofluorescence |
| IgA | Immunoglobulin A |
| IgG | Immunoglobulin G |
| ME | Monkey esophagus |
| NHL | Non-Hodgkin's lymphoma |
| NPV | Negative predictive value |
| OR | Odds ratio |
| PPV | Positive predictive value |
| Prev | Prevalence |
| PVA | Partial villous atrophy |
| RR | Relative risk |
| SD | Standard deviation |
| Sens | Sensitivity |
| SIR | Standardized incidence ratio |
| SMR | Standardized mortality ratio |
| SPA | Single photon absorptiometry |
| Spec | Specificity |
| SVA | Subtotal villous atrophy |
| tTG | Tissue transglutaminase |
| VA | Villous atrophy |
| Celiac disease |
| Dermatitis herpetiformis |
| Cow's milk protein intolerance (children) |
| Post-gastroenteritis |
| Giardiasis |
| Peptic duodenitis |
| Crohn's disease |
| Small intestinal bacterial overgrowth |
| Eosinophilic gastroenteritis |
| Radiation or chemotherapy |
| Tropical sprue |
| Severe malnutrition |
| Diffuse small intestinal lymphoma |
| Graft versus host disease |
| Hypogammaglobulinemia |
| Alpha chain disease |
| Criteria | Rostami modification (1999) | Original Marsh (1992) |
|---|---|---|
| Marsh 0 | Same as original | Pre-infiltrative: normal mucosal and villous architecture |
| Marsh I | Same as original | Infiltrative: normal mucosal and villous architecture; increased numbers of IELs |
| Marsh II | Same as original | Hyperplastic: similar to above but with enlarged crypts, with increased crypt cell division |
| Marsh III | a. Partial VA: shortened blunt villi; mild lymphocyte infiltration; enlarged hyperplastic crypts | Destructive lesion: flat mucosa, complete loss of villi; lymphocyte infiltration; enlarged hyperplastic crypts |
| | b. Sub-total VA: clearly atrophic villi, but still recognizable; enlarged crypts whose immature epithelial cells are generated at an increased rate; influx of inflammatory cells | |
| | c. Total VA: nearly total VA; severe Marsh atrophic, hyperplastic and infiltrative lesions | |
| Marsh IV | Same as original | Hypoplastic: total VA; normal crypt height but hypoplasia; normal IEL count; many feel this doesn't exist and represents severe malnutrition |

VA=villous atrophy; IEL=intraepithelial lymphocytes
| Criteria | ESPGAN*, 1979 | ESPGAN†, revised 1990 |
|---|---|---|
| Initial histology | Absent or nearly absent villi; recognized existence of less severe lesion; no consensus on verification of less severe lesions but recommended, if possible, continuing gluten diet and assessing histology, or re-challenge after GFD, given the large differential of milder histologic lesions | Biopsy must remain the initial step in the diagnosis (mandatory); recommend capsule over endoscopic biopsy; large well-oriented biopsy; histology: hyperplastic VA with hyperplasia of the crypts and an abnormal surface epithelium, with a raised IEL count; morphometry and histochemistry are important aids to diagnosis; monoclonal antibodies to IEL may be a future aid |
| Antibody studies | n/a | Recognize that IgA AGA and EMA have a high degree of sensitivity and specificity for the diagnosis of CD; when such antibodies are present at the time of diagnosis in a child with a typical small intestinal mucosa, and when they disappear in parallel to a clinical response to a GFD, weight is added to the diagnosis of CD, which may now be said to have been finally established; when biopsy is unavailable in communities where other causes of enteropathy are rare, the presence of abnormal concentrations of two antibodies strongly suggests that CD is a diagnostic possibility; antibodies can be a marker of response to a GFD and a guide to dietary compliance |
| Improvement on GFD | Recognized as central to the definition; recognized that improvement need not be complete | Second mandatory requirement remains a reasonably rapid (weeks rather than many months) clinical remission on a strict GFD; control biopsy is always a suitable way of verifying the effect of GFD, and is required in asymptomatic patients |
| Gluten challenge | Importance of gluten challenge and re-biopsy emphasized to document “permanence” of gluten intolerance; however, the panel recognized that challenge was not being performed in routine practice (only 652 were performed among several thousand children with gluten intolerance) | No longer a requirement; should be used in equivocal cases, such as when no initial biopsy was done, biopsy was inadequate or atypical, in communities with high rates of other enteropathies, or in situations when patients plan to abandon the GFD in an uncontrolled way; challenge should be performed after obtaining a control biopsy on a GFD; re-biopsy is performed 3–6 months later, with the recognition that relapse can take 5–7 years or more to occur |
| 2-year rule | To address the issue of transient gluten intolerance, the panel emphasized the usefulness of the 2-year rule after stopping a GFD; 619 of 652 gluten challenges redeveloped histology compatible with CD by 2 years | The 2-year rule is practical in most cases, but there are several reports of relapse occurring 5–7 years after gluten rechallenge |
Walker-Smith et al., Arch Dis Child 1990:65:99
CD=celiac disease; n/a=not applicable; GFD=gluten-free diet
MEDLINE on DIALOG
s anti(w)endomysial(w)antibod? OR antiendomysial(w)antibod?
s anti(w)endomysium(w)antibod? OR antiendomysium(w)antibod?
s endomysial(w)antibod? OR endomysium(w)antibod? OR endomysial(w)autoantibod? OR endomysium(w)autoantibod?
s endomysial(n3)iga OR antiendomysial(n3)iga OR iga(n)ema
s endomysium(n3)iga OR antiendomysium(n3)iga OR igg(n)ema
s immunoglobulin?(n3)endomysial OR immunoglobulin?(n3)antiendomysial
s immunoglobulin?(n3)endomysium OR immunoglobulin?(n3)antiendomysium
s ema(n3)antibod? OR ema(n3)autoantibod? OR anti(w)ema OR ema(n3)positiv?
s aea AND (endomysial OR endomysium OR antiendomys?) OR aea(n3)positiv? OR aea(n2)igg OR aea(n2)iga
c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9
s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (ema OR aea)
s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND autoantibod?(n2)positiv?
c 10 OR 11 OR 12
s epithelial(w)membrane(w)antigen
c 13 NOT 14
s s15/human
s s16/eng
EMBASE on DIALOG
s anti(w)endomysial(w)antibod? OR antiendomysial(w)antibod?
s anti(w)endomysium(w)antibod? OR antiendomysium(w)antibod?
s endomysial(w)antibod? OR endomysium(w)antibod? OR endomysial(w)autoantibod? OR endomysium(w)autoantibod? OR endomysium antibody/de
s endomysial(n3)iga OR antiendomysial(n3)iga OR iga(n)ema
s endomysium(n3)iga OR antiendomysium(n3)iga OR igg(n)ema
s immunoglobulin?(n3)endomysial OR immunoglobulin?(n3)antiendomysial
s immunoglobulin?(n3)endomysium OR immunoglobulin?(n3)antiendomysium
s ema(n3)antibod? OR ema(n3)autoantibod? OR anti(w)ema OR ema(n3)positiv?
s aea AND (endomysial OR endomysium OR antiendomys?) OR aea(n3)positiv? OR aea(n2)igg OR aea(n2)iga
c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8 OR 9
s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (ema OR aea)
s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND autoantibod?(n2)positiv?
c 10 OR 11 OR 12
s epithelial(w)membrane(w)antigen
c 13 NOT 14
s s15/human
s s16/eng
MEDLINE on DIALOG
s tissue(w)transglutaminase?? OR tissue(w)trans(w)glutaminase??
s antitissue(w)transglutaminase?? OR anti(w)transglutaminase??
s human(w)transglutaminase?? OR antitransglutaminase??(n3)antibod?
s (immunoglobulin? OR immunoglobulin a/de OR immunoglobulin g/de) AND (transglutaminase OR transglutaminases)
s ttg(n3)antibod? OR ttg(n3)autoantibod? OR ttg(w)(kit OR kits) OR ttga OR httg OR anti(w2)ttg OR human(w)ttg OR elisa(n)ttg OR attga
s (transglutaminase?? AND antibod?) OR (transglutaminase?? AND autoantibod?)
s transglutaminase??(n3)iga OR transglutaminase??(n3)igg OR tg2(n5)transglutaminase?? OR human(w)recombinant(w)tg2
s anti(w)gamma(w)glutamyltransferase AND (antibod? OR autoantibod?)
c 1 OR 2 OR 3 OR 4 OR 5 OR 6 OR 7 OR 8
s (celiac OR celiacs OR coeliac OR coeliacs OR gluten OR glutens OR glutenin OR glutenins OR gliadin OR gliadins OR celiac disease/de) AND (transglutaminase OR transglutaminases OR ttg OR tg2)
c 9 OR 10
s s11/human
s s12/eng
EMBASE on DIALOG
s tissue(w)transglutaminase?? OR tissue(w)trans(w)glutaminase??
s antitissue(w)transglutaminase?? OR anti(w)transglutaminase??
s human(w)transglutaminase?? OR antitransglutaminase??(n3)antibod?
s immunoglobulin OR immunoglobulin a/de OR immunoglobulin a1/de OR immunoglobulin a2/de
s immunoglobulin g/de OR immunoglobulin g1/de OR immunoglobulin g2/de OR immunoglobulin g2a/de OR immunoglobulin g2b/de OR immunoglobulin g3/de OR immunoglobulin g4/de
s transglutaminase OR transglutaminases
c 4 OR 5
c 7 AND 6
s ttg(n3)antibod? OR ttg(n3)autoantibod? OR ttg(w)(kit OR kits OR assay) OR ttga OR httg OR anti(w2)ttg OR human(w)ttg OR elisa(n)ttg OR attga
s (transglutaminase?? AND antibod?) OR (transglutaminase?? AND autoantibod?)
s transglutaminase??(n3)iga OR transglutaminase??(n3)igg OR tg2(n5)transglutaminase?? OR human(w)recombinant(w)tg2
s anti(w)gam