NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Santaguida PL, Balion C, Hunt D, et al. Diagnosis, Prognosis, and Treatment of Impaired Glucose Tolerance and Impaired Fasting Glucose. Rockville (MD): Agency for Healthcare Research and Quality (US); 2005 Sep. (Evidence Reports/Technology Assessments, No. 128.)

  • This publication is provided for historical reference only and the information may be out of date.


Diagnosis, Prognosis, and Treatment of Impaired Glucose Tolerance and Impaired Fasting Glucose.



Topic Assessment and Refinement

The Research Team

A multidisciplinary research team representing epidemiology and systematic review methods (P. Raina, PhD; P.L. Santaguida, PhD), internal medicine and endocrinology (H. Gerstein, MD; D. Hunt, MD), clinical chemistry (C. Balion, PhD), and pediatric endocrinology (K. Morrison, MD) was assembled. The core research team, including experienced staff at the McMaster EPC (L. Booker, BA; M. Gauld, BA; L. Cocking; E. Estrabillo, B.Sc.) and a statistician (H. Yazdi, PhD), participated in regular meetings and reached consensus on key methodological issues. An international Technical Expert Panel (TEP; see Appendix D *) was assembled to provide high-level content expertise and participate in conference calls as needed. Participants in this panel include: Vincenza Snow (ACP), Amir Qaseem MD, PhD, MHS (ACP), Rodney Hornbake MD (ACP representative), Tommy Cross MD (ACP representative), Belinda Ireland MD, MS (AAFP), Kevin Patterson MD, MPH (AAFP representative), Francine Ratner Kaufman MD (AAP representative).

A teleconference with the partner organizations, the Task Order Officer (TOO) from AHRQ, the invited technical experts, and the McMaster team was held early in protocol development. The meeting's purpose was to define the scope of the systematic review and to achieve consensus on the preliminary research questions. As a result of these discussions, the original questions were modified, in particular to address important gaps in the knowledge that family physicians need to diagnose IFG or IGT in their patients.

Eligibility Criteria

Publication types, year, and language.

These criteria were applicable to all research questions:


Publication year: 1979 forward,


Publication language: English, and


Publication types: primary studies

Excluded: Systematic reviews, narrative reviews, editorials, letters to the editor, theses, unpublished position papers, consensus conference reports, and practice guidelines.

Study design.

Diagnosis question. No study design exclusions for primary studies.

Prognosis question. Primary studies with prospective cohort or randomized controlled trial (RCT) designs and at least one year of follow-up.

Excluded: Case-control studies.

Treatment question. Only RCT designs were eligible. However, studies evaluating non-pharmacological interventions (lifestyle, behavioral, or surgical treatment) using non-RCT designs (controlled clinical trials and concurrent cohort trials) were captured in an annotated bibliography (Appendix C), and no other data were extracted from them.

Pediatric question. All study designs were eligible.

Study population.

General criteria for IFG and IGT classification. Eligible citations had to include IFG or IGT groups as the study population or analyzable subgroup. The criteria for classifying dysglycemia were key to identifying this specific population. The glucose threshold values used to define DM, IFG, and IGT have varied over the past 25 years (see Table 1). The specific criteria reference (e.g., WHO 85) used within a study was noted. For the diagnosis question, additional checks were undertaken to compare the testing procedures described in the methods and results sections of each eligible study.

Table 1. Plasma glucose cutoffs for diagnosis of IGT and IFG and DM for the varying criteria established at different times.


Laboratory testing procedures.

Laboratory test inclusion:


All laboratory testing for glucose had to be undertaken on venous blood plasma or venous blood serum.


OGTT must have used the following parameters: subject was given 75 g of oral glucose (1.75 g per kg to maximum of 75 g for children) and measurement was taken at two hours post-glucose ingestion.


All measurements must have been done in a laboratory and not with a point-of-care device.

Laboratory test exclusion:


The testing was done on whole blood or on capillary samples.


The laboratory testing was undertaken in an acute care setting (e.g., an emergency ward, or an intensive care ward following, for example, a myocardial infarction or pneumonia).

These general criteria for the classification of IFG or IGT were applied to all four questions with the exception of the pediatric question.

Pediatric question. Increased recognition of type 2 DM in children has only occurred within the last two decades. Thus, we anticipated a limited number of articles addressing IFG or IGT in this population. Children were defined as 18 years of age or under. Any study that evaluated children, even if it did not meet all of our eligibility criteria for diagnosis, prognosis, or treatment, was included. We indicated “Include for Children” in the screening form (see Appendix B) if the study met the publication type, language criteria, and testing criteria for IFG and IGT.

Study interventions.

Diagnosis question. The research questions on the diagnosis of IFG or IGT were formulated using two distinct characteristics:


Test-retest reliability, and


The relationship between IFG and IGT.

The reliability of the IFG and IGT diagnostic criteria was assessed using a maximum boundary of eight weeks between the first test and repeated testing. There was consensus among the TEP that true change in the disease status would not likely occur during this time interval, and it represented a typical interval for repeat tests in clinical practice.

For the relationship between the 2-hour OGTT and the FPG and the subsequent diagnosis of IGT and IFG, there was a general consensus that the two tests did not necessarily measure the same population. It was recognized that the literature does not agree as to which test is best or should be used to diagnose glycemic disturbance (IFG versus IGT); thus the degree of association between these two diagnostic tests was of interest.

It was also of interest to evaluate the variation between repeated laboratory measures in subjects. This question was not intended to examine the biochemical basis of the test. Instead, the intention was to describe the change in diagnostic category between having IFG, IGT, normal glycemic levels or DM on repeat testing. It was also of interest to describe any related factors that could contribute to the observed variance.

For the relationship between IFG and IGT, studies were included if they used both the FPG and the OGTT to evaluate subjects for dysglycemia.

Treatment question. There was no restriction on the types of interventions used on an IFG or IGT population. It was expected that these interventions would be categorized into four groups: pharmacological, behavioral, lifestyle, or surgical. Moreover, a minimum follow-up of six months was required.

The specification of interventions was not applicable for the prognosis and pediatric questions.

Study outcomes.

The outcomes selected for this study applied to both the prognosis question and the treatment question. Nine disease categories were selected, and then possible medical or procedural outcomes were further specified within each of these categories (see list below). For example, within the cardiovascular disease category, 11 different cardiac-related outcomes were itemized. Studies were considered eligible if they evaluated at least one of the disease categories or one of the disease outcomes within the category.


  • Progression to DM (if measured with eligible testing criteria).
  • Reversion towards NGT (if measured by eligible testing criteria).

Cardiovascular disease:

  • Angina requiring a minimum 24-hour hospitalization.
  • Myocardial infarction (MI).
  • Acute coronary syndrome.
  • Cardiac revascularization.
  • Peripheral revascularization.
  • Cardiac mortality.
  • Angiographic percutaneous coronary interventions (PCI).
  • Coronary artery bypass grafting (CABG).
  • Stent insertion.
  • Angioplasty.
  • Stroke events.


  • All cause.
  • Disease specific (cardiac mortality was included in fatal CVD outcomes).


  • Proteinuria.
  • Microalbuminuria.
  • Dialysis.
  • Renal transplant.
  • Elevated creatinine.
  • Elevated albumin-to-creatinine ratio in the urine.


  • Cataracts.
  • Blindness.
  • Retinopathy requiring laser photocoagulation.
  • Vitrectomy.
  • A retinal photograph assessed by standard criteria showing at least a two-step change in retinal images.

Hypertension/blood pressure:

  • Concurrent therapy for hypertension or measured BP values. It was noted that studies may give baseline values of numbers of subjects on BP medications, and the number of subjects ending with BP medications. This was an acceptable outcome measure.

Lipid level disturbance:

  • Baseline and endpoint mean lipid levels reported for at least one of the following: low-density lipoprotein (LDL) cholesterol, total cholesterol, high-density lipoprotein (HDL) cholesterol, or triglycerides.


  • Amputation (foot, lower limb, or foot digits).

Literature Search Strategy

A comprehensive approach to searching the literature was undertaken in order to capture all relevant reports. We performed a search for all studies involving IFG or IGT without limiting the search to diagnosis, prognosis, or treatment. In this way, we were less likely to miss any studies. After capturing all of the citations, we screened them for inclusion or exclusion pertaining to diagnosis, prognosis, or treatment. Our search for relevant articles included MEDLINE®, Cochrane Central Register of Controlled Trials, HealthSTAR, CINAHL®, AMED, PsycINFO®, and EMBASE® along with the personal files of the research team and the reference lists16 of included articles (Table 2). Appendix A outlines the search strategy used for each database.16

Table 2. Databases and dates included in the search for relevant articles.


Study Selection

A team of study assistants was trained to apply the eligibility criteria in preparation for screening the title and abstract lists and the full-text papers. Standardized forms and a training manual explaining the criteria were developed and reviewed with the screeners (Appendix B).

For the title and abstract phase, two reviewers evaluated the citations for eligibility. Those articles that met the criteria were retrieved as well as those where there was insufficient information to determine eligibility. The article was retrieved if either one of the two screeners identified it for retrieval. For screening of full-text articles, two screeners came to consensus on the identification, selection, and abstraction of information. Disagreements that could not be resolved by consensus were resolved by one of our McMaster research team members. The level of agreement for inclusion of studies was measured using kappa statistics.

Data Extraction

All eligible studies from the selection phase (full-text screening) were abstracted onto a data form according to predetermined criteria. Appropriate data collection forms were developed for use in the systematic review (Appendix B). The articles were grouped according to the questions they addressed: diagnosis, prognosis, and treatment. One data extractor transferred the data onto data forms, and another data extractor checked the answers for accuracy before they were entered into a Microsoft Access database.17

Quality Assessment

One member of the research team rated each eligible study within the prognosis and treatment categories for methodological quality (see Appendix B). RCTs were evaluated using the modified Jadad scale.18 A scale developed by MacKay et al.19 for non-RCT studies was used to rate the prospective cohort studies. The MacKay checklist had three subscales that could yield a score out of 5 possible points for reporting, 12 possible points for internal validity, and 1 possible point for external validity. The summary scores for methodological quality (Tables 8 and 21) were used to determine the strength of the evidence and to select those studies with the best methodological scores for subsequent meta-analysis.

Table 8. General study characteristics: Prognosis.


Table 21. General study characteristics: Treatment.


Summarizing Results: Descriptive and Analytic Approaches

Data from the Access database were summarized in evidence tables, which included data about the general study characteristics (study design, location of study, population characteristics, mean age, and diagnosis criteria for dysglycemia), interventions, and outcomes assessed.

Five classifications of dysglycemia. Studies were grouped according to classification of the IFG/IGT status. Five dysglycemia classifications were considered as risk factors and these included:


isolated IGT (I-IGT),


isolated IFG (I-IFG),


non-isolated IGT,


non-isolated IFG, and


combined IGT and IFG (IGT/IFG).

The threshold values for these five diagnostic groups are detailed in Table 1 as a function of the changing criteria for classification over time. A diagnosis of isolated IGT excludes those diagnosed with IGT who have an FPG between 6.1 and 7.0 mmol/L (110 and 126 mg/dL). A diagnosis of isolated IFG excludes those without a 2-hour OGTT result and those with an OGTT level greater than 7.8 mmol/L (140 mg/dL). For example, a classification of I-IGT using the WHO 98 criteria implies that the FPG was not within the specified range of 6.1 to 6.9 mmol/L, which is indicative of IFG. That is, an FPG test was undertaken and deemed negative, whereas the OGTT was within the specified range of 7.8 to 11.0 mmol/L and considered positive. A negative FPG and a positive OGTT are required for the classification of I-IGT. Table 1 shows that the classifications of IFG, I-IFG, I-IGT, and combined IFG/IGT are more recent classifications of dysglycemia, commencing with the WHO 1998/99 criteria. Some preliminary evidence suggests that these dysglycemic classifications may represent different subgroups with potentially different mechanisms leading to glucose disturbance.6
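The classification logic described above can be sketched in a few lines of Python. This is an illustration only, not the review's own procedure: the function name is hypothetical, both tests are assumed to be available for every subject, and the DM cutoffs (FPG of 7.0 mmol/L or more, 2-hour OGTT of 11.1 mmol/L or more) are assumed from the WHO 1998/99 criteria rather than taken from Table 1, which is not reproduced here.

```python
def classify_who98(fpg: float, ogtt_2h: float) -> str:
    """Classify glycemic status from venous plasma glucose values in mmol/L.

    Assumed cutoffs (WHO 1998/99): IFG is FPG 6.1 to 6.9, IGT is 2-hour
    OGTT 7.8 to 11.0, and DM is FPG >= 7.0 or 2-hour OGTT >= 11.1.
    """
    if fpg >= 7.0 or ogtt_2h >= 11.1:
        return "DM"
    ifg = 6.1 <= fpg < 7.0        # positive fasting test
    igt = 7.8 <= ogtt_2h < 11.1   # positive 2-hour test
    if ifg and igt:
        return "IGT/IFG"          # combined
    if igt:
        return "I-IGT"            # negative FPG, positive OGTT
    if ifg:
        return "I-IFG"            # positive FPG, negative OGTT
    return "NGT"


print(classify_who98(5.5, 9.0))   # negative FPG with positive OGTT
```

The non-isolated IGT and non-isolated IFG groups are not modeled here, because they arise when the second test is missing or ignored; handling them would require an optional argument for each test.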

It should be noted that the criteria for diagnosis of dysglycemia in observational studies have been defined by the WHO,20 specifically as the epidemiological criteria,20 which enable researchers to classify subjects using just their blood glucose concentration, measured after an overnight fast or 2 hours after a 75 g oral glucose load, without any confirmatory symptoms or blood/plasma determinations. Thus, the recommendation for such large population studies was a single glucose test at the start of the study.


Diagnosis question. Kappa estimations for the degree of concordance between IFG and IGT.

Prognosis question.

Measures of association between IFG or IGT and outcomes of interest. To evaluate the strength of the association between the exposure of IFG or IGT and the outcomes of interest (DM, CVD, mortality, lipid disturbances, etc.), several risk metrics were selected for use in both the prognosis studies and the placebo arms of clinical trials testing interventions:


Annualized risk of progressing to the outcome of interest within the exposed group (i.e., diagnosed with IFG or IGT).


Unadjusted annualized relative risk (RR) with the confidence interval (CI).


Risk difference between the exposed group (IFG or IGT) and the normal group (normal glucose tolerance [NGT] or normal fasting glucose [NFG]). This difference was based on the annualized rates.


Attributable risk (due to the IFG or IGT exposure alone) expressed as a percent of the total risk for the duration of the study.

Treatment question.


Absolute risk difference (ARD).


Number needed to treat (NNT).


Relative risk reduction (RRR).

Equations Used To Calculate Measures of Association

Diagnosis question. Kappa coefficients are used to estimate the average rate of concordance between two repeated tests (categorical data) and also take into account chance occurrence.21, 22 The equation and methods for calculating variance and 95% CI are shown in Appendix G, Section A.
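As an illustration of the kappa coefficient, the sketch below computes the standard unweighted Cohen's kappa from a square cross-classification table of two tests (or of a test and its repeat). It is a minimal sketch under stated assumptions: the function name and the example counts are hypothetical, and the review's own equation and variance formula are those in Appendix G, Section A, which may differ in detail.

```python
def cohen_kappa(table):
    """Unweighted Cohen's kappa for a square table of counts.

    table[i][j] is the number of subjects placed in category i by the
    first test and category j by the second test.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of subjects on the diagonal.
    p_obs = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: product of the marginal proportions, summed.
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    p_exp = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical counts: rows are the first test, columns the repeat test.
example = [[40, 10],
           [5, 45]]
print(round(cohen_kappa(example), 2))  # 0.7
```

In this made-up example the observed agreement is 0.85 and the agreement expected by chance is 0.50, so kappa is 0.35 / 0.50 = 0.70.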

Prognosis question. The equations for these measures of association for the prognosis question can be derived from a basic 2x2 table. Table 3 shows an example of a 2x2 table with the dichotomous classification (yes or no) for the outcome of interest, in this example DM, for those with the exposure (IFG or IGT) and those without the exposure status (NGT or NFG). From this table, the incidence for the duration of the study is derived. Eligible studies varied in duration from one to 18 years, thus it was difficult to compare measures of association between studies. For this reason, we converted estimates of risk, RR, and risk difference to annualized values.

Table 3. Example of a 2×2 table of binary outcomes for the disease status of DM and the exposure status of IFG or IGT.


Annualized risk for those with IFG or IGT and in normal subjects. The incidence was calculated as the rate of those individuals who developed the outcome of interest relative to those at risk, expressed as (a/n1) in Table 3. Incidence rate is conceptually related to the risk (or probability) of developing an outcome over a specified time period.23 The method used to convert an incidence rate to the risk of developing the outcome of interest over a specified period of time for patients with IFG or IGT can be seen in Appendix G, Section B. The advantage of these equations is that they do not assume that the rate of change is linear, as would be the case with simple division of the estimated rate by the number of years.23

The studies evaluated in this systematic review varied in duration from six months to 18 years. In order to allow comparison across studies, the time period was standardized to a one-year period when reporting risk of the exposed for the outcome of interest. Appendix G, Section B allows for conversion of the varied time periods across studies to a common time period of one year. To facilitate presentation of the annualized risk, values are shown as per 100 persons.

The annualized risk for those with IFG or IGT who then progressed to the endpoint of interest (DM, in this example) was calculated as follows:23

[Equation image not reproduced: annualized risk in the exposed group (IFG or IGT) at time t; see Appendix G, Section B.]
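Because the equation itself is not available in this text, the sketch below shows one common way to annualize a cumulative risk without assuming linear change. It is an assumption, not necessarily the formula in Appendix G, Section B: it treats the hazard as constant over follow-up, so that the yearly risk r satisfies 1 - (1 - r)^t = a/n1. The function name and the example numbers are hypothetical.

```python
def annualized_risk(events: int, at_risk: int, years: float) -> float:
    """Convert cumulative incidence over `years` to a one-year risk.

    Assumes a constant hazard, so the yearly risk compounds rather than
    adds: cumulative = 1 - (1 - yearly) ** years.
    """
    cumulative = events / at_risk          # a / n1 in Table 3
    return 1.0 - (1.0 - cumulative) ** (1.0 / years)


# Hypothetical cohort: 30 of 100 subjects with IGT develop DM in 5 years.
r = annualized_risk(30, 100, 5)
print(f"{100 * r:.1f} per 100 persons per year")
```

For these made-up numbers the result is roughly 6.9 per 100 persons per year, slightly higher than the 6.0 per 100 obtained by dividing 30% by 5 years, because simple division ignores the shrinking pool of subjects still at risk.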

Relative risk for those with IFG or IGT relative to NFG or NGT. The RR is the ratio of the incidence in the exposed group (IFG or IGT) over the incidence of the unexposed group (NFG or NGT). Details of the derivation of RR are presented in Appendix G, Section B.

RR = (incidence in the exposed group [IFG or IGT]) / (incidence in the unexposed group [NFG or NGT])

Calculation of the CI is presented in Appendix G, Section C.

Risk difference. The risk difference estimates the difference in risk that is attributable solely to the exposure of having IFG or IGT. This estimate is based on the annualized risk estimates and expressed per 100 persons.

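Risk difference is calculated as:

Risk difference = (annualized risk in the exposed group [IFG or IGT]) - (annualized risk in the unexposed group [NFG or NGT]), expressed per 100 persons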

Attributable risk for those with IFG or IGT relative to NFG or NGT for the study duration. The attributable risk (AR) represents the proportion of excess risk of the disease outcome (above the background risk) in the exposed group. It is calculated using the incidence in the exposed group (IFG or IGT) and then subtracting the incidence in the non-exposed group (NFG or NGT). This numerator is then divided by the incidence in the exposed group (IFG or IGT) and multiplied by 100 in order to be expressed as a percent. Note that this estimate of the AR was not based on annualized estimates and was therefore not converted to an annualized proportion. The percent AR was expressed for the duration of the study. The AR was estimated using the following equation:

AR (%) = [(incidence in the exposed group - incidence in the non-exposed group) / incidence in the exposed group] x 100
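The relative risk and percent attributable risk described in this section can be computed directly from the cells of a 2x2 table. The sketch below is illustrative only: a and n1 follow the notation quoted from Table 3, while c and n0 (events and total in the unexposed group), the function name, and the example counts are hypothetical labels introduced here. Both measures are computed for the study duration, without annualization.

```python
def two_by_two_measures(a: int, n1: int, c: int, n0: int) -> dict:
    """Study-duration RR and percent AR from a 2x2 table.

    a / n1: events / total in the exposed group (IFG or IGT).
    c / n0: events / total in the unexposed group (NFG or NGT);
            these two labels are assumptions, not the report's notation.
    """
    inc_exposed = a / n1
    inc_unexposed = c / n0
    rr = inc_exposed / inc_unexposed
    # Excess incidence in the exposed, as a percent of their incidence.
    ar_percent = (inc_exposed - inc_unexposed) / inc_exposed * 100
    return {"RR": rr, "AR_percent": ar_percent}


# Hypothetical counts: 30/100 exposed and 10/100 unexposed develop DM.
m = two_by_two_measures(30, 100, 10, 100)
print(round(m["RR"], 2), round(m["AR_percent"], 1))  # 3.0 66.7
```

With these made-up counts, two thirds of the incidence in the exposed group is in excess of the background incidence seen in the unexposed group.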

Treatment question. The metrics of association (ARD, NNT, and RRR) presented for studies evaluated in the treatment question are based on annualized risk estimates (from extracted data, where the data permitted). However, many of the studies presented these same estimates for the study duration without reporting sufficient data to permit annualized estimates. Thus, where possible, both annualized and study-duration estimates were presented.

ARD, NNT, and RRR. The ARD is a metric frequently used in clinical epidemiology that expresses the absolute risk difference between the event rate in the treatment group and that of the control/placebo group. The ARD compares the outcome rates on an arithmetic scale and is expressed in absolute terms.

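ARD = absolute value[(R_Et) - (R_Ct)], where R_Et is the event rate in the treatment group and R_Ct is the event rate in the control/placebo group.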

An alternative way of expressing the difference between groups is with the number of patients needed to treat (NNT). The NNT expresses the number of patients that a clinician must treat (with the intervention used in the study in question) in order to prevent one patient from having a target outcome. Clinicians may find this estimate of the risk difference to be a useful expression of the magnitude of the treatment effect.24 The NNT is calculated as the inverse of the ARD caused by treatment, as follows:

NNT = 1/ ARD

Lastly, the RRR is an additional metric used in clinical epidemiology to express the risk that is taken away by the intervention used in the study. It assists in comparing studies with different baseline risks as it considers the ARD and then divides this by the risk for the placebo/control group. The equation used to estimate the RRR is as follows:

RRR = ARD / R_Ct
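The three treatment metrics can be worked through with a short Python sketch. The function name, argument names, and trial counts below are hypothetical; the arithmetic follows the three equations given in this section.

```python
def treatment_effect(events_trt: int, n_trt: int,
                     events_ctl: int, n_ctl: int) -> dict:
    """ARD, NNT, and RRR from event counts in two trial arms."""
    r_trt = events_trt / n_trt    # event rate, treatment group
    r_ctl = events_ctl / n_ctl    # event rate, control/placebo group
    ard = abs(r_trt - r_ctl)
    if ard == 0 or r_ctl == 0:
        raise ValueError("NNT and RRR are undefined for these rates")
    return {"ARD": ard, "NNT": 1 / ard, "RRR": ard / r_ctl}


# Hypothetical trial: 10/100 events on treatment, 20/100 on placebo.
e = treatment_effect(10, 100, 20, 100)
print(round(e["ARD"], 2), round(e["NNT"], 1), round(e["RRR"], 2))
# 0.1 10.0 0.5
```

In this made-up trial, ten patients would need to be treated to prevent one event, and the treatment removes half of the baseline risk. In practice the NNT is usually rounded up to the next whole patient.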


Quantitative meta-analyses were undertaken within each of the dysglycemia classification groups with a minimum of two studies for the unadjusted annualized RR. Some of the prospective cohort studies were related and not independent. One cohort could have multiple publications that reflected analyses done at different time intervals or on different groups within the same population cohort. Therefore, one representative publication was selected from the series of related studies to be included within the meta-analysis. The representative study was selected by consideration of the methodological quality score, the larger sample size, and the year of publication.

An overall pooled estimate was calculated across all study populations. Tests for heterogeneity were undertaken and, when statistically significant, only the results from the random-effects model (REM) were used to calculate the pooled estimate. Statistical software (SAS, version 8.2)25 was used to calculate the test for heterogeneity and the pooled estimates. Studies were weighted according to the inverse of their variances. Individual study effect sizes were calculated and plotted by year of publication.
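Inverse-variance weighting and the accompanying heterogeneity statistic can be illustrated as follows. This is a minimal sketch, not the SAS code used in the review: it assumes that effects are pooled on the log RR scale and uses the standard Cochran's Q, whereas the review's exact formulation is the one in Appendix G, Section D. The function name and the input values are hypothetical.

```python
import math


def pool_inverse_variance(log_rr, variances):
    """Fixed-effect pooled log RR with Cochran's Q.

    log_rr:    per-study natural log of the relative risk.
    variances: per-study variance of the log RR.
    Returns (pooled log RR, Q, degrees of freedom).
    """
    weights = [1.0 / v for v in variances]   # inverse-variance weights
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / total_w
    # Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    return pooled, q, len(log_rr) - 1


# Hypothetical studies with RRs of 4.0, 5.5, and 6.0.
effects = [math.log(4.0), math.log(5.5), math.log(6.0)]
pooled, q, df = pool_inverse_variance(effects, [0.04, 0.09, 0.02])
print(round(math.exp(pooled), 2), round(q, 2), df)
```

Q is referred to a chi-square distribution with df degrees of freedom to obtain the p value; that step needs a statistics library and is omitted here.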

Tests of heterogeneity. Tests of heterogeneity are statistical analyses for examining whether the observed variation in study results is compatible with the variation expected by chance alone. The test for heterogeneity selected for this review (Q) is detailed in Appendix G, Section D. The smaller the p value of the Q test (that is, the more significant the test), the greater the likelihood that the observed differences between the studies were not due to chance alone. If the p value of the Q test is relatively low (for example, one in 10 or one in 20), then the observed differences in the results between studies are likely related to factors other than chance.26 The potential factors that account for these differences can be numerous, and caution should be used when attempting to explore the nature of these differences. A single factor may not be the only important source of heterogeneity.

Tests of heterogeneity have some limitations that can make interpretation difficult.27 It has been suggested that the statistical power of the Q test in most cases is low (due to a small number of combined studies). As such, the test may indicate that it is not statistically significant at conventional levels, but, in reality, heterogeneity is present. Similarly, if the sample sizes of the studies are very large, the Q test may be significant even when individual effect sizes do not really differ. Furthermore, design flaws and publication biases can also make the interpretation of heterogeneity tests difficult. For example, if all the studies meta-analyzed have the same design flaw, a consistent bias is present that could make the effect sizes appear more reliable than they really are. Conversely, if the studies have different design flaws, the meta-analysis could show a positive test for heterogeneity, but, in reality, reflect the same underlying population.28

In summary, caution must be used when interpreting the Q statistic. Some have argued that this test should be omitted, while others have suggested that it should only be used as a diagnostic tool until further research accounts for variation between studies. One method recommended for dealing with sources of heterogeneity is the REM for meta-analysis.27 Rather than attempting to explain or adjust for the variability between studies, the REM takes into account the variation in the underlying effect sizes. The REM is often used when the source of the variance cannot be identified.27 As such, the REM cannot investigate the causes of the heterogeneity.
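To show how a random-effects model takes between-study variation into account, the sketch below uses the DerSimonian-Laird moment estimator. This choice is an assumption made for illustration: the text does not state which REM estimator the review used, and the function name and inputs are hypothetical.

```python
import math


def pool_random_effects_dl(effects, variances):
    """Random-effects pooled estimate (DerSimonian-Laird, assumed).

    Returns (pooled effect, standard error, tau squared), where tau
    squared is the estimated between-study variance.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero when Q is below its df.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Each study's weight now reflects within- plus between-study variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


# Two hypothetical studies that disagree strongly.
print(pool_random_effects_dl([0.0, 1.0], [0.01, 0.01]))
```

When the studies agree, tau squared is zero and the result equals the fixed-effect estimate. When they disagree, the weights become more equal and the standard error widens, which is how the model absorbs heterogeneity without explaining it.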

When meta-analyses in this systematic review revealed a significant test for heterogeneity (Q), the REM was used to calculate the overall pooled estimate. Exploratory sensitivity analyses were undertaken for those meta-analyses that had five or more studies. In these analyses, each study was removed from the pooled estimate, and the Q test and the overall pooled estimate were reviewed. These data were used to judge whether any individual studies should be removed from the combined estimate.
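The leave-one-out procedure described above can be sketched as a simple loop. For brevity this illustration re-pools with fixed-effect inverse-variance weights on each pass, which is a simplification of what the text describes (the review used the REM when Q was significant); the function name and the data are hypothetical.

```python
def leave_one_out(effects, variances):
    """Re-pool the studies with each one removed in turn.

    Returns a list of (omitted index, pooled effect, Q) tuples, using
    inverse-variance weights and Cochran's Q on the remaining studies.
    """
    results = []
    for i in range(len(effects)):
        ys = effects[:i] + effects[i + 1:]
        vs = variances[:i] + variances[i + 1:]
        w = [1.0 / v for v in vs]
        pooled = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, ys))
        results.append((i, pooled, q))
    return results


# Five hypothetical studies; the last one is an outlier.
effects = [0.1, 0.1, 0.1, 0.1, 2.0]
variances = [0.04, 0.04, 0.04, 0.04, 0.04]
for omitted, pooled, q in leave_one_out(effects, variances):
    print(omitted, round(pooled, 3), round(q, 2))
```

A study whose removal sharply lowers Q, as the last one does here, is a candidate source of heterogeneity and would merit a closer look before deciding whether to keep it in the combined estimate.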

Peer Review Process

A list of potential peer reviewers was assembled from a number of sources including our TEP, our partners, the McMaster research team, and the AHRQ (see Appendix D).



Appendixes are available electronically; see http://www for Appendixes A-G.

