PubMed Health. A service of the National Library of Medicine, National Institutes of Health.

Burr JM, Botello-Pinzon P, Takwoingi Y, et al. Surveillance for Ocular Hypertension: An Evidence Synthesis and Economic Evaluation. Southampton (UK): NIHR Journals Library; 2012 Jun. (Health Technology Assessment, No. 16.29.)

5 Agreement and reliability of candidate tonometers for measuring intraocular pressure

Introduction

Background

Raised IOP is the most important risk factor for glaucoma and the only one that is treatable. The instrument used to measure IOP is called a tonometer. The desirable attributes of a tonometer for use in a monitoring programme are accuracy, precision, acceptability to patients and ease of use. GAT, a contact tonometer, is currently the tonometer most widely used by ophthalmologists and is accepted as the current clinical standard. However, GAT has several limitations (described below), and because of the skill required to use it and interpret its readings it is not ideal for the monitoring setting. In this setting, tonometers that do not touch the cornea (non-contact) and do not require extensive training to use would be both preferable and more practical. In recent years, a variety of new tonometers for estimating IOP have emerged with potential advantages, including ease of use, non-contact measurement, automation and self-administration, and compensation for, or insensitivity to, corneal thickness and other corneal properties.25,26

Description of technologies

Tonometers can be categorised as contact or non-contact tonometers, depending on whether or not they involve direct corneal contact. In most cases, IOP measurement with a tonometer should be carried out by a health-care professional. However, as described below, some tonometers [Ocuton S® (EPSa Elektronik & Präzisionsbau, Saalfeld, Germany) and transpalpebral tonometers] have been specifically designed for self-measurement.27

Contact tonometer

This term refers to tonometers that have direct contact with the cornea.

Goldmann applanation tonometer

The GAT, a contact tonometer, is currently the instrument most commonly used by ophthalmologists to estimate IOP. It is a slit lamp-mounted device. However, GAT has some limitations, including the influence of corneal thickness and corneal biomechanical properties, the potential for transmitting infection or causing corneal abrasion, and its relative difficulty of use. GAT calibration assumes a central corneal thickness of between 530 and 560 μm, so IOP is likely to be underestimated in people with thinner corneas and overestimated in people with thicker corneas.103 The operating manual recommends three measurements, but this is often not done in clinical practice.

Dynamic contour tonometer

The dynamic contour tonometer (DCT) is a slit lamp-mounted, contact, digital, non-applanation tonometer that is operated in a similar way to GAT. It is commercially available as the PASCAL® DCT (SMT Swiss Microtechnology, Port, Switzerland). Corneal anaesthesia is required, but fluorescein is not. Because corneal thickness is an important factor influencing IOP measurement, the PASCAL DCT minimises the effect of corneal architecture by using a built-in sensor tip, containing a solid-state pressure sensor, whose contour matches the curvature of the cornea. When the pressures on both sides of the cornea are equal, the concave contour surface matches the corneal shape with minimal distortion and directs all forces acting within the cornea to the pressure sensor surface, providing an IOP measurement. The pressure in the eye is sampled 100 times per second, and the result is presented on the LCD (liquid crystal display) screen as the diastolic IOP (mmHg), together with the ocular pulse amplitude (OPA) and a quality score, ‘Q’. OPA is the difference between the systolic and diastolic IOP. The quality score is interpreted as follows: ‘Q1’ is optimum; ‘Q2’ and ‘Q3’ are acceptable; and ‘Q4’ and ‘Q5’ are poor and indicate that the measurement should be repeated. The PASCAL DCT self-calibrates at the beginning of every measurement, although a performance test can also be performed.104 A measurement takes about 5 seconds. Good patient cooperation is needed to keep the eye and head steady during the measurement.
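The two DCT outputs described above can be illustrated with a minimal Python sketch, assuming a list of pressure samples (mmHg) recorded at 100 Hz. The function names are hypothetical, and taking the lowest and highest samples as the diastolic and systolic IOP is a simplification; the PASCAL's own signal processing is not described in this report.

```python
def summarise_dct_trace(samples_mmhg):
    """Return (diastolic IOP, OPA) from a pressure trace sampled at 100 Hz.

    Simplifying assumption: diastolic IOP = lowest sample and systolic
    IOP = highest sample; OPA is their difference.
    """
    if not samples_mmhg:
        raise ValueError("empty pressure trace")
    diastolic = min(samples_mmhg)
    systolic = max(samples_mmhg)
    opa = systolic - diastolic  # ocular pulse amplitude
    return diastolic, opa


def interpret_quality(q):
    """Map the DCT quality score (1-5) to the interpretation given in the text."""
    if q == 1:
        return "optimum"
    if q in (2, 3):
        return "acceptable"
    if q in (4, 5):
        return "poor: repeat the measurement"
    raise ValueError(f"unknown quality score: {q}")
```

A 5-second measurement at this sampling rate corresponds to roughly 500 samples passed to `summarise_dct_trace`.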

Ocuton S

The Ocuton S is a self-measurement tonometer that calculates and displays the IOP value automatically through direct contact of the measuring prism with the cornea. The use of topical anaesthetic is required.105

The Perkins applanation tonometer

The Perkins applanation tonometer is a hand-held device but otherwise uses the same principles as GAT and requires topical anaesthesia and fluorescein instillation. All hand-held applanation tonometers were included under this heading.

Rebound tonometer

The rebound tonometer (RT) is a simple portable device, commercially available as the Icare® tonometer (Tiolat, Helsinki, Finland). Although it is a contact tonometer, topical anaesthetic drops are not required and the tonometer has a disposable tip to minimise the risk of cross-infection. The device processes the rebound movement of a rod probe resulting from its interaction with the eye; there is a shorter duration of impact as the IOP increases. The rebound is influenced by corneal thickness, and for this reason measurement of IOP by this tonometer is prone to measurement error as a result of corneal properties.106,107 Like GAT, it is calibrated for a typical CCT value. Six measurements are recommended to provide accurate results, and the average of six IOP measurements is displayed on the LCD.108 Regular calibration is required.

TonoPen

The TonoPen® (Mentor O&O Inc., Santa Barbara, CA, USA; Reichert Inc., Depew, NY, USA) is a hand-held, self-contained portable tonometer that determines IOP by making contact with the cornea (central contact is recommended) through a probe tip, causing applanation/indentation of a small area. The tip contains a transducer that measures the applied force on the cornea. Topical anaesthetic eye drops are used. After four valid readings are obtained, the averaged measurement will appear on the LCD screen. Up to 10 measurements can be performed according to the manufacturer recommendations. It is recommended that calibration is performed daily before instrument use, when indicated by the LCD screen or whenever batteries are replaced.

Non-contact tonometers

This term refers to tonometers that do not have direct contact with the cornea.

Non-contact tonometer

The air-puff tonometer uses a rapid air pulse to applanate (flatten) the cornea and thus works on the same basic principle as the Goldmann tonometer. The force of the air stream increases linearly over milliseconds, progressively flattening a known area of the cornea. The moment of applanation is detected by an optical sensor, and the air pulse is then stopped. The advantages of the non-contact tonometer (NCT) include speed, no need for topical anaesthesia, a low risk of corneal abrasion and minimal training requirements; because there is no direct contact with the eye, infection issues are also avoided. IOP measurement by NCT is affected by corneal thickness. Several models are available on the market, and calibration and the number of recommended measurements vary between them. Some models attempt to correct the measurement for CCT. Some patients find the air puff uncomfortable.

Ocular response analyser

The Ocular response analyser® (ORA) (Reichert Inc., Depew, NY, USA) uses air-puff technology and an electro-optical system to record two applanation measurements of IOP, one while the cornea is moving inward and the other as it returns. Because of its viscoelastic properties, the cornea resists the dynamic air puff, delaying the inward and outward applanation events and resulting in two different pressure values; the average of these two values provides a repeatable, Goldmann-correlated IOP measurement (IOPG). The difference between the two readings is corneal hysteresis (CH), a newer measurement related to corneal tissue properties that results from viscous damping in the corneal tissue.109 The CH measurement provides the basis for two additional parameters: corneal-compensated intraocular pressure (IOPCC) and corneal resistance factor.110,111 IOPCC is an IOP measurement that is less affected by corneal properties. Four good-quality readings per eye are recommended.112
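The IOPG and CH arithmetic described above can be sketched as follows, assuming P1 is the inward and P2 the outward applanation pressure in mmHg. The function and parameter names are hypothetical. IOPCC and corneal resistance factor are not computed because their formulas are not given in this section.

```python
from statistics import mean


def ora_iopg_and_ch(p1_mmhg, p2_mmhg):
    """Return (IOPG, CH) for one ORA reading.

    p1_mmhg: pressure at inward applanation (hypothetical name)
    p2_mmhg: pressure at outward applanation (hypothetical name)
    """
    iop_g = (p1_mmhg + p2_mmhg) / 2  # Goldmann-correlated IOP: average of the two
    ch = p1_mmhg - p2_mmhg           # corneal hysteresis: difference between the two
    return iop_g, ch


def eye_summary(readings, recommended=4):
    """Average IOPG and CH over good-quality (p1, p2) readings for one eye.

    Assumes the caller has already discarded poor-quality readings.
    Raises if fewer than the recommended four readings are supplied.
    """
    if len(readings) < recommended:
        raise ValueError(f"expected at least {recommended} good-quality readings")
    pairs = [ora_iopg_and_ch(p1, p2) for p1, p2 in readings]
    return mean(g for g, _ in pairs), mean(c for _, c in pairs)
```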

Transpalpebral tonometer

This type of tonometry includes devices that measure IOP through the eyelid, avoiding direct corneal contact. Topical anaesthesia is not required. The Diaton® tonometer (BiCOM Inc., Long Beach, NY, USA; previously commercialised as TGDc-01, Ryazan State Instrument-Making Enterprise, Ryazan, Russia) is a hand-held, pen-like portable device that applies this principle. The pressure phosphene tonometer (PPT; Proview® Eye Pressure Monitor, Bausch & Lomb Inc., Rochester, NY, USA) was developed as a self-measurement tonometer. The PPT is a spring compression device, calibrated in mmHg, consisting of a probe with a flat applicator of the same diameter (3.06 mm) as the area applanated by GAT. When pressure is applied with the instrument through the closed eyelid over the superior nasal part of the eye, it induces a phosphene, a visual phenomenon perceptible to the user. The threshold pressure at which a phosphene appears is taken as the estimated IOP.113

Aim and objectives

Aim

  • To compare the agreement, recordability, practicality, acceptability and reliability of the tonometers used in clinical practice using GAT as the reference tonometer.

Primary objectives

  • To compare the agreement of IOP readings of one or more tonometers in adults with the readings of GAT as the reference tonometer.
  • To explore the factors affecting the agreement between tonometers, including CCT, IOP level, previous corneal refractive surgery, type of examiner and use of disposable tonometer heads.

Secondary objectives

  • To report the recordability (proportion of measurements that are recordable) of the alternative tonometers.
  • To report the practicality of the alternative tonometers from the ‘doer’ (examiner) point of view and their acceptability from the ‘user’ perspective.
  • To compare the reliability of the comparator tonometers, when reported, with that of GAT, including intra- and interobserver reliability.

Methods

Inclusion and exclusion criteria

Types of study

Direct comparative studies that assessed the agreement of one or more tonometers with the reference standard tonometer (GAT) in the same group of people were included. Non-English-language studies and conference abstracts were excluded.

Types of participant

Adults aged > 16 years, including those with a diagnosis of OHT or glaucoma, representative of the general population, were included. When the age range was not reported, confirmation was sought from the authors. If no response was received and a mean age and SD or a median age and interquartile range (IQR) were provided, the following rule was applied to assess inclusion: mean − 3 × SD ≥ 16, or median − 3 × (IQR/1.35) ≥ 16. This was done to avoid excluding studies solely because they did not report the age range when, on the basis of the reported age distribution, it was very unlikely that any participants were under 16. Participants with corneal abnormalities (corneal pathology, including keratoconus and bullous keratopathy, or a previous corneal graft) were excluded.
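The age rule above can be written as a small function. This is a sketch with a hypothetical name; it returns `None` when neither pair of summary statistics is available, since the text does not say how such studies were handled.

```python
def likely_all_over_16(mean_age=None, sd=None, median_age=None, iqr=None, cutoff=16.0):
    """Apply the summary-statistic age rule described above.

    Returns True/False, or None when neither (mean, SD) nor (median, IQR)
    is available.
    """
    if mean_age is not None and sd is not None:
        # Roughly 99.9% of a normal distribution lies above mean - 3 SD
        return mean_age - 3 * sd >= cutoff
    if median_age is not None and iqr is not None:
        # IQR / 1.35 approximates the SD of a normal distribution
        return median_age - 3 * (iqr / 1.35) >= cutoff
    return None
```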

Types of technology

Reference tonometer

The reference tonometer was the GAT.

Comparator tonometers

All tonometers that could conceivably be used in a monitoring context were eligible for inclusion. Studies that evaluated the agreement of manometry114 (an invasive procedure) were not eligible. Tonometers that were primarily used as research devices (such as the ocular blood flow instrument,115 which is recommended for research use only) or that were unsuitable for, or unavailable in, a clinical setting were not eligible. If a study compared both an eligible and a non-eligible tonometer with GAT, the study was included.

Types of examiner

Tonometry performed by any type of examiner including optometrists, ophthalmologists, nurses, technicians and patients was included.

Primary outcome

The primary outcome was the agreement [mean difference and limits of agreement (LoA)] between a tonometer and the reference standard.

Secondary outcomes

  • Interobserver reliability for two observations taken by different observers with the same tonometer.
  • Intraobserver reliability for two observations taken by the same observer with the same tonometer.
  • Practicality for ‘doers’ using the technologies.
  • Acceptability of the tonometers to users and providers.
  • Proportion of participants with a recordable IOP (recordability).

Search strategy

Sensitive electronic searches were conducted to identify reports of published and ongoing studies on the reliability and agreement of tonometers. Databases were searched from 1987 until February 2010 and searches were restricted to articles published in English. Conference proceedings were not included. Studies prior to 1987 were not considered because of technology changes. The following bibliographic databases were searched: MEDLINE, MEDLINE-In Process & Other Non-Indexed Citations, EMBASE, Science Citation Index, BIOSIS and the Cochrane Central Register of Controlled Trials. The websites of key journals were screened for additional relevant or in-press publications, including American Journal of Ophthalmology, Archives of Ophthalmology, British Journal of Ophthalmology, Eye, Graefe's Archive for Clinical and Experimental Ophthalmology, Investigative Ophthalmology and Visual Science, Journal of Glaucoma and Ophthalmology.

Additional searches were undertaken in current research registers, including ClinicalTrials.gov, Current Controlled Trials and World Health Organization International Clinical Trials Registry Platform, and in the HTA database, DARE and the Cochrane Database of Systematic Reviews (CDSR) for relevant evidence synthesis reports. An internet search using Copernic Agent was also undertaken and included key professional organisations and manufacturers of tonometers.

Full details of the search strategies used are provided in Appendix 1.

Data extraction strategy

Two reviewers (AA, AA-B, KMc, ABP or JB) independently screened the titles and abstracts (if available) of all reports identified by the electronic searches. Full-text copies of all studies deemed to be potentially relevant were obtained and independently assessed for inclusion by two reviewers (AA, AA-B, KMc, ABP or JB) using a screening tool developed for this review. Authors were contacted by email (if provided in the manuscript) when only age range data were missing. Any disagreements were resolved by consensus or arbitration by a third party (AA-B, JB).

A data extraction form was developed and piloted. Two reviewers (AA, ABP, JC or AE) independently extracted data on study design, participant characteristics, type of tonometer used and outcome data. We checked 20% of all extracted data (ABP). When outcome data were provided both per eye (right/left) and overall (average measurement per participant) for a comparison, right-eye data were used. If a study compared different versions of the same technology, only data on the most recent version were included. IOPCC data were used for ORA. When an individual study provided data from two different groups (e.g. normal/glaucoma), the group with the larger sample size was included in the analysis; if the sample sizes were equal, data from the group with the higher IOP measured by GAT were included. When measurements were performed in different sessions (days apart), data from the first session were used. If measurements were taken and reported at different time points during the same day, a time likely to occur in practice (e.g. 10am) was used. When measurements taken before and after surgery were available, preoperative results were included in the main analysis.
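The rule for studies reporting two groups can be expressed as a single sort key. This is a sketch with a hypothetical `Group` record; how a tie on both sample size and mean GAT IOP was resolved is not stated, and `max` here simply keeps the first such group.

```python
from dataclasses import dataclass


@dataclass
class Group:
    label: str           # e.g. "normal" or "glaucoma"
    n: int               # sample size of the group
    mean_gat_iop: float  # mean IOP measured by GAT, mmHg


def select_group(groups):
    """Prefer the larger group; if sizes are equal, prefer the higher mean GAT IOP."""
    if not groups:
        raise ValueError("no groups to choose from")
    return max(groups, key=lambda g: (g.n, g.mean_gat_iop))
```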

When raw outcome data were provided, mean values and SDs per tonometer were calculated. When outcome data were not provided and the authors' email addresses were available, the authors were asked for the mean difference or the SD of the difference between GAT and the comparator tonometer. When mean differences were not reported, they were calculated from the reported data (e.g. GAT and comparator means) or by reading the values from a published Bland–Altman plot. When a difference of opinion existed, a third party was consulted (JC).

Quality assessment strategy

Two reviewers (AA, AA-B, ABP or JB) independently assessed the quality of all included studies using a modified checklist adapted from Whiting and colleagues116 and Craig and colleagues.117 Each item was graded as ‘yes’, ‘no’, or ‘unclear’. The quality assessment of diagnostic accuracy studies (QUADAS) tool116 is a recently developed quality assessment tool for use in systematic reviews of diagnostic accuracy studies; however, for this review it was adapted to make it applicable to assessing the quality of reliability and agreement studies. We conducted a 20% check of quality assessment (ABP, JC). Discrepancies were resolved by discussion or arbitration. We classified studies as low quality if they did not meet one or more of the quality criteria.

Data analysis

General approach

The primary outcome, agreement, was assessed by calculating summary LoA.118 Secondary outcomes were tabulated without quantitative analysis. The 95% LoA were calculated for each candidate tonometer from pooled estimates of the mean difference (systematic difference) between the tonometer and the reference standard and of the corresponding variability of agreement (random error). Pooled estimates of mean difference and random error were calculated using the DerSimonian and Laird random-effects method.119 Measures of agreement variability were based on reported (or derived) SDs of within-participant differences. When required, within-participant correlation coefficients were imputed to allow calculation of the SD of differences, provided that a correlation estimate was available from other studies of the same type of tonometer; when more than one study estimate was available, the imputed value was the arithmetic mean of the available correlation estimates. Sensitivity analyses included a fixed-effect analysis and/or imputation of correlations using the minimum correlation coefficient reported by the studies comparing the same tonometer. Additionally, an approximate 95% prediction interval was calculated for both parameters using the estimated τ-value (±1.96τ) from the random-effects analysis, to quantify the impact of between-study heterogeneity on the systematic difference and the random error. The prediction interval gives a range of values that could plausibly be observed if a new study were undertaken, based on the observed between-study heterogeneity. Finally, the proportion of measurements within 2 mmHg and within 3 mmHg of GAT was estimated from the pooled mean difference and SD using the standard normal cumulative distribution function.
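The pooling steps above can be sketched in Python. This is an independent illustration of the DerSimonian and Laird method-of-moments estimator, not a reproduction of the Stata `metan` command used in the review. Per-study variances are inputs the caller must supply (for a mean difference, SD²/n is the usual choice; the variance used when pooling the SD of differences is not stated here). All function names are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooling by the DerSimonian-Laird method of moments.

    estimates: per-study values (e.g. mean differences, mmHg)
    variances: per-study sampling variances of those values
    Returns (pooled estimate, its standard error, tau, I^2 in %).
    """
    k = len(estimates)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two studies and one variance per study")
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))  # Cochran's Q
    df = k - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se, math.sqrt(tau2), i2


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """95% LoA: systematic difference +/- 1.96 x random error."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def approx_prediction_interval(pooled, tau, z=1.96):
    """Approximate 95% prediction interval as described above: pooled +/- 1.96 tau."""
    return pooled - z * tau, pooled + z * tau
```

In the review's terms, one fit would pool the mean differences and another the random errors. The two pooled values would feed `limits_of_agreement`, and each fit's τ would feed `approx_prediction_interval`.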

Additional sensitivity analyses

Further sensitivity analyses examined the impact of excluding studies that used suboptimal methods according to our quality assessment tool (i.e. studies in which at least one of the requirements was clearly not met); of excluding studies that reported data clustered within persons (i.e. studies in which some or all participants contributed data on more than one eye and data for one eye only were unavailable); and of using the standardised mean difference (SMD), that is, the mean difference divided by the pooled SD, to address variation in the number of measurements (systematic difference only). An additional analysis corrected for the underestimated variation in studies with repeated measurements by using reported estimates of within-participant variation to adjust those studies' results to reflect the variation expected if only a single measurement had been taken.120
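One way to express the single-measurement adjustment is shown below. It assumes the review followed Bland and Altman's formula for studies in which each tonometer value is the mean of m replicate readings. Under that formula, the variance of a single-measurement difference is the variance of the differences between means plus (1 − 1/m) times each tonometer's within-subject variance. Function and argument names are hypothetical.

```python
import math


def single_measurement_sd(sd_diff_of_means, m_comp, s_w_comp, m_gat, s_w_gat):
    """Inflate the SD of differences between per-participant means of replicate
    readings to the SD expected for single readings.

    sd_diff_of_means : SD of (mean comparator - mean GAT) across participants, mmHg
    m_comp, m_gat    : number of replicate readings averaged per participant
    s_w_comp, s_w_gat: within-subject SDs of each tonometer, mmHg
    """
    if m_comp < 1 or m_gat < 1:
        raise ValueError("replicate counts must be at least 1")
    var_single = (
        sd_diff_of_means ** 2
        + (1 - 1 / m_comp) * s_w_comp ** 2
        + (1 - 1 / m_gat) * s_w_gat ** 2
    )
    return math.sqrt(var_single)
```

With a single reading per tonometer (m = 1), the SD is returned unchanged, as expected.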

Clinical factor analyses

Heterogeneity between study estimates in the meta-analyses was explored by visual inspection of forest plots and calculation of I2 statistics. Possible reasons for heterogeneity were explored through prespecified clinical factor analyses. Where possible, studies were categorised according to CCT, previous corneal refractive surgery, type of examiner and IOP level, and corresponding meta-analyses were conducted. For some studies, data relating to a subset of the main cohort were used for the factor analyses if sufficient data were given for the subset but not for the full cohort. For the IOP analyses, studies in which the estimated proportion of the sample with OHT (i.e. > 21 mmHg) was > 33% were compared with studies in which this proportion was estimated to be < 33%. The proportion was estimated from the mean and SD for GAT, assuming a normal distribution (studies that did not report these were excluded from the subgroup analysis). A similar approach was taken for the CCT analysis, with one of the two subgroups being defined as studies in which 33% of patients in the sample had a CCT of < 555 μm (i.e. patients at the highest risk of glaucoma conversion). It was not possible to carry out analyses on the use of disposable tonometer heads, as none of the included studies reported such data.
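The proportion-based subgrouping above can be sketched with the standard library's normal distribution. The function names and the "higher"/"lower" labels are hypothetical. The text does not say how a proportion of exactly 33% was handled; here it falls into "lower".

```python
from statistics import NormalDist


def proportion_above(threshold, mean, sd):
    """Share of a Normal(mean, sd) population above threshold."""
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def proportion_below(threshold, mean, sd):
    """Share of a Normal(mean, sd) population below threshold (e.g. CCT < 555 um)."""
    return NormalDist(mean, sd).cdf(threshold)


def iop_subgroup(gat_mean, gat_sd, oht_cutoff=21.0, split=0.33):
    """Classify a study by its estimated proportion of participants with GAT IOP > 21 mmHg.

    Returns None when no usable SD was reported (study excluded from the
    subgroup analysis).
    """
    if gat_sd is None or gat_sd <= 0:
        return None
    p_oht = proportion_above(oht_cutoff, gat_mean, gat_sd)
    return "higher" if p_oht > split else "lower"
```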

A clinical factor analysis was conducted to compare studies with no previous refractive surgery in their sample with studies that had postoperative subsamples. Another analysis compared studies in which the comparator examiner(s) were known to be solely ophthalmologists with studies in which the comparator examiner(s) were all known to be non-ophthalmologists. For self-tonometers, studies in which patients were the sole examiner type were included as a third category in this subgroup analysis. Because of the observed level of heterogeneity, a further clinical factor analysis investigated the impact of manufacturer for types of tonometer produced by more than one manufacturer. Subgroups were not formally compared because of the high level of heterogeneity in the main analyses. When individual studies reported on the impact of the clinical factors considered in the analyses, their results were summarised narratively without statistical analysis.

Data abstraction

When the mean difference was not reported, it was derived from the reported mean GAT and mean comparator IOP values. Authors were contacted for clarification if a discrepancy existed between reported mean difference and mean GAT/comparator scores (other than owing to rounding). The study was excluded from the meta-analysis if it was not possible to obtain a satisfactory value for the mean difference. Where IPD were published in the included study report, the IPD were used to calculate all necessary statistics.

The reported mean difference and SD of the differences were ‘validated’ by assessing the consistency between various reported statistics, where available (i.e. reported SD, 95% LoAs, 95% CIs, paired t-tests and Pearson correlation coefficients). In cases in which LoAs were not explicitly reported, the limits were derived if an appropriate Bland–Altman121 plot had been presented. However, LoAs were considered valid only if the mid-point corresponded with a validated measure of the mean difference. If there were any discrepancies between any of the reported statistics relating to the SD, then the value that was most prevalent was used. If a value for the SD was explicitly reported, then it was assumed that the reported value was correct and this was used in the meta-analysis. If this process still failed to be conclusive, then the authors were contacted for clarification. However, if no satisfactory SD was reported (or satisfactory statistics from which it could be derived), then mean correlation imputation (previously described) was used.
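Several of the consistency checks above reduce to simple identities. The SD of differences can be recovered from the 95% LoA width (SD = width/3.92) or from a paired t-statistic (SD = |d̄|√n/|t|). It can also be derived from the two tonometers' SDs and their correlation r (SD² = s₁² + s₂² − 2rs₁s₂), with r imputed as the mean of other studies' estimates (or their minimum in the sensitivity analysis). The sketch below assumes that candidate SDs are rounded to 0.1 mmHg before the most prevalent value is chosen; this, and all names, are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

Z95 = 1.96  # normal quantile used for 95% limits of agreement


def sd_from_loa(lower, upper):
    """SD of differences recovered from 95% LoA: (upper - lower) / (2 * 1.96)."""
    return (upper - lower) / (2 * Z95)


def loa_midpoint_matches(lower, upper, mean_diff, tol=0.05):
    """LoA are only trusted if their midpoint agrees with the validated mean difference."""
    return abs((lower + upper) / 2 - mean_diff) <= tol


def sd_from_paired_t(mean_diff, t_stat, n):
    """Invert t = d / (SD / sqrt(n)) to get SD = |d| * sqrt(n) / |t|."""
    return abs(mean_diff) * math.sqrt(n) / abs(t_stat)


def most_prevalent(candidates, ndigits=1):
    """Pick the SD value that occurs most often after rounding (ties: first seen)."""
    rounded = [round(c, ndigits) for c in candidates]
    return Counter(rounded).most_common(1)[0][0]


def sd_from_correlation(sd_gat, sd_comp, r):
    """SD of paired differences from the two tonometers' SDs and their correlation."""
    var = sd_gat ** 2 + sd_comp ** 2 - 2 * r * sd_gat * sd_comp
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding error


def imputed_correlation(reported_rs, conservative=False):
    """Mean of other studies' r (main analysis) or their minimum (sensitivity analysis)."""
    if not reported_rs:
        raise ValueError("no correlation estimates available for this tonometer")
    return min(reported_rs) if conservative else mean(reported_rs)
```

Using the minimum r is the conservative choice because a lower correlation gives a larger SD of differences and therefore wider LoA.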

Intraobserver repeatability coefficients (RCs) were used as the measure of within-subject variance to adjust the meta-analysis for repeated measurements with the same tonometer, as proposed by Bland and Altman.120 Reported within-subject SDs were converted into RCs where appropriate. If a measure of within-subject variation was not reported, it was imputed using the highest (i.e. most conservative) value reported by other studies within the same tonometer comparison. Adjustments for repeatability were made for GAT and also for the comparator tonometer, provided that at least one study of that tonometer reported repeatability statistics.
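The conversion between RC and within-subject SD and the conservative imputation rule can be sketched as follows. The factor 1.96 × √2 (about 2.77) follows Bland and Altman's definition of the repeatability coefficient; some sources use 2 × √2 instead, and which factor the review applied is an assumption here. All names are hypothetical.

```python
import math

# Assumption: RC = 1.96 * sqrt(2) * s_w (about 2.77 * s_w), following Bland and Altman.
RC_FACTOR = 1.96 * math.sqrt(2)


def within_sd_from_rc(rc):
    """Within-subject SD implied by a repeatability coefficient."""
    return rc / RC_FACTOR


def rc_from_within_sd(s_w):
    """Repeatability coefficient implied by a within-subject SD."""
    return s_w * RC_FACTOR


def rc_for_study(reported_rc, other_rcs_same_comparison):
    """Use the study's own RC if reported; otherwise impute the highest (most
    conservative) RC from other studies of the same tonometer comparison.
    Returns None when no RC is available at all."""
    if reported_rc is not None:
        return reported_rc
    if not other_rcs_same_comparison:
        return None
    return max(other_rcs_same_comparison)
```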

Data were validated and prepared using Microsoft Excel (Microsoft Corporation, Redmond, WA, USA) and SPSS version 18 (IBM Corporation, NY, USA). Meta-analyses were carried out using the metan command in Stata version 11.

Results

This section is broken down into three main parts: an overview of included studies, a summary of agreement across tonometers and, finally, results per candidate tonometer including quality assessment of included studies, the meta-analysis of agreement and results for the other outcomes (recordability, acceptability, practicality and reliability).

Overview

The study selection process is summarised in Figure 23. In total, 642 reports were identified from the electronic searches as being possibly relevant. The full text of 189 reports was obtained for assessment: 143 from the electronic searches and an additional 46 from reference lists of the selected studies. Finally, 102 reports met the inclusion criteria, including six RCTs.122–127

FIGURE 23. Flow diagram of the selection process.

Characteristics of included and excluded studies

Details of the characteristics of the included studies are provided in Appendix 3. A total of 102 studies reporting 130 comparisons involving 11,582 participants (15,525 eyes) were included. The earliest study128 took place in 1988 and the latest129,130 in 2010. Fifteen studies took place in the USA,111,124,127,131–142 nine in Italy,143–151 nine in the UK,110,128,129,152–157 seven in Germany,158–164 six in each of Australia,165–170 China (excluding Taiwan)171–176 and Japan,107,130,177–180 five in each of Belgium122,181–184 and Switzerland,104,185–188 four in Spain,189–192 three in each of Saudi Arabia,193–195 India,196–198 Portugal106,199,200 and Greece201–203 and two in each of Taiwan,125,204 Israel,205,206 the Netherlands,207,208 Sweden209,210 and Turkey.211,212 One study took place in each of Austria,126 Brazil,213 France,123 Ireland,214 New Zealand,215 Norway216 and Denmark.217 One additional study was a multicentre study conducted in Italy and Spain.218

After contacting authors to clarify the age of study participants, eight authors from nine studies135,140,141,148,189,190,192,201,212 confirmed eligibility for inclusion. One study included only participants who were aged ≥ 50 years but did not give the age range.166

Seventy-three studies (72%) provided information on the gender of participants, with 3337 women and 2900 men. Fourteen studies provided data on participants' race. Almost all studies reported the diagnosis, and populations varied: healthy volunteers,158 people diagnosed with OHT155 or glaucoma196 and mixed populations.131

Included studies compared the reference standard tonometer GAT (Haag Streit, Koeniz, Switzerland) with eight different types of tonometer: DCT (PASCAL); RT (Icare); TonoPen [Medtronic Solan, Jacksonville, FL, USA (incorporating Xomed); or Intermedics Intraocular Inc., Pasadena, CA, USA]; Ocuton S; Perkins (Kowa HA-2, Kowa, Japan); NCT (Canon USA Inc., Lake Success, NY, USA; Keeler Ltd., Windsor, UK; NIDEK Co. Ltd., Gamagori, Japan; Reichert Ophthalmic Instruments, Buffalo, NY, USA; or Topcon Corporation, Tokyo, Japan); ORA; and transpalpebral tonometers, including the PPT (Proview Eye Pressure Monitor) and the TGDc-01, also known as the Diaton tonometer.

All but three104,179,187 of the included studies reported sufficient data (or data were provided by the authors when contacted) to be included in the agreement meta-analysis. A total of 27, 20, 17 and 37 studies provided data on recordability, acceptability, practicality and reliability, respectively.

The 87 reports that were excluded at the full-text assessment stage as they failed to meet one or more of the inclusion criteria in terms of study participants, study design, candidate tonometers and reference standard are listed in Appendix 2. The tonometers that were not included because they were not commercially available and/or were not suitable for clinical practice were the applanation resonance tonometer,219 ocular blood flow instrument,115 Schiotz tonometer,220 SmartLens (Ophthalmic Development Company, Zurich, Switzerland) tonometer221 and pneumotonometer.222

Summary of agreement across tonometers

Agreement of tonometers with Goldmann applanation tonometer

In total, 99 studies (125 paired comparisons) provided enough data on agreement to be included in the meta-analysis. Comparison across tonometers is difficult given the indirect nature of the analysis. A summary of the main analyses for all candidate tonometers is provided in Tables 9 and 10. Full results of all of the meta-analyses for the candidate tonometers are given in Appendix 4. The percentage of measurements expected to be within 2 mmHg of GAT, based on the main-analysis mean difference and random error (assuming a normal distribution), is also presented. Based on the analysed studies, the expected difference did appear to vary across tonometers, with NCT and TonoPen having the smallest estimated difference and Ocuton S the largest. There was substantial uncertainty for most of the tonometers. The estimated random error also varied, with Perkins having the smallest (marginally smaller than NCT) and Ocuton S the largest. For all tonometers, the 95% LoA extended from at least 3 mmHg below to at least 3 mmHg above the GAT value, with Ocuton S and the transpalpebral tonometers having the widest intervals. For most tonometers, approximately 50% of measurements were within 2 mmHg of the GAT value; Ocuton S had the lowest value (33%), and NCT and Perkins had the highest (66% and 59%, respectively). The corresponding percentages within 3 mmHg of the GAT value were 48%, 85% and 79% for Ocuton S, NCT and Perkins, respectively.
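The within-2 mmHg and within-3 mmHg percentages follow from the normal-distribution calculation described under Data analysis. The minimal sketch below uses a hypothetical function name and takes a pooled mean difference and random error in mmHg.

```python
from statistics import NormalDist


def proportion_within(limit, mean_diff, sd_diff):
    """Expected share of single measurements within +/- limit mmHg of GAT,
    assuming comparator-minus-GAT differences ~ Normal(mean_diff, sd_diff)."""
    d = NormalDist(mean_diff, sd_diff)
    return d.cdf(limit) - d.cdf(-limit)
```

A non-zero mean difference shifts the distribution away from zero and lowers this share, which is why both the systematic difference and the random error matter.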

TABLE 9. Pooled estimates and summary 95% LoAs of IOP (mmHg unless otherwise stated).

TABLE 10. Pooled estimates with 95% prediction intervals of IOP (mmHg unless otherwise stated).

Prediction intervals

Substantial heterogeneity between study estimates was observed for most tonometers. The 95% prediction intervals for the mean difference and random error are shown in Table 10. The values illustrate the impact of between-study heterogeneity on the mean difference: the prediction interval ranged from −4.0 to 9.4 mmHg for Ocuton S but only from −1.4 to 1.9 mmHg for NCT. For all tonometers except NCT and Perkins, the prediction interval included mean differences of > 2 mmHg. Similarly, the 95% prediction intervals for the random error illustrate differences in the level of variability between studies.

Results by tonometer

Dynamic contour tonometer

Quality assessment

Thirty-four studies representing 3726 participants (4933 eyes) compared DCT with GAT. Figure 24 summarises the quality assessment for these studies. All 34 studies (100%) specified the selection criteria. In nine studies (26%), cases were selected consecutively. A total of 21 studies (62%) reported individual measures taken within 1 hour. In 24 studies (71%) the same clinical data were available for interpretation as would be in clinical practice. In total, 12 studies (35%) reported whether the examiner(s) were masked to the results. In 13 studies (38%), the tonometers were reported as calibrated. Almost all of the studies (31; 91%) included all participants approached in the analysis or stated a reason why not. Only three studies144,148,218 met all of the quality criteria.

FIGURE 24. Dynamic contour tonometer: summary of quality assessment.

Agreement between dynamic contour tonometer and Goldmann applanation tonometer

Thirty-two studies provided sufficient data to include in the meta-analysis. The full results of the agreement analyses of DCT and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was 1.8 mmHg (95% CI 1.4 mmHg to 2.2 mmHg) with a corresponding random error of 2.4 mmHg (95% CI 2.1 mmHg to 2.6 mmHg). For both analyses there was evidence of a large amount of heterogeneity, with very large I2 values (97% and 95%, respectively), and this can be seen in Figures 25 and 26, in which the forest plots are presented. Based on the main analysis the expected mean difference is 1.8 mmHg (95% LoA −2.9 mmHg to 6.5 mmHg).
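
The summary 95% LoA quoted for DCT can be reproduced under the assumption that it equals the pooled mean difference ± 1.96 × the pooled random error, i.e. Bland–Altman limits built from the meta-analytic estimates. The helper name below is illustrative. Applied to the other tonometers, the same arithmetic sometimes differs by about 0.1 mmHg from the reported limits, presumably because the published inputs are rounded.

```python
def limits_of_agreement(mean_diff: float, random_error: float,
                        z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% limits of agreement (mmHg), assuming the random error
    is the standard deviation of the tonometer-minus-GAT differences."""
    return mean_diff - z * random_error, mean_diff + z * random_error


# DCT main-analysis estimates from the text: mean difference 1.8, random error 2.4.
low, high = limits_of_agreement(1.8, 2.4)
print(f"DCT vs GAT 95% LoA: {low:.1f} to {high:.1f} mmHg")
# prints: DCT vs GAT 95% LoA: -2.9 to 6.5 mmHg
```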

FIGURE 25. Meta-analysis of mean difference between DCT and GAT (main analysis – random effects).

FIGURE 26. Meta-analysis of random error between DCT and GAT (main analysis – random effects).

Recordability

Six studies134,142,161,189,192,210 provided information on the recordability of the DCT. The data are shown in Appendix 5. Individual studies varied in size from 63 to 211 observations. Recordability was high, ranging from 93% to 100% across the studies.

Acceptability and practicality

Eight studies123,132,143,144,163,184,210,215 reported on the acceptability and/or practicality of the DCT. The data are shown in Appendix 6. For the five studies reporting acceptability, comments were favourable for DCT, and in one study,215 which measured preference, a substantial proportion (36; 34%) expressed a preference for DCT over GAT, with 55 (52%) having no preference. Only three studies reported on practicality, and all three reported difficulty with its use: the tonometer was ‘not easy to use’,123 its use ‘entailed a learning curve’144 and extra measurements were needed.210

Reliability

Ten studies104,123,129,142,150,155,159,181,210,218 reported on interobserver and/or intraobserver reliability of DCT. The data are shown in Appendix 7. A variety of reliability measures [LoA, coefficient of variation (CoV), concordance correlation coefficient (CCC), intraclass correlation coefficient (ICC) and RC] and types of observers (student, ‘experienced’ ophthalmologist, optometrist and technician) were used, with variable numbers of measurements taken (from two or three up to six). Substantial interobserver variation (95% LoA of ± 3 mmHg or more) was observed in two studies with the same lead author,129,155 with a narrower interval (± 2 mmHg) in one other study.142
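
Several of the reliability measures above can be computed from replicate readings on the same eyes. The sketch below assumes the Bland–Altman definition of the repeatability coefficient, RC = 1.96 × √2 × s_w ≈ 2.77 s_w, where s_w is the within-subject standard deviation; the included studies may have used other definitions, and the readings are hypothetical.

```python
import math
from statistics import mean, variance


def within_subject_sd(replicates: list[list[float]]) -> float:
    """Within-subject SD: square root of the mean per-eye sample variance
    (valid when every eye has the same number of readings)."""
    return math.sqrt(mean(variance(eye) for eye in replicates))


def repeatability_coefficient(replicates: list[list[float]]) -> float:
    """Bland-Altman RC: about 95% of differences between two readings on the
    same eye by the same observer are expected to lie within +/- RC."""
    return 1.96 * math.sqrt(2.0) * within_subject_sd(replicates)


# Hypothetical: three readings (mmHg) on each of four eyes by one observer.
readings = [
    [15.0, 16.0, 17.0],
    [20.0, 20.0, 20.0],
    [12.0, 14.0, 13.0],
    [18.0, 18.0, 19.0],
]
s_w = within_subject_sd(readings)
rc = repeatability_coefficient(readings)
print(f"s_w = {s_w:.2f} mmHg, RC = {rc:.2f} mmHg")
```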

Non-contact tonometer

Twenty-eight studies enrolling 2222 participants (2868 eyes) compared NCT with GAT. Figure 27 summarises the quality assessment for these studies.

FIGURE 27. Non-contact tonometer: summary of quality assessment.

Quality assessment

Selection criteria were specified in 24 studies (86%). In only nine studies (32%) were cases consecutive. A total of 19 studies (68%) reported individual measures taken within 1 hour. In 18 studies (67%), the same clinical data were available for interpretation as would be in clinical practice. In total, 15 studies (54%) reported whether or not the examiner(s) were masked to the results. In nine studies (30%) the tonometers used were calibrated. All bar one study (96%) included all participants approached in the analysis, or stated a reason why not. Only one comparison156 met all quality criteria specified.

Agreement between non-contact tonometer and Goldmann applanation tonometer

Twenty-six studies provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of NCT and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was 0.2 mmHg (95% CI −0.1 mmHg to 0.6 mmHg) with a corresponding random error of 2.1 mmHg (95% CI 1.8 mmHg to 2.3 mmHg). For both analyses, there was evidence of a large amount of heterogeneity with very large I2 values (95% for both), and this can be seen in Figures 28 and 29, in which the forest plots are presented. Based on the main analysis, the expected mean difference is 0.2 mmHg (95% LoA −3.8 mmHg to 4.3 mmHg).
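
The I2 values quoted throughout describe the share of total variation in study estimates attributable to between-study heterogeneity rather than chance. A minimal sketch using the standard definition I2 = max(0, (Q − df)/Q), where Q is Cochran's heterogeneity statistic computed with inverse-variance weights and df = k − 1, is shown below. The report's exact software and weighting are not described in this section, and the study values are hypothetical; they show how estimates that are each precise but disagree with one another yield I2 values close to 100%.

```python
def cochran_q_and_i2(estimates: list[float], variances: list[float]) -> tuple[float, float]:
    """Return Cochran's Q and I^2 (as a proportion) for per-study estimates
    and their within-study variances, using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


# Hypothetical mean differences (mmHg) from three precise studies that disagree.
q, i2 = cochran_q_and_i2([0.0, 4.0, 8.0], [0.25, 0.25, 0.25])
print(f"Q = {q:.1f}, I2 = {i2:.0%}")
# prints: Q = 128.0, I2 = 98%
```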

FIGURE 28. Meta-analysis of mean difference between NCT and GAT (main analysis – random effects).

FIGURE 29. Meta-analysis of random error between NCT and GAT (main analysis – random effects).

Recordability

Four studies147,156,182,194 provided information on the recordability of NCT tonometers. The data are shown in Appendix 5. Individual studies varied in size from 45 to 100 observations. Recordability was very high (96–100%) for all bar one study (76%).147

Acceptability and practicality

Three studies156,173,194 reported on the acceptability and practicality of NCT tonometers. The data are shown in Appendix 6. In one study,173 ‘approximately 50%’ of participants appeared to prefer NCT 2000 to Pulsair 2000 or GAT. In another study,194 11% of participants expressed anxiety about the NCT Pulsair EasyEye, necessitating a 5-minute period between measurements to allow patients to ‘calm down’. Two studies173,194 found NCT tonometers to be faster than GAT, with NCT 2000 favoured over Pulsair 2000 for ease of use and speed. One study156 found Pulsair 2000 (mean of 2 minutes) to be faster than GAT and Ao MkII (both with a mean time of 3 minutes).

Reliability

Nine studies122,149,152,156,175,193–195,217 reported a reliability measure. The data are shown in Appendix 7. None reported on interobserver reliability. A variety of reliability measures [including CoV, LoA, mean (SD), RC and the variance of the difference between the middle reading and the average of the first and last readings (Varmid)] and types of observers (ophthalmologist and optometrist) were used, with variable numbers of measurements taken (three to four). Three studies reported a substantial 95% LoA for intraobserver reliability and one a substantial RC value.152,193,195,217

Ocuton S

Three studies enrolling 173 participants (258 eyes) compared Ocuton S with GAT. Figure 30 summarises the quality assessment for these studies.

FIGURE 30. Ocuton S: summary of quality assessment.

Quality assessment

In all three studies (100%), the selection criteria were specified, the same clinical data were available for interpretation as would be in clinical practice, and all participants approached were included in the analysis or a reason was stated why not. On the other hand, none of the studies reported calibration of the tonometers used. Only the study by Sacu and colleagues126 reported consecutive selection of cases and individual measures taken within 1 hour. In the studies by Marchini and colleagues146 and Wells,170 the examiner(s) were masked to the results. None of the studies met all of the specified criteria.

Agreement between Ocuton S and Goldmann applanation tonometer

All three studies126,146,170 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of Ocuton S and GAT are given in Appendix 4. Under the main analysis the pooled mean difference was 2.7 mmHg (95% CI −1.2 mmHg to 6.6 mmHg) with a corresponding random error of 3.5 mmHg (95% CI 2.4 mmHg to 4.6 mmHg). For both analyses, there was evidence of a large amount of heterogeneity with very large I2 values (96% and 88%, respectively), and this can be seen in Figures 31 and 32, in which the forest plots are presented. Based on the main analysis the expected mean difference is 2.7 mmHg (95% LoA −4.1 mmHg to 9.6 mmHg).

FIGURE 31. Meta-analysis of mean difference between Ocuton S and GAT (main analysis – random effects).

FIGURE 32. Meta-analysis of random error between Ocuton S and GAT (main analysis – random effects).

Recordability

Two studies126,170 provided information on the recordability of Ocuton S. The data are shown in Appendix 5. Individual studies varied in size from 68 to 85 observations. Recordability was reasonably high in both studies (82% and 94%).

Acceptability and practicality

Only one study146 reported on the acceptability and practicality of Ocuton S. The data are shown in Appendix 6. Of the 80 participants, 62 (78%) reported no issue, 14 (18%) reported a foreign body sensation and four (5%) complained of burning. Examiners were taught how to use the tonometer by carrying out at least three practice examinations each.

Reliability

Two studies146,170 reported on intraobserver and interobserver reliability for Ocuton S. The data are shown in Appendix 7: one study reported the kappa statistic for two observers (for both intraobserver and interobserver reliability), and the second reported a very large RC of 9.2 mmHg.
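
Kappa measures agreement on categorical judgements beyond that expected by chance, κ = (p_o − p_e)/(1 − p_e), where p_o is the observed and p_e the chance-expected proportion of agreement. This section does not say how the Ocuton S readings were categorised, so the sketch below assumes a hypothetical binary classification of each eye as 'raised' or 'normal' IOP by two observers; the data are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = counts_a.keys() | counts_b.keys()
    p_expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical classifications of ten eyes by two observers.
observer_1 = ["raised", "raised", "normal", "normal", "raised",
              "normal", "normal", "normal", "raised", "normal"]
observer_2 = ["raised", "normal", "normal", "normal", "raised",
              "normal", "raised", "normal", "raised", "normal"]
print(f"kappa = {cohens_kappa(observer_1, observer_2):.2f}")
# prints: kappa = 0.58
```

Here the observers agree on 8 of 10 eyes, but because chance alone would produce 52% agreement, kappa is only about 0.58.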

Ocular response analyser

Quality assessment

Twelve studies enrolling 867 participants (1147 eyes) compared ORA with GAT. Figure 33 summarises the quality assessment for these studies.

FIGURE 33. Ocular response analyser: summary of quality assessment.

All 12 studies (100%) specified the selection criteria and included all participants approached in the analysis or stated a reason why not. None of the studies reported consecutive selection of cases; this was unclear in most of them (92%). Half of the studies (50%) reported that individual measures were taken within 1 hour. In nine studies (75%), the same clinical data were available for interpretation as would be in clinical practice. Five studies (42%) reported whether the examiner(s) were masked to the results. Only two studies (17%) reported calibration of the tonometers. No studies met all of the specified criteria.

Agreement between ocular response analyser and Goldmann applanation tonometer

Twelve studies110,111,129,130,135,136,142,174,183,184,212,214 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of ORA and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was 1.5 mmHg (95% CI 0.9 mmHg to 2.2 mmHg), with a corresponding random error of 2.8 mmHg (95% CI 2.5 mmHg to 3.1 mmHg). For both analyses, there was evidence of a large amount of heterogeneity with very large I2 values (93% and 89%, respectively), and this can be seen in Figures 34 and 35, in which the forest plots are presented. Based on the main analysis the expected mean difference is 1.5 mmHg (95% LoA −3.9 mmHg to 7.0 mmHg).
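
The pooled estimates and 95% CIs in this section come from random-effects models. The estimator is not named here, so the sketch below uses the DerSimonian–Laird method, a common choice, to show how the between-study variance (τ²) enters the weights and widens the confidence interval; the per-study values are hypothetical and the function name is illustrative.

```python
import math


def dersimonian_laird(estimates: list[float], variances: list[float]) -> dict[str, float]:
    """Random-effects pooling with the DerSimonian-Laird estimate of tau^2."""
    k = len(estimates)
    if k < 2:
        raise ValueError("at least two studies are needed to estimate tau^2")
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # method-of-moments between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "ci_low": pooled - 1.96 * se,
            "ci_high": pooled + 1.96 * se, "tau2": tau2}


# Hypothetical study mean differences (mmHg) and their within-study variances.
result = dersimonian_laird([0.5, 1.2, 2.8, 1.9], [0.04, 0.09, 0.06, 0.05])
print({key: round(value, 2) for key, value in result.items()})
```

When τ² is large relative to the within-study variances, the random-effects weights become nearly equal, so small and large studies contribute similarly to the pooled value.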

FIGURE 34. Meta-analysis of mean difference between ORA and GAT (main analysis – random effects).

FIGURE 35. Meta-analysis of random error between ORA and GAT (main analysis – random effects).

Recordability

Two studies142,185 reported on recordability, with 62 of 63 and 49 of 50 participants (98% in each) having a valid measurement.

Acceptability and practicality

One study184 reported on acceptability and none reported on practicality. Discomfort was reported to be higher for ORA than for RT and DCT.

Reliability

Five studies110,129,142,183,185 reported a reliability measure. The data are shown in Appendix 7. Two studies129,142 reported on interobserver reliability, showing substantial 95% LoA of ± 3 or ± 4 mmHg. A variety of reliability measures (CCC, CoV, LoA, mean difference and RC) and types of observers (ophthalmologist and optometrist) were used, with variable numbers of measurements taken (from three up to eight).193,195 One study142 reported a substantial 95% LoA for intraobserver reliability.

Perkins

Four studies132,166,191,200 enrolling 433 participants (506 eyes) compared Perkins with GAT. Figure 36 summarises the quality assessment for these studies.

FIGURE 36. Perkins tonometer: summary of quality assessment.

Quality assessment

All four studies132,166,191,200 (100%) specified the selection criteria and included all participants assessed in the analysis or stated a reason why not. Three132,191,200 of the four studies (75%) reported individual measures taken within 1 hour, and three studies132,166,200 (75%) reported calibration of the tonometers. In two studies (50%)132,166 participants were selected consecutively and the same clinical data were available for interpretation as would be in clinical practice. The examiner(s) were masked to the results in two studies (50%).166,200 None of the studies met all of the specified criteria.

Agreement between Perkins and Goldmann applanation tonometer

All four studies132,166,191,200 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of Perkins and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was −1.2 mmHg (95% CI −2.8 mmHg to 0.4 mmHg) with a corresponding random error of 2.1 mmHg (95% CI 1.3 mmHg to 2.8 mmHg). For both analyses, there was evidence of a large amount of heterogeneity, with large I2 values (99% and 97%, respectively), and this can be seen in Figures 37 and 38, in which the forest plots are presented. This heterogeneity is driven by one study166 that had both a much larger mean difference and a much larger random error than the other three studies (see Appendices 3 and 4 for more information on this study). Based on the main analysis the expected mean difference is −1.2 mmHg (95% LoA −5.2 mmHg to 2.8 mmHg).

FIGURE 37. Meta-analysis of mean difference between Perkins and GAT (main analysis – random effects).

FIGURE 38. Meta-analysis of random error between Perkins and GAT (main analysis – random effects).

Recordability

No studies reported on the recordability of the Perkins tonometer.

Acceptability and practicality

Two studies132,166 reported on the acceptability of the Perkins tonometer, one166 of which also reported on its practicality, finding Perkins faster to use than GAT (96 vs 120 seconds). The data are shown in Appendix 6. No difficulties were observed, although one study132 noted that the patients were ‘not new to the practice’.

Reliability

None of the studies reported on the reliability of the Perkins tonometer.

Rebound tonometer

Fourteen studies106,107,145,154,160,165,167,184,189,190,207,209–211 representing 1239 participants (1792 eyes) compared RT with GAT. Figure 39 summarises the quality assessment for these studies.

FIGURE 39. Rebound tonometer: summary of quality assessment.

Quality assessment

Nearly all of the studies106,107,145,160,165,167,184,189,190,207,209–211 (13, 93%) specified the selection criteria. In five studies145,165,189,190,209 (36%), cases were selected consecutively. In eight studies106,145,154,165,190,209–211 (57%), individual measures taken within 1 hour were reported and the same clinical data were available for interpretation as would be in clinical practice. Ten studies106,107,145,154,165,167,184,207,209,211 (71%) reported whether or not the examiner(s) were masked to the results. In seven studies106,107,145,165,207,209,211 (50%), the tonometers used were calibrated. Almost all studies106,107,145,154,165,167,184,189,190,207,209–211 (13, 93%) included all participants approached in the analysis or stated a reason why not. Only two studies145,165 met all of the specified criteria.

Agreement between rebound tonometer and Goldmann applanation tonometer

Fourteen studies106,107,145,154,160,165,167,184,189,190,207,209–211 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of RT and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was 0.9 mmHg (95% CI 0.4 mmHg to 1.4 mmHg), with a corresponding random error of 2.6 mmHg (95% CI 2.1 mmHg to 3.2 mmHg). For both analyses, there was evidence of a large amount of heterogeneity, with very large I2 values (94% and 98%), and this can be seen in Figures 40 and 41, in which the forest plots are presented. Based on the main analysis the expected mean difference is 0.9 mmHg (95% LoA −4.3 mmHg to 6.1 mmHg).

FIGURE 40. Meta-analysis of mean difference between RT and GAT (main analysis – random effects).

FIGURE 41. Meta-analysis of random error between RT and GAT (main analysis – random effects).

Recordability

Four studies167,189,190,210 provided information on the recordability of RT. The data are shown in Appendix 5. Individual studies varied in size from 36 to 146 observations. Recordability was very low in one study (18; 50%)190 and 100% in the others.

Acceptability and practicality

Seven studies160,167,184,207,209–211 reported on acceptability and/or practicality for RT. The data are shown in Appendix 6. In the four studies167,184,207,209 reporting acceptability, comments were favourable for RT; one study,167 which measured preference, showed a large proportion (28; 74%) in favour of RT over GAT (2; 5%), with 8 (21%) having no preference. Six studies160,167,207,209–211 reported on practicality, with three reporting its use as ‘easy’, two noting that additional measurements were required and one that there was no need for anaesthesia and that it used disposable tips.

Reliability

Four studies154,165,190,210 reported on interobserver and/or intraobserver reliability for RT. The data are shown in Appendix 7. A variety of reliability measures (correlation coefficient, CoV, LoA, ‘intrasubject variation coefficient’ and RC) and types of observers (student, ‘experienced’ ophthalmologist and optometrist) were used, with variable numbers of measurements taken (from two up to six). Only one study190 reported the interobserver correlation coefficient, which was 0.82 for a small number of eyes (n = 12). Substantial intraobserver variation was observed in the LoA for one study,154 and a substantial RC in two studies.165,210

TonoPen

Fifteen studies75,107,138,140,144,153,157,166,169,188,191,197,206,207,216 enrolling 1413 participants (1950 eyes) compared TonoPen with GAT. Figure 42 summarises the quality assessment for these studies.

FIGURE 42. TonoPen: summary of quality assessment.

Quality assessment

All but one study75,107,138,140,144,157,166,169,188,191,197,206,207,216 (93%) specified the selection criteria. In six studies144,153,157,166,206,216 (40%), cases were selected consecutively. Nine studies75,138,140,144,157,169,188,191,216 (60%) reported individual measures taken within 1 hour, and 10 studies75,138,140,144,166,169,188,206,207,216 (67%) reported that the same clinical data were available for interpretation as would be in clinical practice. Nine studies107,140,144,153,157,166,169,207,216 (60%) reported whether or not the examiner(s) were masked to the results. In eight studies75,107,140,144,153,157,166,169 (53%), the tonometers used were calibrated. Almost all studies107,138,140,144,153,157,166,188,191,197,206,207,216 (13, 87%) included all participants approached in the analysis or stated a reason why not. The Salvetat144 study met all of the specified criteria.

Meta-analysis of agreement between TonoPen and Goldmann applanation tonometer

Fourteen studies76,138,140,144,153,157,166,169,188,191,197,206,207,216 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of TonoPen and GAT are given in Appendix 4. Under the main analysis, the pooled mean difference was −0.2 mmHg (95% CI −1.0 mmHg to 0.5 mmHg), with a corresponding random error of 3.1 mmHg (95% CI 2.5 mmHg to 3.7 mmHg). For both analyses, there was evidence of a large amount of heterogeneity, with very large I2 values (97% and 98%), and this can be seen in Figures 43 and 44, in which the forest plots are presented. One study166 had a much larger mean difference than the other studies, as was also the case for its Perkins comparison. Based on the main analysis the expected mean difference is −0.2 mmHg (95% LoA −6.2 mmHg to 5.8 mmHg).

FIGURE 43. Meta-analysis of mean difference between TonoPen and GAT (main analysis – random effects).

FIGURE 44. Meta-analysis of random error between TonoPen and GAT (main analysis – random effects).

Recordability

Three studies197,207,216 provided information on the recordability of the TonoPen tonometer. The data are shown in Appendix 5. Individual studies varied in size from 103 to 208 observations. Recordability was high in all three: 90–100%.

Acceptability and practicality

Four studies144,166,197,207 reported on acceptability and practicality for TonoPen. The data are shown in Appendix 6. All four reported favourably on acceptability, with one study,197 which measured preference, showing a large proportion (64; 32%) preferring TonoPen over GAT (30; 15%), and 100 (49%) having no preference. Two studies144,207 reported that TonoPen was ‘easy to use’ and two studies166,197 reported the time taken to undertake measurement. Bandyopadhyay and colleagues197 reported a mean (SD) of 54 (18) seconds for TonoPen compared with 15 (4) seconds for GAT, whereas Jackson and colleagues166 reported means of 50 and 120 seconds for TonoPen and GAT, respectively.

Reliability

Three studies152,153,169 reported a reliability measure. The data are shown in Appendix 7. Only intraobserver data were reported. A variety of measures were reported (CoV, ICC, mean difference, LoA and RC). One study169 reported a mean difference of 0.7 (95% LoA −3.3 to 3.6) for TonoPen compared with 0.74 (95% LoA −2.3 to 3.7) for GAT. Tonnu and colleagues152 reported an RC of 4.3 mmHg and 2.2 mmHg for TonoPen and GAT, respectively. The third study153 stated that the CoV was < 5% for 97/99 eyes, whereas 2/99 eyes had a CoV between 5% and 10%.
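
The per-eye CoV criterion used in the third study can be sketched as the standard deviation of an eye's repeated readings divided by their mean, expressed as a percentage. This definition and the readings below are assumptions for illustration; the study's own calculation is not described here.

```python
from statistics import mean, stdev


def per_eye_cov(readings: list[float]) -> float:
    """Coefficient of variation (%) of repeated IOP readings on one eye."""
    return 100.0 * stdev(readings) / mean(readings)


def count_below(eyes: list[list[float]], cutoff_pct: float = 5.0) -> int:
    """Number of eyes whose CoV is below the cut-off (default 5%)."""
    return sum(per_eye_cov(r) < cutoff_pct for r in eyes)


# Hypothetical repeated readings (mmHg) for three eyes.
eyes = [[15.0, 16.0, 17.0], [20.0, 20.5, 20.0], [14.0, 14.0, 14.5]]
print([round(per_eye_cov(r), 1) for r in eyes], count_below(eyes), "of", len(eyes))
```

Because the CoV scales the spread by the mean, the same absolute spread in mmHg gives a larger CoV in an eye with lower IOP.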

Transpalpebral tonometer

Twenty studies124,125,127,133,136–138,158,162,164,168,172,176–178,191,198,205,207,208 enrolling 1509 participants (2091 eyes) compared transpalpebral tonometer with GAT.

Quality assessment

Figure 45 summarises the quality assessment for these studies. A total of 19 studies124,125,127,133,136–138,158,164,168,172,176–178,191,198,205,207,208 (95%) specified the selection criteria, and in six studies124,125,172,176,178,205 (30%) cases were selected consecutively. In total, 10 studies124,127,137,138,168,176,178,191,198,208 (50%) reported individual measures taken within 1 hour. In 12 studies (60%), the same clinical data were available for interpretation as would be available in clinical practice,124,127,133,137,138,158,162,168,172,177,205,207 and in 12 studies (60%) it was reported that the examiner(s) were masked to the results.124,125,158,162,168,172,176–178,205,207,208 In seven studies125,127,172,177,178,205,208 (35%), the tonometers used were calibrated. Almost all studies124,127,133,136–138,158,164,168,172,176–178,191,198,205,207,208 (18, 90%) included all participants approached in the analysis or stated a reason why not. No studies met all of the specified criteria.

FIGURE 45. Transpalpebral tonometer: summary of quality assessment.

Meta-analysis of agreement between transpalpebral tonometers and Goldmann applanation tonometer

Twenty studies124,125,127,133,136–138,158,162,164,168,172,176–178,191,198,205,207,208 provided sufficient data to be included in the meta-analysis. The full results of the agreement analyses of transpalpebral and GAT are given in Appendix 4. Under the main analysis the pooled mean difference was −0.5 mmHg (95% CI −1.3 mmHg to 0.3 mmHg) with a corresponding random error of 3.3 mmHg (95% CI 2.8 mmHg to 3.7 mmHg). For both analyses, there was evidence of a large amount of heterogeneity, with very large I2 values (98% and 97%), and this can be seen in Figures 46 and 47, in which the forest plots are presented. Based on the main analysis, the expected mean difference is −0.5 mmHg (95% LoA −6.9 mmHg to 5.9 mmHg).

FIGURE 46. Meta-analysis of mean difference between transpalpebral tonometer and GAT (main analysis – random effects).

FIGURE 47. Meta-analysis of random error between transpalpebral tonometer and GAT (main analysis – random effects).

Recordability

Nine studies124,127,137,138,168,174,176,177,198 reported on recordability, with all but one study137 (recordability of 76%) having very high recordability (91–97%). Seven studies127,137,168,172,176,177,198 reported the reason why participants could not see the phosphene. The data are shown in Appendix 5.

Acceptability and practicality

Eight studies125,133,136–138,172,205,207 reported on acceptability and/or practicality. Approaches varied between studies, although the data were generally favourable. Two studies133,136 reported on practicality from a clinical perspective, with one criticising the subjectivity of patient perception of the phosphene. One study207 suggested that the tonometer was ‘easy to use’. The data are shown in Appendix 6.

Reliability

Eight studies124,127,136,137,158,168,172,177 reported a reliability measure. The data are shown in Appendix 7. One study reported on interobserver reliability.158 A variety of reliability measures (correlation coefficient, CoV, LoA, mean difference and RC) and types of observers (ophthalmologist, patient and technician) were used, with variable numbers of measurements taken (from 3 up to 15). Two studies137,168 reported a wide 95% LoA for intraobserver reliability and one study158 reported a very wide 95% LoA for interobserver reliability. One study127 reported an RC of 5.1 mmHg and another168 an RC of 4.2 mmHg.

Discussion

Overview

This review was conducted to systematically evaluate the properties of a range of IOP measurement devices (tonometers) and their suitability to replace the current reference standard (i.e. GAT) for surveillance of patients with OHT in a primary-care setting. We identified 102 studies comparing eight candidate tonometer types against GAT. Study populations varied, including healthy subjects as well as patients with treated or untreated OHT and glaucoma. Poor reporting limited the assessment of the quality of the included studies, and most studies did not provide sufficient detail to assess all of the criteria. In particular, many studies did not state whether or not cases were consecutive, whether or not the tonometers were calibrated, whether examiners were masked to results or the time period within which repeated measurements were taken. Despite comprehensive guidance being available on how to undertake measurement of IOP in clinical studies,223 a substantial proportion of studies appear to have adopted suboptimal practice, or at least suboptimal reporting of these aspects.

Agreement

The results of this study suggest that, compared with GAT, NCT was the tonometer with the closest agreement for measuring IOP. Almost two-thirds of measurements with NCT were estimated to be within 2 mmHg of the GAT value and > 80% within 3 mmHg. Next best was the Perkins tonometer (59% and 79% of measurements, respectively), which was not surprising because it is also an applanation tonometer based on the Goldmann principle. Perkins has the same advantages and limitations as GAT, the only substantial difference being that Perkins is a portable instrument. For the other tonometers, about half or fewer of the measurement differences were within 2 mmHg and about two-thirds within 3 mmHg. Ocuton S appeared to have the lowest agreement with GAT, with only a third of measurements within 2 mmHg and half within 3 mmHg.

Recordability

Recordability was reported for all tonometer categories except Perkins. Disappointingly, many studies did not explicitly state the number of participants in whom a measurement was attempted, as opposed to the number in whom a measurement was successfully taken. In general, reported recordability was moderate to very high, with most studies reporting values of ≥ 90%. In one RT study of only 36 participants, recordability was low, at 50%. For NCT, Ocuton S and the transpalpebral tonometer, a single study each reported values in the range of 70–90%, which could be problematic if representative of a monitoring scenario.

Acceptability and practicality

Acceptability data were available for all eight tonometers. Because of the variable nature of reporting it is difficult to provide a precise summary. Overall, the data suggest that the candidate tonometers were as acceptable to patients as GAT, if not more so. Marchini and colleagues146 reported short-term side effects (foreign body sensation or burning) associated with Ocuton S in a sizeable number of participants. Ogbuehi and AlMubrad194 reported some anxiety about the procedure, with time breaks of 5 minutes used between measurements. Patient preferences were stated for DCT, TonoPen, RT and NCT over GAT in four studies that assessed this.167,173,197,215

Reliability

Some reliability data were reported for all tonometers except Perkins; however, data on both inter- and intraobserver reliability were available for only five of the eight tonometers. The number of measurements used to determine reliability varied across studies, as did the statistical measure used, and greater consistency of reporting would help. In our opinion, either LoA or RC should always be reported. Nevertheless, there was a clear suggestion of sizeable inter- and intraobserver variability for all seven tonometers for which data were available. It is worth noting that the measurement variability of GAT, although often lower than that of the candidate tonometers, was also usually sizeable. This would, to some extent, explain the scale of heterogeneity observed in the agreement meta-analyses, although the use of repeated measurements for both GAT and the candidate tonometer should have lessened its impact.

Clinical issues

Tonometers that could potentially be used to measure IOP in a monitoring setting were eligible for this review. They are based on different principles, and were categorised as contact and non-contact. Another possible classification would distinguish automated tonometers from those that require investigator judgement. A subgroup of technologies has been designed to enable users to measure their own IOP.

Although GAT is the device currently used in secondary care, and is recommended by NICE to diagnose and monitor OHT, it is relatively complex to use, and its readings are influenced by the investigator's experience.1 GAT has other potential limitations. Applanation tonometry is based on the Imbert–Fick law, which states that the force needed to applanate (flatten) the anterior corneal surface is equal to the true IOP multiplied by the applanated area at the posterior corneal surface.224 However, although the applanated area of the anterior corneal surface is constant, the CCT and the rigidity of the cornea and its resistance to a deforming force vary among individuals. Previous corneal refractive surgery for myopia can lead to measurement errors with GAT by thinning and flattening the cornea and changing its rigidity. The tear film can also affect IOP readings with GAT: its surface tension attracts the prism to the cornea and may counteract the force needed to flatten the cornea. The precise effect of the tear film on IOP readings is not fully understood.25 GAT also implicitly assumes a CCT value in its calculation, whereas some of the candidate tonometers (e.g. ORA and DCT) purport to compensate for the patient's CCT and other corneal properties. Finally, GAT has been shown to underestimate IOP compared with manometry.114 Together, these issues seem to have led, at least in some studies, to GAT measurements having substantial imprecision, which in turn will have contributed to the substantial random error observed between candidate tonometer and GAT measurements. Reliability data for GAT support this interpretation, as noted above.
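To make the Imbert–Fick relationship concrete: in its basic form it is W = Pt × A (applanating force equals pressure times applanated area). Goldmann's modification is usually written W + M = Pt × A1 + N, where M is the tear-film surface tension, N the corneal rigidity and A1 the inner (posterior) applanated area. At the standard applanation diameter of 3.06 mm, M and N are assumed to cancel, so 1 gram-force corresponds to about 10 mmHg. The sketch below only checks that arithmetic; it ignores M and N by assumption, and the function name is illustrative.

```python
import math

PA_PER_MMHG = 133.322      # 1 mmHg is approximately 133.322 Pa
NEWTONS_PER_GF = 9.80665e-3  # 1 gram-force in newtons


def applanation_pressure_mmhg(force_gf, diameter_mm):
    """Imbert-Fick law rearranged as P = W / A.

    Assumes tear-film attraction (M) and corneal rigidity (N) cancel,
    which is the simplifying assumption GAT relies on.
    """
    radius_m = (diameter_mm / 2) / 1000
    area_m2 = math.pi * radius_m ** 2
    pressure_pa = (force_gf * NEWTONS_PER_GF) / area_m2
    return pressure_pa / PA_PER_MMHG


# At 3.06 mm, each gram-force of applanating force is roughly 10 mmHg.
p_one_gram = applanation_pressure_mmhg(1.0, 3.06)
```

Because the conversion is a fixed ratio, anything that alters the true force required (a thin or stiff cornea, post-refractive surgery change, or an unusual tear film) shifts the reading without GAT being able to detect it.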

Two independent experts suggested that a difference of ≥ 2 mmHg would be clinically relevant and might influence the clinical management of patients with OHT. To help in the interpretation of the variability results we explored the proportion of measurements with a difference of ≤ 2 mmHg or 3 mmHg. Others may view a larger difference as acceptable.90
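If individual differences are not available, the expected proportion of differences falling within ±2 mmHg or ±3 mmHg can be approximated from a reported mean difference and SD of differences, assuming the differences are normally distributed (the same assumption that underlies the LoA). This is a sketch under that assumption; the section does not state whether the review computed these proportions from raw data or from a normal approximation, and the inputs are hypothetical.

```python
from statistics import NormalDist


def proportion_within(mean_diff, sd_diff, threshold):
    """Expected share of differences with |d| <= threshold, assuming
    d ~ Normal(mean_diff, sd_diff). All values in mmHg."""
    dist = NormalDist(mean_diff, sd_diff)
    return dist.cdf(threshold) - dist.cdf(-threshold)


# Hypothetical comparison: no bias, SD of differences 2 mmHg.
within_2 = proportion_within(0.0, 2.0, 2.0)  # about 68%
within_3 = proportion_within(0.0, 2.0, 3.0)  # about 87%
```

A non-zero mean difference shifts the distribution away from zero, so the proportion within a fixed threshold falls even when the SD is unchanged.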

Methodological issues

We chose to include only studies that compared a candidate tonometer against GAT, which we used as the reference standard. In principle, this should have provided some consistency across comparisons, although the results perhaps suggest that this standard, while widely accepted, is somewhat variable in implementation. Any contrast between candidate tonometers is therefore an indirect comparison and suffers from the limitations of such approaches: the observed difference may reflect, at least to some degree, differences between the studies (e.g. population and observers) that contribute to each comparison. As a result, comparisons between candidate tonometers should be made cautiously.

Although LoA studies are very common, methods for the meta-analysis of such studies have been proposed only fairly recently.118 The approach outlined here was to generate a summary or ‘pooled’ LoA. The mean difference and random error (SD) of individual studies were pooled and 95% LoA generated from pooled estimates. This provides a summary of the evidence across multiple studies that assess the same comparison. Our approach, following that of the original LoA approach for primary studies225 and the meta-analysis approach of Williamson and colleagues,118 assumes an underlying normal distribution for the differences to generate the LoA. This is often at least approximately true for the differences and may still be a reasonable approach even when the assumption is violated in individual studies.120
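The pooling step described above can be sketched in a simplified, fixed-effect form: the study mean differences are combined with inverse-variance weights (variance of a mean difference taken as SD²/n), the SDs of the differences are combined as a degrees-of-freedom-weighted pooled variance, and the 95% LoA are formed as pooled mean ± 1.96 × pooled SD. This is an illustrative assumption, not the exact model used in the review or by Williamson and colleagues; in particular it ignores between-study heterogeneity, which the review found to be large. The `Study` type and inputs are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Study:
    n: int            # participants (one eye each, by assumption)
    mean_diff: float  # mean of (candidate - GAT), mmHg
    sd_diff: float    # SD of within-person differences, mmHg


def pooled_loa(studies, z=1.96):
    """Fixed-effect sketch of pooled 95% limits of agreement."""
    # Inverse-variance weights for each study's mean difference.
    weights = [s.n / s.sd_diff ** 2 for s in studies]
    mean = sum(w * s.mean_diff for w, s in zip(weights, studies)) / sum(weights)
    # Degrees-of-freedom-weighted pooled variance of the differences.
    dof = sum(s.n - 1 for s in studies)
    pooled_var = sum((s.n - 1) * s.sd_diff ** 2 for s in studies) / dof
    sd = math.sqrt(pooled_var)
    return mean - z * sd, mean + z * sd


# Two hypothetical studies of the same candidate tonometer vs GAT.
lower, upper = pooled_loa([Study(40, 0.5, 2.0), Study(60, 1.5, 3.0)])
```

A random-effects version would add an estimate of between-study variance to both the mean and the spread, which widens the limits when studies disagree.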

An important finding of the review was the large-scale heterogeneity between the results of individual studies that assessed the same comparison. This was unlike the original example in which this methodology was used.117 Sensitivity and subgroup analyses were undertaken to identify sources of heterogeneity, although they shed little light. The scale of the heterogeneity can perhaps be seen most clearly in the prediction intervals, which illustrate a plausible range of values for the mean difference and random error of individual studies for each comparison. For only one of the eight tonometers (NCT) did the prediction intervals exclude a difference of ≥ 2 mmHg. Between-study heterogeneity, which reflected differences in study population and methodology, was clearly substantial. Nevertheless, in the absence of clarity about the factors that contribute to the IOP measurement, the prediction intervals show the uncertainty and the scale of difference that is plausible if a new study were to be carried out.
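A prediction interval of the kind described above can be sketched with a DerSimonian–Laird random-effects model: estimate the between-study variance τ², pool the study estimates with weights 1/(vᵢ + τ²), and form the interval as pooled mean ± q × √(τ² + SE²). Higgins and colleagues recommend a t quantile with k − 2 degrees of freedom for q; this stdlib-only sketch defaults to the normal value 1.96 for simplicity, which understates the width when there are few studies. Whether the review used exactly this estimator is not stated here, and the inputs are hypothetical.

```python
import math


def dl_random_effects(estimates, variances):
    """DerSimonian-Laird pooling. Returns (pooled mean, its SE, tau^2)."""
    k = len(estimates)
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    # Cochran's Q and the DL moment estimator of between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return mean, se, tau2


def prediction_interval(mean, se, tau2, quantile=1.96):
    """Range for the mean difference expected in a new study.
    Ideally quantile = t(k-2) critical value; 1.96 is a normal shortcut."""
    half = quantile * math.sqrt(tau2 + se ** 2)
    return mean - half, mean + half


# Hypothetical mean differences (mmHg) and their variances.
mean, se, tau2 = dl_random_effects([0.0, 2.0], [1.0, 1.0])
pi_low, pi_high = prediction_interval(mean, se, tau2)
```

Because τ² enters the prediction interval directly, large between-study heterogeneity widens it far beyond the confidence interval for the pooled mean, which is why prediction intervals crossing 2 mmHg were common.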

There were a number of limitations in the reporting of individual studies that restricted how accurately we could represent the evidence. A number of studies included more than one eye per participant, which resulted in clustering of intraindividual data.90 Where possible, we used only one eye per participant, but we chose not to exclude a study if this was not possible. The scale of the clustering effect is uncertain, but our sensitivity analyses, which included only studies with one eye per participant, did not suggest a substantial impact on the estimate of the random error. Four studies did not provide sufficient information to estimate the mean difference and the variability of the within-person differences for all the tonometers, even when imputation of the correlation coefficient was allowed. Several reports did not give any measure of variability for within-person differences, and it was necessary to impute a correlation. The sensitivity analyses showed that the results were not unduly influenced by this assumption.
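When a study reports only the SD of readings for each device, the SD of the paired differences can be recovered from the identity Var(A − B) = SD_A² + SD_B² − 2r·SD_A·SD_B, provided a value is assumed for the correlation r. The sketch below applies that identity and loops over a few assumed correlations, mimicking a sensitivity analysis. The correlation values and SDs are hypothetical; the values actually imputed in the review are not given in this section.

```python
import math


def sd_of_differences(sd_a, sd_b, r):
    """SD of paired differences from the two marginal SDs and an assumed
    (imputed) correlation r: sqrt(sd_a^2 + sd_b^2 - 2 r sd_a sd_b)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    var = sd_a ** 2 + sd_b ** 2 - 2 * r * sd_a * sd_b
    return math.sqrt(max(var, 0.0))  # guard against tiny negative rounding


# Hypothetical sensitivity analysis: SD of 4 mmHg for both devices.
sensitivity = {r: sd_of_differences(4.0, 4.0, r) for r in (0.7, 0.8, 0.9)}
```

The imputed SD is quite sensitive to r when the correlation is high, which is why checking several plausible values matters before relying on the pooled random error.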

Another limitation was that studies used varying numbers of observations both for the candidate tonometer and for the reference standard. In some cases it was not clear how many measurements had been taken because of ambiguous reporting. We sought to address the impact of variation in the number of measurements in two ways. First, we used the SMD as the effect size given that, in theory, such variability could be addressed this way. Second, where an estimate of the within-person SD was known, we used this value to adjust the variance to match the situation with a single observation of both the candidate tonometer and GAT. If this value was not reported for a particular study but was reported for another study within the same comparison, then the value from the other study was used. Although this adjustment did change the estimate of the random error, the magnitude of the change was not large, and it did not suggest that variation in the number of measurements was the primary cause of the observed inconsistency in estimates.
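The adjustment to a single observation can be sketched with a simple variance-components argument. If each device's reading is the mean of m replicates, averaging reduces that device's within-person variance by a factor of m. The variance for single readings is therefore the reported variance plus Sw² × (1 − 1/m) for each device. This assumes constant within-person SDs and independent replicate errors. It is an illustration of the general idea; the exact formula used in the review is not given here, and the function name and inputs are hypothetical.

```python
import math


def single_observation_sd(sd_diff_means, sw_candidate, m_candidate,
                          sw_gat, m_gat):
    """Convert the SD of differences between *means* of replicate readings
    into the SD expected when each device is read once.

    Var(single) = Var(means) + sw_c^2 (1 - 1/m_c) + sw_g^2 (1 - 1/m_g)
    """
    var = (sd_diff_means ** 2
           + sw_candidate ** 2 * (1 - 1 / m_candidate)
           + sw_gat ** 2 * (1 - 1 / m_gat))
    return math.sqrt(var)


# Hypothetical study: three readings per device, Sw of 1 mmHg for each.
adjusted = single_observation_sd(2.0, 1.0, 3, 1.0, 3)  # about 2.31 mmHg
```

With only a few replicates and modest within-person SDs, the correction is small, which is consistent with the limited change in random error noted above.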

Clinical factor analyses

Prespecified analyses looked at the impact of IOP, CCT, refractive surgery and examiner type on the LoA. Because of the magnitude of observed heterogeneity, additional analyses investigating the impact of manufacturer and excluding low-quality studies were undertaken. All of the subgroup analyses reflect the limitations in the reported data, although less markedly so for manufacturer, which was generally well reported. Overall, the analyses were inconclusive. The scale of heterogeneity made any subgroup analysis susceptible to spurious differences when only a small number of studies were in a group. A few studies suggested that IOP was an influencing factor in the observed differences, although this was not a universal finding and appeared to vary by tonometer. Refractive surgery appears to affect the difference for the transpalpebral tonometer. There was no clear pattern regarding examiner and manufacturer. It was not possible to undertake a subgroup analysis of CCT, even on the basis of a crude dichotomy, for any of the studies. Excluding studies judged (on the basis of reporting) to be of lower quality similarly did not provide clarity, although this perhaps reflects the substantial amount of non-reporting of key information in the studies. Ideally, IPD would have been available to form subgroups, for example according to IOP and CCT, which would allow a more informative investigation of the influence of these factors at a review level.

Further research

There is a need for a reporting standard tailored to method comparison studies of tonometers, building on recent work in this area.226 Reporting is inconsistent, and basic data, let alone desirable data, are often not presented. The quality assessment highlighted a lack of reporting of key study characteristics, and showed that the clustering of eyes within participants and the number of observations used are regularly ignored. Furthermore, an in-depth study of the factors that could influence the pressure measurements is needed for both the reference standard and the candidate tonometers. This could either be a large primary study or potentially take the form of an IPD meta-analysis.118 Given the level of heterogeneity, a systematic review of LoA studies may require very focused study inclusion criteria, akin to those recently proposed for diagnostic test accuracy.227 Further studies evaluating the agreement of the Perkins and Ocuton S tonometers would also be beneficial, given that only a small number of studies have been carried out. Finally, further evaluation of the role of GAT as the default tonometer in clinical practice is warranted.

Conclusions

A variety of tonometers are used to evaluate IOP, and GAT is the current reference standard. The NCT or Perkins tonometer appears preferable if the aim is to achieve as close to a GAT measurement as possible. However, the findings cast doubt on the validity of GAT as the default standard. Consistent use of the same tonometer during clinical follow-up is arguably almost as important as the choice of tonometer.

© 2012, Crown Copyright.

Included under terms of UK Non-commercial Government License.

