This project reviewed meta-analyses published over two time periods: the first covered years up to 2003 and the second covered 2004–2009. The search strategies and the inclusion and exclusion criteria were the same for both periods. However, for studies published during the second period, additional items were extracted from eligible reviews. These differences and the rationale for them are discussed in the pertinent sections below.

Search Strategy and Eligibility Criteria

We searched the MEDLINE database (1966 through December 2009) using a combination of key words related to test accuracy and meta-analysis. The complete search strategy is presented in Appendix A.

Papers were considered eligible when they reported the findings of systematic reviews (defined as reviews using explicit methods to identify, select, and extract information from primary research studies) that used quantitative synthesis (meta-analysis) methods to obtain summary estimates of the diagnostic or predictive accuracy of medical tests. Our definition of tests encompassed clinical signs and symptoms. We included only English-language reviews published in full text; retrieving the full text of non-English articles and extracting information from them entails substantial effort and would be unlikely to affect our conclusions. We did not consider systematic reviews that did not use quantitative analysis methods, because one of our key aims was to assess the temporal evolution and current status of meta-analytic methods for synthesizing test accuracy data. We excluded reviews reporting meta-analyses based on individual patient data because they are subject to different design, analysis, and reporting considerations. We also excluded Health Technology Assessment documents, evidence reports produced by the Effective Health Care Program of the Agency for Healthcare Research and Quality (AHRQ), and Cochrane Reviews of diagnostic tests; these documents are substantially longer than the typical meta-analyses published in journals and are subject to reporting conventions determined by the respective entities.

Data Extraction

Nine reviewers extracted data from nonoverlapping sets of publications using electronic data collection forms that included abbreviated operational definitions for each item. The forms were piloted on articles extracted independently by multiple reviewers and were modified based on the pilot results. The final data extraction form is presented in Appendix B.

Information Extracted From Meta-Analyses Published 1966-2003

For each paper we extracted the following items: bibliographic information (first author, journal, year of publication); numbers of index tests, reference standard tests, and studies included in quantitative analyses; medical subspecialty to which tests were pertinent (cardiovascular disease, obstetrics and gynecology, gastrointestinal disease, infectious disease, oncology, nephrology/urology, rheumatology, pulmonary medicine, orthopedics, psychiatry, ear-nose-throat, neurology and pediatrics); the types of test being assessed (histology/cytology/culture-based tests, clinical examination, imaging, biomarker, clinical challenge tests [e.g., pharmacological stress tests], physiologic tests [e.g., electrocardiogram, electroencephalogram] or endoscopy); details about search strategies used; quality assessment and information extracted from each primary study considered by the meta-analysis (including whether the reviews assessed blinding, spectrum bias, and verification bias in the primary studies they reviewed); use of the STAndards for the Reporting of Diagnostic accuracy studies (STARD)23; statistical analysis (including assessment and exploration of heterogeneity, metrics used to assess test accuracy, and statistical methods used for synthesizing study findings and graphically presenting these results); and assessment of comparative evidence on alternative index tests.

Information Extracted From Meta-Analyses Published 2004–2009

All data items extracted from studies published between 1966 and 2003 were also extracted from meta-analyses published between 2004 and 2009. In addition, from meta-analyses published during this period we extracted information on blinding (specifically, whether index test or reference standard assessors were blinded); use of the QUADAS checklist (first published in November 2003)19 to guide quality assessment of the primary studies; and whether the reviews collected information on the following variables from each eligible study: spectrum bias, selection criteria, number of withdrawals, number of indeterminate test results, independence and timing of test results relative to the reference standard, and participants' sex.

During the course of data extraction, the review team met regularly to discuss specific papers, review data items, and clarify operational definitions. The majority of investigators participating in the project attended each meeting; resolutions of specific issues were reached by consensus and were circulated to all team members in writing.

Data Cleaning and Quality Control

When all reviewers had completed their extractions, we merged the individual data extraction forms to generate a combined database. We queried the database to identify missing values, invalid entries (e.g., a numerical value out of the expected range), and logical inconsistencies (e.g., when a study was recorded as not using an advanced statistical method, we checked that no such method was marked in the relevant fields). For every missing value identified, we required the data extractor familiar with the paper to re-extract the information, when necessary with the help of a second reviewer. Additionally, for continuous variables we identified entries whose values differed by more than 3 standard errors from the mean and verified them against the source documents.
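The outlier screen for continuous variables can be sketched as follows. This is an illustrative Python fragment, not the project's actual cleaning code (which is not published); it flags entries lying more than 3 standard errors from the variable's mean, as described above:

```python
import statistics
from math import sqrt

def flag_outliers(values, n_se=3.0):
    """Return the indices of entries lying more than n_se standard
    errors from the mean of the variable (the screen described above)."""
    mean = statistics.mean(values)
    # standard error of the mean = sample SD / sqrt(n)
    se = statistics.stdev(values) / sqrt(len(values))
    return [i for i, v in enumerate(values) if abs(v - mean) > n_se * se]
```

Flagged indices would then be checked against the source documents rather than corrected automatically.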

All eligible meta-analyses published up to 2003 (n = 260) were extracted in duplicate. Because of the rapid increase in the number of eligible publications in more recent years, only a sample of 83 articles (17 percent of eligible studies published between 2004 and 2009) was extracted in duplicate. In all cases, discrepancies were resolved by consensus among extractors, with additional investigators involved as needed (e.g., a statistician was involved for issues relevant to the meta-analysis methods used). For items that were found to be systematically different between extractors, we held discussions to ensure proper understanding of the relevant operational definitions, and information on the problematic items was then re-extracted from all papers of the discrepant reviewers.

To ensure inter-reviewer consistency in the data extractions of reviews published from 2004 onward, we implemented additional quality control measures. After data cleaning, we assessed inter-reviewer consistency by performing statistical comparisons between reviewers for all extracted variables, using chi-squared tests for categorical variables or analysis of variance (ANOVA) for continuous variables. To avoid the confounding effect of temporal trends, all comparisons were stratified by year of publication. Variables for which these comparisons reached statistical significance were considered suggestive of systematic between-reviewer differences. Each potentially problematic variable was discussed by the review team, and the information was re-extracted after standardization of the pertinent operational definitions.
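The between-reviewer comparisons were run in standard statistical software; as a rough illustration of the categorical case, the Pearson chi-squared statistic for a reviewer-by-category contingency table can be computed as below. This is a hypothetical sketch: the stratification by publication year and the p-value lookup are omitted.

```python
from collections import Counter
from itertools import product

def chi_square_stat(records):
    """Pearson chi-squared statistic for a reviewer-by-category table.
    records: list of (reviewer, category) pairs for one extracted variable."""
    obs = Counter(records)                     # observed cell counts
    row = Counter(r for r, _ in records)       # reviewer marginals
    col = Counter(c for _, c in records)       # category marginals
    n = len(records)
    stat = 0.0
    for r, c in product(row, col):
        expected = row[r] * col[c] / n         # count expected under independence
        stat += (obs[(r, c)] - expected) ** 2 / expected
    return stat
```

A large statistic (relative to the chi-squared distribution with the appropriate degrees of freedom) would suggest that a reviewer coded the variable systematically differently from the others.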

Journal Impact Factor and Review Citation Count

We performed exploratory analyses to assess whether journal impact factor was associated with the reporting characteristics of meta-analyses and whether reporting characteristics correlated with the number of citations accrued by systematic reviews. For each eligible review we collected the citation count (as of August 12, 2011) and the 2-year impact factor of the corresponding journal (using 2010 impact factor information from the Institute of Scientific Information [ISI] Journal Citation Reports databasea). The ISI databases do not include all MEDLINE-indexed journals; thus, the total number of publications available for citation and impact factor analyses was smaller than the total number of studies included in this review (of a total of 760 systematic reviews, 732 were included in the analyses of impact factor effects and 733 were included in the analyses of citation counts; 1 study was published in a journal that is included in the databases but has not yet been assigned an impact factor). In some analyses (see below) journals were grouped into high-impact factor general medical journals versus all others. High-impact factor general medical journals were defined as the top 5 journals by 2010 impact factor in the ISI database belonging to the “Medicine, General and Internal” category (New England Journal of Medicine [NEJM], Lancet, Journal of the American Medical Association [JAMA], Annals of Internal Medicine [AIM], and Public Library of Science – Medicine [PLoS-MED]).

Data Analysis

We calculated descriptive statistics such as means, medians, and ranges for continuous variables and proportions for categorical variables, along with appropriate measures of variability around these values (standard deviations, confidence intervals, or interquartile ranges). We used histograms to visualize the distributions of variables of interest and line plots to depict trends in their reporting over time.

We compared key methodological and reporting aspects of meta-analyses pertaining to the five most common clinical areas (cardiovascular disease, oncology, gastrointestinal disease, infectious disease, and obstetrics and gynecology) and the five most common test categories (histological/cytological/culture-based tests, aspects of the clinical examination, imaging tests, biomarkers, and physiologic tests) in our dataset. These comparisons were performed using the Fisher exact test for categorical variables or the Kruskal-Wallis test for continuous and count variables.

To detect trends over time in the literature review, quality assessment, statistical analysis, and reporting characteristics of meta-analyses, we used logistic regression with each item of interest as the response variable and year of publication as an explanatory variable. Changes in the numbers of studies, index tests, and reference standard tests considered in each review were assessed using linear regression of the natural logarithm of these variables on publication year.

For analyses of citation counts we used negative binomial regression with the citation count as the dependent variable, an offset equal to the number of years since publication of each systematic review, and different reporting characteristics as explanatory variables. For analyses of the association of journal impact factor with the reporting characteristics of systematic reviews, we used the reporting characteristics as binary dependent variables and journal impact factor as an explanatory variable. We also performed analyses comparing high-impact factor general medical journals versus all others. Because all analyses of citation counts and journal impact factor were exploratory and performed across the many variables included in our database, we report only associations that reached statistical significance at the 0.001 level.
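To make the role of the offset concrete: in standard count-regression software, an exposure of t_i years since publication enters the linear predictor on the log scale, so the model effectively describes the citation rate per year (the notation below is ours, for illustration):

log E[citations_i] = log t_i + b0 + b1 x_i,   equivalently   E[citations_i] / t_i = exp(b0 + b1 x_i),

where x_i denotes a reporting characteristic of review i and b0, b1 are the regression coefficients. The negative binomial formulation additionally allows for overdispersion relative to a Poisson model.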

All analyses were conducted using Stata SE, version 11.2 (StataCorp, College Station, TX). Statistical significance was defined as a two-sided P value < 0.05 for all comparisons (with the exception of the impact factor and citation analyses).

Footnotes

a

2010 is the most recent year for which impact factor information is currently available in the database.