How well are journal and clinical article characteristics associated with the journal impact factor? A retrospective cohort study
Abstract
Objective:
Journal impact factor (JIF) is often used as a measure of journal quality. A retrospective cohort study determined the ability of clinical article and journal characteristics, including appraisal measures collected at the time of publication, to predict subsequent JIFs.
Methods:
Clinical research articles that passed methods quality criteria were included. Each article was rated for relevance and newsworthiness by 3 to 24 physicians from a panel of more than 4,000 practicing clinicians. The 1,267 articles (from 103 journals) were randomly divided 60:40 into a derivation set (760 articles) and a validation set (507 articles), representing 99 and 88 journals, respectively. A multiple regression model was developed to determine the association of 10 journal and article measures with the 2007 JIF.
Results:
Four of the 10 measures were significant in the regression model: number of authors, number of databases indexing the journal, proportion of articles passing methods criteria, and mean clinical newsworthiness score. Together with the number of disciplines rating the article (of borderline significance), these 5 variables accounted for 61% of the variation in JIF (R² = 0.607, 95% CI 0.444 to 0.706, P<0.001).
Conclusion:
For the clinical literature, measures of scientific quality and clinical newsworthiness available at the time of publication can explain about 60% of the variation in subsequent JIFs.
BACKGROUND
Journal impact factors (JIFs) are calculated as the number of citations received in a given year (e.g., 2007), in all journals indexed by the Thomson Reuters Institute for Scientific Information (ISI), by articles a journal published in the preceding two years (e.g., 2005–2006), divided by the total number of substantive articles and reviews the journal published in those two years [1, 2]. This overall measure of the mean "citedness" of a journal's articles is frequently regarded as a measure of journal quality, and, rightly or wrongly, it is often used to evaluate research performance, promote candidates, and make funding decisions [1, 3–6]. The annual fall publication of JIFs by ISI is a much-anticipated event, as JIFs and article citation rates are important to authors, editors, and publishers. New journals must generally wait at least three years, and often much longer, before they obtain their first official JIF, and being added to the ISI list of publications requires meeting ISI's stringent criteria [7]. Some journals have predicted and published their own JIFs, both for internal use and to solicit articles [8–10]. Though JIFs are often criticized [3, 11–13], they remain the most widely used metric of the overall quality of a biomedical journal [1, 3–5, 14, 15].
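The two-year calculation described above can be sketched as follows. This is a minimal illustration, not ISI's implementation: the function name, the per-year dictionaries, and all counts are hypothetical, and deciding which items count as "substantive articles and reviews" is left to the caller.

```python
def impact_factor(census_year, citations, citable_items):
    """Two-year JIF for census_year.

    citations[y]: citations received in census_year by items published in year y.
    citable_items[y]: substantive articles and reviews published in year y.
    Only the two years before census_year enter the numerator and denominator.
    """
    window = (census_year - 2, census_year - 1)
    cites = sum(citations.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items in the two-year window")
    return cites / items


# Hypothetical journal: the 300 citations to its 2004 items fall outside
# the 2005-2006 window and are ignored for the 2007 JIF.
jif_2007 = impact_factor(
    2007,
    citations={2004: 300, 2005: 800, 2006: 700},
    citable_items={2005: 210, 2006: 190},
)
print(f"{jif_2007:.2f}")  # 3.75
```

Because the denominator counts only citable items while the numerator counts citations to anything the journal published, the ratio can shift with editorial decisions about article types; the sketch treats both counts as given.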
A recent study showed that the 2-year citation count of a high-quality clinical article can be predicted with about 60% accuracy using data readily available in the first month after publication [16]. Variables relating to article quality (e.g., number of authors, abstraction in synoptic journals, multicentered study) were significant factors in the prediction model. Such early predictions of citation frequency can be used to direct articles to clinicians and researchers; to select important material for systematic reviews, clinical practice guidelines, and health technology assessment projects; to choose educational material; and to provide promotion and tenure data for researchers [16]. The current study sought to determine whether attributes of articles and journals are associated with JIFs. If the JIF is a measure of quality, then article attributes other than citation counts that relate to quality and relevance should be positively associated with JIFs and help explain their variation. Such attributes would provide measures, available at the time of publication, for predicting the subsequent JIF. An association between JIFs and quality features of articles assessed at publication would also provide independent support for the use of JIFs as a measure of quality.
METHODS
Study
A retrospective cohort study was conducted of all articles added to the McMaster University Premium LiteratUre Service (PLUS) during the first 5 months of 2005, relating their characteristics to the journals' 2007 JIFs. McMaster PLUS supports the production of a number of evidence-based journals and information services and has been described in detail elsewhere [17]. The service involves trained research staff who read more than 150 clinical journals <http://hiru.mcmaster.ca/hiru/journalslist.asp> and identify studies that meet explicit criteria for scientific merit for research about diagnosis, prevention and treatment, prognosis, etiology, quality improvement, and cost effectiveness <http://hiru.mcmaster.ca/hiru/InclusionCriteria.html>.
Articles that pass these category-specific criteria are then rated by physicians selected from a panel of more than 4,000 for relevance to clinical practice (the extent to which an article is pertinent to the rater's practice; 7-point scale, with 7 highest) and newsworthiness (the extent to which the article's content represents something that clinicians in the rater's discipline are unlikely to know; similar 7-point scale). This rating is done using an online rating system called McMaster Online Rating of Evidence (MORE). Appendix A, online only, has the rating questions. At least 3 raters per pertinent clinical discipline assess each article. For example, an article on cardiac complications of diabetes mellitus would be rated by at least 3 general practitioners, 3 general internists, and 3 cardiologists. The MORE database only retains articles with average ratings of at least 4 on each scale. The ratings are publicly available through EvidenceUpdates <http://plus.mcmaster.ca/EvidenceUpdates>. Although this service is restricted to relatively high-quality studies in higher-quality clinical journals, it includes unique markers of quality and relevance, available at the time of publication, with sufficient variation to predict citation counts of individual articles [16].
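The MORE retention rule described above can be expressed as a small check. This is a sketch under stated assumptions, not the MORE system's code: the function name and data layout are hypothetical, and the text does not say whether the average of at least 4 is pooled across all raters or computed per discipline, so pooling across all raters is assumed here.

```python
from statistics import mean


def retained_in_more(ratings, min_raters_per_discipline=3, threshold=4.0):
    """Return True if an article would be kept under a MORE-style rule.

    ratings: dict mapping each pertinent clinical discipline to a list of
    (relevance, newsworthiness) scores, each on a 1-7 scale.
    ASSUMPTION: the averages are pooled over every rater of the article.
    """
    if not ratings:
        raise ValueError("article has no ratings")
    if any(len(scores) < min_raters_per_discipline for scores in ratings.values()):
        raise ValueError("each pertinent discipline needs at least 3 raters")
    pooled = [pair for scores in ratings.values() for pair in scores]
    mean_relevance = mean(relevance for relevance, _ in pooled)
    mean_newsworthiness = mean(news for _, news in pooled)
    return mean_relevance >= threshold and mean_newsworthiness >= threshold


# Hypothetical ratings for an article on cardiac complications of diabetes,
# rated by 3 clinicians in each of the three pertinent disciplines.
diabetes_article = {
    "general practice": [(6, 5), (5, 4), (6, 4)],
    "general internal medicine": [(5, 4), (6, 5), (5, 3)],
    "cardiology": [(6, 5), (7, 6), (6, 5)],
}
print(retained_in_more(diabetes_article))  # True
```

A per-discipline variant would apply the threshold inside the loop over disciplines; which variant MORE uses is not stated in this section.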
The data set for this study included 1,267 articles from 103 journals: the same data set used in a previous study [16], excluding 7 articles from journals that do not have a published JIF. In 2009, after the JIFs covering articles published in 2005 were released, each article was allocated by random number generation (60:40 split) to a derivation set (760 articles in 99 journals) or a validation set (507 articles in 88 journals). Article- and journal-level variables that potentially influence JIF were gathered (Table 1). These included the proportion of a journal's articles passing methods criteria in 2005; the proportion of articles abstracted in 2006 in 1 or more of the evidence-based summary journals supported by McMaster PLUS (ACP Journal Club, Evidence-Based Medicine, Evidence-Based Nursing); the mean number of disciplines that rated the journal's articles (a measure of audience breadth); and the mean newsworthiness and relevance scores given by raters. Other variables known to be correlated with citation counts and article success were also considered (Table 1). Because the sample comprised 103 journals, the number of variables was limited to 10 (roughly 10 journals per variable), based on standard statistical practice [18]. The study team selected variables by consensus, based on their influence in predicting citation counts in the previous study [16], on whether they could be influenced by authors or editors, and/or on whether they indicated scientific quality for studies relevant to clinical practice. The journals' 2007 JIFs were collected from ISI Web of Knowledge.
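The article-level 60:40 allocation can be sketched as below. The text says only that random number generation was used; shuffling and cutting at round(0.6 × n) is an assumption chosen because it reproduces the reported 760/507 split, and the seed and journal labels are hypothetical. Because allocation happens per article, a journal with few included articles can end up in only one set, which is consistent with 99 and 88 journals appearing in the two sets out of 103 in total.

```python
import random


def split_articles(articles, derivation_fraction=0.6, seed=2009):
    """Randomly allocate articles to derivation and validation sets.

    articles: list of (article_id, journal) pairs.
    ASSUMPTION: a shuffle plus fixed cut point; the seed is arbitrary.
    """
    rng = random.Random(seed)
    shuffled = list(articles)
    rng.shuffle(shuffled)
    n_derivation = round(derivation_fraction * len(shuffled))
    return shuffled[:n_derivation], shuffled[n_derivation:]


def journals_in(subset):
    """Distinct journals represented in a set of articles."""
    return {journal for _, journal in subset}


# Hypothetical data: 1,267 articles spread over 103 labelled journals.
articles = [(i, f"journal_{i % 103}") for i in range(1267)]
derivation, validation = split_articles(articles)
print(len(derivation), len(validation))  # 760 507
print(len(journals_in(derivation)), len(journals_in(validation)))
```

With real data, where some journals contribute only one or two articles, the journal counts per set will fall below the total, as they did in this study.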
Table 1
Variables considered for inclusion in the study based on their potential impact on journal impact factor (JIF), with accompanying decisions to include them given that only 10 variables could be used
Analysis
Using STATA Intercooled 9.0, a multiple linear regression was fitted to journal-level means from the derivation set and then applied to the validation set. The dependent variable was the natural log-transformed 2007 JIF. The transformation was needed because the residual variance in models using untransformed data was not constant, violating one assumption of conventional multiple regression. Each of the 10 independent variables was a summary score (i.e., the mean of a quantitative variable or the proportion for a binary variable) across the journal's included articles from the first 5 months of 2005. A weighting factor was used to account for the varying numbers of articles per journal. Variables that did not contribute significantly in the initial regression were removed, keeping only those with P<0.1. The derivation model was then tested against the validation data set to evaluate its predictive accuracy in new data.
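The analysis steps above (regress ln(JIF) on journal-level means with a per-journal weight, then score the fitted model on another set of journals) can be sketched in plain Python. This is not the STATA procedure used in the study: weighting by the number of included articles, the weighted R², and all data are assumptions for illustration, and the backward elimination at P<0.1 is omitted because it needs t-distribution p-values.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_log_jif(journals):
    """Weighted least squares of ln(JIF) on journal-level predictor means.

    journals: list of (predictor_means, jif, n_articles); n_articles is the
    weight (ASSUMPTION). Returns [intercept, b1, b2, ...] on the ln(JIF) scale.
    """
    rows = [[1.0] + list(x) for x, _, _ in journals]  # prepend intercept column
    y = [math.log(jif) for _, jif, _ in journals]
    w = [float(n) for _, _, n in journals]
    k = len(rows[0])
    # Normal equations: (X'WX) b = X'Wy
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(rows, w)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(wi * r[i] * yi for r, wi, yi in zip(rows, w, y)) for i in range(k)]
    return solve(xtwx, xtwy)


def predict_log_jif(coef, x):
    return coef[0] + sum(b * xi for b, xi in zip(coef[1:], x))


def weighted_r2(coef, journals):
    """Weighted R^2 of a fitted model on any journal set, e.g. the validation set."""
    y = [math.log(jif) for _, jif, _ in journals]
    w = [float(n) for _, _, n in journals]
    fitted = [predict_log_jif(coef, x) for x, _, _ in journals]
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    ss_res = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    ss_tot = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, y))
    return 1.0 - ss_res / ss_tot


# Hypothetical derivation journals with two predictors, built so that
# ln(JIF) = 0.5 + 0.1*x1 + 0.3*x2 exactly; weights are article counts.
derivation = [((x1, x2), math.exp(0.5 + 0.1 * x1 + 0.3 * x2), n)
              for x1, x2, n in [(3, 1, 5), (5, 2, 8), (8, 1, 3), (4, 3, 6), (10, 2, 7)]]
coef = fit_log_jif(derivation)
print([round(c, 3) for c in coef])  # [0.5, 0.1, 0.3]
print(round(math.exp(predict_log_jif(coef, (6, 2))), 2))  # back-transformed JIF: 5.47
```

Applying `weighted_r2` with the derivation coefficients to a held-out set of journals is the analogue of the validation step; a small drop relative to the derivation R² indicates the model generalizes.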
RESULTS
The 103 included journals had 2007 JIFs ranging from 0.725 to 52.6. In the derivation set, the mean JIF was 5.51 (95% CI 4.17 to 6.86) and the median 3.72 (95% CI 3.10 to 4.34); in the validation set, the mean was 5.86 (95% CI 4.38 to 7.34) and the median 3.75 (95% CI 3.43 to 4.58). For comparison, according to ISI Web of Knowledge, the median 2007 JIF was 0.835 for the category "medicine, general & internal journals" and 0.712 for the category "medicine, research & experimental journals."
Only 5 of the 10 variables had P<0.10 in the initial regression and were kept in the final prediction model (Table 2). The variables removed were the proportion of articles selected for abstracting in a summary journal, mean relevance, mean sample size of studies, proportion of multicentered studies, and proportion of original studies. The reduced model was highly significant (R² = 0.607, 95% CI 0.444 to 0.706, P<0.001) and generalized to the validation set with only a 1% reduction in R². Four of the variables (proportion of a journal's articles that passed McMaster PLUS quality criteria, mean newsworthiness score, number of authors, and number of databases indexing the journal) were statistically significant and positively associated with the JIF. The number of disciplines rating the article (reflecting the range of clinical disciplines interested in the article) remained in the model because its P value was below 0.10, but it was only of borderline statistical significance (P = 0.075). In the derivation set, the mean predicted JIF was 4.05 (95% CI 1.20 to 13.71), median 3.91 (95% CI 3.45 to 4.39), minimum 1.13, and maximum 56.15; in the validation set, the mean was 3.95 (95% CI 1.00 to 8.05), median 4.24 (95% CI 3.98 to 5.40), minimum 0.82, and maximum 52.59.
Table 2
Multiple regression results from analysis of the derivation data set*
Each regression coefficient estimates the expected change in the natural log (ln) of the JIF associated with a 1-unit increase in the corresponding independent variable (i.e., 1 more author, 1 more indexing database, an increase of 1.0, or 100 percentage points, in the proportion of articles passing criteria in 2005, a 1-point increase in mean newsworthiness score, or 1 more discipline rating the article), with all other predictors held constant. The magnitude of each coefficient therefore depends on the units of its independent variable.
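Because the outcome is ln(JIF), a coefficient translates into a multiplicative, not additive, change in the predicted JIF. The derivation below is standard algebra for a log-linear model; the coefficient value in the last line is hypothetical, since the Table 2 estimates are not reproduced here.

```latex
\ln(\mathrm{JIF}) = \beta_0 + \beta_1 x_1 + \dots + \beta_5 x_5 + \varepsilon
\\[4pt]
\frac{\widehat{\mathrm{JIF}}(x_j + \Delta)}{\widehat{\mathrm{JIF}}(x_j)}
  = \frac{\exp\bigl(\beta_0 + \dots + \beta_j (x_j + \Delta) + \dots\bigr)}
         {\exp\bigl(\beta_0 + \dots + \beta_j x_j + \dots\bigr)}
  = e^{\beta_j \Delta}
\\[4pt]
\text{percentage change in predicted JIF} = 100\,\bigl(e^{\beta_j \Delta} - 1\bigr)
\\[4pt]
\text{hypothetical } \beta_j = 0.05,\ \Delta = 1:\quad 100\,\bigl(e^{0.05} - 1\bigr) \approx 5.1\%
```

For the proportion variables, a more natural step is Δ = 0.1 (10 percentage points) rather than the full 1-unit change, which helps explain why the size of a "meaningful change" differs across predictors.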
Because the dependent variable was log-transformed and it was difficult to reach consensus on what constitutes a meaningful change in each independent variable, the effects of the independent variables on the dependent variable cannot be compared directly. The univariate R² value for each variable gives some indication of its relative importance, though these values do not account for collinearity with the other independent variables (Table 2). The number of authors, number of indexing databases, and mean newsworthiness all had significant univariate R² values above 0.2. The proportion passing methods criteria had a significant R² of 0.08. The number of disciplines rating the article was not significant (R² = 0.0005).
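For a single predictor, the univariate R² equals the squared Pearson correlation between that predictor and ln(JIF). A minimal sketch follows; it is unweighted, whereas the study's regressions used a weighting factor, so treat it as illustrative only. The collinearity caveat is visible in the reported figures: three univariate values above 0.2 plus 0.08 already sum to more than 0.68, exceeding the full model's R² of 0.607, because the predictors share explained variance.

```python
def univariate_r2(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Hypothetical predictor values against hypothetical ln(JIF) values.
print(univariate_r2([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.64
```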
DISCUSSION
Quality
The results demonstrate that the subsequent JIF of a clinical journal is associated with quality and newsworthiness measures that can be assessed soon after publication of a journal's articles. Of the five measures in the final model, three are unique to McMaster PLUS (mean newsworthiness score, proportion of articles passing quality criteria, and number of disciplines rating the article). These three variables are in addition to two previously validated predictors of citation counts [16]: the number of authors and the number of indexing services that include the journal title. The univariate R² values (Table 2) suggest that the number of authors and the number of indexing databases have the most influence on JIF. Mean newsworthiness score and the proportion of articles passing quality criteria are also significant predictors. The JIF is often criticized as not truly reflecting journal quality. However, in this select set of journals, the model shows a strong association between the methodological quality of the articles a journal publishes and its JIF.
Other studies of articles in general medicine have shown an association of JIF with study quality. Lee and colleagues [28] found that higher study quality scores were associated with more citations, higher JIF, larger journal circulation, lower manuscript acceptance rates (i.e., greater selectivity), and indexing in the Brandon/Hill list of journals and books recommended for a hospital library. Gluud and colleagues [29] found that JIF was a significant predictor of adequate randomization and of larger sample size, but not of study outcome, in hepatobiliary randomized controlled trials.
Few practicing clinicians have a need to cite articles, but they do need to know where they are most likely to read sound studies that apply to their practice. The JIF can guide them to high-quality and newsworthy research. A study by Saha and colleagues [30] showed that for nine general medicine journals, clinicians' (subjective) ratings of quality were strongly correlated with a journal's JIF.
General medical journals such as New England Journal of Medicine, Lancet, JAMA, and BMJ tend to have higher JIFs and broader audiences than more specialized journals such as Stroke and Worldviews on Evidence-Based Nursing. Not surprisingly, this is reflected in the model presented here by the positive association with JIF of both the number of indexing databases (statistically significant) and the number of disciplines selected to rate the articles (borderline significance, P = 0.075). Journals publishing articles rated higher for newsworthiness have correspondingly higher JIFs. The implication of these findings is that clinical journals could increase their JIF by changing their peer-review procedures to include clinical ratings of submitted manuscripts as a supplement to detailed peer review. It is also important for journals to seek indexing in as many databases as possible.
Citing behavior
The two main theories that explain citation behavior are the normative and the social constructivist theories. In the first, science is viewed as rewarding quality and effort, essentially "what one says," by citing work on the basis of article traits [31–33]. Social constructivist theory holds that citations are given on the basis of "who one is" in the social hierarchy of one's field, so that citing depends on author traits [31–33]. Both contribute to citation decisions, and studies of citation patterns can often tease apart the relative importance of each by including variables that measure both article quality (e.g., empirical and theoretical content, article length, number of references, tables or graphs, publication lag) and social factors (e.g., author nationality or gender, social ties among authors, institutional prestige) [33]. The model presented here supports the normative theory, with article quality and newsworthiness characteristics contributing significantly to the JIF. It also supports the social constructivist theory through the significance of the number of authors, which may be a surrogate for social recognition and a larger network of associations.
In a previous publication, the two-year citation counts of individual articles were predicted using nineteen variables and similar multivariate analyses [16]. Eleven of these nineteen were statistically significant in the final model. Four of them overlap with the five variables in the current JIF study, all relating to measures of study quality [16]. Given that citation counts are used in calculating JIFs, it is not surprising that the variables predicting the two overlap.
Recently, the frequency of online access to articles and the citation of open access journals have been studied (Craig provides a review [8]). Some studies have found no open access advantage in terms of increased citations [34–36], while others have found a positive open access advantage [37–39]. Currently, there are no reliable ways to determine the frequency of access to all open access journals, but in the future this measure could prove to be a significant factor in citation behavior.
Strengths and limitations
The study model explains approximately 61% of the variation in journals' JIFs using measures that are available within days to months of publication, demonstrating a link between a journal's JIF and the clinical newsworthiness and broad applicability (importance to multiple disciplines) of its articles. The results are reliable, as shown by the minimal difference in model performance between the derivation and validation sets.
The study also provides an independent validation of the article selection process for evidence-based journals, which includes both methods assessment and second-order peer review (clinicians' ratings of the relevance and newsworthiness of a study to their practice) [17], illustrating that articles rated this way at the time of publication are more likely to garner scientific interest. Unfortunately, this observational study cannot determine whether the article selection process and the broadcasting of selected articles in evidence-based alerting services actually stimulate citations, which would inflate the association. However, the prime target audiences for evidence-based services are practicing clinicians, who are unlikely to be citing studies in published articles.
The incomplete prediction of JIFs by the included variables may be due to deficiencies in the predictors or in the impact factors or, more importantly, to the predictors and JIFs measuring complementary (correlated but distinct) properties of articles. Articles may be cited for context, notoriety, or criticism, rather than praise. Raters' assessments of newsworthiness, on the other hand, reflect whether a study provides new information that practitioners should know about for the care of their patients. Further, the proportion of a journal's articles that meet methodological criteria for clinical quality reflects the journal's ability to attract and select studies that are more likely to be actionable from a clinical perspective. Thus, the identified measures are at least partly complementary to JIFs. A multilevel analysis, which would allow article- and journal-level effects to be teased apart, was considered. However, because the main outcome (JIF) was at the higher (journal) level rather than the lower (article) level, such an analysis was not possible. Also, the JIF is based on citations to all articles in a journal, not only the subset included in this study.
A major limitation of this study is that only a small subset of journals was used (n = 103, with JIFs ranging from 0.725 to 52.6), heavily weighted toward highly regarded and widely used clinical titles. Appendix B, online only, lists the included journals. Also, the included articles were only those that met the rigorous PLUS methodological criteria and had minimum relevance and newsworthiness scores of 4 on scales of 1 to 7 (Appendix A, online only). MEDLINE indexes more than 5,000 journals, and these represent only a portion of all biomedical journals currently published. The findings therefore do not apply to the full complement of biomedical journals but are most likely pertinent to clinical journals across a broad range of disciplines. Current research shows that online journals are more often cited; the 2005 sample does not include online journals, so the importance of this factor could not be tested.
Future research
The model's ability to predict JIFs might be improved by including lower-quality studies. Because there are many more low-quality than high-quality studies, a random sample of them would be adequate to test the prediction model. Some of the variables excluded from the analysis (Table 1) might add predictive value and should be considered in future research. Ultimately, it would be desirable to use a more robust and less controversial reference standard than the JIF, which is based solely on citation counts of a somewhat arbitrary set of articles within journals over a fixed period.


