NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Hartling L, Hamm M, Milne A, et al. Validity and Inter-Rater Reliability Testing of Quality Assessment Instruments [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Mar.

Appendix E. Decision Rules for Application of the Newcastle-Ottawa Scale

The following coding instructions are taken from the Newcastle-Ottawa Scale website. Text in italics indicates additional guidance for reviewers agreed upon during the initial training teleconference.



  1. Representativeness of the exposed cohort
    This item assesses the representativeness of exposed individuals in the community, not the representativeness of the sample of women from some general population. For example, subjects derived from groups likely to contain middle class, better educated, health oriented women are likely to be representative of postmenopausal estrogen users, even though they are not representative of all women. Members of a health maintenance organisation (HMO), for instance, will be a representative sample of estrogen users: while the HMO may under-represent ethnic groups, the poor, and the poorly educated, these excluded groups are not the predominant users of estrogen.
    1. truly representative of the average in the community*
    2. somewhat representative of the average in the community*
    3. selected group of users e.g. nurses, volunteers
    4. no description of the derivation of the cohort
  2. Selection of the non-exposed cohort
    1. drawn from the same community as the exposed cohort*
    2. drawn from a different source
    3. no description of the derivation of the non-exposed cohort
  3. Ascertainment of exposure
    1. secure record (e.g. surgical records, medical records)*
    2. structured interview*
    3. written self report
  4. Demonstration that outcome of interest was not present at start of study
    In mortality studies, the outcome of interest is still the presence of a disease or incident rather than death; that is, a statement of no history of the disease or incident earns a star.
    1. yes*
    2. no


  1. Comparability of cohorts on the basis of the design or analysis
    A maximum of 2 stars can be allotted in this category.
    Exposed and non-exposed individuals must be matched in the design, and/or confounders must be adjusted for in the analysis. Statements of no differences between groups, or that differences were not statistically significant, are not sufficient for establishing comparability.
    Note: If the relative risk for the exposure of interest is adjusted for the confounders listed, then the groups will be considered to be comparable on each variable used in the adjustment.
    There may be multiple ratings for this item for different categories of exposure (e.g. ever vs. never, current vs. previous or never).
    Please see the accompanying background sheet to determine what confounders are considered important for each review topic.
    If the outcome/condition of interest is gender-specific (e.g. depression in pregnancy), only evaluate option 1 (‘a’ in the original scale) on whether or not the researchers controlled for age.
    1. study controls for age/sex (the most important factor)*
    2. study controls for any additional factor*
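
The comparability rules above can be sketched in Python as follows. This is only an illustration and not part of the Newcastle-Ottawa Scale: the function name `comparability_stars`, its parameters, and the reading of "age/sex" as requiring both factors (age alone when the outcome is specific to one sex) are assumptions. Only factors that were matched in the design or adjusted for in the analysis should be passed in; a statement of "no significant difference" does not count.

```python
from typing import Iterable


def comparability_stars(controlled_factors: Iterable[str],
                        sex_specific_outcome: bool = False) -> int:
    """Return 0, 1, or 2 stars for the comparability item (hypothetical helper)."""
    # Normalise names so that "Age" and "age " are treated alike.
    controlled = {f.strip().lower() for f in controlled_factors}

    # Assumption: "age/sex" means both factors, or age alone when the
    # outcome is specific to one sex (e.g. depression in pregnancy).
    core = {"age"} if sex_specific_outcome else {"age", "sex"}

    stars = 0
    if core <= controlled:
        stars += 1  # first option: study controls for age/sex
    if controlled - {"age", "sex"}:
        stars += 1  # second option: study controls for any additional factor
    return stars
```

Under these assumptions, a study adjusting for age, sex, and smoking would receive both stars, while a study adjusting for age alone would receive a star only when the outcome is sex-specific. Which additional factors matter for a given review topic is still determined by the background sheet, not by this sketch.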


  1. Assessment of outcome
    For some outcomes (e.g. fractured hip), reference to the medical record is sufficient to satisfy the requirement for confirmation of the fracture. This would not be adequate for vertebral fracture outcomes where reference to x-rays would be required.
    1. independent or blind assessment stated in the paper, or confirmation of the outcome by reference to secure records (x-rays, medical records, etc.)*
    2. record linkage (e.g. identified through ICD codes on database records)*
    3. self-report (i.e. no reference to original medical records or x-rays to confirm the outcome)
    4. no description
  2. Was follow-up long enough for outcomes to occur
    Please see the accompanying background sheet to determine what the minimum required follow-up period is for each review topic.
    1. yes*
    2. no
    If the follow-up period is reported with a mean and a range, and the mean is longer than the required minimum, rate it as ‘yes.’
  3. Adequacy of follow-up of cohorts
    This item assesses the follow-up of the exposed and non-exposed cohorts to ensure that losses are not related to either the exposure or the outcome.
    1. complete follow-up, all subjects accounted for*
    2. subjects lost to follow-up are unlikely to introduce bias (small number lost, <20%)*
    3. follow-up rate <80% and no description of those lost
    4. no description or unclear
    If follow-up rates vary by outcome, use the outcome that is included in the meta-analysis of the systematic review in which the article appears.
    If <20% of subjects were lost to follow-up but the difference in losses between groups is large, consider downgrading to option 3 (‘c’ in the original scale), especially if no reasons for the difference in follow-up are provided.
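
As a rough illustration, the follow-up adequacy rules above might be expressed as the Python sketch below. It is not part of the scale. The function name `follow_up_option`, its parameters, and in particular the `large_gap` threshold (the text does not define what counts as a "large" difference between groups, so 10 percentage points is used purely as a placeholder) are assumptions. Cases the rules do not cover, such as a loss of 20% or more with the losses described, return `None` so that reviewers can resolve them by discussion.

```python
from typing import Optional


def follow_up_option(enrolled_exposed: int,
                     lost_exposed: Optional[int],
                     enrolled_unexposed: int,
                     lost_unexposed: Optional[int],
                     losses_described: bool = False,
                     large_gap: float = 0.10) -> Optional[int]:
    """Return the option number (1-4) for adequacy of follow-up, or None
    when the decision rules do not settle the rating (hypothetical helper)."""
    if lost_exposed is None or lost_unexposed is None:
        return 4  # no description or unclear
    if enrolled_exposed <= 0 or enrolled_unexposed <= 0:
        raise ValueError("cohort sizes must be positive")

    total_lost = lost_exposed + lost_unexposed
    if total_lost == 0:
        return 1  # complete follow-up, all subjects accounted for

    overall_loss = total_lost / (enrolled_exposed + enrolled_unexposed)
    # Difference in the proportion lost between the two cohorts.
    gap = abs(lost_exposed / enrolled_exposed
              - lost_unexposed / enrolled_unexposed)

    if overall_loss < 0.20:
        # Assumed reading of "consider downgrading": downgrade when the gap
        # is large and no reasons for the losses are given.
        if gap >= large_gap and not losses_described:
            return 3
        return 2
    if overall_loss > 0.20 and not losses_described:
        return 3  # follow-up rate <80% and no description of those lost
    return None  # not covered by the rules above; needs reviewer judgment
```

For example, with 100 subjects per cohort and losses of 5 and 7, the sketch returns option 2; with losses of 2 and 30 and no explanation, overall loss is still below 20% but the gap between groups triggers a downgrade to option 3.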

