J Rheumatol. 2004 Jan;31(1):125-32.

Radiological scoring methods in ankylosing spondylitis. Reliability and change over 1 and 2 years.

Author information

University Hospital Maastricht, Maastricht, The Netherlands.



To compare reliability and change over time of radiological scoring methods in ankylosing spondylitis (AS).


Two trained observers scored 217 sets of radiographs from baseline and from 1 and 2 years' followup. Sacroiliac (SI) joints were graded 0-4 by the New York method and by the Stoke Ankylosing Spondylitis Spine Score (SASSS). Hips and cervical and lumbar spine were graded 0-4 by the Bath Ankylosing Spondylitis Radiology Index (BASRI). BASRI spinal scores and New York SI were combined into BASRI-spine (score 2-12) and, with the addition of BASRI-hips, into BASRI-total (2-16). Cervical and lumbar spine were also scored in detail (SASSS, 0-36 each) and were combined into SASSS-total or "modified" SASSS (both range 0-72). To assess change, a smallest detectable difference (SDD) was estimated for data on a quasi-interval scale.
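The composite BASRI scores described above can be illustrated with a minimal sketch. It assumes the composites are plain sums of the 0-4 component grades, which is consistent with the stated maxima of 12 and 16 but is not spelled out in the abstract. The lower bound of 2 presumably reflects that patients with AS have an SI grade of at least 2; the sketch does not enforce that. The names `BasriGrades`, `basri_spine`, and `basri_total` are hypothetical.

```python
from dataclasses import dataclass

# Each component is graded 0-4, as stated in the abstract.
GRADE_RANGE = range(0, 5)


@dataclass(frozen=True)
class BasriGrades:
    """Hypothetical container for one patient's component grades."""
    si: int        # sacroiliac joints (New York method)
    lumbar: int    # lumbar spine (BASRI)
    cervical: int  # cervical spine (BASRI)
    hips: int      # hips (BASRI)

    def __post_init__(self):
        # Reject grades outside the 0-4 scale.
        for name in ("si", "lumbar", "cervical", "hips"):
            value = getattr(self, name)
            if value not in GRADE_RANGE:
                raise ValueError(f"{name} grade must be 0-4, got {value}")


def basri_spine(g: BasriGrades) -> int:
    """Assumed: BASRI-spine = SI + lumbar + cervical (maximum 12)."""
    return g.si + g.lumbar + g.cervical


def basri_total(g: BasriGrades) -> int:
    """Assumed: BASRI-total = BASRI-spine + hips (maximum 16)."""
    return basri_spine(g) + g.hips
```

Under this assumption, a patient with SI grade 2 and no other damage scores 2 on both composites, and a patient with grade 4 everywhere scores 12 and 16, matching the ranges given.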


The SI scoring methods showed intra- and interobserver kappa values between 0.36 and 0.70. The BASRI-hip reached kappa values between 0.59 and 0.84. Combined SASSS scores were most reliable, with intra- and interobserver intraclass correlation coefficients (ICC) between 0.90 and 0.96. The ICCs of the combined BASRI scores were also very good, ranging from 0.85 to 0.95. For SI New York, SI SASSS, and BASRI-hip, 0.3-1.2% of patients deteriorated 1 grade; 7.5% deteriorated 1 grade (6.3% of maximum score) in BASRI-spine and BASRI-total, and observers agreed in up to 48% of the cases that no change occurred. The SDD was lowest (7.5; 10% of maximum score) for "modified" SASSS. Only 0.8% of patients deteriorated more than the SDD, and observers agreed in up to 92% of the cases that no change occurred.
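The "percent of maximum score" figures above can be checked with simple arithmetic. The sketch below assumes the percentage is the change divided by the top of the scale; the helper name `pct_of_max` is hypothetical. Under that assumption, 1 grade out of 16 (BASRI-total) is 6.25%, and an SDD of 7.5 out of 72 is about 10.4%, in line with the reported 6.3% and 10%.

```python
def pct_of_max(change: float, max_score: float) -> float:
    """Express a change in score as a percentage of the scale maximum."""
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    return 100.0 * change / max_score


# 1 grade on BASRI-total (maximum 16, from the abstract)
basri_total_pct = pct_of_max(1, 16)

# SDD of 7.5 on the "modified" SASSS (maximum 72, from the abstract)
sasss_sdd_pct = pct_of_max(7.5, 72)

print(f"{basri_total_pct:.2f}% and {sasss_sdd_pct:.2f}%")
```

Note that 1 grade on BASRI-spine (maximum 12) would be about 8.3% under the same assumption, so the reported 6.3% appears to correspond to the BASRI-total scale.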


Radiological scoring methods for AS are moderately to excellently reliable. Under the selected scoring conditions (concealed time order, average of 2 observers, SDD based on interobserver data, unselected patient population) there was too little change over 2 years to be detected reliably by the scoring methods.

[Indexed for MEDLINE]
