Validation of the new classification criteria for systemic lupus erythematosus on a patient cohort from a national referral center: a retrospective study

Aim To validate Systemic Lupus International Collaborating Clinics (SLICC)-12 and American College of Rheumatology (ACR)-97 classification criteria on a patient cohort from the University Hospital Center Zagreb. Methods This retrospective study, conducted from 2014 to 2016, involved 308 patients with systemic lupus erythematosus (SLE) (n = 146) and SLE-allied conditions (n = 162). Patients' medical charts were evaluated by an expert rheumatologist to confirm the clinical diagnosis, regardless of the number of the ACR-97 criteria met. Overall sensitivity and specificity, as well as the sensitivity and specificity according to disease duration, were compared between ACR-97 and SLICC-12 classifications. Predictive value for SLE for both classifications was assessed using logistic regression and receiver operating characteristic (ROC) curves. Results The SLICC-12 criteria had significantly higher sensitivity in early disease, which increased with disease duration. The ACR-97 criteria had higher specificity. The specificity of the SLICC-12 criteria was low and decreased with disease duration. Regression analysis demonstrated the superiority of the SLICC-12 classification criteria over the ACR-97 criteria, with areas under the ROC curve of 0.801 and 0.780, respectively. Conclusion Although the SLICC-12 criteria were superior to the ACR-97 and were more sensitive for diagnosing early SLE, their specificity in our population was too low. The sensitivity of the SLICC-12 classification is increased by better defined clinical features within each criterion. Our results contribute to the current initiative for developing new criteria for SLE.

Systemic lupus erythematosus (SLE) is a chronic autoimmune disease characterized by the production of autoantibodies and immune complexes, and their deposition in different tissues. Its etiology is still unclear, with genetic and environmental factors contributing to disease development (1-3). SLE has a relapsing-remitting course, and disease damage accumulates over time. Given that the period from the disease onset to permanent damage can be long, SLE is hard to diagnose in its early stages. It is even more difficult to classify patients using classification criteria of the American College of Rheumatology (ACR). Therefore, in a clinical setting, substitute entities such as incomplete lupus erythematosus (ILE), preclinical lupus, or latent lupus are often mentioned when describing patients whose clinical and laboratory findings suggest SLE but who meet fewer than 4 ACR criteria (4-6).
Advances in understanding of pathogenetic mechanisms require the redefinition and validation of classification criteria. Redefined criteria would encompass some of the patients with undefined disease and enable more thorough monitoring of the disease course and more prompt therapy introduction. In rheumatology, the development of classification criteria is challenging due to the lack of a clinical, etiopathogenetic, or diagnostic gold standard. Instead, the gold standard remains the expert rheumatologist's opinion (7,8).
ACR published the first, preliminary criteria for SLE in 1971, which were revised in 1982 and later validated. The second revision from 1997 (ACR-97), although unvalidated, has been the most widely used criteria set to date (9-13) (Table 1). The prerequisite for enrolling patients in clinical studies is cumulative attainment of any 4 of the 11 ACR criteria. Despite their high sensitivity (>85%) and specificity (>95%) in longstanding SLE, their sensitivity in early disease can be significantly lower (14-16). Furthermore, some organ systems are overrepresented (eg, 4 criteria attributed to skin and mucosa), and every criterion contributes equally to the diagnosis regardless of its specificity and sensitivity. Clinical experience and previous studies have shown greater statistical significance of certain criteria, such as renal impairment, discoid rash, or cytopenias (14,17,18). Different statistical methods have been employed to develop an optimal classification rule (19-22). Somogy et al validated the ACR criteria on a patient cohort [...] One of the most important improvements is that biopsy-proven lupus nephritis with positive antinuclear antibody (ANA) and/or anti-double-stranded DNA (dsDNA) antibody is sufficient to classify a patient as having SLE.
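The two classification rules discussed above can be sketched as simple predicates. This is only an illustrative sketch: the `PatientCriteria` record and its field names are hypothetical, and the "at least 4 criteria, including at least 1 clinical and 1 immunologic" branch is an assumption taken from the published SLICC-12 rule rather than from this text, which states only the biopsy-proven nephritis rule.

```python
from dataclasses import dataclass


@dataclass
class PatientCriteria:
    """Hypothetical per-patient tally; field names are illustrative only."""
    acr_criteria_met: int                  # cumulative ACR-97 criteria met (0-11)
    slicc_clinical_met: int                # SLICC-12 clinical criteria met
    slicc_immunologic_met: int             # SLICC-12 immunologic criteria met
    biopsy_proven_nephritis: bool = False
    ana_or_anti_dsdna_positive: bool = False


def classified_by_acr97(p: PatientCriteria) -> bool:
    # Any 4 of the 11 ACR criteria, attained cumulatively.
    return p.acr_criteria_met >= 4


def classified_by_slicc12(p: PatientCriteria) -> bool:
    # Stand-alone rule named in the text: biopsy-proven lupus nephritis
    # together with positive ANA and/or anti-dsDNA.
    if p.biopsy_proven_nephritis and p.ana_or_anti_dsdna_positive:
        return True
    # Assumption (published SLICC-12 rule, not stated in this text):
    # at least 4 criteria, including >= 1 clinical and >= 1 immunologic.
    total = p.slicc_clinical_met + p.slicc_immunologic_met
    return (
        total >= 4
        and p.slicc_clinical_met >= 1
        and p.slicc_immunologic_met >= 1
    )
```

Under these assumptions, a patient with 3 ACR-97 criteria but 2 clinical and 2 immunologic SLICC-12 criteria would be classified by SLICC-12 only, which is the kind of early-disease case the text describes.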
The multi-annual revision of the SLE classification conducted by the SLICC was a two-step process consisting of criteria derivation and criteria validation in two large patient groups. In the derivation group, the SLICC-12 classification showed greater sensitivity than the ACR-97 criteria, almost equal specificity, and significantly fewer misclassifications. In the validation step, there was no statistically significant difference between the two classifications (14). The study also included 201 new patients with SLE or potential SLE who started their follow-up at the Department from 2012 to 2016. Patients with insufficient medical data or without regular follow-ups were excluded. Finally, the patients were divided into two groups: 146 patients with established SLE (diagnosed by a rheumatologist) and 162 patients with SLE-allied conditions. The criterion for study inclusion was not the number of classification criteria fulfilled, but the clinical diagnosis determined by the expert rheumatologist, which remains the gold standard for diagnosing SLE.
Patients who underwent kidney biopsy for lupus nephritis staging signed an informed consent form, which is a standard procedure at the Clinic. The study was approved by the Ethics Committee of the University Hospital Center Zagreb.

Methods
All patient records were re-evaluated by an expert rheumatologist to determine whether the expert agreed with the recorded diagnosis. For every patient, a checklist of SLE-related features was filled out. The association between the clinical diagnosis and the diagnosis generated on the basis of the ACR-97 and SLICC-12 classification criteria was assessed. The overall sensitivity and specificity of the ACR-97 and SLICC-12 classifications, as well as the sensitivity and specificity according to disease duration, were calculated. The predictive value of every criterion in the ACR-97 and SLICC-12 classifications was assessed using logistic regression analysis and receiver operating characteristic (ROC) curves.
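As an illustration of the comparison described above, sensitivity and specificity can be computed by treating the expert rheumatologist's diagnosis as the reference standard. The function and argument names below are hypothetical, and the data in the usage example are invented for demonstration only.

```python
def sensitivity_specificity(expert_sle, criteria_positive):
    """Return (sensitivity, specificity) of a criteria set.

    expert_sle[i]        -- True if the expert diagnosed SLE in patient i
    criteria_positive[i] -- True if the criteria set classified patient i as SLE
    """
    pairs = list(zip(expert_sle, criteria_positive))
    tp = sum(1 for d, c in pairs if d and c)          # true positives
    fn = sum(1 for d, c in pairs if d and not c)      # false negatives
    tn = sum(1 for d, c in pairs if not d and not c)  # true negatives
    fp = sum(1 for d, c in pairs if not d and c)      # false positives
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Invented toy data: 4 expert-diagnosed SLE patients, 4 SLE-allied patients.
expert = [True, True, True, True, False, False, False, False]
criteria = [True, True, True, False, False, False, False, True]
sens, spec = sensitivity_specificity(expert, criteria)
```

Running the same function once per criteria set (ACR-97 and SLICC-12) on the same patients gives the paired figures that the study compares.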

Statistical analysis
Simple descriptive statistics were computed in Microsoft Excel 2010 (Microsoft Corporation, Redmond, WA, USA), while univariate and multivariate logistic regression and the stepwise procedure were performed in SAS (SAS Institute, Cary, NC, USA) to determine the best predictive models (27). The level of significance (type I error) was set at 5%. Differences in proportions were assessed using the test of proportions.
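The text does not say which variant of the test of proportions was used. A minimal sketch, assuming a pooled two-sample z-test for independent proportions with a two-sided P value, could look as follows (the function name is hypothetical).

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-sample z-test for proportions x1/n1 vs x2/n2.

    Returns (z, two_sided_p). Assumes independent samples; this is an
    illustrative choice, not necessarily the variant used in the study.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("pooled proportion is 0 or 1; z is undefined")
    z = (p1 - p2) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A result would be called significant at the 5% level stated above when the returned P value is below 0.05. When two criteria sets are applied to the same patients, the proportions are paired, so a paired test may be more appropriate than this independent-samples sketch.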

Disease duration and sex distribution
Mean disease duration from the onset of first symptoms was 11 ± 7 years, while the mean disease duration from the time of diagnosis was 10 ± 7 years. The relatively short time needed to diagnose SLE could be attributed to the prompt and effective diagnostic procedure in a highly specialized tertiary institution. Our cohort included 279/308 (90.6%) female patients, which is the expected sex distribution in SLE patients (1-3). Sex distribution was almost equal in the two patient groups: 132 (90.4%) women in the SLE group and 147 (90.7%) women in the NSLE group. The mean age of all patients was 51 ± 14 years (range 20-88). Male patients were somewhat younger (mean 48 ± 13 years) than female patients (mean 52 ± 15 years). The mean age at the first visit was the same as the mean age at the time of diagnosis (41 ± 14 years).
Sensitivity and specificity of SLICC-12 and ACR-97 criteria by disease duration
To assess whether the SLICC-12 classification was superior to the ACR-97 classification in earlier stages of SLE, we distributed the patients into groups according to disease duration: the early disease group (disease onset at most 5 years ago) and the remaining groups by five-year periods of disease duration (5-10 years, 10-15 years, 15-20 years, 20-25 years, and 25-30 years). In the early-disease group, the SLICC-12 criteria had 74% sensitivity and a low specificity of 58.5%. The sensitivity and specificity of the ACR-97 criteria were 22.2% and 98.1%, respectively. The sensitivity of the SLICC-12 criteria increased with disease duration, while their specificity decreased. In contrast, the sensitivity of the ACR-97 criteria, which was low in early SLE, increased slowly but steadily with disease duration. Their specificity was high in all patient groups and reached 100% in long-standing, established disease (Figure 1A and 1B).
NPSLE - neuropsychiatric lupus erythematosus; L - leukocyte; Ly - lymphocyte; ANA - antinuclear antibody; dsDNA - double-stranded DNA; anti-Sm - anti-Smith antibody; APA - antiphospholipid antibody; LR - likelihood ratio; OR - odds ratio; 95% CI - 95% confidence interval; c - area under the ROC curve.
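The grouping by disease duration described in this section can be sketched as follows. The record layout and function names are hypothetical, and assigning a duration of exactly 5 years to the early group is an assumption based on the "at most 5 years" wording; durations beyond 30 years are placed in the last band for simplicity.

```python
import math
from collections import defaultdict


def duration_band(years, width=5, max_years=30):
    """Map disease duration in years to a band label such as '0-5' or '5-10'.

    Bands are closed on the right, so exactly 5 years falls in '0-5'.
    """
    index = max(1, math.ceil(years / width))
    index = min(index, max_years // width)  # cap at the last band
    upper = index * width
    return f"{upper - width}-{upper}"


def stratified_performance(records):
    """records: iterable of (duration_years, expert_sle, criteria_positive).

    Returns {band: (sensitivity, specificity)}; a value is None when the
    band has no patients in the corresponding reference group.
    """
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "tn": 0, "fp": 0})
    for years, expert_sle, criteria_positive in records:
        if expert_sle:
            cell = "tp" if criteria_positive else "fn"
        else:
            cell = "fp" if criteria_positive else "tn"
        counts[duration_band(years)][cell] += 1
    result = {}
    for band, c in counts.items():
        n_sle = c["tp"] + c["fn"]
        n_non = c["tn"] + c["fp"]
        sens = c["tp"] / n_sle if n_sle else None
        spec = c["tn"] / n_non if n_non else None
        result[band] = (sens, spec)
    return result
```

Calling `stratified_performance` once with ACR-97 classifications and once with SLICC-12 classifications would yield per-band curves of the kind the text reports for Figure 1.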

Regression analysis of SLICC-12 and ACR-97 classification criteria
Using regression analysis, we tested the predictive value of the classification criteria for diagnosing SLE in our patient cohort. Additionally, to select the best combination of classification criteria and to create the most adequate model for predicting SLE, we employed a stepwise procedure.
Univariate analysis showed that the SLICC-12 classification was a better predictor of SLE than the ACR-97 criteria, with areas under the ROC curve (AUC) of 0.801 for SLICC-12 and 0.780 for ACR-97 (Figures 2A and 2B). We found a moderate relationship between the number of ACR-97 criteria and the SLE diagnosis, with AUC = 0.711. An increase in the total number of ACR-97 criteria by 1 increased the odds of an SLE diagnosis almost 3-fold (odds ratio [OR] 2.916, 95% confidence interval [CI] 2.085-4.078). The total number of SLICC-12 criteria was a slightly better predictor of SLE than the number of ACR-97 criteria, with AUC = 0.728.
Interestingly, an increase in the number of SLICC-12 criteria by 1 increased the odds of an SLE diagnosis almost 2-fold (OR 1.727, 95% CI 1.454-2.050). When we analyzed the clinical and immunologic SLICC criteria separately, the immunologic criteria showed greater predictive value (AUC = 0.708) than the clinical criteria (AUC = 0.607) (Figures 3A and 3B) (Table 4 and Table 5).
Using the stepwise procedure, we obtained the best predictive models for SLE. The stepwise procedure for the SLICC-12 classification generated a combination of 6 criteria: chronic cutaneous lupus, renal impairment, anti-dsDNA antibody, anti-cardiolipin antibody, β2-glycoprotein I, and a positive Coombs test. Taken together, these criteria had the best predictive value for SLE, with AUC = 0.770 (Figure 4).
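The odds ratios above come from logistic regression, where the OR per additional criterion is the exponential of the fitted coefficient. A minimal sketch, assuming Wald-type 95% intervals (the text does not state how the intervals were computed), shows how an OR and its interval relate to a coefficient and its standard error. The function names are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_reported(or_value, ci_low, ci_high, z=Z_95):
    """Recover (beta, se) from a reported OR and Wald 95% CI."""
    beta = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# Figures reported for the number of SLICC-12 criteria met.
beta, se = coefficient_from_reported(1.727, 1.454, 2.050)
or_value, lower, upper = odds_ratio_with_ci(beta, se)
```

Because a Wald interval is symmetric on the log-odds scale, the reported OR should lie close to the geometric mean of its interval bounds, which offers a quick consistency check on published figures.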
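The AUC values reported in this section summarize how well a score (for example, the number of criteria met or a model's predicted probability) separates SLE from non-SLE patients. As a sketch of the concept only, and not of the SAS procedure used in the study, the AUC can be computed as the probability that a randomly chosen SLE patient scores higher than a randomly chosen non-SLE patient, with ties counted as one half. The function name and toy data are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores[i] -- numeric score for patient i (higher = more SLE-like)
    labels[i] -- True for expert-diagnosed SLE, False otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one SLE and one non-SLE patient")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute one half
    return wins / (len(pos) * len(neg))


# Invented example: number of criteria met by 3 SLE and 3 non-SLE patients.
toy_scores = [5, 4, 4, 2, 3, 1]
toy_labels = [True, True, False, False, True, False]
toy_auc = roc_auc(toy_scores, toy_labels)
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which puts the reported values of roughly 0.7 to 0.8 in context.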

Discussion
Our results show the superiority of the SLICC-12 criteria and their somewhat greater predictive value for diagnosing SLE compared with the ACR-97 criteria, especially in the early disease stage. Better defined clinical features within each criterion contribute to the higher sensitivity of the SLICC-12 classification. Nevertheless, in our cohort, the specificity of the SLICC-12 classification was too low and decreased with SLE duration. The ACR-97 classification showed higher specificity than the SLICC-12 classification. Our results agree with the results of criteria validation by the SLICC group and other research, which have shown high specificity of the ACR-97 criteria in established disease, with lower, but acceptable sensitivity (9,26,28,29). SLE diagnosis is hard to confirm due to its heterogeneous clinical presentation and slow accumulation of symptoms. As a result, the ACR-97 criteria do not have adequate sensitivity to recognize patients in the early disease stage. On the other hand, the SLICC-12 criteria have so far shown higher sensitivity than the ACR-97 criteria in early disease, as well as in milder SLE cases, in UCTD, and in juvenile SLE (24-26,29-34). Nevertheless, in our study, the SLICC-12 classification had lower specificity than the ACR-97 classification, which was also observed in the validation process by the SLICC group (14).
When we compared the two classifications in terms of disease duration, the greatest difference in sensitivity emerged in the early disease stage. These results agree with those from other reports. A study involving 2055 patients from Spanish and Portuguese registers (26) showed that the SLICC-12 criteria had significantly higher sensitivity than the ACR-97 criteria in the earlier disease stages and that their sensitivity increased with disease duration. In long-standing disease, the sensitivity of the two classifications did not significantly differ (26). Studies on pediatric patients with SLE also showed higher sensitivity in the earlier disease stages and lower specificity (31-33). A study involving patients from a Swedish lupus register reported that the SLICC-12 criteria had higher sensitivity (94% vs 90%), but a surprisingly low specificity (74%). The authors recommended that a combination of the two criteria sets be used to define patients more precisely (28). Furthermore, research on 495 patients from 12 Japanese medical centers showed similar results: higher sensitivity (99% vs 88%) and lower specificity (80% vs 85%) of the SLICC-12 criteria compared with the ACR-97 criteria (35). In the Mexican population, fewer patients were misclassified when the ACR-97 criteria were used, with higher specificity and positive predictive value (36). This suggests that the ACR-97 criteria are more reliable in a "real-life" setting. A study conducted in Olmsted County (Minnesota, USA) showed a higher SLE incidence when the SLICC-12 criteria were used, mostly because of the added classification rule on biopsy-proven lupus nephritis along with positive ANA or anti-dsDNA antibodies (37).
Regression analysis in our patient cohort showed the superiority of the SLICC-12 classification. The number of SLICC-12 criteria met by a patient discriminated SLE slightly better than the number of ACR-97 criteria (AUC 0.728 vs 0.711). Nevertheless, each additional ACR-97 criterion met was associated with a larger increase in the odds of SLE (OR 2.916 vs 1.727). Univariate analysis showed a minor difference between the two classifications. In general, both criteria sets in our patient cohort showed significant association with SLE, with the SLICC-12 classification being a slightly better predictor. Multivariate analysis showed different predictive value of some criteria in the two classifications. The SLICC criteria that contributed most to diagnosing SLE were chronic cutaneous lupus, synovitis, renal impairment, anti-dsDNA, and APL antibodies. On the other hand, the ACR-97 criteria that contributed most to diagnosing SLE were acute cutaneous lupus (photosensitivity and butterfly rash), discoid lupus, serositis, renal impairment, hematologic disorder, and immunologic disorder. Acute cutaneous lupus in the new criteria encompasses various manifestations that often overlap and that, taken together, did not show enough specificity for SLE in our cohort. Therefore, they are not as represented as the often described malar rash and photosensitivity in the ACR-97 criteria. As opposed to earlier research (37), the two classifications did not significantly differ regarding renal impairment in our cohort. Given that arthritis/synovitis is less strictly defined in the SLICC-12 criteria, it emerged as significant in the regression analysis, as did the hematologic criterion of the ACR-97 classification, which incorporates disorders of all three blood cell lineages. Although the stepwise procedure generated optimal criteria combinations for predicting SLE, the strongest predictor turned out to be the overall classification, for both the SLICC-12 and the ACR-97 criteria.
This contradicts the results of Mesa et al (38), who, also employing regression analysis, reported that reduced models, rather than the whole classification, were better discriminators of SLE patients among unclear cases of MCTD. Al-Daabil et al (39) found that strong predictors of SLE were oral ulcerations, renal impairment, and anti-dsDNA antibodies. A study involving 110 patients with SLE from another tertiary center in Croatia reported a linear correlation between the number of SLICC-12 criteria and disease activity, while no such correlation was reported for the ACR-97 criteria (40). Amezcua-Guerra et al (36) showed high positive and negative predictive values for both classifications, with higher specificity of the ACR-97 criteria for SLE.
Our study has several limitations. First, it was carried out in a single center and, therefore, may not represent the entire population with SLE in Croatia. Nevertheless, our center has a substantial number of SLE patients from all parts of the country (predominantly northwestern). Second, a considerable proportion of patients (35%) in this study were in the early stage of the disease at the time of data collection. This could imply a milder disease in the overall sample. Still, taken together, all studies carried out so far, as well as this study, have shown only a minor overall difference between the two classifications. The SLICC-12 classification, despite its high sensitivity, wider range of incorporated clinical manifestations, and immunologic criteria, does not show significant superiority in a real-life setting. This is largely due to the high specificity of the ACR-97 criteria. Due to their relatively low sensitivity, classification criteria are not an appropriate diagnostic tool in a routine clinical setting. Their primary purpose is defining homogeneous patient cohorts for clinical research. Therefore, the gold standard for diagnosing most rheumatic diseases, including SLE, remains an experienced rheumatologist's assessment. The results of this research contribute to the ongoing initiative for developing improved classification criteria for SLE (41).
Funding None.
Ethical approval given by the Ethics Committee of the University Hospital Centre Zagreb.
Declaration of authorship MB and NČ conceived or designed the study. MB acquired the data. MB and BA analyzed or interpreted the data. MB drafted the manuscript. MB critically revised the manuscript for important intellectual content. All authors gave final approval of the version to be submitted and agree to be accountable for all aspects of the work.