Prognostic performance of the Simplified Acute Physiology Score II in major Croatian hospitals: a prospective multicenter study

Aim To perform an external validation of the original Simplified Acute Physiology Score II (SAPS II) system and to assess its performance in a selected group of patients in major Croatian hospitals. Methods A prospective, multicenter study was conducted in five university hospitals and one general hospital during a six-month period between November 1, 2007 and May 1, 2008. The standardized hospital mortality ratio (SMR) was calculated from the actual mortality and the mean predicted mortality of all 2756 patients. SAPS II was validated using the area under the receiver operating characteristic curve (AUC), 2 × 2 classification tables, and Hosmer-Lemeshow tests. Results The predicted mortality was low (14.6%) owing to the small proportion of medical patients, and the SMR was 0.89 (95% confidence interval [CI], 0.78-0.98). The SAPS II system showed good discriminatory power, as measured by the AUC (0.85; standard error [SE] = 0.012; 95% CI = 0.840-0.866; P < 0.001). However, it significantly overestimated the actual mortality in the study population (Hosmer-Lemeshow goodness-of-fit H statistic: χ²₈ = 584.4; P < 0.001; C statistic: χ²₈ = 313.0; P < 0.001). Conclusion SAPS II showed good discrimination, but its predicted mortality significantly exceeded the observed mortality in this group of patients in Croatia. Therefore, caution is required when an evaluation is performed at the individual level.


In recent decades, scoring systems that assess the severity of disease on admission to intensive care units (ICUs) have been used to evaluate performance across ICUs in different countries. The comparison is based on the standardized mortality ratio (SMR), calculated from the observed mortality and the mean of all predicted mortalities in the same group of patients (1). The SMR is widely used to compare ICUs that treat very different patients with regard to their age, comorbidities, and current condition (the reason for admission and the derangement of physiological variables) (2).
One of the most frequently used disease-severity scoring systems is the second version of the Simplified Acute Physiology Score (SAPS II), created by Le Gall et al (3) in 1993. SAPS II was developed on a large patient sample as an upgrade of the first version, created by the same authors in 1984 (3,4). Unlike SAPS I, SAPS II was derived by selecting and weighting each variable with logistic regression. The SAPS II total score is the sum of the points assigned to the worst value of each variable within the first 24 hours after ICU admission. The score is then converted into the probability of death, that is, the predicted mortality, using the model equation.
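The conversion step can be illustrated with the logistic equation published by Le Gall et al (3): logit = -7.7631 + 0.0737 × score + 0.9971 × ln(score + 1), with predicted mortality = e^logit / (1 + e^logit). The sketch below is a minimal illustration under that assumption. The function name is hypothetical, and the coefficients should be checked against the original publication before any real use.

```python
import math


def saps2_predicted_mortality(score: int) -> float:
    """Convert a SAPS II total score into a predicted probability of hospital death.

    Assumes the coefficients of the original SAPS II equation (Le Gall et al, 1993).
    """
    if score < 0:
        raise ValueError("SAPS II score cannot be negative")
    logit = -7.7631 + 0.0737 * score + 0.9971 * math.log(score + 1)
    # Logistic transform e^logit / (1 + e^logit), written in the equivalent form 1 / (1 + e^-logit).
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Median SAPS II score reported in the Results section of this study.
    print(f"{saps2_predicted_mortality(24):.3f}")  # 0.058
```

The equation is strongly non-linear. A score of 24 therefore maps to about 5.8%, which is consistent with a mean predicted mortality (14.6%) that lies well above the value at the median score.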
The primary aim of the study was to perform an external validation of the SAPS II system in a group of ICU patients treated in major Croatian hospitals. A secondary aim was to determine the actual severity of disease on ICU admission before the introduction of the Diagnosis Related Groups system in Croatia, which will require SAPS II scoring on ICU admission.

Patients
The project "Performance of Intensive Care Medicine in the Republic of Croatia" began in 2007 under the auspices of the Croatian Association for Anesthesiology and Intensive Care Medicine. The prospective study was carried out in five university hospitals and one general hospital between November 1, 2007 and May 1, 2008. The participating hospitals were "Rijeka University Hospital" from Rijeka; "Sestre Milosrdnice," "Dubrava," "Jordanovac," and "Merkur" university hospitals from Zagreb; and "Varaždin General Hospital" from Varaždin.
Before the study, a dedicated computer program was created in Microsoft Access 2003 (Microsoft, Redmond, WA, USA). Detailed instructions for the program and for the mortality prediction system were prepared. Two data entry training sessions were held for the personnel responsible for entering data into the database (anesthesiology residents and specialists). The program automatically scored each variable, generated alerts about illogical and/or extreme values, and excluded patients according to the criteria used in the original study (3). These criteria were age younger than 18 years, burns, coronary disease, cardiac surgery, and an ICU stay of less than 4 hours. Furthermore, for patients admitted to the ICU more than once during a single hospitalization, the final report considered estimated mortality only for the first ICU stay. Data collection was completed on September 1, 2008 and was all-inclusive.
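The exclusion and first-stay rules applied by the study program can be sketched as a simple filter. The sketch assumes a hypothetical record layout: the `IcuStay` fields are illustrative, not the study database schema. It also assumes that the first-stay rule is applied before the exclusion criteria, because the original program's order of operations is not described.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class IcuStay:
    # Hypothetical record layout; the study's Access database schema is not described.
    hospitalization_id: str
    admission_order: int  # 1 = first ICU admission within this hospitalization
    age_years: int
    burns: bool
    coronary_disease: bool
    cardiac_surgery: bool
    icu_hours: float


def is_excluded(stay: IcuStay) -> bool:
    """Apply the exclusion criteria listed in the Patients section."""
    return (
        stay.age_years < 18
        or stay.burns
        or stay.coronary_disease
        or stay.cardiac_surgery
        or stay.icu_hours < 4
    )


def eligible_first_stays(stays: list[IcuStay]) -> list[IcuStay]:
    """Keep the first ICU stay of each hospitalization, then drop excluded stays.

    Assumption: the first-stay rule is applied before the exclusion criteria.
    """
    first: dict[str, IcuStay] = {}
    for stay in stays:
        kept = first.get(stay.hospitalization_id)
        if kept is None or stay.admission_order < kept.admission_order:
            first[stay.hospitalization_id] = stay
    return [stay for stay in first.values() if not is_excluded(stay)]
```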
All variables for the SAPS II scoring system were collected manually. These were:

- age;
- chronic diseases (hematologic malignancy, metastatic cancer, and/or acquired immunodeficiency syndrome);
- type of admission (elective surgery, emergency surgery, or medical);
- physiological variables (body temperature, heart rate, systolic blood pressure, ratio of arterial oxygen partial pressure to fraction of inspired oxygen [PaO2/FiO2], urine output, urea, potassium, sodium, bicarbonate, leukocytes, bilirubin, and Glasgow Coma Scale score).

Medical patients were defined as patients who had no surgical procedure within seven days of ICU admission. Consciousness was assessed with the Glasgow Coma Scale (GCS). For patients who were sedated on ICU admission, the GCS value recorded before sedation was used. Of all values recorded for a particular variable within the first 24 hours of ICU stay, the one that scored the highest number of points was selected from the patient's record. If a value was entered as a range, the program converted the range into points (3). In addition to the SAPS II variables, we recorded the length of ICU stay and the length of hospital stay. The outcome of interest (alive or dead) was recorded at hospital discharge, and the outcome at ICU discharge was recorded in the same manner. Since all variables used in this study were routinely collected in everyday work, no additional interventions were needed. The study was approved by the ethics committee of the Rijeka University Hospital Center.
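The worst-value rule can be sketched for a single variable. The heart-rate bands below are the SAPS II heart-rate points as we understand them from the published score sheet (3) and should be verified against the original. The helper names and the readings are hypothetical.

```python
from __future__ import annotations

from typing import Callable, Sequence


def heart_rate_points(bpm: float) -> int:
    """SAPS II points for heart rate (beats/min); verify the bands against the original sheet (3)."""
    if bpm < 40:
        return 11
    if bpm < 70:
        return 2
    if bpm < 120:
        return 0
    if bpm < 160:
        return 4
    return 7


def worst_value(values: Sequence[float], points: Callable[[float], int]) -> tuple[float, int]:
    """Return the recorded value that scores the most points, together with its points.

    If several values tie, the first one recorded is returned.
    """
    if not values:
        raise ValueError("no values recorded within the first 24 hours")
    chosen = max(values, key=points)
    return chosen, points(chosen)


if __name__ == "__main__":
    # Hypothetical heart-rate readings from the first 24 hours of one ICU stay.
    readings = [88, 125, 65, 110]
    print(worst_value(readings, heart_rate_points))  # (125, 4)
```

The "worst" value is the one with the most points, not the most extreme number. In this example, 125 beats/min (4 points) outranks 65 beats/min (2 points).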

Statistical analysis
The SAPS II score was calculated for each patient by adding up the points for each variable, and the probability of death was computed with the original SAPS II equation (3). The SMR was calculated by dividing the observed mortality by the mean of all predicted mortalities. The 95% confidence intervals (CIs) for SMRs were obtained by treating the observed mortality as a Poisson variable and dividing the limits of its 95% CI by the predicted mortality (25). Survivors and deceased patients were compared using univariate tests. Continuous variables were presented as means with standard deviation (for normally distributed data) or as medians with interquartile range. They were compared using the t test or the Mann-Whitney U test, as appropriate. Categorical variables were presented as frequencies and percentages and compared using the χ² test. All statistical tests were two-tailed, and P < 0.05 was considered statistically significant.
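Below is a minimal sketch of the SMR and its Poisson-based 95% CI. The observed mortality divided by the mean predicted mortality equals the number of observed deaths divided by the sum of the predicted probabilities, so the sketch works with counts. The interval method of reference (25) is not detailed here. The sketch therefore assumes Byar's approximation to the exact Poisson limits. The function name is hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def smr_with_ci(observed_deaths: int, predicted: Sequence[float]) -> tuple[float, float, float]:
    """Return (SMR, lower 95% limit, upper 95% limit).

    SMR = observed deaths / sum of predicted probabilities, which equals
    observed mortality / mean predicted mortality. The CI treats the observed
    deaths as a Poisson count (Byar's approximation, an assumption) and divides
    its limits by the expected number of deaths.
    """
    expected = math.fsum(predicted)
    if expected <= 0:
        raise ValueError("predicted probabilities must sum to a positive number")
    o = observed_deaths
    if o == 0:
        lower_count = 0.0
    else:
        lower_count = o * (1 - 1 / (9 * o) - Z_95 / (3 * math.sqrt(o))) ** 3
    o1 = o + 1
    upper_count = o1 * (1 - 1 / (9 * o1) + Z_95 / (3 * math.sqrt(o1))) ** 3
    return o / expected, lower_count / expected, upper_count / expected
```

An SMR below 1 with an upper limit below 1 indicates fewer deaths than the model predicts.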
The SAPS II score was validated in this group of intensive care patients in Croatia by testing discrimination and calibration. Discrimination is the ability to distinguish patients who will live from those who will die. Calibration is the degree of agreement between predicted and observed mortality.

Discrimination was evaluated by calculating the area under the receiver operating characteristic (ROC) curve, with its standard error (SE), 95% CI, and Z statistic. The ROC curve was constructed over a range of predicted probabilities of death. Each value served as a decision criterion for a 2 × 2 classification table of predicted and observed mortality, and the resulting pairs of sensitivity and specificity were plotted. The higher the true-positive rate relative to the false-positive rate, the larger the area under the ROC curve. Classification tables (2 × 2) were created for three decision criteria of predicted mortality, 0.1 (10%), 0.5 (50%), and 0.9 (90%), and compared using the McNemar χ² test (26).

Calibration was evaluated using the Hosmer-Lemeshow H and C goodness-of-fit statistics and a calibration curve. To calculate the H statistic, patients were divided into 10 groups according to the level of predicted mortality. To calculate the C statistic, they were divided into 10 groups of equal size. In each group, the predicted mortality was compared with the observed mortality. High statistic values with P < 0.05 indicated that the model did not predict the observed mortality well (27).

Uniformity of fit was investigated with two strategies that compared SMRs and their 95% CIs: across participating ICUs and across patient types. Data were analyzed using MedCalc ver. 11.6.1.0 (MedCalc Software bvba, Mariakerke, Belgium), Statistica ver. 9.1 (StatSoft Inc., Tulsa, OK, USA), and SPSS ver. 14.0 (SPSS Inc., Chicago, IL, USA).
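For discrimination, the AUC equals the probability that a randomly chosen patient who died had a higher predicted mortality than a randomly chosen survivor, with ties counted as one half. The sketch below computes the AUC that way and adds a standard error using the Hanley-McNeil formula. That SE formula is our assumption; the statistical software used in the study may have applied a different method, for example that of DeLong et al. The function name is hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def auc_hanley_mcneil(predicted: Sequence[float], died: Sequence[bool]) -> dict[str, float]:
    """Rank-based AUC with Hanley-McNeil SE, normal-approximation 95% CI, and Z for H0: AUC = 0.5."""
    deaths = [p for p, d in zip(predicted, died) if d]
    survivors = [p for p, d in zip(predicted, died) if not d]
    n1, n2 = len(deaths), len(survivors)
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one death and one survivor")
    # Count pairs in which the patient who died had the higher prediction; ties count one half.
    # This O(n1 * n2) loop favors clarity over speed.
    score = 0.0
    for p in deaths:
        for q in survivors:
            if p > q:
                score += 1.0
            elif p == q:
                score += 0.5
    auc = score / (n1 * n2)
    # Hanley-McNeil variance terms.
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    variance = (auc * (1 - auc) + (n1 - 1) * (q1 - auc ** 2) + (n2 - 1) * (q2 - auc ** 2)) / (n1 * n2)
    se = math.sqrt(max(variance, 0.0))
    # Z is undefined when the SE is zero (perfect or perfectly inverted separation).
    z = (auc - 0.5) / se if se > 0 else float("nan")
    return {"auc": auc, "se": se, "ci_low": auc - 1.96 * se, "ci_high": auc + 1.96 * se, "z": z}
```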

Results
All ICUs included in the study were combined medical/surgical ICUs headed by anesthesiologists, with other specialists on the team. The participating hospitals had 68 ICU beds (range, 7-18) and 3613 acute beds (range, 237-1050). These hospitals also had other ICUs that were not included in the study, with a total of 102 intensive care beds. Together, the 170 intensive care beds accounted for 4.7% of all acute beds, that is, the ratio of intensive care to acute beds was 1 to 21. The mean number of anesthesiology specialists per ICU was 1.8 (range 1-3) during the day and 1.2 (range 1-2) during the night. The nurse-to-bed ratio was 0.5 (range 0.4-0.7) both during the day and during the night.
We analyzed data on 3572 patients admitted to the ICUs during the study period. Exclusion criteria were met by 814 patients, most of whom were cardiac surgical patients (n = 435). In addition, two patients were excluded from the final analysis because of incomplete data, leaving 2756 patients for the final analysis. The average number of patients per ICU was 459 (range 314-596). The median age of the patients was 64 (interquartile range 52-73) years, and 61% were men. By type of admission, most patients were admitted after elective surgery, followed by those admitted after emergency surgery and medical patients (62%, 29%, and 9%, respectively). The median length of ICU stay was 2 (interquartile range 2-4) days, and the median length of hospital stay was 11 (interquartile range 8-17) days (Table 1).
The survivors were younger than the patients who died, whereas sex was not associated with survival (Table 1). Type of admission significantly influenced the outcome of hospital treatment. Medical patients had a higher mortality than surgical patients, and emergency surgical patients had a higher mortality than elective surgical patients. The survivors had a shorter ICU stay than the patients who died, whereas the length of hospital stay did not differ between survivors and deceased patients (Table 1).
On ICU admission, the median SAPS II score was 24 (interquartile range 16-37), and the mean predicted mortality was 14.6%. Most patients had a low predicted mortality: it was below 10% in 67% of the patients, and 61.3% of patients had a SAPS II score below 27. The predicted mortality varied between hospitals (range 5.8%-27.4%). Survivors had a lower predicted mortality on admission than patients who died (10.1% vs 44.9%; P < 0.001; Table 1). The discriminatory power of the SAPS II system, assessed by the area under the ROC curve, was 0.85 (SE = 0.012; 95% CI = 0.840-0.866; P < 0.001; Figure 1). With decision criteria of 10%, 50%, and 90%, the sensitivity was 81.2%, 43.0%, and 11.8%, and the false-positive rate was 25.9%, 4.6%, and 4.6%, respectively (Table 2).
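The 2 × 2 classification at a fixed decision criterion can be sketched as follows. The paper does not state these details, so the sketch rests on three assumptions. First, a patient is classified as a predicted death when the predicted mortality is at or above the criterion. Second, the false-positive rate is FP/(FP + TN). Third, the McNemar statistic is the uncorrected (b - c)²/(b + c) on the discordant cells. All names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Sequence


@dataclass
class ClassificationTable:
    tp: int  # predicted death, died
    fp: int  # predicted death, survived
    fn: int  # predicted survival, died
    tn: int  # predicted survival, survived

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn) if self.tp + self.fn else float("nan")

    @property
    def false_positive_rate(self) -> float:
        return self.fp / (self.fp + self.tn) if self.fp + self.tn else float("nan")

    @property
    def correct_classification(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else float("nan")

    @property
    def mcnemar_chi2(self) -> float:
        # Uncorrected McNemar statistic on the discordant cells (an assumption).
        discordant = self.fp + self.fn
        return (self.fp - self.fn) ** 2 / discordant if discordant else float("nan")


def classify(predicted: Sequence[float], died: Sequence[bool], criterion: float) -> ClassificationTable:
    """Build the 2 x 2 table, predicting death when predicted mortality >= criterion."""
    tp = fp = fn = tn = 0
    for p, d in zip(predicted, died):
        if p >= criterion:
            if d:
                tp += 1
            else:
                fp += 1
        elif d:
            fn += 1
        else:
            tn += 1
    return ClassificationTable(tp, fp, fn, tn)
```

In use, one table is built per decision criterion, for example `[classify(pred, died, c) for c in (0.1, 0.5, 0.9)]`. Raising the criterion trades sensitivity for a lower false-positive rate, which is the pattern reported in Table 2.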
Calibration was tested using the Hosmer-Lemeshow goodness-of-fit H test (χ²₈ = 584.4; P < 0.001) and C test (χ²₈ = 313.0; P < 0.001). Both showed a significant difference between predicted and observed mortality (Tables 3 and 4). The calibration curve of the SAPS II system in this group of ICU patients showed lower observed than predicted mortality in all groups except the group with a predicted mortality of 40%. The deviation from the line of ideal calibration (full line) was larger in groups of patients with a higher predicted mortality (Figure 2).
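Below is a minimal sketch of the two Hosmer-Lemeshow statistics as described in the Statistical analysis section. C uses 10 groups of equal size after sorting by predicted mortality. H uses 10 fixed intervals of predicted mortality, assumed here to be 0-10%, 10-20%, ..., 90-100%. Each group contributes (O - E)² / [E(1 - E/n)]. The P value uses a χ² distribution with 8 degrees of freedom (10 groups minus 2), computed with the closed form that exists for even degrees of freedom. The function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def _hl_statistic(groups) -> float:
    """Sum (O - E)^2 / [E (1 - E/n)] over the non-empty groups."""
    total = 0.0
    for probs, outcomes in groups:
        n = len(probs)
        if n == 0:
            continue
        expected = math.fsum(probs)
        observed = sum(outcomes)
        denominator = expected * (1 - expected / n)
        if denominator > 0:  # skip degenerate groups (all predictions 0 or 1)
            total += (observed - expected) ** 2 / denominator
    return total


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability for even df: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow_c(predicted: Sequence[float], died: Sequence[bool], groups: int = 10):
    """C statistic: groups of (nearly) equal size after sorting by predicted mortality."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    buckets = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        buckets.append(([predicted[i] for i in idx], [int(died[i]) for i in idx]))
    stat = _hl_statistic(buckets)
    return stat, chi2_sf_even_df(stat, groups - 2)


def hosmer_lemeshow_h(predicted: Sequence[float], died: Sequence[bool], groups: int = 10):
    """H statistic: fixed intervals of predicted mortality, each of width 1/groups."""
    buckets = [([], []) for _ in range(groups)]
    for p, d in zip(predicted, died):
        k = min(int(p * groups), groups - 1)
        buckets[k][0].append(p)
        buckets[k][1].append(int(d))
    stat = _hl_statistic(buckets)
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A model that overestimates mortality in nearly every group yields large values of both statistics, even when its discrimination is good.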

Discussion
We report the results of a prospective, multicenter study performed in five university hospitals and one general hospital in Croatia. Multipurpose scoring systems that predict mortality, used to calculate the SMR, are the only methods that allow intensive care treatment results to be compared across patient groups, countries, and ICUs.
The demographic characteristics of patients treated in major Croatian hospitals were not significantly different from those of patients included in studies performed in Western countries (5-17). The median age of patients in this study was 64 years, and 61% were men. The percentage of medical patients was lower than in the original study and other studies (5-17). This was because the ICUs included in the study were managed by anesthesiologists, who primarily treated surgical patients. In all the studied hospitals, the ICUs not included in the study were managed by physicians of other specialties, such as cardiologists or neurologists, who treated mostly medical patients.
Compared with patients in studies performed in Western countries, the patients in our study had a low SAPS II score and a low predicted mortality. This reflects the low proportion of medical patients. Most patients were elective surgical patients, who are usually admitted to the ICU for 24-hour monitoring and have a low predicted mortality. Accordingly, the lengths of ICU and hospital stay were shorter than those reported in the literature (5-17). In addition, the observed ICU and in-hospital mortality were low compared with those reported in Western countries (5-17).
The influence of different variables on survival was similar to that reported in Western countries (5-17). The survivors were younger and had a lower predicted mortality, and survival did not differ between men and women. The highest mortality was recorded in medical patients, followed by emergency surgical and elective surgical patients. The survivors had a shorter ICU stay, while the length of hospital stay was similar in survivors and deceased patients.
The SMR was 0.89 (95% CI 0.79-0.98). It deviated toward a lower value in only one hospital, a specialized hospital for thoracic surgical patients where all operated patients are admitted to the ICU by default. In this hospital, 93% of the patients were elective surgical patients, and the SMR and 95% CI of elective surgical patients deviated from those of emergency surgical and medical patients. This hospital was therefore probably the reason for the overall lack of uniformity of fit. The other hospitals did not differ in SMRs and 95% CIs, indicating similar uniformity of fit. In addition, the total SMR was within the range reported in Western countries (5-17). However, these comparisons should be interpreted with caution, because those studies were reported approximately 10 years ago. Changes in population and treatment over time may alter patient prognosis and thereby limit the applicability of prognostic models (28).
The validation of the SAPS II system in this group of ICU patients in Croatia showed good discrimination. The system distinguished well between patients who would live and those who would die, similar to data from Western countries. The 2 × 2 classification tables showed low sensitivity, low false-positive rates, and low overall correct classification for all decision criteria. This finding has been reported previously in samples that included a large proportion of patients with a low predicted mortality (29). Our calibration tests showed that, in this Croatian ICU population, SAPS II predicted a higher mortality than was observed, which calls for caution when assessment is made at the individual level. Prediction at the individual level may be improved by customizing the SAPS II system for patients treated in Croatian hospitals, as shown in previous studies (6,12,16). A possible limitation is that the ICUs included in the study were managed by anesthesiologists, who primarily treated surgical patients. However, SAPS II was designed to overcome differences between ICU patients by focusing on the severity rather than the type of disease.
SAPS II could be a very useful tool for benchmarking, that is, comparison between similar ICUs. Benchmarking has recently been included among the indicators for improving the safety and quality of care for intensive care patients (30). To this end, it is necessary to create a national database of intensive care outcomes, compare existing ICUs, identify ICUs with excellent practice, and disseminate that practice throughout the country (31).
The SAPS II system discriminates well between patients who will survive and those who will die. However, it overestimates mortality in the analyzed group of ICU patients, so prognosis for individual patients has to be made with caution.
Acknowledgment The authors thank the management of the CAAICM, under whose auspices the project "The Performance of Intensive Care Medicine in the Republic of Croatia" was carried out, and Dražen Matleković, who created the project database. They also thank all their colleagues who participated in data collection, including all physicians at the Department of Anesthesiology and Intensive Care, Merkur University Hospital, and N.
Funding None.
Ethical approval Received from the ethics committee of the Rijeka University Hospital Center.
Declaration of authorship KD was the first investigator, responsible for the design, data collection, statistics, interpretation, and writing of the manuscript. MP designed the study, contributed to the interpretation of the results and data analysis, and approved the final version. IH made substantial contributions to the conception and design, acquisition of data, and drafting of the manuscript, and approved the final version. AŠ participated in the design and supervision of the research. AK made substantial contributions to the conception and design, acquisition of data, and drafting of the manuscript, and approved the final version. VK made substantial contributions to the manuscript preparation. DM made substantial contributions to the design and acquisition of data, revised the manuscript, and approved the final version. BPV was a coauthor in this study. MŽB made substantial contributions to the analysis and interpretation of data, drafting of the manuscript (statistical methods and results), critical revision, and the final version of the manuscript. JS contributed to the study design and analysis of results. MŠ contributed to the collection and processing of data from his hospital. DB made substantial contributions to the conception and design, acquisition of data, and drafting of the manuscript, and approved the final version. JŠM was an investigator and participated in the acquisition of data and manuscript preparation. DG participated in the recruitment and analysis of the PTS. DOJ was a coauthor in this study.