NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

ECRI Health Technology Assessment Group. Treatment of Degenerative Lumbar Spinal Stenosis. Rockville (MD): Agency for Healthcare Research and Quality (US); 2001 Jun. (Evidence Reports/Technology Assessments, No. 32.)

  • This publication is provided for historical reference only and the information may be out of date.


Treatment of Degenerative Lumbar Spinal Stenosis.


3 Results

The sequence of events leading to spinal stenosis was described in the introduction to this report. In this chapter, we further describe the natural history of lumbar stenosis by examining the evidence pertaining to questions 1-3, which were previously presented in the evidence model. Each of these questions pertains to a specific link in the evidence model.

Question 1 ("What is the relationship between each relevant patient characteristic and the presence and/or intensity of each of the patient signs, symptoms, and conditions of lumbar spinal stenosis?") examines the overarching link from patient characteristics to symptoms of stenosis. Examination of the issues pertinent to this link is complicated by the fact that many of the symptoms of stenosis are shared by other conditions, such as herniated spinal disks or vascular claudication. For this reason, this link is broken down into two shorter links. Question 2 ("Which relevant patient characteristics are associated with an increased likelihood of focal narrowing of the spinal canal?") examines the link between patient characteristics and the development of spinal stenosis. Question 3 ("What is the relationship between degree of stenosis and the presence and/or intensity of each of the signs, symptoms, and patient conditions?") examines the link between having stenosis and developing each of the signs and symptoms of the condition.

Question 1. What is the relationship between each relevant patient characteristic and the presence and/or intensity of each of the patient signs, symptoms, and conditions of lumbar spinal stenosis?

Certain patients may be more vulnerable to the effects of spinal stenosis. In this section, we first examine the available evidence concerning whether there is a relationship between canal diameter and the development of back pain, claudication, or radicular pain/sciatica. We then examine data on whether there is a relationship between patient age or sex or the presence of a herniated disk and the development of stenotic symptoms.

In answer to this first question, we consider only studies that reported some measurement of spinal diameter and that reported data on whether patients had or did not have a particular symptom. In many cases, this limits our treatment of the data to case-control studies, because they were the only studies to report symptomology. Due to a relative lack of information, parts of our answer to this question take the form of a systematic narrative review. This review summarizes and critiques the currently available studies; it is also intended to show where further data are needed.

Relationship Between Spinal Canal Diameter and the Presence of Back Pain, Claudication, or Radicular Pain/Sciatica

Although it is commonly believed that the development of the symptoms of lumbar spinal stenosis is related to the size of the spinal canal (Clark, 1969; Postacchini and Perugia, 1991), very little evidence is available to support this belief. While more severe degeneration and degree of slippage are often associated with more severe symptoms, the extent of neural deficit may depend more on the degree of neural compression (Postacchini and Perugia, 1991). If the canal is wide and the stenosis is mild, there is relatively little or no compression of the dural sac. When the spinal canal is normally sized, stenosis may lead to compression of the dural sac. If the spinal canal is smaller than normal, the sac may be severely compressed even if there is only slight narrowing or vertebral slippage. When the canal is small, a developing stenosis may tend to become symptomatic earlier and is likely to present more severe deficits than in a wider canal (Clark, 1969).

There is an important distinction between canal size and stenosis. Canal size refers to the size of the healthy, nonstenotic canal. On the other hand, stenosis refers to a focal narrowing. When canal diameter is reported for stenotic patients, the measurement generally refers to canal size at the narrowest point. When canal size is described in epidemiologic or anthropologic studies such as those described in this section of this report, it refers to canal size at predefined levels. Although the spinal canal tends to narrow with age (Twomey and Taylor, 1988), the point at which normal narrowing becomes stenosis is imprecise.

The spinal canal is usually measured in one of three ways. The most common is to measure the midsagittal (anterior-posterior) diameter (Panjabi, Duranceau, Goel et al., 1991). This is the simple diameter of the canal at the widest point of the sagittal plane, and is measured in centimeters or millimeters. The sagittal diameter is the smallest dimension of the canal (Alvarez and Hardy Jr, 1998). The interpedicular distance, which is the diameter of the canal in the frontal plane, is usually considerably wider. Midsagittal diameter averages approximately 13.4 to 20.4 mm, while interpedicular distance is normally 19 to 27 mm (Postacchini, Pezzeri, Montanaro et al., 1980a). These measurements vary by spinal level as well as by the age, race, and size of the individual patient.

In some cases, the cross sectional area of the canal is measured. This may be a more informative measure if stenosis is occurring primarily in the lateral recesses. Finally, some researchers report the canal size in a ratio with the size of the vertebral body (Jones and Thomson, 1968). This ratio takes into account the observation that differently sized people have different sizes of canal. This reflects an attempt to define a "normal" canal size in relation to the size of the subject. It is not known whether a small person with a correspondingly small canal is less susceptible to spinal stenosis than a large person with a canal of similar size.

Spinal measurements are normally expressed individually for one or more discrete spinal levels. Spinal canal measurements of patients with stenosis may convey information about the global narrowing associated with degenerative changes in the spinal canal, but may not reflect the extent of focal narrowing in the stenotic area of the canal. When an osteophyte or an enlarged ligamentum flavum impinges on a spinal canal, the canal size at this point may be considerably less than in the surrounding area of the canal. This focal narrowing may not be reflected in the published measurements of canal size. The precise area of stenosis that is responsible for the patient's symptoms may be difficult to determine from diagnostic visualization (see section on Diagnostic Tests).

To investigate the relationship between spinal canal size and symptoms of stenosis, we conducted a series of meta-analyses of studies that compared spinal canal sizes among patients with back pain, claudication, or radicular pain to those of asymptomatic age-matched controls. The methods we used for our meta-analysis are described in the section entitled "Approaches to Evaluating and Combining Evidence" (see Chapter 2).

In the meta-analyses for this first question, we compared the canal diameters of patients with and without back pain and patients with and without claudication. The data we used in this meta-analysis are provided in Table 6. Also shown in this table are data from patients with and without radicular pain/sciatica (for reasons discussed below, these data could not be meta-analytically assessed). Although all of the patients in these studies have at least one symptom associated with spinal stenosis, they do not all have stenosis.

Table 6. Spinal Measurements in Patients with Symptoms of Stenosis.


Many of the relevant studies provided spinal canal measurements at more than one level or in more than one set of units. If a study provided more than one unit of measurement, sagittal diameter was preferentially used in our meta-analyses. Similarly, if a study provided canal size at more than one level, then we used the size at L4 in our meta-analyses. This level was reported by all studies that reported multiple levels and is between L3 and L5, each of which was the only level reported by two studies (Anderson, Adcock, Chovil et al., 1988; Hultman, Saraste, and Ohlsen, 1992). Thus, L4 is the most commonly reported level. If canal measurements were given for more than one patient group, then we used the patient group with the most severe symptoms in our meta-analyses, as long as that patient group included at least 10 patients. For the sake of clarity, the measurements used in the meta-analyses are indicated with boldface type.

Five studies compared spinal canal measurements among patients with or without back pain (Anderson, Adcock, Chovil et al., 1988; Drinkall, Porter, Hibbert et al., 1984; Hamanishi, Matukura, Fujita et al., 1994; Hultman, Saraste, and Ohlsen, 1992; Macdonald, Porter, Hibbert et al., 1984). Differences in spinal canal measurements between patients with or without back pain were statistically significant in two studies (Drinkall, Porter, Hibbert et al., 1984; Macdonald, Porter, Hibbert et al., 1984). The summary effect size for the meta-analytically combined studies was statistically significant (d = −0.541, p <0.0001). Neither test for heterogeneity was significant (Q = 1.82, p = 0.769, largest standardized residual = −1.321). Effect sizes of the individual studies, the 95 percent confidence intervals around them, and the appropriate meta-analytic statistics are presented in Table 7 and illustrated graphically with a forest plot in Figure 11. The effect sizes are negative because spinal canals of patients with back pain tend to be smaller than those of control patients.
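The pooling and heterogeneity statistics described above can be illustrated with a generic fixed-effect, inverse-variance calculation. This is only a sketch under stated assumptions: the report's actual procedure is the one described in Chapter 2 and may differ, the function names are hypothetical, and the effect sizes and variances in the example are invented for illustration (the real values are in Table 7). The chi-square tail function is included because a Q of 1.82 with five studies (4 degrees of freedom) corresponds to an upper-tail probability of about 0.769, consistent with the value reported above.

```python
import math


def pool_fixed_effect(effects, variances):
    """Inverse-variance weighted mean effect, its standard error, and Cochran's Q."""
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    d_bar = sum(w * d for w, d in zip(weights, effects)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    q = sum(w * (d - d_bar) ** 2 for w, d in zip(weights, effects))
    return d_bar, se, q


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite series in x
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


# HYPOTHETICAL effect sizes and variances, for illustration only.
effects = [-0.40, -0.70, -0.55, -0.30, -0.65]
variances = [0.05, 0.04, 0.06, 0.08, 0.05]

d_bar, se, q = pool_fixed_effect(effects, variances)
p_heterogeneity = chi2_sf(q, len(effects) - 1)
print(f"summary d = {d_bar:.3f}, SE = {se:.3f}, Q = {q:.2f}, p = {p_heterogeneity:.3f}")
```

A nonsignificant Q, as in the analysis above, indicates only that the observed spread among study effect sizes is compatible with sampling error; it does not show that the studies are identical.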

Table 7. Summary of Meta-Analysis of Differences in Spinal Canal Diameter Between Patients With and Without Back Pain.


Figure 11. Forest Plot of Differences in Spinal Canal Diameter in Patients With and Without Back Pain.

Figure 12 shows the normalized distributions of the canal sizes among patients with and without back pain. In this figure, the mean of the distribution of canal sizes among patients with no back pain is set to zero. The mean canal size among patients with back pain is 0.541 standard deviation units less than the mean canal size of control patients without pain. While the difference between means is statistically significant, there is a 64.8 percent overlap between the distributions of the spinal canal diameters of patients with and without back pain. The overlap of the two distributions demonstrates that although there is a statistically significant relationship between spinal diameter and back pain, this relationship is not perfect. The magnitude of the relationship between canal diameter and back pain can also be illustrated by converting the summary d statistic into a correlation coefficient. This yields a moderately low correlation coefficient of −0.261. When this correlation coefficient is used to construct a binomial effect size display (BESD, Table 8), we find that approximately 1.7 times as many patients with back pain as age-matched control subjects without back pain have small spinal canals.
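The conversions described in this paragraph can be sketched in a few lines. The formulas below are standard textbook conventions rather than formulas quoted from this report: r = d / sqrt(d² + 4) assumes equal group sizes, the BESD rates are 0.5 ± |r|/2, and the overlap is taken as one minus Cohen's U1 nonoverlap measure under equal-variance normal distributions. These choices are assumptions, although applied to d = −0.541 they give values close to those reported above. Function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def d_to_r(d):
    """Convert a standardized mean difference to a correlation (equal group sizes assumed)."""
    return d / math.sqrt(d * d + 4.0)


def besd_rates(r):
    """Return the two BESD proportions, 0.5 + |r|/2 and 0.5 - |r|/2."""
    half = abs(r) / 2.0
    return 0.5 + half, 0.5 - half


def overlap_fraction(d):
    """Overlap of two equal-variance normal distributions, as 1 minus Cohen's U1."""
    phi = normal_cdf(abs(d) / 2.0)
    u1 = (2.0 * phi - 1.0) / phi
    return 1.0 - u1


d = -0.541  # summary effect size for back pain reported in the text
r = d_to_r(d)
higher, lower = besd_rates(r)
print(f"r = {r:.3f}")
print(f"BESD rates = {higher:.3f} vs {lower:.3f}, ratio = {higher / lower:.2f}")
print(f"overlap = {overlap_fraction(d):.3f}")
```

Because the BESD fixes both margins of the 2 by 2 table at 50 percent, the ratio of its two rates describes the strength of the association only; it is not an estimate of risk.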

Figure 12. Overlap of the Distributions of Normalized Canal Diameters in Patients With and Without Back Pain.


Table 8. Binomial Effect Size Display: Canal Size in Patients With and Without Back Pain.


We caution that even the modest effect obtained in our meta-analysis may be an overestimate of the magnitude of the relationship between pain and canal size. This is because the studies in this analysis may not distinguish a congenitally narrow canal from focal narrowing of the canal. It is possible that some of the patients in these studies had focal spinal stenosis. If the spinal canal measurements in these patients were taken at the level of stenosis, this would lead to a smaller mean canal size for that patient group. If patients with back pain are more likely to have stenosis than those without back pain, this would introduce bias toward finding smaller canals in the pain patients. The relationship between stenosis and back pain is discussed in question 3 ("What is the relationship between degree of stenosis and the presence and/or intensity of each of the signs, symptoms, and patient conditions?").

The fact that patients with back pain tend to have small canals does not allow one to conclude that patients with small canals are more prone to have back pain. This is because, as with all case-control studies, the authors were forced to artificially set the prevalence of pain during the enrollment of patients into these studies (this artificiality is typically required to ensure there are enough cases and controls in the study to obtain appropriate statistical power). One likely consequence is that the proportion of patients with back pain in these studies is much higher than it is in the general population. This means that calculations of predictive values or relative risks cannot be performed on these data, which, in turn, means that it is not possible to use these data to determine whether patients with small canals are prone to have back pain.

This caution also applies to the BESD; although it is in the form of a 2 by 2 table, one cannot use the data in it to compute prevalence. This caution concerning the BESD is reinforced by the fact that this display artificially sets the prevalence of pain to be 50 percent.
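The dependence of predictive value on prevalence can be shown numerically. The following is a minimal sketch and not part of the report's analysis: it treats the two BESD rates implied by r = −0.261 (0.5 ± 0.261/2) as if they were the probabilities of a small canal given pain and given no pain, then applies Bayes' rule under several hypothetical prevalences of back pain. The prevalence values and the function name are illustrative assumptions only.

```python
def predictive_value(p_small_given_pain, p_small_given_no_pain, prevalence):
    """Probability of back pain given a small canal, by Bayes' rule."""
    true_pos = p_small_given_pain * prevalence
    false_pos = p_small_given_no_pain * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# BESD rates implied by r = -0.261 (illustrative use only).
P_SMALL_GIVEN_PAIN = 0.5 + 0.261 / 2.0
P_SMALL_GIVEN_NO_PAIN = 0.5 - 0.261 / 2.0

# HYPOTHETICAL prevalences; 0.50 is the value the BESD itself imposes.
for prevalence in (0.50, 0.20, 0.05):
    ppv = predictive_value(P_SMALL_GIVEN_PAIN, P_SMALL_GIVEN_NO_PAIN, prevalence)
    print(f"prevalence = {prevalence:.2f} -> P(pain | small canal) = {ppv:.3f}")
```

The same pair of conditional rates yields very different predictive values as the assumed prevalence changes, which is why a table with an artificially fixed prevalence cannot show whether patients with small canals are prone to back pain.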

In our next meta-analysis, we sought to determine the relationship between the presence or absence of neurogenic claudication and spinal canal diameter. Only two studies compared spinal canal measurements among patients with claudication to measurements in healthy controls (Hamanishi, Matukura, Fujita et al., 1994; Uden, Johnsson, Jonsson et al., 1985). Therefore, the results of this meta-analysis are to be viewed as illustrating a trend in currently available data, and not as providing a definitive estimate of the magnitude of the relationship between these two variables. The raw data from the two studies that contributed data to this meta-analysis are shown in Table 6.

Differences in canal diameter between patients with and without claudication proved to be statistically significant in both studies. When data were combined, neither test of heterogeneity was statistically significant (Q = 2.15, p = 0.142, standardized residuals = ±1.467). The effect size for the combined groups was statistically significant (d = −1.23, p = 0.00001). Effect sizes and the 95 percent confidence intervals around them can be found in Table 9 and are illustrated graphically with a forest plot in Figure 13.

Table 9. Summary of Meta-Analysis of Differences in Spinal Canal Diameters of Patients With and Without Claudication.


Figure 13. Forest Plot of Differences in Spinal Canal Diameters in Patients With and Without Claudication.

There is a 36.8 percent overlap between groups, as illustrated in Figure 14. The binomial effect size display in Table 10 suggests that if the results of this meta-analysis can be generalized to the population at large, a person with claudication is approximately three times more likely to have a small spinal canal than a healthy person with no claudication.

Figure 14. Overlap of the Distributions of Normalized Canal Measurements in Patients With and Without Claudication.


Table 10. Binomial Effect Size Display: Canal Size in Patients With and Without Claudication.


As in the preceding analysis, data on the prevalence of neurogenic claudication are not available. Therefore, these data cannot be used to determine whether canal size predicts the development of claudication.

Next, we compared canal diameters among patients with and without sciatica or radicular pain. As in the preceding analysis, only two studies (Hamanishi, Matukura, Fujita et al., 1994; Uden, Johnsson, Jonsson et al., 1985) provided relevant data. The raw data from these two studies appear in Table 6.

Examination of these two studies revealed that both had difficulties with experimental design that would have made the results of a meta-analysis difficult to interpret. In both studies, patients with different symptoms were divided into mutually exclusive groups. In the study by Hamanishi, 16 of 53 patients with radicular pain also had claudication (Hamanishi, Matukura, Fujita et al., 1994). Some or even many of the patients in the claudication group in the study by Uden may also have had radicular pain. By stratifying patients in this manner, patients with both claudication (which, as shown above, is associated with smaller canal diameters) and radicular pain were eliminated from the sciatica groups, leading to bias in favor of the radicular pain groups having larger canals. For this reason, additional studies are necessary to quantify the relationship between canal size and radicular pain.

Relationship Between Patient Age and Symptoms of Spinal Stenosis

Next, we examined the relationship between patient age and symptoms of lumbar stenosis. Due to a lack of extensive research, our approach to determining the relationship between age and symptoms is, by necessity, a narrative review. Epidemiological surveys show that in the United States, low back pain is most prevalent between the ages of 55 and 64 (Deyo and Tsui-Wu, 1987). The proportion of this pain that is due to stenosis is not known. Initial onset of pain is most commonly reported in the second decade and remains high through the fourth decade, after which it declines (Deyo and Tsui-Wu, 1987). Lumbar instability, which correlates with back pain, is significantly higher among patients under the age of 45 (Friberg, 1989).

A 1998 report states that disability, as assessed using the Oswestry index (a measure of disability from back pain on which a score of 100 means the patient is completely disabled), correlates significantly with age among patients with stenosis (Hurri, Slatis, Soini et al., 1998). Patients between the ages of 41 and 55 years had a mean Oswestry index of 29.7, while patients aged 56 and older had a mean score of 41.3. This index may capture other disabilities that relate to age but not to spinal stenosis.

Relationship Between Patient Sex and Symptoms of Spinal Stenosis

Although the incidence of disk degeneration in men and women is similar (Lawrence, 1969), men tend to have more severe degeneration. The same survey reports an association between extent of lumbar disk degeneration and severity of back, leg, and hip pain and incapacity. Presence of spondylolisthesis was associated with increased likelihood of back pain in women (p = 0.044), but not in men (Virta and Ronnemaa, 1993). A significant interaction was found between spondylolisthesis and sex, suggesting that the association between spondylolisthesis and back pain is sex dependent (Virta and Ronnemaa, 1993). This association is not explained by the slightly greater degree of spondylolisthetic slip found in women in this study.

Relationship Between Presence of a Herniated Disk and Symptoms of Spinal Stenosis

Spinal stenosis is frequently the result of disk degeneration. Herniated disks are also a result of disk degeneration. For this reason, it is not surprising that herniated disks are often found among patients with spinal stenosis. Of the 35 studies retrieved that reported this information, 692 out of 2,327 (29.7 percent) patients treated for lumbar spinal stenosis also had herniated or protruding disks (see Evidence Table 25*). The range of disk abnormalities among these studies was rather broad; 2.9 percent to 100 percent of stenosis patients also had disk abnormalities (studies that used the presence of herniated disks as part of their inclusion or exclusion criteria were not included in this calculation). While we can conclude that there is an association between herniated disks and spinal stenosis, current data do not permit us to provide a precise estimate of the magnitude of that association or to provide evidence that one causes the other. Narrowing of the spinal canal due to herniated or protruding disks is considered a separate entity from spinal stenosis (Amundsen, Weber, Lilleas et al., 1995) but may contribute to the overall symptomology of the condition.

Summary

Patients with back pain or claudication tend to have narrower spines than asymptomatic patients. This suggests, although it does not prove, that patients with congenitally narrow spines may be more prone to developing symptoms of stenosis, especially back pain and claudication. Increased patient age and the presence of herniated disks may also contribute to the development of back pain and other symptoms of stenosis. The strength of these relationships and the exact ages at which patients are most likely to develop symptoms cannot be determined from the information available. There is some evidence that women with spondylolisthesis are more likely to experience pain than are men.

Question 2. Which relevant patient characteristics are associated with an increased likelihood of focal narrowing of the spinal canal?

Certain patients may be more likely to develop spinal stenosis. In this section, we examine the available evidence linking patient characteristics to developing a symptomatic narrowing of the spinal canal.

As with the preceding question, we consider only studies that reported some measurement of spinal diameter and that reported data on whether patients had or did not have a particular symptom. In some cases, this limits our treatment of the data to case-control studies, because they were the only studies to report symptomology. Due to a relative lack of information, parts of our answer to this question take the form of a systematic narrative review. This review summarizes and critiques the currently available studies; it is also intended to show where further data are needed.

Relationship Between Nonstenotic Canal Diameter and Development of Focal Spinal Stenosis

The results of the preceding analyses suggest that some relationship exists between symptoms and spinal canal size. Therefore, we conducted a meta-analysis to explore the relationship between focal spinal canal narrowing and symptomatic spinal stenosis. We located four studies that compared measurements of spinal canals among patients with stenosis to those of control subjects. They are listed in Table 11. One of these studies (Schonstrom, Bolender, and Spengler, 1985) provided only one measure in common for control and stenotic patients. This measure, mean transverse area of the dural sac, was calculated from the transverse area of the dural sac at four levels of the lumbar spine. However, none of the measurements at any level included data from all 13 control subjects. At two of the levels, data from only five subjects were used. Nearly half (27 of 52) of the measurements that could have been used to calculate mean transverse area were omitted. No data were provided on the number of levels measured for the stenosis patients. In addition, 7 of the 24 patients said to have stenosis in this study actually had protruding disks. Because of these issues of poor design and reporting, this study was excluded from the meta-analysis. For the remaining studies, we chose data for incorporation into the meta-analysis in the same manner as described in the section entitled "Relationship Between Spinal Canal Diameter and the Presence of Back Pain, Claudication, or Radicular Pain/Sciatica." Numbers used to calculate effect sizes appear in Table 11 in bold type.

Table 11. Spinal Measurements of Patients With Stenosis and in Control Subjects.


The 1992 study by Yoshida (Yoshida, Shima, Taniguchi et al., 1992) incorporated two potential confounding factors that may influence interpretation of the results. First, there was an age difference between patient groups. Patients with stenosis had a mean age of 63.8 years. They were compared to a control group with a mean age of 20.2 years. Because the spinal canal tends to narrow with age (Twomey and Taylor, 1988), the younger controls may have larger canals than an age-matched control group might have had. This would have the effect of introducing bias in favor of finding a significant difference between groups. Second, the control patients all had herniated disks. Patients with disk problems may tend to have smaller spinal canals than do healthy controls (Heliovaara, Vanharanta, Korpi et al., 1986; Ramani, 1974; Ramani, 1976; Winston, Rumbaugh, and Colucci, 1984). If this is the case, this would bias the analysis against finding a significant difference between groups. The effect size from the study by Yoshida (−1.271) is larger (albeit not significantly so) than those from two other studies (Kim and Lee, 1995; Prasartritha, Suntisathaporn, Vathana et al., 1997; see Table 12). The extent to which the age differences between the stenosis patients and the control subjects contributed to this effect cannot be determined. Because of this evidence of confounding, the Yoshida study was excluded from the meta-analysis. As a result, only two studies compared spinal canal measurements among patients with stenosis to measurements in healthy controls. The results of this meta-analysis are therefore to be viewed as illustrating a trend in currently available data, and not as providing a definitive estimate of the magnitude of the relationship between these two variables.

Table 12. Summary of Meta-Analysis of Differences in Canal Diameter Between Patients With and Without Stenosis.


The 1995 study by Kim and Lee had two groups of patients with spondylolisthesis. Because isthmic spondylolisthesis is outside the scope of this assessment, the spinal measurements of patients with degenerative spondylolisthesis were used (Kim and Lee, 1995).

Prasartritha et al. (1997) did not provide a measure of dispersion among measurements of sagittal diameter (Prasartritha, Suntisathaporn, Vathana et al., 1997). Because of this, an exact effect size cannot be calculated. However, the report stated that the mean canal size was significantly different (p <0.05) between groups by t-test. Because the number of patients in each group was known, we were able to calculate the value of Student's t, assuming a p-value of 0.049. This translates to an effect size of at least −0.587. If the actual p-value was smaller (as it may well be), Student's t, and therefore Hedges' d, would be larger. The overall effect size calculated here is therefore a minimum value. How much further it may actually be from zero cannot be determined from the data provided.
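The back-calculation described above can be sketched as follows. This is an illustration under explicit assumptions, not the report's own computation: it assumes a two-sided, independent two-sample t-test, converts t to a standardized mean difference with d = t · sqrt(1/n1 + 1/n2), and applies the usual small-sample correction factor 1 − 3/(4·df − 1). The group sizes in the example are hypothetical (the actual sizes are in Table 11), and the function names are invented for this sketch. Only the Python standard library is used, so the critical t value is found by numerical integration and bisection.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2.0) - math.lgamma(df / 2.0)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2.0 * math.log1p(x * x / df))


def t_area_from_zero(t, df, steps=1000):
    """Integral of the t density from 0 to t by Simpson's rule (steps must be even)."""
    if t == 0.0:
        return 0.0
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3.0


def t_from_two_sided_p(p, df):
    """Find |t| such that the two-sided p-value equals p, by bisection."""
    target = (1.0 - p) / 2.0
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if t_area_from_zero(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def minimum_effect_size(p, n1, n2):
    """Smallest |d| compatible with a two-sided p-value of p (assumed test design)."""
    df = n1 + n2 - 2
    t = t_from_two_sided_p(p, df)
    d = t * math.sqrt(1.0 / n1 + 1.0 / n2)
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)  # small-sample bias correction
    return correction * d


# HYPOTHETICAL group sizes; the real values are not given in this passage.
n_stenosis, n_control = 30, 30
print(f"minimum |d| = {minimum_effect_size(0.049, n_stenosis, n_control):.3f}")
```

Because a smaller p-value implies a larger t, the value returned is a lower bound on the magnitude of the effect; the sign is assigned separately from the direction of the reported difference.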

Effect sizes (Hedges' d) were calculated as described in Chapter 2 of this evidence report. Effect sizes for each study, the summary statistic, and the 95 percent confidence intervals around them can be found in Table 12, and are illustrated graphically with a forest plot in Figure 15.

Figure 15. Forest Plot of Differences in Spinal Canal Between Patients With and Without Spinal Stenosis.

When data from all groups were combined, the test of heterogeneity was not statistically significant (Q = 0.002, p = 0.964, standardized residuals = ±0.045), and the summary statistic, −0.597, was statistically significant (p = 0.002).

While the difference between means is statistically significant, there is 61.97 percent overlap between groups, as illustrated in Figure 16. The correlation between canal size and the presence and absence of stenosis is significant but low (r = −0.286). The binomial effect size display shown in Table 13 suggests that if the results of this meta-analysis can be generalized to the population at large, a person with stenosis is approximately 1.78 times more likely than a healthy person with no stenosis to have a small spinal canal. Although people with symptomatic stenosis tend to have smaller spinal canals than people without stenosis, the presence of a small canal is not necessarily predictive of stenosis.

Figure 16. Overlap between Normalized Canal Measurements in Patients With and Without Stenosis.


Table 13. Binomial Effect Size Display: Canal Size in Patients With and Without Stenosis.


Relationship Between Nonstenotic Canal Cross-Sectional Area and Development of Focal Spinal Stenosis

Besides canal diameter, other aspects of the spinal canal may make an individual patient more prone to develop spinal stenosis. A well-documented anatomical variation known as the trefoil-shaped canal may predispose the patient to problems associated with lateral stenosis (Epstein, Epstein, and Lavine, 1962; Schatzker and Pennal, 1968). This variation is characterized by a smaller sagittal diameter and deeper lateral recesses. The association between trefoil canal and spinal stenosis may be merely coincidental (Eisenstein, 1980). A study of trefoil canal in South African skeletons found that while stenosis and trefoil canal may coexist in the same skeleton, they are often present at different vertebrae. Other attempts have been made to identify spinal characteristics that are predictive of future stenosis (MacGibbon and Farfan, 1979). No information is available quantifying the accuracy of these predictions.

Relationship Between Patient Age and Development of Focal Spinal Stenosis

Disk degeneration, spinal instability, and hypertrophy of the facet joints accompany the aging process (Prescher, 1998; Twomey and Taylor, 1988). Degeneration normally begins in the second decade (Mirkovic, Garfin, Rydevik et al., 1992), and its prevalence increases with age (Powell, Wilson, Szypryt et al., 1986). By the age of 40 years, 80 percent of male and 65 percent of female disks are moderately degenerated (Mirkovic, Garfin, Rydevik et al., 1992). Increasing age also correlates with increasing spinal osteoarthritis (Magora and Schwartz, 1978). While increased disk degeneration and spinal osteoarthritis imply increased stenosis, this has not been specifically demonstrated.

A 1988 study of cadavers found a decline in lumbar spinal canal anterior-posterior diameter with age that was significant in males (t = 3.95, df = 4, p <0.01) but not in females (Twomey and Taylor, 1988). This study examined lumbar spines of men and women aged 25 to 35 and older than 65. Our meta-analyses indicate an association between canal size and back pain, claudication, and stenosis. A narrower spinal canal may render the patient more vulnerable to developing symptomatic stenosis in cases of minor additional trauma or pathology (Clark, 1969; Twomey and Taylor, 1988). The narrowing of the canal with age may therefore contribute to the development of symptomatic stenosis.

There is a correlation between presence of calcium deposits and decreased elastic/collagenous fiber ratio in the ligamentum flavum and age among patients with degenerative stenosis (Schrader, Grob, Rahn et al., 1999). Patients under the age of 60 had 0.032 percent calcification, while patients between the ages of 70 and 75 had 0.336 percent calcification. This correlation was statistically significant (p <0.05) by the Mann-Whitney-Wilcoxon test. Few patients without degenerative stenosis had these changes. No correlation was reported between degree of change and degree of stenosis or symptoms.

Relationship Between Patient Weight and Development of Focal Spinal Stenosis

A 1976 survey indicates that increased weight correlates with increased incidence of spinal osteoarthritis (Magora and Schwartz, 1978). Only 53 percent of slim patients (weight in kg less than height in cm minus 110) had spinal osteoarthritis, compared to 59 percent of average patients (weight in kg equal to height in cm minus 90-110), 69 percent of heavy patients (weight in kg equal to height in cm minus 80 to 90) and 94 percent of very heavy patients (weight in kg more than height in cm minus 80). Overweight patients may also be more likely to have disk degeneration than normal patients (Parkkola, Rytokoski, and Kormano, 1993). A 1991 survey stated that body mass index is a predictor of disk degeneration according to univariate and multivariate analyses, but did not provide a description of the statistical analysis utilized to reach this conclusion (Symmons, van Hemert, Vandenbroucke et al., 1991). The clinical significance of these findings is unclear. Disk degeneration is a common precursor to stenosis, but it occurs in all patients to some extent, and does not inevitably lead to stenosis. No direct evidence in support of a link between body weight and development of stenosis is available.
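The weight categories quoted above are defined by the difference between height in centimeters and weight in kilograms. The Python sketch below is only an illustration of that rule; the function name `weight_category` and the handling of values that fall exactly on a boundary (90 or 110) are our assumptions, since the survey's wording does not say which category a boundary value belongs to.

```python
def weight_category(height_cm: float, weight_kg: float) -> str:
    """Classify a patient by (height in cm) minus (weight in kg).

    Thresholds follow the categories quoted from Magora and Schwartz.
    Boundary handling (exactly 80, 90, or 110) is an assumption.
    """
    difference = height_cm - weight_kg
    if difference > 110:
        return "slim"
    if difference >= 90:
        return "average"
    if difference >= 80:
        return "heavy"
    return "very heavy"


# Percentage of patients with spinal osteoarthritis reported per category.
OSTEOARTHRITIS_PCT = {"slim": 53, "average": 59, "heavy": 69, "very heavy": 94}

if __name__ == "__main__":
    # Hypothetical patients, all 170 cm tall, for illustration only.
    for weight in (55, 70, 85, 95):
        category = weight_category(170, weight)
        print(weight, "kg:", category, OSTEOARTHRITIS_PCT[category], "percent")
```

Under these assumptions, a patient 170 cm tall would be classed as slim below 60 kg and as very heavy above 90 kg.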

Relationship Between Osteoarthritis and Development of Focal Spinal Stenosis

Among patients with severe osteoarthritis of one or both hips, there may be an increased frequency of moderate or severe degenerative changes as noted on radiographs in the lumbar spine. Among patients aged 40 to 59, 36 percent of patients with osteoarthritis of the hips had such changes, compared to only 4 percent of control patients matched for age, sex, and occupation. Among patients aged 60 to 69, 53 percent of those with osteoarthritis had moderate or severe degenerative changes of the lumbar spine, compared to 27 percent of matched control patients (Brewerton, 1983). No mention was made of presence or absence of stenosis or back pain, claudication, or other symptoms. In the seven studies giving this information, 42 out of 903 (4.7 percent) patients with lumbar spinal stenosis had osteoarthritis or other hip disease (see Evidence Table 25). Underreporting of this important comorbidity restricts our ability to determine its prevalence or clinical relevance. Whether hip and spine problems are both parts of a larger disease or whether mechanical changes resulting from hip disease lead to spinal degeneration is not known.

Relationship Between Diabetes and Development of Focal Spinal Stenosis

Although some skeletal defects may be more prevalent among patients with diabetes than in the general population, we were unable to locate any evidence on whether patients with diabetes are more prone to developing lumbar spinal stenosis. A cross-sectional epidemiological study published in 1994 found no significant difference in the prevalence of spondylolisthesis between diabetic and nondiabetic adults (Virta, Ronnemaa, and Laakso, 1994).

Relationship Between Type of Patient Employment and Development of Focal Spinal Stenosis

A 1993 report found no correlation (r = 0.07, n = 46) between work index (0-8 scale based on amount of lifting required in four categories of motion) and degree of spondylolisthesis (Virta and Ronnemaa, 1993). No numerical data were provided in support of this statement. In contrast, Lawrence found that coal miners and men doing outdoor or heavy manual work had earlier and greater degrees of disk degeneration than did business, professional or textile workers (Lawrence, 1969). Fifty percent of miners and 40 percent of outdoor workers aged 35-44 had disk degeneration of grade 2 or higher, compared to 14 percent of business and professional workers. Total incidence of disk degeneration for all age groups was significantly lower (p <0.01) in business, professional, and textile workers than in industrial, manual, trade, and outdoor workers or miners. The statistical test used was not stated. No such differences were found in women. Extent of degeneration as indicated by height of the intervertebral space (p = 0.003, n = 46) is associated with a greater degree of spondylolisthetic slip (Virta and Ronnemaa, 1993).

Summary

Patients with spinal stenosis tend to have narrower spines than asymptomatic patients. This suggests, although it does not prove, that patients with congenitally narrow spines may be more prone to developing focal stenosis. The shape of the patients' spinal canal may also contribute to the development of focal narrowing of the canal and symptomatic stenosis, but the evidence for this is weak. There is some evidence that disk degeneration, narrowing of the spinal canal, and degenerative changes in the spinal ligaments contributing to stenosis and instability increase with age. However, the strength of this relationship and the age at which stenosis is most likely to occur cannot be determined from the available information.

Heavier patients may be more likely to develop the degenerative changes leading to stenosis. Similarly, patients with osteoarthritis of the hips as well as patients who perform heavy labor tend to have more disk degeneration than other patients. While these data suggest a relationship between these characteristics and the development of stenosis, there is no direct evidence of a causal relationship. We located no evidence indicating a relationship between diabetes and spinal stenosis, and we located a single study giving evidence against the existence of such a link.

Question 3: What is the relationship between degree of stenosis and the presence and/or intensity of each of the signs, symptoms, and patient conditions?

Patients who have abnormal spinal measurements, stenosis, or spondylolisthesis upon imaging but no symptoms of stenosis are frequently described in the literature (Boden, Davis, Dina et al., 1990; Churchill, Taylor, Shimizu et al., 1988; Fitzgerald and Newman, 1976; Healy, Healy, Wong et al., 1996; Karantanas, Zibis, Papaliaga et al., 1998; Nagler and Bodack, 1993; Sandhu, Lakhanpal, and Gupta, 1976; Wiesel, Tsourmas, Feffer et al., 1984). Several studies also report that there is no correlation between imaging appearance or degree of stenosis or spondylolisthesis and intensity of symptoms (Amundsen, Weber, Lilleas et al., 1995; Cauchoix, Benoist, and Chassaing, 1976; Fitzgerald and Newman, 1976; Kikuchi, Hasue, Nishiyama et al., 1984; Nagler and Bodack, 1993; Postacchini and Perugia, 1991; Rosenberg, 1976; Virta and Ronnemaa, 1993), but the studies do not present numerical data or statistical findings in support of this observation. The lack of correlation could relate to the observation that symptoms tend to fluctuate considerably over time (Amundsen, Weber, Lilleas et al., 1995), but again, there is little data to support this position. One study reports that there is no difference in symptoms between lateral and central stenosis (Amundsen, Weber, Lilleas et al., 1995), but no evidence is presented in support of this.

Due to a relative lack of information, our answer to this question takes the form of a systematic narrative review. This review summarizes and critiques the currently available studies, and it is also intended to illustrate where more data are needed.

Relationship Between Degree of Stenosis and Back Pain

Three studies note a lack of association between degree of stenosis and back pain (Amundsen, Weber, Lilleas et al., 1995; Friberg, 1989; Virta and Ronnemaa, 1993). In two cases, no data were presented in support of this observation. The third study is discussed below.

This 1993 study found that presence of spondylolisthesis was associated with increased likelihood of back pain in women (p = 0.044), but not in men (Virta and Ronnemaa, 1993). Degree of slip was associated with extent of disk degeneration, but not with pain intensity (r = 0.14 in all subjects, r = 0.20 in symptomatic subjects) (Virta and Ronnemaa, 1993).

Table 14 contains data from a study by Friberg showing no relationship between degree of maximal slip and back pain (Friberg, 1989). The report went on to state that symptomatic and asymptomatic patients with spondylolisthesis had significantly different amounts of translatory instability as measured by dynamic traction-compression radiography. We conducted unpaired t-tests on the published data and showed a statistically significant difference between groups (t = 7.48, p <0.00001). Thus, while degree of maximal slip may not be related to severity of back pain, translatory instability may correlate strongly with back pain. This may therefore be a more important measure of severity of stenosis than maximal slip.

Table 14. Relationship Between Back Pain and Spondylolisthesis.


Relationship Between Degree of Stenosis and Leg Pain

A study of clinical and radiological features of stenosis reported that bilateral leg pain is slightly more frequent among patients with central stenosis than in those with lateral stenosis (Amundsen, Weber, Lilleas et al., 1995), but the study presented no data or statistical analysis of this observation.

Relationship Between Degree of Stenosis and Neurogenic Claudication

A 1991 report observed that claudication is associated with severe stenosis of the entire canal (Postacchini and Perugia, 1991). No data were presented to support this observation. Claudication may be more common with spondylolisthesis of the L4 vertebra than the L5. Again, no data were provided in support of this observation (Postacchini and Perugia, 1991).

Relationship Between Degree of Stenosis and Reduced Physical Function and Activities of Daily Living

A 1998 report states that disability as assessed using the Oswestry index (a measure of disability from back pain; a score of 100 means the patient is completely disabled) correlates significantly with severity of stenosis (Hurri, Slatis, Soini et al., 1998). Patients with moderate stenosis had a mean Oswestry index of 28.0, while patients with severe stenosis had a mean score of 39.0.

Relationship Between Degree of Stenosis and Disability and Dependency

A 1993 study reported that use of medical facilities and likelihood of receiving a disability pension were no different for men or women with or without spondylolisthesis (Virta and Ronnemaa, 1993). Disabling back symptoms necessitating bed rest at the expense of work or hobby activity were more commonly reported in men without spondylolisthesis (70 percent) than in men with spondylolisthesis (38 percent; p = 0.026) (Virta and Ronnemaa, 1993). No information was provided on the presence of other back problems in the control subjects.

Relationship Between Multilevel Stenosis and Symptoms

A 1992 study reported that neurogenic claudication was generally associated with at least two levels of stenosis (Porter and Ward, 1992). In this study, 49 patients with neurogenic claudication were examined by myelography and CT and found to have stenosis in the central and/or root canal. Average patient age was 58.8 years (SD of 8.1). L4-5 was the level most commonly affected by central stenosis. Thirty-nine patients (80 percent) had multilevel central canal stenosis. Root canal stenosis was found in 37 patients. In all, 94 percent of patients had either a two-level or multilevel central stenosis or a one-level central stenosis and associated root canal stenosis. The authors concluded that neurogenic claudication is uncommon in the absence of multilevel stenosis.

We examined the possible connection between multilevel stenosis and neurogenic claudication by plotting the percentage of patients with neurogenic claudication against the mean number of stenotic levels from 49 separate patient groups reported in 39 clinical studies. Several studies provided more than one patient group for this analysis. The results are presented in Figure 17. These data show no association between the percentage of patients reporting neurogenic claudication within a patient group and the mean number of stenotic levels reported for that patient group. In nine patient groups with a mean of fewer than 2.0 stenotic levels, more than 70 percent of patients had neurogenic claudication, and in five patient groups with a mean of more than 2.5 stenotic levels, fewer than 50 percent of patients had neurogenic claudication.

Figure 17. Association Between Percentage of Patients Reporting Neurogenic Claudication and the Mean Number of Stenotic Levels. The data for this figure were taken from 49 separate patient groups reported in 39 clinical studies.

Summary

Very little evidence exists correlating degree of narrowing of the lumbar spine with the presence or severity of the signs, symptoms, or conditions associated with stenosis. Difficulties associated with finding such correlations include the presence of large numbers of patients with spinal narrowing and no symptoms; variations in canal size throughout the population; and lack of an accepted system for quantifying the degree of narrowing. Many asymptomatic patients have been shown to have stenosis or spondylolisthesis upon diagnostic imaging. Lumbar spinal stenosis is a condition that includes both a focally narrowed spinal canal and the associated symptoms. The extent of narrowing is also likely to change with the posture of the patient. Extension significantly decreases the canal area, whereas flexion has the opposite effect (Inufusa, An, Lim et al., 1996; Willen, Danielson, Gaulitz et al., 1997). Therefore, a static image of the canal dimension may not be predictive of a patient's symptoms. In current clinical practice, the course of treatment depends on the severity of the symptoms, not on the degree of narrowing.

Two studies have provided numerical evidence of a lack of association between severity of stenosis or spondylolisthesis and severity of back pain. There was, however, some evidence of a relationship between degree of spinal instability and back pain. Another study found that among patients with symptomatic stenosis, those with more severe stenosis tended to have more disability. However, other studies have indicated that patients with spondylolisthesis make the same use of medical resources as patients without spondylolisthesis and miss less work due to back pain. While neurogenic claudication is believed to be associated with more severe stenosis, we located no numerical evidence in support of this hypothesis.

Question 4: What is the relationship between the signs and symptoms and other features of the patient history and physical and the results of the imaging examination?

Questions 4 and 7 of this evidence report pertain to results of diagnostic imaging examinations such as myelography, CT, and MRI. Assessment of diagnostic tests is different from assessment of treatments. While it is desirable to evaluate diagnostic tests based on prospective controlled trials that measure patient outcomes in groups that have received or not received a particular test, such trials are rare for any diagnostic application, and nonexistent for tests to diagnose spinal stenosis. Therefore, like most other diagnostic tests, imaging tests for diagnosis of spinal stenosis have to be assessed using an evidence base of case series data. Before addressing question 4, we first examine the relevant imaging modalities to ensure that they do, in fact, provide accurate measures of spinal stenosis. This material is also relevant to question 7 but, to avoid redundancy, we present it only in answer to this fourth question.

Quality of the Diagnostic Studies

Diagnostic Imaging Studies

We begin our discussion of the validity of diagnostic imaging studies by first describing the reasons for excluding studies from our analysis. We retrieved 52 articles that were described as diagnostic imaging studies. Each was reviewed to see whether it contained any evidence relevant to questions 4 or 7. Most of these articles (45) had to be excluded from our analysis. These excluded articles, and the reasons for their exclusion, are shown in Table 15. Note that some of the excluded articles met more than one criterion for exclusion, but only one criterion is shown in the table. Therefore, the number of articles listed under a specific reason will underestimate the number of articles that actually have that particular flaw.

Table 15. Excluded Diagnostic Imaging Studies.


The most frequent reason for exclusion was a lack of diagnostic data in the article. This lack of data typically arose because it was the intention of the article to provide a description of the imaging of spinal stenosis and related conditions. Thus, while such articles might report the percentage of images that exhibited a particular feature, no relationship between these features and the patient's diagnosis would be made. These articles are as much tutorials as they are evaluations of the effectiveness of the technology. Typically, they show images that demonstrate normal and abnormal patients, show the features by which a diagnosis can be made, and show examples of how one modality can demonstrate an abnormality while another does not. These examples of cases diagnosed by one modality but not by another do not provide valid evidence on the effectiveness of either modality. There is no guarantee that the cases presented in the article are a representative sample of all cases encountered in clinical practice.

Another common reason for study exclusion was a lack of data on spinal characteristics. Several trials reported only imaging findings related to the intervertebral disks. Other studies were excluded because of the patient populations they studied: some studied asymptomatic patients, some studied patients who had already had back surgery, and some studied few (five or fewer) or no patients with spinal stenosis.

Seven articles (see Table 16) remained from this group after the preliminary inclusion/exclusion criteria were applied, but passing these criteria did not mean that valid conclusions could be drawn from the data in those articles. The studies are discussed in detail and the data are analyzed elsewhere in the report in specific sections pertaining to validation of the imaging modalities and to clinical questions 4 (Can clinical signs and symptoms predict imaging results?) and 7 (Can imaging results predict surgical results?).

Table 16. Diagnostic Trials Meeting Preliminary Inclusion Criteria.


Diagnostic Imaging Data in Studies of Surgical Treatment

Diagnostic imaging data are not limited to articles described by their authors as diagnostic trials. We also reviewed all 141 surgical trials in the electronic database created for this project to see if they contained any evidence that could answer questions 4 and 7. Trials that could answer either of these questions would have to have reported the results of the preoperative imaging examination, so 81 of the trials were excluded because they did not report imaging results or because they reported only sample results from a few patients. We excluded 15 trials that reported only postoperative imaging findings, because treatment could have affected results. Also excluded were 24 trials that reported some imaging results but did not report any correlations between imaging results and initial signs and symptoms (question 4) or outcomes after surgery (question 7). One trial reported both imaging results and clinical outcomes for each patient, but all the patients had the same imaging results, so no questions about imaging efficacy could be answered by this trial.

Additional criteria were used to exclude studies that used technically inferior or obsolete imaging methods. Even though myelography is no longer in routine use (Eisenberg and Margulis, 2000b; Mitchell, 2000), we did not exclude myelography studies, because some consider myelography a standard modality for measuring the spinal canal. Studies using plain film x-rays for diagnosing spinal stenosis (Omojola, Vas, and Banna, 1981) were excluded. While plain films are still used in assessment of spinal disorders (particularly detection of fractures or metastases), they have been supplanted first by myelography and then by CT and MRI for diagnosis of spinal stenosis (American College of Radiology, 1998; Eisenberg and Margulis, 2000b; Grossman, Katz, Santelli et al., 1994). Studies using first- and second-generation CT scanners (Postacchini and Pezzeri, 1981; Postacchini, Pezzeri, Montanaro et al., 1980b; Scafuri and Weinstein, 1981) were excluded because their long scan times increase the risk of motion artifact. These scanners are now obsolete (ECRI, 1999). An MRI trial was excluded because the scanner was operated at only one-third of its nominal magnetic field strength (Downing, Schnitzlein, Clarke et al., 1986); MR signal intensity and image quality increase with field strength, so that study's technique was clearly suboptimal. More recent MRI studies may plausibly report better results than older studies using older MR technology, but insufficient data were available to test this hypothesis.

After the preliminary inclusion and exclusion criteria were applied, 19 trials were left for further review. They are listed in Table 17. Again, the appearance of a trial in this table does not necessarily mean its results could be used to answer questions in this evidence report. On further review, three of these studies were excluded from further analysis. Herno et al. (1999b) reported only postoperative imaging findings, so the results are not relevant to the clinical questions asked in this report (Herno, Partanen, Talaslahti et al., 1999). The study by Tajima et al. (1980) was excluded because of incomplete reporting of imaging methods (Tajima, Fukazawa, and Ishio, 1980). Kawauchi et al. (1996) studied diagnosis of adhesive arachnoiditis (a condition reported in no other trials) using myeloscopy (a test reported by no other trials) (Kawauchi, Yone, and Sakou, 1996).

Most of the surgical trials that reported imaging results reported spinal measurements such as the anterior-posterior (AP) dimension of the spinal canal. Correlation of this measurement with clinical signs and symptoms is the subject of question 3, and correlation of the measurement with surgical outcomes is the subject of question 6. Imaging findings like complete and incomplete block are included in the analysis for questions 4 and 7, although they are related to the degree of stenosis.

Validation of Imaging Results

In the preceding two sections, we discussed how we arrived at the set of studies used to evaluate whether the relevant diagnostic modalities produced accurate measurements of spinal stenosis. In this subsection, we now consider those data. Because imaging findings are usually considered a reference standard for diagnostic tests in published studies of spinal stenosis, validation of the results of available studies is difficult. This is because validation requires an independent reference standard to which one can compare the imaging results. The only reference standard that would meet this criterion would be physical confirmation of stenosis during surgery.

Six trials report just such a comparison. Ramsbacher et al. (1997) compared MRI results to surgical findings. They determined that "the results of MRM [MR myelography] were identical to those of conventional myelography and corresponded to the surgical findings." However, this report provides no explanation of which imaging and surgical findings were compared: specific measurements of the spine and spinal canal, localization of abnormal disks, or the differential diagnosis between spinal stenosis and other conditions. Furthermore, there are no data presented in the paper to allow us to confirm the authors' conclusions. Although the authors intended to measure sensitivity and specificity of MRI in this study, we cannot use this trial to assess the efficacy of MRI (Ramsbacher, Schilling, Wolf et al., 1997).

The same difficulties affect the article by Muto et al. (1997). These investigators sought to evaluate the sensitivity and specificity of both CT and MRI, and found that CT evaluated the size of the spinal canal "perfectly." However, they do not report what reference standard they used, whether it was surgery, myelography, or some other test. Therefore, we cannot use this trial to validate CT or MRI. Furthermore, this study also has a patient selection bias affecting the MRI results. Only patients with polyradiculopathy or with imaging findings (modality not specified) that disagreed with the preliminary clinical diagnosis were referred for MRI, so the MRI results reflect only difficult cases, not a representative sample of all cases. Therefore, the sensitivity and specificity of MRI would be underestimated by any calculations performed on these data. The article includes examples of selected cases that were correctly or incorrectly diagnosed by CT and MRI, but we are not certain that these cases are a representative sample (Muto, De Maria, Izzo et al., 1997).

An early study validating MRI of the lumbar spine was reported by Modic et al. (1986). The study design was not subject to the difficulties of the other studies: surgical results were available for most patients as an independent reference standard (note that no quantitative definition was given for spinal stenosis). However, the MR scanner used is now obsolete and technically inferior to scanners in routine use today. Gradient-echo techniques, now considered standard practice, were not available when Modic et al.'s patients were studied. Therefore, results from this study should be considered minimum levels of sensitivity for MRI. Calculation of specificity from the published results is not possible because no negative results were found on surgery. In this series, MRI alone detected 23 of 30 vertebral levels with spinal stenosis (sensitivity = 77 percent), while CT alone (combining studies done with and without contrast material) detected 19 of 24 stenoses (sensitivity = 79 percent), and myelography alone detected 13 of 24 stenoses (sensitivity = 54 percent). When results of both imaging examinations were used together, MRI and CT detected 23 of 24 stenoses (sensitivity = 96 percent) (Modic, Masaryk, Boumphrey et al., 1986).
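The sensitivities quoted for Modic et al. follow directly from the counts of surgically confirmed stenoses that each modality detected. The Python sketch below reproduces that arithmetic; the function name `sensitivity_pct` and the dictionary layout are ours, chosen only for illustration.

```python
def sensitivity_pct(detected: int, confirmed: int) -> int:
    """Sensitivity as a whole percentage: detected / surgically confirmed."""
    return round(100 * detected / confirmed)


# (stenoses detected by imaging, stenoses confirmed at surgery), as quoted above.
modic_1986 = {
    "MRI alone": (23, 30),
    "CT alone": (19, 24),
    "Myelography alone": (13, 24),
    "MRI and CT together": (23, 24),
}

if __name__ == "__main__":
    for modality, (detected, confirmed) in modic_1986.items():
        print(modality, sensitivity_pct(detected, confirmed), "percent")
```

Specificity cannot be computed the same way here because, as noted above, surgery found no negative results, so the number of true negatives and false positives is unknown.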

Relatively current MRI techniques were used by Eberhardt et al. (1997). All 80 patients in their study underwent surgery, but the definition of stenosis used in the surgical reference standard was not reported. Nerve root compression appears to be their diagnostic criterion. There is a discrepancy in the percentages reported by Eberhardt et al. (1997) for the sensitivity of MRI in diagnosing stenosis. For diagnosing severe stenosis, they report 95 percent sensitivity, but with 50 cases of stenosis, this percentage is not possible. This percentage is also inconsistent with the percentage obtained from subtracting results for mild stenosis from results in all stenoses. To take a conservative approach to computing the sensitivity of MRI, we will assume the lower figure (the figure we calculated by subtraction) is correct. Making this assumption, the sensitivity was 100 percent for both x-ray myelography and MRI in diagnosing mild stenosis. For severe stenosis, we found the sensitivity of MRI was 88 percent and the sensitivity of myelography was 72 percent (Eberhardt, Hollenbach, Tomandl et al., 1997).
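The statement that 95 percent is not a possible result with 50 cases can be checked by listing which whole-number counts round to a given percentage. The Python sketch below does this; the function name `achievable_counts` is ours, and the check assumes that the 50 cases form the denominator and that the published percentages were rounded to the nearest whole number.

```python
def achievable_counts(reported_pct: int, n_cases: int) -> list[int]:
    """Return every count k (0..n_cases) whose percentage rounds to reported_pct."""
    return [
        k for k in range(n_cases + 1)
        if round(100 * k / n_cases) == reported_pct
    ]


if __name__ == "__main__":
    # With 50 cases, each case is worth exactly 2 percentage points,
    # so only even percentages can occur.
    print(achievable_counts(95, 50))  # no count gives 95 percent
    print(achievable_counts(88, 50))  # the MRI figure we adopted
    print(achievable_counts(72, 50))  # the myelography figure
```

Under these assumptions, 88 percent and 72 percent correspond to 44 and 36 of 50 cases, while no count yields 95 percent.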

Validation of CT and myelography was the objective of a study reported by Bell et al. (1984). Their multicenter study compared imaging results to surgical findings. Details of the CT methods were not reported, other than the fact that they did not use contrast agents, so we cannot determine whether the CT scanners used in this study (reported in 1984) are now obsolete. However, these data are nearly 20 years old, and technical improvements have been made in CT scanners, so it is reasonable to assume now that current scanners are more effective in diagnosing spinal stenosis (Bell, Rothman, Booth et al., 1984).

Rather than expressing their findings as quantitative results for stenosis measurement and differential diagnosis, Bell et al. (1984) used descriptive words, "strong," "firm," and "weak," to describe the agreement between test results and surgical findings. For purposes of this evidence report, we will only consider "strong" agreements to be correct diagnoses. This requires both a correct diagnosis of the vertebra as normal or abnormal, and a correct differential diagnosis between disk herniation and spinal stenosis (or both). Based on these data, we calculated that the CT scan made both diagnoses correctly in 55 of 76 cases of disk herniation (72 percent) and 62 of 93 cases of spinal stenosis (67 percent). The myelogram made both diagnoses correctly in 63 of 76 disk herniation cases (83 percent) and 63 of 93 spinal stenosis cases (68 percent). The myelogram correctly diagnosed the presence of disk herniation a significantly greater percentage of the time (p <0.005, McNemar test), but the two modalities were not significantly different for diagnosing spinal stenosis. Unfortunately, the categories used by Bell et al. (1984) do not allow one to distinguish between false positive and false negative results, so their data cannot be expressed as sensitivity and specificity.

While the study by Young et al. (1988) was primarily a report on a new surgical procedure, it did include data validating myelography results by comparison to surgical findings. According to the myelograms, there were 70 stenosed vertebrae in the 32 patients in this series. However, six additional levels of stenosis were found at surgery. Assuming the surgical results to be the reference standard, this means that the sensitivity of myelography was 92 percent. Only seven patients in this study were imaged by CT, and the CT results were not reported (Young, Veerapen, and O'Laoire, 1988).
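The 92 percent figure for Young et al. can be reproduced from the two counts given above. The Python sketch below assumes, as the text implies, that all 70 levels identified on the myelograms were confirmed at surgery (true positives) and that the six additional levels found at surgery were missed by myelography (false negatives); the variable names are ours.

```python
# Counts quoted above for Young et al. (1988).
levels_seen_on_myelogram = 70      # assumed to be confirmed at surgery
levels_found_only_at_surgery = 6   # missed by myelography

# Surgery is the reference standard, so it defines the total stenosed levels.
total_levels_at_surgery = levels_seen_on_myelogram + levels_found_only_at_surgery

sensitivity = levels_seen_on_myelogram / total_levels_at_surgery

if __name__ == "__main__":
    print(f"{levels_seen_on_myelogram} of {total_levels_at_surgery} levels "
          f"= {sensitivity:.1%} sensitivity")
```

If any of the 70 myelographic levels had not been confirmed at surgery, they would count as false positives and the sensitivity would be lower than this estimate.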

Summary

All of the clinical trials that report data validating MRI, CT, or myelography for diagnosis of spinal stenosis had one or more flaws in design or reporting that adversely affected the reliability or applicability of the results. Each of the five trials that studied CT or MRI found that the sensitivity of the cross-sectional modality is equal to or better than the sensitivity of myelography. None concluded that myelography was superior to the cross-sectional modalities. None of the trials attempted to validate the quantitative measurements of the spinal canal made by any imaging modality.

Signs, Symptoms, Patient History, and Physical and the Results of Imaging Exams

In this section, we address the fourth key question of this evidence report, "What is the relationship between the signs and symptoms and other features of the patient history and physical and the results of the imaging examination?" In considering our discussion of this question, it is important to bear in mind the conclusion of the previous subsection: there is little evidence that proves how well the modalities used to image spinal stenosis perform.

In answer to this fourth key question, we assess evidence related to whether there is a relationship between patient signs and symptoms and the findings of the imaging examination. If such relationships exist, it may be possible to expedite proper diagnosis and treatment of the condition. The evidence base for this question consists of seven trials that retrospectively reported the number of patients with and without a few signs and symptoms among groups of patients with moderate or severe stenosis. In all of these trials, difficulties with patient selection and confounding variables, including disk herniation, could cause apparent associations between clinical signs and symptoms and imaging results. Therefore, trials of better quality are needed to adequately address this question. Nevertheless, currently available data do exhibit some patterns.

Only one of these trials was a diagnostic trial. Phillips et al. reported a study of "stress x-rays" (radiographs taken with the patient bending one way or another) (Phillips, Howe, Bustin et al., 1990). Angles between vertebrae were measured, and motion of the spine during these movements was described as normal or abnormal by the person reading the images. The investigators tried to correlate these descriptions with clinical information such as age, sex, previous history of injury or surgery, participation in sports, and discomfort. No statistically or clinically significant correlation was found with these variables or with physical exam findings like range of motion and palpation results. The authors conclude that the value of these x-rays is questionable. No other published studies report using this technique.

The remaining trials answering the present question were surgical trials that additionally reported imaging results. Sato and Kikuchi (1997) sought to determine whether there was a relationship between the number of stenosed levels and patients' symptoms. To accomplish this, they stratified their report of patients' symptoms by the number of vertebral levels where stenosis was found by myelography. Rates of radicular pain were the same in the two groups: 82 percent (23/28) of the patients with two stenosed levels and 83 percent (44/53) of the patients with one stenosed level. Cauda equina symptoms were present in 75 percent (21/28) of patients with stenosis at two levels but were present in only 53 percent (28/53) of patients with stenosis at one level. According to tests performed by the authors, this difference was not statistically significant (Sato and Kikuchi, 1997).
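The comparison of cauda equina symptom rates can be checked with a simple 2 × 2 test. The text does not say which test Sato and Kikuchi used, so the sketch below assumes an uncorrected Pearson chi-square test with one degree of freedom; the function name is illustrative, and the counts are those given in the text (21 of 28 versus 28 of 53).

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns the statistic and its p-value (1 degree of freedom).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: two stenosed levels, one stenosed level.
# Columns: cauda equina symptoms present, absent.
chi2, p = chi_square_2x2(21, 28 - 21, 28, 53 - 28)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

Under this assumed test the p-value falls just above 0.05, which is consistent with the authors' report that the difference was not statistically significant.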

Rompe et al. (1995) studied the frequency of 13 different symptoms and patient characteristics among patients found to have absolute stenosis (AP diameter of spinal canal 10.0 mm or less as measured by CT) and relative stenosis (diameter 10.1 to 12.0 mm). Statistical tests performed by the authors found that rates of most of these characteristics were not significantly different between absolute and relative groups. Only ankle reflex differed significantly, with 72 percent of the absolute stenosis group having decreased reflex response, and 13 percent of the relative stenosis group having it. The ability of reflex abnormality to predict degree of stenosis was not discussed by the authors, nor is it examined in any other clinical trial (Rompe, Eysel, Hopf et al., 1995).

A pair of studies by Johnsson et al. compared the rates of major symptoms among patients with different degrees of stenosis. The earlier article included five patients with normal myelogram results and divided the abnormal results into complete block (n = 7) and partial block (n = 15) of the spinal canal. Our statistical calculations found no significant difference in degree of stenosis between patients with claudication and those with back and leg pain without claudication (χ2 = 0.277, df = 2, p = 0.87) (Johnsson, Willner, and Pettersson, 1981).

The report by the same group (Johnsson, Uden, and Rosen, 1991) appears to include the patients included in the earlier report, along with subsequent surgical patients, as well as a group of patients who were not operated on. Symptoms were categorized as claudication and radiculopathy. Patients who underwent surgery were categorized as having complete stenosis (n = 14) or moderate stenosis (n = 30). The "not operated" group (n = 19) may have included some patients without stenosis, but it also included patients with stenosis who either refused surgery or were considered unacceptable surgical candidates. As in the previous study, we found no significant differences in the proportion in each stenosis group between the two symptom groups in this report (χ2 = 0.398, df = 2, p = 0.82).

Verbiest's case series (Verbiest, 1977) reported measurements of the lumbar canal made during surgery. Therefore, using these data to answer the present question requires one to assume that the presurgical imaging findings accurately determine the degree of stenosis. Verbiest categorized the spinal findings as "absolute stenosis" (spinal canals with diameters of 10 mm or less), "relative stenosis" (spinal canal diameters between 10 and 12 mm), and "mixed stenosis" (some narrowed portions 10 mm or less and some between 10 and 12 mm). To analyze these results, we pooled the "absolute" and "mixed" groups, so that all patients with any spinal measurement of 10 mm or less would be in this severe stenosis group, and all spinal measurements in the patients in the mild stenosis group would be greater than 10 mm. These findings may be confounded by disk herniation. Most of the patients in the group with mild stenosis had disk herniation (29/32: 91 percent), while less than half the patients with severe stenosis had disk herniation (33/84: 39 percent, p <0.001, Fisher's exact test, as per our calculation). Therefore, we computed the correlation between the reported symptoms and the anatomic findings. Lumbago was significantly correlated with disk herniation (r = 0.143, p = 0.027) but not correlated with stenosis (r = 0.000, both calculations performed by us). While the correlation of lumbago with disk herniation was statistically significant, the correlation coefficient was relatively low, so it may have limited clinical significance. We found that sciatica was significantly correlated with disk herniation (r = 0.219, p = 0.004), but the correlation with stenosis was not statistically significant (r = −0.076, p = 0.095). Data for the correlation between intermittent claudication and disk herniation were not reported by Verbiest, so we cannot determine whether claudication correlates more with disk herniation or with stenosis. From this analysis, we can conclude that symptoms like sciatica and lumbago are not predictive of stenosis.
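The Fisher's exact test on disk herniation rates (29 of 32 patients with mild stenosis versus 33 of 84 with severe stenosis) can be sketched using only the Python standard library. The two-sided definition used below, summing the probabilities of all tables no more likely than the observed one, is an assumption; the text does not state which variant was applied.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed table.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


# Rows: mild stenosis, severe stenosis. Columns: herniation present, absent.
p_herniation = fisher_exact_two_sided(29, 32 - 29, 33, 84 - 33)
print(f"p = {p_herniation:.2e}")
```

The resulting p-value is well below 0.001, in line with the figure reported above.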

Summary

Clinical signs and symptoms do not appear to predict whether the results of imaging tests will show severe stenosis. The major symptoms of radiculopathy and cauda equina do not predict degree of stenosis. Lumbago and sciatica are significantly associated with disk herniation but not with stenosis. However, no trial that was relevant to the present question was prospective in design, and results of some of the trials are difficult to interpret because some patients had disk herniation. Therefore, a conclusive answer to this question awaits the results of trials of better design.

Question 5 What is the relationship between the signs and symptoms and other features of the patient history and physical and results of conservative treatment; and what is the relationship between the type of conservative treatment and patient outcomes?

This question seeks to determine if successful conservative treatment can be predicted by certain patient characteristics. Implicit in this question is whether conservative treatments are effective. Therefore, we first discuss possible predictors of successful conservative treatment, and then turn to a discussion of this latter implicit question.

Factors That Could Potentially Predict the Success of Conservative Treatment

This part of the analysis addresses whether there are any signs or symptoms that predict the outcome of conservative treatment. Does the initial status of patients before receiving conservative treatment have prognostic value? To answer this question within a single study, there must be individual patient data on the initial signs and symptoms and the outcome of conservative treatment. Alternatively, the outcome must be stratified according to initial status. For these reasons, our inclusion criteria for this question were any trial with a conservative treatment arm that either had individual patient data for initial status and final outcome or stratified final outcome by initial status.

We examined every study that had a conservative treatment series, either as a control arm or as a single series. We identified one study that provided individual patient data for treadmill walking distance at the start of conservative treatment and after three months (Eskola, Pohjolainen, Alaranta et al., 1992). We also found one study that stratified patients according to outcome of conservative treatment (improved, same, worse) and reported the mean AP canal diameter for each group before treatment (Johnsson, Uden, and Rosen, 1992). We also located two recent publications that addressed this issue.

Eskola et al. (1992) provided plots portraying individual patient treadmill walking distances before conservative treatment and after three months. This trial had parallel arms, one of which received placebo, and one which received calcitonin. During the 3 month period for which individual patient data was reported, the calcitonin had a nonsignificant effect on treadmill walking distance (we calculated for t-test of difference: p = 0.23; t-test on ratio: p = 0.55). Therefore, we combined the arms to approximate a single series of 38 patients that received conservative treatment. This trial also provided individual patient data for resting pain; however, this is not a major factor in spinal stenosis. Furthermore, the calcitonin treatment had a significant effect in preventing resting pain in those who had it, and so the treatment groups could not be combined, greatly limiting the statistical power for our purposes. Therefore, we did not analyze this outcome (Eskola, Pohjolainen, Alaranta et al., 1992).

To analyze the relationship between initial and final walking ability with conservative treatment, we extrapolated the initial and final values from the published plots and carried out a regression analysis (see Figure 18). The correlation coefficient is 0.92, p <0.001 (see Table 18), indicating a strong relationship between initial status and final status with conservative treatment. Two conclusions are suggested by this analysis. First, a substantial number of patients improved after conservative treatment. This resulted in a larger number of points above the 45-degree diagonal line (the line on which points would fall if patients did not improve) than below it. Specifically, 18 of 36 points were above the diagonal, 12 of 36 points were on it, and 6 of 36 points were below the diagonal. Further, the regression line is above the diagonal. This indicates that there can be some improvement in a conservatively treated series of patients, but does not indicate whether this improvement is caused by conservative treatment or whether it would have occurred in the absence of treatment. To determine the cause of this improvement, a trial must have a control group. This includes trials in which surgical outcomes are compared to those of conservative treatments. Only the results of controlled trials can be used to ascertain whether surgical treatment provides additional benefit beyond what would occur with conservative treatment.

Figure 18. Initial vs. Final Walking Distance Computed from Eskola, Pohjolainen, Alaranta et al., 1992.

Table 18. Regression Analysis of Treadmill Walking Distance Before and After Three Months of Conservative Treatment.

The second phenomenon observed in this analysis is that patients in the poorer half in terms of initial status were more likely to stay the same or become worse (to be on or below the diagonal) than patients in the more favorable half. For the half of the patients with the worst initial walking ability (below 520 m), 4 of 18 points were below the diagonal, and 8 of 18 points were on the line, for a total of 12 of 18 points. For the group with more favorable initial status, 2 of 18 points were below the line, and 4 of 18 points were on the line, for a total of 6 of 18 points. Thus, 67 percent of the group with poorer initial status were the same or worse, compared with 33 percent of the group with more favorable initial status. Our statistical analyses of these data show that this is an absolute difference of 33 percent (95 percent CI, 1 to 58) and a relative difference of 2.0 (95 percent CI, 0.96 to 4.15). The first result is statistically significant (its interval excludes zero), and the second is not (its interval includes 1.0). Thus, whether the trend observed is statistically significant in these data appears to depend on how one analyzes the data.
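The two effect measures in the preceding paragraph can be reproduced from the counts given there (12 of 18 versus 6 of 18 patients the same or worse). The report does not state which interval methods were used. The sketch below assumes the log-scale (Katz) interval for the relative difference, which reproduces the reported limits; it computes only the point estimate for the absolute difference, because a simple Wald interval would not match the limits quoted in the text. All function and variable names are illustrative.

```python
import math


def risk_difference(events_1: int, n_1: int, events_2: int, n_2: int) -> float:
    """Absolute difference between two proportions."""
    return events_1 / n_1 - events_2 / n_2


def risk_ratio_with_ci(events_1: int, n_1: int, events_2: int, n_2: int,
                       z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio with a log-scale (Katz) confidence interval."""
    ratio = (events_1 / n_1) / (events_2 / n_2)
    se_log = math.sqrt(1 / events_1 - 1 / n_1 + 1 / events_2 - 1 / n_2)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Same or worse: 12 of 18 (poorer initial half) vs 6 of 18 (more favorable half).
rd = risk_difference(12, 18, 6, 18)
rr, rr_lower, rr_upper = risk_ratio_with_ci(12, 18, 6, 18)
print(f"absolute difference = {rd:.0%}")
print(f"relative difference = {rr:.1f} (95% CI {rr_lower:.2f} to {rr_upper:.2f})")
```

Because the interval for the ratio spans 1.0 while the reported interval for the difference excludes zero, the same counts give different verdicts on statistical significance, which is the point made in the text.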

Using information from two additional graphs, we further searched for a relationship between patients' initial status and whether they exhibited improvement after treatment. In Figure 19, we carried out a regression analysis of the initial walking ability compared to the change measured as the difference between the initial and final walking distance. The correlation coefficient is 0.468, p = 0.003, indicating a moderate, positive correlation between the initial walking ability and the amount of improvement. Again, it is noteworthy that most of the patients with a negative change (points below the zero difference line) were in the half with the worst initial walking ability.

Figure 19. Initial vs. Difference in Walking Computed from Eskola, Pohjolainen, Alaranta et al., 1992.

Another way to look at the relationship of initial status to final outcome is to regress initial walking ability on the ratio of improvement, measured as the final walking distance divided by the initial walking distance (see Figure 20). Unlike the above regression, this regression does not have a positive slope. The correlation coefficient is −0.119, p = 0.482, indicating a possible slight negative (but not statistically significant) correlation between initial walking ability and the proportional amount of improvement. Even if this possible slight negative trend were statistically significant, it would be unlikely to have clinical prognostic value. As in the above regressions, most of the patients showing worse proportional change (points below the zero ratio horizontal line) were in the worse initial group.

Figure 20. Initial vs. Ratio of Change in Walking Distance Computed from Eskola, Pohjolainen, Alaranta et al., 1992.

Whether the regression on change as a difference or as a ratio is more informative is not immediately clear. The fact that the correlation goes from statistically significant and positive for regression on the difference to statistically nonsignificant and negative for regression on the ratio indicates that both types of regressions should be considered. In this regard, the proportional assessment of improvement is more like the way many surgery studies measured outcomes. Typically patients are categorized as being worse, the same, or improved following treatment. Frequently the same and improved categories are further categorized into fair (e.g., those patients with 0 to 50 percent improvement), good (50 to 75 percent improvement), and excellent (>75 percent improvement), or similar categories. These categories demand proportional judgments. This expression of results obscures the influence on outcome of the initial status and assumes that, for example, a 50 percent improvement from poor initial status to mediocre status is equivalent to a 50 percent improvement from mediocre status to good status. This may be the case, but studies in this field have not acknowledged or addressed this assumption.

The regression analysis shown in Figure 20 indicates that there is more scatter in the data points from patients with the worst initial walking ability. The slightly negative slope of the regression line is caused by a few patients with poor initial status who had very large gains, while several patients with poor initial status became worse. This is in contrast to the patients with better initial status, most of whom had moderate gains. This wide dispersion prevents initial walking ability from being a useful predictor of the success of conservative treatment with individual patients.
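A small numerical illustration may help show why change measured as a difference and change measured as a ratio can correlate with initial status in opposite directions. The walking distances below are hypothetical values chosen only for illustration; they are not data from Eskola et al. (1992), and the helper function is likewise an assumption of this sketch.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# HYPOTHETICAL walking distances in meters (not taken from any trial).
initial = [100, 200, 400, 800]
final = [200, 300, 600, 1000]

differences = [f - i for i, f in zip(initial, final)]  # final - initial
ratios = [f / i for i, f in zip(initial, final)]       # final / initial

r_difference = pearson_r(initial, differences)
r_ratio = pearson_r(initial, ratios)
print(f"initial vs difference: r = {r_difference:+.2f}")
print(f"initial vs ratio:      r = {r_ratio:+.2f}")
```

In this made-up series, patients who start with longer distances gain more meters but a smaller proportion of their starting distance, so the first correlation is positive and the second is negative.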

In addition to regression analysis, we also looked at the relationship of initial and final walking ability by stratifying the series into the half of the patients with the most favorable status and the half with the poorest initial status. We then calculated the mean outcomes for each group (see Table 19). We also categorized the outcomes into worse, same, and better, and calculated the proportions in each category stratified by initial status (see Table 20). The mean change in walking distance in terms of absolute difference for the whole cohort was 164 m (95 percent CI ±106 m). The mean absolute change was 304 m for the group with the most favorable initial status and 43 m for the group with the poorest initial status. The difference between the mean changes (the effect size) was 261 m (95 percent CI 54 to 468 m; two-tailed t-test p = 0.016). Thus, the group with the most favorable initial status had a substantially greater mean change in walking distance in terms of the absolute distance. The mean proportional change in walking distance for the whole cohort was 1.31 (95 percent CI ±0.22). The mean proportional change in walking distance was 1.27 for the group with the most favorable initial status and 1.39 for the group with the poorest initial status. This means that this latter group had a somewhat greater proportional improvement over the followup period. The difference between these ratios was 0.12. This small possible difference is not statistically significant (95 percent CI −0.34 to 0.59; two-tailed t-test p = 0.59). Thus, the group with the most favorable initial status had a statistically significantly greater absolute change in the number of meters walked. However, the proportional change was slightly greater for the group with the poorest initial status, although the difference was not statistically significant, nor likely to be clinically significant.

Table 19. Mean Change in Treadmill Walking Distance before and after Three Months of Conservative Treatment Stratified by Initial Status.

Table 20. Change Category Frequencies for Treadmill Walking Distance After Three Months of Conservative Treatment Stratified by Initial Status.

Looking at the proportion of patients in each group that were improved (ratio >1.25), the same (ratio between 0.75 and 1.25), or worse (ratio <0.75), 44 percent (8/18) of the group with the most favorable initial status improved, 56 percent (10/18) were the same, and none were worse. In the group with the poorest initial status, 33 percent (6/18) of patients improved, 61 percent (11/18) remained the same, and one patient was worse (Somers' d −0.114, p = 0.38). Thus, with this frequency comparison, the best initial group had slightly more patients improved, although the difference was not statistically significant, nor does it appear to have much clinical significance. This crude and arbitrary categorization did not capture the fact that both groups actually had some patients with a change for the worse (4/18 in the group with the poorest initial status, and 2/18 in the group with the most favorable initial status), as was demonstrated in the above regression analyses.
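The categorization rule described above can be written out explicitly. The thresholds (ratio above 1.25, between 0.75 and 1.25, below 0.75) come from the text; the function name, the treatment of the boundary values as "same," and the example distances are assumptions made for this sketch.

```python
def categorize_change(initial_m: float, final_m: float) -> str:
    """Classify a patient by the ratio of final to initial walking distance."""
    ratio = final_m / initial_m
    if ratio > 1.25:
        return "improved"
    if ratio < 0.75:
        return "worse"
    return "same"  # ratio between 0.75 and 1.25 (boundaries assumed inclusive)


# HYPOTHETICAL patients (initial, final distance in meters), for illustration only.
examples = [(400, 520), (400, 440), (400, 360), (400, 200)]
for initial_m, final_m in examples:
    print(initial_m, final_m, categorize_change(initial_m, final_m))
```

In this scheme a patient who declines from 400 m to 360 m (ratio 0.90) is classified as "same," which illustrates how the categories can hide modest changes for the worse that are visible in the regression plots.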

Although this was a small study with only moderate statistical power (18 patients in each group), we conclude from this analysis that there was no substantial difference between the mean proportional improvement of the groups with best and worst initial status in terms of treadmill walking distance. However, a group of the worst initial patients (67 percent or 12/18 in the regression analysis) appears to remain the same or continue to get worse with conservative treatment. But because of the wide distribution of results in the patients with the worst initial status, in this trial the initial status for treadmill walking distance had little prognostic value in terms of predicting the outcome of conservative treatment for individual patients.

Johnsson et al. (1992) also provided data on the relationship between initial status and the outcome of conservative treatment (see Table 21). This study reported that for a subjective global VAS outcome (45 months average followup) of worse, unchanged, or improved, the mean AP canal diameter (myelographic measurement at the narrowest level) for each group before conservative treatment was 4.7 mm (range 0 to 9 mm, n = 4), 6.8 mm (range 0 to 11 mm, n = 19), and 8.2 mm (range 7 to 10 mm, n = 4), respectively. Because no variance was reported, we were unable to test the statistical significance of this data. These data are consistent with a trend for smaller focal canal diameters to predict worse outcomes with conservative treatment on average. However, as above, the wide distribution (wide and overlapping ranges) prevents this variable from having useful prognostic value for individual patients (Johnsson, Uden, and Rosen, 1992).

Table 21. Initial Stenotic Canal Diameter Stratified by Outcome of Conservative Treatment.

Simotas et al. (2000) examined the effects of nonoperative treatment on 49 patients with lumbar spinal stenosis over a 3 year period. Although this study did not contain a surgical arm for comparison, this trial is mentioned here because of the long followup time and the information provided on those patients that eventually needed surgery. Conservative treatment consisted of bed rest for the first 1 to 2 weeks, followed by a physical rehabilitation program with a physical therapist. Patients performed flexion-based lumbar stabilization exercises. Patients with significant radicular symptoms received oral corticosteroids on a 7 day tapering schedule. Epidural steroid injections were used to treat patients who did not achieve adequate symptom control. Many patients had a recurrence of symptoms that required reinjection during the 3 year study period. The average age was 69 years (range 53 to 87). Radiographic severity of the stenosis was judged to be mild in 12 percent of patients, moderate in 43 percent of the patients, and severe in 45 percent of the patients. Nine patients required surgery within a mean of 13 months (range 2 to 41 months). These patients had more serious pretreatment averages for back and leg pain and a greater degree of stenosis. Among the 40 remaining nonoperated patients, only age seemed to be related to outcome variables, with older patients having worse scores for improvement in pain. Older patients had significantly greater radiographic severity, greater levels with at least moderate stenosis, and more frequently had scoliosis. If the patients with scoliosis were removed from the analysis, the relationship between increased age and worse outcomes was not significant.

Amundsen et al. (2000) divided their patients into three groups: patients with severe symptoms who were treated with surgery (n = 19), patients with pain considered too mild to justify surgery and treated with conservative methods (n = 50), and the remaining patients whose severity of pain did not clearly indicate either surgery or conservative treatment (n = 31). The patients in this latter group were randomized to surgical (n = 13) or conservative treatment (n = 18). Over the course of the 10 year followup period, 20 percent of conservatively treated patients with mild pain needed surgery, and 56 percent of conservatively treated patients with moderate pain needed surgery. No convincing associations between any clinical feature, such as age, gender, type of work, duration of pain, and physical findings, and treatment results were observed by the authors. In addition, no convincing associations between any radiologic feature, such as degenerative changes, spondylolisthesis, type of stenosis, degree of narrowness, and measures of different dimensions of the spinal column, and treatment results were observed. However, the sample size in this trial was small and the study may not have had sufficient statistical power to detect associations between patient characteristics and treatment results.

Summary

Few studies have examined the question of the relationship of initial signs and symptoms to the final status or amount of change following conservative treatment. The one study that reported individual patient data, Eskola et al. (1992), indicated that with conservative treatment, some patients improve, some stay the same, and some get worse. Simotas et al. (2000) suggest that the patients with the highest degree of pain are more likely to have unsuccessful conservative treatment and go on to surgery. In Simotas et al. (2000), this was 18 percent of patients (9 of 49) over three years. Amundsen et al. (2000) preselected those patients with the highest pain for surgery, but among the mild and moderate pain groups of conservatively treated patients, 20 percent and 56 percent, respectively, eventually required surgery over a 10 year period. This calls into question studies of conservative treatment or surgery that do not have parallel control groups. This also calls into question the usefulness of means in treatment comparisons. Reporting only the mean value for final status or amount of change can obscure the fact that substantial numbers of patients may differ greatly from the mean. The study by Eskola et al. (1992) indicated that for treadmill walking ability (a measure of neurogenic claudication), a moderate positive correlation may exist between initial status and amount of improvement measured as a difference (final status − initial status). However, no substantial correlation was found between initial status and improvement measured as a ratio (final status : initial status). The wide dispersion of results for the difference and ratio indicates that any correlations are not very useful for prognostication for individual patients. Johnsson et al. (1992) suggested some positive correlation between canal size at the point of focal narrowing and the outcome of conservative treatment. Simotas et al. (2000) also suggested that the degree of stenosis was related to the failure of conservative treatment and the need for surgery. Amundsen et al. (2000) found no association between spinal canal dimensions and treatment outcomes; however, the small sample size in this study may have prevented the detection of any significant association.

Comparison of Conservative Treatment Methods

Different methods of medical management can only be compared in trials in which patients are randomly assigned to receive the different treatments. Random assignment, if properly performed, ensures that the patients in the groups being compared are highly similar. Similar patient groups decrease the likelihood that any differences in patient outcomes between the groups are due to differences in patient characteristics, and increase the likelihood that any observed differences are due to differences in treatment efficacy. For this reason, we included only randomized controlled trials in this portion of our analysis. The requirement for this criterion had a dramatic impact on the number of articles we examined. This impact is portrayed in Table 22. This table shows that of the 4,788 studies on treatment that we identified, 178 were trials on conservative treatment, and only four of these were relevant to the question at hand. These four trials are summarized in Table 24. Not shown in this table is that three of these trials had flaws that undermine the usefulness of the data.

Table 22. Disposition of Publications on Conservative Treatments.

Table 24. Randomized Controlled Trials of Patients Receiving Conservative Treatments.

Eskola et al. (1992) examined the use of calcitonin injections as a treatment for lumbar spinal stenosis. This trial was previously discussed with regard to factors that may predict the success of conservative treatment. Patients received calcitonin or placebo injections every other day for 4 weeks. The injection period was followed by a 2 month washout period, and then patients crossed over to the other injection. Walking distance on a treadmill was measured before treatment and 1, 3, 4, 6, and 12 months after treatment. Walking capacity at baseline, approximated from published graphs, was as follows: Group 1, 710 ± 20 m; Group 2, 550 ± 20 m (means and standard deviations). The authors did not statistically compare data across treatments at each time point, but evaluated only the changes between measurement periods. In the case of walking distance, both Group 1 patients (calcitonin then placebo) and Group 2 patients (placebo then calcitonin) showed significant increases in walking capacity compared to pretreatment values 1 month and 3 months after the first treatment. The increase in walking capacity continued through the crossover period until 6 months after the first treatment. After 6 months, both groups showed a significant decline in walking capacity so that at 12 months, Group 1 patients averaged 800 m and Group 2 patients averaged 475 m. From the data presented by the authors, no conclusions can be drawn as to whether the calcitonin, the placebo treatment, or some other aspect of the treatment process was responsible for the increase in walking capacity. All patients seemed to respond during the treatment period regardless of the order in which they received calcitonin or placebo. Therefore, this trial cannot be used as evidence that calcitonin reduces the symptoms associated with lumbar spinal stenosis.

Cuckler (1985) examined the effects of epidural steroid injections on patient-perceived improvement in symptoms both at 24 hours and at 13 to 30 months later. The ability to evaluate the long-term effects of the injections was compromised by a study design that allowed all patients (placebo or steroid) to receive the steroid injection 24 hours after the first injection if symptoms did not improve. Therefore, few patients, 3 of 17 (17.6 percent), were left in the control group of this trial after 24 hours. Since this trial cannot be considered a controlled trial for evaluation of long-term effects of epidural steroid injections, we attempted no further analysis of its data.

Sinaki et al. (1989) examined the effect of two exercise regimens on patients with lumbar spondylolisthesis. Patients were randomized to different physiatrists, who then assigned the exercise regimen, resulting in 29 patients prescribed flexion exercises and 19 patients prescribed extension exercises. The assignment of exercise programs was therefore not random. Consequently, the pretreatment pain classification was very different between the patient groups. All 26 patients in the flexion group were considered to have moderate or severe pain, while in the extension group 12 patients had moderate or severe pain and 6 patients had no pain or mild pain. The lack of true randomization of the patients caused us to exclude this study from further analysis.

Fukusaki et al. (1998) examined the effects of epidural steroid injections on the neurogenic claudication symptoms that are common and debilitating among patients with degenerative lumbar spinal stenosis. Fifty-three patients with neurogenic claudication were randomly assigned to epidural injections of saline, 1 percent mepivacaine, or 1 percent mepivacaine plus methylprednisolone. Patient blinding to treatment was not reported. Either central or lateral lumbar spinal stenosis was demonstrated in each patient by CT and MRI. The average age in all three groups was between 69 and 72 years. At 1 week, 1 month, and 3 months, each patient's walking capacity was determined by having the patient walk until intolerable leg pain developed, an indication of neurogenic claudication. The walking distance was measured by a blinded physical therapist. Table 23 contains the original publication data and our analysis of the effect sizes. The analysis reveals that all three types of injections produced relief of symptoms at 1 week and 1 month after treatment but that symptoms returned 3 months later. The evidence in this study suggests that the local anesthetic block provided by the mepivacaine can reduce symptoms on a short-term basis. Epidural steroids offered no additional benefit on a short-term basis in treating patients with lumbar spinal stenosis (Fukusaki, Kobayashi, Hara et al., 1998).

Table 23. Analysis of Walking Capacity Data from Fukusaki, Kobayashi, Hara et al., 1998.


Summary

Many conservative methods have been proposed for treating patients with lumbar spinal stenosis (Fritz, Delitto, Welch et al., 1998). Our search of the literature has uncovered only one well-designed randomized controlled trial that compared a conservative treatment to placebo treatment specifically in lumbar spinal stenosis patients. That study, Fukusaki et al. (1998), indicates that local anesthetic block provides temporary relief from neurogenic claudication for about 1 month. Conclusions about effectiveness beyond 3 months cannot be made.

Evidence for the efficacy of other conservative treatments in lumbar spinal stenosis patients is lacking. However, the lack of evidence for effectiveness does not prove that these treatments are not effective. The lack of evidence is an indication of the failure to design adequate clinical trials to show effectiveness.

Question 6: What is the relationship between the signs, symptoms, and other features of the patient history and physical and the success or failure of surgical treatment?

This question addresses the possible connection between surgical success and patient characteristics. Will patients with less severe symptoms or extent of stenosis benefit to a greater or lesser degree from surgery compared to patients with more severe symptoms or extent of stenosis?

Implicit in this question is whether some patients might benefit more from surgery than from medical management. We turn to this implicit question after first addressing the relationship between patient characteristics and surgical outcomes.

Relationships Between Patient Characteristics and Surgical Outcomes

We addressed this question by performing a systematic narrative review of each study. In this review, we calculated, wherever possible, each study's effect size. The following analysis is subgrouped according to the type of lumbar spinal stenosis and the patient characteristics that were correlated with the outcomes of surgical treatment. Only studies that stratified surgical treatment outcomes by patient characteristics or used a regression analysis to compare patient characteristics to outcomes were considered to have evidence that could be used to answer question 6.

Herno et al. (Herno, Saari, Suomalainen et al., 1999), Herno et al. (Herno, Partanen, Talaslahti et al., 1999), Hanakita et al. (Hanakita, Suwa, and Mizuno, 1999), Johnsson et al. (Johnsson, Uden, and Rosen, 1991), Surin et al. (Surin, Hedelin, and Smith, 1982), and Tajima et al. (Tajima, Fukazawa, and Ishio, 1980) stratified the reporting of surgical treatment outcomes by patient characteristics (see Table 25).

Table 25. Controlled Trials of Patients with Central Lumbar Spinal Stenosis.


In addition to these studies, the prospective single arm surgical trials of Katz et al. (Katz, Stucki, Lipson et al., 1999) and Jonsson et al. (Jonsson, Annertz, Sjoberg et al., 1997), and the retrospective single arm surgical trial of Thomas et al. (Thomas, Rea, Pikul et al., 1997) also examined the relationship between patient characteristics and surgical outcome.

Effect of Extent of Stenosis on Surgical Outcomes Among Patients With Central Stenosis

Johnsson et al. (1991) stratified patients receiving standard wide laminectomy according to the extent of pretreatment stenosis. A partial block at myelography indicated moderate stenosis (30 patients, mean and standard deviation of the AP diameter was 7.9 ± 2.3 mm), and a total block indicated severe stenosis (14 patients, diameter = 0). The mean ages and their standard deviations were 62 ± 9 years in the moderate group and 69 ± 8 years in the severe group. Walking capacity was evaluated at a mean followup time of 50 ± 32 months in the moderate group and 58 ± 32 months in the severe group. The original walking data and our calculations of effect size are presented in Table 26. At baseline, the walking capacity was similar between groups, and both groups showed large increases in walking capacity at followup (>1,200 m). However, only the moderate stenosis group showed a statistically significant effect size when pre- and posttreatment values were compared. The large standard deviations at followup in both patient groups, approximately twice the mean, indicate a large variation in individual improvement in walking capacity. Our analysis of walking capacity based on the three-level scale data reported by Johnsson et al. (1991) (worse, unchanged, improved) indicated only a small effect in favor of the severe stenosis group (Hedges' d effect size = 0.636, confidence limits of 0.406 and 0.998; see Table 27). The data from this trial seem to suggest that a select group of patients benefited from surgery more than others. However, the degree of stenosis did not seem to determine who these patients would be, and insufficient information is presented in the trial to determine which patients might make up the group that benefits.
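
As a rough illustration of how a standardized effect size and its confidence limits behave when standard deviations are large relative to the mean, the following Python sketch computes Hedges' d from summary statistics. It assumes the common independent-groups formula with a small-sample correction, which may differ from the procedure actually used for Table 26 (a pre/post comparison in the same patients would ordinarily also account for the correlation between measurements). The function name `hedges_d` and every input value are hypothetical and are not taken from Johnsson et al. (1991).

```python
import math


def hedges_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Return (d, lower, upper): a bias-corrected standardized mean
    difference and approximate 95 percent confidence limits."""
    # Pooled standard deviation of the two sets of measurements
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    pooled_sd = math.sqrt(pooled_var)
    # Small-sample correction factor
    j = 1 - 3 / (4 * (n_a + n_b - 2) - 1)
    d = j * (mean_a - mean_b) / pooled_sd
    # Large-sample approximation to the variance of d
    var_d = (n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b))
    half_width = z * math.sqrt(var_d)
    return d, d - half_width, d + half_width


# HYPOTHETICAL summary statistics (meters), chosen only so that the
# followup standard deviation is about twice the followup mean.
FOLLOWUP_MEAN, FOLLOWUP_SD = 1500, 3000
BASELINE_MEAN, BASELINE_SD = 300, 400

d_30, lo_30, hi_30 = hedges_d(FOLLOWUP_MEAN, FOLLOWUP_SD, 30,
                              BASELINE_MEAN, BASELINE_SD, 30)
d_14, lo_14, hi_14 = hedges_d(FOLLOWUP_MEAN, FOLLOWUP_SD, 14,
                              BASELINE_MEAN, BASELINE_SD, 14)

print(f"n = 30: d = {d_30:.2f}, limits {lo_30:.2f} to {hi_30:.2f}")
print(f"n = 14: d = {d_14:.2f}, limits {lo_14:.2f} to {hi_14:.2f}")
```

With identical hypothetical means and standard deviations, the lower confidence limit stays above zero for the 30-patient group but falls below zero for the 14-patient group. This is one way a large mean gain can fail to reach statistical significance in the smaller group.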

Table 26. Analysis of Walking Capacity in Meters from Johnsson, Uden, and Rosen, 1991.


Table 27. Analysis of Walking Capacity According to a Three-Level Scale from Johnsson, Uden, and Rosen, 1991.


Surin et al. (1982) also stratified outcomes by degree of stenosis, but the moderate stenosis group (AP diameter between 11 and 14 mm) contained only seven patients, while the marked stenosis group (<10 mm) contained 15 patients. Because the moderate stenosis group contained fewer than 10 patients, the data cannot be reliably compared between patient groups. The average followup examination after surgery was at 29 months, but the range was 14 to 70 months. The small sample size and the large range in followup times prevent the use of data from this study in our analysis.

Herno et al. (1999a) and Herno et al. (1999b) retrospectively examined surgical outcomes based on the presence of stenosis at postsurgical followup. Herno et al. (1999a) reported on patients treated surgically for the first time between 1982 and 1984. Herno et al. (1999b) reported on patients who underwent surgery between 1985 and 1987. The stenosis was found at the original surgical site or an adjacent vertebral level. The mean patient ages were between 50 and 55 years at the time of surgery. Since these are retrospective trials, no pretreatment scores were reported for the outcome measures. Without baseline scores, the extent of patient disability before treatment cannot be judged or compared to patients in other studies. In addition, the degree of improvement over baseline conditions cannot be determined. This makes the results of these studies difficult to interpret. Baseline clinical features were reported. In Herno et al. (1999b) the nonstenosis group (15 patients) had a significantly greater baseline mean cross-sectional area of the dural sac compared to the stenosis group (41 patients) (145 mm² vs. 95 mm², p < 0.0001, Mann-Whitney test). Laminectomy, extended laterally to decompress the nerve roots, was performed on all patients. After a mean followup time of 10.2 and 11.1 years for the nonstenosis and stenosis groups, respectively, no statistically significant differences were found in the mean Oswestry score (a measure of back pain disability) (28.7 vs. 31.2, Kruskal-Wallis test) and walking capacity (15 minutes on a treadmill) (515 m vs. 470 m, Kruskal-Wallis test).

Herno et al. (1999a) reported similar baseline clinical characteristics among the 35 patients with no stenosis postsurgery and the 57 patients with postsurgery stenosis. After a mean followup time of 3.5 years, no statistically significant differences were found in the mean Oswestry score (28.4 vs. 26.4, p = 0.755, Kruskal-Wallis test) and walking capacity (706 m vs. 602 m, p = 0.178, Kruskal-Wallis test). In these two studies, the presence of postsurgery stenosis was not correlated with surgical outcomes as measured by disability from back pain and walking capacity at the time of followup. The lack of pretreatment scores prevents any assessment of the actual effect of surgery on these patient groups, and these studies do not provide evidence for a correlation between patient characteristics and the success or failure of surgery.

Jonsson et al. (1997) used a logistic regression analysis to correlate patient characteristics with favorable outcomes. In 105 patients with an average age of 65 years (range of 37-83 years), patients with an AP diameter of <6 mm had the most favorable outcomes at 5 years after surgery. No statistical information (p values, correlation coefficients) was reported to support these conclusions.

Effect of Age on Surgical Outcomes Among Patients With Central Lumbar Stenosis

Hanakita et al. (1999) retrospectively stratified by age (younger than 65 years and 65 years or older at the time of surgery) patients who received laminectomy only, partial hemilaminectomy, or laminectomy plus fusion. A sufficient number of patients were available in the laminectomy-only group to evaluate the effect of age on surgical outcome (59 patients younger than 65 and 61 patients equal to or older than 65). However, the followup time was not specifically reported and could have been between one and eight years for any patient. During the followup period, 12 patients in the younger group were lost to followup (20 percent) and 17 patients in the older group were lost to followup (28 percent). Based on patient evaluation of surgical outcome, 77 percent of the younger group and 66 percent of the older group were considered "cured" or "better" as compared to "unchanged" or "worse" according to the rating scale used by the authors. Our computed effect sizes for this change yielded a Hedges' d of 0.054 with confidence limits of −0.155 and 0.263, which is not statistically significant. If dropouts are considered failures and put in the "worse" category, the analysis still showed no difference between the groups but shifted somewhat in favor of the younger group (Hedges' d = 0.349 with confidence limits of −0.011 and 0.710). This study suggests that surgical success based on patient evaluation of outcome ("I'm better than I was") is not affected by age. This is not the same as saying that surgery benefited both age groups to the same extent. The lack of pretreatment assessment scores describing patient condition before surgery prevents the assessment of actual surgical benefit. The older age group may regard "cured" or "better" as a small improvement, while the younger age group may require a large improvement before they consider themselves "cured" or "better." Assessment of changes (differences in pre- and posttreatment scores) in walking capacity, pain, disability, and activities of daily living is needed for an actual determination of surgical benefit in the two age groups.

Effect of Herniated Disks on Surgical Outcomes Among Patients With Central Lumbar Stenosis

Tajima et al. (1980) divided lumbar spinal stenosis patients into two groups: those with and without accompanying herniated disks. However, the two groups are not readily comparable because of differences in surgical methods between the groups. The stenosis-only group contained 11 patients who received standard wide laminectomy and two patients who received laminectomy plus fusion. The stenosis-and-disk group contained six patients who received laminectomy only, five patients who received laminectomy plus fusion, and three patients in whom "Love's method" was used but not described. The data in this study cannot be used to address the effect of concurrent herniated disk on surgical outcome in lumbar spinal stenosis patients because differences in outcomes between the groups may be due to differences in surgical methods as well as the presence of a herniated disk.

Effect of Patient Health and Comorbidity on Surgical Outcomes Among Patients With Central Lumbar Stenosis

Katz et al. (1999) looked for predictors of surgical outcome in 199 patients two years after surgery. These patients received either standard decompression (138), decompression with fusion (31), or decompression with fusion and instrumentation (30). The physical examination and radiographic variables were not associated with outcomes. The best predictors of symptom severity and walking capacity after surgery were the patients' ratings of their own health and the presence of cardiovascular comorbidity. Better walking capacity, better mental health, decompression with fusion, and higher income were of borderline significance as predictors of symptom severity and walking capacity outcomes. However, these variables combined accounted for only 33 percent of the variation in walking capacity outcome and 27 percent of the variation in symptom severity outcome.

Jonsson et al. (1997) used a logistic regression analysis to correlate patient characteristics with favorable outcomes. At 5 years after surgery, patients with comorbid disorders affecting walking ability fared significantly worse than those without these comorbid conditions. No statistical information (p values, correlation coefficients) was reported to support these conclusions.

Thomas et al. (1997) retrospectively examined 26 patients who received either laminectomy or laminotomy in order to determine which patient characteristics were related to patient outcomes. The average age was 68 years (range: 40 to 86). The authors noted that among the patients with poor outcomes there was a multiplicity of other diseases compared to those patients with good outcomes. No statistical analysis was presented. With only 26 patients, this study may have too few patients from which to predict outcomes based on patient characteristics.

Yone et al. (1996) and Ray (1982) both retrospectively examined patients who received different surgical procedures, but did not stratify outcomes by patient characteristics. Grob et al. (1995) randomized patients to different treatments, but did not report outcomes based on patient characteristics. Therefore, these studies cannot be used to judge if patients with less severe symptoms benefit from surgery as well as patients with more severe symptoms.

Effect of Herniated Disks on Surgical Outcomes Among Patients With Lateral Lumbar Stenosis

The study by Kirkaldy-Willis et al. (1982) was a retrospective case series in which patients were selected if they had one of the following four conditions: lateral stenosis, lateral stenosis with disk herniation, lateral and central stenosis, and lateral and central stenosis with disk herniation. The average age for the entire group of patients was 46 years (range of 41 to 52 years). Between 12 and 120 months after partial laminectomy, these patients were asked to assess the degree of improvement in their condition. We analyzed these categorical data to determine if disk herniation in combination with lateral stenosis was detrimental to patient recovery compared to lateral stenosis alone (see Table 29). Based on patient reporting of general improvement after surgery, the patients with herniated disks in addition to stenosis experienced greater improvement (Hedges' d = −0.600, confidence intervals of −1.028 to −0.171). Improvement after surgery is based on the patient's perception of their pain and disability before treatment. The patients with herniated disks may have experienced more pain or disability before surgery, but in this retrospective study, no pretreatment measures of pain or disability are provided to judge each group's relative condition before surgery. Therefore, this study does not help us in determining if the pretreatment condition of patients with lateral lumbar stenosis influences their response to surgery.

Table 29. Analysis of Global Success Data from Kirkaldy-Willis, Wedge, Yong-Hing et al., 1982.


Effect of Patient Characteristics on Surgical Outcomes Among Patients With Degenerative Spondylolisthesis

Four randomized controlled trials and six controlled trials examined surgical treatment of degenerative spondylolisthesis (see Table 30). The randomized controlled trial of Thomsen et al. (1997) included patients with isthmic spondylolisthesis, secondary degenerative spondylolisthesis, and primary degenerative spondylolisthesis, and reported surgical outcomes separately for each of these groups. However, when reporting on surgical outcomes according to patient characteristics, such as duration of symptoms and number of levels fused, all three types of spondylolisthesis were combined. Therefore, this study does not report evidence specific to primary degenerative spondylolisthesis to address the question of patient characteristics and benefit from surgery. The randomized controlled trials of Fischgrund et al. (1997), Bridwell et al. (1993), and Herkowitz and Kurz (1991) did not stratify or analyze data based on patient characteristics. In the controlled trials of Plotz and Benini (1998), Yuan et al. (1994), Satomi et al. (1992), Lombardi et al. (1985), Fitzgerald and Newman (1976), and Rosenberg (1976), data within the surgical treatment groups were not analyzed or reported in relation to patient characteristics. Therefore, these studies cannot be used to address the connection between patient characteristics and surgical outcome.

Table 30. Controlled Trials of Patients with Degenerative Spondylolisthesis.


Summary

Poor study quality, especially the lack of pretreatment measurements of patient condition, reduces the usefulness of available data to answer the question of whether patient signs, symptoms, and other characteristics determine the success of surgery for degenerative lumbar stenosis. The few studies that do stratify outcomes by patient characteristics, especially those that examined degree of stenosis, did not find a connection between successful treatment and specific patient characteristics. The data from these trials seem to suggest that a select group of patients benefited from surgery; however, insufficient information is presented in the trials to determine which patients make up this group. Regression analysis from two studies suggests that patients in poor health due to comorbidity may have inferior outcomes after surgery compared to healthier patients.

One randomized controlled trial and two controlled trials address the efficacy of surgical treatments for patients with lateral lumbar spinal stenosis (see Table 28). The randomized controlled trial by Lee and deBari (1986) and the retrospective trial by Mikhael et al. (1981) did not stratify results by patient characteristics. Lee and deBari (1986) provide very little patient data beyond the mean age of the patients. The study by Mikhael et al. (1981) was intended to provide a neuroradiological evaluation of patients with lateral stenosis and asymptomatic patients and therefore has only one surgery group, which is not stratified by patient characteristics. Therefore, neither study can be used to address the connection between patient characteristics and surgical outcome.

Table 28. Controlled Trials of Patients with Lateral Lumbar Spinal Stenosis.


Comparison of Surgical Versus Conservative Treatment

In this part of our analysis, we turn to the question that is implicit in our primary question, "Do some patients benefit more from surgery than from medical management?" We located eight studies that examined patients with central and/or lateral stenosis who received nonsurgical or surgical treatment (see Table 31). For each of these studies, we reviewed the design and results to determine if these studies provided reliable evidence to answer this question. In this process, we calculated, wherever possible, each study's effect size. A critical requirement in any study trying to address this question is having comparable patients in both the nonsurgery and surgery groups. Only one of the studies is a randomized controlled trial, so potential bias may exist in how each study assigned patients to treatment groups. Therefore, within each study we will determine if the patient groups are comparable in terms of pretreatment signs, symptoms, and measurements used to assess posttreatment success. These trials do not provide evidence for the effectiveness of any one conservative treatment because multiple types of conservative treatments were used in each nonsurgery patient group.

Table 31. Trials Comparing Surgery and Conservative Therapy.


Johnsson et al. (1991) compared 44 patients treated surgically for lumbar spinal stenosis with 20 patients not surgically treated. The nonsurgery patients refused surgery (19 patients) or the anesthesiologist refused to administer anesthesia because of advanced cardiovascular disease (2 patients). To obtain comparable groups for analysis, one patient in the nonsurgery group with severe stenosis was removed from the analysis, and the surgery patients were divided into moderate stenosis (30 patients) and severe stenosis (14 patients). Thus, we are concerned with the comparison of the conservatively treated moderate-stenosis group with the surgically treated moderate-stenosis group. The smallest AP diameter had a mean and standard deviation of 7.9 ± 2.3 mm in the moderate-stenosis surgery group and 8.6 ± 1.7 mm in the nonsurgery group. The mean ages were 60, 62, and 69 years for the nonsurgery, moderate-stenosis, and severe-stenosis groups, respectively. Neurogenic claudication was diagnosed before treatment in 84 percent, 77 percent, and 86 percent of patients in the nonsurgery, moderate-stenosis, and severe-stenosis groups, respectively. The signs and symptoms of lumbar spinal stenosis and the extent of stenosis seen by myelography are the only evidence reported that indicates that the nonsurgery and moderate-stenosis groups are comparable. Followup exams were performed, on average, 31 months after treatment in the nonsurgery group and 50 months after treatment in the moderate-stenosis surgery group.

The original data from Johnsson et al. (1991) on the occurrence of neurogenic claudication and on walking capacity, plus our analysis of these data, are contained in Table 32 and Table 33. The data on neurogenic claudication indicate that the frequency of this symptom declined to the same extent in both groups. The chances of recovery from neurogenic claudication were equal between groups (see Table 32). The analysis of the walking data is complicated by the significantly better capacity of the nonsurgery group before treatment (see Table 33). The nonsurgery group could, on average, walk 1,169 m more before treatment than the moderate-stenosis group. After treatment, the two groups had comparable walking capacity. This study seems to indicate that patients with moderate spinal stenosis can recover from some of the symptoms associated with lumbar spinal stenosis. However, this evidence is weakened by the apparent differences between patients in the nonsurgery and surgery groups in initial walking capacity. The authors have pointed out that this difference in walking capacity may indicate that symptoms were more severe in the patients treated surgically.

Table 32. Analysis of Neurogenic Claudication Comparing Nonsurgical Patient to Surgical Patients from Johnsson, Uden, and Rosen, 1991.


Table 33. Analysis of Walking Capacity in Meters Comparing Nonsurgical Patient to Surgical Patients from Johnsson, Uden, and Rosen, 1991.


Nagler and Bodak (1993) retrospectively analyzed the recovery from symptoms of 41 conservatively treated patients and 39 patients who received laminectomies. The degree of improvement, based on a ratio of the original symptoms to those at one year, was similar between groups (54 percent of conservative-treatment patients and 56 percent of surgery patients had >50 percent improvement; χ² = 0.06114, p = 0.804689, according to our calculations). The actual numbers of patients with each symptom before and after treatment were not reported. Without these data, we cannot determine the comparability of the patient groups before treatment. The authors suggest that the surgery patients represent a selected group with fewer concomitant medical problems. Therefore, the data in this study cannot be used to conclusively determine if patients with comparable levels of lumbar spinal stenosis do better with conservative treatment or surgery.
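
The chi-square comparison reported above can be approximately reproduced with a short Python sketch. The patient counts used here (22 of 41 conservatively treated patients and 22 of 39 surgery patients improved) are an assumption, back-calculated from the rounded percentages in the text rather than taken from the original publication, and the helper name `chi_square_2x2` is hypothetical. The sketch uses the Pearson chi-square statistic without continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p value
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# ASSUMED counts, inferred from "54 percent of 41" and "56 percent of 39".
conservative_improved, conservative_total = 22, 41
surgery_improved, surgery_total = 22, 39

chi2, p_value = chi_square_2x2(
    conservative_improved, conservative_total - conservative_improved,
    surgery_improved, surgery_total - surgery_improved,
)
print(f"chi-square = {chi2:.5f}, p = {p_value:.4f}")
```

Under these assumed counts the statistic comes out near 0.061 with a p value near 0.80, in line with the values quoted in the text.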

A recently published randomized controlled trial does provide a comparison of a nonsurgical control group with a surgery group with the same baseline patient characteristics (Amundsen T, Weber H, Nordal HJ et al., 2000). In this study, randomization was considered ethical only among patients whose severity of pain did not indicate to the physician that either surgery or conservative therapy was the indicated treatment. The physicians selected surgery for 19 patients with severe symptoms and conservative treatment for 50 patients with mild symptoms. The remaining 31 patients were randomized: 13 patients received surgery (laminectomy without fusion) and 18 patients received conservative therapy (a brace and rehabilitation for one month, followed by an additional 2 months with the brace, and physiotherapy).

At followup times of 6 months, 1 year, 4 years, and 10 years, patients were evaluated for pain, working ability, assessment of own condition, walking ability, and physical activity. The examining physician used these findings to determine if a patient's clinical status was "excellent," "fair," "unchanged," or "worse" than at the start of the trial. These categorical data are presented in Table 35. Within 3 to 27 months of entering the study (median 3.5 months), 10 of the 18 conservatively treated moderate-symptom patients were crossed over to surgery (56 percent). Among the patients with moderate symptoms and randomized to treatment, a higher percentage who received surgery were rated good (excellent and fair categories combined) at each of the followup periods than those who received conservative therapy (6 months: 92 percent vs. 39 percent; 12 months: 69 percent vs. 33 percent; 4 years: 85 percent vs. 44 percent; 10 years: 77 percent vs. 44 percent). This finding may be an artifact of the study design.

Table 35. Examiner Assessment Data from Surgically and Conservatively Treated Patients with Lumbar Spinal Stenosis from Amundsen T, Weber H, Nordal HJ et al., 2000.


The trial by Amundsen et al. (2000) represents an attempt to randomize patient treatment between conservative and surgical approaches and thereby resolve an important clinical question. Will lumbar spinal stenosis patients with moderate symptoms benefit more from surgery or from conservative treatment? The data as presented by the authors suggest that surgery may be more beneficial. However, several design and reporting problems reduce the strength of this conclusion. Even the authors acknowledged that "because the situation was observational, the existence of hidden confounders and selection bias made it mandatory for the authors to be descriptive and noninferential." Our assessment of these potential hidden confounders and selection bias follows.

The criteria for assignment to the mild-, moderate-, or severe-symptom groups are not clearly stated. The authors assert that intensity of patient pain was the most important reason for selection to the severe-symptom group, which received immediate surgery. However, several pieces of evidence indicate that physicians may have underrated the pain and severity of condition in some patients, resulting in these patients' inclusion in the moderate group as opposed to the severe group. First, the median time lag to crossover to surgery was 3.5 months, with a range of 3 to 27 months. This means that about half of the conservatively treated moderate-symptom patients who crossed over did so within 3.5 months of entering the trial. If the original diagnoses had been correct, one would not have expected that half of the crossovers would have occurred by 3.5 months. Individual or subgroup data on when crossovers occurred were not reported.

Second, while 14 of 19 severe-symptom patients (74 percent) reported having severe pain at the start of the trial, 20 of 31 moderate-symptom patients (65 percent) and 24 of 50 mild-symptom patients (48 percent) also reported severe pain at the start of the trial. This indicates that severity of pain was not the only determining factor in how the physicians allocated patients to the treatment groups. Supporting this notion is that larger proportions of older patients appear in the mild-symptom group than in the other two groups. Nine of the 12 patients (75 percent) older than 71 years appear in the mild group (moderate: 8 percent, severe: 17 percent). Older patients may have been selected for conservative treatment because they would be expected to have a poorer prognosis after surgery. Although this is appropriate clinical judgment, this allocation introduces a bias into the trial that could favor surgery.

Third, there appears to be a difference between how physicians and patients rated symptoms. This difference manifests itself as an underrating by the physician. Thus, the authors' report of the agreement between physician and patient yields a kappa statistic of 0.59 at 6 months and smaller kappas at later followup periods. This degree of agreement between patients and examiner is moderate to small, with patients reporting more in the worse category than examiners (6 vs. 3). Such underrating of patient pain by medical personnel has also been observed in other studies (Choiniere, Melzack, Girard et al., 1990; Daniel, Long, Murphy/Kores et al., 1983). Because of physician underrating, patients may have been assigned to the randomized (moderate) group when they should have been assigned to the surgical (severe) group. If randomized to conservative treatment, these patients would be more likely to have unsuccessful results and be crossed over to surgery. The effect of misclassification is to artifactually reduce the reported effectiveness rate of conservative treatment, because patients who are crossed over are considered failures by intent-to-treat analysis. Misclassification increases the apparent difference in effectiveness between surgery and conservative therapy. Further, the size of this apparent difference increases as more patients are misclassified into the moderate group.

We can draw one of two conclusions from our analysis of the patients in the Amundsen et al. (2000) trial. The first possible conclusion is that surgery is superior to conservative treatment for patients with moderate symptoms. As discussed above, the data are confounded and this conclusion lacks support. The second possible conclusion is that the apparent superiority of surgery in moderate patients is an artifact caused by inclusion of severe patients in the moderate-symptom group who then fail conservative therapy. In that case, the trial provides evidence that surgery is superior to conservative treatment among patients with severe symptoms. Another trial, with more carefully designed patient selection and characterization into mild, moderate, and severe groups, will allow us to determine the actual extent to which surgery or conservative treatment benefits these groups.

Amundsen et al. (2000) provide three measures by which the efficacy of surgery compared to conservative therapy can be judged. The first outcome measure is the number of patients needing surgery after first receiving conservative treatment. In the mild-symptom group, only 20 percent of patients eventually needed surgery, indicating that conservative treatment may be justified in this group. In the moderate-symptom group, 54 percent of patients needed surgery. However, as discussed above, this number may be inflated by physician underrating of patient condition and judging severe patients as moderate. Therefore, conservative treatment is more likely to fail in these patients.

The second outcome measure is the examiner assessment of "overall treatment result." This was described as a subjective global assessment based on the following components: (1) a patient-reported scale of "better," "unchanged," or "worse" compared to their condition at the start of the trial; (2) pain; (3) working ability; (4) walking ability; (5) level of physical activity; and (6) the opinion of the examining physician. We would expect pain to be the major component of this scale because pain influences patient working, walking, and physical activity. Examiners were not blinded to treatment. As mentioned above, the agreement between examiner assessment and patient assessment is moderate to poor.

Table 36 shows a comparison of the third outcome measure, patient-reported pain, and overall treatment result at 4 and 10 years for each of the four original treatment groups and the two crossover groups (mild conservative to surgery and moderate conservative to surgery). We used a Wilcoxon Signed Ranks Test to test the hypothesis that the number of patients reporting no pain is equal to the number of patients rated "excellent." At four years, the hypothesis is rejected because, in each treatment group, more patients were rated as "excellent" than were reported as having no pain. This suggests that pain was not a primary determinant of the physician's assessment of treatment results. At 10 years, no significant difference was found in the number of patients rated as "excellent" and the number of patients with no pain. The trend agreed with the 4 year data, but was not statistically significant, possibly because of low power. Given the examiner tendency toward underrating pain and overrating a patient's overall condition, the most reliable measure of treatment outcome presented by Amundsen et al. (2000) is patient-reported pain.

Table 36. Patient Reported Pain Data Compared to Examiner Assessment Data from Surgically and Conservatively Treated Patients with Lumbar Spinal Stenosis from Amundsen T, Weber H, Nordal HJ et al., 2000.

If complete relief from pain is a goal of treatment for lumbar spinal stenosis, then few patients obtained that goal at 4 or 10 years with either conservative or surgical treatment. Patient-reported pain data are presented by Amundsen et al. (2000) for the start of the trial, at 3 months, 4 years, and 10 years. Patients were categorized as having no pain or light pain, moderate pain, or severe pain. At the start of the trial, the only patients with light pain were found in the mild-symptom, conservatively treated group. The data in Table 36 indicate that only one-third of the severe/surgery patients were pain-free at 4 years, and only half were pain-free at 10 years. In the moderate group randomized to surgery, 42 percent of patients were pain-free at 4 years, and 45 percent of patients were pain-free at 10 years. Of patients with mild symptoms who remained with conservative treatment, 34 percent and 44 percent were pain-free at 4 and 10 years, respectively. Of patients with moderate symptoms who remained with conservative treatment, 13 percent and 25 percent were pain-free at 4 and 10 years, respectively. Among the patients who were crossed over from conservative to surgical treatment, 30 percent and 43 percent of mild-symptom patients and 13 percent and 33 percent of moderate-symptom patients were pain-free at 4 and 10 years, respectively. As reported, these data are difficult to interpret. A measure of the magnitude of improvement in pain after treatment would provide a better gauge of treatment success. Although the authors measured pain on a visual analog scale, this data is not reported. Other measures of treatment success such as physical activity and walking ability were assessed but not reported.

Atlas et al. (1996) reported on 1-year outcomes of patients in the Maine Lumbar Spine Study, a prospective, observational cohort study of patients with spinal stenosis treated surgically (81 patients) or nonsurgically (67 patients). This trial has three design features that make its results more reliable than the Amundsen et al. (2000) trial. First, Atlas et al. (1996) used an objective scoring system for classifying patients' severity of disease. Therefore, misclassification of patients with severe disease into the moderate category was less likely. Second, there were no crossovers from conservative to surgical treatment. Therefore, the effect of surgery was not exaggerated. Third, study outcomes were based on patient ratings and not on physician ratings. Therefore, results were not skewed by the observer.

In this trial, laminectomy was performed in 88 percent of the surgery patients. Nonsurgery patients received mostly bed rest (29 percent), back exercises (39 percent), physical therapy (23 percent), spinal manipulation (23 percent), narcotic analgesics (21 percent), and epidural steroids (18 percent). Extensive patient information reported in the article showed that the severity of symptoms and the degree of disability because of pain were significantly greater in the surgically treated patients. These patients reported more frequent and severe leg and back pain and poorer functional status, but had greater improvement than patients treated nonsurgically. Because patient preference was the most common reason for not choosing surgery, the nonsurgery group may have selected conservative treatment because their symptoms were less severe than those patients who chose surgery. Therefore, the entire pool of data in this study cannot be used to determine if patients with comparable levels of lumbar spinal stenosis do better with conservative treatment or surgery.

However, a subgroup of 54 patients (31 surgery and 23 nonsurgery patients) reported moderate symptoms before treatment. Results from this group are useful for comparing conservative and surgical treatments. Patients receiving surgery were significantly improved compared to patients who did not receive surgery.

Our analysis of percentage data from this group for symptom improvement, overall results of treatment, and patient satisfaction with treatment is presented in Table 34. Since these patients with moderate symptoms were not randomized to treatment, the results may be biased by unknown differences between the two groups. However, these results do provide some evidence that among patients with lumbar spinal stenosis who have moderate pain, surgery may be more beneficial than conservative treatment. Four-year outcomes for this same group of patients were recently published (Atlas, Keller, Robson et al., 2000). The data from this report continue to show better outcomes among patients who initially had moderate pain and received surgery. Of the 68 patients originally treated nonsurgically, 15 (22 percent) underwent surgery after 3 to 48 months (median 17 months).

Table 34. Effect Sizes for One and Four Year Outcomes of Patients with Moderate Symptoms at Baseline from Atlas, Deyo, Keller et al., 1996 and Atlas, Keller, Robson et al., 2000.

The validity of the results of this study is threatened by a high dropout rate and the authors' failure to report the characteristics of those patients who dropped out. The article reports that 148 patients were enrolled in the study, and those in the 25th to 75th percentiles for severity were considered "moderate." While the percentile calculation would suggest that 74 patients should be in this category, only 54 patients were reported (31 surgical, 23 nonsurgical). We presume that nine patients in the moderate-severity group had not reached the one-year followup, because the authors report a total of 130 patients who had reached followup (half of which is 65). This leaves 11 patients unaccounted for. To test whether these dropouts could have threatened the validity of the observed effect, we repeated the effect size calculation under a worst-case scenario, in which all 11 patients were arbitrarily counted as surgical failures. When the calculations were repeated in this manner, effect sizes remained significant for two of the three outcomes of interest at the 1 year followup: "major symptom much better" and patient satisfaction. In the worst-case scenario, the effect size for overall treatment results decreased to a statistically insignificant value. The calculations and effect sizes are shown in Table 34.

Atlas et al. (2000) reported on changes in patients' predominant symptom and satisfaction with treatment at 4 years. Using the data from 29 surgical patients and 22 nonsurgical patients, the authors showed that surgical patients had significantly better outcomes. Our analysis of the reported data and the results of our worst-case scenario for the selected four-year outcomes are shown in Table 34. Overall results of treatment at four years were not reported by Atlas et al. In the worst-case scenario, the greater number of dropouts at four years (14 instead of 11) may have decreased the effect sizes for both symptomatic improvement and patient satisfaction to a statistically nonsignificant level. However, the effect sizes were still positive. Therefore, the statistical significance of the observed long-term effects is not robust to this worst-case sensitivity analysis of dropouts.
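The patient accounting behind this worst-case analysis can be retraced with a few lines of arithmetic. The sketch below uses only counts quoted in the text; the variable names and the helper `worst_case_success_rate` are ours, and the 4-year figure assumes that the same expected denominator of 65 moderate patients applies at 4 years, which is consistent with the 14 dropouts mentioned above but is not stated explicitly by the authors.

```python
# Patient accounting for the worst-case dropout analysis.
# All counts come from the text; names are illustrative only.

enrolled = 148            # patients enrolled in the study
reached_followup = 130    # patients who had reached the 1-year followup

# "Moderate" = 25th to 75th percentiles of severity, i.e. the middle half.
expected_moderate_enrolled = enrolled // 2           # half of 148
expected_moderate_followed = reached_followup // 2   # half of 130

reported_1yr = 31 + 23    # surgical + nonsurgical moderate patients at 1 year
reported_4yr = 29 + 22    # surgical + nonsurgical moderate patients at 4 years

# Moderate patients presumed not yet to have reached the 1-year followup.
not_yet_followed = expected_moderate_enrolled - expected_moderate_followed

# Moderate patients unaccounted for (treated as dropouts).
# Assumption: the denominator of 65 is reused at 4 years.
unaccounted_1yr = expected_moderate_followed - reported_1yr
unaccounted_4yr = expected_moderate_followed - reported_4yr


def worst_case_success_rate(successes: int, n_observed: int, n_dropouts: int) -> float:
    """Success rate of one arm when every dropout is counted as a failure in it."""
    return successes / (n_observed + n_dropouts)


print(not_yet_followed, unaccounted_1yr, unaccounted_4yr)
```

Adding all dropouts to the surgical arm as failures lowers that arm's success rate while leaving the nonsurgical arm unchanged, which is why the scenario is the least favorable one for surgery.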

Swezey (1996) retrospectively evaluated the progress of 47 patients diagnosed with neurogenic claudication 5 years earlier. At the time of diagnosis, the patients' average age was 76 years. Forty-three of these patients had moderate to marked lumbar spinal stenosis. The author does not report how patients were judged to be mild, moderate, or severe with regard to stenosis. No spinal canal measurements were reported. All patients were started on conservative treatment (exercise, use of a cane, analgesic and nonsteroidal anti-inflammatory drugs). Thirteen patients received epidural steroid injections when other measures did not provide relief from neurogenic claudication. During the 5-year period, 11 patients were given laminectomy to relieve symptoms. This group was considered to have a greater proportion of moderate to severe neurogenic claudication, and all were considered improved after surgery. However, the lack of data on spinal canal diameters reduces the usefulness of these data in predicting who will benefit from conservative treatment and who will need surgery. Of the other 36 patients, 20 reported improvement in symptoms, 14 reported no change, and two reported a worsening of symptoms. This is one of the few trials that follow patients from diagnosis through conservative treatment and then to surgery. Although the numbers may be too small to provide a reliable estimate of the success of conservative treatment, this study indicates that 72 percent of patients (34 of 47) who begin conservative treatment will improve or remain the same, while 23 percent of patients (11 of 47) will eventually receive surgery for relief of symptoms. Those patients who receive surgery will tend to have greater stenosis.
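To illustrate why such small numbers limit the reliability of these estimates, the following sketch recomputes the proportions from the counts given above and attaches an approximate 95 percent confidence interval to each. The Wilson score interval is our choice for illustration; it was not part of the Swezey (1996) report or of the analysis described here, and the function name is hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate confidence interval for a proportion (Wilson score method)."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Counts quoted in the text for the 47 patients in Swezey (1996).
n_total = 47
improved, unchanged, worse, surgery = 20, 14, 2, 11

outcomes = {
    "improved or unchanged": improved + unchanged,
    "eventually received surgery": surgery,
}
for label, count in outcomes.items():
    low, high = wilson_interval(count, n_total)
    print(f"{label}: {100 * count / n_total:.0f}% "
          f"(approx. 95% CI {100 * low:.0f}% to {100 * high:.0f}%)")
```

The width of the resulting intervals gives a rough sense of how much the 72 percent and 23 percent figures could shift in a larger sample.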

Herno et al. (1996) attempted to generate comparable treatment groups by retrospectively matching surgical patients to a group of 57 nonsurgical patients. Patients were matched according to sex, age, myelographic findings, major symptoms, and duration of symptoms. Fifty-four matched pairs were constructed. Total spinal block and subtotal block occurred in only one and three matches, respectively. The remainder of the matches had AP diameters of <12 mm (31 patients) or had lateral stenosis (19 patients). The followup periods were 4.3 and 4.1 years for the nonsurgery and surgery groups, respectively. At followup, measures of disability and functional status were similar between groups. The authors caution that an important shortcoming of this retrospective study is the lack of knowledge about starting pain level and disability in either group and that patients with more pain are likely to have selected surgery over conservative treatment. This precludes any reliable comparison of surgery and medical treatment.

Hurri et al. (1998) retrospectively assessed the outcomes of surgery and conservative treatment 12 years after treatment began. Among the 57 patients in the surgery group, 26 (46 percent) were considered to have severe stenosis (<7.0 mm sagittal diameter of the canal), while in the nonsurgery group, 6 of 18 patients had severe stenosis. The authors report that there were no statistically significant differences in the percentage of patients improved (63 percent surgery vs. 44 percent conservative) or worse (18 percent vs. 11 percent) after 12 years. Although long followup periods are, in general, desirable, the authors suggest that after such a long period, any current patient problems may be caused by factors other than the original stenosis and that this could obscure the efficacy of treatment. This, however, is a difficult hypothesis to test. A study with multiple followup times would seem to be needed, as well as an accounting of the length of time that symptoms are relieved and the number of patients who received relief but later had a return of symptoms. Therefore, this study was not used in our analysis.

The study by Mariconda et al. (2000) consisted of 20 patients who received surgical treatment (standard wide decompressive laminectomy) and 17 patients who did not receive surgery. This latter group consisted of 14 patients who refused surgery and three patients who were not considered for surgery due to advanced cardiopulmonary disease. The treatment given to the patients who did not receive surgery was not described. All patients were older than 40 years, and the mean for the most stenotic dural sac cross-sectional area was 68 mm² in surgical patients and 78 mm² in conservatively treated patients. Baseline characteristics were different between the groups. The nonsurgery group had a better functional status as determined by the overall Beaujon scoring system (mean and standard deviation of 11 ±2.4 v. 8.1 ±2.7, p <0.05, t-test), the Beaujon score for leg pain at exertion (0.8 ±0.7 v. 0.25 ±0.6, p <0.05, t-test), and the Beaujon score for neurological deficits (3.3 ±1.0 v. 2.3 ±1.0, p <0.05, t-test). In the Beaujon scoring system, the overall score goes from 0 (worst) to 20 (normal), the leg pain at exertion score goes from 0 (severe) to 2 (none), and the neurological deficit score goes from 0 (major or sphincter dysfunction) to 4 (none).

Our calculated effect sizes (Hedges' d) for these baseline characteristics were −1.1 (confidence interval −1.80 to −0.41) for the overall Beaujon score; −0.83 (confidence interval −1.50 to −0.16) for leg pain at exertion; and −0.98 (confidence interval −1.66 to −0.29) for neurological deficits. A comparison of outcomes and effect sizes at pretreatment and at 1 and 2 years posttreatment is presented in Figures 21, 22, and 23. While baseline scores showed differences, the scores at one and two years posttreatment were nearly identical, implying that the surgical group had benefited more than the conservative group. However, this may be a case of each group reaching the maximum score possible for these patients (a ceiling effect). In Mariconda et al. (2000), patients with less severe symptoms of lumbar spinal stenosis appeared to improve with conservative treatment alone. However, the discussion we present in the section entitled "Factors That Could Potentially Predict the Success of Conservative Treatment" suggests that this conclusion is far from certain. The lack of a nonsurgical control group with baseline scores similar to the surgical group prevents a reliable evaluation of the benefits of surgery. Although scores improved after surgery, we cannot know the extent to which these scores would have improved or become worse if surgery had not been performed.
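The baseline effect sizes quoted above can be approximately retraced from the reported means and standard deviations. The sketch below assumes the standard small-sample-corrected standardized mean difference and a large-sample 95 percent confidence interval; it also assumes that baseline scores were available for all 20 surgical and 17 nonsurgical patients and that the effect is expressed as surgical minus nonsurgical. The function name is ours, and the exact formula used for the original calculation is not stated in the text.

```python
import math


def hedges_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Bias-corrected standardized mean difference (a minus b) with approximate CI."""
    df = n_a + n_b - 2
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df)
    correction = 1 - 3 / (4 * df - 1)          # small-sample correction factor
    d = correction * (mean_a - mean_b) / pooled_sd
    variance = (n_a + n_b) / (n_a * n_b) + d**2 / (2 * (n_a + n_b))
    half_width = z * math.sqrt(variance)
    return d, d - half_width, d + half_width


# Assumed group sizes: all enrolled patients had baseline scores.
N_SURGERY, N_NONSURGERY = 20, 17

# (mean, SD) at baseline: surgical group first, nonsurgical group second.
baseline = {
    "overall Beaujon score": ((8.1, 2.7), (11.0, 2.4)),
    "leg pain at exertion": ((0.25, 0.6), (0.8, 0.7)),
    "neurological deficits": ((2.3, 1.0), (3.3, 1.0)),
}

results = {
    name: hedges_d(surg[0], surg[1], N_SURGERY, non[0], non[1], N_NONSURGERY)
    for name, (surg, non) in baseline.items()
}
for name, (d, low, high) in results.items():
    print(f"{name}: d = {d:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Under these assumptions the negative sign indicates that the surgical group started from a worse baseline on all three scores, which is the imbalance that complicates the comparison of posttreatment outcomes.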

Figure 21. Beaujon Score from Mariconda, Zanforlino, Celestino et al., 2000. The Beaujon Scoring System goes from a worst possible functional score of 0 to a normal functional score of 20.

Figure 22. Leg Pain at Exertion Score from Mariconda, Zanforlino, Celestino et al., 2000. The Beaujon Scoring System for leg pain exertion goes from a score of 0 (severe) to 2 (no pain).

Figure 23. Beaujon Score for Neurological Deficit from Mariconda, Zanforlino, Celestino et al., 2000. The Beaujon score for neurological deficit goes from 0 (major or sphincter dysfunction) to 4 (none).

Some information on the efficacy of conservative vs. surgical treatment for patients with severe symptoms may be obtained from prospective surgical trials, provided pretreatment patient characteristics are clearly stratified and reported and patients have previously failed conservative treatment. As discussed later in Chapter 5, few surgical studies report the use of and failure of prior conservative therapy. Two single arm surgical trials (no nonsurgical patients were included in the studies) have reported patient characteristics and the failure of prior conservative treatment (Weiner, Walker, Brower et al., 1999; Kleeman, Hiscoe, and Berg, 2000). These studies provide some evidence that patients with severe symptoms will not benefit from conservative treatment but will benefit from surgical treatment. Independent evaluation of patient outcomes in both trials increases the validity of this evidence.

Weiner et al. (1999) demonstrated that partial laminectomy improved walking ability in 30 patients with severe neurogenic claudication (average age of 68 years). Nine months after surgery, the average walking ability increased from approximately 100 m to between 300 m and 600 m. Other measures of patient outcomes also improved; 13 patients had almost complete pain relief and 13 patients had a good deal of pain relief. In a similar study with 54 patients with neurogenic claudication (average age 71 years), Kleeman et al. (2000) used partial laminectomy to relieve leg pain. Two and a half years after surgery, 65 percent of patients reported complete relief from their leg pain and 33 percent reported that their pain was better.

Single arm trials (noncontrolled) have a large potential for bias in favor of the therapy under investigation. Therefore, the studies by Weiner et al. (1999) and Kleeman et al. (2000) may be biased in favor of reporting successful results for surgical treatment. Uncontrolled trials, nonrandomized controlled trials, and historically controlled trials have been shown to favor new therapies over the control therapy compared to RCTs (Sacks, Chalmers, and Smith, 1982; Colditz, Miller, and Mosteller, 1989). Uncontrolled single treatment trials may therefore be expected to produce an overestimate of the benefits of the therapy.

Katz et al. (1999) prospectively evaluated the pain and walking ability of 263 preoperative patients and 199 patients at 2 years post surgery. Patients had a mean age of 69 years at enrollment (range of 50 to 92 years). The use of prior conservative therapy is not reported. Katz et al. (1999) reported that there were no significant differences between patients who dropped out (27 percent of patients by 2 years) and those who remained in the study. At enrollment, 81 percent of patients reported they were in severe pain. By 2 years after surgery, this was 31 percent. Also at 2 years, the percentage of patients who could walk two blocks had increased from 39 percent to 67 percent, and the percentage of patients who could walk 1 mile increased from 13 percent to 42 percent. Eighty percent or more of the patients in this study may be in the severe category of symptoms, but patient outcomes were not stratified according to preoperative conditions.

Other single arm prospective trials of surgical treatment also suggest that patients improve after surgery (Jonsson, Annertz, Sjoberg et al., 1997; Javid and Hadar, 1998). However, these trials have patients with a wide range of ages and symptoms that limit the use of their data for determining if mild-, moderate-, or severe-symptom patients benefit from surgery. Jonsson et al. (1997) examined 105 patients with an average age of 65 years, but a range of 37 to 83. All patients reported preoperative leg pain and dysfunction and were given partial laminectomies. Eighty-six patients were available through the 5 year followup period. At 2 years, leg pain was relieved in 67 percent of patients, and the percentage of patients with poor walking ability (<0.5 km) declined from 70 percent to 30 percent. By 5 years, only 52 percent of patients were free of leg pain and 35 percent of patients could not walk more than 0.5 km. These results for walking capacity should be contrasted with those of Weiner et al. (1999). In that study, patients improved from less than 100 meters to an average of 500 meters. Under the Jonsson et al. (1997) rating of walking ability, the Weiner et al. (1999) patients would still be considered to have poor walking ability. These differences in approach to rating walking ability and the wide ranges in walking ability complicate the comparison of study results and reduce the value of the data presented by Jonsson et al. (1997). Average walking distances are not reported for each of the categories of walking ability in Jonsson et al. (1997). Therefore, we are not able to judge the degree of improvement associated with each initial category of walking ability, nor compare patients from Jonsson et al. (1997) with patients of the same condition in Weiner et al. (1999).

Javid and Hadar (1998) examined 86 patients with central stenosis (average 65 years, range of 27 to 89) and 23 patients with lateral recess stenosis (average 54 years, range of 25 to 79), all of whom received a standard wide laminectomy. Preoperative leg pain and walking difficulty was reported by 98 percent and 76 percent of patients with central stenosis, respectively, and 96 percent and 57 percent of patients with lateral recess stenosis, respectively. Data on walking ability are only reported at the last followup (range 1 to 11 years). At the last followup, patients with central stenosis judged their walking ability as much better or somewhat better (69 percent), no different (10 percent), and somewhat or much worse (21 percent). Patients with lateral recess stenosis judged their walking ability as much better or somewhat better (68 percent), no different (9 percent), and somewhat or much worse (23 percent). These walking data are of little value because 24 percent of the central stenosis patients and 43 percent of the lateral recess stenosis patients originally reported no difficulty in walking, yet their evaluation of walking ability is included in the last followup data. As in Jonsson et al. (1997), no average walking distances are reported for each of the categories of walking ability, and therefore, no comparisons to other studies are possible.

Summary

The lack of comparable patient groups and pretreatment data is a common problem in evaluating studies that examined both surgical and nonsurgical treatment groups. Typically, the two patient groups differ in the extent to which lumbar spinal stenosis affects their signs and symptoms and their pretreatment outcome measurements. This leads to treatment selection based on the extent of disease, and patients are not randomly assigned to treatment groups, except in the recent study published by Amundsen et al. (2000). Less severe cases may tend to receive conservative treatment, and more severe cases may tend to receive surgery. Data are lacking on the effect of conservative treatment on patients with severe stenosis because these patients seem to receive surgery shortly after diagnosis. In Swezey (1996), those patients who did not fare well with conservative treatment tended to have greater stenosis, but data on spinal canal diameter are not reported, and this trend cannot be verified. The trial published by Amundsen et al. (2000) offers the best available evidence that patients with severe symptoms benefit more from surgery than conservative treatment. In this trial, the apparent superiority of surgery in moderate-symptom patients may be an artifact caused by inclusion of severe-symptom patients in the moderate-symptom group. The failure of these patients to respond to conservative therapy supports the benefits of surgery for patients with severe symptoms. The studies by Herno et al. (1996) and Mariconda et al. (2000) suggest that patients with moderate stenosis may improve after receiving only conservative treatment. However, the data presented by Atlas et al. (1996, 2000) suggest that at one year after treatment, patients with moderate pain will benefit more from surgery than from conservative treatment. The statistical significance of the observed long-term effects in this study (4 years) is not robust to a worst-case sensitivity analysis of dropouts.

Question 7. What is the relationship between the results of the imaging examination and the success or failure of surgical treatment?

The objective of this section of the report is to determine whether there is a relationship between the findings of the imaging examination and the outcomes of surgical treatment for spinal stenosis. If such relationships exist, it may be possible to use the imaging results for patient selection and increase the probability of success for the patients who do receive surgery.

Clinical articles that were identified by our literature searches as potentially having information to answer this question are listed in Table 16. Methods for the literature search and details of the study inclusion and exclusion criteria are reported in the section entitled "Quality of the Diagnostic Studies," which appears at the beginning of our discussion of Question 4. Not all the studies that reported both imaging findings and surgical results will be useful in answering this question. Some studies did not report the relationship between the two.

Degree of Stenosis

Twelve of the trials reporting imaging results reported the degree of stenosis (see Table 37). Some trials reported degree of stenosis by the absolute measurement of the spinal canal. Other studies categorized the degree of radiologically observed stenosis on scales with two to five points. Only these latter reports are analyzed in this section. No two articles reported the same system for categorizing the degree of stenosis, suggesting that there is no agreement on what degree of stenosis is considered severe and what is not.

Table 37. Relationship between Degree of Stenosis and Surgical Results.

Hurri et al. (1998) used a two-point quantitative scale to measure stenosis: whether the sagittal (anterior-posterior) diameter of the dural sac (measured by myelography) was more or less than 7 mm. Their 1998 article reports disability rates among their patients at long-term (12 years) followup. They used the Oswestry index to measure back pain disability (100 is completely disabled). Patients with severe stenosis at the time of treatment had greater disability (mean Oswestry index: 39.0) than patients with moderate stenosis (mean Oswestry index: 28.0). The difference was statistically significant both for the unadjusted score and when adjusted for age, sex, and other factors. Hurri et al. (1998) did not distinguish between patients given surgical and nonsurgical treatment. We cannot determine from this article whether surgical treatment was more effective in one group or the other (Hurri, Slatis, Soini et al., 1998).

A Finnish study reported stenosis grade in a five-point scale: total block, subtotal block, anterior-posterior diameter less than 10 mm, 10 to 12 mm, and more than 12 mm (Airaksinen, Herno, Turunen et al., 1997). All 438 patients in this series were treated surgically, so we cannot use these results to see whether image data can be used for patient selection. As with Hurri et al.'s (1998) trial, the Finnish group found a statistically significant association between stenosis degree and Oswestry score at followup (mean 4.3 years). However, this group found the opposite relationship; more severe stenosis was associated with better, not worse, outcomes. If more data from this study had been reported, it may have been possible to resolve this apparent contradiction. Preoperative Oswestry score data could have allowed one to determine if patients with the most severe pretreatment stenosis in the Finnish trial, like those in the Hurri trial, had more severe symptoms than patients with a lesser degree of stenosis.

Nishizawa and Fujimura (1997) used a three-point scale and reported myelography and enhanced CT results separately. Surgical outcomes were reported using the Japan Orthopaedic Association score. The "recovery rate" reported as the outcome measure controlled for preoperative condition because it measured the percentage degree to which patients' scores returned to normal. The percentage of patients with successful surgical outcomes was not significantly affected by stenosis degree. However, the statistical test used by the authors was not reported, and we cannot independently verify the findings. As with the Finnish group, no nonsurgical control group existed to compare results (Nishizawa and Fujimura, 1997).
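
The idea of a "recovery rate" that controls for preoperative condition can be illustrated with a short sketch. The report does not state the formula used by Nishizawa and Fujimura; the definition below (improvement achieved divided by improvement possible) is a commonly used one and is an assumption here, as are the function name, the maximum score of 29, and the example patient values.

```python
def recovery_rate(preop: float, postop: float, max_score: float) -> float:
    """Percentage of the possible improvement that was actually achieved.

    Assumed definition (not taken from the report):
        100 * (postop - preop) / (max_score - preop)
    """
    if preop >= max_score:
        raise ValueError("preoperative score must be below the maximum score")
    return 100.0 * (postop - preop) / (max_score - preop)


# Hypothetical patient: 10 points before surgery, 20 points after,
# on a scale whose normal (maximum) score is assumed to be 29.
example_rate = recovery_rate(preop=10, postop=20, max_score=29)
print(f"{example_rate:.1f}%")  # prints 52.6%
```

Because the denominator is the room for improvement that each patient had before surgery, two patients with the same absolute gain can have different recovery rates, which is why such a measure partly adjusts for baseline severity.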

Silvers et al. (1993) reported patient satisfaction rates in a subset of their case series with complete block seen on preoperative myelography. We were able to calculate the rates among patients without complete block (Table 37) from the totals reported in the article. The patients with complete block reported better results than those without complete block. However, no preoperative data are available to allow us to determine whether a difference between groups existed before surgery or was the result of surgery.

The study by Johnsson et al. (1991) compared patients with complete block on myelography to patients with moderate stenosis. Patients with spinal canal diameter of 12 mm or more were excluded. Patients who were not operated on were used as a control group, but patients were not randomized to that group. Comparisons of the operated and nonoperated groups are analyzed as part of Question 6. The mean spinal canal measurement was slightly larger in the nonoperated group than in the moderate stenosis group, and it was not reported whether any of the nonoperated patients had complete block. Several outcome measures were reported, including patients' and clinicians' estimations of results (worse, unchanged, improved), degree of pain, and estimation of walking distance. None of the results appeared to be associated with degree of stenosis (χ2 for overall results = 4.31, df = 2, p = 0.12; χ2 for pain = 0.19, df = 2, p = 0.91) (Johnsson, Uden, and Rosen, 1991).
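
The p values quoted for Johnsson et al. (1991) can be checked by hand, because with 2 degrees of freedom the upper-tail probability of a chi-squared statistic has the closed form exp(−χ2/2). The sketch below uses only the statistics reported in the text; the function name is illustrative.

```python
import math


def chi2_p_value_df2(chi2: float) -> float:
    """Upper-tail p value of a chi-squared statistic with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


p_overall = chi2_p_value_df2(4.31)  # overall results
p_pain = chi2_p_value_df2(0.19)     # pain
print(round(p_overall, 2), round(p_pain, 2))  # prints 0.12 0.91
```

Both values agree with the reported p = 0.12 and p = 0.91.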

The threshold defining "marked" and "moderate" stenosis in the article by Jalovaara et al. (1989) was 10 mm. Patient outcomes were measured after a mean of 3.8 years of followup and were based mostly on degree of pain and return to normal activities. "Good" outcomes were attained when patients had no more than slight pain and resumed normal activity with few limitations. "Fair" outcomes were attained if patients had some relief of pain and could resume limited activities. The other two outcomes, "unchanged" and "worse," are self-explanatory. The results appear to be better for patients with moderate stenosis than for those with severe stenosis. However, as with the other trials, we do not know the patients' preoperative condition, so we cannot tell whether the improvement was the result of the surgery (Jalovaara, Lahde, Iikko et al., 1989).

Surin et al. (1982) reported on a trial with similar groups of patients. This trial was specifically intended to measure the effect of spinal canal measurement on surgical results, but the patients' preoperative condition was reported in little detail, making it difficult to distinguish treatment effects from pre-treatment differences between groups. This renders interpretation of the results difficult (Surin, Hedelin, and Smith, 1982).

Johnsson et al.'s (1981) case series included some patients with no compression of the spinal cord evident on the myelogram (spinal canal AP diameter 11 mm or more) as well as patients with complete or partial block observed. There was no improvement after surgery in any of the patients whose myelograms were negative for stenosis. Otherwise, outcomes reported in the article did not differ significantly between patients with moderate stenosis and patients with a complete block. The investigators did not explain the unusual results for patients with negative myelograms and provided little information on these cases. The types of symptoms of the patients without myelographic evidence of stenosis were not different from those of the other groups. However, the severity of symptoms and the presence or absence of other conditions that could cause these symptoms are not reported. One can question whether these patients did in fact have spinal stenosis. While stenosis is usually defined by its radiographic appearance, Johnsson et al. (1981) included patients in this report based on surgical findings, but those surgical findings do not appear to involve any evaluation of the spinal canal itself. Their definition of spinal stenosis was hypertrophy of the neural arches and especially the facet joints, as well as a lack of dura pulsation and epidural fat. Considering that all the other trials reported using the imaging findings to define spinal stenosis, interpretation of the results of Johnsson et al. (1981) is difficult (Johnsson, Willner, and Pettersson, 1981).

Measurements at surgery were also reported by Verbiest (1977), whose case series spanned the period of 1948 to 1975. No imaging findings were reported in this article, so we cannot determine whether he concurs with Johnsson et al. (1981) about the surgical results of patients with negative myelograms.

Paine reported surgical results from a large series of patients (n = 457) with disk herniation (Paine, 1976). Degenerative spinal stenosis was diagnosed in addition to disk herniation in 95 patients, and myelography results were reported for 91 of them. Details of the methods are lacking, but Paine's article includes a table correlating overall results (combining return-to-work information and patients' pain complaints) with degree of stenosis. Our chi-squared test on these data found no significant difference in results between stenosis groups (χ2 = 8.51, df = 9, p = 0.48).

Number of Stenotic Levels

Sato and Kikuchi (1997) measured the number of stenotic levels rather than the degree of stenosis. They did not find statistically significant differences in JOA scores between patients with stenosis at both L4 and L5 and patients with stenosis only at L5. Data reported in the article are insufficient to permit us to verify this finding or to determine whether the findings resulted from a lack of statistical power in the study design (Sato and Kikuchi, 1997).

Paine (1976) also found that surgical results were better among patients with stenosis of only one vertebral level, compared to patients with two or more levels of stenosis, but our reanalysis of the data found no statistically significant difference (χ2 = 8.355, df = 6, p = 0.21). Preoperative condition of the patients was not reported, so we cannot determine whether surgery was more effective for the patients with mild stenosis (Paine, 1976).
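
The two reanalyses of Paine's data reported in this chapter (χ2 = 8.51 with df = 9, and χ2 = 8.355 with df = 6) can be checked with a small routine that uses the standard closed-form series for integer degrees of freedom. This is a minimal sketch using only the Python standard library; the function name is illustrative, and the underlying contingency tables are not reproduced here, so only the conversion from test statistic to p value is verified.

```python
import math


def chi2_p_value(chi2: float, df: int) -> float:
    """Upper-tail p value for a chi-squared statistic with integer df >= 1."""
    if df < 1 or chi2 < 0:
        raise ValueError("need df >= 1 and chi2 >= 0")
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in odd powers of sqrt(x)
    root = math.sqrt(chi2)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= chi2 / (2 * r + 1)
    return math.erfc(root / math.sqrt(2.0)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


p_degree = chi2_p_value(8.51, 9)   # results by degree of stenosis
p_levels = chi2_p_value(8.355, 6)  # results by number of stenotic levels
print(round(p_degree, 2), round(p_levels, 2))  # prints 0.48 0.21
```

Both results match the p values quoted in the text.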

Spondylolisthesis and Scoliosis Measurement

McCullen et al. (1994) reported a regression analysis for the effect of various patient characteristics and imaging findings on surgical outcomes. Stenosis grade or spinal measurements were not included in the analysis, but scoliosis and spondylolisthesis measurements before and after surgery were included. They obtained the data from measurements of plain film x-rays. From those data, we calculated correlation coefficients and their significance (McCullen, Bernini, Bernstein et al., 1994).

Preoperative spondylolisthesis, measured as the percentage of slippage (displacement) between the vertebrae, had a significant negative association with surgical results (correlation coefficient −0.172, p = 0.03). This relationship lost statistical significance when regression results were adjusted for the patients' sex (correlation coefficient −0.148, p = 0.07). Scoliosis was not significantly associated with outcomes (sex-adjusted correlation coefficient −0.107, p = 0.24). As expected, decreases in spondylolisthesis after surgery had a highly significant association with positive surgical outcomes (sex-adjusted correlation coefficient −0.271, p = 0.004). Spondylolisthesis is a predictor of worse outcomes after surgery, but these data do not tell us whether measuring spondylolisthesis predicts whether surgery will help a particular patient.

Summary

No published trials provided the data necessary to determine whether a group of lumbar spinal stenosis patients with particular results on some diagnostic imaging test will have better results after surgery than another group. The one controlled surgical trial that reported imaging results (Hurri, Slatis, Soini et al., 1998) did not differentiate between surgical and nonsurgical patients in its published imaging results.

Nine uncontrolled trials (see Table 37) reported surgical outcomes as a function of degree of stenosis, as measured by myelography. All nine of them used different scales to categorize stenosis. Two trials (Airaksinen, Herno, Turunen et al., 1997; Johnsson, Willner, and Pettersson, 1981) reported better outcomes among patients with more severe stenosis. Three trials (Hurri, Slatis, Soini et al., 1998; Silvers, Lewis, and Asch, 1993; Surin, Hedelin, and Smith, 1982) reported worse outcomes among patients with more severe stenosis. Four trials (Jalovaara, Lahde, Iikko et al., 1989; Johnsson, Uden, and Rosen, 1991; Nishizawa and Fujimura, 1997; Paine, 1976) reported no significant difference between the groups. Most of the studies failed to report preoperative data that would allow us to determine whether the observed differences between groups resulted from surgery or were preoperative differences.

The results found in this set of clinical trials could have several different causes. Sufficient information is not available in any of the articles to permit us to prove an association between imaging findings and surgical results. At this time, we cannot use imaging results to identify patient groups that would be more or less likely to benefit from surgery. One trial (Johnsson, Willner, and Pettersson, 1981) found that patients with normal myelograms did not benefit from surgery, but there is a question as to whether those patients even had spinal stenosis.

Question 8 What is the relationship between the type of surgery received and the success or failure of surgical treatment?

This question addresses the connection between the types of surgical procedures used and surgical success. For a given patient group, which type of surgical procedure produces the greatest benefit? As with the preceding question, we performed a systematic narrative review and calculated each study's effect size for differences between related groups whenever possible.

Comparisons of Laminectomy Procedures in Patients With Central Lumbar Stenosis

One randomized controlled trial and 10 controlled trials address the efficacy of surgical treatments for patients with central lumbar spinal stenosis (see Table 25). The randomized controlled trial of Grob et al. (1995) examined the benefits of partial laminectomy with and without fusion and instrumentation (Grob, Humke, and Dvorak, 1995). Hanakita et al. (1999) compared laminectomy to laminectomy plus fusion and to partial laminectomy among patients less than 64 years old (Hanakita, Suwa, and Mizuno, 1999). Thomas et al. (1997) compared laminectomy with laminotomy, Yone et al. (1996) compared laminectomy with fusion and instrumentation to laminotomy, and Ray (1982) compared laminectomy to partial laminectomy (Thomas, Rea, Pikul et al., 1997; Yone, Sakou, Kawauchi et al., 1996; Ray, 1982).

Only three trials, Hanakita et al. (1999), Thomas et al. (1997), and Ray (1982), provided a possible comparison of results between surgical methods: laminectomy versus partial laminectomy or laminotomy. However, differences between outcomes measured and the mean ages of the patient groups being compared prevented any meaningful combination of data. Hanakita et al. (1999) compared patients all younger than 65 years, while the mean age for Thomas et al. (1997) was 64, and the mean ages for the laminectomy patients and the partial laminectomy patients in Ray were 54 and 46 years, respectively. Thomas et al. (1997) and Ray could have been combined, but each study used different outcome measures (Thomas et al. used walking, and Ray used a global assessment) as well as different followup periods (Thomas et al. used 24 to 62 months, and Ray used 10 and 13 months).

Hanakita et al. (1999) reported a retrospective study of patient evaluation of surgical outcome. Patients received laminectomy, laminectomy plus fusion, or partial laminectomy. No pretreatment data are available to determine patient status before surgery; therefore, there is no means of evaluating the success of each surgical procedure. Comparisons across surgical methods are also complicated by the fact that patients with more than a 10 mm slip distance or a slip of more than 15° received fusion, so the patient groups are not the same. Therefore, this study cannot be used to evaluate the relationship between surgical methods and successful outcomes (Hanakita, Suwa, and Mizuno, 1999).

Ray (1982) reported on patients who received either laminectomy or partial laminectomy. The only patient information provided was average age and duration of pain before surgery. As in the previous study, the lack of data on pretreatment conditions prevents the use of this study to evaluate the relationship between surgical methods and successful outcomes (Ray, 1982).

Thomas et al. (1997) compared laminectomy (12 patients) to laminotomy (14 patients). Walking distance in city blocks (the precise distance of a city block was undefined) was used to measure success. Pretreatment walking data were provided to show that walking capacity was similar between patient groups. Back pain, leg pain, and pain while walking were measured but incompletely reported. The original data and our analysis of these data are contained in Table 38. The effect size calculated from these data is very small (−0.015, which is not statistically significant), and a large increase in patient numbers is not likely to make this difference statistically significant. However, generalizing from such a small trial is problematic, and further research is needed before one can conclude that these two surgical methods are of equal efficacy. Further, it is not clear from this nonrandomized study that the patients in both groups were similar before treatment, nor is it clear whether the two surgical methods had different effects on symptoms other than walking capacity. Data are provided for pain while walking, on a 0 to 10 scale, for the laminectomy group only. The article refers to a slightly less marked reduction in walking pain after surgery in the laminotomy group than in the laminectomy group, but provides no supporting data. Additional patient data are needed to determine whether surgical method influenced the extent of back pain, leg pain, and pain while walking after surgery (Thomas, Rea, Pikul et al., 1997).

Table 38. Analysis of Walking Capacity Data from Thomas, Rea, Pikul et al., 1997.

The Effect of Fusion and Instrumentation in Patients With Central Lumbar Stenosis

The single randomized controlled trial, Grob et al. (1995), examined 15 patients with partial laminectomy and 30 patients with fusion and instrumentation. The patients who received fusion and instrumentation were evenly divided between fusion at a single level and fusion at multiple levels. The mean age at the time of the operation was 66, 71, and 71 for no-fusion, single-fusion, and multiple-fusion groups, respectively. Walking capacity, back pain, back pain relief, leg pain relief, and a global assessment were recorded at pretreatment and 24 to 32 months later. Author evaluation of the data found statistically significant increases in walking distance and significant decreases in pain in all the surgical groups but found no differences between groups for relief of pain. Our calculation for the pain-relief data (see Table 39) yielded a nonsignificant effect size of −0.429 (Hedges' d) between no-fusion and single-fusion. Because of its small sample size, this study is underpowered. If the effect size had been 0.5, the study would have needed 65 patients in each of the two treatment groups to have a power of 0.8 (80 percent probability of finding the difference between treatments significant at p < 0.05, two tailed). The actual power of this study was approximately 0.25. These power calculations are based on tables found in Rosenthal and Rosnow (1991). Given the low power, this study was only capable of detecting improvements within surgical groups and was not capable of detecting differences between surgical methods. Therefore, the data in this study cannot be used to determine whether the addition of fusion and instrumentation improves surgical outcome (Grob, Humke, and Dvorak, 1995).
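
The power figures above come from published tables, but their order of magnitude can be reproduced with a normal approximation for a two-group comparison of means. The sketch below is an approximation under stated assumptions, not the method used in this report: it treats the test statistic as normally distributed, ignores the negligible opposite tail, and uses 15 patients per group (the no-fusion and single-fusion groups). Function names are illustrative.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_groups(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-group comparison for effect size d."""
    z_crit = _Z.inv_cdf(1 - alpha / 2)
    noncentrality = abs(d) * math.sqrt(n_per_group / 2)
    return 1 - _Z.cdf(z_crit - noncentrality)


def n_per_group_for_power(d: float, power: float = 0.8, alpha: float = 0.05) -> int:
    """Approximate patients needed per group to reach the requested power."""
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return math.ceil(2 * (z_sum / abs(d)) ** 2)


needed = n_per_group_for_power(0.5)    # about 63 per group
actual = power_two_groups(0.5, 15)     # roughly 0.28 with 15 per group
observed = power_two_groups(0.429, 15) # roughly 0.22 for the observed effect size
print(needed, round(actual, 2), round(observed, 2))
```

The normal approximation gives slightly more optimistic numbers than t-based tables (about 63 rather than 65 patients per group, and a power near 0.28 rather than 0.25), but it leads to the same conclusion: with 15 patients per group, a medium-sized difference between treatments would be missed about three times out of four.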

Table 39. Comparison of Effect Sizes for Data on Pain Relief from Grob, Humke, and Dvorak, 1995.

The Effect of Fusion and Instrumentation in Patients With Lateral Lumbar Stenosis

The randomized controlled trial of Lee and deBari (1986) evaluated the addition of Knodt distraction rods to decompressive laminectomy and bilateral-lateral fusion. Each treatment group had 12 patients who were diagnosed as having multiple-level foraminal stenosis. The mean age for all patients was 45.7 years (range 28 to 65), and the followup evaluations were performed at a mean of 38 months (range 12 to 72). A 100-point scoring system based on amount of pain and degree of tolerable activities of daily living was used to evaluate patients before and after surgery. Patients with scores less than 30 were considered to have severe constant pain with severe incapacitation of activities, and patients with scores between 75 and 100 were considered to have minimal or no pain and restriction of activities. The preoperative and postoperative scores and our analysis of these data are presented in Table 40. Both patient groups had similar preoperative scores and improvement in scores, and the authors concluded that the use of rods appears to offer no clinical benefit. Our calculations show that the effect sizes for the postoperative score and the improvement score are small. A study with several hundred more patients would be needed to show that these effect sizes indicate a true difference between treatment groups. Therefore, this study provides evidence that the addition of Knodt distraction rods to decompression and fusion does not improve surgical outcomes (Lee and deBari, 1986).

Table 40. Comparison of Effect Sizes for Data on Global Assessment from Lee and deBari, 1986.

The Effect of Fusion and Instrumentation in Patients With Degenerative Spondylolisthesis

Four randomized controlled trials and six controlled trials examined surgical treatment of degenerative spondylolisthesis (see Table 30).

The randomized controlled trial of Thomsen et al. (1997) separately reported outcomes for patients with primary degenerative spondylolisthesis and patients with isthmic or secondary degenerative spondylolisthesis. Twenty-one patients with primary degenerative spondylolisthesis receiving partial laminectomy and fusion were compared to 20 patients with primary degenerative spondylolisthesis receiving partial laminectomy, fusion, and instrumentation. Functional outcome was assessed 2 years after surgery using the Dallas Pain Questionnaire. This questionnaire examines daily, work, and leisure activities; anxiety-depression; and social concerns, specifically among patients with low back pain. Mean differences of 73 and 37 between pretreatment and posttreatment scores were reported for the instrumented group and the noninstrumented group, respectively. An increase in scores indicates improvement. The authors reported that these differences in scores were not significant when using the Mann-Whitney test. The authors also reported that 44 patients were needed in each treatment group to detect a clinically relevant difference in functional outcome of 15 in the Dallas Pain Questionnaire score. Sufficient patients were included in the total study, but within the primary degenerative spondylolisthesis subgroup, only 20 and 21 patients were available. This study may therefore have had too few primary degenerative spondylolisthesis patients to detect differences in outcomes due to treatment, and it does not provide reliable evidence for the use of instrumentation with fusion (Thomsen, Christensen, Eiskjaer et al., 1997).

The randomized controlled trial of Fischgrund et al. (1997) compared decompressive laminectomy and fusion with (40 patients) and without (35 patients) instrumentation. Back pain and leg pain were rated on a 0 (no pain) to 5 (severe pain) scale. Before surgery, the mean back pain and leg pain scores were 4, and between 24 and 36 months after surgery, the mean scores were significantly reduced to 1 or 2. A global assessment of surgical outcomes was also reported. Our analysis of the global assessment data is presented in Table 41. The Hedges' d of 0.327 was not statistically significant (confidence limits of −0.152 to 0.805, p = 0.181). Both the pain measurements and the global assessment measurements indicate that both groups responded equally well to surgery regardless of the addition of instrumentation. However, this small study likely contains too few patients to conclusively show that instrumentation does not improve surgical outcomes (Fischgrund, Mackay, Herkowitz et al., 1997).
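
The reported effect size, confidence limits, and p value for Fischgrund et al. (1997) can be checked for internal consistency. The sketch below assumes that the confidence limits are 95 percent limits computed as the estimate plus or minus 1.96 standard errors (the report does not state this explicitly); under that assumption the standard error can be recovered from the width of the interval and converted to a two-tailed p value. The function name is illustrative.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def p_from_confidence_limits(estimate: float, lower: float, upper: float,
                             level: float = 0.95) -> tuple[float, float, float]:
    """Recover the standard error, z statistic, and two-tailed p value
    from a symmetric normal-theory confidence interval."""
    z_crit = _Z.inv_cdf(0.5 + level / 2)       # about 1.96 for a 95% interval
    std_err = (upper - lower) / (2 * z_crit)   # half-width divided by z_crit
    z_stat = estimate / std_err
    p_value = 2 * (1 - _Z.cdf(abs(z_stat)))
    return std_err, z_stat, p_value


std_err, z_stat, p_value = p_from_confidence_limits(0.327, -0.152, 0.805)
print(round(std_err, 3), round(z_stat, 2), round(p_value, 2))
```

The recovered p value is about 0.18, in line with the reported p = 0.181, and the interval is centered on the reported Hedges' d of 0.327. Because the interval includes zero, the data are compatible both with no effect of instrumentation and with a moderate benefit, which is why the trial cannot settle the question.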

Table 41. Effect Size for Data on Global Assessment from Fischgrund, Mackay, Herkowitz et al., 1997.

The randomized controlled trial of Herkowitz and Kurz (1991) compared laminectomy (25 patients) to laminectomy plus fusion (25 patients). Back pain and leg pain were rated on a 0 (no pain) to 5 (severe pain) scale, and operative results were rated as excellent, good, fair, or poor. Followup time averaged 36 months (range of 29 to 48 months). The trial reported that patients with fusion had significantly less back pain and leg pain (p = 0.05, Student t test) and were significantly improved based on the operative results scale (p = 0.0001, Fisher exact test). Our analysis of these data is presented in Table 42 and Table 43. These data suggest that the addition of fusion to decompressive laminectomy benefited patients by reducing pain to a larger extent than decompressive laminectomy alone (Herkowitz and Kurz, 1991).

Table 42. Effect Size for Data on Global Assessment from Herkowitz and Kurz, 1991.

Table 43. Effect Size for Data on Back and Leg Pain from Herkowitz and Kurz, 1991.

The randomized controlled trial of Bridwell et al. (1993) purposely violated the randomization procedure so that patients with pathological motion (>10° or >3 mm of slippage) always received partial laminectomy, fusion, and instrumentation (24 patients). Other patients received either partial laminectomy only (9 patients) or partial laminectomy and fusion (10 patients). Thus, patients with the most severe condition were segregated to one treatment, and any comparison to the other treatments containing patients with less severe conditions is biased. Therefore, we cannot use these data in a comparison of surgical methods (Bridwell, Sedgewick, O'Brien et al., 1993).

Plotz and Benini (1998) retrospectively reviewed 106 patients who received decompressive surgery and separated patient data into three groups according to additional surgical methods: fusion, fusion plus translaminar screw fixation, or fusion plus AO internal fixator. At the time of followup, 9 to 120 months after surgery, the fusion-only group had fewer than 10 patients, and there were 14 and 64 patients in the other groups, respectively. Since the control group had fewer than 10 patients, we did not perform any further analysis (Plotz and Benini, 1998).

Yuan et al. (1994) reported on an open, nonblinded, historical cohort study that collected patient data from surgeons. Patients received either fusion alone (456 patients) or fusion and pedicle screw fixation (2,177 patients). At followup 23 to 51 months after surgery, patient function, back pain, and leg pain were judged as improved, no change, or worse. The no-change and worse categories were combined and compared to the improved category using a chi-square test and a 2 by 2 table. Each patient group showed significant improvement in functional status (p < 0.02), back pain (p < 0.01), and leg pain (p < 0.03). Our analysis of these data is presented in Table 44. This analysis suggests that pedicle screw fixation benefited patients with degenerative spondylolisthesis, but interpretation of these findings is not straightforward because the historical, nonblinded design of the trial could introduce bias. The usefulness of these data is further reduced by the lack of pretreatment pain measures. Such measures could be used to determine the actual extent of changes in pain perception and to determine whether patients in the two groups were similar enough to permit a valid comparison of treatments (Yuan, Garfin, Dickman et al., 1994).

Table 44. Effect Size for Data on Functional Status, Back Pain, and Leg Pain from Yuan, Garfin, Dickman et al., 1994.

Satomi et al. (1992) reported on a retrospective study that divided patients into groups receiving fusion and instrumentation (no decompression) and those receiving various types of decompressive surgery. These groups also differed in mean age (51 vs. 69 years) and mean time from onset of disease (3.8 vs. 8.7 years). These differences in patient characteristics between treatment groups bias any comparison of treatment outcomes. Therefore, these data were not used in our analysis (Satomi, Hirabayashi, Toyama et al., 1992).

Lombardi et al. (1985) retrospectively evaluated patients with standard wide laminectomy with (21 patients) and without (20 patients) fusion. This study used a five-level global assessment (excellent, good, fair, poor, failure) to evaluate surgical success 24 to 84 months posttreatment. Our analysis of these data is presented in Table 45 and indicates no differences in global outcomes between treatment groups. The power of this study is about 30 percent, indicating that additional patients would be needed to definitively establish that this lack of difference is, in fact, real. The retrospective design did not allow for the inclusion of pretreatment patient data concerning back or leg pain and walking capacity. The lack of these pretreatment data reduces the usefulness of this study in determining the extent to which fusion may benefit patients with spondylolisthesis (Lombardi, Wiltse, Reynolds et al., 1985).

Table 45. Effect Size for Data on Global Assessment from Lombardi, Wiltse, Reynolds et al., 1985.

Fitzgerald and Newman (1976) conducted a retrospective examination of patients given a rigid brace as conservative treatment (29 patients) and a variety of decompressive surgical methods. Patients with less severe symptoms (no indication of nerve root compression) were placed in the conservative treatment group. The differences in methods within the surgery group and the differences in patient characteristics between groups prevent any meaningful comparison of treatment outcomes. Therefore, this study cannot be used in our analysis of surgical procedures (Fitzgerald and Newman, 1976).

Rosenberg (1976) retrospectively reported on patients receiving partial laminectomy (11 patients) and standard wide laminectomy (15 patients). The lack of a specifically defined global outcome scale and the lack of specific pretreatment patient characteristics to judge the effect of treatment prevent any meaningful comparison of treatment outcomes. Therefore, this study cannot be used in our analysis of surgical procedures (Rosenberg, 1976).

Summary

Taken as reported, the results of the randomized controlled trials of Thomsen et al. (Thomsen, Christensen, Eiskjaer et al., 1997) and Fischgrund et al. (Fischgrund, Mackay, Herkowitz et al., 1997) would seem to suggest that instrumentation in addition to fusion does not improve surgical outcomes among patients with spondylolisthesis. However, both trials likely had too few patients (and, therefore, insufficient statistical power) to render any definitive conclusion.

Herkowitz and Kurz (1991) provide evidence that fusion is beneficial compared to decompressive surgery alone in these patients. The other RCTs had flaws in design or reporting that made their results of questionable value.

Drawing conclusions from just one or two trials is, however, problematic. This is because such conclusions have a rather high potential to be influenced by publication bias (the potential that trials with negative findings are not published) or the "file drawer" problem (the tendency of investigators to not report findings that are not statistically significant). Until data from larger and well-designed randomized controlled trials are available, reliable conclusions are not possible.

Question 9 What costs are associated with nonsurgical and surgical treatment of spinal stenosis?

This question does not appear in the evidence models. Information on the cost of surgical treatment of lumbar spinal stenosis came from several sources and is presented in Table 46. Because present data did not allow us to estimate the effectiveness of any treatment or diagnostic, we were unable to perform a cost-effectiveness analysis. Therefore, in this section, we present only cost information.

Table 46. Treatment Costs for Procedures Related to Lumbar Spinal Stenosis.

Information in Table 46 is derived from several sources. The first of these is the Healthcare Cost and Utilization Project (HCUP). The HCUP is a survey of hospital patient information and charges that is developed and annually updated by AHRQ. The second entry in the table is from the Medical Provider Analysis and Review (MEDPAR) database of the Health Care Financing Administration (HCFA). This file contains information from Medicare beneficiaries using hospital inpatient services. MECQA, the third entry in the table, provides benchmarking figures. The Nelson et al. (1999) data are derived from a 2.5-year study in which third-party payers in Minneapolis were surveyed about average costs. The average followup was 16 months after hospital discharge. Presented in the table are average surgical costs. The Engel et al. (1996) data are derived from a one-year prospective cohort study of 1,059 patients with back pain. The Shekelle et al. (1995) data are derived from a prospective, community-based observational study on the costs of back pain care. In the table, we present only inpatient costs per episode of back pain care. The Katz et al. (1997) data compare the cost of laminectomy without fusion, laminectomy with fusion, and laminectomy with fusion and instrumentation from patients operated on at Brigham and Women's Hospital from 1989 to 1993. Finally, professional fees for office visits were obtained from the Medicare Physicians Fee Schedule for 1999.

Footnotes

*

See Volume 2 of this report for Evidence Tables 1-25.
