NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Institute of Medicine (US) Committee on Technological Innovation in Medicine; Gelijns AC, editor. Modern Methods of Clinical Investigation: Medical Innovation at the Crossroads: Volume I. Washington (DC): National Academies Press (US); 1990.

Modern Methods of Clinical Investigation: Medical Innovation at the Crossroads: Volume I.

2 The Selection of Endpoints in Evaluative Research

JOHN P. BUNKER

Having repeatedly urged that we make a greater investment in the evaluation of medical technologies, it is perhaps only fitting that I discuss the endpoints one should address during the various stages of the development process, and when one might rely on intermediate endpoints as surrogates for clinical endpoints. I will consider condition-specific mortality versus all-cause mortality, and—where mortality is not a central issue—condition-specific outcomes versus all-cause outcomes. I will also address the underlying issue of risks and benefits; that is, the issue of trade-offs in the evaluation of therapeutic technology.

SURROGATE VERSUS CLINICAL ENDPOINTS

The surrogate-versus-clinical endpoints battle is particularly prominent now in the drug arena. The issue is when new drugs should be released for clinical use. Under most circumstances, the Food and Drug Administration (FDA) has required evidence of clinical improvement and has rejected surrogate endpoints in making such decisions. The resultant delay has brought continuing opprobrium on the FDA. In a highly controversial and well-publicized decision, the FDA initially withheld approval of tissue-type plasminogen activator (t-PA), although evidence had been presented that t-PA lysed coronary thrombi and that arterial patency was achieved more frequently with t-PA than with streptokinase. But there was no evidence at the time that t-PA increased survival over that obtainable with streptokinase. Among the outraged critics of the decision was the Wall Street Journal which, under the headline “Human Sacrifice,” mounted one of its many attacks on the regulatory bureaucracy of the FDA.

The FDA later did approve t-PA, which for a brief period appeared the treatment of choice. Now there are beginning to be second thoughts. A New England Journal of Medicine report from New Zealand (1), showing no difference in ventricular function, coronary artery patency rates, and reinfarction—again, incidentally, surrogate endpoints—suggests that there is no difference between these two drugs other than cost. One should bear in mind that the sample size was small, with 130 and 135 patients receiving streptokinase and t-PA respectively. Of course, one major advantage of surrogate endpoints is that smaller sample sizes may be adequate. Even with such a small sample, it is interesting to note that after 30 days there were 10 deaths in patients receiving streptokinase and 5 in patients receiving t-PA; after 9 months there were 12 and 8 deaths respectively. While the differences in mortality may appear suggestive, the p-values, 0.2 and 0.34 for the two time periods, did not reach the conventional level of statistical significance.
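As a rough check on the arithmetic above, the death counts can be put through a standard two-by-two test. The sketch below assumes a two-sided Fisher's exact test; the test actually used in the report is not stated here, so the output need not reproduce the quoted p-values of 0.2 and 0.34 exactly. The function name and the table layout are illustrative only.

```python
from math import comb


def fisher_exact_two_sided(deaths_1, alive_1, deaths_2, alive_2):
    """Two-sided Fisher's exact test for a 2x2 table (illustrative helper).

    Sums the hypergeometric probabilities of every table that is no more
    likely than the observed one, holding the row and column totals fixed.
    """
    n_1 = deaths_1 + alive_1          # patients in group 1
    n_2 = deaths_2 + alive_2          # patients in group 2
    total_deaths = deaths_1 + deaths_2
    denom = comb(n_1 + n_2, total_deaths)

    def prob(x):
        # Probability that exactly x of the deaths fall in group 1.
        return comb(n_1, x) * comb(n_2, total_deaths - x) / denom

    p_observed = prob(deaths_1)
    lowest = max(0, total_deaths - n_2)
    highest = min(n_1, total_deaths)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Counts as given in the text: 130 streptokinase and 135 t-PA patients.
p_30_days = fisher_exact_two_sided(10, 130 - 10, 5, 135 - 5)
p_9_months = fisher_exact_two_sided(12, 130 - 12, 8, 135 - 8)
print(f"30 days:  p = {p_30_days:.2f}")
print(f"9 months: p = {p_9_months:.2f}")
```

Under this assumed test, both comparisons stay well above the conventional 0.05 threshold, which is consistent with the point that a trial sized for surrogate endpoints has little power to settle a mortality question.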

The other major surrogate-versus-clinical endpoints battle has been fought over cancer chemotherapy. Again, the FDA has been denounced by the Wall Street Journal for foot-dragging. The question under debate is whether we should expedite the introduction of drugs and under what circumstances. It has always seemed to me that for most drugs the public is better served by the relatively measured and cautious policy adopted by the FDA. My personal view reflects a concern for the risks, both known and unknown, of hastily introduced technology. I believe it was Harold Green, chair of the 1973 Artificial Heart Assessment Panel, who suggested that a delay in introducing a new therapy means only that the public has to live with the status quo, while the widespread use of inadequately tested treatments can possibly expose the public to substantial harm. The views of potential recipients of treatment may be quite different, depending on the severity of the condition. While most of medicine is concerned with conditions that are not life-threatening, it is entirely appropriate that we adopt different attitudes and policies for introducing drugs which treat life-threatening conditions as opposed to those for treating the large proportion of routine medicine.

However well or cautiously we evaluate drugs in Phases I, II, and III, a major shortcoming in how we introduce drugs in this country is in follow-up. Once a drug has been introduced, we have no systematic and comprehensive way to detect or control long-term risks and benefits. It has been observed that Great Britain is willing to introduce drugs at an earlier stage in their development because the British system of post-marketing surveillance (PMS) may be more effective than the United States post-marketing system; see for instance the contribution of Inman in this volume. It is a source of considerable chagrin that our country failed to act on the recommendations of the President's Commission on Post-Marketing Surveillance that would have established a reliable system of PMS a decade ago.

The problem of post-marketing surveillance is at least as great for medical devices and procedures as for drugs. Surgeons in particular do not have good data on long-term outcomes. Note, for example, the incredulity of urologists who learned from John Wennberg's research about the number of patients who die within a year after transurethral resection of the prostate (see Chapter 4).

CLINICAL ENDPOINTS

I will return to the question of how good a surrogate for clinical endpoints an intermediate endpoint may be, but first it will be useful to examine the clinical endpoint itself: How reliable are clinical endpoints? Are they adequate gold standards themselves? The debate over condition-specific versus all-cause mortality is particularly interesting and sobering. It is well recognized that all-cause mortality is a purer endpoint than disease-specific deaths, because all-cause mortality helps avoid such problems as bias in patient selection, missing data, and changes in classification over time.

Proponents of new therapies understandably would prefer to judge their results on the basis of the specific condition the treatment is intended to relieve. An investigator might well ask why death from a completely unrelated cause should count against the proposed therapy. But it is not always clear that the “unrelated” cause is really unrelated. The latest example to come to my attention is a report from Scotland, in the British Medical Journal, in which the authors report an observational study correlating blood cholesterol levels with cardiac deaths and other endpoints, cancer in particular (2). The investigators found the predicted association between cholesterol level and cardiac deaths, but the reduction in cardiac deaths associated with lower cholesterol was offset by an equal increase in cancer deaths.

You may be familiar with the unpleasant fact that in three major lipid drug trials, the fall in cardiac mortality associated with lower blood cholesterol was offset by increased accidental deaths in the experimental groups, and total mortality was unchanged (3,4,5). Investigators still are trying to figure out whether this awkward relationship between lower cholesterol and accidental deaths is causal.

Offsetting mortalities can, of course, go the other way. In studying the possible condition-specific mortality risk of a therapy, it is equally important to examine the possibility that the therapy produces an offsetting fall in total mortality. For example, in the National Halothane Study, we were concerned that some patients receiving the anesthetic halothane would die of liver failure. We were also aware of the possibility that halothane, because of its superior clinical properties, might have offsetting decreases in mortality from other causes. As it turned out, there were but a handful of deaths from liver necrosis, and these were more than offset by a fall in all-cause mortality for patients receiving halothane.

From the foregoing considerations, I posit that the phenomenon of offsetting risks is important and perhaps not adequately appreciated. It is by no means limited to mortality. Mortality is what we tend to study, not only because it is important, but also because it is easier to measure than many other things we would like to know and that are also important. When a technology intended to improve quality of life has both benefits and risks, they are likely to be very difficult to compare. It is the old apples-versus-oranges problem, but even worse since there may be several baskets of different fruits to be balanced in the equation.

Improvement in quality of life is not only an important outcome of medical care; it is the only intended outcome of most of what we do in medicine. In commenting on the failure of cholesterol-lowering drugs to reduce total mortality, Fries, Green, and Levine point out that “the primary purpose of most health promotion activities . . . is to improve quality of life” and that, by implication, it may be unrealistic to expect or demand that length of life be extended (6). More important, they suggest, is the decreased morbidity and improved quality of life that accompany a decrease in risk factors and improved cardiovascular function.

There are any number of therapies, intended to improve quality of life, which have offsetting adverse effects. I will mention a few: thalidomide causing phocomelia, diethylstilbestrol (DES) causing vaginal cancer in the offspring of women receiving it, swine flu vaccine followed by Guillain-Barré syndrome. The last is of particular interest because a very large clinical trial was not large enough to pick up the rare but extremely serious syndrome. The ever-present risk of side effects, many unknown, with everyday treatment is part of the price we must pay when therapy is effective. It is not quite so easy to accept the inevitable complications and ill effects of other therapies, such as the severe malabsorption problems that followed gastric bypass surgery, metabolic imbalances that could have been easily predicted.
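The remark above that a very large trial can still miss a rare complication follows from simple probability. The sketch below is illustrative only: the event rate is a hypothetical figure, not one taken from the swine flu experience, and events are assumed to occur independently across patients. The function names are likewise invented for the example.

```python
import math


def prob_at_least_one(n_patients, event_rate):
    """Chance of seeing one or more events among n independent patients."""
    return 1.0 - (1.0 - event_rate) ** n_patients


def patients_needed(event_rate, confidence=0.95):
    """Smallest n giving at least `confidence` chance of seeing one event."""
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-event_rate))


# Hypothetical rate of a rare adverse event: 1 per 100,000 patients treated.
HYPOTHETICAL_RATE = 1e-5

for n in (1_000, 10_000, 100_000):
    chance = prob_at_least_one(n, HYPOTHETICAL_RATE)
    print(f"{n:>7} patients: chance of at least one event = {chance:.3f}")

print("patients needed for a 95% chance:", patients_needed(HYPOTHETICAL_RATE))
```

At the assumed rate, the number of patients needed is close to three divided by the rate, the familiar "rule of three". Detecting a single case is also a much weaker requirement than showing an excess over the background rate, so the sample needed in practice would be larger still.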

Two common operations that are performed to improve quality of life may have an opposite effect. As Wennberg et al. remind us, prostatectomy is often followed by impotence and incontinence (7). Hysterectomy may be followed by depression and an increase in urinary tract infections. Recent data suggest, however, that the improvements in quality of life for many or most patients undergoing elective hysterectomy or prostatectomy may more than offset the potential ill effects of the procedures. To balance the quality-of-life benefits and risks of such procedures we must consider the values of the patient. These depend heavily on how individual patients perceive the benefits and risks of the procedures. Unfortunately, we do not yet know how to present the issue of risk to patients in a meaningful way; nor, I suspect, do those of us in the profession fully understand these risks. It is clear, however, that different patients have different values, and that patients' values may differ widely from those ascribed to them by their physicians (8).

There is another difficulty. A quality-of-life therapy may have as its goal a single condition-specific benefit that is easily measurable, but we don't have any single all-cause index to identify and measure possible offsetting negative effects. Indeed, we may not even know what side effects to look for when a therapy is introduced. An observational study before introduction may, however, give us some clues as to potential side effects to look for in long-term surveillance.

As Chapter 4 indicates, all outcomes that are relevant to a patient should be included in evaluative research: mortality and morbidity, complications, symptom reduction, and functional status improvement, as well as the standard physiologic and biochemical surrogates. For this purpose, Fries and Spitz, in a recently published book, have proposed a hierarchy of quality-of-life assessment indices for surveillance: death, disability, discomfort, drug side effects, and dollar costs (9), each of the latter four subdivided into relevant components (e.g., pain, fatigue, depression, anxiety).

While it is of course desirable to obtain a definitive evaluation of a new product or treatment as soon as possible, haste can create serious problems, as we saw with t-PA. Another example of the importance of timing is the use of injected chymopapain as an alternative treatment for the relief of ruptured intervertebral disks. In clinical trials it appeared safe and effective. But serious complications (transverse myelitis and anaphylactoid reactions) were reported shortly after the FDA released chymopapain for general use (10). A third interesting and sobering example is the recent report that, in randomized clinical trials comparing mastectomy with mastectomy plus radiation to the chest wall for breast cancer, there was a late increase in serious cardiac events, coronary artery disease, and unrelated malignancies in the group receiving radiation (11). With t-PA and chymopapain the adverse effects were detected very quickly. With DES and with radiation in the foregoing example, they occurred much later. It may even be necessary to wait years.

Radiation techniques for breast cancer have changed and presumably improved considerably, so that patients undergoing lumpectomy and radiation now may be spared these complications, but we simply do not know if that is so. As Chapter 12 points out, devices and procedures are generally subject to incremental innovation. Not only will an operation or procedure differ among different physicians, but the procedure or device itself will be modified over time. It may be difficult ever to know when to evaluate devices and procedures and we may therefore need to follow patients for long periods. We need to invoke the right and accept the responsibility to review the effects of treatment continually, and to revise our clinical decisions as new evidence becomes available.

SURROGATE ENDPOINTS

Returning to surrogate endpoints: they have all the problems of clinical endpoints plus a good many of their own. They can be related only to condition-specific outcomes, and their relationship to hoped-for clinical outcomes may not be a strong one. I might point out that this is analogous to the well-known problem central to the medical audit, the process-versus-outcome relationship that we have all worried about.

The potential usefulness of surrogate endpoints during the early stages of development appears to be strongest in the cardiac area, but we have already seen the problems experienced with t-PA. The use of surrogate endpoints has also been explored with some enthusiasm for cancer chemotherapy, with shrinkage of tumor size the usual proposed surrogate for increased life expectancy. However strong the association between such surrogates and their intended effects may prove to be, a serious limitation of surrogates as a basis for evaluation is that none of the offsetting adverse effects can be determined when surrogate outcomes are used.

I would like to call your attention to the April 1989 issue of Statistics in Medicine, the first four articles of which are devoted to discussion of surrogate endpoints. They explore in depth three conditions that one might consider as having the greatest potential: cancer (12), cardiovascular disease (13), and ophthalmologic disorders (14). One advantage that is emphasized is that surrogates provide earlier answers. But it is of interest that, in its attempt to expedite the availability of drugs to treat AIDS and cancer, the FDA has not moved to allow surrogate endpoints; the enhanced speed is achieved by collapsing Phases II and III and giving such drugs priority treatment (J. Goyan, personal communication, 1989).

In conclusion, I will make four points. First, when dealing with mortality as an endpoint of treatment, all-cause mortality is ignored at the peril of the investigators and the public. Second, when dealing with quality of life, multiple or hierarchical endpoints must be considered; their identity may not be known in advance; and they cannot be summarized in a single number, for there is no all-cause quality of life equivalent of all-cause mortality. Third, a more systematic and comprehensive method of long-term monitoring or surveillance is needed. If one is established, a greater reliance on surrogate endpoints might be justified. Finally, we must be concerned with the complex issue of an informed public's wants and values.

REFERENCES

1. White HD, Rivers JT, Maslowski AH, et al. Effect of intravenous streptokinase as compared with that of tissue plasminogen activator on left ventricular function after first myocardial infarction. New England Journal of Medicine 1989; 320:817–821. [PubMed: 2494454]
2. Isles CG, Hole DJ, Gillis CR, Hawthorne VM, Lever AF. Plasma cholesterol, coronary heart disease, and cancer in Renfrew and Paisley survey. British Medical Journal 1989; 298:920–924. [PMC free article: PMC1836205] [PubMed: 2497858]
3. Frick MH, Elo O, Haapa K, Heinonen OP. Helsinki Heart Study: Primary-prevention trial with gemfibrozil in middle-aged men with dyslipidemia. New England Journal of Medicine 1987; 317:1237–1245. [PubMed: 3313041]
4. Coronary heart disease death, nonfatal acute myocardial infarction and other clinical outcomes in the Multiple Risk Factor Intervention Trial Research Group. American Journal of Cardiology 1986; 58:1–13. [PubMed: 2873741]
5. The Lipid Research Clinics Coronary Primary Prevention Trial results. I. Reduction of incidence of coronary heart disease. Journal of the American Medical Association 1984; 251:351–364. [PubMed: 6361299]
6. Fries JF, Green LW, Levine S. Health promotion and the compression of morbidity. Lancet 1989; 1:481–483. [PubMed: 2563849]
7. Wennberg JE, Roos N, Sola L, Schori A, Jaffe R. Use of claims data systems to evaluate health care outcomes: Morbidity and reoperation following prostatectomy. Journal of the American Medical Association 1987; 257:933–936. [PubMed: 3543419]
8. McNeil BJ, Weichselbaum R, Pauker SG. Fallacy of the five-year survival in lung cancer. New England Journal of Medicine 1978; 299:1397. [PubMed: 714117]
9. Fries JF, Spitz PW. The hierarchy of patient outcomes. In Spilker B (ed.) Quality of Life Assessments in Clinical Trials. New York: Raven Press, 1990: 25–35.
10. Blue Shield of California, Medical Policy Committee (March 4, 1987).
11. Houghton J, Baum M. Adjuvant radiotherapy in breast cancer: Considerations of cost-benefits in relation to the CRC (King's/Cambridge) trial. International Journal of Technology Assessment in Health Care 1989; 5:415–422. [PubMed: 10313312]
12. Ellenberg SS, Hamilton JM. Surrogate endpoints in clinical trials: Cancer. Statistics in Medicine 1989; 8:405–413. [PubMed: 2727464]
13. Wittes J, Latos E. Surrogate endpoints in clinical trials: Cardiovascular diseases. Statistics in Medicine 1989; 8:415–426. [PubMed: 2727465]
14. Hillis A, Seigel D. Surrogate endpoints in clinical trials: Ophthalmologic disorders. Statistics in Medicine 1989; 8:427–430. [PubMed: 2727466]
Copyright © 1990 by the National Academy of Sciences.
Bookshelf ID: NBK235487
