NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Woloshin S, Schwartz LM, Welch HG. Know Your Chances: Understanding Health Statistics. Berkeley (CA): University of California Press; 2008.

Know Your Chances: Understanding Health Statistics.

Chapter 9: Beware of Exaggerated Certainty

Of course, the numbers you see in health messages are not the whole story. We’d now like to add another bit of advice: once you have the numbers, ask yourself whether or not you should believe them. Unfortunately, many statistics should not be accepted at face value, because they convey a sense of exaggerated certainty. There are at least two reasons why reported research findings might not be right: much research is based on weak science, and many results are disseminated too early.

What Kind of Science Is Behind the Numbers?

The first question to ask is, “Is there any science behind the numbers?” Ideally, there would be. But sometimes there isn’t. The second question is, “How good is the science?” Some research makes only a weak case for the message; other studies make a strong case. In this section, we’ll help you think about how compelling the case is.

For example, treatments that have been shown to fight illnesses in test tubes don’t necessarily make you better. And treatments that work in animals often don’t have the same results in humans. This doesn’t mean that basic science research is not important; indeed, it is fundamental. But it’s important to be skeptical about treatments that have been proven only in animal or lab studies, since they may not turn out to be relevant for people. Even when we focus on the most promising animal studies, only about one-third of treatments proven helpful in animals have turned out to be helpful in people [1]. The following diagram illustrates a spectrum of believability for research findings; we put test tube and animal studies lowest on our believability scale because these findings often do not translate into improved human health.

[Figure: spectrum of believability for research findings, with test tube and animal studies lowest]

And not all human research studies are equally compelling, either. In an observational uncontrolled study, researchers simply watch what happens to a series of people in one group. For example, everyone gets drug X, and the researchers record how many people get better. But there’s a big problem with these studies: you can’t know what would have happened without drug X. Maybe more people would have gotten better! Whenever you hear the results of a study about how well an intervention works, ask whether the study included a control group (a group of people who did not undergo the intervention). Without a control group, it’s impossible to know whether the intervention really accounts for the study findings. Remind yourself that, no matter how dramatically the findings from an uncontrolled study are described, they are not particularly believable.

Stronger scientific evidence comes from controlled studies, in which researchers watch what happens to different groups of people. The most basic kinds of controlled studies involve observational research, in which the researchers merely record what happens to people in different situations, without intervening. Cohort and case-control studies are perhaps the best-known types of observational controlled research. Such research first linked cigarette smoking to lung cancer, and high cholesterol to heart disease. Observational research is the only ethical way to study dangerous exposures. But these kinds of studies have important problems. Although they can show that an exposure or intervention is associated with a particular outcome, they cannot by themselves prove that it causes the outcome. It’s always possible that other factors not accounted for in the research are causing the outcome.

For example, researchers might believe that eating string beans prevents heart attacks. To test this hypothesis, they compare people who eat a lot of string beans with people who never eat string beans to see which group has more heart attacks. Of course, these groups of people may be very different in lots of ways besides eating string beans. For instance, let’s say that people who choose to eat string beans might be more likely to be vegetarians, to eat a Mediterranean diet, and to exercise. So if the string bean eaters “do better” than the others, it might not be because of the string beans.
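The string bean scenario can be sketched with a few made-up numbers. The counts below are entirely hypothetical (they come from no study). They assume that exercise lowers heart attack risk, that string beans do nothing at all, and that string bean eaters are simply more likely to exercise.

```python
# Hypothetical counts, invented purely for illustration (not from any study).
# Key: (diet group, exercises regularly?) -> (people, heart attacks)
counts = {
    ("bean eaters", True): (800, 16),
    ("bean eaters", False): (200, 12),
    ("non-eaters", True): (200, 4),
    ("non-eaters", False): (800, 48),
}


def crude_risk(group):
    """Heart attack risk for a diet group, ignoring exercise habits."""
    rows = [value for (g, _exercises), value in counts.items() if g == group]
    people = sum(n for n, _events in rows)
    events = sum(e for _n, e in rows)
    return events / people


def stratum_risk(group, exercises):
    """Heart attack risk within one diet group and one exercise level."""
    people, events = counts[(group, exercises)]
    return events / people


for group in ("bean eaters", "non-eaters"):
    print(f"{group}: crude risk {crude_risk(group):.1%}")
    for exercises in (True, False):
        label = "exercisers" if exercises else "non-exercisers"
        print(f"  {label}: {stratum_risk(group, exercises):.1%}")
```

In this invented data the string bean eaters look much better overall (2.8% versus 5.2%), yet within each exercise level the two diet groups have exactly the same risk. The apparent benefit comes entirely from who chooses to eat string beans, not from the beans.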

An actual example involves the long-held belief that most women should take estrogen after menopause. That idea, only recently discredited, also came from observational research. The observation, drawn from more than forty studies involving hundreds of thousands of women, was that women who took estrogen supplements had less heart disease. But it turned out that estrogen was not the reason for decreased heart disease. Instead, women taking estrogen tended to be healthier and wealthier. Their health and wealth, not their estrogen supplements, were responsible for the lower risk of heart disease.

Whenever you hear about the results of observational controlled studies, we suggest being cautious about concluding that the lifestyle factor, environmental exposure, or drug being studied (like eating string beans or taking estrogen) actually causes the outcome (like heart disease). In these types of studies, you simply cannot rule out the possibility that another characteristic of the participants in fact caused the difference—and that the original conclusion may therefore be wrong. Although we are stuck with observational studies when we research harmful exposures such as smoking, this is fortunately not the case for learning the benefit of an intervention.

The only way to reliably tell if the intervention causes the outcome is to conduct a true experiment—a randomized trial. In a randomized controlled trial, researchers construct two groups that are similar in every way except one: whether or not they get the intervention being studied. Patients are assigned randomly (by chance) to one of the groups. It is then reasonable to assume that any differences observed in the trial must have been caused by the intervention (since it was the only difference between the groups). In the case of the assumed connection between estrogen and heart disease, such a study showed that the long-held beliefs were wrong.
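To see why assigning people by chance produces similar groups, here is a minimal sketch. The population, the 40% exercise rate, and the helper names `randomize` and `share_exercising` are assumptions made up for this illustration; real trials use more careful randomization procedures.

```python
import random


def randomize(participants, seed=0):
    """Assign participants to two equal-sized groups purely by chance."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    shuffled = participants[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def share_exercising(group):
    """Fraction of a group that exercises regularly."""
    return sum(person["exercises"] for person in group) / len(group)


# Hypothetical population: 10,000 people, 40% of whom exercise regularly.
people = [{"id": i, "exercises": i < 4000} for i in range(10_000)]
treatment, control = randomize(people)

print(f"treatment group: {share_exercising(treatment):.1%} exercise")
print(f"control group:   {share_exercising(control):.1%} exercise")
```

Nobody chose a group, so a characteristic like exercise ends up split almost evenly between the two. The same holds for characteristics the researchers never thought to measure, which is what makes the comparison fair.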

In general, you can have the most faith in statistics resulting from large, randomized, controlled trials. Having a large number of study participants is important to make sure that the findings are not the result of chance and to get a precise estimate of the difference between the starting and modified risks. It’s sometimes even possible to combine the results of multiple randomized trials to get even more reliable and precise results, using an approach called meta-analysis. This diagram summarizes the believability of the results of different study designs.

[Figure: believability of the results of different study designs]
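The point about large trials giving more precise estimates can be illustrated with a rough calculation. This sketch uses a standard normal approximation for the uncertainty in the difference between two risks. The risks (10% versus 8%) and the group sizes are hypothetical, and the function name `margin_of_error` is ours.

```python
import math


def margin_of_error(risk_control, risk_treated, n_per_group, z=1.96):
    """Approximate 95% margin of error for the difference between two risks.

    Uses the usual normal approximation; it is rough for very small groups.
    """
    variance = (
        risk_control * (1 - risk_control) / n_per_group
        + risk_treated * (1 - risk_treated) / n_per_group
    )
    return z * math.sqrt(variance)


# Hypothetical risks: 10% without the treatment, 8% with it.
for n in (30, 3_000, 300_000):
    moe = margin_of_error(0.10, 0.08, n)
    print(f"{n:>7} per group: difference = 2.0 points, give or take {moe:.1%}")
```

With 30 people per group, the uncertainty is several times larger than the 2-point difference being measured, so chance alone could easily explain the result. Each hundredfold increase in participants shrinks the uncertainty tenfold.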

But even when results come from a randomized trial, they aren’t necessarily right. The results of randomized trials can also be misleading—particularly if the trial was small (for example, with fewer than thirty participants) or lasted only a short time (like a few months). And, as we noted earlier, the benefit of any treatment should be weighed against its side effects or other downsides, such as inconvenience or cost—and these may not have been measured in the randomized trial.

Unfortunately, it’s not always possible to conduct a randomized trial. This is certainly the case for studying harmful exposures such as smoking. Because it’s unethical to deliberately expose people to harm, the best we can do in such cases is an observational study. And sometimes even when it is ethical to conduct a randomized trial, it might not be feasible. For example, it’s extremely unlikely that we could get people to agree to be randomly assigned to eat either only fast food or only organic food every day for a year (or that they would actually adhere to the diet). Again, in such cases, scientists have to rely on observational studies.

But when new interventions are proposed, it is critical to conduct randomized trials before these new strategies are introduced into widespread use [4]. Doctors prescribed estrogen to millions of women for many years until randomized trials showed that intuition and dozens of observational studies were wrong.

Learn More

About This Book

Randomized trials are conducted not only to evaluate drugs—they can also test all sorts of interventions. We actually tested how an early draft of this book affected readers’ ability to understand messages about risk (and risk reduction).

We conducted two randomized trials to see how well the book performed in two distinct populations. The first trial included 334 people who attended a public lecture series at Dartmouth Medical School. This lecture series is especially popular with retirees in the community. The second trial included 221 veterans and their families recruited from the waiting areas in the medical clinic at our local Veterans Administration Hospital, where we see patients. The people attending the lecture series tend to be much more affluent and to have more formal education than the people from the VA.

Other than these differences among the participants, the trials were identical. People were randomly assigned to read either our book or a general education book of about the same length and reading level that included no training in understanding risk. This second book served as our “placebo.” Everyone then took an eighteen-question test that asked them to interpret real-world health statistics in drug advertisements and news stories [2].

The results, published in the Annals of Internal Medicine, were very gratifying for us [3]. In both trials, the book proved to be effective (and safe). Here’s what we found:

                                  Starting Risk           Modified Risk
                                  (with “placebo” book)   (with our book)
Did our book help?
Lecture series trial
   Got a passing grade on test    56%                     74%
   Got an A on test               7%                      26%
VA trial
   Got a passing grade on test    26%                     44%
   Got an A on test               2%                      10%
Did our book have side effects?   None reported

This means this book is clinically proven to be effective!
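The size of the book's effect can be read straight off the table by subtracting the starting risk from the modified risk. The short sketch below does that arithmetic. The percentages are the ones reported in the table; the variable names are ours.

```python
# Percentages from the table above: (with "placebo" book, with our book)
results = {
    ("Lecture series trial", "passing grade"): (56, 74),
    ("Lecture series trial", "A"): (7, 26),
    ("VA trial", "passing grade"): (26, 44),
    ("VA trial", "A"): (2, 10),
}

# Absolute difference, in percentage points, for each trial and outcome.
differences = {
    key: with_book - with_placebo
    for key, (with_placebo, with_book) in results.items()
}

for (trial, outcome), diff in differences.items():
    with_placebo, with_book = results[(trial, outcome)]
    print(f"{trial}, {outcome}: {with_placebo}% -> {with_book}% "
          f"(+{diff} percentage points)")
```

In both trials, the share of readers who passed the test rose by 18 percentage points. The share who got an A rose by 19 points in the lecture series trial and by 8 points in the VA trial.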

Is It Too Early?

The other reason why even the most exciting findings might be wrong is that they are premature. Regardless of the study design (whether observational, randomized, or another type), you should always be extremely cautious about preliminary findings. Many of the most impressive breakthroughs reported in the media are first announced at meetings of medical or scientific associations. But these results are often preliminary: the study may not even be finished, and sometimes the results have not yet been vetted by outside experts, a process known as peer review. Some of the results presented at such meetings may never be published in professional medical journals because of concerns about whether the findings are really valid. Or, if the findings are published, they may change substantially, perhaps even contradicting the results first reported [5]. In other words, such reports sometimes turn out to be wrong. So we recommend a very high level of skepticism when the media trumpet the results of unpublished research presented at professional meetings.

Other results are too early because they are based on short-term studies. It takes time to verify and confirm research results. The problems that arise with newly approved prescription drugs are an all too common example. In the studies required for approval by the U.S. Food and Drug Administration (FDA), researchers typically test drugs in relatively small groups of people for a relatively short time. Consequently, rare or long-term side effects will not be seen until after approval, when the drug goes on the market and millions of people use it for long periods. This is why you often hear about the FDA either removing a drug from the market or putting a “black box warning” on it to let people know about a new side effect that has turned up.
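A quick calculation shows why a rare side effect can easily go unseen before approval. The numbers here are hypothetical (a side effect that strikes 1 person in 10,000, a 3,000-person trial, and a million users after approval), and the sketch assumes each person's risk is independent of everyone else's.

```python
def chance_of_seeing_at_least_one(risk_per_person, people):
    """Probability that at least one of `people` users has the side effect.

    Assumes every person has the same risk, independently of the others.
    """
    return 1 - (1 - risk_per_person) ** people


rare_risk = 1 / 10_000     # hypothetical: 1 person in 10,000 is affected
trial_size = 3_000         # hypothetical pre-approval trial
market_size = 1_000_000    # hypothetical number of users after approval

in_trial = chance_of_seeing_at_least_one(rare_risk, trial_size)
on_market = chance_of_seeing_at_least_one(rare_risk, market_size)

print(f"Chance the trial sees even one case: {in_trial:.0%}")
print(f"Chance at least one case appears on the market: {on_market:.0%}")
```

Under these assumptions the trial has only about a one-in-four chance of seeing even a single case, while among a million users cases are virtually certain to appear.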

In general, it’s a good idea to be wary of new drugs. We would even go so far as to advise avoiding them unless there is no good alternative. Most serious problems with new drugs emerge within 5 years of FDA approval, so it might be wise to stick with drugs that have been around for at least 5 years rather than going with a new one [6]. With drugs, it’s dangerous to assume that newer means better, so be skeptical of any claim that newer drugs must be better drugs.

You’ll find a lot of exaggerated certainty out there. Look out for “strong” conclusions based on weak science. Pay the most attention to research results that have been independently reviewed by experts and are published in reputable medical journals. Approach preliminary work skeptically. And be cautious about new drugs and treatments until they have established a track record for safety and effectiveness.

Copyright © 2008, The Regents of the University of California.

Know Your Chances: Understanding Health Statistics is hereby licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported license, which permits copying, distribution, and transmission of the work, provided the original work is properly cited, not used for commercial purposes, nor is altered or transformed.

Bookshelf ID: NBK126166
