NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Irwig L, Irwig J, Trevena L, et al. Smart Health Choices: Making Sense of Health Advice. London: Hammersmith Press; 2008.

Smart Health Choices: Making Sense of Health Advice.

Chapter 3. Bad evidence

I’m always certain about things that are a matter of opinion.

Charlie Brown1

Thinking straight about the world is a precious and difficult process that must be carefully nurtured.

Thomas Gilovich2

What would you think of a newspaper report that said that a certain substance caused many major diseases, on the evidence that 99.9 per cent of all people who die from cancer had eaten it and that most sick people had also eaten it? Would that make you a tad nervous about trying the substance? What if another article noted that 99 per cent of people involved in air and car crashes had eaten carrots within 60 days preceding the accident and that 93 per cent of criminals come from homes where carrots are served frequently? Would you stop eating carrots?

Although this (very much tongue-in-cheek) report might make you laugh, it raises a serious issue: health advice can easily mislead, even be harmful, if not tested by high-quality studies. Studies are not always designed so that they are capable of providing reliable information. And there are different types of studies capable of providing different types of information. This chapter aims to help you understand the basics of health research and to give you some tips for distinguishing between the different types of studies.

Basic research – testing ideas

Getting back to carrots, there are several different types of studies that could investigate the killer carrot hypothesis. So-called basic research is typically conducted in the laboratory, using experiments with cells, animals or human tissue to investigate underlying mechanisms of the body and how they are affected by disease or potential treatments. In the 1920s a laboratory-based study on dogs with diabetes laid the basis for treating humans with insulin. But the early cholesterol studies done on animals were not appropriate models because animals and humans metabolise cholesterol in very different ways.

Although laboratory studies can provide important information, it would generally be unwise to assume that their results apply to people until they have been tested more widely in trials on people.

The media often carry reports of promising laboratory research (for example, of potential new cancer ‘cures’) that turn out to be less exciting when the treatments are eventually tested in randomised controlled trials. Not surprisingly, the hypothetical ‘carrot report’ notes that rats force-fed with 20 pounds of carrots per day for 30 days developed bulging abdomens and lost their appetite for wholesome food. Perhaps this is another example of why we shouldn’t be too quick to draw conclusions about humans from studies of rats.

Sometimes basic research is done on people to test whether a drug or procedure affects the way the body functions or reacts (for example, to test for a change in body chemistry or function such as the way muscles contract). Although done on people, this is still basic research because it is concerned with laboratory measurements rather than whether people develop diseases, feel better or live longer.

Applied research – does it work on people?

Studies involving people generally fall into three broad categories:

  1. Observational studies
  2. Intervention studies or trials
  3. Summaries of all the best quality randomised trials.

Observational studies

Observational studies examine patterns of health and disease in different groups of people who are exposed to different environments or lifestyles.

Intervention studies (trials)

Intervention studies investigate the effects of treatments, procedures or other regimens by deliberately changing something for the people in the study, such as giving them a drug. These are experiments to see whether people who get the intervention are better off than those who do not.

The most reliable intervention studies are those that involve randomly allocating one group to an intervention – whether a drug, a new type of surgery or an exercise programme, for example – and comparing the results with those in a control group who are untreated or who receive a different intervention. These are randomised controlled trials (RCTs). Randomised controlled trials are often also called randomised trials. A trial that is randomised will always be ‘controlled’ because it will have a control group, but, be aware, a controlled trial is not necessarily randomised.

Interestingly, it was observational studies that raised hopes that beta carotene, a substance found in carrots that the body converts into vitamin A, might help prevent cancer, because people with higher intakes of such nutrients were observed to have lower overall rates of certain cancers. But observational studies are not as reliable as randomised controlled trials; there is always the concern that there may be some other explanation. Could it be, for example, that people with high intakes of these nutrients are healthier anyway because they are also more likely to eat other healthy foods, exercise and follow healthy lifestyles?

Another example is the belief that hormone replacement therapy (HRT) would protect women from heart attacks and strokes after the menopause. For years, many older women took HRT to help stave off these and other risks, because observational studies found that women who took HRT were less likely to have heart attacks and strokes. But, just as with beta carotene, when a randomised trial was done it found the opposite. One reason to treat results from observational studies with caution is the possibility of bias. Women who choose to take HRT may be wealthier, eat a better diet, exercise more regularly, smoke less and attend health check-ups more often than women who do not. Despite researchers’ best efforts to adjust statistically for some of these factors, it is impossible to account for everything, and bias can creep in.3–5

The best way of dealing with this type of concern is to test a theory using randomised controlled trials. Randomised controlled trials are the ‘gold standard’ for evaluating treatments and other interventions because of the randomisation process, in which research participants are randomly allocated (for example, by the flip of a coin) to either an active treatment or a placebo or comparison treatment. Randomisation tends to spread other factors, such as diet, exercise and wealth, evenly between the groups, so they are less likely to distort the results. We can be even more confident in the results when both the researchers and the research participants have been ‘blinded’ or ‘masked’, so that they do not know who is taking the active treatment. Indeed, when randomised controlled trials of beta carotene supplements were finally done, they suggested that, if anything, the supplements might increase the risk of some cancers,6 and the randomised trial of HRT showed that it increased the risk of heart attacks and strokes, particularly in the first 12 months.4, 5
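To see why a coin flip helps, here is a minimal Python simulation. It is a sketch under hypothetical assumptions, not data from any trial: about 30 per cent of an imaginary population follow a healthy lifestyle, and in the ‘self-selected’ comparison we assume healthier people are more likely to choose the treatment themselves (as women choosing HRT might). The function names are illustrative only.

```python
import random

def coin_flip_allocation(people, rng):
    """Randomised allocation: each person is assigned by a simulated coin flip."""
    treatment, control = [], []
    for person in people:
        (treatment if rng.random() < 0.5 else control).append(person)
    return treatment, control

def self_selected_allocation(people, rng):
    """Observational-style allocation (hypothetical): healthy people choose
    the treatment 70% of the time, everyone else 30% of the time."""
    treatment, control = [], []
    for person in people:
        chance = 0.7 if person["healthy"] else 0.3
        (treatment if rng.random() < chance else control).append(person)
    return treatment, control

def share_healthy(group):
    """Proportion of a group who follow a healthy lifestyle."""
    return sum(p["healthy"] for p in group) / len(group)

rng = random.Random(2008)  # fixed seed so the run is repeatable
# Hypothetical population of 10,000 people, about 30% with a healthy lifestyle.
people = [{"healthy": rng.random() < 0.3} for _ in range(10_000)]

t, c = coin_flip_allocation(people, rng)
random_gap = abs(share_healthy(t) - share_healthy(c))

t, c = self_selected_allocation(people, rng)
self_selected_gap = abs(share_healthy(t) - share_healthy(c))

print(f"Healthy-lifestyle gap, randomised:    {random_gap:.3f}")
print(f"Healthy-lifestyle gap, self-selected: {self_selected_gap:.3f}")
```

With the coin flip, the two groups end up with nearly the same share of healthy-lifestyle people, so a difference in outcomes is more plausibly due to the treatment. With self-selection, the treated group is much healthier to begin with, which is exactly the kind of bias that can make an ineffective treatment look protective.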

Randomised controlled trials also allow a comparison with what would have happened without the intervention. It is all very well to say that a new antibiotic, for example, cures 90 per cent of people suffering from a respiratory infection. But what if 90 per cent would have recovered anyway, without any treatment? Too often, we hear reports from the media and other sources that ‘a clinical trial’ has shown such and such. What we need to know is whether it was a randomised controlled trial, because clinical trials do not always include a control group and are not necessarily randomised. The results of randomised controlled trials are available for many areas of healthcare, and the number is growing. However, if such evidence is lacking, you might have to rely on the next best source of evidence. The rest of this chapter describes other points that can help you evaluate health advice and, it is hoped, avoid common pitfalls.

Summaries of all of the best quality randomised trials

In 1971, a British doctor and epidemiologist by the name of Archie Cochrane wrote an important and controversial book entitled Effectiveness and Efficiency: Random reflections on health services.7 This book suggested that many people were being over-treated in a well-meaning effort to do everything possible to ‘cure’ them. He argued that the systematic use of medical research, in particular, evidence from randomised controlled trials, should be encouraged, so that safe and effective therapies would be more likely to be used and ineffective and unsafe ones minimised.

The Cochrane Library, named in his honour and built on his ideas, is now an online database available free of charge in several countries around the world, including Australia, Ireland, Norway, Finland and the UK. It contains summaries of the best research on treatments and covers a whole range of topics, from ‘acupuncture treatment for depression’ to ‘zinc for treating the common cold’.

These summaries are called systematic reviews. They are usually more reliable than any single randomised trial because, if a number of trials come out in favour of a treatment, the theory has been tested repeatedly and the combined result is more trustworthy than any one trial alone. Systematic reviews are also more dependable because the experts who prepare them usually exclude randomised trials that have been poorly conducted and keep only the good-quality studies in their summary.

An example of a systematic review from the Cochrane Library is one that summarises the results of 24 randomised trials, involving 3392 people in total, that tested over-the-counter treatments for acute cough. It found that there is not enough research evidence for or against cough mixtures, and suggests that people who choose to use them should bear this in mind.8

Common pitfalls to avoid when assessing research

Just because two events occur together does not mean that one causes the other

You may have heard about the guy who had a habit of clapping his hands loudly several times every few minutes. When his friend asked why, he explained that it kept the elephants away. ‘But there are no elephants around here!’ his friend exclaimed, dismayed. He replied: ‘You see, it works.’

Just because two events or characteristics are associated does not mean that they are directly linked, let alone that one caused the other. People with red hair and blue eyes are more likely to get skin cancer, but that does not mean their risk will fall if they wear coloured contact lenses and dye their hair. Red hair and blue eyes are associated with an increased risk of skin cancer because they are also associated with pale skin; they are not, in themselves, risk factors for skin cancer.

Similarly, you often hear reports suggesting that one disease or another is caused by an infectious agent. For example, one recent study was reported as showing that heart disease may result from a virus because the virus was found in clogged artery walls. Is this convincing? Not necessarily. Even if the study showed that cells from diseased artery walls were far more likely to be infected with the virus, this does not prove cause and effect. It may simply be that diseased artery walls are more prone to infection; in other words, the disease may come before the virus.

Because the benefits and harms of many modern interventions can take decades to become apparent, it is difficult for consumers and health practitioners to draw conclusions about the causes and effects of diseases and treatments without the knowledge gained from proper studies. If a young woman takes a ‘morning after’ pill to avoid an unwanted pregnancy, she may be relieved when her period arrives 2 weeks later. Her experience probably convinces her that the pill worked. What she may not realise is that, even if she had not taken the pill, there was a 90 per cent chance that she would not have become pregnant.

Even if two events are associated, the causal arrow does not always point in the direction we intuitively assume; the cause-and-effect sequence may be reversed. A TV show host overlooked this point when describing a study that claimed that families who eat together communicate better. The host assumed that, if dysfunctional families wanted to improve their relations, all they had to do was share meals. In fact, sharing meals may be an effect rather than a cause of good family dynamics: families who get on well may simply tend to share experiences, including mealtimes.

Anecdotal evidence can be unreliable. You cannot infer a general rule from a single experience – especially someone else’s

Anecdotal evidence is often the most difficult advice to resist because it is based on someone else’s personal experience, which can sound extremely convincing. If your next-door neighbour recovered from cancer after a watermelon diet, that can sound very persuasive. But we already know the dangers of assuming cause and effect: just because she ate the watermelon before recovering does not mean that it caused her recovery. Remember, too, that only survivors speak: perhaps 50 other people died of cancer after trying the ‘miracle watermelon cure’. Anecdotal reports can give an unbalanced perspective. Now, if there had been a randomised controlled trial showing that patients who ate watermelon survived twice as long, that would have been a different story.

Some things get better on their own (spontaneous remission). It is impossible to know whether a treatment ‘worked’ unless you know for sure what would have happened in the absence of treatment

Say you take antispasmodics (medication to stop painful bowel spasms) for irritable bowel syndrome. If the symptoms disappear over the few months after the treatment, you might assume that the antispasmodics worked. But the condition might have improved anyway. Only randomised controlled trials can tell us whether a treatment helps more people to recover than would have recovered anyway. In fact, a summary (systematic review) of 11 randomised trials comparing antispasmodics with ‘fake pills’ or placebos showed a slight increase in the number of people who got pain relief: 46 out of 100 people got pain relief with the placebo, compared with 58 out of 100 with the antispasmodics. In other words, 12 extra people in every 100 were helped, but almost half of all people got better with a placebo.9 On the other hand, six randomised controlled trials of antidepressant drugs for irritable bowel syndrome showed no difference between them and placebo.9
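The arithmetic behind ‘12 extra people in every 100’ can be written out as a short Python sketch. It also computes the number needed to treat (NNT), a standard way of expressing the same result that the text itself does not use: roughly how many people must take the antispasmodic for one extra person to get pain relief. The function names are illustrative.

```python
import math

def extra_helped_per_100(placebo_per_100, treated_per_100):
    """Extra people helped per 100 treated, beyond those who improve on placebo."""
    return treated_per_100 - placebo_per_100

def number_needed_to_treat(placebo_per_100, treated_per_100):
    """People who must be treated for one extra person to benefit.
    Rounded up, which is the usual convention for NNT."""
    absolute_difference = (treated_per_100 - placebo_per_100) / 100
    return math.ceil(1 / absolute_difference)

# Pain relief per 100 people, from the systematic review cited in the text.
placebo, antispasmodic = 46, 58

print(extra_helped_per_100(placebo, antispasmodic))    # 12
print(number_needed_to_treat(placebo, antispasmodic))  # 9
```

Put another way, about nine people would need to take antispasmodics for one of them to get pain relief they would not have had from a placebo; most of the others would either have improved anyway or not been helped.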

This example also reflects what statisticians call regression to the mean: the tendency for unusually high or low values to be followed by values closer to the average. For example, children of exceptionally tall parents are likely to grow into adults who are shorter than their parents, closer to the average height. And children of very short parents are likely to become taller than their parents, again closer to the average.

Similarly, an unusually high or low result from a medical test is likely to be followed by a result closer to the average on repeat testing. For example, if you have a very high cholesterol reading on one occasion, it is likely to be lower at the next test, even if you do nothing about it.10 To get a true measure, you should have several tests. Something similar happens because of the body’s natural healing processes, which mean that many abnormal states (of sickness) tend to shift back towards the average (good health).
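Regression to the mean can be reproduced with a small simulation. This Python sketch assumes, purely for illustration, that each person has a fixed ‘true’ cholesterol level and that every test adds some random measurement variation; the numbers (a population average of 5.5, test variation of 0.5, a cut-off of 7.0) are hypothetical and are not taken from the cited study. Nobody is treated between the two tests.

```python
import random
import statistics

rng = random.Random(10)   # fixed seed so the run is repeatable
POPULATION_MEAN = 5.5     # hypothetical average true cholesterol (mmol/L)
POPULATION_SD = 0.8       # hypothetical spread of true levels between people
TEST_NOISE_SD = 0.5       # hypothetical test-to-test variation for one person
CUT_OFF = 7.0             # hypothetical 'very high' first result

def measure(true_level):
    """One test result: the person's true level plus random measurement variation."""
    return true_level + rng.gauss(0, TEST_NOISE_SD)

true_levels = [rng.gauss(POPULATION_MEAN, POPULATION_SD) for _ in range(20_000)]
first_test = [measure(level) for level in true_levels]
second_test = [measure(level) for level in true_levels]  # repeat test, no treatment

# Pick out the people whose first result was very high.
flagged = [i for i, result in enumerate(first_test) if result > CUT_OFF]
mean_first = statistics.mean([first_test[i] for i in flagged])
mean_second = statistics.mean([second_test[i] for i in flagged])

print(f"People flagged:            {len(flagged)}")
print(f"Average first result:      {mean_first:.2f}")
print(f"Average second result:     {mean_second:.2f}")
```

The flagged group’s second average comes out lower than their first, yet still above the population average: their readings drift back towards the average without any treatment. Anyone who ‘treated’ this group between tests could wrongly credit the treatment for the fall.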

Put simply, some things just get better on their own.

Thousands of well-meaning John and Jane Does have boosted the fame of folk remedies and have signed sincere testimonials for patent medicines, crediting them instead of the body’s recuperative powers for a return to well-being.

James Harvey Young11

The placebo effect is powerful. People often report an improvement on almost any therapy, even a placebo (an inactive intervention). This is why it is difficult to discern the real effects of active treatments without randomised controlled trials

In one experiment, patients with bleeding ulcers were divided into two groups. The first group was told that their treatment would dramatically ease their pain. The second group was told that their treatment was only experimental and that little was known about its effect. In the first group, 75 per cent reported sufficient pain relief; in the second group, only 25 per cent reported a similar benefit. Both groups had been given the identical ‘treatment’: a placebo containing no active pharmacological ingredient.12

What was at work was the power of the placebo – or perhaps, more correctly, the power of the mind. The placebo effect is a well-documented phenomenon, whereby the apparent outcome of treatment can be positively influenced by the mere expectation that it will work, held by the patient and/or doctor.

You might ask why it is so important to determine what produces the benefit. After all, does it really matter what makes someone feel better, the placebo effect or an active treatment? Of course, if placebos work, that’s great. But active drugs can have side effects, so we want to know whether a drug’s pharmacological effect adds enough benefit, over and above its placebo effect, to be worth those risks.

Consider that earlier in the twentieth century many thousands of patients with angina (chest pain caused by constricted blood vessels) underwent various treatments that are now known to have no effect whatsoever. Many of these patients and their doctors reported remarkable (if not long-lasting) improvements after trying potentially dangerous drug treatments, and also after an invasive surgical procedure that involved tying off an artery in the chest.

This is one of the most important reasons for randomised controlled trials, which help discern the impact of the active component of a treatment over and above its placebo impact.

The placebo effect is generally seen as beneficial for patients, because it can improve symptoms. But expectations can also cause harm, which is sometimes called the nocebo effect. For example, some dentists say that controversy over the safety of amalgam fillings has had a nocebo effect: because they worry that their fillings might be making them sick, some people have developed symptoms, regardless of whether their teeth are filled with amalgam or other substances.

Screening tests that detect disease early are not always beneficial. They can lead to people living more years with disease rather than longer lives

A screening test, as distinct from a diagnostic test, is used to identify disease in people who have no symptoms. This is great if early diagnosis of a disease will result in more effective treatment. However, in some cases, making an early diagnosis may not be helpful, particularly if there is no effective treatment. In many countries, the advent of tests to screen healthy men for markers associated with prostate disease has led to an explosion in the number of men diagnosed with prostate cancer. In the UK alone, the number of men diagnosed almost doubled in the 5 years from 1990, and in Australia it tripled.13, 14 But this dramatic increase is not believed to represent a ‘real’ increase in the cancer’s incidence; instead, it is believed to reflect earlier diagnosis. Thus, more men now know that they have prostate cancer, but there are not necessarily any more men with the cancer.

For some diseases, early detection does not help to prolong life because earlier treatment is no more effective than later treatment. In these situations, early detection simply adds years of known disease, counted from the time of diagnosis, rather than years of life. This is called ‘lead-time bias’. Here is an example to explain further.

Andy is the same individual in all scenarios and this is what might happen to him in three different situations, as if they were happening in parallel universes (Figure 3.1):

Figure 3.1. Andy’s three scenarios.

  1. Scenario 1: Andy decides against screening in 2000 and dies in 2010, 5 years after developing symptoms. He lives for 5 years with disease X.
  2. Scenario 2: Andy is screened in 2000, found to have disease X and dies in 2010, 5 years after developing symptoms. Screening has not prolonged his life but has merely increased the number of years he lives with disease X from 5 years to 10 years.
  3. Scenario 3: Andy is screened in 2000, found to have disease X and dies 15 years later in 2015. Screening has prolonged his life by 5 years.
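Andy’s three scenarios can be tabulated in a few lines of Python to show why survival measured from diagnosis is misleading. The years come from the scenarios above (in scenario 1, diagnosis is taken to be 2005, when symptoms appear); the `Scenario` class and its field names are just an illustrative sketch.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    label: str
    diagnosed: int  # year disease X was diagnosed
    died: int       # year Andy died

    def survival_after_diagnosis(self) -> int:
        """Years from diagnosis to death: what a 'survival' statistic measures."""
        return self.died - self.diagnosed

# Year of death without screening (scenario 1), used as the comparison point.
NO_SCREENING_DEATH = 2010

scenarios = [
    Scenario("1: not screened", diagnosed=2005, died=2010),
    Scenario("2: screened, no real benefit", diagnosed=2000, died=2010),
    Scenario("3: screened, real benefit", diagnosed=2000, died=2015),
]

for s in scenarios:
    extra_life = s.died - NO_SCREENING_DEATH  # years of life gained vs no screening
    print(f"{s.label:32} survival after diagnosis = "
          f"{s.survival_after_diagnosis():2} years, extra life = {extra_life} years")
```

Scenario 2 doubles ‘survival after diagnosis’ from 5 to 10 years without adding a single year of life. Only the year of death, which is what a randomised trial comparing death rates looks at, separates a genuine benefit (scenario 3) from lead time.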

From this example we can see that longer survival from time of diagnosis is not a reliable way of determining whether screening is effective. For this we need randomised controlled trials comparing death rates in screened and unscreened groups.

Screening for prostate cancer is another good illustration of the potential for screening programmes to do more harm than good. In the UK and Australia, most authorities have not recommended that a formal screening programme be introduced for this reason, although there is a great deal of de facto screening occurring. (For further information about this subject, see the NHS Cancer Screening Programme’s Prostate Cancer Risk Management website and also The PSA Decision – what you need to know video and booklet.15, 16)

Here are some statistics that help explain why screening is not necessarily beneficial: suppose 10,000 men are screened by a PSA test, which measures the blood levels of prostate-specific antigen (Figure 3.2). Of these, 8500 will have a negative result, although 765 of this group can be expected to develop the cancer anyway, because the test is not 100 per cent accurate (and nor is any test).

Figure 3.2. The consequences of screening for prostate cancer.

Of the 1500 men with a positive result who undergo further testing, 1050 will then be given the all clear, although up to 20 may develop complications as a result of their further investigations and all could be expected to have suffered some degree of psychological stress.17

Of the 450 who are shown to have the cancer, it is not yet clear to what extent treatment will extend their lives or improve their quality of life. Some will suffer serious consequences as a result of their treatment, such as incontinence and impotence. And because many prostate cancers are slow-growing (it is commonly said that most men die with prostate cancer rather than of it), it is quite possible that many men will have suffered from investigation and treatment for a condition that might never have harmed them. The trouble is that we do not yet have a good way of selecting which men might benefit from early detection and treatment.
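The figures in this PSA example can be checked with simple arithmetic. This Python sketch uses only the numbers given in the text (per 10,000 men screened) and works out two proportions the text implies but does not state: the share of positive results that turn out to be cancer, and the share of men with a negative result who go on to develop the cancer anyway. Variable names are illustrative.

```python
# Numbers per 10,000 men screened, as given in the text.
screened = 10_000
negative = 8_500
negative_but_later_cancer = 765
all_clear_after_follow_up = 1_050

positive = screened - negative                    # men sent for further testing
cancer_found = positive - all_clear_after_follow_up

share_of_positives_with_cancer = cancer_found / positive
share_of_positives_false_alarm = all_clear_after_follow_up / positive
share_of_negatives_later_cancer = negative_but_later_cancer / negative

print(f"Positive results:                     {positive}")
print(f"Cancers found after further testing:  {cancer_found}")
print(f"Positives that turn out to be cancer: {share_of_positives_with_cancer:.0%}")
print(f"Positives that are false alarms:      {share_of_positives_false_alarm:.0%}")
print(f"Negatives who develop cancer anyway:  {share_of_negatives_later_cancer:.0%}")
```

So 7 in every 10 men with a positive result go through further investigation, and its stress and possible complications, for what turns out to be a false alarm, while about 9 in every 100 men reassured by a negative result still go on to develop the cancer.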

The prostate cancer story is a powerful reminder of why you should always ask what the risks and benefits of any screening test are. Even mammography screening, which has been shown to save lives when used to detect breast cancer in women aged over 40, involves some harms, and these may outweigh the benefits at the younger end of that age range.

You should ask: what is the chance that this screening test will accurately detect an important disease? What are the risks and benefits of earlier detection of the disease? Will it give you extra years of life, or just extra years of disease?


Assessing medical research can be complex – even for the experts. It helps to understand some of the more common pitfalls:

  • Laboratory-based research on animals does not necessarily apply directly to humans.
  • To test whether a treatment is effective in humans requires a randomised controlled trial on people who have the condition of interest.
  • Just because health characteristics or events are associated – or occur together – does not mean that they are related, or that there is a cause-and-effect relationship.
  • Anecdotal evidence can be dangerous. You cannot infer a general rule from a single experience – especially someone else’s.
  • Many diseases get better with or without treatment. It is impossible to know whether a treatment ‘worked’ unless you know for sure what would have happened in the absence of treatment.
  • The placebo effect is powerful. People often report an improvement on almost any therapy, even a placebo (a biologically inactive intervention). This makes it difficult to discern the real effects of active treatments without randomised controlled trials.
  • Screening tests that detect early disease are not always beneficial. They can lead to people living more years with disease rather than leading longer lives. This is called ‘lead-time bias’. (A screening test, as distinct from a diagnostic test, is used to identify disease in people who have no symptoms.)


References

  1. Chalmers I. What do I want from health research and researchers when I am a patient? BMJ. 1995;310:1315–18. [PMC free article: PMC2549685] [PubMed: 7773050]
  2. Gilovich T. How We Know What Isn’t So: The fallibility of human reason in everyday life. New York: Free Press; 1991. p. 187.
  3. Rossouw J, Anderson G, Prentice R, et al. Risks and benefits of estrogen plus progestin in healthy postmenopausal women. JAMA. 2002;288:321–33. [PubMed: 12117397]
  4. Wassertheil-Smoller S, Hendrix S, et al. Effect of estrogen plus progestin on stroke in postmenopausal women. JAMA. 2003;289:2673–84. [PubMed: 12771114]
  5. Manson J, Hsia J, Johnson K, Rossouw J, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med. 2003;349:523–34. [PubMed: 12904517]
  6. Bjelakovic G, Nikolova D, Simonetti R, Gluud C. Antioxidant supplements for preventing gastrointestinal cancers. Cochrane Database of Systematic Reviews. 2006. [PubMed: 18677777]
  7. Cochrane A. Effectiveness and Efficiency: Random reflections on health services. Abingdon: Burgess & Son Ltd; 1971.
  8. Schroeder K, Fahey T. Over-the-counter medications for acute cough in children and adults in ambulatory settings (Cochrane Review). Cochrane Database of Systematic Reviews. 2004;(4). [PubMed: 15495019]
  9. Quartero A, Meineche-Schmidt V, Muris J, Rubin G, de Wit N. Bulking agents, antispasmodic and antidepressant medication for the treatment of irritable bowel syndrome. Cochrane Database of Systematic Reviews. 2006;(2). [PubMed: 15846668]
  10. Irwig L, Glasziou P, Wilson A, Macaskill P. Estimating an individual’s true cholesterol level and response to intervention. JAMA. 1991;266:1678–85. [PubMed: 1886192]
  11. Young J. Consumer Health – A Guide to Intelligent Decisions. Times Mirror/Mosby; 1985.
  12. Cousins N. Anatomy of an Illness as Perceived by the Patient: Reflections on healing and regeneration. New York: Norton; 1979.
  13. Cancer Research UK. UK Prostate Cancer Incidence Statistics. http://www.info.cancerresearchuk.org/cancerstats/types/prostate/incidence/#trends.
  14. Sweet M. Fears of needless cancer tests. The Sydney Morning Herald. 1996. Sect. 3.
  15. NHS Cancer Screening Programme. Prostate Cancer Risk. http://www.cancerscreening
  16. The Foundation for Informed Medical Decision Making. The PSA Decision – What YOU Need to Know. Hanover, Hampshire: The Foundation for Informed Medical Decision Making;
  17. Hirst G, Ward J, Del Mar C. Screening for prostate cancer: the case against. Med J Australia. 1996;164:285–8. [PubMed: 8628164]
Copyright © 2008, Professor Les Irwig, Judy Irwig, Dr Lyndal Trevena, Melissa Sweet.

All rights reserved. No part of this publication may be reproduced, stored in any retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publishers and copyright holder or in the case of reprographic reproduction in accordance with the terms of licences issued by the appropriate Reprographic Rights Organisation.

Bookshelf ID: NBK63649

