NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Evans I, Thornton H, Chalmers I, et al. Testing Treatments: Better Research for Better Healthcare. 2nd edition. London: Pinter & Martin; 2011.

Chapter 1. New – but is it better?


Without fair – unbiased – evaluations, useless or even harmful treatments may be prescribed because they are thought to be helpful or, conversely, helpful treatments may be dismissed as useless. And fair tests should be applied to all treatments, no matter what their origin or whether they are viewed as conventional or complementary/alternative. Untested theories about treatment effects, however convincing they may sound, are just not enough. Some theories have predicted that treatments would work, but fair tests have revealed otherwise; other theories have confidently predicted that treatments would not work when, in fact, tests showed that they did.

Although there is a natural tendency to think ‘new’ means ‘improved’ – just like those advertisements for washing machine detergents – when new treatments are assessed in fair tests, they are just as likely to be found worse as they are to be found better than existing treatments. There is an equally natural tendency to think that because something has been around for a long time, it must be safe and it must be effective. But healthcare is littered with the use of treatments that are based on habit or firmly held beliefs rather than evidence: treatments that often do not do any good and sometimes do substantial harm.

There is nothing new about the need for fair tests: in the 18th century James Lind used a fair test to compare six of the remedies then being used to treat scurvy, a disease that was killing vast numbers of sailors during long voyages. He showed that oranges and lemons, which we now know contain vitamin C, were a very effective cure.


‘Our brains seem to be hard-wired for anecdotes, and we learn most easily through compelling stories; but I am aghast that so many people, including quite a number of my friends, cannot see the pitfalls in this approach. Science knows that anecdotes and personal experiences can be fatally misleading. It requires results that are testable and repeatable. Medicine, on the other hand, can only take science so far. There is too much human variability to be sure about anything very much when it comes to individual patients, so yes there is often a great deal of room for hunch. But let us be clear about the boundaries, for if we stray over them the essence of science is quickly betrayed: corners get cut and facts and opinions intermingle until we find it hard to distinguish one from the other.’

Ross N. Foreword. In: Ernst E, ed. Healing, hype, or harm? A critical analysis of complementary or alternative medicine. Exeter: Societas, 2008:vi–vii.

In 1747, while serving as a ship’s surgeon aboard HMS Salisbury, James Lind assembled 12 of his patients at similar stages of the illness, accommodated them in the same part of the ship, and ensured that they had the same basic diet. This was crucial – it created a ‘level playing field’ (see Chapter 6 and box in Chapter 3, p26). Lind then allocated two sailors to each of the six treatments that were then in use for scurvy – cider, sulphuric acid, vinegar, seawater, nutmeg, or two oranges and a lemon. The fruit won hands down. The Admiralty later ordered that lemon juice be supplied to all ships, with the result that the deadly disease had disappeared from the Royal Navy by the end of the 18th century.

Of the treatments Lind compared, the Royal College of Physicians favoured sulphuric acid while the Admiralty favoured vinegar – Lind’s fair test showed that both authorities were wrong. Surprisingly, influential authorities are quite frequently wrong. Relying too much on opinion, habit, or precedent rather than on the results of fair tests continues to cause serious problems in healthcare (see below, and Chapter 2).

James Lind (1716–1794), Scottish naval surgeon, pictured with the books he wrote, and the title page of the most famous of these, in which he recorded a controlled trial done in 1747 showing that oranges and lemons were a more effective treatment for scurvy than five other treatments then in use (see

Today, uncertainties about the effects of treatments are often highlighted when doctors and other clinicians differ about the best approach for a particular condition (see Chapter 5). In addressing these uncertainties, patients and the public, as well as doctors, have an important part to play. It is in the overwhelming interest of patients, as well as professionals, that research on treatments should be rigorous. Just as health professionals must be assured that their treatment recommendations are based on sound evidence, so patients need to demand that this happens. Only by creating this critical partnership can the public have confidence in all that modern medicine has to offer (see Chapters 11, 12, and 13).


Thalidomide

Thalidomide is an especially chilling example of a new medical treatment that did more harm than good.1 This sleeping pill was introduced in the late 1950s as an apparently safer alternative to the barbiturates that were regularly prescribed at that time; unlike barbiturates, overdoses of thalidomide did not lead to coma. Thalidomide was especially recommended for pregnant women, in whom it was also used to relieve morning sickness.

Then, at the beginning of the 1960s, obstetricians began to see a sharp increase in cases of severely malformed arms and legs in newborn babies. This previously rare condition results in such extremely shortened limbs that the hands and feet seem to arise directly from the body. Doctors in Germany and Australia linked these infant malformations with the fact that the mothers had taken thalidomide in early pregnancy.2


‘In the period immediately after World War II, many new treatments were introduced to improve the outlook for prematurely-born babies. Over the next few years it became painfully clear that a number of changes in caretaking practices had produced completely unexpected harmful effects. The most notable of these tragic clinical experiences was an “epidemic” of blindness, retrolental fibroplasia, in the years 1942–54. The disorder was found to be associated with the way in which supplemental oxygen had come to be used in the management of incompletely developed newborn babies. The twelve-year struggle to halt the outbreak provided a sobering demonstration of the need for planned evaluation of all medical innovations before they are accepted for general use.’

Silverman WA. Human experimentation: a guided step into the unknown. Oxford: Oxford University Press, 1985:vii–viii.

At the end of 1961, the manufacturer withdrew thalidomide. Many years later, after public campaigns and legal action, the victims began to receive compensation. The toll of these devastating abnormalities was immense – across the 46 or so countries where thalidomide was prescribed (in some countries even sold over the counter), thousands of babies were affected. The thalidomide tragedy stunned doctors, the pharmaceutical industry, and patients, and led to a worldwide overhaul of the process of drug development and licensing.3

Vioxx

Although drug-testing regulations have been tightened up considerably, even with the very best drug-testing practices there can be no absolute guarantee of safety. Non-steroidal anti-inflammatory drugs (NSAIDs) provide a good illustration of why vigilance in relation to drugs is needed. NSAIDs are commonly used to relieve pain and reduce inflammation in various conditions (for example, arthritis), and also to lower temperature in patients with a fever. The ‘traditional’ NSAIDs include many drugs that are available over the counter such as aspirin and ibuprofen. Among their side-effects, they are well known for causing irritation of the stomach and gut, leading to dyspepsia (‘indigestion’) and sometimes bleeding and even gastric (stomach) ulcers. Consequently, there was good reason for drug companies to see if they could develop NSAIDs that did not cause these complications.

Rofecoxib (best known by the marketing name of Vioxx, but also marketed as Ceoxx and Ceeoxx) was introduced in 1999 as a supposedly safer alternative to the older compounds. It was soon widely prescribed. Little more than five years later Vioxx was withdrawn from the market by the manufacturer because of an increased risk of cardiovascular complications such as heart attack and stroke. So what happened?

Vioxx was approved by the US Food and Drug Administration (FDA) in 1999 for the ‘relief of the signs and symptoms of osteoarthritis, for the management of acute pain in adults, and for the treatment of menstrual symptoms [that is, period pains]’. It was later approved for relief of the signs and symptoms of rheumatoid arthritis in adults and children. During development of Vioxx, drug company scientists became aware of potentially harmful effects on the body’s blood clotting mechanisms which could lead to an increased risk of blood clots. Yet the generally small studies submitted to the FDA for approval purposes concentrated on evidence of Vioxx’s anti-inflammatory effect and were not designed to look into the possible complications.4

Before the FDA approval, the company had already begun a large study mainly designed to compare the gut side-effects of Vioxx with those of another NSAID, naproxen, in patients with rheumatoid arthritis. Once again, the study was not specifically designed to detect cardiovascular complications. Moreover, questions were later raised about conflicts of interest among members of the study’s data and safety monitoring board (these boards are charged with monitoring the accumulating results of studies to see whether there is any reason for stopping the research).

Nevertheless, the results – which showed that Vioxx caused fewer episodes of stomach ulcers and gastrointestinal bleeding than naproxen – did reveal a greater number of heart attacks in the Vioxx group. Even so, the study report, published in a major medical journal, was heavily criticized. Among its flaws, the results were analyzed and presented in such a way as to downplay the seriousness of the cardiovascular risks. The journal’s editor later complained that the researchers had withheld critical data on these side-effects. However, the results, submitted to the FDA in 2000, and discussed by its Arthritis Advisory Committee in 2001, eventually led the FDA to amend the safety information on Vioxx labelling in 2002 to indicate an increased risk of heart attacks and stroke.

The drug company continued to investigate other uses of Vioxx, and in 2000 embarked on a study to see whether the drug prevented colorectal (lower gut) polyps (small benign tumours that may progress to colorectal cancer). This study, which was stopped early when interim results showed that the drug was associated with an increased risk of cardiovascular complications, led to the manufacturer withdrawing Vioxx from the market in 2004. In the published report, the study’s authors, who were either employed by the manufacturer or in receipt of consulting fees from the company, claimed that the cardiovascular complications only appeared after 18 months of Vioxx use. This claim was based on a flawed analysis and later formally corrected by the journal that published the report.4 In the face of numerous subsequent legal challenges from patients, the manufacturer continues to claim that it acted responsibly at all times, from pre-approval studies to safety monitoring after Vioxx was marketed. It has also reaffirmed its belief that the evidence will show that pre-existing cardiovascular risk factors, and not Vioxx, were responsible.5

The Vioxx scandal shows that, half a century after thalidomide, there is still much to do to ensure that treatments are tested fairly, that the process is transparent, and that the evidence is robust. As one group of commentators put it ‘Our system depends on putting patients’ interests first. Collaborations between academics, practising doctors, industry, and journals are essential in advancing knowledge and improving the care of patients. Trust is a necessary element of this partnership, but the recent events have made it necessary to institute proper systems that protect the interests of patients. A renewed commitment by all those involved and the institution of these systems are the only way to extract something positive from this unfortunate affair’.4

Avandia

2010 saw another drug – rosiglitazone, better known by the trade name Avandia – hitting the headlines because of unwanted side-effects involving the cardiovascular system. Ten years earlier Avandia had been licensed by drug regulators in Europe and the USA as a new approach to the treatment of type 2 diabetes. This form of diabetes occurs when the body does not produce enough insulin, or when the body’s cells do not react to insulin. It is far more common than type 1 diabetes, in which the body does not produce insulin at all. Type 2 diabetes, which is often associated with obesity, can usually be treated satisfactorily by modifying the diet, exercising, and taking drugs by mouth rather than by injecting insulin. The long-term complications of type 2 diabetes include an increased risk of heart attacks and strokes; the main aim of treatments is to reduce the risk of these complications. Avandia was promoted as acting in a novel way to help the body’s own insulin work more effectively and was said to be better than older drugs in controlling blood sugar levels. The focus was on the blood sugar and not on the serious complications that cause suffering and ultimately kill patients.

When Avandia was licensed, there was limited evidence of its effectiveness and no evidence about its effect on the risk of heart attacks and strokes. The drug regulators asked the manufacturer to do additional studies, but meanwhile Avandia became widely and enthusiastically prescribed worldwide. Reports of adverse cardiovascular effects began to appear and steadily mounted; by 2004 the World Health Organization was sufficiently concerned to ask the manufacturer to look again at the evidence of these complications. It did, and confirmed an increased risk.6

It took a further six years before the drug regulators took a really hard look at the evidence and acted. In September 2010 the US Food and Drug Administration announced that it would severely restrict the use of Avandia to patients who were unable to control their type 2 diabetes with other drugs; the same month the European Medicines Agency recommended that Avandia be withdrawn from use over the subsequent two months. Both drug regulators gave the increased risk of heart attacks and strokes as the reason for their decision. Meanwhile independently minded investigators uncovered a litany of missed opportunities for action – and, as one group of health professionals put it, a fundamental need for drug regulators and doctors to ‘demand better proof before we embarked on mass medication of a large group of patients who looked to us for advice and treatment’.7

Mechanical heart valves

Drugs are not the only treatments that can have unexpected bad effects: non-drug treatments can pose serious risks too. Mechanical heart valves are now a standard treatment for patients with serious heart valve disease and there have been many improvements in design over the years. However, experience with a particular type of mechanical heart valve showed how one attempt to improve a design had disastrous consequences. Beginning in the early 1970s, a device known as the Björk-Shiley heart valve was introduced, but the early models were prone to thrombosis (clot formation) that impaired their function. To overcome this drawback, the design was modified in the late 1970s to reduce the possibility of clots.

The new device involved a disc held in place by two metal struts (supports), and many thousands of this new type of valve were implanted worldwide. Unfortunately, the structure of the valves was seriously flawed: one of the struts had a tendency to snap – a defect known as strut fracture – and this led to catastrophic and often fatal valve malfunction.

As it happened, strut fracture had been identified as a problem during pre-marketing tests of the device, but this was attributed to defective welding and the cause was not fully investigated. The US Food and Drug Administration (FDA) nevertheless accepted this explanation, along with the manufacturer’s assurance that the lowered risk of valve thrombosis more than compensated for any risk of strut fracture. When the evidence of disastrous valve failure became only too apparent, the FDA eventually acted and forced the valve off the market in 1986, but not before hundreds of patients had died unnecessarily. Although product regulation systems have now improved to include better post-marketing patient monitoring and comprehensive patient registries, there is still a pressing need for greater transparency when new devices are introduced.8


Herceptin

Commercial companies are not alone in trumpeting the advantages of new treatments while down-playing drawbacks. Professional hype and enthusiastic media coverage can likewise promote benefits while ignoring potential downsides. And these downsides may include not only harmful side-effects but also diagnostic difficulties, as shown by events surrounding the breast cancer drug trastuzumab, better known by the trade name Herceptin (see also Chapter 3).

In early 2006, vociferous demands from coalitions of patients and professionals, fuelled by the pharmaceutical industry and the mass media, led the UK National Health Service to provide Herceptin for patients with early breast cancer. ‘Patient pester power’ triumphed – Herceptin was presented as a wonder drug (see Chapter 11).

But at that time Herceptin had only been licensed for the treatment of metastatic (widespread) breast cancer and had not been sufficiently tested for early breast cancer. Indeed, the manufacturers had only just applied for a licence for it to be used to treat early stages of the disease in a very small subset of women – those who tested positive for a protein known as HER2. And only one in five women has this genetic profile. The difficulties and costs of accurately assessing whether a patient is HER2 positive, and the potential for being incorrectly diagnosed – and therefore treated – as a ‘false positive’, were seldom reported by an enthusiastic but uncritical press. Nor was it emphasized that at least four out of five patients with breast cancer are not HER2 positive.9, 10, 11, 12

It was not until later that year that the UK’s National Institute for Health and Clinical Excellence (NICE) – the organization charged with looking at evidence impartially and issuing advice – was able to recommend Herceptin as a treatment option for women with HER2 positive early breast cancer. Even then, there was an important warning. Because of mounting evidence that Herceptin could have adverse effects on heart function, NICE recommended that doctors should assess heart function before prescribing the drug, and not offer it to women with various heart problems, ranging from angina to abnormal heart rhythms. NICE judged that caution was necessary because of short-term data about side-effects, some of them serious. Long-term outcomes, both beneficial and harmful, take time to emerge.13

Similar pressures for use of Herceptin were being applied in other countries too. In New Zealand, for example, patient advocacy groups, the press and the media, drug companies, and politicians all demanded that breast cancer patients should be prescribed Herceptin. New Zealand’s Pharmaceutical Management Agency (PHARMAC), which functions much as NICE does in the UK, similarly reviewed the evidence for use of Herceptin in early breast cancer. In June 2007, based on its review, PHARMAC decided that it was appropriate for early breast cancer patients to receive nine weeks of Herceptin, to be given at the same time as other anti-cancer drugs, rather than one after another. This nine-week course was one of three regimens then being tried around the world. PHARMAC also decided to contribute funds to an international study designed to determine the ideal length of Herceptin treatment. However, in November 2008, the newly elected government ignored PHARMAC’s evidence-based decision and announced funding for a 12-month course of the drug.14


In 2006, a patient in the UK, who happened to be medically trained, found herself swept along by the Herceptin tide. She had been diagnosed with HER2 positive breast cancer the preceding year.

‘Prior to my diagnosis, I had little knowledge of modern management of breast cancer and, like many patients, used online resources. The Breast Cancer Care website was running a campaign to make Herceptin available to all HER2 positive women and I signed up as I simply could not understand, from the data presented on the website and in the media, why such an effective agent should be denied to women who, if they relapsed, would receive it anyway. . . . I began to feel that if I did not receive this drug then I would have very little chance of surviving my cancer! I was also contacted by the Sun newspaper who were championing the Herceptin campaign and were interested in my story, as a doctor and a “cancer victim”.

At the completion of chemotherapy, I discussed Herceptin treatment with my Oncologist. He expressed concerns regarding the long-term cardiac [heart] effects which had emerged in studies but had received very little attention on the website and from the media, especially when one considered that the drug was being given to otherwise healthy women. Also, more careful analysis of the “50% benefit” which had been widely quoted and fixed in my mind actually translated into a 4–5% benefit to me, which equally balanced the cardiac risk! So I elected not to receive the drug and will be happy with the decision even if my tumour recurs.

This story illustrates how (even) a medically trained and usually rational woman becomes vulnerable when diagnosed with a potentially life threatening illness. . . . much of the information surrounding the use of Herceptin in early breast cancer was hype generated artificially by the media and industry, fuelled by individual cases such as mine.’

Cooper J. Herceptin (rapid response). BMJ. Posted 29 November 2006 at

Numerous uncertainties remain about Herceptin – for example, about when to prescribe the drug; how long to prescribe it for; whether long-term harms might outweigh the benefits for some women; and whether the drug delays or prevents the cancer returning. A further concern that has emerged is that Herceptin, when given in combination with other breast cancer drugs such as anthracyclines and cyclophosphamide, may increase the risk of patients experiencing adverse heart effects from about four patients in a hundred to about 27 patients in a hundred.15
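The change in risk quoted above (from about four to about 27 patients in a hundred) can be expressed in more than one way, and the choice of measure affects how large the change appears. The short Python sketch below is only an illustration of that arithmetic: the function name `risk_summary` and its output keys are hypothetical, and the two input figures are the approximate ones given in the text, so the results are rough.

```python
def risk_summary(baseline_risk: float, treated_risk: float) -> dict:
    """Summarise a change in risk between two groups.

    Both risks are fractions between 0 and 1 (for example 0.04 for
    'four patients in a hundred'). Illustrative helper, not a clinical tool.
    """
    if not (0 < baseline_risk <= 1 and 0 <= treated_risk <= 1):
        raise ValueError("risks must be fractions, with baseline_risk > 0")
    if treated_risk == baseline_risk:
        raise ValueError("no difference in risk: number needed to harm is undefined")

    # Absolute difference: extra affected patients per patient treated.
    absolute_increase = treated_risk - baseline_risk
    # Relative risk: how many times higher the risk is with treatment.
    relative_risk = treated_risk / baseline_risk
    # Number needed to harm: patients treated per one additional adverse event.
    number_needed_to_harm = 1 / absolute_increase

    return {
        "absolute_increase": absolute_increase,
        "relative_risk": relative_risk,
        "number_needed_to_harm": number_needed_to_harm,
    }


# Approximate figures from the text: about 4 in 100 rising to about 27 in 100.
summary = risk_summary(4 / 100, 27 / 100)
print(f"Absolute increase: {summary['absolute_increase'] * 100:.0f} per 100 patients")
print(f"Relative risk: {summary['relative_risk']:.2f}")
print(f"Number needed to harm: {summary['number_needed_to_harm']:.1f}")
```

On these approximate figures, the absolute increase is about 23 patients in a hundred, the risk is nearly seven times higher, and roughly one additional patient would experience adverse heart effects for every four or five given the combination. The calculation describes harms only; it says nothing about the benefits against which they would have to be weighed.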

Key points

  • Testing new treatments is necessary because new treatments are as likely to be worse as they are to be better than existing treatments
  • Biased (unfair) tests of treatments can lead to patients suffering and dying
  • The fact that a treatment has been licensed doesn’t ensure that it is safe
  • Side-effects of treatments often take time to appear
  • Beneficial effects of treatments are often overplayed, and harmful effects downplayed
Copyright © 2011 Imogen Evans, Hazel Thornton, Iain Chalmers and Paul Glasziou.
Bookshelf ID: NBK66210