Archimedes: An Analytical Tool for Improving the Quality and Efficiency of Health Care

David M. Eddy and Leonard Schlessinger

Care Management Institute, Kaiser Permanente and Kaiser Permanente Southern California

The practice of medicine has become extraordinarily complex, and it promises to become even more complex as the pace of innovation accelerates. Managing that complexity requires good information about the effects of different courses of action on health, logistic, and economic outcomes. The preferred method of obtaining that information is empirical clinical research. Unfortunately, in medicine the ability to conduct clinical research is severely limited by the high cost of enrolling and following patients, the long follow-up times, the large number of options to be compared, the large number of patients required, the unwillingness of people to participate (e.g., to be randomized or to follow a specified protocol), and the unwillingness of the world to stand still until the research is done. A typical clinical trial comparing just two options requires thousands of patients, costs tens or hundreds of millions of dollars, takes 3 to 15 years, and is likely to be outdated before it is completed.

In other fields, mathematical models have been used to help make decisions and design systems. However, the variability of human biology and behavior, the size and complexity of health care systems, and the wide variety of important questions to be addressed all place special demands on health care models. We have designed a new type of model, called Archimedes, to try to address these special demands. This paper describes the basic structure and scope of the model, the modelling methods, how we can validate the model, and its potential uses.


Archimedes has three main parts. At the core is a model of human physiology that describes the pertinent aspects of anatomy, physiology, pathophysiology, occurrence of signs and symptoms, effects of tests and treatments, and occurrence of health outcomes. The second part consists of care process models; these describe what providers do when a person seeks care or what providers can do to prevent a person from needing care. The third part, system resources, includes such things as personnel, facilities, equipment, and costs. The full Archimedes model is applied in a specific health care setting defined by specific care processes and specific system resources.

A complete description of all the objects and their attributes, functions, and interactions is not possible here. To give a sense of the model's scope, we describe some of the main classes of objects and give examples of their attributes and functions.

Patients. We use the term “patient” to mean anyone who might receive health care from the system, including people when they are well. The attributes of patients can be as detailed as required; they can include age, sex, risk factors, behaviors, education level, type of employment, and insurance coverage. All patients have physiologies, which include all pertinent organs and biological variables. As governed by the equations, patients can get diseases, which can modify the functions of their organs and can cause signs, symptoms, and health outcomes. Patients have perceptions, memories, and behaviors that determine how they respond to signs and symptoms and how they adhere to interventions. Their risk factors, physiologies, and behaviors can respond to interventions, which in turn can affect the occurrence and progression of their diseases. As in reality, each patient is different, and the spectra of physiologies, behaviors, and other characteristics correspond to the spectra seen in reality.

Health Care Providers. All pertinent types of personnel involved directly or indirectly in providing health care are included. Examples are nurses, pharmacists, physicians, telephone operators, and case managers. Within each of these types are the appropriate subtypes to model a particular problem (e.g., physicians → surgeons → cardiac surgeons → pediatric cardiac surgeons). Health care providers have attributes (e.g., ages, skill levels, behaviors), as well as functions (e.g., cardiac surgeons can perform bypasses, but telephone operators cannot).
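The provider subtypes described above map naturally onto a class hierarchy. The following is only a minimal sketch of that idea in Python (Archimedes itself is programmed in Smalltalk); every class name, attribute, and procedure label here is a hypothetical illustration, not part of the actual model.

```python
class Provider:
    """Base class for a simulated provider (hypothetical name and fields)."""
    procedures = frozenset()  # functions this provider type can perform

    def __init__(self, age, skill_level):
        self.age = age                  # attribute: age in years
        self.skill_level = skill_level  # attribute: illustrative 0-1 score

    def can_perform(self, procedure):
        return procedure in self.procedures


class TelephoneOperator(Provider):
    procedures = frozenset({"schedule_visit"})


class Physician(Provider):
    procedures = frozenset({"office_visit"})


class Surgeon(Physician):
    # Each subtype inherits the functions of its parent and adds its own.
    procedures = Physician.procedures | {"general_surgery"}


class CardiacSurgeon(Surgeon):
    procedures = Surgeon.procedures | {"coronary_bypass"}


class PediatricCardiacSurgeon(CardiacSurgeon):
    procedures = CardiacSurgeon.procedures | {"pediatric_cardiac_surgery"}
```

With these definitions, `CardiacSurgeon(45, 0.9).can_perform("coronary_bypass")` is true, while the same call on a `TelephoneOperator` is false.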

Interventions. Archimedes includes two main types of interventions. “Tests and treatments” encompass what care is delivered. This type includes: changes in risk factors and preventive treatments; tests that provide information about the existence, severity, or prognosis of a disease; “curative treatments” that directly affect the progression and outcomes of a disease; and “symptomatic treatments” that affect the symptoms of a disease, without affecting its progression. The other type of intervention, “care processes,” determines how tests and treatments are delivered. Examples are: use of case managers, creation of a registry to increase compliance with a performance measure, and development of criteria for referrals to specialists. For either type of intervention it is possible to specify the types of providers who can deliver it, the types of facilities or locations where it can be provided, and the types of equipment and supplies it requires. In the model, such things as the use, effectiveness, and cost of an intervention can vary depending on many factors, such as patient characteristics, type of provider, skill of provider, time of day, delivery site, and random factors.

Policies, Protocols, and Regulations. The use and effectiveness of any intervention can be determined by a set of policies and protocols that describe such things as: who delivers it, where it is delivered, the criteria for determining which patients should get it, the sequence of events for implementing it, and the decision rules applied at different steps. Clinical practice guidelines, performance measures, strategic goals, and the “what-to-do” parts of disease management programs are examples of policies that affect tests and treatments. Continuous quality improvement projects, nursing protocols, instructions to telephone operators, and the “how-to-do-it” parts of disease management programs are examples of policies that affect care processes. The accuracy with which any of these is applied can allow for variations and random factors that mimic the variations and randomness of real practice. For example, adherence to a particular guideline can be different for a primary care physician than for a specialist, for a physician who has attended a continuing medical education class within the last 12 months, or for a physician who sees more than 50 patients a year who are candidates for the guideline.
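As a rough illustration of provider-dependent adherence, the sketch below adjusts a baseline adherence probability using the three physician characteristics mentioned in the example above. The baseline of 0.60 and the increments of 0.10 are invented for illustration only, as are all function and field names; the source does not report the actual values or functional form.

```python
import random
from dataclasses import dataclass


@dataclass
class PhysicianProfile:
    """Hypothetical attributes that may influence guideline adherence."""
    is_specialist: bool
    months_since_cme: int
    candidate_patients_per_year: int


def adherence_probability(p, base=0.60):
    """Return an illustrative probability that the guideline is followed."""
    prob = base
    if p.is_specialist:
        prob += 0.10  # assumed effect of specialty
    if p.months_since_cme <= 12:
        prob += 0.10  # assumed effect of recent continuing education
    if p.candidate_patients_per_year > 50:
        prob += 0.10  # assumed effect of high patient volume
    return min(prob, 1.0)


def follows_guideline(p, rng):
    """Random draw that mimics the randomness of real practice."""
    return rng.random() < adherence_probability(p)
```

A call such as `follows_guideline(profile, random.Random(0))` then yields a single simulated decision for one patient encounter.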

Facilities, Equipment, and Supplies. Archimedes can include all types of facilities, equipment, and supplies that are involved in the management of a disease. Any type in any of these classes can be expanded to any level of detail (e.g., bed → monitored bed → monitored bed in the emergency department).

Logistics and Finances. Archimedes can record the cost, location, time, and any other important circumstance of every event. Thus virtually any type of budget, table of accounts, utilization report, or forecasting report can be calculated.
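To make the idea concrete, here is a minimal sketch, under assumed names, of how a log of recorded events could be rolled up into a simple cost report. The `Event` fields and the `cost_by` helper are hypothetical and do not describe the model's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One recorded event (all field names are illustrative)."""
    time_days: float
    patient_id: int
    kind: str
    location: str
    cost: float


def cost_by(events, key):
    """Total cost grouped by any Event field, e.g. 'kind' or 'location'."""
    totals = defaultdict(float)
    for event in events:
        totals[getattr(event, key)] += event.cost
    return dict(totals)
```

Grouping the same log by `"kind"`, `"location"`, or `"patient_id"` gives different views of utilization without changing how events are recorded.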


The mathematical foundations of the Archimedes model are described elsewhere (Schlessinger and Eddy, 2002). Briefly, it is written in differential equations and programmed in Smalltalk, an object-oriented language. The most difficult part of the model is the representation of physiology. We conceptualize the physiology of a person as a collection of continuously interacting objects that we call “features.” The concept of a feature is very general, but features correspond roughly to anatomic and biological variables. Examples in the current Archimedes model are systolic and diastolic blood pressures, patency of a coronary artery, cardiac output, visual acuity, and amount of protein in the urine. Features can represent real physical phenomena (e.g., the number of milligrams of glucose in a deciliter of plasma), behavioral phenomena (e.g., ability to read an eye chart), or conceptual phenomena (e.g., the “resistance” of liver cells to the effects of insulin).

The model is largely driven by the trajectories of features, that is, their values as continuous functions of time. Features register the effects of patient characteristics, interact continuously with each other, determine the occurrence and progression of diseases, trigger the onset and determine the severity of signs and symptoms, are measured by tests, respond to treatments, and cause health outcomes. Specifically, differential equations are used to define the progression of each feature as a function of patient attributes as well as other features. At any given time, the values of features can be measured by tests, subject to both random and systematic errors. Equations define clinical events, such as signs, symptoms, and health outcomes, as functions of the magnitudes and trajectories (e.g., rate of change) of various combinations of features. Diseases, which in reality are human-made labels for constellations of biological variables, are defined in the model in the same way. For example, in the model as in reality, a person is said to have “diabetes” if the fasting plasma glucose exceeds 125 mg/dl or the oral glucose tolerance test result exceeds 199 mg/dl. Treatments are included as parameters in the equations for features, being able to change their values, rates of progression, or both. In the model, treatments do this at the level at which their actual mechanisms of action are understood to occur. For example, in the model the drug metformin affects the equation that determines the amount of glucose produced by the simulated liver cells. Finally, the signs, symptoms, and behaviors caused by changes in features set in motion all the logistic events and use of resources that occur in a health care system.
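The relationship among a feature's trajectory, a treatment, a test, and a disease definition can be sketched with a toy example. The differential equation below (glucose rising with hepatic output and falling with clearance) is an invented stand-in, not one of the model's actual equations, and all parameter names are assumptions. It is integrated with small Euler steps purely for simplicity, whereas Archimedes treats time as continuous. Only the diagnostic thresholds come from the text.

```python
import random


def glucose_trajectory(g0, hepatic_output, clearance, years,
                       metformin_factor=1.0, dt=0.001):
    """Euler-integrate dG/dt = hepatic_output * metformin_factor - clearance * G.

    The equation and every parameter are illustrative assumptions.
    Returns a list of (time_in_years, glucose_mg_dl) points.
    """
    g = g0
    points = [(0.0, g)]
    for step in range(1, round(years / dt) + 1):
        # The treatment acts on the production term, mirroring the idea that
        # a drug enters the equation at its assumed mechanism of action.
        g += dt * (hepatic_output * metformin_factor - clearance * g)
        points.append((step * dt, g))
    return points


def measure(true_value, rng, bias=0.0, sd=0.0):
    """Simulate a test result with systematic (bias) and random (sd) error."""
    return true_value + bias + rng.gauss(0.0, sd)


def has_diabetes(fasting_glucose, ogtt_glucose):
    """Apply the diagnostic thresholds quoted in the text (mg/dl)."""
    return fasting_glucose > 125 or ogtt_glucose > 199
```

In this toy setting, a person whose untreated trajectory drifts above 125 mg/dl acquires the label "diabetes" as a consequence of the feature's value; lowering `metformin_factor` changes the trajectory itself, and the label follows from it.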

In general, several dozen features and 10 to 30 equations are necessary to calculate the occurrence of any particular outcome (e.g., the rate of heart attacks in a specified population). The model currently includes the features pertinent to coronary artery disease, congestive heart failure, diabetes, and asthma. Features relating to other diseases are being added continually. Other formulas describe the clinical, logistic, and economic events. These formulas are typical of decision trees, flow charts, and accounting models. All of the formulas can include person-to-person differences, random variations, and uncertainty.

The level of detail of the model is determined by the intended users. We build the physiology part of the model to the level of detail clinicians tell us they consider necessary for their decisions. As a result, the physiology model corresponds roughly to the level of biological detail found in patient charts, general medical textbooks, and the designs of clinical trials. Care processes, logistics, resources, and costs are modelled at an equally high level of detail, as determined by administrators. For example, there are 37 different types of outpatient primary care visits.


Archimedes is built from existing basic research, epidemiological studies, and clinical trials of treatments (Schlessinger and Eddy, 2002). When person-specific data are available, they can be used to derive equations for features as functions of other features. When person-specific data are not available, aggregated data, such as those routinely published for registries, population-based studies, and clinical trials, can be used. In general, the results of any well designed study can be used to build the part of an Archimedes model that addresses biological phenomena, outcomes, and interventions that were investigated in the study.

The data to describe care processes are not routinely collected or published. In practice, we develop our models of care processes through examination of administrative data, existing protocols, interviews, and on-site observations, checked against any available data. Pilot studies can be conducted as needed for processes that are determined through sensitivity analysis to be critical.


Methods. Ultimately, the value of a model depends on how accurately it can represent reality. The deep level of physiological detail, coupled with the care processes in the Archimedes model, provides a rigorous way to test this. The validation strategy is to identify an epidemiological study or clinical trial, conduct a “virtual study” or “virtual trial” in the virtual world of the model, and then compare the results. The basic steps are: (1) Have the model “give birth” to a large population of simulated people. Imagine a large city of simulated people with a representative spectrum of characteristics (e.g., age, sex, race/ethnicity, and genetic background) and medical histories. They are all unique, and most will never get the disease to be studied in the trial. (2) Run the model to let them age naturally until they reach the age range of the people who were candidates for the real trial. (3) Identify those who would meet the inclusion criteria for the trial, and select from them a sample that corresponds to the sample size of the real trial. (4) Randomize the simulated participants into groups, as was done in the real trial. (5) Have simulated providers give the patients the treatments according to the protocols described for the real trial. (6) Run the model for the simulated duration of the trial, with the simulated providers applying whatever follow-up and testing protocols were used in the real trial. (7) Count the outcomes of interest that occurred to the participants in the simulated trial. (8) Compare them to the results observed in the real trial. We use Kaplan-Meier curves to make the comparisons because they contain the most information about the outcomes in all of the arms of a trial at all time periods.
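The steps above can be sketched in a deliberately simplified form. In the toy version below, the population fields, the inclusion rule, the constant event rate, and the `relative_risk` treatment effect are all hypothetical placeholders for the physiology and care-process models; ages are sampled directly rather than simulated from birth. The `kaplan_meier` function is the standard product-limit estimator, which produces the curve used in step (8).

```python
import random


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per participant; events: True if the outcome
    occurred at that time, False if the participant was censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    data = sorted(zip(times, events))
    i = 0
    while i < len(data):
        t = data[i][0]
        occurred = leaving = 0
        while i < len(data) and data[i][0] == t:
            occurred += data[i][1]
            leaving += 1
            i += 1
        if occurred:
            survival *= 1 - occurred / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def run_virtual_trial(n_population, n_enrolled, years, hazard,
                      relative_risk, seed=0):
    """Toy version of steps (1)-(7); every number and field is hypothetical."""
    rng = random.Random(seed)
    # (1)-(2) Create a simulated population, here sampled directly at trial age.
    population = [
        {"age": rng.uniform(30, 90), "high_risk": rng.random() < 0.3}
        for _ in range(n_population)
    ]
    # (3) Apply inclusion criteria, then draw a trial-sized sample.
    eligible = [p for p in population if p["high_risk"] and 40 <= p["age"] <= 80]
    sample = rng.sample(eligible, n_enrolled)  # ValueError if too few eligible
    # (4) Randomize: the sample is in random order, so split it in half.
    half = n_enrolled // 2
    arms = {"control": sample[:half], "treated": sample[half:]}
    results = {}
    for arm, people in arms.items():
        # (5) The treatment scales the event rate by relative_risk.
        rate = hazard * (relative_risk if arm == "treated" else 1.0)
        times, events = [], []
        for _ in people:
            # (6) Follow each participant to an event or the end of the trial.
            t = rng.expovariate(rate)
            times.append(min(t, years))
            events.append(t <= years)
        # (7) Record the outcomes for this arm.
        results[arm] = (times, events)
    return results
```

Calling `kaplan_meier(*results["control"])` and `kaplan_meier(*results["treated"])` on the returned arms gives the simulated curves that would be set against the published curves of the real trial.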

All of this is done at whatever level of detail is necessary to simulate what was done in the real trial, using whatever descriptions are available from publications. For example, if “hypertension” is defined as “a finding on at least two of three consecutive measurements obtained one week apart … of a mean systolic blood pressure of more than 135 mm Hg or mean diastolic blood pressure of more than 85 mm Hg, or both,” that is what we have the simulated physicians do.
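A protocol-level definition like the one quoted above can be encoded as an explicit decision rule. The sketch below is one possible reading, assuming that each of the three weekly visits yields a mean systolic and a mean diastolic value and that a visit counts as a "finding" when either threshold is exceeded. Because the quoted definition is partly elided, this interpretation and the function names are assumptions.

```python
def is_elevated(systolic, diastolic):
    """True if either pressure exceeds the quoted thresholds (mm Hg)."""
    return systolic > 135 or diastolic > 85


def meets_hypertension_definition(readings):
    """readings: three (systolic, diastolic) pairs taken one week apart."""
    if len(readings) != 3:
        raise ValueError("the quoted definition uses exactly three measurements")
    return sum(is_elevated(s, d) for s, d in readings) >= 2
```

For example, readings of `[(140, 80), (130, 90), (120, 70)]` meet the definition because two of the three visits show an elevated value.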

Each trial that is simulated in this way provides a sensitive test of the model. For each, the simulated results come from thousands of simulated individuals, each of whom has a simulated liver, heart, pancreas, and other organs. Each liver produces glucose, each coronary artery can develop plaque or thrombus at any point, and each kidney clears urine. The progression of the pathological process is different in every person, just as in reality. The simulations also include simulated physicians following simulated practice patterns or guidelines, with different degrees of compliance … on through to the performance of tests, reporting of results, making of errors, giving of treatments, use of facilities and equipment, and generation of costs. All told, each simulation tests scores of equations in every patient and hundreds of other equations that all have to work correctly in concert.

At the end of a simulation, the results of the virtual study should closely match the results of the real study, within the bounds of random variation related to sample size. We say there is a “statistical matching” of results if there is no statistically significant difference between the model's results and the real results.
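As a simplified illustration of "statistical matching," the sketch below compares the fraction of participants with an event in a simulated arm and in the corresponding real arm using a two-sided two-proportion z-test. This is a substitute chosen for brevity, not the authors' procedure (the comparisons described here are made on Kaplan-Meier curves), and the 0.05 threshold is an assumed convention.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference between two event fractions."""
    p_a, p_b = events_a / n_a, events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0.0:
        return 1.0  # both fractions are 0 or 1, so there is no difference
    z = (p_a - p_b) / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def statistically_matches(events_model, n_model, events_trial, n_trial,
                          alpha=0.05):
    """True if the model and trial results do not differ significantly."""
    return two_proportion_p_value(events_model, n_model,
                                  events_trial, n_trial) >= alpha
```

Under this rule, 100 events in 1,000 simulated patients matches 105 events in 1,000 real patients, whereas 100 versus 200 events does not.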

To help probe different parts of the model and to check its validity for different populations, organ systems, treatments and outcomes, we test the model in this way against a variety of different trials. Each validation exercise uses the same model with the same parameter values; parameters are not set to “fit” one trial and then reset to fit another trial. The trials are chosen by an independent advisory committee, which also reviews the results.

In some cases, information from a trial is needed to help build part of the model. When this occurs, the information from the trial is used to help derive only one equation out of the 10 to 30 used to calculate the outcome of interest in the population of interest. Thus a validation exercise involving such a trial not only confirms the equation it helped build, but also provides an independent validation of the other equations. Furthermore, the equation built with the help of any particular trial is independently tested by all of the validation exercises involving other trials. Of the 18 trials used to validate the model thus far, 8 were used to help build the model and 10 were not.

Validation Results. Using these methods, the Archimedes diabetes model has been validated against 17 epidemiological studies and 18 clinical trials thus far. The example shown in Figure 1 compares the model results with the trial results for the Heart Protection Study (2002). This trial randomized 20,536 high-risk people to receive either a placebo or a cholesterol-lowering drug, simvastatin. People were defined as being at high risk if they had coronary artery disease, occlusive arterial disease, or diabetes. The primary outcome was the fraction of people who developed heart attacks. No information from this trial was used to help build the model.

FIGURE 1. Comparison of model and trial of fraction of patients having major coronary events in the Heart Protection Study (2002).



Counting the different arms and outcomes of the 18 trials, a total of 74 validation exercises have been conducted to date. (Figure 1 illustrates 2 of the 74.) In 71 of the exercises, the model's results statistically matched the real results. For the three exercises that were not a statistical match, in one case the difference in results was just barely statistically significant (p = 0.04), which is to be expected in 74 exercises. In the other two, the difference was due to the model underestimating the underlying rate of the outcome in the trial population by about 35 percent. (The model estimated the effect of the treatment accurately.) The advisory committee concluded that this discrepancy was most likely due to a risk factor in the trial population that was not described in the publication and therefore could not be included in the model. Considering all 74 exercises, the correlation between the model's results and the real results is r > 0.99. Considering only the 10 trials that were not used to help build the model, the correlation was still r > 0.99.
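The remark that one borderline result is to be expected can be checked with simple arithmetic. The sketch below assumes a 0.05 significance threshold and treats the 74 exercises as independent; the source states neither assumption explicitly, and arms of the same trial are unlikely to be fully independent.

```python
def expected_chance_mismatches(n_exercises, alpha=0.05):
    """Expected number of 'significant' differences arising by chance alone."""
    return n_exercises * alpha


def prob_at_least_one(n_exercises, alpha=0.05):
    """Probability of at least one chance mismatch among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_exercises
```

With 74 exercises the expected number of chance mismatches is about 3.7, and the probability of seeing at least one is roughly 0.98, so a single result at p = 0.04 is unremarkable under these assumptions.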


Archimedes is meant to create a virtual world at the level of detail at which real clinical and administrative decisions are made. Once created and validated, the virtual world can be used to explore a wide variety of scenarios and questions, much as a flight simulator can be used to simulate different types of flying conditions and emergencies. Applications include: (1) designing and testing clinical management tools, such as guidelines, performance measures, strategic goals, disease management programs, priorities and continuous quality improvement programs; (2) evaluating and performing cost-effectiveness analyses of clinical and administrative programs; (3) designing and interpreting clinical research, including setting priorities for new trials, planning trials (e.g., sample size, duration, clinical costs), projecting long-term Phase 3 results from short-term Phase 2 results, estimating outcomes in subpopulations, and extending the results of a trial (e.g., predicting 15-year outcomes from 3-year outcomes, predicting outcomes that were not initially measured); (4) estimating outcomes for specific patients who are contemplating different treatment options; and (5) creating a “living library”—a place where the current body of knowledge about a disease is not only organized and stored, but is also integrated in a quantitative way that can be used for the other types of applications just described.


Archimedes is distinguished from other models by several features. It is a person-by-person, object-by-object simulation. It covers a broad spectrum, spanning features from biological details to the care processes, logistics, resources, and costs of health care systems. It is written at a deep level of biological, clinical, and administrative detail. It is continuous in time; there are no discrete time steps, and any event can occur at any time. Biological variables that are continuous in reality are represented continuously in the model; there are no clinical “states” or “strata.” It includes many diseases simultaneously and interactively in a single integrated physiology, enabling it to address comorbidities, syndromes, and treatments with multiple effects. Finally, it has been validated by simulations of a wide range of clinical trials.

Archimedes is not intended to replace reality. If a question can be answered with a well designed empirical study, that approach is always preferable. Our goal is to provide a trial-validated method that can be used to address problems that cannot feasibly be addressed through empirical studies because of high cost, long follow-up times, large sample size, unwillingness of providers or patients to participate, large number of options, or the rapid pace of technological change. In the way that a flight simulator provides valuable experience, shortens the time needed in real planes, and simulates experiences that are too dangerous or rare to attempt for real (like severe wind shear), the Archimedes diabetes model should be a useful tool for sharpening our understanding of diseases and their management.

The model, which was developed and is owned by Kaiser Permanente, is currently being prepared for access by individuals and organizations over the Web, through a friendly interface, on a nonprofit basis. The website is expected to be completed by the end of 2005. In the meantime, the authors can be contacted by e-mail about access (eddyaspen@yahoo.com).


  1. Heart Protection Study Collaborative Group. MRC/BHF Heart Protection Study of antioxidant vitamin supplementation in 20,536 high-risk individuals: a randomized placebo-controlled trial. Lancet. 2002;360(9326):23–33. [PubMed: 12114037]
  2. Schlessinger L, Eddy DM. Archimedes: a new model for simulating health care systems—the mathematical formulation. Journal of Biomedical Informatics. 2002;35(1):37–50. [PubMed: 12415725]