What can the study of lead teach us about other toxicants?

The history of knowledge about lead toxicity may serve as a useful template to judge and predict progress in understanding other toxicants. A paradigm shift has occurred in which toxicity has been recognized at levels long held to be harmless. This shift has been accelerated by the use of newer tools for measuring outcome. Lead effects have been identified in children at blood lead levels as low as 15 micrograms/dL. They include impaired psychometric intelligence, language function, attention, and classroom behavior. Lead exposure during pregnancy results in increased risk for minor malformations and lowered infant IQ scores until at least 2 years of age. Understanding of this toxicant has been blurred by seven unrecognized Type II errors frequently encountered in the lead literature. These errors are discussed. A meta-analysis of thirteen informative lead studies in children is presented. The joint probability of the findings occurring by chance under the null hypothesis is less than 3 x 10^-12.


Introduction
In his monumental book The Structure of Scientific Revolutions, Thomas Kuhn pointed out that the nature of scientific progress is less like a slow march to the truth than a tag-team match in which competing models of reality, "paradigms" in Kuhn's notation, vie for dominance (1).
The overthrow of a governing model, or "paradigm shift," is often marked by the discarding of customary tools as well as ideas; this has happened in neurotoxicology. The beginnings of a toxicologic paradigm shift were recently presaged by the abandonment of a number of scientific tools. Toxicologists gave up the LD50, which asked: "How much poison did it take to kill half your rats?" and neurologists jettisoned the Babinski sign, which asked: "Did the toe go up or down?" The trading of these binary events (life-death; up-down) for graded measures of function (IQ scores; trials to criterion) allowed investigators to see heretofore obscured events at lesser doses. The causal chain worked simultaneously in the other direction; the idea that finer changes were wrought at lesser doses energized the search for sensitive measures of outcome. This sequence has been followed in the case of lead. The terrain covered in the search for behavioral effects at lesser doses provides lessons that may serve future investigators in the pursuit into the twenty-first century of the neurobehavioral footprints of other toxicants.
*School of Medicine, Western Psychiatric Institute and Clinic, University of Pittsburgh, 215 Webster Hall, Pittsburgh, PA 15213. A portion of this paper was presented at the International Meeting on Neuropsychological Effects of Low-Level Lead Exposure in Edinburgh, Scotland, in September 1986.
This paper outlines the growth of knowledge about lead toxicity and then reviews some data that have shaped the contemporary picture of the impact of lead on children's brains and behavior, focusing primarily on the studies of my group. There are many contributors to the understanding of lead toxicity who deserve mention.
The Shifting Paradigm of Lead Intoxication
Table 1 shows an overview of lead toxicity over the past 2000 years. It also illustrates the steady downward revision of what has been defined as a toxic dose. Randolph Byers, one of this country's first pediatric neurologists, primed the paradigm shift in the understanding of lead. Byers treated many cases of childhood lead intoxication. The conventional wisdom at the time of Byers' work was that if a child survived the illness, he or she was left without sequelae. Byers was, at the same time, seeing a number of cases of learning disorders and realized that some of them were his recovered cases of lead poisoning. With Dr. Elizabeth Lord, a psychologist at the Boston Children's Hospital, Byers followed up 20 recovered cases. Instead of relying on the neurological examination, they employed psychometric tests and found that 19 of the 20 showed cognitive or behavioral deficits (2). Byers asked, 45 years ago, how many cases of school failure were, in fact, missed cases of lead intoxication. The modern era of lead toxicology began.

Values in Toxicology Judgments
... dose. For some outcomes there will be a difference of opinion as to where the limit for an adverse health effect should be placed. For many non-rate-limiting, noncritical events there will be a small change that all will agree is not deleterious to the welfare of the host. Similarly, there will be a point where all judges will agree the effect is adverse to health. It is in the range between these boundaries that the debate flourishes, or rages, and values exhibit themselves. For IQ, it is my position that no decrement is a nonhealth effect.

Design Issues in the Study of Lead at Low Dose
Table 2 lists the design problems in observational studies of lead. Note that the direction of the bias is not

Studies of Lead at Low Dose
In the early 1970s my group was interested in the relationship between low-level lead exposure and school failure. Byers' papers raised intriguing questions regarding lead effects on mental development. Consequently, we studied school function and intelligence in a cohort of first-grade children in relation to past lead exposure. The conventional index of body burden of lead was the blood lead level. For reasons discussed earlier, blood lead levels were not satisfactory in children whose exposure had ended. Lead goes to bone, but bone biopsies are not possible in community studies.
A spontaneous bone biopsy is available to the investigator: the deciduous tooth. Lead concentrations in shed deciduous teeth accurately separated children from the lead belt from those living where lead exposure was a rarity (4) (Fig. 3). The shed tooth was a good marker of past exposure. We then went on to study a cohort of Boston area first-grade and second-grade subjects to examine the relationship between dentine lead level and neurobehavioral function. Children were classified by the amount of lead in their shed tooth dentine. Then, controlling for other covariates, a number of outcome measures were made. Lead was found to be significantly delay, and teachers' ratings of classroom behavior (5). These findings were later replicated in England (6) and Germany (7).
A third generation of studies of lead at low dose has been published since 1985 (8-10). These studies, recognizing design problems of the earlier investigations, were more rigorous and have found effects at lower levels. It is noteworthy that the last three studies examined middle-class children and were able to detect effects in the range as low as 10 to 15 µg/dL.
It has been suggested that body lead burden is really a marker of preexisting deficit; that is to say, children who are intellectually deficient eat more foreign substances. Three studies effectively refute this thesis. These studies measured prenatal exposure to lead, as indexed by umbilical cord blood lead level, and then went on to measure infant development.
My group examined cord blood levels in 12,000 birth experiences at the Boston Hospital for Women. The group then looked at birth outcome for 5000 births where there was adequate historical data about preexisting risks such as smoking, alcohol use, drugs, and past health history. Lead was found to be related to the risk of minor malformations in a dose-dependent fashion (Table 3).

Epistemologic Issues in Making Judgments
This section examines some of the issues that make the study of lead at low dose a little more confused and contentious than necessary. These remarks intrude into the realm of epistemology and go to such questions as: "How do we know what we know?" and: "How do we know that we know?" Minimizing Type I errors (accepting spurious relationships) is appropriate scientific behavior. But skepticism towards accepting causal claims is often purchased at the price of allowing excessive Type II errors (rejecting valid associations between lead and outcome). The current literature shows a marked increase in sophistication and rigor in the majority of modern lead studies. At the same time, the careful reader will note the relatively recent employment, in some lead studies and reviews, of tactics that tend to increase the risk of Type II errors in judgment or interpretation.
There are seven such tactical solecisms in design or interpretation that increase Type II bias. These are as follows:

The Sacrament of p < 0.05
In evaluating whether a given set of observed differences in IQ scores between lead-exposed and nonexposed children should be taken as causally related, some investigators dismiss any studies in which the p value is greater than 0.05. Differences with p = 0.07 or 0.1 are said to be due to chance and, even further, taken as evidence that no relationship between lead and deficit exists in nature (15,16). This use of a significance level as a dichotomous classifier, to sort causally related from accidentally related associations, ignores the genesis of the test of statistical significance. Most writers acknowledge Sir Ronald Fisher as the source of the value p = 0.05 (17). In his 1925 edition of Statistical Methods for Research Workers, Fisher states:
"It is convenient to take this point (p = 0.05) as a limit in judging whether a deviation is to be considered significant or not. Deviations exceeding twice the standard deviation are thus formally regarded as significant."
Note here the use of the term "convenient." It is only time and casual practice that have served to harden this preference into an icon. Jerome Cornfield's comments on this point are worth noting: "The pre-specification of a significance level, e.g., 0.05 or 0.01 has no sound logical basis and remains unjustified." (18)
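The continuity that Fisher's "convenient" limit cuts across can be made concrete. The following is a minimal Python sketch, assuming a test statistic that follows a standard normal distribution; the function name `two_sided_p` and the chosen deviates are illustrative and do not come from the paper. It prints the two-sided tail probability for several deviates, including Fisher's "twice the standard deviation."

```python
from statistics import NormalDist


def two_sided_p(z: float) -> float:
    """Two-sided tail probability of a standard normal deviate z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Illustrative deviates only: the p value changes smoothly with z,
# so nothing qualitative happens as it crosses 0.05.
for z in (1.645, 1.812, 1.960, 2.000):
    print(f"z = {z:.3f}  ->  two-sided p = {two_sided_p(z):.4f}")
```

Under this assumption, a deviate of exactly two standard deviations gives a p value slightly below 0.05, and deviates a little smaller give values near 0.07 or 0.10. The evidence varies by degree, not by kind.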

Reliance on Phantom Covariates
Because cognitive function is determined by multiple factors, careful investigators of the effects of lead try to identify and evaluate those nonlead covariates that could confound. Partitioning of the variance usually, but not always, has the effect of reducing the size of the lead effect. Some investigators (for example, see Smith et al. (15)) extrapolate from this reduction in effect size after covariate adjustment. They argue that because controlling for nonlead variates reduced the variance due to lead, controlling for the proper but still unnamed variate, should it be found, would set the lead coefficient at zero. In the paper cited, Smith states:
"The findings in this study show that if outcome measures are controlled, differences between lead groups on all tests become non-significant and the null hypothesis that the differences are not statistically different from zero must be accepted. In other words, social factors explain the differences in test performance to such a considerable degree that it is likely that the very small differences that remain once social factors have been taken into account are due to chance or to other social factors not measured." [Emphasis added.]
It is not required to postulate ghosts in the epidemiologic machinery.

Building False Causal Models
Variates that are measured in a study may be independent variables that affect the outcome under examination, or they may themselves be affected by lead. They may, in fact, occupy both positions in the causal chain. The question of simultaneity, which is just beginning to gain attention in the area of lead toxicity, will not be addressed here. To control for such variates as school placement (7), hyperactive behavior (19), or developmental delay (15) may be to subtract out variance that properly belongs to the main effect, lead. Because it has been shown that lead exposure during pregnancy can affect later development, control of early development or temperament may result in over-controlling for lead. Investigators should, at the least, report the results with and without controlling for these variates.
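The over-control problem described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis of any study cited here: the variable names, the path coefficients (0.8, -0.2, -0.5), the sample size, and the seed are all hypothetical. Lead is made to act on the outcome both directly and through an intermediate variate, and the lead coefficient is estimated with and without adjusting for that variate.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on x (intercept included)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, seed=1):
    rng = random.Random(seed)
    lead = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical intermediate variate (e.g. developmental delay), partly caused by lead.
    delay = [0.8 * l + rng.gauss(0, 1) for l in lead]
    # Outcome: assumed direct effect of lead (-0.2) plus an effect via the intermediate (-0.5).
    score = [-0.2 * l - 0.5 * d + rng.gauss(0, 1) for l, d in zip(lead, delay)]
    unadjusted = slope(lead, score)
    # Adjust for the intermediate by partialling it out of both lead and outcome.
    adjusted = slope(residuals(delay, lead), residuals(delay, score))
    return unadjusted, adjusted


total, adjusted = simulate()
print(f"lead coefficient, unadjusted:                {total:.2f}")
print(f"lead coefficient, adjusted for intermediate: {adjusted:.2f}")
```

With these assumed coefficients the true total effect of lead is -0.2 + 0.8 × (-0.5) = -0.6, while the adjusted coefficient recovers only the direct path of about -0.2. Reporting both figures, as recommended above, shows the reader how much of the effect the adjustment removed.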
In the study of prenatal exposure, the transgenerational influence of lead has received little attention. Since most economically disadvantaged parents have little economic mobility, they tend to reside in the same or similar neighborhoods from childhood through their adult years. It is reasonable to expect that mothers (and fathers) share lead exposures and burdens similar to those of their offspring. Milar et al. (20) suggest that higher lead burdens in infants and children are associated with poor maternal rearing, as measured by scaled scores such as the Caldwell HOME (21). What has not been appreciated is that some of the poorer rearing scores in mothers of children with higher lead levels may derive from deficits in the mother's behavior, and this might be a result of the mother's exposure to lead when she was a child. This effect of lead exposure on rearing patterns has been experimentally demonstrated in the rodent (22).

Accepting the Null Hypothesis from Studies with Inadequate Power
Focusing attention on the α risk in a study can lead the investigator away from attention to the β risk. Most published studies cite the α risk, but infrequent attention is given to the β risk. Inescapably, value choices are expressed in this regard. To some, scientific rigor is thought to be defended by lowering α levels, preventing or minimizing the number of spurious facts inserted into the literature and reducing the number of unnecessary replications. But narrowing the gate for new ideas and observations, particularly in the area of preventive medicine, may have unfortunate implications.
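The trade-off between the two risks can be sketched numerically. The example below is a rough normal approximation for a two-group comparison of means, assuming equal group sizes and a known common standard deviation; the function name `approx_power`, the effect size of 0.3 standard deviations, and the sample sizes are illustrative assumptions, not values taken from the studies reviewed.

```python
from statistics import NormalDist


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided, two-group z test.

    effect_size is the mean difference in standard-deviation units.
    The negligible contribution of the opposite tail is ignored.
    """
    std = NormalDist()
    z_crit = std.inv_cdf(1.0 - alpha / 2.0)
    noncentrality = effect_size * (n_per_group / 2.0) ** 0.5
    return 1.0 - std.cdf(z_crit - noncentrality)


# Hypothetical design choices: the beta risk is 1 - power.
for n in (25, 50, 100, 200):
    power = approx_power(0.3, n)
    print(f"n per group = {n:3d}  power = {power:.2f}  beta = {1 - power:.2f}")
```

Under these assumptions, a study with 50 children per group would miss a true effect of 0.3 standard deviations about two times in three, while holding α at 0.05 throughout. A "negative" finding from such a study says little about the null hypothesis.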

Underestimating the Biological Significance of a Demonstrated Effect Size
Studies of lead have shown IQ differences of approximately 4 to 6 points. Differences of this magnitude correspond to effect sizes of 0.30 to 0.45 standard deviations. A number of commentators have defined these differences as minimal or of no health consequence (15,16). We have pointed out that a difference between median IQ scores of 6 points predicts a 4-fold increase in the proportion of significantly impaired children (22) (Fig. 4).
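Why a modest shift in the center of a distribution matters more in the tail can be shown with an idealized calculation. The sketch below assumes normally distributed IQ scores with a mean of 100 and a standard deviation of 15, and uses a cutoff of 80 for impairment; all three values are assumptions made for illustration. It does not reproduce the 4-fold figure cited above, which is the author's own result.

```python
from statistics import NormalDist


def fraction_below(cutoff: float, mean: float, sd: float = 15.0) -> float:
    """Fraction of a normal population scoring below the cutoff."""
    return NormalDist(mean, sd).cdf(cutoff)


# Assumed values: unexposed mean 100, exposed mean 6 points lower, cutoff 80.
baseline = fraction_below(80, 100)
shifted = fraction_below(80, 94)
print(f"below 80, mean 100: {baseline:.3f}")
print(f"below 80, mean  94: {shifted:.3f}")
print(f"ratio: {shifted / baseline:.2f}")
```

Even under this idealized model, a 6-point drop in the mean roughly doubles the share of children below the cutoff. A shift that looks small at the center of the distribution is proportionally much larger among the children already at risk.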

Expecting Proof of Causality
A number of critics reject studies that assert an association between lead and outcome on the grounds that the causal relationship has not been proven. This criticism usually depends on two arguments: flaws in the design or execution of the paper under examination, and the possibility that some covariate may not have been recognized and controlled. No real-world epidemiological study is without flaw. As a consequence, all are vulnerable to this criticism. Since multivariate space has infinite dimensions (e.g., has the study controlled for birth weight, gestational age, hair color, handedness, degree of neonatal icterus, serum iron level, school quality ...?), and the supply of subjects is finite, the investigator will necessarily be confronted with an unsaturated structural model. A clever biostatistician with access to a rather dull computer (or a dull biostatistician with a clever computer) can fit an infinite number of regression equations to the data in that circumstance. In addition, the variates measured only imperfectly capture the factors of real interest to the study. Family size, socioeconomic status, and mother's IQ do not, after all, directly influence the child's intellectual function; they are surrogates for other variables more proximate to the outcomes of interest. These variables, specified imperfectly, are also unavoidably measured with some error. These design hurdles, taken in sum, place inescapable constraints on the demonstration of causal relationships. But even if these design difficulties were surmounted, the demonstration of causal proof could not be accomplished. David Hume stated 200 years ago that causality is a concept not susceptible to empirical demonstration.
Epidemiologists and bench scientists alike accept more modest goals for themselves: the accretion of incremental bits of data that assemble themselves into a coherent picture from which lawfulness can be inferred.

Evaluating Studies in Isolation
Most narrative reviews examine each study's methodology, detail the strengths and weaknesses, and then attempt a narrative summary of the combined import of the studies. Often a simple tally of those studies that showed an effect and those that showed no effect is presented in the conclusion. This discarding of individual studies on the basis of flawed design or execution is another form of requiring causal proof. Inferences do not grow from single studies; they are a product of the interaction of many scientists whose studies build upon each earlier study, and while imperfect themselves, the collective nonlinear sum of their conclusions permits the making of causal inferences with some confidence.
This method of narrative reviewing has inherent limitations; the method of selection is often subjective, and the evaluation of the merits of each study is not separated from the bias of the reviewer. One response to this dilemma is the quantitative integrative review, or meta-analysis. In meta-analysis, each study is treated as a subject in a study of studies, and the combined, integrated effects of the agent under question are evaluated.
We reviewed all studies of low-level lead exposure in children and conducted a meta-analysis on those 13 studies that were informative enough to allow combining inferences. Table 4 shows the studies, their effect size, power to find an effect, and the joint probability estimated by Fisher's aggregation technique (24). Clearly information is contained in all studies, and the possibility that this distribution of probabilities occurred by chance under the null hypothesis is vanishingly small (< 3 x 10^-12). Recognizing that studies that show an effect are more likely to be published than negative studies, we calculated the number of unpublished studies with p values < 0.5 that would be required to dilute out the positive studies in this sample. We estimated that 75 studies are necessary. Given the spotlight on this area and the vocal nature of the participants in the field, it is unlikely that this number of studies is languishing in the files of investigators out of public awareness.

Conclusion
There are lessons to be learned from the study of lead. They may be applied with profit to the understanding of other pollutants. These lessons can be summarized by the following points: first, behavior may be among the most sensitive end points; second, the threshold for discerned effect will depend on the sensitivity of the measures employed and the rigor of the design; third, samples of fewer than 400 may miss important effects that are there, only because of the weak power to find a small effect; fourth, small does not mean unimportant, it means difficult to isolate in a multivariate field; fifth, proper causal models are required to reduce the risk of confounding and the twin risk of over-control; sixth, values inevitably intrude into the conduct of scientific enterprises. They can take the shape of relative weights assigned to α and β risks or of defining what constitutes an adverse health effect.