Temporal Variability of Urinary Phthalate Metabolite Levels in Men of Reproductive Age

Phthalates are a family of multifunctional chemicals widely used in personal care and other consumer products. The ubiquitous use of phthalates results in human exposure through multiple sources and routes, including dietary ingestion, dermal absorption, inhalation, and parenteral exposure from medical devices containing phthalates. We explored the temporal variability over 3 months in urinary phthalate metabolite levels among 11 men who collected up to nine urine samples each during this time period. Eight phthalate metabolites were measured by solid-phase extraction–high-performance liquid chromatography–tandem mass spectrometry. Statistical analyses were performed to determine the between- and within-subject variance apportionment, and the sensitivity and specificity of a single urine sample to classify a subject’s 3-month average exposure. Five of the eight phthalates were frequently detected. Monoethyl phthalate (MEP) was detected in 100% of samples; monobutyl phthalate, monobenzyl phthalate, mono-2-ethylhexyl phthalate (MEHP), and monomethyl phthalate were detected in > 90% of samples. Although we found both substantial day-to-day and month-to-month variability in each individual’s urinary phthalate metabolite levels, a single urine sample was moderately predictive of each subject’s exposure over 3 months. The sensitivities ranged from 0.56 to 0.74. Both the degree of between- and within-subject variance and the predictive ability of a single urine sample differed among phthalate metabolites. In particular, a single urine sample was most predictive for MEP and least predictive for MEHP. These results suggest that the most efficient exposure assessment strategy for a particular study may depend on the phthalates of interest.

Phthalates, diesters of phthalic acid, are a family of multifunctional chemicals that are widely used in personal and consumer products. Phthalates are used to hold color and scent in consumer and personal care products (Koo et al. 2002); as solvents in paints, glue, insect repellents, lubricants, and adhesives [Agency for Toxic Substances and Disease Registry (ATSDR) 2001]; and to soften a wide range of plastics (Bradbury 1996), including polyvinyl chloride (PVC) used in the manufacture of medical products such as blood, intravenous, and dialysate bags and tubing (Nassberger et al. 1987). Diethyl phthalate (DEP), di-n-butyl phthalate (DBP), and butyl benzyl phthalate (BBzP) are principally used in personal care products, such as body lotions, gels, shampoos, and deodorants (ATSDR 1995, 2001). They also have U.S. Food and Drug Administration approval for uses in food packaging and processing materials that are in contact with food, and as a result they have been found in food (Castle et al. 1990; Page and Lacroix 1995). DBP, BBzP, and di-(2-ethylhexyl) phthalate are also used in residential building materials such as floorings, paints, carpet backings, adhesives, and wallpaper, and in PVC products such as auto parts and interiors (ATSDR 2001, 2002). Although the volatility of phthalates is relatively low, studies have shown that phthalates are present in residential indoor air (Jaakkola et al. 1999; Rudel et al. 2003).
The ubiquitous use of phthalates results in human exposure via dietary ingestion of foods (such as milk, butter, and meats), dermal absorption of low-molecular-weight phthalates (e.g., DEP, DBP, BBzP), inhalation of the more volatile phthalates, and parenteral exposure from medical devices containing phthalates (ATSDR 1995, 2001, 2002). Recently, researchers at the Centers for Disease Control and Prevention (CDC) developed analytical methods for the quantitative detection of phthalate metabolites in urine (Blount et al. 2000). Phthalate monoester metabolites were measured because of potential sample contamination from the parent diesters and because the metabolites are considered the biologically active toxicant (Li et al. 1998; Albro 1982; Sjoberg et al. 1986). The use of phthalate metabolites in urine as biomarkers of exposure now allows researchers to accurately measure human exposure to phthalates. These biomarkers represent an integrative measure of phthalate exposure from multiple sources and pathways. Recently, four phthalate metabolites [monoethyl phthalate (MEP), mono-(2-ethylhexyl) phthalate (MEHP), monobutyl phthalate (MBP), and monobenzyl phthalate (MBzP)] were found in the urine samples of > 75% of approximately 2,550 participants of the National Health and Nutrition Examination Survey (NHANES) 1999-2000 (CDC 2003; Silva et al. 2004).
Because humans and other mammals rapidly metabolize phthalate diesters to their respective monoesters, which in turn may be further metabolized, phthalates do not bioaccumulate (ATSDR 1995, 2001, 2002; Peck and Albro 1982). Because biologic half-lives of phthalates are on the order of hours, urinary metabolite levels reflect exposures that most likely occurred ≤ 1 day preceding the collection of the urine specimen. However, because most health end points of interest are likely associated with exposures over time intervals longer than a few days, information on the temporal variability of urinary levels of phthalate metabolites is needed to optimize the design of exposure assessment in epidemiologic studies. Currently there are limited published data on the temporal variability of urinary phthalate monoester metabolite concentrations. A recent study documented good reproducibility of urinary phthalate monoester levels in two first-morning urine specimens collected on 2 consecutive days; day-to-day intraclass correlation coefficients (ICCs) ranged from 0.5 to 0.8 (Hoppin et al. 2002). Time intervals beyond a couple of days were not explored.
Variability in an individual's exposure to phthalates can result from changes in the use of personal care products, diet, or daily activity patterns, such as time spent in specific microenvironments (i.e., residential, workplace, or other) with ambient phthalate levels. Therefore, characterizing an individual's phthalate exposure is complex, and exposure may vary considerably over short time periods, such as days. Although phthalate biomarkers in urine are available to accurately measure a person's exposure at a single point in time, determining exposure over time intervals of weeks or months will require multiple measurements of phthalate metabolites. Therefore, the present study was designed to explore the temporal variability in urinary phthalate metabolite levels. Our design allowed us to determine between- and within-subject variability in urinary phthalate metabolite levels, as well as apportion the within-person variability into monthly and daily variances. We also explored the sensitivity of a single urine measurement to predict an individual's 3-month average exposure. This information can be used for designing exposure assessment strategies for epidemiologic studies and to adjust for measurement error in phthalate exposure.

Materials and Methods
Eleven men from our ongoing study of the relationship between environmental agents and male reproductive health agreed to participate in the phthalate variability study. Participant recruitment into the environmental agents and male reproductive health study has been previously described (Hauser et al. 2003). Briefly, men who were the partner in couples seeking fertility evaluation for inability to conceive were recruited to participate. The study site was the Massachusetts General Hospital (MGH) Andrology Laboratory, so most men resided in the New England area. At the clinic visit, each man was asked to produce a single semen sample and to collect a single spot urine sample.
For each of the 11 men in the phthalate temporal variability study, up to nine additional spot urine samples were collected during three cycles over a 92-day period. Ten of these 11 men each contributed a total of 10 urine samples (nine for the variability study and one for the male reproductive study), whereas one of the men provided a total of seven samples (including six for the variability study). Nested within each of the three cycles were three urine samples, collected on the first 3 consecutive days of each cycle. The first cycle began upon enrollment into the phthalate temporal variability study, and urine samples were collected on days 0, 1, and 2. Cycles 2 and 3 began 30 days and 90 days after cycle 1, respectively. Therefore, the nine urine samples were collected on days 0, 1, and 2 (cycle 1); days 30, 31, and 32 (cycle 2); and days 90, 91, and 92 (cycle 3).
All the urine samples were collected in a sterile specimen cup. The urine sample on day 0 was collected at the MGH Andrology Laboratory. All other samples were collected at the subject's home and frozen before overnight shipment to the Harvard School of Public Health (HSPH) on blue ice. All urine samples were then shipped frozen on dry ice from HSPH to CDC. Eight phthalate monoesters [MBzP, MBP, MEP, MEHP, monomethyl phthalate (MMP), mono-n-octyl phthalate (MOP), mono-3-methyl-5-dimethylhexyl phthalate (MINP), and monocyclohexyl phthalate (MCHP)] were measured in each spot urine sample using an analytical approach developed at the CDC. Briefly, the determination of phthalate metabolites in urine involved enzymatic deconjugation of the glucuronidated metabolites, solid-phase extraction, separation with high-performance liquid chromatography, and detection by tandem mass spectrometry. Detection limits were in the low micrograms per liter range. Reagent blanks and 13C4-labeled internal standards were used along with conjugated internal standards to increase the precision of the measurements. One method blank, two quality control samples (human urine spiked with phthalates), and two sets of standards were analyzed along with every 21 unknown urine samples. Analysts at the CDC were blind to all information concerning subjects.
Several methods adjust for urine volume (Boeniger et al. 1993; Teass et al. 1998). Although creatinine is a frequently used form of adjustment, if a compound is excreted primarily by tubular secretion it is not appropriate to adjust for creatinine level (Teass et al. 1998). Although the methods of excretion of the phthalate monoesters measured in this study are unknown, terephthalic acid was found to be actively secreted by renal tubules and actively reabsorbed by the kidney (Tremaine and Quebbemann 1985). Furthermore, because organic compounds that are glucuronidated in the liver, like the phthalates, are eliminated by active tubular secretion (Boeniger et al. 1993), creatinine adjustment may not be appropriate for phthalates. Additionally, creatinine levels may be confounded by muscularity, physical activity, urine flow, time of day, diet, and disease states (Boeniger et al. 1993; Teass et al. 1998). For these reasons, specific gravity, rather than creatinine, was used to normalize phthalate levels.
Urinary phthalate levels were normalized for dilution by specific gravity adjustment using the formula $P_c = P \times [(1.024 - 1)/(SG - 1)]$, where $P_c$ is the specific-gravity-corrected phthalate concentration (micrograms per liter), $P$ is the observed phthalate concentration (micrograms per liter), and $SG$ is the specific gravity of the urine sample (Boeniger et al. 1993; Teass et al. 1998). Specific gravity was measured using a handheld refractometer (National Instrument Company, Inc., Baltimore, MD), which was calibrated with deionized water before each measurement.
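The specific gravity correction can be illustrated with a minimal Python sketch. The function name, argument names, and the example sample are hypothetical and are not taken from the study; only the formula and the reference value of 1.024 come from the text above.

```python
def sg_adjust(conc_ug_per_l, specific_gravity, reference_sg=1.024):
    """Return Pc = P * (reference_sg - 1) / (SG - 1), in micrograms per liter."""
    if specific_gravity <= 1.0:
        # The correction is undefined when SG is at or below that of water.
        raise ValueError("specific gravity must be greater than 1.0")
    return conc_ug_per_l * (reference_sg - 1.0) / (specific_gravity - 1.0)


# Hypothetical dilute sample: 50 ug/L observed at SG = 1.012.
# The corrected value is roughly twice the observed value.
print(sg_adjust(50.0, 1.012))
```

A sample whose specific gravity equals the reference value is left unchanged, whereas more dilute samples are scaled upward and more concentrated samples are scaled downward.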
Statistical analyses. We performed the statistical analyses using the Statistical Analysis Software (SAS), version 8.1 (SAS Institute, Cary, NC). Both unadjusted and specific-gravity-adjusted values were used. We constructed graphs to compare metabolite levels within and between subjects, and calculated Spearman correlation coefficients to investigate correlations between samples collected at different time points.
To assess between- and within-person variability of metabolite levels, we calculated ICCs for each metabolite based on output from a random effects model fit using PROC MIXED (Rosner 1999). ICC, defined as the ratio of between-person variance to total variance, is a measure of reliability of repeated measures over time. ICC ranges from 0 to 1, with values near 1 indicating high reliability and values near 0 indicating poor reliability. ICC can also be used in an internal validity study to account for measurement error in epidemiologic effect estimates (Carroll et al. 1995; Rosner et al. 1992).
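As an illustration of the ICC definition, the sketch below estimates the between- and within-person variances with a one-way analysis-of-variance (method-of-moments) estimator for balanced data. This is an assumption made for illustration: it is not the PROC MIXED fit used in the study and may give somewhat different estimates. The function name and input layout are hypothetical.

```python
from statistics import mean


def icc_oneway(groups):
    """ICC = between-person variance / total variance.

    groups: one list of repeated measurements per subject, all of equal length.
    """
    a, k = len(groups), len(groups[0])
    if a < 2 or k < 2 or any(len(g) != k for g in groups):
        raise ValueError("need >= 2 subjects with the same number (>= 2) of samples")
    subject_means = [mean(g) for g in groups]
    grand_mean = mean(subject_means)  # balanced design: mean of subject means
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (a - 1)
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, subject_means) for x in g
    ) / (a * (k - 1))
    # Method-of-moments estimate; negative estimates are truncated at zero.
    var_between = max(0.0, (ms_between - ms_within) / k)
    total = var_between + ms_within
    return var_between / total if total > 0 else float("nan")
```

When every subject's repeated samples are identical the function returns 1, and when all subjects share the same mean it returns 0, matching the interpretation of values near 1 and near 0 given above.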
To apportion variances among nested components, we fit a hierarchical model (using PROC MIXED). For a more robust estimate of between-subject variability, we used the results of the single urine samples collected from all 369 men enrolled so far in the ongoing environmental agents and male reproductive health study in the variance apportionment analysis. Because the 11 men in this variability study were also enrolled in the male reproductive health study, their single urine sample collected for the reproductive study contributed additional information on variability. Within-subject variance was further apportioned into cycle-to-cycle variance and day-to-day variance (Box et al. 1978). Day-to-day variance was defined as the variance in phthalate metabolite levels between samples 1 or 2 days apart, regardless of whether they were collected in cycle 1, 2, or 3. Cycle-to-cycle variance was defined as the variance between the three cycles minus the day-to-day variances within the cycles. Because day is nested within cycle, the cycle-to-cycle variance uses information from the three nested daily samples in cycles 1, 2, and 3.
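The nested apportionment described above can be sketched with the classical balanced nested analysis of variance, in which days are nested within cycles and cycles within subjects. This is a simplified stand-in, assuming a balanced design and method-of-moments estimation; it does not reproduce the PROC MIXED hierarchical model or the inclusion of the 369 single samples. All names are hypothetical.

```python
from statistics import mean


def nested_variance_components(data):
    """Variance components for a balanced subject > cycle > day design.

    data[i][j][k] is the level for subject i, cycle j, day k.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    if a < 2 or b < 2 or n < 2:
        raise ValueError("need at least 2 subjects, 2 cycles, and 2 days per cycle")
    if any(len(s) != b or any(len(c) != n for c in s) for s in data):
        raise ValueError("design must be balanced")

    cycle_means = [[mean(c) for c in s] for s in data]
    subject_means = [mean(cm) for cm in cycle_means]
    grand_mean = mean(subject_means)

    ms_subject = b * n * sum((m - grand_mean) ** 2 for m in subject_means) / (a - 1)
    ms_cycle = n * sum(
        (cycle_means[i][j] - subject_means[i]) ** 2
        for i in range(a) for j in range(b)
    ) / (a * (b - 1))
    ms_day = sum(
        (x - cycle_means[i][j]) ** 2
        for i in range(a) for j in range(b) for x in data[i][j]
    ) / (a * b * (n - 1))

    # Each component is the mean square minus the one nested below it,
    # divided by the number of observations it averages over.
    return {
        "between_subject": max(0.0, (ms_subject - ms_cycle) / (b * n)),
        "cycle_to_cycle": max(0.0, (ms_cycle - ms_day) / n),
        "day_to_day": ms_day,
    }
```

Dividing each component by the sum of all three gives the percentage apportionment. The cycle-to-cycle term is what remains of the between-cycle mean square after the day-to-day mean square is subtracted, which mirrors the definition in the text.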
Although ICC is an indicator of reliability for continuous measures, it does not measure the extent of exposure misclassification that may occur if exposure is categorized into tertiles of low, medium, and high exposure. To explore categorical exposure misclassification, we performed sensitivity and specificity analyses and surrogate category analyses. In both analyses, tertiles were created using the mean of the nine repeat urine samples for each of the 10 subjects in the variability study. The subject with only six repeat urine samples was not included in these analyses because he did not have complete data. Tertiles based on the 369 single urine samples from subjects in the male reproductive health study produced an unbalanced and unstable design because some of these tertiles contained zero subjects from the variability study. This led to nonidentifiable results for that tertile. Therefore, analyses using tertiles based on the 369 single urine samples are not presented.
Article | Variability of urinary phthalates Environmental Health Perspectives • VOLUME 112 | NUMBER 17 | December 2004
In the surrogate category analysis, we calculated actual values for surrogate categories to show the quantitative differences in phthalate metabolite levels that correspond to the relative categories defined by a single urine sample from the 10 variability subjects (Willett 1998). We grouped variability subjects first into tertiles by treating each of the nine repeat urine samples as a single spot urine sample (i.e., the surrogate method). For instance, for each of the nine repeat urine samples, the 10 subjects were categorized into high, medium, or low tertiles. The "true value" for these same subjects based on their 3-month average phthalate metabolite levels (using all the nine replicate samples) was then assigned to the tertiles defined by the single (surrogate) sample. Each of the nine samples was used as the surrogate sample in separate calculations to check for consistency. Each subject's 10th sample from the male reproductive health study was not used in this analysis because this sample could have been collected up to 12 months earlier.
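A minimal sketch of the surrogate category calculation follows. The rank-based tertile rule (groups as equal in size as possible, ties broken by input order) is an assumption, because the text does not state how 10 subjects were divided into three groups. The function names and the example values are hypothetical.

```python
from statistics import geometric_mean


def tertile_labels(values):
    """Assign rank-based tertiles: 0 = low, 1 = medium, 2 = high (assumed rule)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [0] * n
    for rank, i in enumerate(order):
        labels[i] = min(2, rank * 3 // n)
    return labels


def surrogate_category_means(single_sample, true_average):
    """Geometric mean of the "true" averages within each surrogate tertile."""
    labels = tertile_labels(single_sample)
    return [
        geometric_mean([t for t, lab in zip(true_average, labels) if lab == g])
        for g in range(3)
    ]


# Hypothetical values for six subjects (not study data).
day0 = [4.0, 9.0, 15.0, 22.0, 40.0, 75.0]
avg3mo = [6.0, 8.0, 20.0, 18.0, 35.0, 60.0]
low, med, high = surrogate_category_means(day0, avg3mo)
print(low < med < high)  # True when the surrogate yields a monotonic increase
```

Repeating the calculation with each of the nine samples in turn as the surrogate, as described above, shows how often a single sample reproduces a monotonic increase across tertiles.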
We also evaluated sensitivity and specificity of a single urine sample as a predictor of high and low tertiles of 3-month average phthalate metabolite levels by comparing the distribution of predicted and observed levels for agreement. For observed or "true" exposure, we calculated 3-month average metabolite levels (using all the nine replicate samples) for each subject and divided the 10 subjects into tertiles. The distribution of 96 individual samples (10 subjects providing nine replicate samples, one subject providing six) was then also divided into tertiles, with each sample representing a predicted value based on a single spot urine sample. For each sample time (days 0-92), agreement between predicted and observed "true" tertile categorization was scored across all subjects, resulting in nine separate contingency tables. All nine tables were then combined into a single table, where overall sensitivity and specificity were calculated (Peck et al. 2003). The same method was used to assess the sensitivity and specificity if two samples, and then additionally if three samples were taken for each subject at least one cycle apart within a 92-day time period. When evaluating the sensitivity of two and three samples, all possible combinations of sample pairings from the nine repeated samples, excluding samples from the same cycle, were used in the analysis. The goal was to simulate and compare the ability of exposure assessments that involve one, two, or three urine samples to predict a subject's "true" 3-month average exposure tertile classification.
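The pooled sensitivity and specificity calculation can be sketched as follows, assuming the nine contingency tables have already been flattened into two parallel lists of flags, one entry per subject per sample time. The function name and the example flags are hypothetical.

```python
def sensitivity_specificity(true_high, predicted_high):
    """Return (sensitivity, specificity) for classification into the high tertile.

    true_high[i] is True if the subject's 3-month average is in the high tertile;
    predicted_high[i] is True if the single sample falls in the high tertile.
    """
    if len(true_high) != len(predicted_high):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(true_high, predicted_high))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one truly high and one truly non-high entry")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical pooled flags (not study data).
truth = [True, True, True, False, False, False, False, False, False]
guess = [True, True, False, False, False, False, False, False, True]
print(sensitivity_specificity(truth, guess))
```

The same function applies to the two- and three-sample scenarios if the predicted flag is derived from the mean of the selected samples instead of a single sample.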

Results
We measured eight phthalate metabolites in urine. However, because > 75% of the samples had nondetectable levels of MCHP, MOP, and MINP, the results for these three metabolites were not informative and were not included in the analyses. MEP was detected in 100% of samples, whereas MBP, MBzP, MEHP, and MMP were detected in > 90% of samples. Unadjusted and specific-gravity-adjusted median concentrations of MEP, MBP, MBzP, MEHP, and MMP from the 369 men who provided a single urine sample for the environmental agents and male reproductive health study are presented in Table 1. Of these 369 men, 11 also participated in the variability study. Ten of the 11 men provided nine urine samples collected over 92 days, whereas 1 man collected six urine samples over 32 days only. In Figures 1-5, the unadjusted (Figures 1A-5A) and specific-gravity-adjusted (Figures 1B-5B) urinary phthalate metabolite concentrations are plotted by day for each subject (the one subject with only six urine samples was not plotted). Even after dilution adjustment, there was still substantial variability in phthalate metabolite concentrations over time. MEHP concentrations showed large within-subject variability, whereas MEP showed less within-subject variability.
Article | Hauser et al. 1736 VOLUME 112 | NUMBER 17 | December 2004 • Environmental Health Perspectives
Spearman correlation coefficients confirmed that urine samples taken closer together in time (days apart) were more strongly correlated than those collected months apart. The between- and within-subject variance apportionments for specific-gravity-adjusted phthalate levels are shown in Table 2. Within-subject variance was then apportioned into cycle-to-cycle and day-to-day variance. As expected, the standard errors of the between-subject variance component estimates remained the same or were reduced when the single samples from the 358 male participants in the environmental agents and male reproductive health study were included in the analysis. The single urine samples contributed more information on between-subject than on within-subject variability. The between-subject variance estimates increased for all phthalate monoesters except for MMP and MEP. Between-subject variances ranged from 27.4% of total variance for MMP to 71.3% for MBP; therefore, ICC ranged from 0.27 for MMP to 0.71 for MBP.
Of the total subject variance among the 369 subjects, the day-to-day variance component ranged from 27.2% (MBP) to 58.1% (MMP), whereas the cycle-to-cycle variances ranged from 1.5% (MBP) to 16.3% (MEP). Cycle-to-cycle variance is the within-subject variance remaining after day-to-day variance is calculated using the replicate samples nested within each of the three cycles. These results suggest that, after accounting for day-to-day variance, there is little additional cycle-to-cycle variance. Therefore, if we were to collect only two urine samples a day apart, we would account for 83.7-98.5%, depending on the phthalate monoester, of the total subject variance, which is composed of between- and within-subject variance. Likewise, if we collected two urine samples 1 month apart we would account for both cycle-to-cycle and day-to-day variability, or 100% of the within-subject variance.
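The 83.7-98.5% range follows directly from the cycle-to-cycle percentages reported above: two samples taken a day apart capture everything except the cycle-to-cycle component. The short check below uses only the two extreme values given in the text; the function and dictionary names are hypothetical.

```python
def share_captured_by_consecutive_days(cycle_to_cycle_pct):
    """Percent of total variance captured when cycle-to-cycle variance is missed."""
    return 100.0 - cycle_to_cycle_pct


# Cycle-to-cycle percentages reported in the text for the two extremes.
reported_cycle_pct = {"MBP": 1.5, "MEP": 16.3}
for metabolite, pct in reported_cycle_pct.items():
    print(metabolite, round(share_captured_by_consecutive_days(pct), 1))
```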
To determine the predictive ability of a single urine sample to correctly categorize a subject's exposure into high, medium, or low tertiles, we calculated actual values (mean and geometric mean values) for surrogate categories. The results are presented in Table 3 (only the 10 subjects who provided nine urine samples each were used in this analysis). Overall, the results suggest that a single spot urine sample was predictive of the 3-month average exposure because there were monotonically increasing geometric means across tertiles. For instance, for MBP, when a single sample on day 0 was used to group subjects into low-, medium-, and high-exposure groups, the "true" geometric mean MBP levels increased from 12.7 µg/L in the group designated as low exposure, to 22.8 µg/L in the medium-exposure group, to 28.3 µg/L in the high-exposure group. Although single spot urine samples were generally predictive, there were differences in the predictive ability of a single urine sample for different phthalate monoesters. A single urine sample was least predictive for MEHP, where only five of the nine spot urine samples produced a monotonically increasing geometric mean. In contrast, eight of the nine spot urine samples produced monotonically increasing geometric means for MBzP, MBP, MEP, and MMP. As expected, MEP, with the widest range in exposure levels, showed the largest difference in geometric means between low-, medium-, and high-exposure categories. For a more quantitative assessment of how well a single urine sample predicts a subject's exposure category based on 3-month average metabolite levels, we conducted sensitivity and specificity analyses, using only the results from the 10 subjects who provided nine urine samples each (Table 4).
The proportion of men who truly had the highest 3-month average exposure (top 33%) that would be identified as such using single urine samples anytime throughout that 3-month period (i.e., sensitivity) ranged from 0.56 for MEHP to 0.74 for MMP. The proportion of men with truly comparatively low exposure (tertiles 2 and 3) that were classified correctly (i.e., specificity) ranged from 0.83 for MEHP to 0.90 for MMP. Sensitivity analyses for one, two, or three urine samples are presented in Table 4. When two samples were collected 1-3 months apart, there were small increases in sensitivity and specificity, especially evident for MEHP. When three urine samples were collected, each 1-3 months apart, sensitivity moderately increased for MEHP and MMP, with slight increases for the other monoesters. In contrast, when three urine samples were collected on 3 consecutive days, sensitivity for MEHP, MBzP, MBP, and MMP did not increase. However, sensitivity did increase for MEP.
We also performed all analyses described above using unadjusted phthalate levels (data not shown). Overall, variance apportionment and sensitivity analyses were very similar to the specific-gravity-adjusted results shown above. The surrogate exposure category method differed slightly with less consistent dose-response categories found for the unadjusted phthalate metabolite levels.

Discussion
Although the present study found substantial within-subject variability in urinary phthalate metabolite levels, the sensitivity of a single spot urine sample to predict 3-month average phthalate exposure was moderate to high. As expected, because phthalates are rapidly metabolized and do not bioaccumulate, the collection of additional urine samples 1-3 months apart improves the prediction of a subject's 3-month average exposure. The urinary metabolite levels found in the present study were similar to reference ranges measured in U.S. males for NHANES 1999-2000 (CDC 2003; Silva et al. 2004).
Table notes: Only samples from the 10 subjects who provided nine urine samples each were used in this analysis. a Values in parentheses are sensitivity and specificity using geometric mean ranks instead of arithmetic mean ranks for observed tertile classification; for the other four phthalates these values were identical.
The predictive ability of a single urine sample to determine a subject's 3-month average exposure varied across phthalates. For MEHP, a single urine sample was least predictive of the tertile categorization and had the lowest sensitivity (Table 4). This implies that in statistical analyses in which only a single urine sample is available to categorize a subject's 3-month exposure to MEHP, there is likely exposure measure misclassification resulting in bias toward the null hypothesis for exposure-response relationships. MEHP has been associated with developmental reproductive toxicity in laboratory studies (Oishi 1986; Park et al. 2002; Sjoberg et al. 1986). However, in our previously published study, we did not find an association between MEHP and semen parameters among adult men (Duty et al. 2003). In contrast, we did find associations of MBP and MBzP with semen parameters. Although our study differed from the animal studies because we measured adult and not gestational exposure, our findings suggest that a single urine sample, used to categorize a subject's exposure, did not adequately measure 3-month average exposure to MEHP. This may partially explain our inability to detect associations between semen parameters and MEHP. To improve upon our exposure classification of MEHP, we are currently collecting two urine samples 1 month apart from all subjects. This will allow us to use measurement error correction methods to adjust for exposure misclassification of phthalate exposure (Carroll et al. 1995).
It is possible that the calculated sensitivities and specificities are slightly overestimated because we included predicted values in the calculation of the observed values. Therefore, the errors of predicted and observed values are not totally independent, which can lead to an overestimation of sensitivity and specificity (Willett 1998). Similarly, a portion of the increased sensitivity and specificity observed when taking two or three samples per subject instead of a single sample may be caused by the increased dependence between the errors of the predicted and observed values.
Apportioning the sources of variability in urinary phthalate metabolite levels can be used to design more valid and efficient exposure assessments. As expected, the urinary phthalate metabolite concentrations in samples collected close together in time, separated by 1-2 days, were more correlated than those in samples collected farther apart in time, separated by 1-3 months. Two samples collected a month or more apart include variability in urinary phthalate metabolite levels contributed to both by day-to-day changes in exposure and by monthly trends in phthalate exposure, such as seasonal changes in diet, personal product use, or activity patterns, as well as other environmental or biologic factors.
For each health end point of interest in an epidemiologic study, the relevant time window over which exposure is measured needs to be defined. For acute responses after acute exposures, a single urine sample may be adequate to define phthalate exposure. However, we are generally interested in health end points that have exposure windows of months, if not years. To this end, accurate exposure assessment depends on a strategy whereby we accurately measure exposure over these time windows. The simplest approach is to collect multiple urine samples from all subjects over the time interval of interest. However, it is not always feasible to collect multiple urine samples because of both cost constraints and limitations imposed by the subject's commitments to multiple collections. Based on the results of this phthalate variability study, for male reproductive health end points, we recommend collecting at least two urine samples 1-3 months apart. This will provide an estimate of the within-person variability taking into account both month-to-month and day-to-day variance. Nevertheless, if the study design only permitted collecting two samples 1-2 days apart, this, too, would provide a reasonable estimate of the day-to-day component of within-subject variance. After collection of the replicate urine samples in either sampling scheme, measurement error models could then be used to adjust for measurement error in exposure (Carroll et al. 1995). A discussion of this is beyond the scope of this report.
In conclusion, although a single urine sample was moderately predictive of 3-month exposure to phthalates, the predictive ability varied across phthalate monoesters. A single urine sample was more predictive for MEP and less predictive for MEHP. The single sample performed well in classifying a subject's exposure into tertiles, and the amount of nondifferential random exposure misclassification is likely to be moderate or small for most phthalate metabolites of interest. The variance apportionment analysis suggests that two urine samples, the second collected 1-3 months after the first, are the minimum needed to account for the within-subject day-to-day and cycle-to-cycle variability in urinary phthalate metabolite levels. Because the degree of between- and within-subject variance and thus the predictive ability of a single urine sample differ among phthalate metabolites, the most efficient exposure assessment strategy for a particular study depends on the phthalates of interest. The results from the present study will be used in our ongoing environmental agents and male reproductive health study to correct for measurement error in the effect estimates of exposure-response relationships between phthalates and sperm function. The findings from this variability study may also be pertinent to other end points with relevant exposure periods of several months. However, if the study population is not adult men of reproductive age, such as studies involving children or pregnant women, we recommend that a variability study be conducted to determine population-specific exposure assessment strategies.