Quantifying the Impact of Co-Housing on Murine Aging Studies

Analyses of preclinical lifespan studies often assume that outcome data from co-housed animals are independent. In practice, treatments such as controlled feeding or putative life-extending compounds are applied to whole housing units, so outcomes are potentially correlated within housing units. We consider intra-class (here, intra-cage) correlation in three published and two unpublished lifespan studies of aged mice encompassing more than 20 thousand observations. We show that the independence assumption underlying common analytic techniques does not hold in these data, particularly for traits associated with frailty. We describe and demonstrate various analytical tools available to accommodate this study design and highlight a limitation of standard variance components models (i.e., linear mixed models), which are the usual statistical tool for handling correlated errors. Through simulations, we examine the statistical biases resulting from intra-cage correlations of magnitudes similar to those observed in these case studies and discuss implications for power and reproducibility.
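For orientation, the intra-cage correlation can be written in standard variance-components notation. The symbols below ($y_{ij}$, $c_i$, $e_{ij}$, $\sigma_c^2$, $\sigma_e^2$, $m$) are our own illustrative notation and are not taken from the paper. The sketch assumes a random-intercept model with equal cage sizes.

```latex
% Outcome of animal j in cage i, with treatment x_i applied to the whole cage
y_{ij} = \mu + \beta x_i + c_i + e_{ij}, \qquad
c_i \sim N(0, \sigma_c^2), \quad e_{ij} \sim N(0, \sigma_e^2)

% Intra-cage (intra-class) correlation between two animals in the same cage
\rho = \operatorname{Corr}(y_{ij}, y_{ij'}) = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_e^2}

% Variance inflation of a treatment-group mean with m animals per cage
\mathrm{DEFF} = 1 + (m - 1)\,\rho
```

Because a variance component satisfies $\sigma_c^2 \ge 0$, this parameterization can only represent $\rho \ge 0$, whereas an exchangeable correlation structure admits any $-1/(m-1) < \rho < 1$. This constraint is well known for random-intercept models and may be relevant to the limitation mentioned above, although the abstract does not spell it out.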


Extended Data
Supplementary Figure 1: Distribution of residual lifespan by housing unit in JAXDO after accounting for all other study design effects. Each boxplot represents one housing unit, displaying the median, interquartile range (25th to 75th percentile; IQR), and 1.5× IQR. Boxplots of lifespan by housing ID were sorted by median residual lifespan (highlighted in red). Taller boxplots within a study indicate cages with larger intra-cage variability, and a non-zero slope of the medians indicates intra-cage correlation within the study.

Supplementary Figure 4: P-value histograms from 1000 simulations of randomized data with null treatment effect and negative ICC. The distribution of the p-value is a measure of how likely it is to observe a certain p-value or lower under different scenarios. Under the null hypothesis, the p-value has a uniform distribution over [0, 1], meaning that any p-value is equally likely to occur. Under the alternative hypothesis, the p-value has a skewed distribution that depends on the true value of the parameter being tested and the sample size. The more skewed the distribution is towards 1, the lower the power of the test, because large p-values are more likely to occur when the null hypothesis is false.

Supplementary Figure 2: Permutation tests for non-independence by cage for lifespan outcome across included cohorts. For each cohort, a set of 1000 permuted datasets was generated by randomly reassigning housing unit identifiers. GEE models were fit to each permuted dataset and to the observed data to estimate ICC for the null and observed distributions. Statistical significance was assessed by comparing the observed ICC with the distribution of permuted ICCs.
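The permutation procedure described in this caption can be sketched as follows. This is a minimal illustration and not the authors' code. It swaps the GEE-based ICC estimate for the simpler one-way ANOVA estimator, and the function names `anova_icc` and `permutation_test_icc` are hypothetical. It also assumes a one-sided test for positive correlation, which the caption does not specify.

```python
import numpy as np


def anova_icc(y, cage):
    """One-way ANOVA estimator of the intra-cage correlation.

    Assumption: this stands in for the GEE-based estimate in the caption.
    Unlike a variance-components estimate, it can be negative.
    """
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(np.asarray(cage), return_inverse=True)
    idx = idx.ravel()
    counts = np.bincount(idx)                     # animals per cage
    k, n = counts.size, y.size
    means = np.bincount(idx, weights=y) / counts  # per-cage means
    msb = np.sum(counts * (means - y.mean()) ** 2) / (k - 1)  # between-cage MS
    msw = np.sum((y - means[idx]) ** 2) / (n - k)             # within-cage MS
    n0 = (n - np.sum(counts ** 2) / n) / (k - 1)  # effective cage size
    return (msb - msw) / (msb + (n0 - 1) * msw)


def permutation_test_icc(y, cage, n_perm=1000, seed=0):
    """Compare the observed ICC with ICCs from shuffled cage labels."""
    rng = np.random.default_rng(seed)
    cage = np.asarray(cage)
    observed = anova_icc(y, cage)
    # Shuffling labels breaks any cage structure but keeps the cage sizes.
    null = np.array([anova_icc(y, rng.permutation(cage)) for _ in range(n_perm)])
    # One-sided p-value (assumption), with the usual +1 correction.
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, null, p_value


if __name__ == "__main__":
    # Synthetic example: 40 cages of 5 animals with a shared cage effect.
    rng = np.random.default_rng(1)
    cage = np.repeat(np.arange(40), 5)
    y = rng.normal(size=40)[cage] + rng.normal(size=200)
    observed, null, p_value = permutation_test_icc(y, cage)
    print(f"observed ICC = {observed:.3f}, permutation p = {p_value:.4f}")
```

In real data the outcome `y` would be residual lifespan after adjusting for other design effects, so that the shuffled labels are exchangeable under the null of no cage effect.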
Notes: COX = Cox Proportional Hazards Model; COXME = mixed effects COX; GEE = generalized estimating equations; LM = linear model; LMM = linear mixed model; n.per = number per cluster; negative ICC = -0.1.

P-value histograms from 1000 simulations of randomized data with null treatment effect and positive ICC. The distribution of the p-value is a measure of how likely it is to observe a certain p-value or lower under different scenarios. Under the null hypothesis, the p-value has a uniform distribution over [0, 1], meaning that any p-value is equally likely to occur. Under the alternative hypothesis, the p-value has a skewed distribution that depends on the true value of the parameter being tested and the sample size. The more skewed the distribution is towards 0, the higher the power of the test, because small p-values are more likely to occur when the null hypothesis is false.

Notes: COX = Cox Proportional Hazards Model; COXME = mixed effects COX; GEE = generalized estimating equations; LM = linear model; LMM = linear mixed model; n.per = number per cluster; positive ICC = +0.1.
We applied conventional tests for censored data (COX, COXME) and calculated empirical power. The figure shows how power varies with ICC, model type, and sample size. When data are generated from a null fixed effects model, any p < 0.05 indicates a false positive error. The results showed that tests assuming independence (COX) overestimated power in the presence of positive ICC values. Sample sizes computed from COX models that ignore ICC would be too small when ICC is positive, leading to wasted resources and low replicability.
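The effect of ignoring intra-cage correlation on null p-values can be illustrated with a small simulation. This sketch is not the paper's simulation. It uses uncensored Gaussian outcomes and two-sample t-tests in place of COX and COXME, and it assumes an exchangeable within-cage correlation. The function name `simulate_null_pvalues` and its default settings (20 cages, 5 animals per cage) are hypothetical.

```python
import numpy as np
from scipy import stats


def simulate_null_pvalues(icc, n_cages=20, n_per=5, n_sims=1000, seed=0):
    """P-values under a null treatment effect with intra-cage correlation.

    Returns two arrays: p-values from a t-test that treats every animal as
    independent, and p-values from a t-test on cage means.
    Requires -1/(n_per - 1) < icc < 1 so the covariance is positive definite.
    """
    rng = np.random.default_rng(seed)
    # Exchangeable covariance: variance 1, correlation icc within a cage.
    cov = np.full((n_per, n_per), icc) + (1.0 - icc) * np.eye(n_per)
    chol = np.linalg.cholesky(cov)
    # Treatment is assigned to whole cages, half treated and half control.
    treated = np.arange(n_cages) < n_cages // 2
    p_naive = np.empty(n_sims)
    p_cage = np.empty(n_sims)
    for s in range(n_sims):
        # Each row is one cage with the target within-cage covariance.
        y = rng.standard_normal((n_cages, n_per)) @ chol.T
        # Analysis ignoring cages: pool all animals in each arm.
        p_naive[s] = stats.ttest_ind(y[treated].ravel(), y[~treated].ravel()).pvalue
        # Analysis respecting the design: one summary value per cage.
        p_cage[s] = stats.ttest_ind(y[treated].mean(axis=1), y[~treated].mean(axis=1)).pvalue
    return p_naive, p_cage


if __name__ == "__main__":
    for icc in (-0.1, 0.0, 0.1):
        p_naive, p_cage = simulate_null_pvalues(icc)
        print(
            f"ICC={icc:+.1f}  "
            f"false positive rate, animal-level: {(p_naive < 0.05).mean():.3f}  "
            f"cage-level: {(p_cage < 0.05).mean():.3f}"
        )
```

Under these assumptions the animal-level test should reject too often when ICC is positive and too rarely when ICC is negative, while the cage-level test stays close to the nominal 5% rate. Plotting histograms of `p_naive` and `p_cage` gives a rough analogue of the p-value histograms described in the captions above.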

Supplementary Table 1: Intra-cage correlations by physiologic trait domain in DRiDO. ICC was estimated via LMM methods applied to longitudinally collected phenotypic trajectories in the DRiDO study. LMM-estimated ICC ± 95% confidence interval is shown. All quantitative traits excluding body weights were corrected for batch effects as described in [22]. See DRiDO all traits iccs.csv

Supplementary Table 2: Type III tests for main effect in each study with and without incorporating a random effects term for housing identifier to capture intra-cage correlation.
After incorporating various study design features via pre-analysis batch adjustment and model specification, type III tests for the main effect in each study usually showed no practical difference between models with and without a random effects term for housing identifier. Notes: 1 = singularity; HID = housing identifier; rx = treatment group (control or not control). Estimates may vary from main publications because censored observations were removed and estimates were derived from linear models.