BMC Med Res Methodol. 2017 Jul 20;17(1):110. doi: 10.1186/s12874-017-0369-6.

Binomial outcomes in dataset with some clusters of size two: can the dependence of twins be accounted for? A simulation study comparing the reliability of statistical methods based on a dataset of preterm infants.

Author information

1. Epidemiology and International Public Health, School of Public Health, Bielefeld University, Bielefeld, Germany. odile.sauzet@uni-bielefeld.de.
2. Division of Health and Social Care Research, King's College London and NIHR Biomedical Research Centre at Guy's and St Thomas' NHS Foundation Trust and King's College London, London, UK.

Abstract

BACKGROUND:

The analysis of perinatal outcomes often involves datasets that include some multiple births. Such datasets consist mostly of independent observations, plus a limited number of clusters of size two (twins) and possibly of size three or more. This non-independence needs to be accounted for in the statistical analysis. Using simulated data based on a dataset of preterm infants, we have previously investigated the performance of several approaches to the analysis of continuous outcomes in the presence of some clusters of size two. Mixed models have been developed for binomial outcomes, but very little is known about their reliability when only a limited number of small clusters are present.

METHODS:

Using simulated data based on a dataset of preterm infants, we investigated the performance of several approaches to the analysis of binomial outcomes in the presence of some clusters of size two. We compared logistic models, several estimation methods for logistic random intercept models, and generalised estimating equations.
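
As an illustration only, this kind of simulation can be sketched in a few lines of standard-library Python. Everything in the sketch is an assumption, not the authors' setup: the function names, the single standard-normal covariate, the parameter values, the 20% twin fraction and the seed are all hypothetical. Twins share a normally distributed random intercept. A pooled logistic regression is then fitted, and its model-based standard errors are compared with cluster-robust (sandwich) standard errors, which is what generalised estimating equations with an independence working correlation would report. Fitting the random intercept model itself is left out.

```python
import math
import random


def simulate(n_clusters, twin_fraction, beta0, beta1, sigma_u, rng):
    """Return rows (cluster_id, x, y); twins share one random intercept u."""
    rows = []
    for cid in range(n_clusters):
        size = 2 if rng.random() < twin_fraction else 1
        u = rng.gauss(0.0, sigma_u)  # level-two (mother-level) effect
        for _ in range(size):
            x = rng.gauss(0.0, 1.0)  # hypothetical infant-level covariate
            p = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x + u)))
            rows.append((cid, x, 1 if rng.random() < p else 0))
    return rows


def information(rows, b0, b1):
    """Per-cluster score vectors and the 2x2 information matrix at (b0, b1)."""
    h00 = h01 = h11 = 0.0
    scores = {}
    for cid, x, y in rows:
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
        w = p * (1.0 - p)
        h00 += w
        h01 += w * x
        h11 += w * x * x
        s = scores.setdefault(cid, [0.0, 0.0])
        s[0] += y - p
        s[1] += (y - p) * x
    return scores, (h00, h01, h11)


def fit_logistic(rows, n_iter=25):
    """Newton-Raphson for logit P(y=1) = b0 + b1*x, pooling all infants."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        scores, (h00, h01, h11) = information(rows, b0, b1)
        g0 = sum(s[0] for s in scores.values())
        g1 = sum(s[1] for s in scores.values())
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def standard_errors(rows, b0, b1):
    """Model-based and cluster-robust (sandwich) SEs of (b0, b1)."""
    scores, (h00, h01, h11) = information(rows, b0, b1)
    det = h00 * h11 - h01 * h01
    a00, a01, a11 = h11 / det, -h01 / det, h00 / det  # inverse information
    m00 = sum(s[0] * s[0] for s in scores.values())   # "meat": sum of s s^T
    m01 = sum(s[0] * s[1] for s in scores.values())
    m11 = sum(s[1] * s[1] for s in scores.values())
    v00 = a00 * (a00 * m00 + a01 * m01) + a01 * (a00 * m01 + a01 * m11)
    v11 = a01 * (a01 * m00 + a11 * m01) + a11 * (a01 * m01 + a11 * m11)
    naive = (math.sqrt(a00), math.sqrt(a11))
    robust = (math.sqrt(v00), math.sqrt(v11))
    return naive, robust


rng = random.Random(2017)  # fixed seed so the sketch is reproducible
rows = simulate(2000, 0.2, -1.0, 0.5, 1.0, rng)  # assumed, illustrative values
b0, b1 = fit_logistic(rows)
naive_se, robust_se = standard_errors(rows, b0, b1)
print("estimates:", b0, b1)
print("model-based SE:", naive_se)
print("cluster-robust SE:", robust_se)
```

Repeating the last block over many seeds and several values of `twin_fraction` and `sigma_u` would give a rough picture of how each approach behaves as the share of twins changes. Note that the pooled fit targets population-averaged effects, which are generally closer to zero than the cluster-specific coefficients used to generate the data.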

RESULTS:

The presence of even a small percentage of twins means that a logistic regression model will underestimate all parameters. However, a logistic random intercept model fails to estimate the correlation between siblings if the percentage of twins is too small, and will then provide estimates similar to those of logistic regression. The method that seems to provide the best balance between estimation of the standard error and of the parameter, for any percentage of twins, is generalised estimating equations.

CONCLUSIONS:

This study has shown that the number of covariates and the level-two variance do not necessarily affect the performance of the various methods used to analyse datasets containing twins. However, when the percentage of small clusters is too small, mixed models cannot capture the dependence between siblings.

KEYWORDS:

Binomial outcomes; Generalised estimating equations; Generalised mixed models; Perinatal outcomes; Small clusters

PMID:
28728549
PMCID:
PMC5520371
DOI:
10.1186/s12874-017-0369-6
[Indexed for MEDLINE]