Calculating stage duration statistics in multistage diseases

Natalia L Komarova; Craig J Thalhauser

doi:10.1371/journal.pone.0028298

PLoS One. 2011;6(12):e28298. doi: 10.1371/journal.pone.0028298. Epub 2011 Dec 7.

Authors

Natalia L Komarova¹, Craig J Thalhauser

Affiliation

¹ Department of Mathematics, University of California Irvine, Irvine, California, United States of America. komarova@uci.edu

Abstract

Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for the statistical hypothesis testing needed to determine whether a treatment is having a significant effect on progression, or whether a new therapy is delaying progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap, which makes them invaluable tools for studying the epidemiology of diseases, with the goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimates for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.
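
To make the idea of a counting approach concrete, the following is a minimal sketch and not the authors' algorithm, whose details are not given in this abstract. It assumes that each patient is assessed at equally spaced visits, that stages only ever advance, and that a stage's duration can be approximated by the number of visits spent in it multiplied by the visit interval. The function name `stage_duration_stats` and the input layout are hypothetical choices made for illustration.

```python
from itertools import groupby
from statistics import mean, variance


def stage_duration_stats(histories, interval):
    """Estimate the mean and sample variance of stage durations by counting visits.

    histories: list of per-patient stage labels, one label per visit
               (assumption: visits are equally spaced, stages never regress).
    interval:  time between consecutive visits.
    Returns {stage: (mean_duration, sample_variance)}.
    """
    durations = {}
    for stages in histories:
        # Collapse consecutive identical labels into (stage, visit count) runs.
        runs = [(stage, sum(1 for _ in group)) for stage, group in groupby(stages)]
        # Skip the first and last runs: their true start or end was not observed,
        # so counting them would underestimate the duration (censoring).
        for stage, n_visits in runs[1:-1]:
            durations.setdefault(stage, []).append(n_visits * interval)
    return {
        stage: (mean(d), variance(d) if len(d) > 1 else float("nan"))
        for stage, d in durations.items()
    }


# Illustrative, made-up histories for three patients with visits every 0.5 years.
example_histories = [
    [1, 1, 2, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [1, 2, 2, 2, 2, 3],
]
example_stats = stage_duration_stats(example_histories, interval=0.5)
```

In this sketch only stage 2 is reported, because it is the only stage that every example patient is seen both entering and leaving. Each count is only accurate to within roughly one visit interval, so a sparse visit schedule adds discretization noise to the variance estimate.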

MeSH terms

Algorithms
Alzheimer Disease / diagnosis
Alzheimer Disease / pathology
Computational Biology
Data Interpretation, Statistical*
Dementia / diagnosis
Dementia / pathology
Disease Progression
Humans
Linear Models
Longitudinal Studies
Models, Statistical
Probability
Prognosis
Reproducibility of Results
Time Factors