PLoS One. 2018 May 11;13(5):e0196937. doi: 10.1371/journal.pone.0196937. eCollection 2018.

High throughput nonparametric probability density estimation.

Author information

1 Department of Physics and Optical Science, University of North Carolina at Charlotte, Charlotte, NC, United States of America.
2 Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, United States of America.
3 Center for Biomedical Engineering and Science, University of North Carolina at Charlotte, Charlotte, NC, United States of America.

Abstract

In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without relying on subjective human judgment. Here, such an automated process for univariate data is implemented by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a universal scoring function that is invariant to sample size. A probability density estimate is then determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function, which identifies atypical fluctuations. This criterion resists underfitting and overfitting the data, and serves as an alternative to the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.
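
The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not give the exact formulas, so the details here are assumptions. What is standard: if a trial CDF is correct, the transformed values u = CDF(x) are uniform on (0, 1), and the k-th smallest of N uniform values follows a Beta(k, N - k + 1) distribution with mean k/(N + 1). The sketch assumes the quasi-log-likelihood is the average log density of each sorted u under its own Beta distribution, and that a scaled quantile residual is the deviation from k/(N + 1) multiplied by sqrt(N + 2), a scaling chosen so the residual spread does not depend on N. The names `beta_logpdf` and `score_trial_cdf` are hypothetical.

```python
import math
import random


def beta_logpdf(u, a, b):
    """Log density of a Beta(a, b) distribution at u, for 0 < u < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * math.log(u) + (b - 1) * math.log1p(-u)


def score_trial_cdf(sample, trial_cdf):
    """Score a trial CDF against a sample (assumed form, see lead-in).

    Returns the mean single-order-statistic log-likelihood and the list of
    scaled quantile residuals, one per sorted data point.
    """
    n = len(sample)
    eps = 1e-12  # keep u strictly inside (0, 1) so the logs stay finite
    u = sorted(min(max(trial_cdf(x), eps), 1.0 - eps) for x in sample)

    # k-th order statistic of n uniform values ~ Beta(k, n - k + 1)
    score = sum(beta_logpdf(u[k - 1], k, n - k + 1) for k in range(1, n + 1)) / n

    # Deviation from the expected position k / (n + 1), scaled by sqrt(n + 2)
    residuals = [math.sqrt(n + 2) * (u[k - 1] - k / (n + 1)) for k in range(1, n + 1)]
    return score, residuals


# Illustrative data: 500 draws from an exponential distribution with rate 1.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(500)]

# A trial CDF with the correct rate, and one with a deliberately wrong rate.
good_score, good_resid = score_trial_cdf(data, lambda x: 1.0 - math.exp(-x))
bad_score, bad_resid = score_trial_cdf(data, lambda x: 1.0 - math.exp(-3.0 * x))

print("correct CDF:", good_score, max(abs(r) for r in good_resid))
print("wrong CDF:  ", bad_score, max(abs(r) for r in bad_resid))
```

Under these assumptions, the correct trial CDF should receive the higher score and keep its residuals within a narrow band around zero, while the mismatched CDF produces large systematic residuals. Plotting the residuals against k/(N + 1) would give a diagnostic in the spirit of the scaled quantile residual plots mentioned above.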

PMID: 29750803
PMCID: PMC5947915
DOI: 10.1371/journal.pone.0196937
[Indexed for MEDLINE]
Free PMC Article
