    Bioinformatics. 2009 May 1;25(9):1165-72. Epub 2009 Mar 4.

    MIST: Maximum Information Spanning Trees for dimension reduction of biological data sets.

    King BM, Tidor B.

    Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.

    MOTIVATION: The study of complex biological relationships is aided by large and high-dimensional data sets whose analysis often involves dimension reduction to highlight representative or informative directions of variation. In principle, information theory provides a general framework for quantifying complex statistical relationships for dimension reduction. Unfortunately, direct estimation of high-dimensional information theoretic quantities, such as entropy and mutual information (MI), is often unreliable given the relatively small sample sizes available for biological problems. Here, we develop and evaluate a hierarchy of approximations for high-dimensional information theoretic statistics from associated low-order terms, which can be more reliably estimated from limited samples. Because these approximations are related to the minimum spanning tree over a graph representation of the system, we refer to them as MIST (Maximum Information Spanning Trees).

    RESULTS: The MIST approximations are examined in the context of synthetic networks with analytically computable entropies and using experimental gene expression data as a basis for the classification of multiple cancer types. The approximations result in significantly more accurate estimates of entropy and MI, and also correlate better with biological classification error than direct estimation and another low-order approximation, minimum-redundancy-maximum-relevance (mRMR).

    AVAILABILITY: Software to compute the entropy approximations described here is available as Supplementary Material.

    SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
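    The abstract does not give the approximation formulas. As a rough illustration of how a spanning tree can turn low-order (pairwise) terms into a high-dimensional entropy estimate, here is a minimal sketch assuming a Chow-Liu-style tree bound, H(X_1..X_n) <= sum_i H(X_i) - sum over tree edges of I(X_i; X_j). This is a related textbook construction, not necessarily the authors' exact MIST formulation; the plug-in (maximum-likelihood) estimator and the function names `entropy`, `pairwise_mi`, and `tree_entropy_bound` are hypothetical choices for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(samples):
    """Plug-in (maximum-likelihood) entropy, in bits, of a list of hashable outcomes."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def pairwise_mi(data, i, j):
    """I(X_i; X_j) = H(X_i) + H(X_j) - H(X_i, X_j), using only 1st- and 2nd-order terms."""
    xi = [row[i] for row in data]
    xj = [row[j] for row in data]
    xij = [(row[i], row[j]) for row in data]
    return entropy(xi) + entropy(xj) - entropy(xij)


def tree_entropy_bound(data):
    """Upper bound on joint entropy from a maximum-MI spanning tree.

    Hypothetical illustration (Chow-Liu-style), built with Prim's algorithm:
    sum of marginal entropies minus the total MI along the chosen tree edges.
    """
    d = len(data[0])
    marginals = sum(entropy([row[k] for row in data]) for k in range(d))
    mi = {(i, j): pairwise_mi(data, i, j) for i, j in combinations(range(d), 2)}
    in_tree = {0}
    tree_mi = 0.0
    while len(in_tree) < d:
        # Pick the highest-MI edge that connects the tree to a new variable.
        best = max(
            (e for e in mi if (e[0] in in_tree) != (e[1] in in_tree)),
            key=lambda e: mi[e],
        )
        tree_mi += mi[best]
        in_tree |= set(best)
    return marginals - tree_mi


if __name__ == "__main__":
    # Three binary variables where X_2 = X_0 XOR X_1: no pairwise dependence at all.
    xor = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
    print(tree_entropy_bound(xor), entropy([tuple(r) for r in xor]))  # 3.0 2.0
```

    Maximizing the total MI over the tree is the same as finding a minimum spanning tree with edge weights -I(X_i; X_j), and it yields the tightest bound among tree-structured approximations. The XOR example shows the limitation of any purely pairwise approximation: dependence that appears only in higher-order terms is missed, so the bound (3 bits) overestimates the true joint entropy (2 bits).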

    PMID: 19261718 [PubMed - indexed for MEDLINE]

    PMCID: 2672626