Missing-value estimation using linear and non-linear regression with Bayesian gene selection

Bioinformatics. 2003 Nov 22;19(17):2302-7. doi: 10.1093/bioinformatics/btg323.

Abstract

Motivation: Data from microarray experiments are usually in the form of large matrices of expression levels of genes under different experimental conditions. For various reasons, values are frequently missing. Estimating these missing values is important because they affect downstream analysis, such as clustering, classification and network design. Several methods of missing-value estimation are in use. The problem has two parts: (1) selection of genes for estimation and (2) design of an estimation rule.

Results: We propose Bayesian variable selection to obtain genes to be used for estimation, and employ both linear and nonlinear regression for the estimation rule itself. Fast implementation issues for these methods are discussed, including the use of QR decomposition for parameter estimation. The proposed methods are tested on data sets arising from hereditary breast cancer and small round blue-cell tumors. Measured by the normalized root-mean-square error, the results compare very favorably with currently used methods.
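As an illustration of the linear estimation rule and the error measure named above, the following is a minimal sketch, not the authors' implementation. It assumes the predictor genes have already been selected (the Bayesian selection step is not shown), fits an ordinary least-squares model through a QR decomposition, and scores estimates with a normalized root-mean-square error. The function names `estimate_missing_qr` and `nrmse`, the intercept term, and the choice to normalize by the standard deviation of the true values are all assumptions made for this example; the abstract does not specify them.

```python
import numpy as np


def estimate_missing_qr(X_train, y_train, x_new):
    """Estimate a missing expression value by linear regression.

    X_train: (n_samples, n_genes) expression of the selected predictor
             genes in samples where the target gene is observed.
    y_train: (n_samples,) observed expression of the target gene.
    x_new:   (n_genes,) predictor-gene expression in the sample where
             the target gene is missing.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_new = np.asarray(x_new, dtype=float)

    # Design matrix with an intercept column (an assumption of this sketch).
    A = np.column_stack([np.ones(X_train.shape[0]), X_train])

    # Reduced QR factorization A = Q R; the least-squares coefficients
    # solve the triangular system R beta = Q^T y.
    Q, R = np.linalg.qr(A)
    beta = np.linalg.solve(R, Q.T @ y_train)

    return float(np.concatenate(([1.0], x_new)) @ beta)


def nrmse(y_true, y_est):
    """RMS error divided by the standard deviation of the true values.

    The normalization is an assumption; other definitions divide by
    the mean or the range of the true values.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_est = np.asarray(y_est, dtype=float)
    rmse = np.sqrt(np.mean((y_est - y_true) ** 2))
    return float(rmse / np.std(y_true))
```

Solving through QR avoids forming the normal equations explicitly, which is generally better conditioned when the selected genes are strongly correlated.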

Availability: The appendix is available from http://gspsnap.tamu.edu/gspweb/zxb/missing_zxb/ (user: gspweb; passwd: gsplab).

Publication types

  • Comparative Study
  • Evaluation Study
  • Research Support, U.S. Gov't, Non-P.H.S.
  • Research Support, U.S. Gov't, P.H.S.
  • Validation Study

MeSH terms

  • Algorithms*
  • Artifacts
  • BRCA2 Protein / genetics*
  • Bayes Theorem
  • Breast Neoplasms / genetics*
  • Carrier Proteins / genetics*
  • Gene Deletion*
  • Gene Expression Profiling / methods*
  • Genetic Variation
  • Humans
  • Linear Models
  • Models, Genetic*
  • Models, Statistical
  • Nonlinear Dynamics
  • Oligonucleotide Array Sequence Analysis / methods*
  • Regression Analysis
  • Ubiquitin-Protein Ligases

Substances

  • BRCA2 Protein
  • Carrier Proteins
  • BRAP protein, human
  • Ubiquitin-Protein Ligases