J Biosci Bioeng. 2016 Aug;122(2):168-75. doi: 10.1016/j.jbiosc.2016.01.007. Epub 2016 Feb 6.

Random sample consensus combined with partial least squares regression (RANSAC-PLS) for microbial metabolomics data mining and phenotype improvement.

Author information

1. Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan. Electronic address: teoh_shao_thing@bio.eng.osaka-u.ac.jp.
2. Towa Pharmaceutical Co., Ltd., 26-7 Ichiban-Cho, Kadoma, Osaka 571-0033, Japan. Electronic address: m-kitamura@towayakuhin.co.jp.
3. Department of Applied Microbial Technology, Sojo University, 4-22-1 Ikeda, Kumamoto 860-0082, Japan. Electronic address: ynaka@bio.sojo-u.ac.jp.
4. Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan. Electronic address: sastia_putri@bio.eng.osaka-u.ac.jp.
5. Department of Bioscience, Nagahama Institute of Bio-Science and Technology, 1266 Tamura, Nagahama, Shiga 526-0829, Japan. Electronic address: y_mukai@nagahama-i-bio.ac.jp.
6. Department of Biotechnology, Graduate School of Engineering, Osaka University, 2-1 Yamadaoka, Suita, Osaka 565-0871, Japan. Electronic address: fukusaki@bio.eng.osaka-u.ac.jp.

Abstract

In recent years, the advent of high-throughput omics technology has made possible a new class of strain engineering approaches, based on identifying possible gene targets for phenotype improvement from omic-level comparison of different strains or growth conditions. Metabolomics, with its focus on the omic level closest to the phenotype, lends itself naturally to this semi-rational methodology. When a quantitative phenotype such as growth rate under stress is considered, regression modeling using multivariate techniques such as partial least squares (PLS) is often used to identify metabolites correlated with the target phenotype. However, linear modeling techniques such as PLS require a consistent metabolite-phenotype trend across the samples, which may not hold when outliers or multiple conflicting trends are present in the data. To address this, we proposed a data-mining strategy that uses random sample consensus (RANSAC) to select subsets of samples with consistent trends for the construction of better regression models. By applying a combination of RANSAC and PLS (RANSAC-PLS) to a dataset from a previous study (gas chromatography/mass spectrometry metabolomics data and 1-butanol tolerance of 19 yeast mutant strains), new metabolites were found to be correlated with tolerance within certain subsets of the samples. The relevance of these metabolites to 1-butanol tolerance was then validated using single-deletion strains of the corresponding metabolic genes. The results showed that RANSAC-PLS is a promising strategy for identifying unique metabolites that provide additional hints for phenotype improvement and that could not be detected by traditional PLS modeling of the entire dataset.
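The RANSAC-PLS idea summarized above can be illustrated with a minimal Python/NumPy sketch: repeatedly fit a PLS model to a small random subset of samples, count how many samples agree with that model, and refit on the largest consensus set. It assumes a single-response PLS (PLS1) model fitted with the NIPALS algorithm, an absolute-residual inlier threshold, and a "largest consensus set wins" rule. The function names (`pls1_fit`, `pls1_predict`, `ransac_pls`), all parameter values, and the synthetic data are hypothetical; this is not the authors' implementation and does not use the study's GC/MS dataset.

```python
import numpy as np


def pls1_fit(X, y, n_components=2):
    """Fit a single-response PLS (PLS1) model with NIPALS.

    Returns (coef, x_mean, y_mean); coef maps centered X to centered y.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n_feat = X.shape[1]
    W = np.zeros((n_feat, n_components))  # X weights
    P = np.zeros((n_feat, n_components))  # X loadings
    q = np.zeros(n_components)            # y loadings
    for a in range(n_components):
        w = Xc.T @ yc
        norm = np.linalg.norm(w)
        if norm == 0.0:  # nothing left to explain: keep components found so far
            W, P, q = W[:, :a], P[:, :a], q[:a]
            break
        w = w / norm
        t = Xc @ w                # scores for this component
        tt = t @ t
        p = Xc.T @ t / tt
        q[a] = yc @ t / tt
        Xc = Xc - np.outer(t, p)  # deflate X
        yc = yc - q[a] * t        # deflate y
        W[:, a], P[:, a] = w, p
    if W.shape[1] == 0:
        return np.zeros(n_feat), x_mean, y_mean
    # Regression vector B = W (P^T W)^{-1} q
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    return (X - x_mean) @ coef + y_mean


def ransac_pls(X, y, n_components=2, min_samples=6, threshold=0.5, n_iter=200, seed=0):
    """Hypothetical RANSAC wrapper around PLS1: keep the model whose predictions
    agree (|residual| < threshold) with the most samples, then refit on them."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(y), dtype=bool)
    for _ in range(n_iter):
        idx = rng.choice(len(y), size=min_samples, replace=False)
        model = pls1_fit(X[idx], y[idx], n_components)
        inliers = np.abs(pls1_predict(X, *model) - y) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < min_samples:
        raise ValueError("no sufficiently large consensus set found")
    coef, x_mean, y_mean = pls1_fit(X[best], y[best], n_components)
    return coef, x_mean, y_mean, best


# Synthetic illustration (not data from the study): 19 samples, 3 features.
# Samples 0-13 follow one trend; samples 14-18 follow a conflicting, shifted trend.
rng = np.random.default_rng(42)
X = rng.normal(size=(19, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.05, size=19)
y[14:] += 5.0
coef, x_mean, y_mean, inliers = ransac_pls(X, y, n_components=3)
print("consensus samples:", np.flatnonzero(inliers))
print("coefficients:", np.round(coef, 2))
```

In a metabolomics setting the columns of `X` would be (typically autoscaled) metabolite levels and `y` the phenotype, such as 1-butanol tolerance; features with large coefficients in the consensus-set model are the candidates that a whole-dataset PLS fit might miss. Because `n_components` equals the number of features in this toy example, the PLS fit reduces to ordinary least squares; with many metabolites one would use fewer components, chosen for example by cross-validation.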

KEYWORDS:

1-Butanol tolerance; Data mining; Gas chromatography/mass spectrometry; Metabolomics; Partial least squares; Phenotype improvement; Random sample consensus; Regression model; Saccharomyces cerevisiae; Semi-rational strain engineering

PMID: 26861498
DOI: 10.1016/j.jbiosc.2016.01.007
[Indexed for MEDLINE]
