Biophys Rev. 2018 Dec 29. doi: 10.1007/s12551-018-0490-8. [Epub ahead of print]

Mining data and metadata from the gene expression omnibus.

Author information

1. BD2K-LINCS Data Coordination and Integration Center; Knowledge Management Center for the Illuminating the Druggable Genome; Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, Box 1603, One Gustave L. Levy Place, New York, NY, 10029, USA. zichen.wang@mssm.edu.
2. BD2K-LINCS Data Coordination and Integration Center; Knowledge Management Center for the Illuminating the Druggable Genome; Mount Sinai Center for Bioinformatics, Department of Pharmacological Sciences, Icahn School of Medicine at Mount Sinai, Box 1603, One Gustave L. Levy Place, New York, NY, 10029, USA.

Abstract

Publicly available gene expression datasets deposited in the Gene Expression Omnibus (GEO) are accumulating at an accelerating rate. Such datasets hold great value for knowledge discovery, particularly when integrated. Although numerous software platforms and tools have been developed to enable reanalysis and integration of individual GEO datasets or groups of datasets, large-scale reuse of those datasets is impeded by minimal requirements for standardized metadata at both the study and sample levels and for uniform processing of the data across studies. Here, we review methodologies developed to facilitate the systematic curation and processing of publicly available gene expression datasets from GEO. We identify trends in advanced metadata curation and summarize approaches for reprocessing the data within the entire GEO repository.

KEYWORDS: Computational data curation; FAIR principles; GEO; Gene Expression Omnibus; Natural language processing

PMID: 30594974
DOI: 10.1007/s12551-018-0490-8
Free full text
