Sci Data. 2017 Sep 19;4:170125. doi: 10.1038/sdata.2017.125.

Precision annotation of digital samples in NCBI's Gene Expression Omnibus.

Author information

1 Institute for Computational Health Sciences, University of California, San Francisco, California 94158, USA.
2 Department of Neurosurgery, Stanford University School of Medicine, Stanford, California 94305, USA.
3 University of Illinois College of Medicine, Chicago, Illinois 60612, USA.
4 Harvard Medical School Department of Immunology, Harvard University, Boston, Massachusetts 02115, USA.
5 Wayne State University School of Medicine, Detroit, Michigan 48201, USA.
6 Yale School of Medicine, Yale University, New Haven, Connecticut 06519, USA.
7 University of Vermont Medical Center, University of Vermont, Burlington, Vermont 05401, USA.
8 Program in Biological & Medical Informatics, University of California, San Francisco, CA 94158, USA.
9 Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California 94305, USA.

Abstract

The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remains poorly described by unstructured free-text attributes, preventing its large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we target a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step towards realizing an ideal paradigm of precision medicine.

PMID: 28925997
PMCID: PMC5604135
DOI: 10.1038/sdata.2017.125
[Indexed for MEDLINE]
Free PMC Article
