PLoS One. 2014 Jun 17;9(6):e99979. doi: 10.1371/journal.pone.0099979. eCollection 2014.

Standardized metadata for human pathogen/vector genomic sequences.

Author information

1. J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America; National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America.
2. University of Notre Dame, Notre Dame, Indiana, United States of America.
3. University of Pennsylvania, Philadelphia, Pennsylvania, United States of America.
4. Broad Institute, Cambridge, Massachusetts, United States of America.
5. J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America.
6. Institute for Genome Sciences, University of Maryland School of Medicine, Baltimore, Maryland, United States of America.
7. Cyberinfrastructure Division, Virginia Bioinformatics Institute, Blacksburg, Virginia, United States of America.
8. National Institute of Allergy and Infectious Diseases, Rockville, Maryland, United States of America.
9. National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, United States of America.
10. University of Georgia, Athens, Georgia, United States of America.
11. Kelly Government Solutions, Rockville, Maryland, United States of America.
12. Argonne National Laboratory, Lemont, Illinois, United States of America.
13. J. Craig Venter Institute, Rockville, Maryland, and La Jolla, California, United States of America; Department of Pathology, University of California San Diego, San Diego, California, United States of America.

Abstract

High-throughput sequencing has accelerated the determination of genome sequences for thousands of human infectious disease pathogens and dozens of their vectors. The scale and scope of these data are enabling genotype-phenotype association studies to identify genetic determinants of pathogen virulence and drug/insecticide resistance, and phylogenetic studies to track the origin and spread of disease outbreaks. To maximize the utility of genomic sequences for these purposes, it is essential that metadata about the pathogen/vector isolate characteristics be collected and made available in organized, clear, and consistent formats. Here we report the GSCID/BRC Project and Sample Application Standard, developed by representatives of the Genome Sequencing Centers for Infectious Diseases (GSCIDs), the Bioinformatics Resource Centers (BRCs) for Infectious Diseases, and the U.S. National Institute of Allergy and Infectious Diseases (NIAID), part of the National Institutes of Health (NIH), and informed by interactions with numerous collaborating scientists. It includes mappings to terms from other data standards initiatives, including the Genomic Standards Consortium's minimum information checklists (MIxS), NCBI's BioSample/BioProject checklists, and the Ontology for Biomedical Investigations (OBI). The standard includes data fields about characteristics of the organism or environmental source of the specimen, spatial-temporal information about the specimen isolation event, phenotypic characteristics of the pathogen/vector isolated, and project leadership and support. By modeling metadata fields into an ontology-based semantic framework and reusing existing ontologies and minimum information checklists, the application standard can be extended to support additional project-specific data fields and integrated with other data represented with comparable standards.
The use of this metadata standard by all ongoing and future GSCID sequencing projects will provide a consistent representation of these data in the BRC resources and other repositories that leverage these data, allowing investigators to identify relevant genomic sequences and perform comparative genomics analyses that are both statistically meaningful and biologically relevant.
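
The abstract groups the standard's fields into four categories (organism or environmental source, spatial-temporal data about the isolation event, isolate phenotype, and project leadership and support) and notes that the standard can be extended with project-specific fields. The Python sketch below shows one way such a record might be represented, checked, and queried. It is an illustration under stated assumptions only: the field names, which fields are required, the validation rules, the `extensions` dictionary, and the example values are all hypothetical and are not the field definitions published in the GSCID/BRC standard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Iterable, List, Optional


@dataclass
class SampleMetadata:
    """One isolate record. All field names here are hypothetical."""

    # Assumed required core: source organism and when/where it was isolated
    organism: str
    collection_date: date
    country: str
    # Optional details on the organism's host or environmental source
    host: Optional[str] = None
    isolation_source: Optional[str] = None
    # Optional coordinates of the isolation event, in decimal degrees
    latitude: Optional[float] = None
    longitude: Optional[float] = None
    # Phenotypic characteristics of the isolate (trait name -> observed value)
    phenotypes: Dict[str, str] = field(default_factory=dict)
    # Project leadership and support
    principal_investigator: Optional[str] = None
    funding_source: Optional[str] = None
    # Hypothetical slot for project-specific fields outside the core set
    extensions: Dict[str, str] = field(default_factory=dict)


def validate(rec: SampleMetadata) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems: List[str] = []
    if not rec.organism.strip():
        problems.append("organism is empty")
    if not rec.country.strip():
        problems.append("country is empty")
    if rec.collection_date > date.today():
        problems.append("collection_date is in the future")
    # Coordinates must be supplied as a pair and lie within valid ranges
    if (rec.latitude is None) != (rec.longitude is None):
        problems.append("latitude and longitude must be given together")
    if rec.latitude is not None and not -90 <= rec.latitude <= 90:
        problems.append("latitude out of range")
    if rec.longitude is not None and not -180 <= rec.longitude <= 180:
        problems.append("longitude out of range")
    return problems


def select(records: Iterable[SampleMetadata], organism: str,
           start: date, end: date) -> List[SampleMetadata]:
    """Return records of one organism collected within [start, end]."""
    return [r for r in records
            if r.organism == organism and start <= r.collection_date <= end]


# Invented example values, for illustration only
records = [
    SampleMetadata("Plasmodium falciparum", date(2012, 5, 1), "Mali",
                   host="Homo sapiens"),
    SampleMetadata("Plasmodium falciparum", date(2009, 3, 9), "Kenya"),
    SampleMetadata("Aedes aegypti", date(2012, 7, 20), "Brazil", latitude=-3.1),
]
print(validate(records[2]))  # ['latitude and longitude must be given together']
print(len(select(records, "Plasmodium falciparum",
                 date(2010, 1, 1), date(2013, 12, 31))))  # 1
```

Keeping the core fields typed and fixed while routing anything else into a separate extension mapping is one way to mirror the abstract's point that project-specific fields can be added without disturbing the shared core; a real implementation would take its field list and constraints from the published standard and its ontology mappings.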

PMID: 24936976
PMCID: PMC4061050
DOI: 10.1371/journal.pone.0099979
[Indexed for MEDLINE]