Tripal EUtils: a Tripal module to increase exchange and reuse of genome assembly metadata

Database (Oxford). 2020 Jan 1;2019:baz143. doi: 10.1093/database/baz143.

Abstract

Data and metadata interoperability between data storage systems is a critical component of the FAIR data principles. Programmatic and consistent means of reconciling metadata models between databases promote data exchange and thus increase the data's accessibility to the scientific community. This process requires (i) metadata mapping between the models and (ii) software to perform the mapping. Here, we describe our efforts to map metadata associated with genome assemblies between the National Center for Biotechnology Information (NCBI) data resources and the Chado biological database schema. We present mappings for multiple NCBI data structures and introduce a Tripal software module, Tripal EUtils, to pull metadata from NCBI into a Tripal/Chado database. We discuss potential mapping challenges and solutions and provide suggestions for future development to further increase interoperability between these platforms. Database URL: https://github.com/NAL-i5K/tripal_eutils.
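
The two requirements named in the abstract, a mapping between models and software that applies it, can be illustrated with a minimal Python sketch. This is not the Tripal EUtils implementation (the module itself is built for Tripal), and every name in it is an assumption made for illustration: the source field names, the target table and column pairs, the `FIELD_MAP` dictionary, the `map_record` function and the sample record are all hypothetical. The published mappings may differ.

```python
# Illustrative sketch only. The field names, target tables/columns, and the
# sample record are assumptions, not the mappings used by Tripal EUtils.

# (i) The mapping: source field name -> (target table, target column).
FIELD_MAP = {
    "assemblyname": ("analysis", "name"),
    "assemblyaccession": ("analysis", "sourcename"),
    "organism": ("organism", "common_name"),
}


def map_record(record, field_map=FIELD_MAP):
    """(ii) The software: regroup a flat source record into per-table rows.

    Returns the rows keyed by target table, plus a sorted list of source
    fields that have no mapping and therefore need a curator's decision.
    """
    rows = {}
    unmapped = []
    for key, value in record.items():
        target = field_map.get(key)
        if target is None:
            unmapped.append(key)
            continue
        table, column = target
        rows.setdefault(table, {})[column] = value
    return rows, sorted(unmapped)


# Invented sample record; the values are placeholders, not real NCBI data.
example = {
    "assemblyname": "Example_v1.0",
    "assemblyaccession": "GCF_000000000.0",
    "organism": "example organism",
    "submitter": "Example Lab",
}

rows, unmapped = map_record(example)
print(rows)
print(unmapped)
```

Reporting unmapped fields separately, rather than dropping them silently, is one way to surface the kind of mapping challenges the abstract mentions.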

Publication types

  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods*
  • Databases, Genetic*
  • Genome*
  • Genomics
  • Information Storage and Retrieval
  • Invertebrates / genetics
  • Metadata*
  • National Library of Medicine (U.S.)
  • Plants / genetics
  • Programming Languages*
  • Software
  • United States