Brief Bioinform. 2017 Mar 1;18(2):226-235. doi: 10.1093/bib/bbw017.

Resolving the problem of multiple accessions of the same transcript deposited across various public databases.

Author information

1. Institute of Cardiovascular Regeneration, Centre for Molecular Medicine, Goethe University Frankfurt and German Center for Cardiovascular Research, Partner Site Rhein-Main, Frankfurt am Main, Germany.

Abstract

Maintaining the consistency of genomic annotations is an increasingly complex task because of the iterative and dynamic nature of assembly and annotation, the growing number of biological databases and the insufficient integration of annotations across databases. Because information exchange among databases is poor, a sequence reported as 'novel' relative to one reference annotation may already be annotated in another. Furthermore, relationships to nearby or overlapping annotated transcripts become even more complicated when different genome assemblies are used. To better understand these problems, we surveyed current and previous versions of genomic assemblies and annotations across a number of public databases containing long noncoding RNAs (lncRNAs). We identified numerous discrepancies among transcripts in their genomic locations, transcript lengths and identifiers. Further investigation showed that positional differences between reference annotations of essentially the same transcript could lead to differences in its measured expression at the RNA level. To help resolve these problems, we present the 'Universal Genomic Accession Hash (UGAHash)' algorithm and an open source web tool that encourages its use. The UGAHash web tool (http://ugahash.uni-frankfurt.de) can be accessed freely without registration. It allows researchers to generate Universal Genomic Accessions for genomic features or to explore annotations deposited in past and present versions of the public databases. We anticipate that the UGAHash web tool will be valuable for checking whether a transcript already exists in these databases before a newly discovered transcript is judged to be novel.
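
The abstract does not spell out how UGAHash derives an accession, so the following is only a minimal sketch of the general idea of a coordinate-derived identifier, not the published algorithm. The function name `coordinate_accession`, the `ACC` prefix, the key layout and the use of SHA-256 are all assumptions made for illustration. It shows that the same transcript structure on the same assembly always yields the same identifier, regardless of which database reported it, whereas a different assembly yields a different one.

```python
import hashlib


def coordinate_accession(assembly, chrom, strand, exons, prefix="ACC"):
    """Illustrative only: NOT the published UGAHash algorithm.

    Builds a deterministic identifier from the assembly name, chromosome,
    strand and exon coordinates of a transcript (all hypothetical inputs).
    """
    # Sort exon blocks so that input order does not affect the result.
    blocks = sorted((int(start), int(end)) for start, end in exons)
    exon_part = ",".join(f"{start}-{end}" for start, end in blocks)
    key = "|".join([assembly, chrom, strand, exon_part])
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # Truncate the digest to keep the identifier short (assumed length).
    return f"{prefix}_{digest[:16]}"


# Two records of the same transcript, listed in different exon order.
record_a = coordinate_accession("GRCh38", "chr1", "+", [(100, 200), (300, 400)])
record_b = coordinate_accession("GRCh38", "chr1", "+", [(300, 400), (100, 200)])
# The same coordinates reported against a different assembly.
record_c = coordinate_accession("GRCh37", "chr1", "+", [(100, 200), (300, 400)])

print(record_a == record_b)  # same structure, same identifier
print(record_a == record_c)  # different assembly, different identifier
```

In such a scheme, two database entries can be compared by identifier alone, without aligning their sequences or trusting database-specific names.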

KEYWORDS:

accession numbers; accession system; annotation scheme; databases; hashing algorithm; lncRNA; novel transcripts

PMID: 26921280
DOI: 10.1093/bib/bbw017
[Indexed for MEDLINE]
