GenBank® Passes the 100 Gigabase Mark
With the August 2005 release of GenBank, the combined primary nucleotide database produced by GenBank and its collaborators, the European Molecular Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ), now exceeds 100 billion base pairs. The primary nucleotide data continues to grow at an exponential rate. Between August 1997 and August 2005 the database grew 100-fold, with an average doubling time of around 14 months. Improvements in sequencing technology and throughput suggest that the explosive growth of the primary data is likely to continue. In fact, another milestone was reached with release 149: the number of bases derived from whole genome shotgun (WGS) sequencing projects now exceeds the number of bases in the traditional divisions of GenBank (Figure 1). The WGS portion of the primary data is growing extremely rapidly, with the number of bases increasing more than tenfold in the past three years. Release 149 of GenBank contains 261 WGS projects, including projects for human, mouse, rat, dog, numerous bacteria, and assemblies from environmental samples. With the sequencing of complete genomes becoming routine, genome sequence data will increasingly dominate the primary sequence data. Maintaining this data as a comprehensive and accurate resource is a primary goal of the NCBI.
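The doubling time quoted above can be checked with a little arithmetic. The sketch below is illustrative only: it assumes steady exponential growth over the whole period, and the function name `doubling_time_months` is hypothetical, not part of any NCBI tool.

```python
import math


def doubling_time_months(fold_increase, elapsed_months):
    """Average doubling time implied by steady exponential growth.

    Assumption: growth is uniform over the period, so the number of
    doublings is log2(fold_increase).
    """
    doublings = math.log2(fold_increase)
    return elapsed_months / doublings


# August 1997 to August 2005 is 8 years (96 months) of 100-fold growth.
print(round(doubling_time_months(100, 8 * 12), 1))  # 14.4

# WGS data: tenfold growth over three years (36 months). Because the
# text says "more than" tenfold, this figure is an upper bound.
print(round(doubling_time_months(10, 3 * 12), 1))  # 10.8
```

Under that assumption, the 100-fold growth over eight years works out to a doubling time of about 14.4 months, in line with the "around 14 months" stated in the text. The WGS portion doubles faster still, in under about 11 months.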
The National Library of Medicine's press release provides additional information and commentary on the 100 Gigabase milestone:
Figure 1. The growth of GenBank. The blue area shows the total number of bases including those from whole genome shotgun sequencing projects (WGS). The checkered area shows only the non-WGS portion. With release 149, the number of WGS bases exceeded the number of bases in the traditional GenBank divisions.