Biol Lett. 2018 Sep;14(9). pii: 20180431. doi: 10.1098/rsbl.2018.0431.

Quantifying the dark data in museum fossil collections as palaeontology undergoes a second digital revolution.

Author information

Department of Integrative Biology, University of California, 3040 Valley Life Sciences Building, Berkeley, CA 94720-3140, USA.
University of California Museum of Paleontology, University of California, 1101 Valley Life Sciences Building, Berkeley, CA 94720-4780, USA.
Department of Integrative Biology, University of California, 3040 Valley Life Sciences Building, Berkeley, CA 94720-3140, USA.
Department of Geological Sciences, California State University, Fullerton, CA 92834, USA.
John D. Cooper Archaeological and Paleontological Center, Santa Ana, CA 92701-6427, USA.
Department of Earth Sciences, University of Oregon, Eugene, OR 97403-1272, USA.
University of Oregon Museum of Natural and Cultural History, 1680 E. 15th Avenue, Eugene, OR 97403-1224, USA.
Paleontological Research Institution, 1259 Trumansburg Road, Ithaca, NY 14850, USA.
Department of Earth and Atmospheric Sciences, Cornell University, 112 Hollister Drive, Ithaca, NY 14853, USA.
University of Alaska Museum and Department of Geosciences, University of Alaska Fairbanks, 1962 Yukon Drive, Fairbanks, AK 99775, USA.
Burke Museum of Natural History and Culture, University of Washington, Box 353010, Seattle, WA 98195-3010, USA.
California Academy of Sciences, 55 Music Concourse Drive, San Francisco, CA 94118, USA.
Natural History Museum of Los Angeles County, 900 Exposition Boulevard, Los Angeles, CA 90007, USA.
Department of Paleobiology, National Museum of Natural History, Smithsonian Institution, PO Box 37012, Washington, DC 20013, USA.


Large-scale analysis of the fossil record requires aggregation of palaeontological data from individual fossil localities. Prior to computers, these synoptic datasets were compiled by hand, a laborious undertaking that took years of effort and forced palaeontologists to make difficult choices about what types of data to tabulate. The advent of desktop computers ushered in palaeontology's first digital revolution: online literature-based databases, such as the Paleobiology Database (PBDB). However, the published literature represents only a small proportion of the palaeontological data housed in museum collections. Although this issue has long been appreciated, the magnitude, and thus potential significance, of these so-called 'dark data' has been difficult to determine. Here, in the early phases of a second digital revolution in palaeontology (the digitization of museum collections), we provide an estimate of the magnitude of palaeontology's dark data. Digitization of our nine institutions' holdings of Cenozoic marine invertebrate collections from California, Oregon and Washington in the USA reveals that they represent 23 times as many unique localities as are currently available in the PBDB. These data, and the vast quantity of similarly untapped dark data in other museum collections, will, when digitally mobilized, enhance palaeontologists' ability to make inferences about the patterns and processes of past evolutionary and ecological changes.


dark data; digitization; iDigBio; museum collections

[Indexed for MEDLINE]
Free PMC Article
