[Big Data Revolution or Data Hubris? On the Data Positivism of Molecular Biology]

NTM. 2017 Dec;25(4):459-483. doi: 10.1007/s00048-017-0179-2.
[Article in German]

Abstract

Genome data, the core of the big data revolution in biology proclaimed in 2008, are generated and analyzed automatically. The transition from the manual laboratory practice of electrophoresis sequencing to automated DNA sequencing machines and software-based analysis programs was completed between 1982 and 1992. This transition brought about the first data deluge, which the second and third generations of DNA sequencers increased considerably during the 2000s. The strategies for evaluating sequence data, however, were transformed along with this transition. This paper explores both the computational strategies of automation and the data evaluation culture connected with it, in order to provide a complete picture of the complexity of today's data generation and its intrinsic data positivism. It is guided by the question of whether this data positivism is the basis of the big data revolution in molecular biology announced today, or whether it marks the beginning of its data hubris.

Keywords: automation; base-calling algorithms; big data; genome sequencing; human genome project; validation.

Publication types

  • Historical Article

MeSH terms

  • Algorithms
  • Big Data*
  • Data Science
  • History, 20th Century
  • History, 21st Century
  • Human Genome Project / history
  • Humans
  • Molecular Biology / history*
  • Sequence Analysis, DNA / history*
  • Sequence Analysis, DNA / methods