Computational solutions to large-scale data management and analysis

Eric E Schadt; Michael D Linderman; Jon Sorenson; Lawrence Lee; Garry P Nolan

doi:10.1038/nrg2857

Nat Rev Genet. 2010 Sep;11(9):647-57. doi: 10.1038/nrg2857.

Authors

Eric E Schadt¹, Michael D Linderman, Jon Sorenson, Lawrence Lee, Garry P Nolan

Affiliation

¹ Pacific Biosciences, Menlo Park, California 94025, USA. eschadt@pacificbiosciences.com

PMID: 20717155
PMCID: PMC3124937
DOI: 10.1038/nrg2857

Abstract

Today we can generate hundreds of gigabases of DNA and RNA sequencing data in a week for less than US$5,000. The astonishing rate of data generation by these low-cost, high-throughput technologies in genomics is being matched by that of other technologies, such as real-time imaging and mass spectrometry-based flow cytometry. Success in the life sciences will depend on our ability to properly interpret the large-scale, high-dimensional data sets that are generated by these technologies, which in turn requires us to adopt advances in informatics. Here we discuss how we can master the different types of computational environments that exist, such as cloud and heterogeneous computing, to successfully tackle our big data problems.

Publication types

Review

MeSH terms

Animals
Computational Biology / methods*
Genomics / methods
Humans
Sequence Analysis, DNA / methods
