
National Research Council (US) Board on Research Data and Information; Uhlir PF, editor. Designing the Microbial Research Commons: Proceedings of an International Symposium. Washington (DC): National Academies Press (US); 2011.


16. Research and Applications in Energy and Environment [41]


Department of Energy, Genome Sciences

In this presentation I will describe some of the programs of the Department of Energy (DOE), particularly those related to sequencing. The DOE is generating data in ever larger amounts. Our missions include developing biofuels, understanding the potential effects of greenhouse gas emissions, predicting the fate and transport of contaminants, and developing tools to explore the interface of the physical and the biological sciences.

The first three of these missions are not new. For years we have used high-throughput computing to simulate climate processes. We are also the inheritors of the Atomic Energy Commission and its legacy of the nuclear weapons programs. Many of the nasty contaminants developed and used in those programs got dumped in the ground and ignored for many years. Now we have to deal with them.

The Biological Systems Science Division, where I work, has a genome sciences program. We also have three large bioenergy research centers, some imaging and radiobiology research programs, and a very small program on ethical, legal, and social issues. And then we have one user facility in our division called the Joint Genome Institute.

The parallel division, the Climate and Environmental Sciences Division, has programs that focus on modeling climate processes and characterizing subsurface biogeochemical processes. I am currently the chair of an interagency group with a diverse collection of member agencies, all with an interest in microbial research. That group has a charter to maximize the opportunities offered by this science, and one primary direction for fulfilling that charter: to generate large amounts of data and to get the most out of these data.

The DOE’s Joint Genome Institute (JGI) was started in 1997. The facility was built to carry out the DOE’s obligations to the Human Genome Project. We assembled the sequencing and processing facilities in one place in order to take advantage of economies of scale and do the job faster, better, and cheaper. A major aspect of the Joint Genome Institute is the community sequencing program, which is an outreach program that offers the wider community a high-throughput, highly capable sequencing facility. Its goal is to provide sequencing and analysis services to anyone who has some tie to one of the DOE missions in bioenergy, biogeochemistry, or carbon cycling and who passes its peer review process.

The four areas of genome science within the community sequencing program are plants, fungi, prokaryotic isolates, and metagenomes. The outputs of the sequencing runs performed at the JGI are put into the Integrated Microbial Genomes (IMG) system. The throughput from the sequencing machines has absolutely revolutionized biological science in a very short period of time.

This is one of the reasons that this meeting is critical: the front end of data production is quite literally the tsunami that several people referred to yesterday. My presentation is already out of date, since it is four days old, but as of four days ago there were 1,110 published complete genomes in the public literature. There are also 111 archaeal complete genomes, 3,342 ongoing bacterial projects, 1,165 ongoing eukaryotic genomes, and 200 metagenomes, for a total of nearly 6,000 sequencing projects of biological organisms that are in various stages of completion. It will be a big challenge to deal effectively with all this. [42]

In the future, single-cell projects will provide another major source of data. It is extraordinarily exciting to be able to sequence the genome of a single cell without growing it. However, it will also be another source of microbial data with which a commons is going to have to deal.

The data flood is not stopping. It is not leveling off. It is increasing. Potential future projects that the Joint Genome Institute is talking about are in the terabase range, that is, trillions of base pairs. The institute is also engaged in some international projects.

All of this information is deposited in the Integrated Microbial Genomes (IMG) system. The IMG is a data management and analysis platform designed to get value from the sequence data produced by the Joint Genome Institute and other places.

Another facility that we support is the Environmental Molecular Sciences Laboratory (EMSL), which has high-throughput capabilities in nuclear magnetic resonance, mass spectrometry, reaction chemistries, molecular sciences computing, and so forth. We are aggressively exploring ways of putting these two facilities together.

In the future, we hope to issue a call for projects that entail both Joint Genome Institute sequencing and EMSL proteomic analyses—the kinds of projects that neither of those two facilities could do by itself but which, if they work together, can be tremendously valuable and provide yet another kind of data that a commons would want to include.

Our data sharing policies state that any publishable information resulting from research that we have funded “must conform to community recognized standard formats when they exist, be clearly attributable, and be deposited within a community recognized public database(s) appropriate for the research conducted.” There is no time element here, and it is left up to the community to determine what the standards should be. In sequencing, we have moved to the immediate release of raw reads, and reserved analyses of more than 6 months are discouraged. Twelve months is the absolute maximum we will hold onto data without releasing it. A reserved analysis is anything that would compete with the stated scientific aims of the submitter of the project. We are also launching a knowledge base initiative to accelerate research and the integration and cross-referencing of data.

To sum up, there is just so much data being produced so rapidly that you feel that the rest of biology is not keeping up. I think this effort by the National Research Council is critically important.



[42] As of the end of February 2011, there were 1,627 published complete genomes in the public literature. There are also 211 archaeal complete genomes, 5,790 ongoing bacterial projects, 2,002 ongoing eukaryotic genomes, and 308 metagenomes, for a total of nearly 10,000 sequencing projects of biological organisms that are in various stages of completion. Source: Genomes OnLine Database (GOLD), http://www.genomesonline.org/cgi-bin/GOLD/bin/gold.cgi. This only underscores the challenges that collectively we (and a microbial commons effort) face.

Copyright © 2011, National Academy of Sciences.
Bookshelf ID: NBK92722

