NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Roundtable on Translating Genomic-Based Research for Health; Board on Health Sciences Policy; Institute of Medicine. Improving the Efficiency and Effectiveness of Genomic Science Translation: Workshop Summary. Washington (DC): National Academies Press (US); 2014 Feb 11.

Improving the Efficiency and Effectiveness of Genomic Science Translation: Workshop Summary.

3 Moving Basic Science Forward

Important Points Highlighted by Individual Speakers

  • Online resources, including data, research services, and biological specimen repositories, provide a wealth of largely untapped existing information that is publicly available and that could be used to accelerate discovery and development.
  • Provision of incentives for sharing of all publicly funded data could maximize the full potential of existing data.
  • Open data sharing needs to be enforced by funding agencies and scientific journals for meaningful change to occur.
  • Alignment of research goals with academic incentives is needed to foster cross-discipline collaboration, training, and innovation and to provide a focus on outcomes that bring value to society.
  • The use of a systems biology approach to study complex diseases would allow the integration of multiple levels of genetic and phenotypic data and more rapid and extensive hypothesis testing.

An effective way to move the translation of genomic science forward is to change the culture through the more complete integration of systems biology approaches and open sharing of data and resources. Atul Butte, chief of the Division of Systems Medicine at the Center for Pediatric Bioinformatics of Stanford University, described the wealth of tools that are already available online and the ways in which those tools are changing the conduct of research and the training of researchers. Daniel Geschwind, Gordon and Virginia MacDonald Distinguished Chair in Human Genetics and professor of neurology and psychiatry at the University of California, Los Angeles (UCLA), School of Medicine, used autism research as an example of the potential of the systems biology approach to be used to study a complex issue in neuroscience in which the use of cross-disciplinary tools could yield more rapid progress.


As an example of the power of modern technologies, Butte showed the audience an Affymetrix gene chip, a device the size of a thumbnail that can quantitate the expression of every gene in the genome. This invention, which has been around for 15 years, is now commonly used by laboratories at universities and academic medical centers, and such chips have become commodity items in academic research.

In August 2012, more than 1 million gene expression datasets were available to the public, and that number is rapidly increasing (Baker, 2012). Since the early 2000s, journals have been requiring that the data from gene expression studies that they publish be deposited into publicly accessible international repositories for others to use, “and to me that is the most enabling thing in the world,” said Butte.

Making Data and Resources Available

Anyone can now download microarray datasets and conduct original research by using the contents. “A high school kid can do this,” Butte said. Researchers can now have access to 20,000, 30,000, or 40,000 samples of any cancer or any disease that they want. This collection of open data contains more breast cancer samples, for example, than any breast cancer researcher will ever have in his or her lab, Butte said.

Furthermore, the number of available datasets is continually expanding, encompassing not just basic research data but clinical data and data of direct relevance to industry. “It's just sitting there waiting for people to exploit it,” he said. For example, the Framingham Heart Study, a longitudinal study started in 1948 by the National Heart, Lung, and Blood Institute and Boston University, has collected thousands of genotypes and years of clinical and research data that are openly available for downloading. ImmPort is another resource that contains an archive of genomic, proteomic, and immunological data from research supported by the Division of Allergy, Immunology, and Transplantation of the National Institute of Allergy and Infectious Diseases.

In addition to datasets available on the Internet, Butte discussed other resources. For example, contract research organizations (CROs) provide a variety of both basic and clinical research services. As one example, Butte described a website called AssayDepot, which bills itself as the “marketplace for pharmaceutical research services” by providing drug testing in a wide variety of mouse models. For $9,000, a CRO accessed through AssayDepot can run a test of a diabetes drug for 28 days in 16 mice from The Jackson Laboratory, divided into treatment cohorts. The CRO will also perform fasting blood sugar, insulin tolerance, and glucose tolerance tests in a blinded format. Butte acknowledged that researchers have concerns about the quality of experiments purchased from these organizations. He said that because companies all over the world are providing such services, results can be independently verified in replicate experiments ordered from independent labs. Running the actual validation experiment is “not a rate-limiting step anymore,” Butte explained, but is only a matter of funding to complete these experiments. For example, instead of researchers arranging for the transfer of mice from one institution to another and then waiting for the mice to breed, “why not just pay [the original lab] to do the experiment because they have the mice already?” Furthermore, Butte said that private companies can often send researchers biological specimens much faster than academic institutions can prepare them for study; he regularly orders tissue and sera samples from privately run banks. It is not just about making the data available, he said, but also about making the resources available.

Burke inquired about validation problems that may arise when using an outsourcing model for experiments. Butte acknowledged that there is no one lab that will have the expertise in using every mouse model but that these organizations will have experience with the more common experiments and techniques. If a specialized technique is needed, that is when it may be more appropriate to collaborate with other academic labs to complete the experiments, he said.

Biomedical research has been slow to take advantage of the services available today, but that is changing, said Butte. Today, the production of clinical and molecular data, the application of statistical and computational methods, and the validation of a drug or biomarkers are all commodity services. “I'm just waiting for the next Genentech to start in a garage, [or] the next Merck or Pfizer to start in a dorm room,” Butte said, because with a credit card you have 1 million microarrays and every possible mouse model in your garage today.

Cross-Disciplinary Collaborations

The only item that cannot be outsourced, said Butte, is a good question, and formulation of these questions requires an understanding of unmet medical needs. In this regard, it is crucial to encourage collaborations among groups that would not otherwise be connected. Computational experts as well as clinicians and patients, for example, are all needed to devise an appropriate approach to solving specific problems. Effective cross-disciplinary conversations could be incentivized through late-stage training grants, Butte said.

Accelerating Drug Development with Available Tools

To demonstrate the point that new discoveries can be made with existing, publicly available data and resources that are ready to go, Butte described projects that he and his colleagues have been pursuing in their laboratory at Stanford. The first one involves type 2 diabetes, which affects between 90 and 95 percent of the almost 19 million people diagnosed with diabetes in the United States; the estimated total medical costs for those with diabetes in the United States are $174 billion (CDC, 2011). Although many drugs are available to improve insulin response and regulation, more effective drugs are needed, said Butte.

In their study of diabetes, Butte and his colleagues identified the top differentially expressed genes that are associated with diabetes by comparing expression-based genome-wide association data from 130 publicly available independent experiments (Kodama et al., 2012). For example, after comparison of the expression of RNA in tissue from nondiabetic and diabetic patients through the use of existing, available data, a cell surface inflammatory receptor, CD44, was found to be differentially expressed more than any other gene in these types of experiments. Butte's group then examined CD44 expression levels in healthy wild-type mice of the C57BL/6J strain, which are known to become obese and develop insulin resistance when fed high-fat diets. Butte's group found that CD44 expression levels were increased in the group of mice fed a high-fat diet (obese) compared with the levels in the group fed a normal-fat diet. Specifically, CD44 expression was increased in the inflammatory infiltrate in the adipose tissue of the obese mice. Next, Butte and his colleagues obtained a commercially available mouse strain in which the CD44 gene was deleted (CD44-/- mice). This strain was developed more than a decade ago but had not yet been tested for glucose levels, he said. Experiments determined that CD44-/- mice had greater sensitivity to insulin than wild-type mice and did not die from diabetes. When Butte and his team gave commercially available CD44-blocking antibodies to wild-type mice fed a high-fat diet (obese), the animals' blood sugar levels fell.
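The idea of ranking genes by how consistently they appear as differentially expressed across many independent experiments can be illustrated with a simple vote count. The sketch below is only an illustration under stated assumptions: it is not the statistical method used by Kodama et al., the function name `rank_genes_by_recurrence` is hypothetical, and the experiment and gene labels are invented placeholders rather than real results.

```python
from collections import Counter


def rank_genes_by_recurrence(experiments):
    """Rank genes by the number of experiments that call them differentially expressed.

    `experiments` maps an experiment label to the set of gene symbols that
    the experiment reported as differentially expressed (hypothetical input
    format, assumed for this sketch).
    Returns a list of (gene, hit_count, fraction_of_experiments) tuples,
    most recurrent gene first; ties are broken alphabetically.
    """
    counts = Counter()
    for genes in experiments.values():
        # set() guards against a gene being listed twice in one experiment
        counts.update(set(genes))
    total = len(experiments)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, hits, hits / total) for gene, hits in ranked]


# Invented placeholder data: three experiments, four placeholder genes.
toy_experiments = {
    "experiment_1": {"GENE_A", "GENE_B", "GENE_C"},
    "experiment_2": {"GENE_A", "GENE_C"},
    "experiment_3": {"GENE_A", "GENE_D"},
}

ranking = rank_genes_by_recurrence(toy_experiments)
for gene, hits, fraction in ranking:
    print(f"{gene}: differentially expressed in {hits} of {len(toy_experiments)} experiments ({fraction:.0%})")
```

A plain vote count treats every experiment equally; a real analysis would also need to account for tissue type, sample size, and the statistical threshold used in each experiment.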

The CD44 protein makes an attractive therapeutic target because it is abundant in adipose tissue, which is readily available for study. Furthermore, CD44 can be cleaved from the cell surface and shed into blood in its soluble form, making it easy to measure. Identification of a new therapeutic target for type 2 diabetes “took us about 18 months of work, starting from the same data any high school kid can get today,” said Butte. “All those [data] are sitting there. Everyone devalues them. Because they're on the Internet, that must mean they're not valuable. But it's all of our excellent peer investigators putting [those] data out there.”

The development of drugs by these new approaches can be much faster than traditional approaches. For example, Butte described a drug that can “melt away” lung cancer tumors and that went from computational prediction to use in humans in just 15 months. “I'm a huge optimist that we can do this, but it means [developing] a new set of skills [and] not just finding people who can generate more and more data to get higher and higher resolution but … [taking] advantage of the data we've been so great at generating.” This approach could be very widely applied, Butte remarked. “I'm giving my secrets away here. I want more people to be able to do this. This is how to scale, so you get more people interested.”

Butte imparted several lessons from his drug discovery strategy. First, he indicated that public molecular data have incredible utility. For this reason, all data from publicly funded research should eventually become publicly available and secondary uses of the data and the development of computational approaches need to be encouraged and funded. Second, enough high-quality data that can have a major impact on medicine already exist. He suggested that the data do not need to be perfect because a requirement for perfection can slow the deposition of data.

Butte also pointed out that funding may need to be provided explicitly for the generation and public release of data; along with funding for mice and equipment, specific funding should be designated for data sharing. “We should have a [budget line item] there for data sharing; make sure it's funded, so that there are no excuses. You can't say you didn't get enough money to put [those data] out there in the repositories,” Butte said. Fund the sharing of those data, he indicated.

He also mentioned that the infrastructure for data reuse needs to be not just developed but also used so that others can learn how the data can be put to good use. Furthermore, Butte said, “sticks seem to work better than carrots” for catalyzing change; funders and journals need to enforce data sharing to change the culture of the research enterprise.

Lastly, Butte emphasized the importance of training of students of all ages since researchers and clinicians are all lifelong students. He indicated that training is the best way for scaling to have an impact. To emphasize this point, he described a program at Stanford University called SPARK that helps academicians move research innovations from the bench to the bedside by educating faculty members, postdoctoral fellows, and graduate students about the translational research process so that the development of promising discoveries becomes second nature. Student involvement in innovation and entrepreneurship is the best way to scale up innovations, said Butte. “You want this to change the world? … You've got to get more of the younger folks involved.”


The process of translation from discovery to therapeutics often occurs along a linear path in multiple laboratories, with relatively weak connections existing among institutions. A better approach, said Geschwind, would be to incorporate a multidisciplinary systems biology approach in which different activities can inform each other. Multiple levels of genetic and phenotypic data could be integrated to yield more rapid and extensive progress.

Using autism as an example, Geschwind illustrated this integrated approach. Autism spectrum disorder (ASD) is a complex neuropsychiatric condition that includes overlapping phenotypes, including social communication deficits, language deficits, restrictive or repetitive behaviors, anxiety or attention deficit-hyperactivity disorder, and other medical comorbidities. In some cases it is caused by rare single nucleotide variants or copy number variations. In other cases, a mixture of common and rare genetic variants with different effect sizes seems to be involved. “Genetic variants accounting for 20 percent of ASD risk have been identified, and most of these are de novo genetic variants,” Geschwind said (Abrahams and Geschwind, 2008). Environmental factors are likely to be involved as well, though the influence of these factors is largely unknown. Genetic variants have been studied in the human brain and in animal models to understand dysfunction at the level of brain circuits, synapses, cells, and behaviors; and several new pharmacological treatments are in early trials. “It's a classically complex genetic problem,” said Geschwind.

In the late 1990s, Geschwind helped set up the Autism Genetic Resource Exchange (AGRE), which now includes more than 10,000 DNA samples and data from more than 1,500 families, including clinical data that have been extensively validated, which ensures that the data are of high quality. AGRE is an open resource shared with the scientific community, and the exchange has greatly accelerated the pace of sample collection and research, contributing to more than 200 publications since 2001. “Many of the major findings aren't from my lab,” said Geschwind. “They're from people who had different ideas than we did, and they … were better than our ideas.”

The quality of the data is an important consideration, according to Geschwind. The data in databases need to be validated to be used with confidence. Also, bigger is not necessarily better. “There has to be some thought put into what's going to be in the data.”

The data from AGRE have helped reveal an autism locus on almost every human chromosome, demonstrating the heterogeneity of this syndrome (Abrahams and Geschwind, 2008). Exome sequencing studies using the Simons Simplex Collection of data for families with one autistic child and unaffected parents and siblings have also revealed many hundreds of de novo mutations that contribute to autism with various effect sizes (Iossifov et al., 2012; O'Roak et al., 2012; Sanders et al., 2012). No single mutation accounts for more than 1 percent of cases, and many mutations have reduced penetrance. In addition, most of the mutations with large effect sizes are associated with other things as well, such as schizophrenia, intellectual disability, developmental delay, or learning disability. These results should not be surprising, Geschwind said, but they emphasize the fact that diagnostic categories are not very helpful in identifying genes involved in ASD.

Stratification of Disease by Genotype

The association of genetic factors with specific, measurable components of each disease, such as language and social behavior, called “endophenotypes,” will be stronger than their association with the clinical diagnosis alone. To be most useful for genetic analysis, endophenotypes should be associated with the disease, heritable, relatively stable, identified in first-degree relatives more often than in the general population, and quantifiable. For example, a gene might be involved in social cognition, working memory, or implicit learning, “psychological constructs that are much more elemental than the broad diagnostic category of autism,” Geschwind said.

In neuroscience, functional imaging can be used to study the genetic variant as a biomarker or endophenotype related to the disorder because variants are not disease specific. In this way, personalized medicine can reduce heterogeneity by stratifying disease by genotype, said Geschwind, though so far this has not been a focus of translational research.

Geschwind described a mouse model of autism as an example of such research. Studies of the complex social behavior of mice in the mid-2000s discovered that mice communicate with ultrasonic vocalizations (Holy and Guo, 2005). This information was used to study mice with a knockout of the gene, contactin-associated protein-like 2 (CNTNAP2), which is associated with an increased risk of ASD in humans; CNTNAP2 mutations are associated with autism in roughly 70 percent of patients with this variant. These patients also have reduced communication, reduced sociability, increased repetitive behaviors, and increased hyperactivity. Furthermore, mice treated with oxytocin as a prosocial promoter therapy showed fewer effects, suggesting that the mouse model has predictive value and can be studied further to understand the mechanism of action of the drug and as a screening tool for other potential therapies, Geschwind said.

Key Approaches for Studies in Neuroscience

Network biology can provide an integrated view of the core drivers of ASD, Geschwind said. The functional relationships of several thousand genes can be reduced to groups of coexpressed gene modules that correspond to key elements of biological function. Within modules, the most central “hub” genes can be identified, and the network structure serves as the basis for making experimental predictions, testing causal and regulatory relationships, and integrating large sets of data with other sets of data.
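The notion of a “hub” gene within a coexpression module can be made concrete with a small sketch. It assumes, purely for illustration, that a gene's centrality is the sum of its absolute Pearson correlations with the other genes in the module; published network methods typically use more elaborate weighting. The function names and the expression values are hypothetical and are not drawn from any study described here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def find_hub_gene(module_expression):
    """Return (hub_gene, connectivity) for one module.

    `module_expression` maps a gene label to its expression values across
    the same ordered set of samples. Connectivity of a gene is defined here
    (an assumption of this sketch) as the sum of |correlation| with every
    other gene in the module.
    """
    connectivity = {}
    for gene, values in module_expression.items():
        connectivity[gene] = sum(
            abs(pearson(values, other_values))
            for other, other_values in module_expression.items()
            if other != gene
        )
    hub = max(connectivity, key=connectivity.get)
    return hub, connectivity


# Invented expression values for three placeholder genes in five samples.
toy_module = {
    "GENE_A": [1, 2, 3, 4, 5],
    "GENE_B": [1, 2, 3, 5, 4],
    "GENE_C": [2, 1, 3, 4, 5],
}

hub, connectivity = find_hub_gene(toy_module)
print("hub gene:", hub)
for gene, score in sorted(connectivity.items()):
    print(f"{gene}: connectivity {score:.2f}")
```

In this toy module GENE_A correlates strongly with both other genes, while GENE_B and GENE_C correlate less strongly with each other, so GENE_A has the highest connectivity.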

An understanding of the multiple levels of disease-associated dysfunction that lead to abnormal behavior and cognition calls for systems biology approaches that are multidisciplinary and collaborative and that allow more rapid hypothesis testing, Geschwind said. A fundamental question is whether a therapy developed for one form of autism is relevant for another. “Are we talking about a thousand different drugs that we need, or can we coalesce this into 10 or 20 different pathways?” More genetic information will be needed to answer such questions, he said.

Because patients with brain disease cannot contribute tissue for diagnosis or research, researchers rely on live imaging studies and mouse studies, which can provide mechanistic insights, but these insights typically do not extend to the development of drugs. These limitations are why human induced pluripotent stem cell–derived neurons may provide a way to study developmental function and may guide studies in mice for the development of therapies, said Geschwind.

Electronic Medical Record–Based Genomics

Genetic and phenotypic heterogeneity requires large numbers of samples to be examined to pinpoint the genetic variation. Etiological overlap means that studies will need to extend across disorders and measure appropriate phenotypes, which will be greatly furthered by the use of electronic medical records (EMRs) for storing data that could be used for research. “I think the future is in EMR-based genetics,” Geschwind said. UCLA already does exome sequencing as a clinical test, and soon it will do genomic sequencing on many of its patients and certainly those patients in neurology. The data will be put into EMRs, and if the data are good enough, they can be used for research. “We'll have the millions of people that we need instead of the 10,000 [whose genomes] I can sequence in my lab or with a group of labs.”


Both speakers commented on the need for a culture change in biomedical research if the field is to move forward with the successful translation of basic science. In addition to increases in communication and collaboration across silos, Geschwind noted that different types of incentives, focused on the goal of understanding disease, are also needed to make progress. At present, a main goal of a researcher is to achieve academic tenure by publishing papers, but this is not the reward system needed to improve the translation of scientific discoveries to the clinic. “If your goal is to solve disease, those are the wrong incentives—papers don't solve diseases.”

Furthermore, academic institutions provide few incentives to collaborate because of the competition for tenured positions. Geschwind pointed out that clear, socially responsible goals would help align incentives. A workshop participant commented that the current tenured generation of scientists can be empowered to set different incentives but will need to accept the fact that these incentives are different from those that they were accustomed to in the past.

Butte mentioned that a career in the biomedical sciences may attract future generations of scientists if potential scientists see that the establishment is changing to become more collaborative, innovative, and open to change. Members of the younger generation are likely to choose a career path that they perceive to be more entrepreneurial. If scientists work together to embrace innovation, that might change the perception of the field. “To get to big science, we're going to need team science,” he said.

Copyright 2014 by the National Academy of Sciences. All rights reserved.
Bookshelf ID: NBK201425

