Search results

1.
Inf Fusion. 2019 Oct;50:71-91. doi: 10.1016/j.inffus.2018.09.012. Epub 2018 Sep 21.

Machine Learning for Integrating Data in Biology and Medicine: Principles, Practice, and Opportunities.

Author information

1
Department of Computer Science, Stanford University, Stanford, CA, USA.
2
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
3
Princess Margaret Cancer Centre, Toronto, ON, Canada.
4
Hikvision Research Institute, Santa Clara, CA, USA.
5
Chan Zuckerberg Biohub, San Francisco, CA, USA.
6
Genetics & Genome Biology, SickKids Research Institute, Toronto, ON, Canada.
7
Department of Computer Science, University of Toronto, Toronto, ON, Canada.
8
Vector Institute, Toronto, ON, Canada.

Abstract

New technologies have enabled the investigation of biology and human health at an unprecedented scale and in multiple dimensions. These dimensions include myriad properties describing genome, epigenome, transcriptome, microbiome, phenotype, and lifestyle. No single data type, however, can capture the complexity of all the factors relevant to understanding a phenomenon such as a disease. Integrative methods that combine data from multiple technologies have thus emerged as critical statistical and computational approaches. The key challenge in developing such approaches is the identification of effective models to provide a comprehensive and relevant systems view. An ideal method can answer a biological or medical question, identifying important features and predicting outcomes, by harnessing heterogeneous data across several dimensions of biological variation. In this Review, we describe the principles of data integration and discuss current methods and available implementations. We provide examples of successful data integration in biology and medicine. Finally, we discuss current challenges in biomedical integrative methods and our perspective on the future development of the field.

KEYWORDS:

computational biology; heterogeneous data; machine learning; personalized medicine; systems biology

PMID:
30467459
PMCID:
PMC6242341
[Available on 2020-10-01]
DOI:
10.1016/j.inffus.2018.09.012
2.
Nature. 2018 Nov;563(7732):579-583. doi: 10.1038/s41586-018-0703-0. Epub 2018 Nov 14.

Sensitive tumour detection and classification using plasma cell-free DNA methylomes.

Author information

1
Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada.
2
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada.
3
Memorial Sloan Kettering Cancer Center, New York, NY, USA.
4
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
5
Genome Technologies, Ontario Institute for Cancer Research, Toronto, Ontario, Canada.
6
UMR_S 1236, Univ Rennes 1, Inserm, Etablissement Français du sang Bretagne, Rennes, France.
7
Department of Biochemistry and Molecular Medicine, UC Davis Comprehensive Cancer Center, Sacramento, CA, USA.
8
Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada.
9
Fred Litwin Centre for Cancer Genetics, Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada.
10
Department of Surgery, Toronto General Hospital, Toronto, Ontario, Canada.
11
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
12
Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada. rayjean.hung@lunenfeld.ca.
13
Division of Epidemiology, Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada. rayjean.hung@lunenfeld.ca.
14
Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada. ddecarv@uhnresearch.ca.
15
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada. ddecarv@uhnresearch.ca.

Abstract

The use of liquid biopsies for cancer detection and management is rapidly gaining prominence [1]. Current methods for the detection of circulating tumour DNA involve sequencing somatic mutations using cell-free DNA, but the sensitivity of these methods may be low among patients with early-stage cancer given the limited number of recurrent mutations [2-5]. By contrast, large-scale epigenetic alterations, which are tissue- and cancer-type specific, are not similarly constrained [6] and therefore potentially have greater ability to detect and classify cancers in patients with early-stage disease. Here we develop a sensitive, immunoprecipitation-based protocol to analyse the methylome of small quantities of circulating cell-free DNA, and demonstrate the ability to detect large-scale DNA methylation changes that are enriched for tumour-specific patterns. We also demonstrate robust performance in cancer detection and classification across an extensive collection of plasma samples from several tumour types. This work sets the stage to establish biomarkers for the minimally invasive detection, interception and classification of early-stage cancers based on plasma cell-free DNA methylation patterns.

3.
Nucleic Acids Res. 2018 Nov 16;46(20):e120. doi: 10.1093/nar/gky677.

Umap and Bismap: quantifying genome and methylome mappability.

Author information

1
Princess Margaret Cancer Centre, M5G 1L7, Toronto, ON, Canada.
2
Department of Medical Biophysics, M5G 1L7, University of Toronto, Toronto, ON, Canada.
3
Vector Institute, M5G 1M1, Toronto, ON, Canada.
4
Department of Human Genetics, McGill University, H3A 0C7, Montreal, QC, Canada.
5
Department of Genetics, Stanford University, 94305-9025, Stanford, CA, USA.
6
Department of Computer Science, Stanford University, 94305-5120, Stanford, CA, USA.
7
Department of Computer Science, University of Toronto, M5S 2E4, Toronto, ON, Canada.

Abstract

Short-read sequencing enables assessment of genetic and biochemical traits of individual genomic regions, such as the location of genetic variation, protein binding and chemical modifications. Every region in a genome assembly has a property called 'mappability', which measures the extent to which it can be uniquely mapped by sequence reads. In regions of lower mappability, estimates of genomic and epigenomic characteristics from sequencing assays are less reliable. These regions have increased susceptibility to spurious mapping from reads from other regions of the genome with sequencing errors or unexpected genetic variation. Bisulfite sequencing approaches used to identify DNA methylation exacerbate these problems by introducing large numbers of reads that map to multiple regions. Both to correct assumptions of uniformity in downstream analysis and to identify regions where the analysis is less reliable, it is necessary to know the mappability of both ordinary and bisulfite-converted genomes. We introduce the Umap software for identifying uniquely mappable regions of any genome. Its Bismap extension identifies mappability of the bisulfite-converted genome. A Umap and Bismap track hub for human genome assemblies GRCh37/hg19 and GRCh38/hg38, and mouse assemblies GRCm37/mm9 and GRCm38/mm10 is available at https://bismap.hoffmanlab.org for use with genome browsers.

4.
J R Soc Interface. 2018 Apr;15(141). pii: 20170387. doi: 10.1098/rsif.2017.0387.

Opportunities and obstacles for deep learning in biology and medicine.

Author information

1
Molecular Biosciences and Bioengineering Graduate Program, University of Hawaii at Manoa, Honolulu, HI, USA.
2
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
3
Genomics and Computational Biology Graduate Group, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
4
Department of Computational Medicine and Bioinformatics, University of Michigan Medical School, Ann Arbor, MI, USA.
5
Harvard Medical School, Boston, MA, USA.
6
Computational Biology and Stats, Target Sciences, GlaxoSmithKline, Stevenage, UK.
7
Data Science Institute, Imperial College London, London, UK.
8
Princess Margaret Cancer Centre, Toronto, Ontario, Canada.
9
Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
10
Department of Computer Science, University of Toronto, Toronto, Ontario, Canada.
11
Electrical Engineering and Computer Science, Vanderbilt University, Nashville, TN, USA.
12
Ecological and Evolutionary Signal-processing and Informatics Laboratory, Department of Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA.
13
Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA.
14
Biophysics Program, Stanford University, Stanford, CA, USA.
15
Department of Computer Science, University of Virginia, Charlottesville, VA, USA.
16
Imaging Platform, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
17
Department of Computer Science, Stanford University, Stanford, CA, USA.
18
Toyota Technological Institute at Chicago, Chicago, IL, USA.
19
Department of Computer Science, Trinity University, San Antonio, TX, USA.
20
Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ, USA.
21
Integrative Bioinformatics, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC, USA.
22
Howard Hughes Medical Institute, Janelia Research Campus, Ashburn, VA, USA.
23
National Center for Biotechnology Information and National Library of Medicine, National Institutes of Health, Bethesda, MD, USA.
24
Department of Wildlife Ecology and Conservation, University of Florida, Gainesville, FL, USA.
25
ClosedLoop.ai, Austin, TX, USA.
26
Department of Genetics, Stanford University, Stanford, CA, USA.
27
Division of Biomedical Informatics and Personalized Medicine, University of Colorado School of Medicine, Aurora, CO, USA.
28
Institute of Organic Chemistry, Westfälische Wilhelms-Universität Münster, Münster, Germany.
29
Innovation Center for Biomedical Informatics, Georgetown University Medical Center, Washington, DC, USA.
30
Department of Pathology and Immunology, Washington University in Saint Louis, St Louis, MO, USA.
31
Department of Medicine, Brown University, Providence, RI, USA.
32
Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI, USA gitter@biostat.wisc.edu.
33
Morgridge Institute for Research, Madison, WI, USA.
34
Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA greenescientist@gmail.com.

Abstract

Deep learning describes a class of machine learning algorithms that are capable of combining raw inputs into layers of intermediate features. These algorithms have recently shown impressive results across a variety of domains. Biology and medicine are data-rich disciplines, but the data are complex and often ill-understood. Hence, deep learning techniques may be particularly well suited to solve problems of these fields. We examine applications of deep learning to a variety of biomedical problems (patient classification, fundamental biological processes and treatment of patients) and discuss whether deep learning will be able to transform these tasks or if the biomedical sphere poses unique challenges. Following from an extensive literature review, we find that deep learning has yet to revolutionize biomedicine or definitively resolve any of the most pressing challenges in the field, but promising advances have been made on the prior state of the art. Even though improvements over previous baselines have been modest in general, the recent progress indicates that deep learning methods will provide valuable means for speeding up or aiding human investigation. Though progress has been made linking a specific neural network's prediction to input features, understanding how users should interpret these models to make testable hypotheses about the system under study remains an open challenge. Furthermore, the limited amount of labelled data for training presents problems in some domains, as do legal and privacy constraints on work with sensitive health records. Nonetheless, we foresee deep learning enabling changes at both bench and bedside with the potential to transform several areas of biology and medicine.

KEYWORDS:

deep learning; genomics; machine learning; precision medicine

5.
Proc Natl Acad Sci U S A. 2018 Feb 20;115(8):1690-1692. doi: 10.1073/pnas.1800256115. Epub 2018 Feb 12.

Classification and interaction in random forests.

Author information

1
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada M5G 1L7.
2
Princess Margaret Cancer Centre, Toronto, ON, Canada M5G 1L7.
3
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada M5G 1L7; michael.hoffman@utoronto.ca.
4
Department of Computer Science, University of Toronto, Toronto, ON, Canada M5S 3G4.
PMID:
29440440
PMCID:
PMC5828645
DOI:
10.1073/pnas.1800256115
[Indexed for MEDLINE]
Free PMC Article
6.
Bioinformatics. 2018 Feb 15;34(4):669-671. doi: 10.1093/bioinformatics/btx603.

Segway 2.0: Gaussian mixture models and minibatch training.

Author information

1
Princess Margaret Cancer Centre, Toronto, ON M5G 1L7, Canada.
2
Engineering Physics Program, University of British Columbia, Vancouver, BC V6T 1Z1, Canada.
3
Department of Computer Science and Engineering.
4
Department of Electrical Engineering.
5
Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
6
Department of Computer Science, University of Toronto, Toronto, ON M5S 3G4, Canada.
7
Department of Medical Biophysics, University of Toronto, Toronto, ON M5G 1L7, Canada.

Abstract

Summary:

Segway performs semi-automated genome annotation, discovering joint patterns across multiple genomic signal datasets. We discuss a major new version of Segway and highlight its ability to model data with substantially greater accuracy. Major enhancements in Segway 2.0 include the ability to model data with a mixture of Gaussians, enabling capture of arbitrarily complex signal distributions, and minibatch training, leading to better learned parameters.

Availability and implementation:

Segway and its source code are freely available for download at http://segway.hoffmanlab.org. We have made available scripts (https://doi.org/10.5281/zenodo.802939) and datasets (https://doi.org/10.5281/zenodo.802906) for this paper's analysis.

Contact:

michael.hoffman@utoronto.ca.

Supplementary information:

Supplementary data are available at Bioinformatics online.

PMID:
29028889
PMCID:
PMC5860603
DOI:
10.1093/bioinformatics/btx603
[Indexed for MEDLINE]
Free PMC Article
7.
Genome Biol. 2017 Jan 16;18(1):5. doi: 10.1186/s13059-016-1135-5.

Genome Informatics 2016.

Author information

1
Princess Margaret Cancer Centre, Toronto, Canada.
2
Princess Margaret Cancer Centre, Toronto, Canada. michael.hoffman@utoronto.ca.
3
Department of Medical Biophysics, University of Toronto, Toronto, Canada. michael.hoffman@utoronto.ca.
4
Department of Computer Science, University of Toronto, Toronto, Canada. michael.hoffman@utoronto.ca.

Abstract

A report on the Genome Informatics conference, held at the Wellcome Genome Campus Conference Centre, Hinxton, United Kingdom, 19-22 September 2016.

PMID:
28093077
PMCID:
PMC5240446
DOI:
10.1186/s13059-016-1135-5
[Indexed for MEDLINE]
Free PMC Article
8.
Brief Bioinform. 2018 Jul 20;19(4):693-699. doi: 10.1093/bib/bbw134.

Top considerations for creating bioinformatics software documentation.

Author information

1
Department of Medical Biophysics, University of Toronto, Toronto, Canada.

Abstract

Investing in documenting your bioinformatics software well can increase its impact and save your time. To maximize the effectiveness of your documentation, we suggest following a few guidelines we propose here. We recommend providing multiple avenues for users to use your research software, including a navigable HTML interface with a quick start, useful help messages with detailed explanation and thorough examples for each feature of your software. By following these guidelines, you can assure that your hard work maximally benefits yourself and others.

9.
Genome Biol. 2016 Apr 30;17:82. doi: 10.1186/s13059-016-0925-0.

ChromNet: Learning the human chromatin network from all ENCODE ChIP-seq data.

Author information

1
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.
2
Department of Medical Biophysics, University of Toronto, Toronto, ON, Canada.
3
Princess Margaret Cancer Centre, Toronto, ON, Canada.
4
Department of Computer Science, University of Toronto, Toronto, ON, Canada.
5
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA. suinlee@uw.edu.
6
Department of Genome Sciences, University of Washington, Seattle, WA, USA. suinlee@uw.edu.

Abstract

A cell's epigenome arises from interactions among regulatory factors (transcription factors and histone modifications) co-localized at particular genomic regions. We developed a novel statistical method, ChromNet, to infer a network of these interactions, the chromatin network, by inferring conditional-dependence relationships among a large number of ChIP-seq data sets. We applied ChromNet to all available 1451 ChIP-seq data sets from the ENCODE Project, and showed that ChromNet revealed previously known physical interactions better than alternative approaches. We experimentally validated one of the previously unreported interactions, MYC-HCFC1. An interactive visualization tool is available at http://chromnet.cs.washington.edu.

PMID:
27139377
PMCID:
PMC4852466
DOI:
10.1186/s13059-016-0925-0
[Indexed for MEDLINE]
Free PMC Article
10.
Nat Methods. 2015 Mar;12(3):191-2. doi: 10.1038/nmeth.3291.

Determining the epigenome using DNA alone.

Author information

1
Department of Computer Science, University of Toronto, and Princess Margaret Cancer Centre, Toronto, Ontario, Canada.
2
[1] Department of Computer Science, University of Toronto, and Princess Margaret Cancer Centre, Toronto, Ontario, Canada [2] Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada.
PMID:
25719827
DOI:
10.1038/nmeth.3291
[Indexed for MEDLINE]
11.
Genome Res. 2015 Apr;25(4):544-57. doi: 10.1101/gr.184341.114. Epub 2015 Feb 12.

Joint annotation of chromatin state and chromatin conformation reveals relationships among domain types and identifies domains of cell-type-specific expression.

Author information

1
Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA.
2
Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
3
Princess Margaret Cancer Centre, University of Toronto, ON M5G 1L7, Canada; Department of Medical Biophysics, University of Toronto, ON M5G 1L7, Canada.
4
Department of Biological Science, The Florida State University, Tallahassee, Florida 32304, USA.
5
Department of Electrical Engineering, University of Washington, Seattle, Washington 98195, USA.
6
Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA; Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

Abstract

The genomic neighborhood of a gene influences its activity, a behavior that is attributable in part to domain-scale regulation. Previous genomic studies have identified many types of regulatory domains. However, due to the difficulty of integrating genomics data sets, the relationships among these domain types are poorly understood. Semi-automated genome annotation (SAGA) algorithms facilitate human interpretation of heterogeneous collections of genomics data by simultaneously partitioning the human genome and assigning labels to the resulting genomic segments. However, existing SAGA methods cannot integrate inherently pairwise chromatin conformation data. We developed a new computational method, called graph-based regularization (GBR), for expressing a pairwise prior that encourages certain pairs of genomic loci to receive the same label in a genome annotation. We used GBR to exploit chromatin conformation information during genome annotation by encouraging positions that are close in 3D to occupy the same type of domain. Using this approach, we produced a model of chromatin domains in eight human cell types, thereby revealing the relationships among known domain types. Through this model, we identified clusters of tightly regulated genes expressed in only a small number of cell types, which we term "specific expression domains." We found that domain boundaries marked by promoters and CTCF motifs are consistent between cell types even when domain activity changes. Finally, we showed that GBR can be used to transfer information from well-studied cell types to less well-characterized cell types during genome annotation, making it possible to produce high-quality annotations of the hundreds of cell types with limited available data.

PMID:
25677182
PMCID:
PMC4381526
DOI:
10.1101/gr.184341.114
[Indexed for MEDLINE]
Free PMC Article
12.
Genome Biol. 2015 Jan 24;16:13. doi: 10.1186/s13059-015-0587-3.

Extending reference assembly models.

Abstract

The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.

PMID:
25651527
PMCID:
PMC4305238
DOI:
10.1186/s13059-015-0587-3
[Indexed for MEDLINE]
Free PMC Article
13.
Nature. 2014 Aug 28;512(7515):449-52. doi: 10.1038/nature13415.

Comparative analysis of metazoan chromatin organization.

Author information

1
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA [3] [4] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
2
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA [3].
3
[1] Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA [2] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02215, USA [3] [4] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
4
Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA.
5
[1] Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA [2] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
6
[1] Department of Information and Computer Engineering, Ajou University, Suwon 443-749, Korea [2] Systems Biomedical Informatics Research Center, College of Medicine, Seoul National University, Seoul 110-799, Korea.
7
[1] Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Lab, Berkeley, California 94720, USA [2] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA [3] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
8
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA [3] Department of Molecular Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA.
9
The Gurdon Institute and Department of Genetics, University of Cambridge, Tennis Court Road, Cambridge CB2 1QN, UK.
10
[1] National Institute of General Medical Sciences, National Institutes of Health, Bethesda, Maryland 20892, USA [2] National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
11
Department of Biology, Washington University in St. Louis, St. Louis, Missouri 63130, USA.
12
[1] Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA [2] Broad Institute, Cambridge, Massachusetts 02141, USA [3] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
13
[1] Department of Biology, Washington University in St. Louis, St. Louis, Missouri 63130, USA [2] Victor Chang Cardiac Research Institute and The University of New South Wales, Sydney, New South Wales 2052, Australia (J.W.K.H.); Department of Biochemistry, University at Buffalo, Buffalo, New York 14203, USA (T.L.); Department of Molecular Biology and Lewis Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08540, USA (K.I., T.E.J.); Department of Human Genetics, University of Chicago, Chicago, Illinois 06037, USA (J.D.L.); Division of Genomic Technologies, Center for Life Science Technologies, RIKEN, Yokohama 230-0045, Japan (A.M.); Department of Genetics, Department of Computer Science, Stanford University, Stanford, California 94305, USA (A.K.); Department of Biology, The University of Alabama at Birmingham, Birmingham, Alabama 35294, USA (N.C.R.).
14
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Program in Bioinformatics, Boston University, Boston, Massachusetts 02215, USA.
15
Department of Molecular, Cell and Developmental Biology, University of California Santa Cruz, Santa Cruz, California 95064, USA.
16
Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China.
17
[1] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115, USA.
18
[1] Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854, USA [2] Food Science and Technology Department, Faculty of Agriculture, Alexandria University, 21545 El-Shatby, Alexandria, Egypt.
19
Department of Pharmacology and Cancer Biology, Duke University Medical Center, Durham, North Carolina 27710, USA.
20
Department of Molecular Biology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA.
21
Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA.
22
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Harvard/MIT Division of Health Sciences and Technology, Cambridge, Massachusetts 02139, USA.
23
Department of Anatomy Physiology and Cell Biology, University of California Davis, Davis, California 95616, USA.
24
Broad Institute, Cambridge, Massachusetts 02141, USA.
25
[1] Department of Biology and Carolina Center for Genome Sciences, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599, USA [2] Department of Biology, Center for Genomics and Systems Biology, New York University, New York, New York 10003, USA.
26
National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA.
27
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Broad Institute, Cambridge, Massachusetts 02141, USA.
28
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA.
29
Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA.
30
Princess Margaret Cancer Centre, Toronto, Ontario M6G 1L7, Canada.
31
[1] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA [2] Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA.
32
[1] Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA [2] Broad Institute, Cambridge, Massachusetts 02141, USA.
33
[1] Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Lab, Berkeley, California 94720, USA [2] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA.
34
Department of Molecular Biology, Cellular Biology and Biochemistry, Brown University, Providence, Rhode Island 02912, USA.
35
Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA.
36
[1] Department of Genome Dynamics, Life Sciences Division, Lawrence Berkeley National Lab, Berkeley, California 94720, USA [2] Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, California 94720, USA [3] Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA.
37
[1] Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854, USA [2] Department of Molecular Biology, Umea University, 901 87 Umea, Sweden.
38
[1] Systems Biomedical Informatics Research Center, College of Medicine, Seoul National University, Seoul 110-799, Korea [2] Seoul National University Biomedical Informatics, Division of Biomedical Informatics, College of Medicine, Seoul National University, Seoul 110-799, Korea.
39
[1] Broad Institute, Cambridge, Massachusetts 02141, USA [2] Howard Hughes Medical Institute, Chevy Chase, Maryland 20815, USA [3] Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts 02114, USA.
40
Department of Molecular Biology and Biochemistry, Rutgers University, Piscataway, New Jersey 08854, USA.
41
[1] Department of Computer Science and Engineering, University of Washington, Seattle, Washington 98195, USA [2] Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.
42
[1] Program in Bioinformatics, Boston University, Boston, Massachusetts 02215, USA [2] Department of Chemistry, Boston University, Boston, Massachusetts 02215, USA.
43
[1] Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA [2] Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Harvard School of Public Health, 450 Brookline Avenue, Boston, Massachusetts 02215, USA [3] Broad Institute, Cambridge, Massachusetts 02141, USA.
44
[1] Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02115, USA [2] Division of Genetics, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts 02115, USA [3] Informatics Program, Children's Hospital, Boston, Massachusetts 02215, USA.

Abstract

Genome function is dynamically regulated in part by chromatin, which consists of the histones, non-histone proteins and RNA molecules that package DNA. Studies in Caenorhabditis elegans and Drosophila melanogaster have contributed substantially to our understanding of molecular mechanisms of genome function in humans, and have revealed conservation of chromatin components and mechanisms. Nevertheless, the three organisms have markedly different genome sizes, chromosome architecture and gene organization. On human and fly chromosomes, for example, pericentric heterochromatin flanks single centromeres, whereas worm chromosomes have dispersed heterochromatin-like regions enriched in the distal chromosomal 'arms', and centromeres distributed along their lengths. To systematically investigate chromatin organization and associated gene regulation across species, we generated and analysed a large collection of genome-wide chromatin data sets from cell lines and developmental stages in worm, fly and human. Here we present over 800 new data sets from our ENCODE and modENCODE consortia, bringing the total to over 1,400. Comparison of combinatorial patterns of histone modifications, nuclear lamina-associated domains, organization of large-scale topological domains, chromatin environment at promoters and enhancers, nucleosome positioning, and DNA replication patterns reveals many conserved features of chromatin organization among the three organisms. We also find notable differences in the composition and locations of repressive chromatin. These data sets and analyses provide a rich resource for comparative and species-specific investigations of chromatin composition, organization and function.

PMID:
25164756
PMCID:
PMC4227084
DOI:
10.1038/nature13415
[Indexed for MEDLINE]
Free PMC Article

14.
Nucleic Acids Res. 2013 Jan;41(2):827-41. doi: 10.1093/nar/gks1284. Epub 2012 Dec 5.

Integrative annotation of chromatin elements from ENCODE data.

Author information

1
Department of Genome Sciences, University of Washington, 3720 15th Ave NE, Seattle, WA 98195-5065, USA.

Abstract

The ENCODE Project has generated a wealth of experimental information mapping diverse chromatin properties in several human cell lines. Although each such data track is independently informative toward the annotation of regulatory elements, their interrelations contain much richer information for the systematic annotation of regulatory elements. To uncover these interrelations and to generate an interpretable summary of the massive datasets of the ENCODE Project, we apply unsupervised learning methodologies, converting dozens of chromatin datasets into discrete annotation maps of regulatory regions and other chromatin elements across the human genome. These methods rediscover and summarize diverse aspects of chromatin architecture, elucidate the interplay between chromatin activity and RNA transcription, and reveal that a large proportion of the genome lies in a quiescent state, even across multiple cell types. The resulting annotations of non-coding regulatory elements correlate strongly with mammalian evolutionary constraint, and provide an unbiased approach for evaluating metrics of evolutionary constraint in human. Lastly, we use the regulatory annotations to revisit previously uncharacterized disease-associated loci, resulting in focused, testable hypotheses through the lens of the chromatin landscape.

PMID:
23221638
PMCID:
PMC3553955
DOI:
10.1093/nar/gks1284
[Indexed for MEDLINE]
Free PMC Article
15.
Genome Res. 2012 Sep;22(9):1813-31. doi: 10.1101/gr.136184.111.

ChIP-seq guidelines and practices of the ENCODE and modENCODE consortia.

Author information

1
Department of Genetics, Stanford University, Stanford, California 94305, USA.

Abstract

Chromatin immunoprecipitation (ChIP) followed by high-throughput DNA sequencing (ChIP-seq) has become a valuable and widely used approach for mapping the genomic location of transcription-factor binding and histone modifications in living cells. Despite its widespread use, there are considerable differences in how these experiments are conducted, how the results are scored and evaluated for quality, and how the data and metadata are archived for public use. These practices affect the quality and utility of any global ChIP experiment. Through our experience in performing ChIP-seq experiments, the ENCODE and modENCODE consortia have developed a set of working standards and guidelines for ChIP experiments that are updated routinely. The current guidelines address antibody validation, experimental replication, sequencing depth, data and metadata reporting, and data quality assessment. We discuss how ChIP quality, assessed in these ways, affects different uses of ChIP-seq data. All data sets used in the analysis have been deposited for public viewing and downloading at the ENCODE (http://encodeproject.org/ENCODE/) and modENCODE (http://www.modencode.org/) portals.

PMID:
22955991
PMCID:
PMC3431496
DOI:
10.1101/gr.136184.111
[Indexed for MEDLINE]
Free PMC Article
16.
Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247.

An integrated encyclopedia of DNA elements in the human genome.

Collaborators (594)

Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, Lajoie BR, Landt SG, Lee BK, Pauli F, Rosenbloom KR, Sabo P, Safi A, Sanyal A, Shoresh N, Simon JM, Song L, Trinklein ND, Altshuler RC, Birney E, Brown JB, Cheng C, Djebali S, Dong X, Dunham I, Ernst J, Furey TS, Gerstein M, Giardine B, Greven M, Hardison RC, Harris RS, Herrero J, Hoffman MM, Iyer S, Kellis M, Khatun J, Kheradpour P, Kundaje A, Lassmann T, Li Q, Lin X, Marinov GK, Merkel A, Mortazavi A, Parker SC, Reddy TE, Rozowsky J, Schlesinger F, Thurman RE, Wang J, Ward LD, Whitfield TW, Wilder SP, Wu W, Xi HS, Yip KY, Zhuang J, Pazin MJ, Lowdon RF, Dillon LA, Adams LB, Kelly CJ, Zhang J, Wexler JR, Green ED, Good PJ, Feingold EA, Bernstein BE, Birney E, Crawford GE, Dekker J, Elnitski L, Farnham PJ, Gerstein M, Giddings MC, Gingeras TR, Green ED, Guigó R, Hardison RC, Hubbard TJ, Kellis M, Kent W, Lieb JD, Margulies EH, Myers RM, Snyder M, Stamatoyannopoulos JA, Tenenbaum SA, Weng Z, White KP, Wold B, Khatun J, Yu Y, Wrobel J, Risk BA, Gunawardena HP, Kuiper HC, Maier CW, Xie L, Chen X, Giddings MC, Bernstein BE, Epstein CB, Shoresh N, Ernst J, Kheradpour P, Mikkelsen TS, Gillespie S, Goren A, Ram O, Zhang X, Wang L, Issner R, Coyne MJ, Durham T, Ku M, Truong T, Ward LD, Altshuler RC, Eaton ML, Kellis M, Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Batut P, Bell I, Bell K, Chakrabortty S, Chen X, Chrast J, Curado J, Derrien T, Drenkow J, Dumais E, Dumais J, Duttagupta R, Fastuca M, Fejes-Toth K, Ferreira P, Foissac S, Fullwood MJ, Gao H, Gonzalez D, Gordon A, Gunawardena HP, Howald C, Jha S, Johnson R, Kapranov P, King B, Kingswood C, Li G, Luo OJ, Park E, Preall JB, Presaud K, Ribeca P, Risk BA, Robyr D, Ruan X, Sammeth M, Sandhu KS, Schaeffer L, 
See LH, Shahab A, Skancke J, Suzuki AM, Takahashi H, Tilgner H, Trout D, Walters N, Wang H, Wrobel J, Yu Y, Hayashizaki Y, Harrow J, Gerstein M, Hubbard TJ, Reymond A, Antonarakis SE, Hannon GJ, Giddings MC, Ruan Y, Wold B, Carninci P, Guigó R, Gingeras TR, Rosenbloom KR, Sloan CA, Learned K, Malladi VS, Wong MC, Barber GP, Cline MS, Dreszer TR, Heitner SG, Karolchik D, Kent W, Kirkup VM, Meyer LR, Long JC, Maddren M, Raney BJ, Furey TS, Song L, Grasfeder LL, Giresi PG, Lee BK, Battenhouse A, Sheffield NC, Simon JM, Showers KA, Safi A, London D, Bhinge AA, Shestak C, Schaner MR, Kim SK, Zhang ZZ, Mieczkowski PA, Mieczkowska JO, Liu Z, McDaniell RM, Ni Y, Rashid NU, Kim MJ, Adar S, Zhang Z, Wang T, Winter D, Keefe D, Birney E, Iyer VR, Lieb JD, Crawford GE, Li G, Sandhu KS, Zheng M, Wang P, Luo OJ, Shahab A, Fullwood MJ, Ruan X, Ruan Y, Myers RM, Pauli F, Williams BA, Gertz J, Marinov GK, Reddy TE, Vielmetter J, Partridge E, Trout D, Varley KE, Gasper C, Bansal A, Pepke S, Jain P, Amrhein H, Bowling KM, Anaya M, Cross MK, King B, Muratet MA, Antoshechkin I, Newberry KM, McCue K, Nesmith AS, Fisher-Aylor KI, Pusey B, DeSalvo G, Parker SL, Balasubramanian S, Davis NS, Meadows SK, Eggleston T, Gunter C, Newberry J, Levy SE, Absher DM, Mortazavi A, Wong WH, Wold B, Blow MJ, Visel A, Pennachio LA, Elnitski L, Margulies EH, Parker SC, Petrykowska HM, Abyzov A, Aken B, Barrell D, Barson G, Berry A, Bignell A, Boychenko V, Bussotti G, Chrast J, Davidson C, Derrien T, Despacio-Reyes G, Diekhans M, Ezkurdia I, Frankish A, Gilbert J, Gonzalez JM, Griffiths E, Harte R, Hendrix DA, Howald C, Hunt T, Jungreis I, Kay M, Khurana E, Kokocinski F, Leng J, Lin MF, Loveland J, Lu Z, Manthravadi D, Mariotti M, Mudge J, Mukherjee G, Notredame C, Pei B, Rodriguez JM, Saunders G, Sboner A, Searle S, Sisu C, Snow C, Steward C, Tanzer A, Tapanari E, Tress ML, van Baren MJ, Walters N, Washietl S, Wilming L, Zadissa A, Zhang Z, Brent M, Haussler D, Kellis M, Valencia A, Gerstein M, Reymond A, 
Guigó R, Harrow J, Hubbard TJ, Landt SG, Frietze S, Abyzov A, Addleman N, Alexander RP, Auerbach RK, Balasubramanian S, Bettinger K, Bhardwaj N, Boyle AP, Cao AR, Cayting P, Charos A, Cheng Y, Cheng C, Eastman C, Euskirchen G, Fleming JD, Grubert F, Habegger L, Hariharan M, Harmanci A, Iyengar S, Jin VX, Karczewski KJ, Kasowski M, Lacroute P, Lam H, Lamarre-Vincent N, Leng J, Lian J, Lindahl-Allen M, Min R, Miotto B, Monahan H, Moqtaderi Z, Mu XJ, O'Geen H, Ouyang Z, Patacsil D, Pei B, Raha D, Ramirez L, Reed B, Rozowsky J, Sboner A, Shi M, Sisu C, Slifer T, Witt H, Wu L, Xu X, Yan KK, Yang X, Yip KY, Zhang Z, Struhl K, Weissman SM, Gerstein M, Farnham PJ, Snyder M, Tenenbaum SA, Penalva LO, Doyle F, Karmakar S, Landt SG, Bhanvadia RR, Choudhury A, Domanus M, Ma L, Moran J, Patacsil D, Slifer T, Victorsen A, Yang X, Snyder M, Auer T, Centanin L, Eichenlaub M, Gruhl F, Heermann S, Hoeckendorf B, Inoue D, Kellner T, Kirchmaier S, Mueller C, Reinhardt R, Schertel L, Schneider S, Sinn R, Wittbrodt B, Wittbrodt J, Weng Z, Whitfield TW, Wang J, Collins PJ, Aldred SF, Trinklein ND, Partridge EC, Myers RM, Dekker J, Jain G, Lajoie BR, Sanyal A, Balasundaram G, Bates DL, Byron R, Canfield TK, Diegel MJ, Dunn D, Ebersol AK, Frum T, Garg K, Gist E, Hansen R, Boatman L, Haugen E, Humbert R, Jain G, Johnson AK, Johnson EM, Kutyavin TV, Lajoie BR, Lee K, Lotakis D, Maurano MT, Neph SJ, Neri FV, Nguyen ED, Qu H, Reynolds AP, Roach V, Rynes E, Sabo P, Sanchez ME, Sandstrom RS, Sanyal A, Shafer AO, Stergachis AB, Thomas S, Thurman RE, Vernot B, Vierstra J, Vong S, Wang H, Weaver MA, Yan Y, Zhang M, Akey JM, Bender M, Dorschner MO, Groudine M, MacCoss MJ, Navas P, Stamatoyannopoulos G, Kaul R, Dekker J, Stamatoyannopoulos JA, Dunham I, Beal K, Brazma A, Flicek P, Herrero J, Johnson N, Keefe D, Lukk M, Luscombe NM, Sobral D, Vaquerizas JM, Wilder SP, Batzoglou S, Sidow A, Hussami N, Kyriazopoulou-Panagiotopoulou S, Libbrecht MW, Schaub MA, Kundaje A, Hardison RC, Miller W, Giardine B, Harris RS, Wu W, Bickel PJ, Banfai B, Boley NP, Brown JB, Huang H, Li Q, Li JJ, Noble WS, Bilmes JA, Buske OJ, Hoffman MM, Sahu AD, Kharchenko PV, Park PJ, Baker D, Taylor J, Weng Z, Iyer S, Dong X, Greven M, Lin X, Wang J, Xi HS, Zhuang J, Gerstein M, Alexander RP, Balasubramanian S, Cheng C, Harmanci A, Lochovsky L, Min R, Mu XJ, Rozowsky J, Yan KK, Yip KY, Birney E.

Abstract

The human genome encodes the blueprint of life, but the function of the vast majority of its nearly three billion bases is unknown. The Encyclopedia of DNA Elements (ENCODE) project has systematically mapped regions of transcription, transcription factor association, chromatin structure and histone modification. These data enabled us to assign biochemical functions for 80% of the genome, in particular outside of the well-studied protein-coding regions. Many discovered candidate regulatory elements are physically associated with one another and with expressed genes, providing new insights into the mechanisms of gene regulation. The newly identified elements also show a statistical correspondence to sequence variants linked to human disease, and can thereby guide interpretation of this variation. Overall, the project provides new insights into the organization and regulation of our genes and genome, and is an expansive resource of functional annotations for biomedical research.

PMID:
22955616
PMCID:
PMC3439153
DOI:
10.1038/nature11247
[Indexed for MEDLINE]
Free PMC Article

17.
Nat Methods. 2012 Mar 18;9(5):473-6. doi: 10.1038/nmeth.1937.

Unsupervised pattern discovery in human chromatin structure through genomic segmentation.

Author information

1
Department of Genome Sciences, University of Washington, Seattle, Washington, USA.

Abstract

We trained Segway, a dynamic Bayesian network method, simultaneously on chromatin data from multiple experiments, including positions of histone modifications, transcription-factor binding and open chromatin, all derived from a human chronic myeloid leukemia cell line. In an unsupervised fashion, we identified patterns associated with transcription start sites, gene ends, enhancers, transcriptional regulator CTCF-binding regions and repressed regions. Software and genome browser tracks are at http://noble.gs.washington.edu/proj/segway/.

PMID:
22426492
PMCID:
PMC3340533
DOI:
10.1038/nmeth.1937
[Indexed for MEDLINE]
Free PMC Article
18.
BMC Bioinformatics. 2011 Oct 26;12:415. doi: 10.1186/1471-2105-12-415.

Exploratory analysis of genomic segmentations with Segtools.

Author information

1
Department of Genome Sciences, University of Washington, PO Box 355065, Seattle, WA 98195-5065, USA.

Abstract

BACKGROUND:

As genome-wide experiments and annotations become more prevalent, researchers increasingly require tools to help interpret data at this scale. Many functional genomics experiments involve partitioning the genome into labeled segments, such that segments sharing the same label exhibit one or more biochemical or functional traits. For example, a collection of ChIP-seq experiments yields a compendium of peaks, each labeled with one or more associated DNA-binding proteins. Similarly, manually or automatically generated annotations of functional genomic elements, including cis-regulatory modules and protein-coding or RNA genes, can also be summarized as genomic segmentations.

RESULTS:

We present a software toolkit called Segtools that simplifies and automates the exploration of genomic segmentations. The software operates as a series of interacting tools, each of which provides one mode of summarization. These various tools can be pipelined and summarized in a single HTML page. We describe the Segtools toolkit and demonstrate its use in interpreting a collection of human histone modification data sets and Plasmodium falciparum local chromatin structure data sets.

CONCLUSIONS:

Segtools provides a convenient, powerful means of interpreting a genomic segmentation.

PMID:
22029426
PMCID:
PMC3224787
DOI:
10.1186/1471-2105-12-415
[Indexed for MEDLINE]
Free PMC Article
19.
PLoS Biol. 2011 Apr;9(4):e1001046. doi: 10.1371/journal.pbio.1001046. Epub 2011 Apr 19.

A user's guide to the encyclopedia of DNA elements (ENCODE).

Collaborators (361)

Myers RM, Stamatoyannopoulos J, Snyder M, Dunham I, Hardison RC, Bernstein BE, Gingeras TR, Kent WJ, Birney E, Wold B, Crawford GE, Bernstein BE, Epstein CB, Shoresh N, Ernst J, Mikkelsen TS, Kheradpour P, Zhang X, Wang L, Issner R, Coyne MJ, Durham T, Ku M, Truong T, Ward LD, Altshuler RC, Lin MF, Kellis M, Gingeras TR, Davis CA, Kapranov P, Dobin A, Zaleski C, Schlesinger F, Batut P, Chakrabortty S, Jha S, Lin W, Drenkow J, Wang H, Bell K, Gao H, Bell I, Dumais E, Dumais J, Antonarakis SE, Ucla C, Borel C, Guigo R, Djebali S, Lagarde J, Kingswood C, Ribeca P, Sammeth M, Alioto T, Merkel A, Tilgner H, Carninci P, Hayashizaki Y, Lassmann T, Takahashi H, Abdelhamid RF, Hannon G, Fejes-Toth K, Preall J, Gordon A, Sotirova V, Reymond A, Howald C, Graison E, Chrast J, Ruan Y, Ruan X, Shahab A, Ting Poh W, Wei CL, Crawford GE, Furey TS, Boyle AP, Sheffield NC, Song L, Shibata Y, Vales T, Winter D, Zhang Z, London D, Wang T, Birney E, Keefe D, Iyer VR, Lee BK, McDaniell RM, Liu Z, Battenhouse A, Bhinge AA, Lieb JD, Grasfeder LL, Showers KA, Giresi PG, Kim SK, Shestak C, Myers RM, Pauli F, Reddy TE, Gertz J, Partridge EC, Jain P, Sprouse RO, Bansal A, Pusey B, Muratet MA, Varley KE, Bowling KM, Newberry KM, Nesmith AS, Dilocker JA, Parker SL, Waite LL, Thibeault K, Roberts K, Absher DM, Wold B, Mortazavi A, Williams B, Marinov G, Trout D, Pepke S, King B, McCue K, Kirilusha A, DeSalvo G, Fisher-Aylor K, Amrhein H, Vielmetter J, Sherlock G, Sidow A, Batzoglou S, Rauch R, Kundaje A, Libbrecht M, Margulies EH, Parker SC, Elnitski L, Green ED, Hubbard T, Harrow J, Searle S, Kokocinski F, Aken B, Frankish A, Hunt T, Despacio-Reyes G, Kay M, Mukherjee G, Bignell A, Saunders G, Boychenko V, Van Baren M, Brown RH, Khurana E, Balasubramanian S, Zhang Z, Lam H, Cayting P, Robilotto R, Lu Z, Guigo R, Derrien T, Tanzer A, Knowles DG, Mariotti M, James Kent W, Haussler D, Harte R, Diekhans M, Kellis M, Lin M, Kheradpour P, Ernst J, Reymond A, Howald C, Graison EA, Chrast J, Tress M, 
Rodriguez JM, Snyder M, Landt SG, Raha D, Shi M, Euskirchen G, Grubert F, Kasowski M, Lian J, Cayting P, Lacroute P, Xu Y, Monahan H, Patacsil D, Slifer T, Yang X, Charos A, Reed B, Wu L, Auerbach RK, Habegger L, Hariharan M, Rozowsky J, Abyzov A, Weissman SM, Gerstein M, Struhl K, Lamarre-Vincent N, Lindahl-Allen M, Miotto B, Moqtaderi Z, Fleming JD, Newburger P, Farnham PJ, Frietze S, O'Geen H, Xu X, Blahnik KR, Cao AR, Iyengar S, Stamatoyannopoulos JA, Kaul R, Thurman RE, Wang H, Navas PA, Sandstrom R, Sabo PJ, Weaver M, Canfield T, Lee K, Neph S, Roach V, Reynolds A, Johnson A, Rynes E, Giste E, Vong S, Neri J, Frum T, Johnson EM, Nguyen ED, Ebersol AK, Sanchez ME, Sheffer HH, Lotakis D, Haugen E, Humbert R, Kutyavin T, Shafer T, Dekker J, Lajoie BR, Sanyal A, James Kent W, Rosenbloom KR, Dreszer TR, Raney BJ, Barber GP, Meyer LR, Sloan CA, Malladi VS, Cline MS, Learned K, Swing VK, Zweig AS, Rhead B, Fujita PA, Roskin K, Karolchik D, Kuhn RM, Haussler D, Birney E, Dunham I, Wilder SP, Keefe D, Sobral D, Herrero J, Beal K, Lukk M, Brazma A, Vaquerizas JM, Luscombe NM, Bickel PJ, Boley N, Brown JB, Li Q, Huang H, Gerstein M, Habegger L, Sboner A, Rozowsky J, Auerbach RK, Yip KY, Cheng C, Yan KK, Bhardwaj N, Wang J, Lochovsky L, Jee J, Gibson T, Leng J, Du J, Hardison RC, Harris RS, Song G, Miller W, Haussler D, Roskin K, Suh B, Wang T, Paten B, Noble WS, Hoffman MM, Buske OJ, Weng Z, Dong X, Wang J, Xi H, Tenenbaum SA, Doyle F, Penalva LO, Chittur S, Tullius TD, Parker SC, White KP, Karmakar S, Victorsen A, Jameel N, Bild N, Grossman RL, Snyder M, Landt SG, Yang X, Patacsil D, Slifer T, Dekker J, Lajoie BR, Sanyal A, Weng Z, Whitfield TW, Wang J, Collins PJ, Trinklein ND, Partridge EC, Myers RM, Giddings MC, Chen X, Khatun J, Maier C, Yu Y, Gunawardena H, Risk B, Feingold EA, Lowdon RF, Dillon LA, Good PJ, Harrow J, Searle S.

Author information

1
HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America. rmyers@hudsonalpha.org

Abstract

The mission of the Encyclopedia of DNA Elements (ENCODE) Project is to enable the scientific and medical communities to interpret the human genome sequence and apply it to understand human biology and improve health. The ENCODE Consortium is integrating multiple technologies and approaches in a collective effort to discover and define the functional elements encoded in the human genome, including genes, transcripts, and transcriptional regulatory regions, together with their attendant chromatin states and DNA methylation patterns. In the process, standards to ensure high-quality data have been implemented, and novel algorithms have been developed to facilitate analysis. Data and derived results are made available through a freely accessible database. Here we provide an overview of the project and the resources it is generating and illustrate the application of ENCODE data to interpret the human genome.

PMID:
21526222
PMCID:
PMC3079585
DOI:
10.1371/journal.pbio.1001046
[Indexed for MEDLINE]
Free PMC Article
20.
Bioinformatics. 2010 Jun 15;26(12):i334-42. doi: 10.1093/bioinformatics/btq175.

A dynamic Bayesian network for identifying protein-binding footprints from single molecule-based sequencing data.

Author information

1
Department of Computer Science and Engineering, University of Washington, Seattle, WA, USA.

Abstract

MOTIVATION:

A global map of transcription factor binding sites (TFBSs) is critical to understanding gene regulation and genome function. DNaseI digestion of chromatin coupled with massively parallel sequencing (digital genomic footprinting) enables the identification of protein-binding footprints with high resolution on a genome-wide scale. However, accurately inferring the locations of these footprints remains a challenging computational problem.

RESULTS:

We present a dynamic Bayesian network-based approach for the identification and assignment of statistical confidence estimates to protein-binding footprints from digital genomic footprinting data. The method, DBFP, allows footprints to be identified in a probabilistic framework and outperforms our previously described algorithm in terms of precision at a fixed recall. Applied to a digital footprinting data set from Saccharomyces cerevisiae, DBFP identifies 4679 statistically significant footprints within intergenic regions. These footprints are mainly located near transcription start sites and are strongly enriched for known TFBSs. Footprints containing no known motif are preferentially located proximal to other footprints, consistent with cooperative binding of these footprints. DBFP also identifies a set of statistically significant footprints in the yeast coding regions. Many of these footprints coincide with the boundaries of antisense transcripts, and the most significant footprints are enriched for binding sites of the chromatin-associated factors Abf1 and Rap1.

SUPPLEMENTARY INFORMATION:

Supplementary material is available at Bioinformatics online.

PMID:
20529925
PMCID:
PMC2881360
DOI:
10.1093/bioinformatics/btq175
[Indexed for MEDLINE]
Free PMC Article