NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Epigenomics Help [Internet]. Bethesda (MD): National Center for Biotechnology Information (US); 2010-.

Bookshelf ID: NBK45786

Epigenomics Help

Created: September 2, 2010; Last Update: January 3, 2012.

How to Use the Sample and Experiment Browser

Summary

Learn how searching and filtering can be performed to identify samples and experiments in the Epigenomics database and how to select, save and view these records.

Searching and Filtering

All experiments and their corresponding samples currently in the Epigenomics database are displayed by default in the Browser window. Clicking the Experiments/Sample toggle button lets you customize the Browser view to show either Experiment or Sample records (A). Experiment and Sample records can also be filtered by entering text in the filter box found within the sample browser (B). Search terms can include cell lines (e.g. H1, IMR90), cell types (e.g. stem cell, fibroblast), tissue types (e.g. lung, spleen), and assayed features (e.g. H3K27me3). In addition, three preset filtering options in the sample browser allow you to filter by "Species", "Biological Source" or "Feature" (C).

Within the Browser window, a series of columns are displayed that indicate the various attributes assigned to experiments and samples. Columns with additional attributes can be added. Click on the "Configure" icon and a pop-up dialog appears which allows you to choose the attributes you wish to display (D). Columns can be sorted alphabetically by clicking on the arrows in the column headings. Mouse over the column headings to reveal the arrows.

The total number of experiments or samples available in the browser is shown on the page to the left of the Browser window (see “All experiments” or “All samples”). “New experiments” or “New samples” include all records that have been added to the Epigenomics database in the past three months. “Recently viewed” gives a count of the records that have been viewed in the past 8 hours. If you are logged in with a “My NCBI” account, it includes all records viewed in the past 6 months.

Selecting and Viewing Records

Experiments or Samples can be selected by clicking the check boxes on the left side of the Browser window. All records can be selected or deselected by clicking the “Select: All, None” links in the table header (E).

Clicking on an "Experiment ID" for a particular Experiment in the Browser window will bring you to the Experiment page. This page provides information about the experimental data track and additional experimental details such as the technique (e.g. ChIP-seq, DNase-seq), assay type and instrument used. Links are provided that allow viewing or downloading the data track (if available).

Clicking on a "Sample ID" for a particular sample in the Sample Browser window will bring you to the Sample page. This page indicates what data tracks are available for viewing and downloading for that sample. Biological attributes and data source information are also indicated on this page.

After records are selected, notice that several icons become active within the Browser window presenting you with new options. Clicking on the "View on Genome" button will bring you to a page where track data and viewing options are presented for the sample(s) selected (F). Selecting the “Download” button will present a drop down with the options to download epigenomic track data or export the contents of the Sample Browser window (G). Selecting the “View Samples” icon will redirect you to the sample records (H).

Image sbrowser4.jpg

How to Manage Collections of Samples

Summary

Learn how to create and edit collections of Experiments or Samples and how to use My NCBI for maintaining those collections.

Creating and Editing Collections

Data selected from the Browser can be saved to the Clipboard, for temporary storage, or to a Collection, for long-term storage. After selecting samples, click the “Clipboard” icon to add samples to the clipboard (A). Selecting the “Collection” icon reveals a drop-down menu. You can create and name a new Collection, or you can add samples to an existing Collection by selecting it from the drop-down menu. After you have made your selection, clicking the "Copy" button completes the action. If you created a new Collection, it will now be listed under the "My Collections" heading on the left side of the page (C). Please note that an individual collection can contain both Experiment and Sample records. However, the count of records in the collection will show only the number of Experiments or only the number of Samples, depending on whether the Browser is set to display Experiments or Samples.

When a Collection is selected, the experiments or samples within the Collection will be displayed in the Browser window. Individual records can be selected by clicking the check box in the left column. Selected records can be removed from the collection by clicking the "Remove" icon. Additionally, members of existing Collections can be copied to the Clipboard, other Collections, or to a new Collection.

Maintaining Collections Through My NCBI

If you do not have a "My NCBI" account, your Collections will only be stored for 24 hours. If you have an account, these Collections are stored indefinitely. More information about My NCBI accounts can be found here: http://www.ncbi.nlm.nih.gov/sites/myncbi/

Your collections can be accessed and managed from the "My NCBI Home" page. Click Collections and a list of your current collections is displayed. From here, Collections can be merged or deleted. Additionally, you can create a URL to share a collection by clicking on the Private link under Sharing and following the instructions.

Image collections4.jpg

How to View Genome Tracks

Summary

Learn how to use the View on Genome page to choose and view your genomic region of interest, genome browser and data tracks from selected samples.

Getting Started

After you have selected Experiments or Samples, clicking on the "View on Genome" button will bring you to the View page where you can select and set viewing options for genome track data and explore epigenomic data in gene-specific contexts. Alternatively, you can access the View page from the Sample, Study or Experiment pages by clicking on the "View on Genome" links. The View page in the Epigenomics database integrates the NCBI Sequence Viewer tool and search functionality that provides the ability to quickly navigate to queried genes and locations throughout the selected genome.

View on Genome

When you first come to the View page, you will be prompted to enter a gene or chromosome location into the “Gene or Location” box (A). Alternatively, you can select to view an entire chromosome by clicking on the respective buttons above the NCBI Sequence Viewer interface (B). When you perform a query for a particular gene, the results of the search are populated in the box below the query box (C). Autocomplete functionality has been incorporated and as you enter a gene name into the query box a list of possible matches is displayed. The query searches the NCBI Gene database and the list represents these results. Searches can include specific gene names or text queries. If a specific gene is searched for and found, the gene is highlighted in the query result box and the sequence viewer window will show the gene of interest with the data tracks that had been previously selected (D). If no data tracks had been selected previously a set of default tracks that have been manually curated will be displayed.

Tracks can also be viewed on the UCSC Genome Browser. Clicking on the “View on UCSC” button will load your selected tracks to the UCSC browser (E). If you have specified a gene or location on the Epigenomics View page, the UCSC browser will display the data at the same location.

By default, all genome tracks for the previously selected experiments or associated with the selected samples will be displayed. Please note this is conditional. Experiments or samples must be from the same species and use the same genome assemblies in order to be displayed at the same time. If you have samples from multiple species or assemblies you can use the “Species” drop down menu to select the species and genome assembly of interest (F).

All of your recent searches are temporarily saved under the “Recent Searches” heading. Selecting a previous query from the list will repeat the search and reset the sequence viewer to that location.

Image ViewA2.jpg

Configuring the Sequence Viewer interface

Below the window in which track data is displayed, there is a configuration panel that provides customization options for the sequence viewer interface (H). The “Active tracks” tab lists the tracks that are currently displayed (I). This includes the data tracks selected from Epigenomics and other tracks such as genes, translations and genomic variations. Tracks in this tab can be re-ordered, selected, or deselected, which changes what is displayed. The “Epigenomics” tab specifically lists the tracks that have been selected from the Epigenomics database (J). After making changes such as selecting, deselecting or reordering tracks, clicking on the “Configure” button will reload the sequence viewer with these changes (K). For more details about Sequence Viewer customization, please refer to the help documentation provided at the NCBI Sequence Viewer page (http://www.ncbi.nlm.nih.gov/projects/sviewer/). An instructional video is also available on the NCBI YouTube Channel (http://www.youtube.com/ncbinlm#p/c/2/TQOx2w18SW8).

Image viewB.jpg

How to Download Genome Tracks

Summary

Learn how to use the Download page to select data tracks and file formatting options.

Getting Started

After you have selected records, clicking on the "Download" button in the Browser window will bring you to a page where you can select and set downloading options for genome track data.

Individual data tracks can be selected for download by using the checkboxes found in the table (A). Additionally, you can select species and alternate genome assemblies using the drop-down menu that appears when tracks are selected from multiple species or from different genome assemblies (B). Clicking the "Download" button on this page will download all data associated with the samples selected (C). The list of tracks can be sorted by "Data type" (e.g. H3K27me3, BS-seq), "Sample" or "Feature" by clicking on the arrows that appear in the column headers when they are moused over.

The track data will be downloaded as .wig files. Raw data used to generate these .wig files, if available, can be found at the Gene Expression Omnibus (GEO) web site (http://www.ncbi.nlm.nih.gov/geo).

Files are downloaded as a single compressed .zip file containing the individual tracks. A readme file provides detailed information regarding the downloaded data.
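Once the archive has been downloaded, the tracks can be read programmatically. The following is a minimal sketch only, not part of the Epigenomics tools: it assumes the .wig files follow the standard UCSC wiggle conventions (`fixedStep` and `variableStep` declaration lines, 1-based positions), and the function names `parse_wig` and `read_wig_archive` are hypothetical.

```python
import io
import zipfile


def parse_wig(lines):
    """Yield (chrom, start, span, value) tuples from wiggle-format lines.

    Assumes standard UCSC wiggle syntax; positions are reported as written.
    """
    mode = None
    chrom = None
    span = 1
    step = 1
    pos = 0
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and display directives
        fields = line.split()
        if fields[0] in ("fixedStep", "variableStep"):
            # Declaration line: key=value pairs describe the data that follows.
            params = dict(f.split("=", 1) for f in fields[1:])
            mode = fields[0]
            chrom = params["chrom"]
            span = int(params.get("span", 1))
            if mode == "fixedStep":
                pos = int(params["start"])
                step = int(params.get("step", 1))
        elif mode == "fixedStep":
            yield chrom, pos, span, float(fields[0])
            pos += step
        elif mode == "variableStep":
            yield chrom, int(fields[0]), span, float(fields[1])
        else:
            raise ValueError("data line before any step declaration: " + line)


def read_wig_archive(zip_path):
    """Return {member name: list of records} for every .wig file in a .zip."""
    tracks = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".wig"):
                with archive.open(name) as handle:
                    text = io.TextIOWrapper(handle, encoding="utf-8")
                    tracks[name] = list(parse_wig(text))
    return tracks
```

Calling `read_wig_archive` with the path of the downloaded .zip file would return one list of records per track. The readme file in the archive remains the authoritative description of the data.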

Image download3.jpg

How to Search for Studies and Samples

Summary

Learn how to use basic search, limits and advanced searching to more efficiently find Samples, Studies and Experiments of interest.

Getting Started

The Epigenomics database is indexed in Entrez. To search Epigenomics, type a word or phrase into the query box, then click on the Search button or press the Enter key. Combine search terms with connector words: "AND", "OR" or "NOT" using upper case letters.

Using Limits

"Limits" is available from a link above the search box on all Epigenomics pages, including the homepage. Restrict a search to items with links to full text, and make multiple choices within categories. Click the Search button after making selections to run the search. A "Limits Activated" message will appear above the search results list. Limits remain in effect until removed.

How to Compare Samples

Summary

The comparison tool allows you to compare epigenetic features across experiments and biological samples and returns a list of genomic regions that show significant differences.

Getting Started

After navigating to the comparison tool, enter the accession numbers of the experiments or samples that you want to compare into the text input fields (A). Alternatively, you can select one of the example comparisons that are available (B). Once the accession numbers are entered, the experiment or sample information, sample summary, and tracks that are available to be compared are displayed (C). You can select individual or multiple tracks to be compared using the checkboxes next to the individual tracks. The number of genes/regions with significant differences returned from a comparison query can also be customized from the drop down menu (D). Click the “Compare” button to execute your query.

Image compare1.jpg

The comparison results page presents the output of the comparison tool. The results indicate genes and genomic regions where there are significant differences between the epigenetic features in the samples that were chosen for comparison. There are two columns displayed, each one representing the samples that were the input for the comparison. For each gene or region that appears, a snapshot of the epigenetic features at that particular locus is displayed (E). Peaks indicate regions for which a particular epigenetic feature is enriched. A legend indicates how the genes and epigenetic features are being represented (F). Briefly, each epigenetic feature is represented by a different color track. Epigenetic features that are associated with ‘active’ chromatin regions are displayed as green, and features that have been associated with repressed or ‘silent’ chromatin are displayed as red.

The percent difference between epigenetic features at a given gene for each experiment or sample is indicated (G), as well as GO terms and biological pathways that are highly represented in the comparison output (H). Clicking on a specific gene will bring you to the gene record at Entrez Gene, while clicking on a specific pathway will bring you to the pathway page at NCBI’s Biosystems database. Placing the mouse over the genes, GO terms or pathways displays a pop-up window that gives more information about these records (I).

Image compare2.jpg

Clicking on the button for Comparison settings (J) allows you to customize the tracks compared in the existing query, change the number of genes displayed, or begin a new query.

About Sample Comparisons

Summary

The comparison tool can identify genomic regions where epigenetic features differ between selected biological samples. The process by which these regions are identified is described.

Condensed representation of genome tracks

Prior to a comparison being made, individual data tracks are processed in the following fashion. A condensed representation of the track is made which allows for the elimination of background noise and defines regions of variation. Several steps are involved in generating this condensed representation. The first step in constructing this representation is defining the regions of variation. First, a peak enhancement algorithm is used to eliminate as much “noise” as possible. Then for each (epigenetic modification/genome assembly) pair, an aggregate track is constructed. This aggregate track is a composite of all data for a particular epigenetic feature that exists in the Epigenomics database. For example, the displayed region of a track below is an aggregate of all the H3K4me3 data tracks for human samples that exist in the database. The blue bar represents a region of the TAF3 gene.

Image condensedrep.jpg

Once the aggregate track is constructed, a 140 nucleotide median filter is applied. The purpose of the median filter is to reduce noise and eliminate peaks far smaller than a single nucleosome. Next, an 80 nucleotide sliding window travels over the genome. The window is narrower than a nucleosome because, although a nucleosome is wrapped by 147 nucleotides, clearly distinguishable peaks often occur less than 147 nucleotides apart: nucleosomes shift position, and they can be positioned slightly differently in different members of a cell population. Any point that has the maximum height in its window is kept; any point with a non-maximal height in its window is thrown out. The retained maximal points serve as centers for regions of variation.
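The filtering and peak-picking steps just described can be sketched in a few lines. This is an illustration under stated assumptions, not the code used by the Epigenomics database: the track is assumed to be a list with one value per nucleotide, windows are centered on each point and truncated at the ends of the track, zero-height points are ignored, and the function names are hypothetical.

```python
from statistics import median


def median_filter(signal, width):
    """Replace each value by the median of a window of about `width` points.

    The window spans width // 2 points on either side of the center
    (an assumption; the source does not say how an even width is centered).
    """
    half = width // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


def window_maxima(signal, window):
    """Return the positions whose value is the maximum within their window."""
    half = window // 2
    n = len(signal)
    centers = []
    for i, value in enumerate(signal):
        neighborhood = signal[max(0, i - half):min(n, i + half + 1)]
        # Assumption: flat zero background is never treated as a peak.
        if value > 0 and value == max(neighborhood):
            centers.append(i)
    return centers


def regions_of_variation(aggregate, median_width=140, window=80):
    """Median-filter the aggregate track, then keep window maxima as centers."""
    smoothed = median_filter(aggregate, median_width)
    return window_maxima(smoothed, window)
```

Because a median filter tends to produce flat-topped peaks, this simple version reports every point of a plateau as a maximum. A production implementation would need a tie-breaking rule, which the description above does not specify.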

After the aggregate track is made, the representation is quantified. For a given set of aligned reads, regions of variability are identified and each region is assigned a probability (between 0 and 1) of being saturated. To measure saturation, the calculated regions of variability are “binned” to 150 nucleotides and the number of reads that fall within each “bin” is counted. Counts in the top 5% are considered saturated (“1”) and counts in the bottom 5% are considered unsaturated (“0”).
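A small sketch may help make the saturation measure concrete. The text fixes only the two ends of the scale (top 5% is 1, bottom 5% is 0). The linear scaling used here for counts in between is an assumption made for illustration, as are the function names and the use of 0-based read start positions.

```python
def bin_counts(read_positions, region_start, region_end, bin_size=150):
    """Count reads per bin over the half-open interval [region_start, region_end)."""
    n_bins = (region_end - region_start + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_positions:
        if region_start <= pos < region_end:
            counts[(pos - region_start) // bin_size] += 1
    return counts


def saturation_scores(counts, tail_percent=5):
    """Map bin counts to scores between 0 and 1.

    Counts at or above the (100 - tail_percent)th percentile score 1.0 and
    counts at or below the tail_percent-th percentile score 0.0. Values in
    between are scaled linearly (an assumption, not stated in the source).
    """
    ordered = sorted(counts)
    n = len(ordered)
    low = ordered[(n - 1) * tail_percent // 100]
    high = ordered[(n - 1) * (100 - tail_percent) // 100]
    scores = []
    for count in counts:
        if count >= high:
            scores.append(1.0)
        elif count <= low:
            scores.append(0.0)
        else:
            scores.append((count - low) / (high - low))
    return scores
```

Integer arithmetic is used for the percentile indices to avoid floating-point surprises when locating the cut-off bins.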

Comparison Algorithm

The comparison algorithm tries to identify genes that are differentially modified between two tracks. For each gene, there may or may not be a region of variation inside the gene body, so regions of variation must first be connected with genes. A region of variation is considered ‘part of’ a gene if it lies within the gene body, within 2000 nucleotides before the start, or within 2000 nucleotides after the end of the gene. For each gene a difference score is then computed. The average score of all sites is computed for each track (i.e. Sall1 and Sall2); these scores are between 0 and 1. A site is considered saturated if its score is above Sall + ((1-Sall)/2), that is, more than halfway between the average saturation and 1 on a 0 to 1 scale. Conversely, a site is considered unsaturated if its score is below Sall/2, or below half the average saturation on a 0 to 1 scale. Sites that fall between these two values (saturated and unsaturated) do not contribute to the overall difference score. The final difference score is taken as the average absolute value difference of the average site scores of the 2 tracks. Finally, an additional filter requires that a given gene have at least 39 sites of variation. This figure is arbitrary. Empirically, we observed that a lower limit increases the dataset size for regions of differences, so that many of the genes or regions returned are uncharacterized or questionable. A higher limit restricts the dataset for regions of differences, so that some of the more interesting genes may be overlooked.
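The rule for connecting regions of variation with genes, and the minimum-site filter, can be written out as follows. This is a hedged sketch: the function names are hypothetical, each region of variation is represented by the position of its center, and gene coordinates are assumed to satisfy start <= end on a common coordinate system.

```python
def sites_for_gene(gene_start, gene_end, site_positions, flank=2000):
    """Return the sites inside the gene body or within `flank` nucleotides of it."""
    return [p for p in site_positions
            if gene_start - flank <= p <= gene_end + flank]


def passes_site_filter(sites, min_sites=39):
    """Apply the minimum number of sites of variation required per gene."""
    return len(sites) >= min_sites
```

Because the same 2000 nucleotide flank is applied on both sides, this check does not depend on the strand of the gene.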

Here is a worked example. Let’s say the genome has 30,000 genes and 100,000 sites of variation. There are two tracks, t1 and t2. The average site saturation is computed for t1 and t2, giving avgt1 and avgt2. Suppose it is calculated that avgt1 = 0.6 and avgt2 = 0.4. Using the formulas above, the cut-offs for saturation and unsaturation are 0.8 and 0.3 for t1, and 0.7 and 0.2 for t2.

Now, suppose a gene has 3 sites of variation with the following scores in the two tracks: Track 1: 0.1, 0.3, and 0.2; Track 2: 0.5, 0.6, and 0.7. The average saturation of each track is computed, giving 0.2 for Track 1 and 0.6 for Track 2. Since Track 1 has a score of 0.2, which is below its cut-off for unsaturation (0.3), this pair will have a score that is the difference of the averages: 0.6 - 0.2 = 0.4.
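The worked example can be reproduced with a short sketch. It follows one reading of the example, in which the cut-offs are applied to the gene's average score in each track and a gene contributes only if at least one average falls outside the cut-offs. That reading, the function names, and the return value of 0.0 for non-contributing genes are assumptions made for illustration.

```python
def saturation_cutoffs(track_average):
    """Return (saturated_cutoff, unsaturated_cutoff) for a track-wide average."""
    saturated = track_average + (1 - track_average) / 2
    unsaturated = track_average / 2
    return saturated, unsaturated


def gene_difference(sites1, sites2, track_avg1, track_avg2):
    """Absolute difference of the per-gene average site scores of two tracks.

    Returns 0.0 when neither gene average is saturated or unsaturated
    (an assumed convention for genes that do not contribute).
    """
    mean1 = sum(sites1) / len(sites1)
    mean2 = sum(sites2) / len(sites2)
    sat1, unsat1 = saturation_cutoffs(track_avg1)
    sat2, unsat2 = saturation_cutoffs(track_avg2)
    informative = (mean1 > sat1 or mean1 < unsat1 or
                   mean2 > sat2 or mean2 < unsat2)
    return abs(mean1 - mean2) if informative else 0.0


# Values from the worked example: avgt1 = 0.6, avgt2 = 0.4.
score = gene_difference([0.1, 0.3, 0.2], [0.5, 0.6, 0.7], 0.6, 0.4)
```

With these inputs, `score` equals 0.4 up to floating-point rounding, matching the difference of averages in the example.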

About Data Processing and Track Generation

Raw DNA sequence reads from the NCBI Sequence Read Archive are aligned to the most current genome assemblies using Bowtie (http://bowtie-bio.sourceforge.net/index.shtml). Alignment parameters allow for 10% mismatch for each read, or 3 mismatches, whichever is greater. Reads unaligned by Bowtie are fed into an internal NCBI hash-based aligner, oligofar, which allows for both 10% mismatches and 10% insertions/deletions (indels) per read. If multiple reads map to the exact same position on the same strand, only a single read is retained. This serves as a mechanism to eliminate biases that may have been introduced by PCR during library generation. Reads that map to more than 8 positions are thrown out. Reads that map to more than a single position but fewer than 8 are weighted by 1/(# of positions). Data is binned into discrete non-overlapping 200 nucleotide windows, or buckets, followed by peak smoothing with a 10 window box filter. Fragment length is discovered per track by determining a shift that maximizes plus and minus strand read concordance; plus strand reads are offset forward half a fragment length, while minus strand reads are offset backward half a fragment length.
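The de-duplication, weighting, binning and smoothing steps can be illustrated with a small sketch. It is not the NCBI pipeline. It assumes each alignment is given as a hypothetical (position, strand, number of mapped positions) tuple with 0-based positions, treats a read mapping to exactly 8 positions as retained, omits the fragment-length shift, and truncates the smoothing window at the ends of the track.

```python
def bin_reads(alignments, genome_length, bin_size=200, max_positions=8):
    """Accumulate weighted read counts into non-overlapping bins.

    alignments: iterable of (position, strand, n_positions) tuples.
    """
    n_bins = (genome_length + bin_size - 1) // bin_size
    bins = [0.0] * n_bins
    seen = set()
    for position, strand, n_positions in alignments:
        if n_positions > max_positions:
            continue  # reads mapping to too many positions are discarded
        if (position, strand) in seen:
            continue  # keep one read per position and strand (PCR duplicates)
        seen.add((position, strand))
        bins[position // bin_size] += 1.0 / n_positions
    return bins


def box_filter(values, width=10):
    """Smooth with an unweighted moving average over `width` bins."""
    half = width // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        window = values[max(0, i - half):min(n, i - half + width)]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

For example, `box_filter(bin_reads(alignments, genome_length))` would give a smoothed track with one value per 200 nucleotide bin.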

About Complete Epigenomes

Epigenome Classes as defined by the Roadmap Epigenomics Program

One of the goals of the Roadmap Epigenomics Project is to establish what are called “reference epigenomes.” These are high resolution, genome-wide maps of epigenetic modifications in a variety of human cell lines, primary cells and tissue types. The majority of the reference epigenomes generated will contain information on a core set of histone marks, DNA methylation, and chromatin accessibility, in addition to gene expression data associating gene activity with the epigenetic modifications. A subset of reference epigenomes will also contain an expanded set of at least twenty additional histone modifications. For an epigenome to be considered complete it must contain, at a minimum, the following information: DNA methylation data, genome-wide mapping of the most informative “core” histone modifications (which currently include H3K4me1, H3K4me3, H3K9me3, H3K27ac, H3K27me3 and H3K36me3), and RNA expression data.

Four classes of “reference epigenomes” have been established which are differentiated by the number of epigenetic features examined. These classes are distinguished as follows.

Class 1 Epigenomes

  • DNA methylation (whole genome bisulfite sequencing)

  • “core” histone modifications and an expanded set of histone modifications

  • RNA sequencing data (RNA-seq)

  • Chromatin accessibility

Class 2 Epigenomes

  • DNA methylation (whole genome bisulfite sequencing)

  • “core” histone modifications

  • RNA sequencing data (RNA-seq)

  • Chromatin accessibility (if possible)

Class 3 Epigenomes

  • DNA methylation (RRBS, MeDIP-seq, MRE-seq)

  • “core” histone modifications

  • RNA expression data (gene expression microarray)

  • Chromatin accessibility (if possible)

Class 4 Epigenomes

  • DNA methylation (RRBS, MeDIP-seq, MRE-seq)

  • “core” histone modifications

  • RNA expression data (gene expression microarray)

Roadmap Epigenomics Project Web Resources

To facilitate the dissemination of data from the Roadmap Epigenomics Project, several web-based resources have been created. These include the Reference Epigenome Mapping Consortium page, the NCBI Epigenomics database, the NCBI Gene Expression Omnibus (GEO) data listings page, the Human Epigenome Atlas, and the Roadmap Epigenomics Visualization Hub. While the overall goal of these resources is the same, they differ in the features they offer. These features are highlighted below.

Reference Epigenome Mapping Consortium Homepage

  • Download data?: Links to data download
  • Browse data: Clickable data matrix or visual data browser
  • View data: Links to UCSC browser mirror (Epigenome Browser)
  • Updates: Data: At each data freeze (4x/year)
  • Other features: Protocols, publications, quality metrics, project and center/group information
  • Upload data?: No (but linked Epigenome Browser supports upload)

NCBI Epigenomics Hub

  • Download data?: .wig
  • Browse data: Sample (i.e. cell/tissue type) browser, experiment (i.e. epigenetic feature) browser, text search
  • View data: NCBI epigenomics viewer or UCSC browser mirror
  • Updates: Continuously
  • Other features: “Compare Samples” tool to identify regions of greatest chromatin differences; suggests the most associated GO terms and pathways
  • Upload data?: Being implemented

NCBI Gene Expression Omnibus

  • Download data?: .bed, .wig, .bam and SRA
  • Browse data: By sample, study, or data matrix
  • View data: NCBI Epigenomics viewer
  • Updates: Continuously
  • Other features: N/A

The Human Epigenome Atlas (on Genboree)

  • Download data?: .bed, .wig by ftp or http
  • Browse data: By sample, assay, or clickable data matrix
  • View data: UCSC browser, Atlas Gene/Pathway browser (read densities across single genes or pathways)
  • Updates: Data: At each data freeze (4x/year)
  • Other features: Info on metadata, data flow, data quality. Tools for analysis via Genboree workbench (independent tools and Galaxy pipelines). Data and functionality exposed via HTTP REST APIs for programmatic use and extension

Roadmap Epigenomics Visualization Hub and Load Track Hub

  • Download data?: No
  • Browse data: ENCODE style data matrix
  • View data: UCSC browser mirror, or remote display at UCSC main site (load track hub)
  • Updates: Data: At each data freeze (4x/year)
  • Other features: UCSC mirror hosts integrative analysis tracks and summary tracks; tracks viewable at UCSC main site
  • Upload data?: Being implemented

Human Epigenome Browser at Wash U

  • Download data?: Yes
  • Browse data: Expandable data selection matrix and metadata matrix
  • View data: Next generation epigenome browser
  • Updates: At each data freeze
  • Other features: Google map style zoom and pan, genomic data and metadata viewer, data collation view, pathway/gene set view, statistical analysis
  • Upload data?: Yes

Epigenome Browser UCSC mirror

  • Download data?: .bed, .wig through Table Browser. Individual reads not available
  • Browse data: UCSC data selection matrix (ENCODE style)
  • View data: UCSC browser mirror
  • Updates: Data: At each data freeze (4x/year)
  • Other features: High-utility UCSC mirror tracks
  • Upload data?: Yes

Copyright Notice: http://www.ncbi.nlm.nih.gov/books/about/copyright/
