Status Public on Feb 18, 2015
Title Hi-C, H1 embryonic stem cells, replicate one
Sample type SRA
Source name H1 embryonic stem cells
Organism Homo sapiens
Characteristics cell type: H1 embryonic stem cells
Treatment protocol None
Growth protocol Growth and differentiation of H1 hESCs were performed as previously described in Xie et al., 2013, Cell 153 (1134-1148)
Extracted molecule genomic DNA
Extraction protocol Sequencing libraries were constructed as previously described (Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289-93 (2009)).
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2000
Description H1_R1_T1_read1.fastq.gz
Data processing Library strategy: Hi-C
fastq: Illumina's HiSeq Control Software
We aligned Hi-C reads to the hg18 (human) genome. Bases genotyped as SNPs in the H1 genome were masked to "N" in the reference to reduce reference-bias mapping artifacts. Hi-C reads were aligned iteratively as single-end reads using Novoalign. Specifically, for iterative alignment, we first aligned the entire sequencing read to either the mouse or human genome. Unmapped reads were then trimmed by 5 base pairs and realigned. This process was repeated until the read successfully aligned to the genome or until the trimmed read was less than 25 base pairs long. After iterative mapping was finished, read pairs were reconstructed from single reads using an in-house pipeline. Unmapped reads were filtered out and PCR duplicate reads were removed. Final alignment files were then processed using the GATK pipeline, specifically using Indel Realignment and Variant Recalibration. A similar pipeline, without the iterative alignment step, was used to align the other high-throughput sequencing datasets.
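The trim-and-retry loop described above can be sketched as follows. This is only an illustration under stated assumptions, not the submitters' pipeline: `try_align` is a hypothetical stand-in for a call to a real aligner such as Novoalign, the toy aligner uses exact substring matching, and trimming from the 3' end is assumed because the record does not say which end was trimmed.

```python
from typing import Callable, Optional, Tuple

TRIM_STEP = 5    # bases removed after each failed attempt (from the record)
MIN_LENGTH = 25  # reads shorter than this are no longer retried (from the record)


def iterative_align(
    read: str,
    try_align: Callable[[str], Optional[int]],
) -> Optional[Tuple[int, int]]:
    """Return (position, aligned_length), or None if no attempt aligns.

    `try_align` is a hypothetical callback that returns a mapping
    position for a sequence, or None when the sequence is unmapped.
    """
    current = read
    while len(current) >= MIN_LENGTH:
        position = try_align(current)
        if position is not None:
            return position, len(current)
        # Assumption: trim from the 3' end; the record does not specify.
        current = current[:-TRIM_STEP]
    return None


def make_toy_aligner(reference: str) -> Callable[[str], Optional[int]]:
    """Toy aligner for illustration only: exact substring search."""
    def try_align(seq: str) -> Optional[int]:
        pos = reference.find(seq)
        return pos if pos >= 0 else None
    return try_align
```

In a Hi-C read that spans a ligation junction, the full-length read often fails to map while a shorter prefix does, which is the case this loop is meant to rescue.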
Haplotypes were generated with the HapCUT algorithm from the final aligned BAM file after merging the two biological replicates. HapCUT has been described previously (Bansal and Bafna, Bioinformatics 24, i153-159, 2008).
Genome_build: hg18
Supplementary_files_format_and_content: Reads are listed in BED format with one line for each sequencing read. The reads have been split by haplotype into the "A" and "B" (alternatively, "p1" and "p2") alleles according to the haplotype to which the bases within each sequencing read correspond. For paired-end Hi-C data, each line lists a single read, and pairing information can be obtained from the read names. The original fastq files for data other than the Hi-C and CTCF ChIP-seq are available in the GSE16256 dataset.
Supplementary_files_format_and_content: The processed haplotypes for the H1 genome ("H1_haps.vcf") are available in VCF format.
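Because each BED line holds a single read and pairing has to be recovered from the read names, a regrouping step along the following lines may be useful. This is a minimal sketch under assumptions the record does not confirm: the read name is taken to be in the fourth BED column, and mates are assumed to share a name apart from an optional trailing "/1" or "/2". The function names are hypothetical, and the actual naming convention in the files should be checked first.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end)


def mate_key(read_name: str) -> str:
    """Strip a trailing /1 or /2 (assumed mate suffix) from a read name."""
    if read_name.endswith(("/1", "/2")):
        return read_name[:-2]
    return read_name


def pair_bed_reads(lines: Iterable[str]) -> Dict[str, Tuple[Interval, Interval]]:
    """Group single-read BED lines into pairs keyed by shared read name."""
    by_name: Dict[str, List[Interval]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        # Assumption: column 4 holds the read name.
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        by_name[mate_key(name)].append((chrom, start, end))
    # Keep only names seen exactly twice, i.e. complete pairs.
    return {k: (v[0], v[1]) for k, v in by_name.items() if len(v) == 2}
```

Reads whose mate is missing, for example because it fell in the other haplotype file or was left unassigned, are dropped by this sketch; a real analysis would probably need to load both allele files before pairing.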
Submission date Nov 18, 2013
Last update date Mar 21, 2019
Contact name Jesse R Dixon
Organization name Salk Institute for Biological Studies
Street address 10010 N. Torrey Pines Rd.
City La Jolla
State/province CA
ZIP/Postal code 92037
Country USA
Platform ID GPL11154
GSE52457 Global Reorganization of Chromatin Architecture during Embryonic Stem Cell Differentiation
Reanalyzed by GSE85977
Reanalyzed by GSE87112
BioSample SAMN02404686
SRA SRX378271

Supplementary data files not provided
Raw data provided as supplementary file
Processed data is available on Series record

