GEO Accession viewer

Sample GSM2360382

Status

Public on Oct 27, 2016

Title

RaoHuntley-2014-HIC072

Sample type

SRA

Source name

Erythroleukemia

Organism

Homo sapiens

Characteristics

cell line: K562 (CCL-243)
protocol: in situ Hi-C

Growth protocol

Cell lines were cultured according to manufacturer's instructions

Extracted molecule

genomic DNA

Extraction protocol

Specific protocols and their descriptions are indicated as additional columns in the SAMPLES section.
Standard Illumina library construction protocol was performed, and libraries were sequenced on the HiSeq X Ten/NextSeq/HiSeq2500 following the manufacturer's protocols.

Library strategy

OTHER

Library source

genomic

Library selection

other

Instrument model

Illumina HiSeq 2500

Description

GSM1551621

Data processing

Library strategy: HiC-Seq
The paired-end reads were aligned separately using BWA against the b37 (human), mm10 (mouse), or rheMac2 (rhesus macaque) reference genome.
PCR duplicates, low-mapping-quality reads, and unligated reads were removed using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Contact matrices were constructed at various resolutions and normalized using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Genome_build: b37 (human), mm10 (mouse), rheMac2 (rhesus macaque)
Supplementary_files_format_and_content: Contact matrices: a text file with the raw observed contact matrix in sparse matrix notation at a given resolution. Only the upper triangle of the matrix is provided (i.e. i<=j); the matrix is symmetric, so M_i,j = M_j,i. At this stage of processing, read pairs where one or both ends do not align to the reference genome have already been removed, as well as chimeric ambiguous reads (see Section II.a.2 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a definition of chimeric ambiguous reads). In addition, duplicate reads (reads where both ends align to within +/- 4bp of each other) have been removed (see Section II.a.3 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a full description of duplicate removal). Full details of the Hi-C processing pipeline used in this study are provided in Section II.a. of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014.
Supplementary_files_format_and_content: Normalization files: normalization vectors that can be used to transform the raw contact matrices M into normalized matrices M*. Each file is ordered such that the first line of the normalization vector file is the norm factor for the first row/column of the corresponding raw contact matrix, the second line is the factor for the second row/column of the contact matrix, and so on. To normalize an entry M_i,j in a *RAWobserved file, divide the entry by the corresponding norm factors for i and j. (See Section II.b of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for more information about the different types of normalizations.)
Supplementary_files_format_and_content: HiCCUPS_looplist.txt files contain loop calls generated via HiCCUPS; fields 1-3 represent the locus participating in the loop closer to the p-end of the chromosome; fields 4-6 represent the locus participating in the loop closer to the q-end of the chromosome; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 represents the observed number of counts at the loop; fields 9-12 represent the expected number of counts at the loop using four different expected models; fields 13-16 are the q-values over each of the expected values; field 17 is the number of enriched pixels that were clustered into a particular loop; fields 18-19 are the centroid of the loop; field 20 is the radius of the loop.
Supplementary_files_format_and_content: Arrowhead_domainlist.txt files contain domain calls generated via Arrowhead; fields 1-6 represent the boundaries of the domain; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 is the corner score for the domain (see Rao, Huntley, et al.); fields 9-12 are the component scores used in the Arrowhead algorithm (see Rao, Huntley, et al.).
Supplementary_files_format_and_content: merged_nodups.txt files contain filtered, "normal" contacts. Each line represents a single Hi-C read pair that has passed the alignment and duplicate removal stages. The format of each line of the file is: read_name, strand1, chromosome1, position1, fragment-index1, strand2, chromosome2, position2, fragment-index2, mapq1, mapq2.
Supplementary_files_format_and_content: collisions.txt.gz files contain the contacts that have 3 or more loci.
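
The normalization rule for *RAWobserved entries and the merged_nodups.txt line layout described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the in-house pipeline: the function names, the `resolution` parameter (for the case where i and j are stored as genomic start positions rather than bin indices), the skipping of NaN or zero factors, and whitespace-delimited fields are all hypothetical choices that this record does not specify. The sample values are made up.

```python
import math
from collections import namedtuple

# Field order as listed in this record for merged_nodups.txt.
ReadPair = namedtuple("ReadPair", [
    "read_name", "strand1", "chromosome1", "position1", "fragment_index1",
    "strand2", "chromosome2", "position2", "fragment_index2", "mapq1", "mapq2",
])


def normalize_entries(entries, norm_vector, resolution=1):
    """Divide each raw entry M_i,j by norm[i] * norm[j].

    entries: iterable of (i, j, value) triples from the sparse upper triangle.
    norm_vector: list of floats, one per row/column, in file order.
    resolution: ASSUMPTION. If i and j are genomic start positions, pass the
        bin size so they map to 0-based indices; leave at 1 for bin indices.
    """
    normalized = []
    for i, j, value in entries:
        norm_i = norm_vector[i // resolution]
        norm_j = norm_vector[j // resolution]
        # ASSUMPTION: bins without a usable factor are NaN or 0; skip them.
        if math.isnan(norm_i) or math.isnan(norm_j) or norm_i == 0 or norm_j == 0:
            continue
        normalized.append((i, j, value / (norm_i * norm_j)))
    return normalized


def parse_merged_nodups_line(line):
    """Parse one read pair, assuming whitespace-delimited fields."""
    fields = line.split()
    if len(fields) < 11:
        raise ValueError("expected at least 11 fields, got %d" % len(fields))
    name, s1, c1, p1, f1, s2, c2, p2, f2, q1, q2 = fields[:11]
    # Strands and chromosome names stay as strings; the rest are integers.
    return ReadPair(name, s1, c1, int(p1), int(f1),
                    s2, c2, int(p2), int(f2), int(q1), int(q2))


if __name__ == "__main__":
    # Made-up 5 kb example: two bins, three upper-triangle entries.
    raw = [(0, 0, 10.0), (0, 5000, 4.0), (5000, 5000, 8.0)]
    print(normalize_entries(raw, [2.0, 0.5], resolution=5000))
    # Made-up read pair in the field order given above.
    print(parse_merged_nodups_line("read1 0 1 10000 5 16 1 250000 120 60 42"))
```

Because only the upper triangle is stored, the normalized value for (j, i) is the same as for (i, j) and does not need to be computed separately.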

Submission date

Oct 26, 2016

Last update date

May 15, 2019

Contact name

Miriam Huntley

E-mail(s)

mhuntley@fas.harvard.edu

Organization name

Harvard University

Street address

29 Oxford Street

City

Cambridge

State/province

ZIP/Postal code

02138

Country

USA

Platform ID

GPL16791

Series (1)

GSE71831

Deletion of DXZ4 on the human inactive X chromosome eliminates superdomains and impairs gene silencing

Relations

Reanalysis of

GSM1551621

BioSample

SAMN05943210

SRA

SRX2269475

Supplementary file	Size	Download	File type/resource
GSM2360382_RaoHuntley-2014-HIC072.collisions.txt.gz	41.5 Kb	(ftp)(http)	TXT
Processed data provided as supplementary file
Raw data are available in SRA