Sample GSM2360382
Status Public on Oct 27, 2016
Title RaoHuntley-2014-HIC072
Sample type SRA
 
Source name Erythroleukemia
Organism Homo sapiens
Characteristics cell line: K562 (CCL-243)
protocol: in situ Hi-C
Growth protocol Cell lines were cultured according to manufacturer's instructions
Extracted molecule genomic DNA
Extraction protocol Specific protocols and their descriptions are indicated as additional columns in the SAMPLES section.
Standard Illumina library construction protocol was performed, and libraries were sequenced on the HiSeq X Ten/NextSeq/HiSeq2500 following the manufacturer's protocols.
 
Library strategy OTHER
Library source genomic
Library selection other
Instrument model Illumina HiSeq 2500
 
Description GSM1551621
Data processing Library strategy: HiC-Seq
The paired-end reads were aligned separately using BWA against the b37 (human), mm10 (mouse), or rheMac2 (rhesus macaque) reference genome.
PCR duplicates, low-mapping-quality reads, and unligated reads were removed using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Contact matrices were constructed at various resolutions and normalized using an in-house Hi-C analysis pipeline (see Rao, Huntley, et al., Cell 2014).
Genome_build: b37 (human), mm10 (mouse), rheMac2 (rhesus macaque)
Supplementary_files_format_and_content: Contact matrices: a text file with the raw observed contact matrix in sparse matrix notation at a given resolution. Only the upper triangle of the matrix is provided (i.e. i<=j); the matrix is symmetric, so M_i,j = M_j,i. At this stage of processing, read pairs where one or both ends do not align to the reference genome have already been removed, as well as chimeric ambiguous reads (see Section II.a.2 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a definition of chimeric ambiguous reads). In addition, duplicate reads (reads where both ends align to within +/- 4bp of each other) have been removed (see Section II.a.3 of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for a full description of duplicate removal). Full details of the Hi-C processing pipeline used in this study are provided in Section II.a of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014.
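The upper-triangle convention described above can be illustrated with a minimal Python sketch. It assumes each line of a contact matrix file holds three whitespace-separated values (row identifier, column identifier, count); the record only says "sparse matrix notation", so that layout, the function names, and the example values are illustrative assumptions rather than the pipeline's own code.

```python
def read_sparse_upper(lines):
    """Parse 'i j value' triples into a dict keyed by (i, j) with i <= j.

    Assumption: three whitespace-separated columns per line.
    """
    contacts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        i_text, j_text, value_text = line.split()
        i, j = int(i_text), int(j_text)
        key = (i, j) if i <= j else (j, i)
        contacts[key] = float(value_text)
    return contacts


def lookup(contacts, i, j):
    """Return M_i,j using the symmetry M_i,j = M_j,i; absent entries count as 0."""
    key = (i, j) if i <= j else (j, i)
    return contacts.get(key, 0.0)


# Synthetic example rows, not taken from any real file.
example = ["0 0 12.0", "0 5000 3.0", "5000 10000 7.0"]
contacts = read_sparse_upper(example)
```

Because only entries with i<=j are stored, `lookup(contacts, 5000, 0)` and `lookup(contacts, 0, 5000)` return the same value.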
Supplementary_files_format_and_content: Normalization files: normalization vectors that can be used to transform the raw contact matrices M into normalized matrices M*. Each file is ordered such that the first line of the normalization vector file is the norm factor for the first row/column of the corresponding raw contact matrix, the second line is the factor for the second row/column of the contact matrix, and so on. To normalize an entry M_i,j in a *RAWobserved file, divide the entry by the corresponding norm factors for i and j. (See Section II.b of the Extended Experimental Procedures of Rao, Huntley, et al., Cell 2014 for more information about the different types of normalizations.)
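The normalization rule above amounts to M*_i,j = M_i,j / (norm_i * norm_j). The following Python sketch shows one way to apply it. It assumes the row/column identifiers in the raw file are bin start positions in base pairs, so the line of the normalization vector is found by integer division by the resolution; if the identifiers are already 0-based row indices, `resolution=1` applies. Returning NaN for missing or zero factors is also an assumption, not something this record specifies.

```python
import math


def normalize_entry(raw_value, i, j, norm, resolution=1):
    """Divide a raw entry by the norm factors for its row and column.

    norm: list of floats, one per line of the normalization vector file.
    resolution: bin size in bp (assumption: i and j are bin start positions).
    """
    norm_i = norm[i // resolution]
    norm_j = norm[j // resolution]
    # Assumption: unusable factors (NaN or zero) yield NaN instead of an error.
    if math.isnan(norm_i) or math.isnan(norm_j) or norm_i == 0 or norm_j == 0:
        return float("nan")
    return raw_value / (norm_i * norm_j)


# Synthetic normalization vector for three bins at 5 kb resolution.
norm = [2.0, 4.0, float("nan")]
normalized = normalize_entry(16.0, 0, 5000, norm, resolution=5000)
```

Here the entry 16.0 at bins 0 and 1 is divided by 2.0 * 4.0.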
Supplementary_files_format_and_content: HiCCUPS_looplist.txt files contain loop calls generated via HiCCUPS; the first three fields represent the locus participating in the loop closer to the p-end of the chromosome; fields 4-6 represent the locus participating in the loop closer to the q-end of the chromosome; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 represents the observed number of counts at the loop; fields 9-12 represent the expected number of counts at the loop using four different expected models; fields 13-16 are the q-values over each of the expected values; field 17 is the number of enriched pixels that were clustered into a particular loop; fields 18-19 are the centroid of the loop; field 20 is the radius of the loop.
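The 20-field layout above maps naturally onto a small record type. The Python sketch below is a hedged illustration: it assumes tab-separated fields and that each three-field locus is (chromosome, start, end). The attribute names are hypothetical and the example row is synthetic.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Loop:
    locus1: Tuple[str, int, int]   # fields 1-3, closer to the p-end
    locus2: Tuple[str, int, int]   # fields 4-6, closer to the q-end
    color: str                     # field 7, Juicebox display color
    observed: float                # field 8
    expected: List[float]          # fields 9-12, four expected models
    q_values: List[float]          # fields 13-16
    num_clustered: int             # field 17
    centroid: Tuple[float, float]  # fields 18-19
    radius: float                  # field 20


def parse_loop_line(line, sep="\t"):
    """Parse one data line; assumes (chromosome, start, end) per locus."""
    f = line.rstrip("\n").split(sep)
    if len(f) < 20:
        raise ValueError(f"expected at least 20 fields, got {len(f)}")
    return Loop(
        locus1=(f[0], int(f[1]), int(f[2])),
        locus2=(f[3], int(f[4]), int(f[5])),
        color=f[6],
        observed=float(f[7]),
        expected=[float(x) for x in f[8:12]],
        q_values=[float(x) for x in f[12:16]],
        num_clustered=int(float(f[16])),
        centroid=(float(f[17]), float(f[18])),
        radius=float(f[19]),
    )


# Synthetic row for illustration only.
row = "\t".join([
    "1", "1000000", "1005000", "1", "1500000", "1505000", "0,255,255",
    "120", "30.5", "28.1", "25.0", "27.3",
    "1e-10", "2e-9", "3e-8", "4e-7", "5", "1002500", "1502500", "7071",
])
loop = parse_loop_line(row)
```

If a file begins with a header line, it would need to be skipped before calling `parse_loop_line`.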
Supplementary_files_format_and_content: Arrowhead_domainlist.txt files contain domain calls generated via Arrowhead; first 6 fields represent the boundaries of the domain; field 7 represents the color used to display the feature in Juicebox (a Hi-C data visualization software, see www.aidenlab.org/juicebox); field 8 is the corner score for the domain (see Rao, Huntley, et al); fields 9-12 are the component scores used in the Arrowhead algorithm (see Rao, Huntley, et al)
Supplementary_files_format_and_content: merged_nodups.txt files contain filtered, "normal" contacts. Each line represents a single Hi-C read pair that has passed the alignment and duplicate removal stages. The format of each line of the file is: read_name, strand1, chromosome1, position1, fragment-index1, strand2, chromosome2, position2, fragment-index2, mapq1, mapq2
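The eleven-field line format above can be read with a short Python sketch. It assumes whitespace-separated fields in exactly the order listed; the strand encoding is not stated here, so strands are kept as raw strings. The names `Contact`, `parse_contact`, and `is_intrachromosomal` are hypothetical, and the example line is synthetic.

```python
from collections import namedtuple

Contact = namedtuple(
    "Contact",
    [
        "read_name", "strand1", "chromosome1", "position1", "fragment_index1",
        "strand2", "chromosome2", "position2", "fragment_index2",
        "mapq1", "mapq2",
    ],
)


def parse_contact(line):
    """Parse one read-pair line in the field order listed above."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return Contact(
        read_name=f[0],
        strand1=f[1],              # kept as text: encoding is not specified
        chromosome1=f[2],
        position1=int(f[3]),
        fragment_index1=int(f[4]),
        strand2=f[5],
        chromosome2=f[6],
        position2=int(f[7]),
        fragment_index2=int(f[8]),
        mapq1=int(f[9]),
        mapq2=int(f[10]),
    )


def is_intrachromosomal(contact):
    """True when both read ends map to the same chromosome."""
    return contact.chromosome1 == contact.chromosome2


# Synthetic line for illustration only.
contact = parse_contact("read1 0 1 10000 5 16 1 250000 130 60 42")
```

Streaming the file line by line through `parse_contact` avoids loading all read pairs into memory at once.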
Supplementary_files_format_and_content: collisions.txt.gz files contain the contacts that have 3 or more loci.
 
Submission date Oct 26, 2016
Last update date May 15, 2019
Contact name Miriam Huntley
E-mail(s) mhuntley@fas.harvard.edu
Organization name Harvard University
Street address 29 Oxford Street
City Cambridge
State/province MA
ZIP/Postal code 02138
Country USA
 
Platform ID GPL16791
Series (1)
GSE71831 Deletion of DXZ4 on the human inactive X chromosome eliminates superdomains and impairs gene silencing
Relations
Reanalysis of GSM1551621
BioSample SAMN05943210
SRA SRX2269475

Supplementary file Size Download File type/resource
GSM2360382_RaoHuntley-2014-HIC072.collisions.txt.gz 41.5 Kb (ftp)(http) TXT
Processed data provided as supplementary file
Raw data are available in SRA
