Status |
Public on Sep 02, 2015 |
Title |
MCF-7 Hi-C |
Sample type |
SRA |
|
|
Source name |
Breast cancer cell line
|
Organism |
Homo sapiens |
Characteristics |
cell line: MCF-7
|
Growth protocol |
MCF-10A cells were obtained from the Barbara Ann Karmanos Cancer Institute (Detroit, MI). The cells were maintained in monolayer in Dulbecco's modified Eagle's medium-F12 (DMEM/F12) (Invitrogen, 21041025) supplemented with 5% horse serum (Invitrogen, 16050122), 1% penicillin/streptomycin (Invitrogen, 15140122), 0.5 μg/ml hydrocortisone (Sigma, H-0888), 100 ng/ml cholera toxin (Sigma, C-8052), 10 μg/ml insulin (Sigma, I-1882), and 20 ng/ml recombinant human EGF (Peprotech, 100-15) as previously described (Debnath et al. 2003). MCF-7 cells were obtained from ATCC and were cultured in DMEM supplemented with 10% FBS and pen-strep.
|
Extracted molecule |
genomic DNA |
Extraction protocol |
The Hi-C libraries were prepared as previously described (PMID:22652625).
|
|
|
Library strategy |
OTHER |
Library source |
genomic |
Library selection |
other |
Instrument model |
Illumina HiSeq 2000 |
|
|
Data processing |
Library strategy: Hi-C
The Hi-C data were mapped with Bowtie and binned at 6.5Mb, 1Mb, 250kb, 100kb and 40kb non-overlapping genomic intervals. Iterative mapping and correction of Hi-C data were performed as previously described (Imakaev et al. 2012). Biological replicates showed high reproducibility (Pearson's correlation coefficient > 0.9 for 1Mb resolution data). Similarly, the first eigenvector comparison of the replicates showed high reproducibility. For the downstream analyses, sequences obtained from both biological replicates were pooled and ICE-corrected to serve as a combined dataset.
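As a rough illustration of the binning and replicate-comparison steps described above, the following minimal sketch counts read pairs into fixed-size, non-overlapping bins and computes a Pearson correlation between two replicate matrices. The function names (`bin_contacts`, `pearson_upper`), the toy coordinates, and the choice to correlate the upper triangle including the diagonal are assumptions made for illustration; the record does not state which matrix entries entered the correlation, and this is not the submitters' pipeline.

```python
from math import sqrt

def bin_contacts(pairs, n_bins, bin_size):
    """Count read pairs into a symmetric n_bins x n_bins matrix.

    pairs: iterable of (pos1, pos2) base-pair coordinates on one chromosome.
    """
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for pos1, pos2 in pairs:
        i, j = pos1 // bin_size, pos2 // bin_size
        matrix[i][j] += 1
        if i != j:
            matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix

def pearson_upper(mat_a, mat_b):
    """Pearson correlation over the upper triangles (diagonal included)."""
    n = len(mat_a)
    xs = [mat_a[i][j] for i in range(n) for j in range(i, n)]
    ys = [mat_b[i][j] for i in range(n) for j in range(i, n)]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Toy replicates at 40 kb resolution (coordinates are made up).
rep1 = bin_contacts([(10_000, 50_000), (45_000, 90_000), (5_000, 15_000)],
                    n_bins=3, bin_size=40_000)
rep2 = bin_contacts([(12_000, 60_000), (70_000, 100_000), (1_000, 2_000),
                     (90_000, 100_000)], n_bins=3, bin_size=40_000)
r = pearson_upper(rep1, rep2)
```

Real data would be binned per chromosome after mapping and ICE correction; the sketch only shows the counting and comparison logic.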
Compartment analysis: First, z-scores of the interaction matrices at 250kb resolution were generated as described previously (Lieberman-Aiden et al. 2009). Then, the Pearson correlation of the z-score matrices was calculated. Principal component analysis was performed on the correlation matrices (Lieberman-Aiden et al. 2009; Zhang et al. 2012); the first eigenvector typically represents the compartment profile (Lieberman-Aiden et al. 2009), with positive and negative values of the first eigenvector marking different compartments. Gene density for each compartment was calculated to call the “A” and “B” compartmentalization.
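The compartment steps above (correlation matrix, first eigenvector, gene-density labeling) can be sketched as follows. This is a minimal sketch under stated assumptions: it takes the leading eigenvector of the Pearson correlation matrix directly, which is a common shortcut rather than necessarily the exact PCA procedure used for this dataset, and it follows the usual convention that the gene-rich compartment is called "A". The function names and toy data are hypothetical.

```python
import numpy as np

def compartment_eigenvector(zscores):
    """Leading eigenvector of the row-wise Pearson correlation matrix.

    zscores: square, symmetric array of distance-normalized z-scores.
    """
    corr = np.corrcoef(zscores)              # Pearson correlation between rows
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return eigvecs[:, -1]                    # eigenvector of the largest eigenvalue

def label_compartments(eigvec, gene_density):
    """Call the sign group with the higher mean gene density "A".

    Assumes both positive and negative eigenvector values are present.
    """
    positive = eigvec > 0
    flip = gene_density[positive].mean() < gene_density[~positive].mean()
    is_a = ~positive if flip else positive
    return np.where(is_a, "A", "B")

# Toy checkerboard: bins 0-2 and bins 3-5 form two groups (made-up values).
signs = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
toy_z = np.outer(signs, signs)
toy_density = np.array([9.0, 8.0, 7.0, 1.0, 2.0, 1.0])  # genes per bin, made up
labels = label_compartments(compartment_eigenvector(toy_z), toy_density)
```

Because the sign of an eigenvector is arbitrary, the gene-density comparison is what fixes which group is reported as "A".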
TAD Calling: TAD calling was performed by calculating the “insulation” score of each bin using the 40kb resolution combined Hi-C data (as previously described). The mean of the interactions across each bin was calculated. By sliding a 1Mb x 1Mb (25 bins x 25 bins) square along the diagonal of the interaction matrix for every chromosome, we obtained the “insulation score” of the interaction matrix. Valleys in the insulation score indicated the depletion of Hi-C interactions occurring across a bin. These 40kb valleys represent the TAD boundaries. Based on the variation of boundaries between replicates, we chose to add a total of 160kb (80kb to each side) to the boundary to account for replicate variation. The final boundaries span a 200kb region. All boundaries with a boundary strength < 0.15 were excluded as they were considered weak and non-reproducible. The insulation plots for the biological replicates showed high reproducibility (Pearson correlation coefficient = 0.80 for MCF-7 and 0.90 for MCF-10A replicates), suggesting the robustness of the method. Similarly, the overlap of detected boundaries also showed high reproducibility between the biological replicates (~85% TAD boundary overlap for MCF-7 and ~91% for MCF-10A). Therefore, we used the combined Hi-C replicates for the TAD analyses.
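The sliding-square idea above can be illustrated with a small sketch. It assumes the square for bin i covers the rows of the `window` bins upstream and the columns of the `window` bins downstream, so that it contains only interactions crossing bin i; the exact square placement and the boundary-strength filter used for this dataset are not specified in the record and are not reproduced here. All function names and the toy matrix are hypothetical.

```python
import numpy as np

def insulation_scores(matrix, window=25):
    """Mean signal in a window x window square spanning each bin.

    For bin i the square covers rows i-window..i-1 and columns i+1..i+window.
    Bins too close to either end of the matrix are left as NaN.
    """
    n = matrix.shape[0]
    scores = np.full(n, np.nan)
    for i in range(window, n - window):
        scores[i] = matrix[i - window:i, i + 1:i + window + 1].mean()
    return scores

def local_minima(scores):
    """Indices of bins whose score is lower than both neighbours (valleys)."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i] < scores[i - 1] and scores[i] < scores[i + 1]]

def boundary_region(bin_index, bin_size=40_000, pad=80_000):
    """Extend a one-bin boundary by pad base pairs on each side."""
    start = max(0, bin_index * bin_size - pad)
    end = (bin_index + 1) * bin_size + pad
    return start, end

# Toy matrix: two domains (bins 0-4 and 6-10) separated by boundary bin 5.
toy = np.ones((11, 11))
toy[:5, :5] = 10
toy[6:, 6:] = 10
toy[5, :] = toy[:, 5] = 4
toy_scores = insulation_scores(toy, window=2)
valleys = local_minima(toy_scores)
regions = [boundary_region(b) for b in valleys]
```

With 40kb bins and 80kb of padding on each side, each reported region is 40kb + 160kb = 200kb wide, matching the arithmetic in the description.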
Z-Score Calculation: We modeled the overall Hi-C decay with distance using a modified LOWESS method (alpha = 1%, IQR filter), as described previously (Sanyal et al. 2012). LOWESS calculates the weighted average and weighted standard deviation for every genomic distance and therefore normalizes for genomic distance signal bias.
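To show what normalizing for genomic distance means in practice, the sketch below computes a z-score for each matrix entry relative to all entries at the same bin separation. Note the substitution: it uses the plain mean and standard deviation of each diagonal, not the modified LOWESS fit (alpha = 1%, IQR filter) described above, so it is a simplified stand-in rather than the method used for this dataset. The function name and toy matrix are hypothetical.

```python
import numpy as np

def distance_zscores(matrix):
    """Z-score each entry against all entries at the same bin separation.

    Uses the unweighted mean and standard deviation of each diagonal.
    Diagonals with zero spread (including single-entry ones) stay NaN.
    """
    n = matrix.shape[0]
    z = np.full((n, n), np.nan)
    for d in range(n):
        diag = np.diagonal(matrix, offset=d)
        sd = diag.std()
        if sd == 0:
            continue  # z-score undefined for a constant diagonal
        values = (diag - diag.mean()) / sd
        idx = np.arange(n - d)
        z[idx, idx + d] = values  # upper triangle
        z[idx + d, idx] = values  # mirrored lower triangle
    return z

# Small symmetric toy matrix (made-up counts).
toy_counts = np.array([[5, 2, 1, 0],
                       [2, 7, 4, 1],
                       [1, 4, 5, 6],
                       [0, 1, 6, 7]], dtype=float)
toy_zscores = distance_zscores(toy_counts)
```

A smoothed fit such as LOWESS would additionally share information between neighbouring distances, which matters at long range where each diagonal holds few entries.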
Genome_build: hg19
Supplementary_files_format_and_content: Hi-C: tarball archive of all normalized/corrected Hi-C data matrices binned at 40kb/250kb/1Mb, TAD boundaries at 40kb and genomic compartments at 250kb resolution
Supplementary_files_format_and_content: RNA-Seq: raw gene counts after RSEM analysis and the output file for the DESeq2 analysis
|
|
|
Submission date |
Mar 10, 2015 |
Last update date |
May 15, 2019 |
Contact name |
Rasim Barutcu |
Organization name |
ScitoVation
|
Street address |
6 Davis Drive
|
City |
Durham |
State/province |
NC |
ZIP/Postal code |
27709 |
Country |
USA |
|
|
Platform ID |
GPL11154 |
Series (1) |
GSE66733 |
Chromatin interaction analysis reveals changes in small chromosome and telomere clustering between epithelial and breast cancer cells |
|
Relations |
BioSample |
SAMN03397467 |
SRA |
SRX950725 |
Supplementary data files not provided |
Processed data are available on Series record |
Raw data are available in SRA |