Status |
Public on Mar 06, 2024 |
Title |
ALFA-EGR1 |
Sample type |
SRA |
|
|
Source name |
synthetic construct
|
Organism |
synthetic construct |
Characteristics |
tissue: synthetic construct
|
Extracted molecule |
other |
Extraction protocol |
PADIT-seq experiment: To remove any supercoiling, PADIT-seq reporter libraries were first linearized with DrdI (NEB), which recognizes a 12-bp DNA sequence (GACNNNN/NNGTC) present only once in the pGL4.23 vector. For every DBD being tested, a 30 µl PURExpress IVTT reaction (NEB) was assembled from: 10 µl Solution A, 7.5 µl Solution B, 1 µl Murine RNase Inhibitor, 3 µl 100 mM rNTPs, 0.45 µl 1000 mM Magnesium Acetate, 3 µl previously purified nbALFA-T7-RNA-Polymerase, ~300 ng linearized PADIT-seq reporter plasmid library, and ‘pT7-ALFA-DBD-T7Term’ amplicons. The linearized reporter plasmid library and the ‘pT7-ALFA-DBD-T7Term’ amplicons were mixed in an approximately 2:1 molar ratio. Each 30 µl reaction was split into three wells of 10 µl, and all subsequent steps were performed separately (3 biological replicates).
cDNA synthesis of PADIT-seq reporter RNAs: After 4 hours at 37°C, the 10 µl reactions were purified with RNAClean XP as per the manufacturer’s instructions, eluting in 35 µl nuclease-free water. 2 µl barcoded cDNA synthesis primers (0.1 µM final each) were added to 18 µl purified RNA, incubated at 75°C for 3 min, then placed on ice. cDNA was synthesized by adding 10 µl 2X Multiscribe reaction mix (Thermo Fisher) and incubating at 25°C for 20 min, followed by 37°C for 120 min. Minus reverse transcriptase controls were performed in parallel. Excess primers were removed from the cDNA:RNA duplexes by adding Exonuclease I and incubating at 37°C for 60 min, followed by heat inactivation at 80°C for 20 min. Quantitative PCR was performed to verify degradation of all excess primers and to determine the threshold cycle of sample cDNAs.
PADIT-seq library preparation for Illumina sequencing: For the small-scale PADIT-seq library, barcoded cDNAs synthesized from the reporter RNAs were pooled prior to PCR amplification. The pooled cDNA was amplified in a single PCR reaction using KAPA HiFi polymerase with primers ‘MPRA_AmpliconEZ_FWD’ and ‘MPRA_AmpEZ_REV2.0’. The resulting PCR-1 product was used as template for a second PCR with primers ‘#34_MPRA’ and ‘169_TruSeq_Multiplex_220_2’ to attach Illumina adapters and sample barcodes. In contrast, for the all-10mers PADIT-seq library, barcoded cDNAs were kept separate and amplified in individual PCR reactions rather than pooled. For each DBD, cDNA was amplified using KAPA HiFi HotStart Polymerase with primers ‘MPRA_AmpliconEZ_FWD’ and ‘MPRA_AmpEZ_REV2.0’. The resulting PCR-1 products were cleaned and quantified. In the second PCR, ‘#34_MPRA’ and indexed TruSeq primers were used to attach Illumina adapters and sample barcodes.
Construction of the all-10mers PADIT-seq reporter library: We designed and ordered two IDT Ultramers: one containing all possible 10-bp DNA sequences as candidate TFBS (‘All10mersTFBS_Top’), and another containing all possible 25-bp DNA sequences to serve as barcodes (‘25bpsBC_Bottom’). The two Ultramers were mixed in an equimolar ratio and made double-stranded in a single PCR cycle using KAPA HiFi polymerase. The pGL4.23 plasmid vector backbone was again PCR amplified in 2 steps with Q5 High-Fidelity 2X Master Mix. First, the backbone was amplified with primers ‘pGL4.23_FWD’ and ‘pGL4.23_REV’ to exclude the luciferase open reading frame. The resulting amplicon (2359 bp) was then further amplified with primers ‘T7Term_pGL4.23_F_2.0’ and ‘pGL4.23_REV’ to add a 56-bp DNA sequence as an overlapping region for Gibson Assembly. Gibson Assembly was performed with the resulting amplicon (2415 bp) and the double-stranded oligo pool mixed in equimolar ratios. Following desalting with a mixed cellulose esters (MCE) hydrophilic membrane (0.025 µm), the assembled reporter library plasmid was electroporated into E. cloni 10G Supreme cells (n = 13 transformations). Based on plating experiments, the total number of transformants was estimated to be 110 million, providing an average of ~100 barcodes per TFBS (110 million transformants across the 4^10 = 1,048,576 possible 10-mers). The transformed cells were recovered, grown for 6.5 hr, and maxi-prepped to obtain the complete all-10mers PADIT-seq reporter plasmid library containing over 100 million clones. Correct library assembly was validated by diagnostic PCR and Sanger sequencing of 10 colonies.
Obtaining TFBS-BC pairing in the all-10mers PADIT-seq reporter library: To obtain TFBS-BC pairings, the all-10mers PADIT-seq reporter library was PCR amplified using KAPA HiFi polymerase. Four forward primers were designed with partial Illumina adapters, 6N randomized bases, and 2-bp staggers (‘All_10mers_LibSeq_F1-4’). Each was used individually with a single reverse primer (‘All_10mers_LibSeq_R’) to generate 4 PCR-1 products of expected sizes 213-219 bp (9 cycles). The 4 PCR-1 products were then used as template in PCR-2 (5 cycles) with TruSeq indexed primers to attach Illumina sample indexes, generating 4 PCR-2 products of expected sizes 272-278 bp. After PCR amplification, the 4 products were SPRI cleaned and analyzed on a TapeStation to confirm the expected sizes. The 4 indexed libraries were sequenced separately on a NovaSeq 6000 (2x150 bp reads). The sequencing data from the 4 indexed libraries (F1-F4) were combined and processed using custom scripts that extract unique TFBS-BC combinations, along with their counts, by matching the constant flanking regions. Barcodes unambiguously associated with only 1 TFBS across all 4 libraries were classified as ‘single TFBS barcodes’ and retained. The library was amplified in 4 separate PCR reactions (F1-F4) with different TruSeq indexes so that potential PCR-mediated recombination artifacts could be identified as follows. For barcodes associated with multiple TFBS, an initial filter retained only TFBS observed independently in at least 2 of the 4 libraries. The rationale is that a TFBS-BC pair occurring in multiple independent libraries is likely a true pairing rather than an artifact of PCR-mediated recombination. After this first multi-library filtering step, any barcodes still associated with multiple TFBS were removed entirely to eliminate ambiguities. As an additional filter, barcodes where the top TFBS had fewer reads than the sum of the discarded TFBS were removed. The ‘single TFBS barcodes’ and the vetted multiple-TFBS barcodes were combined to obtain high-confidence 1:1 TFBS-BC pairs for downstream analysis. This multi-step filtering used the independently prepared sequencing libraries to retain barcode-TFBS pairs identified reproducibly across libraries while discarding incorrect and ambiguous pairings arising from PCR-mediated recombination and other errors.
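The barcode filtering described above can be sketched in Python. This is an illustrative reading of the protocol text, not the submitters' custom scripts: the function name `pair_barcodes`, the parameter `min_libraries`, the input layout (one dictionary of read counts per PCR library), and the toy sequences are all assumptions. In particular, "top TFBS" is interpreted here as the single TFBS that survives the multi-library filter.

```python
from collections import defaultdict

def pair_barcodes(libraries, min_libraries=2):
    """Return {barcode: tfbs} for barcodes that pass the filters.

    libraries: list of dicts, one per PCR library (hypothetical layout),
    each mapping (tfbs, barcode) -> read count.
    """
    # barcode -> tfbs -> [total reads, number of libraries observed in]
    seen = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for lib in libraries:
        for (tfbs, barcode), count in lib.items():
            entry = seen[barcode][tfbs]
            entry[0] += count
            entry[1] += 1

    pairs = {}
    for barcode, by_tfbs in seen.items():
        # 'Single TFBS barcodes' are retained directly.
        if len(by_tfbs) == 1:
            pairs[barcode] = next(iter(by_tfbs))
            continue
        # Keep only TFBS seen in at least `min_libraries` libraries.
        kept = [t for t, (_, n) in by_tfbs.items() if n >= min_libraries]
        # Barcodes that are still ambiguous (or left with nothing) are dropped.
        if len(kept) != 1:
            continue
        top = kept[0]
        discarded = sum(reads for t, (reads, _) in by_tfbs.items() if t != top)
        # Drop barcodes whose surviving TFBS has fewer reads than the discarded ones.
        if by_tfbs[top][0] < discarded:
            continue
        pairs[barcode] = top
    return pairs

# Toy data with shortened sequences; real entries would be 10-bp TFBS and 25-bp barcodes.
libs = [
    {("AAAA", "bc1"): 5, ("CCCC", "bc2"): 4, ("GGGG", "bc2"): 1,
     ("TTTT", "bc3"): 2, ("AAAA", "bc3"): 3, ("ACGT", "bc4"): 1, ("TGCA", "bc4"): 5},
    {("AAAA", "bc1"): 3, ("CCCC", "bc2"): 6,
     ("TTTT", "bc3"): 2, ("AAAA", "bc3"): 1, ("ACGT", "bc4"): 1},
    {},
    {},
]
pairs = pair_barcodes(libs)
```

In this toy input, `bc1` is a single TFBS barcode, `bc2` keeps the TFBS seen in two libraries, `bc3` stays ambiguous and is dropped, and `bc4` is dropped because its surviving TFBS has fewer reads than the discarded one.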
|
|
|
Library strategy |
OTHER |
Library source |
other |
Library selection |
other |
Instrument model |
Illumina NovaSeq 6000 |
|
|
Data processing |
Barcodes from sequencing libraries were mapped to the associated TFBS based on previously obtained TFBS-BC pairings. Barcode counts per TFBS were obtained for each library and merged into a single data frame. For differential activity analysis, read counts for the DBD-of-interest and a ‘No-DBD’ control, across 3 replicates each, were analyzed using DESeq2. TFBS significantly bound by the DBD-of-interest were identified by applying a false discovery rate (FDR) threshold of 5%.
Assembly: Custom: TFBS_BC_Combinations_all_10-mers_library.csv and TFBS_BC_Combinations_Output_UNIQUE.txt
Supplementary files format and content: TFBS_BC_Combinations_all_10-mers_library.csv: Comma-separated strings of 10-bp TFBS, 25-bp barcode, and count.
Supplementary files format and content: ReadCounts_perTFBS_all10mers_DF.txt: Tab-separated read count matrix.
Supplementary files format and content: dds_results_HOXD13_df.txt: DESeq2 output comparing read counts from ALFA-HOXD13 to the No-DBD control.
Supplementary files format and content: dds_results_EGR1_df.txt: DESeq2 output comparing read counts from ALFA-EGR1 to the No-DBD control.
Supplementary files format and content: TFBS_BC_Combinations_Output_UNIQUE.txt: Tab-separated strings of 9-bp TFBS and 20-bp barcodes.
Supplementary files format and content: ReadCounts_perTFBS_DF_DuplicatesNotRemoved_Tag_Linker.txt: Tab-separated read count matrix.
Library strategy: PADIT-seq
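As a rough illustration of the processing steps above, the following Python sketch sums barcode counts per TFBS, merges libraries into one count matrix, and selects rows of a DESeq2 results table at a 5% FDR. It is not the submitters' code. The function names, the library names, and the `TFBS` column are hypothetical; the layout of the deposited files (delimiter, identifier column) is assumed, not confirmed by this record. `padj` is the standard adjusted p-value column in DESeq2 results, and DESeq2 writes `NA` where no adjusted p-value was computed.

```python
import csv
import io
from collections import Counter

def counts_per_tfbs(barcode_counts, pairs):
    """Sum read counts per TFBS; barcodes without a known pairing are ignored."""
    totals = Counter()
    for barcode, count in barcode_counts.items():
        tfbs = pairs.get(barcode)
        if tfbs is not None:
            totals[tfbs] += count
    return totals

def merge_libraries(per_library):
    """per_library: {library name: Counter of TFBS counts} -> (header, rows)."""
    names = sorted(per_library)
    all_tfbs = sorted(set().union(*per_library.values()))
    rows = [[t] + [per_library[n].get(t, 0) for n in names] for t in all_tfbs]
    return ["TFBS"] + names, rows

def significant_tfbs(handle, fdr=0.05):
    """Return TFBS whose adjusted p-value is below `fdr` (assumed tab-separated input)."""
    hits = []
    for row in csv.DictReader(handle, delimiter="\t"):
        padj = row["padj"]
        if padj in ("", "NA"):  # no adjusted p-value available for this row
            continue
        if float(padj) < fdr:
            hits.append(row["TFBS"])
    return hits

# Toy inputs; real barcodes and TFBS would be full-length sequences.
pairs = {"bc1": "AAAA", "bc2": "CCCC"}
egr1 = counts_per_tfbs({"bc1": 7, "bc2": 2, "bcX": 9}, pairs)
ctrl = counts_per_tfbs({"bc1": 1}, pairs)
header, rows = merge_libraries({"EGR1_rep1": egr1, "NoDBD_rep1": ctrl})

toy_results = io.StringIO(
    "TFBS\tlog2FoldChange\tpadj\n"
    "AAAA\t3.1\t0.001\n"
    "CCCC\t0.2\t0.6\n"
    "GGGG\t0.0\tNA\n"
)
hits = significant_tfbs(toy_results)
```

The differential analysis itself is done by DESeq2 in the record; the last function only thresholds an already computed results table.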
|
|
|
Submission date |
Dec 19, 2023 |
Last update date |
Mar 06, 2024 |
Contact name |
Shubham Khetan |
E-mail(s) |
skhetan@bwh.harvard.edu
|
Phone |
8607943361
|
Organization name |
Brigham and Women's Hospital
|
Street address |
77 Avenue Louis Pasteur
|
City |
Boston |
State/province |
MA |
ZIP/Postal code |
02115 |
Country |
USA |
|
|
Platform ID |
GPL26526 |
Series (1) |
GSE250601 |
Transcription factor genomic occupancy is determined by multiple, overlapping DNA binding sites |
|
Relations |
BioSample |
SAMN38928089 |
SRA |
SRX22965803 |
Supplementary file | Size | Download | File type/resource |
GSM7982760_dds_results_EGR1_df.txt.gz | 29.7 Mb | (ftp)(http) | TXT |
SRA Run Selector |
Raw data are available in SRA |