Sample GSM8704779
Status Public on Dec 31, 2024
Title TBX5-ALFA
Sample type SRA
 
Source name synthetic construct
Organism synthetic construct
Characteristics tissue: synthetic construct
Extracted molecule other
Extraction protocol PADIT-seq experiments: To remove any supercoiling, PADIT-seq reporter libraries were first linearized with DrdI (NEB), which cuts a 12-bp DNA sequence (GACNNNN/NNGTC) only once in the pGL4.23 vector. For every DBD tested, the following 30 μl PURExpress IVTT reaction (NEB) was assembled: 10 μl Solution A, 7.5 μl Solution B, 1 μl murine RNase inhibitor (NEB), 3 μl 100 mM rNTPs, 0.45 μl 1000 mM magnesium acetate, 3 μl previously purified nbALFA-T7-RNA-Polymerase, ~300 ng linearized PADIT-seq reporter plasmid library, and ‘pT7 -DBD-T7Term’. The linearized PADIT-seq reporter plasmid library was mixed with ‘pT7 -DBD-T7Term’ amplicons in a 2:1 molar ratio. For PADIT-seq experiments with HOXD13 and EGR1, the 30 μl PURExpress IVTT reactions were split into three wells, and all subsequent steps were performed separately (3 biological replicates). For PADIT-seq experiments with NKX2.5, TBX5, Pho4 and Cbf1, the PURExpress IVTT reactions were scaled to 50 μl, split into five wells, and all subsequent steps were performed separately (5 biological replicates). We performed a total of 7 control ‘no DBD’ reactions (10 μl each): 3 for the first experiment with HOXD13 and EGR1, and 4 for the second experiment with NKX2.5, TBX5, Pho4 and Cbf1. cDNA synthesis of PADIT-seq reporter RNAs: After 4 hours at 37°C, the 10 μl reactions were purified with RNAClean XP (Beckman Coulter) according to the manufacturer’s instructions, eluting in 35 μl nuclease-free water. 2 μl barcoded cDNA synthesis primers (each at 0.1 μM final concentration) were added to 18 μl purified RNA, incubated at 75°C for 3 min, then placed on ice. cDNA was synthesized by adding 10 2X Multiscribe reaction mix (Thermo Fisher Scientific), and incubating at 25°C for 20 min, followed by 37°C for 120 min. Minus reverse transcriptase controls were performed in parallel. 
Excess primers were removed from the cDNA:RNA duplexes by adding exonuclease I (NEB) and incubating at 37°C for 60 min, followed by heat inactivation at 80°C for 20 min. Quantitative PCR was performed to verify degradation of all excess primers and to determine the threshold cycle of sample cDNAs. PADIT-seq library preparation for Illumina sequencing: For PADIT-seq experiments with HOXD13 and EGR1 using the small-scale PADIT-seq library, barcoded cDNAs synthesized from reporter RNAs were pooled prior to PCR amplification. The pooled cDNA was amplified in a single PCR reaction using KAPA HiFi polymerase with primers 'MPRA_AmpliconEZ_FWD' and 'MPRA_AmpEZ_REV2.0'. This generated a PCR-1 product that was then used as template for a second PCR with primers '#34_MPRA' and ‘169_TruSeq_Multiplex_220_2’ to attach Illumina adapters and sample barcodes. For the PADIT-seq experiment with HOXD13 and EGR1 using the all-10-mers library, barcoded cDNAs were kept separate and amplified in individual PCR reactions rather than pooled. For PADIT-seq experiments with NKX2.5, TBX5, Pho4 and Cbf1 using the all-10-mers library, barcoded cDNAs from replicates were pooled for each TF. The cDNA from PADIT-seq reporter RNAs was amplified using KAPA HiFi HotStart Polymerase (Roche) with primers 'MPRA_AmpliconEZ_FWD' and 'MPRA_AmpEZ_REV2.0'. This generated PCR-1 products that were then used as template for a second PCR with primers '#34_MPRA' and indexed TruSeq primers to attach Illumina adapters and sample barcodes. This was followed by Illumina sequencing, aiming for >50X coverage for each replicate (sample sequencing statistics in Table S4).
Construction of the all-10mers PADIT-seq reporter library: We designed and ordered two IDT Ultramers: one containing all possible 10-bp DNA sequences as candidate TFBS (‘All10mersTFBS_Top’), and another containing all possible 25-bp DNA sequences to serve as barcodes (‘25bpsBC_Bottom’). The two Ultramers were mixed in an equimolar ratio and made double-stranded in a single PCR cycle using KAPA HiFi polymerase. The pGL4.23 plasmid vector backbone was again PCR amplified in 2 steps with Q5 High-Fidelity 2X Master Mix. First, the backbone was amplified with primers ‘pGL4.23_FWD’ and ‘pGL4.23_REV’ to exclude the luciferase open reading frame. The resulting amplicon (2359 bp) was then further amplified with primers 'T7Term_pGL4.23_F_2.0' and 'pGL4.23_REV' to add a 56-bp DNA sequence as an overlapping region for Gibson Assembly, which was performed with the resulting amplicon (2415 bp) and the double-stranded oligo pool mixed in equimolar ratios. Following desalting with a mixed cellulose esters (MCE) hydrophilic membrane (0.025 μm), the assembled reporter library plasmid was electroporated into E. cloni 10G Supreme cells (n = 13 transformations). Based on plating experiments, the total number of transformants obtained was estimated to be 110 million, providing an average of ~100 barcodes per TFBS. The transformed cells were recovered, grown for 6.5 h, and maxi-prepped to obtain the complete all-10mers PADIT-seq reporter plasmid library containing over 100 million clones. Correct library assembly was validated by diagnostic PCR and Sanger sequencing of 10 colonies. Obtaining TFBS-BC pairing in the all-10mers PADIT-seq reporter library: To obtain TFBS-BC pairings, the all-10mers PADIT-seq reporter library was PCR amplified using KAPA HiFi polymerase. Four forward primers were designed with partial Illumina adapters, 6N randomized bases, and 2-bp staggers (‘All_10mers_LibSeq_F1-4’). 
These were used individually with a single reverse primer (‘All_10mers_LibSeq_R’) to generate 4 PCR-1 products of expected sizes 213-219 bp (9 cycles). The 4 PCR-1 products were then used as template in PCR-2 (5 cycles) with TruSeq indexed primers to attach Illumina sample indexes. This generated 4 PCR-2 products of expected sizes 272-278 bp. After PCR amplification, the 4 products were SPRI cleaned and analyzed on a TapeStation to confirm expected sizes. The 4 indexed libraries were sequenced separately on a NovaSeq6000 (2x150 bp reads). The sequencing data from the 4 indexed libraries (F1-F4) were combined and processed using custom scripts to extract unique TFBS-BC combinations along with their counts by matching the constant flanking regions. Barcodes unambiguously associated with only 1 TFBS across all 4 libraries were classified as ‘single TFBS barcodes’ and retained. The all-10mers PADIT-seq reporter plasmid library was amplified in 4 separate PCR reactions (F1-F4) with different TruSeq indexes to identify potential PCR-mediated recombination artifacts in the following way: for barcodes associated with multiple TFBS, an initial filter retained only TFBS observed independently in at least 2 of the 4 libraries. The rationale is that a TFBS-BC pair occurring in multiple independent libraries is likely a true pairing rather than an artifact of PCR-mediated recombination. After this first multi-library filtering step, any barcodes still associated with multiple TFBS were removed entirely to eliminate ambiguities. As an additional filter, barcodes where the top TFBS had fewer reads than the sum of discarded TFBS were removed. The ‘single TFBS barcodes’ and vetted multiple TFBS barcodes were combined to obtain high-confidence 1:1 TFBS-BC pairs for downstream analysis. This multi-step filtering process leveraged the independently prepared sequencing libraries to remove incorrect and ambiguous TFBS-BC pairings arising from PCR-mediated recombination. 
It retained high-confidence barcode-TFBS pairs that were reproducibly identified across multiple libraries while discarding likely PCR artifacts and errors.
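The TFBS-BC filtering steps described above can be sketched in Python as follows. This is an illustrative reading of the protocol text, not the submitters' custom scripts: the function name, the input layout (tuples of library, barcode, TFBS, read count), and the `min_libraries` parameter are all hypothetical. Two points are interpretations rather than statements from the record: the "top TFBS" is taken to be the single TFBS that survives the multi-library filter, and a barcode for which no TFBS survives that filter is dropped.

```python
from collections import defaultdict


def filter_tfbs_bc_pairs(observations, min_libraries=2):
    """Return {barcode: tfbs} for high-confidence 1:1 pairs.

    observations: iterable of (library, barcode, tfbs, read_count) tuples,
    e.g. ("F1", "ACGT...", "AAAAAAAAAA", 12). Hypothetical input layout.
    """
    reads = defaultdict(lambda: defaultdict(int))  # reads[barcode][tfbs]
    libs = defaultdict(lambda: defaultdict(set))   # libs[barcode][tfbs]
    for library, barcode, tfbs, count in observations:
        reads[barcode][tfbs] += count
        libs[barcode][tfbs].add(library)

    pairs = {}
    for barcode, per_tfbs in reads.items():
        if len(per_tfbs) == 1:
            # 'Single TFBS barcode': only one TFBS across all libraries.
            pairs[barcode] = next(iter(per_tfbs))
            continue
        # Keep only TFBS seen independently in at least min_libraries libraries.
        kept = [t for t in per_tfbs if len(libs[barcode][t]) >= min_libraries]
        if len(kept) != 1:
            # Still ambiguous, or nothing survived (assumption: drop both).
            continue
        top = kept[0]
        discarded_reads = sum(n for t, n in per_tfbs.items() if t != top)
        if per_tfbs[top] < discarded_reads:
            # Top TFBS has fewer reads than the discarded TFBS combined.
            continue
        pairs[barcode] = top
    return pairs
```

With this reading, a barcode seen with one TFBS in two libraries and a second TFBS in only one library keeps the first TFBS, provided the first TFBS has at least as many reads as the second.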
 
Library strategy OTHER
Library source other
Library selection other
Instrument model Illumina NovaSeq X Plus
 
Description Library name: TBX5-ALFA
Data processing Defining PADIT-seq activity and calling active k-mers: Barcodes from sequencing libraries were mapped to the associated TFBS based on previously obtained TFBS-BC pairings. Barcode counts per TFBS were obtained for each library and merged into a single data frame. Quality control was performed by generating Pearson correlation heatmaps and principal component analysis (PCA) plots to assess reproducibility between replicates and overall structure of the data. For differential activity analysis, read counts for the DBD of interest and a ‘no DBD’ control, across 3-5 replicates each, were analyzed using DESeq2. TFBS significantly bound by the DBD of interest (active k-mers) were identified by applying a false discovery rate (FDR) threshold of 5%.
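The barcode-to-TFBS aggregation and the replicate correlation check described above can be illustrated with a small standard-library sketch. The function names and the dictionary-based inputs are hypothetical, and the record does not say how barcodes missing from the pairing table were handled, so skipping them here is an assumption. The differential activity step itself was done with DESeq2 (an R package) and is not reproduced.

```python
from collections import defaultdict
from math import sqrt


def counts_per_tfbs(barcode_counts, bc_to_tfbs):
    """Sum barcode read counts into per-TFBS counts for one library.

    Barcodes absent from the pairing table are skipped (assumption).
    """
    totals = defaultdict(int)
    for barcode, count in barcode_counts.items():
        tfbs = bc_to_tfbs.get(barcode)
        if tfbs is not None:
            totals[tfbs] += count
    return dict(totals)


def merge_libraries(per_library):
    """Align {library: {tfbs: count}} into one table; missing TFBS count as 0."""
    all_tfbs = sorted(set().union(*per_library.values()))
    matrix = {
        lib: [counts.get(t, 0) for t in all_tfbs]
        for lib, counts in per_library.items()
    }
    return all_tfbs, matrix


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / sqrt(var_x * var_y)
```

Calling `pearson` on each pair of columns returned by `merge_libraries` gives the values that would populate a replicate correlation heatmap.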
Assembly: Custom: TFBS_BC_Combinations_all_10-mers_library.csv and TFBS_BC_Combinations_Output_UNIQUE.txt
Supplementary files format and content: ReadCounts_perTFBS_all10mers_4TFs-NKX2.5-TBX5-Pho4-Cbf1.txt.gz: Tab separated read count matrix
Supplementary files format and content: dds_results_NKX2.5_df.txt.gz: DESeq2 output comparing read counts from ALFA-NKX2.5 to the No-DBD control.
Supplementary files format and content: dds_results_TBX5_df.txt.gz: DESeq2 output comparing read counts from TBX5-ALFA to the No-DBD control.
Supplementary files format and content: dds_results_PHO4_df.txt.gz: DESeq2 output comparing read counts from ALFA-Pho4 to the No-DBD control.
Supplementary files format and content: dds_results_CBF1_df.txt.gz: DESeq2 output comparing read counts from ALFA-Cbf1 to the No-DBD control.
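As a rough illustration of how active k-mers could be pulled from one of the DESeq2 output tables listed above at the stated 5% FDR threshold, the sketch below filters on the adjusted p-value. The exact layout of the deposited files is not described in this record, so the tab delimiter, the header row, and the identifier column name `TFBS` are assumptions; `padj` is the standard DESeq2 column name for adjusted p-values, which DESeq2 may report as NA. Only the FDR threshold from the record is applied, with no filter on effect direction.

```python
import csv
import io


def active_kmers(handle, id_column="TFBS", fdr=0.05):
    """Return identifiers whose DESeq2 adjusted p-value is below `fdr`.

    handle: text-mode file object for a tab-separated table with a header.
    id_column: hypothetical name of the column holding the TFBS sequence.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    active = []
    for row in reader:
        padj = row["padj"]
        if padj in ("NA", ""):
            continue  # no adjusted p-value reported for this row
        if float(padj) < fdr:
            active.append(row[id_column])
    return active


# Tiny in-memory example with made-up values, not data from this sample.
demo = io.StringIO(
    "TFBS\tlog2FoldChange\tpadj\n"
    "AAA\t2.0\t0.001\n"
    "CCC\t0.1\t0.8\n"
    "GGG\t1.5\tNA\n"
)
print(active_kmers(demo))  # prints ['AAA']
```

For a downloaded file, the same function could be given a handle from `gzip.open(path, "rt")`, after checking the actual header of the file against the assumed column names.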
 
Submission date Dec 30, 2024
Last update date Dec 31, 2024
Contact name Shubham Khetan
E-mail(s) skhetan@bwh.harvard.edu
Phone 8607943361
Organization name Brigham and Women's Hospital
Street address 77 Avenue Louis Pasteur
City Boston
State/province MA
ZIP/Postal code 02115
Country USA
 
Platform ID GPL34698
Series (1)
GSE250601 Transcription factor genomic occupancy is determined by multiple, overlapping DNA binding sites
Relations
BioSample SAMN46039634
SRA SRX27217310

Supplementary file Size Download File type/resource
GSM8704779_dds_results_TBX5_df.txt.gz 53.8 Mb (ftp)(http) TXT
Raw data are available in SRA
