Series GSE126550
Status Public on Feb 15, 2019
Title Saturation mutagenesis of disease-associated regulatory elements
Organisms Homo sapiens; Mus musculus
Experiment type Other
Summary The majority of common variants associated with common diseases, as well as an unknown proportion of causal mutations for rare diseases, fall in noncoding regions of the genome. Although catalogs of noncoding regulatory elements are steadily improving, we have a limited understanding of the functional effects of mutations within them. Here, we performed saturation mutagenesis in conjunction with massively parallel reporter assays on 20 disease-associated gene promoters and enhancers, generating functional measurements for over 30,000 single nucleotide substitution and deletion mutations. We find that the density of putative transcription factor binding sites varies widely between regulatory elements, as does the extent to which evolutionary conservation or various integrative scores predict functional effects. These data provide a powerful resource for interpreting the pathogenicity of clinically observed mutations in these disease-associated regulatory elements, and also comprise a gold-standard dataset for the further development of algorithms that aim to predict the regulatory effects of noncoding mutations.
Overall design We set out to generate variant-specific activity maps for 20 disease-associated regulatory elements, including ten promoters (of TERT, LDLR, HBB, HBG, HNF4A, MSMB, PKLR, F9, FOXE1 and GP1BB) and ten enhancers (of SORT1, ZRS, BCL11A, IRF4, IRF6, MYC (2x), RET, TCF7L2 and ZFAND3), together with one ultraconserved enhancer (UC88). Specifically, we used massively parallel reporter assays (MPRAs) to perform saturation mutagenesis on each of these regulatory elements. Altogether, we empirically measured the functional effects of over 30,000 SNVs or single nucleotide deletions. We focused primarily on regulatory sequences in which specific mutations are known to cause disease, both for their clinical relevance and to provide positive control variants. Selected elements were limited to 600 base pairs (bp) for technical reasons related to the mapping of variants to barcodes by subassembly. In addition, we selected only sequences where cell line-based reporter assays were previously established. For each of the 21 regulatory elements, we used error-prone PCR to introduce sequence variation at a frequency of less than 1 change per 100 bp. While error-prone PCR is known to be biased in the types of mutations that are generated (e.g., a preference for transitions and T/A transversions), high library complexities (50k-2M constructs per target) allowed us to capture nearly all possible SNVs as well as many single base pair deletions with multiple independent constructs per variant. To distinguish the individual amplification products, we incorporated 15 or 20 bp random sequence tags 3' of the target region using overhanging primers during the error-prone PCR. We cloned promoter libraries and all but two enhancer libraries (RET in pGL3, ZRS in pGL4Z) into the backbones of slightly modified pGL4.11 (Promega, promoter) or pGL4.23 (Promega, enhancer) vectors.
For each MPRA experiment, around 5 million cells were plated and incubated for 24 hours before transfection with the libraries. In each experiment, three independent cultures (replicates) were transfected with the same library. In addition, for LDLR and SORT1, independent MPRA libraries were created, as outlined above, and cells were transfected from a different culture and on a different day. In one case (TERT), the same MPRA library was used for experiments in two different cell types (HEK293T and a glioblastoma cell line). The relative abundance of reporter gene transcripts driven by each promoter or enhancer variant was measured by counting associated 3' UTR tags in amplicons derived from RNA (obtained by targeted RT-PCR), and normalized to its relative abundance in plasmid DNA (obtained by targeted PCR). For all experiments, we excluded tags not matching the assignment and determined the frequency of a tag in RNA or DNA from high-throughput sequencing experiments based on the number of unique molecular identifiers. We only considered tag sequences observed in both RNA and DNA.
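The tag-level quantification described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the input layout (lists of tag/UMI pairs plus a tag-to-variant assignment dictionary), and the choice to compute relative abundances over the shared tags only are all assumptions made for illustration. The sketch counts unique molecular identifiers per tag, drops tags absent from the assignment, keeps only tags seen in both RNA and DNA, and reports a log2 RNA/DNA ratio per tag.

```python
import math
from collections import defaultdict


def umi_counts(reads, assignment):
    """Count unique UMIs per tag, skipping tags not in the assignment.

    `reads` is an iterable of (tag, umi) pairs (hypothetical input layout);
    `assignment` maps each known tag to its variant label.
    """
    umis = defaultdict(set)
    for tag, umi in reads:
        if tag in assignment:  # exclude tags not matching the assignment
            umis[tag].add(umi)
    return {tag: len(seen) for tag, seen in umis.items()}


def tag_log2_ratios(rna, dna):
    """Return log2(relative RNA abundance / relative DNA abundance) per tag.

    Only tags observed in both RNA and DNA are kept. Normalizing over the
    shared tags (rather than all tags) is an assumption of this sketch.
    """
    shared = [t for t in rna if rna[t] > 0 and dna.get(t, 0) > 0]
    rna_total = sum(rna[t] for t in shared)
    dna_total = sum(dna[t] for t in shared)
    return {
        t: math.log2((rna[t] / rna_total) / (dna[t] / dna_total))
        for t in shared
    }


# Toy data: three-letter tags stand in for the 15 or 20 bp random tags.
assignment = {"AAA": "reference", "CCC": "A12G", "TTT": "C5T"}
rna_reads = [("AAA", "u1"), ("AAA", "u1"), ("AAA", "u2"),
             ("CCC", "u1"), ("GGG", "u1")]
dna_reads = [("AAA", "u1"), ("CCC", "u1"), ("CCC", "u2"), ("TTT", "u1")]

rna = umi_counts(rna_reads, assignment)
dna = umi_counts(dna_reads, assignment)
ratios = tag_log2_ratios(rna, dna)
```

In this toy input, the tag GGG is discarded because it has no assignment, and TTT is discarded because it appears only in DNA. Combining tag-level values into a per-variant effect estimate is a separate step that this sketch does not attempt.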
Contributor(s) Ahituv N, Shendure J, Kircher M
Citation(s) 31395865
Submission date Feb 14, 2019
Last update date Aug 21, 2019
Contact name Jay Shendure
Organization name University of Washington
Department Genome Sciences
Lab Shendure
Street address 3720 15th Ave NE
City Seattle
State/province WA
ZIP/Postal code 98195-5065
Country USA
Platforms (3)
GPL15520 Illumina MiSeq (Homo sapiens)
GPL18573 Illumina NextSeq 500 (Homo sapiens)
GPL19057 Illumina NextSeq 500 (Mus musculus)
Samples (198)
GSM3604134 BCL11A saturation mutagenesis library
GSM3604135 F9 saturation mutagenesis library
GSM3604136 FOXE1 saturation mutagenesis library
BioProject PRJNA522353
SRA SRP187107

Supplementary file GSE126550_RAW.tar, 11.6 Gb, TAR (of FA, TSV, TXT)
Raw data are available in SRA
Processed data provided as supplementary file