Efficient gene knockout and genetic interactions: the IN4MER CRISPR/Cas12a multiplex knockout platform

Genetic interactions mediate the emergence of phenotype from genotype, but initial technologies for combinatorial genetic perturbation in mammalian cells suffer from inefficiency and are challenging to scale. Recent focus on paralog synthetic lethality in cancer cells offers an opportunity to evaluate different approaches and improve on the state of the art. Here we report a meta-analysis of CRISPR genetic interactions screens, identifying a candidate set of background-independent paralog synthetic lethals, and find that the Cas12a platform provides superior sensitivity and assay replicability. We demonstrate that Cas12a can independently target up to four genes from a single guide array, and we build on this knowledge by constructing a genome-scale library that expresses arrays of four guides per clone, a platform we call ‘in4mer’. Our genome-scale human library, with only 49k clones, is substantially smaller than a typical CRISPR/Cas9 monogenic library while also targeting more than four thousand paralog pairs, triples, and quads. Proof of concept screens in four cell lines demonstrate discrimination of core and context-dependent essential genes similar to that of state-of-the-art CRISPR/Cas9 libraries, as well as detection of synthetic lethal and masking/buffering genetic interactions between paralogs of various family sizes, a capability not offered by any extant library. Importantly, the in4mer platform offers a fivefold reduction in the number of clones required to assay genetic interactions, dramatically improving the cost and effort required for these studies.


Introduction
Pooled library CRISPR screens have revolutionized mammalian functional genomics. DepMap teams have screened over a thousand cancer cell lines with CRISPR knockout libraries to identify background-specific genetic vulnerabilities 1,2, while dozens of genetic modifier screens with small molecules have explored biomarkers and mechanisms of drug sensitivity and resistance 3-9. However, initial efforts to assay genetic interactions (GIs), that is, the manipulation of multiple genes in the same cell to identify nonlinear combinatorial phenotypes, have proven complex and costly 10-14. One class of GIs that has received special attention is the synthetic lethal relationships between paralogs, gene pairs or families that arise through duplication of a single ancestral gene. Functional buffering by paralogs, resulting in phenotypic masking in single gene perturbation experiments, has been shown extensively in model organisms 15,16. Paralogs are therefore attractive targets for genetic interaction studies in human cells, and they are more easily nominated by computational analyses 17,18 compared to genes that work in parallel pathways, such as BRCA1 and PARP1 19. Further, because the mechanism of action of drugs often relies on inhibition of paralog gene products to mediate cell toxicity, monogenic knockouts in CRISPR screens have resulted in false negatives, such as the failure to identify MEK and ERK proteins as critical for cancer cell growth.
unique added capability of targeting more than 4,000 paralog families of size two, three, and four. This combination of features is not available with any other CRISPR perturbation platform.

Results & Discussion
With the discovery that paralogs are both systematically underrepresented as hits in pooled library screens and likely offer the highest density of genetic interactions, several independent studies have each targeted hundreds of paralog pairs in multiple cell lines 18,20-23. However, evaluating the quality and consistency of these studies has proven difficult, since each uses a different technology and custom analytics pipeline for hit calling, and overlap between the targeted paralog pairs in each study is surprisingly slim (Figure 1A,B).
We developed a unified genetic interaction calling pipeline, based on measuring a pairwise gene knockout's deviation from expected phenotype (delta log fold change, dLFC) and the [...] (Figure 1E).
Using this pipeline, we found the large majority of paralog synthetic lethals to be platform-specific. To aid in comparing hits within and across experiments, we developed a platform quality score that broadly measures the replicability of these synthetic lethal screening technologies across different cell lines. We reasoned that, like individual essential genes, a large fraction of paralogs should show consistent synthetic lethality across most or all cell lines, which should be reflected in similarity of synthetic lethality profiles across cell lines. We therefore calculated the Jaccard coefficient of each pair of cell lines screened by a particular platform, then took the median of each platform's Jaccard coefficients as the platform quality score (Figure 1F).
We then calculated a paralog confidence score for each gene pair by taking the sum of each hit, weighted by the platform quality score, and subtracting the sum of each experiment in which the pair was assayed but not deemed a hit (a "miss"), also weighted by quality score (Figure 1G). Using this approach, paralog pairs that are hits in multiple high-quality screens outweigh pairs that are hits in screens with lower replicability or pairs that are background-specific hits in high-scoring screens. We further filtered for hits that are detected by more than one platform, minimizing the bias toward paralog pairs that are only assayed in one set of screens or with one perturbation technology. We identified a total of 26 gene pairs that meet these criteria, and we classified the top 13 hits (with paralog score > 0.25) as candidate paralog synthetic lethal gold standards (Figure 1H-J). Measuring the recall of each of the 21 cell line screens against these gold standards confirmed that the Cas12a platform used in Dede et al 18, with two Cas12a guide RNAs expressed from the same promoter, yielded the highest within-platform replicability (Supplementary Figure 2). Other platforms often showed high sensitivity in one screen, but highly variable sensitivity across multiple screens (Supplementary Figure 2).
bioRxiv preprint doi: https://doi.org/10.1101/2023.01.03.522655; this version posted September 5, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Optimizing the Cas12a system for multiplex perturbations
Based on the consistency of the Cas12a results in the paralog screens and its potential applications to higher-order multiplexing, we explored whether crRNA arrays longer than two guides could be utilized at scale. The Cas12a system has previously been shown to mediate multiplexing beyond two targets 28,29,22 with varying levels of efficiency. Guide RNA design is a critical factor in all CRISPR applications 30 , and empirical data on Cas12a guide efficacy is relatively sparse compared to >1,000 whole-genome screens in cancer cell lines performed with Cas9 libraries. We tested more than 1,000 crRNA from the CRISPick 26,31 design tool in a pooled library targeting coding exons of known essential genes and found very strong concordance between the CRISPick on-target score and the fold change induced by the gRNA (Supplementary Figure 3). We therefore considered CRISPick designs for all subsequent work.
We have previously shown that arrays encoding two crRNAs show minimal position effects 18,26,27, but information about longer arrays is sparse 22,28,29. We constructed arrays of up to 7 gRNAs to evaluate the maximum length that would yield gene knockout efficiency sufficient for pooled library negative selection screening. Seven essential and seven nonessential genes were selected, and one gene of each class was assigned to each position (1-7) on the array. A single guide RNA was selected for each gene, and arrays were constructed such that all combinations of essential and nonessential gRNA were represented, for a total library diversity of 128 array sequences (Figure 2A). The process was repeated two more times, using different gRNA targeting the same genes, creating three pools each of 128 unique sequences, each targeting the same seven essential and nonessential genes in all combinations (Figure 2B).
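The library diversity of 128 follows from choosing either the essential or the nonessential guide at each of seven positions (2^7 = 128). The following is a minimal sketch of that enumeration; the function name and the 0/1 encoding are illustrative assumptions, not the authors' design code.

```python
from itertools import product

N_POSITIONS = 7  # each position carries either a nonessential (0) or an essential (1) guide


def enumerate_arrays(n_positions=N_POSITIONS):
    """Return every essential/nonessential pattern across the array positions."""
    return list(product((0, 1), repeat=n_positions))


arrays = enumerate_arrays()
# Arrays with exactly one essential guide, used later to assess position effects.
single_essential = [a for a in arrays if sum(a) == 1]
print(len(arrays), len(single_essential))
```

The same binary patterns can serve directly as the predictor encoding for a position-level regression.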
We cloned the 7mer pools into the pRDA_550 vector, a one-component lentiviral vector expressing the Cas12a CRISPR endonuclease and the pac puromycin resistance gene from an EF-1α promoter and the array of Cas12a guides from a human U6 promoter (see Methods). We used the library to screen K562 cells, a BCR-ABL chronic myeloid leukemia cell line commonly used for functional genomics, and collected samples at 7, 14, and 21 days (Figure 2C). After normalization (see Methods), arrays with no essential gRNA showed no sign of negative selection compared to arrays with any number of essential gRNA. Arrays with multiple essential guides showed increasing loss of fitness, reaching maximal phenotype at 4 to 5 essential guides per array (Figure 2D; Supplementary Figure 4).
To evaluate position-level effects, we considered arrays encoding a single essential gRNA at any of the seven positions. Across the three replicates, we consistently observed greater fold change at the first four positions compared to the last three positions on the array (Figure 2E). We further tested whether this efficiency drop at the end of the array was a position-dependent effect or the result of unfortunate guide or gene selection. We constructed a second array with the same gRNA targeting the same genes in reverse order (one essential gRNA per array) and re-screened the same cells. When comparing the fold change of the forward array with the reverse array, observed fold changes on the diagonal indicate gene- and guide-level effects independent of position, while deviations from the diagonal indicate position-specific effects. Our data confirm that the first four to five gRNAs show no position-specific effects, but positions six and seven show marked deviation from the diagonal (Figure 2F). Based on these observations, we conservatively conclude that the Cas12a system using the pRDA_550 vector can effectively express and utilize arrays of four gRNAs.
We also evaluated whether the 7mer array could be used to identify combinatorial phenotypes.
We trained a linear regression model using a binary encoding of guide arrays as a predictor (where nonessential = 0 and essential = 1) and log fold change as a response variable (see Methods). The regression model provides excellent prediction of fold change for arrays encoding two essentials (R² = 0.78-0.91 for the three pools) from the sum of calculated single-guide position-level regression coefficients (Figure 2G). These observations are consistent with the multiplicative model of genetic interactions, which predicts that the result of independent loss of fitness perturbations is the sum (in log space) of the fold changes of the individual fitness perturbations. It further supports the utility of the Cas12a platform for multiplex perturbation and detection of genetic interactions, which are simply deviations from the expected phenotype according to this model, because the null model accurately fits the data for independent combinatorial perturbations.
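The regression described above can be sketched as an ordinary least-squares fit of log fold change on the binary array encoding. This is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, an intercept term is assumed, and the effect sizes below are synthetic values chosen only to show that an additive model is recovered.

```python
from itertools import product

import numpy as np


def fit_position_effects(patterns, lfc):
    """Least-squares fit of LFC on binary patterns; returns (intercept, per-position effects)."""
    X = np.asarray(patterns, dtype=float)
    X = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column (assumption)
    coef, *_ = np.linalg.lstsq(X, np.asarray(lfc, dtype=float), rcond=None)
    return coef[0], coef[1:]


def predict_additive(intercept, effects, pattern):
    """Expected LFC under the null model: sum (in log space) of single-position effects."""
    return intercept + float(np.dot(effects, pattern))


# Synthetic, noise-free example (values are NOT from the paper).
patterns = list(product((0, 1), repeat=7))
true_effects = np.array([-1.0, -0.9, -1.1, -0.8, -0.7, -0.3, -0.2])
lfc = np.asarray(patterns) @ true_effects
intercept, effects = fit_position_effects(patterns, lfc)
two_essential_prediction = predict_additive(intercept, effects, (1, 1, 0, 0, 0, 0, 0))
```

With real screen data, the residual between an array's observed LFC and this additive prediction is the quantity that would flag a candidate genetic interaction.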

The in4mer platform for single and combinatorial perturbation
With confidence that the Cas12a platform supports independent utilization of four guides expressed from a single array, we designed a prototype genome-scale library that targets both protein-coding genes and paralog families in the same pool. Each array encodes four distinct gRNA, each with a different DR sequence selected from the top performers in DeWeirdt et al 26 (Figure 3A). The library [...]
The ability to recapitulate known biology is an important control for new technologies, with the MAP kinase pathway a frequently used case study in paralog buffering 21,23. In K562 cells, the BCR-ABL fusion oncogene activates the STAT and MAP kinase pathways, and we classify ABL1, STAT5B, and the GRB2/SOS1/GAB2/SHC signal transduction module as essential genes (Figure 4F). None of the three RAS genes are individually essential, but the KRAS-NRAS pair shows a strong synthetic lethality.
Neither KRAS-HRAS nor HRAS-NRAS paralogs show genetic interaction, but the three-way HRAS-KRAS-NRAS clones also show strong essentiality, almost certainly due to the KRAS-NRAS interaction.
In an arrayed validation screen, increased cell death after joint KRAS-NRAS knockout confirms this observation (Figure 4G). While it is known that RTK/MAP kinase signal transduction flows through the RAS genes, to our knowledge, this is the first time that KRAS-NRAS functional buffering has been demonstrated experimentally.
Beyond the RAS genes, the rest of the MAP kinase pathway also shows the expected gene essentiality profile in K562 cells (Figure 4H). RAF1 is strongly essential, and while BRAF is slightly below our hit threshold, the BRAF-RAF1 pairwise knockout is consistent with independent additive phenotype. The third member of the paralog family, ARAF, is nonessential singly or in combination with the other RAF paralogs and has not been shown to operate in this pathway. The MEK kinases, MAP2K1/MAP2K2, show greater fold change from pairwise loss than from either individually, though below our strict threshold for synthetic lethality. The ERK kinases, MAPK1/MAPK3, show strong preferential reliance on MAPK1, also consistent with DepMap data for K562 cells.
Likewise, the other three cell lines we screened also show oncogene-driven essentiality and GI in the MAPK pathway (Figure 4H). KRAS and BRAF essentiality in A549 and A375 cells, respectively, are consistent with driver mutations in those genes, though NRAS is not detected in NRAS-driven [...]
Overall, however, distinguishing three-way synthetic lethal interactions from their composite pairs remains challenging. Even 80% single knockout efficiency can translate into (0.8)^3 ≈ 50% triple knockout efficiency, where cells with incomplete editing can mask severe triple knockout phenotypes.
Higher-order masking interactions, on the other hand, are strikingly visible. Both the core proteasome and the Chaperonin-Containing TCP1 (CCT) complex are composed of several weakly related proteins, which we target with three four-way constructs and numerous two-way constructs.
Since both the proteasome and the CCT complex are universally essential to proliferating cells, knockdown of single subunits induces a severe fitness phenotype. Knockout of these genes in pairs or quads yields no additional phenotype, resulting in masking/positive genetic interactions in all four cell lines (Supplementary Figure 8).
Discovering genetic modifiers of drug activity is a common goal of genetic screens. To demonstrate the performance of in4mer constructs in discovering chemogenetic interactions, we screened Meljuso cells in the presence of low-dose MEK inhibitor selumetinib. DrugZ analysis 8 of single gene targets (Figure 5A) shows that genetic perturbation of the MAP kinase pathway sensitizes cells to the drug (Figure 5A, B). While gene set enrichment analysis 33,34 identifies subunits of the mitochondrial ribosome, components of the peroxisome, and elements of the Hippo pathway (Figure 5B) as suppressor genes, only two Hippo pathway genes achieved high Z-scores (Figure 5A). DrugZ analysis of pairwise paralog knockouts yielded hits generally consistent with single gene knockout; that is, most paralog knockouts give DrugZ scores consistent with the most extreme single gene knockout (Figure 5C). In some cases, however, combinatorial perturbation of paralogs gave rise to synergistic effects, [...]
In total, the Inzolia library includes ~50k unique guide arrays, with ~40k targeting single genes and 9,822 arrays targeting paralog doubles, triples, and quads. Inzolia is therefore on par with latest-generation genome-scale CRISPR/Cas knockout libraries 26,35-38 (Figure 6), and is unique among such libraries in including thousands of reagents targeting paralogs. Moreover, the efficiency gain realized by having two guides targeting each of two genes in a paralog pair makes detection of genetic interactions tractable with only six reagents per gene pair, a fivefold improvement over the prior state of the art (Figure 6).

Conclusions
The rapid ascendancy of CRISPR-mediated genetic perturbation technologies over RNA interference methods was driven by major advances in assay sensitivity and specificity, with the absence of established gold standards arguably contributing to the shortcomings of RNAi-based studies of mammalian gene function 39. We 38,40 and others 2 have created widely used reference sets of essential and nonessential genes for use in quality control of monogenic loss of fitness screens. As CRISPR perturbation technology has advanced into genetic interactions, it has become clear that a similar gold standard for synthetic lethals is needed 30. Our meta-analysis of published screens for paralog synthetic lethals in human cells shows wide divergence in the paralogs assayed by each study and in the repeatability of each screen, as measured by the Jaccard coefficient of hits in different cell lines. We reasoned that paralogs that showed synthetic lethality within and across screening platforms are likely to be globally synthetic lethal, analogous to core essential genes, and the fact that 12 of our 13 candidate reference paralogs show more than 70% identity (and all are constitutively expressed) is consistent with this interpretation.
Notably, the engineered Cas12a endonuclease, developed in Kleinstiver et al. 25 and deployed in combinatorial screens in DeWeirdt et al. and paralog screens in Dede et al. 18 performed markedly better in terms of replicability. Based on this and our prior work with the CRISPR/Cas12a screens 26,27 , we tested the limits of the Cas12a system expressing guide arrays from the Pol III U6 promoter in a custom one-component lentiviral vector, pRDA_550. For longer arrays of seven independent gRNA, we observed that position-specific loss of knockout efficiency did not arise until after the fourth or fifth gRNA in the array. From this we developed the in4mer platform for arrays encoding four independent gRNA, each with an optimized spacer sequence from the CRISPick algorithm and with diverse but proven DR sequences to minimize the chance of recombination. By targeting single genes with four independent gRNA, we lower the odds that any single guide fails to induce the desired phenotype, extending the development of the Humagne library in work from DeWeirdt et al 26 . As with the Humagne library, having multiple independent gRNA on each array reduces the total number of reagents required to induce reliable gene perturbation.
To construct our Inzolia genome-scale human library, we began with reagents targeting single protein-coding genes and added arrays targeting more than four thousand paralog pairs, triples, and quads, with the in4mer arrays encoding two guides targeting each of the two genes in a pair or one guide per gene in a triple or quad family. Inzolia screens show high (at least 75%) sensitivity to detect synthetic lethals with just two reagents targeting each single knockout and two reagents targeting the double knockout, and offer the potential for novel biology arising from three- and four-way paralog synthetic lethals. The Inzolia library is thus a smaller and more efficient whole-genome library that addresses one of the major gaps of monogenic perturbation libraries: functional buffering by paralogs.
Beyond the paralogs in the Inzolia library, the in4mer system is a highly customizable platform offering a significant advance in the study of genetic interactions. Compared to the five paralog synthetic lethal studies, with each using at least thirty constructs per gene pair tested, in4mer requires fivefold fewer reagents for the same assay. This improvement has major implications for the cost-effectiveness of genetic interaction assays in mammalian cells, where the number of gene pairs and the diversity of cell/tumor lineages and genotypes combine to yield a vast search space. A fivefold reduction in experimental footprint could offer a correspondingly larger search space for the same effort, or the same search space across more backgrounds (e.g. cell lines) or environments (e.g. chemogenetic interactions) at nearly the same cost as a single screen with an equivalent combinatorial Cas9 system, with the Cas12a system yielding greater sensitivity and robustness. Moreover, custom library construction leverages the other key advantage of the Cas12a system: each library is constructed from a single ~200mer oligo pool and both cloning and amplicon sequencing are performed using essentially the same protocols as single-guide Cas9 screening, albeit with longer sequencing reads. With the in4mer system, a wider swath of the research community will be able to add targeted genetic interaction surveys to their experimental toolkits.

Data preprocessing
To reanalyze the data from the 5 paralog screens, raw read counts were downloaded, and the same pipeline was applied to all of them. A pseudocount of 5 reads was added to each construct in each replicate, and total read counts were normalized to 500 reads per construct. Log2 fold change (LFC) for each guide at late time point was calculated relative to the plasmid sequence counts.
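The preprocessing steps above can be sketched as follows. This is a minimal illustration, assuming that "normalized to 500 reads per construct" means rescaling each sample so its mean count per construct is 500; the function names are hypothetical.

```python
import numpy as np


def normalize_counts(counts, pseudocount=5, reads_per_construct=500):
    """Add a pseudocount, then rescale so each sample averages `reads_per_construct` reads."""
    counts = np.asarray(counts, dtype=float) + pseudocount
    target_total = reads_per_construct * counts.shape[0]
    # axis=0 sums over constructs: a scalar for one sample, per-column for a matrix.
    return counts / counts.sum(axis=0) * target_total


def log2_fold_change(late_counts, plasmid_counts):
    """LFC of each construct at the late time point relative to the plasmid reference."""
    return np.log2(normalize_counts(late_counts) / normalize_counts(plasmid_counts))


# Toy counts for three constructs (illustrative values only); the first is depleted.
lfc = log2_fold_change([5, 500, 500], [500, 500, 500])
```

Rows are constructs; a two-dimensional input with one column per replicate is normalized column by column.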
The data from each study (except Thompson) were divided into three groups: the constructs that target single genes paired with non-essential/non-targeting gRNAs (N) in the first position (gene_N), in the second position (N_gene), and constructs that target gene pairs (A_B). LFC values of each group were scaled individually so that the mode of each group was set to zero. Next, all three groups were merged in one table. Before dividing Ito's dataset into three groups, LFC values were scaled such that the mode of negative controls (non-essential_AAVS1) would be zero, and the TRIM family was also removed from this dataset to avoid false paralog pair discovery 13. Since in Thompson's study there was just one position for singleton constructs, LFC values were scaled so that the mode of negative controls (non-essential_Fluc) was set to zero. In the next step, the LFC of each construct was calculated as the mean of LFC across different replicates.
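Mode-centering of a group of LFC values can be sketched as below. The text does not state how the mode of a continuous LFC distribution was estimated, so the histogram-peak estimator and the bin count here are assumptions made for illustration, as are the function names.

```python
import numpy as np


def estimate_mode(values, bins=50):
    """Approximate the mode as the center of the most populated histogram bin (assumption)."""
    hist, edges = np.histogram(values, bins=bins)
    i = int(np.argmax(hist))
    return 0.5 * (edges[i] + edges[i + 1])


def center_on_mode(values, reference=None, bins=50):
    """Shift LFC values so the mode of `reference` (default: the values themselves) is zero."""
    values = np.asarray(values, dtype=float)
    ref = values if reference is None else np.asarray(reference, dtype=float)
    return values - estimate_mode(ref, bins=bins)
```

Passing the negative-control constructs as `reference` corresponds to the variant described for the Ito and Thompson datasets, where the control mode rather than the group mode is set to zero.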

Calculating genetic interaction
For each gene, single gene mutant fitness (SMF) was calculated as the mean construct log fold change of gene-control constructs. The control was either non-essential genes or non-targeting gRNAs. For each gene pair, the expected double mutant fitness (DMF) of genes 1 and 2 was calculated as the sum of SMF of gene 1 and SMF of gene 2. The observed DMF was the mean LFC of all constructs targeting genes 1 and 2. The difference between the observed and expected DMF was called dLFC.
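A minimal sketch of the dLFC calculation follows. The sign convention (observed minus expected, so that synthetic lethal pairs have negative dLFC) is an assumption chosen to agree with the hit threshold of dLFC < -1 used for hit calling; the function names and toy values are hypothetical.

```python
from statistics import mean


def single_mutant_fitness(gene_control_lfcs):
    """SMF: mean LFC of constructs pairing the gene with a control guide."""
    return mean(gene_control_lfcs)


def delta_lfc(smf_1, smf_2, pair_lfcs):
    """dLFC: observed double mutant fitness minus the additive expectation."""
    expected_dmf = smf_1 + smf_2
    observed_dmf = mean(pair_lfcs)
    return observed_dmf - expected_dmf


# Toy values (not from any screen): weak single effects, strong joint effect.
smf_1 = single_mutant_fitness([-0.1, -0.3])
smf_2 = single_mutant_fitness([0.0, -0.2])
dlfc = delta_lfc(smf_1, smf_2, [-2.0, -2.6])
```

In this toy case the pair is far more depleted than the sum of its single knockouts, which is the signature of a synthetic lethal interaction.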
The next step was to calculate a modified Cohen's D between the observed and expected distributions of LFC of gRNAs targeting genes. The expected distribution of gRNAs targeting a gene pair was calculated using the expected mean and expected standard deviation.
In each cell line, the paralog pairs with dLFC < -1 and Cohen's D > 0.8 were selected as hits. A Cohen's D greater than 0.8 indicates a large effect size between two groups, meaning that the expected and observed distributions of gRNAs are meaningfully separated. In total, 388 paralog pairs were identified as hits across all the studies.
To identify the most consistent method in terms of hit identification, the Jaccard similarity coefficient of every pair of cell lines in each study was calculated by taking the ratio of intersection of hits over union of hits. For the studies that screened more than two cell lines, the final platform weight was the median of the calculated Jaccard coefficients of all pairs of cell lines.
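The platform weight can be sketched as the median pairwise Jaccard coefficient of hit sets. The function names and the placeholder pair labels are hypothetical, and returning 0 when both hit sets are empty is an assumption the text does not address.

```python
from itertools import combinations
from statistics import median


def jaccard(hits_a, hits_b):
    """Intersection over union of two hit sets (0.0 if both are empty, by assumption)."""
    a, b = set(hits_a), set(hits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def platform_weight(hits_by_cell_line):
    """Median Jaccard coefficient over all pairs of cell lines screened on one platform."""
    lines = sorted(hits_by_cell_line)
    return median(
        jaccard(hits_by_cell_line[x], hits_by_cell_line[y])
        for x, y in combinations(lines, 2)
    )


# Placeholder cell lines and paralog pairs, for illustration only.
example = {
    "line_A": {"pair1", "pair2", "pair3"},
    "line_B": {"pair2", "pair3", "pair4"},
    "line_C": {"pair3"},
}
weight = platform_weight(example)
```

At least two cell lines are required; with exactly two, the weight is simply their single Jaccard coefficient.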

Scoring Paralog Pairs
Each hit was scored based on the cell lines in which it was identified as a hit; cell lines were weighted based on the platform weight described above. We defined the "paralog score" as the sum of platform weights of cell lines in which the paralog pair was identified as a hit minus the sum of platform weights of cell lines in which the paralog pair was assayed but not identified as a hit (a "miss"). The distribution of scores is shown in Figure 1. Gene pairs that had a paralog score > 0.25 and were identified as a hit in two or more studies were listed as candidate gold standard paralog synthetic lethals.
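The scoring rule can be sketched as a signed sum of platform weights. The data layout (one `(study, is_hit)` tuple per cell line in which the pair was assayed), the function names, and the weights in the example are all hypothetical; each study is treated as one platform, as in the meta-analysis.

```python
def paralog_score(calls, platform_weights):
    """Sum of weights over hits minus sum of weights over misses for one paralog pair."""
    score = 0.0
    for study, is_hit in calls:
        weight = platform_weights[study]
        score += weight if is_hit else -weight
    return score


def is_candidate_gold_standard(calls, platform_weights, min_score=0.25, min_studies=2):
    """Apply both criteria: score above threshold and hits in at least two studies."""
    studies_with_hit = {study for study, is_hit in calls if is_hit}
    return (
        paralog_score(calls, platform_weights) > min_score
        and len(studies_with_hit) >= min_studies
    )


# Illustrative weights and calls (not values from the meta-analysis).
weights = {"study_A": 0.4, "study_B": 0.2}
calls = [("study_A", True), ("study_A", True), ("study_B", True), ("study_B", False)]
score = paralog_score(calls, weights)
```

A pair that hits repeatedly on a single platform can reach a high score yet still fail the two-study requirement, which is the bias the second criterion is meant to limit.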

One-component CRISPR/enCas12a vector
To construct an all-in-one vector for expression of both Cas12a and a guide array, we first swapped in puromycin resistance in place of blasticidin resistance from pRDA_174 (Addgene 136476). We then tested four locations for the insertion of a U6-guide expression cassette; notably this cassette needs to be oriented in the opposite direction of the primary lentiviral transcript to prevent Cas12a-mediated processing during viral packaging in 293T cells. The construct with the best-performing location, between the cPPT and the EF-1α promoter, was designated pRDA_550 (Addgene #203398). Synthesis of DNA and custom cloning was performed by Genscript.

7Mer library production
An oligonucleotide pool consisting of 7 Essential and 7 Non-Essential gene crRNAs with their nearby DR, BsmBI recognition as well as overhang sequence was synthesized by Integrated DNA Technologies. The pool was amplified by asymmetric PCR and then assembled into the pRDA_550 vector using the NEBridge® Golden Gate Assembly Kit (BsmBI-v2) (New England Biolabs) to obtain the designed library. The assembled product was transformed into NEB® Stable Competent E. coli (High Efficiency) cells and the plasmid DNA was purified using the PureLink™ Plasmid Purification Kit (Invitrogen). Three oligonucleotide pools were cloned separately and pooled together to acquire the final 7mer library. The library was sequenced to confirm uniform and complete library representation.

Paralog selection for IN4MER/INZOLIA
Human paralogs and percent identity data were imported from BioMart, which reports both AB and BA percent identity (these can differ if the two genes encode proteins of different lengths). Mean percent identity ((AB + BA) / 2) and delta percent identity (|AB - BA|) between paralogs were then calculated, and for the prototype library, paralogs with mean percent identity between 30% and 99% and delta percent identity < 10% were selected (Supplementary Figure 5). Next, CCLE expression data was downloaded, and the mean and standard deviation of expression across all CCLE samples was calculated for each gene. Paralogs where both genes had mean expression > 2 and stdev < 1.5 were selected (i.e. constitutively expressed genes).
Finally, to identify and include paralog families of size > 2, we applied a "difference from top paralog" filter. For each gene A in the pool, we identified its top paralog B by max sequence identity. Then for each other candidate paralog C, we calculated the drop in sequence identity, AB -AC (see distribution of drop % in Supplementary Figure 5). For the prototype library, we defined A,B,C as being in the same family if AB -AC < 10%.
For the final Inzolia library, we relaxed several of these filters. The delta percent identity filter and the expression variance filter were removed entirely, and the difference from top paralog filter was expanded to 20%. The mean expression filter was retained. These three filtering steps resulted in a total of 4435 paralog pairs included in the Inzolia pool library.
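The sequence identity filters can be sketched as below. Function names and gene labels are hypothetical, and treating the 30% and 99% bounds as inclusive is an assumption. The defaults reflect the prototype library; for the final Inzolia library the delta filter would be skipped and `max_drop` set to 20.

```python
def identity_stats(ab, ba):
    """Mean and delta percent identity from the two directional BioMart values."""
    return (ab + ba) / 2, abs(ab - ba)


def passes_identity_filter(ab, ba, lo=30.0, hi=99.0, max_delta=10.0):
    """Prototype filter: mean identity within [lo, hi] and delta identity below max_delta."""
    mean_id, delta_id = identity_stats(ab, ba)
    return lo <= mean_id <= hi and delta_id < max_delta


def family_members(identity_to_a, max_drop=10.0):
    """Paralogs of gene A whose identity is within max_drop of A's top paralog."""
    top = max(identity_to_a.values())
    return sorted(g for g, pid in identity_to_a.items() if top - pid < max_drop)


# Hypothetical paralogs B, C, D of a gene A, with made-up percent identities.
family = family_members({"B": 85.0, "C": 78.0, "D": 60.0})
```

Here B is the top paralog of A, C falls within the 10% drop and joins the family, and D is excluded; with `max_drop=20.0` the result is unchanged because D drops by 25%.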

IN4MER Prototype library production (MD Anderson)
Oligonucleotide pools consisting of designed four-plex guide arrays were synthesized by Twist Bioscience. The prototype pool consists of 43,972 arrays targeting 19,687 single genes, 2,082 paralog pairs, 167 paralog triples, and 48 paralog quads.

INZOLIA library production (Broad Institute)
The final Inzolia pool consists of arrays targeting 19,687 single genes, 4,435 paralog pairs, 376 paralog triples, and 100 paralog quads, plus 20 arrays targeting EGFP, 500 targeting intergenic loci, and 50 encoding nontargeting guides. Each array in the oligonucleotide pools is constructed as follows: [...] (Epizyme). The assembly product was purified by isopropanol precipitation, electroporated into Stbl4 electrocompetent cells (Life Technologies) and grown at 37 °C for 16 hours on agar with 100 µg/mL carbenicillin. Colonies were scraped and plasmid DNA (pDNA) was extracted via HiSpeed Plasmid Maxi (Qiagen). The library was sequenced to confirm uniform and complete library representation.

Cell culture
K562 and A549 cells were a gift from Tim Heffernan. A375 and MelJuso were obtained from the Cancer Cell Line Encyclopedia. Cell line identities were confirmed by STR fingerprinting by M.D. Anderson Cancer Center's Cytogenetic and Cell Authentication Core. All cell lines were routinely tested for mycoplasma contamination using cells cultured in non-antibiotic medium (PlasmoTest Mycoplasma Detection Assay, InvivoGen).
All cell lines were grown at 37 °C in humidified incubators at 5.0% CO2 and passaged to maintain exponential growth.

Cas12a Screens
Lentivirus was produced by the University of Michigan Vector Core (prototype) or the Broad GPP (Inzolia). Virus stocks were not titered in advance. Transduction of the cells was performed at 1X concentration of virus with corresponding polybrene.
Non-transduced cells were eliminated via selection with puromycin dihydrochloride. Selection was maintained until all non-transduced control cells reached 0% viability. Once selection with puromycin was complete, surviving cells were pooled and cells were harvested at 500X coverage for a T0 sample. After T0, cells were harvested at 500X coverage on corresponding days.
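As a rough illustration of what 500X coverage implies, the number of cells to harvest per sample scales with library size. The sketch below assumes that coverage means the average number of cells per array and uses the 43,972-array prototype pool as an example; the function name `cells_for_coverage` is hypothetical.

```python
def cells_for_coverage(n_arrays, coverage=500):
    """Cells needed so that each array is represented `coverage` times on average."""
    return n_arrays * coverage


# Prototype pool size as stated in the library production section.
prototype_cells = cells_for_coverage(43_972, coverage=500)
print(f"{prototype_cells:,} cells per harvested sample")
```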

Prototype In4mer library Genomic DNA preparation and sequencing (MD Anderson)
Genomic DNA (gDNA) was extracted using the Mag-Bind® Blood & Tissue DNA HDQ 96 Kit (Omega Bio-tek) and quantified by the Qubit™ dsDNA Quantification Assay Kits (ThermoFisher).
Illumina-compatible guide array amplicons were generated by amplification of the gDNA in a one-step PCR. Indexed PCR primers were synthesized by Integrated DNA Technologies using the standard 8nt indexes from Illumina (D501-D508 and D701-D712) as follows:

7Mer screen data analysis
Reads for each reagent were counted using only exact matches to the entire 281-nucleotide 7mer sequence, excluding the leading DR (7 23mer spacer sequences + 6 20mer DR sequences). Fold changes were calculated relative to the mean of the T0 samples and averaged across replicates. For each sample (T7/14/21), fold changes were normalized by subtracting the mean fold change of arrays with 7 nonessentials, i.e. setting nonessential-only arrays to zero.
We expected that the selected essential genes would not show any pairwise or higher order interactions, and thus should be governed by the multiplicative model of genetic interaction. To evaluate this model, we fit a regression model:
$$y = A\beta$$
where $A$ is a binary matrix of 7mer guide arrays (rows, k=384) by positions (columns, n=7), with $A_{i,j} = 1$ if guide array i targets an essential gene at position j and 0 if not. $y$ is the vector of normalized observed fold changes, and the n-length vector of $\beta$ coefficients represents the single gene knockout phenotypes learned from the model. We filtered this construct for reagents that encoded two or fewer essential genes (k=87 rows). After the linear fit, we compared the predicted zero, one, and two gene knockout fitness profiles (obtained by summing the $\beta$ coefficients for each gene) to the mean observed knockout fitness. R² values for each pool ranged from 0.78 to 0.91, and the overall quality of the linear fit supports the multiplicative model for non-interacting genes as assayed by combinatorial CRISPR knockouts of up to two genes. An accurate null model for noninteraction is critical for detecting and classifying deviations from this model that reflect positive or negative genetic interactions.
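The regression described above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the analysis code used in the study: it assumes fold changes are in log space (so the multiplicative model becomes additive), fits without an intercept, uses 3 positions instead of 7, and generates noise-free data from made-up coefficients. The helper names `fit_least_squares` and `r_squared` are hypothetical.

```python
from itertools import product


def fit_least_squares(A, y):
    """Solve the normal equations (A^T A) beta = A^T y; no intercept term."""
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(n)]
        + [sum(row[i] * yi for row, yi in zip(A, y))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def r_squared(y, y_hat):
    """Coefficient of determination between observed and predicted values."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy design: every array with two or fewer essential positions (3 positions).
A = [row for row in product([0, 1], repeat=3) if sum(row) <= 2]
true_beta = [-1.0, -2.0, -0.5]  # made-up single-knockout log fold changes
y = [sum(a * b for a, b in zip(row, true_beta)) for row in A]

beta_hat = fit_least_squares(A, y)
y_hat = [sum(a * b for a, b in zip(row, beta_hat)) for row in A]
r2 = r_squared(y, y_hat)
print(beta_hat, r2)
```

Because the toy data are generated from the additive model itself, the fit recovers the coefficients essentially exactly; with real screen data, the R² of this fit is what indicates how well the non-interaction model holds.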

IN4MER/Inzolia screen data analysis
In4mer library sequencing reads were mapped to the library using only perfect matches. BAGEL2 was used to normalize sample level read counts and to calculate fold changes relative to the T0 reference using the BAGEL2.py fc option with default parameters 41 . Essential and nonessential genes were defined using the Hart reference sets from 38,40 . Since the library targets both individual genes and specific gene sets (paralogs), we calculated the average gene/gene set (hereafter 'gene') log fold change as the mean of the clone-level fold changes across two replicates. All fold changes are calculated in log2 space. Cohen's D statistics were calculated in Python as described in Paralog metaanalysis above. Data for recall-precision curves were calculated using BAGEL2. We set an arbitrary threshold of fc < -1 for essential genes.
For genetic interaction analysis, the expected fold change was calculated as the sum of the gene-level fold changes for each individual gene in the gene set. Expected fc was subtracted from observed fc to calculate delta log fold change, dLFC, where negative dLFC indicates synthetic/synergistic interactions with more severe negative phenotype, and positive dLFC indicates positive/suppressor/masking interactions with less severe negative or more positive phenotype than expected. We set an arbitrary threshold of dLFC < -1 for synthetic lethality, and > +1 for masking/suppressor interactions.
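The dLFC calculation and thresholds described above can be sketched as follows. This is a minimal illustration with made-up log2 fold changes and placeholder gene names; the function names `delta_lfc` and `classify_interaction` are hypothetical and are not part of BAGEL2.

```python
def delta_lfc(observed_lfc, single_gene_lfc, genes):
    """Observed gene-set LFC minus the sum of the single-gene LFCs (expected)."""
    expected = sum(single_gene_lfc[g] for g in genes)
    return observed_lfc - expected


def classify_interaction(dlfc, threshold=1.0):
    """Apply the dLFC < -1 and dLFC > +1 thresholds."""
    if dlfc < -threshold:
        return "synthetic lethal"
    if dlfc > threshold:
        return "masking/suppressor"
    return "no interaction called"


# Made-up log2 fold changes for illustration only.
single_gene_lfc = {"GENE_A": -0.25, "GENE_B": -0.25, "GENE_C": -1.5, "GENE_D": -1.0}

# Pair with a much stronger phenotype than expected (-0.5).
dlfc_ab = delta_lfc(-2.0, single_gene_lfc, ["GENE_A", "GENE_B"])
# Pair with a milder phenotype than expected (-2.5).
dlfc_cd = delta_lfc(-1.0, single_gene_lfc, ["GENE_C", "GENE_D"])

print(dlfc_ab, classify_interaction(dlfc_ab))
print(dlfc_cd, classify_interaction(dlfc_cd))
```

The same function applies to triples and quads, since the expected value is simply the sum over all genes in the set.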

RAS synthetic lethal validation
An arrayed knockout apoptosis assay approach was adopted to validate RAS synthetic lethality in K562. Two guides were selected for each of the three RAS genes, and two clones were designed for each target/gene combination. Guide RNAs were selected through CRISPick, and gBlocks (same construct as the Inzolia library) were synthesized by Integrated DNA Technologies. The arrays were individually cloned into the pRDA_550 backbone and plasmids were validated by Sanger sequencing. The plasmids were then individually transfected into K562 cells via the Neon Transfection System (Invitrogen). Each group was transfected with 2 μg of DNA per 2 × 10^6 cells, using the recommended setting for K562 electroporation with one pulse at 1000 V, 50 ms. Non-transfected cells were eliminated through puromycin selection, which was maintained until non-transfected control cells reached 0% viability. Triplicate wells were maintained after selection until the end of the experiment. Cell viability, total cell number, live cell size, and dead cell size data were collected by reading Trypan Blue (Gibco) stained cells on a Countess II FL (Thermo Fisher) at each passage until 9 days after puromycin selection, in line with the Inzolia screen end point of 8 days in K562 cells. The percentage of dead cells was normalized to the negative control, and unpaired t-tests were conducted to compare each experimental group against the negative control for statistical significance.
bioRxiv preprint (not certified by peer review), this version posted September 5, 2023; https://doi.org/10.1101/2023.01.03.522655; made available under a CC-BY 4.0 International license.