Proteomic profiling of retina and retinal pigment epithelium combined embryonic tissue to facilitate ocular disease gene discovery

To expedite gene discovery in eye development and its associated defects, we previously developed a bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery). However, iSyTE is presently limited to lens tissue and is predominantly based on transcriptomics datasets. Therefore, to extend iSyTE to other eye tissues on the proteome level, we performed high-throughput tandem mass spectrometry (MS/MS) on mouse embryonic day (E)14.5 retina and retinal pigment epithelium combined tissue and identified an average of 3,300 proteins per sample (n=5). High-throughput expression profiling-based gene discovery approaches, involving either transcriptomics or proteomics, pose a key challenge of prioritizing candidates from the thousands of RNAs/proteins expressed. To address this, we used MS/MS proteome data from mouse whole embryonic body (WB) as a reference dataset and performed comparative analysis, termed “in silico WB-subtraction”, with the retina proteome dataset. In silico WB-subtraction identified 90 high-priority proteins with retina-enriched expression at stringency criteria of ≥2.5 average spectral counts, ≥2.0 fold-enrichment, and False Discovery Rate <0.01. These top candidates represent a pool of retina-enriched proteins, several of which are associated with retinal biology and/or defects (e.g., Aldh1a1, Ank2, Ank3, Dcn, Dync2h1, Egfr, Ephb2, Fbln5, Fbn2, Hras, Igf2bp1, Msi1, Rbp1, Rlbp1, Tenm3, Yap1, etc.), indicating the effectiveness of this approach. Importantly, in silico WB-subtraction also identified several new high-priority candidates with potential regulatory function in retina development. Finally, proteins exhibiting expression or enriched expression in the retina are made accessible in a user-friendly manner at iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), to allow effective visualization of this information and facilitate eye gene discovery.

for 15 min.). Protein concentration was estimated using a BCA protein assay kit (Thermo Fisher, Cat. No. 23225). For each biological replicate (n = 5 biological replicates), 55 µg of protein/sample was subjected to trypsinization as previously described (Erde et al. 2017). Briefly, a modified enhanced filter-aided digestion protocol (e-FASP) using Amicon 30 kDa ultracentrifugation devices was executed. Samples were treated with the reducing reagent TCEP (tris(2-carboxyethyl)phosphine) at 90°C for 10 min and then transferred to an Amicon filter. Samples were then buffer exchanged into 8 M urea, 0.2% deoxycholic acid (DCA), 100 mM TEAB. Next, samples were alkylated with iodoacetamide, exchanged into 0.2% DCA, 50 mM TEAB (pH 8.0) digestion buffer, and digested overnight with trypsin (1:20 enzyme:substrate ratio). After overnight trypsin digestion, samples were centrifuged and the filtrate, which contained the peptides, was extracted with ethyl acetate to remove DCA. A SpeedVac vacuum concentrator (Thermo Fisher Scientific) was then used to dry the samples, which were then resuspended in 100 µl of HPLC-grade water. Next, a Pierce Quantitative Colorimetric Peptide Assay Kit was used to perform a peptide assay on the samples and the average peptide recovery from mouse

The expressed proteins were inferred, using basic parsimony principles, based on the filtered PSM sequences (Nesvizhskii and Aebersold 2005). Homologous protein family members were grouped using an extended parsimony algorithm when evidence to distinguish family members was insufficient. In total, 3,963 proteins were detected after grouping (excluding common contaminant proteins) with 37 decoy matches, for a protein FDR of about 0.9%. The average number of proteins identified per sample was 3,296.
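The protein FDR quoted above can be reproduced with a simple decoy-to-target ratio. The following is a minimal sketch, assuming that the 3,963 detected proteins are the target count and that the FDR is estimated as decoy matches divided by target proteins; the function name `protein_fdr` is hypothetical and the exact estimator used in the study is not stated in the text.

```python
def protein_fdr(n_target: int, n_decoy: int) -> float:
    """Estimate protein-level FDR as decoy matches / target proteins.

    Assumption: a plain decoy/target ratio; other estimators exist.
    """
    if n_target <= 0:
        raise ValueError("n_target must be positive")
    if n_decoy < 0:
        raise ValueError("n_decoy must be non-negative")
    return n_decoy / n_target


# Counts reported in the text: 3,963 proteins with 37 decoy matches.
fdr = protein_fdr(3963, 37)
print(f"Protein FDR: {fdr:.2%}")
```

Under these assumptions the ratio is consistent with the reported protein FDR of about 0.9%.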

Quantitative Analysis
For the retina and the WB samples, equal amounts of protein were digested and the total spectral counts (SpC, a robust semi-quantitative measure) were measured. Prior to protein inference, the SpC for individual samples were tallied, and these totals independently validated the peptide assay results. Next, the retina and the WB samples were matched by scaling the individual samples to the average total spectral count per sample. Both the retina and the WB samples had about 3,300 protein identifications per sample. Next, the proteins with enriched expression in the retina compared to WB were determined as follows: for individual proteins, the average SpC for all samples was computed from the scaled data, and only values greater than 2.5 (2,675 proteins) were considered in the differential expression analysis between the retina and WB. The Bioconductor package edgeR was used for differential gene expression analysis (

Results and Discussion
Embryonic retina proteome generation and quality assessment
We designed an experimental workflow to isolate mouse E14.5 retina, generate its proteome and perform in silico WB-subtraction (Fig. 1A). Retina tissue was micro-dissected from mouse E14.5 eyes and processed for protein preparation and proteome analysis. Mouse WB preparation was performed

replicates), 55 µg of protein were subjected to eFASP (enhanced filter-aided sample preparation) and digestion by trypsin. After digestion, equal amounts of peptides were used for high-throughput tandem mass spectrometry (MS/MS) and spectral count (SpC) data were generated. Application of stringent criteria (≥2 distinct peptides per protein in at least one sample, ≥2.5 average SpC in the retina) to the resulting data yielded 2,675 proteins for enrichment analysis in the E14.5 retina (Supplementary Table S1). Across the samples, on average ~35K SpC were detected. Total average SpC was subjected to TMM (trimmed mean of M-values) normalization using the edgeR package to account for differences in SpC between retina and WB (Fig. 2A). Next, the quality of the data was assessed by boxplots of the normalized SpC datasets, which demonstrated that the median expression levels were similar between all the retina and the WB samples (Fig. 2A).

Table S1 for all proteins and Supplementary Table S2 for the top 150). To examine whether specific pathways relevant to retina biology were enriched in this dataset, a cluster-based analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID v6.8) for functional annotation by gene ontology (GO) categories. This analysis identified several interesting pathways. These were related to post-transcriptional control of gene expression, e.g., "GO:0003723 RNA-binding", "GO:0030529 intracellular ribonucleoprotein complex", "GO:0006397 mRNA processing" and "GO:0051028 mRNA transport" (Fig. 3) (Supplementary Table S3). Proteins involved in other molecular pathways and processes, e.g., "GO:0015031 protein transport" and "GO:0055114 oxidation-reduction process", were also identified in the dataset. Finally, pathways in basic cell biological processes were also enriched, e.g., "GO:0007049 cell cycle", "GO:0098641 cadherin binding involved in cell-cell adhesion", and "GO:0003779 actin binding", in the total proteins expressed in E14.5 mouse retina. Together, these represent promising new candidates for future investigations in the retina.
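The scaling and abundance filter applied before enrichment analysis (scaling each sample to the average total SpC, then keeping proteins with an average SpC of at least 2.5) can be illustrated as follows. This is a minimal sketch, not the pipeline used in the study: the function names, the toy counts and the placeholder protein `ProtX` are hypothetical, and it does not implement the TMM normalization performed with edgeR.

```python
from statistics import mean


def scale_to_average_total(samples):
    """Scale each sample so its total SpC equals the mean total across samples.

    `samples` is a list of dicts mapping protein name -> spectral count.
    """
    totals = [sum(s.values()) for s in samples]
    target = mean(totals)
    return [
        {protein: count * target / total for protein, count in s.items()}
        for s, total in zip(samples, totals)
    ]


def filter_by_average_spc(scaled_samples, min_avg=2.5):
    """Keep proteins whose average scaled SpC across samples is >= min_avg.

    Proteins absent from a sample are counted as 0 in that sample.
    """
    proteins = set()
    for s in scaled_samples:
        proteins.update(s)
    kept = {}
    for protein in sorted(proteins):
        avg = mean([s.get(protein, 0.0) for s in scaled_samples])
        if avg >= min_avg:
            kept[protein] = avg
    return kept


# Toy counts (hypothetical, for illustration only).
raw = [
    {"Rlbp1": 8, "Gapdh": 90, "ProtX": 2},   # total 100
    {"Rlbp1": 10, "Gapdh": 190},             # total 200
]
scaled = scale_to_average_total(raw)
kept = filter_by_average_spc(scaled)
print(kept)
```

In this toy example both samples are rescaled to a total of 150 SpC, and `ProtX` is dropped because its average scaled SpC falls below the 2.5 threshold.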

MS/MS in silico WB-subtraction identifies proteins exhibiting retina-enriched expression
While GO analysis of total expressed proteins was helpful, to further prioritize the candidates from the E14.5 retina proteome, the "in silico WB-subtraction" approach, which has been effectively applied for prioritizing cataract-linked genes in the lens, was applied to this dataset. To do so, we computed the average SpC across all samples from the scaled (normalized) data for each protein. Those proteins that passed the filtering criterion of ≥2.5 average SpC were considered in the analysis. This approach identified 2,675 proteins that could be tested for differential expression between the retina and WB samples. At ≥2.0 fold-enrichment and FDR <0.01 cut-off, 90 proteins had enriched expression in the retina compared to WB (Table 1). These "retina-enriched" proteins included many proteins linked to retinal defects and revealed several new promising candidates (Fig. 4), demonstrating that the in silico WB-subtraction approach can be effectively applied to the retina. Further, compared to absolute expression of proteins, in silico WB-subtraction could more effectively prioritize key proteins associated with retina biology and disease. For example, the top 30 proteins ranked on relative abundance in the retina (i.e., not subjected to in silico WB-subtraction) did not contain a single protein that has been associated with retina development or defects/disease (Fig. 5A). Indeed, candidates in this list, termed the "retina expression" list, were representative of general housekeeping/structural proteins such as Glyceraldehyde-3-phosphate dehydrogenase (Gapdh), Actins (Acta1, Actb), Myosins (Myh3, Myh9, Myh10), Tubulin (Tubb5), Collagen (Col12a1) and several others, not exclusively associated with retina biology.
In sharp contrast, the list of the top 30 candidates identified by in silico WB-subtraction, termed the "retina enriched" list of candidates, contained one-third (10 out of 30) candidates that have been associated with retinal biology and/or defects/disease (

list also independently identified Tyrosinase (Tyr) protein, which is essential for melanin biosynthesis and therefore critical for RPE (retinal pigment epithelium) and other retinal cells (Jeffery et al. 1994, 1997). Among the candidates is the premelanosome (Pmel) protein, whose deficiency in mice results in cell shape changes, e.g., the normally "oblong" shaped melanosomes turn spherical in RPE cells (

processes, e.g., "GO:0008283 cell proliferation", "GO:0007155 cell adhesion" were also identified. Additionally, proteins involved in extracellular matrix were identified, e.g., "GO:0005604 basement membrane", "GO:0005578 proteinaceous extracellular matrix". Finally, proteins with roles in nervous system development were identified (Fig. 6) (Supplementary Table S4). Thus, this analysis identifies key candidates in specific processes relevant to retina biology, which can be functionally characterized in future studies.
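The selection thresholds used for in silico WB-subtraction in this section (≥2.0 fold-enrichment over WB and FDR <0.01) can be sketched as a simple filter. This is an illustrative sketch under stated assumptions, not the edgeR analysis itself: the per-protein p-values are assumed to come from an upstream statistical test, the Benjamini-Hochberg adjustment is assumed because the text does not name the correction method, and all function names, protein labels and numbers below are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_enriched(records, min_fold=2.0, max_fdr=0.01):
    """Select proteins with fold-enrichment >= min_fold and FDR < max_fdr.

    Each record holds mean scaled SpC in the tissue and in the reference
    (WB), plus a p-value assumed to come from an upstream test.
    """
    fdrs = benjamini_hochberg([r["pvalue"] for r in records])
    selected = []
    for record, fdr in zip(records, fdrs):
        ref = record["reference_mean"]
        # Assumption: treat absence from the reference as infinite enrichment.
        fold = record["tissue_mean"] / ref if ref > 0 else float("inf")
        if fold >= min_fold and fdr < max_fdr:
            selected.append({"name": record["name"], "fold": fold, "fdr": fdr})
    return selected


# Hypothetical values, for illustration only.
records = [
    {"name": "protA", "tissue_mean": 20.0, "reference_mean": 5.0, "pvalue": 0.0001},
    {"name": "protB", "tissue_mean": 30.0, "reference_mean": 20.0, "pvalue": 0.0002},
    {"name": "protC", "tissue_mean": 12.0, "reference_mean": 4.0, "pvalue": 0.03},
    {"name": "protD", "tissue_mean": 10.0, "reference_mean": 10.0, "pvalue": 0.5},
]
enriched = select_enriched(records)
print([e["name"] for e in enriched])
```

Here only `protA` passes both thresholds: `protB` is significant but below 2.0 fold-enrichment, and `protC` is enriched but does not reach FDR <0.01.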

Visualization and access of retina-enriched and retina-expressed proteins in iSyTE
Next, we wanted to make this rich proteome information freely available to the research community.

(Fig. 7A, B). This web-based resource-tool will allow ready and user-friendly visualization of proteins in the E14.5 mouse retina.

Conclusion
Recent studies have highlighted that post-transcriptional regulation of gene expression plays a key role in determining the cellular proteome in eye development. Therefore, it is important to add ocular proteome data to the existing RNA-based profiling datasets to gain new insights into eye development. As a proof-of-principle, we previously generated proteomic profiles for the mouse lens and the embryonic whole body (WB) and effectively applied the in silico WB-subtraction strategy to identify proteins with lens-enriched abundance, which, in addition to consideration of absolute expression scores, yields a prioritized list of proteins for further study. In the present study, we expanded this approach to the mouse embryonic retina. We identified 90 proteins with retina-enriched expression. Nearly one-third of these candidates have been previously reported to be associated with retinal defects. This suggests that in silico WB-subtraction was effective in prioritizing select candidates from over 2,600 identified proteins, and the other two-thirds of these identified proteins represent an unexplored pool of candidates for future characterization of their function in the retina. Indeed, there exists independent evidence in the literature for several of these candidates to be expressed in the retina, in agreement with the proteome data reported in the present study. Further, in addition to these "retina-enriched" candidates, nearly 4,000 proteins were found to be present in the mouse E14.5 retina proteome. It should be noted that while many proteins linked to retina biology and pathology were identified in this study, transcription factors (TFs) such as Otx2, Sox2 and Vsx2 with key roles in the retina were not detected. This may be due to the following reasons. While they may be enriched in tissues, TFs are often in lower abundance compared to other expressed proteins (Tacheny et al. 2013).
Furthermore, their levels are often spatiotemporally restricted to specific cells within the tissue, information that is compromised when using bulk tissue (as is the case in the present study). In the present study, we measured static protein relative abundances and did not attempt dynamic system measurements (e.g., those informing on protein turnover). Although 2,675 quantifiable proteins (from the total 4,680 proteins detected, which is generally considered a deep proteome) were identified in the present study, the above-mentioned TFs were not among them, suggesting that more sensitive methods will be needed to detect these proteins in future studies. Together, these datasets and their ready accessibility through the web-based ocular gene discovery tool iSyTE represent a rich resource for prioritizing candidates for future hypothesis-driven studies in retina development. Finally, this study serves as a proof of principle that in silico subtraction can also be applied to the retina and RPE to identify promising new candidates in these tissues. In the future, this approach will be expanded to prioritize candidates in other developmental stages of the retina.

Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

day (E)14.5 were isolated, and the retina and retinal pigment epithelium combined tissue (termed retina) was micro-dissected. The whole body (WB) with eye tissue removed was processed similarly and used as reference for differential protein expression analysis. Retina and WB samples (n = 5 for each sample type, 55 µg protein per sample) were subjected to high-throughput tandem mass spectrometry (MS/MS). (B) The workflow for differential protein expression analysis is outlined. The edgeR pipeline was used to determine differential protein expression using normalized spectral counts.