Genotype-Phenotype Comparison in POGZ-Related Neurodevelopmental Disorders by Using Clinical Scoring

POGZ-related disorders (also known as White-Sutton syndrome) encompass a wide range of neurocognitive abnormalities and other accompanying anomalies. Disease severity varies widely among POGZ patients and studies investigating genotype-phenotype association are scarce. Therefore, our aim was to collect data on previously unreported POGZ patients and perform a large-scale phenotype-genotype comparison from published data. Overall, 117 POGZ patients’ genotype and phenotype data were included in the analysis, including 12 novel patients. A severity scoring system was developed for the comparison. Mild and severe phenotypes were compared with the types and location of the variants and the predicted presence or absence of nonsense-mediated RNA decay (NMD). Missense variants were more often associated with mild phenotypes (p = 0.0421) and truncating variants predicted to escape NMD presented with more severe phenotypes (p < 0.0001). Within this group, variants in the prolin-rich region of the POGZ protein were associated with the most severe phenotypes (p = 0.0004). Our study suggests that gain-of-function or dominant negative effect through escaping NMD and the location of the variants in the prolin-rich domain of the protein may play an important role in the severity of manifestations of POGZ–associated neurodevelopmental disorders.


Introduction
The POGZ protein (pogo transposable element-derived protein with zinc finger domain) is a heterochromatin protein 1α (HP1α)-binding protein, which destabilizes the interaction between HP1α and chromatin and dissociates Aurora B kinase from the chromosome arm during meiosis [1]. POGZ also interacts with SP1 transcription factor and chromodomain helicase DNA-binding protein 4, which suggests that POGZ functions as a chromatin regulator, and thus is important for the normal mitotic progression, especially during cortical development [2]. Analyses of pogz knockout mice revealed that the POGZ protein promotes chromatin accessibility and expression of clustered synapse genes, and co-occupies loci with ADNP, a gene associated with autism [3].
POGZ dysfunction leads to a wide spectrum of neurodevelopmental disorders, also referred to as White-Sutton syndrome [4,5]. The clinical spectrum includes developmental delay, autism-spectrum disorders and other neuropsychiatric problems with or without structural brain malformations, mild-to-severe intellectual disability, seizures, visual impairment, hearing loss, gastrointestinal and urinary tract anomalies. The expressivity of POGZ-related disorder is variable. Multiple types of variants, including missense, nonsense and frameshift variants and deletions have been identified in POGZ patients. Missense variants appear to be associated with behavioral anomalies rather than intellectual disability [6], while nonsense and frameshift variants which escape nonsense-mediated RNA decay (NMD) are more likely to cause intellectual disability and accompanying malformations of the gastrointestinal or urinary tract [5]. However, a clear genotype-phenotype correlation has not yet been identified.
Our aim was to further delineate the clinical and genotype spectrum of POGZassociated disorders with novel cases, and to perform a genotype-phenotype comparison based on clinical scoring established from published data.

Subjects and Methods
Overall, 13 patients (12 novel) with POGZ-related neurodevelopmental disorders were recruited in this study with the help of the GeneMatcher platform [7]. Patients and/or legal representatives gave their informed consent to the study. Enrolled patients presented with different severity of the disease and 12 different genotypes. For variant classification Varsome, based on ACMG Guidelines, ClinVar and LOVD Databases were used [8][9][10][11]. The GnomAD Database was applied to assess variant frequency in the general population [12]. In order to predict whether the nonsense and frameshift variants escape nonsense-mediated RNA-decay, we used the NMDEscPredictor computational tool [13].
The detailed features and abnormalities of our patients are presented in Figure 1 and Supplementary Table S1. For phenotype-genotype comparison, a detailed questionnaire about all the described symptoms and anomalies related to the POGZ gene was created and filled out by the patients' pediatrician/geneticist/genetic counsellor. Pictures of POGZ patients enrolled in the present study. Patient L01 at the age of 11 months, W01: at the age of 9 years, L02: at the age of 1 year and 4 years, G01: at the age of 35 years, G02: at the age of 5 years, D01: at the age of 17 years, US01: at the age of 14 years, US03: in infancy and at the age of 2.5 years.
The symptoms were evaluated according to the observed dysmorphic features and abnormalities of the organ systems, such as nervous system, musculature, urinary or gastrointestinal tract. An individual scoring system was set up for each organ system, based on the number of symptoms ( Table 1). The highest clinical scores belonged to the abnormalities of the nervous system, cognitive and motor development, followed by the skeletal system (including micro-, brachycephaly), the digestive system, ocular abnormalities and the genitourinary tract (Table 1). Each patient received a cumulative clinical score from the systemic scores. In order to be able to uniformly assess and compare the phenotypes, the cumulative clinical scores were converted into uniform severity scores. Severity score 1 was interpreted as the mildest phenotype, with score 2 as moderate, score 3 as moderate-severe and score 4 as the most severe ( Table 2). The patients were grouped into three cohorts, based on the elaboration of the available clinical data. Cohort 1 contained our patients with the most available clinical data, except for patient US02, who was regrouped into cohort 3, due to young age and lack of clinical data. Cohort 2 comprised patients from the literature with detailed phenotypic data, covering all or almost all of our evaluation criteria and cohort 3 patients with less detailed phenotypes, covering fewer evaluation criteria. In each cohort the cumulative clinical scores were grouped into four categories to obtain the unified severity score from 1 to 4 ( Table 2, Supplementary Table S1).
Each POGZ-related symptom was sorted by frequency in cohort 1 and 2. Cohort 3 was excluded from this analysis due to the lack of information about several POGZ-related symptoms (Supplementary Table S1).

Variant Evaluation
The reported variants were grouped according to variant type, whether they are predicted to undergo or escape nonsense-mediated mRNA decay (NMD), and according to their location in the gene and domains. POGZ protein contains 14 domains: zinc finger domains 1-8 (ZF 1-8) and HP1α-binding zinc-finger-like domain (HPZ: zinc finger domain 9), followed by a prolin-rich, centromere protein (CENP)-B-DNA-binding domain (CENPB), transposase encoded DDE, coiled coil domain and an integrase domain-binding motif [1,5].

Facial Gestalt Analysis
For the facial dysmorphology comparison we used DeepGestalt technology [31] via Face2Gene application (FDNA, Inc., Boston, MA, USA). Only patients with a photo of sufficient quality were enrolled in the analysis. A total of 48 photos from patients with all severity scores were included. For the facial analysis each cohort had to comprise at least 10 photos. The cohort with severity score 1 contained only four photos. Therefore, POGZ patients were regrouped into two cohorts: mild and severe. The cohort defined as having a mild phenotype consisted of 21 POGZ patients with severity score 1 and 2 and the cohort with severe phenotypes contained 27 patients with severity score 3 and 4. The two cohorts were compared to each other and also to the cohort of 79 healthy individuals in a binary comparison and as composite photos. Healthy controls showed no obvious syndrome-related dysmorphism, and were never suspected to have any genetic syndrome. In the binary comparison, a mean area under the curve (AUC) value was generated, which represented the degree of discrimination between the cohorts. Mean AUC ranged between 0 and 1 (0: incorrectly classified cohorts, 0.5: random classification and 1: perfect separation between the cohorts). The p value describes the accuracy of the binary comparison, and p < 0.05 was considered to be statistically significant, indicating that the Face2Gene software is able to distinguish between the two cohorts.

ABNORMALITY OF THE NERVOUS SYSTEM
Global developmental delay Developmental delay and intellectual disability were scored from score 1 (+, mild) to score 4 (++++, severe) based on the MEAN of the scoring points given for gross motor (A), speech delay (B) and IQ-level (C) (Details seen in Supplementary

Statistical Analysis
The GraphPad Prism version 6.01 for Windows software (GraphPad Software, San Diego, CA, USA) was used. The Fisher s exact test and Chi-squared test with Yates correction was performed to compare the variant types and the characteristics between patient cohort with mild phenotype (severity score 1 and 2) and severe phenotypes (severity score 3 and 4). p < 0.05 was considered to be statistically significant. Severity scores were expressed as means ± standard deviation (means ± SD) in the different gene regions and domains.

Variant Types and Their Distribution
Eight variants were novel in our patient cohort, including the whole POGZ gene deletion (Table 3). All variants where inheritance could be established were de novo of origin, except one case, where a nonsense variant was maternally inherited (G01 and G02).
Both mother and child were presented with mildly delayed development and only a few dysmorphic features, thus representing a mild phenotype.
Segregation analysis was performed in 82 cases. In 74 cases (90%) the variant occurred de novo in the proband, while in eight cases (10%) it was inherited from the affected parent.

Association between Disease Severity, Variant Types and Nonsense-Mediated RNA Decay
A severity score was assigned to each variant based on the clinical features observed (Supplementary Table S1). Variants were grouped according to the variant type, their location within the gene, and whether they were predicted to undergo nonsense-mediated RNA decay (NMD) or escape NMD. The frequency of mild and severe phenotypes was compared with respect to different variant types, NMD or non-NMD variants, the different regions of the POGZ gene and also to the different domains of the protein (Figure 2).
Based on the variant types, no significant difference was found between the cohort with mild (severity score 1 and 2) and severe phenotype (severity score 3 and 4), except for the missense variants, which were significantly more often associated with milder phenotype than with severe one (p < 0.0421) (Figure 2A).
According to the predictions with NMDEscPredictor, a total of 42 frameshift and nonsense variants undergo NMD, which are located before the amino acid residue 810, and ultimately result in haploinsufficiency. The effect of whole POGZ gene deletion and deletion of exon 4-19 is supposed to be equal with that of nonsense-mediated RNA-decay. In whole gene deletion there is no mRNA-synthesis from the deleted allele, and deletion of exon 4-19 also results in a considerably truncated transcript, and most probably undergoes NMD as well. Thus, these three cases were also considered as variants affected by NMD. Splice variants were not included in the analysis. Overall, 67 variants, including nonsense and frameshift variants after amino acid residue 810, in frame deletion and all missense variants, are expected to escape NMD. NMD-affected variants, as well as NMD-escaping missense variants, are significantly more often associated with milder phenotype (p < 0.0001 and p = 0.0422, respectively), while NMD-escaping truncating variants are more often with severe phenotype (p < 0.0001) ( Figure 2B, Supplementary Table S1).

Association between Disease Severity and POGZ Domains and Larger Gene Regions
Seventy variants occurred in the POGZ gene domains: ZF 1-9, CENPB, DDE and coiled coil domain and integrase domain-binding motif, while 36 variants occurred in between the domains. Deletions and splice variants were not included in the analysis. Between mild and severe phenotype, a significant difference was found in the zinc finger domain 1-9 and prolin-rich region (p = 0.041 and p = 0.0004, respectively). Variants in the prolin-rich region were associated with the most severe phenotypes (mean severity score: 3.6 ± 0.73). On the other hand, zinc finger domains were associated rather with mild phenotypes (mean severity score: 1.9 ± 0.82) ( Figures 2C,E and 3 and Supplementary Table S1). As several variants did not localize to the functional domains of POGZ, we then analyzed larger regions. Based on the distribution of severity scores, we categorized the variants in three larger regions: between amino acid residue 1-850 (encompassing ZF 1-9), 851-1014 (encompassing prolin-rich region) and 1015-1410 (encompassing CENPB, DDE, coiled coil and integrase domain). Similar to POGZ domains, variants in the region between 1-850 residues was associated with milder phenotype (mean severity score: 1.92) and variants in the region between 851-1014 residues with the most severe phenotypes (mean severity score: 3.30) (Figures 2D,F and 3). One missense variant was detected in the region 850-1014 residues and was also associated with a severity score 4, suggesting that non-sense mediated decay and gene regions play a more crucial role in disease severity than variant types (Figure 3, Supplementary Table S1).
All patients had nervous system involvement. The most common symptoms were speech delay in 88%, global developmental delay in 88%, intellectual disability in 79%, seizures in 60% and sensorineural hearing impairment in 54% of the patients. Behavioral abnormalities were reported in 75%, and among those individuals 42% of the patients showed limited social interactions and 35% had autism spectrum and anxiety disorder (Supplementary Table S1).
Ocular anomalies were diagnosed in 63% of the patients among which strabismus was the most common symptom in 25%, followed by astigmatism and optic nerve hypoplasia with 15%.
Muscular hypotonia and sleep disturbances were also frequently observed, in 54% and 75% of the patient cohort, respectively. Microcephaly was noted in 46%, brachycephaly in 35%, both short neck and short stature in 29%, brachydactyly and small hands in 23% and joint laxity also in 23% (Supplementary Table S1).
A tendency to obesity could be observed in 42% (BMI ≥ 97th percentile), while early feeding difficulties in 23% of the patients. Perinatal complications were reported in 17%, genitourinary tract anomalies in overall 31% of the patients, where one of the leading anomalies was duplicated renal collecting system in 13% (Supplementary Table S1).
A clear tendency in the evolution of the clinical picture could not yet been identified, since only three patients older than 30 years were reported until present. One patient (G01) received the diagnosis of autism spectrum disorder in adulthood, although worsening of her condition has not been mentioned, while another patient (ID 92) presented the worsening of strabismus with age and the development of obsessive-compulsive disorder and bipolar disorder diagnosed in adulthood, following the initial diagnosis of autism spectrum disorder (Supplementary Table S1) [29].

Facial Gestalt Analysis
We used the Face2Gene Software to contrast cohorts comprised of patients with different disease severity and controls. Composite images of POGZ patients with mild (severity score 1 and 2) and severe phenotype (severity score 3 and 4) and healthy controls showed the facial differences between the three cohorts ( Figure 4). POGZ patients have in general hypertelorism, broad forehead, broad nasal bridge, downturned corners of the mouth, cupid s bow of the upper lip and midface hypoplasia. These features are more prominent in patients with severe phenotype than mild. Novel variants are written in bold. * Patient has been previously involved in a study about congenital diaphragm hernia. ** Effect of the variant was predicted in silico by using Varsome, based on the classification of ACMG Guidelines: PVS1: very strong evidence of pathogenicity, PM1-6: moderate evidence, PP1-5: supporting evidence [9]. Path: pathological, VUS: variant of unknown significance, prediction based on ACMG criteria. In patient US02 the severity score may be incorrect due to the young age of the patient and lack of clinical details. The severity score was assigned based on current features but may evolve over time. Scale of disease severity scores: 1 (the mildest) to 4 (the most severe). POGZ transcript: NM_015100.   Supplementary Table S1. Variants with discrepant labelling of severity (e.g., red-labeled variant in the N-terminal domain or zinc finger 1 domain) originate from cohort 3 with less detailed phenotypes. The software found marked differences between healthy individuals and POGZ patients with mild phenotypes (p < 0.001), and also between healthy individuals and POGZ patients with severe phenotypes (p < 0.001). A difference could be observed between individuals with mild and severe phenotypes, but did not rise to statistical significance (Table 4).

Discussion
White-Sutton syndrome represents a wide and variable spectrum of symptoms and abnormalities related to variants or copy number variations in POGZ gene. A clear association between variants and clinical severity has not yet been established [32].
To our knowledge, our study is the largest systemic genotype-phenotype comparison in White-Sutton syndrome, which uses an integrated severity score for each patient based on a detailed clinical scoring system. The same variants from different studies received similar severity score, which indicates that the clinical scoring system was well-established and correctly unified despite the differently elaborated clinical details of the published cases. However, biased severity scores may still be present as a result of missing clinical information from cases with less detailed phenotypes, or differences in the clinical assessments from different studies.
Overall, 117 patients with 72 different variants were enrolled in the analysis, including eight novel variants. The majority of the variants were nonsense (41%) or frameshift (40%), but missense (8.5%), splice site variants (7%), being small in frame and larger deletions (3.5%) were also observed.
Previous studies suggested that missense variants are associated with milder phenotypes, such as autism spectrum disorder and other neuropsychiatric conditions [14,18,22,25,33], while nonsense and frameshift variants are associated with more severe phenotypes [5]. In our cohort, missense variants were also more often associated with milder phenotypes, however one missense variant in the prolin-rich region presented with a higher severity score. This suggests that variant type may have a lesser effect on the clinical outcome than the presence or absence of nonsense-mediated RNA decay or the location of the variants when having escaped NMD (Figures 2 and 3).
Patients whose variants were predicted to undergo NMD presented with milder phenotypes. A potential explanation may be that NMD of the aberrant mRNAs resulted in haploinsufficiency of the POGZ protein, but the rest of the normal mRNAs were translated and could compensate for this loss, thus resulting in a milder phenotype with less dysmorphic features and more mild neurocognitive impairment, such as developmental delay, borderline-mild intellectual disability and behavioral issues, than structural malformations of the brain or other organ systems. This is consistent with the results of Stessman and colleagues, showing that in POGZ ortholog knockdown Drosophila flies, the plastic behavioral response was affected without severe neurological defects [6].
In contrast, variants escaping NMD could result in the translation of the aberrant mRNAs and lead to deleterious gain-of-function or dominant-negative activity of the resulting truncated protein. Such aberrant gain of function or dominant negative activity may impair the activity of the normal allele, and culminate in a more severe phenotype. This is also supported by the fact that most of the potentially pathogenic variants fall within the second half of the protein (Figure 3) [5,6,25,30].
To date, the most important function attributed to POGZ protein is its binding to heterochromatin protein 1α. HP1α may bind to several heterochromatin protein binding proteins (HPBPs): canonical HPBPs with PxVxL motives, such as Aurora B and INCENP-components of chromosome passenger complex (CPC), kinetochore proteins and cohesion-related proteins, as well as proteins of the histone methyltransferase complex and zinc finger proteins, including POGZ. In interphase cells HP1α is located on the chromosome arm, attached to histone 3 (H3K9) and bound to CPC, which inhibits gene expression. During progression into mitosis, a specific zinc finger domain of the POGZ protein (HPZ) binds to HP1α, competing with the CPC proteins for the binding site on HP1α, and this leads to the detachment of HP1α from the chromosome arm. Beside the dissociation of HP1α from the chromosome arm, normal POGZ protein also promotes the correct localization of CPC in the centromeric region, and increases the phosphorylation of histone 3 protein by triggering the kinase activity of CPC. Mutations disabling the HPZ function of POGZ protein abolished its interaction with HP1α. In human POGZ knockdown cells (reduced POGZ protein level in HeLa cells treated with POGZ siRNA) HP1α and CPC remained on the chromosome arms during mitosis, an impaired mitotic progression with mitotic delay, chromosome misalignment and abnormal chromosome segregation could be observed [1]. This also suggests that the effect of mutations undergoing NMD and resulting in haploinsufficiency may differ from that of NMD-escaping mutations with intact HPZ domain.
This, on the other hand, drew attention to the localization of the variants and to the importance of POGZ domains not affected by NMD. To date, only the function of HPZ (NMD-affected), CENBP and DDE domains (NMD-escaping) has been investigated. HPZ was proved to be essential for the chromatin binding and its variants abolished the interactions with HP1α [1], thus impairing the chromatin accessibility and gene expressions [3]. Variants in CENBP domain impaired the nuclear localization of POGZ protein, disrupted its DNA-binding activity, impaired cortical differentiation and increased the activity of excitatory cortical neurons in mice [2,34]. DDE was suggested to interact with transcriptional coactivators (LEDGF/p75) involved in the neuroepithelial stem cell differentiation and neurogenesis [6,35,36].
Our analysis showed that nonsense, frameshift (destroyed by NMD) and missense variants located in the first half of the protein (including ZF 1-8 and HPZ) are associated with milder phenotypes, while variants (NMD-escaping nonsense, frameshift and also missense) in the prolin-rich domain and in its close proximity cause the most severe outcomes. The distal part of the protein, including CENPB, DDE and coiled coil domain, were associated with a mild-to-moderate disease severity. In the C-terminal end of POGZ (integrase domain-binding motif), no pathogenic variants have been identified yet (Figure 3).
Clear association between variants in the HPZ domain and clinical phenotype could not be made. The HPZ domain contained only seven variants (one missense and six truncating), all of which were predicted to survive NMD. Only two variants, located closer to the prolin-rich region, were associated with a higher severity score (Figure 3, Supplementary Table S1). Thus, we suppose that variants located in the prolin-rich domain or in its proximity may have a more deleterious impact on the protein function than variant in the HPZ or CENBP domain. HPZ and CENBP variants may result in the loss of chromatin binding function, due to either NMD or impaired nuclear localization, while prolin-rich region may potentially trigger other mechanisms, such as a dominant negative effect or gain of function. However, a final conclusion could not be made due to the relatively low number of reported variants in these regions, and the lack of further variant-specific functional studies.
The neurocognitive abnormalities with different severity were present in all analyzed POGZ patients. Facial dysmorphic features were also detectable in the great majority of patients, although the extent of facial dysmorphism was different between patients with milder and severe phenotype (Figure 4). Behavioral abnormalities, skeletal anomalies, disorders of the gastrointestinal tract and ocular system also belonged to the major symptoms of White-Sutton syndrome. This was in good concordance with a previous review [32].
In conclusion, we suggest the use of the detailed clinical scoring system developed in this study for the evaluation of POGZ patients. However, fine-tuning of the scoring system, such as prioritizing the clinical features or ruling out some minor ones may be necessary for a more accurate prediction of severity. Functional studies on nonsense-mediated RNA decay and domain functions are nevertheless crucial to unravel the molecular pathophysiology underlying the diversity of POGZ-related disorders.