Phenotypic Correlates of Structural and Functional Protein Impairments Resultant from ALDH5A1 Variants

Objective To investigate the genotype-to-protein-to-phenotype correlations of succinic semialdehyde dehydrogenase deficiency (SSADHD), an inherited metabolic disorder of γ-aminobutyric acid catabolism. Methods Bioinformatics and in silico mutagenesis analyses of ALDH5A1 variants were performed to evaluate their impact on protein stability, active site and co-factor binding domains, splicing, and homotetramer formation. Protein abnormalities were then correlated with a validated disease-specific clinical severity score and neurological, neuropsychological, biochemical, neuroimaging, and neurophysiological metrics. Results A total of 58 individuals (1:1 male/female ratio) were affected by 32 ALDH5A1 pathogenic variants, eight of which were novel. Compared to individuals with single homotetrameric or multiple homo and heterotetrameric proteins, those predicted not to synthesize any functional enzyme protein had significantly lower expression of ALDH5A1 (p = 0.001), worse overall clinical outcomes (p = 0.008) and specifically more severe cognitive deficits (p = 0.01), epilepsy (p = 0.04) and psychiatric morbidity (p = 0.04). Compared to individuals with predictions of having no protein or a protein impaired in catalytic functions, subjects whose proteins were predicted to be impaired in stability, folding, or oligomerization had a better overall clinical outcome (p = 0.02) and adaptive skills (p = 0.04). Conclusions The quantity and type of enzyme proteins (no protein, single homotetramers, or multiple homo and heterotetramers), as well as their structural and functional impairments (catalytic or stability, folding, or oligomerization), contribute to phenotype severity in SSADHD. These findings are valuable for assessment of disease prognosis and management, including patient selection for gene replacement therapy. Furthermore, they provide a roadmap to determine genotype-to-protein-to-phenotype relationships in other autosomal recessive disorders.


Introduction
Succinic semialdehyde dehydrogenase deficiency (SSADHD) (OMIM #271980) is a rare (prevalence of ~1/460,000 1) inherited metabolic disorder caused by autosomal recessive inheritance of ALDH5A1 sequence variants 2. Enzyme deficiency results in impaired γ-aminobutyric acid (GABA) catabolism and its accumulation along with other GABA-related metabolites such as guanidinobutyrate (GBA) and γ-hydroxybutyrate (GHB). The phenotype ranges in severity across a broad spectrum of non-pathognomonic symptoms (cognitive, adaptive, and communication deficits, movement disorders, seizures, sleep disturbances, and psychiatric manifestations such as inattention, hyperactivity, and obsessive-compulsive behaviors 3). Attempts to develop gene therapy for SSADHD are ongoing 4, but current treatment options remain supportive.
ALDH5A1 spans > 38 kb on chromosome 6p22 and has an open reading frame of 1605 base pairs that encodes 535 amino acids. Its resultant protein, SSADH, is a mitochondrial enzyme composed of identical monomers arranged in a tetrameric quaternary structure. The crystal structure of human SSADH shows that each monomer comprises an NAD+ binding domain (amino acids 1-173, 196-307, and 509-524), a catalytic domain (amino acids 308-508), and an oligomerization domain (amino acids 174-195 and 525-535) 5. Sixteen identified missense mutations result in amino acid substitutions in different protein domains, impacting both the structure and the function of the protein. For six of these missense mutations, a preliminary prediction of the consequence of the amino acid alteration has been proposed based on a combination of visualization of the affected residues' positions in the crystal structure 5 and activity data from cell-free extracts 5,6.
The lack of a clear correlation between genotype and disease-specific phenotype 7 hinders our ability to define disease management criteria, offer definite prognostic counseling, and develop novel therapies for SSADHD. This study, which includes the largest cohort of genetically confirmed SSADHD subjects, aimed first to investigate how ALDH5A1 sequence variants structurally and functionally impact the enzyme SSADH. This was accomplished by in silico mutagenesis and in-depth bioinformatic analyses of the chemical and network effects resulting from the substitution of each SSADH residue. These analyses yielded subgroups of quantitative, structural, and functional SSADH molecular impairments. The study's second aim was to correlate these subgroups with outcomes of clinical, neurophysiological, biochemical, and neuroimaging assessments representing the clinical phenotype of SSADHD.

Settings and population
This study presents an analysis of data gathered from a natural history study of SSADHD (ClinicalTrials.gov ID: NCT03758521; Boston Children's Hospital Institutional Review Board #P00029917), a prospective, multinational study commenced in 2018 by investigators of the SSADH Deficiency Research Consortium and funded by a grant to Washington State University from the National Institutes of Health (NIH R01 1R01HD091142). Clinical assessments and specimen collections were performed at three main clinical sites [Boston Children's Hospital (BCH) in the United States, University Children's Hospital Heidelberg (UCHH) in Germany, and Hospital Sant Joan Déu Barcelona Children's Hospital (UDB) in Spain], with the University of Florida providing data management. In silico mutagenesis and bioanalytical variant analyses were performed by collaborators at the University of Verona, Italy.
After genetic confirmation of SSADHD, subjects enrolled in the natural history study undergo a series of clinical and laboratory assessments biennially. Disease severity is measured using a validated clinical severity score (CSS) obtained at the bedside during study visits 8. The CSS is a composite score obtained by scoring the severity of five domains representing the main manifestations of SSADHD: cognitive function, communication, motor function, psychiatric manifestations, and epilepsy. Each domain is scored on a 1-5 scale (1 indicating the most severe clinical phenotype). Participants also undergo neuropsychological evaluations, magnetic resonance imaging (MRI) and magnetic resonance spectroscopy (MRS), electroencephalography (EEG), blood collection for GABA, GHB, and γ-guanidinobutyrate (GBA), and transcranial magnetic stimulation (TMS) studies, the last of which are completed only at the BCH site.
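As a rough illustration of how such a composite could be assembled, the sketch below assumes that the total CSS is the plain sum of the five domain scores (giving a range of 5-25). The summation rule, the function name `total_css`, and the example values are assumptions made for illustration; they are not taken from the validated instrument.

```python
# Hypothetical sketch: total CSS as the sum of five 1-5 domain scores.
# The summation rule is an assumption, not the published scoring manual.
CSS_DOMAINS = ("cognitive", "communication", "motor", "psychiatric", "epilepsy")


def total_css(scores: dict) -> int:
    """Return the summed CSS after checking that every domain lies on the 1-5 scale."""
    missing = [d for d in CSS_DOMAINS if d not in scores]
    if missing:
        raise ValueError(f"missing domains: {missing}")
    for domain in CSS_DOMAINS:
        value = scores[domain]
        if not 1 <= value <= 5:
            raise ValueError(f"{domain} score {value} is outside the 1-5 scale")
    return sum(scores[d] for d in CSS_DOMAINS)


# Illustrative (made-up) participant; lower values mean a more severe phenotype.
example = {"cognitive": 3, "communication": 2, "motor": 4, "psychiatric": 3, "epilepsy": 5}
print(total_css(example))  # 17
```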

Bioinformatics analyses assessing missense, deletion, or insertion mutations
The functionality of SSADH variants was determined using web-available bioinformatics tools based on the variants' effect on polypeptide chains (truncation or amino acid substitution). Sorting Intolerant From Tolerant (SIFT) predicted whether the variant is deleterious or tolerated, and Polymorphism Phenotyping v2 (POLYPHEN2) scored the variant as probably or possibly damaging. The Gibbs free energy change (ΔΔG, kcal/mol), determined using CUPSAT 9, was used to assess the stability of SSADH variants (destabilizing unfavorable, destabilizing favorable, stabilizing favorable). These methods refer to the single polypeptide chain and do not consider the fact that SSADH is a functional oligomer assembled from four monomeric polypeptide chains 5. Different SSADH polypeptide chain combinations are therefore possible depending on the patient's genotype 10. Homozygous missense mutations are presumed to lead to the same amino acid alteration and synthesize the same SSADH polypeptide. Compound heterozygotes are predicted to synthesize two SSADH polypeptide chains, each with a different amino acid change. The theoretical random combination of these polypeptide chains leads to different homo and heterotetrameric species. If the genetic variant (insertion or deletion) results in a premature stop codon on both alleles, a functional SSADH protein would not be synthesized. However, in compound heterozygotes, it may lead to only one functional homotetrameric SSADH 10. BindProfX was used to predict changes in the binding affinity of monomers to form dimers or tetramers upon variation, expressed as ΔΔG values, based on an algorithm that combines FoldX physics-based potentials with conservation scores from pairs of protein-protein interaction surface sequence profiles. Conservation analyses were performed with the ConSurf Server (https://consurf.tau.ac.il/) using the human SSADH amino acid sequence as input data. A conservation score from 1 (variable residue) to 9 (conserved residue) was attributed to each residue. Standard molecular diagnostic mutation nomenclature was followed 11.
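The "theoretical random combination" of two polypeptide chains in a compound heterozygote can be made concrete with a small binomial sketch. It assumes that both chains (labeled A and B here, hypothetical names) are expressed in equal amounts and assemble into tetramers independently and at random. These are simplifying assumptions, and the dimer-of-dimers assembly of SSADH may well deviate from them.

```python
from fractions import Fraction
from math import comb


def tetramer_distribution(p_a: Fraction = Fraction(1, 2), chains: int = 4) -> dict:
    """Probability of each tetramer composition under random assembly.

    p_a is the assumed fraction of chain A among all synthesized chains;
    each species is labeled by its number of A and B chains.
    """
    return {
        f"A{k}B{chains - k}": comb(chains, k) * p_a**k * (1 - p_a) ** (chains - k)
        for k in range(chains, -1, -1)
    }


# Two homotetramers (A4B0, A0B4) and three heterotetramer compositions.
for species, probability in tetramer_distribution().items():
    print(species, probability)
```

Under these assumptions only 1/16 of the tetramers would be pure homotetramers of each chain, which is why compound heterozygotes are treated as carrying a mixed population of homo and heterotetramers.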

In silico analysis of amino acid substitutions in SSADH protein variants
Structural analysis and in silico mutagenesis of the three-dimensional structure of human SSADH (PDB: 2W8N) solved by Kim and colleagues 5 were carried out with the PyMOL Molecular Graphics System (version 2.5.2, Schrödinger LLC). The types of substitution, residue localization, microenvironment, and interactions were analyzed. The lowest strain value reported by the software during in silico mutagenesis was used to choose the most stable rotamer and served as an indicator of the steric hindrance caused by replacing the wild-type amino acid residue with the residue introduced by the patient's missense mutation. Molecular interactions between residues were identified within a surrounding distance of 5 Å using the web tool Mol* Viewer 12.

Splice site analysis of intronic variants
Analysis of splice-site variants was performed using SpliceAI 13, an interface for splicing prediction suited to intronic variants within ±50 bp of exon borders. The Δ score produced by this method for each splice-site variant ranges from 0 to 1 and can be interpreted as the probability that the variant affects splicing at any position within a window around it (±50 bp by default) 14. A Δ score > 0.5 confidently denotes that aberrant splicing has occurred; scores > 0.8 indicate that the prediction is highly precise; scores ranging from 0.2 to 0.5 suggest alternative splicing with production of both an aberrant and a normal transcript; and scores < 0.2 imply that the variant has no effect.
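The interpretation bands above can be written as a small lookup. This is only a sketch of the thresholds as stated in the text; the function name and the returned labels are hypothetical, and a score of exactly 0.5 or 0.2 is assigned to the 0.2-0.5 band as an assumption.

```python
def interpret_delta_score(delta: float) -> str:
    """Map a SpliceAI Δ score (0-1) to the interpretation bands described in the text."""
    if not 0.0 <= delta <= 1.0:
        raise ValueError("Δ score must lie between 0 and 1")
    if delta > 0.8:
        return "aberrant splicing (high-precision prediction)"
    if delta > 0.5:
        return "aberrant splicing"
    if delta >= 0.2:
        return "alternative splicing (aberrant and normal transcripts)"
    return "no predicted effect"


# Made-up scores, one per band.
for score in (0.93, 0.61, 0.35, 0.05):
    print(score, "->", interpret_delta_score(score))
```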
The first acquisition includes an editing pulse applied at 1.9 ppm, allowing selective refocusing of the GABA multiplet at 3.0 ppm, and a second acquisition places the inversion pulse at another location, making it possible to determine GABA's J-evolution. The basal ganglia region is sampled because, in SSADHD, MRI abnormalities were most consistently detected in this area 25. The posterior cingulate and occipital cortices are sampled for their reliability in GABA measurements [26][27][28]. Spectroscopy data are processed with the LCModel software (version 6.3) 29.

Electroencephalography
EEGs are performed as 21-channel digital studies (Natus® NeuroWorks® EEG Software, Natus Medical Incorporated, Ontario, Canada) lasting ~30 minutes, with electrodes placed according to the International 10/20 System, to capture awake and sleep states. Recording parameters for bipolar and referential electrode montages consist of a 512 Hz sampling rate and 24-bit analog-to-digital conversion.

Transcranial magnetic stimulation
TMS is applied using the Nexstim 5.1.1 system (Nexstim, Finland). Each participant's anatomical T1-weighted MRI sequence is used for co-registration and frameless stereotaxy. The primary motor cortex is located using a figure-of-eight coil, while motor-evoked potentials (MEPs) are recorded from the contralateral abductor pollicis brevis (APB) muscle 30. Electromyography (EMG) is recorded at 3 kHz using a bandpass filter of 10-500 Hz. Resting motor threshold (rMT) is defined as the minimum machine output required to elicit an MEP of at least 50 µV from the target resting muscle in >50% of trials. Cortical silent period (CSP) is defined as the duration from stimulation at 150% rMT to the spontaneous return of voluntary EMG-detected activity in the target muscle. Long-interval cortical inhibition (LICI), defined as the log transformation of the peak-to-peak amplitude of the second pulse's MEP relative to the peak-to-peak amplitude of the first pulse's MEP, was estimated from pairs of stimulations delivered at 120% rMT with interpulse intervals of 100 milliseconds. Analysis of EMG signals was performed with LabChart v8.1.17 to extract the CSP and LICI metrics.

Plasma GABA and GABA-related gene expression
Plasma GABA, GHB, and GBA concentrations were determined after hydrolysis with 6N HCl through stable isotope dilution liquid chromatography-mass spectrometry 31. The expression of key GABA-related genes (ALDH5A1, ABAT, GLUD1, GLS) was determined in whole blood collected in PAXgene tubes. RNA extraction was performed using a PAXgene Blood miRNA Kit (QIAGEN, cat. no. 763134, Hilden, Germany). RNA quality and concentration were determined using a Fragment Analyzer System Kit (cat. no. DNF-472-0500, Agilent, Santa Clara, CA) and the Qubit RNA HS assay kit (Invitrogen, cat. no. Q32855, Waltham, MA). cDNA was obtained using the RT2 First Strand Kit (QIAGEN, cat. no. 330404) and loaded into a 384-well custom RT2 Profiler array (QIAGEN, Hilden, Germany). qPCR was performed using a CFX 384 (Bio-Rad Laboratories, Hercules, CA). Gene expression was normalized to GAPDH expression and expressed as 2^(−ΔCt).
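For readers unfamiliar with the notation, the normalization to GAPDH can be sketched as follows, where ΔCt is the threshold cycle of the target gene minus that of the reference gene. The function name and the Ct values are illustrative assumptions, not study data.

```python
def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Return 2^(-ΔCt), with ΔCt = Ct(target) - Ct(reference gene, e.g. GAPDH)."""
    delta_ct = ct_target - ct_reference
    return 2.0 ** (-delta_ct)


# Made-up example: the target crosses threshold 5 cycles after GAPDH,
# so it is estimated at 2^-5 of the GAPDH signal.
print(relative_expression(ct_target=27.0, ct_reference=22.0))  # 0.03125
```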

Statistical analyses
Data were analyzed using SPSS Statistics (IBM SPSS Statistics, Version 28, 2021, IBM Corp, Armonk, NY, USA). Categorical variables are reported as their relative frequencies, whereas continuous variables are reported as either mean ± standard deviation (SD) or median and interquartile range (IQR) after testing for distribution normality. Group comparisons and correlations were performed using either parametric tests (t test, one-way ANOVA, and Pearson correlation) or non-parametric tests (Mann-Whitney, Kruskal-Wallis, and Spearman's rank correlation) as appropriate. Significance level thresholds of multiple comparisons were corrected by the Bonferroni-Dunn method. Relationships between genotypes and phenotypic variables known to be age-dependent (e.g., plasma GABA, GHB, and GBA, MRS-derived GABA/NAA ratio, and TMS-derived rMT and CSP) were analyzed using a linear regression model that included age as a covariate to obtain estimates of the differences between subgroups' marginal means and their standard errors. A p value ≤ 0.05 was considered significant for all analyses.
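The effect of a Bonferroni-type correction on the significance threshold can be sketched in a few lines. This is a plain Bonferroni adjustment (threshold divided by the number of comparisons), offered as an approximation of the procedure named above and not as a reproduction of the SPSS output; the p values are invented.

```python
def bonferroni_threshold(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance threshold after Bonferroni correction."""
    if n_comparisons < 1:
        raise ValueError("at least one comparison is required")
    return alpha / n_comparisons


def significant_after_correction(p_values: list, alpha: float = 0.05) -> list:
    """Flag each p value as significant or not at the corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Three hypothetical pairwise comparisons: the corrected threshold is 0.05 / 3.
print(significant_after_correction([0.001, 0.02, 0.04]))  # [True, False, False]
```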

In silico analyses
According to the computational predictive tools POLYPHEN-2, SIFT, and CUPSAT, all missense variants were determined to have a pathogenic clinical significance (Table 1) in accordance with the American College of Medical Genetics and Genomics (ACMG) standards and guidelines 32. The variants' functional consequences were investigated by studying the crystal structure of the SSADH enzyme. Initially, variants were mapped to the known domains of the protein: C93F, A139D, R173C, G196D, P203L, G252C, G252D, G252V, G268E, and G520R were mapped to the NAD+ binding domain; N335K, G409D, M432L, and G441R to the catalytic domain; and G176R and G533R to the oligomerization domain (Table 1, Fig. 2). Variants mapping to the NAD+ binding domain all lead to varying degrees of destabilization or misfolding of the alpha/beta structure essential to the integrity of the NAD+ binding site. Since this domain is large, the functional effects of variants could vary depending on whether they lie on the surface of the NAD+ binding site, in the NAD+ binding groove, or in the secondary structures surrounding the site. This possibility led us to perform additional analyses of the spatial structure of each affected residue and refine our prediction of the effects of variants on protein function.
C93 is present in a hydrophilic cluster (Fig. 2A), and the C93F substitution (c.278G > T) changes its interactions with nearby residues, leading to the collision of a bulky aromatic side chain with several structural elements and disruption of this domain's stability. A139 lies within a hydrophobic interface between two antiparallel alpha-helices of the same monomer, with carbonyl and amidic moieties providing stabilizing polar interactions. The A139D substitution (c.416C > A) destroys the hydrophobic interface, leading to the disassembly of the alpha helices and domain destabilization (Fig. 2B). R173 is positioned at the edge of the NAD+ domain and is critical to maintaining the quaternary SSADH tetrameric structure assembled as a dimer of dimers 5. R173 of one monomer faces the opposite monomer, leading to the dimeric structure that will interact with an identical dimer to form the tetramer. The R173C substitution (c.517C > T) leads to the loss of interchain integrity while intrachain bonds remain stabilized by other means (Fig. 2C). G196 is involved in the linkage of two β-sheets and is vital for stabilizing the entire NAD+ binding domain. The G196D substitution (c.587G > A) alters the stacking interactions of the two β-sheets (Fig. 2D) and destabilizes the NAD+ binding domain. P203, a buried residue within a hydrophobic cluster that accommodates the phosphate moiety of the coenzyme's ADP portion, correctly positions the catalytic loop (residues 334-344). The P203L substitution (c.608C > T) alters the conformational rigidity of the residue and weakens the bond network (Fig. 2E). G252 resides in a loop connecting secondary structure elements responsible for the large eight stacked β-sheets composing the NAD+ domain, which is stabilized by an extensive H-bond network. When substitutions such as G252V (c.755G > T) (predicted to be the worst), G252C (c.754G > T), or G252D (c.755G > A) occur, polar and sterically bulkier residues are introduced into the hydrophobic moiety of the β-sheets and the fold of the eight-β-sheet element is compromised (Fig. 2F). G268 has a fundamental role in maintaining the stability of an α-helix essential for NAD+ binding. The G268E substitution (c.803G > A) loosens the structurally essential interactions of this region (Fig. 2G). Finally, G520 takes part in maintaining the secondary structure motif preceding the C-terminus by reinforcing an H-bond backbone with other residues. Accordingly, the G520R substitution (c.1558G > C) leads to the disassembly of this region and deleterious protein misfolding (Fig. 2H).
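The correspondence between each coding-DNA position and the residue number quoted above can be checked with simple integer arithmetic, since codon n covers coding positions 3n-2 to 3n. The sketch below is an illustrative consistency check; the function name and the small parsing helper are hypothetical, and only the position and residue numbers come from the text.

```python
import re


def codon_position(cdna_pos: int) -> tuple:
    """Return (codon number, position within codon 1-3) for a 1-based coding-DNA position."""
    if cdna_pos < 1:
        raise ValueError("coding-DNA positions are 1-based")
    return (cdna_pos - 1) // 3 + 1, (cdna_pos - 1) % 3 + 1


# cDNA changes paired with the residue number reported for each substitution.
REPORTED = {
    "c.278G>T": 93, "c.416C>A": 139, "c.517C>T": 173, "c.587G>A": 196,
    "c.608C>T": 203, "c.754G>T": 252, "c.755G>T": 252, "c.755G>A": 252,
    "c.803G>A": 268, "c.1558G>C": 520,
}

for change, residue in REPORTED.items():
    position = int(re.match(r"c\.(\d+)", change).group(1))
    codon, offset = codon_position(position)
    status = "consistent" if codon == residue else "MISMATCH"
    print(f"{change}: codon {codon}, base {offset} of 3 -> {status}")
```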
The effects of variants mapping to the catalytic domain are severe. N335K, by directly altering the catalytic loop, leads to larger functional than structural impairment. In contrast, G409D, M432L, and G441R destabilize the architecture of the catalytic domain, resulting in its structural disassembly. In more depth, N335 belongs to the catalytic loop 5 and maintains its orientation by means of an H-bond network. With the N335K substitution (c.1005C > A), the binding of succinic semialdehyde to its pocket is hindered (Fig. 2I). Since aspartate is a polar residue, the G409D substitution (c.1226G > A) destabilizes the superficial part of the β-sheet to which it belongs (Fig. 2J). The M432L replacement (c.1294A > C) alters the steric hindrance of this residue, possibly leading to erroneous rearrangement of nearby structures (Fig. 2K). Lastly, G441 belongs to a loop connecting structural elements of the catalytic domain, which is destabilized by the G441R substitution (c.1321G > A) (Fig. 2L).
Variants disturbing the oligomerization domain comprise two different glycine-to-arginine substitutions that preserve hydrophilicity but hinder interactions with the other monomer, despite mapping far apart on the protein surface. G176 maintains the H-bonds of nearby residues, and its substitution by arginine (c.526G > A) damages the multimeric assembly of the protein (Fig. 2M). The same holds for G533, which resides on SSADH's terminal loop, as its substitution by arginine (c.1597G > A) inhibits the proper stacking and inter-monomer interactions of this region (Fig. 2N).
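The domain assignments discussed in this section follow directly from the residue ranges given for the crystal structure in the Introduction. A minimal lookup, with a hypothetical function name, reproduces the mapping of the affected residue numbers:

```python
# Residue ranges of the SSADH monomer domains as stated in the Introduction.
DOMAINS = {
    "NAD+ binding": [(1, 173), (196, 307), (509, 524)],
    "catalytic": [(308, 508)],
    "oligomerization": [(174, 195), (525, 535)],
}


def domain_of(residue: int) -> str:
    """Return the domain containing a residue number (1-535)."""
    for name, ranges in DOMAINS.items():
        if any(start <= residue <= end for start, end in ranges):
            return name
    raise ValueError(f"residue {residue} is outside the 535-residue chain")


# Residue numbers affected by the missense variants discussed in the text.
for residue in (93, 139, 173, 196, 203, 252, 268, 520, 335, 409, 432, 441, 176, 533):
    print(residue, domain_of(residue))
```

Note that residues 173 and 176 sit on either side of a domain boundary, which is why two nearby substitutions are assigned to different domains.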
Notably, 10/16 (62.5%) of the variants involve the substitution of a small glycine residue with a bulkier amino acid, in most cases a positively or negatively charged one, resulting, at first glance, in a profound conformational effect, considering the conformational role played by glycine residues due to their relatively high degrees of freedom.
The steric hindrance resulting from neighboring residues within the monomeric structure of SSADH was highest in the variants G252V (67.95), G252C (56.81), G409D (56.27), G441R (55.84), and G252D (55.70) (Table 3), indicating a larger deviation of the resultant protein from the wild-type protein. Interestingly, the N335K and G176R variants have lower values of steric hindrance (Table 3): the former affects the catalytic loop but does not exert a steric effect, while the latter affects oligomerization with a neighboring monomer, which is not captured by the steric hindrance calculation because it is based on steric effects within the same monomer.
Compared to individuals with single homotetrameric or multiple homo and heterotetrameric proteins, those with no protein had significantly lower expression of ALDH5A1 (p = 0.001). They also had lower values of the total CSS (p = 0.008) and lower cognitive (p = 0.01), epilepsy (p = 0.04), and psychiatric (p = 0.04) severity scores. Dyskinesia (p = 0.05), seizures (p = 0.01), and EEG abnormalities (p < 0.001) were significantly more prevalent in individuals with no protein or single homotetramers compared to those with a mixed population of homo and heterotetrameric proteins. There was no significant relationship between the number of proteins and age, sex, communication and motor CSS domain scores, full-scale IQ (FSIQ), adaptive function, autism spectrum disorder, age-adjusted cerebral GABA/NAA ratio, plasma GABA, GHB, and GBA, or TMS-derived parameters (Table 4).
An additional comparison was made between the same group of individuals with no production of SSADH protein and two other groups: the first including subjects in whom protein variants led to stability, folding, or oligomerization defects, and the second including subjects in whom the resultant defect was catalytic because the variant affected structural elements essential to catalysis or belonging to the NAD+ "sitting" groove. This comparison showed that, relative to individuals with a stability, folding, or oligomerization defect, those with no protein or a catalysis/NAD+ binding defect had significantly lower total CSS (p = 0.02) and CSS cognitive domain scores (p = 0.008), lower adaptive function test scores (p = 0.04), and more subjects with dyskinesia (p = 0.03). There was no difference between these groups in any other phenotype parameter assessed (Table 4).
Compared to the rest of the study group, the 14 individuals with splice-site variants (12 of whom were compound heterozygotes) had significantly lower scores of the total CSS (mean ± SD of 15.5 ± 2.9 vs. 17.8 ± 2.5, p = 0.01) and CSS psychiatric domain (mean ± SD of 2.6 ± 1.2 vs. 3.4 ± 1.7, p = 0.04). These groups had no differences in other demographic, clinical, biochemical, neuroimaging, or neurophysiologic parameters. Interestingly, the single splice-site variant with a Δ score < 0.5 was found in two participants with contrasting clinical courses. The first one (patient #19), with a milder clinical outcome, had an additional missense variant (c.278G > T), resulting in milder impairments of the protein's stability and folding. The second one (patient #59), who had a severe clinical picture including drug-resistant seizures, had an additional nonsense mutation (c.1234C > T) resulting in a truncated protein.

Discussion
SSADHD is a unique inherited disorder of GABA metabolism characterized by a particular phenotype that varies in severity. This study presents the first report of a genotype-phenotype correlation in SSADHD, performed on the largest studied cohort of individuals with this condition. Predictions of a genotype-phenotype correlation in monogenic diseases (e.g., phenylketonuria 34) are typically performed by associating genetic variants with their frequency in alleles and genotypes and finally with phenotypes. In this study, in addition to the in silico analyses we performed on individual variants, we assessed the relationship between the SSADH protein population synthesized by each subject and the molecular effect derived from the combination of their variants. Our bioinformatic analyses revealed that ALDH5A1 variants resulting in a truncated SSADH protein or a lack of it, as opposed to single homotetramers or a mixed population of homo and heterotetramers, are associated with worse clinical severity. Additionally, severe clinical outcomes in SSADHD coincided with impairment in the SSADH catalytic sites, as opposed to impairments in its folding, stability, or oligomerization. Considering that SSADHD is an autosomal recessive inherited condition and SSADH is an oligomeric protein, knowledge of the ALDH5A1 allelic variants is informative only if their resultant global molecular effect on the SSADH protein is elucidated. Hence, we propose that the genetic assessment of SSADHD individuals should be protein-focused.
This study's findings determined that the 32 allelic variants (eight of which are novel), present in 41 unique allelic combinations (Table 1) in 58 SSADHD participants, were pathogenic. Considering the autosomal recessive inheritance of SSADHD, the consequences of these variants on the phenotype of the disease cannot be explained by or attributed to a single allelic variant, irrespective of its type (missense, nonsense, frame-shift, or splice site). The complexity of the genotypic profile of our study population prompted us to perform an in-depth analysis of the effect of every single variant on SSADH protein structure and function and to estimate the resultant effect of the combination of variants for each study participant. This innovative approach using extensive cross-examination between genotype, synthesized protein, and phenotype has yielded new correlations and may serve as a model for other rare diseases of similar inheritance.
The information gathered from our extensive variant analysis was used to predict disease presentation using the sizeable clinical database of our natural history study. Specifically, and in contrast with other studies, which were limited in their assessment of the genotype-phenotype relationship by a lack of well-characterized phenotypical information 6,7,35, the clinical phenotype of our patients was thoroughly characterized with a validated clinical severity score along with several quantitative clinical and neurometabolic parameters. This allowed us to more precisely assess the impact of the predicted protein number and functionality on disease outcomes.
A major outcome of the study is the finding that having variant combinations resulting in no SSADH protein or a truncated enzyme predicted worse overall clinical severity, worse cognitive abilities, worse psychiatric symptomatology, and increased seizure intensity. Additionally, we saw that variants resulting in multiple homo or heterotetrameric proteins were associated with fewer seizures and dyskinetic movement disorders than variants resulting in no protein or single homotetrameric proteins. This could be due to positive complementation effects resulting from the combination of polypeptide chains in the SSADH tetramer or from mRNA interallelic splicing that restores one healthy wild-type polypeptide chain. Comparison of the mutated protein structures and functions in subjects whose CSS fell in the 1st and 4th CSS quartiles further supported these observations. Most subjects from the 1st (worst severity) quartile had no protein, and none had multiple homo and heterotetrameric proteins. In contrast, only one subject from the 4th (mildest severity) CSS quartile had no protein.
That a lack of the SSADH enzyme coincides with the worst clinical outcome, as compared to having single homotetrameric or multiple homo and heterotetrameric SSADH proteins, is intuitive, given that even malfunctioning enzyme variants can provide a minimum of catalytic activity.
Moreover, the functional or partly functional SSADH protein has a tetrameric assembly. In compound heterozygous SSADHD subjects, this allows many possible polypeptide chain combinations, which may result in different clinical phenotypes. This phenomenon is also observed in other inherited metabolic disorders; in aromatic L-amino acid decarboxylase (AADC) deficiency, a splicing mutation leading to the absence of the enzyme is associated with the most severe clinical phenotype 36. In phenylketonuria, splicing mutations are predicted to affect protein synthesis critically and worsen clinical outcomes. Further, in compound heterozygous phenylketonuria patients, variable production of phenylalanine and degrees of clinical severity depend on the combined effect of their two variants 37.
As expected, the expression of ALDH5A1 was also lower in subjects who completely lacked the protein. Conceptually, it would be anticipated that a complete lack of the SSADH enzyme would result in higher values of cerebral and systemic GABA and its metabolites and, accordingly, in increased cortical inhibition. However, no differences were found between protein subgroups in the age-adjusted means of MRS-derived GABA/NAA ratio, plasma GABA, GHB, and GBA levels, and TMS-derived indices of cortical inhibition. This could result from the small sample size of these subgroups or the absence of MRS and TMS data in the multiple homo and heterotetrameric proteins subgroup. It is also possible that this lack of correlation resulted from other multifactorial influences and complex GABAergic homeostatic mechanisms affecting the concentrations of GABA and its metabolites. GABA (and GABA-related metabolites) were shown to be dependent on GABA receptor expression (known to be downregulated in SSADHD) [38][39][40] and polymorphisms in genes related to the GABA shunt 6.
Longitudinal measurement of these neurotransmitters will be needed to estimate the trajectory of their concentrations in relation to genotype.
Another significant outcome of the study is that subjects whose variants result in proteins impaired in stability, folding, or oligomerization have better overall clinical outcomes and adaptive functions than those with no protein or protein with impaired catalytic function. Here again, comparisons of subjects' protein structural and functional profiles in the 1st and 4th quartiles of clinical severity scores were informative. The type of protein impairment observed in patients within the 1st CSS quartile (worst severity) was limited to impairment in catalytic function, whereas all subjects in the 4th quartile had only single homotetrameric or homo and heterotetrameric proteins impaired in stability, folding, or oligomerization. In other inherited metabolic disorders, it has been reported that defects in folding, stability, or oligomerization are less disruptive than those resulting in loss of function from catalytic defects. In AADC deficiency, for example, purified recombinant pathogenic variants prone to misfolding or aggregation were shown to maintain catalytic activity, contributing to milder clinical phenotypes. In contrast, variants resulting in catalytic impairments usually lead to loss of function even though the protein does not structurally disassemble 41. Another example is provided by phenylketonuria, in which it was demonstrated that residual activity of phenylalanine hydroxylase is the major determinant of disease severity in functionally hemizygous patients 42. While folding and stability defects affect the oligomeric equilibrium between tetrameric and dimeric species of phenylalanine hydroxylase, they are related to milder forms of the disease 43. The explanation for these findings could also stem from evidence demonstrating that the catalytic loop of the enzyme is influenced by environmental redox status, which in turn can lead to its structural modifications 5. Two cysteine residues mapping to that catalytic loop regulate this process: Cys340 and Cys342. Amino acid substitutions altering the mobility of the 2-Cys loop may negatively affect the proper response to reactive oxygen species and changes in redox status. Among the identified ALDH5A1 variants, N335K and G441R are in proximity to the 2-Cys loop and could be responsible for the worst phenotypes characterizing functionally hemizygous SSADHD subjects bearing them.
Our findings also have implications for gene replacement therapy for SSADHD. Since a functional SSADH is arranged in a tetrameric form, the proper assembly of the enzyme after gene therapy may ultimately govern therapeutic outcomes and efficacy. It is generally accepted that SSADH variant carriers are asymptomatic. However, reports have suggested that a non-pathogenic SSADH polymorphism leading to ~82% enzyme activity might suffice to contribute to cognitive decline and reduced survival in the elderly 44. The knowledge gained from our structural and functional analyses of pathogenic ALDH5A1 variants may thus be useful in the design of gene-editing or gene-replacement strategies, helping to predict the functionality of the monomers produced by the wild-type gene when they assemble with the patients' diseased monomers, and to optimize the utility of gene therapy.
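The dependence of tetramer function on monomer mixing can be illustrated with a simple calculation. The sketch below is not part of the analyses in this study: it assumes, purely for illustration, that wild-type and variant monomers assemble randomly and independently into tetramers, so that the number of wild-type subunits per tetramer follows a binomial distribution. The function name `tetramer_composition` and the parameter `p_wt` (the fraction of the monomer pool that is wild-type) are hypothetical.

```python
from math import comb


def tetramer_composition(p_wt: float, n_subunits: int = 4) -> dict[int, float]:
    """Return P(k wild-type subunits per oligomer) for k = 0..n_subunits.

    Assumption (illustrative only): monomers assemble randomly and
    independently, so k follows a Binomial(n_subunits, p_wt) distribution.
    """
    if not 0.0 <= p_wt <= 1.0:
        raise ValueError("p_wt must be between 0 and 1")
    return {
        k: comb(n_subunits, k) * p_wt**k * (1.0 - p_wt) ** (n_subunits - k)
        for k in range(n_subunits + 1)
    }


# Hypothetical scenario: half of the monomer pool is wild-type.
dist = tetramer_composition(0.5)
for k, prob in dist.items():
    print(f"{k} wild-type subunits: {prob:.4f}")
```

Under these assumptions, only a fraction `p_wt ** 4` of tetramers would consist entirely of wild-type monomers, which is why the behavior of mixed tetramers may matter for therapeutic outcome. Real assembly may deviate from random mixing, for example if a variant impairs oligomerization.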
There are limitations to our study. While the genetic and protein analyses we performed are from the largest cohort of SSADHD patients ever studied, not all participants underwent all the neuroimaging, neurophysiologic, and neuropsychiatric assessments. This limitation is common in rare disease research, especially when affected individuals have a low tolerance to lengthy study procedures without sedation. Further, it must be pointed out that the bioinformatic predictions of SSADH structure and function are based on the enzyme's crystal structure. Crystal structures are "frozen" models that cannot be used to predict the structural mobility of enzymes in their active state. In the case of SSADH, working with the crystal structure of the protein may have led to underestimating the impact of some variants on the catalytic capacity of the enzyme and on disease presentation. Future studies will be needed to address this point, in which recombinant SSADH variants will be cloned and expressed in appropriate cell models in combinations matching those of the individuals enrolled in our natural history study. These in vitro models would then be used to make "personalized" predictions linking the variant profile of each patient, enzyme kinetic parameters, and the patient's clinical presentation. Lastly, as discussed above, other genetic factors likely contribute to disease presentation. These factors include the patient's family genetic background, genes involved in the expression of GABA receptors, and receptors known to regulate downstream GABA signaling pathways. The expression of GABA receptors (and their subunits) was not assayed in our study samples but should be included in customized profiler arrays in the future. Such information would complete the neurobiological interpretation of our findings related to the regulatory changes of GABA receptors in response to the hyperGABAergic state of the patients.

Conclusions
This is the first comprehensive study of genotype-to-protein-to-phenotype correlations in SSADHD, an autosomal recessive inherited disorder with a unique neurometabolic phenotype. Bioinformatics and in silico modeling of a large number of ALDH5A1 variants were used to predict the impact of the variants and variant combinations on protein structure and function. This information, coupled with the extensive clinical information gathered from the SSADHD natural history study, provided significant and novel insights into the relationships between gene, gene product, and disease phenotype. Worse clinical outcomes were found in SSADHD subjects with a resultant lack of protein than in those with single homotetramers or multiple homo- and heterotetramers. Milder clinical severity was seen in those whose resultant proteins were impaired in stability, folding, or oligomerization than in those with impaired catalytic function or no protein. These findings are clinically relevant and potentially useful for prognostic estimations, disease management, and patient selection in future gene replacement therapy trials. Importantly, our approach to studying a genotype-phenotype relationship, including protein structure and function, may serve as a template to determine genotype-to-protein-to-phenotype relationships in other autosomal recessive rare disorders.

For each residue, the main contacts with neighboring residues are displayed. The figure is rendered with PyMOL (Molecular Graphics System, version 2.5.2, Schrödinger LLC).

Figure 1 Rate of occurrence of 32 ALDH5A1 variants in 58 individuals with succinic semialdehyde dehydrogenase deficiency.

Figure 2 Representation of the SSADH amino acids subjected to substitution. Ribbon representation of the tetrameric assembly of human SSADH (PDB: 2W8N) 9 in which one monomer is colored by domain organization: NAD+ binding domain in yellow, catalytic domain in red, and oligomerization domain in blue. The other monomers are colored white, light green, and light purple. The amino acids that are subjected to substitution are represented by green sticks. A-H) Residues belonging to the NAD+ binding domain, I-L) residues belonging to the catalytic domain, and M-N) residues belonging to the oligomerization domain.

Table 1
In silico analyses of the 32 ALDH5A1 allelic variants found in the SSADHD individuals of this study.

Table 2
Allelic variants, zygosity, resultant protein combinations, and resulting protein impairment effects in the SSADHD patients included in the study.
** Also a minor effect in stability/folding

Table 4
Relationship between genotype, expressed as clusters of protein quantity and impairment effect, and clinical phenotype. Individuals with no SSADH protein are compared to A) those with single homotetramers and multiple homo- and heterotetramers and B) those with different effects of protein impairments.