The Role of Dopaminergic Genes in Probabilistic Reinforcement Learning in Schizophrenia Spectrum Disorders

Schizophrenia spectrum disorders (SZ) are characterized by impairments in probabilistic reinforcement learning (RL), which is associated with dopaminergic circuitry encompassing the prefrontal cortex and basal ganglia. However, there are no studies examining dopaminergic genes with respect to probabilistic RL in SZ. Thus, the aim of our study was to examine the impact of dopaminergic genes on performance assessed by the Probabilistic Selection Task (PST) in patients with SZ in comparison to healthy control (HC) subjects. In our study, we included 138 SZ patients and 188 HC participants. Genetic analysis was performed with respect to the following genetic polymorphisms: rs4680 in COMT, rs907094 in DARPP-32, rs2734839, rs936461, rs1800497, and rs6277 in DRD2, rs747302 and rs1800955 in DRD4 and rs28363170 and rs2975226 in DAT1 genes. The probabilistic RL task was completed by 59 SZ patients and 95 HC subjects. SZ patients performed significantly worse in acquiring reinforcement contingencies during the task in comparison to HCs. We found no significant association between genetic polymorphisms and RL among SZ patients; however, among HC participants with respect to the DAT1 rs28363170 polymorphism, individuals with 10-allele repeat genotypes performed better in comparison to 9-allele repeat carriers. The present study indicates the relevance of the DAT1 rs28363170 polymorphism in RL in HC participants.


Introduction
Schizophrenia (SZ) is a complex disorder with diverse symptomatology, including positive symptoms, negative symptoms, and cognitive dysfunction [1][2][3]. Most research on schizophrenia has focused on dopaminergic circuitry abnormalities as a key mechanism underlying the pathogenesis of the disorder [4]. Dopamine (DA) has been linked to both positive and negative SZ symptoms [5], with irregular DA release hypothesized to ascribe aberrant salience to irrelevant stimuli [6]. Disrupted DA function has been hypothesized to prevent appropriate reinforcement learning (RL) [4]. RL relies on modifying expectations and behavior following positive and negative outcomes so that adaptive actions are more likely to be repeated in the future, whereas maladaptive actions are more likely to be suppressed over time [7]. Positive outcomes reflected in terms of deviations from current expectations are called positive prediction errors (PE), which are encoded by phasic bursts of DA [8]. Similarly, negative PEs are encoded by phasic dips or pauses in dopaminergic activity when rewards are expected but not received [9]. These phasic bursts and dips in DA modify synaptic plasticity in the connections between prefrontal cortical areas (PFC) and the basal ganglia (BG) (mainly striatum), allowing the system to incrementally become more likely to select actions that are adaptive and avoid actions that are maladaptive [10].
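The prediction-error account above can be made concrete with a minimal sketch of a delta-rule value update. This is a generic textbook formulation, not a model fitted in this study; the function name `update_value` and the separate learning rates `alpha_gain` and `alpha_loss` are illustrative assumptions.

```python
def update_value(value, reward, alpha_gain=0.2, alpha_loss=0.2):
    """Update an expected value with a delta rule.

    Hypothetical helper for illustration only (not the authors' model).
    The prediction error (PE) is the received outcome minus the expectation:
    positive when the outcome is better than expected, negative when worse.
    """
    pe = reward - value
    # Assumption: separate learning rates for positive and negative PEs.
    alpha = alpha_gain if pe >= 0 else alpha_loss
    return value + alpha * pe, pe


# Expectation for one stimulus over a short, made-up outcome sequence.
value = 0.5
for reward in [1, 1, 0, 1]:
    value, pe = update_value(value, reward)
    print(f"reward={reward} PE={pe:+.3f} value={value:.3f}")
```

In this sketch, setting `alpha_gain` below `alpha_loss` would slow learning from better-than-expected outcomes while leaving learning from worse-than-expected outcomes unchanged.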
Psychotic symptoms in SZ could be partially understood in terms of faulty PE signals that fail to discriminate between adaptive and nonadaptive associations, giving attention to irrelevant stimuli that in fact should be ignored [11]. This is manifested by poor performance on latent inhibition, blocking, overshadowing, and learned irrelevance tasks in patients with SZ (for a review see [12]). Moreover, functional imaging studies have shown that striatal reinforcement PE signals are disrupted in patients suffering from psychotic symptoms [13,14]. In the probabilistic RL paradigms, patients with SZ have been shown to have relatively impaired learning from positive PE with spared learning from negative PE [15]. Moreover, it has been shown that patients with SZ fail to show the striatal DA-dependent implicit tendency to speed responses in the face of high-reward incentives [16]. It has been reported that patients with SZ and a high level of negative symptoms, compared to patients with SZ and a low level of negative symptoms, have difficulty in using positive expected values to guide novel choices [17]. Results from a recent study on both medicated and unmedicated patients with SZ suggest that both groups of SZ patients show overreliance on PE-driven learning and have less dependence on learning explicit value representations, while the unmedicated group additionally shows greater decision noise in comparison to healthy control participants [18].
Genes related to dopaminergic neurotransmission are associated with the risk of developing SZ, and their relevance to the development of SZ has been confirmed in genome-wide association studies (GWAS) [19]. The dopaminergic D2 receptor gene and those that are involved in the upstream regulation of DA synthesis are among many genes associated with the risk of SZ [20]. On the other hand, RL has been suggested as a candidate for an intermediate phenotype in SZ [21]. Recent studies have demonstrated that individuals at increased clinical risk of developing psychosis are characterized by subtle PE abnormalities during RL task performance [22], and patients with first-episode psychosis (FEP) present a lower learning rate as well as lower sensitivity to reward and punishment in RL tasks [23]. It was reported in a neuroimaging study that polygenic risk score for SZ (PRS) is associated with striatal activation during reward anticipation among healthy adolescents [24], but PRS was not shown to be associated with performance on RL in the general population [23]. Until now, dopaminergic genes associated with prefrontal cortical and striatal dopamine function have been shown to be predictive of individual differences in RL with respect to ability to learn from rewards and punishments as well as to adapt behavior on a trial-to-trial basis, respectively [25][26][27][28][29][30][31][32].
The most widely studied dopaminergic genes with respect to RL are genes associated with prefrontal level of DA (COMT, DRD4) and striatal dopaminergic D1 and D2 receptors functionality (DARPP-32, DRD2, DAT1). It has been found that the COMT gene, which is linked to PFC DA level, is also associated with trial-to-trial adjustments after negative feedback [25], and with accuracy in learning during trials with incongruent coupling of action and valence [30]. In one meta-analysis, the COMT gene was also confirmed to be associated with reward learning in RL tasks [31]. In turn, the DARPP-32 gene associated with D1-dependent synaptic plasticity in the striatum has been shown to predict decisions probabilistically associated with positive outcomes [25,27]. On the other hand, the DRD2 gene, affecting postsynaptic D2 receptor density in the striatum without affecting presynaptic DA function, has been shown to influence learning to avoid decisions that are probabilistically associated with negative outcomes [25,27,29], and to predict learning to inhibit an action to obtain rewards [30].
Although there is a body of research showing the relevance of the dopaminergic PFC-BG circuitry to RL performance in the healthy population, the association of genes related to dopaminergic neurotransmission with the risk of SZ development, and the association of SZ symptomatology with reinforcement sensitivity, it has not been shown whether variation in dopaminergic genes is associated with RL in SZ. Therefore, in the present study, we aimed to examine a differential impact of variants in dopaminergic genes (rs4680 in the COMT gene, rs747302 and rs1800955 in the DRD4 gene, rs907094 in the DARPP-32 gene, rs2734839, rs936461, rs1800497 and rs6277 in the DRD2 gene, rs28363170 and rs2975226 in the DAT1 gene) on probabilistic RL task performance in patients with SZ compared to healthy controls.

Participants
In our study, we included 138 patients with schizophrenia spectrum disorders: 18% with schizoaffective disorder, 62% with paranoid schizophrenia and 20% with first-episode psychosis (58 males/52 females, aged 37.61 ± 12.99 years) and 188 healthy controls (66 males/122 females, aged 39.07 ± 18.74 years). A diagnosis of SZ was based on the DSM-IV and ICD-10 criteria, validated using the Operational Criteria for Psychotic Illness (OPCRIT) checklist [33]. All patients were of Caucasian origin and were recruited from the hospitals and out-patient units in the Lower Silesian area, Wroclaw, Poland. The exclusion criteria were as follows: general brain disorder, intellectual disability, severe physical health impairments and comorbid drug and/or alcohol use disorder (except for nicotine dependence). The study was approved by the Wroclaw Medical University Ethics Committee and all participants gave written informed consent.

Clinical Assessment
Clinical manifestation was assessed using the Brief Psychiatric Rating Scale (BPRS) (Overall and Gorman, 1962), the Scales for the Assessment of Negative Symptoms (SANS) and Positive Symptoms (SAPS) [34], the Positive and Negative Syndrome Scale (PANSS) [35], the Montgomery-Asberg Depression Rating Scale (MADRS) [36], and the Hamilton Depression Rating Scale (HDRS) [37]. General functioning was recorded using the Global Assessment of Functioning scale (GAF) (American Psychiatric Association, 1994). The dosage of antipsychotics was expressed as chlorpromazine equivalents (mg/day) [38].

General Neuropsychological Assessment
Participants were assessed with respect to cognitive performance on the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) [39]. The RBANS is a brief, neuropsychological screening measure. It consists of 12 subtests that can be combined into five domains: immediate memory (list learning and story memory), visuospatial/constructional (figure copy and line orientation), language (picture naming and semantic fluency), attention (digit span and coding), and delayed memory (list recall, list recognition, story recall, and figure recall).

Probabilistic Selection Task
To assess RL, we used a computerized version of the Probabilistic Selection Task (PST) [40]. The task has training and test phases. During the training phase, participants are presented with three stimulus pairs (AB, CD, EF) in a random order. In each trial, a participant chooses one stimulus from the presented pair. Choices are followed by predetermined probabilistic feedback (80:20%, 70:30%, 60:40%). In the task, Japanese Hiragana characters are used as stimuli to minimize explicit verbal encoding. Acquisition blocks consisted of 30 trials. The training phase was terminated when participants either achieved a criterion as defined by 65% correct in the AB (80:20) condition, 60% correct in the CD (70:30) condition, and 50% correct in the EF (60:40) condition or after 120 trials were completed. This criterion was intended to prevent overlearning of the contingencies prior to the test phase [41]. Following the training phase, participants completed a test phase involving 75 combinations of paired stimuli, of which 15 consisted of prior pairings and 60 involved novel combinations. No feedback was provided during this phase. During the test phase, we analyzed the participants' performance using old and novel test pairings involving an A or B stimulus (AB, AC, AD, AE, AF and BC, BD, BE, BF). Positive-reinforcement learning (Go) was assessed by the choose-A frequency, while avoidance learning (NoGo) was assessed by the avoid-B frequency in the old and novel pairs.
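The task structure and test-phase scoring described above can be sketched as follows. This is a simplified illustration, not the software used in the study. The function names, the use of "at least" for the criterion thresholds, the assignment of the AB pair to the choose-A measure only, and the example responses are all assumptions made for the sketch.

```python
import random

# Probability that choosing the first stimulus of each training pair is
# followed by positive feedback; the second stimulus gets the complement.
REWARD_PROB = {"AB": 0.8, "CD": 0.7, "EF": 0.6}

# Test pairings as listed in the text (assumption: AB counts toward Go only).
GO_PAIRS = {"AB", "AC", "AD", "AE", "AF"}
NOGO_PAIRS = {"BC", "BD", "BE", "BF"}


def feedback(pair, choice, rng=random):
    """Return True for positive and False for negative probabilistic feedback."""
    p_first = REWARD_PROB[pair]
    p = p_first if choice == pair[0] else 1.0 - p_first
    return rng.random() < p


def criterion_met(accuracy):
    """Training criterion: 65% (AB), 60% (CD) and 50% (EF) correct."""
    return (accuracy["AB"] >= 0.65
            and accuracy["CD"] >= 0.60
            and accuracy["EF"] >= 0.50)


def go_nogo_scores(test_trials):
    """Compute (choose-A frequency, avoid-B frequency).

    test_trials is a list of (pair, chosen_stimulus) tuples, e.g. ("AC", "A").
    """
    go = [chosen for pair, chosen in test_trials if pair in GO_PAIRS]
    nogo = [chosen for pair, chosen in test_trials if pair in NOGO_PAIRS]
    choose_a = sum(c == "A" for c in go) / len(go)
    avoid_b = sum(c != "B" for c in nogo) / len(nogo)
    return choose_a, avoid_b


# Made-up test-phase responses for one hypothetical participant.
trials = [("AC", "A"), ("AD", "A"), ("AE", "E"), ("AB", "A"),
          ("BC", "C"), ("BD", "B"), ("BE", "E"), ("BF", "F")]
print(go_nogo_scores(trials))
```

With these invented responses, the participant chooses A in three of four Go pairings and avoids B in three of four NoGo pairings.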

Genotyping
We analyzed functional polymorphisms, including single nucleotide polymorphisms (SNPs) and variable number of tandem repeats (VNTR) polymorphisms, that have been shown to be associated with schizophrenia and/or affect dopaminergic neurotransmission: the COMT gene Val158Met (rs4680), the DARPP-32 gene (rs907094), the DRD2 gene (rs2734839, rs936461, rs1800497 and rs6277), the DRD4 gene (rs747302 and rs1800955), the DAT1 gene (rs28363170 and rs2975226). The polymorphism Val158Met (rs4680) in the COMT gene, resulting from a G-to-A transition at codon 158, causes a valine-to-methionine (Val158Met) substitution. Met allele carriers have lower COMT enzyme activity and higher prefrontal DA levels [42]. The polymorphism rs907094 in the DARPP-32 (PPP1R1B) gene resulting in C-to-T transition influences the abundance of intracellular DARPP-32 mRNA and modulates striatal activity as well as function [42]. Polymorphism rs1800497 in the DRD2 gene (DRD2/ANKK1 Taq1A, C32806T Glu713Lys) is associated with DA receptor density and availability [43,44]. The polymorphism rs2734839 in the DRD2 gene is associated with A-to-G transition and has been shown to be associated with risk of schizophrenia [45]. The polymorphism rs6277 in the DRD2 gene affects DRD2 mRNA stability [46] and has been found to predict striatal DRD2 availability [47]. It influences striatal D2 postsynaptic receptor density without affecting presynaptic DA function [48]. Polymorphism rs1799732 in the DRD2 gene (−141 Ins/Del) has been associated with striatal receptor D2 density [49] and striatum response to reward [50]. Polymorphism rs1800955 of the DRD4 gene, resulting in C-to-T transition, has been shown to influence transcriptional efficiency by 40% [51] and is predictive of error-related prefrontal activity and compensatory behavioral adjustments following these errors [52]. Polymorphism rs747302 in the DRD4 gene has been found to reduce transcription and to lower DA receptor density [53].
Polymorphism rs2975226 in the DAT1 gene is associated with A-to-T transition and has been significantly associated with schizophrenia [54]. The 9-repeat allele of the polymorphism rs28363170 in the DAT1 gene has been associated with reduced expression of dopamine transporter protein, resulting in relatively increased extrasynaptic striatal dopamine levels [55,56]. This polymorphism is predictive of brain and behavioral responses to cognitive flexibility [57], modulates striatal activation as a function of working memory load [58] and influences implicit learning [59].
Venous blood samples were collected from participants who agreed to genetic analysis. Genomic DNA was isolated from peripheral white blood cells using the DNA Blood Midi Kit (Qiagen). The SNPs in the following genes: DRD2 (rs2734839, rs1800497, rs1799732, rs6277), DRD4 (rs936461, rs1800955), COMT (rs4680) and DARPP-32 (rs907094) were genotyped with the allelic discrimination (AD) technique with the use of validated and predesigned TaqMan SNP Genotyping Assays (C___2601170_10, C___7486676_10, C__33641686_10, C__11339240_10, C___7470693_30, C___7470700_30, C__25746809_50 and C___7452370_1_, respectively) according to the manufacturer's instructions (Thermo Fisher Scientific Inc., Waltham, MA, USA). The PCR-RFLP technique was used for the DAT1 rs2975226 (Psyl) polymorphism. The DAT1 VNTR polymorphism (rs28363170) was genotyped using PCR with a pair of primers, where the forward one was labeled with 6-FAM, followed by capillary electrophoresis in the presence of the GeneScan™ 600 LIZ® Size Standard on a 3500 Genetic Analyzer (Thermo Fisher Scientific Inc., USA). Individual genotypes, i.e., the 9-allele (440 bp) and the 10-allele (480 bp) ones, were detected according to the peak size on the GeneMapper® Software version 4.0 (Thermo Fisher Scientific Inc., Waltham, MA, USA).

Statistical Analysis
The comparison of general characteristics between HC and SZ patients was performed using the analysis of variance (ANOVA) in case of continuous variables and the χ² test in case of categorical variables. The comparison of genotype and allele distribution between HC and SZ was performed using the χ² test. Similarly, the χ² test was used to test whether genotype distributions were in agreement with the Hardy-Weinberg equilibrium (HWE). The ANOVA was also applied to assess the association between studied genetic polymorphisms and performance during the training and test phases of the reinforcement task. Bonferroni correction was used due to multiple testing (10 SNPs); the threshold for significance in genetic testing was a p-value below 0.004.
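As an illustration of the Hardy-Weinberg check described above, the following sketch computes a χ² goodness-of-fit statistic for a biallelic polymorphism from genotype counts. The genotype counts are invented for illustration and are not data from this study, and the function name is hypothetical. The p-value uses the closed form for a χ² distribution with 1 degree of freedom.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium for a biallelic locus.

    Hypothetical helper: takes observed counts of the two homozygous
    genotypes (n_aa, n_bb) and the heterozygous genotype (n_ab).
    Assumes both alleles are present, so no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele a
    q = 1.0 - p                       # frequency of allele b
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Invented counts: the first set matches HWE exactly, the second has a
# marked deficit of heterozygotes.
for counts in [(25, 50, 25), (40, 20, 40)]:
    chi2, p_value = hwe_chi_square(*counts)
    print(counts, f"chi2={chi2:.2f}", f"p={p_value:.4g}")
```

A genotype distribution would be treated as departing from HWE when the resulting p-value falls below the chosen significance level.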
To compare the acquisition of contingencies between SZ patients and HC participants, a two-way ANOVA with factors of group and reinforcement probability as well as appropriate post hoc tests were performed on participants' performance on PST (Levene's test of homogeneity p-value greater than 0.05). In order to compare overall probabilistic selection performance, we created a summary measure by averaging the proportion of correct responses from all three conditions of each stimulus pair (AB, CD, EF) and we used a t-test to assess the difference between SZ patients and HC participants. As a test of general neuropsychological functioning and experimental task performance, we used gender, educational level and RBANS total score as covariates in analysis of covariance (ANCOVA) with the effects of group (SZ vs. HCs) and reinforcement probability as independent variables. The learning of probabilistic contingencies was also assessed at the test phase using ANOVA, with the effects of group (SZ vs. HCs) and reinforcement probability as factors. Group differences in test phase performance were assessed using t-tests for the choose-A frequency (Go) and the avoid-B frequency (NoGo) generated from cumulative test phase scores on the pairs involving A (Go) and pairs involving B (NoGo).
In correlational analyses of continuous variables, we used Pearson's and Spearman's correlation coefficients depending on the normality of distribution of the given variable (Kolmogorov-Smirnov test p-value greater than 0.05). We performed correlation analyses to assess relationships between performance on PST and clinical symptom ratings (SANS, SAPS, PANSS, MADRS, BPRS, HDRS), general functioning (GAF), chlorpromazine equivalent dosage and neurocognitive functioning (RBANS). We also used correlation to assess associations between RBANS and clinical symptomatology, general functioning, and chlorpromazine equivalent. All tests were two-tailed with the level of significance set at a p-value less than 0.05. Statistical analysis was performed using the Statistical Package for Social Sciences, version 20 (SPSS Inc., Chicago, IL, USA).

Results
The general characteristics of the sample are presented in Table 1. The distribution of specific genotypes followed the HWE (p-value > 0.05), except for two polymorphisms: DRD4 rs747302 in SZ patients and DAT1 rs2975226 among HC participants, which were excluded from further analysis. There were no significant differences in genotype distributions between patients with SZ and HC participants with respect to all studied polymorphisms (p-value > 0.05) (Table A1). Cognitive assessment on RBANS was completed by 110 SZ patients and 188 HC participants. The RL task was completed by 59 SZ patients and 95 HC participants. We excluded eight SZ patients and five HC participants due to high numbers of omissions either in the training phase or test phase of PST. A t-test assessing the difference between SZ patients and HC participants on the summary measure created by averaging the proportion of correct responses from all three conditions of each stimulus pair (AB, CD, EF) during the training phase of PST showed worse overall performance of SZ patients (59.04 ± 11.76) in comparison to HC participants (66.38 ± 12.81) (df = 139, p-value = 0.001). Two-way ANOVAs for accuracy during the training phase of PST showed a significant main effect of group (F = 15.52, p-value < 0.0001), no main effect of reward contingency (F = 1.13, p-value = 0.324), and a significant group x reward contingency interaction (F = 3.07, p-value = 0.047) (Model 1, Table 2). Post hoc tests revealed that HC participants performed better than SZ patients in the CD (70%/30%) condition (70.39 ± 20.33 and 57.12 ± 17.63, respectively) (p-value = 0.001). Proportions of correct responses given by participants during the training phase of PST with respect to each condition (AB, CD, EF) are shown in Figure 1a. Effects of group and reward contingency on acquisition accuracy of probabilistic contingencies during PST for AB, CD and EF conditions in SZ patients and HC participants are shown in Figure 1a,b.
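As a rough consistency check on the group comparison of the summary measure, a pooled-variance t statistic can be recomputed from the reported means and standard deviations. The group sizes used here (51 SZ patients and 90 HC participants) are an assumption obtained by subtracting the excluded participants from those who completed the task. They are not stated explicitly in the text, although they agree with the reported df = 139. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic (equal variances assumed) from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# Reported training-phase summary measure: HC 66.38 ± 12.81, SZ 59.04 ± 11.76.
# Assumed group sizes: 95 - 5 = 90 HC and 59 - 8 = 51 SZ.
t, df = pooled_t(66.38, 12.81, 90, 59.04, 11.76, 51)
print(f"t({df}) = {t:.2f}")
```

Under these assumptions the statistic is about 3.4 with HC participants scoring higher, which is in line with the reported p-value of 0.001.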
There were significant differences in the distribution of gender and educational level in our samples, so we included these variables as covariates in an ANCOVA analysis. We recoded education level into two categories: people with and without higher education. In the ANCOVA models, we found that gender was not a significant covariate with respect to averaged performance on the training phase of PST, while educational level proved to be a significant covariate (Model 2, Table 2). We found that RBANS total score and educational level, when used as covariates in an ANCOVA, were significant in the models of the effects of group and reward contingency on task performance (Models 3 and 4, Table 2). However, when we used all the covarying variables (gender, educational level and RBANS total score), the model explained the most variance and these covarying variables were not significant, while the interaction of group (SZ, HC) and reward contingency (AB, CD, EF condition) was significant at the trend level (p-value = 0.056).
Group differences in test phase performance using t-tests for measures of choose-A frequency (Go) and avoid-B frequency (NoGo) generated from cumulative test phase scores on the pairs involving A (Go) and pairs involving B (NoGo) showed no statistically significant difference in choose-A frequency (p-value = 0.650) nor in avoid-B frequency (p-value = 0.147). Correlational analysis showed no significant associations of choose-A nor avoid-B frequencies with clinical variables (BPRS, PANSS, SAPS, SANS, RBANS, MADRS, GAF, CPZ, HDRS). Importantly, there was a significant negative association between choose-A and avoid-B frequency among SZ patients (r = −0.39, p = 0.005), while among HC participants there was no significant association between choose-A accuracy and avoid-B accuracy (r = −0.02, p-value = 0.858).
The results of analysis of variance (ANOVA) showing the associations between studied genetic polymorphisms and performance during the training and test phases of the RL task are shown in Table 3. There was a significant association between COMT rs4680 polymorphism and averaged learning performance in the training phase of PST (p-value = 0.035) among HC participants. Further analysis did not show significant differences in accuracy between COMT rs4680 polymorphism Met allele carriers (Met/Met and Met/Val genotypes) and individuals with the Val/Val genotype (p-value > 0.050). There were significant associations between the DRD4 rs1800955 polymorphism and choose-A frequency in the test phase of PST in the whole group (p-value = 0.039) and among HC (p-value = 0.047) participants; however, post hoc tests did not show any statistically significant differences between genotypes (p-value > 0.005). Moreover, there was a significant association between the DAT1 rs2975226 polymorphism and averaged learning performance in the training phase of PST among SZ patients (p-value = 0.018) and between the DAT1 rs28363170 polymorphism and averaged learning performance in the training phase of PST in the whole group (p-value = 0.042) and among HC participants (p-value = 0.004). Additional analysis did not show a difference in overall learning accuracy between A allele carriers and individuals with TT genotypes of the DAT1 rs2975226 polymorphism among SZ patients (p > 0.05); however, there was worse accuracy in overall learning on PST among 9-allele carriers in comparison to 10-allele genotypes of the DAT1 rs28363170 polymorphism among HC participants (p-value = 0.007) (Figure 2). However, this difference was not significant after applying the Bonferroni correction (p-value > 0.004).

Figure 2. Effects of DAT1 rs28363170 polymorphism on acquisition accuracy of probabilistic contingencies during PST for AB, CD and EF conditions in HC participants: (a) with respect to 9R/9R, 9R/10R and 10R/10R genotypes; (b) with respect to 9R allele carriers (9R/9R and 9R/10R genotypes) and 10R/10R genotypes.

Discussion
Results from the current study are consistent with earlier research, showing an overall impairment of probabilistic RL in patients with SZ compared to HC participants [60][61][62][63][64][65]. For a long time now, the inability of patients with schizophrenia to adapt to environmental conditions flexibly and adequately has been associated with deficits in feedback-driven learning and reinforcement-based decision making [17,66]. The PST with Hiragana characters, which are used to reduce verbal encoding of stimuli, was originally applied in patients with Parkinson's disease [40] and has been used a few times in SZ samples, showing their impaired performance compared to HC participants [41,67,68]. Moreover, the use of images of common objects produced similar results in the PST task [18,41,69].
Research on learning from rewards or punishments and impairment severity has yielded mixed results [70]. Most previous research has found that RL deficits are mainly due to impairment in learning from positive feedback [41,66,69,71]; however, we observed a similar level of performance in both positive-reinforcement learning (Go) and avoidance learning (NoGo) in the test phase of PST among patients with SZ, which is in agreement with some previously reported results [67,68]. However, in our study, we observed worse performance of patients with SZ in comparison with HC only during the CD (70%/30%) condition, which may explain our results of similar performances in the testing phase, which included either A or B stimuli that were used to assess Go and NoGo tendency in PST. Based on computational modeling studies, it has been suggested that learning impairments from positive PE can be accounted for by reduced striatal D1 receptor function, compounded by noisy phasic DA signals that do not appropriately signal the positive PE magnitude [10]. On the other hand, relatively spared learning from negative PE in SZ has been attributed to the striatal D2 receptor blockade by antipsychotics [72]. However, antipsychotic drugs vary widely in their affinity to different receptor types, with second-generation antipsychotics having weaker affinity to D2 receptors but greater affinity to D1 receptors in comparison with first-generation antipsychotics [73]. Specifically, D1 receptor occupancy by clozapine is thought to contribute to its atypical properties [74]. Moreover, it has been shown that different types of antipsychotic drugs exert different impairments depending on cognitive domain [75]. Additionally, antipsychotic drugs, including clozapine, have been shown to be effective in treating cognitive dysfunctions through genetic-driven dopaminergic mechanisms [76].
Contrary to our expectations, we did not find significant correlations between RL and negative symptoms which have previously been identified in patients with SZ [17,41,71]. Inconsistencies among findings may reflect differences in sampling, since prior studies recruited patients with SZ and predominantly negative symptoms [41], or a smaller percentage of patients with schizoaffective disorder [17,77]. Moreover, it has been argued that RL processing is associated more with primary than secondary causes of negative symptoms (e.g., depression, anxiety, disorganization), and thus the duration of illness might be a reason for obtaining conflicting results showing the association of negative symptoms with RL among chronic SZ patients, but not FEP samples [78]. Therefore, the inclusion of patients with FEP might have attenuated the association between negative symptoms and learning performance.
There are several behavioral and neuroimaging studies linking candidate genes involved in the etiology of SZ with RL performance in HC participants [25,30,79,80]. In our study, we have shown that the DAT1/SLC6A3 rs28363170 polymorphism is associated with RL performance on PST among HC participants. This is in line with research showing that a relatively large proportion of the variance in higher cognitive functions across the population can be accounted for by genetic factors [81]. The DAT1 gene contains a variable number of tandem repeats (VNTR) polymorphism, with the two most frequent alleles in the population being nine- and ten-repeat (9R and 10R) alleles [82]. Several in vitro studies suggest that the 9R allele relative to the 10R allele of the DAT1 rs28363170 polymorphism is associated with a reduced expression of DA transporter protein (DAT) [55,56,83], while in vivo, single-photon emission computed tomography (SPECT) studies have produced mixed results, with a recent meta-analytic study showing that the 9R allele is associated with increased DAT expression, and thus potentially more efficient reuptake of DA in comparison with the 10R variant [84]. DAT is primarily expressed in the striatum, with only scarce expression in the prefrontal cortical areas [85]. Lower density of DAT in the striatum results in relatively increased extrasynaptic striatal DA availability among carriers of the 9R allele compared to 10R/10R genotypes [86][87][88]; however, opposite results have also been reported [89].
The DAT1 VNTR functional polymorphism has been shown to be predictive of neural and behavioral responses to cognitive flexibility [57], to modulate striatal activation as a function of working memory load [58], and to influence implicit learning [59] and extinction of conditioned fear responses [90]. Neuroimaging studies demonstrated that DAT availability in the striatum is correlated with neural response to reward anticipation in the nucleus accumbens [91], and the DAT1 rs28363170 polymorphism has been associated with reward-related activity, reported to be associated with the 9R allele [50,92,93,94]. It has been shown that the DAT1 rs28363170 polymorphism is associated with PE-based learning [88,89,90,93], but also with instructional control [32], task-switching [93] and perseveration after reversal of reinforcement contingencies [95]. Discrepancies in findings could be associated with the fact that it is not clear how genetic variation in the DAT1 gene influences tonic and phasic DA levels [32]. Phasic DA changes are related to salient stimuli and have been shown to enable RL based on PE [8,9], while tonic DA release has been associated with exploitation of RL [96] or weighing of effort costs [97]. Moreover, there are reciprocal relationships between tonic DA levels and phasic DA burst in striatal areas [98]. Future neuroimaging studies should attempt to disentangle the genetic contribution of the dopaminergic system to RL.
There are several limitations of our study. One factor that may have contributed to poor acquisition of contingencies among patients with SZ is worse general neurocognitive performance in comparison with HC participants. However, analyses covarying for RBANS total score indicated that group differences and group x reward contingency interaction remained significant at the trend level. Another potential limitation of the current research is that we were unable to determine the cause of the learning deficit. Learning deficits could be attributed to deficits in learning from PE signaling or deficits in value representation, and the PST task does not allow these variables to be isolated during the training phase of the task. A modified version of the task could be used in the future to allow for association of specific genetic polymorphisms with PFC and BG functionality. Moreover, it should be noted that there are several other dopaminergic receptors that should be investigated in the future [99], especially D3 receptors, which have been implicated in pathophysiological mechanisms underlying cognitive impairment observed in patients with SZ [100] as well as in RL [101]. Finally, epistatic interactions between dopaminergic genes should be given more attention, since looking at the effects of multiple genes on a single trait provides a comprehensive and more reliable way to determine genetic effects on endophenotype, as shown for example in a study on the combined effect of COMT (rs4680) and DRD3 (rs6289) on cognition in SZ [102] or response to treatment [103]. It should also be mentioned that, considering the candidate gene approach, we included a relatively small sample of participants. However, previous genetic studies on the RL tasks suggest that the cognitive measures employed in this area of research point to relatively large effects of genetic variations in dopaminergic function [25].

Conclusions
In conclusion, SZ patients performed significantly worse on the probabilistic RL task in comparison to HC participants. Average performance during the training phase was associated with general neurocognitive functioning, but not with current symptomatology. We found no significant association between dopaminergic genetic polymorphisms and probabilistic RL among SZ patients; however, among HC participants with respect to the DAT1 rs28363170 polymorphism, individuals with 10-allele repeat genotypes performed significantly better in comparison to 9-allele repeat carriers (9R/9R and 9R/10R genotypes).

Informed Consent Statement:
Informed consent was obtained from all participants involved in the study.

Data Availability Statement:
The data presented in this study is available on request from the corresponding author.

Acknowledgments:
The authors express acknowledgment to all participants who took part in the study.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations: group: healthy control (HC), schizophrenia (SZ); DRD2-gene encoding dopaminergic D2 receptor, DRD4-gene encoding dopaminergic D4 receptor, COMT-gene encoding catechol-O-methyltransferase, DAT1-gene encoding dopamine transporter, DARPP-32-gene encoding dopamine and cAMP-regulated phosphoprotein of molecular weight 32 kDa, significant associations (p-value less than 0.05) are marked in bold, distributions that did not follow the HWE are marked in italic, 9R and 10R-nine- and ten-repeat alleles, the number of genotypes for each studied polymorphism differs due to poor DNA quality and/or unsuccessful genotyping.