EEG Resting-State and Event-Related Potentials as Markers of Learning Success in Older Adults Following Second Language Training: A Pilot Study

Objectives: In this pilot study, we evaluated the use of electrophysiological measures at rest as paradigm-independent predictors of second language (L2) development for the first time in older adult learners. We then assessed EEG correlates of the learning outcome in a language-switching paradigm after the training, which to date has only been done in younger adults and at intermediate to advanced L2 proficiency. Methods: Ten (Swiss) German-speaking adults between 65–74 years of age participated in an intensive 3-week English training for beginners. A resting-state EEG was recorded before the training to predict the ensuing L2 development (Experiment 1). A language-switching ERP experiment was conducted after the training to assess the learning outcome (Experiment 2). Results: All participants improved their L2 skills but differed noticeably in their individual development. Experiment 1 showed that beta1 oscillations at rest (13–14.5 Hz) predicted these individual differences. We interpret resting-state beta1 oscillations as correlates of attentional capacities and semantic working memory that facilitate the extraction and processing of novel forms and meanings from the L2 input. In Experiment 2, we found that language switching from the L2 into the native language (L1) elicited an N400 component, which was reduced in the more advanced learners. Thus, for learners beginning the acquisition of an L2 in third age, language switching appears to become less effortful with increasing proficiency, suggesting that the lexicons of the L1 and L2 become more closely linked. Conclusions: In sum, our findings extend the available evidence of neurological processes in L2 learning from younger to older adults, suggesting that electrophysiological mechanisms are similar across the lifespan.


INTRODUCTION
While many resources have been dedicated to establishing the now commonly accepted view that the learning of foreign languages is desirable for children and younger adults, little effort has been dedicated towards exploring the potential of this learning challenge for older adults [1]. While the reasons for older adults to learn a new language are manifold, caution needs to be taken when applying research findings from younger adults to older learners. For one, aging is accompanied by substantial brain atrophy, and has also been associated with reduced capacities in a large number of cognitive skills including, but not limited to, attention, executive functions, memory, problem solving and processing speed [2][3][4]. While there is an overall tendency toward cognitive decline even in healthy aging, individual differences in cognitive performance also increase with age and these differences are reflected in late L2 [5][6][7]. Since learning a new language is one of the most complex cognitive tasks humans are able to perform, the level to which brain function is preserved is likely predictive of individual differences in L2 learning success (see [8]). In order to cater for the individual needs of older adults with varying L2 learning capacities, the first step is to understand the relationship between the aging brain and L2 development in third age.
Currently, the neurological substrate for these learner differences, particularly in older learners, is still largely unknown [9]. A useful tool to remedy this issue has proven to be the electroencephalogram (EEG). Its excellent temporal resolution allows us to record neural oscillations in the brain at rest and makes it possible to track changes in time-locked electrical brain activity associated with the processing of a second language. Particularly the ability to investigate neural oscillations at rest offers a promising approach towards ascertaining L2 development aptitude; that is, an individual's sensitivity to L2 input that explains why some learners find it easier than others to acquire a new language, independent of motivational factors [10]. Neural endogenous oscillations reflect stable aspects of the functional architecture that also underlie evoked oscillatory patterns, and have therefore been identified as an electrophysiological predictor of behavior [11]. As with cognition, however, neural oscillations are affected by age-related changes: On average, older adults show slower alpha activity (8-13 Hz), lower amplitudes in the alpha and beta (14-30 Hz) bands and an increase in slower oscillations; that is, in the delta (1-4 Hz) and theta (4-8 Hz) ranges [12]. Reductions of beta power at rest have been related to alertness deficits and have been identified as markers of dementia progression [12,13], while overall indices of spontaneous electrophysiological activity have been found to be both reliable predictors of and precursors to cognitive impairment [14]. At the same time, neural oscillations can manifest great interindividual variability [15], meaning that EEG indices can be particularly informative regarding behavioral differences in older learners.
Despite these findings, there are, to the best of our knowledge, very few studies that have investigated EEG oscillations in the context of learning a new language, let alone in older learners who begin L2 acquisition when they have already reached the third age.
In two EEG studies, neural oscillations at rest have been used as a predictor of L2 learning in younger adults, but there are no comparable studies with older learners. The first of the two studies used EEG indices (i.e., power in different frequency bands) to predict success of L2 learning in young adulthood (18-31 yrs), and reported on a preliminary subset of 16 participants in Prat et al. [16] and on the full dataset of 47 subjects in Prat, Yamasaki and Petersen [17]. The participants were monolingual English speakers who completed an 8-week French course consisting of sixteen 30-min sessions. L2 proficiency was assessed by recording the level each participant had reached at the end of the 16 sessions. Before the training, 5 min of eyes-closed resting-state EEG data were collected, and power values were calculated across theta (4-7.5 Hz), alpha (8-12.5 Hz), beta1 (13-14.5 Hz), beta2 (15-17.5 Hz), beta3 and low-gamma (30-40 Hz) bands. For the final report [17], the authors averaged power over the entire beta band (13-29.5 Hz). Correlations between power values at each electrode and final L2 level showed that power values in the gamma, beta and theta bands were predictive of subsequent individual differences in L2 learning. When distinguishing the three beta bands, the beta1 band was the most predictive frequency band and yielded positive correlations ranging from r = 0.60 to r = 0.77. Prat et al. [18] found similar resting-state correlations of beta power with learning rate in their study on the acquisition of a programming language. These results are in line with those of Küssner et al. [19], who found that resting-state beta power at various electrode sites predicted word recall in a foreign-vocabulary learning task at three different testing sessions.
Hence, resting-state EEG indices may be a promising candidate for an electrophysiological measure of the above-mentioned L2 aptitude, since they provide a paradigm-free measure that predicts L2 development before the actual training. The current pilot study assesses whether the above findings can be replicated in older adults, a group for which heterogeneity in resting-state indices has been found to increase as a function of cognitive demand. These findings could help us understand whether L2 learning is qualitatively similar between older and younger adults from an electrophysiological point of view, which in turn would inform future research designs in terms of customizability of training methods and materials for this age group.
In addition to understanding how language acquisition progresses in older L2 novices, it is equally informative to investigate how the new language is integrated with the existing one in older learners at a given stage of L2 proficiency. Again, a previous study with younger adults showed that heterogeneity in L2 proficiency after L2 training manifests itself through differential processing of language switching between the newly learned and the native language. By means of electrophysiological correlates of L2 learning in younger adults, Van der Meij et al. [20] showed that the electrophysiological response towards a language switch was indicative of the individual L2 level. The authors tested two groups of young adult monolingual Spanish EFL learners, who self-rated themselves as intermediate or advanced L2 learners. Participants were visually presented with English (L2) sentences of the type The house that we rented was furnished and felt cozy, in which one of the adjectives could occur in L2 or L1 (Spanish). ERPs time-locked to the onset presentation of the critical adjective showed a clear N400 effect and a late-positive component (LPC) towards language switch in both proficiency groups. However, for the more proficient group, the N400 amplitude was larger and the effect showed a more frontal distribution. The authors interpreted their findings within the framework of the Revised Hierarchical Model [21], which postulates that, in the early stages of L2 learning, there is a weak link between the L2 lexicon and the conceptual level but a strong link between L2 and L1 lexicons. Since the performed language switch was purely lexical (i.e., no semantic inconsistency was presented), the authors concluded that the less proficient learners manifested a stronger link between L1 and L2 words, thus facilitating language switching, while the proficient learners showed a pattern closer to that of balanced bilinguals. 
The participants in Van der Meij's study, however, were younger adults. It is possible that in older adults, lifelong monolingualism in conjunction with decreased cognitive capacity could lead to a more rigid language system, which in turn could lead to a reduction or even an absence of switching effects into the L1, particularly at initial stages of L2 learning.
The current pilot study addresses these questions; namely, whether older adults would also display similar effects of language switching into their L1 and whether they would do so at very basic levels of L2 proficiency. These findings in turn may help L2 instructors and researchers select the primary teaching language and the appropriate amount of language switching in classrooms of older L2 learners, as well as possibly informing an understanding of language-mixing errors or word retention difficulties in this age group.
Taken together, the overarching aim of this study was to assess whether neurological substrates of L2 learning identified in younger adults can be found in older learners. While resting-state parameters and cognitive capacities may change throughout the lifespan, we hypothesize that the underlying neurological mechanisms should also apply to older adults.

EXPERIMENT 1
The aim of Experiment 1 was to replicate the studies by Prat et al. [16,17] in older adults, and thereby examine whether pre-training resting-state EEG markers can predict L2 aptitude in third age learners following 3 weeks of L2 instruction only. Using EEG indices measured before the L2 course, we hypothesized that power in the beta1 band in particular would predict individual L2 development, in line with Prat et al. [16].

Participants
For this longitudinal study, we recruited ten healthy older participants (range = 65-73 yrs, M = 68.2 yrs, SD = 2.44, 4 women), all of whom were (Swiss-) German speakers, with no more than school knowledge of any language other than (Swiss-) German, little to no exposure to English, and who had not resided for more than 3 weeks in an English-speaking country during the past 40 years. None of them reported any history of present or past neurological, psychiatric, or neuropsychological disorders, and we excluded participants with hearing thresholds above 40 dB on the better-hearing ear for frequencies lower than 500 Hz, as this is the threshold considered to be disabling by the WHO. A short intelligence test was performed to ensure that participants were not cognitively impaired (Kurztest für allgemeine Basisgrössen der Informationsverarbeitung [22]), and only participants who showed average scores or higher were included. Further, we excluded professional musicians (individuals playing an instrument for more than 6 h per week), as musical expertise has been found to influence L2 attainment [23]. None of the participants reported engaging in any other cognitively challenging activities during the time of their participation. The study was approved by the local ethics committee of the University of Zurich. All individuals gave their written informed consent and were refunded the course fee for their participation in the study.

L2 training
Participants completed an intensive 3-week English course for beginners, comprising a total of 60 hours distributed over all consecutive workdays within the course period (i.e., 15 days). The course was taught by a qualified English-as-a-foreign-language (EFL) teacher, and participants were trained in all four of the essential skills for L2 learning (i.e., speaking, writing, listening comprehension and reading). New grammar forms were introduced through explicit instruction and practiced through communicative exercises in pairs and with the whole class. One lesson per day was reserved for written self-study and no additional homework was required. The structure and content of the training followed a course book designed for German adult learners of English (Next A1, Hueber Verlag). Since the training was designed as an intensive course, the instructor had to be able to attend to each learner individually so as to avoid some learners being outpaced by the others halfway through the course, which would have rendered the course unteachable. Thus, for logistic reasons and because redoing the course with a new set of participants would have added innumerable confounding factors (e.g., between-participants and teacher-participants group dynamics), we only report findings from one experimental group.

Experimental design
In the week before and again in the week after the training, resting-state EEG data were obtained from each participant and L2 proficiency was assessed via three language tests (see Fig. 1). In the week following the L2 training, an ERP experiment was carried out directly after the resting-state recording to investigate the N400 and LPC components as a response towards language switch and semantic incongruence, following the work of Van der Meij et al. [20].

Language development
To reliably gauge L2 proficiency of English before and after the training based on language comprehension and production on the lexical as well as grammatical levels, learners completed three different L2 tests before and after the training. These are described in the following.
Integrative L2 knowledge. The C-Test, a language assessment, screening and examination tool, measures integrative L2 production skills; that is, a learner's ability to infer missing information in a text where the natural information redundancy is reduced. Since the C-Test has been shown to correlate with self-evaluation procedures, school grades and other language tests [24], it is an ideal measure of general L2 competence. For our study, the test consisted of five short, random, written texts in L2, the degree of difficulty of which was adjusted to the target level of the training (approximately basic level A2 as per CEFR [25]). In each of the texts, the second half of every second word was removed [26,27], creating a total of 125 gaps to be filled in by the participants. Percentiles were calculated based on the number of correctly filled gaps.
Language assessment test based on course book. The course book (Next A1, Hueber Verlag) provides an online assessment test to measure vocabulary and grammar as well as basic listening comprehension with a target level of A1+ (CEFR). It was used here to assess L2 competence according to the material-specific learning goals, which we expected to show improvements even if the more general C-Test (see above) did not. Participants completed the test in the lab in the presence of the experimenter, who ensured that there were no problems based on computer illiteracy. Percentiles for each individual were calculated based on raw scores.
Listening comprehension. To complement the listening tasks of the course book's online assessment, which only measured comprehension for letters, digits and gist, a listening task was added to test comprehension on word and sentence levels. The test comprised 12 sentences to be translated by participants, each sentence corresponding to the level of difficulty of one unit in the book. Two points were awarded for sentences translated correctly in both content and form, and percentiles were calculated for each individual.
Corrected L2 development. Given the omnipresence of English loan words in (Swiss-) German or in advertisements and the media in general, a certain degree of previous L2 skill was practically unavoidable. In order to control for the differences in previous L2 proficiency, the language tests were carried out both before and after the training. A principal component analysis revealed that all L2 scores loaded on the same factor, which explained 82% of the variance within the three language tests. Therefore, L2 proficiency was calculated as the mean over all language tests. L2 development was not calculated as the difference between post- and pre-scores, since L2 learning in older adults has been found to follow logarithmic trajectories [8], which reflect the fact that improvement becomes more difficult with increasing proficiency. Thus, a pure difference score would reward learners with low L2 skills. Accordingly, we expected an improvement from 90-100% would be comparatively more difficult than one from 0-10%, and therefore used a corrected score (CorP) to reflect the percentage of maximum attainable improvement for each learner, as follows:
\[ \mathrm{CorP} = \frac{(\text{score at } T_2 - \text{score at } T_1) \times \text{maximum score}}{\text{maximum score} - \text{score at } T_1} \]
Thereby, a learner starting with previous L2 knowledge of 30 points who improves by 20 points has a higher CorP (CorP = 28.57) than a learner with zero previous knowledge who also improves by 20 points (CorP = 20.00).
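As a quick check of the corrected score, the following minimal sketch reproduces the two worked values given above. The function name `corrected_score` and the default maximum of 100 points are illustrative assumptions (the maximum is inferred from the worked example, which only yields 28.57 and 20.00 on a 100-point scale).

```python
def corrected_score(score_t1: float, score_t2: float, max_score: float = 100.0) -> float:
    """Improvement expressed relative to the room for improvement left at T1.

    max_score = 100 is an assumption inferred from the worked example.
    """
    if not 0 <= score_t1 < max_score:
        raise ValueError("score_t1 must lie in [0, max_score)")
    return (score_t2 - score_t1) * max_score / (max_score - score_t1)


# The two hypothetical learners from the text: both gain 20 raw points.
print(round(corrected_score(30, 50), 2))  # 28.57
print(round(corrected_score(0, 20), 2))   # 20.0
```

The same raw gain thus counts for more when less headroom was available at T1, which is the property the correction is meant to capture.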

Resting-state EEG recording
Before and after the training period, 8 min of alternating 2 min eyes-open/2 min eyes-closed resting-state EEG data were collected from participants, using 128 Ag/AgCl electrodes embedded in an elastic cap (Electro-Cap International Inc., Eaton, OH, USA), and recorded with the high-resolution BioSemi ActiveTwo EEG system (BioSemi B.V., Amsterdam, Netherlands). At both time points, the resting-state EEG was recorded prior to Experiment 2 in order to avoid any influence of the ERP task on resting-state activity. Each participant was seated in a soundproof, electrically shielded room, approximately 80 cm away from the computer screen. The electrical brain activity was recorded with a sampling rate of 512 Hz. Impedances were generally kept below 20 kΩ and the signal was filtered online with a band pass filter of 0.1-100 Hz. Preprocessing was conducted using the BrainVision Analyzer 2.1 (Brain Products, http://www.brainproducts.com) and followed the procedures described in Prat et al. [16] with minor variations to fit our equipment and the preprocessing software. A low cutoff filter of 0.1 Hz (12 dB) and a high cutoff of 30 Hz (48 dB) were applied offline. For the extraction of power means per frequency band, we followed the procedures described in Prat et al. [16], again using BrainVision Analyzer. Accordingly, we only used the eyes-closed resting-state segments (a total of 4 min) from the 8-min recording to allow direct comparability of results with Prat et al. [16], removing the first and last 5 s of each eyes-closed segment as an adjustment interval to the new condition. To correct blinks and saccades, an independent component analysis (ICA) was applied [28], and semi-automatic artifact rejection within each channel was used to eliminate noisy segments.
Data were rejected from 200 ms before to 200 ms after an artifact, with artifacts defined as any point in time at which the gradient exceeded a voltage step of 50 µV, the maximal difference of values in a 200 ms interval exceeded 200 µV, or the activity in 100 ms intervals was not lower than 0.5 µV. Channels that would have caused more than 10% of an individual's data to be rejected were replaced via topographic interpolation; that is, their activity was simulated by averaging the activity of the adjacent electrodes. Recording quality, however, was generally high, so that no more than five channels (of the total 128) ever had to be interpolated, and no more than 9% of the data had to be rejected due to artifacts for any participant. Finally, the data were segmented into 2-s epochs with 50% overlap (following Prat et al. [16]) while automatically skipping intervals containing markers for artifact rejection, and all activity was re-referenced to the average reference.
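The segmentation step can be illustrated with a short sketch. It is not the BrainVision Analyzer procedure itself; it only shows, under stated assumptions, how 2-s epochs with 50% overlap map onto sample indices at 512 Hz and how epochs touching a marked artifact interval could be skipped. The function names and the example artifact interval are hypothetical.

```python
def epoch_bounds(n_samples: int, sfreq: float = 512.0,
                 epoch_s: float = 2.0, overlap: float = 0.5):
    """Return (start, stop) sample indices of overlapping fixed-length epochs."""
    length = int(round(epoch_s * sfreq))          # 1024 samples per 2-s epoch
    step = int(round(length * (1.0 - overlap)))   # 512-sample hop for 50% overlap
    if length <= 0 or step <= 0:
        raise ValueError("epoch length and step must be positive")
    return [(s, s + length) for s in range(0, n_samples - length + 1, step)]


def drop_marked(bounds, bad_intervals):
    """Keep only epochs that do not overlap any (start, stop) artifact interval."""
    return [(a, b) for a, b in bounds
            if all(b <= lo or a >= hi for lo, hi in bad_intervals)]


# One 2-min eyes-closed segment with 5 s trimmed from each end leaves 110 s.
bounds = epoch_bounds(int(110 * 512))
print(len(bounds), bounds[:2])                    # 109 [(0, 1024), (512, 1536)]

# Hypothetical artifact interval in samples (not taken from the study data).
print(len(drop_marked(bounds, [(1000, 1100)])))   # 106
```

Because neighbouring epochs share half their samples, a single short artifact removes several epochs, which is worth keeping in mind when reading the rejection percentages.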

Statistical analysis
For the statistical analysis of the data and given our small sample size, we used pairwise Spearman correlations to compare power means per subject, electrode pool and frequency band with L2 development following Prat et al. [16,17]. In an additional, exploratory approach, we subjected those same values to a multilevel model [30,31] to assess the electrode-independent correlation of brain and behavior. The intraclass correlation coefficient (ICC) was used as a measure of whether measurements of power values were independent within subjects in order to determine the necessity of multilevel (i.e., mixed) models. Power values were nonindependent within subjects, ICC(1) = 0.14, F(9, 290) = 6.03, p < 0.001, which according to Cicchetti provides fair evidence that measurements were nonindependent within subjects [32]. In the mixed model, resting-state oscillatory power was used as the dependent variable, predicted by the main factors Frequency Band and L2 development, with random effects for subjects. L2 development was standardized before being entered into the model. The alpha band was used as reference frequency in the model. Wherever possible, and as recommended by the American Statistical Association (ASA), we report confidence intervals instead of p-values, given the well-known shortcomings of the latter as a measure of evidence for a model or hypothesis, and the likelihood of a type II error due to confounding by the number of observations in our sample [33,34]. Furthermore, p-values have repeatedly been shown to be problematic if not altogether unnecessary in mixed models [35]. In contrast, confidence intervals allow us to make statements as to which effects are likely to exist in the population and provide a good measure of how precise the sample statistic is. For all statistical analyses we used the program R (http://www.r-project.org).
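The link between the reported F statistic and ICC(1) can be made explicit with a small sketch. It assumes a balanced one-way design and is written in Python rather than R purely for illustration. The value k = 30 observations per subject is an inference from the reported degrees of freedom (9 and 290, i.e., 10 subjects and 300 observations) and is not stated in the text; the function names are hypothetical.

```python
from statistics import mean


def icc1(groups):
    """ICC(1) and F from a balanced one-way ANOVA.

    `groups` is a list with one list of observations per subject.
    """
    n, k = len(groups), len(groups[0])
    if any(len(g) != k for g in groups):
        raise ValueError("balanced design required")
    grand = mean(v for g in groups for v in g)
    ms_between = k * sum((mean(g) - grand) ** 2 for g in groups) / (n - 1)
    ms_within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    return icc, ms_between / ms_within


def icc1_from_f(f_value: float, k: int) -> float:
    """Same quantity expressed through F: (F - 1) / (F + k - 1)."""
    return (f_value - 1.0) / (f_value + k - 1.0)


# Assumption: k = 30 observations per subject, inferred from df = (9, 290).
print(round(icc1_from_f(6.03, 30), 2))  # 0.14
```

Under that assumption, the reported F of 6.03 corresponds to the reported ICC(1) of 0.14, so the two statistics are mutually consistent.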

Individual differences in language learning
Consistent with previous research, even though all participants increased their L2 skills over the course of the 3-week training, there was large individual variability in the L2 improvement, ranging from 16 Fig. 3. The relationship between initial L2 knowledge (which was very basic in all participants) and the degree of L2 development over the course was non-significant (r_s = 0.36, 95% CI [-0.41, 0.91]). Indeed, some participants who started with lower L2 skills at T1 surpassed some learners who began with higher levels of L2 knowledge after the training. In the following, we assess whether the same learner differences are manifested in the EEG experiments.

Resting-state EEG indices
Pairwise Spearman's correlations of power within each frequency band and electrode pool from the pre-training EEG with L2 development over the training (see Prat et al. [17]) only showed a significant relationship for medial frontal theta (r_s = 0.68, 95% CI [0.12, 0.96]), which did not survive FDR correction (see Appendix Table ST1). The exploratory analysis of the relationship between resting-state power and L2 development across the whole head, however, showed a positive relationship between language learning and all assessed frequency bands (see Table 1). We estimated the overall variance explanation with the pseudo R-squared for generalized mixed-effect models [36], which indicated that the model explained 33% of the observed variance. However, only in the beta1 band was power significantly predicted by L2 development. This finding is in line with Prat et al. [16], who found the highest correlation coefficients between L2 learning rate and EEG indices in the beta1 band (r = 0.60-0.77), and Prat, Yamasaki and Petersen [17], who found a correlation of L2 learning rate with pre-training power in the beta band over right posterior electrodes (r_s = 0.39), but once again, our replication of electrode-based correlations did not yield significant results.
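To illustrate why a single nominally significant correlation can fail to survive FDR correction, the sketch below implements a rank correlation and a Benjamini-Hochberg step-up procedure in plain Python. The text does not name the FDR method used, so Benjamini-Hochberg is an assumption here, and all numbers in the example are toy values rather than study data.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            out[order[m]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def benjamini_hochberg(p_values, q=0.05):
    """Flag which tests survive FDR control at level q (assumed method)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff = rank  # largest rank meeting the step-up criterion
    keep = [False] * m
    for idx in order[:cutoff]:
        keep[idx] = True
    return keep


# Toy p-values: one test below 0.05, but none survives the correction.
print(benjamini_hochberg([0.03, 0.20, 0.40, 0.60, 0.80]))
# [False, False, False, False, False]
```

With many electrode pools and frequency bands tested on ten participants, the threshold for the smallest p-value becomes very strict, which is consistent with the pattern reported above.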
Following the findings by Prat et al. [16] and taking their analysis a step further, in addition to investigating EEG measures as predictors, we also investigated power changes in the beta1 band via a paired t-test of beta1 power in pre- and post-tests. The difference between pre- and post-values in beta1 power was correlated with the L2 development in order to assess the relationship between stability in EEG indices and L2 outcome. When comparing beta1 values before and after the language course, a Wilcoxon Signed-Rank Test revealed a negative trend; that is, a decrease in

Discussion of experiment 1
In this experiment, we investigated whether brain activity at rest predicts individual L2 aptitude in older adults, who start to learn an L2 when they have already reached the third age. Our results demonstrated that endogenous brain activity, as measured by beta1 power in the resting-state EEG before the training, was a reliable predictor of individual L2 progress, since participants with higher power in the beta1 band prior to the training also showed larger L2 development during and after the training. These findings are consistent with those of Prat et al. [16], who also found correlations to be strongest between L2 learning rate and beta1 power, but we could not identify a significant relationship on the electrode pool level. In Prat et al.'s [16] study, correlations between L2 learning rate and power in the beta2 band were smaller but still significant, while those in the beta3 band were not significant after FDR correction. In Prat, Yamasaki and Petersen [17], in which power was averaged across the whole beta band, positive correlations were also found for overall beta and L2 measures. Similarly, our findings are in line with Küssner et al. [19], who found baseline resting-state beta (14-35 Hz) power to be a predictor of word recall in a foreign vocabulary task, and are compatible with those of Kepinska et al. [37], who found that learning of an artificial grammar was more successful in learners who showed higher functional connectivity in the beta band (13-29 Hz) during the learning phase.
Since our results and those of Prat et al. [16] suggest a differential role for beta1 oscillations in L2 learning as opposed to beta2 or beta3 oscillations, both in younger and older adults, we will first address those previous studies that report on the beta1 band in particular and its relationship with cognitive functioning. Egner and Gruzelier [38] showed that, in younger adults, neurofeedback training of low beta frequencies (12-15 Hz and 15-18 Hz) led to increased perceptual sensitivity, reduced omission errors and faster reaction times than in a non-neurofeedback control group. Similarly, Egner and Gruzelier [39] found that training these same frequency bands also led to increases in P300 ERPs, a component which is heavily influenced by attention [40]. In line with these findings, Vernon et al. [41] reported improved accuracy in focused attentional processing and improved performance in a semantic working memory task following beta1 training (12-15 Hz). Further to these studies, in their review Gruzelier et al. [42] associate power in the 12-15 Hz band (referred to as sensorimotor rhythm) with attentiveness, sustained attention, semantic working memory, declarative memory and reduced hyperactivity.
Even though the beta1 band appears to play a distinctive role in cognitive performance and L2 acquisition, few studies discuss the absence of effects in the upper beta bands. Park et al. [43], for instance, found that power in all beta bands was reduced in patients suffering from Alzheimer's disease (AD), but also found that the effect was strongest in the beta1 band, smaller in the beta2 band, and least pronounced, but still significant, in the beta3 band. The authors, however, do not discuss this graded effect. Similarly, Hogan et al. [44] report an increase of beta1 power in both healthy older adults and adults suffering from AD during memorization phases as a function of working memory load, but only AD patients also showed increases in beta2 and beta3 power. Again, the authors omit any discussion of these differences. Finally, the study by Lindau et al. [45] also found distinctive patterns of oscillatory power decrease in the beta1 to beta3 bands between healthy older controls, patients suffering from AD and those suffering from frontotemporal dementia, but again, the authors did not discuss those patterns.
Therefore, it appears that there is currently no theoretical model on the distinctive functions of beta1 to beta3 oscillations in cognition and language processing, despite the fact that the evidence suggests that their function in cognition may vary. Since most studies, however, do not make this distinction, we will also discuss results on the overall beta band if their frequency range of interest included 13-14.5 Hz (our beta1 band). Similar to beta1 studies, those on the overall beta band have linked beta power to attentional resources that naturally fluctuate as a function of cognitive load and mental fatigue throughout the day. For instance, Jap et al. [46] and Liu, Zhan and Zheng [47] showed that beta power decreased as a function of fatigue following exhausting tasks, such as prolonged monotonous driving or repeated cognitive tasks. Accordingly, Kepinska et al. [37] found that artificial grammar learning was more successful in 1) participants who had high functional connectivity in the beta band (13-29 Hz) during the learning phase, and 2) participants who had high beta power right at the beginning of the learning phase, not increasing throughout the task. The authors associated beta activity with improved memory encoding during operations of high memory load. In line with these findings, Engel and Fries [48] postulate that beta band activity relates to maintaining the current motor and cognitive set, which can be understood as signaling the status quo via endogenous top-down processes. Typical top-down processes that the maintaining of a cognitive set requires include working memory tasks and attention, both of which have been linked to beta band activity [48]. In particular, a study by Gola et al. [49] showed that older adults who manifested a decrease in beta power during the anticipatory period of a visual attention task were significantly poorer performers than those who showed a beta band power increase. These findings led the authors to conclude that beta power is associated with activating and sustaining attentional processes and, most importantly, that this relationship is present both during tasks and during rest. As Engel and Fries [48] hypothesized, beta band activity should be particularly high during a resting state in which there is no expectation of ensuing change in the sensorimotor set. Our results can be reconciled with both this hypothesis and the findings of Gola et al. [49]: If indeed beta power at rest is indicative of an endogenous, permanently oscillating attentional state, our results would indicate that intrinsic attentive capacities are an essential prerequisite for successful L2 learning.
There is ample evidence to suggest that it is precisely selective attention that is one of the key skills enhanced in bilinguals as compared to monolinguals [50], and that this skill constitutes a significant predictor for L2 acquisition in adulthood [51,52]. L2 learners acquire a new language based on the input they read or hear, but some features of the input become output only at very late stages of L2 acquisition, and some forms even fail to be taken as input altogether. According to Ellis [53], the reason for this is "learned attention" in the L2 acquisition, which retards or even prevents the noticing of fragile features of L2 due to factors intrinsic to languages, such as salience or cue competition. Features, such as the third person singular "-s" of English, which are redundant (e.g., He eat* an apple would still be comprehensible) are notoriously difficult to acquire, as they are non-salient and therefore require increased levels of selective attention in order to be detectable in the language stream. If selective attention in a given learner is low, only the most obvious cues in the input may be acquired, which in turn may be sufficient for everyday communicative survival [53] and thus impede L2 progress. As a consequence, the slower learners of our study are likely to have suffered from reduced attentional resources, as reflected in the lower power values in the beta1 band. Thus, our findings are not only in line with Prat et al. [16,17], but are also consistent with previous findings on the relationship between beta1 oscillations, attentional processing and semantic memory, confirming that beta-activity in the brain at rest can be used as a paradigm-free tool to predict L2 development.

EXPERIMENT 2
In the second experiment, we aimed to determine whether electrophysiological measurements already differentiate varying levels of L2 proficiency after L2 training of only 3 weeks and in older adults, thus providing insight into the question of how a newly learned language in old adulthood is processed in relation to the native language (L1), and thereby informing theories on how the new language is stored at initial stages of L2 learning. To this end, we conducted a visual language-switching experiment after the training similar to that of Van der Meij et al. [20]. English (L2) sentences were presented, in which the target word was either in L2 or L1, and either semantically congruent or incongruent with the rest of the sentence. We aimed to test the following hypotheses: In line with Van der Meij et al. [20], we hypothesized an N400 effect as well as an LPC (late-positive component) towards language switch. However, as our learners were L2 beginners, we expected to find an inverse relationship between language-switching effects and L2 proficiency, assuming that at a very basic L2 proficiency, the link between L1 and L2 lexicons does not yet exist but strengthens with increasing proficiency until resembling that of Van der Meij's intermediate learners. In addition, we hypothesized that an effect of semantic incongruence would either be reflected by a monophasic LPC or a biphasic N400-LPC pattern, as these have been shown to be elicited by similar semantic violations [54].
The participants, the language training and the language tests were the same as in Experiment 1.

Measure of language proficiency
The L2 measure, as in Experiment 1, was calculated as the mean over all three L2 tests (C-Test, listening comprehension, course book assessment). Here, however, we were not predicting L2 change but capturing momentary processing of language switching from L2 to L1, which can be expected to be affected by both the pre-existing L2 (and L1) knowledge as well as the additional knowledge gained throughout the L2 training. Therefore, instead of using an improvement score, as done in Experiment 1, here we used the final L2 level and correlated it with the respective ERP amplitudes.

ERP stimulus material
For the stimuli, following Van der Meij et al. [20], a 2 (switch vs. no-switch) × 2 (congruent vs. incongruent) design was used, with a total of 320 English sentences of nine to 12 words (see Fig. 4). Sentence structure was the same for all stimuli; that is, a compound sentence that included a subordinate relative clause (e.g., "The girl that does not talk much writes a book"). The last word of each sentence could either occur in English (no-switch, 80 sentences, see standard sentence) or in German (switch, 80 sentences; e.g., "The girl that does not talk much writes a Buch" / "The girl that does not talk much writes a Kartoffel"), and could be semantically congruent (congruent, 80 sentences, see standard sentence) or incongruent with the rest of the sentence (incongruent, 80 sentences; e.g., "The girl that does not talk much writes a potato" / "The girl that does not talk much writes a Kartoffel"). Since word order is not identical in English and German, the code-switch was performed in the last part of the sentences, where word order is grammatically correct in both languages. Each sentence appeared in all four conditions, and in order to prevent participants from recognizing sentences from earlier iterations, the stimulus material was made up of no more than 20 nouns and 15 verbs in total, which were reassembled into 80 different sentences via an automatic algorithm that forced each verb to appear in two sentences and each noun to appear in four sentences, twice as subject and twice as object. All words that appeared in the stimuli were part of the course book curriculum, and the instructor ensured familiarity with the terms over the course of the training. For the sentence-final target word, only nouns that were orthographically different in at least two letters between English and German were included, and there were no "false friends" (e.g., Gast - guest, *Handy - handy). Average frequency of the words was not taken into consideration as their use in the classroom environment is not representative of that of a native speaker of English or German, but it was ensured that all words were covered in the course curriculum.
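The crossing of the two factors can be illustrated with a minimal sketch. It uses only the example frame and target words quoted above; the function name and the data layout are hypothetical and do not reproduce the authors' stimulus-generation algorithm.

```python
from itertools import product

# Example frame and target words quoted in the text; the dictionary layout
# is an assumption made for illustration only.
FRAME = "The girl that does not talk much writes a"
TARGETS = {
    ("no-switch", "congruent"): "book",
    ("switch", "congruent"): "Buch",
    ("no-switch", "incongruent"): "potato",
    ("switch", "incongruent"): "Kartoffel",
}


def cross_conditions(frame, targets):
    """Return one stimulus per cell of the 2 x 2 design for one sentence frame."""
    stimuli = []
    levels = product(("no-switch", "switch"), ("congruent", "incongruent"))
    for switch, congruency in levels:
        stimuli.append({
            "switch": switch,
            "congruency": congruency,
            "sentence": f"{frame} {targets[(switch, congruency)]}",
        })
    return stimuli


stimuli = cross_conditions(FRAME, TARGETS)
n_frames = 80                      # base sentences reported in the text
total = n_frames * len(stimuli)    # 80 frames x 4 cells of the design
```

Under this reading, each of the 80 base sentences contributes one stimulus to every cell of the design, which yields the reported total of 320 sentences.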

EEG recording and preprocessing
EEG data were collected using the same system and settings as for the resting-state data. The presentation of stimuli was controlled via Presentation software (Version 18.0, http://www.neurobs.com).
Fig. 4. ERP experiment design. Each target word (e.g. book) appeared in at least four different sentences and conditions in order to avoid prediction effects. ERPs were recorded from the onset of the target word. The colors used for target words are the same as in Fig. 6: red = no-switching incongruent, pink = switching incongruent, black = no-switching congruent, green = switching congruent.
Similar to the study of Van der Meij et al. [20], sentences were presented visually one word at a time in a gray-green lowercase font against a black background. Each sentence was preceded by a "+" sign shown for 1000 ms, followed by a blank screen of 500 ms. Each word was shown for 500 ms with a blank screen of 200 ms between words. At the end of each sentence, a blank screen of a jittered duration of 500-1000 ms was inserted to ensure onset asynchrony between sentences. Participants were instructed to read the sentences for content, and practiced the task in the presence of the experimenter. After 1/3 of the sentences, participants' comprehension of the content was assessed with the question Hat der letzte Satz Sinn gemacht? (Did the last sentence make sense?), and answers had to be indicated via the left ("No") or right ("Yes") arrow key.
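The trial timing described above can be made explicit with a short sketch. It assumes that the 200 ms blank follows every word except the sentence-final one, which is followed directly by the jittered blank; this detail is not stated in the text, and the function names are hypothetical.

```python
FIXATION_MS = 1000    # "+" sign
PRE_BLANK_MS = 500    # blank screen after the fixation sign
WORD_MS = 500         # presentation time per word
INTER_WORD_MS = 200   # blank screen between words


def target_onset_ms(n_words):
    """Onset of the sentence-final word relative to fixation onset."""
    preceding_words = n_words - 1
    return FIXATION_MS + PRE_BLANK_MS + preceding_words * (WORD_MS + INTER_WORD_MS)


def trial_duration_ms(n_words, jitter_ms):
    """Total trial duration, including the jittered blank after the last word."""
    if not 500 <= jitter_ms <= 1000:
        raise ValueError("jitter outside the reported 500-1000 ms range")
    # Assumption: the final word is followed directly by the jittered blank.
    return target_onset_ms(n_words) + WORD_MS + jitter_ms
```

For a nine-word sentence, the target word would thus appear 7100 ms after fixation onset, and a twelve-word sentence with the maximum jitter would last 10700 ms in total.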
The preprocessing procedure was the same as in Experiment 1.

ERP analysis
After preprocessing, the ERP data were segmented for each condition (switch, no-switch, congruent, incongruent) from 100 ms before to 1000 ms after the onset of the final noun and baseline corrected with reference to the 100 ms pre-stimulus baseline. In order to be able to detect the expected N400 and LPC, two time windows of interest were determined through a mixed hypothesis- and data-driven approach (see Results). The extracted windows were 300-450 ms for the N400 [20] and 600-800 ms for the LPC [55,56]. Since the effects were expected to be distributed over large areas of the scalp [20], electrode pools were formed by subsuming electrodes of similar activity (see Results), which then served as regions of interest in all further analyses.
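The segmentation steps described above (baseline correction against the 100 ms pre-stimulus interval, followed by mean amplitude extraction in the two windows) can be sketched as follows. The sampling rate, the toy epoch and all function names are assumptions made for illustration; the sketch operates on a single pooled channel and is not the authors' analysis pipeline.

```python
from statistics import fmean

N400_WINDOW = (300, 450)   # ms, as reported in the text
LPC_WINDOW = (600, 800)    # ms, as reported in the text


def ms_to_sample(t_ms, sfreq_hz, epoch_start_ms=-100):
    """Sample index of time t_ms in an epoch that starts at epoch_start_ms."""
    return round((t_ms - epoch_start_ms) * sfreq_hz / 1000)


def baseline_correct(epoch, sfreq_hz, epoch_start_ms=-100):
    """Subtract the mean of the pre-stimulus interval (epoch start to 0 ms)."""
    n_baseline = ms_to_sample(0, sfreq_hz, epoch_start_ms)
    baseline = fmean(epoch[:n_baseline])
    return [value - baseline for value in epoch]


def mean_amplitude(epoch, window_ms, sfreq_hz, epoch_start_ms=-100):
    """Mean voltage within window_ms (start inclusive, end exclusive)."""
    start = ms_to_sample(window_ms[0], sfreq_hz, epoch_start_ms)
    stop = ms_to_sample(window_ms[1], sfreq_hz, epoch_start_ms)
    return fmean(epoch[start:stop])


# Toy epoch at an assumed 100 Hz: -100 to 1000 ms gives 110 samples.
SFREQ = 100
epoch = [2.0] * 110
for i in range(ms_to_sample(300, SFREQ), ms_to_sample(450, SFREQ)):
    epoch[i] = -1.0   # artificial negativity in the N400 window

corrected = baseline_correct(epoch, SFREQ)
n400 = mean_amplitude(corrected, N400_WINDOW, SFREQ)
lpc = mean_amplitude(corrected, LPC_WINDOW, SFREQ)
```

In the toy epoch, the constant offset of 2.0 is removed by the baseline correction, so that only the artificial deflection in the N400 window remains.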
For the statistical analysis of the ERP data, the mean voltage amplitudes relative to the start of the critical noun were subjected to a multilevel model [30,31]. All values were standardized before being entered into the models. The intraclass correlation was ICC = .70 for the 300-450 ms time window and ICC = .69 for the 600-800 ms time window, which, according to Cicchetti's guidelines, falls in the good range and indicates that amplitudes were nonindependent within subjects [32]. In both time windows, the random intercept model fitted the data better than the random intercept-and-slope model, so that only random participant-intercepts were used. The model included main effects for Switch (switch, no-switch) and Congruency (congruent, incongruent) as well as their interaction as within-subject factors, and featured age as a control variable. In addition, a linear regression model was computed to assess the relationship between L2 proficiency after the training (T2) and the respective amplitude difference between switch and no-switch conditions, and between congruent and incongruent ones.
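To illustrate what the intraclass correlation expresses, the following sketch computes a one-way ANOVA estimate of the ICC for balanced data. This estimator is a stand-in chosen for brevity; the text does not specify how the ICC was derived, and in multilevel modeling it is commonly obtained from the variance components of an intercept-only model. The function name and the toy data are hypothetical.

```python
from statistics import fmean


def icc_oneway(groups):
    """One-way ANOVA estimate of the ICC for balanced data.

    groups: list of per-participant lists, each holding the same number
    of amplitude values (e.g., one per condition).
    """
    k = len(groups[0])
    if k < 2 or any(len(g) != k for g in groups):
        raise ValueError("balanced design with at least two values per participant expected")
    n = len(groups)
    grand_mean = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]
    # Mean squares between and within participants.
    ms_between = k * sum((m - grand_mean) ** 2 for m in means) / (n - 1)
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, means) for v in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy data: all variance lies between participants, none within.
identical_within = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
icc_max = icc_oneway(identical_within)
```

An ICC close to 1 means that most of the variance in amplitudes is attributable to differences between participants, which is the situation in which participant-level random intercepts are warranted.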

Accuracy
In the ERP experiment, participants' accuracy in determining whether the target sentences made sense was 81.39%. All participants scored above 75% except for one learner, who only reached 57.8%. However, there was no significant difference in the model fit between the model using proficiency as a predictor and the one using accuracy for either of the ERP components (N400, LPC), as assessed via the Bayesian information criterion (BIC). In addition, there was a strong correlation between accuracy in the EEG Experiment and L2 proficiency after the training (r = 0.86, 95% CI [0.58, 1.00]), which meant that we decided to retain all of the data in the mixed models calculated for each ERP component.

Event-related potentials
Since the expected N400/LPC effects are known to be distributed over relatively large areas of the scalp [20], we used a data-driven approach to define one electrode pool for each ERP component, combining adjacent electrodes that visibly reflected the expected activity. In accordance with Van der Meij et al. [20], we expected to find a negativity around 400 ms (N400) and a positivity around 700 ms (LPC) as an effect of language switch, and we postulated semantic incongruence to be reflected in either an N400 modulation or a late positivity around 700 ms [54]. Figure 5 shows topographic difference plots between conditions switch and no-switch, and between congruent and incongruent conditions, respectively. Based on these topoplots, a central electrode pool was formed to subsume all activity related to the N400 component, and a parietal pool was used to analyze LPC characteristics (see Fig. 5). Figure 6 shows ERPs time-locked to the onset presentation of the final noun, averaged over all participants for the four experimental conditions (language switching, no language switching, semantic congruence, semantic incongruence), plotted in the two electrode pools.
At the Central Electrode Pool, the N400 and LPC components are clearly visible, although overall amplitudes are noticeably smaller when compared to similar studies [20,55]. The N400 is visibly scaled, such that average amplitudes were largest for the switch-incongruent condition, smaller for the switch-congruent condition, smaller still for the no-switch-incongruent condition and smallest for the no-switch-congruent condition. Thus, relative to no language switch, the two language switch conditions elicited a conspicuously larger negativity between 300-450 ms after word onset, which is most prominent in the Central Pool. In addition, already starting 500 ms post-target word presentation, but more visibly after 600 ms, a positivity at the Parietal Pool with a duration of approximately 200 ms and peaking around 700 ms shows more positive values for semantically incongruent conditions than for congruent ones. Based on the visual inspection, the Central Pool was chosen for all further analyses of the N400, and the Parietal Pool for those of the LPC. As shown in Fig. 7 and consistent with Van der Meij et al. [20], the N400 effect occurs in the time window between 300-450 ms. Therefore, we used this same time window for all further statistical analyses of the N400 component. For the LPC, we used a data-driven approach, as the incongruent condition did not form part of Van der Meij et al.'s study [20]. As can be appreciated from the ERPs, the LPC is clearly discernible from approximately 600 ms onwards. We chose a time window of 200 ms (600 ms-800 ms). This was deemed sufficient to detect the expected effect in each learner based on existing literature on the LPC [55,56].
[Figure caption fragment] For the N400, there is a significant difference between language conditions, that is, between no switching (L2) and switching (L1), while in the LPC component, the significant difference was between congruent (Cg) and incongruent (Icg) conditions. As the error bars show, there appears to be a large overlap between conditions in the N400 time window, in particular, which is likely to be explained by the varying L2 proficiency, which correlated with the N400 amplitude and therefore may explain the observable variance.
N400: Time window between 300-450 ms. ERP amplitude at the Central Pool in the time window [...] This means that when reading sentences that contained a language switch from L2 to L1, independent of congruency, the N400 effect was larger; that is, values were more negative than those for no-switching. There was no significant effect of congruence (B = -0.27, 95% CI [-0.56, 0.03], t(27) = 1.81). A linear regression additionally showed that the magnitude of the N400, computed as no-switch minus switch amplitude (so that the N400 effect becomes positive), correlated with L2 proficiency (B = -0.44, 95% CI [-0.89, 0.00], t(18) = -2.10). Thus, for learners with lower L2 skills after the course, the N400 effect was larger than for learners with high L2 skills (see Fig. 8). There was no significant effect of any of the factors included in the mixed model on N400 latency.
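The logic of the difference score and its regression on L2 proficiency can be sketched as follows. The sketch fits a simple least-squares slope on standardized variables, which is an assumption about the general approach rather than the authors' exact model; the toy values are invented for illustration and are not data from the study.

```python
from statistics import fmean, stdev


def zscore(values):
    """Standardize values to mean 0 and sample standard deviation 1."""
    mean, sd = fmean(values), stdev(values)
    return [(v - mean) / sd for v in values]


def n400_effect(no_switch, switch):
    """Difference score: no-switch minus switch amplitude per observation."""
    return [ns - s for ns, s in zip(no_switch, switch)]


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mean_x, mean_y = fmean(x), fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Invented toy values: higher proficiency goes with a smaller switch negativity.
proficiency = [1.0, 2.0, 3.0, 4.0]
no_switch_amp = [0.0, 0.0, 0.0, 0.0]
switch_amp = [-4.0, -3.0, -2.0, -1.0]

effect = n400_effect(no_switch_amp, switch_amp)       # positive values
slope = ols_slope(zscore(proficiency), zscore(effect))
```

Because the switch condition is more negative than the no-switch condition, the difference score is positive, and a negative slope indicates that the N400 effect shrinks as proficiency increases.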

Discussion of experiment 2
This experiment investigated behavioral and electrophysiological markers of L2 learning in old adulthood by examining the event-related potentials from a group of (Swiss-) German older learners (65-74 yrs) who participated in an intensive 3-week EFL training. We aimed to assess ERP correlates of L2 proficiency in order to study the relationship between L1 and L2 in older EFL beginners. As predicted based on Van der Meij et al. [20], language switching (from L2 to L1) elicited an N400 effect in the time window between 300-450 ms. However, switching from L2 to L1 in our data did not elicit an LPC as it did in Van der Meij et al. [20], whereas the additional condition of semantic incongruence, which was not present in the study by Van der Meij et al. [20], did.
In the 300-450 ms time window, language switching (from L2 to L1) as compared to no-switching yielded an N400 effect over the Central Electrode Pool, independent of semantic congruency. The time window, the observed effect as well as its estimated location on the scalp are consistent with the findings of Van der Meij et al. [20]. Given that all our participants were L2 beginners, however, there was nothing to be gained from splitting the group into a low- and a high-proficiency group, as was done by Van der Meij et al. [20]. Nevertheless, the observed N400 effect could be shown to correlate with L2 proficiency after the training in that learners with a higher proficiency showed smaller N400 effects.
The N400 component, which originally was believed to be a marker of semantic deviation [57] and word frequency [58], has meanwhile also been shown to be elicited by language switching from L1 to L2 [59]. The present study not only confirms that language switching back into the L1 can also yield an N400 effect, but that this effect even occurs in older learners, and does so after a training period of only 3 weeks. Given the low proficiency of our learners, the observed effect cannot be explained by frequency effects because words in the L1 are conspicuously more frequent in the learners' language experience than the recently learned L2. Instead, the N400 appears to be a marker of how active the two languages are at once and how much activation cost is required to switch from one to the other. As predicted, in our data, the N400 negatively correlated with L2 proficiency at T2, so that more advanced learners (i.e., lower intermediate level) showed smaller amplitude differences than learners with very basic L2 skills.
The fact that the N400 as a function of language switch was reduced in the more proficient learners of our sample may be an indication that, for them, the coactivation of L1 and L2 is stronger than in the less proficient learners. This finding is in line with the effects observed in Van der Meij et al.'s intermediate learners [20]. At the same time, however, Van der Meij et al. [20] also found that switching costs increased again for very advanced L2 speakers. Thus, it appears that L2 and L1 lexicons are most strongly coactivated at intermediate stages of L2 learning. At initial stages, the L2 lexicon does not yet exist, while at levels of high proficiency, the L1 is inhibited for successful L2 processing [60,61], both of which explain the lexical surprise effect as typified by the N400. In contrast to Van der Meij et al. [20], we argue that these effects can be explained without assuming separate lexicons for L1 and L2 or selective language access, as postulated in the Revised Hierarchical Model [21]. This model has repeatedly come under attack [62,63], and it is likely that the L1-L2 coactivation in our intermediate learners does not point to a link between L1 and L2 lexicons only, but that it manifests as an indirect association of both languages with the conceptual system through episodic memory, in particular because the interference of L1 words on L2 processing has been shown to be modifiable through the global language context, and thus is far from stable [64].
As mentioned above, however, differences between the participants in our study and those in the study of Van der Meij et al. [20] were not limited to L2 proficiency, but also applied to the participants' ages, and this difference was reflected in the ERP data. Compared to the results of Van der Meij et al. [20], who performed a similar experiment on language switching in younger adults, we found that ERP amplitudes in general were noticeably smaller in our group of older adults. This finding is consistent with the results of Xu et al. [65], who found that the congruency-induced N400 (in L1) yielded significantly smaller N400 amplitudes for older compared to younger adults. One reason for this reduction in ERP amplitudes could be age-related brain atrophy [66]; however, it is not yet clear how these differences in amplitude may relate to behavioral losses. Another explanation could be that greater variability in the peak-evoked amplitudes results in smaller averaged ERPs or that the reduction in ERP amplitude reflects a shift in how the stimuli are processed.
A further difference between our data and those of Van der Meij et al. [20] is that we did not observe a significant LPC in response to language switching. In this case, the absence of a switching-related LPC could be due to the age difference or the difference in L2 proficiency between our study and that of Van der Meij et al. [20]. A recent study by Kim, Oines and Miyake [67] would speak in favor of the former, as they found that verbal working memory capacities correlated positively with LPC amplitudes and negatively with N400 amplitudes, both being generated by the same semantic anomalies and occurring within the same individuals but to different degrees. Since working memory capacities are commonly affected by age-related cognitive decline [4], reduced working memory could likely be responsible for the absence of a switch-related LPC in our sample.
We did, however, observe a late positivity with a parietal distribution that varied as a function of semantic incongruence, a condition that we added to the study design of Van der Meij et al. [20]. Considering the high accuracy in the comprehension questions, the observed LPC confirms that learners processed the stimulus sentences attentively and for content; that is, not only word by word. Here, we employ the term LPC for reasons of terminological consistency with Van der Meij et al. [20]. However, as noted repeatedly in the P600 literature, the terms "(semantic) P600", "late-positive shift", "late-positive effect" and "late-positive component" can be used interchangeably, as it is likely that they are attributable to the same underlying neurobiological processing mechanisms, independent of a preceding N400 effect [68,69]. The latency of the incongruence-effect as late as 600-800 ms can be explained by the fact that semantic incongruence could only be detected once lexical processing of the critical word was completed, and this computation is performed only in the N400 time window. In line with this, visual inspection of the grand average ERP indicated that semantic incongruence also modulated the N400 in the no-switching conditions, suggesting that semantic processing may have taken place earlier even when no switching was required. This effect, however, did not reach significance, which in turn would be in line with studies showing a reduction of N400 effects with age [70]. At the same time, the fact that we did not observe a switch-related LPC, as observed in Van der Meij et al. [20], could be an indicator that the error signal elicited by the language switch from L2 to L1 was noticed unconsciously, and therefore did not result in a "pop-out" effect associated with the emersion of meaning into conscious access [71]. An alternative explanation could also be the differences in comprehension tasks used in the present study and that of Van der Meij et al. [20], an adjustment that was necessary given the added semantic condition in our study. While Van der Meij et al. asked more general comprehension questions focusing on different parts of speech in the sentence, our questions focused on the semantic congruency of the sentence, which was always determined by the last word. Consequently, the language switch in our design may have been less disruptive for the task at hand, since conscious top-down predictions were made for semantic congruency and only subconsciously for language consistency. It is therefore possible that more general comprehension questions would have resulted in a biphasic N400-LPC pattern that would have resembled that of Van der Meij's [20] younger participants.

OVERALL DISCUSSION AND CONCLUSION
The field of second-language acquisition in old adulthood is still in its infancy, and consequently there is little to no research on the neurophysiological markers of L2 learning in old age. In this pilot study, we have been able to replicate two studies on younger adult L2 learners with a sample of older (Swiss-) German learners of English as a foreign language. Our resting-state data replicated the study by Prat et al. [16], confirming the role of the beta1 band in L2 learning for older learners, albeit not on the individual electrode level. We could show that overall beta1 power before the training predicted the L2 development. These findings fuel the debate around L2 learning aptitude, and suggest beta1 power as being a possible electrophysiological correlate thereof. Since beta1 power has been associated with selective attention and semantic working memory, our findings are consistent with psycholinguistic theories on L2 learning [53,72]. In agreement with these theories, successful learning at initial stages of the L2 training may depend on beta1 oscillations that correspond to selective attention and semantic working memory in order to extract and focus on novel forms and meanings from the L2 input. Possible future applications of these findings are manifold. For instance, it remains for future research to show whether the L2 development can be increased by enhancing beta1 oscillations before each learning session through neurofeedback, whether the training can be adjusted to individual differences in beta1 power or whether beta1 oscillations themselves can be enhanced through the L2 training.
Second, we assimilated the research design by Van der Meij et al. [20], and could confirm that, even after a short training period of only 3 weeks, older L2 learners also manifested an N400 effect towards language switching from the newly learned L2 into their L1. Language switching appeared to require less effort with increased proficiency, suggesting that the lexicons of L1 and L2 had already started to become more closely linked, which confirms the theories of Abutalebi [60] and Cuppini et al. [61]. According to those theories, the L2 parasitizes its L1 equivalent at initial stages of L2 learning, and the two systems only become independent language systems again with highly advanced proficiency, thus indicating that integration processes of L1 and L2 change with the individual competence in the L2.
To sum up, our findings confirm that the human language system remains malleable even into old adulthood and appears to reflect the same electrophysiological mechanisms we observe in younger adults. The question that we plan to address in the future is not only how good older adults are at learning new languages, but also whether and to what extent language learning can be beneficial for an individual third age person (author(s), unpublished data).

LIMITATIONS
The data presented in this study are not without limitations. Even though our findings constitute an important first step towards understanding the neurophysiological mechanisms underlying L2 learning in old adulthood, the present design only reports on differences between pre- and post-training data, largely ignoring the learning process occurring between the two. Thus, in order to understand the individual L2 learning trajectories, dense longitudinal studies extending over longer periods would be required to allow inferences about whether beta1 power is more predictive of L2 development at specific stages of the learning process, for example. Accordingly, here we only report findings from a group of older L2 beginners, and even though these findings are theoretically compatible with those of Van der Meij et al. [20], future studies will have to show whether the differences between our findings pertain solely to differences in L2 proficiency, or alternatively to differences in age, training type, sample size, and language tests. We did not include a younger control group in the present study because our focus was on individual differences within the population of older learners themselves. We judged that differences between this group and a younger control group would be of little informational value, as it would be impossible to answer whether such differences occurred due to the degenerative processes of aging itself or to experientially determined differences in neurobiology and cognition. Accordingly, here we focused on individual variability in the ability to acquire a new language in third age and do not make any claims as to whether the observed effects are specific to language learning itself. We therefore also refrained from including an age-matched control group.
Given our purely correlational design, however, it should be noted that we cannot rule out that some participants may have experienced fatigue or decreased motivation as a result of the intensive L2 training, which may have influenced their state of wakefulness, attention and/or effort during the EEG recordings. Therefore, future studies may benefit from including socio-affective measures and course feedback in their study design.
Finally, the sample size in this pilot study was constrained to enable the instructor to attend to each learner's questions and needs, especially because the extent of interindividual differences was unclear beforehand. Therefore, we interpret our results with caution and plan to replicate them in a more comprehensive study that is currently being conducted in our lab.

ACKNOWLEDGMENTS
We would like to express our gratitude to Allison Christen for proofreading an earlier version of this manuscript, and to the URPP "Dynamics of Healthy Aging" for providing the required infrastructure throughout all phases of the project.

FUNDING STATEMENT
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

DATA AVAILABILITY
The data that support the findings of this study are available from the corresponding author, MK, upon reasonable request.