Journal of Neurophysiology, American Physiological Society
J Neurophysiol. May 2011; 105(5): 2448–2456.
Published online Feb 23, 2011. doi:  10.1152/jn.00291.2010
PMCID: PMC3094194

Contextual cuing contributes to the independent modification of multiple internal models for vocal control

Abstract

Research on the control of visually guided limb movements indicates that the brain learns and continuously updates an internal model that maps the relationship between motor commands and sensory feedback. A growing body of work suggests that an internal model that relates motor commands to sensory feedback also supports vocal control. Evidence from arm-reaching studies shows that, when provided with a contextual cue, the motor system can acquire multiple internal models, which allows an animal to adapt to different perturbations in diverse contexts. In this study we show that trained singers can rapidly acquire multiple internal models regarding voice fundamental frequency (F0). These models accommodate different perturbations to ongoing auditory feedback. Participants heard three musical notes and reproduced each one in succession. The musical targets could serve as a contextual cue to indicate which direction (up or down) feedback would be altered on each trial; however, participants were not explicitly instructed to use this strategy. When participants were gradually exposed to altered feedback, adaptation was observed immediately following vocal onset. Aftereffects were target specific and did not influence vocal productions on subsequent trials. When target notes were no longer a contextual cue, adaptation occurred during altered feedback trials and evidence for trial-by-trial adaptation was found. These findings indicate that the brain is exceptionally sensitive to the deviations between auditory feedback and the predicted consequence of a motor command during vocalization. Moreover, these results indicate that, with contextual cues, the vocal control system may maintain multiple internal models that are capable of independent modification during different tasks or environments.

Keywords: auditory feedback, fundamental frequency, sensorimotor

Whether we are using a new tool, learning to speak another language, or singing a new song, various forms of feedback are used to establish task-specific sensorimotor representations. Consequently, the plasticity of the nervous system permits neural reorganization and the formation of an internal model. Internal models represent the characteristics of the kinematics, dynamics, and sensory feedback of movements and are used by the nervous system to predict movement outcome. The prevailing hypothesis regarding the control of limb dynamics (Wolpert and Kawato 1998) and the control of speech (Guenther and Perkell 2004; Houde and Jordan 1998; Jones and Munhall 2005) and singing (Jones and Keough 2008; Keough and Jones 2009) is that internal models are used to regulate motor behavior. Internal models must accommodate physical changes in articulators, for instance due to development and aging, as well as specific task demands. Thus internal models are often investigated by altering a particular aspect of the sensory feedback associated with a given task. For example, prism adaptation studies have examined how the motor system responds to visual displacements imposed by prisms. The results of studies that alter the relationship between movements and sensory feedback in general show that participants compensate by adjusting their movement in the opposite direction of the perturbation (e.g., Jones and Munhall 2000, 2005; Kalenscher et al. 2003; Sainburg et al. 1999; Shadmehr and Mussa-Ivaldi 1994). In adaptation studies, where dynamic manipulations are held fixed for a series of trials, aftereffects are often observed when feedback is unexpectedly returned to normal; that is, subsequent responses err in the direction of compensation (Ghahramani and Wolpert 1997; Jones and Keough 2008; Keough and Jones 2009; Shadmehr and Moussavi 2000). 
Furthermore, aftereffects have been observed while participants perform untrained movements following exposure to altered feedback (Jones and Keough 2008; Jones and Munhall 2005; Keough and Jones 2009; Shadmehr and Moussavi 2000; Shadmehr and Mussa-Ivaldi 1994). This suggests that sensorimotor recalibration may generalize to other novel motor productions. In the arm-reaching literature it has been noted that the degree of generalization is reduced as the untrained movement direction diverges from the trained movement direction (Sainburg et al. 1999); given this it has been argued that precise movements correspond to context-specific neural mappings or multiple internal models (Kalenscher et al. 2003).

Recent evidence supports the existence of multiple internal models for the motor control of arm-reaching movements (Donchin et al. 2003; Kalenscher et al. 2003; Osu et al. 2004; Wada et al. 2003; Wainscott et al. 2005; Wolpert and Kawato 1998). For example, when provided with a contextual cue (e.g., color), individuals can acquire and appropriately switch between multiple internal models for the same direction of movement (Osu et al. 2004; Wada et al. 2003; Wainscott et al. 2005). Currently, it remains unknown whether the vocal control system maintains similar multiple internal models that correspond to specific musical notes. Thus the purpose of the present study was to investigate whether altered auditory feedback would result in corresponding changes in vocal production that were specific to the altered feedback conditions imposed on each target note. The current study also examined whether participants could switch between multiple internal models based on the presentation of a musical note that singers were required to emulate and that acted as a contextual cue. In other words, we wanted to identify whether voice fundamental frequency (F0) is represented by multiple internal models that are capable of independent recalibration and whether a contextual cue can prompt the use of specific internal models for F0 control.

Trained singers were recruited for this study, since we had previously demonstrated (Keough and Jones 2009) that they can produce musical targets more accurately than those who lack vocal training. Participants were required to repeatedly produce three different sequential target notes (experiment 1; A4, G4, and F4; 440, 392, and 349 Hz, respectively) or a single target note (experiment 2; G4) while receiving unaltered and frequency-altered feedback (FAF).

MATERIALS AND METHODS

Subjects

Based on our experience with previous studies, we chose to recruit 15 singers to participate in each of the two experiments. Thus 30 undergraduate students (all women) whose native tongue was North American English participated in the study. All participants were trained singers (mean musical training was ~12 years) recruited from the Faculty of Music at Wilfrid Laurier University. Participants received financial compensation for their contribution to this research. Informed consent was collected from each participant, and the Wilfrid Laurier University Research Ethics Committee approved the procedures.

Apparatus

The experiment took place in a double-walled sound-attenuated booth (Industrial Acoustic, Model 1601-01). Participants were fitted with headphones (Sennheiser HD 280 Pro) and a condenser microphone (Countryman Isomax E6 Omnidirectional Microphone) that was maintained ~3 cm from their mouths. To reduce natural acoustic feedback and bone-conducted feedback, participants heard multitalker babble noise (Auditec, St. Louis, MO) at 75 dB SPL while vocalizing. The target notes were produced by a trained female singer who sang the consonant-vowel /ta/. The target notes were processed using a speech modification algorithm STRAIGHT (Kawahara et al. 1999) to ensure that each target was exactly 349 (F4), 392 (G4), or 440 (A4) Hz. Microphone signals were directed to a signal processor (VoiceOne 2.0, TC Helicon) that manipulated auditory feedback. The manipulated feedback was mixed (via Mackie ONYX 1640) with the multitalker babble and presented to the participant. Vocal productions were digitized at 44.1 kHz for future analysis.

Procedure: Experiment 1

To establish a baseline, 15 participants were presented with 10 trials of each of the target notes (for a total of 30 trials) used in the training and test phases (Fig. 1 outlines the procedure used in both experiments 1 and 2). These notes were A4 (440 Hz), G4 (392 Hz), and F4 (349 Hz). The notes were presented in sequences of three (e.g., A4-G4-F4); however, participants were instructed to reproduce each target note following its presentation. Participants were also instructed not to vocalize until the conclusion of the target note on each trial. During the baseline phase the participants received unaltered auditory feedback; that is, they heard a playback of their own reproduction of the target note.

Fig. 1.
The sequences of the 3 musical targets across trials in the experimental sessions are presented. Participants in experiment 1 either received the target notes in the sequence A4-G4-F4 (A) or F4-G4-A4 (B). Participants in experiment 2 received the target ...

Each trial was signaled by a 100-ms 1,000-Hz tone (beep). The interval between the end of the signal and the first note in a sequence was 500 ms. Each target note was presented for 2,000 ms and was followed by a period of 4,000 ms in which multitalker babble was presented and during which the participants attempted to reproduce the note as accurately as possible, both in terms of pitch and duration.

FAF training.

Each training trial sequence comprised three target note trials in one of two sequences (either A4-G4-F4 or F4-G4-A4; see A and B in Fig. 1, respectively). The sequences were counterbalanced: half the participants were asked to produce the sequence that began with A4 first (i.e., A4-G4-F4) and produce the sequence that began with F4 second (i.e., F4-G4-A4). The remaining participants produced the F4-G4-A4 sequence first, followed by the A4-G4-F4 sequence. Auditory feedback of the participants' reproduction of the first target note in the sequence (i.e., A4 or F4) was shifted up in pitch by 4-cent increments over each of the 25 test trials to create a total final increase of 100 cents (1 semitone) above the note produced by the participant on that trial (see A in Fig. 1). For the third target note of each sequence (i.e., F4 or A4), the participants' reproduction of the note was shifted down by 4-cent increments over each of the 25 test trials to create a total final decrease of −100 cents (1 semitone) below the note produced by the participant on that trial (see B in Fig. 1). As a control condition, to examine the sensorimotor representation of an unaltered target, auditory feedback of the second target note in each sequence (i.e., G4) was presented without any alteration of frequency for 25 trials (see A and B in Fig. 1). Participants were not informed that the feedback from the first and third notes in each sequence was altered. However, participants could implicitly associate the specific target note in the sequence with the direction of the auditory feedback manipulation over successive training trials. Indeed, in addition to giving the participants a target pitch to reproduce, the purpose of the presentation of the notes was to provide a contextual cue that would elicit the selection of the appropriate internal model. 
For example, participants presented with the note sequence A4-G4-F4 could, over successive training trials, implicitly associate the target note A4 with an upward shift in their auditory feedback, G4 with unaltered auditory feedback, and F4 with a downward shift in feedback. Thus, when these participants heard A4, their motor planning would involve an internal model adapted for the upward shift in their auditory feedback and therefore a lower F0 would be produced to accommodate the pitch shift. Conversely, when presented with F4, participants' vocal systems would produce higher vocal pitches to accommodate the expected downward shift in auditory feedback. Presentations of G4 would not elicit any changes in vocal pitch production.
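The gradual shift described above can be illustrated with a minimal sketch. It assumes that the first training trial already carries a 4-cent shift, so that the 25th trial reaches 100 cents; the function names `shift_schedule` and `cents_to_ratio` are hypothetical and are not taken from the study's signal-processing setup.

```python
def shift_schedule(n_trials=25, step_cents=4.0, direction=+1):
    """Cumulative feedback shift (in cents) on each training trial.

    Assumption: the shift grows by one step per trial, starting at one
    step on the first training trial, so trial 25 reaches 100 cents.
    """
    return [direction * step_cents * (i + 1) for i in range(n_trials)]


def cents_to_ratio(cents):
    """Frequency ratio corresponding to a shift in cents (1,200 cents = 1 octave)."""
    return 2.0 ** (cents / 1200.0)


up = shift_schedule(direction=+1)     # first note of each sequence
down = shift_schedule(direction=-1)   # third note of each sequence

# Feedback heard for a singer producing exactly 440 Hz on the last training trial
heard_up_hz = 440.0 * cents_to_ratio(up[-1])
heard_down_hz = 440.0 * cents_to_ratio(down[-1])
```

Under these assumptions, a full 100-cent shift corresponds to a frequency ratio of 2^(1/12), that is, one equal-tempered semitone above or below the produced note.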

Test.

There were 10 test trials of the same note sequences used in the baseline and training phases. On test trials (as in the baseline trials) participants received unaltered auditory feedback of their reproductions of the first and third target notes. Thus participants were required to sing each of the three musical targets 45 times per sequence, or in other words, 135 trials for A4-G4-F4 and 135 trials for F4-G4-A4 (each composed of 10 baseline, 25 training, and 10 test trials) for a total of 270 trials.

Procedure: Experiment 2

In experiment 2, 15 trained singers who did not participate in experiment 1 were given 10 baseline trials (see Fig. 1C), each comprising the sequence G4-G4-G4 (for a total of 30 trials). This sequence was also used in the training and test phases of experiment 2. As in experiment 1, auditory feedback of the participants' reproduction of the notes was unaltered.

FAF training.

Participants were instructed to reproduce the target note, which was always G4 (the 392-Hz tone) on each of the trials. Even though the direction of the FAF was predictable, the target note could no longer serve as a contextual cue because it was the same note (G4) across all trials (see Fig. 1C); that is, participants could no longer rely on the association between the sequence of target note presentation and the direction of the pitch shift manipulation, since the same note was presented on all trials. As in experiment 1, an increase of 4 cents on the first note and a decrease of 4 cents on the third note of the sequence occurred successively over the 25 trials, until auditory feedback was shifted by 1 semitone.

Test.

As in experiment 1, on the 10 test trials for each target note of the G4-G4-G4 sequence there was unaltered auditory feedback during the participants' reproduction of the notes. Thus participants were required to reproduce the target note over 135 trials, or 45 times (10 baseline, 25 training, and 10 test trials) for each note in the G4-G4-G4 sequence.

Note that participants' auditory feedback was presented at ~85 dB SPL, which was maintained by having them monitor an LED display of their vocal amplitude, whereas the multitalker babble was presented at ~75 dB SPL. F0 values were calculated for each vocal production using an autocorrelation algorithm included in the Praat program (Boersma 2001). F0 values were normalized to each target note (F4, G4, or A4) by calculating the appropriate cent values using the following formula: Cents = 100 × 12 × log2(F/B), where F is the F0 value in Hertz and B is the frequency of the target pitch participants were instructed to sing (349, 392, or 440 Hz). Note that because the F0 values for each target were normalized to the target frequency when they were converted to cents, 0 cents represents an exact frequency match with the target note.
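As an illustration of the cent normalization, the conversion can be sketched as follows. The function name `hz_to_cents` and the `TARGETS_HZ` table are hypothetical helpers introduced here for the example; only the formula and the target frequencies come from the text.

```python
import math

# Target frequencies (Hz) as given in the text
TARGETS_HZ = {"F4": 349.0, "G4": 392.0, "A4": 440.0}


def hz_to_cents(f0_hz, target_hz):
    """Cents = 100 * 12 * log2(F / B); 0 cents is an exact match with the target."""
    return 100.0 * 12.0 * math.log2(f0_hz / target_hz)


# A production one equal-tempered semitone above A4 lies about +100 cents from target
sharp_a4 = hz_to_cents(440.0 * 2.0 ** (1.0 / 12.0), TARGETS_HZ["A4"])

# A production slightly below G4 yields a negative cent value
flat_g4 = hz_to_cents(390.0, TARGETS_HZ["G4"])
```

Because the scale is logarithmic, the same cent value represents the same musical interval for all three targets, which is what allows productions of F4, G4, and A4 to be compared directly.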

Statistical Analyses

The mean F0 of the initial 1,500 ms of each vocal utterance was analyzed, since previous research has found that compensatory responding during FAF typically occurs between 130 and 500 ms after perturbation onset (Burnett et al. 1997; Jones and Keough 2008). Singers' F0 values during the A4-G4-F4 and F4-G4-A4 sequences (which were counterbalanced in experiment 1) were broken down into blocks of five trials within each sequence: the last five baseline trials (6–10), shift trials (11–15, 16–20, 21–25, 26–30, 31–35), and test trials (36–40, 41–45). A 2 × 3 × 8 MANOVA (a test that does not have an assumption of sphericity) was conducted on the mean F0 values with 2 (sequence: A4-G4-F4 or F4-G4-A4) × 3 (pitch shift: upward, unaltered, and downward) × 8 (block) as factors. The data for singers in the G4-G4-G4 condition (experiment 2) were also broken down into blocks of five trials within each sequence: baseline trials (6–10), shift trials (11–15, 16–20, 21–25, 26–30, 31–35), and test trials (36–40, 41–45). A 3 × 8 MANOVA was conducted on the mean F0 values with 3 (pitch shift: upward, unaltered, and downward) × 8 (block) as factors. Tukey's honestly significant difference (HSD) test was implemented for post hoc analyses, and an α level of 0.05 was used for all statistical tests.
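The blocking scheme can be sketched as follows, assuming each target note contributes a list of 45 per-trial F0 values (in cents) ordered from trial 1 to trial 45. The function name `block_means` and the placeholder data are hypothetical and are used only for illustration.

```python
from statistics import mean


def block_means(trial_values, first_trial=6, block_size=5, n_blocks=8):
    """Average per-trial values into consecutive blocks of five trials.

    With the defaults, the blocks cover trials 6-10 (baseline),
    11-35 (five shift blocks), and 36-45 (two test blocks).
    """
    needed = first_trial - 1 + block_size * n_blocks
    if len(trial_values) < needed:
        raise ValueError(f"expected at least {needed} trials, got {len(trial_values)}")
    means = []
    for b in range(n_blocks):
        start = first_trial - 1 + b * block_size  # convert 1-based trial to 0-based index
        means.append(mean(trial_values[start:start + block_size]))
    return means


# Placeholder data: the value on each trial is simply its trial number
example = [float(t) for t in range(1, 46)]
blocks = block_means(example)
```

With the placeholder data, the first block averages trials 6–10 and the last block averages trials 41–45, giving the eight levels of the block factor.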

Typically, during FAF studies, researchers have only examined aftereffects following a series of altered feedback trials. However, if participants were altering an internal model that permitted them to compensate for these perturbations while receiving FAF, then examining the median value within the initial 50 ms of vocal onset should identify whether sensorimotor adaptation occurred and whether the adjusted motor commands were used at the start of vocal productions, before feedback was available. We calculated the median as opposed to the mean to avoid the influence of outliers that often occur immediately following voice onset. Thus the median F0 values during the initial 50 ms of each vocalization for each sequence were pooled over blocks of five trials in the same fashion as the mean F0 values over 1,500 ms. Therefore, a 2 × 3 × 8 MANOVA was carried out on the median F0 values obtained in experiment 1 during the initial 50 ms of vocalization with 2 (sequence: A4-G4-F4 or F4-G4-A4) × 3 (pitch shift: upward, unaltered, and downward) × 8 (block) as factors. A 3 × 8 MANOVA was also conducted on the median F0 values obtained in experiment 2 during the initial 50 ms of vocalization with 3 (pitch shift: upward, unaltered, and downward) × 8 (block) as factors.
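The rationale for using the median at vocal onset can be illustrated with a small sketch. It assumes an F0 track in cents that begins at vocal onset and is sampled at a fixed frame step; the 10-ms step, the function name `onset_median_and_window_mean`, and the example track are assumptions made for illustration and are not values reported in the study.

```python
from statistics import mean, median


def onset_median_and_window_mean(f0_cents, frame_step_ms=10.0,
                                 onset_ms=50.0, window_ms=1500.0):
    """Return (median over the first 50 ms, mean over the first 1,500 ms).

    Assumption: f0_cents starts at vocal onset and has one value per frame.
    """
    n_onset = int(onset_ms // frame_step_ms)
    n_window = int(window_ms // frame_step_ms)
    return median(f0_cents[:n_onset]), mean(f0_cents[:n_window])


# Hypothetical track: one onset outlier, then values near -20 cents, then on target
track = [-300.0, -20.0, -18.0, -22.0, -19.0] + [0.0] * 145

onset_median, window_mean = onset_median_and_window_mean(track)
onset_mean = mean(track[:5])  # for comparison only: pulled far down by the outlier
```

In this example the single outlying onset frame drags the 50-ms mean far below the other onset values, whereas the median stays at the typical onset value.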

RESULTS

Experiment 1

Singers' mean F0 values were calculated for the last baseline trial, the last training trial, and the first 10 trials and are depicted in Fig. 2. The analysis of the mean F0 values during the A4-G4-F4 and the F4-G4-A4 sequences resulted in a main effect of pitch shift (F(2,28) = 251.55, P < 0.01). The mean F0 values for the target notes A4 and F4 across all trials during both upward and downward pitch shift conditions were significantly lower and higher than the mean F0 values obtained for the control target note (unaltered in pitch) G4, respectively (P < 0.05). A significant pitch shift by block interaction (F(14, 196) = 167.61, P < 0.01) revealed that singers' baseline F0 values were significantly higher than F0 values obtained during FAF blocks 2–5 (see Fig. 3A, red circles) during the upward shifted feedback trials (all P < 0.05). Singers' baseline F0 values also were significantly lower than the F0 values observed during FAF blocks 2–5 (see Fig. 3A, blue circles) of the downward shifted feedback trials (all P < 0.05).

Fig. 2.
Average single trial voice fundamental frequency (F0) values across an entire utterance before, during, and following frequency-altered feedback (FAF). Trials presented in solid black lines represent the last baseline trial when auditory feedback was ...
Fig. 3.
Average fundamental frequency (F0) values obtained in experiment 1 during blocks of FAF and test trials. F0 was calculated based on median value between 0 and 50 ms of vocal onset, or mean F0 across 1,500 ms of vocal productions. Data were normalized ...

Aftereffects were not found on trials following FAF (see Fig. 3A, block 6 red and blue circles) for either upward or downward pitch shifted conditions (all P > 0.05). When auditory feedback suddenly returned to normal during test trials, singers produced the target notes at F0 values similar to those obtained during baseline. Furthermore, baseline F0 values for the control target G4 (see Fig. 3B, black circles) were not significantly different from those in any other block of unaltered trials (P > 0.05), nor were they different from the unaltered baseline and test values of A4 and F4 before or following the upward and downward pitch shift conditions (all P > 0.05). Recall that because the F0 values for each target were normalized to the target frequency, a value of 0 cents indicated that participants matched the target note perfectly. Thus no differences were expected when producing the various target notes while receiving unaltered auditory feedback, provided that participants sang each note with a similar degree of accuracy. However, when we examined the baseline F0 values for our control target, G4, we found that the F0 values were significantly higher than F0 values for FAF blocks 2–5 during upward pitch shift trials (all P < 0.05), and they were significantly lower than downward pitch shift FAF blocks 2–5 (all P < 0.05). Indeed, participants changed how they produced the target notes that were manipulated during the FAF training trials such that those F0 values were statistically different from the F0 values obtained while producing a target note with unaltered auditory feedback (G4).

Singers' median F0 values during the first 50 ms of each utterance were calculated for each sequence and are presented in Fig. 3A (blue and red squares) and Fig. 3B (gray squares). The analysis of the median F0 values during the initial 50 ms of vocal onset during the A4-G4-F4 and F4-G4-A4 conditions produced a main effect of pitch shift, F(2,28) = 22.78, P < 0.01. The median F0 values during downward pitch shifts were significantly higher than the F0 values during the control and upward pitch shift conditions, respectively (P < 0.05). The median F0 values during upward pitch shifts were not statistically different from the F0 values obtained during the control condition (P > 0.05). Moreover, there was a significant pitch shift by block interaction (F(14, 196) = 13.33, P < 0.05), which revealed that singers' baseline F0 values for the target notes A4 and F4 during the shift up condition (see Fig. 3A, red squares) were significantly higher than the F0 values obtained during FAF blocks 4, 5, and both blocks (6 and 7) of unaltered (test) trials following FAF (P < 0.05). In addition, singers' baseline F0 values obtained for the targets A4 and F4 during downward pitch shifts (see Fig. 3A, blue squares) were significantly lower than F0 values obtained during FAF blocks 4, 5, and the first block (block 6) of test trials following FAF (P < 0.05). Thus examining the initial 50 ms of vocal onset showed not only that singers exhibited aftereffects when producing targets with upward and downward manipulations in auditory feedback, but also that sensorimotor adaptation occurred online during blocks of FAF trials. As shown in Fig. 3A, during FAF blocks 1–5 participants altered how they initiated their vocal productions in directions consistent with how they compensated, on average during the entire utterance (1,500 ms data), for progressively increasing (red squares) and decreasing (blue squares) changes in auditory feedback. 
Finally, generalization effects carried over to the F0 values for the control target following FAF trials. Baseline F0 values for the control target (G4) were significantly different than the first block of F0 values following FAF (P < 0.05). This is evident during block 6 in Fig. 3B (gray squares) where participants' initial vocal productions (within 50 ms of vocal onset) were higher than similar values on other blocks of trials.

Experiment 2

Singers' mean and median (within 50 ms of vocal onset) F0 values were calculated for the G4-G4-G4 condition and are depicted in Fig. 4. Participants' mean F0 values during the G4-G4-G4 condition yielded a main effect of pitch shift (F(2,28) = 414.95, P < 0.01). Similar to the results obtained in experiment 1, trained singers' mean F0 values collected during upward and downward pitch shift manipulations were significantly lower and higher than the mean F0 values found for the control target, respectively (P < 0.05). A significant pitch shift by block interaction was also observed (F(14, 196) = 171.04, P < 0.01). Post hoc testing revealed similar results as those in experiment 1, such that participants' baseline mean F0 values were significantly higher than the mean F0 values obtained during all FAF blocks during the upward shifted (see Fig. 4A, red circles) feedback trials (all P < 0.05). Moreover, singers' mean baseline F0 values were significantly lower than the mean F0 values obtained during all FAF blocks of the downward shifted (see Fig. 4A, blue circles) feedback trials (all P < 0.05). No differences were observed during test blocks of trials (see Fig. 4A blocks 6–7) when participants produced the target following FAF that was shifted either upward or downward in frequency (all P > 0.05). The mean F0 values obtained from singers producing the target with unaltered auditory feedback were similar across all blocks of trials (P > 0.05). Thus the results of the singers' mean F0 values are virtually identical to those obtained in experiment 1.

Fig. 4.
Average fundamental frequency (F0) values obtained in experiment 2 during blocks of FAF and test trials. Data were normalized by subtracting the average of the last 5 baseline F0 values from the F0 values collected during FAF training and test trials. ...

Singers' median F0 values during the initial 50 ms of vocal onset (blue and red squares in Fig. 4A and gray squares in Fig. 4B) revealed a main effect of pitch shift (F(2,28) = 12.58, P < 0.01). The median F0 values during upward pitch shifts were significantly higher than the F0 values obtained during the no shift and downward pitch shift conditions, respectively (P < 0.05). There were no differences found between the F0 values in the no shift and downward pitch shift conditions (P > 0.05). The difference observed between the F0 values in the upward pitch shift condition compared with those in the unaltered and downward pitch conditions was opposite to that found in experiment 1.

Post hoc results of the significant pitch shift by block interaction (F(14, 196) = 13.33, P < 0.01) suggest that the median 50 ms F0 values obtained during baseline for the control target were significantly different from the control target's median 50 ms F0 values (see Fig. 4B, gray line) obtained during the last three blocks of trials where FAF was presented (P < 0.05). Lastly, it appears that the F0 values observed on unaltered trials carried over to trials when auditory feedback was shifted downward; however, the differences were not significant (P > 0.05).

DISCUSSION

The present study was designed to investigate whether altered auditory feedback would result in corresponding changes in vocal production that were specific to the conditions imposed on each target note. Indeed, the data in experiment 1 represent the first demonstration that vocal control may be represented by multiple internal models and that participants' acoustic-motor mappings are capable of independent sensorimotor recalibration. Participants' initial F0 productions (within 50 ms of onset) were consistently influenced by the perturbed feedback experienced on previous FAF trials. For instance, to continually produce the target notes accurately while receiving FAF, trained singers in experiment 1 had to progressively modify their vocal productions. Thus, as participants' auditory feedback was incrementally shifted (± 4 cents) trial by trial, we observed corresponding changes in open-loop control followed by rapid online correction for pitch deviations; that is, participants adjusted their F0 in the opposite direction of the perturbation once the new discrepancy was detected. Over time, participants' compensatory responses in experiment 1 resulted in the gradual recalibration of individual internal models associated with each target note (as can be seen in Fig. 3A). The aftereffects observed during training (FAF trials) did not generalize to vocal productions immediately following altered feedback trials (see blocks 6 and 7 in Fig. 3A). Rather, the aftereffects were unique to the frequency of the target that was presented every third trial. This suggests that auditory feedback is uniquely associated with the internal model responsible for each target note's production and that voice F0 may be represented by multiple internal models.

Even though the recalibration of internal models was limited to the pitch-shifted targets during FAF trials, aftereffects were observed within 50 ms of vocal onset for the control target following the training period (Fig. 3B). Transferred aftereffects (generalization) to an unaltered stimulus have been observed in previous FAF (Jones and Keough 2008; Jones and Munhall 2005) and arm-reaching investigations (Ghahramani et al. 1996; Shadmehr and Mussa-Ivaldi 1994). In the current study and in other work (Jones and Keough 2008; Jones and Munhall 2005) pitch-shift manipulations were gradually presented during FAF. When feedback returned to normal, participants heard their F0 for altered notes 1 semitone different than it was on the previous trial. Thus the single-trial aftereffects observed in the median 50 ms F0 data during the test trials for the unaltered pitch target may have been the result of the sudden and large changes in auditory feedback following training.

The results of experiment 1 were analogous to some of the studies that investigated multiple internal models for the motor control of arm-reaching movements (e.g., Osu et al. 2004; Wada et al. 2003; Wainscott et al. 2005). However, the arm reaching literature has yielded inconsistent results, making it difficult to determine whether multiple internal models exist. For instance, no evidence was provided for multiple internal models under several conditions: if the task was dependent on color cues (e.g., room light color), if trials were presented randomly, if movements were both dynamic transformations or were dependent on the same state variable, or if the temporal interval between internal model acquisition was less than 4 h (e.g., Brashers-Krug et al. 1996; Gandolfo et al. 1996; Karniel and Mussa-Ivaldi 2002; Krakauer et al. 1999; Tong et al. 2002). Karniel and Mussa-Ivaldi (2002) did not find evidence to suggest that participants could acquire and switch between internal models while reaching in two alternating viscous force fields, even after participants performed these movements in four sessions over 4 days. Rather they argued that a single internal model was formed and used when presented with sequential perturbations.

Although a number of studies failed to find evidence that participants form multiple internal models, several researchers have reported that participants can acquire multiple internal models when prompted by a contextual cue (Osu et al. 2004; Wada et al. 2003; Wainscott et al. 2005). For example, Wada et al. (2003) found that participants could learn and switch between two internal models for reaching in opposing viscous force fields presented randomly and cued only by color. Moreover, Osu et al. (2004) found that providing visual cues before movement initiation allowed participants to predictably switch between acquired motor programs. In both cases, it was argued that multiple internal models were formed under diverse conditions, including single-joint movements to four or eight target locations while receiving assistive/resistive or rotational forces to the limb. On the one hand, as long as the contextual information is clear and distinct (e.g., color, target notes) then learning multiple environments can occur relatively easily (Wada et al. 2003). Indeed, the results of previous arm-reaching studies (Osu et al. 2004; Wada et al. 2003) that support the notion of multiple internal models appear to be consistent with this interpretation. On the other hand, if the contextual information is ambiguous or not present at all and if the multiple environments are difficult to discriminate (Brashers-Krug et al. 1996; Gandolfo et al. 1996; Karniel and Mussa-Ivaldi 2002; Krakauer et al. 1999; Tong et al. 2002), then acquiring or switching between multiple internal models is difficult (Wada et al. 2003).

Our data support the hypothesis that contextual information is important in the acquisition and switching of multiple internal models for vocal control of singing. We found that trained singers could rapidly acquire and independently modify multiple internal models when cued by different target notes. However, unlike the aforementioned arm-reaching studies, participants were not informed that the target notes could be used as a contextual cue. When the target notes could no longer be used as a contextual cue (experiment 2) to indicate the direction of pitch-shift manipulation on the current FAF trial, we found trial-by-trial adaptation in participants' median 50-ms F0 values for the unaltered target. We also observed a pattern of compensatory responding similar to that found in experiment 1. Thoroughman and colleagues (2007) described trial-by-trial adaptation as “the transformation of individually sensed movements into incremental updates of adaptive control.” In other words, compensating for modified sensory feedback within a given trial can influence motor commands on subsequent trials. Indeed, this is what we observed in our study; however, it should be noted that the trial-by-trial adaptation was limited to the F0 values corresponding to the unaltered pitch target. On trials following upward pitch-shifted FAF, participants received natural acoustic feedback. During these trials, singers initiated vocal productions as if they were anticipating shifts in auditory feedback similar to those presented to them on the previous trial. When participants produced the same note in experiment 1 with unaltered feedback, no aftereffects were observed following FAF trials. Thus it seems that the pattern of sensorimotor recalibration is dependent on the nature of the motor commands associated with the task.
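The idea of trial-by-trial adaptation described above can be illustrated with a minimal sketch of a single-state, error-driven update. This is not the analysis used in this study; the function name, the retention and learning-rate values, and the use of cents as the unit are all assumptions chosen only for illustration.

```python
def simulate_trial_by_trial(shifts_cents, retention=0.9, learning_rate=0.2):
    """Simulate the onset F0 offset (cents re: target) across trials.

    Hypothetical single-state model: after each trial, the anticipatory
    offset decays slightly and is nudged against the error that was heard.
    The parameter values are illustrative, not fitted to any data.
    """
    offset = 0.0          # anticipatory offset applied at vocal onset
    onsets = []
    for shift in shifts_cents:
        onsets.append(offset)
        # Heard deviation from the target = feedback shift + own offset.
        heard_error = shift + offset
        # Incremental update carried into the next trial.
        offset = retention * offset - learning_rate * heard_error
    return onsets


# Upward-shifted trials (+100 cents) alternating with unaltered trials.
shifts = [0, 100, 0, 100, 0]
onsets = simulate_trial_by_trial(shifts)
```

In this sketch, the onset offset on an unaltered trial that follows an upward shift is negative, that is, the simulated singer starts low, as if still expecting the shift heard on the previous trial.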

In the case of arm-reaching studies, participants have been required to initiate movements from a fixed location (Imamizu and Kawato 2008; Krakauer et al. 1999; Osu et al. 2004; Wada et al. 2003). This has permitted the examination of feed-forward internal models within 250 ms of movement initiation (e.g., Wainscott et al. 2005), which has been argued to be a period during which motor commands (trajectories) are influenced little by closed-loop control or online feedback. A unique aspect of studying voice F0 during singing is that it is not necessary (or actually possible) for participants to initiate motor commands at a consistent starting point (a particular pitch). Moreover, previous work in our laboratory has found that trained singers initiate vocal productions at or near the desired target frequencies while singing (Keough and Jones 2009). Thus we have been able to identify sensorimotor adaptation that occurs within 50 ms of vocal onset and have measured this adaptation over the course of training with dynamic perturbations cued by different target notes (see Fig. 3, A and B).
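As a rough illustration of a measure taken within 50 ms of vocal onset, the sketch below computes the median F0 over the first 50 ms of a pitch track and expresses it in cents relative to the target note. The function name, the input layout (parallel lists of times and F0 estimates), the treatment of zero values as unvoiced frames, and the use of cents are assumptions made for this example, not a description of the processing used in this study.

```python
import math
from statistics import median


def onset_median_cents(times_s, f0_hz, target_hz, window_s=0.050):
    """Median F0 in the first `window_s` seconds, in cents re: the target.

    times_s and f0_hz are parallel sequences from a pitch tracker
    (hypothetical input format); times_s[0] is taken as vocal onset.
    """
    onset = times_s[0]
    # Keep voiced frames (f0 > 0) that fall inside the onset window.
    early = [f for t, f in zip(times_s, f0_hz)
             if t - onset <= window_s and f > 0]
    if not early:
        raise ValueError("no voiced frames in the onset window")
    # Standard conversion of a frequency ratio to cents.
    return 1200.0 * math.log2(median(early) / target_hz)


# Example: a production one octave above a 220-Hz target at onset.
example = onset_median_cents([0.00, 0.01, 0.02, 0.06],
                             [440.0, 440.0, 440.0, 880.0],
                             target_hz=220.0)
```

Using the median rather than the mean keeps a single mistracked frame near onset from dominating the estimate.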

Interestingly, previous FAF studies have relied exclusively on trials immediately following pitch-shifted feedback to examine sensorimotor adaptation (Jones and Keough 2008; Jones and Munhall 2000, 2005). Although our results confirmed that participants compensated for FAF, no aftereffects were observed in the mean F0 data when feedback was returned to normal (see Fig. 2). One can see in Fig. 2 that during the baseline condition (solid black line) participants were quite accurate in reproducing the target frequency. Notice too that during the last training trial (dashed line) participants not only altered how they initiated their vocal productions (see Fig. 2, A and C) but also compensated for the FAF. When auditory feedback was returned to normal following training trials (dotted line), participants initiated their productions as if they were expecting FAF (see Fig. 2, A and C); however, when they detected their production error (or that their feedback was now unaltered and compensation was unnecessary), they rapidly modified their F0 to levels similar to those observed during the baseline phase. This explains why there were no significant differences between singers' mean F0 values obtained during baseline and test trials. Thus it may be the case that investigating the effects as they occur online during altered feedback trials provides a more sensitive measure of adaptation.

The compensatory responses observed were consistent with those obtained in previous FAF studies examining speech (Donath et al. 2002; Houde and Jordan 1998; Jones and Munhall 2000, 2005) and singing (Burnett et al. 1997; Jones and Keough 2008; Keough and Jones 2009; Natke et al. 2003; Zarate and Zatorre 2008). Regarding speech, previous studies have found that the speech motor system appears to be represented in a task-specific manner (Shaiman and Gracco 2002; Tremblay et al. 2008). For instance, Shaiman and Gracco (2002) found that applying unanticipated mechanical loads to the upper lip during speech production influenced only productions that required the upper lip (e.g., ‘apa’, ‘p’ requires both lips). Perturbing articulators uninvolved in the specific speech sounds being produced (e.g., ‘afa’, ‘f’ requires the lower lip) elicited no differences in electromyographic (EMG) activation between control and load trials (Shaiman and Gracco 2002).

Indeed, our data suggest that the motor system involved in the regulation of voice F0 while singing is also organized in a task-specific manner. Aftereffects did not carry over to influence notes on subsequent trials following FAF; rather, the aftereffects in experiment 1 were limited to the notes participants produced every third trial. Tremblay and colleagues (2008) also argued that speech learning is contextually sensitive and that generalization was not observed even when utterances shared similar movements. Although there was some degree of overlap when singers produced the target stimuli in our study, our data suggest that the pitch of musical notes is not influenced by altered feedback experienced on previous trials and that singing may be represented by multiple frequency-specific internal models.

Our work complements previous findings from the arm-reaching literature studying multiple internal models for motor control. The results of this study suggest that producing multiple target notes while singing requires participants to use unique motor commands for each target. Although the human voice has the potential to be initiated at unpredictable frequencies during onset, trained singers consistently produced vocal pitches near the desired target frequencies, even in the presence of FAF. Interestingly, when the target notes no longer served as a contextual cue we observed a very different pattern of adaptation. Overall, our data suggest that sensorimotor adaptation is automatic, it can be observed during training within 50 ms of vocal onset while singing, and it is uniquely associated with the motor commands for specific musical targets.

GRANTS

This research was supported by the National Institute on Deafness and Other Communication Disorders grant and a grant from the Natural Sciences and Engineering Research Council of Canada.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

ACKNOWLEDGMENTS

We thank Drs. M. Jarick and C. Reed-Elder for critical discussions and reviews of the manuscript.

REFERENCES

  • Boersma P. Praat, a system for doing phonetics by computer. Glot International 5: 341–345, 2001.
  • Brashers-Krug T, Shadmehr R, Bizzi E. Consolidation in human motor memory. Nature 382: 252–255, 1996.
  • Burnett TA, Senner JE, Larson CR. Voice F0 responses to pitch-shifted auditory feedback: A preliminary study. J Voice 11: 202–211, 1997.
  • Donath TM, Natke U, Kalveram KT. Effects of frequency-shifted auditory feedback on voice F0 contours in syllables. J Acoust Soc Am 111: 357–366, 2002.
  • Donchin O, Francis JT, Shadmehr R. Quantifying generalization from trial-by-trial behavior of adaptive systems that learn with basis functions: theory and experiments in human motor control. J Neurosci 23: 9032–9045, 2003.
  • Gandolfo F, Mussa-Ivaldi FA, Bizzi E. Motor learning by field approximation. Proc Natl Acad Sci USA 93: 3843–3846, 1996.
  • Ghahramani Z, Wolpert DM, Jordan MI. Generalization to local remappings of the visuomotor coordinate transformation. J Neurosci 16: 7085–7096, 1996.
  • Ghahramani Z, Wolpert DM. Modular decomposition in visuomotor learning. Nature 386: 392–395, 1997.
  • Guenther FH, Perkell JS. A neural model of speech production and its application to studies of the role of auditory feedback in speech. In: Speech Motor Control in Normal and Disordered Speech. Oxford: Oxford Univ. Press, 2004.
  • Houde JF, Jordan MI. Sensorimotor adaptation in speech production. Science 279: 1213–1216, 1998.
  • Imamizu H, Kawato M. Neural correlates of predictive and postdictive switching mechanisms for internal models. J Neurosci 28: 10751–10765, 2008.
  • Jones JA, Keough D. Acoustic-vocal mapping for pitch control in singers, nonsingers. Exp Brain Res 190: 279–287, 2008.
  • Jones JA, Munhall KG. Perceptual calibration of F0 production: evidence from feedback perturbation. J Acoust Soc Am 108: 1246–1251, 2000.
  • Jones JA, Munhall KG. Remapping auditory motor representations in voice production. Curr Biol 15: 1768–1772, 2005.
  • Kalenscher T, Kalveram KT, Konczak J. Effects of two different dynamic environments on force adaptation: exposure to a new force but not the preceding force experience accounts for transition, and after-effects. Motor Control 7: 242–263, 2003.
  • Karniel A, Mussa-Ivaldi A. Does the motor control system use multiple models and context switching to cope with a variable environment? Exp Brain Res 143: 520–524, 2002.
  • Kawahara H, Masuda-Katsuse I, de Cheveigne A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing, and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun 27: 187–207, 1999.
  • Keough D, Jones JA. The sensitivity of auditory-motor representations to subtle changes in auditory feedback while singing. J Acoust Soc Am 126: 837–846, 2009.
  • Krakauer JW, Ghilardi MF, Ghez C. Independent learning of internal models for kinematic and dynamic control of reaching. Nat Neurosci 2: 1026–1031, 1999.
  • Natke U, Donath TM, Kalveram KT. Control of voice fundamental frequency in speaking versus singing. J Acoust Soc Am 113: 1587–1593, 2003.
  • Osu R, Hirai S, Yoshioka T, Kawato M. Random presentation enables subjects to adapt to two opposing forces on the hand. Nat Neurosci 7: 111–112, 2004.
  • Sainburg RL, Ghez C, Kalakanis D. Intersegmental dynamics are controlled by sequential anticipatory, error correction, and postural mechanisms. J Neurophysiol 81: 1045–1056, 1999.
  • Shadmehr R, Moussavi ZMK. Spatial generalization from learning dynamics of reaching movements. J Neurosci 20: 7807–7815, 2000.
  • Shadmehr R, Mussa-Ivaldi FA. Adaptive representation of dynamics during learning of a motor task. J Neurosci 14: 3208–3224, 1994.
  • Shaiman S, Gracco VL. Task-specific sensorimotor interactions in speech production. Exp Brain Res 146: 411–418, 2002.
  • Smotherman M, Zhang S, Metzner W. A neural basis for auditory feedback control of vocal pitch. J Neurosci 23: 1464–1477, 2003.
  • Thoroughman KA, Fine MS, Taylor JA. Trial-by-trial adaptation: a window into elemental neural computation. Prog Brain Res 165: 373–382, 2007.
  • Tong C, Wolpert DM, Flanagan JR. Kinematics and dynamics are not represented independently in motor working memory: evidence from an interference study. J Neurosci 22: 1108–1113, 2002.
  • Tremblay S, Houle G, Ostry DJ. Specificity of speech motor learning. J Neurosci 28: 2426–2434, 2008.
  • Wada Y, Kawabata Y, Kotosaka S, Yamamoto K, Kitazawa S, Kawato M. Acquisition and contextual switching of multiple internal models for different viscous force fields. Neurosci Res 46: 319–331, 2003.
  • Wainscott SK, Donchin O, Shadmehr R. Internal models and contextual cues: encoding serial order and direction of movement. J Neurophysiol 93: 786–800, 2005.
  • Wolpert DM, Doya K, Kawato M. A unifying computational framework for motor control and social interaction. Phil Trans R Soc Lond B Biol Sci 358: 593–602, 2003.
  • Wolpert DM, Kawato M. Multiple paired forward and inverse models for motor control. Neural Netw 11: 1317–1329, 1998.
  • Zarate JM, Zatorre RJ. Experience-dependent neural substrates involved in vocal pitch regulation during singing. Neuroimage 40: 1871–1887, 2008.

Articles from Journal of Neurophysiology are provided here courtesy of American Physiological Society