J Exp Anal Behav. 2006 Sep; 86(2): 211–222.
PMCID: PMC1592361

Choice between Single and Multiple Reinforcers in Concurrent-Chains Schedules

Abstract

Pigeons responded on concurrent-chains schedules with equal variable-interval schedules as initial links. One terminal link delivered a single reinforcer after a fixed delay, and the other terminal link delivered either three or five reinforcers, each preceded by a fixed delay. Some conditions included a postreinforcer delay after the single reinforcer to equate the total durations of the two terminal links, but other conditions did not include such a postreinforcer delay. With short initial links, preference for the single-reinforcer alternative decreased when a postreinforcer delay was present, but with long initial links, the postreinforcer delays had no significant effect on preference. In conditions with no postreinforcer delay, preference for the single-reinforcer alternative frequently switched from above 50% to below 50% as the initial links were lengthened. This pattern of results was consistent with delay-reduction theory (Squires & Fantino, 1971), but not with the contextual-choice model (Grace, 1994) or the hyperbolic value-added model (Mazur, 2001) as they have usually been applied. However, the hyperbolic value-added model could account for the results if its calculations were expanded to include reinforcers delivered in later terminal links. The implications of these findings for models of concurrent-chains performance are discussed.

Keywords: concurrent chains, multiple reinforcers, delay-reduction theory, contextual-choice model, hyperbolic value-added model, key peck, pigeons

Concurrent-chains schedules have been used in many experiments on choice, and they have provided abundant information about factors that affect choice behavior. A typical concurrent-chains schedule includes a pair of schedules called initial links, each of which leads to its own terminal link, another reinforcement schedule that leads to food (Autor, 1960). The initial links might be two equal variable-interval (VI) schedules, and the two terminal links might be fixed-interval (FI) schedules of different lengths. Response percentages on the two VI schedules are used as measures of preference for one terminal link versus the other. Naturally, the durations of the terminal-link schedules affect preference in this procedure, but so do many other factors, such as the durations of the initial links (Fantino, 1969), the sizes of the reinforcers delivered during the two terminal links (Snyderman, 1983), and whether the terminal-link schedules are fixed or variable (e.g., Davison, 1969; Rider, 1983).

Predicting choice behavior on concurrent-chains schedules has proven to be a challenging task, and a number of different mathematical models have been proposed, including Fantino's (1969) delay-reduction theory (DRT), Grace's (1994) contextual-choice model (CCM), and Mazur's (2001) hyperbolic value-added model (HVA). Mazur (2001) compared the accuracy of these three models' predictions for a total of 92 data sets from 19 published experiments. When given the same number of free parameters (from two to four, depending on the data set), the models were similar in their predictive accuracy: DRT accounted for an average of 83.0% of the variance, CCM for 90.8%, and HVA for 89.6%. The small differences in accuracy of the three models may not be very informative, because they could have resulted from random variations in the data or arbitrary decisions about exactly how to add free parameters to the models. Yet although these models were similar in their ability to predict the results from this set of studies, they are based on different assumptions about the processes that underlie choice behavior, and for other choice situations they make distinctly different predictions (e.g., Mazur, 2000; Savastano & Fantino, 1996).

In the present experiment, the predictions of DRT, CCM, and HVA were compared for concurrent-chains schedules in which one terminal link included a single food presentation and the other included multiple food presentations. For example, the left terminal link might consist of a 5-s delay followed by one food delivery, and the right terminal link might consist of three food deliveries, each preceded by a 20-s delay. The effects of two main variables were examined in this experiment: (1) the lengths of the initial-link VI schedules, and (2) the presence or absence of a post-reinforcer delay (PRD) after the single reinforcer that equated the terminal-link durations for the two alternatives. The three models differ in their predictions about how manipulation of these two variables should affect choice behavior. To show this, a brief description of each model will be given, and then the predictions of the three models will be explained.

Grace's (1994) CCM can be expressed as follows:

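\[ \frac{B_1}{B_2} = b \left( \frac{r_{i1}}{r_{i2}} \right)^{a_i} \left( \frac{r_{t1}}{r_{t2}} \right)^{a_t (T_t / T_i)} \tag{1} \]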

B1 and B2 are response rates during the initial links, ri1 and ri2 are the rates of reinforcement in the two initial links (the rates at which each of the two terminal links is entered), and rt1 and rt2 are the rates of reinforcement in the two terminal links (the rates at which the terminal links deliver food). Thus, according to CCM, choice responses in concurrent-chains schedules depend on both the initial-link and terminal-link schedules. Equation 1 also includes three free parameters: b is a measure of possible response bias, ai reflects a subject's sensitivity to differences in the initial-link schedules, and at reflects sensitivity to differences in the terminal-link schedules. The ratio Tt/Ti is an important part of CCM. Tt is the average terminal-link duration, and Ti is the average initial-link duration. Because the ratio Tt/Ti is used as an exponent for the terminal-link reinforcement rates, CCM predicts that differences in the terminal links will have a greater effect on preference when the terminal links are long relative to the durations of the initial links. In other words, CCM states that the effects of the terminal-link schedules on choice will depend on the context, that is, on their duration relative to the durations of the initial links.
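The context effect described above can be illustrated with a minimal Python sketch. It assumes the usual form of CCM, B1/B2 = b (ri1/ri2)^ai (rt1/rt2)^(at Tt/Ti); the function name, the default parameter values of 1, and the example schedules are illustrative assumptions rather than values fitted in this study.

```python
def ccm_ratio(ri1, ri2, rt1, rt2, Tt, Ti, b=1.0, ai=1.0, at=1.0):
    """Predicted initial-link response ratio B1/B2 under CCM (assumed form of Equation 1)."""
    return b * (ri1 / ri2) ** ai * (rt1 / rt2) ** (at * Tt / Ti)


def to_percent(ratio):
    """Convert a response ratio B1/B2 into a choice percentage for alternative 1."""
    return 100 * ratio / (1 + ratio)


# Hypothetical schedules: equal initial links, terminal-link delays of 5 s versus 20 s.
rt1, rt2 = 1 / 5, 1 / 20      # terminal-link reinforcement rates (food per second)
Tt = (5 + 20) / 2             # average terminal-link duration in seconds

ratio_short = ccm_ratio(1, 1, rt1, rt2, Tt, Ti=10)    # short initial links
ratio_long = ccm_ratio(1, 1, rt1, rt2, Tt, Ti=240)    # long initial links

print(to_percent(ratio_short), to_percent(ratio_long))
```

With these assumed values, preference for the shorter delay is strong with short initial links and moves toward indifference with long initial links, but the ratio stays on the same side of 1.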

To make a fair comparison of CCM and DRT, Mazur (2001) began with Squires and Fantino's (1971) version of DRT, but added b, ai, and at so the number of free parameters was the same for both models. This led to the following equation for DRT:

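\[ \frac{B_1}{B_2} = b \left( \frac{R_1}{R_2} \right)^{a_i} \left( \frac{T_{total} - a_t T_{t1}}{T_{total} - a_t T_{t2}} \right) \tag{2} \]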

where Ttotal > atTt1 and Ttotal > atTt2. The notation is the same as for CCM, except that R1 and R2 are the overall rates of reinforcement for the two alternatives, which are based on the time spent in both the initial links and terminal links. Ttotal is the mean total time to primary reinforcement from the start of the initial links, and Tt1 and Tt2 are the mean durations of the two terminal links. Notice that Equation 2 applies only if Ttotal is greater than both atTt1 and atTt2. For cases in which one of these quantities is greater than Ttotal, DRT predicts exclusive preference for the other alternative.

HVA (Mazur, 2001) begins with the hyperbolic-decay equation that has been successful in accounting for results from many experiments on discrete-trial choice (e.g., Mazur, 1984, 1987, 1993):

\[ V = \sum_{i=1}^{n} p_i \left( \frac{A}{1 + K D_i} \right) \tag{3} \]

V is the value of an alternative that can deliver any one of n possible delays on a given trial, and pi is the probability that a delay of Di seconds will occur. A is a measure of the amount of reinforcement, and K is a parameter that determines how quickly value decreases with increasing delay. Equation 3 states that the total value of an alternative with variable delays can be obtained by taking a weighted mean, with the value of each possible delay weighted by its probability of occurrence in the schedule.
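The weighted mean described above can be sketched in a few lines of Python, assuming the hyperbolic-decay form V = sum of p_i A / (1 + K D_i). The function name, the values A = 1 and K = 1, and the example delays are illustrative assumptions.

```python
def hyperbolic_value(delays, probs=None, A=1.0, K=1.0):
    """Value of an alternative whose delay is delays[i] with probability probs[i]."""
    if probs is None:
        # Assumption for convenience: all listed delays are equally likely.
        probs = [1 / len(delays)] * len(delays)
    return sum(p * A / (1 + K * d) for p, d in zip(probs, delays))


# Hypothetical comparison: a fixed 10-s delay versus a variable delay
# of 2 s or 18 s (equal probability, same 10-s mean).
v_fixed = hyperbolic_value([10])
v_variable = hyperbolic_value([2, 18])

print(v_fixed, v_variable)
```

Because short delays contribute disproportionately to the weighted mean, the variable alternative in this example has the higher value even though the mean delays are equal.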

To extend this approach to concurrent-chains schedules, Mazur (2001) made the following assumptions: (1) the value of each terminal link depends on the time from the onset of that link to primary reinforcement, (2) the value of the initial links depends on the (variable) time from the onset of the initial links to primary reinforcement, and (3) choice proportions are based on the amount of value added when a terminal link is entered (i.e., on the amount of increase in value when the terminal link is entered). These assumptions led to the following equation for HVA:

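\[ \frac{B_1}{B_2} = b \left( \frac{r_{i1}}{r_{i2}} \right)^{a_i} \left( \frac{V_{t1} - a_t V_i}{V_{t2} - a_t V_i} \right) \tag{4} \]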

where Vt1 > atVi and Vt2 > atVi . Vi, Vt1, and Vt2 are the values of the initial links and the two terminal links, as obtained from Equation 3. (Just as DRT uses a single value of Ttotal to represent the average time to reinforcement in the initial links, HVA uses a single value of Vi to represent the value of the initial links. The implicit assumption for both models is that the initial links constitute a single state from which either of two other states—the two terminal links—might be entered.) In other respects, Equation 4 is similar in form to CCM and DRT, and the free parameters b, ai, and at have the same interpretation as in the other two models.

To compare the predictions of these models for the present experiment, imagine a concurrent-chains schedule with equal VI 20-s schedules as initial links, signaled by a green key on the left and a red key on the right. The red terminal link consists of three 3-s food deliveries, each preceded by a 20-s delay (so that, including reinforcement time, the three food deliveries occur 20 s, 43 s, and 66 s after the terminal link begins). The green terminal link consists of a 5-s delay followed by one 3-s food delivery (as illustrated in the left portion of Figure 1). Immediately after the food deliveries, the green and red keys of the initial links are reinstated. This condition can be compared to one in which a 61-s PRD is added after the single reinforcer of the green terminal link, so that both terminal links last for a total of 69 s (see the right portion of Figure 1). Suppose that without the PRD, a pigeon exhibits a preference for the left alternative, making more responses on the green key during the initial links. First, how will the pigeon's choice percentages change when a PRD is added to the green terminal link? Second, how will its choice percentages change (in either of these conditions) if the initial-link schedules are lengthened from VI 20-s schedules to VI 480-s schedules?
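The timing arithmetic in this example can be checked with a short Python sketch. The helper name is hypothetical; the delays and food durations are the ones given in the text.

```python
def food_onsets(delay, n_food, food_dur):
    """Onset time (s) of each food delivery, measured from terminal-link onset."""
    return [delay + k * (delay + food_dur) for k in range(n_food)]


FOOD_DUR = 3  # seconds of food access per delivery in this example

red_onsets = food_onsets(delay=20, n_food=3, food_dur=FOOD_DUR)
red_total = red_onsets[-1] + FOOD_DUR     # red terminal link ends after the last food
green_total = 5 + FOOD_DUR                # one 5-s delay plus one food delivery
prd = red_total - green_total             # delay needed to equate the two durations

print(red_onsets, red_total, prd)  # [20, 43, 66] 69 61
```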

Fig 1
The sequences of delays and food presentations in the green and red terminal links are shown for a condition with no post-reinforcer delay (left) and for the corresponding condition with a post-reinforcer delay (right).

First, we can consider the predictions of HVA (Equation 4). As it has been applied to past studies, HVA predicts that the presence or absence of a PRD will have no effect on choice. This is because the calculations of Vt1, Vt2, and Vi depend only on the delays before the food deliveries, not on delays after them. HVA's predictions about an increase in the initial-link durations are also straightforward. The terminal links are unchanged, so Vt1 and Vt2 remain the same. The only difference is that Vi, the value of the initial links, decreases as the initial links become longer. As Vi decreases and approaches zero, the second parenthetical expression in Equation 4 decreases and approaches Vt1/Vt2, the ratio of the values of the two terminal links. Therefore, HVA predicts that whichever alternative is preferred when the initial links are short should continue to be preferred when the initial links are lengthened, but the strength of preference (the ratio B1/B2) should become less extreme. As long as there is no response bias (i.e., as long as b  =  1), HVA predicts that a subject should never switch preference from one alternative to the other as the initial-link durations are changed.
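A minimal Python sketch of this argument follows, assuming the form B1/B2 = b (ri1/ri2)^ai (Vt1 - at Vi)/(Vt2 - at Vi) for Equation 4. The function name and the numerical values of Vt1, Vt2, and Vi are hypothetical and chosen only to show the trend.

```python
def hva_ratio(Vt1, Vt2, Vi, ri1=1.0, ri2=1.0, b=1.0, ai=1.0, at=1.0):
    """Predicted B1/B2 under HVA (assumed form of Equation 4)."""
    if Vt1 <= at * Vi or Vt2 <= at * Vi:
        raise ValueError("Equation 4 requires Vt1 > at*Vi and Vt2 > at*Vi")
    return b * (ri1 / ri2) ** ai * (Vt1 - at * Vi) / (Vt2 - at * Vi)


# Hypothetical terminal-link values; alternative 1 is the more valuable one.
Vt1, Vt2 = 0.30, 0.20

# Longer initial links mean a lower initial-link value Vi.
ratios = [hva_ratio(Vt1, Vt2, Vi) for Vi in (0.15, 0.05, 0.0)]

print(ratios)
```

As Vi falls toward zero, the predicted ratio shrinks toward Vt1/Vt2 but never drops below 1, so with b = 1 the sketch shows weaker preference and no reversal.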

The predictions of CCM (Equation 1) are different. Adding a PRD increases Tt, the average time spent in the terminal links, so the exponent Tt/Ti increases in Equation 1. CCM therefore predicts that adding a PRD should always lead to an increase in choice percentages for the preferred alternative. (Alternatively, if the PRD were not treated as part of the terminal-link duration because it occurs after the food delivery, then Tt would not change, and CCM would predict that adding the PRD should have no effect.)

Regarding the effects of an increase in initial-link duration, CCM states that, either with or without a PRD, Ti increases if the initial links are lengthened, so the exponent Tt/Ti decreases toward zero. It follows from Equation 1 that the degree of preference for the favored alternative should decrease and approach indifference with increasingly long initial links. Like HVA, CCM predicts that a subject should never switch preference from one alternative to the other as the initial-link durations are changed.

The predictions of DRT about adding a PRD are different from those of both HVA and CCM, because this model assumes that both the relative amounts of delay reduction and R1 and R2 (the overall rates of reinforcement for the two alternatives) affect preference. With no PRD, the overall rate of reinforcement for the green alternative will be high because its terminal link lasts just 5 s and then the initial links are reinstated. The addition of the PRD will dramatically decrease the overall rate of reinforcement for the green alternative, so DRT predicts a decrease in preference for this alternative.

The predictions of DRT about lengthening the initial links are also different from those of HVA and CCM. In conditions with a PRD, the durations of both the initial links and the terminal links are equal for the two alternatives, but the red alternative delivers three times as many reinforcers, so R1/R2 equals 1/3 no matter whether the initial links are short or long. However, the delay-reduction factor favors the green alternative because of the shorter delay to reinforcement in its terminal link. According to DRT, the animal's preference will depend on the relative influence of these two opposing factors (rate of reinforcement and delay reduction), and this will change with the durations of the initial links. As a specific example, if b, ai, and at are all set equal to 1 in Equation 2, DRT predicts that B1/B2 will equal 2.31 with VI 20-s initial links (a preference for the green alternative), and B1/B2 will equal 0.35 with VI 480-s initial links (a preference for the red alternative). Therefore, unlike the other two models, DRT predicts that animals may switch preference from one alternative to the other when initial links are lengthened. For similar reasons, DRT predicts that there also can be such preference reversals in conditions without a PRD when initial links are lengthened.
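The worked example above can be approximated with the following Python sketch. It assumes Equation 2 has the form B1/B2 = b (R1/R2)^ai (Ttotal - at Tt1)/(Ttotal - at Tt2), that Ttotal is the mean initial-link time plus the average delay to the first food in the two terminal links, and that two equal VI 20-s (or VI 480-s) schedules give a mean initial-link time of 10 s (or 240 s). These are simplifying assumptions, and the function names are hypothetical.

```python
def drt_ratio(R1, R2, T_total, Tt1, Tt2, b=1.0, ai=1.0, at=1.0):
    """Predicted B1/B2 under the modified DRT equation (assumed form of Equation 2)."""
    if T_total <= at * Tt1:
        return 0.0             # exclusive preference for alternative 2
    if T_total <= at * Tt2:
        return float("inf")    # exclusive preference for alternative 1
    return b * (R1 / R2) ** ai * (T_total - at * Tt1) / (T_total - at * Tt2)


def prd_example(mean_initial_link):
    """Green (5-s delay, one food) versus red (20-s delays, three foods), with a PRD."""
    # Assumption: each terminal link is entered equally often, and only the
    # delay to the first food counts toward T_total.
    T_total = mean_initial_link + 0.5 * 5 + 0.5 * 20
    # With a PRD, both cycles have equal duration, so R1/R2 = 1/3.
    return drt_ratio(R1=1, R2=3, T_total=T_total, Tt1=5, Tt2=20)


ratio_short = prd_example(10)     # VI 20-s initial links
ratio_long = prd_example(240)     # VI 480-s initial links

print(ratio_short, ratio_long)
```

Under these assumptions the ratios come out near 2.3 and 0.35, close to the values quoted above, and they fall on opposite sides of 1; the exact figures depend on how Ttotal is computed.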

The differences in the predictions of DRT compared to those of HVA and CCM are mainly due to its assumption that the overall rates of reinforcement, R1 and R2, have a direct effect on choice in concurrent-chains procedures. In contrast, HVA and CCM do not afford a role to overall rates of reinforcement, although they both assume that initial-link reinforcement rates (ri1 and ri2) do affect preference. Therefore, this study might be viewed as a test of these differing assumptions about the role of overall rates of reinforcement. The situation is not quite so simple, however. As will be shown, although the results of this experiment may support the predictions of DRT, there may be ways that HVA or CCM can accommodate the results without adopting the assumption that behavior in this type of choice situation is controlled by the overall rates of reinforcement.

Method

Subjects

The subjects were 4 White Carneau pigeons maintained at approximately 80% of their free-feeding weights. All had previous experience with a variety of experimental procedures.

Apparatus

The experimental chamber was 30 cm long, 30 cm wide, and 31 cm high. The chamber had three response keys, each 2 cm in diameter, mounted in the front wall of the chamber, 24 cm above the floor and 8 cm apart, center to center, with the side keys equidistant from the side walls. The center key was not used in this experiment. A force of approximately 0.15 N was required to operate each key. Each key could be transilluminated with lights of different colors. A hopper below the center key provided controlled access to grain, and when grain was available, the hopper was illuminated with a 2-W white light. Pairs of 2-W houselights (two white, two green, and two red) were mounted above the Plexiglas ceiling of the chamber. The chamber was enclosed in a sound-attenuating box containing a ventilation fan. All stimuli were controlled and responses recorded by a personal computer using MED-PC software.

Procedure

Experimental sessions usually were conducted six days a week. Throughout the experiment, a concurrent-chains procedure was used, in which a single VI schedule operated in the initial links, and the terminal links were fixed-time (FT) schedules that delivered one or more food reinforcers after fixed delays. During the initial links, the white houselights were lit, and the two side keys were illuminated, one key green and the other key red. The locations of the green and red keys were randomized over trials. In different conditions, the initial-link schedule was either VI 2-s, VI 10-s, or VI 240-s. The individual durations of the initial links were determined by a random probability generator (e.g., in the VI 10-s initial links, there was a .10 probability that a terminal link would be set up in any given second). The VI schedule assigned terminal links to the two response keys, using a pseudorandom sequence that ensured that the number of terminal links for the two keys was approximately equal in each session (cf. Stubbs & Pliskoff, 1969). This made the effective initial-link schedules VI 4-s, VI 20-s, or VI 480-s for each key. A 1-s changeover delay (COD) was in effect in the initial links: No response could lead to a terminal link until at least 1 s had elapsed after a switch from one key to the other. When the VI schedule timed out, a terminal-link entry was assigned to one of the keys; the next post-COD peck on that key extinguished all key lights, and the terminal link began. Each terminal link consisted of one or more delays of fixed duration, each followed by a food presentation. During delays (including PRDs) in the terminal links that resulted from green-key responses, the green houselights were lit instead of the white houselights. Similarly, during delays in the terminal links that resulted from red-key responses, the red houselights were lit. During all food presentations, only the white light illuminating the food hopper was lit.
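The initial-link scheduling described above can be sketched in Python as follows. This is an illustration, not the MED-PC program used in the experiment: the function names are hypothetical, and shuffling a balanced list is only one assumed way to produce a pseudorandom sequence with equal numbers of entries per key.

```python
import random


def sample_interval(p, rng):
    """Seconds until a terminal link is set up, with probability p in each second."""
    t = 1
    while rng.random() >= p:
        t += 1
    return t


def assign_keys(n_entries, rng):
    """Shuffled sequence with equal numbers of green and red terminal-link entries."""
    keys = ["green", "red"] * (n_entries // 2)
    rng.shuffle(keys)
    return keys


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# A .10 probability per second gives intervals that average about 10 s (VI 10-s).
intervals = [sample_interval(0.10, rng) for _ in range(20000)]
mean_interval = sum(intervals) / len(intervals)

# Each key receives half of the entries, so its effective schedule is about VI 20-s.
keys = assign_keys(80, rng)

print(mean_interval, keys.count("green"), keys.count("red"))
```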

The experiment consisted of 15 conditions. Condition 15 was included to measure response bias, and the red and green alternatives were identical: The initial links were VI 20-s, and the terminal links were 5-s delays followed by 2 s of food. In all other conditions, the green terminal link delivered a single food reinforcer after a 5-s delay, whereas the red terminal link delivered either three or five reinforcers, all of which were preceded by delays of equal duration. Table 1 gives the schedules used in each of the 15 conditions. In the conditions with no PRD, the initial links were reinstated immediately after the single food presentation for the green key, and immediately after the last food presentation for the red key. In conditions with a PRD, the green houselights were turned on after the single green key reinforcer, and they remained on for the time shown in Table 1, so that the total durations of green and red terminal links (including reinforcer durations) were equal.

Table 1
Order of experimental conditions. (All durations are in seconds.)

As an example, the left portion of Figure 1 depicts the sequences of events that occurred in the terminal links of Condition 2, which had no PRD. A green terminal link included one 5-s delay with green houselights and one food presentation, after which the initial links were reinstated. A red terminal link included three 20-s delays with red houselights, each followed by a food presentation, after which the initial links were reinstated. The right portion of Figure 1 shows the sequences of events for Condition 8, which were the same as in Condition 2 except that a PRD of 61 s was included at the end of the green terminal link, so that both the red and green terminal links lasted for a total of 69 s.

All reinforcer durations were 3 s in the first 10 conditions, and they were 2 s in the remaining conditions to minimize the effects of satiation when there were more reinforcers in the red terminal links. Each session ended after 60 min or 80 terminal links, whichever came first. Each condition lasted for a minimum of 20 sessions, except Condition 15, which lasted for a minimum of 30 sessions. For each session, the percentage of initial-link responses on the green key was calculated. After the minimum number of sessions, a condition was terminated for each subject individually when the following stability criteria were met: (1) Neither the highest nor lowest single-session response percentage could occur in the last six sessions of a condition. (2) The mean response percentage across the last six sessions could not be the highest or the lowest six-session mean of the condition. (3) The mean response percentage of the last six sessions could not differ from the mean of the preceding six sessions by more than 5%.
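The three stability criteria can be expressed as a small Python check. This is a sketch under stated assumptions: the function name is hypothetical, criterion 2 is evaluated over every run of six consecutive sessions, ties count as failures, and "5%" is read as five percentage points.

```python
def is_stable(pcts, min_sessions=20, window=6, max_diff=5.0):
    """Return True if a list of per-session green-key percentages meets all criteria."""
    n = len(pcts)
    if n < max(min_sessions, 2 * window):
        return False

    last = pcts[-window:]

    # (1) Neither the highest nor the lowest single-session value is in the last six.
    if max(last) == max(pcts) or min(last) == min(pcts):
        return False

    # (2) The last six-session mean is not the highest or lowest six-session mean.
    means = [sum(pcts[i:i + window]) / window for i in range(n - window + 1)]
    last_mean = means[-1]
    if last_mean == max(means) or last_mean == min(means):
        return False

    # (3) The last six-session mean is within max_diff of the preceding six-session mean.
    prev_mean = sum(pcts[-2 * window:-window]) / window
    return abs(last_mean - prev_mean) <= max_diff


# Hypothetical data: early sessions vary widely, later sessions settle near 50%.
sessions = [70, 30, 65, 35, 60, 40, 56, 44, 54, 46,
            52, 48, 55, 45, 51, 49, 50, 52, 48, 50]

print(is_stable(sessions))
```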

Results

The data from the six sessions that satisfied the stability criteria in each condition were used in all data analyses. The number of sessions needed to meet the stability criteria ranged from 20 to 41 (median  =  24.5 sessions per condition). The mean percentage of responses on the green key (which always led to a 5-s delay and a single reinforcer) was used to measure the pigeons' choices. In Condition 15, which had identical initial-link and terminal-link schedules for the two keys, the green-key response percentages were 46.7%, 49.5%, 52.3%, and 48.9% for the four pigeons, indicating little or no response bias (M  =  48.9%).

For each pigeon, the mean green-key response percentages from the other 14 conditions are shown in Figure 2. The left panels show the results from the conditions with no PRD, and the right panels show the results from the conditions with a PRD. Several trends in the results are apparent in Figure 2. First, as would be expected, response percentages for the single reinforcer usually were lower with shorter delays for the multiple reinforcers. This can be seen by comparing the top, middle, and bottom rows in Figure 2, where decreasing response percentages were observed in both the no-PRD and PRD conditions. A three-way repeated-measures analysis of variance (terminal-link schedule by initial-link schedule by PRD presence/absence) was conducted on the results from the 12 conditions that included both no-PRD and PRD conditions with the same terminal-link schedules. There was a significant main effect of terminal-link delays, F(2,6)  =  18.64, p  =  .003, and a significant linear trend, F(1,3)  =  22.80, p  =  .017, reflecting the lower single-reinforcer response percentages when the delays for the multiple reinforcers were shorter.

Fig 2
Response percentages on the key that delivered a single reinforcer are shown for each pigeon.

A second finding was that in the no-PRD conditions, response percentages for the single reinforcer usually decreased with longer initial-link schedules, and without exception they were higher with VI 4-s (M  =  60.4%) than with VI 480-s (M  =  41.0%) initial-link schedules. However, in the PRD conditions the differences between the VI 4-s and VI 480-s conditions were less pronounced (47.0% versus 42.5%), and there were some cases in which response percentages actually increased with VI 480-s. This difference can be seen more clearly in Figure 3, which presents the group means from each condition, and directly compares the results from the no-PRD and PRD conditions. There was a significant main effect of initial-link duration, F(1,3)  =  15.80, p  =  .028, as well as a significant interaction between initial-link duration and PRD presence, F(1,3)  =  37.98, p  =  .009. Further analyses showed that initial-link duration had a significant effect in the no-PRD conditions, F(1,3)  =  75.41, p  =  .003, but not in the PRD conditions, F(1,3)  =  1.71.

Fig 3
Mean response percentages on the key that delivered a single reinforcer are shown for each condition.

A third finding was that, with the VI 4-s initial links, response percentages for the single reinforcer were lower in the conditions with a PRD than in conditions with no PRD. This result is quite reasonable because the PRDs were added only to the single-reinforcer alternative, not to the multiple-reinforcer alternative. However, Figure 3 shows that, with the VI 480-s initial links, the differences between the no-PRD and PRD conditions were smaller and unsystematic. With VI 4-s initial links, PRD presence had a significant effect, F(1,3)  =  38.71, p  =  .008, but with VI 480-s initial links it did not, F(1,3)  =  0.83.

As explained in the Introduction, both CCM and HVA predict that response percentages for the single reinforcer should never switch from above 50% to below 50% as initial-link duration is increased. However, DRT does allow for such switches in preference. The right panels of Figure 2 show that, in the PRD conditions, there were a few cases where response percentages switched from above 50% with VI 4 s to below 50% with VI 480 s. However, the response percentages in these cases were so close to 50% that they are not convincing evidence for an actual reversal in preference. There is better evidence for such reversals in preference in the no-PRD conditions. In 10 of the 12 cases shown in the left panels of Figure 2, response percentages were above 50% with VI 4-s and below 50% with VI 480-s (binomial test, p  =  .032). Furthermore, in the two cases where the 50% mark was not crossed, the response percentages were already below 50% with the VI 4-s initial links. In summary, there was evidence for switches in preference from above 50% to below 50% in the conditions without a PRD, but there was no systematic evidence for such switches in conditions with a PRD.

Discussion

The main findings from this experiment were: (1) As the delays to the multiple reinforcers decreased, preference for the single reinforcer generally decreased. CCM, DRT, and HVA (as well as other models of concurrent-chains performance) all predict this result, and it is not surprising that preference for one alternative decreased when the reinforcer delays for the other alternative were shorter. (2) The presence of a PRD after the single reinforcer did affect preference, but only when the initial links were short. With VI 4-s initial links, choice for the single reinforcer decreased when it was followed by a PRD, but there was no systematic effect of the PRD with VI 480-s initial links. (3) In conditions without a PRD, choice percentages for the single reinforcer frequently switched from above 50% to below 50% as the initial links were lengthened. (4) In conditions with a PRD, there was no convincing evidence for switches in preference from above 50% to below 50% as the initial links were lengthened.

Of the three models in their present forms, DRT is best able to account for this set of results. DRT accounts for the effects of the PRDs in a straightforward way. According to Equation 2, choice behavior is sensitive to the overall rates of reinforcement for each of the two alternatives, R1 and R2. These reinforcement rates are calculated by considering each schedule's initial and terminal links individually. As an example, for Condition 5 of this experiment, the green key had a VI 4-s initial link and a 5-s delay as a terminal link, so this alternative averaged one reinforcer every 9 s, which amounts to 400 reinforcers per hr. The red key had a VI 4-s initial link and three reinforcers each delivered after a 20-s delay, so this alternative averaged three reinforcers every 64 s, which is a rate of 168.75 reinforcers per hr. (Reinforcer durations typically are excluded from the calculations for DRT.) In Condition 7, which had the same schedules except that a 61-s PRD was added to the green terminal link, the green alternative now averaged only one reinforcer every 70 s, which is why DRT predicts a decrease in preference for the single-reinforcer alternative. DRT also predicts correctly that a PRD should have a much smaller effect in the conditions with VI 480-s schedules, because with these longer initial links, adding a PRD has less effect on the reinforcement rate for the green key. Finally, as shown in the Introduction, DRT predicts that preference for the single reinforcer can switch from above 50% to below 50% as initial-link durations are increased.
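The rate calculations in this paragraph can be reproduced with a short Python sketch. The helper name is hypothetical, and reinforcer durations are excluded, as the text notes is typical for DRT.

```python
def reinforcers_per_hour(n_food, initial_link, delays, prd=0.0):
    """Overall reinforcement rate for one alternative, excluding reinforcer durations."""
    cycle = initial_link + sum(delays) + prd   # seconds per initial-link/terminal-link cycle
    return n_food * 3600 / cycle


green_no_prd = reinforcers_per_hour(1, 4, [5])            # 1 food every 9 s
red = reinforcers_per_hour(3, 4, [20, 20, 20])            # 3 foods every 64 s
green_prd = reinforcers_per_hour(1, 4, [5], prd=61)       # 1 food every 70 s

# The same PRD changes the green rate far less with VI 480-s initial links.
green_long_no_prd = reinforcers_per_hour(1, 480, [5])
green_long_prd = reinforcers_per_hour(1, 480, [5], prd=61)

print(green_no_prd, red, round(green_prd, 2))  # 400.0 168.75 51.43
```

With VI 4-s initial links the PRD cuts the green rate to about one eighth of its original value, whereas with VI 480-s initial links the reduction is only about 11%.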

To make it clear that DRT can account for the results of this experiment, Figure 4 shows predictions derived from Equation 2 with the parameter values listed in the figure caption. Of course, the predicted response percentages depend on the parameter values used. For the predictions shown in Figure 4, parameter values were chosen to approximate the same range of response percentages as in the actual group means. As can be seen by comparing Figures 3 and 4, DRT accounts for all of the main trends in the data: the differences between the PRD and no-PRD conditions, the effects of increasing initial-link duration, and the switches in preference from above 50% to below 50% that occurred in some conditions.

Fig 4
Predictions of delay-reduction theory (DRT) are shown for the different conditions of this experiment, presented in the same format as the actual data in Figure 3.

Can HVA and CCM predict the effects of the PRDs? As these models have been applied to previous studies on concurrent-chains schedules (Grace, 1994; Mazur, 2001), it appears that they cannot. As explained in the Introduction, HVA predicts that adding the PRD would not change the values of either the initial or terminal links, because they depend on the delays before food is delivered, not on any delay after the food. HVA therefore predicts that a PRD will have no effect. According to CCM, adding the PRD should always lead to an increase in choice percentages for the preferred alternative, but Figure 2 shows that the most common result was a decrease in choices for the preferred alternative.

One possible way to modify HVA and CCM so that they could predict the effects of the PRDs in this experiment would be to adopt the strategy already used in DRT, and replace ri1 and ri2 (the rates at which terminal links are entered for the two alternatives) with R1 and R2 (the overall rates of food delivery for the two alternatives). In other words, HVA and CCM would need to abandon the assumption that the rates of terminal-link entries are a determining factor in concurrent-chains choice, and adopt the assumption that choice is determined (in part) by the overall rates of reinforcement for the two alternatives. If they were modified in this way, both HVA and CCM would then predict that a PRD added to one alternative will decrease preference for that alternative, for the same reason that DRT already predicts this result.

Another option for HVA, however, would be to leave the equations unchanged, but to extend the calculations so that they include reinforcers that are delivered in subsequent terminal links, not just those in the present terminal link (cf. Mazur, Snyderman, & Coe, 1985). Performing such calculations could become complex, but it might be a necessary complication. The basic idea, however, is fairly simple. Returning to the example depicted in Figure 1, when there is a PRD for the green alternative, the two terminal links end at the same time, so there is no difference in the delays to reinforcers delivered on later trials. However, when there is no PRD, the green terminal link ends 61 s sooner than the red terminal link, so the reinforcers delivered from all subsequent terminal links will occur 61 s sooner after the green terminal link. With HVA, these later reinforcers could be taken into account by increasing the values of the two alternatives, Vt1 and Vt2. Both will increase, but the value of the green key will increase more because of the 61-s difference. Therefore, HVA will predict greater preference for the green key when there is no PRD, which is what was found in this experiment.

Calculations based on HVA were conducted to determine whether this approach would allow the model to predict the main features of the data collected in this experiment. An exact calculation would need to take into account many factors (the variability of the initial links, the random assignment of future terminal links to the green and red keys, how many trials ahead should be counted, etc.). For the present purposes, a simple approximation was used. Each of the three quantities representing value in Equation 4 (Vi, Vt1, and Vt2) was increased by an amount roughly corresponding to the added value that would result from two additional terminal links on the green key (each preceded, of course, by another initial link). This may not be the most accurate way to represent the effects of subsequent terminal links, but it will be close enough to show the types of predictions HVA can generate if it considers reinforcers beyond the current terminal link.

Figure 5 shows the results of these calculations, which were obtained using the parameter values listed in the figure caption. As can be seen, this approach captures most of the characteristics of the actual group means shown in Figure 3. HVA now predicts (1) decreasing response percentages for the single reinforcer with shorter delays for the multiple reinforcers, (2) decreasing response percentages when a PRD is added with the VI 4-s initial links, but very little difference with the VI 480-s initial links (the predicted differences for VI 480-s are barely visible in Figure 5), and (3) with no PRD, switches in preference from above 50% to below 50% as initial-link durations are lengthened (middle panel of Figure 5). With the parameter values used for these simulations, HVA does not predict switches in preference from above 50% to below 50% in the 20×3 and 6×5 conditions (top and bottom panels of Figure 5). It is not clear whether this is a serious deficiency, however, because if different parameter values are used in Equations 3 and 4, HVA can predict such switches in preference. In summary, if HVA takes into account the values of reinforcers delivered in later terminal links, it can account for most of the features in the data from this experiment. It should be emphasized that these predictions were derived without modifying the equations of HVA, but merely by extending the computations so that they included reinforcers delivered in later terminal links.

Fig 5
Predictions of the hyperbolic value-added model (HVA) are shown for the different conditions of this experiment, presented in the same format as the actual data in Figure 3.

How could CCM be modified to account for the reinforcers delivered in future terminal links? One approach would be to use a version of CCM introduced by Grace (1996) that replaced terminal-link reinforcement rates (rt1 and rt2) with values similar to Vt1 and Vt2 in HVA (values that decreased as reinforcer delays in the terminal links increased). Grace (1996) showed that this approach could account nicely for the results of experiments that compared fixed and variable terminal-link schedules. If this approach were extended to include reinforcers delivered in later terminal links, then the predictions of CCM would presumably be similar to those shown for HVA in Figure 5. However, several details of such an expanded version of CCM (such as how Tt and Ti should now be defined) would need to be worked out before this model could make specific predictions, and no attempt to develop this model will be made here.

Although the results of this experiment do not definitively favor one model over the others, they do place constraints on the forms the models can take and the ways they can be applied. DRT can account for the results in its present form (Equation 2). However, the effects of the PRDs require both HVA and CCM to be modified in some way before they can account for the data. Two possible modifications have been discussed. One would be to replace the initial-link reinforcement rates (ri1 and ri2 in Equations 1 and 4) with overall reinforcement rates, R1 and R2, thereby adopting the approach already used by DRT. The other option would be to take into account the effects of reinforcers delivered on later terminal links.

There are methods that could be used in future research to decide which of these approaches is most appropriate. Consider a condition with VI 4-s initial links and a PRD to equate the durations of the terminal links. Suppose an animal shows a clear preference for the alternative with multiple reinforcers. If the initial links are now lengthened to VI 480-s, HVA predicts that preference for the multiple-reinforcer alternative will decrease (i.e., the response percentage will shift closer to 50%). In contrast, DRT predicts that preference for the multiple-reinforcer alternative will increase (i.e., the response percentage will shift away from 50% and indicate a more extreme preference for the multiple-reinforcer alternative). This prediction of DRT applies to all cases in which the delay for the single reinforcer is shorter than the delay for the first of the multiple reinforcers. Some of the conditions in the present experiment provided tests of these opposing predictions, but the results were inconclusive: Of the six instances where response percentages favored the multiple-reinforcer alternative with VI 4-s initial links, the response percentages increased in three cases and decreased in three cases when the initial links were lengthened to VI 480-s. However, more systematic attempts to collect data from conditions like these would presumably help to settle this issue. These data would be important for the understanding of concurrent-chains schedules, because they would help to decide whether overall reinforcement rates (R1 and R2 in DRT) must be taken into account when predicting performance on these schedules, or whether it is sufficient to take into account the delayed effects of reinforcers that occur in later terminal links (as in the expanded version of HVA proposed here).
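
The direction of the DRT prediction described above can be illustrated with a rough calculation based on the delay-reduction formula of Squires and Fantino (1971), in which the choice proportion depends on the overall reinforcement rates (R1 and R2) and on the reduction in time to food (T − t) signaled by each terminal link. This sketch is not the computation reported in this article. It assumes that t for the multiple-reinforcer alternative is the delay to its first reinforcer, that T is half the initial-link value plus the mean of the two delays, and that a PRD equates the two terminal-link durations. The delays (5 s and 6 s, with five reinforcers) and the function names are hypothetical.

```python
def drt_choice_proportion(R1, R2, T, t1, t2):
    """Delay-reduction formula: proportion of initial-link responses on alternative 1."""
    reduction_1 = T - t1
    reduction_2 = T - t2
    # When a terminal link signals no delay reduction, preference is exclusive.
    if reduction_1 <= 0:
        return 0.0
    if reduction_2 <= 0:
        return 1.0
    weighted_1 = R1 * reduction_1
    weighted_2 = R2 * reduction_2
    return weighted_1 / (weighted_1 + weighted_2)


def multi_preference(vi_s, single_delay_s, multi_delay_s, n_multi):
    """Predicted preference for the multiple-reinforcer alternative, with a PRD."""
    terminal_s = n_multi * multi_delay_s        # PRD equates both terminal links
    # Assumed: mean time in the initial links is vi_s / 2; t is the first delay.
    T = vi_s / 2.0 + (single_delay_s + multi_delay_s) / 2.0
    R_single = 1.0 / (vi_s + terminal_s)        # one food per cycle
    R_multi = n_multi / (vi_s + terminal_s)     # n_multi foods per cycle
    p_single = drt_choice_proportion(R_single, R_multi, T,
                                     single_delay_s, multi_delay_s)
    return 1.0 - p_single


short_links = multi_preference(4.0, 5.0, 6.0, 5)    # VI 4-s initial links
long_links = multi_preference(480.0, 5.0, 6.0, 5)   # VI 480-s initial links

print("multiple-reinforcer preference, VI 4 s:  ", short_links)
print("multiple-reinforcer preference, VI 480 s:", long_links)
```

Under these assumptions, the predicted preference for the multiple-reinforcer alternative is above 50% with the short initial links and becomes more extreme with the long initial links, which is the direction attributed to DRT above. The corresponding HVA prediction depends on Equations 3 and 4 and is not reproduced in this sketch.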

Either way, the results of the present experiment raise a methodological issue about how concurrent-chains schedules are designed. In an article on the effects of reinforcer delays and amounts in concurrent-chains schedules, Snyderman (1983) argued that failing to equate the durations of the two terminal links introduces an additional variable that can affect performance on these schedules. Although a few studies with concurrent-chains procedures have equated terminal-link durations in this way (Davison, 1988; Snyderman, 1983), most studies have not; instead, terminal links typically end after a single reinforcer is delivered, regardless of the time needed for this to occur. The present experiment showed that choice percentages can be dramatically different depending on whether or not the durations of the two terminal links are equated. Unless the specific purpose of a study on concurrent-chains schedules is to examine how unequal terminal-link durations can affect choice, Snyderman's suggestion about equating terminal-link durations may be a good one to follow.

Acknowledgments

I thank Diane Bunofsky, Emily Cline, and Michael Lejeune for their help in various phases of the research.

Footnotes

This research was supported by Grant MH 38357 from the National Institute of Mental Health.

References

  • Autor S.M. The strength of conditioned reinforcers as a function of frequency and probability of reinforcement. Unpublished doctoral dissertation, Harvard University; 1960.
  • Davison M.C. Preference for mixed-interval versus fixed-interval schedules. Journal of the Experimental Analysis of Behavior. 1969;12:247–252.
  • Davison M.C. Delay of reinforcers in a concurrent-chain schedule: An extension of the hyperbolic-decay model. Journal of the Experimental Analysis of Behavior. 1988;50:219–236.
  • Fantino E. Choice and rate of reinforcement. Journal of the Experimental Analysis of Behavior. 1969;12:723–730.
  • Grace R.C. A contextual model of concurrent-chains choice. Journal of the Experimental Analysis of Behavior. 1994;61:113–129.
  • Grace R.C. Choice between fixed and variable delays to reinforcement in the adjusting-delay procedure and concurrent chains. Journal of Experimental Psychology: Animal Behavior Processes. 1996;22:362–383.
  • Mazur J.E. Tests of an equivalence rule for fixed and variable reinforcer delays. Journal of Experimental Psychology: Animal Behavior Processes. 1984;10:426–436.
  • Mazur J.E. An adjusting procedure for studying delayed reinforcement. In: Commons M.L, Mazur J.E, Nevin J.A, Rachlin H, editors. Quantitative analyses of behavior, Vol. 5: The effect of delay and of intervening events on reinforcement value. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc; 1987. pp. 55–73.
  • Mazur J.E. Predicting the strength of a conditioned reinforcer: Effects of delay and uncertainty. Current Directions in Psychological Science. 1993;2:70–74.
  • Mazur J.E. Two- versus three-alternative concurrent-chain schedules: A test of three models. Journal of Experimental Psychology: Animal Behavior Processes. 2000;26:286–293.
  • Mazur J.E. Hyperbolic value addition and general models of animal choice. Psychological Review. 2001;108:96–112.
  • Mazur J.E, Snyderman M, Coe D. Influences of delay and rate of reinforcement on discrete-trial choice. Journal of Experimental Psychology: Animal Behavior Processes. 1985;11:565–575.
  • Rider D.P. Preference for mixed versus constant delays of reinforcement: Effect of probability of the short, mixed delay. Journal of the Experimental Analysis of Behavior. 1983;39:257–266.
  • Savastano J.I, Fantino E. Differences in delay, not ratios, control choice in concurrent chains. Journal of the Experimental Analysis of Behavior. 1996;66:97–116.
  • Snyderman M. Delay and amount of reward in a concurrent chain. Journal of the Experimental Analysis of Behavior. 1983;39:437–447.
  • Squires N, Fantino E. A model for choice in simple concurrent and concurrent-chains schedules. Journal of the Experimental Analysis of Behavior. 1971;15:27–38.
  • Stubbs D.A, Pliskoff S.S. Concurrent responding with fixed relative rate of reinforcement. Journal of the Experimental Analysis of Behavior. 1969;12:887–895.

Articles from Journal of the Experimental Analysis of Behavior are provided here courtesy of Society for the Experimental Analysis of Behavior