Worth the work? Monkeys discount rewards by a subjective adapting effort cost

All life must solve how to allocate limited energy resources to maximise benefits from scarce opportunities. Economic theory posits that decision makers optimise choice by maximising the subjective benefit (utility) of reward minus the subjective cost (disutility) of the required effort. While successful in many settings, this model does not fully account for how experience can alter reward-effort trade-offs. Here we test how well the subtractive model of effort disutility explains the behavior of two non-human primates (Macaca mulatta) in a binary choice task in which reward quantity and the physical effort required to obtain it were varied. Applying random utility modelling to independently estimate reward utility and effort disutility, we show that the subtractive effort model better explains out-of-sample choice behavior than parabolic and exponential effort discounting. Furthermore, we demonstrate that effort disutility depends on previous experience of effort: in analogy to work from behavioral labour economics, we develop a model of reference-dependent effort disutility to explain the increased willingness to expend effort following previous experience of effortful options in a session. This analysis suggests that monkeys discount reward by an effort cost that is measured relative to an expected effort learned from previous trials. When this subjective cost of effort, a function of context and experience, is accounted for, trial-by-trial choice behavior can be explained by the subtractive cost model of effort. Therefore, in searching for net utility signals that may underpin effort-based decision-making in the brain, careful measurement of subjective effort costs is an essential first step.


Reward follows work, but what reward is worth the work? This subjective decision is fundamental to understanding animal and human behavior during effort-based decision-making. Notably, effort requirements can vary widely, whether walking to the kitchen for breakfast, running for a train to work or hiking up a mountain on holiday. The brain must be able to weigh the effort costs of all these actions against their potential rewards, and should therefore possess flexible mechanisms for effort-based decision-making.

When consumer choice theory and labour supply theories are used to understand consumer and worker behavior, it is assumed that decision makers balance the possible utility (subjective benefit) of reward against the disutility (subjective cost) of effort, with consumer choice theory predicting that choices are made simply on net utility (reward minus cost; Fehr and Goette, 2007). Experimental results from optimal foraging theory, psychology and economics all support this […] drivers were found to be six-fold more sensitive to above-expected workloads than to below-expected workloads (Crawford and Meng, 2011). Similar findings of asymmetric work responses have been made in the economic literature (e.g. Akerlof and Yellen, 1990) and in animals (e.g. van Wolkenten et al., 2007).

However, little economic work has been done to study how effort reference-points change […]

Here we were interested in measuring effort disutility in rhesus macaques as a foundation for establishing the neurobiological basis of reward-effort trade-offs. Previous tasks used to measure effort costs often have additional confounds that make it difficult to directly assess the subjective value of effort. For example, in both designs commonly used in rodent studies, lever-pressing and the T-maze, high-effort options are correlated with longer delays from trial onset (Salamone et al., 1994; Cousins et al., 1996).
Given the extensive work demonstrating […]

We apply a random utility model to independently model reward utility and effort disutility […] from safe options (McFadden, 1973; McFadden and Train, 2000) […]

bioRxiv preprint doi: https://doi.org/10.1101/2023.01.10.523384; this version posted January 10, 2023. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

To study the reward-effort trade-off it was necessary to develop hardware to implement […]

The joystick was constructed to allow motion in a single plane, from left to right, by rotation about an axis. The different effort levels were produced by altering the resistance to motion: we varied the strength of an electromagnet that brought the joystick into contact with a brake pad. These forces were calibrated using fixed weights under gravity, and there was a linear relationship between the applied voltage and the static frictional resistance (Fig. 1A). Six effort levels were used throughout the experiments (0 to 10 V in 2 V increments, corresponding to <1 N to 9.8 N; for convenience, these are treated as 0 to 10 N in 2 N steps in the analyses). We incorporated sensors into the joystick to measure kinematic variables: capacitive touch detection, a potentiometer adapted to detect position, and paired strain gauges that detect the directional strain on the joystick, which reflects the force the monkey applied to it.
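The voltage-to-force calibration described above is a straight-line fit. The minimal Python sketch below assumes ordinary least squares on paired (voltage, static friction) measurements. The names `fit_linear` and `predict_force` are hypothetical, and no calibration data from the study are reproduced.

```python
from typing import Sequence, Tuple

# Six effort levels: 0-10 V in 2 V steps (relabelled 0-10 N in 2 N steps in the analyses).
EFFORT_VOLTS = (0, 2, 4, 6, 8, 10)


def fit_linear(volts: Sequence[float], newtons: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of newtons = slope * volts + intercept."""
    n = len(volts)
    if n != len(newtons) or n < 2:
        raise ValueError("need at least two paired (volts, newtons) measurements")
    mean_v = sum(volts) / n
    mean_f = sum(newtons) / n
    sxx = sum((v - mean_v) ** 2 for v in volts)
    if sxx == 0:
        raise ValueError("voltages must not all be equal")
    sxy = sum((v - mean_v) * (f - mean_f) for v, f in zip(volts, newtons))
    slope = sxy / sxx
    return slope, mean_f - slope * mean_v


def predict_force(volts: float, slope: float, intercept: float) -> float:
    """Static frictional resistance (N) predicted for a given brake voltage."""
    return slope * volts + intercept
```

Pass the resistance measured with fixed weights at each entry of `EFFORT_VOLTS` to `fit_linear`. `predict_force` then maps any brake voltage onto the calibrated force scale.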

For each daily experiment, each monkey was seated in its individually adjusted primate chair […] The behavioral task was controlled by custom MATLAB software running on a Windows 10 computer, with the Psychophysics Toolbox (Pelli, 1997) used to code the visual stimuli. […] through the data acquisition card at a rate of 250 Hz. Blackcurrant juice rewards of various amounts (0 to 1 ml) were delivered by opening a solenoid valve (SCB262C068; ASCO) that gated a gravity-fed system.

Binary Choice Task

The animals were trained to work for liquid rewards. They were mildly fluid deprived for 6 days a week. On these days all of their fluid was obtained through performance in the behavioral task, with additional daily access afterwards to ensure that minimum daily fluid requirements were met.

The monkeys were trained to associate two-dimensional visual stimuli (Fig. 1C) […]

To test whether these utility functions accurately described the animals' choice behavior, […]

To determine how trial history influenced sensitivity to effort in future trials, the logistic regression used in the first model (Eq. 1) was extended as follows: […] where Effort is the effort difference on the current trial, and previous effort is the chosen effort on the ith previous trial.
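The extended equation is not reproduced in this section. The Python sketch below therefore only illustrates one way to build the lagged predictors it describes. It assumes the history terms enter additively alongside an intercept (left-right bias) and the current effort difference, and it omits reward terms. An interaction between current and previous effort would be an equally plausible reading. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def lagged_design(effort_diff: Sequence[float], chosen_effort: Sequence[float],
                  n_lags: int) -> List[List[float]]:
    """Design rows [1, effort_diff_t, chosen_effort_{t-1}, ..., chosen_effort_{t-n_lags}].

    The leading 1 carries the left-right bias term. The first n_lags trials lack a
    full history and are dropped, so choices must be sliced as choices[n_lags:].
    """
    if len(effort_diff) != len(chosen_effort):
        raise ValueError("effort_diff and chosen_effort must have the same length")
    rows = []
    for t in range(n_lags, len(effort_diff)):
        history = [float(chosen_effort[t - i]) for i in range(1, n_lags + 1)]
        rows.append([1.0, float(effort_diff[t])] + history)
    return rows


def _softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def p_choose_left(row: Sequence[float], beta: Sequence[float]) -> float:
    """Logistic probability of choosing the left option for one design row."""
    z = sum(x * b for x, b in zip(row, beta))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow for large negative z
    return e / (1.0 + e)


def log_likelihood(rows: Sequence[Sequence[float]], choices: Sequence[int],
                   beta: Sequence[float]) -> float:
    """Bernoulli log-likelihood of binary choices (1 = left) under the logistic model."""
    total = 0.0
    for row, y in zip(rows, choices):
        z = sum(x * b for x, b in zip(row, beta))
        # log p = -softplus(-z); log(1 - p) = -softplus(z)
        total -= _softplus(-z) if y else _softplus(z)
    return total
```

Maximising `log_likelihood` over `beta` with any optimiser gives the fitted coefficients for a given number of lags.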

To determine whether each additional previous-choice term improved the fit, we compared models by computing the Akaike Information Criterion (AIC) at each step.
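The step-by-step AIC comparison can be sketched as a forward search. This Python example assumes the search stops at the first history term that fails to lower AIC, and that the model with i lags has i more free parameters than the base model. Both are illustrative assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def aic(log_lik: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_lik


def stepwise_lag_count(log_liks: Sequence[float], base_params: int) -> int:
    """Number of previous-effort terms kept by forward stepwise AIC comparison.

    log_liks[i] is the maximised log-likelihood of the model with i history
    terms, assumed to have base_params + i free parameters.
    """
    best = 0
    best_aic = aic(log_liks[0], base_params)
    for i in range(1, len(log_liks)):
        candidate = aic(log_liks[i], base_params + i)
        if candidate >= best_aic:
            break  # the extra term no longer improves the fit enough to pay its cost
        best, best_aic = i, candidate
    return best
```

Because AIC charges 2 units per parameter, a history term is kept only if it raises the log-likelihood by more than 1.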

Given the notion that past effort experience influences an internal reference point against which current effort options are compared asymmetrically, we sought to implement a model capable of identifying this asymmetry. We adapted Crawford and Meng's (2011) implementation of gain-loss utility into a random utility model as follows: […]

[…] Rescorla-Wagner type reinforcement learning rule: […]

The learning rate (α) was modelled as two constants, one for efforts above the reference and one for efforts below it. This allowed the same asymmetry and hypothesis testing as described for the effort sensitivity parameters. Optimal learning rates were determined by grid search, with optimality defined as minimal AIC.
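The update rule itself is missing from this section. The Python sketch below shows a standard Rescorla-Wagner update with separate learning rates for efforts above and below the current reference. It rests on three assumptions: the prediction error is experienced minus expected effort, the experienced effort is the effort of the chosen option, and the starting reference `r0` is a free choice. The function names are hypothetical.

```python
from typing import List, Sequence


def update_reference(ref: float, effort: float,
                     alpha_above: float, alpha_below: float) -> float:
    """One Rescorla-Wagner step with separate learning rates above/below the reference."""
    error = effort - ref  # effort prediction error: experienced minus expected
    alpha = alpha_above if error > 0 else alpha_below
    return ref + alpha * error


def reference_trajectory(efforts: Sequence[float], r0: float,
                         alpha_above: float, alpha_below: float) -> List[float]:
    """Reference point in force *before* each trial, given the experienced efforts."""
    refs = []
    ref = r0
    for effort in efforts:
        refs.append(ref)  # the reference used to evaluate this trial's options
        ref = update_reference(ref, effort, alpha_above, alpha_below)
    return refs
```

Take Monkey U's reported rates (0.089 above, 0.039 below) as an example. A single 10 N trial starting from a 0 N reference would move the reference to 0.89 N.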

Task Design

We tested choices between options that differed in the magnitude of reward the animal would obtain and the amount of effort required to obtain that reward. […] showing left-right bias; Eq. 1).

The monkeys chose options with higher juice rewards when the effort required was equal […]

The monkeys' understanding of the task was confirmed by analysis of the kinematic data (Fig. 2G). The joystick contained strain gauges that measured the preparatory effort the monkeys made before the 'Go' cue (Fig. 2H). Averaging the directional strain in the 100 ms before the […] force when the resistance to movement (Fig. 2H), as cued by the fractal, was higher.
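The pre-cue averaging described above amounts to a windowed mean. The Python sketch below makes two assumptions: the strain signal is sampled at the 250 Hz acquisition rate stated in the Methods (the sentence naming what was sampled at that rate is partly lost), and the Go cue is given as a sample index. The names `pre_go_mean` and `SAMPLE_RATE_HZ` are hypothetical.

```python
from typing import Sequence

SAMPLE_RATE_HZ = 250  # acquisition rate given in the Methods (assumed to apply to strain)


def pre_go_mean(strain: Sequence[float], go_index: int,
                window_ms: float = 100.0, fs: float = SAMPLE_RATE_HZ) -> float:
    """Mean directional strain over the window_ms immediately preceding the Go cue.

    go_index is the sample index of the Go cue, which is excluded from the window.
    At 250 Hz a 100 ms window spans 25 samples.
    """
    n = round(window_ms * fs / 1000.0)
    if n < 1 or go_index < n or go_index > len(strain):
        raise ValueError("window does not fit before the Go cue")
    window = strain[go_index - n:go_index]
    return sum(window) / n
```

Comparing this mean across the cued resistance levels gives the kind of preparatory-force comparison summarised in Fig. 2H.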

In order to test effort cost discounting at both behavioral and neuronal levels, it is necessary to establish a common scale of value between the reward and effort. Here, we first established the viability of this method through a standard psychometric approach before extending it through random utility modelling.

To establish the equivalent value of an effort level in millilitres of juice (in the 'common currency' of juice), we estimated the indifference point (P = 0.5 choice probability) as follows. Against a fixed option offering a set magnitude of juice at the lowest effort level, we presented a variable option (Fig. 3A). The variable option varied in juice magnitude between trials, while its effort level was held constant within blocks. The resulting data were fitted with a logistic function, with juice quantity as the explanatory variable. From this function, we estimated the juice magnitude at which the variable option was chosen as frequently as the fixed option (P = 0.5 for each option; Fig. 3B). The difference in juice quantity between the indifference point and the fixed option was taken to have the same subjective value to the monkey as the difference in effort level between the two options. Consistent with effort acting as a cost (or negative reward), higher effort had to be compensated with more juice, so effort was treated as equivalent to a negative juice quantity. To check the consistency of these measures, we repeated them across days and against different fixed options. In general, increased effort corresponded to more negative juice equivalents (Fig. 3C). The smaller monkey (Monkey W) had more negative juice equivalents for the same effort levels than the larger monkey […]

To obtain a more complete understanding of the subjective value of effort, it would be necessary to elicit the utility functions for gain and effort independently.
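The psychometric indifference-point procedure can be sketched in a few lines. This Python example fits the logistic function by Newton's method, which is our choice of fitting algorithm and is not stated in the text. It also assumes binary per-trial choices and non-separable data; perfectly separable choices have no finite maximum-likelihood fit. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_logistic(juice: Sequence[float], chose_variable: Sequence[int],
                 n_iter: int = 50) -> Tuple[float, float]:
    """ML fit of P(variable) = 1 / (1 + exp(-(b0 + b1 * juice))) by Newton's method."""
    b0 = b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(juice, chose_variable):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # Fisher information (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta += I^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def indifference_point(b0: float, b1: float) -> float:
    """Juice magnitude at which the variable option is chosen with P = 0.5."""
    return -b0 / b1


def juice_equivalent(fixed_juice: float, b0: float, b1: float) -> float:
    """Effort expressed as (negative) juice: fixed option minus indifference point."""
    return fixed_juice - indifference_point(b0, b1)
```

For example, suppose a 0.3 ml fixed option at the lowest effort is matched by a higher-effort variable option only at 0.5 ml. The juice equivalent of that effort difference is then -0.2 ml.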

To limit the number of experimental parameters, we refrained from extensive testing of gamble options, because the additional confounds of risk preference and probability distortion might complicate the model and the interpretation of the data. We therefore used a riskless method to establish functions for the utility of juice reward and the disutility of effort (Fig. 4A, B). In […]

Effort disutility was best modelled non-parametrically. Monkey U was relatively insensitive to the lower efforts, whereas Monkey W was more sensitive to them (likely owing to his smaller size) but treated the two highest effort levels similarly, as seen in the non-parametric fit (Fig. 4D). These results were replicated with all of the fitted utility functions (Fig. 4F).
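One way to make a "non-parametric" effort disutility concrete is to dummy-code the effort levels: each free logit coefficient is then the disutility of one level. The Python sketch below assumes this coding, with 0 N as the anchor fixed at zero and reward terms omitted. The function and constant names are hypothetical.

```python
from typing import List

EFFORT_LEVELS_N = (0, 2, 4, 6, 8, 10)  # nominal effort levels used in the analyses
FREE_LEVELS = EFFORT_LEVELS_N[1:]        # 0 N is the anchor, its disutility fixed at 0


def effort_dummy_difference(effort_left: int, effort_right: int) -> List[int]:
    """Left-minus-right indicator vector over the non-anchor effort levels.

    In a logit model P(left) = sigmoid(bias + reward terms + sum_k d_k * x_k),
    each coefficient d_k is a free (non-parametric) disutility for level k.
    """
    if effort_left not in EFFORT_LEVELS_N or effort_right not in EFFORT_LEVELS_N:
        raise ValueError("unknown effort level")
    return [int(level == effort_left) - int(level == effort_right) for level in FREE_LEVELS]
```

This coding lets the estimated disutilities bend freely across levels. Monkey W's similar treatment of the two highest efforts would show up as nearly equal `d_k` for 8 N and 10 N.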

To validate these utility estimates, we tested how well they predicted out-of-sample choices. These were all choices that differed in both effort and reward simultaneously, which were therefore not used to estimate the utility functions. We combined reward utility and effort disutility using common methods from the literature (subtractive, hyperbolic and exponential; Białaszek et al., 2017). The subtractive model (reward utility minus effort disutility) fitted the empirical data better than the other models we tested, and far better than the juice-equivalent values alone (Fig. 4G). Furthermore, the utility difference between options predicted their choice ratio through a sigmoid function, confirming the model's validity (Fig. 4E). […]

Just as satiety affects reward utility, we expected fatigue to develop over the session and increase sensitivity to effort. Instead, we found that both subjects became less sensitive to effort over the course of each session (Fig. 6A). For example, when choosing between options requiring 0 N vs. 8 N of effort, Monkey W showed a significantly stronger preference for the low-effort option earlier as compared to later in the daily sessions (Fig. 6A). This suggests that the subjective value difference between the effort levels had decreased.
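The subtractive, hyperbolic and exponential combination rules compared above can each be written as a short value function, with choice probabilities given by a logit rule on the value difference. This Python sketch is illustrative. The discount rate `k`, the logit `temperature`, and applying hyperbolic and exponential discounting to the raw effort level rather than to the estimated disutility are all assumptions, not the study's parameterisation.

```python
import math


def subtractive(reward_utility: float, effort_disutility: float) -> float:
    """Net utility = reward utility minus effort disutility (given as a magnitude >= 0)."""
    return reward_utility - effort_disutility


def hyperbolic(reward_utility: float, effort: float, k: float) -> float:
    """Reward utility hyperbolically discounted by effort: U / (1 + k * E)."""
    return reward_utility / (1.0 + k * effort)


def exponential(reward_utility: float, effort: float, k: float) -> float:
    """Reward utility exponentially discounted by effort: U * exp(-k * E)."""
    return reward_utility * math.exp(-k * effort)


def p_choose_a(value_a: float, value_b: float, temperature: float = 1.0) -> float:
    """Logit choice rule: P(choose A) rises sigmoidally with V_A - V_B."""
    z = (value_a - value_b) / temperature
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # numerically stable branch for negative z
    return e / (1.0 + e)
```

For an out-of-sample comparison, each rule's values for the held-out options would feed `p_choose_a`. The log-likelihood of the held-out choices would then be compared across rules.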

Splitting sessions into quartiles by cumulative effort and using only choices that varied in effort, we estimated the effort disutility curves with a quadratic function (Fig. 6B). In both subjects, effort disutility diminished through the session, rather than increasing as we would […]

[…] for a change in slope at the reference point (Eq. 3) and using the random utility modelling techniques described above to fit the function to the data. Fitting this model to both animals provided significant evidence of an asymmetric response to effort (Fig. 7A). For Monkey U, the parameter capturing effort sensitivity above the reference point (β = -0.44; 95% CI: (-0.41, -0.47)) was 3.6 times larger in magnitude than the parameter capturing effort sensitivity below the reference point (β = -0.13; 95% CI: (-0.10, -0.16)). In Monkey W the change in slope was even greater: the sensitivity above the reference point (β = -1.23; 95% CI: (-1.17, -1.28)) was 4.5 times the sensitivity below the reference point (β = -0.27; 95% CI: (-0.23, -0.31)).
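Eq. 3 is not reproduced in this section. The following Python sketch therefore shows only one plausible form of the piecewise-linear function with a change in slope at the reference point. It assumes gain-loss utility in the style of Crawford and Meng, measured relative to the reference. The function name is hypothetical, and the 4 N reference in the usage note is arbitrary.

```python
def reference_dependent_effort_utility(effort: float, reference: float,
                                       beta_below: float, beta_above: float) -> float:
    """Piecewise-linear effort utility with a change in slope at the reference point.

    Utility is measured relative to the reference. Efforts below it yield
    beta_below * (effort - reference); efforts above it yield beta_above * (effort - reference).
    With negative betas, efforts below the reference carry positive utility and
    efforts above it negative utility.
    """
    deviation = effort - reference
    slope = beta_above if deviation > 0 else beta_below
    return slope * deviation
```

Take Monkey U's point estimates (β = -0.44 above, -0.13 below) and a 4 N reference. An effort 2 N above the reference gives utility -0.88, whereas an effort 2 N below it gives +0.26.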

While the above finding is evidence for reference-dependent effort valuation, it does not address how the animals update their reference on a trial-by-trial basis. The previous analyses established only that high-effort trials affect effort disutility over the course of the session. Yet, as a basis for the neurophysiology, it is necessary to understand how value changes on a trial-by-trial basis. To address this, we developed a model in which the effort reference reflects an expectation of future effort learned from previously experienced effort.

To model learning of expected effort, we used the Rescorla-Wagner reinforcement learning model because it is compatible with temporal difference learning models: it is based on prediction-error-driven learning but has fewer parameters to estimate.

In this model, the expectation of effort was updated on a trial-by-trial basis by the effort prediction error (experienced effort minus expected effort) multiplied by a learning rate. Losses and gains differ in salience in prospect theory, and efforts above and below the reference might differ in the same way. To allow for this, the learning rate was estimated separately for efforts above and below the reference point. We then used the same random utility modelling techniques with the piecewise linear function to estimate the effort disutility function. Optimal learning rates were found by grid search to minimize AIC.
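The grid search over the two learning rates can be sketched independently of the rest of the model. In this Python example, `neg_log_lik` is a hypothetical callable. It is assumed to refit the remaining parameters (for example, the effort slopes) for each pair of rates and return the minimised negative log-likelihood.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple


def grid_search_rates(neg_log_lik: Callable[[float, float], float],
                      grid: Sequence[float],
                      n_params: int) -> Tuple[float, float, float]:
    """Return (aic, alpha_above, alpha_below) minimising AIC over a 2-D grid of rates.

    Every grid point has the same n_params, so ranking by AIC is equivalent to
    ranking by likelihood within this search; AIC matters when comparing against
    models with different parameter counts (e.g. a fixed-reference model).
    """
    best: Optional[Tuple[float, float, float]] = None
    for a_above, a_below in product(grid, repeat=2):
        score = 2 * n_params + 2 * neg_log_lik(a_above, a_below)
        if best is None or score < best[0]:
            best = (score, a_above, a_below)
    if best is None:
        raise ValueError("grid must not be empty")
    return best
```

A grid such as `[i / 100 for i in range(1, 51)]` would cover rates of the magnitude reported below. A finer grid, or a local optimiser started from the best grid point, could refine the estimate.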

In the optimal fits for both animals, there was strong evidence that efforts above the reference point were valued more negatively than efforts the same distance below the reference point were valued positively (Fig. 7B). The optimal learning rates also indicated that efforts above the reference point were significantly more salient than efforts below it (Monkey U optimal learning rate above reference point: 0.089, below reference point: 0.039; Monkey W optimal learning rate above reference point: 0.29, below reference point: 0.09). Notably, in both animals the ratio of the learning rates above and below the reference point was approximately equal to the ratio of the utility-function slopes above and below it (Monkey U: slope above reference = 2.87 × slope below reference, learning rate above reference = 2.28 × learning rate below reference; Monkey W: slope above reference = 3.26 × slope below reference, learning rate above reference = 3.22 × learning rate below reference). This suggests that the utility prediction error drives the learning of the effort reference point. […]

In this study, we examined the cost of effort in a binary choice task and used a proven method (random utility modelling) to estimate utility from riskless choice data. In line with numerous studies in humans, macaques and other animal models, we found evidence that effort is treated as a cost weighed against potential rewards during decision-making. Moreover, this effort cost function is not fixed but depends on an internal reference point learned from previous experience.
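As a quick arithmetic check of the learning-rate ratios reported in the results, the short Python snippet below divides the above-reference rate by the below-reference rate for each monkey. It prints each ratio alongside the reported slope ratio. Every number comes from the text; the dictionary names are hypothetical.

```python
# Learning rates (above, below reference) and slope ratios as reported in the text.
LEARNING_RATES = {"Monkey U": (0.089, 0.039), "Monkey W": (0.29, 0.09)}
SLOPE_RATIOS = {"Monkey U": 2.87, "Monkey W": 3.26}


def rate_ratio(above: float, below: float) -> float:
    """How many times larger the above-reference learning rate is."""
    return above / below


for monkey, (above, below) in LEARNING_RATES.items():
    print(f"{monkey}: learning-rate ratio {rate_ratio(above, below):.2f}, "
          f"slope ratio {SLOPE_RATIOS[monkey]:.2f}")
# Monkey U: learning-rate ratio 2.28, slope ratio 2.87
# Monkey W: learning-rate ratio 3.22, slope ratio 3.26
```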

While the psychometric method for obtaining equivalent juice values for each effort level was successful, the random utility model proved superior. Specifically, the subtractive model of effort discounting was the best-fitting model, as predicted by economic theory (Fehr and Goette, 2007). The independently estimated utility functions, combined with the subtractive model, predicted out-of-sample choices. This provides direct evidence for the underlying assumption that a cost-benefit calculation drives effort-based decision-making. The demonstrable satiety and effort adaptation effects likely explain why the random utility […] (Eisenberger et al., 1979, 1989).

One possibility not accounted for here is that low-effort options, particularly the lowest-effort option, may have positive utility. If the average effort required to obtain a reward is high, a low-effort option may be perceived as a rest from effort and may therefore have increased or even positive utility. Such utility would not be revealed in these analyses, because the zero-effort option is anchored to zero utility. This effect is difficult to assess with this task, as the animals may also rest by skipping trials. Nor can it be correlated with fatigue, because the effort-adaptation effects appear to obscure any fatigue effects. Because utility is unique only up to positive affine transformations, it was necessary to define an anchor point on which to scale it. Therefore, the positive or negative value of utility is meaningful only in relation to this anchor point.
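The anchoring argument can be stated compactly. For any positive affine transformation of a utility function u, preference order is unchanged. Choosing the offset so that the zero-effort option maps to zero then fixes the sign convention. In the derivation below, u(0) is assumed to denote the utility of the zero-effort option.

```latex
\begin{aligned}
\tilde{u}(x) &= a\,u(x) + b, \qquad a > 0,\\
u(x_A) > u(x_B) &\iff \tilde{u}(x_A) > \tilde{u}(x_B),\\
b = -a\,u(0) &\;\Rightarrow\; \tilde{u}(E) = a\,\bigl[u(E) - u(0)\bigr], \qquad \tilde{u}(0) = 0.
\end{aligned}
```

Under this anchoring, the sign of the transformed utility of effort E reports only whether E is better or worse than zero effort. Any positive utility of rest would therefore appear only as a difference relative to the anchor.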

In the context of this task design, the monkeys can only learn what effort to expect from their own experience of the effort and the presentation of previous cues. In these experiments and models, we investigated how the order of presentation of effort cues and the experience of effort shape the subjective value of effort. In line with the economic literature on human behavior, we found that the monkeys' effort preferences were reference-dependent and shifted following changes to the range of efforts experienced in the task. As the monkeys experienced effort, their choices became progressively less sensitive to the effort difference between options. Modelling this with a prospect-theory-like reference-dependent function suggests that the monkeys were particularly sensitive to efforts above their expected effort and relatively insensitive to efforts below it.

The learning model of effort, in which the reference point is an expectation of effort learnt from previous trials through a prediction-error mechanism, improved on models in which effort was a fixed cost throughout the session. This model, an adapted Rescorla-Wagner type learning rule, is a reinforcement learning model. Here, the ratio of the learning rates above and below the reference point resembles the ratio of the relative utilities of effort above and below it. This similarity suggests that effort utility errors drive the learning of expected […]

Overall, these experiments suggest that monkeys' subjective value of effort is reference-dependent, and that the reference point is, at least in part, an expectation of the average effort level formed from experience in previous trials. Such effects must be controlled for and understood when probing the relationship between midbrain dopamine neurons and effort cost encoding.