Patient-centered benefit-risk analysis of transcatheter aortic valve replacement

Background: Aortic stenosis (AS) treatments include surgical aortic valve replacement (SAVR) and transcatheter aortic valve replacement (TAVR). Choosing between SAVR and TAVR requires patients to trade off benefits and risks. The objective of this research was to determine which TAVR and SAVR outcomes patients consider important, collect quantitative data about how patients weigh benefits and risks, and evaluate patients’ preferences for SAVR or TAVR.
Methods: Patients were recruited from advocacy organization databases. Patients self-reported as being diagnosed with AS, and as either having received AS treatment or as experiencing AS-related physical activity limitations. An online adapted swing weighting (ASW) method – a pairwise comparison of attributes – was used to elicit attribute trade-offs from 219 patients. Survey data were used to estimate patients’ weights for AS treatment attributes, which were incorporated into a quantitative benefit-risk analysis (BRA) to evaluate patients’ preferences for TAVR and SAVR.
Results: On average, patients put greater value on attributes that favored TAVR than on those that favored SAVR. Patients’ valuation of the lower mortality rate, reduced procedural invasiveness, and quicker time to return to normal quality of life associated with TAVR offset their valuation of the time over which SAVR has been proven to work. There was substantial heterogeneity in patients’ preferences. This was partly explained by age, with differences in preference observed between patients <60 years and those ≥60 years. A Monte Carlo Simulation found that 79.5% of patients prefer TAVR.
Conclusions: Most AS patients are willing to tolerate sizable increases in clinical risk in exchange for the benefits of TAVR, resulting in a large proportion of patients preferring TAVR to SAVR. Further work should be undertaken to characterize the heterogeneity in preferences for AS treatment attributes. Shared decision-making tools based on attributes important to patients can support patients’ selection of the procedure that best meets their needs.


Introduction
Aortic stenosis (AS) is a progressive cardiovascular condition resulting from narrowing (or stenosis) of the aortic valve. This narrowing prevents the valve from fully opening, decreasing blood flow out of the heart and forcing it to work harder to maintain sufficient blood flow. If left untreated, AS can lead to severe cardiovascular complications and death 1 . As of 2015, 12.4% of the United States population over age 75 (nearly 2.5 million people) were reportedly diagnosed with AS 2,3 . More than one in eight people (13.3%) over the age of 75 in the US have moderate-to-severe AS 4 . Patients with AS may be asymptomatic for many years until the valve is narrowed severely enough to cause symptoms. Symptoms of AS include chest pain and angina, syncope, dyspnea, fatigue, and palpitations, all of which are exacerbated by physical activity 5 . Undertreatment of AS patients is common, with more than half of patients referred to cardiologists failing to receive surgical treatment 6 . Once symptoms appear, often between the ages of 70 and 80 years old, the prognosis of untreated patients is poor 7 . Among untreated patients, average survival is 50% at two years and 20% at five years after the onset of AS symptoms.
The traditionally recommended treatment for AS is surgical aortic valve replacement (SAVR). Such invasive surgery, involving a large incision in the chest, may not be suitable for all patients, especially those with comorbidities. The alternative, transcatheter aortic valve replacement (TAVR), is a minimally invasive, catheter-based procedure to replace the aortic valve in patients with AS. TAVR is the first-line therapy for inoperable patients with severe AS and an alternative to SAVR in operable high-risk patients. Among patients who are at intermediate surgical risk, both TAVR and SAVR are associated with improvements in disease-specific and generic health status 8 . However, TAVR is associated with a reduced rate of complications and a quicker recovery time, with patients returning to a normal quality of life more quickly 9 . When first available, the benefits of TAVR were offset by reportedly increased risks of stroke and the need for a pacemaker 10-12 . Recent clinical data reveal similar, if not improved rates of stroke and need for pacemaker among TAVR patients 13 . Furthermore, at a median two-year follow-up, all-cause mortality for patients undergoing TAVR was 20.2% compared with 21.9% for patients undergoing SAVR 14 .
The choice of TAVR or SAVR involves patients making trade-offs between multiple treatment attributes, including invasiveness, speed of recovery, mortality rates, and risks of complications. Given the challenging nature of this decision, tools have been developed to support patient decision-making 15 . However, little is known about the weights that patients assign to each attribute, how they make trade-offs between these attributes, and whether and how these preferences vary between patients. The objective of this research was to determine which outcomes associated with TAVR and SAVR patients consider most important, to collect quantitative data about how patients weigh the benefits and risks associated with TAVR and SAVR, and to use these data to evaluate patients' preferences for SAVR or TAVR.
Interim results from the first 93 participants enrolled into this study were published on 21 June 2019 as Version 3. This manuscript reports the final results from this study, based on a sample of 219 participants.

Overview
Patients' preferences for TAVR or SAVR were assessed using a quantitative benefit-risk assessment (BRA). This involved identifying attributes that distinguish TAVR and SAVR, measuring TAVR and SAVR performance on these attributes, eliciting patients' preferences for these attributes, and combining performance and preference in a BRA to determine what proportion of patients prefer TAVR or SAVR. The patient-preference survey upon which the BRA is based was fielded from July 2018 to January 2019.

Attribute selection
A long list of potential attributes was identified by reviewing the attributes highlighted in previous patient preference studies for heart valve surgical interventions, published meta-analyses and clinical studies, and regulators' assessments of related products. Additional attributes were identified based on consultation with TAVR and SAVR clinical experts and from patient input. The final attributes used in the study were selected based on clinical and regulatory relevance, whether or not the attribute distinguishes between TAVR and SAVR, and to comply with the attribute set properties required of an additive BRA 16 . For example, to avoid overlap with 'mortality', the 'stroke' attribute was defined as 'disabling, non-fatal strokes.' Descriptions of the final attributes included in the BRA are summarized in Table 1.

Table 1. Attributes, definitions, and descriptions provided to participants.

Type of procedure
Definition: Type of procedure
Description provided to participants: The invasiveness of the procedure is described by three characteristics:
• The length and depth of the incision
• Whether your heart is stopped
• The number of days that you will need to be in the hospital following a procedure.
There are two types of procedure:
• A minimally invasive procedure requiring, on average, 8 days in hospital. A small incision is made near your groin, and a valve is inserted and guided to your heart using a long tube through an artery. The tube is used to implant a new valve in the heart to replace the diseased aortic valve.
• An invasive procedure requiring, on average, 12 days in hospital. A large cut about 25 cm long is made in your chest to access your heart. Then, your heart is stopped while a machine takes over your heart and lung function. A new valve is implanted to replace the damaged valve. Your heart is started again, and your chest is stitched closed.

Mortality
Definition: Number of patients out of 100 who will die within 1 month
Description provided to participants: The number of patients who will die from any cause within 1 month of having the procedure. Death could be due to complications from the procedure, from complications of aortic stenosis, or as a result of disabling stroke.

Disabling non-fatal stroke
Definition: The number of patients out of 100 who will experience a non-fatal disabling stroke within 1 month
Description provided to participants: The number of patients who will experience a non-fatal but disabling stroke within 1 month of having the procedure. If you experience a stroke, you will be hospitalized. If the stroke is severe, it may lead to temporary or permanent disability, such as paralysis, reduced mobility, and problems with thinking, memory and speech.

Independence
Definition: Number of patients out of 100 who experience greater independence within 1 month
Description provided to participants: The number of patients who experience improvement in daily activities (greater independence) following relief from aortic stenosis symptoms within 1 month of the procedure. The symptoms of aortic stenosis (shortness of breath, fatigue, chest pain, and dizziness), physical function, and quality of life are improved to an extent that you experience improvements in your degree of independence and ability to engage in activities of daily living.

New permanent pacemaker
Definition: The number of patients out of 100 who will require a pacemaker within 1 year
Description provided to participants: The number of patients that will need to have a pacemaker permanently implanted as a result of the procedure. Typically, a pacemaker is implanted under local anesthetic and you may be discharged the same day if you get your pacemaker in the morning.

Requirement for dialysis
Definition: The number of patients out of 100 who will require dialysis within 1 year
Description provided to participants: The number of patients that will experience kidney function damage that will need dialysis as a result of the procedure. A machine is used to do the kidney's job of cleaning the blood. If you need dialysis, you will need to go to the hospital three times a week, with each visit lasting 4 hours.

Time over which the procedure has been proven to work
Definition: The number of years for which your procedure has been available and proven to work
Description provided to participants: The number of years the procedure has been available and is proven to keep symptoms of aortic stenosis from coming back. Following this period, it is currently not known whether you will experience aortic stenosis symptoms again.

Updates from Version 4
The present version of this manuscript updates previous versions to incorporate feedback provided through peer review, including updates to the discussion section to clarify some of the assumptions made, and to include more reflections on the potential challenges associated with the adapted swing weighting methodology.

Performance measurement
TAVR and SAVR performance against the final attributes was identified from the published literature and from clinical data (Table 2), focusing on data that are widely referenced and used within the clinical community. Available data for stroke risk, defined as "all stroke" (both fatal and non-fatal) in the literature, and independence, defined by Kansas City Cardiomyopathy Questionnaire (KCCQ) 17 score, required transformation to estimate performance as defined by the final study attributes. To estimate the risk of non-fatal stroke only, available stroke risk data were adjusted using the mortality rate among patients with severe aortic stenosis enrolled in the PARTNER trial who suffered a stroke compared to the mortality rate among those in the trial who did not suffer a stroke 18 . 'Independence' was defined as the probability of achieving relief from AS symptoms within a month of a procedure. Given available data, this was estimated as the probability of achieving a total score of 75 on KCCQ 17 . The KCCQ is a standard patient-reported outcome measure used in clinical trials of surgical and transcatheter heart valves 21 . Estimated mean KCCQ score and variation in this measure were transformed into the proportion of patients achieving a KCCQ total score of 75 at 1 month using procedures previously described in Marchini 22 .
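As a rough illustration of the kind of transformation described for the 'independence' attribute, the proportion of patients reaching a KCCQ total score of 75 can be approximated from a mean and a standard deviation if scores are assumed to be normally distributed. The normality assumption, the function name, and the input values below are illustrative assumptions only; the procedure described in Marchini 22 may differ.

```python
from statistics import NormalDist


def proportion_at_or_above(mean: float, sd: float, threshold: float = 75.0) -> float:
    """Share of patients expected to score at or above `threshold`.

    Assumes (hypothetically) that KCCQ total scores are normally distributed.
    """
    if sd <= 0:
        raise ValueError("sd must be positive")
    return 1.0 - NormalDist(mu=mean, sigma=sd).cdf(threshold)


# Hypothetical inputs, not taken from Table 2.
share = proportion_at_or_above(mean=75.0, sd=20.0)
print(round(share, 3))  # 0.5, because the mean equals the threshold
```

Because KCCQ scores are bounded between 0 and 100, a normal approximation is only a convenience; a bounded distribution could be substituted without changing the structure of the calculation.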

Survey methodology
An adapted swing weighting (ASW) exercise was administered online to elicit patients' preferences for treatment attributes 23 . The objective of the ASW exercise was to identify the level of change in an attribute that patients would be willing to accept in exchange for improving their procedure from 'invasive' to 'minimally invasive' (see Table 1 for definitions). The ASW exercise consisted of a series of pairwise comparisons of attributes, each pairing the 'invasiveness' attribute with one other attribute.
Participants were shown 'current' and 'improved' levels on each attribute and asked which improvement they would prefer to make (an example choice question is shown in Figure 1). The 'current' levels were chosen to reflect the attribute performance levels of TAVR and SAVR (Table 2) so that they would be credible to patients, and were adjusted so that patients had sufficient range to indicate the change in the attribute that would have the same value as reducing invasiveness. Therefore, the exercises were not designed to directly elicit patients' willingness to tolerate the actual change observed with TAVR.
Each set of pairwise comparisons included three iterations of the choice question. Depending on the answer to the choice question, the level of improvement offered on the non-procedure attribute was updated in the direction that made the value of improvements on the two attributes more similar than in the previous question. The algorithm used to identify participants' indifference points, including the levels used in each of the three iterations and how these changed depending on participants' previous responses, is shown in extended data, Appendix 1 24 .
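The appendix algorithm is not reproduced here. As a hedged sketch of the general idea, a three-step interval-halving titration narrows the range that contains a participant's indifference point after each choice. The halving rule, the starting range, and the simulated respondent below are assumptions made for illustration and are not taken from Appendix 1.

```python
from typing import Callable, Tuple


def titrate(prefers_procedure: Callable[[float], bool],
            low: float, high: float, iterations: int = 3) -> Tuple[float, float]:
    """Narrow the interval [low, high] that contains the indifference point.

    `prefers_procedure(offer)` returns True when the respondent would rather
    reduce invasiveness than take an improvement of size `offer` on the
    comparison attribute.
    """
    for _ in range(iterations):
        offer = (low + high) / 2
        if prefers_procedure(offer):
            low = offer   # offer too small to compete: indifference lies above
        else:
            high = offer  # offer was preferred: indifference lies at or below
    return low, high


# Simulated respondent whose true indifference point is a 2.0-point change.
def respondent(offer: float) -> bool:
    return offer < 2.0


low, high = titrate(respondent, low=0.0, high=8.0)
print(low, high)  # 1.0 2.0
```

Each iteration moves the next offer toward the point at which the two improvements are valued equally, which mirrors the updating rule described in the text.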
A pilot of the survey among five AS patients, carried out over a 4-week period in June 2018, ensured acceptable cognitive burden, clarity of the instructions, and ease of use of the elicitation software. Before completing the ASW exercises, participants were introduced to the attributes and their definitions. Participants also completed two ASW practice questions; interpretation of their response to these practice questions was tested to ensure participants understood how to complete each of the pairwise comparisons. Only eighteen participants (8.2%) incorrectly interpreted the meaning of their response to the first practice question, and no participants incorrectly interpreted their response to the second practice question. Each participant completed a subset of the possible pairwise questions: either 3 or 4 sets of pairwise comparison questions. Participants also completed clinical and demographic questions, and health literacy and numeracy questionnaires, available as extended data 24 .

Participants
Potentially eligible participants were recruited by M3 Global Research via email and from the membership of Heart Valve Voice and Mended Hearts patient organizations through e-mails and advertisements on social media platforms. Patients from the American Heart Association membership who had given prior permission to receive mailings were also invited to participate. Potential participants were directed to an online screening tool, where their eligibility for the survey was determined. Participants had to meet specific inclusion/exclusion criteria to participate (Box 1). There were no predefined enrollment targets or stratification by other demographic characteristics. Following completion of the online screening tool, eligible participants' survey responses were included in the analyses if they had completed all the questions in at least one ASW exercise.
Participant preferences were incorporated into the following benefit-risk model, constructed in Microsoft Excel:
$$U(x) = \sum_i w_i \, v_i(x_i)$$
where $U(x)$ is the overall value generated by procedure $x$, $w_i$ is the weight associated with attribute $i$, $v_i$ is the partial value function for attribute $i$ (assumed to be linear), and $x_i$ is the performance of procedure $x$ on attribute $i$.
Participants' MIR/MRB was converted into an attribute weight ($w_i$) by setting the weight for 'type of procedure' to 1, and then dividing the range in performance between TAVR and SAVR on that attribute (Table 2) by the MIR/MRB. For instance, if patients were willing to tolerate a 2% increase in mortality risk in exchange for a reduction in invasiveness, and reducing invasiveness is given a weight of 1, then the 4.8% change in mortality risk covered by the benefit-risk model (0.5%-5.3% range identified in Table 2) would be given a weight of 2.4 (4.8%/2%). Weights across all attributes in the model were then normalized to sum to 100.
Four outputs were generated from the benefit-risk model to evaluate patients' preferences for TAVR and SAVR. First, the incremental overall value generated by TAVR:
$$U(\mathrm{TAVR}) - U(\mathrm{SAVR}) = \sum_i w_i \left[ v_i(\mathrm{TAVR}_i) - v_i(\mathrm{SAVR}_i) \right]$$
Second, the incremental partial value on each attribute generated by TAVR:
$$w_i \left[ v_i(\mathrm{TAVR}_i) - v_i(\mathrm{SAVR}_i) \right]$$
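The weight conversion and additive model described above can be sketched in a few lines. This is a minimal illustration rather than the authors' Excel model: the function names, the two-attribute setup, and the hypothetical procedure at the end are assumptions, while the 4.8-point range and 2-point MIR come from the worked example in the text.

```python
def raw_weight(performance_range: float, mir_or_mrb: float) -> float:
    """Weight relative to 'type of procedure', whose weight is fixed at 1."""
    return performance_range / mir_or_mrb


def normalize(weights: dict) -> dict:
    """Rescale weights so that they sum to 100."""
    total = sum(weights.values())
    return {name: 100.0 * w / total for name, w in weights.items()}


def linear_value(level: float, worst: float, best: float) -> float:
    """Linear partial value: 0 at the worst level, 1 at the best level."""
    return (level - worst) / (best - worst)


def overall_value(weights: dict, values: dict) -> float:
    """Additive model: sum over attributes of weight times partial value."""
    return sum(weights[a] * values[a] for a in weights)


# Worked example from the text: 4.8-point mortality range, MIR of 2 points.
mortality_weight = raw_weight(4.8, 2.0)  # 2.4
weights = normalize({"type of procedure": 1.0, "mortality": mortality_weight})

# Hypothetical procedure: minimally invasive, with 1.0% mortality scored on
# the 5.3% (worst) to 0.5% (best) range.
values = {
    "type of procedure": 1.0,
    "mortality": linear_value(1.0, worst=5.3, best=0.5),
}
print(round(overall_value(weights, values), 1))
```

Normalizing after the conversion keeps the relative weights unchanged, so the ranking of procedures does not depend on the choice of 100 as the total.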

Dizziness or blackouts/Syncope
• Be willing and able to participate in a telephone interview, and to be audio-recorded (qualitative pilot only)

Exclusion criteria
• Have a cognitive impairment, hearing difficulty, visual impairment, or acute psychopathology
• Have insufficient knowledge of English, which could interfere with the patient's ability to provide written consent and complete the web survey
• Are not experiencing at least one of the symptoms of AS, as described in the inclusion criteria

Analysis
Participant responses to the ASW questions identified the level of change in an attribute that patients would be willing to accept in exchange for improving their procedure from 'invasive' to 'minimally invasive.' For instance, a participant might assign the same value to a 2% reduction in mortality risk as to reducing the invasiveness of the procedure. Inverting this relationship, responses were used to estimate the maximum acceptable increase in risks (MIR) or the maximum acceptable reduction in benefits (MRB) that participants would be willing to tolerate in exchange for moving from an invasive procedure to a minimally invasive procedure. In the above example, the participant would be willing to tolerate an MIR of 2% in mortality risk to reduce the invasiveness of the procedure.
In cases of incomplete or missing data, no data imputation was performed, and unanswered questions were coded as missing. Data from participants who completed only a portion of the

Table 3. Benefit-risk analysis definitions.

Maximum Acceptable Increase in Risk:
The maximum acceptable increase in risk for a single attribute that patients would tolerate in exchange for reducing procedure invasiveness.

Maximum Acceptable Reduction in Benefit:
The maximum acceptable reduction in benefit for a single attribute that patients would tolerate in exchange for reducing procedure invasiveness.

Maximum Acceptable Risk:
The maximum acceptable risk that would make patients indifferent between TAVR and SAVR.

Minimum Acceptable Benefit:
The minimum acceptable benefit that would make patients indifferent between TAVR and SAVR.
$\mathrm{TAVR}_a$ is the performance of TAVR on attribute $a$ (see Table 2).
$x_{1\,\mathrm{unit},a}$ is a single unit of performance on attribute $a$.
Table 3 outlines the definitions of MIR, MRB, MAR, and MAB.
Fourth, a Monte Carlo Simulation (MCS) was run to explore uncertainty in model inputs. That is, the benefit-risk model was run 10,000 times for both TAVR and SAVR. In each instance, the model drew from the distribution around both performance and weight inputs. Specifically, performance inputs were drawn from the distribution around TAVR and SAVR performance data (Table 2). Weight inputs were drawn in a manner that reflected the probability that participants identified different MIR/MRBs in their responses to the survey. For each iteration of the MCS, TAVR and SAVR were ranked based on U (Equation 2), and the proportion of instances in which TAVR ranked first was estimated.
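A simulation of this kind can be sketched as follows. The sketch assumes that weights are drawn by resampling participant-level weight sets and that performance is drawn from clamped normal distributions; the distributions, attribute names, and numbers are all hypothetical and are not the study's inputs.

```python
import random


def simulate(n_iter, draw_performance, draw_weights, seed=0):
    """Share of iterations in which TAVR has the higher overall value."""
    rng = random.Random(seed)
    tavr_first = 0
    for _ in range(n_iter):
        weights = draw_weights(rng)         # {attribute: weight}
        tavr, savr = draw_performance(rng)  # {attribute: partial value in [0, 1]}
        u_tavr = sum(weights[a] * tavr[a] for a in weights)
        u_savr = sum(weights[a] * savr[a] for a in weights)
        tavr_first += u_tavr > u_savr
    return tavr_first / n_iter


# Hypothetical participant-level weight sets (each sums to 100).
PARTICIPANT_WEIGHTS = [
    {"invasiveness": 60.0, "durability": 40.0},
    {"invasiveness": 30.0, "durability": 70.0},
    {"invasiveness": 55.0, "durability": 45.0},
]


def draw_weights(rng):
    """Resample one participant's weights, reflecting observed heterogeneity."""
    return rng.choice(PARTICIPANT_WEIGHTS)


def clamp01(x):
    return min(1.0, max(0.0, x))


def draw_performance(rng):
    """Draw partial values for each procedure from assumed distributions."""
    tavr = {"invasiveness": 1.0, "durability": clamp01(rng.gauss(0.3, 0.1))}
    savr = {"invasiveness": 0.0, "durability": clamp01(rng.gauss(0.9, 0.05))}
    return tavr, savr


share = simulate(10_000, draw_performance, draw_weights)
print(f"TAVR ranked first in {share:.1%} of iterations")
```

Fixing the random seed makes the run reproducible, which is useful when comparing subgroup or sensitivity analyses against a base case.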

Ethics
In accordance with ethical practice, Institutional Review Board (IRB) approval was obtained through Advarra (approval MOD00300398 and MOD00354863) to comply with human participants research requirements prior to initiation of participant recruitment or administration of measures in the pilot or main studies. Informed consent was recorded electronically via the online survey platform. Any change to the protocol and/or informed consent form was resubmitted to the IRB for review and approval prior to implementation. The study was available for monitoring, auditing, IRB review, and regulatory inspection as applicable.

Results
Demographic characteristics of participants
A total of 219 patients completed the survey over two rounds (Table 4). Raw data are available from Open Science Framework 24 . The majority of patients were less than 60 years old (n=132, 60.3%). More than half of the respondents were female (n=128, 58.4%) and a majority were white (n=173, 79.0%).
Over half of the sample had completed a college degree or higher (n=157, 71.7%). Most of the participants lived with a partner/spouse, family, or friends (n=164, 74.9%). No participants reported experiencing severe limitations to their physical activity. Few patients demonstrated low health literacy (n=23, 10.5%) or numeracy (n=22, 10.0%), and only 7 patients (3.2%) were low on both scales.

Responses to ASW questions
When responding to the ASW questions, only a small proportion (8.68%) of participants 'straight-lined' on all questions, consistently choosing to improve either 'procedure' or the comparison attribute across all three iterations of the choice question. These responses may be a valid reflection of participants' preferences, suggesting either a strong preference for avoiding an invasive procedure or a strong preference for improving other procedure attributes. Thus, these responses were included in the analysis. The impact of excluding these data was tested, and the results of the BRA were not sensitive to whether these data were included or excluded.
Table 5 shows the difference in performance of TAVR compared with SAVR on each attribute, and patients' willingness to accept this difference in exchange for the lesser invasiveness of TAVR. The increase in risks or the reduction in benefits that patients are, on average, willing to accept in exchange for reducing procedure invasiveness is reported in the middle three columns. For instance, patients would be willing to tolerate a 6.69% increase in the probability of experiencing disabling, non-fatal stroke in exchange for the reduction in invasiveness associated with receiving TAVR instead of SAVR. In each case, patients were on average willing to accept TAVR's performance on any attribute in exchange for its lower invasiveness.

Comparisons of TAVR and SAVR
In the case of four attributes (mortality, disabling non-fatal stroke, independence, and dialysis), TAVR performs better than SAVR. Where SAVR performs better than TAVR (the need for new permanent pacemaker and time over which the procedure has been proven to work), patients would, on average, be willing to accept TAVR's performance given its lower invasiveness. For example, participants are willing to tolerate a 6.98% increase in the risk of a new permanent pacemaker, while the probability of needing a new permanent pacemaker only increases by 3.3% with TAVR.
1 Responses were scored between 0 (always) and 4 (never). Each participant's scored responses were averaged for a composite score ranging from 0-4. A low score is ≤2.
2 Participants were given one point for each correctly answered question (maximum numeracy score = 5). A low score is given if ≤2 are answered incorrectly.
3 Overall low: individuals who scored low on both educational level and objective health literacy.
NYHA, New York Heart Association.
The standard errors in patients' MIR/MRB suggest substantial heterogeneity in patients' responses to the ASW exercise (see extended data, Appendix S2 24 for more detail). Some of this heterogeneity was associated with participants' age. MIR/MRB for three attributes (probability of having a new permanent pacemaker, probability of requiring dialysis, and one-month mortality risk) were associated with whether patients are over or under 60 years old. Older patients were more willing to tolerate increases in risks or reductions in benefit to avoid having to undergo an invasive procedure.
No other correlation was found between participant characteristics and MIR/MRB. This includes whether a participant reported having previously undergone treatment for their AS. While this might be expected to influence preferences, the ability of the analysis to identify this influence is limited by the relatively small sample size and the small proportion of participants who reported not having previously received AS treatment (19.6%).
A large proportion of participants had an individual MIR/MRB greater than the change in attribute performance achieved with TAVR (Table 5). For attributes where performance is better with TAVR compared to SAVR (mortality, disabling non-fatal stroke, independence, and dialysis), 100% of patients would prefer the improved performance and reduced invasiveness of TAVR. For the two attributes on which performance is better with SAVR compared to TAVR, 70% of patients would be willing to accept the increased risk of needing a new permanent pacemaker, and 47% of participants would be willing to accept the shorter period for which TAVR had been proven to work in order to experience TAVR's reduced invasiveness.
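The attribute-by-attribute proportions reported above amount to counting the participants whose individual MIR exceeds the actual change in performance. A minimal sketch, using made-up individual MIRs rather than the study data (only the 3.3-point pacemaker increase is taken from the text):

```python
def share_accepting(individual_mirs, actual_increase):
    """Proportion of participants whose MIR exceeds the actual increase in risk."""
    accepted = sum(1 for mir in individual_mirs if mir > actual_increase)
    return accepted / len(individual_mirs)


# Hypothetical individual MIRs (percentage points) for a new permanent pacemaker.
hypothetical_mirs = [1.0, 2.5, 4.0, 7.0, 9.5]
print(share_accepting(hypothetical_mirs, actual_increase=3.3))  # 0.6
```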
The above analysis compares the performance of TAVR on each attribute separately. A complete comparison of TAVR and SAVR should do so across all attributes simultaneously and take into account observed heterogeneity (in this case across age groups). This objective is accomplished by means of the benefit-risk model (see Equation 2 and Equation 3). Figure 2 and Figure 3 show the incremental value of TAVR (overall and by attribute) observed among patients 60 years old or older (Figure 2) and among patients less than 60 years old (Figure 3). These figures reveal that, overall, TAVR is of greater value to patients than SAVR. Specifically, the value patients placed on TAVR's lower short-term mortality rate, reduced procedural invasiveness, and quicker time to return to normal quality of life (independence) offset the value they placed on the longer period over which the procedure has been proven to work and the reduced risk of needing a pacemaker offered by SAVR. Similar patterns were observed among younger and older patients.
Table 6 reports a threshold analysis, which shows the minimum amount of benefit that TAVR would need to deliver, or the maximum amount of risk that TAVR could carry, before patients would be indifferent between TAVR and SAVR. For example, given the incremental value that patients attach to TAVR (as reflected in Figure 2 and Figure 3), they would be willing to tolerate a mortality risk of 12.6% following TAVR before they would be indifferent between TAVR and SAVR.
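Under a linear additive model, a threshold of this kind can be found by asking how far one attribute could move before TAVR's incremental value is used up. The sketch below is one way to express that calculation; the function, its parameters, and the numbers are assumptions for illustration and are not necessarily the formula behind Table 6.

```python
def threshold_performance(tavr_level, incremental_value, value_per_unit,
                          higher_is_worse=True):
    """Level of one attribute at which TAVR and SAVR have equal overall value.

    incremental_value: U(TAVR) - U(SAVR) at current performance.
    value_per_unit: value lost per one-unit worsening of the attribute.
    """
    change = incremental_value / value_per_unit
    return tavr_level + change if higher_is_worse else tavr_level - change


# Hypothetical numbers, not the study's inputs.
# Risk attribute (e.g. a complication rate): maximum acceptable risk.
mar = threshold_performance(tavr_level=1.0, incremental_value=20.0,
                            value_per_unit=4.0)
# Benefit attribute (e.g. share achieving independence): minimum acceptable benefit.
mab = threshold_performance(tavr_level=80.0, incremental_value=20.0,
                            value_per_unit=4.0, higher_is_worse=False)
print(mar, mab)  # 6.0 75.0
```

The calculation holds all other attributes at their current levels, so each threshold should be read one attribute at a time.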

TAVR threshold analysis
The MCS shows that 79.5% of patients would prefer TAVR over SAVR. When the analysis is run separately for patients younger than 60 years and for those 60 years or older, the proportions of patients preferring TAVR are 80.8% and 78.2%, respectively. Removing patients who 'straight-lined' in response to ASW exercises does not materially change the results of the MCS, with 80.7% of patients still preferring TAVR.

Discussion
The choice between TAVR and SAVR for the treatment of AS involves making trade-offs between: procedure invasiveness; the period over which the procedure has been proven to be effective; mortality, stroke and independence outcomes; and risks such as the need for a new pacemaker or dialysis. This study elicited patients' preferences for AS procedure attributes to determine how they make these trade-offs, and thus whether they prefer TAVR or SAVR. Results suggest that, given the potential benefits and risks of TAVR and SAVR, on average, patients attach more value to TAVR, and the majority of patients would prefer TAVR. Patients placed a greater value on the lower invasiveness, quicker speed of recovery, and reduced risk of mortality, stroke and need for dialysis associated with TAVR than they did on the longer period over which the procedure has been proven to work and the reduced risk of needing a pacemaker associated with SAVR.
Current guidelines from the American Heart Association for the procedural treatment of AS do not take into account recent clinical data supporting the use of TAVR 25 . Based on the recent clinical results of TAVR and the findings of this study, regulators may reach different conclusions about the need to protect patients from risks historically associated with TAVR. For instance, TAVR may not result in the increased risk of stroke that regulators might expect it to, and patients may be willing to tolerate the greater need for a permanent pacemaker in order to experience the benefits of TAVR.
The BRA revealed substantial heterogeneity in patient preferences for AS treatment. Some preference heterogeneity is explained by patient age, with older patients being less willing to tolerate the invasiveness of SAVR, instead preferring to accept greater potential risks associated with other procedure attributes in order to reduce the invasiveness. However, preference heterogeneity raises concerns about how attentively participants completed the preference elicitation tasks. A small proportion (8.68%) of participants straight-lined on all questions. While this might indicate a lack of attention, it may also capture strong preferences for or against the invasiveness of SAVR. Furthermore, all participants correctly interpreted their response to the second practice question, and only a small proportion of respondents demonstrated low health literacy or numeracy. This provides some reassurance that the preference heterogeneity observed in this study reflects a genuine difference in preference, rather than being the result of patients failing to complete the survey in a meaningful way.
Two other studies that used ASW to elicit patient preferences have been published 26,27 . Both studies also observed substantial preference heterogeneity. One of these studies 27 provided evidence supporting the validity and reliability of the preference outputs, both by replicating the results of the ASW with a thresholding exercise, and by comparing participants' responses with their qualitative statements on the basis for their answers. This provides some reassurance about the validity of responses to the ASW exercise used for the current study. It may also suggest that methods such as ASW, which elicit individual-level patient preferences, capture more preference heterogeneity than population-level methods, such as discrete choice experiments. Further work could usefully continue to validate the results of ASW exercises and test the hypothesis that individual-level preference methods capture greater heterogeneity in patient preferences.
Only one other study of AS patients' treatment preferences has been published to date 28 . The study design was sufficiently different from the current study (it focused on patients' willingness to accept the mortality risk associated with interventions) that it is not possible to directly compare the results. However, the study by Hussain et al. 28 did reveal a higher risk tolerance among patients with greater disease burden (defined as weekly incidence of restricting symptoms, perceived change in health compared with 1 year earlier, EQ-VAS scores, and the New York Heart Association (NYHA) classification). Our study failed to identify an association between patient preferences and NYHA classification, though this might be due to the limited sample size and the small proportion of the sample in the more severe stages of the NYHA classification. Further research could usefully gather data from a larger sample of AS patients to determine the association of preferences and patient characteristics, such as NYHA classification or whether patients have previously undergone treatment for AS.
While a majority of patients in the current study preferred TAVR, a number of patients (around 20%) preferred SAVR. This, and the underlying heterogeneity in patient preferences, support the need for a shared decision-making (SDM) tool that will help patients and surgeons choose procedures based on both clinical indications and patient risk tolerance. The Patient-Centered Outcomes Research Institute (PCORI) has developed a SDM tool to support patients choose between SAVR and TAVR 15 . However, this tool includes a narrower range of treatment attributes-stroke risk, mortality risk and discharge home-than those included in the analysis reported in this study. Furthermore, the tool does not include a component to elicit a patient's preferences.
As always, the conclusions of the study should be drawn in light of its limitations. First, it was not possible to engage patients in the design of the elicitation exercise. Instead, the patient voice was reflected in the design through a review of the limited preference research undertaken with AS patients to date and through the engagement that the experts who were consulted had themselves had with patients. Further work would usefully confirm with patients that no attributes were excluded from the study. Second, in the first survey round we relied on patient self-report of their AS diagnoses and severity, and in the second wave providing confirmation of AS diagnosis was voluntary. Third, the sample is healthier and younger than the population currently eligible for TAVR and SAVR 29 . AS patients were recruited from the membership of Heart Valve Voice and Mended Hearts as well as through M3 Global Research, and it is possible that patients who are motivated to join these organizations may have different preferences than the broader population. Further AS patient preference research should replicate this study, including older patients with more severe disease burden. Fourth, it was necessary to assume that partial value functions were linear. The sample size and the number of attributes for which preferences were being collected meant collecting data on the shape of the partial value functions would have overburdened respondents.
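To make the linearity assumption concrete, the following minimal sketch shows how linear partial value functions turn a pair of swing weights into a maximum acceptable increase in risk. It is an illustration only: the function names, weights and attribute ranges are hypothetical and are not taken from the study's data or analysis code.

```python
def linear_partial_value(level, worst, best):
    """Map an attribute level onto a 0-1 value scale, assuming linearity."""
    return (level - worst) / (best - worst)


def max_acceptable_risk_increase(weight_benefit, weight_risk, benefit_gain, risk_range):
    """Risk increase whose value loss equals the value gained from a benefit.

    weight_benefit, weight_risk: swing weights for the full attribute ranges.
    benefit_gain: gain on the benefit's 0-1 partial value scale.
    risk_range: width of the risk attribute's range, in the attribute's units.
    """
    return (weight_benefit * benefit_gain / weight_risk) * risk_range


# Hypothetical inputs: a full swing in a benefit with weight 0.2, traded
# against a risk attribute with weight 0.4 defined over a 10-point range.
mar = max_acceptable_risk_increase(0.2, 0.4, 1.0, 10.0)
midpoint_value = linear_partial_value(5.0, worst=10.0, best=0.0)
```

Under linearity the same trade-off applies wherever in the range the change occurs, which is exactly what could not be tested without collecting data on the shape of the partial value functions.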
Finally, the application of ASW in BRA is relatively novel and raises a number of questions. First, it relies on the idea that patients who are indifferent between adding one or another feature to a good or service would be willing to accept either improvement. The concept of indifference is commonly invoked in methods to assess the value of changes in health outcome. Further, the baseline against which improvements are assessed is the worse level of performance of the treatments being evaluated and over 80% of patients had previously been treated for AS. Thus, in most cases it might be reasonable to assume that patients would accept the changes, as they represent outcomes better than those they've already accepted. However, to infer that they would be willing to accept improvements in outcomes assumes that they were well informed when they made these treatment choices. Further research should test participants' willingness to accept changes in attributes.
Second, unlike some other elicitation methods, such as discrete choice experiments, the study assumes that respondents don't make mistakes when they are giving their answers, which is unlikely if respondents are near to their indifference points. It is thus necessary to assume that any such mistakes average out when assessing preferences for the study sample.
Third, as all respondents start their evaluation of attributes at the same point, ASW may be subject to anchoring. To mitigate this risk, 'worst' levels in the choice tasks were defined as the 'worst' clinically relevant level. However, in some instances this may have introduced a ceiling effect, with insufficient improvement in the attribute available to identify the indifference point. In these instances, it was necessary to increase the 'worst' level beyond that experienced by patients. Further work could usefully explore this apparent tension between anchoring and ceiling effects inherent within ASW.

Conclusions
Most AS patients are willing to tolerate sizable increases in clinical risk in exchange for the benefits associated with TAVR. A BRA incorporating data from patients' preferences for the attributes of AS treatments revealed a strong preference for TAVR compared to SAVR. The analysis also revealed substantial heterogeneity in individual patient preferences, partly associated with patient age. Further work is required to understand this heterogeneity, and whether additional patient characteristics such as NYHA class are associated with different preferences. In the meantime, SDM tools should incorporate the factors identified in this model to assist patients and clinicians in achieving a more patient-centered treatment decision.
This project contains the following extended data:
• TAVR Survey Contents_Updated.docx (a copy of the questionnaire given to each participant).
Data are available under the terms of the Creative Commons Zero "No rights reserved" data waiver (CC0 1.0 Public domain dedication).

My biggest concern is the number of assumptions implicit in the derivation of the MAR and MAB measures with these data. In particular, the operating assumption that value equivalence in the context of the decisions elicited implies acceptability. I would agree with that premise if the question presented the right context, but here things seem a bit murky. Does the fact that I am indifferent between adding any one of two features to a good or service imply that I'm willing to accept either improvement? Not necessarily. While this assumption is not necessarily wrong, I think it needs to be better justified either with evidence from their qualitative/pilot work or through previous literature.
Almost equally problematic is the omission of measurement error in the analysis of these data. Respondents answered three questions with each pairwise comparison. Depending on the step procedure to change the specifics of the comparisons, there could be a systematic error structure for the measures. For example, a 1-unit improvement across questions could systematically fall short of a threshold among respondents who truly require low probabilities of adverse events to achieve equivalence. Conversely, an initial full swing in the relevant range followed by a gradual increase in the risk levels could end up overestimating the equivalences. I could not find any details on the algorithm followed by the authors as they tried to identify an indifference threshold with only three questions per comparison. That information needs to be included or highlighted.
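To illustrate why three questions leave residual uncertainty, the sketch below uses a generic bisection search for an indifference point. This is an assumed, hypothetical procedure for illustration and not the algorithm used in the study, which is not described here; the function name, the 0 to 10 range and the simulated respondent are all invented, and responses are assumed to be error-free and monotonic.

```python
def bisect_indifference(prefers_a, low, high, n_questions=3):
    """Narrow the interval containing an indifference point by bisection.

    prefers_a(level) returns True if the respondent still prefers option A
    when option B is offered at `level`. The true threshold is assumed to
    lie in [low, high].
    """
    for _ in range(n_questions):
        mid = (low + high) / 2
        if prefers_a(mid):
            low = mid   # B not yet attractive enough: threshold is above mid
        else:
            high = mid  # B already preferred: threshold is at or below mid
    return low, high


# Hypothetical respondent whose true indifference point is 3.2 on a 0-10 scale.
true_threshold = 3.2
low, high = bisect_indifference(lambda level: level < true_threshold, 0.0, 10.0)
interval_width = high - low
```

Each question halves the interval, so three questions can only locate the threshold to within one eighth of the starting range, even before any response error is considered.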
Even if the respondent reached an equivalence threshold with the questions in the survey, there is still a measurement error for at least three reasons:
1. The current analysis assumes that responses are collected without errors and that respondents don't make mistakes as they give their answers. This is particularly dubious when respondents are near indifference and the attributes are relatively important (i.e., the equivalence ratio is small).
2. The study design ignores anchoring problems as all respondents started their evaluation of attribute equivalence at the same point. It is well known that anchoring problems can bias measures of preferences. Potentially, this would not be problematic if the baseline scenario was clinically relevant (like the current standard of care), but the principle for the baseline levels seems to be unrelated to clinical considerations.
3. The range established a priori for the swings may not be sufficient to achieve indifference for some attributes. This would be considered a ceiling effect and could overestimate the importance weight of some attributes.
Without some sense of the measurement error of the instrument, either through theory (i.e., random utility theory) or empirical data (i.e., as done with PROMs), we would seem to be blind about the validity of the study results. These measurement issues need to be better addressed or included as limitations of the study.
Another important implicit assumption of the method is that of linearity or proportionality of weights within the attribute range. There is a lot of evidence that this assumption often does not hold, particularly with low risks of events. The assumption would seem to be particularly crucial here because the evaluation of the importance is not done in the direction (and potentially the range) of the relevant risk changes from a regulatory standpoint. That is, the authors evaluate the impact of reducing the risks from a high absolute level and then move the approximated equivalences to a different point within the studied range to predict the importance of increases in risk levels.
If data comes from the same clinically relevant range the authors could at least argue that the average effects within the range are recoverable, but if evaluating a completely different risk range, there is no guarantee that the linearity assumption is adequate even for the average effect.
Consider the following extreme example: suppose we were interested in the importance of increasing the risk of death within the range of 0% to 10%. Collecting preference information over that range, even if we imposed linearity or proportionality of preferences, would at least capture the correct average effect within that increase, assuming we use the right analysis tools. However, if instead we approximate that information by looking at the importance of reducing the risk of death from 100% to 90%, there is no guarantee that this importance would be an accurate approximation of preferences for 0% to 10%. The only way the two measures would be identical is if the linearity assumption imposed by the authors holds for the full range of risk levels, but the authors don't test this to justify such an important (and potentially flawed) assumption.
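The extreme example can be sketched numerically. The value function below is a hypothetical illustration: the quadratic curvature is an arbitrary assumption chosen only to show how the two ranges can diverge, and is not estimated from any data.

```python
def value_of_risk(risk, exponent=1.0):
    """Partial value of a risk of death in [0, 1]: 1 at no risk, 0 at certain death.

    exponent=1 gives a linear value function; other exponents are
    hypothetical curvatures used purely for illustration.
    """
    return 1.0 - risk ** exponent


def value_change(risk_from, risk_to, exponent=1.0):
    """Magnitude of the change in partial value between two risk levels."""
    return abs(value_of_risk(risk_to, exponent) - value_of_risk(risk_from, exponent))


# Linear case: a 10-point change is worth the same anywhere in the range.
linear_low = value_change(0.0, 0.1)
linear_high = value_change(1.0, 0.9)

# Curved case (assumed exponent of 2): the same 10-point change differs by range.
curved_low = value_change(0.0, 0.1, exponent=2.0)
curved_high = value_change(1.0, 0.9, exponent=2.0)
```

With the assumed curvature, the change from 100% to 90% is worth about 0.19 on the value scale while the change from 0% to 10% is worth about 0.01, so a weight elicited over one range would misstate importance over the other.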
Overall, I think this study's results could offer a reasonable approximation of risk tolerance with adaptive swing weighting, but the validity of such an approximation is unclear without further evaluation of the implicit assumptions made to derive MAR and MAB.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Are the conclusions drawn adequately supported by the results? Partly
Competing Interests: No competing interests were disclosed.

Reviewer Expertise:
Health preference research and the use of these preference measures to infer risk tolerance.

I confirm that I have read this submission and believe that I have an appropriate level of expertise to state that I do not consider it to be of an acceptable scientific standard, for reasons outlined above.
Author Response 29 Jan 2021 Kevin Marsh, Evidera Inc, London, UK

Reviewer Feedback 1
My biggest concern is the number of assumptions implicit in the derivation of the MAR and MAB measures with these data. In particular, the operating assumption that value equivalence in the context of the decisions elicited implies acceptability. I would agree with that premise if the question presented the right context, but here things seem a bit murky. Does the fact that I am indifferent between adding any one of two features to a good or service imply that I'm willing to accept either improvement? Not necessarily. While this assumption is not necessarily wrong, I think it needs to be better justified either with evidence from their qualitative/pilot work or through previous literature.

Reviewer Feedback 2
Almost equally problematic is the omission of measurement error in the analysis of these data. Respondents answered three questions with each pairwise comparison. Depending on the step procedure to change the specifics of the comparisons, there could be a systematic error structure for the measures. For example, a 1-unit improvement across questions could systematically fall short of a threshold among respondents who truly require low probabilities of adverse events to achieve equivalence. Conversely, an initial full swing in the relevant range followed by a gradual increase in the risk levels could end up overestimating the equivalences. I could not find any details on the algorithm followed by the authors as they tried to identify an indifference threshold with only three questions per comparison. That information needs to be included or highlighted.

Reviewer Feedback 5
Potential measurement error because: The range established a priori for the swings may not be sufficient to achieve indifference for some attributes. This would be considered a ceiling effect and could overestimate the importance weight of some attributes.

Authors' Response 5
As noted in response to the previous comment, the range over which swings were explored was sometimes widened, precisely to try to avoid such ceiling effects. Nevertheless, the bi-modal distribution observed points to the potential that such effects may not have been entirely eradicated. Also noted above, in such designs it is possible that there is a tension between efforts to avoid anchoring and efforts to avoid ceiling effects. We have added a note on this in the discussion section.

Reviewer Feedback 6
Another important implicit assumption of the method is that of linearity or proportionality of weights within the attribute range. There is a lot of evidence that this assumption often does not hold, particularly with low risks of events. The assumption would seem to be particularly crucial here because the evaluation of the importance is not done in the direction (and potentially the range) of the relevant risk changes from a regulatory standpoint. That is, the authors evaluate the impact of reducing the risks from a high absolute level and then move the approximated equivalences to a different point within the studied range to predict the importance of increases in risk levels. If data comes from the same clinically relevant range the authors could at least argue that the average effects within the range are recoverable, but if evaluating a completely different risk range, there is no guarantee that the linearity assumption is adequate even for the average effect.

Authors' Response 6
We agree with this limitation. We had included this in the discussion of the limitations of the approach. We've re-emphasized this challenge in the discussion section, and call it out as a potential limit of the ASW method when working with more than a few attributes. The method was adopted as the goal of the study was to collect data on preferences for many attributes with small sample sizes. However, in this context, it was not possible to also test the non-linearity of value functions without overburdening participants.

Ross Jaffe
Versant Ventures, San Francisco, CA, USA
The authors have addressed my major concerns, and I support indexing of this article.

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly

Megan Coylewright
Dartmouth-Hitchcock Medical Center, Lebanon, NH, USA
The changes provided needed updates. No major changes to methods or conclusions.

Is the work clearly and accurately presented and does it cite the current literature? Partly
Is the study design appropriate and is the work technically sound? Partly

If applicable, is the statistical analysis and its interpretation appropriate? Partly
Are all the source data underlying the results available to ensure full reproducibility? Partly
Are the conclusions drawn adequately supported by the results? Partly
important topic, and employing known methodologic standards for addressing the question.
They deploy a methodology of adapted swing weighting (ASW), identifying which treatment attributes patients would tolerate to move from an invasive (surgical aortic valve replacement, or SAVR) option to a minimally invasive (transcatheter aortic valve replacement, or TAVR) option. A set of pairwise comparisons was made between 'less invasive' and other features of AVR, with 5 different iterations presented.
The study is written by employees of one of the TAVR companies (Edwards LifeSciences), as well as employees of a company that specializes in patient preference work, paid for by Edwards. It is not apparent that there was engagement of other stakeholders in the study design, such as patients, families, clinicians, administrators, policy makers, etc. There were 5 patients with the disease condition who trialed the exercises before the start of the study to ensure understandability. There is no information on how these patients were recruited, or what changes were made in response to their feedback.
The authors' conclusions are that patients are willing to tolerate big increases in risk in exchange for benefits of a minimally invasive procedure.
Importance of the research question: Given the recent reconsideration of the TAVR national coverage determination, data are needed to inform interventions designed to increase the degree to which patient preferences are included in final decision making. This study suggests that there is heterogeneity in patient preferences and that patients may tolerate larger differences in risk than physicians and regulators traditionally have deemed acceptable when faced with a minimally invasive procedure.
Originality of the work: There is not a large amount of data on the preferences of patients for TAVR.

Strengths:
Identifying patient preferences and how they influence decision-making is an important topic. Pair-wise comparisons can be helpful to delineate tolerance of risk in one domain versus another.
In the decision aid that was presented to the patients demonstrating differences between the 2 therapies, the same picture was used for both the surgical and the transcatheter procedure, which did potentially reduce bias.

Weaknesses:
The largest weakness is the fact that the attributes were not selected by patients known to be affected by the disease condition; a survey was given to participants in advocacy groups online. The eligible patients were recruited from the membership of Heart Valve Voice and Mended Hearts. They were directed to an online screening tool where patient-reported disease conditions were recorded. The study sample was largely white and highly educated. No patients had severe limitations to physical activity, which has been shown to significantly impact decision-making. This has been a consistent weakness in study design for clinical trials as well. The patients selected for the survey, in addition, were not necessarily patients that would be considering this decision.
Data analysis: Responses from the survey were used to compute the maximum acceptable increase in risk or maximum acceptable reduction in benefits that participants would be willing to tolerate in exchange for being involved in a less invasive procedure.
Interpretation of data: 93 patients completed the survey. Patients were not representative of the TAVR population, as described above in the weaknesses. The authors importantly describe substantial heterogeneity in patients' responses to the exercise. Importantly, this correlated with patient age, which was lower in the sample than in the TAVR population. Patients were more likely to prefer TAVR when considering benefits of independence or risks of mortality or disabling stroke. They were more likely to favor surgical AVR when considering risks of pacemaker and uncertainty regarding durability.
Presentation of data and story:
Abstract: The abstract describes that the purpose of the research is to determine which outcomes associated with TAVR and SAVR patients consider most important. Patients were not included in the selection of potential outcomes associated with the therapies. The abstract is stand-alone, describing the findings of trade-offs for risks and benefits. However, there should be a further statement about whether or not the patients had aortic stenosis and whether they were symptomatic, as this is known to impact decision-making.
Introduction: The introduction would need to be revised regarding the literature around pacemaker and stroke. Otherwise, the introduction lays out the rationale and explains the goals for the study.
Methods: The methods provide sufficient detail that experiments could be repeated; however, much of the methods could be relegated to a supplement online.
Results: The results are hypothesis generating but are limited in their reliability given uncertainty regarding patient diagnosis, severity of AS and symptom status. In addition, the attributes that the patients are commenting on are defined by the research team, with leadership from the company, and therefore have potential bias.
The findings are clinically relevant as they describe differences in patient preferences around risk and benefit. They highlight the need for a shared decision-making approach. They also raise the issue of which outcomes are most important to patients. Additional research is needed to identify that from patients with symptoms themselves.
Discussion: The discussion around the relationship between the guidelines and regulators needs editing. It is not clear what the authors mean when they say that the current guidelines "do not take into account recent clinical data supporting the use of TAVR". There is an important point about how patients' willingness to tolerate certain risks may differ from that of policy makers and physicians, and an argument can be made to include patient preferences in final decision making. Limitations described include how well the participants responding to the survey paid attention to the questions. This is described as a reason for some of the heterogeneity. A further limitation would be the small size of the patient sample and those patients not having the disease that is being studied nor being symptomatic. The limitations focus primarily on the selection of ASW exercises as a way to assess patient preferences rather than the patient population.
The tables and figures do stand alone. The references appear appropriate. I appreciate the opportunity to review this interesting manuscript.

Weaknesses:
The largest weakness is the fact that the attributes were not selected by patients known to be affected by the disease condition; a survey was given to participants in advocacy groups online. The eligible patients were recruited from the membership of Heart Valve Voice and Mended Hearts. They were directed to an online screening tool where patient-reported disease conditions were recorded. The study sample was largely white and highly educated. No patients had severe limitations to physical activity, which has been shown to significantly impact decision-making. This has been a consistent weakness in study design for clinical trials as well. The patients selected for the survey, in addition, were not necessarily patients that would be considering this decision.

Response: We thank the reviewer for their interest in the study and their comments. We agree with concerns about the representativeness of the sample. The observation of the limitation of the study has been extended in the discussion section to read: "Finally, the current study is based on a relatively small sample of patients, and the sample is healthier and younger than the population currently eligible for TAVR and SAVR. AS patients were recruited from the membership of Heart Valve Voice and Mended Hearts, and it is possible that patients who are motivated to join these organizations may have different preferences than the broader population. Further AS patient preference research should replicate this study in a larger sample of patients, including more patients with more severe disease burden".

the patient voice was reflected in the design through a review of the limited preference research undertaken with AS patients to date and through the engagement that the experts who were consulted had themselves had with patients. Further work would usefully confirm with patients that no attributes were excluded from the study".
We have also added to the paragraph in the discussion on limitations the fact that we relied on patient self-report of their diagnoses and AS severity.
The findings are clinically relevant as they describe differences in patient preferences around risk and benefit. They highlight the need for a shared decision-making approach. They also raise the issue of which outcomes are most important to patients. Additional research is needed to identify that from patients with symptoms themselves.

Response: We agree this would improve the research. As above, a note to this effect has been added to the discussion.
Ross Jaffe
Versant Ventures, San Francisco, CA, USA
This article by Marsh et al assesses patient preferences regarding the important medical decision about whether to undergo traditional surgical aortic valve replacement (SAVR) versus the newer transcatheter aortic valve replacement (TAVR) for aortic stenosis. This is an important effort to understand what attributes of benefit and risk from aortic valve replacement are most important to patients and understand how patients trade off those attributes in deciding which therapy to choose. With shared decision-making becoming increasingly used as our US healthcare system evolves to be more patient-centric, studies such as this one are important to expanding our understanding of how best to provide the information needed by patients to make informed decisions about their care. This study confirms a belief that many clinicians have that most patients prefer TAVR to SAVR because it is less invasive, providing evidence that about 75% of patients prefer the less invasive approach. These patients are willing to tolerate a somewhat higher risk of stroke, pacemaker placement, and dialysis, and less evidence of long-term duration for the benefits of less invasiveness. It also identifies that younger patients' (<60 yo) perspectives may differ somewhat from those of older patients (>60 yo), although both groups generally prefer the less invasive procedure.
I support the publication of this study. While this study is not perfect from my point of view, it is an important contribution to the literature, both about how to treat aortic stenosis and about patient preference assessment more generally. Understanding what treatment attributes are most important to patients and how patients trade off benefits and risks is important to understanding how best to inform patients about their treatment options and help them make decisions that best reflect their individual preferences. As the paper notes on p. 11, there has only been one other, more limited study of patient preferences in aortic stenosis, so this study is an important expansion of our understanding of patient preferences in that clinical area.
Additionally, from a methodological perspective and as also noted in the paper, there are only a few other studies that use adapted swing weights (ASW) to assess patient preferences. Swing weighting is one of only fourteen methods identified in the Medical Device Innovation Consortium (MDIC) review of preference assessment methodologies. (See: MDIC Patient Centered Benefit Risk Project Report, Appendix A: Catalog of Methods for Assessing Patient Preferences for Benefits and Harms of Medical Technologies, May 2015, available at: https://mdic.org/resource/patientcentered-benefit-risk-pcbr-framework/). This study is an important contribution to the literature about swing weighting methods, and should allow comparison to studies of patient preference using other methods to help researchers and clinicians better understand how best to assess patient preferences.
This study does have a few issues that should be highlighted to help put the results in context. I also note an area for further assessment of the data as well as areas for future research focus. Please note that I come at this study as someone with an interest in patient preferences from a clinical and regulatory policy perspective, and do not have the expertise to comment on the specific methodology or statistical analysis, which I will leave to experts in those areas.
Major concerns:
1. The mixing of treated and untreated patients in the participant population: In Box 1 on p. 6, the inclusion criteria describe that patients in the study could have already had a procedure within the last 10 years or could be untreated. From a shared decision-making point of view, patients express their preferences prior to treatment, so it is most important to understand the benefit-risk attributes that are most important to patients not yet treated and how such patients trade off such risks. The mixing of both untreated and treated patients may make it hard to understand how the pre-procedure patients view these issues.
Prior treatment could significantly influence a patient's preference for one treatment option or another, but it is hard to know a priori how treatment would influence a patient's preferences. Prior treatment may introduce a confirmation bias, in that patients tend to prefer the procedure that they chose to have, and therefore require much greater benefit or much less risk from the alternative procedure compared to treatment-naïve patients. Alternatively, if patients had a negative experience with their prior procedure, they may find the benefit/risk profile of the alternative procedure much more attractive than that of the procedure that they had. Additionally, their experience of specific benefits or risks from their procedure may skew their weighting of those specific attributes compared to other patients.
For this study, it would be important in Table 4 to add a breakdown of the patient treatment history, specifically the number and percentage of patients that have had SAVR, TAVR, or were untreated.
Additionally, it would be helpful to add a comparison of the preferences of patients in each of these categories to show how prior treatment affects the MIR/MRB for each attribute, perhaps in a table similar to Table 5 except substituting treatment category for age. One concern is that with a sample size of 93 patients, sub-categorization by treatment status may result in too few patients in any one category to have confidence in the results. If this is a problem, it should be acknowledged that future studies may be needed to better understand the effect of prior treatment on preferences in aortic valve replacement.
2. Representativeness of patients involved in member organizations: The study recruited participants from two member organizations (Heart Valve Voice and Mended Hearts). It is understandable that the membership of these organizations facilitated identifying patients with aortic valve disease. However, patients who are motivated enough to join such organizations may have different preferences than the broader population of patients eligible for treatment of their aortic valve disease. There is no way to assess this potential difference in this study, but the authors could acknowledge this potential source of bias in the sample population in their discussion of the results and encourage future study in one or more different aortic valve disease populations.
3. Lack of clarity in the attribute of "Time over which the procedure has been proven to work": From the definition of this attribute in the paper, it is difficult to know whether patients interpreted this attribute as a measure of how long they could expect benefit, i.e., duration of effect, or whether patients also viewed this as how much clinical experience there was with a treatment, i.e., uncertainty in the knowledge about the effect. (See section 2 of the MDIC Patient Centered Benefit Risk Report for a nice discussion of uncertainty and how it relates to patient preferences.) This ambiguity raises the question of whether this attribute as described elicited preferences about expected duration, preferences about patient tolerance of uncertainty about the effect of TAVR, or some combination of the two. Showing an example of how this "proven to work" attribute was shown to patients, akin to Figure 1, might clarify this ambiguity. Future studies might try to separate these attributes, particularly when comparing an established therapy like SAVR with a newer treatment like TAVR.

Minor issues and typos:
1. Introduction, last paragraph of left column, p. 3: the beginning of the second sentence is awkward: "However, little is known about the weights that patients assign to which attributes, . . . . " I would suggest "However, little is known about the weights that patients assign to each attribute. . . ."
3. Table 1, p. 4: Disabling non-fatal stroke: in the description, first sentence, note the "one 1" should be "one month".
I hope that these comments are helpful. Given my interest in the use of patient preferences in the FDA regulatory process and increasing shared decision making in medicine broadly, I am pleased that the authors undertook this study to better understand patient preferences in aortic stenosis treatment, and I support its publication. It should be a nice addition to both the aortic stenosis treatment literature and the patient preference literature.

Are sufficient details of methods and analysis provided to allow replication by others? Yes
If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required.
With shared decision-making becoming increasingly used as our US healthcare system evolves to be more patient-centric, studies such as this one are important to expanding our understanding of how best to provide the information needed by patients to make informed decisions about their care. This study confirms a belief held by many clinicians that most patients prefer TAVR to SAVR because it is less invasive, providing evidence that about 75% of patients prefer the less invasive approach. These patients are willing to tolerate a somewhat higher risk of stroke, pacemaker placement, and dialysis, and less evidence of long-term duration, in exchange for the benefits of less invasiveness. It also identifies that the perspectives of younger patients (<60 yo) may differ somewhat from those of older patients (>60 yo), although both groups generally prefer the less invasive procedure.
I support the publication of this study. While this study is not perfect from my point of view, it is an important contribution to the literature, both about how to treat aortic stenosis and about patient preference assessment more generally. Understanding what treatment attributes are most important to patients and how patients trade off benefits and risks is important to understanding how best to inform patients about their treatment options and help them make decisions that best reflect their individual preferences. As the paper notes on p. 11, there has only been one other, more limited study of patient preferences in aortic stenosis, so this study is an important expansion of our understanding of patient preferences in that clinical area.
Additionally, from a methodological perspective and as also noted in the paper, there are only a few other studies that use adapted swing weights (ASW) to assess patient preferences. Swing weighting is one of only fourteen methods identified in the Medical Device Innovation Consortium (MDIC) review of preference assessment methodologies. (See: MDIC Patient Centered Benefit Risk Project Report, Appendix A: Catalog of Methods for Assessing Patient Preferences for Benefits and Harms of Medical Technologies, May 2015, available at: https://mdic.org/resource/patient-centered-benefitrisk-pcbr-framework/). This study is an important contribution to the literature about swing weighting methods, and should allow comparison to studies of patient preference using other methods to help researchers and clinicians better understand how best to assess patient preferences.

Response: We thank the reviewer for their interest in our study.
This study does have a few issues that should be highlighted to help put the results in context. I also note an area for further assessment of the data as well as areas for future research focus. Please note that I come at this study as someone with an interest in patient preferences from a clinical and regulatory policy perspective, and do not have the expertise to comment on the specific methodology or statistical analysis, which I will leave to experts in those areas.
Major concerns: 1. The mixing of treated and untreated patients in the participant population: In Box 1 on p. 6, the inclusion criteria describe that patients in the study could have already had a procedure within the last 10 years or could be untreated. From a shared decision-making point of view, patients express their preferences prior to treatment, so it is most important to understand the benefit-risk attributes that are most important to patients not yet treated and how such patients trade off such risks. The mixing of both untreated and treated patients may make it hard to understand how the pre-procedure patients view these issues.
Prior treatment could significantly influence a patient's preference for one treatment option or another, but it is hard to know a priori how treatment would influence a patient's preferences. Prior treatment may introduce a confirmation bias in which patients tend to prefer the procedure that they chose to have, and therefore require much greater benefit or much less risk from the alternative procedure compared to treatment-naïve patients. Alternatively, if patients had a negative experience with their prior procedure, they may find the benefit/risk profile of the alternative procedure much more attractive than that of the procedure that they had. Additionally, their experience of specific benefits or risks from their procedure may skew their weighting of those specific attributes compared to other patients.
For this study, it would be important in Table 4 to add a breakdown of the patient treatment history, specifically the number and percentage of patients that have had SAVR, TAVR, or were untreated.
3. Lack of clarity in the attribute of "Time over which the procedure has been proven to work": From the definition of this attribute in the paper, it is difficult to know whether patients interpreted this attribute as a measure of how long they could expect benefit, i.e., duration of effect, or whether patients also viewed this as how much clinical experience there was with a treatment, i.e., uncertainty in the knowledge about the effect. (See section 2 of the MDIC Patient Centered Benefit Risk Report for a nice discussion of uncertainty and how it relates to patient preferences.) This ambiguity raises the question of whether this attribute as described elicited preferences about expected duration, preferences about patient tolerance of uncertainty about the effect of TAVR, or some combination of the two. Showing an example of how this "proven to work" attribute was shown to patients, akin to Figure 1, might clarify this ambiguity. Future studies might try to separate these attributes, particularly when comparing an established therapy like SAVR with a newer treatment like TAVR.

Response: We have updated Figure 1 to use an example of a survey question that includes the 'time proven to work' attribute.
Minor issues and typos:
1. Introduction, last paragraph of left column, p. 3: the beginning of the second sentence is awkward: "However, little is known about the weights that patients assign to which attributes, . . . . " I would suggest "However, little is known about the weights that patients assign to each attribute. . . ." Response: Updated as suggested.
2. Methods, Attribute selection, last sentence (p. 3): note "was were". I would remove the "were". Response: Updated as suggested.
3. Table 1, p. 4: Disabling non-fatal stroke: in the description, first sentence, note the "one 1" should be "one month". Response: Updated as suggested.

Competing Interests: No competing interests were disclosed.