Long-Term Survival Outcomes of Cytoreductive Nephrectomy Combined with Targeted Therapy for Metastatic Renal Cell Carcinoma: A Systematic Review and Individual Patient Data Meta-Analysis

Simple Summary
Cytoreductive nephrectomy (CN) refers to the removal of the primary renal tumor in the setting of metastatic renal cell carcinoma. In the past, the combination of CN with cytokine-based immunotherapy was considered the standard of care. However, CN’s role during the targeted treatment era remains controversial. We attempted to address this issue by performing a systematic review and meta-analysis of the literature. We synthesized data from 15 studies comparing CN and targeted therapy to targeted therapy alone. Our results show that CN combined with targeted therapy was associated with increased survival compared to targeted therapy only. Careful patient selection is required to take full advantage of any survival benefit that CN may offer. Future research endeavors should focus on developing prognostic models to guide appropriate patient selection for CN.

Abstract
The role of cytoreductive nephrectomy (CN) in the treatment of metastatic renal cell carcinoma (mRCC) remains controversial during the targeted therapy era. To reconcile the current literature, we analyzed the reported survival data at the individual patient level and compared the long-term survival outcomes of CN combined with targeted therapy vs. targeted therapy alone in patients with mRCC. We performed a systematic review of the literature using the MEDLINE, Scopus, and Cochrane Library databases (end-of-search date: 21 July 2020). We reconstructed individual patient data from the Kaplan–Meier curves for overall (OS), progression-free (PFS), and cancer-specific survival (CSS) from each study. We subsequently performed one-stage frequentist and Bayesian random-effects meta-analyses using both Cox proportional hazards and restricted mean survival time (RMST) models. Two-stage random-effects meta-analyses were also performed as sensitivity analyses. A subgroup analysis was also performed to determine the effect of CN timing.
Fifteen studies fulfilling our inclusion criteria were identified, including fourteen retrospective cohort studies and one randomized controlled trial. In the one-stage frequentist meta-analysis, the CN group had superior OS (hazard ratio [HR]: 0.58, 95% confidence interval [CI]: 0.54–0.62, p < 0.0001) and CSS (HR: 0.63, 95% CI: 0.53–0.75, p < 0.0001). No meaningful clinical difference was observed in PFS (HR: 0.90, 95% CI: 0.80–1.02, p = 0.09). One-stage Bayesian meta-analysis also revealed superior OS (HR: 0.59, 95% credibility interval [CrI]: 0.55–0.63) and CSS (HR: 0.63, 95% CrI: 0.53–0.75) in the CN group, while no meaningful clinical difference was detected in PFS (HR: 0.91, 95% CrI: 0.80–1.02). Similar results were obtained with the RMST models. The OS benefit was also noted in the two-stage meta-analyses models, and in the subgroup of patients who received upfront CN. The combination of CN and targeted therapy for mRCC may lead to superior long-term survival outcomes compared to targeted therapy alone. Careful patient selection based on prognostic factors is required to optimize outcomes.


Introduction
Over the past year, almost 74,000 new cases of kidney and renal pelvis cancer were recorded in the United States alone [1]. Despite the advances in imaging modalities allowing for earlier diagnosis, a significant proportion of patients (>10%) present with metastatic disease [2]. The management of metastatic renal cell carcinoma (mRCC) has undergone a significant shift over the past three decades. The discovery of RCC's immunogenicity led to the establishment of interleukin-2 and interferon-alpha as first-line treatments for mRCC during the 1990s, marking the so-called "cytokine era" [3–5]. Cytoreductive nephrectomy (CN) refers to the removal of the primary renal tumor in the metastatic setting and met with sporadic success during the 20th century [6,7]. In the early 2000s, CN re-emerged and its combination with cytokine-based immunotherapy became the new standard of care, following the results of two randomized controlled trials (RCTs) [8,9]. Subsequently, several RCTs established the superiority of targeted therapies over cytokine-based immunotherapy [5,10–12], leading to a paradigm shift and the new "targeted therapy era" in the treatment of mRCC. The implementation of CN declined along with cytokine-based immunotherapy, mainly because its role when combined with targeted therapies remained unclear [13].
Over the last decade, multiple large retrospective studies have shown promising results with the combination of CN and targeted therapies [14,15]. In contrast, the Cancer du Rein Metastatique Nephrectomie et Antiangiogéniques (CARMENA) trial reported non-inferiority of the targeted therapy sunitinib alone vs. CN followed by sunitinib in the intention-to-treat (ITT) population [16]. However, non-inferiority trials, such as CARMENA, have in practice a greater than 80% probability of reaching a verdict of non-inferiority, particularly when protocol adherence rates are low. CARMENA suffered from such low protocol adherence rates, which prevented a full per-protocol analysis of its data, and even a partial per-protocol analysis (termed "PP2" in the CARMENA report) led to inconclusive results [16]. Nevertheless, the results of CARMENA cast doubt upon the value of CN during the targeted therapy era, and a subsequent retrospective report also disputed the long-term benefits of CN [17]. As a result, CN in the setting of targeted therapy remains controversial. Therefore, we sought to systematically review and synthesize the totality of currently available evidence comparing the long-term survival outcomes of CN combined with targeted therapy vs. targeted therapy alone in patients with mRCC.

Study Design and Inclusion/Exclusion Criteria
This systematic review and meta-analysis was performed according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and in line with a protocol developed and agreed upon by all authors (Supplemental Data File 1) [18]. Institutional Review Board approval and written patient consent were not necessary, as we used already published data.
We applied the Population/Participants, Intervention, Comparison, Outcomes, Study design (PICOS) framework to define the study selection criteria as follows:
• Participants: Patients of any age, sex, or race with mRCC.
Original randomized clinical trials and non-randomized cohort studies (both prospective and retrospective) comparing the combination of CN and targeted therapy vs. targeted therapy alone for mRCC, published in English, were considered eligible for inclusion. The exclusion criteria were defined as follows: (i) articles without a full text in English; (ii) irrelevant articles; (iii) animal studies; (iv) case reports; (v) narrative or systematic reviews and meta-analyses; (vi) letters to the editor, editorials, commentary, errata, perspectives without any primary patient data; (vii) published abstracts without any available full text; (viii) non-comparative studies (<2 study arms); (ix) studies without any extractable data for the outcomes of interest.
We assessed all eligible studies for overlapping populations based on the author list, study center, country of origin, and dates of patient enrollment. Between studies with overlapping populations, we included those having the largest patient sample, reporting granular data for the outcomes of interest, and providing Kaplan-Meier curves that would permit reconstruction of individual patient data (IPD). However, when data on additional outcomes were provided through multiple studies, we extracted data from all of them. In these cases, we did not sum the populations of each study in the overall subject numbers, as they represented additional analyses on the same cohorts.

Literature Search Strategy
A systematic search was performed using the MEDLINE (via PubMed), Cochrane Library, and Scopus bibliographic databases (end-of-search date: 21 July 2020) by two independent researchers (S.M.E. and M.D.H.) using the term "cytoreductive nephrectomy". All disagreements on article inclusion were resolved by consensus. In accordance with the snowball methodology, references from all included articles and previously published systematic reviews/meta-analyses were also manually searched to identify any studies that were potentially missed but otherwise eligible for inclusion [19].

Data Tabulation and Extraction
Data tabulation and extraction for evidence synthesis were performed using standardized, pre-piloted spreadsheets. Two reviewers (S.M.E. and M.D.H.) independently extracted all data and any disagreements were identified and resolved after reaching a consensus. The following data were extracted: (i) study characteristics (first author, year of publication, study design, study center, study period, number of patients for each group); (ii) patient characteristics (age in years, gender, Eastern Cooperative Oncology Group [ECOG] performance status, International Metastatic RCC Database Consortium [IMDC]/Heng risk score, Memorial Sloan Kettering Cancer Center [MSKCC]/Motzer risk score); (iii) tumor-related characteristics (histology, T stage and N stage according to TNM, number and location of metastases); (iv) treatment-related characteristics (type of targeted therapy); and (v) long-term survival outcomes (OS, PFS, and CSS).

Risk of Bias in Individual Studies
For observational studies, two independent reviewers (S.M.E. and I.A.Z.) assessed the risk of bias using the Risk of Bias in Non-randomized Studies of Interventions (ROBINS-I) tool. The tool examines seven domains as a possible source of bias: (i) confounding, (ii) selection of participants, (iii) classification of interventions, (iv) deviations from intended interventions, (v) missing data, (vi) measurement of outcomes, and (vii) selection of reported results. These domains are examined across three different levels (pre-intervention, at intervention, and post-intervention). For each domain, multiple standardized signaling questions are answered with "yes", "probably yes", "probably no", "no", and "no information". Based on these answers, a domain-level judgement of bias is formulated and the risk of bias for each domain can be characterized as "low risk", "moderate risk", "serious risk", "critical risk", or "no information". Finally, an overall risk of bias judgement is made for each study using the same terms as for the domain-level judgments [20].
Cancers 2021, 13, 695
For RCTs, two independent reviewers (S.M.E. and I.A.Z.) assessed the risk of bias using the Risk of Bias 2 (RoB 2) tool for randomized trials. The tool examines five domains as a possible source of bias: (i) randomization process, (ii) deviations from intended interventions, (iii) missing outcome data, (iv) measurement of the outcome, and (v) selection of the reported result. For each domain, multiple standardized signaling questions are answered with "yes", "probably yes", "probably no", "no", and "no information". Based on these answers, a domain-level judgement of bias is formulated and the risk of bias for each domain can be characterized as "low risk", "some concerns", or "high risk". Finally, an overall risk of bias judgement is made for each study using the same terms as for the domain-level judgments [21].

Data Pooling
Continuous variables were summarized as means and standard deviations (SDs), and categorical variables were summarized as frequencies and percentages. We applied the methods described by Hozo et al. and Wan et al. to calculate means and SDs when continuous variables were reported as medians and ranges or medians and interquartile ranges, respectively [22,23]. All relative rates were calculated based on the available data for each variable of interest. All data were handled according to principles described in the Cochrane Handbook [24]. All time-to-event outcomes were summarized as hazard ratios (HRs) and 95% confidence intervals (CIs). Publication bias was assessed using funnel plots, which were examined for asymmetry. All statistical analyses were conducted with Stata IC 16.0 (StataCorp LLC, College Station, TX, USA) and R (Version 3.6.1) [25]. Statistical significance was set at 0.05 and all p-values were two-tailed.
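The conversion from medians to means and SDs can be sketched as follows. This is a minimal illustration, assuming the commonly cited closed-form approximations attributed to Wan et al. (median with interquartile range) and Hozo et al. (median with range); the function names and the sample-size cut-offs used here are assumptions for illustration and may differ from the exact variants the authors applied.

```python
from math import sqrt
from statistics import NormalDist


def wan_from_median_iqr(q1, median, q3, n):
    """Approximate mean and SD from the median, quartiles, and sample size."""
    mean = (q1 + median + q3) / 3
    # Expected upper-quartile position (in SD units) for a normal sample of size n
    z = NormalDist().inv_cdf((0.75 * n - 0.125) / (n + 0.25))
    sd = (q3 - q1) / (2 * z)
    return mean, sd


def hozo_from_median_range(low, median, high, n):
    """Approximate mean and SD from the median, range, and sample size."""
    # Assumed rule: use the weighted formula for small samples, the median otherwise
    mean = (low + 2 * median + high) / 4 if n <= 25 else median
    if n <= 15:
        variance = ((low - 2 * median + high) ** 2 / 4 + (high - low) ** 2) / 12
        sd = sqrt(variance)
    elif n <= 70:
        sd = (high - low) / 4
    else:
        sd = (high - low) / 6
    return mean, sd


# Hypothetical age summaries, not taken from any included study
print(wan_from_median_iqr(50, 60, 70, n=100))
print(hozo_from_median_range(40, 60, 90, n=50))
```

Both approximations assume roughly symmetric, near-normal data, so converted values for strongly skewed variables should be read with caution.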

Reconstruction of Individual Patient Survival Data
We used the methods described by Guyot et al. to reconstruct IPD from the survival curves of all eligible studies for the long-term survival outcomes (OS, PFS, CSS) [26,27]. Vector and raster images of the Kaplan-Meier survival curves were pre-processed and digitized, so that the step function values and timing of steps could be extracted. Survival IPD were then reconstructed based on the numerical solutions to the inverted Kaplan-Meier product-limit equations. When the censoring pattern was not provided, we assumed that it was independent of failure time, and thus constant within each time interval [26]. Additional data, such as the number of patients at risk at every time interval or the total number of events, were used to further increase the accuracy of our calculations for the time-to-event data, when available [28]. Departures from monotonicity were detected using isotonic regression and corrected with a pool-adjacent-violators algorithm [26,27]. For every individual study, we compared summary statistics from our reconstructed IPD and curves (e.g., survival percentages at various time points, median survival time, total number of events, number-at-risk tables) with those reported in the original publications to ensure that they were accurate. The Kaplan-Meier method was used to calculate the OS, PFS, and CSS. Both semiparametric (i.e., Cox proportional hazards regression model) and non-parametric methods (i.e., restricted mean survival time [RMST]) were used to assess between-group differences.
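Two building blocks of this reconstruction can be sketched in a few lines. The example below is a simplified illustration, not the full iterative algorithm of Guyot et al.: it shows an equal-weight pool-adjacent-violators pass that forces digitized survival values to be non-increasing, and the inverted product-limit step that recovers an event count from two consecutive survival values when no censoring occurs between them. The function names are hypothetical.

```python
def enforce_non_increasing(values):
    """Pool-adjacent-violators fit of a non-increasing sequence (equal weights)."""
    blocks = []  # each block stores [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge while the previous block mean is below the current block mean,
        # since that would make the fitted curve increase
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


def events_from_km_step(n_at_risk, s_prev, s_curr):
    """Invert S_curr = S_prev * (1 - d / n) for the event count d (no censoring)."""
    return round(n_at_risk * (1 - s_curr / s_prev))


# Hypothetical digitized survival probabilities with one digitization artifact
digitized = [1.0, 0.90, 0.92, 0.80]
print(enforce_non_increasing(digitized))
print(events_from_km_step(100, 1.0, 0.90))
```

In a full reconstruction, the event counts would additionally be reconciled with the reported numbers at risk and the assumed censoring pattern within each interval.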
The primary analysis for the OS, PFS, and CSS was performed using the Cox proportional hazards regression model, in which every patient within each individual study is assumed to be similarly failure prone to other patients belonging to that study. For these Cox models, the proportional hazards assumption was verified holistically from several assessments using the Grambsch-Therneau test for a non-zero slope, as well as by plotting scaled Schoenfeld residuals, log-log survival plots, and predicted versus observed survival functions. We plotted survival curves using the Kaplan-Meier product limit method and compared the HRs and 95% CIs of each group.
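One common way to formalize the assumption that patients within a study share the same proneness to failure is a shared-frailty Cox model. The notation below is an illustrative assumption rather than the authors' stated specification: $i$ indexes patients, $j$ indexes studies, $x_{ij}$ equals 1 for the CN group and 0 otherwise, $h_0(t)$ is the baseline hazard, and $z_j > 0$ is a study-level frailty term whose distribution is not specified here.

```latex
h_{ij}(t) = h_0(t)\, z_j \exp\!\left(\beta x_{ij}\right),
\qquad
\mathrm{HR} = \exp(\beta)
```

Under this formulation, an HR below 1 corresponds to a lower hazard in the CN group after allowing each study its own multiplicative shift in baseline risk.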
An alternative approach to analyzing time-to-event outcomes when non-proportional hazards are present is the RMST, which can be intuitively interpreted as the mean life expectancy up to a given time frame [29–31]. Accordingly, the life expectancy difference (LED) is the measure of the between-group RMST difference and expresses the absolute gain or loss in life expectancy, while the life expectancy ratio (LER) is the measure of the between-group RMST ratio and expresses the relative gain or loss in life expectancy [30]. We computed the RMST using the naïve Kaplan-Meier method which ignores study level effects, as it has been shown to always be unbiased in all meta-analytical scenarios [32].
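The RMST, LED, and LER described above can be illustrated with a short sketch. It assumes the naïve approach of pooling all patients per group and integrating the Kaplan-Meier step function up to a truncation time; the function names, the truncation time, and the toy data are hypothetical and do not come from the included studies.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (event=1, censored=0)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def rmst(times, events, tau):
    """Area under the Kaplan-Meier step function from 0 to tau."""
    area = 0.0
    prev_t, prev_s = 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)


# Toy follow-up times in months; every patient has an observed event
cn_rmst = rmst([6, 12, 18, 24, 30], [1, 1, 1, 1, 1], tau=24)
no_cn_rmst = rmst([3, 6, 9, 12, 24], [1, 1, 1, 1, 1], tau=24)
led = cn_rmst - no_cn_rmst   # life expectancy difference
ler = cn_rmst / no_cn_rmst   # life expectancy ratio
print(cn_rmst, no_cn_rmst, led, ler)
```

Because the RMST is an area under the survival curve, the LED is expressed in the same time units as the follow-up, which makes it easier to communicate than a hazard ratio.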

Two-Stage Survival Meta-Analyses
As a sensitivity analysis, we calculated summary HRs and 95% CIs for all individual studies based on the reconstructed IPD, and pooled them under the conventional "two-step" frequentist meta-analysis for all three long-term survival outcomes (OS, PFS, and CSS) [33]. We used the DerSimonian-Laird random-effects model to account for the significant clinical heterogeneity across the included studies, derived from factors such as the type of targeted therapy used, as well as the order of and the time intervals between CN and targeted therapy initiation. A subgroup analysis was performed depending on whether the included studies satisfied the proportional hazards assumption or not; the subgroup of studies that satisfied this criterion was considered to yield less biased pooled HR estimates. Between-study heterogeneity was assessed using the Cochran Q and the I² statistic. High heterogeneity was defined with a significance level of p < 0.05 and an I² value of ≥50%.
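The second stage of such an analysis can be sketched as follows. This is a minimal illustration of DerSimonian-Laird pooling with Cochran Q and I², assuming each study contributes a log hazard ratio and a standard error recovered from its 95% CI; the function names and the three study-level HRs are hypothetical and are not results from this meta-analysis.

```python
from math import exp, log, sqrt


def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error of log(HR) recovered from a 95% CI on the HR scale."""
    return (log(upper) - log(lower)) / (2 * z_crit)


def dersimonian_laird(log_hrs, ses):
    """Random-effects pooled HR with Cochran Q, tau^2, and I^2 (in percent)."""
    k = len(log_hrs)
    w = [1 / se ** 2 for se in ses]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran Q
    df = k - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (se ** 2 + tau2) for se in ses]      # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    se_pooled = sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return {
        "hr": exp(pooled),
        "ci": (exp(pooled - 1.96 * se_pooled), exp(pooled + 1.96 * se_pooled)),
        "q": q,
        "tau2": tau2,
        "i2": i2,
    }


# Hypothetical study-level results: (HR, lower 95% limit, upper 95% limit)
studies = [(0.50, 0.40, 0.63), (0.70, 0.55, 0.89), (0.60, 0.45, 0.80)]
result = dersimonian_laird(
    [log(hr) for hr, _, _ in studies],
    [se_from_ci(lo, hi) for _, lo, hi in studies],
)
print(result)
```

When the estimated between-study variance is zero, the random-effects weights reduce to the fixed-effect weights and both models give the same pooled HR.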

Bayesian Meta-Analysis
We also performed a Bayesian one-stage random-effects meta-analysis to reflect our uncertainty regarding the potential survival benefits of CN, using an uninformative prior distribution $\beta \sim N(0, 10^{10})$, $\tau^2 \sim \Gamma(0.001, 0.001)$. The analysis was performed in R using the spBayesSurv package. We calculated the posterior median HRs and 95% credible intervals (CrIs) for each outcome (OS, PFS, and CSS) and compared them to the results of the one-stage frequentist meta-analysis.
As a second sensitivity analysis, we performed a two-stage Bayesian random-effects meta-analysis in R using the rstanarm package. We used the Tibshirani prior [34] as an uninformative prior and the half-normal prior (0,5) as a weakly informative prior. We calculated the posterior median HRs and 95% CrIs for each outcome (OS, PFS, and CSS) and compared them to the results of the two-stage frequentist meta-analysis.

Subgroup Analysis According to Cytoreductive Nephrectomy Timing
A subgroup analysis was initially planned to compare outcomes between upfront and deferred CN, due to the ongoing debate regarding the optimal timing of CN. However, none of the included studies specifically reported outcomes on deferred CN. Therefore, we limited our subgroup analysis to studies reporting on upfront CN, after excluding studies with mixed (upfront and deferred) CN groups. We performed both one-stage frequentist and Bayesian meta-analyses using the same methodology as for the primary analyses described above.

Study Selection and Characteristics
After removing all duplicates, we identified 998 unique articles through our systematic search. We determined that 96 articles were relevant based on their titles and abstracts, and further evaluated their full texts for eligibility. Fifteen studies (fourteen retrospective cohort studies and one RCT) fulfilled the pre-determined inclusion criteria and were included in our meta-analysis (Figure 1) [15–17,35–46]. On one occasion, data from two studies with overlapping populations reporting on different outcomes were combined [39,47]. A total of 2234 patients received CN combined with targeted treatment, while 1756 patients received targeted therapy alone. Detailed study characteristics and patient demographics are shown in Table 1, and clinical characteristics and the types of targeted therapy used are shown in Table 2.

Risk of Bias Assessment
We assessed the individual risk of bias of 14 observational studies using the ROBINS-I tool. The overall risk of bias was determined to be low in one study [41], moderate in twelve studies [15,17,35–40,42–44,46] and serious in one study [45]. No studies were found to be at critical risk of bias (Figure 2A,B).
We assessed the individual risk of bias of one RCT [16] using the RoB 2 tool. The overall risk of bias was determined to be high, stemming from deviations from the intended interventions, while some concerns were present regarding the randomization process (Figure 2C,D).

Individual Patient Data and Survival Curve Reconstruction
The Kaplan-Meier curves for each outcome (OS, PFS, and CSS) were appropriately processed and digitized. A total of sixteen OS curves, seven PFS curves, and three CSS curves were reconstructed. A side-by-side comparison of our reconstructed Kaplan-Meier curves and those found in the original publications is provided in Supplemental Data File 2. Using a previously validated methodology, we reconstructed IPD from the survival curves of each outcome (Supplemental Data File 3).

One-Stage Frequentist Survival Meta-Analysis
We used the Cox proportional hazards model for our main analysis of all outcomes (OS, PFS, and CSS), since we did not detect any violation of the proportionality-of-hazards assumption upon a holistic assessment using the Grambsch-Therneau test and by visualizing scaled Schoenfeld residuals, log-log survival plots, and predicted versus observed survival curves (Supplemental Data File 4). We nevertheless carried out secondary analyses with non-parametric methods (i.e., RMST), according to our protocol, even though the proportionality-of-hazards assumption was not rejected.

Cancer-Specific Survival
The CSS curve of the pooled patient cohorts either receiving CN plus targeted therapy (n = 313) or targeted therapy alone (n = 278) derived from three studies [15,17,35] is presented in Figure 3C. The median CSS was 32.3 months (95% CI: 27.0 to 37.0) in the CN group and 17.6 months (95% CI: 12.7 to 20.9) in the non-CN group. Patients receiving the combination of CN and targeted therapy had significantly lower risk of death from mRCC compared to those receiving targeted therapy alone (HR: 0.63, 95% CI: 0.53 to 0.75, p < 0.0001).
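As a consistency check on figures like these, the standard error, z statistic, and two-sided p-value can be approximately recovered from a reported HR and its 95% CI. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which may not match the exact procedure behind the reported p-value; the function name is hypothetical, and the inputs are the rounded CSS values quoted above.

```python
from math import erfc, log, sqrt


def z_and_p_from_hr(hr, lower, upper, z_crit=1.959964):
    """Back-calculate SE(log HR), the z statistic, and a two-sided p-value."""
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se
    p = erfc(abs(z) / sqrt(2))  # two-sided p-value under a standard normal
    return se, z, p


# Reported CSS result: HR 0.63, 95% CI 0.53 to 0.75
se, z, p = z_and_p_from_hr(0.63, 0.53, 0.75)
print(f"SE = {se:.4f}, z = {z:.2f}, p = {p:.2e}")
```

With these inputs the back-calculated p-value falls below 0.0001, in line with the reported value; small discrepancies are expected because the published HR and CI are rounded to two decimals.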

Two-Stage Frequentist Survival Meta-Analysis
In the two-stage frequentist meta-analysis, CN combined with targeted therapy was associated with superior OS (HR: 0.59, 95% CI: 0.49 to 0.71, p < 0.001, I² = 79.31%) compared to targeted therapy alone. In the subgroup analysis, the combination of CN and targeted therapy was associated with superior OS in the subgroup satisfying the proportionality-of-hazards assumption (HR: 0.50, 95% CI: 0.44 to 0.58, p < 0.001, I² = 29.60%), but was inconclusive in the subgroup violating the proportionality-of-hazards assumption (HR:

Discussion
In this systematic review and IPD meta-analysis, we showed that the combination of CN and targeted therapy in mRCC is associated with superior long-term survival outcomes compared to targeted therapy alone. Both OS and CSS were superior in the group receiving CN. In contrast, no clinically meaningful differences were detected in the PFS between the groups.
Although several meta-analyses have attempted to address this topic in the past, limitations in their methodology have precluded any definitive conclusions [48–52]. In some cases, multiple overlapping populations were included [48–52] and eligible studies were omitted [48,50–52], resulting in significant bias due to the disproportionate representation of certain patient populations. An additional limitation of studies analyzing databases (e.g., the Surveillance, Epidemiology, and End Results database and the National Cancer Data Base) included in the previous meta-analyses [48,50,52] is the use of surrogate coding markers, such as the year of diagnosis or the receipt of systemic therapy, to identify patients receiving targeted therapy [53,54]. Therefore, these studies may have included a significant proportion of patients that did not actually receive targeted therapy. To avoid these shortcomings, we ensured that every population was represented only once in our meta-analysis and we only included studies with confirmed targeted therapy use in their intention-to-treat population. To our knowledge, this is the first meta-analysis on the topic to incorporate IPD, which is considered the "gold-standard" method to meta-analyze time-to-event outcomes, and the first to investigate additional survival outcomes other than OS, such as CSS [55]. The higher precision achieved by using IPD compared to aggregate study level data is showcased by the inconclusive or less precise results of our two-stage meta-analyses, which emulate aggregate data meta-analyses, compared with the much more precise results of our primary IPD meta-analyses with both frequentist and Bayesian approaches.
Even though CN has been a staple in the treatment of mRCC for decades, the exact mechanism behind its survival benefit remains unclear. A variety of theories have been proposed over the years. An intuitive explanation is that CN reduces the overall tumor load and thus prolongs the period needed for it to reach lethal levels [14]. There is also evidence suggesting that CN further opposes tumor growth by indirectly affecting the tumor microenvironment. Removing functional nephrons induces a mild systemic metabolic acidosis, which may be enough to overwhelm the acid-base regulation ability of tumor cells, resulting in necrosis [56]. Similarly, many angiogenic factors that promote tumor growth, such as VEGF, decrease following nephrectomy [57]. The interaction between CN and the immune system remains a point of debate. The ability of mRCC to downregulate the immune system through various pathways is well established, and therefore removing the primary tumor may enhance the immune response against the remaining cancer cells [58,59]. However, this effect may be counterbalanced by the ongoing systemic inflammation, which promotes tumorigenesis, as well as the immunosuppressive effects of the surgery itself [59,60]. Although the concurrent presence of widespread inflammation and immunosuppression may initially appear counterintuitive, this interaction is illustrated by C-reactive protein, a marker of inflammation. Indeed, C-reactive protein levels closely correlate with the tumor-induced immunosuppression in mRCC [61]. This finding suggests that the tumor-induced immunosuppression and widespread inflammation are intertwined as part of the generalized immune dysregulation caused by mRCC. CN may be able to partially reverse these effects in some but not all patients. For instance, there is a growing body of evidence suggesting that CN may be less beneficial to patients with generalized inflammation reflected by high C-reactive protein levels [37].
An important consideration when examining the benefits of CN is appropriate patient selection [62]. Being an invasive procedure on an already disease-burdened patient population, CN is associated with significant morbidity and mortality that is higher compared with that of standard nephrectomy [63]. Those with poor baseline characteristics may thus never fully recover from the operation to receive targeted therapy or inevitably experience rapid tumor progression in the immediate post-operative period [64]. This phenomenon may be explained by the direct and indirect effects of a major abdominal operation such as CN, namely the surgery-induced state of immunosuppression and release of growth factors, as well as the potential delay of systemic therapy initiation related to surgical complications, respectively [65]. As a result, the survival benefit in patients with poor prognostic factors is minimal, while the operation itself and the high rate of post-operative complications may significantly impact quality of life [15,42]. Current guidelines regarding CN reflect these concerns and heavily question CN's current role in mRCC management [66]. Consequently, studies showing a survival benefit with CN may be biased in their patient selection. In our pooled sample, the non-CN group had a significantly higher proportion of patients with poor IMDC risk score, Karnofsky score <80%, T3/T4 stage, N+ stage, and >2 metastatic sites, all of which may have confounded the results. In contrast, the ECOG scores between the two groups were similar, while the CN group had more patients with poor MSKCC score. This contradictory finding highlights the lack of a uniform scoring scale to evaluate the patients' baseline surgical risk. Even though numerous prognostic scales have been developed for this purpose during the cytokine era, external validation has shown that they all perform poorly in a targeted therapy-predominant cohort [62].
For this reason, newer scales have been developed during the targeted therapy era; examples include those by McIntosh et al. [67] and Marchioni et al. [68]. However, these remain to be prospectively evaluated and externally validated before they can be implemented into routine clinical practice. As shown in our study, authors resort to a variety of prognostic scales as a substitute to stratify their patients, reflecting the lack of a specialized prognostic model to satisfy this need. This approach may be problematic as these scales were designed for different purposes and are not directly comparable with each other [69]. A post-hoc analysis of the CARMENA trial particularly highlights this concern by suggesting that patients with one but not two IMDC risk factors (both classified as intermediate-risk patients) benefited from CN [70]. Other authors have used their own models to stratify patients based on prognostic factors derived from regression analyses [36,44]. Regardless of the approach, several studies have shown that CN offered a considerable survival advantage, even when accounting for these risk factors by performing subgroup analyses in patients with more favorable prognosis [15,36,42,44]. Even though we were not able to synthesize their results due to the heterogeneity in the stratification method used, this evidence suggests that the benefit of CN stands even after taking selection bias into consideration.
Apart from appropriate patient selection, the timing of CN relative to targeted therapy initiation is another parameter that may affect treatment outcomes. The SURTIME trial comparing upfront and deferred CN hinted that the latter may lead to an OS advantage but failed to provide a definitive answer, presumably due to poor accrual [71]. A study pooling data from multiple trials found an OS advantage of deferred over the upfront approach [72]. A more recent multi-institutional study using real-world data also came to the same conclusion [73]. In contrast, a study using NCDB data suggested an advantage of upfront CN [54]. In our meta-analysis, we were unable to directly compare outcomes between the two approaches, as none of the included studies utilized deferred CN exclusively. After excluding studies with both upfront and deferred CN in their sample, we found that patients undergoing upfront CN followed by targeted therapy still had superior OS and CSS compared to those receiving targeted therapy alone. However, the benefit was smaller compared to the primary analysis that also included deferred CN cases. This finding is in accordance with previous studies suggesting that deferred CN may be optimal in terms of timing, but also shows that upfront CN in select patients may still lead to better outcomes compared to no CN.
The main strength of this study is its robust methodology and large sample size, particularly for the OS analysis. Nonetheless, our results should be interpreted with caution due to the inherent limitations of our study. First, most of the included studies were retrospective in nature. In the context of our study, this is important because it may facilitate immortal time bias [74]. Most included studies used either the time of mRCC diagnosis [15,38,39] or the time of targeted therapy initiation [42,43,45,46] as the starting point in their survival analysis. Therefore, patients receiving deferred CN may have been considered "immortal" up to the point of undergoing CN, while those who were scheduled to receive CN but died before doing so may have been excluded from the CN group, thus skewing the results in favor of CN. In contrast, when patients received upfront CN and the date of targeted therapy initiation was used as the starting point, the results may have been skewed in the opposite direction as the non-CN group is considered "immortal" during the period from CN and its postoperative recovery until targeted therapy initiation [40]. Second, due to the inability to obtain IPD for variables other than survival outcomes, we were not able to perform subgroup analyses for factors with prognostic significance that may influence patient selection for CN. Examples include clear-cell vs. non-clear-cell mRCC [17] and favorable vs. intermediate vs. poor IMDC or MSKCC risk score [42]. Therefore, the results of the crude cohorts may impart a degree of inherent selection bias. Third, some studies included a small proportion of patients that did not receive the planned targeted therapy [44], or subsequently received other forms of systemic therapy, such as immunotherapy [16]. We decided to include these studies regardless, as we deemed the increase in study power to be more important than the slight increase in heterogeneity by deviation from the intended protocol.
Lastly, as with any systematic review, some of the articles did not report on all variables of interest, and thus all relative rates were calculated according to the availability of data.

Conclusions
The combination of CN and targeted therapy for mRCC may lead to superior long-term survival outcomes compared to targeted therapy alone. Careful patient selection based on baseline prognostic factors is required to achieve optimal survival for patients with mRCC.