NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.

Rao M, Yu WW, Chan J, et al. Serum Free Light Chain Analysis for the Diagnosis, Management, and Prognosis of Plasma Cell Dyscrasias [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US); 2012 Aug. (Comparative Effectiveness Reviews, No. 73.)

  • This publication is provided for historical reference only and the information may be out of date.

Serum Free Light Chain Analysis for the Diagnosis, Management, and Prognosis of Plasma Cell Dyscrasias [Internet].


This CER evaluates the SFLC assay as an adjunctive diagnostic and prognostic tool for various PCDs in addition to the standard diagnostic tests for PCDs. The evidence presented was obtained through a systematic review of the published scientific literature using established methodologies as outlined in the AHRQ's Methods Guide for Effectiveness and Comparative Effectiveness Reviews23 and Methods Guide for Medical Test Reviews.24

AHRQ Task Order Officer

The Task Order Officer (TOO) was responsible for overseeing all aspects of this project. The TOO facilitated a common understanding among all parties involved in the project, resolved ambiguities, and fielded all Evidence-based Practice Center (EPC) inquiries regarding the scope and processes of the project. The TOO and other staff at AHRQ reviewed the report for consistency and clarity and to ensure that it conforms to AHRQ standards.

External Expert Input

During a topic refinement phase, the initial questions were refined with input from a panel of Key Informants. Key Informants included representatives from AACC; experts in renal amyloidosis, clinical chemistry, and general internal medicine and geriatrics; patient advocates; and representatives from the Centers for Medicare and Medicaid Services and a nationwide health insurance company. After a public review of the proposed KQs, we convened a Technical Expert Panel (TEP) consisting of experts (some of whom were Key Informants) in MM and/or AL amyloidosis, clinical chemistry, and general medicine. The TEP served in an advisory capacity to help refine KQs, identify important issues, and define parameters for the review of evidence. Discussions among the EPC, TOO, and Key Informants and, subsequently, the TEP occurred during a series of teleconferences and via email. In addition, input from the TEP was sought during compilation of the report when questions arose about the scope of the review.

Analytic Framework

The five KQs take into account the patient populations, interventions (diagnostic tests/disease monitoring), comparators, outcomes, timing, and settings (PICOTS) that are clinically relevant to the use of the SFLC analysis. Specifically, they pertain to the diagnosis of PCDs, prognosis (i.e., progression from MGUS to MM as well as overall and disease-free survival in patients with a malignant PCD), change in treatment decisions, assessment of response to treatment, and reduction of the need for other diagnostic tests (e.g., bone marrow biopsy).

To guide the development of the KQs, we generated an analytic framework (Figure 1) that maps the specific linkages associating the population (patients with PCD symptoms) and subgroups of interest to the additional tests (i.e., SFLC analysis in addition to traditional testing) and comparator (traditional tests alone), and the outcomes of interest (diagnostic accuracy, prognosis, disease management, reduction of other diagnostic tests, and response to treatment). This framework depicts the chain of logic that evidence must support to link the use of the SFLC assay to improved health outcomes.

Figure 1 is an analytic framework depicting the five Key Questions within the context of PICOTS (populations, interventions [diagnostic tests/disease monitoring], comparators, outcomes, timing, and settings). This flow chart illustrates how Key Questions 1 through 5 address how the addition of serum free light chain (SFLC) analysis to traditional testing (i.e., serum/urine electrophoresis and/or immunofixation), as compared with traditional testing alone, could affect diagnostic accuracy, prognosis prediction, management decisions, overall outcomes, and the need for other diagnostic tests, as well as how SFLC analysis might affect assessment of the risk of progression of monoclonal gammopathy of undetermined significance to multiple myeloma.

Figure 1

Analytic framework for SFLC analysis for the diagnosis, management, and prognosis of PCDs. AL amyloidosis=systemic amyloidosis in which amyloid [A] proteins derived from immunoglobulin light chains [L] are deposited in tissue, KQ=Key Question, MGUS=monoclonal (more...)

Literature Search

We conducted literature searches of studies published from January 1, 2000, through January 31, 2012, in MEDLINE®, the Cochrane Central Register of Controlled Trials, and the Cochrane Database of Systematic Reviews. A start year of 2000 was chosen because the SFLC assay was approved by the FDA in 2001; any reports of clinical use of the assay prior to 2000 would not be representative of the approved test. All English-language studies with adult human participants were screened to identify articles relevant to each KQ. The reference lists of related systematic reviews as well as selected narrative reviews and primary articles were also reviewed for relevant studies. Our search included variations of the terms “immunoglobulin light chain,” “monoclonal light chain,” “serum free light chain,” and “Bence Jones protein” (see Appendix A for complete search strings). TEP members were also invited to provide additional search terms.

Study Selection and Eligibility Criteria

We included published, peer-reviewed articles only. We did not use unpublished data, non–English-language studies, abstracts, or conference proceedings. The consensus of the TEP was not to include unpublished data or studies in the form of single case reports. Case series were included on the basis of the prevalence of the type of PCD (with lower thresholds applied for rarer forms), as long as extractable quantitative data were present. Sample size thresholds were chosen primarily on the basis of practical consideration of available resources and time, taking into consideration the likely yield of available literature. We did not contact authors for additional data.

Abstracts were manually screened, using Abstrackr,25 by one member of the team to ascertain whether they met the predefined eligibility criteria and exclusions (see next paragraph) and were then reviewed by a second member of the team. Articles whose abstracts were relevant, as well as those that did not clearly signal inclusion or exclusion, were retrieved in full text for detailed evaluation to determine eligibility. During full-text evaluation, equivocal articles were read by at least two team members. Articles that were excluded after full-text screening are listed, with the reasons for exclusion, in Appendix B.

Below are the eligibility criteria for study inclusion. No restrictions were placed on the particular type of study designs eligible in each of the KQs, but an overarching requirement was that the study be designed to address the comparative effectiveness of the SFLC assay—that is, compare the assay with (predefined) traditional tests: SPEP, UPEP, SIFE, and UIFE and other tests in common use in a diagnostic panel for PCDs (e.g., bone marrow evaluation, skeletal survey). (Newer tests [e.g., positron emission tomography26] that were not in general use were not addressed.)

The eligibility criteria for study populations included the following:

  • KQ1: studies that addressed adults (≥18 years of age) who had not been diagnosed with a PCD, with or without kidney failure, but who were suspected to have a PCD;
  • KQ2: studies of patients with MGUS;
  • KQ3–5: studies of patients with an existing diagnosis of PCD (MM, NSMM, or AL amyloidosis), with or without disease measurable by means of traditional testing.

For interventions (diagnostic tests/disease monitoring), eligible studies were those involving the SFLC assay as well as the FLC kappa/lambda ratio. For comparators, eligible studies were those involving any kind of traditional testing (i.e., SPEP, UPEP, SIFE, or UIFE; sizing and typing of serum M protein; bone marrow biopsy; or detection of skeletal lesions).

For outcomes, eligible studies were those with the following data:

  • KQ1: measures of diagnostic accuracy, such as sensitivity, specificity, predictive values, likelihood ratios, or area under the receiver operating characteristics curve;
  • KQ2: progression to MM;
  • KQ3: timing, duration, and type of treatment;
  • KQ4: overall survival, disease-free survival, response to treatment or remission (categorized as partial, complete, or stringent complete on the basis of treatment-induced decline in M protein or FLC concentrations11,27), light chain escape, or quality of life; and
  • KQ5: clinic visits, bone marrow biopsies, or skeletal surveys.

Studies could have any length of followup11,27 or any setting (primary or specialty care, in-facility or home, inpatient or outpatient).
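The KQ1 accuracy measures named above all derive from a 2×2 table that cross-classifies the index test against a reference standard. The following is a minimal Python sketch of those definitions. The names `TwoByTwo` and `accuracy_measures` and the counts in the usage example are hypothetical illustrations, not data or methods taken from the included studies.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""
    tp: int  # test positive, disease present
    fp: int  # test positive, disease absent
    fn: int  # test negative, disease present
    tn: int  # test negative, disease absent


def _ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def accuracy_measures(t: TwoByTwo) -> Dict[str, Optional[float]]:
    """Compute standard diagnostic accuracy measures from a 2x2 table."""
    diseased = t.tp + t.fn
    nondiseased = t.fp + t.tn
    return {
        "sensitivity": _ratio(t.tp, diseased),
        "specificity": _ratio(t.tn, nondiseased),
        "ppv": _ratio(t.tp, t.tp + t.fp),
        "npv": _ratio(t.tn, t.tn + t.fn),
        # LR+ = sensitivity / (1 - specificity), written in counts
        "lr_positive": _ratio(t.tp * nondiseased, t.fp * diseased),
        # LR- = (1 - sensitivity) / specificity, written in counts
        "lr_negative": _ratio(t.fn * nondiseased, t.tn * diseased),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only to illustrate the arithmetic.
    example = TwoByTwo(tp=90, fp=20, fn=10, tn=180)
    for name, value in accuracy_measures(example).items():
        print(name, value)
```

Predictive values depend on disease prevalence in the study sample, whereas sensitivity, specificity, and likelihood ratios do not depend on it arithmetically. This is one reason these measures are usually reported together.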

Data Extraction and Data Management

Eight articles were extracted simultaneously by all researchers for training purposes. Subsequently, each study was extracted by one methodologist and this extraction was reviewed and confirmed by at least one other methodologist. Any disagreements were resolved by discussion in team meetings. Data were extracted into tables in Microsoft Word, designed to capture all elements relevant to the KQs. Briefly, we extracted bibliographic data, eligibility criteria, enrollment years, and sample size for all studies. We also extracted population characteristics such as basic demographic data (age, sex, and race or ethnic group), as well as any factors that may have a role in the outcome of PCDs, such as type of PCD, presence of anemia, light chain or M protein type and concentration, organ involvement, treatment and other pertinent characteristics, and test-related characteristics such as diagnostic performance. The forms were tested on several articles and revised before commencement of full data extraction.

Assessment of Risk of Bias

For assessment of risk of bias, we used predefined methods for evaluating study quality pertinent to risk of bias that are common within the EPC Program.23,28,29 Briefly, we used a three-category (A, B, or C) grading system to denote the methodological quality of each study. This system involves a generic grading scheme that is applicable to varying study designs including randomized controlled trials, nonrandomized comparative trials, and cohort and case–control studies.

In the present report, the majority of the studies were related to testing of diagnostic performance and prediction of outcomes; therefore we adapted criteria from formal quality-assessment schemes for diagnostic-accuracy studies (STAndards for the Reporting of Diagnostic accuracy studies [STARD]) and for observational epidemiologic studies (STrengthening the Reporting of OBservational studies in Epidemiology [STROBE]). The modified checklists used for quality assessment are provided in Appendix C, along with how each study fulfilled those criteria and the quality grade assigned to each.

The specific criteria of each grade are as follows:

  • A (good). Quality A studies are those judged to have the least likelihood of bias and their results are considered valid. They possess, at a minimum, the following: a representative study population with both diseased and nondiseased groups, no verification bias, a clear description of the reference test (if applicable), and no selection bias. Ideally, the population, setting, interventions (diagnostic tests/disease monitoring), and comparison groups are well defined and there is appropriate measurement of outcomes, appropriate statistical and analytic methods and reporting, complete and consistent overall reporting, clear accounting of dropouts, and a low dropout rate. For this review of diagnostic test studies, only studies with a sample size of at least 100 patients in total could receive a grade of A; these studies could be either prospective or retrospective.
  • B (fair). Quality B studies are susceptible to some bias but not sufficiently to invalidate results. They do not meet all the minimum criteria in category A, owing to some deficiencies, but none of these are likely to introduce major bias. Quality B studies may be missing information, making it difficult to assess limitations and potential problems.
  • C (poor). Quality C studies have a substantial risk of bias that may invalidate the reported findings. These studies have serious errors in design, analysis, or reporting and contain discrepancies in reporting or have large amounts of missing information.

Quality assessment was performed by the team member responsible for primary data extraction. The quality grade was confirmed by at least one other team member.

Data Synthesis

We summarized all included studies in narrative form and in summary tables (all of which are in the Results section) that succinctly describe the important features of the study population, design, intervention (diagnostic test/disease monitoring), outcomes, results, and study quality. We included diagnostic performance parameters, risk estimates, and their 95 percent confidence intervals and p values where applicable. Results are presented in separate summary tables for each KQ.

We conducted mainly descriptive analyses30 and undertook a qualitative synthesis of studies that addressed the predictive role of the SFLC assay. We did not conduct any meta-analyses of the studies, as there was marked heterogeneity in their designs, populations, and comparisons.

Grading the Body of Evidence for Each KQ

We followed the Methods Guide to grade the strength of the body of evidence (mostly a measure of risk of bias) for each KQ, with modifications, on the basis of our level of confidence that the evidence reflected the true effect for the major comparisons of interest. The strength of evidence was defined as high, moderate, low, or insufficient on the basis of the number of studies, consistency across the studies, and precision of the findings.

We assessed the consistency of the data as either “no inconsistency” or “inconsistency present” (or not applicable if only one study). The direction, magnitude, and statistical significance of the results of all studies were evaluated in assessing consistency, and logical explanations were provided in the presence of equivocal results. We also assessed the precision of the evidence on the basis of the degree of certainty surrounding an effect estimate. A precise estimate was one that would allow for a clinically useful conclusion. An imprecise estimate was one for which the confidence interval was wide enough to preclude a conclusion.
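To make the notion of precision concrete, the sketch below computes a 95 percent confidence interval for a proportion such as sensitivity and shows how the interval narrows as sample size grows. The Wilson score method is used here as an assumption for illustration; the review does not state which interval method the included studies used, and the function name `wilson_interval` and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def wilson_interval(successes: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Same observed sensitivity (90 percent) at two hypothetical sample sizes.
    for detected, diseased in [(9, 10), (90, 100)]:
        low, high = wilson_interval(detected, diseased)
        print(f"{detected}/{diseased}: {low:.3f} to {high:.3f}")
```

With the same point estimate, the interval from 10 patients is roughly three times as wide as the one from 100 patients. An interval that wide may fail to support a clinically useful conclusion.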

Ratings were defined as follows:

  • High. There is high confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect. No important scientific disagreement exists across studies. At least two quality A studies are required for this rating. In addition, there must be evidence regarding objective clinical outcomes.
  • Moderate. There is moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may in fact change the estimate. Little disagreement exists across studies. Moderately rated bodies of evidence contain fewer than two quality A studies or such studies lack long-term outcomes of relevant populations.
  • Low. There is low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate. Underlying studies may report conflicting results. Low rated bodies of evidence could contain either quality B or C studies.
  • Insufficient. Evidence is either unavailable or does not permit a conclusion. There are sparse or no data. In general, when only one study has been published, the evidence is considered insufficient, unless the study is particularly large, robust, and of good quality.

These ratings provide a shorthand description of the strength of evidence supporting the major questions we addressed. However, they by necessity may oversimplify the many complex issues involved in appraising a body of evidence. The individual studies involved in formulating the composite rating may differ in their design, reporting, and quality. The strengths and weaknesses of the individual reports, as described in detail in the text and tables, should also be considered.
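The explicit thresholds in the rating definitions above can be sketched as a simple decision function. This is a deliberately simplified illustration, not the procedure the review team followed: actual grading relied on reviewer judgment. The function name, its parameters, and in particular the rule separating moderate from low (at least one quality A study and no inconsistency) are assumptions introduced here.

```python
from typing import Sequence


def strength_of_evidence(
    quality_grades: Sequence[str],
    consistent: bool,
    precise: bool,
    objective_clinical_outcomes: bool,
    single_study_exceptional: bool = False,
) -> str:
    """Map a body of evidence to high, moderate, low, or insufficient.

    quality_grades holds one grade ("A", "B", or "C") per included study.
    single_study_exceptional marks a lone study judged large, robust,
    and of good quality.
    """
    grades = [g.upper() for g in quality_grades]
    if any(g not in {"A", "B", "C"} for g in grades):
        raise ValueError("grades must be 'A', 'B', or 'C'")

    # No data, or a single study that is not exceptional.
    if not grades or (len(grades) == 1 and not single_study_exceptional):
        return "insufficient"

    n_quality_a = grades.count("A")

    # High requires at least two quality A studies plus objective outcomes.
    if n_quality_a >= 2 and consistent and precise and objective_clinical_outcomes:
        return "high"

    # Assumed boundary: some quality A evidence and little disagreement.
    if n_quality_a >= 1 and consistent:
        return "moderate"

    # Conflicting results, or only quality B and C studies.
    return "low"


if __name__ == "__main__":
    print(strength_of_evidence(["A", "A", "B"], True, True, True))
    print(strength_of_evidence(["B", "C"], True, True, True))
```

A function like this cannot capture the caveat stated above, that composite ratings may oversimplify, so it is best read as a summary of the thresholds rather than a substitute for appraisal.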

Peer Review and Public Commentary

Experts in MM and/or AL amyloidosis and clinical chemistry and individuals representing stakeholder and user communities were invited to provide external peer review of this CER; AHRQ and an associate editor also provided comments. The draft report was posted on the AHRQ Web site for 4 weeks to elicit public comment. We addressed all reviewer comments, revising the text as appropriate, and documented our responses in a disposition of comments report that will be made available 3 months after the Agency posts the final CER on the AHRQ Web site.

