J Immunol. 2017 Mar 15;198(6):2489-2499. doi: 10.4049/jimmunol.1601850. Epub 2017 Feb 8.

Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data.

Author information

1. Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520.
2. AbVitro, Boston, MA 02210.
3. Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06520; steven.kleinstein@yale.edu.
4. Department of Immunobiology, Yale School of Medicine, New Haven, CT 06520.
5. Department of Pathology, Yale School of Medicine, New Haven, CT 06520.

Abstract

Adaptive immunity is driven by the expansion, somatic hypermutation, and selection of B cell clones. Each clone is the progeny of a single B cell responding to Ag, with diversified Ig receptors. These receptors can now be profiled on a large scale by next-generation sequencing. Such data provide a window into the microevolutionary dynamics that drive successful immune responses and the dysregulation that occurs with aging or disease. Clonal relationships are not measured directly and must instead be computationally inferred from these sequencing data. Although several hierarchical clustering-based methods have been proposed, they vary in distance and linkage methods and have not yet been rigorously compared. In this study, we use a combination of human experimental and simulated data to characterize the performance of hierarchical clustering-based methods for partitioning sequences into clones. We find that single linkage clustering has high performance, with specificity, sensitivity, and positive predictive value all >99%, whereas other linkages result in a significant loss of sensitivity. Surprisingly, distance metrics that incorporate the biases of somatic hypermutation do not outperform simple Hamming distance. Although errors were more likely in sequences with short junctions, using the entire dataset to choose a single distance threshold for clustering is near optimal. Our results suggest that hierarchical clustering using single linkage with Hamming distance identifies clones with high confidence and provides a fully automated method for clonal grouping. The performance estimates we develop provide important context for interpreting clonal analyses of repertoire sequencing data and allow rigorous testing of other clonal grouping algorithms.
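
To make the grouping step concrete, here is a minimal Python sketch of single-linkage clustering with Hamming distance, cut at a fixed threshold. It is not the authors' implementation. The sketch assumes the input junction sequences have already been partitioned into groups that share V gene, J gene, and junction length. This is a common preprocessing step assumed here, not taken from the abstract. Under that assumption the sequences are equal length and Hamming distance is defined. Cutting a single-linkage dendrogram at height t yields the connected components of the graph that links every pair at distance <= t, so a union-find pass gives the same partition. The pairwise scoring of sensitivity, specificity, and PPV over sequence pairs is one common way to score a partition and is also an assumption here. The function names are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def single_linkage_clones(seqs: list[str], threshold: float) -> list[int]:
    """Hypothetical helper: single-linkage clusters cut at `threshold`.

    Equivalent to connected components of the graph whose edges join
    sequences at Hamming distance <= threshold. Returns one clone label
    (0, 1, 2, ...) per input sequence, in order of first appearance.
    """
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two components

    labels: dict[int, int] = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(seqs))]


def _ratio(num: int, den: int) -> float:
    return num / den if den else float("nan")


def pairwise_metrics(true_labels: list[int], pred_labels: list[int]) -> dict[str, float]:
    """Hypothetical helper: score a partition over all sequence pairs.

    A pair is a positive if both sequences share a true clone label.
    """
    tp = fp = tn = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
    }


seqs = ["TGTGCGAGAGA", "TGTGCGAGAGT", "TGTGCGTGAGT", "AAAAAAAAAAA"]
pred = single_linkage_clones(seqs, threshold=1)
print(pred)  # [0, 0, 0, 1]
print(pairwise_metrics([0, 0, 1, 2], pred))
```

In this toy example the first and third sequences differ at two positions, but each is within distance 1 of the second. Single linkage therefore chains all three into one clone. Against the illustrative truth labels [0, 0, 1, 2], that merge shows up as lost specificity and PPV while sensitivity stays at 1.0.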

PMID: 28179494
PMCID: PMC5340603
DOI: 10.4049/jimmunol.1601850
[Indexed for MEDLINE]
