Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7

Mol Immunol. 2018 Nov:103:1-6. doi: 10.1016/j.molimm.2018.08.018. Epub 2018 Aug 30.

Abstract

Sequencing of immunoglobulin germline gene loci is challenging, in part because the loci are repetitive and complex, which limits insight into the germline gene repertoire of humans and other species. Next-generation sequencing technology makes it possible to generate immunoglobulin transcript data sets large enough to computationally infer the germline genes from which the transcripts originate. Multiple tools for such inference have been developed; they can be used to construct individual germline gene databases and to discover new immunoglobulin germline genes and alleles. However, these methods face challenges, many of them related to the biological process through which immunoglobulin-coding genes are generated. In particular, the junctional diversity introduced during rearrangement of the immunoglobulin heavy chain variable (IGHV), diversity and joining genes complicates inference of the junction regions, with implications for inference of the 3'-end of IGHV genes. An inference software package that is designed to cope with such diversity may therefore fail to identify novel alleles that differ in these regions from their closest relatives in the starting database. In this study, we computationally inferred one such previously uncharacterized allele, IGHV3-7*02 A318G. This was possible, however, only when different variants of IGHV3-7*02 were included in the inference-initiating database. Importantly, the sequences assigned to the allele strongly supported the presence of the novel allele, but not of the standard IGHV3-7*02 sequence, in the genotype. We thus showed that the starting database affects the germline gene inference process, and that differences in the 3'-end of IGHV genes may remain undetected unless specific, non-standard procedures are used to address them. We suggest that inferred genes and alleles be confirmed, for example by examining the nucleotide composition of the 3'-bases of the inference-supporting sequence reads.
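The two ideas in the abstract (seeding the starting database with 3'-end variants of an allele, and checking the base composition at a 3'-position in the supporting reads) can be illustrated with a minimal Python sketch. It is not the procedure or software used in the study. The function names, the toy sequence `TOY*01`, the plain 1-based ungapped position numbering, and the assumption that reads are already aligned to the start of the germline sequence and written in upper case are all hypothetical choices made for illustration.

```python
from collections import Counter

BASES = "ACGT"


def three_prime_variants(name, seq, n_tail):
    """Return {name: sequence} holding the original sequence plus every
    single-base substitution within its last n_tail bases.
    Variant names use 1-based positions, e.g. NAME_A11G (hypothetical scheme)."""
    variants = {name: seq}
    start = max(len(seq) - n_tail, 0)
    for i in range(start, len(seq)):
        ref = seq[i]
        for alt in BASES:
            if alt != ref:
                variants[f"{name}_{ref}{i + 1}{alt}"] = seq[:i] + alt + seq[i + 1:]
    return variants


def base_composition(reads, position):
    """Count the bases observed at a 1-based position across reads.
    Reads assumed pre-aligned to the germline start; reads that end
    before the position (e.g. trimmed during rearrangement) are skipped."""
    counts = Counter()
    for read in reads:
        if len(read) >= position:
            counts[read[position - 1]] += 1
    return counts


# Toy data only: not a real IGHV sequence and not real reads.
germline = "ACGTACGTAGA"
variants = three_prime_variants("TOY*01", germline, n_tail=2)

reads = ["ACGTACGTAGG", "ACGTACGTAGG", "ACGTACGTAGA", "ACGTACGTA"]
counts = base_composition(reads, position=11)

print(sorted(variants))
print(counts.most_common())
```

In a real analysis the reads would be those assigned to the allele by the inference tool, and a position dominated by a base other than the one in the database sequence would be a reason to examine the inferred allele more closely.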

Keywords: Antibody; Bioinformatics; Germline gene allelic diversity; Germline gene inference; Immunoglobulin germline gene.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Alleles
  • Base Sequence
  • Computational Biology / methods*
  • Gene Rearrangement
  • Genes, Immunoglobulin / genetics
  • Genotype
  • Germ Cells / immunology
  • Germ Cells / metabolism
  • High-Throughput Nucleotide Sequencing / methods
  • Humans
  • Immunoglobulin Heavy Chains / chemistry
  • Immunoglobulin Heavy Chains / genetics*
  • Immunoglobulin M / genetics
  • Immunoglobulin Variable Region / chemistry
  • Immunoglobulin Variable Region / genetics*
  • Multigene Family*
  • Sequence Homology, Nucleic Acid

Substances

  • Immunoglobulin Heavy Chains
  • Immunoglobulin M
  • Immunoglobulin Variable Region