BMC Bioinformatics. 2016 Oct 6;17(Suppl 13):333.

VDJML: a file format with tools for capturing the results of inferring immune receptor rearrangements.

Author information

1. Department of Clinical Sciences, UT Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX, 75390-9066, USA.
2. Bank of America Corporate Center, 100 North Tryon Street, Charlotte, NC, 28202, USA.
3. Broad Institute, 75 Ames Street, Cambridge, MA, 02142, USA.
4. Institute for Computational Health Sciences, University of California San Francisco, Mission Hall, 550 16th Street, 4th Floor, Box 0110, San Francisco, CA, 94158, USA.
5. Department of Biological Sciences and The IRMACS Centre, Simon Fraser University, 8888 University Drive, Burnaby, V5A 1S6, British Columbia, Canada.
6. Department of Immunobiology, University of Arizona School of Medicine, 1656 E. Mabel Street, P.O. Box 245221, Tucson, AZ, 85724-5221, USA.
7. New Zealand eScience Infrastructure, University of Auckland, Level 10, 49 Symonds Street, Auckland, New Zealand.
8. Texas Advanced Computing Center, Research Office Complex 1.101, J.J. Pickle Research Campus, Building 196, 10100 Burnet Road (R8700), Austin, TX, 78758-4497, USA.
9. Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, 300 George Street, Suite 505, New Haven, CT, 06511, USA.
10. School of Biomedical Engineering, Science and Health Systems and Department of Microbiology and Immunology, College of Medicine, Drexel University, 3141 Chestnut Street, Philadelphia, PA, 19104, USA.
11. The IRMACS Centre (ASB 10905), Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
12. School of Biomedical Engineering, Science and Health Systems, Drexel University, 3141 Chestnut Street, Philadelphia, PA, 19104, USA.
13. Department of Neurology and Neurotherapeutics, UT Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX, 75390-9036, USA.
14. Stanford University School of Medicine, 279 Campus Drive, Stanford, CA, 94305-5101, USA.
15. Department of Molecular Biology and Biochemistry and Faculty of Health Sciences, Simon Fraser University, Blusson Hall, Room 11300, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada.
16. Department of Pathology, Yale School of Medicine, 300 George Street, Suite 505, New Haven, CT, 06511, USA.
17. J. Craig Venter Institute, 4120 Capricorn Lane, La Jolla, CA, 92037, USA.
18. Department of Pathology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, USA.
19. Division of Vaccine Discovery, La Jolla Institute for Allergy and Immunology, 9420 Athena Circle, La Jolla, CA, 92037, USA.
20. Department of Immunobiology, Yale School of Medicine, New Haven, CT, USA.
21. Department of Clinical Sciences, UT Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX, 75390-9066, USA. lindsay.cowell@utsouthwestern.edu.

Abstract

BACKGROUND:

The genes that produce antibodies and the immune receptors expressed on lymphocytes are not germline encoded; rather, they are somatically generated in each developing lymphocyte by a process called V(D)J recombination, which assembles specific, independent gene segments into mature composite genes. The full set of composite genes in an individual at a single point in time is referred to as the immune repertoire. V(D)J recombination is the distinguishing feature of adaptive immunity and enables effective immune responses against an essentially infinite array of antigens. Characterization of immune repertoires is critical in both basic research and clinical contexts. Recent technological advances in repertoire profiling via high-throughput sequencing have resulted in an explosion of research activity in the field. This has been accompanied by a proliferation of software tools for analysis of repertoire sequencing data. Despite the widespread use of immune repertoire profiling and analysis software, there is currently no standardized format for output files from V(D)J analysis. Researchers utilize software such as IgBLAST and IMGT/High V-QUEST to perform V(D)J analysis and infer the structure of germline rearrangements. However, each of these software tools produces results in a different file format, and can annotate the same result using different labels. These differences make it challenging for users to perform additional downstream analyses.

RESULTS:

To help address this problem, we propose a standardized file format for representing V(D)J analysis results. The proposed format, VDJML, gives different V(D)J analysis applications a common representation for their output, so that results can be processed downstream in an application-agnostic manner. The VDJML file format specification is accompanied by a support library, written in C++ and Python, for reading and writing VDJML files.
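
As a rough illustration of the idea of a common format, the sketch below maps two differently labelled tool outputs onto one shared XML record using only the Python standard library. It does not use the VDJML schema or the VDJML support library: the tool names (`tool_a`, `tool_b`), the input labels, the element names, and the helper functions are all hypothetical, and the gene names are example values only.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-tool label maps. These labels are illustrative
# assumptions, not the real output fields of any V(D)J analysis tool.
LABEL_MAPS = {
    "tool_a": {"v_hit": "v_segment", "d_hit": "d_segment", "j_hit": "j_segment"},
    "tool_b": {"V-GENE": "v_segment", "D-GENE": "d_segment", "J-GENE": "j_segment"},
}


def normalize(tool, record):
    """Rename one tool-specific record to the shared label set."""
    mapping = LABEL_MAPS[tool]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def to_xml(read_id, tool, record):
    """Serialize a normalized record as an XML element (made-up schema)."""
    read = ET.Element("read", {"id": read_id, "source": tool})
    for label, name in sorted(normalize(tool, record).items()):
        ET.SubElement(read, label, {"name": name})
    return read


def from_xml(element):
    """Read the segment assignments back, independent of the source tool."""
    return {child.tag: child.get("name") for child in element}


# The same rearrangement as reported by two tools with different labels.
a = to_xml("read1", "tool_a", {"v_hit": "IGHV1-2", "d_hit": "IGHD3-10", "j_hit": "IGHJ4"})
b = to_xml("read1", "tool_b", {"V-GENE": "IGHV1-2", "D-GENE": "IGHD3-10", "J-GENE": "IGHJ4"})

# Round-trip through text, as a downstream consumer reading a file would.
restored = from_xml(ET.fromstring(ET.tostring(a)))
```

Once both records are in the shared form, downstream code can read the segment assignments without knowing which tool produced them, which is the kind of application-agnostic processing the format is meant to enable.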

CONCLUSIONS:

The VDJML suite will allow users to streamline their V(D)J analysis and facilitate the sharing of scientific knowledge within the community. The VDJML suite and documentation are available from https://vdjserver.org/vdjml/. We welcome participation from the community in developing the file format standard, as well as code contributions.

KEYWORDS:

Antigen receptor repertoire; C++; Data sharing; Data standards; Immune repertoire; Python; Repertoire profiling; XML

PMID:
27766961
PMCID:
PMC5073965
DOI:
10.1186/s12859-016-1214-3
[Indexed for MEDLINE]
Free PMC Article
