Virus Evol. 2018 May 18;4(1):vey007. doi: 10.1093/ve/vey007. eCollection 2018 Jan.

Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

Author information

1
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK.
2
Medical Research Council Centre for Outbreak Analysis and Modelling, Department of Infectious Disease Epidemiology, Imperial College London, London, UK.
3
Wellcome Centre for Human Genetics, Nuffield Department of Medicine, University of Oxford, Oxford, UK.
4
Department of Veterinary Medicine, University of Cambridge, Cambridge, UK.
5
Virus Genomics, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK.
6
Laboratory of Experimental Virology, Department of Medical Microbiology, Center for Infection and Immunity Amsterdam (CINIMA), Academic Medical Center of the University of Amsterdam, Amsterdam, The Netherlands.
7
Stichting HIV Monitoring, Amsterdam, The Netherlands.
8
Department of Mathematics, Imperial College London, London, UK.
9
Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden.
10
Department of Clinical Microbiology, Karolinska University Hospital, Stockholm, Sweden.
11
Division for HIV and Other Retroviruses, Robert Koch Institute, Berlin, Germany.
12
School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
13
Swiss Institute of Bioinformatics, Lausanne, Switzerland.
14
HIV/STI Reference Laboratory, Department of Clinical Science, WHO Collaborating Centre, Institute of Tropical Medicine, Antwerpen, Belgium.
15
Institute for Global Health, University College London, London, UK.
16
Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK.
17
Department of Pathology, Johns Hopkins University, Baltimore, MD, USA.
18
Department of Infectious Disease Epidemiology, Robert Koch Institute, Berlin, Germany.
19
Division of Infectious Diseases and Hospital Epidemiology, University Hospital Zurich, Zurich, Switzerland.
20
Institute of Medical Virology, University of Zurich, Zurich, Switzerland.
21
Department of Infectious Diseases, Helsinki University Hospital, Helsinki, Finland.
22
Division of Intramural Research, NIAID, NIH, Baltimore, MD, USA.
23
INSERM CESP U1018, Université Paris Sud, Université Paris Saclay, APHP, Service de Santé Publique, Hôpital de Bicêtre, Le Kremlin-Bicêtre, France.
24
Kymab Ltd, Cambridge, UK.
25
Division of Infectious Diseases, Department of Medicine, Imperial College London, London, UK.
26
Department of Global Health, Academic Medical Center and Amsterdam Institute for Global Health and Development, Amsterdam, The Netherlands.

Abstract

Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems, we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.
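The abstract does not describe how shiver calls individual bases, so the following is only a minimal Python sketch of the general idea of summarising mapped reads as a consensus sequence plus minority-variant frequencies. It assumes the reads have already been mapped and tallied into per-position base counts; the function name `call_consensus`, the `min_coverage` threshold, the `?` symbol for uncalled positions and the use of `-` for deletions are illustrative assumptions, not shiver's interface or algorithm.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical input format: one Counter per reference position mapping
# base -> number of mapped reads supporting it ('-' means a deletion).
Pileup = List[Counter]


def call_consensus(pileup: Pileup, min_coverage: int = 10,
                   missing: str = "?") -> Tuple[str, List[Dict[str, float]]]:
    """Return a consensus string and per-position base frequencies.

    Illustrative sketch only: positions with fewer than `min_coverage`
    reads are reported as `missing` instead of being guessed, and a
    position whose most common state is a deletion is left out of the
    consensus. Frequencies keep the minority-variant information.
    """
    consensus_chars = []
    frequencies = []
    for counts in pileup:
        depth = sum(counts.values())
        # Record the fraction of reads supporting each base (or deletion).
        frequencies.append({b: n / depth for b, n in counts.items()} if depth else {})
        if depth < min_coverage:
            consensus_chars.append(missing)
            continue
        # Most common state; ties go to the alphabetically first symbol
        # ('-' sorts before letters), which keeps the output deterministic.
        best = max(sorted(counts), key=lambda b: counts[b])
        if best != "-":
            consensus_chars.append(best)
    return "".join(consensus_chars), frequencies


# Usage with made-up counts for four positions.
pileup = [
    Counter({"A": 30, "G": 2}),   # well covered, minority G at 2/32
    Counter({"C": 3}),            # too few reads to call
    Counter({"T": 12, "-": 20}),  # majority of reads carry a deletion
    Counter({"G": 15, "A": 15}),  # tie, resolved alphabetically to A
]
seq, freqs = call_consensus(pileup)
print(seq)            # A?A
print(freqs[0]["G"])  # 0.0625
```

Reporting low-coverage positions as missing, rather than silently filling them from a reference, keeps gaps visible; missing sequence of this kind is what the abstract reports being recovered when reads are mapped to a better-matched reference.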

KEYWORDS:

HIV; bioinformatics; diversity; genome assembly; mapping; next-generation sequencing
