Nucleic Acids Res. 2019 May 7;47(8):3846-3861. doi: 10.1093/nar/gkz169.

Haplotype-resolved and integrated genome analysis of the cancer cell line HepG2.

Author information

Department of Psychiatry and Behavioral Sciences, Stanford University School of Medicine, Stanford, CA 94305, USA.
Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA.
Division of Oncology, Department of Medicine, Stanford University School of Medicine, Stanford, CA 94305, USA.
Department of Pathology, Stanford University School of Medicine, Stanford, CA 94305, USA.
Genome-scale Measurements Group, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA.
Stanford Genome Technology Center, Stanford University, Palo Alto, CA 94304, USA.
Department of Statistics, Stanford University, Stanford, CA 94305, USA.
School of Computer Science and Engineering, College of Engineering, Pusan National University, Busan 46241, South Korea.
Science and Engineering Faculty, Queensland University of Technology, Brisbane, QLD 4001, Australia.
Department of Biomedical Data Science, Bio-X Program, Stanford University, Stanford, CA 94305, USA.
Department of Health Sciences Research, Center for Individualized Medicine, Mayo Clinic, Rochester, MN 55905, USA.
Tashia and John Morgridge Faculty Scholar, Stanford Child Health Research Institute, Stanford, CA 94305, USA.


HepG2 is one of the most widely used human cancer cell lines in biomedical research and one of the main cell lines of ENCODE. Although the functional genomic and epigenomic characteristics of HepG2 have been extensively studied, its genome sequence has never been comprehensively analyzed and its higher-order genomic structural features are largely unknown. The high degree of aneuploidy in HepG2 makes traditional genome variant analysis methods challenging to apply and partially ineffective. Correct and complete interpretation of the extensive functional genomics data from HepG2 requires an understanding of the cell line's genome sequence and genome structure. Using a variety of sequencing and analysis methods, we identified a wide spectrum of genome characteristics in HepG2: copy numbers of chromosomal segments at high resolution, SNVs and indels (corrected for aneuploidy), regions with loss of heterozygosity, phased haplotypes extending to entire chromosome arms, retrotransposon insertions, and structural variants (SVs), including complex and somatic genomic rearrangements. A large number of SVs were phased, sequence-assembled and experimentally validated. We re-analyzed published HepG2 datasets for allele-specific expression and DNA methylation and assembled an allele-specific CRISPR/Cas9 targeting map. We demonstrate how deeper insights into genomic regulatory complexity are gained by adopting a genome-integrated framework.
