Augmenting Chinese hamster genome assembly by identifying regions of high confidence

Biotechnol J. 2016 Sep;11(9):1151-7. doi: 10.1002/biot.201500455. Epub 2016 Jul 19.

Abstract

Chinese hamster ovary (CHO) cell lines are the dominant industrial workhorses for therapeutic recombinant protein production. The availability of the genome sequences of Chinese hamster and CHO cells will spur further genome and RNA sequencing of producing cell lines. However, mammalian genomes assembled from shotgun sequencing data still contain regions of uncertain quality due to assembly errors. Identifying high-confidence regions in the assembled genome will facilitate its use for cell engineering and genome engineering. We assembled two independent drafts of the Chinese hamster genome: one by de novo assembly from shotgun sequencing reads, and one by re-scaffolding and gap-filling the draft genome from NCBI for improved scaffold lengths and gap fractions. We then used the two independent assemblies to identify high-confidence regions using two different approaches. First, the two independent assemblies were compared at the sequence level to identify their consensus regions as "high confidence regions", which account for at least 78% of the assembled genome. Second, a genome-wide comparison of the Chinese hamster scaffolds with mouse chromosomes revealed scaffolds with large blocks of collinearity, which were also compiled as high-quality scaffolds. Genome-scale collinearity was complemented with EST-based synteny, which also revealed conserved gene order compared to mouse. As cell line sequencing becomes more common, the approaches reported here are useful for assessing assembly quality and can potentially facilitate the engineering of cell lines.
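As a rough illustration of the first approach, the share of an assembly covered by consensus regions can be estimated by merging the aligned intervals on each scaffold and dividing the covered bases by the total assembly length. The sketch below is not the authors' pipeline: the function names, the half-open interval representation, and the toy coordinates are all assumptions made for illustration, and the input is presumed to come from some whole-genome aligner run between the two assemblies.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: half-open [start, end) coordinates on one scaffold.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def consensus_fraction(
    aligned: Dict[str, List[Interval]],
    scaffold_lengths: Dict[str, int],
) -> float:
    """Fraction of the assembly covered by regions aligned to the other assembly."""
    total = sum(scaffold_lengths.values())
    if total == 0:
        return 0.0
    covered = 0
    for scaffold, intervals in aligned.items():
        length = scaffold_lengths[scaffold]
        for start, end in merge_intervals(intervals):
            # Clip to the scaffold bounds before counting bases.
            start, end = max(start, 0), min(end, length)
            covered += max(0, end - start)
    return covered / total


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    lengths = {"scaffold_1": 100, "scaffold_2": 100}
    hits = {"scaffold_1": [(0, 50), (40, 80)], "scaffold_2": [(10, 20)]}
    print(f"consensus fraction: {consensus_fraction(hits, lengths):.2f}")
```

Merging before counting matters because alignment blocks often overlap; summing their raw lengths would overstate the consensus fraction.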

Keywords: Annotation; CHO cells; NextGen sequencing; Scaffolds; Synteny.

Publication types

  • Comparative Study

MeSH terms

  • Animals
  • CHO Cells
  • Chromosome Mapping / methods*
  • Cricetinae
  • Cricetulus
  • Expressed Sequence Tags
  • Genome*
  • Mice
  • Sequence Analysis, DNA / methods*