IDs: 132581 [UID] 975068 [GenBank] 1024848 [RefSeq]
Chlorocebus aethiops sabeus (vervet) Sequence Assembly Release Notes

The vervet DNA for shotgun sequencing and for BAC libraries is derived from an adult male vervet monkey (Chlorocebus aethiops sabeus; animal ID 1994-021) within the vervet research colony housed a ... the Wake Forest Primate Facility to create the BAC library CHORI-252. A total of 362,969 BAC end sequences have been generated from this library. A total of 143 CHORI-252 BACs (approx. 18 Mb) have been finished and submitted to GenBank; of these, 29 were finished and submitted for the MHC region.

Whole-genome sequences were generated on the Roche 454 Titanium instrument at the following coverage levels (based on a vervet genome size of approx. 2.9 Gb): fragment 10X, 3-kbp 8X, and 8-kbp 1X. Total genome sequence coverage on the Illumina HiSeq instrument was 95X (45X fragment, 45X 3-kb, and 5X 8-kb).

Two independent assemblies were built from the appropriate sequence data using the ALLPATHS (Broad Institute) and Newbler (Roche) assemblers. Based on its superior contig and scaffold contiguity, the ALLPATHS assembly was chosen as the reference. The unique sequences from the Newbler assembly were then merged into the ALLPATHS assembly using graph accordance methods (Yao et al., Bioinformatics, 2011 Oct 23).

Post assembly, we integrated 170 finished BACs (including the MHC region) into the 1.0 assembly. The top scaffold to which each BAC mapped was identified by MEGABLAST (-e 1e-20 -W 200 -p 98), and the contigs of that scaffold to which the BAC mapped were identified by BLASTN (-W150 -F F). A Perl script was used to create a new contig for each BAC, extend that contig if the 5' and 3' overlapping contigs were longer than the BAC, and adjust the flanking gaps accordingly. We then sorted scaffolds by decreasing length, assigned new sequence identifiers to contigs and scaffolds, and extended 20-bp and 50-bp gaps to 100 bp, per NCBI guidelines.
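The final clean-up step described above (sort scaffolds by decreasing length, assign new identifiers, widen short gaps to 100 bp) can be sketched in a few lines of Python. This is not the Perl script the notes mention, whose code is not given: the `Scaffold` type, the `("contig", length)` / `("gap", length)` component tuples, the `"Scaffold"` name prefix, and both functions are hypothetical, loosely modelled on AGP component lines. Contig renaming and the BAC-splicing step are omitted.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical model: each component is ("contig", length) or ("gap", length),
# listed in scaffold order, loosely mirroring AGP component lines.
Component = Tuple[str, int]

TARGET_GAP = 100       # gap size the notes cite from NCBI's guideline
SHORT_GAPS = {20, 50}  # gap sizes the notes say were extended


@dataclass
class Scaffold:
    name: str
    components: List[Component] = field(default_factory=list)

    @property
    def length(self) -> int:
        """Scaffold length in bp, counting gaps."""
        return sum(size for _, size in self.components)


def sort_and_rename(scaffolds: List[Scaffold], prefix: str = "Scaffold") -> List[Scaffold]:
    """Sort scaffolds by decreasing length and give them sequential names."""
    ordered = sorted(scaffolds, key=lambda s: s.length, reverse=True)
    return [Scaffold(f"{prefix}{i}", list(s.components)) for i, s in enumerate(ordered, start=1)]


def extend_short_gaps(scaffold: Scaffold) -> Scaffold:
    """Return a copy in which every 20-bp or 50-bp gap is widened to 100 bp."""
    widened = [
        (kind, TARGET_GAP if kind == "gap" and size in SHORT_GAPS else size)
        for kind, size in scaffold.components
    ]
    return Scaffold(scaffold.name, widened)


# Toy input: two scaffolds containing placeholder gaps.
scaffolds = [
    Scaffold("old_a", [("contig", 5000), ("gap", 20), ("contig", 3000)]),
    Scaffold("old_b", [("contig", 9000), ("gap", 50), ("contig", 4000)]),
]

final = [extend_short_gaps(s) for s in sort_and_rename(scaffolds)]
for s in final:
    print(s.name, s.length, s.components)
# Scaffold1 13100 [('contig', 9000), ('gap', 100), ('contig', 4000)]
# Scaffold2 8100 [('contig', 5000), ('gap', 100), ('contig', 3000)]
```

Because positions are implied by component order in this model, widening a gap needs no separate coordinate update; in a real AGP file every downstream start/end coordinate would shift. The sketch sorts before widening, following the order given in the notes; since widening adds at most 80 bp per gap, sorting afterwards would be the safer choice if near-equal scaffolds must stay strictly length-ordered.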
In the final assembly, referred to as Chlorocebus_sabeus 6.0.3, there were 162,907 contigs with an N50 contig length of 88 kb, and 2,205 supercontigs with an N50 supercontig length of 45 Mb. A total of 2.73 Gb of sequence was assembled in contigs. Including estimated gap sizes, over 2.74 Gb were ordered and oriented along chromosomes and 27.6 Mb along the CAE*_random chromosomes, with only 18.34 Mb remaining unlocalized. After organizing Chlorocebus_sabeus 6.0.3 into chromosomal AGP files, we labeled this first vervet release as 1.0.

*******************************************
Chlorocebus aethiops sabeus Sequence and Assembly Credits

DNA source - Dr. Jay Kaplan, Wake Forest Primate Facility, Wake Forest, NC.
Genome sequence - The Genome Institute, Washington University School of Medicine, St Louis, MO, and Department of Human Genetics, McGill University, Montreal, Canada.
Sequence assembly - The Genome Institute, Washington University School of Medicine, St Louis, MO.
BAC library - Dr. Pieter DeJong, CHORI, Oakland, CA.
Assembly curation - Jessica Wasserscheid, Nikoleta Juretic, Dr. Ken Dewar, McGill University, Montreal, QC, Canada; LaDeana Hillier, The Genome Institute, Washington University School of Medicine, St Louis, MO.
FISH mapping data - Mariano Rocchi, Department of Biology, University of Bari, Bari, Italy.
cDNA data - RNA source was Dr. Nelson Freimer, Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, CA, USA.

Funding for the sequence characterization of the vervet genome is being provided by NHGRI.

Author list: Nelson Freimer, George Weinstock, Richard K. Wilson, Wesley C. Warren

*******************************************
Chromosome Lengths:
column 1 = chromosome
column 2 = chromosome length in bp (including estimated gap sizes)

CAE1    126035930
CAE2     90373283
CAE3     92142175
CAE4     91010382
CAE5     75399963
CAE6     50890351
CAE7    135778131
CAE8    139301422
CAE9    125710982
CAE10   128595539
CAE11   128539186
CAE12   108555830
CAE13    98384682
CAE14   107702431
CAE15    91754291
CAE16    75148670
CAE17    71996105
CAE18    72318688
CAE19    33263144
CAE20   130588469
CAE21   127223203
CAE22   101219884
CAE23    82825804
CAE24    84932903
CAE25    85787240
CAE26    58131712
CAE27    48547382
CAE28    21531802
CAE29    24206276
CAEX    130038232
CAEY      6181219

*******************************************
Chlorocebus_sabeus 6.0.3 assembly statistics:

*** Contiguity: Contig ***
Total contig number: 162907
Total contig bases: 2734267806 bp
Average contig length: 16784 bp
Maximum contig length: 1051246 bp
N50 contig length: 88741 bp
N50 contig number: 7870

*** Contiguity: Supercontig ***
Total supercontig number: 2205
Average supercontig length: 1240031 bp
Maximum supercontig length: 126332868 bp
N50 supercontig length: 45002363 bp
N50 supercontig number: 19

Scaffolds > 1M: 147
Scaffolds 250K-1M: 47
Scaffolds 100K-250K: 34
Scaffolds 10-100K: 473
Scaffolds 5-10K: 235
Scaffolds 2-5K: 434
Scaffolds 0-2K: 835
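These figures can be cross-checked with a short Python sketch. It implements the usual N50 definition (sort lengths longest first and stop where the running total first reaches half of all bases) and applies it, purely as an illustration, to the chromosome table, because the per-contig and per-supercontig lengths behind the reported N50 values are not part of these notes. It also confirms that the chromosome lengths sum to roughly 2.744 Gb, in line with the "over 2.74 Gb" figure, and that the scaffold size bins add up to the reported 2205 supercontigs. The function `n50` and the two dictionaries are illustrative names, not part of any published pipeline, and the tool that produced the 6.0.3 statistics may handle ties differently.

```python
from typing import List, Tuple


def n50(lengths: List[int]) -> Tuple[int, int]:
    """Return (N50 length, N50 number) for a list of sequence lengths.

    Sequences are sorted longest first and their lengths accumulated; the
    N50 length is the length of the sequence at which the running total
    first reaches half of all bases, and the N50 number is how many
    sequences were needed to get there.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
    raise ValueError("n50() requires a non-empty list of lengths")


# Chromosome lengths (bp, including estimated gaps), copied from the table above.
CHROM_LENGTHS = {
    "CAE1": 126035930, "CAE2": 90373283, "CAE3": 92142175, "CAE4": 91010382,
    "CAE5": 75399963, "CAE6": 50890351, "CAE7": 135778131, "CAE8": 139301422,
    "CAE9": 125710982, "CAE10": 128595539, "CAE11": 128539186, "CAE12": 108555830,
    "CAE13": 98384682, "CAE14": 107702431, "CAE15": 91754291, "CAE16": 75148670,
    "CAE17": 71996105, "CAE18": 72318688, "CAE19": 33263144, "CAE20": 130588469,
    "CAE21": 127223203, "CAE22": 101219884, "CAE23": 82825804, "CAE24": 84932903,
    "CAE25": 85787240, "CAE26": 58131712, "CAE27": 48547382, "CAE28": 21531802,
    "CAE29": 24206276, "CAEX": 130038232, "CAEY": 6181219,
}

# Scaffold counts per size bin, copied from the statistics above.
SCAFFOLD_BINS = {
    ">1M": 147, "250K-1M": 47, "100K-250K": 34, "10K-100K": 473,
    "5K-10K": 235, "2K-5K": 434, "0-2K": 835,
}

total = sum(CHROM_LENGTHS.values())
print(f"Chromosome total: {total:,} bp")                     # Chromosome total: 2,744,115,311 bp
print(f"Scaffolds in bins: {sum(SCAFFOLD_BINS.values())}")  # Scaffolds in bins: 2205
print("Chromosome N50 (length, number):", n50(list(CHROM_LENGTHS.values())))
# Chromosome N50 (length, number): (107702431, 11)
```

The chromosome-level N50 printed here is not a statistic reported in these notes; reproducing the reported contig N50 (88741 bp, 7870 contigs) would require the full list of contig lengths from the AGP files.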