IDs: 733711 [UID] 3217758 [GenBank] 3241928 [RefSeq]
Assembly Release Notes for chimpanzee "Clint", version Pan_tro 3.0

The chimpanzee (Pan troglodytes) genome was originally sequenced to 4x coverage using a captive-born male chimpanzee of West African origin known as "Clint" from the Yerkes Primate Research Center (Atlanta, USA). The revised assembly (Pan_troglodytes-2.1.4) added 2x coverage of whole genome shotgun plasmid reads, which were generated as part of an improvement plan for the existing 4x chimp assembly (Pan_troglodytes-1.0). Both of these prior versions were assembled using the PCAP software (Genome Res. 13(9):2164-70 2003). A very small fraction of this assembly was complemented with BACs from two other chimpanzees. The chromosome Y sequence was finished at the McDonnell Genome Institute, Washington University School of Medicine, with detailed mapping and extensive collaboration with David Page's group at the Whitehead Institute (Hughes et al., Nature, 2005 437:100-3).

For a pure "Clint" version of the chimpanzee genome, we generated 55x coverage of overlapping paired-end 250 bp Illumina reads, 2 lanes of a Chicago library (Dovetail Genomics), and 9x coverage of PacBio long single-molecule reads (P5C3 chemistry). The combined Illumina sequence reads were assembled using the DiscoVAR de novo assembler (Weisenfeld NI et al., Nat Genet. 2014 46(12):1350-5). We attempted to scaffold all contigs from this assembly using in vitro HiC content mapping (Dovetail Genomics). We then filled scaffold gaps where possible using the 9x PacBio reads with PBJelly (English AC et al. PLoS One 2012 7:e47768). The entire assembly was corrected for residual base substitution and small insertion and deletion errors using mapped "Clint" paired-end 250 bp reads with Raccoon (Kuderna et al., unpublished). The de novo assembly comprises 3,554 non-singleton scaffolds with a scaffold N50 length of 27 Mb and a contig N50 length of 334 kb. The total assembled size is 3.02 Gb.
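The N50 statistics quoted in these notes can be illustrated with a short calculation. The sketch below is not the authors' pipeline; it uses the common definition of N50 (the length of the sequence at which the running total of lengths, sorted from longest to shortest, first reaches half of the total assembled size) and also returns how many sequences were needed to get there. The function name `n50` and the toy lengths are assumptions made for illustration only.

```python
def n50(lengths):
    """Return (N50 length, number of sequences needed to reach it).

    Assumes the common definition: sort lengths from longest to shortest
    and stop at the first sequence where the running total reaches at
    least half of the summed length.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count


# Toy scaffold lengths (hypothetical values, not taken from the assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50_length, n50_count = n50(toy_lengths)
print(n50_length, n50_count)
```

Applied to real scaffold or contig lengths, the two returned values correspond to an N50 length and the count of sequences at or above that length.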
To create a chromosomal version of the Pan_tro 3.0 assembly, we first aligned the assembled scaffold sequences to the Pan tro 2.1.4 and human GRCh38 references with Nucmer to initially order and orient them along the Pan tro 2.1.4 chromosomes. The assembled Pan_tro 3.0 genome was also broken into 1 kb segments, which were aligned against the chimpanzee Pan tro 2.1.4 and human genomes using BLAT (Kent 2002) to identify uniquely aligning segments of the chimpanzee genome, both to aid in identifying breakpoints and to confirm alignment localization. Discordant alignments of paired "Clint" fosmid end sequences revealed misassembly events, which were manually corrected. In the final phase, only finished BAC clones from the male "Clint" chimpanzee were integrated into the assembly. Finally, centromeres were placed along each chromosome using the localization data from human.

There are 2.95 Gb (including Ns in gaps) on ordered/oriented chromosomes, 140 Mb on the chr*_random sequences, and 123 Mb on chromosome Un. The scaffold N50 length is 27 Mb (count=39) and the contig N50 length is 334 kb (count=2503). This draft assembly is referred to as Pan_tro 3.0.

Credits:
Funding - The sequence characterization of the chimp genome was funded by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH), and Spanish grant MINECO BFU2014-55090-P (FEDER).
BAC sequencing - McDonnell Genome Institute at Washington University School of Medicine, St Louis, MO.
Sequence generation, assembly and data integration for creation of chromosomal AGP files - Lukas Kuderna and Tomas Marques-Bonet, ICREA at Institut de Biologia Evolutiva (UPF-CSIC), PRBB, 08003 Barcelona, Spain, and LaDeana Hillier, McDonnell Genome Institute at Washington University School of Medicine, St Louis, MO.
Wes Warren, Lars Feuk (Uppsala U), Andrew Sharp (Mt Sinai), Ed Green (Dovetail), Mikkel Schierup (Aarhus U).
Thanks also to Illumina (Bojan Obradovic).
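The step in the notes above where the assembly is broken into 1 kb segments before BLAT alignment can be sketched as follows. This is an illustration under stated assumptions, not the tooling actually used: the function name `split_into_segments`, the `name:start-end` labelling scheme (0-based, end-exclusive), and the choice to keep a shorter final segment are all hypothetical, since the notes do not say how segments were named or how leftover bases were handled.

```python
def split_into_segments(name, sequence, size=1000):
    """Yield (label, subsequence) pairs covering `sequence` in fixed-size windows.

    Labels use a hypothetical "name:start-end" scheme with 0-based,
    end-exclusive coordinates. The last segment may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(sequence), size):
        chunk = sequence[start:start + size]
        yield f"{name}:{start}-{start + len(chunk)}", chunk


# Toy example: a 2,500-base placeholder sequence (not real chimpanzee data).
segments = list(split_into_segments("scaffold_1", "ACGTA" * 500))
for label, chunk in segments:
    print(label, len(chunk))
```

Keeping the source coordinates in each label lets an alignment hit for a segment be mapped back to its position on the original scaffold, which is what makes such segments useful for locating breakpoints.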