Pan_tro 3.0

Organism name:
Pan troglodytes (chimpanzee)
Isolate:
Yerkes chimp pedigree #C0471 (Clint)
Sex:
male
BioSample:
SAMN02981217
BioProject:
PRJNA13184
Submitter:
Chimpanzee Sequencing and Analysis Consortium
Date:
2016/05/03
Synonyms:
panTro5
Assembly level:
Chromosome
Genome representation:
full
GenBank assembly accession:
GCA_000001515.5 (latest)
RefSeq assembly accession:
GCF_000001515.7 (suppressed)
RefSeq assembly and GenBank assembly identical:
no
  • Only in RefSeq: chromosome MT (in non-nuclear assembly-unit)
  • Data displayed for GenBank version
WGS Project:
AACZ04
Assembly method:
DiscoVar v. 51280; PBJelly v. 14.9.9
Genome coverage:
6x Sanger; 55x Illumina; 9x PacBio
Sequencing technology:
Sanger; Illumina; PacBio

IDs: 733711 [UID] 3217758 [GenBank] 3241928 [RefSeq]

There are 6 assemblies for this organism

Comment

Assembly Release Notes for chimpanzee "Clint", version Pan_tro 3.0:
 The chimpanzee (Pan troglodytes) genome was originally sequenced to 4X coverage using a captive-born male chimpanzee of West African origin known as "Clint" from the Yerkes Primate Research ... Center (Atlanta, USA). The revised assembly (Pan_troglodytes-2.1.4) incorporates an additional 2X of whole genome shotgun plasmid reads, which were generated as part of an improvement plan for the existing 4X chimp assembly (Pan_troglodytes-1.0). Both of these prior versions were assembled using the PCAP software (Genome Res. 13(9):2164-70 2003). A very small fraction of that assembly was complemented with BACs from two other chimpanzees. The chromosome Y sequence was finished at the McDonnell Genome Institute, Washington University School of Medicine, with detailed mapping and extensive collaboration with David Page's group at the Whitehead Institute (Hughes et al., Nature, 2005 437:100-3).
 For a pure "Clint" version of the chimpanzee genome we generated 55x of overlapping paired 250bp Illumina reads, 2 lanes of a Chicago library (Dovetail Genomics) and 9x of PacBio long single-molecule reads (P5C3 chemistry). The combined Illumina sequence reads were assembled using the DiscoVAR de novo assembler (Weisenfeld NI et al., Nat Genet. 2014 46(12):1350-5). We attempted to scaffold all contigs from this assembly using in vitro HiC content mapping (Dovetail Genomics). We then filled scaffold gaps where possible using the 9x PacBio reads with PBJelly (English AC et al. PloS One 2012 7:e47768). The full assembly was corrected for residual base substitution errors and small insertion and deletion errors using mapped "Clint" paired-end 250bp reads with Raccoon (Kuderna et al., unpublished). The de novo assembly is made up of a total of 3,554 non-singleton scaffolds with an N50 scaffold length of 27Mb (N50 contig length was 334kb). The total assembled size is 3.02Gb.
 To create a chromosomal version of the Pan_tro 3.0 assembly, we first aligned the assembled scaffold sequences to the Pan_troglodytes-2.1.4 and human GRCh38 references with Nucmer to initially order and orient them along the Pan_troglodytes-2.1.4 chromosomes. The assembled Pan_tro 3.0 genome was also broken into 1kb segments, which were aligned against the chimpanzee Pan_troglodytes-2.1.4 and human genomes using BLAT (Kent 2002) to identify uniquely aligning segments of the chimpanzee genome, both to help identify breakpoints and to confirm alignment localization. Discordant alignments of paired "Clint" fosmid end sequences revealed misassembly events, which were manually corrected. In the final phase, only finished BAC clones from the male "Clint" chimpanzee were integrated into the assembly. Finally, centromeres were placed along each chromosome using the localization data from human.
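 The 1kb segmentation step described above can be sketched as follows. This is only an illustration: the release notes do not say whether the windows overlapped or how a trailing partial window was handled, so non-overlapping windows with the short final segment kept are assumptions, and the function name `split_into_segments` is hypothetical.

```python
def split_into_segments(sequence, size=1000):
    """Yield (start, segment) pairs of consecutive, non-overlapping windows.

    Assumption: the last window is kept even if shorter than `size`.
    `start` is the 0-based offset of the segment within `sequence`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(sequence), size):
        yield start, sequence[start:start + size]


# Toy scaffold of 2,500 bases: two full 1kb windows plus one 500-base remainder.
toy_scaffold = "ACGTA" * 500
segments = list(split_into_segments(toy_scaffold))
for start, segment in segments:
    print(start, len(segment))
```

 Each segment would then be written out as its own query sequence for an aligner such as BLAT; keeping the start offset makes it possible to map a unique hit back to scaffold coordinates.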
 There are 2.95 Gb (including Ns in gaps) on ordered/oriented chromosomes, 140 Mb on the chr*_random sequences, and 123 Mb on chromosome Un. The scaffold N50 length is 27 Mb (count=39) and the contig N50 length is 334kb (count=2503). This draft assembly is referred to as Pan_tro 3.0.
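 For readers unfamiliar with the N50 figures quoted above, here is a minimal sketch of how N50 and its associated count (often called L50) are conventionally computed from a list of sequence lengths. It uses toy numbers, not the Pan_tro 3.0 scaffolds, and the function name `n50_l50` is hypothetical rather than part of any assembly tool.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total
    length; L50 is the number of sequences in that set.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half_total:
            return length, count
    raise ValueError("lengths must be non-empty")


# Toy example: total length is 300, so the half-way point is 150.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50_l50(toy_lengths))  # (70, 2)
```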
 Credits:
 Funding - Funding for the sequence characterization of the chimp genome was provided by the National Human Genome Research Institute (NHGRI), National Institutes of Health (NIH) and Spanish grant MINECO BFU2014-55090-P (FEDER).
 BAC sequencing - McDonnell Genome Institute at Washington University School of Medicine, St Louis, MO 
 Sequence generation, assembly and data integration for creation of chromosomal AGP files - Lukas Kuderna and Tomas Marques-Bonet, ICREA at Institut de Biologia Evolutiva (UPF-CSIC), PRBB, 08003 Barcelona, Spain, and LaDeana Hillier, McDonnell Genome Institute at Washington University School of Medicine, St Louis, MO. Wes Warren, Lars Feuk (Uppsala U), Andrew Sharp (Mt Sinai), Ed Green (Dovetail), Mikkel Schierup (Aarhus U). Thanks also to Illumina (Bojan Obradovic).

Global statistics

Total sequence length: 3,231,154,112
Total ungapped length: 3,132,603,083
Gaps between scaffolds: 1,066
Number of scaffolds: 45,510
Scaffold N50: 26,972,556
Scaffold L50: 38
Number of contigs: 72,225
Contig N50: 384,816
Contig L50: 2,195
Total number of chromosomes and plasmids: 25
Number of component sequences (WGS or clone): 74,995
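
The global statistics can be cross-checked against each other with simple arithmetic. The sketch below uses only figures from this record (the spanned-gap count of 26,715 comes from the "All" row of the Assembly statistics table) and assumes the usual convention that each spanned gap splits a scaffold into one additional contig.

```python
# Figures copied from this assembly record.
total_length = 3_231_154_112
ungapped_length = 3_132_603_083
scaffolds = 45_510
contigs = 72_225
spanned_gaps = 26_715  # "All" row of the Assembly statistics table

# Bases that fall inside gaps (runs of Ns) rather than sequenced contigs.
gap_bases = total_length - ungapped_length
gap_fraction = gap_bases / total_length

# Assumption: one extra contig per spanned (within-scaffold) gap.
contigs_expected = scaffolds + spanned_gaps

print(f"gap bases: {gap_bases:,}")
print(f"gap fraction: {gap_fraction:.2%}")
print(f"contig count consistent: {contigs_expected == contigs}")
```

Under these assumptions the gap bases come to 98,551,029, roughly 3% of the total length, and the contig count agrees with the scaffold and spanned-gap counts.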

Global assembly definition

Assembly Unit: Primary Assembly (GCA_000000075.5)

Molecule name | GenBank sequence | | RefSeq sequence | Unlocalized sequences count
Chromosome 1 | CM000314.3 | = | NC_006468.4 | 104
Chromosome 2A | CM000315.3 | = | NC_006469.4 | 41
Chromosome 2B | CM000316.3 | = | NC_006470.4 | 34
Chromosome 3 | CM000317.3 | = | NC_006490.4 | 24
Chromosome 4 | CM000318.3 | = | NC_006471.4 | 43
Chromosome 5 | CM000319.3 | = | NC_006472.4 | 48
Chromosome 6 | CM000320.3 | = | NC_006473.4 | 99
Chromosome 7 | CM000321.4 | = | NC_006474.4 | 391
Chromosome 8 | CM000322.4 | = | NC_006475.4 | 72
Chromosome 9 | CM000323.3 | = | NC_006476.4 | 71
Chromosome 10 | CM000324.3 | = | NC_006477.4 | 85
Chromosome 11 | CM000325.3 | = | NC_006478.4 | 31
Chromosome 12 | CM000326.3 | = | NC_006479.4 | 34
Chromosome 13 | CM000327.3 | = | NC_006480.4 | 21
Chromosome 14 | CM000328.3 | = | NC_006481.4 | 41
Chromosome 15 | CM000329.3 | = | NC_006482.4 | 41
Chromosome 16 | CM000330.3 | = | NC_006483.4 | 85
Chromosome 17 | CM000331.3 | = | NC_006484.4 | 48
Chromosome 18 | CM000332.3 | = | NC_006485.4 | 12
Chromosome 19 | CM000333.3 | = | NC_006486.4 | 49
Chromosome 20 | CM000334.4 | = | NC_006487.4 | 13
Chromosome 21 | CM004190.1 | = | NC_006488.4 | 24
Chromosome 22 | CM000335.3 | = | NC_006489.4 | 46
Chromosome X | CM000336.3 | = | NC_006491.4 | 169
Chromosome Y | DP000054.3 | = | NC_006492.4 | 11
unplaced | n/a | n/a | n/a | 42,786

Assembly statistics

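Molecule | Sequence Role | Total Length | Scaffold Count | Ungapped Length | Scaffold N50 | Spanned Gaps | Unspanned Gaps
All | Assembled molecule | 3,231,154,112 | 45,510 | 3,132,603,083 | 26,972,556 | 26,715 | 1,066
Chromosome 1 | All | 232,773,908 | 211 | 227,921,252 | 20,509,659 | 1,714 | 106
Chromosome 1 | Assembled molecule | 228,573,443 | 107 | 223,796,998 | 20,509,659 | 1,556 | 106
Chromosome 1 | Unlocalized scaffolds | 4,200,465 | 104 | 4,124,254 | 93,127 | 158 | 0
Chromosome 2A | All | 113,168,198 | 100 | 109,284,913 | 27,428,009 | 769 | 58
Chromosome 2A | Assembled molecule | 111,504,155 | 59 | 107,631,864 | 27,428,009 | 721 | 58
Chromosome 2A | Unlocalized scaffolds | 1,664,043 | 41 | 1,653,049 | 110,569 | 48 | 0
Chromosome 2B | All | 135,285,634 | 54 | 131,763,295 | 71,421,058 | 693 | 19
Chromosome 2B | Assembled molecule | 133,216,015 | 20 | 129,694,421 | 71,421,058 | 659 | 19
Chromosome 2B | Unlocalized scaffolds | 2,069,619 | 34 | 2,068,874 | 363,895 | 34 | 0
Chromosome 3 | All | 203,297,375 | 42 | 199,247,456 | 71,286,969 | 1,157 | 17
Chromosome 3 | Assembled molecule | 202,621,043 | 18 | 198,571,279 | 71,286,969 | 1,143 | 17
Chromosome 3 | Unlocalized scaffolds | 676,332 | 24 | 676,177 | 48,683 | 14 | 0
Chromosome 4 | All | 197,035,191 | 67 | 192,824,630 | 44,849,162 | 1,326 | 23
Chromosome 4 | Assembled molecule | 194,502,333 | 24 | 190,297,137 | 44,849,162 | 1,275 | 23
Chromosome 4 | Unlocalized scaffolds | 2,532,858 | 43 | 2,527,493 | 175,184 | 51 | 0
Chromosome 5 | All | 184,214,194 | 102 | 179,976,254 | 26,504,672 | 1,287 | 53
Chromosome 5 | Assembled molecule | 181,907,262 | 54 | 177,675,365 | 26,504,672 | 1,242 | 53
Chromosome 5 | Unlocalized scaffolds | 2,306,932 | 48 | 2,300,889 | 139,795 | 45 | 0
Chromosome 6 | All | 179,011,730 | 169 | 174,832,916 | 27,063,930 | 1,170 | 69
Chromosome 6 | Assembled molecule | 175,400,573 | 70 | 171,240,498 | 27,063,930 | 1,088 | 69
Chromosome 6 | Unlocalized scaffolds | 3,611,157 | 99 | 3,592,418 | 118,478 | 82 | 0
Chromosome 7 | All | 221,938,649 | 409 | 215,219,848 | 29,824,914 | 1,329 | 17
Chromosome 7 | Assembled molecule | 166,211,670 | 18 | 159,930,557 | 38,418,285 | 380 | 17
Chromosome 7 | Unlocalized scaffolds | 55,726,979 | 391 | 55,289,291 | 528,511 | 949 | 0
Chromosome 8 | All | 150,768,290 | 99 | 146,473,591 | 27,900,618 | 1,120 | 26
Chromosome 8 | Assembled molecule | 147,911,612 | 27 | 143,645,633 | 27,900,618 | 1,055 | 26
Chromosome 8 | Unlocalized scaffolds | 2,856,678 | 72 | 2,827,958 | 139,113 | 65 | 0
Chromosome 9 | All | 120,263,006 | 152 | 116,208,097 | 13,860,214 | 1,027 | 80
Chromosome 9 | Assembled molecule | 116,767,853 | 81 | 112,751,018 | 13,860,214 | 901 | 80
Chromosome 9 | Unlocalized scaffolds | 3,495,153 | 71 | 3,457,079 | 119,453 | 126 | 0
Chromosome 10 | All | 138,578,018 | 153 | 134,351,973 | 17,416,824 | 1,089 | 67
Chromosome 10 | Assembled molecule | 135,926,727 | 68 | 131,727,628 | 17,416,824 | 991 | 67
Chromosome 10 | Unlocalized scaffolds | 2,651,291 | 85 | 2,624,345 | 109,560 | 98 | 0
Chromosome 11 | All | 137,451,432 | 58 | 133,297,589 | 45,297,164 | 991 | 26
Chromosome 11 | Assembled molecule | 135,753,878 | 27 | 131,620,211 | 45,297,164 | 938 | 26
Chromosome 11 | Unlocalized scaffolds | 1,697,554 | 31 | 1,677,378 | 131,724 | 53 | 0
Chromosome 12 | All | 138,872,156 | 59 | 134,568,710 | 31,288,217 | 1,138 | 24
Chromosome 12 | Assembled molecule | 137,163,284 | 25 | 132,861,063 | 31,288,217 | 1,104 | 24
Chromosome 12 | Unlocalized scaffolds | 1,708,872 | 34 | 1,707,647 | 108,632 | 34 | 0
Chromosome 13 | All | 101,748,333 | 38 | 98,083,586 | 50,851,159 | 576 | 17
Chromosome 13 | Assembled molecule | 100,452,976 | 17 | 96,788,494 | 50,851,159 | 551 | 17
Chromosome 13 | Unlocalized scaffolds | 1,295,357 | 21 | 1,295,092 | 197,891 | 25 | 0
Chromosome 14 | All | 93,959,185 | 71 | 90,171,220 | 73,887,941 | 678 | 30
Chromosome 14 | Assembled molecule | 91,965,084 | 30 | 88,215,020 | 73,887,941 | 635 | 30
Chromosome 14 | Unlocalized scaffolds | 1,994,101 | 41 | 1,956,200 | 193,554 | 43 | 0
Chromosome 15 | All | 84,477,202 | 90 | 80,783,956 | 14,954,793 | 683 | 49
Chromosome 15 | Assembled molecule | 83,230,942 | 49 | 79,539,803 | 14,954,793 | 619 | 49
Chromosome 15 | Unlocalized scaffolds | 1,246,260 | 41 | 1,244,153 | 89,898 | 64 | 0
Chromosome 16 | All | 84,518,420 | 202 | 80,351,895 | 5,892,233 | 889 | 116
Chromosome 16 | Assembled molecule | 81,586,097 | 117 | 77,471,962 | 5,892,233 | 761 | 116
Chromosome 16 | Unlocalized scaffolds | 2,932,323 | 85 | 2,879,933 | 84,653 | 128 | 0
Chromosome 17 | All | 86,139,226 | 138 | 82,169,827 | 5,600,063 | 838 | 89
Chromosome 17 | Assembled molecule | 83,181,570 | 90 | 79,227,569 | 5,695,869 | 781 | 89
Chromosome 17 | Unlocalized scaffolds | 2,957,656 | 48 | 2,942,258 | 169,068 | 57 | 0
Chromosome 18 | All | 79,012,290 | 32 | 75,500,703 | 33,105,128 | 435 | 19
Chromosome 18 | Assembled molecule | 78,221,452 | 20 | 74,710,290 | 33,105,128 | 421 | 19
Chromosome 18 | Unlocalized scaffolds | 790,838 | 12 | 790,413 | 677,237 | 14 | 0
Chromosome 19 | All | 65,212,037 | 89 | 61,045,078 | 5,249,888 | 878 | 39
Chromosome 19 | Assembled molecule | 61,309,027 | 40 | 57,143,093 | 5,249,888 | 789 | 39
Chromosome 19 | Unlocalized scaffolds | 3,903,010 | 49 | 3,901,985 | 203,867 | 89 | 0
Chromosome 20 | All | 68,509,757 | 36 | 64,954,007 | 26,319,469 | 507 | 22
Chromosome 20 | Assembled molecule | 66,533,130 | 23 | 62,999,765 | 31,887,472 | 464 | 22
Chromosome 20 | Unlocalized scaffolds | 1,976,627 | 13 | 1,954,242 | 267,346 | 43 | 0
Chromosome 21 | All | 36,488,127 | 43 | 36,109,195 | 30,645,286 | 313 | 18
Chromosome 21 | Assembled molecule | 33,445,071 | 19 | 33,127,672 | 30,645,286 | 240 | 18
Chromosome 21 | Unlocalized scaffolds | 3,043,056 | 24 | 2,981,523 | 390,718 | 73 | 0
Chromosome 22 | All | 41,276,300 | 83 | 37,744,051 | 4,144,837 | 459 | 37
Chromosome 22 | Assembled molecule | 37,823,149 | 37 | 34,342,933 | 4,144,837 | 369 | 37
Chromosome 22 | Unlocalized scaffolds | 3,453,151 | 46 | 3,401,118 | 224,412 | 90 | 0
Chromosome X | All | 184,371,542 | 203 | 178,998,496 | 25,963,054 | 1,699 | 33
Chromosome X | Assembled molecule | 155,549,662 | 34 | 150,185,688 | 25,963,054 | 1,282 | 33
Chromosome X | Unlocalized scaffolds | 28,821,880 | 169 | 28,812,808 | 390,674 | 417 | 0
Chromosome Y | All | 28,586,562 | 24 | 27,647,725 | 2,357,686 | 86 | 12
Chromosome Y | Assembled molecule | 26,350,515 | 13 | 25,417,285 | 3,655,208 | 10 | 12
Chromosome Y | Unlocalized scaffolds | 2,236,047 | 11 | 2,230,440 | 238,541 | 76 | 0
unplaced | Assembled molecule | 124,197,350 | 42,786 | 123,072,820 | 4,425 | 3,864 | 0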