Macaca_fascicularis_5.0

Organism name:

Macaca fascicularis (crab-eating macaque)

Sex:

female

BioSample:

SAMN00811240

BioProject:

PRJNA20409

Submitter:

Washington University (WashU)

Date:

2013/06/12

Synonyms:

macFas5

Assembly level:

Chromosome

Genome representation:

full

Excluded from RefSeq:

superseded by newer assembly for species

GenBank assembly accession:

GCA_000364345.1 (latest)

RefSeq assembly accession:

GCF_000364345.1 (suppressed); see the latest RefSeq assembly for this species

RefSeq assembly and GenBank assembly identical:

no

Only in RefSeq: chromosome MT

Data displayed for GenBank version

WGS Project:

AQIA01

Assembly method:

SOAPdenovo v. 1.0.5, SRPRISM v. 2.4; ARGO v. 0.1

Genome coverage:

68x

Sequencing technology:

Illumina HiSeq

IDs: 704988 [UID] 704988 [GenBank] 779818 [RefSeq]

There are 13 assemblies for this organism

History

GenBank Assembly Accession		RefSeq Assembly Accession	Assembly Name	Assembly Level	Status
GCA_000364345.1	≠	GCF_000364345.1	Macaca_fascicularis_5.0	Chromosome	Latest GenBank, RefSeq suppressed

Comment

Macaca fascicularis (cynomolgus macaque) Sequence Assembly Release Notes
 The cynomolgus macaque DNA used for shotgun sequencing was derived from a 5.8-year-old female provided by Dr. Jay Kaplan. The animal came from "Tinjil", which is not a native location for cynomolgus macaques but an island off the south coast of Java that was seeded with monkeys by the Washington National Primate Center. The original animal was trapped in eastern Sumatra. Sequences were generated on the Illumina HiSeq for both assisted and de novo assembly. Sequence coverage for each paired-end read type was as follows: 50x from 300-500 bp inserts, 10x from 3 kb inserts and 2x from 8 kb inserts.
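The per-library coverage figures add up because fold coverage is sequenced bases divided by genome size. A minimal sketch of that arithmetic follows; the 2.8 Gb genome size is an assumption borrowed from the contig total given in these notes, and the helper names are hypothetical.

```python
# Per-library fold coverage quoted in the release notes.
LIBRARY_COVERAGE = {
    "300-500 bp inserts": 50,
    "3 kb inserts": 10,
    "8 kb inserts": 2,
}

# Assumption for illustration only: treat the ~2.8 Gb assembled in contigs
# as the genome size when converting fold coverage into sequenced bases.
ASSUMED_GENOME_SIZE_BP = 2_800_000_000


def total_coverage(per_library):
    """Fold coverage is additive across libraries (each is bases / genome size)."""
    return sum(per_library.values())


def bases_for_coverage(coverage, genome_size_bp):
    """Invert coverage = sequenced_bases / genome_size."""
    return coverage * genome_size_bp


if __name__ == "__main__":
    print(f"summed coverage: {total_coverage(LIBRARY_COVERAGE)}x")
    for name, cov in LIBRARY_COVERAGE.items():
        gb = bases_for_coverage(cov, ASSUMED_GENOME_SIZE_BP) / 1e9
        print(f"{name}: {cov}x ~ {gb:.1f} Gb of read sequence")
```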
 Two independent assemblies were built with all sequence data: one with an assisted assembler and one with the de novo assembler SOAP. The workflow for the assisted assembly is as follows:
 1) Map reads to the reference and filter the alignments using SRprism (unpublished, but in the process of being written up), which reports all alignments of equally good quality. Filtering is done by first computing the histogram of insert sizes seen in the alignments for each library, deciding which range to use (usually the tightest range containing 99% of the pairs), and then retaining paired reads that have the correct orientation and an insert size in that range. Different data types (Illumina, traces, SOLiD, 454) have slightly different filtering criteria.
 2) Use the mapped and filtered reads to build consensus contigs.
 3) Find consecutive contigs that are bridged by mate pairs with 30-mers on each side of the gap, and perform de novo assembly in the gaps between bridged contigs. 30-mers from reads are used to build an index for the de novo assembly; only the filtered-out reads and the reads mapped to contig ends go into that index, and a predefined maximum gap size and number of iterations limit the resources spent on any particular gap.
 4) Find structural differences between the scaffolds built and the reference by using paired reads with mates on different scaffolds, and do de novo gap filling between reordered scaffolds [in progress].
 The reference genome used to align the cynomolgus macaque reads was the published version (MMUL_1) of the rhesus macaque genome, together with an updated, as yet unpublished rhesus macaque assembly (courtesy of Aleksey Zimin). Using the assisted assembly as the reference, we aligned and merged the de novo assembly using the GAA tool.
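The insert-size filter in step 1 can be sketched as follows. This is not SRprism's code; it assumes that "tightest 99th percentile" means the narrowest window of observed insert sizes that still holds 99% of them, and that the "correct" orientation is supplied per library (paired-end and mate-pair libraries can differ). The `Pair` record and both functions are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical minimal record for one aligned read pair."""
    insert_size: int
    orientation: str  # e.g. "FR" or "RF"; the expected value depends on the library


def tightest_window(sizes, fraction=0.99):
    """Return (lo, hi): the narrowest range of observed insert sizes that
    contains at least `fraction` of the observations."""
    if not sizes:
        raise ValueError("no insert sizes observed")
    ordered = sorted(sizes)
    k = max(1, math.ceil(fraction * len(ordered)))  # observations the window must hold
    best = None
    for i in range(len(ordered) - k + 1):
        lo, hi = ordered[i], ordered[i + k - 1]
        if best is None or hi - lo < best[1] - best[0]:
            best = (lo, hi)
    return best


def filter_pairs(pairs, expected_orientation, fraction=0.99):
    """Keep pairs with the expected orientation whose insert size falls in the
    tightest window computed over all of this library's aligned pairs."""
    lo, hi = tightest_window([p.insert_size for p in pairs], fraction)
    return [p for p in pairs
            if p.orientation == expected_orientation and lo <= p.insert_size <= hi]
```

Sliding a fixed-count window over the sorted sizes finds the tightest range in O(n log n); a real pipeline would likely bin sizes into a histogram first, as the notes describe, but the chosen range is the same idea.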
 In the final assembly, referred to as Macaca_fascicularis_5.0, there were 102,878 contigs with an N50 contig length of 86 kb (85,974 bp). There were 7,627 supercontigs (scaffolds) with an N50 supercontig length of 144 Mb. A total of 2.8 Gb was assembled in contigs.
 ****************************************************
 Macaca fascicularis Sequence and Assembly Credits

 DNA source - Dr. Jay Kaplan, Wake Forest Primate Facility, Wake Forest, NC. Genome Sequence - The Genome Institute, Washington University School of Medicine, St Louis, MO. Sequence Assembly - Richa Agarwala, Sergey Shiryaev, NCBI and The Genome Institute, Washington University School of Medicine, St Louis, MO. Assembly curation - LaDeana Hillier, The Genome Institute, Washington University School of Medicine, St Louis, MO. FISH mapping data - Mariano Rocchi, Department of Biology, University of Bari, Bari, Italy.
 Funding for the sequence characterization of the cynomolgus macaque genome was provided by NHGRI. 
 Author List: Richard K. Wilson, Wesley C. Warren
 ****************************************************
 Chromosome lengths
 Chromosome	Length including estimated gaps (bp)	Sequence length excluding gaps (bp)
 MFA1	227556264	217433370
 MFA2	192460366	186559336
 MFA3	192294377	180410849
 MFA4	170955103	164881207
 MFA5	189454096	183527297
 MFA6	181584905	175247550
 MFA7	171882078	164071319
 MFA8	146850525	140657447
 MFA9	133195287	127272501
 MFA10	96509753	90761517
 MFA11	137757926	132144036
 MFA12	132586672	127191125
 MFA13	111193037	106335528
 MFA14	130733371	123895447
 MFA15	112612857	107712928
 MFA16	80997621	74103573
 MFA17	96864807	92008008
 MFA18	75711847	71766527
 MFA19	59248254	51391499
 MFA20	78541002	72393001
 MFAX	152835861	144357465
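Because the table gives each chromosome's length both with and without estimated gaps, the amount of gap sequence is the difference of the two columns. A minimal sketch using three rows copied from the table; the `gap_bases` helper is hypothetical.

```python
# (chromosome, length including estimated gaps, sequence length without gaps)
# Three rows copied from the chromosome-length table.
CHROMOSOMES = [
    ("MFA1", 227556264, 217433370),
    ("MFA19", 59248254, 51391499),
    ("MFAX", 152835861, 144357465),
]


def gap_bases(total_with_gaps, sequence_only):
    """Estimated gap bases = length with gaps minus ungapped sequence length."""
    return total_with_gaps - sequence_only


if __name__ == "__main__":
    for name, total, seq in CHROMOSOMES:
        gaps = gap_bases(total, seq)
        print(f"{name}: {gaps:,} bp of estimated gaps ({100 * gaps / total:.1f}%)")
```

For MFA1 this gives 10,122,894 bp, about 4.4% of the chromosome; MFA19 is noticeably gappier at roughly 13%.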
 An additional 69.8 Mb of sequence is unlocalized.
 ***********************************************************************************
 Assembly statistics:
 *** Contiguity: Contig ***
 Total contig number: 102878
 Total contig bases: 2805274345 bp
 Average contig length: 27268 bp
 Maximum contig length: 764150 bp
 N50 contig length: 85974 bp
 N50 contig number: 9304
 Major contig (> 500 bp) number: 81458
 Major_contig bases: 2801848696 bp
 Major_contig avg contig length: 34396 bp
 Major_contig N50 contig length: 86137 bp
 Major_contig N50 contig number: 9284

 *** Contiguity: Supercontig ***
 Total supercontig number: 7627
 Average supercontig length: 367808 bp
 Maximum supercontig length: 221345846 bp
 N50 supercontig length: 144445942 bp
 N50 supercontig number: 8
 Major supercontig (> 500 bp) number: 7587
 Major_supercontig bases: 2805261977 bp
 Major_supercontig avg …
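The "N50 length" and "N50 number" statistics above can be reproduced from a list of piece lengths; the count is what NCBI's Global statistics calls L50. A minimal sketch, assuming the common definition (sort lengths in descending order and stop once the running total reaches half of all bases); tools differ slightly on the exact tie-breaking rule, and `n50_l50` is a hypothetical helper.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig or scaffold lengths.

    Sort lengths in descending order and accumulate until the running total
    reaches at least half of all bases. N50 is the length of the piece that
    crosses that point; L50 is how many pieces it took to get there.
    """
    if not lengths:
        raise ValueError("empty assembly")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count
```

Calling it on `[l for l in lengths if l > 500]` instead of the full list corresponds to the "Major contig (> 500 bp)" rows.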

Global statistics

Total sequence length	2,946,827,162
Total ungapped length	2,803,850,123
Gaps between scaffolds	24
Number of scaffolds	7,624
Scaffold N50	88,649,475
Scaffold L50	14
Number of contigs	87,763
Contig N50	86,040
Contig L50	9,296
Total number of chromosomes and plasmids	21
Number of component sequences (WGS or clone)	87,763

Molecule name	GenBank sequence		RefSeq sequence	Unlocalized sequences count
Chromosome 1	CM001919.1	=	NC_022272.1	28
Chromosome 2	CM001920.1	=	NC_022273.1	20
Chromosome 3	CM001921.1	=	NC_022274.1	41
Chromosome 4	CM001922.1	=	NC_022275.1	69
Chromosome 5	CM001923.1	=	NC_022276.1	22
Chromosome 6	CM001924.1	=	NC_022277.1	15
Chromosome 7	CM001925.1	=	NC_022278.1	25
Chromosome 8	CM001926.1	=	NC_022279.1	17
Chromosome 9	CM001927.1	=	NC_022280.1	27
Chromosome 10	CM001928.1	=	NC_022281.1	18
Chromosome 11	CM001929.1	=	NC_022282.1	19
Chromosome 12	CM001930.1	=	NC_022283.1	9
Chromosome 13	CM001931.1	=	NC_022284.1	12
Chromosome 14	CM001932.1	=	NC_022285.1	18
Chromosome 15	CM001933.1	=	NC_022286.1	19
Chromosome 16	CM001934.1	=	NC_022287.1	22
Chromosome 17	CM001935.1	=	NC_022288.1	29
Chromosome 18	CM001936.1	=	NC_022289.1	5
Chromosome 19	CM001937.1	=	NC_022290.1	28
Chromosome 20	CM001938.1	=	NC_022291.1	17
Chromosome X	CM001939.1	=	NC_022292.1	12
unplaced	n/a	n/a	n/a	7,107

Molecule	Sequence Role	Total Length	Scaffold Count	Ungapped Length	Scaffold N50	Spanned Gaps	Unspanned Gaps
All	Assembled molecule	2,946,827,162	7,624	2,803,850,123	88,649,475	80,139	24
Chromosome 1	All	228,955,476	30	218,712,996	138,665,737	6,472	1
Chromosome 1	Assembled molecule	227,556,264	2	217,433,370	138,665,737	6,365	1
Chromosome 1	Unlocalized scaffolds	1,399,212	28	1,279,626	152,538	107	0
Chromosome 2	All	192,909,870	22	186,971,721	125,741,623	4,310	1
Chromosome 2	Assembled molecule	192,460,366	2	186,559,336	125,741,623	4,278	1
Chromosome 2	Unlocalized scaffolds	449,504	20	412,385	63,034	32	0
Chromosome 3	All	193,512,962	43	181,472,993	129,674,376	4,931	1
Chromosome 3	Assembled molecule	192,294,377	2	180,410,849	129,674,376	4,831	1
Chromosome 3	Unlocalized scaffolds	1,218,585	41	1,062,144	64,819	100	0
Chromosome 4	All	173,232,586	71	166,827,843	119,724,987	4,140	1
Chromosome 4	Assembled molecule	170,955,103	2	164,881,207	119,724,987	3,886	1
Chromosome 4	Unlocalized scaffolds	2,277,483	69	1,946,636	65,370	254	0
Chromosome 5	All	190,271,649	25	184,225,783	105,597,079	4,219	2
Chromosome 5	Assembled molecule	189,454,096	3	183,527,297	105,597,079	4,132	2
Chromosome 5	Unlocalized scaffolds	817,553	22	698,486	87,690	87	0
Chromosome 6	All	181,913,984	17	175,554,307	132,081,036	4,197	1
Chromosome 6	Assembled molecule	181,584,905	2	175,247,550	132,081,036	4,173	1
Chromosome 6	Unlocalized scaffolds	329,079	15	306,757	74,686	24	0
Chromosome 7	All	172,638,558	27	164,699,552	107,376,033	4,657	1
Chromosome 7	Assembled molecule	171,882,078	2	164,071,319	107,376,033	4,588	1
Chromosome 7	Unlocalized scaffolds	756,480	25	628,233	53,086	69	0
Chromosome 8	All	147,524,517	19	141,254,680	100,378,728	3,220	1
Chromosome 8	Assembled molecule	146,850,525	2	140,657,447	100,378,728	3,169	1
Chromosome 8	Unlocalized scaffolds	673,992	17	597,233	423,537	51	0
Chromosome 9	All	133,925,568	29	127,928,433	92,187,018	3,404	1
Chromosome 9	Assembled molecule	133,195,287	2	127,272,501	92,187,018	3,353	1
Chromosome 9	Unlocalized scaffolds	730,281	27	655,932	46,688	51	0
Chromosome 10	All	96,855,220	20	91,082,612	60,750,290	3,221	1
Chromosome 10	Assembled molecule	96,509,753	2	90,761,517	60,750,290	3,192	1
Chromosome 10	Unlocalized scaffolds	345,467	18	321,095	27,377	29	0
Chromosome 11	All	138,871,270	21	133,185,823	101,971,489	4,037	1
Chromosome 11	Assembled molecule	137,757,926	2	132,144,036	101,971,489	3,979	1
Chromosome 11	Unlocalized scaffolds	1,113,344	19	1,041,787	603,909	58	0
Chromosome 12	All	132,992,465	12	127,546,852	106,729,359	2,917	2
Chromosome 12	Assembled molecule	132,586,672	3	127,191,125	106,729,359	2,881	2
Chromosome 12	Unlocalized scaffolds	405,793	9	355,727	70,353	36	0
Chromosome 13	All	111,518,885	15	106,613,556	88,649,475	2,787	2
Chromosome 13	Assembled molecule	111,193,037	3	106,335,528	88,649,475	2,745	2
Chromosome 13	Unlocalized scaffolds	325,848	12	278,028	75,808	42	0
Chromosome 14	All	132,195,074	20	125,296,749	67,271,947	3,413	1
Chromosome 14	Assembled molecule	130,733,371	2	123,895,447	67,271,947	3,362	1
Chromosome 14	Unlocalized scaffolds	1,461,703	18	1,401,302	992,516	51	0
Chromosome 15	All	113,086,136	21	108,135,860	98,078,252	3,044	1
Chromosome 15	Assembled molecule	112,612,857	2	107,712,928	98,078,252	2,990	1
Chromosome 15	Unlocalized scaffolds	473,279	19	422,932	57,610	54	0
Chromosome 16	All	82,009,341	24	75,009,131	57,635,534	3,313	1
Chromosome 16	Assembled molecule	80,997,621	2	74,103,573	57,635,534	3,226	1
Chromosome 16	Unlocalized scaffolds	1,011,720	22	905,558	90,896	87	0
Chromosome 17	All	97,546,542	31	92,614,488	53,877,095	2,293	1
Chromosome 17	Assembled molecule	96,864,807	2	92,008,008	53,877,095	2,227	1
Chromosome 17	Unlocalized scaffolds	681,735	29	606,480	47,900	66	0
Chromosome 18	All	75,807,056	7	71,849,476	48,683,468	1,696	1
Chromosome 18	Assembled molecule	75,711,847	2	71,766,527	48,683,468	1,685	1
Chromosome 18	Unlocalized scaffolds	95,209	5	82,949	28,324	11	0
Chromosome 19	All	60,553,053	30	52,581,056	33,250,003	3,739	1
Chromosome 19	Assembled molecule	59,248,254	2	51,391,499	33,250,003	3,627	1
Chromosome 19	Unlocalized scaffolds	1,304,799	28	1,189,557	71,052	112	0
Chromosome 20	All	78,739,555	19	72,565,362	44,637,171	2,652	1
Chromosome 20	Assembled molecule	78,541,002	2	72,393,001	44,637,171	2,624	1
Chromosome 20	Unlocalized scaffolds	198,553	17	172,361	21,082	28	0
Chromosome X	All	153,601,299	14	145,064,668	93,731,725	4,531	1
Chromosome X	Assembled molecule	152,835,861	2	144,357,465	93,731,725	4,482	1
Chromosome X	Unlocalized scaffolds	765,438	12	707,203	214,373	49	0
unplaced	Assembled molecule	58,166,096	7,107	54,656,182	22,498	2,946	0
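For each chromosome, the additive columns of the "All" row should equal the sum of its "Assembled molecule" and "Unlocalized scaffolds" rows, and the unspanned gaps across chromosomes should add up to the 24 "Gaps between scaffolds" listed under Global statistics. A minimal consistency check, with two chromosomes' rows transcribed by hand; the dictionary layout and helper names are hypothetical.

```python
# (total length, scaffold count, ungapped length, spanned gaps, unspanned gaps)
# per sequence role, transcribed from the per-molecule table.
ROWS = {
    "Chromosome 1": {
        "All": (228_955_476, 30, 218_712_996, 6_472, 1),
        "Assembled molecule": (227_556_264, 2, 217_433_370, 6_365, 1),
        "Unlocalized scaffolds": (1_399_212, 28, 1_279_626, 107, 0),
    },
    "Chromosome 5": {
        "All": (190_271_649, 25, 184_225_783, 4_219, 2),
        "Assembled molecule": (189_454_096, 3, 183_527_297, 4_132, 2),
        "Unlocalized scaffolds": (817_553, 22, 698_486, 87, 0),
    },
}

# Unspanned gaps on each chromosome's assembled molecule: 1 everywhere,
# except chromosomes 5, 12 and 13, which have 2.
UNSPANNED = {f"Chromosome {c}": 1 for c in [*range(1, 21), "X"]}
UNSPANNED.update({"Chromosome 5": 2, "Chromosome 12": 2, "Chromosome 13": 2})


def is_additive(roles):
    """True if every column of "All" equals the sum of the two role rows.
    Scaffold N50 is deliberately not stored: it is not additive."""
    whole = roles["All"]
    parts = (roles["Assembled molecule"], roles["Unlocalized scaffolds"])
    return all(whole[i] == sum(p[i] for p in parts) for i in range(len(whole)))


if __name__ == "__main__":
    for name, roles in ROWS.items():
        print(name, "consistent" if is_additive(roles) else "MISMATCH")
    print("unspanned gaps on chromosomes:", sum(UNSPANNED.values()))
```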
