Nat Genet. 2014 Jun;46(6):567-72. doi: 10.1038/ng.2987. Epub 2014 May 18.

Genome sequence of the cultivated cotton Gossypium arboreum.

Author information

1. [1] State Key Laboratory of Cotton Biology, Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China. [2].
2. [1] BGI-Shenzhen, Shenzhen, China. [2].
3. [1] State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China. [2].
4. [1] Key Laboratory for Crop Germplasm Resources of Hebei, Agricultural University of Hebei, Baoding, China. [2].
5. State Key Laboratory of Cotton Biology, Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China.
6. BGI-Shenzhen, Shenzhen, China.
7. State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China.
8. Key Laboratory for Crop Germplasm Resources of Hebei, Agricultural University of Hebei, Baoding, China.
9. Crop Germplasm Research Unit, Southern Plains Agricultural Research Center, US Department of Agriculture-Agricultural Research Service (USDA-ARS), College Station, Texas, USA.
10. [1] BGI-Shenzhen, Shenzhen, China. [2] Department of Biology, University of Copenhagen, Copenhagen, Denmark. [3] King Abdulaziz University, Jeddah, Saudi Arabia. [4] Macau University of Science and Technology, Macau, China. [5] Department of Medicine, University of Hong Kong, Hong Kong. [6] State Key Laboratory of Pharmaceutical Biotechnology, University of Hong Kong, Hong Kong.

Abstract

The complex allotetraploid nature of the cotton genome (AADD; 2n = 52) makes genetic, genomic and functional analyses extremely challenging. Here we sequenced and assembled the genome of Gossypium arboreum (AA; 2n = 26), a putative contributor of the A subgenome. Paired-end sequencing yielded a total of 193.6 Gb of clean sequence, representing 112.6-fold coverage of the genome. We anchored and oriented 90.4% of the assembly on 13 pseudochromosomes and found that repetitive DNA sequences occupy 68.5% of the genome. We predicted 41,330 protein-coding genes in G. arboreum. G. arboreum and Gossypium raimondii share two whole-genome duplications that occurred before the two species diverged. Insertions of long terminal repeat (LTR) retrotransposons over the past 5 million years account for the twofold difference in the sizes of these genomes. Comparative transcriptome studies revealed a key role for the nucleotide binding site (NBS)-encoding gene family in resistance to Verticillium dahliae, as well as the involvement of ethylene in the development of cotton fiber cells.
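The sequencing figures in the abstract also fix the genome size that the coverage value implies. The sketch below assumes the usual definition, mean fold coverage = total sequenced bases / genome size. The abstract does not state which genome-size estimate the authors divided by, so treat the result as a back-of-the-envelope check rather than the paper's own estimate. The function name `implied_genome_size_gb` is hypothetical.

```python
def implied_genome_size_gb(total_sequence_gb: float, fold_coverage: float) -> float:
    """Estimate genome size (Gb) from total sequence and mean fold coverage.

    Assumes the standard relation coverage = total_bases / genome_size,
    rearranged here as genome_size = total_bases / coverage.
    """
    if fold_coverage <= 0:
        raise ValueError("fold_coverage must be positive")
    return total_sequence_gb / fold_coverage


# Figures taken from the abstract: 193.6 Gb of clean sequence at 112.6-fold coverage.
size_gb = implied_genome_size_gb(193.6, 112.6)
print(f"Implied genome size: {size_gb:.2f} Gb")  # Implied genome size: 1.72 Gb
```

Under that assumption, 193.6 / 112.6 gives roughly 1.72 Gb for the G. arboreum genome.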

PMID: 24836287
DOI: 10.1038/ng.2987
[Indexed for MEDLINE]