Arabidopsis thaliana (thale cress) genome view
TAIR10 statistics

Lineage: Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; Spermatophyta; Magnoliophyta; eudicotyledons; Gunneridae; Pentapetalae; rosids; malvids; Brassicales; Brassicaceae; Camelineae; Arabidopsis; Arabidopsis thaliana

The Arabidopsis Information Resource (TAIR) has announced the release of the latest version of the Arabidopsis genome annotation (TAIR10). This release builds upon the gene structures of the previous TAIR9 release using RNA-seq and proteomics datasets, as well as manual updates informed by cross-species alignments, peptides and community input regarding missing and incorrectly annotated genes. This version is available for FTP download.

The TAIR10 release contains 27,416 protein coding genes, 4,827 pseudogenes or transposable element genes and 1,359 ncRNAs (33,602 genes in all, 41,671 gene models). A total of 126 new loci and 2,099 new gene models were added.

Eighteen percent (5,885) of Arabidopsis genes now have annotated splice variants. Updates were made to 1,184 gene models, of which 707 had CDS updates. There were 41 gene splits and 37 gene merges. No changes were made to the Arabidopsis genome assembly for the TAIR10 release.
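The counts quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch that uses only the figures stated in the release notes; the variable names are illustrative, and the choice of all 33,602 genes as the denominator for the splice-variant percentage is an assumption (the text does not say which gene set the "eighteen percent" refers to).

```python
# Counts quoted in the TAIR10 release notes above.
protein_coding = 27_416
pseudo_or_te = 4_827        # pseudogenes or transposable element genes
ncrna = 1_359
gene_models = 41_671
genes_with_splice_variants = 5_885

# The three categories should add up to the stated total of 33,602 genes.
total_genes = protein_coding + pseudo_or_te + ncrna

# Assumption: the percentage is taken over all genes, not only protein coding genes.
share = genes_with_splice_variants / total_genes

# Average number of gene models per gene.
models_per_gene = gene_models / total_genes

print(total_genes)
print(f"{share:.1%}")
print(f"{models_per_gene:.2f}")
```

Under that assumption the share comes out at roughly 17.5%, which is consistent with the rounded "eighteen percent" in the text; using only the protein coding genes as the denominator would give a noticeably higher figure.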

Gene annotation utilized available proteomics data (Baerenfaller et al., 2008; Castellana et al., 2008) and RNA-seq data from the Ecker and Mockler labs (Lister et al., 2008; Filichkin et al., 2010). RNA-seq data were mapped to the Arabidopsis genome using TopHat, HashMatch or supersplat. After quality and low-complexity filtering, a total of ~200 million RNA-seq reads were successfully mapped to the genome. Of these, ~9 million represent spliced reads. Proteomics data and spliced RNA-seq reads were provided to Augustus, and the resulting gene models were categorised and manually reviewed. Validated gene updates, novel genes and novel splice variants were incorporated into the TAIR10 release. Additional spliced RNA-seq reads not already incorporated into gene models by Augustus were supplied to TAU. The resulting TAU models were again reviewed for potential novel splice variants. Transcript assemblies were generated via Cufflinks using all spliced reads and the unspliced reads from the Ecker sets. Transcript assemblies were filtered and compared to existing gene models, resulting in the addition of 56 novel genes. Additional new proteome data provided to us by Katja Baerenfaller were used to directly update 24 gene models.

Gene models created using the Gnomon pipeline were provided to TAIR by NCBI. Reanalysis of these models for TAIR10 resulted in 11 additional novel genes, 67 additional alternative splice variants and 164 updates to existing genes. Additional details can be viewed on the TAIR website.

Arabidopsis thaliana is a small flowering plant in the mustard family, Brassicaceae (Cruciferae). It was selected as a model organism for plant genome sequencing because it (1) has a small genome of ~120 Mb with a simple structure, few repeated sequences and high gene density; (2) has a short generation time of six weeks from seed germination to seed set; (3) produces a large number of seeds; and (4) is easy to transform. The sequencing was carried out by an international collaboration, collectively termed the Arabidopsis Genome Initiative (AGI), which consisted of research groups in the U.S., Europe and Japan. The project was initiated in 1996 and completed in 2000 (Nature, 408:796-815).

Last modified: Oct 27 2011
