Assembly of 500,000 inter-specific catfish expressed sequence tags and large scale gene-associated marker development for whole genome association studies.
Wang S, Peatman E, Abernathy J, Waldbieser G, Lindquist E, Richardson P, Lucas S, Wang M, Li P, Thimmapuram J, Liu L, Vullaganti D, Kucuktas H, Murdock C, Small BC, Wilson M, Liu H, Jiang Y, Lee Y, Chen F, Lu J, Wang W, Xu P, Somridhivej B, Baoprasertkul P, Quilang J, Sha Z, Bao B, Wang Y, Wang Q, Takano T, Nandi S, Liu S, Wong L, Kaltenboeck L, Quiniou S, Bengten E, Miller N, Trant J, Rokhsar D, Liu Z; Catfish Genome Consortium.
Ainsworth J, Altinok I, Arias CR, Bader JA, Bilodeau AL, Bird C, Bogerd J, Bosworth BG, Bruch RC, Burnett K, Caprio JT, Chappell J, Chatakondi N, Chinchar G, Dickhoff WW, DiGiulio RT, Duan C, Duke MV, Dunham RA, Gabel S, Giambernardi TA, Gray WL, Green ED, Hanson LA, Hardman M, He C, Hikima J, Hutson A, Jaso-Friedmann L, Ju Z, Karsi A, Kelley K, Kingsley D, Kleinholz C, Klesius PH, Kocabas A, Lee WK, Lennard M, Litaker W, Litman GW, Lobb CJ, Luker G, Magor BG, McConnel TJ, Muir W, Noga E, Nusbaum K, Ourth DD, Panangala V, Patino R, Peterson BC, Phelps R, Plant KP, Postlethwait JH, Quintero HE, Rodriguez D, Saunders HL, Scheffler B, Schwedler T, Shelby RA, Simc W, Shoemaker CA, Tang L, Terhune J, Thune RL, Tiersch TR, Warr GW, Welker T, Westerfield M, Willett KL, Williams K, Winn R, Wu C, Xu D, Yant R, Yeh HY, Zohar Y, Zou J.
Source
The Fish Molecular Genetics and Biotechnology Laboratory, Department of Fisheries and Allied Aquacultures and Program of Cell and Molecular Biosciences, Aquatic Genomics Unit, 203 Swingle Hall, Auburn University, Auburn, AL 36849, USA. wangsha@auburn.edu
Abstract
BACKGROUND:
As part of the Community Sequencing Program, a catfish EST sequencing project was carried out through a collaboration between the catfish research community and the Department of Energy's Joint Genome Institute. Prior to this project, only a limited EST resource from catfish was available for the purpose of SNP identification.
RESULTS:
A total of 438,321 quality ESTs were generated from 8 channel catfish (Ictalurus punctatus) and 4 blue catfish (Ictalurus furcatus) libraries, bringing the number of catfish ESTs to nearly 500,000. Assembly of all catfish ESTs resulted in 45,306 contigs and 66,272 singletons. Over 35% of the unique sequences had significant similarities to known genes, allowing the identification of 14,776 unique genes in catfish. Over 300,000 putative SNPs have been identified, of which approximately 48,000 are high-quality SNPs identified from contigs with at least four sequences and the minor allele presence of at least two sequences in the contig. The EST resource should be valuable for identification of microsatellites, genome annotation, large-scale expression analysis, and comparative genome analysis.
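The high-quality SNP criterion stated above (a contig of at least four sequences, with the minor allele present in at least two of them) can be sketched as a simple filter. This is only an illustration of the stated thresholds, not the authors' pipeline: the function name, the input layout (a per-position dictionary of allele counts), and the choice of the second most frequent allele as the "minor allele" are assumptions made for the example.

```python
# Thresholds taken from the abstract; everything else is a hypothetical sketch.
MIN_CONTIG_SEQUENCES = 4        # contig must contain at least four sequences
MIN_MINOR_ALLELE_SEQUENCES = 2  # minor allele must appear in at least two sequences


def is_high_quality_snp(contig_size, allele_counts):
    """Return True if a putative SNP meets the two stated thresholds.

    contig_size   -- number of EST sequences assembled into the contig
    allele_counts -- dict mapping each observed base at the SNP position
                     to the number of sequences carrying it (assumed layout)
    """
    if contig_size < MIN_CONTIG_SEQUENCES:
        return False
    # Sort allele counts from most to least frequent.
    counts = sorted(allele_counts.values(), reverse=True)
    if len(counts) < 2:
        # Only one allele observed, so the position is not polymorphic.
        return False
    # Assumption: the minor allele is the second most frequent allele.
    return counts[1] >= MIN_MINOR_ALLELE_SEQUENCES


# Example: a contig of 6 ESTs with 4 carrying A and 2 carrying G passes;
# a contig of 10 ESTs where G is seen only once does not.
print(is_high_quality_snp(6, {"A": 4, "G": 2}))
print(is_high_quality_snp(10, {"A": 9, "G": 1}))
```

Requiring the minor allele in two or more sequences is a common way to reduce the chance that a single sequencing error is called as a SNP, which is presumably why the abstract separates these roughly 48,000 SNPs from the larger putative set.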
CONCLUSIONS:
This project generated a large EST resource for catfish that captured the majority of the catfish transcriptome. The parallel analysis of ESTs from two closely related Ictalurid catfishes should also provide a powerful means for the evaluation of ancient and recent gene duplications, and for the development of high-density microarrays in catfish. The inter- and intra-specific SNPs identified from the all catfish EST dataset assembly will greatly benefit the catfish introgression breeding program and whole genome association studies.
PMID: 20096101 [PubMed - indexed for MEDLINE]
PMCID: PMC2847720
Free PMC Article
Figure 2
Distribution of contig sizes.
Genome Biol. 11(1):R8.
Figure 3
Distribution of sequence similarity between blue catfish and channel catfish sequences.
Figure 1
Length distribution of Joint Genome Institute EST sequences.
Figure 7
Conservation of catfish gene identities with other species. Number of catfish homologous genes identified from other species using BLASTX searches.
Figure 4
Open reading frame (ORF) length distribution from unique sequences of the all catfish assembly.
Figure 6
Comparison of shared and unique gene identities of channel catfish and blue catfish from a total of 14,776 unique genes.
Figure 8
Categorization of four different types of SNPs from the all catfish EST assembly and examples of SNPs whose categories could not be determined. (a-d) The four types of SNPs that can be identified from the all catfish EST assembly. (e) Examples of SNPs whose categories could not be determined because the minor allele from a given species is represented by fewer than two sequences.
Figure 5
Analysis of open reading frames (ORFs). (a) Percentage of ORFs among unique sequences from the all catfish EST assembly; (b) Percentage of ORFs greater than 100 bp among unique sequences from the all catfish EST assembly; (c) Percentage of ORFs equal to or greater than 100 bp with significant BLASTX hits; (d) Percentage of ORFs smaller than 100 bp with significant BLASTX hits.