Copyright © 2006 The Author(s)

FlyBase: genomes by the dozen

The Biological Laboratories, Harvard University, 16 Divinity Avenue, Cambridge, MA 02138, USA
1Department of Biology, Indiana University, 1001 E 3rd Street, Bloomington, IN 47405, USA
*To whom correspondence should be addressed. Tel: +1 617 495 9925; Fax: +1 617 496 1354; Email: crosby/at/morgan.harvard.edu

The FlyBase Consortium:
FlyBase-Harvard: W. Gelbart, M. Crosby, B. Matthews, S. Russo, D. Emmert, A. Schroeder, L. S. Gramates, P. Zhou, R. Kulathinal, M. Zytkovicz, P. Zhang, L. Bitsoi, A. Bhutkar, S. St Pierre, H. Zhang, A. Dirkmaat, K. Falls and M. Roark (Biological Laboratories, Harvard University, Cambridge, MA, USA).
FlyBase-Cambridge: M. Ashburner, R. Drysdale, G. Millburn, D. Sutherland, R. Seal, P. Leyland, P. McQuilton, S. Tweedie, M. Williams and S. Marygold (Department of Genetics, University of Cambridge, Cambridge, UK).
FlyBase-Indiana: T. Kaufman, K. Matthews, V. Strelets, G. Grumbling, A. DeAngelo, J. Goodman and R. Wilson (Department of Biology, Indiana University, Bloomington, IN, USA).

Received September 15, 2006; Accepted October 3, 2006.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

FlyBase (http://flybase.org/) is the primary database of genetic and genomic data for the insect family Drosophilidae. Historically, Drosophila melanogaster has been the most extensively studied species in this family, but recent determination of the genomic sequences of an additional 11 Drosophila species opens up new avenues of research for other Drosophila species.
This extensive sequence resource, encompassing species with well-defined phylogenetic relationships, provides a model system for comparative genomic analyses. FlyBase has developed tools to facilitate access to and navigation through this invaluable new data collection.

A NEW LOOK TO FlyBase

Over the past 2 years, FlyBase has effected a complete migration and integration of its underlying databases into a PostgreSQL chado genome database [(1), http://www.gmod.org/schema/]. This has enabled a reimplementation from the ground up of the FlyBase public interface, with a complete redesign of the Web pages, queries and reports (Figure 1).
THE CHANGING CONTENT OF FlyBase

FlyBase is an integrated resource for a vast array of genetic and molecular data concerning the Drosophilidae, including interactive genomic maps, gene product descriptions, mutant allele phenotypes, genetic interactions, expression patterns, transgenic constructs and insertions of transgenic constructs, anatomy and images, and genetic stock collections (2). Data are captured from bulk data sources, by curation from the literature, and by annotation based on assessment of contributing evidence; data capture is organized around consistent attribution to primary sources. As far as possible, descriptive data are curated using controlled vocabularies (CV), including the Gene Ontology for molecular function, biological process and cellular component (3), the Sequence Ontology for sequence features (4) and an extensive CV for anatomical terms and developmental stages (available as part of the Open Biomedical Ontologies project, http://obo.sourceforge.net/).

Although FlyBase has curated genetic and genomic information on the family Drosophilidae since its inception, it is only with the recent whole-genome shotgun (WGS) sequencing and assembly of 11 additional species that substantial amounts of non-melanogaster data have appeared in FlyBase. Indeed, it will be interesting to see how the availability of these WGS sequence assemblies will affect Drosophila research through the ability to perform genome-wide comparative analyses at the sequence, phenotypic and biological process levels.

THE DROSOPHILA GENOMES (EMPHASIS ON THE PLURAL)

The genome sequences of 12 species of Drosophila are now available. The species and their phylogeny are shown in the left-hand side of Figure 2.
The other 11 species have all been sequenced in NHGRI-funded large-scale sequencing centers (Table 2), following the approval of three separate community-based white papers. The first white paper [(7), http://flybase.bio.indiana.edu/.data/docs/CommunityWhitePapers/DrosBoardWP2001.html] proposed the sequencing of a second species, Drosophila pseudoobscura, to support the annotation of D.melanogaster (8). The second white paper [(9), http://flybase.bio.indiana.edu/.data/docs/CommunityWhitePapers/] proposed the sequencing of several isolates of Drosophila simulans, a close relative of D.melanogaster, to understand the basis of variation within and between species, and the sequencing of a somewhat more distant member of the same species group, Drosophila yakuba, as an outgroup. The third white paper [(10), http://flybase.bio.indiana.edu/.data/docs/CommunityWhitePapers/GenomesWP2003.html] proposed the sequencing of eight additional species. Six of these species (Drosophila ananassae, Drosophila erecta, Drosophila grimshawi, Drosophila mojavensis, Drosophila virilis and Drosophila willistoni) were proposed principally to provide additional branch length for comparative genomic analysis in support of the annotation of D.melanogaster, as well as for the study of gene and chromosome evolution on a whole-genome scale. The other two species, D.persimilis and D.sechellia, are sibling species of D.pseudoobscura and D.simulans, respectively; these were chosen because the sibling species pairs can form fertile F1 hybrids and have been used to study genetic variation that underlies speciation.
REPRESENTATION OF THE DOZEN GENOMES IN FlyBase

A group called ‘Assembly, Annotation and Analysis’ (AAA) has been coordinating the community production and distribution of the relevant large datasets, the production of consensus annotation sets and the preparation of the initial reports of the results of these studies (http://rana.lbl.gov/drosophila/). By the end of 2006, it is expected that the major datasets will have been produced, publications submitted and data contributed to FlyBase and GenBank. For each species these data will include several independent homology-based and ab initio gene prediction sets, consensus mRNA and protein annotation sets, orthologies, gene family groupings, and syntenic relationships among the species, the latter extending the previously known large-scale syntenic conservations among the chromosome arms of the genus Drosophila (see Figure 2).

THE FlyBase BLAST TOOL: QUERIES ACROSS INSECT SPECIES

The FlyBase BLAST tool serves as a convenient entry point to data for the insect species for which genomic sequence data are available, including the 12 Drosophila species, mosquito (Anopheles and Aedes), silkworm, honey bee and Tribolium. The tool provides an array of options in an intuitive format (Figure 3).
THE GBrowse GENOME VIEWER: CUSTOMIZED VIEWS OF PREDICTIONS AND EVIDENCE

Interactive views of the data generated by the genomic sequencing projects are presented using a newly modified version of the GBrowse genome viewer [(11), http://www.gmod.org/?q=node/71]. Entry to a specific genomic region may be accomplished by running a BLAST search first, as described above. The tool may also be accessed from the FlyBase home page or from the ‘Tools’ menu found in the top bar on all FlyBase reports. Once the species to be viewed is chosen and the region of interest specified, the data to be viewed can also be specified and its presentation customized (Figure 4).
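The same kind of region-based selection can be approximated offline on annotation files in GFF3, the format FlyBase uses for its sequence data downloads. The following is a minimal Python sketch, not FlyBase code: the helper names (`parse_gff3`, `features_in_region`), the sequence names and the feature identifiers in `SAMPLE` are all hypothetical, and only the standard nine-column, tab-separated GFF3 layout is assumed.

```python
from typing import Dict, Iterable, Iterator, List, NamedTuple, Optional


class Feature(NamedTuple):
    seqid: str
    source: str
    type: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    attributes: Dict[str, str]


def parse_gff3(lines: Iterable[str]) -> Iterator[Feature]:
    """Yield features from GFF3 text, skipping comments, directives and
    any line that does not have exactly nine tab-separated columns."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # e.g. sequence lines in a trailing FASTA section
        seqid, source, ftype, start, end, _score, strand, _phase, attrs = cols
        attributes: Dict[str, str] = {}
        for pair in attrs.split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attributes[key] = value  # values are left URL-encoded
        yield Feature(seqid, source, ftype, int(start), int(end), strand, attributes)


def features_in_region(
    features: Iterable[Feature],
    seqid: str,
    start: int,
    end: int,
    ftype: Optional[str] = None,
) -> List[Feature]:
    """Return features on `seqid` that overlap [start, end], optionally
    restricted to a single feature type such as 'gene'."""
    return [
        f
        for f in features
        if f.seqid == seqid
        and f.start <= end
        and f.end >= start
        and (ftype is None or f.type == ftype)
    ]


# Made-up sample records for illustration only (not FlyBase data).
SAMPLE = [
    "##gff-version 3",
    "chrA\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=hypothetical1",
    "chrA\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "chrA\texample\tgene\t5000\t6200\t.\t-\t.\tID=gene0002;Name=hypothetical2",
    "chrB\texample\tgene\t300\t700\t.\t+\t.\tID=gene0003;Name=hypothetical3",
]

hits = features_in_region(parse_gff3(SAMPLE), "chrA", 1, 1000, ftype="gene")
for f in hits:
    print(f.attributes["ID"], f.seqid, f.start, f.end, f.strand)
```

To use the sketch on a downloaded file, an open file handle can be passed to `parse_gff3` in place of `SAMPLE`. Attribute values are not URL-decoded and parent-child relationships are not resolved, so this is a starting point rather than a complete GFF3 reader.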
BULK DATA DOWNLOADS

Data files for all classes of data in FlyBase are available for download by FTP in several formats, including GFF3 for sequence data. Links to the bulk data repositories may be accessed from the ‘Files’ menu, ‘Precomputed files’ option, at the top of all FlyBase pages; from there, the ‘Genomes: Annotation and Sequence’ section provides access to genome data for each (or all) of the sequenced species. In addition, bulk queries can be performed and downloaded via the ‘QueryBuilder’ tool, accessed from the top page or the ‘Tools’ menu.

MORE ON THE SPECIES OF FAMILY DROSOPHILIDAE

From the ‘Species’ menu on the top bar of the FlyBase home page and all report pages, additional information on the Drosophilidae may be accessed. At present there are four items to choose from: ‘Phylogeny’ links to an index of species, each linked to its position in the Drosophilidae phylogenetic tree; ‘Synteny table’ goes to the presentation of syntenic relationships of the chromosomal arms of the 12 sequenced species shown in Figure 2

FlyBase continues to curate and present traditional genetic data for all the Drosophilid species. Now, availability and integration of genomic data for 12 well-characterized species provide a powerful resource that will allow the research community to take full advantage of the family Drosophilidae as a model for comparative genomic and phylogenetic analyses.

Acknowledgments

FlyBase is supported by grant P41 HG00739 from the National Human Genome Research Institute, National Institutes of Health (USA), with additional support from the Medical Research Council (UK) grant G05000293. Funding to pay the Open Access publication charges for this article was provided by the NHGRI FlyBase grant award.

Conflict of interest statement. None declared.

REFERENCES

1. Zhou P., Emmert D., Zhang P. Using chado to store genome annotation data. In: Baxevanis A.D., Davison D.B., editors. Current Protocols in Bioinformatics. Vol. 2. Hoboken, NJ: John Wiley & Sons, Inc.; 2005. pp. 9.6.1–9.6.28.
2. Grumbling G., Strelets V., and FlyBase Consortium. FlyBase: anatomical data, images and queries. Nucleic Acids Res. 2006;34:D484–D488.
3. Harris M., Clark J., Ireland A., Lomax J., Ashburner M., Foulger R., Eilbeck K., Lewis S., Marshall B., Mungall C., et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32:D258–D261.
4. Eilbeck K., Lewis S., Mungall C., Yandell M., Stein L., Durbin R., Ashburner M. The sequence ontology: a tool for unification of genome annotations. Genome Biol. 2005;6:R44.
5. Celniker S.E., Wheeler D.A., Kronmiller B., Carlson J.W., Halpern A., Patel S., Adams M., Champe M., Dugan S.P., Frise E., et al. Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 2002;3:RESEARCH0079.
6. Hoskins R.A., Smith C.D., Carlson J.W., de Carvalho A.B., Halpern A., Kaminker J.S., Kennedy C., Mungall C.J., Sullivan B.A., Sutton G.G., et al. Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol. 2002;3:RESEARCH0085.
7. Cooley L., Desplan C., Gaul U., Geyer P., Kaufman T., Krasnow M., Rubin G., Gelbart W. Drosophila White Paper 2001. 2001.
8. Richards S., Liu Y., Bettencourt B.R., Hradecky P., Letovsky S., Nielsen R., Thornton K., Hubisz M.J., Chen R., Meisel R.P., et al. Comparative genome sequencing of Drosophila pseudoobscura: chromosomal, gene, and cis-element evolution. Genome Res. 2005;15:1–18.
9. Begun D.J., Langley C.H. 2003. Proposal for sequencing of Drosophila yakuba and Drosophila simulans, revised.
10. Clark A., Gibson G., Kaufman T., McAllister B., Myers E., O'Grady P. 2003. Proposal for Drosophila as a model system for comparative genomics.
11. Stein L., Mungall C., Shu S., Caudy M., Mangone M., Day A., Nickerson E., Stajich J., Harris T., Arva A., et al. The generic genome browser: a building block for a model organism database. Genome Res. 2002;12:1599–1610.