NCBI Bookshelf. A service of the National Library of Medicine, National Institutes of Health.
This publication is provided for historical reference only and the information may be out of date.
Bioinformatics consists of a computational approach to biomedical information management and analysis. It is being used increasingly as a component of research within both academic and industrial settings and is becoming integrated into both undergraduate and postgraduate curricula. The new generation of biology graduates is emerging with experience in using bioinformatics resources and, in some cases, programming skills.
The National Center for Biotechnology Information (NCBI) is one of the world's premier Web sites for biomedical and bioinformatics research. Based within the National Library of Medicine at the National Institutes of Health, USA, the NCBI hosts many databases used by biomedical and research professionals. The services include PubMed, the bibliographic database; GenBank, the nucleotide sequence database; and the BLAST algorithm for sequence comparison, among many others.
Although each NCBI resource has online help documentation associated with it, there is no cohesive approach to describing the databases and search engines, nor any significant information on how the databases work or how they can be leveraged for bioinformatics research on a larger scale. The NCBI Handbook is designed to address this information gap.
All of our users know how to execute a straightforward PubMed or BLAST search. However, feedback from help desk personnel and booth staff at scientific meetings suggests that people often want to know how to use our resources in a more sophisticated manner and are frequently unaware of less well-known databases that might be helpful to them. The intended audience for The NCBI Handbook is, therefore, the growing number of scientists and students who would like a more in-depth guide to NCBI resources: power users and aspiring power users.
The NCBI Handbook is focused on the relatively stable information about each resource; it is not a point-and-click user guide (this type of information can be found in the online help documents, which are referred to frequently, but not repeated, in the Handbook). Each chapter is devoted to one service; after a brief overview on using the resource, there is an account of how the resource works, including topics such as how data are included in a database, database design, query processing, and how the different resources relate to each other. For example, the BLAST chapter briefly describes what to use BLAST for, the varieties of the BLAST algorithm, and BLAST statistics, before discussing output formats, query processing, and tips for setting up a BLAST database. A certain amount of biological knowledge is assumed.
The online content will be updated when necessary, although major changes are not expected to occur more than once every few years. (For example, PubMed query processing does not change dramatically year after year.) We hope that The NCBI Handbook will provide a valuable reference for anyone who wants to use our resources more effectively.
Contents
- Part 1. The Databases
- Chapter 1. GenBank: The Nucleotide Sequence Database. Ilene Mizrachi. Created: October 9, 2002; Last Update: August 22, 2007.
- History
- International Collaboration
- Confidentiality of Data
- Direct Submissions
- Bulk Submissions: High-Throughput Genomic Sequence (HTGS)
- Whole Genome Shotgun Sequences (WGS)
- Bulk Submissions: EST, STS, and GSS
- Bulk Submissions: HTC and FLIC
- Submission Tools
- Sequence Data Flow and Processing: From Laboratory to GenBank
- Microbial Genomes
- Third Party Annotation (TPA) Sequence Database
- Appendix: GenBank, RefSeq, TPA and UniProt: What’s in a Name?
- References
- Chapter 2. PubMed: The Bibliographic Database. Kathi Canese, Jennifer Jentsch, and Carol Myers. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 3. Macromolecular Structure Databases. Eric Sayers and Steve Bryant. Created: October 9, 2002; Last Update: August 13, 2003.
- Overview
- Content of the Molecular Modeling Database (MMDB)
- Content of the Conserved Domain Database (CDD)
- Finding and Viewing Structures
- Finding and Viewing Structure Neighbors
- Finding and Viewing Conserved Domains
- Finding and Viewing Proteins with Similar Domain Architectures
- Links Between Structure and Other Resources
- Saving Output from Database Searches
- FTP
- Frequently Asked Questions
- References
- Chapter 4. The Taxonomy Project. Scott Federhen. Created: October 9, 2002; Last Update: August 13, 2003.
- Introduction
- Adding to the Taxonomy Database
- Using the Taxonomy Browser
- The Taxonomy Database: TAXON
- Nomenclature Issues
- Taxonomy in Entrez: A Quick Tour
- The Common Tree Viewer
- Indexing Taxonomy in Entrez
- The Taxonomy Statistics Page
- Other Relevant References
- NCBI Taxonomists
- Contact Us
- Appendix 1. TAXON nametypes
- Appendix 2. Functional classes of TAXON scientific names
- Appendix 3. Other TAXON data types
- Chapter 5. The Single Nucleotide Polymorphism Database (dbSNP) of Nucleotide Sequence Variation. Adrienne Kitts and Stephen Sherry. Created: October 9, 2002; Last Update: February 2, 2011.
- Introduction
- Searching dbSNP
- Submitted Content
- Computed Content (The dbSNP Build Cycle)
- dbSNP Resource Integration
- How to Create a Local Copy of dbSNP
- Appendix 1. dbSNP report formats
- Appendix 2. Rules and methodology for mapping
- Appendix 3. Alignment profiling function
- Appendix 4. 3D structure neighbor analysis
- Chapter 6. The Gene Expression Omnibus (GEO): A Gene Expression and Hybridization Repository. Ron Edgar and Alex Lash. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 7. Online Mendelian Inheritance in Man (OMIM): A Directory of Human Genes and Genetic Disorders. Donna Maglott, Joanna S. Amberger, and Ada Hamosh. Created: October 9, 2002.
- Chapter 8. The NCBI BookShelf: Searchable Biomedical Books. Bart Trawick, Jeff Beck, and Jo McEntyre. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 9. PubMed Central (PMC): An Archive for Literature from Life Sciences Journals. Jeff Beck and Ed Sequeira. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 10. The SKY/CGH Database for Spectral Karyotyping and Comparative Genomic Hybridization Data. Turid Knutsen, Vasuki Gobu, Rodger Knaus, Thomas Ried, and Karl Sirotkin. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 11. The Major Histocompatibility Complex Database, dbMHC. Adrienne Kitts, Michael Feolo, and Wolfgang Helmberg. Created: May 27, 2003; Last Update: August 13, 2003.
- Part 2. Data Flow and Processing
- Chapter 12. Sequin: A Sequence Submission and Editing Tool. Jonathan Kans. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 13. The Processing of Biological Sequence Data at NCBI. Karl Sirotkin, Tatiana Tatusova, Eugene Yaschenko, and Mark Cavanaugh. Created: October 9, 2002; Last Update: March 14, 2006.
- Chapter 14. Genome Assembly and Annotation Process. Paul Kitts. Created: October 9, 2002; Last Update: August 13, 2003.
- Overview of the Genome Assembly and Annotation Process
- The Input Data
- Preparation of the Input Sequences
- Alignment of Sequences to the Input Genomic Sequences
- Genome Assembly
- Annotation of Genes
- Annotation of Other Features
- Product Data Sets
- Production of Maps That Display Genome Features
- Public Release of Assembly and Models
- Integration with Other Resources
- Contributors
- References
- Part 3. Querying and Linking the Data
- Chapter 15. The Entrez Search and Retrieval System. Jim Ostell. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 16. The BLAST Sequence Analysis Tool. Tom Madden. Created: October 9, 2002; Last Update: August 13, 2003.
- Introduction
- How BLAST Works: The Basics
- BLAST Scores and Statistics
- BLAST Output: 1. The Traditional Report
- BLAST Output: 2. The Hit Table
- BLAST Output: 3. Structured Output
- BLAST Code
- Appendix 1. FASTA identifiers
- Appendix 2. Readdb API
- Appendix 3. Excerpt from a demonstration program doblast.c
- Appendix 4. A function to print a view of a SeqAlign: MySeqAlignPrint
- References
- Chapter 17. LinkOut: Linking to External Resources from Entrez Databases. Kathy Kwan. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 18. The Reference Sequence (RefSeq) Database. Kim Pruitt, Garth Brown, Tatiana Tatusova, and Donna Maglott. Created: October 9, 2002; Last Update: April 6, 2012.
- Chapter 19. Gene: A Directory of Genes. Donna Maglott, Kim Pruitt, and Tatiana Tatusova. Created: March 3, 2005; Last Update: December 12, 2011.
- Chapter 20. Using the Map Viewer to Explore Genomes. Susan M. Dombrowski and Donna Maglott. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 21. UniGene: A Unified View of the Transcriptome. Joan U. Pontius, Lukas Wagner, and Gregory D. Schuler. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 22. The Clusters of Orthologous Groups (COGs) Database: Phylogenetic Classification of Proteins from Complete Genomes. Eugene V. Koonin. Created: October 9, 2002; Last Update: August 13, 2003.
- Part 4. User Support
- Chapter 23. User Services: Helping You Find Your Way. David Wheeler and Barbara Rapp. Created: October 9, 2002; Last Update: August 13, 2003.
- Chapter 24. Exercises: Using Map Viewer. David Wheeler, Kim Pruitt, Donna Maglott, Susan Dombrowski, and Andrei Gabrelian. Created: November 4, 2002; Last Update: August 13, 2003.
- 1. How Do I Obtain the Genomic Sequence around My Gene of Interest?
- 2. If I Have Physical and/or Genetic Mapping Data, How Do I Use the Map Viewer to Find a Candidate Disease Gene in That Region?
- 3. How Can I Find and Display a Gene with the Map Viewer?
- 4. How Can I Analyze a Gene Using the Map Viewer?
- 5. How Can I Create My Own Transcript Models with the Map Viewer?
- 6. Using the Mouse Map Viewer
- 7. How Can I Find Members of a Gene Family Using the Map Viewer?
- 8. How Can I Find Genes Encoding a Protein Domain Using the Map Viewer?
- Glossary