Proteins
Databases
- BioSystems
- A database that groups biomedical literature, small molecules, and sequence data in terms of biological relationships.
- GenBank
- The NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA DataBank of Japan (DDBJ), the European Molecular Biology Laboratory (EMBL), and GenBank at NCBI. These three organizations exchange data on a daily basis. GenBank consists of several divisions, most of which can be accessed through the Nucleotide database. The exceptions are the EST and GSS divisions, which are accessed through the Nucleotide EST and Nucleotide GSS databases, respectively.
- Peptidome
- A public repository that archives and freely distributes tandem mass spectrometry peptide and protein identification data generated by the scientific community. Includes identified proteins, identified peptides for protein identification, mass spectra that support identifications, and supporting documentation.
- Protein
- A database that includes protein sequence records from a variety of sources, including GenPept, RefSeq, Swiss-Prot, PIR, PRF, and PDB.
- Protein Clusters
- A collection of related protein sequences (clusters), consisting of Reference Sequence proteins encoded by complete prokaryotic and organelle plasmids and genomes. The database provides easy access to annotation information, publications, domains, structures, external links, and analysis tools.
- Reference Sequence (RefSeq)
- A collection of curated, non-redundant genomic DNA, transcript (RNA), and protein sequences produced by NCBI. RefSeqs provide a stable reference for genome annotation, gene identification and characterization, mutation and polymorphism analysis, expression studies, and comparative analyses. The RefSeq collection is accessed through the Nucleotide and Protein databases.
Tools
- BLAST (Basic Local Alignment Search Tool)
- Finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as to help identify members of gene families.
- BLAST Link (BLink)
- A link option on protein records that displays the results of a pre-computed BLAST search of that protein against all other protein sequences at NCBI.
- Conserved Domain Search Service (CD-Search)
- Identifies the conserved domains present in a protein sequence. CD-Search uses RPS-BLAST (Reverse Position-Specific BLAST) to compare a query sequence against position-specific score matrices that have been prepared from conserved domain alignments present in the Conserved Domain Database (CDD).
- E-Utilities
- Tools that provide access to data within NCBI's Entrez system outside of the regular web query interface. They provide a method of automating Entrez tasks within software applications. Each utility performs a specialized retrieval task, and can be used simply by writing a specially formatted URL.
- Open Mass Spectrometry Search Algorithm (OMSSA) Search
- An efficient search engine for identifying MS/MS peptide spectra by searching libraries of known protein sequences. OMSSA scores significant hits with a probability score developed using classical hypothesis testing, the same statistical method used in BLAST.
- ProSplign
- A utility for computing alignments of proteins to genomic nucleotide sequences. It is based on a variation of the Needleman-Wunsch global alignment algorithm and specifically accounts for introns and splice signals. As a result, ProSplign is accurate in determining splice sites and tolerant of sequencing errors.
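As a rough illustration of the "specially formatted URL" mentioned in the E-Utilities entry above, the sketch below assembles an EFetch request URL for the Protein database. The helper name `build_efetch_url` and the example accession are assumptions made for illustration. The code only builds the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the E-Utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_efetch_url(db, ids, rettype="fasta", retmode="text"):
    """Return an EFetch URL for one or more record identifiers.

    build_efetch_url is a hypothetical helper, not part of any NCBI library.
    """
    if not ids:
        raise ValueError("at least one identifier is required")
    params = {
        "db": db,                # Entrez database to query, e.g. "protein"
        "id": ",".join(ids),     # comma-separated accessions or GI numbers
        "rettype": rettype,      # requested record type
        "retmode": retmode,      # requested output mode
    }
    # Keep commas unescaped so the identifier list stays readable.
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params, safe=",")


# Example accession chosen purely for illustration.
url = build_efetch_url("protein", ["NP_000509.1"])
print(url)
```

Opening the printed URL in a browser, or passing it to an HTTP client, would be the next step. Any real use should follow NCBI's usage guidelines for request rates.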
Downloads
- Batch Protein
- Allows you to retrieve a large number of sequences from the NCBI protein or nucleotide databases in batch mode by importing a file containing a list of the desired accession numbers or identifiers (GI numbers). Search results are saved directly to a local disk file on your computer.
- FTP: GenPept
- The protein sequences corresponding to the translations of coding sequences (CDS) in GenBank are collected for each GenBank release. Please see the README file in the directory for more information.
- FTP: Protein Clusters
- This site contains data from the Protein Clusters database arranged by release date. See the README files for more information.
- FTP: RefSeq
- This site contains all nucleotide and protein sequence records in the Reference Sequence (RefSeq) collection. The "release" directory contains the most current release of the complete collection, while data for selected organisms (such as human, mouse and rat) are available in separate directories. Data are available in FASTA and flat file formats. See the README file for details.
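For the FASTA files mentioned in the FTP: RefSeq entry above, a minimal reader might look like the following sketch. It assumes plain, uncompressed FASTA text in which each record starts with a `>` header line. The function name `parse_fasta` and the sample records are invented for illustration and are not real RefSeq data.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence data may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up sample input, not taken from any NCBI release.
sample = """>seq1 hypothetical protein
MKT
AYI
>seq2 another example
GGS
""".splitlines()

records = list(parse_fasta(sample))
for name, seq in records:
    print(name, len(seq))
```

Because the function accepts any iterable of lines, an open file handle for a downloaded file can be passed in directly.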
Submissions
- Peptidome Submission
- A submission form for data from tandem mass spectrometry experiments on any species.