The non-redundant PDB chain set is a set of sequence-dissimilar PDB polypeptide chains. It is derived by clustering chains into groups according to their amino acid sequence similarity and selecting a representative from each group (see below for details of the method). Four sets of chains with different levels of non-redundancy are available. They are based on clustering with four different sequence-similarity cutoffs: BLAST p-values of 10e-7, 10e-40, 10e-80, and 100% sequence identity (see below). The set based on the p-value cutoff of 10e-7 is the most non-redundant one. The set based on 100% sequence identity simply contains all chains with different sequences.
Below, you can browse the non-redundant set,
or browse clusters of sequence-similar chains from each of which one chain
was selected to enter into the non-redundant set.
You can also download an ASCII text file that summarizes
the non-redundant PDB chain set.
Representatives
This lists a set of sequence-dissimilar chains (non-redundant set).

Cluster of Sequence-Similar Chains
This shows a cluster of sequence-similar chains to which the query PDB chain belongs and from which a representative is selected to enter into the non-redundant set.

Download Summary Table
This gives you an ASCII text file that summarizes the non-redundant sets.

Method for making the non-redundant set
All the chains available from PDB are compared with each other using the BLAST algorithm as implemented in the NCBI toolkit library. They are then clustered into groups of sequence-similar chains using the single-linkage clustering procedure. Chains within a sequence-similar group thus derived are automatically ranked according to the precision and completeness of their structural data. The following measures of the structural quality are used in this order of priority:
Representatives from all the groups together form a non-redundant set. In comparing sequences, the database-size parameter of the BLAST algorithm is fixed at 500,000. This allows constant p-value cutoffs to be used in clustering chains. Four different similarity cutoffs are used in the clustering: BLAST p-values of 10e-7, 10e-40, 10e-80, and 100% sequence identity. This results in a hierarchical clustering of PDB chains and four sets of representatives with different levels of non-redundancy. The non-redundant set does not include chains with fewer than 20 residues or chains whose coordinates come from a theoretical model. A chain with more than 5% "UNKNOWN" residues is included in the clustering but will not be selected as a representative.
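The clustering and eligibility rules described above can be illustrated with a minimal Python sketch. It is not the NCBI implementation: the `Chain` record, its field names, the function names, the chain labels and the p-values are all hypothetical, and treating a pair as linked when its p-value is less than or equal to the cutoff is an assumption. The sketch covers only the three p-value cutoffs (not the 100% identity set) and does not rank chains by structural quality.

```python
from dataclasses import dataclass

# P-value cutoffs copied as written in the text, loosest first.
CUTOFFS = (10e-7, 10e-40, 10e-80)


@dataclass(frozen=True)
class Chain:
    """Hypothetical record for one PDB chain (field names are assumptions)."""
    chain_id: str
    n_residues: int
    n_unknown: int
    is_theoretical_model: bool = False


def in_clustering(chain):
    """Chains under 20 residues or from theoretical models are left out."""
    return chain.n_residues >= 20 and not chain.is_theoretical_model


def can_represent(chain):
    """Chains with more than 5% unknown residues are clustered but never chosen."""
    return in_clustering(chain) and chain.n_unknown <= 0.05 * chain.n_residues


def single_linkage(chain_ids, pair_pvalues, cutoff):
    """Merge clusters whenever any pair across them has p-value <= cutoff."""
    parent = {c: c for c in chain_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for (a, b), p in pair_pvalues.items():
        if p <= cutoff and a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for c in chain_ids:
        clusters.setdefault(find(c), []).append(c)
    return sorted(sorted(members) for members in clusters.values())


# Made-up example data, for illustration only.
chains = [
    Chain("A", 100, 0),
    Chain("B", 100, 0),
    Chain("C", 150, 0),
    Chain("D", 120, 10),  # more than 5% unknown: clustered, never representative
    Chain("E", 12, 0),    # fewer than 20 residues: excluded entirely
]
pair_pvalues = {("A", "B"): 1e-90, ("B", "C"): 1e-50, ("C", "D"): 1e-10}

ids = [c.chain_id for c in chains if in_clustering(c)]
levels = {cutoff: single_linkage(ids, pair_pvalues, cutoff) for cutoff in CUTOFFS}
for cutoff, clusters in levels.items():
    print(cutoff, clusters)
```

In this made-up example, chains A and C share a cluster at the 10e-7 and 10e-40 cutoffs even though no direct A-C hit is listed, because single linkage joins them through B. Tightening the cutoff only splits existing clusters, which is why the sets nest hierarchically.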
The non-redundant set is updated on a regular basis (about once a month),
in synchronization with updates of MMDB and the VAST database of structure
neighbors.