In the simplest case, BLASTClust takes as input a file containing concatenated FASTA-format sequences, each with a unique identifier at the start of the definition line. BLASTClust formats the input sequences to produce a temporary BLAST database, performs the clustering, and removes the database at completion. Hence, there is no need to run formatdb in advance to use BLASTClust. The output of BLASTClust is a file listing one cluster per line, each line consisting of sequence identifiers separated by spaces. The clusters are sorted from the largest to the smallest.
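The output format described above (one cluster per line, identifiers separated by spaces) is simple to read back into a program. The following is a minimal sketch in Python; the function name `parse_blastclust_output` and the sample identifiers are illustrative assumptions, not part of BLASTClust.

```python
from typing import List


def parse_blastclust_output(text: str) -> List[List[str]]:
    """Parse BLASTClust-style output into a list of clusters.

    Each non-blank line is treated as one cluster, and each
    whitespace-separated token on the line as one sequence identifier.
    """
    clusters = []
    for line in text.splitlines():
        identifiers = line.split()
        if identifiers:  # ignore blank lines
            clusters.append(identifiers)
    return clusters


# Hypothetical output text; the identifiers are made up for illustration.
example_output = "seqA seqB seqC\nseqD seqE\nseqF\n"

clusters = parse_blastclust_output(example_output)
cluster_sizes = [len(cluster) for cluster in clusters]
```

Because the clusters are written largest first, `cluster_sizes` should be in non-increasing order for a real output file, which gives a quick sanity check on the parse.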
blastclust -i infile -o outfile -p F -L .9 -b T -S 95
The sequences in "infile" will be clustered and the results will be written to "outfile". The input sequences are identified as nucleotide (-p F); protein (-p T) is the default. To register a pairwise match, two sequences must be 95% identical (-S 95) over an area covering 90% of the length (-L .9) of each sequence (-b T). Using "-b F" instead of "-b T" would enforce the alignment length threshold on only one member of a sequence pair. The "-S" parameter, used here to specify the percent identity, can instead specify a "score density", which is the BLAST score divided by the alignment length. If "-S" is given as a number between 0 and 3, it is interpreted as a score density threshold; otherwise it is interpreted as a percent identity threshold.
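The threshold rules above can be sketched in Python to make the logic concrete. This is an illustration of the description, not BLASTClust's actual code: all function names are hypothetical, the treatment of a value of exactly 3 as percent identity is an assumption, the reading of "-b F" as "at least one sequence meets the coverage" is an assumption, and a single alignment length is used for both sequences as a simplification.

```python
from typing import Tuple


def interpret_similarity_threshold(s: float) -> Tuple[str, float]:
    """Classify an -S value as a score density or a percent identity.

    Values from 0 up to (but not including) 3 are read as score density;
    anything else is read as percent identity. The handling of exactly 3
    is an assumption made for this sketch.
    """
    if 0 <= s < 3:
        return ("score_density", s)
    return ("percent_identity", s)


def score_density(blast_score: float, alignment_length: int) -> float:
    """Score density: the BLAST score divided by the alignment length."""
    return blast_score / alignment_length


def passes_length_coverage(alignment_length: int, len_a: int, len_b: int,
                           min_coverage: float, both: bool) -> bool:
    """Check the -L coverage threshold for a pair of sequences.

    both=True mirrors "-b T" (coverage required on each sequence);
    both=False mirrors "-b F" (assumed here: coverage on either one).
    """
    coverage_a = alignment_length / len_a
    coverage_b = alignment_length / len_b
    if both:
        return coverage_a >= min_coverage and coverage_b >= min_coverage
    return coverage_a >= min_coverage or coverage_b >= min_coverage


# -S 95 is read as percent identity, while a value such as 1.75 is a density.
mode_95 = interpret_similarity_threshold(95)
mode_density = interpret_similarity_threshold(1.75)

# A 95-residue alignment covers 95% of a 100-residue sequence
# but only 47.5% of a 200-residue sequence.
strict = passes_length_coverage(95, 100, 200, 0.9, both=True)
relaxed = passes_length_coverage(95, 100, 200, 0.9, both=False)
```

In this example the pair fails under the "-b T" reading, because the longer sequence is not sufficiently covered, but passes under the "-b F" reading.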
blastclust -i infile -o outfile -p T -L 1 -b T -S 100
In this case, only sequences that are identical will be clustered together. The “blastclust.txt” file in the standalone BLAST package details the full range of BLASTClust parameters.
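For scripted use, the two invocations shown in this article can be assembled from their parameters. The sketch below only builds the argument list and does not run anything; the helper name `build_blastclust_command` is hypothetical, and only the options discussed here (-i, -o, -p, -L, -b, -S) are used. Note that the coverage is written as 0.9 rather than .9.

```python
import shlex
from typing import List


def build_blastclust_command(infile: str, outfile: str, protein: bool,
                             length_coverage: float, both: bool,
                             similarity: float) -> List[str]:
    """Assemble a blastclust argument list from the options in this article."""
    return [
        "blastclust",
        "-i", infile,
        "-o", outfile,
        "-p", "T" if protein else "F",   # T = protein, F = nucleotide
        "-L", f"{length_coverage:g}",    # fraction of sequence length covered
        "-b", "T" if both else "F",      # T = coverage required on both sequences
        "-S", f"{similarity:g}",         # score density or percent identity
    ]


# Nucleotide clustering at 95% identity over 90% of each sequence.
nucleotide_cmd = build_blastclust_command(
    "infile", "outfile", protein=False,
    length_coverage=0.9, both=True, similarity=95)

# Protein clustering of identical sequences only.
identical_cmd = build_blastclust_command(
    "infile", "outfile", protein=True,
    length_coverage=1, both=True, similarity=100)

# shlex.join (Python 3.8+) quotes arguments safely for display in a shell.
print(shlex.join(nucleotide_cmd))
print(shlex.join(identical_cmd))
```

The resulting list could be passed to `subprocess.run` on a system where the standalone BLAST package is installed; that step is left out here because it depends on the local installation.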