In the simplest case, BLASTClust takes as input a file containing concatenated FASTA-format sequences, each with a unique identifier at the start of the definition line. BLASTClust formats the input sequences to produce a temporary BLAST database, performs the clustering, and removes the database on completion. Hence, there is no need to run formatdb before using BLASTClust.

The output of BLASTClust is a file listing one cluster per line, each line consisting of sequence identifiers separated by spaces. The clusters are sorted from largest to smallest.

    blastclust -i infile -o outfile -p F -L .9 -b T -S 95

The sequences in "infile" will be clustered and the results written to "outfile". The input sequences are identified as nucleotide (-p F); "-p T", or protein, is the default. To register a pairwise match, two sequences must be 95% identical (-S 95) over a region covering 90% of the length (-L .9) of each sequence (-b T). Using "-b F" instead of "-b T" would enforce the alignment length threshold on only one member of a sequence pair.

The parameter "S", used here to specify the percent identity, can instead specify a "score density", which is the BLAST score divided by the alignment length. If "S" is given as a number between 0 and 3, it is interpreted as a score density threshold; otherwise it is interpreted as a percent identity threshold.

    blastclust -i infile -o outfile -p T -L 1 -b T -S 100

In this case, only identical sequences will be clustered together.

The "blastclust.txt" file in the standalone BLAST package details the full range of BLASTClust parameters.
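The output format and the "S" rule described above are simple enough to handle in a few lines of script. The following Python sketch is illustrative only and is not part of the BLAST package: the function names `parse_clusters` and `threshold_kind` and the sample identifiers are hypothetical, and because the text does not say whether the endpoints 0 and 3 count as score densities, the sketch assumes they do not.

```python
from typing import List


def parse_clusters(text: str) -> List[List[str]]:
    """Parse BLASTClust-style output: one cluster per line,
    sequence identifiers separated by spaces."""
    clusters = []
    for line in text.splitlines():
        ids = line.split()
        if ids:  # ignore blank lines
            clusters.append(ids)
    return clusters


def threshold_kind(s: float) -> str:
    """Classify an -S value using the rule quoted in the article.

    Assumption: "between 0 and 3" is treated as an exclusive range,
    since the article does not state how the endpoints are handled.
    """
    return "score density" if 0 < s < 3 else "percent identity"


# Hypothetical output file contents: three clusters, largest first.
example_output = "seq1 seq4 seq7\nseq2 seq5\nseq3\n"

clusters = parse_clusters(example_output)
sizes = [len(cluster) for cluster in clusters]

print(sizes)
print(threshold_kind(95), "/", threshold_kind(1.75))
```

Reading the cluster sizes this way gives a quick check that the file is ordered from the largest cluster to the smallest, as the article describes.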