NCBI News









Splign Transcript-to-Genomic Alignment Tool on the Web

One of the most reliable ways to identify genes is to align transcript sequences to a genomic sequence. Local alignment tools such as BLAST can quickly identify exons, but they do not include the nonaligning intronic segments in the alignment and they lack precision at splice junctions. Producing accurate eukaryotic gene models from transcript alignments requires a tool that combines local and global alignment algorithms and accurately tracks splice junctions. The new NCBI spliced-alignment tool, Splign, has these design features and is used to help annotate higher eukaryotic genomes at NCBI. In addition to the standalone application, Splign is now available through a Web interface. The Web interface, a link to download the standalone application, and the help documentation are all available from the Splign Homepage.
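To illustrate why a spliced aligner differs from a purely local one, here is a minimal toy sketch of a splice-aware global alignment. It is a textbook-style simplification, not Splign's actual algorithm: the whole transcript must align, flanking genomic sequence is free, and a skipped genomic stretch of at least `MIN_INTRON` bases is scored as an intron with a single flat penalty that is smaller when the stretch begins with GT and ends with AG. The function name `spliced_align`, all score values, and `MIN_INTRON` are hypothetical choices for illustration.

```python
from typing import List, Tuple

MATCH, MISMATCH, GAP = 2, -3, -4          # hypothetical substitution/gap scores
INTRON_CANONICAL, INTRON_OTHER = -8, -30  # hypothetical flat intron penalties
MIN_INTRON = 4                            # toy minimum intron length

Exon = Tuple[int, int, int, int]  # (t_start, t_end, g_start, g_end), 0-based half-open


def spliced_align(t: str, g: str) -> Tuple[int, List[Exon]]:
    """Align all of transcript t to genomic g, allowing intron-sized gaps."""
    n, m = len(t), len(g)
    dp = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 stays 0: leading genomic bases are free
    back: list = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0], back[i][0] = i * GAP, ("up",)
        for j in range(1, m + 1):
            s = MATCH if t[i - 1] == g[j - 1] else MISMATCH
            best, move = dp[i - 1][j - 1] + s, ("diag",)
            if dp[i - 1][j] + GAP > best:  # transcript base with no genomic partner
                best, move = dp[i - 1][j] + GAP, ("up",)
            if dp[i][j - 1] + GAP > best:  # genomic base missing from the transcript
                best, move = dp[i][j - 1] + GAP, ("left",)
            # Intron: skip genomic g[k:j] in one step with a flat penalty.
            for k in range(1, j - MIN_INTRON + 1):
                canonical = g[k:k + 2] == "GT" and g[j - 2:j] == "AG"
                pen = INTRON_CANONICAL if canonical else INTRON_OTHER
                if dp[i][k] + pen > best:
                    best, move = dp[i][k] + pen, ("intron", k)
            dp[i][j], back[i][j] = best, move

    j = max(range(m + 1), key=lambda col: dp[n][col])  # trailing genomic bases are free
    score, i = dp[n][j], n
    exons: List[Exon] = []
    t_end, g_end = i, j
    while i > 0:  # trace back, closing an exon each time an intron is crossed
        move = back[i][j]
        if move[0] == "intron":
            if t_end > i or g_end > j:  # skip empty exons between back-to-back introns
                exons.append((i, t_end, j, g_end))
            j = move[1]
            t_end, g_end = i, j
        elif move[0] == "diag":
            i, j = i - 1, j - 1
        elif move[0] == "up":
            i -= 1
        else:  # "left"
            j -= 1
    exons.append((0, t_end, j, g_end))
    return score, exons[::-1]


t = "ACGTTGCA" + "CCGGAATT"
g = "TTT" + "ACGTTGCA" + "GTAAAAAAAG" + "CCGGAATT" + "TTT"
print(spliced_align(t, g))  # (24, [(0, 8, 3, 11), (8, 16, 21, 29)])
```

The intron step scans every possible intron start, so this sketch is quadratic in genomic length and only suitable for short toy sequences; a production spliced aligner needs a much faster formulation and splice-site models richer than a simple GT...AG check.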


Splign generates transcript (cDNA)-to-genomic alignments that include detailed information about exon-intron boundaries, splice junctions, potential frameshifts, and other sequence discrepancies. When more than one gene model is possible, Splign can also produce alternative models. The Web version of Splign provides an interactive graphical view of the alignment or a complete table of results. Figure 1 shows the results of a comparison between a transcript for the fruit fly “pxt” gene, given in GenBank record AF238306, and the sequence of the right arm of fruit fly chromosome 3, given in NCBI RefSeq record NT_033777. In the figure, the fifth exon has been selected and the alignment for this segment is displayed. The intron-exon borders of the gene are easily identified both graphically and within the text sequence alignment. Statistics such as the length of each sequence in the alignment, the nucleotide positions of the exons, and the alignment coverage are provided. Sequence mismatches and insertions or deletions are color-coded for easy identification. In this segment, three mismatches and one small deletion in the transcript have been identified.
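The per-exon statistics described above (mismatches, insertions, deletions, coverage) can be illustrated with a small sketch that scans one exon's gapped alignment column by column. The `-` gap character, the function names `exon_stats` and `coverage`, and the 0-based half-open span convention are assumptions for illustration; they do not reproduce Splign's own output format.

```python
from typing import Dict, List, Tuple


def exon_stats(transcript_row: str, genomic_row: str) -> Dict[str, int]:
    """Count match, mismatch, insertion and deletion columns in one exon's
    gapped alignment; '-' marks a gap (an assumed convention)."""
    if len(transcript_row) != len(genomic_row):
        raise ValueError("aligned rows must have equal length")
    counts = {"match": 0, "mismatch": 0, "insertion": 0, "deletion": 0}
    for q, s in zip(transcript_row.upper(), genomic_row.upper()):
        if q == "-" and s == "-":
            raise ValueError("column is a gap in both rows")
        if q == "-":
            counts["deletion"] += 1   # genomic base absent from the transcript
        elif s == "-":
            counts["insertion"] += 1  # extra transcript base absent from the genome
        elif q == s:
            counts["match"] += 1
        else:
            counts["mismatch"] += 1
    return counts


def coverage(exon_spans: List[Tuple[int, int]], transcript_length: int) -> float:
    """Fraction of the transcript covered by non-overlapping exon spans
    given as 0-based, half-open transcript coordinates."""
    covered = sum(end - start for start, end in exon_spans)
    return covered / transcript_length


print(exon_stats("ACGT-ACGTA", "ACCTTACGAA"))
# {'match': 7, 'mismatch': 2, 'insertion': 0, 'deletion': 1}
print(coverage([(0, 90), (95, 100)], 100))  # 0.95
```

Following the article's wording, a gap in the transcript row is counted as a deletion in the transcript, and a gap in the genomic row as an insertion.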

Figure 1. Results of an mRNA (cDNA)-to-genomic alignment created by Splign. A. Graphical view of the alignment between Drosophila melanogaster sequences AF238306 and NT_033777. B. Tabular format of the same alignment. Boundaries of the aligned regions of the query and subject sequences are shown along with the identified bases at the intron-exon splice junctions.
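The splice-junction bases reported in the tabular view can be illustrated with a short sketch: given exon positions on the genomic sequence, read the first two and last two bases of each intron and flag the common GT-AG pair and the minor GC-AG and AT-AC pairs. The `Junction` record, the `splice_junctions` function, and the plus-strand, 0-based half-open coordinates are hypothetical conventions, not Splign's table layout.

```python
from typing import List, NamedTuple, Tuple


class Junction(NamedTuple):
    intron_start: int  # 0-based position of the first intron base
    intron_end: int    # one past the last intron base
    donor: str         # first two intron bases
    acceptor: str      # last two intron bases
    consensus: bool    # True for GT-AG, GC-AG or AT-AC


KNOWN_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}


def splice_junctions(genomic: str, exons: List[Tuple[int, int]]) -> List[Junction]:
    """Report the dinucleotides flanking each intron between consecutive exons.

    `exons` are sorted (g_start, g_end) genomic spans, 0-based half-open,
    on the plus strand.
    """
    g = genomic.upper()
    result = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        donor = g[left_end:left_end + 2]
        acceptor = g[right_start - 2:right_start]
        result.append(Junction(left_end, right_start, donor, acceptor,
                               (donor, acceptor) in KNOWN_PAIRS))
    return result


genome = "TTT" + "ACGTTGCA" + "GTAAAAAAAG" + "CCGGAATT" + "TTT"
print(splice_junctions(genome, [(3, 11), (21, 29)]))
# [Junction(intron_start=11, intron_end=21, donor='GT', acceptor='AG', consensus=True)]
```

For a gene on the minus strand, the same check would have to be applied to the reverse complement of the genomic sequence.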

 

