ACM BCB. 2018 Aug-Sep;2018:200-210. doi: 10.1145/3233547.3233592.

Splice-Aware Multiple Sequence Alignment of Protein Isoforms.

Author information

1. University of Montana, Missoula, Montana; alexander.nord@umontana.edu
2. University of Montana, Missoula, Montana; kaitlin1.carey@umconnect.umt.edu
3. Cell Signaling Technology, Inc., Danvers, Massachusetts; phornbeck@cellsignal.com
4. University of Montana, Missoula, Montana; travis.wheeler@umontana.edu

Abstract

Multiple sequence alignment (MSA) is a classic problem in computational genomics. In typical use, MSA software is expected to align a collection of homologous genes, such as orthologs from multiple species or duplication-induced paralogs within a species. Recent focus on the importance of alternatively spliced isoforms in disease and cell biology has highlighted the need for MSAs that accommodate isoforms more effectively. MSAs are traditionally constructed using scoring criteria that prefer alignments with occasional mismatches over alignments with long gaps. Alternatively spliced protein isoforms, however, effectively contain exon-length insertions or deletions (indels) relative to each other, and therefore demand a different approach. Some improvement can be achieved by making indel penalties much smaller, but this is merely a patchwork solution. In this work we present Mirage, a novel MSA software package for the alignment of alternatively spliced protein isoforms. Mirage aligns isoforms to each other by first mapping each protein sequence to its encoding genomic sequence, and then aligning the isoforms to one another based on the relative genomic coordinates of their constituent codons. Mirage is highly effective at mapping proteins back to their encoding exons, and these protein-genome mappings lead to extremely accurate intra-species alignments; splice site information in these alignments is then used to improve the accuracy of inter-species alignments of isoforms. Mirage alignments have also revealed the ubiquity of dual-coding exons, in which an exon conditionally encodes multiple open reading frames as overlapping spliced segments of frame-shifted genomic sequence.
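As a rough illustration of the second step described above (aligning isoforms by the genomic coordinates of their codons), the sketch below assumes the protein-to-genome mapping has already been computed. Each isoform is given as a protein sequence plus, for every residue, the genomic start coordinate of the codon that encodes it. Residues encoded by the same codon share a column, and every other cell is a gap, so a skipped exon becomes an exon-length gap without any gap-penalty tuning. The function name `align_by_genomic_coords`, the input format, and the plus-strand, single-chromosome assumption are all hypothetical; this is not Mirage's actual interface or implementation.

```python
from typing import Dict, List, Tuple


def align_by_genomic_coords(
    isoforms: Dict[str, Tuple[str, List[int]]]
) -> Dict[str, str]:
    """Hypothetical sketch of coordinate-based isoform alignment.

    Each value is (protein_sequence, codon_starts), where codon_starts[i] is
    the genomic coordinate of the first nucleotide of the codon encoding
    residue i. Assumes a plus-strand gene on a single chromosome.
    """
    for name, (seq, coords) in isoforms.items():
        if len(seq) != len(coords):
            raise ValueError(f"{name}: sequence and coordinate lengths differ")

    # One alignment column per distinct codon start, in genomic order.
    columns = sorted({c for _, coords in isoforms.values() for c in coords})
    col_index = {c: j for j, c in enumerate(columns)}

    aligned = {}
    for name, (seq, coords) in isoforms.items():
        row = ["-"] * len(columns)  # gap wherever this isoform has no codon
        for residue, c in zip(seq, coords):
            row[col_index[c]] = residue
        aligned[name] = "".join(row)
    return aligned


# Toy gene (invented data): exon 1 codons start at 100-106, exon 2 at 200-203,
# exon 3 at 300-306. isoform_b skips exon 2.
isoforms = {
    "isoform_a": ("MKLQRSTV", [100, 103, 106, 200, 203, 300, 303, 306]),
    "isoform_b": ("MKLSTV", [100, 103, 106, 300, 303, 306]),
}

for name, row in align_by_genomic_coords(isoforms).items():
    print(name, row)
# isoform_a MKLQRSTV
# isoform_b MKL--STV
```

Because columns are keyed on exact codon start positions, residues translated from a frame-shifted reading of the same exon (as in a dual-coding exon) would land in different columns rather than being forced into alignment.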

KEYWORDS:

Multiple sequence alignment; alternative splicing; dual-coding exons; protein isoforms
