J Mol Evol. 2006 Sep;63(3):415-25. Epub 2006 Aug 21.

Comparative genomics reveals long, evolutionarily conserved, low-complexity islands in yeast proteins.


Department of Computer Science, Hunter College, City University of New York, New York, New York 10021, USA.


Eukaryotic proteomes abound in low-complexity sequences, including tandem repeats and regions with significantly biased amino acid compositions. We assessed the functional importance of compositionally biased sequences in the yeast proteome using an evolutionary analysis of 2838 orthologous open reading frame (ORF) families from three Saccharomyces species (S. cerevisiae, S. bayanus, and S. paradoxus). Sequence conservation was measured by amino acid sequence variability and by the ratio of nonsynonymous to synonymous nucleotide substitutions (Ka/Ks) between pairs of orthologous ORFs. A total of 1033 ORF families contained one or more long (at least 45 residues) low-complexity islands, as defined by a measure based on the Shannon information index. Low-complexity islands were generally less conserved than ORFs as a whole: on average, they were 50% more variable in amino acid sequence and had 50% higher Ka/Ks ratios. Fast-evolving low-complexity sequences outnumbered conserved ones by a ratio of 10 to 1. Sequence differences between orthologous ORFs fit a selectively neutral Poisson model of sequence divergence well, so we used this Poisson model to identify conserved low-complexity sequences. ORFs encoding nucleic acid-binding proteins, cytoskeleton components, and intracellular transporters were overrepresented among those containing the 33 most conserved low-complexity sequences. While a few conserved low-complexity islands were known functional domains (e.g., DNA/RNA-binding domains), most were uncharacterized. We discuss how comparative genomics of closely related species can be further employed to distinguish functionally important, shorter low-complexity sequences from the vast majority of such sequences, which are likely maintained by neutral processes.
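The abstract names two computational ingredients without spelling them out: an entropy-based scan for long (at least 45-residue) low-complexity islands, and a neutral Poisson model of divergence used to flag unusually conserved islands. The sketch below is a minimal illustration under stated assumptions, not the authors' exact procedure. The 45-residue window follows the abstract's minimum island length, but the entropy cutoff of 3.0 bits, the rule for merging overlapping windows, and the use of the ORF-wide per-site difference rate as the Poisson mean are hypothetical choices made here; all function names are illustrative.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, window=45, threshold=3.0):
    """Start indices of windows whose entropy falls below threshold.

    window=45 mirrors the abstract's minimum island length; threshold=3.0 bits
    is a hypothetical cutoff (20 equiprobable residues give log2(20) ~ 4.32).
    """
    return [i for i in range(len(seq) - window + 1)
            if shannon_entropy(seq[i:i + window]) < threshold]


def merge_windows(starts, window):
    """Merge overlapping low-entropy windows into (start, end) islands, end exclusive.

    Assumed merging rule: a window joins the previous island if it starts
    at or before that island's end.
    """
    islands = []
    for s in starts:
        e = s + window
        if islands and s <= islands[-1][1]:
            islands[-1] = (islands[-1][0], max(islands[-1][1], e))
        else:
            islands.append((s, e))
    return islands


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam), summed term by term."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def conservation_pvalue(island_diffs, island_len, orf_diffs, orf_len):
    """Lower-tail probability of seeing <= island_diffs differences in an island.

    Hypothetical null: the island diverges at the ORF-wide per-site rate, so the
    expected number of differences is island_len * orf_diffs / orf_len.
    Small values suggest the island is more conserved than neutral expectation.
    """
    lam = island_len * orf_diffs / orf_len
    return poisson_cdf(island_diffs, lam)
```

For example, an island with 0 amino acid differences over 50 aligned residues, in an ORF with 100 differences over 1000 residues, has an expected count of 5 under this assumed null, and `conservation_pvalue(0, 50, 100, 1000)` returns e^-5 (about 0.0067), so it would look unusually conserved. A real analysis would also need alignment handling and multiple-testing control, which this sketch omits.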

[Indexed for MEDLINE]
