Format

Send to

Choose Destination
Front Plant Sci. 2013 Jul 2;4:170. doi: 10.3389/fpls.2013.00170. eCollection 2013.

Automated conserved non-coding sequence (CNS) discovery reveals differences in gene content and promoter evolution among grasses.

Author information

1
Department of Plant and Microbial Biology, University of California Berkeley, CA, USA.

Abstract

Conserved non-coding sequences (CNS) are islands of non-coding sequence that, like protein coding exons, show less divergence in sequence between related species than functionless DNA. Several CNSs have been demonstrated experimentally to function as cis-regulatory regions. However, the specific functions of most CNSs remain unknown. Previous searches for CNS in plants have either anchored on exons and only identified nearby sequences or required years of painstaking manual annotation. Here we present an open source tool that can accurately identify CNSs between any two related species with sequenced genomes, including both those immediately adjacent to exons and distal sequences separated by >12 kb of non-coding sequence. We have used this tool to characterize new motifs, associate CNSs with additional functions, and identify previously undetected genes encoding RNA and protein in the genomes of five grass species. We provide a list of 15,363 orthologous CNSs conserved across all grasses tested. We were also able to identify regulatory sequences present in the common ancestor of grasses that have been lost in one or more extant grass lineages. Lists of orthologous gene pairs and associated CNSs are provided for reference inbred lines of arabidopsis, Japonica rice, foxtail millet, sorghum, brachypodium, and maize.

KEYWORDS:

comparative genomics; conserved non-coding sequences; gene regulation; genome evolution; maize; rice; sorghum

Supplemental Content

Full text links

Icon for Frontiers Media SA Icon for PubMed Central
Loading ...
Support Center