Send to

Choose Destination
Nucleic Acids Res. 2013 Sep;41(17):e166. doi: 10.1093/nar/gkt646. Epub 2013 Jul 27.

Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts.

Author information

Bioinformatics Research Group, Advanced Computing Research Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China, College of Computer Science and Technology, Jilin University, Changchun 130012, China and Laboratory of Bioinformatics and Non-coding RNA, Institute of Biophysics, Chinese Academy of Sciences, Beijing 100101, China.


It is a challenge to classify protein-coding or non-coding transcripts, especially those re-constructed from high-throughput sequencing data of poorly annotated species. This study developed and evaluated a powerful signature tool, Coding-Non-Coding Index (CNCI), by profiling adjoining nucleotide triplets to effectively distinguish protein-coding and non-coding sequences independent of known annotations. CNCI is effective for classifying incomplete transcripts and sense-antisense pairs. The implementation of CNCI offered highly accurate classification of transcripts assembled from whole-transcriptome sequencing data in a cross-species manner, that demonstrated gene evolutionary divergence between vertebrates, and invertebrates, or between plants, and provided a long non-coding RNA catalog of orangutan. CNCI software is available at

[Indexed for MEDLINE]
Free PMC Article

Supplemental Content

Full text links

Icon for Silverchair Information Systems Icon for PubMed Central
Loading ...
Support Center