Methods to detect transcribed pseudogenes: RNA-Seq discovery allows learning through features

Methods Mol Biol. 2014:1167:157-83. doi: 10.1007/978-1-4939-0835-6_11.

Abstract

The detection of transcripts and the measurement of their associated activity at the pseudogene level have recently become important research topics. As an integral part of many recent studies aimed at establishing a role for a variety of noncoding RNA structures, pseudogenes have grown substantially in popularity owing to the discovery of regulatory properties and complex mechanisms of action that, while requiring further investigation, analysis, and validation, also promise to have a broad impact on human disease. Currently, relatively few methodologies are specifically designed to detect pseudogene transcripts, and tools that either replace or integrate manual annotation procedures are very much needed. In particular, we consider it justified to advance the computational treatment of pseudogenes at the whole-transcriptome level. Catalogs of human pseudogenes have started to be delivered through RNA-Seq technologies. However, only a limited number of transcriptomes have been covered. Furthermore, while most proposals have led to a targeted algorithm, used especially for detection, few computational pipelines have been designed following a comprehensive approach that addresses both identification and quantification of transcriptional activity within a unifying methodological framework. Given the currently incomplete evidence, the limited impact resulting from the lack of extensive testing, and the unresolved uncertainties affecting the reproducibility of results, the proposal of a new computational approach is both well motivated and timely. We have considered a hybrid approach, based on the assembly of a variety of computational tools, including RNA-Seq methods and machine learning applications, all applied to transcriptome data of various complexities.
Our initial strategy is to provide lists of pseudogenes to be validated against the currently known examples, in order to extend existing knowledge. The ultimate goal naturally linked to this work is an automatic approach that analyzes transcriptomes to detect candidate pseudogenes through their characteristic features and that supports efficient and reproducible pseudogene classification models.

MeSH terms

  • Brain / metabolism
  • Computational Biology / methods*
  • Databases, Nucleic Acid
  • Genomics / methods*
  • Humans
  • Pseudogenes / genetics*
  • Sequence Analysis, RNA
  • Transcription, Genetic*
  • Transcriptome