Analysis of Soybean Long Non-Coding RNAs Reveals a Subset of Small Peptide-Coding Transcripts

Plant Physiol. 2020 Mar;182(3):1359-1374. doi: 10.1104/pp.19.01324. Epub 2019 Dec 27.

Abstract

Long non-coding RNAs (lncRNAs) are defined as non-protein-coding transcripts that are at least 200 nucleotides long. They are known to play pivotal roles in regulating gene expression, especially during stress responses in plants. We used a large collection of in-house transcriptome data from various soybean (Glycine max and Glycine soja) tissues treated under different conditions to perform a comprehensive identification of soybean lncRNAs. We also retrieved publicly available soybean transcriptome data that were of sufficient quality and sequencing depth to enrich our analysis. In total, RNA-sequencing data of 332 samples were used for this analysis. An integrated reference-based, de novo transcript assembly was developed that identified ∼69,000 lncRNA gene loci. We showed that lncRNAs are distinct from both protein-coding transcripts and genomic background noise in terms of length, number of exons, transposable element composition, and sequence conservation level across legume species. The tissue-specific and time-specific transcriptional responses of the lncRNA genes under some stress conditions suggest that they may be biologically relevant. The transcription start sites of lncRNA gene loci tend to be close to their nearest protein-coding genes, and they may be transcriptionally related to the protein-coding genes, particularly for antisense and intronic lncRNAs. A previously unreported subset of small peptide-coding transcripts was identified from these lncRNA loci via tandem mass spectrometry, paving the way for investigating their functional roles. Our results also highlight the present inadequacy of the bioinformatic definition of lncRNA, which excludes those lncRNA gene loci with small open reading frames from being regarded as protein-coding.
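To illustrate why a purely sequence-based definition can miss small peptide-coding transcripts, the sketch below applies a simple length-plus-ORF filter. It is not the study's pipeline, which the abstract does not describe: the 200-nt minimum comes from the definition above, but the 100-codon ORF cutoff, the forward-strand-only ATG-to-stop ORF scan, and the names `longest_orf_codons` and `is_lncrna_candidate` are hypothetical choices made for this example.

```python
# Illustrative sketch only; not the method used in this study.
MIN_LNCRNA_LENGTH = 200  # nucleotides, from the lncRNA definition in the abstract
MAX_ORF_CODONS = 100     # ASSUMED coding cutoff, chosen for illustration

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq: str) -> int:
    """Return the length, in codons (stop excluded), of the longest complete
    ATG-initiated ORF on the forward strand across all three reading frames."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        # Step codon by codon; the bound keeps every slice a full triplet.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)
                start = None
    return best


def is_lncrna_candidate(seq: str) -> bool:
    """Hypothetical rule: long enough, and no ORF reaching the assumed cutoff."""
    return len(seq) >= MIN_LNCRNA_LENGTH and longest_orf_codons(seq) < MAX_ORF_CODONS


# A 300-nt transcript carrying a 30-codon ORF (a potential ~30-aa peptide).
small_orf_transcript = "ATG" + "GCT" * 29 + "TAA" + "C" * 207
# A transcript carrying a 150-codon ORF.
coding_transcript = "ATG" + "GCT" * 149 + "TAA"

print(longest_orf_codons(small_orf_transcript), is_lncrna_candidate(small_orf_transcript))
print(longest_orf_codons(coding_transcript), is_lncrna_candidate(coding_transcript))
```

Under these assumed thresholds, the first transcript is labeled a lncRNA even though its small ORF could be translated, which is the kind of locus the mass spectrometry evidence in this study brings into question.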

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Glycine max / genetics*
  • Open Reading Frames / genetics
  • RNA, Long Noncoding / genetics*
  • Tandem Mass Spectrometry

Substances

  • RNA, Long Noncoding