J Comput Biol. 2018 Jun;25(6):606-612. doi: 10.1089/cmb.2017.0262. Epub 2018 Apr 16.

Simple Comparative Analyses of Differentially Expressed Gene Lists May Overestimate Gene Overlap.

Author information

1 Department of Mathematics, Winthrop University, Rock Hill, South Carolina.
2 Department of Biology, Florida Southern College, Lakeland, Florida.
3 Department of Mathematics and Statistics, University of North Carolina at Greensboro, Greensboro, North Carolina.
4 Department of Biology, University of North Carolina at Greensboro, Greensboro, North Carolina.

Abstract

Comparing the overlap between sets of differentially expressed genes (DEGs) within or between transcriptome studies is regularly used to infer similarities between biological processes. Significant overlap between two sets of DEGs is usually determined by a simple test. The number of potentially overlapping genes is compared to the number of genes that actually occur in both lists, treating every gene as equal. However, gene expression is controlled by transcription factors that bind to a variable number of transcription factor binding sites, leading to variation among genes in general variability of their expression. Neglecting this variability could therefore lead to inflated estimates of significant overlap between DEG lists. With computer simulations, we demonstrate that such biases arise from variation in the control of gene expression. Significant overlap commonly arises between two lists of DEGs that are randomly generated, assuming that the control of gene expression is variable among genes but consistent between corresponding experiments. More overlap is observed when transcription factors are specific to their binding sites and when the number of genes is considerably higher than the number of different transcription factors. In contrast, overlap between two DEG lists is always lower than expected when the genetic architecture of expression is independent between the two experiments. Thus, the current methods for determining significant overlap between DEGs are potentially confounding biologically meaningful overlap with overlap that arises due to variability in control of expression among genes, and more sophisticated approaches are needed.
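
The "simple test" described above, and the way unequal gene variability can inflate overlap, can be illustrated with a short sketch. This is not the authors' simulation, which models transcription factors and their binding sites. It is a simplified stand-in under stated assumptions: the simple test is taken to be a hypergeometric tail probability, and per-gene variability is represented by a hypothetical weight giving each gene's propensity to appear in a DEG list, with the same weights used for both lists. The names `overlap_p_value`, `weighted_sample`, and `mean_overlap`, the lognormal weights, and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def overlap_p_value(n_genes, n_a, n_b, k):
    """P(overlap >= k) under a hypergeometric null that treats every gene as equal.

    n_genes: total genes assayed; n_a, n_b: sizes of the two DEG lists;
    k: observed number of genes present in both lists.
    """
    upper = min(n_a, n_b)
    total = math.comb(n_genes, n_b)
    tail = sum(
        math.comb(n_a, i) * math.comb(n_genes - n_a, n_b - i)
        for i in range(k, upper + 1)
    )
    return tail / total


def weighted_sample(weights, size, rng):
    """Draw `size` distinct gene indices, favoring genes with larger weights.

    Uses the Efraimidis-Spirakis key method: key = u ** (1 / w), keep the
    largest keys. All weights must be positive.
    """
    keys = [(rng.random() ** (1.0 / w), gene) for gene, w in enumerate(weights)]
    keys.sort(reverse=True)
    return {gene for _, gene in keys[:size]}


def mean_overlap(weights, size, reps, rng):
    """Average overlap between two independently drawn lists sharing one weight vector."""
    total = 0
    for _ in range(reps):
        list_a = weighted_sample(weights, size, rng)
        list_b = weighted_sample(weights, size, rng)
        total += len(list_a & list_b)
    return total / reps


if __name__ == "__main__":
    rng = random.Random(0)
    n_genes, list_size, reps = 1000, 100, 50

    # Hypothetical propensities: equal for every gene vs. strongly unequal.
    equal = [1.0] * n_genes
    unequal = [rng.lognormvariate(0.0, 1.5) for _ in range(n_genes)]

    print("expected overlap under the null:", list_size * list_size / n_genes)
    print("mean overlap, equal genes:", mean_overlap(equal, list_size, reps, rng))
    print("mean overlap, unequal genes:", mean_overlap(unequal, list_size, reps, rng))

    # p-value the simple test would assign to a typical overlap of random lists
    typical = round(mean_overlap(unequal, list_size, reps, rng))
    print("p-value for that overlap:", overlap_p_value(n_genes, list_size, list_size, typical))
```

In this sketch both lists are random, so any overlap beyond the null expectation comes only from the shared per-gene weights. Drawing the second list from an independent weight vector would correspond to the abstract's case where genetic architecture differs between experiments.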

KEYWORDS:

differentially expressed genes; gene regulation; genetic architecture; transcription factor binding sites; transcriptome

PMID: 29658777
PMCID: PMC5998827 [Available on 2019-06-01]
DOI: 10.1089/cmb.2017.0262
