Brief Bioinform. 2019 Nov 27. pii: bbz129. doi: 10.1093/bib/bbz129. [Epub ahead of print]

A comprehensive evaluation of connectivity methods for L1000 data.

Lin K1, Li L1, Dai Y2, Wang H2, Teng S2, Bao X3, Lu ZJ1,4,5, Wang D2,4,6,7.

Author information

1 School of Life Sciences, Tsinghua University, Beijing 100084, China.
2 School of Medicine, Tsinghua University, Beijing 100084, China.
3 International Mongolian Hospital of Inner Mongolia, Hohhot 010065, China.
4 Center of Synthetic & Systems Biology, Tsinghua University, Beijing 100084, China.
5 MOE Key Laboratory of Bioinformatics, Tsinghua University, Beijing 100084, China.
6 National Collaborative Innovation Center for Biotherapy, Tsinghua University, Beijing 100084, China.
7 School of Basic Medical Sciences, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China.

Abstract

Methods for evaluating similarities between the gene expression profiles of different perturbagens are key to understanding the mechanisms of action (MoAs) of unknown compounds and to finding new indications for existing drugs. The L1000-based next-generation Connectivity Map (CMap) dataset is more than a thousand-fold scale-up of the CMap pilot dataset. Although several systematic evaluations have assessed the accuracy of these methods on the CMap pilot data, their performance needs to be re-evaluated for the L1000 data. Here, using the drug-drug similarities from the Drug Repurposing Hub database as a benchmark standard, we evaluated six popular published methods for their performance in predicting drug-drug relationships, based on the partial area under the receiver operating characteristic (ROC) curve at false positive rates of 0.001, 0.005 and 0.01 (AUC0.001, AUC0.005 and AUC0.01). The similarity-scoring algorithm ZhangScore was generally superior to the other methods and exhibited the highest accuracy at gene signature sizes ranging from 10 to 200. Further, we tested these methods with an experimentally derived gene signature related to estrogen in breast cancer cells, and the results confirmed that ZhangScore was more accurate than the other methods. Moreover, based on the ZhangScore results for the gene signature of TOP2A knockdown, we identified, in addition to well-known TOP2A inhibitors, a number of potential inhibitors, at least two of which had been the subject of previous investigation. Our studies provide potential guidelines for researchers choosing a suitable connectivity method. The six connectivity methods used in this report have been implemented in an R package (https://github.com/Jasonlinchina/RCSM).
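
The partial area under the ROC curve named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code or the RCSM implementation: the function name `partial_auc`, the trapezoidal integration, and the choice to report the raw (unnormalized) area are all assumptions made for illustration, so the largest attainable value here equals the chosen false positive rate cutoff.

```python
def partial_auc(scores, labels, max_fpr):
    """Unnormalized area under the ROC curve for FPR in [0, max_fpr].

    scores: higher means "more likely a true drug-drug relationship".
    labels: 1 for a benchmark-positive pair, 0 otherwise.
    Hypothetical helper for illustration; not taken from the RCSM package.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Build ROC points, moving through tied scores in a single step.
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1]:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j

    # Trapezoidal integration, truncated at max_fpr by linear interpolation.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Under these assumptions, a method that ranks every true pair above every false pair reaches an area equal to `max_fpr` itself (for example 0.01), which is why such small cutoffs reward only the very top of each method's ranked list.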

KEYWORDS:

L1000; ZhangScore; connectivity map; connectivity methods; drug repurposing; partial area under the ROC

PMID: 31774912
DOI: 10.1093/bib/bbz129
