PLoS One. 2017 Apr 10;12(4):e0175383. doi: 10.1371/journal.pone.0175383. eCollection 2017.

Use of a machine learning framework to predict substance use disorder treatment success.

Author information

Instituto de Cálculo, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, CONICET, Buenos Aires, Argentina.
Iowa Consortium for Substance Abuse Research and Evaluation, University of Iowa, Iowa City, Iowa, United States of America.
Division of Biostatistics, University of California, Berkeley, California, United States of America.
Counseling Psychology Program, Department of Psychological and Quantitative Foundations, College of Education, University of Iowa, Iowa City, Iowa, United States of America.
Department of Psychiatry, Roy J and Lucille A Carver College of Medicine, University of Iowa, Iowa City, Iowa, United States of America.
Department of Biostatistics, College of Public Health, University of Iowa, Iowa City, Iowa, United States of America.


There are several methods for building prediction models. The wealth of currently available modeling techniques usually forces the researcher to judge, a priori, what will likely be the best method. Super learning (SL) is a methodology that facilitates this decision by combining all identified prediction algorithms pertinent for a particular prediction problem. SL generates a final model that is at least as good as any of the other models considered for predicting the outcome. The overarching aim of this work is to introduce SL to analysts and practitioners. This work compares the performance of logistic regression, penalized regression, random forests, deep learning neural networks, and SL to predict successful substance use disorder (SUD) treatment. A nationwide database including 99,013 SUD treatment patients was used. All algorithms were evaluated using the area under the receiver operating characteristic curve (AUC) in a test sample that was not included in the training sample used to fit the prediction models. AUC for the models ranged between 0.793 and 0.820. SL was superior to all but one of the algorithms compared. An explanation of SL steps is provided. SL is the first step in targeted learning, an analytic framework that yields double robust effect estimation and inference with fewer assumptions than the usual parametric methods. Different aspects of SL depending on the context, its function within the targeted learning framework, and the benefits of this methodology in the addiction field are discussed.
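The combination step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the data are synthetic (two binary covariates, not the treatment database), the three candidate learners are toy rate estimators rather than the regression, forest, and neural network models compared in the paper, and the weights come from a coarse grid search over convex combinations that minimizes cross-validated squared error. The helper names (`auc`, `fit_mean`, `fit_rate_by`, `cv_predictions`, `convex_weights`) are hypothetical.

```python
import random

def auc(y_true, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def fit_mean(X, y):
    """Toy learner: always predict the training success rate."""
    rate = sum(y) / len(y)
    return lambda x: rate

def fit_rate_by(feature):
    """Toy learner factory: predict the training success rate within each level of one binary feature."""
    def fit(X, y):
        overall = sum(y) / len(y)
        rate = {}
        for level in (0, 1):
            ys = [yi for xi, yi in zip(X, y) if xi[feature] == level]
            rate[level] = sum(ys) / len(ys) if ys else overall
        return lambda x: rate[x[feature]]
    return fit

def cv_predictions(learners, X, y, folds=5):
    """Out-of-fold predictions: each row is predicted by models that never saw it."""
    n = len(y)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        valid = [i for i in range(n) if i % folds == k]
        X_fold, y_fold = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(learners):
            predict = fit(X_fold, y_fold)
            for i in valid:
                Z[i][j] = predict(X[i])
    return Z

def convex_weights(Z, y, steps=10):
    """Grid search over non-negative weights summing to 1 (three learners only)."""
    best, best_loss = None, float("inf")
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            w = (a / steps, b / steps, (steps - a - b) / steps)
            loss = sum((yi - sum(wj * zj for wj, zj in zip(w, zi))) ** 2
                       for zi, yi in zip(Z, y))
            if loss < best_loss:
                best, best_loss = w, loss
    return best

# Synthetic data (an assumption): success probability rises with two binary covariates.
rng = random.Random(0)
X = [(rng.randint(0, 1), rng.randint(0, 1)) for _ in range(1000)]
y = [int(rng.random() < 0.2 + 0.3 * x0 + 0.3 * x1) for x0, x1 in X]
X_train, y_train, X_test, y_test = X[:700], y[:700], X[700:], y[700:]

names = ["mean", "rate by x0", "rate by x1"]
learners = [fit_mean, fit_rate_by(0), fit_rate_by(1)]

# Step 1: cross-validated predictions; step 2: weights that minimize CV loss.
weights = convex_weights(cv_predictions(learners, X_train, y_train), y_train)

# Step 3: refit every learner on the full training sample, combine, score on held-out data.
fitted = [fit(X_train, y_train) for fit in learners]
results = {name: auc(y_test, [f(x) for x in X_test]) for name, f in zip(names, fitted)}
results["super learner"] = auc(
    y_test, [sum(w * f(x) for w, f in zip(weights, fitted)) for x in X_test])

for name, value in results.items():
    print(f"{name}: AUC = {value:.3f}")
```

As in the abstract, performance is measured by AUC on a test sample that played no part in fitting the learners or choosing the weights. Restricting the weights to a convex combination is one common choice for the combining step; other meta-learners are possible.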

[Indexed for MEDLINE]
Free PMC Article
