Proc Second Workshop Data Manag End End Mach Learn (2018). 2018 Jun;2018. pii: 7. doi: 10.1145/3209889.3209895.

Exploring the Utility of Developer Exhaust.

Author information
Stanford University.

Abstract

Using machine learning to analyze data often produces developer exhaust: code, logs, or metadata that do not define the learning algorithm but are byproducts of the data analytics pipeline. We study how the rich information in developer exhaust can be used to approximately solve otherwise complex tasks. Specifically, we use log data associated with training deep learning models to perform model search by predicting performance metrics for untrained models. Instead of designing a separate model for each performance metric, we present two preliminary methods that rely only on information present in logs to predict these characteristics for different architectures. We introduce (i) a nearest neighbor approach that compares model architectures with a hand-crafted edit distance metric and (ii) a more generalizable, end-to-end approach that trains an LSTM on model architectures and their associated logs to predict performance metrics of interest. We perform model search optimizing for best validation accuracy, degree of overfitting, and best validation accuracy under a constraint on training time. Our approaches predict validation accuracy within 1.37% error on average, while the baseline, which uses the performance of the trained model with the closest number of layers, has 4.13% error. When choosing the best performing models under training-time constraints, the top-3 models selected by our approaches overlap with the true top-3 models 82% of the time, compared with 54% of the time for the baseline. These preliminary experiments suggest that developer exhaust can help learn models that efficiently approximate a variety of complex tasks.
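To make the nearest neighbor idea and the layer-count baseline concrete, here is a minimal sketch under stated assumptions. The abstract does not specify the paper's hand-crafted edit distance, so this swaps in plain Levenshtein distance with unit costs over a hypothetical encoding of each architecture as a tuple of layer tokens. The function names (`edit_distance`, `predict_nearest`, `predict_layer_count_baseline`) and the toy accuracy values are illustrative only and are not taken from the paper.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical encoding (an assumption): one string token per layer.
Architecture = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two layer sequences with unit costs.

    Stand-in for the paper's hand-crafted metric, whose costs are not
    given in the abstract.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, layer_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, layer_b in enumerate(b, start=1):
            cost = 0 if layer_a == layer_b else 1
            curr[j] = min(
                prev[j] + 1,         # delete layer_a
                curr[j - 1] + 1,     # insert layer_b
                prev[j - 1] + cost,  # substitute (or match)
            )
        prev = curr
    return prev[len(b)]


def predict_nearest(query: Architecture, trained: Dict[Architecture, float]) -> float:
    """Copy the logged metric of the trained architecture closest by edit distance."""
    nearest = min(trained, key=lambda arch: edit_distance(query, arch))
    return trained[nearest]


def predict_layer_count_baseline(query: Architecture, trained: Dict[Architecture, float]) -> float:
    """Baseline: copy the metric of the trained model with the closest layer count.

    Ties go to the first architecture in dict insertion order.
    """
    nearest = min(trained, key=lambda arch: abs(len(arch) - len(query)))
    return trained[nearest]


# Toy validation accuracies (made up for illustration, not from the paper).
trained = {
    ("conv3x3-32", "pool", "fc-10"): 0.71,
    ("conv5x5-32", "pool", "pool", "fc-10"): 0.65,
    ("conv3x3-32", "conv3x3-64", "pool", "fc-10"): 0.78,
}
query = ("conv3x3-32", "conv3x3-64", "pool", "pool", "fc-10")

print(predict_nearest(query, trained))               # 0.78
print(predict_layer_count_baseline(query, trained))  # 0.65
```

In this toy case, the layer-count baseline cannot tell the two 4-layer models apart and takes the first one. The edit distance prefers the model that differs from the query by a single deleted `pool` layer. Replacing the unit costs with layer-type-aware costs would be one way to approximate a hand-crafted metric.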