Won Gyu Kim at 11:00
A Practical Approach for Classification using Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) has gained popularity for solving large-scale supervised machine learning problems such as SVM training. However, SGD lacks a clear stopping criterion, so we investigated two stopping criteria: cross-validation and a fixed number of iterations. We found that stopping at a fixed number of iterations performs as well as or better than cross-validation. We also surveyed the published algorithms for SVM learning on large data sets, chose three for comparison (PROBE, SVMperf, and Liblinear), and compared them with SGD run for a fixed number of iterations. SGD matches the accuracy of the other methods but is much faster. SGD-SVM with a fixed number of iterations is thus a machine learner that is practically applicable to a large database such as MEDLINE. As an application, we use SGD-SVM to identify MeSH term candidates for MEDLINE citations, and it gives superior results to previously published studies.
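The core idea of SGD-SVM with a fixed iteration budget can be illustrated with a minimal sketch. The code below is not the authors' implementation; it assumes a Pegasos-style update (decaying step size, regularized hinge loss) and simply stops after a preset number of epochs rather than checking a validation set:

```python
import numpy as np

def sgd_svm(X, y, lam=0.01, epochs=5, seed=0):
    """Train a linear SVM by SGD on the L2-regularized hinge loss.
    Training stops after a fixed number of epochs: no validation-based
    stopping criterion is used, mirroring the fixed-iteration strategy."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    t = 0
    for _ in range(epochs):                 # fixed iteration budget
        for i in rng.permutation(n):        # visit examples in random order
            t += 1
            eta = 1.0 / (lam * t)           # Pegasos-style decaying step size
            margin = y[i] * X[i].dot(w)
            # Subgradient step: shrink w, and add the example if it
            # violates the margin (hinge loss is active when margin < 1).
            w *= (1 - eta * lam)
            if margin < 1:
                w += eta * y[i] * X[i]
    return w

# Toy linearly separable data: two Gaussian clusters, labels -1 and +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

w = sgd_svm(X, y)
acc = np.mean(np.sign(X.dot(w)) == y)       # training accuracy
```

In practice the fixed budget (here `epochs=5`) would be chosen once up front, which is what makes this approach cheap compared with cross-validation on a corpus the size of MEDLINE.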