Alexandre Trilla, PhD - Data Scientist

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Multi-class Logistic Regression in Machine Learning (revisited)

10-Nov-2011

In my former post about Multi-class Logistic Regression in ML (see Post 55), I questioned why the ml-class uses a multi-class generalisation strategy, i.e., One-Versus-All (OVA), on top of a plain dichotomic Logistic Regression classifier when there is a more concise multi-class model, i.e., Multinomial Logistic Regression (MLR), that is already in extensive use in ML. I foolishly concluded (to err is human) that OVA must have been chosen for its higher effectiveness, assuming that my MLR implementation, optimised with batch Gradient Descent, was directly comparable to the one provided in class, which relies on an advanced optimisation procedure, "on steroids", such as Conjugate Gradient or L-BFGS. Specifically, the course provides a method based on the Polak-Ribière conjugate gradient technique, which builds each search direction from the current and previous gradients, combined with a line search that stops on the Wolfe-Powell conditions. Together they minimise the criterion function quickly and are well suited to dealing efficiently with a large number of parameters.

But as Prof. Ng remarked in a previous lecture, the optimisation procedure has a notable impact on how well the final model fits the training data, and this definitely makes the difference. I have recoded the MLR with the parameter unrolling method shown this week in order to interface with the same advanced optimisation method (available here), so the results are now directly comparable.

The MLR yields an effectiveness of 95.22% in 99.58 seconds, which makes it 0.2% more effective and 10.91 times faster than OVA-LR. Therefore, for this problem, the class-wise parameter dependence in the MLR predictions provides a classifier that is much faster to train and slightly better than OVA-LR. Nevertheless, the question of which strategy is preferable remains open because other multi-class strategies, such as Pairwise classification, are yet to be studied.
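To make the parameter unrolling idea concrete, here is a minimal sketch in plain Python (standard library only). It is not the code used for the results above, nor the optimiser provided in class: the function names (unroll, reshape, cost_and_grad, minimize_cg, predict) and the toy data are hypothetical, and the optimiser is a simplified Polak-Ribière conjugate gradient with a backtracking line search rather than a Wolfe-Powell one. The point it illustrates is that the MLR parameter matrix (one row per class) is flattened into a single vector, so that any generic optimiser expecting a function "flat vector in, cost and flat gradient out" can be plugged in.

```python
import math

def unroll(matrix):
    """Flatten a K x D parameter matrix (list of rows) into one flat list."""
    return [w for row in matrix for w in row]

def reshape(theta, num_classes, num_features):
    """Rebuild the K x D parameter matrix from the flat vector."""
    return [theta[k * num_features:(k + 1) * num_features]
            for k in range(num_classes)]

def cost_and_grad(theta, X, y, num_classes):
    """Mean negative log-likelihood of MLR and its gradient w.r.t. flat theta."""
    n, d = len(X), len(X[0])
    W = reshape(theta, num_classes, d)
    cost = 0.0
    grad = [[0.0] * d for _ in range(num_classes)]
    for x, label in zip(X, y):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        top = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        cost += (top + math.log(total) - scores[label]) / n
        for k in range(num_classes):
            # softmax probability minus the one-hot target
            err = exps[k] / total - (1.0 if k == label else 0.0)
            for j in range(d):
                grad[k][j] += err * x[j] / n
    return cost, unroll(grad)

def minimize_cg(fun, theta, iters=200):
    """Simplified Polak-Ribiere conjugate gradient (assumption: backtracking
    line search instead of the Wolfe-Powell conditions)."""
    cost, g = fun(theta)
    direction = [-gi for gi in g]
    for _ in range(iters):
        slope = sum(gi * di for gi, di in zip(g, direction))
        if slope >= 0:  # not a descent direction: restart with steepest descent
            direction = [-gi for gi in g]
            slope = -sum(gi * gi for gi in g)
        if -slope < 1e-12:  # gradient is numerically zero
            break
        step = 1.0
        while True:
            new_theta = [t + step * di for t, di in zip(theta, direction)]
            new_cost, new_g = fun(new_theta)
            if new_cost <= cost + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        # Polak-Ribiere coefficient, clipped at zero (the "PR+" variant)
        beta = max(0.0, sum(ng * (ng - og) for ng, og in zip(new_g, g))
                   / sum(og * og for og in g))
        theta, cost, g = new_theta, new_cost, new_g
        direction = [-gi + beta * di for gi, di in zip(g, direction)]
    return theta, cost

def predict(theta, x, num_classes):
    """Return the index of the class with the highest linear score."""
    W = reshape(theta, num_classes, len(x))
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return scores.index(max(scores))

# Toy 3-class problem (hypothetical data); the first feature is the bias term.
X = [[1.0, 0.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0],
     [1.0, 4.0, 4.0], [1.0, 5.0, 4.0], [1.0, 4.0, 5.0],
     [1.0, 0.0, 5.0], [1.0, 1.0, 5.0], [1.0, 0.0, 4.0]]
y = [0, 0, 0, 1, 1, 1, 2, 2, 2]

theta0 = [0.0] * (3 * len(X[0]))  # unrolled 3 x 3 parameter matrix
initial_cost, _ = cost_and_grad(theta0, X, y, 3)
theta_opt, final_cost = minimize_cg(lambda t: cost_and_grad(t, X, y, 3), theta0)
predictions = [predict(theta_opt, x, 3) for x in X]
```

Because the optimiser only ever sees a flat vector, swapping it for a different routine (batch Gradient Descent, L-BFGS, or a library implementation) requires no change to cost_and_grad, which is what makes a like-for-like comparison between models possible.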