Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 56
Multi-class Logistic Regression in Machine Learning (revisited)
10-Nov-2011
In my previous post about Multi-class Logistic Regression in ML (see
Post 55),
I asked the ml-class why it uses a multi-class generalisation
strategy, i.e., One-Versus-All (OVA),
with a plain dichotomous Logistic Regression
classifier when there is a more concise multi-class model, i.e.,
Multinomial Logistic Regression (MLR), that is already in extensive use in ML.
I foolishly concluded (to err is human)
that OVA must have been chosen because it is more effective,
assuming that my MLR implementation,
optimised with batch Gradient Descent, was directly comparable
to the one provided in class, which relies on an advanced optimisation
procedure, "on steroids", such as Conjugate Gradient or L-BFGS. Specifically,
the course provides a method based on the Polak-Ribière conjugate gradient
technique, which determines each search direction from the current and previous
gradients (no explicit second-order derivatives are required), combined with a
line search that satisfies the Wolfe-Powell conditions. Together these lead to
fast minimisation of the criterion function and cope efficiently with a large
number of parameters.
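To make the idea of a conjugate gradient search more concrete, here is a minimal sketch in plain Python. It is not the routine provided in class: the function name polak_ribiere_cg is hypothetical, the line search is a simple backtracking (Armijo) search instead of one enforcing the Wolfe-Powell conditions, and the Polak-Ribière coefficient is clipped at zero (a common safeguard, assumed here for robustness).

```python
def polak_ribiere_cg(f, grad, x0, max_iter=500, tol=1e-8):
    """Minimise f with nonlinear conjugate gradient (Polak-Ribiere, clipped at 0).

    f    : function taking a list of floats and returning a float
    grad : function returning the gradient of f as a list of floats
    """
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                      # first direction: steepest descent
    for _ in range(max_iter):
        gg = sum(gi * gi for gi in g)
        if gg < tol * tol:                     # gradient norm below tolerance
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                       # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -gg
        # Backtracking line search: halve the step until the Armijo
        # sufficient-decrease condition holds (or the step becomes tiny).
        step, fx = 1.0, f(x)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            if f(x_new) <= fx + 1e-4 * step * slope or step < 1e-12:
                break
            step *= 0.5
        g_new = grad(x_new)
        # Polak-Ribiere coefficient: uses only the current and previous gradients.
        beta = sum(gn * (gn - gi) for gn, gi in zip(g_new, g)) / gg
        beta = max(0.0, beta)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


# Usage on a quadratic bowl whose minimum is at (1, -2).
bowl = lambda p: (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2
bowl_grad = lambda p: [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]
solution = polak_ribiere_cg(bowl, bowl_grad, [0.0, 0.0])
```

The point of the sketch is that the optimiser only needs a cost function and its gradient over a flat parameter vector, which is why the same routine can serve both OVA-LR and MLR.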
But as Prof. Ng remarked in a previous lecture, the optimisation
procedure has a notable impact on how well the model finally fits the
training data, and this definitely makes the difference. I have recoded the MLR
with the parameter unrolling method shown this week in order to interface with the
same advanced optimisation method (available
here),
so the results are now directly
comparable. The MLR yields an effectiveness of 95.22% in 99.58 seconds,
which makes it 0.2% more effective and 10.91 times faster than OVA-LR. Therefore,
for this problem, the class-wise parameter dependence in the MLR predictions
provides a classifier that is much faster to train and slightly better
than OVA-LR.
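For readers unfamiliar with parameter unrolling, the following is a minimal sketch of how an MLR cost function can expose its class-by-feature parameter matrix as one flat vector, which is what a generic optimiser expects. It is written in plain Python for illustration and is not the code linked above; the names unroll, reshape and mlr_cost_and_grad are hypothetical, the cost is the unregularised mean negative log-likelihood, and each input row is assumed to already contain the bias term.

```python
import math


def unroll(theta_matrix):
    """Flatten a K x D parameter matrix (list of rows) into one flat list."""
    return [w for row in theta_matrix for w in row]


def reshape(theta_flat, num_classes, num_features):
    """Rebuild the K x D matrix from the flat vector the optimiser works on."""
    return [theta_flat[k * num_features:(k + 1) * num_features]
            for k in range(num_classes)]


def softmax(scores):
    """Turn class scores into probabilities (shifted by the max for stability)."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlr_cost_and_grad(theta_flat, X, y, num_classes):
    """Return the mean negative log-likelihood and its unrolled gradient."""
    n, d = len(X), len(X[0])
    theta = reshape(theta_flat, num_classes, d)
    cost = 0.0
    grad = [[0.0] * d for _ in range(num_classes)]
    for x, label in zip(X, y):
        # All class probabilities share one normalisation, so the
        # parameters of every class depend on each other.
        probs = softmax([sum(w * xi for w, xi in zip(row, x)) for row in theta])
        cost -= math.log(probs[label]) / n
        for k in range(num_classes):
            err = probs[k] - (1.0 if k == label else 0.0)
            for j in range(d):
                grad[k][j] += err * x[j] / n
    return cost, unroll(grad)
```

A generic optimiser would then be handed a flat starting vector (for instance all zeros of length K times D) together with this function, and the fitted vector would be reshaped back into the K x D matrix for prediction.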
Nevertheless, the question of which strategy
is preferable remains open, because
other multi-class strategies such as Pairwise classification are yet to be
studied.