Blog
 Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 56
Multiclass Logistic Regression in Machine Learning (revisited)
10 Nov 2011
In my previous post about Multiclass Logistic Regression in ML (see Post 55),
I questioned why ml-class uses a multiclass generalisation
strategy, i.e., One-Versus-All (OVA),
on top of a plain dichotomic Logistic Regression
classifier when there is a more concise multiclass model, i.e.,
Multinomial Logistic Regression (MLR), that is already in extensive use in ML.
I foolishly concluded (to err is human)
that the use of OVA must be due to higher effectiveness,
assuming that my MLR optimisation implementation,
based on batch Gradient Descent, was directly comparable
to the one provided in class, which is based on an advanced optimisation
procedure, "on steroids", such as Conjugate Gradient or L-BFGS. Specifically,
the course provides a method based on the Polak-Ribière conjugate gradient
technique, which uses first-order derivatives (the current and previous
gradients) to determine the search direction, and a line search governed by
the Wolfe-Powell conditions. Together they lead to fast minimisation of
the criterion function and are well suited to dealing efficiently with a large
number of parameters.
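
To make the idea of a conjugate gradient search more concrete, here is a
minimal sketch in Python. It is not the routine provided in class: the
function name polak_ribiere_cg and the toy quadratic are my own illustrative
assumptions, and I swap the Wolfe-Powell line search for a much simpler
backtracking (Armijo) line search to keep it short. The search direction is
built from the current and previous gradients with the Polak-Ribière
coefficient, clipped at zero.

```python
def polak_ribiere_cg(f, grad, x0, max_iter=200, tol=1e-10):
    """Minimise f from x0 with nonlinear conjugate gradient (Polak-Ribiere).

    Illustrative sketch only: a backtracking (Armijo) line search stands in
    for the Wolfe-Powell line search mentioned in the text.
    """
    x = list(x0)
    g = grad(x)
    d = [-gi for gi in g]                  # first direction: steepest descent
    for _ in range(max_iter):
        gg = sum(gi * gi for gi in g)      # squared gradient norm
        if gg < tol:
            break
        slope = sum(gi * di for gi, di in zip(g, d))
        if slope >= 0.0:                   # not a descent direction: restart
            d = [-gi for gi in g]
            slope = -gg
        # Backtracking line search: halve the step until f decreases enough.
        fx = f(x)
        step = 1.0
        x_new = None
        while step > 1e-12:
            trial = [xi + step * di for xi, di in zip(x, d)]
            if f(trial) <= fx + 1e-4 * step * slope:
                x_new = trial
                break
            step *= 0.5
        if x_new is None:                  # no acceptable step was found
            break
        g_new = grad(x_new)
        # Polak-Ribiere coefficient, clipped at zero.
        beta = max(0.0, sum(gn * (gn - go) for gn, go in zip(g_new, g)) / gg)
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        x, g = x_new, g_new
    return x


# Hypothetical toy problem: a quadratic bowl with its minimum at (1, -2).
def quadratic(p):
    return (p[0] - 1.0) ** 2 + 10.0 * (p[1] + 2.0) ** 2


def quadratic_grad(p):
    return [2.0 * (p[0] - 1.0), 20.0 * (p[1] + 2.0)]


x_min = polak_ribiere_cg(quadratic, quadratic_grad, [0.0, 0.0])
```

Note that the optimiser only ever sees a flat list of parameters plus a
function and its gradient, which is why the model parameters have to be
unrolled before they can be handed to a routine of this kind.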
But as Prof. Ng remarked in a previous lecture, the optimisation
procedure has a notable impact on how well the model finally fits the
training data, and this definitely makes the difference. I have recoded the MLR
with the parameter unrolling method shown this week in order to interface with the
same advanced optimisation method (available here),
so the results are now directly
comparable. The MLR yields an effectiveness of 95.22% in 99.58 seconds,
which makes it 0.2% more effective and 10.91 times faster than OVA-LR. Therefore,
for this problem, the class-wise parameter dependence in the MLR predictions
provides a classifier that is much faster to train and slightly more
effective than OVA-LR.
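
For readers who want to see what the unrolling amounts to, here is a rough
sketch in Python of an MLR (softmax) cost function that takes and returns
flat parameter vectors. This is not my recoded implementation: the names
unroll, reshape and mlr_cost_and_grad are hypothetical, regularisation is
left out, and labels are assumed to be integers from 0 to K-1. The softmax
normalisation is where the class-wise parameter dependence comes from, since
every class probability depends on the parameters of all the classes.

```python
import math


def unroll(theta):
    """Flatten a K x n parameter matrix (list of rows) into one flat list."""
    return [w for row in theta for w in row]


def reshape(flat, num_classes, num_features):
    """Rebuild the K x n parameter matrix from the flat vector."""
    return [flat[k * num_features:(k + 1) * num_features]
            for k in range(num_classes)]


def softmax(scores):
    """Turn K class scores into K probabilities that sum to one."""
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlr_cost_and_grad(flat_theta, X, y, num_classes):
    """Mean negative log-likelihood of the softmax model and its gradient.

    X is a list of feature rows (include a leading 1.0 for the bias term),
    y holds integer labels in 0..num_classes-1. The gradient is returned
    unrolled, in the same layout as flat_theta.
    """
    n = len(X[0])
    theta = reshape(flat_theta, num_classes, n)
    cost = 0.0
    grad = [[0.0] * n for _ in range(num_classes)]
    for x, label in zip(X, y):
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in theta]
        probs = softmax(scores)
        cost -= math.log(probs[label])
        for k in range(num_classes):
            err = probs[k] - (1.0 if k == label else 0.0)
            for j in range(n):
                grad[k][j] += err * x[j]
    m = len(X)
    return cost / m, [g / m for g in unroll(grad)]
```

With all parameters at zero every class gets probability 1/K, so the cost
starts at log(K), which is a handy sanity check before handing the function
to an optimiser that works on flat vectors.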
Nevertheless, the question of which strategy
is preferable remains open, because
other multiclass strategies such as Pairwise classification are yet to be
studied.
