Post 55
Multi-class Logistic Regression in Machine Learning
06-Nov-2011
This week's assignment of the ml-class deals with multi-class classification
with Logistic Regression (LR) and Neural Networks. In this post I would
like to focus on the former method, in line with
Post 52.
There I missed the use of the Multinomial LR (MLR) to tackle multi-class
problems, which called into question the need for a multi-category
generalisation strategy, i.e., One-Versus-All (OVA), when there is already
a model that inherently handles the multi-class setting, i.e., MLR. Now,
after conducting some experimentation (see the table below), I conclude
that the preference for OVA must be due to its higher effectiveness,
measured here as the accuracy rate, at least for the proposed
digit-recognition problem.
Classifier | Effectiveness (accuracy) | Performance (training time, sec) |
OVA-LR | 95.02% | 1086.21 |
MLR | 92.12% | 128.85 |
Both these methods learn a set of discriminant functions and assign a test
instance to the category corresponding to the largest discriminant
(Duda, et al., 2001). Specifically, OVA-LR learns as many discriminant
functions as there are classes, whereas MLR learns one function fewer
because it first fixes the parameters of one (reference) class to the null
vector and then learns the rest relative to this setting, without loss of
generality (Carpenter, 2008).
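In other words, with K classes and parameter vectors \theta_1, ..., \theta_K,
fixing the reference class at zero gives the usual softmax posterior,

p(y = k \mid x) = \exp(\theta_k^\top x) / \sum_{j=1}^{K} \exp(\theta_j^\top x), with \theta_K = 0,

so only the K - 1 vectors \theta_1, ..., \theta_{K-1} actually have to be
learned.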
Therefore, the essential difference is that OVA-LR learns its discriminants
independently, whereas MLR needs all class-wise discriminants for each
prediction, so they cannot be trained independently. This coupling may hurt
the final effectiveness a little (here, roughly 3 points of accuracy), but
in exchange MLR learns much faster (8.43 times faster on this data).
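The coupling shows up directly in the batch gradient of the (average)
negative log-likelihood J: for class k,

\partial J / \partial \theta_k = (1/m) \sum_{i=1}^{m} ( p(y = k \mid x_i) - [y_i = k] ) x_i,

and since p(y = k \mid x_i) involves the softmax denominator over all
classes, the update for one class cannot be computed without the current
parameters of all the others.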
The code that implements the MLR is available
here
(it replaces "oneVsAll.m" in "mlclass-ex3")
and is based on (Carpenter, 2008). However, a
batch version has been produced in order to avoid the imprecision
introduced by the online approximation,
see Post 51,
and hence to be directly comparable to OVA-LR.
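For illustration only (this is not the file linked above), a minimal sketch
of such a batch MLR trainer in Octave could look as follows; the function
name, the plain gradient-descent loop and the fixed learning rate are
assumptions made for the example:

function all_theta = trainMLRBatch(X, y, num_labels, lambda, alpha, num_iters)
  % A sketch for illustration, not the linked implementation.
  % X: m x n feature matrix (without bias); y: m x 1 labels in 1..num_labels.
  % Returns one row of parameters per class, the last row being the
  % reference class (kept at the null vector).
  m = size(X, 1);
  X = [ones(m, 1) X];                       % prepend the bias term
  n = size(X, 2);
  Theta = zeros(num_labels - 1, n);         % reference class is fixed at zero
  Y = bsxfun(@eq, y, 1:num_labels);         % one-hot targets, m x K

  for iter = 1:num_iters
    Z = [X * Theta', zeros(m, 1)];          % class scores; reference score is 0
    Z = bsxfun(@minus, Z, max(Z, [], 2));   % shift for numerical stability
    P = bsxfun(@rdivide, exp(Z), sum(exp(Z), 2));  % softmax posteriors, m x K
    % Batch gradient of the regularised negative log-likelihood (bias excluded)
    grad = (P(:, 1:end-1) - Y(:, 1:end-1))' * X / m;
    grad(:, 2:end) = grad(:, 2:end) + (lambda / m) * Theta(:, 2:end);
    Theta = Theta - alpha * grad;
  end

  all_theta = [Theta; zeros(1, n)];         % append the reference class row
end

Since the reference class keeps a zero parameter row, a prediction can still
be taken as the arg max of the class scores X * all_theta', so the output
keeps the same parameter layout as the matrix returned by "oneVsAll.m" in
the exercise.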
The question remains open as to whether the multi-class generalisation of a
dichotomous classifier is generally preferable to a unified multi-class
model, since different decision criteria (effectiveness vs. performance
(Manning, et al., 2008)) point to different classifiers, and the results
with other multi-class strategies, such as pairwise classification, are
yet to be studied.
Actually, this MLR is not directly comparable to the OVA-LR developed
in class because it is
optimised with a different strategy. A fair comparison with respect to this
optimisation aspect is conducted and explained in the following post, see
Post 56.
--
[Duda, et al., 2001] Duda, R.O., Hart, P.E. and Stork, D.G., "Pattern
Classification", New York: John Wiley & Sons, 2001, ISBN: 0-471-05669-3
[Carpenter, 2008] Carpenter, B., "Lazy Sparse Stochastic Gradient Descent
for Regularized Multinomial Logistic Regression", 2008.
[Manning, et al., 2008] Manning, C. D., Raghavan, P. and Schütze, H.,
"Introduction to Information Retrieval", Cambridge: Cambridge University
Press, 2008, ISBN: 0-521-86571-9