Blog
Thoughts on data analysis, software development and innovation management. Comments are welcome.
Post 55
Multiclass Logistic Regression in Machine Learning
06 Nov 2011
This week's assignment of the mlclass deals with multiclass classification
with Logistic Regression (LR) and Neural Networks. In this post I focus on
the former method, following up on
Post 52.
There I overlooked the use of Multinomial LR (MLR) for multiclass
problems, and questioned the need for a multiclass generalisation
strategy such as One-Versus-All (OVA) when a model that inherently
handles several classes, namely MLR, already exists. Now, after some
experimentation (see the table below, where OVALR denotes LR generalised
with OVA), I conclude that the reason to prefer OVA must be its higher
effectiveness, measured as the accuracy rate, at least for the proposed
digit recognition problem.
Classifier   Effectiveness (accuracy)   Performance (training time, s)
OVALR        95.02%                     1086.21
MLR          92.12%                     128.85
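As a quick check on the relative figures discussed below, the accuracy gap and the speed-up can be computed directly from the table. This is only a trivial arithmetic sketch; the variable names are my own.

```python
# Figures copied from the table above.
acc_ova, acc_mlr = 95.02, 92.12       # accuracy, %
time_ova, time_mlr = 1086.21, 128.85  # training time, seconds

gap_points = acc_ova - acc_mlr             # absolute gap, percentage points
gap_relative = 100 * gap_points / acc_ova  # relative loss, % of OVALR accuracy
speedup = time_ova / time_mlr              # how many times faster MLR trains

print(f"accuracy gap: {gap_points:.2f} points ({gap_relative:.2f}% relative)")
# accuracy gap: 2.90 points (3.05% relative)
print(f"speed-up: {speedup:.2f}x")
# speed-up: 8.43x
```

So the "3%" figure holds both as an absolute gap (2.9 points) and as a relative loss (about 3.05%).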
Both methods learn a set of discriminant functions and assign a test
instance to the category with the largest discriminant
(Duda, et al., 2001). Specifically, OVALR learns one discriminant
function per class, whereas MLR learns one fewer: it first fixes the
parameters of one class to the null vector and then learns the rest
relative to this setting, without loss of generality (Carpenter, 2008).
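To make the "one function less" point concrete, here is a minimal NumPy sketch (my own illustration, not the code from this post) of the shared decision rule and of why MLR may fix one class's parameters to the null vector. Softmax probabilities do not change when the same vector is subtracted from every class's weights, so subtracting the last class's weights zeroes them without changing any probability or prediction. The names `softmax`, `W`, `W_ref` and `x`, and the sizes used, are hypothetical.

```python
import numpy as np

def softmax(scores):
    """Class probabilities from a vector of discriminant scores."""
    shifted = scores - scores.max()  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

rng = np.random.default_rng(0)
K, d = 4, 5                    # hypothetical: 4 classes, 5 features
W = rng.normal(size=(K, d))    # one weight vector per class (K discriminants)
x = rng.normal(size=d)         # a test instance

# Decision rule shared by OVALR and MLR: pick the largest discriminant.
pred_full = np.argmax(W @ x)

# MLR reparameterisation: subtract the last class's weights from every class,
# so the reference class gets the null vector and only K - 1 vectors stay free.
W_ref = W - W[-1]
pred_ref = np.argmax(W_ref @ x)

print(np.allclose(softmax(W @ x), softmax(W_ref @ x)))  # True: same probabilities
print(pred_full == pred_ref)                            # True: same prediction
print(np.allclose(W_ref[-1], 0))                        # True: reference is null
```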
Therefore, the essential difference is that OVALR learns the
discriminants independently, while MLR needs all class-wise discriminants
to compute each class probability (and hence its training objective), so
they cannot be trained independently. This dependence may slightly reduce
the effectiveness of the final system (here, accuracy drops by only 2.9
percentage points, about 3%), but in exchange MLR learns much faster
(8.43 times faster on this data).
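The independence of the OVA discriminants can be illustrated with a small sketch. It is not the mlclass `oneVsAll.m` code (which relies on a different optimiser); instead it assumes a hypothetical helper `train_binary_lr` using plain, unregularised batch gradient descent, with made-up hyperparameters and toy data. Each class's weight vector is fitted only on its own "class k vs. rest" labels, so the K fits never look at each other's parameters and could even run in parallel.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_binary_lr(X, t, lr=0.1, iters=500):
    """Batch gradient descent for binary LR (unregularised, hypothetical settings)."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - t) / len(t)  # mean log-loss gradient
        w -= lr * grad
    return w

def train_ova(X, y, K):
    # Each discriminant depends only on its own binary relabelling of y,
    # so the K training problems are independent of one another.
    return np.vstack([train_binary_lr(X, (y == k).astype(float)) for k in range(K)])

def predict(W, X):
    return np.argmax(X @ W.T, axis=1)  # the largest discriminant wins

# Toy data: three well-separated 2-D blobs plus a bias column.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y = np.repeat(np.arange(3), 50)
X = np.hstack([np.ones((150, 1)), centres[y] + rng.normal(size=(150, 2))])

W = train_ova(X, y, K=3)
print("OVA training accuracy:", np.mean(predict(W, X) == y))
```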
The code that implements MLR, which is available
here
(it replaces "oneVsAll.m" in "mlclassex3"),
is based on (Carpenter, 2008). However, I produced a batch version to
avoid the imprecision introduced by the online approximation
(see Post 51),
so that it is directly comparable to OVALR.
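A batch (full-gradient) MLR trainer in this spirit might look roughly as follows. This is a sketch under my own assumptions, not the Octave code linked above: it keeps the last class as the reference with a fixed null weight vector, learns only K - 1 vectors, and replaces Carpenter's regularised stochastic updates with unregularised full-batch gradient descent. The function names, hyperparameters and toy data are hypothetical. Each update needs the scores of all classes at once, because the gradient for class k depends on the normalised probability P(k|x); this is the coupling between discriminants described earlier.

```python
import numpy as np

def softmax_rows(S):
    S = S - S.max(axis=1, keepdims=True)  # numerical stability
    E = np.exp(S)
    return E / E.sum(axis=1, keepdims=True)

def train_mlr_batch(X, y, K, lr=0.1, iters=500):
    """Full-batch gradient descent for MLR, last class (index K - 1) as reference.

    Only K - 1 weight vectors are free; the reference class keeps the null vector.
    """
    n, d = X.shape
    W = np.zeros((K - 1, d))                  # free parameters
    T = np.eye(K)[y]                          # one-hot targets, shape (n, K)
    for _ in range(iters):
        W_full = np.vstack([W, np.zeros(d)])  # append the fixed null vector
        P = softmax_rows(X @ W_full.T)        # every class score is needed here
        grad = (P - T)[:, :K - 1].T @ X / n   # mean log-loss gradient, free rows only
        W -= lr * grad
    return np.vstack([W, np.zeros(d)])

def predict(W_full, X):
    return np.argmax(X @ W_full.T, axis=1)

# Toy data: three well-separated 2-D blobs plus a bias column.
rng = np.random.default_rng(1)
centres = np.array([[0.0, 4.0], [4.0, 0.0], [-4.0, -4.0]])
y = np.repeat(np.arange(3), 50)
X = np.hstack([np.ones((150, 1)), centres[y] + rng.normal(size=(150, 2))])

W = train_mlr_batch(X, y, K=3)
print("MLR training accuracy:", np.mean(predict(W, X) == y))
```

Using the whole training set in every update removes the sampling noise of the online approximation, at the cost of one pass over the data per step.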
It remains an open question whether the multiclass generalisation of a
binary (dichotomic) classifier is generally preferable to a unified
multiclass model: different decision criteria (effectiveness vs.
performance (Manning, et al., 2008)) favour different classifiers,
and other multiclass strategies, such as pairwise classification, have
yet to be studied.
Actually, this MLR is still not directly comparable to the OVALR developed
in class, because it is optimised with a different strategy. A fair
comparison with respect to this optimisation aspect is conducted and
explained in the following post, Post 56.

[Duda, et al., 2001] Duda, R. O., Hart, P. E. and Stork, D. G., "Pattern
Classification", New York: John Wiley & Sons, 2001, ISBN: 0471056693.
[Carpenter, 2008] Carpenter, B., "Lazy Sparse Stochastic Gradient Descent
for Regularized Multinomial Logistic Regression", 2008.
[Manning, et al., 2008] Manning, C. D., Raghavan, P. and Schütze, H.,
"Introduction to Information Retrieval", Cambridge: Cambridge University
Press, 2008, ISBN: 0521865719.
