Blog
Thoughts on data analysis, software development and innovation management. Comments are welcome.
Post 59
Different cost criteria in Multilayer Neural Network training
19 Nov 2011
It is customary to train Multilayer Neural Networks (MNN) with the
Mean Squared Error (MSE) criterion as the cost function (Duda, et al., 2001),
especially when the Backpropagation algorithm is used, since it is presented
as a natural extension of the Least Mean Square algorithm for linear systems
(with plenty of lexical overlap around "mean square" along the way).
Nevertheless, in the mlclass Prof. Ng presented a somewhat different flavour
of cost function for training MNN, resorting to the "unloglikelihood" error,
i.e., the negative of the corpus log-likelihood, which typically
characterises the Logistic Regression error with respect to the data.
And for a good reason:
Cost function                Effectiveness (accuracy)   Training time (s)
Neg. corpus log-likelihood   95.06%                     57.94
MSE                          92.88%                     113.27
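For reference, the two criteria compared in the table can be written as follows. The notation is assumed here rather than taken from the post: m training examples, K sigmoid output units with activations a = σ(z), one-hot targets y, the customary 1/2 factor on the squared error, and no regularisation term. The last line gives the error term at the output pre-activation z, where Backpropagation starts; note the extra a(1 - a) factor in the MSE case.

```latex
J_{\mathrm{NLL}} = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[y_k^{(i)}\log a_k^{(i)} + \big(1-y_k^{(i)}\big)\log\big(1-a_k^{(i)}\big)\Big]

J_{\mathrm{MSE}} = \frac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{K}\big(a_k^{(i)} - y_k^{(i)}\big)^2

\frac{\partial J_{\mathrm{NLL}}}{\partial z_k^{(i)}} = \frac{1}{m}\big(a_k^{(i)} - y_k^{(i)}\big),
\qquad
\frac{\partial J_{\mathrm{MSE}}}{\partial z_k^{(i)}} = \frac{1}{m}\big(a_k^{(i)} - y_k^{(i)}\big)\,a_k^{(i)}\big(1-a_k^{(i)}\big)
```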
Not only is the negative corpus log-likelihood a more effective cost function
than the traditional MSE (95.06% vs. 92.88% accuracy), it is also about twice
as fast to train with Backpropagation on a MNN (57.94 s vs. 113.27 s), at
least for the digit recognition task. Check out the
cost function code here. In
addition, it shows that the advanced optimisation method does its
job regardless of the cost criterion used, as long as the gradients are
consistent with the cost function. That's awesome.
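Whether a cost function and its gradients match can be checked numerically. The following is a minimal NumPy sketch, not the code linked above: it assumes a hypothetical one-hidden-layer network with sigmoid units and no bias terms, computes either criterion (the labels "nll" and "mse" are illustrative) together with its backpropagated gradients, and compares them against central finite differences. It also illustrates one plausible contributor to the speed difference: for a saturated, wrong output unit, the MSE error term is scaled down by a(1 - a), while the log-likelihood term is not.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W1, W2, X):
    """Hypothetical toy network: one hidden layer, sigmoid units, no biases."""
    A1 = sigmoid(X @ W1)   # hidden activations, shape (m, H)
    A2 = sigmoid(A1 @ W2)  # output activations, shape (m, K)
    return A1, A2

def cost_and_grad(W1, W2, X, Y, criterion):
    """Return the cost and its gradients w.r.t. W1 and W2 for one criterion."""
    m = X.shape[0]
    A1, A2 = forward(W1, W2, X)
    if criterion == "nll":
        # Negative log-likelihood of sigmoid outputs (cross-entropy)
        J = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
        delta2 = (A2 - Y) / m  # the sigmoid derivative cancels out
    elif criterion == "mse":
        # Squared error with the customary 1/2 factor
        J = np.sum((A2 - Y) ** 2) / (2 * m)
        delta2 = (A2 - Y) * A2 * (1 - A2) / m  # damped by a(1 - a)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    delta1 = (delta2 @ W2.T) * A1 * (1 - A1)  # backpropagate to hidden layer
    return J, X.T @ delta1, A1.T @ delta2

def numerical_grad(f, W, eps=1e-6):
    """Central finite differences of the scalar function f() w.r.t. W (in place)."""
    G = np.zeros_like(W)
    for idx in np.ndindex(W.shape):
        old = W[idx]
        W[idx] = old + eps
        plus = f()
        W[idx] = old - eps
        minus = f()
        W[idx] = old  # restore the original weight
        G[idx] = (plus - minus) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                # 5 toy examples, 3 features
Y = np.eye(4)[rng.integers(0, 4, size=5)]  # one-hot targets, 4 classes
W1 = rng.normal(scale=0.5, size=(3, 6))
W2 = rng.normal(scale=0.5, size=(6, 4))

for criterion in ("nll", "mse"):
    J, g1, g2 = cost_and_grad(W1, W2, X, Y, criterion)
    cost = lambda: cost_and_grad(W1, W2, X, Y, criterion)[0]
    err = max(np.max(np.abs(g1 - numerical_grad(cost, W1))),
              np.max(np.abs(g2 - numerical_grad(cost, W2))))
    print(f"{criterion}: J = {J:.4f}, max gradient error = {err:.2e}")

# Saturated, wrong output unit: target 1, activation sigmoid(-10)
a = sigmoid(-10.0)
print("NLL output delta:", a - 1.0)                  # close to -1
print("MSE output delta:", (a - 1.0) * a * (1 - a))  # close to -4.5e-5
```

Any optimiser that consumes the (cost, gradient) pair, plain gradient descent or a more advanced routine, only needs this consistency, which is why swapping the criterion does not require changing the optimiser itself.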

[Duda, et al., 2001] Duda, R.O., Hart, P.E. and Stork, D.G., "Pattern
Classification", New York: John Wiley & Sons, 2001, ISBN: 0471056693
