Alexandre Trilla, PhD - Data Scientist | home publications


-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Post 59

Different cost criteria in Multilayer Neural Network training


It is customary to train Multilayer Neural Networks (MNN) with the Mean Squared Error (MSE) criterion as the cost function (Duda, et al., 2001), especially when the Backpropagation algorithm is used, since Backpropagation is usually presented as a natural extension of the Least Mean Square algorithm for linear systems (hence the lexical coincidence with "mean square"). Nevertheless, Prof. Ng in the ml-class presented a somewhat different flavour of cost function for training MNN: the "un-log-likelihood" error, i.e., the negative of the corpus log-likelihood, which typically characterises the Logistic Regression error with respect to the data, and for a good reason:
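As a minimal sketch of the two criteria (assuming NumPy, sigmoid output units, and one example per column, which is my own illustrative convention here, not the ml-class code):

```python
import numpy as np

def mse_cost(A, Y):
    """Mean squared error over m examples (columns of A and Y)."""
    m = Y.shape[1]
    return np.sum((A - Y) ** 2) / (2 * m)

def neg_loglik_cost(A, Y):
    """Negative corpus log-likelihood (cross-entropy) for sigmoid outputs."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

# Example: network activations A vs. one-hot targets Y
A = np.array([[0.9], [0.1]])
Y = np.array([[1.0], [0.0]])
```

Both functions are minimised when the activations match the targets; they differ only in how they penalise the mismatch.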

Cost function               Effectiveness (accuracy)    Training time (sec)
Neg. Corpus LogLikelihood   95.06%                      57.94
MSE                         92.88%                      113.27

Not only is the Neg. Corpus LogLikelihood a more effective cost function than the traditional MSE, it is also roughly twice as fast to train with Backpropagation on a MNN, at least for this digits recognition task. Check out the cost function code here. In addition, this shows that the advanced optimisation method does its job as long as the gradients are consistent with the underlying cost function. That's awesome.
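The speed gap has a neat explanation at the output layer. With a sigmoid unit, the cross-entropy gradient with respect to the pre-activation reduces to (a - y), whereas MSE keeps the sigmoid derivative a(1 - a), which vanishes when the unit saturates, so confident-but-wrong outputs learn slowly. A sketch (my own illustration, assuming a single sigmoid output unit):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def delta_neg_loglik(a, y):
    # Cross-entropy with sigmoid: the a*(1-a) factor cancels out
    return a - y

def delta_mse(a, y):
    # MSE keeps the sigmoid derivative, which vanishes as a -> 0 or 1
    return (a - y) * a * (1.0 - a)

# A confident but wrong prediction: a ~ 0.98 while the target is 0
a, y = sigmoid(4.0), 0.0
# delta_mse is tiny here, so MSE backpropagates a much weaker error signal
```

This is consistent with the table above: the log-likelihood criterion pushes larger corrections through saturated units, so the optimiser converges in fewer (or cheaper) iterations.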

[Duda, et al., 2001] Duda, R.O., Hart, P.E. and Stork, D.G., "Pattern Classification", New York: John Wiley & Sons, 2001, ISBN: 0-471-05669-3

All contents © Alexandre Trilla 2008-2024