Different cost criteria in Multilayer Neural Network training
It is customary to train Multilayer Neural Networks (MNN) with the Mean Squared Error (MSE) criterion as the cost function (Duda et al., 2001), especially when the Backpropagation algorithm is used, since Backpropagation is usually presented as a natural extension of the Least Mean Square algorithm for linear systems, which reinforces the lexical association with "mean square". Nevertheless, Prof. Ng in the ml-class presented a somewhat different flavour of cost function for training MNN, resorting to the "un-log-likelihood" error, i.e., the negative of the corpus log-likelihood, the same error that typically characterises Logistic Regression with respect to the data, and for a good reason:
| Cost function | Effectiveness (accuracy) | Training time (sec) |
|---|---|---|
| Neg. Corpus LogLikelihood | 95.06% | 57.94 |
Not only is the Neg. Corpus LogLikelihood a more effective cost function than the traditional MSE, it is also about twice as fast to train with Backpropagation on an MNN, at least for the digits recognition task. Check out the cost function code here. In addition, this shows that the advanced optimisation method does its job with any criterion, as long as the cost function and the gradients are mutually consistent. That's awesome.
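To make the two criteria concrete, here is a minimal NumPy sketch (not the actual ml-class Octave code) of both costs for a network with a sigmoid output layer; the toy labels `y` and activations `a` are made up for illustration:

```python
import numpy as np

def mse_cost(y, a):
    # Mean Squared Error: J = 1/(2m) * sum((a - y)^2)
    m = y.shape[0]
    return np.sum((a - y) ** 2) / (2 * m)

def nll_cost(y, a):
    # Negative corpus log-likelihood (cross-entropy):
    # J = -1/m * sum(y*log(a) + (1 - y)*log(1 - a))
    m = y.shape[0]
    return -np.sum(y * np.log(a) + (1 - y) * np.log(1 - a)) / m

# Toy example: three training cases, the last one badly misclassified
y = np.array([1.0, 0.0, 1.0])   # targets
a = np.array([0.9, 0.1, 0.2])   # sigmoid output activations

print(mse_cost(y, a))   # small penalty for the wrong case
print(nll_cost(y, a))   # much larger penalty for the same case
```

Note how the log-likelihood criterion punishes the confidently wrong third case far more heavily than MSE does, which is one intuition for why it can drive training faster: its gradient does not vanish when a sigmoid unit saturates on the wrong side.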
[Duda et al., 2001] Duda, R.O., Hart, P.E. and Stork, D.G., *Pattern Classification*, New York: John Wiley & Sons, 2001. ISBN 0-471-05669-3.