Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 59
Different cost criteria in Multilayer Neural Network training
19-Nov-2011
It is customary to train Multilayer Neural Networks (MNN) with the Mean
Squared Error (MSE) criterion as the cost function (Duda, et al., 2001),
especially when the Backpropagation algorithm is used, since Backpropagation
is usually presented as a natural extension of the Least Mean Square
algorithm for linear systems; the lexical overlap with "mean square" helps,
too. Nevertheless, Prof. Ng in the ml-class presented a somewhat different
flavour of cost function for training MNN: the "un-log-likelihood" error,
i.e., the negative of the corpus log-likelihood, the cross-entropy criterion
that typically characterises the error of Logistic Regression wrt the data,
and for a good reason:
Cost function              | Effectiveness (accuracy) | Training time (sec)
---------------------------|--------------------------|--------------------
Neg. corpus log-likelihood | 95.06%                   | 57.94
MSE                        | 92.88%                   | 113.27
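To make the comparison concrete, here is a minimal sketch of the two cost
criteria in Python with NumPy. This is my own illustration, not the
ml-class assignment code: h stands for the network's output activations and
y for the 0/1 target labels, both assumed to be arrays of the same shape
with m examples along the first axis.

```python
import numpy as np

def mse_cost(h, y):
    """Mean Squared Error: J = 1/(2m) * sum((h - y)^2)."""
    m = y.shape[0]
    return np.sum((h - y) ** 2) / (2.0 * m)

def neg_log_likelihood_cost(h, y):
    """Negative corpus log-likelihood (cross-entropy):
    J = -1/m * sum(y*log(h) + (1-y)*log(1-h))."""
    m = y.shape[0]
    return -np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h)) / m
```

Note how the log terms penalise a confidently wrong output (h near 1 when
y = 0, or vice versa) much more heavily than the quadratic MSE term does,
which is one intuition for the accuracy gap in the table above.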
Not only is the negative corpus log-likelihood a more effective cost
function than the traditional MSE, it also trains roughly twice as fast
with Backpropagation on an MNN, at least for the digit
recognition task. Check out the
cost function code here. In
addition, this shows that the advanced optimisation method does its
job regardless of the cost criterion, as long as the supplied gradients
actually match the cost function. That's awesome.
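That matching between cost and gradient is also where the speed-up
plausibly comes from: for a sigmoid output unit paired with the
cross-entropy cost, the derivative of the cost wrt the pre-activation
collapses to the plain residual h - y, with no sigmoid-derivative factor
to shrink the error signal. A small numerical check of that identity (my
own sketch, for a single output unit with scalar z and label y):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Cross-entropy cost of a single sigmoid output as a function of
# the pre-activation z, for a fixed target label y.
def cost(z, y):
    h = sigmoid(z)
    return -(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))

z, y = 0.3, 1.0
eps = 1e-6

# Centred finite-difference estimate of dJ/dz ...
numeric = (cost(z + eps, y) - cost(z - eps, y)) / (2.0 * eps)

# ... should agree with the analytic simplification h - y.
analytic = sigmoid(z) - y
```

With MSE the same derivative carries an extra h*(1-h) factor, which is at
most 0.25 and vanishes for saturated units, so the error signal reaching
the earlier layers is weaker and training takes longer.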
--
[Duda, et al., 2001] Duda, R.O., Hart, P.E. and Stork, D.G., "Pattern
Classification", New York: John Wiley & Sons, 2001, ISBN: 0-471-05669-3