Blog
-- Thoughts on data analysis, software development and innovation management. Comments are welcome
Post 64
Natural Language Processing: forthcoming online classes at Stanford, and a unified approach with Multilayer Neural Networks
22-Jan-2012
Needless to say, I was very eager to begin the nlp-class tomorrow.
It's a pity that it has just been delayed for administrative
reasons. But good things take time, so we'd better be patient,
because this looks very good
(45,000 people are awaiting its launch; this is incredibly exciting).
Having used Manning's and Jurafsky's masterpieces for years, I find it
enthralling to now have the chance to attend their classes, getting
my hands dirty with the classical problems, techniques and the
many details of statistical NLP.
But with time one becomes a little agnostic about idealising any
single method for solving a particular problem. Let me
elaborate. Given a classification problem
and a fixed complexity for the discriminating function, does
it really matter whether we learn the boundary with a Maximum Entropy principle
(e.g., Logistic Regression) or with a Maximum Margin criterion (e.g., a Support
Vector Machine)? Will the optimised functions differ much?
Assuming both classifiers are successfully trained (i.e., they are
properly regularised, so there is no overfitting, yet they
are still complex enough for the problem at hand, so no underfitting either),
my experience tells me that the particular learning philosophy/scheme
(MaxEnt vs. Max-Margin) matters very little in the end. And this
feeling (also supported by evidence) is shared with other researchers
such as Alexander Osherenko (see our
public discussion
on the Corpora-List), and Ronan Collobert, who even developed a fully
unified approach
based on Multilayer Neural Networks (MNN) as the universal learners (*) to tackle a
variety of typical NLP tasks, including Part-Of-Speech tagging, chunking,
Named Entity Recognition and Semantic Role Labelling.
In his own words: "it is often difficult to figure
out which ideas are most responsible for the state-of-the-art performance of
a large ensemble (of classifiers)". In this regard, Remark 4
(Conditional Random Fields) in (Collobert, et al., 2011) also specifically
notes that other techniques,
such as CRFs, have been applied to the same NLP tasks with similar results.
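As a quick sanity check of the MaxEnt-vs-Max-Margin claim, here is a small sketch (my own illustration, not from any of the discussions above; scikit-learn estimators on synthetic data) that trains a logistic regression and a linear SVM on the same features and measures how often their decisions agree:

```python
# Illustration only: compare a MaxEnt learner (logistic regression)
# with a Max-Margin learner (linear SVM) on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

maxent = LogisticRegression(C=1.0, max_iter=1000).fit(X_tr, y_tr)
maxmargin = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)

# Fraction of test points on which the two boundaries give the same label
agreement = (maxent.predict(X_te) == maxmargin.predict(X_te)).mean()
print(f"MaxEnt accuracy:     {maxent.score(X_te, y_te):.3f}")
print(f"Max-Margin accuracy: {maxmargin.score(X_te, y_te):.3f}")
print(f"Decision agreement:  {agreement:.3f}")
```

In runs like this, the two well-regularised linear classifiers typically agree on the vast majority of test points, which is precisely the point: the learning philosophy matters far less than the feature representation and regularisation.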
So there are plenty of ways to attain the same goals, and the interesting
question (IMO) becomes which approach best fits a given situation/problem.
To me, answering that question is what contributing to knowledge means, although MNNs are
generally regarded as black boxes that resist human interpretation. In fact,
this is the seed of a critical discussion (Collobert, et al., 2011).
(*) The proposed generic neural network discovers useful features in large unlabelled
datasets, which enable it to perform close to the state of the art. This approach
is presented in contrast to relying on a priori knowledge (e.g., domain heuristics):
task-specific engineering, i.e., engineering that does not generalise to other tasks, is
not desirable in this multi-task setting (Collobert, et al., 2011).
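To make the unified architecture a little more concrete, here is a minimal sketch (forward pass only, with random weights and hypothetical dimensions of my choosing) of the window-based tagger in the spirit of (Collobert, et al., 2011): each word in a context window is mapped to a learned embedding, the embeddings are concatenated, and a hidden layer feeds a softmax over the tags for the centre word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, just for illustration
vocab_size, emb_dim, window, hidden, n_tags = 100, 8, 3, 16, 5

# Parameters (randomly initialised here; trained by backpropagation in practice)
E = rng.normal(size=(vocab_size, emb_dim))       # word embedding table
W1 = rng.normal(size=(window * emb_dim, hidden)) # hidden layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, n_tags))           # output layer weights
b2 = np.zeros(n_tags)

def tag_probs(window_ids):
    """Forward pass for one window of word ids; tags the centre word."""
    x = E[window_ids].reshape(-1)   # look up and concatenate embeddings
    h = np.tanh(x @ W1 + b1)        # hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()              # softmax over the tag set

probs = tag_probs([3, 17, 42])
print(probs.shape)                  # one probability per tag
```

The same first layers (embeddings plus hidden representation) can be shared across tasks such as POS tagging, chunking, NER and SRL, with only the output layer being task-specific; that sharing is what makes the approach "unified".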
--
(Collobert, et al., 2011) Collobert, R., Weston, J., Bottou, L., Karlen, M.,
Kavukcuoglu, K. and Kuksa, P., "Natural Language Processing (Almost) from Scratch",
Journal of Machine Learning Research, vol. 12, pp. 2493-2537, Aug 2011.