Blog
-- Thoughts on data analysis, software development and innovation management. Comments are welcome
Post 64
Natural Language Processing: forthcoming online classes at Stanford, and a unified approach with Multilayer Neural Networks
22-Jan-2012
Needless to say, I was very eager to begin the nlp-class tomorrow.
It's a pity that it has just been delayed for administrative
reasons. But good things take time, so we'd better be patient,
because this looks very good
(45,000 people are awaiting its launch; this is incredibly exciting).
Having used Manning's and Jurafsky's masterpieces for years, I find it
enthralling to now have the chance to attend their classes, getting
my hands dirty with the classical problems, techniques and the
many details of statistical NLP.
But with time one becomes a little agnostic about idealising any
single method for solving a particular problem. Let me
elaborate. Given a classification problem
and a fixed complexity for the discriminating function, does
it really matter whether we learn the boundary with a Maximum Entropy principle
(e.g., Logistic Regression) or with a Maximum Margin criterion (e.g., a Support
Vector Machine)? Will the optimised functions differ much?
Assuming both classifiers are successfully trained (i.e., they are
properly regularised, so there is no overfitting, yet they
are still complex enough for the problem at hand, so no underfitting either),
my experience tells me that the particular learning philosophy/scheme
(MaxEnt vs. Max-Margin) matters very little in the end. And this
feeling (also supported by evidence) is shared with other researchers
such as Alexander Osherenko (see our
public discussion
on the Corpora-List), and Ronan Collobert, who even developed a fully
unified approach
based on Multilayer Neural Networks (MNN) as the universal learners (*) to tackle a
variety of typical NLP tasks, including Part-Of-Speech tagging, chunking,
Named Entity Recognition and Semantic Role Labelling.
In his own words: "it is often difficult to figure
out which ideas are most responsible for the state-of-the-art performance of
a large ensemble (of classifiers)". In this regard, Remark 4
(Conditional Random Fields) in (Collobert, et al., 2011) also specifically
notes that other techniques,
such as CRFs, have been applied to the same NLP tasks with similar results.
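As a quick sanity check of the MaxEnt-vs-Max-Margin claim, here is a small sketch (my own illustration, not from any of the discussions above; scikit-learn estimators on synthetic data) that trains a logistic regression and a linear SVM on the same features and measures how often their decisions agree:

```python
# Illustration only: compare a MaxEnt learner (logistic regression)
# with a Max-Margin learner (linear SVM) on the same synthetic data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

maxent = LogisticRegression(C=1.0, max_iter=1000).fit(X_tr, y_tr)
maxmargin = LinearSVC(C=1.0, max_iter=10000).fit(X_tr, y_tr)

# Fraction of test points on which the two boundaries give the same label
agreement = (maxent.predict(X_te) == maxmargin.predict(X_te)).mean()
print(f"MaxEnt accuracy:     {maxent.score(X_te, y_te):.3f}")
print(f"Max-Margin accuracy: {maxmargin.score(X_te, y_te):.3f}")
print(f"Decision agreement:  {agreement:.3f}")
```

In runs like this, the two well-regularised linear classifiers typically agree on the vast majority of test points, which is precisely the point: the learning philosophy matters far less than the feature representation and regularisation.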
So there are plenty of ways to attain the same goals, and the interesting
question (IMO) becomes which approach best fits a given situation/problem.
To me, answering that question is what contributing to knowledge means, although MNNs are
generally regarded as black boxes that resist human interpretation. In fact,
this is the seed of a critical discussion (Collobert, et al., 2011).
(*) The proposed generic neural network discovers useful features in large unlabelled
datasets, which enable it to perform close to the state of the art. This approach
is presented in contrast to relying on a priori knowledge (e.g., domain heuristics):
task-specific engineering, i.e., engineering that does not generalise to other tasks, is
not desirable in this multi-task setting (Collobert, et al., 2011).
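To make the unified architecture a little more concrete, here is a minimal sketch (forward pass only, with random weights and hypothetical dimensions of my choosing) of the window-based tagger in the spirit of (Collobert, et al., 2011): each word in a context window is mapped to a learned embedding, the embeddings are concatenated, and a hidden layer feeds a softmax over the tags for the centre word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, just for illustration
vocab_size, emb_dim, window, hidden, n_tags = 100, 8, 3, 16, 5

# Parameters (randomly initialised here; trained by backpropagation in practice)
E = rng.normal(size=(vocab_size, emb_dim))       # word embedding table
W1 = rng.normal(size=(window * emb_dim, hidden)) # hidden layer weights
b1 = np.zeros(hidden)
W2 = rng.normal(size=(hidden, n_tags))           # output layer weights
b2 = np.zeros(n_tags)

def tag_probs(window_ids):
    """Forward pass for one window of word ids; tags the centre word."""
    x = E[window_ids].reshape(-1)   # look up and concatenate embeddings
    h = np.tanh(x @ W1 + b1)        # hidden layer
    logits = h @ W2 + b2
    p = np.exp(logits - logits.max())
    return p / p.sum()              # softmax over the tag set

probs = tag_probs([3, 17, 42])
print(probs.shape)                  # one probability per tag
```

The same first layers (embeddings plus hidden representation) can be shared across tasks such as POS tagging, chunking, NER and SRL, with only the output layer being task-specific; that sharing is what makes the approach "unified".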
--
(Collobert, et al., 2011) Collobert, R., Weston, J., Bottou, L., Karlen, M.,
Kavukcuoglu, K. and Kuksa, P., "Natural Language Processing (Almost) from Scratch",
Journal of Machine Learning Research, vol. 12, pp. 2493-2537, Aug 2011.