Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 65
Hacking with Multinomial Naive Bayes
29-Feb-2012
Today is the most significant day of a leap year, and I won't
miss the chance to blog a little. I think I can put
Udacity aside for a moment to
note the importance of Naive Bayes in the
hacker world.
Despite its naive assumption of feature independence, which does
not hold for text data because of the grammatical structure of language, the
classification decisions of this oversimplified model (based on the Bayes
decision rule) are surprisingly good. I am particularly
fond of implementing
the Multinomial version of Naive Bayes as defined in
(Manning, et al., 2008), and I must say that for certain problems
(namely sentiment analysis) it improves on the state-of-the-art
baseline straightaway. My open source implementation is available
here,
as well as a couple of example applications on
sentiment analysis
and
topic detection.
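
For readers who want to see the idea in a few lines, here is a minimal sketch of Multinomial Naive Bayes with add-one (Laplace) smoothing, following the general scheme described in (Manning, et al., 2008). It is not the open source implementation mentioned above: the function names (train_multinomial_nb, classify), the variable names and the tiny training set are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """Estimate log priors and smoothed log likelihoods.

    docs is a list of (tokens, label) pairs. Hypothetical helper,
    written only for this illustration.
    """
    vocab = {tok for tokens, _ in docs for tok in tokens}
    doc_counts = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for tokens, label in docs:
        term_counts[label].update(tokens)

    # Prior: fraction of training documents in each class.
    log_prior = {c: math.log(n / len(docs)) for c, n in doc_counts.items()}

    # Likelihood: add-one smoothed relative term frequency per class.
    log_likelihood = {}
    for c in doc_counts:
        total = sum(term_counts[c].values()) + len(vocab)
        log_likelihood[c] = {
            t: math.log((term_counts[c][t] + 1) / total) for t in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, log_prior, log_likelihood, vocab):
    """Bayes decision rule: pick the class with the highest log posterior.

    Tokens that never appeared in training are ignored.
    """
    scores = {
        c: log_prior[c]
        + sum(log_likelihood[c][t] for t in tokens if t in vocab)
        for c in log_prior
    }
    return max(scores, key=scores.get)


# Made-up toy data, only to show the calls.
TRAINING_DOCS = [
    ("great fun great".split(), "pos"),
    ("boring plot".split(), "neg"),
    ("great acting".split(), "pos"),
    ("boring boring acting".split(), "neg"),
]

model = train_multinomial_nb(TRAINING_DOCS)
print(classify("great acting fun".split(), *model))
print(classify("dull boring".split(), *model))
```

With this toy data the first document comes out as "pos" and the second as "neg" (the unseen word "dull" is simply skipped). Working in log space avoids floating point underflow when many small probabilities are multiplied, and the add-one smoothing keeps a single unseen term from zeroing out a whole class.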
UPDATE on 07-Mar-2012: A book entitled "Machine Learning for Hackers" has
just been published.
--
(Manning, et al., 2008) Manning, C. D., Raghavan, P. and Schütze, H., "Introduction to Information Retrieval", Cambridge: Cambridge University Press, 2008, ISBN: 0521865719