Alexandre Trilla, PhD - Data Scientist

Blog

-- Thoughts on data analysis, software development and innovation management. Comments are welcome.

Hacking with Multinomial Naive Bayes

29-Feb-2012

Today is the most significant day of a leap year, and I won't miss the chance to blog a little. I think I can put Udacity aside for a moment to note the importance of Naive Bayes in the hacker world. Despite its naive assumption of feature independence, which does not hold for text data because of the grammatical structure of language, the classification decisions of this oversimplified model (based on the Bayes decision rule) are surprisingly good. I am particularly fond of implementing the Multinomial version of Naive Bayes as defined in (Manning, et al., 2008), and I must say that for certain problems (namely sentiment analysis) it improves on the state-of-the-art baseline straightaway. My open source implementation is available here, as well as a couple of example applications on sentiment analysis and topic detection.
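For readers who want to see the idea in a few lines, here is a minimal Python sketch of a Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing, scored in log space and decided by picking the most probable class. It is not the open source implementation mentioned above: the function names (`train_multinomial_nb`, `classify`), the toy training set and its labels are all hypothetical, chosen only for illustration, and I am assuming that unseen words are simply skipped at classification time.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs):
    """Estimate log priors and smoothed log likelihoods.

    docs: list of (tokens, label) pairs, where tokens is a list of strings.
    Returns (log_prior, log_likelihood, vocab).
    """
    vocab = {tok for tokens, _ in docs for tok in tokens}
    docs_per_class = Counter(label for _, label in docs)

    # Count how often each token occurs in the documents of each class.
    token_counts = defaultdict(Counter)
    for tokens, label in docs:
        token_counts[label].update(tokens)

    log_prior = {}
    log_likelihood = {}
    for label, n_docs in docs_per_class.items():
        log_prior[label] = math.log(n_docs / len(docs))
        # Add-one smoothing: one extra count per vocabulary entry.
        denom = sum(token_counts[label].values()) + len(vocab)
        log_likelihood[label] = {
            tok: math.log((token_counts[label][tok] + 1) / denom)
            for tok in vocab
        }
    return log_prior, log_likelihood, vocab


def classify(tokens, log_prior, log_likelihood, vocab):
    """Return the class with the highest posterior log score."""
    scores = {}
    for label, prior in log_prior.items():
        score = prior
        for tok in tokens:
            if tok in vocab:  # assumption: out-of-vocabulary tokens are ignored
                score += log_likelihood[label][tok]
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data: three documents about China, one about something else.
TOY_DOCS = [
    ("Chinese Beijing Chinese".split(), "china"),
    ("Chinese Chinese Shanghai".split(), "china"),
    ("Chinese Macao".split(), "china"),
    ("Tokyo Japan Chinese".split(), "other"),
]

if __name__ == "__main__":
    model = train_multinomial_nb(TOY_DOCS)
    print(classify("Chinese Chinese Chinese Tokyo Japan".split(), *model))  # prints "china"
```

Summing logarithms instead of multiplying probabilities avoids floating-point underflow on long documents, and the smoothing keeps a single unseen word from zeroing out a whole class.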

UPDATE on 07-Mar-2012: A book entitled "Machine Learning for Hackers" has just been published.

--
(Manning, et al., 2008) Manning, C. D., Raghavan, P. and Schütze, H., "Introduction to Information Retrieval", Cambridge: Cambridge University Press, 2008, ISBN: 0521865719