Introducing VSMpy: a bare-bones implementation of a Vector Space Model classifier in Python


Following the good advice to publish old personal code projects to GitHub, this post introduces VSMpy. VSMpy is a bare-bones Python package that implements a standard Vector Space Model classifier (i.e., binary-valued vectors from a bag-of-words language model, compared with the cosine similarity measure) and is laid out as a ready-to-distribute package. It illustrates:

  • package and module structure
  • configuration for build and installation
  • running scripts for testing
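
To make the classifier idea concrete, here is a minimal sketch of binary bag-of-words vectors compared with cosine similarity. It is not VSMpy's actual code: the function names, the whitespace tokenizer, the toy training data, and the nearest-document decision rule are all assumptions made for illustration. A binary vector is stored as the set of distinct tokens, so the cosine reduces to the size of the intersection divided by the square root of the product of the set sizes.

```python
import math


def binary_vector(text):
    """Binary bag-of-words vector, stored as the set of distinct lowercase tokens.

    Assumption: plain whitespace tokenization, no stemming or stop-word removal.
    """
    return set(text.lower().split())


def cosine(a, b):
    """Cosine similarity of two binary vectors represented as token sets."""
    if not a or not b:
        return 0.0
    # For 0/1 vectors the dot product is the number of shared tokens,
    # and each norm is the square root of the number of distinct tokens.
    return len(a & b) / math.sqrt(len(a) * len(b))


def classify(text, labelled_docs):
    """Return the label of the training document most similar to `text`.

    Assumption: a nearest-document rule; a real classifier might compare
    against per-class centroids instead.
    """
    query = binary_vector(text)
    best_label, _ = max(
        ((label, cosine(query, binary_vector(doc))) for doc, label in labelled_docs),
        key=lambda pair: pair[1],
    )
    return best_label


# Hypothetical toy training set, for illustration only.
TRAINING = [
    ("the striker scored a late goal", "sports"),
    ("the parliament passed the budget bill", "politics"),
]

print(classify("a goal in the final minute", TRAINING))
```

With this toy data the query shares three distinct tokens with the first document and only one with the second, so it is labelled "sports".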

In the end, this is almost the same as Bob Carpenter's (Alias-i, Inc) pyhi skeletal project distribution, with a thin veneer of text classification on top. Nonetheless, a project like this one may come in handy when starting something new from scratch with Python. Python is a wonderful and powerful programming language that's starting to displace mammoths like MATLAB for scientific and engineering purposes (e.g., for signal processing), including Natural Language Processing. I use it extensively at work, and many of the suppliers I deal with use it as well. It is a great tool for product development because it lets you iterate fast and release often. Paul Graham noted this explicitly in his book "Hackers and Painters": during the years he worked on Viaweb, he worried about competitors seeking Python programmers, because that sounded like companies where the technical side, at least, was run by real hackers.

