Alexandre Trilla


-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Post 80

Introducing VSMpy: a bare bones implementation of a Vector Space Model classifier in Python


Following the good advice to publish old personal code projects on GitHub, this post introduces VSMpy, a bare-bones Python package that implements a standard Vector Space Model classifier (i.e., binary-valued vectors from a bag-of-words language model, compared with the cosine similarity measure) as a ready-to-distribute package. It illustrates:

  • package and module structure
  • configuration for build and installation
  • running scripts for testing
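
To make the classifier idea concrete, here is a minimal sketch of a binary bag-of-words model compared with cosine similarity. It is not the VSMpy code: the names `tokenize`, `binary_vector`, `cosine` and `VSMClassifier` are hypothetical, the tokenizer is a naive whitespace split, and pooling all training terms into one vector per category is an assumption made for brevity.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def binary_vector(text):
    """Binary bag-of-words vector, stored as the set of terms that occur."""
    return set(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets.

    For 0/1 vectors the dot product is the size of the intersection,
    and each norm is the square root of the number of terms present.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


class VSMClassifier:
    """Hypothetical sketch: one pooled binary term vector per category."""

    def __init__(self):
        self.category_terms = defaultdict(set)

    def train(self, text, category):
        # Assumption: training texts of a category are merged into one vector.
        self.category_terms[category] |= binary_vector(text)

    def classify(self, text):
        if not self.category_terms:
            raise ValueError("the classifier has not been trained")
        query = binary_vector(text)
        # Pick the category whose vector is closest to the query.
        return max(
            self.category_terms,
            key=lambda c: cosine(query, self.category_terms[c]),
        )


clf = VSMClassifier()
clf.train("the striker scored a late goal", "sports")
clf.train("the parliament passed a new budget", "politics")
print(clf.classify("a goal in the final minute"))  # sports
```

In this toy run the query shares three terms with the sports vector and two with the politics vector, so the cosine scores are 0.5 and about 0.33. A real model would at least strip stop words such as "the" and "a", which here contribute to both scores.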

In the end, this is almost the same as Bob Carpenter's (Alias-i, Inc.) pyhi skeletal project distribution, with an added veneer of text classification. Nonetheless, a project like this one may come in handy when starting something new from scratch in Python.

Python is a wonderful and powerful programming language that is starting to displace heavyweights like Matlab for scientific and engineering purposes (e.g., signal processing), including Natural Language Processing. I use it extensively at work, and many of the suppliers I deal with do so as well. It is a great tool for product development because it lets you iterate fast and release often. Paul Graham noted as much in his book "Hackers & Painters": during the years he worked on Viaweb, he worried about competitors seeking Python programmers, because that sounded like companies where the technical side, at least, was run by real hackers.

All contents © Alexandre Trilla 2008-2024