Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 80
Introducing VSMpy: a bare bones implementation of a Vector Space Model classifier in Python
27-Jun-2013
Following the good advice to
publish old personal code projects to GitHub,
this post introduces VSMpy.
VSMpy is a bare-bones Python package that implements a standard Vector
Space Model classifier
(i.e., binary-valued vectors from a Bag-of-Words language model,
compared with the cosine similarity measure) and ships as a
ready-to-distribute package. It illustrates:
- package and module structure
- configuration for build and installation
- running scripts for testing
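
To make the classifier idea concrete, here is a minimal sketch of a
binary Bag-of-Words model compared with cosine similarity. It is not the
VSMpy code: the names tokenize, cosine and BinaryVSMClassifier are
hypothetical, the tokenizer is a naive whitespace split, and each class
is assumed to be represented by the union of the terms seen in its
training texts.

```python
import math
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; the set is the binary vector."""
    return set(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two binary vectors stored as term sets.

    For binary vectors the dot product is the size of the intersection
    and each norm is the square root of the set size.
    """
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


class BinaryVSMClassifier:
    """Hypothetical class for illustration; not taken from VSMpy."""

    def __init__(self):
        # One binary prototype vector (a set of terms) per label.
        self.prototypes = defaultdict(set)

    def train(self, text, label):
        """Add the terms of a training text to the label's prototype."""
        self.prototypes[label] |= tokenize(text)

    def classify(self, text):
        """Return the label whose prototype is closest to the text."""
        if not self.prototypes:
            raise ValueError("classifier has not been trained")
        query = tokenize(text)
        return max(
            self.prototypes,
            key=lambda label: cosine(query, self.prototypes[label]),
        )
```

After training on "the cat sat on the mat" as "animals" and "stocks fell
on the market" as "finance", classifying "the market fell" should pick
"finance", since it shares three terms with that prototype and only one
with the other.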
In the end, this is almost the same as Bob Carpenter's (Alias-i, Inc)
pyhi skeletal project
distribution, with a veneer of Text Classification on top. Nonetheless, a
project like this one may come in handy when starting something new from
scratch with Python. Python is a wonderful and powerful
programming language that's
starting to take over from mammoths like Matlab
for scientific and engineering purposes (e.g., for signal processing),
including Natural Language Processing.
I use it extensively
at work, and many of the suppliers I deal with do so as well. It is a great
tool for product development, where you want to iterate fast and release
often. Paul Graham makes the point in his "Hackers and Painters" book:
looking back at the years he worked on Viaweb, he writes that a competitor
seeking Python programmers would have worried him, because that sounded
like a company where the technical side, at least, was run by real hackers.
All contents © Alexandre Trilla 2008-2024