Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 66
Spelling correction and the death of words
23-Mar-2012
One of the topics treated in this second week of the
Natural Language Processing class
at Coursera is spelling correction (also treated in the
Artificial Intelligence class).
It's wonderful to have tools that help proofread manuscripts,
but this convenience comes at the expense of impoverishing our own ability to express ourselves.
This
newspaper article, which links to the
original research work
conducted by Alexander Petersen, Joel Tenenbaum, Shlomo Havlin and Eugene Stanley,
states that spelling correction (not only computerised but also performed by human
editors in the publishing industry) causes language to become homogenised, which eventually
reduces the lexicon (old words die at a faster rate than new words are created).
So, is this fancy NLP topic actually hurting NLP? What a headache...
Anyway, I find the field of spelling correction very appealing because it
shows a direct link with speech (i.e., spoken language) through the consideration
of a phonetic criterion in the spelling error model. This points to the
Metaphone algorithm, which
produces the same key for similar-sounding words. Metaphone is reported to be
more accurate than Soundex because it knows the basic rules of English pronunciation.
Regarding spelling correction, Metaphone is used in GNU Aspell, and to my
surprise, it is already integrated in the
latest versions of PHP!
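To make the phonetic-key idea concrete, here is a minimal sketch of the classic Soundex coding (the simpler predecessor that Metaphone improves upon). It is an illustrative simplification, not a reference implementation: it omits Soundex's special rule for the letters h and w acting as non-separators.

```python
def soundex(word):
    """Simplified Soundex: keep the first letter, then encode the
    remaining consonants as up to three digits, padding with zeros.
    Similar-sounding names map to the same four-character key."""
    mapping = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6"),
                           ("aeiouy", "0")]:  # vowels act as separators
        for ch in letters:
            mapping[ch] = digit
    word = word.lower()
    # Encode each letter; h and w are dropped altogether (simplification).
    digits = [mapping[ch] for ch in word if ch in mapping]
    # Collapse runs of the same digit (e.g. "ck" yields a single "2").
    collapsed = [digits[0]]
    for d in digits[1:]:
        if d != collapsed[-1]:
            collapsed.append(d)
    # Drop the first letter's own code and the vowel markers.
    tail = [d for d in collapsed[1:] if d != "0"]
    return (word[0].upper() + "".join(tail) + "000")[:4]
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both yield `"R163"`, and `"Smith"`/`"Smyth"` share the key `"S530"`, which is exactly the kind of grouping a phonetic criterion exploits in a spelling error model.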
Along with the edit distance topic treated in the first week, this should make a nice
addition (e.g., a phonetic similarity module) to
the NLP toolkit I'm beginning to work on!
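For reference, the edit distance from the first week is a small dynamic program. A minimal Levenshtein sketch, keeping only one row of the table in memory:

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of insertions,
    deletions and substitutions needed to turn string a into b."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` gives 3: substitute k→s, substitute e→i, and insert the final g.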
All contents © Alexandre Trilla 2008-2025 |