Blog
-- Thoughts on data analysis, software
development and innovation management. Comments are welcome
Post 58
Inter-language evaluation of sentiment analysis with granular semantic term expansion
17-Nov-2011
Yesterday, my Master's student Isaac Lozano defended his thesis with
honours. The topic of his work gives this post its title: inter-language
evaluation of sentiment analysis with granular semantic term expansion.
His work builds mainly on our earlier publication (Trilla et al., 2010),
but he has extended it by porting EmoLib to Spanish in its entirety and
evaluating its performance with respect to English, which is the default
working language and also the one best supported by the research community.
His main contribution is an improved Word-Sense Disambiguation module
that exploits the knowledge contained in the Spanish WordNet ontology,
so that the term feature space can then be expanded with the synsets of
the observed words in their correct senses.
To accomplish this goal, a Lucene index of WordNet needs to
be created first, along with the Information-Content-based measures
that help determine the semantic similarity/relatedness among the terms
(e.g., Jiang-Conrath, Resnik, etc.), à la JavaSimLib.
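
As a rough illustration of what such Information-Content-based measures
compute, here is a minimal sketch in Python. It assumes a tiny invented
taxonomy and made-up corpus counts (PARENT and COUNTS are hypothetical,
not WordNet data), and it does not reproduce the Lucene or JavaSimLib
APIs. Resnik similarity is the information content of the lowest common
subsumer of two concepts, and the Jiang-Conrath distance combines it with
the information content of each concept.

```python
import math

# Hypothetical toy taxonomy (child -> parent) and corpus counts; not WordNet data.
PARENT = {
    "entity": None,
    "feeling": "entity",
    "joy": "feeling",
    "sadness": "feeling",
    "object": "entity",
    "car": "object",
}
COUNTS = {"entity": 0, "feeling": 2, "joy": 5, "sadness": 3, "object": 1, "car": 9}


def ancestors(concept):
    """Return the concept followed by all its ancestors up to the root."""
    chain = []
    while concept is not None:
        chain.append(concept)
        concept = PARENT[concept]
    return chain


def information_content(concept):
    """IC(c) = -log p(c), where p(c) counts c and every concept below it."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subsumed / total)


def lowest_common_subsumer(a, b):
    """First ancestor of a that is also an ancestor of b (tree-shaped taxonomy)."""
    b_ancestors = set(ancestors(b))
    return next(c for c in ancestors(a) if c in b_ancestors)


def resnik(a, b):
    """Resnik similarity: IC of the lowest common subsumer."""
    return information_content(lowest_common_subsumer(a, b))


def jiang_conrath_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(lcs(a, b))."""
    return information_content(a) + information_content(b) - 2 * resnik(a, b)
```

With these toy counts, "joy" and "sadness" share the fairly specific
subsumer "feeling", so they come out as more similar (and closer) than
"joy" and "car", which only meet at the root.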
In fact, to perform a more
granular analysis, several WordNet indices need to be
created, one for each word class that may carry affect, i.e.,
content words such as nouns, verbs and adjectives.
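
The per-word-class expansion can be pictured with a minimal Python
sketch. Plain dictionaries stand in for the Lucene indices, the entries
are invented, and the names INDICES and expand_terms are hypothetical.
The sense number is assumed to be supplied by a prior disambiguation step.

```python
# Hypothetical per-POS "indices": (lemma, sense number) -> words in that synset.
# Plain dicts stand in for the Lucene indices; the entries are invented.
INDICES = {
    "noun": {("joy", 1): ["joy", "delight", "pleasure"]},
    "verb": {("love", 1): ["love", "adore"]},
    "adjective": {("sad", 1): ["sad", "sorrowful"]},
}


def expand_terms(tagged_terms):
    """Expand (word, pos, sense) triples with the words of the chosen synset.

    Words whose POS has no index, or that have no entry in it, are kept
    unchanged. Duplicates are dropped while preserving order.
    """
    features = []
    for word, pos, sense in tagged_terms:
        synset = INDICES.get(pos, {}).get((word, sense), [word])
        for term in synset:
            if term not in features:
                features.append(term)
    return features


print(expand_terms([("joy", "noun", 1), ("the", "determiner", None)]))
```

Keeping one index per word class means a lookup never mixes, say, the
noun and verb senses of the same surface form.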
This granularity with respect to the Part-Of-Speech is
assumed to help identify affect in text,
see (Pang and Lee, 2008). Nevertheless,
he concluded his work by showing that such intuitive considerations
do not deliver a clear improvement in either of the two languages,
at least for the datasets at hand: the
SemEval 2007 and the Advertising Database.
However, he pointed out several interesting ideas for addressing the
flaws he observed during the development of his work, such as using
lemmas instead of stems in order to increase the retrieval
rate from WordNet, or using antonyms instead of synonyms
when negations are observed in the text under analysis. Allow me to
express my congratulations.
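
The antonym idea mentioned above could look roughly like the following
Python sketch. The lookup tables are invented (a real system would query
WordNet), the function name is hypothetical, and the rule that a negation
stays active until the next lemma with an antonym entry is a simplifying
assumption of mine, not something taken from the thesis.

```python
# Hypothetical lookup tables; a real system would query WordNet instead.
SYNONYMS = {"happy": ["glad", "joyful"], "good": ["fine"]}
ANTONYMS = {"happy": ["unhappy", "sad"], "good": ["bad"]}
NEGATIONS = {"not", "never", "no"}


def expand_with_negation(lemmas):
    """Expand lemmas with synonyms, or with antonyms when a negation precedes them."""
    expanded = []
    negated = False
    for lemma in lemmas:
        if lemma in NEGATIONS:
            negated = True  # stays active until a lemma with an antonym entry
            continue
        if negated and lemma in ANTONYMS:
            # Replace the negated lemma by its antonyms and clear the flag.
            expanded.extend(ANTONYMS[lemma])
            negated = False
        else:
            expanded.append(lemma)
            if not negated:
                expanded.extend(SYNONYMS.get(lemma, []))
    return expanded


print(expand_with_negation(["i", "be", "not", "very", "happy"]))
```

The input is assumed to be already lemmatised, which ties in with the
first suggestion: lemmas such as "be" or "happy" can be looked up
directly, whereas stems often do not match any dictionary entry.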
Now that EmoLib works in both Spanish and English, the new
language version of the demo service has just been made available
here.
Enjoy.
--
[Trilla et al., 2010] Trilla, A., Alias, F. and Lozano, I., "Text
classification of domain-styled text and sentiment-styled text for
expressive speech synthesis", In Proceedings of VI Jornadas en Tecnologia
del Habla (FALA2010) (ISBN: 978-84-8158-510-0), pp. 75-78, Vigo, Spain,
November 2010.
[Pang and Lee, 2008] Pang, B. and Lee, L., "Opinion mining and sentiment
analysis", Foundations and Trends in Information Retrieval, vol. 2, no. 1-2,
pp. 1-135, 2008.