Alexandre Trilla, PhD - Data Scientist


-- Thoughts on data analysis, software development and innovation management. Comments are welcome

Post 58

Inter-language evaluation of sentiment analysis with granular semantic term expansion


Yesterday, my Master's student Isaac Lozano defended his thesis with honours. The topic of his work gives this post its title: inter-language evaluation of sentiment analysis with granular semantic term expansion. His work builds mainly on our earlier publication (Trilla et al., 2010), but he has extended it by porting EmoLib to Spanish in its entirety and evaluating its performance with respect to English, which is the default working language and also the one best supported by the research community.

His main contribution is an improved Word-Sense Disambiguation module that exploits the knowledge contained in the Spanish WordNet ontology, so that the term feature space can be expanded with the synsets of the observed words in their correct senses. To accomplish this, a Lucene index of WordNet needs to be created first, along with the Information-Content-based measures that help determine the semantic similarity/relatedness among the terms (e.g., Jiang-Conrath, Resnik, etc.), a la JavaSimLib. In fact, to perform a more granular analysis, several WordNet indices need to be created, one for each word class that may carry affect, i.e., content words such as nouns, verbs and adjectives. This granularity with respect to the Part-Of-Speech is assumed to help identify affect in text, see (Pang and Lee, 2008).

Nevertheless, he concluded his work by showing that such intuitive considerations do not deliver a clear improvement in either of the two languages, at least for the datasets at hand: SemEval 2007 and the Advertising Database. He also pointed out several interesting ideas regarding the flaws he observed during the development of his work, such as using lemmas instead of stems in order to increase the retrieval rate from WordNet, or using antonyms instead of synonyms when negations are observed in the text under analysis. Allow me to express my congratulations.
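As an aside on the Information-Content-based measures mentioned above, here is a minimal Python sketch of how the Resnik similarity and the Jiang-Conrath distance can be computed over a concept hierarchy. It is not EmoLib's code: the tiny taxonomy, the corpus counts and all function names are hypothetical and stand in for WordNet and a real corpus.

```python
import math

# Hypothetical toy taxonomy (child -> parent); NOT taken from WordNet.
PARENT = {
    "emotion": "entity",
    "joy": "emotion",
    "sadness": "emotion",
    "elation": "joy",
    "object": "entity",
}

# Hypothetical corpus counts of each concept's own occurrences.
COUNTS = {
    "entity": 1,
    "emotion": 4,
    "joy": 6,
    "sadness": 5,
    "elation": 2,
    "object": 14,
}


def ancestors(concept):
    """Return the concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def information_content(concept):
    """IC(c) = -log p(c), where p(c) covers c and all of its descendants."""
    total = sum(COUNTS.values())
    subsumed = sum(n for c, n in COUNTS.items() if concept in ancestors(c))
    return -math.log(subsumed / total)


def resnik(a, b):
    """Resnik similarity: IC of the most informative common subsumer."""
    common = set(ancestors(a)) & set(ancestors(b))
    return max(information_content(c) for c in common)


def jiang_conrath_distance(a, b):
    """Jiang-Conrath distance: IC(a) + IC(b) - 2 * IC(common subsumer)."""
    return information_content(a) + information_content(b) - 2 * resnik(a, b)


if __name__ == "__main__":
    for pair in [("elation", "joy"), ("joy", "sadness"), ("joy", "object")]:
        print(pair, resnik(*pair), jiang_conrath_distance(*pair))
```

In this toy setting, concepts that share a specific (and therefore informative) ancestor obtain a higher Resnik score and a smaller Jiang-Conrath distance than concepts that only meet at the root. A real system would estimate the counts from a corpus and walk the WordNet hypernym hierarchy of one word class at a time.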

Now that EmoLib performs in Spanish as well as in English, the new language version of the demo service has just been made available here. Enjoy.

[Trilla et al., 2010] Trilla, A., Alias, F. and Lozano, I., "Text classification of domain-styled text and sentiment-styled text for expressive speech synthesis", in Proceedings of VI Jornadas en Tecnologia del Habla (FALA2010) (ISBN: 978-84-8158-510-0), pp. 75-78, Vigo, Spain, November 2010.
[Pang and Lee, 2008] Pang, B. and Lee, L., "Opinion mining and sentiment analysis", Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, pp. 1-135, 2008.

All contents © Alexandre Trilla 2008-2024