EmoLib

EmoLib is a library that extracts the affect from text and tags it according to the feeling that is written or being conveyed.


Packages
emolib	Provides the highest level of abstraction of EmoLib: the text processing pipeline.
emolib.classifier	Provides a high-level class to perform the classification (categorisation) process.
emolib.classifier.eval	Provides classes implementing different classifier evaluation strategies to measure the effectiveness rates of the classifiers.
emolib.classifier.heuristic	Heuristic classifiers.
emolib.classifier.machinelearning	Machine Learning classifiers.
emolib.classifier.machinelearning.density	Provides some class-conditional densities for modeling the feature space.
emolib.eks	Provides a high-level class to perform the Emotional Keyword Spotting (EKS) process.
emolib.eks.anlw	Provides a class to perform the Emotional Keyword Spotting (EKS) using an Affective Norms Language Words (ANLW) dictionary.
emolib.formatter	Provides a high-level class to perform the formatting of the results.
emolib.formatter.xml	Provides a class to format the results according to an Extensible Markup Language (XML) specification.
emolib.pos	Provides a high-level class to perform the Part-Of-Speech tagging process.
emolib.pos.qtag	Provides a class to perform the Part-Of-Speech tagging process using the QTag POS tagger.
emolib.pos.stanford	Provides a class to perform the Part-Of-Speech tagging process using the Stanford POS tagger.
emolib.splitter	Provides a high-level class to perform the sentence segmentation process.
emolib.splitter.bdt	Provides a class to perform the sentence splitting through a hand-crafted Binary Decision Tree (BDT).
emolib.statistic	Provides a high-level class to calculate the emotional (dimensional) statistics.
emolib.statistic.average	Provides classes to perform the statistical calculations with the emotional data.
emolib.stemmer	Provides a high-level class to perform the stemming process.
emolib.stemmer.snowball	Provides a class to perform the stemming process using the Snowball Spanish stemming algorithm.
emolib.stemmer.snowball.ext	Provides classes to perform the stemming process using the Snowball stemming algorithms in different languages.
emolib.tokenizer	Provides a high-level class to perform the tokenisation process.
emolib.tokenizer.lexer.english	Provides classes to perform the tokenisation process in English using a JavaCC lexer.
emolib.tokenizer.lexer.spanish	Provides classes to perform the tokenisation process in Spanish using a JavaCC lexer.
emolib.util	Provides a set of configuration, processing, formatting and evaluation utilities for EmoLib.
emolib.util.conf	Provides a mechanism for managing persistent configuration data.
emolib.util.eval	Provides classes to evaluate the performance of EmoLib.
emolib.util.eval.semeval	Provides classes to evaluate the performance of EmoLib with the Semeval 2007 task dataset.
emolib.util.printer	Provides a class to test the contents of the processing pipeline.
emolib.util.proc	Provides high-level structural implementations of the tools to perform the text processing tasks.
emolib.util.servlet.en	Processing service in English.
emolib.util.servlet.es	Processing service in Spanish.
emolib.wsd	Provides a high-level class to perform the Word Sense Disambiguation process.
emolib.wsd.openthes	Provides a class to perform the Word Sense Disambiguation (WSD) process using the OpenThesaurus-es Spanish thesaurus.
emolib.wsd.simlib	Provides a high-level class to perform the Word Sense Disambiguation (WSD) process using the WordNet Similarity library.

The diagram below shows the general architecture of EmoLib:

Figure 1: Architecture diagram of EmoLib.

The processing structure of EmoLib is built from several primary abstract classes that define a sequential modular framework, i.e., a pipeline, because each step of the tagging process depends on the output of the previous ones. This pipeline is described in detail in the AffectiveTagger documentation.

The actual (concrete) classes that perform the Natural Language Processing (NLP) tasks (shown in Figure 1 inside the Tagger block) inherit the common language-processing methods defined by these parent (abstract) classes. Following this design gives the system additional flexibility, scalability and maintainability, which meets the need for a complete, modular and configurable framework for EmoLib.
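The pipeline idea above can be illustrated with a minimal Java sketch. It is not EmoLib's API: `TextData`, `ProcessingStage`, `Pipeline` and the two toy stages are hypothetical names chosen for illustration. The sketch only shows how concrete stages that share an abstract parent can be chained in a fixed order, so that one implementation can be swapped for another without changing the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical data structure shared by all stages of the pipeline.
class TextData {
    final String text;
    final List<String> tokens = new ArrayList<>();

    TextData(String text) {
        this.text = text;
    }
}

// Hypothetical abstract parent: every stage exposes the same method, so a
// concrete implementation can be replaced without touching the pipeline.
abstract class ProcessingStage {
    abstract void process(TextData data);
}

// Toy tokenizer: splits on whitespace (a stand-in for a real lexer).
class WhitespaceTokenizer extends ProcessingStage {
    @Override
    void process(TextData data) {
        for (String token : data.text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                data.tokens.add(token);
            }
        }
    }
}

// Toy normalisation stage: lowercases the tokens produced by the tokenizer.
class LowerCaser extends ProcessingStage {
    @Override
    void process(TextData data) {
        data.tokens.replaceAll(token -> token.toLowerCase(Locale.ROOT));
    }
}

// Runs the stages in a fixed order; each stage relies on the previous ones.
class Pipeline {
    private final List<ProcessingStage> stages;

    Pipeline(List<ProcessingStage> stages) {
        this.stages = stages;
    }

    TextData run(String text) {
        TextData data = new TextData(text);
        for (ProcessingStage stage : stages) {
            stage.process(data);
        }
        return data;
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline(
                Arrays.<ProcessingStage>asList(new WhitespaceTokenizer(), new LowerCaser()));
        System.out.println(pipeline.run("I Feel Great today").tokens); // [i, feel, great, today]
    }
}
```

The order matters: running `LowerCaser` alone yields no tokens, because it depends on the data structure created by the tokenizer.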

The primary abstract classes that define the architecture of EmoLib are described as follows:

Tokenizer - Splits a text string into individual units, called tokens, and creates the data structure used by the subsequent modules. Tokens are defined by regular patterns established by the grammar of the language. It is the INPUTTER of the pipeline.
Sentence Splitter - Segments the incoming text into paragraphs and sentences.
Part-Of-Speech Tagger - Disambiguates, according to the context, the grammatical function of the words in the sentence, notably nouns, verbs and adjectives.
Word Sense Disambiguator - Determines the correct sense of the words according to the context and extracts the appropriate set of synonyms.
Stemmer - Strips suffixes from words so that words sharing a common stem, and hence a common meaning, are indexed together, thus improving Information Retrieval (IR) performance.
Emotional Keyword Spotter - Determines the emotional dimensions of the words that have an affective content.
Statistic - Computes the averaged emotional dimensions of the words in the text.
Classifier - Labels the text with the most appropriate affective tag according to the affective attributes.
Affective Formatter - Presents the obtained results in a usable XML format so that EmoLib can be integrated as a tool into a broader project. It is the OUTPUTTER of the pipeline.
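The Emotional Keyword Spotter, Statistic and Classifier steps can be illustrated with a toy example. This is a minimal sketch, not EmoLib's implementation: the dictionary entries, the two dimension names (valence and arousal), the label set and the centroids are all invented for illustration, and nearest-centroid classification is used only as a simple stand-in for EmoLib's heuristic and machine-learning classifiers.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AffectSketch {
    // Hypothetical affective dictionary: word -> {valence, arousal}.
    // The values are invented for illustration only.
    static final Map<String, double[]> DICTIONARY = new HashMap<>();

    static {
        DICTIONARY.put("happy", new double[] {8.0, 6.0});
        DICTIONARY.put("calm", new double[] {7.0, 2.0});
        DICTIONARY.put("angry", new double[] {2.0, 7.0});
        DICTIONARY.put("sad", new double[] {2.0, 3.0});
    }

    // Hypothetical labels and their centroids in the same (valence, arousal) space.
    static final String[] LABELS = {"positive", "negative", "neutral"};
    static final double[][] CENTROIDS = {{7.5, 5.0}, {2.5, 5.0}, {5.0, 5.0}};

    // Keyword spotting + statistic: averages the dimensions of the tokens found
    // in the dictionary. Returns null when no affective word is found.
    static double[] averageDimensions(List<String> tokens) {
        double[] sum = new double[2];
        int hits = 0;
        for (String token : tokens) {
            double[] dims = DICTIONARY.get(token);
            if (dims != null) {
                sum[0] += dims[0];
                sum[1] += dims[1];
                hits++;
            }
        }
        return hits == 0 ? null : new double[] {sum[0] / hits, sum[1] / hits};
    }

    // Classifier stand-in: nearest centroid by squared Euclidean distance.
    // Falls back to "neutral" (an assumption) when there is no affective word.
    static String classify(double[] point) {
        if (point == null) {
            return "neutral";
        }
        String best = LABELS[0];
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < LABELS.length; i++) {
            double dv = point[0] - CENTROIDS[i][0];
            double da = point[1] - CENTROIDS[i][1];
            double distance = dv * dv + da * da;
            if (distance < bestDistance) {
                bestDistance = distance;
                best = LABELS[i];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] average = averageDimensions(Arrays.asList("i", "am", "happy", "and", "calm"));
        System.out.println(average[0] + ", " + average[1] + " -> " + classify(average));
        // prints: 7.5, 4.0 -> positive
    }
}
```

With the invented values, "happy" (8, 6) and "calm" (7, 2) average to (7.5, 4.0), whose squared distance is 1 to the positive centroid, 7.25 to the neutral one and 26 to the negative one.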

There is also an external monitor, the Printer, which displays the information (data) flowing through the processing pipeline in order to check that the modules that make up the chain work correctly.
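The role of such a monitor can be sketched as a pass-through wrapper around each stage: it prints the stage output and forwards it unchanged. The names `MonitorSketch`, `monitored`, `LOWER` and `STRIP_PUNCTUATION` are hypothetical; this is an illustration of the idea, not the API of the `emolib.util.printer` package.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

class MonitorSketch {
    // Toy stage: lowercases every token.
    static final UnaryOperator<List<String>> LOWER = tokens -> {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            result.add(token.toLowerCase(Locale.ROOT));
        }
        return result;
    };

    // Toy stage: removes ASCII punctuation and drops tokens that become empty.
    static final UnaryOperator<List<String>> STRIP_PUNCTUATION = tokens -> {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String cleaned = token.replaceAll("\\p{Punct}", "");
            if (!cleaned.isEmpty()) {
                result.add(cleaned);
            }
        }
        return result;
    };

    // Wraps a stage so that its output is printed before being passed on.
    // The monitor only observes: the data is forwarded unchanged.
    static UnaryOperator<List<String>> monitored(
            String name, UnaryOperator<List<String>> stage, PrintStream out) {
        return input -> {
            List<String> output = stage.apply(input);
            out.println(name + ": " + output);
            return output;
        };
    }

    // Applies the stages in order.
    static List<String> run(List<UnaryOperator<List<String>>> stages, List<String> input) {
        List<String> data = input;
        for (UnaryOperator<List<String>> stage : stages) {
            data = stage.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        List<UnaryOperator<List<String>>> stages = new ArrayList<>();
        stages.add(monitored("lower", LOWER, System.out));
        stages.add(monitored("strip", STRIP_PUNCTUATION, System.out));
        run(stages, Arrays.asList("Great", "day", "!"));
        // prints:
        // lower: [great, day, !]
        // strip: [great, day]
    }
}
```

Because the wrapper takes the output stream as a parameter, the same monitor can print to the console during development or to a buffer in tests.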

Configuration is handled by the Configuration Manager, which reads the specific configuration from an external XML configuration file. This system has been taken from the Sphinx-4 speech recognition project, so please refer to the Configuration Management for Sphinx-4 documentation for the configuration specifications.
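For orientation, a Sphinx-4 style configuration file is assumed here to declare `<component>` elements, each with a `name` and a `type` (the class to instantiate), containing nested `<property name="..." value="..."/>` elements. The sketch below reads such a file using only the standard `javax.xml.parsers` DOM API; it does not use the Sphinx-4 configuration manager, and the component names, class names and properties are hypothetical. Refer to the Sphinx-4 documentation for the actual schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigSketch {
    // Hypothetical configuration in the Sphinx-4 style: every <component> has a
    // name and a type (class name), plus nested <property> elements.
    static final String XML =
            "<config>"
            + "<component name=\"tokenizer\" type=\"example.MyTokenizer\">"
            + "<property name=\"language\" value=\"en\"/>"
            + "</component>"
            + "<component name=\"classifier\" type=\"example.MyClassifier\">"
            + "<property name=\"model\" value=\"model.dat\"/>"
            + "</component>"
            + "</config>";

    // Parses the XML text, wrapping checked exceptions in an unchecked one.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    // Maps each component name to its declared type, in document order.
    static Map<String, String> componentTypes(Document doc) {
        Map<String, String> types = new LinkedHashMap<>();
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            types.put(component.getAttribute("name"), component.getAttribute("type"));
        }
        return types;
    }

    // Returns the properties of the named component (empty map if not found).
    static Map<String, String> properties(Document doc, String componentName) {
        Map<String, String> props = new LinkedHashMap<>();
        NodeList components = doc.getElementsByTagName("component");
        for (int i = 0; i < components.getLength(); i++) {
            Element component = (Element) components.item(i);
            if (!component.getAttribute("name").equals(componentName)) {
                continue;
            }
            NodeList nodes = component.getElementsByTagName("property");
            for (int j = 0; j < nodes.getLength(); j++) {
                Element property = (Element) nodes.item(j);
                props.put(property.getAttribute("name"), property.getAttribute("value"));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(componentTypes(doc));
        // prints: {tokenizer=example.MyTokenizer, classifier=example.MyClassifier}
        System.out.println(properties(doc, "tokenizer"));
        // prints: {language=en}
    }
}
```

Keeping the wiring in an external file means a different tokenizer or classifier can be selected by editing the configuration rather than recompiling the pipeline.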

A simple code example is arguably the best documentation for getting started with EmoLib. See EmoLib Simple Example for further details.
